I would like to sort/arrange data by group. That's easy enough. However, I only want to sort values within specific groups, not all groups.
I found one possible instance of a similar question at the link. But I found it to be confusing due to the framing of the question by the OP.
Arrange values within a specific group
Sample data:
df <- data.frame(var = c("apple", "banana", "eggplant", "carrot", "dill", "fava", "garlic"),
grp = c("A", "A", "B", "B", "B", "C", "C"),
val = c(4, 2, 1, 3, 7, 6, 2))
df
# var grp val
# 1 apple A 4
# 2 banana A 2
# 3 eggplant B 1
# 4 carrot B 3
# 5 dill B 7
# 6 fava C 6
# 7 garlic C 2
Desired output:
# var grp val
# 1 apple A 4
# 2 banana A 2
# 3 eggplant B 1
# 4 carrot B 3
# 5 dill B 7
# 6 garlic C 2
# 7 fava C 6
Partial solution:
library(dplyr)
df %>%
group_by(grp) %>%
arrange(val, .by_group = TRUE)
This of course sorts for all groups. How do I get it to sort for only the groups I would like sorted, which are "B" and "C"? I would like a tidyverse solution but feel free to post a base solution as well.
We can flip the sign of the elements in 'val' that belong to group "A", so that this group is ordered in the opposite direction from the 'val' elements in the other groups
library(dplyr)
df %>%
arrange(grp, val * c(1, -1)[(grp == 'A') + 1])
-output
var grp val
1 apple A 4
2 banana A 2
3 eggplant B 1
4 carrot B 3
5 dill B 7
6 garlic C 2
7 fava C 6
Or, if the values for 'A' should be kept as they are, multiply by 0 so that every value for 'A' is the same and the original row order is kept
df %>%
arrange(grp, val * c(1, 0)[(grp == 'A') + 1])
var grp val
1 apple A 4
2 banana A 2
3 eggplant B 1
4 carrot B 3
5 dill B 7
6 garlic C 2
7 fava C 6
NOTE: This is done without any group_by attribute
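To see why the multiplier trick leaves group "A" untouched, it can help to look at the intermediate vectors. This is a minimal base R sketch on the sample columns; the names idx, mult, key and ord are introduced here only for illustration.

```r
grp <- c("A", "A", "B", "B", "B", "C", "C")
val <- c(4, 2, 1, 3, 7, 6, 2)

# TRUE/FALSE becomes 1/0 when 1 is added, so idx is 2 for "A" and 1 otherwise
idx <- (grp == "A") + 1

# Pick the multiplier by position: 1 for the other groups, 0 for "A"
mult <- c(1, 0)[idx]

# Every "A" row gets the same key (0); other rows keep their value
key <- val * mult

# Ties within "A" are left in their original order
ord <- order(grp, key)
ord
```

Because all keys inside "A" are equal, only the rows of the other groups can change position.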
If we want to use the OP's way, i.e. using group_by
df %>%
group_by(grp) %>%
arrange(case_when(grp == 'A' ~ -1 * val, TRUE ~ val),
.by_group = TRUE) %>%
ungroup
-output
# A tibble: 7 x 3
var grp val
<chr> <chr> <dbl>
1 apple A 4
2 banana A 2
3 eggplant B 1
4 carrot B 3
5 dill B 7
6 garlic C 2
7 fava C 6
If the values in 'val' for grp 'A' appear in descending order only by coincidence, then create a sequence column before grouping and use it as the sort key for 'A', so that those rows keep their original order
df %>%
mutate(rn = row_number()) %>%
group_by(grp) %>%
arrange(case_when(grp == 'A' ~ as.numeric(rn), TRUE ~ val),
.by_group = TRUE) %>%
ungroup %>%
dplyr::select(-rn)
-output
# A tibble: 7 x 3
var grp val
<chr> <chr> <dbl>
1 apple A 4
2 banana A 2
3 eggplant B 1
4 carrot B 3
5 dill B 7
6 garlic C 2
7 fava C 6
Or using base R
df[with(df, order(grp, c(1, 0)[(grp == 'A') + 1] * val)),]
var grp val
1 apple A 4
2 banana A 2
3 eggplant B 1
4 carrot B 3
5 dill B 7
7 garlic C 2
6 fava C 6
You can filter the groups you want to arrange, sort them and bind to the remaining data.
library(dplyr)
order_groups <- c('B', 'C')
df %>%
filter(grp %in% order_groups) %>%
arrange(grp, val) %>%
bind_rows(df %>%
filter(!grp %in% order_groups)) %>%
arrange(grp)
#  var grp val
#1 apple A 4
#2 banana A 2
#3 eggplant B 1
#4 carrot B 3
#5 dill B 7
#6 garlic C 2
#7 fava C 6
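The same idea can also be written with a list of groups to sort rather than one group to skip. The following is only a base R sketch; sort_selected_groups is a hypothetical helper name, not something defined in the answers above, and it assumes the columns are called grp and val as in the sample data.

```r
# Hypothetical helper: sort `val` only inside the groups listed in
# `order_groups`; rows of all other groups keep their original order.
sort_selected_groups <- function(data, order_groups) {
  # Rows outside the selected groups all share the key 0, so the tie
  # is resolved by their original position.
  key <- ifelse(data$grp %in% order_groups, data$val, 0)
  data[order(data$grp, key), ]
}

df <- data.frame(var = c("apple", "banana", "eggplant", "carrot", "dill", "fava", "garlic"),
                 grp = c("A", "A", "B", "B", "B", "C", "C"),
                 val = c(4, 2, 1, 3, 7, 6, 2),
                 stringsAsFactors = FALSE)

res <- sort_selected_groups(df, c("B", "C"))
res$var
```

Unlike the filter and bind_rows version, this never splits the data frame, so the original row names are preserved.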
I have a dataframe with two columns, "id" and "detail" (df_current below). I need to group the dataframe by id and spread it so that the columns become "Interface1", "Interface2", etc., and each interface column contains the values listed directly below that interface name, up to the next "!". Essentially the "!" works as a separator, but it is not needed in the output.
The desired output is shown below as: "df_needed_from_current".
I have tried multiple approaches (group_by, spread, reshape, dcast etc.), but can't get it to work. Any help would be greatly appreciated!
Sample Current Dataframe (code to create under):
id detail
1  !
1  Interface1
1  a
1  b
1  !
1  Interface2
1  a
1  b
2  !
2  Interface1
2  a
2  b
2  c
2  !
2  Interface2
2  a
3  !
3  Interface1
3  a
3  b
3  c
3  d
df_current <- data.frame(
id = c("1","1","1","1","1","1","1","1","2",
"2","2","2","2","2","2","2","3","3",
"3","3","3","3","4","4","4","4","4",
"4","4","4","4","4","4","4","4","4",
"5","5","5","5","5","5","5","5","5",
"5","5","5","5"),
detail = c("!", "Interface1","a","b","!",
"Interface2","a","b","!","Interface1",
"a","b","c","!","Interface2","a",
"!", "Interface1","a","b","c","d",
"!", "Interface1","a","b","!",
"Interface2","a","b","c","!","Interface3",
"a","b","c","!","Interface1","a","b","!",
"Interface2","a","b","c","!","Interface3",
"a","b"))
Dataframe Needed (code to create under):
ID Interface1 Interface2 Interface3
1  a          a          NA
1  b          b          NA
2  a          a          NA
2  b          NA         NA
2  c          NA         NA
3  a          NA         NA
3  b          NA         NA
3  c          NA         NA
3  d          NA         NA
df_needed_from_current <- data.frame(
id = c("1","1","2","2","2","3","3","3","3","4","4","4","5","5","5"),
Interface1 = c("a","b","a","b","c","a","b","c","d","a","b","NA","a","b","NA"),
Interface2 = c("a","b","a","NA","NA","NA","NA","NA","NA","a","b","c","a","b","c"),
Interface3 = c("NA","NA","NA","NA","NA","NA","NA","NA","NA","a","b","c","a","b","NA")
)
We first remove the rows where 'detail' is "!". Then we create a new column 'interface' that holds only the 'detail' values containing 'Interface' (NA elsewhere), and use fill from tidyr to replace each NA with the previous non-NA value within each 'id'. After that we filter out the rows where 'detail' equals 'interface' (the header rows), create a row sequence id with rowid (from data.table), and reshape to 'wide' format with pivot_wider
library(dplyr)
library(tidyr)
library(data.table)
library(stringr)
df_current %>%
filter(detail != "!") %>%
mutate(interface = case_when(str_detect(detail, 'Interface') ~ detail)) %>%
group_by(id) %>%
fill(interface) %>%
ungroup %>%
filter(detail != interface) %>%
mutate(rn = rowid(id, interface)) %>%
pivot_wider(names_from = interface, values_from = detail) %>%
select(-rn)
# A tibble: 15 x 4
# id Interface1 Interface2 Interface3
# <chr> <chr> <chr> <chr>
# 1 1 a a <NA>
# 2 1 b b <NA>
# 3 2 a a <NA>
# 4 2 b <NA> <NA>
# 5 2 c <NA> <NA>
# 6 3 a <NA> <NA>
# 7 3 b <NA> <NA>
# 8 3 c <NA> <NA>
# 9 3 d <NA> <NA>
#10 4 a a a
#11 4 b b b
#12 4 <NA> c c
#13 5 a a a
#14 5 b b b
#15 5 <NA> c <NA>
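The header-filling step can also be pictured in base R. This is a rough sketch for a single id only, assuming (as in the sample data) that the first remaining row is an interface name; is_header, values and rn are illustrative names.

```r
detail <- c("!", "Interface1", "a", "b", "!", "Interface2", "a", "b", "c")

# Drop the separator rows
detail <- detail[detail != "!"]

# Mark the rows that name an interface
is_header <- grepl("^Interface", detail)

# cumsum() counts the headers seen so far, which indexes the header
# that applies to each row (the equivalent of filling downwards)
interface <- detail[is_header][cumsum(is_header)]

# Keep only the value rows, paired with their interface
values <- data.frame(interface = interface[!is_header],
                     detail = detail[!is_header],
                     stringsAsFactors = FALSE)

# Position of each value within its interface (the role played by rn)
values$rn <- ave(seq_along(values$detail), values$interface, FUN = seq_along)
values
```

The rn column is what lets the wide reshape line up the first value of each interface on the same row, the second on the next, and so on.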
I have a simple question that I can't find a solution for.
I also couldn't find an existing answer that I understand.
Imagine I got this data frame
(ts <- tibble(
a = LETTERS[1:10],
b = c(rep(1, 5), rep(2, 5))
))
# A tibble: 10 x 2
a b
<chr> <dbl>
1 A 1
2 B 1
3 C 1
4 D 1
5 E 1
6 F 2
7 G 2
8 H 2
9 I 2
10 J 2
What I want is simple: a data frame in which column b indexes a sliding window of size n over column a.
The output can be something like this:
# A tibble: 8 x 2
b a
<dbl> <chr>
1 1 A B
2 1 B C
3 1 C D
4 1 D E
5 2 F G
6 2 G H
7 2 H I
8 2 I J
I don't care if the column a contains an array (nest values).
I just need a new data frame based on the sliding window.
Since this operation will run in a relational database, I'd like a function that is compatible with PostgreSQL through DBI.
Any help is appreciated.
Thanks in advance
We can group by 'b', create a new column from the lead of 'a', and remove the NA rows with na.omit
library(dplyr)
ts %>%
group_by(b) %>%
mutate(a2 = lead(a)) %>%
ungroup %>%
na.omit %>%
select(b, everything())
# A tibble: 8 x 3
# b a a2
# <dbl> <chr> <chr>
#1 1 A B
#2 1 B C
#3 1 C D
#4 1 D E
#5 2 F G
#6 2 G H
#7 2 H I
#8 2 I J
If lead doesn't work, then just remove the first element and append NA at the end in the mutate step
ts %>%
group_by(b) %>%
mutate(a2 = c(a[-1], NA)) %>%
ungroup %>%
na.omit %>%
select(b, everything())
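Both versions above cover a window of two elements. For a general window size n, one possible in-memory sketch in base R is shown below; sliding_windows is a hypothetical helper introduced here, and nothing in it addresses the database requirement from the question.

```r
# Hypothetical helper: all windows of length n over x, each pasted
# into one string. Returns character(0) when x is shorter than n.
sliding_windows <- function(x, n) {
  starts <- seq_len(max(length(x) - n + 1, 0))
  vapply(starts,
         function(i) paste(x[i:(i + n - 1)], collapse = " "),
         character(1))
}

a <- LETTERS[1:10]
b <- c(rep(1, 5), rep(2, 5))

# Build the windows separately inside each group of b
win <- lapply(split(a, b), sliding_windows, n = 2)

result <- data.frame(b = rep(as.numeric(names(win)), lengths(win)),
                     a = unlist(win, use.names = FALSE),
                     stringsAsFactors = FALSE)
result
```

With n = 2 this gives the same eight rows as the desired output; a larger n yields fewer, wider windows per group.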
I have 2 data frames. One (df1) looks like this:
var.1 var.2 var.3 var.4
1 7 9 1 2
2 4 6 9 7
3 2 NA NA NA
And the other (df2) looks like this:
var.a var.b var.c var.d
1 1 b c d
2 2 f g h
3 4 j k l
4 7 j k z
...
where var.a of df2 contains all of the values listed in var.1 through var.4 of df1.
I want to match var.a from df2 across all of the columns listed in df1 and then add these columns to df1 with new/combined column names. So for instance it'll look like this:
var.1 var1.b var1.c var1.d ... var.4 var4.b var4.c var4.d
1 7 j k z 2 f g h
2 4 j k l 7 j k z
3 2 f g h NA NA NA NA
Thanks in advance!
Here's a tidyverse solution. First, I define the data frames.
df1 <- read.table(text = " var.1 var.2 var.3 var.4
1 7 9 1 2
2 4 6 9 7
3 2 NA NA NA", header = TRUE)
df2 <- read.table(text = " var.a var.b var.c var.d
1 1 b c d
2 2 f g h
3 4 j k l
4 7 j k z", header=TRUE)
Then, I load the libraries.
# Load libraries
library(tidyr)
library(dplyr)
library(tibble)
Finally, I restructure the data.
# Manipulate data
df1 %>%
rownames_to_column() %>%
gather(variable, value, -rowname) %>%
left_join(df2, by = c("value" = "var.a")) %>%
gather(foo, bar, -variable, -rowname) %>%
unite(goop, variable, foo) %>%
spread(goop, bar) %>%
select(-rowname)
#> Warning: attributes are not identical across measure variables;
#> they will be dropped
which gives,
#> var.1_value var.1_var.b var.1_var.c var.1_var.d var.2_value var.2_var.b
#> 1 7 j k z 9 <NA>
#> 2 4 j k l 6 <NA>
#> 3 2 f g h <NA> <NA>
#> var.2_var.c var.2_var.d var.3_value var.3_var.b var.3_var.c var.3_var.d
#> 1 <NA> <NA> 1 b c d
#> 2 <NA> <NA> 9 <NA> <NA> <NA>
#> 3 <NA> <NA> <NA> <NA> <NA> <NA>
#> var.4_value var.4_var.b var.4_var.c var.4_var.d
#> 1 2 f g h
#> 2 7 j k z
#> 3 <NA> <NA> <NA> <NA>
Created on 2019-05-30 by the reprex package (v0.3.0)
This is a little bit convoluted, but I'll try to explain.
I turn row numbers into a column at first, as this will help me put the data back together at the very end.
I go from wide to long format for df1.
I join df2 to df1 by matching var.a against the values from var.1 through var.4 (now stored in value).
I go from wide to long again.
I combine the variable names from each data frame into one variable.
Finally, I go from long to wide format (this is where the row numbers come in handy) and drop the row numbers.
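For comparison, the same lookup can be sketched in base R with match(), one column of df1 at a time. This is only an illustration on a two-column subset of the data; it assumes var.a is the first column of df2, and the naming scheme (var1.b and so on) follows the desired output in the question rather than the output above.

```r
df1 <- data.frame(var.1 = c(7, 4, 2), var.4 = c(2, 7, NA))
df2 <- data.frame(var.a = c(1, 2, 4, 7),
                  var.b = c("b", "f", "j", "j"),
                  var.c = c("c", "g", "k", "k"),
                  var.d = c("d", "h", "l", "z"),
                  stringsAsFactors = FALSE)

pieces <- lapply(names(df1), function(col) {
  # Row of df2 whose var.a equals each value; NA when there is no match
  hit <- df2[match(df1[[col]], df2$var.a), -1, drop = FALSE]
  rownames(hit) <- NULL
  # "var.1" + "var.b" becomes "var1.b"
  names(hit) <- paste0(sub(".", "", col, fixed = TRUE), ".",
                       sub("var.", "", names(hit), fixed = TRUE))
  cbind(df1[col], hit)
})

result <- do.call(cbind, pieces)
result
```

Each lookup column is placed right after the df1 column it was matched on, which avoids the two rounds of reshaping.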
I have some data as follows:
library(tidyr)
library(data.table)
thisdata <- data.frame(numbers = c(1,3,4,5,6,1,2,4,5,6)
,letters = c('A','A','A','A','A','B','B','B','B','B'))
otherdata <- data.frame(numbers = c(1,2,3,4,5,6))
I am looking to split 'thisdata' by the letters column, merge the two lists to 'otherdata' by the numbers column, then fill letters NA with the corresponding letter in that list. So:
out <- split(thisdata , f = thisdata$letters )
out2 <- lapply(out, function(x) merge(x,otherdata,by="numbers",all = TRUE))
However, I can't get the 'fill' function in tidyr to work within the lapply
out3 <- lapply(out2,function(x) fill(x$channel))
Error in UseMethod("fill_") :
no applicable method for 'fill_' applied to an object of class "NULL"
This is the output I'm after, but would rather perform the calculation within the list format:
out4 <- rbindlist(out2)
out5 <- out4 %>%
fill(letters) %>% #default direction down
fill(letters,.direction = "up")
numbers letters
1: 1 A
2: 2 A
3: 3 A
4: 4 A
5: 5 A
6: 6 A
7: 1 B
8: 2 B
9: 3 B
10: 4 B
11: 5 B
12: 6 B
fill expects a data frame as its first argument, whereas x$channel is a single column (here NULL, which is what the error reports). Try fill(x, letters) or x %>% fill(letters) with the magrittr pipe:
out3 <- lapply(out2,function(x) fill(x, letters))
out3
#$A
# numbers letters
#1 1 A
#2 2 A
#3 3 A
#4 4 A
#5 5 A
#6 6 A
#$B
# numbers letters
#1 1 B
#2 2 B
#3 3 B
#4 4 B
#5 5 B
#6 6 B
A simpler method is to use tidyr::complete:
thisdata %>%
complete(numbers = otherdata$numbers, letters) %>%
arrange(letters)
# A tibble: 12 x 2
# numbers letters
# <dbl> <fctr>
# 1 1 A
# 2 2 A
# 3 3 A
# 4 4 A
# 5 5 A
# 6 6 A
# 7 1 B
# 8 2 B
# 9 3 B
#10 4 B
#11 5 B
#12 6 B
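What complete does here can be approximated in base R with expand.grid, which builds every combination of the supplied values. The sketch below is only meant to illustrate the idea; grid and missing are names introduced for this example.

```r
thisdata <- data.frame(numbers = c(1, 3, 4, 5, 6, 1, 2, 4, 5, 6),
                       letters = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B"),
                       stringsAsFactors = FALSE)
otherdata <- data.frame(numbers = c(1, 2, 3, 4, 5, 6))

# Every number paired with every letter; the first argument varies fastest,
# so the rows come out grouped by letter
grid <- expand.grid(numbers = otherdata$numbers,
                    letters = unique(thisdata$letters),
                    stringsAsFactors = FALSE)

# Combinations that were absent from thisdata
missing <- grid[!paste(grid$numbers, grid$letters) %in%
                  paste(thisdata$numbers, thisdata$letters), ]
missing
```

Since the data has no other columns, the full grid already matches the desired output, and missing shows the two rows that had to be added.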