Dynamic select expression in function [duplicate] - r

This question already has answers here:
Reshaping multiple sets of measurement columns (wide format) into single columns (long format)
(8 answers)
Closed 4 years ago.
I am trying to write a function that will convert this data frame
library(dplyr)
library(rlang)
library(purrr)
df <- data.frame(obj=c(1,1,2,2,3,3,3,4,4,4),
S1=rep(c("a","b"),length.out=10),PR1=rep(c(3,7),length.out=10),
S2=rep(c("c","d"),length.out=10),PR2=rep(c(7,3),length.out=10))
obj S1 PR1 S2 PR2
1 1 a 3 c 7
2 1 b 7 d 3
3 2 a 3 c 7
4 2 b 7 d 3
5 3 a 3 c 7
6 3 b 7 d 3
7 3 a 3 c 7
8 4 b 7 d 3
9 4 a 3 c 7
10 4 b 7 d 3
into this data frame
df %>% {bind_rows(select(., obj, S = S1, PR = PR1),
select(., obj, S = S2, PR = PR2))}
obj S PR
1 1 a 3
2 1 b 7
3 2 a 3
4 2 b 7
5 3 a 3
6 3 b 7
7 3 a 3
8 4 b 7
9 4 a 3
10 4 b 7
11 1 c 7
12 1 d 3
13 2 c 7
14 2 d 3
15 3 c 7
16 3 d 3
17 3 c 7
18 4 d 3
19 4 c 7
20 4 d 3
But I would like the function to work with any number of columns, so that it would also handle S1, S2, S3, S4, or an additional category such as DS1, DS2. Ideally the function would take as arguments the patterns that determine which columns are stacked on top of each other, the number of sets of each column, the names of the output columns, and the names of any variables that should also be kept.
This is my attempt at this function:
stack_col <- function(df, patterns, nums, cnames, keep){
keep <- enquo(keep)
build_exp <- function(x){
paste0("!!sym(cnames[[", x, "]]) := paste0(patterns[[", x, "]],num)") %>%
parse_expr()
}
exps <- map(1:length(patterns), ~expr(!!build_exp(.)))
sel_fun <- function(num){
df %>% select(!!keep,
!!!exps)
}
map(nums, sel_fun) %>% bind_rows()
}
I can get the sel_fun part to work for a fixed number of patterns like this
patterns <- c("S", "PR")
cnames <- c("Species", "PR")
keep <- quo(obj)
sel_fun <- function(num){
df %>% select(!!keep,
!!sym(cnames[[1]]) := paste0(patterns[[1]], num),
!!sym(cnames[[2]]) := paste0(patterns[[2]], num))
}
sel_fun(1)
But the dynamic version that I have tried does not work and gives this error:
Error: `:=` can only be used within a quasiquoted argument

Here is a function that gives the expected output. It loops over the 'patterns' and the corresponding new column names ('cnames') with map2, gathers the matching columns into 'long' format, renames the 'val' column to the name passed in 'cnames', binds the resulting columns (bind_cols), and selects the columns of interest.
stack_col <- function(dat, pat, cname, keep) {
purrr::map2(pat, cname, ~
dat %>%
dplyr::select(keep, matches(.x)) %>%
tidyr::gather(key, val, matches(.x)) %>%
dplyr::select(-key) %>%
dplyr::rename(!! .y := val)) %>%
dplyr::bind_cols(.) %>%
dplyr::select(keep, cname)
}
stack_col(df, patterns, cnames, 1)
# obj Species PR
#1 1 a 3
#2 1 b 7
#3 2 a 3
#4 2 b 7
#5 3 a 3
#6 3 b 7
#7 3 a 3
#8 4 b 7
#9 4 a 3
#10 4 b 7
#11 1 c 7
#12 1 d 3
#13 2 c 7
#14 2 d 3
#15 3 c 7
#16 3 d 3
#17 3 c 7
#18 4 d 3
#19 4 c 7
#20 4 d 3
Reshaping on multiple patterns can also be done with data.table::melt:
library(data.table)
melt(setDT(df), measure = patterns("^S\\d+", "^PR\\d+"),
value.name = c("Species", "PR"))[, variable := NULL][]
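The function from the question can also be kept close to its original shape. As far as I can tell, the error appears because `:=` is only understood at the top level of a quasiquoted argument; once it sits inside an expression that is spliced in with `!!!`, it is evaluated as an ordinary call. The sketch below avoids building expressions altogether and uses a named character vector as a rename lookup. It assumes, unlike the original, that `keep` is passed as a character vector rather than a bare column name, and it needs a dplyr version that provides `all_of()`.

```r
library(dplyr)
library(purrr)

df <- data.frame(obj = c(1, 1, 2, 2, 3, 3, 3, 4, 4, 4),
                 S1 = rep(c("a", "b"), length.out = 10),
                 PR1 = rep(c(3, 7), length.out = 10),
                 S2 = rep(c("c", "d"), length.out = 10),
                 PR2 = rep(c(7, 3), length.out = 10),
                 stringsAsFactors = FALSE)

# Assumption: `keep` is a character vector of column names (not a quosure).
stack_col <- function(df, patterns, nums, cnames, keep) {
  sel_fun <- function(num) {
    # Names are the output columns, values the existing columns,
    # e.g. c(Species = "S1", PR = "PR1") when num is 1.
    lookup <- setNames(paste0(patterns, num), cnames)
    df %>%
      select(all_of(c(keep, unname(lookup)))) %>%
      rename(all_of(lookup))
  }
  # One data frame per set number, stacked on top of each other.
  bind_rows(map(nums, sel_fun))
}

res <- stack_col(df, patterns = c("S", "PR"), nums = 1:2,
                 cnames = c("Species", "PR"), keep = "obj")
```

Because the set numbers are an argument, the same call should extend to S1 to S4 by passing `nums = 1:4`, and an extra category such as DS only needs one more entry in `patterns` and `cnames`.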

This solves your problem, although it does not fix your function.
The idea is to use gather and spread on the columns that start with one of the given patterns. I build a regex that matches those column names, gather all matching columns, extract the group prefix from each original column name, and rename the groups using cnames. Finally, spread separates the groups into new columns.
library(dplyr)
library(purrr)
library(tidyr)
library(stringr)
patterns <- c("S", "PR")
cnames <- c("Species", "PR")
names(cnames) <- patterns
complete_pattern <- str_c("^", str_c(patterns, collapse = "|^"))
df %>%
mutate(rownumber = 1:n()) %>%
gather(new_variable, value, matches(complete_pattern)) %>%
mutate(group = str_extract(new_variable, complete_pattern),
group = str_replace_all(group, cnames),
group_number = str_extract(new_variable, "\\d+")) %>%
select(-new_variable) %>%
spread(group, value)
# obj rownumber group_number PR Species
# 1 1 1 1 3 a
# 2 1 1 2 7 c
# 3 1 2 1 7 b
# 4 1 2 2 3 d
# 5 2 3 1 3 a
# 6 2 3 2 7 c
# 7 2 4 1 7 b
# 8 2 4 2 3 d
# 9 3 5 1 3 a
# 10 3 5 2 7 c
# 11 3 6 1 7 b
# 12 3 6 2 3 d
# 13 3 7 1 3 a
# 14 3 7 2 7 c
# 15 4 8 1 7 b
# 16 4 8 2 3 d
# 17 4 9 1 3 a
# 18 4 9 2 7 c
# 19 4 10 1 7 b
# 20 4 10 2 3 d
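One caveat with gathering S and PR into a single value column is that the values have to share one type, so the numeric PR values are carried as text. With a tidyr version that has `pivot_longer()`, the special `".value"` name keeps one output column per prefix, each with its own type. This is only a sketch on the question's data; the column name `set` is my own choice.

```r
library(dplyr)
library(tidyr)

df <- data.frame(obj = c(1, 1, 2, 2, 3, 3, 3, 4, 4, 4),
                 S1 = rep(c("a", "b"), length.out = 10),
                 PR1 = rep(c(3, 7), length.out = 10),
                 S2 = rep(c("c", "d"), length.out = 10),
                 PR2 = rep(c(7, 3), length.out = 10),
                 stringsAsFactors = FALSE)

long <- df %>%
  pivot_longer(cols = matches("^(S|PR)\\d+$"),
               # First capture group becomes the output column name,
               # second capture group goes into the `set` column.
               names_to = c(".value", "set"),
               names_pattern = "^(S|PR)(\\d+)$") %>%
  rename(Species = S) %>%
  arrange(set)
```

Here `Species` stays character and `PR` stays numeric, and `set` records which numbered pair each row came from.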


How to stack multiple columns into one using R [duplicate]

This question already has answers here:
Reshaping data.frame from wide to long format
(8 answers)
Closed 1 year ago.
I have the following data frame:
A <- c(3,5,6,7)
B <- c(2,4,5,3)
C <- c(4,6,7,8)
D <- c(2,4,5,3)
gene <- c(1,2,3,4)
df <- data.frame(gene,A,B,C,D)
df
gene A B C D
1 1 3 2 4 2
2 2 5 4 6 4
3 3 6 5 7 5
4 4 7 3 8 3
How can I stack the lettered columns into one new column called "count", with another new column called "sample" that keeps track of the original column each count value came from? That is, I would like the following output:
count sample
3 A
5 A
6 A
7 A
2 B
4 B
5 B
3 B
4 C
6 C
7 C
8 C
2 D
4 D
5 D
3 D
Sorry this is difficult to explain but the output data frame above should make it clear.
Thanks
In base R, use stack after removing the first column
out <- stack(df[-1])
names(out) <- c("count", "sample")
We could use pivot_longer:
library(tidyr)
library(dplyr)
df %>%
pivot_longer(
cols = -gene,
names_to = "sample",
values_to = "count"
) %>%
select(-gene) %>%
arrange(sample)
sample count
<chr> <dbl>
1 A 3
2 A 5
3 A 6
4 A 7
5 B 2
6 B 4
7 B 5
8 B 3
9 C 4
10 C 6
11 C 7
12 C 8
13 D 2
14 D 4
15 D 5
16 D 3

Transpose and Merge columns in R [duplicate]

This question already has answers here:
Reshaping data.frame from wide to long format
(8 answers)
Closed 2 years ago.
Quite new to R and I have a dataset in this format:
A B C
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
But I want it in this format:
A 1
A 2
A 3
A 4
A 5
B 1
B 2
B 3
...etc.
Seems like such a simple issue but I need HELP! Thanks
df <- data.frame(
A = 1:5,
B = 1:5,
C = 1:5
)
stack(df)
values ind
1 1 A
2 2 A
3 3 A
4 4 A
5 5 A
6 1 B
7 2 B
8 3 B
9 4 B
10 5 B
11 1 C
12 2 C
13 3 C
14 4 C
15 5 C
An example using tidyr's gather function (loaded here via tidyverse):
library(tidyverse)
A <- c(1,2,3,4,5)
B <- c(1,2,3,4,5)
C <- c(1,2,3,4,5)
df <- data.frame(A,B,C)
df %>% gather(key = "key", value = "value")
key value
1 A 1
2 A 2
3 A 3
4 A 4
5 A 5
6 B 1
7 B 2
8 B 3
9 B 4
10 B 5
11 C 1
12 C 2
13 C 3
14 C 4
15 C 5
You can use the package tidyr. This lets you choose which columns you want to gather into the column "variable".
# if not installed yet
install.packages("tidyr")
library(tidyr)
data <- data.frame(
A = 1:5,
B = 1:5,
C = 1:5
)
data %>% pivot_longer(c(A, B, C), names_to = "variable", values_to = "value")
# Result
variable value
<chr> <int>
1 A 1
2 B 1
3 C 1
4 A 2
5 B 2
6 C 2
7 A 3
8 B 3
9 C 3
10 A 4
11 B 4
12 C 4
13 A 5
14 B 5
15 C 5
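The result above is ordered row by row (A, B, C, A, B, C, ...), whereas the question asks for all A rows first, then B, then C. A small sketch of one way to get that order is to sort on the new name column afterwards:

```r
library(tidyr)
library(dplyr)

data <- data.frame(
  A = 1:5,
  B = 1:5,
  C = 1:5
)

long <- data %>%
  pivot_longer(c(A, B, C), names_to = "variable", values_to = "value") %>%
  # Group the rows by originating column, as in the requested layout.
  arrange(variable)
```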

Expand dataframe by ID to generate a special column

I have the following dataframe
df<-data.frame("ID"=c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B"),
'A_Frequency'=c(1,2,3,4,5,1,2,3,4,5),
'B_Frequency'=c(1,2,NA,4,6,1,2,5,6,7))
The dataframe appears as follows
ID A_Frequency B_Frequency
1 A 1 1
2 A 2 2
3 A 3 NA
4 A 4 4
5 A 5 6
6 B 1 1
7 B 2 2
8 B 3 5
9 B 4 6
10 B 5 7
I wish to create a new dataframe df2 from df that looks as follows
ID CFreq
1 A 1
2 A 2
3 A 3
4 A 4
5 A 5
6 A 6
7 B 1
8 B 2
9 B 3
10 B 4
11 B 5
12 B 6
13 B 7
The new dataframe has a column CFreq that holds the unique values of A_Frequency and B_Frequency within each ID, ignoring NA values.
I have tried dplyr but am unable to get the required result:
df2<-df%>%group_by(ID)%>%select(ID, A_Frequency,B_Frequency)%>%
mutate(Cfreq=unique(A_Frequency, B_Frequency))
This yields the following which is quite different
ID A_Frequency B_Frequency Cfreq
<fct> <dbl> <dbl> <dbl>
1 A 1 1 1
2 A 2 2 2
3 A 3 NA 3
4 A 4 4 4
5 A 5 6 5
6 B 1 1 1
7 B 2 2 2
8 B 3 5 3
9 B 4 6 4
10 B 5 7 5
Could someone help me here?
The gather function from the tidyr package will be helpful here:
library(tidyverse)
df %>%
gather(x, CFreq, -ID) %>%
select(-x) %>%
na.omit() %>%
unique() %>%
arrange(ID, CFreq)
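In the question's attempt, the second positional argument of `unique()` is `incomparables`, so B_Frequency is never merged into the result. For readers on a tidyr version where `gather()` is superseded, the same stack, drop NA, de-duplicate steps might be sketched with `pivot_longer()`; this is a variant of the pipeline above, not a different method.

```r
library(dplyr)
library(tidyr)

df <- data.frame(ID = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B"),
                 A_Frequency = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
                 B_Frequency = c(1, 2, NA, 4, 6, 1, 2, 5, 6, 7))

df2 <- df %>%
  # Stack both frequency columns into CFreq and drop NA values on the way.
  pivot_longer(-ID, values_to = "CFreq", values_drop_na = TRUE) %>%
  # Keep one row per ID/value combination.
  distinct(ID, CFreq) %>%
  arrange(ID, CFreq)
```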
A different tidyverse possibility could be:
df %>%
nest(A_Frequency, B_Frequency, .key = C_Frequency) %>%
mutate(C_Frequency = map(C_Frequency, function(x) unique(x[!is.na(x)]))) %>%
unnest()
ID C_Frequency
1 A 1
2 A 2
3 A 3
4 A 4
5 A 5
9 A 6
10 B 1
11 B 2
12 B 3
13 B 4
14 B 5
18 B 6
19 B 7
A base R approach would be to split the dataframe by ID and, for each piece, count the number of unique entries and create a sequence of that length.
do.call(rbind, lapply(split(df, df$ID), function(x) data.frame(ID = x$ID[1] ,
CFreq = seq_len(length(unique(na.omit(unlist(x[-1]))))))))
# ID CFreq
#A.1 A 1
#A.2 A 2
#A.3 A 3
#A.4 A 4
#A.5 A 5
#A.6 A 6
#B.1 B 1
#B.2 B 2
#B.3 B 3
#B.4 B 4
#B.5 B 5
#B.6 B 6
#B.7 B 7
This will also work when A_Frequency and B_Frequency contain characters or arbitrary numbers instead of sequential numbers.
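Note that `seq_len()` returns a running index 1..n for each ID, which coincides with the data only because the example values happen to be consecutive. If the distinct values themselves are wanted, a variant along these lines may be closer; the small data frame is made up for illustration.

```r
# Hypothetical data with non-sequential values and an NA.
df <- data.frame(ID = c("A", "A", "A", "B", "B"),
                 A_Frequency = c(10, 20, 20, 5, 5),
                 B_Frequency = c(20, NA, 30, 7, 5))

out <- do.call(rbind, lapply(split(df, df$ID), function(x) {
  # Pool all frequency columns, de-duplicate, and sort.
  # sort() drops NA values by default.
  vals <- sort(unique(unlist(x[-1], use.names = FALSE)))
  data.frame(ID = x$ID[1], CFreq = vals)
}))
rownames(out) <- NULL
```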
In tidyverse we can do
library(tidyverse)
df %>%
group_split(ID) %>%
map_dfr(~ data.frame(ID = .$ID[1],
CFreq= seq_len(length(unique(na.omit(flatten_chr(.[-1])))))))
A data.table option
library(data.table)
cols <- c('A_Frequency', 'B_Frequency')
out <- setDT(df)[, .(CFreq = sort(unique(unlist(.SD)))),
.SDcols = cols,
by = ID]
out
# ID CFreq
# 1: A 1
# 2: A 2
# 3: A 3
# 4: A 4
# 5: A 5
# 6: A 6
# 7: B 1
# 8: B 2
# 9: B 3
#10: B 4
#11: B 5
#12: B 6
#13: B 7

Gathering specific pairs of columns into rows by dplyr in R [duplicate]

This question already has answers here:
Reshaping multiple sets of measurement columns (wide format) into single columns (long format)
(8 answers)
Closed 4 years ago.
I am trying to convert a data frame from wide to long format by gathering specific pairs of columns, as in the example below.
Example data frame:
df <- data.frame(id=c(1,2,3,4,5), var=c("a","d","g","f","i"),a1=c(3,5,1,2,2), b1=c(2,4,1,2,3), a2=c(8,1,2,5,1), b2=c(1,6,4,7,2), a3=c(7,7,2,3,1), b3=c(1,1,4,9,6))
Initial table:
id var a1 b1 a2 b2 a3 b3
1 1 a 3 2 8 1 7 1
2 2 d 5 4 1 6 7 1
3 3 g 1 1 2 4 2 4
4 4 f 2 2 5 7 3 9
5 5 i 2 3 1 2 1 6
Desired result:
id var a b
1 1 a 3 2
2 1 a 8 1
3 1 a 7 1
4 2 d 5 4
5 2 d 1 6
6 2 d 7 1
7 3 g 1 1
8 3 g 2 4
9 3 g 2 4
10 4 f 2 2
11 4 f 5 7
12 4 f 3 9
13 5 i 2 3
14 5 i 1 2
15 5 i 1 6
Conditions:
Each pair ai and bi should be gathered together: as there are 3 pairs of a and b ("a1 and b1", "a2 and b2", and "a3 and b3"), the values in those pairs should be moved into a single pair of columns "a" and "b" by replicating each record three times.
The first and second fields (the id of each sample and its common variable) should be kept in each replicated row.
I thought this might be possible with gather() in the tidyverse; however, as far as I understand, gather may not be suitable for gathering such specific pairs of fields into multiple specific columns (two columns in this case).
It is possible to prepare three data frames separately and bind them into one (example script below); however, I would prefer to do it in one continuous pipe in the tidyverse, without interrupting the manipulation.
df1 <- df %>% dplyr::select(id,var,a1,b1)
df2 <- df %>% dplyr::select(id,var,a2,b2)
df3 <- df %>% dplyr::select(id,var,a3,b3)
df.fin <- bind_rows(df1,df2,df3)
I would appreciate your elegant suggestions using tidyverse.
=================Additional Questions==================
@Akrun & Camille
Thank you for your suggestions, and sorry for my late reply. I am now trying to apply your idea to my actual data frame but am still struggling with another issue.
The following are the column names in the actual data frame (sorry, I do not set any values for the columns, as they should not matter).
colnames(df) <- c("hid","mid","rel","age","gen","mlic","vlic",
"wtaz","staz","ocp","ocpot","emp","empot","expm",
"minc","otaz1","op1","dtime1","atime1","dp1","dtaz1",
"pur1", "repm1","lg1t1","lg2t1","lg3t1","lg4t1","expt1",
"otaz2","op2","dtime2","atime2","dp2","dtaz2","pur2",
"repm2","lg1t2","lg2t2","lg3t2","lg4t2","expt2",
"otaz3","op3","dtime3","atime3","dp3","dtaz3","pur3",
"repm3","lg1t3","lg2t3","lg3t3","lg4t3","expt3",
"otaz4","op4","dtime4","atime4","dp4","dtaz4","pur4",
"repm4","lg1t4","lg2t4","lg3t4","lg4t4","expt4",
"otaz5","op5","dtime5","atime5","dp5","dtaz5","pur5",
"repm5","lg1t5","lg2t5","lg3t5","lg4t5","expt5"
)
Then, I am trying to apply your suggestions as below:
In the data frame, columns 1:15 are common variables and the others are repeated variables with 5 repetitions (1 to 5, located at the end of each variable name). I could run the following script but still have a problem:
#### Convert member table into activity table
## Common variables
hm.com <- names(hm)[c(1:15)]
## Repeating variables
hm.rep <- names(hm)[c(-1:-15)]
hm.rename <- unique(sub("\\d+$","",hm.rep))
## Extract members with trips
hm.trip <- hm %>% filter(otaz!=0) %>% data.frame()
## Convert from member into trip table
test <- split(hm.rep, sub(".*[^1-9$]", "", hm.rep)) %>%
map_df(~ hm.trip %>% dplyr::select(hm.com, .x)) %>%
rename_at(16:28, ~ hm.rename) %>%
arrange(hid,mid,dtime,atime) %>%
data.frame()
The result still has an issue:
I could rename the first set of repeated variables; however, the fields for sets 2 to 5 still remain, and the records are not stored appropriately in the data frame.
I mean that a set of repeated variables, for instance otaz2 to expt2, is not stored in the second block of rows under otaz~expt but stays in its original position (otaz2 to expt2). I suppose map_df is not working correctly in my case.
========== Problem Solved ==========
The above script contained an incorrect manipulation:
Wrong:
map_df(~ hm.trip %>% dplyr::select(hm.com, .x)) %>%
rename_at(16:28, ~ hm.rename)
Correct:
map_df(~ hm.trip %>% dplyr::select(hm.com, .x) %>%
rename_at(16:28, ~ hm.rename))
Thank you, I could go to the next step.
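The difference between the two versions can be seen on a tiny example. Row-binding matches columns by name, so if the sets still carry their original names when they are bound, each set stays in its own columns and the gaps are filled with NA; renaming inside the mapped function gives every piece the same names before binding. The data frame and the names `wrong` and `right` below are made up for illustration.

```r
library(dplyr)
library(purrr)

df <- data.frame(id = 1:2,
                 a1 = c(3, 5), b1 = c(2, 4),
                 a2 = c(8, 1), b2 = c(1, 6))
sets <- list(c("a1", "b1"), c("a2", "b2"))

# Binding first: columns a1, b1, a2, b2 all survive, padded with NA.
wrong <- bind_rows(map(sets, function(cols) {
  select(df, id, all_of(cols))
}))

# Renaming each piece before binding: the sets stack into a and b.
right <- bind_rows(map(sets, function(cols) {
  select(df, id, all_of(cols)) %>%
    setNames(c("id", "a", "b"))
}))
```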
We could do this with melt from data.table, which can take multiple patterns in the measure argument to reshape into 'long' format. In this case we use column names that start (^) with "a" followed by numbers as one pattern and those that start with "b" followed by numbers as the other:
library(data.table)
melt(setDT(df), measure = patterns("^a\\d+", "^b\\d+"),
value.name = c("a", "b"))[order(id)][, variable := NULL][]
# id var a b
# 1: 1 a 3 2
# 2: 1 a 8 1
# 3: 1 a 7 1
# 4: 2 d 5 4
# 5: 2 d 1 6
# 6: 2 d 7 1
# 7: 3 g 1 1
# 8: 3 g 2 4
# 9: 3 g 2 4
#10: 4 f 2 2
#11: 4 f 5 7
#12: 4 f 3 9
#13: 5 i 2 3
#14: 5 i 1 2
#15: 5 i 1 6
Or, using tidyverse, we gather the columns of interest into 'long' format (but be cautious when dealing with groups of columns that have different classes, where melt is more useful), then separate the 'key' column into two, and spread to 'wide' format:
library(tidyverse)
df %>%
gather(key, val, a1:b3) %>%
separate(key, into = c("key1", "key2"), sep=1) %>%
spread(key1, val) %>%
select(-key2)
# id var a b
#1 1 a 3 2
#2 1 a 8 1
#3 1 a 7 1
#4 2 d 5 4
#5 2 d 1 6
#6 2 d 7 1
#7 3 g 1 1
#8 3 g 2 4
#9 3 g 2 4
#10 4 f 2 2
#11 4 f 5 7
#12 4 f 3 9
#13 5 i 2 3
#14 5 i 1 2
#15 5 i 1 6
This isn't very scalable, so if you end up needing more than these 3 pairs of columns, go with @akrun's answer. I just wanted to point out that the bind_rows snippet you included could, in fact, be done in one pipe:
library(tidyverse)
bind_rows(
df %>% select(id, var, a = a1, b = b1),
df %>% select(id, var, a = a2, b = b2),
df %>% select(id, var, a = a3, b = b3)
) %>%
arrange(id, var)
#> id var a b
#> 1 1 a 3 2
#> 2 1 a 8 1
#> 3 1 a 7 1
#> 4 2 d 5 4
#> 5 2 d 1 6
#> 6 2 d 7 1
#> 7 3 g 1 1
#> 8 3 g 2 4
#> 9 3 g 2 4
#> 10 4 f 2 2
#> 11 4 f 5 7
#> 12 4 f 3 9
#> 13 5 i 2 3
#> 14 5 i 1 2
#> 15 5 i 1 6
Created on 2018-05-07 by the reprex package (v0.2.0).
If you want something that scales and you like map_* functions (from purrr in the tidyverse), you can abstract the above pipeline:
1:3 %>%
map_df(~select(df, id, var, ends_with(as.character(.))) %>%
setNames(c("id", "var", "a", "b"))) %>%
arrange(id, var)
where 1:3 just represents the numbers of the pairs you have.
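If the number of pairs is not known in advance, the set numbers can be read off the column names instead of being hard-coded. The sketch below assumes every repeated column is named as lowercase letters followed by digits; it also matches on the full suffix, since `ends_with("1")` would pick up a column such as a11 as well once there are more than ten sets. The extra `set` column is my own addition.

```r
library(dplyr)
library(purrr)

df <- data.frame(id = c(1, 2, 3, 4, 5), var = c("a", "d", "g", "f", "i"),
                 a1 = c(3, 5, 1, 2, 2), b1 = c(2, 4, 1, 2, 3),
                 a2 = c(8, 1, 2, 5, 1), b2 = c(1, 6, 4, 7, 2),
                 a3 = c(7, 7, 2, 3, 1), b3 = c(1, 1, 4, 9, 6))

# Columns that look like <letters><digits>, e.g. "a1", "b3".
rep_cols <- grep("^[a-z]+\\d+$", names(df), value = TRUE)
# Strip the letters to get the distinct set numbers: 1, 2, 3.
set_ids <- sort(unique(as.integer(sub("^[a-z]+", "", rep_cols))))

long <- map(set_ids, function(i) {
  df %>%
    select(id, var, matches(paste0("^[a-z]+", i, "$"))) %>%
    setNames(c("id", "var", "a", "b")) %>%
    mutate(set = i)
}) %>%
  bind_rows() %>%
  arrange(id, set)
```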
A base R solution:
res <- do.call(rbind,lapply(1:3,function(x) setNames(df[c(1:2,2*x+(1:2))],names(df)[1:4])))
res[order(res$id),]
# id var a1 b1
# 1 1 a 3 2
# 6 1 a 8 1
# 11 1 a 7 1
# 2 2 d 5 4
# 7 2 d 1 6
# 12 2 d 7 1
# 3 3 g 1 1
# 8 3 g 2 4
# 13 3 g 2 4
# 4 4 f 2 2
# 9 4 f 5 7
# 14 4 f 3 9
# 5 5 i 2 3
# 10 5 i 1 2
# 15 5 i 1 6

Compare variable names with String to compute new variable

I want to compare variable names in R with string data in an id variable.
Table 1 shows what my sample table currently looks like, and Table 2 shows what I want to achieve. In Table 2, the value from the column whose name matches id is copied into a new variable "newid". The other values are also copied into new variables for easier calculation.
id a b c newid
c 1 2 3
a 4 2 1
b 5 8 9
id a b c newid rest1 rest2
c 1 2 3 3 1 2
a 4 2 1 4 2 1
b 5 8 9 8 5 9
Thank you in advance
An approach via tidyverse,
library(tidyverse)
df %>%
gather(var, val, -id) %>%
filter(id == var) %>%
left_join(df, ., by = 'id') %>%
select(-var)
which gives,
id a b c val
1 c 1 2 3 3
2 a 4 2 1 4
3 b 5 8 9 8
Using match and subsetting with an index matrix:
DF <- read.table(text = "id a b c
c 1 2 3
a 4 2 1
b 5 8 9 ", header = TRUE)
DF$newid <- DF[, -1][cbind(seq_len(nrow(DF)),
match(DF$id, colnames(DF)[-1]))]
# id a b c newid
#1 c 1 2 3 3
#2 a 4 2 1 4
#3 b 5 8 9 8
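The question also asks for the remaining values in rest1 and rest2. Building on the same index-matrix idea, one possible sketch drops the matched column row by row; it assumes every id names exactly one of the value columns, so each row has the same number of leftover values.

```r
DF <- read.table(text = "id a b c
c 1 2 3
a 4 2 1
b 5 8 9", header = TRUE, stringsAsFactors = FALSE)

vals <- as.matrix(DF[, -1])            # numeric matrix of columns a, b, c
hit <- match(DF$id, colnames(vals))    # column position named by id, per row
rows <- seq_len(nrow(DF))

# Value in the column that matches id.
DF$newid <- vals[cbind(rows, hit)]

# Remaining values of each row, in their original column order.
rest <- t(sapply(rows, function(i) vals[i, -hit[i]]))
DF$rest1 <- rest[, 1]
DF$rest2 <- rest[, 2]
```

On the sample data this reproduces Table 2 from the question.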
By using melt from the reshape package (dt is the data frame from the question):
temp=with(melt(dt,'id'),melt(dt,'id')[id==variable,])
dt$New=temp$value[match(dt$id,temp$variable)]
dt
id a b c New
1 c 1 2 3 3
2 a 4 2 1 4
3 b 5 8 9 8
