I like the bind_rows function in dplyr but I find it annoying that when passing the .id argument it can only add a numeric index in the new column.
I'm trying to write a bind_rows_named function, but I'm getting stuck accessing the object names. This works as expected:
bind_name_to_df <- function(df){
dfname <- deparse(substitute(df))
df %>% mutate(label=dfname)
}
a <- data_frame(stuff=1:10)
bind_name_to_df(a)
But I can't work out how to apply this to a list of data frames, e.g. using .dots. I want this to work, but I know I have the semantics for the ... wrong somehow. Can anyone shed light?
b <- data_frame(stuff=1:10)
bind_rows_named <- function(...){
return(
bind_rows(lapply(..., bind_name_to_df)))
}
bind_rows_named(a, b)
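A minimal base-R sketch of why the name gets lost (the helpers show_name and capture_names are hypothetical). When lapply calls a function, it passes an index expression such as X[[i]] rather than the original symbol, so deparse(substitute(df)) no longer sees a. Separately, lapply(..., f) expands ... into positional arguments, so with two data frames the second one lands in lapply's FUN slot. Capturing the unevaluated ... up front keeps the names:

```r
# Hypothetical helper: returns the deparsed expression passed as `df`
show_name <- function(df) deparse(substitute(df))

a <- data.frame(stuff = 1:3)
b <- data.frame(stuff = 4:6)

direct <- show_name(a)                         # called directly: sees the symbol `a`
via_lapply <- lapply(list(a), show_name)[[1]]  # called by lapply: sees an index expression
print(direct)
print(via_lapply)

# Hypothetical helper: deparse each unevaluated argument in `...`
capture_names <- function(...) {
  vapply(as.list(substitute(list(...)))[-1], deparse, character(1))
}
print(capture_names(a, b))
```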
Here is an option using base R
bind_named <- function(...){
v1 <- sapply(match.call()[-1], deparse)
dfs <- list(...)
Map(cbind, dfs, label = v1)
}
bind_named(a, b)
#[[1]]
# stuff label
#1 1 a
#2 2 a
#3 3 a
#4 4 a
#5 5 a
#6 6 a
#7 7 a
#8 8 a
#9 9 a
#10 10 a
#[[2]]
# stuff label
#1 1 b
#2 2 b
#3 3 b
#4 4 b
#5 5 b
#6 6 b
#7 7 b
#8 8 b
#9 9 b
#10 10 b
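If a single data frame is wanted rather than a list, the same idea can be finished with do.call(rbind, ...). This is a sketch only: it captures the names with substitute(list(...)) instead of match.call(), and the function name bind_named_df is hypothetical.

```r
# Bind data frames passed via ... into one data frame, labelling each row
# with the name of the object it came from (base-R sketch)
bind_named_df <- function(...) {
  # Deparse the unevaluated arguments, e.g. c("a", "b")
  labels <- vapply(as.list(substitute(list(...)))[-1], deparse, character(1))
  dfs <- list(...)
  # Add the label column to each data frame, then stack them
  labelled <- Map(function(d, lab) { d$label <- lab; d }, dfs, labels)
  out <- do.call(rbind, labelled)
  rownames(out) <- NULL
  out
}

a <- data.frame(stuff = 1:10)
b <- data.frame(stuff = 1:10)
res <- bind_named_df(a, b)
head(res, 3)
```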
Or using tidyverse
library(tidyverse)
bind_named <- function(...) {
nm1 <- quos(...) %>%
map(quo_name)
dfs <- list(...)
dfs %>%
map2(nm1, ~mutate(., label = .y))
}
res <- bind_named(a, b)
res %>%
map(head, 2)
#[[1]]
# stuff label
#1 1 a
#2 2 a
#[[2]]
# stuff label
#1 1 b
#2 2 b
It can also be made into a single chain:
bind_named <- function(...) {
quos(...) %>%
map(quo_name) %>%
map2_df(list(...), ., ~mutate(.data = .x, label = .y))
}
bind_named(a, b)
# A tibble: 20 x 2
# stuff label
# <int> <chr>
# 1 1 a
# 2 2 a
# 3 3 a
# 4 4 a
# 5 5 a
# 6 6 a
# 7 7 a
# 8 8 a
# 9 9 a
#10 10 a
#11 1 b
#12 2 b
#13 3 b
#14 4 b
#15 5 b
#16 6 b
#17 7 b
#18 8 b
#19 9 b
#20 10 b
NOTE: Initially, we thought the OP wanted to add the column to each dataset separately and get a list as output. Upon clarification, map2 was changed to map2_df, which returns a single dataset.
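For reference, dplyr's bind_rows() only falls back to a numeric index when its inputs are unnamed: for a named list, bind_rows(list(a = a, b = b), .id = "label") fills the id column with the list names. The base-R sketch below reproduces that behaviour for a named list; bind_by_list_names is a hypothetical helper, not part of any package.

```r
# Stack a named list of data frames, recording each element's name in an id column
bind_by_list_names <- function(dfs, id = "label") {
  stopifnot(!is.null(names(dfs)))
  out <- do.call(rbind, unname(dfs))  # unname() avoids "a.1"-style row names
  # Repeat each list name once per row of its data frame
  out[[id]] <- rep(names(dfs), vapply(dfs, nrow, integer(1)))
  out
}

a <- data.frame(stuff = 1:10)
b <- data.frame(stuff = 1:10)
res <- bind_by_list_names(list(a = a, b = b))
head(res, 3)
```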
data <- tibble(time = c(1,1,2,2), a = c(1,2,3,4), b =c(4,3,2,1), c = c(1,1,1,1))
The result will look like this
result <- tibble(
t = c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2),
firm1 = c("a","a","a","b","b","b","c","c","c","a","a","a","b","b","b","c","c","c"),
firm2 = c("a","b","c","a","b","c","a","b","c","a","b","c","a","b","c","a","b","c"),
value = c(6,10,5,10,14,9,5,9,4,14,10,9,10,6,5,9,5,4))
result
The function could be, for example:
fun1 <- function(x, y) sum(x, y)
Basically I am looking for a tidy solution to expand.grid data at each point of time and apply functions across columns. Can anyone help?
I tried the following, but I could not get the time column in front of the pairs:
expected_result<-expand.grid(names(data[-1]), names(data[-1])) %>%
mutate(value = map2(Var1, Var2, ~ fun1(data[.x], data[.y])))
expected_result
Use expand.grid to get all possible combinations of the columns (tmp), split the data by time, and apply fun to each row of tmp within every time group.
library(dplyr)
library(purrr)
tmp <- expand.grid(firm1 = names(data[-1]), firm2 = names(data[-1]))
fun <- function(x, y) sum(x, y)
result <- data %>%
group_split(time) %>%
map_df(~cbind(time = .x$time[1], tmp,
value = apply(tmp, 1, function(x) fun(.x[[x[1]]], .x[[x[2]]]))))
result
# time firm1 firm2 value
#1 1 a a 6
#2 1 b a 10
#3 1 c a 5
#4 1 a b 10
#5 1 b b 14
#6 1 c b 9
#7 1 a c 5
#8 1 b c 9
#9 1 c c 4
#10 2 a a 14
#11 2 b a 10
#12 2 c a 9
#13 2 a b 10
#14 2 b b 6
#15 2 c b 5
#16 2 a c 9
#17 2 b c 5
#18 2 c c 4
You may also do this in base R -
result <- do.call(rbind, by(data, data$time, function(x) {
cbind(time = x$time[1], tmp,
value = apply(tmp, 1, function(y) fun(x[[y[1]]], x[[y[2]]])))
}))
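To make the per-time step concrete, here is a base-R sketch for a single time slice (the names slice and pair_grid are hypothetical). It builds the pairs with expand.grid(..., stringsAsFactors = FALSE) so the column names stay character, and uses mapply over the two name columns instead of apply, which would first convert the grid to a character matrix.

```r
fun <- function(x, y) sum(x, y)

# Rows of `data` for time == 1 (values taken from the question)
slice <- data.frame(time = c(1, 1), a = c(1, 2), b = c(4, 3), c = c(1, 1))

cols <- names(slice)[-1]
# All ordered pairs of columns; firm1 varies fastest
pair_grid <- expand.grid(firm1 = cols, firm2 = cols, stringsAsFactors = FALSE)

# Apply fun to each pair of columns of the slice
pair_grid$value <- mapply(function(f1, f2) fun(slice[[f1]], slice[[f2]]),
                          pair_grid$firm1, pair_grid$firm2, USE.NAMES = FALSE)

# Put the time point in front of the pairs
pair_grid <- cbind(time = slice$time[1], pair_grid)
pair_grid
```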
We may use
library(dplyr)
library(tidyr)
library(purrr)
data1 <- data %>%
group_by(time) %>%
summarise(across(everything(), ~ sum(.x, na.rm = TRUE)), .groups = 'drop') %>%
pivot_longer(cols = -time) %>%
group_split(time)
map_dfr(data1, ~ {dat <- .x
crossing(firm1 = dat$name, firm2 = dat$name) %>%
mutate(value = c(outer(dat$value, dat$value, FUN = `+`))) %>%
mutate(time = first(dat$time), .before = 1)})
-output
# A tibble: 18 × 4
time firm1 firm2 value
<dbl> <chr> <chr> <dbl>
1 1 a a 6
2 1 a b 10
3 1 a c 5
4 1 b a 10
5 1 b b 14
6 1 b c 9
7 1 c a 5
8 1 c b 9
9 1 c c 4
10 2 a a 14
11 2 a b 10
12 2 a c 9
13 2 b a 10
14 2 b b 6
15 2 b c 5
16 2 c a 9
17 2 c b 5
18 2 c c 4
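The outer() trick above works because, for this particular fun, sum(x, y) equals sum(x) + sum(y), so every pair value is just the sum of two per-time column totals. A base-R sketch of the same idea, using rowsum() for the totals; it would not carry over to a fun that does not decompose this way.

```r
data <- data.frame(time = c(1, 1, 2, 2), a = c(1, 2, 3, 4),
                   b = c(4, 3, 2, 1), c = c(1, 1, 1, 1))

# Column totals per time point: a matrix with one row per time value
totals <- rowsum(as.matrix(data[-1]), data$time)

result <- do.call(rbind, lapply(rownames(totals), function(tm) {
  v <- totals[tm, ]  # named vector of column totals for this time
  grid <- expand.grid(firm1 = names(v), firm2 = names(v),
                      stringsAsFactors = FALSE)
  # outer()[i, j] = v[i] + v[j]; as.vector() reads it column by column,
  # which matches expand.grid's order (firm1 varies fastest)
  grid$value <- as.vector(outer(v, v, `+`))
  cbind(time = as.numeric(tm), grid)
}))
result
```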
I have a single row data frame like this:
X1 X2 X3
1 [['1','2','3'], ['4','6','5'], ['7','8']] ['9','10','11','12','13']
I would like to create a new data frame from columns X2 and X3 that looks like this:
ID Group
1 A
2 A
3 A
4 B
5 B
6 B
7 C
8 C
9 D
10 D
11 D
12 D
13 D
So each number is assigned to a group according to the square brackets that enclose it in the original data frame.
Can anyone recommend a good way of doing this in R?
One option would be to split 'X2' at each comma that follows a closing bracket, concatenate the pieces with 'X3', extract the numbers from each piece with str_extract_all into a list, and stack that list into a two-column data.frame:
library(stringr)
v1 <- c(strsplit(df1$X2, "\\],\\s*")[[1]], df1$X3)
out <- stack(setNames(str_extract_all(v1, "\\d+"), LETTERS[1:4]))
names(out) <- c("ID", "Group")
out
# ID Group
#1 1 A
#2 2 A
#3 3 A
#4 4 B
#5 6 B
#6 5 B
#7 7 C
#8 8 C
#9 9 D
#10 10 D
#11 11 D
#12 12 D
#13 13 D
Or using tidyverse
library(dplyr)
library(tidyr)
library(stringr) # for str_extract_all
df1 %>%
pivot_longer(cols = -X1) %>%
separate_rows(value, sep="(?<=\\]),\\s*") %>%
transmute(Group = LETTERS[row_number()], ID = value) %>%
mutate(ID = str_extract_all(ID, "\\d+")) %>%
unnest(c(ID))
# A tibble: 13 x 2
# Group ID
# <chr> <chr>
# 1 A 1
# 2 A 2
# 3 A 3
# 4 B 4
# 5 B 6
# 6 B 5
# 7 C 7
# 8 C 8
# 9 D 9
#10 D 10
#11 D 11
#12 D 12
#13 D 13
data
df1 <- structure(list(X1 = 1L, X2 = "[['1','2','3'], ['4','6','5'], ['7','8']]",
X3 = "['9','10','11','12','13']"), class = "data.frame", row.names = c(NA,
-1L))
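If stringr isn't available, the same split, extract and stack idea can be sketched with base R's regmatches() and gregexpr(). This version uses perl = TRUE for the split and names the groups with LETTERS[seq_along(...)], so it is not tied to exactly four groups (it still assumes no more than 26).

```r
df1 <- data.frame(X1 = 1L,
                  X2 = "[['1','2','3'], ['4','6','5'], ['7','8']]",
                  X3 = "['9','10','11','12','13']",
                  stringsAsFactors = FALSE)

# One string per bracketed group: split X2 at "]," and append X3
groups <- c(strsplit(df1$X2, "\\],\\s*", perl = TRUE)[[1]], df1$X3)

# Pull the numbers out of each group (a list of character vectors)
ids <- regmatches(groups, gregexpr("[0-9]+", groups))

# Label the groups A, B, C, ... and stack into two columns
out <- stack(setNames(ids, LETTERS[seq_along(ids)]))
names(out) <- c("ID", "Group")
out
```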
I wrote the code below. It should assign the value 1 or 2 to vector v2 whenever an element of vector v1 occurs twice: e.g. "A" appears twice in v1, so in the corresponding rows v2 should read 1 in one and 2 in the other.
The code mostly works, except that in some cases the same number is assigned to both rows of an element that occurs twice in v1, which should obviously not happen.
Can anybody help me with the issue? Thanks!
v1 <- c(rep(c("A","B","C","D","E","F","G"),rep(2,7)),c("H","I","J","K"))
v2 <- rep(3,length(v1))
df1 <- data.frame(v1,v2)
for (i in 1:length(df1$v1)) {
if (sum(df1$v1[i]==df1$v1)==2 & df1$v2[i]==3) {
df1$v2[i] <- sample(c(1,2),1,replace=TRUE)
} else if (sum(df1$v1[i]==df1$v1)==2 & df1$v2[i]==1) {
df1$v2[i] <- 2
} else if (sum(df1$v1[i]==df1$v1)==2 & df1$v2[i]==2) {
df1$v2[i] <- 1
} else {
df1$v2[i] <- 2
}
}
If I have understood your requirement, the dplyr code below should do what you want. Within each letter it randomly assigns the integers 1 to n without replacement, where n is the number of occurrences of that letter (so it generalizes beyond your case of 2 occurrences).
library(dplyr)
df1 <- data.frame(v1 = c(rep(c("A","B","C","D","E","F","G"),rep(2,7)),c("H","I","J","K")))
df1 <- df1 %>%
group_by(v1) %>%
mutate(v2 = case_when(n() > 1 ~ sample(c(1:n()), n(), replace = FALSE),
TRUE ~ 1L))
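A base-R sketch of the same idea uses ave() to run sample() within each group of v1: for a group of size n, sample(n) returns a random permutation of 1 to n, so letters that occur once always get 1.

```r
v1 <- c(rep(c("A","B","C","D","E","F","G"), rep(2, 7)), c("H","I","J","K"))

set.seed(42)  # only for reproducibility
# For each group of v1, replace the row positions with a random permutation of 1..n
v2 <- ave(seq_along(v1), v1, FUN = function(i) sample(length(i)))
df1 <- data.frame(v1, v2)
df1
```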
v1 <- c(rep(c("A","B","C","D","E","F","G"),rep(2,7)),c("H","I","J","K"))
value = 1:length(v1)
v2 <- rep(3,length(v1))
df1 <- data.frame(v1,value,v2)
library(dplyr)
set.seed(9)
df1 %>%
sample_frac(1) %>% # shuffle rows
group_by(v1) %>% # for each v1 value
mutate(v2 = row_number()) %>% # count and flag occurences
ungroup() %>% # forget the grouping
arrange(v1) # order by v1 (only for visualisation purposes)
# # A tibble: 18 x 3
# v1 value v2
# <fct> <int> <int>
# 1 A 1 1
# 2 A 2 2
# 3 B 4 1
# 4 B 3 2
# 5 C 5 1
# 6 C 6 2
# 7 D 7 1
# 8 D 8 2
# 9 E 9 1
#10 E 10 2
#11 F 12 1
#12 F 11 2
#13 G 14 1
#14 G 13 2
#15 H 15 1
#16 I 16 1
#17 J 17 1
#18 K 18 1
Using base R, you can get what you want fairly easily by combining table and sequence and then manipulating the output.
Edit: After your comments, I now think I understand what you want.
res <- data.frame(v1, v2 = sequence(table(v1)), row.names = NULL)
res <- res[sample(1:nrow(res)), ] # Scramble data order
res <- res[order(res$v1), ] # Reorder by v1 column
# v1 v2
#1 A 1
#2 A 2
#3 B 1
#4 B 2
#5 C 1
#6 C 2
#7 D 2 # note 2 comes first here
#8 D 1
#9 E 1
#10 E 2
#11 F 1
#12 F 2
#13 G 1
#14 G 2
#15 H 1
#16 I 1
#17 J 1
#18 K 1
Edit2 "randomly" sorting before assigning:
df1 <- data.frame(v1)
df1[order(rank(v1, ties.method = "random")), "v2"] <- sequence(table(v1))
df1
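One thing to watch with sequence(table(v1)): table() lists the groups in sorted order, so the counter lines up with v1 only when v1 is already sorted, as it is in the question. For an unsorted vector, the counter can be written back through order(), which is what the Edit2 line does with a random tie-break. A deterministic sketch on a small, made-up vector:

```r
v1 <- c("B", "A", "B", "C", "A")  # made-up, unsorted example

sequence(table(v1))  # counters 1, 2, 1, 2, 1 for groups A, A, B, B, C

v2 <- integer(length(v1))
# order(v1) lists positions group by group, matching the layout of sequence(table(v1))
v2[order(v1)] <- sequence(table(v1))
data.frame(v1, v2)
```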
I have a list of data frames, and I want to add a column to each data frame that concatenates the row number with another variable.
I have managed to do this with a for loop, but it takes a lot of time on a large dataset. Is there a way to avoid the for loop?
my_data_vcf <-lapply(my_vcf_files,read.table, stringsAsFactors = FALSE)
for (i in 1:length(my_data_vcf)) {
for (j in 1:length(my_data_vcf[[i]])) {
my_data_vcf[[i]] <- cbind(my_data_vcf[[i]], "Id" = paste(c(variable, j), collapse = "_"))
}
}
You can use lapply; since you don't provide a minimal sample dataset, I'm generating some sample data.
# Sample list of data.frame's
lst <- list(
data.frame(one = letters[1:10], two = 1:10),
data.frame(one = letters[11:20], two = 11:20))
# Concatenate row number with entries in second column
lapply(lst, function(x) { x$three <- paste(1:nrow(x), x$two, sep = "_"); x })
#[[1]]
# one two three
#1 a 1 1_1
#2 b 2 2_2
#3 c 3 3_3
#4 d 4 4_4
#5 e 5 5_5
#6 f 6 6_6
#7 g 7 7_7
#8 h 8 8_8
#9 i 9 9_9
#10 j 10 10_10
#
#[[2]]
# one two three
#1 k 11 1_11
#2 l 12 2_12
#3 m 13 3_13
#4 n 14 4_14
#5 o 15 5_15
#6 p 16 6_16
#7 q 17 7_17
#8 r 18 8_18
#9 s 19 9_19
#10 t 20 10_20
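Since the loop in the question builds paste(c(variable, j), collapse = "_") one row at a time, it can be replaced by a single vectorised paste() per data frame. This sketch assumes variable is a per-file label; the list names s1 and s2 are hypothetical stand-ins for it.

```r
lst <- list(s1 = data.frame(one = letters[1:3], two = 1:3),
            s2 = data.frame(one = letters[4:5], two = 4:5))

# For each data frame, paste its label onto every row number in one call
with_id <- Map(function(x, variable) {
  x$Id <- paste(variable, seq_len(nrow(x)), sep = "_")
  x
}, lst, names(lst))

with_id$s2
```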
One way we can do this is to create a nested data frame using enframe from the tibble package. Once we've done that, we can unnest the data and use mutate to concatenate the row number and a column:
library(tidyverse)
# using Maurits Evers' data, with stringsAsFactors = FALSE
lst <- list(
data.frame(one = letters[1:10], two = 1:10, stringsAsFactors = FALSE),
data.frame(one = letters[11:20], two = 11:20, stringsAsFactors = FALSE)
)
lst %>%
enframe() %>%
unnest(value) %>%
group_by(name) %>%
mutate(three = paste(row_number(), two, sep = "_")) %>%
nest()
Returns:
# A tibble: 2 x 2
name data
<int> <list>
1 1 <tibble [10 × 3]>
2 2 <tibble [10 × 3]>
If we unnest the data, we can see that column three is the row number (within each original data frame) concatenated with column two:
lst %>%
enframe() %>%
unnest(value) %>%
group_by(name) %>%
mutate(three = paste(row_number(), two, sep = "_")) %>%
nest() %>%
unnest(data)
Returns:
# A tibble: 20 x 4
name one two three
<int> <chr> <int> <chr>
1 1 a 1 1_1
2 1 b 2 2_2
3 1 c 3 3_3
4 1 d 4 4_4
5 1 e 5 5_5
6 1 f 6 6_6
7 1 g 7 7_7
8 1 h 8 8_8
9 1 i 9 9_9
10 1 j 10 10_10
11 2 k 11 1_11
12 2 l 12 2_12
13 2 m 13 3_13
14 2 n 14 4_14
15 2 o 15 5_15
16 2 p 16 6_16
17 2 q 17 7_17
18 2 r 18 8_18
19 2 s 19 9_19
20 2 t 20 10_20
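For comparison, the same "stack, number rows within each source, split back" flow can be sketched in base R, with ave(..., FUN = seq_along) playing the role of row_number() within groups. This is an illustration on a smaller made-up list, not a drop-in replacement for the nested tibble.

```r
lst <- list(data.frame(one = letters[1:3], two = 1:3, stringsAsFactors = FALSE),
            data.frame(one = letters[4:5], two = 4:5, stringsAsFactors = FALSE))

# Stack the list, keeping the list position in a `name` column (like enframe + unnest)
stacked <- do.call(rbind, Map(function(d, i) cbind(name = i, d), lst, seq_along(lst)))

# Row number within each `name` group, pasted onto column `two`
stacked$three <- paste(ave(stacked$two, stacked$name, FUN = seq_along),
                       stacked$two, sep = "_")

# Split back into a list of data frames (roughly what nest() does)
back <- split(stacked, stacked$name)
stacked
```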
I have some data as follows:
library(tidyr)
library(data.table)
thisdata <- data.frame(numbers = c(1,3,4,5,6,1,2,4,5,6)
,letters = c('A','A','A','A','A','B','B','B','B','B'))
otherdata <- data.frame(numbers = c(1,2,3,4,5,6))
I am looking to split 'thisdata' by the letters column, merge each element of the resulting list with 'otherdata' by the numbers column, and then fill the NAs in letters with the letter belonging to that list element. So:
out <- split(thisdata , f = thisdata$letters )
out2 <- lapply(out, function(x) merge(x,otherdata,by="numbers",all = TRUE))
However, I can't get the fill function from tidyr to work within the lapply:
out3 <- lapply(out2,function(x) fill(x$channel))
Error in UseMethod("fill_") :
no applicable method for 'fill_' applied to an object of class "NULL"
This is the output I'm after, but would rather perform the calculation within the list format:
out4 <- rbindlist(out2)
out5 <- out4 %>%
fill(letters) %>% #default direction down
fill(letters,.direction = "up")
numbers letters
1: 1 A
2: 2 A
3: 3 A
4: 4 A
5: 5 A
6: 6 A
7: 1 B
8: 2 B
9: 3 B
10: 4 B
11: 5 B
12: 6 B
fill expects a data frame as its first argument; try fill(x, letters), or x %>% fill(letters) with the magrittr pipe:
out3 <- lapply(out2,function(x) fill(x, letters))
out3
#$A
# numbers letters
#1 1 A
#2 2 A
#3 3 A
#4 4 A
#5 5 A
#6 6 A
#$B
# numbers letters
#1 1 B
#2 2 B
#3 3 B
#4 4 B
#5 5 B
#6 6 B
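If an NA can sit in the first row of a group (e.g. a number smaller than any in thisdata), a downward fill() alone won't reach it; tidyr's fill() accepts .direction = "downup" for that case. Alternatively, because split() names each element by its letter, the gaps can be filled from that name directly. A base-R sketch of this swapped-in approach:

```r
thisdata <- data.frame(numbers = c(1, 3, 4, 5, 6, 1, 2, 4, 5, 6),
                       letters = c("A","A","A","A","A","B","B","B","B","B"),
                       stringsAsFactors = FALSE)
otherdata <- data.frame(numbers = c(1, 2, 3, 4, 5, 6))

out <- split(thisdata, thisdata$letters)

# Merge each piece with otherdata, then fill missing letters from the piece's name
out3 <- Map(function(x, nm) {
  m <- merge(x, otherdata, by = "numbers", all = TRUE)
  m$letters[is.na(m$letters)] <- nm
  m
}, out, names(out))
out3$B
```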
A simpler method is to use tidyr::complete:
thisdata %>%
complete(numbers = otherdata$numbers, letters) %>%
arrange(letters)
# A tibble: 12 x 2
# numbers letters
# <dbl> <fctr>
# 1 1 A
# 2 2 A
# 3 3 A
# 4 4 A
# 5 5 A
# 6 6 A
# 7 1 B
# 8 2 B
# 9 3 B
#10 4 B
#11 5 B
#12 6 B
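complete() essentially builds every combination of the supplied values and joins the data back on. A rough base-R sketch of that, using expand.grid() and a full merge(); it is an approximation for this two-column case, not a reimplementation of complete().

```r
thisdata <- data.frame(numbers = c(1, 3, 4, 5, 6, 1, 2, 4, 5, 6),
                       letters = c("A","A","A","A","A","B","B","B","B","B"),
                       stringsAsFactors = FALSE)
otherdata <- data.frame(numbers = c(1, 2, 3, 4, 5, 6))

# Every numbers x letters combination
grid <- expand.grid(numbers = otherdata$numbers,
                    letters = unique(thisdata$letters),
                    stringsAsFactors = FALSE)

# Full join on the shared columns keeps existing rows and adds the missing ones
completed <- merge(grid, thisdata, all = TRUE)
completed <- completed[order(completed$letters, completed$numbers), ]
rownames(completed) <- NULL
completed
```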