I have two lists, both of which contain similar datasets corresponding to different years. I wish to merge the datasets in both lists element by element. When I use Map (a wrapper around mapply) with dplyr::full_join, and the key columns have different names so that I need the by argument, R is unable to perform the join:
library(dplyr)
set.seed(100)
first_list <- list(data.frame(x = 1:3, y = rnorm(3)),
data.frame(x = 4:6, y = rnorm(3)))
second_list <- list(data.frame(z = 1:3, w = rnorm(3)),
data.frame(z = 4:6, w = rnorm(3)))
Map(full_join, by = c("x" = "z"), first_list, second_list)
#Error: 'z' column not found in rhs, cannot join
However,
Map(function(x, y) full_join(x, y, by = c("x" = "z")), first_list, second_list)
works. I am curious about this behaviour and would appreciate an explanation.
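Map is a wrapper around mapply, and mapply vectorizes over every argument passed through ..., including by. Each call therefore receives by[[1]], i.e. the bare string "z" with its "x" name dropped, so full_join looks for a column literally named z in both tables and the join fails. Pass by through mapply's MoreArgs argument instead, which is handed unchanged to every call (see ?mapply):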
test1 <- Map(full_join, first_list, second_list, MoreArgs=list(by = c("x" = "z")))
test2 <- Map(function(x, y) full_join(x, y, by = c("x" = "z")), first_list, second_list)
all.equal(test1, test2)
# [1] TRUE
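To see the mechanism directly, here is a small base R check (no dplyr needed). The names by_vec, via_dots and via_more_args and the dummy lists are illustrative only. It relies on mapply extracting the i-th element of each ... argument with [[, which drops names:

```r
# by_vec mimics the by = c("x" = "z") argument from the question
by_vec <- c("x" = "z")

# [[ returns the bare element: "z" with no name attached
stripped <- by_vec[[1]]

# Passed through ..., by is vectorized over, so each call sees the unnamed element
via_dots <- Map(function(d, by) names(by), list(1, 2), by = by_vec)

# Passed through MoreArgs, each call sees the whole named vector unchanged
via_more_args <- Map(function(d, by) names(by), list(1, 2),
                     MoreArgs = list(by = by_vec))
```

Every element of via_dots is NULL, while every element of via_more_args is "x", which is why only the MoreArgs (or anonymous function) version keeps the "x" = "z" mapping.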
Naive question ahead: I would like to remove a column after a map.
Reprex:
library(dplyr)  # tibble() and %>%
library(purrr)  # map()
tibble(a = rep(c("A", "B"), each = 5),
       x = runif(10),
       y = runif(10),
       z = runif(10)) %>%
  split(.$a) %>%
  map(`[`, c("x", "y", "z"))
This selects the x, y, and z columns of each tibble.
What if I want to drop column a instead?
(The result is the same, but that is easier for me.)
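Using base R (continue the pipe after split(.$a) %>%; the anchored pattern '^a$' drops only the column named exactly a)
map(~ .x[grep('^a$', names(.x), invert = TRUE)])
#OR
map(function(x) x[grep('^a$', names(x), invert = TRUE)])
Using dplyr
map(~ select(.x, -a))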
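For reference, a complete version of the dplyr approach, assuming dplyr and purrr are installed (the seed and the object names dat and pieces are illustrative):

```r
library(dplyr)  # tibble(), select(), %>%
library(purrr)  # map()

set.seed(1)
dat <- tibble(a = rep(c("A", "B"), each = 5),
              x = runif(10), y = runif(10), z = runif(10))

# split into one tibble per value of a, then drop the grouping column a from each piece
pieces <- dat %>%
  split(.$a) %>%
  map(~ select(.x, -a))
```

pieces is a list named "A" and "B", each element holding five rows with columns x, y and z.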
Suppose I have the following data frame:
df <- data.frame(x = rnorm(10), y = rnorm(10), z = rnorm(10))
I would like to replace the values of each variable with the column of the same name, taken from the data frame of the same name in the following list:
replace_df <- list(x = data.frame(x = 1:10),
y = data.frame(y = 11:20),
z = data.frame(z = 21:30))
How would I do that using dplyr?
I feel like my issue is related to this Q&A, but I haven't been able to implement the answers to that question correctly to my situation.
I've attempted the below, among others, without success:
library(tidyverse)
variables <- c("x", "y", "z")
df %>%
mutate_at(vars(variables), funs(replace_df[[.]][[.]]))
The "dumb" way would be the following:
df %>%
mutate(x = replace_df[["x"]][["x"]],
y = replace_df[["y"]][["y"]],
z = replace_df[["z"]][["z"]])
You need to use expr()! I am not sure whether subsetting works the way you tried above, but I got the correct output by writing a simple function and passing it an argument wrapped in expr():
df <- data.frame(x = rnorm(10), y = rnorm(10), z = rnorm(10))
replace_df <- list(x = data.frame(x = 1:10),
y = data.frame(y = 11:20),
z = data.frame(z = 21:30))
my_func <- function(string) {
return(
replace_df[[string]][[string]]
)
}
df %>%
mutate_at(vars(x, y, z), funs(my_func(expr(.))))
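Since funs() and mutate_at() have since been superseded, here is a sketch of the same replacement with across(), assuming dplyr 1.0.0 or later (which provides cur_column()). The object name out is illustrative:

```r
library(dplyr)

set.seed(1)
df <- data.frame(x = rnorm(10), y = rnorm(10), z = rnorm(10))
replace_df <- list(x = data.frame(x = 1:10),
                   y = data.frame(y = 11:20),
                   z = data.frame(z = 21:30))
variables <- c("x", "y", "z")

# cur_column() gives the name of the column across() is currently processing,
# so the same name indexes both the list element and the column inside it
out <- df %>%
  mutate(across(all_of(variables),
                ~ replace_df[[cur_column()]][[cur_column()]]))
```

No metaprogramming is needed here because cur_column() already hands the lambda the column name as a plain string.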
This code works, but it requires knowing the data table names ahead of time in order to construct list(x, y, z):
library(data.table)
x <- data.table(i = c("a","b","c"), j = 1:3)
y <- data.table(i = c("b","c","d"), k = 4:6)
z <- data.table(i = c("c","d","a"), l = 7:9)
Reduce(function(...) merge(..., all = TRUE, by = "i"), list(x, y, z))
But I have a script that generates the data tables (the names are constructed dynamically) and creates a character vector as follows:
dtList <- c("x", "y", "z")
I want to use dtList in the Reduce code. I have tried a variety of things; none of these work:
list(dtList)
as.vector(dtList, mode = "list")
Here's the code I came up with following JRR's comments that seems to work for my particular setup. dtNameList is actually read in from somewhere else in my full code but for this example, I just created a dummy version of it.
library(data.table)
dtList <- list()
dtNameList <- c("x", "y", "z")
for (k in seq_along(dtNameList)) {
  # dummy table; in the real script each table is built dynamically
  dt <- data.table(i = c("a","b","c"), j = 1:3)
  dtList[[dtNameList[k]]] <- dt
}
Reduce(function(...) merge(..., all = TRUE, by = "i"), dtList)
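If the tables already exist as separate objects, base R's mget() can turn the character vector of names directly into a named list of those objects. This sketch assumes the tables live in the environment where the code runs (passed explicitly as environment()); the name merged is illustrative:

```r
library(data.table)

x <- data.table(i = c("a","b","c"), j = 1:3)
y <- data.table(i = c("b","c","d"), k = 4:6)
z <- data.table(i = c("c","d","a"), l = 7:9)
dtList <- c("x", "y", "z")

# mget() looks up each name and returns a named list of the objects it finds
merged <- Reduce(function(...) merge(..., all = TRUE, by = "i"),
                 mget(dtList, envir = environment()))
```

merged has one row per key a to d and columns i, j, k and l, with NA where a key is missing from a table.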
I would like to loop through a list of data frames and change the column names (I want every data frame to end up with the same new column names).
Does anyone have a solution using the following data?
df <- data.frame(x = 1:10, y = 2:11, z = 3:12)
df2 <- data.frame(x = 1:10, y = 2:11, z = 3:12)
df3 <- data.frame(x = 1:10, y = 2:11, z = 3:12)
x <- list(df, df2, df3)
Either using a for loop or apply? Would actually love to see both if possible
Thanks,
Ben
Both hrbrmstr's and David Arenburg's answers are perfect.
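Since both a for loop and an apply version were requested, here is a sketch of each. The target names in new_names are hypothetical; substitute whatever names you need:

```r
df  <- data.frame(x = 1:10, y = 2:11, z = 3:12)
df2 <- data.frame(x = 1:10, y = 2:11, z = 3:12)
df3 <- data.frame(x = 1:10, y = 2:11, z = 3:12)
x <- list(df, df2, df3)

new_names <- c("a", "b", "c")  # hypothetical target column names

# for loop: rename each list element in place
x_loop <- x
for (i in seq_along(x_loop)) {
  names(x_loop[[i]]) <- new_names
}

# lapply: setNames() returns a renamed copy of each data frame
x_apply <- lapply(x, setNames, new_names)
```

Both produce the same list; the lapply version avoids the index bookkeeping and leaves the original list x untouched.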
I'm trying to merge multiple data frames by row names.
I know how to do it with two:
x = data.frame(a = c(1,2,3), row.names = letters[1:3])
y = data.frame(b = c(1,2,3), row.names = letters[1:3])
merge(x,y, by = "row.names")
But when I try the merge_all() function from the reshape package, I get an error:
z = data.frame(c = c(1,2,3), row.names = letters[1:3])
l = list(x,y,z)
merge_all(l, by = "row.names")
Error in -ncol(df) : invalid argument to unary operator
What's the best way to do this?
Merging by row.names has an awkward side effect: the result gets a new column called Row.names (and loses the original row names), which makes subsequent merges hard.
To avoid that issue, you can instead create a column holding the row names (which is generally a better idea anyway, since row names are very limited and hard to manipulate). One way of doing that with the data as given in the question (not the most efficient way; for easier and faster handling of rectangular data I recommend getting to know data.table instead):
Reduce(merge, lapply(l, function(x) data.frame(x, rn = row.names(x))))
Maybe there is a faster version using do.call or *apply, but this works in your case:
x = data.frame(X = c(1,2,3), row.names = letters[1:3])
y = data.frame(Y = c(1,2,3), row.names = letters[1:3])
z = data.frame(Z = c(1,2,3), row.names = letters[1:3])
merge.all <- function(x, ..., by = "row.names") {
L <- list(...)
for (i in seq_along(L)) {
x <- merge(x, L[[i]], by = by)
rownames(x) <- x$Row.names
x$Row.names <- NULL
}
return(x)
}
merge.all(x,y,z)
Note that any argument you want to forward to merge (such as by) must be declared as a named parameter of merge.all, because everything passed through ... is treated as a data frame to merge.
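The loop above can also be written with Reduce, which is probably the *apply-style version hinted at. This is a sketch under the same data; merge_by_rownames is a hypothetical helper name, not a function from any package:

```r
x <- data.frame(X = c(1,2,3), row.names = letters[1:3])
y <- data.frame(Y = c(1,2,3), row.names = letters[1:3])
z <- data.frame(Z = c(1,2,3), row.names = letters[1:3])

# hypothetical helper: merge any number of data frames by row names,
# moving the Row.names column back into the row names after each step
merge_by_rownames <- function(...) {
  Reduce(function(a, b) {
    m <- merge(a, b, by = "row.names")
    rownames(m) <- m$Row.names
    m$Row.names <- NULL
    m
  }, list(...))
}

res <- merge_by_rownames(x, y, z)
```

res has columns X, Y and Z with row names a, b and c, so it can be fed into further row-name merges without accumulating Row.names columns.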
As an alternative to Reduce and merge:
If you put all the data frames into a list, you can use grep to find the ones whose row names match the desired set, and cbind to combine them.
## set up the data
x <- data.frame(x1 = c(2,4,6), row.names = letters[1:3])
y <- data.frame(x2 = c(3,6,9), row.names = letters[1:3])
z <- data.frame(x3 = c(1,2,3), row.names = letters[1:3])
a <- data.frame(x4 = c(4,6,8), row.names = letters[4:6])
lst <- list(a, x, y, z)
## combine all the data frames with row names = letters[1:3]
gg <- grep(paste(letters[1:3], collapse = ""),
           sapply(lapply(lst, rownames), paste, collapse = ""))
do.call(cbind, lst[gg])
##   x1 x2 x3
## a  2  3  1
## b  4  6  2
## c  6  9  3
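Matching concatenated row names with grep can also pick up data frames whose pasted row names merely contain "abc" (for example rows a, b, c, d). A stricter variant, sketched here as a swapped-in technique rather than the answer's own, compares the row names exactly with identical(); keep and res are illustrative names:

```r
x <- data.frame(x1 = c(2,4,6), row.names = letters[1:3])
y <- data.frame(x2 = c(3,6,9), row.names = letters[1:3])
z <- data.frame(x3 = c(1,2,3), row.names = letters[1:3])
a <- data.frame(x4 = c(4,6,8), row.names = letters[4:6])
lst <- list(a, x, y, z)

# TRUE only for data frames whose row names are exactly a, b, c (in that order)
keep <- vapply(lst, function(d) identical(rownames(d), letters[1:3]), logical(1))

# column-bind the matching data frames; row names come from the first one
res <- do.call(cbind, lst[keep])
```

res matches the output shown above: columns x1, x2 and x3 with row names a, b and c.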