I am trying to apply a function to several data frames. Afterwards, I want to save each resulting data frame under its original name plus a suffix that distinguishes the new data frames.
This is what I've tried, which obviously doesn't work:
# Creating dummy data
N <- 8
df1 <- data.frame(x1 = rnorm(N), x2 = sample(1:10, size = N, replace = TRUE), x3 = 1*(runif(n = N) < .75))
df2 <- data.frame(y1 = rnorm(N), y2 = sample(100:200, size = N, replace = TRUE), y3 = runif(N))
df3 <- data.frame(z1 = rnorm(N), z2 = sample(8:80, size = N, replace = TRUE), z3 = runif(N))
# Making a list of the three data frames
mydata <- list(df1=df1, df2=df2, df3= df3)
#Applying a function to mydata list
mydata2 <- lapply(mydata, function(x) mean(unlist(x)))
# Renaming each dataset (attempt)
n <- length(mydata2)
noms <- names(mydata2)
for (i in 1:n) {
  mynewlist <- lapply(mydata2, function(x) {
    names(x) <- paste(noms[i], "_mean", sep = "")
    return(x)
  })
}
Any help would be deeply appreciated.
We can use list2env if we need to create multiple objects in the global environment (though this is not recommended, as most operations can be done within the list itself).
We change the names of the list by pasting a suffix onto them with paste0, and then call list2env:
list2env(setNames(mydata2, paste0(names(mydata2),
"_newname")), envir=.GlobalEnv)
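Note that the function in the question, mean(unlist(x)), returns a single number rather than a data frame. A minimal sketch of the list-based workflow for a function that does return data frames, assuming a hypothetical center_df() that centers every column (center_df, the "_centered" suffix and results_env are illustrative names, not from the question): keep the results in a named list with suffixed names, and if separate objects are really needed, copy them into a dedicated environment instead of .GlobalEnv.

```r
set.seed(1)
N <- 8
df1 <- data.frame(x1 = rnorm(N), x2 = sample(1:10, size = N, replace = TRUE))
df2 <- data.frame(y1 = rnorm(N), y2 = runif(N))
mydata <- list(df1 = df1, df2 = df2)

# Hypothetical function: returns a data frame with every column centered on its mean
center_df <- function(x) as.data.frame(lapply(x, function(col) col - mean(col)))

# Apply it to each data frame, then add a suffix to the original names
centered <- lapply(mydata, center_df)
names(centered) <- paste0(names(centered), "_centered")

# Usually enough: work with centered$df1_centered, centered$df2_centered, ...
# If separate objects are needed, copy them into their own environment
results_env <- new.env()
list2env(centered, envir = results_env)
print(ls(results_env))
```

Keeping the objects in a list (or a dedicated environment) means later steps can still use lapply over them, which is the reason the answer discourages writing into the global environment.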
This question is similar to Joining dataframes from lists of unequal length.
I have a Shiny script where I use fileInput to let the user import a variable number of data files. Each data file is then split into a list of data frames, and these are imported as a list, so I have a list of lists of data frames.
The input data files come in two formats: one yields 129 data frames, the other 67, where the 67 are a subset of the 129 (all 67 are present in the 129, but not all 129 are present in the 67). I am then trying to rbind the data frames by name.
A reproducible example:
# Some data
df.l1 <- list(df1 = data.frame(A = letters[1:10],
B = rnorm(10, 5, 1)),
df2 = data.frame(A = letters[11:20],
B = rnorm(10, 10, 2)))
df.l2 <- list(df1 = data.frame(A = letters[1:10],
B = rnorm(10, 5, 1)),
df2 = data.frame(A = letters[11:20],
B = rnorm(10, 10, 2)))
df.l3 <- list(df1 = data.frame(A = letters[1:10],
B = rnorm(10, 5, 1)),
df2 = data.frame(A = letters[11:20],
B = rnorm(10, 10, 2)),
df3 = data.frame(A = LETTERS[1:10],
B = rnorm(10, 15, 2)))
This works when binding lists of equal length (e.g. df.l1 and df.l2)
df.two <- list(df.l1, df.l2)
list.merged <- do.call(function(...) Map(rbind, ...), df.two)
But it fails when binding lists of data frames with different lengths:
df.three <- list(df.l1, df.l2, df.l3)
list.merged <- do.call(function(...) Map(rbind, ...), df.three)
This gives the following warnings:
Warning messages:
1: In mapply(FUN = f, ..., SIMPLIFY = FALSE) :
longer argument not a multiple of length of shorter
2: In mapply(FUN = f, ..., SIMPLIFY = FALSE) :
longer argument not a multiple of length of shorter
As I said above, similar questions have been asked, but this situation is unique given the variable number of lists I am trying to merge. Help is greatly appreciated!
For robust handling of this, I would use dplyr::bind_rows or data.table::rbindlist. First bind each list, then bind at the upper level:
tidyverse version:
library(dplyr)
bind_rows(lapply(df.three, bind_rows))
data.table version:
library(data.table)
rbindlist(lapply(df.three, rbindlist))
Not only will this handle corner cases you might not expect, it will also typically be much faster than do.call(rbind, ...).
Edit in response to a comment:
To rbind the data frames by name instead, try this:
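library(purrr)
library(dplyr)
# all data frame names that occur in any of the lists
df_names <- unique(unlist(sapply(df.three, names)))
result <- list()
for (n in df_names) {
  # map() with a string extracts that element from each list (NULL where it is absent)
  result[[n]] <- map(df.three, n)
}
# bind_rows() ignores the NULLs and stacks the rest: one data frame per name
map(result, dplyr::bind_rows)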
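If you would rather avoid the purrr/dplyr dependency, the same bind-by-name idea can be sketched in base R: collect the union of element names, pull the element with each name out of every list (lists lacking that name give NULL, which is filtered out), and rbind what remains. The helper name bind_by_name is hypothetical.

```r
# Example data from the question
df.l1 <- list(df1 = data.frame(A = letters[1:10], B = rnorm(10, 5, 1)),
              df2 = data.frame(A = letters[11:20], B = rnorm(10, 10, 2)))
df.l2 <- list(df1 = data.frame(A = letters[1:10], B = rnorm(10, 5, 1)),
              df2 = data.frame(A = letters[11:20], B = rnorm(10, 10, 2)))
df.l3 <- list(df1 = data.frame(A = letters[1:10], B = rnorm(10, 5, 1)),
              df2 = data.frame(A = letters[11:20], B = rnorm(10, 10, 2)),
              df3 = data.frame(A = LETTERS[1:10], B = rnorm(10, 15, 2)))
df.three <- list(df.l1, df.l2, df.l3)

# Hypothetical helper: rbind the data frames that share a name across the lists
bind_by_name <- function(lists) {
  all_names <- unique(unlist(lapply(lists, names)))
  lapply(setNames(all_names, all_names), function(nm) {
    # `[[` returns NULL when a list has no element called nm
    pieces <- Filter(Negate(is.null), lapply(lists, `[[`, nm))
    do.call(rbind, pieces)
  })
}

list.merged <- bind_by_name(df.three)
sapply(list.merged, nrow)
```

Unlike Map, which matches elements by position and recycles the shorter lists (hence the warnings in the question), this matches strictly by name.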
I have two lists of data frames: the data frames in the first list hold values that extend down a column, and those in the second list hold single values, like this:
dynamic_df_1 <- data.frame(x = 1:10)
dynamic_df_2 <- data.frame(y = 1:10)
df_list <- list(dynamic_df_1, dynamic_df_2)
df_list
static_df_1 <- data.frame(mu = 10,
stdev = 5)
static_df_2 <- data.frame(mu = 12,
stdev = 6)
static_df_list <- list(stat_df1 = static_df_1,
stat_df2 = static_df_2)
static_df_list
I would like to add a column to each data frame (dynamic_df_1 and dynamic_df_2), computed from the values in the matching static data frame: the calculation for dynamic_df_1 uses static_df_1, and the calculation for dynamic_df_2 uses static_df_2.
The result I'm aiming for is this:
library(dplyr)
df_list[[1]] <- df_list[[1]] %>%
mutate(z = dnorm(x = df_list[[1]]$x, mean = static_df_list$stat_df1$mu, sd = static_df_list$stat_df1$stdev))
df_list
df_list[[2]] <- df_list[[2]] %>%
mutate(z = dnorm(x = df_list[[2]]$y, mean = static_df_list$stat_df2$mu, sd = static_df_list$stat_df2$stdev))
df_list
I can use a loop, but it gets messy with the more complex functions in my real code:
for (i in 1:length(df_list)) {
df_list[[i]]$z <- dnorm(x = df_list[[i]][[1]], mean = static_df_list[[i]]$mu, sd = static_df_list[[i]]$stdev)
}
df_list
I'm trying to find an lapply / map / mutate type of solution that calculates across data frames: imagine a grid of data frames where the objective is to calculate across rows. I'm also open to other solutions, such as a single data frame with nested values, but haven't figured out how to do that yet.
Hope that is clear; I did my best!
Thanks!
This Map solution seems simpler, and its results are identical() to those of the pipe and loop versions in the question. The code that creates df_list2 and df_list3 follows below.
df_list4 <- df_list
fun <- function(DF, Static_DF){
DF[["z"]] = dnorm(DF[[1]], mean = Static_DF[["mu"]], sd = Static_DF[["stdev"]])
DF
}
df_list4 <- Map(fun, df_list4, static_df_list)
identical(df_list2, df_list3)
#[1] TRUE
identical(df_list2, df_list4)
#[1] TRUE
Data.
After running the question's code that creates the initial df_list, run the dplyr pipe and for loop code:
library(dplyr)
df_list2 <- df_list
df_list2[[1]] <- df_list2[[1]] %>%
mutate(z = dnorm(x = df_list2[[1]]$x, mean = static_df_list$stat_df1$mu, sd = static_df_list$stat_df1$stdev))
df_list2[[2]] <- df_list2[[2]] %>%
mutate(z = dnorm(x = df_list2[[2]]$y, mean = static_df_list$stat_df2$mu, sd = static_df_list$stat_df2$stdev))
df_list3 <- df_list
for (i in 1:length(df_list3)) {
df_list3[[i]]$z <- dnorm(x = df_list3[[i]][[1]], mean = static_df_list[[i]]$mu, sd = static_df_list[[i]]$stdev)
}
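The question also mentions a single data frame as an alternative. A minimal base-R sketch of that idea, using a long-format table rather than nested columns and assuming the names of static_df_list can serve as group labels (long, params and groups are illustrative names): stack the dynamic values with a group column, attach each group's mu and stdev with merge, then compute z in one vectorized step.

```r
dynamic_df_1 <- data.frame(x = 1:10)
dynamic_df_2 <- data.frame(y = 1:10)
df_list <- list(dynamic_df_1, dynamic_df_2)
static_df_list <- list(stat_df1 = data.frame(mu = 10, stdev = 5),
                       stat_df2 = data.frame(mu = 12, stdev = 6))
groups <- names(static_df_list)

# Stack the first column of each dynamic data frame, tagged with its group
long <- do.call(rbind, Map(function(DF, g) {
  data.frame(group = g, value = DF[[1]], stringsAsFactors = FALSE)
}, df_list, groups))

# One row of parameters per group
params <- do.call(rbind, Map(function(S, g) {
  data.frame(group = g, S, stringsAsFactors = FALSE)
}, static_df_list, groups))

# Attach mu and stdev to every observation, then compute z for all rows at once
long <- merge(long, params, by = "group")
long$z <- dnorm(long$value, mean = long$mu, sd = long$stdev)
```

The pairing between dynamic and static data frames is made once, through the group column, so more complex calculations only need to refer to columns of one table.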
The data I have contain three variables. There are three unique IDs, and each has one or more records.
ID <- c(rep(1,2), rep(2,1), rep(3,2))
y0 <- c(rep(5,2), rep(3,1), rep(1,2))
z0 <- c(rep(1,2), rep(13,1), rep(4,2))
dat1 <- data.frame(ID, y0,z0)
What I am trying to do is repeat the whole data set N times (N needs to be a parameter) and add a new column with the repetition number.
So if N = 2, the new data look like:
rep <- c(rep(1,2), rep(2,2), rep(1,1), rep(2,1), rep(1,2), rep(2,2))
ID <- c(rep(1,4), rep(2,2), rep(3,4))
y0 <- c(rep(5,4), rep(3,2), rep(1,4))
z0 <- c(rep(1,4), rep(13,2), rep(4,4))
dat2 <- data.frame(rep, ID, y0,z0)
We replicate the sequence of rows (here twice, for N = 2) and order by ID afterwards to get the expected output:
res <- cbind(rep = rep(seq_len(2), each = nrow(dat1)), dat1[rep(seq_len(nrow(dat1)), 2),])
resN <- res[order(res$ID),]
row.names(resN) <- NULL
all.equal(dat2, resN, check.attributes = FALSE)
#[1] TRUE
Another option is to replicate the data into a list, create the 'rep' column with Map, and rbind the list elements (note that it is not recommended to use function names such as rep as column names, object names, etc.):
res1 <- do.call(rbind, Map(cbind, rep = seq_len(2), replicate(2, dat1, simplify = FALSE)))
res2 <- res1[order(res1$ID),]
row.names(res2) <- NULL
all.equal(dat2, res2, check.attributes = FALSE)
#[1] TRUE
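Both snippets above hard-code 2, while the question asks for N to be a parameter. A sketch wrapping the first approach in a function, where the name repeat_data is hypothetical and the sort assumes, as in the expected output, that rows should be grouped by ID with the repetitions in order within each ID:

```r
ID <- c(rep(1, 2), rep(2, 1), rep(3, 2))
y0 <- c(rep(5, 2), rep(3, 1), rep(1, 2))
z0 <- c(rep(1, 2), rep(13, 1), rep(4, 2))
dat1 <- data.frame(ID, y0, z0)

# Hypothetical helper: stack N copies of dat and number them in a 'rep' column
repeat_data <- function(dat, N) {
  res <- cbind(rep = rep(seq_len(N), each = nrow(dat)),
               dat[rep(seq_len(nrow(dat)), N), ])
  # sort by ID, then by repetition; order() keeps ties in their original order
  res <- res[order(res$ID, res$rep), ]
  row.names(res) <- NULL
  res
}

repeat_data(dat1, 3)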
Consider these three dataframes in a nested list:
df1 <- data.frame(a = runif(10,1,10), b = runif(10,1,10), c = runif(10,1,10))
df2 <- data.frame(a = runif(10,1,10), b = runif(10,1,10), c = runif(10,1,10))
df3 <- data.frame(a = runif(10,1,10), b = runif(10,1,10), c = runif(10,1,10))
dflist1 <- list(df1,df2,df3)
dflist2 <- list(df1,df2,df3)
nest_list <- list(dflist1, dflist2)
I want to run cor.test between column 'a' and column 'a', 'b' and 'b', and 'c' and 'c' across all data frames, for each dflist. I can do it individually if I assign each data frame to the global environment with the code below (thanks to this post):
for (i in seq_along(nest_list)) { # extract data frames from the list into individual dfs
  for (j in seq_along(nest_list[[i]])) {
    temp_df <- nest_list[[i]][[j]]
    ds <- paste0("dflist", i, "_df", j)
    assign(ds, temp_df)
  }
}
combn(paste0("df", 1:3), 2, FUN = function(x) { # actual cor.test
  x1 <- mget(x, envir = .GlobalEnv)
  Map(function(x, y) cor.test(x, y, method = "spearman")$p.value, x1[[1]], x1[[2]])
})
I am not sure I understand exactly what you want to do, but could something like this help?
# vector of your column names
columns <- c("a", "b", "c")
# correlation calculation function: Spearman p-value between elements i and j of data
correl <- function(i, j, data) {
  cor.test(unlist(data[i]), unlist(data[j]), method = "spearman")$p.value
}
correlfun <- Vectorize(correl, vectorize.args = c("i", "j"))
# "Loop" over the columns vector (u will be each value in turn: "a", then "b", then "c")
res <- sapply(columns, function(u) {
  # For each dflist, keep only the column named u from every data frame
  cols_u <- lapply(nest_list, function(x) sapply(x, function(df) df[which(names(df) == u)]))
  # On those data, use outer to apply correlfun to each pair of vectors
  lapply(cols_u, function(z) outer(seq_along(z), seq_along(z), correlfun, data = z))
}, simplify = FALSE, USE.NAMES = TRUE)
Does this help? I'm not sure my explanation is very clear :)
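A flatter sketch of the same computation, assuming what is wanted is, for each dflist, the Spearman p-value for every pair of data frames and every shared column (pair_pvalues and make_df are hypothetical helpers): combn() enumerates the pairs, and mapply() walks over the columns of the two data frames in parallel, so nothing has to be assigned to the global environment.

```r
set.seed(1)
# Hypothetical helper that builds one data frame like those in the question
make_df <- function() data.frame(a = runif(10, 1, 10), b = runif(10, 1, 10), c = runif(10, 1, 10))
df1 <- make_df(); df2 <- make_df(); df3 <- make_df()
nest_list <- list(list(df1, df2, df3), list(df1, df2, df3))

# Hypothetical helper: Spearman p-values for every pair of data frames in one dflist,
# comparing column a with a, b with b and c with c
pair_pvalues <- function(dflist) {
  pairs <- combn(seq_along(dflist), 2)  # each column holds the indices of one pair
  res <- apply(pairs, 2, function(p) {
    # mapply iterates over the columns of both data frames in parallel
    mapply(function(x, y) cor.test(x, y, method = "spearman")$p.value,
           dflist[[p[1]]], dflist[[p[2]]])
  })
  colnames(res) <- apply(pairs, 2, paste, collapse = "-")
  res  # rows: columns a, b, c; columns: data frame pairs such as "1-2"
}

pvals <- lapply(nest_list, pair_pvalues)
pvals[[1]]
```

Each element of pvals is a small matrix, which is usually easier to inspect than the nested lists produced by combining sapply, lapply and outer.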
I have a list of data frames that all share a common variable (year). I want to expand each data frame so that this common variable covers all of the levels present across all of the data frames.
myList <- list(A = data.frame(A1 = rnorm(10), A2 = rnorm(10), A3 = rnorm(10),
year = factor(c(2000:2009))),
B = data.frame(B1 = rnorm(10), B2 = rnorm(10), B3 = rnorm(10),
year = factor(c(2001:2010))))
masterYear <- unique(unlist(lapply(myList, function(x) levels(x$year)), use.names = FALSE))
So far I've tried using the dplyr and tidyr packages in a function:
funExpand <- function(x){
levels(x$year) <- c(levels(x$year), setdiff(masterYear, levels(x$year)))
vars <- names(x)[-length(names(x))]
x %>%
tidyr::complete_(x, c(vars), fill = list(0))
x
}
myList2 <- lapply(myList, funExpand)
But that yields an error. I've tried various combinations of the tidyr::complete and tidyr::complete_ functions (should the first argument be x or year?), all of which yield errors. That tells me I'm not interpreting the complete functions correctly.
Aside from fixes for this error, I also welcome any suggestions for improving the process.
Updated to reflect a comment by the OP.
Try this:
myList2 <- lapply(myList, function(db) {
  db$year <- factor(as.character(db$year), levels = masterYear)
  merge(db, data.frame(year = setdiff(masterYear, db$year)), all = TRUE)
})
The new rows will contain NA. If you really need them to be 0, assign the merge() result to a variable inside the function, set its NA entries to 0 (e.g. res[is.na(res)] <- 0), and return it.
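A self-contained sketch of that merge approach with the NA-to-0 step included, assuming every data frame should end up with one row per year, in masterYear order. The name expand_years is hypothetical, and passing the years as an argument (rather than reading masterYear from the global environment) is a design choice of this sketch.

```r
set.seed(42)
myList <- list(A = data.frame(A1 = rnorm(10), A2 = rnorm(10), A3 = rnorm(10),
                              year = factor(2000:2009)),
               B = data.frame(B1 = rnorm(10), B2 = rnorm(10), B3 = rnorm(10),
                              year = factor(2001:2010)))
masterYear <- unique(unlist(lapply(myList, function(x) levels(x$year)), use.names = FALSE))

# Hypothetical helper: one row per year in `years`; rows added for missing years get `fill`
expand_years <- function(db, years, fill = 0) {
  all_years <- data.frame(year = factor(years, levels = years))
  db$year <- factor(as.character(db$year), levels = years)
  # all.x = TRUE keeps every year, with NA where db had no row
  res <- merge(all_years, db, by = "year", all.x = TRUE)
  res[is.na(res)] <- fill  # replace the NAs created by merge()
  res
}

myList2 <- lapply(myList, expand_years, years = masterYear)
```

Merging against a complete table of years, rather than only the missing ones, also returns the rows sorted by year.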
I don't think you need the x %>% part:
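funExpand <- function(x) {
  levels(x$year) <- c(levels(x$year), setdiff(masterYear, levels(x$year)))
  vars <- names(x)[-length(names(x))]
  # complete() expands year to all of its factor levels; fill needs a named list
  tidyr::complete(x, year, fill = setNames(as.list(rep(0, length(vars))), vars))
}
lapply(myList, funExpand)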