applying a created function in a list with data frames - r

I would like to calculate the coefficient of variation for each data frame in a list. However, when I apply my function to the list of data frames, I get this error:
coef_var = lapply(dists_log, cvs)
Error in is.data.frame(x) :
'list' object cannot be coerced to type 'double'
Here is what I did:
List = list (A = data.frame(A = rnorm(30), B = rnorm(30), C =rnorm (30), D = rnorm(30)),
B = data.frame(A = rnorm(30), B = rnorm(30), C =rnorm (30), D = rnorm(30)),
C = data.frame(A = rnorm(30), B = rnorm(30), C =rnorm (30), D = rnorm(30)),
D = data.frame(A = rnorm(30), B = rnorm(30), C =rnorm (30), D = rnorm(30)))
# function to calculate the coefficient of variation (in percent)
cvs <- function (dist){
cv <- sd(dist, na.rm=TRUE) / mean(dist, na.rm=TRUE) * 100
return(cv)
}
Then I ran:
coef_var = lapply(dists_log, cvs)
and got the error message above.
Can someone help me with this error?

We need a nested lapply, because sd and mean require a vector as input, not a data.frame. So we loop over the columns of each data.frame with an inner lapply, apply the 'cvs' function, assign the results back to the object (each column is filled with its own coefficient), and return the data.frame:
lapply(dists_log, function(x) {x[] <- lapply(x, cvs); x})
If we want a named numeric vector with one coefficient per column for each data.frame instead:
lapply(dists_log, function(x) unlist(lapply(x, cvs)))
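As a minimal sketch of the nested approach, the example below rebuilds a small version of the list under the name dists_log (an assumption, since the question defines the data as List but calls dists_log) and uses sapply for the inner loop so that each data frame yields a named numeric vector. The means are shifted to 10 purely for illustration, so the coefficients are not dominated by a near-zero mean.

```r
set.seed(1)

# Hypothetical stand-in for the list in the question
dists_log <- list(
  A = data.frame(A = rnorm(30, 10), B = rnorm(30, 10)),
  B = data.frame(A = rnorm(30, 10), B = rnorm(30, 10))
)

# Coefficient of variation in percent; expects a numeric vector
cvs <- function(dist) sd(dist, na.rm = TRUE) / mean(dist, na.rm = TRUE) * 100

# Outer loop: one data frame at a time; inner loop: one column at a time
coef_var <- lapply(dists_log, function(df) sapply(df, cvs))

# Same computation collapsed into a matrix (rows = columns, cols = data frames)
cv_matrix <- sapply(dists_log, function(df) sapply(df, cvs))

print(coef_var)
print(cv_matrix)
```

The matrix form is only convenient when every data frame has the same columns; otherwise the list form is safer.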

Related

How to merge dataframe lists of unequal length?

This question is similar to Joining dataframes from lists of unequal length.
I have a shiny script where I am using fileImport to allow the user to import a variable number of data files. Each datafile is then split into a list of dataframes, and these are imported as a list. So I have a list of a list of dataframes.
The input datafiles have two format possibilities, one may be 129 dataframes long, the other may be 67 - where the 67 is actually a subset of the 129 (so all 67 are present in the 129, but not all 129 are present in the 67). I am then trying to rbind the dataframes by name.
A reproducible example:
# Some data
df.l1 <- list(df1 = data.frame(A = letters[1:10],
B = rnorm(10, 5, 1)),
df2 = data.frame(A = letters[11:20],
B = rnorm(10, 10, 2)))
df.l2 <- list(df1 = data.frame(A = letters[1:10],
B = rnorm(10, 5, 1)),
df2 = data.frame(A = letters[11:20],
B = rnorm(10, 10, 2)))
df.l3 <- list(df1 = data.frame(A = letters[1:10],
B = rnorm(10, 5, 1)),
df2 = data.frame(A = letters[11:20],
B = rnorm(10, 10, 2)),
df3 = data.frame(A = LETTERS[1:10],
B = rnorm(10, 15, 2)))
This works when binding lists of equal length (e.g. df.l1 and df.l2):
df.two <- list(df.l1, df.l2)
list.merged <- do.call(function(...) Map(rbind, ...), df.two)
But it fails when binding lists of dataframes with different lengths:
df.three <- list(df.l1, df.l2, df.l3)
list.merged <- do.call(function(...) Map(rbind, ...), df.three)
This gives the following warnings:
Warning messages:
1: In mapply(FUN = f, ..., SIMPLIFY = FALSE) :
longer argument not a multiple of length of shorter
2: In mapply(FUN = f, ..., SIMPLIFY = FALSE) :
longer argument not a multiple of length of shorter
As I said above, similar questions have been asked, but this situation is unique given the variable number of lists I am trying to merge. Help is greatly appreciated!
For a robust handling of this I would use dplyr::bind_rows or data.table::rbindlist. First you bind each list, then you bind at the upper level:
tidyverse version:
library(dplyr)
bind_rows(lapply(df.three, bind_rows))
data.table version:
library(data.table)
rbindlist(lapply(df.three, rbindlist))
Not only will this handle weird corner cases you don't expect, but it will also be much faster than do.call.
edit in response to comment
Try this:
library(purrr)
library(dplyr)
df_names <- unique(unlist(sapply(df.three, names)))
result <- list()
for (n in df_names) {
result[[n]] <- map(df.three, n)
}
map(result, dplyr::bind_rows)
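The same bind-by-name idea can be sketched in base R without purrr or dplyr. The data below is a shortened, hypothetical version of the question's lists; the helper simply collects, for each name, the data frames that exist and binds them with rbind. It assumes that data frames sharing a name also share the same columns.

```r
# Shortened, hypothetical versions of the lists in the question
df.l1 <- list(df1 = data.frame(A = letters[1:3], B = 1:3),
              df2 = data.frame(A = letters[4:6], B = 4:6))
df.l3 <- list(df1 = data.frame(A = letters[1:3], B = 7:9),
              df2 = data.frame(A = letters[4:6], B = 10:12),
              df3 = data.frame(A = LETTERS[1:3], B = 13:15))
df.all <- list(df.l1, df.l3)

# Every data frame name that occurs in any of the lists
df_names <- unique(unlist(lapply(df.all, names)))

# For each name, gather the matching data frames and row-bind them
list.merged <- lapply(setNames(df_names, df_names), function(n) {
  pieces <- lapply(df.all, function(l) l[[n]])    # NULL where the name is absent
  do.call(rbind, Filter(Negate(is.null), pieces)) # drop the NULLs, then bind
})

sapply(list.merged, nrow)
```

Because matching is done by name rather than by position, lists of different lengths do not trigger recycling.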

function applied to dataset R

Below are two dataframes labeled 'A' and 'C'. I have created a function that takes the top 5 rows of a dataframe, and I want to apply it to dataframe C. However, it always returns the rows of A. How can I have this function applied to C instead? Thanks!
L3 <- LETTERS[1:3]
fac <- sample(L3, 10, replace = TRUE)
(d <- data.frame(x = 1, y = 1:10, fac = fac))
## The "same" with automatic column names:
A<-data.frame(1, 1:10, sample(L3, 10, replace = TRUE))
L3 <- LETTERS[7:9]
fac <- sample(L3, 10, replace = TRUE)
(d <- data.frame(x = 1, y = 1:10, fac = fac))
## The "same" with automatic column names:
C<-data.frame(1, 1:10, sample(L3, 10, replace = TRUE))
function_y<-function(Data_Analysis_Task) {
sample2<-head(A, 5)
return(sample2)
}
D<-function_y(C)
The function body needs to use the argument that is passed in, rather than the hard-coded object 'A':
function_y <- function(Data_Analysis_Task) {
head(Data_Analysis_Task, 5)
}
D <- function_y(C)
If we use head(A, 5) inside the function, R looks for the object 'A' in the function's own environment first; if it is not found there, R looks in the enclosing environment, and so on until it finds 'A' in the global env. So the function returns the head of 'A' every time it is called, regardless of the argument supplied.
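A small sketch makes the difference visible. The data frames and the function names uses_global and uses_argument are hypothetical and chosen only for this illustration.

```r
A <- data.frame(id = 1:10)
C <- data.frame(id = 101:110)

# Ignores its argument: 'A' is resolved by lexical scoping in the global env
uses_global <- function(Data_Analysis_Task) head(A, 5)

# Uses its argument: works for whatever data frame is passed in
uses_argument <- function(Data_Analysis_Task) head(Data_Analysis_Task, 5)

uses_global(C)$id    # ids taken from A
uses_argument(C)$id  # ids taken from C
```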

Automatically add any variables that exist in one data.frame but missing in other data.frames in R

Suppose I have a reference data.frame called a. How could I automatically add any variables that exist in a but are missing in other data.frames b and d?
NOTE: My goal is to make a function out of this such that any number of data.frames, and any number of variables can be completed based on a single reference data.frame.
a <- data.frame(x = 2:3, y = 4:5, z = c(T, F)) ## reference data.frame
b <- data.frame(x = 6:7) ## Add y and z here
d <- data.frame(x = 7:8) ## Add y and z here
Supposing all the data.frames involved share the same number of rows, you can simply:
toadd<-setdiff(colnames(a),colnames(b))
b[toadd]<-a[toadd]
Wrapping the above in a function:
f<-function(refdf, ...) {
res<-listdf<-list(...)
res<-lapply(listdf, function(x) {
toadd<-setdiff(names(refdf),names(x))
x[toadd]<-refdf[toadd]
x
})
c(list(refdf),res)
}
Then try for instance:
f(a,b)
f(a,b,d)
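The function above copies the reference values, which relies on equal row counts. If the data frames may differ in length, one alternative is to add the missing columns filled with NA of the matching type. This is a sketch of that variation, not the answer's method; add_missing_na is a hypothetical name.

```r
a <- data.frame(x = 2:3, y = 4:5, z = c(TRUE, FALSE)) # reference data.frame
b <- data.frame(x = 6:8)                              # different number of rows

add_missing_na <- function(refdf, ...) {
  lapply(list(...), function(x) {
    for (col in setdiff(names(refdf), names(x))) {
      # Indexing with NA yields a single NA of the reference column's type
      x[[col]] <- rep(refdf[[col]][NA_integer_], nrow(x))
    }
    x
  })
}

out <- add_missing_na(a, b)
str(out[[1]])
```

Keeping the column type of the reference means a later rbind with a does not coerce columns unexpectedly.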
# Using a reference data.frame, perform a right join in order
# to append the missing columns (filled with NA) to the provided data.frames:
add_empty_vecs <- function(refdf, ...){
# Capture the names of the other data.frames as passed: df_names => list of symbols
df_names <- as.list(substitute(list(...)))[-1L]
# Right join the reference data.frame to each of the
# provided data.frames and return the results: named list
setNames(lapply(list(...), function(y){
merge(refdf, y, by = intersect(names(refdf), names(y)), all.y = TRUE)
}
), c(df_names))
}
# Apply function only to df b:
add_empty_vecs(a, b)
# Apply function to both df b & df d:
add_empty_vecs(a, b, d)
# Apply function to all b, d, e:
add_empty_vecs(a, b, d, e)
Data:
a <- data.frame(x = 2:3, y = 4:5, z = c(T, F)) ## reference data.frame
b <- data.frame(x = 6:7) ## Add y and z here
d <- data.frame(x = 7:8) ## Add y and z here
e <- data.frame(x = 9:10)

Merging list elements of two lists if unequal length

I have two lists. List elements are data.tables.
One list contains all Keys:
listA <- list(Key1 = data.table(A = rnorm(5), B = rnorm(5), C = rnorm(5)),
Key2 = data.table(A = rnorm(5), B = rnorm(5), C = rnorm(5)),
Key3 = data.table(A = rnorm(5), B = rnorm(5), C = rnorm(5)))
The other list is a subset with additional information:
listB <- list(Key1 = data.table(D = "B"),
Key2 = data.table(D = "N"))
I want to add column D from the tables in listB to the tables in listA, where the Key is matching. I have tried with:
mapply(FUN = function(x, y) x[, D := y[, D]], x = listA, y = listB, SIMPLIFY = F)
but this throws a warning:
Warning message:
In mapply(FUN = function(x, y) x[, `:=`(D, y[, D])], x = listA, :
longer argument not a multiple of length of shorter
In the end it does the job, but it recycles the D column for the Key that is not in listB.
How can I make sure column D is only added to a table when the Key matches? Or, even better, add column D to all tables in listA and fill it with NA where there is no match. Thanks.
What is happening here is that if you use mapply with vectors or lists of different lengths, it recycles the shorter one. Thus, with two lists of lengths 3 and 2:
mapply(FUN, list(a1, a2, a3), list(b1, b2))
is equivalent to the following (apart from the warning it throws):
mapply(FUN, list(a1, a2, a3), list(b1, b2, b1))
To avoid this, add to listB an entry for every key of listA that it is missing, with column D set to NA, and order listB like listA, since mapply pairs elements by position rather than by name. Something like:
for (k in setdiff(names(listA), names(listB))) listB[[k]] <- data.table(D = NA_character_)
mapply(FUN = function(x, y) x[, D := y[, D]], x = listA, y = listB[names(listA)], SIMPLIFY = FALSE)
Now it does not throw warnings, and column D is filled with NA for the non-matching elements.
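An alternative that leaves listB untouched is to loop over the names of listA and look up each key explicitly. The sketch below uses small, hypothetical tables and works on copies so the originals are not modified by reference; it assumes each table in listB holds a single value in column D, as in the question.

```r
library(data.table)

# Small, hypothetical versions of the lists in the question
listA <- list(Key1 = data.table(A = 1:2),
              Key2 = data.table(A = 3:4),
              Key3 = data.table(A = 5:6))
listB <- list(Key1 = data.table(D = "B"),
              Key2 = data.table(D = "N"))

merged <- lapply(setNames(names(listA), names(listA)), function(k) {
  out <- copy(listA[[k]])  # copy so the table in listA is left unchanged
  # Look up D by key; fall back to NA when the key is absent from listB
  d_val <- if (k %in% names(listB)) listB[[k]]$D else NA_character_
  out[, D := d_val]
  out[]
})

merged$Key3
```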

R: object y not found in function (x,y) [function to pass through data frames in r]

I am writing a function to build new data frames based on existing data frames. So I essentially have
f1 <- function(x,y) {
x_adj <- data.frame("DID*"= df.y$`DM`[x], "LDI"= df.y$`DirectorID*`[-(x)], "LDM"= df.y$`DM`[-(x)], "IID*"=y)
}
I have 4,000 data frames named df.<something>, so I really need a function for this, but R returns an error saying that df.y is not found. y is meant to run through a list of the names of all 4,000 data frames. I am very new to R, so any help would be really appreciated.
In case more specifics are needed I essentially have something like
df.1 <- data.frame(x = 1:3, b = 5)
And I need the following as a result using a function
df.11 <- data.frame(x = 1, c = 2:3, b = 5)
df.12 <- data.frame(x = 2, c = c(1,3), b = 5)
df.13 <- data.frame(x = 3, c = 1:2, b = 5)
Thanks in advance!
The OP seems to want to access a data.frame by a dynamically constructed name.
One option is to use get:
get(paste("df",y,sep = "."))
With y = 1, the call above returns df.1.
Hence, the function can be modified as:
f1 <- function(x,y) {
temp_df <- get(paste("df",y,sep = "."))
x_adj <- data.frame("DID*"= temp_df$`DM`[x], "LDI"= temp_df$`DirectorID*`[-(x)],
"LDM"= temp_df$`DM`[-(x)], "IID*"=y)
}
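For the simplified example in the question (df.1 expanding to df.11, df.12 and df.13), a common alternative to get is to keep the data frames in a named list and loop over it. The sketch below assumes that layout; split_by_row is a hypothetical helper name, and with 4,000 existing objects the list could presumably be collected once with mget.

```r
df.1 <- data.frame(x = 1:3, b = 5)

# Hypothetical container: one named entry per data frame
dfs <- list(df.1 = df.1)

# For each row i: keep x[i] and b[i], and list the remaining x values in column c
split_by_row <- function(df) {
  lapply(seq_len(nrow(df)), function(i) {
    data.frame(x = df$x[i], c = df$x[-i], b = df$b[i])
  })
}

# One list of derived data frames per original data frame
res <- lapply(dfs, split_by_row)
res$df.1[[2]]
```

Working on a list avoids building object names as strings, so no lookup by name is needed inside the function.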

Resources