How to rbind output of repeated function to a df - vectorized - r

I am reading everywhere that you should not use for-loops in R, but rather do it 'vectorized'. However I find for-loops intuitive and struggle with transforming my code.
I have a function f1 that I want to use multiple times. The inputs for the function are in a list called l1. My f1 outputs a df. I want to rbind these output dfs into one df. The for loop I have now is this:
z3 <- data.frame()
for (i in l1) {
  z3 <- rbind(z3, f1(i))
}
Could anyone help me to do the same, but without the for-loop?

You can use lapply() together with do.call():
do.call(rbind, lapply(l1, f1))
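As a minimal sketch of this pattern, assuming a hypothetical f1 that returns a one-row data frame and a made-up input list l1 (the real ones are not shown in the question):

```r
# Hypothetical stand-in for f1: returns a one-row data frame per input
f1 <- function(x) data.frame(input = x, squared = x^2)

# Hypothetical input list
l1 <- list(1, 2, 3)

# lapply() builds a list of data frames; do.call() then passes all of them
# to a single rbind() call instead of growing the result step by step
z3 <- do.call(rbind, lapply(l1, f1))
z3
```

Binding everything in one call avoids copying the accumulated data frame on every iteration, which is what the loop version does.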

Another, more verbose approach:
## function that returns a 1 x 2 dataframe:
get_fresh_straw <- function(x) data.frame(col1 = x, col2 = x * pi)
l1 = list(A = 1, B = 5, C = 2)
Reduce(
  x = l1,
  f = function(camel_back, list_item) {
    rbind(camel_back, get_fresh_straw(list_item))
  },
  init = data.frame(col1 = NULL, col2 = NULL)
)

Related

Is there a way to sum together lists of data frames within a larger list?

I have a large list (z) containing 3 lists of 10 data frames. I would like to collapse this object into a list of 3 data frames where each data frame is the sum of the 10 prior data frames (think matrix addition). Here is what I am working with, keep in mind that these are fake numbers, as the real data are read in from hundreds of *.csv files
x = rep(1,100)
x = matrix(x,10,10)
x = as.data.frame(x)
y = list(x,x,x,x,x,x,x,x,x,x)
z = list(y,y,y)
The desired end product would look like this:
x1 = rep(10,100)
x1 = matrix(x1,10,10)
y1 = list(x1,x1,x1)
I keep trying stuff along the lines of:
z1 = c()
for (i in 1:3){
  for (j in 1:10){
    z1[[i]] = sum(z[[i]][[j]])
  }
}
However, this does not yield the desired output. I have also messed around with some of the apply functions, but to no avail.
Thanks in advance for your help!
We can use Reduce to sum the corresponding i, j elements in the list and collapse it to a single dataset
lapply(z, function(x) Reduce(`+`, x))
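To make the effect of Reduce easier to see, here is a small sketch with data frames that hold different values. The names a, b, d and z_small are made up for illustration and are much smaller than the data in the question:

```r
# Three small data frames with different values (illustrative data only)
a <- data.frame(V1 = 1:2, V2 = 3:4)
b <- data.frame(V1 = c(10, 20), V2 = c(30, 40))
d <- data.frame(V1 = c(100, 200), V2 = c(300, 400))

# A list of lists, mirroring the shape of z in the question
z_small <- list(list(a, b, d), list(a, a, a))

# Reduce() folds `+` over each inner list: (a + b) + d, element by element
sums <- lapply(z_small, function(x) Reduce(`+`, x))
sums[[1]]
```

Each element of sums is a single data frame with the same shape as the inputs.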
If we want to remove the last column which is not numeric
lapply(z, function(x) Reduce(`+`, lapply(x, function(y) y[-ncol(y)])))
Or it can be looped over the sequence of list
lapply(seq_along(z), function(i) Reduce(`+`, lapply(seq_along(z[[i]]),
    function(j) z[[i]][[j]][-ncol(z[[i]][[j]])])))
If we want to use sum, the data.frames inside the list can be converted to an array; we then loop over the array with apply, specify the MARGIN, and take the sum. With this option it is also possible to handle NA elements by passing na.rm = TRUE to sum
lapply(z, function(x) apply(array(unlist(x), c(10, 10, 10)),
    1:2, sum, na.rm = TRUE))
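Or make it more efficient by looping over only one dimension and using rowSums on each slice (the third array dimension indexes the data frames, so each column slice is summed across it)
lapply(z, function(x) apply(array(unlist(x), c(10, 10, 10)), 2, rowSums, na.rm = TRUE))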
Or using a for loop
z1 <- replicate(length(z), matrix(0, 10, 10), simplify = FALSE)
for(i in seq_along(z)) for(j in seq_along(z[[1]])) z1[[i]] <- z1[[i]] + z[[i]][[j]]

R: object y not found in function (x,y) [function to pass through data frames in r]

I am writing a function to build new data frames based on existing data frames. So I essentially have
f1 <- function(x,y) {
x_adj <- data.frame("DID*"= df.y$`DM`[x], "LDI"= df.y$`DirectorID*`[-(x)], "LDM"= df.y$`DM`[-(x)], "IID*"=y)
}
I have 4,000 data frames df., so I really need to use this and R is returning an error saying that df.y is not found. y is meant to be used through a list of all the 4000 names of the different df. I am very new at R so any help would be really appreciated.
In case more specifics are needed I essentially have something like
df.1 <- data.frame(x = 1:3, b = 5)
And I need the following as a result using a function
df.11 <- data.frame(x = 1, c = 2:3, b = 5)
df.12 <- data.frame(x = 2, c = c(1,3), b = 5)
df.13 <- data.frame(x = 3, c = 1:2, b = 5)
Thanks in advance!
The OP seems to want to access a data.frame whose name is built dynamically.
One option is to use get:
get(paste("df",y,sep = "."))
For y = 1, this call returns df.1.
Hence, the function can be modified as:
f1 <- function(x,y) {
  temp_df <- get(paste("df",y,sep = "."))
  x_adj <- data.frame("DID*"= temp_df$`DM`[x], "LDI"= temp_df$`DirectorID*`[-(x)],
                      "LDM"= temp_df$`DM`[-(x)], "IID*"=y)
}
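An alternative sketch, not taken from the answer above: when many data frames share a naming pattern, collecting them once into a named list with mget() can avoid calling get() inside the function. The objects df.1 and df.2 and the variables ids, df_names, df_list and n_rows are made up for illustration.

```r
# Illustrative data frames following the df.<id> naming pattern
df.1 <- data.frame(x = 1:3, b = 5)
df.2 <- data.frame(x = 1:4, b = 7)

# Build the names, then fetch all matching objects into one named list
ids <- 1:2
df_names <- paste("df", ids, sep = ".")
df_list <- mget(df_names, envir = environment())

# Functions can now receive the data frame itself rather than its name
n_rows <- vapply(df_list, nrow, integer(1))
n_rows
```

With the data frames in a list, lapply() or Map() can be applied to all of them without constructing names inside the worker function.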

Least error prone way to add columns to an R data.frame through functions

The case I have is I want to "tack on" a bunch of columns to an existing data.frame, where each column is a function that does math on other columns. My goals are:
I want to specify the functions once
I don't want to worry about having to pass arguments in the right order and/or match them by name
I want to specify the order in which to apply the functions once
I want the new column names to be the function names
Ideally I want something like:
df <- data.frame(a = rnorm(10), b = rnorm(10))
y <- function (x) a + b
z <- function (x) b * y
df2 <- lapply (list (y, z), df)
where df2 is a data.frame with 4 columns: a, b, y and z. I think this achieves the goals.
The closest I've gotten to this is the following:
df <- data.frame(a = rnorm(10), b = rnorm(10))
y <- function (x) x$a + x$b
z <- function (x) x$b * x$y
funs <- list (
y = y,
z = z
)
df2 <- df
df2$y <- funs$y(df2)
df2$z <- funs$z(df2)
This achieves goals 1 and 2, but not 3 and 4.
Thanks in advance for the help.
This may be what you want. Once dfapply is defined, it can be used much as you originally intended, without repeating things like x$a; the only difference is that you pass expressions instead of functions.
dfapply <- function(exprs, df){
  for (expr in exprs) {
    df <- within(df, eval(expr))
  }
  df
}
df <- data.frame(a = rnorm(10), b = rnorm(10))
expr1 <- expression(y <- a + b)
expr2 <- expression(z <- b * y)
df2 <- dfapply(c(expr1, expr2), df)
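To check what dfapply produces, here is a small sketch that repeats the definition and swaps rnorm() for fixed values, so the new columns can be verified by hand. The fixed input values are an assumption for illustration only.

```r
# dfapply as defined in the answer: evaluates each expression inside df
dfapply <- function(exprs, df) {
  for (expr in exprs) {
    df <- within(df, eval(expr))
  }
  df
}

# Fixed values instead of rnorm() so the result is easy to verify
df <- data.frame(a = 1:3, b = c(10, 20, 30))

# The expressions run in the order given, so z can refer to y
df2 <- dfapply(expression(y <- a + b, z <- b * y), df)

names(df2)
df2$z
```

The order of the expressions matters: z is computed from y, so y has to be created first.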

Perform Student's t-test between data.frames contained in two lists

I have two separate lists, each containing 4 data.frames. I need to perform a Student's t-test (t.test) on rainfall between the corresponding data.frames of the two lists.
Here are the lists:
lst1 = list(data.frame(rnorm(20), rnorm(20)), data.frame(rnorm(25), rnorm(25)), data.frame(rnorm(16), rnorm(16)), data.frame(rnorm(34), rnorm(34)))
lst1 = lapply(lst1, setNames, c('rainfall', 'snow'))
lst2 = list(data.frame(rnorm(19), rnorm(19)), data.frame(rnorm(38), rnorm(38)), data.frame(rnorm(22), rnorm(22)), data.frame(rnorm(59), rnorm(59)))
lst2 = lapply(lst2, setNames, c('rainfall', 'snow'))
What I would need to do is:
t.test(lst1[[1]]$rainfall, lst2[[1]]$rainfall)
t.test(lst1[[2]]$rainfall, lst2[[2]]$rainfall)
t.test(lst1[[3]]$rainfall, lst2[[3]]$rainfall)
t.test(lst1[[4]]$rainfall, lst2[[4]]$rainfall)
I can do it as above by writing out each of the 4 comparisons (I actually have 40 with my real data), but I would like to know if there is a smarter and quicker way to do it.
Here is what I tried (without success):
myfunction = function(x,y) {
  test = t.test(x, y)
  return(test)
}
result = mapply(myfunction, x=lst1, y=lst2)
x <- NULL
for (i in seq_along(lst1)){
  x[[i]] <- t.test(lst1[[i]]$rainfall, lst2[[i]]$rainfall)
}
x
Works for me. I would use SIMPLIFY = FALSE to get the results formatted better, though.
lst1 <- list()
lst1[[1]] <- data.frame(rainfall = rnorm(10))
lst1[[2]] <- data.frame(rainfall = rnorm(10))
lst2 <- list()
lst2[[1]] <- data.frame(rainfall = rnorm(10))
lst2[[2]] <- data.frame(rainfall = rnorm(10))
myfunction = function(x,y) {
  test = t.test(x$rainfall, y$rainfall)
  return(test)
}
mapply(myfunction, x = lst1, y = lst2, SIMPLIFY = FALSE)
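As a follow-up sketch, Map() is a shorthand for mapply() with SIMPLIFY = FALSE, and a single value such as the p-value can be pulled out of each result afterwards. The list sizes, the seed, and the names tests and p_values are made up for illustration.

```r
set.seed(1)  # make the random example data reproducible

# Two lists of data frames with a rainfall column (illustrative sizes)
lst1 <- list(data.frame(rainfall = rnorm(20)), data.frame(rainfall = rnorm(25)))
lst2 <- list(data.frame(rainfall = rnorm(19)), data.frame(rainfall = rnorm(38)))

# Map() pairs up the elements and returns one htest object per pair
tests <- Map(function(x, y) t.test(x$rainfall, y$rainfall), lst1, lst2)

# Pull a single number out of each test result
p_values <- vapply(tests, function(tt) tt$p.value, numeric(1))
p_values
```

Keeping the full htest objects in a list leaves the other components (statistic, confidence interval, estimates) available for later use.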

looping through dataframes using 'for' [duplicate]

Here is a simple made up data set:
df1 <- data.frame(x = c(1,2,3),
y = c(4,6,8),
z= c(1, 6, 7))
df2 <- data.frame(x = c(3,5,6),
y = c(3,4,9),
z= c(6, 7, 7))
What I want to do is to create a new variable "a" which is just the sum of all three variables (x,y,z)
Instead of doing this separately for each dataframe I thought it would be more efficient to just create a loop. So here is the code I wrote:
my.list<- list(df1, df2)
for (i in 1:2) {
  my.list[i]$a <- my.list[i]$x + my.list[i]$y + my.list[i]$z
}
or alternatively
for (i in 1:2) {
  my.list[i] <- transform(my.list[i], a = x + y + z)
}
In both cases it does not work and the error "number of items to replace is not a multiple of replacement length" is returned.
What would be the best solution to writing a loop code where I can loop through dataframes?
See ?Extract:
Recursive (list-like) objects
Indexing by [ is similar to atomic vectors and selects a list of the
specified element(s).
Both [[ and $ select a single element of the list.
In short, my.list[i] returns a list of length 1, and you are trying to assign it a data.frame, so that doesn't work; whereas my.list[[i]] returns the data.frame #i in your list, which you can replace with a data.frame.
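The difference between the two kinds of brackets can be checked directly. This is a small sketch using a one-element list built from the question's df1:

```r
df1 <- data.frame(x = c(1, 2, 3), y = c(4, 6, 8), z = c(1, 6, 7))
my.list <- list(df1)

# Single brackets keep the list wrapper
class(my.list[1])    # "list"

# Double brackets extract the element itself
class(my.list[[1]])  # "data.frame"

# So the columns are only reachable through [[ ]]
is.null(my.list[1]$x)  # TRUE: the one-element list has no element named x
my.list[[1]]$x
```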
So you can use either:
for (i in 1:2) {
  my.list[[i]]$a <- my.list[[i]]$x + my.list[[i]]$y + my.list[[i]]$z
}
or
for (i in 1:2) {
  my.list[[i]] <- transform(my.list[[i]], a = x + y + z)
}
But it would be even simpler to use lapply, where you don't need [[:
my.list <- lapply(my.list, function(df) { df$a <- df$x + df$y + df$z; df })
Rather than using an explicit loop to extract the data.frames from the list, just use lapply. It takes a list of data.frames (or any object) and a function, applies the function to every element of the list, and returns a list with the results.
# Sample data
df1 <- data.frame(x = c(1,2,3), y = c(4,6,8), z = c(1, 6, 7))
df2 <- data.frame(x = c(3,5,6), y = c(3,4,9), z = c(6, 7, 7))
# Put them in a list
df_list <- list(df1, df2)
# Use lapply to iterate. FUN takes the function you want, and
# then its arguments (a = x + y + z) are just listed after it.
result_list <- lapply(df_list, FUN = transform, a = x + y + z)
