looping through dataframes using 'for' [duplicate] - r

This question already has answers here:
How can R loop over data frames?
(2 answers)
Closed 6 years ago.
Here is a simple made up data set:
df1 <- data.frame(x = c(1,2,3),
y = c(4,6,8),
z= c(1, 6, 7))
df2 <- data.frame(x = c(3,5,6),
y = c(3,4,9),
z= c(6, 7, 7))
What I want to do is to create a new variable "a" which is just the sum of all three variables (x,y,z)
Instead of doing this separately for each dataframe I thought it would be more efficient to just create a loop. So here is the code I wrote:
my.list<- list(df1, df2)
for (i in 1:2) {
my.list[i]$a<- my.list[i]$x +my.list[i]$y + my.list[i]$z
}
or alternatively
for (i in 1:2) {
my.list[i]<- transform(my.list[i], a= x+ y+ z)
}
In both cases it does not work and the error "number of items to replace is not a multiple of replacement length" is returned.
What would be the best solution to writing a loop code where I can loop through dataframes?

See ?Extract:
Recursive (list-like) objects
Indexing by [ is similar to atomic vectors and selects a list of the
specified element(s).
Both [[ and $ select a single element of the list.
In short, my.list[i] returns a list of length 1, and you are trying to assign it a data.frame, so that doesn't work; whereas my.list[[i]] returns the data.frame #i in your list, which you can replace with a data.frame.
So you can use either:
for (i in 1:2) {
my.list[[i]]$a<- my.list[[i]]$x +my.list[[i]]$y + my.list[[i]]$z
}
or
for (i in 1:2) {
my.list[[i]]<- transform(my.list[[i]], a= x+ y+ z)
}
But it would be even simpler to use lapply, where you don't need [[:
my.list <- lapply(my.list, function(df) df$a <- df$x + df$y + df$z)

Rather than using an explicit loop to extract the data.frames from the list, just use lapply. It takes a list of data.frames (or any object) and a function, applies the function to every element of the list, and returns a list with the results.
# Sample data
df1 <- data.frame(x = c(1,2,3), y = c(4,6,8), z = c(1, 6, 7))
df2 <- data.frame(x = c(3,5,6), y = c(3,4,9), z = c(6, 7, 7))
# Put them in a list
df_list <- list(df1, df2)
# Use lapply to iterate. FUN takes the function you want, and
# then its arguments (a = x + y + z) are just listed after it.
result_list <- lapply(df_list, FUN = transform, a = x + y + z)

Related

Applying a Function to a Data Frame : lapply vs traditional way

I have this data frame in R:
x <- seq(1, 10,0.1)
y <- seq(1, 10,0.1)
data_frame <- expand.grid(x,y)
I also have this function:
some_function <- function(x,y) { return(x+y) }
Basically, I want to create a new column in the data frame based on "some_function". I thought I could do this with the "lapply" function in R:
data_frame$new_column <-lapply(c(data_frame$x, data_frame$y),some_function)
This does not work:
Error in `$<-.data.frame`(`*tmp*`, f, value = list()) :
replacement has 0 rows, data has 8281
I know how to do this in a more "clunky and traditional" way:
data_frame$new_column = x + y
But I would like to know how to do this using "lapply" - in the future, I will have much more complicated and longer functions that will be a pain to write out like I did above. Can someone show me how to do this using "lapply"?
Thank you!
When working within a data.frame you could use apply instead of lapply:
x <- seq(1, 10,0.1)
y <- seq(1, 10,0.1)
data_frame <- expand.grid(x,y)
head(data_frame)
some_function <- function(x,y) { return(x+y) }
data_frame$new_column <- apply(data_frame, 1, \(x) some_function(x["Var1"], x["Var2"]))
head(data_frame)
To apply a function to rows set MAR = 1, to apply a function to columns set MAR = 2.
lapply, as the name suggests, is a list-apply. As a data.frame is a list of columns you can use it to compute over columns but within rectangular data, apply is often the easiest.
If some_function is written for that specific purpose, it can be written to accept a single row of the data.frame as in
x <- seq(1, 10,0.1)
y <- seq(1, 10,0.1)
data_frame <- expand.grid(x,y)
head(data_frame)
some_function <- function(row) { return(row[1]+row[2]) }
data_frame$yet_another <- apply(data_frame, 1, some_function)
head(data_frame)
Final comment: Often functions written for only a pair of values come out as perfectly vectorized. Probably the best way to call some_function is without any function of the apply-familiy as in
some_function <- function(x,y) { return(x + y) }
data_frame$last_one <- some_function(data_frame$Var1, data_frame$Var2)

using mapply with ggplot

Continuing on my quest to work with functions and ggplot:
I sorted out basic ways on how to use lapply and ggplot to cycle through a list of y_columns to make some individual plots:
require(ggplot2)
# using lapply with ggplot
df <- data.frame(x=c("a", "b", "c"), col1=c(1, 2, 3), col2=c(3, 2, 1), col3=c(4, 2, 3))
cols <- colnames(df[2:4])
myplots <- vector('list', 3)
plot_function <- function(y_column, data) {
ggplot(data, aes_string(x="x", y=y_column, fill = "x")) +
geom_col() +
labs(title=paste("lapply:", y_column))
}
myplots <- lapply(cols, plot_function, df)
myplots[[3]])
I know what to bring in a second variable that I will use to select rows. In my minimal example I am skipping the selection and just reusing the same plots and dfs as before, I simply add 3 iterations. So I would like to generate the same three plots as above, but now labelled as iteration A, B, and C.
I took me a while to sort out the syntax, but I now get that mapply needs to vectors of identical length that get passed on to the function as matched pairs. So I am using expand.grid to generate all pairs of variable 1 and variable 2 to create a dataframe and then pass the first and second column on via mapply. The next problem to sort out was that I need to pass on the dataframe as list MoreArgs =. So it seems like everything should be good to go. I am using the same syntax for aes_string() as above in my lapply example.
However, for some reason now it is not evaluating the y_column properly, but simply taking it as a value to plot, not as an indicator to plate the values contained in df$col1.
HELP!
require(ggplot2)
# using mapply with ggplot
df <- data.frame(x=c("a", "b", "c"), col1=c(1, 2, 3), col2=c(3, 2, 1), col3=c(4, 2, 3))
cols <- colnames(df[2:4])
iteration <- c("Iteration A", "Iteration B", "Iteration C")
multi_plot_function <- function(y_column, iteration, data) {
plot <- ggplot(data, aes_string(x="x", y=y_column, fill = "x")) +
geom_col() +
labs(title=paste("mapply:", y_column, "___", iteration))
}
# mapply call
combo <- expand.grid(cols=cols, iteration=iteration)
myplots <- mapply(multi_plot_function, combo[[1]], combo[[2]], MoreArgs = list(df), SIMPLIFY = F)
myplots[[3]]
We may need to use rowwise here
out <- lapply(asplit(combo, 1), function(x)
multi_plot_function(x[1], x[2], df))
In the OP's code, the only issue is that the columns are factor for 'combo', so it is not parsed correctly. If we change it to character, it works
out2 <- mapply(multi_plot_function, as.character(combo[[1]]),
as.character(combo[[2]]), MoreArgs = list(df), SIMPLIFY = FALSE)
-testing
out2[[1]]

How can I multiply multiple dataframes of a list by each observation of a vector?

I have a list of dataframes that I would like to multiply for each element of vector.
The first dataframe in the list would be multiplied by the first observation of the vector, and so on, producing another list of dataframes already multiplied.
I tried to do this with a loop, but was unsuccessful. I also tried to imagine something using map or lapply, but I couldn't.
for(i in vec){
for(j in listdf){
listdf2 <- i*listdf[[j]]
}
}
Error in listdf[[j]] : invalid subscript type 'list'
Any idea how to solve this?
*Vector and the List of Dataframes have the same length.
Use Map :
listdf2 <- Map(`*`, listdf, vec)
in purrr this can be done using map2 :
listdf2 <- purrr::map2(listdf, vec, `*`)
If you are interested in for loop solution you just need one loop :
listdf2 <- vector('list', length(listdf))
for (i in seq_along(vec)) {
listdf2[[i]] <- listdf[[i]] * vec[i]
}
data
vec <- c(4, 3, 5)
df <- data.frame(a = 1:5, b = 3:7)
listdf <- list(df, df, df)

R: I'm trying to apply rollmean(zoo) to a particular column in several dataframes which are in a list

reprod:
df1 <- data.frame(X = c(0:9), Y = c(10:19))
df2 <- data.frame(X = c(0:9), Y = c(10:19))
df3 <- data.frame(X = c(0:9), Y = c(10:19))
list_of_df <- list(A = df1, B = df2, C = df3)
list_of_df
I'm trying to apply the rollmean function from zoo to every 'Y' column in this list of dataframes.
I've tried lapply with no success, It seems no matter which way i spin it, there is no way to get around specifying the dataframe you want to apply to at some point.
This does one of the dataframes
roll_mean <- rollmean(list_of_df$A, 2)
roll_mean
obviously this doesn't work:
roll_mean1 <- rollmean(list_of_df, 2)
roll_mean1
I also tried this:
subset(may not be necessary)
Sub1 <- lapply(list_of_df, "[", 2)
roll_mean1 <- rollmean(Sub1, 2)
roll_mean1
there doesn't seem to be a way to do it without having to
specify the particular dataframe in the rollmean function
lapply(list_of_df), function(x) rollmean(list_of_df, 2))
for loop? also no success
For (i in list_of_df) {roll_mean1 <- rollmean(Sub1, 2)
Exp
}
Stating the obvious but I'm very new to coding in general and would appreciate some pointers.
It has occurred to me that even if it did work, the column that has been averaged would be one value longer than the rest of the dataframe; how would I get around that?
The question at one point says that it wants to perform the rollmean only on Y and at another point says that this works roll_mean <- rollmean(list_of_df$A, 2) but that does all columns.
1) Assuming that you want to apply rollmean to all columns:
Use lapply like this:
lapply(list_of_df, rollmean, 2)
This also works:
for(i in seq_along(list_of_df)) list_of_df[[i]] <- rollmean(list_of_df[[i]], 2)
2) If you only want to apply it to the Y column:
lapply(list_of_df, transform, Y = rollmean(Y, 2, fill = NA))
or
for(i in seq_along(list_of_df)) {
list_of_df[[i]]$Y <- rollmean(list_of_df[[i]]$Y, 2, fill = NA)
}

Is there a way to sum together lists of data frames within a larger list?

I have a large list (z) containing 3 lists of 10 data frames. I would like to collapse this object into a list of 3 data frames where each data frame is the sum of the 10 prior data frames (think matrix addition). Here is what I am working with, keep in mind that these are fake numbers, as the real data are read in from hundreds of *.csv files
x = rep(1,100)
x = matrix(x,10,10)
x = as.data.frame(x)
y = list(x,x,x,x,x,x,x,x,x,x)
z = list(y,y,y)
The desired end product would look like this:
x1 = rep(10,100)
x1 = matrix(x,10,10)
y1 = list(x1,x1,x1)
I keep trying stuff along the lines of:
z1 = c()
for (i in 1:3){
for (j in 1:10){
z1[[i]] = sum(z[[i]][[j]])
}
}
However, this does not yield the desired output. I have also messed around with some of the the apply functions, but to no avail
Thanks in advance for your help!
We can use Reduce to sum the corresponding i, j elements in the list and collapse it to a single dataset
lapply(z, function(x) Reduce(`+`, x))
If we want to remove the last column which is not numeric
lapply(z, function(x) Reduce(`+`, lapply(x, function(y) y[-ncol(y)])))
Or it can be looped over the sequence of list
lapply(seq_along(z), function(i) Reduce(`+`, lapply(seq_along(z[[i]]),
function(j) z[[i]][[j]][-ncol(z[[i]][[j]])])))
If we want to use sum, the data.frames inside the list can be converted to an array, loop over the array with apply, specify the MARGIN and do the sum. In this option, there is also possiblity to take care of NA elements with na.rm = TRUE in sum
lapply(z, function(x) apply(array(unlist(x), c(10, 10, 10)),
1:2, sum, na.rm = TRUE))
Or make it more efficient by looping only on one dimension and use colSums
lapply(z, function(x) apply(array(unlist(x), c(10, 10, 10)), 1, colSums, na.rm = TRUE))
Or using a for loop
z1 <- replicate(length(z), matrix(0, 10, 10), simplify = FALSE)
for(i in seq_along(z)) for(j in seq_along(z[[1]])) z1[[i]] <- z1[[i]] + z[[i]][[j]]

Resources