Creating new aggregated R dataframes with a function in lapply

I want to create new aggregated dataframes from existing ones that are collected in a list. Ideally they would appear as their own dataframes indicated by a prefix. This is where I've got:
dfList <- list(A = data.frame(y = sample(1:100), x = c("grp1","grp2")),
B = data.frame(y = sample(1:100), x = c("grp1","grp2")))
agrFun <- function(df){
prefix <- deparse(substitute(df))
assign(paste0(prefix,"_AGR"),
aggregate(y ~ x, data = df, sum))
}
lapply(seq_along(dfList), function(x) agrFun(dfList[[x]]))
Aggregation happens as intended but I'm not sure what I need to do otherwise in order to create dataframes A_AGR and B_AGR.
EDIT:
A bit of clarification. I want to have the aggregated dataframes appear in the environment.
So instead of this
ls()
[1] "agrFun" "dfList"
my goal is to have
ls()
[1] "A_AGR" "B_AGR" "agrFun" "dfList"
EDIT2:
Or more ideal would be to have dfList include dataframes A, A_AGR, B and B_AGR after the process.
EDIT3:
And I also want to preserve the names of the dataframes.

Your approach seems unnecessarily complicated. You can create the named data frames you want in a list with this one-liner:
setNames(lapply(dfList, function(u) aggregate(y~x, u, sum)), paste0(names(dfList),"_AGR"))
#$A_AGR
# x y
#1 grp1 2340
#2 grp2 2710
#$B_AGR
# x y
#1 grp1 2573
#2 grp2 2477
With your function agrFun:
lst = setNames(lapply(dfList, function(x) agrFun(x)), paste0(names(dfList),"_AGR"))
If you want to append the two lists:
dfList = append(lst, dfList)
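To cover the edits in the question, here is a minimal sketch. The seed and the names aggList and allList are not in the original and are only illustrative. In the original agrFun, deparse(substitute(df)) yields "dfList[[x]]" rather than "A", and assign() without an envir argument writes into the function's own frame, which is why no A_AGR shows up in ls(). Building a named list first and then calling list2env avoids both problems:

```r
set.seed(1)  # assumption: a seed, only so the sketch is reproducible

dfList <- list(A = data.frame(y = sample(1:100), x = c("grp1", "grp2")),
               B = data.frame(y = sample(1:100), x = c("grp1", "grp2")))

# Aggregate each data frame and name the results A_AGR and B_AGR
aggList <- setNames(
  lapply(dfList, function(d) aggregate(y ~ x, data = d, sum)),
  paste0(names(dfList), "_AGR")
)

# EDIT2: one list that holds A, B, A_AGR and B_AGR, names preserved
allList <- c(dfList, aggList)

# EDIT: additionally create A_AGR and B_AGR as objects in the global environment
invisible(list2env(aggList, envir = .GlobalEnv))
```

After this, ls() should list A_AGR and B_AGR next to dfList, and names(allList) contains all four names.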

Related

Adding a column to every dataframe in a list with the name of the list element

I have a list containing multiple data frames, and each list element has a unique name. The structure is similar to this dummy data
a <- data.frame(z = rnorm(20), y = rnorm(20))
b <- data.frame(z = rnorm(30), y = rnorm(30))
c <- data.frame(z = rnorm(40), y = rnorm(40))
d <- data.frame(z = rnorm(50), y = rnorm(50))
my.list <- list(a,b,c,d)
names(my.list) <- c("a","b","c","d")
I want to create a column in each of the data frames that has the name of its respective list element. My goal is to merge all the list elements into a single data frame, and know which data frame they came from originally. The end result I'm looking for is something like this:
z y group
1 0.6169132 0.09803228 a
2 1.1610584 0.50356131 a
3 0.6399438 0.84810547 a
4 1.0878453 1.00472105 b
5 -0.3137200 -1.20707112 b
6 1.1428834 0.87852556 b
7 -1.0651735 -0.18614224 c
8 1.1629891 -0.30184443 c
9 -0.7980089 -0.35578381 c
10 1.4651651 -0.30586852 d
11 1.1936547 1.98858128 d
12 1.6284174 -0.17042835 d
My first thought was to use mutate to assign the list element name to a column in each respective data frame, but it appears that when used within lapply, names() refers to the column names, not the list element names:
test <- lapply(my.list, function(x) mutate(x, group = names(x)))
Error: Column `group` must be length 20 (the number of rows) or one, not 2
Any suggestions as to how I could approach this problem?

There is no need to mutate; just bind the data frames using dplyr's bind_rows:
library(tidyverse)
my.list %>%
bind_rows(.id = "groups")
Obviously requires that the list is named.

We can use Map from base R
Map(cbind, my.list, group = names(my.list))
Or with imap from purrr
library(dplyr)
library(purrr)
imap(my.list, ~ .x %>% mutate(group = .y))
Or if the intention is to create a single data.frame
library(data.table)
rbindlist(my.list, idcol = 'groups')
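For completeness, a base R sketch of the whole task (tag each data frame with its list name, then stack them). The names tagged and combined, and the small example list, are assumptions made for illustration. Map passes each element together with its name, which is exactly what lapply alone cannot see:

```r
# Small named list standing in for my.list (sizes are arbitrary)
my.list <- list(a = data.frame(z = rnorm(3), y = rnorm(3)),
                b = data.frame(z = rnorm(4), y = rnorm(4)))

# Map hands each data frame and its list name to the function in parallel
tagged <- Map(function(df, nm) {
  df$group <- nm
  df
}, my.list, names(my.list))

# Stack the tagged data frames and drop the "a.1", "a.2", ... row names
combined <- do.call(rbind, tagged)
rownames(combined) <- NULL
```

The result has one row per original row and a group column holding "a" or "b".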

How do I apply a function over multiple data frames, but overwrite them?

I'm trying to use lapply to run the same function over multiple data frames, but can't get lapply to work without assigning it to something. When I do this, I then have to go back in and re-separate the resulting lists which is annoying. Does anyone know why lapply won't just store the result over the data frames themselves? Here's a simple example:
keepCols <- c(1:6, 23, 24, 27:34, 37, 41:43)
myList <- list(x, y, z)
When I do this, all it does is print the result
lapply(myList, function(x) x[, ..keepCols])
If I assign it to something, I get a large list with what I want in it
df <- lapply(myList, function(x) x[, ..keepCols])
Why is lapply not working the way I want it to?

You can use the list2env() function.
list2env(data_list, envir = .GlobalEnv)
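This assigns every element of the list to a variable in the target environment, using the element's name as the variable name. The list therefore has to be named, e.g. list(x = x, y = y, z = z); with matching names, the original data frames are overwritten.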
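A minimal sketch of the full round trip, assuming plain data frames instead of the data.tables in the question, and a hypothetical column selection keep. lapply never modifies its input; it returns a new list, so the results have to be written back explicitly:

```r
# Two example data frames standing in for x and y from the question
x <- data.frame(a = 1:3, b = 4:6, c = 7:9)
y <- data.frame(a = 3:1, b = 6:4, c = 9:7)
keep <- c(1, 3)  # hypothetical: positions of the columns to keep

# The names are essential: list2env uses them as variable names
myList <- list(x = x, y = y)

# lapply returns a new list; the originals are untouched at this point
myList <- lapply(myList, function(d) d[, keep, drop = FALSE])

# Write the results back over x and y in the global environment
invisible(list2env(myList, envir = .GlobalEnv))
```

With data.tables, the subsetting step would be d[, ..keep] as in the question; the rest stays the same.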

You may just loop through the data frames in the globalenv() using a get-assign-approach, which is even possible in a one-liner.
Example
Consider a list of data frames like this one,
df1 # same as df2 and df3
# X1 X2 X3 X4
# 1 1 3 5 7
# 2 2 4 6 8
where you want to keep columns 1 and 3.
kp <- c(1, 3)
Do this:
sapply(ls(pat="^df"), function(x) assign(x, get(x, envir=.GlobalEnv)[kp], envir=.GlobalEnv))
Result:
df1
# X1 X3
# 1 1 5
# 2 2 6
Note: Instead of ls(pattern="^df") you can alternatively write c("df1", "df2", "df3"). To keep the console clean you may wrap an invisible() around the sapply.
Data
df1 <- df2 <- df3 <- data.frame(matrix(1:8, 2, 4))

Speeding up an R for loop to paste multiple variables together

I'm new here but could use some help. I have a list of data frames, and for each element within my list (i.e., data.frame) I want to quickly paste one column in a data set to multiple other columns in the same data set, separated only by a period (".").
So if I have one set of data in a list of data frames:
list1[[1]]
A B C
2 1 5
4 2 2
Then I want the following result:
list1[[1]]
A B C
2.5 1.5 5
4.2 2.2 2
Where C is pasted to A and B individually. I then want this operation to take place for each data frame in my list.
I have tried the following:
pasteX<-function(df) {for (i in 1:dim(df)[2]-1) {
df[,i]<-as.numeric(sprintf("%s.%s", df[,i], df$C))
}
return(df)}
list2<-lapply(list1, pasteX)
But this approach is verrrry slow for larger matrices and lists. Any recommendations for making this code faster? Thanks!

Assuming all values are non-negative integers and column C is a single digit (< 10):
lapply(list1, function(x){
x[,-3] <- x[,-3] + x[,3]/10
x})
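If C can have more than one digit, the same arithmetic idea may still work by dividing by a power of ten that matches the number of digits. This is a sketch under assumptions not stated in the original: all values are non-negative whole numbers, and pasteArith is a hypothetical helper name.

```r
# Example data: the second value of C has two digits
list1 <- list(data.frame(A = c(2, 4), B = c(1, 2), C = c(5, 12)))

pasteArith <- function(df) {
  # Number of digits in each C value; sprintf("%d") avoids scientific notation
  digits <- nchar(sprintf("%d", as.integer(df$C)))
  # Shift C behind the decimal point and add it to every other column
  df[c("A", "B")] <- df[c("A", "B")] + df$C / 10^digits
  df
}

list2 <- lapply(list1, pasteArith)
```

As with as.numeric(sprintf(...)), trailing zeros are lost: pasting 2 and 10 gives 2.1 either way.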

We can use Map
list1[[1]][-3] <- Map(function(x, y) as.numeric(sprintf('%s.%s', x, y)),
list1[[1]][-3], list1[[1]][3])
If there are many datasets, loop using lapply, convert the first two columns to matrix and paste with the third column, update the output, and return the dataset
lapply(list1, function(x) {
x[1:2] <- as.numeric(sprintf('%s.%s', as.matrix(x[1:2]), x[,3]));
x })
#[[1]]
# A B C
#1 2.5 1.5 5
#2 4.2 2.2 2
Or using tidyverse
library(tidyverse)
map(list1, ~ .x %>%
mutate_at(1:2, funs(as.numeric(sprintf('%s.%s', ., C)))))
Or with data.table
library(data.table)
lapply(list1, function(x) setDT(x)[, (1:2) :=
lapply(.SD, function(x) as.numeric(sprintf('%s.%s', x, C))) ,
.SDcols = 1:2][])

Try this:
df <- data.frame(a = c(1,2,3), b = c(3,2,1), c = c(2,1,1))
pastex <- function(x){
# use the argument x, not the global df, so each list element is processed
m <- sapply(x[,1:2], function(col) as.numeric(paste(col, x$c, sep = '.')))
m <- as.data.frame(m)
m <- cbind(m, x["c"])
return(m)
}
mylist <- list(df1 = df, df2 = df)
lapply(mylist, pastex)

for loop and a function combined to calculate a formula and then a regression

I am a little confused about how to achieve the results I want.
I have an environment in R which consists of 5 data.frames called df[i]
So;
df1
df2
df3
df4
df5
Inside each of these data frames I have 5 columns called col[j]
col1
col2
col3
col4
col5
In total I have 25 columns across 5 data frames (5 df x 5 col).
I also have a static variable called R which is a vector of numbers
I am trying to calculate for each column of each dataframe a basic formula using a function/loop. The formula for column 1 of df1 would be;
Y = df1$col1 - R
I am trying to calculate this for each column col[j] (j = 1 to 5) in each df[i] (i = 1 to 5) and store the results in a new data.frame
j <- 1:5
i <- 1:5
fun <- function(x){
for(i in 1:col[j](df[i])){
Y[j] <- col[j] - R
}
}
EDIT: Added comment below for easier reading.
Y1a = df1$col1 - R
Y2a = df1$col2 - R
Y3a = df1$col3 - R
.....
.....
Y1b = df2$col1 - R
Y2b = df2$col2 - R
Y3b = df2$col3 - R
..... etc

# Put your data in a list:
dflist = mget(paste0("df", 1:5))
# Apply your function to every data frame
ylist = lapply(dflist, function(x) x - R)
# Name the resulting columns y1:y5
ylist = lapply(ylist, setNames, paste0("y", 1:5))
Have a look at "How to make a list of data frames" for examples and a discussion of why using lists is better.
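A small worked sketch of the same idea, with made-up data (two data frames with two columns each instead of five and five). It assumes R has one value per row, so that the subtraction recycles R down every column:

```r
R <- c(1, 2, 3)  # assumption: one value per row of each data frame

# Made-up stand-ins for df1 ... df5
df1 <- data.frame(col1 = c(10, 20, 30), col2 = c(1, 2, 3))
df2 <- data.frame(col1 = c(5, 5, 5), col2 = c(0, 0, 0))
dflist <- list(df1 = df1, df2 = df2)

# Subtract R from every column of every data frame, then rename to y1, y2, ...
ylist <- lapply(dflist, function(d) {
  setNames(d - R, paste0("y", seq_along(d)))
})

ylist$df1$y1  # same as df1$col1 - R
```

If R had a different length than the number of rows, the recycling would no longer line up with the rows, so that case needs to be handled separately.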

tidyverse version
dplyr::mutate_all applies a function to each column of a data frame, so I would do it like this:
library(dplyr)
library(purrr)
all_df <- list(df1, df2, df3, df4, df5)
map(all_df, function(x) mutate_all(x, function(y) y - R))
This should return a list of length 5, where each data frame contains the desired result.

Creating multiple columns by different functions of different variables

I'm having trouble transitioning to data.table. I am trying to group by some categorical variables, and apply a list of functions that each target different variables in order to create new columns. This is something that seems like it should be easy with mapply or Map, but I can't figure out how to assemble the proper subset to pass to the functions.
Here is what it looks like,
set.seed(2015)
dat <- data.table(cat1 = factor('Total'),
cat2 = factor(rep(letters[1:4], 5)),
cat3 = factor(rep(1:4, each=5)),
var1 = sample(20),
var2 = sample(20),
var3 = sample(20))
## I have list of factor columns to group by
groups <- c(paste0("cat", 1:3))
setkeyv(dat, groups)
## List of functions, and corresponding list of column names that
## they are to be applied to. So, in this example I should get
## two new columns: V1=sum(var1) and V2=mean(var2, var3)
thing <- function(...) mean(c(...), na.rm=TRUE) # arbitrary function
funs <- list("sum", "thing") # named functions
targets <- list("var1", c("var2", "var3")) # variables
outnames <- funs # names of result columns
## Can't get this part
f <- function(fn, vars) do.call(fn, vars)
dat[, outnames := Map(f, funs, targets), by=groups]
The result for this example should be like this
dat[, `:=`(sum=sum(var1), thing=thing(var2, var3)), by=groups]

We need to subset the dataset columns based on the column names in the 'targets' list. One way would be to loop through the list elements of 'targets', subset the data.table (.SD[, x, with=FALSE]), and then apply the function.
dat[, unlist(outnames) := Map(f, funs, lapply(targets, function(x)
.SD[, ..x])), by = groups]
