Apply an already defined function to all dataframes at once [duplicate] - r

This question already has an answer here:
How to apply a function to a certain column for all the data frames in environment in R
(1 answer)
Closed 1 year ago.
I already have defined a function (which works fine). Nevertheless, I have 20 dataframes in the working space to which I want to lapply the same function (dat1 to dat20).
So far it looks like this:
dat1 <- func(dat=dat1)
dat2 <- func(dat=dat2)
dat3 <- func(dat=dat3)
dat4 <- func(dat=dat4)
...
dat20 <- func(dat=dat20)
However, is there a way to do this more elegant with a shorter command, i.e. to lapply the function to all dataframes at once?
I tried this, but it didn't work:
mylist <- paste0("dat", 1:20, sep="")
lapply(mylist, func)

Try something like:
lapply(mget(ls(pattern="dat")),func)
Some details: The pattern argument in ls will limit which object names it lists (e.g., I assume you have other objects including your function in the global environment). mget retrieves those objects from the environment and turns them into a list, which you can then lapply your function over.

If you have the name of a variable, you can use get() to retrieve the value from the workspace. The corresponding assignment function is called assign():
mylist <- paste0("dat", 1:20)
lapply(mylist, function(name) assign(name, func(dat=get(name))) )

The desired behavior can be obtained using eval instead of lapply.
Assume mylist to be the names of the data.frame you want to apply fun to. mylist might be generated using
mylist <- ls(pattern="dat")
Then you can use the following code to do exactly what you want:
cCmd <- paste(mylist , "<- func(" ,mylist,")", sep="")
eCmd <- parse(text=cCmd)
eval(eCmd)

Related

colnames and mutate on multiple dataframes

I have a problem with cleaning up my code. I understand I could type this all out but we don't want that obviously.
I have only dataframes in my global environment. They are all "data.frame".
I want to check the dimensions of all of them and put that in a tibble. I managed that somehow. I also would like to change their colnames() tolower() which works easy if I just type the name of the data.frame, but there's more than 2 and I want it done automatically. Then I also want to mutate all data.frames in the same way.
Small example of my code:
library(tidyverse)
x <- data.frame(letters[1:2]) #To create the data
y <- data.frame(letters[3:4])
dfs <- as.list(ls()) #I take whatever is in my environment
I managed below to get a tibble of the dimensions:
z <- as_tibble(lapply(seq_along(dfs),
function(j) dim(get(dfs[[j]]))), .name_repair = "unique")
colnames(z) <- dfs
Now for the colnames of all the data.frames stored in my list I basically want to perform this code:
colnames(dfs[[1]]) <- tolower(colnames(dfs[[1]])
but that returns NULL as I found out earlier. So I used get() in there to make it work for the dimensions. But if I use get() to assign colnames it says it can't find function "get<-".
Since all colnames for all dataframes are the same (just different nrows()) I could save the lowercase colnames as value and use that, but that doesn't take away that it cant find the get<- function.
names <- tolower(colnames(x))
sapply(seq_along(dfs),
function(j) colnames(get(dfs[[j]])) <- names)
*Error in colnames(get(dfs[[j]])) <- names :
could not find function "get<-"*
as for the mutating part I tried a for loop:
for(i in seq_along(dfs)){
get(dfs[[i]]) <- get(dfs[[i]]) %>% mutate(cd = ab)
}
But it's the same issue.
Could anyone help clearing this problem for me? (and if a cleaner code for the dimensions is available that would be highly appreciated)
I am just trying to up my coding skills. I would have been long done if I just typed it all out but that defeats the purpose.
Thanks!
-JK
Using base R
lapply(dfs, function(x) transform(setNames(x, tolower(names(x))), X = c('a', 'b')))

Create a variable in Multiple Dataframes in R

I want to create a ranked variable that will appear in multiple data frames.
I'm having trouble getting the ranked variable into the data frames.
Simple code. Can't make it happen.
dfList <- list(df1,df2,df3)
for (df in dfList){
rAchievement <- rank(df["Achievement"])
df[[rAchievement]]<-rAchievement
}
The result I want is for df1, df2 and df3 to each gain a new variable called rAchievement.
I'm struggling!! And my apologies. I know there are similar questions out there. I have reviewed them all. None seem to work and accepted answers are rare.
Any help would be MUCH appreciated. Thank you!
We can use lapply with transform in a single line
dfList <- lapply(dfList, transform, rAchievement = rank(Achievement))
If we need to update the objects 'df1', 'df2', 'df3', set the names of the 'dfList' with the object names and use list2env (not recommended though)
names(dfList) <- paste0('df", 1:3)
list2env(dfList, .GlobalEnv)
Or using the for loop, we loop over the sequence of the list, extract the list element assign a new column based on the rank of the 'Achievement'
for(i in seq_along(dfList)) {
dfList[[i]][['rAchievement']] <- rank(dfList[[i]]$Achievement)
}

Combine lapply, seq_along and ddply

I've been searching around this forum and trying to implement in my case what was said in previous answers from those questions. However, something in my code is missing.
I use lapply() with a function inside that runs ddply. This works nice. However, I would like to identify every result from a single data frame by reading the name of the data frame, and not [[1]], [[2]]...
For this reason, I am trying to implement the seq_along argument, but unsuccessfully. Let's see what I have:
I created a list to group 16 different data frames (with the same structure) in one object, called melt_noNA_noDC_regression:
melt_noNA_noDC_regression <-
list(I1U_melt_noNA_noDC_regression, I1L_melt_noNA_noDC_regression,
I1U_melt_noNA_noDC_regression, I1L_melt_noNA_noDC_regression,
CU_melt_noNA_noDC_regression, CL_melt_noNA_noDC_regression,
P3U_melt_noNA_noDC_regression, P3L_melt_noNA_noDC_regression,
P4U_melt_noNA_noDC_regression, P4L_melt_noNA_noDC_regression,
M1U_melt_noNA_noDC_regression, M1L_melt_noNA_noDC_regression,
M2U_melt_noNA_noDC_regression, M2L_melt_noNA_noDC_regression,
M3U_melt_noNA_noDC_regression, M3L_melt_noNA_noDC_regression)
Later, I run this lapply() line successfully.
lapply(melt_noNA_noDC_regression, function(x) ddply(x, .(Species), model_regression))
As I have 16 different data frames, I would like to identify them in the results of the lapply function. I have tried several combinations to include seq_along within the lapply code, as in this case:
lapply(melt_noNA_noDC_regression, function(x) {
ddply(x, .(Species), model_regression)
seq_along(x), function(i) paste(names(x)[[i]], x[[i]])
})
However, I've been getting errors constantly, and it is a bit frustrating. It is maybe very easy to solve, but I am block.
Any idea to solve this?
Consider using eapply (lapply's lesser known sibling) or mget to retrieve a named list of your dataframes. Then run them through lapply for the ddply call to return the same named dataframe list with new corresponding values.
df_list <- eapply(.GlobalEnv, function(d) d)[c("I1U_melt_noNA_noDC_regression",
"I1L_melt_noNA_noDC_regression",
"I1U_melt_noNA_noDC_regression",
...)]
df_list <- mget(c("I1U_melt_noNA_noDC_regression",
"I1L_melt_noNA_noDC_regression",
"I1U_melt_noNA_noDC_regression",
...))
# GENERALIZED FOR ANY DF IN GLOBAL ENV
df_list <- Filter(function(i) class(i)=="data.frame", eapply(.GlobalEnv, function(d) d))
new_list <- lapply(df_list, function(x) ddply(x, .(Species), model_regression))
And because eapply (being environment apply) is part of the apply family and can iterate through objects, you can bypass lapply. But you must account for non-dataframes and then filter out by df names. Hence, tryCatch is used and [] indexing:
new_list2 <- eapply(.GlobalEnv, function(x)
tryCatch(ddply(x, .(Species), model_regression),
warning = function(w) return(NA),
error = function(e) return(NA)
)
)[c("I1U_melt_noNA_noDC_regression",
"I1L_melt_noNA_noDC_regression",
"I1U_melt_noNA_noDC_regression",
...)]
all.equal(new_list, new_list2)
# [1] TRUE
With all that said, ideally in your data processing you would originally use a named dataframe list and not create separate, similar structured 16 objects flooding your global environment. Therefore, consider adjusting the source of your regression objects, so replace the following:
I1U_melt_noNA_noDC_regression <- ...
with this:
df_list = list()
df_list["I1U_melt_noNA_noDC_regression"] <- ...

R - New variables over several data frames in a loop

In R, I have several datasets, and I want to use a loop to create new variables (columns) within each of them:
All dataframes have the same name structure, so that is what I am using to loop through them. Here is some pseudo-code with what I want to do
Name = Dataframe_1 #Assume the for-loop goes from Dataframe_1 to _10 (loop not shown)
#Pseudo-code
eval(as.name(Name))$NewVariable <- c("SomeString") #This is what I would like to do, but I get an error ("could not find function eval<-")
As a result, I should have the same dataframe with one extra column (NewVariable), where all rows have the value "SomeString".
If I use eval(as.name(Name)) I can call up the dataframe Name with no problem, but none of the usual data frame operators seem to work with that particular call (not <- assignment, or $ or [[]])
Any ideas would be appreciated, thanks in advance!
We can place the datasets in a list and create a new column by looping over the list with lapply. If needed, the original dataframe objects can be updated with list2env.
lst <- mget(paste0('Dataframe_', 1:10))
lst1 <- lapply(lst, transform, NewVariable = "SomeString")
list2env(lst1, envir = .GlobalEnv())
Or another option is with assign
nm1 <- ls(pattern = "^Dataframe_\\d+")
nm2 <- rep("NewVariable", length(nm1))
for(j in seq_along(nm1)){
assign(nm1[j], `[<-`(get(nm1[j]), nm2[j], value = "SomeString"))
}

Put multiple data frames into list (smart way) [duplicate]

This question already has answers here:
How do I make a list of data frames?
(10 answers)
Closed 6 years ago.
Is it possible to put a lot of data frames into a list in some easy way?
Meaning instead of having to write each name manually like the following way:
list_of_df <- list(data_frame1,data_frame2,data_frame3, ....)
I have all the data frames loaded into my work space.
I am going to use the list to loop over all the data frames (to perform the same operations on each data frame).
You can use ls() with get as follows:
l.df <- lapply(ls(), function(x) if (class(get(x)) == "data.frame") get(x))
This'll load all data.frames from your current environment workspace.
Alternatively, as #agstudy suggests, you can use pattern to load just the data.frames you require.
l.df <- lapply(ls(pattern="df[0-9]+"), function(x) get(x))
Loads all data.frames in current environment that begins with df followed by 1 to any amount of numbers.
By far the easiest solution would be to put the data.frame's into a list where you create them. However, assuming you have a character list of object names:
list_df = lapply(list_object_names, get)
where you could construct you list like this (example for 10 objects):
list_object_names = sprintf("data_frame%s", 1:10)
or get all the objects in your current workspace into a list:
list_df = lapply(ls(), get)
names(list_df) = ls()
You can use ls with a specific pattern for example. For example:
some data.frames:
data.frame1 <- data.frame()
data.frame2 <- data.frame()
data.frame3 <- data.frame()
data.frame4 <- data.frame()
list(ls(pattern='data.fra*'))
[[1]]
[1] "data.frame1" "data.frame2" "data.frame3" "data.frame4"

Resources