Put multiple data frames into list (smart way) [duplicate] - r

This question already has answers here:
How do I make a list of data frames?
(10 answers)
Closed 6 years ago.
Is it possible to put a lot of data frames into a list in some easy way?
Meaning instead of having to write each name manually like the following way:
list_of_df <- list(data_frame1,data_frame2,data_frame3, ....)
I have all the data frames loaded into my work space.
I am going to use the list to loop over all the data frames (to perform the same operations on each data frame).

You can use ls() with mget() as follows:
l.df <- Filter(is.data.frame, mget(ls()))
This collects every data frame in your current environment into a named list.
Alternatively, as @agstudy suggests, you can use pattern to collect just the data frames you require.
l.df <- lapply(ls(pattern="^df[0-9]+"), function(x) get(x))
This collects every object in the current environment whose name begins with df followed by one or more digits.
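A minimal sketch of both selections, run against a throwaway environment so the result does not depend on whatever else is loaded. The environment ws and the objects in it are assumptions made up for illustration; in an interactive session you would drop the envir arguments and work on the global workspace.

```r
# Hypothetical workspace, kept in its own environment for reproducibility
ws <- new.env()
assign("df1", data.frame(a = 1:3), envir = ws)
assign("df2", data.frame(a = 4:6), envir = ws)
assign("df_notes", "not a data frame", envir = ws)
assign("n", 10, envir = ws)

# Every data frame, whatever its name
all_dfs <- Filter(is.data.frame, mget(ls(ws), envir = ws))

# Only objects named df followed by digits (the anchors ^ and $ exclude df_notes)
numbered_dfs <- mget(ls(ws, pattern = "^df[0-9]+$"), envir = ws)

names(all_dfs)
names(numbered_dfs)
```

Both results are named lists, so each element can still be traced back to the object it came from.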

By far the easiest solution would be to put the data frames into a list where you create them. However, assuming you have a character vector of object names:
list_df = lapply(list_object_names, get)
where you could construct your vector of names like this (example for 10 objects):
list_object_names = sprintf("data_frame%s", 1:10)
or get all the objects in your current workspace into a list (capture the names first, because list_df itself shows up in ls() once it has been created):
object_names = ls()
list_df = lapply(object_names, get)
names(list_df) = object_names
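The first suggestion, building the list at creation time, could look roughly like this. The data frame contents and the ids vector are invented for the example; only the naming scheme follows the answer.

```r
# Hypothetical example: create three small data frames directly inside a list
ids <- 1:3
list_df <- lapply(ids, function(i) data.frame(id = i, value = i * c(1, 2, 3)))
names(list_df) <- sprintf("data_frame%s", ids)

# The same operation can now be applied to every data frame in one call
row_counts <- vapply(list_df, nrow, integer(1))
row_counts
```

With this layout there is never a set of loose objects to collect afterwards, and list_df$data_frame2 still gives access to an individual data frame.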

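You can use ls with a specific pattern. For example, given some data frames:
data.frame1 <- data.frame()
data.frame2 <- data.frame()
data.frame3 <- data.frame()
data.frame4 <- data.frame()
list(ls(pattern='^data\\.frame'))
[[1]]
[1] "data.frame1" "data.frame2" "data.frame3" "data.frame4"
Note that this returns only the object names. To get the data frames themselves, pass the names to mget(), as in mget(ls(pattern='^data\\.frame')).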

Related

Create a variable in Multiple Dataframes in R

I want to create a ranked variable that will appear in multiple data frames.
I'm having trouble getting the ranked variable into the data frames.
Simple code. Can't make it happen.
dfList <- list(df1,df2,df3)
for (df in dfList){
rAchievement <- rank(df["Achievement"])
df[[rAchievement]]<-rAchievement
}
The result I want is for df1, df2 and df3 to each gain a new variable called rAchievement.
I'm struggling!! And my apologies. I know there are similar questions out there. I have reviewed them all. None seem to work and accepted answers are rare.
Any help would be MUCH appreciated. Thank you!
We can use lapply with transform in a single line
dfList <- lapply(dfList, transform, rAchievement = rank(Achievement))
If we need to update the objects 'df1', 'df2', 'df3', set the names of 'dfList' to the object names and use list2env (not recommended, though)
names(dfList) <- paste0("df", 1:3)
list2env(dfList, .GlobalEnv)
Or, using a for loop, we loop over the sequence of the list, extract each list element, and assign a new column based on the rank of 'Achievement'
for(i in seq_along(dfList)) {
  dfList[[i]][['rAchievement']] <- rank(dfList[[i]]$Achievement)
}
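A small worked example of the lapply/transform approach, using made-up Achievement scores. It also shows why the loop in the question had no visible effect: for (df in dfList) works on a copy of each element, so neither the list nor the original objects change.

```r
# Hypothetical input data
df1 <- data.frame(Achievement = c(10, 30, 20))
df2 <- data.frame(Achievement = c(5, 1, 9))
dfList <- list(df1 = df1, df2 = df2)

# Add the ranked column to every element of the list
dfList <- lapply(dfList, transform, rAchievement = rank(Achievement))

dfList$df1$rAchievement          # 1 3 2
dfList$df2$rAchievement          # 2 1 3

# The standalone object df1 is untouched; only the list elements changed
"rAchievement" %in% names(df1)   # FALSE
```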

How to create multiple dataframes with lapply()?

I want to do the same thing to create several data frames. Can I use lapply to achieve this?
I tried the following, but it did not work:
xx<-c("a1","b1")
lapply(xx, function(x){
x<-data.frame(c(1,2,3,4),"1")
})
I hope to get two data frames, like
a1<-data.frame(c(1,2,3,4),"1")
b1<-data.frame(c(1,2,3,4),"1")
An option that assigns to the .GlobalEnv. As pointed out, this is less efficient, but it answers the OP's question as asked:
lapply(xx, function(x) assign(x, data.frame(A=c(1,2,3,4),
                                            B="1"),
                              envir=.GlobalEnv))
You can then refer to each data frame by its name: a1, b1.
You could try using lapply over the xx vector of names to build a named list of the data frames:
xx <- c("a1", "b1")
lst <- lapply(setNames(xx, xx), function(x) {
  data.frame(c(1,2,3,4), "1")
})
Then, you may access each data frame using the list, e.g. lst$a1.
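One detail worth checking when filling a list by name: an assignment such as lst[[x]] <- ... inside an anonymous function changes a local copy of lst only, whereas the body of a for loop runs in the calling environment. The sketch below illustrates the difference; the column names A and B are assumptions chosen for the example.

```r
xx <- c("a1", "b1")
lst <- list()

# Assignment inside the function body modifies a local copy of lst
invisible(lapply(xx, function(x) {
  lst[[x]] <- data.frame(A = 1:4, B = "1")
}))
n_after_lapply <- length(lst)   # still 0

# A for loop assigns in the current environment, so lst is filled
for (x in xx) {
  lst[[x]] <- data.frame(A = 1:4, B = "1")
}
names(lst)
```

This is why the usual pattern is to keep the return value of lapply rather than assign from inside it.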

in R: execute function on dataframes whose names are in list

My global environment contains several dataframes. I want to execute functions on only those that contain a specific string in their name. So, I first create a list of these dataframes of interest:
dfs <- ls()[sapply(ls(), function(x) class(get(x))) == 'data.frame']
dfs <- as.data.frame(dfs)
dfs_lst <- agrep("stats", dfs$dfs, ignore.case=FALSE, value=TRUE,
max.distance=0.1, useBytes=FALSE)
dfs_lst correctly returns all dataframes in my global environment containing the string "stats":
dfs_lst
chr [1:3] "stats1" "stats2" "stats3"
Now, I want to execute functions on these 3 dataframes, however I do not know how to call them from the dfs_lst. I want something of the kind:
for(i in 1:length(dfs_lst){
# Find dataframe name in dfs_lst, and then use the matching dataframe in
# global environment. So, something of the sort:
for(dfs_lst[i] in ls()){
result[i,] <- dfs_lst[i] %>%
summarise(. , <summarise stuff> )
}
}
For example, for i=1, dfs_lst[1] is dataframe "stats1", I would want to perform the following, and save it in the first row of "results":
for(stats1 in ls()){
result[1,] <- stats1 %>% summarise(. , <summarise stuff> )
}
As @lmo pointed out, it's probably best to store these data.frames together in a single list. Instead of having data.frame objects called "stats1", "stats2", etc., floating around in your environment, a (hacky) way to store all your data.frame objects in a list is this:
dfs <- ls()[sapply(ls(), function(x) class(get(x))) == 'data.frame']
##make an empty list
my_list <- list()
##populate the list
for (dfm_name in dfs) {
my_list[[dfm_name]] <- get(dfm_name)
}
Now you've got a list my_list containing every object of class data.frame in your environment. This will probably be helpful when you want to work with all data.frames named "statsX":
##find all list objects whose name starts with "stats"
stats_objects <- substr(names(my_list),1,5)=="stats"
results <- matrix(NA, ncol = your_length, nrow = sum(stats_objects))
##now perform intended operations
for (row_num in seq_len(nrow(results))) {
  results[row_num,] <- my_list[stats_objects][[row_num]] %>%
    summarise(. , <summarise stuff> )
}
This should work after a couple of alterations to the code (e.g. your_length needs to be specified, and since you wanted all objects whose name contains "stats", you'll need to work with regular expressions).
What's nice about this is my_list contains all the data.frames, so if you choose to run analysis on data.frames not named "stats" you can still access them with a similar procedure. Hope this helps.
As discussed in the comments, if we have a list of the interesting data frames, it will be easier to deal with the elements as data frames. So, the main issue here seems to be having just the object names and not the actual data.frame objects.
To follow the code and track the data types, I have decomposed it first:
1.
env.list <- ls() # chr vector
2.
env.classes <- sapply(env.list, function(x) class(get(x)))
# list of chr (containing classes), element names: data frame names
3.
dfs <- env.list[env.classes == 'data.frame'] # chr vector
4.
dfs <- as.data.frame(dfs)
# data frame with one column (named "dfs"), containing data.frame names
Now, we can get the list of data.frames by replacing step 4:
3.
dfs <- env.list[env.classes == 'data.frame'] # chr vector
4.
dfs.list <- mget(dfs) # named list of data frames
grep can be applied now to names(dfs.list) to get the interesting data frames.
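Put together, the name-based selection and the per-data-frame summary might look like the sketch below. It uses base R instead of dplyr, and the environment ws, the column x, and the summary columns n and mean_x are all assumptions for illustration, since the question leaves the actual summary unspecified.

```r
# Hypothetical workspace with two "stats" data frames and one unrelated one
ws <- new.env()
assign("stats1", data.frame(x = c(1, 2, 3)), envir = ws)
assign("stats2", data.frame(x = c(10, 20)), envir = ws)
assign("other", data.frame(x = 0), envir = ws)

# Named list of all data frames, then keep those whose name contains "stats"
dfs.list <- Filter(is.data.frame, mget(ls(ws), envir = ws))
stats.list <- dfs.list[grep("stats", names(dfs.list))]

# One summary row per data frame, bound into a single result
result <- do.call(rbind, lapply(stats.list, function(d) {
  data.frame(n = nrow(d), mean_x = mean(d$x))
}))
result
```

The row names of result come from the list names, so each summary row stays labelled with the data frame it describes.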

Group R objects into a list

I have loaded a series of SpatialPolygonsDataFrames into my workspace. Each of the named objects has either "_adm0", "_adm1", or "_adm2" attached to the country abbreviation. For Germany, this would look like "DEU_adm0", "DEU_adm1", and "DEU_adm2".
I'm trying to gather all of the "_adm0" data frames into a list which can then be operated on by ldply and fortify. I could do that with,
mylist <- list(DEU_adm0, FRA_adm0, RUS_adm0, etc...) where I write out all of the countries that I want to be included in the list.
But, how do I grab all of the "_adm0" data frames by a pattern?
I have started with the code below but it doesn't give me the desired result as writing out
adm0list <- ls()[str_detect(ls(), "_adm0")]
mylist <- sapply(adm0list, function(x) get(x))
or alternatively,
mylist <- mget(adm0list, .GlobalEnv)
I do get a list of objects with the sapply method, and using mget(), but I'm not seeing why those lists are different than using list() with the object names directly. I suspect the answer to that question will tell me why ldply + fortify works with the list()method but not the other two.
You could use the pattern argument of ls and then use the @ extractor for the data.frame portion of your SPDF objects...
# Construct list of objects with mget
ll <- mget( ls( pattern = "_adm0" ) )
# Extract data.frames
out <- lapply( ll , function(x) x@data )
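To try the slot extraction without loading any spatial data, here is a sketch with a toy S4 class. ToySPDF is a hypothetical stand-in for SpatialPolygonsDataFrame that carries only a data slot, and the environment ws and its contents are invented for the example.

```r
library(methods)

# Hypothetical stand-in class with a single data slot
setClass("ToySPDF", slots = c(data = "data.frame"))

ws <- new.env()
assign("DEU_adm0", new("ToySPDF", data = data.frame(name = "Germany")), envir = ws)
assign("FRA_adm0", new("ToySPDF", data = data.frame(name = "France")), envir = ws)
assign("DEU_adm1", new("ToySPDF", data = data.frame(name = c("Bayern", "Hessen"))), envir = ws)

# Collect only the _adm0 objects, then pull out each data slot
ll <- mget(ls(ws, pattern = "_adm0$"), envir = ws)
out <- lapply(ll, function(x) x@data)
names(out)
```

Anchoring the pattern with $ keeps names that merely contain "_adm0" somewhere in the middle out of the list.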

Apply an already defined function to all dataframes at once [duplicate]

This question already has an answer here:
How to apply a function to a certain column for all the data frames in environment in R
(1 answer)
Closed 1 year ago.
I already have defined a function (which works fine). Nevertheless, I have 20 dataframes in the working space to which I want to lapply the same function (dat1 to dat20).
So far it looks like this:
dat1 <- func(dat=dat1)
dat2 <- func(dat=dat2)
dat3 <- func(dat=dat3)
dat4 <- func(dat=dat4)
...
dat20 <- func(dat=dat20)
However, is there a way to do this more elegant with a shorter command, i.e. to lapply the function to all dataframes at once?
I tried this, but it didn't work:
mylist <- paste0("dat", 1:20, sep="")
lapply(mylist, func)
Try something like:
lapply(mget(ls(pattern="dat")),func)
Some details: The pattern argument in ls will limit which object names it lists (e.g., I assume you have other objects including your function in the global environment). mget retrieves those objects from the environment and turns them into a list, which you can then lapply your function over.
If you have the name of a variable, you can use get() to retrieve the value from the workspace. The corresponding assignment function is called assign():
mylist <- paste0("dat", 1:20)
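# envir=.GlobalEnv is required: without it, assign() writes to the function's local environment
invisible(lapply(mylist, function(name) assign(name, func(dat=get(name, envir=.GlobalEnv)), envir=.GlobalEnv)))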
The desired behavior can be obtained using eval instead of lapply.
Assume mylist to be the names of the data frames you want to apply func to. mylist might be generated using
mylist <- ls(pattern="dat")
Then you can use the following code to do exactly what you want:
cCmd <- paste(mylist , "<- func(" ,mylist,")", sep="")
eCmd <- parse(text=cCmd)
eval(eCmd)
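An alternative to building and evaluating code strings is to process the data frames as a list and write the results back with list2env. The sketch below assumes a placeholder func() that adds a doubled column, and it works in a separate environment ws so that it is reproducible; against the global workspace you would use .GlobalEnv instead.

```r
# Hypothetical workspace and a placeholder func()
ws <- new.env()
assign("dat1", data.frame(x = 1:2), envir = ws)
assign("dat2", data.frame(x = 3:4), envir = ws)
func <- function(dat) transform(dat, y = x * 2)

# Collect, process, and write back under the original names
dat_list <- lapply(mget(ls(ws, pattern = "^dat[0-9]+$"), envir = ws), func)
invisible(list2env(dat_list, envir = ws))

get("dat1", envir = ws)
```

Keeping dat_list around is usually enough on its own; the list2env step is only needed if later code expects the individual objects.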
