Group R objects into a list - r

I have loaded a series of SpatialPolygonsDataFrames into my workspace. Each of the named objects has either "_adm0", "_adm1", or "_adm2" attached to the country abbreviation. For Germany, this would look like "DEU_adm0", "DEU_adm1", and "DEU_adm2".
I'm trying to gather all of the "_adm0" data frames into a list which can then be operated on by ldply and fortify. I could do that with,
mylist <- list(DEU_adm0, FRA_adm0, RUS_adm0, etc...) where I write out all of the countries that I want to be included in the list.
But, how do I grab all of the "_adm0" data frames by a pattern?
I have started with the code below, but it doesn't give me the same result as writing out the list by hand:
adm0list <- ls()[str_detect(ls(), "_adm0")]
mylist <- sapply(adm0list, function(x) get(x))
or alternatively,
mylist <- mget(adm0list, .GlobalEnv)
I do get a list of objects with the sapply method, and with mget(), but I'm not seeing why those lists are different from the one built by calling list() with the object names directly. I suspect the answer to that question will tell me why ldply + fortify works with the list() method but not the other two.

You could use the pattern argument of ls and then use the @ extractor for the data.frame portion of your SPDF objects...
# Construct list of objects with mget
ll <- mget( ls( pattern = "_adm0" ) )
# Extract data.frames
out <- lapply( ll , function(x) x@data )
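A minimal sketch of the same idea that runs without the sp package. The ToySPDF class is a hypothetical stand-in for SpatialPolygonsDataFrame (it only has a data slot), and the objects live in a private environment rather than the workspace. It may help show that mget() returns a list named after the objects, and that @ pulls a slot out of each element.

```r
library(methods)

# Hypothetical stand-in for SpatialPolygonsDataFrame: an S4 class with a data slot
setClass("ToySPDF", representation(data = "data.frame"))

# Use a private environment so the example does not depend on the workspace
env <- new.env()
assign("DEU_adm0", new("ToySPDF", data = data.frame(id = 1, name = "Germany")), envir = env)
assign("FRA_adm0", new("ToySPDF", data = data.frame(id = 2, name = "France")), envir = env)
assign("DEU_adm1", new("ToySPDF", data = data.frame(id = 3, name = "Bayern")), envir = env)

# "$" anchors the pattern so only names ending in "_adm0" match
adm0_names <- ls(envir = env, pattern = "_adm0$")
adm0_list <- mget(adm0_names, envir = env)

# Extract the data slot of each S4 object with the @ operator
adm0_data <- lapply(adm0_list, function(x) x@data)
names(adm0_data)
```

Unlike list(DEU_adm0, FRA_adm0), which is unnamed, the list from mget() carries the object names, which is worth keeping in mind when a downstream function treats names as identifiers.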

Related

R Combine more than two lists elements with RegEx

I have multiple lists starting with the same name.
(values_1, values_2,values_n)
Is there a way to combine them like
all_lists <- c(values_*)
As suggested in Ronak Shah's comment:
You have to work with the global environment .GlobalEnv
The function ls returns all the objects already defined in the .GlobalEnv
The pattern parameter allows you to obtain only objects which match the pattern.
ls() returns a character vector with the names of the objects.
To access the value of objects with their names, you have to use the get() function
When you have multiple names, you can use mget(). So the final snippet is
list_data <- mget(ls(pattern = 'values_'))
If you want to do the same with dataframes
Here is a working example:
mtc_1 <- mtcars
mtc_2 <- mtcars
mtc_3 <- mtcars
list_data <- mget(ls(pattern = 'mtc_'))
do.call(rbind, list_data)
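For the original question about lists rather than data frames, mget() yields a list of lists, so one more step is needed to get the equivalent of c(values_1, values_2, ...). A small sketch, assuming made-up contents for the values_* objects and using a private environment in place of .GlobalEnv:

```r
# Private environment standing in for the global environment
env <- new.env()
env$values_1 <- list(1, 2)
env$values_2 <- list("a")
env$values_10 <- list(TRUE)
env$other <- list("not included")

# "^values_" anchors the match at the start of the object name
nms <- ls(envir = env, pattern = "^values_")
list_data <- mget(nms, envir = env)   # a named list of lists

# Flatten one level: the same as calling c() on all matched lists
all_lists <- do.call(c, unname(list_data))
length(all_lists)
```

Note that ls() returns names in sorted order, so values_10 sorts before values_2; if the order of elements matters, reorder nms before calling mget().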

R: transforming multiple sets to dataframes at once

I have 31 datasets corresponding to data about 31 teachers. I need to perform multiple transformations on all these datasets. One of them is transforming all of them into dataframes
class(alexandre)
[1] "tbl_df" "tbl" "data.frame"
As I said, I have 31 similar datasets, and I need to transform all into dataframes. My code to do so has been
alexandre <- as.data.frame(alexandre)
adrian <- as.data.frame(adrian)
akemi <- as.data.frame(akemi)
arcanjo <- as.data.frame(arcanjo)
ana_barbara <- as.data.frame(ana_barbara)
brigida <- as.data.frame(brigida)
cleiton <- as.data.frame(cleiton)
daniela <- as.data.frame(daniela)
davi <- as.data.frame(davi)
eliezer <- as.data.frame(eliezer)
eduardo <- as.data.frame(eduardo)
eustaquio <- as.data.frame(eustaquio)
gilberto <- as.data.frame(gilberto)
gilmar <- as.data.frame(gilmar)
jorge <- as.data.frame(jorge)
juarez <- as.data.frame(juarez)
junior <- as.data.frame(junior)
... and add some rows to this code (31 lines of this). Obviously all these lines of code take too much space and there must be a faster (and more elegant) way to accomplish this. In fact, I tried this
teachers <- c(alexandre, akemi, adrian, brigida, davi, ...)
cnames <- function(x){
colnames(x) <- c(1:18)
}
mapply(cnames, teachers)
Then I would do all the work with a few lines of code. And this method (form a vector containing all datasets, then use mapply on the vector) would make my work much easier because, as I said, I have to perform multiple transformation on all these datasets.
This code does not work, however. I get the following error:
Error in `colnames<-`(`*tmp*`, value = c(1:18)) :
attempt to set 'colnames' on an object with less than two dimensions
This error message is very unenlightening, I find. I have no idea what to do to make the code work, which is obviously why I'm here. Any other methods to accomplish what I'm trying to do are welcome. Thanks.
As commented and often discussed in the R tag of SO, simply use a list to maintain all your individual, similarly structured data frames. Doing so allows you the following benefits:
Easily run operations consistently across all items using loops or apply family calls without separate naming assignments.
Organizes your environment and workspace with maintenance of one object with easy reference by number or name instead of 31 objects flooding your global environment.
Facilitates data frame migrations and handling with rbind, cbind, split, by, or other operations.
To create a list of all current data frames in global environment use eapply or mget filtering on data frame objects. Each returns a named list of data frames.
teachers_df_list <- Filter(is.data.frame, eapply(.GlobalEnv, identity))
teachers_df_list <- Filter(is.data.frame, mget(x=ls()))
Alternatively, source your data frames originally from file sources using list objects such as list.files:
teachers_df_list <- lapply(list.files(...), function(f) read.csv(f, ...))
You lose no data frame functionality when the data frames are stored inside a list.
head(teachers_df_list$alexandre)
tail(teachers_df_list$adrian)
summary(teachers_df_list$akemi)
...
Then run your needed operations with lapply, such as renaming columns with setNames on the right-hand side. Other operations, such as aggregate or lm, follow the same pattern.
new_teachers_df_list <- lapply(teachers_df_list,
  function(df) setNames(df, paste0("col_", 1:18)))
new_teachers_agg_list <- lapply(new_teachers_df_list,
  function(df) aggregate(col_1 ~ col_2, df, sum))
new_teachers_model_list <- lapply(new_teachers_df_list,
  function(df) summary(lm(col_1 ~ col_2, df)))
Even compile all data frames into one master version using do.call + rbind:
# ADD A TEACHER INDICATOR COLUMN
new_teachers_df_list <- Map(function(df, n) transform(df, teacher=n),
new_teachers_df_list, names(new_teachers_df_list))
# BUILD SINGLE DF
teachers_df <- do.call(rbind, new_teachers_df_list)
Even split master version back into individual groupings if needed later on:
# SPLIT BACK TO LIST OF DFs
teachers_df_list <- split(teachers_df, teachers_df$teacher)
Maybe you could use a list to store all your data frames. It seems to work, but you need to find a way to extract each data frame from the list afterwards.
df_1 <- data.frame(c(0, 1, 0), c(3, 4, 5))
df_2 <- data.frame(c(0, 1, 0), c(3, 4, 5))
l <- list(df_1, df_2)
lapply(l, function(x){
colnames(x) <- 1:2
return(x)
})
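The error in the question likely comes from c(): it has no data frame method, so it treats each data frame as a list of columns and concatenates them, and the function then receives plain vectors. A short sketch with two made-up data frames (df_a, df_b are illustrative names) contrasting c() with list():

```r
df_a <- data.frame(x = 1:3, y = 4:6)
df_b <- data.frame(x = 7:9, y = 10:12)

# c() flattens the data frames into one list of four column vectors
flattened <- c(df_a, df_b)
length(flattened)
is.data.frame(flattened[[1]])

# list() keeps each data frame intact as one element
kept <- list(a = df_a, b = df_b)

# The function must return the modified data frame, and the result
# of lapply must be assigned, otherwise the renaming is lost
renamed <- lapply(kept, function(df) {
  colnames(df) <- c("first", "second")
  df
})
colnames(renamed$a)
```

Individual data frames are then available as renamed$a or renamed[["b"]].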

in R: execute function on dataframes whose names are in list

My global environment contains several dataframes. I want to execute functions on only those that contain a specific string in their name. So, I first create a list of these dataframes of interest:
dfs <- ls()[sapply(ls(), function(x) class(get(x))) == 'data.frame']
dfs <- as.data.frame(dfs)
dfs_lst <- agrep("stats", dfs$dfs, ignore.case=FALSE, value=TRUE,
max.distance=0.1, useBytes=FALSE)
dfs_lst correctly returns all dataframes in my global environment containing the string "stats". dfs_lst
chr [1:3] "stats1" "stats2" "stats3".
Now, I want to execute functions on these 3 dataframes, however I do not know how to call them from the dfs_lst. I want something of the kind:
for(i in 1:length(dfs_lst){
# Find dataframe name in dfs_lst, and then use the matching dataframe in
# global environment. So, something of the sort:
for(dfs_lst[i] in ls()){
result[i,] <- dfs_lst[i] %>%
summarise(. , <summarise stuff> )
}
}
For example, for i=1, dfs_lst[1] is dataframe "stats1", I would want to perform the following, and save it in the first row of "results":
for(stats1 in ls()){
result[1,] <- stats1 %>% summarise(. , <summarise stuff> )
}
As @lmo pointed out, it's probably best to store these data.frames together in a single list. Instead of having data.frame objects called "stats1", "stats2", etc, floating around in your environment, a (hacky) way to store all your data.frame objects in a list is this:
dfs <- ls()[sapply(ls(), function(x) class(get(x))) == 'data.frame']
##make an empty list
my_list <- list()
##populate the list
for (dfm_name in dfs) {
my_list[[dfm_name]] <- get(dfm_name)
}
Now you've got a list my_list containing every object of the class data.frame in your environment. This will probably be helpful when you want to work with all data.frames named "statsX":
##find all list objects whose name starts with "stats"
stats_objects <- substr(names(my_list),1,5)=="stats"
results <- matrix(NA, ncol = your_length, nrow = sum(stats_objects))
##now perform intended operations
for (row_num in seq_len(nrow(results))) {
  ##index the result row with the loop variable row_num
  results[row_num, ] <- my_list[stats_objects][[row_num]] %>%
    summarise(. , <summarise stuff> )
}
This should perform as necessary, after a couple of alterations in the code (e.g. your_length needs to be specified, and you wanted all objects whose name contains "stats", so you'll need to work with regular expressions).
What's nice about this is my_list contains all the data.frames, so if you choose to run analysis on data.frames not named "stats" you can still access them with a similar procedure. Hope this helps.
As discussed in the comments, once we have a list of the data frames of interest, it is easier to work with the elements as data frames. The main issue here is that the code holds only the object names, not the actual data.frame objects.
To follow the code and track the data types, I have decomposed it first:
1.
env.list <- ls() # chr vector
2.
env.classes <- sapply(env.list, function(x) class(get(x)))
# list of chr (containing classes), element names: data frame names
3.
dfs <- env.list[env.classes == 'data.frame'] # chr vector
4.
dfs <- as.data.frame(dfs)
# data frame with one column (named "dfs"), containing data.frame names
Now, we can get the list of data.frames:
3.
dfs <- env.list[env.classes == 'data.frame'] # chr vector
dfs.list <- sapply(dfs, get, simplify = FALSE) # named list; simplify = FALSE keeps sapply from collapsing same-shaped data frames into a matrix
grep can be applied now to names(dfs.list) to get the interesting data frames.
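Putting the steps together, a minimal base R sketch. The data frames, the column v, and the two summaries (mean and row count) are placeholders for whatever the real summarise call computes, and a private environment stands in for the global one:

```r
# Private environment with made-up objects
env <- new.env()
env$stats1 <- data.frame(v = c(1, 2, 3))
env$stats2 <- data.frame(v = c(10, 20))
env$other <- data.frame(v = 0)
env$n <- 5   # not a data frame, should be filtered out

# Named list of every object, then keep only the data frames
all_objs <- mget(ls(envir = env), envir = env)
dfs_list <- Filter(is.data.frame, all_objs)

# Select by name; fixed = TRUE matches the literal string "stats"
stats_list <- dfs_list[grep("stats", names(dfs_list), fixed = TRUE)]

# One summary row per data frame, bound into a single result
result <- do.call(rbind, lapply(stats_list, function(df) {
  data.frame(mean_v = mean(df$v), n = nrow(df))
}))
result
```

Building the rows with lapply and binding them once avoids having to preallocate a result object and index into it by row.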

Apply an already defined function to all dataframes at once [duplicate]

I already have defined a function (which works fine). Nevertheless, I have 20 dataframes in the working space to which I want to lapply the same function (dat1 to dat20).
So far it looks like this:
dat1 <- func(dat=dat1)
dat2 <- func(dat=dat2)
dat3 <- func(dat=dat3)
dat4 <- func(dat=dat4)
...
dat20 <- func(dat=dat20)
However, is there a way to do this more elegantly with a shorter command, i.e. to lapply the function to all dataframes at once?
I tried this, but it didn't work:
mylist <- paste0("dat", 1:20, sep="")
lapply(mylist, func)
Try something like:
lapply(mget(ls(pattern="dat")),func)
Some details: The pattern argument in ls will limit which object names it lists (e.g., I assume you have other objects including your function in the global environment). mget retrieves those objects from the environment and turns them into a list, which you can then lapply your function over.
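If you have the name of a variable, you can use get() to retrieve the value from the workspace. The corresponding assignment function is called assign(). Inside lapply, assign() must be told to write to the global environment, otherwise it only creates a local variable inside the anonymous function:
mylist <- paste0("dat", 1:20)
invisible(lapply(mylist, function(name) assign(name, func(dat=get(name)), envir = .GlobalEnv)))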
The desired behavior can be obtained using eval instead of lapply.
Assume mylist to be the names of the data.frames you want to apply func to. mylist might be generated using
mylist <- ls(pattern="dat")
Then you can use the following code to do exactly what you want:
cCmd <- paste(mylist , "<- func(" ,mylist,")", sep="")
eCmd <- parse(text=cCmd)
eval(eCmd)
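An alternative sketch that keeps everything in a list and only writes back at the end with list2env(). Here func is a hypothetical stand-in that adds a total column, the dat objects are made up, and a private environment replaces the workspace:

```r
# Hypothetical stand-in for the already defined function
func <- function(dat) {
  dat$total <- dat$a + dat$b
  dat
}

# Private environment holding dat1..dat3
env <- new.env()
for (i in 1:3) {
  assign(paste0("dat", i), data.frame(a = i, b = 10 * i), envir = env)
}

# Anchored pattern: "dat" followed only by digits
dat_names <- ls(envir = env, pattern = "^dat[0-9]+$")
dat_list <- lapply(mget(dat_names, envir = env), func)

# Optional: overwrite the original objects with the processed versions
list2env(dat_list, envir = env)
env$dat2$total
```

If the later steps also operate on all data frames, it may be simpler to skip the list2env() call and keep working with dat_list.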

Put multiple data frames into list (smart way) [duplicate]

Is it possible to put a lot of data frames into a list in some easy way?
Meaning instead of having to write each name manually like the following way:
list_of_df <- list(data_frame1,data_frame2,data_frame3, ....)
I have all the data frames loaded into my work space.
I am going to use the list to loop over all the data frames (to perform the same operations on each data frame).
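You can use ls() with mget as follows:
l.df <- Filter(is.data.frame, mget(ls()))
This returns a named list of all data.frames in your current environment. Testing with is.data.frame is safer than comparing class() to "data.frame", since objects such as tibbles have more than one class.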
Alternatively, as @agstudy suggests, you can use pattern to load just the data.frames you require.
l.df <- lapply(ls(pattern="^df[0-9]+$"), function(x) get(x))
This loads all objects in the current environment whose names consist of df followed by one or more digits.
By far the easiest solution would be to put the data.frames into a list where you create them. However, assuming you have a character vector of object names:
list_df = lapply(list_object_names, get)
where you could construct your vector of names like this (example for 10 objects):
list_object_names = sprintf("data_frame%s", 1:10)
or get all the objects in your current workspace into a list:
list_df = lapply(ls(), get)
names(list_df) = ls()
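The first suggestion, creating the data frames inside a list from the start, might look like the sketch below. The columns id and value are made up for illustration; in practice the body of the function would be whatever reads or builds each data frame.

```r
# Build the data frames directly as list elements
list_df <- lapply(1:3, function(i) data.frame(id = i, value = i^2))

# Name the elements so they can be referenced like separate objects
names(list_df) <- sprintf("data_frame%s", 1:3)

# Loop over all of them with one call
row_counts <- vapply(list_df, nrow, integer(1))

list_df$data_frame2$value
```

With this layout there is nothing to collect from the workspace afterwards, and list_df$data_frame2 reads almost like the standalone object would.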
You can use ls with a specific pattern to find the object names. For example, given some data.frames:
data.frame1 <- data.frame()
data.frame2 <- data.frame()
data.frame3 <- data.frame()
data.frame4 <- data.frame()
list(ls(pattern='data.fra*'))
[[1]]
[1] "data.frame1" "data.frame2" "data.frame3" "data.frame4"
Note that this is a list holding a single character vector of names, not the data frames themselves; pass the names to mget() to retrieve the objects.
