subset multiple data tables using lapply - r

I have multiple data tables and all have a common column called ID. I have a vector vec that contains a set of ID values.
I would like to use lapply to subset all the data tables using vec.
I understand how to use lapply to subset the data tables, but my question is how to assign the subsetted results back to the original data tables.
Here is what I tried:
tables<-c("dt1","dt2","dt3","dt4")
lapply(mget(tables),function(x)x[ID %in% vec,])
The above returns the subsets of all the data tables, but how do I assign them back to dt1, dt2, dt3 and dt4?

I would keep the datasets in a list rather than updating the objects in the global environment, since most operations (including reading the files and writing the output files) can be done within the list. But if you insist, list2env will overwrite the original objects with their subsets. This works because mget returns a list whose elements are named after the objects, and lapply keeps those names:
lst <- lapply(mget(tables),function(x)x[ID %in% vec,])
list2env(lst, envir=.GlobalEnv)
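The round trip can be sketched on toy data as follows. This is a minimal illustration, not the asker's data: base data frames stand in for the data.tables (hence `x$ID` rather than the data.table shorthand `ID`), and the object contents and `vec` values are made up.

```r
# Toy data: base data frames stand in for the data.tables in the question.
env <- environment()  # at top level this is the global environment
dt1 <- data.frame(ID = 1:5, x = letters[1:5])
dt2 <- data.frame(ID = 3:8, y = LETTERS[3:8])
vec <- c(2, 3, 4)
tables <- c("dt1", "dt2")

# mget() returns a list named after the objects; lapply() keeps those names.
lst <- lapply(mget(tables, envir = env),
              function(x) x[x$ID %in% vec, , drop = FALSE])

# Overwrite each original object with the list element of the same name.
list2env(lst, envir = env)

nrow(dt1)  # rows left in dt1 after subsetting
nrow(dt2)  # rows left in dt2 after subsetting
```

Because the list names drive the assignment, renaming the list elements before calling list2env would create new objects instead of overwriting the originals.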

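You could also keep the datasets in a named list. mget already names the list elements after the objects, so the setNames call below is redundant but harmless: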
tables <- c("dt1","dt2","dt3","dt4")
dflist <- lapply(mget(tables),function(x)x[ID %in% vec,])
dflist <- setNames(dflist, tables)

Related

Is there a way to extract a data frame from a list, and assign the data frame to an object with a dynamic name?

I have a list containing many named data frames. I am trying to find a way to extract each data frame from this list. Ultimately, the goal is to assign each data frame in the list to an object according to the name that it has in the list, allowing me to reference the data frames directly instead of through the list (eg. dataframe instead of LIST[[dataframe]])
Here is an example similar to what I am working with.
library(googlesheets4)
install.packages("dplyr")
library(dplyr)
gs4_deauth()
TABLES <- list("Test1", "Test2")
readTable <- function(TABLES) {
  TABLES <- range_read(as_sheets_id("SHEET ID"), sheet = TABLES)
  TABLES <- as.data.frame(TABLES)
  TABLES <- TABLES %>%
    transmute(Column1 = as.character(Column1), Column2 = as.character(Column2))
  return(TABLES)
}
LIST <- lapply(TABLES, readTable)
names(LIST) <- TABLES
I know that this could be done manually, but I'm trying to find a way to automate this process. Any advice would be helpful. Thanks in advance.
If named_dfs is a named list where each element is a dataframe you can use the assign function to achieve your goal.
Map(assign, names(named_dfs), named_dfs, pos = 1)
For each name, it assigns (equivalent to <- operator) the corresponding dataframe object.
Map(function(x, y) assign(x, y, envir = globalenv()), names(named_dfs), named_dfs)
Should also work.
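As an alternative to the two Map calls, base R's list2env does the same thing in one call. A minimal sketch, assuming a toy `named_dfs` (the contents are made up) and a fresh environment called `target` so the example does not touch the workspace:

```r
# Hypothetical named list of data frames, standing in for LIST / named_dfs.
named_dfs <- list(
  Test1 = data.frame(Column1 = c("a", "b"), Column2 = c("c", "d")),
  Test2 = data.frame(Column1 = "e", Column2 = "f")
)

# Target environment; a fresh one here so the sketch leaves the workspace alone.
# Pass globalenv() instead to create top-level objects.
target <- new.env()

# One call creates an object per list element, named after that element.
list2env(named_dfs, envir = target)

ls(target)
identical(get("Test1", envir = target), named_dfs$Test1)
```

Unnamed list elements cannot be assigned this way, so the list must be fully named first (as `names(LIST) <- TABLES` does in the question).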

Create new renamed dataframes based on subset of current dataframes in a loop

I'm working with approximately 400 dataframes so I need this to be able to be completed in a loop-like process.
I want to create a copy of each of my dataframes by selecting a subset of rows based on the time points. I can do this manually one at a time, but can't figure out how to loop it. All of my dataframes are currently in a list together. Ideally I'd like the new dataframes to be renamed by adding a small string to the original name, i.e. df is the original and df_t is the subset that's been created. It'd also be really helpful if all of these dataframe copies could be put into a list together.
My current code that works for a single dataframe:
df_t <- with(df, df[hour(columnname) > 5 | hour(columnname) <20,])
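You can apply the code that works for one dataframe to a list of dataframes with lapply. Assuming the list where all the dataframes are stored is called list_df, pass it to lapply directly (mget is only needed when you have a character vector of object names). Note that with | the condition is true for every hour, so & is used here to keep hours strictly between 5 and 20:
library(lubridate)
out <- lapply(list_df, function(df) subset(df, hour(columnname) > 5 & hour(columnname) < 20))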
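If you want to name each dataframe in the list, you can do
names(out) <- paste0("df_t", seq_along(out))
or, if list_df is already named, append the suffix to the original names (lapply keeps them):
names(out) <- paste0(names(out), "_t")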
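A self-contained sketch of the whole pattern on toy data. The column name `time`, the helpers `mk` and `hour_of`, and the list contents are all made up for illustration. `hour_of` is a base R stand-in for lubridate's hour, used only so the example runs without extra packages. The filter assumes the intent is "hour strictly between 5 and 20".

```r
# Toy list of data frames; "time" is a hypothetical POSIXct column.
mk <- function(hours) {
  data.frame(time = as.POSIXct(sprintf("2024-01-01 %02d:00:00", hours), tz = "UTC"))
}
list_df <- list(a = mk(c(3L, 8L, 21L)), b = mk(c(6L, 12L, 19L, 23L)))

# Base R stand-in for lubridate::hour(): the hour field of a POSIXlt value.
hour_of <- function(x) as.POSIXlt(x)$hour

# Keep rows whose hour is strictly between 5 and 20.
out <- lapply(list_df, function(df) {
  h <- hour_of(df$time)
  df[h > 5 & h < 20, , drop = FALSE]
})

# Append "_t" to each original name.
names(out) <- paste0(names(list_df), "_t")

names(out)
sapply(out, nrow)
```

Keeping the subsets in `out` rather than as 400 separate objects means later steps can reuse the same lapply pattern.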

R. Apply same function to similar data frames. Generate new data frames with same name pattern

I have seen similar posts but none that addresses this question specifically. I have 22 data frames called chr1,chr2,chr3,...,chr22. I want to apply a home-made function "diff_set" to all these data frames and generate 22 new data frames with names chr1.1,chr2.1,chr3.1,...,chr22.1. The function is applied to one of the columns. For instance, for chr1, I apply diff_set and generate chr1.1:
chr1.1 = diff_set(chr1$POSITION, 200000)
Any suggestion is welcome !
Simply call lapply with your function, diff_set, on a list of the dataframes and rename the output list. Then, only if really needed (it is best avoided), run list2env to save the individual dfs as separate objects:
output_list <- lapply(mget(paste0("chr", 1:22)),
                      function(df) diff_set(df$POSITION, 200000))
output_list <- setNames(output_list, paste0("chr", 1:22, ".1"))
# FIRST THREE DFS
output_list$chr1.1
output_list$chr2.1
output_list$chr3.1
# OUTPUT EACH DF AS SEPARATE OBJECT
# (BUT CONSIDER AVOIDING THIS AS YOU FLOOD GLOBAL ENVIRONMENT)
list2env(output_list, envir=.GlobalEnv)
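The same pattern on two toy data frames, as a rough sketch. The asker's diff_set is not shown, so the version below is a hypothetical stand-in (it keeps the first position plus any position more than `gap` away from its predecessor), and the POSITION values are invented.

```r
# Hypothetical stand-in for the asker's diff_set(); the real function is not shown.
# It keeps the first position and any position more than `gap` past the previous one.
diff_set <- function(pos, gap) {
  data.frame(POSITION = pos[c(TRUE, diff(pos) > gap)])
}

env <- environment()  # at top level this is the global environment
chr1 <- data.frame(POSITION = c(100, 150000, 400000))
chr2 <- data.frame(POSITION = c(50, 300000, 350000))

in_names <- paste0("chr", 1:2)
output_list <- lapply(mget(in_names, envir = env),
                      function(df) diff_set(df$POSITION, 200000))

# Derive the new names from the old ones by appending ".1".
names(output_list) <- paste0(in_names, ".1")

names(output_list)
output_list$chr1.1
```

Building the new names with string concatenation avoids relying on how numbers such as 1.1 are converted to text.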

Storing dataframes in a list

I'm trying to store a bunch of dataframes in a list, and each of these dataframes has column names that are important (they are stock names, which are different for each dataframe).
I'm storing them in a list because this way it can be done with a foreach loop, which will allow me to run this beforehand, then use the list as a database of information.
Right now I have:
Y.matrices <- foreach(i = (1:600)) %dopar% {
df = data.frame(data)
return(df)
}
The issue with this is once I store them, I'm not sure how to get the data frames back. If I do:
unlist(Y.matrices[1])
I get a long numeric vector that has lost the column names. Is there some other way to store these data frames (ie, perhaps not in a list) that would enable me to preserve the formats?
Thanks!
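To access one individual dataframe, use Y.matrices[[i]], where i is the index of the dataframe you want. Double brackets return the dataframe itself, whereas Y.matrices[1] returns a list of length one, which is why unlist flattened it into a vector. If you need a single dataframe that merges all 600, you can use: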
library(dplyr)
df1 <- bind_rows(Y.matrices, .id = "df")
The .id argument adds a column named df holding each dataframe's position in the list or, if the list is named, its name.
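The difference between the two kinds of brackets can be seen in a small sketch. The two-element `Y.matrices` and its ticker-style column names are made up for illustration.

```r
# Toy stand-in for the 600-element list; the column names are hypothetical tickers.
Y.matrices <- list(
  data.frame(AAPL = c(1, 2), MSFT = c(3, 4)),
  data.frame(GOOG = c(5, 6), AMZN = c(7, 8))
)

one <- Y.matrices[[1]]  # the data frame itself, column names intact
sub <- Y.matrices[1]    # still a list, of length 1

class(one)
colnames(one)
class(sub)
```

So storing the data frames in a list loses nothing. Only the combination of single brackets and unlist discards the structure.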

Binding data frames stored in a list in R

I have several data frames stored in R memory among several other objects.
Their particularity is that they are all named as "Station_Year.df".
I want to merge all these data frames into one.
I tried:
df_list <- ls(pattern=".df")
dataset <- rbind(df_list)
But I get a data frame with the names of the data frames...
ls only returns the names of the objects, so use mget to retrieve the dataframes named in df_list. You can do:
dataset <- do.call(rbind, mget(df_list))
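Note that this requires all the dataframes to have the same columns. Also, in pattern=".df" the dot matches any character; pattern="\\.df$" matches only names that end in .df. You may also find the merge function useful.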
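Put together, the steps look roughly like this. The sketch uses a separate environment `env` and invented object names so that it runs without touching the workspace; with objects in the global environment, the `envir` arguments can be dropped.

```r
# Toy objects in a scratch environment; the names and values are made up.
env <- new.env()
assign("Oslo_2019.df", data.frame(temp = c(1.5, 2.0)), envir = env)
assign("Oslo_2020.df", data.frame(temp = 3.1), envir = env)
assign("other", 42, envir = env)  # not a data frame; must not be picked up

# Anchored pattern: a literal ".df" at the end of the name.
df_list <- ls(envir = env, pattern = "\\.df$")

# mget() turns the names into a named list of data frames; rbind stacks them.
dataset <- do.call(rbind, mget(df_list, envir = env))

df_list
nrow(dataset)
```

The row names of `dataset` are built from the list names, which gives a simple way to trace each row back to its source data frame.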
Thanks alexis_laz, I forgot the do.call.
