Storing dataframes in a list - r

I'm trying to store a bunch of dataframes in a list, and each of these dataframes has column names that are important (they are stock names, which are different for each dataframe).
I'm storing them in a list because this way it can be done with a foreach loop, which will allow me to run this beforehand, then use the list as a database of information.
Right now I have:
library(foreach)  # a parallel backend (e.g. doParallel) must also be registered for %dopar%

Y.matrices <- foreach(i = 1:600) %dopar% {
  df <- data.frame(data)  # 'data' stands in for whatever is computed in iteration i
  return(df)
}
The issue with this is once I store them, I'm not sure how to get the data frames back. If I do:
unlist(Y.matrices[1])
I get a long numeric vector that has lost the column names. Is there some other way to store these data frames (i.e., perhaps not in a list) that would let me preserve their format?
Thanks!

To access a single dataframe, you can use Y.matrices[[#]], where # is the index of the dataframe you want. If instead you need one merged dataframe containing all 600 dataframes, you can use:
library(dplyr)
df1 <- bind_rows(Y.matrices, .id = "df")
The .id column is filled with the position of each data.frame in the list, or with its name if the list elements are named.
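As a quick illustration (a minimal sketch with made-up stock columns, since the original data isn't shown), [[ ]] extraction keeps each data frame and its column names intact, whereas unlist() flattens it into the numeric vector described in the question:
Y.matrices <- list(data.frame(AAPL = 1:3, MSFT = 4:6),
                   data.frame(GOOG = 7:9, AMZN = 10:12))
Y.matrices[[1]]            # a data.frame with columns AAPL and MSFT preserved
colnames(Y.matrices[[1]])  # "AAPL" "MSFT"
unlist(Y.matrices[1])      # a flat named numeric vector -- the problem described above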

Related

Is there a way to extract a data frame from a list, and assign the data frame to an object with a dynamic name?

I have a list containing many named data frames. I am trying to find a way to extract each data frame from this list. Ultimately, the goal is to assign each data frame in the list to an object named after its name in the list, allowing me to reference the data frames directly instead of through the list (e.g. dataframe instead of LIST[[dataframe]]).
Here is an example similar to what I am working with.
library(googlesheets4)
install.packages("dplyr")
library(dplyr)
gs4_deauth()
TABLES <- list("Test1", "Test2")
readTable <- function(TABLES){
  TABLES <- range_read(as_sheets_id("SHEET ID"), sheet = TABLES)
  TABLES <- as.data.frame(TABLES)
  TABLES <- TABLES %>%
    transmute(Column1 = as.character(Column1), Column2 = as.character(Column2))
  return(TABLES)
}
LIST <- lapply(TABLES, readTable)
names(LIST) <- TABLES
I know that this could be done manually, but I'm trying to find a way to automate this process. Any advice would be helpful. Thanks in advance.
If named_dfs is a named list where each element is a dataframe, you can use the assign function to achieve your goal.
Map(assign, names(named_dfs), named_dfs, pos = 1)
For each name, it assigns (the equivalent of the <- operator) the corresponding dataframe object; pos = 1 puts it in the global environment.
Map(function(x, y) assign(x, y, envir = globalenv()), names(named_dfs), named_dfs)
Should also work.
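A base R alternative that does the same in one call (a sketch, assuming named_dfs is the named list described above) is list2env:
# copies every element of named_dfs into the global environment under its list name
list2env(named_dfs, envir = .GlobalEnv)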

R for loop: creating data frames using split?

I have data that I want to separate by date, I have managed to do this manually through:
tsssplit <- split(tss, tss$created_at)
and then creating dataframes for each list which I then use.
t1 <- tsssplit[[1]]
t2 <- tsssplit[[2]]
But I don't know how many splits I will need, as the original data frame may have 6 dates to split by one time and 5 the next, etc. So I want to create a for loop.
Within the for loop, I want to incorporate this code, which connects to a function:
bscore3 <- score.sentiment(t3$cleaned_text,pos.words,neg.words,.progress='text')
score3 <- as.integer(bscore3$score[[1]])
Then I want to be able to create a new data frame that has the scores for each list.
So essentially I want the for loop to:
split the data into lists using split
turn each list element into a separate data frame for each different day
come out with a score for each data frame
put the scores into a new data frame
It doesn't have to be exactly like this as long as I can come up with a visualisation of the scores at the end.
Thanks!
It is not recommended to create separate dataframes in the global environment; they are difficult to keep track of. Put them in a list instead. You have started off well by using split to create a list of dataframes. You can then iterate over each dataframe in the list and apply the function to each one of them.
Using by, this would look like:
result <- by(tss, tss$created_at, function(x) {
  bscore3 <- score.sentiment(x$cleaned_text, pos.words, neg.words, .progress = 'text')
  score3 <- as.integer(bscore3$score[[1]])
  return(score3)
})
result
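If you then want the scores in a single data frame for visualisation, one way (a sketch, assuming result is the by object from above) is:
# one row per date: the split value becomes 'created_at', the returned score becomes 'score'
scores_df <- data.frame(created_at = names(result),
                        score = unlist(result),
                        row.names = NULL)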

How to create a matrix/data frame from a high number of single objects by using a loop?

I have a large number of single objects, each containing a mean value for a year. They are called cddmean1950, cddmean1951, ..., cddmean2019.
Now I would like to put them together into a matrix or data frame with the first column being the year (1950 - 2019) and the second column being the single mean values.
This is a very long way to do it without looping:
matrix <- rbind(cddmean1950,cddmean1951,cddmean1952,...,cddmean2019)
Afterwards you transform the matrix to a data frame, create a vector with the years and add it to the data frame.
I am sure there must be a smarter and faster way to do this by using a loop or anything else?
I think this could be an easy way to do it, provided all those single objects are in your current environment.
First, we create a character vector of the object names using the paste0 function:
YearRange <- 1950:2019
ObjectName <- paste0("cddmean", YearRange)
Then we can use lapply and get to retrieve the values of all these objects as a list.
Finally, using do.call and rbind, we bind all these values into a single column and create the dataframe you requested:
ListofSingleObjects <- lapply(ObjectName, get)
MeanValues <- do.call(rbind, ListofSingleObjects)
df <- data.frame(year = YearRange, Mean = MeanValues)
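A slightly more compact variant (a sketch under the same assumption that all the cddmean1950 ... cddmean2019 objects exist in the environment) lets sapply do the simplification, so no rbind step is needed:
# sapply(ObjectName, get) returns the yearly means as a numeric vector
df <- data.frame(year = YearRange, Mean = sapply(ObjectName, get))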

Create new renamed dataframes based on subset of current dataframes in a loop

I'm working with approximately 400 dataframes so I need this to be able to be completed in a loop-like process.
I want to create a copy of all of my dataframes by selecting a subset of rows based on the time points, I can manually do this one at a time but can't figure out how to loop it. All of my dataframes are currently in a list together. Ideally I'd like the new dataframes to be renamed by adding a small string to the original name, i.e. df is the original and df_t is the subset that's been created. It'd also be really helpful if it's possible to put all of these dataframe copies into a list together.
My current code that works for a single dataframe:
df_t <- with(df, df[hour(columnname) > 5 | hour(columnname) < 20, ])
You can apply the same code that works for one dataframe to every dataframe in the list with lapply. Assuming the list where all the dataframes are stored is called list_df:
library(lubridate)
out <- lapply(list_df, function(df) subset(df, hour(columnname) > 5 | hour(columnname) < 20))
If you want to name each dataframe in the list, you can do:
names(out) <- paste0("df_t", seq_along(out))
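If the list elements already carry the original dataframe names, a small variation (my assumption, not part of the original answer) keeps those names and simply appends the _t suffix the question asks for:
# keep the original names and mark the subsets with a '_t' suffix
names(out) <- paste0(names(list_df), "_t")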

subset multiple data tables using lapply

I have multiple data tables and all have a common column called ID. I have a vector vec that contains a set of ID values.
I would like to use lapply to subset all data tables using vec
I understand how to use lapply to subset the data tables, but my question is how to assign the subsetted results back to the original data tables.
Here is what I tried:
tables<-c("dt1","dt2","dt3","dt4")
lapply(mget(tables), function(x) x[ID %in% vec, ])
The above gives subsets of all data tables but how do I assign them back to dt1,dt2,dt3,dt4 ?
I would keep the datasets in the list rather than updating the dataset objects in the global environment, as most operations can be done within the list (including reading the files and writing the output files). But if you insist, we can use list2env, which will update the original dataset objects with the subsetted data:
lst <- lapply(mget(tables), function(x) x[ID %in% vec, ])
list2env(lst, envir=.GlobalEnv)
You could also just name the datasets in the list:
tables <- c("dt1","dt2","dt3","dt4")
dflist <- lapply(mget(tables),function(x)x[ID %in% vec,])
dflist <- setNames(dflist, tables)
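To illustrate the "keep everything in the list" advice above, here is a minimal sketch (the output file names and the use of data.table::fwrite are my assumptions, not part of the question) that writes every subset straight from the list:
library(data.table)
# write each subsetted data.table to its own CSV, named after its list element
invisible(lapply(names(dflist), function(nm) fwrite(dflist[[nm]], paste0(nm, "_subset.csv"))))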
