I am doing a series of intricate data manipulations and in doing so I create a series of dataframes from one "source" dataframe and dynamically name all my "subset" dataframes. They all have the same structure (columns) and I want to bind them together.
The challenge I have is that I can't seem to get the syntax for binding right after I dynamically name/create these dataframes.
So to create my "subset" dataframes I get the desired data into a dataframe called df_master and name it using assign. I do this inside of a for loop so I end up with 10 subset dataframes. Pseudo code looks like this:
for (i in 1:10){
.... do some stuff ...
df_master <- save into a df
assign(paste0("df_months_", i), df_master) # dynamically (re) name df_master
}
This works fine and I get my 10 dataframes named df_months_1, df_months_2, etc.
The trouble comes in when I want to bind. This post recommends binding multiple dataframes using do.call. In order to do that I need to put my "subset" dataframes in a list and then use do.call and rbind. This is the part I can't get right. I think I need a list of the subset dataframes themselves, but I can't seem to create that list.
Per the linked solution I need:
new_df <- do.call("rbind", list(df_months_1, df_months_2, ...))
Not sure how to create that list given that I am dynamically creating names.
As we have created multiple objects in the global env (not recommended), we can find those objects with ls, using a regex as the pattern
ls(pattern = "^df_months_\\d+$")
It returns a character vector of the object names that match the pattern: df_months_ at the start (^) of the string, followed by one or more digits (\\d+), up to the end ($) of the string.
Now, we get the values of the objects. get retrieves a single object by name; for several objects, use mget, which returns a named list (object name as key, object as value).
mget(ls(pattern = "^df_months_\\d+$"))
Then, we use rbind within do.call to bind the elements of the list
do.call(rbind, mget(ls(pattern = "^df_months_\\d+$")))
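Since creating many objects with assign is described above as not recommended, here is a minimal sketch of the list-based alternative. The source_df data and the subsetting step are hypothetical stand-ins for the "do some stuff" part of the loop. One further reason to prefer a list: ls returns names sorted alphabetically, so df_months_10 would typically come before df_months_2, whereas a list keeps loop order.

```r
# Hypothetical source data; the real df_master comes from the "do some stuff" step.
source_df <- data.frame(month = rep(1:10, each = 2), value = seq_len(20))

# Pre-allocate a list with one slot per subset instead of calling assign().
df_months <- vector("list", 10)
for (i in 1:10) {
  df_master <- source_df[source_df$month == i, ]  # stand-in for the real manipulation
  df_months[[i]] <- df_master
}

# Optional: keep the same names the assign() version would have produced.
names(df_months) <- paste0("df_months_", seq_along(df_months))

# Bind all subsets, in loop order.
new_df <- do.call(rbind, df_months)
```

An individual subset is still available as df_months[[3]] or df_months$df_months_3.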
Related
T12 is a data frame with 22 columns (but I just want column 2 till 8) and about one million entries.
Some of the entries are NA in the first of those columns. Every time there is an NA in that column, complete.cases deletes the whole row. Everything works well.
I have a lot more data frames and I don't want to write the whole code again for every data frame.
I would like to have something like the function below, and pass T12, T13, T14, T15 and so on as x.
Might you help me?
split <- function (x){
x <- x[,2:8]
x <- x[complete.cases(x[ ,1]),]
}
If you have dataframes named "T12", "T13" etc, you can use the pattern "T" followed by a number to capture the names of all such dataframes in a character vector using ls.
Using mget you can get the dataframes named in that character vector as a named list.
You can then use lapply to apply the split function to each dataframe in the list.
new_data <- lapply(mget(ls(pattern = 'T\\d+')), split)
new_data is a list of dataframes. If you want these changes to be reflected in the original dataframes, use list2env.
list2env(new_data, .GlobalEnv)
PS - split is a default function in R, so it is better to give some different name to your function.
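As a sketch of the same approach with the function renamed: keep_complete is a hypothetical name, and the toy data frames and the separate environment env are assumptions made so the example is self-contained. The pattern is anchored with ^ and $ so that only names of the exact form T plus digits are picked up.

```r
# Hypothetical helper that builds a small 9-column data frame with one NA.
make_df <- function() {
  df <- as.data.frame(matrix(1:27, nrow = 3, ncol = 9))
  df[2, 2] <- NA  # missing value in what becomes the first kept column
  df
}

# Stand-in for the workspace; in an interactive session this would be .GlobalEnv.
env <- new.env()
assign("T12", make_df(), envir = env)
assign("T13", make_df(), envir = env)
assign("T12_notes", "not a data frame", envir = env)  # must not be matched

# Same logic as the original function, under a name that does not mask base::split.
keep_complete <- function(x) {
  x <- x[, 2:8]
  x[complete.cases(x[, 1]), ]
}

# "^T\\d+$" matches T12 and T13 but not T12_notes.
new_data <- lapply(mget(ls(envir = env, pattern = "^T\\d+$"), envir = env),
                   keep_complete)
```

With the unanchored pattern 'T\\d+', T12_notes would also be selected and the function would fail on it.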
I am trying to remove the first 9 rows of multiple dataframes that have the same structure but different names (with a common naming scheme). In my example, there are 4 dataframes, named
Mydataframe_A, Mydataframe_B, Mydataframe_C, Mydataframe_D.
Currently it is working with the following code:
`Mydataframe_A`<- `Mydataframe_A`[-c(1:9),]
`Mydataframe_B`<- `Mydataframe_B`[-c(1:9),]
`Mydataframe_C`<- `Mydataframe_C`[-c(1:9),]
`Mydataframe_D`<- `Mydataframe_D`[-c(1:9),]
But I would like to write this in only one line, without having to specify the name of each dataframe every time.
I think this could work by using a pattern name and lists because for example this is what I am doing to rbind different dataframes:
All_mydataframes <- rbindlist(mget(ls(pattern = "^Mydataframe_")))
Any idea on how to do this ?
Thanks a ton!
Since mget turns this into a list, you can use apply family functions:
rbindlist(lapply(mget(ls(pattern = "^Mydataframe_")), function(x) x[-c(1:9), ]))
This takes the list from mget, removes the first 9 rows of each data frame, and then binds the list into a single data.table with rbindlist. The only problem is that you can no longer tell which original data.frame each row came from.
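One way around the problem of losing track of the source is the idcol argument of rbindlist, which adds a column holding the list names. The sketch below uses made-up data and a separate environment env (both assumptions) so it does not depend on what is in the workspace; the column name "source" is also just an illustrative choice.

```r
library(data.table)

# Stand-in for the workspace, holding two toy data frames of 12 rows each.
env <- new.env()
assign("Mydataframe_A", data.frame(value = 1:12), envir = env)
assign("Mydataframe_B", data.frame(value = 101:112), envir = env)

# Collect the data frames by name pattern into a named list.
frames <- mget(ls(envir = env, pattern = "^Mydataframe_"), envir = env)

# Drop the first 9 rows of each; drop = FALSE keeps one-column data frames intact.
trimmed <- lapply(frames, function(x) x[-(1:9), , drop = FALSE])

# idcol stores each list element's name in a new column called "source".
All_mydataframes <- rbindlist(trimmed, idcol = "source")
```

Each row of All_mydataframes then carries the name of the data frame it came from.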
I have a small problem but I cannot manage to figure it out.
I have 2 lists of tibbles generated with the read_xlsx function (from the readxl package, as in the code below). Now I want to merge the first element of one list with the first element of the other, the second with the second, and so on, using left_join on a shared column: the tibbles in both lists have a column named Study File ID. So for each pair I want something like left_join(x, y, by = c("Study File ID" = "Study File ID")). I know how to merge the tibbles within a single list, but not how to pair up the elements of 2 lists. Hopefully someone can help me.
files_all_compounds <- list.files("~/Internship/Internship/Script/Results - Excel files/Script_map", pattern = '*CompoundsPerFile_ML*')
input_files <- list.files("~/Internship/Internship/Script/Results - Excel files/Script_map", pattern = '*inputfiles_ML*')
setwd("~/Internship/Internship/Script/Results - Excel files/Script_map")
data_all_compounds <- lapply(files_all_compounds,read_xlsx)
data_input_files <- lapply(input_files,read_xlsx)
...
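A possible approach to the pairwise merge asked about above is Map, which walks two lists in parallel. This is only a sketch: the two small lists are invented stand-ins for data_all_compounds and data_input_files, and it assumes both lists have the same length and that matching files sit at the same position in each.

```r
library(dplyr)

# Invented stand-ins for the two lists read from the Excel files.
data_all_compounds <- list(
  data.frame(`Study File ID` = c("F1", "F2"), Area = c(10, 20), check.names = FALSE),
  data.frame(`Study File ID` = "F3", Area = 30, check.names = FALSE)
)
data_input_files <- list(
  data.frame(`Study File ID` = c("F1", "F2"), Sample = c("a", "b"), check.names = FALSE),
  data.frame(`Study File ID` = "F3", Sample = "c", check.names = FALSE)
)

# Map calls the function on element 1 of both lists, then element 2, and so on.
merged <- Map(
  function(compounds, inputs) left_join(compounds, inputs, by = "Study File ID"),
  data_all_compounds,
  data_input_files
)
```

merged is a list of the same length as the inputs, with merged[[1]] holding the join of the two first elements.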
I have one vector storing character strings like so:
labels <- as.character(c('site1', 'site2'))
Except my vector stores 100s of sites. And then I have data frames, named after each site (site1, site2, etc.), that have dozens of measurements. I want to use for loops to iteratively access and graph values from the data frames. In doing so, I was hoping to use the value returned from subsetting the first vector to subset the data frame, like so:
y1<-(labels[1]$measurements)
But I haven't been able to figure it out.
Thanks.
We can use mget to get the values from each of the object names in the 'labels' vector. The output will be a list. We can loop the list using lapply and extract the measurements column (if I understand the code correctly).
lst <- lapply(mget(labels), function(x) x$measurements)
It may be better to do all the operations within the list. But, if you need to create some additional objects in the global environment (not recommended), we can change the names of the list elements using paste and then use list2env.
names(lst) <- paste0('y', seq_along(lst))
list2env(lst, envir=.GlobalEnv)
y1
y2
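The attempt labels[1]$measurements fails because labels[1] is just the string "site1", not the data frame of that name. A minimal sketch of doing everything within the list, as suggested above, follows. The two toy data frames, the environment env, and the summary step are assumptions for illustration; the plotting call is left as a comment.

```r
# Stand-in for the workspace, with one data frame per site.
env <- new.env()
assign("site1", data.frame(measurements = c(1.2, 3.4)), envir = env)
assign("site2", data.frame(measurements = c(5.6, 7.8)), envir = env)

labels <- c("site1", "site2")

# Named list of data frames, one element per label.
sites <- mget(labels, envir = env)

# Loop over the labels and pull each data frame out of the list by name.
for (label in labels) {
  y <- sites[[label]]$measurements
  # plot(y, main = label)  # graphing step would go here
}

# The same access pattern without an explicit loop, here computing a mean per site.
site_means <- vapply(sites, function(x) mean(x$measurements), numeric(1))
```

Here sites[[label]] does the job the question hoped labels[1]$measurements would do.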
I have a collection of data frames that I have generated in R. I need to count the number of data frames whose names begin with "entry_". I'd like to generate a number to then use for a function that rbinds all of these data frames and these data frames only.
So far, I have tried using grep to identify the data frames, however, this just returns where they are indexed in my object list (e.g., 16:19 --- objects 16-19 begin with "entry_"):
count_entry <- (grep("entry_", objects()))
Eventually I would like to rbind all of these data frames like so:
list.make <- function() {
sapply(paste('entry_', seq(1:25), sep=''), get, environment(), simplify = FALSE)
}
all.entries <- list.make()
final.data <- rbind.fill(all.entries)
I don't want to have to enter the sequence manually every time (for example (1:25) in the code above), which is why I'm hoping to be able to automatically count the data frames beginning with "entry_".
If anyone has any ideas of how to solve this, or how to go about this in a better way, I'm all ears!
Per comment by docendo: The ls function will list objects in an environment that match a regex pattern. You can then use mget to retrieve those objects as a list:
mylist <- mget(ls(pattern = "^entry_"))
That will then work with rbind.fill. You can then remove the original objects using something similar; note that rm needs the names passed through its list argument: rm(list = ls(pattern = "^entry_"))
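Putting the pieces together, a sketch of counting, binding, and cleaning up might look like the following. The toy entry_ data frames and the environment env are assumptions, and base rbind is swapped in for rbind.fill because the toy data frames share the same columns; with differing columns, rbind.fill from plyr would be the tool the question uses.

```r
# Stand-in for the workspace: three entry_ data frames plus one unrelated object.
env <- new.env()
for (i in 1:3) {
  assign(paste0("entry_", i), data.frame(id = i, score = i * 10), envir = env)
}
assign("other", data.frame(id = 99), envir = env)

# Names of all objects starting with "entry_"; no manual 1:25 sequence needed.
entry_names <- ls(envir = env, pattern = "^entry_")

# The count is simply the length of that vector.
count_entry <- length(entry_names)

# Retrieve the objects as a named list and bind them row-wise.
final.data <- do.call(rbind, mget(entry_names, envir = env))

# Remove the originals; rm() takes a character vector via its list argument.
rm(list = entry_names, envir = env)
```

After this, only the unrelated object remains in env, and final.data holds one row per entry.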