Remove rows for multiple dataframes having a name matching a pattern - r

I am trying to remove the first 9 rows of multiple dataframes that have the same structures but different names (keeping similar name structure). In my example, there are 4 dataframes with respectively the names
Mydataframe_A, Mydataframe_B, Mydataframe_C, Mydataframe_D.
Currently it is working with the following code:
`Mydataframe_A`<- `Mydataframe_A`[-c(1:9),]
`Mydataframe_B`<- `Mydataframe_B`[-c(1:9),]
`Mydataframe_C`<- `Mydataframe_C`[-c(1:9),]
`Mydataframe_D`<- `Mydataframe_D`[-c(1:9),]
But I would like to write this in a single line, without having to specify the name of each dataframe every time.
I think this could work by using a pattern name and lists because for example this is what I am doing to rbind different dataframes:
All_mydataframes <- rbindlist(mget(ls(pattern = "^Mydataframe_")))
Any idea on how to do this ?
Thanks a ton!

Since mget turns this into a list, you can use apply family functions:
rbindlist(lapply(mget(ls(pattern = "^Mydataframe_")), function(x) x[-c(1:9), ]))
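This takes the list from mget, removes the first 9 rows of each data frame, then binds the list into a single data.table with rbindlist. By default the result does not record which data.frame each row came from; since mget returns a named list, passing the idcol argument to rbindlist adds a column holding the source names.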
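If the goal is to trim each data frame in place rather than bind them, the same list can be written back with list2env. A minimal base R sketch, assuming four made-up 12-row data frames; the column names id and value and the variable env are hypothetical, chosen only for illustration:

```r
# Hypothetical example data: four data frames with 12 rows each
Mydataframe_A <- data.frame(id = 1:12, value = 101:112)
Mydataframe_B <- data.frame(id = 1:12, value = 201:212)
Mydataframe_C <- data.frame(id = 1:12, value = 301:312)
Mydataframe_D <- data.frame(id = 1:12, value = 401:412)

# Environment holding the data frames (the global environment at top level)
env <- environment()

# Collect every object whose name starts with "Mydataframe_" in a named list
dfs <- mget(ls(pattern = "^Mydataframe_", envir = env), envir = env)

# Drop the first 9 rows of each data frame
trimmed <- lapply(dfs, function(x) x[-(1:9), , drop = FALSE])

# Overwrite the original objects with the trimmed versions
invisible(list2env(trimmed, envir = env))

nrow(Mydataframe_A)  # 3 rows remain
```

The pattern should match only the data frames of interest, since every matching object is passed through the row-dropping function.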

Related

Dynamically create/name a series of data.frames and bind them together

I am doing a series of intricate data manipulations and in doing so I create a series of dataframes from one "source" dataframe and dynamically name all my "subset" dataframes. They all have the same structure (columns) and I want to bind them together.
The challenge I have is that I can't seem to get the syntax for binding right after I dynamically name/create these dataframes.
So to create my "subset" dataframes I get the desired data into a dataframe called df_master and name it using assign. I do this inside of a for loop so I end up with 10 subset dataframes. Pseudo code looks like this:
for (i in 1:10){
.... do some stuff ...
df_master <- save into a df
assign(paste0("df_months_", i), df_master) # dynamically (re) name df_master
}
This works fine and I get my 10 dataframes named df_months_1, df_months_2, etc.
The trouble comes in when I want to bind. This post recommends binding multiple dataframes using do.call. In order to do that I need to put my "subset" dataframes in a list and then use do.call and rbind. This is the part I can't get right. I think I need a list of the subset dataframes themselves, but I can't seem to create that list.
Per the linked solution I need:
new_df <- do.call("rbind", list(df_months_1, df_months_2, ...))
Not sure how to create that list given that I am dynamically creating names.
Since multiple objects have been created in the global env (not recommended), find those objects with ls, using a regex as the pattern
ls(pattern = "^df_months_\\d+$")
It returns a vector of the object names that match the pattern: df_months_ at the start (^) of the string, followed by one or more digits (\\d+) up to the end ($) of the string
Now, we get the values of the objects. For more than one object, use mget, which returns the name/value pairs as a named list.
mget(ls(pattern = "^df_months_\\d+$"))
Then, we use rbind within do.call to bind the elements of the list
do.call(rbind, mget(ls(pattern = "^df_months_\\d+$")))
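One way to avoid creating objects in the global environment in the first place is to store each subset in a list inside the loop and bind the list at the end. A sketch under stated assumptions: source_df and its month column are made up, and the row filter only stands in for the elided "do some stuff" step in the question:

```r
# Hypothetical source data; the real manipulation steps are not shown in the question
source_df <- data.frame(month = rep(1:10, each = 2), value = 1:20)

# Pre-allocate a list with one slot per subset
df_list <- vector("list", 10)

for (i in 1:10) {
  # Stand-in for the real processing: keep the rows for month i
  df_master <- source_df[source_df$month == i, ]
  df_list[[i]] <- df_master
}

# Name the elements so the origin of each subset stays visible
names(df_list) <- paste0("df_months_", 1:10)

# Bind all subsets into one data frame
new_df <- do.call(rbind, df_list)
nrow(new_df)
```

A list also keeps the subsets in loop order, whereas ls() returns names sorted alphabetically, so df_months_10 would come before df_months_2.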

How do I pass a data frame as an argument to a function?

T12 is a data frame with 22 columns (but I just want columns 2 to 8) and about one million entries.
Some of the entries in the first column are NA. Every time there is an NA in the first column, complete.cases deletes the whole row. Everything works well.
I have a lot more data frames and I don't want to write the whole code again for every data frame.
I would like to have something like this function and pass T12, T13, T14, T15 and so on as x.
Could you help me?
split <- function (x){
x <- x[,2:8]
x <- x[complete.cases(x[ ,1]),]
}
If you have dataframes named "T12", "T13" etc., you can use the pattern "T" followed by a number to capture the names of all such dataframes in a character vector using ls.
Using mget you can get the dataframes for that character vector in a named list.
You can then use lapply to apply the split function to each element of the list.
new_data <- lapply(mget(ls(pattern = 'T\\d+')), split)
new_data is a list of dataframes. If you want these changes to be reflected in the original dataframes, use list2env.
list2env(new_data, .GlobalEnv)
PS - split is already a base R function, so it is better to give your function a different name.
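Putting the steps together with a renamed function, as a minimal sketch. The name keep_complete and the two small 8-column data frames are made up for illustration; the pattern is anchored with ^ and $ so that only names of the form T followed by digits match:

```r
# Hypothetical example data: 8 columns each, with some NAs in column 2
T12 <- data.frame(matrix(1:24, nrow = 3, ncol = 8))
T12[2, 2] <- NA
T13 <- data.frame(matrix(1:32, nrow = 4, ncol = 8))
T13[c(1, 4), 2] <- NA

# Keep columns 2 to 8, then drop rows where the first kept column is NA
keep_complete <- function(x) {
  x <- x[, 2:8]
  x[complete.cases(x[, 1]), , drop = FALSE]
}

# Apply the function to every data frame named T<digits>
new_data <- lapply(mget(ls(pattern = "^T\\d+$")), keep_complete)

# Number of rows left in each data frame
sapply(new_data, nrow)
```

Returning the filtered data frame as the last expression, instead of ending on an assignment, makes the result visible when the function is called directly.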

Using rbind to combine all data sets whose names start with common characters

I want to combine all rows of different data sets. The names of all the data sets start with test. All data sets have the same number of observations. I know I can combine them by using rbind(), but typing the name of every data set will take a lot of time. Please suggest a better approach.
rbind(test1,test2,test3,test4)
Try first obtaining the names of all matching objects using ls() with the pattern ^test, then fetching each object with get():
dfs <- lapply(ls(pattern="^test"), function(x) get(x))
result <- rbindlist(dfs)
I am taking the suggestion by @Rohit to use rbindlist (from the data.table package) to make it easier to rbind a list of data frames.
The second line of the code above works only if the data sets are data.tables or data frames. If the data sets are in xts/zoo format, a slight change is needed: use do.call() with rbind instead.
## First make a list of all your data sets as suggested above
list_xts <- lapply(ls(pattern="^test"), function(x) get(x))
## then use do call and rbind()
xts_results<-do.call(rbind,list_xts)
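Because lapply over get() returns an unnamed list, the origin of each row is lost after binding. A base R sketch that uses mget (which keeps the object names) and adds a column recording the origin before binding; the example data frames and the column name source are hypothetical:

```r
# Hypothetical example data sets with the same columns
test1 <- data.frame(x = 1:2, y = c("a", "b"))
test2 <- data.frame(x = 3:4, y = c("c", "d"))
test3 <- data.frame(x = 5:6, y = c("e", "f"))

# Named list of all objects named "test" followed by digits
dfs <- mget(ls(pattern = "^test\\d+$"))

# Tag each data frame with the name of the object it came from
tagged <- Map(function(df, nm) cbind(df, source = nm), dfs, names(dfs))

# Bind everything into one data frame and reset the row names
result <- do.call(rbind, tagged)
rownames(result) <- NULL
result
```

The stricter pattern ^test\\d+$ is a design choice for this sketch: it avoids picking up unrelated objects whose names merely begin with test.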

R dplyr 2 list of multiple tibbles, merge them with left_join

I have a small problem but I cannot manage to figure it out.
I have 2 lists of multiple tibbles generated with the read_xlsx function (from the readxl package). Now I want to merge the first element of one list with the first element of the other using the left_join function, joining on a shared column: the tibbles in both lists share a column named Study File ID. So I want to merge each element of one list with the corresponding element of the other using left_join with by = c("Study File ID" = "Study File ID"), for instance. I know how to merge a single list of tibbles, but not how to pair up the elements of 2 lists. Hopefully someone can help me.
files_all_compounds <- list.files("~/Internship/Internship/Script/Results - Excel files/Script_map", pattern = '*CompoundsPerFile_ML*')
input_files <- list.files("~/Internship/Internship/Script/Results - Excel files/Script_map", pattern = '*inputfiles_ML*')
setwd("~/Internship/Internship/Script/Results - Excel files/Script_map")
data_all_compounds <- lapply(files_all_compounds,read_xlsx)
data_input_files <- lapply(input_files,read_xlsx)
...

Extract columns with same names from multiple data frames [R]

I am dealing with about 10 data frames that have the same column names, but different numbers of rows. I would like to create a list of all columns with the same name.
So, say I have 2 data frames with the same column names.
a<-seq(0,20,1)
b<-seq(20,40,1)
c<-seq(10,30,1)
df.abc.1<-data.frame(a,b,c)
a<-seq(20,50,1)
b<-seq(10,40,1)
c<-seq(30,60,1)
df.abc.2<-data.frame(a,b,c)
I know I can create a list from this data such as,
list(df.abc.1$a, df.abc.2$a)
but I don't want to type out my long data frame names and column names.
I was hoping to do something like this,
list(c(df.abc.1, df.abc.2)$a)
But, it returns a list of df.abc.1$a
Perhaps there could be a way to use the grep function across multiple data.frames?
Perhaps a loop could accomplish this task?
Not sure if it's any better, but maybe
lapply(list(df.abc.1, df.abc.2), function(x) x$a)
For more than one column
lapply(list(df.abc.1, df.abc.2), function(x) x[, c("a","b")])
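To avoid typing the data frame names at all, the list can be built by name pattern with mget. A sketch using the question's example data; the variable names dfs and a_cols are made up for illustration:

```r
# Example data from the question
df.abc.1 <- data.frame(a = seq(0, 20, 1), b = seq(20, 40, 1), c = seq(10, 30, 1))
df.abc.2 <- data.frame(a = seq(20, 50, 1), b = seq(10, 40, 1), c = seq(30, 60, 1))

# Named list of all data frames whose names start with "df.abc."
# (the dots are escaped because "." matches any character in a regex)
dfs <- mget(ls(pattern = "^df\\.abc\\."))

# Extract column "a" from each data frame
a_cols <- lapply(dfs, `[[`, "a")

# Length of each extracted column
lengths(a_cols)
```

The resulting list is named after the data frames, so each column can still be traced back to its source.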
