Extract columns with same names from multiple data frames [R] - r

I am dealing with about 10 data frames that have the same column names but different numbers of rows. I would like to create a list of all columns with the same name.
So, say I have 2 data frames with the same column names.
a<-seq(0,20,1)
b<-seq(20,40,1)
c<-seq(10,30,1)
df.abc.1<-data.frame(a,b,c)
a<-seq(20,50,1)
b<-seq(10,40,1)
c<-seq(30,60,1)
df.abc.2<-data.frame(a,b,c)
I know I can create a list from this data like this:
list(df.abc.1$a, df.abc.2$a)
but I don't want to type out my long data frame names and column names.
I was hoping to do something like this:
list(c(df.abc.1, df.abc.2)$a)
But it returns a list containing only df.abc.1$a.
Perhaps there could be a way to use the grep function across multiple data.frames?
Perhaps a loop could accomplish this task?

Not sure if it's any better, but maybe
lapply(list(df.abc.1, df.abc.2), function(x) x$a)
For more than one column
lapply(list(df.abc.1, df.abc.2), function(x) x[, c("a","b")])
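If the data frames are kept in a named list, the result also records which data frame each column came from. A minimal sketch, assuming the two example data frames from the question; the names dfs and a_cols are made up for illustration:

```r
# Recreate the example data frames from the question
df.abc.1 <- data.frame(a = seq(0, 20, 1), b = seq(20, 40, 1), c = seq(10, 30, 1))
df.abc.2 <- data.frame(a = seq(20, 50, 1), b = seq(10, 40, 1), c = seq(30, 60, 1))

# Hypothetical named list holding all data frames
dfs <- list(df.abc.1 = df.abc.1, df.abc.2 = df.abc.2)

# Pull column "a" out of every data frame; the list names carry over
a_cols <- lapply(dfs, `[[`, "a")

# The lengths differ because the data frames have different numbers of rows
lengths(a_cols)
```

With about 10 data frames, only the line that builds dfs grows; the lapply call stays the same.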

Related

How to dynamically create and name data frames in a for loop

I am trying to generate data frame subsets for each respondent in a data frame using a for loop.
I have a large data frame with columns titled "StandardCorrect", "NameProper", "StartTime", "EndTime", "AScore", and "StandardScore" and several thousand rows.
I want to make a subset data frame for each person's name so I can generate statistics for each respondent.
I tried using a for loop
for(name in 1:length(NamesList)){ name <- DigiNONA[DigiNONA$NameProper == NamesList[name], ] }
NamesList is just a list containing all the levels of NameProper (which is a factor variable).
All I want the loop to do is, on each iteration, generate a new data frame named after NamesList[name], containing the subset of the main data frame where NameProper matches the name in the list for that iteration.
This seems like it should be simple; I just can't figure out how to get R to dynamically generate data frames with different names for each iteration.
Any advice would be appreciated, thank you.

The advice to use assign for this purpose is technically feasible, but incorrect in the sense that the practice is widely discouraged by experienced users of R. Instead, create a single list with named elements, each of which contains the data for a single individual. That way you don't need to keep a separate data object with the names of the resulting objects for later access.
named_Dlist <- setNames( split( DigiNONA, DigiNONA$NameProper),
NamesList)
This would allow you to access individual dataframes within the named_Dlist object:
named_Dlist[[ NamesList[1] ]] # The dataframe with the first person in that NamesList vector.
It's probably better to use the term list only for true R lists and not for atomic character vectors.
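To illustrate the list-based approach, here is a small sketch with made-up data. The toy DigiNONA data frame and the names by_name and mean_scores are assumptions for illustration only; the real data has more columns and rows. Note that split already names each element after the corresponding factor level:

```r
# Hypothetical stand-in for the real DigiNONA data frame
DigiNONA <- data.frame(
  NameProper = factor(c("Ann", "Bob", "Ann", "Cid")),
  AScore     = c(1, 2, 3, 4)
)

# One data frame per respondent, named by the levels of NameProper
by_name <- split(DigiNONA, DigiNONA$NameProper)
names(by_name)

# Access a single respondent's data by name
by_name[["Ann"]]

# Per-respondent statistics without creating separate objects
mean_scores <- sapply(by_name, function(d) mean(d$AScore))
```

Keeping everything in one list means any later summary is a single lapply or sapply call over by_name.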

Remove rows for multiple dataframes having a name matching a pattern

I am trying to remove the first 9 rows of multiple dataframes that have the same structures but different names (keeping similar name structure). In my example, there are 4 dataframes with respectively the names
Mydataframe_A, Mydataframe_B, Mydataframe_C, Mydataframe_D.
Currently it is working with the following code:
`Mydataframe_A`<- `Mydataframe_A`[-c(1:9),]
`Mydataframe_B`<- `Mydataframe_B`[-c(1:9),]
`Mydataframe_C`<- `Mydataframe_C`[-c(1:9),]
`Mydataframe_D`<- `Mydataframe_D`[-c(1:9),]
But I would like to write this in only one line, without having to specify the name of each dataframe every time.
I think this could work by using a pattern name and lists because for example this is what I am doing to rbind different dataframes:
All_mydataframes <- rbindlist(mget(ls(pattern = "^Mydataframe_")))
Any idea on how to do this?
Thanks a ton!

Since mget returns a list, you can use the apply family of functions:
rbindlist(lapply(mget(ls(pattern = "^Mydataframe_")), function(x) x[-c(1:9), ]))
This takes the list from mget, removes the first 9 rows of each data frame, then binds the results into a single data.table with rbindlist. The only problem is that you can no longer tell which data frame each row originally came from.
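One way to keep track of the origin is to tag each trimmed data frame with its name before binding. The sketch below swaps in base R (do.call with rbind) for rbindlist so that it runs without data.table; the toy data frames and the column name source are assumptions for illustration:

```r
# Hypothetical example data frames following the naming pattern
Mydataframe_A <- data.frame(x = 1:12)
Mydataframe_B <- data.frame(x = 101:115)

# Collect every object whose name matches the pattern into a named list
dfs <- mget(ls(pattern = "^Mydataframe_"))

# Drop the first 9 rows of each; drop = FALSE keeps one-column data frames intact
trimmed <- lapply(dfs, function(d) d[-(1:9), , drop = FALSE])

# Add a column recording which data frame each row came from
tagged <- Map(function(d, nm) cbind(source = nm, d), trimmed, names(trimmed))

# Bind everything into one data frame
combined <- do.call(rbind, tagged)
```

If the trimmed data frames should stay separate rather than be combined, the named list trimmed can be used directly.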

How to use the names in a list to access a dataframe column

I have two lists and a dataframe. The columns in the dataframe have the same names as the entries in the lists. The dataframe also has columns other than the ones named in the lists.
category.list <- c('Reserve_Book','choicepriv_and_points','Latency_freeze_load','signin','gift_card','mystery_gift','credit_card','call_support','account')
crosstab.list <- c('browser','OS','Device','comment_cat','comment_focus','recommend')
Now, how do I iterate through the elements of the lists and use them to access the dataframe columns?
Below is the code I am trying, but I get errors when accessing the dataframe column via the iterator variable.
for (i in category.list){
for (j in crosstab.list){
ftable(dataframe[j]~dataframe[i])
}
}

Specific to your question: your dataframe references need to specify both which rows and which columns are desired.
ftable(dataframe[j]~dataframe[i])
needs to be
ftable(dataframe[,j]~dataframe[,i])
Note the added commas.
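As an alternative sketch (not the formula approach above), each column can be extracted as a plain vector with [[ and passed to table. Inside a for loop nothing is printed automatically, so the result has to be printed explicitly or stored. The toy dataframe, the shortened name vectors, and the results list are assumptions for illustration:

```r
# Hypothetical data with a few of the columns named in the question
dataframe <- data.frame(
  signin  = c("yes", "no", "yes", "yes"),
  account = c("no", "no", "yes", "yes"),
  browser = c("Chrome", "Firefox", "Chrome", "Safari"),
  OS      = c("Win", "Mac", "Win", "Mac")
)
category.list <- c("signin", "account")
crosstab.list <- c("browser", "OS")

results <- list()
for (i in category.list) {
  for (j in crosstab.list) {
    # [[ returns the column as a vector; dnn labels the table dimensions
    tab <- ftable(table(dataframe[[i]], dataframe[[j]], dnn = c(i, j)))
    results[[paste(i, j, sep = "_by_")]] <- tab
    print(tab)  # explicit print is needed inside a loop
  }
}
```

Storing the tables in a named list makes them available after the loop, for example as results[["signin_by_browser"]].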

R - Subset based on column name

My data frame has over 120 columns (variables) and I would like to create subsets based on column names.
For example I would like to create a subset where the column name includes the string "mood". Is this possible?

I generally use
SubData <- myData[,grep("whatIWant", colnames(myData))]
I know very well that the "," is not necessary and that colnames could be replaced by names, but that would not work with matrices, and I hate to change the formalism when changing objects.
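A small worked sketch of the same idea for the "mood" case. The toy myData and its column names are made up; drop = FALSE is added so that the result stays a data frame even when only one column matches:

```r
# Hypothetical data frame with two columns whose names contain "mood"
myData <- data.frame(
  id      = 1:3,
  mood_am = c(4, 5, 3),
  mood_pm = c(2, 4, 4),
  sleep   = c(7, 6, 8)
)

# Positions of the columns whose names contain the string "mood"
mood_cols <- grep("mood", colnames(myData))

# Keep only those columns
SubData <- myData[, mood_cols, drop = FALSE]
colnames(SubData)
```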

How can I use pattern to combine data frames using a wildcard?

A series of functions generates a varying number of data frames (a minimum of 1 and a maximum of 11).
I'd like to combine them using rbind. If I knew the names, I could easily just rbind(d1,d2...) but can't do that since I have to combine a different number of data frames each time.
So lags=rbind(pattern("lags_2_Y*")) didn't work.
I can get the list of the generated lag names into a vector like so: lag_names=ls(pattern="lags_2_Y*")
If I do: lags=llply(lag_names,rbind), I just get a list with the lag names. I want to rbind the contents of those data frames.
Ideas?

Try
library(plyr)
lags = ldply(lag_names, get)
Edit:
If you give lag_names names, ldply() will add an id column:
names(lag_names) <- lag_names
lags = ldply(lag_names, get)
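For comparison, a base R sketch that does not need plyr: mget fetches all the named objects into a list and do.call passes them to rbind in one call. The toy data frames are made up, and the anchored pattern "^lags_2_Y" is an assumption about the intended match:

```r
# Hypothetical generated data frames; the real ones come from other functions
lags_2_Y1 <- data.frame(lag = 1:2, value = c(0.5, 0.3))
lags_2_Y2 <- data.frame(lag = 1:3, value = c(0.4, 0.2, 0.1))

# Names of all objects that start with the prefix
lag_names <- ls(pattern = "^lags_2_Y")

# Fetch the objects into a named list and row-bind them
lags <- do.call(rbind, mget(lag_names))

# Row names are prefixed with the source object name
rownames(lags)
```

This works for any number of matching data frames, as long as they share the same columns.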
