How can I use a pattern to combine data frames using a wildcard? (R)

A series of functions generates a varying number of data frames (a minimum of 1 and a maximum of 11).
I'd like to combine them using rbind. If I knew the names, I could simply call rbind(d1, d2, ...), but I can't, because I have to combine a different number of data frames each time.
So lags=rbind(pattern("lags_2_Y*")) didn't work.
I can get the list of the generated lag names into a vector like so: lag_names=ls(pattern="lags_2_Y*")
If I do lags=llply(lag_names,rbind), I just get a list of the lag names. I want to rbind the contents of those data frames.
Ideas?
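One detail worth checking: the pattern argument of ls() is a regular expression, not a shell wildcard, so in "lags_2_Y*" the * means "zero or more Y". The sketch below uses utils::glob2rx() to translate a wildcard into the equivalent regular expression; the object name old_lags_2_X is a made-up example of a name that the original pattern would match by accident.

```r
# As a regex, "lags_2_Y*" matches "lags_2_" followed by zero or more Ys,
# so it also matches names that merely contain "lags_2_"
accidental <- grepl("lags_2_Y*", "old_lags_2_X")

# glob2rx() converts a shell-style wildcard into an anchored regex
rx <- utils::glob2rx("lags_2_Y*")

# The anchored regex only matches names that start with "lags_2_Y"
matches <- grepl(rx, c("lags_2_Y1", "old_lags_2_X"))
```

Passing rx to ls(pattern = rx) then lists only the intended objects.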

Try:
library(plyr)
lags = ldply(lag_names, get)
Edit:
If you give lag_names names, ldply() will add an id column (named .id by default) recording which data frame each row came from:
names(lag_names) <- lag_names
lags = ldply(lag_names, get)
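For comparison, here is a base-R sketch without plyr. It assumes the generated objects follow the lags_2_Y naming from the question; the two data frames below are made-up stand-ins. mget() returns a named list of all matching objects, and do.call(rbind, ...) row-binds them in one call.

```r
# Made-up stand-ins for the generated data frames
lags_2_Y1 <- data.frame(x = 1:2, y = c("a", "b"), stringsAsFactors = FALSE)
lags_2_Y2 <- data.frame(x = 3:4, y = c("c", "d"), stringsAsFactors = FALSE)

# Anchored regex: only names that start with "lags_2_Y"
lag_names <- ls(pattern = "^lags_2_Y")

# mget() fetches every object by name and returns a named list
lag_list <- mget(lag_names)

# Row-bind all list elements at once, however many there are
lags <- do.call(rbind, lag_list)
rownames(lags) <- NULL  # drop the "lags_2_Y1.1"-style row names
```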

Related

How to create a matrix/data frame from a high number of single objects by using a loop?

I have a large number of single objects, each containing the mean value for one year. They are called cddmean1950, cddmean1951, ... ,cddmean2019.
Now I would like to put them together into a matrix or data frame, with the first column being the year (1950 to 2019) and the second column the corresponding mean values.
This is a very long way to do it without looping:
matrix <- rbind(cddmean1950,cddmean1951,cddmean1952,...,cddmean2019)
Afterwards I would convert the matrix to a data frame, create a vector of the years and add it as a column.
Surely there is a smarter and faster way to do this, with a loop or otherwise?
Here is a simple way to do it, provided all those single objects are in your current environment.
First, create a vector of the variable names using paste0():
YearRange <- 1950:2019
ObjectName <- paste0("cddmean", YearRange)
Then use lapply() with get() to retrieve the values of all these variables as a list.
Next, do.call(rbind, ...) binds these values into a single one-column matrix, and finally you create the data frame you asked for:
ListofSingleObjects <- lapply(ObjectName, get)
MeanValues <- do.call(rbind,ListofSingleObjects)
df <- data.frame( year = YearRange , Mean = MeanValues )
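A slightly shorter variant, assuming each cddmean object holds a single number: mget() looks up all the names at once, and unlist() flattens the result straight into a numeric vector, so no intermediate matrix is needed. The loop that creates the objects only builds made-up stand-in data.

```r
# Made-up stand-ins for the real cddmean1950 ... cddmean2019 objects
YearRange <- 1950:2019
for (y in YearRange) assign(paste0("cddmean", y), y / 100)

ObjectName <- paste0("cddmean", YearRange)

# mget() returns a named list of the objects; unlist() turns it into a
# plain numeric vector (one value per object)
MeanValues <- unlist(mget(ObjectName), use.names = FALSE)

df <- data.frame(year = YearRange, Mean = MeanValues)
```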

How to bind several data frames obtained from web scraping, using a for loop?

I have a vector that is basically a list of species, such as:
list_species<-c("Pomphorhynchus laevis","Profilicollis altmani","Leptorhynchoides thecatus","Mayarhynchus karlae","Oligacanthorhynchus tortuosa","Pseudoacanthocephalus toshimai","Corynosoma australe")
And I have this function, which mines data on several specimens for each of those species:
library(bold)
df<-bold_seqspec(name_of_species, format = "tsv")
I want to use the bold_seqspec function to create one data frame for each element of list_species. So far I have tried this:
for (name_of_species in list_species){
df<-bold_seqspec(name_of_species, format = "tsv")
joined_dfs<-rbind(df)
}
What I want is a single data frame that combines all the data frames downloaded for each species name in list_species.
But what I'm getting is a data frame with only one observation, so something must be wrong in the code.
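Your loop does visit every species, but joined_dfs<-rbind(df) just returns the current df (rbind() with a single argument leaves it unchanged), so each iteration overwrites the previous result.
You can use purrr's map functions instead, which apply the function to every species and row-bind the results: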
joined_dfs <- purrr::map_df(list_species, bold::bold_seqspec)
Try
do.call(rbind, lapply(list_species, bold_seqspec, format = "tsv"))
Explanation: lapply(list_species, bold_seqspec, format = "tsv") loops through list_species and applies bold_seqspec to every element with argument format = "tsv". The return object is a list of bold_seqspec return objects; assuming they are data.frames you can then row-bind them with do.call(rbind, ...), producing a single data.frame.
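If you prefer to keep an explicit for loop, a minimal sketch is to store each result in a pre-allocated list and row-bind once after the loop. It assumes a hypothetical stand-in, fake_seqspec(), in place of bold::bold_seqspec(), which needs network access to the BOLD service.

```r
# Hypothetical stand-in for bold::bold_seqspec(); it just returns a
# two-row data frame per species instead of querying BOLD
fake_seqspec <- function(name, format = "tsv") {
  data.frame(species_name = name,
             specimen = 1:2,
             stringsAsFactors = FALSE)
}

list_species <- c("Pomphorhynchus laevis", "Profilicollis altmani",
                  "Corynosoma australe")

# Keep each download in its own list slot instead of overwriting one variable
results <- vector("list", length(list_species))
for (i in seq_along(list_species)) {
  results[[i]] <- fake_seqspec(list_species[i], format = "tsv")
}

# Row-bind everything once, after the loop
joined_dfs <- do.call(rbind, results)
```

With the real package you would call bold::bold_seqspec(list_species[i], format = "tsv") inside the loop. Binding once at the end also avoids copying a growing data frame on every iteration.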

Using rbind to combine all data sets whose names start with common characters

I want to combine the rows of several data sets. All their names start with test, and all data sets have the same number of observations. I know I can combine them with rbind(), but typing the name of every data set takes a lot of time. Can you suggest a better approach?
rbind(test1,test2,test3,test4)
Try first obtaining a vector of all matching objects using ls() with the pattern ^test:
library(data.table)  # rbindlist() comes from the data.table package
dfs <- lapply(ls(pattern="^test"), function(x) get(x))
result <- rbindlist(dfs)
I am taking @Rohit's suggestion to use rbindlist(), which makes it easy to row-bind a list of data frames.
The second line of the code above works only if the data sets are data.tables or data frames. If they are xts/zoo objects, a slight change is needed: use do.call() with rbind() instead.
## First make a list of all your data sets as suggested above
list_xts <- lapply(ls(pattern="^test"), function(x) get(x))
## then use do call and rbind()
xts_results<-do.call(rbind,list_xts)
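Because ls(pattern = "^test") matches every object whose name starts with test, it can also pick up objects that are not data sets. A base-R sketch that keeps only the data frames before binding follows; the objects, including test_threshold, are made-up examples.

```r
# Made-up example objects whose names start with "test"
test1 <- data.frame(id = 1:2, value = c(10, 20))
test2 <- data.frame(id = 3:4, value = c(30, 40))
test_threshold <- 5   # also matches "^test" but is not a data frame

# Fetch every matching object as a named list
candidates <- mget(ls(pattern = "^test"))

# Keep only the data frames, then row-bind them
dfs <- Filter(is.data.frame, candidates)
result <- do.call(rbind, unname(dfs))
```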

Applying a function to a dataframe to trim empty columns within a list environment R

I am a naive user of R and am trying to come to terms with the 'apply' family of functions, which I now need because of the complexity of my data sets.
I have a large, ragged data frame that I wish to reshape before running a sequence of regression analyses. It is further complicated by interlaced rows of descriptive (character) data.
My approach so far has been to use a factor to split the data frame into sets with equal row lengths (i.e. a list), then remove the trailing empty columns, make two new matching lists (one of data and one of characters), use reshape to produce a common number of columns, and finally recombine the sets in each list. A simplified example:
myDF <- as.data.frame(rbind(c("v1",as.character(1:10)),
c("v1",letters[1:10]),
c("v2",c(as.character(1:6),rep("",4))),
c("v2",c(letters[1:6], rep("",4)))))
myDF[,1] <- as.factor(myDF[,1])
myList <- split(myDF, myDF[,1])
myList[[1]]
I can remove the empty columns for an individual set, and I can split the data frame into two sets from the interlacing rows, but I am stumped by the syntax for applying the following operation to every element of the list. Shouldn't lapply() with seq_along() do it?
Thus for the individual set:
DF <- myList[[2]]
DF <- DF[,!sapply(DF, function(x) all(x==""))]
DF
(from an earlier answer to a similar, but simpler example on this site). I have a large data set and would like an elegant solution (I could use a loop but that would not use the capabilities of R effectively). Once I have done that I ought to be able to use the same rationale to reshape the frames and then recombine them.
Try
lapply(split(myDF, myDF$V1), function(x) x[!colSums(x=='')])
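The colSums() answer keeps only columns with no empty strings at all, so a column that is only partly empty would also be dropped; for the example data the result is the same. The sketch below applies the question's own all(x == "") test to each list element through lapply(), so only entirely empty columns are removed. drop_empty_cols is a hypothetical helper name.

```r
myDF <- as.data.frame(rbind(c("v1", as.character(1:10)),
                            c("v1", letters[1:10]),
                            c("v2", c(as.character(1:6), rep("", 4))),
                            c("v2", c(letters[1:6], rep("", 4)))),
                      stringsAsFactors = FALSE)
myDF[, 1] <- as.factor(myDF[, 1])
myList <- split(myDF, myDF[, 1])

# Hypothetical helper: drop columns whose values are all empty strings
drop_empty_cols <- function(DF) {
  DF[, !sapply(DF, function(x) all(x == "")), drop = FALSE]
}

# lapply() passes each list element (a data frame) to the helper
trimmed <- lapply(myList, drop_empty_cols)
```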

Split dataframe using two columns of data and apply common transformation on list of resulting dataframes

I want to split a large dataframe into a list of dataframes according to the values in two columns. I then want to apply a common data transformation on all dataframes (lag transformation) in the resulting list. I'm aware of the split command but can only get it to work on one column of data at a time.
You need to put all the factors you want to split by in a list, e.g.:
split(mtcars,list(mtcars$cyl,mtcars$gear))
Then you can use lapply() on the result to do whatever else you need.
If you want to avoid zero-row data frames in the results, use the drop parameter; its default (FALSE) is the opposite of the default for drop in the "[" function (TRUE).
split(mtcars,list(mtcars$cyl,mtcars$gear), drop=TRUE)
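A base-R sketch of the whole split-then-lapply workflow from the question: split by two columns, apply the same lag transformation to every piece, then recombine. The data and the helper lag1() are made up for illustration; lag1() shifts a vector down by one position and pads with NA.

```r
df <- data.frame(
  value  = c(1, 2, 3, 4, 5, 6, 7, 8),
  class1 = rep(c("a", "b"), each = 4),
  class2 = rep(c("1", "2"), times = 4),
  stringsAsFactors = FALSE
)

# Split by both columns at once; drop = TRUE removes empty combinations
pieces <- split(df, list(df$class1, df$class2), drop = TRUE)

# Hypothetical helper: lag a vector by one position, padding with NA
lag1 <- function(x) c(NA, head(x, -1))

# Apply the same transformation to every piece, then recombine
lagged <- lapply(pieces, function(d) transform(d, value_lag = lag1(value)))
result <- do.call(rbind, unname(lagged))
```

The list names follow split()'s default "." separator, so pieces[["a.1"]] holds the rows where class1 is "a" and class2 is "1".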
How about this one:
library(plyr)
# stats::lag() does not shift the values of a plain vector, so use dplyr::lag(), which shifts by one and pads with NA
ddply(df, .(category1, category2), summarize, value1 = dplyr::lag(value1), value2 = dplyr::lag(value2))
This seems like an excellent job for the plyr package and its ddply() function. If there are still open questions, please provide some sample data. Splitting works on several columns as well:
df<- data.frame(value=rnorm(100), class1=factor(rep(c('a','b'), each=50)), class2=factor(rep(c('1','2'), 50)))
# interaction() combines the two factors element-wise into one grouping factor with one entry per row
g <- interaction(df$class1, df$class2)
split(df$value, g)
You can also do the following:
split(x = df, f = ~ var1 + var2...)
This splits the data frame by several variables without having to wrap them in a list for the f argument.

Resources