merging list of multiple data frames - r

I'm trying to merge all the data frames in my current environment into one data frame. Initially I tried:
Reduce(function(x,y) merge(x,y,by="Date"),list(ls()))
But this didn't work; it just returned a list of the data frame names.
I know it will work if I do
Reduce(function(x,y) merge(x,y,by="Date"),list(df1,df2,df3....))
But why doesn't the initial attempt work?
Both
typeof(list(ls()))
typeof(list(df1,df2,df3))
return "list".
What can I do if there are so many data frames I can't input them all into the Reduce function?

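ls() returns a character vector of object names, not the objects themselves, so list(ls()) is a list containing a single character vector and Reduce has nothing to merge. Fetch the objects first and keep only the data frames:
lst <- Filter(is.data.frame, mget(ls(envir = globalenv()), envir = globalenv()))
Reduce(function(x, y) merge(x, y, by = "Date"), lst)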
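A minimal sketch of the difference between a list of names and a list of objects. The data frames df1, df2, df3 and the environment env are made up for illustration; a separate environment is used only so the example does not depend on what is in your workspace.

```r
# Hypothetical example data, kept in its own environment.
env <- new.env()
env$df1 <- data.frame(Date = as.Date("2020-01-01") + 0:2, a = 1:3)
env$df2 <- data.frame(Date = as.Date("2020-01-01") + 0:2, b = 4:6)
env$df3 <- data.frame(Date = as.Date("2020-01-01") + 0:2, c = 7:9)
env$not_a_df <- "some other object"

# ls() gives names only, so list(ls()) wraps one character vector in a list.
names_only <- list(ls(envir = env))
length(names_only)       # a single element
class(names_only[[1]])   # a character vector, not a data frame

# mget() fetches the objects behind the names; Filter() keeps the data frames.
dfs <- Filter(is.data.frame, mget(ls(envir = env), envir = env))

# Merge the data frames pairwise on the shared Date column.
merged <- Reduce(function(x, y) merge(x, y, by = "Date"), dfs)
```

Both objects have typeof "list", which is why typeof cannot tell them apart; what matters is what the list elements are.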

Related

How to bind several data frames obtained from web scraping, using a for loop?

I have a vector of species names, such as:
list_species<-c("Pomphorhynchus laevis","Profilicollis altmani","Leptorhynchoides thecatus","Mayarhynchus karlae","Oligacanthorhynchus tortuosa","Pseudoacanthocephalus toshimai","Corynosoma australe")
And I have this function, which mines data on several specimens for each of those species:
library(bold)
df<-bold_seqspec(name_of_species, format = "tsv")
I want to use the bold_seqspec function to create one data frame for each of the elements in list_species. So far I have tried this:
for (name_of_species in list_species){
df<-bold_seqspec(name_of_species, format = "tsv")
joined_dfs<-rbind(df)
}
What I wanted was a single data frame combining all the data frames downloaded for each species name in list_species.
But what I'm getting is a data frame with only one observation, so something must be wrong in the code.
Since you want to apply this for multiple species, you need to loop over them.
You can use purrr's map functions.
joined_dfs <- purrr::map_df(list_species, bold::bold_seqspec)
Try
do.call(rbind, lapply(list_species, bold_seqspec, format = "tsv"))
Explanation: lapply(list_species, bold_seqspec, format = "tsv") loops through list_species and applies bold_seqspec to every element with argument format = "tsv". The return object is a list of bold_seqspec return objects; assuming they are data.frames you can then row-bind them with do.call(rbind, ...), producing a single data.frame.
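For comparison, here is a sketch of why the original loop keeps only the last result and how the two working patterns look side by side. fetch_records() is a hypothetical stand-in for bold_seqspec() so that the example runs without a network request; its columns are invented.

```r
# Hypothetical stand-in for bold_seqspec(): returns a small data frame per name.
fetch_records <- function(name) {
  data.frame(species_name = name, record = 1:2, stringsAsFactors = FALSE)
}

list_species <- c("Pomphorhynchus laevis", "Profilicollis altmani")

# Loop version: rbind(df) alone just returns df, so the accumulated result
# has to be passed in as well. Starting from NULL works with rbind().
joined_loop <- NULL
for (name_of_species in list_species) {
  df <- fetch_records(name_of_species)
  joined_loop <- rbind(joined_loop, df)
}

# lapply version: build a list of data frames, then bind them in one call.
joined_apply <- do.call(rbind, lapply(list_species, fetch_records))
```

Growing a data frame inside a loop copies it on every iteration, so the lapply form is usually preferable when there are many species.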

Binding data frames created by loops in R

I have a number of R scripts that create data frames of the same length and I am trying to aggregate all the data frames into one.
I used a for loop to run those R scripts:
for(i in sample){
source(i)
}
This does create all the data frames I need. But is there a good way to include a function that binds those data frames together within that for loop?
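source() returns a list whose value element holds the result of the last expression in the script. Assuming each script ends with an expression that evaluates to its data frame, you can combine them all with something like:
do.call(rbind, lapply(sample, function(i) source(i)$value))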
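A small self-contained sketch of this approach. source() returns a list, and its value element holds the result of the last expression evaluated in the script, so the sketch extracts that element before binding. The two temporary scripts are made up, and the assumption that each script ends with an expression evaluating to a data frame may not hold for your scripts.

```r
# Two throwaway scripts written to temporary files (hypothetical content).
script_paths <- c(tempfile(fileext = ".R"), tempfile(fileext = ".R"))
writeLines("data.frame(id = 1:2, src = 'a')", script_paths[1])
writeLines("data.frame(id = 3:4, src = 'b')", script_paths[2])

# Run each script and keep the value of its last expression.
frames <- lapply(script_paths, function(path) source(path)$value)

# Bind the collected data frames row-wise.
combined <- do.call(rbind, frames)
```

If a script only assigns its data frame to a variable and ends with something else, you would need to retrieve that variable by name instead.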

Count the number of data frames beginning with prefix in R

I have a collection of data frames that I have generated in R. I need to count the number of data frames whose names begin with "entry_". I'd like to generate a number to then use for a function that rbinds all of these data frames and these data frames only.
So far, I have tried using grep to identify the data frames, however, this just returns where they are indexed in my object list (e.g., 16:19 --- objects 16-19 begin with "entry_"):
count_entry <- (grep("entry_", objects()))
Eventually I would like to rbind all of these data frames like so:
list.make <- function() {
sapply(paste('entry_', seq(1:25), sep=''), get, environment(), simplify = FALSE)
}
all.entries <- list.make()
final.data <- rbind.fill(all.entries)
I don't want to have to enter the sequence manually every time (for example (1:25) in the code above), which is why I'm hoping to be able to automatically count the data frames beginning with "entry_".
If anyone has any ideas of how to solve this, or how to go about this in a better way, I'm all ears!
Per comment by docendo: The ls function will list objects in an environment that match a regex pattern. You can then use mget to retrieve those objects as a list:
mylist <- mget(ls(pattern = "^entry_"))
That will then work with rbind.fill, and length(mylist) gives the number of matching data frames. You can then remove the original objects using something similar: rm(list = ls(pattern = "^entry_"))
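The whole sequence can be sketched as follows. The objects and the environment env are hypothetical, and base rbind is used here only because the example data frames share the same columns; with differing columns, rbind.fill from plyr (as in the question) would be the substitute.

```r
# Hypothetical objects in a dedicated environment.
env <- new.env()
env$entry_1 <- data.frame(id = 1, value = "a")
env$entry_2 <- data.frame(id = 2, value = "b")
env$other   <- data.frame(id = 99, value = "z")

# Names starting with "entry_", and how many there are.
entry_names <- ls(envir = env, pattern = "^entry_")
n_entries <- length(entry_names)

# Fetch the matching objects as a named list and bind them.
all_entries <- mget(entry_names, envir = env)
final_data <- do.call(rbind, all_entries)

# rm() needs the names passed through its list argument.
rm(list = entry_names, envir = env)
```

Because mget returns all matches at once, the count is not needed to build the list; it is only computed here in case it is useful elsewhere.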

Binding data frames stored in a list in R

I have several data frames stored in R memory among several other objects.
Their particularity is that they are all named as "Station_Year.df".
I want to merge all these data frames into one.
I tried:
df_list <- ls(pattern=".df")
dataset <- rbind(df_list)
But I get a data frame with the names of the data frames...
You should use mget to retrieve the data frames named in df_list. So you can do:
dataset <- do.call(rbind, mget(df_list))
Note that this requires all the data frames to have the same columns. You may also find the merge function useful.
Thanks alexis_laz, I forgot the do.call.
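One detail worth checking is the pattern itself: in a regular expression an unescaped dot matches any character, so ".df" can pick up unrelated objects. A sketch with made-up object names, assuming the data frames really do end in a literal ".df":

```r
# Hypothetical objects; pdf_settings is deliberately not a data frame.
env <- new.env()
env$Paris_2001.df <- data.frame(station = "Paris", year = 2001, temp = 11.2)
env$Paris_2002.df <- data.frame(station = "Paris", year = 2002, temp = 11.9)
env$pdf_settings  <- list(width = 7)

# Unescaped dot: also matches "pdf_settings" (any character followed by "df").
loose <- ls(envir = env, pattern = ".df")

# Escaped dot anchored at the end: only names ending in ".df".
strict <- ls(envir = env, pattern = "\\.df$")

# Retrieve the matching data frames and bind them row-wise.
dataset <- do.call(rbind, mget(strict, envir = env))
```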

Storing multiple data frames into one data structure - R

Is it possible to store multiple data frames in one data structure and later process them one data frame at a time? For example:
df1 <- data.frame(c(1,2,3), c(4,5,6))
df2 <- data.frame(c(11,22,33), c(44,55,66))
.. then I would like to have them added in a data structure, such that I can loop through that data structure retrieving each data frame one at a time and process it, something like
for ( iterate through the data structure) # this gives df1, then df2
{
write data frame to a file
}
I cannot find any such data structure in R. Can anyone point me to any code that illustrates the same functionality?
Just put the data frames in a list. A plus is that a list works really well with apply style loops. For example, if you want to save the data frames, you can use mapply:
l = list(df1, df2)
mapply(write.table, x = l, file = c("df1.txt", "df2.txt"))
If you like apply style loops (and you will, trust me :)) please take a look at the epic plyr package. It might not be the fastest package (look at data.table for speed), but it drips with syntactic sugar.
Lists can be used to hold almost anything, including data.frames:
## Versatility of lists
l <- list(file(), new.env(), data.frame(a=1:4))
For writing out multiple data objects stored in a list, lapply() is your friend:
ll <- list(df1=df1, df2=df2)
## Write out as *.csv files
lapply(names(ll), function(X) write.csv(ll[[X]], file=paste0(X, ".csv")))
## Save in *.Rdata files
lapply(names(ll), function(X) {
assign(X, ll[[X]])
save(list=X, file=paste0(X, ".Rdata"))
})
What you are looking for is a list.
You can use a function like lapply to treat each of your data frames in the same manner separately. However, there might be cases where you need to pass your list of data frames to a function that handles the data frames in relation to each other. In this case lapply doesn't help you.
That's why it is important to note how you can access and iterate the data frames in your list. It's done like this:
mylist[[data frame]][row,column]
Note the double brackets around your data frame index.
So for your example it would be
df1 <- data.frame(c(1,2,3), c(4,5,6))
df2 <- data.frame(c(11,22,33), c(44,55,66))
mylist<-list(df1,df2)
mylist[[1]][1,2] returns 4, whereas mylist[1][1,2] fails with an "incorrect number of dimensions" error, because mylist[1] is a list of length one rather than a data frame. It took a while for me to find this, so I thought it might be helpful to post here.
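The difference between the two bracket forms can be checked directly. This sketch reuses the data frames from the question; in current versions of R the single-bracket form raises an error rather than returning a value, so the error is caught with tryCatch so the script keeps running.

```r
df1 <- data.frame(c(1, 2, 3), c(4, 5, 6))
df2 <- data.frame(c(11, 22, 33), c(44, 55, 66))
mylist <- list(df1, df2)

# Double brackets extract the element itself: a data frame.
inner <- mylist[[1]]
value <- inner[1, 2]

# Single brackets return a sub-list of length one, which has no rows or columns.
outer <- mylist[1]
err <- tryCatch(outer[1, 2], error = function(e) conditionMessage(e))

# Iterating over the list visits each data frame in turn.
row_counts <- vapply(mylist, nrow, integer(1))
```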
