R: how to append data frames to a list?

I am trying to produce data frames using a for loop.
How can I append these data frames to a list and then check whether any of them is empty?
I would like to remove the empty data frames (those with no rows) from the list.
Any help is appreciated.

You should use lapply here instead of a for loop. The advantages are:
You want to create a list of data.frames, and lapply returns a list.
You do the job once; there is no need for two loops.
Something like:
lapply(seq_len(nbr_df),function(x)
{
## code to create your data.frame dt
## dt = data.frame(...)
if(nrow(dt)>0) dt
})
Second option: the data.frames already exist as separate variables.
We assume that your variable names share a certain pattern, say patt:
lapply(mget(ls(pattern=patt)),function(x)if(nrow(x)>0)x)
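One caveat with both lapply calls above: an `if` without an `else` evaluates to NULL when the condition is false, so empty frames leave NULL entries in the result instead of disappearing. A minimal sketch of one way to drop them, assuming a hypothetical generator `make_df()` that is not part of the original answer:

```r
# Hypothetical generator: returns an empty data frame for even i.
make_df <- function(i) {
  if (i %% 2 == 0) data.frame(id = integer(0)) else data.frame(id = seq_len(i))
}

raw <- lapply(1:4, function(i) {
  dt <- make_df(i)
  if (nrow(dt) > 0) dt   # evaluates to NULL when dt is empty
})

# Drop the NULL placeholders left behind by the empty frames.
non_empty <- Filter(Negate(is.null), raw)

length(raw)        # 4
length(non_empty)  # 2
```

Filter() keeps the elements for which the predicate is TRUE, so the remaining list holds only the frames that have at least one row.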

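To append to a list inside a for loop, you can do:
Your_list <- list()
for (i in numbOfPosibleDF) {
  k <- data.frame()  # replace with the code that builds your data frame
  if (nrow(k) != 0) {
    # use [[ ]] and a quoted prefix so the whole data frame is stored under one name
    Your_list[[paste0("df", i)]] <- k
  }
}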

I would just add valid data frames to the list instead of removing the empty ones afterwards. If you want or need to use a for loop (instead of the lapply function), you may use the following:
# init list
list.of.df <- list()
# start your loop to
# create data frame etc.
# ....
df <- data.frame(1,2)
# add to list
if (!is.null(df) && nrow(df)>0) list.of.df[[length(list.of.df)+1]] <- df
# ... end of loop here.
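For reference, a self-contained version of this loop might look as follows. The helper `make_df()` and the `df<i>` naming scheme are assumptions made for illustration only; they stand in for whatever builds the frame on each iteration.

```r
# Hypothetical stand-in for the code that builds a frame on each pass;
# it yields zero rows whenever i is a multiple of 3.
make_df <- function(i) data.frame(x = seq_len(i %% 3))

list.of.df <- list()
for (i in 1:6) {
  df <- make_df(i)
  # keep only frames that actually contain rows
  if (!is.null(df) && nrow(df) > 0) {
    list.of.df[[paste0("df", i)]] <- df
  }
}

names(list.of.df)
```

Naming the entries, rather than appending by position, makes it easier to tell later which iterations produced data.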

For the benefit of anyone finding this otherwise dead-end page by its title: the way to concatenate data.frames that are items of a list is plyr's rbind.fill, which fills columns missing from some frames with NA (rbind.fill.matrix is the counterpart for matrices):
rbind.fill(lst)

I would like to give a better picture of the scenario:
the frames may or may not have the same number of columns/rows.
the data frames are produced dynamically using a for loop.
the frames contain all data types: numeric, factor, character.
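When the frames do not share the same columns, plain rbind() fails. A minimal base R sketch of one workaround is to pad every frame with NA columns before binding; the example frames below are invented for illustration, and packages such as plyr offer ready-made equivalents.

```r
# Two invented frames with partly different columns.
frames <- list(
  data.frame(a = 1:2, b = c("x", "y"), stringsAsFactors = FALSE),
  data.frame(a = 3L, c = 2.5)
)

# Union of all column names, in order of first appearance.
all_cols <- unique(unlist(lapply(frames, names)))

# Add each missing column as NA, then put the columns in a common order.
aligned <- lapply(frames, function(d) {
  for (col in setdiff(all_cols, names(d))) d[[col]] <- NA
  d[all_cols]
})

combined <- do.call(rbind, aligned)
```

This sketch assumes every frame has at least one row; a zero-row frame would need to be filtered out first.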

Related

How to bind several data frames obtained from web scraping, using a for loop?

So I have a vector that is basically a list of species, such as:
list_species<-c("Pomphorhynchus laevis","Profilicollis altmani","Leptorhynchoides thecatus","Mayarhynchus karlae","Oligacanthorhynchus tortuosa","Pseudoacanthocephalus toshimai","Corynosoma australe")
And I have this function, which mines data on several specimens for each of those species:
library(bold)
df<-bold_seqspec(name_of_species, format = "tsv")
I want to use the bold_seqspec function to create one data frame for each of the elements in list_species, so far I tried like this:
for (name_of_species in list_species){
df<-bold_seqspec(name_of_species, format = "tsv")
joined_dfs<-rbind(df)
}
What I wanted was a single data frame that combines all the data frames downloaded for each species name in list_species.
But what I'm getting is a data frame with only one observation, so something must be wrong in the code.
Since you want to apply this for multiple species, you need to loop over them.
You can use purrr's map functions.
joined_dfs <- purrr::map_df(list_species, bold::bold_seqspec)
Or, in base R, try:
do.call(rbind, lapply(list_species, bold_seqspec, format = "tsv"))
Explanation: lapply(list_species, bold_seqspec, format = "tsv") loops through list_species and applies bold_seqspec to every element with argument format = "tsv". The return object is a list of bold_seqspec return objects; assuming they are data.frames you can then row-bind them with do.call(rbind, ...), producing a single data.frame.
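To see why the loop in the question keeps only one result: rbind(df) is called with a single argument, so it returns df unchanged and joined_dfs is overwritten on every pass. The sketch below reproduces this without network access; `fake_fetch()` is a hypothetical stand-in for bold_seqspec() and does not reflect the columns that function really returns.

```r
# Hypothetical stand-in for bold_seqspec(): two "specimens" per species.
fake_fetch <- function(name) {
  data.frame(species = name, specimen = 1:2, stringsAsFactors = FALSE)
}

species <- c("Pomphorhynchus laevis", "Corynosoma australe")

# Pattern from the question: each pass replaces the previous result.
for (s in species) {
  last_only <- rbind(fake_fetch(s))
}

# Collect every result in a list first, then bind once.
pieces <- lapply(species, fake_fetch)
joined <- do.call(rbind, pieces)

nrow(last_only)  # 2
nrow(joined)     # 4
```

Binding once at the end also avoids growing a data frame inside the loop, which gets slow as the number of pieces increases.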

Using rbind to combine all data sets whose names start with common characters

I want to combine all rows of different data sets. The names of all the data sets start with test, and all data sets have the same number of observations. I know I can combine them using rbind(), but typing the name of every data set will take a lot of time. Please suggest a better approach.
rbind(test1,test2,test3,test4)
Try first obtaining a vector of all matching objects using ls() with the pattern ^test:
library(data.table)  # provides rbindlist()
dfs <- lapply(ls(pattern="^test"), function(x) get(x))
result <- rbindlist(dfs)
I am taking the suggestion by @Rohit to use rbindlist, which makes it easier to rbind together a list of data frames.
The rbindlist() call above works only if the data sets are data.tables or data frames. If the data sets are in xts/zoo format, a slight change is needed: use do.call() with rbind().
## First make a list of all your data sets as suggested above
list_xts <- lapply(ls(pattern="^test"), function(x) get(x))
## then use do call and rbind()
xts_results<-do.call(rbind,list_xts)
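The same idea can be tried out in base R without touching the global workspace. In this sketch a scratch environment `ws` and the objects test1 to test3 are invented for illustration, and mget() replaces the lapply/get pair so that the list keeps the object names.

```r
# Scratch environment standing in for the workspace.
ws <- new.env()
for (i in 1:3) {
  assign(paste0("test", i), data.frame(id = i, value = i * 10), envir = ws)
}

# mget() returns a named list of all objects matching the pattern.
dfs <- mget(ls(envir = ws, pattern = "^test"), envir = ws)

# Row-bind the whole list in one call.
result <- do.call(rbind, dfs)
```

Note that ls() returns names in alphabetical order, so test10 would sort before test2; reorder the list if row order matters.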

R. Apply same function to similar data frames. Generate new data frames with same name pattern

I have seen similar posts, but none that specifically addresses this question. I have 22 data frames called chr1,chr2,chr3,...,chr22. I want to apply a home-made function "diff_set" to all these data frames and generate 22 new data frames with names chr1.1,chr2.1,chr3.1,...,chr22.1. The function is applied to one of the columns. For instance, for chr1, I apply diff_set and generate chr1.1:
chr1.1 = diff_set(chr1$POSITION, 200000)
Any suggestion is welcome !
Simply lapply your function, diff_set, over a list of the data frames, then rename the output list. If you really need the results as separate objects, run list2env, though this is better avoided:
output_list <- lapply(mget(paste0("chr", seq(1,22))),
                      function(df) diff_set(df$POSITION, 200000))
output_list <- setNames(output_list, paste0("chr", seq(1,22)+0.1))
# FIRST THREE DFS
output_list$chr1.1
output_list$chr2.1
output_list$chr3.1
# OUTPUT EACH DF AS SEPARATE OBJECT
# (BUT CONSIDER AVOIDING THIS AS YOU FLOOD GLOBAL ENVIRONMENT)
list2env(output_list, envir=.GlobalEnv)
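A small runnable sketch of the list-based workflow follows. The `diff_set()` defined here is a hypothetical placeholder (it keeps positions at least `gap` away from the preceding one), not the asker's function, and the two input frames are invented. Deriving the new names from the existing ones avoids the numeric `+0.1` trick.

```r
# Hypothetical placeholder for the asker's diff_set().
diff_set <- function(pos, gap) {
  data.frame(POSITION = pos[c(TRUE, diff(pos) >= gap)])
}

# Invented inputs, kept in a named list rather than separate objects.
chr_list <- list(
  chr1 = data.frame(POSITION = c(100, 150, 500)),
  chr2 = data.frame(POSITION = c(10, 400, 420))
)

output_list <- lapply(chr_list, function(df) diff_set(df$POSITION, 200))

# Append ".1" to each original name: chr1 -> chr1.1, chr2 -> chr2.1
names(output_list) <- paste0(names(chr_list), ".1")

output_list$chr1.1
```

Keeping the results in the list means later steps can again use lapply instead of 22 separate object names.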

Count the number of data frames beginning with prefix in R

I have a collection of data frames that I have generated in R. I need to count the number of data frames whose names begin with "entry_". I'd like to generate a number to then use for a function that rbinds all of these data frames and these data frames only.
So far, I have tried using grep to identify the data frames, however, this just returns where they are indexed in my object list (e.g., 16:19 --- objects 16-19 begin with "entry_"):
count_entry <- (grep("entry_", objects()))
Eventually I would like to rbind all of these data frames like so:
list.make <- function() {
sapply(paste('entry_', seq(1:25), sep=''), get, environment(), simplify = FALSE)
}
all.entries <- list.make()
final.data <- rbind.fill(all.entries)
I don't want to have to enter the sequence manually every time (for example (1:25) in the code above), which is why I'm hoping to be able to automatically count the data frames beginning with "entry_".
If anyone has any ideas of how to solve this, or how to go about this in a better way, I'm all ears!
Per comment by docendo: The ls function will list objects in an environment that match a regex pattern. You can then use mget to retrieve those objects as a list:
mylist <- mget(ls(pattern = "^entry_"))
That will then work with rbind.fill. You can then remove the original objects using something similar: rm(list = ls(pattern = "^entry_"))
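Putting the pieces together, the count the question asks for is simply the length of the name vector. The sketch below uses a scratch environment `ws` with invented objects so it can be run safely, and base rbind() in place of rbind.fill because the invented frames share their columns.

```r
# Scratch environment with two matching objects and one unrelated one.
ws <- new.env()
assign("entry_1", data.frame(a = 1), envir = ws)
assign("entry_2", data.frame(a = 2), envir = ws)
assign("other", data.frame(a = 99), envir = ws)

entry_names <- ls(envir = ws, pattern = "^entry_")
n_entries <- length(entry_names)              # how many objects matched

# Fetch the matching objects as a list and bind them.
all_entries <- do.call(rbind, mget(entry_names, envir = ws))

# rm() needs the names passed through its list argument.
rm(list = entry_names, envir = ws)
```

If the workspace may contain non-data-frame objects with the same prefix, wrapping the mget() result in Filter(is.data.frame, ...) would restrict the list to data frames.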

Storing multiple data frames into one data structure - R

Is it possible to store multiple data frames in one data structure and process each data frame later? For example:
df1 <- data.frame(c(1,2,3), c(4,5,6))
df2 <- data.frame(c(11,22,33), c(44,55,66))
.. then I would like to have them added in a data structure, such that I can loop through that data structure retrieving each data frame one at a time and process it, something like
for ( iterate through the data structure) # this gives df1, then df2
{
write data frame to a file
}
I cannot find any such data structure in R. Can anyone point me to any code that illustrates the same functionality?
Just put the data.frames in a list. A plus is that a list works really well with apply-style loops. For example, if you want to save the data.frames, you can use mapply:
l = list(df1, df2)
mapply(write.table, x = l, file = c("df1.txt", "df2.txt"))
If you like apply-style loops (and you will, trust me :)), please take a look at the epic plyr package. It might not be the fastest package (look at data.table for speed), but it drips with syntactic sugar.
Lists can be used to hold almost anything, including data.frames:
## Versatility of lists
l <- list(file(), new.env(), data.frame(a=1:4))
For writing out multiple data objects stored in a list, lapply() is your friend:
ll <- list(df1=df1, df2=df2)
## Write out as *.csv files
lapply(names(ll), function(X) write.csv(ll[[X]], file=paste0(X, ".csv")))
## Save in *.Rdata files
lapply(names(ll), function(X) {
assign(X, ll[[X]])
save(list=X, file=paste0(X, ".Rdata"))
})
What you are looking for is a list.
You can use a function like lapply to treat each of your data frames in the same manner separately. However, there might be cases where you need to pass your list of data frames to a function that handles the data frames in relation to each other. In this case lapply doesn't help you.
That's why it is important to note how you can access and iterate the data frames in your list. It's done like this:
mylist[[data frame]][row,column]
Note the double brackets around your data frame index.
So for your example it would be
df1 <- data.frame(c(1,2,3), c(4,5,6))
df2 <- data.frame(c(11,22,33), c(44,55,66))
mylist<-list(df1,df2)
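mylist[[1]][1,2] would return 4, whereas mylist[1][1,2] would fail with an "incorrect number of dimensions" error, because mylist[1] is a list of length one rather than a data frame. It took a while for me to find this, so I thought it might be helpful to post here.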
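The difference between the two bracket forms, and the loop the question asks for, can be checked with a short sketch. The list names and the use of tempdir() as the output location are assumptions made for illustration.

```r
df1 <- data.frame(c(1, 2, 3), c(4, 5, 6))
df2 <- data.frame(c(11, 22, 33), c(44, 55, 66))
mylist <- list(df1 = df1, df2 = df2)

class(mylist[1])    # "list": single brackets return a sub-list
class(mylist[[1]])  # "data.frame": double brackets return the element

value <- mylist[[1]][1, 2]  # row 1, column 2 of the first data frame

# Loop over the list and write each data frame to its own file.
out_dir <- tempdir()
for (nm in names(mylist)) {
  write.csv(mylist[[nm]], file.path(out_dir, paste0(nm, ".csv")),
            row.names = FALSE)
}
```

Iterating over names(mylist) rather than over the list itself keeps the name available for building the file path.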
