Writing a loop to apply the operator 'data.frame' multiple times - r

I would like to write a loop to create multiple data frames from a set of already existing matrices.
I've imported and created these using the code:
temp <- list.files(pattern = "\\.csv$")  # pattern is a regular expression, not a glob
ddives <- lapply(temp, read.csv)
So 'ddives' is my set of CSV files. I now want to create a data frame out of each of these using a looped version of this code:
d.dives1<- data.frame(ddives[1])

A quick primer on terminology before I answer your question:
The result of read.csv() is a data.frame.
The result of lapply() is a list.
Thus you now have a list of data frames.
If you can safely assume that the data frames in the list have the same structure (i.e. the same number of columns and the same classes), then you can use rbind() to combine your list of data frames into a single data.frame.
To make this easier, you can use do.call() as follows:
do.call(rbind, ddives)
do.call constructs a call from the function using the list elements as arguments. If they are named, they are passed as named arguments, otherwise in order (as always in R). In this case you apply rbind to all of the elements in your list, thus creating a single data.frame.
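A minimal sketch of the do.call(rbind, ...) idea, using two small hypothetical data frames in place of the CSV files (the column names id and depth are made up for illustration):

```r
# Two hypothetical data frames with identical columns, standing in for read.csv() output
ddives <- list(
  data.frame(id = 1:2, depth = c(10.5, 12.0)),
  data.frame(id = 3:4, depth = c(8.2, 15.1))
)

# do.call(rbind, ddives) builds and evaluates rbind(ddives[[1]], ddives[[2]])
combined <- do.call(rbind, ddives)

print(class(ddives))  # "list"
print(combined)       # a single data.frame with 4 rows
```

The same call works unchanged for any number of list elements, which is why it pairs well with lapply(temp, read.csv).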
This is clearly untested, since I don't have your data. But, in general, do.call is a useful function for this type of operation.

As this is a follow-up to the earlier question you posted, try this:
for (i in seq_along(ddives)) assign(temp[i], ddives[[i]])

If you really want a looped version of your code, this would be:
for (i in seq_along(ddives)) {
# [[i]] extracts the data frame itself; [i] would give a one-element list
assign(paste("d.dives", i, sep = ""), ddives[[i]])
}
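To check what such a loop produces, here is a sketch with two hypothetical data frames standing in for the CSV contents; get() retrieves an object by its constructed name, and the last two lines show why double brackets are needed.

```r
# Hypothetical stand-ins for the imported CSV files
ddives <- list(
  data.frame(id = 1:3),
  data.frame(id = 4:6)
)

for (i in seq_along(ddives)) {
  # Creates d.dives1, d.dives2, ... in the current environment
  assign(paste("d.dives", i, sep = ""), ddives[[i]])
}

# Retrieve an object by its constructed name
str(get("d.dives2"))

# Single vs double brackets: [ ] keeps the list wrapper, [[ ]] extracts the element
class(ddives[1])    # "list"
class(ddives[[1]])  # "data.frame"
```

seq_along() is used instead of 1:length() because it yields an empty sequence, rather than c(1, 0), when the list is empty.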

Related

print variable names in my own function r

I want to create a function that creates new data frames using some variables from other data frames. For that, I think I need to print the variable names in my own function somehow.
The variables come from two data frames (asd and tetracam) which have six variables in common, the bands "w530", "w550", "w570", "670", "w700" and "w800". So, I want to create six data frames, one for each band. One by one I could write like this:
# Band w530
w530<-data.frame(tetracam$filename,tetracam$time,tetracam$type,tetracam$w530,asd$w530)
names(w530)<-c("filename","time","type","tetracam","asd")
w530<-w530[order(w530$time),]
It works fine, but I'd like to do it as a function in order to run it for all bands. I thought I had to replace every w530 in the code above with a dynamic object, so I thought of using one of the apply family functions. So I first created a vector with the names of my common variables:
bands<-c("w530","w550","w570","670","w700","w800")
Then I tried several approaches, for example using cat or sprintf to fill my function with the strings from that vector, but it didn't work. I'm also not sure which apply family function I should use, or whether any of them can be used in this case:
my.fun<- function(band){
sprintf("%s<-data.frame(tetracam$filename,tetracam$time,tetracam$type,asd$%s,tetracam$%s)",band,band,band)
sprintf("names(%s)<-c('filename','time','type','asd','tetracam')",band)
sprintf("%s[order(%s$time),]",band,band)
}
Any help is appreciated.
The trick is to access a data.frame column with the df[varName] idiom, where varName is a string holding the column name.
fun1 <- function(band, tetracam, asd){
df<-data.frame(tetracam$filename,tetracam$time,tetracam$type,tetracam[band],asd[band])
names(df)<-c("filename","time","type","tetracam","asd")
df<-df[order(df$time),]
return(df)
}
band_dfs <- list()
for (band in bands){
# store each result under its band name instead of overwriting one variable
band_dfs[[band]] <- fun1(band, tetracam, asd)
}
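A runnable sketch of the same approach with small hypothetical tetracam and asd data frames (the values are made up, and only two bands are used). It is a variant of fun1: tetracam[[band]] returns the column as a plain vector, so the columns can be named directly inside data.frame() instead of renaming them afterwards, and lapply() collects one data frame per band in a named list.

```r
# Hypothetical toy data with the same structure as in the question
tetracam <- data.frame(
  filename = c("a", "b", "c"),
  time     = c(3, 1, 2),
  type     = c("x", "y", "x"),
  w530     = c(0.11, 0.12, 0.13),
  w550     = c(0.21, 0.22, 0.23)
)
asd <- data.frame(w530 = c(0.15, 0.16, 0.17), w550 = c(0.25, 0.26, 0.27))

fun1 <- function(band, tetracam, asd) {
  # tetracam[[band]] selects a column by the name stored in the string `band`
  df <- data.frame(filename = tetracam$filename,
                   time     = tetracam$time,
                   type     = tetracam$type,
                   tetracam = tetracam[[band]],
                   asd      = asd[[band]])
  df[order(df$time), ]
}

bands <- c("w530", "w550")

# One data frame per band, collected in a list named after the bands
band_dfs <- setNames(lapply(bands, fun1, tetracam = tetracam, asd = asd), bands)

band_dfs$w530
```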

Converting a list of data frames into individual data frames in R [duplicate]

This question already has answers here:
Return elements of list as independent objects in global environment
I have been searching high and low for what I think is an easy solution.
I have a large data frame that I split by factors.
eqRegions <- split(eqDataAll, eqDataAll$SeismicRegion)
This now creates a list object of the data frames by region; there are 8 in total. I would like to loop through the list to make individual data frames using another name.
I can execute the following to convert the list items to individual data frames, but I suspect there is a loop mechanism that would be faster if I have many factors.
testRegion1 <- eqRegions[[1]]
testRegion3 <- eqRegions[[3]]
I can manually perform the above and it handles it nicely, but if I have many regions it's not efficient. What I would like to do is the equivalent of the following:
for (i in 1:length(eqRegions)) {
region[i] <- as.data.frame(eqRegions[[i]])
}
I think the key is to define region before the loop, but it keeps overwriting itself instead of incrementing. Many thanks.
Try
list2env(eqRegions,envir=.GlobalEnv)
This should work. The data.frames created will be named after the levels of eqDataAll$SeismicRegion. Anyway, populating the workspace with individual data.frames like this is not recommended. The more I work with R, the more I love (and use) lists.
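A minimal sketch of split() followed by list2env(), using a small hypothetical eqDataAll with made-up region names. It writes into a fresh environment rather than .GlobalEnv only so the result is easy to inspect; pass envir = .GlobalEnv to get the behaviour described above.

```r
# Hypothetical stand-in for eqDataAll
eqDataAll <- data.frame(
  SeismicRegion = c("North", "South", "North", "East"),
  magnitude     = c(4.1, 5.3, 3.8, 6.0)
)

# split() returns a list of data frames named after the (sorted) factor levels
eqRegions <- split(eqDataAll, eqDataAll$SeismicRegion)
names(eqRegions)  # "East" "North" "South"

# list2env() creates one object per list element, named after the element
target <- new.env()
list2env(eqRegions, envir = target)

ls(target)  # "East" "North" "South"
target$North
```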
lapply(names(eqRegions), function(x) assign(x, eqRegions[[x]], envir = .GlobalEnv))
Edit: use the list2env solution posted; I was not aware of the list2env function.
attach(eqRegions) should be enough, although it puts the data frames on the search path rather than in the global environment. But I recommend working with them in list form using lapply; it will almost always result in simpler code.
list2env creates data frames in the global environment whose names are the names of the list elements. An alternative, if you want the data frames to share a base name and be identified by i from a loop:
for (i in seq_along(eqRegions)) {
assign(paste0("eqRegions", i), as.data.frame(eqRegions[[i]]))
}
This can be slow if the list gets very long.
As an alternative, a "best practice" when splitting data like this is to keep the data.frames within the list returned by split. To process them, use lapply (or sapply) over the list and capture the output back in a list. For instance:
eqRegionsProcessed <- lapply(eqRegions, function(df) {
## do something meaningful here
})
This obviously only works if you are doing the same thing to each data.frame.
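A sketch of the keep-it-in-a-list approach with a small hypothetical eqDataAll. The per-region summary (row count and mean magnitude) is only a placeholder for whatever processing is actually needed; the output stays named by region, and do.call(rbind, ...) can optionally stack it into one table.

```r
# Hypothetical stand-in for eqDataAll
eqDataAll <- data.frame(
  SeismicRegion = c("North", "South", "North", "East"),
  magnitude     = c(4.1, 5.3, 3.8, 6.0)
)
eqRegions <- split(eqDataAll, eqDataAll$SeismicRegion)

# Apply the same processing to every region; the output list keeps the region names
eqRegionsProcessed <- lapply(eqRegions, function(df) {
  data.frame(n = nrow(df), mean_magnitude = mean(df$magnitude))
})

# Optionally combine the one-row results into a single table (row names = regions)
summary_table <- do.call(rbind, eqRegionsProcessed)
summary_table
```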
If you really must break them out and deal with each data.frame uniquely, then @MatthewPlourde's and @MaratTalipov's answers will work.

Performing column select over multiple dataframes

I have looked around a lot for this answer; some solutions get close, but no cigar. I am trying to perform a selection of columns over multiple data frames. I can do this and return a list, but I wish to preserve the data frames in the global environment. I want to keep the data frames separate for ease of use and visibility in RStudio. For example, I am selecting columns based on their names like so, for one data frame:
E07 <- E07[,c("Block","Name","F635.Mean","F532.Mean","B635.Mean","B532")]
I have a number of data frames listed in dflist, so I have written this function:
columnselect<-function(df){df[,c("Block","Name","F635.Mean","F532.Mean","B635.Mean","B532")];df}
I then wish to apply this over the dflist as so:
lapply(X=dflist,FUN=columnselect)
This applies the function over dflist, but the data frames themselves remain unchanged. How do I apply the function over multiple data frames without returning them in a list?
Many thanks
M
Your function returns the data frames unchanged because df is the last expression evaluated in your function. Instead of:
columnselect<-function(df){
df[,c("Block","Name","F635.Mean","F532.Mean","B635.Mean","B532")]
df}
It should be:
columnselect<-function(df){
df[,c("Block","Name","F635.Mean","F532.Mean","B635.Mean","B532")]
}
Having df as the last line of your function simply returned the full data frame that you passed into it.
As for your second point, that you would like to have the data.frames in the global environment rather than in a list (which is bad practice, just so you know; it is almost always better to keep them in the list), you need the list2env function, i.e.:
mylist <- lapply(X=dflist,FUN=columnselect)
list2env(mylist, envir = globalenv())
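Using this, the data.frames in the global environment will be updated. Note that list2env needs a named list: the names (e.g. E07) decide which objects are created or overwritten, so dflist must be built with names, for example list(E07 = E07, ...).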
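A self-contained sketch of the corrected columnselect() combined with list2env(). The data frames and the two-column selection are hypothetical (the real ones use the Block, Name, F635.Mean, ... columns), and dflist is deliberately built with names so that list2env() knows which objects to overwrite.

```r
# Hypothetical data frames; "Extra" is a column we want to drop
E07 <- data.frame(Block = 1:2, Name = c("a", "b"), Extra = c(TRUE, FALSE))
E08 <- data.frame(Block = 3:4, Name = c("c", "d"), Extra = c(FALSE, TRUE))

# A named list: these names decide which objects list2env() overwrites
dflist <- list(E07 = E07, E08 = E08)

# Corrected function: the selection is the last expression, so it is returned
columnselect <- function(df) {
  df[, c("Block", "Name")]
}

mylist <- lapply(X = dflist, FUN = columnselect)

# Overwrite E07 and E08 in the global environment with the trimmed versions
list2env(mylist, envir = globalenv())

names(E07)  # "Block" "Name"
```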

Joining list of data frames in R

I have this example data
list_1<-list(data.frame(c(1:10)),data.frame(c(11:20)))
list_2<-list(data.frame(c(21:30)),data.frame(c(31:40)))
And I need to join them together to get a structure like
list_3<-list(data.frame(c(1:10)),data.frame(c(11:20)),data.frame(c(21:30)),data.frame(c(31:40)))
That means I have to create one new, flat list of frames, because when I use
list_3<-list(list_1,list_2)
the first frame of list_1 ends up as list_3[[1]][[1]], and that is a problem for me. I need to access this frame as list_3[[1]].
Is there a straightforward way to achieve this?
I have tried some plyr functions such as join and join_all, but I still cannot get it done.
Moving some comments to the correct place (answers), the two most common solutions would be:
c(list_1, list_2)
or
append(list_1, list_2)
Since you had already tried:
list(list_1, list_2)
and found that this had created a nested list, you can also unlist the nested list with the argument recursive = FALSE.
unlist(list(list_1, list_2), recursive = FALSE)
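A quick check, using the question's own example lists, that the three approaches produce the same flat list of four data frames:

```r
list_1 <- list(data.frame(c(1:10)), data.frame(c(11:20)))
list_2 <- list(data.frame(c(21:30)), data.frame(c(31:40)))

flat_c      <- c(list_1, list_2)
flat_append <- append(list_1, list_2)
flat_unlist <- unlist(list(list_1, list_2), recursive = FALSE)

length(flat_c)                  # 4
identical(flat_c, flat_append)  # TRUE
identical(flat_c, flat_unlist)  # TRUE

# The first frame of list_1 is now reachable with a single index
head(flat_c[[1]], 3)
```

recursive = FALSE matters in the unlist() version: only the outer level is flattened, so the data frames (which are themselves lists) stay intact.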

Saving many subsets as dataframes using "for"-loops

This question might be very simple, but I cannot find a good way to solve it:
I have a dataset with many subgroups which need to be analysed both all together and on their own. Therefore, I want to create subsets for the groups and use them in the later analysis. Both the definition of the subsets and the analysis should partly be done with loops, in order to save space and to ensure that the same analysis has been done on all subgroups.
Here is an example of my code using an example dataframe from the boot package:
data(aids)
qlist <- c("1","2","3","4")
for (i in length(qlist)) {
paste("aids.sub.",qlist[i],sep="") <- subset(aids, quarter==qlist[i])
}
The variable which contains the subgroups in my dataset is stored as a string; that is why I added the qlist part, which would not be required otherwise.
Make a list of the subsets with lapply:
lapply(qlist, function(x) subset(aids, quarter==x))
Equivalently, avoiding the subset():
lapply(qlist, function(x) aids[aids$quarter==x,])
It is likely the case that using a list will make the subsequent code easier to write and understand. You can subset the list to get a single data frame (just as you can use one of the subsets, as created below). But you can also iterate over it (using for or lapply) without having to construct variable names.
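A sketch of the list-based approach. A small hypothetical data frame (aids_toy, with a made-up count column) replaces boot's aids data so the example runs without the package; quarter is stored as character, as in the question. Naming the list with qlist makes each subset addressable by its group label.

```r
# Hypothetical stand-in for the aids data; quarter is a string, as in the question
aids_toy <- data.frame(quarter = c("1", "2", "1", "3", "4", "2"),
                       count   = c(5, 3, 8, 2, 7, 4))
qlist <- c("1", "2", "3", "4")

# One subset per group, named by the group label
aids_subs <- setNames(lapply(qlist, function(x) aids_toy[aids_toy$quarter == x, ]), qlist)

# Access a single subset by name ...
aids_subs[["2"]]

# ... or run the same analysis on every subset without constructing variable names
sapply(aids_subs, function(df) sum(df$count))
```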
To do the job as you are asking, use assign:
for (i in qlist) {
assign(paste("aids.sub.",i,sep=""), subset(aids, quarter==i))
}
Note that length() has been removed and the loop now iterates directly over qlist: for (i in length(qlist)) runs only once, with i equal to 4.
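A small sketch contrasting the two loop headers, with a hypothetical aids_toy data frame in place of the aids data. The assign() calls target a fresh environment only so the created objects are easy to list; without envir they land in the current environment, as in the answer.

```r
# Hypothetical stand-in for the aids data
aids_toy <- data.frame(quarter = c("1", "2", "1", "3", "4", "2"),
                       count   = c(5, 3, 8, 2, 7, 4))
qlist <- c("1", "2", "3", "4")

# length(qlist) is a single number, so this loop body runs only once (i = 4)
for (i in length(qlist)) print(i)

# Iterating directly over qlist visits every group label
target <- new.env()
for (i in qlist) {
  assign(paste("aids.sub.", i, sep = ""),
         subset(aids_toy, quarter == i),
         envir = target)
}
ls(target)  # "aids.sub.1" "aids.sub.2" "aids.sub.3" "aids.sub.4"
```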
