R:Define data frames in a similar space - r

Is there a way in R to define data frames in a similar space.
So lets say I have an unknown number of data frames to be created (say there will be n data.frames)
I want to define a space as such:
space<-data.frame.space()
for(i in 1:n) (
space[i]<-some.func(var1,var2)
)|
where some.func creates certain data.frames (in this case it downloads information from the internet), and then I get to call these data frames by saying
space[1] #or
space[2]
#etc
I know people somehow use environments for this, and in functions I see something of the sort. I just don't know how they do that.

I think you just want a simple list
space<-list()
for(i in 1:n) (
space[[i]]<-some.func(var1,var2)
)
and then
space[[1]]
space[[2]]
Note the double bracket indexing. Using double brackets will return the data.frame. Using single brackets will return a list containing the data.frame.

Related

Converting a list of data frames into individual data frames in R [duplicate]

This question already has answers here:
Return elements of list as independent objects in global environment
(4 answers)
Closed 3 years ago.
I have been searching high and low for what I think is an easy solution.
I have a large data frame that I split by factors.
eqRegions <- split(eqDataAll, eqDataAll$SeismicRegion)
This now creates a list object of the data frames by region; there are 8 in total. I would like to loop through the list to make individual data frames using another name.
I can execute the following to convert the list items to individual data frames, but I am thinking that there is a loop mechanism that is fast if I have many factors.
testRegion1 <- eqRegions[[1]]
testRegion3 <- eqRegions[[3]]
I can manually perform the above and it handles it nicely, but if I have many regions it's not efficient. What I would like to do is the equivalent of the following:
for (i in 1:length(eqRegions)) {
region[i] <- as.data.frame(eqRegions[[i]])
}
I think the key is to define region before the loop, but it keep overwriting itself and not incrementing. Many thanks.
Try
list2env(eqRegions,envir=.GlobalEnv)
This should work. The name of the data.frames created will be equal to the names within eqDataAll$SeismicRegion. Anyways, this practice of populating individual data.frames is not recommended. The more I work with R, the more I love/use list.
lapply(names(eqRegions), function(x) assign(x, eqRegions[[x]], envir = .GlobalEnv))
edit: Use list2env solution posted. Was not aware of list2env function.
attach(eqRegions) should be enough. But I recommend working with them in list form using lapply. I guarantee it will result in simpler code.
list2env returns data frames to the global environment whose names are the names in the list. An alternative, if you want to have the same name for the data frames but identified by i from a loop:
for (i in 1:length(eqRegions)) {
assign(paste0("eqRegions", i), as.data.frame(eqRegions[[i]]))
}
This can be slow if the length if the list gets too long.
As an alternative, a "best practice" when splitting data like this is to keep the data.frames within a list, as provided by split. To process it, you use either one of sapply or lapply (many factors) and capture the output back in a list. For instance:
eqRegionsProcessed <- lapply(eqRegions, function(df) {
## do something meaningful here
})
This obviously only works if you are doing the same thing to each data.frame.
If you really must break them out and deal with each data.frame uniquely, then #MatthewPlourde's and #MaratTalipov's answers will work.

r create and address variable in for loop

I have multiple csv-files in one folder. I want to load each csv-file in this folder into one separate data frame. Next, I want to extract certain elements from this data frame into a matrix and calculate the mean of all these matrixes.
setwd("D:\\data")
group_1<-list.files()
a<-length(group_1)
mferg_mean<-data.frame
for(i in 1:a)
{
assign(paste0("mferg_",i),read.csv(group_1[i],header=FALSE,sep=";",quote="",dec=",",col.names=1:90))
}
As there are 11 csv-files in the folder I now have the data frames
mferg_1
to
mferg_11
How can I address each data frame in this loop? As mentioned, I want to extract certain elements from each data frame to a matrix. I would imagine it something like this:
assign(paste0("mferg_matrix_",i),mferg_i[1:5,1:10])
But this obviously does not work because R does not recognize mferg_i in the loop. How can I address this data frame?
This is not something you should probably be using assign for in the first place. Working with a bunch of different data.frames in R is a mess, but working with a list of data.frames is much easier. Try reading your data with
group_1<-list.files()
mferg <- lapply(group_1, function(filename) {
read.csv(filename,header=FALSE,sep=";",quote="",dec=",",col.names=1:90))
})
and you get each each value with mferg[[1]], mferg[[1]], etc. And then you can create a list of extractions with
mferg_matrix <- lapply(mferg, function(x) x[1:5, 1:10])
This is the more R-like way to do things.
But technically you can use get to retrieve values like you use assign to create them. For example
assign(paste0("mferg_matrix_",i),get(paste0("mferg_",i))[1:5,1:10])
but again, this is probably not a smart strategy in the long run.

Using a list of matrix names

I have 75 matrices that I want to search through. The matrices are named a1r1, a1r2, a1r3, a1r4, a1r5, a2r1,...a15r5, and I have a list with all 75 of those names in it; each matrix has the same number of rows and columns. Inside some nested for loops, I also have a line of code that, for the first matrix looks like this:
total <- (a1r1[row,i]) + (a1r1[row,j]) + (a1r1[row,k])
(i, j, k, and row are all variables that I am looping over.) I would like to automate this line so that the for loops would fully execute using the first matrix in the list, then fully execute using the second matrix and so on. How can I do this?
(I'm an experienced programmer, but new to R, so I'm willing to be told I shouldn't use a list of the matrix names, etc. I realize too that there's probably a better way in R than for loops, but I was hoping for sort of quick and dirty at my current level of R expertise.)
Thanks in advance for the help.
Here The R way to do this :
lapply(ls(pattern='a[0-9]r[0-9]'),
function(nn) {
x <- get(nn)
sum(x[row,c(i,j,k)])
})
ls will give a list of variable having a certain pattern name
You loop through the resulted list using lapply
get will transform the name to a varaible
use multi indexing with the vectorized sum function
It's not bad practice to build automatically lists of names designating your objects. You can build such lists with paste, rep, and sequences as 0:10, etc. Once you have a list of object names (let's call it mylist), the get function applied on it gives the objects themselves.

Getting the index of an iterator in R (in parallel with foreach)

I'm using the foreach function to iterate over columns of a data.frame. At each iteration, I would like to get the index of the iterator (i.e. the index or the name of the column considered) and the column itself.
However, the following code, which seems fine in first place, doesn't work because i has no names or colnames attributes.
foreach(i=iter(base[1:N],by='col')) %dopar% c(colnames(i),i)
Now, if you wonder why I'm not iterating over indexes, the reason is that I'm using the %dopar% tool and I don't want to send the whole base to all workers, but only the columns each of them require.
Question : How can I get the index of an iterator ?
Thank you
I would just specify a second iteration variable in the foreach loop that acts as a counter:
library(foreach)
library(iterators)
df <- data.frame(a=1:10, b=rnorm(10), c=runif(10))
r <- foreach(d=df, i=icount()) %do% {
list(d=d, i=i)
}
The "icount" function from the iterators package will return an unbounded counting iterator if no arguments are used, so this example works regardless of the number of columns in the data frame.
You could also include the column name as a third iteration variable:
r <- foreach(d=df, i=icount(), nm=colnames(df)) %do% {
list(d=d, i=i, nm=nm)
}
Here are a couple of possibilities:
Modify the iter function (or write your own) so that instead of sending just the value of the column it includes the names or other information)
You could iterate over the indexes, but use a shared memory tool (such as the Rdsm package) so that each process only needs to grab the part of the data frame it needs rather than distributing the entire data frame.
You could convert your base data frame into a list where each element contains the corresponding column of base along with the column name, then iterate over that list (so the entire element is sent, but not the other elements).

Saving many subsets as dataframes using "for"-loops

this question might be very simple, but I do not find a good way to solve it:
I have a dataset with many subgroups which need to be analysed all-together and on their own. Therefore, I want to use subsets for the groups and use them for the later analysis. As well, the defintion of the subsets as the analysis should be partly done with loops in order to save space and to ensure that the same analysis has been done with all subgroups.
Here is an example of my code using an example dataframe from the boot package:
data(aids)
qlist <- c("1","2","3","4")
for (i in length(qlist)) {
paste("aids.sub.",qlist[i],sep="") <- subset(aids, quarter==qlist[i])
}
The variable which contains the subgroups in my dataset is stored as a string, therefore I added the qlist part which would be not required otherwise.
Make a list of the subsets with lapply:
lapply(qlist, function(x) subset(aids, quarter==x))
Equivalently, avoiding the subset():
lapply(qlist, function(x) aids[aids$quarter==x,])
It is likely the case that using a list will make the subsequent code easier to write and understand. You can subset the list to get a single data frame (just as you can use one of the subsets, as created below). But you can also iterate over it (using for or lapply) without having to construct variable names.
To do the job as you are asking, use assign:
for (i in qlist) {
assign(paste("aids.sub.",i,sep=""), subset(aids, quarter==i))
}
Note the removal of the length() function, and that this is iterating directly over qlist.

Resources