Cleaner method for mass-declaring lists? - r

I have an iterative process that involves appending modified/utilised data to sublists of lists. It feels quite messy to declare a bunch of lists e.g. testList <- list() x 12, so that within the loop I can jump straight into the below code:
testList <- list()
otherTestList <- list()
anotherTestList <- list()
for(i in 1:10){
testList[[i]] <- testData
otherTestList[[i]] <- otherTestData
anotherTestList[[i]] <- anotherTestData
}
The above code would require declaration of 3 lists at the start of the code, which wouldn't be so much of an issue, but I have around 12 lists so their mass declaration makes the code quite ugly. I was wondering if there was a solution to this problem? Something I'd considered was somewhere along the lines of using lapply to create the lists in one line, but this doesn't appear to be an option due to the need to declare the objects as some type prior to using them as lists.

You can use assign(). Say you have the names of your lists or want to set them iteratively:
for (i in 1:5){
assign(paste0("x",i),list())
}
If you have the names:
listnames = c("testList", "otherTestList","anotherTestList", ...)
for (i in 1:5){
assign(listnames[i],list())
}
This will give you 5 variables named x1, x2 or testList, otherTestList... containing an empty list each.

Related

access indexed data in a loop using R

Lets say I have 5 databases, named data1-data5. I basically want to create a loop that prints the first 10 rows of the data. In my naïve mind, the code should look something like this:
for (i in 1:5){
print(head(data[i]))
}
That does not work. What's the proper way to do this? How do I define [i] as the "indexing" variable for the different databases?
Another way would be to use get function:
for (i in 1:5){
tmp <- get(paste0("data", i))
## Assigns the data to the variable tmp - just like tmp <- data1/data2/data3 etc
print(head(tmp))
}
It would be better to put these objects in a list and use [[ to reference them. But if you must use separate names for the objects, then you need to parse them and evaluate the resulting expressions.
Here's an example you can emulate. For brevity, it prints the values of numerical objects rather than the heads of "databases."
data1 <- 1; data2 <- 2; data3 <- 3
for (i in 1:3) {
print(eval(parse(text=paste0("data", i))))
}

How to apply single-variable function to multiple lists in R

I am using the tidytransit package to analyze 7 GTFS feeds in R. Tidytransit presents these feeds as lists of dataframes. I want to execute an operation on each of these lists (set_servicepattern(x)) but don't feel like typing it out seven times, so would like to build a function that loops the operation across all seven lists. I've attempted this with lapply, but cannot seem to figure out how to get the result from lapply out of the list structure back into each of its sub-lists to overwrite the input list. Suggestions?
Here's the relevant portion of my code:
nyc_subway <- read_gtfs("http://web.mta.info/developers/data/nyct/subway/google_transit.zip")
bus_bk <- read_gtfs("http://web.mta.info/developers/data/nyct/bus/google_transit_brooklyn.zip")
bus_qn <- read_gtfs("http://web.mta.info/developers/data/nyct/bus/google_transit_queens.zip")
bus_bx <- read_gtfs("http://web.mta.info/developers/data/nyct/bus/google_transit_bronx.zip")
bus_ma <- read_gtfs("http://web.mta.info/developers/data/nyct/bus/google_transit_manhattan.zip")
bus_si <- read_gtfs("http://web.mta.info/developers/data/nyct/bus/google_transit_staten_island.zip")
bus_bc <- read_gtfs("http://web.mta.info/developers/data/busco/google_transit.zip")
list_2 <- list(nyc_subway, bus_bk, bus_qn, bus_bc, bus_bx, bus_ma, bus_si)
a <- function(x){
set_servicepattern(x)
return(x)}
lapply(list_2, a)
If we want to overwrite the objects, create a named list and then use list2env (not recommended as it is better to keep it in the list)
names(list_2) <- c('nyc_subway', 'bus_bk', 'bus_qn',
'bus_bc', 'bus_bx', 'bus_ma', 'bus_si')
list2env(list_2, .GlobalEnv)

How to use extract function in a for loop?

I am using the extract function in a loop. See below.
for (i in 1:length(list_shp_Tanzania)){
LU_Mod2000<- extract(x=rc_Mod2000_LC, y=list_shp_Tanzania[[i]], fun=maj)
}
Where maj function is:
maj <- function(x){
y <- as.numeric(names(which.max(table(x))))
return(y)
}
I was expecting to get i outputs, but I get only one output once the loop is done. Somebody knows what I am doing wrong. Thanks.
One solution in this kind of situation is to create a list and then assign the result of each iteration to the corresponding element of the list:
LU_Mod2000 <- vector("list", length(list_shp_Tanzania))
for (i in 1:length(list_shp_Tanzania)){
LU_Mod2000[[i]] <- extract(x=rc_Mod2000_LC, y=list_shp_Tanzania[[i]], fun=maj)
}
Do not do
LU_Mod2000 <- c(LU_Mod2000, extract(x=rc_Mod2000_LC, y=list_shp_Tanzania[[i]], fun=maj))
inside the loop. This will create unnecessary copies and will take long to run. Use the list method, and after the loop, convert the list of results to the desired format (usually using do.call(LU_Mod2000, <some function>))
Alternatively, you could substitute the for loop with lapply, which is what many people seem to prefer
LU_Mod2000 <- lapply(list_shp_Tanzania, function(z) extract(x=rc_Mod2000_LC, y=z, fun=maj))

Running the same function multiple times and saving results with different names in workspace

So, I built a function called sort.song.
My goal with this function is to randomly sample the rows of a data.frame (DATA) and then filter it out (DATA.NEW) to analyse it. I want to do it multiple times (let's say 10 times). By the end, I want that each object (mantel.something) resulted from this function to be saved in my workspace with a name that I can relate to each cycle (mantel.something1, mantel.somenthing2...mantel.something10).
I have the following code, so far:
sort.song<-function(DATA){
require(ade4)
for(i in 1:10){ # Am I using for correctly here?
DATA.NEW <- DATA[sample(1:nrow(DATA),replace=FALSE),]
DATA.NEW <- DATA.NEW[!duplicated(DATA.NEW$Point),]
coord.dist<-dist(DATA.NEW[,4:5],method="euclidean")
num.notes.dist<-dist(DATA.NEW$Num_Notes,method="euclidean")
songdur.dist<-dist(DATA.NEW$Song_Dur,method="euclidean")
hfreq.dist<-dist(DATA.NEW$High_Freq,method="euclidean")
lfreq.dist<-dist(DATA.NEW$Low_Freq,method="euclidean")
bwidth.dist<-dist(DATA.NEW$Bwidth_Song,method="euclidean")
hfreqlnote.dist<-dist(DATA.NEW$HighFreq_LastNote,method="euclidean")
mantel.numnotes[i]<<-mantel.rtest(coord.dist,num.notes.dist,nrepet=1000)
mantel.songdur[i]<<-mantel.rtest(coord.dist,songdur.dist,nrepet=1000)
mantel.hfreq[i]<<-mantel.rtest(coord.dist,hfreq.dist,nrepet=1000)
mantel.lfreq[i]<<-mantel.rtest(coord.dist,lfreq.dist,nrepet=1000)
mantel.bwidth[i]<<-mantel.rtest(coord.dist,bwidth.dist,nrepet=1000)
mantel.hfreqlnote[i]<<-mantel.rtest(coord.dist,hfreqlnote.dist,nrepet=1000)
}
}
Could someone please help me to do it the right way?
I think I'm not assigning the cycles correctly for each mantel.somenthing object.
Many thanks in advance!
The best way to implement what you are trying to do is through a list. You can even make it take two indices, the first for the iterations, the second for the type of analysis.
mantellist <- as.list(1:10) ## initiate list with some values
for (i in 1:10){
...
mantellist[[i]] <- list(numnotes=mantel.rtest(coord.dist,num.notes.dist,nrepet=1000),
songdur=mantel.rtest(coord.dist,songdur.dist,nrepet=1000),
hfreq=mantel.rtest(coord.dist,hfreq.dist,nrepet=1000),
...)
}
return(mantellist)
In this way you can index your specific analysis for each iteration in an intuitive way:
mantellist[[2]][['hfreq']]
mantellist[[2]]$hfreq ## alternative
EDIT by Mohr:
Just for clarification...
So, according to your suggestion the code should be something like this:
sort.song<-function(DATA){
require(ade4)
mantellist <- as.list(1:10)
for(i in 1:10){
DATA.NEW <- DATA[sample(1:nrow(DATA),replace=FALSE),]
DATA.NEW <- DATA.NEW[!duplicated(DATA.NEW$Point),]
coord.dist<-dist(DATA.NEW[,4:5],method="euclidean")
num.notes.dist<-dist(DATA.NEW$Num_Notes,method="euclidean")
songdur.dist<-dist(DATA.NEW$Song_Dur,method="euclidean")
hfreq.dist<-dist(DATA.NEW$High_Freq,method="euclidean")
lfreq.dist<-dist(DATA.NEW$Low_Freq,method="euclidean")
bwidth.dist<-dist(DATA.NEW$Bwidth_Song,method="euclidean")
hfreqlnote.dist<-dist(DATA.NEW$HighFreq_LastNote,method="euclidean")
mantellist[[i]] <- list(numnotes=mantel.rtest(coord.dist,num.notes.dist,nrepet=1000),
songdur=mantel.rtest(coord.dist,songdur.dist,nrepet=1000),
hfreq=mantel.rtest(coord.dist,hfreq.dist,nrepet=1000),
lfreq=mantel.rtest(coord.dist,lfreq.dist,nrepet=1000),
bwidth=mantel.rtest(coord.dist,bwidth.dist,nrepet=1000),
hfreqlnote=mantel.rtest(coord.dist,hfreqlnote.dist,nrepet=1000)
)
}
return(mantellist)
}
You can achieve your objective of repeating this exercise 10 (or more times) without using an explicit for-loop. Rather than have the function run the loop, write the sort.song function to run one iteration of the process, then you can use replicate to repeat that process however many times you desire.
It is generally good practice not to create a bunch of named objects in your global environment. Instead, you can hold of the results of each iteration of this process in a single object. replicate will return an array (if possible) otherwise a list (in the example below, a list of lists). So, the list will have 10 elements (one for each iteration) and each element will itself be a list containing named elements corresponding to each result of mantel.rtest.
sort.song<-function(DATA){
DATA.NEW <- DATA[sample(1:nrow(DATA),replace=FALSE),]
DATA.NEW <- DATA.NEW[!duplicated(DATA.NEW$Point),]
coord.dist <- dist(DATA.NEW[,4:5],method="euclidean")
num.notes.dist <- dist(DATA.NEW$Num_Notes,method="euclidean")
songdur.dist <- dist(DATA.NEW$Song_Dur,method="euclidean")
hfreq.dist <- dist(DATA.NEW$High_Freq,method="euclidean")
lfreq.dist <- dist(DATA.NEW$Low_Freq,method="euclidean")
bwidth.dist <- dist(DATA.NEW$Bwidth_Song,method="euclidean")
hfreqlnote.dist <- dist(DATA.NEW$HighFreq_LastNote,method="euclidean")
return(list(
numnotes = mantel.rtest(coord.dist,num.notes.dist,nrepet=1000),
songdur = mantel.rtest(coord.dist,songdur.dist,nrepet=1000),
hfreq = mantel.rtest(coord.dist,hfreq.dist,nrepet=1000),
lfreq = mantel.rtest(coord.dist,lfreq.dist,nrepet=1000),
bwidth = mantel.rtest(coord.dist,bwidth.dist,nrepet=1000),
hfreqlnote = mantel.rtest(coord.dist,hfreqlnote.dist,nrepet=1000)
))
}
require(ade4)
replicate(10, sort.song(DATA))

Iterating over separate lists in R

I have lots of variables in R, all of type list
a100 = list()
a200 = list()
# ...
p700 = list()
Each variable is a complicated data structure:
a200$time$data # returns 1000 x 1000 matrix
Now, I want to apply code to each variable in turn. However, since R doesn't support pass-by-reference, I'm not sure what to do.
One idea I had was to create a big list of all these lists, i.e.,
biglist = list()
biglist[[1]] = a100
...
And then I could iterate over biglist:
for (i in 1:length(biglist)){
biglist[[i]]$newstuff = "profit"
# more code here
}
And finally, after the loop, go backwards so that existing code (that uses variable names) still works:
a100 = biglist[[1]]
# ...
The question is: is there a better way to iterate over a set of named lists? I have a feeling that I'm doing things horribly wrong. Is there something easier, like:
# FAKE, Idealized code:
foreach x in (a100, a200, ....){
x$newstuff = "profit"
}
a100$newstuff # "profit"
To parallel walk over lists you can use mapply, which will take parallel lists and then walk over them in lock-step. Furthermore, in a functional language you should emit the object that you want rather than modify the data structure within a function call.
You should use the sapply, apply, lapply, ... family of functions.
jim
jimmyb is quite right. lapply and sapply are specifically designed to work on lists. So they would work with your biglist as well. You shouldn't forget to return the object in the nested function though : An example :
X <- list(A=list(A1=1:2,A2=3:4),B=list(B1=5:6,B2=7:8))
lapply(X,function(i){
i$newstuff = "profit"
return(i)
})
Now as you said, R passes by value so you have multiple copies of the data roaming around. If you work with really big lists, you might want to try toning the memory usage down by working on each variable seperately, using assign and get. The following is considered bad coding, but can sometimes be necessary to avoid memory trouble :
A <- X[[1]] ; B <- X[[2]] #make the data
list.names <- c("A","B")
for (i in list.names){
tmp <- get(i)
tmp$newstuff <- "profit"
assign(i,tmp)
rm(tmp)
}
Make sure you are well aware of the implication this code has, as you're working within the global environment. If you need to do this more often, you might want to work with environments instead :
my.env <- new.env() # make the environment
my.env$A <- X[[1]];my.env$B <- X[[2]] # put vars in environment
for (i in list.names){
tmp <- get(i,envir=my.env)
tmp$newstuff <- "profit"
assign(i,tmp,envir=my.env)
rm(tmp)
}
my.env$A
my.env$B

Resources