I have a model object m1. I need to create 100 distinctly named copies so I can adjust and plot each. To create the copies, I currently do this:
m1recip1 <- m1
m1recip2 <- m1
m1recip3 <- m1
m1recip4 <- m1
m1recip5 <- m1
m1recip6 <- m1
m1recip7 <- m1
...
m1recip100 <- m1
I planned to create these through a loop, but this is less efficient because I only know how to do so by initializing all 100 objects before looping through them. I'm effectively looking for something similar to the macro facility in other languages (where m1recip&i would produce the names iteratively). I'm sure R can do this - how?
As mentioned above, reconsider saving many similarly structured objects in the global environment. Instead, use a named list: you then maintain a single indexed object, and R has many tools (e.g., the apply family) for running operations across all of its elements.
Specifically, consider replicate (a wrapper around sapply) to build the 100 copies of m1, and use setNames to name them accordingly. The objects lose no functionality when stored in a list.
model_list <- setNames(replicate(100, m1, simplify = FALSE),
paste0("m1recip", 1:100))
model_list$m1recip1
model_list$m1recip2
model_list$m1recip3
...
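As a rough sketch of how the named list can then be used, assuming m1 is an lm fit on the built-in mtcars data (a stand-in, since the original model is not shown), one copy can be adjusted by name and a summary pulled from all copies at once:

```r
# Stand-in model; the real m1 from the question is not shown
m1 <- lm(mpg ~ wt, data = mtcars)

# 100 named copies held in a single list
model_list <- setNames(replicate(100, m1, simplify = FALSE),
                       paste0("m1recip", 1:100))

# Adjust one copy by name; the other 99 are untouched
model_list$m1recip2 <- update(model_list$m1recip2, . ~ . + hp)

# Run the same operation across every copy
slopes <- vapply(model_list, function(m) coef(m)[["wt"]], numeric(1))
head(slopes, 3)
```

A name built at run time works too, e.g. model_list[[paste0("m1recip", 10)]].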
Instead of assigning m1 to 100 objects, we can create a list with 100 elements like the following:
m1recip_list <- lapply(1:100, function(x) m1)
We can then reference each element by element number m1recip_list[[10]] or apply a function to every element of the list using lapply:
lapply(m1recip_list, some_function)
You can dynamically create object names using the paste function in a loop, and you can assign them values using the assign function as opposed to the "<-" operator.
for(i in 1:100) {
assign(paste("m1recip",i, sep = ""), m1)
}
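If you do go the assign route, the objects usually have to be looked up by constructed name again later. A minimal sketch, assuming a small list as a stand-in for m1, of how get and mget do that:

```r
# Stand-in for the model object; any R object works the same way
m1 <- list(coef = 1)

for (i in 1:100) {
  assign(paste0("m1recip", i), m1)
}

# get() looks up a single object by a name built at run time
one_model <- get(paste0("m1recip", 10))

# mget() gathers many objects by name into a named list
all_models <- mget(paste0("m1recip", 1:100))
length(all_models)
```

Note that mget ends up producing the same named list that the list-based answers build directly.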
Related
I am using the tidytransit package to analyze 7 GTFS feeds in R. Tidytransit presents these feeds as lists of dataframes. I want to execute an operation on each of these lists (set_servicepattern(x)) but don't feel like typing it out seven times, so would like to build a function that loops the operation across all seven lists. I've attempted this with lapply, but cannot seem to figure out how to get the result from lapply out of the list structure back into each of its sub-lists to overwrite the input list. Suggestions?
Here's the relevant portion of my code:
nyc_subway <- read_gtfs("http://web.mta.info/developers/data/nyct/subway/google_transit.zip")
bus_bk <- read_gtfs("http://web.mta.info/developers/data/nyct/bus/google_transit_brooklyn.zip")
bus_qn <- read_gtfs("http://web.mta.info/developers/data/nyct/bus/google_transit_queens.zip")
bus_bx <- read_gtfs("http://web.mta.info/developers/data/nyct/bus/google_transit_bronx.zip")
bus_ma <- read_gtfs("http://web.mta.info/developers/data/nyct/bus/google_transit_manhattan.zip")
bus_si <- read_gtfs("http://web.mta.info/developers/data/nyct/bus/google_transit_staten_island.zip")
bus_bc <- read_gtfs("http://web.mta.info/developers/data/busco/google_transit.zip")
list_2 <- list(nyc_subway, bus_bk, bus_qn, bus_bc, bus_bx, bus_ma, bus_si)
a <- function(x){
set_servicepattern(x)
return(x)}
lapply(list_2, a)
If we want to overwrite the objects, create a named list and then use list2env (not recommended, as it is usually better to keep the results in the list):
names(list_2) <- c('nyc_subway', 'bus_bk', 'bus_qn',
'bus_bc', 'bus_bx', 'bus_ma', 'bus_si')
list2env(list_2, .GlobalEnv)
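For the overwrite to carry the changes, the result of lapply has to be stored first. A minimal sketch of the full round trip, using two small made-up feeds and a hypothetical add_flag function standing in for set_servicepattern (the real feeds need a network download):

```r
# Made-up stand-ins for the GTFS feed objects
feed_a <- list(stops = data.frame(id = 1:2))
feed_b <- list(stops = data.frame(id = 3:5))

# Hypothetical stand-in for set_servicepattern(): returns a modified copy
add_flag <- function(x) {
  x$processed <- TRUE
  x
}

# Named list, so list2env() knows which variable each element belongs to
feeds <- list(feed_a = feed_a, feed_b = feed_b)

# Keep the return value of lapply(); the inputs themselves are not changed
feeds <- lapply(feeds, add_flag)

# Overwrite the original variables; environment() is the global
# environment when this is run at top level
invisible(list2env(feeds, envir = environment()))

feed_a$processed
```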
In R, one can use the <<- operator within lapply() to assign a value to a variable outside lapply().
Let's consider a matrix full of 1s:
m<-matrix(data=1, nrow=5, ncol=5)
Let's say I want to replace each row with the values 1, 2, 3, 4 and 5 using the <<- assignment operator. I can use the lapply function (it is not designed for this kind of operation; this is only an example):
lapply(X = seq(nrow(m)), FUN = function(r){
m[r,]<<-seq(5)
})
This will work.
But if I now use mclapply like this:
mclapply(X = seq(nrow(m)), FUN = function(r){
m[r,]<<-seq(5)
})
The matrix m will remain full of 1.
The idea is to apply changes to rows of a matrix without creating a new one, assigning them in the existing one instead. The only constraint is to use a function from the parallel package (e.g. mclapply(), but maybe another function would be a better fit).
Also, using the <<- operator is not mandatory.
How can I do that?
You can't assign in parallel: mclapply() runs the function in forked child processes, so each worker is just assigning to its own local copy of the matrix.
Two solutions:
Use shared memory (e.g. matrices on disk using package {bigstatsr}; disclaimer: I'm the author)
Don't assign in the first place. Just run the lapply(), get all the results parts as a list and use do.call("rbind", list).
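A minimal sketch of the second option, assuming two cores are available (forking is not supported on Windows, so the sketch falls back to one core there): each worker returns its row, and the matrix is built once at the end.

```r
library(parallel)

# mclapply() cannot fork on Windows, so use a single core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Each worker returns one row instead of assigning into m
rows <- mclapply(seq_len(5), function(r) seq(5), mc.cores = n_cores)

# Bind the returned rows into the final matrix
m <- do.call(rbind, rows)
dim(m)
```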
How about this, using the future package?
library(future)
plan(multisession)  # plan(multiprocess) is deprecated in recent versions of future
m <- matrix(data = 1, nrow = 5, ncol = 5)
# create a set of futures, so the values are calculated in parallel and
# not yet sent back to the main environment
fs <- lapply(seq(nrow(m)), function(x) future(seq(5) + x))
# then pull the values one by one and assign them where they belong
for (i in seq(nrow(m))) {
m[i, ] <- value(fs[[i]])
}
# or the same way you did it:
lapply(X = seq(nrow(m)), FUN = function(r){
m[r,] <<- value(fs[[r]])
})
The drawback here is that the values are assigned sequentially, but at least they are calculated in parallel. Then again, I don't think you intend to use the matrix before all calculations are done anyway.
I have a list of objects in R on which I perform different actions using lapply. However in the next step I would like to apply functions only to certain elements of the list. Therefore I would like to split the list into the original variables again. Is there a command in R for this? Is it even possible, or do I have to create new variables each time I want to do this?
See the following example to make clear what I mean:
# 3 vectors:
test1 <- 1:3
test2 <- 2:6
test3 <- 8:9
# list l:
l <- list(test1,test2,test3)
# add 3 to each element of the list:
l <- lapply(l, function(x) x+3)
# In effect, list l contains modified versions of the three test vectors
Question: How can I assign those modifications to the original variables again? I do not want to do:
test1 <- l[[1]]
test2 <- l[[2]]
test3 <- l[[3]]
# etc.
Is there a better way to do that?
A more intuitive approach, assuming you're new to R, might be to use a for loop. I do think that Richard Scriven's approach is better. It is at least more concise.
for (i in seq_along(l)) {
name <- paste0("test", i)  # build the variable name: test1, test2, ...
assign(name, l[[i]] + 3)
}
That being said, your ultimate goal is a bit dubious. I would recommend that you save the results in a list or matrix, especially if you are new to R. By keeping all the results in a list or matrix you can continue to use functions like lapply and sapply to manipulate them.
Loosely speaking, Richard Scriven's approach in the comments converts each element of your list into an object and then passes these objects to the enclosing environment, which is the global environment in this case. He could have passed the objects to any environment. For example, try:
e <- new.env()
list2env(lapply(mget(ls(pattern = "test[1-3]")), "+", 3), e)
Notice that test1, test2 and test3 are now in environment e. Try e$test1 or ls(e). Going deeper into the parentheses, the call to ls uses a simple regular expression to tell mget which object names to look for. For more, take a look at http://adv-r.had.co.nz/Environments.html.
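To make the nesting easier to follow, here is the same pipeline unrolled one step at a time. It is only a sketch: the source environment src and the intermediate names nms, vals and shifted are introduced for illustration, so that nothing depends on what happens to be in your workspace.

```r
# Put the three vectors in their own environment (illustrative only)
src <- new.env()
src$test1 <- 1:3
src$test2 <- 2:6
src$test3 <- 8:9

# Step 1: ls() finds the object names matching the pattern
nms <- ls(src, pattern = "^test[1-3]$")

# Step 2: mget() fetches those objects as a named list
vals <- mget(nms, envir = src)

# Step 3: lapply() adds 3 to every element
shifted <- lapply(vals, "+", 3)

# Step 4: list2env() turns each list element into a variable in e
e <- new.env()
invisible(list2env(shifted, envir = e))

ls(e)
e$test1
```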
Function lm(...) returns an object of class 'lm'. How do I create an array of such objects? I want to do the following:
my_lm_array <- rep(as.lm(NULL), 20)
#### next, populate this array by running lm() repeatedly:
for(i in 1:20) {
my_lm_array[i] <- lm(my_data$results ~ my_data[i,])
}
Obviously the line "my_lm_array <- rep(as.lm(NULL), 20)" does not work. I'm trying to create an array of objects of type 'lm'. How do I do that?
Not sure it will answer your question, but if what you want to do is run a series of lm fits of one variable against different columns of a data frame, you can do something like this:
data <- data.frame(result=rnorm(10), v1=rnorm(10), v2=rnorm(10))
my_lms <- lapply(data[,c("v1","v2")], function(v) {
lm(data$result ~ v)
})
Then, my_lms would be a list of elements of class lm.
Well, you can create an array of empty/meaningless lm objects as follows:
z <- NA
class(z) <- "lm"
lm_array <- replicate(20,z,simplify=FALSE)
but that's probably not the best way to solve the problem. You could just create an empty list of the appropriate length (vector("list",20)) and fill in the elements as you go along: R is weakly enough typed that it won't mind you replacing NULL values with lm objects. More idiomatically, though, you can run lapply on your list of predictor names:
my_data <- data.frame(result=rnorm(10), v1=rnorm(10), v2=rnorm(10))
prednames <- setdiff(names(my_data),"result") ## extract predictor names
lapply(prednames,
function(n) lm(reformulate(n,response="result"),
data=my_data))
Or, if you don't feel like creating an anonymous function, you can first generate a list of formulae (using lapply) and then run lm on them:
formList <- lapply(prednames,reformulate,response="result") ## create formulae
lapply(formList,lm,data=my_data) ## run lm() on each formula in turn
will create the same list of lm objects as the first strategy above.
In general it is good practice to avoid using syntax such as my_data$result inside modeling formulae; instead, try to set things up so that all the variables in the model are drawn from inside the data object. That way methods like predict and update are more likely to work correctly ...
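For completeness, a minimal sketch of the "empty list filled as you go" variant mentioned above, on simulated data (the seed and the list name fits are just for illustration). Because each model draws its variables from data=, predict works with newdata:

```r
set.seed(1)
my_data <- data.frame(result = rnorm(10), v1 = rnorm(10), v2 = rnorm(10))
prednames <- setdiff(names(my_data), "result")

# Empty list of the right length; elements start out as NULL
fits <- vector("list", length(prednames))
names(fits) <- prednames

# Replace each NULL with an lm object
for (n in prednames) {
  fits[[n]] <- lm(reformulate(n, response = "result"), data = my_data)
}

class(fits[["v1"]])

# Works because the formula refers to columns of the data argument
predict(fits[["v1"]], newdata = data.frame(v1 = 0))
```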
I have lots of variables in R, all of type list
a100 = list()
a200 = list()
# ...
p700 = list()
Each variable is a complicated data structure:
a200$time$data # returns 1000 x 1000 matrix
Now, I want to apply code to each variable in turn. However, since R doesn't support pass-by-reference, I'm not sure what to do.
One idea I had was to create a big list of all these lists, i.e.,
biglist = list()
biglist[[1]] = a100
...
And then I could iterate over biglist:
for (i in 1:length(biglist)){
biglist[[i]]$newstuff = "profit"
# more code here
}
And finally, after the loop, go backwards so that existing code (that uses variable names) still works:
a100 = biglist[[1]]
# ...
The question is: is there a better way to iterate over a set of named lists? I have a feeling that I'm doing things horribly wrong. Is there something easier, like:
# FAKE, Idealized code:
foreach x in (a100, a200, ....){
x$newstuff = "profit"
}
a100$newstuff # "profit"
To walk over several lists in parallel you can use mapply, which takes the lists and steps through them in lock-step. Furthermore, in a functional language you should return the object that you want rather than modify the data structure within a function call.
You should use the sapply, apply, lapply, ... family of functions.
jim
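A small sketch of the lock-step idea, using Map (which is mapply with SIMPLIFY = FALSE). The two input lists and the labels vector are made up for illustration; the function returns the modified element rather than changing it in place:

```r
# Made-up stand-ins for the named lists in the question
lists  <- list(a100 = list(id = 100), a200 = list(id = 200))
labels <- c("first", "second")

# Walk over both inputs together and return the modified element each time
out <- Map(function(x, lab) {
  x$newstuff <- lab
  x
}, lists, labels)

out$a200$newstuff
```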
jimmyb is quite right. lapply and sapply are specifically designed to work on lists, so they would work with your biglist as well. Don't forget to return the object from the inner function, though. An example:
X <- list(A=list(A1=1:2,A2=3:4),B=list(B1=5:6,B2=7:8))
lapply(X,function(i){
i$newstuff = "profit"
return(i)
})
Now, as you said, R passes by value, so you have multiple copies of the data roaming around. If you work with really big lists, you might want to try toning the memory usage down by working on each variable separately, using assign and get. The following is considered bad coding, but can sometimes be necessary to avoid memory trouble:
A <- X[[1]] ; B <- X[[2]] #make the data
list.names <- c("A","B")
for (i in list.names){
tmp <- get(i)
tmp$newstuff <- "profit"
assign(i,tmp)
rm(tmp)
}
Make sure you are well aware of the implications of this code, as you're working within the global environment. If you need to do this more often, you might want to work with environments instead:
my.env <- new.env() # make the environment
my.env$A <- X[[1]];my.env$B <- X[[2]] # put vars in environment
for (i in list.names){
tmp <- get(i,envir=my.env)
tmp$newstuff <- "profit"
assign(i,tmp,envir=my.env)
rm(tmp)
}
my.env$A
my.env$B
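Since environments are not copied when passed to a function, the loop can also be wrapped in a function that modifies the environment in place. A minimal sketch, where add_newstuff is a hypothetical helper name:

```r
my.env <- new.env()
my.env$A <- list(A1 = 1:2, A2 = 3:4)
my.env$B <- list(B1 = 5:6, B2 = 7:8)

# Hypothetical helper: changes made to env are visible to the caller,
# because environments have reference semantics
add_newstuff <- function(env) {
  for (nm in ls(env)) {
    env[[nm]]$newstuff <- "profit"
  }
  invisible(env)
}

add_newstuff(my.env)
my.env$A$newstuff
```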