Reassign variables to elements of list in R

I have a list of objects in R on which I perform different actions using lapply. However, in the next step I would like to apply functions only to certain elements of the list, so I would like to split the list back into the original variables. Is there a command in R for this? Is it even possible, or do I have to create new variables each time I want to do this?
See the following example to make clear what I mean:
# 3 vectors:
test1 <- 1:3
test2 <- 2:6
test3 <- 8:9
# list l:
l <- list(test1,test2,test3)
# add 3 to each element of the list:
l <- lapply(l, function(x) x+3)
# In effect, list l contains modified versions of the three test vectors
Question: How can I assign those modifications to the original variables again? I do not want to do:
test1 <- l[[1]]
test2 <- l[[2]]
test3 <- l[[3]]
# etc.
Is there a better way to do that?

A more intuitive approach, assuming you're new to R, might be to use a for loop, although I think Richard Scriven's approach is better, or at least more concise.
for (i in seq_along(l)) {
name <- paste0("test", i)
assign(name, l[[i]] + 3)  # drop "+ 3" if l already holds the modified vectors
}
That all being said, your ultimate goal is a bit dubious. I would recommend that you save the results in a list or matrix, especially if you are new to R. By including all the results in a list or matrix you can continue to use functions like lapply and sapply to manipulate your results.
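If the real goal is to apply a function only to certain elements, a named list already allows that without splitting it back into variables. The sketch below reuses the question's test vectors; building the list with mget() (so the elements carry names) and the name vector `chosen` are illustrative choices, not part of the original answer.

```r
test1 <- 1:3
test2 <- 2:6
test3 <- 8:9

# mget() returns a named list, using the object names as element names
l <- mget(c("test1", "test2", "test3"))

# Apply a function to every element
l <- lapply(l, function(x) x + 3)

# Apply a second function only to a chosen subset, selected by name
chosen <- c("test1", "test3")
l[chosen] <- lapply(l[chosen], function(x) x * 2)

l$test1  # 8 10 12
l$test2  # 5 6 7 8 9 (not touched by the second step)
```

Selecting by name rather than position keeps the code readable even when the list grows.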
Loosely speaking, Richard Scriven's approach in the comments converts each element of your list into an object and then passes these objects to the enclosing environment, which is the global environment in this case. He could have passed the objects to any environment. For example, try:
e <- new.env()
list2env(lapply(mget(ls(pattern = "test[1-3]")), "+", 3), e)
Notice that test1, test2 and test3 are now in environment e. Try e$test1 or ls(e). Going deeper into the parentheses, the call to ls uses a simple regular expression to tell mget the names of the objects to look for. For more, take a look at http://adv-r.had.co.nz/Environments.html.
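A minimal round-trip sketch of the same idea: modify the objects into a separate environment first, then copy them back over the originals. Anchoring the pattern as "^test[1-3]$" is an assumption on my part, so that names such as "mytest10" would not match; the final list2env() call into environment() is illustrative.

```r
test1 <- 1:3
test2 <- 2:6
test3 <- 8:9

# Collect the matching objects into a named list, add 3 to each,
# and write the results into a fresh environment e
e <- new.env()
list2env(lapply(mget(ls(pattern = "^test[1-3]$")), "+", 3), envir = e)

ls(e)     # "test1" "test2" "test3"
e$test1   # 4 5 6

# The originals are untouched until the results are copied back
list2env(as.list(e), envir = environment())
test1     # 4 5 6
```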

Related

How to apply single-variable function to multiple lists in R

I am using the tidytransit package to analyze 7 GTFS feeds in R. Tidytransit represents these feeds as lists of data frames. I want to run one operation, set_servicepattern(x), on each of these lists, but I don't want to type it out seven times, so I would like to build a function that loops the operation across all seven lists. I've attempted this with lapply, but I cannot figure out how to get the results out of the list that lapply returns and back into the original objects, overwriting them. Suggestions?
Here's the relevant portion of my code:
nyc_subway <- read_gtfs("http://web.mta.info/developers/data/nyct/subway/google_transit.zip")
bus_bk <- read_gtfs("http://web.mta.info/developers/data/nyct/bus/google_transit_brooklyn.zip")
bus_qn <- read_gtfs("http://web.mta.info/developers/data/nyct/bus/google_transit_queens.zip")
bus_bx <- read_gtfs("http://web.mta.info/developers/data/nyct/bus/google_transit_bronx.zip")
bus_ma <- read_gtfs("http://web.mta.info/developers/data/nyct/bus/google_transit_manhattan.zip")
bus_si <- read_gtfs("http://web.mta.info/developers/data/nyct/bus/google_transit_staten_island.zip")
bus_bc <- read_gtfs("http://web.mta.info/developers/data/busco/google_transit.zip")
list_2 <- list(nyc_subway, bus_bk, bus_qn, bus_bc, bus_bx, bus_ma, bus_si)
a <- function(x){
set_servicepattern(x)
return(x)}
lapply(list_2, a)
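If you want to overwrite the original objects, keep the result of lapply, name the list, and then use list2env (not recommended, as it is better to keep everything in the list). Note also that your helper a returns x unchanged: R does not modify arguments in place, so the result of set_servicepattern(x) is simply discarded. Pass set_servicepattern to lapply directly instead:
list_2 <- lapply(list_2, set_servicepattern)
names(list_2) <- c('nyc_subway', 'bus_bk', 'bus_qn',
'bus_bc', 'bus_bx', 'bus_ma', 'bus_si')
list2env(list_2, .GlobalEnv)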
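To avoid typing the names twice, mget() can build the named list straight from a vector of object names. The sketch below runs offline, so it uses small toy lists in place of the read_gtfs() results, and `add_flag` is a hypothetical stand-in for set_servicepattern() that, like it, returns a modified copy.

```r
# Toy stand-ins for the GTFS objects (the real ones come from read_gtfs())
nyc_subway <- list(stops = data.frame(id = 1:2))
bus_bk     <- list(stops = data.frame(id = 3:5))

# Hypothetical stand-in for set_servicepattern(): returns a modified copy
add_flag <- function(x) {
  x$flag <- TRUE
  x
}

feed_names <- c("nyc_subway", "bus_bk")
feeds <- mget(feed_names)                # names come from feed_names
feeds <- lapply(feeds, add_flag)         # keep the result of lapply
list2env(feeds, envir = environment())   # overwrite the original objects

isTRUE(nyc_subway$flag)  # TRUE
```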

Creating and editing many objects at the same time

I have a model object m1. I need to create 100 distinctly named copies so I can adjust and plot each. Currently I create each copy like this:
m1recip1 <- m1
m1recip2 <- m1
m1recip3 <- m1
m1recip4 <- m1
m1recip5 <- m1
m1recip6 <- m1
m1recip7 <- m1
...
m1recip100 <- m1
I planned to create these through a loop, but this is less efficient because I only know how to do so by initializing all 100 objects before looping through them. I'm effectively looking for something similar to the macro facility in other languages (where m1recip&i would produce the names iteratively). I'm sure R can do this - how?
As mentioned above, reconsider saving many similarly structured objects in the global environment. Instead, use a named list, so that you maintain a single indexed object, and R has many tools (e.g., the apply family) for running operations across all of its elements.
Specifically, consider replicate (a wrapper around sapply) to build the 100 copies of m1, and setNames to name them accordingly. An object loses no functionality by being stored in a list.
model_list <- setNames(replicate(100, m1, simplify = FALSE),
paste0("m1recip", 1:100))
model_list$m1recip1
model_list$m1recip2
model_list$m1recip3
...
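A runnable sketch of the replicate/setNames approach. The question does not show m1, so an lm fit on mtcars is assumed here purely for illustration; the point is that each list element is an independent copy that can be adjusted on its own.

```r
# Stand-in model; the question's m1 is not shown
m1 <- lm(mpg ~ wt, data = mtcars)

model_list <- setNames(replicate(100, m1, simplify = FALSE),
                       paste0("m1recip", 1:100))

# Adjusting one copy leaves the others (and m1) unchanged
model_list$m1recip2$coefficients[["wt"]] <- 0

length(model_list)                        # 100
model_list$m1recip1$coefficients[["wt"]]  # still the slope from m1
```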
Instead of assigning m1 to 100 objects, we can create a list with 100 elements like the following:
m1recip_list <- lapply(1:100, function(x) m1)
We can then reference each element by its position, e.g. m1recip_list[[10]], or apply a function to every element of the list using lapply:
lapply(m1recip_list, some_function)
You can dynamically create object names using the paste function in a loop, and you can assign them values using the assign function as opposed to the "<-" operator.
for(i in 1:100) {
assign(paste("m1recip",i, sep = ""), m1)
}
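Objects created with assign() have names that exist only at run time, so they are read back with get(), or collected into a named list with mget(). A short sketch, again assuming an lm fit as a stand-in for m1:

```r
m1 <- lm(mpg ~ wt, data = mtcars)  # stand-in for the question's model

for (i in 1:100) {
  assign(paste0("m1recip", i), m1)
}

# get() retrieves one object by its constructed name ...
fit42 <- get("m1recip42")
# ... and mget() gathers all of them back into a named list
recips <- mget(paste0("m1recip", 1:100))
length(recips)  # 100
```

Note that the loop ends with the same named list the other answers build directly, which is why they suggest starting there.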

Applying multiple functions via sapply

I'm trying to replicate a solution for applying multiple functions in sapply, posted on R-Bloggers, but I can't get it to work as desired. I'm working with a simple data set, similar to the one generated below:
require(datasets)
crs_mat <- cor(mtcars)
# Triangle function
get_upper_tri <- function(cormat){
cormat[lower.tri(cormat)] <- NA
return(cormat)
}
require(reshape2)
crs_mat <- melt(get_upper_tri(crs_mat))
I would like to replace some text values across columns Var1 and Var2. The erroneous syntax below illustrates what I am trying to achieve:
crs_mat[,1:2] <- sapply(crs_mat[,1:2], function(x) {
# Replace first phrase
gsub("mpg","MPG",x),
# Replace second phrase
gsub("gear", "GeArr",x)
# Ideally, perform other changes
})
Naturally, the code is not syntactically correct and fails. To summarise, I would like to do the following:
Go through all the values in the first two columns (Var1 and Var2) and perform simple replacements via gsub.
Ideally, avoid defining a separate function, as discussed in the linked post, and keep everything within the sapply call.
I don't want a nested loop.
I had a look at the broadly similar subjects discussed here and here but, if possible, I would like to avoid using plyr. I also want to replace the values in the existing columns, not create new columns, and I would like to avoid specifying any column names: while working with my existing data frame, it is more convenient for me to use column numbers.
Edit
Following very useful comments, what I'm trying to achieve can be summarised in the solution below:
library(stringr)  # provides str_wrap()
fun.clean.columns <- function(x, str_width = 15) {
# Make character
x <- as.character(x)
# Replace various phrases
x <- gsub("perc85","something else", x)
# The replacement string for "again" is missing in the original; fill it in:
# x <- gsub("again", <replacement>, x)
x <- gsub("more","even more", x)
x <- gsub("abc","ohmg", x)
# Clean spaces
x <- trimws(x)
# Wrap strings
x <- str_wrap(x, width = str_width)
# Return object
return(x)
}
mean_data[,1:2] <- sapply(mean_data[,1:2], fun.clean.columns)
I don't need this function in my global environment, so I can run rm afterwards, but an even nicer solution would squeeze it into the apply call itself.
We can use mgsub from library(qdap) to replace multiple patterns. Here, I loop over the first and second columns with lapply and assign the results back to crs_mat[,1:2]. Note that I use lapply instead of sapply because lapply keeps the structure (a list of columns) intact.
library(qdap)
crs_mat[,1:2] <- lapply(crs_mat[,1:2], mgsub,
pattern=c('mpg', 'gear'), replacement=c('MPG', 'GeArr'))
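If you would rather not depend on qdap, base R's Reduce() can chain any number of gsub() calls inside the anonymous function, which also keeps everything within the apply call as the question asked. This is a swapped-in base-R technique, not what the answer above uses; the small data frame stands in for the melted correlation matrix, and the lookup vector `replacements` is hypothetical.

```r
# Small stand-in for the melted correlation matrix
crs_mat <- data.frame(Var1 = factor(c("mpg", "gear", "cyl")),
                      Var2 = factor(c("gear", "mpg", "mpg")),
                      value = c(1, -0.5, 0.8))

# Hypothetical lookup: names are patterns, values are replacements
replacements <- c(mpg = "MPG", gear = "GeArr")

crs_mat[, 1:2] <- lapply(crs_mat[, 1:2], function(x) {
  # Apply each gsub in turn, feeding one result into the next
  Reduce(function(acc, p) gsub(p, replacements[[p]], acc, fixed = TRUE),
         names(replacements), as.character(x))
})

crs_mat$Var1  # "MPG" "GeArr" "cyl"
```

Adding another replacement then only means adding one entry to the lookup vector.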
Here is the start of a solution; I think you can extend it yourself. There are probably more elegant approaches available, but I don't see them at the moment.
crs_mat[,1:2] <- sapply(crs_mat[,1:2], function(x) {
# Replace first phrase
step1 <- gsub("mpg","MPG",x)
# Replace second phrase. Note that this operates on the already-modified vector step1.
step2 <- gsub("gear", "GeArr",step1)
# Ideally, perform other changes
return(step2)
#or one nested line, not practical if more needs to be done
#return(gsub("gear", "GeArr",gsub("mpg","MPG",x)))
})

How to index (subset) over a list of data.frames

I have a list of several data.frames and I want to remove the first 2 columns from each of them. I did it as follows, but I feel this could be more R-ish.
data(mtcars)
data(iris)
myList <- list(A = mtcars, B = iris)
# helper function
removeCols <- function(df, vec) {
df[, -vec]
}
lapply(myList,removeCols,1:2)
Obviously this does the job, but it seems to me that I must have missed something here (such as using an operator within lapply, since an operator is technically a function too).
However, the major disadvantage of this approach is that you need a little helper function for every little task you want to do to all elements of that list.
Your code is perfectly good R. But you have two alternative options:
Use an anonymous function - this is a general solution
Use the [ operator - specific to this case
Your original:
xx <- lapply(myList,removeCols,1:2)
An anonymous function:
yy <- lapply(myList, function(df, vec){df[,-vec]}, 1:2)
Use the [ operator:
zz <- lapply(myList, "[", -(1:2))
These yield identical results
identical(xx, yy)
[1] TRUE
identical(xx, zz)
[1] TRUE
The only thing I can think of at the moment that is more R-ish is to make it shorter and get rid of the helper function.
data(mtcars)
data(iris)
myList <- list(A = mtcars, B = iris)
lapply(myList,function(x) x[,-(1:2)])
If you are asking for a direct way to modify a single element:
myList[[1]] <- myList[[1]][,-(1:2)]
But since lists are a very open structure, with no requirements on their content, you cannot index over their contents uniformly, as the elements can be very different. However, if your two data sets have the same dimensions (n x m) and all columns share one type (e.g. numeric), then you can combine them into a 3-d array, on which all the familiar indexing tricks will work.
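A sketch of the 3-d array idea, assuming two numeric data frames of identical shape (iris would not qualify because of its factor column). The data frames here are built from mtcars purely for illustration.

```r
# Two numeric data frames with the same dimensions (n x m)
df1 <- head(mtcars[, 1:5], 4)
df2 <- df1 * 2

# Stack them into an n x m x 2 array; as.matrix() flattens column-major,
# which matches the order in which array() fills its cells
arr <- array(unlist(lapply(list(df1, df2), as.matrix)),
             dim = c(dim(df1), 2))

# Ordinary indexing now applies to every slice at once:
# drop the first two columns from both data sets
trimmed <- arr[, -(1:2), ]
dim(trimmed)  # 4 3 2
```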

Iterating over separate lists in R

I have lots of variables in R, all of type list
a100 = list()
a200 = list()
# ...
p700 = list()
Each variable is a complicated data structure:
a200$time$data # returns 1000 x 1000 matrix
Now, I want to apply code to each variable in turn. However, since R doesn't support pass-by-reference, I'm not sure what to do.
One idea I had was to create a big list of all these lists, i.e.,
biglist = list()
biglist[[1]] = a100
...
And then I could iterate over biglist:
for (i in 1:length(biglist)){
biglist[[i]]$newstuff = "profit"
# more code here
}
And finally, after the loop, go backwards so that existing code (that uses variable names) still works:
a100 = biglist[[1]]
# ...
The question is: is there a better way to iterate over a set of named lists? I have a feeling that I'm doing things horribly wrong. Is there something easier, like:
# FAKE, Idealized code:
foreach x in (a100, a200, ....){
x$newstuff = "profit"
}
a100$newstuff # "profit"
To walk over several lists in parallel, you can use mapply, which takes parallel lists and walks over them in lockstep. Furthermore, in a functional language you should return the object you want rather than modify the data structure within a function call.
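A minimal sketch of that advice: mapply pairs each list with a value from a parallel vector, and the function returns the modified copy instead of trying to change its argument. The toy lists and the `tags` vector are hypothetical stand-ins for a100, a200, and so on.

```r
# Toy stand-ins for a100, a200, ...
biglist <- list(a100 = list(id = 1), a200 = list(id = 2))
tags <- c("profit", "loss")  # hypothetical per-list values

# Walk biglist and tags in lockstep; return the modified copy each time
biglist <- mapply(function(x, tag) {
  x$newstuff <- tag
  x
}, biglist, tags, SIMPLIFY = FALSE)

biglist$a200$newstuff  # "loss"
```

SIMPLIFY = FALSE keeps the result a list, and the names of the first argument (a100, a200) are carried over.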
You should use the sapply, apply, lapply, ... family of functions.
jim
jimmyb is quite right. lapply and sapply are specifically designed to work on lists, so they would work with your biglist as well. Don't forget to return the object from the anonymous function, though. An example:
X <- list(A=list(A1=1:2,A2=3:4),B=list(B1=5:6,B2=7:8))
lapply(X,function(i){
i$newstuff = "profit"
return(i)
})
Now, as you said, R passes by value (more precisely, it copies on modify), so you can end up with multiple copies of the data roaming around. If you work with really big lists, you might want to reduce memory usage by working on each variable separately, using assign and get. The following is considered bad coding, but can sometimes be necessary to avoid memory trouble:
A <- X[[1]] ; B <- X[[2]] #make the data
list.names <- c("A","B")
for (i in list.names){
tmp <- get(i)
tmp$newstuff <- "profit"
assign(i,tmp)
rm(tmp)
}
Make sure you are well aware of the implications of this code, as you're working within the global environment. If you need to do this more often, you might want to work with environments instead:
my.env <- new.env() # make the environment
my.env$A <- X[[1]];my.env$B <- X[[2]] # put vars in environment
for (i in list.names){
tmp <- get(i,envir=my.env)
tmp$newstuff <- "profit"
assign(i,tmp,envir=my.env)
rm(tmp)
}
my.env$A
my.env$B
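Because environments have reference semantics, a function can modify their contents directly, so the get()/assign() round-trip above is not strictly needed once the data live in an environment. A sketch, where `add_profit` is a hypothetical helper:

```r
my.env <- new.env()
my.env$A <- list(A1 = 1:2, A2 = 3:4)
my.env$B <- list(B1 = 5:6, B2 = 7:8)

# Hypothetical helper: modifies every list stored in env in place
add_profit <- function(env) {
  for (nm in ls(env)) {
    env[[nm]]$newstuff <- "profit"
  }
  invisible(env)
}

add_profit(my.env)
my.env$A$newstuff  # "profit"
```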
