I have 20 unique linear models created from 1 dataset. Each one was created by:
mymodel1 <- lm(y ~ x1 + etc, data=mydata)
Now all I want to do is create a list of the output of a command on all 20 models, e.g. something like:
summary(mymodel[i])$adj
for i=1,2,...,20
It's probably obvious, but I'm not finding anything on this.
Is this the best way to act on 20 variable names that change by a positive integer?
for (i in 1:20) print(somefunction(eval(parse(text=paste0("model", i))))$adj)
This should return a vector of the names of objects in your workspace that inherit from class 'lm':
lm.names <- ls()[ sapply( ls(), function(x) 'lm' %in% class(get(x) ))]
This will return a list of summary items from all of them.
sapply( lm.names, function(x) summary( get(x) ) )
Notice the use of get (twice). The ls function returns the names of objects, not the objects themselves and not true R names, but a character vector. You might want to look carefully at the "Value" section of ?summary.lm, because the result is a list and perhaps you only want a few items from that list.
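If only the adjusted R-squared is wanted, a minimal sketch along the same lines could look like the following. It assumes the models follow the mymodel1, mymodel2, ... naming from the question; the three mtcars fits are hypothetical stand-ins, and mget is used to fetch several objects by name at once.

```r
# Hypothetical stand-ins for the asker's models, fitted on the built-in mtcars data
mymodel1 <- lm(mpg ~ wt, data = mtcars)
mymodel2 <- lm(mpg ~ wt + hp, data = mtcars)
mymodel3 <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# mget() takes a character vector of names and returns a named list of the objects
models <- mget(paste0("mymodel", 1:3))

# vapply() checks that every result is a single number
adj_r2 <- vapply(models, function(m) summary(m)$adj.r.squared, numeric(1))
adj_r2
```

The result is a named numeric vector, one entry per model, which is usually easier to work with than printed output from a loop.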
I have a list of data frames. I want to use lapply on a specific column for each of those data frames, but I keep throwing errors when I tried methods from similar answers:
The setup is something like this:
a <- list(*a series of data frames that each have a column named DIM*)
dim_loc <- lapply(1:length(a), function(x){paste0("a[[", x, "]]$DIM")})
Eventually, I'll want to write something like results <- lapply(dim_loc, *some function on the DIMs*)
However, when I try get(dim_loc[[1]]), say, I get an error: Error in get(dim_loc[[1]]) : object 'a[[1]]$DIM' not found
But I can return values from function(a[[1]]$DIM) all day long. It's there.
I've tried working around this by using as.name() in the dim_loc assignment, but that doesn't seem to do the trick either.
I'm curious 1. what's up with get(), and 2. if there's a better solution. I'm constraining myself to the apply family of functions because I want to try to get out of the for-loop habit, and this name-as-list method seems to be preferred based on something like R- how to dynamically name data frames?, but I'd be interested in other, more elegant solutions, too.
I'd say that if you want to modify an object in place you are better off using a for loop, since lapply would require the <<- assignment operator (<- inside lapply only changes a local copy). Like so:
set.seed(1)
aList <- list(cars = mtcars, iris = iris)
for(i in seq_along(aList)){
aList[[i]][["newcol"]] <- runif(nrow(aList[[i]]))
}
As opposed to...
invisible(
lapply(seq_along(aList), function(x){
aList[[x]][["newcol"]] <<- runif(nrow(aList[[x]]))
})
)
You have to use invisible(), otherwise lapply would print its return value on the console. The <<- assigns the vector runif(...) to the newly created column.
If you want to produce another set of data.frames using lapply then you do:
lapply(seq_along(aList), function(x){
aList[[x]][["newcol"]] <- runif(nrow(aList[[x]]))
return(aList[[x]])
})
Also, may I suggest the use of seq_along(list) in lapply and for loops as opposed to 1:length(list) since it avoids unexpected behavior such as:
# no length list
seq_along(list()) # prints integer(0)
1:length(list()) # prints 1 0.
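On the first part of the question (what's up with get()): get() looks up a single object by name, and "a[[1]]$DIM" is an expression, not a name, so no object with that name exists. A minimal sketch of working on the column directly, with no string building, might be the following. The two small data frames are made up for illustration, and mean stands in for "some function on the DIMs".

```r
# Made-up list of data frames, each with a DIM column
a <- list(
  data.frame(DIM = 1:3, other = letters[1:3]),
  data.frame(DIM = 4:6, other = letters[4:6])
)

# Pull the column out of each data frame with an anonymous function ...
dims <- lapply(a, function(d) d$DIM)

# ... or pass the extraction operator itself to lapply
dims_alt <- lapply(a, `[[`, "DIM")

# Apply a function to each DIM column (mean is only a placeholder)
results <- vapply(dims, mean, numeric(1))
results
```

Iterating over the list itself (rather than over 1:length(a)) also sidesteps the empty-list problem described above.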
I have a model object m1. I need to create 100 distinctly named copies so I can adjust and plot each. To create a copy, I currently do this as such:
m1recip1 <- m1
m1recip2 <- m1
m1recip3 <- m1
m1recip4 <- m1
m1recip5 <- m1
m1recip6 <- m1
m1recip7 <- m1
...
m1recip100 <- m1
I planned to create these through a loop, but this is less efficient because I only know how to do so by initializing all 100 objects before looping through them. I'm effectively looking for something similar to the macro facility in other languages (where m1recip&i would produce the names iteratively). I'm sure R can do this - how?
As mentioned above, reconsider saving many similarly structured objects in the global environment. Instead, use a named list: you then maintain a single indexed object, and R has many tools (e.g., the apply family) for running operations across all of its elements.
Specifically, consider replicate (a wrapper around sapply) to build the 100 m1 elements and use setNames to name them accordingly. You lose no functionality of the object by saving it within a list.
model_list <- setNames(replicate(100, m1, simplify = FALSE),
paste0("m1recip", 1:100))
model_list$m1recip1
model_list$m1recip2
model_list$m1recip3
...
Instead of assigning m1 to 100 objects, we can create a list with 100 elements like the following:
m1recip_list <- lapply(1:100, function(x) m1)
We can then reference each element by element number m1recip_list[[10]] or apply a function to every element of the list using lapply:
lapply(m1recip_list, some_function)
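To illustrate that list elements can still be adjusted one at a time, here is a small sketch. The model m1 is a hypothetical stand-in fitted on mtcars, and only three copies are made to keep it short.

```r
# Hypothetical stand-in for the asker's m1
m1 <- lm(mpg ~ wt, data = mtcars)

# Three named copies instead of 100, built as in the answer above
model_list <- setNames(replicate(3, m1, simplify = FALSE),
                       paste0("m1recip", 1:3))

# Adjust a single copy; the other elements are left untouched
model_list$m1recip2 <- update(model_list$m1recip2, . ~ . + hp)

# Number of coefficients per copy, computed across the whole list
n_coef <- vapply(model_list, function(m) length(coef(m)), integer(1))
n_coef
```

Each element behaves like a standalone lm object, so plotting or summarising one copy works the same as it would on m1recip2 in the global environment.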
You can dynamically create object names using the paste function in a loop, and you can assign them values using the assign function as opposed to the "<-" operator.
for(i in 1:100) {
assign(paste("m1recip",i, sep = ""), m1)
}
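If you do go the assign route, the counterpart for reading the objects back is get (one name) or mget (many names). The sketch below assumes a stand-in model m1 and writes into a separate environment, env, purely so the example does not clutter the global workspace; both names are illustrative.

```r
# Hypothetical stand-in for the asker's m1
m1 <- lm(mpg ~ wt, data = mtcars)

# A scratch environment to hold the dynamically named copies
env <- new.env()
for (i in 1:5) {
  assign(paste0("m1recip", i), m1, envir = env)
}

# Collect the copies back into a named list by name
copies <- mget(paste0("m1recip", 1:5), envir = env)
names(copies)
```

Having to call mget to get the objects back into a list is a hint that starting with a list may be the simpler design.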
I am working with a list of lm models. Let's create a small example of that:
set.seed(1234)
mydata <- matrix(rnorm(40),ncol=4)
modlist <- list()
for (i in 1:3) {
modlist[[i]] <- lm(mydata[,1] ~ mydata[,i+1])
}
In reality there are about 50 models. If you print the modlist object, you'll notice that the call attribute for each model is generic, namely lm(formula = mydata[, 1] ~ mydata[, i + 1]). As later subsets of this list will be needed, I would like to have the convenience to see the name of the dependent variable in each model, assigning that name to the respective call attribute:
modlist[[1]]$call <- "Factor 1"
One can see that the model call has changed to "Factor 1" in the first element of modlist. Let us say I have a vector of names, which I would like to assign:
modnames <- paste0("Factor",1:3)
It would be, of course, possible to assign the respective value of that vector to the respective model in the list, e.g.:
for (i in 1:3) {
modlist[[i]]$call <- modnames[i]
}
Is there a vectorized version of this? I suspect it will be mapply, but I can't figure out how to combine the assignment operator with extracting the respective element of the list, i.e. [[(). More of a purist anti-loop premature optimization exercise, but still :) Thank you!
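One possible loop-free version, offered as a sketch rather than the definitive answer, uses Map to walk over the models and the names in parallel and return the modified models. The anonymous function and the final names() line are additions for illustration; note that overwriting $call with a string may stop functions such as update from working on these models, so naming the list elements is a less invasive alternative.

```r
set.seed(1234)
mydata <- matrix(rnorm(40), ncol = 4)
modlist <- lapply(1:3, function(i) lm(mydata[, 1] ~ mydata[, i + 1]))
modnames <- paste0("Factor", 1:3)

# Map pairs each model with its name and returns a list of modified models
modlist <- Map(function(m, nm) {
  m$call <- nm
  m
}, modlist, modnames)

# Less invasive alternative: label the list elements themselves
names(modlist) <- modnames
```

The assignment happens on the copy inside the function, which is why the function has to return m and the result has to be assigned back to modlist.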
Sometimes I have code which references a specific dataset based on some variable ID. I have then been creating lines of code using paste0, and then eval(parse(...)) that line to execute the code. This seems to be getting sloppy as the length of the code increases. Are there any cleaner ways to have dynamic data reference?
Example:
dataset <- "dataRef"
execute <- paste0("data.frame(", dataset, "$column1, ", dataset, "$column2)")
eval(parse(text = execute))
But now imagine a scenario where dataRef would be called for 1000 lines of code, and sometimes needs to be changed to dataRef2 or dataRefX.
Combining the comments of Jack Maney and G.Grothendieck:
It is better to store your data frames that you want to access by a variable in a list. The list can be created from a vector of names using get:
mynames <- c('dataRef','dataRef2','dataRefX')
# or mynames <- paste0( 'dataRef', 1:10 )
mydfs <- setNames( lapply( mynames, get ), mynames )  # name the elements so they can be looked up by name
Then your example becomes:
dataset <- 'dataRef'
mydfs[[dataset]][,c('column1','column2')]
Or you can process them all at once using lapply, sapply, or a loop:
mydfs2 <- lapply( mydfs, function(x) x[,c('column1','column2')] )
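A self-contained sketch of the same idea, with two made-up data frames standing in for dataRef and dataRef2, is shown below. It uses mget, which returns a list already named after the objects.

```r
# Made-up data frames standing in for the asker's data sets
dataRef  <- data.frame(column1 = 1:3,   column2 = 4:6,   column3 = 7:9)
dataRef2 <- data.frame(column1 = 10:12, column2 = 13:15, column3 = 16:18)

# Gather them into one named list
mynames <- c("dataRef", "dataRef2")
mydfs <- mget(mynames)

# Switching data sets is now a matter of changing one string
dataset <- "dataRef2"
picked <- mydfs[[dataset]][, c("column1", "column2")]
picked
```

No code is pasted together or parsed here; the string in dataset is only ever used as a list index.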
@G.Grothendieck has shown you how to use get and [ to turn a character value into the object it names and then reference named elements within that object. I don't know what your code was intended to accomplish, since executing that code would only deliver values to the console; they would not have been assigned to a name and would have been garbage collected. Suppose you wanted to use three character values (an object name, here dataset, and two column names, colname1 and colname2) and assign those columns to an object named after a fourth character value:
newname <- "newdf"
assign( newname, get(dataset)[ c(colname1, colname2) ] )
The lesson to learn is that assign and get are capable of taking character values and accessing or creating named objects, which can be either data objects or functions. Carl_Witthoft mentions do.call, which can construct function calls from character values.
do.call("data.frame", setNames(list( dfrm$x, dfrm$y), c('x2','y2') ) )
do.call("mean", dfrm[1])
# second argument must be a list of arguments to `mean`
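For reference, a runnable sketch of the do.call lines with a small made-up dfrm is given below. The variable fname is a hypothetical addition showing that the function name can itself come from a character value.

```r
# Made-up data frame for illustration
dfrm <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30))

# Build a data frame whose column names come from a character vector
newdf <- do.call("data.frame", setNames(list(dfrm$x, dfrm$y), c("x2", "y2")))

# The second argument is always a list of arguments for the function
avg <- do.call("mean", list(dfrm$x))

# The function name can be stored in a variable as well
fname <- "median"
med <- do.call(fname, list(dfrm$y))
```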
Function lm(...) returns an object of class 'lm'. How do I create an array of such objects? I want to do the following:
my_lm_array <- rep(as.lm(NULL), 20)
#### next, populate this array by running lm() repeatedly:
for(i in 1:20) {
my_lm_array[i] <- lm(my_data$results ~ my_data[i,])
}
Obviously the line "my_lm_array <- rep(as.lm(NULL), 20)" does not work. I'm trying to create an array of objects of type 'lm'. How do I do that?
Not sure it will answer your question, but if what you want to do is run a series of lm fits of one variable against different columns of a data frame, you can do something like this:
data <- data.frame(result=rnorm(10), v1=rnorm(10), v2=rnorm(10))
my_lms <- lapply(data[,c("v1","v2")], function(v) {
lm(data$result ~ v)
})
Then, my_lms would be a list of elements of class lm.
Well, you can create an array of empty/meaningless lm objects as follows:
z <- NA
class(z) <- "lm"
lm_array <- replicate(20,z,simplify=FALSE)
but that's probably not the best way to solve the problem. You could just create an empty list of the appropriate length (vector("list",20)) and fill in the elements as you go along: R is weakly enough typed that it won't mind you replacing NULL values with lm objects. More idiomatically, though, you can run lapply on your list of predictor names:
my_data <- data.frame(result=rnorm(10), v1=rnorm(10), v2=rnorm(10))
prednames <- setdiff(names(my_data),"result") ## extract predictor names
lapply(prednames,
function(n) lm(reformulate(n,response="result"),
data=my_data))
Or, if you don't feel like creating an anonymous function, you can first generate a list of formulae (using lapply) and then run lm on them:
formList <- lapply(prednames,reformulate,response="result") ## create formulae
lapply(formList,lm,data=my_data) ## run lm() on each formula in turn
will create the same list of lm objects as the first strategy above.
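The preallocate-and-fill alternative mentioned above (vector("list", n)) can be sketched as follows, reusing the my_data and prednames names from the example; the name fits is illustrative. Note the double brackets in fits[[i]]: an lm object is itself a list, so single-bracket assignment as in the question would not store the model as one element.

```r
set.seed(1)
my_data <- data.frame(result = rnorm(10), v1 = rnorm(10), v2 = rnorm(10))
prednames <- setdiff(names(my_data), "result")

# Preallocate a list with one NULL slot per predictor
fits <- vector("list", length(prednames))

# Fill each slot with a fitted model, using [[ ]] to assign a whole element
for (i in seq_along(prednames)) {
  fits[[i]] <- lm(reformulate(prednames[i], response = "result"),
                  data = my_data)
}
names(fits) <- prednames
```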
In general it is good practice to avoid using syntax such as my_data$result inside modeling formulae; instead, try to set things up so that all the variables in the model are drawn from inside the data object. That way methods like predict and update are more likely to work correctly ...
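As a small illustration of that last point, here is a sketch in which the formula refers only to column names and the data frame is passed through data=. The names fit, new_obs and fit_more are made up for the example.

```r
set.seed(42)
my_data <- data.frame(result = rnorm(10), v1 = rnorm(10), v2 = rnorm(10))

# Variables are named in the formula and looked up in `data`
fit <- lm(result ~ v1, data = my_data)

# predict() can match the column v1 in new data by name
new_obs <- data.frame(v1 = c(-1, 0, 1))
pred <- predict(fit, newdata = new_obs)

# update() can extend the formula because the call refers to my_data
fit_more <- update(fit, . ~ . + v2)
```

With a formula like my_data$result ~ my_data$v1, predict has no column named v1 to look up in newdata, which is the kind of problem the advice above is meant to avoid.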