I have defined two global variables as lists, let's say:
importance.5maturity <- list()
importance.10maturity <- list()
I also have a function that runs randomForest, and in each iteration of the loop (I am using a rolling window) I want to add the variable importance from the fitted model to the list. I believe this can be done with list.append() from the rlist package.
The function takes an argument named maturity. I want an if statement such that, if the number in a global list's name matches maturity, the function stores the importance information in that particular list. For example:
library(randomForest)
library(rlist)  # provides list.append()
b <- randomForest(y ~ ., data = d.na, mtry = 5, ntree = 1000, importance = TRUE)
if (maturity == 5) {
  importance.5maturity <- list.append(importance.5maturity, importance(b))
}
But I don't know how to match maturity against the number (5 or 10) in the list name, so that the function automatically picks the correct list to store the information in.
I also don't want to return these lists from the function, since it already returns another data frame.
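One way to do the matching is to build the list's name from maturity with paste0() and then read and write the global list with get() and assign(). A minimal sketch, assuming the lists live in the global environment; store_importance() is a hypothetical helper, and small matrices stand in for importance(b) so the example runs without randomForest:

```r
# Global containers, named as in the question
importance.5maturity <- list()
importance.10maturity <- list()

# Hypothetical helper: derive the global list's name from `maturity`
# and append `imp` to that list in the global environment
store_importance <- function(maturity, imp) {
  nm <- paste0("importance.", maturity, "maturity")  # e.g. "importance.5maturity"
  current <- get(nm, envir = globalenv())
  assign(nm, c(current, list(imp)), envir = globalenv())  # append one element
  invisible(NULL)
}

# Stand-ins for importance(b) from three rolling windows
store_importance(5, matrix(1:4, nrow = 2))
store_importance(10, matrix(5:8, nrow = 2))
store_importance(5, matrix(9:12, nrow = 2))

length(importance.5maturity)   # 2
```

Inside the rolling-window function the call would be store_importance(maturity, importance(b)). A single named list indexed with as.character(maturity) avoids building variable names at run time and is usually easier to work with.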
Context: I'd like to save the results of a likelihood ratio test for a multinomial logistic regression in several dynamically named variables, but I'm not sure how to do that. This is what I've been trying:
library(lmtest)
indels = c("C.T","A.G","G.A","G.C","T.C","C.A","G.T","A.C","C.G","A.del","TAT.del","TCTGGTTTT.del","TACATG.del","GATTTC.del")
my_list = list()
for (i in 1:length(indels)) {
assign(paste0("lrtest_results_",indels[i]), my_list[[i]]) = lrtest(multinom_model_completo, indels[i])
}
I was trying to save each result in a list, under the name lrtest_results_ followed by the corresponding element of indels, using assign() and paste0(), but it doesn't seem to work. Any help is very welcome!
A better approach is to lapply the test function over each element of indels and assign the names afterwards:
my_list <- lapply(indels, \(x) lrtest(multinom_model_completo, x))  # \(x) is shorthand for function(x), R >= 4.1
names(my_list) <- paste0("lrtest_results_", indels)
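The same named list can be built in one step with setNames(), and list2env() can copy its elements into separate variables if those are really needed. A sketch in which run_test() is a hypothetical stand-in for lrtest(multinom_model_completo, x), so it runs without the model:

```r
indels <- c("C.T", "A.G", "TAT.del")

# Hypothetical stand-in for lrtest(multinom_model_completo, x)
run_test <- function(x) list(term = x, stat = nchar(x))

# lapply() builds the list; setNames() attaches the names in the same expression
my_list <- setNames(lapply(indels, run_test), paste0("lrtest_results_", indels))

my_list[["lrtest_results_A.G"]]$stat   # 3

# Only if separate variables are really needed: one global variable per element
list2env(my_list, envir = globalenv())
lrtest_results_TAT.del$stat            # 7
```

Keeping the results in the list is usually preferable, since it can be passed to lapply() or sapply() again in one call.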
I want to write a script that chooses the best number of degrees of freedom for the spline predictor in a glm:
library(splines)  # provides ns()
MODEL.1 <- glm(ZAL ~ ns(D1, df = i), data = DANE3, family = poisson, na.action = na.omit)
I want a loop that fits the models sequentially for df values from 4 to 12, but I don't know how to make the loop save each model as a separate object, for example with the names "MODEL.df4", "MODEL.df5", and so on.
How can I code it in R?
There are many ways to do this. You could store the results in a named list. Here's a simple stand-in model function that returns a data.frame:
library(purrr)  # provides walk()
modelFn <- function(i){data.frame(IN = i, OUT = 7 + i)}
Initialize an empty list:
MODEL <- list()
Run the model for values from 4 to 12, and save each named result in the list:
walk(4:12, ~ {MODEL[[paste0("df", .x)]] <<- modelFn(.x)})
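Applied to the glm from the question, the same idea can be written in base R with lapply(), naming each model "MODEL.df4", "MODEL.df5", and so on. A sketch that assumes simulated data in place of DANE3 (the data below are made up purely so the example runs) and uses AIC as one possible criterion for comparing the fits; the question does not say which criterion is wanted:

```r
library(splines)  # ns(); ships with R

# Simulated stand-in for DANE3 (hypothetical data, for illustration only)
set.seed(1)
DANE3 <- data.frame(D1 = runif(200, 0, 10))
DANE3$ZAL <- rpois(200, lambda = exp(1 + sin(DANE3$D1) / 2))

dfs <- 4:12
# One Poisson glm per df value
MODELS <- lapply(dfs, function(k)
  glm(ZAL ~ ns(D1, df = k), data = DANE3, family = poisson, na.action = na.omit))
names(MODELS) <- paste0("MODEL.df", dfs)

# Compare the fits, here by AIC, and pick the name of the best one
aics <- sapply(MODELS, AIC)
best <- names(which.min(aics))
```

Individual models are then available as MODELS[["MODEL.df7"]] and so on, without creating nine separate global objects.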
I am writing a function to create some predicted variables within an existing data set that I am using to run some ML models. My function looks like this:
doall <- function(x1, x2){
J48 <- J48(ML, data=df1)
#summary(J48)
X1 <- predict(J48, df1, type="class")
X2 <- predict(J48, df2, type="class")
#return(X1)
}
doall(df1$DT_predict, df2$DT_predict1)
J48 is a decision tree model (via RWeka). I believe the code (doall(df1$DT_predict1, df2$DT_predict1)) works properly, because when I include the return() call, it returns the values of X1. However, the predicted variables are not being generated/stored in the data frames df1 and df2. Ideally, I would also like to pass the data frame names to the function, but that's the next step.
Can someone show how I can store X1 and X2 as variables in the data frames df1 and df2, respectively?
Ideally your question would have a bit more information about what your data frames look like, what X1 and X2 look like, and where your data frames are stored. For my answer I am assuming your data frames are stored in the global environment, and you want to modify them through a function.
This question has to do with scoping. For an in-depth description of scoping, check out this article: http://adv-r.had.co.nz/Functions.html#lexical-scoping
First, by assigning your variables within a function, you are assigning them in a local environment. This means that the variables you assign do not carry over into the global environment (what you see when you type ls()).
I believe you want to change a 'global variable' from within a function. This is done with the <<- operator,
for instance:
a <- 2
print(a)
prints 2
change_a <- function(x){
  x <- x*4
}
change_a(a)
print(a)
still prints 2, because x <- x*4 only changes the function's local copy of x
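while
change_a <- function(x){
  a <<- x*4  # <<- assigns to the a in the enclosing (here global) environment
}
change_a(a)
print(a)
prints 8. Note that x <<- x*4 would not work: it would create or overwrite a global variable named x and leave a unchanged.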
I think you want to use the <<- operator instead of <- to accomplish what you want.
On a related note, it is generally not considered best practice to assign and change global variables from within a function.
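The usual alternative to <<- is to have the function return the modified data frames and assign them in the caller. A sketch, assuming lm() as a stand-in for J48() (a substitution so the example runs without RWeka) and mtcars as made-up input data; with RWeka the fit would presumably be J48(ML, data = train) with predict(model, newdata, type = "class"), as in the question:

```r
# Hypothetical rewrite of doall(): fit on `train`, add prediction columns
# to copies of both data frames, and return the copies
doall <- function(train, test, formula) {
  model <- lm(formula, data = train)             # stand-in for J48(ML, data = train)
  train$DT_predict <- predict(model, newdata = train)
  test$DT_predict1 <- predict(model, newdata = test)
  list(train = train, test = test)               # return both modified copies
}

df1 <- mtcars[1:20, ]
df2 <- mtcars[21:32, ]
res <- doall(df1, df2, mpg ~ wt)
df1 <- res$train   # write the results back in the caller
df2 <- res$test
```

Passing the data frames as arguments also covers the "next step" from the question, since the function no longer depends on the global names df1 and df2.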
I am working with a list of lm models. Let's create a small example of that:
set.seed(1234)
mydata <- matrix(rnorm(40),ncol=4)
modlist <- list()
for (i in 1:3) {
modlist[[i]] <- lm(mydata[,1] ~ mydata[,i+1])
}
In reality there are about 50 models. If you print the modlist object, you'll notice that the call element of each model is generic, namely lm(formula = mydata[, 1] ~ mydata[, i + 1]). Since I will later need subsets of this list, I would like the convenience of seeing the name of the dependent variable in each model, by assigning that name to the respective call element:
modlist[[1]]$call <- "Factor 1"
One can see that the model call has changed to "Factor 1" in the first element of modlist. Let us say I have a vector of names, which I would like to assign:
modnames <- paste0("Factor",1:3)
It would be, of course, possible to assign the respective value of that vector to the respective model in the list, e.g.:
for (i in 1:3) {
modlist[[i]]$call <- modnames[i]
}
Is there a vectorized version of this? I suspect it will involve mapply, but I can't figure out how to combine the assignment operator with extracting the respective element of the list, i.e. the [[ function. This is more of a purist anti-loop, premature-optimization exercise, but still :) Thank you!
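One option is Map() (a wrapper around mapply()), walking modlist and modnames in parallel. A sketch that rebuilds the example list and assumes the goal is only to avoid the explicit for loop; Map() still iterates internally, so this changes style rather than speed:

```r
set.seed(1234)
mydata <- matrix(rnorm(40), ncol = 4)
modlist <- lapply(1:3, function(i) lm(mydata[, 1] ~ mydata[, i + 1]))
modnames <- paste0("Factor", 1:3)

# Each call receives one model and one name, modifies its local copy
# of the model, and returns it; Map() collects the results in a list
modlist <- Map(function(m, nm) { m$call <- nm; m }, modlist, modnames)

modlist[[2]]$call   # "Factor2"
```

If the names are only needed for subsetting, names(modlist) <- modnames is simpler and leaves the original call intact, which functions such as update() rely on.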
Function lm(...) returns an object of class 'lm'. How do I create an array of such objects? I want to do the following:
my_lm_array <- rep(as.lm(NULL), 20)
#### next, populate this array by running lm() repeatedly:
for(i in 1:20) {
my_lm_array[i] <- lm(my_data$results ~ my_data[i,])
}
Obviously the line "my_lm_array <- rep(as.lm(NULL), 20)" does not work. I'm trying to create an array of objects of type 'lm'. How do I do that?
Not sure it will answer your question, but if what you want is to run a series of lm fits of one variable against different columns of a data frame, you can do something like this:
data <- data.frame(result=rnorm(10), v1=rnorm(10), v2=rnorm(10))
my_lms <- lapply(data[,c("v1","v2")], function(v) {
lm(data$result ~ v)
})
Then, my_lms would be a list of elements of class lm.
Well, you can create an array of empty/meaningless lm objects as follows:
z <- NA
class(z) <- "lm"
lm_array <- replicate(20,z,simplify=FALSE)
but that's probably not the best way to solve the problem. You could just create an empty list of the appropriate length (vector("list",20)) and fill in the elements as you go along: R is weakly enough typed that it won't mind you replacing NULL values with lm objects. More idiomatically, though, you can run lapply on your list of predictor names:
my_data <- data.frame(result=rnorm(10), v1=rnorm(10), v2=rnorm(10))
prednames <- setdiff(names(my_data),"result") ## extract predictor names
lapply(prednames,
function(n) lm(reformulate(n,response="result"),
data=my_data))
Or, if you don't feel like creating an anonymous function, you can first generate a list of formulae (using lapply) and then run lm on them:
formList <- lapply(prednames,reformulate,response="result") ## create formulae
lapply(formList,lm,data=my_data) ## run lm() on each formula in turn
will create the same list of lm objects as the first strategy above.
In general it is good practice to avoid using syntax such as my_data$result inside modeling formulae; instead, try to set things up so that all the variables in the model are drawn from inside the data object. That way methods like predict and update are more likely to work correctly ...
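For completeness, the preallocate-and-fill approach mentioned above (vector("list", 20)) looks like this when combined with reformulate(), so that all variables still come from the data argument. A sketch on the same simulated my_data; the list size here is the number of predictors rather than 20:

```r
set.seed(1)
my_data <- data.frame(result = rnorm(10), v1 = rnorm(10), v2 = rnorm(10))
prednames <- setdiff(names(my_data), "result")

# Preallocate one NULL slot per model, named by predictor
lm_list <- vector("list", length(prednames))
names(lm_list) <- prednames

# Fill the slots; each formula refers to columns of my_data only
for (n in prednames) {
  lm_list[[n]] <- lm(reformulate(n, response = "result"), data = my_data)
}

class(lm_list[["v2"]])   # "lm"
```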