loop through column glmer - r

I am trying to run a glmer by looping through the columns of my dataset that contain the response variables (dat_prob). The code I am using is as follows, adapted from another Stack Overflow question (Looping through columns in R).
Their code:
dat_y<-(dat[,c(2:1130)])
dat_x<-(dat[,c(1)])
models <- list()
#
for(i in names(dat_y)){
y <- dat_y[i]
model[[i]] = lm( y~dat_x )
}
My code:
dat_prob<-(probs[,c(108:188)])
dat_age<-(probs[,c(12)])
dat_dist<-(probs[,c(20)])
fyearcap=(probs[,c(25)])
fstation=(probs[,c(22)])
fnetnum=(probs[,c(23)])
fdepth=(probs[,c(24)])
models <- list()
#
for(i in names(dat_prob)){
y <- dat_prob[i]
y2=as.vector(y)
model[[i]] = glmer( y ~ dat_age * dat_dist + (1|fyearcap) + (1|fstation)+
(1|fnetnum)+ (1|fdepth),family=binomial,REML=TRUE )
}
And I receive this error, similar to the error received in the hyperlinked question:
Error in model.frame.default(drop.unused.levels = TRUE, formula = y ~ :
invalid type (list) for variable 'y'
I have been working through this for hours and now can't see the forest for the trees.
Any help is appreciated.

y <- dat_prob[i] makes y a one-column data frame, which is a list. Lists are vectors (try is.vector(list())), so y2 = as.vector(y) is still a list or data frame rather than a numeric vector (and you never use y2 anyway).
class(as.vector(mtcars[1]))
# [1] "data.frame"   (R >= 4.3.0 returns "list" instead; neither is a numeric vector)
To extract a numeric vector from a data frame, use [[: y <- dat_prob[[i]].
class(mtcars[[1]])
# [1] "numeric"
Though I agree with Roman - using formulas is probably a nicer way to go. Try something like this:
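models <- list()
for(i in names(dat_prob)) {
my_formula = as.formula(paste(i,
"~ dat_age * dat_dist + (1|fyearcap) + (1|fstation)+ (1|fnetnum)+ (1|fdepth)"
))
# data = dat_prob lets glmer find the response column named by i;
# REML is dropped because glmer() has no REML argument
models[[i]] = glmer(my_formula, data = dat_prob, family = binomial)
}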
I'm also pretty skeptical of whatever you're doing trying 80 different response variables, but that's not your question...
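To make the two points above concrete, here is a minimal sketch on the built-in mtcars data. It uses lm() as a stand-in for glmer() so that it runs without lme4, and the response and predictor columns are arbitrary choices for illustration, not anything from your data.

```r
# Arbitrary response columns from the built-in mtcars data (illustration only)
responses <- c("mpg", "hp", "qsec")

# `[` keeps the data frame wrapper, `[[` extracts the underlying vector
class(mtcars["mpg"])    # "data.frame"
class(mtcars[["mpg"]])  # "numeric"

models <- list()
for (resp in responses) {
  # reformulate() builds the formula `<resp> ~ wt + cyl` from strings
  f <- reformulate(c("wt", "cyl"), response = resp)
  # data = mtcars tells the fitting function where to find the named columns
  models[[resp]] <- lm(f, data = mtcars)
}

names(models)
```

The same pattern should carry over to glmer() by adding the random-effect terms to the formula string, as in the loop above.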

Related

Recommended way of creating reusable objects within an R function

Suppose we have the following data:
# simulate data to fit
set.seed(21)
y = rnorm(100)
x = .5*y + rnorm(100, 0, sqrt(.75))
Let's also suppose the user has fit a model:
# user fits a lm
mod = lm(y~x)
Now suppose I have an R package designed to perform several operations on the object mod. Just for simplicity, suppose we have two functions, one that plots the data, and one that computes the coefficients. However, as an intermediary, suppose we want to perform some operation on the data (in this example, add ten).
Example:
# function that adds ten to all scores
add_ten = function(model) {
data = model$model
data = data + 10
return(data)
}
# functions I defined that do something to the "add_ten" dataset
plot_ten = function(model) {
new_data = data.frame(add_ten(model))
x = all.vars(formula(model))[2]
y = all.vars(formula(model))[1]
ggplot2::ggplot(new_data, ggplot2::aes(.data[[x]], .data[[y]])) + ggplot2::geom_point() + ggplot2::geom_smooth()
}
coefs_ten = function(model) {
new_data = data.frame(add_ten(model))
coef(lm(formula(model), new_data))
}
(Obviously, this is pretty silly to do. In actuality, the operation I want to perform is multiple imputation, which is computationally intensive).
Notice in the above example I have to call the add_ten function twice, once for plot_ten and once for coefs_ten. This is inefficient.
So, now to my question, what is the best way to create a reusable object within a function?
I could, of course, create an object to be placed in the user's global environment:
add_ten = function(model) {
# check for add_ten_data in the global environment
if (exists("add_ten_data", where = .GlobalEnv)) return(get("add_ten_data", envir = .GlobalEnv))
data = model$model
data = data + 10
# assign add_ten_data to the global environment
assign('add_ten_data', data, envir = .GlobalEnv)
return(data)
}
I'm happy to do so, but worry about the "netiquette" of putting something in the user's environment. There's also a potential problem if users happen to have an object called "add_ten_data" in their environment.
So, what is the best way of accomplishing this?
Thanks in advance!
You should certainly avoid writing an object to the global environment. If you find that you have to repeat the same computationally expensive task at the top of a number of different functions, it means you are carrying out the computationally expensive task too late.
For example, you could create an S3 class that holds the necessary components to produce a "cheap" plot and a "cheap" extraction of the coefficients. It even has the benefits of generic dispatch:
add_ten <- function(model) model$model + 10
lm_tens <- function(formula, data)
{
model <- if(missing(data)) lm(formula) else lm(formula, data = data)
structure(list(data = data.frame(add_ten(model)), model = model),
class = "tens")
}
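plot.tens <- function(x, ...) {
# formula() on a data frame returns <first column> ~ <remaining columns>
vars = all.vars(formula(x$data))
# .data[[name]] maps by column name, so this works for any variable names
ggplot2::ggplot(x$data, ggplot2::aes(.data[[vars[2]]], .data[[vars[1]]])) +
ggplot2::geom_point() +
ggplot2::geom_smooth()
}
# argument names follow the coef() generic
coef.tens = function(object, ...) {
coef(lm(formula(object$model), data = object$data))
}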
So now we just need to do:
set.seed(21)
y = rnorm(100)
x = .5*y + rnorm(100, 0, sqrt(.75))
mod <- lm_tens(y ~ x)
coef(mod)
#> (Intercept) x
#> 4.3269914 0.5775404
plot(mod)
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Note that we only need to call add_ten once here.
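As a rough check of that claim, the sketch below instruments add_ten with a call counter and leaves out the plotting method so that it runs in base R alone. The counter environment is an addition for demonstration only and is not part of the design above.

```r
# Counter kept in its own environment, purely to instrument this demo
counter <- new.env()
counter$n <- 0

add_ten <- function(model) {
  counter$n <- counter$n + 1   # record each (expensive) call
  model$model + 10
}

lm_tens <- function(formula, data) {
  model <- if (missing(data)) lm(formula) else lm(formula, data = data)
  structure(list(data = data.frame(add_ten(model)), model = model),
            class = "tens")
}

coef.tens <- function(object, ...) {
  coef(lm(formula(object$model), data = object$data))
}

set.seed(21)
y <- rnorm(100)
x <- .5 * y + rnorm(100, 0, sqrt(.75))

mod <- lm_tens(y ~ x)
first  <- coef(mod)
second <- coef(mod)   # reuses mod$data; add_ten() is not called again
counter$n             # stays at 1 however many methods are called on mod
```

The expensive step runs once, in the constructor, and every method afterwards reads the stored result from the object.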

Change name of object variable R

Sorry for the poor wording, I'm hoping to change the name of an internal variable in a dgCMatrix. Specifically I want to change "Dimnames" to "dimnames" (I've attached a picture of the object variables for clarity), as I believe that may help with an error I'm getting (I'll post that at the bottom).
I've tried this, but to no avail
rename(emat@Dimnames, "dimnames")
The error I hope to fix with this:
> rvel.cd <- gene.relative.velocity.estimates(emat,nmat,deltaT=2,
+ kCells=10,
+ cell.dist=cell.dist,
+ fit.quantile=fit.quantile,
+ n.cores=2)
matching cells between cell.dist and emat/nmat ... done
calculating cell knn ... done
calculating convolved matrices ... Error in intI(i, n = d[1], dn[[1]], give.dn = FALSE) :
no 'dimnames[[.]]': cannot use character indexing
Reproducible data:
#Generate dgCMatrix
library(Matrix)
i <- c(1,3:8)
j <- c(2,9,6:10)
x <- 7 * (1:7)
emat <- sparseMatrix(i, j, x = x)
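One possible direction, offered as a sketch rather than a confirmed fix for the error above: the slot of a dgCMatrix is always called Dimnames, and dimnames() is the accessor for it, so renaming the slot is unlikely to help. The error message says the matrix has no dimnames to index by, and the reproducible emat indeed has none. The gene and cell labels below are hypothetical placeholders.

```r
library(Matrix)

# Same matrix as in the reproducible example
i <- c(1, 3:8)
j <- c(2, 9, 6:10)
x <- 7 * (1:7)
emat <- sparseMatrix(i, j, x = x)

emat@Dimnames   # both entries are NULL, so character indexing cannot work

# Assign row and column names through the accessor (labels are made up)
dimnames(emat) <- list(paste0("gene", 1:nrow(emat)),
                       paste0("cell", 1:ncol(emat)))

emat["gene1", "cell2"]   # 7
```

Whether real gene and cell names resolve the error depends on how emat and nmat were built, which the question does not show.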

How to use strings from a list as variables in mediate and lm in R?

I am trying to run lots of mediation analyses and to make it quicker I'm trying to put the lm() and mediate() functions inside a for loop. I then pass a list of lists into the loop where each item of the list is a list of three in the form c("", "", "").
Passing the items into the loop and unlisting them to have single strings for X, M and Y variables is fine. I've tried many variations on get(), eval() and assign() within the mediate() function to no avail. I think this is due to my use of get() within lm().
The way I think my code should look:
MedVarList <- list(c('SCI', 'rMEQ', 'SIDAS'))
for(i in MedVarList){
X <- unlist((i)[1])
M <- unlist((i)[2])
Y <- unlist((i)[3])
model.M <- lm(get(M) ~ get(X), data = NewScDat)
model.Y <- lm(get(Y) ~ get(X) + get(M), data = NewScDat)
results <- mediate(model.M, model.Y, treat=get(X), mediator=get(M),
boot=TRUE, sims=500)
}
The model.M and model.Y bits work fine. It's the treat= and mediator= inside mediate() that I simply cannot figure out. I get this error:
Error in get(X) : object 'SCI' not found
If I change the mediate() call to include the variable names directly I get a different error:
results <- mediate(model.M, model.Y, treat='SCI', mediator='rMEQ',
boot=TRUE, sims=500)
Error in `[.data.frame`(m.data, , treat) : undefined columns selected
I then thought that lm() may be using "get(X)" as a variable name instead of "SCI", which is what get(X) spits out initially:
results <- mediate(model.M, model.Y, treat='get(X)', mediator='get(M)',
boot=TRUE, sims=500)
Error in get(M) : object 'rMEQ' not found
And just to test what's going on I looked at what get(X) and get(M) are now spitting out:
get(X)
Error in get(X) : object 'SCI' not found
get(M)
Error in get(M) : object 'rMEQ' not found
What I'm really trying to achieve is to be able to run mediate() inside a loop using a list of lists as described above. I'm doing this to avoid having multiple mediate() functions repeated with manual setup.
Here's my MWE of the successful solution:
library(mediation)
MedVarList <- list(c('SCI', 'rMEQ', 'SIDAS'))
for(i in MedVarList){
X <- unlist((i)[1])
M <- unlist((i)[2])
Y <- unlist((i)[3])
FormulaM <- paste(M,X,sep = " ~ ") # Results in a string "rMEQ ~ SCI"
FormulaY <- paste(Y,"~", X,"+",M,sep=' ') # Results in a string "SIDAS ~ SCI + rMEQ"
model.M <- lm(FormulaM, data=NewScDat)
model.Y <- lm(FormulaY, data=NewScDat)
results <- mediate(model.M, model.Y, treat=X, mediator=M,
boot=TRUE, sims=500)
}
Thanks for the tips and suggestions all. @Parfait - I've included the dput() but could you point me towards an FAQ or similar explaining the reasoning behind this?
EDIT - I understand what dput() is and does now so I've removed it from the MWE because I'd used it inappropriately.
Fuller example including useful recording of results for anyone that needs it:
MedVarList <- list(c('SCI', 'rMEQ', 'SIDAS'))
NBootstraps = 5000
MediationResults <- list()
j <- 1
for(i in MedVarList){
X <- unlist((i)[1])
M <- unlist((i)[2])
Y <- unlist((i)[3])
FormulaM <- paste(M,X,sep = " ~ ")
FormulaY <- paste(Y,"~", X,"+",M,sep=' ')
model.M <- lm(FormulaM, data = NewScDat)
model.Y <- lm(FormulaY, data = NewScDat)
MediationResults[[j]] <- summary(mediate(model.M, model.Y, treat=X, mediator=M,
boot=TRUE, sims=NBootstraps))
j <- j + 1
}
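For anyone wondering why the get() version failed, here is a small illustration on the built-in mtcars data (the columns are arbitrary stand-ins for SCI and rMEQ). With get(X) in the formula, the model term is literally named "get(X)", so a function that later looks up the treatment by its column name has nothing to match. Building the formula from strings keeps the real names.

```r
X <- "wt"   # stand-in for the treatment variable name
Y <- "mpg"  # stand-in for the outcome variable name

# get() inside the formula: the fit works, but the term label is "get(X)"
m_get <- lm(get(Y) ~ get(X), data = mtcars)
names(coef(m_get))   # "(Intercept)" "get(X)"

# Formula built from strings: the term keeps the real column name
m_str <- lm(reformulate(X, response = Y), data = mtcars)
names(coef(m_str))   # "(Intercept)" "wt"
```

reformulate() is used here instead of paste(); both produce the same formula, and either should work with the loop above.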

How to use the for loop with function needing for a string field?

I am using the smbinning R package to compute the information value of the variables in my dataset.
The function smbinning() is pretty simple and it has to be used as follows:
result = smbinning(df= dataframe, y= "target_variable", x="characteristic_variable", p = 0.05)
So, df is the dataset you want to analyse, y is the target variable, and x is the variable whose information value statistics you want to compute. I named all the characteristic variables z1, z2, ... z417 so that I could use a for loop to automate the whole analysis.
I tried to use the following for loop:
for (i in 1:417) {
result = smbinning(df=DATA, y = "FLAG", x = "DATA[,i]", p=0.05)
}
to compute the information value for the variable in column i of the data frame.
The class of DATA is "data.frame", while the class of result is "character".
So, my question is: how do I compute the information value of each variable and store it in the object named result?
Thanks! Any help will be appreciated!
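Since no sample data is provided, I can only hazard a guess that the following will work:
results_list = list()
for (i in 1:417) {
current_var = paste0('z', i)
current_result = smbinning(df=DATA, y = "FLAG", x = current_var, p=0.05)
# smbinning() returns a character message instead of a list when it cannot bin a variable
results_list[[current_var]] = if (is.list(current_result)) current_result$iv else NA
}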
You could also use one of the apply functions, iterating over the variable names z1, z2, .... The x argument to smbinning should be the column name, not the column itself.
results = sapply(paste0("z",1:417), function(foo) {
smbinning(df=DATA, y = "FLAG", x = foo, p=0.05)
}, simplify = FALSE) # simplify = FALSE always returns a named list
class(results) # should be "list"
length(results) # should be 417
names(results) # should be z1,...
results[[1]] # should be the first result, so you can also iterate by indexing
I tried the following, since you had not provided any data
> XX=c("IncomeLevel","TOB","RevAccts01")
> res = sapply(XX, function(z) smbinning(df=chileancredit.train,y="FlagGB",x=z,p=0.05))
Warning message:
NAs introduced by coercion
> class(res)
[1] "list"
> names(res)
[1] "IncomeLevel" "TOB" "RevAccts01"
> res$TOB
...
HTH
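The pattern shared by all three answers is to pass the column name as a string and collect the results in a named list. A self-contained sketch follows, with toy data and a made-up function, toy_stat(), standing in for smbinning() so that it runs without the package. toy_stat() is hypothetical and computes something much simpler than an information value.

```r
# Toy data: a binary FLAG and three numeric characteristics z1..z3
set.seed(1)
DATA <- data.frame(FLAG = rbinom(50, 1, 0.5),
                   z1 = rnorm(50), z2 = rnorm(50), z3 = rnorm(50))

# Hypothetical stand-in for smbinning(): takes column *names* as strings
toy_stat <- function(df, y, x) {
  # difference in the mean of x between the two classes of y
  means <- tapply(df[[x]], df[[y]], mean)
  unname(means["1"] - means["0"])
}

vars <- paste0("z", 1:3)
# setNames() makes lapply() return a list named after the variables
results <- lapply(setNames(vars, vars),
                  function(v) toy_stat(DATA, y = "FLAG", x = v))

names(results)   # "z1" "z2" "z3"
```

Swapping toy_stat() for smbinning() and 1:3 for 1:417 should give the loop asked for in the question.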

How to do introspection in R

I am somewhat new to R, and I have this piece of code which generates a variable whose type I don't know. Is there any introspection facility in R which will tell me which type this variable belongs to?
The following illustrates the property of this variable:
I am working on linear model selection, and the resource I have is an lm result from another model. Now I want to retrieve the lm call with summary(model)$call so that I don't need to hardcode the model structure. However, since I have to change the dataset, I need to modify the "string" a bit, but apparently it is not a simple string. I wonder if there is any command similar to string.replace that lets me manipulate the object returned by $call.
> str<-summary(rdnM)$call
> str
lm(formula = y ~ x1, data = rdndat)
> str[1]
lm()
> str[2]
y ~ x1()
> str[3]
rdndat()
> str[3] <- data
Warning message:
In str[3] <- data :
number of items to replace is not a multiple of replacement length
> str
lm(formula = y ~ x1, data = c(10, 20, 30, 40))
> str<-summary(rdnM)$call
> str
lm(formula = y ~ x1, data = rdndat)
> str[3] <- 'data'
> str
lm(formula = y ~ x1, data = "data")
> str<-summary(rdnM)$call
> type str
Error: unexpected symbol in "type str"
>
In terms of introspection: R allows you to easily examine and operate on language objects.
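For more details, see the R Language Definition, particularly sections 2 and 6. For instance, in your case, summary(rdnM)$call is a "call" object (class() and typeof() will tell you this). You can retrieve pieces of it by indexing, but assigning to an index the way you are trying to does not do what you want: str[3] <- data pastes contents of data into the call, and str[3] <- 'data' inserts a character string rather than a reference to the object. It is usually simpler to construct a new call.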
In your case you are constructing an updated call to lm() out of an existing call. If you want to reuse the formula on different data, you would extract the formula from the call object via formula(foo$call), like so:
foo <- lm(formula = y ~ x1, data = data.frame(y=rnorm(10),x1=rnorm(10)))
bar <- lm(formula(foo$call), data = data.frame(y=rnorm(10),x1=rnorm(10)))
On the other hand, if you are trying to update the formula, you could use update():
baz <- update(bar, . ~ . - 1)
baz$call
##>lm(formula = y ~ x1 - 1, data = data.frame(y = rnorm(10), x1 = rnorm(10)))
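As a small illustration of the introspection side, the sketch below inspects a call object with class(), typeof() and names(), and then swaps the data argument. The data frames dat1 and dat2 are made up for the example. Assigning a symbol to the named element is one way that does work, and update() is the more conventional route.

```r
# Two small made-up data sets with the same columns
dat1 <- data.frame(y = c(1, 3, 2, 5, 4), x1 = 1:5)
dat2 <- data.frame(y = c(2, 1, 4, 3, 6), x1 = 1:5)

fit <- lm(y ~ x1, data = dat1)
cl <- fit$call

class(cl)    # "call"
typeof(cl)   # "language"
names(cl)    # ""  "formula"  "data"

# Replace the data argument with a symbol (a reference, not the data itself),
# then evaluate the modified call
cl$data <- as.name("dat2")
refit <- eval(cl)

# update() re-evaluates the original call with the changed argument
refit2 <- update(fit, data = dat2)
```

Both refits use dat2 with the original formula, so their coefficients agree.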

Resources