LME with Random effects R code error message? - r

I'm trying to fit a linear mixed-effects model with CCT as the outcome and the time*behavior interaction as the main predictor. I can't get the model to run: the code below keeps failing with an "unused argument" error. I'm using the variable ID as the random-effects grouping variable. Thank you!
model.set$f <- with(model.set, ID)
m <- lmer(CCT ~ time*behavior, random = ~ 1|f, data = model.set)

The argument causing the error in your command is random: lme4::lmer has no "random" argument, because random effects are specified inside the main formula. The syntax you are using belongs to nlme::lme, which does take a "random" argument to define random effects. See here for examples.
The following command should run without error:
m <- lmer(CCT ~ time * behavior + (1|f), data = model.set)
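To make the difference between the two interfaces concrete, here is a minimal sketch that fits the same random-intercept model both ways. It uses the Orthodont data shipped with nlme as a stand-in (the asker's model.set is not available), so the variable names are assumptions for illustration only. The lme4 part runs only if that package is installed.

```r
library(nlme)   # provides lme() and the Orthodont example data

# Plain data frame copy of the example data (stand-in for model.set).
orth <- as.data.frame(Orthodont)

# nlme::lme: the random effect goes in a separate `random` argument.
m_lme <- lme(distance ~ age * Sex, random = ~ 1 | Subject, data = orth)

# lme4::lmer: the same random intercept is written inside the formula.
if (requireNamespace("lme4", quietly = TRUE)) {
  m_lmer <- lme4::lmer(distance ~ age * Sex + (1 | Subject), data = orth)
  # Fixed-effect estimates from the two fits should agree closely.
  print(cbind(lme = fixef(m_lme), lmer = lme4::fixef(m_lmer)))
}
```

Either call is fine on its own; the error only appears when the nlme-style random argument is passed to lmer.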

Related

glm package glm.nb() "Error: no valid set of coefficients has been found: please supply starting values"

I am running a negative binomial regression on my dataset using the glm.nb() function.
My model looks something like this:
m_nb= glm.nb(Error_Count ~ TotalWL + Auto_frac +PHONE+JUSTIF_weight + MESSAGE_OTHER_count + Hour+
I(Auto_frac^2)+I(TotalWL^2), data = df)
When I ran it on a dataset of 10,000 rows, the model fit without problems. However, when I ran it on a larger dataset (60,000 rows), I got this error:
`Error: no valid set of coefficients has been found: please supply starting values`
I then tried supplying starting values, but it still fails with a similar error:
m_nb= glm.nb(Error_Count ~ TotalWL + Auto_frac +PHONE+JUSTIF_weight + MESSAGE_OTHER_count + Hour+
I(Auto_frac^2)+I(TotalWL^2), data = df, start = c(0.02, 0.3,0.2,3,43, 4,13,0.04, 100))
Error: cannot find valid starting values: please specify some
So the model still doesn't converge. How should I choose the starting values?
I also tried the same model with the fenegbin() function in the fixest package, and there the model works. However, I need glm.nb(), since predict() in the fixest package does not provide standard errors (S.E.).
Thank you.

Random effects (more than one) in multinomial regression in R (mblogit)

I'm trying to run a multinomial regression for the first time. I am attempting to build the code based on a built-in dataset, but I'm having trouble getting it to do what I need. I want to run a model with random effects only.
Here is the code with one random effect which I think is fine:
library(MASS)
library(mclogit)  # mblogit() is provided by the mclogit package
house.mblogit1 <- mblogit(Sat ~ 1, random=~1|Infl, maxit=1000, estimator = "REML",data = housing)
However, I can't find an example of how to add another random effect (e.g., Type). I tried it and I can't figure out the syntax. Both these are wrong apparently:
house.mblogit2 <- mblogit(Sat ~ 1, random=~1|Infl+ ~1|Type, data = housing)
house.mblogit3 <- mblogit(Sat ~ 1, random=~1|Infl+Type, data = housing)
Error: Invalid random formula
Is it even possible to do it?

ggcoef_model error when two random intercepts

When trying to graph the conditional fixed effects of a glmmTMB model with two random intercepts in GGally I get the error:
There was an error calling "tidy_fun()". Most likely, this is because the
function supplied in "tidy_fun=" was misspelled, does not exist, is not
compatible with your object, or was missing necessary arguments (e.g. "conf.level=" or "conf.int="). See error message below.
Error: Error in "stop_vctrs()":
! Can't recycle "..1" (size 3) to match "..2" (size 2).
I have tinkered with the issue, and it seems to be related to the two random intercepts included in the model. I have also tried extracting the coefficient and standard error information separately through broom.mixed::tidy and then feeding the data frame into GGally::ggcoef(), to no avail. Any suggestions?
library(glmmTMB)
library(GGally)
# Example with built-in randu data set
data(randu)
randu$A <- factor(rep(c(1,2), 200))
randu$B <- factor(rep(c(1,2,3,4), 100))
# Model
test <- glmmTMB(y ~ x + z + (0 +x|A) + (1|B), family="gaussian", data=randu)
# A few of my attempts at graphing--works fine when only one random effects term is in model
ggcoef_model(test)
ggcoef_model(test, tidy_fun = broom.mixed::tidy)
ggcoef_model(test, tidy_fun = broom.mixed::tidy, conf.int = T, intercept=F)
ggcoef_model(test, tidy_fun = broom.mixed::tidy(test, effects="fixed", component = "cond", conf.int = TRUE))
There are some (old!) bugs that have recently been fixed (here, here) that would make confidence interval reporting on RE parameters break for any model with multiple random terms (I think). I believe that if you are able to install updated versions of both glmmTMB and broom.mixed:
remotes::install_github("glmmTMB/glmmTMB/glmmTMB@ci_tweaks")
remotes::install_github("bbolker/broom.mixed")
then ggcoef_model(test) will work.

Extracting predictions from a GAM model with splines and lagged predictors

I have some data and am trying to teach myself how to use lagged predictors in regression models. I'm currently trying to generate predictions from a generalized additive model that uses splines to smooth the data and contains lags.
Let's say I have the following data and have split the data into training and test samples.
head(mtcars)
Train <- sample(1:nrow(mtcars), ceiling(nrow(mtcars)*3/4), replace=FALSE)
Great, let's train the gam model on the training set.
f_gam <- gam(hp ~ s(qsec, bs="cr") + s(lag(disp, 1), bs="cr"), data=mtcars[Train,])
summary(f_gam)
When I go to predict on the holdout sample, I get an error message.
f_gam.pred <- predict(f_gam, mtcars[-Train,]); f_gam.pred
Error in ExtractData(object, data, NULL) :
'names' attribute [1] must be the same length as the vector [0]
Calls: predict ... predict.gam -> PredictMat -> Predict.matrix3 -> ExtractData
Can anyone help diagnose the issue and suggest a solution? I get that lag(__,1) leaves a data point as NA and that this is likely the reason for the lengths being different. However, I don't have a solution to the problem.
I'm going to assume you're using gam() from the mgcv package. It appears that gam() doesn't like functions that are not defined in "base" inside the s() terms. You can get around this by adding a column that holds the transformed variable and then modeling with that column. For example
tmtcars <- transform(mtcars, ldisp=lag(disp,1))
Train <- sample(1:nrow(mtcars), ceiling(nrow(mtcars)*3/4), replace=FALSE)
f_gam <- gam(hp ~ s(qsec, bs="cr") + s(ldisp, bs="cr"), data= tmtcars[Train,])
summary(f_gam)
predict(f_gam, tmtcars[-Train,])
works without error.
The problem appears to be coming from the mgcv:::get.var function. It tries to decode the terms with something like
eval(parse(text = txt), data, enclos = NULL)
and because they explicitly set the enclosure to NULL, variable and function names outside of base cannot be resolved. So because mean() is in the base package, this works
eval(parse(text="mean(x)"), data.frame(x=1:4), enclos=NULL)
# [1] 2.5
but because var() is defined in stats, this does not
eval(parse(text="var(x)"), data.frame(x=1:4), enclos=NULL)
# Error in eval(expr, envir, enclos) : could not find function "var"
and lag(), like var(), is defined in the stats package.
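One more thing to check when precomputing the lagged column: stats::lag() applied to a plain numeric vector only adjusts the time-series attributes and leaves the values where they are, so it does not produce the leading NA the question describes (dplyr::lag() does shift the values). A minimal base R sketch of an explicit one-step lag follows. The helper name lag1 is made up for this example.

```r
# Hypothetical helper: shift a vector down by one position, padding with NA.
lag1 <- function(x) c(NA, head(x, -1))

tmtcars <- transform(mtcars, ldisp = lag1(disp))

# stats::lag() keeps the values of a plain vector in place ...
all(as.numeric(stats::lag(mtcars$disp, 1)) == mtcars$disp)

# ... whereas the explicit shift moves every value down one row.
head(tmtcars[, c("disp", "ldisp")], 3)
```

The ldisp column built this way can be used in s(ldisp, bs="cr") exactly as in the workaround above, and the row with the NA is dropped when the model is fit.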

Predict function from Caret package give an Error

I am doing just a regular logistic regression using the caret package in R. I have a binary response variable coded 1 or 0 called SALE_FLAG and 140 predictor variables that I converted to dummy variables using the dummyVars function.
data <- dummyVars(~., data = data_2, fullRank=TRUE,sep="_",levelsOnly = FALSE )
dummies<-(predict(data, data_2))
model_data<- as.data.frame(dummies)
This gives me a data frame to work with. All of the variables are numeric. Next I split into training and testing:
trainIndex <- createDataPartition(model_data$SALE_FLAG, p = .80,list = FALSE)
train <- model_data[ trainIndex,]
test <- model_data[-trainIndex,]
Time to train my model using the train function:
model <- train(SALE_FLAG ~ ., data = train, method = "glm")
Everything runs nice and I get a model. But when I run the predict function it does not give me what I need:
predict(model, newdata =test,type="prob")
and I get an ERROR:
Error in dimnames(out)[[2]] <- modelFit$obsLevels :
length of 'dimnames' [2] not equal to array extent
On the other hand, when I replace "prob" with "raw" for type inside the predict function, I get predictions, but I need probabilities so that I can code them into a binary variable given my threshold.
Not sure why this happens. I did the same thing without using the caret package and it worked how it should:
model2 <- glm(SALE_FLAG ~ ., family = binomial(logit), data = train)
predict(model2, newdata =test, type="response")
I spent some time looking at this, but I'm not sure what is going on and it seems very weird to me. I have tried many variations of the train function: instead of the formula interface I also passed x and y, and I tried method = 'bayesglm' as a check, and it gave me the same error. I hope someone can help me out. I don't strictly need the train function to get what I need, but caret is a good package with lots of tools and I would like to be able to figure this out.
Show us str(train) and str(test). I suspect the outcome variable is numeric, which makes train think that you are doing regression. That should also be apparent from printing model. Make it a factor if you want to do classification.
Max
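A minimal sketch of the suggested fix, assuming the outcome really is stored as numeric 0/1. The data here are simulated and the column names x1 and x2 are placeholders, not the asker's variables. The caret part runs only if the package is installed.

```r
set.seed(1)

# Simulated stand-in for the asker's data (x1, x2 are placeholder predictors).
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$SALE_FLAG <- rbinom(n, 1, plogis(0.5 * d$x1 - 0.8 * d$x2))

# Recode the numeric 0/1 outcome as a factor with valid R names as levels,
# so that train() treats the problem as classification.
d$SALE_FLAG <- factor(d$SALE_FLAG, levels = c(0, 1), labels = c("no", "yes"))

train_rows <- sample(n, round(0.8 * n))
train_d <- d[train_rows, ]
test_d  <- d[-train_rows, ]

if (requireNamespace("caret", quietly = TRUE)) {
  fit   <- caret::train(SALE_FLAG ~ ., data = train_d, method = "glm")
  probs <- predict(fit, newdata = test_d, type = "prob")  # one column per level
  pred  <- ifelse(probs$yes > 0.5, "yes", "no")           # apply a 0.5 threshold
  print(table(pred, test_d$SALE_FLAG))
}
```

With a factor outcome, type = "prob" returns one probability column per class level, which can then be thresholded as needed.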
