lmer multilevel fit with intercept constraint - r

I regularly run into this problem: I want to fit a multilevel regression with a constraint, and I don't know how to do that. I usually end up using lavaan, since it allows constraints on the regression coefficients. But lavaan can't fit random-slope models (only random intercepts), and in truth I don't know how to set a constraint on the intercept in lavaan either, so I would like a genuinely multilevel approach.
So basically I have a variable y with a second-order polynomial dependence on x, where the coefficients depend on the subject ID:
library(data.table)
library(ggplot2)
df <- data.table(x = rep(0:10,5),ID = rep(LETTERS[1:5],each = 11))
df[,a:= rnorm(1,2,1),by = ID]
df[,b:= rnorm(1,1,0.2),by = ID]
df[,y := rnorm(.N,0,10) + a*x + b*x^2 ]
ggplot(df,aes(x,y,color = ID))+
geom_point()
and I can fit a normal multilevel model:
library(lme4)
lmer(y ~ x + I(x^2) + (x + I(x^2) | ID), df)
But I would like to constrain the intercept to be 0. Is there a simple way to do so?
Thank you

You can suppress the intercept with -1 in the fixed-effects part of the formula. For example:
coef(summary(lmer(y ~ x + I(x^2) + (x+ I(x^2)|ID),df)))
Estimate Std. Error t value
(Intercept) -1.960196 4.094491 -0.4787398
x 2.535092 1.754963 1.4445275
I(x^2) 1.015212 0.130004 7.8090889
coef(summary(lmer(y ~ -1 + x + I(x^2) + (x+ I(x^2)|ID),df)))
Estimate Std. Error t value
x 1.831692 0.9780500 1.872800
I(x^2) 1.050261 0.1097583 9.568856
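Note that -1 only removes the fixed-effect intercept; the random-effects term (x + I(x^2)|ID) still contains a per-subject random intercept by default. If you also want the subject-level intercepts constrained to zero, one way (an untested sketch on your simulated data) is to drop it there as well:
lmer(y ~ 0 + x + I(x^2) + (0 + x + I(x^2) | ID), df)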

Related

How to develop a hierarchical model to see the heterogeneity in mean of a specific variable using glmer?

I am using the following model in R:
glmer(y ~ x + z + (1|id), weights = specification, family = binomial, data = data)
where:
y ~ binomial(y, specification)
Logit(y) = intercept + a*x + b*z
a and b are the coefficients for the x and z variables, and
a = a0 + a1*I
The coefficient of one of the variables (here x) depends on another variable (here I), so I need a hierarchical model to see the heterogeneity in the mean effect of x.
I would appreciate it if anyone could help me with this problem.
Sorry if the question does not look professional! This is one of my first experiences.
I'm not perfectly sure I understand the question, but: if Logit(y) = intercept + a*x + b*z and a = a0 + a1*I, then it would seem that
Logit(y) = intercept + (a0 + a1*I)*x + b*z = intercept + a0*x + a1*(x*I) + b*z
This looks like a straightforward interaction model:
glmer(y ~ 1 + x + x:I + z + (1|id), ...)
To make it more explicit, this could also be written as
glmer(y ~ 1 + x + I(x*I) + z + (1|id), ...)
(although the use of I as a predictor variable and in the I() function is a little bit confusing at first glance ...)
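Here is a self-contained sketch with simulated data (the column names and effect sizes are made up) showing that the two spellings fit the same model:
library(lme4)
set.seed(1)
d <- data.frame(id = factor(rep(1:30, each = 20)),
                x = rnorm(600), I = rnorm(600), z = rnorm(600))
d$y <- rbinom(600, 1, plogis(-0.5 + (0.3 + 0.8 * d$I) * d$x + 0.2 * d$z))
m_a <- glmer(y ~ 1 + x + x:I + z + (1 | id), family = binomial, data = d)
m_b <- glmer(y ~ 1 + x + I(x * I) + z + (1 | id), family = binomial, data = d)
fixef(m_a)
fixef(m_b)   # essentially identical estimates; only the label and position of the interaction term differ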

Calculating confusion matrix for fixed effect logit

I would like to ask how to calculate a confusion matrix for a fixed effect logit model (bife package)
With the basic logit model (glm) there is no problem, but with fixed effect logit there is.
For some reason the number of predictions is different for logit and fixed effect logit.
Example:
library(bife)
library(tidyverse)
library(caret)
dataset <- psid
logit <- glm(LFP ~ AGE + I(AGE^2) + log(INCH) + KID1 + KID2 + KID3, data = dataset, family = "binomial")
mod <- bife(LFP ~ AGE + I(AGE^2) + log(INCH) + KID1 + KID2 + KID3 | ID, dataset)
summary(mod)
summary(logit)
predict(logit)
predict(mod)
Y <- factor(dataset$LFP)
PRE <- factor(round(predict(logit, type = "response")))
PRE_FIX <- factor(round(predict(mod, type = "response")))
confusionMatrix(Y, PRE)
# Not working
confusionMatrix(Y, PRE_FIX)
It is possible to compute the confusion matrix directly with table(), using the rounded predicted probabilities from the bife model (PRE_FIX above), provided they line up observation-for-observation with Y:
confusion <- table(true = Y, pred = PRE_FIX)
The result can then be read as a confusion matrix (with 0 taken as the positive class, as caret does by default):
        pred
true     0              1
   0     TruePositive   FalseNegative
   1     FalsePositive  TrueNegative
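If you also want caret's accompanying statistics (accuracy, sensitivity, and so on), the table method of confusionMatrix() accepts such a cross-tabulation, again assuming Y and PRE_FIX have the same length:
confusionMatrix(table(true = Y, pred = PRE_FIX))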

heteroscedasticity: weights in lm function in R

I am confused. I have the following model: lm(GAV ~ EMPLOYED). This model has heteroscedasticity, and I believe the error standard deviation of this model can be approximated by a variable called SDL.
I have fitted the corresponding weighted model, obtained by dividing each term by the variable SDL, in two ways:
lm(I(GAV/SDL) ~ I(1/SDL) + I(EMPLOYED/SDL)-1)
And
lm(GAV ~EMPLOYED,weights = 1/SDL)
I thought they would yield the same results. However, I get different parameter estimates...
Can anyone show me the error I am making?
Thanks in advance!
Fede
help("lm") clearly explains:
weighted least squares is used with weights weights (that is,
minimizing sum(w*e^2));
So with weights = 1/SDL the criterion is sum(e^2/SDL), which corresponds to dividing each term by sqrt(SDL), not by SDL; reproducing the division by SDL would require weights = 1/SDL^2. To illustrate with a generic weight variable w:
x <- 1:10
set.seed(42)
w <- sample(10)
y <- 1 + 2 * x + rnorm(10, sd = sqrt(w))
lm(y ~ x, weights = 1/w)
#Call:
# lm(formula = y ~ x, weights = 1/w)
#
#Coefficients:
#(Intercept) x
# 3.715 1.643
lm(I(y/w^0.5) ~ I(1/w^0.5) + I(x/w^0.5) - 1)
#Call:
# lm(formula = I(y/w^0.5) ~ I(1/w^0.5) + I(x/w^0.5) - 1)
#
#Coefficients:
#I(1/w^0.5) I(x/w^0.5)
# 3.715 1.643
Btw., you might be interested in library(nlme); help("gls"). It offers more sophisticated possibilities for modelling heteroscedasticity.
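For example, here is a minimal gls() sketch reusing the simulated x, y and w from above, with varFixed() encoding the assumption that the error variance is proportional to w:
library(nlme)
d <- data.frame(x = x, y = y, w = w)
fit_gls <- gls(y ~ x, data = d, weights = varFixed(~ w))
summary(fit_gls)
Other variance functions such as varPower() or varExp() let the variance depend on covariates or fitted values in more flexible ways.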

Variable order in interaction terms

I'm trying to fit a number of linear models as shown below. It is important that all interaction terms are sorted lexicographically. Note that the second model is missing the main effect for x.
x = rnorm(100)
y = rnorm(100)
z = x + y + rnorm(100)
m1 = glm(z ~ x + y + x:y)
m2 = glm(z ~ y + x:y)
The models don't behave as expected with respect to the interaction terms:
m1:
      Estimate Std. Error t value Pr(>|t|)
x:y    -0.1565     0.1151  -1.360   0.1770
m2:
      Estimate Std. Error t value Pr(>|t|)
y:x    -0.2776     0.1416  -1.961   0.0528 .
I understand that there may be a way to use the interaction() function with the lex.order argument but I can't figure out how or, indeed, whether this is the best way to go. Advice?
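One possible workaround (only a sketch, and not necessarily the best route): write the product out with I(), so the term gets a fixed label no matter how the formula is ordered:
m2b <- glm(z ~ y + I(x * y))
coef(summary(m2b))   # the interaction row is always labelled "I(x * y)"
For numeric x and y this fits the same model as z ~ y + x:y; only the term label changes.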

What does the R formula y~1 mean?

I was reading the documentation on R Formula, and trying to figure out how to work with depmix (from the depmixS4 package).
Now, in the documentation of depmixS4, sample formulas tend to be something like y ~ 1.
For a simple case like y ~ x, it defines a relationship between input x and output y, so I get that it is similar to y = a * x + b, where a is the slope and b is the intercept.
If we go back to y ~ 1, the formula is throwing me off. Is it equivalent to y = 1 (a horizontal line at y = 1)?
To add a bit of context, if you look at the depmixS4 documentation, there is one example below:
depmix(list(rt ~ 1, corr ~ 1), data = speed, nstates = 2, family = list(gaussian(), multinomial()))
I think in general, formulas that end with ~ 1 are confusing to me. Can anyone explain what ~ 1 or y ~ 1 means?
Many of the operators used in model formulae in R (asterisk, plus, caret) have a model-specific meaning, and this is one of them: the 'one' symbol indicates an intercept.
In other words, it is the value the dependent variable is expected to have when the independent variables are zero or have no influence. (To use the more common mathematical meaning of model terms, you wrap them in I().) Intercepts are usually assumed, so you most often see the 1 written out when explicitly stating a model without an intercept.
Here are two ways of specifying the same model for a linear regression model of y on x. The first has an implicit intercept term, and the second an explicit one:
y ~ x
y ~ 1 + x
Here are ways to give a linear regression of y on x through the origin (that is, without an intercept term); a quick check that they agree follows the list:
y ~ 0 + x
y ~ -1 + x
y ~ x - 1
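A quick check with made-up data that all three spellings fit the same through-the-origin model:
set.seed(1)
x <- rnorm(20); y <- 2 * x + rnorm(20)
coef(lm(y ~ 0 + x)); coef(lm(y ~ -1 + x)); coef(lm(y ~ x - 1))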
In the specific case you mention (y ~ 1), y is being predicted by no other variable, so the natural prediction is the mean of y, as Paul Hiemstra stated:
> library(boot)   # the 'city' data set used below ships with the boot package
> data(city)
> r <- lm(x ~ 1, data = city)
> r
Call:
lm(formula = x ~ 1, data = city)
Coefficients:
(Intercept)
97.3
> mean(city$x)
[1] 97.3
And removing the intercept with a -1 leaves you with nothing:
> r <- lm(x ~ -1, data=city)
> r
Call:
lm(formula = x ~ -1, data = city)
No coefficients
formula() is a function for extracting a formula from existing objects, and its help file isn't the best place to read about specifying model formulae in R. I suggest you look at this explanation or Chapter 11 of An Introduction to R.
If your model were of the form y ~ x1 + x2, this (roughly speaking) represents:
y = β0 + β1(x1) + β2(x2)
Which is of course the same as
y = β0(1) + β1(x1) + β2(x2)
There is an implicit +1 in the above formula. So really, the formula above is y ~ 1 + x1 + x2
We could have a very simple formula, whereby y is not dependent on any other variable. This is the formula that you are referencing,
y ~ 1 which roughly would equate to
y = β0(1) = β0
As @Paul points out, when you solve this simple model, you get β0 = mean(y).
Here is an example
# Let's make a small sample data frame
dat <- data.frame(y= (-2):3, x=3:8)
# Create the linear model as above
simpleModel <- lm(y ~ 1, data=dat)
## COMPARE THE COEFFICIENTS OF THE MODEL TO THE MEAN(y)
simpleModel$coef
# (Intercept)
# 0.5
mean(dat$y)
# [1] 0.5
In general such a formula describes the relation between dependent and independent variables in the form of a linear model. The left-hand side gives the dependent variable(s), the right-hand side the independent ones. The independent variables are used to calculate the trend component of the linear model, and the residuals are then assumed to follow some distribution. When the right-hand side is just ~ 1, the trend component is a single value, e.g. the mean of the data, i.e. the linear model only has an intercept.
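A tiny illustration of that last point (made-up numbers): the fitted trend from ~ 1 is just the mean, and the residuals are the deviations from it.
y <- c(2, 4, 9)
f <- lm(y ~ 1)
fitted(f)      # every fitted value equals mean(y) = 5
residuals(f)   # y - mean(y), i.e. -3, -1, 4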
