How to add weights parameter to Generalized Mixed Model - julia

How do you add observation weights to a mixed model?
I thought I could pass the Freq column to a wt argument, but apparently not.
using RDatasets, MixedModels
titanic = RDatasets.dataset("datasets", "Titanic")
titanic.surv_flg = titanic.Survived .== "Yes";
This runs:
MixedModels.fit(GeneralizedLinearMixedModel, @formula(surv_flg ~ 1 + Age + Sex + (1 | Class)), titanic, Bernoulli(), nAGQ = 2, fast = true)
But this doesn't:
MixedModels.fit(GeneralizedLinearMixedModel, @formula(surv_flg ~ 1 + Age * Sex + (1 | Class)), titanic, wt = Freq, Bernoulli(), nAGQ = 2, fast = true)

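I found this out on another forum.
The parameter should be wts, not wt. The weights are passed as a vector, so refer to the column as titanic.Freq rather than the bare name Freq.
So it should be:
MixedModels.fit(GeneralizedLinearMixedModel, @formula(surv_flg ~ 1 + Age * Sex + (1 | Class)), titanic, wts = titanic.Freq, Bernoulli(), nAGQ = 2)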

Related

Shortening the formula syntax of a regression model

I was wondering if the syntax of the regression model below could be made more concise (shorter) than it currently is?
dat <- read.csv('https://raw.githubusercontent.com/rnorouzian/v/main/bv1.csv')
library(nlme)
model <- lme(achieve ~ 0 + D1 + D2+
D1:time + D2:time+
D1:schcontext + D2:schcontext +
D1:female + D2:female+
D1:I(female*time) + D2:I(female*time)+
D1:I(schcontext*time) + D2:I(schcontext*time), correlation = corSymm(),
random = ~0 + D1:time | schcode/id, data = dat, weights = varIdent(form = ~1|factor(math)),
na.action = na.omit, control = lmeControl(maxIter = 200, msMaxIter = 200, niterEM = 50,
msMaxEval = 400))
coef(summary(model))

Focusing on the fixed-effect component only.
Original formula:
form1 <- ~ 0 + D1 + D2+
D1:time + D2:time+
D1:schcontext + D2:schcontext +
D1:female + D2:female+
D1:I(female*time) + D2:I(female*time)+
D1:I(schcontext*time) + D2:I(schcontext*time)
X1 <- model.matrix(form1, data=dat)
I think this is equivalent
form2 <- ~0 +
D1 + D2 +
(D1+D2):(time + schcontext + female + female:time+schcontext:time)
X2 <- model.matrix(form2, data=dat)
(Unfortunately, ~ 0 + (D1 + D2):(1 + time + ...) doesn't work as I would have liked/expected.)
As a first check, the model matrix has the right dimensions. Comparing the column names of the two model matrices and reordering the columns manually:
X2o <- X2[,c(1:3,6,4,7,5,8,9,11,10,12)]
all.equal(c(X1),c(X2o)) ##TRUE
(For numerical predictors, you don't need I(A*B): A:B is equivalent.)
Actually you can do a little better using the * operator
form3 <- ~0 +
D1 + D2 +
(D1+D2):(time*(schcontext+female))
X3 <- model.matrix(form3, data=dat)
X3o <- X3[,c(1:3,6,4,7,5,8,10,12,9,11)]
all.equal(c(X1),c(X3o)) ## TRUE
Compare formula length:
sapply(list(form1,form2,form3),
function(x) nchar(as.character(x)[[2]]))
## [1] 183 84 54
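The two formula facts used above (A:B matching I(A*B) for numeric predictors, and a*b expanding to a + b + a:b) can be checked on a small built-in dataset. This is only a sketch using mtcars and its wt and hp columns as stand-ins; they are not variables from the question's data.

```r
# Stand-in data: the built-in mtcars, with two numeric columns wt and hp.

# 1) For numeric predictors, wt:hp and I(wt * hp) give the same column.
X_colon <- model.matrix(~ 0 + wt:hp, data = mtcars)
X_I     <- model.matrix(~ 0 + I(wt * hp), data = mtcars)
same_values <- all.equal(c(X_colon), c(X_I))

# 2) wt * hp expands to wt + hp + wt:hp.
X_star <- model.matrix(~ wt * hp, data = mtcars)
X_long <- model.matrix(~ wt + hp + wt:hp, data = mtcars)
same_columns <- identical(colnames(X_star), colnames(X_long))

print(same_values)
print(same_columns)
```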

put all variables in a regression

I want to shorten this expression in R:
model1 <- pglm::pglm(formula = lfp ~ lfp_1 + lfp1 + kids + kids2 + kids3 + kids4 + kids5 + lhinc + lhinc2 + lhinc3 + lhinc4 + lhinc5 + educ + black + age + agesq + per2 + per3 + per4 + per5,
family = binomial("probit"),
data = lfp1,
model = "random")
In Stata, writing kids2-kids5 includes the variables kids2 through kids5 in the regression.
The same goes for lhinc2-lhinc5 and per2-per5.

Try this one:
model1 <- pglm::pglm(formula = lfp ~.,
family = binomial("probit"),
data = lfp1,
model = "random")
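In case lfp1 contains columns that should not enter the model, another option is to build the formula from variable names. The sketch below uses base R's reformulate() and paste0() to generate the numbered names, roughly like Stata's kids2-kids5 range. The variable names are copied from the question; nothing is fitted here, and the resulting formula would be passed to pglm in place of the long one.

```r
# Build the right-hand side from name patterns (names taken from the question).
rhs <- c("lfp_1", "lfp1", "kids",
         paste0("kids", 2:5),    # kids2 ... kids5
         "lhinc",
         paste0("lhinc", 2:5),   # lhinc2 ... lhinc5
         "educ", "black", "age", "agesq",
         paste0("per", 2:5))     # per2 ... per5

# reformulate() turns the character vector into a formula with lfp as response.
form <- reformulate(rhs, response = "lfp")
print(form)
```

The object form can then be supplied as the formula argument.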

Plotting Panel data Mixed Effect model with Random and Fixed models

I am working on panel data models and am now using a mixed model from the lme4 package. I have also used models based on random effects, fixed effects, LSDV, first differences, etc.
I have a function that plots the coefficients of all models with ggplot; however, plotting the coefficients from lme4 is an issue I can't get to work.
Is there a way to make the code below work for all models, including the mixed model?
library(plm)
library(lme4)
library(ggplot2)
mixed <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
fixed = plm(Reaction ~ Days, data = sleepstudy, index = c("Subject", "Days"), model = "within")
random = plm(Reaction ~ Days, data = sleepstudy, index = c("Subject", "Days"), model = "random")
pool = plm(Reaction ~ Days, data = sleepstudy, index = c("Subject", "Days"), model = "pooling")
first_diff = plm(Reaction ~ Days, data = sleepstudy, index = c("Subject", "Days"), model = "fd")
# Function to extract point estimates
ce <- function(model.obj) {
extract <- summary(get(model.obj))$coefficients[2:nrow(summary(get(model.obj))$coefficients), 1:2]
return(data.frame(extract, vars = row.names(extract), model = model.obj))
}
# Run the function on each model and bind the results into a single data frame
coefs <- do.call(rbind, sapply(paste0(list(
"fixed", "random", "pool", "first_diff"
)), ce, simplify = FALSE))
names(coefs)[2] <- "se"
gg_coef <- ggplot(coefs, aes(vars, Estimate)) +
geom_hline(yintercept = 0, lty = 1, lwd = 0.5, colour = "red") +
geom_errorbar(aes(ymin = Estimate - se, ymax = Estimate + se, colour = vars),
lwd = 1, width = 0
) +
geom_point(size = 3, aes(colour = vars)) +
facet_grid(model ~ ., scales="free") +
coord_flip() +
guides(colour = FALSE) +
labs(x = "Coefficient", y = "Value") +
ggtitle("Raw models coefficients")
gg_coef

The problem with the current code is that the mixed model has only two fixed-effect coefficients:
data(sleepstudy)
mixed <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
coefficients(summary(mixed))
             Estimate Std. Error   t value
(Intercept) 251.40510   6.823773 36.842535
Days         10.46729   1.545958  6.770744
Days is numeric in the sleepstudy dataset and is used as a continuous predictor. With your ce function this returns an error, because 2:nrow(..) selects a single row and the row names are dropped.
To get estimates comparable to your other models, set Days to a factor and use the random effect (1 | Subject). I don't think (Days | Subject) makes sense.
sleepstudy$Days = factor(sleepstudy$Days)
mixed <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy)
Then alter your ce code slightly, using drop = FALSE to keep the row names:
ce <- function(model.obj) {
summ.model <- summary(get(model.obj))$coefficients
extract <- summ.model[2:nrow(summ.model), 1:2, drop = FALSE]
return(data.frame(extract, vars = row.names(extract), model = model.obj))
}
coefs <- do.call(rbind, sapply(paste0(list(
"fixed", "random", "pool", "first_diff","mixed"
)), ce, simplify = FALSE))
names(coefs)[2] <- "se"
run the rest of what you have:
gg_coef <- ggplot(coefs, aes(vars, Estimate)) +
geom_hline(yintercept = 0, lty = 1, lwd = 0.5, colour = "red") +
geom_errorbar(aes(ymin = Estimate - se, ymax = Estimate + se, colour = vars),
lwd = 1, width = 0
) +
geom_point(size = 3, aes(colour = vars)) +
facet_grid(model ~ ., scales="free") +
coord_flip() +
guides(colour = FALSE) +
labs(x = "Coefficient", y = "Value") +
ggtitle("Raw models coefficients")
gg_coef
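The effect of drop = FALSE can be seen on a small matrix shaped like the coefficient table above. This is only an illustration with made-up, rounded numbers, not output from the models.

```r
# A 2 x 2 matrix standing in for a coefficient table (illustrative values).
m <- matrix(c(251.4, 10.5, 6.8, 1.5), nrow = 2,
            dimnames = list(c("(Intercept)", "Days"),
                            c("Estimate", "Std. Error")))

# Selecting a single row collapses the result to a plain vector:
# the dimensions, and with them the row name "Days", are lost.
dropped <- m[2:nrow(m), 1:2]

# With drop = FALSE the result stays a 1 x 2 matrix and keeps its row name.
kept <- m[2:nrow(m), 1:2, drop = FALSE]

print(dim(dropped))    # NULL, because it is no longer a matrix
print(rownames(kept))
```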

R: Undefined column error when using mediate in a user made function

I am running a series of mediation analyses using R's mediation package. Because the models are extremely similar to each other, I wrote a function in which the only things that change are the mediating variable, the outcome variable, and the data set. The function is below:
library(mediation)
data("framing", package = "mediation")
covList <- list("age", "educ", "gender", "income")
meBrokenFunction <- function(MEDIATOR, OUTCOME, DATA) {
treatOnMed <- lm(DATA[[MEDIATOR]] ~ treat + age + educ + gender + income, data = DATA)
medOnOut <- glm(DATA[[OUTCOME]] ~ DATA[[MEDIATOR]] + treat + age + educ + gender + income, data = DATA, family = binomial("probit"))
expt <- mediate(treatOnMed, medOnOut, sims = 100,
treat = "treat", mediator = MEDIATOR,
covariates = covList, robustSE = TRUE)
expt
}
set.seed(2019)
test_first <- meBrokenFunction("emo", "cong_mesg", framing)
When I run this function I get the following error:
Error in `[.data.frame`(y.data, , mediator) : undefined columns selected
However if I run the code without using the function I wrote, everything works as intended.
test_treatOnMed <- lm(emo ~ treat + age + educ + gender + income,
data = framing)
test_treatOnOut <- glm(cong_mesg ~ treat + age + educ + gender + income,
data = framing, family = binomial("probit"))
test_medOnOut <- glm(cong_mesg ~ emo + treat + age + educ + gender + income,
data = framing, family = binomial("probit"))
test_second <- mediate(test_treatOnMed, test_medOnOut, sims = 100,
treat = "treat", mediator = "emo",
covariates = covList, robustSE = TRUE)
The error appears to be in the mediate function, specifically at mediator = MEDIATOR but I do not understand why it is not working or if I am approaching the problem incorrectly.

In the formula, we can use paste to insert the variable names instead of using DATA[[MEDIATOR]]:
lm(paste(MEDIATOR, "~ treat + age + educ + gender + income"), data = DATA)
Similarly for the glm.
Full code:
meFixedFunction <- function(MEDIATOR, OUTCOME, DATA) {
treatOnMed <- lm(paste(MEDIATOR,
"~ treat + age + educ + gender + income"), data = DATA)
medOnOut <- glm(paste(OUTCOME, "~", MEDIATOR,
"+ treat + age + educ + gender + income"), data = DATA,
family = binomial("probit"))
expt <- mediate(treatOnMed, medOnOut, sims = 100,
treat = "treat", mediator = MEDIATOR,
covariates = covList, robustSE = TRUE)
expt
}
Testing:
set.seed(2019)
test_first <- meFixedFunction("emo", "cong_mesg", framing)
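To see why the two ways of writing the formula differ, it may help to look at the column names stored in the fitted model's model frame. The sketch below uses the built-in mtcars data and two hypothetical helper functions (fit_by_name and fit_by_index are not part of any package). With the indexed version, the response is stored under the literal text of the expression rather than under the variable's name, which would fit with mediate not finding a column called "emo".

```r
# Hypothetical helpers for illustration, fitted on the built-in mtcars data.

# Formula built from the variable name, as in the fixed function.
fit_by_name <- function(response, data) {
  lm(as.formula(paste(response, "~ wt + hp")), data = data)
}

# Formula that indexes the data frame, as in the broken function.
fit_by_index <- function(response, data) {
  lm(data[[response]] ~ wt + hp, data = data)
}

name_col  <- names(model.frame(fit_by_name("mpg", mtcars)))[1]
index_col <- names(model.frame(fit_by_index("mpg", mtcars)))[1]

print(name_col)   # the actual variable name
print(index_col)  # the deparsed expression, not a column of mtcars
```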

Predict function for heckman model

I use the example from the sampleSelection package
## Greene( 2003 ): example 22.8, page 786
library(sampleSelection)
data( Mroz87 )
Mroz87$kids <- ( Mroz87$kids5 + Mroz87$kids618 > 0 )
# Two-step estimation
test1 = heckit( lfp ~ age + I( age^2 ) + faminc + kids + educ,
wage ~ exper + I( exper^2 ) + educ + city, Mroz87 )
# ML estimation
test2 = selection( lfp ~ age + I( age^2 ) + faminc + kids + educ,
wage ~ exper + I( exper^2 ) + educ + city, Mroz87 )
pr2 <- predict(test2,Mroz87)
pr1 <- predict(test1,Mroz87)
My problem is that the predict function does not work. I get this error:
Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "c('selection', 'maxLik', 'maxim', 'list')"
The predict function works for many models so I wonder why I get an error for heckman regression models.
-----------UPDATE-----------
I made some progress but I still need your help. I built an original Heckman model for comparison:
data( Mroz87 )
Mroz87$kids <- ( Mroz87$kids5 + Mroz87$kids618 > 0 )
test1 = heckit( lfp ~ age + I( age^2 ) + faminc + kids + educ,
wage ~ exper + I( exper^2 ) + educ + city, Mroz87[1:600,] )
After that I started building it on my own. A Heckman model requires a selection equation:
z_i* = w_i γ + u_i
where z_i = 1 if z_i* > 0 and z_i = 0 if z_i* <= 0.
The outcome y_i = x_i β + e_i is then computed ONLY for the cases where z_i* > 0.
I build the probit model first:
library(MASS)
#probit1 = probit(lfp ~ age + I( age^2 ) + faminc + kids + educ, Mroz87, x = TRUE, print.level = print.level - 1, iterlim = 30)
myprobit <- glm(lfp ~ age + I( age^2 ) + faminc + kids + educ, family = binomial(link = "probit"),
data = Mroz87[1:600,])
summary(myprobit)
The model is exactly the same as the one from the heckit command.
Then I build an lm model:
#get predictions for the variables (the data is not needed but I specify it anyway)
selectvar <- predict(myprobit,data = Mroz87[1:600,])
#bind the prediction to the table (I build a new one in my case)
newdata = cbind(Mroz87[1:600,],selectvar)
#Build an lm model for the subset where zi>0
lm1 = lm(wage ~ exper + I( exper^2 ) + educ + city , newdata, subset = selectvar > 0)
summary(lm1)
My issue now is that the lm model does not match the one created by heckit. I have no idea why. Any ideas?

Implementation
Here is an implementation of the predict.selection function -- it produces 4 different types of predictions (which are explained here):
library(Formula)
library(sampleSelection)
predict.selection = function(objSelection, dfPred,
type = c('link', 'prob', 'cond', 'uncond')) {
# construct the Formula object
tempS = evalq(objSelection$call$selection)
tempO = evalq(objSelection$call$outcome)
FormHeck = as.Formula(paste0(tempO[2], '|', tempS[2], '~', tempO[3], '|', tempS[3]))
# regressor matrix for the selection equation
mXSelection = model.matrix(FormHeck, data = dfPred, rhs = 2)
# regressor matrix for the outcome equation
mXOutcome = model.matrix(FormHeck, data = dfPred, rhs = 1)
# indices of the various parameters in objSelection$estimate
vIndexBetaS = objSelection$param$index$betaS
vIndexBetaO = objSelection$param$index$betaO
vIndexErr = objSelection$param$index$errTerms
# get the estimates
vBetaS = objSelection$estimate[vIndexBetaS]
vBetaO = objSelection$estimate[vIndexBetaO]
dLambda = objSelection$estimate[vIndexErr['rho']]*
objSelection$estimate[vIndexErr['sigma']]
# depending on the type of prediction requested, return the matching quantity
# TODO allow the return of multiple prediction types
type = match.arg(type)  # falls back to 'link' when type is not supplied
pred = switch(type,
link = mXSelection %*% vBetaS,
prob = pnorm(mXSelection %*% vBetaS),
uncond = mXOutcome %*% vBetaO,
cond = mXOutcome %*% vBetaO +
dnorm(temp <- mXSelection %*% vBetaS)/pnorm(temp) * dLambda)
return(pred)
}
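For reference, the four values of type in the function above correspond to the quantities below. This is a summary in the usual Heckman notation, read off the code: w_i and γ are the selection regressors and coefficients, x_i and β those of the outcome equation, ρ and σ the error correlation and the outcome error standard deviation, and φ and Φ the standard normal density and distribution function.

```latex
\begin{aligned}
\text{link:}   &\quad w_i \hat\gamma \\
\text{prob:}   &\quad \Pr(z_i = 1 \mid w_i) = \Phi(w_i \hat\gamma) \\
\text{uncond:} &\quad E[y_i \mid x_i] = x_i \hat\beta \\
\text{cond:}   &\quad E[y_i \mid x_i, w_i, z_i = 1]
                 = x_i \hat\beta + \hat\rho\,\hat\sigma\,
                   \frac{\phi(w_i \hat\gamma)}{\Phi(w_i \hat\gamma)}
\end{aligned}
```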
Test
Suppose you estimate the following Heckman sample selection model using MLE:
data(Mroz87)
# define a new variable
Mroz87$kids = (Mroz87$kids5 + Mroz87$kids618 > 0)
# create the estimation sample
Mroz87Est = Mroz87[1:600, ]
# create the hold out sample
Mroz87Holdout = Mroz87[601:nrow(Mroz87), ]
# estimate the model using MLE
heckML = selection(selection = lfp ~ age + I(age^2) + faminc + kids + educ,
outcome = wage ~ exper + I(exper^2) + educ + city, data = Mroz87Est)
summary(heckML)
The different types of predictions are computed as below:
vProb = predict(objSelection = heckML, dfPred = Mroz87Holdout, type = 'prob')
vLink = predict(objSelection = heckML, dfPred = Mroz87Holdout, type = 'link')
vCond = predict(objSelection = heckML, dfPred = Mroz87Holdout, type = 'cond')
vUncond = predict(objSelection = heckML, dfPred = Mroz87Holdout, type = 'uncond')
You can verify these computations on a platform that produces these outputs, such as Stata.
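On the manual two-step attempt in the question's update: in the textbook Heckman two-step procedure, the outcome regression is fitted on the rows where the selection indicator is actually observed to be 1 (not where the probit linear predictor is positive), and the inverse Mills ratio is added as a regressor. The sketch below shows this on simulated data using only base R. The data-generating process and all variable names are made up for illustration, and the lm standard errors are not the corrected ones that heckit reports.

```r
set.seed(42)
n <- 5000

# Simulated data (illustrative): selection depends on w, outcome on x,
# and the two error terms u and e have correlation 0.5.
w <- rnorm(n)
x <- rnorm(n)
u <- rnorm(n)
e <- 0.5 * u + sqrt(0.75) * rnorm(n)
sim <- data.frame(w = w, x = x, z = as.integer(0.5 + w + u > 0))
sim$y <- ifelse(sim$z == 1, 1 + 2 * x + e, NA)  # y observed only if z == 1

# Step 1: probit for the selection equation.
probit_fit <- glm(z ~ w, family = binomial(link = "probit"), data = sim)

# Inverse Mills ratio from the probit linear predictor.
index <- predict(probit_fit, type = "link")
sim$imr <- dnorm(index) / pnorm(index)

# Step 2: outcome regression on the selected rows, with the IMR added.
outcome_fit <- lm(y ~ x + imr, data = sim, subset = z == 1)
print(coef(outcome_fit))
```

In this simulation the coefficient on x should come out close to the true value of 2.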