I'm running an lme analysis on my dataset with the following code:
M1 <- lme(VT ~ visit + sx + agevis + c_bmi + gpa + qa + BP + MH + ethn, data = Cleaned_data4t300919, random = ~ 1 + visit |id, corAR1(),method = "ML", na.action = na.omit(Cleaned_data4t300919))
and I get the following error message:
Error in model.frame.default(formula = ~visit + sx + agevis + c_bmi +
: attempt to apply non-function
I am not sure what I am doing wrong or how to get the model to run. I really appreciate an answer. Thank you.
I am trying to run a linear mixed effect model with VT as my dependent variable, visit as my time variable, with a 1st order autoregressive correlation, ML estimator on data with some missing observations.
I have tried changing the code in the following ways, but I get the same error message each time:
library(nlme)
?lme
fm2 <- lme(VT ~ visit + sx + agevis + c_bmi + gpa + qa + BP + MH + ethn, data = Cleaned_data4t300919, random = ~ 1|id, corAR1(),method = "ML", na.action = na.pass(Cleaned_data4t300919))
fm2 <- lme(VT ~ visit + sx + agevis + c_bmi + gpa + qa + BP + sfnMH + ethn, data = Cleaned_data4t300919, random = ~ 1 + visit |cenid, corAR1(),method = "ML", na.action = na.omit(Cleaned_data4t300919))
fm2 <- lme(VT ~ visit + sx + agevis , data = Cleaned_data4t300919, random = ~ 1 + visit |id, corAR1(),method = "ML", na.action = na.omit(Cleaned_data4t300919))
fm2 <- lme(VT~visit + sx + agevis + c_bmi + gpa + qa + BP + MH + ethn, data = Cleaned_data4t300919, na.action = na.exclude(Cleaned_data4t300919))
fm2 <- lme(formula= sfnVT ~ visit + sx + agevis , data = Cleaned_data4t300919, random = ~ 1 + visit |cenid, corAR1(),method = "ML", na.action = na.omit(Cleaned_data4t300919))
I would like to obtain the model estimates and plot them using ggplot.
na.action = na.omit(Cleaned_data4t300919)
and similar attempts are the problem, I think.
From ?lme:
na.action: a function that indicates what should happen when the data
contain 'NA's
You are providing data, not a function: na.omit(dataset) returns a data.frame with the NA-containing rows removed, rather than a function that lme can apply to the data= you specified. Just:
na.action=na.omit
or similar na.* functions will be sufficient.
One way to pin down this kind of issue is ?debug: call debug(lme), rerun the model, and step through the function line by line to see exactly which call triggers the error.
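A minimal sketch of the difference, assuming the nlme package and its bundled Orthodont data as a stand-in for the real dataset. The missing values are introduced artificially, and the AR(1) correlation structure is left out to keep the example small:

```r
library(nlme)

# Orthodont ships with nlme; it only stands in for the real data (assumption)
d <- as.data.frame(Orthodont)
d$distance[c(3, 10)] <- NA   # artificial missing values

# Passing a data frame as na.action is expected to fail
bad <- tryCatch(
  lme(distance ~ age, data = d, random = ~ 1 | Subject,
      method = "ML", na.action = na.omit(d)),
  error = function(e) e
)

# Passing the function itself lets lme drop the incomplete rows
fit <- lme(distance ~ age, data = d, random = ~ 1 | Subject,
           method = "ML", na.action = na.omit)
n_used <- length(fitted(fit))   # number of rows actually used in the fit
```

The same pattern applies to na.exclude: pass the function name without calling it on the data.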
In R, when fitting a GLM you can include all variables simply by using a . in the formula, as shown in "How to succinctly write a formula with many variables from a data frame?"
for example:
y <- c(1,4,6)
d <- data.frame(y = y, x1 = c(4,-1,3), x2 = c(3,9,8), x3 = c(4,-4,-2))
mod <- lm(y ~ ., data = d)
However, I am struggling to do this with svydesign. I have many explanatory variables plus an ID and a weight variable, so first I create my survey design:
des <-svydesign(ids=~id, weights=~wt, data = df)
Then I try creating my binomial model using weights:
binom <- svyglm(y~.,design = des, family="binomial")
But I get the error:
Error in svyglm.survey.design(y ~ ., design = des, family = "binomial") :
all variables must be in design = argument
What am I doing wrong?
You typically wouldn't want to do this, because "all the variables" would include design metadata such as weights, cluster indicators, stratum indicators, etc.
You can use colnames to extract all the variable names from a design object and then reformulate, probably after subsetting the names, e.g. with the api example in the package:
> all_the_names <- colnames(dclus1)
> all_the_actual_variables <- all_the_names[c(2, 11:37)]
> reformulate(all_the_actual_variables,"y")
y ~ stype + pcttest + api00 + api99 + target + growth + sch.wide +
comp.imp + both + awards + meals + ell + yr.rnd + mobility +
acs.k3 + acs.46 + acs.core + pct.resp + not.hsg + hsg + some.col +
col.grad + grad.sch + avg.ed + full + emer + enroll + api.stu
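A minimal sketch of the same idea using only base R, assuming a hypothetical data frame df whose design columns are named id and wt as in the question. The svyglm call is shown only as a comment because it needs the survey package:

```r
# Hypothetical data: id and wt are design metadata, y is the response
df <- data.frame(
  id = 1:6,
  wt = c(1.2, 0.8, 1.0, 1.5, 0.7, 1.1),
  y  = c(0, 1, 0, 1, 1, 0),
  x1 = c(4, -1, 3, 2, 0, 5),
  x2 = c(3, 9, 8, 1, 7, 2)
)

# Drop the design columns and the response, keep everything else
design_cols <- c("id", "wt")
predictors <- setdiff(names(df), c(design_cols, "y"))

# Build y ~ x1 + x2 from the remaining names
f <- reformulate(predictors, response = "y")
f

# With the survey package installed, the formula could then be used as:
# des <- survey::svydesign(ids = ~id, weights = ~wt, data = df)
# binom <- survey::svyglm(f, design = des, family = quasibinomial())
```

Selecting by name with setdiff is less fragile than positional indices if the column order changes.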
I am running a series of mediation analyses using R's mediation package. Because the models are extremely similar to each other, I wrote a function where all that would change would be the mediating variable, the outcome variable, and the data set. The function is below:
library(mediation)
data("framing", package = "mediation")
covList <- list("age", "educ", "gender", "income")
meBrokenFunction <- function(MEDIATOR, OUTCOME, DATA) {
treatOnMed <- lm(DATA[[MEDIATOR]] ~ treat + age + educ + gender + income, data = DATA)
medOnOut <- glm(DATA[[OUTCOME]] ~ DATA[[MEDIATOR]] + treat + age + educ + gender + income, data = DATA, family = binomial("probit"))
expt <- mediate(treatOnMed, medOnOut, sims = 100,
treat = "treat", mediator = MEDIATOR,
covariates = covList, robustSE = TRUE)
expt
}
set.seed(2019)
test_first <- meBrokenFunction("emo", "cong_mesg", framing)
When I run this function I get the following error:
Error in `[.data.frame`(y.data, , mediator) : undefined columns selected
However if I run the code without using the function I wrote, everything works as intended.
test_treatOnMed <- lm(emo ~ treat + age + educ + gender + income,
data = framing)
test_treatOnOut <- glm(cong_mesg ~ treat + age + educ + gender + income,
data = framing, family = binomial("probit"))
test_medOnOut <- glm(cong_mesg ~ emo + treat + age + educ + gender + income,
data = framing, family = binomial("probit"))
test_second <- mediate(test_treatOnMed, test_medOnOut, sims = 100,
treat = "treat", mediator = "emo",
covariates = covList, robustSE = TRUE)
The error appears to be in the mediate function, specifically at mediator = MEDIATOR but I do not understand why it is not working or if I am approaching the problem incorrectly.
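In the formula, we may need paste instead of DATA[[MEDIATOR]]. With DATA[[MEDIATOR]], the column in the fitted model's model frame is named "DATA[[MEDIATOR]]" rather than "emo", so mediate cannot find the mediator by name: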
lm(paste(MEDIATOR, "~ treat + age + educ + gender + income"), data = DATA)
Similarly for the glm
Full code:
meFixedFunction <- function(MEDIATOR, OUTCOME, DATA) {
treatOnMed <- lm(paste(MEDIATOR,
"~ treat + age + educ + gender + income"), data = DATA)
medOnOut <- glm(paste(OUTCOME, "~", MEDIATOR,
"+ treat + age + educ + gender + income"), data = DATA,
family = binomial("probit"))
expt <- mediate(treatOnMed, medOnOut, sims = 100,
treat = "treat", mediator = MEDIATOR,
covariates = covList, robustSE = TRUE)
expt
}
Testing:
set.seed(2019)
test_first <- meFixedFunction("emo", "cong_mesg", framing)
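To see why the column name matters, here is a small base R sketch on the built-in mtcars data. The helper names fit_bad and fit_ok are hypothetical and only illustrate the two ways of building the formula:

```r
# Formula built from d[[v]]: the model frame column is named after the expression
fit_bad <- function(v, d) lm(d[[v]] ~ wt, data = d)

# Formula built from a string: the model frame column keeps the variable name
fit_ok <- function(v, d) lm(as.formula(paste(v, "~ wt")), data = d)

names_bad <- names(model.frame(fit_bad("mpg", mtcars)))
names_ok  <- names(model.frame(fit_ok("mpg", mtcars)))

names_bad   # contains "d[[v]]", not "mpg"
names_ok    # contains "mpg"
```

Any function that later looks up a variable by its name in the model frame can only succeed with the second form.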
I'm wondering if there is essentially a faster way of getting predictions from a regression model for certain values of the covariates without manually specifying the formulation. For example, if I wanted to get a prediction for a given dependent variable at means of the covariates, I can do something like this:
r_3 <- glm(ins ~ retire + age + hstatusg + qhhinc2 + educyear + married + hisp,
           family = binomial, data = dat)
# sample means of the covariates
meanRetire <- mean(dat$retire)
meanAge <- mean(dat$age)
meanHStatusG <- mean(dat$hstatusg)
meanQhhinc2 <- mean(dat$qhhinc2)
meanEducyear <- mean(dat$educyear)
meanMarried <- mean(dat$married)
meanHisp <- mean(dat$hisp)
# linear predictor evaluated at the covariate means
ins_predict <- coef(r_3)[1] + coef(r_3)[2] * meanRetire + coef(r_3)[3] * meanAge +
  coef(r_3)[4] * meanHStatusG + coef(r_3)[5] * meanQhhinc2 +
  coef(r_3)[6] * meanEducyear + coef(r_3)[7] * meanMarried +
  coef(r_3)[8] * meanHisp
Oh... There is a predict function:
fit <- glm(ins ~ retire + age + hstatusg + qhhinc2 + educyear + married + hisp,
family = binomial, data = dat)
newdat <- lapply(dat, mean) ## column means
lppred <- predict(fit, newdata = newdat) ## prediction of linear predictor
To get predicted response, use:
predict(fit, newdata = newdat, type = "response")
or (more efficiently from lppred):
binomial()$linkinv(lppred)
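As a quick check that predict reproduces the manual calculation, here is a self-contained sketch on the built-in mtcars data. The variables am, wt and hp are stand-ins for the question's dataset, which is not available here:

```r
# Logistic regression on built-in data (stand-in for the question's model)
fit <- glm(am ~ wt + hp, family = binomial, data = mtcars)

# One-row data frame holding the covariate means
newdat <- as.data.frame(lapply(mtcars[c("wt", "hp")], mean))

# Linear predictor from predict() and from the coefficients by hand
lp <- predict(fit, newdata = newdat)
lp_manual <- sum(coef(fit) * c(1, newdat$wt, newdat$hp))

# Predicted probability, two equivalent ways
p <- predict(fit, newdata = newdat, type = "response")
p_from_lp <- binomial()$linkinv(lp)
```

Restricting the means to the model's covariates avoids warnings from mean() on non-numeric columns.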
I've been trying to look at the explanatory power of individual variables in a model by holding other variables constant at their sample mean.
However, I am unable to do something like:
Temperature = alpha + Beta1*RFGG + Beta2*RFSOx + Beta3*RFSolar
where Beta1=Beta2=Beta3 -- something like
Temperature = alpha + Beta1*(RFGG + RFSolar + RFSOx)
I want to do this so I can compare the difference in explanatory power (R^2/size of residuals) when one independent variable is not held at the sample mean while the rest are.
Temperature = alpha + Beta1*(RFGG + meanRFSolar + meanRFSOx)
or
Temperature = alpha + Beta1*RFGG + Beta1*meanRFSolar + Beta1*meanRFSOx
However, the lm function seems to estimate its own coefficients so I don't know how I can hold anything constant.
Here's some ugly code I tried throwing together that I know reeks of wrongness:
# fixing a new clean matrix for my data
dat = cbind(dat[,1:2],dat[,4:6]) # contains 162 rows of: Date, Temp, RFGG, RFSolar, RFSOx
# make a bunch of sample mean independent variables to use
meandat = dat[,3:5]
meandat$RFGG = mean(dat$RFGG)
meandat$RFSolar = mean(dat$RFSolar)
meandat$RFSOx = mean(dat$RFSOx)
RFTot = dat$RFGG + dat$RFSOx + dat$RFSolar
B = coef(lm(dat$Temp ~ 1 + RFTot)) # trying to save the coefficients to use them...
B1 = c(rep(B[1],length = length(dat[,1])))
B2 = c(rep(B[2],length = length(dat[,1])))
summary(lm(dat$Temp ~ B1 + B2*dat$RFGG:meandat$RFSOx:meandat$RFSolar)) # failure
summary(lm(dat$Temp ~ B1 + B2*RFTot))
Thanks for taking a look to whoever sees this and please ask me any questions.
Thank you both. The solution was a combination of eliminating the intercept with -1 and using the offset function.
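# sample means used to hold the other forcings constant
RFGG_bar = mean(dat$RFGG)
RFSOx_bar = mean(dat$RFSOx)
RFSolar_bar = mean(dat$RFSolar)
# pooled fit: one common slope for the summed forcings
a = lm(Temp ~ I(RFGG + RFSOx + RFSolar),data = dat)
beta1hat = rep(coef(a)[1],length=length(dat[,1])) # intercept, repeated per row
beta2hat = rep(coef(a)[2],length=length(dat[,1])) # common slope, repeated per row
# offset-only models: coefficients fixed, one forcing varies at a time
b = lm(Temp ~ -1 + offset(beta1hat) + offset(beta2hat*(RFGG + RFSOx_bar + RFSolar_bar)),data = dat)
c = lm(Temp ~ -1 + offset(beta1hat) + offset(beta2hat*(RFGG_bar + RFSOx + RFSolar_bar)),data = dat)
d = lm(Temp ~ -1 + offset(beta1hat) + offset(beta2hat*(RFGG_bar + RFSOx_bar + RFSolar)),data = dat)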
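The offset trick can be checked on simulated data. This is a minimal sketch: the variables x1, x2 and y are made up for illustration, and the residual sum of squares of the offset-only model is compared with the value computed by hand:

```r
set.seed(1)
n <- 50
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- 1 + 2 * (sim$x1 + sim$x2) + rnorm(n)

# Pooled fit with a single common slope
a <- lm(y ~ I(x1 + x2), data = sim)
alpha_hat <- coef(a)[[1]]
beta_hat  <- coef(a)[[2]]
x2_bar    <- mean(sim$x2)

# No intercept and no free terms: lm estimates nothing, it only
# computes residuals around the fixed offset
b <- lm(y ~ -1 + offset(alpha_hat + beta_hat * (x1 + x2_bar)), data = sim)

rss_model  <- sum(residuals(b)^2)
rss_manual <- sum((sim$y - (alpha_hat + beta_hat * (sim$x1 + x2_bar)))^2)
```

Because no coefficients are estimated, summary(b) carries no coefficient table; residual-based comparisons such as the residual sum of squares are the useful output.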
I use the example from the sampleSelection package:
library(sampleSelection)
## Greene( 2003 ): example 22.8, page 786
data( Mroz87 )
Mroz87$kids <- ( Mroz87$kids5 + Mroz87$kids618 > 0 )
# Two-step estimation
test1 = heckit( lfp ~ age + I( age^2 ) + faminc + kids + educ,
wage ~ exper + I( exper^2 ) + educ + city, Mroz87 )
# ML estimation
test2 = selection( lfp ~ age + I( age^2 ) + faminc + kids + educ,
wage ~ exper + I( exper^2 ) + educ + city, Mroz87 )
pr2 <- predict(test2,Mroz87)
pr1 <- predict(test1,Mroz87)
My problem is that the predict function does not work. I get this error:
Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "c('selection', 'maxLik', 'maxim', 'list')"
The predict function works for many models so I wonder why I get an error for heckman regression models.
-----------UPDATE-----------
I made some progress but I still need your help. I build an original Heckman model for comparison:
data( Mroz87 )
Mroz87$kids <- ( Mroz87$kids5 + Mroz87$kids618 > 0 )
test1 = heckit( lfp ~ age + I( age^2 ) + faminc + kids + educ,
wage ~ exper + I( exper^2 ) + educ + city, Mroz87[1:600,] )
After that I start building it on my own. The Heckman model requires a selection equation:
z_i* = w_i γ + u_i
where z_i = 1 if z_i* > 0 and z_i = 0 if z_i* <= 0.
The outcome equation y_i = x_i β + e_i is then estimated ONLY for the cases where z_i* > 0.
I build the probit model first:
library(MASS)
#probit1 = probit(lfp ~ age + I( age^2 ) + faminc + kids + educ, Mroz87, x = TRUE, print.level = print.level - 1, iterlim = 30)
myprobit <- glm(lfp ~ age + I( age^2 ) + faminc + kids + educ, family = binomial(link = "probit"),
data = Mroz87[1:600,])
summary(myprobit)
The estimates are exactly the same as the selection equation from the heckit command.
Then I build a lm model:
#get predictions on the linear predictor scale (the data is not needed but I specify it anyway)
selectvar <- predict(myprobit, newdata = Mroz87[1:600,])
#bind the prediction to the table (I build a new one in my case)
newdata = cbind(Mroz87[1:600,],selectvar)
#Build an lm model for the subset where zi>0
lm1 = lm(wage ~ exper + I( exper^2 ) + educ + city , newdata, subset = selectvar > 0)
summary(lm1)
My issue now is that the lm model does not match the one created by heckit. I have no idea why. Any ideas?
Implementation
Here is an implementation of the predict.selection function -- it produces 4 different types of predictions (which are explained here):
library(Formula)
library(sampleSelection)
predict.selection = function(objSelection, dfPred,
type = c('link', 'prob', 'cond', 'uncond')) {
# construct the Formula object
tempS = evalq(objSelection$call$selection)
tempO = evalq(objSelection$call$outcome)
FormHeck = as.Formula(paste0(tempO[2], '|', tempS[2], '~', tempO[3], '|', tempS[3]))
# regressor matrix for the selection equation
mXSelection = model.matrix(FormHeck, data = dfPred, rhs = 2)
# regressor matrix for the outcome equation
mXOutcome = model.matrix(FormHeck, data = dfPred, rhs = 1)
# indices of the various parameters in selectionObject$estimate
vIndexBetaS = objSelection$param$index$betaS
vIndexBetaO = objSelection$param$index$betaO
vIndexErr = objSelection$param$index$errTerms
# get the estimates
vBetaS = objSelection$estimate[vIndexBetaS]
vBetaO = objSelection$estimate[vIndexBetaO]
dLambda = objSelection$estimate[vIndexErr['rho']]*
objSelection$estimate[vIndexErr['sigma']]
# resolve the default so that switch() receives a single string
type = match.arg(type)
# depending on the type of prediction requested, return
# TODO allow the return of multiple prediction types
pred = switch(type,
link = mXSelection %*% vBetaS,
prob = pnorm(mXSelection %*% vBetaS),
uncond = mXOutcome %*% vBetaO,
cond = mXOutcome %*% vBetaO +
dnorm(temp <- mXSelection %*% vBetaS)/pnorm(temp) * dLambda)
return(pred)
}
Test
Suppose you estimate the following Heckman sample selection model using MLE:
data(Mroz87)
# define a new variable
Mroz87$kids = (Mroz87$kids5 + Mroz87$kids618 > 0)
# create the estimation sample
Mroz87Est = Mroz87[1:600, ]
# create the hold out sample
Mroz87Holdout = Mroz87[601:nrow(Mroz87), ]
# estimate the model using MLE
heckML = selection(selection = lfp ~ age + I(age^2) + faminc + kids + educ,
outcome = wage ~ exper + I(exper^2) + educ + city, data = Mroz87Est)
summary(heckML)
The different types of predictions are computed as below:
vProb = predict(objSelection = heckML, dfPred = Mroz87Holdout, type = 'prob')
vLink = predict(objSelection = heckML, dfPred = Mroz87Holdout, type = 'link')
vCond = predict(objSelection = heckML, dfPred = Mroz87Holdout, type = 'cond')
vUncond = predict(objSelection = heckML, dfPred = Mroz87Holdout, type = 'uncond')
You can verify these computations on a platform that produces these outputs, such as Stata.
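On the hand-built two-step attempt in the question's update: the usual Heckman two-step procedure runs the outcome regression on the rows where selection was actually observed, and adds the inverse Mills ratio from the probit as an extra regressor. The update instead subsets on the predicted linear predictor and omits that term, which would explain the mismatch. A minimal sketch on simulated data, using base R only, with all variable names made up for illustration:

```r
set.seed(42)
n <- 500
w <- rnorm(n)              # selection covariate
x <- rnorm(n)              # outcome covariate
u <- rnorm(n)              # selection error
e <- 0.5 * u + rnorm(n)    # outcome error, correlated with u

z <- as.integer(0.5 + w + u > 0)          # observed selection indicator
y <- ifelse(z == 1, 1 + 2 * x + e, NA)    # outcome seen only when selected
sim <- data.frame(y, x, w, z)

# Step 1: probit for the selection equation
probit_fit <- glm(z ~ w, family = binomial(link = "probit"), data = sim)
lin_pred <- predict(probit_fit, type = "link")

# Inverse Mills ratio evaluated at the probit linear predictor
sim$imr <- dnorm(lin_pred) / pnorm(lin_pred)

# Step 2: outcome regression on the observed-selected rows, including imr
step2 <- lm(y ~ x + imr, data = sim, subset = z == 1)
coef(step2)
```

The point estimates from this sketch follow the two-step logic, but the standard errors reported by lm are not corrected for the estimated inverse Mills ratio, so they should not be expected to match heckit's.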