How to fit ordered logistic regression using svyglm()?

I am trying to fit an ordered logistic regression glm for weighted data using svyglm() from the survey library:
model <- svyglm(freehms ~ agea, design = wave9_design, family=binomial(link= "logit"))
freehms is numeric, ranging from 1 to 5 (I've also tried setting it as a factor), and agea is numeric too. I have many more variables, but didn't include them here for simplicity.
But for some reason I get the following error message:
"Error in eval(family$initialize) : y values must be 0 <= y <= 1"
I have looked at online examples and tutorials, and I just can't find what I'm doing wrong. I don't understand why R insists my dependent variable be binary when I have specified the link function (logit) to address this very problem.

You want the svyolr() function in the survey package, or the new svyVGAM package, which fits a wide range of ordinal models. svyglm() doesn't fit this model because it isn't a generalised linear model.
For example:
library(survey)
data(api)
# Two-stage cluster sample of California schools
dclus2 <- svydesign(id = ~dnum + snum, fpc = ~fpc1 + fpc2, data = apiclus2)
# Create an ordered outcome by binning 'meals'
dclus2 <- update(dclus2, mealcat = as.ordered(cut(meals, c(0, 25, 50, 75, 100))))
# Proportional-odds (ordered logistic) model
svyolr(mealcat ~ avg.ed + mobility + stype, design = dclus2)
# The same model via svyVGAM
library(svyVGAM)
svy_vglm(mealcat ~ avg.ed + mobility + stype, design = dclus2, family = propodds())
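Translated to the setup in the question, a minimal sketch (assuming wave9_design is an existing svydesign object that contains freehms and agea) might look like this:
library(survey)

# Treat the 1-5 response as an ordered factor before fitting
wave9_design <- update(wave9_design, freehms = as.ordered(freehms))

# Ordered (proportional-odds) logistic regression with survey weights
ord_fit <- svyolr(freehms ~ agea, design = wave9_design)
summary(ord_fit)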

Related

How to run a GLMM in R

I am trying to run a generalized linear mixed model (GLMM) in R. I have two fixed factors and two random factors; however, there are a lot of holes in my data set and I am struggling to find code to run the GLMM. All I have found is glm().
Can someone please walk me through this? I know very little about R and coding.
You can use the lme4 package as well. The command for a generalized linear mixed model is glmer().
Example:
install.packages("lme4") #If you still haven't done it.
library(lme4)
myfirstmodel <- glmer(variable_to_study ~ fixed_variable + (1|random_effect_varible), data = mydataset, family = poisson)
Family = poisson was just an example. Choose the 'family' according to the nature of the variable_to_study (eg. poisson for discrete data).
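Since the question mentions holes (missing values) in the data, a minimal sketch with hypothetical column names that drops incomplete rows before fitting might look like this:
library(lme4)

# Hypothetical data: a count response, two fixed factors, two random factors.
# na.omit drops any row with an NA in a model variable before fitting.
m1 <- glmer(count_response ~ treatment + season + (1 | site) + (1 | year),
            data = mydataset, family = poisson, na.action = na.omit)
summary(m1)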

Logistic regression for numerical predictor?

I'm working on a data set and want to use some of the following variables to predict "Operatieduur". All the predictors have been converted to factors.
LogicFit <- train(Operatieduur ~ Anesthesioloog + Aorta_chirurgie + Benadering +
                    Chirurg + Operatietype,
                  data = TrainData, method = "glm", family = "binomial")
Here I use "train" function from caret package to make a logistic fitting with glm. When I ran this code I got the error message:
1: model fit failed for Resample01: parameter=none Error in eval(family$initialize) : y values must be 0 <= y <= 1
I googled it and found that the reason is that the response "Operatieduur" is a continuous numerical value (it's a duration). So how should I modify the function to use the predictors (they are all categorical) to predict a continuous numerical value? Can logistic regression do that?
Logistic regression predicts categories, not numerical values. If you want to predict a continuous numerical variable (even from categorical predictors), use ordinary linear regression. Depending on the number of categories of your predictor variables, you may also want to consider one-hot/dummy encoding.
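For instance, staying inside the same caret workflow, a minimal sketch (reusing the formula and data from the question) might be:
library(caret)

# Linear regression for the continuous outcome Operatieduur;
# train() dummy-encodes the factor predictors automatically.
LinFit <- train(Operatieduur ~ Anesthesioloog + Aorta_chirurgie + Benadering +
                  Chirurg + Operatietype,
                data = TrainData, method = "lm")
LinFit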

Calculating VIF for ordinal logistic regression & multicollinearity in R

I am running an ordinal regression model. I have 8 explanatory variables, 4 of them categorical ('0' or '1') and 4 of them continuous. Beforehand I want to be sure there's no multicollinearity, so I use the variance inflation factor (the vif() function from the car package):
library(MASS)  # polr()
library(car)   # vif()
mod1 <- polr(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8, Hess = TRUE, data = df)
vif(mod1)
but I get a VIF value of 125 for one of the variables, as well as the following warning :
Warning message: In vif.default(mod1) : No intercept: vifs may not be sensible.
However, when I convert my dependent variable to numeric (instead of a factor) and do the same thing with a linear model:
mod2 <- lm(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8, data = df)
vif(mod2)
This time all the VIF values are below 3, suggesting that there's no multicollinearity.
I am confused about the vif() function. How can it return VIFs above 100 for one model and low VIFs for another? Should I stick with the second result and still fit an ordinal model anyway?
The vif() function uses determinants of the correlation matrix of the parameters (and subsets thereof) to calculate the VIF. In a linear model, this includes just the regression coefficients, excluding the intercept. The vif() function wasn't intended to be used with ordered logit models, so when it extracts the variance-covariance matrix of the parameters it also includes the threshold parameters (i.e., the intercepts), which would normally be excluded by the function in a linear model. This is why you get the warning: the function doesn't know to look for threshold parameters and remove them.
Since the VIF is really a function of the inter-correlations in the design matrix (which doesn't depend on the dependent variable or on the non-linear mapping from the linear predictor into the space of the response variable, i.e., the link function in a GLM), you should get the right answer with your second approach above, using lm() with a numeric version of your dependent variable.
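To see that the VIF depends only on the predictors, it can also be computed directly from the model matrix without any response at all; a small sketch (assuming the binary predictors are coded 0/1, so the classical VIF applies) is:
# Design matrix built from the predictors only, with the intercept column dropped
X <- model.matrix(~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8, data = df)[, -1]

# Classical VIFs: the diagonal of the inverse correlation matrix of the predictors
diag(solve(cor(X)))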

Using lme4 glmer function for unbalanced treatment comparison results in variable length error

I am using the lme4 package to run a generalized linear mixed model for proportion data using a binary response. I have unequal sample sizes for my treatments and am getting the following error, which I understand is due to the very fact that I have unequal sample sizes:
Error in model.frame.default(data = POL3, drop.unused.levels = TRUE, formula = X2 ~ :
  variable lengths differ (found for 'Trtmt')
Here is the code that leads to the error:
# Exclude NAs from the data set
POL3 <- na.exclude(POL)
# Build the two-column binary response (successes, failures)
X2 <- cbind(POL3$CHSd, POL3$TotSd - POL3$CHSd)
# Run the model
MMCHS4 <- glmer(X2 ~ Trtmt + (1 | BSD) + (1 | Hgt), family = binomial, data = POL3)
I have read that lme4 can deal with unbalanced samples but can't get this to work.
Impossible to say for sure without a reproducible example, but you probably need to make sure that the Trtmt variable is contained within POL3 (i.e., that there isn't another Trtmt variable lying around in your global workspace).
I would probably implement the model in this way:
glmer(CHSd / TotSd ~ Trtmt + (1 | BSD) + (1 | Hgt),
      weights = TotSd,
      family = binomial,
      na.action = na.exclude,
      data = POL)
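Alternatively, a sketch of the equivalent two-column (successes, failures) form, which keeps every variable inside the data frame and so avoids the length mismatch in the question:
library(lme4)

# Building the response inside the formula keeps it the same length as Trtmt
glmer(cbind(CHSd, TotSd - CHSd) ~ Trtmt + (1 | BSD) + (1 | Hgt),
      family = binomial,
      na.action = na.exclude,
      data = POL)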

How do I plot predictions from new data fit with gee, lme, glmer, and gamm4 in R?

I have fit my discrete count data using a variety of functions for comparison. I fit a GEE model using geepack, a linear mixed effect model on the log(count) using lme (nlme), a GLMM using glmer (lme4), and a GAMM using gamm4 (gamm4) in R.
I am interested in comparing these models and would like to plot the expected (predicted) values for a new set of data (predictor variables). My goal is to compare the predicted effects for each model under particular conditions (x variables). Of particular interest is the comparison between marginal (GEE) and conditional estimates.
I think my main problem might be getting the new data in the correct form with the correct labels and attributes and such. I am still very much an R novice and struggle with this stuff (no course on this at my university unfortunately).
I currently have fitted models
gee1, lme1, lmer1, and gamm1
and can extract their fixed effect coefficients and standard errors without a problem. I also don't have a problem converting them from the log scale or estimating confidence intervals accounting for the random effects.
I also have my new dataframe newdat which has 365 observations of 23 variables (average environmental data for each day of the year).
I am stuck on how to predict new count estimates from this. I played around with the model.matrix function but couldn't get it to work. For example, I tried:
mm = model.matrix(terms(glmm1), newdat)
# Error in model.frame.default(object, data, xlev = xlev) :
#   object is not a matrix
newdat$pcount = mm %*% fixef(glmm1)
Any suggestions or good references would be greatly appreciated. Can anyone help with the error above?
Getting predictions for lme() and lmer() is documented on http://glmm.wikidot.com/faq
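Roughly, the recipe described there builds the fixed-effects model matrix for the new data by hand; a sketch for the glmer fit (assuming it is called glmm1, uses a log link, and newdat contains every fixed-effect predictor) might look like this:
library(lme4)

# Fixed-effects-only design matrix for the new data
mm  <- model.matrix(delete.response(terms(glmm1)), newdat)

# Linear predictor and back-transformation through the log link
eta <- drop(mm %*% fixef(glmm1))
newdat$pcount <- exp(eta)

# Approximate 95% interval, ignoring random-effect uncertainty
pv <- diag(mm %*% tcrossprod(as.matrix(vcov(glmm1)), mm))
newdat$lo <- exp(eta - 1.96 * sqrt(pv))
newdat$hi <- exp(eta + 1.96 * sqrt(pv))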
