glm model with normal distribution and log link - r

In a university assignment I have a negative intercept in my lm regression, which makes my model hard to interpret. My supervisor advised me to try a glm model with a normal distribution and a log link in R to force my model to have a non-negative intercept.
I have tried the following two ways with the glm and gamlss functions:
model1 <- glm(y ~ x, family = gaussian(link = "log"))
model2 <- gamlss(y ~ x, family=NO, data=df, mu.link= "log")
Both of them keep giving negative intercepts. Does anyone have an idea what I am doing wrong?
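A minimal sketch of what the log link does, using simulated data (the variable names and simulated coefficients are assumptions, not taken from the question). With a log link the model is log(E[y]) = b0 + b1 * x, so the reported intercept is on the log scale and may well be negative; the fitted mean at x = 0 is exp(b0), which is always positive.

```r
set.seed(1)
x <- runif(200, 0, 2)
# Simulated response: the true mean is exp(-0.5 + 0.8 * x)
y <- exp(-0.5 + 0.8 * x) + rnorm(200, sd = 0.05)

fit <- glm(y ~ x, family = gaussian(link = "log"))

coef(fit)           # coefficients on the log scale; the intercept is negative here
exp(coef(fit)[1])   # fitted mean at x = 0, on the original scale of y
```

So a negative intercept from this model is not necessarily a sign that something went wrong; it may only need to be back-transformed before interpretation.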

Related

Using an exponential distribution in a predictive glm model in R

I am using a glm model to predict my dependent variable. For that I need to choose a family for the distribution of my variable. Unfortunately, the exponential distribution is not among the available options for the "family" argument.
Now I don't know how to proceed with my research.
This is my model. Does anyone have an idea, what I can do?
model <- train(duration ~ ., data = data, method = 'glm', family = ???, trControl = trainControl(method = "repeatedcv", repeats = 10))
The exponential distribution is a gamma distribution with the dispersion parameter fixed at 1, so family = Gamma(link="log") should provide you with what you need. The coefficient estimates do not depend on the dispersion, but their standard errors and significance tests do, so to interpret them under the exponential assumption you specify dispersion = 1 in summary().
Since your example wasn't reproducible, an example using glm and summary is:
mdl <- glm(formula = duration ~., family = Gamma(link="log"), data = your_data)
summary(mdl, dispersion = 1)
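To illustrate the point about dispersion, here is a small sketch on simulated exponential data (the data frame sim and its coefficients are made up for illustration). It compares the default Gamma summary, where the dispersion is estimated, with the summary where it is fixed at 1.

```r
set.seed(42)
n <- 500
x <- rnorm(n)
# Exponential response with mean exp(1 + 0.5 * x)
duration <- rexp(n, rate = 1 / exp(1 + 0.5 * x))
sim <- data.frame(duration = duration, x = x)

mdl <- glm(duration ~ x, family = Gamma(link = "log"), data = sim)

s_gamma <- summary(mdl)                  # dispersion estimated from the data
s_exp   <- summary(mdl, dispersion = 1)  # dispersion fixed at 1 (exponential)

# Point estimates are identical; only the standard errors are rescaled
se_gamma <- s_gamma$coefficients[, "Std. Error"]
se_exp   <- s_exp$coefficients[, "Std. Error"]
cbind(estimate = coef(mdl), se_gamma = se_gamma, se_exp = se_exp)
```

The two sets of standard errors differ by a factor of the square root of the estimated dispersion, which is why the choice matters for inference but not for prediction.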

How can I incorporate a categorical variable with ~200 levels in a nonlinear mixed effects model in R?

I am trying to fit a nonlinear mixed effects model with a categorical variable genotype that has around 200 levels.
So this is the linear version of the model.
mlinear <- lmer(WUE ~ moisture * genotype + (1|pot), data = d8)
Now I'm trying to make the same model, but with a logistic function instead of linear
mlogistic <- nlme(WUE ~ SSlogis(moisture, Asym, xmid, scal), data = d8, fixed = Asym + xmid + scal ~ 1, random = Asym + xmid + scal~1|pot)
Problem is, now I don't know how to incorporate genotype into this nonlinear model. Asym, xmid, and scal parameters should be able to vary between each genotype. Anyone know how to do this?
It looks like you’re using lme4::lmer for your linear model and nlme::nlme for the logistic? If you use lme4 for both, you should be able to keep the same model specification, and use lme4::glmer with family = binomial for the logistic model. The whole idea behind a GLM with a link function is that you shouldn’t have to do anything different with your predictor vs a linear model, as the link function takes care of that.
library(lme4)
mlinear <- lmer(WUE ~ moisture * genotype + (1|pot), data = d8)
mlogistic <- glmer(WUE ~ moisture * genotype + (1|pot), family = binomial, data = d8)
All that being said, how is WUE measured? You probably want to use either a logistic model (if binary) or linear (if continuous), not both.
Just to add to the answer by @zephryl, who has explained how you can do this, I would like to focus on:
"Since my WUE responses are almost always curved, I'm trying to fit a logistic function instead of a linear one."
Fitting a logistic model does not really make sense here. The distribution of WUE is not relevant. It is the conditional distribution that matters, and we typically assess this by inspecting the residuals. If the residuals show a nonlinear pattern then there are several ways to model this including
transformations
nonlinear terms
splines
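As a rough sketch of the spline option, the following uses simulated data (the S-shaped curve, the noise level, and df = 4 are arbitrary assumptions) and a plain lm fit for simplicity rather than a mixed model. It compares a straight-line fit with a natural cubic spline in moisture.

```r
library(splines)  # ships with R

set.seed(7)
moisture <- runif(300, 0, 10)
# Hypothetical curved response plus noise
WUE <- 5 / (1 + exp(-(moisture - 5))) + rnorm(300, sd = 0.3)
sim <- data.frame(WUE = WUE, moisture = moisture)

fit_lin    <- lm(WUE ~ moisture, data = sim)
fit_spline <- lm(WUE ~ ns(moisture, df = 4), data = sim)

# A lower AIC favours the more flexible curve
AIC(fit_lin, fit_spline)
```

The same ns() term could in principle be placed in the lmer formula, for example WUE ~ ns(moisture, df = 4) * genotype + (1 | pot), although with around 200 genotypes that interaction adds a large number of parameters and has not been tried here.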

Cannot get adjusted means for glmer using lsmeans

I have a glm that I would like to get adjusted means for using lsmeans. The following code makes the model (and seems to be doing it correctly):
library(lmerTest)
data$group <- as.factor(data$grp)
data$site <- as.factor(data$site)
data$stimulus <- as.factor(data$stimulus)
data.acc1 = glmer(accuracy ~ site + grp*stimulus + (1|ID), data=data, family=binomial)
However, when I try to use any of the code below to get adjusted means for the model, I get the error
Error in lsmeansLT(model, test.effs = test.effs, ddf = ddf) :
The model is not linear mixed effects model.
lsmeans(data.acc1, "stimulus")
or
data.lsm <- lsmeans(data.acc1, accuracy ~ stimulus ~ grp)
pairs(data.lsm)
Any suggestions?
The problem is that you have created a generalised linear mixed model using glmer() (in this case a mixed logistic regression model), not a linear mixed model using lmer(). The error comes from lsmeansLT, which suggests that the lsmeans() being called is the one loaded with lmerTest; that function does not accept objects created by glmer() because they are not linear mixed models.
Answers in this post might help: I can't get lsmeans output in glmer
And this post might be useful if you want to understand/compute marginal effects for mixed GLMs: Is there a way of getting "marginal effects" from a `glmer` object

Performing VIF Test in R

I started using RStudio a few days ago and I am struggling a bit to compute a VIF. Here is the situation:
I have panel data and ran fixed effects and random effects regressions. I have one dependent variable (New_biz_density) and two independent variables (Cost_to_start, Capital_requirements). I would like to check whether my two independent variables are collinear by computing their Variance Inflation Factors, for both the fixed and random effects models.
I already installed some packages to compute the VIF (faraway, car) but did not manage to do it. Does anybody know how to do it?
Here is my script:
# install.packages("plm")
library(plm)
mydata<- read.csv("/Users/juliantabone/Downloads/DATAweakoutliers.csv")
Y <- cbind(new_biz_density)
X <- cbind(capital_requirements, cost_to_start)
# Set data as panel data
pdata <- plm.data(mydata, index=c("country_code","year"))
# Descriptive statistics
summary(Y)
summary(X)
# Pooled OLS estimator
pooling <- plm(Y ~ X, data=pdata, model= "pooling")
summary(pooling)
# Between estimator
between <- plm(Y ~ X, data=pdata, model= "between")
summary(between)
# First differences estimator
firstdiff <- plm(Y ~ X, data=pdata, model= "fd")
summary(firstdiff)
# Fixed effects or within estimator
fixed <- plm(Y ~ X, data=pdata, model= "within")
summary(fixed)
# Random effects estimator
random <- plm(Y ~ X, data=pdata, model= "random")
summary(random)
# LM test for random effects versus OLS
plmtest(pooling)
# LM test for fixed effects versus OLS
pFtest(fixed, pooling)
# Hausman test for fixed versus random effects model
phtest(random, fixed)
There seem to be two popular ways of calculating VIFs (Variance Inflation Factors, to detect collinearity among variables in regression) in R:
The vif() function in the car package, where the input is the model. This requires you to first fit a model before you can check for VIFs among variables in the model.
The corvif() function, where the input is the set of candidate explanatory variables themselves (i.e. a list of variables, before the model is even fitted). This function is part of the AED package (Zuur et al. 2009), which has been discontinued. This one seems to work only on a list of variables, not on a fitted regression model.
To compute the VIF, see the non_collinear_vars function from the metan package.
To compute the VIF between the two variables Cost_to_start and Capital_requirements, use:
library(metan)
non_collinear_vars(X)
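For reference, the VIF of a predictor is 1 / (1 - R^2), where R^2 comes from regressing that predictor on the other predictors, so with only two predictors it can be checked by hand in base R. The sketch below uses simulated stand-ins for the two variables (the data are made up) and ignores the panel structure.

```r
set.seed(3)
n <- 100
cost_to_start <- rnorm(n)
# Hypothetical second predictor, deliberately correlated with the first
capital_requirements <- 0.6 * cost_to_start + rnorm(n, sd = 0.8)

# Auxiliary regression of one predictor on the other
r2 <- summary(lm(cost_to_start ~ capital_requirements))$r.squared
vif_manual <- 1 / (1 - r2)
vif_manual

# With exactly two predictors this equals 1 / (1 - cor^2)
vif_from_cor <- 1 / (1 - cor(cost_to_start, capital_requirements)^2)
vif_from_cor
```

With two predictors both variables share the same VIF, so a single number summarises the collinearity between them.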

Stata's xtlogit (fe, re) equivalent in R?

Stata allows for fixed effects and random effects specification of the logistic regression through the xtlogit fe and xtlogit re commands accordingly. I was wondering what are the equivalent commands for these specifications in R.
The only similar specification I am aware of is the mixed effects logistic regression
mymixedlogit <- glmer(y ~ x1 + x2 + x3 + (1 | x4), data = d, family = binomial)
but I am not sure whether this maps to any of the aforementioned commands.
The glmer command is used to quickly fit logistic regression models with varying intercepts and varying slopes (or, equivalently, a mixed model with fixed and random effects).
To fit a varying intercept multilevel logistic regression model in R (that is, a random effects logistic regression model), you can run the following using the in-built "mtcars" data set:
data(mtcars)
head(mtcars)
m <- glmer(mtcars$am ~ 1 + mtcars$wt + (1|mtcars$gear), family="binomial")
summary(m)
# and you can examine the fixed and random effects
fixef(m); ranef(m)
To fit the corresponding varying-intercept model in Stata, you can use the xtlogit command (using the similar but not identical in-built "auto" data set in Stata):
sysuse auto
xtset gear_ratio
xtlogit foreign weight, re
I'll add that I find the entire reference to "fixed" versus "random" effects ambiguous, and I prefer to refer to the structure of the model itself (e.g., are the intercepts varying? which slopes are varying, if any? is the model nested in 2 levels or more? are the levels cross-classified or not?). For a similar view, see Andrew Gelman's thoughts on "fixed" versus "random" effects.
Update: Ben Bolker's excellent comment below points out that in R it's more informative when using predict commands to use the data=mtcars option instead of, say, the dollar notation:
data(mtcars)
m1 <- glmer(mtcars$am ~ 1 + mtcars$wt + (1|mtcars$gear), family="binomial")
m2 <- glmer(am ~ 1 + wt + (1|gear), family="binomial", data=mtcars)
p1 <- predict(m1); p2 <- predict(m2)
names(p1) # not that informative...
names(p2) # very informative!
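The examples above cover the random effects case. For the fixed effects side, Stata's xtlogit with the fe option fits a conditional logit, and one candidate counterpart in R is clogit() from the survival package, which conditions the group intercepts out of the likelihood. The sketch below runs on simulated data (the variable names and group sizes are assumptions) and is not a checked line-by-line match for Stata's output.

```r
library(survival)  # recommended package, included with most R installations

set.seed(11)
n_groups  <- 60
per_group <- 6
id <- rep(seq_len(n_groups), each = per_group)
x  <- rnorm(n_groups * per_group)
# Group-specific intercepts plus a true slope of 1 on the logit scale
alpha <- rnorm(n_groups)[id]
y <- rbinom(length(x), size = 1, prob = plogis(alpha + 1 * x))
sim <- data.frame(y = y, x = x, id = id)

# Conditional (fixed effects) logit: group intercepts are conditioned out
fe_fit <- clogit(y ~ x + strata(id), data = sim)
coef(fe_fit)
```

Groups in which the outcome never varies contribute nothing to the conditional likelihood, so the estimate is driven only by groups containing both 0s and 1s.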
