I want to fit a fractional logit model with the command:
glmfit <- glm(tr1 ~ period + male + stib + income,
              family = quasibinomial(link = "logit"), data = mydata)
Here tr1 is a variable that lies between zero and one (including some zeros).
I now want to choose the model with the smallest QAIC value (i.e., fit the possible combinations of the independent variables and compare the resulting QAIC values). To do that, I tried to apply the glmulti function in R:
require("glmulti")
glmulti.out <- glmulti(tr1 ~ period + male + stib + income,
                       data = mydata, crit = "qaic",
                       confsetsize = 5, fitfunction = "glm",
                       family = quasibinomial(link = "logit"))
However, I constantly get the following error and I can't see why:
Error in lesCrit[sel] = cricri : replacement has length zero
Does anyone know how I could overcome this problem?
For me this worked:
library(bbmle)
# dispersion estimated from the fitted model itself (the same Pearson-based estimate that summary.glm() uses)
qaicmod = function(fit)
  qAIC(fit, dispersion = with(fit, sum((weights * residuals^2)[weights > 0]) / df.residual))
glmulti.out <- glmulti(tr1 ~ period + male + stib + income,
                       data = mydata, crit = "qaicmod",
                       confsetsize = 5, fitfunction = "glm",
                       family = binomial(link = "logit"))
This uses a regular binomial GLM but calculates the QAIC based on the estimated dispersion coefficient.
In the dispersion argument of the qaicmod function you could also plug in the estimated dispersion of a full quasibinomial GLM with all variables included (some statisticians I have seen recommend this), i.e. use instead
disp <<- summary(fullmodel)$dispersion
qaicmod = function (fit) qAIC(fit, dispersion=disp)
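Here fullmodel stands for the quasibinomial fit containing all candidate predictors; a minimal sketch of what that could look like, assuming the same mydata and variables as in the question:
fullmodel <- glm(tr1 ~ period + male + stib + income,
                 family = quasibinomial(link = "logit"), data = mydata)
summary(fullmodel)$dispersion   # the dispersion estimate that gets assigned to disp above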
Finally, I also tried using
library(MuMIn)
x.quasibinomial <<- function(...) {
res <- quasibinomial(...)
res$aic <- binomial(...)$aic
res
}
qaicmod <<- function (fit) QAIC(update(fit, family = x.quasibinomial), chat = deviance(fit) / df.residual(fit))
glmulti.out <- glmulti(tr1 ~ period + male + stib + income,
                       data = mydata, crit = "qaicmod",
                       confsetsize = 5, fitfunction = "glm",
                       family = binomial(link = "logit"))
but that returns the error "Error in eval(expr, envir, enclos) : could not find function "fitfunc"", and I am not sure how to fix that...
(The idea would be that this solution would properly refit the model as a quasibinomial GLM and then return the QAIC from that)
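One way to narrow down where the fitfunc error comes from might be to check that the criterion itself works on a single fit outside glmulti; a small sketch, using the same formula and data as in the question:
fit <- glm(tr1 ~ period + male + stib + income,
           family = binomial(link = "logit"), data = mydata)
qaicmod(fit)   # should return a single QAIC value if the criterion function itself is fine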
The first solution above should be OK, though, I think...
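Once a glmulti run completes, the confidence set can be inspected; a brief sketch (weightable() and the bestmodel element are standard glmulti outputs, assuming the run above succeeded):
print(glmulti.out)                # overview of the search
weightable(glmulti.out)           # candidate models with their IC values and weights
summary(glmulti.out)$bestmodel    # formula of the best-ranked model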
I want to fit a linear mixed model (because I have repeated measures over time).
LM.fit <- lme(VR ~ Group * Time + Age + sexe + study,
random = ~1|subject,
method = "REML",
na.action = na.omit,
data = Data)
I checked the assumptions (which seem OK, I think).
BUT I have unbalanced classes in Group: Subgroup1 has 80 samples and Subgroup2 has 20 samples.
LM.fit <- lme(VR ~ Group * Time + Age + sexe + study,
random = ~1|subject,
weights = varIdent(form = ~1|Group),
method = "REML",
na.action = na.omit,
data = Data)
I heard that adding the line weights = varIdent(form = ~1|Group) can help account for unequal variances across groups and unbalanced classes. Can someone explain why and how?
If you have other methods, I am open to suggestions.
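In case it helps, here is a sketch of how I thought the two variance models could be compared (assuming both fits use method = "REML" with the same fixed effects and data, so the comparison of variance structures is valid):
m0 <- lme(VR ~ Group * Time + Age + sexe + study,
          random = ~1|subject, method = "REML",
          na.action = na.omit, data = Data)
m1 <- update(m0, weights = varIdent(form = ~1|Group))
anova(m0, m1)                 # likelihood-ratio / AIC comparison of the two variance models
m1$modelStruct$varStruct      # estimated relative standard deviations per Group level (slot path assumed)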
I am trying to fit the generalized linear model defined below:
Note that both the response variable Var1 and the regressor Var2 contain zero values, so a constant has been added to avoid problems when applying the log.
model = glm(Var1+2 ~ log(Var2+2) + offset(log(Var3/Var4)),
family = gaussian(link = "log"), data = data2)
However, I get an error when producing the diagnostic half-normal plot with the hnp function:
library(hnp)
hnp(model)
Gaussian model (glm object)
Error in eval(family$initialize) :
cannot find valid starting values: please specify some
To get around this, I tried the manual implementation of hnp and then built the plot from that, but the error message is still present.
dfun <- function(obj) resid(obj)
sfun <- function(n, obj) simulate(obj)[[1]]
ffun <- function(resp) glm(resp ~ log(Var2+2) + offset(log(Var3/Var4)),
family = gaussian(link = "log"), data = data2)
hnp(model, newclass = TRUE, diagfun = dfun, simfun = sfun, fitfun = ffun)
Error in eval(family$initialize) :
cannot find valid starting values: please specify some
Following some guidelines I found, I tried supplying starting values to initialize the estimation algorithm, both for the linear predictor and for the means, but this was not enough to solve the problem; see the routine below:
fit = lm(Var1+2 ~ log(Var2+2) + offset(log(Var3/Var4)), data=data2)
coefficients(fit)
(Intercept) log(Var2+2)
32.961103 -8.283306
model = glm(Var1+2 ~ log(Var2+2) + offset(log(Var3/Var4)),
family = gaussian(link = "log"), start = c(32.96, -8.28), data = data2)
hnp(model)
Error in eval(family$initialize) :
cannot find valid starting values: please specify some
Note that the error persists even when implementing the half-normal plot manually.
dfun <- function(obj) resid(obj)
sfun <- function(n, obj) simulate(obj)[[1]]
ffun <- function(resp) glm(resp ~ log(Var2+2) + offset(log(Var3/Var4)),
family = gaussian(link = "log"), data = data2, start = c(32.96, -8.28))
hnp(model, newclass = TRUE, diagfun = dfun, simfun = sfun, fitfun = ffun)
Error in eval(family$initialize) :
cannot find valid starting values: please specify some
I also tried refitting the model after removing the zeros from the data set, but that did not solve the problem either; the error persists.
I suspect what you meant to fit is a log-transformed response variable against your predictors. You can read up on the difference between a log-link GLM and a log-transformed response: essentially, with a log link you model the log of the mean and the errors stay additive on the original scale, whereas with a log-transformed response the errors are multiplicative on the original scale. I am not so familiar with hnp, but my guess is that there are problems simulating the response variable.
If I run your regression like this using the data provided, it looks OK:
data2$Y = with(data2, log((Var1 + 2) / (Var3/Var4)))
model = glm(Y ~ log(Var2+2), data = data2)
hnp(model)
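An equivalent way to write this keeps the offset explicit, which makes it easier to compare with the original specification (a sketch on the same data2; the default gaussian family with identity link makes it the same log-transformed-response model):
model2 <- glm(log(Var1 + 2) ~ log(Var2 + 2) + offset(log(Var3/Var4)),
              data = data2)   # default family: gaussian(link = "identity")
hnp(model2)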
I am doing this for econometrics purposes and want to test the significance of all coefficients; it would just be better for my calculations. I have tried regressing the variable on itself, and it returned a result with only an intercept, but I am not sure whether this is correct.
myprobit0 <- glm(target ~ target,
data = data_numeric, family = binomial(link = "probit") )
Could you please suggest a proper way to only estimate the model with a constant?
vecs <- rep(1, length(target))  # give the length of your data as the second argument
myprobit0 <- glm(target ~ vecs,
                 data = data_numeric, family = binomial(link = "probit"))
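Equivalently, an intercept-only model can be requested directly with the formula target ~ 1, which avoids creating a helper vector of ones; a brief sketch with the same data:
myprobit0 <- glm(target ~ 1,
                 data = data_numeric, family = binomial(link = "probit"))
summary(myprobit0)   # only the constant (intercept) is estimated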
I am dealing with a very hard-to-work-with data set: fish larval density. It is semicontinuous data, with 90% zeros and a right-skewed distribution with a few very large values. I would like, for example, to make some predictions relating environmental features to larval density. I am trying to use a two-part model (GLMMadaptive for semicontinuous data) with family = hurdle.lognormal().
But the summary command does not work for me with models fitted with mixed_model() and family = hurdle.lognormal(), so I don't know how to get standard errors, p-values and confidence intervals for my predictors.
Another question relates to goodness of fit for the residuals. How can I check it?
Also, I tried to fit a null model without fixed effects, to test overall model significance, but I couldn't get it to run because it gives me the following message:
Error in .subset2(x, i, exact = exact) : subscript out of bounds
Nullmodel <- mixed_model(fixed = Dprochilodus ~ 1, random = ~ 1 | periodo,
                         data = OeL_final, family = hurdle.lognormal(),
                         max_coef_value = 30)
mymodel <- mixed_model(fixed = Dprochilodus ~ ponto + Dif_his.y + temp,
                       random = ~ 1 | periodo, data = OeL_final,
                       family = hurdle.lognormal(), n_phis = 1,
                       zi_fixed = ~ ponto, max_coef_value = 30)
The results of my model are:
Call:
mixed_model(fixed = logDprochilodus ~ ponto + Dif_his.y + temp,
    random = ~1 | periodo, data = OeL_final, family = hurdle.lognormal(),
    zi_fixed = ~ponto, n_phis = 1, max_coef_value = 30)

Model:
 family: hurdle log-normal
 link: identity

Random effects covariance matrix:
                StdDev
(Intercept) 0.05366623

Fixed effects:
  (Intercept)       pontoIR      pontoITA      pontoJEQ       pontoTB     Dif_his.y          temp
 3.781147e-01 -1.161167e-09  3.660306e-01 -1.273341e+00 -5.834588e-01  1.374241e+00 -4.010771e-02

Zero-part coefficients:
(Intercept)     pontoIR    pontoITA    pontoJEQ     pontoTB
  1.4522523  21.3761790   3.3013379   1.1504374   0.2031707

Residual std. dev.:
1.240212

log-Lik: -216.3266
Has anyone worked with this kind of model? I would really appreciate any help!
The summary() method should work with family = hurdle.lognormal(). For example, you can call summary() in the example posted here.
To check the goodness of fit, you could use the simulated scaled residuals provided by the DHARMa package; for an example, check here.
If you are working in the RStudio console, you may need to call print() on the result, e.g. print(summary(mymodel)).
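A minimal sketch of what that could look like on the fitted object from the question (the coef_table slot names are my assumption about what the summary object contains):
summ <- summary(mymodel)    # estimates, std. errors, z-values, p-values
summ$coef_table             # table for the continuous (log-normal) part -- slot name assumed
summ$coef_table_zi          # table for the zero part -- slot name assumed
confint(mymodel)            # confidence intervals (for the fixed effects by default)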
I followed this example for running a piecewise mixed model using lmer, and it works very well. However, I am having trouble translating the model to lme because I need to deal with heteroscedasticity, and lmer doesn’t have that ability.
Code to reproduce the problem is here. I included details about the experimental design in the code if you think it’s necessary to answer the question.
Here is the model without the breakpoint:
linear <- lmer(mass ~ lat + (1 | pop/line), data = df)
And here is how I run it with the breakpoint:
bp = 30
b1 <- function(x, bp) ifelse(x < bp, x, 0)
b2 <- function(x, bp) ifelse(x < bp, 0, x)
breakpoint <- lmer(mass ~ b1(lat, bp) + b2(lat, bp) + (1 | pop/line), data = df)
The problem is that I have pretty severe heteroscedasticity. As far as I understand, that means I should be using lme from the nlme package. Here is the linear model in lme:
ctrl <- lmeControl(opt='optim')
linear2 <- lme(mass ~ lat , random=~1|pop/line, na.action = na.exclude, data=df, control = ctrl, weights=varIdent(form=~1|pop))
And this is the breakpoint model that is, well, breaking:
breakpoint2 <- lme(mass ~ b1(lat, bp) + b2(lat, bp), random=~1|pop/line, na.action = na.exclude, data=df, control = ctrl, weights=varIdent(form=~1|pop))
Here is the error message:
Error in model.frame.default(formula = ~pop + mass + lat + bp + line, : variable lengths differ (found for 'bp')
How can I translate this lovely breakpoint model from lmer to lme? Thank you!
It looks like lme doesn't like it when the formula uses variables that aren't columns of the data.frame you are fitting the model on: it puts every variable named in the formula into the model frame, and the length-one bp then clashes with the data columns. One option would be to build your formula first, with the breakpoint value substituted in, and then pass it to lme. For example:
myform <- eval(substitute(mass ~ b1(lat, bp) + b2(lat, bp), list(bp=bp)))
breakpoint2 <- lme(myform, random=~1|pop/line, na.action = na.exclude, data=df, control = ctrl, weights=varIdent(form=~1|pop))
The eval()/substitute() call is just there to swap out bp in the formula for the value of the variable bp.
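To see the effect, printing myform should show the breakpoint value baked into the formula rather than the symbol bp:
myform
# mass ~ b1(lat, 30) + b2(lat, 30)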
Or if bp were always 30, you would just put that directly in the formula
breakpoint2 <- lme(mass ~ b1(lat, 30) + b2(lat, 30), random=~1|pop/line, na.action = na.exclude, data=df, control = ctrl, weights=varIdent(form=~1|pop))
and that would work as well.
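After that, the usual nlme extractors apply; a brief usage sketch:
summary(breakpoint2)                        # fixed effects, variance function, random effects
intervals(breakpoint2, which = "var-cov")   # includes the varIdent variance-ratio estimates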