How to fit Conway–Maxwell-Poisson regression in R?

I want to fit a Conway–Maxwell-Poisson regression with one response and two randomly generated covariates in R. How do I fit it?
library(COMPoissonReg)
n1 = 200
x1 = rnorm(n1, 0, 1)
x2 = rnorm(n1, 0, 1)
b0 = 0.05; b1 = 0.0025; b2 = 0.005; b7 = 0.0001
y  = b0 + (b1 * x1) + (b2 * x2)
y1 = exp(y)
nu = exp(y)
y2 = rcmp(n1, y1, nu)
model = glm.cmp(y1 ~ x1 + x2, formula.nu = x1 + x2)
I get the following error:
Error in formula.default(object, env = baseenv()) : invalid formula
Could you guide me on how to fit this model?

For the data above, you did not simulate the dispersion parameter, so you don't need to specify a formula for nu. The call below is syntactically fine, but your dependent variable is not an integer, so it still throws an error:
glm.cmp(y1 ~ x1+x2)
You can instead simulate count data as below; the negative binomial draws introduce overdispersion, which you will see reflected in the estimate for nu:
n1 = 200
x1 = rnorm(n1, 0, 1)
x2 = rnorm(n1, 0, 1)
b0 = 0.05; b1 = 3; b2 = 2
mu = exp(b0 + (b1 * x1) + (b2 * x2))
y  = rnbinom(length(mu), mu = mu, size = 1)
df = data.frame(y, x1, x2)
glm.cmp(y ~ x1 + x2, data = df)
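If you do want to model nu with covariates, note that formula.nu expects a one-sided formula; the missing ~ is what triggered the "invalid formula" error in the original call. A minimal sketch, assuming the simulated df above:
# formula.nu must be a one-sided formula (note the ~)
fit <- glm.cmp(y ~ x1 + x2, formula.nu = ~ x1 + x2, data = df)
print(fit)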

Related

Trouble with a GAM model in R

I am trying to run the following code on R:
m <- gam(Flp_pop ~ s(Flp_CO, bs = "cr", k = 30), data = data, family = poisson, method = "REML")
My dataset is like this:
(screenshot of the dataset not reproduced here)
But when I try to execute, I get this error message:
"Error in if (abs(old.score - score) > score.scale * conv.tol) { :
missing value where TRUE/FALSE needed
In addition: There were 50 or more warnings (use warnings() to see the first 50)"
I am very new to R, maybe it is a very basic question. But does anyone know why this is happening?
Thanks!
The Poisson distribution has support on the non-negative integers and you are passing a continuous variable as the response. Here's an example with simulated data
library("mgcv")
library("gratia")
library("dplyr")
df <- data_sim("eg1", seed = 2) %>% # simulate Gaussian response
  mutate(yabs = abs(y))             # make y non-negative
mp <- gam(yabs ~ s(x2, bs = "cr"), data = df,
          family = poisson, method = "REML")
# fails
which reproduces the error you saw
Error in if (abs(old.score - score) > score.scale * conv.tol) { :
missing value where TRUE/FALSE needed
In addition: There were 50 or more warnings (use warnings() to see the first 50)
The warnings are of the form:
$> warnings()[1]
Warning message:
In dpois(y, y, log = TRUE) : non-integer x = 7.384012
This indicates the problem: the model evaluates the Poisson probability mass for your response data given the estimated model, and evaluating it at the indicated non-integer value returns zero mass plus the warning.
If we'd passed the original Gaussian variable as the response, which includes negative values, the function would have errored out earlier:
mp <- gam(y ~ s(x2, bs = "cr"), data = df,
          family = poisson, method = "REML")
which raises this error:
r$> mp <- gam(y ~ s(x2, bs = "cr"), data = df,
family = poisson, method = "REML")
Error in eval(family$initialize) :
negative values not allowed for the 'Poisson' family
An immediate, but not necessarily advisable, solution is to use the quasipoisson family:
mq <- gam(yabs ~ s(x2, bs = "cr"), data = df,
          family = quasipoisson, method = "REML")
which uses the same mean-variance relationship as the Poisson distribution but not the actual distribution, so we can get away with abusing it.
Better would be to ask yourself why you are trying to fit a model that is ostensibly for counts to a response that is a continuous (non-negative) variable.
If the answer is that you had a count but then normalised it in some way (say by dividing by some measure of effort, like area surveyed or length of observation time), then you should add an offset of the form + offset(log(effort_var)) to the model formula and use the original non-normalised integer count as the response.
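A minimal sketch of that offset approach, where raw_count and effort_var are hypothetical column names standing in for the unnormalised count and the effort measure:
# hypothetical columns: raw_count (integer count) and effort_var (e.g. area surveyed)
m_off <- gam(raw_count ~ s(Flp_CO, bs = "cr", k = 30) + offset(log(effort_var)),
             data = data, family = poisson, method = "REML")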
If you really have a continuous response and the Poisson was an oversight, try fitting with family = Gamma(link = "log") or family = tw().
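For example, with the simulated yabs from above, a Tweedie fit would look like this (a sketch only; tw() estimates the power parameter and copes with zeros, which the Gamma family does not):
mtw <- gam(yabs ~ s(x2, bs = "cr"), data = df,
           family = tw(), method = "REML")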
If it's something else, you should edit your question to include that information; perhaps we can help here, or the question could be migrated to CrossValidated if the issue is more statistical in nature.

GLMMadaptive for semi-continuous data

I am dealing with a very hard-to-work-with data set: fish larval density. It is semicontinuous data, with 90% zeros and a right-skewed distribution with a few very large values. I would like, for example, to make some predictions about environmental features and larval density. I am trying to use a two-part model (GLMMadaptive for semicontinuous data) with family = hurdle.lognormal().
But the summary() command does not work with models fitted with mixed_model(..., family = hurdle.lognormal()), so I don't know how to get standard errors, p-values, and confidence intervals for my predictors.
Another question relates to goodness of fit for the residuals. How can I assess it?
Also, I tried to fit a null model, without fixed effects, to look at model significance, but I couldn't fit it because it gives me the following message:
Error in .subset2(x, i, exact = exact) : subscript out of bounds
Nullmodel <- mixed_model(fixed = Dprochilodus ~ 1, random = ~ 1 | periodo, data = OeL_final,
                         family = hurdle.lognormal(), max_coef_value = 30)
mymodel <- mixed_model(fixed = Dprochilodus ~ ponto + Dif_his.y + temp, random = ~ 1 | periodo,
                       data = OeL_final, family = hurdle.lognormal(), n_phis = 1,
                       zi_fixed = ~ ponto, max_coef_value = 30)
The results of my model are:
Call: mixed_model(fixed = logDprochilodus ~ ponto + Dif_his.y + temp,
random = ~1 | periodo, data = OeL_final, family = hurdle.lognormal(),
zi_fixed = ~ponto, n_phis = 1, max_coef_value = 30)
Model: family: hurdle log-normal, link: identity
Random effects covariance matrix:
                StdDev
(Intercept) 0.05366623
Fixed effects:
  (Intercept)       pontoIR      pontoITA      pontoJEQ       pontoTB     Dif_his.y          temp
 3.781147e-01 -1.161167e-09  3.660306e-01 -1.273341e+00 -5.834588e-01  1.374241e+00 -4.010771e-02
Zero-part coefficients:
(Intercept)     pontoIR    pontoITA    pontoJEQ     pontoTB
  1.4522523  21.3761790   3.3013379   1.1504374   0.2031707
Residual std. dev.: 1.240212
log-Lik: -216.3266
Has anyone worked with this kind of model? I really appreciate any help!
The summary() method should work with family = hurdle.lognormal(). For example, you can call summary() in the example posted here.
To check the goodness of fit, you could use the simulated scaled residuals provided by the DHARMa package; for an example, check here.
If you are working in the RStudio console, you may need to wrap the call in print(), e.g. print(summary(mymodel)).
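A minimal sketch of pulling inference out of the fitted object, assuming mymodel from above fitted successfully (confint() on a MixMod object should give Wald-type intervals for the fixed effects):
library(GLMMadaptive)
summ <- summary(mymodel)   # coefficient table with std. errors and p-values
print(summ)
confint(mymodel)           # confidence intervals for the fixed effects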

Syntax for glmer function for use with glmulti?

Using glmer, I can run a logistic regression mixed model just fine. But when I try to do the same using glmulti, I get errors (described below). I think the problem is with the function I am specifying for use in glmulti. I want a function that specifies a logistic regression model for data containing continuous fixed covariates and categorical random effects, using a logit link. The response variable is a binary 0/1.
Sample data:
library(lme4)
library(rJava)
library(glmulti)
set.seed(666)
x1 = rnorm(1000) # some continuous variables
x2 = rnorm(1000)
x3 = rnorm(1000)
r1 = rep(c("red", "blue"), times = 500) #categorical random effects
r2 = rep(c("big", "small"), times = 500)
z = 1 + 2*x1 + 3*x2 +2*x3
pr = 1/(1+exp(-z))
y = rbinom(1000,1,pr) # bernoulli response variable
df = data.frame(y=y,x1=x1,x2=x2, x3=x3, r1=r1, r2=r2)
A single glmer logistic regression works just fine:
model1<-glmer(y~x1+x2+x3+(1|r1)+(1|r2),data=df,family="binomial")
But errors occur when I try to use the same model structure through glmulti:
# create a function - I think this is where my problem is
glmer.glmulti <- function(formula, data, family = binomial(link = "logit"), random = "", ...) {
  glmer(paste(deparse(formula), random), data = data, ...)
}
# run glmulti models
glmulti.logregmixed <-
  glmulti(formula(glmer(y ~ x1 + x2 + x3 + (1|r1) + (1|r2), data = df), fixed.only = TRUE), # error w/o fixed.only=TRUE
          data = df,
          level = 2,
          method = "g",
          crit = "aicc",
          confsetsize = 128,
          plotty = F, report = F,
          fitfunc = glmer.glmulti,
          family = binomial(link = "logit"),
          random = "+(1|r1)", "+(1|r2)", # possibly this line is incorrect?
          intercept = TRUE)
#Errors returned:
singular fit
Error in glmulti(formula(glmer(y ~ x1 + x2 + x3 + (1 | r1) + (1 | r2), :
Improper call of glmulti.
In addition: Warning message:
In glmer(y ~ x1 + x2 + x3 + (1 | r1) + (1 | r2), data = df) :
calling glmer() with family=gaussian (identity link) as a shortcut to lmer() is deprecated; please call lmer() directly
I've tried various changes to the function and to the formula and fitfunc portions of the glmulti call. I've tried substituting lmer for glmer, and I guess I don't understand the error. I'm also afraid that calling lmer may change the model structure: during one of my attempts, summary() of the model stated "Linear mixed model fit by REML ['lmerMod']". I need the glmulti models to be the same as what I'm obtaining with model1 using glmer (i.e. summary(model1) gives "Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']").
Many similar questions remain unanswered. Thanks in advance!
Credit:
sample data set created with help from here:
https://stats.stackexchange.com/questions/46523/how-to-simulate-artificial-data-for-logistic-regression
glmulti code adapted from here:
Model selection using glmulti
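One detail visible in the warning above: the wrapper accepts a family argument but never passes it on to glmer(), so glmer() falls back to gaussian and suggests lmer(). A hedged sketch of a wrapper that hard-codes the binomial family and passes the random part as a single string (not a verified fix for the "Improper call of glmulti" error):
glmer.glmulti <- function(formula, data, random = "", ...) {
  # paste the fixed part supplied by glmulti together with the random part
  # and fit with an explicit binomial family so glmer() is not run as lmer()
  glmer(as.formula(paste(deparse(formula), random)),
        data = data, family = binomial(link = "logit"), ...)
}
# the random effects go in as one string, not two separate arguments;
# main effects only (level = 1) to keep the sketch small
glmulti.logregmixed <- glmulti(y ~ x1 + x2 + x3, data = df,
                               level = 1, method = "h", crit = "aicc",
                               confsetsize = 128, plotty = FALSE, report = FALSE,
                               fitfunc = glmer.glmulti,
                               random = "+(1|r1)+(1|r2)")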

Error in gamma link specification for lmer

I'm trying to fit a mixed-effects model with a gamma distribution. The most basic model has one fixed predictor and one random effect. No matter which link I specify (I've tried log, identity, and inverse), I get the following error. My real data has zeros in Y, but even when I use simulated data with only positive Y, as below, it throws the same error.
mockdf = data.frame(y = rnorm(100, 77, 6.5),
                    x1 = sample(letters, 100, replace = T),
                    x2 = seq(1900, 1999, 1))
mod = lmer(y ~ (1|x1) + x2, family = gamma(link = 'identity'),
           na.action = na.exclude, data = mockdf)
Error in gamma(link = "identity") :
supplied argument name 'link' does not match 'x'
I searched through SO and couldn't find another person who ran into this error. Is my syntax incorrect?
Thanks for your help.
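For what it's worth, the error itself comes from gamma() being base R's mathematical gamma function (whose only argument is x) rather than the GLM family, which is Gamma() with a capital G; and in current lme4 a family is passed to glmer(), not lmer(). A minimal sketch along those lines (the log link and scaled predictor are choices made here to help convergence, not taken from the question):
library(lme4)
mockdf <- data.frame(y  = rnorm(100, 77, 6.5),
                     x1 = sample(letters, 100, replace = TRUE),
                     x2 = seq(1900, 1999, 1))
# Gamma (capital G) is the GLM family; glmer(), not lmer(), accepts a family.
# The Gamma family also needs strictly positive responses, so zeros in the
# real data would still require a different approach (e.g. a hurdle model).
mod <- glmer(y ~ scale(x2) + (1 | x1),
             family = Gamma(link = "log"),
             na.action = na.exclude, data = mockdf)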

Custom nonlinear function in R's gnm

I'm trying to build a nonlinear model as described in the manual for gnm in R. The desired form of the model is
y = b0*x0^g0 + b1*x1^g1 + ...
This would seem to be the simplest possible form of nonlinear model to me, but for some reason (and please correct me if I'm wrong!) I have to write a custom nonlinear function to fit it in R. Very well!
df = read.csv("d:/mydataframe.csv")
require(gnm)
mypower <- function(x) {
  list(predictors = list(beta = 1, gamma = 1),
       variables = list(substitute(x)),
       term = function(predlabels, varlabels) {
         paste(predlabels[1], "*(", varlabels[1], "**", predlabels[2], ")")
       })
}
class(mypower) <- "nonlin"
Now when I try
fit <- gnm(formula=y ~ mypower(x1), data=df)
I get a fitted value of beta and gamma from the model. But when I try
fit <- gnm(formula=y ~ mypower(x1)+mypower(x2), data=df)
I get the error
Algorithm failed - no model could be estimated.
So, question 1: how can I solve this?
Also, when trying to match all the xs, I try
fit <- gnm(formula=PedalCycles ~ mypower(.), data=df)
I get
Error in eval(expr, envir, enclos) : object '.' not found
Is this the right way to specify a sum of all xs each raised to a power?
To estimate y = b_0*x_0^g_0 you could use gnm's built-in Exp() to estimate
Exp( 1 + I( log(x_0) ) )
This gives you coefficients:
b'_0 for the intercept
g'_0 for log(x_0)
Hence g'_0 is your desired g_0 (since e^(g'_0 * log(x_0)) = x_0^(g'_0)) and e^(b'_0) is b_0. Your model is then a sum of such terms.
Caveat: This will not work if x_0 assumes non-positive values in your data set.
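Putting that together, a sketch of what the full call might look like, assuming a data frame df with response y and strictly positive predictors x1 and x2 (whether the overall intercept needs to be dropped for identifiability is an assumption to check on your data):
library(gnm)
# each Exp() term is exp(b'_i + g'_i * log(x_i)) = exp(b'_i) * x_i^g'_i,
# so b_i = exp(b'_i) and g_i = g'_i
fit <- gnm(y ~ Exp(1 + I(log(x1))) + Exp(1 + I(log(x2))), data = df)
coef(fit)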
