I am running the following regression:
Model <- glm(emp ~ industry + nat_status + region + state + age + educ7 + religion + caste,
family=binomial(link="logit"), data=IHDS)
However, when I use the margins command, I get the following warnings:
There were 50 or more warnings (use warnings() to see the first 50)
Warning messages:
1: In predict.lm(object, newdata, se.fit, scale = 1, type = if (type == ... :
  prediction from a rank-deficient fit may be misleading
Based on this warning, I suspect that some of the predictors are collinear. However, I do not know how to find out which ones are, or how to deal with it. (I have tried adding each control individually.)
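One way to locate the problem is to look for aliased terms. In a glm, a coefficient that cannot be estimated because of exact collinearity is reported as NA, and `alias()` shows which other columns it is a linear combination of. The sketch below runs on hypothetical simulated data: the names `dat` and `fit`, and the assumption that every `state` lies within a single `region`, are illustrative and are not taken from IHDS. If your states really are nested in regions, keeping only one of the two terms would be a plausible fix.

```r
# Hypothetical data: each state belongs to exactly one region, so the
# region dummies are a linear combination of the state dummies.
set.seed(1)
n      <- 400
state  <- factor(sample(c("A", "B", "C", "D"), n, replace = TRUE))
region <- factor(ifelse(state %in% c("A", "B"), "North", "South"))
age    <- rnorm(n, 40, 10)
emp    <- rbinom(n, 1, plogis(-2 + 0.05 * age))
dat    <- data.frame(emp, region, state, age)

fit <- glm(emp ~ region + state + age, family = binomial(link = "logit"), data = dat)

# Coefficients that could not be estimated come back as NA
aliased <- names(coef(fit))[is.na(coef(fit))]
print(aliased)

# alias() reports which columns each aliased term depends on
print(alias(fit))

# Dropping one of the nested terms removes the rank deficiency
fit2 <- update(fit, . ~ . - region)
print(any(is.na(coef(fit2))))
```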
I ran a linear mixed-effects model looking at the effect of Stress and Lifestyle (HLI) on cognitive change over Time using the lme4 package in R...
mod <- lmer(`3MS` ~ age + sex + edu + Stress*Time*HLI + (1|ID), data=dflong, na.action = na.omit)
I'm most interested in decomposing the 3-way interaction between Stress*Lifestyle*Time. Specifically, I want to get the interaction contrasts to look at the conditional effects of Stress*Time at -1SD, mean, and +1SD of HLI. To do this, I am using the interactions package to decompose the interaction...
sim_slopes(
  model = mod,
  pred = Stress*Time,
  modx = HLI,
  data = dflong)
But I'm receiving the following error
Error in as(x, "matrix")[i = i, j = j, drop = drop] :
subscript out of bounds
In addition: Warning message:
Stress * Time and HLI are not included in an interaction with one another in
the model.
I'm not sure why I'm getting this error or how to fix it. Is there another package that can do what I need in a better way? I'm not familiar with any, though.
Thanks so much in advance!!
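The warning suggests that `sim_slopes()` expects a single variable in `pred`. As far as I know, the interactions package documents a separate `mod2` argument for a second moderator, but check `?sim_slopes` to confirm. The quantity you are after can also be computed by hand. The Stress:Time interaction at a given HLI value is b[Stress:Time] + b[Stress:Time:HLI] × HLI, and its standard error follows from the coefficient covariance matrix. The sketch below uses hypothetical simulated data and `lm()` so that it runs with base R alone. For the lmer model, the same arithmetic should apply to `lme4::fixef(mod)` and `as.matrix(vcov(mod))`.

```r
# Hypothetical simulated data standing in for dflong (no ID grouping, for brevity)
set.seed(42)
n <- 500
d <- data.frame(Stress = rnorm(n),
                Time   = rep(0:4, length.out = n),
                HLI    = rnorm(n))
d$y <- with(d, 1 + 0.3 * Stress - 0.2 * Time + 0.1 * HLI +
               0.15 * Stress * Time - 0.1 * Stress * Time * HLI + rnorm(n))

# Fixed-effects analogue of the model in the question
fit <- lm(y ~ Stress * Time * HLI, data = d)
b <- coef(fit)   # with lmer: b <- lme4::fixef(mod)
V <- vcov(fit)   # with lmer: V <- as.matrix(vcov(mod))

# HLI at -1 SD, mean and +1 SD
hli_vals <- mean(d$HLI) + c(-1, 0, 1) * sd(d$HLI)
names(hli_vals) <- c("-1 SD", "Mean", "+1 SD")

# Conditional Stress:Time interaction = b[Stress:Time] + b[Stress:Time:HLI] * HLI
terms_used <- c("Stress:Time", "Stress:Time:HLI")
L   <- cbind(1, hli_vals)                    # one row per HLI value
est <- drop(L %*% b[terms_used])
se  <- sqrt(diag(L %*% V[terms_used, terms_used] %*% t(L)))

print(cbind(estimate = est, se = se, z = est / se))
```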
I am trying to run the following code in R:
m <- gam(Flp_pop ~ s(Flp_CO, bs = "cr", k = 30), data = data, family = poisson, method = "REML")
My dataset is like this: (screenshot of the data not available)
But when I try to execute, I get this error message:
Error in if (abs(old.score - score) > score.scale * conv.tol) { :
  missing value where TRUE/FALSE needed
In addition: There were 50 or more warnings (use warnings() to see the first 50)
I am very new to R, so maybe this is a very basic question, but does anyone know why this is happening?
Thanks!
The Poisson distribution has support on the non-negative integers, and you are passing a continuous variable as the response. Here's an example with simulated data:
library("mgcv")
library("gratia")
library("dplyr")
df <- data_sim("eg1", seed = 2) %>% # simulate Gaussian response
mutate(yabs = abs(y)) # make y non negative
mp <- gam(yabs ~ s(x2, bs = "cr"), data = df,
family = poisson, method = "REML")
# fails
which reproduces the error you saw
Error in if (abs(old.score - score) > score.scale * conv.tol) { :
missing value where TRUE/FALSE needed
In addition: There were 50 or more warnings (use warnings() to see the first 50)
The warnings are of the form:
warnings()[1]
Warning message:
In dpois(y, y, log = TRUE) : non-integer x = 7.384012
This indicates the problem. To compute the log-likelihood, the model evaluates the Poisson probability mass of your response data under the estimated model. Here it is evaluating that mass at non-integer values, which returns a mass of 0 plus the warning.
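The behaviour described above can be reproduced directly with base R's `dpois()`. The value 7.384012 is the one in the warning, and the variable names are only illustrative.

```r
# At an integer count the Poisson mass is a proper probability
p_int <- dpois(7, lambda = 7.384012)

# At a non-integer "count" the mass is 0 (-Inf on the log scale)
p_non <- suppressWarnings(dpois(7.384012, lambda = 7.384012, log = TRUE))

# Capture the warning text that gam() reported
msg <- tryCatch(dpois(7.384012, lambda = 7.384012, log = TRUE),
                warning = conditionMessage)

print(c(p_int = p_int, p_non = p_non))
print(msg)
```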
If we'd passed the original Gaussian variable as the response, which includes negative values, the function would have errored out earlier:
mp <- gam(y ~ s(x2, bs = "cr"), data = df,
family = poisson, method = "REML")
which raises this error:
Error in eval(family$initialize) :
negative values not allowed for the 'Poisson' family
An immediate, but not necessarily advisable, solution is simply to use the quasipoisson family:
mq <- gam(yabs ~ s(x2, bs = "cr"), data = df,
family = quasipoisson, method = "REML")
This uses the same mean-variance relationship as the Poisson distribution, but not the actual distribution, so we can get away with abusing it in this way.
A better approach is to ask yourself why you are trying to fit a model that is ostensibly for counts to a response that is a continuous (non-negative) variable.
If the answer is that you had a count but then normalised it in some way (say, by dividing by some measure of effort such as area surveyed or length of observation time), then use the original non-normalised integer variable as the response and add an offset of the form + offset(log(effort_var)) to the model formula.
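Here is a minimal sketch of the offset approach, using hypothetical simulated counts and effort (`count`, `effort` and `x` are illustrative names). It uses base R's `glm()` to stay self-contained. As far as I know, the same `offset(log(effort))` term also works in an mgcv `gam()` formula.

```r
set.seed(123)
n      <- 300
effort <- runif(n, 1, 10)   # hypothetical effort, e.g. area surveyed
x      <- runif(n)
# True rate per unit effort is exp(0.5 + 1.2 * x)
count  <- rpois(n, lambda = effort * exp(0.5 + 1.2 * x))
dat    <- data.frame(count, x, effort)

# Raw integer counts as the response, log(effort) as an offset
fit_off <- glm(count ~ x + offset(log(effort)), family = poisson, data = dat)
print(coef(fit_off))   # close to the simulated 0.5 and 1.2

# mgcv equivalent (not run here):
# gam(count ~ s(x) + offset(log(effort)), family = poisson, method = "REML", data = dat)
```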
If you really have a continuous response and the Poisson family was an oversight, try fitting with family = Gamma(link = "log") or family = tw().
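Here is a minimal sketch of the Gamma option, using base R `glm()` on hypothetical simulated positive continuous data. The Gamma family needs a strictly positive response, so exact zeros will make it fail. mgcv's `tw()` (Tweedie) family is the option that tolerates exact zeros.

```r
set.seed(7)
n     <- 400
x     <- runif(n)
mu    <- exp(0.2 + 0.8 * x)                    # mean on the log-link scale
shape <- 5
y     <- rgamma(n, shape = shape, rate = shape / mu)   # mean mu, strictly positive

fit_gamma <- glm(y ~ x, family = Gamma(link = "log"))
print(coef(fit_gamma))   # close to the simulated 0.2 and 0.8

# mgcv equivalents (not run here):
# gam(yabs ~ s(x2, bs = "cr"), data = df, family = Gamma(link = "log"), method = "REML")
# gam(yabs ~ s(x2, bs = "cr"), data = df, family = tw(), method = "REML")
```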
If it's something else, you should edit your question to include that information. Perhaps we can help here, or the question could be migrated to Cross Validated if the issue is more statistical in nature.
I'm trying to fit a GAM with the mgcv package to a response variable that is proportion data. The data are overdispersed, so initially I used a quasibinomial distribution. However, because I'm using model selection, that isn't particularly useful, as quasi families do not produce AIC scores.
Instead I'm trying to use betar distribution, as I've read that it could be appropriate.
mRI_br <- bam(ri ~ SE_score + s(deg, k = 7) + s(gs, k = 7) + TL + species + sex +
                season + year + s(code, bs = 're') + s(station, bs = 're'),
              family = betar(), data = node_dat, na.action = "na.fail")
However, I'm getting these warnings when I run the model:
Warning messages:
1: In estimate.theta(theta, family, y, mu, scale = scale1, ... :
step failure in theta estimation
And when I try to check the model summary, I get this error:
> summary(mRI_br)
Error in chol.default(diag(p) + crossprod(S2 %*% t(R2))) :
the leading minor of order 62 is not positive definite
I would like to know:
What is causing these errors and warnings, and how can they be solved?
If they cannot be solved, are there other distributions for proportion data that would still let me use model selection techniques afterwards (such as the dredge function from the MuMIn package)?
A copy of the dataset can be found here
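One thing worth checking before fitting `betar()` is whether the response lies strictly inside (0, 1). The beta distribution is not defined at exactly 0 or 1. mgcv's `betar()` has an `eps` argument for truncating such values (see `?betar`). A common manual alternative is the squeeze transformation of Smithson and Verkuilen (2006). This is only a sketch of a pre-fit check on hypothetical values standing in for `node_dat$ri`, and it may or may not be the cause of the errors above. The helper names `check_unit_interval()` and `squeeze()` are illustrative.

```r
# Hypothetical proportions standing in for node_dat$ri
ri <- c(0, 0.12, 0.5, 0.97, 1, 0.33, NA)

# Count values at the boundaries or outside [0, 1]
check_unit_interval <- function(y) {
  y <- y[!is.na(y)]
  c(n          = length(y),
    exact_zero = sum(y == 0),
    exact_one  = sum(y == 1),
    outside    = sum(y < 0 | y > 1))
}
print(check_unit_interval(ri))

# Squeeze values into the open interval: (y * (n - 1) + 0.5) / n
squeeze <- function(y) {
  n <- sum(!is.na(y))
  (y * (n - 1) + 0.5) / n
}
ri_sq <- squeeze(ri)
print(ri_sq)
```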
I'm looking at debris ingestion in gulls. Each gull is one row. The columns record sex (0 = male, 1 = female), whether the bird ate debris (0 = no, 1 = yes), and whether I found any of a number of other items in its stomach. For this problem, I'd like to see whether sex and the presence of debris influence the number of birds with shells in their stomachs (0 = no shells, 1 = shells). Debris prevalence is likely overdispersed and zero-inflated, but I'm not sure that matters if I'm using it as a factor to evaluate shell prevalence. Shell prevalence might be overdispersed and zero-inflated as well.
I've plotted the data and want to test whether the differences seen in the plot are significant.
But when I try to run a zero-inflated negative binomial model, I get different errors depending on how I set it up.
library(aod)
library(MASS)
library(ggplot2)
library(gridExtra)
library(pscl)
library(boot)
library(reshape2)
mydata1 <- read.csv('D:/mp paper/analysis wkshts/stats files/FOdata.csv')
mydata1 <- within(mydata1, {
debris <- factor(debris)
sex <- factor(sex)
Shell_frags <- factor(Shell_frags)
})
summary(mydata1)
ggplot(mydata1, aes(Shell_frags, fill=debris)) +
stat_count() +
facet_grid(debris ~ sex, margins=TRUE, scales="free_y")
m1 <- zeroinfl((Shell_frags ~ sex + debris), data = mydata1, dist = "negbin", EM = TRUE)
summary(m1)
Error message:
Error in if (all(Y > 0)) stop("invalid dependent variable, minimum count is not zero") :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In model.response(mf, "numeric") :
using type = "numeric" with a factor response will be ignored
2: In Ops.factor(Y, 0) : ‘>’ not meaningful for factors
> summary(m1)
Error in h(simpleError(msg, call)) :
  error in evaluating the argument 'object' in selecting a method for function 'summary': object 'm1' not found
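The warnings above point at the response. `Shell_frags` was converted to a factor, so the check `Y > 0` inside `zeroinfl()` fails ("'>' not meaningful for factors"), and `m1` is never created. A per-bird 0/1 outcome is also a single Bernoulli trial rather than a count. Count-model concepts such as zero inflation therefore do not really apply, and a logistic GLM is the usual model. This is a sketch on hypothetical simulated data that mimics the coding described in the question. The new column name `shells01` is illustrative.

```r
set.seed(3)
n <- 200
# Hypothetical stand-in for mydata1, using the 0/1 codes described in the question
mydata1 <- data.frame(
  sex         = factor(rbinom(n, 1, 0.5)),
  debris      = factor(rbinom(n, 1, 0.3)),
  Shell_frags = factor(rbinom(n, 1, 0.4))
)

# Turn the factor's labels back into numeric 0/1 (not as.integer() directly,
# which would return the level codes 1/2)
mydata1$shells01 <- as.integer(as.character(mydata1$Shell_frags))

# Binary presence/absence response: logistic regression
m_logit <- glm(shells01 ~ sex + debris, family = binomial, data = mydata1)
summary(m_logit)
```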
Is there a way to scale all the variables in a GLM, including the interaction terms? For example, the model runs when I scale all the single variables, but not when I add a scaled interaction.
This works:
mod_lm_scale <-glm( pa ~
scale(WC05) + scale(WC06) + scale(WC08) +
scale(WC13) + scale(WC15),
data=sdmdata,
family='binomial')
Trying to add a scaled interaction does not work. See the last term: scale(WC05:WC06)
mod_lm_scale <-glm( pa ~
scale(WC05) + scale(WC06) + scale(WC08) +
scale(WC13) + scale(WC15),
scale(WC05:WC06),
data=sdmdata,
family='binomial')
I receive this error when including the scaled interaction, and receive no errors when I don't include it:
Error in model.frame.default(formula = pa ~ scale(WC05_clipped) + scale(WC06_clipped) + :
variable lengths differ (found for '(weights)')
In addition: Warning messages:
1: In WC05_clipped:WC06_clipped :
numerical expression has 2869 elements: only the first used
2: In WC05_clipped:WC06_clipped :
numerical expression has 2869 elements: only the first used
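The error messages suggest two things. First, the comma before `scale(WC05:WC06)` makes it a separate argument to `glm()`. Because `family` and `data` are named, that argument is matched positionally to `weights`, hence "(weights)". Second, inside a function call `:` is R's sequence operator rather than an interaction, hence "only the first used". The sketch below, on hypothetical simulated data standing in for `sdmdata`, shows two ways to get a scaled interaction and checks that they agree. Note that this is the product of the scaled variables, which differs from scaling the raw product.

```r
set.seed(10)
n <- 500
# Hypothetical stand-in for sdmdata
sdmdata <- data.frame(WC05 = rnorm(n, 20, 5), WC06 = rnorm(n, 100, 30))
sdmdata$pa <- rbinom(n, 1, plogis(-0.5 + 0.04 * (sdmdata$WC05 - 20) +
                                    0.01 * (sdmdata$WC06 - 100)))

# Option 1: scale inside the formula; '*' gives main effects plus interaction
m1 <- glm(pa ~ scale(WC05) * scale(WC06), data = sdmdata, family = "binomial")

# Option 2: standardise the columns first, then use ordinary formula syntax
sdm_z <- sdmdata
sdm_z[c("WC05", "WC06")] <- lapply(sdm_z[c("WC05", "WC06")],
                                   function(v) as.numeric(scale(v)))
m2 <- glm(pa ~ WC05 * WC06, data = sdm_z, family = "binomial")

# Both parameterisations give the same estimates
print(all.equal(unname(coef(m1)), unname(coef(m2))))
```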