rootogram() error when checking for overdispersion in GAM - r

I have fitted the GAM below and am trying to produce a rootogram() with the countreg package to check for overdispersion, but I get the error: Error in X[, pstart[i] - 1 + 1:object$nsdf[i]] <- Xp : number of items to replace is not a multiple of replacement length.
I understand what the error message is telling me (the lengths of two vectors/objects do not match), but I am none the wiser as to how to fix it. Any help or suggestions would be appreciated. Has anyone had this problem before and, if so, how did you fix it?
This may be due to a peculiarity in my data, as I have never had a problem producing rootograms with other datasets.
# I cannot fit a rootogram from the following GAM
> knots2 <- list(nMonth = c(0.5, 12.5))
> sup15 <- gam(Number ~ State + Virus + State*Virus + s(nMonth, bs = "cc", k = 12, by = Virus) + s(Time, k = 60, by = Virus),
data = supply.pad,
family = nb(),
method = "REML",
knots = knots2)
> root_nb <- rootogram(sup15, style = "hanging", plot = FALSE)
Error in X[, pstart[i] - 1 + 1:object$nsdf[i]] <- Xp :
number of items to replace is not a multiple of replacement length
# But can fit a rootogram from the below GAM. Note that these are different datasets but pretty much the same code.
> knots1 <- list(month = c(0.5, 12.5))
> gam10 <- gam(n ~ State + s(month, bs = "cc", k = 12) + s(time),
data = rhdv.gp.pad,
family = nb(),
method = "REML",
knots = knots1)
> root_nb1 <- rootogram(gam10, style = "hanging", plot = FALSE)

Related

Generalized estimating equations working by themselves but not within functions (R)

I am trying to write a function to run GEE using the geepack package. It works fine "on its own" but not within a function, please see example below:
library(geepack)
library(pstools)
df <- data.frame(study_id = c(1:20),
leptin = runif(20),
insulin = runif(20),
age = runif(20, min = 20, max = 45),
sex = sample(c(0,1), size = 20, replace = TRUE))
#Works
geepack::geeglm(leptin ~ insulin + age + sex, id = study_id, data = df)
#Doesn't work
model_function_covariates_gee <- function(x,y) {
M1 <- paste0(x, "~", y, "+ age + sex")
M1_fit <- geepack::geeglm(M1, id = study_id, data = df)
s <- summary(M1_fit)
return(s)
}
model_function_covariates_gee("leptin", "insulin")
Error message:
Error in mcall$formula[3] <- switch(match(length(sformula), c(0, 2, 3)), :
incompatible types (from language to character) in subassignment type fix
Does anyone know why this is? I've fiddled around with it but can't get it to change. Thanks in advance.
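One likely cause, offered as a sketch rather than a confirmed diagnosis: paste0() returns a character string, whereas geeglm() expects a formula object, which would be consistent with the "from language to character" wording of the error. Converting the string with as.formula() before fitting is the usual remedy. The helper name build_formula and the extra data argument are hypothetical additions for illustration, and the geeglm() call runs only if geepack is installed.

```r
# Hypothetical helper: turn variable names into a formula object
build_formula <- function(outcome, exposure) {
  stats::as.formula(paste0(outcome, " ~ ", exposure, " + age + sex"))
}

f <- build_formula("leptin", "insulin")
class(f)  # a formula object rather than a character string

set.seed(1)
df <- data.frame(study_id = 1:20,
                 leptin = runif(20),
                 insulin = runif(20),
                 age = runif(20, min = 20, max = 45),
                 sex = sample(c(0, 1), size = 20, replace = TRUE))

model_function_covariates_gee <- function(x, y, data) {
  # Store the formula in a local variable so geeglm() receives a formula object
  M1 <- build_formula(x, y)
  M1_fit <- geepack::geeglm(M1, id = study_id, data = data)
  summary(M1_fit)
}

# Only attempt the fit when geepack is available
if (requireNamespace("geepack", quietly = TRUE)) {
  print(model_function_covariates_gee("leptin", "insulin", df))
}
```

Passing the data frame as an argument, rather than relying on a global df, also keeps the function independent of the calling workspace.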

Error in R: Non-conformable arrays, how to fix?

I am trying to create an effect plot for a cox proportional hazards model:
fitC7 <- coxph(Surv(TimeDeath, event == 1) ~
strata(sex) * mutation + age
+ ns(BM1, 3),
data = data)
I created a new dataset as follows:
ND1a <- with(data, expand.grid(age = seq(30, 75, length.out = 40), mutation = factor(c("Yes", "No")), sex = factor(c("male", "female")), BM1 = 1.583926))
Then, I tried to use the predict function:
predict(fitC7, newdata = ND1a, type = "lp", se.fit = T)
However, I keep getting the error:
Error in newx - xmeans[match(newstrat, row.names(xmeans)), ] : non-conformable arrays
and I do not know how to correct this.
It does work when I put in a model without sex as a stratifier, e.g.,
fitC9 <- coxph(Surv(TimeDeath, event ==1) ~
sex * mutation + age +
ns(BM1, 3), data = data)
I hope someone can help me; I could not figure it out from previous question-and-answer threads.

Structural equation modeling / path analysis using lavaan

I'm trying to use the sem function, after having defined a simple model as:
Model1 <- 'Y ~ X + M
M ~ X'
sem (Model1, data = A)
Where A is a matrix defined with the command
A = matrix(ncol = 3, nrow = 50)
A[,1] = read.csv2("Mydata1",header = TRUE)
A[,2] = read.csv2("Mydata2", header = TRUE)
A[,3] = read.csv2("Mydata3",header = TRUE)
But the software displays:
Error in lav_data_full(data = data, group = group, cluster = cluster,
: lavaan ERROR: missing observed variables in dataset: Y M
I have also tried replacing missing values with the mean of each variable, but the same error is displayed.
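The error lists Y and M as missing, which suggests that lavaan cannot find columns with those names: a matrix created with matrix() has no column names, so the variables in the model string cannot be matched. A minimal sketch, assuming each CSV file holds a single numeric column; simulated vectors stand in for the three files here, and the sem() call runs only if lavaan is installed.

```r
set.seed(42)
n <- 50

# Simulated stand-ins for the three files (assumption: one numeric column each)
X <- rnorm(n)
M <- 0.5 * X + rnorm(n)
Y <- 0.3 * X + 0.4 * M + rnorm(n)

# A data frame whose column names match the variable names used in the model
A <- data.frame(Y = Y, X = X, M = M)

Model1 <- '
  Y ~ X + M
  M ~ X
'

# Only attempt the fit when lavaan is available
if (requireNamespace("lavaan", quietly = TRUE)) {
  fit <- lavaan::sem(Model1, data = A)
  print(lavaan::parameterEstimates(fit))
}
```

With real files, the same idea applies: read each file, extract its column, and assemble a data frame with the names Y, X and M.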

mgcv::gamm() and MuMIn::dredge() errors

I've been trying to fit multiple GAMs using the package mgcv within a function, and crudely select the most appropriate model through model selection procedures. But my function runs the first model then doesn't seem to recognise the input data dat again.
I get the error
Error in is.data.frame(data) : object 'dat' not found.
I think this is a scoping problem and I've looked here, and here for help but cannot figure it out.
Code and data are as follows (hopefully reproducible):
https://github.com/cwaldock1/Help/blob/master/test_gam.csv
library(mgcv)
# Function to fit multiple models
best.mod <- function(dat) {
# Set up control structure
ctrl <- list(niterEM = 0, msVerbose = TRUE, optimMethod="L-BFGS-B")
# AR(1)
m1 <- get.models(dredge(gamm(Temp ~ s(Month, bs = "cc") + s(Date, bs = 'cr') + Year,
data = dat, correlation = corARMA(form = ~ 1|Year, p = 1),
control = ctrl)), subset=1)[[1]]
# AR(2)
m2 <- get.models(dredge(gamm(Temp ~ s(Month, bs = "cc") + s(Date, bs = 'cr') + Year,
data = dat, correlation = corARMA(form = ~ 1|Year, p = 2),
control = ctrl)), subset=1)[[1]]
# AR(3)
m3 <- get.models(dredge(gamm(Temp ~ s(Month, bs = "cc") + s(Date, bs = 'cr') + Year,
data = dat, correlation = corARMA(form = ~ 1|Year, p = 3),
control = ctrl)), subset = 1)[[1]]
### Select best model to work with based on unselective AIC criteria
if(AIC(m2$lme) > AIC(m1$lme)){mod = m1}else{mod = m2}
if(AIC(mod$lme) > AIC(m3$lme)){mod = m3}else{mod = mod}
return(mod$gam)
}
mod2 <- best.mod(dat = test_gam)
Any help would be greatly appreciated.
Thanks,
Conor
get.models evaluates in the model's formula environment, which in gamm is
(always?) .GlobalEnv, whereas it should be the function's environment (i.e.
sys.frame(sys.nframe())).
So, instead of
get.models(ms, 1)
use
eval(getCall(ms, 1))

Errors in segmented package: breakpoints confusion

Using the segmented package to create a piecewise linear regression I am seeing an error when I try to set my own breakpoints; it seems only when I try to set more than two.
(EDIT) Here is the code I am using:
# data
bullard <- structure(list(Rt = c(0, 4.0054, 25.1858, 27.9998, 35.7259, 39.0769,
45.1805, 45.6717, 48.3419, 51.5661, 64.1578, 66.828, 111.1613,
114.2518, 121.8681, 146.0591, 148.8134, 164.6219, 176.522, 177.9578,
180.8773, 187.1846, 210.5131, 211.483, 230.2598, 262.3549, 266.2318,
303.3181, 329.4067, 335.0262, 337.8323, 343.1142, 352.2322, 367.8386,
380.09, 388.5412, 390.4162, 395.6409), Tem = c(15.248, 15.4523,
16.0761, 16.2013, 16.5914, 16.8777, 17.3545, 17.3877, 17.5307,
17.7079, 18.4177, 18.575, 19.8261, 19.9731, 20.4074, 21.2622,
21.4117, 22.1776, 23.4835, 23.6738, 23.9973, 24.4976, 25.7585,
26.0231, 28.5495, 30.8602, 31.3067, 37.3183, 39.2858, 39.4731,
39.6756, 39.9271, 40.6634, 42.3641, 43.9158, 44.1891, 44.3563,
44.5837)), .Names = c("Rt", "Tem"), class = "data.frame", row.names = c(NA,
-38L))
library(segmented)
# create a linear model
out.lm <- lm(Tem ~ Rt, data=bullard)
o<-segmented(out.lm, seg.Z=~Rt, psi=list(Rt=c(200,300)), control=seg.control(display=FALSE))
Using the psi option, I have tried the following:
psi = list(x = c(150, 300)) -- OK
psi = list(x = c(100, 200)) -- OK
psi = list(x = c(200, 300)) -- OK
psi = list(x = c(100, 300)) -- OK
psi = list(x = c(120, 150, 300)) -- error 1 below
psi = list(x = c(120, 300)) -- OK
psi = list(x = c(120, 150)) -- OK
psi = list(x = c(150, 300)) -- OK
psi = list(x = c(100, 200, 300)) -- error 2 below
(1) Error in segmented.lm(out.lm, seg.Z = ~Rt, psi = list(Rt = c(120, 150, :
only 1 datum in an interval: breakpoint(s) at the boundary or too close
(2) Error in diag(Cov[id, id]) : subscript out of bounds
I have already listed my data at this question, but as a guide the limits on the x data are about 0--400.
A second question that pertains to this one is: how do I actually fix the breakpoints using this segmented package?
The issue here seems to be poor error trapping in the segmented package. Having a look at the code for segmented.lm allows a bit of debugging. For example, in the case of psi = list(x = c(100, 200, 300)), an augmented linear model is fitted as shown below:
lm(formula = Tem ~ Rt + U1.Rt + U2.Rt + U3.Rt + psi1.Rt + psi2.Rt +
psi3.Rt, data = mf)
Call:
lm(formula = Tem ~ Rt + U1.Rt + U2.Rt + U3.Rt + psi1.Rt + psi2.Rt +
psi3.Rt, data = mf)
Coefficients:
(Intercept)           Rt        U1.Rt        U2.Rt        U3.Rt      psi1.Rt
   15.34303      0.04149      0.04591    742.74186   -742.74499      1.02252
    psi2.Rt      psi3.Rt
         NA           NA
As you can see, the fit has NA values which then result in a degenerate variance-covariance matrix (called Cov in the code). The function doesn't check for this and tries to pull out diagonal entries from Cov and fails with the error message shown. At least the first error, although perhaps not overly helpful, is caught by the function itself and suggests that the break-points are too close.
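The mechanism described above can be reproduced in base R without segmented. This is only an illustration with simulated, deliberately collinear data, not the package's own code: an aliased term gets an NA coefficient, the covariance matrix of the estimable terms is smaller than the coefficient vector, and indexing it by the original positions fails.

```r
set.seed(1)
x  <- 1:20
x2 <- 2 * x                      # exactly collinear with x, so it cannot be estimated
y  <- 3 + 0.5 * x + rnorm(20)

fit <- lm(y ~ x + x2)

# Names of coefficients that could not be estimated
na_terms <- names(coef(fit))[is.na(coef(fit))]
na_terms

# Three coefficients in total, but only two rows in the summary table
length(coef(fit))
nrow(coef(summary(fit)))

# Indexing the unscaled covariance matrix by the position of the aliased term fails
msg <- tryCatch(summary(fit)$cov.unscaled[3, 3],
                error = function(e) conditionMessage(e))
msg
```

A check such as anyNA(coef(fit)) before touching the covariance matrix is the kind of error trapping the answer says is missing.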
In the absence of better error trapping in the function, I think that all you can do is adopt a trial and error approach (and avoid break points which are too close). For example, psi = list(x = c(50, 200, 300)) seems to work ok.
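On the second question (holding the breakpoints fixed), one option that avoids the iterative estimation altogether is to fit the piecewise model directly with lm(), using hinge terms of the form pmax(x - psi, 0). This is a swapped-in technique, not the segmented package's own method; the data below are simulated and the helper name fit_fixed_breaks is hypothetical.

```r
# Hypothetical helper: piecewise-linear fit with breakpoints held fixed at psi
fit_fixed_breaks <- function(x, y, psi) {
  # One hinge column per breakpoint: zero left of psi, (x - psi) to the right
  hinges <- outer(x, psi, function(a, b) pmax(a - b, 0))
  colnames(hinges) <- paste0("U", seq_along(psi))
  lm(y ~ x + hinges)
}

set.seed(123)
x <- seq(0, 400, length.out = 200)
# Simulated response: slope 0.04, increasing by 0.06 after x = 200
y <- 15 + 0.04 * x + 0.06 * pmax(x - 200, 0) + rnorm(200, sd = 0.3)

fit <- fit_fixed_breaks(x, y, psi = c(200, 300))
coef(fit)
```

Each hinge coefficient is the change in slope at its breakpoint, so a value near zero suggests that breakpoint is not needed.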
If you use while and tryCatch, you can make the command repeat until the model fits without error #jaySf. I'm guessing this works because of the random-number settings in the function, which can be seen in seg.control.
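library(segmented)
lm.model <- lm(xdat ~ ydat, data = x)
fitted.ok <- FALSE
attempts <- 0
# Refit until segmented() succeeds; the cap of 50 attempts is an arbitrary
# safeguard against looping forever
while (!fitted.ok && attempts < 50) {
  attempts <- attempts + 1
  tryCatch({
    # seg.Z must name the predictor used in lm.model (ydat)
    s <- segmented(lm.model, seg.Z = ~ydat, psi = NA)
    fitted.ok <- TRUE
  }, error = function(e) NULL)
}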
