mgcv::gamm() and MuMIn::dredge() errors in R

I've been trying to fit multiple GAMs with the mgcv package inside a function and crudely select the most appropriate model through model selection. However, my function fits the first model and then doesn't seem to recognise the input data dat again.
I get the error:
Error in is.data.frame(data) : object 'dat' not found.
I think this is a scoping problem. I've looked here and here for help but cannot figure it out.
Code and data are as follows (hopefully reproducible):
https://github.com/cwaldock1/Help/blob/master/test_gam.csv
library(mgcv)
library(MuMIn)  # dredge() and get.models() come from MuMIn
# Function to fit multiple models
best.mod <- function(dat) {
# Set up control structure
ctrl <- list(niterEM = 0, msVerbose = TRUE, optimMethod="L-BFGS-B")
# AR(1)
m1 <- get.models(dredge(gamm(Temp ~ s(Month, bs = "cc") + s(Date, bs = 'cr') + Year,
data = dat, correlation = corARMA(form = ~ 1|Year, p = 1),
control = ctrl)), subset=1)[[1]]
# AR(2)
m2 <- get.models(dredge(gamm(Temp ~ s(Month, bs = "cc") + s(Date, bs = 'cr') + Year,
data = dat, correlation = corARMA(form = ~ 1|Year, p = 2),
control = ctrl)), subset=1)[[1]]
# AR(3)
m3 <- get.models(dredge(gamm(Temp ~ s(Month, bs = "cc") + s(Date, bs = 'cr') + Year,
data = dat, correlation = corARMA(form = ~ 1|Year, p = 3),
control = ctrl)), subset = 1)[[1]]
### Select best model to work with based on unselective AIC criteria
if(AIC(m2$lme) > AIC(m1$lme)){mod = m1}else{mod = m2}
if(AIC(mod$lme) > AIC(m3$lme)){mod = m3}else{mod = mod}
return(mod$gam)
}
mod2 <- best.mod(dat = test_gam)
Any help would be greatly appreciated.
Thanks,
Conor

get.models evaluates the call in the model formula's environment, which for gamm is (always?) .GlobalEnv, whereas it should be the function's environment (i.e. sys.frame(sys.nframe())).
So, instead of
get.models(ms, 1)
use
eval(getCall(ms, 1))
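The scoping problem described in the answer can be reproduced without mgcv or MuMIn. The sketch below (the function fit_twice and the simulated data sim are hypothetical, for illustration only) stores an unevaluated model call that refers to the local argument dat, then evaluates it once in .GlobalEnv, which is what the answer says get.models() effectively does for gamm, and once in the function's own frame.

```r
# Base-R sketch of the scoping issue; 'fit_twice' and 'sim' are hypothetical
fit_twice <- function(dat) {
  # An unevaluated model call that refers to the local argument 'dat'
  cl <- quote(lm(y ~ x, data = dat))
  # Evaluating in .GlobalEnv fails: 'dat' only exists inside this function
  global_result <- tryCatch(eval(cl, envir = globalenv()),
                            error = function(e) conditionMessage(e))
  # Evaluating in this function's own frame finds the local 'dat'
  local_result <- eval(cl, envir = environment())
  list(global = global_result, local = local_result)
}

set.seed(1)
sim <- data.frame(x = 1:20)
sim$y <- 2 + 0.5 * sim$x + rnorm(20)
res <- fit_twice(sim)
res$global        # the "object 'dat' not found" message, as a string
coef(res$local)   # intercept and slope from the locally evaluated call
```

Applied to best.mod, the answer's suggestion would replace each get.models(..., subset = 1)[[1]] with eval(getCall(ms, 1)), where ms is the dredge() result; because eval() defaults to the calling frame, the call is then evaluated where dat exists.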

Related

nlme with correlation structure not fitting and crashes R

I have a mixed model with a non-linear term, so I would like to use nlme() instead of lme() (both from the nlme package). However, switching to nlme, even without adding anything to the model, causes RStudio and R to crash.
I have found that even generated data, which can easily be fitted using lme, causes this behaviour (on my computer).
Let's start by loading the libraries and setting up a data.frame with the grouping id and spatial coordinate x.
library(nlme)
nid <- 300
nx <- 10
data <- expand.grid(
x = seq(nx),
id = seq(nid)
)
Now, let's add correlated and uncorrelated error as separate columns, as well as a random intercept value per id. The output of arima.sim needs rescaling to unit standard deviation, because an AR(1) series has a larger variance than its white-noise innovations.
data$ec <- c(
replicate(
nid,
as.numeric(
arima.sim(
model = list(
order = c(1, 0, 0),
ar = 0.5
),
n = nx
)
)
)
)
data$ec <- data$ec / sd(data$ec)
data$eu <- rnorm(nid * nx)
data$random <- rep(rnorm(nid), each = nx)
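To see why the rescaling is needed: with unit-variance innovations (the arima.sim default), a stationary AR(1) with coefficient phi has variance 1 / (1 - phi^2), which is 4/3 for phi = 0.5. A quick base-R check; the long series length and the seed are arbitrary choices for illustration:

```r
# Stationary AR(1) variance with unit-variance innovations is 1 / (1 - phi^2)
set.seed(42)
phi <- 0.5
theoretical_var <- 1 / (1 - phi^2)   # 4/3 for phi = 0.5
ec <- as.numeric(arima.sim(model = list(order = c(1, 0, 0), ar = phi), n = 1e5))
empirical_var <- var(ec)             # close to 4/3 for a long series
ec_scaled <- ec / sd(ec)             # the normalisation step used in the question
round(c(theoretical = theoretical_var, empirical = empirical_var,
        scaled_sd = sd(ec_scaled)), 3)
```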
Now, we can create 3 dependent variables, for 3 models. The first is a mixed model with uncorrelated (regular) error. The second includes an exponential (AR1) correlation structure. The third combines both. I am adding an intercept of 1, an sd of the random effect of 2 and an sd of the total residual error of 3.
data$y1 <- 1 + 2 * data$random + 3 * data$eu
data$y2 <- 1 + 2 * data$random + 3 * data$ec
data$y3 <- 1 + 2 * data$random + sqrt(8) * data$ec + sqrt(1) * data$eu
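The stated total residual sd of 3 follows from adding the variances of independent terms: 3^2 = 9 for y1 and y2, and 8 + 1 = 9 for y3. A quick simulation check for y3, assuming ec and eu are independent with unit sd as constructed above (series length and seed are arbitrary):

```r
# var(sqrt(8) * ec + sqrt(1) * eu) = 8 + 1 = 9 when ec and eu are independent
# and each has unit variance, so the residual sd of y3 should be about 3
set.seed(7)
n <- 1e5
ec <- as.numeric(arima.sim(model = list(ar = 0.5), n = n))
ec <- ec / sd(ec)                    # unit-sd correlated error, as in the question
eu <- rnorm(n)                       # unit-sd uncorrelated error
resid_y3 <- sqrt(8) * ec + sqrt(1) * eu
sd(resid_y3)                         # approximately 3
```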
All of the following lme models fit without problem, giving the expected result.
l1 <- lme(
fixed = y1 ~ 1,
random = ~ 1 | id,
data = data,
method = "ML"
)
l2 <- lme(
fixed = y2 ~ 1,
random = ~ 1 | id,
correlation = corExp(
form = ~ x | id
),
data = data,
method = "ML"
)
l3 <- lme(
fixed = y3 ~ 1,
random = ~ 1 | id,
correlation = corExp(
form = ~ x | id,
nugget = TRUE
),
data = data,
method = "ML"
)
As far as I know, the following nlme code specifies exactly the same models as above. The first runs without issues. But the ones with a correlation structure crash R / RStudio. No warning or error message is provided. Fiddling with the arguments and with nlmeControl does not help, though I do think nlmeControl could be the place to search for a solution.
nlme(
model = y1 ~ b0,
fixed = b0 ~ 1,
random = b0 ~ 1,
group = ~ id,
data = data,
start = list(
fixed = fixed.effects(l1),
random = setNames(random.effects(l1), "b0")
),
method = "ML"
)
nlme(
model = y2 ~ b0,
fixed = b0 ~ 1,
random = b0 ~ 1,
group = ~ id,
correlation = corExp(
form = ~ x
),
data = data,
start = list(
fixed = fixed.effects(l2),
random = setNames(random.effects(l2), "b0")
),
method = "ML"
)
nlme(
model = y3 ~ b0,
fixed = b0 ~ 1,
random = b0 ~ 1,
group = ~ id,
correlation = corExp(
form = ~ x,
nugget = TRUE
),
data = data,
start = list(
fixed = fixed.effects(l3),
random = setNames(random.effects(l3), "b0")
),
method = "ML"
)
Has anyone experienced this before? Does my example code give the same problem on your computer? What are good strategies to change nlmeControl to attempt to remedy this?

rootogram() error when checking for overdispersion in GAM

I have fitted the GAM below and am trying to plot a rootogram with rootogram() from the countreg package to check for overdispersion, but I get the error: Error in X[, pstart[i] - 1 + 1:object$nsdf[i]] <- Xp : number of items to replace is not a multiple of replacement length.
I understand what the error message is telling me (the lengths of two vectors/objects do not match), but I am none the wiser as to how to fix it. Any help or suggestions would be appreciated. Has anyone had this problem before, and if so, how did you fix it?
This may be due to a peculiarity in my data, as I have never had a problem producing rootograms with other datasets.
library(mgcv)
library(countreg)  # provides rootogram()
# I cannot fit a rootogram from the following GAM
knots2 <- list(nMonth = c(0.5, 12.5))
sup15 <- gam(Number ~ State + Virus + State*Virus + s(nMonth, bs = "cc", k = 12, by = Virus) + s(Time, k = 60, by = Virus),
  data = supply.pad,
  family = nb(),
  method = "REML",
  knots = knots2)
root_nb <- rootogram(sup15, style = "hanging", plot = FALSE)
# Error in X[, pstart[i] - 1 + 1:object$nsdf[i]] <- Xp :
#   number of items to replace is not a multiple of replacement length
# But I can fit a rootogram from the GAM below. Note that these are different datasets but pretty much the same code.
knots1 <- list(month = c(0.5, 12.5))
gam10 <- gam(n ~ State + s(month, bs = "cc", k = 12) + s(time),
  data = rhdv.gp.pad,
  family = nb(),
  method = "REML",
  knots = knots1)
root_nb1 <- rootogram(gam10, style = "hanging", plot = FALSE)

How can I run a regression analysis using two datasets?

library(ggplot2)
setwd("C:/Users/sevvalayse.yurtekin/Desktop/hw3")
data = read.table('DSE501_fall2020_HW3.csv', header= T, sep=',')
attach(data)
data
getOption("max.print")
rs<-rowSums(data[,2:76], na.rm = TRUE)
data<-cbind(data,rs)
data
p1<-ggplot()+
geom_line(aes(y = rs, x=year), data=data)+
scale_x_continuous(breaks = seq(2004,2019,2))
p1
model = lm(rs ~ year )
model
summary(model)
residuals(model)
predict(model)
#model.fit = lm(year~rs)
#summary(model.fit)
new.year<-data.frame(
year = c(2021,2022,2023)
)
predict(model, newdata = new.year, interval = 'confidence')
data2 = read.table('TUIK_nufus_2019.csv', header = T, sep=",")
data2
total = data2$Total
mydata<-data[-c(1,2,3),]
model2 = lm(mydata~total)
model2
Hello, I get the following error: Error in model.frame.default(formula = mydata ~ total, drop.unused.levels = TRUE) : invalid type (list) for variable 'mydata'.
How can I fix it? I want to run a regression analysis using the two datasets.
The line causing the issue is model2 = lm(mydata~total). mydata is a data frame, not a vector, but the dependent variable in lm must be a single vector. When you create mydata you do not select a column; select the dependent variable explicitly: mydata<-data[-c(1,2,3), <enter column name of dependent variable>]
Alternatively, you can fit your model with the following syntax (provided the dependent and independent variables are in the same data frame). Here y is just a placeholder variable name: lm(y ~ total, data = mydata)
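A small reproduction of the error and of both fixes from the answer. The data here are hypothetical stand-ins (the real CSV files are not available), and the column name rs is assumed to be the dependent variable:

```r
# Hypothetical stand-ins for the question's data
set.seed(3)
mydata <- data.frame(year = 2007:2019, rs = rnorm(13, 100, 10))
total <- rnorm(13, 80e6, 1e6)

# A whole data frame on the left-hand side reproduces the error
err <- tryCatch(lm(mydata ~ total), error = function(e) conditionMessage(e))
err   # the error message, as a string

# Fix 1: use a single column as the response
y <- mydata$rs
model2 <- lm(y ~ total)

# Fix 2: put both variables in one data frame and pass it via 'data'
mydata$total <- total
model3 <- lm(rs ~ total, data = mydata)

all.equal(coef(model2), coef(model3))   # TRUE: same fit either way
```

Fix 2 is usually the safer habit, since lm then looks up every variable in one data frame and the rows are guaranteed to line up.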

Structural equation modeling / path analysis using lavaan

I'm trying to use the sem function, after having defined a simple model as:
library(lavaan)
Model1 <- 'Y ~ X + M
M ~ X'
sem(Model1, data = A)
Where A is a matrix defined with the command
A = matrix(ncol = 3, nrow = 50)
A[,1] = read.csv2("Mydata1",header = TRUE)
A[,2] = read.csv2("Mydata2", header = TRUE)
A[,3] = read.csv2("Mydata3",header = TRUE)
But the software displays:
Error in lav_data_full(data = data, group = group, cluster = cluster,
: lavaan ERROR: missing observed variables in dataset: Y M
I've also tried replacing missing values with each variable's mean, but it displays the same error.
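lavaan matches the variable names in the model syntax (X, M, Y) against the column names of data, and the matrix A built above has no column names at all; read.csv2() also returns a data frame rather than a plain vector. A minimal base-R sketch of building a suitably named data frame instead. The three stand-in data frames are hypothetical, and which file holds X, M or Y is an assumption:

```r
# Hypothetical stand-ins for read.csv2("Mydata1"), ("Mydata2"), ("Mydata3");
# the mapping file1 -> X, file2 -> M, file3 -> Y is assumed
set.seed(11)
d1 <- data.frame(value = rnorm(50))
d2 <- data.frame(value = rnorm(50))
d3 <- data.frame(value = rnorm(50))

# The matrix from the question has no column names for lavaan to match
A <- matrix(ncol = 3, nrow = 50)
colnames(A)                      # NULL

# A data frame whose names match the variables used in Model1
A_df <- data.frame(X = d1[[1]], M = d2[[1]], Y = d3[[1]])
names(A_df)                      # "X" "M" "Y"

# With lavaan installed, the model could then be fitted as:
# library(lavaan)
# fit <- sem(Model1, data = A_df)
```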

Error from JM package in R

JM is a package for fitting joint models of longitudinal and survival data. I can get it to run with the package's example data, but I get an error with my own data. Any idea what the issue with JMfit1 or JMfit2 is?
My Data:
https://1drv.ms/u/s!AkG9wyz5G1c1gR4Vs_xohO--4Rb5
install.packages('JM')
require(JM)
?jointModel
# Example from vignette
# linear mixed model fit (random intercepts + random slopes)
fitLME <- lme(log(serBilir) ~ drug * year, random = ~ year | id, data = pbc2)
summary(fitLME)
# survival regression fit
fitSURV <- survreg(Surv(years, status2) ~ drug, data = pbc2.id, x = TRUE)
summary(fitSURV)
# joint model fit, under the (default) Weibull model
fitJOINT <- jointModel(fitLME, fitSURV, timeVar = "year")
fitJOINT
summary(fitJOINT)
# we can also include an interaction term of log(serBilir) with drug
fitJOINT <- jointModel(fitLME, fitSURV, timeVar = "year",
  interFact = list(value = ~ drug, data = pbc2.id))
fitJOINT
summary(fitJOINT)
# With my data:
data = readRDS('data.list.d1.dk.RDS')
d1 = data$d1
dk = data$dk
dim(d1); names(d1)
dim(dk); names(dk)
slct.cov = c('ID','Yi','Ai','zi.1','zi.2','zi.3','xi_A','di')
fmla.fix = as.formula('Yi ~ Ai*(zi.1+zi.2+zi.3)')
fmla.rnd = as.formula(' ~ Ai|ID')
fit.Yi = lme(fixed= fmla.fix, random=reStruct(fmla.rnd),
method="ML", data = dk[,slct.cov] )
surv.model = survreg(Surv(xi_A, di) ~ zi.1+zi.2+zi.3, data = d1[,slct.cov], x = TRUE)
JMfit1 = jointModel(lmeObject = fit.Yi, survObject = surv.model, timeVar = 'Ai')
# Error in if (t1 || t2) { : missing value where TRUE/FALSE needed
dForm <- list(fixed = ~ 1 + zi.1 + zi.2 + zi.3, indFixed = c(2,6,7,8), random = ~ 1, indRandom = 2)
JMfit2 = jointModel(lmeObject = fit.Yi, survObject = surv.model, timeVar = 'Ai',
derivForm = dForm, parameterization = c("both")) #"both", "value", "slope"
# method = "weibull-PH-aGH",
# "weibull-PH-aGH", "weibull-PH-GH", "weibull-AFT-aGH","weibull-AFT-GH",
# "piecewise-PH-aGH", "piecewise-PH-GH", "Cox-PH-aGH", "Cox-PH-GH",
# "spline-PH-aGH", "spline-PH-GH", # "ch-Laplace"
# interFact = NULL, lag = 0, scaleWB = NULL,
# CompRisk = FALSE, init = NULL, control = list())
It looks to me like your fmla.fix model may need investigating in its own right.
Simplifying the interactions to Yi ~ Ai+zi.1+zi.3 + Ai*(zi.2), or even to Yi ~ Ai+zi.1+zi.2+zi.3, seems to give a valid JMfit1 output.
I suspect you'll get a different error for JMfit2 (do you?), so that may be a subsequent SO question.
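One thing to watch if you simplify the formula and then retry JMfit2: indFixed in dForm refers to positions in the fixed-effects coefficient vector of the lme fit. With Ai*(zi.1+zi.2+zi.3), the coefficients involving Ai sit at positions 2, 6, 7 and 8, which matches c(2,6,7,8) in the question; simplifying the formula shifts those positions, so dForm would presumably need updating as well. A base-R sketch using hypothetical toy data with the question's column names:

```r
# Hypothetical toy data with the same column names as dk in the question
set.seed(5)
toy <- data.frame(Yi = rnorm(8), Ai = runif(8), zi.1 = rnorm(8),
                  zi.2 = rnorm(8), zi.3 = rnorm(8))
f_full   <- Yi ~ Ai * (zi.1 + zi.2 + zi.3)
f_simple <- Yi ~ Ai + zi.1 + zi.3 + Ai * (zi.2)

# Column order of the fixed-effects design matrix = coefficient order
cols_full   <- colnames(model.matrix(f_full, toy))
cols_simple <- colnames(model.matrix(f_simple, toy))

# Positions of the coefficients that involve Ai (these survive d/dAi)
which(grepl("Ai", cols_full))     # 2 6 7 8, matching indFixed in the question
which(grepl("Ai", cols_simple))   # 2 6
```

Under that reading, a dForm matching the simplified model would be something like list(fixed = ~ 1 + zi.2, indFixed = c(2, 6), random = ~ 1, indRandom = 2); this is untested against the actual data.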

Resources