nlme with correlation structure not fitting and crashes R - r

I have a mixed model with a non-linear term, so I would like to use the R package nlme instead of lme. However, switching to nlme, even without adding anything to the model, causes Rstudio and R to crash.
I have found that even generated data, which can easily be fitted using lme, causes this behaviour (on my computer).
Let's start by loading the libraries and setting up a data.frame with the grouping id and spatial coordinate x.
library(nlme)
nid <- 300
nx <- 10
data <- expand.grid(
x = seq(nx),
id = seq(nid)
)
Now, let's add correlated error and uncorrelated error as separate columns, as well as a random intercept value per id. The output of arima.sim requires a normalisation step.
data$ec <- c(
replicate(
nid,
as.numeric(
arima.sim(
model = list(
order = c(1, 0, 0),
ar = 0.5
),
n = nx
)
)
)
)
data$ec <- data$ec / sd(data$ec)
data$eu <- rnorm(nid * nx)
data$random <- rep(rnorm(nid), each = nx)
Now, we can create 3 dependent variables, for 3 models. The first is a mixed model with uncorrelated (regular) error. The second includes an exponential (AR1) correlation structure. The third combines both. I am adding an intercept of 1, an sd of the random effect of 2 and an sd of the total residual error of 3.
data$y1 <- 1 + 2 * data$random + 3 * data$eu
data$y2 <- 1 + 2 * data$random + 3 * data$ec
data$y3 <- 1 + 2 * data$random + sqrt(8) * data$ec + sqrt(1) * data$eu
All of the following lme models fit without problem, giving the expected result.
l1 <- lme(
fixed = y1 ~ 1,
random = ~ 1 | id,
data = data,
method = "ML"
)
l2 <- lme(
fixed = y2 ~ 1,
random = ~ 1 | id,
correlation = corExp(
form = ~ x | id
),
data = data,
method = "ML"
)
l3 <- lme(
fixed = y3 ~ 1,
random = ~ 1 | id,
correlation = corExp(
form = ~ x | id,
nugget = TRUE
),
data = data,
method = "ML"
)
As far as I know, the following nlme code specifies exactly the same models as above. The first runs without issues. But the ones with a correlation structure crash R / RStudio. No warning or error message is provided. Fiddling with the arguments and with nlmeControl does not help, though I do think nlmeControl could be the place to search for a solution.
nlme(
model = y1 ~ b0,
fixed = b0 ~ 1,
random = b0 ~ 1,
group = ~ id,
data = data,
start = list(
fixed = fixed.effects(l1),
random = setNames(random.effects(l1), "b0")
),
method = "ML"
)
nlme(
model = y2 ~ b0,
fixed = b0 ~ 1,
random = b0 ~ 1,
group = ~ id,
correlation = corExp(
form = ~ x
),
data = data,
start = list(
fixed = fixed.effects(l2),
random = setNames(random.effects(l2), "b0")
),
method = "ML"
)
nlme(
model = y3 ~ b0,
fixed = b0 ~ 1,
random = b0 ~ 1,
group = ~ id,
correlation = corExp(
form = ~ x,
nugget = TRUE
),
data = data,
start = list(
fixed = fixed.effects(l3),
random = setNames(random.effects(l3), "b0")
),
method = "ML"
)
Has anyone experienced this before? Does my example code give the same problem on your computer? What are good strategies to change nlmeControl to attempt to remedy this?

Related

Posterior distribution for difference of two proportions with brms in R

I would like to produce a posterior distribution for the difference of two groups. I can do this in JAGS in R, but am hoping to move into this century and replicate this with Stan. I am using brms.
My questions are:
Have I correctly produced the posterior distribution for the effect size?
If not, should I use one of the other functions suggested in the answer here?
How can I account for a prior effect size that could be negative? I imagine this requires using a non-beta prior, but am not sure what would be preferable.
The below code specifies a control and intervention group, with a control and intervention group event rate. It generates beta distributions for each group (step 1), prior distributions (step 2), builds a binomial model (step 3), and posteriors (step 4).
I am reasonably confident what I’ve done in the above steps is correct. I am a bit uncertain as to what the values generated by as_draws_df are - namely the b_Intercept_p vs. b_groupint_p. Further reading on these forums suggested subtracting b_groupint_p from b_Intercept_p; which does produce a plot that could plausibly be the posterior interval (labelled new_p when plotting the alt.graph object), but I am pretty uncertain on this point and would appreciate any clarification.
# Setup
## Packages
library(tidyverse)
library(ProbBayes)
library(brms)
library(tidybayes)
## Options
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
# ***
# Setup simulation
# ***
# Trial data
con.n = 150
con.event = 50
int.n = 150
int.event = 30
# Distributions
## x1 is the expected control event rate at probability p.x1
## x2 is the control event rate at probability p.x2
x1 = 0.4
p.x1 = 0.5
x2 = 0.6
p.x2 = 0.9
## As above, but for the difference between intervention and control group
y1 = 0.05
p.y1 = 0.5
y2 = 0.1
p.y2 = 0.9
# Model setup
## For brm()
n.chains = 3
n.iter = 4000
n.warmup = 500
## For beta.select(); based on model so the number of observations are equal
n.draws = n.iter - n.warmup
# ***
# Simulation starts here
# ***
# Define Functions
## Log transformation
fun_log.trans = function(x) {
log.trans = log(x / (1 - x))
log.trans
}
## Inverse log transformation
fun_invlog.trans = function(x) {
invlog.trans = exp(x) / (1 + exp(x))
invlog.trans
}
# Run things
# 0. Put data into dataframe
data = data.frame(group = c("int" , "con"),
n = c(int.n , con.n),
event = c(int.event, con.event))
# 1. Generate prior distributions
## Beta prior for the control group event rate
beta0.val = beta.select(list(x = x1, p = p.x1),
list(x = x2, p = p.x2))
p0.sim = rbeta(n.draws, beta0.val[1], beta0.val[2])
### Log transform it
theta0.sim = fun_log.trans(p0.sim)
## Prior distribution for the difference in logit-p for each group
beta1.val = beta.select(list(x = y1, p = p.y1),
list(x = y2, p = p.y2))
p1.sim = rbeta(n.draws, beta1.val[1], beta1.val[2])
### Log transform
theta1.sim = fun_log.trans(p1.sim)
## 2. Create a matrix to store priors
priors = get_prior(family = binomial,
event | trials(n) ~ group,
data = data)
### Get characteristics of the normal distribution for the priors generated in step 1
theta0.val = c(mean(theta0.sim), sd(theta0.sim))
theta1.val = c(mean(theta1.sim), sd(theta1.sim))
### Input the these characteristics into the prior matrix generated at the start of step 2
priors$prior[3] = paste("normal(", theta0.val[1], ", ", theta0.val[2], ")", sep = "")
priors$prior[2] = paste("normal(", theta1.val[1], ", ", theta1.val[2], ")", sep = "")
# 3. Generate model with stan
model = brm(family = binomial,
event | trials(n) ~ group,
data = data,
prior = priors,
iter = n.iter,
warmup = n.warmup,
chains = n.chains,
refresh = 0)
## Plot model
plot(model)
print(model)
# 4. Generate posteriors
posteriors = as_draws_df(model)
## Inverse log transform function on theta to get p again
posteriors = posteriors %>%
mutate(across(starts_with("b_"), .f = fun_invlog.trans, .names = "{.col}_p")) %>%
rename_with(~paste0(., "_theta"), .cols = starts_with("b_") & !ends_with("_p"))
## 95% posterior interval estimate
quantile(posteriors$b_Intercept_p, c(0.025, 0.975))
# 5. Plot posterior densities
## Take the posterior interval data and bind it with the priors
## Ideally, n.draws = number of iterations - warmup
graph = posteriors %>%
select(ends_with("_p")) %>%
cbind(p0.sim, p1.sim) %>%
pivot_longer(cols = everything(),
names_to = "distribution",
values_to = "value")
alt.graph = posteriors %>%
mutate(new_p = b_Intercept_p - b_groupint_p) %>%
select(new_p) %>%
cbind(p0.sim, p1.sim)
quantile(alt.graph$new_p, c(0.025, 0.975))
alt.graph = alt.graph %>%
pivot_longer(cols = everything(),
names_to = "distribution",
values_to = "value")
## Plot it
alt.graph %>%
ggplot(aes(x = value)) +
geom_density(aes(fill = distribution),
alpha = 0.5) +
theme_light()

brms: how do I set prior on categorical variable?

I am building a binomial regression model using 2 categorical variables. This is from an example in the book, Statistical rethinking. In the book, while using the rethinking package, we can set priors on each categorical variable as shown below
m11.5 <- ulam(
alist(
pulled_left ~ dbinom( 1 , p ) ,
logit(p) <- a[actor] + b[treatment] ,
a[actor] ~ dnorm( 0 , 1.5 ),
b[treatment] ~ dnorm( 0 , 0.5 )
) ,
data=d , chains=4 , log_lik=TRUE )
I am trying to do the same in brms.
priors <- c(prior(normal(0, 1.5), class = b, coef = "actor"),
prior(normal(0, 0.5), class = b, coef = "treatment"))
m11.5 <- brm(data = d, family = binomial,
formula = pulled_left | trials(1) ~ 1 + actor + treatment,
sample_prior = T, prior = priors,
cores = 4, chains = 4)
I would like to set priors for all of the actor levels and the treatment levels mentioned once. However the above code doesn't go through with the following error message,
Upon using get_prior, I see the following (implying that the levels are internally split)
I donot want to specify the prior for the each level of the categorical variables. How do I do it? Please advice.
You can just paste and set multiple coefficients:
priors <- c(prior_string("normal(0, 1.5)", class = "b", coef = paste("actor", 2:7, sep="")),
prior_string("normal(0, 0.5)", class = "b", coef = paste("treatment", 2:4, sep="")))

Creating Survival Trees with MST package: Undefined Columns Error?

I am trying to create a survival Tree with the MST package from R. I have been looking into this paper.
I replicated their example with randomly generated Data and it works just fine. I adjusted my data to fit the same model. My data has the same columns and the same datatypes.
I keep getting this error:
Error in `[.data.frame`(mf_data[col.split.var], , 3) : undefined columns selected
with the following line of code:
fit <- MST(formula = Surv(time,status) ~ x1 + | id), data = data)
I have looked through all of the documentation and I didnt find anything and I can't understand why this error appears.
The code form the paper looks like this:
set.seed(186117)
data <- rmultime(N = 200, K = 4, beta = c(-1, 0.8, 0.8, 0, 0),cutoff = c(0.5, 0.3, 0, 0), model = "marginal.multivariate.exponential", rho = 0.65)$dat
test <- rmultime(N = 100, K = 4, beta = c(-1, 0.8, 0.8, 0, 0), cutoff = c(0.5, 0.3, 0, 0), model = "marginal.multivariate.exponential",rho = 0.65)$dat
fit <- MST(formula = Surv(time, status) ~ x1 + x2 + x3 + x4 | id,data, test, method = "marginal", minsplit = 100, minevents = 20,selection.method = "test.sample")
I tried running your code and I do get an error although not the one you are getting and I'm fairly sure after looking at it that you need to use the [edit] features of SO to modify your question.
> fit <- MST(formula = Surv(time,status) ~ x1 + | id), data = data)
Error: unexpected '|' in "fit <- MST(formula = Surv(time,status) ~ x1 + |"
The formula give is obviously wrong and there is an unnecesary closing parentheses. I am able to get teh error you report with:
> fit <- MST(formula = Surv(time,status) ~ x1 | id, data = data)
[1] "No test sample supplied, changed selection.method = 'bootstrap'"
Error in `[.data.frame`(mf_data[col.split.var], , 3) :
undefined columns selected
.... but not with the original code:
fit <- MST(formula = Surv(time, status) ~ x1 + x2 + x3 + x4 | id,data, test, method = "marginal", minsplit = 100, minevents = 20,selection.method = "test.sample")
I also see an erroir with x1+x2|id on the RHS of the formula but not with three variables:
> fit <- MST(formula = Surv(time, status) ~ x1 +x2 | id,data, test, method = "marginal", minsplit = 100, minevents = 20,selection.method = "test.sample")
Error in `[.data.frame`(mf_data[col.split.var], , 3) :
undefined columns selected
> fit <- MST(formula = Surv(time, status) ~ x1 +x2+x3| id,data, test, method = "marginal", minsplit = 100, minevents = 20,selection.method = "test.sample")
So I'm thinking is is a bug that the developers had not anticipated. Here's how to obtain the needed email address to report:
> maintainer("MST")
[1] "Peter Calhoun <calhoun.peter#gmail.com>"

Error from JM package in R

JM is a package to fit a model with joint longitudinal and survival data. I can get it to run with their example data, but I get an error with my own data. Any idea what the issue with JMfit1 or JMfit2 is?
My Data:
https://1drv.ms/u/s!AkG9wyz5G1c1gR4Vs_xohO--4Rb5
install.packages('JM')
require(JM)
?jointModel
# Example from vignette
# linear mixed model fit (random intercepts + random slopes)
fitLME <- lme(log(serBilir) ~ drug * year, random = ~ year | id, data = pbc2)
summary(fitLME)
# survival regression fit
fitSURV <- survreg(Surv(years, status2) ~ drug, data = pbc2.id, x = TRUE)
summary(fitSURV)
# joint model fit, under the (default) Weibull model
fitJOINT <- jointModel(fitLME, fitSURV, timeVar = "year")
fitJOINT
summary(fitJOINT)
# we can also include an interaction term of log(serBilir) with drug
fitJOINT <- jointModel(fitLME, fitSURV, timeVar = "year",
# interFact = list(value = ~ drug, data = pbc2.id))
fitJOINT
summary(fitJOINT)
# With my data:
data = readRDS('data.list.d1.dk.RDS')
d1 = data$d1
dk = data$dk
dim(d1); names(d1)
dim(dk); names(dk)
slct.cov = c('ID','Yi','Ai','zi.1','zi.2','zi.3','xi_A','di')
fmla.fix = as.formula('Yi ~ Ai*(zi.1+zi.2+zi.3)')
fmla.rnd = as.formula(' ~ Ai|ID')
fit.Yi = lme(fixed= fmla.fix, random=reStruct(fmla.rnd),
method="ML", data = dk[,slct.cov] )
surv.model = survreg(Surv(xi_A, di) ~ zi.1+zi.2+zi.3, data = d1[,slct.cov], x = TRUE)
JMfit1 = jointModel(lmeObject = fit.Yi, survObject = surv.model, timeVar = 'Ai')
# Error in if (t1 || t2) { : missing value where TRUE/FALSE needed
dForm <- list(fixed = ~ 1 + zi.1 + zi.2 + zi.3, indFixed = c(2,6,7,8), random = ~ 1, indRandom = 2)
JMfit2 = jointModel(lmeObject = fit.Yi, survObject = surv.model, timeVar = 'Ai',
derivForm = dForm, parameterization = c("both")) #"both", "value", "slope"
# method = "weibull-PH-aGH",
# "weibull-PH-aGH", "weibull-PH-GH", "weibull-AFT-aGH","weibull-AFT-GH",
# "piecewise-PH-aGH", "piecewise-PH-GH", "Cox-PH-aGH", "Cox-PH-GH",
# "spline-PH-aGH", "spline-PH-GH", # "ch-Laplace"
# interFact = NULL, lag = 0, scaleWB = NULL,
# CompRisk = FALSE, init = NULL, control = list())
It looks to me that your fmla.fix model needs more investigation in its own right maybe.
Simplifying the interactions to Yi ~ Ai+zi.1+zi.3 + Ai*(zi.2) or even Yi ~ Ai+zi.1+zi.2+zi.3 seem to give a valid JMfit1 output.
I suspect you'll get a different error for JMfit2 (do you??), so that may be a subsequent SO question.

mgcv::gamm() and MuMIn::dredge() errors

I've been trying to fit multiple GAMs using the package mgcv within a function, and crudely select the most appropriate model through model selection procedures. But my function runs the first model then doesn't seem to recognise the input data dat again.
I get the error
Error in is.data.frame(data) : object 'dat' not found.
I think this is a scoping problem and I've looked here, and here for help but cannot figure it out.
Code and data are as follows (hopefully reproducible):
https://github.com/cwaldock1/Help/blob/master/test_gam.csv
library(mgcv)
# Function to fit multiple models
best.mod <- function(dat) {
# Set up control structure
ctrl <- list(niterEM = 0, msVerbose = TRUE, optimMethod="L-BFGS-B")
# AR(1)
m1 <- get.models(dredge(gamm(Temp ~ s(Month, bs = "cc") + s(Date, bs = 'cr') + Year,
data = dat, correlation = corARMA(form = ~ 1|Year, p = 1),
control = ctrl)), subset=1)[[1]]
# AR(2)
m2 <- get.models(dredge(gamm(Temp ~ s(Month, bs = "cc") + s(Date, bs = 'cr') + Year,
data = dat, correlation = corARMA(form = ~ 1|Year, p = 2),
control = ctrl)), subset=1)[[1]]
# AR(3)
m3 <- get.models(dredge(gamm(Temp ~ s(Month, bs = "cc") + s(Date, bs = 'cr') + Year,
data = dat, correlation = corARMA(form = ~ 1|Year, p = 3),
control = ctrl)), subset = 1)[[1]]
### Select best model to work with based on unselective AIC criteria
if(AIC(m2$lme) > AIC(m1$lme)){mod = m1}else{mod = m2}
if(AIC(mod$lme) > AIC(m3$lme)){mod = m3}else{mod = mod}
return(mod$gam)
}
mod2 <- best.mod(dat = test_gam)
Any help would be greatly appreciated.
Thanks,
Conor
get.models evaluates in model's formula environment, which in gamm is
(always?) .GlobalEnv, while it should be function's environment (i.e.
sys.frames(sys.nframe())).
So, instead of
get.models(ms, 1)
use
eval(getCall(ms, 1))

Resources