The data in the following example are from here:
library(tidyverse)
library(lme4)
dat <- read.table("aids.dat2", header = TRUE) %>%
filter(day <= 90) %>%
mutate(log10copy = log10(lgcopy)) %>%
na.omit()
> head(dat)
patid day cd4 lgcopy cd8 log10copy
2 11542 2 159.84 4.361728 619.38 0.6396586
3 11542 7 210.60 3.531479 666.90 0.5479566
4 11542 16 204.12 2.977724 635.04 0.4738844
5 11542 29 172.48 2.643453 407.68 0.4221716
6 11542 57 270.94 2.113943 755.78 0.3250933
8 11960 2 324.72 3.380211 856.08 0.5289438
Running the following code gives me the error: Error in eval(expr, envir, enclos) : object 'log10copy' not found, even though log10copy is clearly one of the columns in my data set.
lme4.fit <- lme4::nlmer(log10copy ~ exp(p1-b1*day) + exp(p2-b2*day + 1) +
(1|p1) + (1|b1) + (1|p2) + (1|b2), data = dat)
I want to fit a model with 4 fixed effects on p1, b1, p2, b2 and 4 random effects on the same set of parameters.
You have several problems here...
1) The starting values must be a named vector
2) The data argument in nlmer should receive dat as its value, not aids.dat as in your example
start <- c(p1 = 10, b1 = 0.5, p2 = 6, b2 = 0.005)
lme4.fit <- lme4::nlmer(log10copy ~ exp(p1-b1*day) + exp(p2-b2*day + 1) ~
(p1|patid) + (b1|patid) + (p2|patid) + (b2|patid), data = dat,
start = start)
This will now trigger the following error:
Error: is.matrix(gr <- attr(val, "gradient")) is not TRUE
As explained in the documentation:
Currently, the Nonlin(..) formula part must not only return a numeric
vector, but also must have a "gradient" attribute, a matrix. The
functions SSbiexp, SSlogis, etc, see selfStart, provide this (and
more). Alternatively, you can use deriv() to automatically produce
such functions or expressions.
You can then adapt the example provided in the documentation:
## a. Define formula
nform <- ~ exp(p1-b1*input) + exp(p2-b2*input + 1)
## b. Use deriv() to construct function:
nfun <- deriv(nform, namevec=c("p1", "b1", "p2", "b2"),
function.arg=c("input","p1", "b1", "p2", "b2"))
lme4.fit <- lme4::nlmer(log10copy ~ nfun(day, p1, b1, p2, b2) ~
(p1|patid) + (b1|patid) + (p2|patid) + (b2|patid), data = dat,
start = start)
You will then get the following error:
Error in fn(nM$xeval()) : prss failed to converge in 300 iterations
This might mean that your model is too complex for your data...
Or maybe I made a mistake in the specification, as I don't know nlmer very well (I just tried to apply the documentation...), nor do I know your model/question.
When you change the optimizer, the convergence problems seem to be gone...
See here for recommendations about "troubleshooting" (including convergence problems) with lme4
lme4.fit <- lme4::nlmer(log10copy ~ nfun(day, p1, b1, p2, b2) ~
(p1|patid) + (b1|patid) +
(p2|patid) + (b2|patid),
data = dat,
start = start,
control = nlmerControl(optimizer = "bobyqa"))
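Once the model converges, it can be inspected with the usual lme4 accessors; a minimal sketch:
summary(lme4.fit)             # fixed-effect table and variance components
fixef(lme4.fit)               # estimates of p1, b1, p2 and b2
head(ranef(lme4.fit)$patid)   # per-patient deviations in the four parameters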
In a Cox regression framework, I'd like to implement piecewise continuous time-dependent coefficients. This is for variables that do not satisfy the PH assumption.
In this vignette https://cran.r-project.org/web/packages/survival/vignettes/timedep.pdf, there are examples for step functions (p. 17) and for coefficients with some functional form (p. 20).
What I'd like is to somehow have a piecewise relationship. Using the example provided in the vignette:
library(survival)
vfit <- coxph(Surv(time, status) ~ trt + prior + karno, veteran)
zp <- cox.zph(vfit, transform= function(time) log(time +20))
## Step functions
vet2 <- survSplit(Surv(time, status) ~ ., data= veteran, cut=c(90, 180),
episode= "tgroup", id="id")
vfit2 <- coxph(Surv(tstart, time, status) ~ trt + prior +
karno:strata(tgroup), data=vet2)
## Functional form
vfit3 <- coxph(Surv(time, status) ~ trt + prior + karno + tt(karno),
data=veteran,
tt = function(x, t, ...) x * log(t+20))
plot(zp[3])
abline(coef(vfit3)[3:4], col=2)
From the plot (also on p. 21 in the vignette), we might argue that there is a similar but inverted trend from approximately time = 200. I've tried to capture this, but without success.
First, I tried a piecewise function directly via the tt argument, but it does not give two sets of coefficients, only one coefficient for karno and one for tt(karno). What I want is something like a*x + b for t < 200 and c*x + d for t >= 200:
vfit3 <- coxph(Surv(time, status) ~ trt + prior + karno + tt(karno),
data=veteran,
tt = function(x, t, t1, t2, ...) x * log(t1+20) * (t<200) +
x * t2 * (t>=200))
So in a second step, I tried to mix both approaches: split the data into two time periods as for the step functions, and then fit a functional form in each. But this gives an error:
vfit3 <- coxph(Surv(tstart, time, status) ~ trt + prior +
(karno + tt(karno)):strata(tgroup),
data=vet2,
tt = function(x, t, ...) x * log(t+20) * (t<200) -
x * t * (t>=200))
Does someone know how to implement this?
EDIT:
This is what I've come up with:
library(survival)
## Original model
m1 <- coxph(formula = Surv(time, status) ~ trt + prior + karno,
data = veteran)
## Transform to long format as in the link
vet1 <- survSplit(Surv(time, status)~., data = veteran, id = "id",
cut = unique(veteran$time))
## Add a grouping variable (strata) for time before 200 days and after.
vet1$tgroup <- ifelse(vet1$time < 200, 1, 2)
## Add a time-transform function
## Here it is the same function for both strata, but they could be different
## e.g. ifelse(vet1$time < 200, f1(time), f2(time))
## Actually not sure, as we need to be careful with the time scale... Anyway
vet1$time1 <- log(vet1$time + 20)
## Same model as in the link, but then add an interaction with the strata
m2 <- coxph(formula = Surv(tstart, time, status)~
trt + prior + (karno + karno:time1):strata(tgroup), data = vet1)
## Some plots as in the vignette
zp <- cox.zph(m1, transform = function(time) log(time +20))
plot(zp[3])
abline(coef(m2)[c(3,5)], col="tomato")
abline(coef(m2)[c(4,6)], col="tomato")
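As noted in the comments above, the time transform could also differ between the two strata. A minimal sketch of that idea; the post-200-day form (sqrt) is purely hypothetical and only illustrates the mechanics:
## hypothetical: log(t + 20) before day 200, sqrt(t) from day 200 onwards
vet1$time1 <- ifelse(vet1$time < 200, log(vet1$time + 20), sqrt(vet1$time))
m3 <- coxph(formula = Surv(tstart, time, status) ~
trt + prior + (karno + karno:time1):strata(tgroup), data = vet1)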
I have a dataset to which I want to fit a Gompertz model, grouped by 4 different factors (subject, race, target & distractor). The Gompertz model works when applied to the entire data set (i.e., without applying "group_by"). The group_by function works when I use a (much simpler) linear regression. However, when I try to use group_by with the Gompertz model I get the following error:
Error in chol2inv(object$m$Rmat()) :
element (3, 3) is zero, so the inverse cannot be computed
In addition: Warning messages:
1: In nls(yt ~ ymin + ymax * (exp(-exp((alpha * 2.718282/ymax) * (lambda - :
Convergence failure: false convergence (8)
2: In nls(yt ~ ymin + ymax * (exp(-exp((alpha * 2.718282/ymax) * (lambda - :
Convergence failure: singular convergence (7)
Here is the code:
grouped_data = all_merged %>%
group_by(subject,race,target,distractor)
gomp_fits = do(grouped_data, tidy(nls(yt ~ ymin+ymax*(exp(-exp((alpha* 2.718282/ymax)*(lambda-time)+1))), data = ., start = list(lambda = 0.480, alpha = 5.8, ymin = 0, ymax = 1.6),
control = list(warnOnly = TRUE),
algorithm = "port",
lower = c(0,-Inf, -Inf, 0),
upper= c(2, Inf, Inf, 2))))
Thank you!
TLDR
Consider nlsLM with a self-starting Gompertz model (or a method to calculate starting values), and use it in a group_modify() workflow.
Maybe something like this (though the upper and lower limits may not be necessary):
library(minpack.lm)  # nlsLM()
library(broom)       # tidy()
library(dplyr)       # %>%, group_by(), group_modify()
fit_gomp <- function(data, ...) {
nlsLM(formula = y ~ SSgompertz(x, Asym, b2, b3),
data = data,
# if you do use bounds, supply one value per SSgompertz parameter (Asym, b2, b3)
# lower = c(...), upper = c(...),
...) %>% tidy()
}
data %>%
group_by(subject, race, target, distractor) %>%
group_modify(~ fit_gomp(data = .x), .keep = TRUE)
Getting starting values
While I haven't used a Gompertz model, consider if you can find a way to get starting values mathematically.
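For the Gompertz case itself, R's self-starting model already encodes such a rule: getInitial() runs the SSgompertz initializer and returns the starting estimates so you can inspect them before fitting. A minimal sketch using the question's yt and time columns (the single-group subset is only for illustration):
## starting values SSgompertz would compute for one group
one_group <- subset(all_merged, subject == unique(subject)[1])
getInitial(yt ~ SSgompertz(time, Asym, b2, b3), data = one_group)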
For example, let's say I want to fit a quadratic-plateau model (it only has 3 starting parameters however). First I have a function that defines the equation, which will go inside nls later.
# y = b0 + b1x + b2x^2
# b0 = intercept
# b1 = slope
# b2 = quadratic term
# jp = join point = critical concentration
quadp <- function(x, b0, b1, jp) {
b2 <- -0.5 * b1 / jp
# if_else() comes from dplyr; base ifelse() works just as well here
if_else(
condition = x < jp,
true = b0 + (b1 * x) + (b2 * x * x),
false = b0 + (b1 * jp) + (b2 * jp * jp)
)
}
The second part is to make a fitting function that fits a quadratic polynomial, uses those coefficients as starting values in the nls portion, and fits the nls model.
fit_quadp <- function(data, ...) {
# get starting values from simple quadratic
start <- lm(y ~ poly(x, 2, raw = TRUE), data = data)
start_values <- list(b0 = start$coef[[1]], # intercept
b1 = start$coef[[2]], # slope
jp = median(data$x)) # join-point
# nls model that uses those starting values
nlsLM(formula = y ~ quadp(x, b0, b1, jp),
data = data,
start = start_values,
...
) %>% tidy()
}
The ... is to add arguments for nls.control if needed.
Analyzing grouped data
As for analyzing grouped data, I use group_modify() because it returns a data frame whereas group_map() returns a list. So my basic workflow looks like:
dataset %>%
group_by(grouping_variable_1, grouping_variable_2, ...) %>%
group_modify(~ fit_quadp(data = .x), .keep = TRUE)
Then out comes a table with all the tidy statistics, because tidy() was used in the function. You can consider wrapping a try() around the nls() portion of the function so that if it succeeds on the first two groups but fails on the third, it will still continue and you should still get some results, as sketched below.
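A minimal sketch of that try() idea, reusing the SSgompertz approach from above (x and y are placeholder column names; groups whose fit fails contribute an empty tibble instead of stopping the whole pipeline):
library(dplyr)
library(minpack.lm)
library(broom)
safe_fit_gomp <- function(data, ...) {
fit <- try(nlsLM(y ~ SSgompertz(x, Asym, b2, b3), data = data, ...),
silent = TRUE)
if (inherits(fit, "try-error")) return(tibble())  # skip groups that fail
tidy(fit)
}
## usage:
## all_merged %>%
##   group_by(subject, race, target, distractor) %>%
##   group_modify(~ safe_fit_gomp(data = .x), .keep = TRUE)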
nlsLM()
Also, if you want to use nlsLM() from minpack.lm, its Levenberg-Marquardt algorithm converges more reliably than the algorithms available in nls(). Some worry about false convergence, but I haven't seen it yet in my applications. Also, with nlsLM you may not need to bother with upper and lower limits, though they can still be set.
I am using R-INLA to run the following model (Treatment, Animal.1 and Animal.2 are factors and Encounter.Length is continuous):
formula <- Encounter.Length ~ Treatment +f(Animal.1, model = "iid", n = n.animal) +
f(Animal.2, copy = "Animal.1")
m.1 <- inla(formula, data = inla.dat)
However, after running this code I get the following error message:
Error in inla(formula, data = inla.dat) :
In f(Animal.1): 'covariate' must match 'values', and both must either be 'numeric', or 'factor'/'character'.
I am new to using INLA and want to know what this error message means and how to fix it.
Answer (from r-inla.help): The levels of B are not a subset of the levels of A (which is used to define the model that B copies). So you must define the model on the union of the levels.
For example:
n <- 3
A <- as.factor(letters[1:n])
B <- as.factor(letters[1+1:n])
y <- 1:n
This does not work:
inla(y ~ -1 + f(A) + f(B, copy = "A"), data = data.frame(A, B))
But this does:
values <- as.factor(unique(c(levels(A), levels(B))))
inla(y ~ -1 + f(A, values = values) + f(B, copy = "A"),
data = list(A = A, B = B, values = values))
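Translated back to the original model, the fix would look roughly like this (a sketch; it assumes Animal.1 and Animal.2 are factors in inla.dat and, following the toy example above, passes the data as a list so that the values vector is visible to f()):
animal.values <- as.factor(unique(c(levels(inla.dat$Animal.1),
levels(inla.dat$Animal.2))))
formula <- Encounter.Length ~ Treatment +
f(Animal.1, model = "iid", values = animal.values) +
f(Animal.2, copy = "Animal.1")
m.1 <- inla(formula,
data = c(as.list(inla.dat), list(animal.values = animal.values)))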
> library("lmtest")
> a = arima.sim(list(ar = c(.05, -.05)), 1000)
> b = arima(a, order = c(2, 0, 0))
> resettest(b)
Error in terms.default(formula) : no terms component nor attribute
Question 1. What I am doing is shown above. What should I do about this error?
(I have tried specifying the type, data and power arguments in resettest(); the result is the same.)
Question 2. If I want to do the same thing on the model below,
y_t = 0.5 + 0.5*y_(t-1) - 0.5*y_(t-2) + 0.1*y_(t-1)^2 + e_t
which is an AR(2) model plus the 0.1*y_(t-1)^2 term, how do I fit this nonlinear model (using R, thank you!)?
The issue is that the first argument of resettest is
formula - a symbolic description for the model to be tested (or a fitted "lm" object).
So, passing an Arima object is not going to work. Instead we may manually define the lagged variables and provide an lm object or just the formula:
la1 <- Hmisc::Lag(a, 1)
la2 <- Hmisc::Lag(a, 2)
resettest(a ~ la1 + la2)
#
# RESET test
#
# data: a ~ la1 + la2
# RESET = 0.10343, df1 = 2, df2 = 993, p-value = 0.9018
Now your second model is nonlinear in variables but linear in parameters, so the same estimation methods still apply. (I'm assuming that the true DGP remains the same and you just want to test a new specification.) In particular,
resettest(a ~ la1 + la2 + I(la2^2))
#
# RESET test
#
# data: a ~ la1 + la2 + I(la2^2)
# RESET = 0.089211, df1 = 2, df2 = 992, p-value = 0.9147
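And if you want to estimate that specification rather than just RESET-test it, ordinary least squares applies because the model is linear in the parameters; a quick sketch with the same lagged variables (rows with NA lags are dropped automatically):
fit <- lm(a ~ la1 + la2 + I(la2^2))
summary(fit)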
I'm trying to get an lme with self-constructed interaction variables to fit. I need those for post-hoc analysis.
library(nlme)
# construct fake dataset
obsr <- 100
dist <- rep(rnorm(36), times=obsr)
meth <- dist+rnorm(length(dist), mean=0, sd=0.5); rm(dist)
meth <- meth/dist(range(meth)); meth <- meth-min(meth)
main <- data.frame(meth = meth,
cpgl = as.factor(rep(1:36, times=obsr)),
pbid = as.factor(rep(1:obsr, each=36)),
agem = rep(rnorm(obsr, mean=30, sd=10), each=36),
trma = as.factor(rep(sample(c(TRUE, FALSE), size=obsr, replace=TRUE), each=36)),
depr = as.factor(rep(sample(c(TRUE, FALSE), size=obsr, replace=TRUE), each=36)))
# check if all factor combinations are present
# TRUE for my real dataset; Naturally TRUE for the fake dataset
with(main, all(table(depr, trma, cpgl) >= 1))
# construct interaction variables
main$depr_trma <- interaction(main$depr, main$trma, sep=":", drop=TRUE)
main$depr_cpgl <- interaction(main$depr, main$cpgl, sep=":", drop=TRUE)
main$trma_cpgl <- interaction(main$trma, main$cpgl, sep=":", drop=TRUE)
main$depr_trma_cpgl <- interaction(main$depr, main$trma, main$cpgl, sep=":", drop=TRUE)
# model WITHOUT preconstructed interaction variables
form1 <- list(fixd = meth ~ agem + depr + trma + depr*trma + cpgl +
depr*cpgl +trma*cpgl + depr*trma*cpgl,
rndm = ~ 1 | pbid,
corr = ~ cpgl | pbid)
modl1 <- nlme::lme(fixed=form1[["fixd"]],
random=form1[["rndm"]],
correlation=corCompSymm(form=form1[["corr"]]),
data=main)
# model WITH preconstructed interaction variables
form2 <- list(fixd = meth ~ agem + depr + trma + depr_trma + cpgl +
depr_cpgl + trma_cpgl + depr_trma_cpgl,
rndm = ~ 1 | pbid,
corr = ~ cpgl | pbid)
modl2 <- nlme::lme(fixed=form2[["fixd"]],
random=form2[["rndm"]],
correlation=corCompSymm(form=form2[["corr"]]),
data=main)
The first model fits without any problems, whereas the second model gives me the following error:
Error in MEEM(object, conLin, control$niterEM) :
Singularity in backsolve at level 0, block 1
Nothing I have found about this error so far has helped me solve the problem. However, the solution is probably pretty easy.
Can someone help me? Thanks in advance!
EDIT 1:
When I run:
modl3 <- lm(form1[["fixd"]], data=main)
modl4 <- lm(form2[["fixd"]], data=main)
The summaries reveal that modl4 (with the self-constructed interaction variables), in contrast to modl3, has many more predictors. All those that are in modl4 but not in modl3 show NA as coefficients. The problem therefore definitely lies in the way I create the interaction variables...
EDIT 2:
In the meantime I created the interaction variables "by hand" (mainly paste() and grepl()), and it seems to work now. However, I would still be interested in how I could have achieved it using the interaction() function.
I should have constructed only the largest of the interaction variables (combining all 3 simple variables).
If I do so, the model fits. The likelihoods are then very close to each other and the number of coefficients matches exactly.
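In code, one way to read that fix is to keep only the three-way interaction() variable in the fixed effects (a sketch; adjust to your actual model):
## construct only the largest interaction variable
main$depr_trma_cpgl <- interaction(main$depr, main$trma, main$cpgl, sep=":", drop=TRUE)
form3 <- list(fixd = meth ~ agem + depr_trma_cpgl,
rndm = ~ 1 | pbid,
corr = ~ cpgl | pbid)
modl5 <- nlme::lme(fixed=form3[["fixd"]],
random=form3[["rndm"]],
correlation=corCompSymm(form=form3[["corr"]]),
data=main)
## one coefficient per depr:trma:cpgl cell plus agem, which should match
## the fixed-effect count of modl1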