I want to fit a coxph model in R and offset one main effect and the interaction term.
I tried putting the offset() command in front of each variable, but it gives the error: Error in model.frame.default(formula = Surv(time = X, event = Delta) ~ :
variable lengths differ (found for 'offset(D:Y)')
We can generate some toy data like this:
require(survival)
D = rbinom(100, 1, 0.5)
Y = rbinom(100, 1, 0.5)
X = rexp(100, 1)
Delta = rbinom(100, 1, 0.5)
coxph(Surv(time = X, event = Delta) ~ D + offset(Y) + offset(D:Y))
I want to offset both Y and D:Y, but it keeps giving me this error. Maybe I am using offset incorrectly.
Although you write D:Y inside a formula, the argument of offset() is evaluated as an ordinary R expression rather than parsed with formula semantics, so ":" is treated as the integer-sequence operator instead of the interaction operator. The warning message makes this explicit:
Error in model.frame.default(formula = Surv(time = X, event = Delta) ~ :
variable lengths differ (found for 'offset(D:Y)')
In addition: Warning messages:
1: In D:Y : numerical expression has 100 elements: only the first used
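As a quick check, you can reproduce this outside the model call with the toy data above:
# ":" evaluated outside a formula is the sequence operator, so D:Y
# becomes seq(D[1], Y[1]) and R warns that only the first elements are used
D:Y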
If you want to use the product of D and Y as an additional "interaction" offset, then you could have done this:
> coxph(Surv(time = X, event = Delta) ~ D + offset(Y) + offset(I(D*Y)))
Call:
coxph(formula = Surv(time = X, event = Delta) ~ D + offset(Y) +
offset(I(D * Y)))
coef exp(coef) se(coef) z p
D -0.9959 0.3694 0.2825 -3.525 0.000423
Likelihood ratio test=12.26 on 1 df, p=0.0004635
n= 100, number of events= 53
Looking at that, I suspected the I() could be dropped entirely, since offset() doesn't use formula-parsing logic anyway, and that proved to be the case.
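In other words, this also works and gives the same fit as the I() version above:
coxph(Surv(time = X, event = Delta) ~ D + offset(Y) + offset(D * Y))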
I have a dataset to which I want to fit a Gompertz model grouped by 4 different factors (subject, race, target & distractor). The Gompertz model works when applied to the entire data set (i.e., without applying group_by). The group_by function works when I use a (much simpler) linear regression. However, when I try to use group_by with the Gompertz model, I get the following error:
Error in chol2inv(object$m$Rmat()) :
element (3, 3) is zero, so the inverse cannot be computed
In addition: Warning messages:
1: In nls(yt ~ ymin + ymax * (exp(-exp((alpha * 2.718282/ymax) * (lambda - :
Convergence failure: false convergence (8)
2: In nls(yt ~ ymin + ymax * (exp(-exp((alpha * 2.718282/ymax) * (lambda - :
Convergence failure: singular convergence (7)
Here is the code:
grouped_data = all_merged %>%
  group_by(subject, race, target, distractor)
gomp_fits = do(grouped_data,
               tidy(nls(yt ~ ymin + ymax * exp(-exp((alpha * 2.718282 / ymax) * (lambda - time) + 1)),
                        data = .,
                        start = list(lambda = 0.480, alpha = 5.8, ymin = 0, ymax = 1.6),
                        control = list(warnOnly = TRUE),
                        algorithm = "port",
                        lower = c(0, -Inf, -Inf, 0),
                        upper = c(2, Inf, Inf, 2))))
Thank you!
TLDR
Consider nlsLM() with a self-starting Gompertz model (or a method to calculate starting values), used in a group_modify() workflow.
Maybe something like this (though the upper and lower limits may not be necessary):
fit_gomp <- function(data, ...) {
  # nlsLM() is from minpack.lm; tidy() is from broom.
  # SSgompertz() has three parameters (Asym, b2, b3), so any bounds
  # supplied must have length 3 (and may not be needed at all):
  nlsLM(formula = y ~ SSgompertz(x, Asym, b2, b3),
        data = data,
        lower = c(0, -Inf, -Inf),
        upper = c(2, Inf, Inf),
        ...) %>%
    tidy()
}
data %>%
group_by(subject, race, target, distractor) %>%
group_modify(~ fit_gomp(data = .x), .keep = TRUE)
Getting starting values
While I haven't used a Gompertz model, consider whether you can find a way to get starting values mathematically.
For example, let's say I want to fit a quadratic-plateau model (it only has 3 starting parameters however). First I have a function that defines the equation, which will go inside nls later.
# y = b0 + b1x + b2x^2
# b0 = intercept
# b1 = slope
# b2 = quadratic term
# jp = join point = critical concentration
quadp <- function(x, b0, b1, jp) {
  # if_else() is the vectorised version from dplyr
  b2 <- -0.5 * b1 / jp
  if_else(
    condition = x < jp,
    true = b0 + (b1 * x) + (b2 * x * x),
    false = b0 + (b1 * jp) + (b2 * jp * jp)
  )
}
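A quick visual check of the piecewise shape, with arbitrary (hypothetical) parameter values:
library(dplyr)   # for if_else() used inside quadp()
curve(quadp(x, b0 = 1, b1 = 2, jp = 5), from = 0, to = 10)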
The second part is to make a fitting function that fits a quadratic polynomial, uses those coefficients as starting values in the nls portion, and fits the nls model.
fit_quadp <- function(data, ...) {
# get starting values from simple quadratic
start <- lm(y ~ poly(x, 2, raw = TRUE), data = data)
start_values <- list(b0 = start$coef[[1]], # intercept
b1 = start$coef[[2]], # slope
jp = median(data$x)) # join-point
# nls model that uses those starting values
nlsLM(formula = y ~ quadp(x, b0, b1, jp),
data = data,
start = start_values,
...
) %>% tidy()
}
The ... is there so you can pass additional arguments, e.g. control settings, if needed.
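For instance (dat is a hypothetical data frame here; nls.lm.control() is the control constructor used by nlsLM()):
fit_quadp(data = dat, control = nls.lm.control(maxiter = 200))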
Analyzing grouped data
As for analyzing grouped data, I use group_modify() because it returns a data frame whereas group_map() returns a list. So my basic workflow looks like:
dataset %>%
group_by(grouping_variable_1, grouping_variable_2, ...) %>%
group_modify(~ fit_quadp(data = .x), .keep = TRUE)
Then out comes a table with all the tidy statistics, because tidy() was used in the function. Consider wrapping the nls() portion of the function in try(), so that if fitting succeeds on the first two groups but fails on the third, it will still continue and you should still get some results.
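A minimal sketch of that idea, reusing fit_quadp's internals (a failing group returns an empty tibble instead of stopping the whole pipeline):
fit_quadp_safe <- function(data, ...) {
  # starting values from a simple quadratic, as before
  start <- lm(y ~ poly(x, 2, raw = TRUE), data = data)
  start_values <- list(b0 = start$coef[[1]],
                       b1 = start$coef[[2]],
                       jp = median(data$x))
  # try() keeps group_modify() running when one group fails to converge
  fit <- try(nlsLM(formula = y ~ quadp(x, b0, b1, jp),
                   data = data,
                   start = start_values,
                   ...),
             silent = TRUE)
  if (inherits(fit, "try-error")) tibble::tibble() else tidy(fit)
}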
nlsLM()
Also, nlsLM() from minpack.lm tends to converge more often than the algorithms available in nls(). Some worry about false convergence, but I haven't seen it yet in my applications. With nlsLM you may also not need to bother with upper and lower limits, though they can still be set.
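For instance, a bound-free version of the earlier helper (same assumed x and y column names):
fit_gomp_simple <- function(data, ...) {
  nlsLM(y ~ SSgompertz(x, Asym, b2, b3), data = data, ...) %>% tidy()
}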
All - first-time poster here, so please be forbearing if I've violated some of the conventions for asking questions (like, for example, providing a reproducible example).
I'm trying to estimate a Generalized Additive Mixed Model using the "gamm" function with this code:
fit1.1 = gamm(opioidNonFatalOD ~ s(mandatoryReg.l2, k = 3, fx = TRUE,
bs = "cr") +
s(coalitionActive.l2, k = 3, fx = TRUE, bs = "cr") +
monthsSinceJan2011 +
everFunded +
ICD10 +
spoke5 +
hub +
s(monthly2, bs = "cc", fx = FALSE, k = 4) +
s(county2, bs = "re"),
#+ offset(log(population / 100000)),
correlation = corAR1(form = ~ monthsSinceJan2011 | county2),
data = tsData,
family = quasipoisson, offset = log(population / 100000),
niterPQL = 20,
verbosePQL = TRUE)
For some reason, it looks like the "offset" argument isn't getting passed to gammPQL. I get this error:
iteration 1
Quitting from lines 201-220 (pfs_model_experiments_041520.Rmd)
Error in lme(fixed = fixed, random = random, data = data, correlation = correlation, :
unused argument (offset = log(population/1e+05))
Calls: <Anonymous> ... withVisible -> eval -> eval -> gamm -> eval -> eval -> gammPQL
Execution halted
Here're the traceback messages:
Error in lme(fixed = fixed, random = random, data = data, correlation = correlation, : unused argument (offset = log(population/1e+05))
4. gammPQL(y ~ X - 1, random = rand, data = strip.offset(mf), family = family, correlation = correlation, control = control, weights = weights, niter = niterPQL, verbose = verbosePQL, mustart = mustart, etastart = etastart, ...) at <text>#1
3. eval(parse(text = paste("ret$lme<-gammPQL(", deparse(fixed.formula), ",random=rand,data=strip.offset(mf),family=family,", "correlation=correlation,control=control,", "weights=weights,niter=niterPQL,verbose=verbosePQL,mustart=mustart,etastart=etastart,...)", sep = "")))
2. eval(parse(text = paste("ret$lme<-gammPQL(", deparse(fixed.formula), ",random=rand,data=strip.offset(mf),family=family,", "correlation=correlation,control=control,", "weights=weights,niter=niterPQL,verbose=verbosePQL,mustart=mustart,etastart=etastart,...)", sep = "")))
1. gamm(opioidNonFatalOD ~ s(mandatoryReg.l2, k = 3, fx = TRUE, bs = "cr") + s(coalitionActive.l2, k = 3, fx = TRUE, bs = "cr") + monthsSinceJan2011 + everFunded + ICD10 + spoke5 + hub + s(monthly2, bs = "cc", fx = FALSE, k = 4) + s(county2, bs = "re"), ...
I've tried using the offset as a term in the model (see commented-out code), but get a similar error.
Just by inspecting the code, does anyone have an idea of what I'm doing wrong?
Thanks,
David
tl;dr
Create the offset outside the gamm function and then pass it into the formula with ... + offset().
In your example then use:
tsData$off = log(tsData$population/100000)
gamm(opioidNonFatalOD ~ <other variables> + s(county2, bs = "re") + offset(off),
<other stuffs>)
The general syntax for adding an offset to a GAM is to include it in the formula, like y ~ ... + x + offset(offset_variable). However, as the examples below show, gamm seems to struggle to parse function calls (i.e. the log or the division) inside offset().
Some examples:
library(mgcv)
# create some data
set.seed(1)
dat <- gamSim(6,n=200,scale=.2,dist="poisson")
# create an offset
dat$off1 = (dat$y+1)*sample(2:10, 200, TRUE)
Attempt 1: gamm finds off1 but errors, likely due to the large values in off1 (and we really want the log-transformed offset, or whatever matches the link function used):
m1 <- gamm(y~s(x0)+s(x1)+s(x2) + offset(off1),
family=poisson,data=dat,random=list(fac=~1))
Maximum number of PQL iterations: 20
iteration 1
iteration 2
Error in na.fail.default(list(Xr.1 = c(-0.00679246534326368, -0.0381904761033802,  :
  missing values in object
Attempt 2: gamm can't seem to find off1 once it is log-transformed inside the offset function:
m2 <- gamm(y~s(x0)+s(x1)+s(x2) + offset(log(off1)),
family=poisson, data=dat,random=list(fac=~1))
Maximum number of PQL iterations: 20
iteration 1
Error in eval(predvars, data, env) : object 'off1' not found
Attempt 3: define offset term outside offset function
# Success
dat$off2 = log(dat$off1)
m3 <- gamm(y~s(x0)+s(x1)+s(x2) + offset(off2),
family=poisson, data=dat, random=list(fac=~1))
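Note that gamm() returns a list with $lme and $gam components, so the fitted smooths can be inspected with, e.g.:
summary(m3$gam)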
So create the offset variable outside the formula, then pass it to the gamm formula.
The data in the following example are from here
library(tidyverse)
library(lme4)
dat <- read.table("aids.dat2", header = TRUE) %>%
filter(day <= 90) %>%
mutate(log10copy = log10(lgcopy)) %>%
na.omit()
> head(dat)
patid day cd4 lgcopy cd8 log10copy
2 11542 2 159.84 4.361728 619.38 0.6396586
3 11542 7 210.60 3.531479 666.90 0.5479566
4 11542 16 204.12 2.977724 635.04 0.4738844
5 11542 29 172.48 2.643453 407.68 0.4221716
6 11542 57 270.94 2.113943 755.78 0.3250933
8 11960 2 324.72 3.380211 856.08 0.5289438
Running the following code gives me the error: Error in eval(expr, envir, enclos) : object 'log10copy' not found, but log10copy is clearly one of the columns in my data set?
lme4.fit <- lme4::nlmer(log10copy ~ exp(p1-b1*day) + exp(p2-b2*day + 1) +
(1|p1) + (1|b1) + (1|p2) + (1|b2), data = dat)
I want to fit a model with 4 fixed effects on p1, b1, p2, b2 and 4 random effects on the same set of parameters.
You have several problems here...
1) The starting values must be a named vector.
2) The data argument in nlmer should receive dat as its value, not aids.dat as in your example.
start <- c(p1 = 10, b1 = 0.5, p2 = 6, b2 = 0.005)
lme4.fit <- lme4::nlmer(log10copy ~ exp(p1-b1*day) + exp(p2-b2*day + 1) ~
(p1|patid) + (b1|patid) + (p2|patid) + (b2|patid), data = dat,
start = start)
This will now trigger the following error:
Error: is.matrix(gr <- attr(val, "gradient")) is not TRUE
As explained in the documentation:
Currently, the Nonlin(..) formula part must not only return a numeric
vector, but also must have a "gradient" attribute, a matrix. The
functions SSbiexp, SSlogis, etc, see selfStart, provide this (and
more). Alternatively, you can use deriv() to automatically produce
such functions or expressions.
You can then adapt the example provided in the documentation:
## a. Define formula
nform <- ~ exp(p1-b1*input) + exp(p2-b2*input + 1)
## b. Use deriv() to construct function:
nfun <- deriv(nform, namevec=c("p1", "b1", "p2", "b2"),
function.arg=c("input","p1", "b1", "p2", "b2"))
lme4.fit <- lme4::nlmer(log10copy ~ nfun(day, p1, b1, p2, b2) ~
(p1|patid) + (b1|patid) + (p2|patid) + (b2|patid), data = dat,
start = start)
You will then get the following error:
Error in fn(nM$xeval()) : prss failed to converge in 300 iterations
This might mean that your model is too complex for your data...
Or maybe I made a mistake in the specification, as I don't know nlmer very well (I just tried to apply the documentation...), nor do I know your model/question.
When you change the optimizer, the convergence problems seem to be gone...
See here for recommendations about "troubleshooting" (including convergence problems) with lme4
lme4.fit <- lme4::nlmer(log10copy ~ nfun(day, p1, b1, p2, b2) ~
(p1|patid) + (b1|patid) +
(p2|patid) + (b2|patid),
data = dat,
start = start,
control = nlmerControl(optimizer = "bobyqa"))
Like in this post, I'm struggling with the notation of MCMCglmm, especially what is meant by trait. My code is the following:
library("MCMCglmm")
set.seed(123)
y <- sample(letters[1:3], size = 100, replace = TRUE)
x <- rnorm(100)
id <- rep(1:10, each = 10)
dat <- data.frame(y, x, id)
mod <- MCMCglmm(fixed = y ~ x, random = ~us(x):id,
data = dat,
family = "categorical")
This gives me the error message For error structures involving catgeorical data with more than 2 categories pleasue use trait:units or variance.function(trait):units. (sic!). If I generate dichotomous data with letters[1:2], everything works fine. So what is meant by this error message in general, and by "trait" in particular?
Edit 2016-09-29:
From the linked question I copied rcov = ~ us(trait):units into my call of MCMCglmm, and from https://stat.ethz.ch/pipermail/r-sig-mixed-models/2010q3/004006.html I took (and slightly modified) the prior
prior <- list(R = list(V = diag(2), fix = 1),
              G = list(G1 = list(V = diag(2), nu = 1, alpha.mu = c(0, 0),
                                 alpha.V = diag(2) * 100)))
Now my model actually gives results:
MCMCglmm(fixed = y ~ 1 + x, random = ~us(1 + x):id,
rcov = ~ us(trait):units, prior = prior, data = dat,
family = "categorical")
But I still don't fully understand what is meant by trait (and by units, by the notation of the prior, by us() compared to idh(), and so on).
Edit 2016-11-17:
I think trait is synonymous with "target variable" or "response" in general, or y in this case. In the formula for random there is nothing on the left side of ~ "because the response is known from the fixed effect specification." So the rationale behind requiring trait:units in rcov could be that what trait is (y in this case) is already defined by the fixed-effects formula.
units is the response variable value, and trait is the response variable name, which corresponds to the categories. By specifying rcov = ~us(trait):units, you are allowing the residual variance to be heterogeneous across "traits" (response categories) so that all elements of the residual variance-covariance matrix will be estimated.
In Section 5.1 of Hadfield's MCMCglmm Course Notes (vignette("CourseNotes", "MCMCglmm")) you can read an explanation for the reserved variables trait and units.
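On the side question about us() versus idh(): us(trait):units estimates the full residual (co)variance matrix across traits, while idh(trait):units estimates only the trait-specific variances and fixes the covariances at zero, i.e. something like:
rcov = ~ idh(trait):units   # heterogeneous variances, residual covariances fixed at 0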
I have a dataset like this
df
x y
7.3006667 -0.14383333
-0.8983333 0.02133333
2.7953333 -0.07466667
and I would like to fit an exponential function like y = a * exp(b * x).
This is what I tried and the error I get
f <- function(x,a,b) {a * exp(b * x)}
st <- coef(nls(log(y) ~ log(f(x, a, b)), df, start = c(a = 1, b = -1)))
Error in qr.qty(QR, resid) : NA/NaN/Inf in foreign function call (arg 5)
In addition: Warning messages:
1: In log(y) : NaNs produced
2: In log(y) : NaNs produced
fit <- nls(y ~ f(x, a, b), data = df, start = list(a = st[1], b = st[2]))
Error in nls(y ~ exp(a + b * x), data = df, start = list(a = st[1], :
singular gradient
I believe it has to do with the fact that the log is not defined for negative numbers but I don't know how to solve this.
I'm having trouble seeing the problem here.
f <- function(x,a,b) {a * exp(b * x)}
fit <- nls(y~f(x,a,b),df,start=c(a=1,b=1))
summary(fit)$coefficients
# Estimate Std. Error t value Pr(>|t|)
# a -0.02285668 0.03155189 -0.7244157 0.6008871
# b 0.25568987 0.19818736 1.2901422 0.4197729
plot(y~x, df)
curve(predict(fit,newdata=data.frame(x)), add=TRUE)
The coefficients are very poorly estimated, but that's not surprising: you have two parameters and three data points.
As to why your code fails: the first call to nls(...) generates an error, so st is never set to anything (although it may have a value from some earlier code). Then you try to use that in the second call to nls(...).