I am facing a problem I have not managed to solve. I would like to use nlme or nlmeODE to perform a nonlinear regression with random effects, using as the model the solution of a second-order differential equation with fixed coefficients (a damped oscillator).
I manage to use nlme with simple models, but using deSolve to generate the solution of the differential equation seems to cause a problem. Below is an example, along with the problems I face.
The data and functions
Here is the function to generate the solution of the differential equation using deSolve:
library(deSolve)
ODE2_nls <- function(t, y, parms) {
S1 <- y[1]
dS1 <- y[2]
dS2 <- dS1
dS1 <- - parms["esp2omega"]*dS1 - parms["omega2"]*S1 + parms["omega2"]*parms["yeq"]
res <- c(dS2,dS1)
list(res)}
solution_analy_ODE2 = function(omega2,esp2omega,time,y0,v0,yeq){
parms <- c(esp2omega = esp2omega,
omega2 = omega2,
yeq = yeq)
xstart = c(S1 = y0, dS1 = v0)
out <- lsoda(xstart, time, ODE2_nls, parms)
return(out[,2])
}
I can generate a solution for a given period and damping factor, for example a period of 20 and a slight damping of 0.2:
# small example:
time <- 1:100
period <- 20 # period of oscillation
amort_factor <- 0.2
omega <- 2*pi/period # angular frequency
oscil <- solution_analy_ODE2(omega^2,amort_factor*2*omega,time,1,0,0)
plot(time,oscil)
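Since the equation is linear with constant coefficients, the underdamped case also has a closed-form solution, which can serve as a cross-check of the lsoda output. The sketch below is an addition, not part of the original code: the name `oscillator_closed_form` is hypothetical, it assumes omega2 > (esp2omega/2)^2 (underdamped), and it takes the initial conditions at the first element of `time`, as lsoda does.

```r
# Closed-form solution of y'' + esp2omega*y' + omega2*(y - yeq) = 0 (underdamped case only).
# Hypothetical helper, not part of the original code.
oscillator_closed_form <- function(omega2, esp2omega, time, y0, v0, yeq) {
  a  <- esp2omega / 2          # decay rate
  wd <- sqrt(omega2 - a^2)     # damped angular frequency; NaN if not underdamped
  tt <- time - time[1]         # initial conditions hold at time[1]
  A  <- y0 - yeq
  B  <- (v0 + a * A) / wd
  yeq + exp(-a * tt) * (A * cos(wd * tt) + B * sin(wd * tt))
}

# Same settings as the small example: period 20, damping factor 0.2
period <- 20
omega  <- 2 * pi / period
oscil_cf <- oscillator_closed_form(omega^2, 0.2 * 2 * omega, 1:100, 1, 0, 0)
```

Plotting `oscil_cf` on top of `oscil` should show the two curves overlapping up to the solver tolerance.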
Now I generate a panel of 10 individuals with a random starting phase (i.e. different starting positions and velocities). The goal is to perform a nonlinear regression with random effects on the starting values:
library(data.table)
# generate panel
Npoint <- 100 # number of time points
Nindiv <- 10 # number of individuals
period <- 20 # period of oscillation
amort_factor <- 0.2
omega <- 2*pi/period # angular frequency
# random phase
phase <- sample(seq(0,2*pi,0.01),Nindiv)
# simu data:
data_simu <- data.table(time = rep(1:Npoint,Nindiv), ID = rep(1:Nindiv,each = Npoint))
# signal generation
data_simu[,signal := solution_analy_ODE2(omega2 = omega^2,
esp2omega = 2*0.2*omega,
time = time,
y0 = sin(phase[.GRP]),
v0 = omega*cos(phase[.GRP]),
yeq = 0)+
rnorm(.N,0,0.02),by = ID]
If we have a look, we have a proper dataset:
library(ggplot2)
ggplot(data_simu,aes(time,signal,color = ID))+
geom_line()+
facet_wrap(~ID)
The problems
Using nlme
Using nlme with a syntax that works on simpler examples (nonlinear functions that do not use deSolve), I tried:
fit <- nlme(model = signal ~ solution_analy_ODE2(omega2,esp2omega,time,y0,v0,yeq),
data = data_simu,
fixed = esp2omega + omega2 + y0 + v0 + yeq ~ 1,
random = y0 ~ 1 ,
groups = ~ ID,
start = c(esp2omega = 0.08,
omega2 = 0.04,
yeq = 0,
y0 = 1,
v0 = 0))
I obtain:
Error in checkFunc(Func2, times, y, rho) : The number of derivatives returned by func() (2) must equal the length of the initial conditions vector (2000)
The traceback:
12. stop(paste("The number of derivatives returned by func() (", length(tmp[[1]]), ") must equal the length of the initial conditions vector (", length(y), ")", sep = ""))
11. checkFunc(Func2, times, y, rho)
10. lsoda(xstart, time, ODE2_nls, parms)
9. solution_analy_ODE2(omega2, esp2omega, time, y0, v0, yeq)
.
.
It looks like nlme is trying to pass a vector of starting conditions to solution_analy_ODE2, which causes an error in checkFunc, called from lsoda.
I tried using nlsList:
test <- nlsList(model = signal ~ solution_analy_ODE2(omega2,esp2omega,time,y0,v0,yeq) | ID,
data = data_simu,
start = list(esp2omega = 0.08, omega2 = 0.04,yeq = 0,
y0 = 1,v0 = 0),
control = list(maxiter=150, warnOnly=T,minFactor = 1e-10),
na.action = na.fail, pool = TRUE)
head(test)
Call:
Model: signal ~ solution_analy_ODE2(omega2, esp2omega, time, y0, v0, yeq) | ID
Data: data_simu
Coefficients:
esp2omega omega2 yeq y0 v0
1 0.1190764 0.09696076 0.0007577956 -0.1049423 0.30234654
2 0.1238936 0.09827158 -0.0003463023 0.9837386 0.04773775
3 0.1280399 0.09853310 -0.0004908579 0.6051663 0.25216134
4 0.1254053 0.09917855 0.0001922963 -0.5484005 -0.25972829
5 0.1249473 0.09884761 0.0017730823 0.7041049 0.22066652
6 0.1275408 0.09966155 -0.0017522320 0.8349450 0.17596648
We can see that the nonlinear fit works well on individual signals. Now if I want to perform a regression of the dataset with random effects, the syntax should be:
fit <- nlme(test,
random = y0 ~ 1 ,
groups = ~ ID,
start = c(esp2omega = 0.08,
omega2 = 0.04,
yeq = 0,
y0 = 1,
v0 = 0))
But I obtain the exact same error message.
I then tried using nlmeODE, following Ben Bolker's comment on a similar question I asked some years ago.
Using nlmeODE
library(nlmeODE)
datas_grouped <- groupedData( signal ~ time | ID, data = data_simu,
labels = list (x = "time", y = "signal"),
units = list(x ="arbitrary", y = "arbitrary"))
modelODE <- list( DiffEq = list(dS2dt = ~ S1,
dS1dt = ~ -esp2omega*S1 - omega2*S2 + omega2*yeq),
ObsEq = list(yc = ~ S2),
States = c("S1","S2"),
Parms = c("esp2omega","omega2","yeq","ID"),
Init = c(y0 = 0,v0 = 0))
resnlmeode = nlmeODE(modelODE, datas_grouped)
assign("resnlmeode", resnlmeode, envir = .GlobalEnv)
#Fitting with nlme the resulting function
model <- nlme(signal ~ resnlmeode(esp2omega,omega2,yeq,time,ID),
data = datas_grouped,
fixed = esp2omega + omega2 + yeq + y0 + v0 ~ 1,
random = y0 + v0 ~1,
start = c(esp2omega = 0.08,
omega2 = 0.04,
yeq = 0,
y0 = 0,
v0 = 0)) #
I get the error:
Error in resnlmeode(esp2omega, omega2, yeq, time, ID) : object 'yhat' not found
Here I don't understand where the error comes from, nor how to solve it.
Questions
Can you reproduce the problem?
Does anyone have an idea how to solve this problem, using either nlme or nlmeODE?
If not, is there a solution using another package? I saw nlmixr (https://cran.r-project.org/web/packages/nlmixr/index.html), but I don't know it, the installation is complicated, and it was recently removed from CRAN.
Edits
@tpetzoldt suggested a nice way to debug nlme's behavior, and the result surprised me a lot. Here is a working example with a nonlinear function, where I generate a set of 5 individuals with a random parameter varying between individuals:
reg_fun = function(time,b,A,y0){
cat("time : ",length(time)," b :",length(b)," A : ",length(A)," y0: ",length(y0),"\n")
out <- A*exp(-b*time)+(y0-1)
cat("out : ",length(out),"\n")
tmp <- cbind(b,A,y0,time,out)
cat(apply(tmp,1,function(x) paste(paste(x,collapse = " "),"\n")),"\n")
return(out)
}
time <- 0:10*10
ramdom_y0 <- sample(seq(0,1,0.01),10)
Nid <- 5
data_simu <-
data.table(time = rep(time,Nid),
ID = rep(LETTERS[1:Nid],each = length(time)) )[,signal := reg_fun(time,0.02,2,ramdom_y0[.GRP]) + rnorm(.N,0,0.1),by = ID]
The cat() calls in the function print the following here:
time : 11 b : 1 A : 1 y0: 1
out : 11
0.02 2 0.64 0 1.64
0.02 2 0.64 10 1.27746150615596
0.02 2 0.64 20 0.980640092071279
0.02 2 0.64 30 0.737623272188053
0.02 2 0.64 40 0.538657928234443
0.02 2 0.64 50 0.375758882342885
0.02 2 0.64 60 0.242388423824404
0.02 2 0.64 70 0.133193927883213
0.02 2 0.64 80 0.0437930359893108
0.02 2 0.64 90 -0.0294022235568269
0.02 2 0.64 100 -0.0893294335267746
.
.
.
Now I fit with nlme:
nlme(model = signal ~ reg_fun(time,b,A,y0),
data = data_simu,
fixed = b + A + y0 ~ 1,
random = y0 ~ 1 ,
groups = ~ ID,
start = c(b = 0.03, A = 1,y0 = 0))
I get:
time : 55 b : 55 A : 55 y0: 55
out : 55
0.03 1 0 0 0
0.03 1 0 10 -0.259181779318282
0.03 1 0 20 -0.451188363905974
0.03 1 0 30 -0.593430340259401
0.03 1 0 40 -0.698805788087798
0.03 1 0 50 -0.77686983985157
0.03 1 0 60 -0.834701111778413
0.03 1 0 70 -0.877543571747018
0.03 1 0 80 -0.909282046710588
0.03 1 0 90 -0.93279448726025
0.03 1 0 100 -0.950212931632136
0.03 1 0 0 0
0.03 1 0 10 -0.259181779318282
0.03 1 0 20 -0.451188363905974
0.03 1 0 30 -0.593430340259401
0.03 1 0 40 -0.698805788087798
0.03 1 0 50 -0.77686983985157
0.03 1 0 60 -0.834701111778413
0.03 1 0 70 -0.877543571747018
0.03 1 0 80 -0.909282046710588
0.03 1 0 90 -0.93279448726025
0.03 1 0 100 -0.950212931632136
0.03 1 0 0 0
0.03 1 0 10 -0.259181779318282
0.03 1 0 20 -0.451188363905974
0.03 1 0 30 -0.593430340259401
0.03 1 0 40 -0.698805788087798
0.03 1 0 50 -0.77686983985157
0.03 1 0 60 -0.834701111778413
0.03 1 0 70 -0.877543571747018
0.03 1 0 80 -0.909282046710588
0.03 1 0 90 -0.93279448726025
0.03 1 0 100 -0.950212931632136
0.03 1 0 0 0
0.03 1 0 10 -0.259181779318282
0.03 1 0 20 -0.451188363905974
0.03 1 0 30 -0.593430340259401
0.03 1 0 40 -0.698805788087798
0.03 1 0 50 -0.77686983985157
0.03 1 0 60 -0.834701111778413
0.03 1 0 70 -0.877543571747018
0.03 1 0 80 -0.909282046710588
0.03 1 0 90 -0.93279448726025
0.03 1 0 100 -0.950212931632136
0.03 1 0 0 0
0.03 1 0 10 -0.259181779318282
0.03 1 0 20 -0.451188363905974
0.03 1 0 30 -0.593430340259401
0.03 1 0 40 -0.698805788087798
0.03 1 0 50 -0.77686983985157
0.03 1 0 60 -0.834701111778413
0.03 1 0 70 -0.877543571747018
0.03 1 0 80 -0.909282046710588
0.03 1 0 90 -0.93279448726025
0.03 1 0 100 -0.950212931632136
time : 55 b : 55 A : 55 y0: 55
out : 55
0.03 1 0 0 0
0.03 1 0 10 -0.259181779318282
0.03 1 0 20 -0.451188363905974
0.03 1 0 30 -0.593430340259401
0.03 1 0 40 -0.698805788087798
0.03 1 0 50 -0.77686983985157
0.03 1 0 60 -0.834701111778413
0.03 1 0 70 -0.877543571747018
0.03 1 0 80 -0.909282046710588
0.03 1 0 90 -0.93279448726025
0.03 1 0 100 -0.950212931632136
0.03 1 0 0 0
0.03 1 0 10 -0.259181779318282
0.03 1 0 20 -0.451188363905974
0.03 1 0 30 -0.593430340259401
0.03 1 0 40 -0.698805788087798
0.03 1 0 50 -0.77686983985157
0.03 1 0 60 -0.834701111778413
0.03 1 0 70 -0.877543571747018
0.03 1 0 80 -0.909282046710588
0.03 1 0 90 -0.93279448726025
0.03 1 0 100 -0.950212931632136
0.03 1 0 0 0
0.03 1 0 10 -0.259181779318282
0.03 1 0 20 -0.451188363905974
0.03 1 0 30 -0.593430340259401
0.03 1 0 40 -0.698805788087798
0.03 1 0 50 -0.77686983985157
0.03 1 0 60 -0.834701111778413
0.03 1 0 70 -0.877543571747018
0.03 1 0 80 -0.909282046710588
0.03 1 0 90 -0.93279448726025
0.03 1 0 100 -0.950212931632136
...
So nlme concatenates the time vector 5 times (once per individual) and passes it to the function, with each parameter repeated to the same length. This is of course not compatible with the way lsoda and my function work.
It seems that the ode model is called with a wrong argument, so that it gets a vector with 2000 state variables instead of 2. Try the following to see the problem:
ODE2_nls <- function(t, y, parms) {
cat(length(y),"\n") # <----
S1 <- y[1]
dS1 <- y[2]
dS2 <- dS1
dS1 <- - parms["esp2omega"]*dS1 - parms["omega2"]*S1 + parms["omega2"]*parms["yeq"]
res <- c(dS2,dS1)
list(res)
}
Edit: I think that the analytical function worked, because it is vectorized, so you may try to vectorize the ode function, either by iterating over the ode model or (better) internally using vectors as state variables. As ode is fast in solving systems with several 100k equations, 2000 should be feasible.
I guess that both states and parameters from nlme are passed as vectors. The state variable of the ode model is then a "long" vector, and the parameters can be implemented as a list.
Here an example (edited, now with parameters as list):
ODE2_nls <- function(t, y, parms) {
#cat(length(y),"\n")
#cat(length(parms$omega2))
ndx <- seq(1, 2*N-1, 2)
S1 <- y[ndx]
dS1 <- y[ndx + 1]
dS2 <- dS1
dS1 <- - parms$esp2omega * dS1 - parms$omega2 * S1 + parms$omega2 * parms$yeq
res <- c(dS2, dS1)
list(res)
}
solution_analy_ODE2 = function(omega2, esp2omega, time, y0, v0, yeq){
parms <- list(esp2omega = esp2omega, omega2 = omega2, yeq = yeq)
xstart = c(S1 = y0, dS1 = v0)
out <- ode(xstart, time, ODE2_nls, parms, atol=1e-4, rtol=1e-4, method="ode45")
return(out[,2])
}
Then set (or calculate) the number of equations, e.g. N <- 1 or N <- 1000, before the calls.
The model runs through this way before running into numerical issues, but that's another story ...
You may then try another ode solver (e.g. vode), set atol and rtol to lower values, tweak nlme's optimization parameters, use box constraints ... and so on, as usual in nonlinear optimization.
I found a solution by hacking nlme's behavior: as shown in my edit, the problem comes from the fact that nlme passes vectors of length Nindividual x Npoints to the nonlinear function, assuming that the function maps each time point to a value independently. But lsoda doesn't work that way, as it integrates an equation along time (i.e. it needs all the times up to a given time point to produce a value).
My solution consists in splitting the parameters that nlme passes to my function by individual, doing the calculation, and re-creating a single vector:
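library(zoo) # na.locf()
library(data.table) # rleid()
detect_id <- function(vec){
# a negative step in vec marks the start of a new individual
tmp <- c(0,diff(vec))
out <- rep(NA,length(vec))
out[tmp < 0] <- seq_len(sum(tmp < 0))
out <- na.locf(out,na.rm = FALSE)
rleid(out)
}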
detect_id labels each element of the stacked time vector with the identifier of the individual time vector it belongs to:
detect_id(rep(1:10,3))
[1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3
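The same block identifiers can also be computed in base R, without zoo or data.table. This is a minimal sketch, assuming (as detect_id does) that a new individual starts wherever the time value decreases; the name `detect_id_base` is hypothetical.

```r
# Base-R alternative (hypothetical name): count the negative steps seen so far.
detect_id_base <- function(vec) {
  cumsum(c(0, diff(vec)) < 0) + 1
}

ids <- detect_id_base(rep(1:10, 3))
```

Like detect_id, this cannot separate two individuals if the second one's first time is not smaller than the first one's last time.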
Then the function doing the numerical integration loops over the individuals and binds the resulting vectors together:
solution_analy_ODE2_modif = function(omega2,esp2omega,time,y0,v0,yeq){
tmp <- detect_id(time)
# integrate each individual separately, using the first parameter value of its block
out <- lapply(unique(tmp),function(i){
idxs <- which(tmp == i)
parms <- c(esp2omega = esp2omega[idxs][1],
omega2 = omega2[idxs][1],
yeq = yeq[idxs][1])
xstart = c(S1 = y0[idxs][1], dS1 = v0[idxs][1])
out_tmp <- lsoda(xstart, time[idxs], ODE2_nls, parms)
out_tmp[,2]
})
return(unlist(out))
}
If I run a test, passing vectors similar to what nlme passes to the function:
omega2vec <- rep(0.1,30)
eps2omegavec <- rep(0.1,30)
timevec <- rep(1:10,3)
y0vec <- rep(1,30)
v0vec <- rep(0,30)
yeqvec = rep(0,30)
solution_analy_ODE2_modif(omega2 = omega2vec,
esp2omega = eps2omegavec,
time = timevec,
y0 = y0vec,
v0 = v0vec,
yeq = yeqvec)
[1] 1.0000000 0.9520263 0.8187691 0.6209244 0.3833110 0.1321355 -0.1076071 -0.3143798
[9] -0.4718058 -0.5697255 1.0000000 0.9520263 0.8187691 0.6209244 0.3833110 0.1321355
[17] -0.1076071 -0.3143798 -0.4718058 -0.5697255 1.0000000 0.9520263 0.8187691 0.6209244
[25] 0.3833110 0.1321355 -0.1076071 -0.3143798 -0.4718058 -0.5697255
It works. It would not work with @tpetzoldt's method, because the time vector jumps from 10 back to 1, which would cause integration problems. Here I really need to hack the way nlme works.
Now:
fit <- nlme(model = signal ~ solution_analy_ODE2_modif(omega2,esp2omega,time,y0,v0,yeq),
data = data_simu,
fixed = esp2omega + omega2 + y0 + v0 + yeq ~ 1,
random = y0 ~ 1 ,
groups = ~ ID,
start = c(esp2omega = 0.5,
omega2 = 0.5,
yeq = 0,
y0 = 1,
v0 = 1))
works like a charm:
summary(fit)
Nonlinear mixed-effects model fit by maximum likelihood
Model: signal ~ solution_analy_ODE2_modif(omega2, esp2omega, time, y0, v0, yeq)
Data: data_simu
AIC BIC logLik
-597.4215 -567.7366 307.7107
Random effects:
Formula: list(y0 ~ 1, v0 ~ 1)
Level: ID
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
y0 0.61713329 y0
v0 0.67815548 -0.269
Residual 0.03859165
Fixed effects: esp2omega + omega2 + y0 + v0 + yeq ~ 1
Value Std.Error DF t-value p-value
esp2omega 0.4113068 0.00866821 186 47.45002 0.0000
omega2 1.0916444 0.00923958 186 118.14876 0.0000
y0 0.3848382 0.19788896 186 1.94472 0.0533
v0 0.1892775 0.21762610 186 0.86974 0.3856
yeq 0.0000146 0.00283328 186 0.00515 0.9959
Correlation:
esp2mg omega2 y0 v0
omega2 0.224
y0 0.011 -0.008
v0 0.005 0.030 -0.269
yeq -0.091 -0.046 0.009 -0.009
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-3.2692477 -0.6122453 0.1149902 0.6460419 3.2890201
Number of Observations: 200
Number of Groups: 10
How can I get starting values?
I think there is a problem with the slope.
I know the model is a modified Hill equation, where:
logFC_max is the upper bound (5) minus the lower bound (-1),
Smax is the maximum sensitivity (1/EC50) of the gene,
tmax is the point in time with maximum sensitivity,
Sdur represents a measure of the duration of the sensitivity interval.
head(ctr)
t c y logy logc logt
1 1 0 100 4.60517 -Inf 0.0000000
2 1 0 100 4.60517 -Inf 0.0000000
3 1 0 100 4.60517 -Inf 0.0000000
4 2 0 100 4.60517 -Inf 0.6931472
5 2 0 100 4.60517 -Inf 0.6931472
6 2 0 100 4.60517 -Inf 0.6931472
(time <- unique(ctr$t))
[1] 1 2 4 6 8 24 30 48 54 72
fo <- logy ~ logFC_max / (1+exp(-slope*(log(c)-log(1/(s_max*exp(-0.5*((log(t)-log(t_max))/s_dur)^2))))))
ctr_nls <- nls(fo,
data = ctr,
start = list(logFC_max = 6, slope = -10, s_max = 4.619, t_max = 72, s_dur = 8))
Error in numericDeriv(form[[3L]], names(ind), env) :
In addition: Warning message:
In log(1/(s_max * exp(-0.5 * ((log(t) - log(t_max))/s_dur)^2))) :
NaNs produced
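The NaN warning above comes from the term log(1/(s_max * exp(...))), which is undefined as soon as the optimizer tries a non-positive s_max. One way to rule that out, offered here as a sketch rather than a tested fix for this data set, is to expand the logarithm and estimate log(s_max) instead of s_max. The names `log_ec50` and `log_s_max` are hypothetical.

```r
# log(1/(s_max * exp(-0.5*z^2))) equals -log(s_max) + 0.5*z^2,
# with z = (log(t) - log(t_max)) / s_dur.
# Estimating log_s_max keeps s_max = exp(log_s_max) positive for any real value.
log_ec50 <- function(tt, log_s_max, t_max, s_dur) {
  z <- (log(tt) - log(t_max)) / s_dur
  -log_s_max + 0.5 * z^2
}

# Check against the original expression at the starting values from the question
tt <- c(1, 2, 4, 6, 8, 24, 30, 48, 54, 72)
original <- log(1 / (4.619 * exp(-0.5 * ((log(tt) - log(72)) / 8)^2)))
reparam  <- log_ec50(tt, log_s_max = log(4.619), t_max = 72, s_dur = 8)
```

In the nls formula, `log(1/(s_max*exp(...)))` would then be replaced by the body of `log_ec50`, with a start value of log(4.619) for `log_s_max`. t_max and s_dur still have to stay positive for the expression to be defined.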
I read that the R flexsurv package can also be used for modeling time-dependent covariates, according to Christopher Jackson (2016) ["flexsurv: a platform for parametric survival modeling in R", Journal of Statistical Software, 70 (1)].
However, I was not able to figure out how, even after several adjustments and searches in online forums.
Before turning to the estimation of time-dependent covariates I tried to create a simple model with only time-independent covariates to test whether I specified the Surv object correctly. Here is a small example.
library(splitstackshape)
library(flexsurv)
## create sample data
n=50
set.seed(2)
t <- rpois(n,15)+1
x <- rnorm(n,t,5)
df <- data.frame(t,x)
df$id <- 1:n
df$rep <- df$t-1
Which looks like this:
t x id rep
1 12 17.696149 1 11
2 12 20.358094 2 11
3 11 2.058789 3 10
4 16 26.156213 4 15
5 13 9.484278 5 12
6 15 15.790824 6 14
...
And the long data:
long.df <- expandRows(df, "rep")
rep.vec<-c()
for(i in 1:n){
rep.vec <- c(rep.vec,1:(df[i,"t"]-1))
}
long.df$start <- rep.vec
long.df$stop <- rep.vec +1
long.df$censrec <- 0
long.df$censrec<-ifelse(long.df$stop==long.df$t,1,long.df$censrec)
Which looks like this:
t x id start stop censrec
1 12 17.69615 1 1 2 0
1.1 12 17.69615 1 2 3 0
1.2 12 17.69615 1 3 4 0
1.3 12 17.69615 1 4 5 0
1.4 12 17.69615 1 5 6 0
1.5 12 17.69615 1 6 7 0
1.6 12 17.69615 1 7 8 0
1.7 12 17.69615 1 8 9 0
1.8 12 17.69615 1 9 10 0
1.9 12 17.69615 1 10 11 0
1.10 12 17.69615 1 11 12 1
2 12 20.35809 2 1 2 0
...
Now I can estimate a simple Cox model to see whether it works:
coxph(Surv(t)~x,data=df)
This yields:
coef exp(coef) se(coef) z p
x -0.0588 0.9429 0.0260 -2.26 0.024
And in the long format:
coxph(Surv(start,stop,censrec)~x,data=long.df)
I get:
coef exp(coef) se(coef) z p
x -0.0588 0.9429 0.0260 -2.26 0.024
Taken together I conclude that my transformation into the long format was correct. Now, turning to the flexsurv framework:
flexsurvreg(Surv(time=t)~x,data=df, dist="weibull")
yields:
Estimates:
data mean est L95% U95% se exp(est) L95% U95%
shape NA 5.00086 4.05569 6.16631 0.53452 NA NA NA
scale NA 13.17215 11.27876 15.38338 1.04293 NA NA NA
x 15.13380 0.01522 0.00567 0.02477 0.00487 1.01534 1.00569 1.02508
But
flexsurvreg(Surv(start,stop,censrec) ~ x ,data=long.df, dist="weibull")
causes an error:
Error in flexsurvreg(Surv(start, stop, censrec) ~ x, data = long.df, dist = "weibull") :
Initial value for parameter 1 out of range
Would anyone happen to know the correct syntax for the latter Surv object? If you use the correct syntax, do you get the same estimates?
Thank you very much,
best,
David
===============
EDIT AFTER FEEDBACK FROM 42
===============
library(splitstackshape)
library(flexsurv)
x<-c(8.136527, 7.626712, 9.809122, 12.125973, 12.031536, 11.238394, 4.208863, 8.809854, 9.723636)
t<-c(2, 3, 13, 5, 7, 37 ,37, 9, 4)
df <- data.frame(t,x)
#transform into long format for time-dependent covariates
df$id <- 1:length(df$t)
df$rep <- df$t-1
long.df <- expandRows(df, "rep")
rep.vec<-c()
for(i in 1:length(df$t)){
rep.vec <- c(rep.vec,1:(df[i,"t"]-1))
}
long.df$start <- rep.vec
long.df$stop <- rep.vec +1
long.df$censrec <- 0
long.df$censrec<-ifelse(long.df$stop==long.df$t,1,long.df$censrec)
coxph(Surv(t)~x,data=df)
coxph(Surv(start,stop,censrec)~x,data=long.df)
flexsurvreg(Surv(time=t)~x,data=df, dist="weibull")
flexsurvreg(Surv(start,stop,censrec) ~ x ,data=long.df, dist="weibull",inits=c(shape=.1, scale=1))
Which yields the same estimates for both coxph models but
Call:
flexsurvreg(formula = Surv(time = t) ~ x, data = df, dist = "weibull")
Estimates:
data mean est L95% U95% se exp(est) L95% U95%
shape NA 1.0783 0.6608 1.7594 0.2694 NA NA NA
scale NA 27.7731 3.5548 216.9901 29.1309 NA NA NA
x 9.3012 -0.0813 -0.2922 0.1295 0.1076 0.9219 0.7466 1.1383
N = 9, Events: 9, Censored: 0
Total time at risk: 117
Log-likelihood = -31.77307, df = 3
AIC = 69.54614
and
Call:
flexsurvreg(formula = Surv(start, stop, censrec) ~ x, data = long.df,
dist = "weibull", inits = c(shape = 0.1, scale = 1))
Estimates:
data mean est L95% U95% se exp(est) L95% U95%
shape NA 0.8660 0.4054 1.8498 0.3353 NA NA NA
scale NA 24.0596 1.7628 328.3853 32.0840 NA NA NA
x 8.4958 -0.0912 -0.3563 0.1739 0.1353 0.9128 0.7003 1.1899
N = 108, Events: 9, Censored: 99
Total time at risk: 108
Log-likelihood = -30.97986, df = 3
AIC = 67.95973
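The two Weibull fits above report a total time at risk of 117 and 108. A plausible reason, stated here as an assumption rather than a verified diagnosis: the long format starts each subject at start = 1, so the interval (0, 1] is missing and a counting-process Surv(start, stop, event) treats every subject as entering at time 1. The base-R sketch below builds the long format starting at 0 instead; `long0` and the other variable names are illustrative.

```r
x <- c(8.136527, 7.626712, 9.809122, 12.125973, 12.031536,
       11.238394, 4.208863, 8.809854, 9.723636)
t <- c(2, 3, 13, 5, 7, 37, 37, 9, 4)
df <- data.frame(id = seq_along(t), t = t, x = x)

# One row per unit interval (0,1], (1,2], ..., (t-1,t] for each subject
long0 <- df[rep(seq_len(nrow(df)), times = df$t), ]
long0$start   <- sequence(df$t) - 1
long0$stop    <- long0$start + 1
long0$censrec <- as.integer(long0$stop == long0$t)  # event on the last interval only

# Not run here (needs flexsurv):
# flexsurvreg(Surv(start, stop, censrec) ~ x, data = long0, dist = "weibull")
```

With these intervals the total time at risk is 117, the same as in the one-row-per-subject fit. The two coxph fits would not be expected to show the difference, because no event occurs before time 1.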
Reading the error message:
Error in flexsurvreg(Surv(start, stop, censrec) ~ x, data = long.df, dist = "weibull", :
initial values must be a numeric vector
And then reading the help page, ?flexsurvreg, it seemed that setting inits to a named numeric vector was worth trying:
flexsurvreg(Surv(start,stop,censrec) ~ x ,data=long.df, dist="weibull", inits=c(shape=.1, scale=1))
Call:
flexsurvreg(formula = Surv(start, stop, censrec) ~ x, data = long.df,
dist = "weibull", inits = c(shape = 0.1, scale = 1))
Estimates:
data mean est L95% U95% se exp(est) L95% U95%
shape NA 5.00082 4.05560 6.16633 0.53454 NA NA NA
scale NA 13.17213 11.27871 15.38341 1.04294 NA NA NA
x 15.66145 0.01522 0.00567 0.02477 0.00487 1.01534 1.00569 1.02508
N = 715, Events: 50, Censored: 665
Total time at risk: 715
Log-likelihood = -131.5721, df = 3
AIC = 269.1443
Extremely similar results. My guess was basically a stab in the dark, so I have no guidance on how to make a choice if this had not succeeded other than to "expand the search."
I just want to mention that in flexsurv v1.1.1, running this code:
flexsurvreg(Surv(start,stop,censrec) ~ x ,data=long.df, dist="weibull")
doesn't return any errors. It also gives the same estimates as the non time-varying command
flexsurvreg(Surv(time=t)~x,data=df, dist="weibull")
I am attempting to fit an analytical model to my observed data on human somatic growth. The generalized model is a 3-parameter logarithmic growth curve where:
s= B0+B1*t+B2*log(t),
where s is a growth parameter whether it be a bone length or stature and t is an age.
I have attempted to run through various iterations of code to both form a likelihood function and maximize the returned parameters. To be completely honest, I am totally unsure if I am writing this correctly, but the most recent iteration of my attempts looks as follows, using a data frame titled "cedar":
cedar.f<-function(b){sum((cedar$FLEN~b[0]+b[1]*cedar$MINAGE+b[2]*log(cedar$MINAGE))^2)}
cedar.optimx<-optimx(c(0,0,0), fn = cedar.f, control = list(all.methods=TRUE, save.failures=TRUE, maxit=5000, maximize=TRUE))
cedar$MINAGE=c(2.5,0.5,6,0.01,0.01,0.01,1,0.01,12,0.01,0.01,1,12,1,4.5,1,4.5,0.01,7.8,11,4,7.5,9,0.25,0.01,0.01,0.01,0.1,1,1,0.01,0.01)
cedar$FLEN=c(167,150,300,54,60,78,152,72, 452,84,81,163,450,137,271,151,261,73,330,378,226,319,378,89,78,89,65,81,144,122, 50,55
Above, I have provided my attempt at the function and the optimization. I have received several errors in my attempts at doing this including:
argument "cedar" is missing ,no default
non-numeric argument to binary operator
Really, I am here to ask what recommendations anyone may have in writing this function so as to maximize the best fit of the data to the analytical human growth curve. If I am going about this all wrong / writing the function wrong, any help would be appreciated.
Thank you all.
cedar <- data.frame(MINAGE=c(2.5,0.5,6,0.01,0.01,0.01,1,0.01,12,0.01,0.01,1,12,1,4.5,1,4.5,0.01,7.8,11,4,7.5,9,0.25,0.01,0.01,0.01,0.1,1,1,0.01,0.01),
FLEN=c(167,150,300,54,60,78,152,72,452,84,81,163,450,137,271,151,261,73,330,378,226,319,378,89,78,89,65,81,144,122,50,55))
# Negative sum of squared errors
# With maximize=TRUE below, maximizing -SSE is the same as minimizing SSE
cedar.f <- function(b) {
-sum( (cedar$FLEN - (b[1] + b[2]*cedar$MINAGE + b[3]*log(cedar$MINAGE)))^2 )
}
library(optimx)
cedar.optimx <- optimx( c(1,1,1), fn = cedar.f,
control = list(all.methods=TRUE, save.failures=TRUE, maxit=5000, maximize=TRUE))
# p1 p2 p3 value fevals gevals niter convcode kkt1 kkt2 xtimes
# BFGS 120.4565 24.41910 11.25419 -7.674935e+03 25 8 NA 0 TRUE TRUE 0.00
# CG 120.4565 24.41910 11.25419 -7.674935e+03 1072 298 NA 0 TRUE TRUE 0.15
# Nelder-Mead 120.4714 24.41647 11.25186 -7.674947e+03 258 NA NA 0 TRUE TRUE 0.02
# L-BFGS-B 120.4565 24.41910 11.25419 -7.674935e+03 17 17 NA 0 TRUE TRUE 0.01
# nlm 120.4564 24.41910 11.25417 -7.674935e+03 NA NA 12 0 TRUE TRUE 0.01
# nlminb 120.4565 24.41910 11.25419 -7.674935e+03 21 48 13 0 TRUE TRUE 0.02
# spg 120.4565 24.41910 11.25419 -7.674935e+03 99 NA 92 0 TRUE TRUE 0.06
# ucminf 120.4564 24.41910 11.25417 -7.674935e+03 10 10 NA 0 TRUE TRUE 0.00
# Rcgmin NA NA NA -8.988466e+307 NA NA NA 9999 NA NA 0.00
# Rvmmin NA NA NA -8.988466e+307 NA NA NA 9999 NA NA 0.00
# newuoa 120.4565 24.41910 11.25419 -7.674935e+03 118 NA NA 0 TRUE TRUE 0.01
# bobyqa 120.4565 24.41910 11.25419 -7.674935e+03 142 NA NA 0 TRUE TRUE 0.02
# nmkb 120.4566 24.41907 11.25421 -7.674935e+03 213 NA NA 0 TRUE TRUE 0.03
# hjkb 1.0000 1.00000 1.00000 -1.363103e+06 1 NA 0 9999 NA NA 0.00
Alternatively, model coefficients can be estimated using a simple linear model:
fitlm <- lm(FLEN~MINAGE+log(MINAGE), data=cedar)
coef(fitlm)
# (Intercept) MINAGE log(MINAGE)
# 120.45654 24.41910 11.25419
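Because the model is linear in B0, B1 and B2, least squares has a unique solution that lm finds directly, so a general-purpose optimizer is only needed as a cross-check. A minimal sketch of that comparison, using base R's optim in place of optimx (a substitution made here only to keep the example free of extra packages):

```r
cedar <- data.frame(
  MINAGE = c(2.5,0.5,6,0.01,0.01,0.01,1,0.01,12,0.01,0.01,1,12,1,4.5,1,4.5,
             0.01,7.8,11,4,7.5,9,0.25,0.01,0.01,0.01,0.1,1,1,0.01,0.01),
  FLEN   = c(167,150,300,54,60,78,152,72,452,84,81,163,450,137,271,151,261,
             73,330,378,226,319,378,89,78,89,65,81,144,122,50,55))

# Sum of squared errors for coefficients b = (B0, B1, B2)
sse <- function(b) {
  sum((cedar$FLEN - (b[1] + b[2]*cedar$MINAGE + b[3]*log(cedar$MINAGE)))^2)
}

fit_lm  <- lm(FLEN ~ MINAGE + log(MINAGE), data = cedar)  # direct least squares
fit_opt <- optim(c(1, 1, 1), sse, method = "BFGS")        # iterative minimization

# Side-by-side comparison of the two sets of estimates
rbind(lm = unname(coef(fit_lm)), optim = fit_opt$par)
```

Here the function is minimized directly, so no sign flip or maximize option is involved.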
The estimated function can be plotted as follows:
optpar <- as.matrix(cedar.optimx[1,1:3])
estim_fun <- function(x, b=optpar) {
b[1] + b[2]*x + b[3]*log(x)
}
curve(estim_fun, from=min(cedar$MINAGE), to=max(cedar$MINAGE))
I am trying to do an nls fit for a somewhat complicated expression that includes two integrals, with two of the fit parameters as their upper limits.
I got the error
"Error in nlsModel(formula, mf, start, wts) : singular gradient
matrix at initial parameter estimates".
I have already searched the previous answers, but they didn't help. The parameter initialization seems to be OK; I have tried changing the starting values, but none work. If my function has just one integral everything works very nicely, but when I add a second integral term I get the error. I don't believe the function is over-parametrized, as I have performed other fits with many more parameters and they worked. Below I have included some data.
The minimal example is the following:
integrand <- function(X) {
return(X^4/(2*sinh(X/2))^2)
}
fitting = function(T1, T2, N, D, x){
int1 = integrate(integrand, lower=0, upper = T1)$value
int2 = integrate(integrand, lower=0, upper = T2)$value
return(N*(D/x)^2*(exp(D/x)/(1+exp(D/x))^2
)+(448.956*(x/T1)^3*int1)+(299.304*(x/T2)^3*int2))
}
fit = nls(y ~ fitting(T1, T2, N, D, x), data = dat,
start=list(T1=400,T2=200,N=0.01,D=2))
For reference, the fit that worked is the following:
integrand <- function(X) {
return(X^4/(2*sinh(X/2))^2)
}
fitting = function(T1, N, D, x){
int = integrate(integrand, lower=0, upper = T1)$value
return(N*(D/x)^2*(exp(D/x)/(1+exp(D/x))^2 )+(748.26)*(x/T1)^3*int)
}
fit = nls(y ~ fitting(T1, N, D, x), data = dat, start=list(T1=400,N=0.01,D=2))
Data to illustrate the problem:
dat<- read.table(text="x y
0.38813 0.0198
0.79465 0.02206
1.40744 0.01676
1.81532 0.01538
2.23105 0.01513
2.64864 0.01547
3.05933 0.01706
3.47302 0.01852
3.88791 0.02074
4.26301 0.0256
4.67607 0.03028
5.08172 0.03507
5.48327 0.04283
5.88947 0.05017
6.2988 0.05953
6.7022 0.07185
7.10933 0.08598
7.51924 0.0998
7.92674 0.12022
8.3354 0.1423
8.7384 0.16382
9.14656 0.19114
9.55062 0.22218
9.95591 0.25542", header=TRUE)
I cannot figure out what is happening. I need to perform this fit with three integral components, but even with two I have this problem. I would appreciate your help very much. Thank you.
You could try some other optimizers:
DF <- dat # the data frame from the question
fitting1 <- function(par, x, y) {
sum((fitting(par[1], par[2], par[3], par[4], x) - y)^2)
}
library(optimx)
res <- optimx(c(400, 200, 0.01, 2),
fitting1,
x = DF$x, y = DF$y,
control = list(all.methods = TRUE))
print(res)
# p1 p2 p3 p4 value fevals gevals niter convcode kkt1 kkt2 xtimes
#BFGS 409.7992 288.6416 -0.7594461 39.00871 1.947484e-03 101 100 NA 1 NA NA 0.22
#CG 401.1281 210.9087 -0.9026459 20.80900 3.892929e-01 215 101 NA 1 NA NA 0.25
#Nelder-Mead 414.6402 446.5080 -1.1298606 -227.81280 2.064842e-03 89 NA NA 0 NA NA 0.02
#L-BFGS-B 412.4477 333.1338 -0.3650530 37.74779 1.581643e-03 34 34 NA 0 NA NA 0.06
#nlm 411.8639 333.4776 -0.3652356 37.74855 1.581644e-03 NA NA 45 0 NA NA 0.04
#nlminb 411.9678 333.4449 -0.3650271 37.74753 1.581643e-03 50 268 48 0 NA NA 0.07
#spg 422.0394 300.5336 -0.5776862 38.48655 1.693119e-03 1197 NA 619 0 NA NA 1.06
#ucminf 412.7390 332.9228 -0.3652029 37.74829 1.581644e-03 45 45 NA 0 NA NA 0.05
#Rcgmin NA NA NA NA 8.988466e+307 NA NA NA 9999 NA NA 0.00
#Rvmmin NA NA NA NA 8.988466e+307 NA NA NA 9999 NA NA 0.00
#newuoa 396.3071 345.1165 -0.3650286 37.74754 1.581643e-03 3877 NA NA 0 NA NA 1.02
#bobyqa 410.0392 334.7074 -0.3650289 37.74753 1.581643e-03 7866 NA NA 0 NA NA 2.07
#nmkb 569.0139 346.0856 282.6526588 -335.32320 2.064859e-03 75 NA NA 0 NA NA 0.01
#hjkb 400.0000 200.0000 0.0100000 2.00000 3.200269e+00 1 NA 0 9999 NA NA 0.01
Levenberg-Marquardt converges too, but nlsLM fails when it tries to create an nls model object from the result because the gradient matrix is singular:
library(minpack.lm)
fit <- nlsLM(y ~ fitting(T1, T2, N, D, x),
start=list(T1=412,T2=333,N=-0.36,D=38), data = DF, trace = TRUE)
#It. 0, RSS = 0.00165827, Par. = 412 333 -0.36 38
#It. 1, RSS = 0.00158186, Par. = 417.352 329.978 -0.3652 37.746
#It. 2, RSS = 0.00158164, Par. = 416.397 330.694 -0.365025 37.7475
#It. 3, RSS = 0.00158164, Par. = 416.618 330.568 -0.365027 37.7475
#It. 4, RSS = 0.00158164, Par. = 416.618 330.568 -0.365027 37.7475
#Error in nlsModel(formula, mf, start, wts) :
# singular gradient matrix at initial parameter estimates
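A likely reason for the singular gradient, offered as a hedged explanation: the integrand decays exponentially, so for upper limits as large as the starting values the integral is already at its limiting value 4*pi^4/15 and no longer depends on T1 or T2. Both integral terms then reduce to a constant times x^3, and only the combination 448.956/T1^3 + 299.304/T2^3 can be identified from the data. The sketch checks this numerically; `integral_terms` and the second parameter pair (T1b, T2b) are made up for illustration.

```r
integrand <- function(X) X^4 / (2 * sinh(X / 2))^2

int400 <- integrate(integrand, lower = 0, upper = 400)$value
int200 <- integrate(integrand, lower = 0, upper = 200)$value
limit  <- 4 * pi^4 / 15   # value of the integral with an infinite upper limit

# The two integral terms of the model (hypothetical helper)
integral_terms <- function(T1, T2, x) {
  448.956 * (x / T1)^3 * integrate(integrand, 0, T1)$value +
    299.304 * (x / T2)^3 * integrate(integrand, 0, T2)$value
}

x <- seq(0.4, 10, length.out = 24)
k <- 448.956 / 400^3 + 299.304 / 200^3            # combination at the starting values
T1b <- 300
T2b <- (299.304 / (k - 448.956 / T1b^3))^(1/3)    # chosen so that k is unchanged
p_a <- integral_terms(400, 200, x)
p_b <- integral_terms(T1b, T2b, x)
```

If `p_a` and `p_b` agree to within the integration tolerance, different (T1, T2) pairs give the same fitted curve, which is the situation a singular gradient matrix signals. The same argument would apply to a third integral term of the same form.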