Self-Starting Functions for an Exponential Decay Model in R

I'm working on an exponential decay model where I would like to estimate the decay rate. My current model uses a self-starting function, SSasymp from the stats package. I've also written a second model, where I just eyeball the starting parameters, which requires the minpack.lm package. My question is: is there another way I can estimate the starting parameters to cross-check the SSasymp function? I (think I) understand what the code is doing to estimate the starting parameters, but I wanted to get some feedback on whether SSasymp is the right function to use with this data or if there is another function I could potentially use.
library(stats)
library(minpack.lm)
library(broom)
library(ggplot2)
df<-data.frame(Date=seq(1:66),
Level=c(1438072839.75, 1397678053.5, 1358947420.5, 1313619938.25, 1269224528.25,
1246776954.75, 1207201162.5, 1176229091.25, 1136063160, 1103721704.25, 1080591637.5,
1048286667, 1017840460.5, 1001057052, 975815001, 943568665.5, 932026210.5, 916996593.75,
903904288.5, 887578544.25, 871428547.5, 855417720, 843504839.25, 825835607.25,
816060303.75, 803506361.25, 801213123, 797977217.25, 793483994.25, 780060123, 766265609.25,
756172471.5, 746615497.5, 738002936.25, 723741644.25, 711969181.5, 696032998.5,
686162453.25, 671953166.25, 674184571.5, 664739475, 641091932.25, 627358484.25,
616740068.25, 602261552.25, 592440797.25, 584160403.5, 569780103.75, 556305753,
551682927, 546535062, 537782506.5, 524251944.75, 519277188.75, 503598795, 498481312.5,
487907885.25, 479760227.25, 474773064.75, 468246932.25, 460561701, 455266345.5,
448451890.5, 447760119, 441236056.5, 438884417.25))
# Model 1: self-starting asymptotic regression (SSasymp)
dfDecay <- nls(Level ~ SSasymp(Date, Asym, R0, lrc), data = df)
dfFitted <- augment(dfDecay)
ggplot(df, aes(x = Date, y = Level)) + geom_point() + geom_line(aes(y = dfFitted$.fitted), color = "red")
# Model 2: explicit exponential decay with eyeballed starting values
dfDecay2 <- nlsLM(Level ~ b * exp(-a * Date),
                  data = df,
                  start = list(a = 0.01, b = 1.5e+09),
                  algorithm = "LM")
fitDecay2 <- augment(dfDecay2)
ggplot(df, aes(x = Date, y = Level)) + geom_point() + geom_line(aes(y = fitDecay2$.fitted), color = "red")

Regarding starting values:
Take logs of both sides and fit with a linear model: since Level = a * exp(-b * Date), log(Level) = log(a) - b * Date, so a = exp(intercept) and b = -slope of the linear fit.
The parameters should be of similar magnitude to avoid numerical problems, so use Level/1e9 in place of Level. This just changes the units in which Level is measured.
Using starting values from the linear model, plain nls should be sufficient.
This gives:
fm0 <- lm(log(Level/1e9) ~ Date, df)   # log-linear fit to obtain starting values
st <- list(a = exp(coef(fm0)[[1]]), b = -coef(fm0)[[2]])
nls(Level/1e9 ~ a * exp(-b * Date), df, start = st)
giving:
Nonlinear regression model
model: Level/1e+09 ~ a * exp(-b * Date)
data: df
a b
1.3532 0.0183
residual sum-of-squares: 0.08055
Number of iterations to convergence: 4
Achieved convergence tolerance: 4.023e-07
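As a cross-check against the SSasymp fit in the question: SSasymp parameterizes the curve as Asym + (R0 - Asym) * exp(-exp(lrc) * Date), so the decay rate it estimates is exp(lrc). A small sketch (not part of the original answer) for comparing the two fits:
co <- coef(dfDecay)
exp(co[["lrc"]])     # decay rate from SSasymp, comparable to b above only if Asym is near 0
co[["R0"]] / 1e9     # starting level in billions, comparable to a
co[["Asym"]] / 1e9   # fitted asymptote; if it is far from zero the two models genuinely differ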

Related

Including an offset when using cph {rms} for validation of a Cox model

I am externally validating and updating a Cox model in R. The model predicts 5 year risk. I don't have access to the original data, just the equation for the linear predictor and the value of the baseline survival probability at 5 years.
I have assessed calibration and discrimination of the model in my dataset and found that the model needs to be updated.
I want to update the model by adjusting baseline risk only, so I have been using a Cox model with the linear predictor ("beta.sum") included as an offset term, to restrict its coefficient to be 1.
I want to be able to use cph instead of coxph as it makes internal validation by bootstrapping much easier. However, when including the linear predictor as an offset I get the error:
"Error in exp(object$linear.predictors) :
non-numeric argument to mathematical function"
Is there something I am doing incorrectly, or does the cph function not allow an offset within the formula? If so, is there another way to restrict the coefficient to 1?
My code is below:
load(file="k.Rdata")
### Predicted risk ###
# linear predictor (LP)
k$beta.sum <- -0.2201 * ((k$age/10)-7.036) + 0.2467 * (k$male - 0.5642) - 0.5567 * ((k$epi/5)-7.222) +
0.4510 * (log(k$acr_mgmmol/0.113)-5.137)
k$pred <- 1 - 0.9365^exp(k$beta.sum)
# Recalibrated model
# Using coxph:
cox.new <- coxph(Surv(time, rrt) ~ offset(beta.sum), data = k, x=TRUE, y=TRUE)
# new baseline survival at 5 years
library(pec)
predictSurvProb(cox.new, newdata=data.frame(beta.sum=0), times = 5) #baseline = 0.9570
# Using cph
cph.new <- cph(Surv(time, rrt) ~ offset(beta.sum), data=k, x=TRUE, y=TRUE, surv=TRUE)
The model will run without surv=TRUE included, but this means a lot of the commands I want to use cannot work, such as calibrate, validate and predictSurvProb.
EDIT:
I will include a way to reproduce the error
library(purrr)
library(rms)
n <- 1000
set.seed(1234)
status <- as.numeric(rbernoulli(n, p=0.1))
time <- -5* log(runif(n))
lp <- rnorm(1000, mean=-2.7, sd=1)
mydata <- data.frame(status, time, lp)
test <- cph(Surv(time, status) ~ offset(lp), data=mydata, surv=TRUE)
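For reference, once a recalibrated 5-year baseline survival is available (0.9570 from the predictSurvProb call above), applying it is just the risk formula already used in the question; a minimal sketch, which does not address the cph error itself:
base.surv.new <- 0.9570                            # recalibrated baseline survival at 5 years
k$pred.recal <- 1 - base.surv.new^exp(k$beta.sum)  # updated 5-year predicted risk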

Interpretation of output and prediction from cpglmm

I am helping a colleague fit a compound Poisson generalized linear mixed model in R, using the cpglmm function from the cplm package (link). The model involves a three-way interaction, and I would like to compute some interpretable quantities. So far, I have tried to calculate some odds ratios, but I am not sure this is the right way to do it.
# Fit model with three-way interaction in fixed effects #
m <- cpglmm(ncs ~ diversity_index*diversity_speciality*n_authors + selfcit +
n_refs + (1|region), data = diversity)
# Calculate Odds-ratio #
se <- sqrt(diag(vcov(m)))
tab <- cbind(Est = m$fixef,
S.E. = se,
LL = m$fixef - 1.96 * se,
UL = m$fixef + 1.96 * se)
print(exp(tab), digits=3)
I also want to compute some predicted values, e.g. predicted probabilities or the like, but I can't get predict() to work for the cpglmm. Are there any functions I could use?
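One generic workaround when a predict method is not available is to build the fixed-effects design matrix by hand and apply the inverse link. The sketch below is an assumption-laden illustration: it assumes a log link (the cpglmm default) and that the model-matrix columns line up with names(m$fixef); the fixed-effects formula and newdat are illustrative, not from the original post:
fixed_form <- ~ diversity_index * diversity_speciality * n_authors + selfcit + n_refs  # fixed effects only
newdat <- diversity[1:5, ]                   # illustrative rows to predict for
X <- model.matrix(fixed_form, data = newdat)
eta <- as.vector(X %*% m$fixef)              # linear predictor (assumes columns match names(m$fixef))
exp(eta)                                     # inverse log link: predicted mean ncs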

Fitting non-linear Langmuir Isotherm in R

I want to fit isotherm models to the following data in R. The simplest isotherm model is the Langmuir model, given here; the model is given at the bottom of the page. My MWE is given below, which throws the error. I wonder if there is any R package for isotherm models.
X <- c(10, 30, 50, 70, 100, 125)
Y <- c(155, 250, 270, 330, 320, 323)
Data <- data.frame(X, Y)
LangIMfm2 <- nls(formula = Y ~ Q*b*X/(1+b*X), data = Data, start = list(Q = 1, b = 0.5), algorithm = "port")
Error in nls(formula = Y ~ Q * b * X/(1 + b * X), data = Data, start = list(Q = 1, :
Convergence failure: singular convergence (7)
Edited
Some nonlinear models can be transformed to linear models. My understanding is that there might be a one-to-one relationship between the estimates of a nonlinear model and its linear-model form, but their corresponding standard errors are not related to each other. Is this assertion true? Are there any pitfalls in fitting nonlinear models by transforming to linearity?
I am not aware of such packages, and personally I don't think you need one, as the problem can be solved using base R.
nls is sensitive to the starting parameters, so you should begin with a good starting guess. You can easily evaluate Q because it corresponds to the asymptotic limit of the isotherm at x-->Inf, so it is reasonable to begin with Q=323 (which is the last value of Y in your sample data set).
Next, you could do plot(Data) and add a line with an isotherm that corresponds to your starting parameters Q and b and tweak b to come up with a reasonable guess.
The plot below shows your data set (points) and a probe isotherm with Q = 323 and b = 0.5, generated by with(Data,lines(X,323*0.5*X/(1+0.5*X),col='red')) (red line). It seemed a reasonable starting guess to me, and I gave it a try with nls:
LangIMfm2 <- nls(formula = Y ~ Q*b*X/(1+b*X), data = Data, start = list(Q = 300, b = 1), algorithm = "port")
# Nonlinear regression model
# model: Y ~ Q * b * X/(1 + b * X)
# data: Data
# Q b
# 366.2778 0.0721
# residual sum-of-squares: 920.6
#
# Algorithm "port", convergence message: relative convergence (4)
and plotted the predicted line to make sure that nls found the right solution:
lines(Data$X,predict(LangIMfm2),col='green')
Having said that, I would suggest using a more effective strategy, based on linearizing the model by rewriting the isotherm equation in reciprocal coordinates: 1/Y = 1/Q + (1/(Q*b)) * (1/X), so the intercept of the reciprocal regression is 1/Q and the slope is 1/(Q*b):
z <- 1/Data                  # element-wise reciprocals: columns become 1/X and 1/Y
plot(Y ~ X, z)
abline(lm(Y ~ X, z))
M <- lm(Y ~ X, z)            # regress 1/Y on 1/X
Q <- 1/coef(M)[1]            # Q = 1/intercept
# 363.2488
b <- coef(M)[1]/coef(M)[2]   # b = intercept/slope
# 0.0741759
As you can see, both approaches produce essentially the same result, but the linear model is more robust and doesn't require starting parameters (and, as far as I remember, it is the standard way of analysing isotherms in experimental physical chemistry).
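If the nls standard errors are wanted as well, the linearized estimates can be fed straight back into nls as starting values; a small sketch building on M, Q and b from the code above:
# Sketch: use the linearized estimates as starting values for nls
nls(Y ~ Q*b*X/(1+b*X), data = Data, start = list(Q = unname(Q), b = unname(b)))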
You can use the SSmicmen self-starter function (see Ritz and Streibig, 2008, Nonlinear Regression with R), which ships with R's stats package and calculates initial parameters from a fit of the linearized form of the Michaelis-Menten (MM) equation. Fortunately, the MM equation has a form that can be adapted to the Langmuir equation, S = Smax*x/(KL + x). I've found the nlshelper and tidyverse packages useful for modeling and for exporting the results of the nls command into tables and plots, particularly when modeling sample groups. Here's my code for modeling a single set of sorption data:
library(tidyverse)
library(nlme)
library(nlshelper)
lang.fit <- nls(Y ~ SSmicmen(X,Smax,InvKL), data=Data)
fit.summary <- tidy(lang.fit)
fit.coefs <- coef(lang.fit)
For simplicity, the Langmuir affinity constant is modeled here as 1/KL. Applying this code, I get the same parameter estimates as @Marat obtained above.
The simple code below allows for wrangling the data in order to create a ggplot object, containing the original points and fitted line (i.e., geom_point would represent the original X and Y data, geom_line would represent the original X plus YHat).
FitY <- tibble(predict(lang.fit))
YHat <- FitY[,1]
Data2 <- cbind(Data, YHat)
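A minimal plotting sketch for the ggplot object described above (ggplot2 is attached via the tidyverse); to avoid depending on the column name produced by the cbind step, the fitted values are pulled straight from predict() here:
# Sketch: observed points plus the fitted Langmuir curve
ggplot(Data, aes(x = X, y = Y)) +
  geom_point() +
  geom_line(aes(y = predict(lang.fit)), colour = "red")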
If you want to model multiple groups of data (say, based on a "Sample_name" column), then the lang.fit variable would be calculated as below, this time using the nlsList command:
lang.fit <- nlsList(Y ~ SSmicmen(X,Smax,InvKL) | Sample_name, data=Data)
The problem is the starting values. We show two approaches to this as well as an alternative that converges even using the starting values in the question.
1) plinear. The right-hand side is linear in Q*b, so it is better to absorb b into Q, giving a single parameter that enters linearly and is easier to solve for. Also, with the plinear algorithm no starting value is needed for the linear parameter, so only the starting value for b need be specified. With plinear, the right-hand side of the nls formula should be specified as the vector that multiplies the linear parameter. The result of running nls, giving fm0 below, will have coefficients named b and .lin, where Q = .lin / b.
We already have our answer from fm0, but if we want a clean run in terms of b and Q rather than b and .lin, we can re-run the original formula from the question using the starting values implied by the coefficients returned by fm0, as shown.
fm0 <- nls(Y ~ X/(1+b*X), Data, start = list(b = 0.5), alg = "plinear")
st <- with(as.list(coef(fm0)), list(b = b, Q = .lin/b))
fm <- nls(Y ~ Q*b*X/(1+b*X), Data, start = st)
fm
giving
Nonlinear regression model
model: Y ~ Q * b * X/(1 + b * X)
data: Data
b Q
0.0721 366.2778
residual sum-of-squares: 920.6
Number of iterations to convergence: 0
Achieved convergence tolerance: 9.611e-07
We can display the result. The points are the data and the red line is the fitted curve.
plot(Data)
lines(fitted(fm) ~ X, Data, col = "red")
(continued after plot)
2) mean. Alternatively, using a starting value of mean(Data$Y) for Q seems to work well.
nls(Y ~ Q*b*X/(1+b*X), Data, start = list(b = 0.5, Q = mean(Data$Y)))
giving:
Nonlinear regression model
model: Y ~ Q * b * X/(1 + b * X)
data: Data
b Q
0.0721 366.2779
residual sum-of-squares: 920.6
Number of iterations to convergence: 6
Achieved convergence tolerance: 5.818e-06
The question already had a reasonable starting value for b, which we used. If one were needed, one could set Y to Q*b so that the Q*b factors cancel, set X to mean(Data$X), and solve for b, giving b = 1 - 1/mean(Data$X) as a possible starting value. Although not shown in the original output, using this starting value for b together with mean(Data$Y) as the starting value for Q also resulted in convergence; a sketch is given below.
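A sketch of that heuristic (not shown in the original answer):
b0 <- 1 - 1/mean(Data$X)   # heuristic starting value for b described above
nls(Y ~ Q*b*X/(1+b*X), Data, start = list(b = b0, Q = mean(Data$Y)))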
3) optim. If we use optim, the algorithm converges even with the initial values used in the question. We form the residual sum of squares and minimize it:
rss <- function(p) {
Q <- p[1]
b <- p[2]
with(Data, sum((Y - b*Q*X/(1+b*X))^2))
}
optim(c(1, 0.5), rss)
giving:
$par
[1] 366.27028219 0.07213613
$value
[1] 920.62
$counts
function gradient
249 NA
$convergence
[1] 0
$message
NULL
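If the usual nls summary and standard errors are wanted afterwards, the optim solution can be passed to nls as starting values; a short sketch along the same lines as the approaches above:
p <- optim(c(1, 0.5), rss)$par                       # Q and b from optim
nls(Y ~ Q*b*X/(1+b*X), Data, start = list(Q = p[1], b = p[2]))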

Calculating R^2 for a nonlinear least squares fit

Suppose I have x values, y values, and expected y values f (from some nonlinear best fit curve).
How can I compute R^2 in R? Note that this function is not a linear model, but a nonlinear least squares (nls) fit, so not an lm fit.
You just use the lm function to fit a linear model:
x = runif(100)
y = runif(100)
spam = summary(lm(x~y))
> spam$r.squared
[1] 0.0008532386
Note that R-squared is not defined for non-linear models, or is at least very tricky; to quote from R-help:
There is a good reason that an nls model fit in R does not provide
r-squared - r-squared doesn't make sense for a general nls model.
One way of thinking of r-squared is as a comparison of the residual
sum of squares for the fitted model to the residual sum of squares for
a trivial model that consists of a constant only. You cannot
guarantee that this is a comparison of nested models when dealing with
an nls model. If the models aren't nested this comparison is not
terribly meaningful.
So the answer is that you probably don't want to do this in the first
place.
If you want peer-reviewed evidence, see this article for example; it's not that you can't compute the R^2 value, it's just that it may not mean the same thing/have the same desirable properties as in the linear-model case.
Sounds like f are your predicted values. So take one minus the sum of squared distances from them to the actual values, divided by n times the variance of y;
so something like
1 - sum((y - f)^2) / (length(y) * var(y))
should give you a quasi R-squared value, as long as your model is reasonably close to a linear model and n is pretty big.
As a direct answer to the question asked (rather than arguing that R2/pseudo-R2 aren't useful), the nagelkerke function in the rcompanion package will report various pseudo-R2 values for nonlinear least squares (nls) models, as proposed by McFadden, Cox and Snell, and Nagelkerke, e.g.
require(rcompanion)
data(BrendonSmall)
quadplat = function(x, a, b, clx) {
ifelse(x < clx, a + b * x + (-0.5*b/clx) * x * x,
a + b * clx + (-0.5*b/clx) * clx * clx)}
model = nls(Sodium ~ quadplat(Calories, a, b, clx),
data = BrendonSmall,
start = list(a = 519,
b = 0.359,
clx = 2304))
nullfunct = function(x, m){m}
null.model = nls(Sodium ~ nullfunct(Calories, m),
data = BrendonSmall,
start = list(m = 1346))
nagelkerke(model, null=null.model)
The soilphysics package also reports Efron's pseudo R2 and adjusted pseudo R2 value for nls models as 1 - RSS/TSS:
pred <- predict(model)
n <- length(pred)
res <- resid(model)
w <- weights(model)
if (is.null(w)) w <- rep(1, n)
rss <- sum(w * res ^ 2)
resp <- pred + res
center <- weighted.mean(resp, w)
r.df <- summary(model)$df[2]
int.df <- 1
tss <- sum(w * (resp - center)^2)
r.sq <- 1 - rss/tss
adj.r.sq <- 1 - (1 - r.sq) * (n - int.df) / r.df
out <- list(pseudo.R.squared = r.sq,
adj.R.squared = adj.r.sq)
which is also the pseudo-R2 as calculated by the accuracy function in the rcompanion package. Basically, this R2 measures how much better your fit is compared to just drawing a flat horizontal line through the data. This can make sense for nls models if your null model is one that allows for an intercept-only model, and also for particular other nonlinear models. E.g. for a scam model that uses strictly increasing splines (bs="mpi" in the spline term), the fitted model for the worst possible scenario (e.g. where your data were strictly decreasing) would be a flat line, and hence would result in an R2 of zero. The adjusted R2 then also penalizes models with higher numbers of fitted parameters. Using the adjusted R2 value would already address a lot of the criticisms of the paper linked above, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2892436/ (besides, if one swears by using information criteria for model selection, the question becomes which one to use: AIC, BIC, EBIC, AICc, QIC, etc.).
Just using
r.sq <- max(cor(y,yfitted),0)^2
adj.r.sq <- 1 - (1 - r.sq) * (n - int.df) / r.df
I think this would also make sense if you have normal Gaussian errors: it is the correlation between the observed and fitted y (clipped at zero, so that a negative relationship implies zero predictive power), squared, and then adjusted for the number of fitted parameters in the adjusted version. If y and yfitted go in the same direction, this would be the R2 and adjusted R2 value as reported for a regular linear model. To me this would make perfect sense at least, so I don't agree with outright rejecting the usefulness of pseudo-R2 values for nls models, as the answer above seems to imply.
For non-normal error structures (e.g. if you were using a GAM with non-normal errors) the McFadden pseudo R2 is defined analogously as
1-residual deviance/null deviance
See here and here for some useful discussion.
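For a model object that reports deviances, such as a glm, the McFadden-style value above is a one-liner; a small illustrative sketch (the model itself is just an example, not taken from the original answer):
fit <- glm(am ~ wt + hp, family = binomial, data = mtcars)  # illustrative model
1 - fit$deviance / fit$null.deviance                        # McFadden pseudo-R2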
Another quasi-R-squared for non-linear models is to square the correlation between the actual y-values and the predicted y-values. For linear models this is the regular R-squared.
As an alternative, I have used the following procedure several times:
1. Compute a fit on the data with the nls function.
2. Using the resulting model, make predictions.
3. Plot the data against the values predicted by the model (if the model is good, the points should lie near the bisector).
4. Compute the R2 of that linear regression.
Best wishes to all. Patrick.
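A minimal sketch of that procedure, assuming y holds the observed values and f the values predicted by the nls fit (the names used in the question):
plot(f, y); abline(0, 1)          # points should lie near the 1:1 line for a good fit
summary(lm(y ~ f))$r.squared      # R2 of the observed-vs-predicted regression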
With the modelr package
modelr::rsquare(nls_model, data)
nls_model <- nls(mpg ~ a / wt + b, data = mtcars, start = list(a = 40, b = 4))
modelr::rsquare(nls_model, mtcars)
# 0.794
This gives essentially the same result as the longer way described by Tom from the rcompanion resource.
Longer way with nagelkerke function
nullfunct <- function(x, m){m}
null_model <- nls(mpg ~ nullfunct(wt, m),
data = mtcars,
start = list(m = mean(mtcars$mpg)))
nagelkerke(nls_model, null_model)[2]
# 0.794 or 0.796
Lastly, using predicted values
lm(mpg ~ predict(nls_model), data = mtcars) %>% broom::glance()
# 0.795
Like they say, it's only an approximation.

How to do predictions from a Cox survival model with time-varying coefficients

I have built a survival cox-model, which includes a covariate * time interaction (non-proportionality detected).
I am now wondering how could I most easily get survival predictions from my model.
My model was specified:
coxph(formula = Surv(event_time_mod, event_indicator_mod) ~ Sex +
ageC + HHcat_alt + Main_Branch + Acute_seizure + TreatmentType_binary +
ICH + IVH_dummy + IVH_dummy:log(event_time_mod))
And now I was hoping to get a prediction using survfit, providing newdata for the combination of variables for which I am doing the predictions:
survfit(cox, newdata = new)
Now, as I have event_time_mod on the right-hand side of my model, I need to specify it in the new data frame passed to survfit. This event_time_mod would need to be set at the individual times of the predictions. Is there an easy way to tell survfit to set event_time_mod to the correct time?
Or are there any other options for achieving predictions from my model?
Of course, I could create as many rows in the new data frame as there are distinct times in the predictions and set event_time_mod to the correct values, but that feels really cumbersome and I thought there must be a better way.
You have done what is referred to as
An obvious but incorrect approach ...
as stated in the Using Time Dependent Covariates and Time Dependent Coefficients in the Cox Model vignette in version 2.41-3 of the R survival package. Instead, you should use the time-transform functionality, i.e., the tt function, as stated in the same vignette. The code would be something similar to the example in the vignette:
> library(survival)
> vfit3 <- coxph(Surv(time, status) ~ trt + prior + karno + tt(karno),
+ data=veteran,
+ tt = function(x, t, ...) x * log(t+20))
>
> vfit3
Call:
coxph(formula = Surv(time, status) ~ trt + prior + karno + tt(karno),
data = veteran, tt = function(x, t, ...) x * log(t + 20))
coef exp(coef) se(coef) z p
trt 0.01648 1.01661 0.19071 0.09 0.9311
prior -0.00932 0.99073 0.02030 -0.46 0.6462
karno -0.12466 0.88279 0.02879 -4.33 1.5e-05
tt(karno) 0.02131 1.02154 0.00661 3.23 0.0013
Likelihood ratio test=53.8 on 4 df, p=5.7e-11
n= 137, number of events= 128
survfit, though, does not work when you have a tt term:
> survfit(vfit3, veteran[1, ])
Error in survfit.coxph(vfit3, veteran[1, ]) :
The survfit function can not yet process coxph models with a tt term
However, you can easily get the terms, the linear predictor, or the mean response with predict. Further, you can construct the tt term over time using the answer here.
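A short sketch of the predict route mentioned above, reusing vfit3 (the type values are standard predict.coxph options):
head(predict(vfit3, type = "lp"))      # linear predictor for each subject
head(predict(vfit3, type = "terms"))   # per-term contributions, including tt(karno)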
