nls model not showing y0 in the parameter table, only a and b - R

I am trying to get the nonlinear estimates of y0, a, and b for my curve so I can plot it on my graph. The summary(nls.mod) output does not show the y0 that I need to plot the curve, and I am not sure why, as I have tried everything. The code is below:
# BH this version of plot is used for diagnostic plots for
# BH residuals of a linear model, i.e. using lm.
plot(mdl3 <- lm(ETR ~ wp_Mpa + I(wp_Mpa^2) + I(wp_Mpa^3), data = dat3))
prd <- data.frame(wp_Mpa = seq(-4, 0, by = 0.5))
result <- prd
result$mdl3 <- predict(mdl3, newdata = prd)  # newdata must use the model's predictor name
# BH use nls to fit this model y0+a*exp(b*x)
nls.mod <- nls(ETR ~ y0 + a*exp(b*wp_Mpa), start=c(a = -4, b = 0), data=dat3.no_na)
summary(nls.mod)
and here is the output:
Formula: ETR ~ y0 + a * exp(b * wp_Mpa)
Parameters:
Estimate Std. Error t value Pr(>|t|)
a 85.85515 8.62005 9.960 <2e-16 ***
b 0.14157 0.07444 1.902 0.0593 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 58.49 on 134 degrees of freedom
Number of iterations to convergence: 8
Achieved convergence tolerance: 1.515e-06
As you can see, for some reason only a and b show up, but y0 is supposed to appear above them.
I tried reassigning the variables, which just gave the same output, and I contacted a stats consultant who only said I needed to change the variables, but that didn't work either.
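In case it helps future readers, the likely cause: nls() treats only the names listed in start as parameters to estimate; any other name in the formula must already exist in data or in the environment, where it is used as a fixed constant. Since start = c(a = -4, b = 0) omits y0, nls() is silently picking up an existing y0 object instead of estimating it. A minimal sketch of the fix (the starting values are illustrative guesses, not tuned to this data):
# include y0 in start so nls() estimates it rather than
# reading a fixed y0 from the workspace
nls.mod <- nls(ETR ~ y0 + a * exp(b * wp_Mpa),
               start = list(y0 = 0, a = -4, b = 0),
               data = dat3.no_na)
summary(nls.mod)  # the parameter table should now include y0
# overlay the fitted curve on the data
cf <- coef(nls.mod)
plot(ETR ~ wp_Mpa, data = dat3.no_na)
curve(cf["y0"] + cf["a"] * exp(cf["b"] * x), add = TRUE, col = "red", lwd = 2)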

Related

How to calculate by hand standard errors and t statistics of minpack.lm::nlsLM?

Let's consider this code as an example:
a = 10
b = 2
c = 1.05
set.seed(123,kind="Mersenne-Twister",normal.kind="Inversion")
x = runif(100)
data = data.frame(X = x, Y = a + b/c * (((1-x)^-c)-1))
fit_sp <- minpack.lm::nlsLM( formula = Y ~ a + b/c * (((1-X)^-c)-1),
data = data, start = c(a = 0, b = 0.1, c = 0.01),
control = nls.control(maxiter = 1000),
lower = c(0.0001,0.0001,0.0001))
fit_sp
Nonlinear regression model
model: Y ~ a + b/c * (((1 - X)^-c) - 1)
data: data
a b c
10.00 2.00 1.05
residual sum-of-squares: 1.507e-26
Number of iterations to convergence: 13
Achieved convergence tolerance: 1.49e-08
summary(fit_sp)
Formula: Y ~ a + b/c * (((1 - X)^-c) - 1)
Parameters:
Estimate Std. Error t value Pr(>|t|)
a 1.000e+01 1.516e-15 6.598e+15 <2e-16 ***
b 2.000e+00 4.372e-16 4.574e+15 <2e-16 ***
c 1.050e+00 5.319e-17 1.974e+16 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.246e-14 on 97 degrees of freedom
Number of iterations to convergence: 13
Achieved convergence tolerance: 1.49e-08
I know that nonlinear least squares calculates the coefficients that minimize the sum of squared residuals. But how can I obtain the standard errors and t statistics of the parameter estimates by hand?
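For reference, the standard errors come from standard nonlinear least-squares theory: with residual variance s^2 = RSS / (n - p) and J the n x p Jacobian of the model function with respect to the parameters, evaluated at the estimates, the approximate covariance matrix is s^2 * (J'J)^(-1); the standard errors are the square roots of its diagonal, and each t value is estimate/SE. A sketch of the hand computation, reusing fit_sp from above (nls-class objects store the Jacobian, retrievable via fit_sp$m$gradient(), as with standard nls fits):
J <- fit_sp$m$gradient()                              # n x p Jacobian at the solution
s2 <- sum(residuals(fit_sp)^2) / df.residual(fit_sp)  # residual variance s^2
vc <- s2 * solve(t(J) %*% J)                          # approximate covariance matrix
se <- sqrt(diag(vc))                                  # standard errors
tval <- coef(fit_sp) / se                             # t statistics
pval <- 2 * pt(abs(tval), df.residual(fit_sp), lower.tail = FALSE)
cbind(Estimate = coef(fit_sp), Std.Error = se, t.value = tval, p.value = pval)
# this should reproduce summary(fit_sp)$coefficients (compare also vcov(fit_sp))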

Non-numeric argument to binary operator R NLS package

I am using the nls function in R to perform a nonlinear fit. I have specified my independent variable as follows:
t <- seq(1,7)
and my dependent variable as P <- c(0.0246, 0.2735, 0.5697, 0.6715, 0.8655, 0.9614, 1)
I then have tried:
m <- nls(P ~ 1 / (c + q*exp(-b*t))^(1/v))
but every time I get:
"Error in c + q * exp(-b * t) : non-numeric argument to binary
operator"
Every one of my variables is numeric. Any ideas?
Thanks!
You have more than one problem in your script. The main issue is that you should never use names that R already uses: t is the matrix transpose, c is the basic function for building vectors, and q is the quit shortcut. nls() will not try to fit them, as they are already defined. I recommend using more meaningful and less dangerous names such as Coef1, Coef2, …
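You can see the mechanism behind the error directly: c is a function, so the moment R tries to add it to a number you get exactly the message from the question:
c + 1
# Error in c + 1 : non-numeric argument to binary operator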
The second problem is that you are trying to fit a model with four parameters to a dataset of only seven points. This may yield singularities and other problems.
For the sake of the argument, I have reduced your model to three variables, and changed some names:
Time <- seq(1,7)
Prob <- c(0.0246, 0.2735, 0.5697, 0.6715, 0.8655, 0.9614, 1)
plot(Time, Prob)
And now we perform the nls() fit:
# explicit start values (without start, nls() defaults to 1 for every parameter, with a warning)
Fit <- nls(Prob ~ 1 / (Coef1 + Coef2 * exp(-Coef3 * Time)),
           start = list(Coef1 = 1, Coef2 = 1, Coef3 = 1))
X <- data.frame(Time = seq(0, 7, length.out = 100))
Y <- predict(object = Fit, newdata = X)
lines(X$Time, Y)
And a summary of the results:
summary(Fit)
# Formula: Prob ~ 1/(Coef1 + Coef2 * exp(-Coef3 * Time))
#
# Parameters:
# Estimate Std. Error t value Pr(>|t|)
# Coef1 1.00778 0.06113 16.487 7.92e-05 ***
# Coef2 23.43349 14.42378 1.625 0.1796
# Coef3 1.04899 0.21892 4.792 0.0087 **
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.06644 on 4 degrees of freedom
#
# Number of iterations to convergence: 12
# Achieved convergence tolerance: 3.04e-06
I know it is not exactly what you wanted, but I hope it helps.

R fGarch: presample matrix for garchSpec()

I am trying to specify a GARCH model with the function fGarch::garchSpec() and I need a specific presample. The manual defines it as:
presample: a numeric three column matrix with start values for the
series, for the innovations, and for the conditional
variances.
But I am pretty sure that this is not the correct order. After reading the manuals and code for the functions garchFit, garchSpec, and garchSim, I am still quite confused.
The question is: how to exactly build presample matrix?
You don't need to set the presample argument. The function will supply a "good" guess, and for estimating the parameters it will not matter. If you want to simulate data, I would just make sure the burn-in, n.start, is large enough.
Let's look at an example:
library(fGarch)
## First we simulate some data without setting presample:
# we set up the model by spec:
set.seed(911)
spec <- garchSpec(model = list(mu = 0.02, omega = 0.05, alpha = 0.2, beta = 0.75))
# then simulate our GARCH(1,1) model:
garchSim <- garchSim(spec, n = 200, n.start = 1)
plot(garchSim)
And estimates:
> garchFit(~ garch(1, 1), data = garchSim)
Error Analysis:
Estimate Std. Error t value Pr(>|t|)
mu -0.02196 0.05800 -0.379 0.7049
omega 0.03567 0.02681 1.331 0.1833
alpha1 0.12074 0.04952 2.438 0.0148 *
beta1 0.84527 0.05597 15.103 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Log Likelihood:
-265.8417 normalized: -1.329209
Let us now try to add a very extreme presample. In the above model (and with this seed) the presample was:
> spec@presample
Presample:
time z h y
1 0 -0.4324072 1 0.02
now we replace it with c(100, 0.1, 0.1). Since my model is a GARCH(1,1) without any ARMA part, I only need to set 3 values, as described in the documentation ?garchSpec. After updating spec we estimate the same model:
set.seed(911)
spec@presample <- matrix(c(100, 0.1, 0.1), ncol = 3)
garchFit(~ garch(1, 1), data = garchSim)
with the same output:
Error Analysis:
Estimate Std. Error t value Pr(>|t|)
mu -0.02196 0.05800 -0.379 0.7049
omega 0.03567 0.02681 1.331 0.1833
alpha1 0.12074 0.04952 2.438 0.0148 *
beta1 0.84527 0.05597 15.103 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Log Likelihood:
-265.8417 normalized: -1.329209
The likelihood and estimates are identical, but notice when we simulate with the new spec:
set.seed(911)
garchSim <- garchSim(spec, n = 200, n.start = 1)
plot(garchSim)
The extreme presample we supplied has messed up our nice simulation. But by increasing the burn-in, n.start, we get:
set.seed(911)
garchSim <- garchSim(spec, n = 200, n.start = 100)
plot(garchSim)
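As for the literal question of how to build the matrix: judging from the printed spec@presample above, the columns are stored in the order z (innovations), h (conditional variances), y (series), with one row per required lag; this differs from the order given in the manual text quoted in the question. A sketch for a plain GARCH(1,1), using values close to the default presample printed earlier:
# one row suffices for a GARCH(1,1) without an ARMA part;
# columns in the order shown by the print above: z, h, y
spec@presample <- matrix(c(0, 1, 0.02), ncol = 3)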

Variable Selection with mgcv

Is there a way of automating variable selection of a GAM in R, similar to step? I've read the documentation of step.gam and selection.gam, but I've yet to see an answer with code that works. Additionally, I've tried method = "REML" and select = TRUE, but neither removes insignificant variables from the model.
I've theorized that I could create a step model and then use those variables to create the GAM, but that does not seem computationally efficient.
Example:
library(mgcv)
set.seed(0)
dat <- data.frame(rsp = rnorm(100, 0, 1),
pred1 = rnorm(100, 10, 1),
pred2 = rnorm(100, 0, 1),
pred3 = rnorm(100, 0, 1),
pred4 = rnorm(100, 0, 1))
model <- gam(rsp ~ s(pred1) + s(pred2) + s(pred3) + s(pred4),
data = dat, method = "REML", select = TRUE)
summary(model)
#Family: gaussian
#Link function: identity
#Formula:
#rsp ~ s(pred1) + s(pred2) + s(pred3) + s(pred4)
#Parametric coefficients:
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 0.02267 0.08426 0.269 0.788
#Approximate significance of smooth terms:
# edf Ref.df F p-value
#s(pred1) 0.8770 9 0.212 0.1174
#s(pred2) 1.8613 9 0.638 0.0374 *
#s(pred3) 0.5439 9 0.133 0.1406
#s(pred4) 0.4504 9 0.091 0.1775
#---
#Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#R-sq.(adj) = 0.0887 Deviance explained = 12.3%
#-REML = 129.06 Scale est. = 0.70996 n = 100
Marra and Wood (2011, Computational Statistics and Data Analysis 55, 2372-2387) compare various approaches for feature selection in GAMs. They concluded that an additional penalty term in the smoothness selection procedure gave the best results. In mgcv::gam() this can be activated either by setting select = TRUE (which works with any basis) or by giving each smooth a shrinkage basis such as bs = "ts" or bs = "cs":
model <- gam(rsp ~ s(pred1, bs = "ts") + s(pred2, bs = "ts") + s(pred3, bs = "ts") + s(pred4, bs = "ts"),
             data = dat, method = "REML")
model <- gam(rsp ~ s(pred1, bs = "cr") + s(pred2, bs = "cr") + s(pred3, bs = "cr") + s(pred4, bs = "cr"),
             data = dat, method = "REML", select = TRUE)
In addition to specifying select = TRUE in your call to function gam, you can increase the value of argument gamma to get stronger penalization. For example, we generate some data:
library("mgcv")
set.seed(2)
dat <- gamSim(1, n=400, dist="normal", scale=5)
## Gu & Wahba 4 term additive model
We fit a GAM with 'standard' penalization and variable selection:
b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data=dat, method = "REML")
summary(b)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ s(x0) + s(x1) + s(x2) + s(x3)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.890 0.246 32.07 <2e-16 ***
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(x0) 1.363 1.640 0.804 0.3174
## s(x1) 1.681 2.088 11.309 1.35e-05 ***
## s(x2) 5.931 7.086 16.240 < 2e-16 ***
## s(x3) 1.002 1.004 4.102 0.0435 *
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
##
## R-sq.(adj) = 0.253 Deviance explained = 27.1%
## -REML = 1212.5 Scale est. = 24.206 n = 400
par(mfrow = c(2, 2))
plot(b)
We fit a GAM with stronger penalization and variable selection:
b2 <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data=dat, method = "REML", select = TRUE, gamma = 7)
summary(b2)
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ s(x0) + s(x1) + s(x2) + s(x3)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.8898 0.2604 30.3 <2e-16 ***
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(x0) 5.330e-05 9 0.000 0.1868
## s(x1) 5.427e-01 9 0.967 7.4e-05 ***
## s(x2) 1.549e+00 9 6.210 < 2e-16 ***
## s(x3) 6.155e-05 9 0.000 0.0812 .
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
##
## R-sq.(adj) = 0.163 Deviance explained = 16.7%
## -REML = 179.46 Scale est. = 27.115 n = 400
plot(b2)
According to the documentation, increasing the value of gamma produces smoother models, because it multiplies the effective degrees of freedom in the GCV or UBRE/AIC criterion.
A possible downside is thus that all non-linear effects will be shrunk towards linear effects, and all linear effects will be shrunk towards zero. This is what we also observe in the plots and output above: with a higher value of gamma, some effects are practically penalized out (edf values close to 0, F value of 0), while the remaining effects are closer to linear (edf values closer to 1).
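If you want the shrunken terms actually removed from the model formula, one option is to refit without the smooths whose edf has been penalized to (almost) zero. A sketch, where the 0.01 cut-off is an arbitrary illustrative threshold, not an mgcv convention:
# drop smooths whose edf was shrunk to ~0 by the penalty, then refit
edfs <- summary(b2)$s.table[, "edf"]
keep <- names(edfs)[edfs > 0.01]   # here "s(x1)" and "s(x2)"
b3 <- gam(reformulate(keep, response = "y"), data = dat, method = "REML")
summary(b3)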

nls line of best fit - how to force plotting of line?

I am trying to write a basic function to add some lines of best fit to plots using nls.
This works fine unless the data happen to be defined exactly by the formula passed to nls. I'm aware of the issue and that this is documented behaviour, as reported here.
My question, though, is how I can get around this and force a line of best fit to be plotted even when the data are exactly described by the model. Is there a way to detect an exact match and plot the perfectly fitting curve? My current dodgy solution is:
#test data
x <- 1:10
y <- x^2
plot(x, y, pch=20)
# polynomial line of best fit
f <- function(x,a,b,d) {(a*x^2) + (b*x) + d}
fit <- nls(y ~ f(x,a,b,d), start = c(a=1, b=1, d=1))
co <- coef(fit)
curve(f(x, a=co[1], b=co[2], d=co[3]), add = TRUE, col="red", lwd=2)
Which fails with the error:
Error in nls(y ~ f(x, a, b, d), start = c(a = 1, b = 1, d = 1)) :
singular gradient
The easy fix I apply is to jitter the data slightly, but this seems a bit destructive and hackish.
# the above code works after doing...
y <- jitter(x^2)
Is there a better way?
Use Levenberg-Marquardt.
x <- 1:10
y <- x^2
f <- function(x,a,b,d) {(a*x^2) + (b*x) + d}
fit <- nls(y ~ f(x,a,b,d), start = c(a=1, b=0, d=0))
Error in nls(y ~ f(x, a, b, d), start = c(a = 1, b = 0, d = 0)) :
number of iterations exceeded maximum of 50
library(minpack.lm)
fit <- nlsLM(y ~ f(x,a,b,d), start = c(a=1, b=0, d=0))
summary(fit)
Formula: y ~ f(x, a, b, d)
Parameters:
Estimate Std. Error t value Pr(>|t|)
a 1 0 Inf <2e-16 ***
b 0 0 NA NA
d 0 0 NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0 on 7 degrees of freedom
Number of iterations to convergence: 1
Achieved convergence tolerance: 1.49e-08
Note that I had to adjust the starting values and the result is sensitive to starting values.
fit <- nlsLM(y ~ f(x,a,b,d), start = c(a=1, b=0.1, d=0.1))
summary(fit)
Parameters:
Estimate Std. Error t value Pr(>|t|)
a 1.000e+00 2.083e-09 4.800e+08 < 2e-16 ***
b -7.693e-08 1.491e-08 -5.160e+00 0.00131 **
d 1.450e-07 1.412e-08 1.027e+01 1.8e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6.191e-08 on 7 degrees of freedom
Number of iterations to convergence: 3
Achieved convergence tolerance: 1.49e-08
