How to calculate R-squared in nls package (non-linear model) in R? - r

I fitted a non-linear regression using the nls() function.
power<- nls(formula= agw~a*area^b, data=calibration_6, start=list(a=1, b=1))
summary(power)
I have heard that R-squared is not valid for non-linear models, and that the residual standard error, which R reports, is usually shown instead.
However, I would still like to know the R-squared value. Is it possible to get R-squared from an nls fit?
Many thanks!

I found a solution. This method might not be statistically correct (R^2 is not valid for non-linear models), but I just want to see the overall goodness of fit of my non-linear model.
Step 1> Transform the data to the log scale (common logarithm)
When I fit the non-linear model, summary() does not report R^2:
nls(formula= agw~a*area^b, data=calibration, start=list(a=1, b=1))
Therefore, I log-transform my data:
x1<- log10(calibration$area)
y1<- log10(calibration$agw)
cal<- data.frame (x1,y1)
Step 2> Fit a linear regression to the log-transformed data
logdata<- lm (formula= y1~ x1, data=cal)
summary(logdata)
Call:
lm(formula = y1 ~ x1)
This model gives y = -0.122 + 1.42x on the log10 scale.
However, I want to force the intercept to zero, therefore:
Step 3> Force the intercept to zero
logdata2<- lm (formula= y1~ 0 + x1)
summary(logdata2)
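Now the equation is y = 1.322x on the log10 scale, which means log10(y) = 1.322 log10(x),
so on the original scale it is y = x^1.322. A zero intercept on the log scale means log10(a) = 0, so the coefficient a of the power curve is fixed at 1.
For this power curve with the intercept forced to zero, the R^2 is 0.9994. Note that this R^2 is measured on the log scale, and that for a model without an intercept R computes R^2 relative to zero rather than to the mean of y, so it is not comparable with the R^2 of the model in Step 2.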
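For comparison, a commonly used descriptive measure can be computed directly from the nls fit as 1 - RSS/TSS on the original scale. The sketch below is not part of the original answer: it uses simulated data (the data frame, the true parameter values and the object names are assumptions made for illustration), and the resulting "pseudo R^2" carries the same caveat as above, namely that it is not a proper R^2 for a non-linear model.

```r
# Simulated stand-in for the calibration data (assumption: the real data
# have columns `area` and `agw`, as in the question).
set.seed(42)
calibration <- data.frame(area = seq(1, 50, length.out = 40))
calibration$agw <- 0.8 * calibration$area^1.4 * exp(rnorm(40, sd = 0.05))

# Use a log-log linear fit only to get sensible starting values for nls().
log_fit <- lm(log(agw) ~ log(area), data = calibration)
start_vals <- list(a = exp(coef(log_fit)[[1]]), b = coef(log_fit)[[2]])

power_fit <- nls(agw ~ a * area^b, data = calibration, start = start_vals)

# Pseudo R^2: 1 - residual sum of squares / total sum of squares.
rss <- sum(residuals(power_fit)^2)
tss <- sum((calibration$agw - mean(calibration$agw))^2)
pseudo_r2 <- 1 - rss / tss
pseudo_r2
```

Because it is computed on the original scale of agw, this value describes the fit of the power curve itself rather than of the log-transformed line.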

Related

How to force intercept to zero using nls package in R?

I used nls() to fit a non-linear model (power curve, y = ax^b).
cal<- nls(formula= agw~a*area^b, data=calibration_6, start=list(a=1, b=1))
summary(cal)
What I want now is to force the intercept (a) to zero to check something. In Excel, I can't set the intercept for a power curve. Is it possible to set the intercept in R?
If so, could you tell me how to do it?
The y ~ x^b model is what I considered first:
nls(formula= agw ~ area^b, data=calibration_6, start=list(b=1))
but I also found another way; please see the question linked below.
How to calculate R-squared in nls package (non-linear model) in R?
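The two approaches mentioned here, dropping a from the nls formula and forcing a zero intercept in the log-log regression, both fix a at 1 rather than at 0. A minimal sketch on simulated data (the data and object names are assumptions for illustration, not the poster's data) shows that they estimate the same exponent b up to the different error assumptions of the two fits:

```r
# Simulated data that follow y = x^1.3 with multiplicative noise
# (hypothetical stand-in for calibration_6).
set.seed(7)
calibration <- data.frame(area = seq(2, 60, length.out = 40))
calibration$agw <- calibration$area^1.3 * exp(rnorm(40, sd = 0.05))

# Log-log regression with the intercept forced to zero: log10(a) = 0, so a = 1.
log_fit <- lm(log10(agw) ~ 0 + log10(area), data = calibration)
b_log <- coef(log_fit)[[1]]

# Non-linear fit of y = x^b, i.e. the power curve with a fixed at 1.
power_fit <- nls(agw ~ area^b, data = calibration, start = list(b = b_log))
b_nls <- coef(power_fit)[["b"]]

c(b_log = b_log, b_nls = b_nls)
```

The two estimates differ slightly because lm() minimises squared errors on the log scale while nls() minimises them on the original scale.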

Restricted Cubic Spline output in R rms package after cph

I am developing a Cox regression model in R.
The model I am currently using is as follows
fh <- cph(S ~ rcs(MPV,4) + rcs(age,3) + BMI + smoking + hyperten + gender +
rcs(FVCPP,3) + TLcoPP, x=TRUE, y=TRUE, surv=TRUE, time.inc=2*52)
If I then look at this with
print(fh, latex = TRUE)
I get 3 coefs/SE/Wald etc. for MPV (MPV, MPV' and MPV'') and 2 for age (age, age').
Could someone please explain what these outputs are? I believe they relate to the restricted cubic splines I have added.
When you write rcs(MPV,4), you define the number of knots to use in the spline; in this case 4. Similarly, rcs(age,3) defines a spline with 3 knots. Due to identifiability constraints, one degree of freedom is subtracted from each spline, so a spline with k knots yields k-1 coefficients. You can think of this as defining an intercept for each spline. So rcs(age,3) is a linear combination of 2 basis functions and an intercept, while rcs(MPV,4) is a linear combination of 3 basis functions and an intercept, i.e.,
rcs(age,3) = \beta_0 + \beta_1 f_1(age) + \beta_2 f_2(age)
and
rcs(MPV,4) = \alpha_0 + \alpha_1 g_1(MPV) + \alpha_2 g_2(MPV) + \alpha_3 g_3(MPV)
In the notation above, what you get out from the print statement are the regression coefficients \alpha_1, \alpha_2, \alpha_3 and \beta_1, \beta_2, with corresponding standard errors, p-values etc. The intercepts \alpha_0 and \beta_0 are typically set to zero, but they are important, because without them the model fitting routine would have no idea where on the y-axis to constrain the splines.
As a final note, you might actually be more interested in the output of summary(fh).

Covariance matrix of the estimated parametric coefficients and estimated smoothing parameters in a GAM (package: mgcv)?

Consider the code below, which fits a generalized additive model with two terms: x1, which enters linearly, and x2, which enters as a smooth (nonlinear) term:
library(mgcv)
set.seed(2) ## simulate some data...
dat <- gamSim(1,n=400,dist="normal",scale=2)
b <- gam(y~x1+s(x2, k=5),data=dat, method="REML") ## method belongs to gam(), not gamSim()
The model b estimates 3 parameters: an intercept, one parametric coefficient for x1, and one smoothing parameter for x2. How can I extract the estimated covariance matrix of these 3 parameters? I have used vcov(b) which gives the following results:
             (Intercept)           x0      s(x1).1      s(x1).2      s(x1).3      s(x1).4
(Intercept)  0.104672470 -0.155791753  0.002356237  0.001136459  0.001611635  0.001522158
x0          -0.155791753  0.322528093 -0.004878003 -0.002352757 -0.003336490 -0.003151250
s(x1).1      0.002356237 -0.004878003  0.178914602  0.047701707  0.078393786  0.165195739
s(x1).2      0.001136459 -0.002352757  0.047701707  0.479869768  0.606310668  0.010704075
s(x1).3      0.001611635 -0.003336490  0.078393786  0.606310668  0.933905535  0.025816649
s(x1).4      0.001522158 -0.003151250  0.165195739  0.010704075  0.025816649  0.184471259
It seems vcov(b) gives the covariances related to each knot of the smooth term s(x1), as the results contain s(x1).1, s(x1).2, s(x1).3, s(x1).4 (that is my guess). I need the covariance between the estimated smoothing parameter and the other parametric coefficients, which should be just one value for (Intercept) and just one for x0. Is it available at all?
Edit: I set the method of estimation to REML in the code. I agree that I might have used incorrect phrases to explain my idea, as Gavin Simpson said, and I understand everything he said. Yet the idea of calculating the covariance between the parametric coefficients (intercept and coefficient of x1) and the smoothing parameter comes from the method of estimation. If we set it to ML or REML, then I guess such a covariance could exist. In this case, the estimated covariance matrix for the log smoothing parameter estimates is provided by sp.vcov. So I think a similar value could exist for the parametric coefficients and the smoothing parameter.
Your statement "The model b estimates 3 parameters: an intercept, one parametric coefficient for x1, and one smoothing parameter for x2." is incorrect.
The model estimates many more coefficients than these three. Note also that it is confusing to speak of a smoothing parameter for x2, as the model also estimates one of those, but I doubt this is what you mean by that phrase. The smoothing parameter estimated for x2 is the value that controls the wiggliness of the fitted spline. It is also estimated alongside the other coefficients you see, although it isn't typically considered part of the main estimated model parameters, because what you see in the VCOV are actually the variances and covariances of the model coefficients conditional upon this value of the smoothness parameter.
The GAM fitted here is one in which the effect of x2 is represented by a spline basis expansion of x2. For the basis used and the identifiability constraints applied to the basis, this means that the true effect of x2, f(x2), is estimated via k-1 basis functions. The estimate is the function \hat{f}(x2) = \sum_i \beta_i b_i(x2), obtained by summing the basis functions b_i evaluated at the observed values of x2, each weighted by \beta_i, the model coefficient for the ith basis function.
Hence once the basis is chosen and once we have a smoothness parameter (my version, the one controlling the wiggliness), this model is simply a GLM with x1 and the 4 basis functions evaluated at x2. Hence it is parametric and there isn't a single element in the VCOV that relates to the smooth f(x2) - the model just doesn't work that way.
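The points above can be checked on the fitted object. The following is a minimal sketch, assuming the model from the question with method = "REML" passed to gam(); the comments describe what I would expect to see rather than output copied from a run:

```r
library(mgcv)

set.seed(2)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2, verbose = FALSE)
b <- gam(y ~ x1 + s(x2, k = 5), data = dat, method = "REML")

# Model coefficients: intercept, x1, and k - 1 = 4 basis-function coefficients.
coef_names <- names(coef(b))
coef_names

# vcov(b) is the covariance matrix of those coefficients,
# conditional on the estimated smoothing parameter.
V <- vcov(b)
dim(V)

# The smoothing parameter (the "wiggliness" control) is stored separately.
b$sp

# Covariance matrix of the log smoothing parameter estimates
# (available for REML/ML fits); it does not involve the coefficients above.
sp.vcov(b)
```

Nothing in vcov(b) refers to the smoothing parameter itself, which is why no single row or column there corresponds to "the smooth of x2".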

Probability (density) of new dataset under fitted model

Given a fitted model in R (e.g. an object of class 'lm', 'glm', 'merMod', etc.), I am trying to figure out how to calculate the probability of a new dataset.
That is, I want the probability (density) of dataset B under the parameter estimates obtained by fitting a model to dataset A. I know how to do this in general, but I am wondering whether a simple pre-existing function can do this in R. Is there a simple function to do this?
This question is very similar, but I want to do this in R.
For a linear regression model (lm), you could use the following function to compute the likelihood, assuming the residuals of the linear model fit are normally distributed (these functions were adapted from an R-bloggers post; the rationale of the procedure can be found in this post):
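library(stats4)
## Negative log-likelihood of y given x under a normal linear model.
## x and y are numeric vectors that must already exist in the workspace.
log_lik <- function(beta0, beta1, mu, sigma) {
  R <- y - x * beta1 - beta0             ## residuals
  R <- dnorm(R, mu, sigma, log = TRUE)   ## log density of each residual
  return(-sum(R))
}
## beta0, beta1 and sigma require initial guesses; mu is fixed at 0
## because it cannot be separated from the intercept beta0
fit <- mle(log_lik, start = list(beta0 = 4, beta1 = 2, sigma = 1), fixed = list(mu = 0))
summary(fit)
## logLik(fit) returns the maximised log-likelihood
## sigma is the estimated residual standard deviation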
For glm, this post in Cross Validated provides user-defined R functions for the likelihood.
(ps: It would be nice if you could provide a specific example involving one of lm, glm, etc. If you just want to know these models in general, Cross Validated, Mathematics, or Data Science might be better places to ask.)
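For the specific case asked about, scoring dataset B under a model fitted to dataset A, a short sketch for lm follows. It is not from the original answer: the helper name new_data_loglik, the simulated data frames A and B, and the choice of the maximum-likelihood estimate of sigma are all assumptions made for illustration, and it relies on the usual assumption of independent Gaussian errors.

```r
# Log density of the responses in `newdata` under a fitted lm,
# treating the fitted coefficients and sigma as fixed.
# `new_data_loglik` is a hypothetical helper, not an existing R function.
new_data_loglik <- function(fit, newdata, response) {
  mu <- predict(fit, newdata = newdata)               # fitted means for B
  sigma_ml <- sqrt(sum(residuals(fit)^2) / nobs(fit)) # ML estimate of sigma
  sum(dnorm(newdata[[response]], mean = mu, sd = sigma_ml, log = TRUE))
}

# Simulated example data (assumed for illustration).
set.seed(1)
A <- data.frame(x = runif(50))
A$y <- 1 + 2 * A$x + rnorm(50, sd = 0.5)
B <- data.frame(x = runif(20))
B$y <- 1 + 2 * B$x + rnorm(20, sd = 0.5)

fit_A <- lm(y ~ x, data = A)
ll_B <- new_data_loglik(fit_A, B, "y")
ll_B
```

Applied to dataset A itself, the helper reproduces logLik(fit_A), which is a convenient sanity check; exp(ll_B) is the corresponding density, although it is usually safer to stay on the log scale.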

Determining the values of beta in a logistic regression equation

I read about logistic regression on Wikipedia. It describes an equation in which z, the output, depends on the values of beta1, beta2, beta3, and so on, which are called regression coefficients, along with beta0, which is the intercept.
How do you determine the values of these coefficients and the intercept, given a sample set where we know the output probability and the values of the input factors?
Use R. Here's a video: www.youtube.com/watch?v=Yv05RjKpEKY
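In R, the coefficients are usually estimated by maximum likelihood with glm() and family = binomial. A minimal sketch on simulated data (the variable names and the true coefficient values are assumptions chosen for illustration):

```r
# Simulate two inputs and a 0/1 outcome from a known logistic model.
set.seed(123)
n <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
z <- -0.5 + 1.2 * x1 - 0.8 * x2   # linear predictor with assumed true betas
p <- 1 / (1 + exp(-z))            # logistic function turns z into a probability
y <- rbinom(n, size = 1, prob = p)

# Fit the logistic regression; glm() estimates beta0, beta1, beta2
# by maximum likelihood.
fit <- glm(y ~ x1 + x2, family = binomial)
beta_hat <- coef(fit)
beta_hat
```

The estimates in beta_hat should be close to, but not exactly equal to, the values used in the simulation; summary(fit) adds standard errors and tests.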
