Getting p-value via nls in r - r

I'm a complete novice in R and I'm trying to do a non-linear least squares fit to some data. The following (SC4 and t are my data columns) seems to work:
fit = nls(SC4 ~ fin+(inc-fin)*exp(-t/T), data=asc4, start=c(fin=0.75,inc=0.55,T=150.0))
The "summary(fit)" command produces an output that doesn't include a p-value, which is ultimately, aside from the fitted parameters, what I'm trying to get. The parameters I get look sensible.
Formula: SC4 ~ fin + (inc - fin) * exp(-t/T)
Parameters:
Estimate Std. Error t value Pr(>|t|)
fin 0.73703 0.02065 35.683 <2e-16 ***
inc 0.55671 0.02206 25.236 <2e-16 ***
T 51.48446 21.25343 2.422 0.0224 *
--- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.04988 on 27 degrees of freedom
Number of iterations to convergence: 8
Achieved convergence tolerance: 4.114e-06
So is there any way to get a p-value? I'd be happy to use another command other than nls if that will do the job. In fact, I'd be happy to use gnuplot for example if there's some way to get a p-value from it (in fact gnuplot is what I normally use for graphics).
PS I'm looking for a p-value for the overall fit, rather than for individual coefficients.

The way to do this in R is you have to use the anova function to compute the fitted values of your current model and then fit your model with less variables, and then use the function anova(new_model,previous_model). The computed F-score will be closer to one if you cannot reject the null that the parameters for the variables you have removed are equal to zero. The summary function when doing the standard linear regression will usually do this for you automatically.
So for example this is how you would do it for the standard linear regression:
> x = rnorm(100)
> y=rnorm(100)
> reg = lm(y~x)
> summary(reg)
Call:
lm(formula = y ~ x)
Residuals:
Min 1Q Median 3Q Max
-2.3869 -0.6815 -0.1137 0.7431 2.5939
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.002802 0.105554 -0.027 0.9789
x -0.182983 0.104260 -1.755 0.0824 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.056 on 98 degrees of freedom
Multiple R-squared: 0.03047, Adjusted R-squared: 0.02058
F-statistic: 3.08 on 1 and 98 DF, p-value: 0.08237
But then if you use anova you should get the same F-score:
> anova(reg)
Analysis of Variance Table
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
x 1 3.432 3.4318 3.0802 0.08237 .
Residuals 98 109.186 1.1141
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Related

Where can I find the significance of my random effect for a GAMM in gamlss?

I created the following GAMM function using the R package gamlss:
model<-gamlss(Overlap~ Diff.Long + Diff.Fzp + DiffSeason +
random(Xnumber),family=BEZI(mu.link = "logit", sigma.link = "log",
nu.link = "logit"),data=data,trace=F)
The output of this model is:
******************************************************************
Family: c("BEZI", "Zero Inflated Beta")
Call: gamlss(formula = Overlap ~ Diff.Long + Diff.Fzp + DiffSeason +
random(Xnumber), family = BEZI(mu.link = "logit",
sigma.link = "log", nu.link = "logit"), data = data, trace = F)
Fitting method: RS()
------------------------------------------------------------------
Mu link function: logit
Mu Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.188647 0.208359 -0.905 0.36715
Diff.Long -0.002072 0.000736 -2.814 0.00575 **
Diff.Fzp -0.030909 0.013749 -2.248 0.02648 *
DiffSeasonEW-LW -0.617976 0.211260 -2.925 0.00415 **
DiffSeasonLW-LW -0.356989 0.270548 -1.320 0.18963
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
------------------------------------------------------------------
Sigma link function: log
Sigma Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.865 0.126 14.8 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
------------------------------------------------------------------
Nu link function: logit
Nu Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.3470 0.3156 -7.437 2.02e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
------------------------------------------------------------------
NOTE: Additive smoothing terms exist in the formulas:
i) Std. Error for smoothers are for the linear effect only.
ii) Std. Error for the linear terms maybe are not accurate.
------------------------------------------------------------------
No. of observations in the fit: 126
Degrees of Freedom for the fit: 11.15247
Residual Deg. of Freedom: 114.8475
at cycle: 5
Global Deviance: -43.54531
AIC: -21.24037
SBC: 10.39118
******************************************************************
I'm not the most familiar yet with additive models, and am trying to find the significance of my random effect ("Xnumber"). I know that the package mgcv has a way but the package gamlss is the only one to have the distribution I need (zero-inflated beta).
If anyone knows any functions I can use, that would be great? Or is it in the summary, but I just don't know where to look?

What is ga() in the gamlss package doing?

I have been looking into the gamlss package for fitting semiparametric models and came across something strange in the ga() function. Even if the model is specified as having a gamma distribution, fitted using REML, the output for the model is Gaussian, fitted using GCV.
Example::
library(mgcv)
library(gamlss)
library(gamlss.add)
data(rent)
ga3 <- gam(R~s(Fl)+s(A), method="REML", data=rent, family=Gamma(log))
gn3 <- gamlss(R~ga(~s(Fl)+s(A), method="REML"), data=rent, family=GA)
Model summary for the GAM::
summary(ga3)
Family: Gamma
Link function: log
Formula:
R ~ s(Fl) + s(A)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.667996 0.008646 771.2 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(Fl) 1.263 1.482 442.53 <2e-16 ***
s(A) 4.051 4.814 36.34 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.302 Deviance explained = 28.8%
-REML = 13979 Scale est. = 0.1472 n = 1969
Model summary for the GAMLSS::
summary(getSmo(gn3))
Family: gaussian
Link function: identity
Formula:
Y.var ~ s(Fl) + s(A)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.306e-13 8.646e-03 0 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(Fl) 1.269 1.492 440.14 <2e-16 ***
s(A) 3.747 4.469 38.83 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.294 Deviance explained = 29.6%
GCV = 0.97441 Scale est. = 0.97144 n = 1969
Question::
Why is the model output giving the incorrect distribution and fitting method? Is there something that I am missing here and this is correct?
When using the ga()-function, gamlss calls in the background the gam()-function from mgcv without specifying the family. As a result, the splines are fitted assuming a normal distribution. Therefore you see when showing the fitted smoothers family: gaussian and link function: identity. Also note that the scale estimate returned when using ga() is related to the normal distribution.
Yes, when using the ga()-function, each gamlss iteration calls in the background the gam()-function from mgcv. It uses the correct local working variable and local weights for a gamma distribution on each iteration.

Interpreting Kendall-Theil Sen Siegel Nonparametric Linear Regression from RStudio and writing up APA style

I am a newbie when it comes to stats and I am performing Kendall-Theil Sen (Siegel variation) linear regressions for my dissertation. I am really not sure how to interpret the output and google hasn't offered much help either. I know the estimate reflects the regression coefficients but I am completely oblivious as how to write this up correctly.
This is my output for one regression:
Call:
mblm(formula = SCI_TotalScore ~ DefeatTotalScore, dataframe = complete_dat_totals)
Residuals:
Min 1Q Median 3Q Max
-11.6089 -4.0464 -0.3589 6.1411 14.3911
Coefficients:
Estimate MAD V value Pr(>|V|)
(Intercept) 23.8589 7.6916 4095.0 < 2e-16 ***
DefeatTotalScore -0.2500 0.2553 245.5 1.05e-12 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6.247 on 88 degrees of freedom
Can anyone offer any help?

linear regression r comparing multiple observations vs single observation

Based upon answers of my question, I am supposed to get same values of intercept and the regression coefficient for below 2 models. But they are not the same. What is going on?
is something wrong with my code? Or is the original answer wrong?
#linear regression average qty per price point vs all quantities
x1=rnorm(30,20,1);y1=rep(3,30)
x2=rnorm(30,17,1.5);y2=rep(4,30)
x3=rnorm(30,12,2);y3=rep(4.5,30)
x4=rnorm(30,6,3);y4=rep(5.5,30)
x=c(x1,x2,x3,x4)
y=c(y1,y2,y3,y4)
plot(y,x)
cor(y,x)
fit=lm(x~y)
attributes(fit)
summary(fit)
xdum=c(20,17,12,6)
ydum=c(3,4,4.5,5.5)
plot(ydum,xdum)
cor(ydum,xdum)
fit1=lm(xdum~ydum)
attributes(fit1)
summary(fit1)
> summary(fit)
Call:
lm(formula = x ~ y)
Residuals:
Min 1Q Median 3Q Max
-8.3572 -1.6069 -0.1007 2.0222 6.4904
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 40.0952 1.1570 34.65 <2e-16 ***
y -6.1932 0.2663 -23.25 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.63 on 118 degrees of freedom
Multiple R-squared: 0.8209, Adjusted R-squared: 0.8194
F-statistic: 540.8 on 1 and 118 DF, p-value: < 2.2e-16
> summary(fit1)
Call:
lm(formula = xdum ~ ydum)
Residuals:
1 2 3 4
-0.9615 1.8077 -0.3077 -0.5385
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 38.2692 3.6456 10.497 0.00895 **
ydum -5.7692 0.8391 -6.875 0.02051 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.513 on 2 degrees of freedom
Multiple R-squared: 0.9594, Adjusted R-squared: 0.9391
F-statistic: 47.27 on 1 and 2 DF, p-value: 0.02051
You are not calculating xdum and ydum in a comparable fashion because rnorm will only approximate the mean value you specify, particularly when you are sampling only 30 cases. This is easily fixed however:
coef(fit)
#(Intercept) y
# 39.618472 -6.128739
xdum <- c(mean(x1),mean(x2),mean(x3),mean(x4))
ydum <- c(mean(y1),mean(y2),mean(y3),mean(y4))
coef(lm(xdum~ydum))
#(Intercept) ydum
# 39.618472 -6.128739
In theory they should be the same if (and only if) the mean of the former model is equal to the point in the latter model.
This is not the case in your models, so the results are slightly different. For example the mean of x1:
x1=rnorm(30,20,1)
mean(x1)
20.08353
where the point version is 20.
There are similar tiny differences from your other rnorm samples:
> mean(x2)
[1] 17.0451
> mean(x3)
[1] 11.72307
> mean(x4)
[1] 5.913274
Not that this really matters, but just FYI the standard nomenclature is that Y is the dependent variable and X is the independent variable, which you reversed. Makes no difference of course, but just so you know.

Extract only coefficients whose p values are significant from a logistic model

I have run a logistic regression, the summary of which I name. "score" Accordingly, summary(score) gives me the following
Deviance Residuals:
Min 1Q Median 3Q Max
-1.3616 -0.9806 -0.7876 1.2563 1.9246
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.188286233 1.94605597 -2.1521921 0.031382230 *
Overall -0.013407201 0.06158168 -0.2177141 0.827651866
RTN -0.052959314 0.05015013 -1.0560154 0.290961160
Recorded 0.162863294 0.07290053 2.2340482 0.025479900 *
PV -0.086743611 0.02950620 -2.9398438 0.003283778 **
Expire -0.035046322 0.04577103 -0.7656878 0.443862068
Trial 0.007220173 0.03294419 0.2191637 0.826522498
Fitness 0.056135418 0.03114687 1.8022810 0.071501212 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 757.25 on 572 degrees of freedom
Residual deviance: 725.66 on 565 degrees of freedom
AIC: 741.66
Number of Fisher Scoring iterations: 4
What I am hoping to achieve is to get variables names and coefficients of those variables which have a *, **, or *** next to their Pr(>|z|) value. In other words, I want the aforementioned variables and coefficients with a Pr(>|z|) < .05.
Ideally, I'd like to get them in a data frame. Unfortunately, the following code I've tried does not work.
variable_try <-
summary(score)$coefficients[if(summary(score)$coefficients[, 4] <= .05,
summary(score)$coefficients[, 1]),]
Error: unexpected ',' in "variable_try <-
summary(score)$coefficients[if(summary(score)$coefficients[,4] < .05,"
What about this:
data.frame(summary(score)$coef[summary(score)$coef[,4] <= .05, 4])

Resources