Calculation of R^2 value for a non-linear regression in R

I would first like to say that I understand that calculating an R^2 value for a non-linear regression isn't exactly correct or valid.
However, I'm in a transition period, moving most of our work from SigmaPlot over to R, and for our non-linear (concentration-response) models, colleagues are used to seeing an R^2 value reported with the model as an estimate of goodness-of-fit.
SigmaPlot calculates R^2 as 1 - (residual SS / total SS), but in R I can't seem to extract the total SS (the residual SS is reported in the output).
Any help in getting this to work would be greatly appreciated as I try to move us toward a better estimator of goodness-of-fit.
Cheers.

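Instead of extracting the total SS, I've just calculated it:
test.mdl <- nls(ctrl.adj ~ a / (1 + (conc.calc / x0)^b),
                data = dataSet,
                start = list(a = 100, b = 10, x0 = 40), trace = TRUE)
# R^2 = 1 - residual SS / total SS; deviance() returns the residual SS of an nls fit
1 - deviance(test.mdl) / sum((dataSet$ctrl.adj - mean(dataSet$ctrl.adj))^2)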
I get the same R^2 as when using SigmaPlot, so all should be good.

So the total variation in y is (n-1)*var(y), and the variation not explained by your model is sum(residuals(fit)^2), so you can compute something like 1 - sum(residuals(fit)^2) / ((n-1)*var(y)).
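A minimal sketch of that calculation, using R's built-in Puromycin data and the self-starting Michaelis-Menten model SSmicmen() in place of a concentration-response model (both chosen only for illustration, not the asker's data). It also confirms that the residual SS equals deviance() and that (n-1)*var(y) equals the total SS used in the first answer.

```r
# Michaelis-Menten fit to the built-in Puromycin data (treated samples only);
# SSmicmen() is self-starting, so no start values are needed
treated <- subset(Puromycin, state == "treated")
fit <- nls(rate ~ SSmicmen(conc, Vm, K), data = treated)

y <- treated$rate
n <- length(y)

rss <- sum(residuals(fit)^2)  # unexplained variation (same as deviance(fit))
tss <- (n - 1) * var(y)       # total variation (same as sum((y - mean(y))^2))
r2  <- 1 - rss / tss          # SigmaPlot-style R^2
print(r2)
```

Because both routes give the same numbers, either form can be used in scripts.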

Related

SE for logistic regression predictions

I have been tasked with calculating the SE for logistic regression point estimates (all my predictor variables are factors). I typically use ggpredict() to obtain predictions, which also provides CIs. However, we are comparing our results to estimates from program MARK, and we find that readers understand our plots more easily with SEs than with 95% CIs.
Based on the package notes, it appears I can simply calculate (conf.high - predicted value)/1.96. Am I correct, or am I missing something and this is not the correct way to calculate the SE of the predicted estimates? If I am wrong, any ideas on how I can do this, or do I need to just use CIs?
Thank you very much for your help.
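A sketch of why that shortcut is only approximate, assuming the interval is built on the logit scale and back-transformed (the usual approach for logistic models): the back-transformed interval is asymmetric, so (conf.high - predicted)/1.96 does not recover an SE on the probability scale. The example uses base R's glm() on the built-in mtcars data with a single factor predictor (cyl), chosen only for illustration, and compares the shortcut with the delta-method SE, se_link * p * (1 - p), which is what predict(..., type = "response", se.fit = TRUE) returns for a logit model.

```r
# Logistic regression with one factor predictor on the built-in mtcars data
cars <- transform(mtcars, cyl = factor(cyl))
fit  <- glm(am ~ cyl, family = binomial, data = cars)
nd   <- data.frame(cyl = factor(c("4", "6", "8"), levels = levels(cars$cyl)))

# Predictions and SEs on the logit (link) scale
lp <- predict(fit, newdata = nd, type = "link", se.fit = TRUE)
z  <- qnorm(0.975)

# Back-transformed estimate and upper 95% limit (asymmetric on the probability scale)
p     <- plogis(lp$fit)
upper <- plogis(lp$fit + z * lp$se.fit)

# Delta-method SE on the probability scale
se_prob <- lp$se.fit * p * (1 - p)

# The shortcut from the question, applied on the probability scale
naive <- (upper - p) / z
round(cbind(p, se_prob, naive), 3)
```

On the logit scale itself the interval is symmetric, so the same shortcut computed there returns lp$se.fit exactly.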

Type of regression for large dataset, nonlinear, skewed in R

I'm researching moth biomass in different biotopes, and I want to find a model that estimates the biomass. I have measured the length and width of the forewing, abdomen and thorax of 37088 specimens, and I have weighed them individually (dried).
First, I wanted to fit a simple linear regression of biomass on each variable. The problem is that none of the assumptions are met: the relationships are not linear, biomass (and some variables) don't follow a normal distribution, there is heteroskedasticity, and there are a lot of outliers. I have tried to transform my data using log, x^2, 1/x, and Box-Cox, but none of them actually helped. I have also tried Theil-Sen regression (not possible because of too much data) and Siegel regression (biomass is not a vector). Is there some other form of non-parametric or median-based regression that I can try? I am really out of ideas.
Here is a frequency histogram for biomass:
[Figure: frequency histogram of dry biomass]
So what I actually want is a model that accurately estimates dry biomass from the measurements I performed. I have a power function (Rogers et al.) that is general for all insects, but there is a significant difference between its estimates and what I actually weighed. Therefore, I want to build a model with all significant variables. I am not very familiar with power functions, but maybe it is possible to build one myself? Can anyone recommend a method? Thanks in advance.
To fit a power function, you could try nlsLM() from the minpack.lm package:
library(minpack.lm)
# start values are placeholders; choose ones suited to your data
m <- nlsLM(y ~ a * x^b, data = your.data.here, start = list(a = 1, b = 1))
Then check whether it performs satisfactorily.
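A minimal sketch of building a power function yourself, on simulated data (the names wing and mass, the exponent 2.5 and the noise level are all hypothetical). Taking logs turns y = a * x^b into a straight line, log(y) = log(a) + b * log(x), so an ordinary lm() on the log-log scale estimates log(a) and b; those estimates also make good start values for a direct nonlinear fit. Base R's nls() is used here; nlsLM() accepts the same formula and start arguments.

```r
set.seed(1)
# Hypothetical data: forewing length and dry mass with multiplicative noise
wing <- runif(200, min = 5, max = 25)
mass <- 0.01 * wing^2.5 * exp(rnorm(200, sd = 0.2))

# Power function on the log-log scale: log(mass) = log(a) + b * log(wing)
ll <- lm(log(mass) ~ log(wing))
a0 <- exp(unname(coef(ll)[1]))
b0 <- unname(coef(ll)[2])

# Use the log-log estimates as start values for a direct power-law fit
pw <- nls(mass ~ a * wing^b, start = list(a = a0, b = b0))
coef(pw)
```

If the scatter grows with size, as the question's heteroskedasticity suggests, the log-log lm() (which assumes multiplicative error) may itself be the more suitable model; further measurements could enter as extra log terms, e.g. log(mass) ~ log(wing) + log(thorax), with thorax a hypothetical extra column.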

Interpret GLM formula as an equation

Sorry to ask such a basic question, but I am stuck interpreting the glm formula.
I fitted a binomial model, and I want to use the formula output (the intercept and each of the estimated coefficients) to "manually" calculate the score predicted by the model.
In the case of a linear regression this should be something like
y = a +b1*x1+ ... + bn*xn
or
score = intercept + x.1*variable.1 + ... + x.n* variable.n
but I understand that logistic regression is different, and I could not find out how.
Could someone help me with this?
Thanks in advance!
I found out that the score can be calculated manually by applying the formula as given in the output and then applying inv.logit() to the value obtained.
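A small sketch of that manual calculation, using base R's glm() on the built-in mtcars data (am modelled from wt and hp, chosen only for illustration). The coefficients give the score on the log-odds scale, score = intercept + b1*x1 + ... + bn*xn, exactly as in a linear regression; the inverse logit, 1/(1 + exp(-score)), then turns it into a probability. Base R's plogis() computes the same inverse logit as inv.logit().

```r
fit <- glm(am ~ wt + hp, family = binomial, data = mtcars)
b <- coef(fit)

# Linear predictor ("score") on the log-odds scale, written out by hand
eta <- b["(Intercept)"] + b["wt"] * mtcars$wt + b["hp"] * mtcars$hp

# Inverse logit converts log-odds to probabilities
p_manual <- 1 / (1 + exp(-eta))

# Compare with the probabilities R computes itself
p_predict <- predict(fit, type = "response")
head(cbind(p_manual, p_predict))
```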

Pseudo R squared for cumulative link function

I have an ordinal dependent variable and am trying to predict it from a number of independent variables. I use R. The function I use is clm() in the ordinal package, to fit a cumulative link model with a probit link, to be precise:
I tried the function pR2 in the package pscl to get the pseudo R squared with no success.
How do I get pseudo R squareds with the clm function?
Thanks so much for your help.
There are a variety of pseudo-R^2 measures. I don't like to use any of them because I do not see the results as having a meaning in the real world. They do not estimate effect sizes of any sort, and they are not particularly good for statistical inference. Furthermore, in situations like this with multiple observations per entity, I think it is debatable which value of "n" (the number of subjects) or which degrees of freedom is appropriate. Some people use McFadden's R^2, which is relatively easy to calculate, since clm generates a list with one of its elements named "logLik". You just need to know that the log-likelihood is only a multiplicative constant (-2) away from the deviance. For the model in the first example:
library(ordinal)
data(wine)
fm1 <- clm(rating ~ temp * contact, data = wine)
fm0 <- clm(rating ~ 1, data = wine)
( McF.pR2 <- 1 - fm1$logLik/fm0$logLik )
[1] 0.1668244
I had seen this question on CrossValidated and was hoping to see the more statistically sophisticated participants over there take this one on, but they saw it as a programming question and dumped it over here. Perhaps their opinion of R^2 as a worthwhile measure is as low as mine?
I recommend using the nagelkerke() function from the rcompanion package to get pseudo R-squared values.
When your predictor or outcome variables are categorical or ordinal, the R-squared will typically be lower than with truly numeric data. R-squared is merely a very weak indicator of a model's fit, and you shouldn't choose a model based on it.
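A sketch of the textbook McFadden, Cox-Snell and Nagelkerke formulas written as one small function; pseudo_r2() is a hypothetical helper, not part of any package, and rcompanion's nagelkerke() may differ in details. It needs only the full and null log-likelihoods plus n, so it accepts the logLik element of clm objects; it is demonstrated on a base-R logistic model so it runs without extra packages. The choice of n remains debatable when there are repeated observations per subject.

```r
# Hypothetical helper: three common pseudo-R^2 values from two log-likelihoods
pseudo_r2 <- function(ll_full, ll_null, n) {
  ll_full <- as.numeric(ll_full)
  ll_null <- as.numeric(ll_null)
  mcfadden   <- 1 - ll_full / ll_null
  cox_snell  <- 1 - exp((2 / n) * (ll_null - ll_full))
  nagelkerke <- cox_snell / (1 - exp((2 / n) * ll_null))  # rescaled to a max of 1
  c(McFadden = mcfadden, CoxSnell = cox_snell, Nagelkerke = nagelkerke)
}

# Demonstrated on a base-R logistic model
fit  <- glm(am ~ wt, family = binomial, data = mtcars)
fit0 <- glm(am ~ 1,  family = binomial, data = mtcars)
r2 <- pseudo_r2(logLik(fit), logLik(fit0), n = nrow(mtcars))
r2

# For the clm models fm1 and fm0 the call would be
# pseudo_r2(fm1$logLik, fm0$logLik, n = nrow(wine))
```

For ungrouped binary data the saturated log-likelihood is 0, so McFadden's value here also equals 1 - deviance(fit)/deviance(fit0).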

How to get 95% CIs using ezANOVA()

This is a programming question for people who like to use the ez package in R. I am accustomed to using linear mixed effects models with lmer(). Among the useful outputs of lmer(), I get a coefficient value for each of my experimental factors, and with pvals.fnc() I can easily get 95% confidence intervals (CIs) to report together with the model coefficients.
I have recently started using ezANOVA, and I would like to know: Is there a mainstream way to get the same output? That is, I'd like to get a value for the coefficient of an experimental factor and a CI to go along with it. Here is sample code to make this concrete:
library(languageR) #necessary to use pvals.fnc()
library(lme4) #necessary for lmer()
library(ez) #necessary for ezANOVA
data(ANT) #load sample data
If I were using lmer, I would estimate my model and then get 95% CIs for the coefficients:
model_lmer = lmer( formula = rt ~ cue*flank + (1|subnum), data = ANT)
pvals.fnc(model_lmer, withMCMC=T)$fixed
So, for example, I know that the estimate of the interaction between cue and flank (when cue has the level "center" and flank has the level "congruent") is -3.9511 and the 95% CI is [-12.997, 5.535]
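As far as I know, pvals.fnc() relied on MCMC sampling that lme4 dropped in version 1.0, so it no longer runs with current lme4. A sketch of the current route uses lme4's own confint() method, which offers method = "profile" (the default), "Wald" and "boot"; parm = "beta_" restricts the output to fixed effects. To keep the example self-contained it uses the sleepstudy data bundled with lme4 rather than ANT; with the model above the call would be confint(model_lmer, parm = "beta_", method = "Wald").

```r
library(lme4)

# sleepstudy ships with lme4: reaction time by days of sleep deprivation
fm <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Wald 95% CIs for the fixed-effect coefficients only
ci_wald <- confint(fm, parm = "beta_", method = "Wald")
cbind(estimate = fixef(fm), ci_wald)
```

Wald intervals are simply estimate +/- 1.96 * SE; the profile and bootstrap methods are slower but do not rely on that normal approximation.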
Now say that I want to run an ANOVA by subjects and by items using ezANOVA, and I want to get 95% CIs for the by-subject estimates. This is my model:
model.f1 = ezANOVA(data=ANT, dv=rt,wid=subnum,within=.(cue,flank),return_aov=T)
But in the output, I don't see the model estimates when I do:
model.f1$ANOVA
And I don't know how to calculate the 95% CIs corresponding to those estimates. I think I should be able to use ezBoot(), but I tried and am not sure how to implement it.
Any suggestions? Thanks for your help!
This answer was provided by the author of the "ez" package in another forum. I'm copying it here in case someone else finds it useful:
"One somewhat hacky way to get CIs for effects is to use ezStats() to get the means
and FLSD, compute the difference between the means to get the effect,
and divide the FLSD by sqrt(2) to get the CI"
