R: Fit curve to points: what linear/non-linear model to use? - r

I have a data which should follow the power law distribution.
x = distance
y = %
I want to create a model and to add the fitted line to my plot.
My aim to recreate something like this:
As author uses R-square; I assume they applied linear models, as R^2 is not suitable for non-linear models. http://blog.minitab.com/blog/adventures-in-statistics-2/why-is-there-no-r-squared-for-nonlinear-regression
However, I can't find out how to "curve" my line to the points; how to add the formula y ~ a*x^(-b) to my model.
Instead of curly line I got back the line as from the simple linear regression.
My questions are:
Do I correctly assume the model y ~ a*x^(-b) used by author is linear?
what type of model to use to recreate my example: lm, glm, nls, etc. ?
I generated the dummy data, including the applied power law formula from the plot above:
set.seed(42)
scatt<-runif(10)
x<-seq(1, 1000, 100)
b = 1.8411
a = 133093
y = a*x^(-b) + scatt # add some variability in my dependent variable
plot(y ~ x)
and tried to create a glm model.
# formula for non-linear model
m<-m.glm<-glm(y ~ x^2, data = dat) #
# add predicted line to plot
lines(x,predict(m),col="red",lty=2,lwd=3)
This is my first time to model, so I am really confused and I don't know where to start... thank you for any suggestion or directions, I really appreciate it...

I personally think this question a dupe of this: `nls` fails to estimate parameters of my model but I would be cold-blooded if I close it (as OP put a bounty). Anyway, bounty question can not be closed.
So the best I could think of, is to post a community wiki answer (I don't want to get this bounty).
As you want to fit a model of this form y ~ a*x^(-b), it often benefit from taking log transform on both sides and fit a linear model log(y) ~ log(x).
fit <- lm(log(y) ~ log(x))
As you have already known how to use curve to plot regression curve and are happy with it, I will now show how to make plot.
Some people call this log-log regression. Here are some other links I have for such kind of regression:
How to predict a new value using simple linear regression log(y)=b0+b1*log(x)
How to plot confidence bands for my weighted log-log linear regression?

m <- lm(log(y) ~ log(x), data=dat)
a <- exp(intercept)
b <- -exp(slope)
plot(y ~ x, type="p", lty=3)
lines(x, exp(predict(m)), col="blue", lty=2, lwd=3)

Related

Curve Fitting in R

Any resources for curve fitting in R?
I came across https://systatsoftware.com/products/sigmaplot/product-uses/sigmaplot-products-uses-curve-fitting-using-sigmaplot/
Any similar recommendations or libraries in R?
Thank you!
Hi There are not one but several ways to do curve fitting in R. You could start with something as simple as below
x <- c(32,64,96,118,126,144,152.5,158)
#make y as response variable
y <- c(99.5,104.8,108.5,100,86,64,35.3,15)
plot(x,y,pch=19)
This should give you the below plot. Eyeballing the curve tells us we can fit some nice polynomial curve here.
Now we could fit our curve(s) on the data below:
linMod <- lm(y~x)
#second degree polynomial model
linMod2 <- lm(y~poly(x,2,raw=TRUE))
#third degree polynomial model
linMod3 <- lm(y~poly(x,3,raw=TRUE))
#fourth degree polynomial model
linMod4 <- lm(y~poly(x,4,raw=TRUE))
#generate new data in range of 50 numbers starting from 30 and ending at 160
newData <- seq(30,160, length=50)
plot(x,y,pch=19,ylim=c(0,150))
lines(newData, predict(linMod, data.frame(x=newData)), col="red")
lines(newData, predict(linMod2, data.frame(x=newData)), col="green")
lines(newData, predict(linMod3, data.frame(x=newData)), col="blue")
lines(newData, predict(linMod4, data.frame(x=newData)), col="purple")
Giving us:
This is just a simple illustration of curve fitting in R. There are tons of tutorials available out there, perhaps you could start looking here:
http://www.css.cornell.edu/faculty/dgr2/teach/R/R_CurveFit.pdf
https://rpubs.com/carlmart/228874
Fitting a curve to specific data

Plot Non-linear Mixed Model Over Original Fitted Data

I'm trying to plot the resultant curve from fitting a non-linear mixed model. It should be something like a curve of a normal distribution but skewed to the right. I followed previous links here and here, but when I use my data I can not make it happen for different difficulties (see below).
Here is the dataset
and code
s=read.csv("GRVMAX tadpoles.csv")
t=s[s$SPP== levels(s$SPP)[1],]
head(t)
vmax=t[t$PERFOR=="VMAX",]
colnames(vmax)[6]="vmax"
vmax$TEM=as.numeric(as.character(vmax$TEM));
require(lme4)
start =c(TEM=25)
is.numeric(start)
nm1 <- nlmer ( vmax ~ deriv(TEM)~TEM|INDIVIDUO,nlpars=start, nAGQ =0,data= vmax)# this gives an error suggesting nlpars is not numeric, despite start is numeric...:~/
After that, I want to plot the curve over the original data
with(vmax,plot(vmax ~ (TEM)))
x=vmax$TEM
lines(x, predict(nm1, newdata = data.frame(TEM = x, INDIVIDUO = "ACI5")))
Any hint?
Thanks in advance

add line to nls fit and plot

I am having problems adding a line to one scatter plot. My data is fit to an exponential model. I used nls to get coefficients:
fit <- nls(volumen ~ bo+ exp(b1*dap), data= df, start = list(bo=0, b1=55))
Later I used plot and lines commands to visualize the fit to my data:
plot(dap,volumen)
lines(dap,predict(fit,data.frame(x=dap)))
My Big problem is that I visualize a linear line that does not fit with my exponential points.
Could my model statement have errors?
Please, any comments I would appreciate.

plot multiple ROC curves for logistic regression model in R

I have a logistic regression model (using R) as
fit6 <- glm(formula = survived ~ ascore + gini + failed, data=records, family = binomial)
summary(fit6)
I'm using pROC package to draw ROC curves and figure out AUC for 6 models fit1 through fit6.
I have approached this way to plots one ROC.
prob6=predict(fit6,type=c("response"))
records$prob6 = prob6
g6 <- roc(survived~prob6, data=records)
plot(g6)
But is there a way I can combine the ROCs for all 6 curves in one plot and display the AUCs for all of them, and if possible the Confidence Intervals too.
You can use the add = TRUE argument the plot function to plot multiple ROC curves.
Make up some fake data
library(pROC)
a=rbinom(100, 1, 0.25)
b=runif(100)
c=rnorm(100)
Get model fits
fit1=glm(a~b+c, family='binomial')
fit2=glm(a~c, family='binomial')
Predict on the same data you trained the model with (or hold some out to test on if you want)
preds=predict(fit1)
roc1=roc(a ~ preds)
preds2=predict(fit2)
roc2=roc(a ~ preds2)
Plot it up.
plot(roc1)
plot(roc2, add=TRUE, col='red')
This produces the different fits on the same plot. You can get the AUC of the ROC curve by roc1$auc, and can add it either using the text() function in base R plotting, or perhaps just toss it in the legend.
I don't know how to quantify confidence intervals...or if that is even a thing you can do with ROC curves. Someone else will have to fill in the details on that one. Sorry. Hopefully the rest helped though.

add a logarithmic regression line to a scatterplot (comparison with Excel)

In Excel, it's pretty easy to fit a logarithmic trend line of a given set of trend line. Just click add trend line and then select "Logarithmic." Switching to R for more power, I am a bit lost as to which function should one use to generate this.
To generate the graph, I used ggplot2 with the following code.
ggplot(data, aes(horizon, success)) + geom_line() + geom_area(alpha=0.3)+
stat_smooth(method='loess')
But the code does local polynomial regression fitting which is based on averaging out numerous small linear regressions. My question is whether there is a log trend line in R similar to the one used in Excel.
An alternative I am looking for is to get an log equation in form y = (c*ln(x))+b; is there a coef() function to get 'c' and 'b'?
Let my data be:
c(0.599885189,0.588404133,0.577784156,0.567164179,0.556257176,
0.545350172,0.535112897,0.52449292,0.51540375,0.507271336,0.499904325,
0.498851894,0.498851894,0.497321087,0.4964600,0.495885955,0.494068121,
0.492154612,0.490145427,0.486892461,0.482395714,0.477229238,0.471010333)
The above data are y-points while the x-points are simply integers from 1:length(y) in increment of 1. In Excel: I can simply plot this and add a logarithmic trend line and the result would look:
With black being the log. In R, how would one do this with the above dataset?
I prefer to use base graphics instead of ggplot2:
#some data with a linear model
x <- 1:20
set.seed(1)
y <- 3*log(x)+5+rnorm(20)
#plot data
plot(y~x)
#fit log model
fit <- lm(y~log(x))
#look at result and statistics
summary(fit)
#extract coefficients only
coef(fit)
#plot fit with confidence band
matlines(x=seq(from=1,to=20,length.out=1000),
y=predict(fit,newdata=list(x=seq(from=1,to=20,length.out=1000)),
interval="confidence"))
#some data with a non-linear model
set.seed(1)
y <- log(0.1*x)+rnorm(20,sd=0.1)
#plot data
plot(y~x)
#fit log model
fit <- nls(y~log(a*x),start=list(a=0.2))
#look at result and statistics
summary(fit)
#plot fit
lines(seq(from=1,to=20,length.out=1000),
predict(fit,newdata=list(x=seq(from=1,to=20,length.out=1000))))
You can easily specify alternative smoothing methods (such as lm(), linear least-squares fitting) and an alternative formula
library(ggplot2)
g0 <- ggplot(dat, aes(horizon, success)) + geom_line() + geom_area(alpha=0.3)
g0 + stat_smooth(method="lm",formula=y~log(x),fill="red")
The confidence bands are automatically included: I changed the color to make them visible since they're very narrow. You can use se=FALSE in stat_smooth to turn them off.
The other answer shows you how to get the coefficients:
coef(lm(success~log(horizon),data=dat))
I can imagine you might next want to add the equation to the graph: see Adding Regression Line Equation and R2 on graph
I'm pretty sure a simple +scale_y_log10() would get you what you wanted. GGPlot stats are calculated after transformations, so the loess() would then be calculated on the log transformed data.
I've just written a blog post here that describes how to match Excel's logarithmic curve fitting exactly. The nub of the approach centers around the lm() function:
# Set x and data.to.fit to the independent and dependent variables
data.to.fit <- c(0.5998,0.5884,0.5777,0.5671,0.5562,0.5453,0.5351,0.524,0.515,0.5072,0.4999,0.4988,0.4988,0.4973,0.49,0.4958,0.4940,0.4921,0.4901,0.4868,0.4823,0.4772,0.4710)
x <- c(seq(1, length(data.to.fit)))
data.set <- data.frame(x, data.to.fit)
# Perform a logarithmic fit to the data set
log.fit <- lm(data.to.fit~log(x), data=data.set)
# Print out the intercept, log(x) parameters, R-squared values, etc.
summary(log.fit)
# Plot the original data set
plot(data.set)
# Add the log.fit line with confidence intervals
matlines(predict(log.fit, data.frame(x=x), interval="confidence"))
Hope that helps.

Resources