add a logarithmic regression line to a scatterplot (comparison with Excel) - r

In Excel, it's pretty easy to fit a logarithmic trend line to a given set of data points. Just click "Add trend line" and then select "Logarithmic." Switching to R for more power, I am a bit lost as to which function to use to generate this.
To generate the graph, I used ggplot2 with the following code.
ggplot(data, aes(horizon, success)) + geom_line() + geom_area(alpha=0.3)+
stat_smooth(method='loess')
But the code does local polynomial regression fitting which is based on averaging out numerous small linear regressions. My question is whether there is a log trend line in R similar to the one used in Excel.
An alternative I am looking for is to get a log equation of the form y = (c*ln(x))+b; is there a coef() function to get 'c' and 'b'?
Let my data be:
c(0.599885189,0.588404133,0.577784156,0.567164179,0.556257176,
0.545350172,0.535112897,0.52449292,0.51540375,0.507271336,0.499904325,
0.498851894,0.498851894,0.497321087,0.4964600,0.495885955,0.494068121,
0.492154612,0.490145427,0.486892461,0.482395714,0.477229238,0.471010333)
The above data are the y-points, while the x-points are simply the integers 1:length(y) in increments of 1. In Excel, I can simply plot this and add a logarithmic trend line, and the result would look like this:
With black being the log trend line. In R, how would one do this with the above dataset?

I prefer to use base graphics instead of ggplot2:
#some data with a linear model
x <- 1:20
set.seed(1)
y <- 3*log(x)+5+rnorm(20)
#plot data
plot(y~x)
#fit log model
fit <- lm(y~log(x))
#look at result and statistics
summary(fit)
#extract coefficients only
coef(fit)
#plot fit with confidence band
matlines(x=seq(from=1,to=20,length.out=1000),
y=predict(fit,newdata=list(x=seq(from=1,to=20,length.out=1000)),
interval="confidence"))
#some data with a non-linear model
set.seed(1)
y <- log(0.1*x)+rnorm(20,sd=0.1)
#plot data
plot(y~x)
#fit log model
fit <- nls(y~log(a*x),start=list(a=0.2))
#look at result and statistics
summary(fit)
#plot fit
lines(seq(from=1,to=20,length.out=1000),
predict(fit,newdata=list(x=seq(from=1,to=20,length.out=1000))))
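Applied to the data from the question, the same lm() call gives the c and b of y = c*ln(x) + b directly. This is a minimal sketch, assuming the x-values are 1, 2, ..., length(y) as the question states; the names c_hat, b_hat and r2 are only illustrative.

```r
# y-values copied from the question; x is assumed to be 1, 2, ..., length(y)
y <- c(0.599885189, 0.588404133, 0.577784156, 0.567164179, 0.556257176,
       0.545350172, 0.535112897, 0.52449292, 0.51540375, 0.507271336, 0.499904325,
       0.498851894, 0.498851894, 0.497321087, 0.4964600, 0.495885955, 0.494068121,
       0.492154612, 0.490145427, 0.486892461, 0.482395714, 0.477229238, 0.471010333)
x <- seq_along(y)

# linear model in log(x): y = b + c * log(x)
fit <- lm(y ~ log(x))

b_hat <- unname(coef(fit)[1])  # intercept, the "b" in the question
c_hat <- unname(coef(fit)[2])  # coefficient of log(x), the "c" in the question

# R-squared of the fit
r2 <- summary(fit)$r.squared

# scatterplot with the fitted curve drawn on a fine grid
plot(x, y)
xs <- seq(1, length(y), length.out = 200)
lines(xs, b_hat + c_hat * log(xs))
```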

You can easily specify an alternative smoothing method (such as lm(), linear least-squares fitting) and an alternative formula:
library(ggplot2)
g0 <- ggplot(dat, aes(horizon, success)) + geom_line() + geom_area(alpha=0.3)
g0 + stat_smooth(method="lm",formula=y~log(x),fill="red")
The confidence bands are automatically included: I changed the color to make them visible since they're very narrow. You can use se=FALSE in stat_smooth to turn them off.
The other answer shows you how to get the coefficients:
coef(lm(success~log(horizon),data=dat))
I can imagine you might next want to add the equation to the graph: see Adding Regression Line Equation and R2 on graph

I'm pretty sure a simple +scale_y_log10() would get you what you wanted. ggplot2 stats are calculated after transformations, so the loess() would then be calculated on the log-transformed data.
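For comparison, a log-scaled y axis and a regression on log(x) correspond to different models. The following is a small sketch on made-up, noise-free data (not the question's data), and it uses lm() in place of loess purely to keep the contrast easy to see.

```r
# made-up data that follows y = 3*log(x) + 5 exactly
x <- 1:20
y <- 3 * log(x) + 5

# model A: regress y on log(x), the form y = c*ln(x) + b
fit_logx <- lm(y ~ log(x))

# model B: straight line fitted to log10(y), roughly what a linear
# smoother would see after scale_y_log10()
fit_logy <- lm(log10(y) ~ x)

# largest absolute error of each model on the original y scale
err_logx <- max(abs(y - fitted(fit_logx)))
err_logy <- max(abs(y - 10^fitted(fit_logy)))
```

Here err_logx is essentially zero while err_logy is not, so for data of this shape transforming the y axis alone does not reproduce a logarithmic trend line.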

I've just written a blog post here that describes how to match Excel's logarithmic curve fitting exactly. The nub of the approach centers around the lm() function:
# Set x and data.to.fit to the independent and dependent variables
data.to.fit <- c(0.5998,0.5884,0.5777,0.5671,0.5562,0.5453,0.5351,0.524,0.515,0.5072,0.4999,0.4988,0.4988,0.4973,0.49,0.4958,0.4940,0.4921,0.4901,0.4868,0.4823,0.4772,0.4710)
x <- c(seq(1, length(data.to.fit)))
data.set <- data.frame(x, data.to.fit)
# Perform a logarithmic fit to the data set
log.fit <- lm(data.to.fit~log(x), data=data.set)
# Print out the intercept, log(x) parameters, R-squared values, etc.
summary(log.fit)
# Plot the original data set
plot(data.set)
# Add the log.fit line with confidence intervals
matlines(predict(log.fit, data.frame(x=x), interval="confidence"))
Hope that helps.

Related

Extract values used to make plot for parametric component of GAM in R

I have fitted a GAM that includes both continuous smooth terms and a categorical variable. I have plotted the model (mod) using plot(mod,residuals=T,all.terms=T,pages=1). This produces plots of the two smooth terms as well as the parametric term. I want to extract the values used to make these plots so I can redo them and make them look nicer. If I save the plot in an object, this gives me everything I need for the smooth terms, but doesn't contain any information about the parametric component: plot.mod=plot(mod,residuals=T,all.terms=T,select=0). I can't see where the numbers are coming from for the default plotting of the parametric component. Is there a way to extract these as well?
Here is a reproducible example of what I have done so far
library(mgcv)
# create some data
data=data.frame(response=c(10,12,8,9,3,4,5,5,4,5,4,5,4,1),pred1=c(9,8,8,9,6,7,6,4,3,4,2,3,3,1),pred2=as.factor(c("A","C","B","B","A","A","C","B","C","A","C","B","A","B")),pred3=c(1,6,3,4,8,6,4,5,7,10,11,3,12,1))
# run the GAM
mod <- gam(response ~ s(pred1,k=8) + pred2 + s(pred3,k=5), data=data, family=gaussian(), method="REML")
# the default plot
plot(mod,residuals=T,all.terms=T,pages=1)
# save values in an object. But this only saves the smooth terms.
plot.mod=plot(mod,residuals=T,all.terms=T,select=0)
# How can I extract the values used to plot the parametric term?
The plot I'm trying to extract the data to make:
From the plot.gam documentation, termplot is used for the parametric terms, so
plot.para <- termplot(mod, se = TRUE, plot = FALSE)
saves that plot to a list.
The format is different than the others, but the data is there.
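To show the shape of what termplot() returns, here is a minimal sketch that uses a plain lm() on the question's data as a stand-in for the GAM (an assumption made so that only base R is needed). With plot = FALSE the result is a list with one data frame per term.

```r
# data from the question; the smooth terms are replaced by one linear term
dat <- data.frame(
  response = c(10, 12, 8, 9, 3, 4, 5, 5, 4, 5, 4, 5, 4, 1),
  pred1    = c(9, 8, 8, 9, 6, 7, 6, 4, 3, 4, 2, 3, 3, 1),
  pred2    = as.factor(c("A", "C", "B", "B", "A", "A", "C",
                         "B", "C", "A", "C", "B", "A", "B"))
)

# stand-in model: lm instead of gam (assumption, for illustration only)
fit <- lm(response ~ pred1 + pred2, data = dat)

# plot = FALSE returns the plotting data instead of drawing anything
para <- termplot(fit, se = TRUE, plot = FALSE)

names(para)   # one list element per model term
para$pred2    # columns: x (factor level), y (term contribution), se
```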

How to graph inc exponential decay in R?

My prof decided that our first experience with coding was going to be trying to fit the function z(t) = A(1-e^(-t/T)) into a given data-set from class using R. I'm completely lost. I keep using lm and nls functions, without quite knowing how they work. So far, I have the data graphed but I have no clue how to get any sort of line more complicated than
mod3<-lm(y~I(x^1/5))
pre3<-predict(mod3)
lines(pre3)
To sum up: how do I find the A and T parameters? Do I use nls for the formula? Anything helps. I'll include a picture of the graph and the data. Please ignore the random lines on the plot.
One could attempt to transform your expression into a linear relationship, but sometimes it is easier to just let the computer do the work. As mentioned in the comments, R has the nls function to perform nonlinear regression.
Here is an example using some dummy data. Supply the nls function with your equation, the data frame containing the data, and initial estimates of the parameters.
See comments for additional details.
#create dummy data
A= 0.8
T1 = 13
t <- seq(2, 50, 3)
z <- A*(1-exp(-t/T1))
z<- z +rnorm(length(z), 0, 0.005) #add noise
#starting data frame
df <-data.frame(t, z)
#solve non-linear model
model <- nls(z ~ A*(1-exp(-t/Tc)), data=df, start = list(A=1, Tc=1))
print(summary(model))
#predict
pred_y <-predict(model, data.frame(t))
#plot
plot(x=t, y=z)
lines(y=pred_y, x= t, col="blue")
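Since nls needs starting values, a rough guess read off the data is often enough. The sketch below assumes the curve has nearly levelled off by the end of the data: it takes the largest z as the start for A, and the first t at which z reaches about 63% of that plateau as the start for the time constant (at t = T the model gives A*(1 - e^-1), about 0.632*A). The names A0, Tc0 and est are illustrative.

```r
set.seed(1)
# dummy data generated the same way as in the answer above
t <- seq(2, 50, 3)
z <- 0.8 * (1 - exp(-t / 13)) + rnorm(length(t), 0, 0.005)
df <- data.frame(t, z)

# starting values read off the data
A0  <- max(df$z)                              # plateau estimate
Tc0 <- df$t[which(df$z >= 0.632 * A0)[1]]     # time to reach ~63% of plateau

model <- nls(z ~ A * (1 - exp(-t / Tc)), data = df,
             start = list(A = A0, Tc = Tc0))

# fitted parameters A and Tc
est <- coef(model)

# data with a smooth fitted curve on a fine grid
plot(df$t, df$z)
tt <- seq(min(df$t), max(df$t), length.out = 200)
lines(tt, predict(model, newdata = data.frame(t = tt)), col = "blue")
```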

How can I calculate the slope from a linear regression analysis?

In R, I am trying to overlay an abline onto a plot, the result of linear regression. I want to create a scatter plot showing
TrainRegRpt$train.data.Price (original price) on the x-axis,
TrainRegress$fitted.values (the projected price that came from the lm model) on the y-axis and draw the line of best fit through the plotted points.
Here is some of my code:
TrainRegress <- lm(PriceBH.df$Price ~ ., data=PriceBH.df, subset = train.rows)
TrainRegRpt <- data.frame(train.data$Price, TrainRegress$fitted.values, TrainRegress$residuals)
x <- as.vector(TrainRegRpt$TrainRegress.fitted.values) # on the x-axis
y <- as.vector(TrainRegRpt$train.data.Price) #on the y-axis
plot(TrainRegRpt$train.data.Price ~ TrainRegRpt$TrainRegress.fitted.values)
abline(x,y)
The scatter plot came out the same:
p <- as.vector(TrainRegRpt$train.data.Price) # my y-axis in the scatter plot
fv <- as.vector(round(TrainRegRpt$TrainRegress.fitted.values, 2)) # my x-axis in the scatter plot
newdf <- dfrm <- data.frame(p, fv)
x <- as.vector(newdf$fv)
y <- as.vector(newdf$p)
plot(newdf$p ~ newdf$fv)
abline(x,y)
summary(TrainRegress)
The following coefficients were obtained from the summary of TrainRegress:
             Estimate
(Intercept)   30.318
CRIM           0.245
CHAS           5.8368
RM             8.4846
I extracted the y-intercept as follows:
y.interceptval <-summary(TrainRegress)$coefficients[1]
I will use y.interceptval in the abline(y.interceptval,***?slope***) but I need to know how to calculate the slope. How do I calculate the slope to pass to abline(y.interceptval, slope)?
I have 5 textbooks here that are no help and my professor refuses to help me and I really want this to be perfect!
Thank you!!!
It looks like you already calculated your slope. The slopes from a linear regression analysis using lm() are the coefficients. So, in this case, 30.318 is your Y-intercept.
This gives you a regression equation of:
Y = 30.318 + 0.245*(CRIM) + 5.8368*(CHAS) + 8.4846*(RM)
The numbers 0.245, 5.8368, and 8.4846 are the coefficients for each variable and they are also the individual slopes.
Also, one thing about your fitted vs. residuals plot: it looks like you reversed the arguments to abline() (i.e. instead of abline(x,y) it should be abline(y,x)).
Edit You used abline(x,y) but your plotted data are
plot(TrainRegRpt$train.data.Price ~ TrainRegRpt$TrainRegress.fitted.values)
(train.data.Price vs. Fitted Values not x vs y).
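Note that abline(a, b) takes an intercept and a slope rather than two data vectors. With several predictors there is no single slope, but a line through an actual-versus-fitted scatter plot can be obtained by regressing one on the other. The sketch below uses the built-in mtcars data as a stand-in for the asker's data; the choice of variables is arbitrary.

```r
# stand-in multiple regression on a built-in data set
fit <- lm(mpg ~ wt + hp, data = mtcars)

actual <- mtcars$mpg
fitted_vals <- fitted(fit)

# scatter plot: fitted values on x, actual values on y
plot(fitted_vals, actual)

# a simple regression of actual on fitted gives the line through this plot
line_fit <- lm(actual ~ fitted_vals)
ab <- coef(line_fit)   # ab[1] is the intercept, ab[2] is the slope
abline(a = ab[1], b = ab[2], col = "red")
```

For a least-squares model with an intercept this line is y = x (intercept 0, slope 1), so abline(0, 1) draws the same thing.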

R: Fit curve to points: what linear/non-linear model to use?

I have data that should follow a power-law distribution.
x = distance
y = %
I want to create a model and to add the fitted line to my plot.
My aim to recreate something like this:
As the author reports R-squared, I assume they applied a linear model, since R^2 is not suitable for non-linear models. http://blog.minitab.com/blog/adventures-in-statistics-2/why-is-there-no-r-squared-for-nonlinear-regression
However, I can't work out how to "curve" my line to the points, i.e. how to add the formula y ~ a*x^(-b) to my model.
Instead of a curved line, I got back the same straight line as from a simple linear regression.
My questions are:
Do I correctly assume the model y ~ a*x^(-b) used by author is linear?
what type of model to use to recreate my example: lm, glm, nls, etc. ?
I generated the dummy data, including the applied power law formula from the plot above:
set.seed(42)
scatt<-runif(10)
x<-seq(1, 1000, 100)
b = 1.8411
a = 133093
y = a*x^(-b) + scatt # add some variability in my dependent variable
plot(y ~ x)
and tried to create a glm model.
# formula for non-linear model
m<-m.glm<-glm(y ~ x^2, data = dat) #
# add predicted line to plot
lines(x,predict(m),col="red",lty=2,lwd=3)
This is my first time to model, so I am really confused and I don't know where to start... thank you for any suggestion or directions, I really appreciate it...
I personally think this question is a dupe of this one: `nls` fails to estimate parameters of my model, but it would be cold-blooded of me to close it (as the OP put up a bounty). Anyway, a bounty question cannot be closed.
So the best I could think of is to post a community wiki answer (I don't want to get this bounty).
As you want to fit a model of the form y ~ a*x^(-b), it often helps to take the log transform of both sides and fit a linear model log(y) ~ log(x).
fit <- lm(log(y) ~ log(x))
As you already know how to use curve to plot a regression curve and are happy with it, I will now show how to make the plot.
Some people call this log-log regression. Here are some other links I have for such kind of regression:
How to predict a new value using simple linear regression log(y)=b0+b1*log(x)
How to plot confidence bands for my weighted log-log linear regression?
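m <- lm(log(y) ~ log(x))
# log(y) = log(a) - b*log(x), so back-transform the coefficients
a <- exp(coef(m)[1])  # the intercept is log(a)
b <- -coef(m)[2]      # the slope is -b
plot(y ~ x, type="p", lty=3)
lines(x, exp(predict(m)), col="blue", lty=2, lwd=3)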
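The back-transformation can be checked on synthetic data where the true parameters are known. This sketch assumes multiplicative noise (which is what the log-log model implies; the question's dummy data uses additive noise instead) and made-up values a_true and b_true.

```r
set.seed(42)
a_true <- 100   # made-up parameter values for the check
b_true <- 1.5
x <- seq(1, 1000, length.out = 50)
# power law with multiplicative log-normal noise
y <- a_true * x^(-b_true) * exp(rnorm(length(x), sd = 0.05))

# log(y) = log(a) - b*log(x): a straight-line fit on the log-log scale
m <- lm(log(y) ~ log(x))

a_hat <- exp(unname(coef(m)[1]))  # the intercept is log(a)
b_hat <- -unname(coef(m)[2])      # the slope is -b

# fitted curve on the original scale
plot(x, y)
lines(x, a_hat * x^(-b_hat), col = "blue")
```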

Plot Non-linear Mixed Model Over Original Fitted Data

I'm trying to plot the resulting curve from fitting a non-linear mixed model. It should look something like a normal distribution curve, but skewed to the right. I followed previous links here and here, but when I use my data I cannot make it work because of various difficulties (see below).
Here is the dataset
and code
s=read.csv("GRVMAX tadpoles.csv")
t=s[s$SPP== levels(s$SPP)[1],]
head(t)
vmax=t[t$PERFOR=="VMAX",]
colnames(vmax)[6]="vmax"
vmax$TEM=as.numeric(as.character(vmax$TEM));
require(lme4)
start =c(TEM=25)
is.numeric(start)
nm1 <- nlmer ( vmax ~ deriv(TEM)~TEM|INDIVIDUO,nlpars=start, nAGQ =0,data= vmax)# this gives an error suggesting nlpars is not numeric, despite start is numeric...:~/
After that, I want to plot the curve over the original data
with(vmax,plot(vmax ~ (TEM)))
x=vmax$TEM
lines(x, predict(nm1, newdata = data.frame(TEM = x, INDIVIDUO = "ACI5")))
Any hint?
Thanks in advance
