In R, plot a nonlinear curve

There are several references that come close, but my lines() call is producing multiple arcs instead of a single nonlinear curve. It looks like a hammock with a bunch of unwanted lines. How do you draw a simple nonlinear curve? The dataset is available as Auto.csv at http://www-bcf.usc.edu/~gareth/ISL/data.html.
library(ISLR)
data(Auto)
lm.fit1=lm(mpg~horsepower,data=Auto) #linear
lm.fit2=lm(mpg~horsepower+I(horsepower^2),data=Auto) #add polynomial
plot(Auto$horsepower,Auto$mpg,col=8,pch=1)
abline(lm.fit1,col=2) #linear fit
lines(Auto$horsepower,predict(lm.fit2),col=4) #attempt at nonlinear

lines() connects the points in whatever order they appear in the data. As a result, if you don't sort by the x-value first, you'll get a mess of lines going back and forth as the x-value jumps from one row to the next. Try this, for example:
plot(c(1,3,2,0), c(1,9,4,0), type="l", lwd=7)
lines(0:3, c(0,1,4,9), col='red', lwd=4)
To get a nice curve, sort by horsepower first:
curve.dat = data.frame(x=Auto$horsepower, y=predict(lm.fit2))
curve.dat = curve.dat[order(curve.dat$x),]
lines(curve.dat, col=4)
Without sorting by horsepower, you get the tangle of back-and-forth lines described in the question.

You can use poly() for your polynomial fit and then draw it with curve() and predict():
lm.fit2 = lm(mpg ~ poly(horsepower, 2, raw = TRUE), data = Auto) #fit polynomial
#curve passes values to x, see help("curve")
curve(predict(lm.fit2, newdata = data.frame(horsepower = x)), add = TRUE, col = 4)
This also works with nls fits.
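To illustrate the nls case, here is a minimal sketch on simulated data (the exponential-decay model, its start values and the data are assumptions made up for this example, not taken from the question). As with lm, curve() supplies a grid of `x` values and predict() evaluates the fitted model on it, so no sorting is needed. The column name passed in `newdata` must match the predictor name used in the model formula.

```r
# Simulated data (hypothetical): exponential decay plus noise
set.seed(4)
x <- seq(1, 10, length.out = 50)
y <- 5 * exp(-0.3 * x) + rnorm(50, sd = 0.1)
d <- data.frame(x = x, y = y)

# Nonlinear least-squares fit; the start values are rough guesses
fit <- nls(y ~ a * exp(-b * x), data = d, start = list(a = 4, b = 0.2))

plot(y ~ x, data = d)
# curve() evaluates the expression on a grid of 'x' values spanning the plot
curve(predict(fit, newdata = data.frame(x = x)), add = TRUE, col = 4)
```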

An alternative, if you don't want to worry about sorting the data frame first, is ggplot2. Its geom_smooth() lets you pick the method and formula for the line you want to fit:
library(ISLR)
library(ggplot2)
data(Auto)
ggplot(Auto, aes(horsepower, mpg)) +
geom_point() +
geom_smooth(method="lm", formula = y~x, se=FALSE)+
geom_smooth(method="lm", formula = y~x+I(x^2), se=FALSE, colour="red")

Related

How to calculate the intersection of a ggplot line and a nonlinear function in R

INTRO: I am new to R and to Stack Overflow. I am writing a term paper and need to run some statistics on how, or rather when, habits develop.
Ideally, habit formation follows Mitscherlich’s law and looks like a nonlinear regression curve approaching an asymptote. Once a participant reaches their plateau (defined as reaching 95% of the asymptote), one can speak of an established habit. That is actually debatable, but we are replicating a study by Lally et al. 2010 (How habits are formed: Modelling habit formation in the real world), so we have to stick to certain criteria.
ACTUAL QUESTION: The first step is to obtain the R² for the linear and nonlinear regressions. This I managed.
But for some reason I cannot obtain the x-axis value of the intersection (orange point in the picture) between the nonlinear curve and my 95% habit-plateau line (purple line in the picture).
Here is an example of what an ideal graph looks like:
Exactly this x value is crucial for group comparisons and, later on, for testing for significant differences.
Of course I already searched online, but I could not make sense of the solutions given to other, similar questions. It seems this cannot be done in ggplot with geom_point(), so one has to compute it separately using the approx() function, right?
Maybe someone can help me out. Thanks in advance!
And here is the relevant code:
library(ggplot2)
library(patchwork)
library(stats)
days <- 0:14  # 15 days, one per score (0:15 would give 16 values and data.frame() would fail)
score <- c(14,17,16,22,24,27,30,31,32,35,40,43,42,43,43)
df <- data.frame(days,score)
# red curve in graph
# R2 of the nonlinear regression, obtained for later analysis
nonlinreg1 <- nls(score ~ SSasymp(days, Asym, R0, lrc), data = df)
summary(nonlinreg1)
RSS <- sum(residuals(nonlinreg1)^2)
TSS <- sum((df$score - mean(df$score))^2)
R.square.nonlinreg1 <- 1 - (RSS/TSS)
R.square.nonlinreg1
# purple line in graph
#Definition of plateau at 95% of asymptote reached
Asymp95 <- summary(nonlinreg1)$parameters[1,1] * 0.95
# fitted values of the nonlinear regression (red curve)
nls_line <- predict(nonlinreg1)
# inverse linear interpolation: the day at which the fitted curve reaches Asymp95
HabitReach95 <- approx(nls_line, df$days, xout = Asymp95)$y
# Now in GGplot
ggplot(data=df,aes(x=days, y=score)) +
geom_point()+
# I would like to obtain the exact x value of this intersection:
geom_point(x = HabitReach95, y = Asymp95, aes(color = "Intersect"), size = 2) +
# the rest of the ggplot code is probably not relevant to the problem, but is included just in case
geom_smooth(method=lm, aes(color="Linear Reg"), se=F) +
geom_smooth(method="nls", formula=y~SSasymp(x, Asym, R0, lrc), aes(color="Non-Linear Reg"), se=F) +
geom_hline(aes(color="Asymptote for non-linear Reg", yintercept=summary(nonlinreg1)$parameters[1,1])) +
geom_hline(aes(color="Habit plateau at 95%", yintercept=Asymp95)) +
xlab("Days of Experiment") + ylab("Automaticity Score Habit") +
ggtitle("Test graph for participant") +
theme(plot.title = element_text(hjust = 0.5))+
#ylim(0,49)+ # Actual graph or scale for experiment
scale_color_manual(values = c("green", "purple", "orange", "blue", "red"), name="Legend")
OMG, I am so silly... I already calculate it with this line here!
HabitReach95 <- approx(nls_line, df$days, xout = Asymp95)$y
Haha, I can't believe it... well, sometimes you can't see the forest for the trees!
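approx() works, but it interpolates linearly between the fitted values at the observed days, so the result is close to, not exactly, the model's crossing point. Because SSasymp fits `Asym + (R0 - Asym) * exp(-exp(lrc) * input)`, the crossing with `0.95 * Asym` can also be solved in closed form. The sketch below uses simulated data (hypothetical, not the participant's scores), cross-checks the result with uniroot() on predict(), and assumes the curve starts below the plateau (R0 < 0.95 * Asym).

```r
# Simulated stand-in data (hypothetical): asymptote 50, start 10, rate 0.2
set.seed(1)
days  <- 0:30
score <- 50 + (10 - 50) * exp(-0.2 * days) + rnorm(length(days), sd = 1)
df    <- data.frame(days, score)

# Same self-starting model as in the question:
# score = Asym + (R0 - Asym) * exp(-exp(lrc) * days)
fit <- nls(score ~ SSasymp(days, Asym, R0, lrc), data = df)
p   <- coef(fit)
target <- 0.95 * p[["Asym"]]  # 95% habit plateau

# Closed form: solve target = Asym + (R0 - Asym) * exp(-exp(lrc) * x) for x
x_exact <- log((p[["Asym"]] - p[["R0"]]) / (p[["Asym"]] - target)) / exp(p[["lrc"]])

# Numerical cross-check: root of (fitted curve - target) within the observed days
f <- function(x) as.numeric(predict(fit, newdata = data.frame(days = x))) - target
x_root <- uniroot(f, interval = range(df$days), tol = 1e-10)$root

c(closed_form = x_exact, uniroot = x_root)
```

To apply it to the question's fit, use `coef(nonlinreg1)` in place of `coef(fit)`; `target` then equals `Asymp95`.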

Plotting fitted response vs observed response

I have a model which has been created like this
cube_model <- lm(y ~ x + I(x^2) + I(x^3), data = d.r.data)
I have been using ggplot functions like geom_point to plot data points and geom_smooth to plot the regression line. Now I want to plot the fitted values against the observed values. How would I do that? I am just unfamiliar with R, so I am not sure what to use here.
--
EDIT
I ended up doing this
predicted <- predict(cube_model)
ggplot(d.r.data) + geom_point(aes(x, y)) + geom_line(aes(x, predicted))
Is this the correct approach?
Use the predict function to generate the fitted values. You can then add them back to your data:
d.r.data$fit <- predict(cube_model)
To plot the predicted values against the actual values, you can use something like the following:
library(ggplot2)
ggplot(d.r.data) +
geom_point(aes(x = fit, y = y))
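As a base-R sketch of the same idea, the block below simulates a stand-in for `d.r.data` (hypothetical data with columns `x` and `y`), fits the cubic model from the question, and plots observed against fitted values. The 1:1 reference line from abline(0, 1) is an addition not in the answer: points close to it are well predicted. For the training data, fitted() returns the same values as predict() without `newdata`.

```r
# Hypothetical stand-in for d.r.data: a noisy cubic relationship
set.seed(42)
d <- data.frame(x = seq(-2, 2, length.out = 100))
d$y <- 1 + d$x - 0.5 * d$x^2 + 0.3 * d$x^3 + rnorm(nrow(d), sd = 0.3)

cube_model <- lm(y ~ x + I(x^2) + I(x^3), data = d)
d$fit <- fitted(cube_model)  # same values as predict(cube_model)

# Observed vs fitted, with a 1:1 reference line
plot(d$fit, d$y, xlab = "Fitted", ylab = "Observed")
abline(0, 1, col = "red")
```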

Displaying smoothed (convolved) densities with ggplot2

I'm trying to display some frequencies convolved with a Gaussian kernel in ggplot2. I tried smoothing the lines with:
+ stat_smooth(se = F,method = "lm", formula = y ~ poly(x, 24))
Without success.
I read an article suggesting the frequencies should be convolved with a Gaussian kernel, which ggplot2's stat_density function (http://docs.ggplot2.org/current/stat_density.html) seems to be able to produce.
However, I can't manage to replace my geometry with stat_density. Is there anything wrong with my code?
require(reshape2)
library(ggplot2)
library(RColorBrewer)
fileName = "/1.csv" # downloadable there: https://www.dropbox.com/s/l5j7ckmm5s9lo8j/1.csv?dl=0
mydata = read.csv(fileName,sep=",", header=TRUE)
dataM = melt(mydata,c("bins"))
myPalette <- colorRampPalette(rev(brewer.pal(11, "Spectral")))
ggplot(data=dataM,
aes(x=bins, y=value, colour=variable)) +
geom_line() + scale_x_continuous(limits = c(0, 2))
This code produces the following plot:
I'd like to smooth the lines a little, so they look more like this:
(from http://journal.frontiersin.org/Journal/10.3389/fncom.2013.00189/full)
Since my comments solved your problem, I'll convert them to an answer:
The density function takes individual measurements and calculates a kernel density estimate by convolution (the Gaussian kernel is the default). For example, plot(density(rnorm(1000))). You can control the smoothness with the bw (bandwidth) parameter, for example plot(density(rnorm(1000), bw=0.01)).
But your data frame is already a density distribution (analogous to the output of the density function). To generate a smoother density estimate, you need to start with the underlying data and run density on it, adjusting bw to get the smoothness where you want it.
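A small sketch of the bw effect, using simulated measurements as a hypothetical stand-in for the underlying data (the bandwidth values are arbitrary choices for illustration). Wider Gaussian kernels give smoother curves.

```r
set.seed(1)
raw <- rnorm(1000)  # simulated stand-in for the underlying measurements

d_default <- density(raw)             # bandwidth picked by bw.nrd0()
d_rough   <- density(raw, bw = 0.05)  # narrow Gaussian kernel: wiggly
d_smooth  <- density(raw, bw = 0.5)   # wide Gaussian kernel: smooth

plot(d_smooth, main = "Effect of the bw parameter",
     ylim = range(d_rough$y, d_default$y, d_smooth$y))
lines(d_default, col = "blue")
lines(d_rough, col = "grey")
```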
If you don't have access to the underlying data, you can smooth out your existing density distributions as follows:
ggplot(data=dataM, aes(x=bins, y=value, colour=variable)) +
geom_smooth(se=FALSE, span=0.3) +
scale_x_continuous(limits = c(0, 2))
Play around with the span parameter to get the smoothness you want.
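If you want Gaussian-kernel smoothing of the existing curves rather than loess (which is what geom_smooth uses here), one option is base R's ksmooth() with kernel = "normal", a Nadaraya-Watson kernel smoother. This is a swapped-in technique, not what the answer above uses, and the noisy curve below is simulated as a hypothetical stand-in for one series of dataM. Note that ksmooth() scales its kernel so the quartiles sit at ±0.25 * bandwidth, so `bandwidth` is not the standard deviation of the Gaussian.

```r
# Hypothetical stand-in for one series: a noisy density on a grid of bins
set.seed(2)
bins  <- seq(0, 2, by = 0.01)
value <- dnorm(bins, mean = 1, sd = 0.3) + rnorm(length(bins), sd = 0.1)

# Gaussian (normal) kernel smoother evaluated at the original bins;
# a larger bandwidth gives a smoother line
sm <- ksmooth(bins, value, kernel = "normal", bandwidth = 0.2, x.points = bins)

plot(bins, value, type = "l", col = "grey")
lines(sm, col = "red", lwd = 2)
```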

Fit a line with LOESS in R

I have a data set with some points in it and want to fit a line to it. I tried the loess function, but I get very strange results (see the plot below). I expect a line that follows the points more closely and spans the whole plot. How can I achieve that?
How to reproduce it:
Download the dataset from https://www.dropbox.com/s/ud32tbptyvjsnp4/data.R?dl=1 (only 2 KB) and use this code:
load(url('https://www.dropbox.com/s/ud32tbptyvjsnp4/data.R?dl=1'))
lw1 = loess(y ~ x,data=data)
plot(y ~ x, data=data,pch=19,cex=0.1)
lines(data$y,lw1$fitted,col="blue",lwd=3)
Any help is greatly appreciated. Thanks!
You've plotted fitted values against y instead of against x. Also, you will need to order the x values before plotting a line. Try this:
lw1 <- loess(y ~ x,data=data)
plot(y ~ x, data=data,pch=19,cex=0.1)
j <- order(data$x)
lines(data$x[j],lw1$fitted[j],col="red",lwd=3)
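Since the original data are gone, here is a runnable sketch of the same fix on the built-in mtcars data (an arbitrary stand-in: mpg against hp, whose rows are not sorted by hp). It also shows a second option, predicting on an evenly spaced grid, which gives a smooth line without sorting; the grid stays inside the observed range because predict() on a loess fit returns NA outside it by default.

```r
fit <- loess(mpg ~ hp, data = mtcars)
plot(mpg ~ hp, data = mtcars, pch = 19)

# Option 1: sort by x before drawing the fitted values
j <- order(mtcars$hp)
lines(mtcars$hp[j], fitted(fit)[j], col = "red", lwd = 3)

# Option 2: predict on an evenly spaced grid inside the data range
grid <- data.frame(hp = seq(min(mtcars$hp), max(mtcars$hp), length.out = 200))
pred <- predict(fit, newdata = grid)
lines(grid$hp, pred, col = "blue", lwd = 2, lty = 2)
```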
The data are no longer available, but an easier way to fit a nonparametric line (locally weighted scatterplot smoothing, or LOESS) is the following code:
scatter.smooth(data$x, data$y, span = 2/3, degree = 2)
Note that you can adjust the span and degree parameters to control the smoothness.
Maybe this is too late, but ggplot2 (and dplyr) give you options too. First, if you only want to plot a LOESS line over the points, you can try:
library(ggplot2)
load(url("https://www.dropbox.com/s/ud32tbptyvjsnp4/data.R?dl=1"))
ggplot(data, aes(x, y)) +
geom_point() +
geom_smooth(method = "loess", se = FALSE)
Another way is to call predict() on a loess fit. For instance, here dplyr's mutate() adds the predictions to a new column called "loess":
library(dplyr)
data %>%
mutate(loess = predict(loess(y ~ x, data = data))) %>%
ggplot(aes(x, y)) +
geom_point(color = "grey50") +
geom_line(aes(y = loess))
Update: Added a line of code to load the example data provided.
Update 2: Corrected the misspelled geom_smooth() function name, according to @phi's comment.

Equation for lm graphics

I'm plotting lm fits, and for each of them I want to display the equation y = ax + b together with R². How can I do that?
lmfit <- geom_smooth(method="lm", se = T)
p <- qplot(x, y, data=Tab) + facet_grid(id ~., scales = "free") + lmfit
Within ggplot2 itself, there is no direct way to do this. You need to compute the regressions separately for each id and then extract the equation and R² from each of them. Put those extracted values in a data frame (along with id) and use geom_text to display them.
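A sketch of those steps, assuming a data frame `Tab` with columns `id`, `x` and `y` as in the question (the data here are simulated, and the helper `eq_label` and the label format are hypothetical choices). The ggplot2 part only runs if the package is installed; x = -Inf and y = Inf pin each label to the top-left corner of its facet.

```r
# Hypothetical stand-in for Tab: three ids with different linear trends
set.seed(3)
Tab <- data.frame(id = rep(c("A", "B", "C"), each = 20), x = rep(1:20, times = 3))
slope <- c(A = 2, B = 0.5, C = -1)[Tab$id]
Tab$y <- slope * Tab$x + rnorm(nrow(Tab), sd = 2)

# Fit one lm per id and build a text label "y = ax + b, R^2 = ..."
eq_label <- function(d) {
  fit <- lm(y ~ x, data = d)
  cf  <- coef(fit)
  data.frame(
    id    = d$id[1],
    label = sprintf("y = %.2fx %+.2f, R^2 = %.3f",
                    cf[["x"]], cf[["(Intercept)"]], summary(fit)$r.squared)
  )
}
eq_df <- do.call(rbind, lapply(split(Tab, Tab$id), eq_label))
rownames(eq_df) <- NULL
print(eq_df)

# Draw the labels on the faceted plot (only if ggplot2 is available)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(Tab, aes(x, y)) +
    geom_point() +
    geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
    facet_grid(id ~ ., scales = "free") +
    geom_text(data = eq_df, aes(label = label), inherit.aes = FALSE,
              x = -Inf, y = Inf, hjust = -0.05, vjust = 1.5)
  print(p)
}
```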

Resources