I have a data set with some points in it and want to fit a line to it. I tried the loess function, but I get very strange results; see the plot below. I expected a line that passes closer to the points and spans the whole plot. How can I achieve that?
How to reproduce it:
Download the dataset from https://www.dropbox.com/s/ud32tbptyvjsnp4/data.R?dl=1 (only two kb) and use this code:
load(url('https://www.dropbox.com/s/ud32tbptyvjsnp4/data.R?dl=1'))
lw1 = loess(y ~ x,data=data)
plot(y ~ x, data=data,pch=19,cex=0.1)
lines(data$y,lw1$fitted,col="blue",lwd=3)
Any help is greatly appreciated. Thanks!
You've plotted fitted values against y instead of against x. Also, you will need to order the x values before plotting a line. Try this:
lw1 <- loess(y ~ x,data=data)
plot(y ~ x, data=data,pch=19,cex=0.1)
j <- order(data$x)
lines(data$x[j],lw1$fitted[j],col="red",lwd=3)
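Since the original data set may no longer be reachable, here is a minimal sketch of the same fix on simulated data. The data frame `sim` and its values are made up for illustration; only the `loess()`, `order()`, `plot()` and `lines()` calls mirror the answer above.

```r
# Simulated stand-in for the question's data (hypothetical values)
set.seed(1)
sim <- data.frame(x = runif(200, 0, 10))     # x is deliberately unsorted
sim$y <- sin(sim$x) + rnorm(200, sd = 0.3)

fit <- loess(y ~ x, data = sim)

# Index that sorts the rows by x, so the line is drawn from left to right
j <- order(sim$x)

plot(y ~ x, data = sim, pch = 19, cex = 0.3)
lines(sim$x[j], fit$fitted[j], col = "red", lwd = 3)
```

Without the `order()` step, `lines()` would connect the fitted values in row order and zigzag across the plot.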
Unfortunately the data are no longer available, but an easier way to fit a non-parametric curve (a locally weighted scatterplot smoother, often just called LOESS) is to use the following code:
scatter.smooth(y ~ x, span = 2/3, degree = 2)
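Note that you can adjust the span and degree parameters to control the smoothness: a smaller span follows the data more closely, and degree sets the degree of the local polynomial (0, 1 or 2).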
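To make the effect of these parameters concrete, here is a small sketch that calls `loess()` directly (the function `scatter.smooth()` uses internally) with two settings. The simulated data and the object names `wiggly` and `smooth` are assumptions made for the example.

```r
# Simulated data (hypothetical): a sine curve plus noise
set.seed(2)
x <- seq(0, 10, length.out = 200)
y <- sin(x) + rnorm(200, sd = 0.3)

# Small span, local quadratic: follows the data closely
wiggly <- loess(y ~ x, span = 0.2, degree = 2)
# Large span, local linear: much smoother, misses the oscillation
smooth <- loess(y ~ x, span = 0.9, degree = 1)

plot(x, y, pch = 19, cex = 0.3)
lines(x, wiggly$fitted, col = "red", lwd = 2)
lines(x, smooth$fitted, col = "blue", lwd = 2)

# Residual sum of squares for each fit
rss <- c(small_span = sum(wiggly$residuals^2),
         large_span = sum(smooth$residuals^2))
rss
```

On data like this, the small-span fit has the lower residual sum of squares, at the cost of a wigglier curve.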
It may be too late, but you also have options with ggplot2 (and dplyr). First, if you only want to plot a loess line over the points, you can try:
library(ggplot2)
load(url("https://www.dropbox.com/s/ud32tbptyvjsnp4/data.R?dl=1"))
ggplot(data, aes(x, y)) +
geom_point() +
geom_smooth(method = "loess", se = FALSE)
Another way is to use the predict() function on a loess fit. For instance, I used dplyr functions to add the predictions to a new column called "loess":
library(dplyr)
data %>%
mutate(loess = predict(loess(y ~ x, data = data))) %>%
ggplot(aes(x, y)) +
geom_point(color = "grey50") +
geom_line(aes(y = loess))
Update: Added a line of code to load the example data provided.
Update 2: Corrected the geom_smooth() function name, following #phi's comment.
I have a model which has been created like this
cube_model <- lm(y ~ x + I(x^2) + I(x^3), data = d.r.data)
I have been using ggplot functions such as geom_point to plot the data points and geom_smooth to plot the regression line. Now I am trying to plot the fitted values against the observed values. How would I do that? I am unfamiliar with R, so I am not sure what to use here.
--
EDIT
I ended up doing this
predicted <- predict(cube_model)
ggplot() + geom_point(aes(x, y)) + geom_line(aes(x, predicted))
Is this the correct approach?
What you need to do is use the predict function to generate the fitted values. You can then add them back to your data.
d.r.data$fit <- predict(cube_model)
If you want to plot the predicted values vs the actual values, you can use something like the following.
library(ggplot2)
ggplot(d.r.data) +
geom_point(aes(x = fit, y = y))
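A fitted-versus-observed plot is easier to read with a 1:1 reference line. The following is a sketch in base R on simulated data; the data frame `d` and the model name `cube_fit` are hypothetical stand-ins for `d.r.data` and `cube_model`.

```r
# Simulated cubic relationship (hypothetical data)
set.seed(3)
d <- data.frame(x = runif(100, -2, 2))
d$y <- 1 + 2 * d$x - d$x^3 + rnorm(100, sd = 0.5)

cube_fit <- lm(y ~ x + I(x^2) + I(x^3), data = d)
d$fit <- predict(cube_fit)          # fitted values, one per row of d

# Fitted on the horizontal axis, observed on the vertical axis
plot(d$fit, d$y, xlab = "Fitted", ylab = "Observed", pch = 19, cex = 0.5)
abline(a = 0, b = 1, lty = 2)       # points on this line are predicted exactly

# For a linear model with an intercept, the squared correlation between
# fitted and observed values equals the model's R-squared
r2_from_cor <- cor(d$fit, d$y)^2
r2_from_lm  <- summary(cube_fit)$r.squared
```

The closer the points lie to the dashed line, the better the model reproduces the observed values.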
I am trying to add a linear regression to data plotted with ggplot; however, due to the nature of my data, I need the response variable of the regression to be on the x-axis, not the y-axis. Is there a way to change how the regression is done (I tried changing "formula = y~x" to "formula = x~y", but no luck), perhaps by specifying a mapping different from the one specified by the plot? Or is there an easy way to invert the plot after I have added the regression? Thanks! Any help is appreciated.
One straightforward way (which you suggested) would be to make the plot with y and x reversed and then "invert" the final plot. I used heavily right-skewed "noise" so that the example makes clear what is being fit.
library(tidyverse)
set.seed(42)
foo <- tibble(x = 1:100, y = 2 + 0.5*x + 3*rchisq(100, 3))  # tibble() replaces the deprecated data_frame()
foo %>%
ggplot(aes(x=y, y=x)) + geom_point() + stat_smooth(method = "lm") + coord_flip()
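If you would rather not rely on coord_flip(), another option is to fit the regression yourself and draw the predicted line with the axes already in the desired orientation. This is a base R sketch on the same kind of simulated data; `fit_xy` and `grid` are names introduced here for illustration.

```r
# Same style of simulated data as above, built with base R only
set.seed(42)
foo <- data.frame(x = 1:100)
foo$y <- 2 + 0.5 * foo$x + 3 * rchisq(100, 3)

# Regress x on y: x is the response, y is the predictor
fit_xy <- lm(x ~ y, data = foo)

# Predict x over a grid of y values
grid <- data.frame(y = seq(min(foo$y), max(foo$y), length.out = 50))
grid$x_hat <- predict(fit_xy, newdata = grid)

# Response (x) on the horizontal axis, predictor (y) on the vertical axis
plot(foo$x, foo$y, xlab = "x (response)", ylab = "y (predictor)")
lines(grid$x_hat, grid$y, col = "blue", lwd = 2)
```

The line drawn here is the x-on-y regression, which in general differs from the usual y-on-x line through the same points.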
There are several references that come close, but my lines() is producing multiple arcs instead of just one nonlinear curve. It looks like a hammock with a bunch of unwanted lines. How do you generate a simple nonlinear line? Dataset available as Auto.csv at http://www-bcf.usc.edu/~gareth/ISL/data.html.
library(ISLR)
data(Auto)
lm.fit1=lm(mpg~horsepower,data=Auto) #linear
lm.fit2=lm(mpg~horsepower+I(horsepower^2),data=Auto) #add polynomial
plot(Auto$horsepower,Auto$mpg,col=8,pch=1)
abline(lm.fit1,col=2) #linear fit
lines(Auto$horsepower,predict(lm.fit2),col=4) #attempt at nonlinear
lines plots the data in whatever order it happens to be in. As a result, if you don't sort by the x-value first, you'll get a mess of lines going back and forth as the x-value jumps back and forth from one row to the next. Try this, for example:
plot(c(1,3,2,0), c(1,9,4,0), type="l", lwd=7)
lines(0:3, c(0,1,4,9), col='red', lwd=4)
To get a nice curve, sort by horsepower first:
curve.dat = data.frame(x=Auto$horsepower, y=predict(lm.fit2))
curve.dat = curve.dat[order(curve.dat$x),]
lines(curve.dat, col=4)
Whereas, if you don't sort by horsepower, here's what you get:
You should use poly for your polynomial fit. You can then use curve with predict:
lm.fit2 = lm(mpg ~ poly(horsepower, 2, raw = TRUE), data = Auto) #fit polynomial
#curve passes values to x, see help("curve")
curve(predict(lm.fit2, newdata = data.frame(horsepower = x)), add = TRUE, col = 4)
This also works with nls fits.
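As a sketch of the nls case, the example below uses the built-in mtcars data as a stand-in for Auto (so it runs without extra packages). The exponential model and the way the starting values are obtained are assumptions made for illustration, not part of the original answer.

```r
# Starting values from a log-linear fit (an assumption to help nls converge)
start_fit <- lm(log(mpg) ~ hp, data = mtcars)

# Nonlinear least squares fit of an exponential decay: mpg = a * exp(b * hp)
nls_fit <- nls(mpg ~ a * exp(b * hp), data = mtcars,
               start = list(a = exp(coef(start_fit)[[1]]),
                            b = coef(start_fit)[[2]]))

plot(mtcars$hp, mtcars$mpg, col = 8, pch = 1)
# curve() evaluates the expression over a grid of x values, so no sorting is needed
curve(predict(nls_fit, newdata = data.frame(hp = x)), add = TRUE, col = 4)
```

Because curve() generates its own increasing grid of x values, the result is a single smooth line regardless of the row order of the data.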
An alternative, if you don't want to worry about sorting the data frame first, is to use ggplot2. Its geom_smooth function lets you pick the formula and the fitting method for the line:
library(ISLR)
library(ggplot2)
data(Auto)
ggplot(Auto, aes(horsepower, mpg)) +  # horsepower on x, mpg on y, matching mpg ~ horsepower
geom_point() +
geom_smooth(method="lm", formula = y~x, se=FALSE)+
geom_smooth(method="lm", formula = y~x+I(x^2), se=FALSE, colour="red")
I would like to change the color of coefficient lines based on whether the point estimate is negative or positive in a ggplot2 coefficient plot in R. For example:
require(coefplot)
set.seed(123)
dat <- data.frame(x = rnorm(100), z = rnorm(100), y1 = rnorm(100))  # y1 added so the model below can be fit
mod1 <- lm(y1 ~ x + z, data = dat)
coefplot.lm(mod1)
Which produces the following plot:
In this plot, I would like to change the "x" variable to red when plotted. Any ideas? Thanks.
I don't think you can do this with a plot produced by coefplot.lm. The coefplot package uses ggplot2 as its plotting system, which is good in itself, but it does not let you control the colors as easily as you would like. To get the desired colors, you need a variable in your data set that color-codes the values, and you need to map it to color inside aes() in the layer that draws the points and their error bars. Apparently this is not possible with the output of the coefplot.lm function. You might be able to change the colors through ggplot2's ggplot_build() function, but I would say it's easier to write your own function for this task.
I've done this once to plot odds. If you want, you may use my code. Feel free to change it. The idea is the same as in coefplot. First, we extract coefficients from a model object and prepare the data set for plotting; second, actually plot.
The code for extracting coefficients and data set preparation
df_plot_odds <- function(x){
tmp<-data.frame(cbind(exp(coef(x)), exp(confint.default(x))))
odds<-tmp[-1,]
names(odds)<-c('OR', 'lower', 'upper')
odds$vars<-row.names(odds)
odds$col <- ifelse(odds$OR > 1, 'blue', 'red')  # blue if OR > 1, red otherwise
odds$pvalue <- summary(x)$coef[-1, "Pr(>|t|)"]
return(odds)
}
Plot the output of the extract function
plot_odds <- function(df_plot_odds, xlab="Odds Ratio", ylab="", asp=1){
require(ggplot2)
p <- ggplot(df_plot_odds, aes(x=vars, y=OR, ymin=lower, ymax=upper),asp=asp) +
geom_errorbar(aes(color=col),width=0.1) +
geom_point(aes(color=col),size=3)+
geom_hline(yintercept = 1, linetype=2) +
scale_color_manual('Effect', labels=c('Positive','Negative'),
values=c('blue','red'))+
coord_flip() +
theme_bw() +
theme(legend.position="none",aspect.ratio = asp)+
ylab(xlab) +
xlab(ylab) #switch because of the coord_flip() above
return(p)
}
Plotting your example
set.seed(123)
dat <- data.frame(x = rnorm(100),y = rnorm(100), z = rnorm(100))
mod1 <- lm(y ~ x + z, data = dat)
df <- df_plot_odds(mod1)
plot <- plot_odds(df)
plot
Which yields
Note that I chose theme_bw() as the default. The output is a ggplot2 object, so you can change it quite a lot.
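If you want raw coefficients rather than odds ratios, the same idea can be sketched more compactly: build a small data frame with a column for the sign of each estimate and map it to colour. The names `coef_df` and `sign` are introduced here for illustration, and the intercept is kept in the plot.

```r
library(ggplot2)

set.seed(123)
dat <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))
mod1 <- lm(y ~ x + z, data = dat)

# One row per coefficient, with 95% confidence limits and a sign label
ci <- confint(mod1)
coef_df <- data.frame(
  term     = names(coef(mod1)),
  estimate = unname(coef(mod1)),
  lower    = ci[, 1],
  upper    = ci[, 2],
  sign     = ifelse(coef(mod1) < 0, "negative", "positive")
)

# Named colour values, so the mapping holds even if only one sign occurs
p <- ggplot(coef_df, aes(x = term, y = estimate, ymin = lower, ymax = upper,
                         colour = sign)) +
  geom_hline(yintercept = 0, linetype = 2) +
  geom_errorbar(width = 0.1) +
  geom_point(size = 3) +
  scale_colour_manual(values = c(negative = "red", positive = "blue")) +
  coord_flip() +
  theme_bw()
p
```

Naming the entries of `values` ties each colour to a sign label instead of relying on the order of the levels.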
The following command:
ggplot(s, aes(x = I5, y = Success))+geom_point(size=3, alpha=0.4)+
stat_smooth(method="loess", colour="blue", size=1.5)+
xlab("I5")+
ylab("Probability of Success")+
theme_bw()
gives me the following plot:
I would like to get what corresponds to the blue line as a function so that I can apply it to any value.
Is there a way to do that?
If you need the actual loess fit, it's probably better to run it yourself. Let's create some sample data (it would have been nice if you had included some in your original question):
dd <- data.frame(
x=1:50,
y = cumsum(rnorm(50))
)
And now we can run the loess function ourselves:
sm <- loess(y~x, dd)
Now we can compare the line that ggplot draws to our loess curve
ggplot(dd, aes(x,y)) +
stat_smooth(method="loess") +
geom_point(data=data.frame(x=sm$x, y=predict(sm)), col="red")
We can see that these line up perfectly. Thus we can just use the predict() function with our loess object to get a value for any point. For example:
predict(sm, 5)
# [1] -2.922876
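To get something that behaves like a function, you can wrap predict() in a small closure. This is a sketch; the name `loess_fun` is hypothetical, and the seed is set only so that the example is reproducible (it will not reproduce the number shown above).

```r
set.seed(4)
dd <- data.frame(x = 1:50, y = cumsum(rnorm(50)))
sm <- loess(y ~ x, data = dd)

# Wrap the fitted loess object so it can be called like an ordinary function
loess_fun <- function(x_new) {
  predict(sm, newdata = data.frame(x = x_new))
}

loess_fun(c(5, 12.5, 40))   # works for any values inside the range of the data
loess_fun(100)              # NA: by default loess does not extrapolate
```

By default predict() returns NA outside the range of the original x values. If you need extrapolation, fitting with control = loess.control(surface = "direct") allows it, though such values should be treated with caution.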