For a university exercise, I would like to plot two regression lines in the same graph: one regression includes a constant, the other one doesn't. It should illustrate how removing the constant changes the regression line.
However, when I use the following ggplot-command, I only get one regression line. Does anybody know the reason for this and how to fix it?
data(mtcars)
ggplot(mtcars, aes(x=disp, y=mpg)) +
geom_point() + # Scatters
geom_smooth(method=lm, se=FALSE)+
geom_smooth(method=lm, aes(color='red'),
formula = y ~ x -0, #remove constant
se=FALSE)
I tried this, but it doesn't do the trick.
You almost got it; to remove the intercept, you need + 0 or - 1, but not - 0; from help("lm"):
A formula has an implied intercept term. To remove this use either y ~
x - 1 or y ~ 0 + x. See formula for more details of allowed formulae.
So, we can do this:
library(ggplot2)
data(mtcars)
ggplot(mtcars, aes(x=disp, y=mpg)) +
geom_point() + # Scatters
geom_smooth(method=lm, se=FALSE)+
geom_smooth(method=lm, color='red', # a fixed colour is set outside aes()
formula = y ~ x - 1, #remove constant
se=FALSE)
Created on 2018-10-07 by the reprex package (v0.2.1)
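To see why the original attempt drew only one visible line, you can compare the fitted coefficients directly. This is a minimal sketch on mtcars; the object names (fit_with, fit_minus0, fit_without) are just illustrative:

```r
data(mtcars)

# Default formula: intercept is implied
fit_with <- lm(mpg ~ disp, data = mtcars)

# "- 0" does NOT drop the intercept, so this is the same model as above
fit_minus0 <- lm(mpg ~ disp - 0, data = mtcars)

# "- 1" (or "+ 0") forces the line through the origin
fit_without <- lm(mpg ~ disp - 1, data = mtcars)

names(coef(fit_with))     # intercept and slope
names(coef(fit_minus0))   # still intercept and slope
names(coef(fit_without))  # slope only
```

Because the first two fits are identical, the two geom_smooth() layers in the question were drawn on top of each other.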
good day,
the good book said to do it like this:
http://bighow.org/questions/18280770/formatting-regression-line-equation-using-ggplot2-in-r
I can't take credit for writing the code, only for the research ...
Have a great day...
captbullett
First, is there a way to fix my code so that the equation includes ln(x)? For example, the equation on the plot is shown as y = 1.2 + 0.32x, but it should be y = 1.2 + 0.32 ln(x).
Lastly, I'm trying to figure out whether there is a way to create a new data frame that summarizes the logarithmic equations of all the plots produced with stat_regline_equation(formula=y~log(x)).
iris<-rotated.plot.data %>%
select(`2014-02-03 06:10:00` : `2014-09-30 22:10:00`)
plots <- purrr::map(iris, function(y) {
ggplot(rotated.plot.data,
aes(x=instrument.supersaturation, y={{ y }})) +
geom_point() + geom_smooth(method="lm", formula = y~log(x)) +
stat_regline_equation(formula=y~log(x)) +
ylab("Nccn/Ncn") +
xlab("instrument supersaturation(%)")})
Unfortunately, I have been searching Google and can't find any methods that help with the problems I have encountered.
Your example is not reproducible (since it relies on rotated.plot.data that is not available).
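But from your question it seems all you want is to transform the X axis with the logarithm. There is a scale for that: scale_x_log10(). You can remove the log() from the geom_smooth() call and add this scale to your plot. Note that this scale uses base 10, so the fitted slope differs from the ln(x) slope by a constant factor of ln(10).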
Try this:
rotated.plot.data %>%
ggplot(aes(x=instrument.supersaturation, y={{ y }})) +
geom_point() +
geom_smooth(method="lm") +
scale_x_log10() +
ylab("Nccn/Ncn") +
xlab("instrument supersaturation(%)")
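For the second part of the question (collecting the logarithmic equations in one data frame), one option is to fit the models yourself with lm() and gather the coefficients. This is only a sketch: rotated.plot.data is not available, so the data frame demo and its columns x, y1 and y2 are made-up stand-ins.

```r
set.seed(1)

# Hypothetical stand-in data: two response columns that follow a + b * ln(x)
x <- seq(0.1, 1, length.out = 20)
demo <- data.frame(
  x  = x,
  y1 = 1.2 + 0.32 * log(x) + rnorm(20, sd = 0.01),
  y2 = 0.8 + 0.10 * log(x) + rnorm(20, sd = 0.01)
)

response_cols <- c("y1", "y2")

# Fit response ~ log(x) for each column and keep intercept and slope
fits <- lapply(response_cols, function(col) {
  cf <- coef(lm(reformulate("log(x)", response = col), data = demo))
  data.frame(series = col, intercept = unname(cf[1]), slope = unname(cf[2]))
})
equations <- do.call(rbind, fits)

# Human-readable equation with ln(x) spelled out
equations$label <- sprintf("y = %.2f + %.2f ln(x)",
                           equations$intercept, equations$slope)
equations
```

The label column could then be added to each plot with annotate() or geom_text() instead of relying on the automatically generated equation text.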
I have a model which has been created like this
cube_model <- lm(y ~ x + I(x^2) + I(x^3), data = d.r.data)
I have been using ggplot functions like geom_point to plot the data points and geom_smooth to plot the regression line. Now I am trying to plot fitted values vs. observed values. How would I do that? I am fairly unfamiliar with R, so I'm not sure what to use here.
--
EDIT
I ended up doing this
predicted <- predict(cube_model)
ggplot() + geom_point(aes(x, y)) + geom_line(aes(x, predicted))
Is this the correct approach?
What you need to do is use the predict function to generate the fitted values. You can then add them back to your data.
d.r.data$fit <- predict(cube_model)
If you want to plot the predicted values vs the actual values, you can use something like the following.
library(ggplot2)
ggplot(d.r.data) +
geom_point(aes(x = fit, y = y))
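A self-contained version of the same idea, in case it helps. The data frame d below is simulated (d.r.data is not available), and the dashed 1:1 reference line is an optional extra: points close to it are well predicted.

```r
library(ggplot2)

set.seed(42)

# Simulated stand-in for d.r.data
d <- data.frame(x = seq(-2, 2, length.out = 50))
d$y <- 1 + 2 * d$x - 0.5 * d$x^3 + rnorm(50, sd = 0.5)

cube_model <- lm(y ~ x + I(x^2) + I(x^3), data = d)

# Fitted values for the rows the model was trained on
d$fit <- predict(cube_model)

# Fitted on x, observed on y, plus a y = x reference line
p <- ggplot(d, aes(x = fit, y = y)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed")
p
```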
I'm trying to add points (geom_point) to an autolayer() line ("fitted" in the picture). autolayer() is a companion to autoplot() for ggplot2 in Rob Hyndman's forecast package (there's a base autoplot/autolayer in ggplot2 too, so the same likely applies there).
The problem is (I'm no ggplot2 expert, and the autoplot wrapper makes it trickier) that geom_point() applies fine to the main call, but how do I apply something similar to the autolayer (fitted values)?
I tried type="b", as you would for a normal line plot, but it's not a parameter of autolayer().
require(fpp2)
model.ses <- ets(mdeaths, model="ANN", alpha=0.4)
model.ses.fc <- forecast(model.ses, h=5)
forecast::autoplot(mdeaths) +
forecast::autolayer(model.ses.fc$fitted, series="Fitted") + # cannot set to show points, and type="b" not allowed
geom_point() # this works fine against the main autoplot call
This seems to work:
library(forecast)
library(fpp2)
model.ses <- ets(mdeaths, model="ANN", alpha=0.4)
model.ses.fc <- forecast(model.ses, h=5)
# Pre-compute the fitted layer so we can extract the data out of it with
# layer_data()
fitted_layer <- forecast::autolayer(model.ses.fc$fitted, series="Fitted")
fitted_values <- fitted_layer$layer_data()
plt <- forecast::autoplot(mdeaths) +
fitted_layer +
geom_point() +
geom_point(data = fitted_values, aes(x = timeVal, y = seriesVal))
There might be a way to make forecast::autolayer do what you want directly but this solution works. If you want the legend to look right, you'll want to merge the input data and fitted values into a single data.frame.
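Here is a rough sketch of the "single data.frame" idea. To keep it runnable without the forecast package, it uses stats::HoltWinters() with alpha = 0.4 as a stand-in for the ets() model, so the fitted values will not match the ones above exactly; the column names time, value and series are my own choice.

```r
library(ggplot2)

# Stand-in for the ets(..., model = "ANN", alpha = 0.4) fit:
# simple exponential smoothing without trend or seasonality
hw <- HoltWinters(mdeaths, alpha = 0.4, beta = FALSE, gamma = FALSE)
fitted_ts <- fitted(hw)[, "xhat"]

# One long data frame with a 'series' column to drive the legend
observed_df <- data.frame(time = as.numeric(time(mdeaths)),
                          value = as.numeric(mdeaths),
                          series = "Observed")
fitted_df <- data.frame(time = as.numeric(time(fitted_ts)),
                        value = as.numeric(fitted_ts),
                        series = "Fitted")
both <- rbind(observed_df, fitted_df)

# Lines and points for both series, coloured by the same legend
p <- ggplot(both, aes(x = time, y = value, colour = series)) +
  geom_line() +
  geom_point()
p
```

With the forecast objects from the answer, the same structure should work by building fitted_df from model.ses.fc$fitted instead.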
I have a dataset comprising the following variables (fruit, prices, country, organic/non-organic, location).
I would like a plot like the one here but with one thing added - a line of best fit that runs through the points for each grouping of organic/non-organic, location, and fruit.
plot -> https://dl.dropboxusercontent.com/u/3803117/stackoverflow.jpeg
For example, in the "Organic, City" square, I would like 4 lines of best fit: one centered on each of Apples, Bananas, Cherries and Dates.
Here's the code I used to generate the plot.
p <- ggplot(data,aes(factor(fruit),price)) +
geom_violin(aes(fill=Country,trim=FALSE)) +
geom_boxplot(aes(fill=Country),position=position_dodge(0.9),width=.1) +
geom_jitter(alpha=0.5) +
facet_wrap(organic~location) +
xlab("Fruit") +
ylab("Price") +
labs(fill="Country")
Here's a sample dataset if it might help -> https://dl.dropboxusercontent.com/u/3803117/stackoverflow.csv
Thanks in advance so much for all the help!
Doesn't the geom_abline documentation specify exactly what you are looking for? See the part "# Slopes and intercepts from linear model"
http://docs.ggplot2.org/current/geom_abline.html
EDIT: Just checked and realized that there are no examples without the SE bands, but you can easily disable them by setting se=FALSE:
p <- qplot(wt, mpg, data = mtcars)
p <- p + geom_smooth(aes(group=cyl), method="lm", se=FALSE)
p <- p + facet_grid(cyl~.)
print(p)
If you provided a sample dataset it would be even easier to help you.
EDIT2:
The following might more closely resemble what the OP envisioned. However, I hasten to add that it is not meaningful, as the ordering of country (or fruit, or type, or anything) typically cannot be used to formulate a useful linear relationship:
p <- ggplot(data,aes(factor(country),price)) +
geom_violin(aes(fill=country), trim=FALSE) +
geom_boxplot(aes(fill=country),position=position_dodge(0.9),width=.1) +
geom_jitter(alpha=0.5) +
facet_wrap(organic~location+fruit) +
xlab("Country") +
ylab("Price") +
labs(fill="country")
p <- p + geom_smooth(aes(group=1,color=country), method="lm", se=FALSE)
p
I am trying to plot a graph of predicted values in ggplot. The script is shown below:
Program1
lumber.predict.plm1=lm(lumber.1980.2000 ~ scale(woman.1980.2000) +
I(scale(woman.1980.2000)^2), data=lumber.unemployment.women)
xmin=min(lumber.unemployment.women$woman.1980.2000)
xmax=max(lumber.unemployment.women$woman.1980.2000)
predicted.lumber.all=data.frame(woman.1980.2000=seq(xmin,xmax,length.out=100))
predicted.lumber.all$lumber=predict(lumber.predict.plm1,newdata=predicted.lumber.all)
lumber.predict.plot=ggplot(lumber.unemployment.women,mapping=aes(x=woman.1980.2000,
y=lumber.1980.2000)) +
geom_point(colour="red") +
geom_line(data=predicted.lumber.all,size=1)
lumber.predict.plot
Error: Aesthetics must either be length one, or the same length as the dataProblems:woman.1980.2000
I believe we do not need the number of observations in the base dataset to match the number in the predicted-values dataset. The same logic works when I try it on the 'cars' dataset.
speed.lm = lm(speed ~ dist, data = cars)
xmin=10
xmax=120
new = data.frame(dist=seq(xmin,xmax,length.out=200))
new$speed=predict(speed.lm,newdata=new,interval='none')
sp <- ggplot(cars, aes(x=dist, y=speed)) +
geom_point(colour="grey40") + geom_line(data=new, colour="green", size=.8)
The above code works fine.
Unable to figure out the problem with my first program.
You should use the same name for the y variable in the predicted data as in the main plot. Replace this line
predicted.lumber.all$lumber=
predict(lumber.predict.plm1,newdata=predicted.lumber.all)
with this one:
predicted.lumber.all$lumber.1980.2000= ## very bad variable name!
predict(lumber.predict.plm1,newdata=predicted.lumber.all)
Or override the y aesthetic in geom_line:
geom_line(data=predicted.lumber.all,aes(y=lumber),
colour="green", size=.8)
The basic problem is that in your code,
...
geom_line(data=predicted.lumber.all,size=1)
...
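ggplot does not know which column from predicted.lumber.all to use for y: the layer inherits y=lumber.1980.2000 from the top-level aes(), but that data frame only has a column named lumber. As @agstudy says, you can specify this with aes(...) in geom_line: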
...
geom_line(data=predicted.lumber.all, aes(y=lumber), size=1)
...
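The same pattern on the built-in cars data, as a small runnable sketch. The column name predicted_speed is deliberately different from the y variable of the main plot, so the layer needs its own aes():

```r
library(ggplot2)

speed_lm <- lm(speed ~ dist, data = cars)

# Prediction grid; it does not need the same number of rows as 'cars'
new <- data.frame(dist = seq(10, 120, length.out = 200))
new$predicted_speed <- predict(speed_lm, newdata = new)

p <- ggplot(cars, aes(x = dist, y = speed)) +
  geom_point(colour = "grey40") +
  # x = dist is inherited; y must be remapped because 'new' has no 'speed' column
  geom_line(data = new, aes(y = predicted_speed), colour = "green")
p
```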
Since you're just plotting the regression curve, you could accomplish the same thing with less code. Note that the formula passed to stat_smooth has to be written in terms of the x and y aesthetics rather than the original column names:
df <- lumber.unemployment.women
model <- y ~ scale(x) + I(scale(x)^2)
ggplot(df, aes(x=woman.1980.2000, y=lumber.1980.2000)) +
geom_point(color="red") +
stat_smooth(formula=model, method="lm", se=TRUE, color="green", size=0.8)
Note that se=TRUE gives you the confidence limits on the regression curve.