I have a model which has been created like this
cube_model <- lm(y ~ x + I(x^2) + I(x^3), data = d.r.data)
I have been using ggplot functions such as geom_point to plot the data points and geom_smooth to plot the regression line. Now I am trying to plot the fitted values against the observed values. How would I do that? I am not very familiar with R, so I am not sure what to use here.
--
EDIT
I ended up doing this
predicted <- predict(cube_model)
ggplot() + geom_point(aes(x, y)) + geom_line(aes(x, predicted))
Is this the correct approach?
What you need to do is use the predict function to generate the fitted values. You can then add them back to your data.
d.r.data$fit <- predict(cube_model)
If you want to plot the predicted values vs the actual values, you can use something like the following.
library(ggplot2)
ggplot(d.r.data) +
geom_point(aes(x = fit, y = y))
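For a version that runs without the original data, here is a minimal sketch on simulated values. The data frame `d`, its coefficients, and the noise level are made up for illustration; the dashed y = x reference line from geom_abline is an addition of mine, not part of the answer above. Points close to that line are well predicted.

```r
library(ggplot2)

# Simulated stand-in for d.r.data (hypothetical values, not the asker's data)
set.seed(1)
d <- data.frame(x = seq(-2, 2, length.out = 50))
d$y <- 1 + 2 * d$x - 0.5 * d$x^3 + rnorm(50, sd = 0.5)

cube_model <- lm(y ~ x + I(x^2) + I(x^3), data = d)

# With no newdata, predict() returns one fitted value per row used in the fit
d$fit <- predict(cube_model)

# Fitted (x axis) against observed (y axis), with the line y = x for reference
p <- ggplot(d, aes(x = fit, y = y)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  labs(x = "Fitted", y = "Observed")
print(p)
```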
Related
So I have 2 groups and an x and y variable. I am trying to run a linear regression to see if there is a significant relationship between the x and y variables within each group but I also want to look at the significance between groups. Then I would like to plot those results and provide a p-value, equation, and R^2 value on the graph. How would I go about accomplishing this?
I am able to plot the data on the same graph using this code:
ggplot(data_NeuroPsych, aes(x = Flanker_Ratio, y = Neuropsych_Delta, color = Group)) +
geom_point() +
geom_smooth(method = "lm", fill = NA)
Then using this open source code I was able to look at the results separately: https://github.com/kassambara/ggpubr/blob/master/R/stat_regline_equation.R#L7
The issue with the above is the data is not on the same plot and it does not look at the comparison between groups.
I am trying to plot predicted values in ggplot. The script is below:
Program1
lumber.predict.plm1=lm(lumber.1980.2000 ~ scale(woman.1980.2000) +
I(scale(woman.1980.2000)^2), data=lumber.unemployment.women)
xmin=min(lumber.unemployment.women$woman.1980.2000)
xmax=max(lumber.unemployment.women$woman.1980.2000)
predicted.lumber.all=data.frame(woman.1980.2000=seq(xmin,xmax,length.out=100))
predicted.lumber.all$lumber=predict(lumber.predict.plm1,newdata=predicted.lumber.all)
lumber.predict.plot=ggplot(lumber.unemployment.women,mapping=aes(x=woman.1980.2000,
y=lumber.1980.2000)) +
geom_point(colour="red") +
geom_line(data=predicted.lumber.all,size=1)
lumber.predict.plot
Error: Aesthetics must either be length one, or the same length as the data
Problems:woman.1980.2000
I believe the number of observations in the base dataset does not need to match the number of rows in the predicted-values dataset. The same logic works when I try it on the 'cars' dataset.
speed.lm = lm(speed ~ dist, data = cars)
xmin=10
xmax=120
new = data.frame(dist=seq(xmin,xmax,length.out=200))
new$speed=predict(speed.lm,newdata=new,interval='none')
sp <- ggplot(cars, aes(x=dist, y=speed)) +
geom_point(colour="grey40") + geom_line(data=new, colour="green", size=.8)
The above code works fine.
I am unable to figure out the problem with my first program.
You should use the same name for the y variable in the predicted data. Change this line
predicted.lumber.all$lumber=
predict(lumber.predict.plm1,newdata=predicted.lumber.all)
to this one:
predicted.lumber.all$lumber.1980.2000= ## very bad variable name!
predict(lumber.predict.plm1,newdata=predicted.lumber.all)
Or override the y aesthetic in geom_line:
geom_line(data=predicted.lumber.all,aes(y=lumber),
colour="green", size=.8)
The basic problem is that in your code,
...
geom_line(data=predicted.lumber.all,size=1)
...
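ggplot does not know which column of predicted.lumber.all to use for y: geom_line inherits y=lumber.1980.2000 from the ggplot call, and that column does not exist in predicted.lumber.all. As @agstudy says, you can specify the column with aes(...) in geom_line: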
...
geom_line(data=predicted.lumber.all, aes(y=lumber), size=1)
...
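The same inheritance behaviour can be reproduced on the built-in cars data. This is only a sketch: the column name `pred` is hypothetical, chosen so that it deliberately differs from the y variable mapped in ggplot().

```r
library(ggplot2)

speed.lm <- lm(speed ~ dist, data = cars)

# Prediction grid; the fitted values go into a column NOT named "speed"
new <- data.frame(dist = seq(min(cars$dist), max(cars$dist), length.out = 200))
new$pred <- predict(speed.lm, newdata = new)

base <- ggplot(cars, aes(x = dist, y = speed)) +
  geom_point(colour = "grey40")

# geom_line(data = new) alone would inherit y = speed, a column `new` lacks,
# so the y aesthetic is remapped to the column that does exist.
p <- base + geom_line(data = new, aes(y = pred), colour = "green")
print(p)
```

The x aesthetic needs no override because both data frames have a `dist` column.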
Since you're just plotting the regression curve, you could accomplish the same thing with less code. Note that the formula passed to stat_smooth must be written in terms of the aesthetics x and y, not the original column names:
df <- lumber.unemployment.women
model <- y ~ scale(x) + I(scale(x)^2)
ggplot(df, aes(x=woman.1980.2000, y=lumber.1980.2000)) +
geom_point(color="red") +
stat_smooth(formula=model, method="lm", se=TRUE, color="green", size=0.8)
Note that se=TRUE gives you the confidence band around the regression curve.
I have a data set with some points in it and want to fit a line to it. I tried the loess function, but I get very strange results (see the plot below). I expected a line that passes closer to the points and spans the whole plot. How can I achieve that?
How to reproduce it:
Download the dataset from https://www.dropbox.com/s/ud32tbptyvjsnp4/data.R?dl=1 (only two kb) and use this code:
load(url('https://www.dropbox.com/s/ud32tbptyvjsnp4/data.R?dl=1'))
lw1 = loess(y ~ x,data=data)
plot(y ~ x, data=data,pch=19,cex=0.1)
lines(data$y,lw1$fitted,col="blue",lwd=3)
Any help is greatly appreciated. Thanks!
You've plotted fitted values against y instead of against x. Also, you will need to order the x values before plotting a line. Try this:
lw1 <- loess(y ~ x,data=data)
plot(y ~ x, data=data,pch=19,cex=0.1)
j <- order(data$x)
lines(data$x[j],lw1$fitted[j],col="red",lwd=3)
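Since the original data set may no longer be reachable, here is a minimal sketch of the same fix on simulated data. The data frame `dat` and its sine-shaped signal are assumptions made purely for illustration.

```r
# Simulated data with x in random order (hypothetical, stands in for `data`)
set.seed(42)
dat <- data.frame(x = runif(200, 0, 10))
dat$y <- sin(dat$x) + rnorm(200, sd = 0.3)

fit <- loess(y ~ x, data = dat)

# lines() joins points in the order given, so sort by x first
j <- order(dat$x)

plot(y ~ x, data = dat, pch = 19, cex = 0.3)
lines(dat$x[j], fit$fitted[j], col = "red", lwd = 3)
```

Without the reordering, lines() would connect the fitted values in row order and zigzag across the plot.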
Unfortunately the data are not available anymore, but an easier way to fit a non-parametric smoother (locally weighted scatterplot smoothing, i.e. LOESS/LOWESS) is to use the following code:
scatter.smooth(y ~ x, span = 2/3, degree = 2)
Note that you can adjust the span and degree parameters to control the smoothness.
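To see what span does, the sketch below fits loess twice to simulated data, changing only span. The data and the two span values (0.2 and 0.9) are arbitrary choices for illustration.

```r
# Simulated data (hypothetical), sorted by x so lines() can be used directly
set.seed(7)
x <- sort(runif(200, 0, 10))
y <- sin(x) + rnorm(200, sd = 0.3)

# Two fits that differ only in span (the fraction of points used locally)
wiggly <- loess(y ~ x, span = 0.2, degree = 2)
smooth <- loess(y ~ x, span = 0.9, degree = 2)

plot(x, y, pch = 19, cex = 0.3)
lines(x, wiggly$fitted, col = "red", lwd = 2)   # small span: follows the data closely
lines(x, smooth$fitted, col = "blue", lwd = 2)  # large span: flatter, smoother curve

# Residual sum of squares for each fit, as a rough measure of closeness
rss <- c(span_0.2 = sum(wiggly$residuals^2), span_0.9 = sum(smooth$residuals^2))
print(rss)
```

A smaller span tracks the points more closely at the cost of a less smooth curve, so the choice is a trade-off rather than a matter of one value being correct.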
It may be too late, but you also have options with ggplot (and dplyr). First, if you only want to plot a loess line over the points, you can try:
library(ggplot2)
load(url("https://www.dropbox.com/s/ud32tbptyvjsnp4/data.R?dl=1"))
ggplot(data, aes(x, y)) +
geom_point() +
geom_smooth(method = "loess", se = FALSE)
Another way is to use the predict() function on a loess fit. For instance, I used dplyr functions to add the predictions as a new column called "loess":
library(dplyr)
data %>%
mutate(loess = predict(loess(y ~ x, data = data))) %>%
ggplot(aes(x, y)) +
geom_point(color = "grey50") +
geom_line(aes(y = loess))
Update: Added a line of code to load the example data provided
Update2: Corrected the geom_smooth() function name, following @phi's comment
I'm trying to plot an exponential decay line (with error bars) onto a scatterplot in ggplot of price information over time. I currently have this:
f2 <- ggplot(data, aes(x=date, y=cost) ) +
geom_point(aes(y = cost), colour="red", size=2) +
geom_smooth(se=T, method="lm", formula=y~x) +
# geom_smooth(se=T) +
theme_bw() +
xlab("Time") +
scale_y_log10("Price over time") +
ggtitle("The Falling Price over time")
print(f2)
The key line is formula=y~x in the geom_smooth command. Although this looks like a linear model, ggplot seems to automatically detect my scale_y_log10 and fit on the log scale.
Now, my issue here is that date is a date data type. I think I need to convert it to seconds since t=0 to be able to apply an exponential decay model of the form y = Ae^-(bx).
I believe this because when I tried things like y = exp(x), I got a message that I think(?) is telling me I can't take exponents of dates. It reads:
Error in lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok, :
NA/NaN/Inf in foreign function call (arg 1)
However, log(y) = x works correctly. (y is a numeric data type, x is a date.)
Is there a convenient way to fit exponential growth/decay time series models within ggplot plots in the geom_smooth(formula=formula) function call?
This appears to work, although I don't know how finicky it will be with real/messy data:
set.seed(101)
dat <- data.frame(d=seq.Date(as.Date("2010-01-01"),
as.Date("2010-12-31"),by="1 day"),
y=rnorm(365,mean=exp(5-(1:365)/100),sd=5))
library(ggplot2)
g1 <- ggplot(dat,aes(x=d,y=y))+geom_point()+expand_limits(y=0)
g1+geom_smooth(method="glm",
method.args=list(family=gaussian(link="log"),start=c(5,0)))
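If you would rather fit the model outside ggplot, one possible sketch is below. It converts the dates to days elapsed since the first observation (the column name `t` and the choice of days rather than seconds are my assumptions) and fits the same kind of log-link Gaussian GLM with glm(), so that the mean follows exp(a + b*t) with A = exp(a).

```r
set.seed(101)
dat <- data.frame(
  d = seq.Date(as.Date("2010-01-01"), as.Date("2010-12-31"), by = "1 day")
)
# Days elapsed since the first observation (t = 0 on the first date)
dat$t <- as.numeric(dat$d - min(dat$d))
dat$y <- rnorm(365, mean = exp(5 - dat$t / 100), sd = 5)

# Gaussian GLM with a log link: E[y] = exp(a + b * t).
# Starting values are supplied because some simulated y are <= 0.
fit <- glm(y ~ t, family = gaussian(link = "log"), data = dat, start = c(5, 0))

# Fitted mean on the original (not log) scale
dat$fit <- predict(fit, type = "response")

plot(y ~ d, data = dat, pch = 19, cex = 0.3)
lines(dat$d, dat$fit, col = "blue", lwd = 2)

# Intercept estimates log(A); the coefficient of t estimates the rate per day
print(coef(fit))
```

Working with elapsed days keeps the intercept interpretable as the level on the first date, which is harder to read off when the raw date is used as x.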
I'm plotting linear (lm) fits, and for each one I want to obtain and display the equation y=ax+b along with R². How can I do it?
lmfit <- geom_smooth(method="lm", se = T)
p <- qplot(x, y, data=Tab) + facet_grid(id ~., scales = "free") + lmfit
Within ggplot, there is no direct way to do this. You need to compute the regressions separately for each id and then extract the equation and R^2 from each of those. Put those extracted versions in a dataframe (along with id) and use geom_text to display them.
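The steps described above could be sketched roughly as follows. The data frame `Tab` here is simulated, and the helper `label_for` and the label format are hypothetical names and choices of mine, not part of any package.

```r
library(ggplot2)

# Simulated stand-in for Tab with three groups (hypothetical data)
set.seed(3)
Tab <- data.frame(id = rep(c("a", "b", "c"), each = 30), x = runif(90, 0, 10))
Tab$y <- c(a = 1, b = 2, c = -1)[Tab$id] * Tab$x + rnorm(90)

# One lm per id; return the equation and R^2 as a one-row data frame
label_for <- function(df) {
  fit <- lm(y ~ x, data = df)
  data.frame(
    id = df$id[1],
    label = sprintf("y = %.2fx + %.2f, R^2 = %.2f",
                    coef(fit)[2], coef(fit)[1], summary(fit)$r.squared)
  )
}
labels <- do.call(rbind, lapply(split(Tab, Tab$id), label_for))

p <- ggplot(Tab, aes(x, y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  facet_grid(id ~ ., scales = "free") +
  # x = -Inf / y = Inf pins the text to the top-left corner of each panel
  geom_text(data = labels, aes(x = -Inf, y = Inf, label = label),
            hjust = -0.1, vjust = 1.5, inherit.aes = FALSE)
print(p)
```

Because `labels` carries the same `id` column used for faceting, each label lands in its own panel. A negative intercept prints as "+ -0.12" with this simple format string, which you may want to tidy up.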