ggplot error on aesthetics with predicted values of a regression in R

I am trying to plot predicted values in ggplot. The script is below:
Program 1
lumber.predict.plm1=lm(lumber.1980.2000 ~ scale(woman.1980.2000) +
I(scale(woman.1980.2000)^2), data=lumber.unemployment.women)
xmin=min(lumber.unemployment.women$woman.1980.2000)
xmax=max(lumber.unemployment.women$woman.1980.2000)
predicted.lumber.all=data.frame(woman.1980.2000=seq(xmin,xmax,length.out=100))
predicted.lumber.all$lumber=predict(lumber.predict.plm1,newdata=predicted.lumber.all)
lumber.predict.plot=ggplot(lumber.unemployment.women,mapping=aes(x=woman.1980.2000,
y=lumber.1980.2000)) +
geom_point(colour="red") +
geom_line(data=predicted.lumber.all,size=1)
lumber.predict.plot
Error: Aesthetics must either be length one, or the same length as the data
Problems: woman.1980.2000
I believe we do not need to match the number of observations in the base dataset with the number in the predicted-values dataset. The same logic works when I try it on the cars dataset:
speed.lm = lm(speed ~ dist, data = cars)
xmin=10
xmax=120
new = data.frame(dist=seq(xmin,xmax,length.out=200))
new$speed=predict(speed.lm,newdata=new,interval='none')
sp <- ggplot(cars, aes(x=dist, y=speed)) +
geom_point(colour="grey40") + geom_line(data=new, colour="green", size=.8)
The code above works fine, so I am unable to figure out the problem with my first program.

The predicted data frame needs a column with the same name as the y variable in the original data. Change this line
predicted.lumber.all$lumber=
predict(lumber.predict.plm1,newdata=predicted.lumber.all)
to this one:
predicted.lumber.all$lumber.1980.2000= ## very bad variable name!
predict(lumber.predict.plm1,newdata=predicted.lumber.all)
Or remap the y aesthetic in the layer:
geom_line(data=predicted.lumber.all, aes(y=lumber),
colour="green", size=.8)
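A minimal, self-contained sketch of the renaming fix. The real lumber.unemployment.women data isn't shown, so the data frame below is a hypothetical stand-in filled with random values, and the model is a plain x + x^2 quadratic (it fits the same quadratic as the scale()-based version, just with different coefficients).

```r
library(ggplot2)

# Hypothetical stand-in for lumber.unemployment.women (random data, 21 rows)
set.seed(1)
lumber.unemployment.women <- data.frame(woman.1980.2000 = runif(21, 4, 9))
lumber.unemployment.women$lumber.1980.2000 <-
  20 + 5 * lumber.unemployment.women$woman.1980.2000 -
  0.4 * lumber.unemployment.women$woman.1980.2000^2 + rnorm(21)

# Quadratic fit on the raw predictor
fit <- lm(lumber.1980.2000 ~ woman.1980.2000 + I(woman.1980.2000^2),
          data = lumber.unemployment.women)

# Evenly spaced grid over the observed x range
xr <- range(lumber.unemployment.women$woman.1980.2000)
predicted.lumber.all <- data.frame(
  woman.1980.2000 = seq(xr[1], xr[2], length.out = 100)
)

# Name the prediction column after the y variable, so geom_line() can
# inherit y = lumber.1980.2000 from the top-level aes()
predicted.lumber.all$lumber.1980.2000 <-
  predict(fit, newdata = predicted.lumber.all)

lumber.predict.plot <- ggplot(lumber.unemployment.women,
                              aes(x = woman.1980.2000, y = lumber.1980.2000)) +
  geom_point(colour = "red") +
  geom_line(data = predicted.lumber.all)

built <- ggplot_build(lumber.predict.plot)
```

Because both data frames now share the column names used in the top-level aes(), no layer-specific mapping is needed.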

The basic problem is that in your code,
...
geom_line(data=predicted.lumber.all,size=1)
...
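ggplot does not know which column from predicted.lumber.all to use: geom_line() inherits y=lumber.1980.2000 from the top-level aes(), but predicted.lumber.all has no column of that name. As @agstudy says, you can specify the column with aes(...) in geom_line: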
...
geom_line(data=predicted.lumber.all, aes(y=lumber), size=1)
...
Since you're just plotting the regression curve, you could accomplish the same thing with less code using:
df <- lumber.unemployment.women
# stat_smooth() formulas refer to the aesthetics x and y, not to column names;
# poly(x, 2) fits the same quadratic as scale(x) + I(scale(x)^2)
model <- y ~ poly(x, 2)
ggplot(df, aes(x=woman.1980.2000, y=lumber.1980.2000)) +
geom_point(color="red") +
stat_smooth(formula=model, method="lm", se=TRUE, color="green", size=0.8)
Note that se=TRUE adds a confidence band around the regression curve.
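A sketch of the stat_smooth() route on hypothetical random data, checking that the smoothed curve matches an explicit lm() fit. The formula is written with x and y because that is what stat_smooth() passes to lm(). poly(x, 2) is used instead of scale(x) + I(scale(x)^2) as a design choice: as far as I know, predict() only reuses the training mean and sd for a bare scale() term, not for one wrapped in I(), whereas poly() stores its own coefficients for prediction.

```r
library(ggplot2)

# Hypothetical data standing in for lumber.unemployment.women
set.seed(42)
df <- data.frame(woman.1980.2000 = runif(30, 4, 9))
df$lumber.1980.2000 <- 20 + 5 * df$woman.1980.2000 -
  0.4 * df$woman.1980.2000^2 + rnorm(30)

p <- ggplot(df, aes(x = woman.1980.2000, y = lumber.1980.2000)) +
  geom_point(colour = "red") +
  # formula refers to the aesthetics x and y, not to the column names
  stat_smooth(method = "lm", formula = y ~ poly(x, 2), se = TRUE,
              colour = "green")

smooth <- ggplot_build(p)$data[[2]]   # 80 grid points by default (n = 80)

# Same curve from an explicit lm() fit, evaluated on the same grid
fit <- lm(lumber.1980.2000 ~ poly(woman.1980.2000, 2), data = df)
manual <- predict(fit, newdata = data.frame(woman.1980.2000 = smooth$x))

# The scale()-based model has the same fitted values at the data points
fit_scaled <- lm(lumber.1980.2000 ~ scale(woman.1980.2000) +
                   I(scale(woman.1980.2000)^2), data = df)
```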

Related

Is there an R function to create a data frame of the logarithmic equations?

First, is there a way to fix my code so the displayed equation includes ln(x)? For example, the equation on the plot is shown as y = 1.2 + 0.32x, but it should be y = 1.2 + 0.32 ln(x).
Second, is there a way to create a new data frame that summarizes the logarithmic equations of all the plots produced with stat_regline_equation(formula=y~log(x))?
iris<-rotated.plot.data %>%
select(`2014-02-03 06:10:00` : `2014-09-30 22:10:00`)
plots <- purrr::map(iris, function(y) {
ggplot(rotated.plot.data,
aes(x=instrument.supersaturation, y={{ y }})) +
geom_point() + geom_smooth(method="lm", formula = y~log(x)) +
stat_regline_equation(formula=y~log(x)) +
ylab("Nccn/Ncn") +
xlab("instrument supersaturation(%)")})
Unfortunately, I have been searching Google and can't find anything that helps with these problems.
Your example is not reproducible, since it relies on rotated.plot.data, which is not available.
But from your question it seems all you want is to transform the x axis with a logarithm. There is a scale for that: scale_x_log10(). Remove the log() from the geom_smooth() call and add this scale to your plot. The scale transforms x before the smoother is fitted, so the line is fitted against log10(x), which differs from ln(x) only by a constant factor.
Try this inside the function you pass to purrr::map(), where y is defined:
rotated.plot.data %>%
ggplot(aes(x=instrument.supersaturation, y={{ y }})) +
geom_point() +
geom_smooth(method="lm") +
scale_x_log10() +
ylab("Nccn/Ncn") +
xlab("instrument supersaturation(%)")
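For the second part of the question (collecting the equations in a data frame), one option is to skip stat_regline_equation() and fit y ~ log(x) yourself for each column. This is a sketch, assuming rotated.plot.data has one x column plus several y columns; the column names run_a, run_b, run_c and the data are hypothetical. Since log() in R is the natural logarithm, the label can honestly say ln(x).

```r
# Hypothetical stand-in for rotated.plot.data: one x column, three y series
set.seed(7)
rotated.plot.data <- data.frame(
  instrument.supersaturation = seq(0.1, 1, length.out = 20)
)
for (col in c("run_a", "run_b", "run_c")) {
  rotated.plot.data[[col]] <- 1.2 +
    0.32 * log(rotated.plot.data$instrument.supersaturation) +
    rnorm(20, sd = 0.05)
}

y_cols <- setdiff(names(rotated.plot.data), "instrument.supersaturation")

# One row per series: coefficients of y = a + b * ln(x), R^2 and a label
eqs <- do.call(rbind, lapply(y_cols, function(col) {
  d <- data.frame(x = rotated.plot.data$instrument.supersaturation,
                  y = rotated.plot.data[[col]])
  fit <- lm(y ~ log(x), data = d)   # log() is the natural logarithm
  cf <- coef(fit)
  data.frame(
    series = col,
    intercept = unname(cf[1]),
    slope = unname(cf[2]),
    r.squared = summary(fit)$r.squared,
    equation = sprintf("y = %.2f + %.2f ln(x)", cf[1], cf[2])
  )
}))
print(eqs)
```

The equation column could then be passed to a text layer (for example geom_text() or annotate()) instead of relying on the automatically generated label.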

Plotting fitted response vs observed response

I have a model created like this:
cube_model <- lm(y ~ x + I(x^2) + I(x^3), data = d.r.data)
I have been using ggplot functions such as geom_point to plot the data points and geom_smooth to plot the regression line. Now I am trying to plot the fitted values against the observed values. How would I do that? I am just unfamiliar with R, so I am not sure what to use here.
EDIT: I ended up doing this:
predicted <- predict(cube_model)
ggplot() + geom_point(aes(x, y)) + geom_line(aes(x, predicted))
Is this the correct approach?
What you need to do is use the predict function to generate the fitted values. You can then add them back to your data.
d.r.data$fit <- predict(cube_model)
If you want to plot the predicted values vs the actual values, you can use something like the following.
library(ggplot2)
ggplot(d.r.data) +
geom_point(aes(x = fit, y = y))
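A sketch of the same fitted-vs-observed plot with a dashed reference line y = x added through geom_abline(); points close to that line are well predicted. The data are random and hypothetical, since d.r.data isn't shown.

```r
library(ggplot2)

# Hypothetical stand-in for d.r.data
set.seed(3)
d.r.data <- data.frame(x = seq(-2, 2, length.out = 40))
d.r.data$y <- 1 + d.r.data$x - 0.5 * d.r.data$x^3 + rnorm(40, sd = 0.3)

cube_model <- lm(y ~ x + I(x^2) + I(x^3), data = d.r.data)
d.r.data$fit <- predict(cube_model)   # fitted values for the training rows

p <- ggplot(d.r.data, aes(x = fit, y = y)) +
  geom_point() +
  # perfect predictions would fall on this line
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
  labs(x = "Fitted", y = "Observed")

built <- ggplot_build(p)
```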

ggplot: one regression line with a constant and one without

For a university exercise, I would like to plot two regression lines in the same graph: one regression includes a constant, the other one doesn't. It should illustrate how removing the constant changes the regression line.
However, when I use the following ggplot-command, I only get one regression line. Does anybody know the reason for this and how to fix it?
data(mtcars)
ggplot(mtcars, aes(x=disp, y=mpg)) +
geom_point() + # Scatters
geom_smooth(method=lm, se=FALSE)+
geom_smooth(method=lm, aes(color='red'),
formula = y ~ x -0, #remove constant
se=FALSE)
I tried this, but it doesn't do the trick.
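You almost have it. Subtracting 0 removes nothing, so your second model still has an intercept and its line lies exactly on top of the first one. To remove the intercept you need + 0 or - 1, not - 0; from help("lm"):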
A formula has an implied intercept term. To remove this use either y ~ x - 1 or y ~ 0 + x. See formula for more details of allowed formulae.
So, we can do this:
library(ggplot2)
data(mtcars)
ggplot(mtcars, aes(x=disp, y=mpg)) +
geom_point() + # Scatters
geom_smooth(method=lm, se=FALSE)+
geom_smooth(method=lm, aes(color='red'),
formula = y ~ x - 1, #remove constant
se=FALSE)
Created on 2018-10-07 by the reprex package (v0.2.1)
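A quick base-R check of why the original attempt produced only one visible line: - 0 leaves the model identical to y ~ x, while - 1 and 0 + both drop the intercept and give the same slope.

```r
data(mtcars)

fit_default <- lm(mpg ~ disp, data = mtcars)      # with intercept
fit_minus0  <- lm(mpg ~ disp - 0, data = mtcars)  # "- 0" removes nothing
fit_minus1  <- lm(mpg ~ disp - 1, data = mtcars)  # no intercept
fit_plus0   <- lm(mpg ~ 0 + disp, data = mtcars)  # no intercept, other spelling

coef(fit_minus0)
coef(fit_minus1)
```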
The approach in this post may help:
http://bighow.org/questions/18280770/formatting-regression-line-equation-using-ggplot2-in-r
I can't take credit for the code, only for the research.
captbullett

R: stat_smooth groups (x axis)

I have a dataset and want to show a figure using stat_smooth.
I can show the Avg.time vs Scored.Probabilities figure with:
c <- ggplot(dataset1, aes(x=Avg.time, y=Scored.Probabilities))
c + stat_smooth()
But when I change Avg.time to time or Age, an error occurs:
c <- ggplot(dataset1, aes(x=Age, y=Scored.Probabilities))
c + stat_smooth()
error: geom_smooth: Only one unique x value each group. Maybe you want aes(group = 1)?
How could I fix it?
The error message says to set group=1, but doing that gives another error:
ggplot(dataset1, aes(x=Age, y=Scored.Probabilities, group=1))+stat_smooth()
geom_smooth: method="auto" and size of largest group is >=1000, so using gam with formula: y ~ s(x, bs = "cs"). Use 'method = x' to change the smoothing method.
Error in smooth.construct.cr.smooth.spec(object, data, knots) :
x has insufficient unique values to support 10 knots: reduce k.
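Now the problem is that Age does not have enough unique values to support the 10 knots of the default gam smoother.
So there are two solutions: (i) summarise with another function such as mean, or (ii) use jitter() to move the Age values slightly.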
ggplot(dataset1, aes(x=Age, y=Scored.Probabilities, group=1))+
geom_point()+
stat_summary(fun=mean, colour="red", geom="line", size = 3) # draw a mean line in the data (use fun.y in ggplot2 < 3.3.0)
Or
ggplot(dataset1, aes(x=jitter(as.numeric(as.character(Age))), y=Scored.Probabilities, group=1))+
geom_point()+stat_smooth()
Note the as.numeric(as.character(Age)) conversion: Age is a factor, and as.numeric() on a factor alone returns the level codes rather than the ages.
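A sketch of the factor pitfall and of the mean-line solution on hypothetical data (dataset1 isn't shown, so Age here is a random factor with five distinct values). It uses the fun = argument of stat_summary(), which assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# as.numeric() on a factor gives level codes, not the original values
codes  <- as.numeric(factor(c(30, 25, 40, 25)))
values <- as.numeric(as.character(factor(c(30, 25, 40, 25))))

# Hypothetical stand-in for dataset1: Age stored as a factor with 5 distinct values
set.seed(11)
dataset1 <- data.frame(
  Age = factor(sample(c(20, 30, 40, 50, 60), 200, replace = TRUE)),
  Scored.Probabilities = runif(200)
)
dataset1$Age.num <- as.numeric(as.character(dataset1$Age))

p <- ggplot(dataset1, aes(x = Age.num, y = Scored.Probabilities)) +
  geom_point() +
  # one mean per distinct Age, joined by a line
  stat_summary(fun = mean, geom = "line", colour = "red")

means <- ggplot_build(p)$data[[2]]
means <- means[order(means$x), ]
```

With only five distinct x values, a per-value summary is often more honest than a smoother anyway.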

facet_grid of back-to-back histogram failing

I am having some trouble creating a facet grid of a back-to-back histogram created with ggplot.
# create data frame with latency values
latc_sorted <- data.frame(
subject=c(1,1,1,1,1,2,2,2,2,2),
grp=c("K_N","K_I","K_N","K_I","K_N","K_I","K_N","K_I","K_N","K_I"),
lat=c(22,45,18,55,94,11,67,22,64,44)
)
# subset and order data
x.sub_ki<-subset(latc_sorted, grp=="K_I")
x.sub_kn<-subset(latc_sorted, grp=="K_N")
x.sub_k<-rbind(x.sub_ki,x.sub_kn)
x=x.sub_ki$lat
y=x.sub_kn$lat
nm<-list("x","y")
# make absolute values on x axis
my.abs<-function(x){abs(x)}
# plot back-to-back histogram
hist_K<-qplot(x, geom="histogram", fill="inverted", binwidth=20) +
geom_histogram(data=data.frame(x=y), aes(fill="non-inverted", y=-..count..),
binwidth= 20) + scale_y_continuous(formatter='my.abs') + coord_flip() +
scale_fill_hue("variable")
hist_K
This plots fine, but if I try the following I get this error:
Error: Casting formula contains variables not found in molten data: x.sub_k$subject
hist_K_sub<-qplot(x, geom="histogram", fill="inverted", binwidth=20) +
geom_histogram(data=data.frame(x=y), aes(fill="non-inverted", y=-..count..),
binwidth= 20) + scale_y_continuous(formatter='my.abs') + coord_flip() +
scale_fill_hue("variable")+
facet_grid(x.sub_k$subject ~ .)
hist_K_sub
Any ideas what is causing this to fail?
The problem is that the variables referenced in facet_grid are looked for in the data.frames that are passed to the various layers. You have created (implicitly and explicitly) data.frames which have only the lat data and do not have the subject information. If you use x.sub_ki and x.sub_kn instead, they do have the subject variable associated with the lat values.
hist_K_sub <-
ggplot() +
geom_histogram(data=x.sub_ki, aes(x=lat, fill="inverted", y= ..count..), binwidth=20) +
geom_histogram(data=x.sub_kn, aes(x=lat, fill="not inverted", y=-..count..), binwidth=20) +
facet_grid(subject ~ .) +
scale_y_continuous(formatter="my.abs") +
scale_fill_hue("variable") +
coord_flip()
hist_K_sub
I also converted from qplot to full ggplot syntax; that shows the parallel structure of ki and kn better.
The formatter= argument above doesn't work with newer versions of ggplot2 (remove that scale_y_continuous() line). Use the labels= argument instead to format the axis:
abs_format <- function() {
function(x) abs(x)
}
hist_K_sub <- hist_K_sub + scale_y_continuous(labels=abs_format())
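For current ggplot2 (3.3.0 or later, which provides after_stat()), the whole answer can be written in one go: ..count.. becomes after_stat(count), and labels = abs replaces the formatter. This is a sketch reusing the question's latc_sorted data; the "not inverted" fill label is just the answer's naming.

```r
library(ggplot2)

# Data from the question
latc_sorted <- data.frame(
  subject = c(1,1,1,1,1,2,2,2,2,2),
  grp = c("K_N","K_I","K_N","K_I","K_N","K_I","K_N","K_I","K_N","K_I"),
  lat = c(22,45,18,55,94,11,67,22,64,44)
)
x.sub_ki <- subset(latc_sorted, grp == "K_I")
x.sub_kn <- subset(latc_sorted, grp == "K_N")

hist_K_sub <- ggplot() +
  # counts for K_I are positive (drawn to the right after coord_flip)
  geom_histogram(data = x.sub_ki,
                 aes(x = lat, fill = "inverted", y = after_stat(count)),
                 binwidth = 20) +
  # counts for K_N are negated so they mirror the first histogram
  geom_histogram(data = x.sub_kn,
                 aes(x = lat, fill = "not inverted", y = after_stat(-count)),
                 binwidth = 20) +
  facet_grid(subject ~ .) +
  scale_y_continuous(labels = abs) +   # show negative counts as positive
  scale_fill_hue("variable") +
  coord_flip()

built <- ggplot_build(hist_K_sub)
```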
