How to make the prediction interval cover the area without data points - r

I am trying to plot a prediction interval for a linear model in ggplot.
Following the instructions on other websites, I have written the following
code:
library(ggplot2)
model1 <- lm(rate ~ conc, data=origin)
# Without newdata, predict() returns fit/lwr/upr only at the observed conc values
temp_var <- predict(model1, interval="prediction")
new_df <- cbind(origin, temp_var)
ggplot(data = new_df, aes(x = conc, y = rate))+
# `plot` is a separate data frame of the asker's (not shown in the question)
geom_line(data=plot, aes(x=x, y=y, group=group)) +
geom_point(color="#2980B9", size = 0.5) +
geom_line(aes(y=lwr), color = "#2980B9", linetype = "dashed")+
geom_line(aes(y=upr), color = "#2980B9", linetype = "dashed")
Since I want to predict values outside my data range (blue points), I
would like to extend the prediction band. The problem is that the "lwr" and
"upr" values generated by the predict function only cover my data
range, as shown in the following graph.
Is there any way to extend the prediction band?
Thank you so much for your help.
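A minimal sketch of one common approach (not taken from the thread): predict() only returns rows for the data it is evaluated on, so calling it without `newdata` gives `lwr`/`upr` only at the observed `conc` values. Passing a `newdata` grid that spans the wider range you want gives band values there as well. The `origin` data below is simulated as a stand-in for the asker's data, and the 0 to 15 range is an arbitrary assumption.

```r
# Simulated stand-in for the asker's `origin` data (an assumption; the real data isn't shown)
set.seed(1)
origin <- data.frame(conc = seq(1, 10, length.out = 30))
origin$rate <- 2 + 0.5 * origin$conc + rnorm(30, sd = 0.5)

model1 <- lm(rate ~ conc, data = origin)

# Grid of conc values extending beyond the observed 1-10 range (0-15 chosen arbitrarily)
new_conc <- data.frame(conc = seq(0, 15, length.out = 200))

# predict() evaluates the fit and the prediction interval at every row of `newdata`
band <- cbind(new_conc,
              predict(model1, newdata = new_conc, interval = "prediction"))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(origin, aes(x = conc, y = rate)) +
    geom_point(color = "#2980B9", size = 0.5) +
    # Fitted line and dashed prediction limits drawn from the extended grid
    geom_line(data = band, aes(y = fit)) +
    geom_line(data = band, aes(y = lwr), color = "#2980B9", linetype = "dashed") +
    geom_line(data = band, aes(y = upr), color = "#2980B9", linetype = "dashed")
  # print(p) draws the plot
}
```

The band widens as `conc` moves away from the mean of the observed values, which is expected for a prediction interval, so limits far outside the data should be read with caution.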

Related

How to colour multiple geom_lines on a gradient using ggplot2 in R?

I asked a similar question here about how to colour a PDP/ICE plot. I have since figured out a way to colour the plots by the predicted value. However, I am still having problems manually selecting a colour gradient. I am using the iml package to create the predictions, but I feel that this is essentially a ggplot2 problem, so I am opening a separate question.
In the example below, I am creating a random forest model on the Boston data and using the rf model to create the ice-plot in iml.
library("iml")
library("randomForest")
library(ggplot2)
# We train a random forest on the Boston dataset:
data("Boston", package = "MASS")
rf = randomForest(medv ~ ., data = Boston, ntree = 50)
# Use iml to generate predictions
mod = Predictor$new(rf, data = Boston)
# Compute the individual conditional expectations for the feature rm (average number of rooms)
ice = FeatureEffect$new(mod, method = 'ice', feature = "rm")
Now I could plot this using plot(ice), which creates an ICE plot with grey lines. However, I want to colour these lines on a gradient. I can colour them by the predicted values like so:
df <- ice$results
ggplot(df, aes(x = rm, y = .value, color = .value)) + geom_line(aes(group = .id))
and this will produce a plot like this:
This is what I want (i.e., an ICE plot with a colour gradient), but I can't figure out a way to set the colours manually; for example, I might want low values to be red and high values to be blue. I have tried a few of the ggplot2 options, but I can't get them to work.
So, I solved this problem by using:
ggplot(df, aes(x = rm, y = .value, color = .value )) +
geom_line(aes(group = .id)) +
scale_colour_gradient2(low = "red", mid = "yellow", high = "blue", midpoint = 25)
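A hedged sketch of the plain two-colour case asked about (low values red, high values blue): scale_colour_gradient() interpolates between `low` and `high` across the range of the mapped values, whereas scale_colour_gradient2() adds a `mid` colour at `midpoint`, which defaults to 0. That default is why the solution above has to set `midpoint = 25` for `medv`-sized predictions. The ICE-style data below is simulated (an assumption) so the example runs without iml or randomForest.

```r
# Simulated ICE-style data: 20 curves (.id), each evaluated at 25 values of rm
set.seed(2)
n_id <- 20
rm_grid <- seq(4, 8.5, length.out = 25)
df <- data.frame(.id = rep(seq_len(n_id), each = length(rm_grid)),
                 rm = rep(rm_grid, times = n_id))
# Each curve gets its own random offset so the lines differ
df$.value <- 10 + 5 * (df$rm - 4) + rep(rnorm(n_id, sd = 3), each = length(rm_grid))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = rm, y = .value, colour = .value)) +
    geom_line(aes(group = .id)) +
    # Two-colour gradient: lowest .value drawn red, highest drawn blue
    scale_colour_gradient(low = "red", high = "blue")
}
```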

Linear model diagnostics: smoothing line obtained in ggplot2 is different from the one obtained with base plot

I am trying to reproduce the diagnostics plots for a linear regression model using ggplot2. The smoothing line that I get is different from the one obtained using base plots or ggplot2::autoplot.
library(survival)
library(ggplot2)
model <- lm(wt.loss ~ meal.cal, data=lung)
## Fitted vs. residuals using base plot:
plot(model, which=1)
## Fitted vs. residuals using ggplot
model.frame <- fortify(model)
ggplot(model.frame, aes(.fitted, .resid)) + geom_point() + geom_smooth(method="loess", se=FALSE)
The smoothing line is different: the influence of the first few points is much larger using the loess method provided by ggplot. My question is: how can I reproduce the smoothing line obtained with plot() using ggplot2?
You can calculate the lowess curve, which is used to draw the red line in the original diagnostic plot, using the base function of the same name.
smoothed <- as.data.frame(with(model.frame, lowess(x = .fitted, y = .resid)))
ggplot(model.frame, aes(.fitted, .resid)) +
theme_bw() +
geom_point(shape = 1, size = 2) +
geom_hline(yintercept = 0, linetype = "dotted", col = "grey") +
geom_path(data = smoothed, aes(x = x, y = y), col = "red")
And the original:
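The difference comes from the smoothers themselves: plot.lm() draws its red line with lowess() (via panel.smooth(), span 2/3, 3 robustness iterations), whereas geom_smooth(method = "loess") uses the loess() defaults (span 0.75, local quadratic fit, no robustness iterations). If you would rather stay inside geom_smooth(), a loess configured to resemble lowess gets close, but it is an approximation, not the identical algorithm. The sketch below uses the built-in `cars` data (an assumption, to avoid the survival dependency) and builds the fitted/residual frame with base R instead of fortify().

```r
model <- lm(dist ~ speed, data = cars)

# Fitted values and residuals, the same columns fortify() would provide
diag_df <- data.frame(.fitted = fitted(model), .resid = resid(model))

# lowess() with its defaults (f = 2/3, iter = 3) matches the red line in plot(model, which = 1)
smoothed <- as.data.frame(lowess(diag_df$.fitted, diag_df$.resid))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(diag_df, aes(.fitted, .resid)) +
    geom_point(shape = 1) +
    geom_hline(yintercept = 0, linetype = "dotted", colour = "grey") +
    # Exact lowess curve, precomputed above
    geom_path(data = smoothed, aes(x = x, y = y), colour = "red") +
    # Approximation inside geom_smooth: local-linear, robust loess with span 2/3
    geom_smooth(method = "loess", formula = y ~ x, se = FALSE, span = 2/3,
                method.args = list(degree = 1, family = "symmetric"),
                colour = "blue", linetype = "dashed")
}
```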

Legend for overlaid plots in ggplot

I'm trying to make a plot that overlays a bunch of simulated density plots drawn in one color with low alpha and one empirical density plot with high alpha in a different color. This produces a plot that looks about how I want it.
library(ggplot2)
model <- c(1:100)
values <- rnbinom(10000, 1, .4)
df = data.frame(model, values)
empirical_data <- rnbinom(1000, 1, .3)
ggplot() +
geom_density(aes(x=empirical_data), color='orange') +
geom_line(stat='density',
data = df,
aes(x=values,
group = model),
color='blue',
alpha = .05) +
xlab("Value")
However, it doesn't have a legend and I can't figure out how to add a legend to differentiate plots from df and plots from empirical_data.
The other road I started down was to put them all in one data frame, but I couldn't figure out how to change the color and alpha for just one of the density plots.
Moving color = ... inside aes() turns the colour into a mapping, which gives you a legend. The strings you map are only keys; the actual colors (and, optionally, the legend labels) are set in scale_color_manual, so you can change them to whatever you want there.
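ggplot() +
geom_density(aes(x=empirical_data, color='empirical_data')) +
geom_line(stat='density',
data = df,
aes(x=values,
group = model,
color='df'),
alpha = .05) +
# Named values tie each colour to the key used inside aes(); those keys are
# also the legend text, so no separate (order-dependent) labels vector is needed
scale_color_manual(name = 'data source',
values = c('df'='blue', 'empirical_data'='orange')) +
xlab("Value")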
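The asker also mentioned trying a single data frame. A hedged sketch of that route: stack both sources into one long data frame with a `source` column, then map both `colour` and `alpha` to it and set each with a manual scale. Because both scales share the same title and breaks, ggplot2 merges them into one legend. The column names `source`, `run` and `value` are illustrative choices, and the simulated draws mirror the question's.

```r
set.seed(3)
# 100 simulated runs of 100 draws each, plus one empirical sample
sim <- data.frame(source = "simulated",
                  run = rep(1:100, times = 100),
                  value = rnbinom(10000, 1, 0.4))
emp <- data.frame(source = "empirical",
                  run = 0L,
                  value = rnbinom(1000, 1, 0.3))
all_df <- rbind(sim, emp)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(all_df, aes(x = value,
                          group = interaction(source, run),
                          colour = source, alpha = source)) +
    geom_line(stat = "density") +
    # Same name and breaks in both scales, so the two legends merge into one
    scale_colour_manual(name = "data source",
                        values = c(simulated = "blue", empirical = "orange")) +
    scale_alpha_manual(name = "data source",
                       values = c(simulated = 0.05, empirical = 1)) +
    xlab("Value")
}
```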

Different function curves for each facet in ggplot2

Short:
How do you plot a different, user/data-defined curve in each facet in ggplot2?
Long:
I would like to overlay faceted scatterplots of real data with user-defined curves of predicted data based on the faceting variables, i.e. using a different curve for each facet.
Here's a toy example:
We have data on the number of hedgehogs played by red or white queens for two years at two sites, with two different rate treatments. We expect those treatments to alter the hedgehog population by an annual exponential rate of either 0.5 or 1.5. So our data look like:
queen <- as.factor(c(rep("red", 8), rep("white",8)))
site <- as.factor(c(rep(c(rep(1,4), rep(2,4)),2)))
year <- c(rep(c(rep(1,2), rep(2,2)),4))
rate <- rep(c(0.5,1.5),8)
hedgehogs <- c(8,10,6,14,16,9,8,11,11,9,9,10,8,11,11,6)
toy.data <- data.frame(queen, site, year, rate, hedgehogs)
The following makes four nice facets of site by rate:
library("ggplot2")
ggplot(toy.data, aes(year, hedgehogs)) +
geom_point(aes(colour=queen), size=10) +
scale_colour_manual(values=c("red", "white")) +
facet_grid(rate ~ site, labeller= label_both)
I would like to overlay rate curves onto these plots.
Our prediction curve looks like:
predict.hedgehogs <- function(year, rate){
10*(rate^(year-1))
}
That is, the predicted number of hedgehogs is the starting number (here 10 hedgehogs) multiplied by the rate raised to the power of the number of years elapsed (year - 1).
I've tried all manner of stuffing around with stat_function and produced something on the right track, but not quite there.
E.g.:
Adding facet-specific data as per geom_hline (see the bottom of the page here):
facet.data <- data.frame(rate=c(0.5, 0.5, 1.5, 1.5),
site=c(1, 2, 1, 2))
Then plotting
ggplot(toy.data, aes(year, hedgehogs)) +
geom_point(aes(colour = queen), size = 10) +
scale_colour_manual(values = c("red", "white")) +
facet_grid(rate ~ site, labeller = label_both) +
stat_function(mapping = aes(x = year, y = predict.hedgehogs(year,rate)),
fun = predict.hedgehogs,
args = list(r = facet.data$rate), geom = "line")
Or a separate stat_function call for each rate (i.e., this strategy):
ggplot(toy.data, aes(year, hedgehogs)) +
geom_point(aes(colour=queen), size=10) +
scale_colour_manual(values=c("red", "white")) +
facet_grid(rate ~ site, labeller= label_both) +
stat_function(fun=predict.hedgehogs, args=list(rate=0.5), geom="line", rate==0.5)+
stat_function(fun=predict.hedgehogs, args=list(rate=1.5), geom="line", rate==1.5)
Error: `mapping` must be created by `aes()`
Any thoughts?
And with many thanks to the comment from @Roland:
If we add the values predicted by the function predict.hedgehogs above to toy.data:
pred.hogs <- predict.hedgehogs(year, rate)
toy.data <- data.frame(toy.data, pred.hogs)
We can plot:
ggplot(toy.data, aes(year, hedgehogs)) +
geom_point(aes(colour=queen), size=10) +
scale_colour_manual(values=c("red", "white")) +
facet_grid(rate ~ site) +
geom_smooth(aes(x=year, y=pred.hogs), stat="identity", colour = "black")
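With only two observed years, `pred.hogs` exists at just two points per facet, so the line drawn above is a straight segment. A hedged sketch for a smooth curve in each facet: build a separate data frame over a fine grid of `year`, crossed with the faceting variables `rate` and `site`, and pass it to geom_line(); each facet then picks up only the rows matching its own `rate` and `site`. The 50-point grid and the name `curves` are arbitrary choices.

```r
# Toy data and prediction function from the question
queen <- as.factor(c(rep("red", 8), rep("white", 8)))
site <- as.factor(c(rep(c(rep(1, 4), rep(2, 4)), 2)))
year <- c(rep(c(rep(1, 2), rep(2, 2)), 4))
rate <- rep(c(0.5, 1.5), 8)
hedgehogs <- c(8, 10, 6, 14, 16, 9, 8, 11, 11, 9, 9, 10, 8, 11, 11, 6)
toy.data <- data.frame(queen, site, year, rate, hedgehogs)

predict.hedgehogs <- function(year, rate) {
  10 * (rate^(year - 1))
}

# Fine grid of years for every combination of the faceting variables
curves <- expand.grid(year = seq(1, 2, length.out = 50),
                      rate = c(0.5, 1.5),
                      site = factor(c(1, 2)))
curves$pred.hogs <- predict.hedgehogs(curves$year, curves$rate)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy.data, aes(year, hedgehogs)) +
    geom_point(aes(colour = queen), size = 10) +
    scale_colour_manual(values = c("red", "white")) +
    facet_grid(rate ~ site, labeller = label_both) +
    # Each facet draws only the curve rows whose rate and site match it
    geom_line(data = curves, aes(x = year, y = pred.hogs), colour = "black")
}
```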

How to add gaussian curve to histogram created with qplot?

I have a question probably similar to Fitting a density curve to a histogram in R. Using qplot, I have created 7 histograms with this command:
qplot(V1, data=data, binwidth=10, facets=V2~.)
For each slice, I would like to add a fitted Gaussian curve. When I try to use the lines() function, I get this error:
Error in plot.xy(xy.coords(x, y), type = type, ...) :
plot.new has not been called yet
What is the command to do it correctly?
Have you tried stat_function?
+ stat_function(fun = dnorm)
You'll probably want to plot the histograms using aes(y = ..density..) (after_stat(density) in ggplot2 3.3.0 and later) so that they show density values rather than counts, which puts them on the same scale as dnorm.
A lot of useful information can be found in this question, including some advice on plotting different normal curves on different facets.
Here are some examples:
dat <- data.frame(x = c(rnorm(100),rnorm(100,2,0.5)),
a = rep(letters[1:2],each = 100))
Overlay a single normal density on each facet:
library(ggplot2)
ggplot(data = dat,aes(x = x)) +
facet_wrap(~a) +
geom_histogram(aes(y = after_stat(density))) +
stat_function(fun = dnorm, colour = "red")
From the question I linked to, create a separate data frame with the different normal curves:
library(plyr)  # provides ddply()
grid <- with(dat, seq(min(x), max(x), length = 100))
normaldens <- ddply(dat, "a", function(df) {
data.frame(
predicted = grid,
density = dnorm(grid, mean(df$x), sd(df$x))
)
})
And plot them separately using geom_line:
ggplot(data = dat,aes(x = x)) +
facet_wrap(~a) +
geom_histogram(aes(y = after_stat(density))) +
geom_line(data = normaldens, aes(x = predicted, y = density), colour = "red")
ggplot2 uses a different graphics paradigm than base graphics, so base functions such as lines() cannot draw onto a ggplot (although you can use grid graphics with it). The best way is to add a new stat_function layer to the plot. The ggplot2 code is the following.
Note that I couldn't get this to work using qplot, but the transition to ggplot is reasonably straightforward; the most important difference is that your data must be in data.frame format.
Also note the explicit mapping of the y aesthetic, aes(y = after_stat(density)) (written ..density.. in older ggplot2 code). This is slightly unusual, but it draws the histogram on the density scale instead of counts, so the bars are comparable with the curve from stat_function:
library(ggplot2)
# Use `=` (not `<-`) inside data.frame() so the column is actually named V1
data <- data.frame(V1 = rnorm(700), V2=sample(LETTERS[1:7], 700, replace=TRUE))
ggplot(data, aes(x=V1)) +
stat_bin(aes(y=after_stat(density))) +
stat_function(fun=dnorm) +
facet_grid(V2~.)
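If you would rather keep the histogram in counts (as qplot(..., binwidth = 10) draws it) than switch to a density-scaled y axis, the fitted normal curve can be scaled instead: for a group of n observations and bin width h, the expected count in a bin centred at x is approximately `n * h * dnorm(x, mean, sd)`. A hedged base-R sketch that fits one normal per facet this way, using split()/lapply() in place of plyr; the simulated `data` and `binwidth = 0.5` are assumptions.

```r
set.seed(4)
data <- data.frame(V1 = rnorm(700), V2 = sample(LETTERS[1:7], 700, replace = TRUE))
binwidth <- 0.5

# Common x grid shared by all facets
x_grid <- seq(min(data$V1), max(data$V1), length.out = 200)

# One fitted normal per level of V2, scaled from density to expected bin counts
curves <- do.call(rbind, lapply(split(data, data$V2), function(d) {
  data.frame(V2 = d$V2[1],
             x = x_grid,
             count = nrow(d) * binwidth * dnorm(x_grid, mean(d$V1), sd(d$V1)))
}))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data, aes(x = V1)) +
    geom_histogram(binwidth = binwidth) +
    # The V2 column in `curves` sends each fitted curve to its own facet
    geom_line(data = curves, aes(x = x, y = count), colour = "red") +
    facet_grid(V2 ~ .)
}
```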
