I want to shade areas in a ggplot but I don't want to manually tell geom_rect() where to stop and where to start. My data changes and I always want to shade several areas based on a condition.
Here for example with the condition "negative":
library("ggplot2")
set.seed(3)
plotdata <- data.frame(somevalue = rnorm(10), indicator = 0 , counter = 1:10)
plotdata[plotdata$somevalue < 0,]$indicator <- 1
plotdata
I can do that manually by listing the ranges myself:
plotranges <- data.frame(from = c(1,4,9), to = c(2,4,9))
ggplot() +
geom_line(data = plotdata, aes(x = counter, y = somevalue)) +
geom_rect(data = plotranges, aes(xmin = from - 1, xmax = to, ymin = -Inf, ymax = Inf), alpha = 0.4)
But my problem is that, so to speak, the set.seed() argument changes and I want to still automatically generate the plot without specifying min and max values of the shading manually. Is there a way (maybe without geom_rect() but instead geom_bar()?) to plot shading based directly on my indicator variable?
edit: Here is my own best attempt, as you can see not great:
ggplot(data = plotdata, aes(x = counter, y = somevalue)) + geom_line() +
geom_bar(aes(y = indicator*max(somevalue)), stat= "identity")
You can use stat_summary() to calculate the extremes of runs of your indicator. In the code below data.table::rleid() groups the data by runs of indicators. In the summary layer, y doesn't really do anything, so we use it to store the resolution of your datapoints, which we then later use to offset the xmin/xmax parameters. The after_stat() function is used to access computed variables after the ranges have been computed.
library("ggplot2")
plotdata <- data.frame(somevalue = rnorm(10), counter = 1:10)
plotdata$indicator <- as.numeric(plotdata$somevalue < 0)
ggplot(plotdata, aes(counter, somevalue)) +
stat_summary(
geom = "rect",
aes(group = data.table::rleid(indicator),
xmin = after_stat(xmin - y * 0.5),
xmax = after_stat(xmax + y * 0.5),
alpha = I((indicator) * 0.4),
y = resolution(counter)),
fun.min = min, fun.max = max,
orientation = "y", ymin = -Inf, ymax = Inf
) +
geom_line()
Created on 2021-09-14 by the reprex package (v2.0.1)
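If you would rather keep geom_rect() and an explicit ranges data frame, as in the question, the ranges can be derived from the indicator with base R's rle(). This is only a sketch under my own assumptions: the half-step padding of 0.5 is chosen to suit unit-spaced counter values, and the names runs, ends, starts and shaded are made up for illustration.

```r
library(ggplot2)

set.seed(3)
plotdata <- data.frame(somevalue = rnorm(10), counter = 1:10)
plotdata$indicator <- as.numeric(plotdata$somevalue < 0)

# Run-length encode the indicator and turn run lengths into row positions
runs   <- rle(plotdata$indicator)
ends   <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1

# Keep only the runs where the indicator is 1
shaded <- runs$values == 1
plotranges <- data.frame(
  from = plotdata$counter[starts[shaded]],
  to   = plotdata$counter[ends[shaded]]
)

# Pad each range by half a step so the shading is centred on the points
# (assumes counter values are one unit apart)
p <- ggplot() +
  geom_rect(data = plotranges,
            aes(xmin = from - 0.5, xmax = to + 0.5, ymin = -Inf, ymax = Inf),
            alpha = 0.4) +
  geom_line(data = plotdata, aes(x = counter, y = somevalue))
p
```

Because plotranges is rebuilt from the data each time, changing the seed (or the data) needs no manual edits to the ranges.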
I'm working with stock prices and trying to plot the price difference.
I created a plot using autoplot.zoo(). My question is: how can I change the point shapes to triangles when they are above the upper threshold and to circles when they are below the lower threshold? I understand that with the base plot() function you can do this by calling points(); I'm wondering how to do the same with ggplot2.
Here is the code for the plot:
p<-autoplot.zoo(data, geom = "line")+
geom_hline(yintercept = threshold, color="red")+
geom_hline(yintercept = -threshold, color="red")+
ggtitle("AAPL vs. SPY out of sample")
p+geom_point()
We can't fully replicate without your data, but here's an attempt with some sample generated data that should be similar enough that you can adapt for your purposes.
# Sample data
data = data.frame(date = c(2001:2020),
spread = runif(20, -10,10))
# Upper and lower threshold
thresh <- 4
You can create an additional variable that determines the shape, based on the relationship in the data itself, and pass that as an argument into ggplot.
# Create conditional data
data$outlier[data$spread > thresh] <- "Above"
data$outlier[data$spread < -thresh] <- "Below"
data$outlier[is.na(data$outlier)] <- "In Range"
library(ggplot2)
ggplot(data, aes(x = date, y = spread, shape = outlier, group = 1)) +
geom_line() +
geom_point() +
geom_hline(yintercept = c(thresh, -thresh), color = "red") +
scale_shape_manual(values = c(17,16,15))
# If you want points just above and below
# Sample data
data = data.frame(date = c(2001:2020),
spread = runif(20, -10,10))
thresh <- 4
data$outlier[data$spread > thresh] <- "Above"
data$outlier[data$spread < -thresh] <- "Below"
ggplot(data, aes(x = date, y = spread, shape = outlier, group = 1)) +
geom_line() +
geom_point() +
geom_hline(yintercept = c(thresh, -thresh), color = "red") +
scale_shape_manual(values = c(17,16))
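The separate assignments that build outlier in the first variant could also be collapsed into a single cut() call. This is a sketch, assuming the same thresh and column names as above. Note that cut() uses right-closed intervals by default, so a spread exactly equal to -thresh would be labelled "Below" here, unlike the strict < comparison above.

```r
set.seed(42)  # arbitrary seed, only so the example is reproducible
data <- data.frame(date = 2001:2020,
                   spread = runif(20, -10, 10))
thresh <- 4

# One call assigns all three labels; the intervals are
# (-Inf, -thresh], (-thresh, thresh], (thresh, Inf]
data$outlier <- cut(data$spread,
                    breaks = c(-Inf, -thresh, thresh, Inf),
                    labels = c("Below", "In Range", "Above"))

table(data$outlier)
```

Because the factor levels are now ordered Below, In Range, Above rather than alphabetically, it is safer to give scale_shape_manual() a named vector, for example values = c(Above = 17, Below = 16, "In Range" = 15).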
Alternatively, you can add the points above and below the threshold as separate layers with manually specified shapes, like this. The pch argument sets the point shape (ggplot2 treats it as an alias for shape).
# Another way of doing this
data = data.frame(date = c(2001:2020),
spread = runif(20, -10,10))
# Upper and lower threshold
thresh <- 4
ggplot(data, aes(x = date, y = spread, group = 1)) +
geom_line() +
geom_point(data = data[data$spread>thresh,], pch = 17) +
geom_point(data = data[data$spread< (-thresh),], pch = 16) +
# No shape scale is needed here because shape is set directly, not mapped
geom_hline(yintercept = c(thresh, -thresh), color = "red")
I have a question about ggplot2.
I want to connect each data point to the OLS fit with a vertical line, as in the code listed below.
Can I pass ..y.., the value calculated by stat_smooth, to geom_linerange directly?
I tried stat_smooth(..., geom = "linerange", mapping = aes(ymin = pmin(myy, ..y..), ymax = pmax(myy, ..y..))) but it does not give the result I want.
library(ggplot2)
df <- data.frame(myx = 1:10,
myy = c(1:10) * 5 + 2 * rnorm(10, 0, 1))
lm.fit <- lm("myy~myx", data = df)
pred <- predict(lm.fit)
ggplot(df, aes(myx, myy)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
geom_linerange(mapping = aes(ymin = pmin(myy, pred),
ymax = pmax(myy, pred)))
stat_smooth evaluates the values at n evenly spaced points, with n = 80 by default. These points may not coincide with the original x values in your data frame.
Since you are calculating predicted values anyway, it would probably be more straightforward to add that back to your data frame and plot all geom layers based on that as your data source, for example:
df$pred <- pred
ggplot(df, aes(myx, myy)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
geom_linerange(aes(ymin = myy, ymax = pred))
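To illustrate why the smoother's points need not line up with the data, here is a rough sketch in base R. It assumes an evenly spaced 80-point grid over the range of x (mimicking the default described above, not calling ggplot2 itself). The names grid and fit are made up for this example.

```r
set.seed(1)  # arbitrary seed for reproducibility
df <- data.frame(myx = 1:10,
                 myy = (1:10) * 5 + 2 * rnorm(10))

# An evenly spaced grid of 80 points over the observed x range
grid <- seq(min(df$myx), max(df$myx), length.out = 80)

# Check which of the original x values happen to lie on that grid
df$myx %in% grid

# Predicting at the original x values gives one fitted value per row,
# which is what geom_linerange() needs
fit <- lm(myy ~ myx, data = df)
df$pred <- predict(fit, newdata = df)
head(df)
```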
I have the following code (as an example) which I would like to adapt so that the ribbon extends over the entire x range, as geom_hline() does. The ribbon indicates which values are within accepted bounds. My real application sometimes has no upper or lower bound, so the hline by itself is not enough to determine whether values are within bounds.
library(ggplot2)
set.seed(2016-12-19)
dates <- seq(as.Date('2016-01-01'), as.Date('2016-12-31'), by = 1)
values <- rexp(length(dates), 1)
groups <- rpois(length(dates), 5)
temp <- data.frame(
date = dates,
value = values,
group = groups,
value.min = 0,
value.max = 2
)
ggplot(temp, aes(date, value)) +
geom_ribbon(aes(ymin = value.min, ymax = value.max), fill = '#00cc33', alpha = 0.6) +
geom_hline(aes(yintercept = value.min)) +
geom_hline(aes(yintercept = value.max)) +
geom_point() +
facet_wrap(~group)
I tried setting the x in geom_ribbon to dates as well, but then only fractions of the range are filled.
Also I tried this:
geom_ribbon(aes(ymin = -Inf, ymax = 2, x = dates), data = data.frame(), fill = '#00cc33', alpha = 0.6)
but then the data seems to be overwritten for the entire plot and I get the error Error in eval(expr, envir, enclos) : object 'value' not found. Even if it worked, the ribbon would still be too narrow, because the x limits are expanded beyond the data.
Here's one way to do it:
ggplot(temp, aes(as.numeric(date), value)) +
geom_rect(aes(xmin=-Inf, xmax=Inf, ymin = value.min, ymax = value.max), temp[!duplicated(temp$group),], fill = '#00cc33', alpha = 0.6) +
geom_hline(aes(yintercept = value.min)) +
geom_hline(aes(yintercept = value.max)) +
geom_point() +
scale_x_continuous(labels = function(x) format(as.Date(x, origin = "1970-01-01"), "%b %y")) +
facet_wrap(~group)
Note that I used as.numeric(date), because otherwise Inf and -Inf yield
Error: Invalid input: date_trans works with objects of class Date only
To get date labels for numeric values, I adjusted the scale_x_continuous labels accordingly. (Although they are not exact here. You may want to adjust it by using the exact dates instead of month/year, or alternatively set manual breaks using the breaks argument and for example seq.Date.)
Also note that I used temp[!duplicated(temp$group),] to avoid overplotting and thus maintain the desired alpha transparency.
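The overplotting point can be made concrete with a little arithmetic. Assuming standard alpha compositing, n identical rectangles drawn on top of each other with transparency a have a combined opacity of 1 - (1 - a)^n, so one rectangle per data row quickly becomes almost opaque. The helper name stacked_alpha and the toy groups vector are made up for this sketch.

```r
# Combined opacity of n stacked layers, each drawn with alpha = a
# (assumes standard "over" compositing)
stacked_alpha <- function(a, n) 1 - (1 - a)^n

stacked_alpha(0.6, 1)   # a single rectangle keeps the requested alpha
stacked_alpha(0.6, 5)   # five stacked rectangles are nearly opaque

# !duplicated() keeps only the first row of each group,
# so each facet gets exactly one rectangle
groups <- c(3, 5, 3, 4, 5, 5)
groups[!duplicated(groups)]
```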
Based on lukeA's answer I produced the following code, which I think is a little simpler:
library(ggplot2)
set.seed(2016-12-19)
dates <- seq(as.Date('2016-01-01'), as.Date('2016-12-31'), by = 1)
values <- rexp(length(dates), 1)
groups <- rpois(length(dates), 5)
temp <- data.frame(
date = dates,
value = values,
group = groups,
value.min = 1,
value.max = 2
)
bounds <- data.frame(
xmin = -Inf,
xmax = Inf,
ymin = temp$value.min[1],
ymax = temp$value.max[1]
)
ggplot(temp, aes(date, value)) +
geom_rect(
aes(
xmin = as.Date(xmin, origin = '1970-01-01'),
xmax = as.Date(xmax, origin = '1970-01-01'),
ymin = ymin,
ymax = ymax
),
data = bounds,
fill = '#00cc33',
alpha = 0.3,
inherit.aes = FALSE
) +
geom_point() +
facet_wrap(~group)
I created a temporary data frame containing the bounds for the rectangle, and added inherit.aes = FALSE because the bounds otherwise seem to overrule the temp data.frame (this still looks like a bug to me). By converting -Inf and Inf to the correct data type I didn't need the custom labeller (if you're dealing with POSIXt, use the matching as.POSIXct/as.POSIXlt conversion, as the automatic transformation fails).
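For the POSIXt case mentioned above, this is a sketch of what the conversion might look like in base R. The UTC time zone is an assumption, so substitute the zone of your own data. The names date_limits and time_limits are made up here.

```r
# Infinite limits that keep their date or date-time class
date_limits <- as.Date(c(-Inf, Inf), origin = "1970-01-01")
time_limits <- as.POSIXct(c(-Inf, Inf), origin = "1970-01-01", tz = "UTC")

class(date_limits)
class(time_limits)

# The underlying numbers are still -Inf and Inf
unclass(date_limits)
```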
I am trying to recreate some functionality I use daily in Tableau for R (ggplot2 and plotly). I need to be able to create reference bands and lines similar to the image below. I've figured out how to create the reference lines from the geom_errorbar(). However I can't seem to find a solution for the 'Reference Band'.
If a solution isn't possible in ggplot2 or plotly I would be open to another package, but I need something static for Rmarkdown reports and something dynamic for an html widgets dashboard.
Below I have some sample code. I would like to add reference bands of 'High' and 'Low' to the bar graph for each person.
library(ggplot2)
#Create Data
Name <- c("Rick","Carl","Daryl","Glenn")
Pos <- c("M","M","D","D")
Load <- c(100,110,90,130)
High <- c(150,160,130,140)
Low <- c(130,145,120,130)
data <- data.frame(Name,Pos,Load,High,Low)
rm(Name,Pos,Load,High,Low)
#create plot
ggplot(data = data, aes(x = Name, y = Load)) +
geom_bar(stat ="identity", width=.4)
Any guidance would be appreciated. Thank you!
geom_rect() would be a better choice than geom_errorbar() because it lets you reproduce the image that you posted. Take a look at the documentation for both geom_rect() and geom_errorbar().
The following example could be used in the markdown:
library(dplyr)
library(ggplot2)
delta <- 0.5
data <- mtcars %>% group_by(cyl, vs) %>%
summarise(xmin = first(cyl) - 1,
xmax = first(cyl) + 1,
wt = mean(wt),
ymin = wt - delta,
ymax = wt + delta)
ggplot(data = data, aes(x = cyl, y = wt)) +
geom_rect(aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
fill = "indianred", alpha = 0.4) + # adds the reference band layer before
geom_bar(stat = "identity", fill = "darkblue", width = 1) + # the bar layer
facet_wrap(~vs) + theme_classic()
If you want just one reference band, use the same ymin and ymax values for all the observations.
The html version will still need more effort, because plotly::ggplotly() does not render this correctly.
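For the single-band case, one option is annotate(), which draws one rectangle rather than one per row. This is a sketch using the names and loads from the question. The band limits 120 and 140 are made-up values, and scale_x_discrete() is added first on the assumption that the x scale should be set up as discrete before the rectangle layer is added.

```r
library(ggplot2)

df <- data.frame(Name = c("Rick", "Carl", "Daryl", "Glenn"),
                 Load = c(100, 110, 90, 130))

p <- ggplot() +
  scale_x_discrete() +
  # One band spanning the full width of the panel (limits are illustrative)
  annotate("rect", xmin = -Inf, xmax = Inf, ymin = 120, ymax = 140,
           fill = "indianred", alpha = 0.4) +
  # Bars are added after the band so they are drawn on top of it
  geom_col(data = df, aes(x = Name, y = Load), width = 0.4)
p
```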
I was able to find a solution. I needed to set the xmin and xmax as numeric and then I was able to create the reference bars.
library(ggplot2)
#Create Data
NameID <- c("Rick","Carl","Daryl","Glenn")
Pos <- c("M","M","D","D")
Load <- c(100,110,90,130)
High <- c(110,160,130,140)
Low <- c(90,145,120,130)
df <- data.frame(NameID,Pos,Load,High,Low)
rm(NameID,Pos,Load,High,Low)
p <- ggplot()
p <- p + scale_x_discrete()
p <- p + geom_rect(data=df,
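# factor() is needed because strings are no longer factors by default (R >= 4.0)
aes(xmin = as.numeric(factor(NameID)) - .25,
xmax = as.numeric(factor(NameID)) + .25,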
ymin = Low,
ymax = High),
fill = "blue", alpha = 0.2)
p <- p + geom_bar(data = df, aes(x = NameID, y = Load), stat="identity", width = .4)
p
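As far as I understand it, the numeric conversion works because a discrete x scale places its categories at the integer positions 1, 2, 3, ... in sorted order, which is the same order factor() uses for its levels by default. Here is a small sketch of that mapping (note that as.numeric() on a plain character vector returns NA, so the names have to be a factor first). The names f and pos are made up for this example.

```r
NameID <- c("Rick", "Carl", "Daryl", "Glenn")

# Factor levels are sorted alphabetically by default
f <- factor(NameID)
levels(f)

# Integer code of each name, i.e. its position on the axis
pos <- as.numeric(f)
pos

# Band edges a quarter unit either side of each bar centre
data.frame(NameID, xmin = pos - 0.25, xmax = pos + 0.25)
```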
I'd like to annotate all y-values greater than a y-threshold using ggplot2.
When you plot(lm(y~x)), using the base package, the second graph that pops up automatically is Residuals vs Fitted, the third is qqplot, and the fourth is Scale-location. Each of these automatically label your extreme Y values by listing their corresponding X value as an adjacent annotation. I'm looking for something like this.
What's the best way to achieve this base-default behavior using ggplot2?
Update: the code now uses scale_size_area() in place of scale_area().
You might be able to take something from this to suit your needs.
library(ggplot2)
#Some data
df <- data.frame(x = round(runif(100), 2), y = round(runif(100), 2))
m1 <- lm(y ~ x, data = df)
df.fortified = fortify(m1)
names(df.fortified) # Names for the variables containing residuals and derived quantities
# Select extreme values
df.fortified$extreme = ifelse(abs(df.fortified$`.stdresid`) > 1.5, 1, 0)
# Based on examples on page 173 in Wickham's ggplot2 book
plot = ggplot(data = df.fortified, aes(x = x, y = .stdresid)) +
geom_point() +
geom_text(data = df.fortified[df.fortified$extreme == 1, ],
aes(label = x, x = x, y = .stdresid), size = 3, hjust = -.3)
plot
plot1 = ggplot(data = df.fortified, aes(x = .fitted, y = .resid)) +
geom_point() + geom_smooth(se = FALSE)
plot2 = ggplot(data = df.fortified, aes(x = .fitted, y = .resid, size = .cooksd)) +
geom_point() + scale_size_area("Cook's distance") + geom_smooth(se = FALSE, show.legend = FALSE) # show.legend replaces the deprecated show_guide
library(gridExtra)
grid.arrange(plot1, plot2)
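plot.lm() labels a fixed number of the most extreme points (its id.n argument, 3 by default) rather than using a cutoff. To mimic that instead of the 1.5 threshold above, something like the following sketch could work. It uses rstandard() from base R for the standardized residuals, and the names n_label, top and stdresid are made up here.

```r
set.seed(123)  # arbitrary seed for reproducibility
df <- data.frame(x = round(runif(100), 2), y = round(runif(100), 2))
m1 <- lm(y ~ x, data = df)

# Standardized residuals, the quantity thresholded above
df$stdresid <- rstandard(m1)

# Flag the n_label rows with the largest absolute standardized residuals
n_label <- 3
top <- order(abs(df$stdresid), decreasing = TRUE)[seq_len(n_label)]
df$extreme <- seq_len(nrow(df)) %in% top

df[df$extreme, ]
```

The flagged rows can then be passed to geom_text(data = df[df$extreme, ], ...) in the same way as the thresholded subset in the plot above.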