How can I make a confidence interval band that extends to the edges of the plot in ggplot?
I can do it if the plotted band is entirely within the plot, for example
library(ggplot2)
library(tibble)  # for tibble()
limits <- c(1e2, 1e7)
confPolygon <- tibble(
x = c(limits[1], limits[1]*10, limits[2], limits[2], limits[2]/10, limits[1], limits[1]),
y = c(limits[1], limits[1], limits[2]/10, limits[2], limits[2], limits[1]*10, limits[1])
)
plot <- ggplot() +
geom_polygon(data = confPolygon, aes(x = x, y = y), fill = "grey", alpha = .25) +
scale_x_log10(limits = limits) +
scale_y_log10(limits = limits)
works. However, if I try any shape that extends the polygon to or past the edges of the plot, such as
confPolygon <- tibble(
x = c(limits[1], limits[2]*10, limits[2]*10, limits[1], limits[1]),
y = c(limits[1], limits[1], limits[2]*10, limits[2]*10, limits[1])
)
then it doesn't plot the polygon.
This happens because setting limits on the x or y scale is not a zoom: it removes every value that falls outside the limits, replacing those polygon vertices with missing values, so the polygon cannot be drawn. Use coord_cartesian(xlim = c(0,5), ylim = c(0,5)) instead, or in your case coord_cartesian(xlim = limits, ylim = limits), because limits set on the coordinate system zoom in without discarding any data.
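A minimal sketch of the coord_cartesian() approach, reusing the oversized polygon from the question. Two assumptions differ from the question's code: the data frame is built with base data.frame() instead of tibble(), and expand = FALSE is added so the band runs all the way to the panel edges.

```r
library(ggplot2)

limits <- c(1e2, 1e7)

# The oversized polygon from the question; it reaches past both upper limits
confPolygon <- data.frame(
  x = c(limits[1], limits[2] * 10, limits[2] * 10, limits[1], limits[1]),
  y = c(limits[1], limits[1], limits[2] * 10, limits[2] * 10, limits[1])
)

p <- ggplot(confPolygon, aes(x = x, y = y)) +
  geom_polygon(fill = "grey", alpha = .25) +
  # No limits on the scales, so no vertex is replaced by NA
  scale_x_log10() +
  scale_y_log10() +
  # Zoom with the coordinate system instead; expand = FALSE drops the
  # default padding so the band runs to the panel edges
  coord_cartesian(xlim = limits, ylim = limits, expand = FALSE)

print(p)
```

Because the scales keep the full data range, all five vertices survive into the built layer, and the coordinate system only decides which part of the polygon is visible.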
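One way to do this is with oob = scales::squish (the function itself, not a call to it), which moves values outside the limits onto the nearest limit instead of replacing them with NA.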
plot2 <- ggplot() +
geom_polygon(data = confPolygon, aes(x = x, y = y), fill = "grey", alpha = .25) +
scale_x_log10(limits = limits, oob=scales::squish) +
scale_y_log10(limits = limits, oob=scales::squish)
If you really want the polygon to extend all the way to the edge, you should also add expand=c(0,0) to each of the scale_*_log10() argument lists.
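To see why the scale-limits version loses the band while squish keeps it, here is a base-R sketch of the two out-of-bounds policies. censor_like() and squish_like() are hypothetical helpers written for illustration; they mimic what scales::censor() (the default oob for continuous scales) and scales::squish() do to values outside the limits. ggplot2 applies oob to the log10-transformed values, but for a monotonic transform the outcome is the same.

```r
# Hypothetical stand-in for scales::censor(): out-of-range values become NA
censor_like <- function(x, range) {
  x[x < range[1] | x > range[2]] <- NA
  x
}

# Hypothetical stand-in for scales::squish(): out-of-range values are clamped
squish_like <- function(x, range) {
  pmin(pmax(x, range[1]), range[2])
}

limits <- c(1e2, 1e7)
# x coordinates of the oversized polygon from the question
x <- c(limits[1], limits[2] * 10, limits[2] * 10, limits[1], limits[1])

censored <- censor_like(x, limits)
squished <- squish_like(x, limits)

print(censored)  # vertices 2 and 3 become NA and are dropped before drawing
print(squished)  # the same vertices are moved onto the upper limit instead
```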
The figure below is a conceptual diagram used by Michael Clark,
https://m-clark.github.io/docs/lord/index.html
to explain Lord's Paradox and related phenomena in regression.
My question is framed in this context and uses ggplot2, but it is broader, in terms of geometry and graphing.
I would like to reproduce figures like this, but using actual data. I need to know:
how to draw a new axis at the origin, with a -45 degree angle, corresponding to values of y-x
how to draw little normal distributions or density diagrams, or other representations of the values y-x projected onto this axis.
My minimal base example uses ggplot2:
library(ggplot2)
set.seed(1234)
N <- 200
group <- rep(c(0, 1), each = N/2)
initial <- .75*group + rnorm(N, sd=.25)
final <- .4*initial + .5*group + rnorm(N, sd=.1)
change <- final - initial
df <- data.frame(id = factor(1:N),
group = factor(group,
labels = c('Female', 'Male')),
initial,
final,
change)
#head(df)
#' plot, with regression lines and data ellipses
ggplot(df, aes(x = initial, y = final, color = group)) +
geom_point() +
geom_smooth(method = "lm", formula = y~x) +
stat_ellipse(size = 1.2) +
geom_abline(slope = 1, color = "black", size = 1.2) +
coord_fixed(xlim = c(-.6, 1.2), ylim = c(-.6, 1.2)) +
theme_bw() +
theme(legend.position = c(.15, .85))
This gives the following graph:
In geometric terms, the coordinates along the -45 degree rotated axes on which I want to portray the distributions are
(y-x) and (x+y) in the original space of the plot. But how can I draw these with
ggplot2 or other software?
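To make the geometry concrete (u and v are names introduced here, not from the question): rotating the axes by 45 degrees gives one coordinate along the identity line and one along the -45 degree axis, each scaled by 1/sqrt(2):

$$
\begin{pmatrix} u \\ v \end{pmatrix}
= \frac{1}{\sqrt{2}}
\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix},
\qquad
u = \frac{x + y}{\sqrt{2}}, \quad v = \frac{y - x}{\sqrt{2}}.
$$

Sliding a point parallel to y = x leaves y - x unchanged, so the foot of the perpendicular from (x, y) onto the line y = -x, which is ((x - y)/2, (y - x)/2), carries the same value of y - x as the point itself. That is what lets a distribution of y - x be drawn along that axis.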
An accepted solution can be vague about how the distribution of (y-x) is represented,
but should solve the problem of how to display this on a (y-x) axis.
Fun question! I haven't encountered it yet, but there might be a package to help do this automatically. Here's a manual approach using two hacks:
the clip = "off" parameter of the coord_* functions, to allow us to add annotations outside the plot area.
building a density plot, extracting its coordinates, and then rotating and translating those.
First, we can make a density plot of the change from initial to final, which shows a left-skewed distribution:
library(dplyr)  # for %>% and mutate()
(my_hist <- df %>%
mutate(gain = final - initial) %>% # 'gain' would be a better name
ggplot(aes(gain)) +
geom_density())
Now we can extract the guts of that plot, and transform the coordinates to where we want them to appear in the combined plot:
a <- ggplot_build(my_hist)
rot = pi * 3/4
diag_hist <- tibble(
x = a[["data"]][[1]][["x"]],
y = a[["data"]][[1]][["y"]]
) %>%
# squish
mutate(y = y*0.2) %>%
# rotate 135 deg CCW
mutate(xy = x*cos(rot) - y*sin(rot),
dens = x*sin(rot) + y*cos(rot)) %>%
# slide
mutate(xy = xy - 0.7, # magic number based on plot range below
dens = dens - 0.7)
And here's a combination with the original plot:
ggplot(df, aes(x = initial, y = final, color = group)) +
geom_point() +
geom_smooth(method = "lm", formula = y~x) +
stat_ellipse(size = 1.2) +
geom_abline(slope = 1, color = "black", size = 1.2) +
coord_fixed(clip = "off",
xlim = c(-0.7,1.6),
ylim = c(-0.7,1.6),
expand = expansion(0)) +
annotate("segment", x = -1.4, xend = 0, y = 0, yend = -1.4) +
annotate("path", x = diag_hist$xy, y = diag_hist$dens) +
theme_bw() +
theme(legend.position = c(.15, .85),
plot.margin = unit(c(.1,.1,2,2), "cm"))
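An alternative to rotating the ggplot_build() output, sketched under some assumptions: compute the density with stats::density() and map it straight onto the -45 degree axis, so that each change value d sits exactly where the data points with y - x = d project. The function name diag_density_path(), the axis origin (-0.7, matching the answer's magic number) and the height scale (0.2, matching its squish factor) are illustrative choices, not part of the answer.

```r
# Map a density of change scores d = y - x onto the line through the point
# (origin, origin) perpendicular to y = x. A change value d is placed at
# (origin - d/2, origin + d/2), where y - x equals d, and the density height
# is drawn perpendicular to that axis (along -(1, 1)), which keeps y - x fixed.
diag_density_path <- function(d, origin = -0.7, height_scale = 0.2) {
  dens <- stats::density(d)
  h <- dens$y * height_scale / sqrt(2)  # offset along (-1, -1)/sqrt(2)
  data.frame(
    x = origin - dens$x / 2 - h,
    y = origin + dens$x / 2 - h,
    change = dens$x
  )
}

# Same simulation as in the question
set.seed(1234)
N <- 200
group <- rep(c(0, 1), each = N / 2)
initial <- .75 * group + rnorm(N, sd = .25)
final <- .4 * initial + .5 * group + rnorm(N, sd = .1)

path <- diag_density_path(final - initial)
head(path)
```

The result can stand in for diag_hist in the combined plot, e.g. annotate("path", x = path$x, y = path$y). One design difference: the answer's rotation puts a change value d at distance d from the axis origin, while this sketch puts it at distance d/sqrt(2), where the perpendicular projections of the points land, so the two versions differ by a factor of sqrt(2) along the axis.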
I'm working with stock prices and trying to plot the price difference.
I created the plot with autoplot.zoo(). How can I change the point shapes to triangles when they are above the upper threshold and to circles when they are below the lower threshold? I understand that with the base plot() function you can do this by calling points(), but I'm wondering how to do it with ggplot2.
Here is the code for the plot:
p<-autoplot.zoo(data, geom = "line")+
geom_hline(yintercept = threshold, color="red")+
geom_hline(yintercept = -threshold, color="red")+
ggtitle("AAPL vs. SPY out of sample")
p+geom_point()
We can't fully replicate this without your data, but here's an attempt with generated sample data that should be similar enough for you to adapt.
# Sample data
data = data.frame(date = c(2001:2020),
spread = runif(20, -10,10))
# Upper and lower threshold
thresh <- 4
You can create an additional variable that determines the shape, based on the relationship in the data itself, and pass that as an argument into ggplot.
# Create conditional data
data$outlier[data$spread > thresh] <- "Above"
data$outlier[data$spread < -thresh] <- "Below"
data$outlier[is.na(data$outlier)] <- "In Range"
library(ggplot2)
ggplot(data, aes(x = date, y = spread, shape = outlier, group = 1)) +
geom_line() +
geom_point() +
geom_hline(yintercept = c(thresh, -thresh), color = "red") +
scale_shape_manual(values = c(17,16,15))
# If you want points only above and below the thresholds
# Sample data
data = data.frame(date = c(2001:2020),
spread = runif(20, -10,10))
thresh <- 4
data$outlier[data$spread > thresh] <- "Above"
data$outlier[data$spread < -thresh] <- "Below"
ggplot(data, aes(x = date, y = spread, shape = outlier, group = 1)) +
geom_line() +
geom_point() +
geom_hline(yintercept = c(thresh, -thresh), color = "red") +
scale_shape_manual(values = c(17,16))
Alternatively, you can add the points above and below the thresholds as separate layers with manually specified shapes, like this. The pch argument (an alias for shape) sets the point shape.
# Another way of doing this
data = data.frame(date = c(2001:2020),
spread = runif(20, -10,10))
# Upper and lower threshold
thresh <- 4
ggplot(data, aes(x = date, y = spread, group = 1)) +
geom_line() +
geom_point(data = data[data$spread>thresh,], pch = 17) +
geom_point(data = data[data$spread< (-thresh),], pch = 16) +
geom_hline(yintercept = c(thresh, -thresh), color = "red")
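A compact variant of the first approach, as a sketch: ifelse() builds the category in one step, and a named values vector in scale_shape_manual() ties each category to its shape regardless of the order of the factor levels. The sample data and seed are made up, as in the answer, and shape 1 (an open circle) for in-range points is an arbitrary choice.

```r
library(ggplot2)

set.seed(42)
data <- data.frame(date = 2001:2020, spread = runif(20, -10, 10))
thresh <- 4

# One category per point, using the same strict inequalities as above
data$outlier <- ifelse(data$spread > thresh, "Above",
                       ifelse(data$spread < -thresh, "Below", "In Range"))

p <- ggplot(data, aes(x = date, y = spread, group = 1)) +
  geom_line() +
  geom_point(aes(shape = outlier), size = 2) +
  geom_hline(yintercept = c(thresh, -thresh), color = "red") +
  # Named values: the mapping does not depend on the order of the levels
  scale_shape_manual(values = c(Above = 17, Below = 16, "In Range" = 1))

print(p)
```

Mapping shape only inside geom_point() keeps the line as a single unshaped layer.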
This is a slightly different question from an earlier post (ggplot hexbin shows different number of hexagons in plot versus data frame).
I am using hexbin() to bin data into hexagon objects, and ggplot() to plot the results. I notice that, sometimes, the hexagons on the edge of the plot are cut in half. Below is an example.
library(hexbin)
library(ggplot2)
set.seed(1)
data <- data.frame(A=rnorm(100), B=rnorm(100), C=rnorm(100), D=rnorm(100), E=rnorm(100))
maxVal = max(abs(data))
maxRange = c(-1*maxVal, maxVal)
x = data[,c("A")]
y = data[,c("E")]
h <- hexbin(x=x, y=y, xbins=5, shape=1, IDs=TRUE, xbnds=maxRange, ybnds=maxRange)
hexdf <- data.frame(hcell2xy(h), hexID = h@cell, counts = h@count)
ggplot(hexdf, aes(x = x, y = y, fill = counts, hexID = hexID)) +
geom_hex(stat = "identity") +
coord_cartesian(xlim = c(maxRange[1], maxRange[2]), ylim = c(maxRange[1], maxRange[2]))
This creates a graphic where one hexagon is cut off at the top and one hexagon is cut off at the bottom:
Another approach I can try is to hard-code a factor (here 1.5) by which the x- and y-axis limits are multiplied. Doing so does seem to solve the problem, in that no hexagons are cut off anymore.
ggplot(hexdf, aes(x = x, y = y, fill = counts, hexID = hexID)) +
geom_hex(stat = "identity") +
scale_x_continuous(limits = maxRange * 1.5) +
scale_y_continuous(limits = maxRange * 1.5)
However, even though the second approach solves the problem in this instance, the value of 1.5 is arbitrary. I am trying to automate this process for a variety of data and variety of bin sizes and hexagon sizes that could be used. Is there a solution to keeping all hexagons fully visible in the plot without having to hard-code an arbitrary value that may be too large or too small for certain instances?
Note that you can skip computing the hexbin yourself and let ggplot do the job.
Then, if you prefer to set the width of the bins manually, you can set the binwidth and modify the limits:
bwd = 1
# use the data frame columns rather than the global x and y vectors
ggplot(data, aes(x = A, y = E)) +
geom_hex(binwidth = bwd) +
coord_cartesian(xlim = c(min(data$A) - bwd, max(data$A) + bwd),
ylim = c(min(data$E) - bwd, max(data$E) + bwd),
expand = TRUE) +
geom_point(color = "red") +
theme_bw()
This way, hexagons should never be truncated (though you may end up with some "empty" space).
Result with bwd = 1:
Result with bwd = 3:
If you instead prefer to set the number of bins programmatically, you can use:
nbins_x <- 4
nbins_y <- 6
range_x <- range(data$A, na.rm = TRUE)
range_y <- range(data$E, na.rm = TRUE)
bwd_x <- (range_x[2] - range_x[1])/nbins_x
bwd_y <- (range_y[2] - range_y[1])/nbins_y
ggplot(data, aes(x = A, y = E)) +
geom_hex(bins = c(nbins_x,nbins_y)) +
coord_cartesian(xlim = c(range_x[1] - bwd_x, range_x[2] + bwd_x),
ylim = c(range_y[1] - bwd_y, range_y[2] + bwd_y),
expand = TRUE) +
geom_point(color = "red")+
theme_bw()
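To avoid a hard-coded factor such as 1.5, the padding can be derived from the bin widths themselves. hex_limits() below is a hypothetical helper that generalizes the bins-based approach above: it turns bin counts into per-axis bin widths and pads each range by pad bin widths (one, as in the answer).

```r
# Hypothetical helper: per-axis bin widths from bin counts, and axis limits
# padded by `pad` bin widths on each side.
hex_limits <- function(x, y, bins = c(30, 30), pad = 1) {
  bins <- rep_len(bins, 2)
  rx <- range(x, na.rm = TRUE)
  ry <- range(y, na.rm = TRUE)
  bw <- c(diff(rx) / bins[1], diff(ry) / bins[2])
  list(
    binwidth = bw,
    xlim = rx + c(-1, 1) * pad * bw[1],
    ylim = ry + c(-1, 1) * pad * bw[2]
  )
}

lims <- hex_limits(c(0, 10), c(0, 6), bins = c(5, 3))
str(lims)
```

With the question's data this would be lims <- hex_limits(data$A, data$E, bins = c(4, 6)), followed by geom_hex(binwidth = lims$binwidth) + coord_cartesian(xlim = lims$xlim, ylim = lims$ylim). Passing the widths as binwidth keeps the binning and the padding consistent; if an edge hexagon still touches the border for your data, increase pad.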
I'm looking to create multiple density graphs, to make an "animated heat map."
Since each frame of the animation should be comparable, I'd like the density -> color mapping on each graph to be the same for all of them, even if the range of the data changes for each one.
Here's the code I'd use for each individual graph:
ggplot(data= this_df, aes(x=X, y=Y) ) +
geom_point(aes(color= as.factor(condition)), alpha= .25) +
coord_cartesian(ylim= c(0, 768), xlim= c(0,1024)) + scale_y_reverse() +
stat_density2d(mapping= aes(alpha = ..level..), geom="polygon", bins=3, size=1)
Imagine I use this same code, but 'this_df' changes on each frame. So in one graph, maybe density ranges from 0 to 4e-4. On another, density ranges from 0 to 4e-2.
By default, ggplot will calculate a distinct density -> color mapping for each of these. But this would mean the two graphs (the two frames of the animation) aren't really comparable. If this were a histogram or density plot, I'd simply call coord_cartesian and change xlim and ylim. But for the density plot, I have no idea how to change the scale.
The closest I could find is this:
Overlay two ggplot2 stat_density2d plots with alpha channels
But I don't have the option of putting the two density plots on the same graph, since I want them to be distinct frames.
Any help would be hugely appreciated!
EDIT:
Here's a reproducible example:
library(ggplot2)
set.seed(4)
g = list(NA,NA)
for (i in 1:2) {
sdev = runif(1)
X = rnorm(1000, mean = 512, sd= 300*sdev)
Y = rnorm(1000, mean = 384, sd= 200*sdev)
this_df = as.data.frame( cbind(X = X,Y = Y, condition = 1:2) )
g[[i]] = ggplot(data= this_df, aes(x=X, y=Y) ) +
geom_point(aes(color= as.factor(condition)), alpha= .25) +
coord_cartesian(ylim= c(0, 768), xlim= c(0,1024)) + scale_y_reverse() +
stat_density2d(mapping= aes(alpha = ..level.., color= as.factor(condition)), geom="contour", bins=4, size= 2)
}
print(g) # level has a different scale for each
I would like to leave an update for this question. As of July 2016, stat_density2d no longer accepts breaks. To reproduce the graphic, you need to move breaks=1e-6*seq(0,10,by=2) to scale_alpha_continuous().
library(ggplot2)
library(gridExtra)  # for grid.arrange()
set.seed(4)
g = list(NA,NA)
for (i in 1:2) {
sdev = runif(1)
X = rnorm(1000, mean = 512, sd= 300*sdev)
Y = rnorm(1000, mean = 384, sd= 200*sdev)
this_df = as.data.frame( cbind(X = X,Y = Y, condition = 1:2) )
g[[i]] = ggplot(data= this_df, aes(x=X, y=Y) ) +
geom_point(aes(color= as.factor(condition)), alpha= .25) +
coord_cartesian(ylim= c(0, 768), xlim= c(0,1024)) +
scale_y_reverse() +
stat_density2d(mapping= aes(alpha = ..level.., color= as.factor(condition)),
geom="contour", bins=4, size= 2) +
scale_alpha_continuous(limits=c(0,1e-5), breaks=1e-6*seq(0,10,by=2))+
scale_color_discrete("Condition")
}
do.call(grid.arrange,c(g,ncol=2))
So, to have both plots show contours at the same levels, use the breaks=... argument in stat_density2d(...). To have both plots use the same mapping of alpha to level, use scale_alpha_continuous(limits=...).
Here is the full code to demonstrate:
library(ggplot2)
set.seed(4)
g = list(NA,NA)
for (i in 1:2) {
sdev = runif(1)
X = rnorm(1000, mean = 512, sd= 300*sdev)
Y = rnorm(1000, mean = 384, sd= 200*sdev)
this_df = as.data.frame( cbind(X = X,Y = Y, condition = 1:2) )
g[[i]] = ggplot(data= this_df, aes(x=X, y=Y) ) +
geom_point(aes(color= as.factor(condition)), alpha= .25) +
coord_cartesian(ylim= c(0, 768), xlim= c(0,1024)) + scale_y_reverse() +
stat_density2d(mapping= aes(alpha = ..level.., color= as.factor(condition)),
breaks=1e-6*seq(0,10,by=2),geom="contour", bins=4, size= 2)+
scale_alpha_continuous(limits=c(0,1e-5))+
scale_color_discrete("Condition")
}
library(gridExtra)
do.call(grid.arrange,c(g,ncol=2))
And the result...
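Both answers hard-code the breaks (1e-6*seq(0,10,by=2)). As a sketch of choosing them from the data instead, one can estimate each frame's density with MASS::kde2d(), the estimator stat_density2d() calls internally, take the maximum over all frames, and build one shared set of breaks. The bandwidth and grid limits here are kde2d()'s defaults, which we assume are close to, but may not exactly match, what ggplot2 uses, so treat the result as approximate. The helper name peak_density() is hypothetical.

```r
library(MASS)  # kde2d()

set.seed(4)
# Two frames simulated the same way as in the question
frames <- lapply(1:2, function(i) {
  sdev <- runif(1)
  data.frame(X = rnorm(1000, mean = 512, sd = 300 * sdev),
             Y = rnorm(1000, mean = 384, sd = 200 * sdev))
})

# Peak of the 2-d kernel density estimate on an n x n grid
peak_density <- function(df, n = 100) {
  max(kde2d(df$X, df$Y, n = n)$z)
}

global_max <- max(vapply(frames, peak_density, numeric(1)))
shared_breaks <- pretty(c(0, global_max), n = 5)
shared_breaks
```

Each frame would then get breaks = shared_breaks, placed as in the answer above (or in the scale, per the 2016 update), together with scale_alpha_continuous(limits = range(shared_breaks)), so that a given level has the same alpha in every frame.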
Not sure how useful this is, but I found it easier to either use:
scale_fill_gradient(low = "purple", high = "yellow", limits = c(0, 1000))
This lets you set the limits of the color scale and choose the colors, and because you can just add it at the end of your code, it overrides any earlier fill scale, so it's easy to use
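or a similar solution with a viridis palette, set in a single call (adding a second fill scale replaces the first one, so option = 'inferno' would otherwise be lost):
scale_fill_viridis_c(option = 'inferno', limits = c(0, 1000))  # built into ggplot2, no library(viridis) needed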
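A sketch of why fixed limits make frames comparable, and why the oob setting matters. Two toy frames (geom_tile() with made-up values, standing in for the density layers) share one scale_fill_viridis_c() with limits = c(0, 1000). The oob = scales::squish argument and the helper name shared_fill() are additions for illustration: without squish, a value above 1000 would be censored to NA and drawn in the na.value color.

```r
library(ggplot2)

# Hypothetical helper: the same fill scale definition for every frame
shared_fill <- function() {
  scale_fill_viridis_c(option = "inferno", limits = c(0, 1000),
                       oob = scales::squish)
}

frame1 <- data.frame(x = 1:3, y = 1, value = c(0, 500, 1000))
frame2 <- data.frame(x = 1:2, y = 1, value = c(500, 2000))

p1 <- ggplot(frame1, aes(x, y, fill = value)) + geom_tile() + shared_fill()
p2 <- ggplot(frame2, aes(x, y, fill = value)) + geom_tile() + shared_fill()

fill1 <- ggplot_build(p1)$data[[1]]
fill2 <- ggplot_build(p2)$data[[1]]

# value 500 gets the same color in both frames
fill1$fill[fill1$x == 2] == fill2$fill[fill2$x == 1]
# 2000 is squished onto the upper limit, so it matches the color of 1000
fill2$fill[fill2$x == 2] == fill1$fill[fill1$x == 3]
```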
I am creating a plot in ggplot2 with filled densities, a few of which I would like to truncate. I apologize for the lack of images; apparently I'm not allowed to post them yet. Here is a simple example of the starting code:
library(ggplot2)
dd = with(density(rnorm(100,0,1)),data.frame(x,y))
ylimit = .3
# geom_area() replaces the old layer(..., geom_params = ...) call, which current ggplot2 no longer accepts
ggplot(data = dd, mapping = aes(x = x, y = y)) +
geom_area(fill = "red", alpha = .3) +
scale_x_continuous(limits = c(-3,3)) +
scale_y_continuous(limits = c(0,ylimit))
This, however, results in an empty area in the middle of the filled density where dd$y > ylimit.
If I compensate for this with
dd$y = pmin(dd$y, ylimit)
the area is shaded, but the plot's y-axis extends slightly beyond ylimit, so the fill does not reach the top of the graph.
Ideally I would like to know how to get ggplot to display a plot exactly up to ylimit, but any other solutions for having the fill extend to the top of the plot would be welcome.
Edit: fixed the code.
I think this is what you meant. Note the use of ifelse to get the truncating behavior.
library(ggplot2)
dd = with(density(rnorm(100,0,1)), data.frame(x, y))
ylimit = .3
dev.new(width=4, height=4)
ggplot(data = dd, mapping = aes(x = x, y = y)) +
geom_area(aes(y = ifelse(y > ylimit, ylimit, y)), fill = "red", alpha = .3) +
scale_x_continuous(limits = c(-3,3)) +
# expand = FALSE stops the default padding, so the fill reaches the top edge
coord_cartesian(ylim = c(0, ylimit), expand = FALSE)
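Another option, sketched assuming a recent ggplot2: let the y scale squish densities above ylimit onto the limit instead of dropping them, and remove the scale's padding with expand = c(0, 0). This keeps the scale-limit approach from the question but adds oob = scales::squish, so no manual ifelse() or pmin() is needed. The seed is an addition for reproducibility, and the x limits are left out to keep the example focused.

```r
library(ggplot2)

set.seed(1)
dd <- with(density(rnorm(100, 0, 1)), data.frame(x, y))
ylimit <- .3

p <- ggplot(dd, aes(x = x, y = y)) +
  geom_area(fill = "red", alpha = .3) +
  # squish clamps densities above ylimit to ylimit instead of setting them
  # to NA; expand = c(0, 0) makes the panel end exactly at ylimit
  scale_y_continuous(limits = c(0, ylimit), oob = scales::squish,
                     expand = c(0, 0))

print(p)
```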