How to add geom_rect to geom_line plot - r

I am plotting a time series of returns and would like to use NBER recession dating to shade recessions, like FRED graphs do.
The recession variable is in the same data frame and is a 1, 0 variable for: 1 = Recession, 0 = Expansion.
The idea is to use geom_rect and alpha = (Recession == 1) to shade the areas where Recession == 1.
The code for the ggplot is below. Thanks for the help!
ERVALUEplot <- ggplot(data = Alldata)+
geom_line(aes(x = Date, y = ERVALUE), color = 'red')+
geom_rect(aes(x = Date, alpha = (Alldata$Recession ==1)), color = 'grey')

I think your case might be slightly simplified by using geom_tile() instead of geom_rect(). The output is the same but the parametrisation is easier.
I have presumed your data had a structure roughly like this:
library(ggplot2)
set.seed(2)
Alldata <- data.frame(
Date = Sys.Date() + 1:10,
ERVALUE = cumsum(rnorm(10)),
Recession = sample(c(0, 1), 10, replace = TRUE)
)
With this data, we can make grey rectangles wherever Recession == 1 as follows. Here, I've mapped Recession to the alpha scale so that a legend is generated automatically.
ggplot(Alldata, aes(Date)) +
geom_tile(aes(alpha = Recession, y = 1),
fill = "grey", height = Inf) +
geom_line(aes(y = ERVALUE), colour = "red") +
scale_alpha_continuous(range = c(0, 1), breaks = c(0, 1))
Created on 2021-08-25 by the reprex package (v1.0.0)
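If you would rather stay with geom_rect(), as in the question, one option is to first collapse the 0/1 indicator into one row per recession, with a start and an end date. The following is only a minimal sketch under assumptions: the toy data, the fixed Recession vector, and the helper names runs, run_start, run_end and recessions are made up for illustration.

```r
library(ggplot2)

# Toy data (assumed structure): daily dates with a 0/1 recession flag
Alldata <- data.frame(
  Date = as.Date("2021-01-01") + 0:9,
  ERVALUE = c(0.5, 1.1, 0.9, 0.2, -0.4, 0.1, 0.8, 0.3, -0.2, 0.4),
  Recession = c(0, 0, 1, 1, 1, 0, 0, 1, 1, 0)
)

# Collapse consecutive rows with the same Recession value into runs
runs <- rle(Alldata$Recession)
run_end <- cumsum(runs$lengths)
run_start <- run_end - runs$lengths + 1

# Keep only the recession runs and look up their first and last dates
is_rec <- runs$values == 1
recessions <- data.frame(
  start = Alldata$Date[run_start[is_rec]],
  end = Alldata$Date[run_end[is_rec]]
)

# One rectangle per recession, spanning the full height of the panel
p <- ggplot(Alldata) +
  geom_rect(data = recessions,
            aes(xmin = start, xmax = end, ymin = -Inf, ymax = Inf),
            fill = "grey", alpha = 0.5) +
  geom_line(aes(x = Date, y = ERVALUE), colour = "red")
```

Note that in this sketch a recession lasting a single observation collapses to a zero-width rectangle, so you may want to extend end by one period for your data.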

Related

Layering violin plots with geom_violin to compare distributions

I am trying to compare the distributions of a continuous variable across groups using violin plots. Pretty easy. However, I would like to make comparisons across distributions easier by showing the distribution for one of the groups (the reference) in grey with a low alpha value in the background. Something like this but with a violin plot:
My current approach plots the data twice. For the first geom_violin, I duplicate the data for the reference group and plot it in grey. For the second geom_violin, I use the actual data d. In this example, the two violin plots in grey and blue should look the same for the group "blue". However, they are NOT the same even though they are based on exactly the same data for group "blue".
How can I resolve this problem? Or is there another better approach to do this?
library(tidyverse)
d <- tibble(
group = sample(c("green", "blue"), 1000, replace = TRUE, prob = c(0.7, 0.3)),
x = ifelse(group == "green", rnorm(1000, 1, 1), rnorm(1000, 0, 3))
)
dblue <- filter(d, group == "blue")
dblue <- bind_rows(dblue, mutate(dblue, group = "green"))
ggplot(d, aes(x = factor(group), y = x)) +
geom_violin(data = dblue, fill = alpha("#333333", 0.2), color = alpha("#333333", 0)) +
geom_violin(fill = alpha("#0072B2", 0.8), color = alpha("#0072B2", 0))
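Add scale = "width" to the second geom_violin. With the default scale = "area", all violins in a layer are given the same area, so the width of the blue violin depends on which other groups are drawn in that layer. With scale = "width", every violin has the same maximum width, which matches the grey reference layer: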
ggplot(d, aes(x = factor(group), y = x)) +
geom_violin(data = dblue, fill = alpha("#333333", 0.2), color = alpha("#333333", 0)) +
geom_violin(fill = alpha("#0072B2", 0.8), color = alpha("#0072B2", 0),
scale = "width")
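To check that the two layers now agree, you can inspect the widths ggplot2 computes for each layer. This is a rough sketch, not part of the original answer: it rebuilds the example with base R, and the names ref and fg are made up here.

```r
library(ggplot2)

set.seed(1)
n <- 1000
group <- sample(c("green", "blue"), n, replace = TRUE, prob = c(0.7, 0.3))
d <- data.frame(
  group = group,
  x = ifelse(group == "green", rnorm(n, 1, 1), rnorm(n, 0, 3))
)

# Reference data: the blue observations repeated under both group labels
dblue <- d[d$group == "blue", ]
dblue <- rbind(dblue, transform(dblue, group = "green"))

p <- ggplot(d, aes(x = factor(group), y = x)) +
  geom_violin(data = dblue, fill = alpha("#333333", 0.2), colour = NA) +
  geom_violin(fill = alpha("#0072B2", 0.8), colour = NA, scale = "width")

# violinwidth is the relative width computed for each point of the density
ref <- layer_data(p, 1)  # grey reference layer
fg <- layer_data(p, 2)   # coloured layer

# Largest relative width per group, for each layer
ref_max <- tapply(ref$violinwidth, ref$group, max)
fg_max <- tapply(fg$violinwidth, fg$group, max)
```

If ref_max and fg_max match for the blue group, the grey and blue violins are drawn at the same width.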

ggplot2 point size by numeric: Do not display point when value = 0

I have data of two time series that I would like to plot together. The x-axis will be date and the y-axis will be a line graph of series 1, while the point sizes will be scaled based on the numeric value of series 2. However, when series 2 = 0, I would like ggplot to not display a point at all. I've tried setting the range of point sizes from a minimum of 0, but it still displays points for values of 0.
Here's code to reproduce the problem:
Dates = c("2015-05-01", "2015-05-02", "2015-05-03", "2015-05-04", "2015-05-05", "2015-05-06")
Dates = as.Date(Dates)
Series1 = c(0,2,8,5,3,1)
Series2 = c(0,0,5,0,10,5)
df = data.frame(Dates, Series1, Series2)
ggplot(data = df)+
geom_line(aes(x=Dates, y = Series1))+
geom_point(aes(x=Dates, y = Series1, size = Series2))+
scale_size_continuous(range = c(0, 5))
This produces the following graph:
How can I make ggplot2 not create a point when Series2 = 0, but still display the line?
I also tried replacing 0's with NA's for Series2, but this results in the plot failing.
You can change the minimum value to a negative one:
ggplot(data = df) +
geom_line(aes(x = Dates, y = Series1))+
geom_point(aes(x = Dates, y = Series1, size = Series2))+
scale_size_continuous(range = c(-1, 5))
In case you do not want the legend to include 0 you can add breaks:
scale_size_continuous(range = c(-1, 5), breaks = seq(2.5, 10, 2.5))
Another option is to make use of alpha to turn size == 0 points invisible. We set alpha in the aes to the logical expression Series2 == 0, and then use scale_alpha_manual to set values to 1 if FALSE and 0 (invisible) if TRUE:
ggplot(data = df)+
geom_line(aes(x=Dates, y = Series1))+
geom_point(aes(x=Dates, y = Series1, size = Series2, alpha = Series2 == 0))+
scale_size_continuous(range = c(1, 5)) +
scale_alpha_manual(values = c(1,0)) +
guides(alpha = "none") # Hide the legend for alpha (older ggplot2 versions used guides(alpha = FALSE))
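A third possibility, not taken from the answers above, is to give the point layer its own data with the zero rows filtered out, while the line layer keeps the full data. A minimal sketch, where df_points is a name made up for this example:

```r
library(ggplot2)

df <- data.frame(
  Dates = as.Date("2015-05-01") + 0:5,
  Series1 = c(0, 2, 8, 5, 3, 1),
  Series2 = c(0, 0, 5, 0, 10, 5)
)

# Rows that should get a point: everything with a non-zero Series2
df_points <- subset(df, Series2 != 0)

p <- ggplot(df, aes(x = Dates, y = Series1)) +
  geom_line() +                                       # uses all six rows
  geom_point(data = df_points, aes(size = Series2)) + # uses only the filtered rows
  scale_size_continuous(range = c(1, 5))
```

Because the zero rows never reach geom_point(), the size range can start at a visible value and the legend no longer needs an entry for 0.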

Using geom_segment to create a timeline visualization

I am trying to create a chart like this one produced in the NYTimes using ggplot:
I think I'm getting close, but I'm not quite sure how to separate out some of my data so I get the right view. My data is political office holders that appear something like this:
name,year_elected,year_left,years_in_office,type,party
Person 1,1969,1969,1,Candidate,Unknown
Person 2,1969,1971,2,Candidate,Unknown
Person 3,1969,1973,4,Candidate,Unknown
Person 4,1969,1973,4,Candidate,Unknown
Person 5,1971,1974,3,Candidate,Unknown
Person 1,1971,1976,5,Candidate,Unknown
Person 2,1971,1980,9,Candidate,Unknown
Person 6,1973,1978,5,Candidate,Unknown
Person 7,1973,1980,7,Candidate,Unknown
Person 8,1975,1980,5,Candidate,Unknown
Person 9,1977,1978,1,Candidate,Unknown
And I've used the below code to get very close to this view, but I think an issue I'm running into is either drawing segments incorrectly (e.g., I don't seem to have a single segment for each candidate), or segments are overlapping/stacking. The key issue I'm running into is my list of office holders is around 60, but my chart is only drawing around 28 lines.
library(googlesheets)
library(tidyverse)
# I'm reading from a Google Spreadsheet
data <- gs_title("Council Members")
data_sj <- gs_read(ss = data, ws = "Sheet1")
ggplot(data_sj, aes(year_elected, years_in_office)) +
geom_segment(aes(x = year_elected, y = 0,
xend = year_left, yend = years_in_office)) +
theme_minimal()
The above code gives me:
Thanks ahead of time for any pointers!
If your data frame is called d, then:
Convert it to a data.table
Add jitter to year_elected
Add the equivalent jitter to year_left
Add a group (as an example) to color your samples
Use ggrepel to add text labels if there are many points.
Code:
library(data.table)
library(ggplot2)
library(ggrepel)
setDT(d) # convert the data frame to a data.table in place
d[, year_elected2 := jitter(year_elected)]
d[, year_left2 := year_left + year_elected2 - year_elected + 0.01]
d[, group := TRUE]
d[factor(years_in_office %/% 9) == 1, group := FALSE]
ggplot(d, aes(year_elected2, years_in_office)) +
geom_segment(aes(x = year_elected2, xend = year_left2,
y = 0, yend = years_in_office, linetype = group),
alpha = 0.8, size = 1, color = "grey") +
geom_point(aes(year_left2), color = "black", size = 3.3) +
geom_point(aes(year_left2, color = group), size = 2.3) +
geom_text_repel(aes(year_left2, label = name)) +
scale_colour_brewer(guide = "none", palette = "Dark2") +
scale_linetype_manual(guide = "none", values = c(2, 1)) +
labs(x = "Year elected",
y = "Years in office") +
theme_minimal(base_size = 10)
Result:
For the record, and to address my comment on @PoGibas's answer above, here's my tidyverse version:
data_transform <- data_sj %>%
mutate(year_elected_jitter = jitter(year_elected)) %>%
mutate(year_left_jitter = year_left + year_elected_jitter - year_elected + 0.01)
ggplot(data_transform, aes(year_elected, years_in_office, label = name)) +
geom_segment(aes(x = year_elected_jitter, y = 0, xend = year_left_jitter, yend = years_in_office, color = gender), size = 0.3) +
geom_text_repel(aes(year_left_jitter, label = name)) +
theme_minimal()
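Both answers rely on jitter because terms with the same start year, end year and length draw exactly the same segment. To see how many segments coincide before jittering, one could count the distinct combinations. This is a sketch using only the sample rows from the question, and seg_cols, n_distinct_segments and overlapping are names made up here:

```r
# Sample rows copied from the question
d <- read.csv(text = "name,year_elected,year_left,years_in_office,type,party
Person 1,1969,1969,1,Candidate,Unknown
Person 2,1969,1971,2,Candidate,Unknown
Person 3,1969,1973,4,Candidate,Unknown
Person 4,1969,1973,4,Candidate,Unknown
Person 5,1971,1974,3,Candidate,Unknown
Person 1,1971,1976,5,Candidate,Unknown
Person 2,1971,1980,9,Candidate,Unknown
Person 6,1973,1978,5,Candidate,Unknown
Person 7,1973,1980,7,Candidate,Unknown
Person 8,1975,1980,5,Candidate,Unknown
Person 9,1977,1978,1,Candidate,Unknown")

# The columns that determine where a segment starts and ends
seg_cols <- c("year_elected", "year_left", "years_in_office")

# Number of rows versus number of visually distinct segments
n_rows <- nrow(d)
n_distinct_segments <- nrow(unique(d[, seg_cols]))

# Rows that share their segment with at least one other row
is_dup <- duplicated(d[, seg_cols]) | duplicated(d[, seg_cols], fromLast = TRUE)
overlapping <- d[is_dup, ]
```

In these sample rows, Person 3 and Person 4 share a segment, so 11 rows produce 10 distinct lines. The same check on the full data should show whether overlap accounts for the gap between about 60 office holders and about 28 drawn lines.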

Non-linear color distribution over the range of values in a geom_raster

I'm faced with the following problem: a few extreme values are dominating the color scale of my geom_raster plot. An example probably makes this clearer (note that it only works with a recent ggplot2 version; I use 0.9.2.1):
library(ggplot2)
library(reshape)
theme_set(theme_bw())
m_small_sd = melt(matrix(rnorm(10000), 100, 100))
m_big_sd = melt(matrix(rnorm(100, sd = 10), 10, 10))
new_xy = m_small_sd[sample(nrow(m_small_sd), nrow(m_big_sd)), c("X1","X2")]
m_big_sd[c("X1","X2")] = new_xy
m = data.frame(rbind(m_small_sd, m_big_sd))
names(m) = c("x", "y", "fill")
ggplot(m, aes_auto(m)) + geom_raster() + scale_fill_gradient2()
Right now I solve this by setting the values over a certain quantile equal to that quantile:
qn = quantile(m$fill, c(0.01, 0.99), na.rm = TRUE)
m = within(m, { fill = ifelse(fill < qn[1], qn[1], fill)
fill = ifelse(fill > qn[2], qn[2], fill)})
This does not really feel like an optimal solution. What I would like to do is have a non-linear mapping of colors to the range of values, i.e. more colors present in the area with more observations. In spplot I could use classIntervals from the classInt package to calculate the appropriate class boundaries:
library(sp)
library(classInt)
gridded(m) = ~x+y
col = c("#EDF8B1", "#C7E9B4", "#7FCDBB", "#41B6C4",
"#1D91C0", "#225EA8", "#0C2C84", "#5A005A")
at = classIntervals(m$fill, n = length(col) + 1)$brks
spplot(m, at = at, col.regions = col)
To my knowledge it is not possible to hardcode this mapping of colors to class intervals like I can in spplot. I could transform the fill axis, but as there are negative values in the fill variable that will not work.
So my question is: are there any solutions to this problem using ggplot2?
It seems that ggplot2 (0.9.2.1) and scales (0.2.2) provide all you need (for your original m):
library(scales)
qn = quantile(m$fill, c(0.01, 0.99), na.rm = TRUE)
qn01 <- rescale(c(qn, range(m$fill)))
ggplot(m, aes(x = x, y = y, fill = fill)) +
geom_raster() +
scale_fill_gradientn(
colours = colorRampPalette(c("darkblue", "white", "darkred"))(20),
values = c(0, seq(qn01[1], qn01[2], length.out = 18), 1)) +
theme(legend.key.height = unit(4.5, "lines"))
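To make the values argument less opaque, here is how that vector is assembled, shown on stand-in data. This is a sketch only: fill is a made-up substitute for m$fill, and values is a name introduced for illustration.

```r
library(scales)

set.seed(42)
# Stand-in for m$fill: mostly small values plus a few extreme ones
fill <- c(rnorm(10000), rnorm(100, sd = 10))

# 1% and 99% quantiles of the data
qn <- quantile(fill, c(0.01, 0.99), na.rm = TRUE)

# Rescale the quantiles together with the data range to [0, 1]:
# elements 1 and 2 are the quantile positions, 3 and 4 are the min and max
qn01 <- unname(rescale(c(qn, range(fill))))

# 20 positions, one per colour: the first colour sits at the minimum,
# the last at the maximum, and the other 18 are spread evenly
# between the two quantiles
values <- c(0, seq(qn01[1], qn01[2], length.out = 18), 1)
```

Most of the palette is therefore spent on the central 98% of the data, and the two extreme colours absorb the tails.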

How can I make the first 10 lines red and the remaining lines blue in ggplot2, based on an example (R, ggplot2)

Here is some example R code that uses the ggplot2 library:
theme_set(theme_bw())
dat = data.frame(value = rnorm(100,sd=2.5))
dat = within(dat, {
value_scaled = scale(value, scale = sd(value))
obs_idx = 1:length(value)
})
ggplot(aes(x = obs_idx, y = value_scaled), data = dat) +
geom_ribbon(ymin = -1, ymax = 1, alpha = 0.1) +
geom_line() + geom_point()
My question: how can I draw the first 10 lines in red and the remaining lines in blue, based on this example? I tried to use some kind of layer syntax, but it doesn't work.
First, add another column to your data frame dat. It has value 0 for the first 10 rows and 1 for the rest.
dat$group <- factor(rep.int(c(0, 1), c(10, nrow(dat)-10)))
Generate the plot:
library(ggplot2)
ggplot(aes(x = obs_idx, y = value_scaled), data = dat) +
geom_ribbon(ymin = -1, ymax = 1, alpha = 0.1) +
geom_line(aes(colour = group), show.legend = FALSE) +
scale_colour_manual(values = c("red", "blue")) +
geom_point()
The parameter show.legend = FALSE suppresses the legend for the red and blue lines (older ggplot2 versions called this argument show_guide).
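One detail to watch: mapping colour to a factor also splits the line into one path per level, which leaves a gap between observations 10 and 11. If a single connected line is preferred, adding group = 1 to the mapping is a common workaround. A sketch with freshly simulated data (the seed and values are assumptions, not the asker's data):

```r
library(ggplot2)

set.seed(1)
dat <- data.frame(value = rnorm(100, sd = 2.5))
dat$value_scaled <- as.numeric(scale(dat$value))
dat$obs_idx <- seq_len(nrow(dat))
# 0 for the first 10 rows, 1 for the rest
dat$group <- factor(rep.int(c(0, 1), c(10, nrow(dat) - 10)))

p <- ggplot(dat, aes(x = obs_idx, y = value_scaled)) +
  geom_ribbon(aes(ymin = -1, ymax = 1), alpha = 0.1) +
  # group = 1 keeps all observations in one path, so the line stays connected
  geom_line(aes(colour = group, group = 1), show.legend = FALSE) +
  scale_colour_manual(values = c("red", "blue")) +
  geom_point()
```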
OK, I managed to do it with layers. The code is not elegant, but it works:
require(ggplot2)
value=round(rnorm(50,200,50),0)
nmbrs<-length(value) ## length of vector
obrv<-1:length(value) ## list of observations
#create data frame from the values
data_lj<-data.frame(obrv,value)
data_lj20<-data.frame(data_lj[1:20,1:2])
data_lj21v<-data.frame(data_lj[20:nmbrs,1:2])
#plot with ggplot
rr<-ggplot()+
layer(mapping=aes(obrv,value),geom="line",data=data_lj20,colour="red")+
layer(mapping=aes(obrv,value),geom="line",data=data_lj21v,colour="blue")
print(rr)
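In current ggplot2 releases, layer() expects more arguments than the call above supplies (I believe stat and position are required), so the same idea is usually written with two geom_line() calls. A sketch under that assumption, where data_first and data_rest are names made up here:

```r
library(ggplot2)

set.seed(1)
value <- round(rnorm(50, 200, 50), 0)
data_lj <- data.frame(obrv = seq_along(value), value = value)

# Rows 1-20 for the red line, rows 20-50 for the blue line.
# Row 20 is in both subsets so that the two lines meet.
data_first <- data_lj[1:20, ]
data_rest <- data_lj[20:nrow(data_lj), ]

# Both layers inherit the x/y mapping; each gets its own data and colour
rr <- ggplot(mapping = aes(x = obrv, y = value)) +
  geom_line(data = data_first, colour = "red") +
  geom_line(data = data_rest, colour = "blue")
```

Calling print(rr) draws the plot, as in the code above.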

Resources