Using geom_segment to create a timeline visualization - r

I am trying to create a chart like this one produced in the NYTimes using ggplot:
I think I'm getting close, but I'm not quite sure how to separate out some of my data so I get the right view. My data is political office holders that appear something like this:
name,year_elected,year_left,years_in_office,type,party
Person 1,1969,1969,1,Candidate,Unknown
Person 2,1969,1971,2,Candidate,Unknown
Person 3,1969,1973,4,Candidate,Unknown
Person 4,1969,1973,4,Candidate,Unknown
Person 5,1971,1974,3,Candidate,Unknown
Person 1,1971,1976,5,Candidate,Unknown
Person 2,1971,1980,9,Candidate,Unknown
Person 6,1973,1978,5,Candidate,Unknown
Person 7,1973,1980,7,Candidate,Unknown
Person 8,1975,1980,5,Candidate,Unknown
Person 9,1977,1978,1,Candidate,Unknown
And I've used the below code to get very close to this view, but I think an issue I'm running into is either drawing segments incorrectly (e.g., I don't seem to have a single segment for each candidate), or segments are overlapping/stacking. The key issue I'm running into is my list of office holders is around 60, but my chart is only drawing around 28 lines.
library(googlesheets)
library(tidyverse)
# I'm reading from a Google Spreadsheet
data <- gs_title("Council Members")
data_sj <- gs_read(ss = data, ws = "Sheet1")
ggplot(data_sj, aes(year_elected, years_in_office)) +
geom_segment(aes(x = year_elected, y = 0,
xend = year_left, yend = years_in_office)) +
theme_minimal()
The above code gives me:
Thanks ahead of time for any pointers!

If your data frame is called d, then:
Convert it to a data.table
Add jitter to year_elected
Add equivalent jitter to year_left
Add group (as an example) to color your samples
Use ggrepel to add text if there are many points.
Code:
library(data.table)
library(ggplot2)
library(ggrepel)
# Convert d to a data.table in place so the := syntax below works
setDT(d)
d[, year_elected2 := jitter(year_elected)]
d[, year_left2 := year_left + year_elected2 - year_elected + 0.01]
d[, group := TRUE]
d[factor(years_in_office %/% 9) == 1, group := FALSE]
ggplot(d, aes(year_elected2, years_in_office)) +
geom_segment(aes(x = year_elected2, xend = year_left2,
y = 0, yend = years_in_office, linetype = group),
alpha = 0.8, size = 1, color = "grey") +
geom_point(aes(year_left2), color = "black", size = 3.3) +
geom_point(aes(year_left2, color = group), size = 2.3) +
geom_text_repel(aes(year_left2, label = name)) +
scale_colour_brewer(guide = "none", palette = "Dark2") +
scale_linetype_manual(guide = "none", values = c(2, 1)) +
labs(x = "Year elected",
y = "Years in office") +
theme_minimal(base_size = 10)
Result:

For the record, and to address my comment on @PoGibas's answer above, here's my tidyverse version:
data_transform <- data_sj %>%
mutate(year_elected_jitter = jitter(year_elected)) %>%
mutate(year_left_jitter = year_left + year_elected_jitter - year_elected + 0.01)
ggplot(data_transform, aes(year_elected, years_in_office, label = name)) +
geom_segment(aes(x = year_elected_jitter, y = 0, xend = year_left_jitter, yend = years_in_office, color = gender), size = 0.3) +
geom_text_repel(aes(year_left_jitter, label = name)) +
theme_minimal()
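A likely reason the original chart shows far fewer lines than rows: each segment runs from (year_elected, 0) to (year_left, years_in_office), so whenever years_in_office equals year_left - year_elected the segment has slope 1, and every such segment with the same start year lies on the same line. The following is a minimal sketch of that check in base R, using the sample rows from the question; the column unit_slope and the variable starts are names introduced here for illustration only.

```r
# Sample rows copied from the question
d <- data.frame(
  name = paste("Person", c(1, 2, 3, 4, 5, 1, 2, 6, 7, 8, 9)),
  year_elected = c(1969, 1969, 1969, 1969, 1971, 1971, 1971, 1973, 1973, 1975, 1977),
  year_left = c(1969, 1971, 1973, 1973, 1974, 1976, 1980, 1978, 1980, 1980, 1978),
  years_in_office = c(1, 2, 4, 4, 3, 5, 9, 5, 7, 5, 1)
)

# TRUE when the segment has slope 1 (rise equals run)
d$unit_slope <- (d$year_left - d$year_elected) == d$years_in_office

# Slope-1 segments that share a start year are collinear, so each start
# year contributes only one visible line, however many people it holds
starts <- table(d$year_elected[d$unit_slope])
print(starts)

cat("rows:", nrow(d), " distinct slope-1 lines:", length(starts), "\n")
```

Jittering year_elected, as both answers do, moves the start points apart so the collinear segments no longer coincide.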

Related

How to add geom_rect to geom_line plot

I am plotting a time series of returns and would like to use NBER recession dating to shade recessions, like FRED graphs do.
The recession variable is in the same data frame and is a 1, 0 variable for: 1 = Recession, 0 = Expansion.
The idea is to use geom_rect and alpha = (Recession == 1) to shade the areas where Recession == 1.
The code for the gg_plot is below. Thanks for the help!
ERVALUEplot <- ggplot(data = Alldata)+
geom_line(aes(x = Date, y = ERVALUE), color = 'red')+
geom_rect(aes(x = Date, alpha = (Alldata$Recession ==1)), color = 'grey')
I think your case might be slightly simplified by using geom_tile() instead of geom_rect(). The output is the same but the parametrisation is easier.
I have presumed your data had a structure roughly like this:
library(ggplot2)
set.seed(2)
Alldata <- data.frame(
Date = Sys.Date() + 1:10,
ERVALUE = cumsum(rnorm(10)),
Recession = sample(c(0, 1), 10, replace = TRUE)
)
With this data, we can make grey rectangles wherever recession == 1 as follows. Here, I've mapped it to a scale to generate a legend automatically.
ggplot(Alldata, aes(Date)) +
geom_tile(aes(alpha = Recession, y = 1),
fill = "grey", height = Inf) +
geom_line(aes(y = ERVALUE), colour = "red") +
scale_alpha_continuous(range = c(0, 1), breaks = c(0, 1))
Created on 2021-08-25 by the reprex package (v1.0.0)
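If geom_rect() is preferred, as in the question, one option is to collapse consecutive Recession == 1 rows into start/end dates first and draw one rectangle per run. This is only a sketch: the data below are made up to match the question's column names, and the recessions data frame and run_start/run_end variables are hypothetical helpers.

```r
# Made-up data with the same columns as the question
Alldata <- data.frame(
  Date = as.Date("2021-01-01") + 0:9,
  ERVALUE = c(0.1, 0.4, -0.2, 0.3, 0.8, 0.5, 0.9, 1.2, 1.0, 1.4),
  Recession = c(0, 1, 1, 0, 0, 1, 1, 1, 0, 0)
)

# Run-length encode the indicator to find consecutive runs
runs <- rle(Alldata$Recession)
run_end <- cumsum(runs$lengths)
run_start <- run_end - runs$lengths + 1

# Keep only the runs where Recession == 1
is_rec <- runs$values == 1
recessions <- data.frame(
  start = Alldata$Date[run_start[is_rec]],
  end = Alldata$Date[run_end[is_rec]]
)
print(recessions)

# Build the plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(Alldata) +
    geom_rect(data = recessions,
              aes(xmin = start, xmax = end, ymin = -Inf, ymax = Inf),
              fill = "grey", alpha = 0.5) +
    geom_line(aes(x = Date, y = ERVALUE), colour = "red")
}
```

Using -Inf and Inf for ymin and ymax stretches each rectangle over the full height of the panel without having to know the range of ERVALUE.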

how to prevent an overlapped segments in geom_segment

I'm trying to map different ranges (lines) into different regions in the plot (see below) using geom_segment but some of the ranges overlap and can't be shown at all.
This is a minimal example of the data frames:
start = c(1, 5,8, 14)
end =c(3, 6,12, 16)
regions = c(1,2,3, 4)
regions = data_frame(regions, start, end)
from = c(1,2, 5.5, 13.5)
to = c(3,2.5,6, 15)
lines = data_frame(from, to)
I plotted the regions with geom_rect and then plot the ranges (lines) with geom_segment.
This is the plot:
plot_splice <- ggplot() +
scale_x_continuous(breaks = seq(1,16)) +
scale_y_continuous() +
geom_hline(yintercept = 1.6,
size = 20,
alpha = 0.1) +
geom_rect(
data = regions,
mapping = aes(
xmin = start,
xmax = end,
ymin = 1.5,
ymax = 1.8,
)) +
geom_segment(
data = lines,
x = (lines$from),
xend = (lines$to),
y = 1.48,
yend = 1.48,
colour = "red",
size = 3
) +
ylim(1.0, 2.2) +
xlab("") +
theme_minimal()
The first plot is the one generated with the code whereas the second one is the desired plot.
As you can see, the second line overlaps with the first one, so you can't see the second line at all.
How can I change the code to produce the second plot?
I'm trying to use an ifelse statement, but I'm not sure what its test argument should be:
I want it to check for each range (line) if it is overlapped with any previous range (line) to change the y position by around .05, so it doesn't overlap.
lines <- lines %>%
dplyr::arrange(desc(from))
new_y$lines = ifelse(from[1] < to[0], 1.48, 1.3)
geom_segment(
data = lines,
x = (lines$from),
xend = (lines$to),
y = new_y,
yend = new_y,
colour = "red",
size = 3
)
Your geom_segment call isn't using any aesthetic mapping, which is how you normally get ggplot elements to change position based on a particular variable (or set of variables).
The stacking of the geom_segment based on the number of overlapping regions is best calculated ahead of the call to ggplot. This allows you to pass the x and y values into an aesthetic mapping:
# First ensure that the data frame is ordered by the start time
lines <- lines[order(lines$from),]
# Now iterate through each row, calculating how many previous rows have
# earlier starts but haven't yet finished when the current row starts.
# Multiply this number by a small negative offset and add the 1.48 baseline value
lines$offset <- 1.48 - 0.03 * sapply(seq(nrow(lines)), function(i) {
with(lines[seq(i),], length(which(from < from[i] & to > from[i])))
})
Now do the same plot but using aesthetic mapping inside geom_segment:
ggplot() +
scale_x_continuous(breaks = seq(1,16), name = "") +
scale_y_continuous(limits = c(1, 2.2), name = "") +
geom_hline(yintercept = 1.6,
size = 20,
alpha = 0.1) +
geom_rect(
data = regions,
mapping = aes(
xmin = start,
xmax = end,
ymin = 1.5,
ymax = 1.8,
)) +
geom_segment(
data = lines,
mapping = aes(
x = from,
xend = to,
y = offset,
yend = offset),
colour = "red",
size = 3
) +
theme_minimal()
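Counting the earlier ranges that are still open works for the example data, but two ranges can receive the same count while overlapping each other (for instance 1 to 3, 2 to 10 and 4 to 5: the last two both get a count of 1). A possible alternative, sketched below, is a greedy lane assignment. The function assign_lanes is a hypothetical helper written for this illustration, not part of ggplot2.

```r
# Hypothetical helper: put each range in the first lane whose previous
# range has already ended, so overlapping ranges never share a lane
assign_lanes <- function(from, to) {
  lane <- numeric(length(from))
  lane_end <- numeric(0)  # right edge of the last range placed in each lane
  for (i in order(from)) {
    free <- which(lane_end <= from[i])
    k <- if (length(free) > 0) free[1] else length(lane_end) + 1
    lane_end[k] <- to[i]
    lane[i] <- k
  }
  lane
}

# The ranges from the question
lines <- data.frame(from = c(1, 2, 5.5, 13.5), to = c(3, 2.5, 6, 15))
lines$lane <- assign_lanes(lines$from, lines$to)

# Lane 1 sits on the 1.48 baseline; each further lane drops by 0.03
lines$offset <- 1.48 - 0.03 * (lines$lane - 1)
print(lines)
```

The resulting offset column can be mapped to y and yend in geom_segment() exactly as in the plot above. Note that a range starting exactly where another ends is treated as non-overlapping here; change `<=` to `<` if touching ranges should be separated too.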

R: ggplot2 let the characters of geom_text exactly cover one X unit

I want to highlight text based on the position in a string, for example if we have this text:
this is a really nice informative piece of text
Then I want to say let's draw a rectangle around positions 2 till 4:
t[his] is a really nice informative piece of text
I tried to do so in ggplot2 using the following code:
library(ggplot2)
library(dplyr)
box.data <- data.frame(
start = c(4,6,5,7,10,7),
type = c('BOX1.start', 'BOX1.start', 'BOX1.start','BOX1.end', 'BOX1.end', 'BOX1.end'),
text.id = c(1,2,3,1,2,3)
)
text.data <- data.frame(
x = rep(1,3),
text.id = c(1,2,3),
text = c('Thisissomerandomrandomrandomrandomtext1',
'Thisissomerandomrandomrandomrandomtext2',
'Thisissomerandomrandomrandomrandomtext3')
)
ggplot(data = text.data, aes(x = x, y = text.id)) +
scale_x_continuous(limits = c(1, nchar(as.character(text.data$text[1])))) +
geom_text(label = text.data$text, hjust = 0, size = 3) +
geom_line(data = box.data, aes(x = start, y = text.id, group = text.id, size = 3, alpha = 0.5, colour = 'red'))
This produces the following graph:
My method fails because a letter does not cover exactly one unit of the x-axis. Is there any way to achieve this?
I just figured out that I can split the string in characters and plot these, perhaps it is useful for someone else.
library(ggplot2)
library(dplyr)
library(splitstackshape)
# First remember the plotting window, which equals the text length
text.size = nchar(as.character(text.data$text[1]))
# Split the string into single characters, and adjust the X-position to the string position
text.data <- cSplit(text.data, 'text', sep = '', direction = 'long', stripWhite = FALSE) %>%
group_by(text.id) %>%
mutate(x1 = seq(1,n()))
# Plot each character and add highlights
ggplot(data = text.data, aes(x = x1, y = text.id)) +
scale_x_continuous(limits = c(1, text.size)) +
geom_text(aes(label = text)) +
geom_line(data = box.data, aes(x = start, y = text.id, group = text.id), size = 3, alpha = 0.5, colour = 'red')
Which produces this plot:
Perhaps the marking should extend a little bit upwards and downwards, but that's an easy fix.
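The same character-per-row layout can also be built without splitstackshape. The sketch below uses base R's strsplit() on two shortened, made-up strings; char.data is a name introduced here for the result.

```r
# Shortened, made-up strings with the same columns as text.data
text.data <- data.frame(
  text.id = c(1, 2),
  text = c("Thisissome", "randomtext"),
  stringsAsFactors = FALSE
)

# Split every string into single characters (one list element per string)
chars <- strsplit(text.data$text, split = "")
n_chars <- lengths(chars)

# One row per character; x1 is the character's position in its string
char.data <- data.frame(
  text.id = rep(text.data$text.id, times = n_chars),
  x1 = sequence(n_chars),
  text = unlist(chars),
  stringsAsFactors = FALSE
)
print(head(char.data))
```

Because each character now has its own integer x position, a highlight from position 2 to position 4 can be drawn as a line or rectangle between those x values.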

Mark in ggplot when the observations go below a certain number

I have a dataset like this:
example <- data.frame(
Country = rep(c("A", "B"), each = 12),
IP = c(55,56,59,63,67,69,69,73,74,74,79,87,0,22,24,26,26,31,37,41,43,46,46,47),
Mean_st = c(46,47,49,50,53,55,53,57,60,57,58,63,0,19,20,21,22,25,26,28,29,30,31,31)
)
ggplot(example) +
geom_line(aes(x = IP, y = Mean_st, color = Country), size = 2) +
geom_vline(xintercept = 73) +
geom_vline(xintercept = 42)
I need to mark where the number of observations is below a certain number (let's say less than 5). I can find that point in my spreadsheet for each of the countries (73 and 42), and use geom_vline like in the example, but is there a way of finding that point directly in ggplot without the need of checking the spreadsheet?
You could do something like:
n_below <- 5
ggplot(example) +
geom_line(aes(x = IP, y = Mean_st, color = Country), size = 2) +
geom_vline(xintercept = sort(example$IP[example$Country == "A"])[n_below]) +
geom_vline(xintercept = sort(example$IP[example$Country == "B"])[n_below])
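One reading of the question is that the line should sit where only n_below observations remain at that IP value or higher, which is where the hand-picked 73 falls for country A. Under that assumption, which the question does not confirm, the thresholds can be computed per country in one step. The thresholds data frame is a name introduced for this sketch.

```r
example <- data.frame(
  Country = rep(c("A", "B"), each = 12),
  IP = c(55,56,59,63,67,69,69,73,74,74,79,87,0,22,24,26,26,31,37,41,43,46,46,47),
  Mean_st = c(46,47,49,50,53,55,53,57,60,57,58,63,0,19,20,21,22,25,26,28,29,30,31,31)
)

n_below <- 5

# For each country, take the n_below-th largest IP value (assumed meaning
# of "where the number of observations goes below n_below")
thresholds <- aggregate(
  IP ~ Country,
  data = example,
  FUN = function(x) sort(x, decreasing = TRUE)[n_below]
)
print(thresholds)
```

The result can be passed straight to the plot with geom_vline(data = thresholds, aes(xintercept = IP, colour = Country)), which avoids writing one geom_vline() call per country.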

Mix color and fill aesthetics in ggplot

I wonder whether it is possible to change the main fill colour according to a categorical variable.
Here is a reproducible example:
df = data.frame(x = c(rnorm(10, mean = 0),
rnorm(10, mean = 3)),
y = c(rnorm(10, mean = 0),
rnorm(10, mean = 3)),
grp = c(rep('a', times = 10),
rep('b', times = 10)),
val = rep(1:10, times = 2))
ggplot(data = df,
aes(x = x,
y = y)) +
geom_point(pch = 21,
aes(color = grp,
fill = val,
size = val))
Of course it is easy to change the circle colour/shape, according to the variable grp, but I'd like to have the a group in shades of red and the b group in shades of blue.
I also thought about using facets, but don't know if the fill gradient can be changed for the two panels.
Does anyone know whether this can be done without gridExtra?
Thanks!
I think there are two ways to do this. The first is using the alpha aesthetic for your val column. This is a quick and easy way to accomplish your goal but may not be exactly what you want:
ggplot(data = df,
aes(x = x,
y = y)) +
geom_point(pch = 21,
aes(alpha=val,
fill = grp,
size = val)) + theme_minimal()
The second way would be to do something similar to this post: Vary the color gradient on a scatter plot created with ggplot2. I edited the code slightly so that it's not a range from white to your color of interest but from a lighter color to a darker color. This requires a little bit of work and the scale_fill_identity function, which takes a variable that already holds the colors you want and maps them directly to each point (so it doesn't do any scaling).
This code is:
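library(plyr)    # ddply() and .()
library(scales)  # rescale()
# grp must be a factor so that as.numeric() returns the level index
df$grp <- factor(df$grp)
# Rescale val to [0, 1]
df$scaled_val <- rescale(df$val)
low_cols <- c("firebrick1", "deepskyblue")
high_cols <- c("darkred", "deepskyblue4")
df$col <- ddply(df, .(grp), function(x)
  data.frame(col = apply(colorRamp(c(low_cols[as.numeric(x$grp)[1]], high_cols[as.numeric(x$grp)[1]]))(x$scaled_val),
                         1, function(x) rgb(x[1], x[2], x[3], maxColorValue = 255)))
)$col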
df
ggplot(data = df,
aes(x = x,
y = y)) +
geom_point(pch = 21,
aes(
fill = col,
size = val)) + theme_minimal() +scale_fill_identity()
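The per-group colour column can also be computed without plyr. The sketch below uses only base R, with one colorRamp() per group, and assumes the same low and high colours as above; the ramps list and the scaled vector are names introduced here, and the x and y columns are omitted because they do not affect the colours.

```r
df <- data.frame(
  grp = rep(c("a", "b"), each = 10),
  val = rep(1:10, times = 2),
  stringsAsFactors = FALSE
)

# One light-to-dark ramp per group, keyed by the group's name
ramps <- list(
  a = colorRamp(c("firebrick1", "darkred")),
  b = colorRamp(c("deepskyblue", "deepskyblue4"))
)

# Rescale val to [0, 1] across the whole data set
scaled <- (df$val - min(df$val)) / (max(df$val) - min(df$val))

# Fill in each group's colours from its own ramp
df$col <- NA_character_
for (g in names(ramps)) {
  idx <- df$grp == g
  rgb_mat <- ramps[[g]](scaled[idx])  # one row per point: R, G, B in 0..255
  df$col[idx] <- rgb(rgb_mat, maxColorValue = 255)
}
print(df)
```

As before, the col column is meant to be mapped to fill together with scale_fill_identity(), so ggplot2 uses the hex values as they are.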
Thanks to this other post I found a way to visualize the fill bar in the legend, even though that wasn't what I meant to do.
Here's the output
And the code
library(dplyr)   # %>%, group_by(), mutate()
library(scales)  # rescale()
library(ggplot2)
df = data.frame(x = c(rnorm(10, mean = 0),
rnorm(10, mean = 3)),
y = c(rnorm(10, mean = 0),
rnorm(10, mean = 3)),
grp = factor(c(rep('a', times = 10),
rep('b', times = 10)),
levels = c('a', 'b')),
val = rep(1:10, times = 2)) %>%
group_by(grp) %>%
mutate(scaledVal = rescale(val)) %>%
ungroup %>%
mutate(scaledValOffSet = scaledVal + 100*(as.integer(grp) - 1))
scalerange <- range(df$scaledVal)
# One offset per group (0 for a, 100 for b): four gradient ends for the four colours below
gradientends <- scalerange + rep(c(0, 100), each = 2)
ggplot(data = df,
aes(x = x,
y = y)) +
geom_point(pch = 21,
aes(fill = scaledValOffSet,
size = val)) +
scale_fill_gradientn(colours = c('white',
'darkred',
'white',
'deepskyblue4'),
values = rescale(gradientends))
Basically one should rescale fill values (e.g. between 0 and 1) and separate them using another order of magnitude, provided by the categorical variable grp.
This is not what I wanted though: the snippet can be improved, of course, to make the whole thing less manual, but still lacks the simple usual discrete fill legend.
