ggplot2 geom_linerange remove whitespace between rows - r

I am attempting to create a plot, similar to a strip-chart recorder, showing outage data. Outage severity is either Major or Minor. The plot has a large amount of vertical white space between the two rows, and also above and below them, which I would like to remove to get a compact two-row chart.
The data frame is:
> head(dfsub)
StartDateTime EndDateTime Outage.DUR Outage.Severity
1 2021-07-01T00:23:33.0000000 2021-07-01T00:25:26.0000000 1.8833333 Minor
2 2021-07-01T00:25:26.0000000 2021-07-01T00:31:33.0000000 6.1166667 Major
3 2021-07-01T00:31:33.0000000 2021-07-01T00:40:34.0000000 9.0166667 Major
4 2021-07-01T00:40:34.0000000 2021-07-01T00:42:57.0000000 2.3833333 Minor
5 2021-07-01T00:42:57.0000000 2021-07-01T00:43:49.0000000 0.8666667 Minor
6 2021-07-01T00:43:49.0000000 2021-07-01T00:45:35.0000000 1.7666667 Minor
The R code I am running:
ggplot(dfsub) +
  geom_linerange(aes(y = Outage.Severity,
                     xmin = StartDateTime,
                     xmax = EndDateTime,
                     colour = as.factor(Outage.Severity)),
                 show.legend = FALSE,
                 size = 50) +
  scale_color_manual(values = c("red", "yellow")) +
  theme(legend.position = "none") +
  theme_test()
which generates this plot:

Two suggestions.
You didn't ask about this, but your x-axis is broken: it uses time (which is continuous) as if it were categorical. R and ggplot2 are treating the current columns as strings, not timestamps. This is easily resolved:
dfsub[c("StartDateTime", "EndDateTime")] <-
lapply(dfsub[c("StartDateTime", "EndDateTime")], as.POSIXct, format="%Y-%m-%dT%H:%M:%OS", tz="UTC")
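A quick way to check that conversion, as a sketch: the six rows below are copied from the head(dfsub) printout, and after parsing, End minus Start in minutes should reproduce Outage.DUR. (That Outage.DUR is in minutes is an inference from these six rows, not something the question states.)

```r
# The six rows shown in head(dfsub), rebuilt as strings as they come from the source
dfsub <- data.frame(
  StartDateTime = c("2021-07-01T00:23:33.0000000", "2021-07-01T00:25:26.0000000",
                    "2021-07-01T00:31:33.0000000", "2021-07-01T00:40:34.0000000",
                    "2021-07-01T00:42:57.0000000", "2021-07-01T00:43:49.0000000"),
  EndDateTime   = c("2021-07-01T00:25:26.0000000", "2021-07-01T00:31:33.0000000",
                    "2021-07-01T00:40:34.0000000", "2021-07-01T00:42:57.0000000",
                    "2021-07-01T00:43:49.0000000", "2021-07-01T00:45:35.0000000"),
  Outage.DUR = c(1.8833333, 6.1166667, 9.0166667, 2.3833333, 0.8666667, 1.7666667),
  Outage.Severity = c("Minor", "Major", "Major", "Minor", "Minor", "Minor"),
  stringsAsFactors = FALSE
)

cols <- c("StartDateTime", "EndDateTime")
# %OS also reads the fractional-seconds part ".0000000"
dfsub[cols] <- lapply(dfsub[cols], as.POSIXct, format = "%Y-%m-%dT%H:%M:%OS", tz = "UTC")

# Interval lengths in minutes, computed from the parsed timestamps
dur_min <- as.numeric(difftime(dfsub$EndDateTime, dfsub$StartDateTime, units = "mins"))
print(round(dur_min, 4))
```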
I don't think you'll get fine control over the blank space between the reds and yellows with geom_linerange, so I suggest geom_rect instead. With geom_rect, remove size= and control ymin= and ymax= directly. It also helps to make Outage.Severity a factor: it isn't strictly necessary, but this kind of work often comes back with "how do I change the order of the y-axis categories?", and the only (sane) answer to that is to convert them to factors and control their levels=. We also need to add fill=, which geom_linerange did not need.
dfsub$Outage.Severity <- factor(dfsub$Outage.Severity) # add 'levels=' if you want to control the order
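To make the role of levels= concrete, here is a small illustration using the Outage.Severity values from head(dfsub): the integer code of each level is the row it will occupy on the y-axis (level 1 at the bottom), and by default the levels are sorted alphabetically.

```r
# Outage.Severity values from the six rows of head(dfsub)
sev <- c("Minor", "Major", "Major", "Minor", "Minor", "Minor")

f_default <- factor(sev)                                # levels sorted alphabetically
f_custom  <- factor(sev, levels = c("Minor", "Major"))  # levels in an explicit order

levels(f_default)       # "Major" "Minor"
as.numeric(f_default)   # 2 1 1 2 2 2  -> Major on row 1 (bottom), Minor on row 2
as.numeric(f_custom)    # 1 2 2 1 1 1  -> Minor on row 1 (bottom), Major on row 2
```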
From here, since categorical data are plotted at integer positions, we'll close the gap between the rows by extending each rectangle ±0.48 around its integer (an arbitrary value; it should be close to, but not at or beyond, 0.5, where adjacent rows would touch or overlap).
ggplot(dfsub) +
  geom_rect(aes(ymin = as.numeric(Outage.Severity) - 0.48,
                ymax = as.numeric(Outage.Severity) + 0.48,
                xmin = StartDateTime,
                xmax = EndDateTime,
                colour = Outage.Severity,
                fill = Outage.Severity),
            show.legend = FALSE) +
  # one break per factor level, labelled with the level name
  scale_y_continuous(breaks = seq_along(levels(dfsub$Outage.Severity)),
                     labels = levels(dfsub$Outage.Severity)) +
  scale_color_manual(values = c("Major" = "red", "Minor" = "yellow")) +
  scale_fill_manual(values = c("Major" = "red", "Minor" = "yellow")) +
  theme_test() +                   # complete theme first...
  theme(legend.position = "none")  # ...so this tweak is not overwritten by it
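A sketch of the same geom_rect idea with two changes: the rectangle bounds are precomputed as columns, and scale_y_continuous(expand = expansion(mult = 0)) removes the default padding above and below the two rows, which targets the "before and after" white space in the question. The four intervals are hypothetical toy data in the question's column layout, and expansion() assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Hypothetical outage intervals, already POSIXct, in the question's column layout
t0 <- as.POSIXct("2021-07-01 00:00:00", tz = "UTC")
dfsub <- data.frame(
  StartDateTime   = t0 + c(0, 120, 480, 1000),
  EndDateTime     = t0 + c(120, 480, 1000, 1300),
  Outage.Severity = factor(c("Minor", "Major", "Major", "Minor"),
                           levels = c("Major", "Minor"))
)

# Each category sits on its integer code; extend its rectangle +/- half around it
half <- 0.48
dfsub$ymin <- as.numeric(dfsub$Outage.Severity) - half
dfsub$ymax <- as.numeric(dfsub$Outage.Severity) + half

sev_cols <- c(Major = "red", Minor = "yellow")
p <- ggplot(dfsub) +
  geom_rect(aes(xmin = StartDateTime, xmax = EndDateTime,
                ymin = ymin, ymax = ymax, fill = Outage.Severity),
            show.legend = FALSE) +
  scale_y_continuous(breaks = seq_along(levels(dfsub$Outage.Severity)),
                     labels = levels(dfsub$Outage.Severity),
                     expand = expansion(mult = 0)) +  # no padding above/below the rows
  scale_fill_manual(values = sev_cols) +
  theme_test()
```

Printing p should give two rows that fill the panel height; how compact the chart looks overall then mostly depends on the height you give the graphics device when saving.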

Related

Adding space *just* on right side of x-axis, color based on relative position, specify labels

I have a time series graph of 49 countries, and I'd like to do three things: (1) prevent the country label names from being cut off, (2) base the coloring on each line's position in the graph rather than on alphabetical order, and (3) choose which countries get labelled (49 labels in one graph is too many).
library(ggplot2)
library(directlabels)
library(zoo)
library(RColorBrewer)
library(viridis)
colourCount = length(unique(df$newCol))
getPalette = colorRampPalette(brewer.pal(11, "Paired"))
## Yearly Incorporation Rates
ggplot(df,aes(x=year2, y=total_count_th, group = newCol, color = newCol)) +
geom_line() +
geom_dl(aes(label = newCol),
method= list(dl.trans(x = x + 0.1),
"last.points", cex = 0.8)) +
scale_color_manual(values = getPalette(colourCount)) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
legend.position = "none") +
labs(title = "Title",
x = "Year",
y = "Count")
This code works: there are 49 lines, and each of them is labelled. But it happens that all the countries with the highest y-values have the same or similar colors (red/orange). Is there a way to assign the colors dynamically (maybe with scale_color_identity)? And how do I add space only on the right side, for the labels? I found expand = expand_scale(), but it added space on both sides (though I did read that the newer version should make this possible).
I am also fine defining a list of 49 manually-defined colors rather than using the color ramp.
One way to do it is to extend the x-axis limits beyond the data, by adding something like
coord_cartesian(xlim = c(1,44), expand = TRUE)
In this case, I had 41 years of observations on the axis, so setting the upper limit to 44 added space on the right of the x-axis.
Thank you to @JonSpring for the help and for getting me to the right answer!
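A sketch covering all three points, under assumptions: the data are hypothetical toy series (5 countries instead of 49) and the hcl.colors() palette is only a placeholder; the question's getPalette(colourCount) can be passed to values= instead, since unnamed manual values are assigned in level order. expand_scale() was renamed expansion() in ggplot2 3.3.0 and takes separate lower/upper multipliers, so right-only padding is possible. The question's x-axis may be discrete (the fix above uses positions 1 to 44); scale_x_discrete() accepts the same expand argument.

```r
library(ggplot2)

# Hypothetical toy data standing in for the question's df
df <- data.frame(
  year2  = rep(2000:2010, times = 5),
  newCol = rep(c("Brazil", "Chile", "India", "Kenya", "Peru"), each = 11),
  stringsAsFactors = FALSE
)
slopes <- c(Brazil = 2, Chile = 5, India = 1, Kenya = 4, Peru = 3)
df$total_count_th <- unname(slopes[df$newCol]) * (df$year2 - 2000)

# (2) Order the factor levels by each country's final value: palette colours are
#     handed out in level order, so colour now follows vertical position
last_pts <- df[df$year2 == max(df$year2), ]
ord <- last_pts$newCol[order(last_pts$total_count_th)]
df$newCol <- factor(df$newCol, levels = ord)
last_pts$newCol <- factor(last_pts$newCol, levels = ord)

# (3) Label only a chosen subset, here the two lines that end highest
to_label <- subset(last_pts, newCol %in% tail(ord, 2))

# (1) Pad only the right-hand side of the x-axis: 0% on the left, 15% on the right
pad <- expansion(mult = c(0, 0.15))

p <- ggplot(df, aes(year2, total_count_th, colour = newCol)) +
  geom_line() +
  geom_text(data = to_label, aes(label = newCol), hjust = 0, nudge_x = 0.2) +
  scale_colour_manual(values = hcl.colors(length(ord), "Dark 3")) +
  scale_x_continuous(expand = pad) +
  theme(legend.position = "none")
```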

How to add labels and points to each geom_line in ggplot2?

I have a data frame called casos_obitos that looks something like this:
EPI_WEEK CASES DEATHS
SE 51 1053 19
SE 52 1384 21
SE 53 1892 25
SE 01/21 1806 43
I'm making a plot with ggplot that places both cases and deaths in two different geom_lines. This is my code:
scl = 10
ggplot(data = casos_obitos, aes(x = EPI_WEEK, y = CASES, fill = CASES, group = 1)) +
  scale_y_continuous(limits = c(0, max(casos_obitos$CASES) + 10), expand = expansion(mult = c(0, .1)),
                     sec.axis = sec_axis(~./scl, name = "Nº de Óbitos")) +
  geom_line(aes(x = EPI_WEEK, y = CASES, color = "CASES"), size = 1) +
  geom_line(aes(x = EPI_WEEK, y = DEATHS*scl, color = "DEATHS"), size = 1) +
  geom_text(aes(label = CASES), hjust = 0.5, vjust = -2, size = 2.0, color = "black") +
  labs(x = "Semana Epidemiológica", y = "Nº de Casos") +
  scale_colour_manual(" ", values = c("CASES" = "blue", "DEATHS" = "red")) +
  theme_minimal(base_size = 10) +
  theme(legend.position = "bottom", axis.line = element_line(colour = "black"),
        axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1, color = "black"),
        axis.text.y = element_text(color = "black"))
For now, my plot looks like this:
The blue line is the CASES column and the red one is the DEATHS column. I need to put labels on the red line, but I can't find how to do that. I also want the labels to look "nice", so that I can read the numbers and they don't look as messy as they do now.
Thanks!
You should be able to add the following to get labels on the bottom line:
geom_text(aes(y = DEATHS*scl, label= DEATHS), hjust= 0.5, vjust = -2, size= 2.0, color= "black") +
You might also consider reshaping your data into a long format so that the CASES and DEATHS (after scaling) values are combined into the same column, with another column distinguishing which series is related to each value. ggplot2 generally works more smoothly with data in that form -- you would map the color aesthetic to the column specifying which series, and then you'd only need one geom_line and one geom_text to get both series. In this case, with only two series, and one of them scaled, it might not be worth the trouble to switch.
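A minimal sketch of that long format, built with base R from the four rows in the question (no tidyr). The week factor keeps the table order, since a character x-axis would otherwise sort "SE 01/21" first; the label column keeps the unscaled DEATHS value while y holds the scaled one. One geom_line, one geom_point and one geom_text then cover both series.

```r
library(ggplot2)

# Toy data copied from the question's table
casos_obitos <- data.frame(
  EPI_WEEK = c("SE 51", "SE 52", "SE 53", "SE 01/21"),
  CASES    = c(1053, 1384, 1892, 1806),
  DEATHS   = c(19, 21, 25, 43),
  stringsAsFactors = FALSE
)
scl <- 10
# Keep the weeks in table order instead of alphabetical order
casos_obitos$EPI_WEEK <- factor(casos_obitos$EPI_WEEK, levels = casos_obitos$EPI_WEEK)

# Long format: one row per week and series; DEATHS is scaled to share the CASES axis,
# while `label` keeps the unscaled number to print
long <- rbind(
  data.frame(EPI_WEEK = casos_obitos$EPI_WEEK, series = "CASES",
             y = casos_obitos$CASES, label = casos_obitos$CASES),
  data.frame(EPI_WEEK = casos_obitos$EPI_WEEK, series = "DEATHS",
             y = casos_obitos$DEATHS * scl, label = casos_obitos$DEATHS)
)

p <- ggplot(long, aes(EPI_WEEK, y, colour = series, group = series)) +
  geom_line() +
  geom_point() +
  geom_text(aes(label = label), vjust = -1, size = 3, show.legend = FALSE) +
  scale_y_continuous(name = "Nº de Casos",
                     sec.axis = sec_axis(~ . / scl, name = "Nº de Óbitos")) +
  scale_colour_manual(NULL, values = c(CASES = "blue", DEATHS = "red"))
```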
"Nice looking labels" is subjective and a harder problem than it might sound. There are a few options, including:
use a function like ggrepel::geom_text_repel to shift labels automatically so they don't overlap each other. It works by starting from an initial point and iteratively nudging until the labels have as much separation as you've specified. There are many options for adjusting the initial starting position and how the nudging works.
manually nudge the labels that need it in code, e.g. by adjusting vjust for certain points. For instance, you might use vjust to put the labels under the line for points that are lower than neighboring points, by pre-calculating a moving average and comparing values to it.
manually nudge the labels afterward, e.g. by using officer or svg to export to a vector file you can edit in PowerPoint.
avoid persistent labels altogether by switching to an interactive option like ggplotly, and see the labels on hover instead of all the time.
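The moving-average idea from the second option might look like this, as a sketch on the question's CASES values: a centred 3-point average (NA at both ends), with points below their local average getting the label under the line. The 1.5 / -0.5 vjust values are arbitrary choices.

```r
library(ggplot2)

# CASES values from the question's table, weeks kept in table order
casos <- data.frame(
  EPI_WEEK = factor(c("SE 51", "SE 52", "SE 53", "SE 01/21"),
                    levels = c("SE 51", "SE 52", "SE 53", "SE 01/21")),
  CASES = c(1053, 1384, 1892, 1806)
)

# Centred 3-point moving average; stats:: avoids a clash with dplyr::filter()
ma <- as.numeric(stats::filter(casos$CASES, rep(1 / 3, 3), sides = 2))

# Points below their local average get the label under the line, the rest above
casos$below <- !is.na(ma) & casos$CASES < ma
casos$vj <- ifelse(casos$below, 1.5, -0.5)

p <- ggplot(casos, aes(EPI_WEEK, CASES, group = 1)) +
  geom_line(colour = "blue") +
  geom_point(colour = "blue") +
  geom_text(aes(label = CASES, vjust = vj), size = 3)
```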
You might also take a look at functions like scales::comma to control how the labels themselves appear. I'm anticipating that your DEATHS labels will show many decimal digits, but you probably just want the integer part...
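For example, scales::comma() with accuracy = 1 rounds to whole numbers and adds thousands separators. The input values below are hypothetical unrounded labels.

```r
# Hypothetical unrounded label values (e.g. produced by rescaling)
deaths_raw <- c(19.4, 21.7, 1892.3)

# accuracy = 1 rounds to whole numbers; comma() also adds thousands separators
lab <- scales::comma(deaths_raw, accuracy = 1)
print(lab)
```

Inside the plot, the same call goes in the mapping, e.g. geom_text(aes(label = scales::comma(DEATHS, accuracy = 1))).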

Discrete values and geom_ribbon and geom_lines + problems with "discrete" scale

I have a file like this one:
Month,Open,Closed
2017-08,53,38
2017-09,102,85
2017-10,58,38
2017-11,51,42
2017-12,32,24
2018-01,24,30
2018-02,56,46
2018-03,82,74
2018-04,95,89
2018-05,16,86
I want to plot both lines, and also shade the difference between them. So this works:
library(ggplot2)
library(ggthemes)  # for theme_tufte()
ggplot() +
  geom_line(data=issues.m,aes(x=Month,y=Open,group=1)) +
  geom_line(data=issues.m,aes(x=Month,y=Closed,group=1)) +
  geom_ribbon(data=issues.m, aes(x=Month,ymin=Closed,ymax=Open,color=Open-Closed)) +
  theme_tufte() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
producing this
The first problem is that I would like the whole area between the two lines shaded, not just a single line. How can I do that?
But I would also like to color the two lines. If I add a color to one of them:
ggplot() +
  geom_line(data=issues.m,aes(x=Month,y=Open,group=1,color='open')) +
  geom_line(data=issues.m,aes(x=Month,y=Closed,group=1)) +
  geom_ribbon(data=issues.m, aes(x=Month,ymin=Closed,ymax=Open,color=Open-Closed)) +
  theme_tufte() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
I get the error:
Error: Continuous value supplied to discrete scale
So, can what I want be done at all? And would it be possible to change the colour palette of the ribbon too?
You got the error because you were mapping Open - Closed onto colour, which is a continuous variable (the difference between those two values for each month), but you also assigned "open" to colour inside aes() in one of your geom_lines. That means you're trying to put both continuous and discrete values on the same scale, which won't work.
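A small reproduction of that conflict, as a sketch on the first three rows of the question's data: the first layer to map colour gets a discrete scale, and training it on the ribbon's numeric Open - Closed fails when the plot is built (the exact message wording differs between ggplot2 versions). Setting the ribbon's fill outside aes() leaves colour with discrete values only.

```r
library(ggplot2)

# First three rows of the question's data
issues.m <- data.frame(Month = c("2017-08", "2017-09", "2017-10"),
                       Open = c(53, 102, 58), Closed = c(38, 85, 38),
                       stringsAsFactors = FALSE)

# colour is discrete ("open") in the first layer but continuous (Open - Closed)
# in the ribbon, and a single colour scale cannot hold both
p_bad <- ggplot(issues.m, aes(x = Month, group = 1)) +
  geom_line(aes(y = Open, colour = "open")) +
  geom_ribbon(aes(ymin = Closed, ymax = Open, colour = Open - Closed))

# The scale error is raised when the plot is built (or printed)
res <- tryCatch(ggplot_build(p_bad), error = function(e) e)
print(inherits(res, "error"))

# One fix: give the ribbon a fixed fill outside aes(), so colour only sees discrete values
p_ok <- ggplot(issues.m, aes(x = Month, group = 1)) +
  geom_ribbon(aes(ymin = Closed, ymax = Open), fill = "lightblue2") +
  geom_line(aes(y = Open, colour = "open")) +
  geom_line(aes(y = Closed, colour = "closed"))
```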
If all you need is two colors, one for each line, you can do it in one of two ways; the second fits better with the ggplot/tidyverse way of doing things.
First, I turned your dates into Date objects to clean up the x-axis and avoid rotating the labels; feel free to experiment with date breaks that work well in scale_x_date.
The less "tidy" way is to just make two geom_lines, one for Open and one for Closed, and assign a color to each.
library(tidyverse)
df_dated <- df %>%
mutate(month2 = sprintf("%s-01", Month) %>% lubridate::ymd())
ggplot(df_dated, aes(x = month2)) +
geom_ribbon(aes(ymin = Open, ymax = Closed), fill = "lightblue2") +
geom_line(aes(y = Open), color = "green3") +
geom_line(aes(y = Closed), color = "red") +
ggthemes::theme_tufte()
But the more idiomatically "tidy" way is to make a long-shaped version of the data so you can map a variable (in this case, whether an observation is the opening or closing value) onto an aesthetic such as color. This also gives you a legend; if you don't want it, you can remove it in the theme. It also lets you set a scale for the colors instead of hard-coding them into each geom_line.
df_date_long <- df_dated %>%
gather(key, value, -month2, -Month)
ggplot(df_dated, aes(x = month2)) +
geom_ribbon(aes(ymin = Open, ymax = Closed), fill = "lightblue2") +
geom_line(aes(y = value, color = key), data = df_date_long) +
scale_color_manual(values = c(Open = "green3", Closed = "red")) +
ggthemes::theme_tufte()
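For readers without the tidyverse, here is a base-R sketch of the same plot that runs end to end on the question's data: the dates are converted with as.Date() instead of lubridate, and the long format is built by hand instead of with tidyr::gather() (which has since been superseded by pivot_longer()). The scale_x_date() breaks are just one possible choice, and theme_tufte() is left out to avoid the ggthemes dependency.

```r
library(ggplot2)

# The question's data, read from text so the sketch is self-contained
issues <- read.csv(text = "Month,Open,Closed
2017-08,53,38
2017-09,102,85
2017-10,58,38
2017-11,51,42
2017-12,32,24
2018-01,24,30
2018-02,56,46
2018-03,82,74
2018-04,95,89
2018-05,16,86", stringsAsFactors = FALSE)

# "YYYY-MM" is not a complete date, so append a day before converting
issues$month2 <- as.Date(paste0(issues$Month, "-01"))

# Long format built by hand: one row per month and series
long <- data.frame(
  month2 = rep(issues$month2, times = 2),
  key    = rep(c("Open", "Closed"), each = nrow(issues)),
  value  = c(issues$Open, issues$Closed)
)

p <- ggplot(issues, aes(x = month2)) +
  geom_ribbon(aes(ymin = Open, ymax = Closed), fill = "lightblue2") +
  geom_line(aes(y = value, colour = key), data = long) +
  scale_colour_manual(values = c(Open = "green3", Closed = "red")) +
  scale_x_date(date_breaks = "2 months", date_labels = "%b %Y")
```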

Secondary / Dual axis - ggplot

I am opening this question for three reasons: first, to re-open the dual-axis discussion with ggplot; second, to ask whether there is a non-torturing, generic approach to doing it; and finally, to ask for your help with a work-around.
I realize that there are multiple discussions and questions on how to add a secondary axis to a ggplot. Those usually end up in one of two conclusions:
It's bad, don't do it: Hadley Wickham answered the same question here, concluding that it is not possible. He had a very good argument that "using separate y scales (not y-scales that are transformations of each other) are fundamentally flawed".
If you insist, over-complicate your life and use grids: for example, here and here.
However, here are some situations I often face in which the visualization would greatly benefit from a dual axis. I abstracted the concepts below.
The plot is wide, so duplicating the y-axis on the right side (or the x-axis on top) would ease interpretation. (We've all stumbled across one of those plots where we need to use a ruler on the screen, because the axis is too far)
I need to add a new axis that is a transformation of the original axis (e.g. percentages, quantiles, ...). (I am currently facing a problem with this; reproducible example below.)
And finally, adding grouping/meta information: I stumble across this when using categorical data with multiple levels (e.g. Categories = {1,2,x,y,z}, which are "meta-divided" into letters and numerics). Even though color-coding the meta-levels and adding a legend, or even faceting, solves the issue, things get a little simpler with a secondary axis, where the user won't need to match the color of the bars to that of the legend.
General question: given the new extensibility features of ggplot2 2.0.0, is there a more robust, no-torture way to have a dual axis without using grids?
And one final comment: I absolutely agree that the wrong use of a dual axis can be dangerously misleading... But isn't that true of information visualization and data science in general?
Work-around question:
Currently, I need a percentage axis (the 2nd case). I used annotate and geom_hline as a workaround. However, I can't move the text outside the main plot, and hjust didn't seem to work for me either.
Reproducible example:
library(ggplot2)
# Random values generation - with some manipulation :
maxVal = 500
value = sample(1:maxVal, size = 100, replace = TRUE)
value[value < 400] = value[value < 400] * 0.2
value[value > 400] = value[value > 400] * 0.9
# Data frame preparation:
labels = paste0(sample(letters[1:3], replace = TRUE, size = length(value)), as.character(1:length(value)))
df = data.frame(sample = factor(labels, levels = labels), value = sort(value, decreasing = TRUE))
# Plotting : Adding Percentages/Quantiles as lines
ggplot(data = df, aes(x = sample, y = value)) +
geom_bar(stat = "identity", fill = "grey90", aes(y = maxVal )) +
geom_bar(stat = "identity", fill = "#00bbd4") +
geom_hline(yintercept = c(0, maxVal)) + # Min and max values
geom_hline(yintercept = c(maxVal*0.25, maxVal*0.5, maxVal*0.75), alpha = 0.2) + # Marking the 25%, 50% and 75% values
annotate(geom = "text", x = rep(100,3), y = c(maxVal*0.25, maxVal*0.5, maxVal*0.75),
label = c("25%", "50%", "75%"), vjust = 0, hjust = 0.2) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
theme(panel.background = element_blank()) +
theme(plot.background = element_blank()) +
theme(plot.margin = unit(rep(2,4), units = "lines"))
In response to #1
We've all stumbled across one of those plots where we need to use a ruler on the screen, because the axis is too far
Use cowplot:
library(cowplot)
# Assign your original plot to some variable, `gpv` <- ggplot( ... )
ggdraw(switch_axis_position(gpv, axis="y", keep="y"))
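Since ggplot2 2.2.0, position scales also accept a sec.axis argument: sec_axis() draws a transformed copy of the primary axis, and dup_axis() an exact copy. That covers the question's first two cases without cowplot or grid (cowplot's switch_axis_position() was later deprecated, so it may not exist in current cowplot releases). A sketch for the percentage axis, reusing the question's data generation with a seed added; the break positions are an arbitrary choice. For the annotate() workaround, coord_cartesian(clip = "off") (ggplot2 3.0.0+) lets text be drawn outside the panel, given enough plot.margin.

```r
library(ggplot2)

# The question's data generation, with a seed added for reproducibility
set.seed(1)
maxVal <- 500
value <- sample(1:maxVal, size = 100, replace = TRUE)
value[value < 400] <- value[value < 400] * 0.2
value[value > 400] <- value[value > 400] * 0.9
labels <- paste0(sample(letters[1:3], replace = TRUE, size = length(value)), seq_along(value))
df <- data.frame(sample = factor(labels, levels = labels),
                 value = sort(value, decreasing = TRUE))

p <- ggplot(df, aes(x = sample, y = value)) +
  geom_col(aes(y = maxVal), fill = "grey90") +
  geom_col(fill = "#00bbd4") +
  geom_hline(yintercept = maxVal * c(0.25, 0.5, 0.75), alpha = 0.2) +
  scale_y_continuous(
    # right-hand axis: the same positions expressed as a fraction of maxVal
    sec.axis = sec_axis(~ . / maxVal, breaks = c(0, 0.25, 0.5, 0.75, 1),
                        labels = scales::percent)
  ) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

# Case 1 from the question: simply repeat the left axis on the right
p_dup <- p + scale_y_continuous(sec.axis = dup_axis())
```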

Omitting a Missing x-axis Value in ggplot2 (Convert range to categorical variable)

I am using ggplot to generate a chart that summarises a race made up of several laps. There are 24 participants in the race, numbered 1-12 and 14-25; I am plotting a summary measure for each participant, but ggplot assumes I want the number range 1-25 rather than the categories 1-12, 14-25.
What's the fix for this? Here's the code I am using (the data is sourced from a Google spreadsheet).
sskey='0AmbQbL4Lrd61dHlibmxYa2JyT05Na2pGVUxLWVJYRWc'
library("ggplot2")
require(RCurl)
gsqAPI = function(key,query,gid){ return( read.csv( paste( sep="", 'http://spreadsheets.google.com/tq?', 'tqx=out:csv', '&tq=', curlEscape(query), '&key=', key, '&gid=', curlEscape(gid) ) ) ) }
sin2011racestatsX=gsqAPI(sskey,'select A,B,G',gid='13')
sin2011proximity=gsqAPI(sskey,'select A,B,C',gid='12')
h=sin2011proximity
k=sin2011racestatsX
l=subset(h,lap==1)
ggplot() +
geom_step(aes(x=h$car, y=h$pos, group=h$car)) +
scale_x_discrete(limits =c('VET','WEB','HAM','BUT','ALO','MAS','SCH','ROS','SEN','PET','BAR','MAL','','SUT','RES','KOB','PER','BUE','ALG','KOV','TRU','RIC','LIU','GLO','AMB'))+
xlab(NULL) + ggtitle("F1 2011 Korea \nRace Summary Chart") + theme(axis.text.x=element_text(angle=-90, hjust=0)) +
geom_point(aes(x=l$car, y=l$pos, pch=3, size=2)) +
geom_point(aes(x=k$driverNum, y=k$classification,size=2), label='Final') +
geom_point(aes(x=k$driverNum, y=k$grid, col='red')) +
ylab("Position")+
scale_y_discrete(breaks=1:24,limits=1:24)+ theme(legend.position = "none")
Expanding on my cryptic comment, try this:
#Convert these to factors with the appropriate labels
# Note that I removed the ''
h$car <- factor(h$car,labels = c('VET','WEB','HAM','BUT','ALO','MAS','SCH','ROS','SEN','PET','BAR','MAL',
'SUT','RES','KOB','PER','BUE','ALG','KOV','TRU','RIC','LIU','GLO','AMB'))
k$driverNum <- factor(k$driverNum,labels = c('VET','WEB','HAM','BUT','ALO','MAS','SCH','ROS','SEN','PET','BAR','MAL',
'SUT','RES','KOB','PER','BUE','ALG','KOV','TRU','RIC','LIU','GLO','AMB'))
l=subset(h,lap==1)
ggplot() +
  geom_step(aes(x=h$car, y=h$pos, group=h$car)) +
  geom_point(aes(x=l$car, y=l$pos), shape=3, size=2) +
  geom_point(aes(x=k$driverNum, y=k$classification), size=2) +
  geom_point(aes(x=k$driverNum, y=k$grid), col='red') +
  ylab("Position") +
  scale_y_continuous(breaks=1:24) + theme(legend.position = "none") +
  ggtitle("F1 2011 Korea \nRace Summary Chart") + theme(axis.text.x=element_text(angle=-90, hjust=0)) + xlab(NULL)
Calling scale_x_discrete is no longer necessary. And stylistically, I prefer putting the title, theme and axis-label calls at the end.
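Why the factor conversion removes the gap, as a small illustration: a numeric x spans the whole range 1-25, while a factor only has categories for the values that occur. The positions below are hypothetical and only there to show the axis.

```r
library(ggplot2)

# Car numbers as in the question: 1-12 and 14-25, with no car 13
car_num <- c(1:12, 14:25)
length(car_num)            # 24

# As a factor, only values that actually occur become categories
car_f <- factor(car_num)
nlevels(car_f)             # 24
"13" %in% levels(car_f)    # FALSE

# labels= renames the sorted levels in order, so car 14 becomes "SUT" (as in the answer above)
drivers <- c('VET','WEB','HAM','BUT','ALO','MAS','SCH','ROS','SEN','PET','BAR','MAL',
             'SUT','RES','KOB','PER','BUE','ALG','KOV','TRU','RIC','LIU','GLO','AMB')
car_lab <- factor(car_num, labels = drivers)

# Hypothetical positions: the x-axis shows 24 evenly spaced categories, no gap
d <- data.frame(car = car_lab, pos = seq_along(car_num))
p <- ggplot(d, aes(x = car, y = pos)) +
  geom_point() +
  theme(axis.text.x = element_text(angle = -90, hjust = 0))
```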
Edit
A few notes in response to your comment. Many of your difficulties can be eased by a more streamlined use of ggplot. Your data is in an awkward format:
library(plyr)      # ddply()
library(reshape2)  # melt()
#Summarise so we can use geom_linerange rather than geom_step
d1 <- ddply(h,.(car),summarise,ymin = min(pos),ymax = max(pos))
#R has a special value for missing data; use it!
k$classification[k$classification == 'null'] <- NA
#as.character() first: if read.csv() made this a factor, as.integer() alone would return level codes
k$classification <- as.integer(as.character(k$classification))
#The other two data sets should be merged and converted to long format
d2 <- merge(l,k,by.x = "car",by.y = "driverNum")
colnames(d2)[3:5] <- c('End of Lap 1','Final Position','Grid Position')
d2 <- melt(d2,id.vars = 1:2)
#Now the plotting call is much shorter
ggplot() +
  geom_linerange(data = d1,aes(x= car, ymin = ymin,ymax = ymax)) +
  geom_point(data = d2,aes(x= car, y= value,shape = variable),size = 2) +
  ggtitle("F1 2011 Korea \nRace Summary Chart") +
  theme(axis.text.x=element_text(angle=-90, hjust=0)) +
  labs(x = NULL, y = "Position", shape = "")
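If plyr isn't available (it is no longer actively developed), the ddply() summary can be reproduced with base R's aggregate(). A sketch on hypothetical toy lap data (three cars, three laps), not the race data:

```r
library(ggplot2)

# Hypothetical toy lap data: position of each car at the end of each lap
h <- data.frame(
  car = rep(c("VET", "WEB", "HAM"), times = 3),
  lap = rep(1:3, each = 3),
  pos = c(1, 2, 3,  1, 3, 2,  2, 3, 1),
  stringsAsFactors = FALSE
)

# Base-R equivalent of ddply(h, .(car), summarise, ymin = min(pos), ymax = max(pos));
# FUN = range returns a two-column matrix (min, max) in the pos column
d1 <- aggregate(pos ~ car, data = h, FUN = range)
d1 <- data.frame(car = d1$car, ymin = d1$pos[, 1], ymax = d1$pos[, 2])

p <- ggplot(d1, aes(x = car, ymin = ymin, ymax = ymax)) +
  geom_linerange() +
  labs(x = NULL, y = "Position")
```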
A few notes. You were setting aesthetics to fixed values (size = 2) inside aes(); that should be done outside of aes(). aes() is for mapping variables (i.e. columns) to aesthetics (color, shape, size, etc.), which allows ggplot to create the legend for you intelligently.
Merging the second two data sets and then melting it creates a grouping variable for ggplot to use in the legend. I used the shape aesthetic since a few values overlap; using color may make that hard to spot. In general, ggplot will resist mixing aesthetics into a single legend. If you want to use shape, color and size you'll get three legends.
I prefer setting labels using labs, since you can do them all in one spot. Note that setting the aesthetic label to "" removes the legend title.
