ggplot2: Conditional formatting of x axis label in facet_grid - r

First post so please forgive any transgressions.
The data (russ_defensive) looks something like this:
[Image: russ_defensive dataframe]
And this code is meant to create a facet_grid of stacked bars conditionally filled by capitalisation and with axis.text.x colour set to red or black based on whether the industry is defensive or not
library(dplyr)
library(ggplot2)
chart_foo <- ggplot(data = russ_defensive, aes(x = industry)) +
facet_grid(~ sector, space = "free", scales="free") +
geom_bar(stat="count") + aes(fill = capitalisation) +
theme(axis.text.x = element_text(angle = 90, color = ifelse(russ_defensive$defensive_industries == "N", "red", "black")))
Around half of the industries are non-defensive (so russ_defensive$defensive_industries is "N"); however, this code only turns one of the labels red and gives the following warning:
Warning message:
Vectorized input to `element_text()` is not officially supported.
Results may be unexpected or may change in future versions of ggplot2.
Is there a simple fix for this, or an alternative method of conditionally formatting labels based on a column of the dataset?
Thanks for any help, if a reproducible dataset would be useful please let me know.

Well, as the warning tells you it is not recommended to choose the axis text colours by using vectorised theme input (although many people try nonetheless). I believe this was also one of the motivations behind the ggtext package, in which you can use markdown to stylise your text. Below, you'll find an example with a standard dataset, I hope it translates well to yours. We just conditionally apply colouring to some of the x-axis categories.
library(ggplot2)
library(ggtext)
#> Warning: package 'ggtext' was built under R version 4.0.3
df <- transform(mtcars, car = rownames(mtcars))[1:10,]
red_cars <- sample(df$car, 5)
df$car <- ifelse(
df$car %in% red_cars,
paste0("<span style='color:#FF0000'>", df$car, "</span>"),
df$car
)
ggplot(df, aes(car, mpg)) +
geom_col() +
theme(axis.text.x = element_markdown(angle = 90))
Created on 2021-02-03 by the reprex package (v1.0.0)
For more examples, see https://github.com/wilkelab/ggtext#markdown-in-theme-elements
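To connect this back to the question, here is a minimal sketch of the same ggtext idea with data shaped like russ_defensive. The data frame russ_demo and the helper column industry_label are made up for illustration; only the column names industry, sector, capitalisation and defensive_industries come from the question.

```r
library(ggplot2)
library(ggtext)

# Hypothetical stand-in for russ_defensive (values are invented)
russ_demo <- data.frame(
  industry = c("Banks", "Insurance", "Food", "Beverages", "Software"),
  sector = c("Financials", "Financials", "Staples", "Staples", "Technology"),
  capitalisation = c("Large", "Small", "Large", "Small", "Large"),
  defensive_industries = c("N", "N", "Y", "Y", "N")
)

# Wrap non-defensive industries in a red <span>; leave the others untouched
russ_demo$industry_label <- ifelse(
  russ_demo$defensive_industries == "N",
  paste0("<span style='color:#FF0000'>", russ_demo$industry, "</span>"),
  russ_demo$industry
)

# element_markdown() renders the <span> styling in the axis text
chart_demo <- ggplot(russ_demo, aes(x = industry_label, fill = capitalisation)) +
  geom_bar() +
  facet_grid(~ sector, space = "free", scales = "free") +
  theme(axis.text.x = element_markdown(angle = 90))
```

Because the label strings themselves change, the order of the categories within a facet may differ from the original plot.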

Related

ggplot2 - Add extra space between two legend items

I've created a ggplot2 graph using the basic code below:
my_df %>%
ggplot(aes(conv_norm, vot_norm, color = language:poa)) +
geom_smooth(method = "glm", se=FALSE) +
theme(
...
)
[I've left out the formatting commands from the theme() layer]
And I got a graph that looks like this:
Now, my question is: how can I add extra space only in between two legend items? I've looked online and have found ways to increase the spacing between all items in the legend, but I only want extra spacing between the English items and the Spanish items. Is there a way to add a 1-in distance between these language groups?
Well, I don't know of an elegant, simple solution to do what you are asking to do... but by working with how legends are drawn and adjusting some of the elements, we can come up with a really "hacky" solution. ;)
Here's a sample dataset that kind of simulates what you shared, along with the plot:
set.seed(12345)
my_df <- data.frame(
lang = rep(c(paste('English',1:3), paste('Spanish',1:3)),2),
x = c(rep(0,6), rep(1,6)),
y = rnorm(12, 10,2))
library(ggplot2)
p <- ggplot(my_df, aes(x,y, color=lang)) + geom_line()
p
The approach here is going to be to combine all the following individual steps:
Add a "blank" legend entry. We do this by refactoring and specifying the levels of the column my_df$lang to include a blank entry in the correct position. This will be the final order of the items in the legend.
Use scale_color_manual() to set the colors of the legend items manually. I make sure to use "NA" for the blank entry.
Within scale_color_manual() I use the drop=FALSE setting. This includes all levels for a factor, even if there is no data on the plot to show. This makes our blank entry show on the legend.
Use the legend.key theme element to draw transparent boxes for the legend key items. This is so that you don't have a white or gray box for that blank entry.
Putting it all together you get this:
my_df$lang <- factor(my_df$lang, levels=c(paste('English',1:3), '', paste('Spanish',1:3)))
ggplot(my_df, aes(x,y, color=lang)) +
geom_line() +
scale_color_manual(
values=c(rainbow(6)[1:3], 'NA', rainbow(6)[4:6]),
drop=FALSE) +
theme( legend.key = element_rect(fill='NA') )
Alternatively, you could use guides(color=guide_legend(override.aes... to set the colors, but you need the drop=FALSE part within scale_color_manual() to get the blank level to draw in the legend anyway.
Another option would be to create two separate legends, either by using two different aesthetics or by using color twice, e.g. with ggnewscale. Thanks to user chemdork123 for the fake data (+1).
library(tidyverse)
library(ggnewscale)
set.seed(12345)
my_df <- data.frame(
lang = rep(c(paste('English',1:3), paste('Spanish',1:3)),2),
x = c(rep(0,6), rep(1,6)),
y = rnorm(12, 10,2))
ggplot(mapping = aes(x,y)) +
geom_line(data = filter(my_df, grepl("English", lang)), aes(color=lang)) +
scale_color_brewer(NULL, palette = "Dark2") +
new_scale_colour() +
geom_line(data = filter(my_df, grepl("Spanish", lang)), aes(color=lang)) +
scale_color_brewer(palette = "Set1") +
guides(color = guide_legend(order = 1))
Created on 2021-04-11 by the reprex package (v1.0.0)

Plot zeros in a different color in ggplot geom_bar

In a ggplot (geom_bar), I'm looking to plot the zero-values in a different color.
Code for the bar-graph itself:
ggplot(Rodeococha, aes(x=Age ,y=Quantity)) +
geom_bar(color="dark red", stat = "identity")
And using the instructions for colouring specific values found on a different page I tried cutting my values into intervals and constructed:
ggplot(data= Rodeococha, aes(x= Age ,y= Quantity)) +
geom_bar(aes(colour = cut(qsec, c(-Inf,0,Inf))), stat = "identity") +
scale_colour_manual(name = "qsec", values = c("(-Inf,0]" = "black",
"(0,Inf]" = "red"))
At the moment it gives the error
Error in cut(qsec, c(-Inf, 0, Inf)) : object 'qsec' not found.
Before this error, it also gave a few other errors so instead of taking even more time tackling this one error I thought why not ask advice, maybe there is someone else with a better idea.
Edit: the answer from @Tjebo worked.
For clarification to others: the plot is actually a stacked plot with 7 x-axes each containing multiple bars. This code was just the first x-axis. Showing the zeros in a different color was to make interpretation easier.
Your code is not reproducible, I am therefore using another data set. First, bar graphs may not be appropriate here. It's difficult to show 'zeros' with bar graphs. I am increasing the line size in order to show the effect, and you will see that this has a quite undesired side effect.
For your question, just use a conditional statement as aesthetic. See below
If this is not what you want, provide better sample data and a desired output.
library(ggplot2)
ggplot(mtcars, aes(x= cyl,y= vs)) +
geom_bar(stat = "identity", size = 3, aes(color = vs == 0)) +
scale_colour_manual(name = "vs", values = c(`TRUE` = 'black',`FALSE` = "red"))
Created on 2020-03-30 by the reprex package (v0.3.0)
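As a rough sketch of how the conditional aesthetic might look with data shaped like the question's: the data frame rodeo_demo below is made up, and only the column names Age and Quantity come from the question.

```r
library(ggplot2)

# Hypothetical data standing in for Rodeococha (values are invented)
rodeo_demo <- data.frame(
  Age = c(100, 200, 300, 400, 500),
  Quantity = c(4, 0, 7, 0, 2)
)

# Map the outline colour to the logical condition Quantity == 0
zero_plot <- ggplot(rodeo_demo, aes(x = Age, y = Quantity)) +
  geom_col(aes(colour = Quantity == 0), fill = "grey40") +
  scale_colour_manual(
    name = "Quantity is zero",
    values = c(`TRUE` = "black", `FALSE` = "red")
  )
```

A bar for a zero value has no height, so only its outline can carry the colour, which is why the answer above increases the line size.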

ggplotly fails with geom_vline() with xintercept Date value

Trying to use ggplotly to graph time series data with a vertical line to indicate dates of interest.
Call fails with Error in Ops.Date(z[[xy]], 86400000) : * not defined for "Date" objects. I have tried unsuccessfully using both the latest CRAN and development versions of ggplot2 (as per plotly recommendation). Other SO questions (e.g., ggplotly and geom_bar when using dates - latest version of plotly (4.7.0)) do not address my concerns.
As illustrated below with plot object p - both ggplot and ggplotly work as expected. However, when a geom_vline() is added to the plot in p2, it only works correctly in ggplot, failing when calling ggplotly(p2).
library(plotly)
library(ggplot2)
library(magrittr)
set.seed(1)
df <- data.frame(date = seq(from = lubridate::ymd("2019-01-01"), by = 1, length.out = 10),
y = rnorm(10))
p <- df %>%
ggplot(aes(x = date, y = y)) +
geom_line()
p ## plots as expected
ggplotly(p) ## plots as expected
p2 <- p + geom_vline(xintercept = lubridate::ymd("2019-01-08"), linetype = "dashed")
p2 ## plots as expected
ggplotly(p2) ##fails
I just solved this using @Axeman's suggestion. In your case, you can just convert the date passed to xintercept to a number:
lubridate::ymd("2019-01-08")
becomes
as.numeric(lubridate::ymd("2019-01-08"))
Not pretty, but it works.
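A minimal sketch of why the conversion works, using base as.Date() instead of lubridate (an assumption made here only to keep the example dependency-free): a Date is stored as the number of days since 1970-01-01, and that number is what gets handed to xintercept.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(
  date = seq(as.Date("2019-01-01"), by = 1, length.out = 10),
  y = rnorm(10)
)

vline_date <- as.Date("2019-01-08")
vline_num <- as.numeric(vline_date)  # days since 1970-01-01

# The numeric intercept lands at the same position on the date axis
p2 <- ggplot(df, aes(x = date, y = y)) +
  geom_line() +
  geom_vline(xintercept = vline_num, linetype = "dashed")

# plotly::ggplotly(p2)  # uncomment if plotly is installed
```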
For future reference:
The pop-up window for vertical lines created via date (or POSIX*) to numeric conversions is rather blank. This is particularly relevant for POSIX* applications where the exact time can often not be read off directly.
In case you need more significant pop-up content, the definition of a text aesthetic could be helpful (just ignore the 'unknown aesthetics' warning as it doesn't seem to apply). Then, simply specify what you want to see during mouse hover via the tooltip argument, i.e. rule out xintercept, and you're all set.
p2 = p +
geom_vline(
aes(
xintercept = as.numeric(lubridate::ymd("2019-01-08"))
, text = "date: 2019-01-08"
)
, linetype = "dashed"
)
ggplotly(p2, tooltip = c("x", "y", "text"))

Plotting data with R. How to break coordinates [duplicate]

I want to make a bar plot where one of the values is much bigger than all other values. Is there a way of having a discontinuous y-axis? My data is as follows:
df <- data.frame(a = c(1,2,3,500), b = c('a1', 'a2','a3', 'a4'))
p <- ggplot(data = df, aes(x = b, y = a)) + geom_bar()
p <- p + opts(axis.text.x=theme_text(angle= 90, hjust=1)) + coord_flip()
p
Is there a way that I can make my axis run from 1- 10, then 490 - 500? I can't think of any other way of plotting the data (aside from transforming it, which I don't want to do)
[Edit 2019-05-06]:
8 years later, above code needs to be amended to work with version 3.1.1 of ggplot2 in order to create the same chart:
library(ggplot2)
ggplot(df) +
aes(x = b, y = a) +
geom_col() +
coord_flip()
As noted elsewhere, this isn't something that ggplot2 will handle well, since broken axes are generally considered questionable.
Other strategies are often considered better solutions to this problem. Brian mentioned a few (faceting, two plots focusing on different sets of values). One other option that people too often overlook, particularly for barcharts, is to make a table:
Looking at the actual values, the 500 doesn't obscure the differences in the other values! For some reason tables don't get enough respect as a data visualization technique. You might object that your data has many, many categories which becomes unwieldy in a table. If so, it's likely that your bar chart will have too many bars to be sensible as well.
And I'm not arguing for tables all the time. But they are definitely something to consider if you are making barcharts with relatively few bars. And if you're making barcharts with tons of bars, you might need to rethink that anyway.
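For what it's worth, the table idea can be sketched in a few lines of base R. The column names category and value are chosen here for illustration; the data is the df from the question.

```r
df <- data.frame(a = c(1, 2, 3, 500), b = c("a1", "a2", "a3", "a4"))

# Sort by value, largest first, and give the columns readable names
tab <- df[order(df$a, decreasing = TRUE), c("b", "a")]
names(tab) <- c("category", "value")

print(tab, row.names = FALSE)
```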
Finally, there is also the axis.break function in the plotrix package which implements broken axes. However, from what I gather you'll have to specify the axis labels and positions yourself, by hand.
Eight years later, the ggforce package offers a facet_zoom() extension which is an implementation of Hadley Wickham's suggestion to show two plots (as referenced in Brian Diggs' answer).
Zoom facet
library(ggforce)
ggplot(df) +
aes(x = b, y = a) +
geom_col() +
facet_zoom(ylim = c(0, 10))
Unfortunately, the current version 0.2.2 of ggforce throws an error with coord_flip() so only vertical bars can be shown.
The zoomed facet shows the variations of the small values but still contains the large - now cropped - a4 bar. The zoom.data parameter controls which values appear in the zoomed facet:
library(ggforce)
ggplot(df) +
aes(x = b, y = a) +
geom_col() +
facet_zoom(ylim = c(0, 10), zoom.data = ifelse(a <= 10, NA, FALSE))
Two plots
Hadley Wickham suggested
I think it's much more appropriate to show two plots - one of all the
data, and one of just the small values.
This code creates two plots
library(ggplot2)
g1 <- ggplot(df) +
aes(x = b, y = a) +
geom_col() +
coord_flip()
g2 <- ggplot(df) +
aes(x = b, y = a) +
geom_col() +
coord_flip() +
ylim(NA, 10)
which can be combined into one plot by
cowplot::plot_grid(g1, g2) # or ggpubr::ggarrange(g1, g2)
or
gridExtra::grid.arrange(g1, g2) # or egg::ggarrange(g1, g2)
Two facets
This was suggested in a comment by Chase and also by Brian Diggs in his answer who interpreted Hadley's suggestion to use
faceted plots, one with all the data, one zoomed in a particular region
but no code was supplied for this approach, so far.
As there is no simple way to scale facets separately (see related question, e.g.) the data needs to be manipulated:
library(dplyr)
library(ggplot2)
ggplot() +
aes(x = b, y = a) +
geom_col(data = df %>% mutate(subset = "all")) +
geom_col(data = df %>% filter(a <= 10) %>% mutate(subset = "small")) +
coord_flip() +
facet_wrap(~ subset, scales = "free_x")
No, not using ggplot. See the discussion in the thread at http://groups.google.com/group/ggplot2/browse_thread/thread/8d2acbfc59d2f247 where Hadley explains why it is not possible but gives a suggested alternative (faceted plots, one with all the data, one zoomed in a particular region).
Not with ggplot, but with plotrix you can easily do that:
library(plotrix)
gap.barplot(df$a, gap = c(5, 495), horiz = TRUE)
No, unfortunately not
The fear is that allowing discontinuous axes will lead to deceit of the audience. However, there are cases where not having a discontinuous axis leads to distortion.
For example, if the axis is truncated, but usually lies within some interval (say [0,1]), the audience may not notice the truncation and make distorted conclusions about the data. In this case, an explicit discontinuous axis would be more appropriate and transparent.
Compare:
One option is the ggbreak package, with its scale_y_cut() or scale_x_cut() functions. These cut the ggplot object into parts and let you specify which parts are zoomed in or zoomed out. Here is a reproducible example with the normal plot on the left and the plot using the function on the right:
df <- data.frame(a = c(1,2,3,500), b = c('a1', 'a2','a3', 'a4'))
library(ggplot2)
library(ggbreak)
library(patchwork)
p1 <- ggplot(df) +
aes(x = b, y = a) +
geom_col()
p2 <- ggplot(df) +
aes(x = b, y = a) +
geom_col() +
scale_y_cut(breaks=c(4, 30), which=c(1, 3), scales=c(0.5, 3))
p1 + p2
Created on 2022-08-22 with reprex v2.0.2
As you can see from the example, some parts are zoomed in and zoomed out. This can be changed by using different arguments.
Arguments used:
breaks: a numeric or numeric vector, the points to be divided
which: integer, the position of subplots to scales, started from left to right or top to bottom.
scales: numeric, relative width or height of subplots.
To change the space between the subplots, you can use the argument space.
For some extra information and examples check this tutorial.
A clever ggplot solution is provided by Jörg Steinkamp, using facet_grid. Simplified, it is something like this:
library("tidyverse")
df <- data.frame(myLetter=LETTERS[1:4], myValue=runif(12) + rep(c(4,0,0),2)) # cluster a few values well above 1
df$myFacet <- df$myValue > 3
(ggplot(df, aes(y=myLetter, x=myValue))
+ geom_point()
+ facet_grid(. ~ myFacet, scales="free", space="free")
+ scale_x_continuous(breaks = seq(0, 5, .25)) # this gives both facets equal interval spacing.
+ theme(strip.text.x = element_blank()) # get rid of the facet labels
)
As of 2022-06-01, we have the elegant-looking ggbreak package, which appears to answer the OP's question. Although I haven't tried it on my own data, it looks to be compatible with many or all other ggplot2 functionality. It also offers differential scaling, which may be useful for the OP's and similar use cases.
library(ggplot2)
library(ggbreak)
set.seed(2019-01-19)
d <- data.frame(x = 1:20,
y = c(rnorm(5) + 4, rnorm(5) + 20, rnorm(5) + 5, rnorm(5) + 22))
p1 <- ggplot(d, aes(y, x)) + geom_col(orientation="y") +
theme_minimal()
p1 + scale_x_break(c(7, 17), scales = 1.5) + scale_x_break(c(18, 21), scales=2)
I doubt there's anything off the shelf in R, but you could show the data as a series of 3D partial cubes. 500 is only 5*10*10, so it would scale well. The exact value could be a label.
This probably should only be used if you must have a graphic representation for some reason.
One strategy is to plot the axis on a log scale. This compresses the large values, because each factor of 10 takes up the same distance along the axis.
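A minimal sketch of the log-scale idea with the df from the question. Points are used instead of bars here, which is a choice made for this example: on a log10 axis the value 1 maps to 0, so a bar for a1 would have no height.

```r
library(ggplot2)

df <- data.frame(a = c(1, 2, 3, 500), b = c("a1", "a2", "a3", "a4"))

# scale_y_log10() transforms the y values before they are drawn
p_log <- ggplot(df, aes(x = b, y = a)) +
  geom_point(size = 3) +
  scale_y_log10()
```

Note that this does transform the data as displayed, which the question wanted to avoid, so it is an alternative rather than a true axis break.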

