I want to make a bar plot where one of the values is much bigger than all other values. Is there a way of having a discontinuous y-axis? My data is as follows:
df <- data.frame(a = c(1,2,3,500), b = c('a1', 'a2','a3', 'a4'))
p <- ggplot(data = df, aes(x = b, y = a)) + geom_bar()
p <- p + opts(axis.text.x=theme_text(angle= 90, hjust=1)) + coord_flip()
p
Is there a way that I can make my axis run from 1-10, then 490-500? I can't think of any other way of plotting the data (aside from transforming it, which I don't want to do).
[Edit 2019-05-06]:
Eight years later, the code above needs to be amended to work with version 3.1.1 of ggplot2 in order to create the same chart:
library(ggplot2)
ggplot(df) +
aes(x = b, y = a) +
geom_col() +
coord_flip()
As noted elsewhere, this isn't something that ggplot2 will handle well, since broken axes are generally considered questionable.
Other strategies are often considered better solutions to this problem. Brian mentioned a few (faceting, two plots focusing on different sets of values). One other option that people too often overlook, particularly for barcharts, is to make a table:
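A minimal sketch of such a table in base R, assuming the df from the question. The column names category and value and the descending sort order are illustrative choices, not part of the original post:

```r
# Same data as in the question
df <- data.frame(a = c(1, 2, 3, 500), b = c("a1", "a2", "a3", "a4"))

# Put the label first and sort by value, largest first
tab <- df[order(df$a, decreasing = TRUE), c("b", "a")]
names(tab) <- c("category", "value")

# Print without row names so only the two columns are shown
print(tab, row.names = FALSE)
```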
Looking at the actual values, the 500 doesn't obscure the differences in the other values! For some reason tables don't get enough respect as a data visualization technique. You might object that your data has many, many categories, which become unwieldy in a table. If so, it's likely that your bar chart will have too many bars to be sensible as well.
And I'm not arguing for tables all the time. But they are definitely something to consider if you are making barcharts with relatively few bars. And if you're making barcharts with tons of bars, you might need to rethink that anyway.
Finally, there is also the axis.break function in the plotrix package which implements broken axes. However, from what I gather you'll have to specify the axis labels and positions yourself, by hand.
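To illustrate what "by hand" means here, the following is a rough sketch using base graphics plus plotrix::axis.break(). The offset of 480, the tick positions and the break position are all assumptions chosen for this particular data set, and the bar for a4 itself is not visually interrupted, only the axis is:

```r
library(plotrix)

df <- data.frame(a = c(1, 2, 3, 500), b = c("a1", "a2", "a3", "a4"))

# Remove the empty stretch between 10 and 490 by shifting large values down.
# The offset is picked by hand for this data.
offset <- 480
shown <- ifelse(df$a > 10, df$a - offset, df$a)

# Draw the bars without the default axis, then label the axis manually
# so that the shifted positions carry the original values
barplot(shown, names.arg = as.character(df$b), axes = FALSE, ylim = c(0, 22))
axis(2, at = c(0, 5, 10, 15, 20), labels = c(0, 5, 10, 495, 500))

# Mark the discontinuity between the ticks for 10 and 495
axis.break(axis = 2, breakpos = 12.5, style = "slash")
```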
Eight years later, the ggforce package offers a facet_zoom() extension which is an implementation of Hadley Wickham's suggestion to show two plots (as referenced in Brian Diggs' answer).
Zoom facet
library(ggforce)
ggplot(df) +
aes(x = b, y = a) +
geom_col() +
facet_zoom(ylim = c(0, 10))
Unfortunately, the current version 0.2.2 of ggforce throws an error with coord_flip() so only vertical bars can be shown.
The zoomed facet shows the variations of the small values but still contains the large - now cropped - a4 bar. The zoom.data parameter controls which values appear in the zoomed facet:
library(ggforce)
ggplot(df) +
aes(x = b, y = a) +
geom_col() +
facet_zoom(ylim = c(0, 10), zoom.data = ifelse(a <= 10, NA, FALSE))
Two plots
Hadley Wickham suggested
I think it's much more appropriate to show two plots - one of all the
data, and one of just the small values.
This code creates two plots
library(ggplot2)
g1 <- ggplot(df) +
aes(x = b, y = a) +
geom_col() +
coord_flip()
g2 <- ggplot(df) +
aes(x = b, y = a) +
geom_col() +
coord_flip() +
ylim(NA, 10)
which can be combined into one plot by
cowplot::plot_grid(g1, g2) # or ggpubr::ggarrange(g1, g2)
or
gridExtra::grid.arrange(g1, g2) # or egg::ggarrange(g1, g2)
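Note that ylim() sets scale limits, so the a4 bar falls outside the range and is dropped (ggplot2 issues a warning about removed rows). If the cropped bar should stay visible instead, the limits can be set on the coordinate system. A small sketch of both variants, using the hypothetical names g_drop and g_zoom:

```r
library(ggplot2)

df <- data.frame(a = c(1, 2, 3, 500), b = c("a1", "a2", "a3", "a4"))

# Scale limits: a4 (500) lies outside the range and is removed from the plot
g_drop <- ggplot(df, aes(x = b, y = a)) +
  geom_col() +
  coord_flip() +
  ylim(NA, 10)

# Coordinate limits: the view is zoomed, a4 is kept and drawn cropped
g_zoom <- ggplot(df, aes(x = b, y = a)) +
  geom_col() +
  coord_flip(ylim = c(0, 10))
```

Either object can be passed to the plot-combining functions above in place of g2.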
Two facets
This was suggested in a comment by Chase and also by Brian Diggs in his answer who interpreted Hadley's suggestion to use
faceted plots, one with all the data, one zoomed in a particular region
but no code was supplied for this approach, so far.
As there is no simple way to scale facets separately (see related question, e.g.) the data needs to be manipulated:
library(dplyr)
library(ggplot2)
ggplot() +
aes(x = b, y = a) +
geom_col(data = df %>% mutate(subset = "all")) +
geom_col(data = df %>% filter(a <= 10) %>% mutate(subset = "small")) +
coord_flip() +
facet_wrap(~ subset, scales = "free_x")
No, not using ggplot. See the discussion in the thread at http://groups.google.com/group/ggplot2/browse_thread/thread/8d2acbfc59d2f247 where Hadley explains why it is not possible but gives a suggested alternative (faceted plots, one with all the data, one zoomed in a particular region).
Not with ggplot, but with plotrix you can easily do that:
library(plotrix)
gap.barplot(df$a, gap = c(5, 495), horiz = TRUE)
No, unfortunately not
The fear is that allowing discontinuous axes will lead to deceit of the audience. However, there are cases where not having a discontinuous axis leads to distortion.
For example, if the axis is truncated, but usually lies within some interval (say [0,1]), the audience may not notice the truncation and make distorted conclusions about the data. In this case, an explicit discontinuous axis would be more appropriate and transparent.
Compare:
One option is the ggbreak package with its scale_y_cut() or scale_x_cut() function. These functions cut the ggplot object into parts and let you specify which parts are zoomed in or out. Here is a reproducible example, with the normal plot on the left and the cut plot on the right:
df <- data.frame(a = c(1,2,3,500), b = c('a1', 'a2','a3', 'a4'))
library(ggplot2)
library(ggbreak)
library(patchwork)
p1 <- ggplot(df) +
aes(x = b, y = a) +
geom_col()
p2 <- ggplot(df) +
aes(x = b, y = a) +
geom_col() +
scale_y_cut(breaks=c(4, 30), which=c(1, 3), scales=c(0.5, 3))
p1 + p2
Created on 2022-08-22 with reprex v2.0.2
As you can see from the example, some parts are zoomed in and zoomed out. This can be changed by using different arguments.
Arguments used:
breaks: a numeric or numeric vector, the points to be divided
which: integer, the position of subplots to scales, started from left to right or top to bottom.
scales: numeric, relative width or height of subplots.
To change the space between the subplots, you can use the argument space.
For some extra information and examples check this tutorial.
A clever ggplot solution is provided by Jörg Steinkamp, using facet_grid. Simplified, it is something like this:
library("tidyverse")
df <- data.frame(myLetter=LETTERS[1:4], myValue=runif(12) + rep(c(4,0,0),2)) # cluster a few values well above 1
df$myFacet <- df$myValue > 3
(ggplot(df, aes(y=myLetter, x=myValue))
+ geom_point()
+ facet_grid(. ~ myFacet, scales="free", space="free")
+ scale_x_continuous(breaks = seq(0, 5, .25)) # this gives both facets equal interval spacing.
+ theme(strip.text.x = element_blank()) # get rid of the facet labels
)
As of 2022-06-01, we have the elegant-looking ggbreak package, which appears to answer the OP's question. Although I haven't tried it on my own data, it looks to be compatible with most or all other ggplot2 functionality. It also offers differential scaling, which may be useful for the OP's case and similar ones.
library(ggplot2)
library(ggbreak)
set.seed(2019-01-19)
d <- data.frame(x = 1:20,
y = c(rnorm(5) + 4, rnorm(5) + 20, rnorm(5) + 5, rnorm(5) + 22))
p1 <- ggplot(d, aes(y, x)) + geom_col(orientation="y") +
theme_minimal()
p1 + scale_x_break(c(7, 17), scales = 1.5) + scale_x_break(c(18, 21), scales=2)
I doubt there's anything off the shelf in R, but you could show the data as a series of 3D partial cubes. 500 is only 5*10*10, so it would scale well. The exact value could be a label.
This probably should only be used if you must have a graphic representation for some reason.
One strategy is to plot the values on a log scale. Each factor of 10 then takes up the same distance along the axis, so 500 no longer dwarfs the smaller values.
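A minimal sketch of the log-scale idea with ggplot2, using the data from the question. Points are used instead of bars here as an assumption of this sketch, because a bar needs a zero baseline and zero has no position on a log axis:

```r
library(ggplot2)

df <- data.frame(a = c(1, 2, 3, 500), b = c("a1", "a2", "a3", "a4"))

# Positions on the axis are log10(a); tick labels still show original values
p_log <- ggplot(df, aes(x = b, y = a)) +
  geom_point(size = 3) +
  scale_y_log10() +
  coord_flip()

p_log
```

Keep in mind that this is a transformation of the data, which the question explicitly wanted to avoid.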
Related
I want to make a number of symmetrical histograms to show butterfly abundance through time. Here's a site that shows the form of the graphs I am trying to create: http://thebirdguide.com/pelagics/bar_chart.htm
For ease, I will use the iris dataset.
library(ggplot2)
g <- ggplot(iris, aes(Sepal.Width)) + geom_histogram(binwidth=.5)
g + coord_fixed(ratio = .003)
Essentially, I would like to mirror this histogram below the x-axis. Another way of thinking about the problem is to create a horizontal violin diagram with distinct bins. I've looked at the plotrix package and the ggplot2 documentation but don't find a solution in either place. I prefer to use ggplot2 but other solutions in base R, lattice or other packages will be fine.
Without your exact data, I can only provide an approximate coding solution, but it is a start for you (if you add more details, I'll be happy to help you tweak the plot). Here's the code:
library(ggplot2)
noSpp <- 3
nTime <- 10
d <- data.frame(
JulianDate = rep(1:nTime , times = noSpp),
sppAbundance = c(c(1:5, 5:1),
c(3:5, 5:1, 1:2),
c(5:1, 1:5)),
yDummy = 1,
sppName = rep(letters[1:noSpp], each = nTime))
ggplot(data = d, aes(x = JulianDate, y = yDummy, size = sppAbundance)) +
geom_line() + facet_grid( sppName ~ . ) + ylab("Species") +
xlab("Julian Date")
And here's the figure.
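The mirrored histogram described in the question can also be sketched more literally, assuming the iris example from the question. This is not part of the answer above: the bin edges and the names h and bins are choices made for this sketch. The counts are computed with hist() and each bin is drawn as a rectangle centred on zero:

```r
library(ggplot2)

# Bin the data without plotting; edges chosen to cover the range of Sepal.Width
h <- hist(iris$Sepal.Width, breaks = seq(2, 4.5, by = 0.5), plot = FALSE)

# One row per bin: left edge, right edge and count
bins <- data.frame(
  xmin  = head(h$breaks, -1),
  xmax  = tail(h$breaks, -1),
  count = h$counts
)

# Draw each bin symmetrically around y = 0 so the total height equals the count
ggplot(bins) +
  geom_rect(aes(xmin = xmin, xmax = xmax,
                ymin = -count / 2, ymax = count / 2)) +
  labs(x = "Sepal.Width", y = NULL)
```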
I'd like to use ggplot2's stat_binhex() to simultaneously plot two independent variables on the same chart, each with its own color gradient using scale_colour_gradientn().
If we disregard the fact that the x-axis units do not match, a reproducible example would be to plot the following in the same image while maintaining separate fill gradients.
d <- ggplot(diamonds, aes(x=carat,y=price))+
stat_binhex(colour="white",na.rm=TRUE)+
scale_fill_gradientn(colours=c("white","blue"),name = "Frequency",na.value=NA)
try(ggsave(plot=d,filename=<some file>,height=6,width=8))
d <- ggplot(diamonds, aes(x=depth,y=price))+
stat_binhex(colour="white",na.rm=TRUE)+
scale_fill_gradientn(colours=c("yellow","black"),name = "Frequency",na.value=NA)
try(ggsave(plot=d,filename=<some other file>,height=6,width=8))
I found some conversation of a related issue in ggplot2 google groups here.
Here is another possible solution: I have taken @mnel's idea of mapping bin count to alpha transparency, and I have transformed the x-variables so they can be plotted on the same axes.
library(ggplot2)
# Transforms range of data to 0, 1.
rangeTransform = function(x) (x - min(x)) / (max(x) - min(x))
dat = diamonds
dat$norm_carat = rangeTransform(dat$carat)
dat$norm_depth = rangeTransform(dat$depth)
p1 = ggplot(data=dat) +
theme_bw() +
stat_binhex(aes(x=norm_carat, y=price, alpha=..count..), fill="#002BFF") +
stat_binhex(aes(x=norm_depth, y=price, alpha=..count..), fill="#FFD500") +
guides(fill="none", alpha="none") +
xlab("Range Transformed Units")
ggsave(plot=p1, filename="plot_1.png", height=5, width=5)
Thoughts:
I tried (and failed) to display a sensible color/alpha legend. Seems tricky, but should be possible given all the legend-customization features of ggplot2.
X-axis unit labeling needs some kind of solution. Plotting two sets of units on one axis is frowned upon by many, and ggplot2 has no such feature.
Interpretation of cells with overlapping colors seems clear enough in this example, but could get very messy depending on the datasets used, and the chosen colors.
If the two colors are additive complements, then wherever they overlap equally you will see a neutral gray. Where the overlap is unequal, the gray would shift to more yellow, or more blue. My colors are not quite complements, judging by the slightly pink hue of the gray overlap cells.
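As a rough check of the complement idea, the sketch below inverts each RGB channel of a hex colour in base R. Treating channel-wise inversion as "the additive complement" is an assumption of this sketch, and complement() and blend_equal() are hypothetical helper names:

```r
# Channel-wise inverse of a hex colour (255 minus each of R, G, B)
complement <- function(hex) {
  v <- col2rgb(hex)[, 1]
  rgb(255 - v["red"], 255 - v["green"], 255 - v["blue"], maxColorValue = 255)
}

# Equal-weight average of two hex colours, channel by channel
blend_equal <- function(hex1, hex2) {
  m <- round((col2rgb(hex1)[, 1] + col2rgb(hex2)[, 1]) / 2)
  rgb(m["red"], m["green"], m["blue"], maxColorValue = 255)
}

complement("#002BFF")              # complement of the blue used above
blend_equal("#002BFF", "#FFD500")  # equal mix of the two fill colours
```

Under this definition the exact complement of #002BFF is #FFD400, one step away from the #FFD500 used in the plot. Note that layered alpha blending in a plot is not the same as an equal-weight average, so the on-screen overlap colour can still differ.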
I think what you want goes against the principles of ggplot2 and the grammar of graphics approach more generally. Until the issue is addressed (for which I would not hold my breath), you have a couple of choices
Use facet_wrap and alpha
This will not produce a nice legend, but it takes you some way toward what you want.
You can set the alpha value to scale by the computed frequency, accessed by ..count..
I don't think you can merge the legends nicely though.
library(reshape2)
# in long format
dm <- melt(diamonds, measure.var = c('depth','carat'))
ggplot(dm, aes(y = price, fill = variable, x = value)) +
facet_wrap(~variable, ncol = 1, scales = 'free_x') +
stat_binhex(aes(alpha = ..count..), colour = 'grey80') +
scale_alpha(name = 'Frequency', range = c(0,1)) +
theme_bw() +
scale_fill_manual('Variable', values = setNames(c('darkblue','yellow4'), c('depth','carat')))
Use gridExtra with grid.arrange or arrangeGrob
You can create separate plots and use gridExtra::grid.arrange to arrange on a single image.
d_carat <- ggplot(diamonds, aes(x=carat,y=price))+
stat_binhex(colour="white",na.rm=TRUE)+
scale_fill_gradientn(colours=c("white","blue"),name = "Frequency",na.value=NA)
d_depth <- ggplot(diamonds, aes(x=depth,y=price))+
stat_binhex(colour="white",na.rm=TRUE)+
scale_fill_gradientn(colours=c("yellow","black"),name = "Frequency",na.value=NA)
library(gridExtra)
grid.arrange(d_carat, d_depth, ncol =1)
If you want this to work with ggsave (thanks to @bdemarest's comment below and @baptiste),
replace grid.arrange with arrangeGrob, something like:
ggsave(plot=arrangeGrob(d_carat, d_depth, ncol=1), filename="plot_2.pdf", height=12, width=8)