How can I flip and then zoom in on a boxplot? - r

Consider the following code:
library(ggplot2)
ggplot(diamonds, aes("", price)) + geom_boxplot() + coord_flip()
After flipping the box plot, how can I zoom in to c(0,7000) on price (which is the new x-axis)?
I feel like it has something to do with coord_cartesian(ylim=c(0, 7000)), but this doesn't seem to work in conjunction with coord_flip().

Here is my solution:
ggplot(diamonds, aes("", price)) +
  geom_boxplot() +
  coord_flip(ylim = c(0, 7000))
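Just pass ylim as an argument to coord_flip(). Here ylim still refers to price, the variable mapped to y, even though it is drawn on the horizontal axis after the flip.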

You can use scale_y_continuous():
library(ggplot2)
ggplot(diamonds, aes("", price)) +
  geom_boxplot() +
  coord_flip() +
  scale_y_continuous(limits = c(0, 7000))
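Remember that coord_flip() just rotates the plot, so you call scale_y_continuous() on the y axis, which is the aesthetic price is mapped to. I usually like to call it last for that reason, to limit confusion over which axis is which. Note that scale limits remove observations outside the range before the boxplot statistics are computed, so the box here describes only prices up to 7000 rather than a zoomed view of the full data.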
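One way to check how coord_flip(ylim = ...) and scale limits differ is to compare the median that ggplot2 computes in each case. This is only a sketch: it assumes layer_data() exposes the computed boxplot statistics with a `middle` column (the median), and the object names are illustrative.

```r
library(ggplot2)

base_plot <- ggplot(diamonds, aes("", price)) + geom_boxplot()

# Zoom only: the coordinate limits change the visible window,
# the statistics are still computed on all prices.
zoomed <- base_plot + coord_flip(ylim = c(0, 7000))

# Scale limits: prices above 7000 are dropped before the statistics are computed.
clipped <- base_plot + coord_flip() + scale_y_continuous(limits = c(0, 7000))

median_zoomed <- layer_data(zoomed)$middle
# The clipped plot warns about removed rows; the warning is silenced here on purpose.
median_clipped <- suppressWarnings(layer_data(clipped)$middle)

median_zoomed   # median of all prices
median_clipped  # median of the prices that survived the scale limits
```

If the two medians differ, the scale-limit version is summarising a subset of the data, not zooming.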

I think you need to manually compute the boxplot statistics and plot these.
# Compute summary statistics with max (y100) set to cutoff (7000)
df <- data.frame(x = 1,
                 y0 = min(diamonds$price),
                 y25 = quantile(diamonds$price, 0.25),
                 y50 = median(diamonds$price),
                 y75 = quantile(diamonds$price, 0.75),
                 y100 = 7000)
ggplot(df, aes(x)) +
  geom_boxplot(aes(ymin = y0, lower = y25, middle = y50, upper = y75, ymax = y100),
               stat = "identity") +
  coord_flip()
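The manual approach above uses the minimum and a fixed cutoff as whisker ends. As a variation, the five statistics could instead come from base R's boxplot.stats(), which places the whiskers at the most extreme points within 1.5 times the interquartile range of the hinges. The helper name boxplot_summary() is hypothetical, and the hinges it returns can differ slightly from quantile().

```r
library(ggplot2)

# Hypothetical helper: one-row data frame with the five boxplot statistics.
boxplot_summary <- function(values) {
  # stats = lower whisker, lower hinge, median, upper hinge, upper whisker
  s <- grDevices::boxplot.stats(values)$stats
  data.frame(x = 1, y0 = s[1], y25 = s[2], y50 = s[3], y75 = s[4], y100 = s[5])
}

df <- boxplot_summary(diamonds$price)

# Draw the precomputed box, then zoom with the coordinate system.
p <- ggplot(df, aes(x)) +
  geom_boxplot(aes(ymin = y0, lower = y25, middle = y50, upper = y75, ymax = y100),
               stat = "identity") +
  coord_flip(ylim = c(0, 7000))
```

Call print(p) to draw the plot. Outliers are not drawn in this sketch because only the five summary values are passed on.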

Related

How to flip a geom_area to be under the line when using scale_y_reverse()

I had to flip the axis of my line, but still need the geom_area to be under the curve. However I cannot figure out how to do so.
This is the line of code I tried
ggplot(PalmBeachWell, aes(x=Date, y=Depth.to.Water.Below.Land.Surface.in.ft.)) +
geom_area(position= "identity", fill='lightblue') +
theme_classic() +
geom_line(color="blue") +
scale_y_reverse()
and here is what I got:
One option would be to use geom_ribbon() to fill the area above the curve, which, after applying scale_y_reverse(), results in a fill under the curve.
Using some fake example data based on the ggplot2::economics dataset:
library(ggplot2)
PalmBeachWell <- economics[c("date", "psavert")]
names(PalmBeachWell) <- c("Date", "Depth.to.Water.Below.Land.Surface.in.ft.")
ggplot(PalmBeachWell, aes(x = Date, y = Depth.to.Water.Below.Land.Surface.in.ft.)) +
geom_ribbon(aes(ymin = Depth.to.Water.Below.Land.Surface.in.ft., ymax = Inf),
fill = "lightblue"
) +
geom_line(color = "blue") +
scale_y_reverse() +
theme_classic()

Calculate axis tick locations based on data in faceted plot

I have an issue where I would like to calculate the locations of y-axis labels in a large plot made with facet_grid(). Let me show you what I mean using the mpg dataset.
require(tidyverse)
ggplot(mpg, aes(x = displ)) +
geom_point(aes(y = hwy)) +
facet_grid(drv ~ class, scales = "free")
You will notice that both axes use variable numbers of labels. Ignoring problems with the x-axis, I am interested in labelling only three values on the y-axis: 0, half of max value, and max value. It makes sense in my use-case, so here is what I tried.
ggplot(mpg, aes(x = displ)) +
geom_point(aes(y = hwy)) +
facet_grid(drv ~ class, scales = "free") +
geom_blank(aes(y = 0)) + # extends y-axis to 0
scale_y_continuous(expand = expansion(mult = c(0, 0.1)), # prevents ggplot2 from extending beyond y = 0
n.breaks = 3) # Three axis labels, please.
The plot correctly starts at y = 0 and labels it correctly. However, the remaining labels are poorly assigned: some panels get 2 or 4 labels instead of 3, for some reason.
If I could only directly calculate the label positions!
ggplot(mpg, aes(x = displ)) +
geom_point(aes(y = hwy)) +
facet_grid(drv ~ class, scales = "free") +
geom_blank(aes(y = 0)) +
scale_y_continuous(expand = expansion(mult = c(0, 0.1)),
n.breaks = 3,
breaks = c(0, 0.5*max(hwy), 1*max(hwy))) # Maybe these formulas work?
Error in check_breaks_labels(breaks, labels) : object 'hwy' not found
I believe providing break points this way should work, but my syntax is wrong. How do I access and work with the data underlying the plot? Alternatively, if this doesn't work, can I manually specify y-axis labels for each row of panels?
I could really use some assistance here, please.
If you want custom rules for breaks, the easiest thing is to use a function implementing those rules given the (panel) limits.
Below an example for labeling 0, the max and half-max.
library(ggplot2)
ggplot(mpg, aes(x = displ)) +
geom_point(aes(y = hwy)) +
facet_grid(drv ~ class, scales = "free") +
scale_y_continuous(expand = expansion(mult = c(0, 0.1)),
limits = c(0, NA), # <- fixed min, flexible max. replaces geom_blank
breaks = function(x){c(x[1], mean(x), x[2])})
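To see what such a breaks function does, it can be called directly on a pair of limits outside of any plot. The name half_max_breaks() is hypothetical, and the limits ggplot2 actually passes in for each panel may differ from the raw data range depending on the expansion settings.

```r
# Hypothetical helper: `limits` is a length-2 numeric vector (lower, upper).
half_max_breaks <- function(limits) {
  c(limits[1], mean(limits), limits[2])
}

# With a lower limit of 0, the midpoint is half of the upper limit.
half_max_breaks(c(0, 44))
#> [1]  0 22 44
```

The same function can be supplied as breaks = half_max_breaks in scale_y_continuous().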
You can remove scales = "free". Does that give you what you want?
require(tidyverse)
ggplot(mpg, aes(x = displ)) +
geom_point(aes(y = hwy)) +
facet_grid(drv ~ class)

R ggplot Line With Vertical Bands

I wish to create a plot like the one above with data such as this:
data1=data.frame("School"=c(1,2,3,4,5,6,7,8,9,10),
"Score"=c(80,64,79,64,64,89,69,71,61,98),
"ScoreLow"=c(65,62,62,60,60,84,54,55,55,69),
"ScoreHigh"=c(98,79,85,97,88,95,97,90,79,99))
The blue line is 'Score', with Score on the y-axis and 'School' on the x-axis. The length of each black line is determined by 'ScoreLow' and 'ScoreHigh'.
geom_errorbar would also work, in case you want to add some ticks at the edges (or leave them out, setting width=0, as below):
library(ggplot2)
data1=data.frame("School"=c(1,2,3,4,5,6,7,8,9,10),
"Score"=c(80,64,79,64,64,89,69,71,61,98),
"ScoreLow"=c(65,62,62,60,60,84,54,55,55,69),
"ScoreHigh"=c(98,79,85,97,88,95,97,90,79,99))
ggplot(data1, aes(x=School, y=Score)) + geom_line(colour="#507bc7", size=2)+
geom_errorbar(aes(ymin=ScoreLow, ymax=ScoreHigh), width=0, col="black", size=1.5) +
theme_minimal()
Created on 2020-04-10 by the reprex package (v0.3.0)
I think you are looking for a combination of geom_line() and geom_segment().
library(ggplot2)
ggplot(data1) +
geom_line(aes(x = School, y = Score), color = "blue", size = 1.5) +
geom_segment(aes(x = School, xend = School, y = ScoreLow, yend = ScoreHigh), size = 2) +
scale_x_continuous(breaks = 1:10) +
scale_y_continuous(limits = c(0, 100), breaks = 0:10 * 10) +
theme_minimal()
You will probably need to play around with it a bit to get it how you want it.

How to break axis in R/ggplot2? [duplicate]

I'm generating plots for some data, but the number of ticks is too small and I need more precision when reading values off the axis.
Is there some way to increase the number of axis ticks in ggplot2?
I know I can tell ggplot to use a vector as axis ticks, but what I want is to increase the number of ticks, for all data. In other words, I want the tick number to be calculated from the data.
Possibly ggplot does this internally with some algorithm, but I couldn't find how it does it, so that I could change it to what I want.
You can override ggplot's default scales by modifying scale_x_continuous and/or scale_y_continuous. For example:
library(ggplot2)
dat <- data.frame(x = rnorm(100), y = rnorm(100))
ggplot(dat, aes(x,y)) +
geom_point()
Gives you this:
And overriding the scales can give you something like this:
ggplot(dat, aes(x,y)) +
geom_point() +
scale_x_continuous(breaks = round(seq(min(dat$x), max(dat$x), by = 0.5),1)) +
scale_y_continuous(breaks = round(seq(min(dat$y), max(dat$y), by = 0.5),1))
If you want to simply "zoom" in on a specific part of a plot, look at xlim() and ylim() respectively. Good insight can also be found here to understand the other arguments as well.
Based on Daniel Krizian's comment, you can also use the pretty_breaks function from the scales package, which is installed together with ggplot2, so scales:: works without an extra library() call:
ggplot(dat, aes(x,y)) + geom_point() +
scale_x_continuous(breaks = scales::pretty_breaks(n = 10)) +
scale_y_continuous(breaks = scales::pretty_breaks(n = 10))
All you have to do is insert the number of ticks wanted for n.
A slightly less useful solution (since you have to specify the data variable again) is to use the built-in pretty function:
ggplot(dat, aes(x,y)) + geom_point() +
scale_x_continuous(breaks = pretty(dat$x, n = 10)) +
scale_y_continuous(breaks = pretty(dat$y, n = 10))
You can supply a function argument to scale, and ggplot will use that function to calculate the tick locations.
library(ggplot2)
dat <- data.frame(x = rnorm(100), y = rnorm(100))
number_ticks <- function(n) {function(limits) pretty(limits, n)}
ggplot(dat, aes(x,y)) +
geom_point() +
scale_x_continuous(breaks=number_ticks(10)) +
scale_y_continuous(breaks=number_ticks(10))
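Since number_ticks() above simply forwards to base R's pretty(), it helps to see what pretty() returns on its own. The point of this small sketch is that n is a target rather than an exact count: pretty() picks round step sizes, so the number of breaks can differ from n.

```r
# Asking for about 5 intervals over 0..100 yields a step of 20.
five <- pretty(c(0, 100), n = 5)
five
#> [1]   0  20  40  60  80 100

# Asking for about 10 intervals yields a step of 10.
ten <- pretty(c(0, 100), n = 10)
ten
#> [1]   0  10  20  30  40  50  60  70  80  90 100

# Same closure as in the answer above, called directly on a pair of limits.
number_ticks <- function(n) {function(limits) pretty(limits, n)}
number_ticks(5)(c(0, 100))
```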
Starting from v3.3.0, ggplot2 has an n.breaks option that automatically generates breaks for scale_x_continuous and scale_y_continuous:
library(ggplot2)
plt <- ggplot(mtcars, aes(x = mpg, y = disp)) +
geom_point()
plt +
scale_x_continuous(n.breaks = 5)
plt +
scale_x_continuous(n.breaks = 10) +
scale_y_continuous(n.breaks = 10)
Additionally,
ggplot(dat, aes(x,y)) +
geom_point() +
scale_x_continuous(breaks = seq(min(dat$x), max(dat$x), by = 0.05))
This works for binned or discretely scaled x-axis data (i.e., rounding is not necessary).
A reply to this question and How set labels on the X and Y axises by equal intervals in R ggplot?
library(ggplot2)
library(magrittr) # provides the %>% pipe
mtcars %>%
ggplot(aes(mpg, disp)) +
geom_point() +
geom_smooth() +
scale_y_continuous(limits = c(0, 500),
breaks = seq(0,500,50)) +
scale_x_continuous(limits = c(0,40),
breaks = seq(0,40,5))

R - ggplot2: geom_area loses its fill if limits are defined to max and min values from a data.frame

I am trying to reproduce a sparkline with ggplot2 like the one at the bottom of this image:
Using the following code I get the result displayed at the end of the code.
Note: My actual data.frame has only 2 rows. Therefore the result looks like a single line.
# Create sparkline for MM monthly
# sparkline(dailyMM2.aggregate.monthly$Count, type = 'line')
p <- ggplot(dailyMM2.aggregate.monthly, aes(x=seq(1:nrow(dailyMM2.aggregate.monthly)), y=Count)) +
geom_area(fill="#83CAF5") +
geom_line(color = "#2C85BB", size = 1.5) +
scale_x_continuous(expand=c(0,0)) +
scale_y_continuous(expand=c(0,0))
p + theme(axis.line=element_blank(),axis.text.x=element_blank(),
axis.text.y=element_blank(),axis.ticks=element_blank(),
axis.title.x=element_blank(),
axis.title.y=element_blank(),legend.position="none",
panel.background=element_blank(),panel.border=element_blank(),panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),plot.background=element_blank())
However, since I only want to show trends with the sparkline, absolute values aren't relevant to me, so I need to limit the visible area to the range between the min and max of my y-axis. I do this using the limits option:
# Create sparkline for MM monthly
# sparkline(dailyMM2.aggregate.monthly$Count, type = 'line')
p <- ggplot(dailyMM2.aggregate.monthly, aes(x=seq(1:nrow(dailyMM2.aggregate.monthly)), y=Count)) +
geom_area(fill="#83CAF5") +
geom_line(color = "#2C85BB", size = 1.5) +
scale_x_continuous(expand=c(0,0)) +
scale_y_continuous(expand=c(0,0), limits = c(min(dailyMM2.aggregate.monthly$Count)-100, max(dailyMM2.aggregate.monthly$Count)+100))
p + theme(axis.line=element_blank(),axis.text.x=element_blank(),
axis.text.y=element_blank(),axis.ticks=element_blank(),
axis.title.x=element_blank(),
axis.title.y=element_blank(),legend.position="none",
panel.background=element_blank(),panel.border=element_blank(),panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),plot.background=element_blank())
However, the result is not as expected: the whole geom_area fill disappears, as shown in the following image:
Can anyone shed light on why this behaviour is happening and maybe help me with a proper way to solve this problem?
If you check ?geom_area you will note that the minimum is fixed to 0. When the y limits exclude 0, that baseline falls outside the scale range and is dropped, which is why the fill disappears. It might be easier to use geom_ribbon, which has a ymin aesthetic. Set the maximum y value using limits or coord_cartesian.
library(reshape2)
library(ggplot2)
# Some data
df=data.frame(year = rep(2010:2014, each = 4),
quarter=rep(c("Q1","Q2","Q3","Q4"),5),
da=c(46,47,51,50,56.3,53.6,55.8,58.9,61.0,63,58.8,62.5,59.5,61.7,60.6,63.9,68.4,62.2,62,70.4))
df.m <- melt(data = df, id.vars = c("year", "quarter"))
ymin <- min(df.m$value)
ymax <- max(df.m$value)
ggplot(data = df.m, aes(x = interaction(quarter,year), ymax = value, group = variable)) +
geom_ribbon(aes(ymin = ymin), fill = "#83CAF5") +
geom_line(aes(y = value), size = 1.5, colour = "#2C85BB") +
coord_cartesian(ylim = c(ymin, ymax)) +
scale_y_continuous(expand = c(0,0)) +
scale_x_discrete(expand = c(0,0)) +
theme_void()
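The disappearing fill can be reproduced without any plot. This is a minimal sketch, assuming the scale uses its default out-of-bounds handling, which censors values outside the limits; scales::censor() implements that behaviour, and the numbers are made up for illustration.

```r
library(scales)

# Made-up y limits that do not include 0, as in the sparkline question.
y_limits <- c(100, 300)

# geom_area's baseline (0) plus two in-range values.
values <- c(0, 150, 250)

# Out-of-range values become NA, so the baseline of the area is lost.
censored <- censor(values, range = y_limits)
censored
#> [1]  NA 150 250
```

coord_cartesian(ylim = ...) avoids this because it changes only the visible window and leaves the data untouched.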
