I want to draw boxplots with the number of observations on top. The problem is that depending on the information and the outliers, the y-axis changes. For that reason, I want to change the limits of scale_y_continuous automatically. Is it possible to do this?
This is a reproducible example:
library(dplyr)
library(ggplot2)
myFreqs <- mtcars %>%
group_by(cyl, am) %>%
summarise(Freq = n())
myFreqs
p <- ggplot(mtcars, aes(factor(cyl), drat, fill=factor(am))) +
stat_boxplot(geom = "errorbar") +
geom_boxplot() +
stat_summary(geom = 'text', label = paste("n = ", myFreqs$Freq), fun = max, position = position_dodge(width = 0.77), vjust=-1)
p
The idea is to extend the upper y-axis limit by some margin (at least 1) above the maximum value of the tallest boxplot (in the example above, the second boxplot, with n = 8), so that the label is not cut off.
I have tried to change the y-axis with scale_y_continuous like this:
p <- p + scale_y_continuous(limits = c(0, 5.3))
p
However, I don't want to put the limits myself, I want to find a way to modify the limits according to the plots that I have. (Because... what if the information changes?).
Is there a way to do something like this? With min and max --> scale_y_continuous(limits = c(min(x), max(x)))
Thanks very much in advance
Thanks to @teunbrand and @caldwellst I got the solution that I needed.
There are 3 solutions that work perfectly:
1-
p + scale_y_continuous(limits = function(x) {
  c(min(x), max(x) + 0.1)
})
2-
library(tidyverse)
p + scale_y_continuous(limits = ~ c(min(.x), max(.x) + 0.1))
3-
p + scale_y_continuous(limits = function(x){
c(min(x), ceiling(max(x) * 1.1))
})
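A minimal sketch of the same idea with the margin expressed as a fraction of the data range, so it does not depend on the units of y. The helper name pad_upper and its frac argument are made up for this illustration; the only ggplot2 feature relied on is that limits accepts a function of the automatic limits, as in the solutions above.

```r
library(ggplot2)

# Hypothetical helper: keep the lower limit and raise the upper limit by a
# fraction of the data range, leaving room for labels above the tallest box.
pad_upper <- function(x, frac = 0.1) {
  c(min(x), max(x) + frac * diff(range(x)))
}

p <- ggplot(mtcars, aes(factor(cyl), drat, fill = factor(am))) +
  geom_boxplot() +
  scale_y_continuous(limits = pad_upper)  # called with the automatic limits
p
```

If only extra headroom is needed, expand = expansion(mult = c(0.05, 0.15)) in scale_y_continuous() (ggplot2 3.3.0 or later) should give a similar result without touching the limits.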
I'm looking to add a legend to my plot. For the moment, the code I wrote is:
library(effects)
plot(allEffects(covid.lm, residuals = TRUE), # plot with countries on graph
     band.colors = "grey2",
     residuals.color = adjustcolor("steelblue3", alpha.f = 0.5),
     residuals.pch = 16, smooth.residuals = FALSE,
     id = list(n = length(d$COUNTRY), cex = 0.5))
Basically, I added numbers to the points in the plot (for which I created a linear model, covid.lm). Now I need to add a legend for those points, that is, a list of the countries the numbers refer to. Thanks in advance.
library(ggplot2)
data(iris)
m <- lm(Petal.Length ~ Sepal.Length, data = iris)
iris$Fitted <- predict(m)
iris$Species_num <- as.numeric(iris$Species)
ggplot(iris, aes(x = Petal.Length, y = Fitted)) +
geom_point(aes(color = as.factor(Species_num))) +
geom_text(aes(label = Species_num), hjust = 1.1, vjust = 1.1) +
labs(title = "Residuals", x = "Observed", y = "Fitted") +
guides(color=guide_legend(title="New Legend Title"))
Created on 2022-04-08 by the reprex package (v2.0.1)
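If the legend should also say which number stands for which name (the list of countries in the question), one option is to build the legend labels from the factor levels. This is only a sketch on iris, with Species standing in for the country column; the names d, key and the "number = name" label format are my own choices.

```r
library(ggplot2)

m <- lm(Petal.Length ~ Sepal.Length, data = iris)
d <- transform(iris,
               Fitted = predict(m),
               Species_num = as.numeric(Species))

# One legend entry per number, e.g. "1 = setosa"
key <- paste(seq_along(levels(d$Species)), levels(d$Species), sep = " = ")

p <- ggplot(d, aes(x = Petal.Length, y = Fitted)) +
  geom_point(aes(color = factor(Species_num))) +
  geom_text(aes(label = Species_num), hjust = 1.1, vjust = 1.1) +
  scale_color_discrete(name = "Species", labels = key)
p
```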
I want to give the points relating to the first 130 x-axis values a different colour than the rest (up to 250). So basically, divide the points vertically with two different colours. Is this possible and how would you go about it?
Welcome to SO!
I would use ggplot2
Here are some examples:
library(ggplot2)
ggplot(mtcars,aes(hp,mpg,color = mpg < 20)) +
geom_point()
ggplot(mtcars,aes(hp,mpg,color = mpg < 20)) +
geom_point() +
theme(legend.position = 'none')
ggplot(mtcars,aes(hp,mpg,color = mpg < 20)) +
geom_point() +
labs(color = 'mpg less than 20')
ggplot(mtcars,aes(hp,mpg,color = mpg < 20)) +
geom_point() +
scale_color_manual(values = c('purple4','springgreen4'))
Good luck!
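The examples above split on the y variable. For a vertical split, as asked in the question, the same approach should work with a condition on the x variable. A sketch, assuming a cutoff of 130 on hp (the variable and cutoff are only for illustration):

```r
library(ggplot2)

cutoff <- 130  # illustrative x-axis cutoff

p <- ggplot(mtcars, aes(hp, mpg, color = hp <= cutoff)) +
  geom_point() +
  scale_color_manual(
    name = paste("hp <=", cutoff),
    values = c(`TRUE` = "purple4", `FALSE` = "springgreen4")
  )
p
```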
You can use the row_number for the colours.
library(ggplot2)
library(dplyr)
data(mpg)
mpg %>%
mutate(colour=row_number(displ)<=130) %>%
ggplot(aes(x=displ, y=cty, col=colour)) +
geom_point(show.legend=FALSE) + theme_bw()
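Note that there seems to be a tie at a displ of about 3.5: row_number() breaks ties by row order, so points with the same x value can end up in different colours.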
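To avoid splitting tied x values between the two colours, one possibility is to colour by the value of the 130th smallest displ rather than by rank. This is a sketch, not part of the answer above; the column names by_rank and by_value are invented here, and the value-based group may contain more than 130 points when there are ties.

```r
library(dplyr)
library(ggplot2)

cutoff <- sort(mpg$displ)[130]  # value of the 130th smallest displ

flagged <- mpg %>%
  mutate(by_rank  = row_number(displ) <= 130,  # exactly 130 points, ties split
         by_value = displ <= cutoff)           # ties stay together

p <- ggplot(flagged, aes(x = displ, y = cty, col = by_value)) +
  geom_point(show.legend = FALSE) +
  theme_bw()
p
```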
Using base R, you can try the following, which colours the first 130 rows (in data order) differently from the rest:
plot(iris$Sepal.Length, iris$Sepal.Width, col = rep(1:2, times = c(130, nrow(iris)-130)))
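If the rows are not already sorted by x, a base R variant that picks the 130 smallest x values could look like this. It is a sketch: the colour names are arbitrary, and ties are again broken by row order through ties.method = "first".

```r
x <- iris$Sepal.Length
y <- iris$Sepal.Width

# TRUE for the 130 points with the smallest x values
in_first_130 <- rank(x, ties.method = "first") <= 130

plot(x, y, pch = 16,
     col = ifelse(in_first_130, "purple4", "springgreen4"))
```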
I have some data that I would like to plot a threshold on, only if the data approaches the threshold. Therefore I would like to have a horizontal line at my threshold, but not extend the y axis limits if this value wouldn't have already been included. As my data is faceted it is not feasible to pre-calculate limits and I am doing it for many different data sets so would get very messy. This question seems to be asking the same thing but the answers are not relevant to me: ggplot2: Adding a geom without affecting limits
Simple example.
library(ggplot2)
ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) +
  geom_point() +
  facet_wrap(~Species, scales = "free") +
  geom_hline(yintercept = 7)
which gives me
But I would like this (created in paint) where the limits have not been impacted by the geom_hline
Created on 2020-01-21 by the reprex package (v0.3.0)
You can automate this by checking whether a given facet has a maximum y-value that exceeds the threshold.
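library(dplyr)
library(ggplot2)
threshold <- 7
iris %>%
  ggplot(aes(Sepal.Width, Sepal.Length)) +
  geom_point() +
  facet_wrap(~Species, scales = "free") +
  # map yintercept inside aes(): a bare yintercept argument makes
  # geom_hline() ignore `data` and draw the line in every facet
  geom_hline(data = . %>%
               group_by(Species) %>%
               filter(max(Sepal.Length, na.rm = TRUE) >= threshold),
             aes(yintercept = threshold))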
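To check which facets will actually receive the line, the per-facet test can be computed as a separate data frame first and then handed to geom_hline(). A sketch of that variant; facets_with_line, max_y and line_at are names I made up for the illustration.

```r
library(dplyr)
library(ggplot2)

threshold <- 7

# One row per facet whose maximum reaches the threshold
facets_with_line <- iris %>%
  group_by(Species) %>%
  summarise(max_y = max(Sepal.Length, na.rm = TRUE)) %>%
  filter(max_y >= threshold) %>%
  mutate(line_at = threshold)

p <- ggplot(iris, aes(Sepal.Width, Sepal.Length)) +
  geom_point() +
  facet_wrap(~Species, scales = "free") +
  geom_hline(data = facets_with_line, aes(yintercept = line_at))
p
```

With iris and a threshold of 7, only versicolor and virginica remain in facets_with_line, so setosa keeps its own y range.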
Adapting from this post:
How can I add a line to one of the facets?
library(tidyverse)
iris %>%
ggplot(aes(x = Sepal.Width, y = Sepal.Length)) +
geom_point() +
facet_wrap(~Species, scales = "free") +
geom_hline(data = . %>% filter(Species != "setosa"), aes(yintercept = 7))
Inspired by the Q Finding the elbow/knee in a curve I started to play around with smooth.spline().
In particular, I want to visualize how the parameter df (degrees of freedom) influences the approximation and the first and second derivatives. Note that this Q is not about approximation but about a specific problem (or edge case) in visualisation with ggplot2.
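For readers unfamiliar with how the derivatives are obtained: predict() on a smooth.spline fit takes a deriv argument (0, 1 or 2). A minimal base R sketch on made-up noisy sine data (not the data of this question), collecting the fit and both derivatives for a few df values in one long data frame:

```r
set.seed(1)
x <- seq(0, 2 * pi, length.out = 60)          # toy data, for illustration only
y <- sin(x) + rnorm(length(x), sd = 0.1)

dfs <- c(4, 10, 30)                           # arbitrary df values to compare

curves <- do.call(rbind, lapply(dfs, function(df) {
  fit <- smooth.spline(x, y, df = df)
  do.call(rbind, lapply(0:2, function(deriv) {
    pr <- predict(fit, x, deriv = deriv)      # list with components x and y
    data.frame(x = pr$x, y = pr$y, df = df, deriv = deriv)
  }))
}))
```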
First attempt: simple facet_grid()
library(ggplot2)
ggplot(ap, aes(x, y)) +
geom_point(data = dp, alpha = 0.2) +
geom_line() +
facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
theme_bw()
dp is a data.table containing the data points for which an approximation is sought and ap is a data.table with the approximated data plus the derivatives (data are given below).
For each row, facet_grid() with scales = "free_y" has chosen a scale which displays all data. Unfortunately, one panel has kind of "outliers" which make it difficult to see details in the other panels. So, I want to "zoom in".
"Zoom in" using coord_cartesian()
ggplot(ap, aes(x, y)) +
geom_point(data = dp, alpha = 0.2) +
geom_line() +
facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
theme_bw() +
coord_cartesian(ylim = c(-200, 50))
With the manually selected range, more details in the panels of row 3 have been made visible. But the limit has been applied to all panels of the grid, so in row 1 details can hardly be distinguished.
What I'm looking for is a way to apply coord_cartesian() with specific parameters separately to each individual panel (or group of panels, e.g., rowwise) of the grid. For instance, is it possible to manipulate the ggplot object afterwards?
Workaround: Combine individual plots with cowplot
As a workaround, we can create three separate plots and combine them afterwards using the cowplot package:
g0 <- ggplot(ap[deriv == 0], aes(x, y)) +
geom_point(data = dp, alpha = 0.2) +
geom_line() +
facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
theme_bw()
g1 <- ggplot(ap[deriv == 1], aes(x, y)) +
geom_line() +
facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
theme_bw() +
coord_cartesian(ylim = c(-50, 50))
g2 <- ggplot(ap[deriv == 2], aes(x, y)) +
geom_line() +
facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
theme_bw() +
coord_cartesian(ylim = c(-200, 100))
cowplot::plot_grid(g0, g1, g2, ncol = 1, align = "v")
Unfortunately, this solution
requires writing code to create three separate plots,
duplicates strips and axes, and adds whitespace which is then not available for displaying the data.
Is facet_wrap() an alternative?
We can use facet_wrap() instead of facet_grid():
ggplot(ap, aes(x, y)) +
# geom_point(data = dp, alpha = 0.2) + # this line causes error message
geom_line() +
facet_wrap(~ deriv + df, scales = "free_y", labeller = label_both, nrow = 3) +
theme_bw()
Now, the y-axes of every panel are scaled individually exhibiting details of some of the panels. Unfortunately, we still can't "zoom in" into the bottom right panel because using coord_cartesian() would affect all panels.
In addition, the line
geom_point(data = dp, alpha = 0.2)
strangely causes
Error in gList(list(x = 0.5, y = 0.5, width = 1, height = 1, just = "centre", :
only 'grobs' allowed in "gList"
I had to comment this line out, so the data points which are to be approximated are not displayed.
Data
library(data.table)
# data points
dp <- data.table(
x = c(6.6260, 6.6234, 6.6206, 6.6008, 6.5568, 6.4953, 6.4441, 6.2186,
6.0942, 5.8833, 5.7020, 5.4361, 5.0501, 4.7440, 4.1598, 3.9318,
3.4479, 3.3462, 3.1080, 2.8468, 2.3365, 2.1574, 1.8990, 1.5644,
1.3072, 1.1579, 0.95783, 0.82376, 0.67734, 0.34578, 0.27116, 0.058285),
y = 1:32,
deriv = 0)
# approximated data points and derivatives
ap <- rbindlist(
lapply(seq(2, length(dp$x), length.out = 4),
function(df) {
rbindlist(
lapply(0:2,
function(deriv) {
result <- as.data.table(
predict(smooth.spline(dp$x, dp$y, df = df), deriv = deriv))
result[, c("df", "deriv") := list(df, deriv)]
})
)
})
)
Late answer, but the following hack just occurred to me. Would it work for your use case?
Step 1. Create an alternative version of the intended plot, limiting the range of y values such that scales = "free_y" gives a desired scale range for each facet row. Also create the intended facet plot with the full data range:
library(ggplot2)
library(dplyr)
# alternate plot version with truncated data range
p.alt <- ap %>%
group_by(deriv) %>%
mutate(upper = quantile(y, 0.75),
lower = quantile(y, 0.25),
IQR.multiplier = (upper - lower) * 10) %>%
ungroup() %>%
mutate(is.outlier = y < lower - IQR.multiplier | y > upper + IQR.multiplier) %>%
mutate(y = ifelse(is.outlier, NA, y)) %>%
ggplot(aes(x, y)) +
geom_point(data = dp, alpha = 0.2) +
geom_line() +
facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
theme_bw()
# intended plot version with full data range
p <- p.alt %+% ap
Step 2. Use ggplot_build() to generate plot data for both ggplot objects. Apply the panel parameters of the alt version onto the intended version:
p <- ggplot_build(p)
p.alt <- ggplot_build(p.alt)
p$layout$panel_params <- p.alt$layout$panel_params
rm(p.alt)
Step 3. Build the intended plot from the modified plot data, & plot the result:
p <- ggplot_gtable(p)
grid::grid.draw(p)
Note: in this example, I truncated the data range by setting to NA every value that lies more than 10 * IQR beyond the upper or lower quartile of its facet row. This can be replaced by any other logic for defining outliers.
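The outlier rule from Step 1 can also be written as a small standalone function, which makes it easier to swap in a different rule. A sketch in base R; the function name flag_far_outliers and the argument k are hypothetical, with k = 10 matching the multiplier used above.

```r
# TRUE for values more than k * IQR below the lower quartile
# or above the upper quartile
flag_far_outliers <- function(y, k = 10) {
  q <- quantile(y, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  margin <- (q[2] - q[1]) * k
  y < q[1] - margin | y > q[2] + margin
}

y <- c(1:9, 1000)
y_truncated <- ifelse(flag_far_outliers(y), NA, y)  # only the 1000 becomes NA
```

Inside the pipeline of Step 1 this would be applied per facet row, for example after group_by(deriv).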