ggplot2: why are side-by-side boxplots not being plotted?

set.seed(3)
df <- data.frame(lambda = c(rep(0, 6), rep(1, 6), rep(1.5, 6)),
                 approach = rep(c(rep("A", 3), rep("B", 3)), 3),
                 value = rnorm(18, 0, 1))
ggplot(data = df, aes(x = lambda, y = value)) + geom_boxplot(aes(fill = approach))
I want to plot 3 sets of boxplots at lambda = 0, 1, and 1.5, respectively. Each set should contain 2 boxplots, one for approach A and one for approach B. However, the current code draws only two boxplots, whereas I'm looking for a total of six.

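I think you want "lambda" to be a factor. With a continuous x and no explicit group, ggplot2 groups the data only by the discrete fill variable, which gives one box per approach. For example: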
library(tidyverse)
set.seed(3)
df <- data.frame(lambda = c(rep(0, 6), rep(1, 6), rep(1.5, 6)),
                 approach = rep(c(rep("A", 3), rep("B", 3)), 3),
                 value = rnorm(18, 0, 1))
ggplot(data = df, aes(x = factor(lambda), y = value)) +
  geom_boxplot(aes(fill = approach))
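If lambda needs to stay numeric (for example, so that 1 and 1.5 sit closer together on the axis than 0 and 1), an alternative is to set the group aesthetic explicitly. This is a sketch and not part of the answer above; it reuses the question's data, and the name p_grouped is only illustrative.

```r
library(ggplot2)

set.seed(3)
df <- data.frame(lambda = c(rep(0, 6), rep(1, 6), rep(1.5, 6)),
                 approach = rep(c(rep("A", 3), rep("B", 3)), 3),
                 value = rnorm(18, 0, 1))

# Keep lambda continuous, but tell ggplot2 which rows belong to the same box:
# one group per lambda/approach combination
p_grouped <- ggplot(df, aes(x = lambda, y = value)) +
  geom_boxplot(aes(fill = approach, group = interaction(lambda, approach)))

# layer_data() returns one row of computed boxplot statistics per group
nrow(layer_data(p_grouped))

p_grouped
```

With a numeric x the boxes are positioned at their true lambda values, so the spacing between sets is uneven; factor(lambda) spaces them equally.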

Related

Fix axis label length in ggplot2

library(tidyverse)
df <- data.frame(x = c(1, 2, 3, 4),
                 y1 = c(1, 3, 2, 4),
                 y2 = c(3000, 2000, 4000, 1000))
label_1 <- c("1", "2", "3") %>% format(width = 5, justify = "right")
label_2 <- c("1000", "2000", "3000") %>% format(width = 5, justify = "right")
p1 <- df %>% ggplot(aes(x = x, y = y1)) +
  geom_line() +
  scale_y_continuous(breaks = c(1, 2, 3),
                     labels = label_1)
p2 <- df %>% ggplot(aes(x = x, y = y2)) +
  geom_line() +
  scale_y_continuous(breaks = c(1000, 2000, 3000),
                     labels = label_2)
p1
p2
In the code above, I try to give p1 and p2 y-axis labels of the same width by using format(width = 5). In the actual graph, the labels in p1 are still narrower than those in p2.
By trial and error, I get the right width when setting width = 8.
label_3 <- c("1", "2", "3") %>% format(width = 8, justify = "right")
p1 <- df %>% ggplot(aes(x = x, y = y1)) +
  geom_line() +
  scale_y_continuous(breaks = c(1, 2, 3),
                     labels = label_3)
p1
Could someone please explain this or guide me to the relevant previous posts?
A regular space does not have the same width as a digit (in non-monospaced fonts). Unicode has a 'figure space' that has 'tabular width', i.e. the same width as digits. If you make a helper function that replaces regular spaces with figure spaces (\u2007), both plots should have the same width for their y-axis labels. The function:
pad_numbers <- function(x, width, justify = "right") {
  x <- format(x, width = width, justify = justify)
  gsub(" ", "\u2007", x)
}
In action:
library(tidyverse)
df <- data.frame(x = c(1, 2, 3, 4),
                 y1 = c(1, 3, 2, 4),
                 y2 = c(3000, 2000, 4000, 1000))
label_1 <- c("1", "2", "3") %>% pad_numbers(width = 5, justify = "right")
label_2 <- c("1000", "2000", "3000") %>% pad_numbers(width = 5, justify = "right")
p1 <- df %>% ggplot(aes(x = x, y = y1)) +
  geom_line() +
  scale_y_continuous(breaks = c(1, 2, 3),
                     labels = label_1)
p2 <- df %>% ggplot(aes(x = x, y = y2)) +
  geom_line() +
  scale_y_continuous(breaks = c(1000, 2000, 3000),
                     labels = label_2)
p1
p2
Created on 2022-02-17 by the reprex package (v2.0.1)
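To see what the helper changes without drawing a plot, the padded strings can be inspected directly. This is a small base R sketch that repeats the pad_numbers() definition so it runs on its own; the variable name padded is only illustrative. Whether the figure space really renders at digit width still depends on the font used by the graphics device.

```r
pad_numbers <- function(x, width, justify = "right") {
  x <- format(x, width = width, justify = justify)
  gsub(" ", "\u2007", x)
}

padded <- pad_numbers(c("1", "2", "3"), width = 5)

# Each label still has 5 characters
nchar(padded)

# None of the labels contains a regular space any more
grepl(" ", padded, fixed = TRUE)

# The padding character is the figure space, U+2007
substr(padded, 1, 1) == "\u2007"
```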
Lastly, if you want the y-axes to have the same width for plot composition purposes, I recommend the {patchwork} package, which does a good job of aligning the panels between two plots. This wouldn't require the helper function we wrote at the beginning.
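A minimal sketch of the composition route, assuming the {patchwork} package is installed. It uses default axis labels and the same example data; the name combined is only illustrative.

```r
library(ggplot2)
library(patchwork)

df <- data.frame(x = c(1, 2, 3, 4),
                 y1 = c(1, 3, 2, 4),
                 y2 = c(3000, 2000, 4000, 1000))

p1 <- ggplot(df, aes(x = x, y = y1)) + geom_line()
p2 <- ggplot(df, aes(x = x, y = y2)) + geom_line()

# "/" stacks the plots vertically; patchwork lines up the panel areas,
# so the differing label widths no longer shift the panels
combined <- p1 / p2
combined
```

This only helps when both plots end up in the same figure. For plots saved as separate files, padding the labels is still the way to go.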

R facet_wrap and geom_density with multiple groups

Here's my dataframe:
df <- data.frame(state = sample(c(0, 1), replace = TRUE, size = 100),
                 X1 = rnorm(100, 0, 1),
                 X2 = rnorm(100, 1, 2),
                 X3 = rnorm(100, 2, 3))
What I would like to do is to plot for each variable X1, X2, X3 two densities/histograms (given the value of state) on the same plot BUT in such a way that all of the plots are on the same facet. I've done these things separately:
ggplot() +
  geom_density(data = df, aes(x = X1, group = state, fill = state), alpha = 0.5, adjust = 2) +
  xlab("X1") +
  ylab("Density")
ggplot(gather(df[df$state == 0, 2:4]), aes(value)) +
  geom_density() +
  facet_wrap(~key, scales = 'free_x')
but I struggle to make it work together.
I'm assuming that you want the three facets for variables X1, X2 and X3, each with two curves filled by state.
You'll need to convert state to a factor, to make it a categorical variable, using dplyr::mutate(). I would also use the newer tidyr::pivot_longer() instead of gather: this will generate columns name + value by default.
Your data but with a seed to make it reproducible and named df1:
set.seed(1001)
df1 <- data.frame(state = sample(c(0, 1), replace = TRUE, size = 100),
                  X1 = rnorm(100, 0, 1),
                  X2 = rnorm(100, 1, 2),
                  X3 = rnorm(100, 2, 3))
The plot:
library(dplyr)
library(tidyr)
library(ggplot2)
df1 %>%
  pivot_longer(-state) %>%
  mutate(state = as.factor(state)) %>%
  ggplot(aes(value)) +
  geom_density(aes(fill = state), alpha = 0.5) +
  facet_wrap(~name)
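To see the intermediate long format that facet_wrap(~name) relies on, the reshaping step can be run on its own. This is a sketch using the same seeded data; the name long is only illustrative.

```r
library(tidyr)

set.seed(1001)
df1 <- data.frame(state = sample(c(0, 1), replace = TRUE, size = 100),
                  X1 = rnorm(100, 0, 1),
                  X2 = rnorm(100, 1, 2),
                  X3 = rnorm(100, 2, 3))

# Every column except state is stacked into a name/value pair
long <- pivot_longer(df1, -state)

names(long)         # state, name, value
unique(long$name)   # the three original column names, one per facet
nrow(long)          # 100 rows x 3 variables
```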

Using geom_ridgeline with a log y-axis

I am trying to visualise timeseries data, and thought the ggridges package would be useful for this. However some of my data needs to be plotted on a log-scale. Is there a way to do this?
I tried using y = 0.001 instead of 0, since y = 0 fails on a log scale, but then the heights are not correct. This can be seen when you plot the points as well.
Thanks
Example below:
data <- data.frame(x = 1:5, y = rep(0.001, 5), height = c(0.001, 0.1, 3, 300, 4))
ggplot(data) +
  geom_ridgeline(aes(x, y, height = height), fill = "lightblue") +
  scale_y_log10() +
  geom_point(aes(x = x, y = height))
Hopefully this will give you a lead towards solving your problem.
Using an example from the ggridges introduction (https://wilkelab.org/ggridges/articles/introduction.html), I added 1 to y to avoid zeros (and thus -Inf) when taking log10:
library(ggplot2)
library(ggridges)
d <- data.frame(
  x = rep(1:5, 3),
  y = c(rep(0, 5), rep(1, 5), rep(2, 5)),
  height = c(0, 1, 3, 4, 0, 1, 2, 3, 5, 4, 0, 5, 4, 4, 1)
)
ggplot(d, aes(x, (y + 1), height = height, group = y)) +
  geom_ridgeline(fill = "lightblue") +
  scale_y_log10() +
  annotation_logticks(sides = "l")
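The reason for the shift can be checked in base R: a baseline of 0 has no finite position on a log10 axis, while the shifted baselines do. This is a small sketch; the vector name baseline is only illustrative.

```r
baseline <- c(0, 1, 2)

# log10(0) is -Inf, so a ridgeline anchored at y = 0 cannot be placed
log10(baseline)

# After adding 1, every baseline has a finite position on the log axis
log10(baseline + 1)
all(is.finite(log10(baseline + 1)))
```

Note that the shift moves the baselines to 1, 2 and 3, so the axis values no longer match the original y values directly.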

Color each facet by different variable value

I have a data frame like the following:
df <- data.frame(x = runif(100, 0, 1),
                 y = runif(100, 1, 2),
                 var1 = runif(100, 0, 1),
                 var2 = runif(100, 0, 1),
                 var3 = rep(c("a", "b"), 50))
I want to make a faceted plot in ggplot2 that plots the same x vs y in each facet (scatterplot), but colors by the values of var1, var2, and var3. In this case, there would only be 3 facets, one for each of the coloring columns.
How could this be done?
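library(ggplot2)
library(gridExtra)
# Build one scatterplot per colouring column (columns 3 to 5: var1, var2, var3).
# .data[[v]] replaces the deprecated aes_string().
plots <- lapply(names(df)[3:5], function(v) {
  ggplot(data = df, aes(x = x, y = y, color = .data[[v]])) +
    geom_point() +
    labs(color = v)
})
# Arrange the three plots side by side; each keeps its own colour legend
grid.arrange(grobs = plots, ncol = 3)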

ggplot adjust size of legend key values

I am having trouble finding a way to adjust the values shown in the legend keys. In the example below, count ranges from 3 to 500, but the legend only ranges from 100 to 500. This is understandable, but I would like to change the legend values so that one key size corresponds to a count of 3.
In sum, I would like to find a way to make the key values correspond to count values that I select. Is this possible?
library(ggplot2)
df <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                 y = c(4, 2, 6, 1, 7, 7),
                 count = c(3, 100, 200, 300, 400, 500))
plt <- ggplot() +
  geom_point(data = df,
             aes(x = x, y = y, size = count))
Credit for this answer goes to aosmith.
The corrected code is below; the breaks argument of scale_size_continuous() sets which count values appear in the legend.
library(ggplot2)
df <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                 y = c(4, 2, 6, 1, 7, 7),
                 count = c(3, 100, 200, 300, 400, 500))
plt <- ggplot() +
  geom_point(data = df,
             aes(x = x, y = y, size = count)) +
  scale_size_continuous(breaks = c(3, 100, 200, 500))
