Control Discrete Tick Labels in ggplot2 (scale_x_discrete) [duplicate]

This question already has answers here:
ggplot2: display every nth value on discrete axis
(2 answers)
Closed 1 year ago.
On a continuous scale, I can reduce the density of the tick labels using breaks and get nice control over their density in a flexible fashion using scales::pretty_breaks(). However, I can't figure out how to achieve something similar with a discrete scale. Specifically, if my discrete labels are letters, then let's say that I want to show every other one to clean up the graph. Is there an easy, systematic way to do this?
I have a hack that works (see below), but I am looking for something more automatic and elegant.
library(tidyverse)
# make some dummy data
dat <-
matrix(sample(100),
nrow = 10,
dimnames = list(letters[1:10], LETTERS[1:10])) %>%
as.data.frame() %>%
rownames_to_column("row") %>%
pivot_longer(-row, names_to = "column", values_to = "value")
# default plot has all labels on discrete axes
dat %>%
ggplot(aes(row, column)) +
geom_tile(aes(fill = value))
# desired plot would look like following:
ylabs <- LETTERS[1:10][c(T, NA)] %>% replace_na("")
xlabs <- letters[1:10][c(T, NA)] %>% replace_na("")
# can force desired axis text density but it's an ugly hack
dat %>%
ggplot(aes(row, column)) +
geom_tile(aes(fill = value)) +
scale_y_discrete(labels = ylabs) +
scale_x_discrete(labels = xlabs)
Created on 2021-12-21 by the reprex package (v2.0.1)

One option for dealing with overly-dense axis labels is to use n.dodge:
ggplot(dat, aes(row, column)) +
geom_tile(aes(fill = value)) +
scale_x_discrete(guide = guide_axis(n.dodge = 2)) +
scale_y_discrete(guide = guide_axis(n.dodge = 2))
Alternatively, if you are looking for a way to reduce your use of xlabs and do it more programmatically, then we can pass a function to scale_x_discrete(breaks=):
everyother <- function(x) x[seq_along(x) %% 2 == 0]
ggplot(dat, aes(row, column)) +
geom_tile(aes(fill = value)) +
scale_x_discrete(breaks = everyother) +
scale_y_discrete(breaks = everyother)
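Note that everyother keeps the even positions (b, d, ...), whereas the hack in the question keeps the odd ones (a, c, ...). A minimal sketch of a more general variant follows; every_nth is a hypothetical helper, not part of ggplot2, and it relies only on the fact that breaks accepts a function of the scale limits:

```r
library(ggplot2)

# Hypothetical helper (not a ggplot2 function): returns a breaks function
# that keeps every nth level, starting with the first one.
every_nth <- function(n) {
  function(x) x[(seq_along(x) - 1) %% n == 0]
}

# Small stand-in for the question's data: a 10 x 10 grid of tiles
dat <- expand.grid(row = letters[1:10], column = LETTERS[1:10])
dat$value <- sample(100)

ggplot(dat, aes(row, column)) +
  geom_tile(aes(fill = value)) +
  scale_x_discrete(breaks = every_nth(2)) +  # a, c, e, g, i
  scale_y_discrete(breaks = every_nth(3))    # A, D, G, J
```

Because the helper works on whatever limits the scale ends up with, it should keep working if levels are filtered or reordered.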

Related

Adding a single label per group in ggplot with stat_summary and text geoms

I would like to add counts to a ggplot that uses stat_summary().
I am having an issue with the requirement that the text vector be the same length as the data.
With the examples below, you can see that what is being plotted is the same label multiple times.
The workaround to set the location on the y axis has the effect that multiple labels are stacked up. The visual effect is a bit strange (particularly when you have thousands of observations) and not sufficiently professional for my purposes. You will have to trust me on this one - the attached picture doesn't fully convey the weirdness of it.
I was wondering if someone else has worked out another way. It is for a plot in shiny that has dynamic input, so text cannot be overlaid in a hardcoded fashion.
I'm pretty sure ggplot wasn't designed for the kind of behaviour with stat_summary that I am looking for, and I may have to abandon stat_summary and create a new summary dataframe, but thought I would first check if someone else has some wizardry to offer up.
This is the plot without setting the y location:
library(dplyr)
library(ggplot2)
df_x <- data.frame("Group" = c(rep("A",1000), rep("B",2) ),
"Value" = rnorm(1002))
df_x <- df_x %>%
group_by(Group) %>%
mutate(w_count = n())
ggplot(df_x, aes(x = Group, y = Value)) +
stat_summary(fun.data="mean_cl_boot", size = 1.2) +
geom_text(aes(label = w_count)) +
coord_flip() +
theme_classic()
And this is with my hack:
ggplot(df_x, aes(x = Group, y = Value)) +
stat_summary(fun.data="mean_cl_boot", size = 1.2) +
geom_text(aes(y = 1, label = w_count)) +
coord_flip() +
theme_classic()
Create a df_text that has the grouped info for your labels. Then use annotate:
library(dplyr)
library(ggplot2)
set.seed(123)
df_x <- data.frame("Group" = c(rep("A",1000), rep("B",2) ),
"Value" = rnorm(1002))
df_text <- df_x %>%
group_by(Group) %>%
summarise(avg = mean(Value),
n = n()) %>%
ungroup()
yoff <- 0.0
xoff <- -0.1
ggplot(df_x, aes(x = Group, y = Value)) +
stat_summary(fun.data="mean_cl_boot", size = 1.2) +
annotate("text",
x = 1:2 + xoff,
y = df_text$avg + yoff,
label = df_text$n) +
coord_flip() +
theme_classic()
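The x = 1:2 in annotate() assumes exactly two groups in a fixed order. A possible variant, sketched here under the assumption that ggplot2 3.3.0 or later is installed (for the fun argument), passes the summary frame to geom_text() so each label follows its own Group value. To keep the sketch free of the Hmisc dependency, it swaps mean_cl_boot for a plain mean point, and it builds the summary in base R:

```r
library(ggplot2)

set.seed(123)
df_x <- data.frame(Group = c(rep("A", 1000), rep("B", 2)),
                   Value = rnorm(1002))

# One row per group: the count and the mean used to position the label
df_text <- data.frame(
  Group = sort(unique(df_x$Group)),
  n     = as.vector(table(df_x$Group)),
  avg   = as.vector(tapply(df_x$Value, df_x$Group, mean))
)

ggplot(df_x, aes(x = Group, y = Value)) +
  stat_summary(fun = mean, geom = "point", size = 3) +
  # x = Group is inherited from the plot; y and label come from df_text
  geom_text(data = df_text,
            aes(y = avg, label = paste0("n=", n)),
            nudge_x = 0.2) +
  coord_flip() +
  theme_classic()
```

Since the label layer has one row per group, each count is drawn once rather than once per observation.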
I found another way which is a little more robust for when the plot is dynamic in its ordering and filtering, and works well for faceting. More robust, because it uses stat_summary for the text.
library(dplyr)
library(ggplot2)
df_x <- data.frame("Group" = c(rep("A",1000), rep("B",2) ),
"Value" = rnorm(1002))
counts_df <- function(y) {
return( data.frame( y = 1, label = paste0('n=', length(y)) ) )
}
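# store the base plot so the text layer can be added to it
p <- ggplot(df_x, aes(x = Group, y = Value)) +
stat_summary(fun.data="mean_cl_boot", size = 1.2) +
coord_flip() +
theme_classic()
# counts_df() is called once per group, so each label is drawn once
p + stat_summary(geom="text", fun.data=counts_df)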

Adding labels to individual % inside geom_bar() using R / ggplot2 [duplicate]

This question already has answers here:
Add percentage labels to a stacked barplot
(2 answers)
Closed 3 years ago.
bgraph <- ggplot(data = data, aes(x = location)) +
geom_bar(aes(fill = success))
success is a factor with 4 categories, one for each of the 4 outcomes in the data set. I could easily calculate the percentages separately, but as the plot is currently constituted, the bar segments are generated by geom_bar(aes(fill = success)).
data <- as.data.frame(c(1,1,1,1,1,1,2,2,3,3,3,3,4,4,4,4,4,4,
4,4,5,5,5,5,6,6,6,6,6,6,7,7,7,7,7))
data[["success"]] <- c("a","b","c","c","d","d","a","b","b","b","c","d",
"a","b","b","b","c","c","c","d","a","b","c","d",
"a","b","c","c","d","d","a","b","b","c","d")
names(data) <- c("location","success")
bgraph <- ggplot(data = data, aes(x = location)) +
geom_bar(aes(fill = success))
bgraph
How do I get labels over the individual percentages? More specifically, I want 4 individual percentages for each bar: one each for yellow, light orange, orange, and red. The percentages within each bar add up to 1.
Maybe there is a way to do this in ggplot directly but with some pre-processing in dplyr, you'll be able to achieve your desired output.
library(dplyr)
library(ggplot2)
data %>%
count(location, success) %>%
group_by(location) %>%
mutate(n = n/sum(n) * 100) %>%
ggplot() + aes(x = location, n, fill = success,label = paste0(round(n, 2), "%")) +
geom_bar(stat = "identity") +
geom_text(position=position_stack(vjust=0.5))
How about creating a summary frame with the relative frequencies within location and then using that with geom_col() and geom_text()?
library(dplyr)
library(ggplot2)
library(forcats) # for fct_rev()
# Create summary stats
tots <-
data %>%
group_by(location,success) %>%
summarise(
n = n()
) %>%
# still grouped by location here, so sum(n) is the total per bar
mutate(
rel = round(100*n/sum(n))
)
# Plot
ggplot(data = tots, aes(x = location, y = n)) +
geom_col(aes(fill = fct_rev(success))) + # could only get it with this reversed
geom_text(aes(label = rel), position = position_stack(vjust = 0.5))

is it possible to ggplot grouped partial boxplots w/o facets w/ a single `geom_boxplot()`?

I needed to add some partial boxplots to the following plot:
library(tidyverse)
foo <- tibble(
time = 1:100,
group = sample(c("a", "b"), 100, replace = TRUE) %>% as.factor()
) %>%
group_by(group) %>%
mutate(value = rnorm(n()) + 10 * as.integer(group)) %>%
ungroup()
foo %>%
ggplot(aes(x = time, y = value, color = group)) +
geom_point() +
geom_smooth(se = FALSE)
I would like to add a grid of (2 x 4 = 8) boxplots (4 per group) to the plot above. Each boxplot should cover a consecutive selection of 25 (or n) points in each group. That is, the first two boxplots represent the points between the 1st and the 25th (one boxplot below for group a, and one boxplot above for group b). Next to them come two more boxplots for the points between the 26th and the 50th, and so on. They do not need to sit in a perfect grid (which I suppose would be both harder to obtain and uglier); I would actually prefer them to "follow" their corresponding smooth line!
That all without using facets (because I have to insert them in a plot which is already facetted :-))
I tried:
bar <- foo %>%
group_by(group) %>%
mutate(cut = 12.5 * (time %/% 25)) %>%
ungroup()
bar %>%
ggplot(aes(x = time, y = value, color = group)) +
geom_point() +
geom_smooth(se = FALSE) +
geom_boxplot(aes(x = cut))
but it doesn't work.
I tried to call geom_boxplot() using group instead of x
bar %>%
ggplot(aes(x = time, y = value, color = group)) +
geom_point() +
geom_smooth(se = FALSE) +
geom_boxplot(aes(group = cut))
But it draws the boxplots without considering the groups, and it even loses the colors (adding a redundant call that includes color = group doesn't help).
Finally, I decided to try it roughly:
bar %>%
ggplot(aes(x = time, y = value, color = group)) +
geom_point() +
geom_smooth(se = FALSE) +
geom_boxplot(data = filter(bar, group == "a"), aes(group = cut)) +
geom_boxplot(data = filter(bar, group == "b"), aes(group = cut))
And it works (maintaining even the correct colors from the main aes)!
Does someone know if it is possible to obtain it using a single call to geom_boxplot()?
Thanks!
This was interesting! I haven't tried to use geom_boxplot with a continuous x before and didn't know how it behaved. I think what is happening is that setting group overrides colour in geom_boxplot, so it doesn't respect either the inherited or the repeated colour aesthetic. I think this workaround does the trick: we combine the group and cut variables into group_cut, which takes 8 different values (one for each desired boxplot). Now we can map aes(group = group_cut) and get the desired output. I don't think this is particularly intuitive, and it might be worth raising it on GitHub, since we usually expect aesthetics to combine nicely (e.g. combining colour and linetype works fine).
library(tidyverse)
bar <- tibble(
time = 1:100,
group = sample(c("a", "b"), 100, replace = TRUE) %>% as.factor()
) %>%
group_by(group) %>%
mutate(
value = rnorm(n()) + 10 * as.integer(group),
cut = 12.5 * ((time - 1) %/% 25), # modified this to prevent an extra boxplot
group_cut = str_c(group, cut)
) %>%
ungroup()
bar %>%
ggplot(aes(x = time, y = value, colour = group)) +
geom_point() +
geom_smooth(se = FALSE) +
geom_boxplot(aes(group = group_cut), position = "identity")
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Created on 2019-08-13 by the reprex package (v0.3.0)
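The same grouping can probably be expressed without a precomputed column by using base R's interaction() inside aes(). The following is a sketch in base R plus ggplot2 rather than the tidyverse pipeline used above; the seed is an arbitrary choice for reproducibility:

```r
library(ggplot2)

set.seed(1)
bar <- data.frame(
  time  = 1:100,
  group = factor(sample(c("a", "b"), 100, replace = TRUE))
)
bar$value <- rnorm(100) + 10 * as.integer(bar$group)
bar$cut   <- 12.5 * ((bar$time - 1) %/% 25)  # four blocks of 25 time points

ggplot(bar, aes(x = time, y = value, colour = group)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  # one boxplot per (group, cut) combination, from a single geom_boxplot() call
  geom_boxplot(aes(group = interaction(group, cut)), position = "identity")
```

Each of the 8 group/cut combinations contains a single colour, so the inherited colour = group mapping is kept.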

Grouping data outside limits in histogram using ggplot2

I am trying to make a histogram zoomed in on part of the data. My problem is that I would like to group everything that is outside the range into a last category, "10+". Is it possible to do this using ggplot2?
Sample code:
x <- data.frame(runif(10000, 0, 15))
ggplot(x, aes(runif.10000..0..15.)) +
geom_histogram(aes(y = (..count..)/sum(..count..)), colour = "grey50", binwidth = 1) +
scale_y_continuous(labels = percent) +
coord_cartesian(xlim=c(0, 10)) +
scale_x_continuous(breaks = 0:10)
Here is how the histogram looks now:
[Figure: how the histogram looks now]
And here is how I would like it to look:
[Figure: how the histogram should look]
It is probably possible to do this by nesting ifelse calls, but since my real problem has more cases, is there a way for ggplot to do it?
You could use forcats and dplyr to efficiently categorize the values, aggregate the last "levels" and then compute the percentages before the plot. Something like this should work:
library(forcats)
library(dplyr)
library(ggplot2)
library(scales) # provides percent() for the axis labels
x <- data.frame(x = runif(10000, 0, 15))
x2 <- x %>%
mutate(x_grp = cut(x, breaks = c(seq(0,15,1)))) %>%
# levels 11 to 15 are the bins above 10; level 10 is (9,10] and stays separate
mutate(x_grp = fct_collapse(x_grp, other = levels(x_grp)[11:15])) %>%
group_by(x_grp) %>%
dplyr::summarize(count = n())
ggplot(x2, aes(x = x_grp, y = count/10000)) +
geom_bar(stat = "identity", colour = "grey50") +
scale_y_continuous(labels = percent)
The resulting graph looks very different from your example, but I think it's correct, since the data are drawn from a uniform distribution.
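A different technique that stays closer to geom_histogram() is to cap the values before binning, so that everything above 10 lands in one final bin. This is only a sketch: the x_capped column and the choice of 10.5 as the cap are assumptions made for illustration, and after_stat() requires ggplot2 3.3.0 or later.

```r
library(ggplot2)

set.seed(1)
x <- data.frame(x = runif(10000, 0, 15))

# Cap the values so that everything above 10 falls into the last bin, (10, 11]
x$x_capped <- pmin(x$x, 10.5)

ggplot(x, aes(x_capped)) +
  geom_histogram(aes(y = after_stat(count / sum(count))),
                 breaks = 0:11, colour = "grey50") +
  # label each bin at its centre; the last bin holds all values above 10
  scale_x_continuous(breaks = 0:10 + 0.5,
                     labels = c(paste0(0:9, "-", 1:10), "10+")) +
  scale_y_continuous(labels = function(p) paste0(round(100 * p), "%"))
```

With more cases, only the cap and the breaks vector need to change, which avoids nested ifelse calls.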

ggplot2 move x-axis to top (intersect with reversed y axis at 0) [duplicate]

This question already has answers here:
Plot with reversed y-axis and x-axis on top in ggplot2
(3 answers)
Closed 3 years ago.
I want to make a figure that has a reversed y-axis, with the x-axis at y = 0.
y axis was reversed with scale_y_reverse, but x-axis stayed at the bottom.
p <- ggplot(df, aes(x= conc, y=depth, group=factor(stn), color=factor(stn)))+
geom_point(shape=1)+
geom_path(alpha=0.5)+
scale_y_reverse(limits=(c(20,0)), expand=c(0,0))+
scale_x_continuous(expand=c(0,0))
I tried the code from this post, shown below, but it didn't work.
p +
scale_x_continuous(guide = guide_axis(position = "top")) +
scale_y_continuous(guide = guide_axis(position = "right"))
I don't need two x-axes; I simply want to move the x-axis from the bottom to the top.
When this answer was written, this was not possible in ggplot2 (since ggplot2 2.2.0, scale_x_continuous(position = "top") moves the axis). It is possible in ggvis, which combines the ggplot2 grammar with dplyr pipelines. Just use the add_axis function to put the axis at the top.
# sample data
N <- 20
df <- data.frame(conc = seq(0, N),
depth = runif(N+1, 0, 20),
stn = rep(1:4, length=N+1))
# ggplot version
require(ggplot2)
p <- ggplot(df, aes(x= conc, y=depth, group=factor(stn), color=factor(stn)))+
geom_point(shape=1)+
geom_path(alpha=0.5)+
scale_y_reverse(limits=(c(20,0)), expand=c(0,0))+
scale_x_continuous(expand=c(0,0))
p
# ggvis version
require(ggvis)
df %>% transform(stn = factor(stn)) %>%
ggvis(x = ~conc, y = ~depth, stroke = ~stn) %>%
layer_points(shape := "circle", fill := "white") %>%
layer_lines(opacity := 0.5) %>%
scale_numeric("y", reverse=TRUE, domain=c(0,20), expand=c(0,0)) %>%
scale_numeric("x", expand=c(0,0)) %>%
add_axis("x", orient = "top")
You can also use:
library(cowplot)
ggdraw(switch_axis_position(p, axis = 'x'))
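For readers on ggplot2 2.2.0 or later, the scale's own position argument should be enough. A minimal sketch reusing the sample data from the answer above (the seed and the p_top name are added here for illustration):

```r
library(ggplot2)

set.seed(42)
N <- 20
df <- data.frame(conc  = seq(0, N),
                 depth = runif(N + 1, 0, 20),
                 stn   = rep(1:4, length.out = N + 1))

p_top <- ggplot(df, aes(x = conc, y = depth,
                        group = factor(stn), colour = factor(stn))) +
  geom_point(shape = 1) +
  geom_path(alpha = 0.5) +
  scale_y_reverse(limits = c(20, 0), expand = c(0, 0)) +
  # position = "top" draws the x-axis above the panel, where depth = 0
  scale_x_continuous(position = "top", expand = c(0, 0))

p_top
```

Because the y-axis is reversed with zero expansion, the top edge of the panel corresponds to depth 0, so the moved x-axis meets the y-axis there.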
