Say I have the below df
library(ggplot2)
library(data.table)
# dummy data
df <- data.table(revenue = rnorm(1e4, mean = 100, sd = 1)); df
revenue
1: 100.01769
2: 98.31862
3: 99.78464
4: 100.17670
5: 99.31678
---
9996: 99.47635
9997: 98.27383
9998: 99.48378
9999: 100.06227
10000: 99.13972
and that I plot a histogram with a vline denoting the mean
# mean of x axis
x <- df[, mean(revenue)]
# plot
ggplot(df, aes(x = revenue)) +
geom_histogram(aes(y = (..count..) / sum(..count..))) + # turn count into %
geom_vline(aes(xintercept = x), col = 'red', size = 1)
The above is fine. However, when trying to add a label showing the mean, I am unsure what to enter for y in geom_label(aes(x = x, y = ?)...:
# plot
ggplot(df, aes(x = revenue)) +
geom_histogram(aes(y = (..count..) / sum(..count..))) + # turn count into %
geom_vline(aes(xintercept = x), col = 'red', size = 1) +
geom_label(aes(x = x, y = ?)
, label = x
)
I have tried small numbers such as 0.025 (after looking at the density from the previous plot), but then R gets stuck creating the plot and never finishes. Say I'd like to position the numeric label where the y axis = 0; what value should I put into y = ?
Thank you
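Remove aes() from geom_label: you are plotting constants, not data.table variables. Inside aes(), a constant is recycled to every row of the data, so the same label is drawn 10,000 times on top of itself. To put the label where the y axis is zero, set y = 0.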
Use hjust and vjust to position the label relative to (the constants) x, y. From the documentation:
Alignment
You can modify text alignment with the vjust and hjust aesthetics. These can either be a number between 0 (right/bottom) and 1 (top/left) or a character ("left", "middle", "right", "bottom", "center", "top"). There are two special alignments: "inward" and "outward". Inward always aligns text towards the center, and outward aligns it away from the center.
And it does take time. Once again from the documentation, my emphasis:
geom_label()
Currently geom_label() does not support the check_overlap argument or the angle aesthetic. Also, it is considerably slower than geom_text(). The fill aesthetic controls the background colour of the label.
library(ggplot2)
#library(data.table)
# dummy data
df <- data.frame(revenue = rnorm(1e4, mean = 100, sd = 1))
xbar <- mean(df$revenue)
# plot
ggplot(df, aes(x = revenue)) +
geom_histogram(aes(y = (..count..) / sum(..count..)), bins = 30) +
geom_vline(aes(xintercept = xbar), col = 'red', size = 1) +
geom_label(x = xbar, y = 0, label = round(xbar, 2),
hjust = -0.5, vjust = 1,
fill = "white", color = "black")
Created on 2022-08-31 by the reprex package (v2.0.1)
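As a complement to the answer's code, a single constant label can also be added with annotate(), which builds a one-row layer instead of recycling the constant across every row of df. This is only a minimal sketch, not the answerer's method. It assumes ggplot2 >= 3.3.0 (for after_stat()); the seed and the variable n_labels are additions made here purely to check how many labels the layer holds.

```r
library(ggplot2)

# dummy data, as in the answer (the seed is added only for reproducibility)
set.seed(1)
df <- data.frame(revenue = rnorm(1e4, mean = 100, sd = 1))
xbar <- mean(df$revenue)

p <- ggplot(df, aes(x = revenue)) +
  geom_histogram(aes(y = after_stat(count / sum(count))), bins = 30) +
  geom_vline(xintercept = xbar, colour = "red") +
  # annotate() takes constants directly, so no aes() is involved
  annotate("label", x = xbar, y = 0, label = round(xbar, 2),
           hjust = -0.5, vjust = 0)

# layer 3 is the annotation; it holds one row, so one label is drawn
n_labels <- nrow(layer_data(p, 3))
```

Printing p draws the plot; n_labels confirms that the annotation layer carries a single row rather than one row per observation.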
Related
I have a spatial dataset, containing values from 0 to 10. I want every number (11 numbers) to have a unique color from a gradient. The simple plot function does the trick (assigning one color to one value) but my default is ggplot, which I also want to use here. ggplot only uses ten colors for some reason and I cannot figure out why. I think I might just be using the wrong scale_x_y function.
Reproducible example:
library(raster)
library(ggplot2)
#Colors
cols <- colorRampPalette(c("yellow", "red", "darkred", "black"))
# Create Raster
r <- raster(ncol=100, nrow=100)
r[] <- sample(0:10, 10000, replace = T)
# Plot simple
plot(r, col=cols(11)) # 11 colors seen here
# Convert to df
r <- as.data.frame(r, xy=T)
# Plot with ggplot
X <- ggplot(data = r) + geom_raster(aes(x = x, y = y, fill = layer), interpolate = F) +
scale_fill_stepsn(colors=cols(11), breaks=seq(0,10,1), show.limits=T)
print(X) # only 10 colors seen here
In scale_fill_stepsn the breaks are at the limits of each bin. If you have a sequence of 11 breaks, then you only have ten bins (if you have 11 fence posts you only have 10 stretches of fence between them). You need to add one to your sequence of breaks, otherwise the level 10 will be excluded:
ggplot(data = r) +
geom_raster(aes(x = x, y = y, fill = layer), interpolate = FALSE) +
scale_fill_stepsn(colors = cols(11), breaks = seq(0, 11, 1),
show.limits = TRUE) +
coord_equal()
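The fence-post argument can be checked with plain arithmetic. The sketch below uses base R's cut() purely as an analogy for binning by breaks; it is not a claim about how scale_fill_stepsn bins values internally, and all variable names are made up for the illustration.

```r
breaks_11 <- seq(0, 10, 1)   # 11 fence posts
breaks_12 <- seq(0, 11, 1)   # 12 fence posts
vals <- 0:10                 # the 11 distinct raster values

# n breaks always delimit n - 1 bins
n_bins_11 <- length(breaks_11) - 1
n_bins_12 <- length(breaks_12) - 1

# with left-closed intervals [a, b), the value 10 has no bin under 11 breaks
n_unbinned_11 <- sum(is.na(cut(vals, breaks_11, right = FALSE)))
n_unbinned_12 <- sum(is.na(cut(vals, breaks_12, right = FALSE)))
```

With 11 breaks there are 10 bins and one value is left without a bin; with 12 breaks there are 11 bins and every value gets its own.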
An alternative is to use a manual scale, which I think makes more sense here. As I understand it, you are treating the fill color as a discrete variable, so the labels should correspond to the levels themselves rather than to the breaks between levels, as scale_fill_stepsn implies:
ggplot(data = r) +
geom_raster(aes(x = x, y = y, fill = factor(layer, 10:0))) +
scale_fill_manual(values = rev(cols(11)), name = 'layer') +
coord_equal()
EDIT
To get the legend at the bottom, try:
ggplot(data = r) +
geom_raster(aes(x = x, y = y, fill = factor(layer, 0:10))) +
scale_fill_manual(values = cols(11), name = 'layer ') +
coord_equal() +
guides(fill = guide_legend(label.position = 'top', nrow = 1)) +
theme(legend.position = 'bottom',
legend.spacing.x = unit(0, 'mm'),
legend.title = element_text(hjust = 3, vjust = 0.25))
I want to use geom_segment to replace error bars with arrows when the error exceeds a certain limit. I found a previous post that addresses this question: R - ggplot2 - Add arrow if geom_errorbar outside limits
The code works well, except that my x-axis is a factor variable instead of a numeric variable. Using position_dodge within the geom_segment statement makes the arrows start in the correct location, but it doesn't change the terminal point (xend) and all arrows point towards one central point on the x-axis instead of going straight up from the origins.
Instead of recoding the x-axis to be numeric (I will use this code to create many plots that have a range of x-axis values, with the last numeric value always ending in "+"), is there a way to correct this within geom_segment?
Code used:
data$OR.95U_u = ifelse(data$OR.95U > 10, 10 , NA)
ggplot(data, aes(x = numAlleles, y = OR, fill = Outcome)) +
geom_bar(position = position_dodge(.5), stat = "identity", width = .4, color = "black") + geom_hline(yintercept = 1, linetype = "dashed", color = "black") +
ylim(0,10) + geom_errorbar(aes(ymin=OR.95L, ymax=OR.95U), width=.2,position=position_dodge(.5)) +
theme(legend.key = element_blank(), text = element_text(size = 11.5), legend.title = element_blank()) +
labs(x = "Number of rare alleles") +
scale_fill_manual(values=c("chocolate1","coral1", "red2", "darkred")) +
geom_segment(aes(x = numAlleles, xend = numAlleles, y = OR, yend = OR.95U_u), position = position_dodge(.5), arrow = arrow(length = unit(0.3, "cm")))
Resulting figure
OK, after investigating a bit, I didn't find a clean way of doing this, as it seems that position_dodge only changes the x aesthetic and not the xend aesthetic. position_nudge doesn't work here either, as it moves all the arrows by the same amount.
So I came up with a dirty way of doing this. All we need is to create a new variable with the desired xend position for geom_segment. I tried to come up with a semi-automated way of doing it for any number of levels of the coloring variable, and also created a reproducible dataset to work with, as I'm sure this could be improved a lot by people with more knowledge than me.
The code has inline comments explaining the steps:
library(tidyverse)
# dummy data (tried to replicate your plot data more or less accurately)
df <- tibble(
numAlleles = rep(c("1", "2+"), each = 4),
Outcome = rep(LETTERS[1:4], 2),
OR = c(1.4, 1.5, 1.45, 2.3, 3.8, 4.2, 4.0, 1.55),
OR.95U = c(1.9,2.1,1.9,3.8,12,12,12,12),
OR.95L = c(0.9, 0.9, 0.9, 0.8, NA, NA,NA,NA)
) %>%
mutate(
OR.95U_u = if_else(OR.95U > 10, 10, NA_real_)
)
# as it seems that position_dodge in a geom_segment only dodges the x aes and
# not the xend aes, we need to supply a custom xend. Also, we need to try
# to automate the position, for more classes or different dodge widths.
# To do that, lets start with some parameters:
# position_dodge width
position_dodge_width <- 0.5
# number of bars per x axis class
bars_per_class <- length(unique(df$Outcome))
# total space available per class. In discrete vars, this is 1 au (arbitrary unit)
# for each class, but position_dodge only uses the fraction of that unit
# indicated in the width parameter, so we need to calculate the real
# space available:
total_space_available <- 1 * position_dodge_width
# now we calculate the real bar width used by ggplot in these au, dividing the
# space available by the number of bars to plot for each class
bar_width_real <- (total_space_available / bars_per_class)
# position_dodge with discrete variables places bars to the left and to the right of the
# class au value, so we need to know when to place the xend to the left or
# to the right. Also, the number of bars has to be taken into account, as
# with an odd number of bars, one is located at the exact au value
if (bars_per_class%%2 == 0) {
# we need an offset, as bars are wider than arrows, and we want them in the
# middle of the bar
offset_segment <- bar_width_real / 2
# offset modifier to know when to subtract or add the offset
offset_modifier <- c(rep(-1, bars_per_class%/%2), rep(1, bars_per_class%/%2))
# we also need to know how many bars go to the left and how many to the right,
# but the first bar on each side is already accounted for by the offset,
# so the bar modifier has to have one bar less for each side
bar_width_modifier <- c(seq((bars_per_class%/%2-1), 0), seq(0, (bars_per_class%/%2-1)))
} else {
# when odd number of columns, the offset is the same as the bar width
offset_segment <- bar_width_real
# and the modifiers have to have a middle zero value for the middle bar
offset_modifier <- c(rep(-1, bars_per_class%/%2), 0, rep(1, bars_per_class%/%2))
bar_width_modifier <- c(seq((bars_per_class%/%2-1), 0), 0, seq(0, (bars_per_class%/%2-1)))
}
# finally we create the vector of xend values needed and store it in df
# (without the assignment, numAlleles_u would not exist when plotting):
df <- df %>%
mutate(
numAlleles_u = as.numeric(as.factor(numAlleles)) + offset_modifier*(offset_segment + (bar_width_modifier*bar_width_real))
)
ggplot(df, aes(x = numAlleles, y = OR, fill = Outcome)) +
geom_bar(
position = position_dodge(position_dodge_width), stat = "identity",
width = 0.4, color = "black"
) +
geom_hline(yintercept = 1, linetype = "dashed", color = "black") +
ylim(0,10) +
geom_errorbar(
aes(ymin=OR.95L, ymax=OR.95U), width=.2,position=position_dodge(position_dodge_width)
) +
theme(
legend.key = element_blank(), text = element_text(size = 11.5),
legend.title = element_blank()
) +
labs(x = "Number of rare alleles") +
scale_fill_manual(values=c("chocolate1","coral1", "red2", "darkred")) +
geom_segment(
aes(x = numAlleles, xend = numAlleles_u, y = OR, yend = OR.95U_u),
position = position_dodge(position_dodge_width), arrow = arrow(length = unit(0.3, "cm"))
)
And the plot:
We can check that it also works for a discrete variable with three levels:
df_three_bars <- df %>% filter(Outcome != 'D')
bars_per_class <- length(unique(df_three_bars$Outcome))
total_space_available <- 1 * position_dodge_width
bar_width_real <- (total_space_available / bars_per_class)
if (bars_per_class%%2 == 0) {
offset_segment <- bar_width_real / 2
offset_modifier <- c(rep(-1, bars_per_class%/%2), rep(1, bars_per_class%/%2))
bar_width_modifier <- c(seq((bars_per_class%/%2-1), 0), seq(0, (bars_per_class%/%2-1)))
} else {
offset_segment <- bar_width_real
offset_modifier <- c(rep(-1, bars_per_class%/%2), 0, rep(1, bars_per_class%/%2))
bar_width_modifier <- c(seq((bars_per_class%/%2-1), 0), 0, seq(0, (bars_per_class%/%2-1)))
}
df_three_bars <- df_three_bars %>%
mutate(
numAlleles_u = as.numeric(as.factor(numAlleles)) + offset_modifier*(offset_segment + (bar_width_modifier*bar_width_real))
)
ggplot(df_three_bars, aes(x = numAlleles, y = OR, fill = Outcome)) +
geom_bar(
position = position_dodge(position_dodge_width), stat = "identity",
width = 0.4, color = "black"
) +
geom_hline(yintercept = 1, linetype = "dashed", color = "black") +
ylim(0,10) +
geom_errorbar(
aes(ymin=OR.95L, ymax=OR.95U), width=.2,position=position_dodge(position_dodge_width)
) +
theme(
legend.key = element_blank(), text = element_text(size = 11.5),
legend.title = element_blank()
) +
labs(x = "Number of rare alleles") +
scale_fill_manual(values=c("chocolate1","coral1", "red2", "darkred")) +
geom_segment(
aes(x = numAlleles, xend = numAlleles_u, y = OR, yend = OR.95U_u),
position = position_dodge(position_dodge_width), arrow = arrow(length = unit(0.3, "cm"))
)
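The even/odd branching above can be collapsed into one closed-form expression. The sketch below uses a hypothetical helper, dodge_offsets() (not part of ggplot2 or of the answer), that returns the centre offset of each bar within an x class; it is meant only to reproduce the values the if/else blocks compute, under the same assumption that each bar gets an equal share of the dodge width.

```r
# hypothetical helper: centre offsets of n_bars dodged bars within one x class
dodge_offsets <- function(n_bars, dodge_width) {
  # bar i occupies the i-th of n_bars equal slices of dodge_width,
  # and the slices are centred on the class position
  ((seq_len(n_bars) - 0.5) / n_bars - 0.5) * dodge_width
}

offsets_even <- dodge_offsets(4, 0.5)   # four bars per class
offsets_odd  <- dodge_offsets(3, 0.5)   # three bars per class

# xend for data ordered by class, then by bar within class (two classes here)
x_class <- rep(1:2, each = 4)
xend <- x_class + rep(offsets_even, times = 2)
```

For four bars and a dodge width of 0.5 this gives offsets of -1.5, -0.5, 0.5 and 1.5 bar widths (bar width 0.125), and for three bars it gives minus one bar width, zero and plus one bar width, matching the modifiers built above.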
I've been struggling with one last bit of code to make this graph I'm working on really work for me and my audience. I have a bar chart with two lines (one is acting as a rolling average, the other as the peak of that rolling average). What I want to do is label that peak line with a number, once per facet, where the number is different in each facet. Here's some stripped-down data and code:
tdf <- data.frame(a=as.POSIXct(c("2019-10-15 08:00:00","2019-10-15 09:00:00","2019-10-15 10:00:00","2019-10-15 08:00:00","2019-10-15 09:00:00","2019-10-15 10:00:00")),
b=as.Date(c("2019-09-02","2019-09-02","2019-09-02","2019-09-03","2019-09-03","2019-09-03")),
m1=c(0.2222222,0.3636364, 0.2307692, 0.4000000, 0.3428571, 0.3529412),
m2=c(0.2222222,0.2929293, 0.2972028, 0.3153846, 0.3714286, 0.3529412),
m3=c(0.2929293, 0.2929293, 0.2929293, 0.3529412,0.3529412,0.3529412))
g <- ggplot(data = tdf, aes(x = a, y = m1)) +
geom_bar(stat = "identity", alpha = 0.75, fill = 352) +
xlab("time of day") +
ylab("metric name") +
ggtitle("Graph Title") +
scale_x_datetime(breaks = scales::date_breaks("1 hours"),
date_labels = "%H")+
scale_y_continuous(breaks = c(0,.10,.20,.30,.40,.50,.60,.70,.80,.90,1.0),
labels = scales::percent) +
theme_minimal()
# add line for m2
g <- g +
geom_line(data = tdf,
aes(x = a, y = m2),
color = "blue",
size = 1.2)
# add line for m3
g <- g + geom_line(data=tdf,
aes(x = a, y = m3),
color = "#d95f02",
size = 0.6,
linetype = "dashed")
# last attempt to label the line results in an error: Invalid input: time_trans works with objects of class POSIXct
#g <- g+geom_text(aes(x=-Inf, y=Inf, label=median(tdf$m3)), size=2, hjust=-0.5, vjust= 1.4,inherit.aes=FALSE)
# facet wrap
g <- g + facet_wrap(~b, ncol = 5, scales = "fixed")
I've seen a few techniques, but none of them seem to handle having a time on the x-axis in the facets, with each facet having a different date. I'm reasonably certain the error is related to the date, but I have no clue how to make the text block appear on each facet anyway.
You just need to pass a different dataset to the labeling layer, one that still preserves your faceting variable. This works using dplyr:
library(dplyr) # provides %>%, group_by() and summarize()
g <- g +
geom_text(data = tdf %>%
group_by(b) %>%
summarize(median = median(m3)),
aes(x = as.POSIXct(-Inf, origin="1970-01-01"),
y = Inf,
label = median),
size = 2,
hjust = -0.5,
vjust = 1.4,
inherit.aes = FALSE)
We also have to explicitly convert the x to a date/time value for the axis to work.
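To see what the labeling layer actually receives, the per-facet summary can be computed on its own. This sketch swaps dplyr for base R's aggregate(), purely for illustration; it uses only the b and m3 columns from the question's data, and label_df is a name made up here.

```r
# the faceting variable (b) and the peak line (m3) from the question's data
tdf <- data.frame(
  b  = as.Date(c("2019-09-02", "2019-09-02", "2019-09-02",
                 "2019-09-03", "2019-09-03", "2019-09-03")),
  m3 = c(0.2929293, 0.2929293, 0.2929293, 0.3529412, 0.3529412, 0.3529412)
)

# one row per facet: the faceting column survives, so each label
# can be routed to its own panel
label_df <- aggregate(m3 ~ b, data = tdf, FUN = median)
names(label_df)[2] <- "median"
```

label_df has one row per value of b, which is why geom_text() draws exactly one label in each facet when it is given this data.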
As you can see on the image, R automatically assigns the values 0, 0.25... 1 for the size of the point. I was wondering if I could replace the 0, 0.25... 1 and make these text values instead while keeping the actual numerical values from the data.
library(ggplot2)
library(scales)
data(SLC4A1, package="ggplot2")
SLC4A1 <- read.csv(file.choose(), header = TRUE)
# bubble chart showing position of polymorphisms on gene, the frequency of each of these
# polymorphisms, where they are prominent on earth, and p-value
SLC4A1ggplot <- ggplot(SLC4A1, aes(Position, log10(Frequency)))+
geom_jitter(aes(col=Geographical.Location, size =(p.value)))+
labs(subtitle="Frequency of Various Polymorphisms", title="SLC4A1 Gene") +
labs(color = "Geographical Location") +
labs(size = "p-value") + labs(x = "Position of Polymorphism on SLC4A1 Gene") +
scale_size_continuous(range=c(1,4.5), trans = "reverse") +
guides(size = guide_legend(reverse = TRUE))
library(tidyverse)
df <- data.frame(x = 1:5, y = 1:5,z = 1:5)
ggplot(df,aes(x = x, y = y, size = z)) +
geom_point()
ggplot(df,aes(x = x, y = y, size = z)) +
geom_point() +
scale_size_continuous(range = 1:2) # control range of circle size
See more here:
https://ggplot2.tidyverse.org/reference/scale_size.html
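To show text instead of numbers in the size legend, the breaks and labels arguments of scale_size_continuous() can be combined. The following is a sketch under assumptions: the data are dummy values, and the label strings are made-up placeholders rather than anything from the question's dataset. The underlying numeric values still drive the point sizes.

```r
library(ggplot2)

# dummy data with a size variable running from 0 to 1
df <- data.frame(x = 1:5, y = 1:5, z = seq(0, 1, by = 0.25))

size_breaks <- c(0, 0.25, 0.5, 0.75, 1)
# placeholder text, one label per break
size_labels <- c("none", "weak", "moderate", "strong", "very strong")

p <- ggplot(df, aes(x = x, y = y, size = z)) +
  geom_point() +
  scale_size_continuous(
    range  = c(1, 4.5),     # control range of circle size
    breaks = size_breaks,   # values at which legend keys are drawn
    labels = size_labels    # text shown in place of those values
  )
```

breaks and labels must have the same length; printing p shows the legend with the text labels.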
I am trying to create a faceted alluvial plot in which the labels for the strata on the first axis are repelled to the left and left justified, and the labels on the second axis are repelled to the right and right justified.
# Small working example
# Install Packages and Libraries
install.packages("ggplot2")
install.packages("ggalluvial")
install.packages("ggrepel")
library(ggplot2)
library(ggalluvial)
library(ggrepel)
# Data Frame with 2 regions, 3 supply sectors and 3 demand sectors
df <- data.frame(region = c("A","A","A","B","B","B"),
supplySector = c("coal","gas","wind","coal","gas","wind"),
demandSector = c("resid","indus","ag","resid","indus","ag"),
value = 10*runif(6)); df
# Faceted plot with ggrepel (nudge_x and hjust assigned for each label) works.
p <- ggplot(df, aes(y = value, axis1 = supplySector, axis2 = demandSector, group=region)) +
ggalluvial::geom_alluvium(aes(fill = supplySector), width = 1/12, color="black", alpha=0.6) +
ggalluvial::geom_stratum(width = 1/12, fill = "grey70", color = "grey10", alpha=1) +
scale_x_discrete(limits = c("supplySector", "demandSector"), expand = c(0.3,0),drop=F) +
facet_wrap(region~.) +
ggrepel::geom_text_repel(stat = "stratum", label.strata = TRUE, direction = "y",
size = 4, segment.color = 'grey50',
nudge_x = rep(c(-3,-3,-3,3,3,3),2),
hjust = rep(c(1,1,1,-1,-1,-1),2)); p
# Faceted plot with ggrepel (nudge_x and hjust assigned for each label)
# does not work when different number of variables in each facet
df1 <- df[-nrow(df),]; df1 # Remove one of the rows from df
# So this gives the following plot with different alluvia in each facet
p1 <- ggplot(df1, aes(y = value, axis1 = supplySector, axis2 = demandSector, group=region)) +
ggalluvial::geom_alluvium(aes(fill = supplySector), width = 1/12, color="black", alpha=0.6) +
ggalluvial::geom_stratum(width = 1/12, fill = "grey70", color = "grey10", alpha=1) +
scale_x_discrete(limits = c("supplySector", "demandSector"), expand = c(0.3,0),drop=F) +
facet_wrap(region~.); p1
# If we try and label these and assigns the nudge and hjust for each axis we get an error
# It expects the same length vector for nudge and hjust for each facet
p1 + ggrepel::geom_text_repel(stat = "stratum", label.strata = TRUE, direction = "y",
size = 4, segment.color = 'grey50',
nudge_x = rep(c(-3,-3,-3,3,3,3),2),
hjust=rep(c(1,1,1,-1,-1,-1),2))
# Gives error: Error: Aesthetics must be either length 1 or the same as the data (10): hjust
# If we adjust the vectors for nudge_x and hjust to 10
p1 + ggrepel::geom_text_repel(stat = "stratum", label.strata = TRUE, direction = "y",
size = 4, segment.color = 'grey50',
nudge_x = c(-3,-3,-3,3,3,3,-3-3,3,3),
hjust = c(1,1,1,-1,-1,-1,1,1,-1,-1))
# Get Error: Error in data.frame(x = data$x + nudge_x, y = data$y + nudge_y) :
# arguments imply differing number of rows: 9, 6
# In addition: Warning message:
# In data$x + nudge_x :
# longer object length is not a multiple of shorter object length
# It can be plotted without specifying the nudge_x and hjust values
p1 + ggrepel::geom_text_repel(stat = "stratum", label.strata = TRUE, direction = "y",
size = 4, segment.color = 'grey50')
In summary, what I am trying to do is:
For plot p1 (with different number of alluvia in different facets)
Label each x axis stratum column
Have axis1 labels repel to the left and be left justified
Have axis2 labels repel to the right and be right justified
This answer suggested the different vector length for labels but it doesn't work for varying facets.
Labelling and theme of ggalluvial plot in R
This is tricky! The nudge_* and *just arguments generally aren't dynamic. One way you could solve for this is to dig into the guts using ggplot_build()
ggplot_build() has all of the "instructions" of how ggplot() builds the chart. You can edit the data and then run plot(ggplot_gtable()) to see the plot with your modifications. I have added comments to help explain these steps.
library(dplyr) # provides %>% and mutate(), used below
# here is the base plot + the new layer for labels
plot_and_label <-
p1 +
geom_text_repel(
stat = "stratum", label.strata = TRUE,
direction = "y", size = 4,
segment.color = 'grey50',
nudge_x = 0
)
# this is the plot under the hood
gg_guts <- ggplot_build(plot_and_label)
# the geom_text_repel layer was the 3rd one we added so you can
# access and edit it like this
gg_guts$data[[3]] <-
gg_guts$data[[3]] %>%
mutate(hjust = ifelse(x%%2 == 1, 2, -2))
# once you've made your adjustments, you can plot it again
plot(ggplot_gtable(gg_guts))
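The same build, edit, redraw cycle can be tried on a much smaller plot. This is a minimal sketch with made-up data and a plain geom_text() layer instead of the alluvial plot; it assumes a ggplot2 version in which the built object's data list can be modified in place, as the answer's code also does. The x %% 2 == 1 test works there because the first axis sits at x = 1 and the second at x = 2.

```r
library(ggplot2)

# two labels, one at an odd x position and one at an even x position
d <- data.frame(x = c(1, 2), y = c(1, 1), lab = c("axis 1", "axis 2"))
p <- ggplot(d, aes(x = x, y = y, label = lab)) + geom_text()

built <- ggplot_build(p)      # computed data for every layer
layer1 <- built$data[[1]]     # the only layer: geom_text

# per-row justification: right-justify at odd x, left-justify at even x
layer1$hjust <- ifelse(layer1$x %% 2 == 1, 1, 0)
built$data[[1]] <- layer1

# redraw from the edited build
plot(ggplot_gtable(built))
new_hjust <- built$data[[1]]$hjust
```

Because the edit happens after the plot is built, it applies row by row regardless of how many labels each facet contains.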