Parse superscript in discrete axis values on geom_bar - r

I'm trying to add a superscript to some x-axis values in order to connect to a footnote that'll be at the bottom of the page. The easy workaround would just be an asterisk instead of ^a but that won't work for my purposes.
I did a lot of searching, and while there are plenty of posts about superscripts in axis labels, I couldn't find any about superscripts in axis values. Most of them appeared to center around adding a gg + labs(x = expression("blah^a")).
I did find this post about parsing superscripts inside a geom_text() but it appears the same doesn't work for a geom_bar().
Here's some test data:
library(ggplot2)
dat <- data.frame(x = c("alpha", "bravo^a"),
y = c(10, 8))
ggplot(data = dat) +
geom_bar(aes(x = x,
y = y),
stat = "identity")

You just need to parse the text inside scale_x_discrete
Edit: add geom_text example
library(ggplot2)
dat <- data.frame(x = c("alpha", "bravo^a"),
y = c(10, 8))
### need to convert x to factor if R >= 4.0
dat$x <- factor(dat$x)
ggplot(data = dat) +
geom_bar(aes(x = x,
y = y),
stat = "identity") +
scale_x_discrete(labels = parse(text = levels(dat$x))) +
geom_text(aes(x = x, y = y,
label = x),
parse = TRUE,
nudge_y = 1,
size = 5) +
theme_minimal(base_size = 14)
Created on 2018-08-27 by the reprex package (v0.2.0.9000).
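A minimal sketch of a variation on the answer above, assuming you would rather not convert x to a factor first: scale_x_discrete() also accepts a function for labels, which receives the break strings and can parse them. The helper name parse_labels is made up for illustration. Note that parse() will fail on labels that are not valid plotmath (for example, labels containing spaces), so this is only a starting point.

```r
library(ggplot2)

dat <- data.frame(x = c("alpha", "bravo^a"),
                  y = c(10, 8))

# Hypothetical helper (not part of the original answer):
# turn each break string into a plotmath expression
parse_labels <- function(breaks) parse(text = breaks)

p <- ggplot(dat, aes(x = x, y = y)) +
  geom_col() +                               # same as geom_bar(stat = "identity")
  scale_x_discrete(labels = parse_labels)    # labels are parsed when the scale is drawn
p
```

Because the function is applied to whatever breaks the scale ends up with, it does not depend on the level order of x.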

Related

How to manually add a tick mark in ggplot2? [duplicate]

This question already has answers here:
ggplot2 annotate on y-axis (outside of plot)
(2 answers)
Closed 5 months ago.
I am quite new to the ggplot2 world in R, so I am trying to get familiar with the technicalities of plotting with ggplot2. In particular, I have a problem, which can be replicated by the following MWE:
ggplot(data.frame(x = c(1:10), y = c(1:10)), aes(x = c(1:10), y = c(1:10))) +
geom_line() +
geom_hline(aes(yintercept = 6), lty = 2)
This generates a simple graph of a diagonal line with a horizontal dashed line that cuts the y-axis at 6.
I would like to know whether there is a way to add a tick mark of 6 on the y-axis? In other words, say I simply showed this plot to a person without the code. I want the person to know that the horizontal dashed line cuts the y-axis at 6, which is not easily seen since 6 is not labelled currently.
Any intuitive suggestions will be greatly appreciated :)
You can use scale_y_continuous and give it all the points where you'd want a tick in the breaks argument.
library(ggplot2)
ggplot(data.frame(x = c(1:10), y = c(1:10)), aes(x = x, y = y)) +
geom_line() +
geom_hline(aes(yintercept = 6), lty = 2) +
scale_y_continuous(breaks = c(2, 4, 6, 8, 10))
In this specific example there is a lot of "empty space", so you might consider adding the number inside the plot using geom_text or geom_label, if that is what people should take away from your plot:
library(ggplot2)
ggplot(data.frame(x = c(1:10), y = c(1:10)), aes(x = x, y = y)) +
geom_line() +
geom_hline(aes(yintercept = 6), lty = 2) +
#scale_y_continuous(breaks = c(2, 4, 6, 8, 10)) +
geom_label(aes(x=2, y=6, label = "line at y = 6.0"))
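If you would rather keep the default ticks and only add the extra one, breaks can also be a function of the axis limits. The following is a sketch under that assumption; with_extra_breaks is a hypothetical helper name, and extended_breaks() comes from the scales package, which ggplot2 uses for its own default breaks.

```r
library(ggplot2)
library(scales)

# Hypothetical helper: default breaks for the given limits, plus extra values
with_extra_breaks <- function(extra) {
  function(limits) sort(unique(c(extended_breaks()(limits), extra)))
}

df <- data.frame(x = 1:10, y = 1:10)

p <- ggplot(df, aes(x = x, y = y)) +
  geom_line() +
  geom_hline(yintercept = 6, linetype = 2) +
  scale_y_continuous(breaks = with_extra_breaks(6))  # default ticks plus a tick at 6
p
```

This avoids typing out every tick position by hand, at the cost of one extra helper.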
Your data frame
df <- data.frame(x = c(1:10), y = c(1:10))
#Plotting
p <- ggplot(df, aes(x, y)) +
geom_point()
p
Custom function from here
It creates both x and y breaks
add_x_break <- function(plot, xval) {
p2 <- ggplot_build(plot)
breaks <- p2$layout$panel_params[[1]]$x$breaks
breaks <- breaks[!is.na(breaks)]
plot +
geom_vline(xintercept = xval) +
scale_x_continuous(breaks = sort(c(xval, breaks)))
}
add_y_break <- function(plot, yval) {
p2 <- ggplot_build(plot)
breaks <- p2$layout$panel_params[[1]]$y$breaks
breaks <- breaks[!is.na(breaks)]
plot +
geom_hline(yintercept = yval) +
scale_y_continuous(breaks = sort(c(yval, breaks)))
}
Define your break; in your case, a y break:
p <- add_y_break(p, 6)
p
Hope this helps
A bit of improvement with the ticks
p <- ggplot(df, aes(x, y)) +
geom_point()+
scale_x_continuous(breaks = round(seq(min(df$x), max(df$x), by = 0.5),1)) +
scale_y_continuous(breaks = round(seq(min(df$y), max(df$y), by = 0.5),1))
p
p <- add_y_break(p, 6)
p

Is there a way to customize the labels of size on a bubble plot to whatever I'd like in R?

As you can see on the image, R automatically assigns the values 0, 0.25... 1 for the size of the point. I was wondering if I could replace the 0, 0.25... 1 and make these text values instead while keeping the actual numerical values from the data.
library(ggplot2)
library(scales)
data(SLC4A1, package="ggplot2")
SLC4A1 <- read.csv(file.choose(), header = TRUE)
# bubble chart showing position of polymorphisms on gene, the frequency of each of these
# polymorphisms, where they are prominent on earth, and p-value
SLC4A1ggplot <- ggplot(SLC4A1, aes(Position, log10(Frequency)))+
geom_jitter(aes(col=Geographical.Location, size =(p.value)))+
labs(subtitle="Frequency of Various Polymorphisms", title="SLC4A1 Gene") +
labs(color = "Geographical Location") +
labs(size = "p-value") + labs(x = "Position of Polymorphism on SLC4A1 Gene") +
scale_size_continuous(range=c(1,4.5), trans = "reverse") +
guides(size = guide_legend(reverse = TRUE))
library(tidyverse)
df <- data.frame(x = 1:5, y = 1:5,z = 1:5)
ggplot(df,aes(x = x, y = y, size = z)) +
geom_point()
ggplot(df,aes(x = x, y = y, size = z)) +
geom_point() +
scale_size_continuous(range = 1:2) # control range of circle size
See more here:
https://ggplot2.tidyverse.org/reference/scale_size.html
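To replace the numbers in the size legend with text while the points are still sized by the numeric values, one option is to pass matching breaks and labels to scale_size_continuous(). This is a sketch with made-up data and made-up label text, not the asker's SLC4A1 data:

```r
library(ggplot2)

# Toy data; z stands in for the numeric variable mapped to size
df <- data.frame(x = 1:5, y = 1:5, z = c(0, 0.25, 0.5, 0.75, 1))

p <- ggplot(df, aes(x = x, y = y, size = z)) +
  geom_point() +
  scale_size_continuous(
    range  = c(1, 4.5),
    breaks = c(0, 0.25, 0.5, 0.75, 1),                    # values shown in the legend
    labels = c("none", "low", "medium", "high", "max")    # illustrative text, one per break
  )
p
```

The breaks and labels vectors must have the same length; the underlying data and point sizes are unchanged.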

Directlabels package-- labels do not fit in plot area

I want to explore the directlabels package with ggplot. I am trying to plot labels at the endpoint of a simple line chart; however, the labels are clipped by the plot panel. (I intend to plot about 10 financial time series in one plot and I thought directlabels would be the best solution.)
I would imagine there may be another solution using annotate or some other geoms. But I would like to solve the problem using directlabels. Please see code and image below. Thanks.
library(ggplot2)
library(directlabels)
library(tidyr)
#generate data frame with random data, for illustration and plot:
x <- seq(1:100)
y <- cumsum(rnorm(n = 100, mean = 6, sd = 15))
y2 <- cumsum(rnorm(n = 100, mean = 2, sd = 4))
data <- as.data.frame(cbind(x, y, y2))
names(data) <- c("month", "stocks", "bonds")
# gather every column except month: key column "asset", value column "value"
tidy_data <- gather(data, key = "asset", value = "value", -month)
p <- ggplot(tidy_data, aes(x = month, y = value, colour = asset)) +
geom_line() +
geom_dl(aes(colour = asset, label = asset), method = "last.points") +
theme_bw()
On data visualization principles, I would like to avoid extending the x-axis to make the labels fit--this would mean having data space with no data. Rather, I would like the labels to extend toward the white space beyond the chart box/panel (if that makes sense).
In my opinion, direct labels is the way to go. Indeed, I would position labels at the beginning and at the end of the lines, creating space for the labels with the expand argument of scale_x_continuous(). Also note that with the labels, there is no need for the legend.
This is similar to answers here and here.
library(ggplot2)
library(directlabels)
library(grid)
library(tidyr)
x <- seq(1:100)
y <- cumsum(rnorm(n = 100, mean = 6, sd = 15))
y2 <- cumsum(rnorm(n = 100, mean = 2, sd = 4))
data <- as.data.frame(cbind(x, y, y2))
names(data) <- c("month", "stocks", "bonds")
# gather every column except month: key column "asset", value column "value"
tidy_data <- gather(data, key = "asset", value = "value", -month)
ggplot(tidy_data, aes(x = month, y = value, colour = asset, group = asset)) +
geom_line() +
scale_colour_discrete(guide = 'none') +
scale_x_continuous(expand = c(0.15, 0)) +
geom_dl(aes(label = asset), method = list(dl.trans(x = x + .3), "last.bumpup")) +
geom_dl(aes(label = asset), method = list(dl.trans(x = x - .3), "first.bumpup")) +
theme_bw()
If you prefer to push the labels into the plot margin, direct labels will do that. But because the labels are positioned outside the plot panel, clipping needs to be turned off.
p1 <- ggplot(tidy_data, aes(x = month, y = value, colour = asset, group = asset)) +
geom_line() +
scale_colour_discrete(guide = 'none') +
scale_x_continuous(expand = c(0, 0)) +
geom_dl(aes(label = asset), method = list(dl.trans(x = x + .3), "last.bumpup")) +
theme_bw() +
theme(plot.margin = unit(c(1,4,1,1), "lines"))
# Code to turn off clipping
gt1 <- ggplotGrob(p1)
gt1$layout$clip[gt1$layout$name == "panel"] <- "off"
grid.draw(gt1)
This effect can also be achieved using geom_text (and probably also annotate), that is, without the need for direct labels.
p2 = ggplot(tidy_data, aes(x = month, y = value, group = asset, colour = asset)) +
geom_line() +
geom_text(data = subset(tidy_data, month == 100),
aes(label = asset, colour = asset, x = Inf, y = value), hjust = -.2) +
scale_x_continuous(expand = c(0, 0)) +
scale_colour_discrete(guide = 'none') +
theme_bw() +
theme(plot.margin = unit(c(1,3,1,1), "lines"))
# Code to turn off clipping
gt2 <- ggplotGrob(p2)
gt2$layout$clip[gt2$layout$name == "panel"] <- "off"
grid.draw(gt2)
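In ggplot2 3.0.0 and later, clipping can also be switched off inside the plot specification with coord_cartesian(clip = "off"), which avoids editing the gtable. The following is a sketch of the geom_text variant under that assumption; the data are regenerated here with an arbitrary seed so that the block runs on its own.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only so the sketch is repeatable
tidy_data <- data.frame(
  month = rep(1:100, times = 2),
  asset = rep(c("stocks", "bonds"), each = 100),
  value = c(cumsum(rnorm(100, mean = 6, sd = 15)),
            cumsum(rnorm(100, mean = 2, sd = 4)))
)

# One row per asset: the last observation, where the label is placed
last_points <- subset(tidy_data, month == max(month))

p3 <- ggplot(tidy_data, aes(x = month, y = value, colour = asset, group = asset)) +
  geom_line() +
  geom_text(data = last_points, aes(label = asset), hjust = -0.1) +
  scale_x_continuous(expand = c(0, 0)) +
  scale_colour_discrete(guide = "none") +
  coord_cartesian(clip = "off") +                      # let labels draw outside the panel
  theme_bw() +
  theme(plot.margin = unit(c(1, 4, 1, 1), "lines"))    # room for labels on the right
p3
```

The right-hand margin still has to be wide enough for the longest label.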
Since you didn't provide a reproducible example, it's hard to say what the best solution is. However, I would suggest trying to manually adjust the x-scale. Use a "buffer" to increase the plot area.
#generate data frame with random data, for illustration and plot:
p <- ggplot(tidy_data, aes(x = month, y = value, colour = asset)) +
geom_line() +
geom_dl(aes(colour = asset, label = asset), method = "last.points") +
theme_bw() +
xlim(minimum_value, maximum_value + buffer)
Using scale_x_discrete() or scale_x_continuous() would likely also work well here if you want to use the direct labels package. Alternatively, annotate or a simple geom_text would also work well.

Dynamically formatting individual axis labels in ggplot2

This may end up being an expression or call question, but I am trying to conditionally format individual axis labels.
In the following example, I'd like to selectively bold one of the axis labels:
library(ggplot2)
data <- data.frame(labs = c("Oranges", "Apples", "Cucumbers"), counts = c(5, 10, 12),
stringsAsFactors = TRUE) # levels() below needs a factor; R >= 4.0 no longer converts strings by default
ggplot(data = data) +
geom_bar(aes(x = labs, y = counts), stat="identity")
There is a similar problem here, but the solution involves theme and element_text. I am trying to use axis labels directly.
I can do this manually as below:
breaks <- levels(data$labs)
labels <- breaks
labels[2] <- expression(bold("Cucumbers"))
ggplot(data = data) +
geom_bar(aes(x = labs, y = counts), stat="identity") +
scale_x_discrete(label = labels, breaks = breaks)
But, if I try to do it by indexing instead of typing out "Cucumbers", I get the following error:
breaks <- levels(data$labs)
labels <- breaks
labels[2] <- expression(bold(labels[2]))
ggplot(data = data) +
geom_bar(aes(x = labs, y = counts), stat="identity") +
scale_x_discrete(label = labels, breaks = breaks)
Which makes sense, because it is not evaluating the labels[2]. But, does anyone know how to force it to do that? Thanks.
How about
breaks <- levels(data$labs)
labels <- as.expression(breaks)
labels[[2]] <- bquote(bold(.(labels[[2]])))
ggplot(data = data) +
geom_bar(aes(x = labs, y = counts), stat="identity") +
scale_x_discrete(label = labels, breaks = breaks)
Here we are more explicit about the conversion to expression and we use bquote() to insert the value of the label into the expression itself.
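The same idea can be wrapped in a small helper that bolds any subset of labels. This is only a sketch: bold_some and its arguments are hypothetical names, and it assumes the labels are plain strings.

```r
library(ggplot2)

data <- data.frame(labs = factor(c("Oranges", "Apples", "Cucumbers")),
                   counts = c(5, 10, 12))

# Hypothetical helper: return an expression vector in which the labels
# listed in `targets` are wrapped in plotmath bold()
bold_some <- function(labs, targets) {
  out <- as.expression(labs)                 # strings become expression elements
  for (i in which(labs %in% targets)) {
    out[[i]] <- bquote(bold(.(labs[i])))     # .() inserts the label's value
  }
  out
}

breaks <- levels(data$labs)
labels <- bold_some(breaks, "Cucumbers")

p <- ggplot(data = data) +
  geom_bar(aes(x = labs, y = counts), stat = "identity") +
  scale_x_discrete(labels = labels, breaks = breaks)
p
```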
Another option is to set the font face dynamically with theme, though I'm not sure if this is in any sense a better or worse method than @MrFlick's answer:
breaks <- levels(data$labs)
labels <- breaks # plain labels; the bolding is done via theme() below
# Reference breaks by name
toBold = "Cucumbers"
ggplot(data = data) +
geom_bar(aes(x = labs, y = counts), stat="identity") +
scale_x_discrete(label = labels, breaks = breaks) +
theme(axis.text.x=
element_text(face=ifelse(breaks %in% toBold, "bold", "plain")))
# Reference breaks by position
label.index=2
ggplot(data = data) +
geom_bar(aes(x = labs, y = counts), stat="identity") +
scale_x_discrete(label = labels, breaks = breaks) +
theme(axis.text.x=
element_text(face=ifelse(breaks %in% labels[match(label.index, 1:length(breaks))],
"bold", "plain")))

Condition a ..count.. summation on the faceting variable

I'm trying to annotate a bar chart with the percentage of observations falling into that bucket, within a facet. This question is very closely related to this question:
Show % instead of counts in charts of categorical variables but the introduction of faceting introduces a wrinkle. The answer to the related question is to use stat_bin w/ the text geom and then have the label be constructed as so:
stat_bin(geom="text", aes(x = bins,
y = ..count..,
label = paste(round(100*(..count../sum(..count..)),1), "%", sep="")
))
This works fine for an un-faceted plot. However, with facets, this sum(..count..) is summing over the entire collection of observations without regard for the facets. The plot below illustrates the issue---note that the percentages do not sum to 100% within a panel.
Here is the actual code for the figure above:
g.invite.distro <- ggplot(data = df.exp) +
geom_bar(aes(x = invite_bins)) +
facet_wrap(~cat1, ncol=3) +
stat_bin(geom="text", aes(x = invite_bins,
y = ..count..,
label = paste(round(100*(..count../sum(..count..)),1), "%", sep="")
),
vjust = -1, size = 3) +
theme_bw() +
scale_y_continuous(limits = c(0, 3000))
UPDATE: As per request, here's a small example re-producing the issue:
df <- data.frame(x = c('a', 'a', 'b','b'), f = c('c', 'd','d','d'))
ggplot(data = df) + geom_bar(aes(x = x)) +
stat_bin(geom = "text", aes(
x = x,
y = ..count.., label = ..count../sum(..count..)), vjust = -1) +
facet_wrap(~f)
Update: geom_bar requires stat = "identity".
Sometimes it's easier to obtain summaries outside the call to ggplot.
df <- data.frame(x = c('a', 'a', 'b','b'), f = c('c', 'd','d','d'))
# Load packages
library(ggplot2)
library(plyr)
# Obtain summary. 'Freq' is the count, 'pct' is the percent within each 'f'
m = ddply(data.frame(table(df)), .(f), mutate, pct = round(Freq/sum(Freq) * 100, 1))
# Plot the data using the summary data frame
ggplot(data = m, aes(x = x, y = Freq)) +
geom_bar(stat = "identity", width = .7) +
geom_text(aes(label = paste(pct, "%", sep = "")), vjust = -1, size = 3) +
facet_wrap(~ f, ncol = 2) + theme_bw() +
scale_y_continuous(limits = c(0, 1.2*max(m$Freq)))
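If you would rather not depend on plyr, the same per-facet percentages can be computed in base R with ave(), which applies a function within groups and returns a vector aligned with the input. A sketch using the small example data from the question:

```r
library(ggplot2)

df <- data.frame(x = c("a", "a", "b", "b"), f = c("c", "d", "d", "d"))

# Counts for every x/f combination; columns are x, f, Freq
m <- as.data.frame(table(df))

# Percent within each level of f: divide by the per-f total
m$pct <- round(100 * m$Freq / ave(m$Freq, m$f, FUN = sum), 1)

p <- ggplot(data = m, aes(x = x, y = Freq)) +
  geom_bar(stat = "identity", width = 0.7) +
  geom_text(aes(label = paste0(pct, "%")), vjust = -1, size = 3) +
  facet_wrap(~ f, ncol = 2) +
  theme_bw() +
  scale_y_continuous(limits = c(0, 1.2 * max(m$Freq)))
p
```

Within each panel the percentages now sum to 100 (up to rounding), because the denominator is the total for that facet rather than for the whole data set.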
