R ggplot custom label axis ticks with other variable - r

When labelling axis ticks I'd like to add other information from the dataframe in parentheses. For example, using code snippet below I'd like to automatically include the information on Area in parentheses next to the label X. In other words, the label might say 'Chicago (45)' instead of just 'Chicago'. I know I can do this manually by setting labels in scale_x_discrete. However, can I do this automatically? My dataset has a large number of entries so I would like to avoid doing this manually.
dataset <- data.frame(Area = sample(c(NA, 1:100), 3, replace = TRUE),
Y = rnorm(3), X = c("Chicago","New York", "Orlando"))
ggplot(dataset, aes(X, Y)) + geom_point()

You can build a new column with your desired tick labels:
library(dplyr)
library(stringr)
library(ggplot2)
dataset = dataset %>% mutate(label = str_c(X, " (", Area, ")"))
Then use the new label column in the aesthetics:
ggplot(dataset, aes(label, Y)) + geom_point()
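Because Area is sampled from c(NA, 1:100), a row may have no area, and str_c() then returns NA for the whole label. Below is a minimal base R sketch, assuming you would rather fall back to the bare city name in that case. The fixed values and the names city and tick_labels are made up for illustration.

```r
# Illustrative data with one missing Area (values are invented for the example)
dataset <- data.frame(Area = c(45, NA, 12),
                      Y = c(0.1, -0.3, 1.2),
                      X = c("Chicago", "New York", "Orlando"))

# Append the area in parentheses only where it is known
city <- as.character(dataset$X)
dataset$label <- ifelse(is.na(dataset$Area),
                        city,
                        paste0(city, " (", dataset$Area, ")"))

# Named lookup from the original X values to the new labels
tick_labels <- setNames(dataset$label, city)
```

You can either map label as in the answer, or keep aes(X, Y) and pass the lookup with scale_x_discrete(labels = tick_labels), which leaves the underlying X values unchanged.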

Related

R: Programmatically changing ggplot scale labels to Greek letters with expressions

I am trying to change the labels in a ggplot object to Greek symbols for an arbitrary number of labels. Thanks to this post, I can do this manually when I know the number of labels in advance and the number is not too large:
# Simulate data
df <- data.frame(name = rep(c("alpha1","alpha2"), 50),
value = rnorm(100))
# Create a plot with greek letters for labels
ggplot(df, aes(x = value, y = name)) + geom_density() +
scale_y_discrete(labels = c("alpha1" = expression(alpha[1]),
"alpha2" = expression(alpha[2])))
For our purposes, assume I need to change k default labels, where each of the k labels is the pre-fix "alpha" followed by a number 1:k. Their corresponding updated labels would substitute the greek letter for "alpha" and use a subscript. An example of this is below:
# default labels
paste0("alpha", 1:k)
# desired labels
for (i in 1:k) { expression(alpha[i]) }
I was able to hack together the below programmatic solution that appears to produce the desired result thanks to this post:
ggplot(df, aes(x = value, y = name)) + geom_density() +
scale_y_discrete(labels = parse(text = paste("alpha[", seq_along(unique(df$name)), "]")))
However, I do not understand this code and am seeking clarification about:
What is parse() doing here that expression() otherwise would do?
While I understand everything to the right-hand side of =, what is text doing on the left-hand side of the =?
Another option to achieve your desired result would be to add a new column to your data which contains the ?plotmath expression as a string and map this new column on y. Afterwards you could use scales::label_parse() to parse the expressions:
set.seed(123)
df <- data.frame(name = rep(c("alpha1","alpha2"), 50),
value = rnorm(100))
df$label <- gsub("^(.*?)(\\d+)$", "\\1[\\2]", df$name)
library(ggplot2)
library(scales)
ggplot(df, aes(x = value, y = label)) + geom_density() +
scale_y_discrete(labels = scales::label_parse())
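On the two questions themselves, a small base R sketch may help. expression() only captures code that is typed literally, so expression(alpha[i]) inside a loop keeps the symbol i rather than its value. parse() builds the same kind of object from character strings, which is why the labels can be generated with paste(). The variable names below are illustrative.

```r
k <- 3

# expression() captures code exactly as it is typed
manual <- expression(alpha[1], alpha[2], alpha[3])

# parse() builds the same kind of object from character strings.
# `text` is just a named argument of parse(); its first positional
# argument is `file`, so the strings have to be passed by name.
strings <- paste0("alpha[", seq_len(k), "]")
parsed <- parse(text = strings, keep.source = FALSE)

class(parsed)
identical(as.character(parsed), as.character(manual))
```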

how to assign text to ggplot with a condition when y axis is not numeric in r

I want to annotate a percentage number of missing values for each variable that actually has any missing values somewhere above the corresponding variable blue line.
I can add text using geom_text, but I have difficulty selecting only those variables with NAs. I would appreciate any hints.
library(ggplot2)
library(naniar)
gg_miss_var(airquality) + labs(y = "Look at all the missing ones")
You can use naniar::miss_var_summary() to create a data frame with labels for all variables with at least one NA:
df <- miss_var_summary(airquality) %>%
dplyr::filter(n_miss > 0) %>%
dplyr::mutate(pct_label = paste0(round(pct_miss, 1), '%'))
You can then use this data frame inside your geom_text() line:
gg_miss_var(airquality) +
geom_text(data = df, aes(x = as.factor(variable), y = n_miss, label = pct_label),
vjust = 1.5) +
labs(y = "Look at all the missing ones")
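If you prefer not to depend on the helper, a label table with the same column names can be sketched in base R. This is an assumption-laden equivalent, not naniar's implementation. The name label_df is made up, and the columns simply mirror those used in the answer.

```r
# airquality ships with R (datasets package)
n_miss <- colSums(is.na(airquality))

# One row per variable, with the count and percentage of missing values
label_df <- data.frame(variable = names(n_miss),
                       n_miss = as.vector(n_miss),
                       pct_miss = 100 * as.vector(n_miss) / nrow(airquality))

# Keep only variables that actually have missing values
label_df <- label_df[label_df$n_miss > 0, ]
label_df$pct_label <- paste0(round(label_df$pct_miss, 1), "%")
```

label_df can then be passed as the data argument of geom_text() in the same way as df above.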

How to add a legend on a multiple line graph in R?

I am trying to plot two different datasets on the same plot. I am using this code to add the lines and to actually plot everything
ggplot()+
geom_point(data=Acc, aes(x=Year, y=Accumulo), color="lightskyblue")+
geom_line(data=Acc, aes(x=Year, y=RM3), color="gold1")+
geom_line(data=Acc, aes(x=Year, y=RM5), color="springgreen3")+
geom_line(data=Acc, aes(x=Year, y=RM50), color="blue")+
geom_line(data=Vulcani, aes(x=Year, y=Accumulo.V), color="red")+
theme_bw()+
scale_x_continuous(expand=expand_scale(0)) + scale_y_continuous(limits=c(50,350),expand=expand_scale(0))
but I can't find any way to add a legend and add custom labels to the different series. I find a way to add legends on a single dataset, but I can't find a way to add to this one a legend on the side
You are better off first combining everything into a single dataset in long format, tailored to the plot. You can then give a single geom_line() instruction and colour the lines with aes(color = ...) inside the call to geom_line(); mapping colour inside aes() is what makes ggplot draw a legend. Here's an example with the midwest dataset (two subsets of it are treated as distinct datasets for the sake of the example):
library(ggplot2)
library(dplyr)
library(tidyr)
long_midwest <- midwest %>%
select(popwhite, popasian, PID, poptotal) %>%
gather(key = "variable", value = "value", -PID, -poptotal) # convert to long format
long_midwest2 <- midwest %>%
select(poptotal, perchsd, PID) %>%
gather(key = "variable", value = "value", -PID, -poptotal)
plot_data <- bind_rows(long_midwest, long_midwest2) %>% # bind datasets vertically
mutate(line_type = ifelse(variable == 'perchsd', 'A', 'B')) # creates a line_type variable
ggplot(data = plot_data, aes(x=poptotal, y = value))+
geom_line(aes(color = variable, linetype = line_type)) +
scale_color_manual(
values = c('lightskyblue', 'gold1', 'blue'),
name = "My color legend"
) +
scale_linetype_manual(
values = c(3, 1), # play with the numbers to get the correct styling
name = "My linetype legend"
)
I added a line_type variable to show the most generic case, where you want a specific mapping between the column values and the line type. If it is the same as, say, variable, just use aes(color = variable, linetype = variable). You can then decide which linetype you want (see here for more details).
For customising the labels, just change the content of variable within the dataset with the desired values.
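One way to change the content of variable is a named lookup vector. The sketch below uses a tiny stand-in for plot_data, and the label texts and the name legend_labels are made up for illustration.

```r
# Tiny stand-in for plot_data (values invented for the example)
plot_data <- data.frame(variable = c("popwhite", "popasian", "perchsd", "popwhite"),
                        value = c(10, 2, 75, 12))

# Hypothetical display labels, keyed by the original column names
legend_labels <- c(popwhite = "White population",
                   popasian = "Asian population",
                   perchsd = "High school diploma (%)")

# Replace each short name with its display label
plot_data$variable <- unname(legend_labels[as.character(plot_data$variable)])
```

Renaming the values changes their alphabetical order, so an unnamed values vector in scale_color_manual() may then assign the colours to different series. An alternative that leaves the data untouched is to pass the lookup as scale_color_manual(labels = legend_labels).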

Highlight positions without data in facet_wrap ggplot

When facetting barplots in ggplot the x-axis includes all factor levels. However, not all levels may be present in each group. In addition, zero values may be present, so from the barplot alone it is not possible to distinguish between x-axis values with no data and those with zero y-values. Consider the following example:
library(tidyverse)
set.seed(43)
site <- c("A","B","C","D","E") %>% sample(20, replace=T) %>% sort()
year <- c("2010","2011","2012","2013","2014","2010","2011","2012","2013","2014","2010","2012","2013","2014","2010","2011","2012","2014","2012","2014")
isZero = rbinom(n = 20, size = 1, prob = 0.40)
value <- ifelse(isZero==1, 0, rnorm(20,10,3)) %>% round(0)
df <- data.frame(site,year,value)
ggplot(df, aes(x=year, y=value)) +
geom_bar(stat="identity") +
facet_wrap(~site)
This is fish census data, where not all sites were fished in all years, but sometimes no fish were caught. Hence the need to differentiate between the two situations. For example, there was no catch at site C in 2010 and it was not fished in 2011, and the reader cannot tell the difference. I would like to add something like "no data" to the plot for 2011. Maybe it is possible to fill in the rows where data is missing, generate another column with the desired text, and then include this via geom_text?
So here is an example of your proposed method:
# Tabulate sites vs year, take zero entries
tab <- table(df$site, df$year)
idx <- which(tab == 0, arr.ind = T)
# Build new data.frame
missing <- data.frame(site = rownames(tab)[idx[, "row"]],
year = colnames(tab)[idx[, "col"]],
value = 1,
label = "N.D.") # For 'no data'
ggplot(df, aes(year, value)) +
geom_col() +
geom_text(data = missing, aes(label = label)) +
facet_wrap(~site)
Alternatively, you could also let the facets omit unused x-axis values:
ggplot(df, aes(x=year, y=value)) +
geom_bar(stat="identity") +
facet_wrap(~site, scales = "free_x")

How to specify columns in facet_grid OR how to change labels in facet_wrap

I have a large number of data series that I want to plot using small multiples. A combination of ggplot2 and facet_wrap does what I want, typically resulting in a nice little block of 6 x 6 facets. Here's a simpler version:
The problem is that I don't have adequate control over the labels in facet strips. The names of the columns in the data frame are short and I want to keep them that way, but I want the labels in the facets to be more descriptive. I can use facet_grid so that I can take advantage of the labeller function but then there seems to be no straightforward way to specify the number of columns and a long row of facets just doesn't work for this particular task. Am I missing something obvious?
Q. How can I change the facet labels when using facet_wrap without changing the column names? Alternatively, how can I specify the number of columns and rows when using facet_grid?
Code for a simplified example follows. In real life I am dealing with multiple groups each containing dozens of data series, each of which changes frequently, so any solution would have to be automated rather than relying on manually assigning values.
require(ggplot2)
require(reshape)
# Random data with short column names
set.seed(123)
myrows <- 30
mydf <- data.frame(date = seq(as.Date('2012-01-01'), by = "day", length.out = myrows),
aa = runif(myrows, min=1, max=2),
bb = runif(myrows, min=1, max=2),
cc = runif(myrows, min=1, max=2),
dd = runif(myrows, min=1, max=2),
ee = runif(myrows, min=1, max=2),
ff = runif(myrows, min=1, max=2))
# Plot using facet wrap - we want to specify the columns
# and the rows and this works just fine, we have a little block
# of 2 columns and 3 rows
mydf <- melt(mydf, id = c('date'))
p1 <- ggplot(mydf, aes(y = value, x = date, group = variable)) +
geom_line() +
facet_wrap( ~ variable, ncol = 2)
print (p1)
# Problem: we want more descriptive labels without changing column names.
# We can change the labels, but doing so requires us to
# switch from facet_wrap to facet_grid
# However, in facet_grid we can't specify the columns and rows...
mf_labeller <- function(var, value){ # lifted bodily from the R Cookbook
value <- as.character(value)
if (var=="variable") {
value[value=="aa"] <- "A long label"
value[value=="bb"] <- "B Partners"
value[value=="cc"] <- "CC Inc."
value[value=="dd"] <- "DD Company"
value[value=="ee"] <- "Eeeeeek!"
value[value=="ff"] <- "Final"
}
return(value)
}
p2 <- ggplot(mydf, aes(y = value, x = date, group = variable)) +
geom_line() +
facet_grid( ~ variable, labeller = mf_labeller)
print (p2)
I don't quite understand. You've already written a function that converts your short labels to long, descriptive labels. What is wrong with simply adding a new column and using facet_wrap on that column instead?
# Start from the original wide mydf (skip this line if mydf is already melted)
mydf <- melt(mydf, id = c('date'))
mydf$variableLab <- mf_labeller('variable',mydf$variable)
p1 <- ggplot(mydf, aes(y = value, x = date, group = variable)) +
geom_line() +
facet_wrap( ~ variableLab, ncol = 2)
print (p1)
To change the label names, just change the factor levels of the factor you use in facet_wrap. These will be used in facet_wrap on the strips. You can use a similar setup as you would using the labeller function in facet_grid. Just do something like:
new_labels = sapply(levels(df$factor_variable), custom_labeller_function)
# Use labels =, not levels =: passing the new names as levels would turn every value into NA
df$factor_variable = factor(df$factor_variable, labels = new_labels)
Now you can use factor_variable in facet_wrap.
Just add labeller = label_wrap_gen(width = 25, multi_line = TRUE) to the facet_wrap() arguments.
E.g.: ... + facet_wrap( ~ variable, labeller = label_wrap_gen(width = 25, multi_line = TRUE))
More info: ?ggplot2::label_wrap_gen
Simply add labeller = label_both to the facet_wrap() arguments.
... + facet_wrap( ~ variable, labeller = label_both)
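Another option that keeps both the short column names and the ncol control of facet_wrap() is ggplot2's as_labeller() with a named lookup vector. This is a minimal sketch, not taken from the answers above. The data are random, only three series are shown, and the names long_df and facet_labels are illustrative.

```r
library(ggplot2)

# Random long-format data with short series names (invented for the example)
set.seed(123)
long_df <- data.frame(
  date = rep(seq(as.Date("2012-01-01"), by = "day", length.out = 30), times = 3),
  variable = rep(c("aa", "bb", "cc"), each = 30),
  value = runif(90, min = 1, max = 2)
)

# Lookup from short names to descriptive strip labels
facet_labels <- c(aa = "A long label", bb = "B Partners", cc = "CC Inc.")

# facet_wrap() keeps ncol; as_labeller() turns the lookup into a labeller
p <- ggplot(long_df, aes(x = date, y = value)) +
  geom_line() +
  facet_wrap(~ variable, ncol = 2, labeller = as_labeller(facet_labels))
print(p)
```

Since the lookup is an ordinary named character vector, it can be generated from a separate table of names rather than typed by hand, which suits the automated setting described in the question.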
