Adding subscripts and symbols to facet_wrap in ggplot2

I am trying to create a 2x2 facet plot of some weather conditions over time, and am having trouble adding a degrees symbol and a superscript to some of the facet titles.
weatherPLOT = data.frame(weather = rep(c("Soil Temperature (C)",
"Snow Depth (m)",
"Air Temperature (C)",
"Discharge (m3/sec)"), each = 366),
day = 1:366,
mean = 3, # Obvious place holders,
Lo95 = 2, # Are actual numbers in real code
Hi95 = 4)
ggplot(aes(y = mean, x = day), data = weatherPLOT) +
geom_ribbon(aes(ymin = Lo95, ymax = Hi95), alpha = 0.25) +
geom_path(size = 1) +
theme(axis.title.y = element_blank()) + xlab("Julian Day") +
facet_wrap( ~ weather, nrow = 2, ncol = 2, scales = "free")
I know the trick is to use labeller inside of facet_wrap, but I can't make it work. I'm just looking to add a degree symbol before the (C) and make the 3 in (m3/sec) superscripted.

The easiest way to do this is to change the text values themselves to the appropriate symbols and use the ggtext package for the formatting.
\u00b0 is the Unicode value for the degree sign. <sup>3</sup> is the ggtext Markdown code for superscript 3. You specify that theme text should be markdown by using ggtext::element_markdown().
library(ggplot2)
weatherPLOT = data.frame(weather = rep(c("Soil Temperature (\u00b0C)",
"Snow Depth (m)",
"Air Temperature (\u00b0C)",
"Discharge (m<sup>3</sup>/sec)"), each = 366),
day = 1:366,
mean = 3, # Obvious place holders,
Lo95 = 2, # Are actual numbers in real code
Hi95 = 4)
ggplot(aes(y = mean, x = day), data = weatherPLOT) +
geom_ribbon(aes(ymin = Lo95, ymax = Hi95), alpha = 0.25) +
geom_path(size = 1) +
labs(y = "", x = "Julian Day") +
theme(strip.text = ggtext::element_markdown()) +
facet_wrap( ~ weather, nrow = 2, ncol = 2, scales = "free")
Created on 2021-08-25 by the reprex package (v2.0.0)

Another way would be the following:
weatherPLOT %>%
mutate(weather = factor(weather,
labels = c(bquote('Air Temperature'*degree*C),
"Discharge~(m^{3}/sec)",
"Snow~Depth~(m)",
bquote('Soil Temperature'*degree*C)))) %>%
ggplot(aes(y = mean, x = day)) +
geom_ribbon(aes(ymin = Lo95, ymax = Hi95), alpha = 0.25) +
geom_path(size = 1) +
theme(axis.title.y = element_blank()) + xlab("Julian Day") +
facet_wrap( ~ weather, nrow = 2, ncol = 2, scales = "free", labeller = label_parsed)
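A variant of the label_parsed approach writes all four labels as plotmath strings, which keeps the parentheses around the units. This is a minimal sketch in base R, assuming the original (unformatted) weather values from the question; the names plotmath_labels, weather_raw and weather_parsed are hypothetical.

```r
# Raw facet values (names) mapped to plotmath strings (values).
# plotmath_labels is a hypothetical name, not part of ggplot2.
plotmath_labels <- c(
  "Air Temperature (C)"  = "Air~Temperature~(degree*C)",
  "Discharge (m3/sec)"   = "Discharge~(m^3/sec)",
  "Snow Depth (m)"       = "Snow~Depth~(m)",
  "Soil Temperature (C)" = "Soil~Temperature~(degree*C)"
)

# Small stand-in for the weather column of weatherPLOT
weather_raw <- rep(names(plotmath_labels), each = 2)

# Recode by name so the mapping does not rely on alphabetical level order
weather_parsed <- factor(weather_raw,
                         levels = names(plotmath_labels),
                         labels = unname(plotmath_labels))

# label_parsed can only render labels that parse as R expressions
parsed_ok <- vapply(levels(weather_parsed),
                    function(s) is.expression(parse(text = s)),
                    logical(1))
```

With the recoded column in the data frame, facet_wrap(~ weather, labeller = label_parsed) should render the degree sign and the superscript.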

Related

geom_ribbon: Fill area between lines - spurious lines connecting groups

I'm trying to build a plot with two lines and fill the area between with geom_ribbon. I've managed to select a fill color (red/blue) depending on the sign of the difference between two lines. First I create two new columns in the dataset for ymax, ymin. It seems to work but some spurious lines appear joining red areas.
Is geom_ribbon appropriate to fill the areas? Is there any problem in the plot code?
This is the code used to create the plot
datos.2022 <- datos.2022 %>% mutate(y1 = SSTm-273.15, y2 = SST.mean.day-273.15)
datos.2022 %>% ggplot(aes(x=fecha)) +
geom_line(aes(y=SSTm-273.15), color = "red") +
geom_line(aes(y=SST.mean.day - 273.15), color = "black") +
geom_ribbon(aes(ymax=y1, ymin = y2, fill = as.factor(sign)), alpha = 0.6) +
scale_fill_manual(guide = "none", values=c("blue","red")) +
scale_y_continuous(limits = c(10,30)) +
scale_x_date(expand = c(0,0), breaks = "1 month", date_labels = "%b" ) +
theme_hc() +
labs(x="",y ="SST",title = "Temperature (2022)") +
theme(text = element_text(size=20,family = "Arial"))
And this is the output
Example data for the plot available at https://www.dropbox.com/s/mkk8w7py2ynuy1t/temperature.dat?dl=0
What if you made two different series to plot as ribbons? One covers the positive differences and has zero height (ymin equal to ymax) wherever the difference is negative; the other covers the negative differences and works the same way in reverse.
library(dplyr)
library(ggplot2)
datos.2022 <- datos.2022 %>%
mutate(y1 = SSTm-273.15,
y2 = SST.mean.day-273.15) %>%
rowwise() %>%
mutate(high_pos = max(SST.mean.day - 273.15, y1),
low_neg = min(SSTm-273.15, y2))
datos.2022 %>% ggplot(aes(x=fecha)) +
geom_line(aes(y=SSTm-273.15), color = "red") +
geom_line(aes(y=SST.mean.day - 273.15), color = "black") +
geom_ribbon(aes(ymax=high_pos, ymin = SST.mean.day - 273.15, fill = "b"), alpha = 0.6, col="transparent", show.legend = FALSE) +
geom_ribbon(aes(ymax = SST.mean.day - 273.15, ymin = low_neg, fill = "a"), alpha = 0.6, col="transparent", show.legend = FALSE) +
scale_fill_manual(guide = "none", values=c("blue","red")) +
scale_y_continuous(limits = c(10,30)) +
scale_x_date(expand = c(0,0), breaks = "1 month", date_labels = "%b" ) +
#theme_hc() +
labs(x="",y ="SST",title = "Temperature (2022)") +
theme(text = element_text(size=20,family = "Arial"))
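The rowwise() step above computes a per-row maximum and minimum; base R's vectorized pmax() and pmin() give the same two columns without it. A small sketch with made-up values (the numbers and the name toy are placeholders, not the asker's data):

```r
# Toy stand-in for datos.2022; values are invented for illustration
toy <- data.frame(y1 = c(15.2, 16.0, 14.1, 17.3),   # plays the role of SSTm - 273.15
                  y2 = c(15.0, 16.5, 14.8, 17.0))   # plays the role of SST.mean.day - 273.15

# Upper edge of the "above the mean" ribbon: never drops below y2
toy$high_pos <- pmax(toy$y1, toy$y2)
# Lower edge of the "below the mean" ribbon: never rises above y2
toy$low_neg  <- pmin(toy$y1, toy$y2)
```

Inside the pipeline this would replace rowwise() and the second mutate() with a single mutate(high_pos = pmax(y1, y2), low_neg = pmin(y1, y2)).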

R tidyverse enable use of multiple CPU cores

I have a Core i9 in my work computer and was wondering whether it is possible to use multiple CPU cores to produce ggplots more quickly. Or is ggplot more GPU dependent?
Data (df):
The data can be accessed here. I wasn't able to paste it because the dataset is large, and I think it is needed for the question since I want to speed up ggplot.
Sample code:
df = read.csv("path/TMean.csv")
gg = df %>%
ggplot(aes(x = year, y = tmean)) +
geom_point(aes(color = "Temperature"), size = 2, shape = 1, alpha = 0.1) +
geom_smooth(method = lm, aes(linetype = "LM"), se = FALSE, color = "red") +
scale_linetype_manual(values = 2, name = NULL) +
scale_colour_manual(values = "deepskyblue4", name = "Legend") +
theme(text = element_text(size = 16)) +
xlab("Year") +
ylab("Mean Temperature (\u00B0C)") +
ggtitle("1980-2021 Historical Change")+
guides(color = guide_legend(override.aes = list(alpha = 0.5), order = 1))
# Model and label formula and R^2
lm(tmean ~ year, data = df) -> model_df
get_formula(model_df)
summary(model_df)$r.squared -> r_squared_df # R^2 comes from the model summary, not from the data frame
r_squared_df = round(r_squared_df, digits = 4)
#Now we need to add the text to the plot:
gg +
geom_text(x = 1983.2, y = 30.8,
label = paste0("Formula = ", get_formula(model_df)),
color = 'red') +
geom_text(x = 1980, y = 30.4,
label = paste0("R\U00B2 = ", r_squared_df),
color = 'blue')

Adding a comma and a plus sign (+) to numbers on the x axis of ggplot

I have plotted the following figure, which shows numbers on the x axis. What I want is to add commas and plus signs on these numbers. What I would expect is "+100,000" or "+200,000". Nonetheless, I have only managed to do it separately, as: "100,000" or "+100000"
I used the following code.
ggplot(data, aes(x = difference_gdp, y = difference_rate, color = region)) +
geom_point(size = 4) +
xlab("Variation in GDP per capita, 1990 vs 2019") +
ylab("Variation in age-standardised\nT2DM-attributable deaths per\n100,000 people, 1990 vs 2019") +
stat_cor(method = "pearson", aes(x=difference_gdp, y = difference_rate, color = NA), digits = 2, p.accuracy = 0.05) +
geom_smooth(method = 'lm', formula = 'y~x', se = FALSE, aes(color = NA)) +
scale_x_continuous(labels = function(x) sprintf("%+d", x)) +
scale_y_continuous(labels = function(y) sprintf("%+d", y))
I know the code to add the comma is scale_x_continuous(labels = comma), but I don't know how to add it to my previous code.
I think the scales package covers this use case, e.g.
scales::number_format(prefix = "+",big.mark = ",")(1000)
Something like this may work (I can't test it):
ggplot(data, aes(x = difference_gdp, y = difference_rate, color = region)) +
geom_point(size = 4) + xlab("Variation in GDP per capita, 1990 vs 2019") + ylab(
"Variation in age-standardised\nT2DM-attributable deaths per\n100,000 people, 1990 vs 2019"
) + stat_cor(
method = "pearson",
aes(x = difference_gdp, y = difference_rate, color = NA),
digits = 2,
p.accuracy = 0.05
) + geom_smooth(method = 'lm',
formula = 'y~x',
se = FALSE,
aes(color = NA)) + scale_x_continuous(
labels = scales::number_format(prefix = "+",big.mark = ",")
) + scale_y_continuous(
labels = scales::number_format(prefix = "+",big.mark = ",")
)
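One caveat: a fixed prefix = "+" is attached to every label, including negative values and zero. If the differences can be negative, a small custom labelling function in base R is one option. This is a sketch; signed_comma is a hypothetical helper name.

```r
# Hypothetical helper: thousands separator plus an explicit "+" for positive values only
signed_comma <- function(x) {
  # scientific = FALSE avoids "1e+05"; trim = TRUE drops padding spaces
  out <- format(x, big.mark = ",", scientific = FALSE, trim = TRUE)
  out <- paste0(ifelse(!is.na(x) & x > 0, "+", ""), out)
  out[is.na(x)] <- NA_character_   # axis breaks can contain NA
  out
}

signed_comma(c(-200000, 0, 100000))
```

It can be passed directly to the scales, e.g. scale_x_continuous(labels = signed_comma).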

Add label in ggplot

My code with the following output (below in the picture) calculates the average price of the neighbourhood groups.
Besides the mean, I also want to add the median price label. How should I add this information to the graph?
p.nbr <- ny_explor %>%
group_by(neighbourhood_group) %>%
summarise(price = round(mean(price), 2))
ggplot(ny_explor, aes(price)) +
geom_histogram(bins = 30, aes(y = ..density..), fill = "darkslategrey") +
geom_density(alpha = 0.2, fill = "darkslategrey") +
theme_bw() +
ggtitle("Distribution of price by neighbourhood groups",
subtitle = expression("With" ~'log'[10] ~ "transformation of x-axis")) +
geom_vline(data = p.nbr, aes(xintercept = price), size = 2, linetype = 3) +
geom_text(data = p.nbr,y = 1.5, aes(x = price + 1400, label = paste("Mean = ",price)), color = "saddlebrown", size = 4) +
facet_wrap(~neighbourhood_group) +
scale_x_log10()
It would have been easier if you had included some sample data, but your existing code could be modified like this, which may work. If not, please include some sample data.
p.nbr <- ny_explor %>%
group_by(neighbourhood_group) %>%
summarise(price_mean = round(mean(price), 2),
price_median = median(price))
ggplot(ny_explor, aes(price)) + # ny_explor has price; price_mean exists only in p.nbr
geom_histogram(bins = 30, aes(y = ..density..), fill = "darkslategrey") +
geom_density(alpha = 0.2, fill = "darkslategrey") +
theme_bw() +
ggtitle("Distribution of price by neighbourhood groups",
subtitle = expression("With" ~'log'[10] ~ "transformation of x-axis")) +
geom_vline(data = p.nbr, aes(xintercept = price_mean), size = 2, linetype = 3) +
geom_text(data = p.nbr,y = 1.5, aes(x = price_mean + 1400, label = paste("Mean = ",price_mean,
"\nMedian = ", price_median)), color = "saddlebrown", size = 4) +
facet_wrap(~neighbourhood_group) +
scale_x_log10()

R graph: label by group

The data I am working on is clustered, with multiple observations within each group. I generated a caterpillar plot and want one label for each group (zipid), not one for every line. My current graph and code look like this:
text = hosp_new[,c("zipid")]
ggplot(hosp_new, aes(x = id, y = oe, colour = zipid, shape = group)) +
# theme(panel.grid.major = element_blank()) +
geom_point(size=1) +
scale_shape_manual(values = c(1, 2, 4)) +
geom_errorbar(aes(ymin = low_ci, ymax = high_ci)) +
geom_smooth(method = lm, se = FALSE) +
scale_linetype_manual(values = linetype) +
geom_segment(aes(x = start_id, xend = end_id, y = region_oe, yend = region_oe, linetype = "4", size = 1.2)) +
geom_ribbon(aes(ymin = region_low_ci, ymax = region_high_ci), alpha=0.2, linetype = "blank") +
geom_hline(aes(yintercept = 1, alpha = 0.2, colour = "red", size = 1), show.legend = "FALSE") +
scale_size_identity() +
scale_x_continuous(name = "hospital id", breaks = seq(0,210, by = 10)) +
scale_y_continuous(name = "O:E ratio", breaks = seq(0,7, by = 1)) +
geom_text(aes(label = text), position = position_stack(vjust = 10.0), size = 2)
Caterpillar plot:
Each color represents a region. I just want one label per region, but I don't know how to remove the duplicated labels in this graph.
Any idea?
The key is to have geom_text return only one value for each zipid, rather than multiple values. If we want each zipid label located in the middle of its group, then we can use the average value of id as the x-coordinate for each label. In the code below, we use stat_summaryh (from the ggstance package) to calculate that average id value for the x-coordinate of the label and return a single label for each zipid.
library(ggplot2)
theme_set(theme_bw())
library(ggstance)
# Fake data
set.seed(300)
dat = data.frame(id=1:100, y=cumsum(rnorm(100)),
zipid=rep(LETTERS[1:10], c(10, 5, 20, 8, 7, 12, 7, 10, 13,8)))
ggplot(dat, aes(id, y, colour=zipid)) +
geom_segment(aes(xend=id, yend=0)) +
stat_summaryh(fun.x=mean, aes(label=zipid, y=1.02*max(y)), geom="text") +
guides(colour=FALSE)
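If ggstance is not available, the per-group label positions can also be computed up front and handed to geom_text() as a separate data frame. A minimal sketch in base R, rebuilding the same fake data as above; label_pos is a hypothetical name.

```r
# Same fake data as in the answer
set.seed(300)
dat <- data.frame(id = 1:100, y = cumsum(rnorm(100)),
                  zipid = rep(LETTERS[1:10], c(10, 5, 20, 8, 7, 12, 7, 10, 13, 8)))

# One row per zipid: the mean id is the horizontal centre of each group
label_pos <- aggregate(id ~ zipid, data = dat, FUN = mean)

# Same vertical position for every label, mirroring y = 1.02 * max(y) above
label_pos$y <- 1.02 * max(dat$y)
```

Adding geom_text(data = label_pos, aes(x = id, y = y, label = zipid)) to the plot should then draw a single label per group.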
You could also use faceting, as mentioned by @user20650. In the code below, panel.spacing.x=unit(0,'pt') removes the space between facet panels, while expand=c(0,0.5) adds 0.5 units of padding on the sides of each panel. Together, these ensure constant spacing between tick marks, even across facets.
ggplot(dat, aes(id, y, colour=zipid)) +
geom_segment(aes(xend=id, yend=0)) +
facet_grid(. ~ zipid, scales="free_x", space="free_x") +
guides(colour=FALSE) +
theme_classic() +
scale_x_continuous(breaks=0:nrow(dat),
labels=c(rbind(seq(0,100,5),'','','',''))[1:(nrow(dat)+1)],
expand=c(0,0.5)) +
theme(panel.spacing.x = unit(0,"pt"))
