Ggplot2 axis label from column name of apply function iteration - r

I would like to change the y axis label (or main title would also be fine) of a ggplot to reflect the column name being iterated over within an apply function.
Here is some sample data and my working apply function:
trial_df <- data.frame("Patient" = c(1,1,2,2,3,3,4,4),
"Outcome" = c("NED", "NED", "NED", "NED", "Relapse","Relapse","Relapse","Relapse"),
"Time_Point" = c("Baseline", "Week3", "Baseline", "Week3","Baseline", "Week3","Baseline", "Week3"),
"CD4_Param" = c(50.8,53.1,20.3,18.1,30.8,24.5,35.2,31.0),
"CD8_Param" = c(5.3,9.7,4.4,4.3,3.1,3.2,5.6,5.3),
"CD3_Param" = c(11.6,16.6,5.0,5.1,14.3,7.1,5.9,8.1))
apply(trial_df[,4:length(trial_df)], 2, function(i) ggplot(data = trial_df, aes_string(x = "Time_Point", y = i )) +
facet_wrap(~Outcome) +
geom_boxplot(alpha = 0.1) +
geom_point(aes(color = `Outcome`, fill = `Outcome`)) +
geom_path(aes(group = `Patient`, color = `Outcome`)) +
theme_minimal() +
ggpubr::stat_compare_means( method = "wilcox.test") +
scale_fill_manual(values=c("blue", "red")) +
scale_color_manual(values=c("blue", "red")))
Example plot output
This creates 3 graphs as expected, however the y axis just says "y". I would like this to display the column name for the column in that iteration. It would also be fine to add a main title with this information, as I just need to know which graph corresponds to which column.
Here are things I have already tried adding to the ggplot code above based on some similar questions I found, but all of them give me the error "non-numeric argument to binary operator":
ggtitle(paste(i))
labs(y = i)
labs(y = as.character(i))
Any help or resources I may have missed would be greatly appreciated, thanks!

For reasons I can't quite work out, the following gives what you want, but for only one of the graphs:
apply(trial_df[,4:length(trial_df)], 2, function(i) ggplot(data = trial_df, aes_string(x = "Time_Point", y = i )) +
facet_wrap(~Outcome) +
geom_boxplot(alpha = 0.1) +
geom_point(aes(color = `Outcome`, fill = `Outcome`)) +
geom_path(aes(group = `Patient`, color = `Outcome`)) +
theme_minimal() +
stat_compare_means( method = "wilcox.test") +
scale_fill_manual(values=c("blue", "red")) +
scale_color_manual(values=c("blue", "red"))+
labs(y=colnames(trial_df)[i]))
Gives these:
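A possible explanation, offered as a sketch rather than a definitive fix: apply() passes each column's values to the function as i, not the column's name, so colnames(trial_df)[i] ends up indexing by data values. Iterating over the column names with lapply() makes the name available for both the aesthetic and the label. The ggpubr layer is left out so that the example needs only ggplot2, and the names value_cols and plots are illustrative.

```r
library(ggplot2)

trial_df <- data.frame(
  Patient    = c(1, 1, 2, 2, 3, 3, 4, 4),
  Outcome    = rep(c("NED", "Relapse"), each = 4),
  Time_Point = rep(c("Baseline", "Week3"), times = 4),
  CD4_Param  = c(50.8, 53.1, 20.3, 18.1, 30.8, 24.5, 35.2, 31.0),
  CD8_Param  = c(5.3, 9.7, 4.4, 4.3, 3.1, 3.2, 5.6, 5.3),
  CD3_Param  = c(11.6, 16.6, 5.0, 5.1, 14.3, 7.1, 5.9, 8.1)
)

# Iterate over column names (strings), not over column contents.
value_cols <- names(trial_df)[4:ncol(trial_df)]

plots <- lapply(value_cols, function(col) {
  ggplot(trial_df, aes(x = Time_Point, y = .data[[col]])) +
    facet_wrap(~Outcome) +
    geom_boxplot(alpha = 0.1) +
    geom_point(aes(color = Outcome)) +
    geom_path(aes(group = Patient, color = Outcome)) +
    theme_minimal() +
    scale_color_manual(values = c("blue", "red")) +
    labs(y = col, title = col)  # the column name is now a plain string
})
names(plots) <- value_cols
```

Printing plots$CD4_Param (or any other element) then draws the plot for that column.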

Related

Group geom_vline for a conditional

I believe I'm going about this incorrectly.
I have a ggplot that has several lines graphed into it. Each line is categorized under a 'group.' (ie. predator lines include lines for bear frequency, lion_frequency; prey lines include lines for fish frequency, rabbit_frequency; etc.)
Here's a reproducible example using dummy data
p <- function(black_lines, green_lines){
ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point() +
geom_vline(xintercept = 5) +
geom_vline(xintercept = 10) +
geom_vline(xintercept = 1:5,
colour = "green",
linetype = "longdash")
}
p()
Ideally, it would work like:
p <- function(black_lines, green_lines){
ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point() +
if (black_lines){
geom_vline(xintercept = 5) +
geom_vline(xintercept = 10) +
}
if(green_lines){
geom_vline(xintercept = 1:5,
colour = "green",
linetype = "longdash")
}
}
p(T, T)
This method won't work, of course, because R reports:
Error in ggplot_add():
! Cannot add ggproto objects together. Did you forget to add this object to a ggplot object?
But I'm wondering whether this is possible. I couldn't find any similar questions, so I feel like I'm going about this wrongly.
For those who believe more context is needed. This is for a reactive Shiny app and I want the user to be able to select how the graph will be generated (as such with specific lines or not).
Thank you for your guidance in advance!
You could create your conditional layers using an if and assign them to a variable which could then be added to your ggplot like any other layer:
Note: In case you want to include multiple layers then you could put them in a list, e.g. list(geom_vline(...), geom_vline(...)).
library(ggplot2)
p <- function(black_lines, green_lines){
vline_black <- if (black_lines) geom_vline(xintercept = c(5, 10))
vline_green <- if (green_lines) geom_vline(xintercept = 1:5,
colour = "green",
linetype = "longdash")
ggplot(mtcars, aes(x = wt, y = mpg)) +
geom_point() +
vline_black +
vline_green
}
p(T, T)
p(T, F)
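The list variant mentioned in the note can be sketched as follows, keeping the two black lines as separate layers. This is a minimal illustration under the same assumptions as the code above; the local names black and green are made up for the example.

```r
library(ggplot2)

p <- function(black_lines, green_lines) {
  # if() without else returns NULL when the condition is FALSE,
  # and adding NULL to a plot leaves it unchanged.
  black <- if (black_lines) {
    list(geom_vline(xintercept = 5), geom_vline(xintercept = 10))
  }
  green <- if (green_lines) {
    geom_vline(xintercept = 1:5, colour = "green", linetype = "longdash")
  }
  ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point() +
    black +
    green
}

both <- p(TRUE, TRUE)
none <- p(FALSE, FALSE)
```

Because the conditions are ordinary function arguments, they can be wired to Shiny inputs without changing the plotting code.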

geom_contour on adjusted number of distributions

I want to plot the Gaussian distributions that are included in samp, but the size of samp may vary. Here there are 4 components, but there may be fewer or more. How can I adjust the code to plot as many components as there are in samp?
ggplot(samp, aes(x=seq1, y=seq2)) +
geom_contour(mapping = aes(z=prob.1), color = "tomato") +
geom_contour(mapping = aes(z=prob.2), color = "darkblue") +
geom_contour(mapping = aes(z=prob.3), color = "green4") +
geom_contour(mapping = aes(z=prob.4), color = "purple") +
labs(x = "PC1", y = "PC2", title="Posterior probability") +
theme_bw()
The best way to do this is probably to pivot all columns whose names include the string "prob" into long format. That way, it's also easy to map the name of each probability to the color aesthetic, which gives you the option of a legend:
library(tidyverse)
ggplot(pivot_longer(samp, contains("prob")), aes(seq1, seq2, colour = name)) +
geom_contour(aes(z = value)) +
labs(x = "PC1", y = "PC2", title="Posterior probability") +
theme_bw()
Sample data used
set.seed(1)
samp <- setNames(as.data.frame(replicate(4, rnorm(100))), paste0("prob.", 1:4))
samp$seq1 <- rep(1:10, 10)
samp$seq2 <- rep(1:10, each = 10)
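To see what the pivot does to the data, it may help to inspect the reshaped frame on its own. A small sketch using the same sample data (the name long is illustrative): every prob.* column is stacked into a name/value pair, so the row count is the original row count times the number of components, however many there are.

```r
library(tidyr)

set.seed(1)
samp <- setNames(as.data.frame(replicate(4, rnorm(100))), paste0("prob.", 1:4))
samp$seq1 <- rep(1:10, 10)
samp$seq2 <- rep(1:10, each = 10)

# Stack all columns whose names contain "prob" into long format.
long <- pivot_longer(samp, contains("prob"))

dim(long)           # rows = 100 grid points x number of prob.* columns
unique(long$name)   # one entry per component, usable as a colour mapping
```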

Line break with x10^x notation ggplot title

I was able to get plots that look like the attached image by using "\n" for the line break. However, the problem is the 8.8e-14. The journal requests that I change it to 8.8x10^-14 (with the -14 in superscript). That only works if I use expression(paste()), but in that case the "\n" no longer causes a line break. I've spent about 5 hours on trial and error with different solutions from the internet, to no avail. Does anyone have a solution? Thanks in advance.
What I would like it to look like (except I would like it to say 8.8x10^-14 instead):
The below works EXCEPT there's no line break (I would like a line break before "Interaction")
plot_fun_to_revise = function(x, y) {
  ggplot(data = data_for_median_plots, aes(x = .data[[x]], y = .data[[y]], group = Secretor, linetype = Secretor)) +
    stat_summary(geom = "line", fun.data = median_hilow, size = 0.5) +
    stat_sum_df_all("median_hilow",
                    fun.args = list(conf.int = 0.5),
                    linetype = "solid",
                    size = 0.5) +
    theme_classic()
}
lnnt_plot_median <- plot_fun_to_revise("Timepoint", "LNnT") +
  ylim(0, 5000) +
  labs(y = paste("LNnT", "(\u03BCg/mL)"),
       title = expression(paste("Time p = 8.8 x", 10^-14, ", Secretor p = 0.35, Interaction p = 0.51")),
       x = "Time (months postpartum)")
Obviously, I don't have your data to replicate the plot itself, but since this is about labelling anyway, let's just make an (essentially) empty plot:
lnnt_plot_median <- ggplot(data.frame(x = 1, y = 1), aes(x, y)) +
geom_point() +
theme_classic() +
theme(text = element_text(face = 2, size = 16),
plot.title = element_text(hjust = 0.5))
Since you are using unicode escapes already, I think the easiest thing to do here is use the unicode escapes for superscript minus, superscript 1 and superscript 4:
lnnt_plot_median +
  labs(y = paste("LNnT", "(\u03BCg/mL)"),
       title = paste("Time p = 8.8 x 10\u207b\u00b9\u2074,",
                     "Secretor p = 0.35,\nInteraction p = 0.51"),
       x = "Time (months postpartum)")
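If several p-values need this treatment, the superscript conversion could be wrapped in a small helper. This is only a sketch: to_superscript() and sci_label() are hypothetical names, not part of ggplot2, and the helper assumes a positive number and a UTF-8 session.

```r
# Map digits and the minus sign to their Unicode superscript forms.
to_superscript <- function(s) {
  chartr("0123456789-",
         "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079\u207b",
         s)
}

# Format a positive number as "m x 10^e" with a Unicode superscript exponent.
sci_label <- function(x, digits = 2) {
  exponent <- floor(log10(x))
  mantissa <- signif(x / 10^exponent, digits)
  if (mantissa >= 10) {  # rounding can push the mantissa up to 10
    mantissa <- mantissa / 10
    exponent <- exponent + 1
  }
  paste0(format(mantissa), " x 10", to_superscript(as.character(exponent)))
}

# A plain character string, so "\n" still produces a line break in the title.
title_text <- paste0("Time p = ", sci_label(8.8e-14),
                     ", Secretor p = 0.35,\nInteraction p = 0.51")
```

The result is an ordinary string, so it can be passed directly to labs(title = title_text).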

R: placing a text with combination of variables over bars in ggplot

Let's draw a bar chart with ggplot2 from the following data (already in long format). The values of the variable are then placed in the middle of the bars via geom_text().
stuff.dat<-read.csv(text="continent,stuff,num
America,apples,13
America,bananas,13
Europe,apples,30
Europe,bananas,21
total,apples,43
total,bananas,34")
library(ggplot2)
ggplot(stuff.dat, aes(x=continent, y=num,fill=stuff))+geom_col() +
geom_text(position = position_stack(vjust=0.5),
aes(label=num))
Now I need to add the "Apple-Bananas Index", defined as f = apples/bananas, on top of the bars, just as it was added manually in the figure. How can this be programmed in ggplot? And how could it be added to the legend as a separate entry?
I think that the easiest way to achieve this is to prepare the data before you create the plot. I define a function abi() that computes the apple-banana-index from stuff.dat given a continent:
abi <- function(cont) {
with(stuff.dat,
num[continent == cont & stuff == "apples"] / num[continent == cont & stuff == "bananas"]
)
}
And then I create a data frame with all the necessary data:
# factor() is needed because read.csv() returns character columns in R >= 4.0
conts <- levels(factor(stuff.dat$continent))
abi_df <- data.frame(continent = conts,
yf = aggregate(num ~ continent, sum, data = stuff.dat)$num + 5,
abi = round(sapply(conts, abi), 1))
Now, I can add that information to the plot:
library(ggplot2)
ggplot(stuff.dat, aes(x = continent, y = num, fill = stuff)) +
geom_col() +
geom_text(position = position_stack(vjust = 0.5), aes(label = num)) +
geom_text(data = abi_df, aes(y = yf, label = paste0("f = ", abi), fill = NA))
Adding fill = NA to the geom_text() is a bit of a hack and leads to a warning. But if fill is not set, plotting will fail with a message that stuff was not found. I also tried to move fill = stuff from ggplot() to geom_col() but this breaks the y-coordinate of the text labels inside the bars. There might be a cleaner solution to this, but I haven't found it yet.
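One candidate for a cleaner variant, sketched here as an assumption rather than a tested replacement for the answer above, is to give the index layer its own complete mapping and set inherit.aes = FALSE, so that it no longer looks for stuff in abi_df. The data preparation is repeated, in slightly different form, to keep the example self-contained.

```r
library(ggplot2)

stuff.dat <- read.csv(text = "continent,stuff,num
America,apples,13
America,bananas,13
Europe,apples,30
Europe,bananas,21
total,apples,43
total,bananas,34")

# Apple-banana index for one continent.
abi <- function(cont) {
  with(stuff.dat,
       num[continent == cont & stuff == "apples"] /
         num[continent == cont & stuff == "bananas"])
}

conts <- unique(as.character(stuff.dat$continent))
abi_df <- data.frame(
  continent = conts,
  # label height: total bar height plus a small offset
  yf  = sapply(conts, function(cont) sum(stuff.dat$num[stuff.dat$continent == cont])) + 5,
  abi = round(sapply(conts, abi), 1)
)

p <- ggplot(stuff.dat, aes(x = continent, y = num, fill = stuff)) +
  geom_col() +
  geom_text(aes(label = num), position = position_stack(vjust = 0.5)) +
  # inherit.aes = FALSE: this layer ignores the global fill = stuff mapping
  geom_text(data = abi_df,
            aes(x = continent, y = yf, label = paste0("f = ", abi)),
            inherit.aes = FALSE)
```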
Adding the additional legend is, unfortunately, not trivial, because one cannot easily add text outside the plot area. This actually needs two steps: first, add the text using annotation_custom(); then turn clipping off to make the text visible. This is a possible solution:
# grid provides textGrob() and gpar(), so load it before building the plot
library(grid)
p <- ggplot(stuff.dat, aes(x = continent, y = num, fill = stuff)) +
  geom_col() +
  geom_text(position = position_stack(vjust = 0.5), aes(label = num)) +
  geom_text(data = abi_df, aes(y = yf, label = paste0("f = ", abi), fill = NA)) +
  guides(size = guide_legend(title = "f: ABI", override.aes = list(fill = 1))) +
  annotation_custom(grob = textGrob("f: ABI\n(Apple-\nBanana-\nIndex)",
                                    gp = gpar(cex = .8), just = "left"),
                    xmin = 3.8, xmax = 3.8, ymin = 17, ymax = 17)
# turn off clipping
gt <- ggplot_gtable(ggplot_build(p))
gt$layout$clip[gt$layout$name == "panel"] <- "off"
grid.draw(gt)

How to show every second R ggplot2 x-axis label value?

I want to show only every second x-axis label in the presentation.
A simplified code example follows; its output is shown in Fig. 1, where four dates are shown but #2 and #4 should be skipped.
# https://stackoverflow.com/a/6638722/54964
require(ggplot2)
my.dates = as.Date(c("2011-07-22","2011-07-23",
"2011-07-24","2011-07-28","2011-07-29"))
my.vals = c(5,6,8,7,3)
my.data <- data.frame(date =my.dates, vals = my.vals)
plot(my.dates, my.vals)
p <- ggplot(data = my.data, aes(date,vals))+ geom_line(size = 1.5)
Expected output: skip the second and fourth dates.
Actual code
This is the actual code, where, due to the rev(Vars) logic, I cannot apply as.Date to the values in each category; the variable molten has a column Dates
p <- ggplot(molten, aes(x = rev(Vars), y = value)) +
geom_bar(aes(fill=variable), stat = "identity", position="dodge") +
facet_wrap( ~ variable, scales="free") +
scale_x_discrete("Column name dates", labels = rev(Dates))
Expected output: skip #2,#4, ... values in each category.
I thought of changing scale_x_discrete to scale_x_continuous and passing a break sequence breaks = seq(1, length(Dates), 2) to scale_x_continuous, but it fails with the following error.
Error: `breaks` and `labels` must have the same length
Proposal based on Juan's comments
Code
ggplot(data = my.data, aes(as.numeric(date), vals)) +
geom_line(size = 1.5) +
scale_x_continuous(breaks = pretty(as.numeric(rev(my.data$date)), n = 5))
Output
Error: Discrete value supplied to continuous scale
Testing EricWatt's proposal application into Actual code
Code proposal
p <- ggplot(molten, aes(x = rev(Vars), y = value)) +
geom_bar(aes(fill=variable), stat = "identity", position="dodge") +
facet_wrap( ~ variable, scales="free") +
scale_x_discrete("My dates", breaks = Dates[seq(1, length(Dates), by = 2)], labels = rev(Dates))
Output
Error: `breaks` and `labels` must have the same length
If you have scale_x_discrete("My dates", breaks = Dates[seq(1, length(Dates), by = 2)]), you get an x-axis without any labels, i.e. blank.
Fig. 1 Output of the simplified code example,
Fig. 2 Output of EricWatt's first proposal
OS: Debian 9
R: 3.4.0
This works with your simplified example. Without your molten data.frame it's hard to check it against your more complicated plot.
ggplot(data = my.data, aes(date, vals)) +
geom_line(size = 1.5) +
scale_x_date(breaks = my.data$date[seq(1, length(my.data$date), by = 2)])
Basically, use scale_x_date which will likely handle any strange date to numeric conversions for you.
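For a discrete axis, as in the molten plot from the question, a similar effect can probably be had by passing only every second level as breaks and leaving labels at its default, which avoids the length mismatch. A minimal sketch with made-up data (df and every_second are illustrative names, not from the original post):

```r
library(ggplot2)

dates <- c("22.07.2011", "23.07.2011", "24.07.2011", "28.07.2011", "29.07.2011")
df <- data.frame(date = factor(dates, levels = dates),
                 vals = c(5, 6, 8, 7, 3))

# c(TRUE, FALSE) is recycled along the levels, keeping positions 1, 3, 5, ...
every_second <- levels(df$date)[c(TRUE, FALSE)]

p <- ggplot(df, aes(x = date, y = vals)) +
  geom_col() +
  scale_x_discrete(breaks = every_second)
```

The breaks must match the values actually mapped to x, so if the axis shows transformed values, the same transformation has to be applied to the breaks.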
My eventual solution for the actual code, motivated by the other linked thread and EricWatt's answer
# Test data of actual data here # https://stackoverflow.com/q/45130082/54964
ggplot(data = molten, aes(x = as.Date(Time.data, format = "%d.%m.%Y"), y = value)) +
geom_bar(aes(fill = variable), stat = "identity", position = "dodge") +
facet_wrap( ~ variable, scales="free") +
theme_bw() + # has to come before the axis text changes, because it would otherwise override them
theme(axis.text.x = element_text(angle = 90, hjust=1),
text = element_text(size=10)) +
scale_x_date(date_breaks = "2 days", date_labels = "%d.%m.%Y")
