Symbols in ggplot2 breaks - r

I have data with the word alpha in it, and I'd like to use ggplot2 to render the alpha in the breaks as the symbol.
df <- data.frame(Method = c("Method (alpha = 0.01)", "Method (alpha = 0.05)"),
Value = c(2,3))
ggplot(df, aes(x = Method,
y = Value)) +
geom_point()
I couldn't find this on the site, but I don't think it will be that difficult a question. I can get single values in breaks to work using the expression command in ggplot2::xlab, etc., but I can't figure out how to create a vector of expressions. For example, the code
c(expression("Method (alpha = 0.01)"),
  expression("Method (alpha = 0.05)"))
gives as output
expression("Method (alpha = 0.01)", "Method (alpha = 0.05)")

You can use parse as in the following possibilities. I think this is easier than having to write out lists of expressions.
Edit
To increase the space between 'Method' and the rest,
df$Method <- gsub("Method", "Method~", as.character(df$Method))
Then, plot
ggplot(df, aes(x = Method, y = Value)) +
geom_point() +
scale_x_discrete(labels = parse(text=gsub('=','==',as.character(df$Method))))
or
ggplot(df, aes(x = Method, y = Value)) +
geom_point() +
scale_x_discrete(labels = parse(text=paste("alpha", c(0.01, 0.05), sep="==")))
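The parse() idea above can also be wrapped in a labelling function, so that the labels are derived from whatever breaks the scale ends up using. This is only a minimal sketch: alpha_labels is a hypothetical helper name, and it assumes every break contains the literal word "Method" and a single "=".

```r
library(ggplot2)

df <- data.frame(Method = c("Method (alpha = 0.01)", "Method (alpha = 0.05)"),
                 Value = c(2, 3))

# Hypothetical helper: turn plain-text break values into plotmath expressions.
# "~" inserts a space after "Method"; "==" is drawn as a single equals sign.
alpha_labels <- function(breaks) {
  txt <- gsub("Method", "Method~", breaks, fixed = TRUE)
  txt <- gsub("=", "==", txt, fixed = TRUE)
  parse(text = txt)  # one expression per break
}

p <- ggplot(df, aes(x = Method, y = Value)) +
  geom_point() +
  scale_x_discrete(labels = alpha_labels)
```

Because the function receives the breaks from the scale, the original Method column does not need to be modified beforehand.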

Related

How do I add data labels to a ggplot histogram with a log(x) axis?

I am wondering how to add data labels to a ggplot showing the true value of the data points when the x-axis is in log scale.
I have this data:
date <- c("4/3/2021", "4/7/2021","4/10/2021","4/12/2021","4/13/2021","4/13/2021")
amount <- c(105.00, 96.32, 89.00, 80.84, 121.82, 159.38)
address <- c("A","B","C","D","E","F")
df <- data.frame(date, amount, address)
And I plot it in ggplot2:
plot <- ggplot(df, aes(x = log(amount))) +
geom_histogram(binwidth = 1)
plot + theme_minimal() + geom_text(label = amount)
... but I get the error
"Error: geom_text requires the following missing aesthetics: y"
I have 2 questions as a result:
Why am I getting this error with geom_histogram? Shouldn't it assume to use count as the y value?
Will this successfully show the true values of the data points from the 'amount' column despite the plot's log scale x-axis?
Perhaps like this?
ggplot(df, aes(x = log(amount), y = ..count.., label = ..count..)) +
geom_histogram(binwidth = 1) +
stat_bin(geom = "text", binwidth = 1, vjust = -0.5) +
theme_minimal()
ggplot2 layers do not (at least in any situation I can think of) use the summary calculations of other layers, so I think the simplest approach is to replicate the calculation using stat_bin(geom = "text", ...).
Or perhaps simpler, you could pre-calculate the numbers:
library(dplyr)
df %>%
count(log_amt = round(log(amount))) %>%
ggplot(aes(log_amt, n, label = n)) +
geom_col(width = 1) +
geom_text(vjust = -0.5)
EDIT -- to show buckets without the log transform we could use:
df %>%
count(log_amt = round(log(amount))) %>%
ggplot(aes(log_amt, n, label = n)) +
geom_col(width = 0.5) +
geom_text(vjust = -0.5) +
scale_x_continuous(labels = ~scales::comma(exp(.)),
minor_breaks = NULL)
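For what it's worth, the stat_bin(geom = "text") approach can also be written with after_stat(), which newer ggplot2 versions recommend over the ..count.. notation. A minimal sketch, assuming ggplot2 3.3.0 or later and reusing only the amount column from the question:

```r
library(ggplot2)

df <- data.frame(amount = c(105.00, 96.32, 89.00, 80.84, 121.82, 159.38))

p <- ggplot(df, aes(x = log(amount))) +
  geom_histogram(binwidth = 1) +
  # Second layer repeats the binning and draws each bin's count as text.
  # stat_bin supplies y = count by default, so only the label is mapped here.
  stat_bin(aes(label = after_stat(count)),
           geom = "text", binwidth = 1, vjust = -0.5) +
  theme_minimal()
```

Note that the labels are bin counts, not the individual amount values; a histogram has one bar per bin, so there is no single amount to print on it.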

Ggplot2 axis label from column name of apply function iteration

I would like to change the y axis label (or main title would also be fine) of a ggplot to reflect the column name being iterated over within an apply function.
Here is some sample data and my working apply function:
trial_df <- data.frame("Patient" = c(1,1,2,2,3,3,4,4),
"Outcome" = c("NED", "NED", "NED", "NED", "Relapse","Relapse","Relapse","Relapse"),
"Time_Point" = c("Baseline", "Week3", "Baseline", "Week3","Baseline", "Week3","Baseline", "Week3"),
"CD4_Param" = c(50.8,53.1,20.3,18.1,30.8,24.5,35.2,31.0),
"CD8_Param" = c(5.3,9.7,4.4,4.3,3.1,3.2,5.6,5.3),
"CD3_Param" = c(11.6,16.6,5.0,5.1,14.3,7.1,5.9,8.1))
apply(trial_df[,4:length(trial_df)], 2, function(i) ggplot(data = trial_df, aes_string(x = "Time_Point", y = i )) +
facet_wrap(~Outcome) +
geom_boxplot(alpha = 0.1) +
geom_point(aes(color = `Outcome`, fill = `Outcome`)) +
geom_path(aes(group = `Patient`, color = `Outcome`)) +
theme_minimal() +
ggpubr::stat_compare_means( method = "wilcox.test") +
scale_fill_manual(values=c("blue", "red")) +
scale_color_manual(values=c("blue", "red")))
This creates 3 graphs as expected, however the y axis just says "y". I would like this to display the column name for the column in that iteration. It would also be fine to add a main title with this information, as I just need to know which graph corresponds to which column.
Here are things I have already tried adding to the ggplot code above based on some similar questions I found, but all of them give me the error "non-numeric argument to binary operator":
ggtitle(paste(i))
labs(y = i)
labs(y = as.character(i))
Any help or resources I may have missed would be greatly appreciated, thanks!
For reasons I cannot figure out, this gives what you want, but only for one graph:
apply(trial_df[,4:length(trial_df)], 2, function(i) ggplot(data = trial_df, aes_string(x = "Time_Point", y = i )) +
facet_wrap(~Outcome) +
geom_boxplot(alpha = 0.1) +
geom_point(aes(color = `Outcome`, fill = `Outcome`)) +
geom_path(aes(group = `Patient`, color = `Outcome`)) +
theme_minimal() +
stat_compare_means( method = "wilcox.test") +
scale_fill_manual(values=c("blue", "red")) +
scale_color_manual(values=c("blue", "red"))+
labs(y=colnames(trial_df)[i]))
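One possible explanation is that apply() hands the function the column values rather than the column name, so i is not usable as a label. A sketch of an alternative, assuming it is acceptable to iterate over the column names with lapply() and to look the column up through the .data pronoun; the ggpubr layer is left out here only to keep the example self-contained:

```r
library(ggplot2)

trial_df <- data.frame(
  Patient    = c(1, 1, 2, 2, 3, 3, 4, 4),
  Outcome    = c("NED", "NED", "NED", "NED", "Relapse", "Relapse", "Relapse", "Relapse"),
  Time_Point = rep(c("Baseline", "Week3"), 4),
  CD4_Param  = c(50.8, 53.1, 20.3, 18.1, 30.8, 24.5, 35.2, 31.0),
  CD8_Param  = c(5.3, 9.7, 4.4, 4.3, 3.1, 3.2, 5.6, 5.3),
  CD3_Param  = c(11.6, 16.6, 5.0, 5.1, 14.3, 7.1, 5.9, 8.1)
)

# Iterate over column *names*, so the name is available for the axis label.
param_cols <- names(trial_df)[4:ncol(trial_df)]

plots <- lapply(param_cols, function(col) {
  ggplot(trial_df, aes(x = Time_Point, y = .data[[col]])) +
    facet_wrap(~Outcome) +
    geom_boxplot(alpha = 0.1) +
    geom_point(aes(color = Outcome)) +
    geom_path(aes(group = Patient, color = Outcome)) +
    theme_minimal() +
    scale_color_manual(values = c("blue", "red")) +
    labs(y = col)  # label the y axis with the current column name
})
names(plots) <- param_cols
```

plots is then a named list, so plots$CD4_Param prints the CD4 graph with "CD4_Param" on its y axis.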

ggplot: plotting more than one function on one plot

I have a task to plot a histogram of my data (here), a column named NoPodsWeight, together with its density and the normal distribution over the segment from min(NoPodsWeight) to max(NoPodsWeight).
I am trying this:
myframe <- read.csv(filepath, fileEncoding = "UTF-8", stringsAsFactors = FALSE)
myframe <- myframe[rowSums(is.na(myframe)) <= 0,]
nopodsweight <- myframe$NoPodsWeight
height <- myframe$Height
ggplot(myframe, aes(x = NoPodsWeight, y = ..density..)) +
geom_histogram(color="black", fill="white") +
geom_density(color = "blue") +
stat_function(fun = dnorm, args = list(mean = mean(myframe$NoPodsWeight), sd = sd(myframe$NoPodsWeight)))
Using this code I get an error:
Error: Aesthetics must be valid computed stats. Problematic aesthetic(s): y = ..density...
Did you map your stat in the wrong layer?
I don't understand how to plot two or more functions on one plot. For example I can solve my problem using standard plot (but without density):
hist(x = nopodsweight, freq = F, ylim = c(0, 0.45), breaks = 37)
n_norm<-seq(min(nopodsweight)-1, max(nopodsweight)+1, 0.0001)
lines(n_norm, dnorm(n_norm), col = "red")
Is there any function in ggplot to plot (normal) distribution (or maybe using another function) like in lines?
You need to take ..density.. out of the ggplot() layer and put it specifically in the geom_histogram layer. I didn't download and import your data, but here's an example on mtcars:
ggplot(mtcars, aes(x = mpg)) +
geom_histogram(aes(y = ..density..)) +
geom_density(color = "blue") +
stat_function(fun = dnorm, args = list(mean = mean(mtcars$mpg), sd = sd(mtcars$mpg)))
The error message says "did you map your stat in the wrong layer?"; that's a hint. Moving aes(y=..density..) to apply specifically to geom_histogram() seems to make everything OK ...
ggplot(myframe, aes(x = NoPodsWeight)) +
geom_histogram(color="black", fill="white",
aes(y = ..density..)) +
## [... everything else ...]
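In newer ggplot2 versions the same layer-specific mapping is usually written with after_stat(density) instead of ..density... A sketch on mtcars, assuming ggplot2 3.3.0 or later; the bins value and the red colour are arbitrary choices for the illustration:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = mpg)) +
  # The density mapping lives in the histogram layer only.
  geom_histogram(aes(y = after_stat(density)),
                 bins = 15, color = "black", fill = "white") +
  geom_density(color = "blue") +
  # Normal curve with the sample mean and standard deviation.
  stat_function(fun = dnorm,
                args = list(mean = mean(mtcars$mpg), sd = sd(mtcars$mpg)),
                color = "red")
```

Each layer computes its own statistic, which is why the computed density can only be referenced in the layer whose stat produces it.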

How to define a function that invokes ggplot2 functions and takes a variable name among its arguments?

As an example, suppose that I have this snippet of code:
binwidth <- 0.01
my.histogram <- ggplot(my.data, aes(x = foo, fill = type)) +
geom_histogram(binwidth = binwidth,
aes(y = ..density..),
position = "identity",
alpha = 0.5) +
lims(x = c(0 - binwidth, 1 + binwidth), y = c(0, 100)) +
labs(x = "foo", y = "density")
Further, suppose that my.data has many other columns besides foo that could be plotted using pretty much the same code. Therefore, I would like to define a helper function make.histogram, so that I could replace the assignment above with something like:
my.histogram <- make.histogram(foo, binwidth = 0.01)
Actually, this looks a bit weird to me. Would R complain that foo is not defined? Maybe the call would have to be this instead:
my.histogram <- make.histogram("foo", binwidth = 0.01)
Be that as it may, how would one define make.histogram?
For the purpose of this question, make.histogram may treat my.data as a global variable.
Also, note that in the snippet above, foo appears twice, once (as a variable) as the x argument in the first aes call, and once (as a string) as the x argument in the labs call. In other words, the make.histogram function somehow needs to translate the column specified in its first argument into both a variable name and a string.
I'm not sure I understand your question.
Why not use aes_string() and define a function like the one below?
make.histogram <- function(variable) {
p <- ggplot(my.data, aes_string(x = variable, fill = "type")) + (...) + xlab(variable)
print(p)
}
Since ggplot is part of the tidyverse, I think tidyeval will come in handy:
make.histogram <- function(var = "foo", binwidth = 0.01) {
varName <- as.name(var)
enquo_varName <- enquo(varName)
ggplot(my.data, aes(x = !!enquo_varName, fill = type)) +
...
labs(x = var)
}
Basically, with as.name() we generate a name object that matches var (here var is a string like "foo"). Then, following Programming with dplyr, we use enquo() to look at that name and return the associated value as a quosure. This quosure can then be unquoted inside the ggplot() call using !!.
After reading the material that @andrew.punnett linked in his comment, it was very easy to code the desired function:
make.histogram <- function(column.name, binwidth = 0.02) {
base.aes <- eval(substitute(aes(x = column.name, fill = type)))
x.label <- deparse(substitute(column.name))
ggplot(my.data, base.aes) +
geom_histogram(binwidth = binwidth,
aes(y = ..density..),
position = "identity",
alpha = 0.5) +
lims(x = c(0 - binwidth, 1 + binwidth), y = c(0, 100)) +
labs(x = x.label, y = "density")
}
my.histogram <- make.histogram(foo, binwidth = 0.01)
The benefit of this solution is its generality: it relies only on base R functions (substitute, eval, and deparse), so it can be easily ported to situations outside of the ggplot2 context.
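For the string-based call, make.histogram("foo", ...), another option is the .data pronoun, which looks a column up by name inside aes(). This is a sketch only: it assumes a reasonably recent ggplot2, passes the data frame as an explicit argument instead of using a global, drops the lims() call, and uses made-up data in place of my.data.

```r
library(ggplot2)

# column.name is a string; it is used both for the lookup and for the axis label.
make.histogram <- function(data, column.name, binwidth = 0.02) {
  ggplot(data, aes(x = .data[[column.name]], fill = type)) +
    geom_histogram(aes(y = after_stat(density)),
                   binwidth = binwidth,
                   position = "identity",
                   alpha = 0.5) +
    labs(x = column.name, y = "density")
}

# Made-up stand-in for my.data, only so the sketch runs on its own.
set.seed(1)
my.data <- data.frame(foo  = runif(200),
                      bar  = runif(200),
                      type = rep(c("a", "b"), each = 100))

my.histogram <- make.histogram(my.data, "foo", binwidth = 0.01)
```

The same function then works for any other column, for example make.histogram(my.data, "bar").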

Plot mean and sd of dataset per x value using ggplot2

I have a dataset that looks a little like this:
a <- data.frame(x=rep(c(1,2,3,5,7,10,15,20), 5),
y=rnorm(40, sd=2) + rep(c(4,3.5,3,2.5,2,1.5,1,0.5), 5))
ggplot(a, aes(x=x,y=y)) + geom_point() +geom_smooth()
I want the same output as that plot, but instead of a smooth curve, I just want line segments between the mean/sd values for each set of x values. The graph should look similar to the one above, but jagged instead of curved.
I tried this, but it fails, even though the x values aren't unique:
ggplot(a, aes(x=x,y=y)) + geom_point() +stat_smooth(aes(group=x, y=y, x=x))
geom_smooth: Only one unique x value each group.Maybe you want aes(group = 1)?
?stat_summary is what you should look at.
Here is an example
# functions to calculate the upper and lower CI bounds
uci <- function(y,.alpha){mean(y) + qnorm(1 - abs(.alpha)/2) * sd(y)}
lci <- function(y,.alpha){mean(y) - qnorm(1 - abs(.alpha)/2) * sd(y)}
ggplot(a, aes(x=x,y=y)) + stat_summary(fun.y = mean, geom = 'line', colour = 'blue') +
stat_summary(fun.y = mean, geom = 'ribbon',fun.ymax = uci, fun.ymin = lci, .alpha = 0.05, alpha = 0.5)
You can use one of the built-in summary functions, mean_sdl. The code is shown below:
ggplot(a, aes(x=x,y=y)) +
  stat_summary(fun.y = 'mean', colour = 'blue', geom = 'line') +
  stat_summary(fun.data = 'mean_sdl', geom = 'ribbon', alpha = 0.2)
Using ggplot2 0.9.3.1, the following did the trick for me:
ggplot(a, aes(x=x,y=y)) + geom_point() +
stat_summary(fun.data = 'mean_sdl', mult = 1, geom = 'smooth')
The 'mean_sdl' function is a wrapper around the Hmisc package's 'smean.sdl', and the mult argument sets how many standard deviations (above and below the mean) are displayed.
For detailed info on the original function:
library('Hmisc')
?smean.sdl
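On current ggplot2 versions, the fun.y argument and passing mult directly to stat_summary() are deprecated in favour of fun and fun.args. A sketch of the same mean ± sd band under that assumption (ggplot2 3.3.0 or later); mean_sd is a hypothetical stand-in for mean_sdl, written out so the example does not need Hmisc installed:

```r
library(ggplot2)

set.seed(42)
a <- data.frame(x = rep(c(1, 2, 3, 5, 7, 10, 15, 20), 5),
                y = rnorm(40, sd = 2) + rep(c(4, 3.5, 3, 2.5, 2, 1.5, 1, 0.5), 5))

# Hypothetical summary function: mean plus/minus `mult` standard deviations.
mean_sd <- function(y, mult = 1) {
  data.frame(y    = mean(y),
             ymin = mean(y) - mult * sd(y),
             ymax = mean(y) + mult * sd(y))
}

p <- ggplot(a, aes(x = x, y = y)) +
  geom_point() +
  # Band of +/- 1 sd around the mean at each x value.
  stat_summary(fun.data = mean_sd, fun.args = list(mult = 1),
               geom = "ribbon", alpha = 0.2) +
  # Jagged line through the per-x means.
  stat_summary(fun = mean, geom = "line", colour = "blue")
```

Replacing mean_sd with mean_sdl should give the same picture when Hmisc is available.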
You could try writing a summary function as suggested by Hadley Wickham on the website for ggplot2: http://had.co.nz/ggplot2/stat_summary.html. Applying his suggestion to your code:
p <- qplot(x, y, data=a)
stat_sum_df <- function(fun, geom="crossbar", ...) {
stat_summary(fun.data=fun, colour="blue", geom=geom, width=0.2, ...)
}
p + stat_sum_df("mean_cl_normal", geom = "smooth")