ggplot2: annotation with text, sub/superscript, and calculated values

I have searched here for a while and my question was partially answered by previous questions/answers. I am learning R, coming from Matlab. As the title says, I have a question about plot annotations. In Matlab it was fairly straightforward to have plot annotations that contained all sorts of data formats, and I am looking for something similar in R. I have already discovered paste and managed to put text and numbers into one annotation and I also figured out (to a degree...) what parse does, for example when displaying an r squared. My question is, how do I combine the two annotations in the code snippet into one annotation without R yelling at me? My solution with two annotations works for what I need, but I simply would like to know how to do it...
a <- 30 # some coefficients
b <- 70
r2 <- 0.87
anno1 <- paste("y = ",b,"ln(x) + ",a) # first annotation with a random equation
anno2 <- paste("r^2 == ", r2) # second annotation with a random r squared
Pdata <- data.frame("X" = 1:10, "Y" = 1:10) # some data
ggplot(Pdata, aes(x = X, y = Y)) +
geom_point() +
annotate("text", x=2, y=8, label=anno1, parse=FALSE) +
annotate("text", x=2, y=7, label=anno2, parse=TRUE)
Thanks y'all!

It took a while for me to figure this out (for my own projects), but here's a solution:
anno3 <- paste("'y ='~",b,"~'ln(x) +'~",a,"~r^2==~", r2)
Add it to your plot using + annotate("text", x=2, y=6, label=anno3, parse=TRUE)
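The single quotes mark literal text that should not be evaluated, and each tilde (~) inserts a space between terms. Taken as a whole, the pasted string must be a valid R expression, because parse=TRUE parses it as plotmath.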
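To see why the quoting matters, it can help to run the pasted string through parse() before handing it to annotate. This is only an illustrative sketch: bad and good are made-up names, and the coefficients are the ones from the question.

```r
a <- 30
b <- 70
r2 <- 0.87

# Unquoted text is not valid R syntax, so parsing fails
bad <- paste("y = ", b, "ln(x) + ", a)
bad_result <- tryCatch(parse(text = bad), error = function(e) e)
inherits(bad_result, "error")

# Quoted text joined with ~ parses into a single plotmath expression
good <- paste("'y ='~", b, "~'ln(x) +'~", a, "~r^2==~", r2)
good_result <- parse(text = good)
length(good_result)
```

If parse(text = ...) fails on the string, annotate(..., parse=TRUE) will most likely fail on it too, so this is a cheap check to run first.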

Here is one way to do the requested operation by using bquote
ggplot() +
geom_point(aes(x = 1:4, y = 1:4)) +
annotate("text", x=2, y=3,
label = deparse(bquote(~y ==~ .(b) ~ln(x)~ + .(a) ~r^2 ==~ .(r2))),
parse = TRUE)
bquote quotes its argument, except for the terms wrapped in .(), which are evaluated.
annotate does not accept expressions; one trick to get it to work is to deparse the expression into a string and let annotate parse it again.

Related

How do I loop a ggplot2 function to export and save about 40 plots?

I am trying to loop a ggplot2 plot with a linear regression line over it. It works when I type the y column name manually, but the loop method I am trying does not work. It is definitely not a dataset issue.
I've tried many solutions from various websites on how to loop a ggplot and the one I've attempted is the simplest I could find that almost does the job.
The code that works is the following:
plots <- ggplot(Everything.any, mapping = aes(x = stock_VWRETD, y = stock_10065)) +
geom_point() +
labs(x = 'Market Returns', y = 'Stock Returns', title ='Stock vs Market Returns') +
geom_smooth(method='lm',formula=y~x)
But I do not want to do this another 40 times (and then 5 times more for other reasons). The code that I found online and tried to adapt to my needs is the following:
plotRegression <- function(z,na.rm=TRUE,...){
nm <- colnames(z)
for (i in seq_along(nm)){
plots <- ggplot(z, mapping = aes(x = stock_VWRETD, y = nm[i])) +
geom_point() +
labs(x = 'Market Returns', y = 'Stock Returns', title ='Stock vs Market Returns') +
geom_smooth(method='lm',formula=y~x)
ggsave(plots,filename=paste("regression1",nm[i],".png",sep=" "))
}
}
plotRegression(Everything.any)
I expect a Stock Returns vs Market Returns scatter plot. Instead, the y-axis shows a single value, which is the name of the respective column, and the market values are plotted in a straight line at that one y value, as if on a number line. Please let me know what I am doing wrong.
[Figures: desired plot and actual plot]
Sample Data is available on Google Drive here:
https://drive.google.com/open?id=1Xa1RQQaDm0pGSf3Y-h5ZR0uTWE-NqHtt
The problem is that when you assign variables to aesthetics in aes, you mix bare names and strings. In this example, both X and Y are supposed to be variables in z:
aes(x = stock_VWRETD, y = nm[i])
You refer to stock_VWRETD using a bare name (as required with aes), however for y=, you provide the name as a character vector produced by colnames. See what happens when we replicate this with the iris dataset:
ggplot(iris, aes(Petal.Length, 'Sepal.Length')) + geom_point()
Since aes expects variable names to be given as bare names, it doesn't interpret 'Sepal.Length' as a variable in iris but as a separate vector (consisting of a single character value) which holds the y-values for each point.
What can you do? Here are 2 options that both give the proper plot
1) Use aes_string and change both variable names to character (note that aes_string has been soft-deprecated since ggplot2 3.0.0):
ggplot(iris, aes_string('Petal.Length', 'Sepal.Length')) + geom_point()
2) Use the .data pronoun with [[ ]] subsetting to look up the variable by its name:
ggplot(iris, aes(Petal.Length, .data[['Sepal.Length']])) + geom_point()
You need to use aes_string instead of aes, with double quotes around your x variable; then you can use the loop variable i directly. You can also simplify the for loop by iterating over the column names themselves. Here is an example using iris.
library(ggplot2)
plotRegression <- function(z,na.rm=TRUE,...){
nm <- colnames(z)
for (i in nm){
plots <- ggplot(z, mapping = aes_string(x = "Sepal.Length", y = i)) +
geom_point()+
geom_smooth(method='lm',formula=y~x)
ggsave(plots,filename=paste("regression1_",i,".png",sep=""))
}
}
myiris<-iris
plotRegression(myiris)
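For comparison, here is a minimal sketch of the same loop built on the .data pronoun instead of aes_string. The helper name plot_regressions and its out_dir argument are made up for illustration; the sketch also skips non-numeric columns and the x column itself, which the loop above does not do.

```r
library(ggplot2)

# Hypothetical helper: one scatter plot with a regression line per numeric column
plot_regressions <- function(data, x_col, out_dir = NULL) {
  # keep numeric columns only, and do not plot x against itself
  is_num <- vapply(data, is.numeric, logical(1))
  y_cols <- setdiff(names(data)[is_num], x_col)

  plots <- lapply(y_cols, function(y_col) {
    p <- ggplot(data, aes(x = .data[[x_col]], y = .data[[y_col]])) +
      geom_point() +
      geom_smooth(method = "lm", formula = y ~ x) +
      labs(x = x_col, y = y_col)
    # only write files when an output directory is supplied
    if (!is.null(out_dir)) {
      ggsave(file.path(out_dir, paste0("regression_", y_col, ".png")), plot = p)
    }
    p
  })
  names(plots) <- y_cols
  plots
}

iris_plots <- plot_regressions(iris, "Sepal.Length")
names(iris_plots)
```

Returning the plots as a named list keeps them available for inspection; passing something like out_dir = tempdir() would additionally write one PNG per column.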

add symbol to just one axis label in R [duplicate]

I have a plot which is generated thus:
ggplot(dt.2, aes(x=AgeGroup, y=Prevalence)) +
geom_errorbar(aes(ymin=lower, ymax=upper), colour="black", width=.2) +
geom_point(size=2, colour="Red")
I control the x axis labels like this:
scale_x_discrete(labels=c("0-29","30-49","50-64","65-79",">80","All")) +
This works but I need to change the ">80" label to "≥80".
However "≥80" is displayed as "=80".
How can I display the greater than or equal sign ?
An alternative to using expressions is Unicode characters, in this case Unicode Character 'GREATER-THAN OR EQUAL TO' (U+2265). Copying @mnel's example:
.d <- data.frame(a = letters[1:6], y = 1:6)
ggplot(.d, aes(x=a,y=y)) + geom_point() +
scale_x_discrete(labels = c(letters[1:5], "\u2265 80"))
Unicode is a good alternative if you have trouble remembering the complicated expression syntax or if you need linebreaks, which expressions don't allow. As a downside, whether specific Unicode characters work at all depends on your graphics device and font of choice.
You can pass an expression (including phantom(...) to fake a leading >=) within the labels argument to scale_x_discrete(...), for example:
.d <- data.frame(a = letters[1:6], y = 1:6)
ggplot(.d, aes(x=a,y=y)) + geom_point() +
scale_x_discrete(labels = c(letters[1:5], expression(phantom(x) >= 80)))
See ?plotmath for more details on creating mathematical expressions and
this related SO question and answer
plot(5, ylab=expression("T ">="5"))
You can use
expression("">=80)
So your full set of axis labels would look like:
scale_x_discrete(labels=c("0-29","30-49","50-64","65-79",expression("">=80),"All")) +
I have had trouble exporting plots when using unicode, but the expression function is more consistent.
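Mixing plain strings and an expression inside c() works because c() promotes the whole vector to an expression vector. A small base R check of that behaviour (age_labels is just an illustrative name):

```r
# Character strings combined with an expression become one expression vector
age_labels <- c("0-29", "30-49", "50-64", "65-79", expression("" >= 80), "All")

is.expression(age_labels)  # the strings were promoted along with the expression
length(age_labels)         # still one entry per axis label
age_labels[[5]]            # the plotmath call for the fifth label
```

The plain strings are carried along as constants, so they should still be drawn as ordinary text, while the fifth label is rendered as plotmath.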

Label or annotation with subscript and variable source

I have an R routine which creates a number of plots from a large set of data. Each plot is labeled with a title describing the details of the set of points plotted. Unfortunately, I have not been able to use subscripts in the text if I am using paste to combine a complex label. The result is ugly. This is a simplified version of the code using data from R. The title shows the technique I am currently using, without subscripts. The attempt at an improved version is placed either on the x axis or on the plot.
library(ggplot2)
x1 = 1
x2 = 2
list <- c(1:4)
tle <- paste("VGM = ", as.character(list[1]),
"V, VDM = ", as.character(list[2]),
"V, VGF = ", as.character(list[3]),
"V, VDF = ", as.character(list[4]),
"V", sep="")
p <- ggplot(mtcars, aes(x=wt, y=mpg)) +
labs(title=tle) +
geom_point()
p
p + xlab(expression(V[DM])) #works fine
p + xlab(expression(paste(V[DM], "= 3"))) # works fine
# now we would like to use a variable to provide the number
p + xlab(expression(paste(V[DM], "=", x1))) # Just displays "x1", not value of x1
p + xlab(expression(paste(V[DM], "=",
as.character(x1)))) # NO
p + xlab(expression(paste(V[DM], "=",
as.character(as.number(x1))))) # NO
my.xlab1 <- bquote(V[DM] == .(x1))
p + xlab(my.xlab1) # We can see the success here
# A single variable at the end of the expression works
# What if you wanted to display two variables?
my.xlab2 <- bquote(V[GM] == .(x2))
my.xlab3 <- paste(my.xlab1, my.xlab2)
p + xlab(my.xlab3) # doesn't work
# Apparently the expressions cannot be pasted together. Try another approach.
# Place the two expressions separately on the plot. They no longer need to be
# pasted together. It would look better, anyway. Would it work?
p + annotate("text", x=4, y=30, label="Annotate_text", parse=TRUE)
# This is the idea
# p + annotate("text", x=4, y=30, label=bquote(V[DM] == .(x1)), parse=TRUE)
# This is a disaster
# RStudio stops the process with a question mark placed on the console. Appears that
# more input is being requested?
p + geom_text(x=4, y=30, label="Geom_text") # works
p + geom_text(x=4, y=30, label=my.xlab1) # does not accept variables.
I have included comments which describe the problems raised by each attempt. Ideally, the information should probably be placed as an annotation on the plot rather than as a title, but I cannot find a way to do this. Using a subscript turns a character into an expression, and it seems that there is a long list of functions which handle characters but not expressions.
If you want to "paste" two expressions together, you need to have some "operator" join them. There really isn't a paste method for expressions, but there are ways to put them together. First, obviously you could use one bquote() to put both variables together. Either
my.xlab3 <- bquote(V[DM] == .(x1)~ V[GM] == .(x2))
my.xlab3 <- bquote(list(V[DM] == .(x1), V[GM] == .(x2)))
would work. The first puts a space between them, the second puts a comma between them. But if you want to build them separately, you can combine them with another round of bquote. So the equivalent building method for the two above expressions is
my.xlab3 <- bquote(.(my.xlab1) ~ .(my.xlab2))
my.xlab3 <- bquote(list(.(my.xlab1), .(my.xlab2)))
All of those should work to set your xlab() value.
Now, if you also want to get annotate to work, you can "un-parse" your expression and then have R "re-parse" it for you and you should be all set. Observe
p + annotate("text", x=4, y=30, label=deparse(my.xlab3), parse=TRUE)
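One caveat with this round trip, as far as I know: deparse() can split a long expression across several strings, and annotate would then receive more than one label. A hedged sketch of a more defensive version, where label_string is a made-up name and x1 and x2 are the values from the question:

```r
x1 <- 1
x2 <- 2
my.xlab3 <- bquote(list(V[DM] == .(x1), V[GM] == .(x2)))

# Ask for a wide cutoff and collapse, so the label is always a single string
label_string <- paste(deparse(my.xlab3, width.cutoff = 500L), collapse = "")
label_string

# Parsing the string gives back a plotmath call that annotate can draw
parse(text = label_string)[[1]]
```

The string can then be passed as label=label_string together with parse=TRUE, exactly as in the annotate call above.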

How do I change the formatting of numbers on an axis with ggplot? [duplicate]

I'm using R and ggplot to draw a scatterplot of some data, all is fine except that the numbers on the y-axis are coming out with computer style exponent formatting, i.e. 4e+05, 5e+05, etc. This is unacceptable to me, so I want to get it to display them as 500,000, 400,000, and so on. Getting a proper exponent notation would also be acceptable.
The code for the plot is as follows:
p <- ggplot(valids, aes(x=Test, y=Values)) +
geom_point(position="jitter") +
facet_grid(. ~ Facet) +
scale_y_continuous(name="Fluorescent intensity/arbitrary units") +
scale_x_discrete(name="Test repeat") +
stat_summary(fun.ymin=median, fun.ymax=median, fun.y=median, geom="crossbar")
Any help much appreciated.
Another option for formatting your axis tick labels with commas is to load the scales package and add
scale_y_continuous(name="Fluorescent intensity/arbitrary units", labels = comma)
to your ggplot statement.
If you don't want to load the package, use:
scale_y_continuous(name="Fluorescent intensity/arbitrary units", labels = scales::comma)
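The labels argument also accepts any function that turns the break values into strings, so a rough base R stand-in can be sketched without the scales package. The name comma_labels is made up here, and scales::comma handles more cases than this does.

```r
# Hypothetical label function: thousands separators, no scientific notation
comma_labels <- function(x) {
  format(x, big.mark = ",", scientific = FALSE, trim = TRUE)
}

comma_labels(c(400000, 500000))
```

It would be used the same way as comma, for example scale_y_continuous(labels = comma_labels).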
I also found another way of doing this that gives proper 'x10(superscript)5' notation on the axes. I'm posting it here in the hope it might be useful to some. I got the code from here so I claim no credit for it, that rightly goes to Brian Diggs.
fancy_scientific <- function(l) {
# turn in to character string in scientific notation
l <- format(l, scientific = TRUE)
# quote the part before the exponent to keep all the digits
l <- gsub("^(.*)e", "'\\1'e", l)
# turn the 'e+' into plotmath format
l <- gsub("e", "%*%10^", l)
# return this as an expression
parse(text=l)
}
Which you can then use as
ggplot(data=df, aes(x=x, y=y)) +
geom_point() +
scale_y_continuous(labels=fancy_scientific)
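To make the string manipulation easier to follow, the same steps can be run by hand on two sample break values (the values are made up for illustration):

```r
# Step 1: scientific notation as character strings
l <- format(c(400000, 500000), scientific = TRUE)
l

# Step 2: quote the mantissa so all of its digits are kept
l <- gsub("^(.*)e", "'\\1'e", l)
l

# Step 3: replace "e" with the plotmath multiplication sign and a power of ten
l <- gsub("e", "%*%10^", l)
l

# Step 4: parse the strings into plotmath expressions
labels <- parse(text = l)
length(labels)
```

Each string ends up in the form 'mantissa'%*%10^exponent, which plotmath draws as the mantissa times ten raised to the exponent.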
x <- rnorm(10) * 100000
y <- seq(0, 1, length = 10)
p <- qplot(x, y)
library(scales)
p + scale_x_continuous(labels = comma)
I'm late to the game here, but in case others want an easy solution, I created a set of functions which can be called like:
ggplot + scale_x_continuous(labels = human_gbp)
which give you human readable numbers for x or y axes (or any number in general really).
You can find the functions here: Github Repo
Just copy the functions in to your script so you can call them.
I find Jack Aidley's suggested answer a useful one.
I wanted to throw out another option. Suppose you have a series with many small numbers, and you want to ensure the axis labels are written out in full decimal notation (e.g. 5e-05 -> 0.00005), then:
NotFancy <- function(l) {
l <- format(l, scientific = FALSE)
parse(text=l)
}
ggplot(data = data.frame(x = 1:100,
y = seq(from=0.00005,to = 0.0000000000001,length.out=100) + runif(n=100,-0.0000005,0.0000005)),
aes(x=x, y=y)) +
geom_point() +
scale_y_continuous(labels=NotFancy)

Refactoring recurring ggplot code

I'm using R and ggplot2 to analyze some statistics from basketball games. I'm new to R and ggplot, and I like the results I'm getting, given my limited experience. But as I go along, I find that my code gets repetitive; which I dislike.
I created several plots similar to this one:
Code:
efgPlot <- ggplot(gmStats, aes(EFGpct, Nrtg)) +
stat_smooth(method = "lm") +
geom_point(aes(colour=plg_ShortName, shape=plg_ShortName)) +
scale_shape_manual(values=as.numeric(gmStats$plg_ShortName))
Only difference between the plots is the x-value; next plot would be:
orPlot <- ggplot(gmStats, aes(ORpct, Nrtg)) +
stat_smooth(method = "lm") + ... # from here all is the same
How could I refactor this, such that I could do something like:
efgPlot <- getPlot(gmStats, EFGpct, Nrtg)
orPlot <- getPlot(gmStats, ORpct, Nrtg)
Update
I think my way of refactoring this isn't really "R-ish" (or ggplot-ish if you will); based on baptiste's comment below, I solved this without refactoring anything into a function; see my answer below.
The key to this sort of thing is using aes_string rather than aes (untested, of course):
getPlot <- function(data,xvar,yvar){
p <- ggplot(data, aes_string(x = xvar, y = yvar)) +
stat_smooth(method = "lm") +
geom_point(aes(colour=plg_ShortName, shape=plg_ShortName)) +
scale_shape_manual(values=as.numeric(data$plg_ShortName))
print(p)
invisible(p)
}
aes_string allows you to pass variable names as strings, rather than expressions, which is more convenient when writing functions. Of course, you may not want to hard code to color and shape scales, in which case you could use aes_string again for those.
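With more recent versions of ggplot2 and rlang, the embrace operator {{ }} is another way to write such a function, and it lets the caller pass bare column names as in the question. This is a sketch on mtcars rather than the basketball data, and get_plot is a made-up name:

```r
library(ggplot2)

# Hypothetical helper: xvar and yvar are passed as bare column names
get_plot <- function(data, xvar, yvar) {
  ggplot(data, aes(x = {{ xvar }}, y = {{ yvar }})) +
    stat_smooth(method = "lm", formula = y ~ x) +
    geom_point()
}

wt_plot <- get_plot(mtcars, wt, mpg)
hp_plot <- get_plot(mtcars, hp, mpg)
```

The function returns the plot instead of printing it, so the caller decides whether to print it, save it, or add further layers.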
Although Joran's answer helped me a lot (and he accurately answers my question), I eventually solved this according to baptiste's suggestion:
# get the variables I need from the stats data frame:
forPlot <- gmStats[c("wed_ID","Nrtg","EFGpct","ORpct","TOpct","FTTpct",
"plg_ShortName","Home")]
# melt to long format (melt() is assumed to come from the reshape2 package):
forPlot.m <- melt(forPlot, id=c("wed_ID", "plg_ShortName", "Home","Nrtg"))
# use facet_wrap to create 4 plots:
p <- ggplot(forPlot.m, aes(value, Nrtg)) +
geom_point(aes(shape=plg_ShortName, colour=plg_ShortName)) +
scale_shape_manual(values=as.numeric(forPlot.m$plg_ShortName)) +
stat_smooth(method="lm") +
facet_wrap(~variable,scales="free")
Which gives me:
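Since gmStats is not available here, the same reshape-then-facet idea can be sketched on mtcars using only base R for the long format. The column names variable and value mirror what melt produces; the choice of predictors is arbitrary.

```r
library(ggplot2)

# Stack three predictor columns into long format, repeating the response
predictors <- c("wt", "hp", "disp")
long <- data.frame(
  mpg      = rep(mtcars$mpg, times = length(predictors)),
  variable = rep(predictors, each = nrow(mtcars)),
  value    = unlist(mtcars[predictors], use.names = FALSE)
)

# One panel per predictor, each with its own x scale
p <- ggplot(long, aes(value, mpg)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x) +
  facet_wrap(~variable, scales = "free_x")
```

This keeps a single plot definition for all predictors, which is the main appeal of faceting over writing one function call per plot.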

Resources