I want to print the variables z in the plot.
I have added
sprintf("%1.1f", z1)
etc in various combinations with paste (and paste0) and expression, but none of them are working.
In the dummy code below I have hardcoded the values I want to see.
x <- c(1,2,3)
y <- c(1,2,3)
plot(x,y)
z <- c(0.1,0.2,0.3)
labels = c( expression( paste( sigma," = ","0.1" )),
expression( paste( sigma," = ","0.2" )),
expression( paste( sigma," = ","0.3" ))
)
legend("topright", inset=.05, title="title",
labels, lwd=2, lty=c(1,1,1), col=colors)
Create the string and parse it.
labels <- parse(text = sprintf("sigma == %f", z))
Words can be separated with ~ symbols or combined into a single literal using quotes. * can be used for juxtaposition.
labels <- parse(text = sprintf("Case ~ (%d) ~ sigma == %f", 1:3, z))
labels <- parse(text = sprintf("Case ~ (%d) * ':' ~ sigma == %f", 1:3, z))
labels <- parse(text = sprintf("'Case (%d)' ~ sigma == %f", 1:3, z))
labels <- parse(text = sprintf("'Case (%d):' ~ sigma == %f", 1:3, z))
Try demo("plotmath") for more info.
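As a minimal sketch of how this fits the original plot: the format "%.1f" gives the one-decimal output the question asked for. Wrapping the number in single quotes is an assumption on my part; it keeps the formatted value as text, because a bare number is read back as numeric when parsed and may lose its formatting. The vector cols is a hypothetical stand-in for the undefined colors in the question.

```r
x <- c(1, 2, 3)
y <- c(1, 2, 3)
z <- c(0.1, 0.2, 0.3)

# Build one plotmath string per value of z, then parse them all at once.
# The single quotes make the formatted number a string constant.
labels <- parse(text = sprintf("sigma == '%.1f'", z))

cols <- c("black", "red", "blue")  # hypothetical colours, for illustration only

plot(x, y, col = cols, pch = 19)
legend("topright", inset = 0.05, title = "title",
       legend = labels, lwd = 2, lty = 1, col = cols)
```

Each element of labels is an unevaluated call such as sigma == "0.1", which legend draws as a Greek sigma followed by the text.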
Related
We often want individual regression equations in ggplot facets. The best way to do this is to build the labels in a data frame and then add them manually. But what if the labels contain plotmath, e.g., superscripts?
Here is one way to do it. The plotmath expression is converted to a string and then parsed by ggplot. The test_eqn function is taken from another Stack Overflow post; I'll link it when I find it again.
library(ggplot2)
library(dplyr)
test_eqn <- function(y, x){
m <- lm(log(y) ~ log(x)) # fit y = a * x ^ b in log space
p <- exp(predict(m)) # model prediction of y
eq <- substitute(expression(Y==a~X^~b),
list(
a = format(unname(exp(coef(m)[1])), digits = 3),
b = format(unname(coef(m)[2]), digits = 3)
))
list(eq = as.character(eq)[2], pred = p)
}
set.seed(123)
x <- runif(20)
y <- runif(20)
test_eqn(x,y)$eq
#> [1] "Y == \"0.57\" ~ X^~\"0.413\""
data <- data.frame(x = x,
y = y,
f = sample(c("A","B"), 20, replace = TRUE)) %>%
group_by(f) %>%
mutate(
label = test_eqn(y,x)$eq, # add label
labelx = mean(x),
labely = mean(y),
pred = test_eqn(y,x)$pred # add prediction
)
# plot fits (use slice(1) to avoid multiple copies of labels)
ggplot(data) +
geom_point(aes(x = x, y = y)) +
geom_line(aes(x = x, y = pred), colour = "red") +
geom_text(data = slice(data, 1), aes(x = labelx, y = labely, label = label), parse = TRUE) +
facet_wrap("f")
Created on 2021-10-20 by the reprex package (v2.0.1)
I am trying to add lm model coefs of two parallel modelling results onto the same ggplot plot. Here is my working example:
library(ggplot2)
set.seed(100)
dat <- data.frame(
x <- rnorm(100, 1),
y <- rnorm(100, 10),
lev <- gl(n = 2, k = 50, labels = letters[1:2])
)
mod1 <- lm(y~x, dat = dat[lev %in% "a", ])
r1 <- paste("R^2==", round(summary(mod1)[[9]], 3))
p1<- paste("p==", round(summary(mod1)[[4]][2, 4], 3), sep= "")
lab1 <- paste(r1, p1, sep =",")
mod2 <- lm(y~x, dat = dat[lev %in% "b", ])
r2 <- paste("R^2==", round(summary(mod2)[[9]], 3))
p2 <- paste("p==", round(summary(mod2)[[4]][2, 4], 3), sep= "")
lab2 <- paste(r2, p2, sep =",")
ggplot(dat, aes(x = x, y = y, col = lev)) + geom_jitter() + geom_smooth(method = "lm") + annotate("text", x = 2, y = 12, label = lab1, parse = T) + annotate("text", x = 10, y = 8, label = lab2, parse = T)
Here is what the prompt shows:
Error in parse(text = text[[i]]) : <text>:1:12: unexpected ','
1: R^2== 0.008,
The problem is that I can label either the R^2 or the p-value separately, but not both together. How can I put the two results on a single line in the figure?
BTW, is there a more efficient way of doing the same thing as my code? I have nine subplots that I want to put into one full plot, and I don't want to add them one by one.
++++++++++++++++++++++++++ Some update ++++++++++++++++++++++++++++++++++
Following @G. Grothendieck's kind suggestion, I tried to wrap the most repetitive part of the code into a function, so that I could produce all the plots with a few lines. Now the problem is that, however I change the input variables, the output plots are basically the same, except for the axis labels. Can anyone explain why? The following is the code I used:
library(ggplot2)
library(ggpubr)
set.seed(100)
dat <- data.frame(
x = rnorm(100, 1),
y = rnorm(100, 10),
z = rnorm(100, 25),
lev = gl(n = 2, k = 50, labels = letters[1:2])
)
test <- function(dat, x, y){
fmt <- "%s: Adj ~ R^2 == %.3f * ',' ~ {p == %.3f}"
mod1 <- lm(y ~ x, dat, subset = lev == "a")
sum1 <- summary(mod1)
lab1 <- sprintf(fmt, "a", sum1$adj.r.squared, coef(sum1)[2, 4])
mod2 <- lm(y ~ x, dat, subset = lev == "b")
sum2 <- summary(mod2)
lab2 <- sprintf(fmt, "b", sum2$adj.r.squared, coef(sum2)[2, 4])
colors <- 1:2
p <- ggplot(dat, aes(x = x, y = y, col = lev)) +
geom_jitter() +
geom_smooth(method = "lm") +
annotate("text", x = 2, y = c(12, 8), label = c(lab1, lab2),
parse = TRUE, hjust = 0, color = colors) +
scale_color_manual(values = colors)
return(p)
}
ggarrange(test(dat, x, z), test(dat, y, z))
There are several problems here:
x, y and lev are arguments to data.frame so they must be specified using = rather than <-
make use of the subset= argument in lm
use sprintf instead of paste to simplify the specification of labels
label the text strings a and b and make them the same color as the corresponding lines to identify which is which
the formula syntax needs to be corrected. See fmt below.
it would be clearer to use component names and accessor functions of the summary objects where available
use TRUE rather than T because the latter can be overridden if there is a variable called T but TRUE can never be overridden.
use hjust=0 and adjust the x= and y= in annotate to align the two text strings
combine the annotate statements
place the individual terms of the ggplot statement on separate lines for improved readability
This gives:
library(ggplot2)
set.seed(100)
dat <- data.frame(
x = rnorm(100, 1),
y = rnorm(100, 10),
lev = gl(n = 2, k = 50, labels = letters[1:2])
)
fmt <- "%s: Adj ~ R^2 == %.3f * ',' ~ {p == %.3f}"
mod1 <- lm(y ~ x, dat, subset = lev == "a")
sum1 <- summary(mod1)
lab1 <- sprintf(fmt, "a", sum1$adj.r.squared, coef(sum1)[2, 4])
mod2 <- lm(y ~ x, dat, subset = lev == "b")
sum2 <- summary(mod2)
lab2 <- sprintf(fmt, "b", sum2$adj.r.squared, coef(sum2)[2, 4])
colors <- 1:2
ggplot(dat, aes(x = x, y = y, col = lev)) +
geom_jitter() +
geom_smooth(method = "lm") +
annotate("text", x = 2, y = c(12, 8), label = c(lab1, lab2),
parse = TRUE, hjust = 0, color = colors) +
scale_color_manual(values = colors)
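For more than two groups, the per-group fitting and label formatting could be wrapped in a small helper. The sketch below reuses fmt from above; make_label is a hypothetical name, and the approach (split the data by lev, then format one label per piece) is only one possible way to avoid repeating the code for each group.

```r
set.seed(100)
dat <- data.frame(
  x = rnorm(100, 1),
  y = rnorm(100, 10),
  lev = gl(n = 2, k = 50, labels = letters[1:2])
)

fmt <- "%s: Adj ~ R^2 == %.3f * ',' ~ {p == %.3f}"

# make_label is a hypothetical helper: fit y ~ x on one group's rows
# and format the adjusted R^2 and slope p-value as a plotmath string
make_label <- function(d, name) {
  s <- summary(lm(y ~ x, data = d))
  sprintf(fmt, name, s$adj.r.squared, coef(s)[2, 4])
}

groups <- split(dat, dat$lev)                      # one data frame per level
labs <- mapply(make_label, groups, names(groups))  # one label per level

# check that every label is valid plotmath before handing it to annotate()
parsed <- parse(text = labs)
```

The resulting labs vector can be passed to annotate("text", ..., label = labs, parse = TRUE) in place of c(lab1, lab2).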
Unless I'm misunderstanding your question, the problem's with the parse = T arguments to your annotate calls. I don't think your strings need to be parsed. Try parse = F instead, or just drop the parameter, as the default value seems to be FALSE anyway
I'm trying to plot a graph between two columns of data from the data frame called "final". I want the p value and r^2 value to show up on the graph.
I'm using this function and code, but it gives me the error "cannot find y value"
library(ggplot2)
lm_eqn <- function(final, x, y){
m <- lm(final[,y] ~ final[,x])
output <- paste("r.squared = ", round(summary(m)$adj.r.squared, digits = 4), " | p.value = ", formatC(summary(m)$coefficients[8], format = "e", digits = 4))
return(output)
}
output_plot <- lm_eqn(final, x, y)
p1 <- ggplot(final, aes(x=ENSG00000153563, y= ENSG00000163599)) + geom_point() + geom_smooth(method=lm, se=FALSE) + labs(x = "CD8A", y = "CTLA-4") + ggtitle("CD8 v/s CTLA-4", subtitle = paste("Linear Regression of Expression |", output_plot))
How do I get both columns of data x and y to flow through the function and for the graph to plot with the p value and residual value printed on the graph?
Thanks in advance.
When you call the function to generate output_plot, you have to use the same ENS... variables as in your plot. After slightly simplifying the function, it should work now:
library(stats)
library(ggplot2)
lm_eqn <- function(x, y){
m <- lm(y ~ x)
output <- paste("r.squared = ", round(summary(m)$adj.r.squared, digits = 4), " | p.value = ", formatC(summary(m)$coefficients[8], format = "e", digits = 4))
return(output)
}
x <-c(1,2,5,2,3,6,7,0)
y <-c(2,3,5,9,8,3,3,1)
final <- data.frame(x, y)
output_plot <- lm_eqn(x, y)
p1 <- ggplot(final, aes(x=x, y= y)) + geom_point() + geom_smooth(method=lm, se=FALSE) + labs(x = "x", y = "y") + ggtitle("CD8 v/s CTLA-4", subtitle = paste("Linear Regression of Expression |", output_plot))
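If you would rather keep the data frame as an argument, as in the question, one option is to pass the column names as strings and build the formula from them. This is only a sketch: lm_label is a hypothetical name, and using reformulate() from the stats package is my own choice, not something from the original code.

```r
# lm_label is a hypothetical helper; xcol and ycol are column names as strings
lm_label <- function(data, xcol, ycol) {
  # reformulate("x", response = "y") builds the formula y ~ x
  m <- lm(reformulate(xcol, response = ycol), data = data)
  s <- summary(m)
  paste0("r.squared = ", round(s$adj.r.squared, digits = 4),
         " | p.value = ", formatC(coef(s)[2, 4], format = "e", digits = 4))
}

final <- data.frame(x = c(1, 2, 5, 2, 3, 6, 7, 0),
                    y = c(2, 3, 5, 9, 8, 3, 3, 1))

output_plot <- lm_label(final, "x", "y")
```

With the real data, the call would use the gene column names as strings, and output_plot can be pasted into the subtitle exactly as before.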
I wrote a function to ease the visualization of a bunch of correlations that I was doing. Specifically, I was interested in viewing bivariate relationships side by side in ggplot2 panels with the p-value and rho value printed directly on the graph. I wrote this function using the iris dataset:
library(ggplot2)
library(dplyr)
grouped_cor_ <- function(data, x, y, group.col){
x <- lazyeval::as.lazy(x)
y <- lazyeval::as.lazy(y)
cor1 <- lazyeval::interp(~ cor.test(x, y,method="spearman",na.action = "na.exclude")$estimate, x = x, y = y)
corp <- lazyeval::interp(~ cor.test(x, y,method="spearman", na.action = "na.exclude")$p.value, x = x, y = y)
mnx <- lazyeval::interp(~ mean(x, na.rm=TRUE), x = x, y = y)
mny <- lazyeval::interp(~ mean(y, na.rm=TRUE), x = x, y = y)
summarise_(group_by(data, Species), rho=cor1, pval=corp, xcoord=mnx, ycoord=mny)
}
This is the data frame that I am using to print the statistics from the correlation:
grouped_cor_(data=iris, x=~Petal.Width, y=~Petal.Length)
Then this is the function that calls the plot:
corHighlight <- function(Data, x, y){
cordf<-grouped_cor_(Data, x = substitute(x), y = substitute(y))
cordf$prho <- paste("rho=",round(cordf$rho,3), "\n p-value=",round(cordf$pval,3), sep=" ")
plt<-ggplot(Data, aes_q(x = substitute(x), y = substitute(y))) +
geom_text(data=cordf, aes_q(x=substitute(xcoord),
y=substitute(ycoord),
label=substitute(prho)), colour='red') +
geom_point(size=2, alpha=0.3) +
facet_wrap(~Species)
print(plt)
}
corHighlight(Data=iris,
x=Petal.Width,
y=Petal.Length)
The function, though a little clunky, works well now, apart from one detail that I can't figure out: how to add a column specification for the grouping variable. Right now the function is tied to the iris dataset because it only accepts a grouping variable named `Species`. My question, then, is how do I separate this function from the iris dataset and generalize the grouping variable?
Can anyone recommend an efficient way of doing this? Happy to accept any comments that improve the function as well.
This lets you pass a single grouping factor to your helper function. It does require using group_by_, since I extract the name from the formula as a character string and then coerce it back to a name:
grouped_cor_ <- function(data, x, y, form){
x <- lazyeval::as.lazy(x)
y <- lazyeval::as.lazy(y)
fac <- as.name(as.character(form)[2])
cor1 <- lazyeval::interp(~ cor.test(x, y,method="spearman",na.action = "na.exclude")$estimate, x = x, y = y)
corp <- lazyeval::interp(~ cor.test(x, y,method="spearman", na.action = "na.exclude")$p.value, x = x, y = y)
mnx <- lazyeval::interp(~ mean(x, na.rm=TRUE), x = x, y = y)
mny <- lazyeval::interp(~ mean(y, na.rm=TRUE), x = x, y = y)
summarise_( group_by_(data, fac), rho=cor1, pval=corp, xcoord=mnx, ycoord=mny)
}
To illustrate what I said in the comment (allow the function to accept a formula that can be processed by `facet_wrap`):
corHighlight <- function(Data, x, y, form){
cordf<-grouped_cor_(Data, x = substitute(x), y = substitute(y), form=substitute(form))
cordf$prho <- paste("rho=",round(cordf$rho,3), "\n p-value=",round(cordf$pval,3), sep=" ")
plt<-ggplot(Data, aes_q(x = substitute(x), y = substitute(y))) +
geom_text(data=cordf, aes_q(x=substitute(xcoord),
y=substitute(ycoord),
label=substitute(prho)), colour='red') +
geom_point(size=2, alpha=0.3) +
facet_wrap(form)
print(plt)
}
corHighlight(Data=iris,
x=Petal.Width,
y=Petal.Length, form = ~Species)
I am working on a figure for publication and wish to annotate it with some beta and p values; the style guidelines of my area dictate that these numbers be formatted without leading zeros (e.g., ".003", not "0.003"). I have run into what seems like a Catch-22; I have extracted beta and p values from my models and done some preprocessing to correctly format them so that they are now characters rather than numeric:
fake.beta.vals <- c(".53", ".29", ".14")
fake.p.vals <- c(".034", ".001", ".050")
But, when I try to use these values in my figure, parse = TRUE turns them back into numeric values, losing the formatting I need.
fake.beta.vals <- c(".53", ".29", ".14")
fake.p.vals <- c(".034", ".001", ".050")
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width))
p <- p +
geom_smooth(method = "lm") +
geom_point() +
facet_wrap( ~ Species)
p
len <-length(levels(iris$Species))
vars <- data.frame(expand.grid(levels(iris$Species)))
colnames(vars) <- c("Species")
betalabs <- as.data.frame(fake.beta.vals)
plabs <- as.data.frame(fake.p.vals)
dat <- data.frame(
x = rep(7, len),
y = rep(4, len),
vars,
betalabs,
plabs)
dat$fake.beta.vals <- as.factor(dat$fake.beta.vals)
dat$fake.p.vals <- as.factor(dat$fake.p.vals)
p <- p +
geom_text(
aes(x = x,
y = y,
label = paste("list(beta ==",
fake.beta.vals,
", italic(p) ==",
fake.p.vals,
")"),
group = NULL),
size = 5,
data = dat,
parse = TRUE)
p
I have been banging my head against this problem for a while now. Adding as.character():
label = paste("list(beta ==",
as.character(fake.beta.vals),
", italic(p) ==",
as.character(fake.p.vals),
")"),
Is obviously also cancelled out by parse = TRUE
And adding the function I had previously used to format my values:
statformat <- function(val,z){
sub("^(-?)0.", "\\1.", sprintf(paste("%.",z,"f", sep = ""), val))
}
Is even worse:
label = paste("list(beta ==",
statformat(fake.beta.vals, 2),
", italic(p) ==",
statformat(fake.p.vals, 3),
")"),
And just ends up with a mess.
Help?
Use bquote to create the labels, then coerce to a character representation using deparse
For example
# create a list of labels using bquote
labs <- Map(.beta = fake.beta.vals,
.p = fake.p.vals,
f = function(.beta,.p) bquote(list(beta == .(.beta), italic(p) == .(.p))))
# coerce to a character representation for parse=TRUE to work within
# geom_text
dat <- data.frame(
x = rep(7, len),
y = rep(4, len),
vars,
labels = sapply(labs,deparse))
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
geom_smooth(method = "lm") +
geom_point() +
facet_wrap( ~ Species) +
geom_text(data = dat, aes(x=x,y=y,label=labels), parse=TRUE)
p
After getting back to my computer and re-reading your question, I found that I misinterpreted the question. Trying out the I function, I found that it doesn't seem to work with parse.
I found a way to get it to work, and this is by encasing your fake.beta.vals and fake.p.vals with the ` character or the ' character in your call to parse.
p <- p +
geom_text(
aes(x = x,
y = y,
label = paste("list(beta ==",
"`", fake.beta.vals, "`",
", italic(p) ==",
"`", fake.p.vals, "`",
")",
sep=""),
group = NULL),
size = 5,
data = dat,
parse = TRUE)
That should work.
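To see why quoting matters, it may help to compare what the parser produces with and without quotes. The snippet below is a small check outside ggplot; the names unquoted and quoted are mine, and single quotes are used here in place of backticks.

```r
fake.p.vals <- c(".034", ".001", ".050")

# Unquoted: the parser reads each value as a number, so the formatting is lost
unquoted <- parse(text = paste("italic(p) ==", fake.p.vals))

# Quoted: each value stays a character string and is kept exactly as written
quoted <- parse(text = sprintf("italic(p) == '%s'", fake.p.vals))

deparse(unquoted[[3]])  # "italic(p) == 0.05"
deparse(quoted[[3]])    # "italic(p) == \".050\""
```

The unquoted version has already turned ".050" into the number 0.05 by the time plotmath sees it, which is why the leading-zero-free formatting disappears with parse = TRUE.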