Remove the equal sign and set significant figures with stat_poly_eq? - r

I am using the function stat_poly_eq to display the R-squared and p-value of a linear regression within a ggplot. I have two questions to optimize my output:
How do I remove the equal sign (=) from the p-value label, so that only the less-than sign (<) remains?
How can I set a desired number of significant digits to display? For example, I would like to see 3 significant digits for both the R-squared and p-value.
Here's some reproducible code to show the issue:
library(ggplot2)
library(ggpmisc)
data(mtcars)
ggplot(data = mtcars, aes(x = mpg, y = hp)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE, color = "black") +
  stat_poly_eq(formula = y ~ x,
               aes(label = paste(..rr.label.., ..p.value.label.., sep = "*`,`~")),
               parse = TRUE, label.x.npc = "right", size = 8)

The number of digits is easy to control with the argument rr.digits. I can't replicate your problem with the equals sign, but if you update ggplot2 and ggpmisc and use the modern after_stat() syntax rather than the deprecated ..name.. syntax, you should get the same result as demonstrated in this reprex:
library(ggplot2)
library(ggpmisc)
#> Loading required package: ggpp
#>
#> Attaching package: 'ggpp'
#> The following object is masked from 'package:ggplot2':
#>
#> annotate
ggplot(mtcars, aes(mpg, hp)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, color = "black") +
  stat_poly_eq(formula = y ~ x,
               aes(label = paste(after_stat(rr.label), "*`,`~",
                                 after_stat(p.value.label))),
               parse = TRUE, label.x.npc = "right", size = 8, rr.digits = 3)
Created on 2022-12-23 with reprex v2.0.2
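The question also asks about the digits of the p-value. A minimal sketch, assuming a recent 'ggpmisc' in which stat_poly_eq() accepts p.digits alongside rr.digits: as far as I understand the help page, both count digits after the decimal point rather than significant digits, and p-values below the smallest value that can be shown are reported with "<" instead of "=". Please check ?stat_poly_eq for your installed version before relying on this.

```r
library(ggplot2)
library(ggpmisc)

# Assumption: rr.digits and p.digits are both available in the installed
# 'ggpmisc'; p.digits is not mentioned in the answer above.
p <- ggplot(mtcars, aes(mpg, hp)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, color = "black") +
  stat_poly_eq(formula = y ~ x,
               aes(label = paste(after_stat(rr.label),
                                 after_stat(p.value.label),
                                 sep = "*\", \"*")),
               rr.digits = 3,  # digits used for the R-squared label
               p.digits = 3,   # digits used for the p-value label
               label.x = "right")
p
```

With three digits, a very small p-value should then appear as a "less than" label rather than as an equality.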

Related

Loop graphs ggplot y for x for different categories with linear regression, How to consecutively plot categories?

EDITED:
I have a large database in which energy expenditure is assessed repeatedly over time, and I want to compare it across multiple binary variables (0/1, e.g. presence vs. absence of severe head trauma). The graph analysis should be repeated for all available variables in the database. All graphs should be exported to a PDF file.
Currently I'm using the following code:
library(tidyverse)
library(ggpmisc)
my_data %>%
pdf(file="Plots.pdf" )
print(colnames(my_data) %>%
map(function(x) my_data%>%
ggplot(aes(x = Day,
y = REE,
color=as_factor(x)))+
scale_x_continuous(breaks = c(0,2,4,6,8,10,12,14,16,18,20,22,24,26,28))+
scale_y_continuous(limits= c(0000,4000))+
geom_point()+
geom_smooth(method=lm,
se=TRUE,
size=2/10,
aes(group=as_factor(x)))+
stat_poly_eq(aes(label = paste(after_stat(eq.label),
after_stat(rr.label),
after_stat(p.value.label),
sep = "*\", \"*")),
label.y="bottom", label.x="right")+
labs(x="Time [d]",
y="Resting Energy Expenditure [kcal]")+
scale_colour_grey(start=0.7,
end=0.3)+
theme_bw()
))
dev.off()
It generates the PDF File with all graphs. However, it does not group/color according to the as_factor(x) and all data points are categorised into the same group.
Can anyone explain why the grouping by the factor variable doesn't work, and how to fix it?
The issue is that you loop over column names which are character strings. Doing color=as.factor(x) (or group = as.factor(x)) you are mapping a constant character string on the color aes, i.e. you are doing something like color="foo".
If you pass variable names as character strings, you have to tell ggplot2 that you want to map the data column with this name on an aesthetic. This can be achieved via the so-called .data pronoun, e.g. color=as.factor(.data[[x]]).
Using a minimal reprex based on mtcars:
Note: Personally, I would suggest putting your plotting code in a separate function, as I do below, instead of passing it as an anonymous function to purrr::map. This makes debugging easier and your code cleaner.
library(tidyverse)
library(ggpmisc)
my_data <- mtcars
plot_fun <- function(x) {
  ggplot(my_data, aes(
    x = mpg,
    y = hp,
    color = as_factor(.data[[x]])
  )) +
    geom_point() +
    geom_smooth(
      method = lm,
      se = TRUE,
      size = 2 / 10,
      aes(group = as_factor(.data[[x]]))
    ) +
    stat_poly_eq(aes(label = paste(after_stat(eq.label),
      after_stat(rr.label),
      after_stat(p.value.label),
      sep = "*\", \"*"
    )),
    label.y = "bottom", label.x = "right"
    ) +
    labs(
      x = "Time [d]",
      y = "Resting Energy Expenditure [kcal]"
    ) +
    scale_colour_grey(
      start = 0.7,
      end = 0.3
    ) +
    theme_bw()
}
cols <- c("cyl", "am", "gear") # colnames(my_data)
# pdf(file = "Plots.pdf")
purrr::map(cols, plot_fun)
#> [[1]]
#> `geom_smooth()` using formula 'y ~ x'
#>
#> [[2]]
#> `geom_smooth()` using formula 'y ~ x'
#>
#> [[3]]
#> `geom_smooth()` using formula 'y ~ x'
# dev.off()
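To get the plots into a PDF file, each plot has to be printed while the device is open. A minimal sketch of that step using only 'ggplot2'; the helper plot_by() and the temporary output path are made up for this illustration and are not part of the answer above:

```r
library(ggplot2)

# Hypothetical helper: one scatter plot per grouping column given as a string
plot_by <- function(col, data = mtcars) {
  ggplot(data, aes(x = mpg, y = hp, colour = factor(.data[[col]]))) +
    geom_point() +
    labs(colour = col)
}

cols <- c("cyl", "am", "gear")
plots <- lapply(cols, plot_by)

# Assumption: writing to a temporary directory; replace with "Plots.pdf" as needed
out_file <- file.path(tempdir(), "Plots.pdf")
pdf(file = out_file)
for (p in plots) print(p)  # each print() call starts a new PDF page
invisible(dev.off())
```

Printing explicitly inside the loop avoids depending on how a list of plots is auto-printed at the console.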

Color ggscatter by R and P values

I'm plotting a wrapped ggscatter like the image below. What I want is to color the panels differently according to the R and P values. For example, when P is not significant, I want the plot gray; when P is significant, I want the plot colored according to the R value on a continuous scale. The problem is that I don't know how to get those values to make an if statement inside ggscatter. Can anyone help me? Thank you!
Example of dataset:
conc   exposure  col
11.16  21294     0.139275104
11.16  18018     0.150012216
13.8   26208     0.067379679
18.1   29484     0.013190731
Plot:
ggscatter(data, x = "exposure", y = col, add = "reg.line", conf.int = TRUE, color = cor.hilab[1], cor.coef = TRUE) +
facet_wrap(~conc)+
ylab("OD")+
xlab("Exposure")
This is now implemented in package 'ggpmisc' on GitHub (future version 0.4.4), but not yet in version 0.4.3 available on CRAN. It does not do exactly what you asked, because at the moment stat_poly_eq() does not return R, as R is not meaningful for polynomials of order higher than 1. Keeping to the grammar of graphics paradigm allows easier customization at the cost of lengthier code than the all-in-one functions from package 'ggpubr'. Each approach has its advantages and drawbacks. I show here three examples of different possible approaches using package 'ggpmisc' (together with 'ggplot2').
library(ggpmisc)
#> Loading required package: ggpp
#> Loading required package: ggplot2
#>
#> Attaching package: 'ggpp'
#> The following object is masked from 'package:ggplot2':
#>
#> annotate
# First approach: faint lines for non-significant fits, with bands
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
stat_poly_eq(aes(label = paste(after_stat(rr.label),
after_stat(p.value.label),
sep = "*\", \"*")),
label.x = "right") +
stat_poly_line(aes(colour = stage(after_scale = ifelse(p.value < 0.05,
alpha(colour, 1),
alpha(colour, 0.25)))),
se = TRUE,
mf.values = TRUE) +
facet_wrap(~class, ncol = 2) +
theme_bw()
# Second approach: faint lines for non-significant fits, no-bands
# colour mapped to class
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
stat_poly_eq(aes(label = paste(after_stat(rr.label),
after_stat(p.value.label),
sep = "*\", \"*")),
label.x = "right") +
stat_poly_line(aes(colour = stage(start = class,
after_scale = ifelse(p.value < 0.05,
alpha(colour, 1),
alpha(colour, 0.25)))),
se = FALSE,
mf.values = TRUE) +
facet_wrap(~class, ncol = 2) +
theme_bw()
#> Warning: Failed to apply `after_scale()` modifications to legend
# Third approach: no bands or lines for non-significant fits
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
stat_poly_eq(aes(label = paste(after_stat(rr.label),
after_stat(p.value.label),
sep = "*\", \"*")),
label.x = "right") +
stat_poly_line(aes(colour = stage(after_scale = ifelse(p.value < 0.05,
colour,
NA)),
fill = stage(after_scale = ifelse(p.value < 0.05,
fill,
NA))),
se = TRUE,
mf.values = TRUE) +
facet_wrap(~class, ncol = 2) +
theme_bw()
#> Warning: Duplicated aesthetics after name standardisation: NA
#> Warning: Failed to apply `after_scale()` modifications to legend
Created on 2021-09-07 by the reprex package (v2.0.1)
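If the colour should follow the correlation coefficient itself, one option is to compute R and P per panel before plotting and map the result to colour. The following is only a sketch of that idea with base R and 'ggplot2'; it is not what 'ggpmisc' or ggscatter do internally, and the column names r, p and r_sig as well as the 0.05 threshold are choices made for this example:

```r
library(ggplot2)

# Pearson correlation and its p-value for each class (one row per panel)
stats <- do.call(rbind, lapply(split(mpg, mpg$class), function(d) {
  ct <- cor.test(d$displ, d$hwy)
  data.frame(class = d$class[1], r = unname(ct$estimate), p = ct$p.value)
}))

# Keep r only where the test is significant; NA is drawn in grey below
stats$r_sig <- ifelse(stats$p < 0.05, stats$r, NA_real_)
plot_data <- merge(mpg, stats, by = "class")

p <- ggplot(plot_data, aes(displ, hwy, colour = r_sig)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, colour = "black") +
  scale_colour_gradient2(limits = c(-1, 1), na.value = "grey60", name = "r") +
  facet_wrap(~class, ncol = 2) +
  theme_bw()
p
```

Panels whose correlation is not significant are drawn in grey through na.value, while the others are placed on a continuous scale from -1 to 1.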

adding significance brackets to ridgeline plot

I am creating a ridge plot to compare a few groups (using ggridges package) and would like to add significance brackets to show comparisons between some group levels (using ggsignif package).
But this doesn't seem to work because the computation fails in stat_ggsignif.
Here is a reproducible example:
set.seed(123)
library(ggsignif)
library(ggridges)
library(ggplot2)
ggplot(iris, aes(x = Sepal.Length, y = Species)) +
geom_density_ridges(scale = 1) +
coord_flip() +
geom_signif(comparisons = list(c("setosa", "versicolor")))
#> Picking joint bandwidth of 0.181
#> Warning in f(..., self = self): NAs introduced by coercion
#> Warning: Computation failed in `stat_signif()`:
#> missing value where TRUE/FALSE needed
Created on 2021-07-29 by the reprex package (v2.0.0)
How can I get these two packages to work with each other? Thanks.
I did not manage to combine A) geom_density_ridges and B) geom_signif. The reason is that (A) requires a numerical variable as x and categories as y, while (B) requires a numerical variable as y and categories as x, and I have not managed to override this behaviour.
But I assume that you have chosen ridge plots over simple boxplots because you are interested in a more informative visualization of the distribution. For that purpose there is, in my view, a better solution than ridge plots: the so-called violin plot. See below a standard boxplot (with labelled significance):
ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
geom_boxplot() +
geom_signif(comparisons = list(c("setosa", "versicolor")), test = "t.test")
See below a violin plot (with jitter and labelled significance):
ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
geom_violin(trim = FALSE) + geom_jitter() +
geom_signif(comparisons = list(c("setosa", "versicolor")), test = "t.test")
This does the job unless you are particularly interested in making ggridges and ggsignif work together. Please note that a violin plot is just a folded density plot (see https://en.wikipedia.org/wiki/Violin_plot#:~:text=A%20violin%20plot%20is%20a,by%20a%20kernel%20density%20estimator for more details).
For the same purpose, see also the sina plot (suggestion by tjebo):
library(ggforce)
ggplot(iris, aes(x = Species, y = Sepal.Length, colour = Species)) +
geom_sina() +
geom_signif(comparisons = list(c("setosa", "versicolor")), test = "t.test")
Thanks to a new pull request to ggsignif, the following now works:
set.seed(123)
library(ggsignif)
library(ggridges)
library(ggplot2)
ggplot(iris, aes(x = Sepal.Length, y = Species)) +
geom_density_ridges(scale = 1) +
coord_flip() +
geom_signif(comparisons = list(c("setosa", "versicolor")),
y_position = 9)
#> Picking joint bandwidth of 0.181
Created on 2021-08-06 by the reprex package (v2.0.1)

ggpmisc::stat_poly_eq crashes when one group does not have enough data points

library(ggpmisc)
data <- mpg
table(data$class)
data$class[mpg$class=="2seater"] <- c(rep("2seater", 1), rep("compact", 4))
formula <- y ~ x + I(x^2)
ggplot(data, aes(x = displ, y = hwy, color = class)) +
geom_point() +
geom_smooth(method = "lm", formula = formula) +
ggpmisc::stat_poly_eq(aes(label = paste(stat(eq.label), stat(adj.rr.label), sep = "*\", \"*")),
formula = formula, parse = TRUE)
The above code produces a plot without the equation and R-squared labels, and gives this warning:
Warning message:
Computation failed in `stat_poly_eq()`:
argument "x" is missing, with no default
The problem stems from the fact that the 2seater color group only has one data point. It isn't enough. But then it should at least output labels for the other groups?
The problem is now fixed in upcoming 'ggpmisc' (0.4.0), which can be installed from GitHub. At the moment the fixed version can be installed with remotes::install_github("aphalo/ggpp") followed by remotes::install_github("aphalo/ggpmisc").
Once both packages are in CRAN updating 'ggpmisc' will be enough.
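For an installation where the fix is not yet available, a possible workaround is to fit and label only the groups that have enough observations. This is a sketch under assumptions and not part of the answer above: the threshold min_n and the name fit_data are made up for the example, and it relies on the layer-level data argument that 'ggplot2' layers accept.

```r
library(ggplot2)
library(ggpmisc)

data <- mpg
data$class[data$class == "2seater"] <- c("2seater", rep("compact", 4))

formula <- y ~ x + I(x^2)

# Assumption: a quadratic fit estimates 3 coefficients, so ask for >= 4 points
min_n <- 4
group_sizes <- table(data$class)
fit_data <- data[data$class %in% names(group_sizes)[group_sizes >= min_n], ]

# All points are drawn, but only groups with enough data are fitted and labelled
p <- ggplot(data, aes(x = displ, y = hwy, color = class)) +
  geom_point() +
  geom_smooth(data = fit_data, method = "lm", formula = formula) +
  stat_poly_eq(data = fit_data,
               aes(label = paste(after_stat(eq.label), after_stat(adj.rr.label),
                                 sep = "*\", \"*")),
               formula = formula, parse = TRUE)
p
```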

Specifying formula for each facet using stat_poly_eq in ggplot2

I borrowed this example dataset from here:
# Load library
library(ggplot2)
# Load data
data(mtcars)
# Plot data
p <- ggplot(mtcars,aes(x = disp, y = mpg)) + geom_point() + facet_grid(gear ~ am)
p <- p + geom_smooth(method="lm")
print(p)
In the above code, the regression method and formula are the same in all facets. If we want to specify a different formula for facet (or panel) 6, we can use the following code, from here:
# Smoothing function with different behaviour depending on the panel
custom.smooth <- function(formula, data, ...) {
  smooth.call <- match.call()
  if (as.numeric(unique(data$PANEL)) == 6) {
    # Linear regression
    smooth.call[[1]] <- quote(lm)
    # Specify formula
    smooth.call$formula <- as.formula("y ~ log(x)")
  } else {
    # Linear regression
    smooth.call[[1]] <- quote(lm)
  }
  # Perform fit
  eval.parent(smooth.call)
}
# Plot data with custom fitting function
p <- ggplot(mtcars,aes(x = disp, y = mpg)) + geom_point() + facet_grid(gear ~ am)
p <- p + geom_smooth(method = "custom.smooth", se = FALSE)
print(p)
Now if I want to add regression equations to these facets:
# Load library
library(ggpmisc)
p + stat_poly_eq(formula = y ~ x,aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")),
parse=TRUE,label.x.npc = "right")
What should I do so that the equation and R2 displayed in panel 6 match the model I specified before? See the plot below: panel 6 now has its own fitted model, but its equation label does not reflect it. Maybe we can define a function similar to the one we passed to geom_smooth()?
It seems that your custom.smooth function contains a line that sets the formula to "y ~ log(x)". You therefore need to specify the same formula in your stat_poly_eq call; this is why the label shows a linear-looking equation for a fit that is in reality logarithmic.
I.e. add:
p + stat_poly_eq(formula = y ~ log(x),
aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")),
parse=TRUE,label.x.npc = "right")
You could update panel 6's formula individually (of course you could also update all panels with a function like that, but there's no need for that here):
rename_panel_expression <- function(grb, panel, expr) {
  g <- grb$grobs[[panel + 1]]$children
  grb$grobs[[panel + 1]]$children[[grep("GRID.text", names(g))]]$label <- expr
  grb
}
l <- lm(mpg ~ log(disp), mtcars[mtcars$am == 1 & mtcars$gear == 5, ])
tt <- rename_panel_expression(ggplotGrob(p), 6,
bquote(italic(y)~`=`~.(round(l$coefficients[1], 3)) - .(round(abs(l$coefficients[2]), 3))*~italic(x)~~~italic(R)^2~`=`~.(round(summary(l)$r.squared, 3))))
grid::grid.newpage()
grid::grid.draw(tt)
This answer does not exactly answer the question when considering details, but matches the title, so hopefully will be of some help to future visitors.
stat_poly_eq() is meant to be used with model formulas where the untransformed x (or the untransformed y) is the explanatory variable. It does not support a model formula like y ~ log(x) unless x is manually replaced by log(x) in the equation label within the call, even if the same formula is used in all panels. In the most recent version of 'ggpmisc' it is possible to have polynomials of different degrees in each panel using a user-defined method function.
library(ggpmisc)
#> Loading required package: ggpp
#> Loading required package: ggplot2
#>
#> Attaching package: 'ggpp'
#> The following object is masked from 'package:ggplot2':
#>
#> annotate
poly_degree <- function(formula, data, ...) {
  if (all(data$PANEL == 6)) {
    formula <- y ~ poly(x, 2, raw = TRUE)
  }
  lm(formula = formula, data = data, ...)
}
ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point() +
  stat_poly_line(method = "poly_degree") +
  stat_poly_eq(method = "poly_degree",
               use_label(c("eq", "r2")),
               size = 3,
               label.x = "right") +
  theme(legend.position = "bottom") +
  facet_grid(gear ~ am)
Created on 2022-10-17 with reprex v2.0.2
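For the log-transformed case mentioned above, the manual replacement of x in the label can be sketched as follows. This is an illustration under assumptions rather than a tested recipe: it applies the transformation in all panels by mapping x to log(disp), fits y ~ x on the transformed values, and uses the eq.x.rhs argument of stat_poly_eq() to print log(x) in place of x. The plotmath string passed to eq.x.rhs is my own choice.

```r
library(ggplot2)
library(ggpmisc)

# Assumption: eq.x.rhs replaces the "x" shown in the equation label only;
# the model itself is still fitted as y ~ x on the mapped (log) values.
p <- ggplot(mtcars, aes(x = log(disp), y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  stat_poly_eq(formula = y ~ x,
               eq.x.rhs = "~log(italic(x))",
               aes(label = paste(after_stat(eq.label), after_stat(rr.label),
                                 sep = "~~~")),
               label.x = "right") +
  facet_grid(gear ~ am)
p
```

This changes the x axis to the log scale in every panel, so it does not reproduce the single-panel behaviour of custom.smooth in the question.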
