Lines overlapping in R with ggridges - r

I use ggridges in R to visualize my data. But a lot of the lines are overlapping and are hard to read.
My code is:
ggplot(task1, aes(x = ibu, y = style, fill = style)) +
geom_density_ridges(alpha=1) +
theme_ridges() +
theme(legend.position = "none")
What should I change to make this visualization more readable?

You can use the scale parameter to adjust the overall height scaling. Just set it to a number that produces results you like.
library(ggplot2)
library(ggridges)
#>
#> Attaching package: 'ggridges'
#> The following object is masked from 'package:ggplot2':
#>
#> scale_discrete_manual
ggplot(iris, aes(x = Sepal.Length, y = Species, fill = Species)) +
geom_density_ridges()
#> Picking joint bandwidth of 0.181
ggplot(iris, aes(x = Sepal.Length, y = Species, fill = Species)) +
geom_density_ridges(scale = 0.5)
#> Picking joint bandwidth of 0.181
Created on 2019-11-03 by the reprex package (v0.3.0)
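To choose a value, it can help to build the same plot at several scale settings and compare them. The sketch below does that with the built-in iris data. The particular scale values and the alpha of 0.6 are illustrative assumptions, not recommendations from the answer. In ggridges, scale = 1 means the tallest ridge just touches the baseline above it, so values below 1 avoid overlap altogether, while a fill alpha below 1 keeps overlapped ridges visible when you do allow overlap.

```r
library(ggplot2)
library(ggridges)

# Candidate scale values to compare (illustrative choices).
scales <- c(0.5, 0.9, 1.5)

# Build one ridgeline plot per scale value.
plots <- lapply(scales, function(s) {
  ggplot(iris, aes(x = Sepal.Length, y = Species, fill = Species)) +
    geom_density_ridges(scale = s, alpha = 0.6) +  # semi-transparent fill
    theme_ridges() +
    theme(legend.position = "none") +
    labs(title = paste("scale =", s))
})
names(plots) <- paste0("scale_", scales)

# Print one plot at a time, e.g. the non-overlapping variant.
print(plots[["scale_0.5"]])
```

For the original question, the same call would use task1, ibu and style in place of iris, Sepal.Length and Species.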

Related

Remove the equal sign and set significant figures with stat_poly_eq?

I am using the function stat_poly_eq to display the R-squared and p-value of a linear regression within a ggplot. I have two questions to optimize my output:
How do I remove the equal sign (=) from the p-value, such that only the less-than sign (<) remains?
How can I set a desired number of significant digits to display? For example, I would like to see 3 significant digits for both the R-squared and p-value.
Here's some reproducible code to show the issue:
data(mtcars)
ggplot(data=mtcars, aes(x=mpg,y=hp)) +
geom_point() +
geom_smooth(method = "lm",formula = y ~ x,se=TRUE, color="black") +
stat_poly_eq(formula = y ~ x,
aes(label = paste(..rr.label.., ..p.value.label.., sep = "*`,`~")),
parse = TRUE,label.x.npc = "right",size=8)
The number of digits is easy to control with the rr.digits argument. I can't replicate your problem with the equals sign, but if you update ggplot2 and ggpmisc and use the modern after_stat() syntax rather than the deprecated .. syntax, you should get the same result as in this reprex:
library(ggplot2)
library(ggpmisc)
#> Loading required package: ggpp
#>
#> Attaching package: 'ggpp'
#> The following object is masked from 'package:ggplot2':
#>
#> annotate
ggplot(mtcars, aes(mpg, hp)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ x, color = "black") +
stat_poly_eq(formula = y ~ x,
aes(label = paste(after_stat(rr.label), "*`,`~",
after_stat(p.value.label))),
parse = TRUE, label.x.npc = "right", size = 8, rr.digits = 3)
Created on 2022-12-23 with reprex v2.0.2
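The reprex above sets only the digits for R-squared. The question also asked for three significant digits in the p-value. As a sketch, assuming the installed ggpmisc version exposes a p.digits argument on stat_poly_eq alongside rr.digits (check ?stat_poly_eq for your version), both can be set in the same call:

```r
library(ggplot2)
library(ggpmisc)

p <- ggplot(mtcars, aes(mpg, hp)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, color = "black") +
  stat_poly_eq(formula = y ~ x,
               aes(label = paste(after_stat(rr.label), "*`,`~",
                                 after_stat(p.value.label))),
               parse = TRUE, label.x.npc = "right", size = 8,
               rr.digits = 3,  # significant digits for R-squared
               p.digits = 3)   # significant digits for the p-value (assumed available)

print(p)
```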

Is there a way to add a legend to refer to residuals in a effect plot?

I'm looking to add a legend to my plot, for the moment the code I wrote is:
plot(allEffects(covid.lm, residuals=T), # plot with countries on graph
band.colors="grey2",
residuals.color=adjustcolor("steelblue3",alpha.f=0.5),
residuals.pch=16, smooth.residuals=F,
id = list(n=length(d$COUNTRY), cex=0.5))
Basically, I added numbers to the points in the plot (for which I created a linear model, covid.lm). Now I need to add a legend for those points, that is, a list of countries. Thanks in advance.
library(ggplot2)
data(iris)
m <- lm(Petal.Length ~ Sepal.Length, data = iris)
iris$Fitted <- predict(m)
iris$Species_num <- as.numeric(iris$Species)
ggplot(iris, aes(x = Petal.Length, y = Fitted)) +
geom_point(aes(color = as.factor(Species_num))) +
geom_text(aes(label = Species_num), hjust = 1.1, vjust = 1.1) +
labs(title = "Residuals", x = "Observed", y = "Fitted") +
guides(color=guide_legend(title="New Legend Title"))
Created on 2022-04-08 by the reprex package (v2.0.1)

adding significance brackets to ridgeline plot

I am creating a ridge plot to compare a few groups (using ggridges package) and would like to add significance brackets to show comparisons between some group levels (using ggsignif package).
But this doesn't seem to work because the computation fails in stat_ggsignif.
Here is a reproducible example:
set.seed(123)
library(ggsignif)
library(ggridges)
library(ggplot2)
ggplot(iris, aes(x = Sepal.Length, y = Species)) +
geom_density_ridges(scale = 1) +
coord_flip() +
geom_signif(comparisons = list(c("setosa", "versicolor")))
#> Picking joint bandwidth of 0.181
#> Warning in f(..., self = self): NAs introduced by coercion
#> Warning: Computation failed in `stat_signif()`:
#> missing value where TRUE/FALSE needed
Created on 2021-07-29 by the reprex package (v2.0.0)
How can I get these two packages to work with each other? Thanks.
I did not manage to combine (A) geom_density_ridges and (B) geom_signif. The reason is that (A) requires a numerical variable on x and categories on y, while (B) requires a numerical variable on y and categories on x, and I have not managed to override this behaviour.
But I assume that you chose ridge plots over simple boxplots because you are interested in a more informative visualization of the distribution. For that purpose there is a better-suited alternative to ridge plots: the so-called violin plot. See below a standard boxplot (with labelled significance):
ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
geom_boxplot() +
geom_signif(comparisons = list(c("setosa", "versicolor")), test = "t.test")
See below a violin plot (with jitter and labelled significance):
ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
geom_violin(trim = FALSE) + geom_jitter() +
geom_signif(comparisons = list(c("setosa", "versicolor")), test = "t.test")
This does the job unless you are particularly interested in making ggridges and ggsignif work together. Please note that a violin plot is just a mirrored density plot (see https://en.wikipedia.org/wiki/Violin_plot#:~:text=A%20violin%20plot%20is%20a,by%20a%20kernel%20density%20estimator for more details).
For the same purpose, see also the sina plot (suggestion by tjebo):
library(ggforce)
ggplot(iris, aes(x = Species, y = Sepal.Length, colour = Species)) +
geom_sina() +
geom_signif(comparisons = list(c("setosa", "versicolor")), test = "t.test")
Thanks to a new pull request to ggsignif, the following now works:
set.seed(123)
library(ggsignif)
library(ggridges)
library(ggplot2)
ggplot(iris, aes(x = Sepal.Length, y = Species)) +
geom_density_ridges(scale = 1) +
coord_flip() +
geom_signif(comparisons = list(c("setosa", "versicolor")),
y_position = 9)
#> Picking joint bandwidth of 0.181
Created on 2021-08-06 by the reprex package (v2.0.1)

ggplot add geom (specifically geom_hline) which doesn't affect limits

I have some data on which I would like to plot a threshold, but only if the data approaches the threshold. So I would like a horizontal line at my threshold that does not extend the y-axis limits when the threshold would not already have been included. Because my data is faceted and I am doing this for many different data sets, pre-calculating the limits is not feasible and would get very messy. This question seems to be asking the same thing, but the answers are not relevant to me: ggplot2: Adding a geom without affecting limits
Simple example.
library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 3.5.3
ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length))+geom_point()+facet_wrap(~Species, scales = "free")+geom_hline(yintercept = 7)
which gives me
But I would like this (created in paint) where the limits have not been impacted by the geom_hline
Created on 2020-01-21 by the reprex package (v0.3.0)
You can automate this by checking whether a given facet has a maximum y-value that exceeds the threshold.
library(ggplot2)
library(dplyr)
threshold <- 7
iris %>%
  ggplot(aes(Sepal.Width, Sepal.Length)) +
  geom_point() +
  facet_wrap(~Species, scales = "free") +
  # draw the line only in facets whose maximum reaches the threshold
  geom_hline(data = . %>%
               group_by(Species) %>%
               filter(max(Sepal.Length, na.rm = TRUE) >= threshold),
             yintercept = threshold)
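The same per-facet check can also be done without the pipe by building a small data frame that lists only the facets that should receive the line. This is a minimal sketch of that variant, not the answerer's code. The names facet_max and line_data are made up for illustration.

```r
library(ggplot2)

threshold <- 7

# Maximum Sepal.Length per species, i.e. one value per facet.
facet_max <- aggregate(Sepal.Length ~ Species, data = iris, FUN = max)

# Keep only the facets whose data reach the threshold.
line_data <- facet_max[facet_max$Sepal.Length >= threshold, "Species", drop = FALSE]
line_data$yintercept <- threshold

p <- ggplot(iris, aes(Sepal.Width, Sepal.Length)) +
  geom_point() +
  facet_wrap(~Species, scales = "free") +
  # line_data only has rows for qualifying facets, so the other facets
  # get no line and keep their own y-axis limits
  geom_hline(data = line_data, aes(yintercept = yintercept))

print(p)
```

Inspecting line_data before plotting shows which facets will get the line, which is handy when the same code runs over many data sets.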
Adapting from this post:
How can I add a line to one of the facets?
library(tidyverse)
iris %>%
ggplot(aes(x = Sepal.Width, y = Sepal.Length)) +
geom_point() +
facet_wrap(~Species, scales = "free") +
geom_hline(data = . %>% filter(Species != "setosa"), aes(yintercept = 7))

How to use color() instead of facet_grid() to 'split' your data but keep it on the same plot

I'm having trouble substituting color() for facet_grid() when I want to 'split' my data by a variable. Instead of generating individual plots with regression lines, I'm looking to generate a single plot with all regression lines.
Here's my code:
ggplot(data, aes(x = Rooms, y = Price)) +
geom_point(size = 1, alpha = 1/100) +
geom_smooth(method = "lm", color = Type) # Single plot with all regression lines
ggplot(data, aes(x = Rooms, y = Price)) +
geom_point(size = 1, alpha = 1/100) +
geom_smooth(method = "lm") + facet_grid(. ~ Type) # Individual plots with regression lines
(The first plot doesn't work) Here's the output:
"Error in grDevices::col2rgb(colour, TRUE) : invalid color name 'Type'
In addition: Warning messages:
1: Removed 12750 rows containing non-finite values (stat_smooth).
2: Removed 12750 rows containing missing values (geom_point)."
Here's a link to the data:
Dataset
You need to supply an aesthetic mapping to geom_smooth, not just a parameter, which means you need to put colour inside aes(). This is what you need to do any time you want a graphical element to correspond to something in the data rather than to a fixed parameter.
Here's an example with the built-in iris dataset. In fact, if you move colour to the ggplot call so it is inherited by geom_point as well, then you can colour the points as well as the lines.
library(ggplot2)
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
geom_point() +
geom_smooth(aes(colour = Species), method = "lm")
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, colour = Species)) +
geom_point() +
geom_smooth(method = "lm")
Created on 2018-07-20 by the reprex package (v0.2.0).
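Applied to the column names from the question, the same pattern might look like the sketch below. The housing data frame is a hypothetical stand-in: only the names Rooms, Price and Type come from the question. The values and the Type levels are simulated for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's data; values are simulated.
set.seed(1)
housing <- data.frame(
  Rooms = sample(1:6, 300, replace = TRUE),
  Type  = sample(c("h", "t", "u"), 300, replace = TRUE)
)
housing$Price <- 200000 + 150000 * housing$Rooms + rnorm(300, sd = 100000)

# colour = Type sits inside aes(), so it is mapped from the data and
# inherited by both the points and the regression lines.
p <- ggplot(housing, aes(x = Rooms, y = Price, colour = Type)) +
  geom_point(size = 1, alpha = 0.3) +
  geom_smooth(method = "lm", formula = y ~ x)

print(p)
```

This gives one regression line per Type on a single panel, which is what facet_grid(. ~ Type) was producing across separate panels.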
