Align ggplot2 points without having them overlap? - r

Is there a way to show the points on a boxplot without having them overlap each other when they aren't unique?
Currently:
I want it to look like this (with the colours and other features):
I tried beeswarm, but I get this warning:
Warning in f(...) : The default behavior of beeswarm has changed in version 0.6.0. In versions <0.6.0, this plot would have been dodged on the y-axis. In versions >=0.6.0, grouponX=FALSE must be explicitly set to group on y-axis. Please set grouponX=TRUE/FALSE to avoid this warning and ensure proper axis choice.
even though I have geom_beeswarm(grouponY=TRUE)

You could do something like this ...
library(tidyverse)
tibble(x = "label", y = rnorm(100, 10, 10)) |>
ggplot(aes(x, y)) +
geom_jitter(width = 0.1) +
geom_boxplot(outlier.shape = NA)
Created on 2022-04-24 by the reprex package (v2.0.1)

Slight modifications to Carl's answer could be to:
Move the geom_jitter layer below the geom_boxplot layer, so that the points show through the box; or
Make the box more transparent to allow the points to be visible
tibble(x = "label", y = rnorm(100, 10, 10)) %>%
ggplot(aes(x, y)) +
geom_boxplot(outlier.shape = NA, alpha = 0.5) + # alpha = 0.5 is an assumed value; the original left it blank
geom_jitter(width = 0.1)
Alternatively, have you considered using a violin plot? It can more effectively show the density of the distribution, as the width of the plot is proportional to the proportion of data points around that level (for the y axis).
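A minimal sketch of the violin plot suggestion, assuming the same simulated data as in the answers above. The seed, the fill colour and the jitter width are illustrative choices, not part of the original answer.

```r
library(ggplot2)

set.seed(42)  # assumed seed, only so the sketch is reproducible
df <- data.frame(x = "label", y = rnorm(100, 10, 10))

p <- ggplot(df, aes(x, y)) +
  geom_violin(fill = "grey90") +          # violin width follows the density of y
  geom_jitter(width = 0.1, height = 0)    # jitter horizontally only, so y values stay exact
p
```

Setting height = 0 keeps the points at their true y values while still spreading them out sideways.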

Related

Setting scale limits with `lims` for fill

I am trying to (naively?) set the limits of a color scale, but doing so also overrides the color scale itself. What am I doing wrong, and how can it be done?
Simple example:
p <- volcano %>% reshape2::melt(varnames=c("x", "y")) %>% as_tibble() %>%
ggplot(aes(x,y, fill=value)) + geom_tile() +
scale_fill_gradientn(colours=hcl.colors(15, palette = "Purple-Green"))
p
p + lims(fill=c(50,200))
changes the whole color scale:
NB: In the real world example I want to center the color scale symmetrically around 0 with a diverging color scale and I do not want to use scale_fill_gradient2
Thanks in advance for any help!
The reason lims doesn't work here is that it adds a whole new scale object to the plot, which overrides the one you have already specified. If you look at the code for lims, it does its work by passing each of its arguments to the generic function ggplot2:::limits. In your case, this invokes ggplot2:::limits.numeric, which creates a new scale object via ggplot2:::make_scale. This function ends up just calling scale_fill_continuous.
As for why you can use lims after specifying an x or y scale without overwriting the existing one, the answer is: you can't, it does override the existing scale, and in fact warns you that it is doing so. Suppose we specify an x axis scale with lots of breaks in your example:
library(tidyverse)
p <- volcano %>%
reshape2::melt(varnames = c("x", "y")) %>%
as_tibble() %>%
ggplot(aes(x,y, fill = value)) +
geom_tile() +
scale_fill_gradientn(colours = hcl.colors(15, palette = "Purple-Green"),
limits = c(50, 200)) +
scale_x_continuous(breaks = 0:44 * 2)
p
Now look what happens if we add x axis lims to our scale:
p + lims(x = c(0, 90))
#> Scale for 'x' is already present. Adding another scale for 'x', which will
#> replace the existing scale.
We lost all our breaks, and got a warning that our x scale was being overwritten.
So the bottom line is that passing numbers to lims just adds a vanilla continuous scale for whichever aesthetic you specify. Doing + lims(fill = c(0, 10)) gives exactly the same result as + scale_fill_continuous(limits = c(0, 10)). The answer, as you have found yourself, is to specify the limits argument directly in the scale you wish to add.
Created on 2022-08-21 with reprex v2.0.2
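For the case mentioned in the question's NB (a diverging palette centred on 0 without scale_fill_gradient2), the same limits argument can be made symmetric. This is only a sketch: centring volcano on its mean is an assumption made here to get values on both sides of 0, and the names df and m are illustrative.

```r
library(ggplot2)

# Assumed data: volcano heights centred on their mean, so values straddle 0.
df <- expand.grid(x = seq_len(nrow(volcano)), y = seq_len(ncol(volcano)))
df$value <- as.vector(volcano) - mean(volcano)

# Largest absolute value, used to make the limits symmetric around 0.
m <- max(abs(df$value))

p <- ggplot(df, aes(x, y, fill = value)) +
  geom_tile() +
  scale_fill_gradientn(
    colours = hcl.colors(15, palette = "Purple-Green"),
    limits = c(-m, m)  # limits set in the scale itself, not via lims()
  )
p
```

With symmetric limits, 0 falls on the midpoint of the palette.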
Ok, now I found the solution myself....
Did not know that there is a limits keyword directly:
volcano %>% reshape2::melt(varnames=c("x", "y")) %>% as_tibble() %>%
ggplot(aes(x,y, fill=value)) + geom_tile() +
scale_fill_gradientn(colours=hcl.colors(15, palette = "Purple-Green"), limits=c(50,200))
I still don't fully understand why lims overrides the whole scale. Does this mean one cannot change the limits afterwards (as one can for x and y scales)?

annotate edge of plot without changing plot limits or setting "expand" to 0

I have a ggplot object. I want to use annotate() to add a label to the top of the plot, so that the upper edge of the label is also the upper edge of the plot. When using default settings, this doesn't seem possible: adding an annotation at the upper edge of the plot causes the upper y-limit to increase.
One can get around this problem by specifying scale_y_continuous(expand = c(0, 0)) when creating the plot. But I don't want to do that, partly because I like the y limits created by the default expand setting. Given this constraint, is it possible to use annotate() to position a label at the top of the plot?
Here is a minimal example that demonstrates the problem:
library(ggplot2)
p <- ggplot(mtcars, aes(mpg, wt)) + geom_point()
yMax <- layer_scales(p)$y$range$range[2] # upper y-limit
p + annotate("label", x = 30, y = yMax, vjust = "top", label = "X")
And here is the result:
You see that the annotation is not at the top of the plot. Instead, consistent with the default "expand" settings, the y-limit of the plot has changed.
Possible solutions:
Figure out the y-limits implied by the default expand setting. Then use scale_y_continuous() to both set the y limits and set expand = c(0, 0). This solution will give me the y limits that I want, and it will place the label appropriately. I know how to implement it, but it seems a bit cumbersome. It would also prevent other annotations at the top of the figure from changing the y-limit of the plot -- and I don't want the solution to affect annotations other than the one that I describe here.
Use annotation_custom(), which doesn't change plot limits in the same way. @baptiste suggests a solution like that in this answer to a different question. But annotation_custom() requires a grob. In practice, the annotations that I use may be more complicated than the label in this example, and I won't always know how to create them as a grob that can be passed to annotation_custom(). In addition, I've had some trouble positioning grobs with annotation_custom() while also specifying their exact sizes.
That said, I am quite open to annotation_custom()-based solutions. And perhaps there are solutions other than the two that I've sketched above.
I've read many SO posts on changing plot limits, but I haven't found any that speak to this problem.
A simple solution is to set y = Inf instead of using the maximum value found on the y-axis (yMax). The code then looks like this:
# load library
library(ggplot2)
# load data
data(mtcars)
# define plot
p <- ggplot(mtcars, aes(mpg, wt)) + geom_point()
p + annotate("label", x = 30, y = Inf, vjust = "top", label = "X")
Here is the output:
Let me know if this is what you're looking for.
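One way to check that y = Inf leaves the limits alone is to compare the trained y range before and after adding the annotation. This is a small sketch that reuses the layer_scales() accessor from the question; the variable names are illustrative.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(mpg, wt)) + geom_point()
p_inf <- p + annotate("label", x = 30, y = Inf, vjust = "top", label = "X")

# Trained y range of the plot without and with the annotation.
range_before <- layer_scales(p)$y$range$range
range_after <- layer_scales(p_inf)$y$range$range

# Infinite positions are not used when the scale is trained,
# so the two ranges can be compared directly.
isTRUE(all.equal(range_before, range_after))
```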
Does this help?
library(ggplot2)
data(mtcars)
ggplot(mtcars, aes(mpg, wt)) +
geom_point() +
geom_text(label = "X", x = 30, y = max(mtcars$wt))

Multiple Splines using ggplot2 + Different colours + Line width + Custom X-axis markings

I have two small sets of points, viz. (1,a1),...,(9,a9) and (1,b1),...,(9,b9). I'm trying to interpolate these two sets of points separately using splines with the help of ggplot2. So, what I want is two different spline curves interpolating the two sets of points on the same plot (refer to the end of this post).
Since I have very little plotting experience with ggplot2, I copied a code snippet from this answer by Richard Telford. First, I stored the Y-values for each set of points in two numeric variables A and B, and wrote the following code:
library(ggplot2)
library(plyr)
A <- c(a1,...,a9)
B <- c(b1,...,b9)
d <- data.frame(x=1:9,y=A)
d2 <- data.frame(x=1:9,y=B)
dd <- rbind(cbind(d, case = "d"), cbind(d2, case = "d2"))
ddsmooth <- plyr::ddply(dd, .(case), function(k) as.data.frame(spline(k)))
ggplot(dd,aes(x, y, group = case)) + geom_point() + geom_line(aes(x, y, group = case), data = ddsmooth)
This produces the following output :
Now, I'm looking for an almost identical plot with the following customizations:
The two spline curves should have different colours
The line width should be user's choice (Like we do in plot function)
A legend (Specifying the colour and the corresponding attribute)
Markings on the X-axis should be 1,2,3,...,9
Hoping for a detailed solution to my problem, though any kind of help is appreciated. Thanks in advance for your time and help.
You have already shaped your data correctly for the plot. It's just a case of associating the case variable with colour and size scales.
Note the following:
I have inferred the values of A and B from your plot
Since the lines are opaque, we plot them first so that the points are still visible
I have included size and colour parameters to the aes call in geom_line
I have selected the colours by passing them as a character vector to scale_colour_manual
I have also selected the sizes of the lines by calling scale_size_manual
I have set the x axis breaks by adding a call to scale_x_continuous
The legend has been added automatically according to the scales used.
ggplot(dd, aes(x, y)) +
geom_line(aes(colour = case, size = case, linetype = case), data = ddsmooth) +
geom_point(colour = "black") +
scale_colour_manual(values = c("red4", "forestgreen"), name = "Legend") +
scale_size_manual(values = c(0.8, 1.5), name = "Legend") +
scale_linetype_manual(values = 1:2, name = "Legend") +
scale_x_continuous(breaks = 1:9)
Created on 2020-07-15 by the reprex package (v0.3.0)
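Since the values of A and B are elided in the question, here is a self-contained sketch with made-up values that builds dd and ddsmooth and then applies the colour and axis scales. The numbers in A and B are purely hypothetical, and the per-case interpolation uses base R's split() and spline() in place of the plyr::ddply call.

```r
library(ggplot2)

# Hypothetical y-values; the real a1..a9 and b1..b9 are not given in the question.
A <- c(2, 4, 3, 6, 5, 7, 6, 8, 9)
B <- c(1, 3, 2, 4, 6, 5, 7, 7, 8)

dd <- rbind(
  data.frame(x = 1:9, y = A, case = "d"),
  data.frame(x = 1:9, y = B, case = "d2")
)

# Interpolate each case separately with stats::spline().
ddsmooth <- do.call(rbind, lapply(split(dd, dd$case), function(k) {
  s <- spline(k$x, k$y)
  data.frame(x = s$x, y = s$y, case = k$case[1])
}))

p <- ggplot(dd, aes(x, y)) +
  geom_line(aes(colour = case), data = ddsmooth) +  # lines first, so points stay visible
  geom_point(colour = "black") +
  scale_colour_manual(values = c("red4", "forestgreen"), name = "Legend") +
  scale_x_continuous(breaks = 1:9)
p
```

The sketch maps only colour to case to keep it short; the size and linetype scales from the answer can be added in the same way.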

Linear color gradient inside ggplot histogram columns

I am trying to reproduce this plot with ggplot2 :
From what I understand, you could call it a histogram with a linear color gradient.
I am stuck on this linear color gradient: I can't figure out how to reproduce it within each column.
I found one workaround in another post here: Trying to apply color gradient on histogram in ggplot
But it is quite an old one and does not look good with my data; it is also more of a "categorical coloring" than a "gradient coloring".
I found also this one : Plot background colour in gradient but it only applies the gradient on the plot background and not in the columns.
This could be tested using the iris dataset :
ggplot(iris, aes(x=Species, fill=Petal.Width)) +
geom_histogram(stat = "count")
Here, the Petal.Width values of each Species would be used as a coloring gradient for each column of the histogram, with a color legend as in the example plot.
Any help is welcome !
As the data is not provided, I use a toy example.
The point is to have two variables: one for colouring (grad) and another for the x-axis (x in the example). Grouping by desc(grad) makes each distinct value of grad its own stacked segment and places the higher values higher up in each bar.
library(tidyverse)
n <- 10000
grad <- runif(n, min = 0, max = 100) %>% round()
x <- sample(letters, size = n, replace = TRUE)
tibble(x, grad) %>%
ggplot(aes(x = x, group = desc(grad), fill = grad)) +
geom_bar(stat = 'count') +
scale_fill_viridis_c()
Created on 2020-05-14 by the reprex package (v0.3.0)
Or, using iris, the example is like:
library(tidyverse)
ggplot(iris, aes(x=Species, group = desc(Petal.Width), fill=Petal.Width)) +
geom_histogram(stat = "count") +
scale_fill_viridis_c()
#> Warning: Ignoring unknown parameters: binwidth, bins, pad
Created on 2020-05-14 by the reprex package (v0.3.0)
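The warning above comes from geom_histogram() receiving stat = "count". A possible variant, assuming the same aesthetics, is geom_bar(), whose default stat is already "count". This is a sketch of that substitution rather than part of the original answer.

```r
library(ggplot2)
library(dplyr)  # for desc()

p <- ggplot(iris, aes(x = Species, group = desc(Petal.Width), fill = Petal.Width)) +
  geom_bar() +               # counts rows per Species within each Petal.Width group
  scale_fill_viridis_c()
p
```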

ggrepel: Repelling text in only one direction, and returning values of repelled text

I have a dataset, where each data point has an x-value that is constrained (represents an actual instance of a quantitative variable), y-value that is arbitrary (exists simply to provide a dimension to spread out text), and a label. My datasets can be very large, and there is often text overlap, even when I try to spread the data across the y-axis as much as possible.
Hence, I am trying to use the new ggrepel. However, I am trying to keep the text labels constrained at their x-value position, while only allowing them to repel from each other in the y-direction.
As an example, the code below produces a plot for 32 data points, where the x-values show the number of cylinders in a car, and the y-values are determined randomly (they have no meaning other than to provide a second dimension for text plotting purposes). Without using ggrepel, there is significant overlap in the text:
library(ggrepel)
library(ggplot2)
set.seed(1)
data = data.frame(x=runif(100, 1, 10),y=runif(100, 1, 10),label=paste0("label",seq(1:100)))
origPlot <- ggplot(data) +
geom_point(aes(x, y), color = 'red') +
geom_text(aes(x, y, label = label)) +
theme_classic(base_size = 16)
I can remedy the text overlap using ggrepel, as shown below. However, this changes not only the y-values, but also the x-values. I am trying to avoid changing the x-values, as they represent an actual physical meaning (the number of cylinders):
repelPlot <- ggplot(data) +
geom_point(aes(x, y), color = 'red') +
geom_text_repel(aes(x, y, label = label)) +
theme_classic(base_size = 16)
As a note, the reason I cannot allow the x-value of the text to change is that I am only plotting the text (not the points). By contrast, most examples in ggrepel keep the position of the points (so that their values remain true) and only repel the x and y values of the labels. The points are then connected to the labels with segments (you can see that in my second plot example).
I kept the points in the two examples above for demonstration purposes. However, I am only retaining the text (and hence will be removing the points and the segments), leaving me with something like this:
repelPlot2 <- ggplot(data) + geom_text_repel(aes(x, y, label = label), segment.size = 0) + theme_classic(base_size = 16)
My question is twofold:
1) Is it possible for me to repel the text labels only in the y-direction?
2) Is it possible for me to obtain a structure containing the new (repelled) y-values of the text?
Thank you for any advice!
ggrepel version 0.6.8 (install from GitHub using devtools::install_github) now supports a "direction" argument, which enables repelling of labels only in the "x" or "y" direction.
repelPlot2 <- ggplot(data) + geom_text_repel(aes(x, y, label = label), segment.size = 0, direction = "y") + theme_classic(base_size = 16)
Getting the y values is harder -- one approach can be to use the "repel_boxes" function from ggrepel first to get repelled values and then input those into ggplot with geom_text. For discussion and sample code of that approach, see https://github.com/slowkow/ggrepel/issues/24. Note that if using the latest version, the repel_boxes function now also has a "direction" argument, which takes in "both","x", or "y".
I don't think it is possible to repel text labels only in one direction with ggrepel.
I would approach this problem differently, by instead generating the arbitrary y-axis positions manually. For example, for the data set in your example, you could do this using the code below.
I have used the dplyr package to group the data set by the values of x, and then created a new column of data y containing the row numbers within each group. The row numbers are then used as the values for the y-axis.
library(ggplot2)
library(dplyr)
data <- data.frame(x = mtcars$cyl, label = paste0("label", seq(1:32)))
data <- data %>%
group_by(x) %>%
mutate(y = row_number())
ggplot(data, aes(x = x, y = y, label = label)) +
geom_text(size = 2) +
xlim(3.5, 8.5) +
theme_classic(base_size = 8)
ggsave("filename.png", width = 4, height = 2)
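To see what the group_by() and row_number() step produces before plotting, the stacked positions can be inspected directly. This is a sketch using the same mtcars-based data; the name stacked is illustrative.

```r
library(dplyr)

data <- data.frame(x = mtcars$cyl, label = paste0("label", 1:32))

# Within each x value, labels get y = 1, 2, 3, ... in row order.
stacked <- data %>%
  group_by(x) %>%
  mutate(y = row_number()) %>%
  ungroup()

# The tallest y in each group equals the number of labels sharing that x.
stacked %>%
  group_by(x) %>%
  summarise(n_labels = n(), top_y = max(y))
```

Because y is assigned by counting within each x, no two labels with the same x can share a position, so no repelling is needed.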
