ggplot - extract the values associated with breaks in a continuous colour scale - r

I would like to extract the breaks and the colour values associated with a ggplot continuous colour scale. There are multiple answers to finding the colour associated with each data point (like this), which can also be used to get discrete scale values, but I haven't seen an approach for a continuous colour scale. I don't want to force the scales, just retrieve the values that ggplot generates.
example:
library(ggplot2)
df <- data.frame(x = 1:10, y = 1:10, col = 11:20)
ggplot(df) +
geom_point(aes(x = x, y = y, colour = col))
I would like to get a data frame showing breaks (12.5, 15, 17.5, 20) and the colour values associated with them.
Many thanks!

There are two ways of doing this: one that builds the plot and one that does not.
If we build the plot:
library(ggplot2)
df <- data.frame(x = 1:10, y = 1:10, col = 11:20)
ggplot(df) +
geom_point(aes(x = x, y = y, colour = col))
We can extract the scale and use it to retrieve the relevant information.
# Using build plot
build <- ggplot_build(last_plot())
scale <- build$plot$scales$get_scales("colour")
breaks <- scale$get_breaks()
colours <- scale$map(breaks)
data.frame(breaks = breaks, colours = colours)
#> breaks colours
#> 1 NA grey50
#> 2 12.5 #1D3F5E
#> 3 15.0 #2F638E
#> 4 17.5 #4289C1
#> 5 20.0 #56B1F7
Alternatively, we can skip building the plot and use the scale itself directly, provided we 'train' the scale by showing it the limits of the data.
scale <- scale_colour_continuous()
scale$train(range(df$col))
breaks <- scale$get_breaks()
colours <- scale$map(breaks)
data.frame(breaks = breaks, colours = colours)
As the first row of the output shows, the default breaks algorithm produces an out-of-bounds break, which is reported as NA and mapped to the scale's NA colour (grey50). If you want to use the information later on, it is a good idea to filter those rows out.
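A minimal sketch of that filtering step, reusing the train-then-map approach from the answer. It assumes `scale_colour_gradient()` as an explicit stand-in for the default continuous colour scale, and it drops any break that is NA or lies outside the trained limits.

```r
library(ggplot2)

df <- data.frame(x = 1:10, y = 1:10, col = 11:20)

# Assumption: scale_colour_gradient() stands in for the default continuous scale.
scale <- scale_colour_gradient()
scale$train(range(df$col))

breaks <- scale$get_breaks()
key <- data.frame(breaks = breaks, colours = scale$map(breaks))

# Keep only breaks that lie inside the trained limits (this drops the NA row).
lims <- scale$get_limits()
in_bounds <- !is.na(key$breaks) & key$breaks >= lims[1] & key$breaks <= lims[2]
key <- key[in_bounds, ]
key
```

The resulting `key` data frame can then be joined to other data or used to draw a custom legend.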

Related

How to fix overlapping hexagons with geom_hex() and ggplot()?

I am plotting some data: I have coordinates and a value for each point. I want to make a hexagon map, with each hexagon averaging all the point values that fall inside it.
I managed to produce the map, but some of the hexagons are overlapping and I am not sure how to fix it.
Here is my code:
pp = ggplot(df, aes(x = lon, y = lat, fill=value, group=value)) +
geom_hex(bins = 50, linewidth = 10)
pp
And the plot:
If you want the hexagons to be colored according to the average value, you will need stat_summary_hex, passing the numeric value to the z aesthetic, which by default is averaged in each hex bin.
Don't group by value - this effectively creates a layer of hexbins for each value, and this is what leads to the bins being in different positions in each group. Also, the values can't be averaged if they are in different groups.
library(ggplot2)
ggplot(df, aes(x = lon, y = lat)) +
stat_summary_hex(aes(z = as.numeric(as.character(value))),
bins = 50, linewidth = 10) +
scale_fill_gradientn(colors = scales::hue_pal()(5))
Note that the CRAN version of ggplot2 that was current at the time of writing has an issue with hex-binning, and you will need to install the development version to get a decent result here. See this question for further details.
Created on 2023-01-04 with reprex v2.0.2
Data used
set.seed(1)
df <- data.frame(lon = rnorm(1000, 5.5), lat = rnorm(1000, 52.5),
value = factor(sample(0:4, 1000, TRUE)))
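The `as.numeric(as.character(value))` conversion in the answer matters because `value` is a factor in this data. A small base R illustration, using a made-up factor rather than the data above:

```r
value <- factor(c(0, 4, 2, 4))

# as.numeric() on a factor returns the internal level codes, not the labels.
codes <- as.numeric(value)
codes
#> [1] 1 3 2 3

# Going through as.character() first recovers the original numbers.
numbers <- as.numeric(as.character(value))
numbers
#> [1] 0 4 2 4
```

Averaging the level codes instead of the original numbers would silently shift every hexagon's mean.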

Create standard color scale for several graphs

I am trying to create a custom color scale for several graphs. I would like it to be a standard color scheme so that the two graphs can be compared. The data for the first graph has a much smaller range (its maximum is just a bit above 3) while the other one goes to 9. Therefore, I need colors to match numbers 4-9 but do not want them to appear in the first graph. However, they always do and I do not understand why.
Here is the data for the first graph:
df <- data.frame(
x = runif(100),
y = runif(100),
z1 = rnorm(100),
z2 = abs(rnorm(100))
)
And here is the graph, with the custom color scale. However, as you can see all the colors appear in the graph even though only the first 5 colors should show up.
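ggplot(df, aes(x, y)) +
  geom_point(aes(colour = z2)) +
  scale_colour_gradientn(colours = c('springgreen1', 'springgreen4', 'yellowgreen', 'yellow2',
                                     'lightsalmon', 'orange', 'orange3', 'orange4', 'navajowhite3', 'white'),
                         breaks = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9))
By default the colour gradient is stretched across the range of the data, so all ten colours are squeezed into the values that are present. The limits argument of scale_colour_gradientn fixes the range that the colours span: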
ggplot(df, aes(x, y)) +
geom_point(aes(colour = z2))+
scale_colour_gradientn(colours = c('springgreen1', 'springgreen4', 'yellowgreen','yellow2',
'lightsalmon','orange','orange3','orange4','navajowhite3','white'),
breaks=c(0,1,2,3,4,5,6,7,8,9),
limits = c(0,9)) +
theme(legend.key.height = unit(1.5, "cm"))
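To keep two or more graphs comparable, one option is to wrap the scale in a small function and add it to every plot. The sketch below assumes a hypothetical helper name (`shared_colour_scale`) and made-up data frames (`small`, `large`); only the colours, breaks, and limits come from the answer above.

```r
library(ggplot2)

# Hypothetical helper: returns the same fixed-limits colour scale on every call.
shared_colour_scale <- function() {
  scale_colour_gradientn(
    colours = c("springgreen1", "springgreen4", "yellowgreen", "yellow2",
                "lightsalmon", "orange", "orange3", "orange4",
                "navajowhite3", "white"),
    breaks = 0:9,
    limits = c(0, 9)
  )
}

# Made-up data: one variable stays below 3, the other spans the full 0-9 range.
set.seed(1)
small <- data.frame(x = runif(100), y = runif(100), z = runif(100, 0, 3))
large <- data.frame(x = runif(100), y = runif(100), z = runif(100, 0, 9))

p_small <- ggplot(small, aes(x, y, colour = z)) + geom_point() + shared_colour_scale()
p_large <- ggplot(large, aes(x, y, colour = z)) + geom_point() + shared_colour_scale()
```

Because both plots share the same limits, a given value is drawn in the same colour in each of them, and the first plot only uses the lower part of the gradient.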

DBSCAN clustering plotting through ggplot2

I am trying to plot the dbscan clustering result through ggplot2. If I understand correctly, dbscan currently plots noise in black with the base plot function. Some code first:
library(dbscan)
n <- 100
x <- cbind(
x = runif(5, 0, 10) + rnorm(n, sd = 0.2),
y = runif(5, 0, 10) + rnorm(n, sd = 0.2)
)
plot(x)
kNNdistplot(x, k = 5)
abline(h=.25, col = "red", lty=2)
res <- dbscan::dbscan(x, eps = .25, minPts = 4)
plot(res, x, main = "DBSCAN")
x <- data.frame(x)
# the `+` must end each line, otherwise R stops parsing after the first line
ggplot(x, aes(x = x, y = y)) +
  geom_point(color = res$cluster + 1, pch = clusym[res$cluster + 1]) +
  theme_grey() + ggtitle("(c)") + labs(x = "x", y = "y")
I want to do two things differently here. First, I am trying to plot the clustering output through ggplot(). The difficulty is that if I use res$cluster to plot the points, plot() will ignore points with 0 labels (which are noise points), whereas ggplot() will throw an error because the length of res$cluster will be smaller than the actual data to plot; and if I try res$cluster+1, it will assign 1 to the noise points, which I don't want. Second, if possible, I would like to do something like what clusym[] in the fpc package does: it plots clusters with labels 1, 2, 3, ... and ignores 0 labels. That's fine if my labels for noise points are still 0 and I can then give the noise points a specific symbol, say "*", with a specific colour, say grey. I have seen a Stack Overflow post which tries to do a similar thing for convex hull plotting, but I still couldn't figure out how to do this if I don't want to draw the hull and want a cluster number for each cluster.
One possibility I thought of was to first plot the points without noise and then add the noise points with the desired colour and symbols to the original plot.
But since the length of res$cluster is not equal to that of x, it throws an error.
ggplot(x, aes(x = x, y = y)) +
  geom_point(color = res$cluster + 1, pch = clusym[res$cluster + 1]) +
  theme_grey() + ggtitle("(c)") + labs(x = "x", y = "y")  # + a layer adding the noise points
Error: Aesthetics must be either length 1 or the same as the data (100): shape, colour
You should first take the cluster assignments from the DBSCAN output (res$cluster), tack them onto your original data as a new column (i.e. cluster), and convert that column to a factor.
When you make the ggplot, you can then map color or shape to cluster. As for ignoring the noise points, I would do it as follows.
# x is the data frame from the question; the cluster column is still numeric here
data <- data.frame(x, cluster = res$cluster)
# drop the noise points (cluster 0)
data2 <- dplyr::filter(data, cluster > 0)
data2$cluster <- as.factor(data2$cluster)
ggplot(data2, aes(x = x, y = y)) +
  geom_point(aes(color = cluster))
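If the noise points should stay visible rather than be dropped, a second layer can draw them separately. The following is only a sketch under stated assumptions: the `cluster` vector is a hand-made stand-in for `res$cluster` (0 marks noise), the data are made up, and `geom_text()` is used as a swapped-in way to print cluster numbers in the spirit of `clusym`, not as a reproduction of it.

```r
library(ggplot2)

set.seed(42)
pts <- data.frame(
  x = c(rnorm(10, 2, 0.2), rnorm(10, 6, 0.2), runif(5, 0, 10)),
  y = c(rnorm(10, 2, 0.2), rnorm(10, 6, 0.2), runif(5, 0, 10)),
  cluster = c(rep(1:2, each = 10), rep(0, 5))  # stand-in for res$cluster
)

# Split clustered points from noise so each gets its own layer.
clustered <- pts[pts$cluster > 0, ]
clustered$cluster <- factor(clustered$cluster)
noise <- pts[pts$cluster == 0, ]

p <- ggplot(mapping = aes(x = x, y = y)) +
  # Noise points: grey asterisks (shape 8), not part of any colour scale.
  geom_point(data = noise, colour = "grey60", shape = 8) +
  # Clustered points: drawn as their cluster number, coloured by cluster.
  geom_text(data = clustered, aes(label = cluster, colour = cluster),
            show.legend = FALSE)
p
```

With real output, `pts$cluster` would be replaced by `res$cluster`, which has one label per row of the input data.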

R - ggplot geom_dotplot shape option

I want to use geom_dotplot to distinguish two different variables by shape of the dots (rather than colours as the documentation suggests). For example:
library(ggplot2)
set.seed(1)
x = rnorm(20)
y = rnorm(20)
df = data.frame(x,y)
ggplot(data = df) +
geom_dotplot(aes(x = x), fill = "red") +
geom_dotplot(aes(x=y), fill = "blue")
i.e. to distinguish the x and y in the below example
I want to set all the x to be dots, and y to be triangles.
Is this possible?
Thanks!
You could probably hack together something similar to what you want using the information from geom_dotplot plus base R's stripchart function.
#Save the dot plot in an object.
dotplot <- ggplot(data = df) +
geom_dotplot(aes(x = x), fill = "red") +
geom_dotplot(aes(x=y), fill = "blue")
#Use ggplot_build to save information including the x values.
dotplot_ggbuild <- ggplot_build(dotplot)
main_info_from_ggbuild_x <- dotplot_ggbuild$data[[1]]
main_info_from_ggbuild_y <- dotplot_ggbuild$data[[2]]
#Include only the first occurrence of each x value.
main_info_from_ggbuild_x <-
  main_info_from_ggbuild_x[!duplicated(main_info_from_ggbuild_x$x), ]
main_info_from_ggbuild_y <-
  main_info_from_ggbuild_y[!duplicated(main_info_from_ggbuild_y$x), ]
#To demonstrate, let's first roughly reproduce the original plot.
stripchart(rep(main_info_from_ggbuild_x$x,
times=main_info_from_ggbuild_x$count),
pch=19,cex=2,method="stack",at=0,col="red")
stripchart(rep(main_info_from_ggbuild_y$x,
times=main_info_from_ggbuild_y$count),
pch=19,cex=2,method="stack",at=0,col="blue",add=TRUE)
#Now, redo using what we actually want.
#You didn't specify if you want the circles and triangles filled or not.
#If you want them filled in, just change the pch values.
stripchart(rep(main_info_from_ggbuild_x$x,
times=main_info_from_ggbuild_x$count),
pch=21,cex=2,method="stack",at=0)
stripchart(rep(main_info_from_ggbuild_y$x,
times=main_info_from_ggbuild_y$count),
pch=24,cex=2,method="stack",at=0,add=TRUE)
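The `stripchart` calls above rely on `rep(x, times = count)` to turn each binned position back into one value per dot, which `method="stack"` then piles up. A tiny illustration with a hypothetical `bins` data frame (not the actual `ggplot_build` output):

```r
# Hypothetical binned output: three bin centres and the number of dots in each.
bins <- data.frame(x = c(-1, 0, 1), count = c(2, 1, 3))

# Repeat each bin centre once per observation it contains.
expanded <- rep(bins$x, times = bins$count)
expanded
#> [1] -1 -1  0  1  1  1
```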

R boxplot and stripchart side-by-side in 1 figure

Is it possible to plot a boxplot and a stripchart next to each other in the same figure? If I run this code, the stripchart is drawn over the boxplots. What I actually want is for them to lie next to each other, so that a figure with 10 columns on the x-axis is formed. Is that possible?
boxplot(doubles[1:5,])
stripchart(doubles[6:10,],add=TRUE,vertical=TRUE, pch=19)
An example of your data would be good, but the easiest option is probably:
#random data corresponding to your 5 columns
x <- data.frame(V = rnorm(100), W = rnorm(100), X = rnorm(100), Y = rnorm(100),
Z = rnorm(100))
#remove axes with 'axes=FALSE', define wider x-limits with 'xlim'
stripchart(x[1:5,], vertical=TRUE, pch=19, xlim=c(1,6), axes=FALSE)
#add boxplots next to stripchart, decrease width with 'boxwex'
boxplot(x[1:5,], add=TRUE, at=1.5:5.5, boxwex=0.25, axes=FALSE)
#add custom x axis
axis(1,at=1.25:5.25,labels=names(x))
Use ggplot2:
library(ggplot2)
# qplot() is deprecated in recent ggplot2 releases, so ggplot() is used here
ggplot(OrchardSprays, aes(x = treatment, y = decrease)) +
  geom_boxplot() +
  geom_point(colour = 'blue', alpha = 0.5) +
  scale_y_log10()
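To place the points beside the boxes rather than on top of them, one possible approach is to shift the two layers in opposite directions with `position_nudge()`. This is a sketch, and the offsets and box width are arbitrary choices, not values from the answer.

```r
library(ggplot2)

# OrchardSprays ships with base R's datasets package.
p <- ggplot(OrchardSprays, aes(x = treatment, y = decrease)) +
  # Narrow boxplots shifted slightly to the left of each category.
  geom_boxplot(width = 0.3, position = position_nudge(x = -0.2)) +
  # Raw points shifted to the right so they sit beside the boxes.
  geom_point(colour = "blue", alpha = 0.5, position = position_nudge(x = 0.2)) +
  scale_y_log10()
p
```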
