Varying gradient using ggplot2 in R - r

I am trying to create a plot where the color gradient changes along both the x and y axes. More specifically, I am trying to set up the gradients so that the hue range changes along the x axis and the value changes along the y axis.
For an example I am working with a sine curve with some noise along -pi to pi.
set.seed(5678)
x <- seq(-1*pi, 1*pi, 0.01)
y <- sin(x) + rnorm(length(x))
df <- cbind.data.frame(x, y)
ggplot(df, aes(x=x, y=y)) + geom_line()
Now I want to colorize the line so that the hue progresses from red-orange to orange-yellow to yellow-green, etc. along the x axis and then will take on different values in that range depending on its y value. So at x=-pi, y=2 might be red and y=-2 might be yellow while at x=0, y=2 might be green and y=-2 might be blue.
Has anyone tried to create a graph like this?

Here's an option for doing it using a hue calculated from x and y:
df$hue <- pmax(pmin((df$x + pi)/pi/3 + (2 - df$y) / 12, 1), 0)
ggplot(df, aes(x=x, y=y, group = 1, colour = hsv(hue, 1, 1))) + geom_path() +
scale_colour_identity()
Note that because the line segments are quite long vertically, the effect isn't fully visible. Here's a version using approx to interpolate:
adf <- as.data.frame(approx(df, xout = seq(-pi, max(df$x), 0.001)))
adf$hue <- pmax(pmin((adf$x + pi)/pi/3 + (2 - adf$y) / 12, 1), 0)
ggplot(adf, aes(x=x, y=y, group = 1, colour = hsv(hue, 1, 1))) + geom_path() +
scale_colour_identity()
In both cases, it's the hue that's dependent on both x and y, with value held constant. That fits your proposed example, if not your original description. Clearly it could be tailored to vary hue and value separately. It's also worth noting that there needs to be a group set. Otherwise ggplot2 tries to join together all the points of the same colour.
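As a rough sketch of that tailoring (not part of the answer above), hue could be driven by x alone and value by y alone. The column names `val` and `col`, the 0.8 hue cap (to avoid wrapping back to red), and the 0.3 floor on value are all arbitrary assumptions chosen for illustration.

```r
library(ggplot2)

set.seed(5678)
x <- seq(-pi, pi, 0.01)
y <- sin(x) + rnorm(length(x))
df <- data.frame(x, y)

# Hue depends only on x: rescale x to [0, 0.8] (assumed cap, stops short of wrapping to red)
df$hue <- 0.8 * (df$x - min(df$x)) / diff(range(df$x))
# Value depends only on y: rescale y to [0.3, 1] (assumed floor keeps low points visible)
df$val <- 0.3 + 0.7 * (df$y - min(df$y)) / diff(range(df$y))
# Build the final colour strings once, then let ggplot2 use them as-is
df$col <- hsv(df$hue, 1, df$val)

p <- ggplot(df, aes(x = x, y = y, group = 1, colour = col)) +
  geom_path() +
  scale_colour_identity()
print(p)
```

As with the versions above, `group = 1` keeps the path in one piece, and interpolating with approx first should make the vertical gradient easier to see.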

Related

R - creating a bar and line on same chart, how to add a second y axis

I'm trying to create a ggplot2 graph showing a bar graph and a line graph overlaying each other. In excel this would be done by adding a second axis.
The x axis represents product type, the y values of the bar graph should represent revenue, and the line graph I want to represent profit margin as a percentage. The value of the line graph and the bar chart should be independent of each other, i.e. there is no such relationship.
require(ggplot2)
df <- data.frame(x = c(1:5), y = abs(rnorm(5)*100))
df$y2 <- abs(rnorm(5))
ggplot(df, mapping= aes(x=as.factor(`x`), y = `y`)) +
geom_col(aes(x=as.factor(`x`), y = `y`),fill = 'blue')+
geom_line(mapping= aes(x=as.factor(`x`), y = `y`),group=1) +
geom_label(aes(label= round(y2,2))) +
scale_y_continuous() +
theme_bw() +
theme(axis.text.x = element_text(angle = 20,hjust=1))
The code above produces almost what I want. However, the scaling is incorrect: I would need the 1.38 and 0.23 values to be ordered by magnitude, i.e. the point 0.23 should show below 1.38. I am also not sure how to add another axis on the right-hand side.
Starting with version 2.2.0 of ggplot2, it is possible to add a secondary axis - see this detailed demo. Also, some already answered questions with this approach: here, here, here or here. An interesting discussion about adding a second OY axis here.
The main idea is that one needs to apply a transformation for the second OY axis. In the example below, the transformation factor is the ratio between the max values of each OY axis.
# Prepare data
library(ggplot2)
set.seed(2018)
df <- data.frame(x = c(1:5), y = abs(rnorm(5)*100))
df$y2 <- abs(rnorm(5))
# The transformation factor
transf_fact <- max(df$y)/max(df$y2)
# Plot
ggplot(data = df,
mapping = aes(x = as.factor(x),
y = y)) +
geom_col(fill = 'blue') +
# Apply the factor on values appearing on second OY axis
geom_line(aes(y = transf_fact * y2), group = 1) +
# Add second OY axis; note the transformation back (division)
scale_y_continuous(sec.axis = sec_axis(trans = ~ . / transf_fact,
name = "Second axis")) +
geom_label(aes(y = transf_fact * y2,
label = round(y2, 2))) +
theme_bw() +
theme(axis.text.x = element_text(angle = 20, hjust = 1))
But if you have a particular wish for the one-to-one transformation, like, say value 100 from Y1 should correspond to value 1 from Y2 (200 to 2 and so on), then change the transformation (multiplication) factor to 100 (100/1): transf_fact <- 100/1 and you get this:
The advantage of transf_fact <- max(df$y)/max(df$y2) is that it uses the plotting area optimally when the two scales differ. Try something like transf_fact <- 1000/1 and I think you'll get the idea.
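The arithmetic behind the factor can be checked without plotting anything. The sketch below uses made-up numbers (the vectors `y` and `y2` are hypothetical, not the data from the answer) to show that multiplying by the factor lifts the second series to the height of the first, and that dividing by it, as the secondary axis does, recovers the original values.

```r
# Hypothetical values for the two series (assumptions, for illustration only)
y  <- c(120, 45, 80)    # plotted against the primary axis
y2 <- c(1.4, 0.2, 0.9)  # meant to be read off the secondary axis

# Ratio of the maxima, as in the answer
transf_fact <- max(y) / max(y2)

# What geom_line() actually draws: y2 stretched to the primary scale
scaled <- y2 * transf_fact

# What the secondary axis labels show: the stretch undone
back <- scaled / transf_fact

# The tallest point of the second series lines up with the tallest bar
print(max(scaled))
print(back)
```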

DBSCAN clustering plotting through ggplot2

I am trying to plot the dbscan clustering result through ggplot2. If I understand it correctly the current dbscan plots noise in black colour with base plot function. Some code first,
library(dbscan)
n <- 100
x <- cbind(
x = runif(5, 0, 10) + rnorm(n, sd = 0.2),
y = runif(5, 0, 10) + rnorm(n, sd = 0.2)
)
plot(x)
kNNdistplot(x, k = 5)
abline(h=.25, col = "red", lty=2)
res <- dbscan::dbscan(x, eps = .25, minPts = 4)
plot(res, x, main = "DBSCAN")
x <- data.frame(x)
ggplot(x, aes(x = x, y=y)) + geom_point(color = res$cluster+1, pch = clusym[res$cluster+1]) +
theme_grey() + ggtitle("(c)") + labs(x ="x", y = "y")
I want to do two things differently here. First, I am trying to plot the clustering output through ggplot(). The difficulty is that if I use res$cluster to plot points, plot() will ignore points with 0 labels (which are noise points), and ggplot() will throw an error because the length of res$cluster will be smaller than the actual data to plot; if I try to use res$cluster+1, it will give 1 to noise points, which I don't want. Second, if possible, I would like to do something like what clusym[] in package fpc does: it plots clusters with labels 1, 2, 3, ... and ignores 0 labels. That's fine if my labels for noise points are still 0 and I can then give a specific symbol, say "*", to noise points with a specific colour, let's say grey. I have seen a Stack Overflow post which tries to do a similar thing for convex hull plotting, but I still couldn't figure out how to do this if I don't want to draw the hull and want a cluster number for each cluster.
One possibility I thought of was to first plot the points without noise and then add the noise points with the desired colour and symbol to the original plot.
But since the length of res$cluster is not equal to that of x, it is throwing an error.
ggplot(x, aes(x = x, y=y)) + geom_point(color = res$cluster+1, pch = clusym[res$cluster+1])
+ theme_grey() + ggtitle("(c)") + labs(x ="x", y = "y") + adding noise points
Error: Aesthetics must be either length 1 or the same as the data (100): shape, colour
You should first subset the third column from the output of DBSCAN, tack that onto your original data as a new column (i.e. as cluster), and assign that as a factor.
When you make the ggplot, you can assign color or shape to cluster. As for ignoring the noise points, I would do it as follows.
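library(ggplot2)
# Original data plus the cluster column (still in numeric form);
# x and res are the objects from the question
data <- data.frame(x, cluster = res$cluster)
# Drop the noise points, which DBSCAN labels 0
data2 <- dplyr::filter(data, cluster > 0)
data2$cluster <- as.factor(data2$cluster)
ggplot(data2, aes(x = x, y = y)) +
geom_point(aes(color = cluster))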
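If the noise points should stay on the plot with their own symbol, as the question asks, one possible sketch is to draw two point layers: clustered points coloured by cluster, and noise points in grey with an asterisk-like shape. This is an illustration rather than the answerer's method; the names `pts` and `noise`, the seed, and the choice of shape 8 and "grey50" are assumptions.

```r
library(ggplot2)
library(dbscan)

set.seed(42)  # assumed seed, only for reproducibility
n <- 100
pts <- data.frame(
  x = runif(5, 0, 10) + rnorm(n, sd = 0.2),
  y = runif(5, 0, 10) + rnorm(n, sd = 0.2)
)

# Cluster on the coordinates only; label 0 marks noise
res <- dbscan(as.matrix(pts), eps = 0.25, minPts = 4)
pts$cluster <- factor(res$cluster)
pts$noise <- res$cluster == 0

p <- ggplot(pts, aes(x = x, y = y)) +
  # Clustered points, coloured by cluster label
  geom_point(data = subset(pts, !noise), aes(colour = cluster)) +
  # Noise points: fixed grey colour and shape 8 (an asterisk-like symbol)
  geom_point(data = subset(pts, noise), colour = "grey50", shape = 8) +
  labs(x = "x", y = "y")
print(p)
```

Because the cluster vector is attached to the data frame before plotting, every aesthetic has the same length as the data in its layer, which avoids the length error from the question.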

ggplot2 z clipping: remove unnecessary points in overlapping stacks

Consider the following example of plotting 100 overlapping points:
ggplot(data.frame(x=rnorm(100), y=rnorm(100)), aes(x=x, y=y)) +
geom_point(size=100) +
xlim(-10, 10) +
ylim(-10, 10)
I now want to save the image as vector graphics, e.g. in PDF. This is not a problem with the above example, but once I've got over a million points (e.g. from a volcano plot), the file size can exceed 100 MB for one page and it takes ages to display or edit.
In the above example, the same shape could still be represented by either
converting the points to a shape outline, or
keeping a couple of points and discarding the rest.
Is there any way (or preferably tool that already does this) to remove points from a plot that will never be visible? (ideally supporting transparency)
The best approach I have heard so far is to round the position of the dots and remove grid points that have > N points, then use the original positions of the remaining ones. Is there anything better?
Note that this should work with an arbitrary structure of points, and only remove those that are not visible.
You could do something with the convex hull, like this, filling in the polygon that makes up the convex hull:
library(ggplot2)
set.seed(123)
df <- data.frame(x = rnorm(100), y = rnorm(100))
idx <- chull(df)
ggplot(df, aes(x = x, y = y)) +
geom_point(size = 100,color="darkgrey") +
geom_polygon(data=df[idx,],color="blue") +
geom_point(size = 1, color = "red") +
xlim(-10, 10) +
ylim(-10, 10)
yielding:
(Note that I pulled this chull-idea out of Hadley's "Extending ggplot2" guide https://cran.r-project.org/web/packages/ggplot2/vignettes/extending-ggplot2.html.)
In your case you would drop the geom_point calls and set transparency on the geom_polygon. Also not sure how fast chull is for millions of points, though it will clearly be faster than plotting them all.
And I am not really sure what you are after. If you really want the 100 pixel radius, then you could probably just do it for the points on the convex hull, plus fill in the middle with geom_polygon.
So using this code:
ggplot(df[idx,], aes(x = x, y = y)) +
geom_point(size = 100, color = "black") +
geom_polygon(fill = "black") +
xlim(-10, 10) +
ylim(-10, 10)
to make this:
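The grid-rounding idea mentioned in the question could be sketched roughly as follows. This is a variant that caps the number of points kept per grid cell rather than testing true visibility, so it ignores draw order and transparency; the cell size `cell` and the cap `max_per_cell` are assumed tuning values that would need to be matched to the point size on the page.

```r
set.seed(1)
df <- data.frame(x = rnorm(1e5), y = rnorm(1e5))

cell <- 0.05        # grid cell width in data units (assumption)
max_per_cell <- 3   # how many points to keep per cell (assumption)

# Assign each point to a grid cell by rounding its scaled coordinates
key <- paste(round(df$x / cell), round(df$y / cell))

# Running index of each point within its own cell (1, 2, 3, ...)
rank_in_cell <- ave(seq_along(key), key, FUN = seq_along)

# Keep the first few points per cell, at their original positions
thinned <- df[rank_in_cell <= max_per_cell, ]

print(nrow(df))
print(nrow(thinned))
```

Sparse regions are left untouched, since cells with few points never hit the cap, while dense regions shrink to a fixed number of points per cell.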

Combine guides for continuous fill (color) and alpha scales

In ggplot2, I am making a geom_tile plot where both color and alpha vary with the same variable. I would like to make a single guide that shows the colors the way they appear on the plot instead of two separate guides.
library(ggplot2)
x <- seq(-10,10,0.1)
data <- expand.grid(x=x,y=x)
data$z <- with(data,y^2 * dnorm(sqrt(x^2 + y^2), 0, 3))
p <- ggplot(data) + geom_tile(aes(x=x,y=y, fill = z, alpha = z))
p <- p + scale_fill_continuous(low="blue", high="red") + scale_alpha_continuous(range=c(0.2,1.0))
plot(p)
This produces a figure with two guides: one for color and one for alpha. I would like to have just one guide on which both color and alpha vary together the way they do in the figure (so as the color shifts to blue, it fades out).
For this figure, I could achieve a similar effect by varying the saturation instead of alpha, but the real project in which I am using this, I will be overlaying this layer on top of a map, and want to vary alpha so the map is more clearly visible for smaller values of the z-variable.
I don't think you can combine continuous scales into one legend, but you can combine discrete scales. For example:
# Create discrete version of z
data$z.cut = cut(data$z, seq(min(data$z), max(data$z), length.out=10), include.lowest=TRUE)
ggplot(data) +
geom_tile(aes(x=x, y=y, fill=z.cut, alpha=z.cut)) +
scale_fill_hue(h=c(-60, -120), c=100, l=50) +
scale_alpha_discrete(range=c(0.2,1))
You can of course cut z at different, perhaps more convenient, values and change scale_fill_hue to whatever color scale you prefer.
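For instance, a sketch with rounder break points might use pretty() to pick the cut values and a viridis palette for the fill. Both choices are assumptions made for illustration, not part of the answer; `brks` is a hypothetical name.

```r
library(ggplot2)

x <- seq(-10, 10, 0.1)
data <- expand.grid(x = x, y = x)
data$z <- with(data, y^2 * dnorm(sqrt(x^2 + y^2), 0, 3))

# Round break points that cover the full range of z
brks <- pretty(data$z, n = 8)
# include.lowest = TRUE keeps the minimum value inside the first bin
data$z.cut <- cut(data$z, brks, include.lowest = TRUE)

p <- ggplot(data) +
  geom_tile(aes(x = x, y = y, fill = z.cut, alpha = z.cut)) +
  scale_fill_viridis_d() +
  scale_alpha_discrete(range = c(0.2, 1))
print(p)
```

Since fill and alpha map the same discrete variable and keep the same default title and labels, ggplot2 should merge them into one legend; scale_alpha_discrete() may emit a warning that alpha on a discrete variable is not advised.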

How to conditionally highlight points in ggplot2 facet plots - mapping color to column

In the following example I create two series of points and plot them using ggplot2. I also highlight several points based on their values
library(ggplot2)
x <- seq(0, 6, .5)
y.a <- .1 * x -.1
y.b <- sin(x)
df <- data.frame(x=x, y=y.a, case='a')
df <- rbind(df, data.frame(x=x, y=y.b, case='b'))
print(ggplot(df) + geom_point(aes(x, y), color=ifelse(df$y<0, 'red', 'black')))
And here is the result
Now I want to separate the two cases into two facets, keeping the highlighting scheme
> print(ggplot(df) + geom_point(aes(x, y), color=ifelse(df$y<0, 'red', 'black')) + facet_grid(case ~. ,))
Error: Incompatible lengths for set aesthetics: colour
How can this be achieved?
You should put color=ifelse(y<0, 'red', 'black') inside aes(), so the color is set according to the y values in each facet independently. If color is set outside aes() as a vector, the same full-length vector is used in both facets, and you get an error because the color vector is longer than the number of data points in a facet.
Then you should add scale_color_identity() to ensure that color names are interpreted directly.
ggplot(df) + geom_point(aes(x, y, color=ifelse(y<0, 'red', 'black'))) +
facet_grid(case ~. ,)+scale_color_identity()
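A different route, sketched here as an alternative rather than as the method above, is to map the condition itself (y < 0) to colour and choose the two colours with scale_colour_manual(). Hiding the legend with guide = "none" is an optional assumption.

```r
library(ggplot2)

x <- seq(0, 6, .5)
y.a <- .1 * x - .1
y.b <- sin(x)
df <- data.frame(x = x, y = y.a, case = 'a')
df <- rbind(df, data.frame(x = x, y = y.b, case = 'b'))

p <- ggplot(df, aes(x, y, colour = y < 0)) +
  geom_point() +
  # Map the logical condition to explicit colours; legend hidden by choice
  scale_colour_manual(values = c("FALSE" = "black", "TRUE" = "red"),
                      guide = "none") +
  facet_grid(case ~ .)
print(p)
```

Because the condition lives inside aes(), it is evaluated per row of the data and splits across facets without any length mismatch.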
Instead of using scale_..._identity, one can also wrap the color (and fill) aesthetic in I(). It also requires having color defined in aes.
I came across this question, where the OP I guess kind of accidentally made use of I()... ggplot color is not automatically coloring based on group
Not sure I would ever make use of that, but I find this kind of fun.
library(ggplot2)
x <- seq(0, 6, .5)
y.a <- .1 * x -.1
y.b <- sin(x)
df <- data.frame(x=x, y=y.a, case='a')
df <- rbind(df, data.frame(x=x, y=y.b, case='b'))
ggplot(df) +
geom_point(aes(x, y, color= I(ifelse(y < 0, 'red', 'black')))) +
facet_grid(case ~. )
Created on 2020-07-01 by the reprex package (v0.3.0)
