R ggplot overlapping lines to use matplotlib colour behaviour

When two lines coincide, matplotlib shows the "sum" of the two line colours, while ggplot shows the colour of only one line. The matplotlib behaviour makes it clearer that the two lines overlap. Is it possible to make ggplot colour overlapping lines in a similar way?
Setting alpha sort of does that, but with alpha the resulting colour is dominated by the top colour. (If alpha = 0.5, the top colour gets opacity 0.5 and the colour underneath gets opacity 0.5 * 0.5.)
matplotlib
import pandas as pd
pd.DataFrame({'A' : [0,1,2,3, 4], 'B' : [-1, 0, 2, 3, 0]}).plot(title = 'matplotlib in python')
ggplot
library(data.table)
library(ggplot2)
dt = data.table(name = rep(c('A','B'), each = 5),
y = c(0, 1, 2, 3, 4, -1, 0, 2, 3, 0),
x = rep(1:5, times = 2))
ggplot(dt) +
geom_line(aes(x = x, y = y, col = name)) +
ggtitle('ggplot in R')
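One workaround, sketched here under some assumptions, is to detect the stretches where the two series coincide and redraw just those segments in a mixed colour. This assumes both series share the same x values and only treats a segment as overlapping when both of its endpoints match; lines that merely cross are not handled. The helpers `find_shared_segments()` and `mix_colours()` are hypothetical, and the mix is a simple RGB average rather than matplotlib's exact compositing.

```r
library(ggplot2)

dt <- data.frame(
  name = rep(c("A", "B"), each = 5),
  y = c(0, 1, 2, 3, 4, -1, 0, 2, 3, 0),
  x = rep(1:5, times = 2)
)

# Hypothetical helper: find x-intervals where series A and B share both
# endpoints, i.e. where their line segments lie exactly on top of each other.
# Assumes both series are observed at the same x values.
find_shared_segments <- function(d) {
  a <- d[d$name == "A", ]
  b <- d[d$name == "B", ]
  a <- a[order(a$x), ]
  b <- b[order(b$x), ]
  same <- a$y == b$y               # points where the two series coincide
  n <- nrow(a)
  keep <- same[-n] & same[-1]      # both ends of the segment coincide
  data.frame(x = a$x[-n][keep], xend = a$x[-1][keep],
             y = a$y[-n][keep], yend = a$y[-1][keep])
}

# Hypothetical helper: mix two colours by averaging their RGB channels.
mix_colours <- function(c1, c2) {
  m <- (grDevices::col2rgb(c1) + grDevices::col2rgb(c2)) / 2
  grDevices::rgb(m[1] / 255, m[2] / 255, m[3] / 255)
}

cols <- c(A = "red", B = "blue")
shared <- find_shared_segments(dt)

p <- ggplot(dt, aes(x = x, y = y, colour = name)) +
  geom_line() +
  scale_colour_manual(values = cols) +
  # Redraw the coincident stretch in the mixed colour, on top of both lines.
  geom_segment(data = shared, aes(x = x, xend = xend, y = y, yend = yend),
               colour = mix_colours(cols["A"], cols["B"]),
               inherit.aes = FALSE)
```

With the example data, only the segment from x = 3 to x = 4 is shared, so that is the only piece drawn in the mixed colour.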

Related

ggplot2 geom_pointline doesn't link centers of points

I would like to create a plot in ggplot2 (R) with points and lines between them, but with gaps around the points. I have a shaded area in the plot, so parts of some points sit on a gray background and parts on a white one. I found the lemon package with its geom_pointline function.
library(ggplot2)
library(lemon)
ggplot(data = dt, aes(x = x, y = y)) +
geom_ribbon(aes(ymin = min, ymax = max), fill = "gray", alpha = 0.35) +
geom_pointline(shape = 19, linecolor = "black", size = 4, color = "blue", distance = 2)
In the result I get, the lines don't start and end at the centres of the points, but rather at the top right and bottom left of each point. It gets even worse when I shorten the lines. I tried many parameter combinations but couldn't fix it. I would like the lines to start and end closer to the centres than they do now.
Thanks in advance!
If switching to another package is an option for you, ggh4x::geom_pointpath is one way to achieve your desired result: like geom_pointline, it adds padding around the points along a line or path. One drawback is that, to the best of my knowledge, it has no option to set different colours for the points and the lines. A workaround is to draw the lines with ggh4x::geom_pointpath and then add a geom_point layer on top.
Using some fake example data:
set.seed(123)
dt <- data.frame(
x = seq(20, 160, 20),
y = 1:8,
min = 1:8 - runif(8),
max = 1:8 + runif(8)
)
library(ggplot2)
library(ggh4x)
ggplot(data = dt, aes(x = x, y = y)) +
geom_ribbon(aes(ymin = min, ymax = max), fill = "gray", alpha = 0.35) +
geom_pointpath(shape = 19, size = 4, color = "black", mult = .25) +
geom_point(shape = 19, size = 4, color = "blue")
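If you would rather avoid an extra package, a rough alternative (an assumption, not what lemon or ggh4x do internally) is to precompute shortened segments yourself: trim each segment between consecutive points by the same fraction at both ends, so the remaining line always points straight at both point centres. The helper `gapped_segments()` and its `gap` parameter are hypothetical.

```r
library(ggplot2)

set.seed(123)
dt <- data.frame(
  x = seq(20, 160, 20),
  y = 1:8,
  min = 1:8 - runif(8),
  max = 1:8 + runif(8)
)

# Hypothetical helper: shorten each segment between consecutive points by a
# fraction `gap` of its length at both ends. The trimmed segment stays on the
# line through the two point centres, so it is aimed at the centres.
gapped_segments <- function(x, y, gap = 0.25) {
  n <- length(x)
  x0 <- x[-n]; y0 <- y[-n]; x1 <- x[-1]; y1 <- y[-1]
  data.frame(
    x = x0 + gap * (x1 - x0), y = y0 + gap * (y1 - y0),
    xend = x1 - gap * (x1 - x0), yend = y1 - gap * (y1 - y0)
  )
}

segs <- gapped_segments(dt$x, dt$y, gap = 0.25)

p <- ggplot(dt, aes(x = x, y = y)) +
  geom_ribbon(aes(ymin = min, ymax = max), fill = "gray", alpha = 0.35) +
  geom_segment(data = segs, aes(x = x, y = y, xend = xend, yend = yend),
               colour = "black") +
  geom_point(shape = 19, size = 4, colour = "blue")
```

Because the gap is a fraction of each segment's length in data units, the visible gap grows with segment length; geom_pointpath instead sizes the gap relative to the points, which is usually what you want when segment lengths differ a lot.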

Time-varying width of geom_segment in ggplot2

I would like to plot a rectangle whose width increases as the x-axis value increases. geom_segment is a great way to plot lines, but you cannot map size within aes(). You can only choose one size for the entire segment:
geom_segment(aes(x=5,xend=10,y=10,yend=10),size=10)
This doesn't work; the size doesn't vary with the value of x_axis_variable:
geom_segment(aes(x=5,xend=10,y=10,yend=10,size=x_axis_variable))
where x_axis_variable is whatever continuous variable you have plotted on the x-axis.
Is there a workaround, or some other option, to plot a single line whose size varies along the x or y axis?
I'm happy to post some example data, but I'm not sure how helpful it would be, because the question doesn't depend on the data structure. I think it's just a limitation of geom_segment, and hopefully there's another option. Thanks!
Edit: roughly the expected output [figure not included], except that I'd like the line width to increase gradually over the x-axis, not discretely as in the example.
Can you just use geom_line()?
library(tidyverse)
library(ggplot2)
d <- tibble(x = 1:20, y=5)
ggplot(d, aes(x=x, y=y, size=I(x), color=x)) +
geom_line()
Geom_segment is a great way to plot lines but you cannot map size within aes().
Is this premise true? Check out my artistic chart:
ggplot(mtcars) +
geom_segment(aes(wt, mpg, xend = dplyr::lead(wt),
yend = dplyr::lead(mpg), size = gear))
Or this:
ggplot(data = data.frame(x = 1:5),
aes(x = x, xend = x + 1,
y = 0, yend = 0, size = x)) +
geom_segment()
geom_segment draws one segment, with one size, for each row of the data you map. If you want a single segment to vary along its length, you could use ggforce::geom_link, as in the example below, which interpolates the size by splitting the segment into many pieces.
ggplot() +
geom_segment(aes(x = 0, xend = 1, y = 0, yend = 0)) +
ggforce::geom_link(aes(x = 0, xend = 1, y = 0.5, yend = 0.5, size = after_stat(index)^2)) +
scale_size(range = c(0,10))
For a rectangle you might do something like:
ggplot() +
ggforce::geom_link2(aes(x = c(0, 0, 1, 1, 0),
xend = c(0, 1, 1, 0, 0),
y = c(0,1,1,0, 0),
yend = c(1,1,0,0, 1),
size = c(1,1,2,2, 1)), lineend = "round")
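The "many pieces" idea behind geom_link can also be done by hand with plain geom_segment: split the segment into short sub-segments and map each piece's width to its position along the line. This sketch assumes ggplot2 3.4.0 or later, where line width is mapped with `linewidth` and `scale_linewidth()`; on older versions use `size` and `scale_size()` instead. The helper `split_segment()` is hypothetical.

```r
library(ggplot2)

# Hypothetical helper: split one segment from (x0, y0) to (x1, y1) into n
# consecutive pieces and give each piece a width value that grows with its
# position along the segment.
split_segment <- function(x0, x1, y0, y1, n = 100) {
  t <- seq(0, 1, length.out = n + 1)
  data.frame(
    x = x0 + (x1 - x0) * t[-(n + 1)],
    xend = x0 + (x1 - x0) * t[-1],
    y = y0 + (y1 - y0) * t[-(n + 1)],
    yend = y0 + (y1 - y0) * t[-1],
    w = t[-1]  # width grows linearly along the segment
  )
}

# Same endpoints as the geom_segment call in the question.
pieces <- split_segment(5, 10, 10, 10, n = 100)

p <- ggplot(pieces) +
  geom_segment(aes(x = x, xend = xend, y = y, yend = yend, linewidth = w),
               lineend = "butt") +
  scale_linewidth(range = c(0.5, 10))
```

With `lineend = "butt"` the pieces join without overlapping caps; increasing `n` makes the width change look smoother.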

Transform x,y coordinate space in ggplot

I know that you can transform the coordinates of a plot using coord_trans(), and you can even perform coordinate transformations along both axes (e.g. coord_trans(x = "log10", y = "log10")), but is there a way to perform a coordinate transformation that depends on the values of both axes, like a shear?
I know that I can apply the linear transformation before passing my data to ggplot, using something like ggforce::linear_trans(), as in this example:
library(ggplot2)
library(ggforce)
trans <- linear_trans(shear(1, 0))
square <- data.frame(x = c(0, 0, 1, 1), y = c(0, 1, 1, 0))
square2 <- trans$transform(square$x, square$y)
ggplot(square2, aes(x, y)) +
geom_polygon(colour = 'black')
However, I'm hoping that there would be a way to write a custom coordinate system such that the data doesn't need to be transformed beforehand, e.g.:
square <- data.frame(x = c(0, 0, 1, 1), y = c(0, 1, 1, 0))
ggplot(square, aes(x, y)) +
geom_polygon(colour = 'black') +
coord_shear(x=1)
I implemented a custom coord that does this. It takes a transformer like that produced by ggforce::linear_trans and applies it to a ggplot. Check it out in my deeptime package here.
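For the pre-transformation route, a horizontal shear can also be applied by hand without ggforce, since it is just the 2x2 matrix [[1, k], [0, 1]], i.e. x' = x + k*y and y' = y. The helpers `apply_linear()` and `shear_x()` are hypothetical, and I have not checked which axis ggforce's `shear()` arguments map to, so this may not match `shear(1, 0)` exactly.

```r
library(ggplot2)

# Hypothetical helper: apply a 2x2 linear map M to the x and y columns.
apply_linear <- function(d, M) {
  xy <- M %*% rbind(d$x, d$y)
  data.frame(x = xy[1, ], y = xy[2, ])
}

# Horizontal shear by k: matrix(..., nrow = 2) fills column-wise,
# giving [[1, k], [0, 1]], so x' = x + k * y and y' = y.
shear_x <- function(k) matrix(c(1, 0, k, 1), nrow = 2)

square <- data.frame(x = c(0, 0, 1, 1), y = c(0, 1, 1, 0))
square2 <- apply_linear(square, shear_x(1))

p <- ggplot(square2, aes(x, y)) +
  geom_polygon(colour = "black")
```

The unit square becomes the parallelogram with corners (0, 0), (1, 1), (2, 1) and (1, 0).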

Colours across Plots / Heatmaps in R

I am creating a number of heatmaps in R, but I am having trouble keeping the colour scale consistent across graphs.
The colours seem to be scaled within each graph. Is there a way to make the colours consistent across graphs, i.e. so that the colour difference between a value of 0.4 and 0.5 is always the same?
Code Example:
set.seed(123)
d1 = matrix(rnorm(9, mean = 0.2, sd = 0.1), ncol = 3)
d2 = matrix(rnorm(9, mean = 0.8, sd = 0.1), ncol = 3)
mat = list(d1, d2)
for(m in mat)
heatmap(m, Rowv = NA ,Colv = NA)
You'll notice in the example that cell (2,3) in the first graph has a similar colour to cell (1,3) in the second, despite the values being ~0.8 apart.
Here's a way to do it with ggplot2, if you're open to not using base graphics:
library(reshape2)
library(ggplot2)
# Set common limits for color scale
limits = range(unlist(mat))
Here's the code for two separate graphs. The last line of code for each graph ensures that they use the same z limits for setting the colors:
ggplot(melt(mat[[1]]), aes(Var1, Var2, fill=value)) +
geom_tile() +
scale_fill_continuous(limits=limits)
ggplot(melt(mat[[2]]), aes(Var1, Var2, fill=value)) +
geom_tile() +
scale_fill_continuous(limits=limits)
Another option is to plot both heatmaps in a single graph using faceting, which automatically puts both panels on the same colour scale:
ggplot(melt(mat), aes(Var1, Var2, fill=value)) +
geom_tile() +
facet_grid(. ~ L1)
I've used the default colors here, but for either approach you can set the color scale to be anything you wish. For example:
ggplot(melt(mat), aes(Var1, Var2, fill=value)) +
geom_tile() +
facet_grid(. ~ L1) +
scale_fill_gradient(low="red", high="green")
You could use the image function directly (heatmap calls image internally), though it takes some extra formatting to match the output of heatmap. You can use zlim to set the colour range. Quoting from the ?image page, zlim is:
the minimum and maximum z values for which colors should be plotted,
defaulting to the range of the finite values of z. Each of the given
colors will be used to color an equispaced interval of this range. The
midpoints of the intervals cover the range, so that values just
outside the range will be plotted.
# define zlim min and max for all the plots
minz = Reduce(min, mat)
maxz = Reduce(max, mat)
for(m in mat) {
image( m, zlim = c(minz, maxz), col = heat.colors(20))
}
To get closer to the formatting produced by heatmap, you can just reuse some code from the heatmap function:
for(m in mat) {
labCol = dim(m)[2]
labRow = dim(m)[1]
image(seq_len(labCol), seq_len(labRow), m, zlim = c(minz, maxz),
col = heat.colors(20), axes = FALSE, xlab = "", ylab = "",
xlim = 0.5 + c(0, labCol), ylim = 0.5 + c(0, labRow))
axis(1, 1L:labCol, labels = seq_len(labCol), las = 2, line = -0.5, tick = 0)
axis(4, 1L:labRow, labels = seq_len(labRow), las = 2, line = -0.5, tick = 0)
}
Using the breaks argument to image is another option. It allows more flexibility than zlim in setting the breakpoints for colors. Quoting from the help page, breaks is
a set of finite numeric breakpoints for the colours: must have one
more breakpoint than colour and be in increasing order. Unsorted
vectors will be sorted, with a warning.
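A minimal sketch of the breaks approach, assuming the same `mat` list as in the question: compute one set of breakpoints over the range of all matrices and pass it to every image call, so a given value always falls in the same colour bin. The variable names `n_col`, `pal` and `brks` are illustrative only.

```r
set.seed(123)
d1 <- matrix(rnorm(9, mean = 0.2, sd = 0.1), ncol = 3)
d2 <- matrix(rnorm(9, mean = 0.8, sd = 0.1), ncol = 3)
mat <- list(d1, d2)

n_col <- 20
pal <- heat.colors(n_col)

# One shared set of breaks over the range of *all* matrices.
# n_col colours need n_col + 1 increasing breakpoints.
brks <- seq(min(unlist(mat)), max(unlist(mat)), length.out = n_col + 1)

for (m in mat) {
  image(m, breaks = brks, col = pal)
}
```

Evenly spaced breaks behave like a shared zlim; the extra flexibility comes from using unevenly spaced breaks, for example to give more colour resolution to a range of values you care about.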

Place annotation at the top of a series of histograms in ggplot2 using a for loop

I am creating a number of histograms and want to add an annotation near the top of each graph. I am plotting them in a for loop, so I need a way to place the annotation at the top even though the y-limits change from graph to graph. If I could store the y-limit of each graph within the loop, I could set the annotation's y coordinate from it, so that it changes dynamically as the loop proceeds. Here is some sample code to demonstrate the issue (notice how the annotation moves around; I need its position to follow the y-limit of each graph):
library(ggplot2)
cuts <- levels(as.factor(diamonds$cut))
pdf(file = "Annotation Example.pdf", width = 11, height = 8,
family = "Helvetica", bg = "white")
for (i in 1:length(cuts)) {
by.cut<-subset(diamonds, diamonds$cut == cuts[[i]])
print(ggplot(by.cut, aes(price)) +
geom_histogram(fill = "steelblue", alpha = .55) +
annotate ("text", label = "My annotation goes at the top", x = 10000 ,hjust = 0, y = 220, color = "darkred"))
}
dev.off()
In ggplot, a position of Inf (or -Inf) means the edge of the plot panel, and using it does not change the plot range. So the annotation's y value can be set to Inf, and vjust can be adjusted to move the text down from the top edge for better alignment.
...
print(ggplot(by.cut, aes(price)) +
geom_histogram(fill = "steelblue", alpha = .55) +
annotate("text", label = "My annotation goes at the top",
x = 10000, hjust = 0, y = Inf, vjust = 2, color = "darkred"))
...
For i = 2, the result looks like this: [figure not included]
There may be a neater way, but you can get the max count and use that to set y in the annotate call:
for (i in seq_along(cuts)) {
by.cut <- subset(diamonds, diamonds$cut == cuts[[i]])
## approximate the bins ggplot will use: 30 bins by default, i.e. 31 break points
## (ggplot's exact bin edges are offset slightly, so this is only approximate)
by.cut$cuts <- cut(by.cut$price, seq(min(by.cut$price), max(by.cut$price), length.out = 31), include.lowest = TRUE)
## get the highest count of prices in a single bin (table() counts empty bins as 0, not NA)
y.max <- max(table(by.cut$cuts))
print(ggplot(by.cut, aes(price)) +
geom_histogram(fill = "steelblue", alpha = .55) +
## change y = 220 to y = y.max as defined above
annotate("text", label = "My annotation goes at the top", x = 10000, hjust = 0, y = y.max, color = "darkred"))
}
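Instead of re-deriving the bins by hand, you can build the plot once and read back the counts ggplot actually computed with `layer_data()`, which runs `ggplot_build()` under the hood. This is a sketch: the helper `hist_top()` is hypothetical, and it fixes `bins = 30` in both places so the counts match the plotted histogram.

```r
library(ggplot2)

# Hypothetical helper: height of the tallest bar ggplot computes for `d`.
hist_top <- function(d, bins = 30) {
  p <- ggplot(d, aes(price)) + geom_histogram(bins = bins)
  max(layer_data(p, 1)$count)
}

cuts <- levels(diamonds$cut)
plots <- list()
for (lev in cuts) {
  by.cut <- subset(diamonds, cut == lev)
  y.max <- hist_top(by.cut)
  plots[[lev]] <- ggplot(by.cut, aes(price)) +
    geom_histogram(fill = "steelblue", alpha = .55, bins = 30) +
    annotate("text", label = "My annotation goes at the top",
             x = 10000, hjust = 0, y = y.max, colour = "darkred")
}
```

To write the PDF as in the question, open `pdf(...)`, call `print()` on each element of `plots`, then `dev.off()`.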
