Is it possible to define new shapes for plotting? - r

I see from here (http://sape.inf.usi.ch/quick-reference/ggplot2/shape) the set of possible shapes. If I wanted to define new shapes, is it possible? For example, suppose I wanted to use a 7-sided polygon with an optional fill aesthetic -- is there a way to tell ggplot about that shape?
I feel constrained by this set of possibilities:
library(tidyverse)
dat <- tibble(p = c(0:25, 32:127),
x = p %% 16,
y = p %/% 16)
ggplot(dat, aes(x, y)) +
geom_text(aes(label = p), size = 3, nudge_y = -.25) +
geom_point(aes(shape = p), size = 5, fill = "red") +
scale_shape_identity() +
theme_void()

Yes, it's possible to do this in one of several ways. Unless you have an svg file of a 7-sided polygon available, one quick solution would be to define this shape as a grob and plot it using geom_grob from package ggpmisc. This keeps things in vector format.
Creating the heptagon is the hard part:
library(ggplot2)
library(dplyr)
library(grid)
library(ggpmisc)
# Make heptagon
septs <- seq(0, 2 * pi, length.out = 8)
devratio <- dev.size()[2]/dev.size()[1]
heptagon <- linesGrob(x = unit(0.5 + 0.2 * devratio * sin(septs), "npc"),
y = unit(0.5 + 0.2 * cos(septs), "npc"),
gp = gpar(lwd = 2))
The plot itself is straightforward:
# Plot 10 random points with the heptagon
set.seed(69)
tibble(x = rnorm(10), y = rnorm(10), shape = list(heptagon)) %>%
ggplot() +
geom_grob(aes(x, y, label = shape))
As you can see from this example, custom shapes aren't necessarily all that easy to use, since the shape has to be defined pointwise by the user, it is difficult to match its size and lineweight to existing points, and the user would have to define what/where their fill is, etc. I don't think it's an omission from ggplot to not have a simple interface for creating custom shapes - ggplot has great extensibility for advanced users, and it's not clear that you could have a useful shape-creating interface for beginners. Meanwhile there are more than enough shapes to provide informative plots for all but the most niche applications.

Maybe not exactly what you're looking for but let me suggest three packages:
ggimage - allows you to use images, like a .png file, for your points. See https://mran.microsoft.com/snapshot/2018-04-02/web/packages/ggimage/vignettes/ggimage.html
ggpattern - allows you to add different fills to your graphics. See https://github.com/coolbutuseless/ggpattern
emoGG - uses emojis for your points. See https://github.com/dill/emoGG

Related

Plotting order of circles in ggforce

I am revising a problem I posted prevously Limited number of geoms in ggforce?. I thought then that I was mistaken but I can now reconstruct it more clearly:
I would like to plot N circles one on top of the other. No matter how many circles I want to plot, circle #11-N are plotted underneath the first 10 circles. This demonstartes the problem:
library(tidyverse)
library(ggforce)
circles <- data.frame(
x0 = seq(1, 30),
y0 = seq(1, 30),
r = 2
)
ggplot() +
geom_circle(aes(x0 = x0, y0 = y0, r = r), fill = "grey", data = circles) +
coord_fixed()
Therefore when I would like to plot concentric circles, the first 10 circles hide all the rest.
I can code a workaround by first plotting the 11-N circles and then the first ten but its not elegant
This is a bug in ggforce: it groups the points in the circles with strings that don't sort into the same order as the original circles.
To get a version that fixes the bug, use
remotes::install_github("dmurdoch/ggforce#sortorder")
This will require you to have package build tools installed. If you don't have those, you could watch the web site https://github.com/thomasp85/ggforce/issues/224 to see when the fix gets incorporated into the CRAN build of the package.
I can only offer a slightly hacky workaround that may not perform very well for large datasets. Loop through each of the rows creating a single circle then add them all together to make a single object to plot.
library(tidyverse)
library(ggforce)
circles <- data.frame(
x0 = seq(1, 30),
y0 = seq(1, 30),
r = 2
)
mygeom_circle <- function(onerow) {
geom_circle(aes(x0 = x0, y0 = y0, r = r), fill = "grey", data = onerow)
}
gs <- sapply(1:nrow(circles), function(r) mygeom_circle(circles[r,])) # loop through one by one making single circles
gg <-
ggplot() +
gs + # add the single circles in the order they were made
coord_fixed()
gg

stat_density2d tile geom showing grid when exporting as PDF

I'm having an issue when exporting stat_density2d plots.
ggplot(faithful, aes(eruptions, y = waiting, alpha = ..density..)) +
stat_density2d(geom = 'tile', contour = F)
When exporting as a png it looks like so:
But when I export as a PDF a grid appears:
I'm assuming that this is because the boundaries of the tiles overlap and so have equivalent of a doubled alpha value.
How can I edit just the edges of the tiles to stop this from happening?
Secondary question:
As Tjebo mentioned geom = 'raster' would fix this problem. However, this raises a new issue that only one group gets plot.
df <- faithful
df$new <- as.factor(ifelse(df$eruptions>3.5, 1, 0))
ggplot(df, aes(eruptions, waiting, fill = new, alpha = ..density..)) +
stat_density2d(geom = 'tile', contour = F) +
scale_fill_manual(values = c('1' = 'blue', '0' = 'red'))
ggplot(df, aes(eruptions, waiting, fill = new, alpha = ..density..)) +
stat_density2d(geom = 'raster', contour = F) +
scale_fill_manual(values = c('1' = 'blue', '0' = 'red'))
help on this second issue would also be much appreciated!
Now I decided to transform my comment into an answer instead. Hopefully it solves your problem.
There is an old, related google thread on this topic - It seems related to how the plots are vectorized in each pdf viewer.
A hack is suggested in this thread, but one solution might simply be to use geom = 'raster' instead.
library(ggplot2)
ggplot(faithful, aes(eruptions, y = waiting, alpha = ..density..)) +
stat_density2d(geom = 'raster', contour = F)
Created on 2019-08-02 by the reprex package (v0.3.0)
If you have a look at the geom_raster documentation - geom_raster is recommended if you want to export to pdf!
The most common use for rectangles is to draw a surface. You always want to use geom_raster here because it's so much faster, and produces smaller output when saving to PDF
edit - second part of the question
Your tile plot can't be correct - you are creating cut-offs (your x value), so the fill should not overlap. This points to the core of the problem - the alpha=..density.. probably calculates the density based on the entire sample, and not by group. I think the only way to go is to pre-calculate the density (e.g., using density(). In your example, for demonstration purpose, we have this luckily precalculated, as faithfuld (this is likely not showing the results which you really want, as it is the density on the entire sample!!).
I'd furthermore recommend not to use numbers as your factor values - this is pretty confusing for you and R. Use characters instead. Also, ideally don't use df for a sample data frame, as this is a base R function;)
Hope this helps
mydf <- faithfuld ## that is crucial!!! faithfuld contains a precalculated density column which is needed for the plot!!
mydf$new <- as.factor(ifelse(mydf$eruptions>3.5, 'large', 'small'))
ggplot(mydf, aes(eruptions, waiting)) +
geom_raster(aes(fill = new, alpha = density), interpolate = FALSE)
Created on 2019-08-02 by the reprex package (v0.3.0)

How to draw a radar plot in ggplot using polar coordinates? [duplicate]

This question already has an answer here:
ggplot2: connecting points in polar coordinates with a straight line 2
(1 answer)
Closed 2 years ago.
I am trying to use ggplot to draw a radar-chart following the guidelines from the Grammar of Graphics. I am aware of the ggradar package but based on the grammar it looks like coord_polar should be enough here. This is the pseudo-code from the grammar:
So I thought something like this may work, however, the contour of the area chart is curved as if I used geom_line:
library(tidyverse)
dd <- tibble(category = c('A', 'B', 'C'), value = c(2, 7, 4))
ggplot(dd, aes(x = category, y = value, group=1)) +
coord_polar(theta = 'x') +
geom_area(color = 'blue', alpha = .00001) +
geom_point()
While I understand why geom_line draws arcs once in coord_polar, my understanding of the explanation from the Grammar of Graphics is that there may be an element/geom area that could plot straight lines:
here is one technical detail concerning the shape of Figure 9.29. Why
is the outer edge of the area graphic a set of straight lines instead
of arcs? The answer has to do with what is being measured. Since
region is a categorical variable, the line segments linking regions
are not in a metric region of the graph. That is, the segments of the
domain between regions are not measurable and thus the straight lines
or edges linking them are arbitrary and perhaps not subject to
geometric transformation. There is one other problem with the
grammatical specification of this figure. Can you spot it? Undo the
polar trans- formation and think about the domain of the plot. We
cheated.
For completeness, this question derives from this other question I asked about plotting in polar system.
tl;dr we can write a function to solve this problem.
Indeed, ggplot uses a process called data munching for non-linear coordinate systems to draw lines. It basically breaks up a straight line in many pieces, and applies the coordinate transformation on the individual pieces instead of merely the start- and endpoints of lines.
If we look at the panel drawing code of for example GeomArea$draw_group:
function (data, panel_params, coord, na.rm = FALSE)
{
...other_code...
positions <- new_data_frame(list(x = c(data$x, rev(data$x)),
y = c(data$ymax, rev(data$ymin)), id = c(ids, rev(ids))))
munched <- coord_munch(coord, positions, panel_params)
ggname("geom_ribbon", polygonGrob(munched$x, munched$y, id = munched$id,
default.units = "native", gp = gpar(fill = alpha(aes$fill,
aes$alpha), col = aes$colour, lwd = aes$size * .pt,
lty = aes$linetype)))
}
We can see that a coord_munch is applied to the data before it is passed to polygonGrob, which is the grid package function that matters for drawing the data. This happens in almost any line-based geom for which I've checked this.
Subsequently, we would like to know what is going on in coord_munch:
function (coord, data, range, segment_length = 0.01)
{
if (coord$is_linear())
return(coord$transform(data, range))
...other_code...
munched <- munch_data(data, dist, segment_length)
coord$transform(munched, range)
}
We find the logic I mentioned earlier that non-linear coordinate systems break up lines in many pieces, which is handled by ggplot2:::munch_data.
It would seem to me that we can trick ggplot into transforming straight lines, by somehow setting the output of coord$is_linear() to always be true.
Lucky for us, we wouldn't have to get our hands dirty by doing some deep ggproto based stuff if we just override the is_linear() function to return TRUE:
# Almost identical to coord_polar()
coord_straightpolar <- function(theta = 'x', start = 0, direction = 1, clip = "on") {
theta <- match.arg(theta, c("x", "y"))
r <- if (theta == "x")
"y"
else "x"
ggproto(NULL, CoordPolar, theta = theta, r = r, start = start,
direction = sign(direction), clip = clip,
# This is the different bit
is_linear = function(){TRUE})
}
So now we can plot away with straight lines in polar coordinates:
ggplot(dd, aes(x = category, y = value, group=1)) +
coord_straightpolar(theta = 'x') +
geom_area(color = 'blue', alpha = .00001) +
geom_point()
Now to be fair, I don't know what the unintended consequences are for this change. At least now we know why ggplot behaves this way, and what we can do to avoid it.
EDIT: Unfortunately, I don't know of an easy/elegant way to connect the points across the axis limits but you could try code like this:
# Refactoring the data
dd <- data.frame(category = c(1,2,3,4), value = c(2, 7, 4, 2))
ggplot(dd, aes(x = category, y = value, group=1)) +
coord_straightpolar(theta = 'x') +
geom_path(color = 'blue') +
scale_x_continuous(limits = c(1,4), breaks = 1:3, labels = LETTERS[1:3]) +
scale_y_continuous(limits = c(0, NA)) +
geom_point()
Some discussion about polar coordinates and crossing the boundary, including my own attempt at solving that problem, can be seen here geom_path() refuses to cross over the 0/360 line in coord_polar()
EDIT2:
I'm mistaken, it seems quite trivial anyway. Assume dd is your original tibble:
ggplot(dd, aes(x = category, y = value, group=1)) +
coord_straightpolar(theta = 'x') +
geom_polygon(color = 'blue', alpha = 0.0001) +
scale_y_continuous(limits = c(0, NA)) +
geom_point()

ggplot2 z clipping: remove unnecessary points in overlapping stacks

Consider the following example of plotting 100 overlapping points:
ggplot(data.frame(x=rnorm(100), y=rnorm(100)), aes(x=x, y=y)) +
geom_point(size=100) +
xlim(-10, 10) +
ylim(-10, 10)
I now want to save the image as vector graphics, e.g. in PDF. This is not a problem with the above example, but once I've got over a million points (e.g. from a volcano plot), the file size can exceed 100 MB for one page and it takes ages to display or edit.
In the above example the same shape could could still be represented by either
converting the points to a shape outline, or
keeping a couple of points and discarding the rest.
Is there any way (or preferably tool that already does this) to remove points from a plot that will never be visible? (ideally supporting transparency)
The best approach I have heard so far is to round the position of the dots and remove grid points that have > N points, then use the original positions of the remaining ones. Is there anything better?
Note that this should work with an arbitrary structure of points, and only remove those that are not visible.
You could do something with the convex hull, like this, filling in the polygon that makes up the convex hull:
library(ggplot2)
set.seed(123)
df <- data.frame(x = rnorm(100), y = rnorm(100))
idx <- chull(df)
ggplot(df, aes(x = x, y = y)) +
geom_point(size = 100,color="darkgrey") +
geom_polygon(data=df[idx,],color="blue") +
geom_point(size = 1, color = "red", size = 2) +
xlim(-10, 10) +
ylim(-10, 10)
yielding:
(Note that I pulled this chull-idea out of Hadley's "Extending ggplot2" guide https://cran.r-project.org/web/packages/ggplot2/vignettes/extending-ggplot2.html.)
In your case you would drop the geom_point calls and set transparency on the geom_polygon. Also not sure how fast chull is for millions of points, though it will clearly be faster than plotting them all.
And I am not really sure what you are after. If you really want the 100 pixel radius, they you could probably just do it for the ones on the complex hull, plus fill in the middle with geom_polygon.
So using this code:
ggplot(df[idx,], aes(x = x, y = y)) +
geom_point(size = 100, color = "black") +
geom_polygon(fill = "black") +
xlim(-10, 10) +
ylim(-10, 10)
to make this:

How to adjust figure settings in plotmatrix?

Can I adjust the point size, alpha, font, and axis ticks in a plotmatrix?
Here is an example:
library(ggplot2)
plotmatrix(iris)
How can I:
make the points twice as big
set alpha = 0.5
have no more than 5 ticks on each axis
set font to 1/2 size?
I have fiddled with the mapping = aes() argument to plotmatrix as well as opts() and adding layers such as + geom_point(alpha = 0.5, size = 14), but none of these seem to do anything. I have hacked a bit of a fix to the size by writing to a large pdf (pdf(file = "foo.pdf", height = 10, width = 10)), but this provides only a limited amount of control.
Pretty much all of the ggplot2 scatterplot matrix options are still fairly new and can be a bit experimental.
But the facilities in GGally do allows you to construct this kind of plot manually, though:
custom_iris <- ggpairs(iris,upper = "blank",lower = "blank",
title = "Custom Example")
p1 <- ggplot(iris,aes(x = Sepal.Length,y = Sepal.Width)) +
geom_point(size = 1,alpha = 0.3)
p2 <- ggplot(iris,aes(x = Sepal.Width,y = Sepal.Length)) +
geom_point()
custom_iris <- putPlot(custom_iris,p1,2,1)
custom_iris <- putPlot(custom_iris,p2,3,2)
custom_iris
I did that simply by directly following the last example in ?ggpairs.

Resources