Make resizable plots using the grid graphing system in R

Recently I read about the grid graphing system in R. It is very flexible, and once you master it you should be able to make very sophisticated graphs. However, I have not found a good explanation of how to plot a graph that is also resizable. So the question is: how do you use the grid graphing system in R so that the final output is actually resizable?

One way of doing so is not to use the grid graphing system directly, but to use the lattice interface to it. The lattice package ships with R as a recommended package and is a very flexible implementation of Trellis graphics built on top of grid. Lattice also lets you call grid functions directly (for example inside panel functions), so for most sophisticated graphs that will be all you need.
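As an illustration of mixing the two, a lattice panel function can call grid primitives directly. The sketch below uses the built-in mtcars data purely as example input; the dashed frame is drawn with grid.rect() in "npc" units, so it is recomputed relative to the panel whenever the plot is redrawn at a new size.

```r
library(lattice)
library(grid)

# A lattice plot whose panel function mixes lattice and raw grid calls
p <- xyplot(mpg ~ wt, data = mtcars,
  panel = function(x, y, ...) {
    panel.xyplot(x, y, ...)  # the usual lattice scatterplot
    # Plain grid call: a dashed frame covering 90% of the panel, in "npc"
    # units, so its size follows the panel when the device is resized
    grid.rect(width = 0.9, height = 0.9, default.units = "npc",
              gp = gpar(lty = "dashed", fill = NA))
  })

print(p)  # draws on the current device; resize the window to see it scale
```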
If you really are going to work with the grid graphing system itself, you have to use the correct coordinate system for the figure to be scalable. "native", "npc" (Normalized Parent Coordinates) and "snpc" (Square Normalized Parent Coordinates) all allow you to rescale a figure, because they express locations relative to the current viewport: "npc" relative to its width and height, "snpc" relative to the smaller of the two, and "native" relative to its x and y scales.
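A quick way to see the difference between relative and absolute units is to ask grid how many inches a unit occupies on devices of different sizes. The helper width_in_inches() below is hypothetical (written only for this demonstration) and uses temporary PDF devices as a stand-in for resizing a window.

```r
library(grid)

# Hypothetical helper: width, in inches, that unit 'u' occupies at the top
# level of a temporary PDF device that is 'dev_width' inches wide
width_in_inches <- function(u, dev_width, dev_height = 5) {
  f <- tempfile(fileext = ".pdf")
  pdf(f, width = dev_width, height = dev_height)
  on.exit({ dev.off(); unlink(f) })  # close the device, delete the file
  grid.newpage()
  convertWidth(u, "inches", valueOnly = TRUE)
}

width_in_inches(unit(0.5, "npc"), 4)   # half the device: 2 inches
width_in_inches(unit(0.5, "npc"), 8)   # grows with the device: 4 inches
width_in_inches(unit(1, "inches"), 8)  # absolute: always 1 inch
```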
To make full use of these, make sure you understand the concept of viewports very well. I have to admit that I still have a lot to learn about it. If you really want to get on with it, I can recommend the book R Graphics by Paul Murrell.
Take a closer look at chapter 5 of that book. You can also learn a lot from the R code of the examples, which can also be found on this page.
To give you one example:
library(grid)
grid.newpage()
grid.circle(x=seq(0.1, 0.9, length.out=100),
            y=0.5 + 0.4*sin(seq(0, 2*pi, length.out=100)),
            r=abs(0.1*cos(seq(0, 2*pi, length.out=100))))
Perfectly scalable. If you look at the help page of grid.circle, you'll find the default.units="npc" argument. That is where, in this case, the correct coordinate system is set. Compare to
grid.circle(x=seq(0.1, 0.9, length=100),
y=0.5 + 0.4*sin(seq(0, 2*pi, length=100)),
r=abs(0.1*cos(seq(0, 2*pi, length=100))),
default.units="inch")
which is not scalable: the circles keep the same absolute positions and sizes in inches, however the window is resized.

Related

How to rasterize a single layer of a ggplot?

Matplotlib allows you to rasterize individual elements of a plot and save it as a mixed pixel/vector graphic (.pdf) (see e.g. this answer). How can the same be achieved in R with ggplot2?
The following is a toy problem in which I would like to rasterize only the geom_point layer.
library(ggplot2)
set.seed(1)
x <- rlnorm(10000, 4)
y <- 1 + rpois(length(x), lambda = x/10 + 1/x)
z <- sample(letters[1:2], length(x), replace = TRUE)
p <- ggplot(data.frame(x, y, z), aes(x = x, y = y)) +
  facet_wrap("z") +
  geom_point(size = 0.1, alpha = 0.1) +
  scale_x_log10() + scale_y_log10() +
  geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs"))
print(p)
ggsave("out.pdf", p)
When saved as .pdf as is, Adobe Reader DC needs about 1 s to render the figure. Below you can see a .png version:
Of course, it is often possible to avoid the problem by not plotting raw data.
Thanks to the ggrastr package by Viktor Petukhov & Evan Biederstedt, and later work by Teun van den Brand, it is now possible to rasterize any individual ggplot layer by wrapping it in ggrastr::rasterise(). (When this answer was first written, on 2018-08-13, only geom_point and geom_tile were supported.)
# install.packages('remotes')
# remotes::install_github('VPetukhov/ggrastr')
library(ggplot2)
ggplot(data.frame(x, y), aes(x = x, y = y)) +
  # this layer will be rasterized:
  ggrastr::rasterise(geom_point(size = 0.1, alpha = 0.1)) +
  # this one will still be "vector":
  geom_smooth()
Previously, only a few geoms were supported:
To use it, you had to replace geom_point with ggrastr::geom_point_rast.
For example:
# install.packages('devtools')
# devtools::install_github('VPetukhov/ggrastr')
library(ggplot2)
set.seed(1)
x <- rlnorm(10000, 4)
y <- 1+rpois(length(x), lambda = x/10+1/x)
z <- sample(letters[1:2], length(x), replace = TRUE)
ggplot(data.frame(x, y, z), aes(x=x, y=y)) +
facet_wrap("z") +
ggrastr::geom_point_rast(size=0.1, alpha=0.1) +
scale_x_log10() + scale_y_log10() +
geom_smooth(method="gam", formula = y ~ s(x, bs = "cs"))
ggsave("out.pdf")
This yields a pdf that contains only the geom_point layer as raster and everything else as vector graphic. Overall the figure looks like the one in the question, but zooming in reveals the difference:
Compare this to an all-raster graphic:
I think you've set yourself up to not have this question answered. You write:
I expect an answer to provide an extension to ggplot2 that allows you to export plots with rasterized layers with minimal changes to existing plotting commands, i.e. as a wrapper for geom_... commands or as an additional parameter to these, or a ggsave command that expects a list of unevaluated parts of a plot command (every second to be rasterized), not a hacky workaround as provided in the linked question.
This is a major development effort that could easily require several weeks or more of effort by a highly skilled developer. It's unlikely anybody will do this just because of a Stack Overflow question. In lieu of a functioning implementation, I'll describe here how one could implement what you're asking for and why it's rather challenging.
The players
Let's start with the key players we'll be dealing with. At the highest level sits the ggplot2 library. It takes data frames and turns them into figures. ggplot2 itself doesn't know anything about low-level drawing, though. It only deals with lines, polygons, text, etc., which it hands off to the grid library in the form of graphics objects (grobs).
The grid library itself is a fairly high-level library. It also doesn't know much about low-level drawing. It primarily deals with lines, polygons, text, etc., which it hands off to an R graphics device. The device does the actual drawing.
There are many different R graphics devices. Enter ?Devices in an R command line to see an incomplete list. There are vector-graphics devices, such as pdf, postscript, or svg, raster devices such as png, jpeg, or tiff, and interactive devices such as X11 or quartz. Obviously, rasterization as a concept only makes sense for vector-graphics devices, since raster devices rasterize everything anyway. Importantly, neither ggplot2 nor grid knows or cares which graphics device you're currently drawing on. They deal with graphical objects that can be drawn on any device.
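The hand-off from ggplot2 to grid to a device can be made visible with ggplotGrob(), which returns the grob tree (a gtable) that ggplot2 would otherwise draw directly. The sketch below uses mtcars only as example data and a temporary PDF file as the device.

```r
library(ggplot2)
library(grid)

p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# ggplot2 -> grid: the plot is converted into a tree of grobs (a gtable)
g <- ggplotGrob(p)
print(class(g))

# grid -> device: the same grob tree can be drawn on any device,
# here a temporary PDF file
f <- tempfile(fileext = ".pdf")
pdf(f)
grid.newpage()
grid.draw(g)
dev.off()
unlink(f)
```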
Ideal high-level interface
The high-level interface should consist of an option rasterize in the layer() function of ggplot2. In this way, one could simply write, e.g., geom_point(rasterize = TRUE) to rasterize the points layer. This would work transparently for all geoms and stats, since they all call layer().
Possible implementations
I see four possible routes of implementation, ordered from least to most feasible.
1. Ideally, the layer() function would simply hand off the rasterize option to the grid library, which would hand it off to the graphics device to tell it which parts of the plot to rasterize. This approach would require major changes in the graphics device API. I don't see this happening. Not in my lifetime, at least.
2. Alternatively, one could write a new grob type that can take any arbitrary grob and rasterize it on demand when the grob is drawn on a graphics device. This approach would not require changes in the graphics device API, but it would require detailed knowledge of the low-level implementation of the grid library. It would also possibly make interactive viewing of such figures very slow.
3. A slightly simpler alternative to 2. would be to rasterize the arbitrary grob only once, on grob construction, and then reuse it whenever that grob is drawn. This would be quite a bit faster on interactive graphics devices, but the drawing would get distorted if the aspect ratio is changed interactively. Nevertheless, since the primary use of this functionality would be to generate pdf output (I assume), this option might be sufficient.
4. Finally, rasterization could also happen in the layer() function, and that function could simply place a regular raster grob into the grob tree. That solution is similar to the technique described here. Technically, it's not much different from 3. Either way, one needs to write code to rasterize a grob tree and then replace it by a raster grob.
Technical hurdles
First, to rasterize parts of the grob tree, we'd have to send them to an R raster graphics device to render. However, there isn't one that renders to memory. So, one would have to render to a temporary file (e.g., using png()), and then read the file back in. That's possible but ugly. It also depends on functionality (such as png()) that isn't guaranteed to be available on every R installation.
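The temporary-file round trip described above can be sketched in a few lines. rasterize_grob() is a hypothetical helper, not part of ggplot2, grid or ggrastr; it assumes the png package is installed (for readPNG()) and that png() works on the machine. It also saves and restores the caller's current device, which is one of the pitfalls of opening a second device mentioned in the next paragraph.

```r
library(grid)

# Hypothetical helper: rasterize an arbitrary grob by drawing it on a
# temporary PNG device and reading the bitmap back (needs the 'png' package)
rasterize_grob <- function(grob, width = 4, height = 4, res = 150) {
  old_dev <- dev.cur()                  # remember the caller's device
  f <- tempfile(fileext = ".png")
  on.exit(unlink(f), add = TRUE)        # delete the temporary file at the end
  png(f, width = width, height = height, units = "in", res = res,
      bg = "transparent")
  grid.newpage()
  grid.draw(grob)
  dev.off()
  if (old_dev > 1) dev.set(old_dev)     # restore it (device 1 is the null device)
  # Replace the vector grob by a raster grob that fills its viewport
  rasterGrob(png::readPNG(f), width = unit(1, "npc"), height = unit(1, "npc"))
}

r <- rasterize_grob(circleGrob(r = 0.4, gp = gpar(fill = "steelblue")))
grid.newpage()
grid.draw(r)
```

Because the bitmap is captured once at a fixed size, this corresponds to options 3 and 4 above: fast to redraw, but distorted if the aspect ratio of the target viewport differs from the one used for rendering.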
Second, to render parts of the grob tree separately from the overall rendering, we'll have to open a new graphics device in addition to the one currently open. That's possible but can lead to unexpected bugs. I'm dealing with such bugs all the time, see e.g. here or here for issues related to code using this technique. Whoever implements the rasterization functionality would have to deal with such issues.
Finally, we'll have to get the rasterization code accepted into the ggplot2 library, since we need to replace the layer() function and I don't think there's a way to do that from a separate package. Given how hackish the rasterization solutions are going to be (see previous two paragraphs), that may be a tall order.

3D Line Plot with Datavisualization

I am looking for a way to draw a 3D line plot. Preferably I would like to use the Qt Data Visualization framework, but it does not seem to provide this out of the box.
I experimented a little bit and ended up using 3D surface plots (Surface3D) displaying the lines as surfaces (i.e. ribbons) like this:
While this works and looks okay in the picture above, the thickness of the line depends on the perspective. Rotating the plot always lets you find an angle at which the line disappears, since it has no thickness:
Is there a type of plot that would be better suited for this? I tested the bars which don't perform well for lots of samples and don't look nice in my application. I also tested scatterplots which are not suitable either.
If there isn't: Where would I start to implement this myself on top of the existing classes in the datavisualization framework? I am thinking about adding another surface "ribbon" in z direction, however that seems a little hackish.
I used the technique described as hackish above. While I am not too happy about the approach the overall look is quite okay:
So basically each data line consists of three QSurfaceDataRows that together form two 90° ribbons as can be seen here:

Raster map vs alternative

I recently found this web page, Crime in Downtown Houston, which I'm interested in reproducing. This is my first learning experience with mapping in R, and thus I lack the vocabulary and understanding necessary to make appropriate decisions.
At the end of the page David Kahle states:
One last point might be helpful. In making these kinds of plots, one might be tempted to use the map raster file itself as a background. This method can be used to make map plots much more quickly than the methods described above. However, the method has one very significant disadvantage which, if not handled properly, can destroy the entire purpose of using the map.
In very plain English, what is the difference between the raster file approach and his approach?
Does the RgoogleMaps package, which pulls a Google map into R, have the ability to produce the kind of high-quality maps seen on the page referenced above?
I ask not because I lack information but the opposite. There's too much and I want to make a good decision(s) about the approach to pursue so I'm not wasting my time on outdated or inefficient techniques.
Feel free to pass along any readings you think would benefit me.
Thank you in advance for your direction.
Basically, you had two options at the time this plot was made:
draw the map as a layer using geom_tile, where each pixel of the image is mapped onto the x,y axes (slow but accurate)
add a background image to the plot, as a purely "cosmetic" annotation. This method is faster, because you can use grid.raster which draws images more efficiently, but the image is not constrained by the axes of the plotting region. In other words, you have to manually adjust the x and y axes limits to make sure that the image corresponds to the actual positions on the plot.
Now, I would suggest you look at the new annotation_raster in ggplot2 v. 0.9.0. It should have the advantage of speed and leaner output files, and still conform to the data space of the plot. I believe that this function, as well as geom_raster and annotation_map did not exist when David made those plots.
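A minimal sketch of annotation_raster(), assuming made-up inputs: a tiny colour matrix stands in for a real map tile, and random points stand in for the crime data. The image is positioned with xmin/xmax/ymin/ymax in data coordinates, so it stays aligned with the axes.

```r
library(ggplot2)
set.seed(1)

# Made-up stand-ins: a small colour matrix instead of a real map tile,
# and random points inside the tile's bounding box
tile <- matrix(hcl.colors(12), nrow = 3, ncol = 4)
pts <- data.frame(lon = runif(50, -95.40, -95.35),
                  lat = runif(50, 29.74, 29.78))

p <- ggplot(pts, aes(lon, lat)) +
  # The image is placed in data coordinates, so it stays aligned with the axes
  annotation_raster(tile, xmin = -95.40, xmax = -95.35,
                    ymin = 29.74, ymax = 29.78) +
  geom_point(colour = "red")
print(p)
```

Note that annotation_raster() only works with Cartesian coordinate systems.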

Extending ggplot2 properly?

Recently a few neat uses of ggplot2 have come up, and either partial or full solutions have been posted:
ggheat
Curly braces
position_dynamic
ggheat is notable because it rather breaks the ggplot metaphor by just plotting rather than returning an object.
The curly brace solutions are notable because none really fits in the ggplot2 high-level concept (e.g. you should be specifying the range of points you want to brace, and then somewhere else be able to specify the geom of how you want that range displayed: brace, box, purple cow, etc.).
The ggplot2 book (which I will order soon; I have read the two online chapters) seems to be about using the grammar and functions rather than writing new ones or extensively extending existing ones.
I would like to learn to add a specific feature or develop a new geom, and do it properly. ggplot2 may not be intended as a general graphics package in the same way that grid or base graphics are, but there are a great many graphs which are only a step or two extension from an existing ggplot2 geom. When these situations come up, I can typically put together enough objects to do something once, but what if I need the same plot a few dozen times? What if other people like it and want to use it--now they have to kludge through the same process each time they want that graph. It seems to me that the proper solution is to add in a stat_heatplot and geom_heatplot, or to add a geom_Tuftebox for Tufte box plots, etc. Yet I've never seen an example of actually extending ggplot2; just examples of how to use it.
What resources exist to dig deeper into ggplot2 and start extending it? I'm particularly interested in a high-level way to specify a range on an axis as described above, but general knowledge about what makes ggplot2 tick is welcome as well.
Absent a coherent guide (which rarely exists for sufficiently advanced tinkering and therefore may not exist here), how would one go about learning about the internals? Inspecting source is obviously one way, but what functions to start with, etc.
ggplot2 is gradually becoming more and more extensible. The development version, https://github.com/hadley/ggplot2/tree/develop, uses roxygen2 (instead of two separate homegrown systems), and has begun the switch from proto to simpler S3 classes (currently complete for coords and scales). These two changes should hopefully make the source code easier to understand, and hence easier for others to extend (backed up by the fact that pull requests for ggplot2 are increasing).
Another big improvement that will be included in the next version is Kohske Takahashi's improvements to the guide system (https://github.com/kohske/ggplot2/tree/feature/new-guides-with-gtable). As well as improving the default guides (e.g. with elegant continuous colour bars), his changes also make it easier to override the defaults with your own custom legends and axes. This would make it possible to draw the curly braces in the axes, where they probably belong.
The next big round of changes (which I probably won't be able to tackle until summer 2012) will include a rewrite of geoms, stats and position adjustments, along the lines of the sketch in the layers package (https://github.com/hadley/layers). This should make geoms, stats and position adjustments much easier to write, and will hopefully foster more community contributions, such as a geom_tufteboxplot.
I am not certain that I agree with your analysis. I'll explain why, and will then point you to some resources for writing your own geoms.
ggheat
As far as I can tell, ggheat returns an object of class ggplot. Thus it is a convenient wrapper around ggplot, customised for a specific use case. Although qplot is far more generic, it does in principle the same thing: it is a wrapper around ggplot that makes some informed guesses about the data and chooses sensible defaults. Hadley calls these plot functions, and they are described briefly on page 181 of the ggplot2 book.
curly braces
The curly brace solution does exactly what the ggplot philosophy says, i.e. separate data from presentation. In this case, the data is generated by a little custom function and is stored in a data.frame. It is then displayed using a geom that makes sense, i.e. geom_line.
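To make the data/presentation split concrete, here is a sketch in that spirit. brace_data() is a hypothetical helper written for this example (it is not the originally posted solution): it builds each half of a horizontal brace from two logistic steps and returns a plain data.frame, which an ordinary geom then draws. mtcars is used only as example data.

```r
library(ggplot2)

# Hypothetical helper: coordinates of a horizontal curly brace spanning
# x0..x1, with its base at y and its tip at y + height. Each half is made of
# two logistic steps (plogis); the right half mirrors the left one.
brace_data <- function(x0, x1, y, height = 1, n = 50, k = 50) {
  u <- seq(0, 1, length.out = n)
  rise <- height / 2 * (plogis(k * (u - 0.1)) + plogis(k * (u - 0.9)))
  xm <- (x0 + x1) / 2
  data.frame(
    x = c(x0 + u * (xm - x0), xm + u[-1] * (x1 - xm)),
    y = y + c(rise, rev(rise)[-1])  # drop the duplicated tip point
  )
}

brace <- brace_data(x0 = 2, x1 = 3, y = 34, height = 1.5)

# The brace is just data, displayed with an ordinary geom
p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  geom_line(data = brace, aes(x, y), inherit.aes = FALSE)
print(p)
```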
quo vadis?
You have noted (in the r chat room) that you would prefer to have a more generic approach to plotting the curly braces. Something along the following lines (and I paraphrase and extend at the same time):
Supply data in the form of bounding-box coordinates (i.e. x0, x1, y0 and y1)
Specify a "statistic", such as brace, box or whatever
Specify a geom, such as geom_custom_shape
This sounds like a nice generalisation and extension of the ideas behind the curly brace solution, and would clearly require writing a new geom. There is an official ggplot wiki, where you can find instructions for creating a new geom.
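For readers on a current ggplot2: the proto-based internals discussed on this page have since been replaced by the ggproto system, and ggplot2 ships an "Extending ggplot2" vignette. The sketch below follows the convex-hull example from that vignette; it defines a new Stat and a stat_*() constructor via layer(), and is an illustration rather than a recipe for the brace idea above.

```r
library(ggplot2)

# A new Stat: for each group, keep only the points on the convex hull
StatChull <- ggproto("StatChull", Stat,
  compute_group = function(data, scales) {
    data[chull(data$x, data$y), , drop = FALSE]
  },
  required_aes = c("x", "y")
)

# User-facing constructor, following the usual stat_*() conventions
stat_chull <- function(mapping = NULL, data = NULL, geom = "polygon",
                       position = "identity", na.rm = FALSE,
                       show.legend = NA, inherit.aes = TRUE, ...) {
  layer(
    stat = StatChull, data = data, mapping = mapping, geom = geom,
    position = position, show.legend = show.legend,
    inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...)
  )
}

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  stat_chull(fill = NA, colour = "black")
print(p)
```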
Why do you want to extend it? What is the motivation? As I see it, ggplot2 is meant to be a high-level graphics package designed to produce nice figures from a particular data set. It does some things right and makes other things easy, like scales, legends, etc. ggplot2 is not meant to be a general-purpose graphics toolkit. Like lattice, it has a particular paradigm in mind, and you use it for that purpose.
grid is the underlying graphical toolkit you want to use to do general purpose, customised plotting. And IIRC, it is relatively easy to add grid grobs to lattice or ggplot2 plots/objects, for this sort of arbitrary notation/annotation etc.
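For ggplot2, annotation_custom() is one way to drop an arbitrary grid grob into a plot, positioned in data coordinates. The text, colour and position below are arbitrary choices for illustration, with mtcars as example data.

```r
library(ggplot2)
library(grid)

# Any grid grob can be placed inside the plot panel, in data coordinates
note <- textGrob("heavy cars", gp = gpar(col = "red", fontface = "italic"))

p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  annotation_custom(note, xmin = 4.5, xmax = 5.5, ymin = 30, ymax = 34)
print(p)
```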
What doesn't make too much sense is extending ggplot2 or lattice along the lines you are thinking. I don't see why ggplot2 can't do heatplots as it is. Or am I missing something here?
What would be very useful would be if the data processing guts of ggplot2 or lattice were available for others to write actual plotting code on top of. Hadley has mentioned this somewhere before.
ggplot2 in particular, but also lattice, are quite difficult code bases to get into and read/understand. ggplot2 uses the proto package for a version of OOP, which means you need to understand what that is doing as well as ggplot2 semantics. lattice is similar, as there is a lot of computing on the language done there that, if you are not familiar with that sort of R programming, can be quite intimidating, daunting and impenetrable!
For grid, I suggest you look at Paul Murrell's R Graphics book, a second edition of which is with the publisher: http://www.stat.auckland.ac.nz/~paul/RG2e/
Edit: The point I was intending to get across was that the interfaces provided by packages like ggplot2 and lattice are necessarily high-level. Extending them is fine as long as they stick to the paradigm/philosophy in use. Heatplots can already be made by using existing geoms; part of the philosophy of the ggplot system is to separate the data from the display/presentation, and to use geoms in interesting ways to produce the desired display.
Wrapping base ggplot + geom calls into a more user-friendly function is OK as long as i) it works like ggplot already does and returns an object, and ii) it doesn't have an interface that is too different from the way ggplot works. Developers are free to write whatever code they want; it just isn't helpful to the wider community to provide wrappers that move too far away from the original's workings. That leads to confusion on the part of the user and doesn't foster learning of ggplot2 itself.
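Both points can be shown in one short sketch: a heatplot built only from the existing geom_tile(), wrapped in a hypothetical convenience function gg_heatmap() that returns a ggplot object instead of drawing it. The scaled mtcars matrix is just example input.

```r
library(ggplot2)

# Hypothetical wrapper: reshape a numeric matrix to long format and draw it
# with the existing geom_tile(); the ggplot object is returned, not printed
gg_heatmap <- function(m, ...) {
  df <- data.frame(
    row   = rep(rownames(m), times = ncol(m)),  # as.vector() is column-major
    col   = rep(colnames(m), each = nrow(m)),
    value = as.vector(m)
  )
  ggplot(df, aes(x = col, y = row, fill = value)) + geom_tile(...)
}

# Because an object is returned, users can keep adding ggplot2 components
p <- gg_heatmap(scale(as.matrix(mtcars))) +
  scale_fill_gradient2() +
  labs(x = NULL, y = NULL)
print(p)
```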
The dynamic positioning idea is interesting; you could include these ideas in all plotting packages. You could bolt this into a geom, or alternatively as an external function that modified the input coordinates to produce a new data object that could be used by the relevant geom. That same function could be used for other plotting packages - it wouldn't need to be ggplot-specific.
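A minimal sketch of the "external function" route: spread_labels() is a hypothetical helper that works on a plain numeric vector, pushing positions apart (in sorted order) until consecutive values are at least gap apart. Its output is ordinary data, so it could feed geom_text() or base graphics' text() alike.

```r
library(ggplot2)

# Hypothetical helper, independent of any plotting package: shift y positions
# upwards (in sorted order) so consecutive labels are at least 'gap' apart
spread_labels <- function(y, gap) {
  o <- order(y)
  ys <- y[o]
  for (i in seq_along(ys)[-1]) ys[i] <- max(ys[i], ys[i - 1] + gap)
  y[o] <- ys
  y
}

spread_labels(c(3, 1, 1.05), gap = 0.2)  # 3.0 1.0 1.2

# The adjusted coordinates are ordinary data, usable by any geom
df <- data.frame(x = 1, y = c(3, 1, 1.05), label = c("a", "b", "c"))
df$label_y <- spread_labels(df$y, gap = 0.2)
p <- ggplot(df, aes(x, label_y, label = label)) + geom_text()
```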

Plotting large numbers with R, but not all numbers are being shown

I am trying to render 739455 data points on a graph using R, but on the x-axis I cannot view all those numbers. Is there a way I can do that?
I am new to R.
Thank you
As others suggested, try hist, hexbin, plot(density(node)), as these are standard methods for dealing with more points than pixels. (I like to set hist with the parameter breaks = "FD" - it tends to have better breakpoints than the default setting.)
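A minimal sketch of the two base-R summaries mentioned above, assuming made-up data: rnorm() values stand in for the asker's node vector (same length). hexbin is left out because it is a separate package.

```r
set.seed(1)
# Made-up stand-in for the asker's vector 'node' (same length)
node <- rnorm(739455)

# Histogram with Freedman-Diaconis breaks, and a kernel density estimate:
# both summarise the points instead of drawing every one of them
h <- hist(node, breaks = "FD", main = "breaks = \"FD\"")
plot(density(node), main = "Kernel density")
```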
Where you may find some joy is in using the iplots package, an interactive plotting package. The corresponding commands include ihist, iplot, and more. As you have a Mac, the more recent Acinonyx package may be even more fun. You can zoom in and out quite easily. I recommend starting with the iplots package as it has more documentation and a nice site.
If you have a data frame with several variables, not just node, then being able to link the different plots such that brushing points in one plot highlights them in another will make the whole process more stimulating and efficient.
That's not to say that you should ignore hexbin and the other ideas - those are still very useful. Be sure to check out the options for hexbin, e.g. ?hexbin.

Resources