How to save ggplots to PDF with a smaller file size in R?

I am having trouble saving numerous ggplots to a PDF because the plots (scatter plots and boxplots) are built from 12 million rows (lots of observations).
The problem appears when I save the plots as a PDF using:
ggsave("my_plots.pdf", myArrangedPlots)
The PDF is very large: 90 MB for only 120 pages.
When I save one plot as PNG using:
ggsave("plot1.png", plot1)
the file is much smaller than the same single plot saved as PDF (1 MB for the PDF vs 0.1 MB for the PNG).
I think the reason is that ggplot saves the plots in vector format inside the PDF file to get maximum resolution, but I don't need that much resolution. Also note that when millions of points are stored in vector format, the file is going to be larger than the same plot as a PNG, because a PNG stores only pixels, not every individual point.
I want to save the plots in PDF format, but with the plots embedded as PNGs instead of vector graphics, to make the PDF file smaller.
Is there any parameter in ggplot2 to achieve this, or is there a workaround?

Looking at the documentation of pdf(), its parameters seem to be compatible with ggsave().
I found a parameter called useDingbats. By default it is set to FALSE, but if you set it to TRUE, the PDF size drops drastically, from 94 MB to 10 MB in my case.
So I use it like this:
ggsave("myplots.pdf", arrangedPlots, useDingbats = TRUE)
NOTE: setting useDingbats to TRUE renders small circles with the Dingbats font, which reduces the size of the final PDF a lot for scatter plots and for boxplots with many outlier points.
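The effect can be checked on a small case before regenerating a large file. The following is a minimal sketch using base graphics, so it runs without ggplot2; it relies on the observation above that ggsave() passes extra arguments through to pdf(). The data, the file names and the helper write_pdf are made up for illustration, and the sizes you get will depend on your data and R version.

```r
# Hypothetical data: many small open circles, as in a dense scatter plot.
set.seed(1)
n <- 20000
x <- rnorm(n)
y <- x + rnorm(n)

# Illustrative helper (not part of any package): write the same plot
# with or without Dingbats circles and return the file size in bytes.
write_pdf <- function(file, dingbats) {
  pdf(file, useDingbats = dingbats)
  plot(x, y, pch = 1, cex = 0.5)
  dev.off()
  file.size(file)
}

size_plain    <- write_pdf(file.path(tempdir(), "plain.pdf"), FALSE)
size_dingbats <- write_pdf(file.path(tempdir(), "dingbats.pdf"), TRUE)

# Compare the two sizes in KB.
print(round(c(plain = size_plain, dingbats = size_dingbats) / 1024))
```

If the Dingbats version is not noticeably smaller for your kind of plot, rasterizing to PNG is the remaining option.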

Related

How to export the graphs produced in R?

As a beginner in R, I manage to produce correct relational graphs. They look about right in the viewer, but the quality is very bad when I export them to PDF or JPEG/PNG: the image is not centered, part of the legend is missing, the graph is very small or blurred, etc.
How do you proceed for the export?
Thanks in advance!
I am looking for the right procedure or settings to export the graph visualizations produced in R.
I understand that I have to set up the viewer space via code, but I have no idea how to do it...
Here is an example of how to write a plot to a PNG file. The plotting commands are wrapped between png(...) and dev.off(). Several options are available to configure size and resolution.
png("myfile.png", width=1600, height=1200, res=300) # good for LaTeX or Word
#png("myfile.png", width=800, height=600, res=150) # good for Powerpoint or Impress
plot(iris$Sepal.Length, iris$Petal.Length)
dev.off()
Some hints:
width and height are given in pixels
res influences nominal resolution and font size (play with it)
use at least 300 dpi (dots per inch). For a size in centimeters, the number of pixels = 300/2.54 * width in cm
professionals use 600 or even 1200 pixels per inch, but then .docx and .pptx files will increase dramatically in size
1600 x 1200 px is good for a printed size of roughly 13.5 x 10 cm at 300 dpi
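The pixel arithmetic from the hints can be wrapped in a small helper. This is only a sketch: cm_to_px is a hypothetical name, and the 13.5 x 10 cm target size and the output file name are example values.

```r
# Hypothetical helper: pixels needed for a print size given in cm.
cm_to_px <- function(cm, dpi = 300) round(cm / 2.54 * dpi)

width_px  <- cm_to_px(13.5)  # 1594
height_px <- cm_to_px(10)    # 1181

# Use the computed size for the PNG device; res should match the dpi above.
out <- file.path(tempdir(), "myfile.png")
png(out, width = width_px, height = height_px, res = 300)
plot(iris$Sepal.Length, iris$Petal.Length)
dev.off()
```

Alternatively, png() accepts the physical size directly via units = "cm" (or "in") together with res, which avoids the manual conversion.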
If you work with LaTeX, it is in most cases better to use PDF for the figures. Another very good idea is to use Markdown for the text. Then, figures are automatically embedded.

How to speed up scrolling of PDF pages with large data plots (e.g. trace plots)

I am preparing a LaTeX document and a slide show for my Bayesian analysis results. Trace plots generated by the "coda" package in R are very large. By size I mean both file size (KB) and loading time. When I scroll through the PDF files on a slow computer or an iPad, it takes quite a lot of time to load the pages that contain trace plots. Is there any way to "lighten" those plots so that the scrolling time decreases substantially (such as converting to another format without losing much detail)?
Note: I am using RStudio and knitr to produce LaTeX documents.
For example, I generated a plot using the following code. If I export it to a single-page PDF document, the size of the PDF is 439 KB (compared to 7 KB for basic plots).
library(coda)
temp <- mcmc(matrix(rnorm(100000),ncol=1))
traceplot(temp)
I would recommend you dump the images not as pdf, but as png. If you ensure that the png has a high enough resolution, it will be hard to see the difference between the pdf and the png. The png will be much faster than the pdf, speeding up scrolling.
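A minimal sketch of that suggestion, writing the same trace to both formats so the sizes can be compared. To keep it runnable without coda, a plain line plot of random draws stands in for traceplot(); the pixel dimensions, resolution and file names are assumptions.

```r
set.seed(1)
draws <- rnorm(100000)  # stand-in for one MCMC chain

# Raster version: 8 x 4 inches at 300 dpi.
png_file <- file.path(tempdir(), "trace.png")
png(png_file, width = 2400, height = 1200, res = 300)
plot(draws, type = "l", xlab = "Iterations", ylab = "")
dev.off()

# Vector version of the same plot, for comparison.
pdf_file <- file.path(tempdir(), "trace.pdf")
pdf(pdf_file, width = 8, height = 4)
plot(draws, type = "l", xlab = "Iterations", ylab = "")
dev.off()

# File sizes in KB.
sizes <- file.size(c(png_file, pdf_file))
names(sizes) <- c("png", "pdf")
print(round(sizes / 1024))
```

In a knitr document, setting the chunk options dev = "png" and dpi = 300 should give the same result without calling the devices by hand.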
PDF would have the advantage to scale, but the disadvantage is the rendering of bigger vector data.
In order to keep the scalability, what can be done is flattening and simplifying the "plot output" (I am sure that curves are split up into hundreds of minuscule straight lines). There should be tools out there which can do it (if needed get the PDF into Illustrator and do it there).
But even with simplifying, you may eventually get beyond the tolerable limits, and in this case, rasterizing the plot is the way to go. PNG has been suggested as format; TIFF would work as well. However, NEVER EVER do JPEG from plots; the quality would become horrendously bad.

Reduce PDF file size of plots by filtering hidden objects

While producing scatter plots of many points in R (using ggplot() for example), there might be many points that are behind the others and not visible at all. For instance see the plot below:
This is a scatter plot of several hundred thousand points, but most of them are hidden behind other points. The problem is that when the output is written to a vector file (a PDF file, for example), the invisible points make the file very large and increase memory and CPU usage while viewing it.
A simple solution is to write the output to a bitmap picture (TIFF or PNG, for example), but that loses the vector quality and can be even larger in size. I tried some online PDF compressors, but the result was the same size as my original file.
Is there any good solution? For example, some way to filter out the points that are not visible, either while generating the plot or afterwards by editing the PDF file?
As a start you can do something like this:
set.seed(42)
DF <- data.frame(x=x<-runif(1e6),y=x+rnorm(1e6,sd=0.1))
plot(y~x,data=DF,pch=".",cex=4)
PDF size: 6334 KB
DF2 <- data.frame(x=round(DF$x,3),y=round(DF$y,3))
DF2 <- DF[!duplicated(DF2),]
nrow(DF2)
#[1] 373429
plot(y~x,data=DF2,pch=".",cex=4)
PDF size: 2373 KB
With the rounding you can control how many values you want to remove. You only need to modify this to handle the different colours.
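One way to handle colours is to include the colour group in the duplicate check, so a point is only dropped when another point of the same group already occupies the same rounded position. This is a sketch under assumptions: the grp column, the two-group data and the rounding to 2 digits are made up for illustration.

```r
set.seed(42)
n <- 1e5
DF <- data.frame(x = runif(n))
DF$y <- DF$x + rnorm(n, sd = 0.1)
DF$grp <- sample(c("a", "b"), n, replace = TRUE)  # hypothetical colour group

# Rounded coordinates plus the group form the key for duplicate detection.
key <- data.frame(x = round(DF$x, 2), y = round(DF$y, 2), grp = DF$grp)
DF_thin <- DF[!duplicated(key), ]

# Number of points before and after thinning.
print(c(before = nrow(DF), after = nrow(DF_thin)))

plot(y ~ x, data = DF_thin, pch = ".", cex = 4,
     col = ifelse(DF_thin$grp == "a", "black", "red"))
```

Note that this keeps the first point seen in each cell per group, so where groups overlap the drawing order can still differ from the full plot.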
Simply saving the plot as a high-res PNG file will cut the size drastically while keeping the quality more than good enough. At least I've never had journals complain about any of the PNGs I sent them; just make sure to use > 600 dpi.
I think it might be done with some post-processing of the PDF file. On Linux, if I have to reduce a PDF, I would do
pdf2ps input.pdf output.ps
ps2pdf output.ps output.pdf
which for some reason works quite efficiently.
You can see some discussion at https://askubuntu.com/questions/113544/how-to-reduce-pdf-filesize.

How to resize pdf graphics produced in R using Illustrator

I am making plots in R using the pdf() command. Graphs look perfect and resize nicely in Acrobat Reader. My usual workflow includes manipulating labels etc. in Illustrator, saving as .eps for submission to publishers or inserting in Word. All works fine for single graphs.
Now I am trying to combine 4 graphs into one by manually putting them together in an A4 Illustrator document. However, when I resize the standard 7x7 inch PDF graph in Illustrator to fit in one column of an A4 page (ca. 3.4 inches wide), all proportions get screwed up, e.g. lines and symbol outlines become way too thick. Using pdf(..., width=3.4, height=3.4) in R messes up all the symbol and font sizes so carefully chosen to produce the original graph. Why can't I resize the graph within Illustrator the same way I can resize the PDF e.g. in Acrobat Reader?
Illustrator is scaling everything including the stroke thickness, used to draw lines and symbols that have been converted to paths, and the font size for any text not converted to a path. (As I don't have Illustrator I can't say whether Illustrator treats "text" as text or as paths when opening the pdf.)
When Adobe Acrobat Reader displays the pdf it is just showing a rasterised view of the current file so just scales everything nicely as you wish.
I see two options: either create the 2x2 plot directly in R and export that to PDF with the correct dimensions, or reduce the margins and font size used in each plot and export them at the desired width/height using the command you showed.
The first option can be achieved via:
pdf("attempt1.pdf", ....)
layout(matrix(1:4, ncol = 2, byrow = FALSE)) ## byrow = TRUE for fill-by-row
## all 4 plot calls go in here
layout(1)
dev.off()
You may need to tweak the point size used in the pdf() device and somewhat adjust the cex.??? settings for some bits of the plot to tailor this exactly how you want it.
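A filled-in sketch of the first option, with assumed values: a 6.8 x 6.8 inch device (two 3.4 inch columns), pointsize 10, slightly reduced margins, and four placeholder scatter plots from the iris data standing in for the real plot calls.

```r
out <- file.path(tempdir(), "four_panels.pdf")

pdf(out, width = 6.8, height = 6.8, pointsize = 10)
layout(matrix(1:4, ncol = 2, byrow = TRUE))  # fill the 2 x 2 grid by row
op <- par(mar = c(4, 4, 2, 1) + 0.1)         # tighter margins per panel

# Placeholder plots; replace with the four real plotting calls.
for (v in c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")) {
  plot(iris[[v]], iris$Petal.Length, xlab = v, ylab = "Petal.Length", main = v)
}

par(op)
layout(1)
dev.off()
```

Because the figure is drawn at its final size, line widths and text keep the proportions set in R, and no rescaling is needed in Illustrator.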
Alternatively, you need to reduce the pointsize and margins and draw each plot on the 3.4 by 3.4 inch device. Something like this will get you started:
pdf("attempt2.pdf", height = 3.4, width = 3.4, pointsize = 10)
op <- par(mar = c(4,3,3,1) + 0.1) ## one line less per margin
## your single plotting call here
par(op)
dev.off()
See ?par for the list of ways to control the plot margins & other parameters you might wish to set to control the quality of the final plot. You may wish to look into the cex.foo parameters to control the relative sizing of the text on the plots, but this is all relative to the base pointsize you set when you create the pdf() device.

Producing a vector graphics image (i.e. metafile) in R suitable for printing in Word 2007

First a caveat: I posted this question here on SuperUser, but it is clearly the wrong place to ask R questions. I recognize that it is not directly a programming question, but I believe it can be solved by changing how plots are produced (i.e. by coding appropriately). So I hope readers find this appropriate for the forum.
R plots usually consist entirely of vector graphics elements (i.e. points, lines, polygons, text). R permits you to save your figure (or copy-paste) in various formats including various raster formats, as a PDF, or as a Windows meta-file.
I usually save my images as PDFs and print them. This renders the images exactly as I intended them on paper, in the highest quality. I avoid raster formats (e.g. JPG, TIFF) for printing as inevitably the quality is poorer and publishers prefer vector formats.
However, I need to make a large multi-page desktop published document using Microsoft Word 2007, and therefore using PDFs is not an option. When I import my figures from meta-files, or copy and paste directly from R into Word both the screen and print rendering of the image changes slightly (e.g. polygons and their fills become slightly misaligned).
Given that I want to retain high vector quality (and not use raster formats), what can I do to make R vector graphics work with Word? (Of course Sweave and LaTeX would be nice, but again, not a realistic option).
Consider this example:
plot(c(1:100), c(1:100), pch=20)
## Copy and paste to Word 2007 as Windows metafile
## Print
## Quality is poorer (e.g. dot fills misaligned with borders)
pdf("printsPerfectly.pdf")
plot(c(1:100), c(1:100), pch=20)
dev.off()
## Now print PDF
## Quality is as expected
EDIT: Further to suggestions by @John, I produced it as an EPS PostScript file (see below) and inserted it as a picture into Word. Because ultimately it will be printed from a PDF created from Word, I converted it to a PDF using default Word 2007 settings, printed it on my HP Laserjet P1606dn laser printer, and then took a photograph to illustrate the issue of polygon borders and fills misaligning (image on left, below). I also produced it directly as a PDF from R using pdf(), printed the PDF and took a photograph (image on right, below).
It may seem like small potatoes! But when you have gone to a lot of trouble to achieve high quality, it is disappointing to be thwarted at the end. In addition, it is not really obvious here, but the numerals are not as high-quality (left) as in the PDF (right), disregarding differences in focus on the photograph.
The accepted answer is not acceptable to me, since if one goes to the trouble of making a nice vector-based figure, the last thing one would like to do is rasterize it to a bitmap... unless it's an incredibly complex graph that takes ages to render in vector format, or something like that, but for most graphs that's not the case.
The best solution is to export to Word directly in native Office vector format. I just made a new package, export, that allows one to do exactly that and allows export of either graphs or statistical tables to Word and Powerpoint, see
https://cran.r-project.org/web/packages/export/index.html and for demo see
https://github.com/tomwenseleers/export
For example:
library(devtools)
devtools::install_github("tomwenseleers/export")
library(export)
?graph2ppt
?graph2doc
?table2ppt
?table2doc
## export of ggplot2 plot
library(ggplot2)
qplot(Sepal.Length, Petal.Length, data = iris, color = Species,
size = Petal.Width, alpha = I(0.7))
# export to Word
graph2doc(file="ggplot2_plot.docx", width=7, height=5)
# export to Powerpoint
graph2ppt(file="ggplot2_plot.pptx", width=7, height=5)
You can also export to enhanced metafile using the function
graph2emf(file="ggplot2_plot.emf", width=7, height=5)
but the quality of the native Office format is better.
For final production you can also readily print it to PDF from Powerpoint if need be, and it will stay nicely in vector format then.
Your only option is to use high-resolution raster graphics. Once you're over 300 dpi, the print will be completely indistinguishable from vector output; higher resolutions will just make larger files. Your copy-and-paste method comes in at 72 dpi and will look terrible. If you import from a file, you get the resolution stored in the file and things will be much better. Fortunately, Office 2007 is supposed to handle PNG images, which have the best compression for typical graphs. Let's say you wanted the image 4" wide and 6" high...
png('printsGreat.png', width = 4, height = 6, units = 'in', res = 300)
plot(c(1:100), c(1:100), pch=20)
dev.off()
Also, Office 2007 is supposed to be able to handle EPS files and R postscript files are by default EPS compatible when you print one page.
postscript("printsPerfectly.eps", width = 4, height = 6, horizontal = FALSE, onefile = FALSE)
plot(c(1:100), c(1:100), pch=20)
dev.off()
But if you don't have luck with them go back to the high resolution image.
My preferred solution is to use the windows metafile device for plotting, e.g.:
win.metafile("mygraph.wmf")
print(gg1)
dev.off()
This produces a *.wmf file that can be copy-pasted into the Word file.
The devEMF package seems to produce graphics that look nicer than the default wmf when pasted into PowerPoint.
Since I tried to produce png at high res in R and it didn't seem to work on my PC (if I set the resolution higher than, say, 300 dpi, R would produce an error like "cannot start png device"), the way I found was to save the figure using postscript() and then use GSView to convert the ps file into png with 600 dpi resolution. MS Word consumes the png's happily and the quality of print seems to be perfect.
What @Tom Wenseleers said:
The current best answer above to me is not acceptable, since if one
goes to the trouble of making a nice vector based figure, the last
thing one would like to do is just rasterize it to a bitmap... Unless
it's an incredibly complex graph that takes ages to render in vector
format, or something like that, but for most graphs that's not the
case.
For me, there is a new best answer to this question, since graph2ppt and graph2doc tend to move axis labels around (which apparently cannot be fixed; see here: https://github.com/davidgohel/rvg/blob/master/R/body_add_vg.R and here: export::graph2office moves axis labels around).
I think that .svg is the most appropriate vector format for publication graphics. The only drawback is that older versions of e.g. MS Word cannot handle it. In R, you could use the native grDevices::svg device. However, I'd recommend using CairoSVG from the Cairo package, especially when you are working with non-native fonts (e.g. via the extrafont package), because in contrast to grDevices::svg, Cairo::CairoSVG embeds fonts quite nicely (without relying on Ghostscript, if I am right).
If you are working with an older version of MS Word, you could use Inkscape (a free vector graphics editor) to convert your graph to .wmf, for example (which might be better than printing to .wmf directly, because R rasterizes points when exporting .wmf files).
An example:
## create plot
library(ggplot2)
library(extrafont)
# note: if you want to use fonts other than the standard ones (in this example "ChantillyLH"),
# you must register your fonts via
# font_import() ## run only once (type "y" in the console)
# and
# loadfonts(device = "win") ## run only once.
# Otherwise, the extrafont package is not needed.
beautiful_plot <-
  ggplot(data = iris, mapping = aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point() +
  theme(text = element_text(size = 18,
                            family = "ChantillyLH"))
# export SVG
library(Cairo)
CairoSVG("My_Path/My_Plot.svg", width = 6, height = 6)
print(beautiful_plot)
dev.off()
# the resulting SVG file is in the "My_Path" folder.
In Inkscape, it looks like this:
Newer versions of Word can import vector graphics from SVG files. R 3.6.2 has built-in support for creating SVG files with the svg function, so no extra packages are needed.
Your example then becomes
svg("printsPerfectly.svg", width=4, height=4)
plot(c(1:100), c(1:100), pch=20)
dev.off()
Note that there is a known issue when you try to create PDF files from Word documents with embedded SVG files with thin lines. If you are using thin lines, e.g. with lwd=0.7 somewhere, you need to apply this workaround.
