Combining vector and bitmap graphics in a PDF - R

When plotting images or heatmaps to PDFs, as in the example below, they are saved as vector objects in which every pixel of the image (or cell of the heatmap) is represented by a square. Even at modest resolutions this results in unnecessarily large files that also render poorly on some devices. Is there a way to make R save only the image area as a PNG or JPEG embedded in the PDF, but keep text, axes, annotations etc. as vector graphics?
I'm asking because I often print R graphics, sometimes on large posters, and would like to combine the best of both worlds. Of course I could save the entire figure as a high-resolution PNG, but that would not be as elegant; or I could combine the parts manually, e.g. in Inkscape, but that is quite tedious.
my.func <- function(x, y) x %*% t(y)
pdf(file = "myPlot.pdf")
image(my.func(seq(-10, 10, length.out = 500), seq(-5, 15, length.out = 500)), col = heat.colors(100))
dev.off()
Thanks for your time, ideas and hopefully solutions!

Use ?rasterImage or, more conveniently in recent versions of R, image() with the option useRaster = TRUE.
That will dramatically reduce the size of the file.
my.func <- function(x, y) x %*% t(y)
pdf(file = "image.pdf")
image(my.func(seq(-10, 10, length.out = 500), seq(-5, 15, length.out = 500)), col = heat.colors(100))
dev.off()
pdf(file = "rasterImage.pdf")
image(my.func(seq(-10, 10, length.out = 500), seq(-5, 15, length.out = 500)), col = heat.colors(100), useRaster = TRUE)
dev.off()
file.info("image.pdf")$size
file.info("rasterImage.pdf")$size
image.pdf: 813229 bytes
rasterImage.pdf: 16511 bytes
See more details about the new features here:
http://developer.r-project.org/Raster/raster-RFC.html
http://journal.r-project.org/archive/2011-1/RJournal_2011-1_Murrell.pdf
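For finer control, the same idea can be sketched with rasterImage() directly: the heatmap cells go into the PDF as one embedded bitmap, while the axes, labels and box are drawn afterwards as vector graphics. This is a minimal sketch using the matrix from the question. The colour mapping via cut() into 100 equal-width bins only approximates what image() does, and the output file name is hypothetical.

```r
# Data from the question: a 500 x 500 outer product
xs <- seq(-10, 10, length.out = 500)
ys <- seq(-5, 15, length.out = 500)
z <- outer(xs, ys)

# Map each value to one of 100 heat colours (equal-width bins over range(z))
pal <- heat.colors(100)
cols <- matrix(pal[cut(z, breaks = 100, labels = FALSE)], nrow = nrow(z))

# z[i, j] belongs at (xs[i], ys[j]); a raster's first row is drawn at the top,
# so transpose and flip the rows before converting
ras <- as.raster(t(cols)[ncol(cols):1, ])

pdf("rasterImage-manual.pdf")  # hypothetical output file
plot.new()
plot.window(xlim = range(xs), ylim = range(ys), xaxs = "i", yaxs = "i")
# interpolate = FALSE keeps cell edges sharp instead of letting viewers smooth them
rasterImage(ras, min(xs), min(ys), max(xs), max(ys), interpolate = FALSE)
axis(1); axis(2); box()  # vector annotations drawn on top of the bitmap
title(xlab = "x", ylab = "y")
dev.off()
```

Only the rectangle passed to rasterImage() is a bitmap, so anything added with axis(), text() or legend() stays vector.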

Related

How do I save plot images in R?

I have created a plot from a very large vector (on the order of 10^7 elements). The problem with the usual way of saving the plot as a PDF file is that the file comes out very large, around 10 MB. I don't want such a large file for a simple time series plot. How do I save the plot so that it is at most 100 kilobytes?
baptiste is on the right track with their suggestion of PNG for a nice raster-type plot. In contrast to Jdbaba's suggestion of copying the open device, I suggest that you call the png() device directly. This will save a lot of time because you won't have to draw the plot in a separate device window first, which can take a long time if the data set is large.
Example
# plotting 1e+06 points
x <- rnorm(1000000)
y <- rnorm(1000000)
png("myplot.png", width = 4, height = 4, units = "in", res = 300)
par(mar = c(4, 4, 1, 1))
plot(x, y, col = rgb(0, 0, 0, 0.03), pch = ".", cex = 2)
dev.off()  # only 129 KB in size
See ?png for other settings of the png device.
If you want to save the plot currently shown on screen as a PNG file, use the following command:
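To see the size trade-off for your own data, a rough sketch is to render the same plot with both devices and compare the file sizes with file.info(). The file names and the smaller point count (1e5 instead of 1e6, so it runs quickly) are assumptions for illustration; no particular sizes are implied.

```r
set.seed(1)
x <- rnorm(1e5)
y <- rnorm(1e5)

# Same drawing code for both devices
draw <- function() {
  par(mar = c(4, 4, 1, 1))
  plot(x, y, col = rgb(0, 0, 0, 0.03), pch = ".", cex = 2)
}

# Raster version: 4 x 4 inches at 300 dpi = 1200 x 1200 pixels
png("scatter.png", width = 4, height = 4, units = "in", res = 300)
draw()
dev.off()

# Vector version: every point is stored as a separate drawing operation
pdf("scatter.pdf", width = 4, height = 4)
draw()
dev.off()

file.info(c("scatter.png", "scatter.pdf"))$size
```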
dev.copy(png,"myfile.png",width=8,height=6,units="in",res=100)
dev.off()
You can increase the res value if you want higher-quality output.
If you want to save the plot as a PDF file, use the following commands:
pdf("myfile.pdf", width = 8, height = 6)
# ... your plotting commands go here, e.g. plot(x, y) ...
dev.off()
Remember to change the width and height values as needed.

Reduce PDF file size of plots by filtering hidden objects

When producing scatter plots of many points in R (using ggplot() for example), many points may lie behind others and not be visible at all. For instance, see the plot below:
This is a scatter plot of several hundred thousand points, but most of them are hidden behind other points. The problem is that when the output is saved to a vector file (a PDF file, for example), the invisible points make the file very large and increase memory and CPU usage when viewing it.
A simple solution is to save the output as a bitmap (TIFF or PNG, for example), but bitmaps lose the vector quality and can be even larger. I tried some online PDF compressors, but the result was the same size as my original file.
Is there any good solution? For example, some way to filter out the points that are not visible, either while generating the plot or afterwards by editing the PDF file?
As a start you can do something like this:
set.seed(42)
x <- runif(1e6)
DF <- data.frame(x = x, y = x + rnorm(1e6, sd = 0.1))
plot(y~x,data=DF,pch=".",cex=4)
PDF size: 6334 KB
DF2 <- data.frame(x=round(DF$x,3),y=round(DF$y,3))
DF2 <- DF[!duplicated(DF2),]
nrow(DF2)
#[1] 373429
plot(y~x,data=DF2,pch=".",cex=4)
PDF size: 2373 KB
With the rounding you can control how many values you want to remove. You only need to modify this to handle the different colours.
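One way to handle colours, sketched under assumptions: add the colour/group column to the de-duplication key so that only same-coloured points at (almost) the same spot are collapsed. The data and the grp column are hypothetical; fromLast = TRUE keeps the copy drawn last, which is the one visible on top.

```r
set.seed(42)
n <- 1e5
# Hypothetical data with a colour/group column 'grp'
DF <- data.frame(x = runif(n), grp = sample(c("a", "b", "c"), n, replace = TRUE))
DF$y <- DF$x + rnorm(n, sd = 0.1)

# Round to the resolution that matters at the final plot size and include
# the group, so points of different colours never collapse into each other
key <- data.frame(x = round(DF$x, 3), y = round(DF$y, 3), grp = DF$grp)
# fromLast = TRUE drops earlier copies and keeps the last-drawn (topmost) one
kept <- !duplicated(key, fromLast = TRUE)
DF2 <- DF[kept, ]
nrow(DF2)

pdf("dedup.pdf")
plot(y ~ x, data = DF2, pch = ".", cex = 4, col = as.integer(factor(DF2$grp)))
dev.off()
```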
Simply saving the plot as a high-resolution PNG file will cut the size drastically while keeping the quality more than good enough. At least, I've never had journals complain about any of the PNGs I sent them; just make sure to use more than 600 dpi.
I think it can be done by post-processing the PDF file. On Linux, if I have to reduce a PDF, I do
pdf2ps input.pdf output.ps
ps2pdf output.ps output.pdf
which for some reason works quite efficiently.
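The same round trip can be run from R with system2(). This is a sketch assuming the Ghostscript scripts pdf2ps and ps2pdf are installed and on the PATH; shrink_pdf() is a hypothetical helper name, and it reports the sizes before and after rather than promising any reduction.

```r
# Hypothetical helper: PDF -> PostScript -> PDF via Ghostscript's scripts
shrink_pdf <- function(input, output) {
  ps <- tempfile(fileext = ".ps")
  on.exit(unlink(ps))  # remove the intermediate PostScript file
  status1 <- system2("pdf2ps", c(shQuote(input), shQuote(ps)))
  status2 <- system2("ps2pdf", c(shQuote(ps), shQuote(output)))
  if (status1 != 0 || status2 != 0) stop("pdf2ps/ps2pdf failed")
  c(before = file.info(input)$size, after = file.info(output)$size)
}

# Usage: shrink_pdf("input.pdf", "output.pdf")
```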
You can see some discussion at https://askubuntu.com/questions/113544/how-to-reduce-pdf-filesize.

Producing a vector graphics image (i.e. metafile) in R suitable for printing in Word 2007

First a caveat: I posted this question here on SuperUser, but it is clearly the wrong place to ask R questions. I recognize that it is not directly a programming question, but I believe it can be solved by changing how plots are produced (i.e. by coding appropriately). So I hope readers find this appropriate for the forum.
R plots usually consist entirely of vector graphics elements (i.e. points, lines, polygons, text). R permits you to save your figure (or copy-paste) in various formats including various raster formats, as a PDF, or as a Windows meta-file.
I usually save my images as PDFs and print them. This renders the images exactly as I intended them on paper, in the highest quality. I avoid raster formats (e.g. JPG, TIFF) for printing as inevitably the quality is poorer and publishers prefer vector formats.
However, I need to make a large multi-page desktop-published document using Microsoft Word 2007, so using PDFs is not an option. When I import my figures from metafiles, or copy and paste directly from R into Word, both the screen and print rendering of the image change slightly (e.g. polygons and their fills become slightly misaligned).
Given that I want to retain high vector quality (and not use raster formats), what can I do to make R vector graphics work with Word? (Of course Sweave and LaTeX would be nice, but again, not a realistic option).
Consider this example:
plot(c(1:100), c(1:100), pch=20)
## Copy and paste to Word 2007 as Windows metafile
## Print
## Quality is poorer (e.g. dot fills misaligned with borders)
pdf("printsPerfectly.pdf")
plot(c(1:100), c(1:100), pch=20)
dev.off()
## Now print PDF
## Quality is as expected
EDIT: Following suggestions by @John, I produced it as an EPS PostScript file (see below) and inserted it as a picture into Word. Because it will ultimately be printed from a PDF created from Word, I converted it to a PDF using the default Word 2007 settings, printed it on my HP LaserJet P1606dn laser printer, and then took a photograph to illustrate the problem of polygon borders and fills misaligning (image on left, below). I also produced it directly as a PDF from R using pdf(), printed the PDF and took a photograph (image on right, below).
It may seem like small potatoes! But when you have gone to a lot of trouble to achieve high quality, it is disappointing to be thwarted at the end. In addition, it is not really obvious here, but the numerals are not as high-quality (left) as in the PDF (right), disregarding differences in focus on the photograph.
The accepted answer is not acceptable to me, since if one goes to the trouble of making a nice vector-based figure, the last thing one would like to do is rasterize it to a bitmap... unless it's an incredibly complex graph that takes ages to render in vector format, or something like that, but for most graphs that's not the case.
The best solution is to export to Word directly in the native Office vector format. I just made a new package, export, that allows one to do exactly that and allows export of either graphs or statistical tables to Word and PowerPoint; see
https://cran.r-project.org/web/packages/export/index.html and for demo see
https://github.com/tomwenseleers/export
For example:
library(devtools)
devtools::install_github("tomwenseleers/export")
library(export)
?graph2ppt
?graph2doc
?table2ppt
?table2doc
## export of ggplot2 plot
library(ggplot2)
# note: qplot() is deprecated in recent ggplot2 versions; ggplot() + geom_point() works too
qplot(Sepal.Length, Petal.Length, data = iris, color = Species,
      size = Petal.Width, alpha = I(0.7))
# export to Word
graph2doc(file="ggplot2_plot.docx", width=7, height=5)
# export to Powerpoint
graph2ppt(file="ggplot2_plot.pptx", width=7, height=5)
You can also export to enhanced metafile using the function
graph2emf(file="ggplot2_plot.emf", width=7, height=5)
but the quality of the native Office format is better.
For final production you can also readily print it to PDF from PowerPoint if need be, and it will then stay nicely in vector format.
Your only option is to use high-resolution raster graphics. Once you're over 300 dpi it will be completely indistinguishable from vector printing; it will just make larger files. Your copy-and-paste method comes in at 72 dpi and will look terrible. If you import from a file, you can set the resolution in the file and things will be much better. Fortunately, Office 2007 is supposed to handle PNG images, which have the best compression for typical graphs. Let's say you wanted the image 4" wide and 6" high:
png('printsGreat.png', width = 4, height = 6, units = 'in', res = 300)
plot(c(1:100), c(1:100), pch=20)
dev.off()
Also, Office 2007 is supposed to be able to handle EPS files and R postscript files are by default EPS compatible when you print one page.
postscript("printsPerfectly.eps", width = 4, height = 6, horizontal = FALSE, onefile = FALSE, paper = "special")
plot(c(1:100), c(1:100), pch=20)
dev.off()
But if you don't have luck with them go back to the high resolution image.
My preferred solution is to use the Windows metafile device (available only on Windows) for plotting, e.g.:
win.metafile("mygraph.wmf")
print(gg1)  # gg1: a ggplot object created earlier
dev.off()
This produces a *.wmf file that can be copy-pasted into the Word file.
The devEMF package seems to produce graphics that look nicer than the default wmf when pasted into PowerPoint.
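A minimal sketch of the devEMF route, assuming the package is installed from CRAN: emf() opens the device much like pdf() or png(), and the file name here is arbitrary. The resulting .emf file can then be inserted into Word or PowerPoint as a picture.

```r
# install.packages("devEMF")  # if not already installed
library(devEMF)

# Same example plot as in the question, written to an Enhanced Metafile
emf("printsPerfectly.emf", width = 4, height = 6)  # size in inches
plot(c(1:100), c(1:100), pch = 20)
dev.off()
```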
When I tried to produce high-resolution PNGs in R, it didn't seem to work on my PC (if I set the resolution higher than, say, 300 dpi, R would produce an error like "cannot start png device"). The workaround I found was to save the figure using postscript() and then use GSview to convert the PS file into a PNG at 600 dpi. MS Word consumes the PNGs happily and the print quality seems to be perfect.
What @Tom Wenseleers said:
The current best answer above to me is not acceptable, since if one goes to the trouble of making a nice vector based figure, the last thing one would like to do is just rasterize it to a bitmap... Unless it's an incredibly complex graph that takes ages to render in vector format, or something like that, but for most graphs that's not the case.
For me, there is a new best answer to this question, since graph2ppt and graph2doc tend to move axis labels around (which apparently cannot be fixed; see here: https://github.com/davidgohel/rvg/blob/master/R/body_add_vg.R and here: export::graph2office moves axis labels around).
I think that .svg is the most appropriate vector format for publication graphics. The only drawback is that older versions of e.g. MS Word cannot handle it. In R, you could use the built-in grDevices::svg device. However, I'd recommend CairoSVG from the Cairo package, especially when you are working with non-native fonts (e.g. via the extrafont package), because in contrast to grDevices::svg, Cairo::CairoSVG embeds fonts quite nicely (without relying on Ghostscript, if I am right).
If you are working with an older version of MS Word, you could use Inkscape (a free vector graphics editor) to convert your graph to .wmf, for example (which might be better than printing to .wmf directly, because R rasterizes points when exporting .wmf files).
An example:
## create plot
library(ggplot2)
library(extrafont)
# note: if you want to use fonts other than the standard ones (in this example "ChantillyLH"),
# you must register your fonts via
# font_import()  ## run only once (type "y" in the console)
# and
# loadfonts(device = "win")  ## run only once
# Otherwise, the extrafont package is not needed.
beautiful_plot <-
  ggplot(data = iris, mapping = aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point() +
  theme(text = element_text(size = 18, family = "ChantillyLH"))
# export SVG
library(Cairo)
CairoSVG("My_Path/My_Plot.svg", width = 6, height = 6)
print(beautiful_plot)
dev.off()
# the resulting SVG file is in the "My_Path" folder
In Inkscape, it looks like this:
Newer versions of Word can import vector graphics from SVG files. R 3.6.2 has built-in support for creating SVG files with the svg function; no extra packages are needed.
Your example then becomes
svg("printsPerfectly.svg", width=4, height=4)
plot(c(1:100), c(1:100), pch=20)
dev.off()
Note that there is a known issue when you try to create PDF files from Word documents with embedded SVG files with thin lines. If you are using thin lines, e.g. with lwd=0.7 somewhere, you need to apply this workaround.

Reduce pdf file size of plot in R

I am plotting some data in R using the following commands:
jj = ts(read.table("overlap.txt"))
pdf(file = "plot.pdf")
plot(jj, ylab="", main="")
dev.off()
The result looks like this:
The problem is that the PDF file I get is quite big (25 MB). Is there a way to reduce the file size? JPEG is not an option because I need a vector graphic.
Take a look at tools::compactPDF - you need to have either qpdf or ghostscript installed, but it can make a huge difference to pdf file size.
For PDF files on disk, compactPDF() offers three Ghostscript quality settings (gs_quality), as described in its R help page:
printer (300dpi)
ebook (150dpi)
screen (72dpi)
The default is "none". For example, to convert all PDFs in the folder mypdfs/ to ebook quality, use the command
tools::compactPDF('mypdfs/', gs_quality='ebook')
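For a single plot file, a sketch of the before/after check: compactPDF() also accepts individual file paths and rewrites the file in place, but only when the saving is worthwhile (see ?compactPDF for the thresholds), and it does nothing if neither qpdf nor Ghostscript is found. The simulated data and the file name are assumptions.

```r
# Write a test plot with many points (simulated data)
set.seed(1)
pdf("plot.pdf")
plot(cumsum(rnorm(1e5)), type = "l")
dev.off()

before <- file.info("plot.pdf")$size
res <- tools::compactPDF("plot.pdf", gs_quality = "ebook")
after <- file.info("plot.pdf")$size
c(before = before, after = after)
```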
You're drawing a LOT of lines or points. Vector image formats such as PDF, PS, EPS, SVG, etc. store information about every point, line or other item, so their size and drawing time grow with the number of elements. Generally, vector images are best in a number of ways: they are the most compact, scale best and give the highest-quality reproduction. But if the number of graphical elements becomes very large, it's often best to switch to a raster image format such as PNG. When you do, it's best to have a good idea of what size image you want, both in pixels and in print dimensions, in order to produce the best image.
For information from the other direction, too large a raster image, see this answer.
One way of reducing the file size is to reduce the number of values you plot. Assuming you have a data frame called df:
# take a random sample of rows, keeping them in their original order
sampleNo <- 10000
sampleData <- df[sort(sample(nrow(df), sampleNo)), ]
I think the only other alternative within R is to produce a non-vector. Outside of R you could use Acrobat Professional (which is not free) to optimize the pdf. This can reduce the file size enormously.
Which version of R are you using? In R 2.14.0, pdf() has an argument compress to support compression. I'm not sure how much it can help you, but there are also other tools to compress PDF files such as Pdftk and qpdf. I have two wrappers for them in the animation package, but you may want to use command line directly.
Hard to tell without seeing what the plot looks like - post a screenshot?
I suspect it's a lot of very detailed lines and most of the information probably isn't visible: lots of things overlapping, or very, very small detail. Try thinning your data in one dimension or another. I doubt you'll lose visible information.
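One way to thin a dense line plot without losing its visible extent is to keep only the minimum and maximum of each run of consecutive points and draw them as vertical segments. This is a sketch with a simulated series standing in for overlap.txt; the bucket size of 500 is an arbitrary choice you would tune to the plot width, and the result approximates rather than reproduces the original line.

```r
# Simulated long series standing in for the data read from overlap.txt
set.seed(1)
y <- cumsum(rnorm(1e6))

# Split the series into consecutive buckets and keep each bucket's min and max
bucket_size <- 500
bucket <- (seq_along(y) - 1) %/% bucket_size
lo <- tapply(y, bucket, min)
hi <- tapply(y, bucket, max)
x <- as.numeric(names(lo)) * bucket_size + 1  # index of each bucket's first point

# One vertical segment per bucket: 2000 segments instead of 1e6 line pieces
pdf("thinned.pdf", width = 8, height = 4)
plot(x, hi, type = "n", ylim = range(lo, hi), xlab = "Index", ylab = "Value")
segments(x, lo, x, hi)
dev.off()
```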

Plotting with R - skewed eps image & resolution problem

I want to create a graph of a function in R. The code is:
x <- seq(from=0, to=1, by=0.00001)
f <- function(x) ....
y <- f(x)
plot(x, y, xlab="x", ylab="f(x)", pch=16, cex=0.5)
min(y)
[1] 0.2291203
max(y)
[1] 0.7708797
When I save the graphic as BMP from RGui, it looks fine. When I save it as EPS and include it in LaTeX with:
\begin{figure}[htbp]
\centering
\includegraphics[scale=0.4]{./images/f-probart.eps}
\end{figure}
it is skewed, as shown in the screen capture.
What is wrong? I guess there is a problem with exporting EPS from RGui, as the resulting EPS is also shown skewed in IrfanView. Hence I suspect it is not the LaTeX inclusion code that is wrong...
How could I create this graphic with a requested resolution, say 244 dpi? Is there another package or function that allows me to export EPS with a specific resolution?
Thanks
I cannot reproduce your error, so I guess it's something specific to your system. If I save as eps and include it in latex (using the graphicx package), everything works completely fine. Keep in mind that if you used the postscript() function in R, you have to specify the width and height of your picture as well. I could be wrong, but I think it defaults to the default values of the graphics window in R (which could explain the dimensions of your eps pictures).
If you saved from the graphics window, it normally should take the current width and height of the graphics window. It does so on my R version, but maybe your options are set differently? check ps.options() and see if width and height have value 0. If that's not the case, that could be the problem.
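One way to rule out device settings is to write the EPS file directly with an explicit size instead of saving from the RGui window. This is a minimal sketch: f() is a hypothetical stand-in because the original definition is elided, the 5 x 4 inch size is arbitrary, and the grid is coarser than in the question. setEPS() sets horizontal = FALSE, onefile = FALSE and paper = "special"; note that it changes the session's ps.options() defaults.

```r
# Hypothetical stand-in for the elided f(x) in the question
f <- function(x) 0.5 + 0.27 * sin(2 * pi * x)
x <- seq(from = 0, to = 1, by = 0.001)

# Portrait, single-page output whose page size is exactly width x height
setEPS()
postscript("f-probart.eps", width = 5, height = 4)  # size in inches
plot(x, f(x), xlab = "x", ylab = "f(x)", pch = 16, cex = 0.5)
dev.off()
```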
On a side note: you could use pdf instead. See ?pdf in R. It allows you to specify the width and height of the picture, and reproduces correctly in LaTeX. You should then use pdftex to build the document.
My experience is that using pdf graphics and pdftex is less trouble than passing through PS. In fact, in latex there is no need to pass through eps any more to come to a decent pdf. Another advantage of using pdftex is that you can easily combine all graphics formats in the same document. (For EPS you need the epstopdf package)
As for the dpi requirement: it is only meaningful for raster (pixel) images, not for EPS and PDF, which are vector formats. I'd use PNG, which is the best format for graphs. See the res argument of png().
png("somefile.png", width = 6, height = 6, units = "in", res = 244)  # 6 in x 244 dpi = 1464 x 1464 pixels
plot(x, y, xlab="x", ylab="f(x)", pch=16, cex=0.5)
dev.off()
Alternatively, you could use the function bmp() for bitmap graphics in exactly the same way. Don't forget the dev.off() at the end.
I used the Cairo package; the code was:
Cairo(24000,24000,file="a.ps",type="ps",bg="transparent",pointsize=12, units="px", dpi=2400)
plot(x, y, xlab="x", ylab="f(x)", pch=16, cex=0.5, type='l')
dev.off()
The resulting graph looked fine. One question, however: according to @Joris Meys, dpi is irrelevant for vector graphics; in that case, why is specifying dpi mandatory for the Cairo function?

Resources