Can I reduce pdf file size in knitr/ggplot2 when using a large dataset without using external tools?

I have a number of large-ish files which I am reading into R in an rmarkdown document, cleaning up, and plotting with ggplot2.
Most files are about 3 MB in size with around 80,000 lines of data, but some are 12 MB in size, with 318,406 lines of data (Time, Extension, Force).
Time,Extension,Load
(sec),(mm),(N)
"0.00000","0.00000","-4.95665"
"0.00200","0.00000","-4.95677"
"0.00400","0.00000","-4.95691"
"0.10400","-0.00040","-4.95423"
It takes a while to churn through the data and create the pdf file (that's OK), but the PDF file is now nearly 6 MB in size with about 16 graphs in there (in fact 3 graphs which are facet plots using ggplot2).
I understand that the pdf includes a line segment for every data point in my dataset, and therefore as I increase the number of graphs the amount of data in the file increases. However, I don't foresee a requirement to drill down into the pdf document to see that level of detail, and I will have problems emailing it around as it approaches 10 MB.
If I convert pdf to ps using pdf2ps and then go back to pdf with ps2pdf, I get a file about 1/3 of the size of the original pdf, and the quality looks great.
Therefore, is there a method from within R/knitr/ggplot2 to reduce the number of points plotted in the pdf images without using an external tool to compress the pdf file (or to somehow optimise the generated pdf)?
Cheers
Pete

You can try changing the graphic device from pdf to png by adding
knitr::opts_chunk$set(dev = 'png')
to your setup chunk.
Or you can add this to your output header
output:
  pdf_document:
    dev: png
Try different devices (png, jpeg). Raster devices store pixels instead of one vector element per data point, so this may well reduce the size.
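If the plots should stay as vector graphics, another option that stays inside R is to thin the data before handing it to ggplot2. The following is only a minimal sketch: `thin_rows` and `max_points` are hypothetical names (not from the answer above), the data frame is synthetic, and evenly spaced thinning can hide short spikes, so check the result against a full plot once.

```r
# Hypothetical helper: keep at most max_points evenly spaced rows.
thin_rows <- function(df, max_points = 5000) {
  n <- nrow(df)
  if (n <= max_points) return(df)
  # Evenly spaced row indices that always include the first and last row
  idx <- unique(round(seq(1, n, length.out = max_points)))
  df[idx, , drop = FALSE]
}

# Synthetic stand-in for one of the 318,406-line files
big <- data.frame(Time = seq(0, by = 0.002, length.out = 318406))
big$Load <- sin(big$Time / 10)

small <- thin_rows(big, max_points = 5000)
# The thinned data frame is then plotted as usual, e.g.
# ggplot(small, aes(Time, Load)) + geom_line()
```

The pdf then contains one segment per retained row rather than one per measurement, so its size is bounded by max_points and not by the length of the input file.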

Related

Reducing the output file size of tikzDevice

I'm writing a large latex document with lots of plots generated by R's tikzDevice package. Currently, I'm experiencing LaTeX error "TeX capacity exceeded".
I've managed to fix this issue temporarily by following the remedy from Leo Liu's answer here (I've also tried the accepted answer from the same source), but this solution is temporary as I said, because eventually, I will be adding many more plots to my document with at least 1 meg per plot. Some people say that LuaTex dynamically allocates memory so it is never an issue with LuaTex (same source); however, I have a big document and limited time available, so converting to LuaTex is not an option (unless there is an automatic conversion software).
During my search, I found this post by chance talking about the same problem but with matlab's matlab2tikz package. The solution there is to reduce the number of samples in the plot and thus reduce the size of the resulting file. I looked up R's tikzDevice documentation (here and here) for a similar option, but was not able to find one unfortunately.
So the question is: How can I control the size of the plots (tikzpictures) generated by tikzDevice?
One workaround is to export the file produced by tikzDevice with option standAlone = TRUE. Compile the resulting file with lualatex to avoid the error message "TeX capacity exceeded". This can be done directly from R, using e.g.
tikz("figure.tex", standAlone = TRUE)
...
dev.off()
system("lualatex figure.tex; rm *.aux; rm *.log")
Then, include the file in LaTeX as a graphic using \includegraphics.
To reduce the size of the resulting pdf, you may wish to convert the image to a lossy compression format such as JPEG. Converting to postscript and back can also create smaller files, albeit at the cost of a loss of resolution.
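The compile-and-clean-up step can also be wrapped in a small function so that it does not depend on a Unix shell for `rm`. This is a sketch under assumptions: `compile_standalone` is a hypothetical name, `lualatex` is assumed to be on the PATH, and the output is assumed to land in the current working directory.

```r
# Hypothetical wrapper around the lualatex call shown above.
compile_standalone <- function(tex_file, run = TRUE) {
  # lualatex writes its output next to the working directory, not the source
  base <- tools::file_path_sans_ext(basename(tex_file))
  if (run) {
    system2("lualatex", c("-interaction=nonstopmode", shQuote(tex_file)))
  }
  # Remove auxiliary files without relying on a shell
  leftovers <- paste0(base, c(".aux", ".log"))
  file.remove(leftovers[file.exists(leftovers)])
  paste0(base, ".pdf")  # name of the PDF to pass to \includegraphics
}

# Usage, after tikz("figure.tex", standAlone = TRUE) ... dev.off():
# pdf_file <- compile_standalone("figure.tex")
```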

Can't get images from R to latex

I've generated a number of plots in R, but two specific ones are giving me fits. I have two point processes from R (.ppp objects) that I've generated plots from, but no matter what file type I save them as, Texmaker will not read them in properly. I've tried using .eps files and the epstopdf package, but would only get an empty plot (only the axes and no points). I've tried saving the plot as a .png, .jpeg, and .pdf using the instructions (minus rsweave) here Getting R plots into LaTeX?, but every time I try to compile, I get the error "no output PDF file produced!" because "reading image file failed". I'm at my wit's end here.
I'm using Texmaker along with miktex 2.9, if that helps.
Edit: Don't use dev.new() after using png() when saving images in R (shown in the link). That is what was causing the load error.

How to hide figures in knitr, but create them as png?

I am currently doing some statistical analysis in R and use knitr to generate results and an overview document.
There are some additional plots, which I want to be done and saved as a .png (with specified file name and location), but not included in the generated .html file (too many of them, and they are not at the end).
Using dev.copy(png, ...) works fine for generating the plots, but the figures appear in the .html. If I specify fig.keep='none' the .png files are created, but blank.
Is there some way to do what I want?
This is from knitr website:
fig.show: ('asis'; character) how to show/arrange the plots; four possible values are
asis: show plots exactly in places where they were generated (as if the code were run in an R terminal)
hold: hold all plots and output them in the very end of a code chunk
animate: wrap all plots into an animation if there are multiple plots in a chunk
hide: generate plot files but hide them in the output document
fig.show = 'hide' worked for me.
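If the file name and location need to be controlled exactly, a possible alternative is to open the png device explicitly instead of using dev.copy. This is a sketch, not part of the answer above: `save_png` is a hypothetical helper, and the assumption is that a plot drawn on a manually opened device is not picked up by knitr, so it should not show up in the .html.

```r
# Hypothetical helper: draw a plot straight into a named PNG file.
save_png <- function(draw, file, width = 800, height = 600) {
  png(file, width = width, height = height)
  on.exit(dev.off(), add = TRUE)  # always close the device, even on error
  draw()
  invisible(file)
}

# The path here is a placeholder; use the desired name and location instead.
out_file <- save_png(function() plot(cars),
                     file.path(tempdir(), "extra_plot.png"))
```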

Output graph to a two page PDF

I am doing a forest plot and want to save it to a PDF file.
My forest plot is oversize (8in*20in). It can fit in a one page PDF like this:
dev.print(pdf, file="C:\\Work\\plot.pdf", width=8, height=20);
But then it is too long: when I print this PDF on an A4 paper, it has to be shrunk to fit the paper.
So I want to save it to a two-page PDF file (from R). Ps: it is not a question about how to set the printer.
How to do this?
So, you are able to generate an 8in x 20in == 203.2mm x 508mm == 576pt x 1440pt sized PDF showing a plot.
It is not entirely clear to me from your question what exactly you want:
1. Generate the PDF plot so that it is divided into two different pages from the beginning?
2. Take the PDF as is and during the print job setup find these settings which would print it onto two different pages by posterizing the original page?
3. Post-process the PDF that you created to posterize it and create a 2-page output PDF (which you can then print)?
Assuming '1.': generate PDF plot distributed over 2 pages
Sorry, I cannot help here...
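For what it is worth, one possible direction for this case: the pdf device in R starts a new page for every new plot (onefile = TRUE is the default), so drawing the top and bottom halves of the data as two separate plots gives a two-page file. The sketch below uses made-up estimates and base dotchart as a stand-in; how to split an actual forest plot depends on the package that draws it.

```r
# Made-up data standing in for the rows of a forest plot
est <- data.frame(label = paste("Study", 1:40),
                  mean  = seq(-1, 1, length.out = 40))

# Split the rows into two halves, one per page
halves <- split(est, rep(1:2, each = 20))

out_file <- file.path(tempdir(), "two_pages.pdf")
pdf(out_file, width = 8, height = 10)  # each new plot opens a new page
for (part in halves) {
  # Same x range on both pages so the halves stay comparable
  dotchart(part$mean, labels = part$label, xlim = range(est$mean))
}
dev.off()
```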
Assuming '2.': print setup to print 1 PDF page on 2 sheets of paper
If you print a PDF from Adobe Acrobat or from Adobe Reader, then you'll find a setting in the print dialog named "Poster". Here you can select to print one PDF page across multiple pieces of paper. (It also lets you select if you want some overlap from piece to piece, and if you want to add cut marks and the like to the printouts).
Assuming '3.': post-process 1 PDF page to stretch over 2 A4 pages
MuPDF is a lightweight PDF (and other document formats) viewer, made by the same company that also maintains Ghostscript. MuPDF ships with an additional command line utility, mutool.
Its subcommand poster can divide PDF pages into smaller tiles and 'posterize' them. So this command will achieve what you want:
mutool poster -x 1 -y 2 input.pdf output.pdf
The output.pdf will be divided into 1 part (i.e. not divided) in x-direction, and into 2 equal parts in y-direction. (You could divide it into any other number of segments if you wanted.) So output.pdf will have two pages, each sized 8in x 10in. A4 paper is sized 8.27in x 11.69in when measured in inches.
When printing these, you'll still need to enable the Print to fit Page Size checkbox in the print dialog if you want to make best use of the A4 page size.
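The size arithmetic behind this can be checked quickly in R. This is only a worked calculation; it ignores printer margins, which reduce the printable area in practice.

```r
in_per_mm <- 1 / 25.4

# A4 is 210mm x 297mm
a4_in <- c(width = 210, height = 297) * in_per_mm

# 8in x 20in page cut into 1 column and 2 rows
tile_in <- c(width = 8 / 1, height = 20 / 2)

# Largest uniform scale factor at which one tile still fits on A4
scale_to_fit <- min(a4_in / tile_in)

round(a4_in, 2)
scale_to_fit
```

A factor above 1 means each tile fits on the sheet without shrinking, with a little room to spare.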
Ghostscript is a command line tool that can (amongst many other functions) be used to process PDF files (PDF in, modified PDF out). It can be (ab)used to cut PDF pages into halves.
Here are a few previous StackOverflow answers which describe how to do it. You'll need to adapt some parameters to your specific size(s), but the principles should be clear from those examples (even though some of these split pages into left and right halves, not top/bottom as you may require):
Linux-based tool to chop PDFs into multiple pages (SuperUser)
Freeware to split a pdf's pages down the middle? (SuperUser)
Convert PDF 2 sides per page to 1 side per page (SuperUser)
How can I split a PDF's pages down the middle? (SuperUser)
Cropping a PDF using Ghostscript 9.01 (StackOverflow)
PDF - Remove White Margins (StackOverflow)
Split one PDF page into two (StackOverflow)
The method described there is more tedious and not as straightforward as the mutool poster method.
Maybe not the answer you are looking for, but you could print it in another vector format (e.g. svg) and then export it as pdf on two pages with a (vector) image editor.
Edit: If plotting in pdf works well despite the big size of the graph, there are also tools to split pdf pages. You can find some directions here:
https://superuser.com/questions/437148/how-to-split-a-pdf-onto-multiple-pages-on-command-line
Windows equivalent of pdfposter could be Rasterbator or PosteRazor, for example.

ggplot with ggplot2: pdf very slow to display

I am producing a pdf plot with this kind of command:
ggplot(df, aes(sample = x))+
stat_qq(geom="point",distribution=qexp)+
geom_abline(intercept = 0, slope = 1,linetype='dashed',col='red')
ggsave(file="xxx.pdf")
Then I want to integrate the pdf into a tex file and produce a final pdf document.
But, the ggplot is very slow to display and makes the pdf crash very often.
When I use geom='line' it doesn't happen so I guess it comes from the number of circle points.
Do you have any idea on how to solve this? I really prefer the geom='point' option.
PDFs are vector based, so every single point on your chart has to be drawn individually each time the page is displayed. This produces a 'load-up' sort of effect on your PDF. My solution would be to save as a high-DPI png instead:
ggsave(filename="xxx.png", dpi=400) # default is 300, which is probably sufficient
pdflatex (or another TeX engine) will find the file 'xxx' as long as you have not forced an extension in your R-to-TeX conversion, since the include statement will usually not mention an extension. You will need to make sure that the pdf is deleted from your charts folder so that it doesn't get picked up in preference to the png.
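If the plot has to stay a vector pdf with geom='point', another possibility is to compute the Q-Q coordinates by hand and keep only a subset of them. This is a sketch under assumptions: `qq_thin` and `max_points` are hypothetical names, the sample is simulated, and evenly spaced thinning keeps few points in the sparse tails, so it may need adjusting if the tails matter.

```r
# Hypothetical helper: Q-Q coordinates reduced to at most max_points points.
qq_thin <- function(x, distribution = qexp, max_points = 2000) {
  x <- sort(x)
  n <- length(x)
  theo <- distribution(ppoints(n))  # theoretical quantiles for all n points
  idx <- if (n > max_points) {
    unique(round(seq(1, n, length.out = max_points)))
  } else {
    seq_len(n)
  }
  data.frame(theoretical = theo[idx], sample = x[idx])
}

set.seed(1)
qq <- qq_thin(rexp(1e5))
# Plot the reduced data with ordinary points, e.g.
# ggplot(qq, aes(theoretical, sample)) + geom_point() +
#   geom_abline(intercept = 0, slope = 1, linetype = 'dashed', col = 'red')
```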
