Large filesize of PDF output from spplot - r

Using a truncated shapefile and some data that I have added, I have created a colored map using spplot. However, when I export the created graph as a PDF, the filesize is very large and so is the final document. (The PDF is around 50 MB).
Is this normal or am I doing something wrong? I am not sure if I can provide a minimum working example, since the shapefile is also quite large (300MB).
Alternatively, is there any way to reduce the file-size after the fact?

To quote Prof. Brian Ripley in http://markmail.org/message/ravkpnjexagmpm4o:
If compression is enough, pdf() in R-devel does it, as does cairo_pdf() in current R. And there are other ways than Acrobat to compress/compact a PDF file: see ?tools::compactPDF and the 'Writing R Extensions' manual.

As an alternative to PDF, I suggest generating a high-resolution PNG. I have submitted such PNGs without any problem to journals such as Computers & Geosciences. Using a PNG also makes the documents that contain it much more workable. If you want to stick with PDF, pdftk (the PDF toolkit) can compress PDFs.

library(tools)
help(compactPDF)
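A minimal sketch of the suggestions above, assuming a scatter plot with many points stands in for the spplot map (the data and file names are made up for illustration). It writes the same plot as an uncompressed PDF, a compressed PDF, and a high-resolution PNG, then compares the file sizes. tools::compactPDF is only called if the external qpdf tool is found on the system.

```r
# Hypothetical stand-in for the map: a scatter plot with many points.
set.seed(1)
x <- rnorm(20000)
y <- rnorm(20000)

raw_pdf  <- tempfile(fileext = ".pdf")
zip_pdf  <- tempfile(fileext = ".pdf")
png_file <- tempfile(fileext = ".png")

# Same plot, written without and with stream compression.
pdf(raw_pdf, compress = FALSE)
plot(x, y, pch = ".")
dev.off()

pdf(zip_pdf, compress = TRUE)
plot(x, y, pch = ".")
dev.off()

# High-resolution raster alternative (only if this R build supports PNG).
if (capabilities("png")) {
  png(png_file, width = 2400, height = 2400, res = 300)
  plot(x, y, pch = ".")
  dev.off()
}

# Compare the resulting file sizes in bytes (NA if a file was not written).
sizes <- file.size(c(raw_pdf, zip_pdf, png_file))
names(sizes) <- c("pdf_uncompressed", "pdf_compressed", "png")
print(sizes)

# Optional extra pass; compactPDF relies on external tools such as qpdf.
if (nzchar(Sys.which("qpdf"))) {
  tools::compactPDF(zip_pdf)
}
```

For a real map the gain depends on how many polygons and vertices the shapefile contains, so the numbers printed here are only indicative.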

Related

R tabulizer encoding or security

I have been practicing with the tabulizer package in R and have the following problem. Unfortunately I can't offer a reproducible example, as the PDF is the firm's property, but I will describe the problem in detail.
I'm trying to read a PDF that has a start and end date in the upper-right corner. When I open the PDF they look normal:
Start: 01-Mar-2018
End: 31-Mar-2018
Now the fun part. When I highlight them and use Ctrl+C to copy them, here is the result when pasted into R:
:tttt: 11-rrr-8118
tt:: 11-rrr-8118
This is exactly the same kind of nonsense that extract_text(path, pages=1) gives: a lot of t::ttttt:ttt... My question: is there some security on this PDF, do I just need to figure out the correct encoding, or, because this PDF is created automatically by a system, is there some odd notation to everything?
I figured it out. This PDF is mainly built from metadata (I didn't know that), and a great tool in R for accessing metadata in PDFs is pdftools.
library(pdftools)
pdf_info(path.pdf)
and you can wrangle out all the important metadata bits.
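As a hedged illustration of the pdftools calls mentioned above, the sketch below builds a small throwaway PDF with base R (the file and its contents are made up) and then inspects it. It assumes the pdftools package is installed and skips the inspection otherwise.

```r
# Build a small throwaway PDF so the example is self-contained.
pdf_path <- tempfile(fileext = ".pdf")
pdf(pdf_path, title = "Demo report")
plot.new()
text(0.5, 0.5, "Start: 01-Mar-2018")
dev.off()

if (requireNamespace("pdftools", quietly = TRUE)) {
  info <- pdftools::pdf_info(pdf_path)
  print(info$pages)      # number of pages
  print(info$encrypted)  # TRUE if the file is encrypted
  print(info$keys)       # document metadata such as title and producer
  # Extracted text, one string per page.
  print(pdftools::pdf_text(pdf_path))
}
```

Checking info$encrypted is one quick way to rule out the "security" explanation before looking into fonts or encodings.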

Reducing the output file size of tikzDevice

I'm writing a large latex document with lots of plots generated by R's tikzDevice package. Currently, I'm experiencing LaTeX error "TeX capacity exceeded".
I've managed to fix this issue temporarily by following the remedy from Leo Liu's answer here (I've also tried the accepted answer from the same source), but as I said this solution is only temporary, because eventually I will be adding many more plots to my document, at roughly 1 MB or more per plot. Some people say that LuaTeX allocates memory dynamically, so this is never an issue with LuaTeX (same source); however, I have a big document and limited time available, so converting to LuaTeX is not an option (unless there is automatic conversion software).
During my search, I found this post by chance talking about the same problem but with matlab's matlab2tikz package. The solution there is to reduce the number of samples in the plot and thus reduce the size of the resulting file. I looked up R's tikzDevice documentation (here and here) for a similar option, but was not able to find one unfortunately.
So the question is: How can I control the size of the plots (tikzpictures) generated by tikzDevice?
One workaround is to export the file produced by tikzDevice with option standAlone = TRUE. Compile the resulting file with lualatex to avoid the error message "TeX capacity exceeded". This can be done directly from R, using e.g.
tikz("figure.tex", standAlone = TRUE)
...
dev.off()
system("lualatex figure.tex; rm *.aux; rm *.log")
Then, include the file in LaTeX as a graphic using \includegraphics.
To reduce the size of the resulting pdf, you may wish to convert the image to a lossy compression format such as JPEG. Converting to postscript and back can also create smaller files, albeit at the cost of a loss of resolution.
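The question also mentions reducing the number of samples, as suggested for matlab2tikz. tikzDevice does not appear to offer such an option, so the sketch below thins the data before plotting instead. thin_series and plot_to_tikz are hypothetical helpers written for this illustration, not part of tikzDevice, and plot_to_tikz is only defined here, not run, because it needs tikzDevice and a LaTeX installation.

```r
# Hypothetical helper: keep at most max_n evenly spaced points of a series.
thin_series <- function(x, y, max_n = 500) {
  stopifnot(length(x) == length(y), max_n >= 2)
  n <- length(x)
  if (n <= max_n) {
    return(data.frame(x = x, y = y))
  }
  keep <- unique(round(seq(1, n, length.out = max_n)))
  data.frame(x = x[keep], y = y[keep])
}

# Example data: a densely sampled curve.
x <- seq(0, 10, length.out = 1e5)
y <- sin(x)
small <- thin_series(x, y, max_n = 500)
print(nrow(small))

# Hypothetical wrapper around the tikz() call from the answer above.
# Defined but not called here.
plot_to_tikz <- function(d, file) {
  tikzDevice::tikz(file, standAlone = TRUE)
  on.exit(dev.off())
  plot(d$x, d$y, type = "l")
}
```

Calling plot_to_tikz(small, "figure.tex") would then write far fewer path segments than plotting the full series. Whether 500 points is enough depends on the curve, so the threshold is a judgment call.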

Easy way to copy paste base graph from R Studio into word document

I need to copy paste R graphs into MS Word.
rando <- rnorm(100)
plot(rando)
when I copy the .png graph from R Studio into Word I get a negative space version of the graph:
Is there a cleaner/easier way to do this? I would be happy to use pdf or something else to present the graph.
As @Roland suggested:
Export -> Copy Plot to Clipboard (a window with the plot will pop up) -> Metafile -> Copy Plot -> Paste into MS Word.
That seems like a lot of clicks to me; instead, as @user2633645 suggested, save all plots as PNG files and then insert them into MS Word in one go.
?png
rando <- rnorm(100)
png(filename = "rando.png")
plot(rando)
dev.off()
You can use the devEMF package. It is repeatable, easy to use, and converts very nicely to pdf if needed.
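A short sketch of the devEMF suggestion, assuming the devEMF package is installed (the block does nothing otherwise). The output file name is arbitrary.

```r
rando <- rnorm(100)
emf_file <- file.path(tempdir(), "rando.emf")

if (requireNamespace("devEMF", quietly = TRUE)) {
  # Open an EMF device, draw the plot, and close the device again.
  devEMF::emf(file = emf_file, width = 7, height = 5)
  plot(rando)
  dev.off()
  print(file.exists(emf_file))
}
```

The resulting .emf file can be inserted into Word as a picture and stays a vector graphic.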
(wow is this an old question, but I guess it's still an issue. I'm on RStudio v1.3.1073)
I came across 3 options:
Cross posting from https://github.com/rstudio/rstudio/issues/5103#issuecomment-679021780: "I've found two simple workarounds: Paste Special and choose TIFF, or (my preference) drag the image from the zoom window and drop in PowerPoint"
Snipping tool
You can specify the dimensions like in the RStudio plot export
windows(800,600) ## Opens graphic window. ctrl c/v works here.
plot(rando)
If you are doing this copy-plot-to-Word often or for several plots, consider creating an .Rmd file with your code, calling knitr on that file, and using pandoc (for example via system()) to convert your knitted .md file to Word.
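The knitr and pandoc route could look roughly like the sketch below. It assumes knitr is installed and a pandoc executable is on the PATH, and it skips the conversion otherwise. The file names and the chunk label are made up for illustration.

```r
work_dir <- file.path(tempdir(), "word_report")
dir.create(work_dir, showWarnings = FALSE)

# Build the chunk fence programmatically so this example nests cleanly.
fence <- strrep("`", 3)
rmd_lines <- c(
  "Random numbers",
  "==============",
  "",
  paste0(fence, "{r rando-plot}"),
  "rando <- rnorm(100)",
  "plot(rando)",
  fence
)
rmd_file <- file.path(work_dir, "report.Rmd")
writeLines(rmd_lines, rmd_file)

have_tools <- requireNamespace("knitr", quietly = TRUE) &&
  nzchar(Sys.which("pandoc"))

if (have_tools) {
  # Work inside work_dir so figure paths in the .md file resolve for pandoc.
  old_wd <- setwd(work_dir)
  knitr::knit("report.Rmd", output = "report.md", quiet = TRUE)
  system2("pandoc", c("report.md", "-o", "report.docx"))
  setwd(old_wd)
}
```

Once the .Rmd file exists, regenerating the Word document after a code change is a single script run rather than a series of clicks.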

Create and save R's default codebooks as a pdf

If I load data(mtcars) it comes with a very neat codebook that I can call using ?mtcars.
I'm interested to document my data in the same way and, furthermore, save that neat codebook as a pdf.
Is it possible to save the 'content' of ?mtcars and how is it created?
Thanks, Eric
P.S. I did read this thread.
update 2012-05-14 00:39:59 PDT
I am looking for a solution using only R; unfortunately I cannot rely on other software (e.g. Tex)
update 2012-05-14 09:49:05 PDT
Thank you very much everyone for the many answers.
Reading these answers I realized that I should have made my priorities much clearer. Therefore, here is a list of my priorities in regard to this question.
1. R: I am looking for a solution that is based exclusively on R.
2. Reproducibility: the codebook can be part of an automated script.
3. Readability: the text should be easy to read.
4. Searchability: a file that can be opened with any standard software and searched (this is why I thought pdf would be a good solution, but this is overruled by 1 through 3).
I am currently labeling my variables using label() from the Hmisc package and might end up writing a .txt codebook using Label() from the same package.
(I'm not completely sure what you're after, but):
Like other package documentation, the file for mtcars is an .Rd file. You can convert it into other formats (ASCII) than pdf, but the usual way of producing a pdf does use pdflatex.
However, most information in such an .Rd file is written more or less by hand (unless you use yet another R package like roxygen/roxygen2 to help you generate parts of it automatically).
For user-data, usually Noweb is much more convenient.
.Rnw -Sweave-> .tex -pdflatex-> pdf is certainly the most usual way with such files.
However, you can use it e.g. with Openoffice (if that is installed) or use it with plain ASCII files instead of TeX.
Have a look at package knitr which may be easier with pure-ASCII files. (I'm not an expert, just switching over from Sweave)
If html is an option, both Sweave and knitr can work with that.
I don't know how to get the pdf of individual data sets but you can build the pdf of the entire datasets package from the LaTeX version using:
path <- find.package('datasets')
system(paste(shQuote(file.path(R.home("bin"), "R")),"CMD",
"Rd2pdf",shQuote(path)))
I'm not sure about this, but it makes sense that you'd need some sort of LaTeX distribution such as MiKTeX. I'm also not sure how this will work on other operating systems; I'm on Windows and it works for me.
PS this is only a partial answer to your question as you want to do this for your data, but if nothing else it may get the ball rolling.
The help page that is displayed when entering ?mtcars is generated from an .Rd file, which is a LaTeX-like file that is used for all of R's help pages. Although .Rd files are LaTeX-like, you don't actually need to know LaTeX to read or write them. The actual mtcars.Rd file is available here: http://commondatastorage.googleapis.com/jthetzel-public/mtcars.Rd , which can be viewed with any text editor.
.Rd files included in the ./man directory of a package are converted to .html files when installing the package. They are converted by functions in the "tools" package. If you would like functionality like ?mtcars for your datasets, you would need to create a package for them. That might sound complicated if you have never created a package before, but it is easy enough to learn and will make you a better R programmer. There are a number of examples of dataset-only packages on CRAN, for example msProstate: http://cran.r-project.org/web/packages/msProstate/index.html . Consider downloading the package source to see how it is organized.
For more information on creating your own packages, writing .Rd files, and building packages:
http://cran.r-project.org/doc/manuals/R-exts.html, especially "1.1.5 Data in packages".
Edit
And if you want to convert the .Rd file in your package to a .pdf, you can do so when building your package, but you will need a LaTeX compiler. If you are on Windows, see here: http://cran.r-project.org/bin/windows/Rtools/ .
You can't create a PDF with just R; you need to use other software that creates PDFs.
You could use a combination of utils::promptData, tools::Rd2HTML, and a simple custom function to open the created HTML file in the users' browser.
It would probably be easier to just make a package containing your data sets. Look at the "datasets" package for an example.
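A rough sketch of the promptData plus Rd2HTML idea, using only packages that ship with R. The data set, its descriptions, and the file names are made up. promptData writes a skeleton with placeholders, so the filled-in .Rd file is written by hand here to keep the example short.

```r
my_data <- data.frame(mpg = c(21, 22.8), cyl = c(6, 4))  # hypothetical data

out_dir     <- tempdir()
skeleton_rd <- file.path(out_dir, "my_data_skeleton.Rd")
rd_file     <- file.path(out_dir, "my_data.Rd")
html_file   <- file.path(out_dir, "my_data.html")
txt_file    <- file.path(out_dir, "my_data.txt")

# promptData() writes an .Rd skeleton with placeholders to fill in by hand.
utils::promptData(my_data, filename = skeleton_rd, name = "my_data")

# A filled-in version of such a file (contents invented for this example).
writeLines(c(
  "\\name{my_data}",
  "\\alias{my_data}",
  "\\docType{data}",
  "\\title{Fuel consumption of two example cars}",
  "\\description{A tiny made-up data set used to illustrate a codebook.}",
  "\\usage{my_data}",
  "\\format{A data frame with 2 observations on 2 variables:",
  "  \\describe{",
  "    \\item{mpg}{Miles per gallon}",
  "    \\item{cyl}{Number of cylinders}",
  "  }",
  "}",
  "\\keyword{datasets}"
), rd_file)

# Render the codebook as HTML and as plain text, without LaTeX.
tools::Rd2HTML(rd_file, out = html_file)
tools::Rd2txt(rd_file, out = txt_file,
              options = list(underline_titles = FALSE))

cat(readLines(txt_file), sep = "\n")
# browseURL(html_file)  # uncomment to open the HTML version in a browser
```

Both output files are searchable with standard software, which matches the priorities listed in the question.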
It looks like an external tool such as LaTeX is always needed if you want to generate a pdf. I would recommend using a simple ASCII text format to generate such a file. In principle the .Rd files are also ASCII text, but I do not find them particularly readable.
Instead, I would recommend using a plain text ASCII format such as Markdown (which is e.g. used on StackOverflow) to write the text file. Such a file is already much more readable than an .Rd formatted file, and as a bonus it can quite easily be processed into a PDF should you choose to do so later on. The knitr package I think is capable of generating PDF files from Markdown sources. In addition, knitr allows you to mix in R code in the Markdown text. This code can be evaluated and the results (even figures) added to the resulting PDF.
In practice you can use sprintf to generate character vectors that you can pipe to a file in order to dynamically generate the markdown text. Just write the template one time, and mark the places for the text you want to add later like this:
base_text = "
First header
============
This document was generated on %s, by %s.
"
text_forfile = sprintf(base_text, some_date, some_name)
Just dump the text in text_forfile to a .md file and you're done, no external tools needed. See this post on SO for how to dump text to a file.
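To make the template idea concrete, here is a small sketch that builds a Markdown codebook from a data frame and writes it with writeLines. The data set and the variable descriptions are hypothetical, and the bullet layout is just one possible choice.

```r
my_data <- data.frame(mpg = c(21, 22.8), cyl = c(6L, 4L))  # hypothetical data

# Hypothetical hand-written descriptions, one per column.
descriptions <- c(mpg = "Miles per gallon", cyl = "Number of cylinders")

header <- sprintf(
  "Codebook for my_data\n====================\n\nGenerated on %s.\n",
  format(Sys.Date())
)

# One bullet per variable: name, storage class, description.
var_classes <- vapply(my_data, function(col) class(col)[1], character(1))
entries <- sprintf("* `%s` (%s): %s",
                   names(my_data), var_classes, descriptions[names(my_data)])

md_file <- file.path(tempdir(), "codebook.md")
writeLines(c(header, entries), md_file)

cat(readLines(md_file), sep = "\n")
```

Because everything is plain R, this can run at the end of an automated script and the resulting .md file can be searched in any text editor.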

Redirecting R output and graphs

I use Sweave and LaTeX to create reports from R output and graphs, but sometimes the graphs are required in an editable format. I tried the R2wd package, but it doesn't seem very flexible with ggplot2. I'd highly appreciate it if someone could point me to some efficient ways of doing this. Thanks
It really depends what you mean by "editable", and what kinds of files/endpoints you're talking about. There are a lot of discussions out there (e.g. this R-help thread from 2006) about (1) the best options for generating figures to embed in Word (or PowerPoint, which is pretty much the same question) and (2) the best options for figures that can be edited (by which I mean that they can be modified by non-R-users, not just moved from file to file). The general conclusions I have seen are:
PDF files: vector format, editable in Adobe Illustrator ($$$), only sorta-kinda-embeddable in MS Office documents
Windows metafiles or extended metafiles (WMF/EMF): vector format, very limited support outside of the Windows platform. Somewhat wonky format, but MS Office-native. Will certainly have limited support for things like alpha channels (transparency).
SVG: vector format. Very modern, editable in Inkscape (don't know where else), not particularly MS Office-compatible (I think). (Generatable at least via the Cairo package.)
PNG: raster format, but very compact (you can make the resolution absurdly large and still have a reasonably small output file); probably the easiest/lowest-common-denominator solution if you only need portability and not editability.
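As a hedged sketch of the formats listed above, the code below writes one plot as PDF, SVG, and PNG. It uses the svg() device from base R's grDevices rather than the Cairo package mentioned above, and it skips SVG and PNG if this R build lacks the needed capabilities. File names are arbitrary.

```r
# The plot to export, wrapped in a function so it can be redrawn per device.
draw_plot <- function() plot(1:5, 1:5, col = 1:5, pch = 16)

out_dir  <- tempdir()
pdf_file <- file.path(out_dir, "fig1.pdf")
svg_file <- file.path(out_dir, "fig1.svg")
png_file <- file.path(out_dir, "fig1.png")

# Vector PDF (sizes in inches).
pdf(pdf_file, width = 6, height = 4)
draw_plot()
dev.off()

# Vector SVG, editable in Inkscape; needs cairo support in this R build.
if (capabilities("cairo")) {
  svg(svg_file, width = 6, height = 4)
  draw_plot()
  dev.off()
}

# High-resolution raster PNG (sizes in pixels at 300 dpi).
if (capabilities("png")) {
  png(png_file, width = 1800, height = 1200, res = 300)
  draw_plot()
  dev.off()
}

print(file.exists(c(pdf_file, svg_file, png_file)))
```

The same draw_plot function can be replaced by a print() call on a ggplot2 object, since these devices work with both base and grid graphics.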
As of R 2.13.0, Sweave can automatically generate both PDF and PNG files on the fly for each figure chunk. If this is saved as document foo.Rnw:
\documentclass{article}
\begin{document}
\SweaveOpts{png=TRUE,pdf=TRUE,eps=FALSE}
<<fig1,fig=TRUE>>=
plot(1:5,1:5,col=1:5,pch=16)
@
\end{document}
... then after Sweaveing your directory will contain the files foo-fig1.png and foo-fig1.pdf. I don't know if that answers your question, but your question isn't entirely clear ...
The package pgfSweave links together Sweave with tikzDevice. tikzDevice uses the LaTeX tikz package to put the instructions to draw a plot into LaTeX code. So you can copy the resulting code from one file to another. pgfSweave also adds some handy features like caching.
You could also just output PDFs of your plots with pdf() and then insert the LaTeX code to load those plots as figures. You lose the automated file management of Sweave that way though.
