This is more of an annoyance than it is a problem, but is there any way to prevent the line "overflow" that occurs when documentation in R is compiled and a line is too long?
A snippet of some documentation created with R CMD Rd2pdf [options] files shows the problem: the long lines simply run off the right margin of the page.
I can't find mention of this anywhere, and the only options for Rd2pdf are:
Options:
-h, --help print short help message and exit
-v, --version print version info and exit
--batch no interaction
--no-clean do not remove created temporary files
--no-preview do not preview generated PDF file
--encoding=enc use 'enc' as the default input encoding
--outputEncoding=outenc
use 'outenc' as the default output encoding
--os=NAME use OS subdir 'NAME' (unix or windows)
--OS=NAME the same as '--os'
-o, --output=FILE write output to FILE
--force overwrite output file if it exists
--title=NAME use NAME as the title of the document
--no-index don't index output
--no-description don't typeset the description of a package
--internals typeset 'internal' documentation (usually skipped)
Sorry for the spoiler: one solution is to not rely on roxygen2 quite so heavily (in extremis) for package maintenance. Why not maintain your DESCRIPTION manually? You don't need many changes, and it looks so much better...
The 'Collate' field is really not needed for the vast majority of packages.
Why not follow the tradition of many Bioconductor packages, 'Matrix', etc., of putting S4 class definitions (including reference classes) in a file 'AllClasses.R', and perhaps using 'AllGenerics.R' as well? For the rest, collation order should not matter.
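For illustration, here is a minimal sketch of that layout (the class, generic, and file names are made up):
# R/AllClasses.R -- all S4 class definitions collected in one file
setClass("Track", slots = c(x = "numeric", y = "numeric"))
# R/AllGenerics.R -- all generic declarations in one file
setGeneric("plotTrack", function(object, ...) standardGeneric("plotTrack"))
# R/Track-methods.R -- methods can live anywhere else; collation order no longer matters
setMethod("plotTrack", "Track", function(object, ...) plot(object@x, object@y, ...))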
When I try to render the latest version of the book R for Data Science (R4DS), I get as far as LaTeX compilation, then am stopped by the following error message.
! Text line contains an invalid character.
l.406 #> -- ^^[
[1mAttaching packages^^[[22m --------------------------------...
Error: LaTeX failed to compile _main.tex. See https://yihui.org/tinytex/r/#debugging for debugging tips. See _main.log for more info.
>
This corresponds to the part of the R4DS book where we are shown how to load the tidyverse and, looking at the _main.tex file, I see many lines with what look like ANSI escape sequences starting on this line. They have the form ^[[1m, ^[[22m, and so on. I manually compiled the LaTeX output using lualatex and found that there are dozens if not hundreds of examples of this throughout the book. I suspected it was because I was using the colorout package in R, but it appears that that package is required, so others who are rendering successfully must be using it too. I believe I have successfully updated all relevant packages.
It looks like I "solved" the problem by changing an option in the _common.R file from crayon.enabled=TRUE to crayon.enabled=FALSE. This removed the ANSI escape sequences from the book. Previously I had tried setting options(crayon.enabled=FALSE) in my R session, but this was evidently being overridden by the setting in _common.R.
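In other words, the relevant line in _common.R ends up reading as follows (just the single line; the other options set in that file may differ in your copy):
options(crayon.enabled = FALSE)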
Update: 23 Nov 2022
The process for rendering the files is completely different now because of the switch to Quarto. Here's how I did it.
Rscript -e 'update.packages()'
Rscript -e 'install.packages("quarto")'
Rscript -e 'devtools::install_github("hadley/r4ds")'
git clone https://github.com/hadley/r4ds.git
cd r4ds
Next, I wrote a small perl script to avoid the error messages I was getting about trying to render html material to pdf. (I'm omitting a lot of dead-ends I encountered in the process.)
#!/usr/bin/perl
use strict;
use warnings;
use File::Slurp qw(prepend_file);
my @files = glob('*.qmd');
my $header = "\n---\nprefer-html: true\n---\n\n";
foreach my $file (@files) {
    prepend_file($file, $header);
}
I ran the above script in the r4ds directory.
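If you would rather avoid Perl, a rough R equivalent of the same prepend-a-header trick (an untested sketch) is:
# prepend a prefer-html YAML block to every .qmd file in the current directory
for (f in Sys.glob("*.qmd")) {
  old <- readLines(f, warn = FALSE)
  writeLines(c("", "---", "prefer-html: true", "---", "", old), f)
}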
Next I loaded R and did the following:
library(quarto)
quarto_render("index.qmd", output_format = "pdf")
The above failed with the error message: "\begin{document} not found". Luckily, the aborted process leaves an index.tex file I can process and also gives a line number for the error. I went to that line number in the index.tex file and deleted the block of html I found there.
After that, I ran
lualatex index.tex
twice and got a successful render, minus the cover page. (You could presumably run xelatex index.tex instead.) There are a lot of problems with my render, such as the plots being too large to fit on the page. If I decide to spend time fixing them (unlikely, since Hadley seems to want us to use the online version) I'll modify this answer.
I am developing an R package. It is based on a project that used only a Makefile. Most of it translated easily to the R CMD build workflow. However, the PDFs I need to create are a bit complex and I don't get them right unless I tinker; so far I have only figured out how to do that with a Makefile.
In the R package documentation I find references to using Makefiles for sources and even for vignettes.
I don't grasp how these are meant to be used. From the documentation I had the impression that Makefiles would be invoked as part of R CMD build, but when I put a Makefile in the described directories it is simply ignored. However, R CMD check recognises it and reports passing checks.
I have also seen Makefiles that call R CMD build themselves, but I keep wondering how those would execute when I use install.packages. That doesn't seem right: why would R CMD check these files if it didn't care about them? And there's also the page in R Packages about adding SystemRequirements: GNU make; why declare that for a file you don't use?
So what is the best practice nowadays? And are there examples in the wild that I can look at?
Updates
As I was asked for an example:
I want to build a vignette similar to what is described in "Writing package vignettes". There is a master LaTeX file which includes several Rnw files.
The concrete dilemmas are:
how do I build the PDF vignette?
how can I enforce dependencies? Obviously the Rnw files need to be rendered first.
the Rnw files need slowly computed data that is meant to go neither into the package nor into the repo (it's some gigabytes), but it is reused several times during the build.
So far I do it with a Makefile; the general pattern is like this:
tmp/test.pdf: tmp/test.tex tmp/rnw1.tex tmp/rnw2.tex
latexmk -outdir=$(@D) $<
tmp/%.tex: r/%.rnw
Rscript -e "knitr::knit('$<', output='$@')"
tmp/rnw1.tex tmp/rnw2.tex: tmp/slowdata.Rdata
tmp/slowdata.Rdata: r/ireallytakeforever.R
Rscript $<
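(Inside R, the caching idea behind the last two rules is roughly this sketch, assuming ireallytakeforever.R is what saves tmp/slowdata.Rdata:)
# recompute the slow data only when the cached file is missing
if (!file.exists("tmp/slowdata.Rdata")) {
  source("r/ireallytakeforever.R")  # assumed to write tmp/slowdata.Rdata
}
load("tmp/slowdata.Rdata")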
Bdecaf,
Ok, answer version 2.0 - chuckle.
You mentioned that "The question is how Makefiles and the package build workflow are supposed to go together". In that context, my recommendation is you review a set of example R package makefiles:
Makefile for Yihui Xie's knitr package for R.
Makefile for my R/qtlcharts package.
The knitr package makefile (in my view) provides a good example of how to build vignettes. Review the makefile and directory structure; that is the template I would recommend you use.
I'd also recommend you look at maker, a Makefile for R package development. On top of this, I would start with Karl Broman's guides (they are what I used myself as a reference a while back; now eclipsed by Hadley's book on packages, but still useful in my view):
Minimal make: A minimal tutorial on Make
R package Primer.
The other recommendation is to read Rob Hyndman's article I referenced previously:
Makefiles for R/LaTeX projects
Between them, you should be able to do what you ask. Above and beyond that, you have the base R package manual you referenced.
I hope the above helps.
T.
Referenced pages:
minimal make: A minimal tutorial on make, by Karl Broman
I would argue that the most important tool for reproducible research is not Sweave or knitr but GNU make.
Consider, for example, all of the files associated with a manuscript. In the simplest case, I would have an R script for each figure plus a LaTeX file for the main text. And then a BibTeX file for the references.
Compiling the final PDF is a bit of work:
Run each R script through R to produce the relevant figure.
Run latex, then bibtex, then latex a couple more times.
And the R scripts need to be run before latex is, and only if they’ve changed.
A simple example
GNU make makes this easy. In your directory for the manuscript, you create a text file called Makefile that looks something like the following (here using pdflatex).
mypaper.pdf: mypaper.bib mypaper.tex Figs/fig1.pdf Figs/fig2.pdf
pdflatex mypaper
bibtex mypaper
pdflatex mypaper
pdflatex mypaper
Figs/fig1.pdf: R/fig1.R
cd R;R CMD BATCH fig1.R
Figs/fig2.pdf: R/fig2.R
cd R;R CMD BATCH fig2.R
Each batch of lines indicates a file to be created (the target), the files it depends on (the prerequisites), and then a set of commands needed to construct the target from the dependent files. Note that the lines with the commands must start with a tab character (not spaces).
Another great feature: in the example above, you’d only build fig1.pdf when fig1.R changed. And note that the dependencies propagate. If you change fig1.R, then fig1.pdf will change, and so mypaper.pdf will be re-built.
One oddity: if you need to change directories to run a command, do the cd on the same line as the related command. The following would not work:
### this doesn't work ###
Figs/fig1.pdf: R/fig1.R
cd R
R CMD BATCH fig1.R
You can, however, use \ for a continuation line, like so:
### this works ###
Figs/fig1.pdf: R/fig1.R
cd R;\
R CMD BATCH fig1.R
Note that you still need to use the semicolon (;).
Using GNU make
You probably already have GNU make installed on your computer. Type make --version in a terminal/shell to see. (On Windows, go here to download make.)
To use make:
Go into the directory for your project.
Create the Makefile file.
Every time you want to build the project, type make.
In the example above, if you want to build fig1.pdf without building mypaper.pdf, just type make fig1.pdf.
Frills
You can go a long way with just simple make files as above, specifying the target files, their dependencies, and the commands to create them. But there are a lot of frills you can add, to save some typing.
Here are some of the options that I use. (See the make documentation for further details.)
Variables
If you’ll be repeating the same piece of code multiple times, you might want to define a variable.
For example, you might want to run R with the flag --vanilla. You could then define a variable R_OPTS:
R_OPTS=--vanilla
You refer to this variable as $(R_OPTS) (or ${R_OPTS}; either parentheses or curly braces is allowed), so in the R commands you would use something like
cd R;R CMD BATCH $(R_OPTS) fig1.R
An advantage of this is that you just need to type out the options you want once; if you change your mind about the R options you want to use, you just have to change them in the one place.
For example, I actually like to use the following:
R_OPTS=--no-save --no-restore --no-init-file --no-site-file
This is like --vanilla but without --no-environ (which I need because I use the .Renviron file to define R_LIBS, to say that I have R packages defined in an alternative directory).
Automatic variables
There are a bunch of automatic variables that you can use to save yourself a lot of typing. Here are the ones that I use most:
$@ the file name of the target
$< the name of the first prerequisite (i.e., dependency)
$^ the names of all prerequisites (i.e., dependencies)
$(@D) the directory part of the target
$(@F) the file part of the target
$(<D) the directory part of the first prerequisite (i.e., dependency)
$(<F) the file part of the first prerequisite (i.e., dependency)
For example, in our simple example, we could simplify the lines
Figs/fig1.pdf: R/fig1.R
cd R;R CMD BATCH fig1.R
We could instead write
Figs/fig1.pdf: R/fig1.R
cd $(<D);R CMD BATCH $(<F)
The automatic variable $(<D) will take the value of the directory of the first prerequisite, R in this case, and $(<F) will take the value of the file part of the first prerequisite, fig1.R in this case.
Okay, that’s not really a simplification. There doesn’t seem to be much advantage to this, unless perhaps the directory were an obnoxiously long string and we wanted to avoid having to type it twice. The main advantage comes in the next section.
Pattern rules
If a number of files are to be built in the same way, you may want to use a pattern rule. The key idea is that you can use the symbol % as a wildcard, to be expanded to any string of text.
For example, our two figures are being built in basically the same way. We could simplify the example by including one set of lines covering both fig1.pdf and fig2.pdf:
Figs/%.pdf: R/%.R
cd $(<D);R CMD BATCH $(<F)
This saves typing and makes the file easier to maintain and extend. If you want to add a third figure, you just add it as another dependency (i.e., prerequisite) for mypaper.pdf.
Our example, with the frills
Adding all of this together, here’s what our example Makefile will look like.
R_OPTS=--vanilla
mypaper.pdf: mypaper.bib mypaper.tex Figs/fig1.pdf Figs/fig2.pdf
pdflatex mypaper
bibtex mypaper
pdflatex mypaper
pdflatex mypaper
Figs/%.pdf: R/%.R
cd $(<D);R CMD BATCH $(R_OPTS) $(<F)
The advantage of the added frills: less typing, and it’s easier to extend to include additional figures. The disadvantage: it’s harder for others who are less familiar with GNU Make to understand what it’s doing.
More complicated examples
There are complicated Makefiles all over the place. Poke around github and study them.
Here are some of my own examples:
Makefile for my AIL probabilities paper
Makefile for my phylo QTL paper
Makefile for my pre-CC probabilities paper
Makefile for a talk on interactive graphs.
Makefile for a talk on QTL mapping for function-valued traits.
Makefile for my R/qtlcharts package.
And here are some examples from Mike Bostock:
Makefile for us-rivers
Makefile for protovis
Makefile for topotree
Also look at the Makefile for Yihui Xie’s knitr package for R.
Also of interest is maker, a Makefile for R package development.
Resources
GNU make webpage
Official manual
O’Reilly Managing projects with GNU make book (part of the Open Books project)
Software carpentry’s make tutorial
Mike Bostock’s “Why Use Make”
GNU Make for reproducible data analysis by Zachary Jones
Makefiles for R/LaTeX projects by Rob Hyndman
R package primer
a minimal tutorial
A minimal tutorial on how to make an R package.
R packages are the best way to distribute R code and documentation,
and, despite the impression that the official manual
(Writing R Extensions)
might give, they really are quite simple to create.
You should make an R package even for code that you don't plan to
distribute. You'll find it is easier to keep track of your own
personal R functions if they are in a package. And it's good to write
documentation, even if it's just for your future self.
Hadley Wickham wrote
a book about R packages (free online; also
available in paper form from
Amazon). You
might just jump straight there.
Hilary Parker wrote a
short and clear tutorial on writing R packages.
If you want a crash course, you should start there. A lot of people
have successfully built R packages from her instructions.
But there is value in having a diversity of
resources, so I thought I'd go ahead and write my own minimal tutorial.
The following list of topics looks forbidding, but each is short and
straightforward (and hopefully clear). If you're put off by the list
of topics,
and you've not already abandoned me in favor of
Hadley's book, then why aren't you reading
Hilary's tutorial?
If anyone's still with me, the following pages cover the essentials of
making an R package.
Why write an R package?
The minimal R package
Building and installing an R package
Making it a proper package
Writing documentation with Roxygen2
Software licenses
Checking an R package
The following are important but not essential.
Putting it on GitHub
Getting it on CRAN
Writing vignettes
Writing tests
Including datasets
Connecting to other packages
The following contains links to other resources:
Further resources
If anything here is confusing (or wrong!), or if I've missed
important details, please
submit an issue, or (even
better) fork the GitHub repository for this website,
make modifications, and submit a pull request.
The source for this tutorial is on github.
Also see my tutorials on
git/github,
GNU make,
knitr,
making a web site with GitHub Pages,
data organization,
and reproducible research.
At our site, we have a large amount of custom R code that is used to build a set of packages for internal use and distribution to our R users. We try to maintain the entire library in a versioning scheme so that the version numbers and the date are the same. The problem is that we've gotten to the point where the number of packages is substantial enough that manual modification of the DESCRIPTION file and the package .Rd file is very time consuming, and it would be nice to automate these pieces.
We could write a pre-script that goes through the full set of files and writes the current date and version number. This could be done without a lot of pain, but it would modify our current build chain and we would have to adapt the various steps.
Is there a way that this can be done without having to do a pre-build file modification step? In other words, can the DESCRIPTION file and the .Rd file contain something akin to an environment variable that will be substituted with the current information when called upon by R CMD build ?
You cannot use environment variables, as R, when running R CMD build ... or R CMD INSTALL ..., sees the file as fixed.
But the old saying that there is no problem that cannot be fixed by another layer of indirection remains true. Your R source files could simply sit behind another layer in which you do text substitution according to some pattern. If you like autoconf, you could have a DESCRIPTION.in and a configure script that queries the environment variables, or a meta-config file or database, or something else, and writes the real file out. Similarly, you could have a sed or perl or python or R or ... script doing the textual substitution.
I used to let svn fill in the argument to Date: in DESCRIPTION, and also encoded revision numbers in an included header file. It's all scriptable to your heart's content.
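For example, a small R script along these lines (the PKG_VERSION variable and the template fields are purely illustrative) could be run just before R CMD build to generate DESCRIPTION from a DESCRIPTION.in template:
# fill in Version: and Date: from an environment variable and the system clock
desc <- readLines("DESCRIPTION.in")
desc <- sub("^Version:.*", paste("Version:", Sys.getenv("PKG_VERSION", "0.0.1")), desc)
desc <- sub("^Date:.*", paste("Date:", format(Sys.Date())), desc)
writeLines(desc, "DESCRIPTION")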
For example, I am looking into the source code of the R function PTdensity.R in a package called DPpackage, where I found that the author calls a Fortran subroutine ptdensityu:
foo <- .Fortran("ptdensityu", ...
The question is how to find the source code for the ptdensityu subroutine. It must be contained in one of the Fortran files in the /src/ directory, but how do I know which file it is? (Actually I found it by manually checking each file under /src/; it is in /src/PTudensity.f.)
Quick link for the package : link
PS: I used to use this link to search source code, but somehow it does not work any more.
On a Linux box, you use the grep command. In Emacs, you build a tags file. In other editors there are probably similar functions. In Windows, can't you right-click on a folder, hit Search..., and fill in the 'A word or phrase in the file' box? Or install Cygwin and use the grep command.
Amazes me that people are using computers without basic skills such as finding a string in a file...
Have you untarred this...............?
(It was the first subroutine in PTudensity.f)
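If you prefer to stay inside R, here is a quick sketch that scans the untarred src/ directory for the subroutine name (adjust the path to wherever you unpacked the tarball):
# list the Fortran sources and keep the files that mention the subroutine
files <- list.files("DPpackage/src", pattern = "\\.f$", full.names = TRUE, ignore.case = TRUE)
hits <- vapply(files, function(f) any(grepl("ptdensityu", readLines(f, warn = FALSE), ignore.case = TRUE)), logical(1))
files[hits]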
I've now got everything to work properly on my Mac OS X 10.6 machine so that I can create decent looking LaTeX documents with Sweave that include snippets of R code, output, and LaTeX formatting together. Unfortunately, I feel like my work-flow is a bit clunky and inefficient:
Using TextWrangler, I write LaTeX code and R code (each R code chunk delimited by <<>>= above and @ below) together in one .Rnw file.
After saving changes, I call the .Rnw file from R using the Sweave command
Sweave(file="/Users/mymachine/Documents/Assign4.Rnw",
syntax="SweaveSyntaxNoweb")
In response, R outputs the following message:
You can now run LaTeX on 'Assign4.tex'
So then I find the .tex file (Assign4.tex) in the R directory and copy it over to the folder in my documents ~/Documents/ where the .Rnw file is sitting (to keep everything in one place).
Then I open the .tex file (e.g. Assign4.tex) in TeXShop and compile it there into pdf format. It is only at this point that I get to see any changes I have made to the document and see if it 'looks nice'.
Is there a way that I can compile everything with one button click? Specifically it would be nice to either call Sweave / R directly from TextWrangler or TeXShop. I suspect it might be possible to code a script in Terminal to do it, but I have no experience with Terminal.
Please let me know if there are any other things I can do to streamline or improve my workflow.
I use a Makefile of the following form for my Sweave documents:
pdf: myfile.tex
R CMD texi2pdf myfile.tex
myfile.tex: myfile.Rnw
R CMD Sweave myfile.Rnw
Then I can build the document in one step in the Mac OS Terminal by running the command make pdf
I'm sure there is a way to bring this closer to your one-click goal in Mac OS X, but this works well enough for me.
One-click Sweaving is easy to do in TeXShop using the Sweave.sh script by Gregor Gorjanc.
Get it from http://cran.r-project.org/contrib/extra/scripts/Sweave.sh and put it in your ~/Library/TeXShop/bin/ folder.
Then add the following files to your ~/Library/TeXShop/engines/ folder:
As Sweave.engine:
#!/bin/bash
~/Library/TeXShop/bin/Sweave.sh -ld "$1"
As SweaveNoClean.engine:
#!/bin/bash
~/Library/TeXShop/bin/Sweave.sh -nc -ld "$1"
You'll have to set the permissions on Sweave.sh and the two engine files to allow execution.
To Sweave with one click, restart TeXShop after adding these files, open the Sweave document (with Rnw extension) and in the dropdown menu above the document window, change it from LaTeX to Sweave or SweaveNoClean.
BEWARE: The "Sweave" option will clean up after itself, deleting all the extra files LaTeX and Sweave create. If your file is called myfile.Rnw, this will include files called myfile.R and myfile.tex. So a word to the wise: make sure the basename of your Rnw file is unique; then nothing unexpected will be written over and then deleted.
The SweaveNoClean option does not clean up after itself. This makes sure you don't delete anything unexpected; though it could still write over a file called myfile.tex if you Sweave a myfile.Rnw. This also doesn't delete any graphics that have been created, in case you want to have them separate from your full typeset document.
On the bash shell command line:
R CMD Sweave foo.Rnw && pdflatex foo.tex
Runs Sweave, and if that succeeds it goes on to do pdflatex. Out pops a pdf. If you've got this in a bash Terminal then just hit up-arrow to get it back and do it again. And again. And Again.
Makefile solution also good.
RStudio has a button that does this in one go. One caveat is that it runs in its own session, so any workspace variables you may have set are ignored.
Just a note: you can actually call things like pdflatex etc. directly from R using texi2dvi (in the tools package). For example:
Sweave(file="/Users/mymachine/Documents/Assign4.Rnw")
texi2pdf("Assign4.tex")
would compile your Rnw file into a pdf. Thus, no need to leave R to handle the tex->pdf step.
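If you do this a lot, a tiny wrapper keeps it to a single call (just a sketch; the function name is made up):
library(tools)
# Sweave the .Rnw file, then run the tex -> pdf step, all without leaving R
sweave_pdf <- function(rnw) {
  Sweave(rnw)
  texi2pdf(sub("\\.Rnw$", ".tex", rnw, ignore.case = TRUE))
}
sweave_pdf("Assign4.Rnw")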
I use these (saved as sweave.engine and sweavebibtex.engine) as custom engines in TeXShop. I usually work up a code chunk in R, then copy the block into the .Rnw file I have open in TeXShop. I'd love a solution that lets me do syntax highlighting and spell checking of R and TeX in the same document (that isn't Emacs).
#!/bin/bash
echo 'SWEAVE | PDFLATEX. Custom engine--Akasurak-16Nov2009'
export PATH=$PATH:/usr/texbin:/usr/local/bin
R CMD Sweave "$1"
pdflatex "${1%.*}"
and the second, for doing bibtex as well:
#!/bin/bash
date
before="$(date +%s)"
echo 'SWEAVE | PDFLATEX | BIBTEX | PDFLATEX | PDFLATEX. Custom engine--Akasurak-16Nov2009'
#Updated 20Jul2010 for auto including Sweave.sty
export PATH=$PATH:/usr/texbin:/usr/local/bin
R CMD Sweave "$1"
R CMD pdflatex "${1%.*}"
bibtex "${1%.*}.aux"
R CMD pdflatex "${1%.*}"
R CMD pdflatex "${1%.*}"
after="$(date +%s)"
elapsed_seconds="$(expr $after - $before)"
date
echo Elapsed time '(m:s)': $(date -r $elapsed_seconds +%M:%S)
Can't say they are the best way of doing things, but they do work.
I use either Aquamacs or Eclipse to do the editing of the .Rnw file, then I use the following shell function to compile & view it:
sweaveCache () {
Rscript -e "library(cacheSweave); setCacheDir(getwd());
Sweave('$1.Rnw', driver = cacheSweaveDriver)" &&
pdflatex --shell-escape $1.tex &&
open $1.pdf
}
Notice that I'm using the cacheSweave driver, which helps avoid constantly re-executing code sections that take a long time to run.
BTW, I'm also trying to switch over to Babel instead of Sweave; not sure which I'll end up using more often, but there are definitely aspects of Babel that I like.
The best solution is here: you create a new *.engine for TeXShop to use, then typeset with the usual shortcut or a single button click.
http://cameron.bracken.bz/sweave-for-texshop
Cameron is also very responsive, so I highly recommend his solution.
If you are open to switching to a (paid) solution, TextMate has a Sweave plugin that takes you from .Rnw to PDF in one step: Sweave, typeset, and view. Combined with Skim, which can be configured to reload PDFs, it makes tweaking files pretty easy.
I had this same issue (I use Mac OS X) and I opted to download Eclipse Classic 3.6.2 and then install the StatET plugin. It's a bit hairy to get set up, but once you do, this environment is nice because you can one-click compile your .Rnw Sweave document using pdflatex and set options for your favorite viewer so the .pdf automatically pops up when you compile, like it does in TeXShop. You can do this in TeXShop as well, but TeXShop is lousy for debugging .Rnw files and it doesn't highlight the R code in the .Rnw file. In Eclipse you can customize the syntax highlighting (not the greatest from the Texclipse end, but OK) so that you can easily distinguish between your R and LaTeX code. You can also launch the R console from within Eclipse, and it has a graphical object browser. Anyway, I could go on. If you want details about how to get it all installed, message me.
Guess I'm late to the party on this, but I put together a webpage that documents my Sweave workflow based on Eclipse (with one-touch sweave):
http://www.stanford.edu/~messing/ComputationalSocialScienceWorkflow.html