How to extract all code chunks from a Rnw Sweave file? - r

I received a .Rnw file that gives errors when trying to build the package it belongs to. Problem is, when checking the package using the tools in RStudio, I get no useful error information whatsoever. So I need to figure out first on what code line the error occurs.
In order to figure this out, I wrote this 5-minute hack to get all code chunks in a separate file. I have the feeling though I'm missing something. What is the clean way of extracting all code in an Rnw file just like you run a script file? Is there a function to either extract all, or run all in such a way you can find out at which line the error occurs?
My hack:
ExtractChunks <- function(file.in,file.out,...){
isRnw <- grepl(".Rnw$",file.in)
if(!isRnw) stop("file.in should be an Rnw file")
thelines <- readLines(file.in)
startid <- grep("^[^%].+>>=$",thelines)
nocode <- grep("^<<",thelines[startid+1]) # when using labels.
codestart <- startid[-nocode]
out <- sapply(codestart,function(i){
tmp <- thelines[-seq_len(i)]
endid <- grep("^#",tmp)[1] # take into account trailing spaces / comments
c("# Chunk",tmp[seq_len(endid-1)])
})
writeLines(unlist(out),file.out)
}

The two strategies are Stangle (for a Sweave variant) and purl for a knitr variant. My impression for .Rnw files is that they are more or less equivalent, but purl should work for other types of files, as well.
Some simple examples:
f <- 'somefile.Rnw'
knitr::purl(f)
Stangle(f)
Either way you can then run the created code file using source.
Note: This post describes an chunk option for knitr to selectively purl chunks, which may be helpful, too.

Related

Is it possible to copy one (or 2, or 3) code chunks from one R Markdown document to another R Markdown document without copy/pasting?

I have read previously asked similar questions here, here, and here (among many more).
Unfortunately, none of the solutions offered to those questions seem to solve my issue.
I tried the function written by #bryanshalloway here but that did not have the desired result.
For more context, I am producing scientific manuscripts using an R Markdown workflow. I perform EDA in one notebook and later come back to write the manuscript in a different notebook. I import the data, wrangle it, create tables, and do some basic visualizations in the EDA notebook and include narrative text (mostly for myself).
I then create a separate notebook to write the manuscript. To keep it reproducible, I want to include all of the steps from the EDA with respect to data import, tidying, and wrangling, however I do not need the commentary that went along with it. Additionally, I may want some (but definitely not all) of the tables and basic visualizations I created during the EDA, but I would need to build them up substantially to get them publication ready.
My current workflow is just copying and pasting the relevant code chunks and then adding to those where necessary for tables and figures (i.e., adding labels and captions to a ggplot).
Is there a way to "source" these individual code chunks from one R Markdown file/R Notebook into another? Perhaps using knit_child (but not bring the entire R Markdown file into the current parent file)?
I would like to avoid copying the desired code chunks into separate R scripts.
Thanks!
It is very possible with knitr purl and spin:
Ok lets say this is your initial Rmarkdown report:
call the file report1.Rmd
---
title: Use `purl()` to extract R code
---
The function `knitr::purl()` extracts R code chunks from
a **knitr** document and save the code to an R script.
Below is a simple chunk:
```{r, simple, echo=TRUE}
1 + 1
```
Inline R expressions like `r 2 * pi` are ignored by default.
If you do not want certain code chunks to be extracted,
you can set the chunk option `purl = FALSE`, e.g.,
```{r, ignored, purl=FALSE}
x = rnorm(1000)
```
Then you go to the console and purl the file:
> knitr::purl("report1.Rmd")
this creates an R file called report1.R in the same directory you are in,
with only the chunks that are not purl=false.
Its an simple R script looking like this:
## ---- simple, echo=TRUE----------------------------------------------------------------------------
1 + 1
Lets rename the file for safety purposes:
> file.rename("report1.R", "report_new.R")
Finally lets spin it back to report_new.Rmd :
> knitr::spin("report_new.R", format = "Rmd", knit=F)
This gives you a new Rmd file called report_new.Rmd containing only the relevant chunks and nothing else
```{r simple, echo=TRUE}
1 + 1
```

Automatically generated LaTeX beamer slides with R/knitr

I am working on a LaTeX report template that automatically generates a beamer document, pulling in figures from specified directories and placing them one per slide.
Here is an example of the code that I am using for this, as a code chunk in my .Rnw document:
<<results='asis',echo=FALSE>>=
suppressPackageStartupMessages(library("Hmisc"))
# get the plots from the common directory
Barplots_dir<-"/home/figure/barplots"
Barplots_files<-dir(Barplots_dir)
# create a beamer slide for each plot
# use R to output LaTeX markup into the document
for(i in 1:length(Barplots_files)){
GroupingName<-gsub("_alignment_barplot.pdf", "", Barplots_files[i]) # strip this from the filename
file <- paste0(Barplots_dir,"/",Barplots_files[i]) # path to the figure
cat("\\subsubsection{", latexTranslate(GroupingName), "}\n", sep="") # don't forget you need double '\\' because one gets eaten by R !!
cat("\\begin{frame}{", latexTranslate(GroupingName), " Alignment Stats}\n", sep="")
cat("\\includegraphics[width=0.9\\linewidth,height=0.9\\textheight,keepaspectratio]{", file, "}\n", sep="")
cat("\\end{frame}\n\n")
}
#
However I recently came across this article by Yihui Xie which includes a remark about cat("\\includegraphics{}") being a bad idea. Is there a reason for this, and is there a better option?
To be clear, these figures are generated by other programs as part of a larger pipeline; generating them within the document is not an option, but I need the document to be able to dynamically find and insert them into the report. I know that there are some capabilities to do this directly from within LaTeX itself but cat'ing out the LaTeX markup I need seemed like an easier and more flexible task.
cat("\\includegraphics{}") is likely to be a bad idea if you are from the old Sweave world (where one might need to open a graphics device, draw a plot, close the device, and cat("\\includegraphics{}")). No kittens will be killed as long as you understand what you are doing. Your use case seems to be very reasonable to me, and I don't have a better approach.

How to include output of help() in sweave pdf

I would like to include function documentation from the help file in a sweave document. I tried the following sweave block
<<>>=
?lm
#
but I get error messages when calling Sweave on the Rnw file. How can I include the entire help message in the document?
The key to this is really figuring out how to get the information you desire as a character string.
help("lm") opens up the help file for the relevant function, but not in the console.
utils:::.getHelpFile gives you the Rd version of that file.
From there, you can use tools:::Rd2txt to convert it to text...
Which can be "captured" using capture.output.
Those are essentially the steps contained in the first few lines of helpExtract from my "SOfun" package. That function, however, captures just the requested section.
Instead, if you can settle for just the text, you can do something along the lines of:
gsub("_\b", "",
capture.output(tools:::Rd2txt(
utils:::.getHelpFile(utils::help("lm")))))

Editing a .r file from within another .r file

I am trying to make my current project reproducible, and so am creating a master document (eventually a .rmd file) that will be used to call and execute several other documents. This way myself and other investigators only need to open and run one file.
There are three layers to the current setup: master file, 2 read-in files, 2 databases. The master file calls the read-in files using source(), and the read-in files parse the .csv databases and apply labels.
The read-in files and the databases are generated automatically with the data management software I'm currently using (REDCap) each time I download the updated data.
However, the read-in files have a line of code that removes all of the objects in my environment. I would like to edit the read-in files directly from the master file so that I do not have to open the read-in files individually each time I run my report. Specifically, since all the read-in files are the same, I would like to remove line #2 in each.
I've tried searching Google, and tried file.edit(), but have been unable to find anything. Not even sure it is possible, but figured I would ask. Let me know if I can improve this question or if you need any additional code to answer it. Thanks!
Current relevant master code (edited for generality):
source("read-in1")
source("read-in2")
Current relevant read-in file code (same in each file, except for the database name):
#Clear existing data and graphics
rm(list=ls())
graphics.off()
#Load Hmisc library
library(Hmisc)
#Read Data
data=read.csv('database.csv')
#Setting Labels
[read-in code truncated]
Additional details:
OS: Windows 7 Professional x86
R version: 3.1.3
R Studio version: 0.99.441
You might try readLines() and something like the following (which was simplified greatly by a suggestion from #Hong Ooi below):
eval(parse(readLines("read-in1.R")[-2]))
My original solution which was much more pedantic:
f <- file("read-in1.R", open="r")
t <- readLines(f)
close(f)
for (l in t[-2]) { eval(parse(text=l)) }
The for() loop just parses and evaluates each line from the text file except for the second one (that's what the -2 index value does). If you're reading and writing longer files then the following will be much faster than the second option, however still less preferable than #Hong Ooi's:
f <- file("read-in1.R", open="r")
t <- readLines(f)
close(f)
f <- file("out.R", open="w")
o <- writeLines(t[-2], f)
close(f)
source("out.R")
Sorry I'm so late in noticing this question, but you may want to investigate getting access the the REDCap API and using either the redcapAPI package or the REDCapR package. Both of those packages will allow you to export the data from REDCap and directly into R without having to use the download scripts. redcapAPI will even apply all the formats and dates (REDCapR might do this now too. It was in the plan, but I haven't used it in a while).
You could try this. It just calls some shell commands: (1) renames the file, then (2) copies all lines not containing rm(list=ls()) to a new file with the same name as the original file, then (3) removes the copy.
files_to_change <- c("read-in1.R", "read-in2.R")
for (f in files_to_change) {
old <- paste0(f, ".old")
system(paste("cmd.exe /c ren", f, old))
system(paste("cmd.exe /c findstr /v rm(list=ls())", old, ">", f))
system(paste("cmd.exe /c rm", old))
}
After calling this loop you should have
#Clear existing data and graphics
graphics.off()
#Load Hmisc library
library(Hmisc)
#Read Data
data=read.csv('database.csv')
#Setting Labels
in your read-in*.R files. You could put this in a batch script
#echo off
ren "%~f1" "%~nx1.old"
findstr /v "rm(list=ls())" "%~f1.old" > "%~f1"
rm "%~nx1.old"
say, "example.bat", and call that in the same way using system.

practically getting started with Sweave

my question(s) might be less general than the title suggests. I am running R on Mac OS X with a MySQL database to store the data. I have been working with the Komodo / Sciviews-R for some time. Recently I had the need for auto-generated reports and looked into Sweave. I guess StatET / Eclipse appears to be the "standard" solution for Sweavers.
1) Is it reasonable to switch from Komodo to StatET Eclipse? I tried StatET before but chose Komodo over StatET because I liked the calltip / autosuggest and the more convenient config from Komodo so much.
2) What´s a reasonable workflow to generate Sweave files? Usually I develop my R code first and then care about the report later. I just learned today that there is one file in Sweave that contains R code and Latex code at once and that from this file the .tex document is created. While the example files look handily and can't really imagine how to enter my 250 + lines of R code to a file and mixed it up with Latex.
Is it possible to just enter the qplot() and ggplot() statements to a such a document and source the functionality like database connection and intermediate results somehow?
Or is it just a matter of being used to the mix of Latex and R code?
Thx for any suggestions, hints, links and back-to-the-roots-shout-outs…
You've asked several questions, so here's several answers;
Is StatEt/Eclipse the right way to do Sweave ?
Not nessarily (note: I'm an avid StatEt/Eclipse user, and use it for both pure R and Sweave/R and love it, I haven't used Komodo / sciviews-R). You should be able to run the sweave command from any R command line which will generate a .tex file. You can then turn the .tex file into something readable (like pdf) from any tex environment.
What's a good Sweave workflow ?
When I have wanted to turn an r script into a sweave report I generaly start with an empty sweave template and copy/paste my entire R script into a sweave R block just after the title, i.e;
<<label=myEntireRScript, echo=false, include=false>>
#Insert code here
myTable<-dataframe(...)
myPlot<-qplot(....)
#
Then I go through and find the parts I want to report. For instance, if i want to put a table into the report, I'll cut the R block and put an xtable block in, and the same for variables and plots.
<<label=myEntireRScript, echo=false, include=false>>=
#Insert code here
#
Put any text I want before my table here, maybe with a \Sexpr{print(variable)} named variable
<<label=myTable, result=Tex>>=
myTable<-dataframe(...)
print(xtable(mytable,...),...)
#
Any text I want before my figure
<label=myplot, result=figure>>=
myPlot<-qplot(....)
print(qplot)
#
You may want to look at these related SO posts. The rest of my post relates to your question 2.
When creating reports with Sweave, I usually keep most of the R code and the report text separate. If the R code is fast to run, then I prefer I will include something like the following at the start of the .Rnw file:
<<>>
source('/path/to/script.r')
#
On the other hand, if the R code takes a long time, I will often include something like the following at the end of the R script:
Sweave('/path/to/report.Rnw'); system('pdflatex report.tex')
That way, I can re-generate the report quickly, without needing to run all the R code again. Then, the only work R has to do in the Sweave file is print tables, make graphs and maybe extract a few figures.
Like nullglob, I prefer to keep the R and Sweave files separate, but I prefer to save the workspace with save.image() rather than to source() the file. This avoids running the R calculations with each .Rnw file compiling (and I always end up tinkering with the typesetting more than I'd like).
My general work flow is to do each paper/project in it's own folder with it's own R file(s). When the calculation side is "done", I save.image() to store all the workspace variables as-is.
Then, in the .Rnw file in the same directory I set the working directory with setwd() and load all variables with load(".Rdata"). Of course, you can change the name you use for your workspace, but I do one workspace per folder and keep the default name. Oh, and if you tinker with the R file, be sure save the workspace image and watch out for variables that linger in the workspace and .Rnw file, but are no longer part of the R file... this is where the save.image() approach can cause some headaches.
I am on a Mac and I suggest TextMate if you're mildly geeky and emacs/ess if you're really geeky. I use vim and command line R, but emacs/ess works best for most. If you're in this for the long haul, I doubt you'll regret learning emacs/ess for R, Sweave, and LaTeX.

Resources