Linking/importing externalized source code in org-mode

This paper inspired me to check out Emacs's org-mode a bit, and I am currently trying to assess which is more suitable for writing my documents: knitr/Sweave (I mainly use R for my programming) or org-mode.
What I really like about knitr is the option to externalize the actual source code (watch out: the declaration of labels/names in the R script seems to have changed from ## ---- label ---- to ## @knitr label; see ?read_chunk) and "link" it into the actual literate-programming/reproducible-research document (as opposed to writing the code in that very document):
"Import" with
<<import-external, cache=FALSE>>=
read_chunk('foo-bar.R') # contains a chunk labelled 'foo-bar'
@
and "re-use" by referencing the respective labels with
<<foo-bar>>=
@
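For reference, the external script would then mark its chunks with labelled comments, along these lines (file name and contents are made up for illustration):

```r
## foo-bar.R -- external script whose chunks are pulled in via read_chunk()

## ---- foo-bar
# everything up to the next label line belongs to the chunk 'foo-bar'
x <- rnorm(100)
mean(x)

## ---- foo-bar-plot
hist(x)
```

Each labelled chunk can then be referenced by name from the .Rnw/.Rmd document, as shown above.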
Question
Is this also possible in org-mode, or am I bound to putting the actual code into the .org document itself?
I found this, but I did not find any specific notion of linking/importing external source code files and being able to execute them by having the linked code inside
#+BEGIN_SRC R
<linked code>
#+END_SRC
Background
I do see that this approach might contrast with the general paradigm of literate programming to some extent. But I like to work in a somewhat "atomic" style, and thus it feels more natural to me to keep the files separate at first and then mash everything together dynamically.

Would named code blocks help?
#+NAME: foo-bar
#+BEGIN_SRC R
source("foo-bar.R")
#+END_SRC
And then evaluate (i.e. load) the code when you actually need it:
#+CALL: foo-bar()
See manual for more details.
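As a closer analogue to knitr's chunk reuse, named blocks can also be spliced into other blocks via noweb references (a sketch; the block names and file name are illustrative):

```org
#+NAME: import-data
#+BEGIN_SRC R
  data <- read.csv("foo-bar.csv")
#+END_SRC

#+BEGIN_SRC R :noweb yes
  <<import-data>>
  summary(data)
#+END_SRC
```

With :noweb yes, the reference <<import-data>> is expanded to the body of the named block before evaluation, so the code itself can live in one place and be reused elsewhere.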

Related

RStudio: Code sections and compiled notebooks

I really like RStudio's code sections because they help me keep my code organized and make navigation very easy. The comment lines for code sections look like this (the key for RStudio to recognize a code section is that the comment must end with four or more #, - or = characters):
# Section One ---------------------------------
# Section Two =================================
### Section Three #############################
Now, I also find "Compiling Notebooks from R Scripts" very useful, because you can produce reports directly from your script (which I find much more comfortable to work with) without having to write an RMarkdown document. You simply use roxygen2-style comments (with the #' prefix), which are converted to markdown, and/or the #+ or #- prefixes to control RMarkdown chunk options.
Now, I want the best of both worlds but I cannot find a way to get it. I want to have code sections (recognizable for RStudio) in the report (with the proper markdown for headings #, but without the extra #### at the end of the line). Seems like an impossible task to me. If I use #' prefix, I get the section in the report but RStudio does not recognize it as a code section. If I do not use roxygen2-style comments, RStudio recognizes the code section but it does not appear in the report (ok, it appears but as an ugly comment in the formatted code). The best solution I have found so far is to use the #+ prefix and simulate a code chunk label. That way I can get RStudio to recognize it as a code section, and it does not appear in the report as an ugly comment.
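A sketch of that workaround (section names are illustrative): because the #+ line ends in four or more dashes, RStudio treats it as a code section, while the notebook compiler treats it as a chunk header rather than printing it as a comment:

```r
#' # Section One

#+ section-one ----
x <- rnorm(100)

#' # Section Two

#+ section-two ----
summary(x)
```

The #' lines become markdown headings in the report, and the #+ lines give RStudio its navigable sections without leaving stray comments in the formatted output.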
Any ideas on how to do this better?

Is it possible to compile R LaTeX via knitr in a modular way?

Is there any way to compile knitr subfiles separately? What I have in mind is something like the package subfiles for latex just in combination with R/knitr/Sweave?
This would be great when, say, one has two exercises, the first involving heavy computations, and one doesn't want to recompile the entire first exercise every time while working on and testing the second one.
The patchDVI package does this for Sweave. I imagine it would be possible (maybe even easy) to modify it to do the same for knitr.
For example, in Sweave, you define variables in a chunk like so:
<<>>=
.TexRoot <- "main.tex"
.SweaveFiles <- c("subfile1.Rnw", "subfile2.Rnw")
@
and after Sweave is finished running that file, patchDVI will check whether the files subfile1.Rnw and subfile2.Rnw also need to be run, then will run LaTeX on the main.tex file once everything is up to date.
You don't need to do anything difficult; just use the cache options. There are lots of details here, but it's probably as simple as specifying cache = TRUE in the chunk options of your first exercise.
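A sketch of what that could look like in the .Rnw file (the chunk label and computation are illustrative); the cached chunk is re-evaluated only when its code changes:

```
<<exercise1, cache=TRUE>>=
# heavy computation, cached across knits until this code changes
result <- replicate(1000, mean(rnorm(1e5)))
@
```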

knitr: Referring to and evaluating external code chunks

How could I refer to and evaluate (in an .Rmd file) specific chunks of R code located not in a different .Rmd file, but in an R module containing chunks of code tagged with ## @knitr chunk_name? Thanks!
I just figured out what the problem was: I have simply forgotten to call read_chunk() function for the R module, containing those external code chunks. So far, everything appears to be working, with the exception, mentioned below.
One problem I'm currently experiencing (this might merit a separate question, but I'll leave it as is for now) is that knitr doesn't seem to respect the working directory, and thus relative paths constructed from it, such as file.path(getwd(), "data/transform"), break. I think this contradicts knitr's design, which allows code reuse via chunks in external R modules. What approaches do people use to handle this peculiar situation? I believe it might be worth submitting as a feature request.
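One commonly suggested remedy (a sketch, not necessarily what the author ended up using) is to pin knitr's evaluation directory explicitly in a setup chunk via the root.dir package option, so all chunks resolve relative paths from the project root rather than from the directory containing the .Rmd file:

```r
library(knitr)
# make every chunk evaluate with the project root as its working directory;
# the ".." here is an assumption about where the .Rmd sits in the project
opts_knit$set(root.dir = normalizePath(".."))
```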

Make Sweave or knitr put graphics suffix in `\includegraphics{}`

I just ran into the (curious) problem that, when submitting a (pdf)LaTeX manuscript to some Elsevier journal, the filenames of figures needed to be complete in order to be found by their PDF building and checking system, i.e.:
\includegraphics{picture.pdf}
Is there any easy and convenient way to tell Sweave or knitr to do that?
Edit:
I'm familiar with sweave's include=FALSE option
I also feel quite capable to patch utils:::RweaveLatexRuncode
However, for the moment I'm hoping that there's something more convenient and elegant.
It's also about handing out the .Rnw files as supplementary material or vignettes. From a didactic point of view, I don't like tweaks that make the source code much more complicated for the new users whom I hope will read it.
(Which is also why I really appreciate the recently introduced print=TRUE in Sweave)
You can modify the plot hook a little bit in knitr to add the file extension:
<<>>=
knit_hooks$set(plot = function(x, options) {
  x = paste(x, collapse = '.')  # x is 'file.ext' now instead of c(file, ext)
  paste0('\\end{kframe}', hook_plot_tex(x, options), '\\begin{kframe}')
})
@
See 033-file-extension.Rnw for a complete example. To understand what is going on behind the scenes, see the source code of the default LaTeX hooks in knitr.
A brute-force solution is to explicitly create the files yourself in the R snippet. Set the graphics options to false but have the code evaluated so that the file is created, and then have LaTeX call the files with the very \includegraphics{} call you show.
I used similar schemes for simple caching: if the target file exists, skip the code that creates it.
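That scheme could look roughly like this (the file name and plotting code are made up for illustration):

```r
fig <- "figures/scatter.pdf"
if (!file.exists(fig)) {
  # regenerate the figure only when the cached file is missing
  pdf(fig)
  plot(rnorm(100), rnorm(100))
  dev.off()
}
# then in the LaTeX source: \includegraphics{figures/scatter.pdf}
```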

Using org-mode to structure an analysis

I am trying to make better use of org-mode for my projects. I think literate programming is especially applicable to the realm of data analysis and org-mode lets us do some pretty awesome literate programming.
I think most of you will agree with me that the workflow for writing an analysis is different from most other types of programming. I don't just write a program; I explore the data. And while many of these explorations are dead ends, I don't want to delete/ignore them completely. I just don't want to re-run them every time I execute the org file. I also tend to find or develop chunks of useful code that I would like to put into an analytic template, but some of these chunks won't be relevant for every project, and I'd like to know how to make org-mode ignore them when I am executing the entire buffer. Here's a simplified example.
* Import
- I want org-mode to ignore import-sql.
#+srcname: import-data
#+begin_src R :exports none :noweb yes
<<import-csv>>
#+end_src
#+srcname: import-csv
#+begin_src R :exports none
data <- read.csv("foo-clean.csv")
#+end_src
#+srcname: import-sql
#+begin_src R :exports none
library(RSQLite)
blah blah blah
#+end_src
* Clean
- This is run on foo.csv, producing foo-clean.csv
- Fixes the mess of -9 and -13 to NA for my sanity.
- This only needs to be run once, and after that, reference.
- How can I tell org-mode to skip this?
#+srcname: clean-csv
#+begin_src sh :exports none
sed .....
#+end_src
* Explore
** Explore by a factor (1)
- Dead end. Did not pan out. Ignore.
- Produces a couple of charts showing there is not interaction.
#+srcname: explore-by-a-factor-1
#+begin_src R :exports none :noweb yes
#+end_src
** Explore by a factor (2)
- A useful exploration that I will reference later in a report.
- Produces a couple of charts showing the interaction of my variables.
#+srcname: explore-by-a-factor-2
#+begin_src R :exports none :noweb yes
#+end_src
I would like to be able to use org-babel-execute-buffer and have org-mode somehow know to skip over the code blocks import-sql, clean-csv and explore-by-a-factor-1. I want them in the org file, because they are relevant to the project. After all, tomorrow someone might want to know why I was so sure explore-by-a-factor-1 was not useful. I want to keep that code around, so I can bang out the plot or the analysis or whatever and move on, but not have it run every time I rerun everything, because there's no reason to run it. Ditto with the clean-csv stuff. I want it around, to document what I did to the data (and why), but I don't want to re-run it every time. I'll just import foo-clean.csv.
I Googled all over this and read a bunch of org-mode mailing list archives and I was able to find a couple of ideas, but not what I want. EXPORT_SELECT_TAGS, EXPORT_EXCLUDE_TAGS are great, when exporting the file. And the :tangle header works well, when creating the actual source files. I don't want to do either of these. I just want to execute the buffer. I would like to be able to define code blocks in a similar fashion to be executed or ignored. I guess I would like to find a way to have an org variable such as:
EXECUTE_SELECT_TAGS
This way I could simply tag my various code blocks and be done with it. It would be even nicer if I could then run the file, using only source blocks with specific tags. I can't find a way to do this and I thought I would ask before asking/begging for a new feature in org-mode.
I figured it out. From the Org manual (since updated):
The :eval header argument can be used to limit the evaluation of specific code blocks. :eval accepts two arguments “never” and “query”. :eval never will ensure that a code block is never evaluated, this can be useful for protecting against the evaluation of dangerous code blocks. :eval query will require a query for every execution of a code block regardless of the value of the org-confirm-babel-evaluate variable.
So you just have to add :eval never to the header of the blocks that you don't want to execute, and voilà!
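Applied to the import-sql block from the example above, that looks like:

```org
#+srcname: import-sql
#+begin_src R :exports none :eval never
  library(RSQLite)
  # kept here to document the approach, but never run by
  # org-babel-execute-buffer thanks to :eval never
#+end_src
```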
While I never did get an answer to my question, the discussion was interesting, and apparently an org-mode based template for R strikes a few people as an interesting idea. I downloaded the source code to org-mode and looked at org-babel-execute-buffer. It is, as I feared, a naive function which does precisely what it says it does and nothing more. It is not (currently) possible to pass it any additional parameters to affect its behavior. (Unless I am badly misreading the Lisp, which is entirely possible.)
Eventually, I decided org-babel-execute-buffer is not necessary for a useful R template system. Babel's noweb functionality is really flexible and I think it is possible to build a workable solution using noweb, rather than trying to develop a complex tagging schema to define how/when to run things.
For tangling/export it should still be possible to use tags to create usable/sane output.
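One way that noweb-based approach could be sketched (block names follow the example above): a single "driver" block splices in only the chunks that should run, and you evaluate just that driver instead of the whole buffer:

```org
#+NAME: run-analysis
#+begin_src R :noweb yes
  <<import-csv>>
  <<explore-by-a-factor-2>>
#+end_src
```

Dead ends like explore-by-a-factor-1 stay in the file for documentation but are simply never referenced by the driver.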
For anyone who is interested: LiterateR
It's probably a little rude to use this thread to put this out there, but this is why I asked the question in the first place. TemplateR is my attempt to make R a little easier to use. Right now it is just a template with two simplistic functions. I consider it a proof of concept at this point. Eventually, I want to develop something that does more to help people develop R projects quickly. TemplateR will accomplish this by:
1. Provide a strong structure to develop around.
2. Provide built-in functions to support common tasks, especially in the realm of reproducible research.
3. Provide snippets of tested code that can be rapidly re-purposed for the current project.
Right now, all it provides is a basic structure/framework and two simple functions.
1. Identifies which R packages are missing (based on what is manually entered into a table), and
2. Creates project directories (plots, data, reports).
More will come in future versions. The README.org and TODO.org go into further detail.
