Put figure directly into Knitr document (without saving file of it in folder), Part 2

I am extending a question I recently posted here (Put figure directly into Knitr document (without saving file of it in folder)).
I am writing an R package that generates a .pdf file summarizing a user's data. I have a .Rnw script in the package (here, my MWE of it is called test.Rnw). The user can do:
1) knit("test.Rnw") to create a test.tex file
2) "pdflatex test.tex" to create the test.pdf summary output file.
The .Rnw file generates many images. Originally, these were all saved in the current directory. Saving these images to the directory (along with the .aux and .log files created when pdflatex is called on the .tex file) just does not seem as tidy as it could be, since users must remember to delete the image files. Secondarily, I also worry that this untidiness may cause issues when scripts are run multiple times.
So, in my previous post, we improved the .Rnw file by saving the images to a temporary folder. I have been told the files in the temporary folder get deleted each time a new R session is opened. However, I still worry about certain things:
1) I feel I may need to insert a line like the commented-out one in the setup chunk of my MWE below:
system(sprintf("%s", paste0("rm -r ", temppath, "/*")))
to automatically delete the files in the temporary folder each time the .Rnw file is run (so that the images do not only get deleted when R is restarted). This will keep the current directory clean of images, and the user will not have to remember to delete them manually. However, I do not know whether this "solution" will pass CRAN checks, since it deletes files on the user's system, which could cause problems if other programs are writing files to the temporary folder. I feel I have read about CRAN not allowing files to be written to or deleted from the user's computer, for obvious reasons. How strict would CRAN be about such a practice? Is there a safe way to go about it? (One safer pattern is sketched after these questions.)
2) If writing and deleting the image files in a temporary folder will not work, what is another way to accomplish the same effect (running the script without cumbersome image files piling up in the folder)? Is it possible to instead have the images embedded directly in the output file, without saving them to any directory? I am pretty sure this is not possible. However, I have been told it is possible with .Rmd, and that I could convert my .Rnw to .Rmd. This may be difficult because the .Rnw file must follow certain formats (text and margins) for the correct output, and it is very long. Is it possible to use the .Rmd capability (of inserting images directly into the output) only for the chunks that generate images, without rewriting the entire .Rnw file?
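For what it's worth on the first concern, one pattern that avoids deleting anything another program may have written is to create a dedicated subdirectory inside tempdir() and remove only that. This is a minimal sketch of the idea, not a statement about CRAN policy:

```r
# Create a uniquely named subdirectory of the per-session temp directory
figdir <- tempfile(pattern = "pkg-figs-")
dir.create(figdir)

# Point knitr at it; the trailing separator makes it a directory prefix
knitr::opts_chunk$set(fig.path = file.path(figdir, ""))

# ... knit("test.Rnw") and run pdflatex ...

# Afterwards, remove only this directory, leaving the rest of tempdir() alone
unlink(figdir, recursive = TRUE)
```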
Below is my MWE:
\documentclass[nohyper]{tufte-handout}
\usepackage{tabularx}
\usepackage{longtable}
\setcaptionfont{% changes caption font characteristics
\normalfont\footnotesize
\color{black}% <-- set color here
}
\begin{document}
<<setup, echo=FALSE>>=
library(knitr)
library(xtable)
library(ggplot2)
# Specify directory for figure output in a temporary directory
temppath <- tempdir()
# Erase all files in this temp directory first?
#system(sprintf("%s", paste0("rm -r ", temppath, "/*")))
# fig.path is used as a filename prefix, so include a trailing separator
opts_chunk$set(fig.path = file.path(temppath, ""))
@
<<diamondData, echo=FALSE, fig.env = "marginfigure", out.width="0.95\\linewidth", fig.cap = "The diamond dataset has variables depth and price.", fig.lp="mar:">>=
print(qplot(depth,price,data=diamonds))
@
<<echo=FALSE,results='asis'>>=
myDF <- data.frame(a = rnorm(1:10), b = letters[1:10])
print(xtable(myDF, caption= 'This data frame shows ten random variables from the distribution and a corresponding letter', label='tab:dataFrame'), floating = FALSE, tabular.environment = "longtable", include.rownames=FALSE)
@
Figure \ref{mar:diamondData} shows the diamonds data set, with the
variables price and depth. Table \ref{tab:dataFrame} shows letters a through j
corresponding to a random variable from a normal distribution.
\end{document}

Related

Rmd re-run all blocks that read_csv

I have an Rmd document with several code blocks that call read_csv on many different CSVs. The Rmd makes various graphs and tables, and I use cache=TRUE to speed up the rendering.
Another program produces the CSVs and generates different results depending on different experiments and configurations. So I want the Rmd to reload those CSVs which have changed and use the cache for those that have not.
At the moment I have a chunk option {r lastrun=100} on each code block that calls read_csv, and I search for lastrun and replace the value (e.g., with 101) for the blocks I think should be reread, but I'd like this to be automated somehow.
So right now my Rmd looks like:
```{r lastrun=100}
a <- map_dfr(paste0(1:10, ".a.csv"), read_csv)
```
lots of text, including other code blocks
```{r}
#whatever
print(f(a))
```
```{r lastrun=100}
b <- map_dfr(paste0(1:20, ".b.csv"), read_csv)
```
So on rendering that, if any of *.a.csv or *.b.csv later change, I have to search/replace lastrun or I'll just see the stale cached versions of a and b. I want a and b to update when the files change (I don't need to identify the exact file and reread only that one; reloading the whole block would be fine).
How can this be done?
Thanks
-Neal
Here's a trick that might work:
---
title: 'nothing to report'
output: html_document
---
```{r setup}
files <- c("file1", "file2")
fileinfo <- file.info(files)
```
```{r test1, file=fileinfo['file1',-6], cache = TRUE}
Sys.sleep(5)
```
```{r test2, file=fileinfo['file2',-6], cache = TRUE}
Sys.sleep(5)
```
To test:
1) Render this; it should take 10 seconds.
2) Render it again; it should be instantaneous.
3) Create one or both of the files file1 and file2.
4) Render it again; it should take 5 seconds (if one file was created) or 10 seconds (if both).
5) Touch one of the files (either updating its mtime or adding/replacing contents; it doesn't matter which).
6) Render it again; there should be a pause.
Notes:
- The files do not need to exist at first; the NA row contents (in fileinfo) change just the same when the files "appear", so the cache is invalidated.
- Files can be in the current directory, a subdirectory, or any absolute path; the row name is the name as given in my files variable.
- With [,-6] I'm omitting atime, which records when something last "looked at" the file; it updates on each call to file.info, so keeping it would defeat the purpose of the cache (invalidating on every render).
- Other than that, the cache currently invalidates on a change in size, isdir, mode, mtime, ctime, or exe, i.e., all remaining fields of file.info; you can easily select specific fields instead of omitting one, as I did above.
- A chunk can depend on multiple files: either be creative in the file= value so it grabs multiple rows, or have the setup block aggregate the summaries.
- If this is on a network drive, things like mtime are sometimes less reliable; in that case, one might consider using md5sum somehow.
(You will likely want the code in the {r setup} block to be hidden.)
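For the network-drive caveat in the notes above, a content-based variant of the same trick could key the cache off file hashes instead of file.info; this is a sketch using tools::md5sum (which returns NA for files that do not exist yet), not part of the original answer:
```{r setup_hash}
files <- c("file1", "file2")
filehash <- tools::md5sum(files) # named vector of MD5 hashes, NA if a file is missing
```
```{r test1_hash, file=filehash['file1'], cache = TRUE}
Sys.sleep(5)
```
```{r both_hash, file=filehash[c('file1', 'file2')], cache = TRUE}
# a chunk that depends on both files at once
Sys.sleep(5)
```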

R package to knit a markdown document given some data

I am writing a basic R package that reads in data from a user-specified database and spits out a markdown report with predefined graphs, tables, etc. I have placed the .Rmd file in the R folder, and have a user-level function that reads in the data and knits it.
# create_doc.R
create_doc <- function(directory = NULL, database_name) {
  if (is.null(directory)) {
    directory <- tclvalue(tkchooseDirectory(
      title = "Choose Folder for Input and Output"))
  }
  rmarkdown::render("R/doc_generator.Rmd", output_dir = directory)
}
This works fine on my computer, but when I build the package, the .Rmd file gets deleted. This means I can't give it to other users for use on other computers. I realise that the R folder may not be the correct place for this file (I guess the build step drops any files not ending in .R), but I'm not sure where else to put it. It is not package documentation; it creates the end result of using the package.
Googling has not helped so far. Is it possible to knit a document using a function in an R package? If yes, what am I doing wrong? If no, are there any other suggestions on how to achieve this?
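For reference, files that ship with a package conventionally live under inst/ (everything in inst/ is copied to the installed package's top level) and are located at run time with system.file(). A sketch under that assumption, with inst/rmd/doc_generator.Rmd and the package name mypackage as placeholders:

```r
create_doc <- function(directory = NULL, database_name) {
  if (is.null(directory)) {
    directory <- tclvalue(tkchooseDirectory(
      title = "Choose Folder for Input and Output"))
  }
  # Look up the installed copy of the template rather than a source-tree path
  template <- system.file("rmd", "doc_generator.Rmd", package = "mypackage")
  rmarkdown::render(template, output_dir = directory)
}
```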

How can I avoid hardcoding a file path?

I am using RStudio to knit an .Rnw file to .pdf. This .Rnw file is stored in directory that is under git version control. This directory also contains a .RProj file for the project.
I collaborate with colleagues who don't know the first thing about .Rnw files and git. These colleagues want to open a Word file and track change their hearts out. So I give the people what they want.
Everyone needs access, so storing the Word file on a cloud service like Box makes sense. In the past I created a shared subfolder in my repo (keeping everything within the root directory), but this time around I needed to store the file in a shared folder that someone else created. So my solution was to copy the Word file from this shared directory to my repository.
Technical Approach
I don't know how to make this a reproducible problem, but hopefully you will give me some latitude since I'm trying to make my work fully reproducible ;)
Let's say that my .Rnw file is stored in repoRoot/subfolder. Since knitr changes the working directory to subfolder where this .Rnw file is located, the first chunk sets the root.dir one level up at the project root.
<<knitr, include=FALSE>>=
library(knitr)
opts_knit$set(root.dir=normalizePath('../')) # go up 1 level
@
The next chunk copies the Word file from the shared folder to my git repo and runs the analysis file. The shared directory path is hard coded to my machine, which is the problem I'm asking for your help in solving.
file.copy(from = '/Users/ericpgreen/Box Sync/Project/Paper/draft.docx',
          to = 'subfolder/draft.docx', # my repo
          overwrite = TRUE)
source("scripts/analysis.R") # generates objects we reference in the .docx file
After adding \begin{document}, I include a chunk where I convert the .docx file to .txt and then rename it to .Rnw.
# convert docx to txt
system("textutil -convert txt 'subfolder/draft.docx'")
# rename txt to .Rnw
file.rename('subfolder/draft.txt', 'subfolder/draft.Rnw')
The next child chunk calls this .Rnw file that contains the text of the Word file with references to R objects included through \Sexpr{}:
<<include-draft, child='draft.Rnw', include=FALSE>>=
@
This works just fine for me. Whenever I knit the .Rnw file it grabs the latest version of the .docx file that my colleagues have edited (complete with track changes and comments) and, in a later step not shown here, returns the .pdf file to the shared folder.
Problem to Solve
This setup meets almost every need for me, except that the initial file.copy() command is hard coded to my machine. So if someone in my group clones my repo (e.g., research assistants who DO use version control), it won't run out of the box. Is there a workaround to hard coding in this type of case?
Ultimately you won't get around hard-coding paths that are outside your control, such as paths to network shares. What you can and should avoid is hard-coding these paths in your documents.
Instead, relegate them to configuration files and/or environment variables (which, again, will be controlled by configuration files, to wit .bashrc and similar). The simplest approach is then to use
network_share_path <- Sys.getenv('NETWORK_SHARE_PATH')
# Sys.getenv() coerces its `unset` argument eagerly, so test for "" explicitly
if (network_share_path == '') stop('no network share path configured')
file.copy(from = network_share_path, to = 'subfolder/draft.docx', overwrite = TRUE)
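Collaborators who clone the repo then set the variable once on their own machine, for example in ~/.Renviron (the variable name is just the one used above; the path is illustrative):

```
NETWORK_SHARE_PATH=/Users/yourname/Box Sync/Project/Paper/draft.docx
```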

Rstudio is deleting key files when I knit (both PDF and HTML)

So I am having an R nightmare. I've returned to a project I built under the previous iteration (or perhaps one more) of RStudio. I produced a workable report that I was asked to update, and my current bugbear wasn't around then. Here is what happens:
My report file is "ISS Time Series.Rmd". It calls three other files:
"mystyles.sty", which updates the LaTeX preamble to use some additional packages.
"functions.R" and "load.R". The former contains frequently used functions I've written, and the latter loads the data I'm using.
I source the two R functions in the .Rmd file. When I try to Knit the report, whether I get an error or am successful, my two .R files and my one .sty file are deleted. And not just deleted -- gone for good.
I do not know what is up. I have ruined my previous work simply by returning to examine the original file.
Please, somebody has to help me here. My workflow is shot to hell if I have to write every last bit of code over and over again in each report.
UPDATE: Even copying the files to another directory doesn't help.
Here is the code block that calls the "load.R" file:
```{r loaddata}
#
# ------- Load Data
#
# This section loads the ISS survey files one at a time and saves them as
# read.SPSS objects within a list. It names these eleven objects as "ISS 2002",
# "ISS 2003", etc... until "ISS 2012". This file may be prohibitively large.
#
source("load.R") # Loads the ISS Survey files
```
Rename your file to ISS_Time_Series.Rmd and try again.
It is the spaces in the document name that make rmarkdown::render() delete the files that have been loaded or sourced.
An issue has already been filed; see https://github.com/rstudio/rmarkdown/issues/580

How to display images in Markdown on github generated from knitr without using external image hosting?

I like uploading repositories to github that include multiple R Markdown and Markdown files.
Here is an example of such a markdown file on github. And here's a screen grab.
The problem is that images do not display. You can click on the image, and you will go to where the file is stored.
The file referenced is:
https://github.com/... /blob/.../myfigure.png
whereas I presume it needs to reference
https://github.com/... /raw/.../myfigure.png
Things I considered:
imgur: I could use external image hosting (e.g., see this example) by adding the following code:
```{r setup}
opts_knit$set(upload.fun = imgur_upload) # upload all images to imgur.com
```
However, for various reasons I don't want to do this (I have trouble uploading when behind a firewall; it's slow; it creates an unnecessary dependency).
RPubs: There's also RPubs, which is quite cool. However, at the time of posting it seems more suited to single markdown documents than to multiple R markdown documents. And it doesn't provide as close a link between the source R Markdown and the Markdown document.
Question
Is there a workflow for using R Markdown and knitr to produce Markdown files which when uploaded to github permit the Markdown file to display images stored in the github repository?
This used to be part of the minimal example; use
opts_knit$set(base.url='https://github.com/.../raw/.../')
See the changes here and here.
Also see http://yihui.name/knitr/options.
EDIT (with an update to restore base.url to its former value):
Regarding switching, you could define a function as
create_gitpath <- function(user, repo, branch = 'master') {
  paste0(paste('https://github.com', user, repo, 'raw', branch, sep = '/'), '/')
}
my_repo <- create_gitpath(user, repo)

knit.github <- function(..., git_url) {
  old_url <- opts_knit$get('base.url')
  on.exit(opts_knit$set(base.url = old_url))
  opts_knit$set(base.url = git_url)
  knit(..., envir = parent.frame())
}
Run with knit() until you want to push to github, then run knit.github(..., git_url = my_repo).
What about the following code at the beginning of your markdown file?
``` {r setup,echo=FALSE,message=FALSE}
gitsubdir <- paste(tail(strsplit(getwd(),"/")[[1]],1),"/",sep="")
gitrep <- "https://github.com/mpiktas/myliuduomenis.lt"
gitbranch <- "master"
opts_knit$set(base.url=paste(gitrep,"raw",gitbranch,gitsubdir,sep="/"))
```
It is possible to tweak this so that gitrep and gitbranch are reported by git. Here I assumed that I am one directory level below the main git repository directory. Again, this might be tweaked to accommodate more complicated scenarios.
I've tested on github, here is the Rmd file and corresponding md file.
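As for having gitrep and gitbranch reported by git rather than typed in, a sketch (assuming git is on the PATH and the remote named origin uses an https URL; neither assumption comes from the answer above):
```{r}
# Current branch name, e.g. "master"
gitbranch <- system("git rev-parse --abbrev-ref HEAD", intern = TRUE)
# Remote URL, e.g. "https://github.com/mpiktas/myliuduomenis.lt"
gitrep <- sub("\\.git$", "",
              system("git config --get remote.origin.url", intern = TRUE))
opts_knit$set(base.url = paste(gitrep, "raw", gitbranch, gitsubdir, sep = "/"))
```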
