I am creating reports using R, RStudio, knitr, and packrat. I have a project folder structure similar to below:
project_folder/
- packrat/
- .Rprofile
- analaysis_folder/
  - library.R
  - child.rnw
- data_folder/
- knitr_rnw_location/
  - file.rnw
  - .Rprofile
I have set up the .Rprofile files with the appropriate lines in the main project_folder and in the subdirectory containing the .rnw file, following the recommendations on RStudio's Limitations and Caveats page.
When I run packrat::init() at the project_folder level, the packrat folder is set up, and when I open file.rnw the packrat library is in place.
However, when I execute packrat::snapshot(), it gives errors such as
Unable to tangle file knitr_rnw_location/file.rnw; cannot parse dependencies
and fails. Is there a way to tell packrat to ignore my .rnw files? All library() calls are made in separate .R scripts that are source()d from the .rnw files. It also scans any variables declared in the knitr chunks and gives the error
Error in eval(x, envir = envir): object 'my_variable_name' not found
In the end, it does state
Snapshot written to "~/project_folder/packrat/packrat.lock"
So I can only assume that packrat::snapshot() was successful. Has anyone else run into the same issue when working with knitr and packrat?
Much appreciated,
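One possibility, assuming a packrat version that supports these options (check ?packrat::snapshot and packrat::opts in your installed version before relying on them), is to bypass source scanning entirely:

```r
# Assumption: packrat::snapshot() supports 'infer.dependencies' in this version.
# Skip parsing .R/.Rnw sources and snapshot the project library as-is.
packrat::snapshot(infer.dependencies = FALSE)

# Alternative assumption: newer packrat releases expose an 'ignored.directories'
# project option that excludes a folder from dependency scanning.
# packrat::set_opts(ignored.directories = "knitr_rnw_location")
```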
I am looking for a way to make my R-notebook-centric workflow more reproducible and, subsequently, more easily containerized with Docker. For my medium-sized data analysis projects, I work with a very simple structure: a folder with an associated .Rproj and an index.html (a landing page for GitHub Pages) that holds other folders containing the notebooks, data, scripts, etc. This simple "1 GitHub repo = 1 Rproj" structure also worked well for my nb.html files rendered by GitHub Pages.
.
└── notebooks_project
    ├── notebook_1
    │   ├── notebook_1.Rmd
    │   └── ...
    ├── notebook_2
    │   ├── notebook_2.Rmd
    │   └── ...
    ├── notebooks_project.Rproj
    ├── README.md
    ├── index.html
    └── .gitignore
I wish to keep this workflow that uses R notebooks both as literate programming tools and as control documents (see RMarkdown Driven Development), as it seems decently suited for medium-sized reproducible analytic projects. Unfortunately, there is little documentation about Rmd-centric workflows using renv, although renv seems to be well integrated with them.
First, Yihui Xie hinted here that methods for using renv with individual Rmd documents include renv::activate(), renv::use(), and renv::embed(). renv::activate() does only part of what renv::init() does: it loads the project and sources init.R. From my understanding, it does this if the project was already initialized, but it acts like renv::init() if the project was not initialized: it discovers dependencies, copies them to the renv global package cache, and writes several files (.Rprofile, renv/activate.R, renv/.gitignore, .Rbuildignore). renv::use() works well within standalone R scripts whose dependencies are specified directly in the script, when we need those packages automatically installed and loaded each time the script is run. renv::embed() just embeds a compact representation of renv.lock into a code chunk of the notebook; it changes the .Rmd on render/save by adding the code chunk with dependencies and deletes the call to renv::embed(). As I understand it, using renv::embed() and renv::use() could be sufficient for a reproducible stand-alone notebook. Nevertheless, I don't mind having the lock file in the directory or keeping the renv library, as long as they are all in the same directory.
Second, to prepare for subsequent Binder or Docker requirements, I use renv together with RStudio Package Manager (RSPM). Grant McDermott provides some useful code here (which may go in the .Rprofile or in the .Rmd itself, I think) and provides the rationale for it:
The lockfile is referenced against RSPM as the default package
repository (i.e. where to download packages from), rather than one of
the usual CRAN mirrors. Among other things, this enables
time-travelling across different package versions and fast
installation of pre-compiled R package binaries on Linux.
Third, I'd like to use the here package to work with relative paths. It seems the way to go so that the notebooks can run when transferred or when run inside a Docker container. Unfortunately, here::here() looks for the .Rproj and will find it in my upper-level folder (i.e. notebooks_project). A .here file, which may be placed with here::set_here(), overrides this behavior, making here::here() point to the notebook folder as intended (i.e. notebook_1). Unfortunately, the .here file takes effect only after restarting the R session or running unloadNamespace("here") (documented here).
Here is what I have experimented with until now:
---
title: "<br> R Notebook Template"
subtitle: "RMarkdown Report"
author: "<br> Claudiu Papasteri"
date: "`r format(Sys.time(), '%d %m %Y')`"
output:
  html_notebook:
    code_folding: hide
    toc: true
    toc_depth: 2
    number_sections: true
    theme: spacelab
    highlight: tango
    font-family: Arial
---
```{r setup, include = FALSE}
# Activate renv for the current project
renv::activate()
# Set default package source by operating system, so that we automatically pull in pre-built binary snapshots, rather than building from source.
# This can also be appended to .Rprofile
if (Sys.info()[["sysname"]] %in% c("Linux", "Windows")) { # For Linux and Windows use RStudio Package Manager (RSPM)
options(repos = c(RSPM = "https://packagemanager.rstudio.com/all/latest"))
} else {
# For Mac users, we default to installing from CRAN/MRAN instead, since RSPM does not yet support Mac binaries.
options(repos = c(CRAN = "https://cran.rstudio.com/"))
# options(renv.config.mran.enabled = TRUE) ## TRUE by default
}
options(renv.config.repos.override = getOption("repos"))
# Install (if necessary) & Load packages
packages <- c(
"tidyverse", "here"
)
renv::install(packages, prompt = FALSE) # install packages that are not in cache
renv::hydrate(update = FALSE) # install any packages used in the Rnotebook but not provided, do not update
renv::snapshot(prompt = FALSE)
# Set here to Rnotebook directory
here::set_here()
unloadNamespace("here") # need new R session or unload namespace for .here file to take precedence over .Rproj
rrRn_name <- fs::path_file(here::here())
# Set knitr options, including root.dir pointing to the .here file in the Rnotebook directory
knitr::opts_chunk$set(root.dir = here::here())
# ???
renv::use(lockfile = here::here("renv.lock"), attach = TRUE) # automatic provision an R library when Rnotebook is run and load packages
# renv::embed(path = here::here(rrRn_name), lockfile = here::here("renv.lock")) # if run this embeds the renv.lock inside the Rnotebook
renv::status()$synchronized
```
I'd like my notebooks to be able to run without code changes both locally (where dependencies are already installed and cached and where the project was initialized) and when transferred to other systems. Each notebook should have its own renv settings.
I have many questions:
What's wrong with my renv sequence? Is calling renv::activate() on every run (both for initialization and after) the way to go? Should I use renv::use() instead of renv::install() and renv::hydrate()? Is renv::embed() better for a reproducible workflow even though every notebook folder should have its renv.lock and library? renv on activation also creates an .Rproj file (e.g. notebook1.Rproj) thus breaking my simple 1 repo = 1 Rproj - should this concern me?
The renv-RSPM workflow seems great, but is there any advantage of storing that script in the .Rprofile as opposed to having it within the Rmd itself?
Is there a better way to use here? That unloadNamespace("here") seems hacky, but it seems to be the only way to preserve a use for the .here files.
What's wrong with my renv sequence? Is calling renv::activate() on every run (both for initialization and after) the way to go? Should I use renv::use() instead of renv::install() and renv::hydrate()? Is renv::embed() better for a reproducible workflow even though every notebook folder should have its renv.lock and library?
If you already have a lockfile that you want to use + associate with your projects, then I would recommend just calling renv::restore(lockfile = "/path/to/lockfile"), rather than using renv::use() or renv::embed(). Those tools are specifically for the case where you don't want to use an external lockfile; that is, you'd rather embed your document's dependencies in the document itself.
The question about renv::restore() vs renv::install() comes down to whether you want the exact package versions as encoded in the lockfile, or whatever happens to be current / latest on the R package repositories visible to your session. I think the most typical workflow is something like:
1. Use renv::install(), renv::hydrate(), or other tools to install packages as you require them;
2. Confirm that your document is in a good, runnable state;
3. Call renv::snapshot() to "save" that state;
4. Use renv::restore() in future runs of your document to "load" that previously-saved state.
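In session form, that cycle might look like the sketch below (the package name is a placeholder):

```r
# On the development machine: install what the document needs...
renv::install("dplyr")          # or renv::hydrate() to pick up packages already used
# ...knit the document and confirm it runs, then record the state:
renv::snapshot(prompt = FALSE)  # writes renv.lock with the working versions

# In future runs, or on another machine: recreate exactly that state.
renv::restore(prompt = FALSE)   # reinstalls the versions recorded in renv.lock
```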
renv on activation also creates an .Rproj file (e.g. notebook1.Rproj) thus breaking my simple 1 repo = 1 Rproj - should this concern me?
If this is undesired behavior, you might want to file a bug report at https://github.com/rstudio/renv/issues, with a bit more context.
The renv-RSPM workflow seems great, but is there any advantage of storing that script in the .Rprofile as opposed to having it within the Rmd itself?
It just depends on how visible you want that configuration to be. Do you want it to be active for all R sessions launched in that project directory? If so, it might belong in the .Rprofile. Do you only want it active for that particular R Markdown document? If so, it might be worth including it there. (Bundling it in the R Markdown file also makes it easier to share, since you could then share just the R Markdown document without also needing to share the project / .Rprofile.)
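For example, the OS-dependent repository switch from the question could live in a project-level .Rprofile so that every session launched in that directory picks it up (a sketch reusing the URLs quoted above):

```r
# .Rprofile at the project root: prefer binary package repositories per OS.
if (Sys.info()[["sysname"]] %in% c("Linux", "Windows")) {
  options(repos = c(RSPM = "https://packagemanager.rstudio.com/all/latest"))
} else {
  options(repos = c(CRAN = "https://cran.rstudio.com/"))
}
```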
Is there a better way to use here? That unloadNamespace("here") seems hacky, but it seems to be the only way to preserve a use for the .here files.
If I understand correctly, you could just manually create a .here file yourself before loading the here package, e.g.
file.create("/path/to/.here")
library(here)
since that's all set_here() really does.
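A quick base-R illustration of that point: creating the .here sentinel by hand is just creating an empty file (the directory name here is arbitrary):

```r
# Make a throwaway directory and drop a .here marker into it by hand.
proj <- file.path(tempdir(), "demo_project")
dir.create(proj, showWarnings = FALSE)
file.create(file.path(proj, ".here"))   # the manual equivalent of here::set_here()
stopifnot(file.exists(file.path(proj, ".here")))
```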
I have two projects in R. I've moved an .Rmd document from project 1 to project 2.
When I use the .Rmd file that I moved to project 2 to try to read in some data, I get the following error message:
cannot open file '"mydata.csv"': No such file or directory
Error in file(file, "rt") : cannot open the connection
This kind of error usually suggests a working directory issue to me; however, when I run getwd() in the console, the working directory listed is correct and points to where the csv is stored. I've also run getwd() within the Rmd doc, and again the wd is correct.
Does anyone else have this experience of moving one .Rmd file to another project and then it not working in the new project?
The code in the .Rmd file that I am trying to run is:
Data <- read.csv("mydata.csv", stringsAsFactors = T)
and the data is definitely within the project, has the correct name, is a csv, etc.
Has anyone else seen this issue when moving an RMarkdown document into another project before?
Thanks
This may not be the answer, but rmarkdown and knitr intentionally don't respect setwd(): the code in each chunk is run from the directory holding the .Rmd file. So if you've moved your .Rmd file but are then using setwd() to change to the directory holding the data, that change does not persist across code chunks.
If this is the issue, then one solution is to use the knitr options to set the root.dir to the data location:
opts_knit$set(root.dir = '/path/to/the/data')
See here.
Maybe not relevant but it seems to be the most likely explanation for what's happening here:
The project shouldn't really interfere with your code here. When opening the project, it will set your working directory to the root location of the project. However, this shouldn't matter in this case, since RMarkdown files automatically set the working directory to the location where the RMarkdown file is saved. You can see this by running getwd() once in the console and once from the RMarkdown file via "Run Current Chunk".
The behavior is the same when the file is knitted. So unless "mydata.csv" is in the same directory as the RMarkdown file, the code above won't work.
There are two workarounds: you can use relative or absolute paths to navigate to your data file. In a project I prefer relative paths. Let's say the rmd file is in a folder called "scripts" and your data file is in a folder called "data" and both are in the same project folder. Then this should work:
Data <- read.csv("../data/mydata.csv", stringsAsFactors = TRUE)
The other option, which I do not recommend, is to set the working directory in the Rmd file via:
opts_knit$set(root.dir = '../data/')
The reason I wouldn't do that is that the working directory is only changed when knitting the document; when using the Rmd file interactively, you have a different working directory (the location of the Rmd file).
This is a great application of the here package for dealing with these types of issues.
here (https://github.com/jennybc/here_here) looks upward from the Rmd file (and all R files) for the .Rproj file and then uses it as an "anchor" to locate other files via relative references. If, for instance, you had the following project structure:
- data
  |-- mydata.csv
- src
  |-- 00-import.R
  |-- 01-make-graphs.R
  |-- 02-do-analysis.R
- report
  |-- report.Rmd
- yourproject.Rproj
And you wanted to use mydata.csv in your report.Rmd you could do the following:
library(here)
dat <- read.csv(here("data", "mydata.csv"))
here will then convert this path to "~/Users/../data/mydata.csv" for you. Note that you have to be in the Rproject for this use case.
Here is such a great package and solves a lot of issues.
My R scripts work perfectly when run as script files in RStudio, but when I use the same scripts in R Markdown I get an error: file "does not exist in current working directory".
Both are in the same wd.
What may be the reason?
Note: All my work is done in Google Drive offline.
This is probably because R looks for files relative to the current working directory - by default Sys.getenv("HOME"), whereas knitr looks in the same directory as the Rmarkdown file.
The solution is to specify the correct full or relative path to files in the RMarkdown code.
This is what the new here R package was designed for. It always looks at the root of the R project directory (which is what "here" refers to). It doesn't matter if your Rmarkdown file is in a subdirectory.
library(here)
here("file_i_want.csv")
This will work the same regardless of whether you use R scripts or RMarkdown.
More details here (pun intended):
https://github.com/jennybc/here_here
As explained by Yihui Xie in this post, when one uses the Compile PDF button of the RStudio IDE to produce a PDF from a .Rnw file, knit() uses the globalenv() of a new R session. Is there a way to make this new R session use the packrat libraries of my project (even the version of knitr included in my packrat libraries) instead of my personal user libraries, to ensure a maximum level of reproducibility? I guess the new R session would have to be linked to the project itself, but I don't know how to do this efficiently.
I know I could directly use the knit() function instead of the Compile PDF button and, that way, knit() would use my current globalenv(), but I don't like this solution since it's less reproducible.
I think I figured out the problem myself, but I want to share with others who could confirm I'm right and possibly help improve my solution.
My specific problem is that my .Rnw file is in a sub-directory of my whole project. When the Compile PDF button creates a new R session, it is created in this sub-directory, thus not finding the .Rprofile file that would initialize packrat. I think the easiest solution would be to create a .Rprofile file in my subdirectory which contains
temp <- getwd()            # remember the sub-directory containing the .Rnw file
setwd("..")                # move up to the project root, where packrat/ lives
source("packrat/init.R")   # initialize packrat from the project root
setwd(temp)                # return to the sub-directory
rm(temp)
I have to change the working directory at the project level before source("packrat/init.R") because the file itself refers to the directory...
Anybody can see a better solution?
P.,
I don't know if this solution works even for the knitr package, but I am 99% sure it works for all other packages, as it seems to for me.
(I believe) I have a very similar problem. I have my project folder, but my working directory has always been the sub folder where my .rnw file is located, in a subdirectory of my project folder.
The link to Yihui Xie's answer was very helpful.
Originally I wanted a project folder such as:
project-a/
  working/
    data/
      datas.csv
    analysis/
      library.R
      rscripts.R
    rnw/
      report.rnw
      child/
        preamble.rnw
    packrat/
But I'm not sure that is possible with packrat when my R library() calls are not in the working directory and packrat cannot parse the .rnw file (I call the library.R file from a chunk using source() in my .rnw file). A few notes:
- I wanted to use a .Rproj file to open the project and have project-a/working as the working directory
- If that were the case, packrat could find the library.R script
- But the .rnw file still defaults to its own working directory when compiling
- I thought an .Rprofile with knitr::opts_knit$set(root.dir = "..") would work, but it doesn't apply to LaTeX commands like \input, which default back to the directory containing the .rnw file
- I thought this was insufficient because you then have two working directories, one for your R chunks and one for your LaTeX!
Since the .rnw file always sets the working directory, I put my library.R script in the same directory as my .rnw file, which creates the packrat folder in project-a/working/rnw. I am 99% sure this works because, when I created the packrat folder in project-a/working/rnw WITHOUT relocating the library.R file, I received an error that no packages could be found and I could not compile the .rnw file.
project-a/
  working/
    data/
      datas.csv
    analysis/
      rscripts.R
    rnw/
      report.rnw
      library.R
      packrat/
      child/
        preamble.rnw
Again, unless I am overlooking something or misunderstanding which packages are being used, this seems to have worked for me. Disclaimer here that I am relatively new to packrat.
I'm using RStudio v0.96.331 with pdfTeX, Version 3.1415926-1.40.10 (TeX Live 2009/Debian).
I have a R project in the '/home/operacao/Myprojs/projName', which is my working directory.
Now, if I create a folder called 'reports' in '/home/operacao/Myprojs/projName/reports' and, inside the Sweave file (which is in the reports folder), use the code
setwd('/home/operacao/Myprojs/projName')
then, after loading some packages, I receive the error
Error in driver$finish(drobj) :
the output file 'my_report.tex' has disappeared
Calls: <Anonymous> -> <Anonymous>
Execution halted
But the file is in the folder, and the plots I made appear in the .pdf; the text output does not.
Does anyone know why that happens? If I save the Sweave files directly in my working directory, everything works fine.
Thanks!
RStudio probably requires you to set the working directory to the location that contains the Sweave file. Why do you need to set your working directory to another directory? You could use source() to load any R code files that are in projName.
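As a sketch of that suggestion (the file names are hypothetical): instead of calling setwd(), reference project files relative to the reports/ folder, since that is where the Sweave chunks run from. The example below builds a throwaway layout in a temporary directory so it is self-contained:

```r
# Hypothetical layout: projName/R/helpers.R and projName/reports/my_report.Rnw
proj <- file.path(tempdir(), "projName")
dir.create(file.path(proj, "R"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(proj, "reports"), showWarnings = FALSE)
writeLines("f <- function(x) x + 1", file.path(proj, "R", "helpers.R"))

# A chunk inside reports/my_report.Rnw runs with reports/ as its working
# directory, so it can pull in project code with a relative source() call:
owd <- setwd(file.path(proj, "reports"))
source("../R/helpers.R")   # loads f() without ever changing to projName/
setwd(owd)
stopifnot(f(1) == 2)
```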