Using packrat libraries with knitr and the RStudio Compile PDF button

As explained by Yihui Xie in this post, when you use the Compile PDF button of the RStudio IDE to produce a PDF from a .Rnw file, knit() runs in the globalenv() of a new R session. Is there a way to make this new R session use the packrat libraries of my project (including the version of knitr in my packrat library) instead of my personal user library, to ensure a maximum level of reproducibility? I guess the new R session would have to be tied to the project itself, but I don't know how to do this efficiently.
I know I could call knit() directly instead of using the Compile PDF button, in which case knit() would use my current globalenv(), but I don't like this solution because it's less reproducible.

I think I figured out the problem myself, but I want to share it with others who can confirm I'm right and possibly help improve my solution.
My specific problem is that my .Rnw file is in a subdirectory of my project. When the Compile PDF button creates a new R session, that session starts in this subdirectory and therefore never finds the project-level .Rprofile that would initialize packrat. I think the easiest solution is to create a .Rprofile file in my subdirectory which contains
temp <- getwd()
setwd("..")
source("packrat/init.R")
setwd(temp)
rm(temp)
I have to change the working directory to the project level before calling source("packrat/init.R") because the file itself refers to directories relative to where it is run.
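For reference, a slightly more defensive variant of the same idea (just a sketch, untested in this exact setup) wraps the steps in a throwaway function, so the working directory is restored with on.exit() even if init.R errors out and no temporary variables are left in the global environment:
# .Rprofile in the .Rnw subdirectory -- assumes the packrat project root is one level up
(function() {
  owd <- setwd("..")         # setwd() invisibly returns the previous working directory
  on.exit(setwd(owd))        # restore it when done, even if init.R throws an error
  source("packrat/init.R")   # bootstrap the packrat private library
})()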
Can anybody see a better solution?

P.,
I don't know whether this solution works even for the knitr package itself, but I am 99% sure it works for all other packages, as it seems to for me.
(I believe) I have a very similar problem. I have my project folder, but my working directory has always been the subfolder of my project where my .rnw file is located.
The link to Yihui Xie's answer was very helpful.
Originally I wanted a project folder such as:
project-a/
  working/
    data/
      datas.csv
    analysis/
      library.R
      rscripts.R
    rnw/
      report.rnw
      child/
        preamble.rnw
    packrat/
But I'm not sure that is possible with packrat when my library() calls are not in the working directory and packrat cannot parse the .rnw file (I call library.R from a chunk in my .rnw file using source()). A few notes:
- I wanted to use a .Rproj file to open the project and have project-a/working as the working directory.
- If that worked, then packrat could find the library.R script.
- But the .rnw file still defaults to its own directory as the working directory when compiling.
- I thought an .Rprofile with knitr::opts_knit$set(root.dir = "..") would work, but it doesn't apply to LaTeX commands like \input, which default back to the directory containing the .rnw file.
- That seemed insufficient anyway, because you then have two working directories: one for your R chunks and one for your LaTeX!
Since the .rnw file always sets the working directory, I put my library.R script in the same directory as my .rnw file, which creates the packrat folder in project-a/working/rnw. I am 99% sure this works, because when I created the packrat folder in project-a/working/rnw WITHOUT relocating the library.R file, I got an error that no packages could be found and I could not compile the .rnw file.
project-a/
  working/
    data/
      datas.csv
    analysis/
      rscripts.R
    rnw/
      report.rnw
      library.R
      packrat/
      child/
        preamble.rnw
Again, unless I am overlooking something or misunderstanding which packages are being used, this seems to have worked for me. Disclaimer here that I am relatively new to packrat.
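If it helps anyone reproduce this, the setup described above boils down to something like the following (paths are assumptions based on the tree; run from a regular R session, not from the .rnw itself):
# initialize packrat in the directory that holds both report.rnw and library.R,
# so packrat can parse library.R for its dependencies
setwd("project-a/working/rnw")
packrat::init()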

Related

RStudio: Running .Rprofile in source file location

I have set up a workflow governed by a Makefile.
Under code/ I have multiple *.r scripts, each typically responsible for creating one output file (typically an .RData file, but it could also be a csv export or a png image; any file in principle).
code/.Rprofile bootstraps the whole project directory system and sources some helper functions, etc.
The scripts in code/ need this functionality to work properly.
RStudio has a convenient menu entry to set the working directory to the source file location.
But could I also make it run the .Rprofile in that directory, if one is found? Or really just start R afresh from the directory of the source file?
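One partial workaround (a sketch only; this is not a built-in RStudio feature) is to source the local .Rprofile yourself after switching directories:
# after "Session -> Set Working Directory -> To Source File Location",
# load that directory's .Rprofile manually if one exists
if (file.exists(".Rprofile")) source(".Rprofile")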

Access R file functions from .Rmd file

I'm new to R and RStudio and I'm making a little project. On one hand I have an .R file with the code I want to execute, and on the other hand I have an .Rmd file that I should use to report my work, including the results of executing the code in the other file.
How can I access the results and/or functions of the .R file from the .Rmd file?
Thank you,
By default, your .Rmd file uses the directory it is saved in as its working directory. You can use all of R's standard functions inside the .Rmd file, including source() to run a .R file. So if your files are in the same directory, you can include source("your_r_file.R") to run the .R file. If they are in different directories, you can use relative or absolute file paths (though you should avoid absolute paths in case the .Rmd file is ever run on a different computer).
If you are using RStudio, I would strongly recommend using the "Projects" feature and the here package. The readme for the here package is quite good for explaining its benefits.
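For example, a here-based version of that source() call might look like this (the "analysis" folder name is just an assumption for illustration):
library(here)
# here() builds paths from the project root (the directory containing the .Rproj file),
# so the call works regardless of where the .Rmd file is rendered from
source(here("analysis", "your_r_file.R"))  # "analysis" is a hypothetical folder name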
Source the R file in at the top of your .Rmd file like
```{r}
source("file-name.R")
```
and the functions/objects in that R file will be available.

R issue when moving an .Rmd file from one project to another (working directory issue)

I have two projects in R. I've moved an .Rmd document from project 1 to project 2.
When I use the .Rmd file that I moved to project 2 to try and read in some data, I get the following error message:
cannot open file 'mydata.csv': No such file or directory
Error in file(file, "rt") : cannot open the connection
This kind of error usually suggests a working directory issue to me; however, when I run getwd() in the console, the working directory listed is correct and points to where the csv is stored. I've also run getwd() within the .Rmd doc and again the working directory is correct.
Does anyone else have this experience of moving one .Rmd file to another project and then it not working in the new project?
The code in the .Rmd file that I am trying to run is:
Data <- read.csv("mydata.csv", stringsAsFactors = T)
and the data is definitely within the project, has the correct name, and is a csv, etc.
Has anyone else seen this issue when moving an RMarkdown document into another project before?
Thanks
This may not be the answer, but rmarkdown and knitr intentionally don't respect setwd(): the code in each chunk is run from the directory holding the .Rmd file. So if you've moved your .Rmd file but are then using setwd() to change to the directory holding the data, that change does not persist across code chunks.
If this is the issue, then one solution is to use the knitr options to set the root.dir to the data location:
opts_knit$set(root.dir = '/path/to/the/data')
See here.
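In practice this line usually goes in the first (setup) chunk of the .Rmd, along these lines (the path is only a placeholder):
# in the first chunk of the .Rmd; replace the placeholder with your real data directory
knitr::opts_knit$set(root.dir = "/path/to/the/data")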
Maybe not relevant but it seems to be the most likely explanation for what's happening here:
The project shouldn't really interfere with your code here. When you open the project, it sets your working directory to the root of the project. However, this shouldn't matter in this case, since R Markdown files automatically set the working directory to the location where the R Markdown file is saved. You can see this by running getwd() once in the console and once from the R Markdown file via "Run Current Chunk".
The behavior is the same when the file is knitted. So unless "mydata.csv" is in the same directory as the R Markdown file, the code above won't work.
There are two workarounds: you can use relative or absolute paths to navigate to your data file. In a project, I prefer relative paths. Let's say the .Rmd file is in a folder called "scripts" and your data file is in a folder called "data", and both are in the same project folder. Then this should work:
Data <- read.csv("../data/mydata.csv", stringsAsFactors = TRUE)
The other option, which I do not recommend, is to set the working directory in the .Rmd file via:
opts_knit$set(root.dir = '../data/')
The reason I wouldn't do that is that the working directory is only changed when knitting the document; when using the .Rmd file interactively, you have a different working directory (the location of the .Rmd file).
This is a great application of the here package for dealing with these types of issues.
here (https://github.com/jennybc/here_here) looks upward from the .Rmd file (and all R files) for the .Rproj file and then uses that as an "anchor" to locate other files via relative references. If, for instance, you had the following project structure:
-data
|--mydata.csv
-src
|--00-import.R
|--01-make-graphs.R
|--02-do-analysis.R
-report
|--report.Rmd
-yourproject.Rproj
And if you wanted to use mydata.csv in your report.Rmd, you could do the following:
library(here)
dat <- read.csv(here("data", "mydata.csv"))
here will then convert this into the full path, something like "~/Users/../data/mydata.csv", for you. Note that you have to be working inside the R project for this use case.
Here is such a great package and solves a lot of issues.

Ignore specific files in packrat search

I am creating reports using R, RStudio, knitr, and packrat. I have a project folder structure similar to below:
project_folder/
- packrat/
- .Rprofile
- analysis_folder/
  - library.R
  - child.rnw
- data_folder/
- knitr_rnw_location/
  - file.rnw
  - .Rprofile
I have set up the .Rprofile files with the appropriate lines in the main project_folder and in the subdirectory of the .rnw file, according to the recommendations given on RStudio's Limitations and Caveats page.
When I run packrat::init() at the project_folder level, the packrat folder is set up. Then, when I open file.rnw, the packrat library is all set up.
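(That initialization step corresponds to something like the following; the path is the project folder reported by packrat in the snapshot message further down:)
# run once from any R session to create the packrat/ folder and project .Rprofile
packrat::init("~/project_folder")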
However, when I execute packrat::snapshot(), it gives errors like
Unable to tangle file knitr_rnw_location/file.rnw; cannot parse dependencies
and fails. Is there a way to tell packrat to ignore my .rnw files? All library() calls are made in separate .R scripts, which are source()d from the .rnw files. It also scans variables declared in the knitr chunks and gives the error
Error in eval(x, envir = envir): object 'my_variable_name' not found
In the end, it does state
Snapshot written to "~/project_folder/packrat/packrat.lock"
So I can only assume that packrat::snapshot() was successful. Has anyone else run into the same issue when working with knitr and packrat?
Much appreciated,
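One avenue worth trying (only a sketch; whether these options are available depends on your packrat version, so check ?packrat::snapshot and packrat's "packrat-options" help page) is to stop packrat from inferring dependencies from the .rnw files altogether:
# possibly available depending on packrat version -- tell packrat not to scan
# the directory containing the .rnw files for dependencies
packrat::set_opts(ignored.directories = "knitr_rnw_location")

# or skip dependency inference entirely and snapshot only the packages
# already installed in the project's private library
packrat::snapshot(infer.dependencies = FALSE)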

R workspaces i.e. .R files

How do I start a new .R file by default in a new session, for the new objects in that session?
Workspaces are .RData files, not .R files. .R files are source files, i.e. text files containing code.
It's a bit tricky. If you save the workspace, R saves two files in the current working directory: an .RData file with the objects and a .Rhistory file with the history of commands. In earlier versions of R, these were saved in the R directory itself. With my version 2.11.1, it uses the desktop.
If you start up R and it says "[Previously saved workspace restored]", then it loaded the files ".RData" and ".Rhistory" from the default working directory. You can find that directory with the command
getwd()
If it's not the desktop or similar, you can use
dir()
to see what's inside. For me that doesn't work, as I only have the file "desktop.ini" there (thank you, bloody Windoze).
Now there are two options: you manually rename the workspace, or you use the command:
save.image(file="filename.RData")
to save the workspace before you exit. Alternatively, you can set those options in the file Rprofile.site. This is a text file containing code that R runs at startup; it resides in the etc/ subdirectory of your R directory. You can add something like the following to the bottom of the file:
fn <- paste("Wspace", Sys.Date(), sep="")                      # base name containing today's date
nfiles <- length(grep(paste(fn, ".*.RData", sep=""), dir()))   # count existing workspace files for today
fn <- paste(fn, "_", nfiles + 1, ".RData", sep="")             # append a counter to get a unique name
options(save.image.defaults=list(file=fn))                     # store that name in the save.image.defaults option
Beware: this doesn't do a thing if you save the workspace by clicking "yes" on the message box. You have to use the command
save.image()
right before you close your R-session. If you click "yes", it will still save the workspace as ".RData", so you'll have to rename it again.
I believe that you can save your current workspace using save.image(), which will default to the name ".RData". You can load a workspace simply using load().
If you're loading a pre-existing workspace and you don't want that to happen, rename or delete the .RData file in the current working directory.
If you want to have different projects with different workspaces, the easiest thing to do is create multiple directories.
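A minimal illustration of the save/load cycle mentioned above (the file name here is arbitrary):
# save everything currently in the global environment to a named file
save.image(file = "my_analysis.RData")

# later, in a fresh session, restore those objects
load("my_analysis.RData")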
There is no connection between sessions, objects, and .R source files. In short: there is no need to.
You may enjoy walking through the worked example at the end of the Introduction to R - A Sample Session.
Fire up R in your preferred environment and execute the commands one-by-one.
