How can I avoid hardcoding a file path?

I am using RStudio to knit an .Rnw file to .pdf. This .Rnw file is stored in a directory that is under git version control. This directory also contains the .Rproj file for the project.
I collaborate with colleagues who don't know the first thing about .Rnw files and git. These colleagues want to open a Word file and track change their hearts out. So I give the people what they want.
Everyone needs access, so storing the Word file on a cloud service like Box makes sense. In the past I created a subfolder in my repo that I shared—keeping everything within the root directory—but this time around I needed to store the file in a shared folder that someone else created. So my solution was to copy the Word file from this shared directory to my repository.
Technical Approach
I don't know how to make this a reproducible problem, but hopefully you will give me some latitude since I'm trying to make my work fully reproducible ;)
Let's say that my .Rnw file is stored in repoRoot/subfolder. Since knitr changes the working directory to subfolder where this .Rnw file is located, the first chunk sets the root.dir one level up at the project root.
<<knitr, include=FALSE>>=
library(knitr)
opts_knit$set(root.dir=normalizePath('../')) # go up 1 level
@
The next chunk copies the Word file from the shared folder to my git repo and runs the analysis file. The shared directory path is hard-coded to my machine, which is the problem I'm asking for your help to solve.
file.copy(from='/Users/ericpgreen/Box Sync/Project/Paper/draft.docx',
          to='subfolder/draft.docx', # my repo
          overwrite=TRUE)
source('scripts/analysis.R') # generates objects we reference in the .docx file
After adding \begin{document}, I include a chunk where I convert the .docx file to .txt and then rename it to .Rnw.
# convert docx to txt
system("textutil -convert txt 'subfolder/draft.docx'")
# rename txt to .Rnw
file.rename('subfolder/draft.txt',
            'subfolder/draft.Rnw')
The next child chunk calls this .Rnw file that contains the text of the Word file with references to R objects included through \Sexpr{}:
<<include-draft, child='draft.Rnw', include=FALSE>>=
@
This works just fine for me. Whenever I knit the .Rnw file it grabs the latest version of the .docx file that my colleagues have edited (complete with track changes and comments) and, in a later step not shown here, returns the .pdf file to the shared folder.
Problem to Solve
This setup meets almost every need of mine, except that the initial file.copy() command is hard-coded to my machine. So if someone in my group clones my repo (e.g., research assistants who DO use version control), it won't run out of the box. Is there a workaround to hard-coding in this type of case?

Ultimately you won’t get around hard-coding paths that are outside your control, such as paths to network shares. What you can and should avoid is hard-coding these paths in your documents.
Instead, relegate them to configuration files and/or environment variables (which, again, will be controlled by configuration files, to wit .bashrc and similar). The simplest approach is then to use
network_share_path = Sys.getenv('NETWORK_SHARE_PATH')
if (network_share_path == '')
    stop('no network share path configured')
file.copy(from = network_share_path, to = 'subfolder/draft.docx', overwrite = TRUE)
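If you prefer a configuration file over an environment variable, the same idea works. A minimal sketch, assuming each collaborator keeps an untracked config.dcf at the repo root (the file name and the network_share_path field are hypothetical; the path shown is just the one from the question):
# config.dcf is listed in .gitignore and contains a single line such as
#   network_share_path: /Users/ericpgreen/Box Sync/Project/Paper/draft.docx
read_share_path <- function(config_file = 'config.dcf') {
    if (!file.exists(config_file))
        stop('no ', config_file, ' found; create one with your local share path')
    cfg <- read.dcf(config_file)
    as.character(cfg[1, 'network_share_path'])
}
file.copy(from = read_share_path(), to = 'subfolder/draft.docx', overwrite = TRUE)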

Related

The system cannot find the file specified in Rmarkdown

I am working on my current project in RStudio Desktop, writing it in R Markdown to hand off later. I am having trouble with the error "the system cannot find the file specified" in R Markdown. At first, it said that combined_databike was not found, but I created that data frame earlier in the same file, and you can see in the upper right that all of the data frames mentioning "combined_databike" exist. Even so, when I hit Knit it gives me the error. Now the error points to tripdata_202006, which I cannot understand, because I imported every file from tripdata_202006 to tripdata_202105 using "Import Dataset".
I want to understand why it is not working and how to fix it.
That's because the file you want to read is not in the folder where your project, or in this case your R Markdown file, is located.
As you can see in the file list in your console, your directory contains 4 files, and it does not contain the folder "Bike data" where the file 202006-divvy-tripdata.csv is located, according to the path on line 83 of your code.
To solve that, I think you have two options: the first is to write the whole path to the folder location and then to the file; the second is to move the folder "Bike data", with the files you have in it, to where your R Markdown file is located.
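For example (a sketch; the absolute path below is only a placeholder for wherever the folder actually lives on your machine):
# Option 1: give the full path to the file inside the "Bike data" folder
tripdata_202006 <- read.csv("C:/Users/yourname/Documents/Bike data/202006-divvy-tripdata.csv")
# Option 2: after moving the "Bike data" folder next to the .Rmd file,
# a relative path is resolved from the document's folder when knitting
tripdata_202006 <- read.csv("Bike data/202006-divvy-tripdata.csv")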

Changing WD in RMarkdown

My R Markdown script was running well until I had to access documents from a different working directory than the one my R Markdown file is in. I tried to change the working directory, but it doesn't take.
I may need to access documents from different folders throughout my R Markdown document. How can I be more flexible without copying my files into the R Markdown folder? (That would be a waste of space!)
Is this the right command?
knitr::opts_knit$set(root.dir = "C:/Users/Nadine/OneDrive/ZID_Kurse/Einführung/Kursmaterial")
And do I need to put it at the beginning of the document?
It just halts at the chunk with the setwd() command.
Cheers,
Nadine
You probably do not need to change the working directory; just explain where to find the files relative to the project working directory. You can use an absolute path to the files on the filesystem, e.g.
list.files("C:/projects/another_project/data/")
Or you may try to use relative paths to navigate to files through the parent directory. e.g.
list.files("../another_project/data/")

How to import an external dataset into a Moodle question?

I would like to import an external dataset using read.table() (or any other function for reading files) and then randomize or sample over it. The file is stored in a subfolder of the parent folder that contains the exercise *.Rmd files. I am working within an RStudio project. I tried placing the dataset at different levels of the folder structure. Using relative paths did not work, but absolute paths did.
My folder structure is:
$home/project_name/exercises # It contains the RMD files
$home/project_name/exercises/data # It contains data files that I want to process
$home/project_name/datasets # this folder could eventually contain the dataset I want to process
To make this code more portable, I would like to know how to manage relative paths within the *.Rmd files for the knitting process.
The exercises are copied to a temporary directory and processed there. Hence, the easiest option is to copy these files to the temporary directory using include_supplement("file.csv"). By default this assumes that the file.csv is in the same directory that the exercise itself resides in. If it is in a subdirectory you can use include_supplement("file.csv", recursive = TRUE) and then subdirectories are searched recursively for file.csv.
After using include_supplement(), the copied file is available locally and can be processed with read.table() or also included in the exercise as a supplementary file. See http://www.R-exams.org/templates/Rlogo/ for a worked example. However, note that the Rlogo template explicitly specifies the directory from which the file should be copied. This is not necessary if that directory is the same as the exercise or a subdirectory.
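A sketch of what this could look like in the exercise's data-generation chunk, assuming the data file is named file.csv and lives in a subfolder of the exercise directory:
library("exams")
# copy file.csv from the exercise folder (searching subdirectories) into the
# temporary directory where the exercise is processed
include_supplement("file.csv", recursive = TRUE)
# the local copy can now be read as usual
dat <- read.table("file.csv", header = TRUE, sep = ",")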

RStudio is deleting key files when I knit (both PDF and HTML)

So I am having an R nightmare. I've returned to a project I built under the previous iteration (or perhaps one more) of RStudio. I produced a workable report that I was asked to update, and my current bugbear wasn't around then. Here is what happens:
My report file is "ISS Time Series.Rmd". It calls three other files:
"mystyles.sty", which updates the LaTeX preamble to use some additional packages.
"functions.R" and "load.R". The former contains frequently used functions I've written, and the latter loads the data I'm using.
I source the two .R files in the .Rmd file. When I try to Knit the report, whether I get an error or am successful, my two .R files and my one .sty file are deleted. And not just deleted -- gone for good.
I do not know what is up. I have ruined my previous work simply by returning to examine the original file.
Please, somebody has to help me here. My workflow is shot to hell if I have to write every last bit of code over and over again in each report.
UPDATE: Even copying the files to another directory doesn't help.
Here is the code block that calls the "load.R" file:
```{r loaddata}
#
# ------- Load Data
#
# This section loads the ISS survey files one at a time and saves them as
# read.SPSS objects within a list. It names these eleven objects as "ISS 2002",
# "ISS 2003", etc... until "ISS 2012". This file may be prohibitively large.
#
source("load.R") # Loads the ISS Survey files
```
Rename your file to ISS_Time_Series.Rmd and try again.
It is the spaces in the document name that make rmarkdown::render() delete the files that have been loaded or sourced.
An issue has already been filed; see https://github.com/rstudio/rmarkdown/issues/580
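In practice the fix is just (a sketch, run once from the project directory):
# rename the report so the file name contains no spaces, then render again
file.rename("ISS Time Series.Rmd", "ISS_Time_Series.Rmd")
rmarkdown::render("ISS_Time_Series.Rmd")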

Put figure directly into Knitr document (without saving file of it in folder) Part 2

I am extending a question I recently posted here (Put figure directly into Knitr document (without saving file of it in folder)).
I am writing an R package that generates a .pdf file for users that outputs summarizations of data. I have a .Rnw script in the package (here, my MWE of it is called test.Rnw). The user can do:
1) knit("test.Rnw") to create a test.tex file
2) "pdflatex test.tex" to create the test.pdf summary output file.
The .Rnw file generates many images. Originally, these all got saved in the current directory. Having these images saved to the directory (or the .aux and .log files that get created upon calling pdflatex on the .tex file) just does not seem as tidy as it could be, since users must remember to delete these image files. Secondarily, I also worry that this untidiness may cause issues when scripts are run multiple times.
So, in my previous post, we improved the .Rnw file by saving the images to a temporary folder. I have been told the files in the temporary folder get deleted each time a new R session is opened. However, I still worry about certain things:
1) I feel I may need to insert a line, like the one on line 19:
system(sprintf("%s", paste0("rm -r ", temppath, "/*")))
to automatically delete the files in the temporary folder each time the .Rnw file is run (so that the images do not get deleted only when R is restarted). This will keep the current directory clean of the images, and the user will not have to remember to delete the images manually. However, I do not know whether this "solution" will pass CRAN standards, because it deletes files on the user's system, which could cause problems if other programs are writing files to the temporary folder. I feel I have read about CRAN not allowing files to be written to or deleted from the user's computer, for obvious reasons. How strict would CRAN be about such a practice? Is there a safe way to go about it? (See the base-R sketch after this list.)
2) If writing and deleting the image files in a temporary folder will not work, what is another way to accomplish the same effect (run the script without having cumbersome image files created in the folder)? Is it possible to instead have the images directly embedded in the output file (not needing to be saved to any directory)? I am pretty sure this is not possible. However, I have been told it is possible to do so with .Rmd, and that I could convert my .Rnw to .Rmd. This may be difficult because the .Rnw file must follow certain formats (text and margins) for the correct output, and it is very long. Is it possible to make use of the .Rmd capability (of inserting images directly into the output) only for the chunks that generate images, without rewriting the entire .Rnw file?
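Regarding item 1, here is a base-R sketch of the idea that scopes the cleanup to a dedicated subdirectory of tempdir(), so nothing else in the temporary folder is touched (the subdirectory name figs is an assumption):
# keep this document's figures in their own subdirectory of the session tempdir
figdir <- file.path(tempdir(), "figs")
# remove leftovers from a previous run of this document only, then recreate it
unlink(figdir, recursive = TRUE)
dir.create(figdir, showWarnings = FALSE)
# fig.path is a filename prefix, so end it with a path separator
knitr::opts_chunk$set(fig.path = file.path(figdir, ""))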
Below is my MWE:
\documentclass[nohyper]{tufte-handout}
\usepackage{tabularx}
\usepackage{longtable}
\setcaptionfont{% changes caption font characteristics
\normalfont\footnotesize
\color{black}% <-- set color here
}
\begin{document}
<<setup, echo=FALSE>>=
library(knitr)
library(xtable)
library(ggplot2)
# Specify directory for figure output in a temporary directory
temppath <- tempdir()
# Erase all files in this temp directory first?
#system(sprintf("%s", paste0("rm -r ", temppath, "/*")))
opts_chunk$set(fig.path = file.path(temppath, "")) # trailing separator: fig.path is a prefix
@
<<diamondData, echo=FALSE, fig.env = "marginfigure", out.width="0.95\\linewidth", fig.cap = "The diamond dataset has variables depth and price.",fig.lp="mar:">>=
print(qplot(depth,price,data=diamonds))
@
<<echo=FALSE,results='asis'>>=
myDF <- data.frame(a = rnorm(1:10), b = letters[1:10])
print(xtable(myDF, caption= 'This data frame shows ten random variables from the distribution and a corresponding letter', label='tab:dataFrame'), floating = FALSE, tabular.environment = "longtable", include.rownames=FALSE)
@
Figure \ref{mar:diamondData} shows the diamonds data set, with the
variables price and depth. Table \ref{tab:dataFrame} shows letters a through j
corresponding to a random variable from a normal distribution.
\end{document}
