Knitr global environment - r

I can't seem to get R Markdown/knitr to see or use objects in my global environment in R.
From what I've read, knitr should use the global environment by default, but every object I reference in a code chunk returns the error
## Error: object 'XXX' not found
Am I missing something really simple here?
Do I need to manually load objects from the global environment first?
Thanks in advance
Marty

If you've already saved the object(s) to a file, then one clean approach for markdown purposes is as follows:
if (file.exists("rfModel.Rda")) {
  load("rfModel.Rda")
} else {
  modFit <- train(class ~ ., method = "rf", data = train)  # train() comes from the caret package
  save(modFit, file = "rfModel.Rda")                       # save it so later runs can skip the fit
}
This effectively bypasses the lengthy model build by fitting the model only when the saved file does not yet exist, while preserving reproducibility. It is similar to the cache idea, but more generalizable IMHO.

It sounds like you want the same code to work with both knitr and your global environment. This can be useful when building complicated Rmd files that require testing during construction.
The issue is that when you press Knit, knitr uses the .Rmd file's own folder as the working directory and does not look for the project home folder (i.e. where your .Rproj lives; I am assuming you use relative paths). So code written with project-relative paths works in one context but not the other. The way around this is to keep writing paths in your Rmd relative to the project folder (as you would in a normal R script) and redirect knitr to use the project home folder. To do this, insert the following code at the top of your Rmd script.
```{r setup, include=FALSE}
library(knitr)
dd <- getwd()
knitr::opts_knit$set(root.dir = paste0(dd,'/../../'))
knitr::opts_chunk$set(cache.path = paste0(dd,'/cache/'))
knitr::opts_chunk$set(fig.path = paste0(dd,'/figures/'))
```
This code does the following:
first, it finds the current directory of your .Rmd;
second, it sets the project root directory. I keep my files two folders in, hence the '/../../'; this will need to be adjusted for your folder structure;
third, it sets the cache folder path manually, because the default setting no longer works once the root changes, so caching won't work otherwise;
finally, it does the same for the figures folder, as again you need to overwrite the default.
Happy coding.


How to use objects from global environment in RStudio Markdown

I've seen similar questions on Stack Overflow but virtually no conclusive answers, and certainly no answer that worked for me.
What is the easiest way to access and use objects (regression fits, data frames, other objects) located in the global R environment from the (RStudio) Markdown script?
I find it surprising that there is no easy solution to this, given the tendency of the RStudio team to make things comfortable and effective.
Thanks in advance.
For better or worse, this omission is intentional. Relying on objects created outside the document makes your document less reproducible--that is, if your document needs data in the global environment, you can't just give someone (or yourself in two years) the document and data files and let them recreate it themselves.
For this reason, and in order to perform the render in the background, RStudio actually creates a separate R session to render the document. That background R session cannot see any of the environments in the interactive R session you see in RStudio.
The best way around this problem is to take the code you used to create the contents of your global environment and move it inside your document (you can use echo = FALSE if you don't want it to show up in the document). This makes your document self-contained and reproducible.
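For example, a chunk like the following (a sketch; the path and model are illustrative) keeps the setup code inside the document but out of the rendered output:
```{r setup-objects, echo=FALSE}
# Recreate here everything the document needs, so the .Rmd is self-contained
train <- read.csv("data/train.csv")   # illustrative path
fit   <- lm(y ~ x, data = train)      # illustrative model
```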
If you can't do that, there are a few approaches you can take to use the data in the global environment directly:
Instead of using the Knit HTML button, type rmarkdown::render("your_doc.Rmd") at the R console. This will knit in the current session instead of a background session. Alternatively:
Save your global environment to an .Rdata file prior to rendering (use R's save function), and load it in your document.
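A minimal sketch of that second approach (object and file names are illustrative):
save(fit, df, file = "analysis_objects.RData")  # run in the interactive session before rendering
and then, near the top of the document:
```{r, include=FALSE}
load("analysis_objects.RData")
```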
Well, in my case I found the following solution:
(1) Save your global environment to an .RData file inside the same folder where you have your .Rmd file. (You just need to click the diskette icon in the "Global Environment" panel.)
(2) Write the following code in your R Markdown script:
load(file = "filename.RData") # loads the file you saved before
and stop suffering.
Going to RStudio's 'Tools' > 'Global Options' and visiting the 'R Markdown' tab, you can make a selection under 'Evaluate chunks in directory'; select the option 'Document' and the R Markdown knitting engine will access the global environment just as plain R code does. Hope this helps those searching for this info!
The thread is old but in case anyone's still looking for a solution (as I was):
You can pass an envir argument to render() (or knit()) so that the document can access objects from the environment it was called from.
rmarkdown::render(
  input = input_rmd,
  output_file = output_file,
  envir = parent.frame()
)
I have the same problem myself. Some stuff is pretty time-consuming to reproduce every time.
I think there could be another answer: what if you save your environment with the save.image() function to a different file than the standard .RData one, and then bring it back with load()?
To be sure you are using the same data, use md5sum() from the tools package.
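Something along these lines (the file name is illustrative):
save.image(file = "my_workspace.RData")   # snapshot the whole workspace once
# ... later, e.g. at the top of the .Rmd:
load("my_workspace.RData")
tools::md5sum("my_workspace.RData")       # compare checksums to confirm both sides use the same file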
Cheers, Cord
I think I solved this problem by referring to the package explicitly in the code that is being knitted. Using the yarrr package, for example, I loaded the data frame "pirates" using data(pirates). This worked fine at the console and within an RStudio code chunk, but with knitr it failed, following the pattern in the question above. If, however, I loaded the data into memory by creating an object with pirates <- yarrr::pirates, the document then knitted cleanly to HTML.
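In code, the two variants look roughly like this (assuming the yarrr package is installed):
library(yarrr)
data(pirates)              # worked at the console, but knitr could not find the object
pirates <- yarrr::pirates  # binding the data explicitly let the document knit cleanly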
You can load the script in the desired environment as follows:
```{r, include=FALSE}
source("your-script.R", local = knitr::knit_global())
# or sys.source("your-script.R", envir = knitr::knit_global())
```
Then, in the R Markdown document, you can use the objects created in these scripts (e.g., data objects or functions).
https://bookdown.org/yihui/rmarkdown-cookbook/source-script.html
One option that I have not yet seen is the use of parameters.
This chapter goes through a simple example of how to do this.
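A minimal sketch of the idea (file and parameter names are illustrative): declare a params field in the document's YAML header, then pass the object in when rendering.
# In the .Rmd YAML header:
# params:
#   dataset: NULL
# Inside the document, refer to the object as params$dataset.
rmarkdown::render("report.Rmd", params = list(dataset = my_data))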

Need the filename of the Rnw when knitr runs in RStudio

When working on a reproducible research project, I would like to know the name of the Rnw file that is being run to use as an R variable.
This would be analogous to inserting an MS Excel filename in a footer.
I am using RStudio Server on Ubuntu.
Thank you.
knitr now has a method for this: current_input()
https://github.com/yihui/knitr/issues/701
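For example (a sketch; current_input() returns the name of the file being knitted):
fname <- knitr::current_input()  # e.g. "report.Rnw" while that file is being knitted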
You can use the following two idioms to fetch the directory and name of the current file in knitr:
knitr:::.knitEnv$input.dir
knitr:::knit_concord$get("infile")
This is independent of RStudio; the knitr package is entirely responsible for this. These are private functions and variables that can change at any time without notice, so if you need something reliable, you might want to file an issue on GitHub.

how can I get the current file directory in R

I have seen many related answers here, but I didn't find a proper way to solve my problem under Windows...
I know the link to the similar question.
I understand that setwd() can set the directory I want; however, my R script may be moved to another directory without any modification, so I want to know the current file's directory. There are expressions like source(...) where the sourced file and the executing file sit under the same parent directory in an R project. How can I do this?
Any help appreciated.
You can get your current directory using the getwd() function and give it a name, say:
cpath <- getwd()
Another useful function is file.path(), which helps you build paths to other directories with simple syntax. For example, to get the directory one level "above" the current directory, you can use:
upp.dir <- file.path(cpath, "..")  # normalizePath(upp.dir) resolves the ".."
This gives upp.dir as "Your_Current_Dir/..", i.e. the parent directory. How about pointing at another folder (called Folder_A) inside the current directory? Use:
folderA <- file.path(cpath, "Folder_A")
These may help you navigate the file system easily.
Basically, if you write scripts and those scripts depend on where they are, then you are Doing It Wrong.
Write code in packages. Parameterise functions to make them generally applicable. If you have folders with data in, then make one of those parameters a folder.
A script called with source() cannot reliably locate itself, but that shouldn't be a problem, because WHATEVER CALLED THE SCRIPT knows where the script is (it has to, or how else can it call it?) so it could pass that as a parameter. Something like:
> youarehere <- "C:/foo/"   # use forward slashes (or doubled backslashes) in R path strings
> source("C:/foo/bar.R")
and now bar.R can do setwd(youarehere) and it will work, even if it is badly written such that it relies on sourcing other code in its containing folder.
Or you can do:
> setwd(youarehere)
> source("bar.R")
in your calling function.
But really, it's a fail; it's a sign of badly written code. Use functions, write packages, use devtools; it's really not that hard, and then your code will work anywhere and you won't be writing stupid scripts that are a twisty-turny maze of source() calls.
Stay classy.

risks of using setwd() in a script?

I've heard it said that it is bad practice to use setwd() in a script.
What are the risks/dangers associated with it?
What are better alternatives?
It's an issue of reproducible code. If you specify a directory that doesn't exist on someone else's computer, then they can't use your code. This is particularly bad with absolute file paths, and particularly bad with Windows file paths (which are absolutely impossible to replicate on a Unix system).
My preferred solution is to specify that the user should be in the relevant directory on their own system before starting to run the code. If for your own convenience you want to put a setwd(...) right at the top of your code, where other people can notice it and comment it out as appropriate, but the rest of your code assumes only relative paths from that starting directory, that's OK with me.
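For instance (paths are illustrative):
setwd("C:/Users/me/projects/analysis")  # fragile: breaks on any machine without this exact tree
dat <- read.csv("data/input.csv")       # portable: relative to the agreed starting directory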
Yihui Xie (author of knitr) feels particularly strongly about this:
https://groups.google.com/forum/?fromgroups=#!topic/knitr/knM0VWoexT0
Whenever you want to manipulate files, they are assumed to be under the same directory of your source (e.g. Rnw documents). Then you can always use relative paths and you will never need to setwd(). Using setwd() contradicts with the principle of reproducibility, e.g. you use setwd('foo/bar/') and the directory may not exist in other people's computers. See FAQ 7: https://github.com/yihui/knitr/blob/master/FAQ.md
And from the aforementioned FAQ 7:
You'd better not do this [change the working directory inside knitr code chunks]. Your working directory is always getwd() (all output files will be written here), but the code chunks are evaluated under the directory where your input document comes from. Changing working directories while running R code is a bad practice in general. See #38 for a discussion. You should also try to avoid absolute directories whenever possible (use relative directories instead), because it makes things less reproducible.
See also: https://github.com/yihui/knitr/issues/38
I can't think of any particular issues with using setwd() in a script run on a server I manage, since it returns an error that can be trapped with try(), and you can handle the failure. I have used setwd() when being lazy about paths - see below!
I use file.path() extensively in scripts, production or otherwise, working across the files in an input directory and putting the output graphics and reports elsewhere. Something along the lines of the following (untested) would be a bit tedious using setwd():
kInDir  <- '~/Indir'
kOutDir <- '~/Outdir'
flist <- dir(path=kInDir, pattern='^[a-z]{2,5}\\.csv$')
# note I could have used full.names=TRUE - but it's easier not to...
for (fnam in flist) {
  # full path to the report file created
  sfnam <- file.path(kOutDir, gsub('\\.csv$', '_report.txt', fnam))
  # full path to the csv file that will be created
  ofnam <- file.path(kOutDir, gsub('\\.csv$', '_b.csv', fnam))
  #
  # ok... we're going to process this CSV file...
  r1 <- read.csv(file.path(kInDir, fnam))
  #
  # we'll put the output from the analysis into this report file
  sink(sfnam, split=TRUE)
  # process it... into a new data.frame k1
  # blah blah blah...
  #
  write.csv(k1, file=ofnam, row.names=FALSE)
  sink() # turn off this particular report file
}
Toward the better alternatives question:
I mainly use R for individual projects (meaning I'm the primary analyst). However, we do use these in projects which sometimes need to be shared with others.
RStudio - Projects
I have found RStudio's Projects functionality goes a long way toward keeping your files organized. If other users also adopt RStudio, they will have the nice feeling of being able to open a single file ("*.Rproj") and have the project load in the same state you last saved it in.
ProjectTemplate
On top of this, I've found a new tool, ProjectTemplate, that goes a step further! The technique the author developed provides structure to what you are doing. Please go over to the website for more detail.
Though problems with setwd() have been covered, I would like to add one more option to the "what are the alternatives" part of the question. We often work with git, where relative paths are very convenient; the helper below sets the working directory relative to the current one:
setrelwd <- function(rel_path) {
  curr_dir <- getwd()
  abs_path <- file.path(curr_dir, rel_path)
  if (dir.exists(abs_path)) {
    setwd(abs_path)
  } else {
    warning('Directory does not exist. Please create it first.')
  }
}
> setrelwd("Summer2016")
Warning message:
In setrelwd("Summer2016") : Directory does not exist. Please create it first.
Also, if you don't want to see the warning message but want to create the folder right away, see "Check existence of directory and create if it doesn't exist".
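A common variant of setrelwd() (a sketch) creates the missing directory instead of warning:
if (!dir.exists(abs_path)) dir.create(abs_path, recursive = TRUE)
setwd(abs_path)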
To make things a bit more portable where I work, we all put this in an .Rprofile:
hdrive <- switch(Sys.info()[[1]],
  'Linux'   = "/mnt/hdrive",
  'Windows' = "H:/",
  'Darwin'  = "/Volumes/hdrive/mnt/hdrive"
)
So I always have that variable to get me to our shared drive. Then in my scripts I can write:
setwd(file.path(hdrive, "relative/path"))  # file.path() avoids the doubled slashes paste(..., sep="/") would give
So that gets us around some of the problems that others are talking about.
I personally added the following code. I use Sys.info() and any() with information unique to each machine.
The first step is to use Sys.info() to find a unique identifier for your computer.
if (any(Sys.info() == "COMPUTER1")) {
  setwd("c:/Users/user1/repos/project/")
}
if (any(Sys.info() == "COMPUTER2")) {
  setwd("/home/user1/repos/project/")
}
Then just add the name of the computer to the if statement along with the correct path, with a new if for each machine.
For reproducibility, this does not change anyone else's working directory unless they are on that specific machine.
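A slightly tighter check (my variation, not the original answer's) compares the nodename field directly:
if (Sys.info()[["nodename"]] == "COMPUTER1") setwd("c:/Users/user1/repos/project/")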

get filename and path of `source`d file

How can a sourced or Sweaved file find out its own path?
Background:
I work a lot with .R scripts or .Rnw files.
My projects are organized in a directory structure, but the path of the project's base directory frequently varies between different computers (e.g. because I do just parts of the data analysis for someone else, and their directory structure differs from mine: I have project base directories like ~/Projects/StudentName/ or ~/Projects/Studentname/Projectname, and most students who have just the one project usually keep it under ~/Measurements/ or ~/DataAnalysis/ or something the like - which wouldn't work for me).
So a line like
setwd (my.own.path ())
would be incredibly useful, as it would ensure the working directory is the base path of the project regardless of where that project actually is, without the user having to think of setting the working directory.
Let me clarify: I am looking for a solution that works when an unthinking user presses the editor's/IDE's Source or Sweave keyboard shortcut.
Just FYI, knitr will setwd() to the dir of the input file when (and only when) evaluating the code chunks, i.e. if you call knit('path/to/input.Rnw'), the working dir will be temporarily switched to path/to/. If you want to know the input dir in code chunks, currently you can call an unexported function knitr:::input_dir() (I may export it in the future).
Starting from gsk3's and Seb's suggestions, here's an idea:
the combination of username (login) and IP or name of the computer could be used to select the right directory.
That leads to something like:
setwd(switch(paste(Sys.info()[c("user", "nodename")], collapse = "."),
  user.laptop  = "~/Messungen",
  user2.server = "~/Projekte/Projekt/"
))
So there is an automatic solution that:
works with source,
works with Sweave,
and even works for interactive sessions where the commands are sent line by line.
The combination of user and nodename of course needs to be specific, and the paths need to be edited by hand, though.
Improvements welcome!
Update:
Gabor Grothendieck answered the following to a related question on r-help today:
this.dir <- dirname(parent.frame(2)$ofile)
setwd(this.dir)
which will work for source.
Another update: I now do most of the data analysis work in RStudio. RStudio's projects basically solve the problem: RStudio changes the working directory to the project root directory every time I switch between projects.
I can therefore put the project directory as far down my directory tree as I want (and the students can also put their copy wherever they want) and sync the data files and scripts/.Rnws via version control (we use a private git server). The RStudio project files are kept out of version control, i.e. .gitignore contains .Rproj.user.
Obviously, within the project, the directory structure needs to be synchronized.
You can use sys.calls() to get the command used to source the file. Then you need a bit of trickery using regular expressions to get the pathname, bearing in mind that source("something/filename") could have used either the absolute or relative path. Here's a first attempt at putting all the pieces together: try inserting the following lines at the top of a source file.
whereFrom <- sys.calls()[[1]]
# This should be an expression that looks something like
# source("pathname/myfilename.R")
whereFrom <- as.character(whereFrom[2])            # get the pathname/filename
whereFrom <- paste(getwd(), whereFrom, sep = "/")  # prefix it with the current working directory
pathnameIndex <- gregexpr(".*/", whereFrom)        # we want the string up to the final '/'
pathnameLength <- attr(pathnameIndex[[1]], "match.length")
whereFrom <- substr(whereFrom, 1, pathnameLength - 1)
print(whereFrom)  # or "setwd(whereFrom)" to set the working directory
It's not very robust—for instance, it will fail on windows with source("pathname\\filename"), and I haven't tested what happens if you have one file sourcing another file—but you might be able to build a solution on top of this.
I have no direct solution for obtaining the directory of the file itself, but if you have a limited range of directories and directory structures, you can probably use
if (dir.exists("c:/somedir")) setwd("c:/somedir")
You could check for the pattern of the directory in question and then set the dir. Does this help you?
An additional problem is that the working directory is a global variable, which can be changed by any script, so if your script calls another script, it will have to set the wd back. In RStudio I use Session -> Set Working Directory -> To Source File Location (I know, it's not ideal), and then my script does
wd <- getwd()
...
source("mySubDir/myOtherScript.R", chdir = TRUE); setwd(wd)
...
source("anotherSubDir/anotherScript.R", chdir = TRUE); setwd(wd)
In this way one can maintain a stack of working directories. I would love to see this implemented in the language itself.
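The stack idea can be wrapped in a small helper (a sketch; this is essentially what withr::with_dir() provides):
with_dir <- function(dir, code) {
  old <- setwd(dir)    # setwd() invisibly returns the previous directory
  on.exit(setwd(old))  # restore it even if `code` throws an error
  force(code)          # evaluate the code lazily, inside `dir`
}
with_dir("mySubDir", source("myOtherScript.R"))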
This answer works for source and also inside nvim-R - I have no idea if it works with knitr and similar things. Any feedback appreciated.
If you have multiple scripts source-ing each other, it is important to get the correct one. That is, the largest i for which sys.frame(i)$ofile exists.
get.full.path.to.this.sourced.script = function() {
  for (i in sys.nframe():1) {     # go through all the call frames,
                                  # in *reverse* order
    x = sys.frame(i)$ofile
    if (!is.null(x))              # if $ofile exists,
      return(normalizePath(x))    # then return the full absolute path
  }
}
