Our team uses R scripts in git repos that are shared between several people, across both Mac and Windows (and occasionally Linux) machines. This tends to lead to a bunch of really annoying lines at the top of scripts that look like this:
#path <- 'C:/data-work/project-a/data'
#path <- 'D:/my-stuff/project-a/data'
path = "~/projects/project-a/data"
#path = 'N:/work-projects/project-a/data'
#path <- "/work/project-a/data"
setwd(path)
To run the script, we have to comment/uncomment the correct path variable or the scripts won't run. This is annoying, untidy, and tends to be a bit of a mess in the commit history too.
In past I've got round this by using shell scripts to set directories relative to the script's location and skipping setwd entirely (and then using ./run-scripts.sh instead of Rscript process.R), but as we've got Windows users here, that won't work. Is there a better way to simplify these messy setwd() boilerplates in R?
(side note: in Python, I solve this by using the path library to get the location of the script file itself, and then build relative paths from that. But R doesn't seem to have a way to get the location of the running script's file?)
The answer is to not use setwd() at all, ever. R does things a bit different than Python, for sure, but this is one thing they have in common.
Instead, any scripts you're executing should assume they're being run from a common, top-level, root folder. When you launch a new R process, its working directory (i.e., what getwd() gives) is set to the same folder as the process was spawned from.
As an example, if you had this layout:
.
├── data
│ └── mydata.csv
└── scripts
└── analysis.R
You would run analysis.R from . and analysis.R would reference data/mydata.csv as "data/mydata.csv" (e.g., read.csv("data/mydata.csv, stringsAsFactors = FALSE)).
I would keep your shell scripts or Makefiles that run your R scripts and have the R scripts assume they're being run from the top level of the git repo.
This might look like:
cd . # Whereever `.` above is
Rscript scripts/analysis.R
Further reading:
https://www.tidyverse.org/articles/2017/12/workflow-vs-script/
https://github.com/jennybc/here_here
1) If you are looking for a way to find the path of the currently running script then see:
Rscript: Determine path of the executing script
2) Another approach is to require that users put an option of a prearranged name in their .Rprofile file. Then the script can setwd to that. An attractive aspect of this system is that over time one can forget where various projects are located and with this system one can just look at the .Rprofile file to remind oneself. For example, for projectA each person running the project would put this in their .Rprofile
options(projectA = "...whatever...")
and then the script would start off with:
proj <- getOption("projectA")
if (!is.null(proj)) setwd(proj) else stop("Set option 'projectA' to its directory")
One variation of this is to assume the current directory if projectA is not defined. Although this may seem to be more flexible I personally find the documenting feature of the above code to be a big advantage.
proj <- getOption("projectA")
if (!is.null(proj)) setwd(proj) else cat("Using", getwd(), "\n")
in Python, I solve this by using the path library to get the location of the script file itself, and then build relative paths from that. But R doesn't seem to have a way to get the location of the running script's file?
R itself unfortunately doesn’t have a way for this. But you can achieve the same result in either of two ways:
Use packages instead of scripts where you include code via source. Then you can use the solution outlined in amoeba’s answer. This works because the real issue is that R has no way of telling the source function where to look for scripts.
Use box::use instead of source. The ‘box’ package provides a module system that allows relative imports of code modules. A nice side-effect of this is that the package provides a function that tells you the path of the current script, just like in Python (and, just like in Python, you normally don’t need to use this function directly).
Related
I'm kinda lost on that one:
I have set up an R project, let's call it "Test Project.Rproj". The beauty of R projects is the possibility to use relative paths (relative to the .Rproj file). My project consists of a "main.R" script, which is saved on the same level as the .Rproj file.
Additionally I have a directory called 'Output', where I want my plots and exported data to be saved. My "main.R" file looks like the following:
my_df <- data.frame(A = 1:10, B = 11:20)
my_df |>
writexl::write_xlsx(here::here("Output",
paste0("my_df_",
stringr::str_replace_all(as.character(Sys.time()), ":", ""),
".xlsx")))
My final goal is to automate the execution of the 'main.R' file using the Windows Task Scheduler. But in order to do so, I have to be able to run the script from the terminal. The problem here is the working directory. When opening an R project, all the paths are relative to .Rproj file. But in the terminal the current working directory is <C:\Users\my_name>. Of course I could manually set the working directory via cd "path\to\my\project. But I would like to avoid that.
My current call for the execution of the main.R file in the terminal is the following:
"C:\Program Files\R\R-4.1.0\bin\Rscript" -e "source('C:/Users/my_name/path/to/my/project/main.R')"
My two ideas for a solution are the following, but I am happy for other suggestions as well.
In order to replicate the usual use of a project: Is there a way to execute the .Rproj
file from the terminal? In order to create a similar environment as in RStudio, where all the relative paths are working, when executing scripts from the project afterwards?
There are two packages adressing the problem of relative paths: rprojroot and here, where the former is the basis for the latter. I am pretty sure that here does not provide the needed functionality. I tried adding here::i_am("main.R) to my main.R file, but the project root directory still is not found when executing in the terminal from a working directory outside the project.
For rprojroot to work, I think it is also necessary to have your current working directory somewhere within the project. But this package offers a lot of functionality, so I am not sure wheter I am overlooking something.
So I would be happy about any help. Maybe it is impossible and I have to change the working directory manually - then I would be glad to know that as well.
Some links I used in my research:
https://www.tidyverse.org/blog/2017/12/workflow-vs-script/
https://malco.io/2018/11/05/why-should-i-use-the-here-package-when-i-m-already-using-projects/
http://jenrichmond.rbind.io/post/how-to-use-the-here-package/
Thanks a lot!
Edit: My current implementation is an additional R script, where I manually set the working directory via setwd() and source the main.R file. However it is always suggested to avoid setwd, which is why this whole question exists.
As a brand-new julia programmer, I am striving to tidy up my $HOME dir in order to avoid bloating it up with a tons of dotfiles, and at same time, define sane paths to julia using XDG Base Directory Specification. The steps defined in the Official instruction will generate the dir ~/.julia, which is rather unpleasant... So I am trying a way out to this problem following sane UNIX path definitions.
After reading the julia's environment variables and some discussions about it, I hit the bullet and tried to organize it in the following way (probably not the most optimized, so help me):
Firstly, I decided to put the extracted julia/ directory inside $XDG_DATA_HOME as it is user-specific data files, right? So a ran
mv julia-1.7.2 $XDG_DATA_HOME
The julia's documentation ask me to define the path to julia's bin/ dir as an env variable $JULIA_BINDIR. To do so, I defined in ~/.profile:
# The absolute path of the directory containing the Julia executable -> ~/.local/share/julia-1.7.2/bin
export JULIA_BINDIR="$XDG_DATA_HOME/julia-1.7.2/bin"
The documentation also ask me to put the julia bin in the $PATH. Hence, I symlink
ln -s $XDG_DATA_HOME/julia-1.7.2/bin/julia ~/.local/bin/julia
because User-specific executable files shall be stored in $HOME/.local/bin. This resolve the problem of putting the julia into the $PATH as long as ~/.local/bin is in it.
I would also like to put the data and config file of julia into $XDG_DATA_HOME and $XDG_CONFIG_HOME, respectively, which matches with the XDG Base Directory Specifications. So I added in my .profile the following env. variables:
# relative julia's data directory -> $JULIA_BINDIR/$DATAROOTDIR/julia/base -> ~/.local/share/julia/base
export DATAROOTDIR="../.."
# relative julia's config files -> $JULIA_BINDIR/$SYSCONFDIR/julia/startup.jl -> ~/.config/julia/startup.jl
export SYSCONFDIR="../../../../.config"
Both $DATAROOTDIR and $SYSCONFDIR are relative paths to the data directory and configuration file directory, respectively. With these environment variables, I am pointing the data dir at $XDG_DATA_HOME/julia and the config dir at $XDG_CONFIG_HOME/julia. To have such directories in theses paths, I symlinked again
ln -s $XDG_DATA_HOME/julia-1.7.2/share/julia $XDG_DATA_HOME/julia
ln -s $XDG_DATA_HOME/julia-1.7.2/etc/julia $XDG_CONFIG_HOME/julia
Moreover, I put the Julia history into $XDG_STATE_HOME as that path should be used to store "logs, history, recently used files", but I barely see UNIX users using this dir. I just added to .profile the following line
export JULIA_HISTORY="$XDG_STATE_HOME/julia"
At moment, when I open up julia on terminal in REPL mode, my computer does not generate ~/.julia/ (thanks God), but I bet my way have flaws, so any contribution is welcome.
Thanks in advance.
I am working on a project where I was hoping we'd be able to set file paths using the .Rprofile. I wrote a script that defines paths for specific users and since we're using renv and that puts an activate script into the profile anyways, I just added my paths.R script after that in the .Rprofile.
When I asked a colleague to open the .Rproj file to open the project, I thought it would run renv/activate.R and my paths.R script, but they're not seeing their defined paths, only mine. Am I missing something?
# contents of my .Rprofile
source("renv/activate.R")
source("paths.R")
# contents of path.R
if (Sys.info()["user"] == "francisco"){
data_dir <-
'path/to/data'
}
# etc
Any advice on workflow is welcome. Thanks!
I have a "copy deployed" installation of R with no R-specific environment variables set.
I want to source a file from within Rprofile.site:
source("how/to/find/this/file/my_settings.R")
Where could I place my my_settings.R file so that it can be found from within Rprofile.site no matter in which path Rprofile.site is installed?
BTW: I want to avoid an absolute path to my_settings.R this would work indeed. I'd prefer to use the same folder as Rprofile.site or a path relative to it to support copy deployment of R.
Edit 1: The problem is that getwd is always different depending on the current folder from which you start R
If you can't use absolute paths and that your working directory is not stable, one way is to use .libPaths or .Library.
By default your Rprofile should be in directory paste0(.Library,"/../etc/") or paste0(.libPaths()[2],"/etc/") so you can put your file there and source it with :
source(paste0(.Library,"/../etc/my_settings.R"))
source(paste0(.libPaths()[2],"/etc/my_settings.R"))
As far as I understand the first option is stable (I don't think one can change the value of .Library).
If you use the second option just make sure that if in the future you alter your .libPaths() you do it after sourcing your file.
See ?.libPaths for more info on default folders.
I am trying to get my head around programming with multiple modules (in different files). I don't want to load explicitly the files with ìnclude in the right order.
I am using the Atom IDE as my development platform, so I don't run julia explicitly.
when I am just using importall Datastructures (where ModuleName is the name of the module) julia complains:
LoadError: ArgumentError: Module Datastructures not found in current path.
Run `Pkg.add("Datastructures")` to install the Datastructures package.
while loading F:\dev\ai\Interpreter.jl, in expression starting on line 8
There are two ways to build a package or module in julia:
1) Use the tools in PkgDev. You can get them with Pkg.add("PkgDev") ; using PkgDev. Now you can use PkgDev.generate("MyPackageName", "MIT") (or whatever license you prefer) to build your package folder. By default, julia will build this folder in the same directory as all your other external packages. On Linux, this would ~/.julia/v0.6/ (or whatever version you are running). Also by default, this folder will be on the julia path, so you can just type using MyPackageName at the REPL to load it.
Note that julia essentially loads the package by looking for the file ~/.julia/v0.6/MyPackageName/src/MyPackageName.jl and then running it. If your module consists of multiple files, you should have all of them in the ~/.julia/v0.6/MyPackageName/src/ directory, and then have a line of code in the MyPackageName.jl file that says include("MyOtherFileOfCode.jl").
2) If you don't want to keep your package in ~/.julia/v0.6/ for some reason, or you don't want to build your package using PkgDev.generate(), you can of course just set the files up yourself.
Let's assume you want MyPackageName to be stored in the ~/MyCode directory. First, create the directory ~/MyCode/MyPackageName/. Within this directory, I strongly recommend using the same structure that julia and github use, i.e. store all your code in a directory called ~/MyCode/MyPackageName/src/.
At a minimum, you will need a file in this directory called ~/MyCode/MyPackageName/src/MyPackageName.jl (just like in the method above). This file should begin with module MyPackageName and finish with end. Then, put whatever you want in-between (including include calls to other files in the src directory if you wish).
The final step is to make sure that julia can find MyPackageName. To do this, you will need ~/MyCode to be on the julia path. To do this, use: push!(LOAD_PATH, "~/MyCode") or push!(LOAD_PATH, "~/MyCode/MyPackageName").
Maybe you don't want to have to run this command every time you want to access MyPackageName. No problem, you just need to add this line to your .juliarc.jl file, which is automatically run every time you start julia. On Linux, your .juliarc.jl file should be in your home directory, i.e. ~/.juliarc.jl. If it isn't there, you can just create it and put whatever code you want in there. If you're on a different OS, you'll have to google where to put your .juliarc.jl.
This answer turned out longer than I planned...