Add script defining paths to .Rprofile - r

I am working on a project where I was hoping we'd be able to set file paths using the .Rprofile. I wrote a script that defines paths for specific users and since we're using renv and that puts an activate script into the profile anyways, I just added my paths.R script after that in the .Rprofile.
When I asked a colleague to open the .Rproj file to open the project, I thought it would run renv/activate.R and my paths.R script, but they're not seeing their defined paths, only mine. Am I missing something?
# contents of my .Rprofile
source("renv/activate.R")
source("paths.R")
# contents of paths.R
if (Sys.info()["user"] == "francisco") {
  data_dir <- 'path/to/data'
}
# etc
Any advice on workflow is welcome. Thanks!
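One way to make a missing user entry visible is to keep the per-user paths in a lookup table and fail loudly when the current user is not in it. This is only a sketch: the helper name `resolve_data_dir`, the table `user_paths`, and the second user and path are hypothetical and not part of the original paths.R.

```r
# Hypothetical lookup table: user name -> data directory.
user_paths <- c(
  francisco = "path/to/data",
  colleague = "other/path/to/data"  # hypothetical second user
)

# Return the data directory for `user`, or stop with a clear message
# if that user has no entry in the table.
resolve_data_dir <- function(user = Sys.info()[["user"]], paths = user_paths) {
  if (!user %in% names(paths)) {
    stop("No data_dir defined for user '", user, "'; add it to paths.R")
  }
  unname(paths[user])
}

# In paths.R one would then write:
# data_dir <- resolve_data_dir()
```

With this layout, a colleague whose user name does not match any entry gets an error at startup instead of silently ending up without a data_dir.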

Related

How to run R projects / use their relative paths from the terminal without setwd() resp. cd

I'm kinda lost on that one:
I have set up an R project, let's call it "Test Project.Rproj". The beauty of R projects is the possibility to use relative paths (relative to the .Rproj file). My project consists of a "main.R" script, which is saved on the same level as the .Rproj file.
Additionally I have a directory called 'Output', where I want my plots and exported data to be saved. My "main.R" file looks like the following:
my_df <- data.frame(A = 1:10, B = 11:20)
my_df |>
  writexl::write_xlsx(here::here("Output",
                                 paste0("my_df_",
                                        stringr::str_replace_all(as.character(Sys.time()), ":", ""),
                                        ".xlsx")))
My final goal is to automate the execution of the 'main.R' file using the Windows Task Scheduler. But in order to do so, I have to be able to run the script from the terminal. The problem here is the working directory. When opening an R project, all the paths are relative to the .Rproj file. But in the terminal the current working directory is <C:\Users\my_name>. Of course I could manually set the working directory via cd "path\to\my\project". But I would like to avoid that.
My current call for the execution of the main.R file in the terminal is the following:
"C:\Program Files\R\R-4.1.0\bin\Rscript" -e "source('C:/Users/my_name/path/to/my/project/main.R')"
My two ideas for a solution are the following, but I am happy for other suggestions as well.
In order to replicate the usual use of a project: Is there a way to execute the .Rproj file from the terminal, so as to create an environment similar to RStudio's, where all the relative paths work when executing scripts from the project afterwards?
There are two packages addressing the problem of relative paths: rprojroot and here, where the former is the basis for the latter. I am pretty sure that here does not provide the needed functionality. I tried adding here::i_am("main.R") to my main.R file, but the project root directory is still not found when executing in the terminal from a working directory outside the project.
For rprojroot to work, I think it is also necessary to have your current working directory somewhere within the project. But this package offers a lot of functionality, so I am not sure whether I am overlooking something.
So I would be happy about any help. Maybe it is impossible and I have to change the working directory manually - then I would be glad to know that as well.
Some links I used in my research:
https://www.tidyverse.org/blog/2017/12/workflow-vs-script/
https://malco.io/2018/11/05/why-should-i-use-the-here-package-when-i-m-already-using-projects/
http://jenrichmond.rbind.io/post/how-to-use-the-here-package/
Thanks a lot!
Edit: My current implementation is an additional R script, where I manually set the working directory via setwd() and source the main.R file. However it is always suggested to avoid setwd, which is why this whole question exists.

Unable to set working directory using here package in R to another location

I have a series of pieces of R code which have been designed to be run on other computers. That is, all code is relative to a root directory, which contains an RStudio project file, .Rproj. There are no absolute file paths. This works fine when I actually open RStudio, load the .Rproj file and then run the code.
However some of my code takes hours to run, and I need to set multiple scripts to run one after the other. This means creating a .sh file, and running each R script in turn from the command line. However none of my programs run successfully from the command line, as the root directory is no longer set to that of the .Rproj file. I have read that the here package can be used, which will automatically set the root directory to wherever a .here file is located. This is not the case for me.
The working directory it automatically uses is the home directory I have on the computational cluster I am using. The area where all my files, including the .Rproj and .here files, are located is a different directory in which I have a lot more space allocated. Both are accessible from a common parent directory, so I assumed the here() function would be able to locate the directory I want to actually use to run my work. But this is not the case.
Effectively, I would like to set the root directory to a location which is not the default root directory on the system I am using. I have put a .here file there, but this is not located by the here() function, which I believe is its primary objective. Any ideas on how to proceed?
EDIT: I am working on a UNIX system. R version 3.4.2.
My problem was similar, but not exactly the same as yours. Perhaps my solution will work for you. When I opened an RStudio project, I found that if I called "library(here)", the root directory is set where the .Rproj file is located and that "set_here" would not change that directory, despite the 'here' package documentation. Perhaps I was doing something wrong, but I decided to solve the problem with a simple R function that moves up the directory tree until it finds a ".here" file. It then loads the "here" package and that sets the root directory where I want it.
I use "touch .here" in a Terminal outside of R to set my root directory, and then call "init_here()" from my newly opened R project:
init_here <- function() {
  `%!in%` <- Negate(`%in%`)
  files <- dir(all.files = TRUE)
  # Walk up the directory tree until a ".here" file is found or "/" is reached
  while (".here" %!in% files && getwd() != "/") {
    setwd("..")
    files <- dir(all.files = TRUE)
  }
  library(here)
}
Use case:
In Unix:
cd ~/myRoot
touch .here
In RStudio, when I open a project, the calls look like:
R version 4.0.2 (2020-06-22) -- "Taking Off Again"
< R information removed for clarity >
[Workspace loaded from ~/myRoot/myProject/.RData]
> getwd()
[1] "/Users/me/myRoot/myProject"
> init_here()
here() starts at /Users/me/myRoot
> here()
[1] "/Users/me/myRoot"
>
I can now put a ".here" file at the root of each of my RStudio projects and set the expected root directory independently for each project. If you want to get fancy, you could put the function in each project's .Rprofile so that it runs whenever the project is opened. All of my projects have the .Rproj file in the directory above my "R" directory, so my .Rprofile looks like:
source("./R/init_here.R")
init_here()
Hope that helps.
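The same upward search can also be written without calling setwd(), so that the working directory is left untouched and the root is simply returned. This is a minimal sketch in base R; the function name `find_here_root` is hypothetical and not part of the here package.

```r
# Hypothetical helper: walk up from `start` until a directory containing a
# ".here" file is found. Returns that directory, or NULL if the filesystem
# root is reached without finding one.
find_here_root <- function(start = getwd()) {
  dir <- normalizePath(start, winslash = "/", mustWork = TRUE)
  repeat {
    if (file.exists(file.path(dir, ".here"))) return(dir)
    parent <- dirname(dir)
    if (identical(parent, dir)) return(NULL)  # reached the filesystem root
    dir <- parent
  }
}

# Example use: build paths from the detected root instead of changing directory.
# root <- find_here_root()
# if (!is.null(root)) file.path(root, "data", "input.csv")
```

Because nothing is changed globally, the function can be called from a script started in any subdirectory of the project.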
Did you try simply adding a cd /the/path/where/you/put/the/files command in your shell script?
According to this documentation, here() "uses a reasonable heuristics to find your project's files, based on the current working directory at the time when the package is loaded". The "cd" (change directory) command in a shell script changes the current working directory.

How do I use setwd in a relative way?

Our team uses R scripts in git repos that are shared between several people, across both Mac and Windows (and occasionally Linux) machines. This tends to lead to a bunch of really annoying lines at the top of scripts that look like this:
#path <- 'C:/data-work/project-a/data'
#path <- 'D:/my-stuff/project-a/data'
path = "~/projects/project-a/data"
#path = 'N:/work-projects/project-a/data'
#path <- "/work/project-a/data"
setwd(path)
To run the script, we have to comment/uncomment the correct path variable or the scripts won't run. This is annoying, untidy, and tends to be a bit of a mess in the commit history too.
In the past I've got round this by using shell scripts to set directories relative to the script's location and skipping setwd entirely (and then using ./run-scripts.sh instead of Rscript process.R), but as we've got Windows users here, that won't work. Is there a better way to simplify these messy setwd() boilerplates in R?
(side note: in Python, I solve this by using the path library to get the location of the script file itself, and then build relative paths from that. But R doesn't seem to have a way to get the location of the running script's file?)
The answer is to not use setwd() at all, ever. R does things a bit differently than Python, for sure, but this is one thing they have in common.
Instead, any scripts you're executing should assume they're being run from a common, top-level, root folder. When you launch a new R process, its working directory (i.e., what getwd() gives) is set to the same folder as the process was spawned from.
As an example, if you had this layout:
.
├── data
│   └── mydata.csv
└── scripts
    └── analysis.R
You would run analysis.R from . and analysis.R would reference data/mydata.csv as "data/mydata.csv" (e.g., read.csv("data/mydata.csv", stringsAsFactors = FALSE)).
I would keep your shell scripts or Makefiles that run your R scripts and have the R scripts assume they're being run from the top level of the git repo.
This might look like:
cd . # Wherever `.` above is
Rscript scripts/analysis.R
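Since this convention depends on every script being launched from the top level, a script can check that assumption before doing any work. The following is only a sketch: the helper name `assert_project_root` is hypothetical, and the expected entries are taken from the example layout above.

```r
# Hypothetical guard: stop unless `dir` contains every expected entry.
assert_project_root <- function(expected = c("data", "scripts"), dir = getwd()) {
  missing <- expected[!file.exists(file.path(dir, expected))]
  if (length(missing) > 0) {
    stop("Run this script from the project root; not found in ", dir, ": ",
         paste(missing, collapse = ", "))
  }
  invisible(dir)
}

# At the top of scripts/analysis.R one would call:
# assert_project_root()
```

A failed check then produces one clear message instead of a later "cannot open file" error from read.csv().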
Further reading:
https://www.tidyverse.org/articles/2017/12/workflow-vs-script/
https://github.com/jennybc/here_here
1) If you are looking for a way to find the path of the currently running script then see:
Rscript: Determine path of the executing script
2) Another approach is to require that users put an option of a prearranged name in their .Rprofile file. Then the script can setwd to that. An attractive aspect of this system is that over time one can forget where various projects are located and with this system one can just look at the .Rprofile file to remind oneself. For example, for projectA each person running the project would put this in their .Rprofile
options(projectA = "...whatever...")
and then the script would start off with:
proj <- getOption("projectA")
if (!is.null(proj)) setwd(proj) else stop("Set option 'projectA' to its directory")
One variation of this is to assume the current directory if projectA is not defined. Although this may seem to be more flexible I personally find the documenting feature of the above code to be a big advantage.
proj <- getOption("projectA")
if (!is.null(proj)) setwd(proj) else cat("Using", getwd(), "\n")
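For point 1, a common approach relies on the fact that when a script is started as Rscript path/to/script.R, the full command line returned by commandArgs(trailingOnly = FALSE) contains a --file= argument. The sketch below assumes that launch method; it returns NULL for other cases such as Rscript -e or an interactive session, and the helper name `script_path_from_args` is hypothetical.

```r
# Hypothetical helper: extract the script path from a vector of command-line
# arguments such as the one returned by commandArgs(trailingOnly = FALSE).
script_path_from_args <- function(args = commandArgs(trailingOnly = FALSE)) {
  file_arg <- grep("^--file=", args, value = TRUE)
  if (length(file_arg) == 0) return(NULL)  # not started as `Rscript file.R`
  sub("^--file=", "", file_arg[1])
}

script <- script_path_from_args()
if (!is.null(script)) {
  # Directory containing the running script, usable as a base for paths.
  script_dir <- dirname(normalizePath(script))
  message("Script directory: ", script_dir)
}
```

Paths containing spaces or other unusual characters may need extra handling, so treat this as a starting point rather than a complete solution.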
in Python, I solve this by using the path library to get the location of the script file itself, and then build relative paths from that. But R doesn't seem to have a way to get the location of the running script's file?
R itself unfortunately doesn’t have a way for this. But you can achieve the same result in either of two ways:
Use packages instead of scripts where you include code via source. Then you can use the solution outlined in amoeba’s answer. This works because the real issue is that R has no way of telling the source function where to look for scripts.
Use box::use instead of source. The ‘box’ package provides a module system that allows relative imports of code modules. A nice side-effect of this is that the package provides a function that tells you the path of the current script, just like in Python (and, just like in Python, you normally don’t need to use this function directly).

How to change .Rprofile location in RStudio

I am working with a "factory fresh" version of RStudio on Windows 7. R is installed under C:/Program Files which means the default libraries are stored here, and the two locations contained in .libPaths() on startup are both within this folder.
I want to work with another R library (igraph). Since the C:\Program Files folder is write-protected, I have set up another area to work in: C:\Users\nick\R and installed the igraph library in C:\Users\nick\R\library. I can manually add this location to the .libPaths() variable and use the library with no problems.
However, my problem is getting RStudio to automatically add this location to the .libPaths() variable on startup. I read that I could add the relevant command to my .Rprofile file - but I couldn't find any such file (presumably they are not automatically created when RStudio is installed). I then created a file called .Rprofile containing only this command. This only seemed to work when the .Rprofile file was saved in C:\Users\nick\Documents (which is the path stored in both the R_USER and HOME environmental variables). What I would like is to have the .Rprofile file stored in C:\Users\nick\R.
I have read all the information in ?Startup and it talks about where to store commands that run on startup. But I just can't make this work. For example there seems to be no way to change the location of the home directory without reading a file stored in the home directory. I don't seem to have any .Renviron files and creating these myself doesn't seem to work either.
I would really appreciate an answer in simple terms that explains how I could go about changing where the .Rprofile file is read from.
In Windows, you set the location of the user profile by opening up a command line and running:
SETX R_PROFILE_USER "C:/.../.Rprofile"
Where (obviously) the path is the path to your desired .Rprofile. In R, you can check that it worked:
Sys.getenv("R_PROFILE_USER")
This should return the path you specified. Note that you likely need to have all R sessions closed before setting the R_PROFILE_USER variable.
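Once the profile is read from the new location, the command the question wants to run at startup could be wrapped as follows. This is a sketch only: the helper name `prepend_lib_path` is hypothetical, and the library path in the usage comment is the one mentioned in the question.

```r
# Hypothetical helper for a .Rprofile: put `lib` first on the library
# search path, warning instead of failing if the directory is missing.
prepend_lib_path <- function(lib) {
  if (!dir.exists(lib)) {
    warning("Library directory does not exist: ", lib)
    return(invisible(.libPaths()))
  }
  .libPaths(c(lib, .libPaths()))
  invisible(.libPaths())
}

# In the .Rprofile itself (path taken from the question):
# prepend_lib_path("C:/Users/nick/R/library")
```

The explicit existence check is there because .libPaths() silently drops directories that do not exist, which can make a mistyped path hard to notice.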

Locate the ".Rprofile" file generating default options

In R and RStudio, I think I have messed around with the .Rprofile file a few times, and I am currently loading an old version of it upon startup of R or RStudio. Is there a way that I can quickly find the location of the file that is generating the default options?
Thanks
Like @Gsee suggested, ?Startup has all you need. Note that there isn't just the user profile file, but also a site profile file you could have messed with, and that both files can be found in multiple locations.
You could run the following to list existing files on your system among those listed on the page:
candidates <- c(Sys.getenv("R_PROFILE"),
                file.path(Sys.getenv("R_HOME"), "etc", "Rprofile.site"),
                Sys.getenv("R_PROFILE_USER"),
                file.path(getwd(), ".Rprofile"),
                file.path(Sys.getenv("HOME"), ".Rprofile"))
Filter(file.exists, candidates)
Note that it should be run in a fresh session, right after you started R, so that getwd() will return the current directory at startup. There is also the tricky possibility that your profile files modify the current directory at startup, in which case you would have to start a "no-profile" session (run R --no-site-file --no-init-file) before running the code above.
