How to put an R script into a dockerfile? - r

I am trying to add my R script into a dockerfile. The beginning of the file (loading base image, installing necessary packages) works fine when I run it in my terminal, but I keep getting this error when it gets to the line that contains the R script I want to run:
Step 15/17 : COPY /Users/emma/Documents/folder1/examples/question-1/model-1.r .
COPY failed: stat /var/lib/docker/tmp/docker-builder376572603/Users/emma/Documents/folder1/examples/question-1/model-1.r: no such file or directory
I am already running the terminal out of the "question-1" directory (my shell command looks like this) :
Emmas-MacBook-Air-2:question-1 emma$
and the R script file "model-1.r" is in that folder. What am I doing wrong in detailing the path to the R script? Do I have to somehow transform the script before adding it to the dockerfile?
Thank you

I believe, that you have to specify relative (to your build folder) path to copy from. Source:
Multiple resources may be specified but the paths of files and directories will be interpreted as relative to the source of the context of the build.
And file to copy should be inside context of the build. So if your Dockerfile is located in folder A, then the objects you would like to copy should be placed in the folder A or it's subfolders.

Related

How to run R projects / use their relative paths from the terminal without setwd() resp. cd

I'm kinda lost on that one:
I have set up an R project, let's call it "Test Project.Rproj". The beauty of R projects is the possibility to use relative paths (relative to the .Rproj file). My project consists of a "main.R" script, which is saved on the same level as the .Rproj file.
Additionally I have a directory called 'Output', where I want my plots and exported data to be saved. My "main.R" file looks like the following:
my_df <- data.frame(A = 1:10, B = 11:20)
my_df |>
writexl::write_xlsx(here::here("Output",
paste0("my_df_",
stringr::str_replace_all(as.character(Sys.time()), ":", ""),
".xlsx")))
My final goal is to automate the execution of the 'main.R' file using the Windows Task Scheduler. But in order to do so, I have to be able to run the script from the terminal. The problem here is the working directory. When opening an R project, all the paths are relative to .Rproj file. But in the terminal the current working directory is <C:\Users\my_name>. Of course I could manually set the working directory via cd "path\to\my\project. But I would like to avoid that.
My current call for the execution of the main.R file in the terminal is the following:
"C:\Program Files\R\R-4.1.0\bin\Rscript" -e "source('C:/Users/my_name/path/to/my/project/main.R')"
My two ideas for a solution are the following, but I am happy for other suggestions as well.
In order to replicate the usual use of a project: Is there a way to execute the .Rproj
file from the terminal? In order to create a similar environment as in RStudio, where all the relative paths are working, when executing scripts from the project afterwards?
There are two packages adressing the problem of relative paths: rprojroot and here, where the former is the basis for the latter. I am pretty sure that here does not provide the needed functionality. I tried adding here::i_am("main.R) to my main.R file, but the project root directory still is not found when executing in the terminal from a working directory outside the project.
For rprojroot to work, I think it is also necessary to have your current working directory somewhere within the project. But this package offers a lot of functionality, so I am not sure wheter I am overlooking something.
So I would be happy about any help. Maybe it is impossible and I have to change the working directory manually - then I would be glad to know that as well.
Some links I used in my research:
https://www.tidyverse.org/blog/2017/12/workflow-vs-script/
https://malco.io/2018/11/05/why-should-i-use-the-here-package-when-i-m-already-using-projects/
http://jenrichmond.rbind.io/post/how-to-use-the-here-package/
Thanks a lot!
Edit: My current implementation is an additional R script, where I manually set the working directory via setwd() and source the main.R file. However it is always suggested to avoid setwd, which is why this whole question exists.

RStudio: Running .Rprofile in source file location

I have set up a workflow governed by a Makefile.
Under code/ I have multiple *.r scripts each typically responsible for creating one output file (typically an RData file but could also be csv exports or png images, any file in principle)
code/.Rprofile contains some helper functions to bootstrap the whole project directory system and sources some helper functions etc.
The scripts in code/ need this functionality to work properly.
RStudio has the convenient menu entry to set working directory to source file location.
But could I also make it run .Rprofile in that directory if found? Or really just start R a fresh from the directory of the source file?

Unable to set working directory using here package in R to another location

I have a series of pieces of R code which have been designed to be run on other computers. That is, all code is relative to a root directory, which contains a Rstudio project file, .Rproj. There are no absolute file paths. This works fine when I actually open Rstudio, load the .Rproj file and then run the code.
However some of my code takes hours to run, and I need to set multiple scripts to run one after the other. This means creating a .sh file, and running the R script in turn from the command line. However non of my programs run successfully from the command line, as the root directory is no longer set to that of the .Rproj file. I have read about the here package can be used, which will automatically set the root directory to where ever a .here file is located. This is not the case for me.
The working directory it automatically uses is the home directory I have on the computational cluster I am using. The area where all my files, including the .Rproj and .here files is located in a different directory in which I have a lot more space allocated. Both are accessible from a common parent directory, so I assumed there here() function would be able to locate the directory I want to actually use to run my work. But this is not the case.
Effectively, I would like to set the root directory to a location which is not the default root directory on the system I am using. I have put a .here file there, but this is not located by there here() function, which I believe is its primary objective. Any ideas on how to proceed?
EDIT: I am working on a UNIX system. R version 3.4.2.
My problem was similar, but not exactly the same as yours. Perhaps my solution will work for you. When I opened an RStudio project, I found that if I called "library(here)", the root directory is set where the .Rproj file is located and that "set_here" would not change that directory, despite the 'here' package documentation. Perhaps I was doing something wrong, but I decided to solve the problem with a simple R function that moves up the directory tree until it finds a ".here" file. It then loads the "here" package and that sets the root directory where I want it.
I use "touch .here" in a Terminal outside of R to set my root directory, and then call "init_here()" from my newly opened R project:
init_here <- function() {
`%!in%` = Negate(`%in%`)
files <- dir( all.files = T )
while ( ".here" %!in% files & getwd()!="/" ) {
setwd("..")
files <- dir( all.files = T )
}
library(here)
}
Use Case -
In Unix:
cd( '~/myRoot' )
touch( '.here' )
In RStudio, when I open a project, the calls look like:
R version 4.0.2 (2020-06-22) -- "Taking Off Again"
< R information removed for clarity >
[Workspace loaded from ~/myRoot/myProject/.RData]
> getwd()
[1] "/Users/me/myRoot/myProject"
> init_here()
here() starts at /Users/me/myRoot
> here()
[1] "/Users/me/myRoot"
>
I can now put a ".here" file at the root of each of my RStudio projects and set the expected root directory independently for each project. If you want to get fancy, you could put the function in each project's .Rprofile so that it runs whenever the project is opened. All of my projects have the .Rproj file in the directory above my "R" directory, so my .Rprofile looks like:
source("./R/init_here.R")
init_here()
Hope that helps.
Did you try simply adding a cd /the/path/where/you/put/the/files command in your shell script?
According to this documentation, here() "uses a reasonable heuristics to find your project's files, based on the current working directory at the time when the package is loaded". The "cd" (change directory) command in a shell script changes the current working directory.

Changing R options persistently

I want to change the prompt in R from > to R> and I know I should use the options command options(prompt="...") and it works, but then when I restart R the prompt is back to >.
Is there anyway to save the change so that it sticks?
Use .Rprofile file:
You can customize the R environment through a site initialization file or a directory initialization file. R will always source the Rprofile.site file first. On Windows, the file is in the C:\Program Files\R\R-n.n.n\etc directory. You can also place a .Rprofile file in any directory that you are going to run R from or in the user home directory.
At startup, R will source the Rprofile.site file. It will then look for a .Rprofile file to source in the current working directory. If it doesn't find it, it will look for one in the user's home directory. There are two special functions you can place in these files. .First( ) will be run at the start of the R session and .Last( ) will be run at the end of the session.
More details are here

How to change .Rprofile location in RStudio

I am working with a "factory fresh" version of RStudio on Windows 7. R is installed under C:/Program Files which means the default libraries are stored here, and the two locations contained in .libPaths() on startup are both within this folder.
I want to work with another R library (igraph). Since the C:\Program Files folder is write-protected, I have set up another area to work in: C:\Users\nick\R and installed the igraph library in C:\Users\nick\R\library. I can manually add this location to the .libPaths() variable and use the library with no problems.
However, my problem is getting RStudio to automatically add this location to the .libPaths() variable on startup. I read that I could add the relevant command to my .Rprofile file - but I couldn't find any such file (presumably they are not automatically created when RStudio is installed). I then created a file called .Rprofile containing only this command. This only seemed to work when the .Rprofile file was saved in C:\Users\nick\Documents (which is the path stored in both the R_USER and HOME environmental variables). What I would like is to have the .Rprofile file stored in C:\Users\nick\R.
I have read all the information in ?Startup and it talks about where to store commands that run on startup. But I just can't make this work. For example there seems to be no way to change the location of the home directory without reading a file stored in the home directory. I don't seem to have any .Renviron files and creating these myself doesn't seem to work either.
I would really appreciate an answer in simple terms that explains how I could go about changing where the .Rprofile file is read from.
In Windows, you set the R_USER profile by opening up a command line and running:
SETX R_PROFILE_USER "C:/.../.Rprofile"
Where (obviously) the path is the path to your desired .Rpofile. In R, you can check that it worked:
Sys.getenv("R_PROFILE_USER")
Should return the path you specified. Note that you likely need to have all R sessions closed before setting the R_USER variable.

Resources