Accessing global variables fails - R

I was trying to implement simple authentication in R that would store credentials outside of the source code under revision control. I'm aware of the approach using options() and getOption(), but using it would force me to remove the project-level .Rprofile from revision control. I prefer another approach: export the credentials as environment variables via the .bashrc of an R-associated Linux user (ruser), and then read them into global variables in the project-specific .Rprofile, like this:
CB_API_KEY <<- Sys.getenv("CB_API_KEY")
However, accessing such global variables in R modules fails with the message "object 'CB_API_KEY' not found". I suspect the reason is that I source .Rprofile via a separate call to R CMD BATCH in the Makefile, while the R modules that attempt to access these global variables are executed by the Makefile through a separate call to Rscript. It therefore appears that the global environment of the first R session is lost, hence the access failures. I would appreciate your comments and advice on this issue.
UPDATE: The following are the contents of the project-specific .Rprofile as well as the project's top-level and sub-project-level Makefiles, respectively.
.Rprofile:
# Execute global R profile first...
source("~/.Rprofile")
# ...then local project R setup
# Retrieve SRDA (SourceForge) credentials
SRDA_USER <<- Sys.getenv("SRDA_USER")
SRDA_PASS <<- Sys.getenv("SRDA_PASS")
# Retrieve CrunchBase API key
CB_API_KEY <<- Sys.getenv("CB_API_KEY")
# Another approach is to use options() and getOption(),
# but it requires removing this file from source control
options(SRDA_USER = "XXX", SRDA_PASS = "YYY", CB_API_KEY = "ZZZ")
Top-level Makefile:
# Major variable definitions
PROJECT="diss-floss"
HOME_DIR="~/diss-floss"
REPORT={$(PROJECT)-slides}
COLLECTION_DIR=import
PREPARATION_DIR=prepare
ANALYSIS_DIR=analysis
RESULTS_DIR=results
PRESENTATION_DIR=present
RSCRIPT=Rscript
# Targets and rules
all: rprofile collection preparation analysis results presentation
rprofile:
	R CMD BATCH ./.Rprofile
collection:
	cd $(COLLECTION_DIR) && $(MAKE)
preparation: collection
	cd $(PREPARATION_DIR) && $(MAKE)
analysis: preparation
	cd $(ANALYSIS_DIR) && $(MAKE)
results: analysis
	cd $(RESULTS_DIR) && $(MAKE)
presentation: results
	cd $(PRESENTATION_DIR) && $(MAKE)
## Phony targets and rules (for commands that do not produce files)
#.html
.PHONY: demo clean
# run demo presentation slides
demo: presentation
# knitr(Markdown) => HTML page
# HTML5 presentation via RStudio/RPubs or Slidify
# OR
# Shiny app
# remove intermediate files
clean:
	rm -f tmp*.bz2 *.Rdata
Sub-project-level Makefile:
# Major variable definitions
RSCRIPT=Rscript
#RSCRIPT=R CMD BATCH
R_OPTS=--no-save --no-restore --verbose
#R_OUT=> outputFile.Rout 2>&1
# --no-save --no-restore --verbose myRfile.R > outputFile.Rout 2>&1
# Targets and rules
collection: importFLOSSmole \
importSourceForge \
importAngelList \
importCrunchBase
importFLOSSmole: getFLOSSmoleDataXML.R
	$(RSCRIPT) $(R_OPTS) $<
importSourceForge: getSourceForgeData.R
	$(RSCRIPT) $(R_OPTS) $<
importAngelList: getAngelListData.R
	$(RSCRIPT) $(R_OPTS) $<
importCrunchBase: getCrunchBaseDataAPI.R
	$(RSCRIPT) $(R_OPTS) $<
.PHONY: clean
# remove intermediate files
clean:
	rm -f tmp*.bz2 *.Rdata .Rout
Directory structure is typical:
+ `ruser` home directory
|____+ project's home directory
      |____ `import` sub-directory
      |____ project's other sub-directories
Thank you!

If I've understood this correctly, you have a Makefile which calls R in batch mode (R CMD BATCH) to source your .Rprofile file, and then later runs another R session via Rscript, which can't find the variables that you define in your .Rprofile file.
You are correct in noticing that global variables aren't persisted between sessions.
However, unless you are calling Rscript with the --no-init-file or --vanilla arguments, the .Rprofile file should be sourced on startup.
Adding messages to your .Rprofile will let you know if and when it gets called.
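For instance, a couple of diagnostic lines such as these could be dropped into the project .Rprofile (a minimal sketch using only base R; the variable name follows the question):

message("Sourcing .Rprofile from: ", getwd())
message("CB_API_KEY is ", if (nzchar(Sys.getenv("CB_API_KEY"))) "set" else "NOT set")

If these messages do not appear when the Makefile runs Rscript, that session is not sourcing the profile you expect.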

OK, it looks like I figured out how to solve this problem. I created another .Rprofile in the project's sub-directory (import), where the processing requires authentication, and moved the code that reads the environment variables into R global variables there. I tested it and haven't seen the previous error message (knocking on wood!). An additional benefit is that the code is now better separated into functional areas and, thus, cleaner.
Lesson learned (and the reason for all this trouble):
An R session sources the .Rprofile file found in whatever [current] directory it is started in (I wrongly assumed that .Rprofile is sourced only once, by the initial R session, and that subsequent R sessions could not do that).
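For reference, a minimal sketch of the per-directory profile described above (placed in the import sub-directory and mirroring the project-level .Rprofile from the question):

# import/.Rprofile
SRDA_USER <<- Sys.getenv("SRDA_USER")
SRDA_PASS <<- Sys.getenv("SRDA_PASS")
CB_API_KEY <<- Sys.getenv("CB_API_KEY")

Because each Rscript launched from the import Makefile starts in that directory, this profile is sourced for every such session, so the variables no longer depend on state from an earlier session.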

Related

Rscript to use renv environment

How do I execute a command using Rscript myfile.R so that it uses the renv environment of the project/directory it's in, NOT my default environment?
There are a couple of ways:
Ensure your working directory is set to the root of your renv project, and that the renv project's auto-loader is active. (You can set up the auto-loader by calling renv::activate() from R in that project.)
In your script, explicitly call renv::load("/path/to/project") to load the requested project (see the sketch after this list).
If neither of these methods suffices, please file an issue at https://github.com/rstudio/renv/issues.
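A minimal sketch of the second option (the project path and the library call are placeholders):

# top of myscript.R
renv::load("/path/to/project")  # load the project so its renv library is used from here on
library(jsonlite)               # now resolved from the renv library, not the user library

With this at the top of the script, Rscript myscript.R behaves the same regardless of the directory it is invoked from.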
I recently had a similar problem, but the answer by @kevin-ushey was insufficient. Here's the background: I needed to be able to run Rscript from any directory, because I had several statistical models that were called from a Dockerfile, and forcing the Dockerfile to set WORKDIR many times is just too cumbersome when you have long files with several Rscript calls. Moreover, some of these models are called several times in different bash files, making it cumbersome to cd to the directory before every Rscript call. We needed something akin to conda activate, where any Rscript call would just use the activated 'renv environment' by default, regardless of what your working directory is. Here's a dummy example:
Install renv with install.packages('renv').
Create a dummy folder with a dummy script that loads the beepr library (just for the sake of the example) and initialize the renv environment:
mkdir ~/renv_test/
cd ~/renv_test/
echo "library(beepr); print('success')" >> test.R
Rscript -e "renv::init()"
Create a Docker image with the code below:
FROM rocker/r-base
ENV PROJ_ROOT='/usr/local/src/renv_test'
ENV RENV_DIR='/usr/local/.renv/'
COPY . $PROJ_ROOT
# Copy the project's renv infrastructure to RENV_DIR and remove all traces of renv from PROJ_ROOT
RUN mkdir -p $RENV_DIR/renv/ && \
cp $PROJ_ROOT/renv.lock $RENV_DIR && \
cp $PROJ_ROOT/renv/activate.R $RENV_DIR/renv/ && \
echo "source('renv/activate.R')" >> $RENV_DIR/.Rprofile && \
cd $RENV_DIR && \
Rscript -e "renv::restore()" && \
cd $PROJ_ROOT && Rscript -e "renv::deactivate()" && \
rm -rf renv/ renv.lock
# Set the library restored in RENV_DIR as the default library
RUN echo $(cd $RENV_DIR && Rscript -e "cat(paste0('R_LIBS=', renv::paths\$library()), sep = '\n')") >> $HOME/.Renviron
# Run any script from any directory as if you had 'renv activated'
CMD Rscript $PROJ_ROOT/test.R
Here's a summary of the approach:
Copy the project to the docker image
Copy the renv infrastructure to a separate folder (here /usr/local/.renv/) and restore the project there.
Eliminate all traces of renv from the project folder (this is so we don't mess up the path of the library if for some reason we execute a script from the root of this project).
Edit .Renviron so that it contains the library path restored under RENV_DIR as the default library. This ensures that any new R session will use that library as the first option.
Execute any R scripts located in the project folder without having to cd or WORKDIR (docker) to the project folder.
If you build and run the previous Docker image, you should get a success statement even though we never cd to the project folder:
docker build -t renv_test .
docker run renv_test
[1] "success"
I believe there is a simpler way than the above answers:
Rscript -e 'renv::run("/path/to/myscript.R")'
It will pick up the renv environment from the base path. You can also specify the environment using the project parameter.
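For example, naming the project explicitly (both paths are placeholders):

Rscript -e 'renv::run("/path/to/myscript.R", project = "/path/to/project")'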

How to save output when running job on cluster using SLURM

I want to run an R script using SLURM. I have created the R script "test.R", as shown:
print("Running the test script")
write.csv(head(mtcars), "mtcars_data_test.csv")
I created a bash script, "submit.sh", to run this R script:
#!/bin/bash
#sbatch --job-name=test.job
#sbatch --output=.out/abc.out
Rscript /home/abc/job_sub_test/test.R
And I submitted the job on the cluster
sbatch submit.sh
I am not sure where my output is saved. I looked in the home directory, but there was no output file.
Edit
I also set my working directory in test.R, but nothing changed.
setwd("/home/abc")
print("Running the test script")
write.csv(head(mtcars), "mtcars_data_test.csv")
When I run the script without SLURM (Rscript test.R), it works fine and saves the output to the path I set.
Slurm will set the job working directory to the directory which was the working directory when the sbatch command was issued.
Assuming the /home directory is mounted on all compute nodes, you can explicitly change the working directory with cd in the submission script, or with setwd() in the R code. But that should not be necessary.
Three possibilities:
either the job did not start at all because of a configuration or hardware issue; you can find that out with the sacct command, looking at the State column (see the example after this list); or
the file was indeed created, but on the compute node, on a filesystem that is not shared; in that case the best option is to SSH to the compute node (which you can also find out with sacct) and look for the file there; or
the script crashed and the file was not created at all; in that case you should look into the output file of the job (.out/abc.out). Beware that the .out directory must exist before the job starts, and that, as its name starts with a ., it is a hidden directory, shown by ls only with the -a argument.
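For example (the job ID is a placeholder):

sacct -j 123456 --format=JobID,State,ExitCode,NodeList

A state such as FAILED or NODE_FAIL points to the first or third case, while a COMPLETED job with no visible output file suggests the second.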
The --output argument to sbatch is relative to the folder you submitted the job from. setwd inside the R script wouldn't affect it, because Slurm has already parsed that argument and started piping output to the file by the time the R script is running.
First, if you want the output to go to /home/abc/.out/, make sure you're in your home directory when you submit the script, or specify the full path in the --output argument.
Second, the .out folder has to exist; I tested this and Slurm does not create it if it doesn't.
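Putting both answers together, a submission script along these lines avoids the pitfalls (paths follow the question, the logs directory name is an assumption, and note the uppercase #SBATCH spelling that sbatch expects for its directives):

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --output=/home/abc/job_sub_test/logs/%x_%j.out

cd /home/abc/job_sub_test
Rscript test.R

Create the log directory once with mkdir -p /home/abc/job_sub_test/logs before calling sbatch submit.sh, since Slurm will not create it; %x and %j expand to the job name and job ID.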

If condition inside the %Files section on a SPEC file

I'm kind of new to writing spec files and building RPMs. Currently I have one RPM that is supposed to deploy some files to one of two possible directories, which will vary with the OS.
How can I verify them within the %files section? I can't use a variable... I can't list both paths, because one of them is sure to fail... I tried to define a macro earlier, in the %install section, but it is defined just once and won't be re-evaluated on every RPM installation...
What can I do here?
Thanks
I had a similar situation where additional files were included in the RPM in case of a DEBUG build over and above all files in the RELEASE build.
The trick is to pass a file containing a list of files to %files, along with the regular list of files below it:
%install
# Create a temporary file containing the list of files
%define EXTRA_FILES %{_builddir}/ExtraFiles.list
touch %{EXTRA_FILES}
# If building in DEBUG mode, then include additional test binaries in the package
%if %{build_mode} == "DEBUG"
# %{build_mode} is a variable that is passed to the spec file when invoked by the build script
# Like: rpmbuild --define "build_mode DEBUG"
echo path/to/file1 > %{EXTRA_FILES}
echo path/to/file2 >> %{EXTRA_FILES}
%endif
%files -f %{EXTRA_FILES}
path/to/release/file1
path/to/release/file2
In your case, you can leverage the %if conditional in the %install section, use the OS as a spec variable passed to rpmbuild (or detect it in the RPM spec itself), and then pass the file containing the list to %files, roughly as sketched below.
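A hypothetical sketch of that approach (the target_os macro, paths, and file names are invented for illustration; the macro would be supplied with rpmbuild --define "target_os rhel8" or set by OS detection in the spec):

%install
FILE_LIST=%{_builddir}/os-files.list
> $FILE_LIST
%if "%{target_os}" == "rhel8"
install -D -m 0644 myapp.conf %{buildroot}/etc/myapp/rhel8/myapp.conf
echo /etc/myapp/rhel8/myapp.conf >> $FILE_LIST
%else
install -D -m 0644 myapp.conf %{buildroot}/etc/myapp/sles/myapp.conf
echo /etc/myapp/sles/myapp.conf >> $FILE_LIST
%endif

%files -f %{_builddir}/os-files.list

Keep in mind that the condition is resolved when the RPM is built, not when it is installed, which is why the later advice to build one RPM per OS is usually the cleaner route.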
The %files section can have variables in it, but usually this would be something like a path that is defined once so you don't have to repeat it a bunch, e.g. %{long_path}/file_name, where long_path was defined earlier in the spec file. The %files section is all the information that goes into the RPM database, and it is created when you build the RPM, so you won't be able to change those values based on machine information at install time.
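For instance (the macro name and path are hypothetical):

%define long_path /opt/mypackage/config

%files
%{long_path}/file_name

The macro is expanded once, when the RPM is built, so every machine installing the package gets the same recorded path.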
If you really want to do this, you could include a tar file inside of the main tarball that gets extracted depending on certain conditions (since the spec scriptlets are just bash). Keep in mind that this is an awful idea: the files won't be tracked by the RPM database, so when you remove the RPM these files will still exist.
In reality you should build two RPMs; this will allow for better support going forward in the event you have to hand this off to someone, as well as preserving your own sanity a year from now when you need to update the RPM.
This is how I solved my problem.
Step 1:
In the %build section, somewhere, I wrote:
%build
.....
# check my condition here and, if true, define a macro
%define is_valid %( if [ -f /usr/bin/myfile ]; then echo "1" ; else echo "0"; fi )
# after this, the normal flow continues
.....
...
Step 2: in the %install section
%install
......
# do something under that condition
%if %{is_valid}
install -m 0644 <file>
%endif
# the rest of your stuff
................
Step 3: in the %files section
%files
%if %is_valid
%{_dir}/<file>
%endif
That's it.
It works.
PS: I cannot give you the full code, hence I'm only giving the useful snippets.
Forrest suggests the best solution, but if that is not possible or practical, you can detect the OS version at runtime in the post-install section, move the script to the appropriate location, and then delete it post-uninstall, e.g.:
# rpm spec snippets
%define OS_version %(hacky os detection)
...
Source2: script.sh
...
%install
install %{_sourcedir}/script.sh %{buildroot}/some/known/location
...
%post
%if %{OS_version} == "..."
mv /some/known/location/script.sh /distro/specific/script.sh
%elif %{OS_version} == "..."
...
%preun
rm -rf /all/script/locations
Much more error prone than building different RPMs on different OSes, but will scale a little better if you need to support many different OSes.

Adding directory to PATH through Makefile

I'm having some trouble exporting the PATH I've modified inside the Makefile to the current terminal.
I'm trying to add the bin folder, inside wherever the Makefile's directory is, to the PATH.
Here's the relevant part of the Makefile:
PATH := $(shell pwd)/bin:$(PATH)
install:
	mkdir -p ./bin
	export PATH
	echo $(PATH)
The echo prints it correctly but if I redo the echo in the terminal, the PATH remains the same.
Thanks in advance for the help.
If you're using GNU make, you need to explicitly export the PATH variable to the environment for subprocesses:
export PATH := $(shell pwd)/bin:$(PATH)
install:
	mkdir -p ./bin
	export PATH
	echo $(PATH)
What you are trying to do is not possible. Make is running in a different process than the shell in your terminal, and changes to the environment in the make process do not transfer to the shell.
Perhaps you are confusing the effect of the export statement. export does not export the values of the variables from the make process to the shell. Instead, export marks variables so they will be transferred to any child processes of make. As far as I know there is no way to change the environment of the parent process (the shell where you started make is the parent process of the make process).
Perhaps this answer will make the concept of exporting variables to child processes a bit clearer.
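To illustrate what export does accomplish, here is a small sketch (the show-path target is made up): the modified PATH is visible in the shells make spawns for its recipes, just not in the terminal that invoked make.

export PATH := $(shell pwd)/bin:$(PATH)

show-path:
	@echo "PATH seen by make's child shell: $$PATH"

Running make show-path prints the prepended bin directory, while echo $PATH in the terminal afterwards still shows the original value.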
Perhaps you can rely on the user to do it for you. Note the quoting:
install_hint:
	@echo "Execute this command at your shell prompt:"
	@echo "export PATH=$(shell pwd)/bin:\$$PATH"

Call cmake from make to create Makefiles?

I am using cmake to build my project. For UNIX, I would like to type make from my project's root directory, and have cmake invoked to create the proper Makefiles (if they don't exist yet) and then build my project. I would like the cmake "internal" files (object files, cmake internal Makefiles, etc.) to be hidden (e.g. put in a .build directory) so it doesn't clutter my project directory.
My project has several sub-projects (in particular, a library, a user executable, and a unit test executable). I would like Makefiles (i.e. I type make and this happens) for each sub-project to execute cmake (as above) and build only that sub-project (with dependencies, so the library would be built from the executables' Makefiles, if needed). The resulting binary (.so library or executable) should be in the sub-project's directory.
I made a Makefile which does the main project bit somewhat well, though it feels somewhat hackish. I can't build specific targets using it, because my Makefile simply calls make in cmake's build directory.
Note that because the library is a sole dependency (and probably doesn't need to be built manually, and because I'm lazy) I omitted it in my Makefile.
BUILD_DIR := .build
.PHONY: all clean project-gui ${BUILD_DIR}/Makefile
all: project-gui project-test
clean:
	@([ -d ${BUILD_DIR} ] && make -C ${BUILD_DIR} clean && rm -r ${BUILD_DIR}) || echo Nothing to clean
project-gui: ${BUILD_DIR}/Makefile
	@make -C ${BUILD_DIR} project-gui
	@cp ${BUILD_DIR}/project-gui/project-gui $@
project-test: ${BUILD_DIR}/Makefile
	@make -C ${BUILD_DIR} project-test
	@cp ${BUILD_DIR}/project-test/project-test $@
${BUILD_DIR}/Makefile:
	@[ -d ${BUILD_DIR} ] || mkdir -p ${BUILD_DIR}
	@[ -f ${BUILD_DIR}/Makefile ] || (cd ${BUILD_DIR} && cmake ${CMAKE_OPTS} ..)
If it helps, here's my project structure (if this is "wrong" please tell me -- I'm still learning cmake):
project/
project/CMakeLists.txt
project/common.cmake
project/Makefile -- see Makefile above for this; should be replaced with something better, building libproject, project-gui, and project-test
project/libproject/
project/libproject/CMakeLists.txt
project/libproject/libproject.so -- after build
project/libproject/Makefile -- doesn't exist yet; should build libproject only
project/libproject/source/
project/libproject/include/
project/project-gui/
project/project-gui/CMakeLists.txt
project/project-gui/Makefile -- doesn't exist yet; should build libproject then project-gui
project/project-gui/source/
project/project-gui/include/
project/project-test/
project/project-test/CMakeLists.txt
project/project-test/Makefile -- doesn't exist yet; should build libproject then project-test
project/project-test/source/
project/project-test/include/
If you haven't caught on yet, I'm basically looking for a way to build the project and sub-projects as if cmake wasn't there: as if my project consisted of only Makefiles. Can this be done? Is the solution elegant, or messy? Should I be trying to do something else instead?
Thanks!
If cmake is generating the makefiles, you can simply include the generated makefile in the master makefile, e.g.:
# makefile
all: # Default
include $(GENERATED)
$(GENERATED): $(CMAKEFILE)
	# Generate the makefile here
The included files are generated, and then make is restarted with the new included files. The included files should detail the targets, etc.
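A slightly fleshed-out sketch of that pattern, using the .build directory from the question (whether cmake's generated Makefile behaves well when included this way is an assumption worth testing):

# makefile
BUILD_DIR := .build
GENERATED := $(BUILD_DIR)/Makefile

all: # Default target; the real rules come from the included file

$(GENERATED): CMakeLists.txt
	mkdir -p $(BUILD_DIR)
	cd $(BUILD_DIR) && cmake $(CMAKE_OPTS) ..

include $(GENERATED)

If $(GENERATED) is missing or older than CMakeLists.txt, make runs the rule, re-reads the freshly generated file, and then builds the requested targets.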
You should be able to change the location of the files used with the vpath directive (see, e.g., the GNU make manual):
vpath %.o project/.build
Otherwise, the tedious way is to rewrite the rules, making note of the necessary directory.
Edit:
Perhaps we shouldn't use a flat makefile.
Try something like:
# makefile
all: gui test
clean:
	$(MAKE) -f $(GUI-MAKE) clean
	$(MAKE) -f $(TEST-MAKE) clean
gui: $(GUI-MAKE)
	$(MAKE) -f $(GUI-MAKE) all
$(GUI-MAKE): $(GUI-CMAKE)
	# Generate
# Same for test
This should work if the $(MAKE) -f $(GUI-MAKE) all command works on the command line, and we've hidden cmake in the generating target. You would have to copy any other targets to the master makefile as well, and take care when running make in parallel.
Propagating object files through should involve something like
%.o: $(GUI-MAKE)
	$(MAKE) -f $(GUI-MAKE) $@
although you'll probably get errors trying to make test objects
