Use crontab to automate R script - r

I am attempting to automate a R script using Rstudio Server on ec2 machine.
The R script is working without errors. I then navigated to the terminal on RStudio Sever and attempted to run the R script using the command - Rscript "Rfilename" and it works.
At this point I created a shell script and placed the command above for running the R script in there. This shell command is also running fine - sh "shellfilename"
But when I try to schedule this shell command using crontab, it does not produce any result. I am using the following cron entry :
* * * * * /usr/bin/sh ./shellfilename.sh
I am using cron for the first time and need help debug what is going wrong. My intuition is that there is there is difference in the environments used by the command when I run it on terminal and when I use the same in crontab. In case it is relevant information - am doing all of this on a user account created for myself on this machine so would differ from admin account.
Can someone help resolve this issue? Thanks!

The issue arose due to relative paths used in the script for importing files and objects. Changing this to absolute path resolved the described issue.

Related

Close and reopen + run an R script from itself on Mac

I would like to write an R script that can close and reopen + run itself.
This is needed for an API query that I am trying to make and which seems to require me going through these steps once every hour to be able to make additional requests. I tried to use the source() function - and simply run my script from itself every hour- but with this the API keeps rejecting additional requests; it seems that actually closing and opening the program is necessary.
I also tried to use the system() command - as described here to actually open R and execute the script - but I was not able to figure out how to implement this in a Mac environment.
Would you have any suggestions on how to do this?
The usual way to run a script every x amount of time on a Unix system is a cronjob. I'm not familiar with macOS, but apparently it works just as on Linux.
Open the job list to edit it (you can of course use another editor instead of nano)
env EDITOR=nano crontab -e
Add a line/job to the file. This runs any command line command. You can use Rscript to run an R script.
0 * * * * Rscript "/path/to/your/script.R"
Exit and safe. This script should now run every hour.
If you want to change the timing check out crontab.guru. Check out this answer, if the cronjob reports Rscript to be missing.

Different behavior when Rscript & rstan is run as a cron job

I try to run an R script at regular intervals to update a webpage. The script runs fine when called from the terminal like this:
/usr/local/bin/Rscript /Users/me/path/myscript.R
However, if I try running it as a cron job, I get an error. I add the job to crontab like this:
46 10 * * * /usr/local/bin/Rscript '/Users/me/path/myscript.R' >> '/Users/me/path/mylog.log' 2>&1
The script does run in R, but aborts due to an error. Specifically, I fit some models using rstan, and get an initialization error. (The error only applies to some models, while others still run fine.) The initialization values are valid by definition, but do not seem to be used properly. It is like rstan is doing math differently (and wrong) when it is run through cron.
The session info from R is identical whether I run the script in the terminal or as a cron job. My question is what else might still differ depending on how the script is run. Could rstan be using a different version of C++ when run as a cron job? Are there other paths I may need to set to get this to work correctly?
Update: The script also works if I run it using R CMD BATCH in terminal, but not if I use R CMD BATCH in a cron job. Using launchd triggers the same issue. I also tried using CmdStan through cmdstanr, and the same same thing happens: Runs fine until added to a cron job.
Edit 2: The models I thought ran fine in cron, were not actually fine. The results were wrong, until I used the fix explained below.
It looks like I finally managed to fix this, and I'm posting my solution here for anyone who encounters the same problem.
I ran env in terminal to see my current user environment. I copy-pasted the full output to the top of my crontab file. (Simply adding the PATH variable was not sufficient. I suppose it was SHELL or perhaps both PATH and SHELL that did the trick, but I haven't explored this further.)
To edit my user's crontab, I ran crontab -e, then pressed i to edit the file, pasted everything from env at the top of the file, stopped editing by pressing ctrl + c, and quit by typing :wq and hitting enter.

Let Docker image build fail when R package installation returns error

I am trying to create a custom Docker image based on Rocker using Dockerfile. In the Dockerfile I am pulling my own R package from a custom GitLab server using:
RUN R -e "devtools::install_git('[custom gitlab server]', quiet = FALSE)"
Everything usually works, but I have noticed that when the GitLab server is down, or the machine running Docker is low on RAM memory, the package does not install correctly and returns an error message in the R console. This behavior is to be expected. However, Docker does not notice the error produced by R and continues evaluating the rest of the Dockerfile. I would like Docker to fail building the image when this occurs. In that way, I could ultimately prevent automatic deployment of the incomplete Docker container by Kubernetes.
So far I have thought of two potential solutions, but I am struggling with the execution:
R level: Wrap tryCatch() around devtools::install_git to catch the error. But then what? Use stop? Will this cause the Docker building process to stop as well? Could withCallingHandlers() be used?
Dockerfile level: Use a shell command to check for errors? I cannot find the contents of R --help as I do not have a Linux machine at the moment. So I am not sure of what R -e actually does (execute I presume) and which other commands could be passed along with R.
It seems that a similar issue is discussed here and here, but the I do not understand how they have solved it.
Thus how to make sure no Docker image ends up running on the Kubernetes cluster without the custom package?
The Docker build process should stop once one of the commands in the Dockerfile returns a non zero status.
install_git doesn't seem to throw an error when the package wasn't installed successfully, so the execution keeps on.
An obvious way to go would be to wrap the installation inside a dedicated R script and throw an error if it didn't finish successfully, which would then stop the build.
So I would suggest something like this ...
Create installation script install_gitlab.R:
### file install_gitlab.R
## change repo- and package name!!
repo <- '[custom gitlab server]'
pkgname <- 'testpackage'
devtools::install_git(repo, quiet = FALSE)
stopifnot(pkgname %in% installed.packages()[,'Package'])
Modify your Dockerfile accordingly (replace the install_git line):
...
Add install_gitlab.R /runscripts/install_gitlab.R
RUN Rscript /runscripts/install_gitlab.R
...
One thing to keep in mind is, this approach assumes the package you're trying to install is NOT installed prior to calling the command.
If you're using a rocker image, they already have the littler package installed, which has the handy installGithub.r script. I believe it should already have the functionality you want. If not, it at least simplifies the running of the custom install_github.r script.
A docker RUN command using littler just looks like:
RUN installGithub.r "yourRepo"

R script as cronjob not executing

I've created a short R script that continuously downloads data from twitter using the streamR package. This script is supposed to run on a standard Amazon EC2 server running Ubuntu 14.04. When testing it in the standard command line, it runs fine. However, it is not run as specified in the cronjob. I used the following command:
sudo crontab -e
and then added the following line to the file
0 * * * * Rscript /home/mydirectory/docs/phd-research/data-collection/cron-script.R
in the hope that it would execute the R script every hour.
Is there anything I might have got wrong with the permissions? I've already checked that cron is running and I chmodded the directory the r script is supposed to write to to 775.
Thanks in advance!
OK - it turns out I used the wrong command to call up the crontab. The directory it was supposed to be working in was owned by user rather than root, so calling a root crontab didn't work.
The proper command for editing the crontab is the following:
crontab -u user -e
The Rscript didn't require the full path, it worked with both.
The $PATH$ is not known (i.e. the directory of Rscript is not part of the standard $PATH$) during the execution of a cronjob. Writing the full path to Rscript will work:
0 * * * * /.../Rscript /home/mydirectory/docs/phd-research/data-collection/cron-script.R

Schedule R script using cron

I am trying to schedule my R script using cron, but it is not working. It seems R can not find packages in cron. Anyone can help me? Thanks.
The following is my bash script
# source my profile
. /home/winie/.profile
# script.R will load packages
R CMD BATCH /home/script.R
Consider these tips
Use Rscript (or littler) rather than R CMD BATCH
Make sure the cron job is running as you
Make sure the script runs by itself
Test it a few times in verbose mode
My box is running the somewhat visible CRANberries via a cronjob calling an R script
(which I execute via littler but Rscript
should work just as well). For this, the entry in /etc/crontab on my Ubuntu server is
# every few hours, run cranberries
16 */3 * * * edd cd /home/edd/cranberries && ./cranberries.r
so every sixteen minutes past every third hour, a shell command is being run with my id. It changes into the working directory, and call the R script (which has executable modes etc).
Looking at this, I could actually just run the script and have setwd() command in it....

Resources