Different behavior when Rscript & rstan is run as a cron job - r

I try to run an R script at regular intervals to update a webpage. The script runs fine when called from the terminal like this:
/usr/local/bin/Rscript /Users/me/path/myscript.R
However, if I try running it as a cron job, I get an error. I add the job to crontab like this:
46 10 * * * /usr/local/bin/Rscript '/Users/me/path/myscript.R' >> '/Users/me/path/mylog.log' 2>&1
The script does run in R, but aborts due to an error. Specifically, I fit some models using rstan, and get an initialization error. (The error only applies to some models, while others still run fine.) The initialization values are valid by definition, but do not seem to be used properly. It is like rstan is doing math differently (and wrong) when it is run through cron.
The session info from R is identical whether I run the script in the terminal or as a cron job. My question is what else might still differ depending on how the script is run. Could rstan be using a different version of C++ when run as a cron job? Are there other paths I may need to set to get this to work correctly?
Update: The script also works if I run it using R CMD BATCH in terminal, but not if I use R CMD BATCH in a cron job. Using launchd triggers the same issue. I also tried using CmdStan through cmdstanr, and the same same thing happens: Runs fine until added to a cron job.
Edit 2: The models I thought ran fine in cron, were not actually fine. The results were wrong, until I used the fix explained below.

It looks like I finally managed to fix this, and I'm posting my solution here for anyone who encounters the same problem.
I ran env in terminal to see my current user environment. I copy-pasted the full output to the top of my crontab file. (Simply adding the PATH variable was not sufficient. I suppose it was SHELL or perhaps both PATH and SHELL that did the trick, but I haven't explored this further.)
To edit my user's crontab, I ran crontab -e, then pressed i to edit the file, pasted everything from env at the top of the file, stopped editing by pressing ctrl + c, and quit by typing :wq and hitting enter.

Related

How to debug `attempt to apply non-function` error that only occurs when executing script from cmd line

I have an R script that runs perfectly when copy pasted into an R session.
However, when I try to run the script from the command line (i.e., Rscript mycode.R) I keep getting an error attempt to apply non-function.
The error is coming from a function that uses AzureGraph and AzureAuth to fetch data from a Microsoft cloud directory. I can provide more detail about this function if needed, but my question is really more general.
What could cause this difference in behaviour between executing in an active R session vs. running the script from the command line.
What is the best strategy to debug this? Normally I would step through code in an R session to locate and fix errors, but obviously that will not work in this case.
Possibilities:
You have objects (functions, packages) in your interactive R session's environment that are not present in the session spawned by Rscript, for example because you automatically restore your R workspace at startup. Possible check: you can use sessionInfo() to see packages that are loaded and/or attached and ls() to list environment objects.
You have multiple R installations on your system, and the RScript command is somehow mapping to a different one than your interactive session, which is missing some packages or functions. Possible check: the version function.
The defaults of RScript are slightly different from those of an interactive session (for example, "save" and "restore" are typically disabled) and this is somehow affecting your script. Possible solution: try to add the --restore argument to the Rscript call.
How to debug? If you're going blind, good old bisection. Take your script, comment out the bottom half, see if it runs. Repeat iteratively (uncomment half of the commented section if it runs, comment out half of the uncommented section if not) until you find the line where the error is.
You can also run an individual line of code from Rscript using Rscript -e "some code" if you want to quickly check if a specific call is causing problems.

Close and reopen + run an R script from itself on Mac

I would like to write an R script that can close and reopen + run itself.
This is needed for an API query that I am trying to make and which seems to require me going through these steps once every hour to be able to make additional requests. I tried to use the source() function - and simply run my script from itself every hour- but with this the API keeps rejecting additional requests; it seems that actually closing and opening the program is necessary.
I also tried to use the system() command - as described here to actually open R and execute the script - but I was not able to figure out how to implement this in a Mac environment.
Would you have any suggestions on how to do this?
The usual way to run a script every x amount of time on a Unix system is a cronjob. I'm not familiar with macOS, but apparently it works just as on Linux.
Open the job list to edit it (you can of course use another editor instead of nano)
env EDITOR=nano crontab -e
Add a line/job to the file. This runs any command line command. You can use Rscript to run an R script.
0 * * * * Rscript "/path/to/your/script.R"
Exit and safe. This script should now run every hour.
If you want to change the timing check out crontab.guru. Check out this answer, if the cronjob reports Rscript to be missing.

Use crontab to automate R script

I am attempting to automate a R script using Rstudio Server on ec2 machine.
The R script is working without errors. I then navigated to the terminal on RStudio Sever and attempted to run the R script using the command - Rscript "Rfilename" and it works.
At this point I created a shell script and placed the command above for running the R script in there. This shell command is also running fine - sh "shellfilename"
But when I try to schedule this shell command using crontab, it does not produce any result. I am using the following cron entry :
* * * * * /usr/bin/sh ./shellfilename.sh
I am using cron for the first time and need help debug what is going wrong. My intuition is that there is there is difference in the environments used by the command when I run it on terminal and when I use the same in crontab. In case it is relevant information - am doing all of this on a user account created for myself on this machine so would differ from admin account.
Can someone help resolve this issue? Thanks!
The issue arose due to relative paths used in the script for importing files and objects. Changing this to absolute path resolved the described issue.

R script to Task Scheduler

I have an R script that gets data from databases on another server and brings it into my database. I have it saved as "dataimport.R"
I followed a few answers from here and from other websites and created a batch file like this:
"C:\Program Files\R\R-3.4.0\bin\R.exe" CMD BATCH --vanilla --slave "C:\dataimport.R"
This is not working. The cmd window opens up but the tables are not recreated and I dont get any error. I wanted to run the Task Scheduler to automate the process. Any ideas on how to fix this?
I kept at it and interestingly the answer to this was this:
"C:\Program Files\R\R-3.4.0\bin\R.exe" "C:\dataimport.R"
I dont know the reason for this but as long as it works.
I've had loads of problems with this, but finally managed to get it to work. To be more comprehensive, here are some of the things I've tried (in case one of these work for other persons):
#echo off, R CMD BATCH C:\myfolder\script.R
R CMD BATCH C:\myfolder\script.R
Using the package taskschedulerR (somehow didn't work overnight)
Used the answer provided above ("C:\Program Files\R\R-3.4.0\bin\R.exe" "C:\dataimport.R")
and all kinds of variations and combinations off these. (can't remember them all exactly)
What finally worked was:
Make an R script and save it (C:\myfolder\Test.R for example)
Through notepad, fill in: "C:\Program Files\R\R-3.5.2\bin\x64\R.exe" CMD BATCH "C:\myfolder\Test.R" (also tried Rscript.exe, didn't work for me).
in Windows Task Scheduler (v1.0) do 'Create task'
fill in time triggers.
in Actions, make an action with Start a program and in the Program/Script line provide the location where your bat script is. C:\myfolder\Test.bat
in the Start in (optional) line: enter C:\myfolder\
Note: both your .bat file and .R script are in the "C:\myfolder" folder.

Automating R to run scripts

I'm basically looking for any way to automatically run R scripts just like it would run as if I was copy and pasting it into console. I've tried the package 'taskscheduleR' however it just seems to output to a log file in the directory which isn't as if I were to just run it inside the Rstudio application.
An example might be, say I want to get the last closing stock prices of 5 stocks each night, then the script in Rstudio and have the variables there and all of the code would be in the script file.
Any thoughts?
I would suggest the in-built Task Scheduler application if you using Windows.
Create a task that will run a batchscript file. This batchscript file has only 1 line which executes the Rscript you want. Set it to run each night (or whatever time you want).
I am not that well-versed in linux and MacOS but here's what I know:
Linux has cron. Add a job to crontab with your preferred timing and execute your script 'path/to/bin/r /path/to/script.r'
MacOS has Automator + iCal (for scheduling). It also has crontab like Linux.

Resources