R automatic task scheduling in Windows: how to use external functions

I have a scheduled task in Windows to run an R program ("ftp.R").
After many attempts and much reading of SO posts, I found that the only way to make it run properly was to put this piece of code into a .bat file:
@ECHO OFF
Rscript ftp.R
Everything goes fine except when I try to use functions that I have in other R programs.
For example, in the "ftp.R" program I have this piece of code:
source("//BCN-01/Server/R/Main/Production/Check_Integrity.R")
In the "Check_Integrity.R" program I have some functions I need to use in "ftp.R".
The thing is that if I execute the .bat file manually, there is no problem and "ftp.R" runs perfectly. But if I run exactly the same .bat file from the Task Scheduler, "ftp.R" is unable to find the external functions.
(I am running the code on Windows Server 2012.)

One big difference between running a batch file manually and running it from the scheduler is that the scheduler starts the script with the system32 folder as the working directory. So it might be enough to add the following line to your batch file: CD %~dp0
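For illustration, a minimal sketch of the complete .bat file with that fix applied, assuming it sits in the same folder as ftp.R:
@ECHO OFF
REM %~dp0 expands to the drive and folder of this .bat file;
REM the /d switch lets CD change the drive letter as well
CD /d %~dp0
Rscript ftp.R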
Another point is that the scheduler runs your batch file as a different user. So it is possible that your user account has access to //BCN-01/Server/R/Main/Production/ while the system account the scheduler uses does not. You could also tell the scheduler to run the script as the same user you are logged in as when running it successfully by hand.

Finally I figured out what was wrong:
All the paths have to be defined as absolute (UNC) paths: mapped drives are not valid.
Although I already knew that (thanks to some posts I read here on SO), I had missed one mapped path inside one of the functions in 'Check_Integrity.R'.
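To illustrate the difference, a hedged sketch (the S: drive mapping is hypothetical):
# UNC path: resolvable by any user account, so it also works when
# the Task Scheduler runs the script under a different user
source("//BCN-01/Server/R/Main/Production/Check_Integrity.R")

# Mapped drive letter: mappings are per-user, so this can fail under
# the scheduler even though it works by hand (assumes //BCN-01/Server
# were mapped to S:)
# source("S:/R/Main/Production/Check_Integrity.R")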

Related

How to run R script from command line repeatedly but only load packages the first time

I want to run an R script (in Win 7) from SQL Server 2014 each time a new record is added (to perform some analysis on the data). I saw that this can be done with the xp_cmdshell command, which is like running it manually from the command line.
My problems (and questions) are:
I've gathered from various websites that probably the best option is to use Rscript. This would have to be used at the command line as:
"C:\Program Files\R\R-3.2.3\bin\x64\Rscript" "my_file_folder\my_file.r"
Can I copy Rscript.exe to the folder where my script is, such that I can run my script independently, even if R is not installed? What other files do I need to copy together with Rscript.exe such that it would work independently?
My script loads some packages that contain functions it uses. Is there a way to include these in the script so that they don't have to be loaded every time (loading takes about 5 seconds so far, and I need this script to be faster)? Or is there a way to only load these packages the first time the script runs?
In case the overall approach I've described here is not the best one, I am open to doing it differently. Maybe there is a way to package the R script together with all its required dependencies (the libraries and other parts of R that the script would need to run independently).
What I ultimately need is for the script to run silently, and reasonably fast, without any windows or anything else popping up, each time a new record is added to my database, do the analysis, and exit.
Thanks in advance for any answers.
UPDATE:
I figured out an elegant solution for running the R script. I'm setting up a job in SQL Server, and inside that job I'm using xp_cmdshell to run my script as a parameter to Rscript.exe, as detailed in point 1 above. I can start this job from any stored procedure, and the beauty of it is that the stored procedure does not wait for the script to finish: it just triggers the job (which runs the script in a separate thread) and then continues with its business.
But questions from points 1 and 2 still remain.

Restart R session in RStudio but continue running script

I'm currently running some queries against a database and getting back some big files as a result. I have run into the common problem of Windows not freeing the memory, even though I rm() everything and call gc(). One workaround I have found is using .rs.restartR() in RStudio.
This, though, requires me to constantly watch my script in order to continue it after the session restart. Is it possible to automate this? If not, what other methods do people use to overcome this problem?
You could break the code into 2 files and write a batch file (.bat) that runs the first file through .rs.restartR() and then the remainder of the code in the next file.
You could also skip the .bat and just schedule both .R scripts to run in Task Scheduler.
Also, please see my comment regarding garbage collection (gc()).
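A minimal sketch of that batch file, assuming the code has been split into two hypothetical scripts part1.R and part2.R (part1.R would need to save any results part2.R depends on, e.g. with saveRDS()):
@ECHO OFF
REM Each Rscript invocation is a separate R process, so all memory
REM used by part1.R is returned to Windows before part2.R starts
Rscript part1.R
Rscript part2.R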

Where can I deploy an R script and schedule it to run in a timely manner?

I have a small R script, typically 15-20 lines of code, that replies to random people on Twitter. Everything works fine when I run the code. Now I want to auto-run it at fixed intervals of time, so I made a .bat file for the R script and scheduled a task in the Windows scheduler to run every 15 minutes. This also works fine, and the way I expect it to. The problem is that the scheduler only works while my system is on: once I shut down my system, the scheduler also stops and the script does not execute.
I want a server (not sure if this is the right word) where I can host the R file or .bat file and where I can schedule the script to execute every X minutes. I need a server (?) because I want this to be active 24x7.
I have tried deploying it on shinyapps.io but somehow it doesn't work. I have gone through this as well, with no help. Any help would be appreciated.

Can an Rscript be aware of other Rscripts in Ubuntu

I have a cron job that launches an Rscript to fetch some data and stash it away in a database. Sometimes the source isn't ready, so the script waits. Sometimes the source skips a data point, so the script ends up waiting until another instance is started from cron. Once two instances are running, they cause problems for each other. Is there something like the following pseudocode that I could put at the top of my scripts so they stop when they see another instance of themselves already running:
stopifnot(Sys.running('nameofscript.r'))
One thing I thought of doing would be for the script to create a temp file with a fixed name at the start and then delete that temp file at the end. This way the script can check for the existence of the file to know whether it's already running. I also see there's something called flock, which is probably a better solution, but I'd prefer a way R can do it without bash commands (I've never really learned any bash). If there isn't a pure R way to do this, can the R script call flock on itself, or would it only work if the cron task calls a bash script that then calls the Rscript? If it isn't already obvious, I don't really know how to use flock.
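For what it's worth, a hedged pure-R sketch of that lock-file idea, using the filelock package from CRAN (the lock-file path and the surrounding script are hypothetical):
library(filelock)

# Try to take an exclusive lock without waiting: lock() returns NULL
# immediately if another process already holds the lock
lck <- filelock::lock("/tmp/nameofscript.lock", timeout = 0)
if (is.null(lck)) {
  quit(save = "no")  # another instance is running, so exit quietly
}

# ... fetch the data and stash it in the database ...

# Release the lock explicitly (it is also released if the process exits)
filelock::unlock(lck)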

Bamboo Grunt 'dustc' task failing

I've moved my Grunt job into Bamboo and everything works great except for the dust compiler. All of the other tasks in my Gruntfile can be targeted from the Bamboo task and they work. The dustc task gives this error when run: Fatal error: Error: not found: dustc.
I've manually run the dustc task on the build machine's command prompt every way I know how and it works. I even copied the command from the Bamboo build log - the one it uses to execute the grunt task - and it works just fine.
I just can't get it to work when I run the build from within Bamboo.
Any ideas would be greatly appreciated.
Thanks
This turned out to be a permissions/visibility issue.
Evidently system users (or at least the one that was running my build service) don't have access to the system environment variables in Windows. Learn something new every day!
I created a user to run the service (which I probably should have done in the first place) and that non-system user was able to see the correct PATH settings.
This took care of the problem.
