Schedule a function that belongs to an R package

I'm trying to build an R package whose goal is to run a series of analyses by taking input data and writing output data to an external database (PostgreSQL).
Specifically, I need a set of operations to be scheduled to run on a daily basis. Therefore, I have written some executable R scripts (using the header #!/usr/bin/env Rscript) and saved them into the exec/ folder of the R package. The scripts make several calls to the package's core functions in the R/ folder.
At this point, once the package is installed on a Linux server, how do I set up a crontab that can directly access the scripts in the exec/ folder?
Is this way of proceeding correct or is there a different best practice for such operations?

We do this all the bleeping time at work. Here at home I also have a couple of recurring cronjobs, e.g. for CRANberries. The exec/ folder you reference works, but my preferred solution is to use, say, inst/scripts/someScript.R.
Then, one initial time, you need to create a softlink from your package library, say /usr/local/lib/R/site-library/myPackage/scripts/someScript.R, to a directory in the $PATH, say /usr/local/bin.
The key aspect is that the softlink persists even as you update the package. So now you are golden. All you need now is a crontab entry referencing someScript.R. We use a mix of Rscript and littler scripts.
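A minimal sketch of the one-time setup and a matching crontab entry, assuming someScript.R starts with the #!/usr/bin/env Rscript line mentioned in the question (the library path, script name, and schedule are illustrative and depend on your installation):
# one-time setup: make the installed script executable and put it on the PATH
chmod +x /usr/local/lib/R/site-library/myPackage/scripts/someScript.R
ln -s /usr/local/lib/R/site-library/myPackage/scripts/someScript.R /usr/local/bin/someScript.R
# crontab entry (edit with crontab -e): run the script every day at 02:00, keep a log
0 2 * * * /usr/local/bin/someScript.R >> "$HOME/someScript.log" 2>&1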

Related

R library path for two Linux clusters

I am working with two Linux clusters which share the same file system. Because of that, when I install libraries on one of the clusters, they get installed in the same folder (/home/R), shared by both clusters, which causes conflicts if I later work on the other cluster. Do you know if there is any environment variable or even any hidden R config I could use, so that, upon starting R (or RStudio) on one cluster, it could detect the cluster and the corresponding path for the libraries' location (for instance /home/R/cluster1 and /home/R/cluster2)? Thanks.
Yes, it should be pretty straightforward. Create an Rprofile.site file (see the Initialization at Startup docs for where this goes). In that file, you can write R code to detect which cluster you're on.
Once you know which cluster you're on, use the .libPaths() function (see libPaths docs) to change the library path.
R will run the Rprofile.site file every time a new session starts up, so each session should get its library path adjusted appropriately for the cluster it's on.
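A minimal sketch of what that Rprofile.site could contain, assuming the clusters can be told apart by hostname and the per-cluster library directories already exist (hostnames and paths are illustrative):
# Rprofile.site: pick a cluster-specific library path at startup
host <- Sys.info()[["nodename"]]
lib <- if (grepl("^cluster1", host)) "/home/R/cluster1" else "/home/R/cluster2"
.libPaths(c(lib, .libPaths()))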

Using `/exec` directory in R packages for R scripts for reproducible research

For my research project I want to use a make-based workflow but also deliver the project in the form of a package. Thus, I want to put reusable functions in the /R directory for others to access but also use R scripts executable from the command line to do the actual analysis (generate and clean datasets, create plots, etc.), with make tracking the relationships among files and rebuilding what is necessary.
In packages, supposedly the /exec directory is meant for executable files. I'm wondering if putting my executable scripts in this directory would be appropriate.

Big R project with several packages and developers: Best setup for easy version control based on packages

I have to restructure a big project written in R, which will later consist of several packages and involve several developers. Everything is set up on a git server.
The question is: how do I manage frequent changes inside packages without having to rebuild them every time and without developers having to update them after every pull? Is there any best practice or automation for that? I don't want to source() unbuilt packages and R files, but would like to stick with a package-like structure as much as possible. We will work in a Windows environment.
Thanks.
So I fiddled around a while, tried different setups and came up with an arrangement which fits my needs.
It basically consists of two git repositories. The first (let's call it the base-repo) contains most of the scripts on which all later packages are based. The second repo we will call the "package-repo".
Most development work should be done on the base-repo. The base-repo is under CI control via a build server and unit tests.
The package-repo contains a folder for each package we want to build, plus the base-repo as a git submodule.
Each package can now be constructed via a very simple bash/shell script (a "build script"; a sketch follows the list below):
check out a commit/tag of the submodule base-repo on which the stable package build should be based
copy the files which are necessary for the package into the specific package folder
check and build the package
the script can also create a history file for the package
the script can be invoked either manually or by a build server
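A minimal bash sketch of such a build script (the repository layout, file names, package name, and tag are all illustrative):
#!/usr/bin/env bash
set -e
# 1. pin the base-repo submodule to the commit/tag the stable package build is based on
git -C base-repo fetch --tags
git -C base-repo checkout v1.0.0
# 2. copy the files the package needs into the package folder
cp base-repo/R/utils.R myPackage/R/
# 3. check and build the package
R CMD build myPackage
R CMD check myPackage_*.tar.gz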
This approach can also be combined with packrat. Additional code which is very package-specific can now also be added to the package-repo and is under version control while remaining independent from the base-repo.
The approach could be further extended to trigger the build of packages from the package-repo based on pushes to the base-repo. Packages whose build script points to master as the commit will always be up to date, and if they are under the control of a build server, this ensures that changes to the base-repo do not break the package. It is also possible to create several packages containing the same scripts from the base-repo.
See also: git: symlink/reference to a file in an external repository

Load Folder of Scripts in R at startup?

I'm new to R and frankly the amount of documentation is overwhelming, and I haven't been able to find the answer to this question.
I have created a number of .R script files, all stored in a folder that I can access on my server (let's say the folder, using Windows backslashes, is \\servername\Paige\myscripts).
I know that in R you can call each script individually, for example (using the forward slashes required in R)
source(file="//servername/Paige/myscripts/con_mdb.r")
and now this script, con_mdb, is available for use.
If I want to make all the scripts in this folder available at startup, how do I do this?
Briefly:
Use your ~/.Rprofile in the directory found via Sys.getenv("HOME") (or if that fails, in R's own Rprofile.site)
Loop over the contents of the directory via dir() or list.files().
Source each file.
as, e.g., via this one-liner
sapply(dir("//servername/Paige/myscripts/", pattern = "\\.[rR]$", full.names = TRUE), source)
but the real story is that you should not do this. Create a package instead, and load that. Bazillion other questions here on how to build a package. Research it -- it is worth it.
By far the best way is to create a package! But as a first step, you could also create one R script file (collection.r) in your script directory which includes all the other scripts relative to its own location.
In your separate project scripts you can then include only that script with
source(file="//servername/Paige/myscripts/collection.r", chdir = TRUE)
which changes the working directory before sourcing. Therefore you would only have to include one file for each project.
In the collection file you could use a loop over all files (except collection.r) or simply list them all.
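A minimal sketch of such a collection.r, assuming all the scripts sit in the same directory (it excludes itself so it does not source itself recursively):
# collection.r: source every other .r/.R file in this directory
scripts <- setdiff(list.files(pattern = "\\.[rR]$"), "collection.r")
invisible(lapply(scripts, source))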

automating R script using Mac's Automator and Calendar

I have been trying to run a script automatically using the steps that I found online.
I am trying to run the following R script called AUTO.R
Here is what the script contains:
library(quantmod)
obs <- last(Ad(getSymbols("SPY", auto.assign=FALSE)))
saveRDS(obs, "SAMPLE.rds")
When I build the application it prints "Workflow completed".
I believe all is well until the time comes to run the script. The alarm pop-up from Calendar is displayed on my desktop, but nothing runs. Even after a few minutes, the folder where the .rds file should be saved does not contain anything.
Two suggested changes:
Your Automator task should be more like just /usr/local/bin/Rscript --vanilla /Users/rimeallthetime/Desktop/AUTO.R
You should explicitly set the path in saveRDS; i.e. saveRDS(obs, "/Users/rimeallthetime/Desktop/SAMPLE.rds")
Honestly, though, you should at least make a ~/bin dir (i.e. a directory called bin under your home directory, so in your case /Users/rimeallthetime/bin) and put both the workflow and the R script in there, and I'd also suggest creating another directory for output files rather than the desktop.
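Putting the two suggestions together, AUTO.R would then look something like this (the output path is the one from the question; adjust it to whatever output directory you settle on):
# AUTO.R: fetch the latest adjusted close for SPY and write it to an explicit path
library(quantmod)
obs <- last(Ad(getSymbols("SPY", auto.assign = FALSE)))
saveRDS(obs, "/Users/rimeallthetime/Desktop/SAMPLE.rds")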
UPDATE
I just let the calendar event run and this is really a crude way to automate what you want to do. You'd be better off in the long run using launchd; that way it's fully automated and requires no human intervention at all (but you may need to adjust your script to send you a notification or "append" to the rds file).
