Calling R from S-Plus?

Does anyone have any suggestions for a good way to call R from S-Plus? Ideally I would like to just pass code to R and get data back without having to write anything too elaborate to integrate them.
I should add that I'm familiar with the RinS package on Omegahat, but I haven't used it. I was under the impression that Insightful had made an effort to integrate the environments before Tibco took over.
Edit: It turns out that RinS doesn't work on Windows. I found that the easiest solution was to just use Rscript. I can call this from S-Plus with the system() command. For example, here's a simple script:
#! Rscript --vanilla --default-packages=utils
# Echo the command-line arguments passed in from S-Plus
args <- commandArgs(TRUE)
print(args)
print(1:100)
Sys.sleep(2)
# Simulate a failure so the caller can check the exit status
res <- "hello world"
class(res) <- "try-error"
if (inherits(res, "try-error")) q(status = 1) else q()
And calling it from S-Plus:
system("rscript c://test.rscript 'some text'")
Then I just store the results into a text file and import it into S-Plus after the script is run.
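For example, a minimal sketch of that round trip (the output path and column names are illustrative, not from the original setup):
#! Rscript --vanilla --default-packages=utils
# test.rscript: do some work and write the result where S-Plus can pick it up
args <- commandArgs(TRUE)
res <- data.frame(input = args, value = seq_along(args))
write.table(res, "c:/rscript_out.txt", sep = "\t", row.names = FALSE)
After system() returns, the text file can then be read back on the S-Plus side (with importData(), if I remember the S-Plus API correctly).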

RSPlus is the only option I'm aware of. I used it almost daily for about a year, but haven't used it since R 2.7. From your question, it seems like you just want to run R inside S-Plus, which RSPlus can certainly do: R is a separate interpreter accessible via an interface comprised of a few S-Plus functions, the most often used being .R(), e.g., .R("fivenum", 1:10).
I think we are talking about the same thing, though, because RinS is one of two modules (SpinR being the other) that together comprise RSPlus (i.e., there's only a single interface, regardless of the direction you want to go: R to S-Plus, or S-Plus to R). Although it wasn't obvious to me at the time, I had to install both modules to get RinS to work.

Related

source() functions preventing the read of downstream functions

I am trying to teach myself R, coming from a Python programming background.
I am having problems sourcing one file (file_read_functions.R) when the functions stored in it are called from a file in the same directory (read_files.R).
read_files.R is as follows:
constant_source <- 'constants.R'
function_source <- 'file_read_functions.R'
class_source <- 'classes.R'
source(class_source)
source(constant_source)
source(function_source)
cellecta_counts = read_cellecta_counts(filepath = cell_counts_by_gene_id)
file_read_functions.R is as follows:
constants <- 'constants.R'
classes <- 'classes.R'
assignments <- 'assignment_functions.R'
source(constants)
source(classes)
source(assignments)
read_cellecta_counts = function(filepath) {
  print("hello")
  return(filepath)
}
With the above, if I move read_cellecta_counts to before the source functions, the code can successfully find the function. What might be the cause?
This seems like a straightforward error message to me. The function object wasn't found, which means you haven't defined it anywhere, or haven't loaded it.
If it's a function from a package, maybe you forgot to load the package, or call the function as package::function(). If it is a function you wrote as a simple script, maybe you forgot to source it or define it locally. If it's a function you wrote as part of a package, you can load all functions by using the shortcut CTRL+SHIFT+L in RStudio.
That said, I believe you can benefit a lot from reading the chapter on Debugging from Hadley Wickham's "Advanced R" book. It is really well written and easy to understand, especially for beginners in the R language. The chapter will teach you how to use some debugging tools, either interactively or not. You can find it here.
I found it. By commenting out the parts piece-wise in file_read_functions, I found that there was a typo within assignment_functions.R. It was a simple typo; I had the following:
constant_source <- 'constants.R'
source(constants_source)
The extra 's' did it.
If you get an error like this in the future, be sure to check all of the upstream source() files; if there is an error in any of those, R will not be able to find any of the functions defined after the failing call.
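One way to localize such failures quickly (a sketch, reusing the file names from the question) is to wrap each source() call so the failing file is named explicitly:
# Source each dependency and report exactly which file fails, if any
for (f in c("constants.R", "classes.R", "assignment_functions.R")) {
  tryCatch(source(f),
           error = function(e) stop("error while sourcing ", f, ": ",
                                    conditionMessage(e)))
}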
Thank you all for your patience.

Run multiple R scripts with exiting/restarting in between on Linux

I have a series of R scripts for doing the multiple steps of data analysis that I require. Some of these take a very long time and create really large objects. I've noticed that if I just source all of them in a row (via a main.R script), the processing for later steps takes much longer than if I source one script, save what I need, and restart R for the next step (loading the data I need).
I was wondering if there was a way, via Rscript or a Bash script perhaps, that I could carry this out. There would need to be objects that persist for the first 2 scripts (which load my external data and create the objects that will be used for all further steps). I suppose I could also just save those and load them in further scripts.
(I would also like to pass a number of named arguments to this script, which I think I can find on other SO posts; I can use something like optparse.)
So, the script would look something like this, I think:
#! /bin/bash
Rscript 01_load.R # Objects would persist, ideally
Rscript 02_create_graphs.R # Objects would persist, ideally
Rscript 03_random_graphs.R # contains code to save objects
#exit R
Rscript 04_permutation_analysis.R # would have to contain code to load data
#exit
And so on. Is there a solution to this? I'm using R 3.2.2 on 64-bit CentOS 6. Thanks.
Chris,
it sounds like you should do some manual housekeeping between (or within) your steps by using gc(), and maybe also rm(). For more details see help(gc) and help(rm).
So instead of exiting R and restarting it, you could do:
rm(list = ls())
gc()
But please note: rm(list = ls()) throws away all your objects. Better to create a list of the objects you really want to discard and pass that list to rm().
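If you do end up restarting R between steps as described in the question, the save-and-reload approach is straightforward; a minimal sketch (object and file names are made up):
# At the end of 01_load.R: persist the expensive objects
saveRDS(big_data, "big_data.rds")
# At the top of 04_permutation_analysis.R: reload them in the fresh session
big_data <- readRDS("big_data.rds")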

Avoid loading libraries on multiple runs of an R script

I need to run (several times) my R script (script.R), which basically looks like this:
library(myLib)
cmd <- commandArgs(TRUE)
args <- myLib::parse.cmd(cmd)
myLib::exec(args)
myLib is my own package, which loads some dependencies (car, minpack.lm, plyr, ggplot2). The time required for loading the libraries is comparable to the runtime of myLib::exec, so I'm looking for a way to avoid loading them every time I call Rscript script.R.
I know about Rserve, but it looks like a bit of an overkill, though it could do exactly what I need. Are there any other solutions?
P.S.: I call script.R from the JVM using Scala.
Briefly:
on startup you need to load your libraries
if you call repeatedly and start repeatedly, you repeatedly load the libraries
you already mentioned a stateful solution (Rserve) which lets you start it once but connect and eval multiple times
So I think you answered your own question.
Otherwise, I enjoy littler and have shown how it starts faster than either R or Rscript -- but the fastest approach is simply not to restart.
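For completeness, here is a minimal sketch of the stateful Rserve approach mentioned above (6311 is the default Rserve port; the myLib calls are taken from the question, the rest is illustrative):
# One-time server start, loading the libraries once and keeping them in memory:
#   R -e 'library(myLib); Rserve::Rserve(args = "--no-save")'
# Per-call client, e.g. invoked from the JVM side instead of Rscript:
library(RSclient)
conn <- RS.connect(port = 6311)
res <- RS.eval(conn, myLib::exec(myLib::parse.cmd(c("arg1", "arg2"))))
RS.close(conn)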
I tried littler; it seems amazing, but it doesn't seem to work on R 4.0.
Rserve seems cool, but like you pointed out it seems to be overkill.
I ended up limiting the import to the functions I need.
For example:
library(dplyr, include.only = c("select", "mutate", "group_by", "summarise",
                                "filter", "%>%", "row_number", "left_join",
                                "rename"))

Start a new R session in knitr

How can I start a new R session in knitr? I would rather start a new session than use something like rm(list=ls()), because the two are not equivalent.
<<myname>>=
# some R code
@
<<another_chunk>>=
# start a new R session
# more R code
@
Okay, now I have something more substantial for you, inspired by an answer on the R-help list by Georg Ruß. He suggests three things to get R back to how it was at start-up; I've written this six-step manual for you.
First, you save a vector of the packages loaded at start-up (this should be done before anything else, before you run any other code),
foo <- .packages()
Second, when you want to reset R, as you also mention, you run
rm(list=ls())
to remove all objects. Then, third, you run,
bar <- .packages()
to get a vector of the currently loaded packages. Fourth, you take the difference,
foobar <- setdiff(bar, foo)
Fifth, you remove the difference with this work-around loop,
toRemove <- paste("package:", foobar, sep = "")
# or paste0("package:", foobar) in R 2.15.0 or higher
for (i in seq_along(foobar)) {
  detach(toRemove[i], character.only = TRUE)
}
Sixth, depending on your setup, you source your .Rprofile
source(".Rprofile")
This should put R into the state it was in when you started it. I could have overlooked something.
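Putting the six steps together as a helper function, a sketch of the same procedure (not a tested utility):
# Call once at start-up, before any other code:
startup_packages <- .packages()
# Call whenever you want to reset the session.
# Note: this also removes reset_session itself from the workspace.
reset_session <- function(baseline = startup_packages) {
  force(baseline)  # capture the baseline before the workspace is cleared
  rm(list = ls(envir = globalenv()), envir = globalenv())
  for (p in paste0("package:", setdiff(.packages(), baseline))) {
    detach(p, character.only = TRUE)
  }
  if (file.exists(".Rprofile")) source(".Rprofile")
}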
Instead of starting a new R session in knitr, I would recommend that you just start a new R session in your terminal (or command window) like this:
R -e "library(knitr); knit('your_input.Rnw')"
If you are under Windows, you have to put R's bin directory on your PATH environment variable (I'm very tired of describing how to do this, so google it yourself if you are in the Windows world, or see the LyX Sweave manual).
However, most editors do start a new R session when calling Sweave or knitr, e.g. LyX and RStudio. You can find more possible editors at http://yihui.name/knitr/demo/editors/. I do not really see the need to call R -e ... in the terminal.

Switch R script from non-interactive to interactive

I have an R script that takes command-line arguments, where the top line is:
#!/usr/bin/Rscript --slave
I wanted to interrupt execution in a function (so I can interactively use the data variables that have been loaded by that point to work out the next bit of code I need to write). I added this inside the function in question:
browser()
but it gets ignored. A bit of searching suggests this might be because the program is running in non-interactive mode. But even more searching has not turned up how to switch the script out of non-interactive mode so that browser() will work. Something like a browser_yes_I_really_mean_it() function.
P.S. I want to avoid altering the rest of the script if at all possible. My current approach is to copy and paste the code chunks needed to prepare the data into an interactive session; but as the script gets more and more complex, this is getting more and more unreasonable.
UPDATE: for anyone else with the same question, it appears the answer to the actual question is that it is impossible. Once you start R in a non-interactive mode the die is cast. The given answers are therefore workarounds: either you hack your code (remembering to unhack it afterwards), or you refactor to make debugging easier. (This comment is not intended as a criticism of the answers; the suggested refactoring makes the code cleaner anyway.)
Can you just fire up R and source the file instead?
R
source("script.R")
Following mdsumner's answer, I edited my script like this:
if(!exists("argv")){
argv=commandArgs(TRUE)
if(length(argv)!=4)usage_and_exit()
}else{
if(length(argv)!=4){
stop("Must set argv as a 4 element vector. E.g. argv=c(...)")
}
}
Then no other change was needed, and I was able to do:
R
> argv=c('a','b','c','d')
> source("script.R")
In addition to the previous answer, I'd create a top-level function (e.g. doStuff) which performs the analysis you want to run in batch. The function takes the command-line options as input. In the batch script you source the script that contains this function and call it. This way you can easily run the function in interactive mode and use e.g. browser().
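A minimal sketch of that pattern (doStuff and the argument handling are illustrative, not from the original script):
# analysis.R
doStuff <- function(input_file, n_reps) {
  dat <- read.csv(input_file)
  browser()  # stops here when called from an interactive session
  # ... the actual analysis ...
  dat
}
# Run automatically only in batch mode; in an interactive session,
# source("analysis.R") and call doStuff() by hand instead.
if (!interactive()) {
  argv <- commandArgs(TRUE)
  doStuff(argv[1], as.numeric(argv[2]))
}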
In some cases, the suggested solutions (workarounds) may not work, for example when the R code needs to run as part of an existing bash script. For those cases I suggest writing your R code into the bash script using a here document:
#!/bin/bash
R --interactive << EOT
# R code starts here
argv=c('a','b','c','d')
print(interactive())
# Rest of script contents
quit("no")
# R code ends here
EOT
This way, print(interactive()) above will yield TRUE.
Side note: make sure to avoid the $ character in your R code, as bash expands it inside an unquoted here document. For example, retrieve a column from a data.frame by using df[["X1"]] instead of df$X1 (alternatively, quoting the delimiter as <<'EOT' prevents the expansion).
