RStudio local job: "source" multiple scripts using "sapply" will return nothing

I have three scripts in the same directory: main.R, func1.R, and func2.R. The code is:
main.R:
rm(list = ls())
x <- 0
filelist <- c("func2.R", "func1.R")
print(ls())
sapply(filelist, source)
print(ls())
func1.R:
x1 <- 1
func2.R:
x2 <- 2
If I run main.R in RStudio, the output will be
[1] "filelist" "x"
[1] "filelist" "x" "x1" "x2"
This means the results of func1.R and func2.R are exported into the global environment. However, if I submit main.R as a local job in RStudio, the output will be
[1] "filelist" "x"
[1] "filelist" "x"
I know I can solve this by using a loop to source each script separately. I'm simply curious why sapply behaves differently in the console and in a local job, and how to make it work if I insist on using sapply to source all the scripts together. Thanks.

I think there is a misunderstanding of what is meant by a local job. Running a local job is a fairly new RStudio feature that lets the user start a job and run it locally in a separate process, so that it does not interfere with the user's own session. The main idea is that if you have a computationally heavy job, you start it elsewhere (for example on another machine) and collect the results when they are done. If you want the scripts to run in your own session, you should use the classic source command.
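As a minimal sketch of that suggestion (assuming main.R, func1.R, and func2.R sit in the current working directory), sourcing from your own interactive session puts the objects in your global environment:
# Run the scripts in the current interactive session instead of a local job;
# the sourced objects then show up in the global environment.
source("main.R")
ls()   # "filelist" "x" "x1" "x2"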

Related

Is there a way to debug an R script run from the command line with Rscript.exe

Is it possible to debug an R source file which is executed with Rscript.exe?
> Rscript.exe mysource.R various parameters
Ideally, I would like to be able to set a break-point somewhere in the mysource.R file in RStudio.
Is entering the R debugger directly at the command line possible for instance by adding some debug directive to the source file?
Maybe sourcing the file from R would work? How? How do I pass the command line arguments "various parameters" so that commandArgs() returns the correct values?
The mysource.R could look as follows (in practice it is much more complicated).
#!/usr/bin/Rscript
args <- commandArgs(trailingOnly=TRUE)
print(args)
As far as debugging from the command line is concerned, there are a few related questions without an answer:
"Is there a way to debug an RScript call in RStudio?" and "Rscript debug using command line"
So I am not sure whether something has changed and it is now possible.
However, we can debug by sourcing the file and using a small hack. Add browser() in the file wherever you want to stop. Consider your main file to be:
main.R
args <- commandArgs(trailingOnly=TRUE)
browser()
print(args)
Now we can override the commandArgs function so that it returns whatever arguments we want when the file is sourced.
calling_file.R
commandArgs <- function(...) list(7:9, letters[1:3])
source("main.R")
After running the source command, you could debug from there
Called from: eval(ei, envir)
#Browse[1]> args
#[[1]]
#[1] 7 8 9
#[[2]]
#[1] "a" "b" "c"
There's no native way to debug Rscript in the command line, but you can use a kind of hacky workaround I whipped up with readLines and eval.
ipdb.r <- function() {
  input <- ""
  while (!input %in% c("c", "cont", "continue")) {
    cat("ipdb.r>")
    # stdin connection so it works outside an interactive session
    input <- readLines("stdin", n = 1)
    if (!input %in% c("c", "cont", "continue", "exit")) {
      # evaluate in parent.frame() to run the command outside this function's environment
      print(eval(parse(text = input), parent.frame()))
    } else if (input == "exit") {
      stop("Exiting from ipdb.r...")
    }
  }
}
Example usage in an R file to be called with Rscript:
ipdbrtest.R
a <- 3
print(a)
ipdb.r()
print(a)
Command line:
$ Rscript ipdbrtest.R
[1] 3
ipdb.r>a+3
[1] 6
ipdb.r>a+4
[1] 7
ipdb.r>a <- 4
[1] 4
ipdb.r>c
[1] 4
If you're considering using R instead of Rscript, you could pass it environment variables and retrieve them with Sys.getenv().
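A rough sketch of that environment-variable idea (MYARGS is a hypothetical variable name chosen for this example, not anything standard): the script reads its "arguments" from the environment instead of the command line.
# envtest.R -- read "arguments" from an environment variable
args <- strsplit(Sys.getenv("MYARGS"), " ")[[1]]
print(args)
# invoked from a Unix shell, for example:
#   MYARGS="various parameters" R --no-save -q -f envtest.R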

Simultaneous R sessions within different directories

I am looking for a way to start a new R instance working in a user-defined directory from the current R session. For example, let's say I have
getwd()
## [1] "/Users/jplecavalier/projects/foo"
and I want to do something like
start_new_session("bar_session", "/Users/jplecavalier/projects/bar")
set_focus("bar_session")
getwd()
## [1] "/Users/jplecavalier/projects/bar"
kill_session("bar_session")
getwd()
## [1] "/Users/jplecavalier/projects/foo"
In other words, I want to be able to evaluate something in another R session that uses a different working directory, while the main R session waits. Is there any package/function/way of doing something like this?
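One possible sketch, assuming the callr package is available (the paths are the ones from the question): run the work in a one-shot child R process that changes its own working directory, while the parent keeps its own.
library(callr)
# The child process changes its working directory and returns a value;
# the parent session's working directory is untouched.
res <- callr::r(function(path) {
  setwd(path)
  getwd()
}, args = list("/Users/jplecavalier/projects/bar"))
res      # "/Users/jplecavalier/projects/bar"
getwd()  # still "/Users/jplecavalier/projects/foo"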

Accessing R objects from subprocess into parent process

In the context of teaching R programming, I am trying to run R scripts completely independently, so that I can compare the objects they have generated.
Currently, I do this with R environments:
student_env <- new.env()
solution_env <- new.env()
eval(parse(text = "x <- 4"), env = student_env)
eval(parse(text = "x <- 5"), env = solution_env)
student_env$x == solution_env$x
While this provides some encapsulation, it is far from complete. For example, if I execute a library() call in the student environment, the package is attached to the global R session's search path, making it available to code running in the solution environment as well.
To ensure complete separation, I could fire up subprocesses using the subprocess package:
library(subprocess)
rbin <- file.path(R.home("bin"), "R")
student_handle <- spawn_process(rbin, c('--no-save'))
solution_handle <- spawn_process(rbin, c('--no-save'))
process_write(student_handle, "x <- 4\n")
process_write(solution_handle, "x <- 5\n")
However, I'm not sure how to go about the step of fetching the R objects so I can compare them.
My questions:
Is subprocess a good approach?
If yes, how can I (efficiently!) grab the R representations of objects from a subprocess so I can compare the objects in the parent process? Python does this through pickling/dilling.
I could communicate through .rds files, but this is unnecessary file creation/reading.
In R, I came across RProtoBuf, but I'm not sure if it solves my problem.
If no, are there other approaches I should consider? I've looked into opencpu, but the concept of firing up a local server and then use R to talk to that server and get representations feels like too complex an approach.
Thanks!
Another possible approach is the callr package, which is popular and developed by a credible source: https://github.com/r-lib/callr#readme.
An example from there:
r(function() var(iris[, 1:4]))
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> Sepal.Length 0.6856935 -0.0424340 1.2743154 0.5162707
#> Sepal.Width -0.0424340 0.1899794 -0.3296564 -0.1216394
#> Petal.Length 1.2743154 -0.3296564 3.1162779 1.2956094
#> Petal.Width 0.5162707 -0.1216394 1.2956094 0.5810063
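Applied to the question (a rough sketch, not a complete grading setup), each snippet can be run in its own fresh R process and the resulting objects compared in the parent:
library(callr)
# Each call starts a fresh R process, runs the function there and
# returns the result to the parent process.
student_x  <- callr::r(function() { x <- 4; x })
solution_x <- callr::r(function() { x <- 5; x })
identical(student_x, solution_x)  # FALSE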
I'd use RServe as it lets you run multiple R sessions and control them all from the master R session. You can run commands in those sessions in any given (interwoven) order and access objects stored there in the native format.
subprocess was created to run and control any arbitrary program via its command-line interface, so I never intended to add an object-passing mechanism. However, if I were to access objects from child processes, I'd do it via saveRDS and readRDS.
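A rough sketch of that saveRDS/readRDS idea, reusing the student_handle from the question (the temporary file and the Sys.sleep() pause are illustrative assumptions, not part of the subprocess API):
# Ask the child session to serialize its object to a temporary file,
# then read it back in the parent as a native R object.
tmp <- normalizePath(tempfile(fileext = ".rds"), winslash = "/", mustWork = FALSE)
process_write(student_handle, sprintf("saveRDS(x, '%s')\n", tmp))
Sys.sleep(1)                 # crude: give the child time to execute
student_x <- readRDS(tmp)    # ready for comparison in the parent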

Debugging a function in a different source file in R

I'm using RStudio and I want to be able to stop the code execution at a specific line.
The functions are defined in the first script file and called from a second.
I source the first file into the second one using source("C:/R/script1.R")
I used "Run from beginning to line": I start running from the second script, which has the function calls, and have highlighted a line in the first script, where the function definitions are.
I then use browser() to view the variables. However, this is not ideal as there are some large matrices involved. Is there a way to make these variables appear in RStudio's workspace?
Also, when I restart using "Run from line to end", it only runs to the end of the called first script file; it does not return to the calling function and complete the run of the second file.
How can I achieve these goals in RStudio?
OK, here is a trivial example. The function adder below is defined in one script:
adder <- function(a, b) {
  browser()
  return(a + b)
}
I then call it from a second script:
x=adder(3,4)
When adder is called from the second script, it starts browser() in the first one. From here I can use get("a") to get the value of a, but the values of a and b do not appear in the workspace in RStudio.
In this example it does not really matter, but when you have several large matrices it does.
If you assign the data into the .GlobalEnv, it will be shown in RStudio's "Workspace" tab.
> adder(3, 4)
Called from: adder(3, 4)
Browse[1]> a
[1] 3
Browse[1]> b
[1] 4
Browse[1]> assign('a', a, pos=.GlobalEnv)
Browse[1]> assign('b', b, pos=.GlobalEnv)
Browse[1]> c
[1] 7
> a
[1] 3
> b
[1] 4
What you refer to as RStudio's workspace is the global environment in an R session. Each function lives in its own small environment, not exposing its local variables to the global environment. Therefore a is not present in the object inspector of RStudio.
This is good programming practice as it shields sections of a larger script from each other, reducing the amount of unwanted interaction. For example, if you use i as a counter in one function, this does not influence the value of a counter i in another function.
You can inspect a when you are in the browser session by using any of the usual functions. For example,
head(a)
str(a)
summary(a)
View(a)
attributes(a)
One common tactic after calling browser() is to get a summary of all variables in the current environment. Make it a habit: every time you stop code with browser(), immediately type ls.str() at the command line.
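For the adder(3, 4) example above, the output would look roughly like this (sketched for illustration, not captured from a session):
Browse[1]> ls.str()
a :  num 3
b :  num 4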

python rpy2 module: refresh global R environment

The documentation for rpy2 states that the robjects.r object gives access to an R global environment. Is there a way to "refresh" this global environment to its initial state?
I would like to be able to restore the global environment to the state it was in when the rpy2.robjects module was imported but not yet used. In this manner, I don't have to worry about memory leaks on long running jobs or other unexpected side effects. Yes, refreshing the environment could introduce a different category of bug, but I believe in my case it will be a win.
Taking your question literally, if you just want to clear out .GlobalEnv, you can do that with a single line:
rm(list = ls(all.names=TRUE))
The all.names=TRUE bit is necessary because some object names are not returned by vanilla ls(). For example:
x <- rnorm(5)
ls()
# [1] "x"
# Doesn't remove objects with names starting with "."
rm(list=ls())
ls(all.names = TRUE)
# [1] ".Random.seed"
# Removes all objects
rm(list = ls(all.names=TRUE))
ls(all.names = TRUE)
# character(0)
There is only /one/ "global environment" in R; it is initialized when R starts. You can clear out its members, as Josh points out, but if you find that you need to do this, it might mean that you'd be better off instantiating new environments and either switching between them or deleting them when no longer needed.
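A minimal sketch of that approach (the name work_env is just illustrative): do the work inside a throwaway environment instead of .GlobalEnv and drop it when finished.
# Evaluate code inside a throwaway environment instead of .GlobalEnv
work_env <- new.env(parent = globalenv())
eval(quote({
  x <- rnorm(5)
  y <- mean(x)
}), envir = work_env)
ls(work_env)   # "x" "y" -- the global environment stays untouched
rm(work_env)   # discarding the environment discards its contents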
