`objects()` in R vs. Splus

I would like to write a script main.r that returns the workspace to the state it was in before being run (i.e., at the end of the script, remove all and only the objects that had been added to the workspace). Running the following:
#main.r
initial.objects <- objects()
tmp1 <- 1
remove(list = setdiff(objects(), initial.objects))
via source('main.r') from the R console works as desired. HOWEVER, this does NOT work in Splus with tmp1 being left in the working directory (it does work when I run each line individually rather than sourcing the entire file). Investigating a little further, I found that in R objects() keeps track of the objects entering the workspace even in the MIDDLE of a call to source(). In Splus, objects() doesn't seem to "know" about the objects that have been added to the workspace until the END of a source() call.
Q: What's going on? What can I do to get something similar to main.r working in Splus?

I'm not sure what you're trying to do here, but the best way to restore an environment is to save it first and reload it at the end.
save.image("pre-environ.Rdata")
## Your script goes here
rm(list=ls()) ## clean the environment
## Reload the original environ at end of your script
load("pre-environ.Rdata")

Related

Restart R session without interrupting the for loop

In my for loop, I need to free RAM, so I delete some objects with the rm() command. Then I call gc(), but the RAM usage stays the same.
So I use .rs.restartR() instead of gc() and it works: a sufficient part of my RAM is freed after the restart of the R session.
My problem is that the for loop is interrupted by the R restart. Do you have an idea to automatically continue the for loop after the .rs.restartR() command?
I just stumbled across this post as I'm having a similar problem with rm() not clearing memory as expected. Like you, if I kill the script, remove everything using rm(list=ls(all.names=TRUE)) and restart, the script takes longer than it did initially. However, restarting the session using .rs.restartR() then sourcing again works as expected. As you say, there is no way to 'refresh' the session while inside a loop.
My solution was to write a simple bash script which calls my .r file.
Say you have a loop in R that runs from 1 to 3 and you want to restart the session after each iteration. My bash script 'runR.sh' could read as follows:
#!/bin/bash
for i in {1..3}
do
echo "Rscript myRcode.r $i" #check call to r script is as expected
Rscript myRcode.r $i
done
Then at the top of 'myRcode.r':
args <- commandArgs()
print(args) #list the command line arguments.
myvar <- as.numeric(args[6])
and remove your for (myvar in...){}, keeping just the contents of the loop.
You'll see from print(args) that your input from your shell script is the 6th element of the array, hence args[6] in the following line when assigning your variable. If you're passing in a string, e.g. a file name, then you don't need as.numeric of course.
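As an aside (not part of the original answer), commandArgs(trailingOnly = TRUE) returns only the arguments that follow the script name, which avoids the magic index:
args <- commandArgs(trailingOnly = TRUE)  # just the user-supplied arguments
myvar <- as.numeric(args[1])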
Running ./runR.sh will then call your script and hopefully solve your memory problem. The only minor issue is that you have to reload your packages each time, unlike when using .rs.restartR(), and may have to repeat other bits that ordinarily would only be run once.
It works in my case, I would be interested to hear from other more seasoned R/bash users whether there are any issues with this solution...
By saving the iteration as an external file, and writing an rscript which calls itself, the session can be restarted within a for loop from within rstudio. This example requires the following steps.
#Save the iteration as a separate .RData file in the working directory.
iter <- 1
save(iter, file="iter.RData")
Create a script which calls itself for a certain number of iterations. Save the following script as "test_script.R".
###load iteration
library(rstudioapi)
load("iter.RData")
###insert function here.
time_now <- Sys.time()
###save output of function to a file.
save(time_now, file=paste0("time_", iter, ".Rdata"))
###update iteration
iter <- iter+1
save(iter, file="iter.RData")
###restart session calling the script again
if (iter < 5) {
  restartSession(command = 'source("test_script.R")')
}
Do you have an idea to automatically continue the for loop after the .rs.restartR() command?
It is not possible.
Okay, you could configure your R system to do something like this, but it sounds like a bad idea. I'm not really sure whether you want to restart the for loop from the beginning or pick it up where it left off. (I'm also very confused that you seem to have been able to enter commands in the R console while a for loop was executing. I suspect there's more that you are not telling us.)
You can use your Rprofile.site file to automatically run commands when R starts. You could set it up to automatically run your for loop code whenever R starts. But this seems like a bad idea. I think you should find a different sort of fix for your problem.
Some of the things you could do to help the situation: have your for loop write output for each iteration to disk and also write some sort of log to disk so you know where you left off. Maybe write a function around your for loop that takes an argument of where to start, so that you can "jump in" at any point.
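A minimal sketch of that checkpointing idea (heavy_step() and the file names are hypothetical):
run_iterations <- function(start = 1, n = 100) {
  for (i in start:n) {
    result <- heavy_step(i)                      # hypothetical workhorse function
    saveRDS(result, sprintf("result_%03d.rds", i))
    writeLines(as.character(i), "last_done.txt") # crude progress log
  }
}
## to jump back in after a crash or restart:
start <- if (file.exists("last_done.txt")) as.numeric(readLines("last_done.txt")) + 1 else 1
run_iterations(start)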
With this approach, rather than "restarting R and automatically picking up the loop", a better bet would be to use Rscript (or similar) and use R or the command line to sequentially run each iteration (or batch of iterations) in its own R session.
The best fix would be to solve the memory issue without restarting. There are several questions on SO about memory management - try the answers out and if they don't work, make a reproducible example and ask a new question.
You can make your script recursive by having it source itself after restarting the session.
Make sure the script takes the current status of the loop into account: save that status to an .rds file before restarting the session, then read the file back at the top of the script after the restart. This lets the loop pick up where it was before the R session restarted.
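A rough sketch of that pattern (file names hypothetical; it mirrors the test_script.R example above):
## read the saved loop state, or start fresh
state <- if (file.exists("state.rds")) readRDS("state.rds") else 1
## ... one iteration of the heavy work, using `state`, goes here ...
saveRDS(state + 1, "state.rds")
if (state + 1 <= 10) {
  rstudioapi::restartSession(command = 'source("my_script.R")')
}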
I just found out about this command 'restartSession'. I'm using it because I was also running into memory consumption issues as the garbage collector will not give back the RAM to the OS (Linux).
library(rstudioapi)
restartSession(command = "print('x')")
An approach independent of RStudio:
If you want to run this in RStudio, use the terminal rather than the R console; otherwise you would have to use rstudioapi::restartSession() as in the other answers, which is not recommended here (it crashes).
Create the iterator and load the script (in a system terminal this would be):
R -e 'saveRDS(1,"i.rds"); source("Script.R")'
Script.R file:
# read files and iterator
i<-readRDS("i.rds")
print(i)
# open process id of previous loop to kill it
tryCatch(pid <- readRDS(file="pid.rds"), error=function(e){NA} )
if (exists("pid")){
library(tools)
tools::pskill(pid, SIGKILL)
}
# update objects and iterator
i <- i+1
# process
pid <- Sys.getpid()
# save files and iterator
saveRDS(i, file="i.rds")
# process ID to close it in next loop
saveRDS(pid, file="pid.rds")
### restart session calling the script again
if (i <= 20) {
  print(paste("Processing of", i - 1, "ended, restarting"))
  assign('.Last', function() { system('Rscript Script.R') })
  q(save = 'no')
}

Running R via terminal in Windows and leaving R session open

Suppose that I have a R script called test.R, stored at C:\, with the following content:
x <- "Hello Stackoverflowers"
print(x)
To run it via terminal one could simply call:
Rscript C:\test.R
And as expected, the result will be:
[1] "Hello Stackoverflowers"
However, what I wonder is whether there is a way to run test.R via the Windows console and then stay inside the executed R session, instead of being dropped back to the console cursor at C:\R\R-3.4.1\bin>.
For instance, when running Python code with python.exe I can easily accomplish a similar thing by passing the -i parameter to the python.exe call.
How could I do that with R?
Add this to your .Rprofile:
STARTUP_FILE <- Sys.getenv("STARTUP_FILE")
if (file.exists(STARTUP_FILE)) source(STARTUP_FILE)
then set the indicated environment variable outside of R and run R, e.g. from the Windows cmd line:
set STARTUP_FILE=C:\test.R
R
... R session ...
q()
Variations
There are many variations of this. For example, we could make a copy of the .Rprofile file in a specific directory such as ~/test, say, and add this code to that copy
source("~/test/test.R")
in which case R would only run test.R if R were started in that directory.

Run multiple R scripts with exiting/restarting in between on Linux

I have a series of R scripts for doing the multiple steps of data analysis that I require. Some of these take a very long time and create really large objects. I've noticed that if I just source all of them in a row (via a main.R script), the processing for later steps takes much longer than if I source one script, save what I need, and restart R for the next step (loading the data I need).
I was wondering if there was a way, via Rscript or a Bash script perhaps, that I could carry this out. There would need to be objects that persist for the first 2 scripts (which load my external data and create the objects that will be used for all further steps). I suppose I could also just save those and load them in further scripts.
(I would also like to pass a number of named arguments to this script, which I think I can find on other SO posts, and can use something like optparse.)
So, the script would look something like this, I think:
#! /bin/bash
Rscript 01_load.R # Objects would persist, ideally
Rscript 02_create_graphs.R # Objects would persist, ideally
Rscript 03_random_graphs.R # contains code to save objects
#exit R
Rscript 04_permutation_analysis.R # would have to contain code to load data
#exit
And so on. Is there a solution to this? I'm using R 3.2.2 on 64-bit CentOS 6. Thanks.
Chris,
it sounds like you should do some manual housekeeping between (or within) your steps by using gc() and maybe also rm(). For more details see help(gc) and help(rm).
So instead of exiting R and restarting it you could do:
rm(list = ls())
gc()
But please note: rm(list = ls()) would throw away all of your objects. Better to create a list of the objects you really want to discard and pass that list to rm().
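For example (the object names here are hypothetical), a selective cleanup between steps might look like:
keep <- c("raw_data", "graphs")                 # objects still needed downstream
rm(list = setdiff(ls(all.names = TRUE), keep))  # drop everything else
gc()                                            # then collect the garbage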

Making knitr run a r script: do I use read_chunk or source?

I am running R version 2.15.3 with RStudio version 0.97.312. I have one script that reads my data from various sources and creates several data.tables. I then have another r script which uses the data.tables created in the first script. I wanted to turn the second script into a R markdown script so that the results of analysis can be outputted as a report.
I do not know the purpose of read_chunk, as opposed to source. My read_chunk is not working, but source is working. With either instance I do not get to see the objects in my workspace panel of RStudio.
Please explain the difference between read_chunk and source. Why would I use one or the other? And why will my .Rmd script not work?
Here is a ridiculously simplified sample. It does not work; I get the following message:
Error: object 'z' not found
Two simple files...
test of source to rmd.R
x <- 1:10
y <- 3:4
z <- x*y
testing source.Rmd
Can I run another script from Rmd
========================================================
Testing if I can run "test of source to rmd.R"
```{r first part}
require(knitr)
read_chunk("test of source to rmd.R")
a <- z-1000
a
```
The above worked only if I replaced "read_chunk" with "source". I can use the vectors outside of the code chunk, as in inline usage. So here I will tell you that the first number is `r a[1]`. The most interesting thing is that I cannot see the variables in the RStudio workspace, but they must be there somewhere.
read_chunk() only reads the source code (for future reference); it does not evaluate code like source() does. The purpose of read_chunk() was explained in this page as well as the manual.
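For a concrete picture of why the sample above fails (the chunk label is made up): read_chunk() expects `## ---- label ----` markers in the script, and an empty chunk with a matching label then evaluates that code. The two files could be fixed like this. In "test of source to rmd.R":
## ---- make-z ----
x <- 1:10
y <- 3:4
z <- x*y
And in the .Rmd:
```{r read-script, include=FALSE}
require(knitr)
read_chunk("test of source to rmd.R")
```
```{r make-z}
```
After the empty `make-z` chunk runs, z exists and can be used in later chunks or inline.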
There isn't an option to run a chunk interactively from within knitr AFAIK. However, this can be done easily enough with something like:
#' Run a previously loaded chunk interactively
#'
#' Takes labeled code loaded with read_chunk() and runs it in the global
#' environment (unless otherwise specified).
#'
#' @param chunkName The name of the chunk as a character string
#' @param envir The environment in which the chunk is to be evaluated
run_chunk <- function(chunkName, envir = .GlobalEnv) {
  chunkName <- unlist(lapply(as.list(substitute(.(chunkName)))[-1], as.character))
  eval(parse(text = knitr:::knit_code$get(chunkName)), envir = envir)
}
In case it helps anyone else, I've found using read_chunk() to read a script without evaluating can be useful in two ways. First, you might have a script with many chunks and want control over which ones run where (e.g., a plot or a table in a specific place). I use source when I want to run everything in a script (for example, at the start of a document to load a standard set of packages or custom functions). I've started using read_chunk early in the document to load scripts and then selectively run the chunks I want where I need them.
Second, if you are working with an R script directly or interactively, you might want a long preamble of code that loads packages, data, etc. Such a preamble, however, could be unnecessary and slow if, for example, prior code chunks in the main document already loaded data.

start a new R session in knitr

How can I start a new R session in knitr? I would rather start a new session than use something like rm(list=ls()), because it is not equivalent.
<<myname>>=
# some R code
@
<<another_chunk>>=
# start a new R session
# more R code
@
Okay, now I have something more substantial for you, inspired by an answer on the R-help list by Georg Ruß. He suggests three things to get R back to how it was at start-up; I've written this six-step manual for you.
First, you save a character vector of the packages you have attached at start-up (this should be done before anything else, before you run any other code),
foo <- .packages()
Second, when you want to reset R, as you also mention, you run
rm(list=ls())
to remove all objects. Then, third, you run,
bar <- .packages()
to get a string of current packages. Followed by,
foobar <- setdiff(bar, foo)
Fifth, you remove the difference with this work-around loop,
toRemove <- paste("package:", foobar, sep = '')
# or paste0("package:", foobar) in R-2.15.0 or higher
for (i in seq_along(foobar)) {
  detach(toRemove[i], character.only = TRUE)
}
Sixth, depending on your setup, you source your .Rprofile
source(".Rprofile")
This should put R into the state it was in when you started it. I could have overlooked something.
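Pulling those six steps into one function, as an untested sketch (detaching can still fail for packages whose namespaces other loaded packages depend on):
reset_session <- function(initial_pkgs) {
  # drop all objects from the global environment
  rm(list = ls(envir = .GlobalEnv, all.names = TRUE), envir = .GlobalEnv)
  # detach packages attached since start-up
  extra <- setdiff(.packages(), initial_pkgs)
  for (p in paste0("package:", extra)) {
    detach(p, character.only = TRUE)
  }
  # re-run the profile, if present
  if (file.exists(".Rprofile")) source(".Rprofile")
}
foo <- .packages()  # capture right after start-up
## ... work ...
reset_session(foo)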
Instead of starting a new R session in knitr, I would recommend you just to start a new R session in your terminal (or command window) like this:
R -e "library(knitr); knit('your_input.Rnw')"
If you are under Windows, you have to put the bin directory of R into your environment variable PATH (I'm very tired of describing how to do this, so google it by yourself if you are in the Windows world, or see the LyX Sweave manual).
However, most editors do start a new R session when calling Sweave or knitr, e.g. LyX and RStudio, etc. You can find more possible editors at http://yihui.name/knitr/demo/editors/. I do not really see the need to call R -e ... in the terminal.
