Simultaneous R sessions within different directories

I am looking for a way to start a new R instance that works in a user-defined directory from the current R session. For example, let's say I have
getwd()
## [1] "/Users/jplecavalier/projects/foo"
and I want to do something like
start_new_session("bar_session", "/Users/jplecavalier/projects/bar")
set_focus("bar_session")
getwd()
## [1] "/Users/jplecavalier/projects/bar"
kill_session("bar_session")
getwd()
## [1] "/Users/jplecavalier/projects/foo"
In other words, I want to be able to evaluate something in another R session that uses a different working directory, while the main R session waits. Is there any package/function/way of doing something like this?
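One concrete way to get close to this, assuming the callr package is available (the package is not mentioned in the question), is to evaluate code in a child R process that works in the other directory while the main session waits:
library(callr)
res <- r(function(path) {
  setwd(path)  # the child process switches to the other project directory...
  getwd()      # ...and evaluates whatever you need there
}, args = list("/Users/jplecavalier/projects/bar"))
res
## [1] "/Users/jplecavalier/projects/bar"
getwd()
## [1] "/Users/jplecavalier/projects/foo"
For something closer to the start_new_session()/kill_session() semantics, callr also provides persistent background sessions via callr::r_session.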

RStudio local job: "source" multiple scripts using "sapply" will return nothing

I have three scripts in the same directory: main.R, func1.R, func2.R. The codes are
main.R:
rm(list = ls())
x <- 0
filelist <- c("func2.R", "func1.R")
print(ls())
sapply(filelist, source)
print(ls())
func1.R:
x1 <- 1
func2.R:
x2 <- 2
If I run main.R in RStudio, the output will be
[1] "filelist" "x"
[1] "filelist" "x" "x1" "x2"
This means the results of func1.R and func2.R are exported into the global environment. However, if I submit main.R as a local job in RStudio, the output will be
[1] "filelist" "x"
[1] "filelist" "x"
I know I can solve this by using a loop to source each script separately. I'm simply curious why the sapply function behaves differently in the console and in a local job, and how to make it work if I insist on using sapply to source all the scripts together. Thanks.
I think there is a misunderstanding of what is meant by a local job. The "run as local job" feature is quite new in RStudio and lets the user start a job and run it locally somewhere so that it does not interfere with the user's own session. The main idea here is that if you have some computationally heavy job, you start it on some other machine and then collect the results when it is done. If you want the scripts to run in your own session, you should use the classic source command.
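For example, a minimal sketch of that approach, using the file names from the question:
# run in the console (or via the classic source("main.R")) rather than as a local job
for (f in c("func2.R", "func1.R")) {
  source(f)  # each script is evaluated in the calling session's global environment
}
print(ls())  # x1 and x2 are now visible alongside x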

Remove a library from .libPaths() permanently without Rprofile.site

How can I permanently remove a library in R?
.libPaths()
[1] "\\\\per-homedrive1.corp.riotinto.org/homedrive$/Tommy.O'Dell/R/win-library/2.15"
[2] "C:/Program Files/R/R-2.15.2/library"
[3] "C:/Program Files/RStudio/R/library"
The first item is my corporate "My Documents" folder, and the apostrophe in the path from my surname is causing all kinds of grief when using R CMD INSTALL --build on a package I'm making, not to mention issues using packages installed there when I'm offline from the network.
I want to use C:/Program Files/R/R-2.15.2/library as the default instead, but I don't want to have to rely on an Rprofile.site.
What I've tried
> .libPaths(.libPaths()[2:3])
> .libPaths()
[1] "C:/Program Files/R/R-2.15.2/library" "C:/Program Files/RStudio/R/library"
That seems to work, but only until I restart my R session, and then I'm back to the original .libPaths() output...
Restarting R session...
> .libPaths()
[1] "\\\\per-homedrive1.corp.riotinto.org/homedrive$/Tommy.O'Dell/R/win-library/2.15"
[2] "C:/Program Files/R/R-2.15.2/library"
[3] "C:/Program Files/RStudio/R/library"
I thought maybe .libPaths() was using R_LIBS_USER
> Sys.getenv("R_LIBS_USER")
[1] "//per-homedrive1.corp.riotinto.org/homedrive$/Tommy.O'Dell/R/win-library/2.15"
So I've tried to unset it using Sys.unsetenv("R_LIBS_USER") but it doesn't persist between sessions.
Additional Info
If it matters, here are some environment variables that might be relevant...
> Sys.getenv("R_HOME")
[1] "C:/PROGRA~1/R/R-215~1.2"
> Sys.getenv("R_USER")
[1] "//per-homedrive1.corp.riotinto.org/homedrive$/Tommy.O'Dell"
> Sys.getenv("R_LIBS_USER")
[1] "//per-homedrive1.corp.riotinto.org/homedrive$/Tommy.O'Dell/R/win-library/2.15"
> Sys.getenv("R_LIBS_SITE")
[1] ""
I've tried Sys.unsetenv("R_LIBS_USER") but this also doesn't stick between sessions
Just set the environment variable R_LIBS in Windows to something like
R_LIBS=C:/Program Files/R/R-2.15.2/library
Restart R.
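If you prefer to keep this setting with your R configuration rather than in the Windows system environment variables, a .Renviron file in your home directory (or in the project directory) also works, since R reads it at start-up. A minimal sketch, assuming the same library path as above:
# ~/.Renviron
R_LIBS="C:/Program Files/R/R-2.15.2/library"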
This is a bit of a late response to the question, but it might be useful for others.
In order to set up my own path (and remove one of the original ones) I have:
used .libPaths() inside R to check current library paths;
identified which paths to keep. In my case, I kept R's original library but removed the link to My Documents;
found R-Home path using R.home() or Sys.getenv("R_HOME");
R-Home\R-3.2.2\etc\Rprofile.site is read every time the R kernel starts. Therefore, any modification there will persist across every run of R.
Edited Rprofile.site by adding the following,
.libPaths(.libPaths()[2])
.libPaths("d:/tmp/R/win-library/3.2")
How does it work?
The first line removes all but one path (the second from the original list); the second line adds an additional path. We end up with two paths.
Note that I use Unix path notation despite being on Windows. R always uses Unix notation, regardless of operating system.
restarted R (using Ctrl+Shift+F10)
This will work every time now.
Use this function in .Rprofile
set_lib_paths <- function(lib_vec) {
  lib_vec <- normalizePath(lib_vec, mustWork = TRUE)
  # take a copy of .libPaths() and give it an environment in which
  # .Library and .Library.site are empty, so the default library
  # trees cannot be appended back onto the path list
  shim_fun <- .libPaths
  shim_env <- new.env(parent = environment(shim_fun))
  shim_env$.Library <- character()
  shim_env$.Library.site <- character()
  environment(shim_fun) <- shim_env
  shim_fun(lib_vec)
}
set_lib_paths("~/code/library") # where "~/code/library" is your package directory
Original Source: https://milesmcbain.xyz/hacking-r-library-paths/
I have put the Sys.unsetenv("R_LIBS_USER") command in a .Rprofile file in my Windows "My Documents" folder. It seems to help. My problem was that being in an Active Directory environment made R startup and package loading incredibly slow when connected via VPN.
If you want to do this in the Rprofile file itself (under library/base/R/ in your R installation), you can search for the lines where the R_LIBS_* environment variables are set (e.g. Sys.setenv(R_LIBS_SITE=....) and Sys.setenv(R_LIBS_USER=.....)).
You can also search for the .libPaths() call, which sets the library tree. You can then achieve your goal by a combination of commenting out, unsetting, and setting the R_LIBS variables before the .libPaths() call, as you wish. For example, something like:
Sys.unsetenv("R_LIBS")
Sys.unsetenv("R_LIBS_USER")
Sys.setenv(R_LIBS_SITE = "D:/R/libs/site")
Sys.setenv(R_LIBS_USER = "D:/R/libs/user")
Sys.setenv(R_LIBS = "D:/R/libs")

Debugging a function in a different source file in R

I'm using RStudio and I want to be able to stop the code execution at a specific line.
The functions are defined in the first script file and called from a second.
I source the first file into the second one using source("C:/R/script1.R")
I used "Run from beginning to line": I start running from the second script, which has the function calls, and I have highlighted a line in the first script, where the function definitions are.
I then use browser() to view the variables. However, this is not ideal, as there are some large matrices involved. Is there a way to make these variables appear in RStudio's workspace?
Also, when I restart using "Run from line to end", it only runs to the end of the called first script file; it does not return to the calling function and complete the run of the second file.
How can I achieve these goals in RStudio?
OK, here is a trivial example. The function adder below is defined in one script:
adder <- function(a, b) {
  browser()
  return(a + b)
}
I then call it from a second script:
x <- adder(3, 4)
When adder is called from the second script, it starts browser() in the first one. From there I can use get("a") to get the value of a, but the values of a and b do not appear in the workspace pane in RStudio.
In the example here it does not really matter, but when you have several large matrices it does.
If you assign the data into the .GlobalEnv, it will be shown in RStudio's "Workspace" tab.
> adder(3, 4)
Called from: adder(3, 4)
Browse[1]> a
[1] 3
Browse[1]> b
[1] 4
Browse[1]> assign('a', a, pos=.GlobalEnv)
Browse[1]> assign('b', b, pos=.GlobalEnv)
Browse[1]> c
[1] 7
> a
[1] 3
> b
[1] 4
What you refer to as RStudio's workspace is the global environment in an R session. Each function lives in its own small environment, not exposing its local variables to the global environment. Therefore a is not present in the object inspector of RStudio.
This is good programming practice as it shields sections of a larger script from each other, reducing the amount of unwanted interaction. For example, if you use i as a counter in one function, this does not influence the value of a counter i in another function.
You can inspect a when you are in the browser session by using any of the usual functions. For example,
head(a)
str(a)
summary(a)
View(a)
attributes(a)
One common tactic after calling browser is to get a summary of all variables in the current (parent) environment. Make it a habit that every time you stop code with browser, you immediately type ls.str() at the command line.
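With the adder() example from the question, that habit looks roughly like this (the output shown is what you would expect for a = 3 and b = 4):
Browse[1]> ls.str()
a :  num 3
b :  num 4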

hiding personal functions in R

I have a few convenience functions in my .Rprofile, such as this handy function for returning the size of objects in memory. Sometimes I like to clean out my workspace without restarting, and I do this with rm(list=ls()), which deletes all my user-created objects AND my custom functions. I'd really like to not blow up my custom functions.
One way around this seems to be creating a package with my custom functions so that my functions end up in their own namespace. That's not particularly hard, but is there an easier way to ensure custom functions don't get killed by rm()?
Combine attach and sys.source to source into an environment and attach that environment. Here I have two functions in file my_fun.R:
foo <- function(x) {
  mean(x)
}

bar <- function(x) {
  sd(x)
}
Before I load these functions, they are obviously not found:
> foo(1:10)
Error: could not find function "foo"
> bar(1:10)
Error: could not find function "bar"
Create an environment and source the file into it:
> myEnv <- new.env()
> sys.source("my_fun.R", envir = myEnv)
Still not visible as we haven't attached anything
> foo(1:10)
Error: could not find function "foo"
> bar(1:10)
Error: could not find function "bar"
and when we do so, they are visible, and because we have attached a copy of the environment to the search path the functions survive being rm()-ed:
> attach(myEnv)
> foo(1:10)
[1] 5.5
> bar(1:10)
[1] 3.027650
> rm(list = ls())
> foo(1:10)
[1] 5.5
I still think you would be better off with your own personal package, but the above might suffice in the meantime. Just remember that the copy on the search path is just that, a copy. If the functions are fairly stable and you're not editing them, then the above might be useful, but it is probably more hassle than it is worth if you are developing the functions and modifying them.
A second option is to just name them all .foo rather than foo, as ls() will not return objects named like that unless the argument all = TRUE is set:
> .foo <- function(x) mean(x)
> ls()
character(0)
> ls(all = TRUE)
[1] ".foo" ".Random.seed"
Here are two ways:
1) Have each of your function names start with a dot, e.g. .f instead of f. ls will not list such functions unless you use ls(all.names = TRUE); therefore they won't be passed to your rm command.
or,
2) Put this in your .Rprofile
attach(list(
  f = function(x) x,
  g = function(x) x * x
), name = "MyFunctions")
The functions will appear as a component named "MyFunctions" on your search list rather than in your workspace, and they will be accessible almost the same as if they were in your workspace. search() will display your search list and ls("MyFunctions") will list the names of the functions you attached. Since they are not in your workspace, the rm command you normally use won't remove them. If you do wish to remove them, use detach("MyFunctions").
Gavin's answer is wonderful, and I just upvoted it. Merely for completeness, let me toss in another one:
R> q("no")
followed by
M-x R
(in Emacs with ESS) to create a new session, which re-reads the .Rprofile. Easy, fast, and cheap.
Other than that, private packages are the way in my book.
Another alternative: keep the functions in a separate file which is sourced within .RProfile. You can re-source the contents directly from within R at your leisure.
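A minimal sketch of that arrangement (the file name is hypothetical):
# in ~/.Rprofile
source("~/R/my_functions.R")
# later, after cleaning up with rm(list = ls()), just re-source at the prompt:
source("~/R/my_functions.R")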
I find that often my R environment gets cluttered with various objects when I'm creating or debugging a function. I wanted a way to efficiently keep the environment free of these objects while retaining personal functions.
The simple function below was my solution. It does 2 things:
1) deletes all non-function objects that do not begin with a capital letter and then
2) saves the environment as an RData file
(requires the R.oo package)
cleanup <- function(filename = "C:/mymainR.RData") {
  library(R.oo)
  # create a data frame listing all personal objects
  everything <- ll(envir = 1)
  # get the objects that are not functions
  nonfunction <- as.vector(everything[everything$data.class != "function", 1])
  # non-function objects that do not begin with a capital letter should be deleted
  # (anchored with ^ so that only names *starting* with a lower-case letter match)
  trash <- nonfunction[grep("^[[:lower:]]", nonfunction)]
  remove(list = trash, pos = 1)
  # save the R environment
  save.image(filename)
  print(paste("New, CLEAN R environment saved in", filename))
}
In order to use this function 3 rules must always be kept:
1) Keep all data external to R.
2) Use names that begin with a capital letter for non-function objects that I want to keep permanently available.
3) Obsolete functions must be removed manually with rm.
Obviously this isn't a general solution for everyone...and potentially disastrous if you don't live by rules #1 and #2. But it does have numerous advantages: a) fear of my data getting nuked by cleanup() keeps me disciplined about using R exclusively as a processor and not a database, b) my main R environment is so small I can backup as an email attachment, c) new functions are automatically saved (I don't have to manually manage a list of personal functions) and d) all modifications to preexisting functions are retained. Of course the best advantage is the most obvious one...I don't have to spend time doing ls() and reviewing objects to decide whether they should be rm'd.
Even if you don't care for the specifics of my system, the "ll" function in R.oo is very useful for this kind of thing. It can be used to implement just about any set of clean up rules that fit your personal programming style.
Patrick Mohr
An nth, quick and dirty option would be to use lsf.str() when calling rm(), to get all the functions in the current workspace and keep them out of the removal... and it lets you name the functions as you wish.
# build a "^name$" alternative for every function currently in the workspace
pattern <- paste0("^", lsf.str(), "$", collapse = "|")
# remove everything except the objects whose names match a function name
rm(list = ls()[-grep(pattern, ls())])
I agree, it may not be the best practice, but it gets the job done! (and I have to selectively clean after myself anyway...)
Similar to Gavin's answer, the following loads a file of functions but without leaving an extra environment object around:
if('my_namespace' %in% search()) detach('my_namespace'); source('my_functions.R', attach(NULL, name='my_namespace'))
This removes the old version of the namespace if it was attached (useful for development), then attaches an empty new environment called my_namespace and sources my_functions.R into it. If you don't remove the old version you will build up multiple attached environments of the same name.
Should you wish to see which functions have been loaded, look at the output for
ls('my_namespace')
To unload, use
detach('my_namespace')
These attached functions, like a package, will not be deleted by rm(list=ls()).

R: disentangling scopes

My question is about avoiding namespace pollution when writing modules in R.
Right now, in my R project, I have functions1.R with doFoo() and doBar(), functions2.R with other functions, and main.R with the main program in it, which first does source('functions1.R'); source('functions2.R'), and then calls the other functions.
I've been starting the program from the R GUI in Mac OS X, with source('main.R'). This is fine the first time, but after that the variables defined on the first run through the program are still defined the second time functions*.R are sourced, and so the functions get a whole bunch of extra variables defined.
I don't want that! I want an "undefined variable" error when my function uses a variable it shouldn't! Twice this has given me very late nights of debugging!
So how do other people deal with this sort of problem? Is there something like source(), but that makes an independent namespace that doesn't fall through to the main one? Making a package seems like one solution, but it seems like a big pain in the butt compared to e.g. Python, where a source file is automatically a separate namespace.
Any tips? Thank you!
I would explore two possible solutions to this.
a) Think in a more functional manner. Don't create any variables outside of a function. So, for example, main.R should contain one function, main(), which sources in the other files and does the work. When main returns, none of the clutter will remain (a minimal sketch of this follows the code for option (b) below).
b) Clean things up manually:
#main.R
prior_variables <- ls()
source('functions1.R')
source('functions2.R')
#stuff happens
rm(list = setdiff(ls(), prior_variables))
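For option (a), a minimal sketch using the file names from the question; local = TRUE keeps the sourced definitions inside main() rather than in the global environment:
# main.R
main <- function() {
  source("functions1.R", local = TRUE)
  source("functions2.R", local = TRUE)
  # ... do the work here ...
}
main()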
The main function you want to use is sys.source(), which will load your functions/variables into a namespace ("environment" in R) other than the global one. One other thing you can do in R that is fantastic is to attach namespaces to your search() path so that you need not reference the namespace directly. That is, if "namespace1" is on your search path, a function within it, say "fun1", need not be called as namespace1.fun1() as in Python, but simply as fun1().

As for method resolution order: if there are many functions with the same name, the one in the environment that appears first in the search() list will be called. To call a function in a particular namespace explicitly, one of many possible syntaxes - albeit a bit ugly - is get("fun1", "namespace1")(...) where ... are the arguments to fun1(). This should also work with variables, using the syntax get("var1", "namespace1").

I do this all the time (I usually load just functions, but the distinction between functions and variables in R is small), so I've written a few convenience functions that load from my ~/.Rprofile.
name.to.env <- function(env.name)
  ## returns named environment on search() path
  pos.to.env(grep(env.name, search()))

attach.env <- function(env.name)
  ## creates and attaches environment to search path if it doesn't already exist
  if (all(regexpr(env.name, search()) < 0)) attach(NULL, name = env.name, pos = 2)

populate.env <- function(env.name, path, ...) {
  ## populates environment with functions in file or directory
  ## creates and attaches named environment to search() path
  ## if it doesn't already exist
  attach.env(env.name)
  if (file.info(path[1])$isdir)
    lapply(list.files(path, full.names = TRUE, ...),
           sys.source, name.to.env(env.name)) else
    lapply(path, sys.source, name.to.env(env.name))
  invisible()
}
Example usage:
populate.env("fun1","pathtofile/functions1.R")
populate.env("fun2","pathtofile/functions2.R")
and so on, which will create two separate namespaces: "fun1" and "fun2", which are attached to the search() path ("fun2" will be higher on the search() list in this case). This is akin to doing something like
attach(NULL,name="fun1")
sys.source("pathtofile/functions1.R",pos.to.env(2))
manually for each file ("2" is the default position on the search() path). The way that populate.env() is written, if a directory, say "functions/", contains many R files without conflicting function names, you can call it as
populate.env("myfunctions","functions/")
to load all functions (and variables) into a single namespace. With name.to.env(), you can also do something like
with(name.to.env("fun1"), doStuff(var1))
or
evalq(doStuff(var1), name.to.env("fun1"))
Of course, if your project grows big and you have lots and lots of functions (and variables), writing a package is the way to go.
If you switch to using packages, you get namespaces as a side-benefit (provided you use a NAMESPACE file). There are other advantages for using packages.
If you were really trying to avoid packages (which you shouldn't), then you could try assigning your variables in specific environments.
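For instance, a minimal sketch with base R only, using the functions1.R/doFoo() names from the question:
my_env <- new.env()
sys.source("functions1.R", envir = my_env)  # definitions land in my_env, not the global environment
my_env$doFoo()                              # call explicitly, or attach(my_env, name = "functions1")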
Well, avoiding namespace pollution, as you put it, is just a matter of diligently partitioning the namespace and keeping your global namespace uncluttered.
Here are the essential functions for those two kinds of tasks:
Understanding/Navigating the Namespace Structure
At start-up, R creates a new environment to store all objects created during that session--this is the "global environment".
# to get the name of that environment:
globalenv()
But this isn't the root environment. The root is an environment called "the empty environment"--all environments chain back to it:
emptyenv()
returns: <environment: R_EmptyEnv>
# to view all of the chained parent environments (which includes '.GlobalEnv'):
search()
Creating New Environments:
workspace1 = new.env()
is.environment(workspace1)
returns: [1] TRUE
class(workspace1)
returns: [1] "environment"
# add an object to this new environment:
with(workspace1, attach(what = "/Users/doug/Documents/test_obj.RData",
                        name = deparse(substitute(what)), warn.conflicts = TRUE, pos = 2))
# verify that it's there:
exists("test_obj", where=workspace1)
returns: [1] TRUE
# to locate the new environment (if it's not visible from your current environment)
parent.env(workspace1)
returns: <environment: R_GlobalEnv>
objects(".GlobalEnv")
returns: [1] "test_obj"
Coming from Python et al., this system (at first) seemed to me like a room full of carnival mirrors. The R gurus, on the other hand, seem to be quite comfortable with it. I'm sure there are a number of reasons why, but my intuition is that they don't let environments persist. I notice that R beginners use 'attach', as in attach('this_dataframe'); I've noticed that experienced R users don't do that; they use 'with' instead, e.g.,
with(this_dataframe, tapply(etc....))
(I suppose they would achieve the same thing if they used 'attach' then 'detach' but 'with' is faster and you don't have to remember the second step.) In other words, namespace collisions are avoided in part by limiting the objects visible from the global namespace.
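As a small illustration of that last point, assuming a data frame df with a column x (not part of the original answer):
df <- data.frame(x = 1:10)
# attach/detach: df joins the search path until you remember to detach it
attach(df)
mean(x)
detach(df)
# with: the same lookup, scoped to a single expression, with nothing to undo
with(df, mean(x))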
