Environment expansion in R

I have a problem I don't know how to solve.
Background: I use R.NET for my calculations (I need a WPF application). I want to parallelize my app, so I created a dynamic proxy for the REngine class. The proxy has to serialize data in order to send data to, and receive data from, the REngine instance via TCP. The bad news is that R.NET classes cannot be serialized. So my idea is to serialize R objects in R and pass the serialized R data between processes.
So I have some script like this:
a <- 5;
b <- 10;
x <- a+b;
I need to wrap it like this:
wrapFunction <- function()
{
a <- 5;
b <- 10;
x <- a+b;
}
serializedResult <- serialize(wrapFunction(), NULL); # a NULL connection makes serialize() return a raw vector
I'll take serializedResult and pass it as a byte array. I also need to pass environments. But after these manipulations, a, b and x do not exist in .GlobalEnv.
How can I get all the variables defined in the function body into my .GlobalEnv?
I don't know their names or how many there are, and I can't rewrite the original script to replace "<-" with "<<-".
Are there other ways?
Thank you.

I'm not sure I fully understand your requirements. They seem to go against the functional programming paradigm, which R tries to follow. The following may or may not help:
e <- new.env()
wrapFunction <- function(){
with(e, {
a <- 5;
b <- 10;
x <- a+b;
})
}
wrapFunction()
e$a
#[1] 5
You can of course use .GlobalEnv instead of e, but in R that would be considered even worse practice.
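The asker also needs to move the variables between processes. The sketch below does not use wrapFunction. It assumes the script is available as a character string, here called `script_text` (a hypothetical name). It evaluates that text inside a fresh environment, serializes the environment's contents to a raw vector, and on the receiving side copies them into the global environment with `list2env()`. No variable names need to be known in advance.

```r
# Hypothetical: the user's unmodified script, available as text
script_text <- "a <- 5;
b <- 10;
x <- a+b;"

# Evaluate every top-level expression of the script inside a fresh environment,
# so whatever variables it creates end up there
run_env <- new.env()
eval(parse(text = script_text), envir = run_env)

# Serialize the environment's contents to a raw vector (the byte array to send)
payload <- serialize(as.list(run_env), connection = NULL)

# --- receiving side ---
# Rebuild the list and bind every element as a variable in the global environment
restored <- unserialize(payload)
invisible(list2env(restored, envir = globalenv()))

print(x)  # [1] 15
```

In a real setup, `payload` would travel over the TCP channel, and the last three steps would run in the other process.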

Related

Is there a way to figure out the return type of a function?

I think this is a simple question.
In many languages, you declare the return type when you write a function.
However, I couldn't find an equivalent in R.
The only way I can do it right now is to call the function and use str(), mode() or class() to check the returned value.
But if the function takes a long time to run, that approach doesn't work.
Is there a simple way to know the return type of a function before I call it?
By the way, I can find some return types by typing ?function_name, but many help pages don't mention the return type of the function.
OK, here are the scenarios in which knowing this information would be useful:
1. I need the return type to decide how to deal with the value once I have it. In a simple case, if I don't know whether the return value is a list or a data frame, I don't have the information to decide which function to use next. Sometimes you don't know whether you got an S3 or an S4 object, so you don't know whether to use @ or $ to access it.
2. Suppose two functions in two packages do the same thing, but one returns a connection and the other returns an HTML object. If I knew the return types, I could easily pick the function that suits my case. Sometimes you can only connect to a service a limited number of times, so you waste several of those chances just checking the return type.
In short, no.
R is a dynamically typed language, and many of its functions return different types depending on the arguments passed. In my opinion this is one of R's strong points: many functions accept many types and return many types. A quick example:
mode(sapply(vector(mode="list", 10) ,function (x) return ('a')))
[1] "character"
mode(sapply(vector(mode="list", 10) ,function (x) return (1)))
[1] "numeric"
Here sapply returns a "character" or a "numeric" result depending on the function applied to each element. Overall, you just have to get used to the language. If nothing else works, keep doing what you are doing, but try it on small inputs first.
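A related point: the result type of sapply() can change with the input as well as with the function. On an empty list it returns an empty list. Base R's vapply() is the closest thing to declaring a return type. Its FUN.VALUE argument states the type and length expected from each call, and any mismatch raises an error instead of silently changing type. A small sketch:

```r
inputs <- vector(mode = "list", 10)

# sapply(): the result type follows whatever FUN returns, and even the input length
class(sapply(inputs, function(x) "a"))   # "character"
class(sapply(list(), function(x) "a"))   # "list": empty input gives list()

# vapply(): declare the expected type/length of each result up front
res <- vapply(inputs, function(x) "a", FUN.VALUE = character(1))
class(res)                                                        # "character"
class(vapply(list(), function(x) "a", FUN.VALUE = character(1)))  # "character"

# A function that breaks the declared type is reported as an error
tryCatch(vapply(inputs, function(x) 1, FUN.VALUE = character(1)),
         error = function(e) conditionMessage(e))
```

This does not tell you what an arbitrary function returns. It only guarantees the type of the result you get back.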
In R, there are a number of ways to handle multiple return types from a function.
Let's say you have a function f that calls g, which can return various types. Your first option is an explicit type check:
f <- function(x)
{
# (any preceding work goes here)
y <- g(x)
if(is.data.frame(y))
{
# process result as a data frame
}
else if(is.list(y))
{
# process y as a list
# this must go after is.data.frame(), because a data frame is also a list
}
# (any remaining work goes here)
}
Now f will check the result of g when it returns, and then call the appropriate code to handle it.
This is fine if you only have a small number of possible types to choose from. Once the number of types becomes large, it's a better option to use something more systematic. That option would be to use R's object framework(s). The simplest framework is S3, so let's look at that.
f <- function(x)
{
y <- g(x)
f_result(y)
}
# this is the f_result _generic_: it dispatches individual methods based on the class of y
f_result <- function(y)
{
UseMethod("f_result")
}
# f_result _method_ for data frames
f_result.data.frame <- function(y)
{
# process result as a data frame
}
# f_result method for lists
f_result.list <- function(y)
{
# process result as a list
}
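Here is a runnable sketch of the S3 approach. The `g()` below is a hypothetical stand-in that returns a data frame or a list depending on its input. A `default` method catches any class that has no specific method. Dispatch picks the method from the object's class, so the ordering constraint of the explicit `is.data.frame()` / `is.list()` checks does not apply.

```r
# Hypothetical g(): returns a data frame or a list depending on its input
g <- function(x) if (x > 0) data.frame(v = seq_len(x)) else list(v = x)

# Generic: dispatches on the class of y
f_result <- function(y) UseMethod("f_result")

f_result.data.frame <- function(y) paste("data frame with", nrow(y), "rows")
f_result.list       <- function(y) paste0("list of length ", length(y))
# Fallback for any class without a specific method
f_result.default    <- function(y) paste("unhandled class:", class(y)[1])

f <- function(x) f_result(g(x))

f(3)            # "data frame with 3 rows"
f(-1)           # "list of length 1"
f_result(1:3)   # "unhandled class: integer"
```

To support a new return type of g(), you add one more `f_result.<class>` method. f() itself stays unchanged.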

R: Storing data within a function and retrieving without using "return"

The following simple example will help me address a problem in my program implementation.
fun2<-function(j)
{
x<-rnorm(10)
y<-runif(10)
Sum<-sum(x,y)
Prod<-prod(x,y)
return(Sum)
}
j=1:10
Try<-lapply(j,fun2)
I want to store "Prod" at each iteration so I can access it after running fun2. I tried using assign() to preallocate it, assign("Prod",numeric(10),pos=1),
and then assigning the value from the j-th iteration to Prod[j], but it does not work.
Any idea how this can be done?
Thank you
You can put anything you like in the return() call. You could return a list, return(list(Sum,Prod)), or a data frame, return(data.frame("In"=j,"Sum"=Sum,"Prod"=Prod)).
I would then convert that list of data frames into a single data frame:
Try2 <- do.call(rbind,Try)
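Put together, the return-a-data-frame approach looks roughly like this. The `set.seed()` call is an addition for reproducibility only, and the column names follow the suggestion above.

```r
set.seed(1)  # added only so the random draws are reproducible

fun2 <- function(j) {
  x <- rnorm(10)
  y <- runif(10)
  Sum <- sum(x, y)
  Prod <- prod(x, y)
  # Return both values (plus the iteration index) as a one-row data frame
  data.frame(In = j, Sum = Sum, Prod = Prod)
}

j <- 1:10
Try  <- lapply(j, fun2)        # list of ten one-row data frames
Try2 <- do.call(rbind, Try)    # stacked into a single 10-row data frame
Try2$Prod                      # all ten Prod values, no global side effects
```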
Maybe re-think the problem in a more vectorized way, taking advantage of the implied symmetry to represent intermediate values as a matrix and operating on that
ni = 10; nj = 20
x = matrix(rnorm(ni * nj), ni)
y = matrix(runif(ni * nj), ni)
sums = colSums(x + y)
prods = apply(x * y, 2, prod)
Thinking in vectorized terms applies as much to whatever your 'real' problem is as it does to this sum/prod example. In practice, when thinking in terms of vectors fails, I have never used the environment or concatenation approaches from the other answers. I simply return a list or vector.
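To check that the matrix version computes the same quantities as fun2, the sketch below applies fun2's formulas column by column and compares the results. Each column plays the role of one iteration. `sum(x, y)` equals the column sum of `x + y`, and `prod(x, y)` equals the product of the column of `x * y`.

```r
set.seed(42)  # reproducibility only
ni <- 10; nj <- 20
x <- matrix(rnorm(ni * nj), ni)
y <- matrix(runif(ni * nj), ni)

# Vectorized: one column per iteration
sums  <- colSums(x + y)
prods <- apply(x * y, 2, prod)

# fun2's formulas applied to the same columns, one at a time
loop_sums  <- sapply(seq_len(nj), function(k) sum(x[, k], y[, k]))
loop_prods <- sapply(seq_len(nj), function(k) prod(x[, k], y[, k]))

all.equal(sums, loop_sums)    # TRUE
all.equal(prods, loop_prods)  # TRUE
```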
I have done this before, and it works. It's good for a quick fix, but it's kind of sloppy. The <<- operator assigns outside the function: it searches the enclosing environments for an existing Prod, which here is found in the global environment.
fun2<-function(j){
x<-rnorm(10)
y<-runif(10)
Sum<-sum(x,y)
Prod[j]<<-prod(x,y)
}
j=1:10
Prod <- numeric(length(j))
Try<-lapply(j,fun2)
Prod
thelatemail's and JeremyS's solutions are probably what you want. Using lists is the normal way to pass back several different data items, and I would encourage you to use it. I quote it here so no one thinks I'm advocating the direct option:
return(list(Sum,Prod))
Having said that, if you really don't want to pass them back, you could also put them directly in the parent environment from within the function, using either assign() or the superassignment operator. Functional programming purists may frown on this practice, but it does work. It is basically what you were originally trying to do.
Here's the superassignment version
fun2<-function(j)
{
x<-rnorm(10)
y<-runif(10)
Sum<-sum(x,y)
Prod[j] <<- prod(x,y)
return(Sum)
}
j=1:10
Prod <- numeric(10)
Try<-lapply(j,fun2)
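Note that superassignment searches back through the enclosing environments for the first one in which the variable exists and modifies it there. If no such variable exists, it creates one in the global environment. That makes it unsuitable for creating new variables in a specific environment above where you are.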
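The lookup rule is easiest to see with a closure. In this sketch (all names are illustrative), `<<-` finds `count` in the enclosing function's environment and updates it there. When no binding exists anywhere up the chain, it quietly creates one in the global environment.

```r
make_counter <- function() {
  count <- 0
  function() {
    # <<- walks up the enclosing environments, finds `count` in
    # make_counter()'s environment and updates it there
    count <<- count + 1
    count
  }
}

counter <- make_counter()
counter()
second <- counter()
second                                                   # [1] 2
exists("count", envir = globalenv(), inherits = FALSE)   # FALSE: global env untouched

# With no existing binding up the chain, <<- creates one in the global env
make_global <- function() created_by_superassign <<- "surprise"
make_global()
exists("created_by_superassign", envir = globalenv(), inherits = FALSE)  # TRUE
```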
And here is an example version that uses the environment directly:
fun2<-function(j,env)
{
x<-rnorm(10)
y<-runif(10)
Sum<-sum(x,y)
env$Prod[j] <- prod(x,y)
return(Sum)
}
j=1:10
Prod <- numeric(10)
Try<-lapply(j,fun2,env=parent.frame())
Notice that if you had called parent.frame() from within the function, you would need to go back two frames (parent.frame(2)), because lapply() creates its own. This approach has the advantage that you can pass it any environment you want instead of parent.frame(), and the value will be modified there. This is the seldom-used R implementation of writable pass-by-reference. It's safer than superassignment because you know exactly which environment holds the variable being modified.
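The same function also works with a dedicated environment instead of the caller's frame. In the sketch below, `store` is a hypothetical container created with `new.env()`. Prod lives inside it rather than in the global workspace, and every call to fun2 writes into it by side effect.

```r
fun2 <- function(j, env) {
  x <- rnorm(10)
  y <- runif(10)
  env$Prod[j] <- prod(x, y)   # writes into `env`, whichever environment that is
  sum(x, y)
}

store <- new.env()            # hypothetical dedicated container
store$Prod <- numeric(10)
Try <- lapply(1:10, fun2, env = store)
store$Prod                    # ten products filled in by the calls above
```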

FOR loops giving no result or error in R

I am running the following code:
disc<-for (i in 1:33) {
m=n[i]
xbar<-sum(data[i,],na.rm=TRUE)/m
Sx <- sqrt(sum((data[i,]-xbar)^2,na.rm=TRUE)/(m-1))
Sx
i=i+1}
Running it:
>disc
NULL
Why is it giving me NULL?
This is from the documentation for for, accessible via ?`for`:
‘for’, ‘while’ and ‘repeat’ return ‘NULL’ invisibly.
Perhaps you are looking for something along the following lines:
library(plyr)
disc <- llply(1:33, function(i) {
m=n[i]
xbar<-sum(data[i,],na.rm=TRUE)/m
Sx <- sqrt(sum((data[i,]-xbar)^2,na.rm=TRUE)/(m-1))
Sx
})
Other variants exist: the ll in llply stands for "list in, list out". If your intended final result is a data frame or an array, there are appropriate functions for those too.
The code above is a plain transformation of your example. We might be able to do better by splitting data right away and forgetting the otherwise useless count variable i (untested, as you have provided no data):
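disc <- daply(cbind(data, n = n, row = seq_along(n)), .(row), function(data.i) {
m <- data.i$n
# keep only the original columns (drop the helper columns n and row)
vals <- unlist(data.i[setdiff(names(data.i), c("n", "row"))])
xbar <- sum(vals, na.rm=TRUE)/m
sqrt(sum((vals-xbar)^2,na.rm=TRUE)/(m-1))
})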
See also the plyr website for more information.
Related (if not a duplicate): R - How to turn a loop to a function in R
krlmlr's answer shows you how to fix your code. To explain your original problem in more abstract terms: a for loop lets you run the same piece of code multiple times, but it doesn't store the results of running that code for you. You have to do that yourself.
Your current code only assigns a single value, Sx, on each run of the for loop. On the next run, a new value is put into the Sx variable, so you lose all the previous values. At the end, you're left with whatever the value of Sx was on the last run through the loop.
To save the results of a for loop, you generally need to add them to a vector as you go, e.g.
# Create the empty results vector outside the loop
results = numeric(0)
for (i in 1:10) {
current_result = 3 + i
results = c(results, current_result)
}
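Applied to the question's loop, this means preallocating one slot per row and filling it in. The question provides no data, so `data` and `n` below are hypothetical stand-ins, and `n` is assumed to be the number of non-missing values per row. The `i=i+1` line from the question is dropped because `for` sets `i` itself on every iteration. Under that assumption, the result matches `sd()` per row.

```r
# Hypothetical stand-ins for the question's `data` and `n`
set.seed(1)
data <- matrix(rnorm(33 * 5), nrow = 33)
data[1, 2] <- NA
data[5, c(1, 3)] <- NA
n <- rowSums(!is.na(data))   # assumed meaning of n: non-missing values per row

disc <- numeric(33)          # preallocate one slot per row
for (i in 1:33) {            # no i = i + 1: for() advances i itself
  m <- n[i]
  xbar <- sum(data[i, ], na.rm = TRUE) / m
  disc[i] <- sqrt(sum((data[i, ] - xbar)^2, na.rm = TRUE) / (m - 1))
}
disc

# With n defined as above, this is the per-row sample standard deviation
all.equal(disc, apply(data, 1, sd, na.rm = TRUE))  # TRUE
```

Preallocating with `numeric(33)` avoids growing the vector with `c()` on every pass, which copies it each time.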
In R, a for loop can't return a value (it returns NULL invisibly). One way to get values back is to wrap the loop in a function that collects the results and returns them. For example:
getSx <- function(){
Sx <- numeric(33) # one result per row
for (i in 1:33) {
m=n[i]
xbar <- sum(data[i,],na.rm=TRUE)/m
Sx[i] <- sqrt(sum((data[i,]-xbar)^2,na.rm=TRUE)/(m-1))
}
Sx
}
Then you call it:
getSx()
Of course, you can avoid the side effects of a for loop by using lapply or a vectorized solution. But that is a separate problem: you should give a reproducible example and explain a bit more about what you are trying to compute.

sys.frame, sys.nframe, etc. in R

Could someone please explain to me what these various environment functions do specifically, i.e. which one returns which frame? I am thoroughly confused after reading the documentation (http://stat.ethz.ch/R-manual/R-patched/library/base/html/sys.parent.html)
Let's put some structure on the question:
x = 1; y=2; z=3;
f = function() { print(ls()); print(ls(envir=sys.frame())) }
#this first prints the contents of this function's frame and then of the global environment
I am trying to understand how one can access environments of calling functions and to know which environment you are in. For example g could have called f:
g = function() { somevar=1; f() }
If I wanted to get the contents of g, how would I do that? What is the difference between a frame and an environment?
parent.frame() refers to the calling environment. You normally don't need the rest of them. For your example, use this to list somevar:
f <- function() ls(parent.frame())
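For the other functions, the sketch below records, from inside f(), what several of them return when g() calls f(). A frame is the environment created for one function call on the call stack. sys.nframe() gives the current call's position on that stack, and sys.parent() gives the caller's position. sys.frame(n) fetches the frame at position n, with 0 meaning the global environment. The R documentation describes parent.frame(n) as shorthand for sys.frame(sys.parent(n)).

```r
x <- 1; y <- 2; z <- 3

f <- function() {
  list(
    depth        = sys.nframe(),                 # frame number of this call to f()
    parent_num   = sys.parent(),                 # frame number of f()'s caller
    caller       = ls(parent.frame()),           # objects in the caller's frame
    caller_again = ls(sys.frame(sys.parent())),  # the same frame, reached by number
    global_is_0  = identical(sys.frame(0), globalenv())
  )
}

g <- function() { somevar <- 1; f() }
res <- g()
str(res)
```

When this is run at the top level, depth is 2 and parent_num is 1. Both caller entries contain "somevar".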

How does local() differ from other approaches to closure in R?

Yesterday I learned from Bill Venables how local() can help create static functions and variables, e.g.,
example <- local({
hidden.x <- "You can't see me!"
hidden.fn <- function(){
cat("\"hidden.fn()\"")
}
function(){
cat("You can see and call example()\n")
cat("but you can't see hidden.x\n")
cat("and you can't call ")
hidden.fn()
cat("\n")
}
})
which behaves as follows from the command prompt:
> ls()
[1] "example"
> example()
You can see and call example()
but you can't see hidden.x
and you can't call "hidden.fn()"
> hidden.x
Error: object 'hidden.x' not found
> hidden.fn()
Error: could not find function "hidden.fn"
I've seen this kind of thing discussed in Static Variables in R where a different approach was employed.
What are the pros and cons of these two methods?
Encapsulation
The advantage of this style of programming is that the hidden objects are unlikely to be overwritten by anything else, so you can be more confident that they contain what you think. They won't be used by mistake since they can't readily be accessed. In the post linked to in the question there is a global variable, count, which could be accessed and overwritten from anywhere. If we are debugging code, look at count and see that it has changed, we cannot really be sure which part of the code changed it. In contrast, with the example code in the question we have greater assurance that no other part of the code is involved.
Note that we actually can access the hidden function, although it's not that easy:
# run hidden.fn
environment(example)$hidden.fn()
Object Oriented Programming
Also note that this is very close to object oriented programming where example and hidden.fn are methods and hidden.x is a property. We could do it like this to make it explicit:
library(proto)
p <- proto(x = "x",
fn = function(.) cat(' "fn()"\n '),
example = function(.) .$fn()
)
p$example() # prints "fn()"
proto does not hide x and fn, but it's not that easy to access them by mistake, since you must use p$x and p$fn() to access them. That is not really different from being able to write e <- environment(example); e$hidden.fn()
EDIT:
The object oriented approach does add the possibility of inheritance, e.g. one could define a child of p which acts like p except that it overrides fn.
ch <- p$proto(fn = function(.) cat("Hello from ch\n")) # child
ch$example() # prints: Hello from ch
local() can implement a singleton pattern. For example, the snow package uses this to track the single Rmpi instance that the user might create.
getMPIcluster <- NULL
setMPIcluster <- NULL
local({
cl <- NULL
getMPIcluster <<- function() cl
setMPIcluster <<- function(new) cl <<- new
})
local() might also be used to manage memory in a script, e.g., allocating large intermediate objects required to create a final object on the last line of the clause. The large intermediate objects are available for garbage collection when local returns.
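A minimal sketch of that memory-management use follows. The object names and sizes are made up for illustration. Only the value of the last expression is bound in the global environment. The intermediates live in the temporary environment that local() creates, so once it returns nothing refers to them and they can be collected.

```r
summary_stats <- local({
  big     <- matrix(runif(1e6), ncol = 100)  # large intermediate object
  centred <- scale(big, scale = FALSE)       # another intermediate
  colMeans(centred^2)                        # last value: returned by local()
})

length(summary_stats)                # 100
exists("big", inherits = FALSE)      # FALSE: never bound in the global env
exists("centred", inherits = FALSE)  # FALSE
```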
Using a function to create a closure is a factory pattern. See the bank account example in the An Introduction to R manual, where each call to open.account creates a new account.
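To contrast the factory with the singleton above, here is a sketch in the spirit of that bank-account example. It is not the manual's code: `make_account` and its methods are hypothetical names. Each call to the factory creates a fresh environment with its own `total`, whereas local() runs once and yields exactly one shared environment.

```r
# Hypothetical factory: every call builds a new environment with its own `total`
make_account <- function(total) {
  list(
    deposit  = function(amount) { total <<- total + amount; invisible(total) },
    withdraw = function(amount) {
      if (amount > total) stop("insufficient funds")
      total <<- total - amount
      invisible(total)
    },
    balance  = function() total
  )
}

alice <- make_account(100)
bob   <- make_account(50)
alice$deposit(25)
bob$withdraw(20)
alice$balance()   # 125
bob$balance()     # 30: the two accounts do not share state
```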
As @otsaw mentions, memoization might be implemented using local, e.g., to cache web sites in a crawler:
library(XML)
crawler <- local({
seen <- new.env(parent=emptyenv())
.do_crawl <- function(url, base, pattern) {
if (!exists(url, seen)) {
message(url)
xml <- htmlTreeParse(url, useInternal=TRUE)
hrefs <- unlist(getNodeSet(xml, "//a/@href"))
urls <-
sprintf("%s%s", base, grep(pattern, hrefs, value=TRUE))
seen[[url]] <- length(urls)
for (url in urls)
.do_crawl(url, base, pattern)
}
}
.do_report <- function(url) {
urls <- as.list(seen)
data.frame(Url=names(urls), Links=unlist(unname(urls)),
stringsAsFactors=FALSE)
}
list(crawl=function(base, pattern="^/.*html$") {
.do_crawl(base, base, pattern)
}, report=.do_report)
})
crawler$crawl(favorite_url)
dim(crawler$report())
(The usual example of memoization, Fibonacci numbers, is not satisfying: the range of numbers that don't overflow R's numeric representation is small, so one would probably use a look-up table of efficiently pre-calculated values.) Interestingly, crawler here is a singleton. It could just as easily have followed a factory pattern, with one crawler per base URL.
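The crawler needs network access. The same memoization pattern can be tried offline with a hypothetical slow function. In the sketch below, `slow_lookup`, its cache and its call counter are illustrative names, and `Sys.sleep()` stands in for the expensive work. The private cache is an environment with an empty parent, like `seen` above, so repeated keys are answered without recomputation.

```r
slow_lookup <- local({
  cache <- new.env(parent = emptyenv())   # private cache, like `seen` above
  calls <- 0                              # counts real (uncached) computations
  function(key) {
    key <- as.character(key)
    if (!exists(key, envir = cache, inherits = FALSE)) {
      calls <<- calls + 1
      Sys.sleep(0.01)                     # pretend this step is expensive
      assign(key, nchar(key) * 2, envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
})

slow_lookup("abc")
slow_lookup("abc")                 # served from the cache
slow_lookup("de")
environment(slow_lookup)$calls     # 2: only two distinct keys were computed
```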
