Under what circumstances does the following example return a local x versus a global x?
The xi'an blog wrote the following at http://xianblog.wordpress.com/2010/09/13/simply-start-over-and-build-something-better/
One of the worst problems is scoping. Consider the following little gem.
f =function() {
if (runif(1) > .5)
x = 10
x
}
The x being returned by this function is randomly local or global. There are other examples where variables alternate between local and non-local throughout the body of a function. No sensible language would allow this. It’s ugly and it makes optimisation really difficult. This isn’t the only problem, even weirder things happen because of interactions between scoping and lazy evaluation.
PS - Is this xi'an blog post written by Ross Ihaka?
Edit - Follow up question.
Is this the remedy?
f = function() {
x = NA
if (runif(1) > .5)
x = 10
x
}
This is only a problem if you write functions that do not take arguments, or whose functionality relies on the scoping of variables outside the current frame. To avoid it, either i) pass in the objects you need as arguments to the function, or ii) create those objects inside the function that uses them.
Your f is coded incorrectly. If you possibly alter x, then you should pass x in, possibly setting a default of NA or similar if that is what you want the other side of the random flip to be.
f <- function(x = NA) {
if (runif(1) > .5)
x <- 10
x
}
Here we see the function works as per your second function, but by properly assigning x as an argument with appropriate default. Note this works even if we have another x defined in the global workspace:
> set.seed(3)
> replicate(10, f())
[1] NA 10 NA NA 10 10 NA NA 10 10
> x <- 4
> set.seed(3)
> replicate(10, f())
[1] NA 10 NA NA 10 10 NA NA 10 10
Another benefit of this is that you can pass in an x if you want to return some other value instead of NA. If you don't need that facility, then defining x <- NA in the function is sufficient.
The above is predicated on what you actually want to do with f, which isn't clear from your posting and comments. If all you want to do is randomly return 10 or NA, define x <- NA.
Of course, this function is very silly as it can't exploit vectorisation in R - it is very much a scalar operation, which we know is slow in R. A better function might be
f <- function(n = 1, repl = 10) {
out <- rep(NA, n)
out[runif(n) > 0.5] <- repl
out
}
or
f <- function(x, repl = 10) {
n <- length(x)
out <- rep(NA, n)
out[runif(n) > 0.5] <- repl
out
}
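A quick way to check a vectorised version, as a minimal sketch. The name f_vec and the seed are illustrative assumptions, not part of the original answer:

```r
# Hypothetical name f_vec; same logic as the first vectorised f above.
f_vec <- function(n = 1, repl = 10) {
  out <- rep(NA, n)              # start with all NA
  out[runif(n) > 0.5] <- repl    # overwrite a random subset in one step
  out
}

set.seed(3)                      # arbitrary seed, only for reproducibility
res <- f_vec(1000)
length(res)                      # one result per requested draw
mean(!is.na(res))                # share of draws replaced, roughly one half
```

The single call replaces 1000 scalar calls, which is the point of vectorising.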
Ross's example function was, I surmise, intentionally simple and silly to highlight the scoping issue - it should not be taken as an example of writing good R code, nor would it have been intended as such. Be aware of the scoping feature and code accordingly, and you won't get bitten. You might even find you can exploit this feature...
The 'x' is only defined inside the function if the 'if' condition is true. So if 'runif(1) > .5', the 'x' on the last line refers to the local 'x' (10); otherwise R looks 'x' up in the enclosing environment and returns a globally defined 'x' (and if 'x' is not defined globally, the call fails).
> f =function() {
+ if (T)
+ x = 10
+ x
+ }
> f()
[1] 10
> f =function() {
+ if (F)
+ x = 10
+ x
+ }
> f()
Error in f() : object 'x' not found
> x<-77
> f()
[1] 77
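To make the local-versus-global distinction visible without relying on the random draw, here is a small sketch. The function where_is_x and its flip argument are hypothetical names introduced only for illustration:

```r
# Hypothetical helper: reports whether x exists in the function's own frame.
where_is_x <- function(flip) {
  if (flip)
    x <- 10                      # x is created locally only on this branch
  # inherits = FALSE restricts the search to the current function frame
  exists("x", envir = environment(), inherits = FALSE)
}

where_is_x(TRUE)    # TRUE: a local x was created
where_is_x(FALSE)   # FALSE: any x found later would come from an enclosing environment
```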
I am trying to write my first function in R to calculate emittance using Plank's function for different temperatures. I can do it manually as below for temperatures from 200 to 310 K.
pi <- 3.141593
h <- 6.626068963e-34
c <- 2.99792458e+8
lambda <- 4 * 1e-6
k <- 1.38e-23
t <- c (200:310)
a <- (2*pi*(c^2)*h)/(lambda^5)
b <- exp((h*c)/(lambda*k*t))
B <- a * (1/(b-1))
Where B is the vector of values I want.
Now here is an effort to write a function in R:
P_function <- function(t, pi = 3.141593, h = 6.626068963e-34, c = 2.99792458e+8,
lambda = 4 * 1e-6, k = 1.38e-2) {
((2*pi*(c^2)*h)/(lambda^5)) *((1/(exp((h*c)/(lambda*k*t))-1)))
}
Now for different values of t (200-300K), how do I implement this function?
Couple of problems. First, pi is already a defined constant at better precision than you are using.
> rm(pi) # remove your copy
> pi
[1] 3.141593 # default for console printing is only 7 significant digits
> print(pi, digits=18)
[1] 3.14159265358979312 # but there is more "depth" to be had
Second, it makes no sense to put scientific constants in the parameter list. Since they're constant they can be defined in the body. Parameter lists are for items that might vary from situation to situation.
newPfun <- function(t) {
  h <- 6.626068963e-34
  c <- 2.99792458e+8
  lambda <- 4 * 1e-6
  k <- 1.38e-23
  a <- (2*pi*(c^2)*h)/(lambda^5)   # pi is already defined
  b <- exp((h*c)/(lambda*k*t))
  B <- a * (1/(b-1))
  return(B)
}
This is just your original code "packaged" to accept a vector of temperatures. (And I'm pretty sure that's not the right spelling of the scientist's name.)
Not sure where your second function is flawed. Perhaps a mismatched parenthesis. After trying to duplicate the results with a single expression and failing multiple times, I'm now wondering if it's really a problem with numerical overflow (or underflow).
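One difference is visible in the question's own code: the default k in P_function is 1.38e-2, whereas the hand calculation uses 1.38e-23. The sketch below checks whether that alone accounts for the discrepancy. The helper name manual_B is hypothetical; everything else is copied from the question:

```r
# P_function exactly as posted in the question (note k = 1.38e-2)
P_function <- function(t, pi = 3.141593, h = 6.626068963e-34, c = 2.99792458e+8,
                       lambda = 4 * 1e-6, k = 1.38e-2) {
  ((2*pi*(c^2)*h)/(lambda^5)) * ((1/(exp((h*c)/(lambda*k*t))-1)))
}

# Hypothetical helper reproducing the hand calculation (k = 1.38e-23)
manual_B <- function(t) {
  pi <- 3.141593
  h <- 6.626068963e-34
  c <- 2.99792458e+8
  lambda <- 4 * 1e-6
  k <- 1.38e-23
  a <- (2*pi*(c^2)*h)/(lambda^5)
  b <- exp((h*c)/(lambda*k*t))
  a * (1/(b-1))
}

t <- 200:310
bad  <- P_function(t)                 # default k: the exponent is so small that exp(.) - 1 is 0
good <- P_function(t, k = 1.38e-23)   # k as in the hand calculation

all(is.infinite(bad))                 # the posted default gives Inf everywhere
all.equal(good, manual_B(t))          # with the corrected k the two agree
```

So calling the function on a vector, e.g. P_function(200:310, k = 1.38e-23), already works; the posted default for k is what produces the unusable values.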
I recently discovered that R allows chaining of assignments, e.g.
a = b = 1:10
a
[1] 1 2 3 4 5 6 7 8 9 10
b
[1] 1 2 3 4 5 6 7 8 9 10
I then thought that this could also be used in functions, if two arguments should take the same value. However, this was not the case. For example, plot(x = y = 1:10) produces the following error: Error: unexpected '=' in "plot(x = y =". What is different, and why doesn't this work? I am guessing this has something to do with only the first being returned to the function, but both seem to be evaluated.
What are some possibilities and constraints with chained assignments in R?
I don't know about "canonical", but: this is one of the examples that illustrates how assignment (which can be done interchangeably with <- and =) and passing named arguments (which can only be done using =) are different. It's all about the context in which the expressions x <- y <- 10 or x = y = 10 are evaluated. On their own,
x <- y <- 10
x = y = 10
do exactly the same thing (there are a few edge cases where = and <- aren't completely interchangeable as assignment operators, e.g. having to do with operator precedence). Specifically, these are evaluated as (x <- (y <- 10)), or the equivalent with =. y <- 10 assigns the value 10 to y, and returns the value 10; then x <- 10 is evaluated.
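A minimal sketch of that right-to-left evaluation; the variable names w and z are introduced only for this illustration:

```r
x <- y <- 10         # parsed as x <- (y <- 10)
c(x, y)              # both are now 10

# An assignment is itself an expression whose value is the assigned value,
# so it can be used inside a larger expression.
z <- (w <- 5) + 1    # w becomes 5, z becomes 6
c(w, z)
```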
Although it looks similar, this is not the same as the use of = to pass a named argument to a function. As noted by the OP, if f() is a function, f(x = y = 10) is not syntactically correct:
f <- function(x, y) {
x + y
}
f(x = y = 10)
## Error: unexpected '=' in "f(x = y ="
You might be tempted to say "oh, then I can just use arrows instead of equals signs", but this does something different.
f(x <- y <- 10)
## Error in f(x <- y <- 10) : argument "y" is missing, with no default
This statement first tries to evaluate the x <- y <- 10 expression (as above); once that works, it calls f() with the result as a single, unnamed argument. If the function you are calling will work with a single, unnamed argument (as plot() does), you will get a result, although not the result you expect. In this case, since the function has no default value for y, it throws an error.
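As a sketch of what happens with a function that does accept a single unnamed argument, using sum() purely as an illustrative choice: the chained assignment runs in the calling environment as a side effect, and its value is passed positionally. The variables total, v and f_two are hypothetical names:

```r
total <- sum(x <- y <- 1:10)   # assigns x and y, then passes 1:10 as the first argument
total                          # sum of 1:10
identical(x, y)                # both were created by the call

# If two arguments should share a value, name them separately instead.
f_two <- function(x, y) x + y
v <- 10
f_two(x = v, y = v)
```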
People do sometimes use <- with a function call as shortcut; in particular I like to use idioms like if (length(x <- ...) > 0) { <do_stuff> } so I don't have to repeat the ... later. For example:
if (length(L <- list(...))>0) {
warning(paste("additional arguments to ranef.merMod ignored:",
paste(names(L),collapse=", ")))
}
Note that the expression length(L <- list(...)) > 0 could also be written as just length(L <- list(...)) (since the result of length() must be a non-negative integer, and 0 evaluates to FALSE while any positive value evaluates to TRUE), but I personally think this is a bridge too far in terms of compactness vs readability ... I sometimes think it would be better to forgo the assignment-within-if and write this as L <- list(...); if (length(L)>0) { ... }
PS forcing the association of assignment in the other order leads to some confusing errors, I think due to R's lazy evaluation rules:
rm(x)
rm(y)
## neither x nor y is defined
(x <- y) <- 10
## Error in (x <- y) <- 10 : object 'x' not found
## both x and y are defined
x <- y <- 5
(x <- y) <- 10
## Error in (x <- y) <- 10 : could not find function "(<-"
I have the following R code (at the end of this question). After the last line I expect to get a list of 4 "retFun" functions, each initialized with a different x, so that I get the following result:
funList[[1]](1) == 7 #TRUE
funList[[2]](1) == 8 #TRUE
And so on, but what I seem to get is
funList[[1]](1) == 10 #TRUE
funList[[2]](1) == 10 #TRUE
As if each function in the list has the same x value
creatFun <- function(x, y)
{
retFun <- function(z)
{
z + x + y
}
}
myL <- c(1,2,3,4)
funList <-sapply(myL, creatFun, y = 5)
This could be (and probably is, somewhere) an exercise on how lazy evaluation works in R. You need to force the evaluation of x before the creation of each function:
creatFun <- function(x, y)
{
force(x)
retFun <- function(z)
{
z + x + y
}
}
...and to be safe, you should probably force(y) as well for the times when you aren't passing a single value in for that parameter.
A good discussion can be found in Hadley's forthcoming book, particularly the section on lazy evaluation in the Functions chapter (scroll down).
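The effect of force() can be sketched with an explicit loop. The names make_lazy and make_forced are hypothetical. As far as I know, R 3.2.0 and later make the apply family force arguments themselves, so the sapply() example in the question may already give the expected result there; a for loop still shows the lazy behaviour:

```r
make_lazy   <- function(x) function(z) z + x               # x stays an unevaluated promise
make_forced <- function(x) { force(x); function(z) z + x } # x is evaluated immediately

lazy <- list()
forced <- list()
for (i in 1:3) {
  lazy[[i]]   <- make_lazy(i)    # promise refers to the loop variable i
  forced[[i]] <- make_forced(i)  # value of i is captured now
}

lazy[[1]](0)     # 3: the promise is evaluated only now, when i is already 3
forced[[1]](0)   # 1: the value was fixed when the closure was created
```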
I want to write a function that sorts a data.frame -- instead of using the cumbersome order(). Given something like
> x=data.frame(a=c(5,6,7),b=c(3,5,1))
> x
  a b
1 5 3
2 6 5
3 7 1
I want to say something like:
sort.df(x,b)
So here's my function:
sort.df <- function(df, ...) {
with(df, df[order(...),])
}
I was really proud of this. Given R's lazy evaluation, I figured that the ... parameter would only be evaluated when needed -- and by that time it would be in scope, due to 'with'.
If I run the 'with' line directly, it works. But the function doesn't.
> with(x,x[order(b),])
  a b
3 7 1
1 5 3
2 6 5
> sort.df(x,b)
Error in order(...) : object 'b' not found
What's wrong and how to fix it? I see this sort of "magic" frequently in packages like plyr, for example. What's the trick?
This will do what you want:
sort.df <- function(df, ...) {
dots <- as.list(substitute(list(...)))[-1]
ord <- with(df, do.call(order, dots))
df[ord,]
}
## Try it out
x <- data.frame(a=1:10, b=rep(1:2, length=10), c=rep(1:3, length=10))
sort.df(x, b, c)
And so will this:
sort.df2 <- function(df, ...) {
cl <- substitute(list(...))
cl[[1]] <- as.symbol("order")
df[eval(cl, envir=df),]
}
sort.df2(x, b, c)
It's because when you're passing b you're actually not passing an object. Put a browser inside your function and you'll see what I mean. I stole this from some Internet robot somewhere:
x=data.frame(a=c(5,6,7),b=c(3,5,1))
sort.df <- function(df, ..., drop = TRUE){
ord <- eval(substitute(order(...)), envir = df, enclos = parent.frame())
return(df[ord, , drop = drop])
}
sort.df(x, b)
will work.
So will this, if you're looking for a nice way to do it in an applied sense:
library(taRifx)
sort(x, f=~b)
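To see what the substitute() versions above actually capture, here is a small sketch. The helper show_dots and the variables dots and ord are hypothetical names used only for illustration:

```r
# Hypothetical helper: return the unevaluated arguments passed through ...
show_dots <- function(...) as.list(substitute(list(...)))[-1]

dots <- show_dots(b, c)                 # nothing is evaluated, so b need not exist
vapply(dots, is.symbol, logical(1))     # the arguments arrive as symbols, not values

x <- data.frame(a = c(5, 6, 7), b = c(3, 5, 1))
ord <- eval(quote(order(b)), envir = x) # the symbol b is resolved inside the data frame
x[ord, ]

# Plain base R equivalent when the column is known in advance
x[order(x$b), ]
```

The original sort.df failed because order(...) evaluated b in the caller's environment, where no b exists; capturing the expression first and evaluating it with the data frame as the environment avoids that.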
I followed the discussion over HERE and am curious why using <<- is frowned upon in R. What kind of confusion will it cause?
I also would like some tips on how I can avoid <<-. I use the following quite often. For example:
### Create a dummy 10 x 10 integer matrix.
### Each cell contains a number between 1 and 6.
df <- do.call("rbind", lapply(1:10, function(i) sample(1:6, 10, replace = TRUE)))
What I want to achieve is to shift every number down by 1, i.e. all the 2s will become 1s, all the 3s will become 2s, etc. Therefore, every n would become n-1. I achieve this by the following:
df.rescaled <- df
sapply(2:6, function(i) df.rescaled[df.rescaled == i] <<- i-1)
In this instance, how can I avoid <<-? Ideally I would want to be able to pipe the sapply results into another variable along the lines of:
df.rescaled <- sapply(...)
First point
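<<- is NOT simply the operator for assigning to a global variable. It assigns to the variable in the nearest enclosing (parent) environment where that name is already defined, and only falls back to the global environment if the name is defined nowhere. So, for example, this can cause confusion: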
f <- function() {
  a <- 2
  g <- function() {
    a <<- 3   # modifies the a inside f, not the global a
  }
  g()         # call g so that the <<- assignment actually runs
}
then,
> a <- 1
> f()
> a # the global `a` is not affected
[1] 1
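The same lookup rule is what makes <<- useful for keeping state in a closure. A minimal sketch, with make_counter and counter as hypothetical names:

```r
make_counter <- function() {
  count <- 0                 # lives in the environment of this call
  function() {
    count <<- count + 1      # updates count in the enclosing environment
    count
  }
}

counter <- make_counter()
counter()
counter()                    # the second call sees the updated count
```

Here <<- finds count one level up, in the environment created by make_counter(), so no global variable is created or changed.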
Second point
You can do that by using Reduce:
Reduce(function(a, b) {a[a==b] <- a[a==b]-1; a}, 2:6, df)
or apply
apply(df, c(1, 2), function(i) if(i >= 2) {i-1} else {i})
But, more simply, this is sufficient:
ifelse(df >= 2, df-1, df)
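Two further ways to get the rescaled matrix as an ordinary assignment, sketched under the assumption that values of 1 stay at 1, as in the answers above. The names m, idx, rescaled and alt, and the seed, are illustrative:

```r
set.seed(1)                                    # arbitrary seed for reproducibility
m <- matrix(sample(1:6, 100, replace = TRUE), nrow = 10)

# Logical indexing on a copy: no <<- and no loop
rescaled <- m
idx <- rescaled >= 2
rescaled[idx] <- rescaled[idx] - 1L

# Equivalent one-liner: subtract 1 everywhere, but never go below 1
alt <- pmax(m - 1L, 1L)

all(rescaled == alt)
range(rescaled)
```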
You can think of <<- as global assignment (approximately, because as kohske points out it assigns to the top environment unless the variable name exists in a more proximal environment). Examples of why this is bad are here:
Examples of the perils of globals in R and Stata