Create a list of functions using R? [duplicate]

This question already has answers here:
Returning anonymous functions from lapply - what is going wrong?
(2 answers)
Closed 8 years ago.
I have the following R code (at the end of this question). After the last line I expect to get a list of 4 "retFun" functions, each initialized with a different x, so that I get the following result:
funList[[1]](1) == 7 #TRUE
funList[[2]](1) == 8 #TRUE
And so on, but what I seem to get is
funList[[1]](1) == 10 #TRUE
funList[[2]](1) == 10 #TRUE
As if each function in the list has the same x value
creatFun <- function(x, y)
{
  retFun <- function(z)
  {
    z + x + y
  }
}
myL <- c(1,2,3,4)
funList <- sapply(myL, creatFun, y = 5)

This could be (and probably is, somewhere) an exercise on how lazy evaluation works in R. You need to force the evaluation of x before the creation of each function:
creatFun <- function(x, y)
{
  force(x)
  retFun <- function(z)
  {
    z + x + y
  }
}
...and to be safe, you should probably force(y) as well for the times when you aren't passing a single value in for that parameter.
A good discussion can be found in Hadley's forthcoming book, particularly the section on lazy evaluation in the Functions chapter (scroll down).
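To see the fix in action, here is a minimal sketch (plain base R, no extra packages) of the forced version; each closure now captures its own x:

```r
creatFun <- function(x, y) {
  force(x)  # evaluate the promise for x now, before the closure is created
  force(y)  # same for y, to be safe when y is not a single constant
  function(z) z + x + y
}

funList <- lapply(c(1, 2, 3, 4), creatFun, y = 5)
funList[[1]](1)  # 7  (1 + 1 + 5)
funList[[2]](1)  # 8  (1 + 2 + 5)
```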

Related

Chaining assignments in R

I recently discovered that R allows chaining of assignments, e.g.
a = b = 1:10
a
[1] 1 2 3 4 5 6 7 8 9 10
b
[1] 1 2 3 4 5 6 7 8 9 10
I then thought that this could also be used in functions, if two arguments should take the same value. However, this was not the case. For example, plot(x = y = 1:10) produces the following error: Error: unexpected '=' in "plot(x = y =". What is different, and why doesn't this work? I am guessing this has something to do with only the first value being passed to the function, but both seem to be evaluated.
What are some possibilities and constraints with chained assignments in R?
I don't know about "canonical", but: this is one of the examples that illustrates how assignment (which can be done interchangeably with <- and =) and passing named arguments (which can only be done with =) are different. It's all about the context in which the expressions x <- y <- 10 or x = y = 10 are evaluated. On their own,
x <- y <- 10
x = y = 10
do exactly the same thing (there are a few edge cases where = and <- aren't completely interchangeable as assignment operators, e.g. having to do with operator precedence). Specifically, these are evaluated as (x <- (y <- 10)), or the equivalent with =. y <- 10 assigns the value 10 to y and returns the value 10; then x <- 10 is evaluated.
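A quick illustration of the chaining mechanics (plain R; the variable names here are arbitrary):

```r
x <- y <- 10   # parsed as x <- (y <- 10)
c(x, y)        # 10 10

# An assignment returns its value (invisibly), which is what makes
# chaining work; the returned value can be used in a larger expression:
z <- (w <- 5) + 1
c(z, w)        # 6 5
```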
Although it looks similar, this is not the same as the use of = to pass a named argument to a function. As noted by the OP, if f() is a function, f(x = y = 10) is not syntactically correct:
f <- function(x, y) {
  x + y
}
f(x = y = 10)
## Error: unexpected '=' in "f(x = y ="
You might be tempted to say "oh, then I can just use arrows instead of equals signs", but this does something different.
f(x <- y <- 10)
## Error in f(x <- y <- 10) : argument "y" is missing, with no default
This statement first evaluates the expression x <- y <- 10 (as above), and then calls f() with the result. If the function you are calling works with a single, unnamed argument (as plot() does), you will get a result, although not the result you expect. In this case, since f() has no default value for y, it throws an error.
People do sometimes use <- with a function call as a shortcut; in particular I like to use idioms like if (length(x <- ...) > 0) { <do_stuff> } so I don't have to repeat the ... later. For example:
if (length(L <- list(...)) > 0) {
  warning(paste("additional arguments to ranef.merMod ignored:",
                paste(names(L), collapse=", ")))
}
Note that the condition length(L <- list(...)) > 0 could also be written as just length(L <- list(...)) (since length() must return a non-negative integer, and 0 evaluates to FALSE), but I personally think that is a bridge too far in terms of compactness vs. readability ... I sometimes think it would be better to forgo the assignment-within-if entirely and write it as L <- list(...); if (length(L) > 0) { ... }
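For instance, a self-contained sketch of the assign-and-test idiom (the function f and its messages here are made up for illustration):

```r
f <- function(...) {
  if (length(L <- list(...)) > 0) {  # assign L and test its length in one step
    paste("ignoring", length(L), "extra arguments")
  } else {
    "no extra arguments"
  }
}

f(1, 2)  # "ignoring 2 extra arguments"
f()      # "no extra arguments"
```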
PS forcing the association of assignment in the other order leads to some confusing errors, I think due to R's lazy evaluation rules:
rm(x)
rm(y)
## neither x nor y is defined
(x <- y) <- 10
## Error in (x <- y) <- 10 : object 'x' not found
x <- y <- 5
## now both x and y are defined
(x <- y) <- 10
## Error in (x <- y) <- 10 : could not find function "(<-"

vectorized way to code function in R [duplicate]

This question already has answers here:
Vectorized IF statement in R?
(6 answers)
Closed 6 years ago.
If I want to code a math function:
f(x) = 12   for x > 1
     = x^2  otherwise
If I use
mathfn <- function(x)
{
  if (x > 1)
  {
    return(12)
  }
  else
  {
    return(x^2)
  }
}
then I suppose that's not a good way to code it, because it isn't generic over calls in which x is a vector; for example, plot() or integrate() fail.
plot(mathfn, 0,12)
Warning message:
In if (x > 1) { :
the condition has length > 1 and only the first element will be used
What's a more robust, vectorized idiom to code this so that x can either be a scalar or a vector?
Would something like this work:
mathfn <- function(x) ifelse(x > 1, 12, x^2)
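Yes; with ifelse() the function accepts scalars and vectors alike. One caveat worth noting: ifelse() evaluates both branches over the whole vector, which is harmless here but can matter if a branch is expensive or produces warnings.

```r
mathfn <- function(x) ifelse(x > 1, 12, x^2)

mathfn(0.5)            # 0.25
mathfn(c(0.5, 2, 3))   # 0.25 12.00 12.00
```

With this version, plot(mathfn, 0, 12) and integrate(mathfn, 0, 12) no longer complain, since mathfn now maps a vector to a vector of the same length.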

Why doesn't lazy evaluation work in this R function? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How to write an R function that evaluates an expression within a data-frame
I want to write a function that sorts a data.frame -- instead of using the cumbersome order(). Given something like
> x=data.frame(a=c(5,6,7),b=c(3,5,1))
> x
a b
1 5 3
2 6 5
3 7 1
I want to say something like:
sort.df(x,b)
So here's my function:
sort.df <- function(df, ...) {
  with(df, df[order(...), ])
}
I was really proud of this. Given R's lazy evaluation, I figured that the ... parameter would only be evaluated when needed -- and by that time it would be in scope, due to 'with'.
If I run the 'with' line directly, it works. But the function doesn't.
> with(x,x[order(b),])
a b
3 7 1
1 5 3
2 6 5
> sort.df(x,b)
Error in order(...) : object 'b' not found
What's wrong and how to fix it? I see this sort of "magic" frequently in packages like plyr, for example. What's the trick?
This will do what you want:
sort.df <- function(df, ...) {
  dots <- as.list(substitute(list(...)))[-1]
  ord <- with(df, do.call(order, dots))
  df[ord, ]
}
## Try it out
x <- data.frame(a=1:10, b=rep(1:2, length=10), c=rep(1:3, length=10))
sort.df(x, b, c)
And so will this:
sort.df2 <- function(df, ...) {
  cl <- substitute(list(...))
  cl[[1]] <- as.symbol("order")
  df[eval(cl, envir=df), ]
}
sort.df2(x, b, c)
It's because when you're passing b you're actually not passing an object. Put a browser() call inside your function and you'll see what I mean. I stole this from some Internet robot somewhere:
x=data.frame(a=c(5,6,7),b=c(3,5,1))
sort.df <- function(df, ..., drop = TRUE) {
  ord <- eval(substitute(order(...)), envir = df, enclos = parent.frame())
  return(df[ord, , drop = drop])
}
sort.df(x, b)
will work.
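The common trick in all of these versions is that substitute() captures the argument expressions unevaluated, so they can later be evaluated with the data frame supplied as the environment. A minimal illustration (g is a throwaway name, not part of the answer's code):

```r
g <- function(...) substitute(list(...))

e <- g(b, c)
e   # list(b, c) -- an unevaluated call, not an error about missing b

# Evaluating the captured call with a data frame as the environment
# resolves b and c as columns:
df <- data.frame(b = c(2, 1), c = c(5, 6))
eval(e, envir = df)   # list(c(2, 1), c(5, 6))
```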
So will this, if you're looking for a ready-made way to do it:
library(taRifx)
sort(x, f=~b)

What is the local/global problem with R?

Under what circumstances does the following example return a local x versus a global x?
The xi'an blog wrote the following at http://xianblog.wordpress.com/2010/09/13/simply-start-over-and-build-something-better/
One of the worst problems is scoping. Consider the following little gem.
f = function() {
  if (runif(1) > .5)
    x = 10
  x
}
The x being returned by this function is randomly local or global. There are other examples where variables alternate between local and non-local throughout the body of a function. No sensible language would allow this. It’s ugly and it makes optimisation really difficult. This isn’t the only problem, even weirder things happen because of interactions between scoping and lazy evaluation.
PS - Is this xi'an blog post written by Ross Ihaka?
Edit - Follow up question.
Is this the remedy?
f = function() {
  x = NA
  if (runif(1) > .5)
    x = 10
  x
}
This is only a problem if you write functions that take no arguments, or whose behaviour relies on the scoping of variables outside the current frame. You should either i) pass the objects a function needs in as arguments, or ii) create those objects inside the function that uses them.
Your f is coded incorrectly. If you possibly alter x, then you should pass x in, possibly setting a default of NA or similar if that is what you want the other side of the random flip to be.
f <- function(x = NA) {
  if (runif(1) > .5)
    x <- 10
  x
}
Here we see the function works as per your second function, but by properly assigning x as an argument with appropriate default. Note this works even if we have another x defined in the global workspace:
> set.seed(3)
> replicate(10, f())
[1] NA 10 NA NA 10 10 NA NA 10 10
> x <- 4
> set.seed(3)
> replicate(10, f())
[1] NA 10 NA NA 10 10 NA NA 10 10
Another benefit of this is that you can pass in an x if you want to return some other value instead of NA. If you don't need that facility, then defining x <- NA in the function is sufficient.
The above is predicated on what you actually want to do with f, which isn't clear from your posting and comments. If all you want to do is randomly return 10 or NA, define x <- NA.
Of course, this function is very silly as it can't exploit vectorisation in R - it is very much a scalar operation, which we know is slow in R. A better function might be
f <- function(n = 1, repl = 10) {
  out <- rep(NA, n)
  out[runif(n) > 0.5] <- repl
  out
}
or
f <- function(x, repl = 10) {
  n <- length(x)
  out <- rep(NA, n)
  out[runif(n) > 0.5] <- repl
  out
}
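A quick check of the vectorized version (the output is random, so only the shape and the set of possible values are asserted here, not specific positions):

```r
f <- function(n = 1, repl = 10) {
  out <- rep(NA, n)              # start with all NA
  out[runif(n) > 0.5] <- repl    # one draw per element, no scalar loop
  out
}

set.seed(1)
res <- f(n = 8)
res                          # a mix of NA and 10
length(res)                  # 8
all(is.na(res) | res == 10)  # TRUE
```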
Ross's example function was, I surmise, intentionally simple and silly to highlight the scoping issue - it should not be taken as an example of writing good R code, nor would it have been intended as such. Be aware of the scoping feature and code accordingly, and you won't get bitten. You might even find you can exploit this feature...
The x is only defined inside the function if the if condition is true. So when runif(1) > .5 holds, the final x makes the function return your local x (10); otherwise it returns a globally defined x (and if x is not defined globally either, the call fails):
> f =function() {
+ if (T)
+ x = 10
+ x
+ }
> f()
[1] 10
> f =function() {
+ if (F)
+ x = 10
+ x
+ }
> f()
Error in f() : Object 'x' not found
> x<-77
> f()
[1] 77

Function within function not activating as expected

I have a function that I use to get a "quick look" at a data.frame... I deal with a lot of survey data and this acts as a quick tool to see what's what.
f.table <- function(x) {
  if (is.factor(x[[1]])) {
    frequency <- function(x) {
      x <- round(length(x)/n, digits=2)
    }
    x <- na.omit(melt(x, c()))
    x <- cast(x, variable ~ value, frequency)
    x <- cbind(x, top2=x[,ncol(x)] + x[,ncol(x)-1], bottom=x[,2])
  }
  if (is.numeric(x[[1]])) {
    frequency <- function(x) {
      x[x > 1] <- 1
      x[is.na(x)] <- 0
      x <- round(sum(x)/n, digits=2)
    }
    x <- na.omit(melt(x))
    x <- cast(x, variable ~ ., c(frequency, mean, sd, min, max))
    x <- transform(x, variable=reorder(variable, frequency))
  }
  return(x)
}
What I find happens is that if I don't define "frequency" outside of the function, it returns wonky results for data frames with continuous variables. It doesn't seem to matter which definition I use outside of the function, so long as I do.
try:
n <- 100
x <- data.frame(a=c(1:25),b=rnorm(100),c=rnorm(100))
x[x > 20] <- NA
Now, select either one of the frequency functions and paste them in and try it again:
frequency <- function(x) {
  x <- round(length(x)/n, digits=2)
}
f.table(x)
Why is that?
Crucially, I think this is where your problem is. cast() is evaluating those functions without reference to the function it was called from. Inside cast(), fun.aggregate is evaluated via funstofun and, although I don't really follow exactly what it is doing, it ends up getting stats::frequency and not your local one.
Hence my comment on your Q. What do you want the function to do? At the moment it would seem necessary to define a "frequency" function in the global environment so that cast() or funstofun() finds it. Give it a unique name that is unlikely to clash with anything else, so it is the only thing found, say .Frequency(). Without knowing what you want the function to do (rather than what you thought f.table should do) it is a bit difficult to provide further guidance, but why not have .FrequencyNum() and .FrequencyFac() defined in the global workspace, and rewrite your f.table() wrapper's calls to cast() to use the relevant one?
.FrequencyFac <- function(X, N) {
  round(length(X)/N, digits=2)
}
.FrequencyNum <- function(X, N) {
  X[X > 1] <- 1
  X[is.na(X)] <- 0
  round(sum(X)/N, digits=2)
}
f.table <- function(x, N) {
  if (is.factor(x[[1]])) {
    x <- na.omit(melt(x, c()))
    x <- dcast(x, variable ~ value, .FrequencyFac, N = N)
    x <- cbind(x, top2=x[,ncol(x)] + x[,ncol(x)-1], bottom=x[,2])
  }
  if (is.numeric(x[[1]])) {
    x <- na.omit(melt(x))
    x <- cast(x, variable ~ ., c(.FrequencyNum, mean, sd, min, max), N = N)
    ## x <- transform(x, variable=reorder(variable, frequency))
    ## left this out as I wanted to see what cast returned
  }
  return(x)
}
Which I thought would work, but it is not finding N, and it should be. So perhaps I am missing something here?
By the way, it is probably not a good idea to rely on function that find n (in your version) from outside the function. Always pass in the variables you need as arguments.
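Passing n explicitly also makes the helper testable on its own. A sketch of that idea (prop_seen is a made-up name, standing in for the factor-case frequency helper):

```r
# Hypothetical replacement for the factor-case helper: n is now an
# argument rather than a free variable looked up outside the function.
prop_seen <- function(x, n) round(length(x) / n, digits = 2)

prop_seen(1:25, n = 100)  # 0.25
```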
I don't have the package that contains melt, but there are a couple of potential issues I can see:
Your frequency functions do not return anything.
It's generally bad practice to alter function inputs (x is the input and the output).
There is already a generic frequency function in the stats package in base R, which may cause issues with method dispatch (I'm not sure).
