R apply with custom functions

I'm having trouble applying a custom function in R. The basic setup is that I have a set of points, and I want to overlay a grid and get the maximum z value from each cell.
The code I'm trying is below. The result I'm looking for is myGrid$z = c(5, 10, NA). The NA could be a different value, as long as I can filter it out later. I'm getting an error at the apply stage.
I believe the error is in how I'm using apply, but I haven't been able to wrap my head around apply.
thanks,
Gordon
myPoints<-data.frame(x=c(0.7,0.9,2),y=c(0.5,0.7,3), z=c(5,3,10))
myGrid<-data.frame(x=c(0.5,2,4),y=c(0.5,3,10))
grid_spacing = 1
get_max_z<-function(x,y) {
z<-max(myPoints$z[myPoints$x > (x-grid_spacing/2)
& myPoints$x <= (x+grid_spacing/2)
& myPoints$y > (y-grid_spacing/2)
& myPoints$y <= (y+grid_spacing/2)])
return(z)
}
myGrid$z<-apply(myGrid,1,get_max_z(x,y),x=myGrid$x,y=myGrid$y)
Edited to include the return(z) line I left out. Also added $y to the line of the custom function above return.

First of all, I would recommend always boiling a question down to its core instead of just posting code. But I think I know what your problem is:
> df <- data.frame(x = c(1,2,3), y = c(2,1,5))
> f <- function(x,y) {x+y}
> apply(df,1,function(d)f(d["x"],d["y"]))
[1] 3 3 8
apply(df,1,.) traverses df row-wise and hands the current row as an argument to the provided function. This row is a vector, passed into the anonymous function via its only argument, d. You can then access the elements of the vector and hand them on to your custom function f, which takes two parameters.
I think if you understand this small piece of code, you will know how to adjust it for your case.
UPDATE:
Essentially you make two mistakes.
First, you hand a function call instead of a function to apply:
function call: get_max_z(x,y)
function: function(x,y)get_max_z(x,y)
Second, you misinterpreted the meaning of "..." in the manual for apply as the way to hand over the arguments. It is actually just the way to pass additional arguments that are independent of the traversed data object.
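Putting both points together for the grid example in the question, a minimal sketch might look like the following. It reuses myPoints, myGrid, grid_spacing and get_max_z from the question. The empty-cell check that returns NA, the anonymous wrapper, and the mapply alternative are assumptions of this sketch, not part of the original answer.

```r
myPoints <- data.frame(x = c(0.7, 0.9, 2), y = c(0.5, 0.7, 3), z = c(5, 3, 10))
myGrid <- data.frame(x = c(0.5, 2, 4), y = c(0.5, 3, 10))
grid_spacing <- 1

get_max_z <- function(x, y) {
  # Logical mask of the points that fall inside the cell centred on (x, y)
  in_cell <- myPoints$x > (x - grid_spacing / 2) &
    myPoints$x <= (x + grid_spacing / 2) &
    myPoints$y > (y - grid_spacing / 2) &
    myPoints$y <= (y + grid_spacing / 2)
  z <- myPoints$z[in_cell]
  # Assumption: return NA for an empty cell rather than max()'s -Inf and warning
  if (length(z) == 0) NA_real_ else max(z)
}

# Pass a function (not a function call); each row arrives as a named vector
z_apply <- apply(myGrid, 1, function(row) get_max_z(row[["x"]], row[["y"]]))

# Alternative: mapply walks the two columns in parallel
z_mapply <- mapply(get_max_z, myGrid$x, myGrid$y)

myGrid$z <- z_apply
```

Both calls should give 5, 10 and NA for the three grid cells. mapply avoids the conversion of the data frame to a matrix that apply performs, which matters once the data frame holds columns of mixed types.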

Related

Attempting Pass by reference in R

I want to emulate call by reference in R and in my search came across this link https://www.r-bloggers.com/call-by-reference-in-r/.
Using the strategy given in the link above, I tried to create a function that modifies the integer vector passed to it and also returns the modified vector. Here is its implementation:
library(purrr)
fun = function(top){
stopifnot(is_integer(top))
top1 <- top
top1 <- c(top1,4L)
eval.parent(substitute(top<-top1))
top1
}
When I create a variable and pass to this function, it works perfectly as shown
> k <- c(9L,5L)
> fun(k)
[1] 9 5 4
> k
[1] 9 5 4
But when I pass the integer vector directly, it throws an error:
> fun(c(3L,4L))
Error in c(3L, 4L) <- c(3L, 4L, 4L) :
target of assignment expands to non-language object
Is there a workaround for this situation, so that if a vector is passed directly, the function only returns the modified vector as the result?
Any help would be appreciated...
There is no workaround for this. You've essentially created a function that takes a variable name as input and modifies that variable as a side effect of running. Because c(3L,4L) is not a variable name, the function cannot work as intended.
To be clear, what you have right now is not really pass-by-reference. Your function resembles it superficially, but is in fact using some workarounds to simply evaluate an expression in the function's parent environment, instead of its own. This type of "operation by side effect" is generally considered bad practice (such changes are hard to track and debug, and prone to error), and R is built to avoid them.
Pass-by-reference in R is generally not possible, nor have I found it necessary in over a decade of daily R use.
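For comparison, the side-effect-free style this answer points towards can be sketched as follows: the function only returns the new vector, and the caller decides whether to reassign. The name append_four is hypothetical and used only for illustration.

```r
# Hypothetical helper: returns a modified copy, never touches the caller's variable
append_four <- function(top) {
  stopifnot(is.integer(top))
  c(top, 4L)
}

k <- c(9L, 5L)
k <- append_four(k)                # explicit reassignment replaces the side effect
direct <- append_four(c(3L, 4L))   # a literal vector works too
```

With this shape the same function handles both a variable and a directly passed vector, because nothing depends on the argument being a variable name.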

How do I remove an object from within a function environment in R?

How do I remove an object from the current function environment?
I'm trying to achieve this:
foo <- function(bar){
x <- bar
rm(bar, envir = environment())
print(c(x, is.null(bar)))
}
Because I want the function to be able to handle multiple inputs.
Specifically I'm trying to pass either a dataframe or a vector to the function, and if I'm passing a dataframe I want to set the vector to NULL for later error handling.
If you want, you can look at my DepthPlotter script, where I want the second function to check whether depth is a dataframe and, if so, assign it to df instead and remove depth from the environment.
Here is a very brief sketch of how to set this up using S3 method dispatch.
First, you define your generic:
DepthPlotter <- function(depth,...){
UseMethod("DepthPlotter", depth)
}
Then you define methods for specific classes of the argument depth. As a very basic example in your case, you might create only two, a data.frame method and a default method to handle the vector case:
DepthPlotter.default <- function(depth, variable, ...){
#Here you write a function assuming that depth is
# anything but a data frame
}
DepthPlotter.data.frame <- function(depth,...){
#Here you'd write a function that assumes
# that depth is a data frame
}
And then you can call DepthPlotter() using either type of argument and the correct function will be run based upon the result of class(depth).
The example I've sketched out here is a little crude, since I've used a default method to handle the vector case. You could write .numeric and .integer methods to handle numeric or integer vectors more specifically. In my example, the .default method will be called for any case other than data.frame, so if you go this route you'd want to write some code in there that checks for strange cases like depth being a complicated list, or other odd object, if you think there's a chance something like that might be passed to the function.
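To see the dispatch in action, here is a small self-contained sketch. The generic describe_depth and the strings it returns are hypothetical stand-ins for the real plotting logic. The default method also includes the kind of sanity check suggested above.

```r
# Hypothetical generic, used only to illustrate S3 dispatch
describe_depth <- function(depth, ...) {
  UseMethod("describe_depth")
}

# Default method: handles vectors and rejects anything unexpected
describe_depth.default <- function(depth, ...) {
  if (!is.numeric(depth)) {
    stop("depth must be a numeric vector or a data frame")
  }
  paste("vector of length", length(depth))
}

# data.frame method: chosen automatically when class(depth) includes "data.frame"
describe_depth.data.frame <- function(depth, ...) {
  paste("data frame with", nrow(depth), "rows")
}

from_vector <- describe_depth(c(1.5, 2, 3))
from_df <- describe_depth(data.frame(depth = c(1, 2)))
```

Each method only ever sees the type it was written for, so there is no need to remove or NULL out an argument inside the function.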

returning functions in R - when does the binding occur?

As in other functional languages, returning a function is common in R. For example, after training a model you might want to return a "predictor" object, which is essentially a function that, given new data, returns predictions. There are other cases where this is useful, of course.
My question is when the binding (i.e. evaluation) of values within the returned function occurs.
As a simple example, suppose I want to have a list of three functions, each is slightly different based on a parameter whose value I set at the time of the creation of the function. Here is a simple code for this:
function.list = list()
for (i in 1:3) function.list[[i]] = function(x) x+i
So now I have three functions. Ideally, the first one returns x+1, the second computes x+2 and the third computes x+3
so I would expect:
function.list[[1]] (3) = 4
function.list[[2]] (3) = 5
etc.
Unfortunately, this doesn't happen, and all the functions in the list above compute the same x+3. My question is why. Why is the value of i bound so late, and hence the same for all the functions in the list? How can I work around this?
EDIT:
rawr's link to a similar question was insightful, and I thought it solved the problem. Here is the link:
Explain a lazy evaluation quirk
However, I checked the code I gave above with the fix suggested there, and it still doesn't work. I must be missing something very basic here. Can anyone tell me what it is? Here is the "fixed" code (which still doesn't work):
function.list = list()
for (i in 1:3) { force(i); function.list[[i]] = function(x) x+i}
Still function.list[[1]] (3) gives 6 and not 4 as expected.
I also tried the following (i.e. putting the force() inside the function):
function.list = list()
for (i in 1:3) function.list[[i]] = function(x) {force(i);x+i}
what's going on?
Here's a solution with a for loop, using R 3.1:
> makeadd=function(i){force(i);function(x){x+i}}
> for (i in 1:3) { function.list[[i]] = makeadd(i)}
> rm(i) # not necessary but causes errors if we're actually using that `i`
> function.list[[1]](1)
[1] 2
> function.list[[2]](1)
[1] 3
The makeadd function creates the adding function in a context with a local i, which is why this works. It would be interesting to know if this works without the force in R 3.2. I always use the force, Luke....
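The reason the attempts in the question fail is that a closure captures an environment, not a value. Every function created directly in the loop shares the same enclosing environment, where i is an ordinary variable (not a lazily evaluated argument), so force(i) has nothing to fix. The sketch below contrasts that with two ways of giving each function its own environment. The names shared, make_adder, adders and local_adders are illustrative only.

```r
# All three functions look up i in the same environment, where it ends up as 3
shared <- list()
for (i in 1:3) shared[[i]] <- function(x) x + i
shared_result <- shared[[1]](3)

# A factory gives each closure its own environment with its own i
make_adder <- function(i) {
  force(i)  # evaluate the argument now rather than on first use
  function(x) x + i
}
adders <- lapply(1:3, make_adder)

# local() achieves the same inside a for loop by copying i into a fresh environment
local_adders <- list()
for (i in 1:3) {
  local_adders[[i]] <- local({
    j <- i
    function(x) x + j
  })
}
```

Here shared_result is 6, while adders[[1]](3) and local_adders[[1]](3) both give 4.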

Function return value changes if use local variable

I have two snippets of code which I would have expected to behave the same, but they don't:
position <- function(t) {
coordinates <- c(cosh(t), sinh(t))
return(coordinates[1])
}
and
position <- function(t) {
coordinates <- c(cosh(t), sinh(t))
return(cosh(t))
}
I use the function position to plot a curve. With the first snippet the curve is not plotted. With the second snippet the curve is plotted.
What is the functional difference between the two snippets, and why?
What gets returned will depend on the type of argument passed. If the argument "t" is a matrix, as might be expected for a function designed to deal with coordinates, then a matrix is returned from cosh(t) and from sinh(t).
The first function would only return the first element of the combined values, which are "straightened out" because the c function causes them to lose their dimensions. If you want to preserve the matrix character, use rbind or cbind, depending on what the next function to process the data expects.
The second function would first calculate "coordinates" and then let it disappear into the garbage collector since it returns the matrix formed by cosh(t) instead.
You will not be able to get a better answer since you are at the moment making us all guess about what sort of data structure you are passing to the function. You should post the results of dput() on your argument to this function. And you should tell us what the help page for the plotting function expects as an argument type.
The result of
coordinates <- c(cosh(t), sinh(t))
is a numeric vector of length 2 * length(t).
The command
return(coordinates[1])
returns only the first value of this vector. (The results of coordinates[1] and cosh(t) are identical only if length(t) == 1.) To return the result of cosh(t), you could index coordinates with a sequence based on the length of t:
coordinates <- c(cosh(t), sinh(t))
return(coordinates[seq_along(t)])
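A quick check with a vector argument makes the difference visible. The input values below are arbitrary and chosen only for illustration.

```r
t <- c(0, 1, 2)
coordinates <- c(cosh(t), sinh(t))

n_all <- length(coordinates)            # 2 * length(t) values in one flat vector
first_only <- coordinates[1]            # a single number, cosh(t[1])
cosh_part <- coordinates[seq_along(t)]  # the first length(t) values, i.e. cosh(t)
```

A plotting routine that evaluates position on a whole vector of t values would therefore receive a single number from the first snippet and a full vector from the second.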
Use double brackets in your first example.
coordinates[[1]]
As a useful tip when troubleshooting, if you explore the output of your two functions using str(position(x)) for your two different functions, you should see the difference.
Try also
str(vec[1])
str(vec[[1]])

Subsetting within a function

I'm trying to subset a dataframe within a function using a mixture of fixed variables and some variables which are created within the function (I only know the variable names, but cannot vectorise them beforehand). Here is a simplified example:
a<-c(1,2,3,4)
b<-c(2,2,3,5)
c<-c(1,1,2,2)
D<-data.frame(a,b,c)
subbing<-function(Data,GroupVar,condition){
g=Data$c+3
h=Data$c+1
NewD<-data.frame(a,b,g,h)
subset(NewD,select=c(a,b,GroupVar),GroupVar%in%condition)
}
Keep in mind that in my application I cannot compute g and h outside of the function. Sometimes I'll want to make a selection according to the values of h (as above) and other times I'll want to use g. There's also the possibility I may want to use both, but even just being able to subset using 1 would be great.
subbing(D,GroupVar=h,condition=5)
This returns an error saying that the object h cannot be found. I've tried to amend subset using as.formula and all sorts of things but I've failed every single time.
Besides the ease of the function there is a further reason why I'd like to use subset.
In the function I'm actually working on I use subset twice. The first time it's the simple subset function. It's just been pointed out below that another blog explored how it's probably best to use the good old data[colnames()=="g",]. Thanks for the suggestion, I'll have a go.
There is however another issue. I also use subset (or rather a variation) in my function because I'm dealing with several complex design surveys (see package survey), so subset.survey.design allows you to get the right variance estimation for subgroups. If I selected my group using [] I would get the wrong s.e. for my parameters, so I guess this is quite an important issue.
Thank you
It's happening right as the function is trying to define GroupVar in the beginning. R is looking for the object h by itself (not within the dataframe).
The best thing to do is refer to the column names in quotes in the subset function. But of course, then you'd have to sidestep the condition part:
subbing <- function(Data, GroupVar, condition) {
....
DF <- subset(Data, select=c("a","b", GroupVar))
DF <- DF[DF[,3] %in% condition,]
}
That will do the trick, although it can be annoying to have one data frame indexing inside another.
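A complete version of this idea, under the assumption that the grouping column is passed as a quoted name, might look like the sketch below. The function name subbing_by_name is hypothetical, and plain [ indexing stands in for subset here, so it does not address the survey-design case mentioned in the question.

```r
D <- data.frame(a = c(1, 2, 3, 4), b = c(2, 2, 3, 5), c = c(1, 1, 2, 2))

# Hypothetical variant of subbing(): GroupVar is a column name given as a string
subbing_by_name <- function(Data, GroupVar, condition) {
  NewD <- data.frame(a = Data$a, b = Data$b, g = Data$c + 3, h = Data$c + 1)
  keep <- NewD[[GroupVar]] %in% condition
  NewD[keep, c("a", "b", GroupVar), drop = FALSE]
}

result <- subbing_by_name(D, GroupVar = "g", condition = 5)
```

Because the column is looked up by name inside NewD, R never tries to find an object called g or h in the calling environment.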
