I've vectorized a custom function, why is outer giving me an error? - r

Take
cubeAndAdd<-function(x,y){x^3+y^3}
outer(-1:1,-1:1,function(x,y) Vectorize(cubeAndAdd(x,y)))
Upon running this, you will get the warning message:
Warning message:
In formals(fun) : argument is not a function
Why is this? After all, if I truly weren't using a function, then this code wouldn't run at all.

The problem comes from what you're 'feeding' to Vectorize.
Vectorize wants a function as its argument. cubeAndAdd is a function, but cubeAndAdd(x,y) is a function call.
The original call is syntactically valid; the problem is what gets passed to Vectorize. To fix your outer call, use Vectorize to create the vectorized function first, and then call that new function:
outer(-1:1,-1:1,function(x,y) Vectorize(cubeAndAdd)(x,y))
Here, Vectorize(cubeAndAdd) is the function, and you're calling it using (x,y) as arguments: so Vectorize(cubeAndAdd)(x,y)
(Although the suggestion to just remove the entire anonymous function(x,y) from the outer loop works here (and makes the one-liner shorter), it's often a good idea to explicitly 'feed' the arguments to the function, as you are doing, since this allows one to use functions that expect additional arguments).
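As a rough sketch of the options discussed above (powAndAdd and maxCube are hypothetical helpers introduced only for illustration, not part of the original question):

```r
cubeAndAdd <- function(x, y) x^3 + y^3

# Vectorize() receives the function itself; the result is then called with (x, y).
m1 <- outer(-1:1, -1:1, function(x, y) Vectorize(cubeAndAdd)(x, y))

# cubeAndAdd only uses ^ and +, which are already vectorized, so it can be passed directly.
m2 <- outer(-1:1, -1:1, cubeAndAdd)

# Keeping the anonymous function makes it easy to supply an additional argument.
# powAndAdd is a hypothetical function with a third parameter p.
powAndAdd <- function(x, y, p) x^p + y^p
m3 <- outer(-1:1, -1:1, function(x, y) powAndAdd(x, y, p = 2))

# Vectorize() matters for functions that only handle scalars, e.g. ones built on if ().
# maxCube is hypothetical as well.
maxCube <- function(x, y) if (x > y) x^3 else y^3
m4 <- outer(-1:1, -1:1, Vectorize(maxCube))
```

Here m1 and m2 should hold the same 3 x 3 matrix; the Vectorize wrapper is only really needed in cases like maxCube.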

Related

Suppressing parentheses in a function that uses the pipe operator

Can somebody explain why I have to suppress the parentheses for the function is.factor in the command shown below? Student-data was read from a .csv file. I can see the structure of Student-data and I want to select only the factor variables. The command works fine but I cannot see why I cannot write the parentheses. I saw an example in the forum. Sorry if the question is silly or has been asked before. I could not find any similar question.
studentData%>%select_if(is.factor)
It's not the pipe, %>%, that requires you to "drop the brackets"; it's select_if. From the documentation:
.predicate "A predicate function to be applied to the columns or a logical vector. The variables for which .predicate is or returns TRUE are selected. This argument is passed to rlang::as_function() and thus supports quosure-style lambda functions and strings representing function names."
You're not evaluating the function here. You're passing an R object. (Functions are objects, just as data.frames or scalars are). The evaluation happens later, in the guts of select_if. Including the brackets would tell R to evaluate the function at the time the select_if call was executed. That's not correct. It needs to be evaluated later.
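A minimal base-R sketch of the difference between passing a function and calling it. As an assumption made for illustration, Filter stands in for select_if so that no packages are needed, and studentData is a made-up toy data frame:

```r
# Toy data (hypothetical): one factor column, two non-factor columns.
studentData <- data.frame(
  name  = c("Ann", "Bob"),
  grade = factor(c("A", "B")),
  age   = c(20, 21),
  stringsAsFactors = FALSE
)

# Passing the function object: Filter() decides when to call it, once per column.
factorCols <- Filter(is.factor, studentData)

# Writing is.factor() calls the function immediately, with no argument,
# which fails before Filter() can apply it to any column.
withBrackets <- tryCatch(
  Filter(is.factor(), studentData),
  error = function(e) "error"
)
```

The same idea applies to select_if: it expects the function itself so that it can call it later on each column.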

Can we use apply function along with some user defined function or if/while loops in R to conditionally work it on selective rows?

I know that while and if functions in R are not vectorised. while and if functions help us selectively work on some rows based on some condition. I also know that the apply function in R is used to apply over the columns and hence it operates on all rows of columns that we wish to put apply on. Can I use apply() along with user defined functions and/or with while/if loop to conditionally use it over some rows rather than all rows as apply function usually does.
Note: The core issue here is to work around the fact that while/if in R are not vectorized.
You can supply a user-defined function to apply through its FUN argument, e.g. FUN = function(x) user_defined_function(x). apply is "vectorized" only in the sense that it accepts vectors rather than scalars as arguments; its implementation relies heavily on for and if (type apply without parentheses in your console to see the source). So for and apply have roughly the same performance.
You can, however, break out of a user-defined function by throwing an exception with stop and wrapping the call in tryCatch. This is not a recommended technique: it affects environments, call stacks, scopes, etc., makes debugging difficult, and leads to errors that are hard to identify.
It is better to use for and if; very often that is the easiest and most effective way. (Writing a recursive function, bearing in mind that (tail) recursion is not really optimized in R, or fully refactoring your algorithm is quite difficult and time-consuming.)
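A small sketch of working on selected rows only, under assumed toy data (the scores data frame and the "a greater than b" condition are invented for illustration). It shows the for/if approach recommended above, the same logic through apply with a user-defined function, and, as an alternative that was not part of the answer, the vectorized ifelse:

```r
# Hypothetical data: compute a - b only for rows where a > b.
scores <- data.frame(a = c(1, 5, 3), b = c(4, 2, 6))

# Plain for/if loop.
viaLoop <- rep(NA_real_, nrow(scores))
for (i in seq_len(nrow(scores))) {
  if (scores$a[i] > scores$b[i]) {
    viaLoop[i] <- scores$a[i] - scores$b[i]
  }
}

# apply() over rows (MARGIN = 1) with a user-defined function containing if.
rowDiff <- function(r) if (r[["a"]] > r[["b"]]) r[["a"]] - r[["b"]] else NA_real_
viaApply <- unname(apply(scores, 1, rowDiff))

# Vectorized alternative: ifelse() evaluates the condition for all rows at once.
viaIfelse <- ifelse(scores$a > scores$b, scores$a - scores$b, NA_real_)
```

All three should agree; only the second row satisfies the condition here.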

using clusterApply with unknown number of arguments

I want to be able to generalise the behavior of clusterApply() so that I can parallelise functions with different number of arguments.
Normally, I use clusterApply() like this:
clusterApply(cl=cl,seq_len(nsim),fun=runsim,arg1,arg2,arg3)
But what if I don't know how many arguments function runsim has? I was thinking of using do.call("runsim",listofArguments), but I don't know if I can use it inside of clusterApply.
Any suggestions?
The main issue seems to be the fact that do.call wants the function (or name thereof) as first argument while clusterApply, like all functions from the apply family, passes the iterated over object as the first argument to the function it calls. Consequently one solution could be:
clusterApply(cl=cl,seq_len(nsim),fun=function(x) do.call(runsim, args = list(...)))
... can now be filled with whatever different arguments there are including the possibility to hand over x (i.e., the slice of the iterated over object, in this case the simulation number).
I do not see the need to also wrap clusterApply into do.call as you know which function to call (clusterApply).
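One way this could look in practice, as a sketch under assumptions: runsim below is a deterministic stand-in for the real simulation, the two-worker cluster and the contents of listofArguments are made up, and the parameter names simFun and simArgs are hypothetical. The function and its argument list are passed through clusterApply's ... so that they reach the workers without a separate export step:

```r
library(parallel)

# Hypothetical simulation function: the first argument is the simulation number.
runsim <- function(i, mean, sd) i + mean + sd

# However many extra arguments runsim takes, collected in one list.
listofArguments <- list(mean = 10, sd = 0.5)

cl <- makeCluster(2)  # assumed: two local workers
res <- clusterApply(
  cl, seq_len(3),
  function(x, simFun, simArgs) do.call(simFun, c(list(x), simArgs)),
  simFun = runsim, simArgs = listofArguments
)
stopCluster(cl)

unlist(res)
```

The names simFun and simArgs were chosen so that they do not partially match clusterApply's own fun argument.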

lapply-ing with the "$" function

I was going through some examples in hadley's guide to functionals, and came across an unexpected problem.
Suppose I have a list of model objects,
x=1:3;y=3:1; bah <- list(lm(x~y),lm(y~x))
and want to extract something from each (as suggested in hadley's question about a list called "trials"). I was expecting one of these to work:
lapply(bah,`$`,i='call') # or...
lapply(bah,`$`,call)
However, these return nulls. It seems like I'm not misusing the $ function, as these things work:
`$`(bah[[1]],i='call')
`$`(bah[[1]],call)
Anyway, I'm just doing this as an exercise and am curious where my mistake is. I know I could use an anonymous function, but think there must be a way to use syntax similar to my initial non-solution. I've looked through the places $ is mentioned in ?Extract, but didn't see any obvious explanation.
I just realized that this works:
lapply(bah,`[[`,i='call')
and this
lapply(bah,function(x)`$`(x,call))
Maybe this just comes down to some lapply voodoo that demands anonymous functions where none should be needed? I feel like I've heard that somewhere on SO before.
This is documented in ?lapply, in the "Note" section (emphasis mine):
For historical reasons, the calls created by lapply are unevaluated,
and code has been written (e.g. bquote) that relies on this. This
means that the recorded call is always of the form FUN(X[[0L]],
...), with 0L replaced by the current integer index. This is not
normally a problem, but it can be if FUN uses sys.call or
match.call or if it is a primitive function that makes use of the
call. This means that it is often safer to call primitive functions
with a wrapper, so that e.g. lapply(ll, function(x) is.numeric(x))
is required in R 2.7.1 to ensure that method dispatch for is.numeric
occurs correctly.
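In line with that note, a short sketch of the two forms the question reports as working. The usual explanation, stated here with some caution, is that `$` does not evaluate its second argument, so inside lapply it sees the literal ... of the internal call rather than the name call:

```r
x <- 1:3
y <- 3:1
bah <- list(lm(x ~ y), lm(y ~ x))

# `[[` evaluates its index argument, so the string "call" arrives intact.
viaBracket <- lapply(bah, `[[`, "call")

# Wrapping `$` in an anonymous function, as the note suggests for primitives.
viaWrapper <- lapply(bah, function(m) m$call)

deparse(viaBracket[[1]])
```

Both lists should contain the recorded lm calls rather than NULL.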

R - where can vectorize happen?

So clearly one way to vectorize a function is WITHIN the function - either explicitly iterate over inputs or utilize other functions that have been vectorized. Is there a way to mark or tag a function as being/treated as vectorized so that the iteration is managed by the R platform? The analogy would be attributes in c# or annotations in Java. I tell R that this function should be treated as vectorized and it feeds that input one at a time into the function, constructing the vector output? Or am I just thinking about this whole thing incorrectly?
You can use the Vectorize function (http://stat.ethz.ch/R-manual/R-patched/library/base/html/mapply.html), to make the function take vectors.
But here it just uses the mapply function to do the vectorization. As Gavin said, you are just hiding the loop.
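A minimal sketch of what that looks like (posPower is a hypothetical scalar-only function made up for illustration). The wrapper returned by Vectorize gives the same result as calling mapply by hand, which is the sense in which the loop is only hidden:

```r
# Scalar-only function: if () cannot handle a vector condition.
posPower <- function(x, n) if (x > 0) x^n else 0

# Vectorize() returns a wrapper that loops over the arguments via mapply().
vPosPower <- Vectorize(posPower)

viaVectorize <- vPosPower(c(-1, 2, 3), 2)

# The equivalent explicit mapply() call.
viaMapply <- mapply(posPower, c(-1, 2, 3), 2)
```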
