How to avoid using sapply() on user-defined functions in R

I am a beginner with R programming. Recently I wrote a user-defined function as follows:
foo <- function(x){
power <- 1:4
sum(x^power)
}
This function works fine when x is a single number. For example, when x = 1 the result is 4, and when x = 10 the result is 11110. However, the function doesn't work with vectors. For example, when x <- c(1, 10), the result is 10102, which is not what I want. My desired result is a vector such as 4 11110. I know this problem can be solved by calling sapply() on the function or by adding a for loop inside the function, but I think there might be another way to rewrite the function without using loops or "apply" functions. I have tried different ways to rewrite the function but nothing works. Can somebody help me solve the problem? Thanks!

Mathematically, a simpler and more direct approach is to rewrite foo using the closed form of the geometric sum, x + x^2 + ... + x^n = x(x^n - 1)/(x - 1), with x = 1 handled separately:
foo <- function(x) {
power <- 1:4
ifelse(x==1,max(power),x*(x**(max(power))-1)/(x-1))
}
which gives
> foo(c(1,10))
[1] 4 11110
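As a quick sanity check of the closed form, here is a minimal sketch that compares it with an explicit sapply() loop on a few test values. The names foo_closed and foo_loop and the argument n are made up for this illustration, and the shortcut only applies when the powers are the consecutive integers 1:n.

```r
# Closed form of x + x^2 + ... + x^n, vectorized over x (hypothetical helper).
foo_closed <- function(x, n = 4) {
  # x == 1 would give 0/0, so that case is replaced by n
  ifelse(x == 1, n, x * (x^n - 1) / (x - 1))
}

# Reference version that loops over x explicitly (hypothetical helper).
foo_loop <- function(x, n = 4) {
  sapply(x, function(xi) sum(xi^(1:n)))
}

x <- c(-2, 0, 0.5, 1, 3, 10)
all.equal(foo_closed(x), foo_loop(x))
```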

I don't think there is a way to avoid some kind of implicit or explicit loop, since power is a vector and you are combining it with x, which is another vector.
Here are a few options:
Your best bet is sapply (which you have already figured out).
sapply(c(1, 10), foo)
#[1] 4 11110
Another way is to use Vectorize, where you cannot "see" the loop but it still loops underneath, as it is a wrapper around mapply.
Vectorize(foo)(c(1, 10))
#[1] 4 11110
Using outer :
foo <- function(x){
power <- 1:4
rowSums(outer(x, power, `^`))
}
foo(c(1, 10))
#[1] 4 11110
and obviously you can write a simple for loop as well and pass c(1, 10) to it.
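To see why the original function returns 10102 and what outer() changes, it may help to look at the intermediate values. This is only an illustrative sketch; the variable names recycled and m are made up here.

```r
x <- c(1, 10)
power <- 1:4

# x^power recycles x along power: 1^1, 10^2, 1^3, 10^4
recycled <- x^power
recycled       # 1 100 1 10000
sum(recycled)  # 10102

# outer() instead builds every x/power combination:
# one row per element of x, one column per power
m <- outer(x, power, `^`)
m
#      [,1] [,2] [,3]  [,4]
# [1,]    1    1    1     1
# [2,]   10  100 1000 10000
rowSums(m)     # 4 11110
```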

This works:
foo <- function(x, power = 1:4){
ind <- 1 + seq_along(power)
power <- matrix(rep(power, length(x)), nrow = length(x), byrow = TRUE)
x <- as.matrix(x)
m <- cbind(x, power)
m <- m[, 1]^m[, ind, drop = FALSE] # drop = FALSE keeps a matrix when x has length 1
v <- rowSums(m)
return(v)
}
foo(x = c(1, 10))
## [1] 4 11110
Runs about 8.5x faster than using sapply(x, foo) (when x is a vector of length 1,000,000). It's a bit late here, so I don't know whether you could optimise the internals a little better.
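If you want to check the speed difference on your own machine, a rough sketch along these lines should do. The names foo_scalar and foo_outer are made up for the comparison, the vector is shorter than the one quoted above to keep the run quick, and the timings will vary by machine, so none are shown.

```r
# Scalar version from the question and an outer()-based vectorized version.
foo_scalar <- function(x, power = 1:4) sum(x^power)
foo_outer  <- function(x, power = 1:4) rowSums(outer(x, power, `^`))

set.seed(1)
x <- runif(1e5)

t_sapply <- system.time(r1 <- sapply(x, foo_scalar))
t_outer  <- system.time(r2 <- foo_outer(x))

all.equal(r1, r2)  # both approaches should agree up to rounding
t_sapply["elapsed"]
t_outer["elapsed"]
```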

Related

how to use apply (or sapply) with columns of matrix or dataframe as function args

I know this is a bonehead newbie question, but I've been trying to figure it out for quite a while and need some input. Basically, I'm trying to learn how to use the apply family to replace for loops, specifically how to set up the call so that columns of a matrix serve as arguments to the function. I'll use a simple call to the rbinom function as an example.
Example: this for loop works fine. The data are a set of integers and a set of probabilities
success <- rep(-1, times=10) # initialize result var
num <- sample.int(20, 10) # get 10 random integers
p <- runif(10) # get 10 random probabilities
for (i in 1:10) {
success[i]= rbinom(n=1, size=num[i],prob=p[i]) # number successes in 1 trial
}
But how to do the same thing with the apply family? I first put the data into 2 columns of a matrix, thinking that was the right start. However, the following does NOT work, obviously due to my poor understanding of how to set up a call to apply.
myData <- matrix(nrow=10, ncol=2)
myData[,1] <- num
myData[,2] <- p
success <- apply(myData, rbinom, n=1, size=myData[,1], prob=myData[,2])
Any tips are greatly appreciated! I'm coming to R from Fortran, and trying to port over a lot of code that is loaded with DO loops, so I really need to get my head around this.
lapply, sapply, apply only deal with one vector/list at a time. That is, apply will only call its function for one column at a time. What you need is mapply or Map.
myData <- matrix(nrow=10, ncol=2)
myData[,1] <- num
myData[,2] <- p
mapply(rbinom, n = 1, myData[,1], myData[,2])
# [1] 5 4 11 8 3 3 17 8 0 11
Just like lapply returns a list, so does Map; similarly, just like sapply, mapply will return a vector or array if all return values are compatible, otherwise it returns a list as well.
These calls are equivalent:
sapply(1:3, function(z) z + 1)
mapply(function(z) z + 1, 1:3)
but mapply and Map allow an arbitrary number of lists/vectors, so for instance
func <- function(X,Y,Z) X^2+2*Y-Z
Map(func, 1:9, 11:19, 21:29)
## effectively the same as
list(
func(1, 11, 21),
func(2, 12, 22),
func(3, 13, 23),
...,
func(9, 19, 29)
)
The equivalent call of that with sapply for your data would be
sapply(seq_len(nrow(myData)), function(ind) {
rbinom(n = 1, size = myData[ind,1], prob = myData[ind,2])
})
though I personally feel that mapply is easier to read.
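As a small runnable check of the Map/mapply behaviour described above, here is a sketch using the same func on shorter inputs. The variable names res_list, res_vec and res_direct are made up for the illustration.

```r
func <- function(X, Y, Z) X^2 + 2 * Y - Z

# Map always returns a list, one element per set of arguments
res_list <- Map(func, 1:3, 11:13, 21:23)

# mapply simplifies to a vector when every result has length 1
res_vec <- mapply(func, 1:3, 11:13, 21:23)
res_vec  # 2 6 12

# func only uses vectorized arithmetic, so for this particular
# function a direct call gives the same values without any apply
res_direct <- func(1:3, 11:13, 21:23)
all.equal(res_vec, res_direct)
```

The last line is a reminder that the apply family is only needed when the function itself is not already vectorized over its arguments.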

How to loop over a sliding window of data with lapply?

How can I use lapply() to "loop" over a multi-column dataset and apply a function? Normally, I would use rollapply(), but for reasons that aren't worth going into, the analytics in this case only work with lapply(). I know how to run a function over an expanding window. But how can lapply() be used with a sliding window? For example, here's a toy example where manually changing the range works with a function I'll call my_fun on a multi-column dataset (dat1):
set.seed(78)
dat1 <- as.data.frame(matrix(rnorm(1000), ncol = 20, nrow = 50))
my_fun <-function(x) {
a <-apply(x,1,mean)
}
test.1 <-my_fun(dat1[1:10])
test.2 <-my_fun(dat1[2:11])
test.3 <-my_fun(dat1[3:12])
Using lapply() for an expanding window works too, i.e., for ranges 1:10, 1:11, 1:12:
test.a <-lapply(seq(10, 12), function(x) my_fun(dat1[1:x]))
My question: is there any way to use lapply to replicate the sliding window analysis via the 3 manual examples above? I've tried several possibilities, using rep() and replicate(), for example, but so far no success. Any insight would be greatly appreciated.
test.a <-lapply(seq(1, 3), function(x) my_fun(dat1[x:(x+9)]))
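The call above hard-codes three windows of width 10. A sketch that covers every possible window could look like the following; slide_cols is a made-up helper name, and rowMeans stands in for my_fun since both compute row-wise means.

```r
set.seed(78)
dat1 <- as.data.frame(matrix(rnorm(1000), ncol = 20, nrow = 50))

# Apply fun to every window of `width` adjacent columns (hypothetical helper).
slide_cols <- function(dat, width, fun) {
  starts <- seq_len(ncol(dat) - width + 1)  # first column of each window
  lapply(starts, function(i) fun(dat[i:(i + width - 1)]))
}

res <- slide_cols(dat1, width = 10, fun = rowMeans)
length(res)  # 11 windows: columns 1:10, 2:11, ..., 11:20
```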
In fact, it can be done with rollapply like this:
library(zoo)
res <- t(rollapply(t(dat1), 10, function(x) my_fun(t(x)), by.column = FALSE))
# verify that res[, i] equals test.i for i = 1,2,3
all.equal(res[, 1], test.1)
## [1] TRUE
all.equal(res[, 2], test.2)
## [1] TRUE
all.equal(res[, 3], test.3)
## [1] TRUE

How can I address the values in a vector based on start and stop indexes from other vectors?

Let's say I have a vector full of zeros:
x <- rep(0, 100)
I want to set the values in certain ranges to 1:
starts <- seq(10, 90, 10)
stops <- starts + round(runif(length(starts), 1, 5))
I can do this with a for loop:
for(i in seq_along(starts)) x[starts[i]:stops[i]] <- 1
But I know this is frowned upon in R. How can I do this in a vectorized way, ideally without an external package?
You can use Map() to get all of the indices, Reduce(union, ...) to drop that list down to an atomic vector of the unique indices and then [<- or replace() to replace.
replace(x, Reduce(union, Map(":", starts, stops)), 1L)
Or
x[Reduce(union, Map(":", starts, stops))] <- 1L
Additionally, for() loops are not necessarily "frowned upon" in R. It depends on the situation. Many times for() loops turn out to be the most efficient route.
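For completeness, here is a sketch of a fully vectorized way to build the index vector with rep() and sequence() from base R, checked against the for loop from the question. The names len, idx, x_vec and x_loop are made up for the illustration.

```r
set.seed(1)
x <- rep(0, 100)
starts <- seq(10, 90, 10)
stops <- starts + round(runif(length(starts), 1, 5))

# Length of each range, then every index: start + 0, start + 1, ...
len <- stops - starts + 1
idx <- rep(starts, len) + sequence(len) - 1
x_vec <- replace(x, idx, 1)

# The loop version from the question, for comparison
x_loop <- x
for (i in seq_along(starts)) x_loop[starts[i]:stops[i]] <- 1

identical(x_vec, x_loop)
```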
A solution that uses apply:
x[unlist(apply(cbind(starts, stops), 1, function(x) x[[1]]:x[[2]]))] <- 1
starts <- seq(10, 90, 1)
change_index <- starts[starts %% 10 <= 5]
x[change_index] <- 1

Efficient way to generate permutations of 0 and 1?

What I am trying to do is generate all possible permutations of 1 and 0 given a particular sample size. For instance, with a sample of n=8 I would like the m = 2^8 = 256 possible permutations.
I've written a function in R to do this, but after n=11 it takes a very long time to run. I would prefer a solution in R, but if it's in another programming language I can probably figure it out. Thanks!
PermBinary <- function(n){
n.perms <- 2^n
array <- matrix(0,nrow=n,ncol=n.perms)
# array <- big.matrix(n, n.perms, type='integer', init=-5)
for(i in 1:n){
div.length <- ncol(array)/(2^i)
div.num <- ncol(array)/div.length
end <- 0
while(end!=ncol(array)){
end <- end +1
start <- end + div.length
end <- start + div.length -1
array[i,start:end] <- 1
}
}
return(array)
}
expand.grid is probably the best vehicle to get what you want.
For example if you wanted a sample size of 3 we could do something like
expand.grid(0:1, 0:1, 0:1)
For a sample size of 4
expand.grid(0:1, 0:1, 0:1, 0:1)
So what we want to do is find a way to automate that call.
If we had a list of the inputs we want to give to expand.grid we could use do.call to construct the call for us. For example
vals <- 0:1
tmp <- list(vals, vals, vals)
do.call(expand.grid, tmp)
So now the challenge is to automatically make the "tmp" list above in a fashion that we can dictate how many copies of "vals" we want. There are lots of ways to do this but one way is to use replicate. Since we want a list we'll need to tell it to not simplify the result or else we will get a matrix/array as the result.
vals <- 0:1
tmp <- replicate(4, vals, simplify = FALSE)
do.call(expand.grid, tmp)
Alternatively we can use rep on a list input (which I believe is faster because it doesn't have as much overhead as replicate but I haven't tested it)
tmp <- rep(list(vals), 4)
do.call(expand.grid, tmp)
Now wrap that up into a function to get:
binarypermutations <- function(n, vals = 0:1){
tmp <- rep(list(vals), n)
do.call(expand.grid, tmp)
}
Then call it with the sample size, like so: binarypermutations(5).
This gives a data.frame of dimensions 2^n x n as a result - transpose and convert to a different data type if you'd like.
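If a plain numeric matrix is preferred over a data.frame, one possible sketch is to read the row index as an n-bit number and extract each bit with integer arithmetic. The name binary_rows is made up here; the rows come out in the same order as expand.grid, with the first column varying fastest.

```r
# All 0/1 combinations as a 2^n x n matrix (hypothetical helper).
binary_rows <- function(n) {
  i <- 0:(2^n - 1)        # row index, read as an n-bit number
  place <- 2^(0:(n - 1))  # value of each bit position
  outer(i, place, function(a, b) (a %/% b) %% 2)
}

m <- binary_rows(3)
dim(m)  # 8 3

# Same rows, in the same order, as the expand.grid approach
g <- as.matrix(expand.grid(rep(list(0:1), 3)))
all(m == g)
```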
The answer above may be better since it uses base - my first thought was to use data.table's CJ function:
library(data.table)
do.call(CJ, replicate(8, c(0, 1), FALSE))
It will be slightly faster (~15%) than expand.grid, so it will only be more valuable for extreme cases.

Apply family of functions for functions with multiple arguments

I would like to use a function from the apply family (in R) to apply a function of two arguments to two matrices. I assume this is possible. Am I correct? Otherwise, it would seem that I have to put the two matrices into one, and redefine my function in terms of the new matrix.
Here's an example of what I'd like to do:
a <- matrix(1:6,nrow = 3,ncol = 2)
b <- matrix(7:12,nrow = 3,ncol = 2)
foo <- function(vec1,vec2){
d <- sample(vec1,1)
f <- sample(vec2,1)
result <- c(d,f)
return(result)
}
I would like to apply foo to a and b.
(Strictly answering the question, not pointing you to a better approach for your particular use here....)
mapply is the function from the *apply family of functions for applying a function while looping through multiple arguments.
So what you want to do here is turn each of your matrices into a list of vectors that hold its rows or columns (you did not specify). There are many ways to do that, I like to use the following function:
split.array.along <- function(X, MARGIN) {
require(abind)
lapply(seq_len(dim(X)[MARGIN]), asub, x = X, dims = MARGIN)
}
Then all you have to do is run:
mapply(foo, split.array.along(a, 1),
split.array.along(b, 1))
Like sapply, mapply tries to put your output into an array if possible. If instead you prefer the output to be a list, add SIMPLIFY = FALSE to the mapply call, or equivalently, use the Map function:
Map(foo, split.array.along(a, 1),
split.array.along(b, 1))
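If you would rather not depend on abind, a base-R sketch of the same idea is to let mapply loop over the row indices and subset both matrices inside the function. This assumes you want to pair up rows; the variable name res and the compact rewrite of foo are made up for the illustration.

```r
a <- matrix(1:6, nrow = 3, ncol = 2)
b <- matrix(7:12, nrow = 3, ncol = 2)

# Same behaviour as foo in the question, written compactly
foo <- function(vec1, vec2) c(sample(vec1, 1), sample(vec2, 1))

set.seed(42)
# Loop over row indices; row i of a is paired with row i of b
res <- mapply(function(i) foo(a[i, ], b[i, ]), seq_len(nrow(a)))
dim(res)  # 2 3: one column per row pair
```

Each column of res holds one value sampled from a row of a and one from the matching row of b.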
You could adjust foo to take one argument (a single matrix), and use apply in the function body.
Then you can use lapply on foo to sample from each column of each matrix.
> a <- matrix(1:6,nrow = 3,ncol = 2)
> b <- matrix(7:12,nrow = 3,ncol = 2)
> foo <- function(x){
apply(x, 2, function(z) sample(z, 1))
}
> lapply(list(a, b), foo)
## [[1]]
## [1] 1 6
## [[2]]
## [1] 8 12

Resources