Subsetting with negative indices: best practices?

Say I have a function for subsetting (this is just a minimal example):
f <- function(x, ind = seq(length(x))) {
x[ind]
}
(Note: one could use seq(x) instead of seq(length(x)), but I don't find it very clear; seq_along(x) is the usual safe choice, since it also behaves correctly when x is empty.)
So, if
x <- 1:5
ind <- c(2, 4)
ind2 <- which(x > 5) # integer(0)
I have the following results:
f(x)
[1] 1 2 3 4 5
f(x, ind)
[1] 2 4
f(x, -ind)
[1] 1 3 5
f(x, ind2)
integer(0)
f(x, -ind2)
integer(0)
For the last result, we would have wanted to get all of x; this is a common source of errors (as mentioned in the book Advanced R).
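The reason can be checked directly: negating a zero-length vector leaves it zero-length, so "drop nothing" and "select nothing" end up as the same index. A minimal check using only base R:

```r
x <- 1:5
empty <- which(x > 5)                  # integer(0): no element satisfies the condition

# Negating a zero-length vector still gives a zero-length vector.
neg_empty <- -empty
same_index <- identical(neg_empty, empty)

# Indexing with it therefore selects nothing instead of keeping everything.
selected <- x[neg_empty]
```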
So, if I want to make a function for removing indices, I use:
f2 <- function(x, ind.rm) {
f(x, ind = `if`(length(ind.rm) > 0, -ind.rm, seq(length(x))))
}
Then I get what I wanted:
f2(x, ind)
[1] 1 3 5
f2(x, ind2)
[1] 1 2 3 4 5
My question is:
Can I do something cleaner that doesn't need to pass seq(length(x)) explicitly in f2, and instead directly uses the default value of f's parameter ind when ind.rm is integer(0)?

If you anticipate having "empty" negative indices a lot, you can get a performance improvement for these cases if you can avoid the indexing used by x[seq(x)] as opposed to just x. In other words, if you are able to combine f and f2 into something like:
new_f <- function(x, ind.rm){
if(length(ind.rm)) x[-ind.rm] else x
}
In the benchmark below, the empty-index case runs roughly three orders of magnitude faster at the median (about 7 microseconds versus about 9 milliseconds).
n <- 1000000L
x <- 1:n
ind <- seq(0L,n,2L)
ind2 <- which(x>n+1) # integer(0)
library(microbenchmark)
microbenchmark(
f2(x, ind),
new_f(x, ind),
f2(x, ind2),
new_f(x, ind2)
)
all.equal(f2(x, ind), new_f(x, ind)) # TRUE - same result at about same speed
all.equal(f2(x, ind2), new_f(x, ind2)) # TRUE - same result at much faster speed
Unit: nanoseconds
           expr     min        lq        mean  median       uq      max neval
     f2(x, ind) 6223596 7377396.5 11039152.47 9317005 10271521 50434514   100
  new_f(x, ind) 6190239 7398993.0 11129271.17 9239386 10202882 59717093   100
    f2(x, ind2) 6823589 7992571.5 11267034.52 9217149 10568524 63417978   100
 new_f(x, ind2)     428    1283.5     5414.74    6843     7271    14969   100
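If branching on length() is undesirable, a logical mask avoids the special case altogether, because an empty ind.rm simply yields an all-TRUE mask. The helper name drop_idx below is made up for illustration; this is a sketch rather than a tuned implementation, and it is likely slower than negative indexing on large vectors.

```r
# Hypothetical helper: drop the positions in ind.rm using a logical mask.
drop_idx <- function(x, ind.rm) {
  keep <- !(seq_along(x) %in% ind.rm)  # all TRUE when ind.rm is empty
  x[keep]
}

x <- 1:5
res_some <- drop_idx(x, c(2, 4))       # positions 2 and 4 removed
res_none <- drop_idx(x, integer(0))    # nothing removed
```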

What you have isn't bad, but if you want to avoid passing the default value of a default argument you could restructure like this:
f2 <- function(x, ind.rm) {
`if`(length(ind.rm) > 0, f(x,-ind.rm), f(x))
}
which is slightly shorter than what you have.
On Edit
Based on the comments, it seems you want to be able to pass a function nothing (rather than simply not pass at all), so that it uses the default value. You can do so by writing a function which is set up to receive nothing, also known as NULL. You can rewrite your f as:
f <- function(x, ind = NULL) {
if(is.null(ind)){ind <- seq(length(x))}
x[ind]
}
NULL functions as a flag which tells the receiving function to use a default value for the parameter, although that default value must be set in the body of the function.
Now f2 can be rewritten as
f2 <- function(x, ind.rm) {
f(x, ind = `if`(length(ind.rm) > 0, -ind.rm, NULL))
}
This is slightly more readable than what you have, but at the cost of making the original function slightly longer.
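Another way to let f fall back on its own default is to build the argument list conditionally and call f through do.call, so that ind is simply not supplied when there is nothing to remove. This sketch assumes the original definition of f with ind = seq(length(x)); the name f2_docall is invented here.

```r
f <- function(x, ind = seq(length(x))) {
  x[ind]
}

# Hypothetical variant of f2: only pass `ind` when there is something to remove.
f2_docall <- function(x, ind.rm) {
  args <- list(x = x)
  if (length(ind.rm) > 0) args$ind <- -ind.rm
  do.call(f, args)                     # f's own default applies if `ind` is absent
}

x <- 1:5
res_some <- f2_docall(x, c(2, 4))
res_none <- f2_docall(x, integer(0))
```

The default expression stays in one place (the signature of f), at the cost of a slightly less direct call.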

To implement "parameter1 = if(cond1) then value1 else default_value_of_param1", I used formals to get default parameters as a call:
f <- function(x, ind.row = seq_len(nrow(x)), ind.col = seq_len(ncol(x))) {
x[ind.row, ind.col]
}
f2 <- function(x, ind.row.rm = integer(0), ind.col.rm = integer(0)) {
f.args <- formals(f)
f(x,
ind.row = `if`(length(ind.row.rm) > 0, -ind.row.rm, eval(f.args$ind.row)),
ind.col = `if`(length(ind.col.rm) > 0, -ind.col.rm, eval(f.args$ind.col)))
}
Then:
> x <- matrix(1:6, 2)
> f2(x, 1:2)
[,1] [,2] [,3]
> f2(x, , 1:2)
[1] 5 6
> f2(x, 1, 2)
[1] 2 6
> f2(x, , 1)
[,1] [,2]
[1,] 3 5
[2,] 4 6
> f2(x, 1, )
[1] 2 4 6
> f2(x)
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6

Related

double sum calculation, what is the most efficient approach?

I need to calculate this
where x is a vector of length n and f is a function.
What is the most efficient calculation for this in R?
One method is a double for loop, but that is obviously slow.

One fast way to do it is the following:
Assume we have this vector:
x = c(0,1,2)
i.e. n=3, and assume f is multiplication, i.e. f(a, b) = a * b.
Now we use a custom function, expand.grid.unique, which produces the unique combinations within a vector; in other words, it is similar to the base function expand.grid but returns only the unique combinations:
expand.grid.unique <- function(x, y, include.equals=FALSE)
{
x <- unique(x)
y <- unique(y)
g <- function(i)
{
z <- setdiff(y, x[seq_len(i-include.equals)])
if(length(z)) cbind(x[i], z, deparse.level=0)
}
do.call(rbind, lapply(seq_along(x), g))
}
In our vector case, when we call expand.grid.unique(x,x), it produces the following result:
> expand.grid.unique(x,x)
[,1] [,2]
[1,] 0 1
[2,] 0 2
[3,] 1 2
Let's assign two_by_two to it:
two_by_two <- expand.grid.unique(x,x)
Since our function is assumed to be multiplication, then we need to calculate sum-product, i.e. dot product of first and second columns of two_by_two. For this we need %*% operator:
output <- two_by_two[,1] %*% two_by_two[,2]
> output
[,1]
[1,] 2
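If f is vectorized and the sum runs over pairs i < j (an assumption here, since the formula itself is not shown in the question), the same result can be sketched with outer and upper.tri, without building the pair matrix by hand:

```r
x <- c(0, 1, 2)
f <- function(a, b) a * b              # assumed example function: multiplication

vals <- outer(x, x, f)                 # vals[i, j] = f(x[i], x[j])
pair_sum <- sum(vals[upper.tri(vals)]) # keep only the entries with i < j
```

Unlike expand.grid.unique, this works on positions rather than unique values, so duplicated entries in x are counted, and it allocates an n-by-n matrix.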

See ?combn
x <- 0:2
combn(x, 2)
# unique combos
#     [,1] [,2] [,3]
#[1,]    0    0    1
#[2,]    1    2    2
sum(combn(x, 2))
#[1] 6
combn() creates all the unique combinations. If you have a function that you want to sum, you can add a FUN to the call:
random_f <- function(x){x[1] + 2 * x[2]}
combn(x, 2, FUN = random_f)
#[1] 2 4 5
sum(combn(x, 2, FUN = random_f))
#[1] 11
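For a function of two arguments, the pair that combn passes in can be unpacked inside FUN. A small sketch, taking multiplication as the example function (an assumption, chosen to match the other answer):

```r
x <- 0:2
f <- function(a, b) a * b              # assumed two-argument example function

# combn hands FUN one combination at a time as a length-2 vector.
pair_vals <- combn(x, 2, FUN = function(p) f(p[1], p[2]))
pair_sum <- sum(pair_vals)             # 0*1 + 0*2 + 1*2
```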

r - apply a function on data n number of times

I would like to apply the same function a certain number of times to a vector, feeding the output of each call into the next.
A simplified example with a simple function just to demonstrate:
# sample vector
a <- c(1,2,3)
# function to be applied n times
f1 <- function(x) {
x^2 + x^3
}
I would like to apply f1 to a n times; for example, let's say 3 times.
I heard purrr::reduce or purrr::map() might be a good idea for this but couldn't make it work.
The desired output if n = 3 would be equal to f1(f1(f1(a))).

Let's use Reduce (no external library requirements, generally good performance). I'll modify the function slightly to accept a second (ignored) argument:
f1 <- function(x, ign) x^2 + x^3
Reduce(f1, 1:3, init = a)
# [1] 1.872000e+03 6.563711e+09 1.102629e+14
Here's what's happening. Per its documentation, Reduce
"uses a binary function to successively combine the elements of a given vector and a possibly given initial value."
The first argument is the function to use, and it should accept two arguments. The first is the value from the previous execution of the function in this reduction. On the first call of the function, it uses the init= value provided.
First call:
f1(c(1,2,3), 1)
# [1] 2 12 36
Second call:
f1(c(2,12,36), 2)
# [1] 12 1872 47952
Third call:
f1(c(12,1872,47952), 3)
# [1] 1.872000e+03 6.563711e+09 1.102629e+14
The second argument 1:3 is used just for its length. Anything of the proper length will work.
If you don't want to redefine f1 just for this reduction, you can always do
Reduce(function(a,ign) f1(a), ...)
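The pattern above can be wrapped in a small helper. The name apply_n is hypothetical; the sketch only assumes that f takes a single argument and returns something it can accept again.

```r
# Hypothetical helper: apply f to x, n times, feeding each result back in.
apply_n <- function(f, x, n) {
  # seq_len(n) is used only for its length; its values are ignored.
  Reduce(function(acc, ign) f(acc), seq_len(n), x)
}

f1 <- function(x) x^2 + x^3
a <- c(1, 2, 3)
res3 <- apply_n(f1, a, 3)              # same as f1(f1(f1(a)))
res0 <- apply_n(f1, a, 0)              # zero applications: returns a unchanged
```

Using seq_len(n) rather than 1:n keeps the n = 0 case well defined, since Reduce returns the initial value for an empty input.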
Benchmark:
library(microbenchmark)
library(purrr) # accumulate(), reduce()
library(magrittr) # %>% and extract2()
r <- Reduce(function(a,b) call("f1", a), 1:3, init=quote(a))
triple_f1 <- function(a) f1(f1(f1(a)))
microbenchmark::microbenchmark(
base = Reduce(function(a,ign) f1(a), 1:3, a),
accum = a %>% accumulate(~ .x %>% f1, .init = f1(a)) %>% extract2(3),
reduc = purrr::reduce(1:3, function(a,ign) f1(a), .init=a),
whil = { # note: this loop runs 9 iterations (i < 10), not 3, so its timing is not directly comparable
i <- 1
a <- c(1,2,3)
while (i < 10) {
i <- i + 1
a <- f1(a)
}
},
forloop = {
out <- a
for(i in seq_len(3)) out <- f1(out)
},
evaluated = {
r <- Reduce(function(a,b) call("f1", a), 1:3, init=quote(a))
eval(r)
},
precompiled = eval(r),
anotherfun = triple_f1(a)
)
# Unit: microseconds
# expr min lq mean median uq max neval
# base 5.101 7.3015 18.28691 9.3010 10.8510 848.302 100
# accum 294.201 328.4015 381.21204 356.1520 402.6510 823.602 100
# reduc 27.000 38.1005 57.55694 45.2510 54.2005 747.401 100
# whil 1717.300 1814.3510 1949.03100 1861.8510 1948.9510 2931.001 100
# forloop 1110.001 1167.1010 1369.87696 1205.5010 1292.6500 9935.501 100
# evaluated 6.702 10.2505 22.18598 13.3015 15.5510 715.301 100
# precompiled 2.300 3.2005 4.69090 4.0005 4.5010 26.800 100
# anotherfun 1.400 2.0515 12.85201 2.5010 3.3505 1017.801 100

n <- 3
i <- 0
x <- a
while (i < n) {
i <- i + 1
x <- f1(x)
}

Here is an option with accumulate
library(tidyverse)
library(magrittr) # for extract2(), which tidyverse does not attach
n <- 3
a %>%
accumulate(~ .x %>%
f1, .init = f1(a)) %>%
extract2(n)
#[1] 1.872000e+03 6.563711e+09 1.102629e+14
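NOTE: accumulate is similar to the base R option Reduce with accumulate = TRUE. Because the accumulation here runs over the elements of a, it only reaches n applications while n <= length(a) + 1.
Checking against the OP's expected output: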
f1(f1(f1(a)))
#[1] 1.872000e+03 6.563711e+09 1.102629e+14
Or use a for loop (no external libraries used)
out <- a
for(i in seq_len(n)) out <- f1(out)
out
#[1] 1.872000e+03 6.563711e+09 1.102629e+14
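For comparison, a base R sketch of the same idea with Reduce and accumulate = TRUE, which keeps every intermediate result and does not depend on the length of a:

```r
f1 <- function(x) x^2 + x^3
a <- c(1, 2, 3)
n <- 3

# steps[[1]] is a itself; steps[[k + 1]] is f1 applied k times.
steps <- Reduce(function(acc, ign) f1(acc), seq_len(n), a, accumulate = TRUE)
final <- steps[[n + 1]]
```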

Here's another way to do it with Reduce:
setting the stage
a <- 1:3
f1 <- function(x) x^2 + x^3
constructing a call and evaluating it
N <- 3 # how many times?
r <- Reduce(function(a,b) call("f1", a), rep(NA, N), init=a)
# f1(f1(f1(1:3)))
eval(r)
# [1] 1.872000e+03 6.563711e+09 1.102629e+14
alternative 2
# N defined as above
Reduce(function(x,y) y(x), replicate(N,f1), init=a)
# [1] 1.872000e+03 6.563711e+09 1.102629e+14
alternative 3 (recursive with a global-like variable)
doit <- function(N) {
i <- 0
function(fun, x){
i <<- i +1
if(i < N) Recall(fun, fun(x)) else fun(x)
}
}
doit(3)(f1, a)
# [1] 1.872000e+03 6.563711e+09 1.102629e+14
... or even
doit <- function(N, fun, x) (function(fun, x)
if((N <<- N - 1) > 0)
Recall(fun, fun(x)) else
fun(x))(fun, x)
doit(3, f1, a)
# [1] 1.872000e+03 6.563711e+09 1.102629e+14

Find components of a vector which increase continually by k-times

I want to create a function that finds the components of a vector that are followed by k consecutive increases.
That is, if the contrived function is f(x,k) and x=c(2,3,4,3,5,6,5,7), then
the value of f(x,1) is 2,3,3,5,5, since each of these components is immediately followed by a larger value.
In addition, if k=2, then the value of f(x,2) is 2,3, since only these components are followed by two consecutive increases (2→3→4 and 3→5→6).
I guess that I ought to use a loop construct such as for for this purpose.

1) Use rollapply from the zoo package:
library(zoo)
f <- function(x, k)
x[rollapply(x, k+1, function(x) all(diff(x) > 0), align = "left", fill = FALSE)]
Now test out f:
x <- c(2,3,4,3,5,6,5,7)
f(x, 1)
## [1] 2 3 3 5 5
f(x, 2)
## [1] 2 3
f(x, 3)
## numeric(0)
1a) This variation is slightly shorter and also works:
f2 <- function(x, k) head(x, -k)[ rollapply(diff(x) > 0, k, all) ]
2) Here is a version of 1a that uses no packages:
f3 <- function(x, k) head(x, -k)[ apply(embed(diff(x) > 0, k), 1, all) ]
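To see why the package-free version works, here is the same computation unrolled step by step for k = 2. The variable names are illustrative only.

```r
x <- c(2, 3, 4, 3, 5, 6, 5, 7)
k <- 2

inc <- diff(x) > 0                     # TRUE where x[i + 1] > x[i]
windows <- embed(inc, k)               # row i holds inc[i + k - 1], ..., inc[i]
keep <- apply(windows, 1, all)         # TRUE where all k steps starting at i increase
result <- head(x, -k)[keep]            # the last k elements cannot start a run of k
```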

A fully vectorized solution:
f <- function(x, k = 1) {
rlecumsum = function(x)
{ #cumsum with resetting
#http://stackoverflow.com/a/32524260/1412059
cs = cumsum(x)
cs - cummax((x == 0) * cs)
}
x[rev(rlecumsum(rev(c(diff(x) > 0, FALSE) ))) >= k]
}
f(x, 1)
#[1] 2 3 3 5 5
f(x, 2)
#[1] 2 3
f(x, 3)
#numeric(0)
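The core of this answer is the resetting cumulative sum. A short sketch of what that helper does on its own, using a made-up 0/1 input:

```r
# Cumulative sum that restarts after every 0 (same logic as the helper above).
rlecumsum <- function(x) {
  cs <- cumsum(x)
  cs - cummax((x == 0) * cs)           # subtract the running total reached at the last 0
}

runs <- c(1, 1, 0, 1, 1, 1, 0, 1)
counts <- rlecumsum(runs)              # length of the current run of 1s at each position
```

In f, the increase flags are reversed before and after this step, so each position ends up holding the number of consecutive increases that follow it.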

I don't quite understand the second part of your question (that with k=2) but for the first part you can use something like this:
test<-c(2,3,4,3,5,6,5,7) #Your vector
diff(test) #Differences between consecutive elements
diff(test)>0 #Logical vector: TRUE where the next element is larger
test[c(diff(test)>0, FALSE)] #Elements followed by a larger value; FALSE is appended for the last element, which has no successor (otherwise the shorter logical index is recycled)

The diag() function in R

Is there a way to reproduce what diag() does on a matrix without using the built-in function or iteration?
M<-matrix(1:9, ncol=3) # make a matrix
q5b<-function(M){ #function
}
I know that M[1,1], M[2,2], and M[3,3] will give me the same output as diag(M). However, I can't think of a way to do this without a for loop.
My thought process was I should have a condition where row index == column index in the Matrix then print that value. I appreciate any suggestions.

You can use the functions row and col to find the indices where the column number is identical to the row number:
row(M) == col(M)
# [,1] [,2] [,3]
# [1,] TRUE FALSE FALSE
# [2,] FALSE TRUE FALSE
# [3,] FALSE FALSE TRUE
M[row(M) == col(M)]
# [1] 1 5 9

Just subset based on another matrix:
> diag(M)
[1] 1 5 9
> M[matrix(rep(sequence(ncol(M)), 2), ncol = 2)]
[1] 1 5 9
The above would run into a problem in a non-square matrix, so we modify it as below.
Written as your function q5b, one answer could be:
q5b <- function(M) {
A <- sequence(ncol(M))[sequence(min(nrow(M), ncol(M)))]
M[cbind(A, A)]
}
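A slightly shorter way to build the same index matrix is seq_len(min(dim(M))). The helper name diag_by_index is hypothetical; the sketch relies only on base R matrix indexing, where each row of a two-column index matrix is read as one (row, column) pair.

```r
# Hypothetical helper: diagonal via matrix indexing, for square or non-square M.
diag_by_index <- function(M) {
  i <- seq_len(min(dim(M)))
  M[cbind(i, i)]                       # picks M[1, 1], M[2, 2], ...
}

sq <- matrix(1:9, ncol = 3)
rect <- matrix(1:6, nrow = 2)
d_sq <- diag_by_index(sq)
d_rect <- diag_by_index(rect)
```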
Update: Benchmarks are always fun
library(microbenchmark)
fun1 <- function(M) diag(M)
fun2 <- function(M) M[row(M) == col(M)]
fun3 <- function(M) {
A <- sequence(ncol(M))[sequence(min(nrow(M), ncol(M)))]
M[cbind(A, A)]
}
set.seed(1)
M <- matrix(rnorm(1000*1000), ncol = 1000)
microbenchmark(fun1(M), fun2(M), fun3(M), times = 100)
# Unit: microseconds
# expr min lq median uq max neval
# fun1(M) 4654.825 4747.408 4822.8865 4912.690 5877.866 100
# fun2(M) 53270.266 54813.606 55059.0695 55749.062 200384.531 100
# fun3(M) 66.284 82.321 118.8835 129.361 191.155 100

Replacing elements of arrays within a list (R)

Okay, here's the situation:
I have the following list of arrays:
N <- c('A', 'B', 'C')
ll <- sapply(N, function(x) NULL)
ll <- lapply(ll, function(x) assign("x", array(0, dim = c(2,2))))
Now I want to replace, say, the element at position [1,1] in those arrays by a given quantity, say 10. Following another question, I'm doing the following:
x <- lapply(ll, function(x) {x[1,1] <- 10})
which should make x a list of three 2x2 arrays with the [1,1] element equal to 10 and all others equal to 0. Instead, I'm seeing this (here with position [2,1]):
> x <- lapply(ll, function(x) {x[2,1] <- 10})
> x
$A
[1] 10
$B
[1] 10
$C
[1] 10
Any ideas of what's going on here?

You're not returning the whole array, so only the value of the last expression is returned. That is, when you do
x <- lapply(ll, function(x) {x[2,1] <- 10})
You intend to say:
x <- lapply(ll, function(x) {x[2,1] <- 10; return(x)})
If you don't specify a return value, the value of the last expression is returned, which here is the assigned value 10. Instead, you should use return(x) or, equivalently, just x as follows:
x <- lapply(ll, function(x) {x[2,1] <- 10; x})
# $A
# [,1] [,2]
# [1,] 0 0
# [2,] 10 0
#
# $B
# [,1] [,2]
# [1,] 0 0
# [2,] 10 0
#
# $C
# [,1] [,2]
# [1,] 0 0
# [2,] 10 0
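The difference is easy to see on a single array. In this sketch the function names g and h are arbitrary:

```r
m <- array(0, dim = c(2, 2))

g <- function(x) { x[2, 1] <- 10 }     # last expression is the assignment
h <- function(x) { x[2, 1] <- 10; x }  # last expression is the modified array

from_g <- g(m)                         # just the assigned value
from_h <- h(m)                         # the full 2x2 array with [2, 1] set
```

In both cases m itself is left untouched, because the function modifies its own local copy.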

Although lapply would generally be preferred, here is an alternative with a for loop, just for the sake of having one:
for (i in seq_along(ll)) ll[[i]][2,1] <- 10
