I need to calculate this
where x is a vector of length n and f is a function.
What is the most efficient calculation for this in R?
One method is a double for loop, but that is obviously slow.
One fast way to do it is the following:
Assume we have this vector:
x = c(0,1,2)
i.e. n=3, and assume f is a multiplication function:
Now we use a custom function, expand.grid.unique, which produces the unique combinations within a vector. It is similar to the base function expand.grid, but each unordered pair appears only once:
expand.grid.unique <- function(x, y, include.equals=FALSE)
{
    x <- unique(x)
    y <- unique(y)
    g <- function(i)
    {
        z <- setdiff(y, x[seq_len(i-include.equals)])
        if(length(z)) cbind(x[i], z, deparse.level=0)
    }
    do.call(rbind, lapply(seq_along(x), g))
}
In our vector case, when we call expand.grid.unique(x,x), it produces the following result:
> expand.grid.unique(x,x)
[,1] [,2]
[1,] 0 1
[2,] 0 2
[3,] 1 2
Let's assign two_by_two to it:
two_by_two <- expand.grid.unique(x,x)
Since our function is assumed to be multiplication, we need the sum of products, i.e. the dot product of the first and second columns of two_by_two. For this we use the %*% operator:
output <- two_by_two[,1] %*% two_by_two[,2]
> output
[,1]
[1,] 2
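For a function other than multiplication, the same idea can be sketched with base R's combn, assuming f is vectorized over its two arguments. The names f, pairs, pair_total and closed_form are made up for this illustration. Note that combn works on positions, so unlike expand.grid.unique it does not drop duplicated values of x.

```r
# Sketch: sum f(x_i, x_j) over all unordered pairs i < j.
x <- c(0, 1, 2)
f <- function(a, b) a * b            # assumed example function: multiplication

pairs <- combn(x, 2)                 # 2-row matrix, one pair per column
pair_total <- sum(f(pairs[1, ], pairs[2, ]))

# Shortcut that holds only when f is multiplication:
# sum_{i<j} x_i * x_j = (sum(x)^2 - sum(x^2)) / 2
closed_form <- (sum(x)^2 - sum(x^2)) / 2

pair_total   # 2
closed_form  # 2
```

The closed form avoids building the pairs at all, but it applies only to the product case; for a general f the pair matrix is still needed.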
See ?combn
x <- 0:2
combn(x, 2)
# unique combos
#     [,1] [,2] [,3]
#[1,]    0    0    1
#[2,]    1    2    2
sum(combn(x, 2))
#[1] 6
combn() creates all the unique combinations. If you have a function that you want to sum, you can add a FUN to the call:
random_f <- function(x){x[1] + 2 * x[2]}
combn(x, 2, FUN = random_f)
#[1] 2 4 5
sum(combn(x, 2, FUN = random_f))
#[1] 11
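As a sanity check, here is a small sketch comparing the combn call with the double for loop it replaces. It reuses random_f from above; loop_total and combn_total are names chosen only for this example.

```r
x <- 0:2
random_f <- function(p) p[1] + 2 * p[2]   # takes one pair as a length-2 vector

# Reference: explicit double loop over all pairs i < j
n <- length(x)
loop_total <- 0
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    loop_total <- loop_total + random_f(c(x[i], x[j]))
  }
}

# Same sum via combn, which calls random_f once per pair
combn_total <- sum(combn(x, 2, FUN = random_f))

loop_total   # 11
combn_total  # 11
```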
Does anyone know of a way to add up combinations of numbers within a vector?
Suppose I am going through a for loop and each time I end up with a vector of different lengths, how could I combine each element of this vector such that I have the sum of 2, 3, etc elements?
For example if I have:
vector <- c(1:5)
And want to go through it as in:
element 1 + element 2; element 2 + element 3, etc
But also:
element 1 + element 2 + element 3
How would I do this? It's important to note that in many of the vectors the lengths will be different. So whilst one vector might contain 3 elements another might contain 12.
I know you can do vector[1]+vector[2], but I need some way to iterate throughout the vector wherein it takes into account the above note.
You can use combn:
> combn(vector, 3, FUN = NULL, simplify = TRUE)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 1 1 1 1 1 2 2 2 3
[2,] 2 2 2 3 3 4 3 3 4 4
[3,] 3 4 5 4 5 5 4 5 5 5
The trick here is that each call will return a matrix of results, and you will have to decide how you want to aggregate and store all the various combinations.
If you don't mind having a list, then the following should do the trick (lapply always returns a list, one matrix of combinations per size):
> lapply(seq_along(vector),
    function(x) {
        combn(vector, x, FUN = NULL, simplify = TRUE)
    })
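Since each combination is one column of the returned matrix, one way to get the sums is to add up the columns. A minimal sketch, assuming plain sums are what you want (triple_sums and sums_by_size are names made up here):

```r
vector <- 1:5

# Sums of every 3-element combination: one sum per column
triples <- combn(vector, 3)
triple_sums <- colSums(triples)
triple_sums
# 6 7 8 8 9 10 9 10 11 12

# The same for every combination size, stored as a list
sums_by_size <- lapply(seq_along(vector),
                       function(k) colSums(combn(vector, k)))
```

A list is used for the second result because the number of combinations differs for each size.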
Generate pair IDs
In this case, we need to get the pairs:
combn(3, 2)
Output:
[,1] [,2] [,3]
[1,] 1 1 2
[2,] 2 3 3
Pairs are generated by column.
Sum Over Vector Elements (Using a Subset)
To access each element and perform a summation, we opt to define a helper function that takes the combination and the vector.
# Write a helper function that sums the elements of vec
# at the index positions given in x
comb_subset_sum = function(x, vec){
    return(sum(vec[x]))
}
From this, we can use combn directly or use apply.
Summing for a single k:
combn directly:
# Input Vector
vec = 1:5
# Length of vector
n = length(vec)
# Generate pairwise combinations and obtain pair_sum
# Specify the k (m in R)
pair_sum = combn(n, m = 2, FUN = comb_subset_sum, vec = vec)
apply usage:
# Input Vector
vec = 1:5
# Number of Observations
n = length(vec)
# Combinations
# Specify the k (m in R)
combinations = combn(n, m = 2)
# Obtain vectorized sum over subset
subset_summed = apply(combinations, 2, comb_subset_sum, vec = vec)
Example Output:
combinations:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 1 1 1 2 2 2 3 3 4
[2,] 2 3 4 5 3 4 5 4 5 5
subset_summed:
[1] 3 4 5 6 5 6 7 7 8 9
Trace:
vec[1]+vec[2]=3
vec[1]+vec[3]=4
vec[1]+vec[4]=5
vec[1]+vec[5]=6
vec[2]+vec[3]=5
vec[2]+vec[4]=6
vec[2]+vec[5]=7
vec[3]+vec[4]=7
vec[3]+vec[5]=8
vec[4]+vec[5]=9
To obtain the trace output, add the following before return() in comb_subset_sum():
cat(paste0("vec[",x,"]", collapse = "+"), "=", sum(vec[x]), "\n")
Summing for multiple k:
Here, we apply the same logic, just in a way that enables the k value of the combination to take multiple values.
# Input Vector
vec = 1:5
# Length of Vec
n = length(vec)
# Store output
o = vector('list',n)
for(i in seq_along(vec)){
    o[[i]] = combn(n, i, FUN = comb_subset_sum, vec = vec)
}
Note: The size of each element of o will vary as the number of combinations will increase and then decrease.
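To make the note about varying sizes concrete, here is a small sketch that checks the length of each element of o against choose(n, k). The names sizes and expected are only for illustration, and lapply is used in place of the for loop purely for brevity.

```r
comb_subset_sum <- function(x, vec) sum(vec[x])

vec <- 1:5
n <- length(vec)

# One entry per combination size k = 1, ..., n
o <- lapply(seq_len(n),
            function(k) combn(n, k, FUN = comb_subset_sum, vec = vec))

sizes <- lengths(o)                 # number of sums stored for each k
expected <- choose(n, seq_len(n))   # number of k-element combinations

sizes     # 5 10 10 5 1
expected  # 5 10 10 5 1
```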
Summing over combinations
If we do not care about vector element values, we can then just sum over the actual combinations in a similar way to how we obtained the vector elements.
To generate pairs and then sum, use:
# Input Vector
vec = 1:5
# Length of Vec
n = length(vec)
# Generate all combinations (by column)
# Specify the k (m in R)
m = combn(n, m = 2)
# Obtain sum by going over columns
sum_m = apply(m, 2, sum)
Or do it in one go:
# Specify the k (m in R)
sum_inplace = combn(n, m = 2, FUN = sum)
Equality:
all.equal(sum_m,sum_inplace)
Summing over combinations for multiple k
And, as before, we can set it up to get all sums under different k by using:
# Input Vector
vec = 1:5
# Length of Vec
n = length(vec)
# Store output (varying lengths)
o = vector('list',n)
for(i in seq_along(vec)){
    o[[i]] = combn(n, i, FUN = sum)
}
The following relies on the binary representation of numbers. Basically, you have 2^n - 1 non-empty combinations to check. By writing every number between 1 and 2^n - 1 in binary with n bits, you enumerate all the subsets of elements you might want.
The number2binary function comes from Paul Hiestra's answer in this thread: How to convert integer number into binary vector?
number2binary = function(number, noBits) {
    binary_vector = rev(as.numeric(intToBits(number)))
    if(missing(noBits)) {
        return(binary_vector)
    } else {
        binary_vector[-(1:(length(binary_vector) - noBits))]
    }
}
vector <- 1:5
n <- length(vector)
comp_sum <- function(x) {
    binary <- number2binary(x, noBits = n)
    result <- sum(vector[which(binary==1)])
    names(result) <- paste(which(binary == 1), collapse = "+")
    return(result)
}
# Parentheses are required: 1:2^n-1 would be parsed as (1:2^n) - 1, i.e. 0:(2^n - 1)
binaries <- sapply(1:(2^n - 1), comp_sum)
Note: I only go up to 2^n - 1 as you do not need the "zero". By adding some conditions in your comp_sum function, you can pick only sums of two elements or of three elements...
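One way to add such a condition is to keep the bit patterns in a matrix and filter on the number of set bits. This is only a sketch of the idea; it does not reuse number2binary, and masks, bits, subset_size, subset_sum and pair_sums are names made up here.

```r
vec <- 1:5
n <- length(vec)

# Row i of `bits` holds the n low-order bits of i, least significant bit first
masks <- 1:(2^n - 1)
bits <- t(sapply(masks, function(i) as.integer(intToBits(i))[1:n]))

subset_size <- rowSums(bits)           # how many elements each subset contains
subset_sum  <- as.vector(bits %*% vec) # sum of the selected elements

# Keep only the sums of exactly two elements
pair_sums <- subset_sum[subset_size == 2]
```

The same filter with subset_size == 3 would give the sums of three elements. Keep in mind that the number of subsets doubles with every extra element, so this approach is only practical for short vectors.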
You might be looking for rollsum from the zoo package, where you can specify the number of consecutive elements you want to add up:
lapply(2:5, function(i) zoo::rollsum(1:5, i))
[[1]]
[1] 3 5 7 9 # two elements roll sum
[[2]]
[1] 6 9 12 # three elements roll sum
[[3]]
[1] 10 14 # four elements roll sum
[[4]]
[1] 15 # five elements roll sum
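If you would rather not depend on zoo, a rough base R equivalent can be sketched with embed, which lays out each window of consecutive elements as a row. This swaps in a different technique from rollsum; window_sums is a name chosen for this example.

```r
x <- 1:5

# embed(x, k) returns one row per window of k consecutive elements,
# so the row sums are the rolling sums
window_sums <- lapply(2:5, function(k) rowSums(embed(x, k)))

window_sums[[1]]  # 3 5 7 9
window_sums[[2]]  # 6 9 12
```

Like rollsum, this only covers runs of neighbouring elements, not every combination.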
I would like to add each coefficient of a vector to each different column of a matrix. For example, if I have a vector and a matrix:
x <- c(1,2,3)
M <- matrix(c(5,6,7), nrow = 3, ncol = 3)
In my new matrix M1, I would like to have 1+5 in the first column, 2+6 in the second, and 3+7 in the last one.
Is there any function in R that does this task?
Try this:
M + rep(x, each = nrow(M))
or this (apply over rows returns its results as columns, so transpose back with t()):
t(apply(M, 1, `+`, x))
result:
[,1] [,2] [,3]
[1,] 6 7 8
[2,] 7 8 9
[3,] 8 9 10
EDIT:
akrun commented on two other great solutions:
M + x[col(M)]
and
sweep(M, 2, x, "+")
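The example matrix above happens to give a symmetric result, which can hide orientation mistakes. Here is a small sketch on a non-square matrix (the values are made up for illustration) showing that the approaches agree when x[j] is added to column j:

```r
x <- c(10, 20, 30)
M <- matrix(1:6, nrow = 2)   # 2 rows, 3 columns, filled column by column

by_rep   <- M + rep(x, each = nrow(M))  # repeat each x[j] down its column
by_col   <- M + x[col(M)]               # look up x by column index
by_sweep <- sweep(M, 2, x, "+")         # sweep x across margin 2 (columns)
by_apply <- t(apply(M, 1, `+`, x))      # apply over rows, then transpose back

expected <- matrix(c(11, 12, 23, 24, 35, 36), nrow = 2)
```

Plain M + x would not work here, because R recycles x down the columns (column-major order) rather than across them.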
I'm trying to perform the following operation on a matrix: multiply columns 2, 3 and 4 by the first column, keeping the first column's value; then multiply columns 3 and 4 by the second column, keeping the second column's value; and finally multiply column 4 by the third column. I want to do this without using a "for" loop, using functions like sapply or mapply instead. Does anyone have an idea how to do it?
Example with one line:
a[1,1]*(a[1,2], a[1,3], a[1,4]) = 2 4 4 4
a[1,1] a[1,2]*(a[1,3], a[1,4]) = 2 4 16 16 #keep a[1,1] a[1,2]
a[1,1] a[1,2] a[1,3] a[1,3]*(a[1,4]) = 2 4 16 256 # #keep a[1,1] a[1,2] a[1,3]
Input:
> a<- matrix(2,4,4) # or any else matrix like a<- matrix(c(1,8,10,1,4,1),3,3)
> a
[,1] [,2] [,3] [,4]
[1,] 2 2 2 2
[2,] 2 2 2 2
[3,] 2 2 2 2
[4,] 2 2 2 2
Output:
> a
[,1] [,2] [,3] [,4]
[1,] 2 4 16 256
[2,] 2 4 16 256
[3,] 2 4 16 256
[4,] 2 4 16 256
EDIT: LOOP VERSION
a<- matrix(2,4,4);
ai<-a[,1,drop=F];
b<- matrix(numeric(0),nrow(a),ncol(a)-1);
i<- 1;
for ( i in 1:(ncol(a)-1)){
a<- a[,1]*a[,-1,drop=F];
b[,i]<- a[,1];
}
b<- cbind(ai[,1],b);
b
If I understand correctly, what you are trying to do is, starting with a matrix A with N columns, perform the following steps:
Step 1. Multiply columns 2 through N of A by column 1 of A. Call the resulting matrix A1.
Step 2. Multiply columns 3 through N of A1 by column 2 of A1. Call the resulting matrix A2.
...
Step (N-1). Multiply column N of A(N-2) by column (N-1) of A(N-2). This is the desired result.
If this is indeed what you are trying to do, you need to either write a double for loop (which you want to avoid, as you say) or come up with some iterative method of performing the above steps.
The double for way would look something like this
DoubleFor <- function(m) {
    res <- m
    for(i in 1:(ncol(res)-1)) {
        for(j in (i+1):ncol(res)) {
            res[, j] <- res[, i] * res[, j]
        }
    }
    res
}
Using R's vectorized operations, you can avoid the inner for loop
SingleFor <- function(m) {
    res <- m
    for(i in 1:(ncol(res)-1))
        res[, (i+1):ncol(res)] <- res[, i] * res[, (i+1):ncol(res)]
    res
}
When it comes to iterating a procedure, you may want to define a recursive function, or use Reduce. The recursive function would be something like
RecursiveFun <- function(m, i = 1) {
    if (i == ncol(m)) return(m)
    n <- ncol(m)
    m[, (i+1):n] <- m[, (i+1):n] * m[, i]
    Recall(m, i + 1) # Thanks to @batiste for suggesting using Recall()!
}
while Reduce would use a similar function without the recursion (which is provided by Reduce)
ReduceFun <- function(m) {
    Reduce(function(i, m) {
        n <- ncol(m)
        m[, (i+1):n] <- m[, (i+1):n] * m[, i]
        m
    }, c((ncol(m)-1):1, list(m)), right = TRUE)
}
These will all produce the same result, e.g. testing on your matrix
a <- matrix(c(1, 8, 10, 1, 4, 1), 3, 3)
DoubleFor(a)
# [,1] [,2] [,3]
# [1,] 1 1 1
# [2,] 8 32 2048
# [3,] 10 10 1000
all(DoubleFor(a) == SingleFor(a) & SingleFor(a) == RecursiveFun(a) &
RecursiveFun(a) == ReduceFun(a))
# [1] TRUE
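As a side note, the steps above imply that each finished column equals the original column times the product of all finished columns to its left. Under that reading, a single pass with a running product should give the same result. This is a sketch of that reformulation, not one of the methods timed below, and RunningProduct is a hypothetical name.

```r
RunningProduct <- function(m) {
  res <- m
  running <- rep(1, nrow(m))          # product of all finished columns so far
  for (j in seq_len(ncol(m))) {
    res[, j] <- m[, j] * running      # finish column j
    running <- running * res[, j]     # fold it into the running product
  }
  res
}

# Same test matrix as above, written out in full (9 values)
a <- matrix(c(1, 8, 10, 1, 4, 1, 1, 8, 10), 3, 3)
res <- RunningProduct(a)
res[2, ]  # 8 32 2048
```

This touches each column once instead of updating every column to the right at each step. The entries grow very quickly, so overflow to Inf is possible on wide matrices whichever version is used.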
Just out of curiosity, I did a quick speed comparison, but I don't think any one of the above will be significantly faster than the others for your size of matrices, so I would just go with the one you think is more readable.
a <- matrix(rnorm(1e6), ncol = 1e3)
system.time(DoubleFor(a))
# user system elapsed
# 22.158 0.012 22.220
system.time(SingleFor(a))
# user system elapsed
# 27.349 0.004 27.415
system.time(RecursiveFun(a))
# user system elapsed
# 25.150 1.336 26.534
system.time(ReduceFun(a))
# user system elapsed
# 26.574 0.004 26.626
Okay, here's the situation:
I have the following list of arrays:
N <- c('A', 'B', 'C')
ll <- sapply(N, function(x) NULL)
ll <- lapply(ll, function(x) assign("x", array(0, dim = c(2,2))))
Now I want to replace, say, the element at position [1,1] in those arrays by a given quantity, say 10. Following this question here, I'm doing the following:
x <- lapply(ll, function(x) {x[1,1] <- 10})
which should make x a list of three 2x2 arrays with the [1,1] element equal to 10 and all others equal to 0. Instead of that, I'm seeing this:
> x <- lapply(ll, function(x) {x[2,1] <- 10})
> x
$A
[1] 10
$B
[1] 10
$C
[1] 10
Any ideas of what's going on here?
You're not returning the whole array, so the value of the last expression is returned instead. That is, when you do,
x <- lapply(ll, function(x) {x[2,1] <- 10})
You intend to say:
x <- lapply(ll, function(x) {x[2,1] <- 10; return(x)})
If you don't specify a return value, the value of the last evaluated expression is returned by default. Here that expression is the assignment, whose value is 10. Instead you should use return(x) or, equivalently, just x as follows:
x <- lapply(ll, function(x) {x[2,1] <- 10; x})
# $A
# [,1] [,2]
# [1,] 0 0
# [2,] 10 0
#
# $B
# [,1] [,2]
# [1,] 0 0
# [2,] 10 0
#
# $C
# [,1] [,2]
# [1,] 0 0
# [2,] 10 0
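To see the difference side by side, here is a minimal sketch that builds the list in a simpler way and runs both versions. The names no_return and with_return are made up for this example.

```r
N <- c("A", "B", "C")
# Named list of three 2x2 zero arrays
ll <- setNames(replicate(length(N), array(0, dim = c(2, 2)), simplify = FALSE), N)

# The function body ends with an assignment, so its value (10) is returned
no_return <- lapply(ll, function(m) { m[2, 1] <- 10 })

# The function body ends with m, so the modified array is returned
with_return <- lapply(ll, function(m) { m[2, 1] <- 10; m })

no_return$A          # 10
with_return$A[2, 1]  # 10
ll$A[2, 1]           # 0, the original list is left untouched
```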
Although lapply would generally be preferred, here is an alternative using a for loop, just for the sake of having one:
for (i in seq_along(ll)) ll[[i]][2,1] <- 10
For example: I have a list of matrices, and I would like to evaluate their differences, sort of a 3-D diff. So if I have:
m1 <- matrix(1:4, ncol=2)
m2 <- matrix(5:8, ncol=2)
m3 <- matrix(9:12, ncol=2)
mat.list <- list(m1,m2,m3)
I want to obtain
mat.diff <- list(m2-m1, m3-m2)
The solution I found is the following:
mat.diff <- mapply(function (A,B) B-A, mat.list[-length(mat.list)], mat.list[-1], SIMPLIFY = FALSE)
Is there a nicer/built-in way to do this?
You can do this with just lapply or other ways of looping:
mat.diff <- lapply( tail( seq_along(mat.list), -1 ),
function(i) mat.list[[i]] - mat.list[[ i-1 ]] )
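Another option is Map, which pairs up the two shifted lists and always returns a list. A short sketch using the data from the question (the name simplified is made up here, to show what mapply does with its default simplification):

```r
m1 <- matrix(1:4, ncol = 2)
m2 <- matrix(5:8, ncol = 2)
m3 <- matrix(9:12, ncol = 2)
mat.list <- list(m1, m2, m3)

# Element-wise: (m2 - m1), (m3 - m2)
mat.diff <- Map(`-`, mat.list[-1], mat.list[-length(mat.list)])

# For comparison: by default mapply collapses the results into one 4 x 2 matrix
simplified <- mapply(function(A, B) B - A,
                     mat.list[-length(mat.list)], mat.list[-1])
```

Map is a thin wrapper around mapply that never simplifies, so each difference keeps its 2 x 2 shape.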
You can use combn to generate the pairs of list indices and apply a function to each combination.
combn(1:length(l),2,FUN=function(x)
if(diff(x) == 1) ## apply just for consecutive index
l[[x[2]]]-l[[x[1]]],
simplify = FALSE) ## to get a list
Using @Arun's data, I get:
[[1]]
[,1] [,2]
[1,] 4 4
[2,] 4 4
[[2]]
NULL
[[3]]
[,1] [,2]
[1,] 4 4
[2,] 4 4