Finding cumulative sum and then average the values in R - r

I want to compute the cumulative sum across the first (n-1) columns of a matrix with n columns and then average the values. I created a sample matrix for this task:
ma = matrix(c(1:10), nrow = 2, ncol = 5)
ma
[,1] [,2] [,3] [,4] [,5]
[1,] 1 3 5 7 9
[2,] 2 4 6 8 10
I wanted to find the following
ans = matrix(c(1,2,2,3,3,4,4,5), nrow = 2, ncol = 4)
ans
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 2 3 4 5
The following is my R function.
ColCumSumsAve <- function(y){
for(i in seq_len(dim(y)[2]-1)) {
y[,i] <- cumsum(y[,i])/i
}
}
ColCumSumsAve(ma)
However, when I run the above function, it's not producing any output. Are there any mistakes in the code?
Thanks.

There were several mistakes.
Solution
This is what I tested and what works:
colCumSumAve <- function(m) {
csum <- t(apply(X=m, MARGIN=1, FUN=cumsum))
res <- t(Reduce(`/`, list(t(csum), 1:ncol(m))))
res[, 1:(ncol(m)-1)]
}
Test it with:
> colCumSumAve(ma)
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 2 3 4 5
which is correct.
Explanation:
colCumSumAve <- function(m) {
csum <- t(apply(X=m, MARGIN=1, FUN=cumsum)) # calculate row-wise cumsum
res <- t(Reduce(`/`, list(t(csum), 1:ncol(m))))
# This is the trickiest part.
# Because `csum` is a matrix, the matrix will be treated like a vector
# when `Reduce`-ing using `/` with a vector `1:ncol(m)`.
# To get quasi-row-wise treatment, I change orientation
# of the matrix by `t()`.
# However, the output will be in this transposed
# orientation as a consequence. So I re-transpose by applying `t()`
# on the entire result at the end - to get again the original
# input matrix orientation.
# `Reduce` using `/` over the sequential list of `t(csum)` and
# `1:ncol(m)` has the effect of dividing the `csum` values by their
# corresponding column position.
res[, 1:(ncol(m)-1)] # removes last column for the answer.
# This could, of course, be done right at the beginning,
# saving the calculation of values in the last column,
# but that calculation is not the speed-limiting step
# (since it is vectorized);
# rather, `apply` and `Reduce` will be speed-limiting.
}
Well, okay, I could do then:
colCumSumAve <- function(m) {
csum <- t(apply(X=m[, 1:(ncol(m)-1)], MARGIN=1, FUN=cumsum))
t(Reduce(`/`, list(t(csum), 1:ncol(csum)))) # csum has one column fewer than m
}
or:
colCumSumAve <- function(m) {
m <- m[, 1:(ncol(m)-1)] # remove last column
csum <- t(apply(X=m, MARGIN=1, FUN=cumsum))
t(Reduce(`/`, list(t(csum), 1:ncol(m))))
}
This is actually the more optimized solution, then.
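As a hedged alternative for the division step: Reduce over a two-element list with `/` amounts to a single division, so the same result can be sketched with base R's sweep(), which divides each column by its position without the double transpose. The name colCumSumAveSweep is made up for this illustration, and the sketch assumes the input matrix has at least three columns.

```r
# Hypothetical helper name; assumes m has at least three columns.
colCumSumAveSweep <- function(m) {
  m <- m[, -ncol(m), drop = FALSE]          # drop the last column first
  csum <- t(apply(m, 1, cumsum))            # row-wise cumulative sums
  sweep(csum, 2, seq_len(ncol(csum)), `/`)  # divide column j by j
}

ma <- matrix(1:10, nrow = 2, ncol = 5)
res <- colCumSumAveSweep(ma)
res
```

sweep() applies the operation along the chosen margin (here 2, the columns), so no manual re-orientation of the matrix is needed.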
Original Function
Your original function makes only assignments in the for-loop and doesn't return anything.
So I copied first your input into a res, processed it with your for-loop and then returned res.
ColCumSumsAve <- function(y){
res <- y
for(i in seq_len(dim(y)[2]-1)) {
res[,i] <- cumsum(y[,i])/i
}
res
}
However, this gives:
> ColCumSumsAve(ma)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1.5 1.666667 1.75 9
[2,] 3 3.5 3.666667 3.75 10
The problem is that the cumsum here is calculated in column direction instead of row-wise: y[,i] is a single column, and cumsum treats it as a vector running down the rows.
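A small sketch of the difference, using the ma matrix from the question (the variable names down_col and along_row are only for illustration):

```r
ma <- matrix(1:10, nrow = 2, ncol = 5)

# What the loop computes: cumulative sum down one column
down_col <- cumsum(ma[, 2])

# What the question asks for: cumulative sum along one row
along_row <- cumsum(ma[1, ])

down_col
along_row
```

Dividing along_row by 1:5 gives the running row averages 1, 2, 3, 4, 5, of which the question keeps the first four.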
Corrected Original Function
After some fiddling, I realized the correct solution is:
ColCumSumsAve <- function(y){
res <- matrix(NA, nrow(y), ncol(y)-1)
# create empty matrix with the dimensions of y minus last column
for (i in 1:(nrow(y))) { # go through rows
for (j in 1:(ncol(y)-1)) { # go through columns
res[i, j] <- sum(y[i, 1:j])/j # for each position do this
}
}
res # return `res`ult by calling it at the end!
}
with the testing:
> ColCumSumsAve(ma)
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 2 3 4 5
Note: dim(y)[2] is ncol(y), and dim(y)[1] is nrow(y);
and instead of seq_len(), 1: is shorter and I guess even slightly faster.
Note: My solution given first will be faster, since it uses apply, vectorized cumsum and Reduce; for-loops in R are slower.
Late Note: Not so sure that the first solution is faster. Since R-3.x it seems that for loops are faster. Reduce will be the speed-limiting function and can sometimes be incredibly slow.
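One caveat on the 1: versus seq_len() note above, shown as a minimal sketch: the two differ when the upper bound is zero, which matters if a matrix can have zero rows or a single column. The variable names here are illustrative only.

```r
n_cols <- 0

bad_idx  <- 1:n_cols         # counts down: c(1, 0)
safe_idx <- seq_len(n_cols)  # empty sequence: integer(0)

length(bad_idx)
length(safe_idx)
```

A for loop over bad_idx would run twice with invalid indices, while a loop over safe_idx would not run at all.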

k <- t(apply(ma,1,cumsum))[,-ncol(ma)]
for (i in 1:ncol(k)){
k[,i] <- k[,i]/i
}
k
This should work.

All you need is rowMeans:
nc <- 4
cbind(ma[,1],sapply(2:nc,function(x) rowMeans(ma[,1:x])))
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 2 3 4 5

Here's how I did it
> t(apply(ma, 1, function(x) cumsum(x) / 1:length(x)))[,-NCOL(ma)]
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 2 3 4 5
This applies the cumsum function row-wise to the matrix ma and then divides by the correct length to get the average (cumsum(x) and 1:length(x) will have the same length). Then simply transpose with t and remove the last column with [,-NCOL(ma)].
The reason why there is no output from your function is because you aren't returning anything. You should end the function with return(y) or simply y as Marius suggested. Regardless, your function doesn't seem to give you the correct response anyway.

Related

Apply() cannot be applied to this list?

I have created an example below, where I am trying to make a list of each row of a matrix, then use apply().
mat<-matrix(rexp(9, rate=.1), ncol=3)
my_list2 <- list()
for(i in 1:nrow(mat)) {
my_list2[[i]] <- mat[i,]
}
#DO NOT CHANGE THIS:
apply(my_list2[[i]],2,sum)
However the apply() function does not work, giving a dimension error. I understand that apply() is not the best function to use here but it is present in a function that I need so I cannot change that line.
Does anyone have any idea how I can change my "my_list2" to work better? Thank you!
Edit:
Here is an example that works (non reproducible)
Example
Note both the example above and this example have type "list"
This answer addresses "how to properly get a list of matrices", not how to resolve the use of apply.
By default in R, when you subset a matrix to a single column or a single row, it reduces the dimensionality. For instance,
mtx <- matrix(1:6, nrow = 2)
mtx
# [,1] [,2] [,3]
# [1,] 1 3 5
# [2,] 2 4 6
mtx[1,]
# [1] 1 3 5
mtx[,3]
# [1] 5 6
If you want a single row or column but to otherwise retain dimensionality, add the drop=FALSE argument to the [-subsetting:
mtx[1,,drop=FALSE]
# [,1] [,2] [,3]
# [1,] 1 3 5
mtx[,3,drop=FALSE]
# [,1]
# [1,] 5
# [2,] 6
In this way, your code to produce sample data can be adjusted to be:
set.seed(42) # important for reproducibility in questions on SO
mat<-matrix(rexp(9, rate=.1), ncol=3)
my_list2 <- list()
for(i in 1:nrow(mat)) {
my_list2[[i]] <- mat[i,,drop=FALSE]
}
my_list2
# [[1]]
# [,1] [,2] [,3]
# [1,] 1.983368 0.381919 3.139846
# [[2]]
# [,1] [,2] [,3]
# [1,] 6.608953 4.731766 4.101296
# [[3]]
# [,1] [,2] [,3]
# [1,] 2.83491 14.63627 11.91598
And then you can use akrun's most recent code to resolve how to get the column-wise sums within each list element, i.e., one of
lapply(my_list2, apply, 2, sum)
lapply(my_list2, function(z) apply(z, 2, sum))
lapply(my_list2, \(z) apply(z, 2, sum)) # R-4.1 or later
In your screenshot it works because the object part of the list ex[[1]] is an array. And in your example the elements of your list are vectors. You could try the following:
mat<-matrix(rexp(9, rate=.1), ncol=3)
my_list2 <- list()
for(i in 1:nrow(mat)) {
my_list2[[i]] <- as.matrix(mat[i,])
}
#DO NOT CHANGE THIS:
apply(my_list2[[1]],2,sum)
apply(my_list2[[2]],2,sum)
apply(my_list2[[3]],2,sum)
Note that apply cannot be applied to all three elements of the list in one line; to do that, the line itself would have to change.
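To make the difference between the two fixes concrete, here is a small sketch with a fixed matrix instead of random data (the names row_vec, row_mat and col_mat are illustrative). It shows why apply fails on a plain row and that drop=FALSE and as.matrix() produce differently shaped matrices:

```r
mat <- matrix(1:9, ncol = 3)

row_vec <- mat[1, ]                 # plain vector, no dim attribute
row_mat <- mat[1, , drop = FALSE]   # 1 x 3 matrix
col_mat <- as.matrix(mat[1, ])      # 3 x 1 matrix

dim(row_vec)   # NULL, which is why apply() complains
dim(row_mat)
dim(col_mat)

sums_row_mat <- apply(row_mat, 2, sum)  # one sum per original column
sums_col_mat <- apply(col_mat, 2, sum)  # a single total for the row
```

So with as.matrix() the unchanged apply(..., 2, sum) line returns the row total, while with drop=FALSE it returns the three values of the row itself.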

R - Add combinations of elements within a vector

Does anyone know of a way to add up combinations of numbers within a vector?
Suppose I am going through a for loop and each time I end up with a vector of different lengths, how could I combine each element of this vector such that I have the sum of 2, 3, etc elements?
For example if I have:
vector <- c(1:5)
And want to go through it as in:
element 1 + element 2; element 2 + element 3, etc
But also:
element 1 + element 2 + element 3
How would I do this? It's important to note that in many of the vectors the lengths will be different. So whilst one vector might contain 3 elements another might contain 12.
I know you can do vector[1]+vector[2], but I need some way to iterate throughout the vector wherein it takes into account the above note.
You can use combn:
> combn(vector, 3, FUN = NULL, simplify = TRUE)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 1 1 1 1 1 2 2 2 3
[2,] 2 2 2 3 3 4 3 3 4 4
[3,] 3 4 5 4 5 5 4 5 5 5
The trick here is that each call will return a matrix of results, and you will have to decide how you want to aggregate and store all the various combinations.
If you don't mind having a list, then the following should do the trick:
> sapply(c(1:length(vector)),
function(x) {
combn(vector, x, FUN = NULL, simplify = TRUE)
})
Generate pair IDs
In this case, we need to get the pairs:
combn(3, 2)
Output:
[,1] [,2] [,3]
[1,] 1 1 2
[2,] 2 3 3
Pairs are generated by column.
Sum Over Vector Elements (Using a Subset)
To access each element and perform a summation, we opt to define a helper function that takes the combination and the vector.
# Write a helper function
# sums of the index of the vector
comb_subset_sum = function(x, vec){
return(sum(vec[x]))
}
From this, we can use combn directly or use apply.
Summing for a single k:
combn directly:
# Input Vector
vec = 1:5
# Length of vector
n = length(vec)
# Generate pairwise combinations and obtain pair_sum
# Specify the k (m in R)
m = combn(n, m = 2, FUN = comb_subset_sum, vec = vec)
apply usage:
# Input Vector
vec = 1:5
# Number of Observations
n = length(vec)
# Combinations
# Specify the k (m in R)
combinations = combn(n, m = 2)
# Obtain vectorized sum over subset
subset_summed = apply(combinations, 2, comb_subset_sum, vec = vec)
Example Output:
combinations:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 1 1 1 2 2 2 3 3 4
[2,] 2 3 4 5 3 4 5 4 5 5
subset_summed:
[1] 3 4 5 6 5 6 7 7 8 9
Trace:
vec[1]+vec[2]=3
vec[1]+vec[3]=4
vec[1]+vec[4]=5
vec[1]+vec[5]=6
vec[2]+vec[3]=5
vec[2]+vec[4]=6
vec[2]+vec[5]=7
vec[3]+vec[4]=7
vec[3]+vec[5]=8
vec[4]+vec[5]=9
To obtain the trace output, add the following before return() in comb_subset_sum():
cat(paste0("vec[",x,"]", collapse = "+"), "=", sum(vec[x]), "\n")
Summing for multiple k:
Here, we apply the same logic, just in a way that enables the k value of the combination to take multiple values.
# Input Vector
vec = 1:5
# Length of Vec
n = length(vec)
# Store output
o = vector('list',n)
for(i in seq_along(vec)){
o[[i]] = combn(n, i, FUN = comb_subset_sum, vec = vec)
}
Note: The size of each element of o will vary as the number of combinations will increase and then decrease.
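The varying sizes can be checked against the binomial coefficients. This is a minimal sketch that rebuilds o with lapply and an inline function instead of the loop and helper above, purely to keep the example self-contained:

```r
vec <- 1:5
n <- length(vec)

# One element per subset size k, holding the sums of all k-element subsets
o <- lapply(seq_len(n), function(k) {
  combn(n, k, FUN = function(idx) sum(vec[idx]))
})

lengths(o)              # number of sums per k
choose(n, seq_len(n))   # number of k-element combinations
```

Both lines report 5, 10, 10, 5, 1 for a vector of length 5.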
Summing over combinations
If we do not care about vector element values, we can then just sum over the actual combinations in a similar way to how we obtained the vector elements.
To generate pairs and then sum, use:
# Input Vector
vec = 1:5
# Length of Vec
n = length(vec)
# Generate all combinations (by column)
# Specify the k (m in R)
m = combn(n, m = 2)
# Obtain sum by going over columns
sum_m = apply(m, 2, sum)
Or do it in one go:
# Specify the k (m in R)
sum_inplace = combn(n, m = 2, FUN = sum)
Equality:
all.equal(sum_m,sum_inplace)
Summing over multiple k
And, as before, we can set it up to get all sums under different k by using:
# Input Vector
vec = 1:5
# Length of Vec
n = length(vec)
# Store output (varying lengths)
o = vector('list',n)
for(i in seq_along(vec)){
o[[i]] = combn(n, i, FUN = sum)
}
The following relies on the binary representation of numbers. Basically, you have 2^n combinations to check. By writing any number between 1 and 2^n - 1 in binary with n bits, you get every non-empty subset of elements you might want.
The number2binary function comes from Paul Hiemstra's answer in this thread: How to convert integer number into binary vector?
number2binary = function(number, noBits) {
binary_vector = rev(as.numeric(intToBits(number)))
if(missing(noBits)) {
return(binary_vector)
} else {
binary_vector[-(1:(length(binary_vector) - noBits))]
}
}
vector <- 1:5
n <- length(vector)
comp_sum <- function(x) {
binary <- number2binary(x, noBits = n)
result <- sum(vector[which(binary==1)])
names(result) <- paste(which(binary == 1), collapse = "+")
return(result)
}
binaries <- sapply(1:(2^n-1), comp_sum) # parentheses needed: 1:2^n-1 would be 0:(2^n-1)
Note: I only go up to 2^n - 1 as you do not need the "zero". By adding some conditions in your comp_sum function, you can pick only sums of two elements or of three elements...
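One way to add such a condition, sketched here under the assumption that filtering by the number of set bits is what is wanted: build all bit patterns as a logical matrix, count the TRUE values per row, and keep only the rows of the desired size. The names masks, sizes and pair_sums are made up for this example, and intToBits is used directly rather than through number2binary.

```r
vec <- 1:5
n <- length(vec)

# One row per non-empty subset; column j is TRUE if element j is included
masks <- t(sapply(1:(2^n - 1), function(x) as.integer(intToBits(x))[1:n] == 1))

sizes <- rowSums(masks)  # how many elements each subset contains

# Keep only the two-element subsets and sum the selected values
pair_sums <- apply(masks[sizes == 2, , drop = FALSE], 1, function(b) sum(vec[b]))
pair_sums
```

Changing sizes == 2 to sizes == 3 would give the sums of all three-element subsets instead.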
You might be looking for rollsum from zoo package, where you can specify the number of elements you want to add up:
lapply(2:5, function(i) zoo::rollsum(1:5, i))
[[1]]
[1] 3 5 7 9 # two elements roll sum
[[2]]
[1] 6 9 12 # three elements roll sum
[[3]]
[1] 10 14 # four elements roll sum
[[4]]
[1] 15 # five elements roll sum
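If adding the zoo dependency is not desired, the same rolling sums can be sketched in base R with embed() from the stats package, which lays out each window of k consecutive elements as one row of a matrix. This is offered as an alternative, not as what rollsum does internally.

```r
x <- 1:5

# For each window width k, sum the rows of the k-column window matrix
roll_sums <- lapply(2:length(x), function(k) rowSums(embed(x, k)))
roll_sums
```

Because the input vector's length is read from x, the same line works for vectors of any length of at least two.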

R programming:How to use loop on variables labelled in a consecutive manner?

I'm trying to figure out, how I can run a loop on some variables that have a consecutive label.
I want to do matrix.2-Matrix.1 and store it in matrix x.1, then Matrix.3-matrix.2 and store it in matrix x.2. There are 300 matrices(Matrix.1,Matrix.2,...Matrix.300) but for this example, I would like to just work on matrix 1,2 and 3.
I first tried an approach that involved the list function, but it didn't work, and then I thought about using a MACRO just like in SAS (the % symbol). But the Macro approach seemed not to work in R.
My code is below:
(The list approach)
> Matrix.1=matrix(c(1:6),nrow=2,ncol=3,byrow=TRUE)
> Matrix.2=matrix(c(1,8,9,17,15,2),nrow=2,ncol=3,byrow=TRUE)
> Matrix.3=matrix(c(0,1,2,3,6,0),nrow=2,ncol=3,byrow=TRUE)
> x.1=matrix(rep(0,6),nrow=2,ncol=3,byrow=TRUE)
> x.2=matrix(rep(0,6),nrow=2,ncol=3,byrow=TRUE)
> m=list(Matrix.1=Matrix.1,Matrix.2=Matrix.2,Matrix.3=Matrix.3)
> x=list(x.1=x.1,x.2=x.2)
> m[1]
$Matrix.1
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
> m[2]
$Matrix.2
[,1] [,2] [,3]
[1,] 1 8 9
[2,] 17 15 2
> m[3]
$Matrix.3
[,1] [,2] [,3]
[1,] 0 1 2
[2,] 3 6 0
> x[1]
$x.1
[,1] [,2] [,3]
[1,] 0 0 0
[2,] 0 0 0
> x[2]
$x.2
[,1] [,2] [,3]
[1,] 0 0 0
[2,] 0 0 0
> for (i in 1:2){
+ x[i]=m[i+1]-m[i]
+ print(x[i])
+ }
Error in m[i + 1] - m[i] : non-numeric argument to binary operator
>
How can I make operations on list?
> #Other approach inspired from SAS
> for (i in i:2){
+ x.i=Matrix.i+1-Matrix.i
+ print(R.i)
+ }
Error: object 'Matrix.i' not found
This second approach isn't even doable in R.
What is the best way of dealing loops involving consecutively labelled variables?
Since m and x are both lists, you need to use m[[1]] and x[[1]] to extract their elements; m[1] returns a one-element list, which is why the subtraction fails.
for (i in 1:2){
x[[i]] <- m[[i+1]]-m[[i]]
print(x[[i]])
}
On the other hand, if you have 300 matrices (Matrix.1, Matrix.2, ... Matrix.300), you could use get and assign to deal with the numerical labels. Here I first assign values to 300 matrices with names Matrix.1 through Matrix.300. Then I use get function to extract these matrices and generate list x.
for (i in 1:300) {
assign(paste("Matrix.", i, sep = ""), matrix(rnorm(9), 3, 3))
}
x <- list()
for (i in 2:300) {
x[[i-1]] <- get(paste("Matrix.", i, sep = "")) - get(paste("Matrix.", i-1, sep = ""))
}
It is the preferred method in R to use the apply family of functions to loop through objects. For lists, you can use lapply which returns a list, or sapply which returns the most simplified object it can without losing any information. With these functions, your output is stored in the same order as the input, which makes comparisons or additional steps much easier.
myProcessedList <- lapply(x, FUN=<some function>)
This is a lot simpler and more straightforward than using assign and get and is worth the investment to learn. SO has many useful examples.
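For the consecutive differences in the question, a minimal lapply sketch could look as follows. It assumes the matrices are already collected in an unnamed list m, and it iterates over positions rather than over the elements so that each step can reach its neighbour:

```r
Matrix.1 <- matrix(1:6, nrow = 2, byrow = TRUE)
Matrix.2 <- matrix(c(1, 8, 9, 17, 15, 2), nrow = 2, byrow = TRUE)
Matrix.3 <- matrix(c(0, 1, 2, 3, 6, 0), nrow = 2, byrow = TRUE)

m <- list(Matrix.1, Matrix.2, Matrix.3)

# x[[i]] holds m[[i + 1]] - m[[i]], in the same order as the input
x <- lapply(seq_len(length(m) - 1), function(i) m[[i + 1]] - m[[i]])
x
```

With 300 matrices in the list, the same line produces the 299 differences without any named intermediate variables.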

R: How to do this matrix operation without loops or more efficient?

I'm trying to perform this operation on a matrix: multiply the first column into columns 2, 3 and 4 while keeping the first column's value, then multiply the second column into columns 3 and 4 while keeping the second column's value, and finally multiply the third column into column 4. I want to do this without using a "for" loop and would like to use functions like sapply or mapply. Does anyone have an idea how to do it?
Example with one line:
a[1,1]*(a[1,2], a[1,3], a[1,4]) = 2 4 4 4
a[1,1] a[1,2]*(a[1,3], a[1,4]) = 2 4 16 16 #keep a[1,1] a[1,2]
a[1,1] a[1,2] a[1,3] a[1,3]*(a[1,4]) = 2 4 16 256 # #keep a[1,1] a[1,2] a[1,3]
Input:
> a<- matrix(2,4,4) # or any else matrix like a<- matrix(c(1,8,10,1,4,1),3,3)
> a
[,1] [,2] [,3] [,4]
[1,] 2 2 2 2
[2,] 2 2 2 2
[3,] 2 2 2 2
[4,] 2 2 2 2
Output:
> a
[,1] [,2] [,3] [,4]
[1,] 2 4 16 256
[2,] 2 4 16 256
[3,] 2 4 16 256
[4,] 2 4 16 256
EDIT: LOOP VERSION
a<- matrix(2,4,4);
ai<-a[,1,drop=F];
b<- matrix(numeric(0),nrow(a),ncol(a)-1);
i<- 1;
for ( i in 1:(ncol(a)-1)){
a<- a[,1]*a[,-1,drop=F];
b[,i]<- a[,1];
}
b<- cbind(ai[,1],b);
b
If I understand correctly, what you are trying to do is, starting with a matrix A with N columns, perform the following steps:
Step 1. Multiply columns 2 through N of A by column 1 of A. Call the resulting matrix A1.
Step 2. Multiply columns 3 through N of A1 by column 2 of A1. Call the resulting matrix A2.
...
Step (N-1). Multiply column N of A(N-2) by column (N-1) of A(N-2). This is the desired result.
If this is indeed what you are trying to do, you need to either write a double for loop (which you want to avoid, as you say) or come up with some iterative method of performing the above steps.
The double for way would look something like this
DoubleFor <- function(m) {
res <- m
for(i in 1:(ncol(res)-1)) {
for(j in (i+1):ncol(res)) {
res[, j] <- res[, i] * res[, j]
}
}
res
}
Using R's vectorized operations, you can avoid the inner for loop
SingleFor <- function(m) {
res <- m
for(i in 1:(ncol(res)-1))
res[, (i+1):ncol(res)] <- res[, i] * res[, (i+1):ncol(res)]
res
}
When it comes to iterating a procedure, you may want to define a recursive function, or use Reduce. The recursive function would be something like
RecursiveFun <- function(m, i = 1) {
if (i == ncol(m)) return(m)
n <- ncol(m)
m[, (i+1):n] <- m[, (i+1):n] * m[, i]
Recall(m, i + 1) # Thanks to #batiste for suggesting using Recall()!
}
while Reduce would use a similar function without the recursion (which is provided by Reduce)
ReduceFun <- function(m) {
Reduce(function(i, m) {
n <- ncol(m)
m[, (i+1):n] <- m[, (i+1):n] * m[, i]
m
}, c((ncol(m)-1):1, list(m)), right = T)
}
These will all produce the same result, e.g. testing on your matrix
a <- matrix(c(1, 8, 10, 1, 4, 1), 3, 3)
DoubleFor(a)
# [,1] [,2] [,3]
# [1,] 1 1 1
# [2,] 8 32 2048
# [3,] 10 10 1000
all(DoubleFor(a) == SingleFor(a) & SingleFor(a) == RecursiveFun(a) &
RecursiveFun(a) == ReduceFun(a))
# [1] TRUE
Just out of curiosity, I did a quick speed comparison, but I don't think any one of the above will be significantly faster than the others for your size of matrices, so I would just go with the one you think is more readable.
a <- matrix(rnorm(1e6), ncol = 1e3)
system.time(DoubleFor(a))
# user system elapsed
# 22.158 0.012 22.220
system.time(SingleFor(a))
# user system elapsed
# 27.349 0.004 27.415
system.time(RecursiveFun(a))
# user system elapsed
# 25.150 1.336 26.534
system.time(ReduceFun(a))
# user system elapsed
# 26.574 0.004 26.626
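A further variant, offered as a sketch rather than a benchmarked improvement: following the steps above, result column j equals the original column j times the product of all earlier result columns, so a running product touches each column only once. RunningProd is a hypothetical name, and no timing claim is made for it.

```r
RunningProd <- function(m) {
  res <- m
  p <- res[, 1]                    # running product of finished columns
  for (j in seq_len(ncol(m))[-1]) {
    res[, j] <- m[, j] * p         # original column times all earlier results
    p <- p * res[, j]              # fold the new column into the product
  }
  res
}

a <- matrix(2, 4, 4)
RunningProd(a)
```

For the all-twos matrix from the question, each row of the result is 2, 4, 16, 256.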

replace diagonal elements in an array

Does anyone know a neat/efficient way to replace diagonal elements in array, similar to the use of diag(x) <- value for a matrix? In other words something like this:
> m<-array(1:27,c(3,3,3))
> for(k in 1:3){
+ diag(m[,,k])<-5
+ }
> m
, , 1
[,1] [,2] [,3]
[1,] 5 4 7
[2,] 2 5 8
[3,] 3 6 5
, , 2
[,1] [,2] [,3]
[1,] 5 13 16
[2,] 11 5 17
[3,] 12 15 5
, , 3
[,1] [,2] [,3]
[1,] 5 22 25
[2,] 20 5 26
[3,] 21 24 5
but without the use of a for loop (my arrays are pretty large and this manipulation will already be within a loop).
Many thanks.
Try this:
with(expand.grid(a = 1:3, b = 1:3), replace(m, cbind(a, a, b), 5))
EDIT:
The question asked for neat/efficient but, of course, those are not the same thing. The one liner here is compact and loop-free but if you are looking for speed I think you will find that the loop in the question is actually the fastest of all the answers.
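The one-liner relies on matrix indexing: when an array is indexed by a matrix with one column per dimension, each row of that matrix addresses a single element. A step-by-step sketch of the same idea (the names g and idx are illustrative):

```r
m <- array(1:27, c(3, 3, 3))

g <- expand.grid(a = 1:3, b = 1:3)  # a = diagonal position, b = slice
idx <- cbind(g$a, g$a, g$b)         # each row is (i, i, k)

m[idx] <- 5L                        # assign to all nine diagonal cells at once
m[, , 3]
```

Unlike replace(), which returns a modified copy, this form assigns into m directly.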
You can use the following function for that, provided you have only 3 dimensions in your array. You can generalize to more dimensions based on this code, but I'm too lazy to do that for you ;-)
`arraydiag<-` <- function(x,value){
dims <- dim(x)
id <- seq_len(dims[1]) +
dims[2]*(seq_len(dims[2])-1)
id <- outer(id,(seq_len(dims[3])-1)*prod(dims[1:2]),`+`)
x[id] <- value
dim(x) <- dims
x
}
This works like :
m<-array(1:36,c(3,3,4))
arraydiag(m)<-NA
m
Note that, contrary to the diag() function, this function cannot deal with matrices that are not square. You can look at the source code of diag() to find out how to adapt this code so that it does.
diagArr <-
function (dim)
{
n <- dim[2]
if(dim[1] != n) stop("expecting first two dimensions to be equal")
d <- seq(1, n*n, by=n+1)
as.vector(outer(d, seq(0, by=n*n, length=prod(dim[-1:-2])), "+"))
}
m[diagArr(dim(m))] <- 5
This is written with the intention that it works for dimensions higher than 3 but I haven't tested it in that case. Should be okay though.
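A quick check of the higher-dimensional case, as a sketch on one small 4-dimensional array only: the function is copied from above (with length spelled out as length.out), and the array arr and the check variables are made up for this test.

```r
diagArr <- function(dim) {
  n <- dim[2]
  if (dim[1] != n) stop("expecting first two dimensions to be equal")
  d <- seq(1, n * n, by = n + 1)
  as.vector(outer(d, seq(0, by = n * n, length.out = prod(dim[-1:-2])), "+"))
}

arr <- array(0, c(3, 3, 2, 2))
arr[diagArr(dim(arr))] <- 1

# Every 3 x 3 slice should have a diagonal of ones and nothing else set
diag_ok <- all(sapply(1:2, function(k) {
  sapply(1:2, function(l) all(diag(arr[, , k, l]) == 1))
}))
total_ok <- sum(arr) == 3 * 2 * 2
c(diag_ok, total_ok)
```

Both checks come out TRUE for this array, which supports the claim for at least this 4-dimensional case.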
