subset an R array that may have a dimension of length 1 - r

I'm working with a three-dimensional array in R that has dimensions determined by user arguments, and where the first dimension can be of length 1 or more.
Subsetting the array works fine if the first dimension is of length two or more:
Arr2 <- array(rnorm(2 * 4 * 7), dim = c(2, 4, 7))
Arr2[,,1]
But if the first dimension is of length one, the subset operator will return either a vector (if drop = TRUE) or a three-dimensional array (if drop = FALSE):
Arrrrgh <- array(rnorm(1 * 4 * 7), dim = c(1,4,7))
Arrrrgh[,,1]
Arrrrgh[,,1,drop=FALSE]
How can I subset this array along the third dimension, while preserving the first and second dimensions?

As you note, from ?"[" there are only two options to control the dimension, drop=TRUE (the default, which in this case will drop both the first and third dimensions) and drop=FALSE, which won't drop any dimension. Neither of these options returns the desired dimension of c(1, 4):
dim(Arrrrgh[,,1])
# NULL
dim(Arrrrgh[,,1,drop=FALSE])
# [1] 1 4 1
One way to address this would be to set the dimension yourself after the subsetting operation:
`dim<-`(Arrrrgh[,,1], dim(Arrrrgh)[1:2])
# [,1] [,2] [,3] [,4]
# [1,] 0.1548771 0.6833689 -0.7507798 1.271966
You could generalize this to a function that drops specified indices if they have a single value passed and doesn't drop any other indices:
extract.arr <- function(arr, ...) {
m <- match.call(expand.dots=FALSE)
missing <- sapply(m[["..."]], is.symbol)
dot.len <- sapply(m[["..."]], function(x) if (is.symbol(x)) 0 else length(eval(x)))
cdim <- dim(arr)
eff.dim <- ifelse(missing, cdim, dot.len)
`dim<-`(do.call("[", c(list(arr), m[["..."]])), eff.dim[eff.dim > 1 | missing])
}
extract.arr(Arrrrgh, ,,1)
# [,1] [,2] [,3] [,4]
# [1,] -0.8634659 1.031382 0.4290036 0.8359372
extract.arr(Arrrrgh, ,,1:2)
# , , 1
#
# [,1] [,2] [,3] [,4]
# [1,] -0.8634659 1.031382 0.4290036 0.8359372
#
# , , 2
#
# [,1] [,2] [,3] [,4]
# [1,] 0.6970842 0.1185803 0.3768951 -0.4577554
extract.arr(Arrrrgh, 1,1,)
# [1] -0.8634659 0.6970842 0.1580495 -1.6606119 -0.2749313 0.4810924 -1.1139392

Related

Apply() cannot be applied to this list?

I have created an example below, where I am trying to make a list of each row of a matrix, then use apply().
mat<-matrix(rexp(9, rate=.1), ncol=3)
my_list2 <- list()
for(i in 1:nrow(mat)) {
my_list2[[i]] <- mat[i,]
}
#DO NOT CHANGE THIS:
apply(my_list2[[i]],2,sum)
However the apply() function does not work, giving a dimension error. I understand that apply() is not the best function to use here but it is present in a function that I need so I cannot change that line.
Does anyone have any idea how I can change my "my_list2" to work better? Thank you!
Edit:
Here is an example that works (non reproducible)
Example
Note both the example above and this example have type "list"
This answer addresses "how to properly get a list of matrices", not how to resolve the use of apply.
By default in R, when you subset a matrix to a single column or a single row, it reduces the dimensionality. For instance,
mtx <- matrix(1:6, nrow = 2)
mtx
# [,1] [,2] [,3]
# [1,] 1 3 5
# [2,] 2 4 6
mtx[1,]
# [1] 1 3 5
mtx[,3]
# [1] 5 6
If you want a single row or column but to otherwise retain dimensionality, add the drop=FALSE argument to the [-subsetting:
mtx[1,,drop=FALSE]
# [,1] [,2] [,3]
# [1,] 1 3 5
mtx[,3,drop=FALSE]
# [,1]
# [1,] 5
# [2,] 6
In this way, your code to produce sample data can be adjusted to be:
set.seed(42) # important for reproducibility in questions on SO
mat<-matrix(rexp(9, rate=.1), ncol=3)
my_list2 <- list()
for(i in 1:nrow(mat)) {
my_list2[[i]] <- mat[i,,drop=FALSE]
}
my_list2
# [[1]]
# [,1] [,2] [,3]
# [1,] 1.983368 0.381919 3.139846
# [[2]]
# [,1] [,2] [,3]
# [1,] 6.608953 4.731766 4.101296
# [[3]]
# [,1] [,2] [,3]
# [1,] 2.83491 14.63627 11.91598
And then you can use akrun's most recent code to resolve how to get the row-wise sums within each list element, i.e., one of
lapply(my_list2, apply, 2, sum)
lapply(my_list2, function(z) apply(z, 2, sum))
lapply(my_list2, \(z) apply(z, 2, sum)) # R-4.1 or later
In your screenshot it works because the object part of the list ex[[1]] is an array. And in your example the elements of your list are vectors. You could try the following:
mat<-matrix(rexp(9, rate=.1), ncol=3)
my_list2 <- list()
for(i in 1:nrow(mat)) {
my_list2[[i]] <- as.matrix(mat[i,])
}
#DO NOT CHANGE THIS:
apply(my_list2[[1]],2,sum)
apply(my_list2[[2]],2,sum)
apply(my_list2[[3]],2,sum)
You should note that apply cannot be applied to all three elements of the array in one line. And to do it in one, that line should be changed.

Efficiently finding minimum cells values from a set of matrices in R

I have a list of matrices (size n*n), and I need to create a new matrix giving the minimum value observed for each cell, based on my list.
For instance, with the following matrices list:
> a = list(matrix(rexp(9), 3), matrix(rexp(9), 3), matrix(rexp(9), 3))
> a
[[1]]
[,1] [,2] [,3]
[1,] 0.5220069 0.39643016 0.04255687
[2,] 0.4464044 0.66029350 0.34116609
[3,] 2.2495949 0.01705576 0.08861866
[[2]]
[,1] [,2] [,3]
[1,] 0.3823704 0.271399 0.7388449
[2,] 0.1227819 1.160775 1.2131681
[3,] 0.1914548 1.004209 0.7628437
[[3]]
[,1] [,2] [,3]
[1,] 0.2125612 0.45379057 1.5987420
[2,] 0.3242311 0.02736743 0.4372894
[3,] 0.6634098 1.15401347 0.9008529
The output should be:
[,1] [,2] [,3]
[1,] 0.2125612 0.271399 0.04255687
[2,] 0.1227819 0.02736743 0.34116609
[3,] 0.1914548 0.01705576 0.08861866
I tried using apply loop with the following code (using melt and dcast from reshape2 library):
library(reshape2)
all = melt(a)
allComps = unique(all[,c(1:2)])
allComps$min=apply(allComps, 1, function(x){
g1 = x[1]
g2 = x[2]
b = unlist(lapply(a, function(y){
return(y[g1,g2])
}))
return(b[which(b==min(b))])
})
dcast(allComps, Var1~Var2)
It works but it is taking a very long time to run when applied on large matrices (6000*6000). I am looking for a faster way to do this.
Use Reduce with pmin :
Reduce(pmin, a)
# [,1] [,2] [,3]
#[1,] 0.02915345 0.03157736 0.3142273
#[2,] 0.57661027 0.05621098 0.1452668
#[3,] 0.48021473 0.18828404 0.4787604
data
set.seed(123)
a = list(matrix(rexp(9), 3), matrix(rexp(9), 3), matrix(rexp(9), 3))
Maybe it should be considered to store the matrices in an array instead of a list. This can be done with simplify2array. In an array the minimum over specific dimensions can be found using min in apply.
A <- simplify2array(a)
apply(A, 1:2, min)
We can use
apply(array(unlist(a), c(3, 3, 3)), 1:2, min)

Finding cumulative sum and then average the values in R

I want to compute cumulative sum for the first (n-1) columns(if we have n columns matrix) and subsequently average the values. I created a sample matrix to do this task. I have the following matrix
ma = matrix(c(1:10), nrow = 2, ncol = 5)
ma
[,1] [,2] [,3] [,4] [,5]
[1,] 1 3 5 7 9
[2,] 2 4 6 8 10
I wanted to find the following
ans = matrix(c(1,2,2,3,3,4,4,5), nrow = 2, ncol = 4)
ans
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 2 3 4 5
The following are my r function.
ColCumSumsAve <- function(y){
for(i in seq_len(dim(y)[2]-1)) {
y[,i] <- cumsum(y[,i])/i
}
}
ColCumSumsAve(ma)
However, when I run the above function its not producing any output. Are there any mistakes in the code?
Thanks.
There were several mistakes.
Solution
This is what I tested and what works:
colCumSumAve <- function(m) {
csum <- t(apply(X=m, MARGIN=1, FUN=cumsum))
res <- t(Reduce(`/`, list(t(csum), 1:ncol(m))))
res[, 1:(ncol(m)-1)]
}
Test it with:
> colCumSumAve(ma)
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 2 3 4 5
which is correct.
Explanation:
colCumSumAve <- function(m) {
csum <- t(apply(X=m, MARGIN=1, FUN=cumsum)) # calculate row-wise colsum
res <- t(Reduce(`/`, list(t(csum), 1:ncol(m))))
# This is the trickiest part.
# Because `csum` is a matrix, the matrix will be treated like a vector
# when `Reduce`-ing using `/` with a vector `1:ncol(m)`.
# To get quasi-row-wise treatment, I change orientation
# of the matrix by `t()`.
# However, the output, the output will be in this transformed
# orientation as a consequence. So I re-transform by applying `t()`
# on the entire result at the end - to get again the original
# input matrix orientation.
# `Reduce` using `/` here by sequencial list of the `t(csum)` and
# `1:ncol(m)` finally, has as effect `/`-ing `csum` values by their
# corresponding column position.
res[, 1:(ncol(m)-1)] # removes last column for the answer.
# this, of course could be done right at the beginning,
# saving calculation of values in the last column,
# but this calculation actually is not the speed-limiting or speed-down-slowing step
# of these calculations (since this is sth vectorized)
# rather the `apply` and `Reduce` will be rather speed-limiting.
}
Well, okay, I could do then:
colCumSumAve <- function(m) {
csum <- t(apply(X=m[, 1:(ncol(m)-1)], MARGIN=1, FUN=cumsum))
t(Reduce(`/`, list(t(csum), 1:ncol(m))))
}
or:
colCumSumAve <- function(m) {
m <- m[, 1:(ncol(m)-1)] # remove last column
csum <- t(apply(X=m, MARGIN=1, FUN=cumsum))
t(Reduce(`/`, list(t(csum), 1:ncol(m))))
}
This is actually the more optimized solution, then.
Original Function
Your original function makes only assignments in the for-loop and doesn't return anything.
So I copied first your input into a res, processed it with your for-loop and then returned res.
ColCumSumsAve <- function(y){
res <- y
for(i in seq_len(dim(y)[2]-1)) {
res[,i] <- cumsum(y[,i])/i
}
res
}
However, this gives:
> ColCumSumsAve(ma)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1.5 1.666667 1.75 9
[2,] 3 3.5 3.666667 3.75 10
The problem is that the cumsum in matrices is calculated in column-direction instead row-wise, since it treats the matrix like a vector (which goes columnwise through the matrix).
Corrected Original Function
After some frickeling, I realized, the correct solution is:
ColCumSumsAve <- function(y){
res <- matrix(NA, nrow(y), ncol(y)-1)
# create empty matrix with the dimensions of y minus last column
for (i in 1:(nrow(y))) { # go through rows
for (j in 1:(ncol(y)-1)) { # go through columns
res[i, j] <- sum(y[i, 1:j])/j # for each position do this
}
}
res # return `res`ult by calling it at the end!
}
with the testing:
> ColCumSumsAve(ma)
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 2 3 4 5
Note: dim(y)[2] is ncol(y) - and dim(y)[1] is nrow(y) -
and instead seq_len(), 1: is shorter and I guess even slightly faster.
Note: My solution given first will be faster, since it uses apply, vectorized cumsum and Reduce. - for-loops in R are slower.
Late Note: Not so sure that the first solution is faster. Since R-3.x it seems that for loops are faster. Reduce will be the speed limiting funtion and can be sometimes incredibly slow.
k <- t(apply(ma,1,cumsum))[,-ncol(k)]
for (i in 1:ncol(k)){
k[,i] <- k[,i]/i
}
k
This should work.
All you need is rowMeans:
nc <- 4
cbind(ma[,1],sapply(2:nc,function(x) rowMeans(ma[,1:x])))
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 2 3 4 5
Here's how I did it
> t(apply(ma, 1, function(x) cumsum(x) / 1:length(x)))[,-NCOL(ma)]
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 2 3 4 5
This applies the cumsum function row-wise to the matrix ma and then divides by the correct length to get the average (cumsum(x) and 1:length(x) will have the same length). Then simply transpose with t and remove the last column with [,-NCOL(ma)].
The reason why there is no output from your function is because you aren't returning anything. You should end the function with return(y) or simply y as Marius suggested. Regardless, your function doesn't seem to give you the correct response anyway.

Use of function over all row-pairs of two matrices

If I want to calculate the n-dimensional distance of two vectors, I can use a function such as:
a = c(1:10)
b = seq(20, 23, length.out = length(a))
test_fun =
function(x,y) {
return(
sqrt(
sum(
(x - y) ^ 2
)
)
)
}
n_distance = test_fun(a,b)
Now, I want to expand this to a matrix setting: I want to calculate the n-dimensional distance for each pair of rows of two matrices.
set.seed(123)
a_mtx = matrix(1:30, ncol = 5)
b_mtx = matrix(sample(1:15,15), ncol = 5)
n_distance_mtx =
matrix(
NA,
nrow = nrow(b_mtx),
ncol = nrow(a_mtx)
)
for(i in 1:nrow(b_mtx)) {
for(j in 1:nrow(a_mtx)) {
n_distance_mtx[i,j] =
test_fun(a_mtx[j,], b_mtx[i,])
}
}
Where each column of n_distance_mtx contains the distance metrics between each row of a_mtx and b_mtx (so n_distance_mtx[,1] is the distance between a_mtx[1,] and b_mtx[1:3,].
If I calculate column means on n_distance_mtx I can obtain the mean distance between each row in a_mtx and all rows of b_mtx.
colMeans(n_distance_mtx)
#[1] 23.79094 24.90281 26.15618 27.53303 29.01668 30.59220
So 23.79094 is the mean distance between a_mtx[1,] and b_mtx[1:3,], and 24.90281 is the mean distance between a_mtx[2,] and b_mtx[1:3,], and so on.
Question: How can I arrive at the same solution without using for-loops?
I want to apply this method to matrices with much larger dimension (on the order of hundreds of thousands of rows). Looking at this and this, it seems there must be a way to accomplish this with a Vectorized outer function, but I have been unable to generate such a function.
test_fun_vec =
Vectorize(
function(x,y) {
outer(
x,
y,
test_fun
)
}
)
test_fun_vec(a_mtx,b_mtx)
#[1] 4 0 2 7 4 6 3 5 1 5 7 5 10 0 9 11 15 17 8 11 9 12 10 16
#[25] 10 22 20 25 15 24
We can use Vectorize with outer
f1 <- Vectorize(function(i, j) test_fun(a_mtx[j, ], b_mtx[i, ]))
out <- outer(seq_len(nrow(b_mtx)), seq_len(nrow(a_mtx)), FUN = f1)
out
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 20.88061 21.84033 22.97825 24.26932 25.69047 27.22132
#[2,] 24.87971 25.57342 26.43861 27.45906 28.61818 29.89983
#[3,] 25.61250 27.29469 29.05168 30.87070 32.74141 34.65545
colMeans(out)
#[1] 23.79094 24.90281 26.15618 27.53303 29.01668 30.59220
identical(n_distance_mtx, out)
#[1] TRUE
If I unsderstood your question right, you want the Euclidean distance between each vector (row) in a_mtx to the other vectors in b_mtx.
If so, you could use apply twice like this:
result = apply(a_mtx, 1, function(x){ apply(b_mtx, 1, function(y){ test_fun(x,y) })})
This gives a distance matrix:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 20.88061 21.84033 22.97825 24.26932 25.69047 27.22132
[2,] 24.87971 25.57342 26.43861 27.45906 28.61818 29.89983
[3,] 25.61250 27.29469 29.05168 30.87070 32.74141 34.65545
where the row index is the corresponding vector (row) from b_mtx and the column index is the corresponding vector from a_mtx
Finally, obtain the mean distance using:
colMeans(result)
[1] 23.79094 24.90281 26.15618 27.53303 29.01668 30.59220

check column values and print/delete rows satisfying condition based on percent of columns

I have a matrix of values arranged in different columns per row.
What I want my code to do :
Iterate over a row -> check if value of Column < threshold (e.g. 1)
Within the row, if there are say 80% columns satisfying that condition, Keep the row ; else remove the full row.
Code so far :
myfilt <- function(t,x){
if ((length(which(t[x,] > 1)) / 60) >= 0.8){
return(1)
}else{
return(0)
}
}
y=c()
for(i in 1:length(t[,1])){
y = c(y,myfilt(t,i))
}
But when I print t[v,] all the rows have same value :(
Not sure what I am doing wrong. Also if there is a shorter way to do this, let me know.
P.S. : Here 't' is the name of matrix I am testing
Here's a way to do it :
## Parameters
threshold <- 0.8
perc.to.keep <- 0.5
## Example Matrix
set.seed(1337)
m <- matrix(rnorm(25,1,1),nrow=5,ncol=5)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1.7122837 0.8383025 -0.02718379 2.2157099 2.1291008
# [2,] 0.2462742 2.4602621 -0.04117532 -0.6214087 1.4501467
# [3,] 1.0381899 3.0094584 0.12937698 0.9314247 1.0505864
# [4,] 2.1784211 0.9220618 1.85313022 0.9370171 0.8756698
# [5,] 0.8467962 2.3543421 0.37723981 2.0757077 1.9120115
test <- m < threshold
sel <- apply(test,1,function(v) sum(v)/length(v)) < perc
m[sel,]
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1.7122837 0.8383025 -0.02718379 2.2157099 2.1291008
# [2,] 1.0381899 3.0094584 0.12937698 0.9314247 1.0505864
# [3,] 2.1784211 0.9220618 1.85313022 0.9370171 0.8756698
# [4,] 0.8467962 2.3543421 0.37723981 2.0757077 1.9120115

Resources