I'm working with a three-dimensional array in R that has dimensions determined by user arguments, and where the first dimension can be of length 1 or more.
Subsetting the array works fine if the first dimension is of length two or more:
Arr2 <- array(rnorm(2 * 4 * 7), dim = c(2, 4, 7))
Arr2[,,1]
But if the first dimension is of length one, the subset operator will return either a vector (if drop = TRUE) or a three-dimensional array (if drop = FALSE):
Arrrrgh <- array(rnorm(1 * 4 * 7), dim = c(1,4,7))
Arrrrgh[,,1]
Arrrrgh[,,1,drop=FALSE]
How can I subset this array along the third dimension, while preserving the first and second dimensions?
As you note, from ?"[" there are only two options to control the dimension, drop=TRUE (the default, which in this case will drop both the first and third dimensions) and drop=FALSE, which won't drop any dimension. Neither of these options returns the desired dimension of c(1, 4):
dim(Arrrrgh[,,1])
# NULL
dim(Arrrrgh[,,1,drop=FALSE])
# [1] 1 4 1
One way to address this would be to set the dimension yourself after the subsetting operation:
`dim<-`(Arrrrgh[,,1], dim(Arrrrgh)[1:2])
# [,1] [,2] [,3] [,4]
# [1,] 0.1548771 0.6833689 -0.7507798 1.271966
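The same idea can be wrapped in a small helper. This is only a minimal sketch, assuming a three-dimensional array and a single index k along the third dimension; the name slice3 is made up for illustration and is not part of base R. Because `dim<-` discards dimnames, the sketch copies the first two back explicitly.

```r
# Hypothetical helper: take slice k along the third dimension and always
# return a matrix with the first two dimensions intact.
slice3 <- function(arr, k) {
  stopifnot(length(dim(arr)) == 3, length(k) == 1)
  out <- arr[, , k]                    # may collapse to a plain vector
  dim(out) <- dim(arr)[1:2]            # restore rows x columns
  dimnames(out) <- dimnames(arr)[1:2]  # NULL stays NULL
  out
}

Arr2 <- array(rnorm(2 * 4 * 7), dim = c(2, 4, 7))
Arrrrgh <- array(rnorm(1 * 4 * 7), dim = c(1, 4, 7))
dim(slice3(Arr2, 1))     # first two dimensions of Arr2
dim(slice3(Arrrrgh, 1))  # first two dimensions of Arrrrgh, length-1 row kept
```

The helper behaves the same whether the first dimension has length one or more, which is the case the question is about.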
You could generalize this to a function that drops specified indices if they have a single value passed and doesn't drop any other indices:
extract.arr <- function(arr, ...) {
m <- match.call(expand.dots=FALSE)
missing <- sapply(m[["..."]], is.symbol)
dot.len <- sapply(m[["..."]], function(x) if (is.symbol(x)) 0 else length(eval(x)))
cdim <- dim(arr)
eff.dim <- ifelse(missing, cdim, dot.len)
`dim<-`(do.call("[", c(list(arr), m[["..."]])), eff.dim[eff.dim > 1 | missing])
}
extract.arr(Arrrrgh, ,,1)
# [,1] [,2] [,3] [,4]
# [1,] -0.8634659 1.031382 0.4290036 0.8359372
extract.arr(Arrrrgh, ,,1:2)
# , , 1
#
# [,1] [,2] [,3] [,4]
# [1,] -0.8634659 1.031382 0.4290036 0.8359372
#
# , , 2
#
# [,1] [,2] [,3] [,4]
# [1,] 0.6970842 0.1185803 0.3768951 -0.4577554
extract.arr(Arrrrgh, 1,1,)
# [1] -0.8634659 0.6970842 0.1580495 -1.6606119 -0.2749313 0.4810924 -1.1139392
I have created an example below, where I am trying to make a list of each row of a matrix, then use apply().
mat<-matrix(rexp(9, rate=.1), ncol=3)
my_list2 <- list()
for(i in 1:nrow(mat)) {
my_list2[[i]] <- mat[i,]
}
#DO NOT CHANGE THIS:
apply(my_list2[[i]],2,sum)
However the apply() function does not work, giving a dimension error. I understand that apply() is not the best function to use here but it is present in a function that I need so I cannot change that line.
Does anyone have any idea how I can change my "my_list2" to work better? Thank you!
Edit:
Here is an example that works (non reproducible)
[screenshot: Example]
Note both the example above and this example have type "list"
This answer addresses "how to properly get a list of matrices", not how to resolve the use of apply.
By default in R, when you subset a matrix to a single column or a single row, it reduces the dimensionality. For instance,
mtx <- matrix(1:6, nrow = 2)
mtx
# [,1] [,2] [,3]
# [1,] 1 3 5
# [2,] 2 4 6
mtx[1,]
# [1] 1 3 5
mtx[,3]
# [1] 5 6
If you want a single row or column but to otherwise retain dimensionality, add the drop=FALSE argument to the [-subsetting:
mtx[1,,drop=FALSE]
# [,1] [,2] [,3]
# [1,] 1 3 5
mtx[,3,drop=FALSE]
# [,1]
# [1,] 5
# [2,] 6
In this way, your code to produce sample data can be adjusted to be:
set.seed(42) # important for reproducibility in questions on SO
mat<-matrix(rexp(9, rate=.1), ncol=3)
my_list2 <- list()
for(i in 1:nrow(mat)) {
my_list2[[i]] <- mat[i,,drop=FALSE]
}
my_list2
# [[1]]
# [,1] [,2] [,3]
# [1,] 1.983368 0.381919 3.139846
# [[2]]
# [,1] [,2] [,3]
# [1,] 6.608953 4.731766 4.101296
# [[3]]
# [,1] [,2] [,3]
# [1,] 2.83491 14.63627 11.91598
And then you can use akrun's most recent code to resolve how to get the column-wise sums (MARGIN = 2) within each list element, i.e., one of
lapply(my_list2, apply, 2, sum)
lapply(my_list2, function(z) apply(z, 2, sum))
lapply(my_list2, \(z) apply(z, 2, sum)) # R-4.1 or later
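As a side note, the loop that builds the list can be collapsed into one lapply call. This is a sketch under the same assumptions as the example data above (a numeric matrix called mat), not a change to the answer's approach:

```r
set.seed(42)
mat <- matrix(rexp(9, rate = 0.1), ncol = 3)

# One 1 x 3 matrix per row of mat; drop = FALSE keeps the row dimension.
my_list2 <- lapply(seq_len(nrow(mat)), function(i) mat[i, , drop = FALSE])

sapply(my_list2, dim)         # one column of dimensions per list element
apply(my_list2[[1]], 2, sum)  # works because the element is a matrix
```

Using seq_len(nrow(mat)) instead of 1:nrow(mat) also avoids iterating over c(1, 0) if the matrix happens to have zero rows.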
In your screenshot it works because the object part of the list ex[[1]] is an array. And in your example the elements of your list are vectors. You could try the following:
mat<-matrix(rexp(9, rate=.1), ncol=3)
my_list2 <- list()
for(i in 1:nrow(mat)) {
my_list2[[i]] <- as.matrix(mat[i,])
}
#DO NOT CHANGE THIS:
apply(my_list2[[1]],2,sum)
apply(my_list2[[2]],2,sum)
apply(my_list2[[3]],2,sum)
You should note that a single apply call cannot process all three elements of the list at once. To do it in one line, that line would have to be changed.
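The two fixes in this thread do not produce the same shape, which matters for what apply(..., 2, sum) returns. The following sketch uses made-up integer values so the difference is easy to see; col_form and row_form are illustrative names only.

```r
mat <- matrix(1:9, ncol = 3)

col_form <- as.matrix(mat[1, ])     # vector -> single-column matrix (3 x 1)
row_form <- mat[1, , drop = FALSE]  # single-row matrix (1 x 3)

dim(col_form)
dim(row_form)

apply(col_form, 2, sum)  # one value: the total of row 1
apply(row_form, 2, sum)  # three values: the entries of row 1 themselves
```

Which form is right depends on what the unchangeable apply line is expected to return.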
I have a list of matrices (size n*n), and I need to create a new matrix giving the minimum value observed for each cell, based on my list.
For instance, with the following matrices list:
> a = list(matrix(rexp(9), 3), matrix(rexp(9), 3), matrix(rexp(9), 3))
> a
[[1]]
[,1] [,2] [,3]
[1,] 0.5220069 0.39643016 0.04255687
[2,] 0.4464044 0.66029350 0.34116609
[3,] 2.2495949 0.01705576 0.08861866
[[2]]
[,1] [,2] [,3]
[1,] 0.3823704 0.271399 0.7388449
[2,] 0.1227819 1.160775 1.2131681
[3,] 0.1914548 1.004209 0.7628437
[[3]]
[,1] [,2] [,3]
[1,] 0.2125612 0.45379057 1.5987420
[2,] 0.3242311 0.02736743 0.4372894
[3,] 0.6634098 1.15401347 0.9008529
The output should be:
[,1] [,2] [,3]
[1,] 0.2125612 0.271399 0.04255687
[2,] 0.1227819 0.02736743 0.34116609
[3,] 0.1914548 0.01705576 0.08861866
I tried using an apply loop with the following code (using melt and dcast from the reshape2 library):
library(reshape2)
all = melt(a)
allComps = unique(all[,c(1:2)])
allComps$min=apply(allComps, 1, function(x){
g1 = x[1]
g2 = x[2]
b = unlist(lapply(a, function(y){
return(y[g1,g2])
}))
return(b[which(b==min(b))])
})
dcast(allComps, Var1~Var2)
It works but it is taking a very long time to run when applied on large matrices (6000*6000). I am looking for a faster way to do this.
Use Reduce with pmin:
Reduce(pmin, a)
# [,1] [,2] [,3]
#[1,] 0.02915345 0.03157736 0.3142273
#[2,] 0.57661027 0.05621098 0.1452668
#[3,] 0.48021473 0.18828404 0.4787604
data
set.seed(123)
a = list(matrix(rexp(9), 3), matrix(rexp(9), 3), matrix(rexp(9), 3))
Maybe it should be considered to store the matrices in an array instead of a list. This can be done with simplify2array. In an array the minimum over specific dimensions can be found using min in apply.
A <- simplify2array(a)
apply(A, 1:2, min)
We can use
apply(array(unlist(a), c(3, 3, 3)), 1:2, min)
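To check that the approaches in this thread agree, they can be run side by side on the same data. This is a small sketch; the do.call(pmin, a) variant is an extra alternative not mentioned in the answers, and the m_* names are illustrative.

```r
set.seed(123)
a <- list(matrix(rexp(9), 3), matrix(rexp(9), 3), matrix(rexp(9), 3))

m_reduce <- Reduce(pmin, a)                     # pairwise, one matrix at a time
m_docall <- do.call(pmin, a)                    # all matrices in a single pmin call
m_apply  <- apply(simplify2array(a), 1:2, min)  # stack into an array first

all.equal(m_reduce, m_docall)
all.equal(m_reduce, m_apply)
```

For large matrices the pmin-based versions are likely preferable, since apply calls min once per cell, but that is worth timing on your own data.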
I want to compute the cumulative sum over the first (n-1) columns (for a matrix with n columns) and then average the values. I created a sample matrix for this task:
ma = matrix(c(1:10), nrow = 2, ncol = 5)
ma
[,1] [,2] [,3] [,4] [,5]
[1,] 1 3 5 7 9
[2,] 2 4 6 8 10
I wanted to find the following
ans = matrix(c(1,2,2,3,3,4,4,5), nrow = 2, ncol = 4)
ans
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 2 3 4 5
The following is my R function.
ColCumSumsAve <- function(y){
for(i in seq_len(dim(y)[2]-1)) {
y[,i] <- cumsum(y[,i])/i
}
}
ColCumSumsAve(ma)
However, when I run the above function it's not producing any output. Are there any mistakes in the code?
Thanks.
There were several mistakes.
Solution
This is what I tested and what works:
colCumSumAve <- function(m) {
csum <- t(apply(X=m, MARGIN=1, FUN=cumsum))
res <- t(Reduce(`/`, list(t(csum), 1:ncol(m))))
res[, 1:(ncol(m)-1)]
}
Test it with:
> colCumSumAve(ma)
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 2 3 4 5
which is correct.
Explanation:
colCumSumAve <- function(m) {
csum <- t(apply(X=m, MARGIN=1, FUN=cumsum)) # calculate row-wise cumulative sums
res <- t(Reduce(`/`, list(t(csum), 1:ncol(m))))
# This is the trickiest part.
# Because `csum` is a matrix, it is treated like a vector (column by column)
# when `Reduce`-ing using `/` with the vector `1:ncol(m)`.
# To get quasi-row-wise treatment, I change the orientation
# of the matrix with `t()`.
# However, the output is then in this transposed orientation
# as a consequence. So I transpose back by applying `t()`
# to the entire result at the end, to recover the original
# input matrix orientation.
# `Reduce` using `/` over the list of `t(csum)` and
# `1:ncol(m)` has the effect of dividing the `csum` values by their
# corresponding column position.
res[, 1:(ncol(m)-1)] # removes last column for the answer.
# This could of course be done right at the beginning,
# saving the calculation of values in the last column,
# but that calculation is not the speed-limiting step
# (since it is vectorized);
# rather, `apply` and `Reduce` will be speed-limiting.
}
Well, okay, I could then do:
colCumSumAve <- function(m) {
csum <- t(apply(X=m[, 1:(ncol(m)-1)], MARGIN=1, FUN=cumsum))
# csum has one column fewer than m, so divide by 1:ncol(csum), not 1:ncol(m)
t(Reduce(`/`, list(t(csum), 1:ncol(csum))))
}
or:
colCumSumAve <- function(m) {
m <- m[, 1:(ncol(m)-1)] # remove last column
csum <- t(apply(X=m, MARGIN=1, FUN=cumsum))
t(Reduce(`/`, list(t(csum), 1:ncol(m))))
}
This is actually the more optimized solution.
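The division step can also be written with sweep(), which states the intent (divide column j by j) directly instead of relying on recycling over a transposed matrix. This is a sketch of an alternative, not the code from the answer above; the name colCumSumAve2 is made up, and it assumes m is a numeric matrix with at least three columns.

```r
colCumSumAve2 <- function(m) {
  x <- m[, -ncol(m), drop = FALSE]          # drop the last column
  csum <- t(apply(x, 1, cumsum))            # row-wise cumulative sums
  sweep(csum, 2, seq_len(ncol(csum)), "/")  # divide column j by j
}

ma <- matrix(1:10, nrow = 2, ncol = 5)
colCumSumAve2(ma)
```

sweep() checks that the length of the divisor matches the swept dimension and warns otherwise, which catches a divisor of the wrong length instead of recycling it silently.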
Original Function
Your original function makes only assignments in the for-loop and doesn't return anything.
So I copied first your input into a res, processed it with your for-loop and then returned res.
ColCumSumsAve <- function(y){
res <- y
for(i in seq_len(dim(y)[2]-1)) {
res[,i] <- cumsum(y[,i])/i
}
res
}
However, this gives:
> ColCumSumsAve(ma)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1.5 1.666667 1.75 9
[2,] 3 3.5 3.666667 3.75 10
The problem is that cumsum(y[,i]) runs down column i instead of along each row, so the sums accumulate in the column direction rather than row-wise.
Corrected Original Function
After some fiddling, I realized the correct solution is:
ColCumSumsAve <- function(y){
res <- matrix(NA, nrow(y), ncol(y)-1)
# create empty matrix with the dimensions of y minus last column
for (i in 1:(nrow(y))) { # go through rows
for (j in 1:(ncol(y)-1)) { # go through columns
res[i, j] <- sum(y[i, 1:j])/j # for each position do this
}
}
res # return `res`ult by calling it at the end!
}
with the testing:
> ColCumSumsAve(ma)
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 2 3 4 5
Note: dim(y)[2] is ncol(y), and dim(y)[1] is nrow(y).
Instead of seq_len(), 1: is shorter and I guess even slightly faster, though seq_len() is safer when the count can be zero.
Note: My solution given first will be faster, since it uses apply, vectorized cumsum and Reduce; for-loops in R are slower.
Late note: I am not so sure that the first solution is faster. Since R-3.x it seems that for-loops are faster. Reduce will be the speed-limiting function and can sometimes be incredibly slow.
k <- t(apply(ma,1,cumsum))[,-ncol(ma)]
for (i in 1:ncol(k)){
k[,i] <- k[,i]/i
}
k
This should work.
All you need is rowMeans:
nc <- 4
cbind(ma[,1],sapply(2:nc,function(x) rowMeans(ma[,1:x])))
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 2 3 4 5
Here's how I did it
> t(apply(ma, 1, function(x) cumsum(x) / 1:length(x)))[,-NCOL(ma)]
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 2 3 4 5
This applies the cumsum function row-wise to the matrix ma and then divides by the correct length to get the average (cumsum(x) and 1:length(x) will have the same length). Then simply transpose with t and remove the last column with [,-NCOL(ma)].
The reason why there is no output from your function is because you aren't returning anything. You should end the function with return(y) or simply y as Marius suggested. Regardless, your function doesn't seem to give you the correct response anyway.
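The missing return value is easy to reproduce in isolation. In the sketch below (function names are made up), the first function ends with a for loop, which evaluates to NULL invisibly, while the second ends with the modified object.

```r
no_return <- function(y) {
  for (i in seq_along(y)) y[i] <- y[i] * 2  # changes only the local copy of y
}                                            # the loop itself evaluates to NULL

with_return <- function(y) {
  for (i in seq_along(y)) y[i] <- y[i] * 2
  y                                          # last expression is the return value
}

is.null(no_return(1:3))
with_return(1:3)
```

Because the NULL is returned invisibly, calling no_return() at the console prints nothing at all, which matches what the question describes.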
If I want to calculate the n-dimensional distance of two vectors, I can use a function such as:
a = c(1:10)
b = seq(20, 23, length.out = length(a))
test_fun =
function(x,y) {
return(
sqrt(
sum(
(x - y) ^ 2
)
)
)
}
n_distance = test_fun(a,b)
Now, I want to expand this to a matrix setting: I want to calculate the n-dimensional distance for each pair of rows of two matrices.
set.seed(123)
a_mtx = matrix(1:30, ncol = 5)
b_mtx = matrix(sample(1:15,15), ncol = 5)
n_distance_mtx =
matrix(
NA,
nrow = nrow(b_mtx),
ncol = nrow(a_mtx)
)
for(i in 1:nrow(b_mtx)) {
for(j in 1:nrow(a_mtx)) {
n_distance_mtx[i,j] =
test_fun(a_mtx[j,], b_mtx[i,])
}
}
Where each column of n_distance_mtx contains the distances between one row of a_mtx and every row of b_mtx (so n_distance_mtx[,1] holds the distances between a_mtx[1,] and b_mtx[1:3,]).
If I calculate column means on n_distance_mtx I can obtain the mean distance between each row in a_mtx and all rows of b_mtx.
colMeans(n_distance_mtx)
#[1] 23.79094 24.90281 26.15618 27.53303 29.01668 30.59220
So 23.79094 is the mean distance between a_mtx[1,] and b_mtx[1:3,], and 24.90281 is the mean distance between a_mtx[2,] and b_mtx[1:3,], and so on.
Question: How can I arrive at the same solution without using for-loops?
I want to apply this method to matrices with much larger dimension (on the order of hundreds of thousands of rows). Looking at this and this, it seems there must be a way to accomplish this with a Vectorized outer function, but I have been unable to generate such a function.
test_fun_vec =
Vectorize(
function(x,y) {
outer(
x,
y,
test_fun
)
}
)
test_fun_vec(a_mtx,b_mtx)
#[1] 4 0 2 7 4 6 3 5 1 5 7 5 10 0 9 11 15 17 8 11 9 12 10 16
#[25] 10 22 20 25 15 24
We can use Vectorize with outer
f1 <- Vectorize(function(i, j) test_fun(a_mtx[j, ], b_mtx[i, ]))
out <- outer(seq_len(nrow(b_mtx)), seq_len(nrow(a_mtx)), FUN = f1)
out
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 20.88061 21.84033 22.97825 24.26932 25.69047 27.22132
#[2,] 24.87971 25.57342 26.43861 27.45906 28.61818 29.89983
#[3,] 25.61250 27.29469 29.05168 30.87070 32.74141 34.65545
colMeans(out)
#[1] 23.79094 24.90281 26.15618 27.53303 29.01668 30.59220
identical(n_distance_mtx, out)
#[1] TRUE
If I understood your question right, you want the Euclidean distance between each vector (row) in a_mtx and each of the vectors in b_mtx.
If so, you could use apply twice like this:
result = apply(a_mtx, 1, function(x){ apply(b_mtx, 1, function(y){ test_fun(x,y) })})
This gives a distance matrix:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 20.88061 21.84033 22.97825 24.26932 25.69047 27.22132
[2,] 24.87971 25.57342 26.43861 27.45906 28.61818 29.89983
[3,] 25.61250 27.29469 29.05168 30.87070 32.74141 34.65545
where the row index is the corresponding vector (row) from b_mtx and the column index is the corresponding vector from a_mtx
Finally, obtain the mean distance using:
colMeans(result)
[1] 23.79094 24.90281 26.15618 27.53303 29.01668 30.59220
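Both answers above still call test_fun once per pair of rows. For Euclidean distance specifically, the whole matrix can be computed with matrix algebra, using the identity ||b - a||^2 = ||b||^2 + ||a||^2 - 2 b.a. This is a sketch of that alternative, not something proposed in the thread; sq and dist_mtx are illustrative names.

```r
test_fun <- function(x, y) sqrt(sum((x - y)^2))

set.seed(123)
a_mtx <- matrix(1:30, ncol = 5)
b_mtx <- matrix(sample(1:15, 15), ncol = 5)

# Squared distances for every (row of b_mtx, row of a_mtx) pair at once.
sq <- outer(rowSums(b_mtx^2), rowSums(a_mtx^2), "+") -
  2 * tcrossprod(b_mtx, a_mtx)
dist_mtx <- sqrt(pmax(sq, 0))  # clamp tiny negatives caused by rounding

colMeans(dist_mtx)
```

With hundreds of thousands of rows in both matrices the full distance matrix may not fit in memory, so in that case it would probably be necessary to process a_mtx in chunks of rows and accumulate the column means.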
I have a matrix of values arranged in different columns per row.
What I want my code to do :
Iterate over a row -> check if value of Column < threshold (e.g. 1)
Within the row, if there are say 80% columns satisfying that condition, Keep the row ; else remove the full row.
Code so far :
myfilt <- function(t,x){
if ((length(which(t[x,] > 1)) / 60) >= 0.8){
return(1)
}else{
return(0)
}
}
y=c()
for(i in 1:length(t[,1])){
y = c(y,myfilt(t,i))
}
But when I print t[v,] all the rows have same value :(
Not sure what I am doing wrong. Also if there is a shorter way to do this, let me know.
P.S. : Here 't' is the name of matrix I am testing
Here's a way to do it:
## Parameters
threshold <- 0.8
perc.to.keep <- 0.5
## Example Matrix
set.seed(1337)
m <- matrix(rnorm(25,1,1),nrow=5,ncol=5)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1.7122837 0.8383025 -0.02718379 2.2157099 2.1291008
# [2,] 0.2462742 2.4602621 -0.04117532 -0.6214087 1.4501467
# [3,] 1.0381899 3.0094584 0.12937698 0.9314247 1.0505864
# [4,] 2.1784211 0.9220618 1.85313022 0.9370171 0.8756698
# [5,] 0.8467962 2.3543421 0.37723981 2.0757077 1.9120115
test <- m < threshold
sel <- apply(test,1,function(v) sum(v)/length(v)) < perc.to.keep
m[sel,]
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1.7122837 0.8383025 -0.02718379 2.2157099 2.1291008
# [2,] 1.0381899 3.0094584 0.12937698 0.9314247 1.0505864
# [3,] 2.1784211 0.9220618 1.85313022 0.9370171 0.8756698
# [4,] 0.8467962 2.3543421 0.37723981 2.0757077 1.9120115
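The per-row fraction can also be computed with rowMeans, since the mean of a logical vector is the share of TRUE values. The sketch below reuses the parameter names from the answer; frac.below is an added illustrative name.

```r
threshold <- 0.8
perc.to.keep <- 0.5

set.seed(1337)
m <- matrix(rnorm(25, 1, 1), nrow = 5, ncol = 5)

frac.below <- rowMeans(m < threshold)  # share of entries below threshold, per row
sel <- frac.below < perc.to.keep       # TRUE for rows to keep
m[sel, , drop = FALSE]                 # stays a matrix even if only one row survives
```

To implement the rule from the question instead (keep a row when at least 80% of its values are below the threshold), the comparison would become frac.below >= 0.8.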