Padding or shifting a multi-dimensional array - r

How can I simply pad (append/prepend) a slice of NA's to a (say) 3D array along (say) dimension 2?
Suppose the initial array is given as
A <- array(1:8,c(2,2,2))
I initially thought this would work:
cbind(A,NA)
but it results in an 8x2 matrix instead of a 2x3x2 array. I then tried
abind(A,NA,along=2)
but that results in an error.
I'm hoping there is a much simpler solution than
dimSlice <- dim(A)
dimSlice[2] <- 1
abind(A,array(NA,dimSlice),along=2)
Background
This padding happens as part of a "remove slice and pad opposite side" operation that shifts an array by one position along some dimension, filling in with NA elements at the vacated positions. The one-dimensional equivalent would be, for example, c(A[-1],NA) for vector A. If there is a simple way to accomplish such an operation without an explicit padding sub-operation, that would be even better.

Subsetting with NAs results in NAs (?Extract):
v = 1:3; m = matrix(1:4, 2, 2); a = array(1:6, c(2, 2, 2))
v[c(NA, 1)]
#[1] NA 1
m[, c(2, NA)]
# [,1] [,2]
#[1,] 3 NA
#[2,] 4 NA
a[, c(1, 2, NA), ]
#, , 1
#
# [,1] [,2] [,3]
#[1,] 1 3 NA
#[2,] 2 4 NA
#
#, , 2
#
# [,1] [,2] [,3]
#[1,] 5 1 NA
#[2,] 6 2 NA
So, to pad with NAs, we could subset using the appropriate indices. Putting the above into a more general function that shifts an array by "n" positions along dimension "k", dropping "n" slices from one end and filling the other end with NAs (appended or prepended):
pad = function(x, k, n = 1L, append = TRUE)
{
dims = replicate(length(dim(x)), substitute(), simplify = FALSE)
if(append) dims[[k]] = c((n + 1):dim(x)[[k]], rep_len(NA, n))
else dims[[k]] = c(rep_len(NA, n), 1:(dim(x)[[k]] - n))
do.call("[", c(list(x), dims))
}
arr = array(1:24, c(3, 2, 2, 2))
pad(arr, 1, 2, FALSE)
pad(arr, 2)
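Note that pad as written keeps the size of dimension k unchanged: it drops n slices from one end and fills the other end with NAs, which is the shift described in the question's background. If only the padding is wanted (a 2x2x2 array becoming 2x3x2), the same NA-indexing idea can be reused while keeping every existing index. The following is a minimal sketch under that assumption; pad_na is a hypothetical name, not part of the answer above.

```r
# Hypothetical helper: grow dimension k of array x by n slices of NA.
pad_na <- function(x, k, n = 1L, append = TRUE) {
  d <- dim(x)
  idx <- lapply(d, seq_len)        # keep every existing index in each dimension
  na_idx <- rep(NA_integer_, n)    # each NA index produces one slice of NAs
  idx[[k]] <- if (append) c(idx[[k]], na_idx) else c(na_idx, idx[[k]])
  # equivalent to x[idx[[1]], idx[[2]], ..., drop = FALSE]
  do.call(`[`, c(list(x), idx, list(drop = FALSE)))
}

A <- array(1:8, c(2, 2, 2))
padded <- pad_na(A, 2)                       # append one NA slice along dimension 2
dim(padded)
prepended <- pad_na(A, 2, append = FALSE)    # prepend instead
```

Using drop = FALSE keeps the result an array even when some dimension has length one.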

Related

Function that only runs when matrices has NAs in it

I have two matrices, one of which has an NA value. I want to use a function that only runs if there are NAs present in the data, so if I run the function it should only work on df2 and not df1. How would I do this?
df1 <- matrix(1:4, nrow = 2, ncol = 2)
df2 <- matrix(c(1,2,3,NA), nrow = 2, ncol = 2)
Based on the comment above, here is a complete answer (assuming I understand what you are getting at). The function is set up to do something or not to the matrix depending on whether it has NA values.
df1 <- matrix(1:4, nrow = 2, ncol = 2)
df2 <- matrix(c(1,2,3,NA), nrow = 2, ncol = 2)
myfunc <- function(m) {
ret <- m
if (all(!is.na(m))) {
print("This matrix has no NAs")
} else {
print("This matrix has NAs")
}
return(ret)
}
myfunc(df1)
# [1] "This matrix has no NAs"
# [,1] [,2]
# [1,] 1 3
# [2,] 2 4
myfunc(df2)
# [1] "This matrix has NAs"
# [,1] [,2]
# [1,] 1 3
# [2,] 2 NA
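If the goal is to run some other function only when NAs are present, the check can be separated from the action. This is a rough sketch: apply_if_na and na_to_zero are hypothetical names, and replacing NAs with 0 is only an assumed example of "doing something" to the matrix. It uses base R's anyNA, which returns TRUE as soon as one NA is found.

```r
# Hypothetical wrapper: call f on the matrix only if it contains an NA.
apply_if_na <- function(m, f) {
  if (anyNA(m)) f(m) else m
}

# Example action (an assumption for illustration): replace NAs with 0.
na_to_zero <- function(m) {
  m[is.na(m)] <- 0
  m
}

df1 <- matrix(1:4, nrow = 2, ncol = 2)
df2 <- matrix(c(1, 2, 3, NA), nrow = 2, ncol = 2)
res1 <- apply_if_na(df1, na_to_zero)   # no NAs, so df1 comes back unchanged
res2 <- apply_if_na(df2, na_to_zero)   # the NA in df2 is replaced
```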

Create a vector of different sequence repeated with runif and rnorm

I want to do something like :
vector <- c(runif(3),rnorm(1), runif(3), rnorm(1))
I've tried :
vector <- rep( c(runif(3), rnorm(1) ), times = 2)
But the problem is that it's two times the same sequence.
If you can help me please.
Have a nice day
This is exactly what replicate is meant for.
From the help('replicate') page (my emphasis):
replicate is a wrapper for the common use of sapply for repeated evaluation of an expression (which will usually involve random number generation).
set.seed(1234)
vector <- replicate(2, c(runif(3),rnorm(1)))
vector
# [,1] [,2]
#[1,] 0.1137034 0.640310605
#[2,] 0.6222994 0.009495756
#[3,] 0.6092747 0.232550506
#[4,] 0.3143686 0.429124689
Edit
After the explanation in this comment, I believe the following is closer to what the question asks for. Note that each 2x2 matrix has the elements in the previous output in the correct order.
set.seed(1234)
W <- array(dim = c(2, 2, 2))
W[] <- replicate(2, c(runif(3), rnorm(1)))
W
#, , 1
#
# [,1] [,2]
#[1,] 0.1137034 0.6092747
#[2,] 0.6222994 0.3143686
#
#, , 2
#
# [,1] [,2]
#[1,] 0.640310605 0.2325505
#[2,] 0.009495756 0.4291247
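If a plain vector is wanted, as in the question, the matrix returned by replicate can simply be flattened. A small sketch, assuming the 3-1 pattern repeated twice (vec is just an illustrative name): because R stores matrices column by column, as.vector reads off the first draw of c(runif(3), rnorm(1)) and then the second.

```r
set.seed(1234)
# replicate() returns a 4 x 2 matrix, one column per evaluation;
# as.vector() drops the dimensions and reads the columns in order
vec <- as.vector(replicate(2, c(runif(3), rnorm(1))))
length(vec)
```

With the same seed this should match writing the expression out by hand, since the random draws happen in the same order.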
You can do this by declaring the full vector first, then filling the indices for each distribution at once:
out_length = 4L * 2L
# every fourth element will come from rnorm; the rest from runif
norm_idx = seq(4L, out_length, by = 4L)
n_norm = length(norm_idx)
# declare output
out = numeric(out_length)
out[norm_idx] = rnorm(n_norm)
out[-norm_idx] = runif(out_length - n_norm)
Alternatively, here's a tricky way to accomplish this using matrix indexing:
set.seed(394839)
m = matrix(0, nrow = 4L, ncol = 2L)
m[1:3, ] = runif(3L * ncol(m))
m[4L, ] = rnorm(ncol(m))
c(m)
# [1] 0.4478556 0.1336022 0.5860134 -0.1626707 0.7055598 0.7631879 0.3132743 1.5485366
In R, matrices are just vectors with dimensions, and they are filled column by column. Hence we can declare this matrix:
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 3 5 7 9
# [2,] 2 4 6 8 10
like this:
matrix(1:10, nrow = 2L, ncol = 5L)
With that in mind, we can replicate your 3-1-3-1 pattern by making 3-1 be the pattern within each column.
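A quick way to see the column-by-column layout for yourself (a small illustrative check, not part of the solution):

```r
m <- matrix(1:10, nrow = 2L, ncol = 5L)
flat <- c(m)            # dropping the dimensions walks down each column in turn
identical(flat, 1:10)
m[2, 3]                 # row 2 of column 3 ...
m[6]                    # ... is the 6th element of the underlying vector
```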
You can confirm it's working by scaling up (so small sample effects are muted):
nrep = 1e4
set.seed(39893)
m = matrix(0, nrow = 4L, ncol = nrep)
m[1:3, ] = runif(3L * nrep)
m[4L, ] = rnorm(nrep)
out = c(m)
idx = seq(4L, length(out), by = 4L)
plot(density(out[idx]), main = 'Normally distributed')
plot(density(out[-idx]), main = 'Uniformly distributed')

Use of function over all row-pairs of two matrices

If I want to calculate the n-dimensional distance of two vectors, I can use a function such as:
a = c(1:10)
b = seq(20, 23, length.out = length(a))
test_fun =
function(x,y) {
return(
sqrt(
sum(
(x - y) ^ 2
)
)
)
}
n_distance = test_fun(a,b)
Now, I want to expand this to a matrix setting: I want to calculate the n-dimensional distance for each pair of rows of two matrices.
set.seed(123)
a_mtx = matrix(1:30, ncol = 5)
b_mtx = matrix(sample(1:15,15), ncol = 5)
n_distance_mtx =
matrix(
NA,
nrow = nrow(b_mtx),
ncol = nrow(a_mtx)
)
for(i in 1:nrow(b_mtx)) {
for(j in 1:nrow(a_mtx)) {
n_distance_mtx[i,j] =
test_fun(a_mtx[j,], b_mtx[i,])
}
}
Where each column of n_distance_mtx contains the distances between one row of a_mtx and every row of b_mtx (so n_distance_mtx[,1] holds the distances between a_mtx[1,] and b_mtx[1:3,]).
If I calculate column means on n_distance_mtx I can obtain the mean distance between each row in a_mtx and all rows of b_mtx.
colMeans(n_distance_mtx)
#[1] 23.79094 24.90281 26.15618 27.53303 29.01668 30.59220
So 23.79094 is the mean distance between a_mtx[1,] and b_mtx[1:3,], and 24.90281 is the mean distance between a_mtx[2,] and b_mtx[1:3,], and so on.
Question: How can I arrive at the same solution without using for-loops?
I want to apply this method to matrices with much larger dimension (on the order of hundreds of thousands of rows). Looking at this and this, it seems there must be a way to accomplish this with a Vectorized outer function, but I have been unable to generate such a function.
test_fun_vec =
Vectorize(
function(x,y) {
outer(
x,
y,
test_fun
)
}
)
test_fun_vec(a_mtx,b_mtx)
#[1] 4 0 2 7 4 6 3 5 1 5 7 5 10 0 9 11 15 17 8 11 9 12 10 16
#[25] 10 22 20 25 15 24
We can use Vectorize with outer
f1 <- Vectorize(function(i, j) test_fun(a_mtx[j, ], b_mtx[i, ]))
out <- outer(seq_len(nrow(b_mtx)), seq_len(nrow(a_mtx)), FUN = f1)
out
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 20.88061 21.84033 22.97825 24.26932 25.69047 27.22132
#[2,] 24.87971 25.57342 26.43861 27.45906 28.61818 29.89983
#[3,] 25.61250 27.29469 29.05168 30.87070 32.74141 34.65545
colMeans(out)
#[1] 23.79094 24.90281 26.15618 27.53303 29.01668 30.59220
identical(n_distance_mtx, out)
#[1] TRUE
If I understood your question right, you want the Euclidean distance from each vector (row) in a_mtx to every vector (row) in b_mtx.
If so, you could use apply twice like this:
result = apply(a_mtx, 1, function(x){ apply(b_mtx, 1, function(y){ test_fun(x,y) })})
This gives a distance matrix:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 20.88061 21.84033 22.97825 24.26932 25.69047 27.22132
[2,] 24.87971 25.57342 26.43861 27.45906 28.61818 29.89983
[3,] 25.61250 27.29469 29.05168 30.87070 32.74141 34.65545
where the row index is the corresponding vector (row) from b_mtx and the column index is the corresponding vector from a_mtx
Finally, obtain the mean distance using:
colMeans(result)
[1] 23.79094 24.90281 26.15618 27.53303 29.01668 30.59220
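Both Vectorize and nested apply still call test_fun once per pair of rows. For Euclidean distance specifically, the whole matrix can be computed with matrix algebra via the identity |x - y|^2 = |x|^2 + |y|^2 - 2 x.y. The following is a sketch under that assumption; pairwise_dist is a hypothetical name, and it only applies to this particular distance, not to an arbitrary test_fun.

```r
# Hypothetical helper: Euclidean distance between every row of b and every row of a.
pairwise_dist <- function(a, b) {
  # squared distances: |b_i|^2 + |a_j|^2 - 2 * (b_i . a_j)
  sq <- outer(rowSums(b^2), rowSums(a^2), `+`) - 2 * tcrossprod(b, a)
  sqrt(pmax(sq, 0))   # clamp tiny negative values caused by rounding
}

set.seed(123)
a_mtx <- matrix(1:30, ncol = 5)
b_mtx <- matrix(sample(1:15, 15), ncol = 5)
d <- pairwise_dist(a_mtx, b_mtx)   # nrow(b_mtx) x nrow(a_mtx)
colMeans(d)
```

Keep in mind that the result has nrow(b) x nrow(a) entries, so with hundreds of thousands of rows on each side it may not fit in memory; processing a_mtx in blocks of rows and accumulating the column means would be one way around that.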

Outer function R - maintain coordinate subtraction

I have two matrices, call them A (n x 2) and B (q x 2). I'd like to get an n x q x 2 array C, such that C[1,5,] represents the difference between the first row of A and the fifth row of B, taking the subtraction of the first element in the first row of A with the first element in the fifth row of B and the second element similarly subtracted.
I'm trying to perform this function via the outer function, but it also gives me the "non-diagonal" subtractions; i.e. it will also subtract A[1,1] - B[5,2] and A[1,2] - B[5,1] which I am not interested in. Does anyone have a fast, easy way to do this?
Current code
>diffs <- outer(A,B,FUN ='-')
>diffs[1,,5,]
[,1] [,2]
[1,] **-0.3808701** 0.7591052
[2,] 0.2629293 **1.4029046**
I've added the stars to indicate what I actually want.
Thanks for any help in advance
(EDIT)
Here's a simpler case for illustrative purposes
> A <- matrix(1:10, nrow = 5, ncol = 2)
> B <- matrix(4:9, nrow = 3, ncol = 2)
> A
[,1] [,2]
[1,] 1 6
[2,] 2 7
[3,] 3 8
[4,] 4 9
[5,] 5 10
> B
[,1] [,2]
[1,] 4 7
[2,] 5 8
[3,] 6 9
>diffs <- outer(A,B,FUN ='-')
>diffs[1,,3,] == (A[1,] - B[3,])
[,1] [,2]
[1,] TRUE FALSE
[2,] FALSE TRUE
>diffs[1,,3,]
[,1] [,2]
[1,] -5 -8
[2,] 0 -3
Before worrying about the shape of the output I think we should make sure we're getting the correct values.
A <- matrix(1:10, nrow=5, ncol=2)
B <- matrix(4:9, nrow=3, ncol=2)
# long-winded method
dia_long <- c(
c(A[1,] - B[1,]),
c(A[1,] - B[2,]),
c(A[1,] - B[3,]),
c(A[2,] - B[1,]),
c(A[2,] - B[2,]),
c(A[2,] - B[3,]),
c(A[3,] - B[1,]),
c(A[3,] - B[2,]),
c(A[3,] - B[3,]),
c(A[4,] - B[1,]),
c(A[4,] - B[2,]),
c(A[4,] - B[3,]),
c(A[5,] - B[1,]),
c(A[5,] - B[2,]),
c(A[5,] - B[3,]))
# loop method
comb <- expand.grid(1:nrow(A), 1:nrow(B))
dia_loop <- list()
for (i in 1:nrow(comb)) {
dia_loop[[i]] <- A[comb[i, 1], ] - B[comb[i, 2], ]
}
dia_loop <- unlist(dia_loop)
# outer/apply method
dia_outer <- apply(outer(A, B, FUN='-'), c(3, 1), diag)
# they all return the same values
all.identical <- function(l) {
all(sapply(2:length(l), FUN=function(x) identical(l[1], l[x])))
}
all.identical(lapply(list(dia_long, dia_loop, dia_outer), sort))
# TRUE
table(dia_long)
# dia_long
# -5 -4 -3 -2 -1 0 1 2 3
# 1 2 4 5 6 5 4 2 1
Are these the values you are looking for?
My solution: use nested lapply and sapply functions to extract the diagonals. I then needed to do some post-processing (not related to this specific problem), before I then turned it into an array. Should be noted that this is a q x 2 x n array, which turned out to be better for my purposes - this could be permuted with aperm from here though to solve the original question.
A <- matrix(1:10, nrow = 5, ncol = 2)
B <- matrix(4:9, nrow = 3, ncol = 2)
diffs <- outer(A,B, FUN = '-')
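# diffs has dimensions nrow(A) x 2 x nrow(B) x 2, so iterate over the rows of B
diffs <- lapply(X = seq_len(nrow(A)), FUN = function(y){
t(sapply(seq_len(nrow(B)), FUN = function(x) diag(diffs[y,,x,])))})
# each list element is already nrow(B) x 2, so unlist fills the array in order
diffs <- array(unlist(diffs), dim = c(nrow(B),2,nrow(A)))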
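For the n x q x 2 layout asked for in the question, one possible alternative is to avoid the four-dimensional outer product altogether and call outer once per coordinate, so the "non-diagonal" subtractions are never computed. A minimal sketch using the example matrices from the question (the loop runs over the columns only, two iterations here):

```r
A <- matrix(1:10, nrow = 5, ncol = 2)
B <- matrix(4:9, nrow = 3, ncol = 2)

# C[i, j, k] = A[i, k] - B[j, k]
C <- array(NA_real_, dim = c(nrow(A), nrow(B), ncol(A)))
for (k in seq_len(ncol(A))) {
  C[, , k] <- outer(A[, k], B[, k], "-")
}

C[1, 3, ]   # difference between row 1 of A and row 3 of B
```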

Determining Whether a Matrix Has At Least One Zero Element

I'm sure this is trivial - nonetheless, any help would be appreciated.
The problem is simple: given a matrix, I'd like to get TRUE if the matrix in question has at least one element equal to zero. So, checking
A <- matrix(c(1, 2, 3, 4, 5, 0), nrow = 2, ncol = 3, byrow = TRUE)
> A
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 0
would return TRUE, while
B <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)
> B
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
would return FALSE.
Something like
if ( A == 0 ) { cat("\nZero detected")}
gives a warning. Is there a simple way to do this?
The warning is generated because you're presenting a logical vector to if, which expects a single value. any is a function to tell if any of the logical values are TRUE:
any(A==0)
## [1] TRUE
any(B==0)
## [1] FALSE
There's also a function all which determines if all of the values in a logical vector are TRUE.
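Two details are worth keeping in mind here. In R 4.2.0 and later, passing a condition of length greater than one to if is an error rather than a warning. Also, if the matrix may contain NAs, any can return NA instead of FALSE, which if will not accept. A small sketch; B_na is a made-up matrix for illustration, not the B from the question:

```r
A <- matrix(c(1, 2, 3, 4, 5, 0), nrow = 2, ncol = 3, byrow = TRUE)
B_na <- matrix(c(1, NA, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)

# any() collapses the logical matrix to a single value that if can use
if (any(A == 0)) cat("\nZero detected")

plain <- any(B_na == 0)                  # NA: no zero found, but one value is unknown
strict <- any(B_na == 0, na.rm = TRUE)   # FALSE: NAs are ignored
```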
Try
0 %in% A
It should return TRUE or FALSE. It works for NA too:
x = matrix(1:24, ncol = 4)
x[3, 3] = NA
NA %in% x
#TRUE
