Function that only runs when a matrix has NAs in it - r

I have two matrices, one of which contains an NA value. I want a function that only runs if NAs are present in the data, so it should work on df2 but not on df1. How would I do this?
df1 <- matrix(1:4, nrow = 2, ncol = 2)
df2 <- matrix(c(1, 2, 3, NA), nrow = 2, ncol = 2)

Based on the comment above, here is a complete answer (assuming I understand what you are getting at). The function checks whether the matrix contains any NA values and branches accordingly. Here each branch only prints a message, and the matrix is returned unchanged.
df1 <- matrix(1:4, nrow = 2, ncol = 2)
df2 <- matrix(c(1,2,3,NA), nrow = 2, ncol = 2)
myfunc <- function(m) {
  ret <- m
  if (all(!is.na(m))) {
    print("This matrix has no NAs")
  } else {
    print("This matrix has NAs")
  }
  return(ret)
}
myfunc(df1)
# [1] "This matrix has no NAs"
# [,1] [,2]
# [1,] 1 3
# [2,] 2 4
myfunc(df2)
# [1] "This matrix has NAs"
# [,1] [,2]
# [1,] 1 3
# [2,] 2 NA
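If the goal is for the function to do real work only when NAs are present, one option is to return early otherwise. A minimal sketch, assuming the "work" is replacing each NA with a fill value (fill_if_na and its fill argument are hypothetical names, not from the question):

```r
df1 <- matrix(1:4, nrow = 2, ncol = 2)
df2 <- matrix(c(1, 2, 3, NA), nrow = 2, ncol = 2)

# Hypothetical helper: only touches matrices that contain at least one NA.
fill_if_na <- function(m, fill = 0) {
  if (!anyNA(m)) {
    return(m)          # no NAs: return the input untouched
  }
  m[is.na(m)] <- fill  # assumed "work": replace each NA with `fill`
  m
}

fill_if_na(df1)  # unchanged, df1 has no NAs
fill_if_na(df2)  # the NA in position [2, 2] becomes 0
```

anyNA(m) says the same thing as !all(!is.na(m)) but can stop at the first NA it finds.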

Related

Create adjacency matrix from a path given as a vector of nodes in base R

Is there a compact and elegant way to create an adjacency matrix given a vector of the form shown (see code excerpt) in base R?
I give my best attempt below. Ideally, I would like to create the already-populated adjacency matrix in a single step as opposed to having to create the matrix data structure then fill it.
p <- 25 # Cardinality of vertex set; Number of nodes
hypothesis_path <- c(17, 7, 6, 1) # path in this form
path_to_D <- function(hypothesis_path, p) {
  path_len <- length(hypothesis_path) - 1
  idx_path <- t(sapply(1:path_len, function(i) hypothesis_path[i:(i+1)]))
  D <- matrix(0, p, p); D[idx_path] <- 1
  D
}
D <- path_to_D(hypothesis_path, p)
which(D == 1, arr.ind = TRUE)
# Desired indices of adjacency matrix are populated (with ones)
# row col
# [1,] 6 1
# [2,] 7 6
# [3,] 17 7
Acceptable answers will avoid use of igraph or similar and will use the path vector in the form given. That said, advice and alternatives are of course always welcomed and appreciated.
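You can use a sparse matrix from the Matrix package. It is not base R, but it is a recommended package that ships with standard R installations. Note that without a dims argument the result is only as large as the largest indices supplied (17 x 7 here), so pass dims = c(p, p) if a full p x p matrix is needed.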
library(Matrix)
hypothesis_path <- c(17, 7, 6, 1)
D <- sparseMatrix(i = hypothesis_path[-length(hypothesis_path)],
j = hypothesis_path[-1])
which(D == 1, arr.ind = TRUE)
row col
[1,] 6 1
[2,] 7 6
[3,] 17 7
You can use the powerful but little-known trick of matrix-based indexing:
index_mat <- rbind(
c(1, 2),
c(2, 3),
c(3, 1)
)
mat <- matrix(FALSE, 3, 3)
mat[index_mat] <- TRUE
mat
[,1] [,2] [,3]
[1,] FALSE TRUE FALSE
[2,] FALSE FALSE TRUE
[3,] TRUE FALSE FALSE
So do this:
path_to_D <- function (path, p) {
  indices <- cbind(path[-length(path)], path[-1])
  D <- matrix(0, p, p)
  D[indices] <- 1
  D
}
D <- path_to_D(hypothesis_path, 25)
which(D == 1, arr.ind=TRUE)
row col
[1,] 6 1
[2,] 7 6
[3,] 17 7
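Since the question asks for a single step, the same matrix-indexing idea can be folded into one expression with base R's replace(), which returns a modified copy of its first argument. A sketch under the question's setup (p = 25 and the same path vector):

```r
p <- 25
hypothesis_path <- c(17, 7, 6, 1)

# replace(x, list, values) performs x[list] <- values and returns x,
# so a two-column index matrix works here just as in D[indices] <- 1.
D <- replace(
  matrix(0, p, p),
  cbind(hypothesis_path[-length(hypothesis_path)], hypothesis_path[-1]),
  1
)

which(D == 1, arr.ind = TRUE)
```

The zero matrix is still created internally, but no intermediate variable is left behind.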

Outer function R - maintain coordinate subtraction

I have two matrices, call them A (n x 2) and B (q x 2). I'd like to get an n x q x 2 array C, such that C[1,5,] represents the difference between the first row of A and the fifth row of B, taking the subtraction of the first element in the first row of A with the first element in the fifth row of B and the second element similarly subtracted.
I'm trying to perform this function via the outer function, but it also gives me the "non-diagonal" subtractions; i.e. it will also subtract A[1,1] - B[5,2] and A[1,2] - B[5,1] which I am not interested in. Does anyone have a fast, easy way to do this?
Current code
>diffs <- outer(A,B,FUN ='-')
>diffs[1,,5,]
[,1] [,2]
[1,] **-0.3808701** 0.7591052
[2,] 0.2629293 **1.4029046**
I've added the stars to indicate what I actually want.
Thanks for any help in advance
(EDIT)
Here's a simpler case for illustrative purposes
> A <- matrix(1:10, nrow = 5, ncol = 2)
> B <- matrix(4:9, nrow = 3, ncol = 2)
> A
[,1] [,2]
[1,] 1 6
[2,] 2 7
[3,] 3 8
[4,] 4 9
[5,] 5 10
> B
[,1] [,2]
[1,] 4 7
[2,] 5 8
[3,] 6 9
>diffs <- outer(A,B,FUN ='-')
>diffs[1,,3,] == (A[1,] - B[3,])
[,1] [,2]
[1,] TRUE FALSE
[2,] FALSE TRUE
>diffs[1,,3,]
[,1] [,2]
[1,] -5 -8
[2,] 0 -3
Before worrying about the shape of the output I think we should make sure we're getting the correct values.
A <- matrix(1:10, nrow=5, ncol=2)
B <- matrix(4:9, nrow=3, ncol=2)
# long-winded method
dia_long <- c(
c(A[1,] - B[1,]),
c(A[1,] - B[2,]),
c(A[1,] - B[3,]),
c(A[2,] - B[1,]),
c(A[2,] - B[2,]),
c(A[2,] - B[3,]),
c(A[3,] - B[1,]),
c(A[3,] - B[2,]),
c(A[3,] - B[3,]),
c(A[4,] - B[1,]),
c(A[4,] - B[2,]),
c(A[4,] - B[3,]),
c(A[5,] - B[1,]),
c(A[5,] - B[2,]),
c(A[5,] - B[3,]))
# loop method
comb <- expand.grid(1:nrow(A), 1:nrow(B))
dia_loop <- list()
for (i in 1:nrow(comb)) {
  dia_loop[[i]] <- A[comb[i, 1], ] - B[comb[i, 2], ]
}
dia_loop <- unlist(dia_loop)
# outer/apply method
dia_outer <- apply(outer(A, B, FUN='-'), c(3, 1), diag)
# they all return the same values
all.identical <- function(l) {
  all(sapply(2:length(l), FUN=function(x) identical(l[1], l[x])))
}
all.identical(lapply(list(dia_long, dia_loop, dia_outer), sort))
# TRUE
table(dia_long)
# dia_long
# -5 -4 -3 -2 -1 0 1 2 3
# 1 2 4 5 6 5 4 2 1
Are these the values you are looking for?
My solution: use nested lapply and sapply calls to extract the diagonals, then turn the result into an array. (In my real code I also did some post-processing in between, which is not related to this specific problem.) Note that this produces a q x 2 x n array, where diffs[b, , a] is A[a, ] - B[b, ]. That layout turned out to be better for my purposes, but it can be permuted with aperm to get the n x q x 2 array from the original question.
A <- matrix(1:10, nrow = 5, ncol = 2)
B <- matrix(4:9, nrow = 3, ncol = 2)
diffs <- outer(A, B, FUN = '-')
# for each row y of A, build a q x 2 matrix with one row per row x of B
diffs <- lapply(X = 1:nrow(A), FUN = function(y) {
  t(sapply(1:nrow(B), FUN = function(x) diag(diffs[y, , x, ])))})
diffs <- array(unlist(diffs), dim = c(nrow(B), 2, nrow(A)))
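The n x q x 2 array from the original question can also be built without computing the unwanted cross terms at all. A sketch, assuming A and B have the same number of columns: repeat the rows of A and B so that every (row of A, row of B) pair lines up, subtract, and reshape.

```r
A <- matrix(1:10, nrow = 5, ncol = 2)
B <- matrix(4:9, nrow = 3, ncol = 2)
n <- nrow(A)
q <- nrow(B)

# Row r = a + n * (b - 1) of each expanded matrix holds A[a, ] and B[b, ].
A_rep <- A[rep(seq_len(n), times = q), , drop = FALSE]
B_rep <- B[rep(seq_len(q), each = n), , drop = FALSE]

# The (n * q) x 2 difference matrix is stored column-major, which is
# exactly the memory layout of an n x q x 2 array.
C <- array(A_rep - B_rep, dim = c(n, q, ncol(A)))

C[1, 3, ]  # same as A[1, ] - B[3, ]
```

This does n * q * 2 subtractions instead of the n * 2 * q * 2 that outer(A, B, '-') performs, and needs no diag() extraction afterwards.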

Padding or shifting a multi-dimensional array

How can I simply pad (append/prepend) a slice of NA's to a (say) 3D array along (say) dimension 2?
Suppose the initial array is given as
A <- array(1:8,c(2,2,2))
I initially thought this would work:
cbind(A,NA)
but it results in an 8x2 matrix instead of a 2x3x2 array. I then tried
abind(A,NA,along=2)
but that results in an error.
I'm hoping there is a much simpler solution than
dimSlice <- dim(A)
dimSlice[2] <- 1
abind(A,array(NA,dimSlice),along=2)
Background
This padding happens as part of a "remove slice and pad opposite side" operation that shifts an array by one position along some dimension, filling in with NA elements at the vacated positions. The one-dimensional equivalent would be, for example, c(A[-1],NA) for vector A. If there is a simple way to accomplish such an operation without an explicit padding sub-operation, that would be even better.
Subsetting with NAs results in NAs (?Extract):
v = 1:3; m = matrix(1:4, 2, 2); a = array(1:6, c(2, 2, 2))
v[c(NA, 1)]
#[1] NA 1
m[, c(2, NA)]
# [,1] [,2]
#[1,] 3 NA
#[2,] 4 NA
a[, c(1, 2, NA), ]
#, , 1
#
# [,1] [,2] [,3]
#[1,] 1 3 NA
#[2,] 2 4 NA
#
#, , 2
#
# [,1] [,2] [,3]
#[1,] 5 1 NA
#[2,] 6 2 NA
So, to pad with NAs, we can subset using the appropriate indices. Putting the above into a more general function that shifts an array by "n" positions along dimension "k", dropping "n" slices at one end and appending/prepending "n" slices of NA at the other:
pad = function(x, k, n = 1L, append = TRUE)
{
  dims = replicate(length(dim(x)), substitute(), simplify = FALSE)
  if(append) dims[[k]] = c((n + 1):dim(x)[[k]], rep_len(NA, n))
  else dims[[k]] = c(rep_len(NA, n), 1:(dim(x)[[k]] - n))
  do.call("[", c(list(x), dims))
}
arr = array(1:24, c(3, 2, 2, 2))
pad(arr, 1, 2, FALSE)
pad(arr, 2)
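The function above shifts the array, so its extent along k stays the same. For the pure padding asked about at the start of the question (2x2x2 to 2x3x2), the same NA-index trick works if all existing indices are kept. A sketch; pad_only is a hypothetical name:

```r
# Hypothetical helper: grow dimension k of array x by n slices of NA.
pad_only <- function(x, k, n = 1L, append = TRUE) {
  # TRUE as an index selects everything along that dimension.
  idx <- rep(list(TRUE), length(dim(x)))
  keep <- seq_len(dim(x)[k])
  gap <- rep(NA_integer_, n)   # NA indices yield NA-filled slices
  idx[[k]] <- if (append) c(keep, gap) else c(gap, keep)
  do.call(`[`, c(list(x), idx, list(drop = FALSE)))
}

A <- array(1:8, c(2, 2, 2))
P <- pad_only(A, 2)
dim(P)  # 2 3 2
```

Using TRUE for the untouched dimensions is an alternative to the empty-argument trick with substitute(); both select every index along those dimensions.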

How to calculate correlation between matrices with different column dimensions in R

I have two matrices with same number of rows and different number of columns as:
mat1 <- matrix(rnorm(20), 4, 5)
mat2 <- matrix(rnorm(12), 4, 3)
Since I have the same number of rows, I want to calculate the following correlations between the columns of the two matrices:
cor.test(mat1[,1], mat2[,1])
cor.test(mat1[,1], mat2[,2])
cor.test(mat1[,1], mat2[,3])
cor.test(mat1[,2], mat2[,1])
cor.test(mat1[,2], mat2[,2])
cor.test(mat1[,2], mat2[,3])
...........
...........
cor.test(mat1[,5], mat2[,3])
pv <- matrix(NA_real_, nrow = 5, ncol = 3)  # preallocate the 5 x 3 result
for(i in 1:5){
  for(j in 1:3){
    pv[i,j] <- cor.test(mat1[, i], mat2[ , j])$p.value
  }
}
In the end I want a 5 x 3 matrix (or a vector) containing the correlation values. Can anyone help?
Can I use this to return both the p-value and the estimate?
FUN <- function(x, y) {
  res <- cor.test(x, y, method="spearman", exact=FALSE)
  return(list(c = res$estimate, p = res$p.value))
}
# mat1 and mat2 have no column names, so iterate over column indices
r1 <- outer(seq_len(ncol(mat1)), seq_len(ncol(mat2)), Vectorize(function(i,j) FUN(mat1[,i], mat2[,j])$p))
r2 <- outer(seq_len(ncol(mat1)), seq_len(ncol(mat2)), Vectorize(function(i,j) FUN(mat1[,i], mat2[,j])$c))
Thank you.
Why don't you just use the cor function to calculate the Pearson correlation?
set.seed(1)
mat1 <- matrix(rnorm(20), 4, 5)
mat2 <- matrix(rnorm(12), 4, 3)
cor(mat1, mat2)
[,1] [,2] [,3]
[1,] 0.4406765 -0.70959590 0.10731768
[2,] -0.2566199 -0.01588993 -0.63630159
[3,] -0.9813313 0.85082165 -0.77172317
[4,] 0.6121358 -0.38564314 0.87077092
[5,] -0.6897573 0.66272015 -0.08380553
To double check,
> col_1 <- 3
> col_2 <- 2
# all.equal is used to compare numeric equality where `==` is discouraged
> all.equal(cor(mat1, mat2)[col_1, col_2], cor(mat1[,col_1], mat2[,col_2]))
[1] TRUE
They are equal!
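cor() only returns the estimates. If the p-values are wanted as well, one option is to run cor.test() over every pair of column indices and extract one component at a time. A sketch, where cor_component is a hypothetical helper name and the default Pearson method is assumed:

```r
set.seed(1)
mat1 <- matrix(rnorm(20), 4, 5)
mat2 <- matrix(rnorm(12), 4, 3)

# Run cor.test for every (column of x, column of y) pair and pull out
# one named component of the result, e.g. "estimate" or "p.value".
cor_component <- function(x, y, what) {
  one_pair <- function(i, j) unname(cor.test(x[, i], y[, j])[[what]])
  outer(seq_len(ncol(x)), seq_len(ncol(y)), Vectorize(one_pair))
}

est  <- cor_component(mat1, mat2, "estimate")  # 5 x 3, matches cor(mat1, mat2)
pval <- cor_component(mat1, mat2, "p.value")   # 5 x 3 matrix of p-values
```

Vectorize() is needed because outer() calls its function once with whole index vectors rather than once per pair.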
An alternative, slightly easier to understand than loops in my opinion:
sapply(
data.frame(mat1),
function(x) Map(function(a,b) cor.test(a,b)$p.value,
list(x),
as.data.frame(mat2))
)
Result:
# X1 X2 X3 X4 X5
#[1,] 0.7400541 0.8000358 0.5084979 0.4441933 0.9104712
#[2,] 0.2918163 0.2764817 0.956807 0.6072979 0.4395218
#[3,] 0.2866105 0.4095909 0.5648188 0.1746428 0.9125866
I suppose you would like to do it without for loops. With base R, here is the double apply approach:
apply(mat1, 2, function(col_mat1){
  apply(mat2, 2, function(col2, col1) {
    cor.test(col2, col1)$p.value
  }, col1=col_mat1)
})
The outer apply iterates over the columns of mat1 and supplies one side of cor.test(). The inner one does the same for the columns of mat2 and supplies the other side. In practice, apply replaces the for loops. The result has one row per column of mat2 (3 x 5 here), so wrap it in t() if you want the 5 x 3 layout.
I think all you need is to define your matrix first
mat_cor <- matrix(nrow=ncol(mat1), ncol=ncol(mat2))
for(i in 1:5)
{
  for(j in 1:3)
  {
    mat_cor[i,j] <- cor.test(mat1[, i], mat2[ , j])$p.value
  }
}
Output
mat_cor
[,1] [,2] [,3]
[1,] 0.9455569 0.8362242 0.162569342
[2,] 0.7755360 0.9849619 0.775006329
[3,] 0.8799139 0.8050564 0.001358697
[4,] 0.1574388 0.1808167 0.618624825
[5,] 0.8571844 0.8897125 0.879818822
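You can try something like this:
pv <- c()
for(i in 1:dim(mat1)[2]){
  for(j in 1:dim(mat2)[2]){
    # append the estimate for column pair (i, j)
    pv <- c(pv, cor.test(mat1[, i], mat2[ , j])$estimate)
  }
}
# values were collected with j varying fastest, so fill the matrix by row
pv <- matrix(pv, nrow = dim(mat1)[2], ncol = dim(mat2)[2], byrow = TRUE)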

sum adjacent columns for each column in a matrix in R

I am trying to get a function that is the opposite of diff()
I want to add the values of adjacent columns in a matrix for each column in the matrix.
I do NOT need the sum of the entire column or row.
For example:
If I had:
[ 1 2 4;
3 5 8 ]
I would end up with:
[ 3 6;
8 13 ]
Of course for just one or two columns this is simple as I can just do x[,1]+x[,2], but these matrices are quite large.
I'm surprised that I cannot seem to find an efficient way to do this.
m <- matrix(c(1,3,2,5,4,8), nrow=2)
m[,-1] + m[,-ncol(m)]
[,1] [,2]
[1,] 3 6
[2,] 8 13
Or, just for the fun of it:
n <- ncol(m)
x <- suppressWarnings(matrix(c(1, 1, rep(0, n-1)),
nrow = n, ncol = n-1))
m %*% x
[,1] [,2]
[1,] 3 6
[2,] 8 13
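One caveat with the m[,-1] + m[,-ncol(m)] approach: on a two-column matrix each subset collapses to a plain vector, so the result is no longer a matrix. A sketch that keeps the matrix shape with drop = FALSE (adjacent_sums is a hypothetical name):

```r
# Hypothetical helper: sum each column with its right-hand neighbour.
adjacent_sums <- function(m) {
  stopifnot(is.matrix(m), ncol(m) >= 2)
  # drop = FALSE keeps both operands as matrices even when only one
  # column remains after removing the first or last column.
  m[, -1, drop = FALSE] + m[, -ncol(m), drop = FALSE]
}

m <- matrix(c(1, 3, 2, 5, 4, 8), nrow = 2)
adjacent_sums(m)
```

For the example matrix this gives the same 2 x 2 result as above, and for a two-column input it returns a one-column matrix instead of a vector.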
Dummy data
mat <- matrix(sample(0:9, 100, replace = TRUE), nrow = 10)
Solution:
sum.mat <- lapply(1:(ncol(mat)-1), function(i) mat[,i] + mat[,i+1])
sum.mat <- matrix(unlist(sum.mat), byrow = FALSE, nrow = nrow(mat))
You could use:
m <- matrix(c(1,2,4,3,5,8), nrow=2, byrow=TRUE)
sapply(2:ncol(m), function(x) m[,x] + m[,(x-1)])
