I need to compare each item in an array against every element in a matrix. The matrix and array can be any size. I can't use loops or if statements - mainly functions like apply(), ifelse(), and so on. NA data can be ignored. Here is an example:
x <- c(1,0,1,0,1,1,1,1,0,1,0,1)
y <- c(1, NA, 1, NA)
The elements of y should be compared against x in rows: once every element of y has been compared with the first length(y) elements of x, the comparison starts again on a new row with the next length(y) elements of x. The result I want is:
[,1] [,2] [,3] [,4]
[1,] TRUE NA TRUE NA
[2,] TRUE NA TRUE NA
[3,] FALSE NA FALSE NA
This function compares with logical equivalence:
answer <- function(x, y) {
  z <- x == y
  print(z)
}
The function returns the correct values, but as a single vector, whereas a new row should begin after every length(y) values (here, after the second NA).
[1] TRUE NA TRUE NA TRUE NA TRUE NA FALSE NA FALSE NA
When I try to turn that result into a matrix with length(y) columns (4 in this case), the output isn't what I want.
answer <- function(x, y) {
  z <- x == y
  z2 <- matrix(z, ncol = length(y))
  print(z2)
}
The result is filled straight down each column instead of across each row:
[,1] [,2] [,3] [,4]
[1,] TRUE NA TRUE NA
[2,] NA TRUE NA FALSE
[3,] TRUE NA FALSE NA
What can I use to make the comparison through each column in a row instead of each row in a column? Can I use apply(z, 1, some built-in function) or a nested apply(apply()) function? The difficulty has been to ensure that the resulting matrix is the correct size, with the correct answer, while compensating for any size array/matrix comparison.
We can make the lengths the same and then do the comparison
x == y[col(x)]
# [,1] [,2] [,3] [,4]
#[1,] TRUE NA TRUE NA
#[2,] FALSE NA TRUE NA
#[3,] TRUE NA FALSE NA
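If y is instead meant to line up with the rows of x (length(y) equal to nrow(x)), plain recycling already does that
x == y
Or
x == y[row(x)]
For the column-wise comparison, sweep also works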
sweep(x, 2, y, FUN = `==`)
# [,1] [,2] [,3] [,4]
#[1,] TRUE NA TRUE NA
#[2,] FALSE NA TRUE NA
#[3,] TRUE NA FALSE NA
data
x <- matrix(c(1,0,1,0,1,1,1,1,0,1,0,1), nrow=3, ncol=4)
y <- c(1, NA, 1, NA)
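To see why indexing with col(x) works, it can help to look at what col(x) returns. The following is only a sketch using the same x and y as above and base R functions; the names y_expanded, res_col, res_sweep and res_t are introduced here for illustration.

```r
x <- matrix(c(1,0,1,0,1,1,1,1,0,1,0,1), nrow = 3, ncol = 4)
y <- c(1, NA, 1, NA)

# col(x) has the same shape as x and holds each cell's column number
col(x)

# Indexing y by those numbers picks y[j] for every cell in column j
# (the result is a plain vector of length 12, in column-major order)
y_expanded <- y[col(x)]

# The elementwise comparison keeps the dimensions of x
res_col   <- x == y_expanded
res_sweep <- sweep(x, 2, y, FUN = `==`)
res_t     <- t(t(x) == y)

# All three approaches should agree
all.equal(res_col, res_sweep)
all.equal(res_col, res_t)
```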
You can do
t(t(x) == y)
# [,1] [,2] [,3] [,4]
#[1,] TRUE NA TRUE NA
#[2,] FALSE NA TRUE NA
#[3,] TRUE NA FALSE NA
x <- matrix(c(1,0,1,0,1,1,1,1,0,1,0,1), nrow=3, ncol=4)
y <- c(1, NA, 1, NA)
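Note that the answers above treat x as a 3 x 4 matrix. If x is kept as a plain vector, as written in the question, the row layout asked for there can probably be obtained by filling the result matrix by row. This is a sketch under the assumption that length(x) is a multiple of length(y):

```r
x <- c(1,0,1,0,1,1,1,1,0,1,0,1)   # plain vector, as in the question
y <- c(1, NA, 1, NA)

# x == y recycles y along x, so every run of length(y) results is one "row"
z <- x == y

# byrow = TRUE lays those runs out as rows instead of filling columns first
z2 <- matrix(z, ncol = length(y), byrow = TRUE)
z2
```

With this input the third row comes out as FALSE NA FALSE NA, which matches the layout requested in the question.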
Starting from a matrix (n x m), I would like to create a new m x m matrix that contains the correlation between every combination of two columns of the starting matrix. So if my input is a 3 x 3 matrix, I would like to calculate the correlation of column pairs 12, 13 and 23 and assign the results to the destination matrix. Naively, I used two nested for loops (~O(n^2)):
for (i in 1:(m - 1)) {
  for (j in (i + 1):m) {  # parentheses matter: i+1:n parses as i + (1:n)
    dstMatrix[i, j] <- cor(inMatrix[, i], inMatrix[, j])
  }
}
This appears to be working, and I was wondering whether there is a better way to achieve it in R.
Simply passing the whole matrix to cor() does it:
n <- 7
m <- 5
set.seed(123)
inMatrix <- replicate(m, sample(c(1, - 1), 1) * cumsum(runif(n)))
inMatrix
# [,1] [,2] [,3] [,4] [,5]
# [1,] 0.7883051 -0.4566147 0.04205953 -0.7085305 -0.7954674
# [2,] 1.1972821 -1.4134481 0.36998025 -1.2525965 -0.8200811
# [3,] 2.0802995 -1.8667822 1.32448390 -1.8467385 -1.2978771
# [4,] 3.0207667 -2.5443529 2.21402322 -2.1358983 -2.0563366
# [5,] 3.0663232 -3.1169863 2.90682662 -2.2830119 -2.2727445
# [6,] 3.5944287 -3.2199110 3.54733344 -3.2460361 -2.5909256
# [7,] 4.4868478 -4.1197359 4.54160321 -4.1483352 -2.8225513
dstMatrix <- matrix(nrow = m, ncol = m)
for (i in 1:(m - 1)) {
  for (j in (i + 1):m) {
    dstMatrix[i, j] <- cor(inMatrix[, i], inMatrix[, j])
  }
}
dstMatrix
# [,1] [,2] [,3] [,4] [,5]
# [1,] NA -0.9823516 0.9902370 -0.9688212 -0.9825973
# [2,] NA NA -0.9811424 0.9570599 0.9626469
# [3,] NA NA NA -0.9742235 -0.9862355
# [4,] NA NA NA NA 0.9331879
# [5,] NA NA NA NA NA
dstMatrix_2 <- cor(inMatrix)
dstMatrix_2
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1.0000000 -0.9823516 0.9902370 -0.9688212 -0.9825973
# [2,] -0.9823516 1.0000000 -0.9811424 0.9570599 0.9626469
# [3,] 0.9902370 -0.9811424 1.0000000 -0.9742235 -0.9862355
# [4,] -0.9688212 0.9570599 -0.9742235 1.0000000 0.9331879
# [5,] -0.9825973 0.9626469 -0.9862355 0.9331879 1.0000000
dstMatrix == dstMatrix_2
# [,1] [,2] [,3] [,4] [,5]
# [1,] NA TRUE TRUE FALSE TRUE
# [2,] NA NA TRUE FALSE TRUE
# [3,] NA NA NA FALSE TRUE
# [4,] NA NA NA NA FALSE
# [5,] NA NA NA NA NA
# The differences are on the order of machine precision (floating-point rounding, presumably from a different order of operations), which is why == reports FALSE:
dstMatrix - dstMatrix_2
# [,1] [,2] [,3] [,4] [,5]
# [1,] NA 0 0 -1.110223e-16 0.000000e+00
# [2,] NA NA 0 2.220446e-16 0.000000e+00
# [3,] NA NA NA -1.110223e-16 0.000000e+00
# [4,] NA NA NA NA 1.110223e-16
# [5,] NA NA NA NA NA
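Because the two results differ only by floating-point rounding, a tolerance-based comparison is usually more appropriate than ==. A minimal sketch, rebuilding the same inMatrix; the names full and pair and the 1e-12 threshold are arbitrary choices made for illustration:

```r
set.seed(123)
inMatrix <- replicate(5, sample(c(1, -1), 1) * cumsum(runif(7)))

full <- cor(inMatrix)                       # whole correlation matrix
pair <- cor(inMatrix[, 1], inMatrix[, 4])   # one pairwise value

# all.equal() compares within a numeric tolerance instead of bit-for-bit
isTRUE(all.equal(pair, full[1, 4]))

# An explicit threshold works too (1e-12 is an arbitrary choice)
abs(pair - full[1, 4]) < 1e-12
```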
Compute the correlation coefficient for each combination of two columns. combn() is used to get the pairs of column numbers.
As suggested by @Sotos, the function can be passed directly to combn(), which avoids a separate apply() call.
cor_vals <- combn(1:col_n, 2, function(x) cor(mat1[, x[1]], mat1[, x[2]]))
# cor_vals <- apply(combn(1:col_n, 2), 2, function(x) cor(mat1[, x[1]], mat1[, x[2]]))
Assign names to the correlation values:
cor_vals <- setNames(cor_vals, combn(1:col_n, 2, paste0, collapse = ''))
cor_vals
# 12 13 23
# 0.1621491 -0.8211970 0.4299367
Data:
set.seed(1L)
row_n <- 3
col_n <- 3
mat1 <- matrix(runif(row_n * col_n, min = 0, max = 20), nrow = row_n, ncol = col_n)
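If the values are wanted in an m x m matrix, as the question asks, one possible way is to write them into the lower triangle and transpose. This is a sketch, not part of the original answer; dst is a name introduced here.

```r
set.seed(1L)
col_n <- 3
mat1 <- matrix(runif(3 * col_n, min = 0, max = 20), nrow = 3, ncol = col_n)

cor_vals <- combn(1:col_n, 2, function(x) cor(mat1[, x[1]], mat1[, x[2]]))

# combn() orders the pairs (1,2), (1,3), ..., (2,3), ...; that is the order in
# which lower.tri() cells are filled (column by column) once rows and columns
# are swapped, so fill the lower triangle and transpose
dst <- matrix(NA_real_, nrow = col_n, ncol = col_n)
dst[lower.tri(dst)] <- cor_vals
dst <- t(dst)
dst
```

Filling upper.tri(dst) directly would put values in the wrong cells for more than three columns, because upper.tri() cells are also filled column by column, which is not the order combn() produces.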
I want to set to NA every element of a matrix that is greater than or equal to the corresponding element of a given vector, column by column. For example, I can create a matrix:
set.seed(1)
zz <- matrix(data = round(10L * runif(12)), nrow = 4, ncol = 3)
which gives for zz:
[,1] [,2] [,3]
[1,] 8 5 7
[2,] 6 5 1
[3,] 5 10 3
[4,] 9 1 9
and for the comparison vector (for example):
xx <- round(10L * runif(4))
where xx is:
[1] 6 3 8 2
if I perform this operation:
apply(zz,2,function(x) x >= xx)
I get:
[,1] [,2] [,3]
[1,] TRUE FALSE TRUE
[2,] TRUE TRUE FALSE
[3,] FALSE TRUE FALSE
[4,] TRUE FALSE TRUE
What I want is an NA wherever there is a TRUE, and the original number from the zz matrix wherever there is a FALSE (e.g., done manually):
NA 5 NA
NA NA 1
5 NA 3
NA 1 NA
I can cobble together some "for" loops to do what I want, but is there a vectorized way to do this?
Thanks for any tips.
You could simply do:
zz[zz>=xx] <- NA
zz
# [,1] [,2] [,3]
#[1,] NA 5 NA
#[2,] NA NA 1
#[3,] 5 NA 3
#[4,] NA 1 NA
Here is one option to get the expected output. We get a logical matrix (zz >= xx), using NA^ on that returns NA for the TRUE values and 1 for the FALSE, then multiply it with original matrix 'zz' so that NA remains as such while the 1 changes to the corresponding value in 'zz'.
NA^(zz >= xx)*zz
# [,1] [,2] [,3]
#[1,] NA 5 NA
#[2,] NA NA 1
#[3,] 5 NA 3
#[4,] NA 1 NA
Or another option is ifelse
ifelse(zz >= xx, NA, zz)
data
zz <- structure(c(8, 6, 5, 9, 5, 5, 10, 1, 7, 1, 3, 9), .Dim = c(4L, 3L))
xx <- c(6, 3, 8, 2)
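The NA^ trick and the recycling that all three options rely on can be checked in a few lines. This is a sketch only; mask, res_assign, res_power and res_ifelse are names introduced here, and replace() stands in for the in-place assignment so that zz itself is left untouched.

```r
zz <- structure(c(8, 6, 5, 9, 5, 5, 10, 1, 7, 1, 3, 9), .Dim = c(4L, 3L))
xx <- c(6, 3, 8, 2)

# TRUE is treated as 1 and FALSE as 0: NA^1 is NA, while NA^0 is 1
NA^c(TRUE, FALSE)

# zz >= xx recycles xx down each column; this lines up only because
# length(xx) equals nrow(zz)
stopifnot(length(xx) == nrow(zz))
mask <- zz >= xx

res_assign <- replace(zz, mask, NA)   # same effect as zz[mask] <- NA on a copy
res_power  <- NA^mask * zz
res_ifelse <- ifelse(mask, NA, zz)

all.equal(res_assign, res_power)
all.equal(res_assign, res_ifelse)
```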
I have a vector of data such as the following:
data <- c(1, 3, 4, 7)
And I would like to apply a function to every pair of elements in the vector such that it will return an upper triangle matrix as the following does:
mat <- matrix(data = NA, nrow = length(data), ncol = length(data))
for (i in 1:(length(data) - 1)) {
for (j in (i+1):length(data)) {
mat[i, j] <- "-"(data[j], data[i])
}
}
But I would like to do so with an apply type function instead of a for loop.
I am unsure how to do so. Any suggestions?
Thanks!
We can use combn
mat[lower.tri(mat, diag=FALSE)] <- combn(data, 2,
FUN= function(x) x[2]-x[1])
t(mat)
# [,1] [,2] [,3] [,4]
#[1,] NA 2 3 6
#[2,] NA NA 1 4
#[3,] NA NA NA 3
#[4,] NA NA NA NA
data
data <- c(1, 3, 4, 7)
mat <- matrix(data = NA, nrow = length(data), ncol = length(data))
Using outer:
t(outer(data,data,"-"))*
NA^lower.tri(matrix(0,length(data),length(data)),diag=TRUE)
# [,1] [,2] [,3] [,4]
#[1,] NA 2 3 6
#[2,] NA NA 1 4
#[3,] NA NA NA 3
#[4,] NA NA NA NA
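A variation on the outer() idea, shown here only as a sketch: compute the differences in the right orientation directly and then blank out everything that is not in the upper triangle. The name d is introduced for illustration.

```r
data <- c(1, 3, 4, 7)

# outer() evaluates the function for every (i, j) pair at once;
# with this function, d[i, j] = data[j] - data[i]
d <- outer(data, data, function(a, b) b - a)

# Blank out the diagonal and the lower triangle
d[!upper.tri(d)] <- NA
d
```

The function passed to outer() has to be vectorized, since it is called once with the full expanded vectors rather than once per pair.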
What features does R have for finding missing values in a data frame, or at least for checking whether a data frame contains any missing values?
x = matrix(rep(c(NA, 1,NA), 3), ncol=3, nrow=3)
print(x)
[,1] [,2] [,3]
[1,] NA NA NA
[2,] 1 1 1
[3,] NA NA NA
Logical matrix indicating which values are NA:
is.na(x)
[,1] [,2] [,3]
[1,] TRUE TRUE TRUE
[2,] FALSE FALSE FALSE
[3,] TRUE TRUE TRUE
Indices of the NA values:
which(is.na(x), arr.ind = TRUE)
row col
[1,] 1 1
[2,] 3 1
[3,] 1 2
[4,] 3 2
[5,] 1 3
[6,] 3 3
See if the matrix has any missing values:
any(is.na(x))
[1] TRUE
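The same calls work on a data frame. A small sketch with a made-up data frame (df and its columns are hypothetical, not from the question):

```r
# Hypothetical data frame with a few missing values
df <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA), c = 1:3)

any(is.na(df))               # is anything missing at all?
colSums(is.na(df))           # number of NAs per column
which(!complete.cases(df))   # rows containing at least one NA
```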
It's hard to tell from the example you've given; more details on the structure of "data" would be helpful. But if you simply want to exclude any observation (row) of your data that has a missing value anywhere in it, try:
cleanDat <- na.omit(data)
Note, there is a nice tutorial on missing data which is where I looked to confirm I had this right.
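As a rough illustration of what na.omit() does, here is a sketch on a made-up data frame (dat and its columns are hypothetical):

```r
# Hypothetical data: rows 2 and 3 each contain one missing value
dat <- data.frame(id    = 1:4,
                  score = c(10, NA, 7, 3),
                  grp   = c("a", "b", NA, "d"))

cleanDat <- na.omit(dat)
nrow(cleanDat)

# The indices of the dropped rows are kept as an attribute
attr(cleanDat, "na.action")
```

Any row with at least one NA is removed, so a single sparse column can discard a lot of data; it may be worth checking how many rows remain afterwards.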
I created an empty matrix with matrix(). How can I test whether a given matrix is empty? I know that is.na(matrix()) is TRUE, but for a matrix with larger dimensions it returns one value per element rather than a single answer.
By empty I mean that every element is NA or NULL.
I'm guessing that you are just looking for all. Here's a small example:
M1 <- matrix(NA, ncol = 3, nrow = 3)
M1
# [,1] [,2] [,3]
# [1,] NA NA NA
# [2,] NA NA NA
# [3,] NA NA NA
M2 <- matrix(c(1, rep(NA, 8)), ncol = 3, nrow = 3)
M2
# [,1] [,2] [,3]
# [1,] 1 NA NA
# [2,] NA NA NA
# [3,] NA NA NA
all(is.na(M1))
# [1] TRUE
all(is.na(M2))
# [1] FALSE
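If NULL and zero-extent matrices should also count as empty, the check can be wrapped in a small helper. This is only a sketch; is_empty_matrix is a hypothetical name, and treating NULL as empty is an assumption based on the wording of the question.

```r
# Hypothetical helper: TRUE for NULL, for a zero-length matrix,
# or for a matrix whose elements are all NA
is_empty_matrix <- function(m) {
  is.null(m) || length(m) == 0L || all(is.na(m))
}

is_empty_matrix(matrix())                         # 1 x 1 matrix holding NA
is_empty_matrix(matrix(NA, 3, 3))                 # all NA
is_empty_matrix(matrix(nrow = 0, ncol = 0))       # no elements at all
is_empty_matrix(NULL)
is_empty_matrix(matrix(c(1, rep(NA, 8)), 3, 3))   # one real value
```

The first four calls should return TRUE and the last one FALSE.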