counting N occurrences within a ceiling range of a matrix by-row - r

I would like to tally each time a value lies within a given range in a matrix by-row, and then sum these logical outcomes to derive a "measure of consistency" for each row.
Reproducible example:
m1 <- matrix(c(1,2,1,6,3,7,4,2,6,8,11,15), ncol=4, byrow = TRUE)
# expected outcome, given a range of +/-1 either side
exp.outcome<-matrix(c(TRUE,TRUE,TRUE,FALSE,
TRUE,FALSE,TRUE,TRUE,
FALSE,FALSE,FALSE,FALSE),
ncol=4, byrow=TRUE)
Above I've indicated the expected outcome, in the case where each value lies within +/- 1 range of any other values within that row.
Within the first row of m1 the first value (1) is within +/-1 of any other value in that row hence equals TRUE, and so on.
By contrast, none of the values in row 3 of m1 are within 1 of each other and hence each is assigned FALSE.
Any pointers would be much appreciated.
Update:
Thanks to the help provided I can now count the unique pairs of values which meet the ceiling criteria for any arbitrarily large matrix (using the binomial coefficient, k draws from n, without replacement).
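A minimal sketch of the pair count described in the update, assuming "meeting the criteria" means an absolute difference of at most 1 between two values in the same row. The helper name count_close_pairs and the tol argument are made up for illustration; dist() comes from the stats package that ships with R.

```r
m1 <- matrix(c(1,2,1,6,3,7,4,2,6,8,11,15), ncol = 4, byrow = TRUE)

# Hypothetical helper: for each row, count the unordered pairs of values
# whose absolute difference is at most `tol`.
count_close_pairs <- function(m, tol = 1) {
  apply(m, 1, function(x) sum(as.vector(dist(x)) <= tol))
}

close_pairs <- count_close_pairs(m1)
total_pairs <- choose(ncol(m1), 2)  # number of unordered pairs per row

close_pairs                # pairs within tolerance, one count per row
close_pairs / total_pairs  # share of all pairs, a possible consistency measure
```

Dividing by choose(ncol(m1), 2) puts rows of any width on the same 0 to 1 scale.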

Before progressing with the answer I just wanted to clarify that in your question you have said:
Within the first row of m1 the first value (1) is within +/-1 of any
other value in that row hence equals TRUE, and so on.
However,
> m1[1,4]
[1] 6
6 is not within +/- 1 of 1, and your expected outcome correctly shows FALSE for that element.
Solution
This solution should get you to the desired results:
t(apply(
X = m1,
# Take each row from the matrix
MARGIN = 1,
FUN = function(x) {
sapply(
X = x,
# Now go through each element of that row
FUN = function(y) {
# Your conditions
y %in% c(x - 1) | y %in% c(x + 1)
}
)
}
))
Results
[,1] [,2] [,3] [,4]
[1,] TRUE TRUE TRUE FALSE
[2,] TRUE FALSE TRUE TRUE
[3,] FALSE FALSE FALSE FALSE
Check
For results stored as res.
> identical(res, exp.outcome)
[1] TRUE
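Note that the %in% test above only matches differences of exactly 1, so it relies on the data being whole numbers. As a hedged alternative, here is a sketch that works on pairwise absolute differences instead, assuming the rule is "at least one other value in the row lies within the tolerance". The name within_tol and the tol argument are hypothetical.

```r
m1 <- matrix(c(1,2,1,6,3,7,4,2,6,8,11,15), ncol = 4, byrow = TRUE)

# Hypothetical helper: TRUE where an element has at least one *other*
# element of the same row within `tol` of it.
within_tol <- function(m, tol = 1) {
  t(apply(m, 1, function(x) {
    d <- abs(outer(x, x, "-"))  # all pairwise absolute differences
    diag(d) <- Inf              # ignore each element's distance to itself
    apply(d, 1, min) <= tol     # closest other element within tolerance?
  }))
}

res2 <- within_tol(m1)
res2
rowSums(res2)  # per-row tally of values that have a close neighbour
```

Because it compares distances rather than exact values, this version also handles non-integer data and duplicated values (a difference of 0).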

Here is a kind of neat base R method that uses an array:
The first two statements are setup: they store a three-dimensional array of acceptable values and a matrix that will hold the desired output. The array is structured as follows: each column holds the acceptable values for the matrix element in the same column, and the third dimension corresponds to the rows of the matrix.
Pre-allocation in this way should cut down on repeated computations.
# construct array of all +1/-1 values
valueArray <- sapply(1:nrow(m1), function(i) rbind(m1[i,]-1, m1[i,], m1[i,]+1),
simplify="array")
# get logical matrix of correct dimensions
exp.outcome <- matrix(TRUE, nrow(m1), ncol(m1))
# get desired values
for(i in 1:nrow(m1)) {
exp.outcome[i, ] <- sapply(1:ncol(m1), function(j) m1[i, j] %in% c(valueArray[, -j, i]))
}
Which returns
exp.outcome
[,1] [,2] [,3] [,4]
[1,] TRUE TRUE TRUE FALSE
[2,] TRUE FALSE TRUE TRUE
[3,] FALSE FALSE FALSE FALSE

Related

Selecting rows in R based on threshold

In R, I have a matrix with N columns of all numbers. (Each row has a name, but that's irrelevant.) I'd like to return the rows where at least one column has a value greater than some threshold. Right now, I'm doing something like this:
THRESHOLD <- 10
# my_matrix[,1] can be ignored
my_matrix <- subset (my_matrix, my_matrix[,1] > THRESHOLD | my_matrix[,2] > THRESHOLD | ... )
It seems odd to have to manually list each column. Also, if the number of input columns changes, I have to rewrite this.
There has to be a better way, but I can't figure out what I should be looking for.
I can convert my matrix to a data frame, if that is easier... Any suggestions would be appreciated!
Use apply to find the rows that have any value greater than the threshold, then use that result to extract those rows from mat.
mat[apply( mat2, 1, function( x ) any( x > threshold ) ), ]
EDIT:
Break down of the above single line.
# create sample data by simulating samples from standard normal distribution
set.seed(1L) # set random number generator for consistent data simulation
mat <- matrix( data = c(letters[1:3], as.character( rnorm(9, mean = 0, sd = 1))),
byrow = FALSE,
nrow = 3,
ncol = 4 ) # create simulated data matrix
threshold <- 0 # set threshold
mat2 <- apply( mat[, 2:ncol(mat) ], 2, as.numeric ) # extract columns 2 to end and convert to numeric
# Get the logical indices (true or false) if any row has values greater than 0 (threshold)
row_indices <- apply( mat2, 1, function( x ) any( x > threshold ) )
mat[row_indices, ] # extract matrix data rows that has TRUE in row_indices
# [,1] [,2] [,3] [,4]
# [1,] "a" "-0.626453810742332" "1.59528080213779" "0.487429052428485"
# [2,] "b" "0.183643324222082" "0.329507771815361" "0.738324705129217"
# [3,] "c" "-0.835628612410047" "-0.820468384118015" "0.575781351653492"
Note:
In your question, you mentioned that the first column is character and the rest are numbers. A matrix can hold only one data type, so I assume that your matrix is of character type. You can check this with typeof(mat) (class(mat) only tells you that it is a matrix). If it is a character matrix, extract columns 2 to the end and convert them to numeric. Then use the result in the apply call to check for any values greater than the threshold.
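If the matrix is already numeric (names kept as row names rather than as a column), a vectorised variant avoids the per-row function call. This is only a sketch: the data below are made up, and it assumes there are no NA values.

```r
# Hypothetical numeric matrix with row names, as described in the question
my_matrix <- matrix(c( 1, 12, 3,
                       4,  5, 6,
                      20,  1, 2),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(c("r1", "r2", "r3"), NULL))
THRESHOLD <- 10

# my_matrix > THRESHOLD is a logical matrix; rowSums counts the TRUEs per row
keep <- rowSums(my_matrix > THRESHOLD) > 0

# drop = FALSE keeps the result a matrix even if only one row survives
my_matrix[keep, , drop = FALSE]
```

This keeps working unchanged when the number of columns changes.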

Multiply a matrix' columns by its columns

I have a 4x100 matrix where I would like to multiply column 1 with row 1 of its transpose, and so on, and store these matrices somewhere so that I can take the sum of these new matrices later on.
I really don't know where to start, because I get 4x4 matrices after the column-row multiplication, so I cannot store them in a matrix.
data:
mm num[1:4,1:100]
mm_t num[1:100,1:4]
I'm thinking of creating a list in some way
list1=list()
for(i in 1:100){
list1[i] <- mm[,i]%*%mm_t[i,]
}
but I need some more indices, I think, because this just leaves me with a single number in each element.
First, your description of the data is not clear. Second, are you trying to multiply each value by itself, or to do matrix multiplication?
We create a 4x100 matrix and its transpose:
mm <- matrix(1:400, nrow = 4, ncol = 100)
mm.t <- t(mm)
Then we can do the matrix multiplication (which is what you did, and you get a 4 x 4 matrix from the definition of matrix multiplication https://www.wikiwand.com/en/Matrix_multiplication)
If we want to multiply each index by itself (so mm[1,1] by mm [1,1]) then:
mm * mm
This will result in 4x100 matrix where each value is the square of the original value.
If we want the matrix multiplication of each column with itself, then:
sapply(1:100, function(x) {
mm[, x] %*% mm[, x]
})
This results in 100 values: each one is the inner (dot) product of a column of length 4 with itself.
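As a cross-check, the same 100 values can be obtained without a loop, since the dot product of a column with itself is the sum of its squared entries. A short sketch using the mm defined above:

```r
mm <- matrix(1:400, nrow = 4, ncol = 100)

# Loop version: dot product of each column with itself
loop_version <- sapply(1:100, function(x) mm[, x] %*% mm[, x])

# Vectorised version: square every element, then sum down each column
vectorised <- colSums(mm * mm)

all.equal(loop_version, vectorised)
```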
Let's start with some sample data. Please get in the habit of including things like this in your question:
nr = 4
nc = 100
set.seed(47)
mm = matrix(runif(nr * nc), nrow = nr)
Here's a working answer, very similar to your attempt:
result = list()
for (i in 1:ncol(mm)) result[[i]] = mm[, i] %*% t(mm[, i])
result[1:2]
# [[1]]
# [,1] [,2] [,3] [,4]
# [1,] 0.9544547 0.3653018 0.7439585 0.8035430
# [2,] 0.3653018 0.1398132 0.2847378 0.3075428
# [3,] 0.7439585 0.2847378 0.5798853 0.6263290
# [4,] 0.8035430 0.3075428 0.6263290 0.6764924
#
# [[2]]
# [,1] [,2] [,3] [,4]
# [1,] 0.3289532 0.3965557 0.2231443 0.2689613
# [2,] 0.3965557 0.4780511 0.2690022 0.3242351
# [3,] 0.2231443 0.2690022 0.1513691 0.1824490
# [4,] 0.2689613 0.3242351 0.1824490 0.2199103
As to why yours didn't work, we can experiment and see that indeed we get a number rather than a matrix. The reason is that when you subset a single row or column of a matrix, the dimensions are "dropped" and it is coerced to a plain vector. And when you matrix multiply two vectors, you get their dot product.
mmt = t(mm)
mm[, 1] %*% mmt[1, ]
# [,1]
# [1,] 2.350646
dim(mm[, 1])
# NULL
dim(mmt[1, ])
# NULL
We can avoid this by specifying drop = FALSE in the subset code
dim(mmt[1, , drop = FALSE])
# [1] 1 4
So a slight modification of your attempt, just adding drop = FALSE (and using [[i]] to assign the list element), makes it work.
res2 = list()
for (i in 1:ncol(mm)) res2[[i]] = mm[, i] %*% mmt[i, , drop = FALSE]
identical(result, res2)
# [1] TRUE
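Since the question mentions summing the stored matrices later, here is a hedged sketch of that step. It assumes the goal is the element-wise sum of all 100 outer products; the names outer_list and total are made up for illustration.

```r
set.seed(47)
mm <- matrix(runif(4 * 100), nrow = 4)

# One 4x4 outer product per column, stored in a list
outer_list <- lapply(seq_len(ncol(mm)), function(i) outer(mm[, i], mm[, i]))

# Element-wise sum of all matrices in the list
total <- Reduce(`+`, outer_list)

# The sum of the column outer products is the matrix product mm %*% t(mm)
all.equal(total, mm %*% t(mm))
```

If only the sum is needed, mm %*% t(mm) gives it in one step without storing the intermediate matrices.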

Check if all rows are the same in a matrix

I'm looking to see if sample(..., replace=T) results in sampling the same row n times. I see the duplicated function checks if elements are repeated by returning a logical vector for each index, but I need to see if one element is repeated n times (single boolean value). What's the best way to go about this?
Here's just an example. Some function on this matrix should return TRUE
t(matrix(c(rep(c(rep(4,1),rep(5,1)),8)),nrow=2,ncol=8))
[,1] [,2]
[1,] 4 5
[2,] 4 5
[3,] 4 5
[4,] 4 5
[5,] 4 5
[6,] 4 5
[7,] 4 5
[8,] 4 5
Here is one solution that works to produce the true/false result you are looking for:
m <- t(matrix(c(rep(c(rep(4,1),rep(5,1)),8)),nrow=2,ncol=8))
apply(m, 2, function(x) length(unique(x)) == 1)
[1] TRUE TRUE
m <- rbind(m, c(4, 6))
apply(m, 2, function(x) length(unique(x)) == 1)
[1] TRUE FALSE
If you want a single boolean value saying whether every column contains only one distinct value (that is, all rows are the same), you can do:
all(apply(m, 2, function(x) length(unique(x)) == 1) == TRUE)
[1] FALSE
A little cleaner looking (and easier to tell what the code is doing):
m <- t(matrix(c(rep(c(rep(4,1),rep(5,1)),8)),nrow=2,ncol=8))
apply(m, 2, function(x) all(x==x[1]))
[1] TRUE TRUE
Think I've got my solution.
B <- t(matrix(c(rep(c(rep(4,1),rep(5,1)),8)),nrow=2,ncol=8))
length(table(B)) == ncol(B)
[1] TRUE
B <- rbind(B,c(4,6)) # different sample
length(table(B)) == ncol(B)
[1] FALSE
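One thing worth checking with the table-based test above: it counts distinct values over the whole matrix rather than per column, so it can report TRUE when rows contain the same values in a different order. A small sketch with made-up data:

```r
# Two different rows that share the same set of values
B2 <- rbind(c(4, 5),
            c(5, 4))

table_test  <- length(table(B2)) == ncol(B2)  # distinct values vs. columns
unique_test <- nrow(unique(B2)) == 1L         # distinct rows

table_test   # TRUE, although the rows differ
unique_test  # FALSE
```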
We could also replicate the first row, compare with the original matrix, get the colSums and check whether it is equal to nrow of 'm'
colSums(m[1,][col(m)]==m)==nrow(m)
[1] TRUE TRUE
Or another option would be to check the variance
!apply(m, 2, var)
#[1] TRUE TRUE
For a matrix you can apply unique directly to your matrix, no need for apply:
nrow(unique(m)) == 1L
[1] TRUE
nrow(unique(rbind(m, c(6,7)))) == 1L
[1] FALSE
From the documentation ?unique:
The array method calculates for each element of the dimension specified by MARGIN if the remaining dimensions are identical to those for an earlier element (in row-major order). This would most commonly be used for matrices to find unique rows (the default) or columns (with MARGIN = 2).
Alternatively you can transpose your matrix and leverage vectorized comparison:
all(m[1,] == t(m))
[1] TRUE

Why am I not getting the index value of a matrix using the which function?

I have a matrix of thousands of coordinate values. And all I want is to find the index of a chosen value. I use this:
which(long == -118.1123, arr.ind=TRUE)
But I don't get any value. All I get is blank row and column. However, when I do this, I get values.
which(long < -118.1123, arr.ind=TRUE)
I know this value exists because I have manually checked in the Rstudio pane as well as printed out the value using long[1,2] etc.
dput(long) doesn't work with matrices. Hope you can help.
Diagnosis as per comments
> long[1,1]
[1] -118.0981
> long[1,1]==-118.0981
[1] FALSE
Here's an example of using a test that allows a "fuzz-factor" difference to be ignored:
> M <- matrix(rnorm(10) , 5,2)
> M
[,1] [,2]
[1,] -0.2382021 2.1698010
[2,] -1.1617644 -1.1513516
[3,] 1.3597808 0.9365208
[4,] 0.7460694 -1.7216410
[5,] -0.2413117 -0.1780468
> which(M==-0.2382021, arr.ind=TRUE)
row col
> which(abs(M - -0.2382021) < 0.0000001, arr.ind=TRUE)
row col
[1,] 1 1
My comment suggesting all.equal didn't work with a matrix argument inside which. I checked @RHertel's choice of signif and it does succeed.
> which(signif(M,7) == -0.2382021, arr.ind=TRUE)
row col
[1,] 1 1
Your numerical value in the matrix may not be exactly equal to -118.1123. It may contain several digits that aren't displayed since the machine's accuracy is much higher and a perfect identity may not be obtained due to minimal roundoff errors.
I suggest that you try
which(signif(long,7) == -118.1123, arr.ind=TRUE)
Here's a simple example illustrating the problem:
Let's first fill a vector with random numbers between 0 and 1:
v <- runif(100)
Then we redefine an arbitrary element of the vector, say element 42, and set it equal to pi:
v[42] <- pi
> v[42]
#[1] 3.141593
However, if we test for equality using "==" the result is FALSE:
> v[42] == 3.141593
#[1] FALSE
But if we only consider the first seven significant digits, the position 42 can be extracted from the vector:
> which(signif(v,7) == 3.141593, arr.ind = T)
#[1] 42
Now, to address your comment, let's assume that you're trying to find the number that is closest to 3.141591 among all your elements in the vector v. We can be certain that this will be element 42, but R should find out. This result can be obtained with
> order(abs(v-3.141591))[1]
#[1] 42
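The same two ideas (a tolerance test and a nearest-value search) carry over to a matrix and return row/column positions. The sketch below uses made-up data in place of the real long matrix, and the tolerance of 1e-4 is an assumption chosen to match four printed decimals.

```r
# Hypothetical stand-in for the coordinate matrix
long <- matrix(seq(-119, -117, length.out = 12), nrow = 4)
long[3, 2] <- -118.11234567  # displays as -118.1123 with 7 significant digits

target <- -118.1123

exact   <- which(long == target, arr.ind = TRUE)             # no rows: not exactly equal
close   <- which(abs(long - target) < 1e-4, arr.ind = TRUE)  # match within tolerance
nearest <- arrayInd(which.min(abs(long - target)), dim(long))  # closest element

print(long[3, 2], digits = 12)  # show the digits hidden by default printing
close
nearest
```

which.min returns a single linear index, and arrayInd converts it back to a row and column.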

Learning R - What is this Function Doing?

I am learning R and reading the book Guide to programming algorithms in r.
The book gives an example function:
# MATRIX-VECTOR MULTIPLICATION
matvecmult = function(A,x){
m = nrow(A)
n = ncol(A)
y = matrix(0,nrow=m)
for (i in 1:m){
sumvalue = 0
for (j in 1:n){
sumvalue = sumvalue + A[i,j]*x[j]
}
y[i] = sumvalue
}
return(y)
}
How do I call this function in the R console? And what exactly is being passed into this function as A and x?
The function takes an argument A, which should be a matrix, and x, which should be a numeric vector of same length as values per row in A.
If
A <- matrix(c(1,2,3,4,5,6), nrow = 2, ncol = 3)
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
then you have 3 values (number of columns, ncol) per row, thus x needs to be something like
x <- c(4,5,6)
The function itself iterates over all rows, and in each row, each value is multiplied with a value from x: the value in the first column is multiplied with the first value in x, the value in A's second column is multiplied with the second value in x, and so on. This is repeated for each row, and the sum for each row is returned by the function.
matvecmult(A, x)
[,1]
[1,] 49 # 1*4 + 3*5 + 5*6
[2,] 64 # 2*4 + 4*5 + 6*6
To run this function, you first have to define it in your session (run or source its code) and then run these three lines in order:
A <- matrix(c(1,2,3,4,5,6), nrow = 2, ncol = 3)
x <- c(4,5,6)
matvecmult(A, x)
This function is designed to return the product of a matrix A with a vector x; i.e. the result will be the matrix product A x (where - as is usual in R, the vector is a column vector). An example should make things clear.
# define a matrix
mymatrix <- matrix(sample(12), nrow = 4)
# see what the matrix looks like
mymatrix
# [,1] [,2] [,3]
# [1,] 2 10 9
# [2,] 3 1 12
# [3,] 11 7 5
# [4,] 8 4 6
# define a vector where multiplication of our matrix times the vector will be defined
vec3 <- c(-1,0,1)
# apply the function to our matrix and vector
result <- matvecmult(mymatrix, vec3)
result
# [,1]
# [1,] 7
# [2,] 9
# [3,] -6
# [4,] -2
class(result)
# [1] "matrix"
So matvecmult(mymatrix, vec3) is how you would call this function, and the result is an n by 1 matrix, where n is the number of rows in the matrix argument.
You can also get some insight by playing around and seeing what happens when you pass something other than a matrix-vector pair where the product is defined. In some cases, you will get an error; sometimes you get nonsense; and sometimes you get something you might not expect just from the function name. See what happens when you call matvecmult(mymatrix, mymatrix).
The function calculates the product of a matrix and a column vector. It assumes that the number of columns of the matrix is equal to the number of elements in the vector.
It stores the number of columns of A in n and the number of rows in m.
It then initializes a single-column matrix of m rows with all values set to 0.
It iterates along the rows of A and multiplies each value in each row with the corresponding value in x.
The sum for each row is then stored in y, and finally the function returns the single-column matrix y.
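To see that the loops compute an ordinary matrix-vector product, the result can be compared with R's built-in %*% operator. This is a sketch using the function from the question (restyled with <-) and the A and x from the earlier example.

```r
# The function from the book, reproduced so the example is self-contained
matvecmult <- function(A, x) {
  m <- nrow(A)
  n <- ncol(A)
  y <- matrix(0, nrow = m)
  for (i in 1:m) {
    sumvalue <- 0
    for (j in 1:n) {
      sumvalue <- sumvalue + A[i, j] * x[j]
    }
    y[i] <- sumvalue
  }
  y
}

A <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
x <- c(4, 5, 6)

by_hand  <- matvecmult(A, x)
built_in <- A %*% x  # R's matrix multiplication operator

all.equal(by_hand, built_in)
```

One difference is worth noting: %*% checks dimensions, so A %*% c(4, 5) stops with a "non-conformable arguments" error, whereas matvecmult(A, c(4, 5)) reads the missing x[3] as NA and returns NA values.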
