How to identify coordinates in R

I would like to identify which coordinate of my vector gives me the greatest value. For a simple example suppose that:
x <- c(10,22,20,18,5)
The greatest value is 22, but how can I automatically identify that coordinate 2 holds the greatest value?
Thanks!

which.max is your friend, as pointed out by @Hong Ooi:
> x <- c(10,22,20,18,5)
> which.max(x)
[1] 2
Another (less optimal) way is a combination of which and max:
> which(x==max(x))
[1] 2

First, find the greatest value with max:
> max(x)
[1] 22
From there, you can figure out which value(s) in the vector match the greatest value:
> x==max(x)
[1] FALSE TRUE FALSE FALSE FALSE
which() can be used to translate the boolean vector into indices:
> which(x==max(x))
[1] 2
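The two approaches differ when the maximum occurs more than once: which.max returns only the first position, while which(x == max(x)) returns all of them. A small sketch, using a made-up vector with a tie:

```r
# Hypothetical vector in which the maximum (22) appears twice
x <- c(10, 22, 20, 22, 5)

first_max <- which.max(x)        # position of the first maximum only
all_max   <- which(x == max(x))  # positions of every maximum

print(first_max)
print(all_max)
```

Which one is appropriate depends on whether ties matter for the task at hand.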

Because you say co-ordinates, I am assuming the case in point may not always be a one-dimensional vector, so I am turning my comment to @Jilber into an answer.
A general answer is to use which(x == max(x), arr.ind = TRUE). This gives you the indices along every dimension of an array of any dimensionality. For example:
R> x <- array(runif(8), dim=rep_len(2, 3))
R> x
, , 1
          [,1]      [,2]
[1,] 0.3202624 0.7740697
[2,] 0.9374742 0.2370483
, , 2
          [,1]      [,2]
[1,] 0.9423731 0.2099402
[2,] 0.7035772 0.8195685
R> which(x == max(x), arr.ind=TRUE)
     dim1 dim2 dim3
[1,]    1    1    2
R> which(x[1, , ] == max(x[1, , ]), arr.ind=TRUE)
     row col
[1,]   1   2
R> which(x[1, 1, ] == max(x[1, 1, ]), arr.ind=TRUE)
[1] 2
For the specific case of one-dimensional vectors, which.max is a 'faster' solution.
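If only the first maximum is needed, which.max can still be used on an array by converting its linear index back into array coordinates with arrayInd. This is a sketch on a small fixed array; the values are made up so that the result is reproducible:

```r
# Made-up 2 x 2 x 2 array with a single maximum (9)
x <- array(c(3, 9, 1, 7, 5, 2, 8, 4), dim = c(2, 2, 2))

lin    <- which.max(x)            # linear (column-major) index of the first maximum
coords <- arrayInd(lin, dim(x))   # one row holding the index along each dimension

# The general form from the answer above, for comparison
same <- which(x == max(x), arr.ind = TRUE)

print(coords)
print(same)
```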

Related

Replacing pair of element of symmetric matrix with NA

I have a positive definite symmetric matrix. Pasting the matrix generated using the following code:
library(clusterGeneration)  # provides genPositiveDefMat()
set.seed(123)
m <- genPositiveDefMat(
  dim = 3,
  covMethod = "unifcorrmat",
  rangeVar = c(0, 1))
x <- as.matrix(m$Sigma)
diag(x) <- 1
x
#Output
[,1] [,2] [,3]
[1,] 1.0000000 -0.2432303 -0.4110525
[2,] -0.2432303 1.0000000 -0.1046602
[3,] -0.4110525 -0.1046602 1.0000000
Now, I want to run the matrix through iterations and in each iteration I want to replace the symmetric pair with NA. For example,
Iteration 1:
x[1,2] = x[2,1] <- NA
Iteration 2:
x[1,3] = x[3,1] <- NA
and so on....
My idea was to check using a for loop
Prototype:
for( r in 1:nrow(x)
for( c in 1:ncol(x)
if x[r,c]=x[c,r]<-NA
else
x[r,c]
The issue with my code is that for row 1 and column 1 the values are equal, hence it sets the entry to 0 (which is wrong). Also, the moment a value is not NA, it exits the loop.
Appreciate any help here.
Thanks
If you need the replacement done iteratively, you can use the indices of the elements selected by upper.tri(x)/lower.tri(x) to do the replacements pair by pair. That lets you pass the result to a function before/after each replacement, e.g. (with mat being your matrix x):
idx <- which(lower.tri(mat), arr.ind=TRUE)
sel <- cbind(
  replace(mat, , seq_along(mat))[ idx ],
  replace(mat, , seq_along(mat))[ idx[,2:1] ]
)
# [,1] [,2]
#[1,] 2 4 ##each row represents the lower/upper pair
#[2,] 3 7
#[3,] 6 8
for( i in seq_len(nrow(sel)) ) {
  mat[ sel[i,] ] <- NA
  print(mean(mat, na.rm=TRUE))
}
#[1] 0.2812249
#[1] 0.5581359
#[1] 1
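The same pair-by-pair replacement can also be sketched with explicit row/column indices, which some readers may find easier to follow than linear indices. The matrix values below are made up and only stand in for x; na_counts is a hypothetical bookkeeping vector added to show the progress of the loop:

```r
# Small symmetric matrix with made-up values (stands in for x above)
x <- matrix(c( 1.0, -0.2, -0.4,
              -0.2,  1.0, -0.1,
              -0.4, -0.1,  1.0), nrow = 3, byrow = TRUE)

# One row per pair (i, j) with i < j, so the diagonal is never touched
pairs <- which(upper.tri(x), arr.ind = TRUE)

na_counts <- integer(nrow(pairs))
for (k in seq_len(nrow(pairs))) {
  i <- pairs[k, "row"]
  j <- pairs[k, "col"]
  x[i, j] <- NA                  # element above the diagonal
  x[j, i] <- NA                  # its mirror image below the diagonal
  na_counts[k] <- sum(is.na(x))  # two more NAs after every iteration
}

print(na_counts)
print(x)
```

Restricting the loop to the upper triangle avoids both problems mentioned in the question: the diagonal is skipped, and each symmetric pair is visited exactly once.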

Find closest value with condition

I have a function that finds me the nearest values for each row in a matrix. It then reports a list with an index of the nearest rows. However, I want it to exclude values if they are +1 in the first AND +1 in the second column away from a particular set of values (-1 in the first and -1 in the second column should also be removed). Moreover, +1 in first column and -1 in second column with respect to the values of interest should also be avoided.
As an example, if I want things closest to c(2, 1), it should accept c(3, 1), c(2, 2) or c(1, 1), but NOT c(3, 2) and not c(1, 0).
Basically, for an output to be reported either column 1 or column 2 should be a value of 1 away from a row of interest, but not both.
The input looks like this:
x
v1 v2
[1,] 3 1
[2,] 2 1
[3,] 3 2
[4,] 1 2
[5,] 8 5
myfunc(x)
The output looks like this. Notice that the closest rows reported for row 2 ($V2 in the output) are rows 1, 3 and 4. The answer should only be 1, though.
$V1
[1] 2 3
$V2
[1] 1 3 4
$V3
[1] 1 2
$V4
[1] 2
$V5
integer(0)
Here is myfunc
myfunc = function(t){
  d1 <- dist(t[,1])
  d2 <- dist(t[,2])
  dF <- as.matrix(d1) <= 1 & as.matrix(d2) <= 1
  diag(dF) <- NA
  colnames(dF) <- NULL
  dF2 <- lapply(as.data.frame(dF), which)
  return(dF2)
}
Basically, the rows that you want to find should differ from your reference element by +1 or -1 in one column and be identical in the other column. That means that the sum over the absolute values of the differences is exactly one. For your example c(2, 1), this works as follows:
c(3, 1): difference is c(1, 0), thus sum(abs(c(1, 0))) = 1 + 0 = 1
c(1, 1): difference is c(-1, 0), thus sum(abs(c(-1, 0))) = 1 + 0 = 1
etc.
The following function checks exactly this:
myfunc <- function(x) {
  do_row <- function(r) {
    r_mat <- matrix(rep(r, length = length(x)), ncol = ncol(x), byrow = TRUE)
    abs_dist <- abs(r_mat - x)
    return(which(rowSums(abs_dist) == 1))
  }
  return(apply(x, 1, do_row))
}
do_row() does the job for a single row, and then apply() is used to do this with each row. For your example, I get:
myfunc(x)
## [[1]]
## [1] 2 3
##
## [[2]]
## [1] 1
##
## [[3]]
## [1] 1
##
## [[4]]
## integer(0)
##
## [[5]]
## integer(0)
Using sweep(), one can write a shorter function:
myfunc2 <- function(x) {
apply(x, 1, function(r) which(rowSums(abs(sweep(x, 2, r))) == 1))
}
But this seems harder to understand, and it turns out to be slower by about a factor of two for your matrix x. (I have also tried it with a large matrix, and there the efficiency seems about the same.)
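The "sum of absolute differences" described above is the Manhattan distance, which dist() can compute for all pairs of rows at once. The following is a sketch of that alternative, not the method used in the answer; myfunc3 is a hypothetical name, and the equivalence with "exactly one column differs by 1" relies on the values being integers, as in the example.

```r
# Hypothetical alternative: pairwise Manhattan distances via dist()
myfunc3 <- function(x) {
  d <- as.matrix(dist(x, method = "manhattan"))
  # For each row, keep the rows at distance exactly 1
  # (the diagonal is 0, so a row never matches itself)
  lapply(seq_len(nrow(d)), function(i) unname(which(d[i, ] == 1)))
}

# The example matrix from the question
x <- matrix(c(3, 1,
              2, 1,
              3, 2,
              1, 2,
              8, 5), ncol = 2, byrow = TRUE)

res <- myfunc3(x)
print(res)
```

Because dist() stores every pairwise distance, this trades memory for brevity and may not suit very large matrices.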

Check if all rows are the same in a matrix

I'm looking to see if sample(..., replace=T) results in sampling the same row n times. I see the duplicated function checks if elements are repeated by returning a logical vector for each index, but I need to see if one element is repeated n times (single boolean value). What's the best way to go about this?
Here's just an example. Some function on this matrix should return TRUE
t(matrix(c(rep(c(rep(4,1),rep(5,1)),8)),nrow=2,ncol=8))
[,1] [,2]
[1,] 4 5
[2,] 4 5
[3,] 4 5
[4,] 4 5
[5,] 4 5
[6,] 4 5
[7,] 4 5
[8,] 4 5
Here is one solution that works to produce the true/false result you are looking for:
m <- t(matrix(c(rep(c(rep(4,1),rep(5,1)),8)),nrow=2,ncol=8))
apply(m, 2, function(x) length(unique(x)) == 1)
[1] TRUE TRUE
m <- rbind(m, c(4, 6))
apply(m, 2, function(x) length(unique(x)) == 1)
[1] TRUE FALSE
If you want a single boolean value saying whether every column contains just one distinct value, you can do:
all(apply(m, 2, function(x) length(unique(x)) == 1))
[1] FALSE
A little cleaner looking (and easier to tell what the code is doing):
m <- t(matrix(c(rep(c(rep(4,1),rep(5,1)),8)),nrow=2,ncol=8))
apply(m, 2, function(x) all(x==x[1]))
[1] TRUE TRUE
Think I've got my solution.
B <- t(matrix(c(rep(c(rep(4,1),rep(5,1)),8)),nrow=2,ncol=8))
length(table(B)) == ncol(B)
[1] TRUE
B <- rbind(B,c(4,6)) # different sample
length(table(B)) == ncol(B)
[1] FALSE
We could also replicate the first row, compare with the original matrix, get the colSums and check whether it is equal to nrow of 'm'
colSums(m[1,][col(m)]==m)==nrow(m)
[1] TRUE TRUE
Or another option would be to check the variance
!apply(m, 2, var)
#[1] TRUE TRUE
For a matrix you can apply unique directly to your matrix, no need for apply:
nrow(unique(m)) == 1L
[1] TRUE
nrow(unique(rbind(m, c(6,7)))) == 1L
[1] FALSE
From the documentation ?unique:
The array method calculates for each element of the dimension specified by MARGIN if the remaining dimensions are identical to those for an earlier element (in row-major order). This would most commonly be used for matrices to find unique rows (the default) or columns (with MARGIN = 2).
Alternatively you can transpose your matrix and leverage vectorized comparison:
all(m[1,] == t(m))
[1] TRUE
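The approaches in this thread are not all equivalent. As a sketch with a made-up matrix: counting distinct values with table() ignores which column a value sits in, so it can report TRUE for rows that differ, while the row-wise checks do not.

```r
# Made-up matrix: two different rows built from the same two values
B <- rbind(c(4, 5), c(5, 4))

table_check     <- length(table(B)) == ncol(B)  # counts distinct values only
unique_check    <- nrow(unique(B)) == 1L        # compares whole rows
first_row_check <- all(B[1, ] == t(B))          # compares every row with row 1

print(c(table = table_check, unique = unique_check, first_row = first_row_check))
```

For the sampling use case in the question, a row-wise check such as nrow(unique(m)) == 1L therefore looks like the safer choice.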

Why am I not getting the index value of a matrix using the which function?

I have a matrix of thousands of coordinate values. And all I want is to find the index of a chosen value. I use this:
which(long == -118.1123, arr.ind=TRUE)
But I don't get any value. All I get is blank row and column. However, when I do this, I get values.
which(long < -118.1123, arr.ind=TRUE)
I know this value exists because I have manually checked in the Rstudio pane as well as printed out the value using long[1,2] etc.
dput(long) doesn't work with matrices. Hope you can help.
Diagnosis as per comments:
> long[1,1]
[1] -118.0981
> long[1,1]==-118.0981
[1] FALSE
Here's an example of using a test that allows a "fuzz-factor" difference to be ignored:
> M <- matrix(rnorm(10) , 5,2)
> M
[,1] [,2]
[1,] -0.2382021 2.1698010
[2,] -1.1617644 -1.1513516
[3,] 1.3597808 0.9365208
[4,] 0.7460694 -1.7216410
[5,] -0.2413117 -0.1780468
> which(M==-0.2382021, arr.ind=TRUE)
row col
> which(abs(M - -0.2382021) < 0.0000001, arr.ind=TRUE)
row col
[1,] 1 1
My comment suggesting all.equal didn't work with a matrix argument inside which. I checked @RHertel's choice of signif and it does succeed.
> which(signif(M,7) == -0.2382021, arr.ind=TRUE)
row col
[1,] 1 1
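The tolerance test can be tried on a matrix of coordinates as well. This is only a sketch: the matrix values and the tolerance of 1e-4 are made up for illustration and would need to be chosen to suit the real data.

```r
# Made-up longitudes; the second entry differs from -118.1123
# only beyond the seventh significant digit
long <- matrix(c(-118.0981234, -118.1123004,
                 -118.2000000, -118.3000000), nrow = 2)
target <- -118.1123

exact_hits <- which(long == target, arr.ind = TRUE)              # no rows
near_hits  <- which(abs(long - target) < 1e-4, arr.ind = TRUE)   # row 2, col 1

print(exact_hits)
print(near_hits)
```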
Your numerical value in the matrix may not be exactly equal to -118.1123. It may contain several digits that aren't displayed since the machine's accuracy is much higher and a perfect identity may not be obtained due to minimal roundoff errors.
I suggest that you try
which(signif(long,7) == -118.1123, arr.ind=TRUE)
Here's a simple example illustrating the problem:
Let's first fill a vector with random numbers between 0 and 1:
v <- runif(100)
Then we redefine an arbitrary element of the vector, say element 42, and set it equal to pi:
v[42] <- pi
> v[42]
#[1] 3.141593
However, if we test for equality using "==" the result is FALSE:
> v[42] == 3.141593
#[1] FALSE
But if we only consider the first seven significant digits, the position 42 can be extracted from the vector:
> which(signif(v,7) == 3.141593, arr.ind = T)
#[1] 42
Now, to address your comment, let's assume that you're trying to find the number that is closest to 3.141591 among all your elements in the vector v. We can be certain that this will be element 42, but R should find out. This result can be obtained with
> order(abs(v-3.141591))[1]
#[1] 42
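For the nearest-value lookup, which.min gives the same position without sorting the whole vector. A short sketch under the same setup (the seed is arbitrary and only there for reproducibility):

```r
set.seed(1)        # arbitrary seed
v <- runif(100)    # values between 0 and 1
v[42] <- pi        # the only element anywhere near 3.14

target <- 3.141591
nearest_order <- order(abs(v - target))[1]     # approach from the answer
nearest_min   <- which.min(abs(v - target))    # same result, no full sort

print(c(nearest_order, nearest_min))
```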

R - min, max and mean of off-diagonal elements in a matrix

I have a matrix in R and I want to get:
Max of the off-diagonal elements
Min of the off-diagonal elements
Mean of the off-diagonal elements
For the diagonal I used max(diag(A)), min(diag(A)), mean(diag(A)) and it worked just fine.
But for the off-diagonal I tried
dataD <- subset(A, V1!=V2)
Error in subset.matrix(A, V1 != V2) : object 'V1' not found
to use:
colMeans(dataD) # get the mean for columns
but I cannot get dataD b/c it says object 'V1' not found
Thanks!
Here the row() and col() helper functions are useful. Using @James' matrix A, we can get the upper off-diagonal using this little trick:
> A[row(A) == (col(A) - 1)]
[1] 5 10 15
and the lower off diagonal via this:
> A[row(A) == (col(A) + 1)]
[1] 2 7 12
These can be generalised to give whatever diagonals you want:
> A[row(A) == (col(A) - 2)]
[1] 9 14
and don't require any subsetting.
Then it is a simple matter of calling whatever function you want on these values. E.g.:
> mean(A[row(A) == (col(A) - 1)])
[1] 10
If as per my comment you mean everything but the diagonal, then use
> diag(A) <- NA
> mean(A, na.rm = TRUE)
[1] 8.5
> max(A, na.rm = TRUE)
[1] 15
> # etc. using sum(A, na.rm = TRUE), min(A, na.rm = TRUE), etc..
So this doesn't get lost, Ben Bolker suggests (in the comments) that the above code block can be done more neatly using the row() and col() functions I mentioned above:
mean(A[row(A)!=col(A)])
min(A[row(A)!=col(A)])
max(A[row(A)!=col(A)])
sum(A[row(A)!=col(A)])
which is a nicer solution all round.
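Put together as a self-contained sketch, using the same 4 x 4 example matrix (the names off and stats are introduced here for illustration):

```r
A <- matrix(1:16, 4)

# Logical mask that is TRUE everywhere except on the main diagonal
off <- A[row(A) != col(A)]

stats <- c(min = min(off), max = max(off), mean = mean(off), sum = sum(off))
print(stats)
```

Unlike the diag(A) <- NA variant, this leaves A itself unchanged.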
In one simple line of code:
For a matrix A if you wish to find the Minimum, 1st Quartile, Median, Mean, 3rd Quartile and Maximum of the upper and lower off diagonals:
summary(c(A[upper.tri(A)],A[lower.tri(A)]))
The diag of a suitably subsetted matrix will give you the off-diagonals. For example:
A <- matrix(1:16,4)
#upper off-diagonal
diag(A[-4,-1])
[1] 5 10 15
#lower off-diagonal
diag(A[-1,-4])
[1] 2 7 12
To get a vector holding the max of the off-diagonal elements of each col or row of a matrix requires a few more steps. I was directed here when searching for help on that. Perhaps others will do the same, so I offer this solution, which I found using what I learned here.
The trick is to create a matrix of only the off-diagonal elements. Consider:
> A <- matrix(c(10,2,3, 4,10,6, 7,8,10), ncol=3)
> A
[,1] [,2] [,3]
[1,] 10 4 7
[2,] 2 10 8
[3,] 3 6 10
> apply(A, 2, max)
[1] 10 10 10
Subsetting using the suggested indexing, A[row(A)!=col(A)] produces a vector of off-diagonal elements, in column-order:
> v <- A[row(A)!=col(A)]
> v
[1] 2 3 4 6 7 8
Returning this to a matrix allows the use of apply() to apply a function of choice to a margin of only off-diagonal elements. Using the max function as an example:
> A.off <- matrix(v, ncol=3)
> A.off
[,1] [,2] [,3]
[1,] 2 4 7
[2,] 3 6 8
> v <- apply(A.off, 2, max)
> v
[1] 3 6 8
The whole operation can be compactly—and rather cryptically—coded in one line:
> v <- apply(matrix(A[row(A)!=col(A)], ncol=ncol(A)), 2, max)
> v
[1] 3 6 8
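An alternative sketch that avoids the reshaping step is to blank out the diagonal on a copy and let apply() skip the NAs. It also works per row, which the reshaping trick does not, because the off-diagonal vector is laid out column by column. The name B is introduced here for illustration:

```r
A <- matrix(c(10, 2, 3, 4, 10, 6, 7, 8, 10), ncol = 3)

B <- A
diag(B) <- NA   # work on a copy so A stays intact

col_max <- apply(B, 2, max, na.rm = TRUE)   # per-column off-diagonal maximum
row_max <- apply(B, 1, max, na.rm = TRUE)   # per-row off-diagonal maximum

print(col_max)
print(row_max)
```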
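Just multiply matrix A elementwise by 1 - diag(n), where n is the number of rows; this sets the diagonal to zero.
For example, if A is a 4x4 matrix, then
sum(A * (1 - diag(4))) or, more generally, sum(A * (1 - diag(nrow(A))))
This is faster when you need to run the same line of code multiple times. Note that it suits sum(): mean() would still count the zeroed diagonal entries, and min()/max() can pick up the zeros.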
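In addition to James' answer: for his example matrix A <- matrix(1:16, 4), A[-diag(A)] happens to drop the diagonal, but only because the diagonal values (1, 6, 11, 16) coincide with their own linear positions. diag(A) returns values, not positions, so for a general matrix it is safer to exclude the diagonal by position. For example, consider:
summary(A[row(A) != col(A)])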
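A quick check of how A[-diag(A)] behaves on a matrix whose diagonal values are not their own positions (the 3 x 3 matrix is made up for illustration):

```r
A <- matrix(c(10, 2, 3, 4, 10, 6, 7, 8, 10), ncol = 3)

# diag(A) is c(10, 10, 10); as a negative index it asks R to drop
# element 10 of a 9-element matrix, which is silently ignored
by_value <- A[-diag(A)]

# Indexing by position drops exactly the three diagonal entries
by_position <- A[row(A) != col(A)]

print(length(by_value))
print(by_position)
```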

Resources