I'm looking to see if sample(..., replace=T) results in sampling the same row n times. I see that the duplicated function flags repeated elements by returning a logical vector with one entry per index, but I need a single boolean value telling me whether one element is repeated n times. What's the best way to go about this?
Here's just an example. Some function on this matrix should return TRUE
t(matrix(c(rep(c(rep(4,1),rep(5,1)),8)),nrow=2,ncol=8))
[,1] [,2]
[1,] 4 5
[2,] 4 5
[3,] 4 5
[4,] 4 5
[5,] 4 5
[6,] 4 5
[7,] 4 5
[8,] 4 5
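For context, a minimal sketch of the sampling situation described above; the names dat, n and s are hypothetical, not from the question:
dat <- matrix(1:20, ncol = 2)                      # hypothetical source matrix
n <- 8
s <- dat[sample(nrow(dat), n, replace = TRUE), ]   # n rows drawn with replacement
# goal: one TRUE/FALSE saying whether the same row of dat was drawn all n times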
Here is one solution that works to produce the true/false result you are looking for:
m <- t(matrix(c(rep(c(rep(4,1),rep(5,1)),8)),nrow=2,ncol=8))
apply(m, 2, function(x) length(unique(x)) == 1)
[1] TRUE TRUE
m <- rbind(m, c(4, 6))
apply(m, 2, function(x) length(unique(x)) == 1)
[1] TRUE FALSE
If you want a single boolean value saying whether every column contains only a single unique value, you can do:
all(apply(m, 2, function(x) length(unique(x)) == 1))
[1] FALSE
A little cleaner looking (and easier to tell what the code is doing):
m <- t(matrix(c(rep(c(rep(4,1),rep(5,1)),8)),nrow=2,ncol=8))
apply(m, 2, function(x) all(x==x[1]))
[1] TRUE TRUE
Think I've got my solution.
B <- t(matrix(c(rep(c(rep(4,1),rep(5,1)),8)),nrow=2,ncol=8))
length(table(B)) == ncol(B)
[1] TRUE
B <- rbind(B,c(4,6)) # different sample
length(table(B)) == ncol(B)
[1] FALSE
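One thing to watch, though: this relies on the distinct values landing one per column. A hedged illustration of where it can mislead, reusing the same construction:
B2 <- t(matrix(rep(c(4, 4), 8), nrow = 2, ncol = 8))   # every row is c(4, 4)
length(table(B2)) == ncol(B2)
# [1] FALSE, even though the same row really is repeated every time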
We could also replicate the first row, compare it with the original matrix, take the colSums, and check whether each is equal to nrow(m):
colSums(m[1,][col(m)]==m)==nrow(m)
[1] TRUE TRUE
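To see what the indexing does: col(m) is a matrix of column numbers with the same shape as m, so m[1, ][col(m)] picks out the first-row value for every cell of m, ready to be compared with m element by element. A quick look, assuming the m defined above:
col(m)           # column index of every cell of m
m[1, ][col(m)]   # first row of m recycled across all cells of m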
Or another option would be to check the variance
!apply(m, 2, var)
#[1] TRUE TRUE
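For comparison, the same check on a matrix whose second column is not constant (a small sketch reusing the m from above); var() is 0 for a constant column, so ! turns that into TRUE and any positive variance into FALSE:
!apply(rbind(m, c(4, 6)), 2, var)
# [1]  TRUE FALSE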
You can apply unique directly to your matrix, no need for apply:
nrow(unique(m)) == 1L
[1] TRUE
nrow(unique(rbind(m, c(6,7)))) == 1L
[1] FALSE
From the documentation ?unique:
The array method calculates for each element of the dimension specified by MARGIN if the remaining dimensions are identical to those for an earlier element (in row-major order). This would most commonly be used for matrices to find unique rows (the default) or columns (with MARGIN = 2).
Alternatively you can transpose your matrix and leverage vectorized comparison:
all(m[1,] == t(m))
[1] TRUE
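The recycling works because m[1, ] has length nrow(t(m)) (i.e. ncol(m)), so it is compared against each column of t(m), which are the rows of m. A quick check of the FALSE case, assuming the m above:
m2 <- rbind(m, c(4, 6))
all(m2[1, ] == t(m2))
# [1] FALSE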
I would like to tally, row by row, whether each value in a matrix lies within a given range of the other values in that row, and then sum these logical outcomes to derive a "measure of consistency" for each row.
Reproducible example:
m1 <- matrix(c(1,2,1,6,3,7,4,2,6,8,11,15), ncol=4, byrow = TRUE)
# expected outcome, given a range of +/-1 either side
exp.outcome<-matrix(c(TRUE,TRUE,TRUE,FALSE,
TRUE,FALSE,TRUE,TRUE,
FALSE,FALSE,FALSE,FALSE),
ncol=4, byrow=TRUE)
Above I've indicated the expected outcome, for the case where each value lies within a +/-1 range of any other value within that row.
Within the first row of m1 the first value (1) is within +/-1 of any other value in that row hence equals TRUE, and so on.
By contrast, none of the values in the third row of m1 are within 1 of any other value in that row, and hence each is assigned FALSE.
Any pointers would be much appreciated!
Update:
Thanks to the help provided I can now count the unique pairs of values which meet the ceiling criteria for any arbitrarily large matrix (using the binomial coefficient, k draws from n, without replacement).
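For reference, a minimal sketch of one way to do that count on m1; this is my interpretation of the update, using the +/-1 criterion from the example, and the use of combn() is an assumption rather than part of the original answers. choose(ncol(m1), 2) gives the number of unordered pairs per row, and combn() lets you count how many of them meet the criterion:
choose(ncol(m1), 2)   # number of unordered pairs per row: 6
apply(m1, 1, function(x) sum(combn(x, 2, FUN = function(p) abs(p[1] - p[2]) <= 1)))
# one count per row of pairs whose values lie within +/-1 of each other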
Before progressing with the answer I just wanted to clarify that in your question you have said:
Within the first row of m1 the first value (1) is within +/-1 of any
other value in that row hence equals TRUE, and so on.
However,
>> m1[1,4]
[1] 6
6 is not within +/-1 of 1, and indeed the correct result for that element in your expected output is FALSE.
Solution
This solution should get you to the desired results:
t(apply(
X = m1,
# Take each row from the matrix
MARGIN = 1,
FUN = function(x) {
sapply(
X = x,
# Now go through each element of that row
FUN = function(y) {
# Your conditions
y %in% c(x - 1) | y %in% c(x + 1)
}
)
}
))
Results
[,1] [,2] [,3] [,4]
[1,] TRUE TRUE TRUE FALSE
[2,] TRUE FALSE TRUE TRUE
[3,] FALSE FALSE FALSE FALSE
Check
For results stored as res.
>> identical(res, exp.outcome)
[1] TRUE
Here is a kind of neat base R method that uses an array:
The first two lines are setup: they store a three-dimensional array of acceptable values and a matrix that will hold the desired output. The structure of the array is as follows: each column contains the acceptable values for the matrix element in the same column, and the third dimension corresponds to the rows of the matrix.
Pre-allocation in this way should cut down on repeated computations.
# construct array of all +1/-1 values
valueArray <- sapply(1:nrow(m1), function(i) rbind(m1[i,]-1, m1[i,], m1[i,]+1),
simplify="array")
# get logical matrix of correct dimensions
exp.outcome <- matrix(TRUE, nrow(m1), ncol(m1))
# get desired values
for(i in 1:nrow(m1)) {
exp.outcome[i, ] <- sapply(1:ncol(m1), function(j) m1[i, j] %in% c(valueArray[, -j, i]))
}
Which returns
exp.outcome
[,1] [,2] [,3] [,4]
[1,] TRUE TRUE TRUE FALSE
[2,] TRUE FALSE TRUE TRUE
[3,] FALSE FALSE FALSE FALSE
I would like to identify which coordinate of my vector gives me the greatest value. For a simple example suppose that:
x <- c(10,22,20,18,5)
The greatest value is 22, but how can I automatically identify that coordinate 2 has the greatest value?
Thanks!
which.max is your friend, as pointed out by @Hong Ooi:
> x <- c(10,22,20,18,5)
> which.max(x)
[1] 2
Another (not optimal) way is a combination of which and max.
> which(x==max(x))
[1] 2
First, find the greatest value with max:
> max(x)
[1] 22
From there, you can figure out which value(s) in the vector match the greatest value:
> x==max(x)
[1] FALSE TRUE FALSE FALSE FALSE
which() can be used to translate the boolean vector into indices:
which(x==max(x))
[1] 2
Because you say co-ordinates, I am assuming the case in point may not always be a one-dimensional vector, and therefore I am going to make my comment to @Jilber an answer.
A general answer is to use which(x == max(x), arr.ind = TRUE). This will give you the index along every dimension, for an array of any dimensionality. For example:
R> x <- array(runif(8), dim=rep_len(2, 3))
R> x
, , 1
[,1] [,2]
[1,] 0.3202624 0.7740697
[2,] 0.9374742 0.2370483
, , 2
[,1] [,2]
[1,] 0.9423731 0.2099402
[2,] 0.7035772 0.8195685
R> which(x == max(x), arr.ind=TRUE)
dim1 dim2 dim3
[1,] 1 1 2
R> which(x[1, , ] == max(x[1, , ]), arr.ind=TRUE)
row col
[1,] 1 2
R> which(x[1, 1, ] == max(x[1, 1, ]), arr.ind=TRUE)
[1] 2
For the specific case of one-dimensional vectors, which.max is a 'faster' solution.
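A quick, informal way to see the difference on a large vector (timings are machine-dependent; this is just a sketch):
x <- runif(1e7)
system.time(which.max(x))        # single pass over x
system.time(which(x == max(x)))  # two passes plus an intermediate logical vector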
I have a matrix and a vector with values:
mat<-matrix(c(1,1,6,
3,5,2,
1,6,5,
2,2,7,
8,6,1),nrow=5,ncol=3,byrow=T)
vec<-c(1,6)
This is a small subset of an N by N matrix and a 1 by N vector. Is there a way to subset the rows that contain the values in vec?
The most straightforward way of doing this that I know of would be to use the subset function:
subset(mat, mat[,1] == 1 & mat[,2] == 6) # etc, etc
The problem with subset is that you have to specify in advance the column to look in and the specific combination to look for. The problem I am facing is structured such that I want to find all rows containing the numbers in "vec" in any possible way. So in the above example, I want to get a return matrix of:
1,1,6
1,6,5
8,6,1
Any ideas?
You can do
apply(mat, 1, function(x) all(vec %in% x))
# [1] TRUE FALSE TRUE FALSE TRUE
but this may give you unexpected results if vec contains repeated values:
vec <- c(1, 1)
apply(mat, 1, function(x) all(vec %in% x))
# [1] TRUE FALSE TRUE FALSE TRUE
so you would have to use something more complicated using table to account for repetitions:
vec <- c(1, 1)
is.sub.table <- function(table1, table2) {
all(names(table1) %in% names(table2)) &&
all(table1 <= table2[names(table1)])
}
apply(mat, 1, function(x)is.sub.table(table(vec), table(x)))
# [1] TRUE FALSE FALSE FALSE FALSE
However, if the vector length is equal to the number of columns in your matrix (as you seem to indicate, though it is not the case in your example), you can simply do:
vec <- c(1, 6, 1)
apply(mat, 1, function(x) all(sort(vec) == sort(x)))
# [1] TRUE FALSE FALSE FALSE FALSE
I have a matrix in R and I want to get:
Max off-diagonal elements
Min off-diagonal elements
Mean off-diagonal elements
For the diagonal I used max(diag(A)), min(diag(A)), mean(diag(A)) and that worked just fine.
But for the off-diagonal elements I tried
dataD <- subset(A, V1!=V2)
Error in subset.matrix(A, V1 != V2) : object 'V1' not found
to use:
colMeans(dataD) # get the mean for columns
but I cannot get dataD because it says object 'V1' not found
Thanks!
Here the row() and col() helper functions are useful. Using the A from @James' answer (A <- matrix(1:16, 4)), we can get the upper off-diagonal using this little trick:
> A[row(A) == (col(A) - 1)]
[1] 5 10 15
and the lower off diagonal via this:
> A[row(A) == (col(A) + 1)]
[1] 2 7 12
These can be generalised to give whatever diagonals you want:
> A[row(A) == (col(A) - 2)]
[1] 9 14
and don't require any subsetting.
Then it is a simple matter of calling whatever function you want on these values. E.g.:
> mean(A[row(A) == (col(A) - 1)])
[1] 10
If as per my comment you mean everything but the diagonal, then use
> diag(A) <- NA
> mean(A, na.rm = TRUE)
[1] 8.5
> max(A, na.rm = TRUE)
[1] 15
> # etc. using sum(A, na.rm = TRUE), min(A, na.rm = TRUE), etc..
So this doesn't get lost, Ben Bolker suggests (in the comments) that the above code block can be done more neatly using the row() and col() functions I mentioned above:
mean(A[row(A)!=col(A)])
min(A[row(A)!=col(A)])
max(A[row(A)!=col(A)])
sum(A[row(A)!=col(A)])
which is a nicer solution all round.
In one simple line of code: for a matrix A, if you wish to find the Minimum, 1st Quartile, Median, Mean, 3rd Quartile and Maximum of the upper and lower off-diagonal elements:
summary(c(A[upper.tri(A)], A[lower.tri(A)]))
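For instance, with the A <- matrix(1:16, 4) used in the other answers, this summarizes the 12 off-diagonal values:
A <- matrix(1:16, 4)
summary(c(A[upper.tri(A)], A[lower.tri(A)]))
# Min 2, 1st Qu 4.75, Median 8.5, Mean 8.5, 3rd Qu 12.25, Max 15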
The diag of a suitably subsetted matrix will give you the off-diagonals. For example:
A <- matrix(1:16,4)
#upper off-diagonal
diag(A[-4,-1])
[1] 5 10 15
#lower off-diagonal
diag(A[-1,-4])
[1] 2 7 12
To get a vector holding the max of the off-diagonal elements of each col or row of a matrix requires a few more steps. I was directed here when searching for help on that. Perhaps others will do the same, so I offer this solution, which I found using what I learned here.
The trick is to create a matrix of only the off-diagonal elements. Consider:
> A <- matrix(c(10,2,3, 4,10,6, 7,8,10), ncol=3)
> A
[,1] [,2] [,3]
[1,] 10 4 7
[2,] 2 10 8
[3,] 3 6 10
> apply(A, 2, max)
[1] 10 10 10
Subsetting using the suggested indexing, A[row(A)!=col(A)] produces a vector of off-diagonal elements, in column-order:
> v <- A[row(A)!=col(A)]
> v
[1] 2 3 4 6 7 8
Returning this to a matrix allows the use of apply() to apply a function of choice to a margin of only off-diagonal elements. Using the max function as an example:
> A.off <- matrix(v, ncol=3)
> A.off
[,1] [,2] [,3]
[1,] 2 4 7
[2,] 3 6 8
> v <- apply(A.off, 2, max)
> v
[1] 3 6 8
The whole operation can be coded compactly, if rather cryptically, in one line:
> v <- apply(matrix(A[row(A)!=col(A)], ncol=ncol(A)), 2, max)
> v
[1] 3 6 8
Just multiply the matrix A by 1 - diag(n), where n is the number of rows.
For example, if A is a 4x4 matrix, then
mean(A * (1 - diag(4)))   # or, in general, A * (1 - diag(nrow(A)))
Note that this zeroes the diagonal rather than removing it, so the zeroed entries still count towards the mean.
This is faster when you need to run the same line of code multiple times
In addition to James' answer, I want to add that you can directly exclude all diagonal elements of a square matrix by dropping their linear indices, which are seq(1, length(A), by = nrow(A) + 1). For example, consider:
summary(A[-seq(1, length(A), by = nrow(A) + 1)])
(Note that A[-diag(A)] would not do this: diag(A) returns the diagonal values, not their positions.)
I can't believe this is taking me this long to figure out, and I still can't figure it out.
I need to keep a collection of vectors, and later check that a certain vector is in that collection. I tried lists combined with %in% but that doesn't appear to work properly.
My next idea was to create a matrix and rbind vectors to it, but now I don't know how to check if a vector is contained in a matrix. %in% appears to compare sets and not exact rows. The same appears to apply to intersect.
Help much appreciated!
Do you mean like this:
wantVec <- c(3,1,2)
myList <- list(A = c(1:3), B = c(3,1,2), C = c(2,3,1))
sapply(myList, function(x, want) isTRUE(all.equal(x, want)), wantVec)
## or, is the vector in the set?
any(sapply(myList, function(x, want) isTRUE(all.equal(x, want)), wantVec))
We can do a similar thing with a matrix:
myMat <- matrix(unlist(myList), ncol = 3, byrow = TRUE)
## As the vectors are now in the rows, we use apply over the rows
apply(myMat, 1, function(x, want) isTRUE(all.equal(x, want)), wantVec)
## or
any(apply(myMat, 1, function(x, want) isTRUE(all.equal(x, want)), wantVec))
Or by columns:
myMat2 <- matrix(unlist(myList), ncol = 3)
## As the vectors are now in the cols, we use apply over the cols
apply(myMat, 2, function(x, want) isTRUE(all.equal(x, want)), wantVec)
## or
any(apply(myMat, 2, function(x, want) isTRUE(all.equal(x, want)), wantVec))
If you need to do this a lot, write your own function
vecMatch <- function(x, want) {
isTRUE(all.equal(x, want))
}
And then use it, e.g. on the list myList:
> sapply(myList, vecMatch, wantVec)
A B C
FALSE TRUE FALSE
> any(sapply(myList, vecMatch, wantVec))
[1] TRUE
Or even wrap the whole thing:
vecMatch <- function(x, want) {
out <- sapply(x, function(x, want) isTRUE(all.equal(x, want)), want)
any(out)
}
> vecMatch(myList, wantVec)
[1] TRUE
> vecMatch(myList, 5:3)
[1] FALSE
EDIT: A quick comment on why I wrapped isTRUE() around the all.equal() calls. This is because, when the two arguments are not equal, all.equal() doesn't return a logical value (FALSE) but a character description of the difference:
> all.equal(1:3, c(3,2,1))
[1] "Mean relative difference: 1"
isTRUE() is useful here because it returns TRUE iff its argument is TRUE, and FALSE otherwise.
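A one-line illustration of the point, using the same comparison as above:
isTRUE(all.equal(1:3, c(3, 2, 1)))
# [1] FALSE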
> M
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
v <- c(2, 5, 8)
check each column:
c1 <- which(M[, 1] == v[1])
c2 <- which(M[, 2] == v[2])
c3 <- which(M[, 3] == v[3])
Here is a way to still use intersect() on more than two sets:
> intersect(intersect(c1, c2), c3)
[1] 2
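If there are more than three columns, the same idea generalizes without manual nesting; a small sketch using Reduce(), assuming the M and v above:
Reduce(intersect, lapply(seq_along(v), function(j) which(M[, j] == v[j])))
# [1] 2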