In R, I want to count how often a value occurs in one column of a matrix, but only in rows where a certain value occurs in another column. To clarify, consider this matrix:
MAT <- matrix(nrow=5,ncol=2, c(1,0,1,1,2,1,1,1,2,0))
The matrix looks like this:
> MAT
[,1] [,2]
[1,] 1 1
[2,] 0 1
[3,] 1 1
[4,] 1 2
[5,] 2 0
I would like to find the number of 1s in column 2, but only in rows where column 1 is 0. The only function I know of that does something similar is table, but I don't think it can check another column; it can only exclude values in the data being checked. (Please correct me if I am wrong.) I have tried searching online, but I only find unrelated problems.
Can anyone help me find a function for this problem?
You can do something like this:
sum(MAT[,2]==1 & MAT[,1]==0)
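You can always subset the matrix with a condition like this:
MAT[ MAT[,1] == 0, ]
and then tabulate column 2 of the matching rows (tabulating the whole subset would also count the values from column 1):
table( MAT[ MAT[,1] == 0, 2] )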
This gives you the matching row numbers:
which(MAT[,1]==0 & MAT[,2]==1)
The length of that vector, length(which(MAT[,1]==0 & MAT[,2]==1)), is the number of times the pattern occurs.
You can use table:
table(MAT[,2]==1 & MAT[,1]==0)
FALSE TRUE
4 1
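On whether table can check another column: given two vectors, table() cross-tabulates them, so every (column 1, column 2) combination is counted at once and the wanted count can be read off a single cell. A minimal sketch using the question's MAT; the names tab, col1 and col2 are illustrative only.

```r
# MAT as defined in the question (filled column by column)
MAT <- matrix(nrow = 5, ncol = 2, c(1, 0, 1, 1, 2, 1, 1, 1, 2, 0))

# Cross-tabulate column 1 against column 2: each cell counts the rows
# that have that particular (column 1, column 2) combination
tab <- table(col1 = MAT[, 1], col2 = MAT[, 2])
tab

# Rows where column 1 is 0 and column 2 is 1
tab["0", "1"]  # 1
```

The same table also gives every other combination, e.g. tab["1", "1"] is 2.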
Related
I'm trying to learn how to use the apply() functions.
Suppose we have a 3-row, 2-column matrix test <- matrix(c(1,2,3,4,5,6), ncol = 2), and we want to cap each element in the first column (1, 2, 3) at 2, for example, so that we end up with a matrix of (1,2,2,4,5,6).
How would one write an apply() function to do this?
Here's my latest attempt: test1 <- apply(test[,1], 2, function(x) {if(x > 2){return(x = 2)} else {return(x)}})
We can use pmin on the first column with 2 as the second argument. The 2 is recycled, so pmin compares elementwise and returns, for each value in the first column, the smaller of that value and 2:
test[,1] <- pmin(test[,1], 2)
Output:
> test
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 2 6
Note that apply needs X to be an array or matrix, i.e. an object with dimensions. When we subset a single column or row, the dimensions are dropped because drop = TRUE by default, so test[,1] is a plain vector and apply fails on it.
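To make the drop point concrete, a minimal sketch: keeping the column as a one-column matrix with drop = FALSE lets apply() run, and MARGIN = c(1, 2) hands the question's if/else one element at a time (with MARGIN = 2 it would receive the whole column, and if() needs a single value). pmin remains the simpler choice.

```r
test <- matrix(c(1, 2, 3, 4, 5, 6), ncol = 2)

dim(test[, 1])                # NULL: the single column was dropped to a plain vector
dim(test[, 1, drop = FALSE])  # 3 1: still a one-column matrix

# Keep the dimensions, and use MARGIN = c(1, 2) so the function receives
# one element at a time, which is what the if/else from the question expects
test[, 1] <- apply(test[, 1, drop = FALSE], c(1, 2), function(x) if (x > 2) 2 else x)
test
```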
If you really want to use the apply() function, I guess you're looking for something like this:
t(apply(test, 1, function(x) c(min(x[1], 2), x[2])))
## [,1] [,2]
## [1,] 1 4
## [2,] 2 5
## [3,] 2 6
But if you want my opinion, akrun's suggestion is definitely better.
So I know that to find the first occurrence of a specific element in each row, you use apply with which.max or which.min. Here is the code I am using right now:
x <- matrix(c(20,9,4,16,6,2,14,3,1),nrow=3)
x
apply(3 >= x,1,which.max )
This produces this output:
[1] 1 3 2
Now when I try to do the same thing on a different matrix "x2"
x2 <- matrix(c(3,9,4,16,6,2,14,3,1),nrow=3)
x2
apply(3 >= x2,1,which.max )
The output is the same:
[1] 1 3 2
But for "x2" this is correct, because the first row of "x2" does contain a value less than or equal to 3.
My question, which is probably simple, is: why does apply produce the same result for "x" and "x2"? For "x" I would want something like:
[1] 0 3 2
Or maybe even something like this:
[1] NA 3 2
I have seen questions on Stack Overflow about which.max not returning NA, and the answer was to use which() instead, but since I am working with a matrix and want the first occurrence, I don't think I have that luxury.
We could replace the values in 'x' that are > 3 with a very small number, e.g. -999 or any value lower than the minimum of the dataset. Then get the index of the maximum of the replaced vector with which.max, and multiply it by a logical index to handle rows that contain only the replacement value. In 'x', every value in the first row is greater than 3, so after replacing them with -999, which.max still returns 1 for that row, whereas we want NA or 0. sum(x1>0) is 0 for the first row; negating it (!) gives TRUE, and negating again gives FALSE. Multiplying by that logical coerces it to 0/1, so the first row becomes 0. (This assumes all values in x are positive. Also, which.max picks the position of the largest remaining value, which matches the first value <= 3 in these rows but not in general: for a row c(1, 3, 5) it returns 2, not 1.)
apply(x, 1, function(x) {x1 <- ifelse(x>3, -999, x)
which.max(x1)*(!!sum(x1>0))})
#[1] 0 3 2
apply(x2, 1, function(x) {x1 <- ifelse(x>3, -999, x)
which.max(x1)*(!!sum(x1>0))})
#[1] 1 3 2
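Another option is to use max.col (with ties.method = "first", because the default "random" breaks ties at random):
x1 <- replace(x, which(x>3), -999)
max.col(x1, ties.method = "first")*!!rowSums(x1>0)
#[1] 0 3 2
x2N <- replace(x2, which(x2>3), -999)
max.col(x2N, ties.method = "first")*!!rowSums(x2N>0)
#[1] 1 3 2
Or a slight modification would be
indx <- x*(x <=3)
max.col(indx, ties.method = "first")*!!rowSums(indx)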
#[1] 0 3 2
Put a column in front of '(3>=x)' that is Inf if and only if all entries in the corresponding row of 'x' are larger than 3, and NaN otherwise. Then apply 'which.max' rowwise (which.max discards NaN values, so that column can only win when it holds Inf), and finally subtract 1 to account for the extra column:
x <- matrix(c(20,9,4,16,6,2,14,3,1),nrow=3)
a <- (!apply(3>=x,1,max))*Inf
apply( cbind(a,3>=x), 1, which.max ) - 1
This gives 0, 3, 2. Here is the extended matrix that 'which.max' is applied to:
> cbind(a,3>=x)
a
[1,] Inf 0 0 0
[2,] NaN 0 0 1
[3,] NaN 0 1 1
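Why both matrices give 1 for the first row: which.max returns the position of the first maximum, and in an all-FALSE row the maximum is FALSE, first found at position 1. A minimal sketch, separate from the answers above, that returns NA in that case; first_le3 and first_le3_which are just illustrative names. It also shows that which() can be used after all, by taking the first element of its result, which is NA when there is no match.

```r
x <- matrix(c(20, 9, 4, 16, 6, 2, 14, 3, 1), nrow = 3)

# which.max() returns the first position of the maximum. In an all-FALSE row
# the maximum is FALSE, found first at position 1, hence the misleading 1.
which.max(c(FALSE, FALSE, FALSE))  # 1

# Return NA when no element of the row is <= 3; otherwise which.max finds
# the first TRUE, i.e. the first position that is <= 3
first_le3 <- apply(3 >= x, 1, function(r) if (any(r)) which.max(r) else NA)
first_le3  # NA 3 2

# Same result with which(): indexing integer(0) with [1] gives NA
first_le3_which <- apply(3 >= x, 1, function(r) which(r)[1])
first_le3_which  # NA 3 2
```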
I was wondering if anyone could help me understand the output of this function. I know it's supposed to return the positions where there is a run of length 2, but I am not sure how to interpret the output.
fun1 = function(M,k) {
  n = nrow(M)
  m = ncol(M)
  runs = vector('list',length=m)
  for(i in 1:m) {
    for(j in 1:(n-k+1)) {
      if(all(M[j:(j+k-1),i]==1)) runs[[i]] = c(runs[[i]],j)
    }
  }
  return(runs)
}
set.seed(123)
M = matrix(sample(0:1,size=15,replace=TRUE),ncol=3,nrow=5)
fun1(M,2)
Output:
[[1]]
[1] 4
[[2]]
[1] 2 3
[[3]]
[1] 3
Each element of the list is the result for one column, starting with the left-most column. The vector of numbers (or NULL if there are none) gives the row numbers in that column at which a run of two consecutive 1's starts.
To interpret the sample output you have:
- In the first (left-most) column, there are two 1's starting at row 4 (M[4,1] and M[5,1] are 1)
- In the second column, there are two 1's starting at row 2 (meaning row 2 and row 3 are 1's) and also at row 3 (meaning row 3 and row 4 are 1's)
- In the third column, there are two 1's starting at row 3 (meaning row 3 and row 4 are 1's)
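You can check that this is true by printing out the matrix M, which with your seed looks like this in R versions before 3.6.0 (R 3.6.0 changed the default algorithm behind sample(), so newer versions give a different M unless you first call RNGkind(sample.kind = "Rounding")):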
[,1] [,2] [,3]
[1,] 0 0 1
[2,] 1 1 0
[3,] 0 1 1
[4,] 1 1 1
[5,] 1 0 0
I hope that makes it clear.
By the way, in the future, try to format your code better with proper indentation and line breaks. I had to add line breaks manually to make the sample code work, but good job providing a seed :)
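As a cross-check of this interpretation, here is a vectorized sketch that computes the same start positions. run_starts is a hypothetical helper, not part of the question's code; it assumes M holds only 0s and 1s, uses base R's embed() to line up every length-k window of a column, and returns integer(0) rather than NULL for a column with no runs. M is typed in from the printed matrix so that the result does not depend on the R version's sampling algorithm.

```r
# The matrix printed above, entered by hand (filled column by column)
M <- matrix(c(0, 1, 0, 1, 1,
              0, 1, 1, 1, 0,
              1, 0, 1, 1, 0), nrow = 5, ncol = 3)

# Hypothetical helper: for each column, the rows where a run of k
# consecutive 1's begins
run_starts <- function(M, k) {
  lapply(seq_len(ncol(M)), function(i) {
    # Row j of embed(x, k) holds x[j + k - 1], ..., x[j], i.e. the window starting at j
    windows <- embed(M[, i], k)
    which(rowSums(windows == 1) == k)
  })
}

run_starts(M, 2)  # same positions as fun1(M, 2): 4; 2 3; 3
```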
I would like to delete rows from a large matrix using the following criterion:
Any row that contains 100 in its second column should be removed.
How can this be done? I know how to select those rows but I'm not sure how to remove them using a rule.
R > mat = matrix(c(1,2,3,100,200,300), 3,2)
R > mat
[,1] [,2]
[1,] 1 100
[2,] 2 200
[3,] 3 300
R > (index = mat[,2] == 100)
[1] TRUE FALSE FALSE
R > mat[index, ]
[1] 1 100
R > mat[!index, ]
[,1] [,2]
[1,] 2 200
[2,] 3 300
Previously I confused this logical index with the one returned by which. Here is the solution using which:
R > (index2 = which(mat[,2] == 100))
[1] 1
R > mat[-index2, ]
[,1] [,2]
[1,] 2 200
[2,] 3 300
Watch out for the different ways these indices are used: ! negates the logical index, while - drops the row numbers returned by which.
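One practical difference between the two forms shows up when no row matches: which() then returns integer(0), and -integer(0) is still integer(0), which selects no rows instead of all of them. A small sketch reusing mat from above, assuming 999 does not occur in column 2; drop = FALSE keeps the result a matrix even when one or zero rows remain.

```r
mat <- matrix(c(1, 2, 3, 100, 200, 300), 3, 2)

# Look for a value that does not occur in column 2
index  <- mat[, 2] == 999         # FALSE FALSE FALSE
index2 <- which(mat[, 2] == 999)  # integer(0)

# The logical form keeps every row, as intended
nrow(mat[!index, , drop = FALSE])   # 3

# The which() form removes every row: -integer(0) is integer(0),
# which selects nothing
nrow(mat[-index2, , drop = FALSE])  # 0
```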
Here's how I would do it in Matlab with a matrix A.
Option 1
for i = size(A,1):-1:1
if (A(i,2)==100)
A(i,:)=[];
end
end
This loops over the rows, starting at the bottom so that deleting a row does not shift the rows still to be visited, and sets any row with 100 in the 2nd column to the empty matrix [], which deletes it.
Maybe you can convert this to R, or maybe it will help somebody else who is having this problem.
Option 2
logicalIndex=(A(:,2)==100);
A(logicalIndex,:)=[];
This first finds rows with 100 in the 2nd column, then deletes them all.
Is it possible to select a subset of a three-dimensional array with a two-dimensional logical array? I would like to do this so that I can assign values into the selection.
For example, I have an array with dim(a) = (lat, long, time), and I want to select with b, where dim(b) = (lat, long) and b is full of TRUE/FALSE values. I want to be able to do something like:
> a <- array(c(1,2,3,4,5,6,7,8),c(2,2,2))
> b <- matrix(c(0,1,0,0), c(2,2))==TRUE
> a[[b]] <- 0
> a
, , 1
[,1] [,2]
[1,] 1 3
[2,] 0 4
, , 2
[,1] [,2]
[1,] 5 7
[2,] 0 8
Edit: OK, this looks like a silly question, as I just realised that it works exactly as stated above if you use a[b] <- 0 (single brackets). But that only works if the dimension(s) you want to span are the ones at the end. So, to make it more interesting:
How can you do this if the dimension you want to span is the first or second one, e.g. if dim(b) == (lat, years)?
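R supports subsetting arrays with a logical matrix through the [ operator (single bracket, not double; the double bracket only ever returns a single element). The logical matrix is treated as a logical vector and recycled over the remaining dimension, here time: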
a[b] <- 0
a
, , 1
[,1] [,2]
[1,] 1 3
[2,] 0 4
, , 2
[,1] [,2]
[1,] 5 7
[2,] 0 8
Notice that this is somewhat different from the result you specify in your question. In your question, the second element (i.e. bottom left element of the matrix) is 1, thus you would expect the second element of each array slice to be modified. (In other words not the first, as you have in your example.)
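The answer above does not cover the follow-up with dim(b) == (lat, years). One way, sketched under the assumption that a is ordered lat x long x time and b is a lat x time logical matrix: recycle b along a new trailing dimension with array(), move that dimension into second place with aperm(), and index with single brackets as before. The sizes and mask values below are made up for illustration.

```r
# Example dimensions: lat = 2, long = 3, time = 4
a <- array(1:24, dim = c(2, 3, 4))

# b is a lat x time logical mask
b <- matrix(FALSE, nrow = 2, ncol = 4)
b[2, 1] <- TRUE   # lat 2 at time 1
b[1, 3] <- TRUE   # lat 1 at time 3

# Recycle b along a new third dimension (long): mask0[i, t, j] is b[i, t]
mask0 <- array(b, dim = c(dim(a)[1], dim(a)[3], dim(a)[2]))
# Reorder to lat x long x time so that mask[i, j, t] is b[i, t], matching a
mask <- aperm(mask0, c(1, 3, 2))

a[mask] <- 0
a[2, , 1]   # all 0
a[1, , 3]   # all 0
```

The same pattern works for any pair of dimensions: build the mask with the masked dimensions first and the recycled one last, then choose the aperm() permutation that restores the order of a.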