I would like to delete rows from a large matrix using the following criteria:
Any row that contains 100 in its second column should be removed.
How can this be done? I know how to select those rows but I'm not sure how to remove them using a rule.
R > mat = matrix(c(1,2,3,100,200,300), 3,2)
R > mat
     [,1] [,2]
[1,]    1  100
[2,]    2  200
[3,]    3  300
R > (index = mat[,2] == 100)
[1] TRUE FALSE FALSE
R > mat[index, ]
[1] 1 100
R > mat[!index, ]
     [,1] [,2]
[1,]    2  200
[2,]    3  300
Previously I was confused about how the index works with another method, which(). Here is the solution using which():
R > (index2 = which(mat[,2] == 100))
[1] 1
R > mat[-index2, ]
     [,1] [,2]
[1,]    2  200
[2,]    3  300
Watch out for the different negation operators: use ! with a logical index and - with a numeric index from which().
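One practical difference between the two index styles shows up when no row matches. The sketch below reuses mat from above with a hypothetical target value of 999 (an assumption, chosen only because no row contains it). Because -integer(0) is still integer(0), the which() form then selects zero rows, while the logical form keeps everything. drop = FALSE is added so the result stays a matrix even if only one row survives.

```r
mat <- matrix(c(1, 2, 3, 100, 200, 300), 3, 2)

# Logical index: keep rows whose second column is not 100
res_logical <- mat[mat[, 2] != 100, , drop = FALSE]   # rows 2 and 3

# Hypothetical value 999 does not occur in column 2
idx <- which(mat[, 2] == 999)                         # integer(0)
res_which <- mat[-idx, , drop = FALSE]                # 0 rows: everything is lost
res_safe  <- mat[!(mat[, 2] == 999), , drop = FALSE]  # all 3 rows kept
```

If the condition might match nothing, the logical form is the safer of the two.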
Here's how I would do it in Matlab with a matrix A.
Option 1
for (i=size(A,1):-1:1)   % stop at 1: Matlab indices start at 1, so A(0,2) would error
    if (A(i,2)==100)
        A(i,:)=[];
    end
end
This loops over rows (starting at the bottom), and sets any row with 100 in the 2nd element to an empty set, which effectively deletes it.
Maybe you can convert this to R, or maybe it will help somebody else who has this problem.
Option 2
logicalIndex=(A(:,2)==100);
A(logicalIndex,:)=[];
This first finds rows with 100 in the 2nd column, then deletes them all.
I am using the pracma package, which contains the function nullspace(), returning normalized basis vectors of the Null(A):
> require(pracma)
> (A = matrix(c(1,2,3,4,5,6), nrow=2, byrow=T))
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
> nullspace(A)
           [,1]
[1,]  0.4082483
[2,] -0.8164966
[3,]  0.4082483
which is perfectly fine. However (don't ask), I want to quickly check the values I'd get if I were to produce the reduced row echelon form:
> rref(A)
     [,1] [,2] [,3]
[1,]    1    0   -1
[2,]    0    1    2
and from there "manually" figure out the null space as
N(A) = [1, -2, 1]'
Yes, the latter is a scalar multiple of the former:
> c(1,-2,1)/nullspace(A)
        [,1]
[1,] 2.44949
[2,] 2.44949
[3,] 2.44949
but I'd still like to get the latter, non-normalized form of a basis of the null space, as though the values were directly obtained from the reduced row echelon matrix.
You may want to try
B = rref(A)
solve(B[,1:2], -B[,3])
This gives you the combination of the first two columns that cancels one unit of the third column, here c(1, -2). Just append a 1 as the third entry to get your result, c(1, -2, 1).
The same approach extends to the case where the dimension of the null space is larger than one.
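For the larger case, a minimal base-R sketch might look like the following. It assumes its input is already in reduced row echelon form (for example the output of pracma's rref()). The helper name null_basis_from_rref and the tol argument are made up for illustration and are not part of pracma. Each free (non-pivot) column yields one basis vector: 1 in the free position, and the negated entries of that column in the pivot positions.

```r
# Hypothetical helper: build a non-normalized null space basis from an RREF matrix
null_basis_from_rref <- function(R, tol = 1e-10) {
  n <- ncol(R)
  # Rows that are not entirely zero each contain exactly one pivot
  nonzero_rows <- which(apply(abs(R) > tol, 1, any))
  # Pivot column of a row = position of its first nonzero entry
  pivots <- vapply(nonzero_rows,
                   function(i) which(abs(R[i, ]) > tol)[1],
                   integer(1))
  free <- setdiff(seq_len(n), pivots)
  N <- matrix(0, nrow = n, ncol = length(free))
  for (k in seq_along(free)) {
    N[free[k], k] <- 1                          # free variable set to 1
    N[pivots, k] <- -R[nonzero_rows, free[k]]   # pivot variables follow from RREF
  }
  N
}

# The RREF shown above, typed in by hand so the example needs no packages
B <- matrix(c(1, 0, -1,
              0, 1,  2), nrow = 2, byrow = TRUE)
N <- null_basis_from_rref(B)   # one column: 1, -2, 1
```

The columns of N are not normalized, which matches the "read it off the echelon form" result asked for in the question.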
I was wondering if anyone could help me understand the output of this function. I know it's supposed to return the positions in which there is a run of length 2 but I am not exactly sure how to interpret the output.
fun1 = function(M,k) {
  n = nrow(M)
  m = ncol(M)
  runs = vector('list',length=m)
  for(i in 1:m) {
    for(j in 1:(n-k+1)) {
      if(all(M[j:(j+k-1),i]==1)) runs[[i]] = c(runs[[i]],j)
    }
  }
  return(runs)
}
set.seed(123)
M = matrix(sample(0:1,size=15,replace=TRUE),ncol=3,nrow=5)
fun1(M,2)
Output:
[[1]]
[1] 4
[[2]]
[1] 2 3
[[3]]
[1] 3
Each element in the list is the output for a column, starting at the left-most column. The list of numbers (or NULL if there are none) gives you the row numbers in that column where there are two 1's in a row.
To interpret the sample output you have:
- In the first (left-most) column, there are two 1's starting at row 4 (M[4,1] and M[5,1] are 1)
- In the second column, there are two 1's starting at row 2 (meaning row 2 and row 3 are 1's) and also at row 3 (meaning row 3 and row 4 are 1's)
- In the third column, there are two 1's starting at row 3 (meaning row 3 and row 4 are 1's)
You can check that this is true if you print out the matrix M, which given your seed looks like this
     [,1] [,2] [,3]
[1,]    0    0    1
[2,]    1    1    0
[3,]    0    1    1
[4,]    1    1    1
[5,]    1    0    0
I hope that makes it clear.
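To check the interpretation without depending on the random seed, the matrix printed above can be typed in directly. The sketch below is a compact rewrite of the same idea, not the original fun1. The name run_starts is made up for illustration, and it returns integer(0) rather than NULL for a column with no runs.

```r
# The matrix shown above, entered column by column
M <- matrix(c(0, 1, 0, 1, 1,
              0, 1, 1, 1, 0,
              1, 0, 1, 1, 0), nrow = 5)

# Hypothetical helper: for each column, the rows where a run of k ones starts
run_starts <- function(M, k) {
  starts <- seq_len(max(nrow(M) - k + 1, 0))   # candidate starting rows
  lapply(seq_len(ncol(M)), function(i) {
    is_run <- vapply(starts,
                     function(j) all(M[j:(j + k - 1), i] == 1),
                     logical(1))
    starts[is_run]
  })
}

res <- run_starts(M, 2)
# res[[1]] is 4, res[[2]] is 2 3, res[[3]] is 3
```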
By the way, in the future, try to format your code better with proper indentations and line breaks. I had to manually add line breaks to make the sample code work, but good job giving a seed :)
I'm trying to subset a matrix so that I only get the matrix where the first variable is larger than the second variable. I have the matrix out which is a 3000x2 matrix.
I tried
out<-out[out[,1] > out[,2]]
but this eliminates the row.names altogether, and I get a string of integers between 1 to 3000. Would there be a way to preserve the row.names?
Of note, if you only return a subset of one row to form a matrix with one dimension being unity, R will drop the row name:
m <- matrix(1:9, ncol = 3)
rownames(m) <- c("a", "b", "c")
m[1, ] # lost the row name
m[1, , drop = FALSE] # got row name back and a matrix
m[c(1,1), ] # the row name is back when result has nrow > 1
There appears to be no simple way of working around this other than checking for one-row result and assigning the row name.
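For the original question, a small sketch of the subsetting with row names preserved follows. The 3x2 matrix and its row names r1 to r3 are invented stand-ins for the 3000x2 out. The comma after the condition makes it a row index, and drop = FALSE keeps the matrix shape and the row name even when a single row matches.

```r
# Invented stand-in for the 3000x2 matrix `out`
out <- matrix(c(5, 1, 7,
                2, 6, 3), ncol = 2,
              dimnames = list(c("r1", "r2", "r3"), NULL))

# Rows where column 1 exceeds column 2; note the comma after the condition
sub <- out[out[, 1] > out[, 2], , drop = FALSE]
rownames(sub)                      # "r1" "r3"

# A condition that matches exactly one row still yields a named 1x2 matrix
one <- out[out[, 1] > 6, , drop = FALSE]
rownames(one)                      # "r3"
```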
A matrix is treated by R as a vector with columns and rows.
> A <- matrix(1:9, ncol=3)
# A is filled with 1,...,9 columnwise
> A
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
# only elements with even number in 2nd column of same row
> v <- A[A[,2] %% 2 == 0]
> m <- A[A[,2] %% 2 == 0,]
> v
[1] 1 3 4 6 7 9
> m
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    3    6    9
# The result of evaluating odd/even-ness of middle column.
# This boolean vector is repeated column-wise by default
# until all element's fate in A is determined.
> A[,2] %% 2 == 0
[1] TRUE FALSE TRUE
When you leave out the comma (v), you address A as a 1-dimensional data structure and R implicitly handles your expression as a vector.
In that sense v is not a "string of integers" but a vector of integers. When you add the comma, you tell R that your condition only addresses the first dimension while indicating a second one (after the comma), which causes R to handle your expression as a matrix (m).
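The recycling described in the comments above can be made explicit. In this sketch, cond and same are names introduced only for illustration. Repeating the length-3 logical vector by hand to the full length of A gives the same result as the comma-less index.

```r
A <- matrix(1:9, ncol = 3)
cond <- A[, 2] %% 2 == 0        # TRUE FALSE TRUE

v <- A[cond]                    # cond is recycled over all 9 elements
same <- A[rep(cond, length.out = length(A))]   # the recycling written out

identical(v, same)              # TRUE
v                               # 1 3 4 6 7 9
```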
I have a matrix A, how can I represent the last column, since I want to sort the matrix based on that.
> A <- matrix(rnorm(16), 4, 4)
> ncol(A)
[1] 4
> # Get the last column
> A[,ncol(A)]
[1] 0.7593943 0.0726012 2.2784912 -0.2571095
> # If you want to sort based on the last column...
> A[order(A[,ncol(A)]),]
           [,1]        [,2]        [,3]       [,4]
[1,] -0.9013910 -0.06612518 -1.51267548 -0.2571095
[2,]  0.3851738 -0.81303780  0.01062751  0.0726012
[3,] -1.6940473 -1.15323294 -1.50261705  0.7593943
[4,]  0.3120409 -0.30047966  0.59672449  2.2784912
If A is your matrix then the last column of A is:
A[,ncol(A)]
If you are not familiar with bracket indexing in R, this code selects all rows of A (since the position before the comma is blank) and the last column of A. Because R indexing begins at 1 (unlike languages such as Python), ncol(A), which returns the number of columns in A as an integer, is also the index of the last column, so indexing in this way gives your desired result.
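A deterministic version of the sort, using made-up values instead of rnorm() so the result can be checked by eye, might look like this. The decreasing = TRUE variant is an addition for illustration and was not asked for in the question.

```r
# Made-up 4x2 matrix: column 1 labels the rows, column 2 is the sort key
A <- matrix(c(1, 2, 3, 4,
              0.7, 0.1, 2.3, -0.3), ncol = 2)

last <- A[, ncol(A)]                       # the last column
asc  <- A[order(last), , drop = FALSE]     # rows sorted by the last column
desc <- A[order(last, decreasing = TRUE), , drop = FALSE]

asc[, 1]    # 4 2 1 3
desc[, 1]   # 3 1 2 4
```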
Is it possible to select a subset of a three dimensional array with a two-dimensional binary array? I would like to be able to do this so that I can push values into the selection
For example I have an array dim(a) = (lat, long, time), and I want to select with dim(b) = (lat, long) which is an array full of TRUE/FALSE values. I want to be able to do something like:
> a <- array(c(1,2,3,4,5,6,7,8),c(2,2,2))
> b <- matrix(c(0,1,0,0), c(2,2))==TRUE
> a[[b]] <- 0
> a
, , 1
     [,1] [,2]
[1,]    1    3
[2,]    0    4
, , 2
     [,1] [,2]
[1,]    5    7
[2,]    0    8
Edit : ok, so this looks like a stupid question, as I just realised that it works exactly as stated above, if you use a[b] <- 0 (single brackets). But that only works if the dimension(s) you want to span are the ones at the end. So, to make it more interesting:
How can you do this if the dimension you want to span is the first or second dimension - eg. if dim(b)==(lat, years)?
R supports matrix subsetting of arrays with the [ operator (i.e. single bracket, not double - the double bracket will always only return a single element):
a[b] <- 0
a
, , 1
     [,1] [,2]
[1,]    1    3
[2,]    0    4
, , 2
     [,1] [,2]
[1,]    5    7
[2,]    0    8
Notice that this is somewhat different from the result you specify in your question. In your question, the second element (i.e. bottom left element of the matrix) is 1, thus you would expect the second element of each array slice to be modified. (In other words not the first, as you have in your example.)
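For the follow-up case in the question, where the mask covers the first and third dimensions (for example dim(b) == (lat, years)), one possible sketch is to permute the array so that the masked dimensions come first, assign, and permute back. The mask b2 and the names ap and a2 are assumptions made for this example. This relies on the same recycling of the logical index as a[b] <- 0 above.

```r
a <- array(c(1, 2, 3, 4, 5, 6, 7, 8), c(2, 2, 2))   # dims: (lat, long, time)

# Hypothetical mask over (lat, time): select lat 2 at time 1
b2 <- matrix(c(FALSE, TRUE, FALSE, FALSE), 2, 2)

ap <- aperm(a, c(1, 3, 2))     # reorder to (lat, time, long)
ap[b2] <- 0                    # mask is recycled across the trailing long dimension
a2 <- aperm(ap, c(1, 3, 2))    # swapping dims 2 and 3 again restores (lat, long, time)

a2[2, , 1]                     # 0 0: every long at lat 2, time 1 was zeroed
```

Swapping the second and third dimensions is its own inverse, which is why the same permutation appears twice. For other dimension orders the inverse permutation has to be worked out separately.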