Assume we have the following logical matrix in R:
A <- matrix(as.logical(c(0,0,0,1,0,1,0,0,1,0,0,0)), nrow=4)
# [,1] [,2] [,3]
# [1,] FALSE FALSE TRUE
# [2,] FALSE TRUE FALSE
# [3,] FALSE FALSE FALSE
# [4,] TRUE FALSE FALSE
I want to convert this matrix into a column-wise index using
B <- column_wise_index(A)
where column_wise_index returns a vector containing the same number of elements as the number of rows in A (4), and each element contains the column of A that has a logical value TRUE. For A above, B should resemble
B <- c(3,2,0,1)
# [1] 3 2 0 1
where 0 indicates a row that has no TRUE value.
The closest I've come is applying which by row:
unlist(apply(A, 1, function(x) which(x)))
# [1] 3 2 1
However, the result skips 0, and I'm not sure how efficient this is for large matrices (say ~100K x 100 entries).
Here is a solution that is more in the spirit of how you started, but you have to admire @rawr's clever solution.
A <- matrix(as.logical(c(0,0,0,1,0,1,0,0,1,0,0,0)), nrow=4)
TrueSpots = apply(A, 1, which)
TrueSpots[!sapply(TrueSpots, length)] = 0
unlist(TrueSpots)
[1] 3 2 0 1
Update including @akrun's suggestion (lengths() replaces the sapply call):
TrueSpots = apply(A, 1, which)
TrueSpots[!lengths(TrueSpots)] = 0
unlist(TrueSpots)
[1] 3 2 0 1
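As a possible alternative that avoids the row-wise apply, the positions of the TRUE entries can be read off with which(..., arr.ind = TRUE) and written into a zero-filled vector. This is only a sketch; it assumes, as in the question, that each row has at most one TRUE (with several, the one in the right-most column would win).

```r
A <- matrix(as.logical(c(0,0,0,1,0,1,0,0,1,0,0,0)), nrow = 4)

# One (row, col) pair per TRUE entry, in column-major order
idx <- which(A, arr.ind = TRUE)

# Start with 0 for every row, then fill in the column of each TRUE
B <- integer(nrow(A))
B[idx[, "row"]] <- idx[, "col"]
B
```

Rows without any TRUE never appear in idx, so they keep their initial 0.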
max.col(A) returns, for each row, the column index of the maximum entry. Ties are broken at random (by default). rowSums(A) on a logical matrix counts the TRUE values in each row.
Assuming that each row has at most one TRUE value, rowSums(A) is a vector of 0s and 1s. Multiplying the two vectors element-wise zeroes out the rows of A that are entirely FALSE, for which max.col would otherwise return an arbitrary column.
> A <- matrix(as.logical(c(0,0,0,1,0,1,0,0,1,0,0,0)), nrow=4)
> max.col(A)*rowSums(A)
[1] 3 2 0 1
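The one-liner above could be wrapped into the function the question asks for. The sketch below is an assumption about how one might do that: the name column_wise_index comes from the question, and ties.method = "first" is added so that rows with several TRUE values give a deterministic result (the first TRUE column) instead of a random one.

```r
# Hypothetical helper, named after the function in the question
column_wise_index <- function(A) {
  # Column of the first maximum in each row; deterministic for ties
  first_max <- max.col(A, ties.method = "first")
  # rowSums(A) > 0 is FALSE for all-FALSE rows, which turns them into 0
  first_max * (rowSums(A) > 0)
}

A <- matrix(as.logical(c(0,0,0,1,0,1,0,0,1,0,0,0)), nrow = 4)
B <- column_wise_index(A)
B
```

Using rowSums(A) > 0 rather than rowSums(A) itself keeps the result a valid column index even when a row contains more than one TRUE.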
Quite a basic question about matrices in R: let's assume we have a matrix m and we want to select some of its elements according to a boolean matrix b (of the same dimensions) while keeping the original dimensions, e.g.
m <- matrix(1:9,3,3)
b <- m > 6 # just to have a boolean matrix
m[b]
# [1] 7 8 9
# Desired:
[,1] [,2] [,3]
[1,] FALSE FALSE 7
[2,] FALSE FALSE 8
[3,] FALSE FALSE 9
Is there an easy way to do that (like drop=FALSE when selecting a single column of a matrix, to prevent it from being simplified to a vector), or do I have to rebuild the original matrix manually?
Edit
Following @Darren Tsai's comment: NA instead of FALSE would be fine.
m[m <= 6] <- NA
achieves your expected output, but its logic might differ from that of your question. The following solution is somewhat more complex, but I think the concept matches drop = FALSE.
b <- m > 6
replace(array(dim = dim(m)), b, m[b])
# or
`[<-`(array(dim = dim(m)), b, m[b])
# [,1] [,2] [,3]
# [1,] NA NA 7
# [2,] NA NA 8
# [3,] NA NA 9
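Another way to get the same masked matrix, shown here only as a sketch, is ifelse(), which returns a result with the shape of its test argument. The name masked is introduced purely for illustration.

```r
m <- matrix(1:9, 3, 3)
b <- m > 6

# ifelse() keeps the dim attribute of b, so the result is still 3 x 3;
# entries where b is FALSE become NA, the others are taken from m
masked <- ifelse(b, m, NA)
masked
```

Unlike m[m <= 6] <- NA, this leaves m itself unchanged.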
You seem to be confusing changing values in your matrix m with creating a new matrix. Your b is just an indexer. Remember, in R, a matrix is just a vector with a dim attribute that lets you display and index it with two dimensions.
If you want to know row and column, then
b <- which(m > 6, arr.ind =TRUE)
will give you the coordinate pairs, useful for subsetting the matrix later on.
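To illustrate how those coordinate pairs can be used later, here is a small sketch. The names idx and vals are illustrative only.

```r
m <- matrix(1:9, 3, 3)

# Two-column matrix of (row, col) positions where the condition holds
idx <- which(m > 6, arr.ind = TRUE)
idx

# Indexing with a two-column matrix picks one element per (row, col) pair
vals <- m[idx]
vals
```

Here every match sits in column 3, and vals holds the matching elements in the same order as the rows of idx.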
I have two datasets from 10 people. One is a vector, and the other is a matrix. What I want to see is whether the first element of the vector is included in the first row of the matrix, whether the second element of the vector is included in the second row of the matrix, and so on.
So I changed the vector into a matrix and used apply to compare them row-wise, but the result was not correct.
Here are the datasets.
df1<-matrix(c(rep(0,10),2,4,7,6,5,7,4,2,2,2),ncol=2)
df1
# [,1] [,2]
# [1,] 0 2
# [2,] 0 4
# [3,] 0 7
# [4,] 0 6
# [5,] 0 5
# [6,] 0 7
# [7,] 0 4
# [8,] 0 2
# [9,] 0 2
#[10,] 0 2
df2<-c(1,3,6,4,1,3,3,2,2,5)
df2<-as.matrix(df2)
apply(df2, 1, function(x) any(x==df1))
# [1] FALSE FALSE TRUE TRUE FALSE FALSE FALSE TRUE TRUE TRUE
However, the result should be all FALSE except for the 8th and 9th elements.
Can anyone correct the function? Thanks!
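This vectorized code should be very efficient. It assumes df2 is still the original vector; after df2 <- as.matrix(df2), comparing the 10 x 2 matrix df1 with the 10 x 1 matrix df2 fails with a "non-conformable arrays" error.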
> as.logical( rowSums(df1==df2))
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE
Here are a few approaches you could take.
Two calls to apply:
# First by column, to check whether the values are equal,
# then by row, to see whether any row contains TRUE
apply(apply(df1,2,`==`,df2),1,any)
Use sapply and seq_along:
sapply(seq_along(df2), function(i) df2[i] %in% df1[i, ])
Repeat df2 to the same length as df1 and then compare:
rowSums(df1==rep(df2, length = length(df1))) > 0
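It may help to see why the original attempt gave the wrong answer: any(x == df1) compares each element of df2 against the whole of df1, not just the matching row. The sketch below contrasts the two behaviours; the names anywhere and same_row are illustrative, and df2 is kept as a plain vector.

```r
df1 <- matrix(c(rep(0, 10), 2, 4, 7, 6, 5, 7, 4, 2, 2, 2), ncol = 2)
df2 <- c(1, 3, 6, 4, 1, 3, 3, 2, 2, 5)

# What the original code effectively tested: is df2[i] anywhere in df1?
anywhere <- vapply(df2, function(x) any(x == df1), logical(1))

# What was intended: is df2[i] in row i of df1?
same_row <- vapply(seq_along(df2),
                   function(i) df2[i] %in% df1[i, ],
                   logical(1))

anywhere
same_row
```

vapply is used instead of sapply only so that the return type is checked; sapply would give the same values here.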
Thank you for your kind reply to my previous questions. I have two lists: list1 and list2. I would like to know if each object of list1 is contained in each object of list2. For example:
> list1
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
> list2
[[1]]
[1] 1 2 3
[[2]]
[1] 2 3
[[3]]
[1] 2 3
Here are my questions:
1.) How do I ask R to check if an object is a subset of another object in a list?
For instance, I would like to check if list2[[3]]={2,3} is contained in (a subset of) list1[[2]]={2}. When I do list2[[3]] %in% list1[[2]], I get [1] TRUE FALSE. However, this is not what I want. I just want to check whether list2[[3]] is a subset of list1[[2]], i.e. is {2,3} \subset {2} in the set-theoretic sense? I do not want an element-wise check, which is what R seems to do with the %in% operator. Any suggestions?
2.) Is there some way to efficiently make all pairwise subset comparisons (i.e. list1[[i]] subset of list2[[j]], for all i,j combinations)? Would something like outer(list1,list2, func.subset) work once question number 1 is answered?
Thank you for your feedback!
setdiff compares unique values, so x is a subset of y when setdiff(x, y) is empty:
length(setdiff(5, 1:5)) == 0
Alternatively, all(x %in% y) will work nicely.
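Both tests can be tried on the sets from the question. This is a minimal sketch; is_subset is a hypothetical helper name, not a base R function.

```r
# Hypothetical helper: TRUE when every element of x also occurs in y
is_subset <- function(x, y) all(x %in% y)

a <- is_subset(c(2, 3), 2)        # is {2,3} a subset of {2}?
b <- is_subset(2, c(2, 3))        # is {2} a subset of {2,3}?

# The setdiff formulation gives the same answers
a2 <- length(setdiff(c(2, 3), 2)) == 0
b2 <- length(setdiff(2, c(2, 3))) == 0

c(a, b, a2, b2)
```

Note that with this definition an empty x counts as a subset of anything, since all(logical(0)) is TRUE.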
To do all comparisons, something like this would work:
dt <- expand.grid(list1,list2)
dt$subset <- apply(dt,1, function(.v) all(.v[[1]] %in% .v[[2]]) )
Var1 Var2 subset
1 1 1, 2, 3 TRUE
2 2 1, 2, 3 TRUE
3 3 1, 2, 3 TRUE
4 1 2, 3 FALSE
5 2 2, 3 TRUE
6 3 2, 3 TRUE
7 1 2, 3 FALSE
8 2 2, 3 TRUE
9 3 2, 3 TRUE
Note that expand.grid isn't the fastest way to do this when dealing with a lot of data (DWin's solution is better in that regard), but it allows you to quickly check visually whether this is doing what you want.
You can use the sets package as follows:
library(sets)
is.subset <- function(x, y) as.set(x) <= as.set(y)
outer(list1, list2, Vectorize(is.subset))
# [,1] [,2] [,3]
# [1,] TRUE FALSE FALSE
# [2,] TRUE TRUE TRUE
# [3,] TRUE TRUE TRUE
@Michael's or @DWin's base version of is.subset will work just as well, but for part two of your question, I'd maintain that outer is the way to go.
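For instance, a base-only variant of the outer call might look like the sketch below. It assumes list1 and list2 are built as in the question, and uses a hypothetical helper is_subset based on %in% in place of the sets package.

```r
list1 <- list(1, 2, 3)
list2 <- list(1:3, 2:3, 2:3)

# Hypothetical base R helper: TRUE when every element of x occurs in y
is_subset <- function(x, y) all(x %in% y)

# Vectorize() lets outer() call the helper once per (i, j) pair;
# res[i, j] answers "is list1[[i]] a subset of list2[[j]]?"
res <- outer(list1, list2, Vectorize(is_subset))
res
```

Vectorize is needed because outer passes the fully expanded lists to the function in a single call and expects a result of the same length.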
is.subset <- function(x,y) {length(setdiff(x,y)) == 0}
First the combos of list1 elements that are subsets of list2 items:
> sapply(1:length(list1), function(i1) sapply(1:length(list2),
function(i2) is.subset(list1[[i1]], list2[[i2]]) ) )
[,1] [,2] [,3]
[1,] TRUE TRUE TRUE
[2,] FALSE TRUE TRUE
[3,] FALSE TRUE TRUE
Then the unsurprising lack of any list2 items (all of length > 1) that are subsets of list1 items (all of length 1):
> sapply(1:length(list1), function(i1) sapply(1:length(list2),
function(i2) is.subset(list2[[i2]], list1[[i1]]) ) )
[,1] [,2] [,3]
[1,] FALSE FALSE FALSE
[2,] FALSE FALSE FALSE
[3,] FALSE FALSE FALSE
Adding to @Michael's answer, here's a neat way to avoid the messiness of expand.grid using the AsIs function I(). Note that this compares list1[[i]] only with list2[[i]], not all pairs:
list2 <- list(1:3,2:3,2:3)
a <- data.frame(list1 = 1:3, I(list2))
a$subset <- apply(a, 1, function(.v) all(.v[[1]] %in% .v[[2]]) )
list1 list2 subset
1 1 1, 2, 3 TRUE
2 2 2, 3 TRUE
3 3 2, 3 TRUE
As fast as possible, I would like to replace the first zeros in some rows of a matrix with values stored in another vector.
There is a numeric matrix where each row is a vector with some zeros.
I also have two vectors, one containing the rows in which to replace and another containing the new values: replace.in.these.rows and new.values. I can also generate the vector of first-zero positions with sapply:
mat <- matrix(1,5,5)
mat[c(1,8,10,14,16,22,14)] <- 0
replace.in.these.rows <- c(1,2,3)
new.values <- c(91,92,93)
corresponding.poz.of.1st.zero <- sapply(replace.in.these.rows,
function(x) which(mat [x,] == 0)[1] )
Now I would like something that iterates over the index vectors in parallel, ideally without a for loop, along the lines of:
mat[replace.in.these.rows, corresponding.poz.of.1st.zero] <- new.values
Is there an indexing trick that goes beyond simple vectors? I could not use a list or an array (e.g. by column) as an index.
By default, R matrices are stored as a set of column vectors. Do I gain anything by storing the data in transposed form? That would mean working on columns instead of rows.
Context:
This matrix stores the contact IDs of a network. It is not an n x n adjacency matrix, but rather an n x max.number.of.partners (or n*=30) matrix.
The network uses an edgelist by default, but I wanted to store "all links from X" together.
I assumed, but am not sure, that this is more efficient than repeatedly extracting the information from the edgelist (multiple times each round in a simulation).
I also assumed that this linearly growing matrix form is faster than storing the same information in an identically formatted list.
Some comments on these contextual assumptions are also welcome.
Edit: If only the first zeros are to be replaced, then this approach works:
first0s <-apply(mat[replace.in.these.rows, ] , 1, function(x) which(x==0)[1])
mat[cbind(replace.in.these.rows, first0s)] <- new.values
> mat
[,1] [,2] [,3] [,4] [,5]
[1,] 91 1 1 0 1
[2,] 1 1 1 1 92
[3,] 1 93 1 1 1
[4,] 1 1 0 1 1
[5,] 1 0 1 1 1
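The key step above is indexing with a two-column matrix built by cbind, which addresses individual cells rather than a rectangular block. A small standalone sketch of the difference, with illustrative names m, rows and cols:

```r
m <- matrix(0, 3, 3)
rows <- c(1, 2, 3)
cols <- c(3, 1, 2)

# m[rows, cols] would address the full 3 x 3 block of those rows and columns.
# A two-column matrix instead addresses exactly (1,3), (2,1) and (3,2):
m[cbind(rows, cols)] <- c(10, 20, 30)
m
```

All other cells stay at 0, which is what makes this form suitable for "one replacement per row" updates.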
Edit: I originally thought that the goal was to replace all zeros in the chosen rows, and this was my approach, which is completely vectorized:
idxs <- which(mat==0, arr.ind=TRUE)
# This returns the rows and columns that identify the zero elements
# idxs[,"row"] %in% replace.in.these.rows
# [1] TRUE TRUE FALSE FALSE TRUE TRUE
# That isolates the ones you want.
# idxs[ idxs[,"row"] %in% replace.in.these.rows , ]
# that shows what you will supply as the two column argument to "["
# row col
#[1,] 1 1
#[2,] 3 2
#[3,] 1 4
#[4,] 2 5
chosen.ones <- idxs[ idxs[,"row"] %in% replace.in.these.rows , ]
mat[chosen.ones] <- new.values[chosen.ones[,"row"]]
# Replace the zeros with the values chosen (and duplicated if necessary) by "row".
mat
#---------
[,1] [,2] [,3] [,4] [,5]
[1,] 91 1 1 91 1
[2,] 1 1 1 1 92
[3,] 1 93 1 1 1
[4,] 1 1 0 1 1
[5,] 1 0 1 1 1
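One caveat: new.values[chosen.ones[,"row"]] lines up only because the chosen rows happen to be 1, 2, 3, so the row number doubles as a position in new.values. For arbitrary row numbers, a sketch using match() to look up the position could look like this (the rows and values below are made up for illustration):

```r
mat <- matrix(1, 5, 5)
mat[c(1, 8, 10, 14, 16, 22)] <- 0

replace.in.these.rows <- c(2, 5)   # illustrative: not simply 1, 2, 3
new.values <- c(92, 95)

idxs <- which(mat == 0, arr.ind = TRUE)
# drop = FALSE keeps a matrix even if only one zero is selected
chosen <- idxs[idxs[, "row"] %in% replace.in.these.rows, , drop = FALSE]

# match() finds each row's position in replace.in.these.rows,
# which is the position of its value in new.values
mat[chosen] <- new.values[match(chosen[, "row"], replace.in.these.rows)]
mat
```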
Some missing data in a matrix is recorded as NA.
How can I delete the rows that contain NA?
Can I use na.rm?
na.omit() will take matrices (and data frames) and return only those rows with no NA values whatsoever - it takes complete.cases() one step further by deleting the FALSE rows for you.
> x <- data.frame(c(1,2,3), c(4, NA, 6))
> x
c.1..2..3. c.4..NA..6.
1 1 4
2 2 NA
3 3 6
> na.omit(x)
c.1..2..3. c.4..NA..6.
1 1 4
3 3 6
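Since the question is about a matrix rather than a data frame, here is the same idea as a short sketch on a matrix. The result stays a matrix and carries an na.action attribute recording which rows were dropped.

```r
x <- matrix(c(1:8, NA), 3, 3)

# Rows containing any NA are removed
y <- na.omit(x)
y

# Index of the dropped row(s)
dropped <- as.integer(attr(y, "na.action"))
dropped
```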
I think na.rm usually only works within functions, say for the mean function. I would go with complete.cases: http://stat.ethz.ch/R-manual/R-patched/library/stats/html/complete.cases.htm
Let's say you have the following 3x3 matrix:
x <- matrix(c(1:8, NA), 3, 3)
> x
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 NA
then you can get the complete cases of this matrix with
y <- x[complete.cases(x),]
> y
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
The complete.cases function returns a logical vector that says whether or not each case (row) is complete:
> complete.cases(x)
[1] TRUE TRUE FALSE
and then you index the rows of matrix x and add the "," to say that you want all columns.
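One thing to watch for with this kind of row indexing: if only a single row is complete, R simplifies the result to a plain vector unless drop = FALSE is given. A sketch with made-up data:

```r
# Only the first row has no NA
x <- matrix(c(1, NA, NA, 4, 5, NA), 3, 2)
cc <- complete.cases(x)

a <- x[cc, ]                 # simplified to a vector of length 2
b <- x[cc, , drop = FALSE]   # stays a 1 x 2 matrix

is.null(dim(a))
dim(b)
```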
If you want to remove rows that contain NA's you can use apply() to apply a quick function to check each row. E.g., if your matrix is x,
goodIdx <- apply(x, 1, function(r) !any(is.na(r)))
newX <- x[goodIdx,]
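On large matrices the per-row function call in apply can be slow. A vectorized variant of the same check, offered as a sketch, counts the NA values per row with rowSums instead; the name keep is illustrative.

```r
x <- matrix(c(1:8, NA), 3, 3)

# is.na(x) is a logical matrix; a row is kept when it contains zero NAs
keep <- rowSums(is.na(x)) == 0
newX <- x[keep, , drop = FALSE]
newX
```

For this matrix, keep is the same logical vector that complete.cases(x) returns.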