I have this matrix
[,1] [,2] [,3] [,4]
[1,] FALSE TRUE TRUE TRUE
[2,] TRUE TRUE FALSE TRUE
[3,] TRUE TRUE TRUE TRUE
[4,] FALSE TRUE TRUE FALSE
[5,] TRUE TRUE TRUE TRUE
[6,] TRUE TRUE FALSE TRUE
[7,] TRUE TRUE FALSE TRUE
[8,] TRUE FALSE TRUE FALSE
[9,] TRUE TRUE TRUE TRUE
[10,] TRUE TRUE TRUE TRUE
I need to count how many times TRUE and FALSE appear in each of the columns. How can I do that? Thanks
We could use colSums (assuming it is a logical matrix)
n_trues <- colSums(m1)
n_false <- nrow(m1) - n_trues
Another option is table applied column-wise (note that apply() simplifies the result to a matrix only when every column contains both TRUE and FALSE; otherwise it returns a list):
apply(m1, 2, table)
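For illustration, here is a minimal sketch with a random stand-in matrix (not the question's data, so the counts will differ):
set.seed(1)
m1 <- matrix(sample(c(TRUE, FALSE), 40, replace = TRUE), nrow = 10)  # stand-in data
colSums(m1)   # number of TRUE values per column
colSums(!m1)  # number of FALSE values per column, same as nrow(m1) - colSums(m1)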
I have many columns of logical vectors and would like to merge two or more of them into one column, so that if any value in a row is TRUE, the merged column is TRUE for that row.
Here is an example of 2 columns and the various combinations
X <- c(T,F,T,F,F,T,F,T,T,F,F,F)
Y <- matrix(X,nrow = 6, ncol = 2)
Y
[,1] [,2]
[1,] TRUE FALSE
[2,] FALSE TRUE
[3,] TRUE TRUE
[4,] FALSE FALSE
[5,] FALSE FALSE
[6,] TRUE FALSE
How can I create a third column that "adds" the TRUEs and is FALSE only when both columns are FALSE? And would this also work if there were three or more columns to be combined?
If you have logical vectors in all the columns, you can use rowSums
cbind(Y, rowSums(Y) > 0)
# [,1] [,2] [,3]
#[1,] TRUE FALSE TRUE
#[2,] FALSE TRUE TRUE
#[3,] TRUE TRUE TRUE
#[4,] FALSE FALSE FALSE
#[5,] FALSE FALSE FALSE
#[6,] TRUE FALSE TRUE
This returns TRUE if there is at least one TRUE in the row and FALSE otherwise. It also works for any number of columns.
A base R alternative uses ifelse. Note that it compares the logical columns against the string "TRUE" (which works through coercion) and produces a character column of "TRUE"/"FALSE" rather than a logical one:
X <- c(T,F,T,F,F,T,F,T,T,F,F,F)
Y <- as.data.frame(matrix(X, nrow = 6, ncol = 2))
unique(Y$V1)  # inspect the distinct values in the first column
Y$condition <- ifelse(Y$V1 == "TRUE" | Y$V2 == "TRUE", "TRUE", "FALSE")
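As an aside (not part of the original answer): since V1 and V2 are already logical, the same combination can be written directly with |, which keeps the result logical instead of character:
X <- c(T,F,T,F,F,T,F,T,T,F,F,F)
Y <- as.data.frame(matrix(X, nrow = 6, ncol = 2))
Y$condition <- Y$V1 | Y$V2  # logical result; no string comparison needed
Y$condition
# [1]  TRUE  TRUE  TRUE FALSE FALSE  TRUE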
Here is a possible solution using apply() and the logical operator | that will work for any number of columns of Y.
result = cbind(Y, apply(Y, 1, FUN = function (x) Reduce(f="|", x)))
result
# [,1] [,2] [,3]
# [1,] TRUE FALSE TRUE
# [2,] FALSE TRUE TRUE
# [3,] TRUE TRUE TRUE
# [4,] FALSE FALSE FALSE
# [5,] FALSE FALSE FALSE
# [6,] TRUE FALSE TRUE
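A related sketch, not from the original answers: Reduce can also be folded across the columns directly (this assumes Y is the logical matrix built above and R >= 3.6 for asplit), which avoids looping over the rows:
X <- c(T,F,T,F,F,T,F,T,T,F,F,F)
Y <- matrix(X, nrow = 6, ncol = 2)
cbind(Y, Reduce(`|`, asplit(Y, 2)))  # same result as the apply() version above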
Consider the data frame in R:
set.seed(36)
y <- runif(10,0,200)
group <- sample(rep(1:2, each=5))
d <- data.frame(y, group)
I want to compare all y against all y within each group. The following code does this correctly:
d_split <- split(d, d$group)
a <- with(d_split[[1]],outer(y, y, "<="))
b <- with(d_split[[2]],outer(y, y, "<="))
But I am doing this inside a function where the number of groups varies (group will be an argument of that function), so I cannot proceed in this manner. How can I elegantly rewrite the last three lines to compare all y against all y within each group?
To perform the same operation for multiple groups, we can use lapply and apply the outer operation to every group:
lapply(split(d, d$group), function(x) outer(x[["y"]], x[["y"]], "<="))
#$`1`
# [,1] [,2] [,3] [,4] [,5]
#[1,] TRUE TRUE FALSE FALSE FALSE
#[2,] FALSE TRUE FALSE FALSE FALSE
#[3,] TRUE TRUE TRUE FALSE TRUE
#[4,] TRUE TRUE TRUE TRUE TRUE
#[5,] TRUE TRUE FALSE FALSE TRUE
#$`2`
# [,1] [,2] [,3] [,4] [,5]
#[1,] TRUE TRUE FALSE TRUE FALSE
#[2,] FALSE TRUE FALSE TRUE FALSE
#[3,] TRUE TRUE TRUE TRUE TRUE
#[4,] FALSE FALSE FALSE TRUE FALSE
#[5,] TRUE TRUE FALSE TRUE TRUE
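Since the question asks for something that works inside a function where the number of groups varies, a minimal sketch (assuming d has columns y and group as defined above) is simply to wrap the split/lapply idiom:
compare_within <- function(d) {
  lapply(split(d$y, d$group), function(v) outer(v, v, "<="))
}
compare_within(d)  # one comparison matrix per group, for any number of groups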
Here is an option without splitting
library(data.table)
setDT(d)[, as.data.table(outer(y, y, "<=")), group]
# group V1 V2 V3 V4 V5
#1: 1 TRUE TRUE FALSE FALSE FALSE
#2: 1 FALSE TRUE FALSE FALSE FALSE
#3: 1 TRUE TRUE TRUE FALSE TRUE
#4: 1 TRUE TRUE TRUE TRUE TRUE
#5: 1 TRUE TRUE FALSE FALSE TRUE
#6: 2 TRUE TRUE FALSE TRUE FALSE
#7: 2 FALSE TRUE FALSE TRUE FALSE
#8: 2 TRUE TRUE TRUE TRUE TRUE
#9: 2 FALSE FALSE FALSE TRUE FALSE
#10: 2 TRUE TRUE FALSE TRUE TRUE
Or in a 'long' format with CJ
setDT(d)[, CJ(y, y), group][, V1 <= V2, group]
I have a vector representing a partitioning of objects into clusters:
#9 objects partitioned into 6 clusters
> part1 <- c(1,2,3,1,4,2,2,5,6)
I can easily create a similarity matrix where the measure of similarity is just {0,1}: 0 if two elements are in different clusters and 1 if they are in the same one:
> sim <- outer(part1,part1,"==")
> sim
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
[2,] FALSE TRUE FALSE FALSE FALSE TRUE TRUE FALSE FALSE
[3,] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
[4,] TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
[5,] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
[6,] FALSE TRUE FALSE FALSE FALSE TRUE TRUE FALSE FALSE
[7,] FALSE TRUE FALSE FALSE FALSE TRUE TRUE FALSE FALSE
[8,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
[9,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
But for large vectors (hundreds of thousands of objects) it doesn't work due to memory limits.
Clusters are small on average, so a sparse matrix would be compact enough. I've looked through the Matrix package and couldn't find anything like outer() for sparse objects.
So is there any other simple way to create such a matrix directly from the vector (without looping over all pairs of elements and populating a sparse matrix element by element)?
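One possible direction, offered only as a sketch (it is not from the original thread): build the matrix from the within-cluster index pairs with Matrix::sparseMatrix, so only the TRUE cells are ever materialised:
library(Matrix)
part1 <- c(1,2,3,1,4,2,2,5,6)
idx <- split(seq_along(part1), part1)  # object indices grouped by cluster
pairs <- do.call(rbind, lapply(idx, function(i) expand.grid(i = i, j = i)))
sim_sparse <- sparseMatrix(i = pairs$i, j = pairs$j, x = TRUE,
                           dims = rep(length(part1), 2))
all(as.matrix(sim_sparse) == outer(part1, part1, "=="))  # TRUE on this small example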
I want to compare each element in one vector (D) to each element in another vector (E) such that I get a matrix with dimensions length(D)xlength(E).
The comparison in question is of the form:
abs(D[i]-E[j])<0.1
So for
D <- c(1:5)
E <- c(2:6)
I want to get
[,1] [,2] [,3] [,4] [,5]
[1,] FALSE TRUE FALSE FALSE FALSE
[2,] FALSE FALSE TRUE FALSE FALSE
[3,] FALSE FALSE FALSE TRUE FALSE
[4,] FALSE FALSE FALSE FALSE TRUE
[5,] FALSE FALSE FALSE FALSE FALSE
(Or 1s and 0s to the same effect)
I have been able to get that output by doing something clunky like:
rbind(D%in%E[1],D%in%E[2],D%in%E[3],D%in%E[4],D%in%E[5])
and I could write a loop over 1:length(E), but surely there is a simple name and simple code for this operation? I have been struggling to find the right terms to search for an answer to this question.
You can use outer to perform the calculation in a vectorized manner across all pairs of elements in D and E:
outer(E, D, function(x, y) abs(x - y) < 0.1)
# [,1] [,2] [,3] [,4] [,5]
# [1,] FALSE TRUE FALSE FALSE FALSE
# [2,] FALSE FALSE TRUE FALSE FALSE
# [3,] FALSE FALSE FALSE TRUE FALSE
# [4,] FALSE FALSE FALSE FALSE TRUE
# [5,] FALSE FALSE FALSE FALSE FALSE
I see two benefits over the sort of approach you've included in your question:
It is less typing
It is more efficient: the function is called just once on vectors containing every pair of x and y values, so it should be quicker than comparing E[1] against every element of D, then E[2], and so on.
Actually, a direct approach would be (thanks to @alexis_laz):
n <- length(E)
abs(E - matrix(D, ncol = n, nrow = n, byrow = TRUE)) < 0.1
# [,1] [,2] [,3] [,4] [,5]
#[1,] FALSE TRUE FALSE FALSE FALSE
#[2,] FALSE FALSE TRUE FALSE FALSE
#[3,] FALSE FALSE FALSE TRUE FALSE
#[4,] FALSE FALSE FALSE FALSE TRUE
#[5,] FALSE FALSE FALSE FALSE FALSE
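A small generalisation, under the assumption that D and E may have different lengths (the version above uses a single n for both dimensions, so it implicitly assumes they are equal):
D <- c(1:5)
E <- c(2:6)
nr <- length(E); nc <- length(D)
abs(matrix(E, nrow = nr, ncol = nc) -
    matrix(D, nrow = nr, ncol = nc, byrow = TRUE)) < 0.1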
If I have this matrix (which I named data):
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE
[2,] FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE
[3,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[4,] FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE
[5,] FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE
[6,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[7,] FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE
[8,] FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE
[9,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
And I want to combine the columns into a single column like this (where at least one TRUE in the row gives TRUE):
[,1]
[1,] TRUE
[2,] TRUE
[3,] FALSE
[4,] TRUE
[5,] TRUE
[6,] FALSE
[7,] TRUE
[8,] TRUE
[9,] FALSE
I know I could do something like (using the |):
data2[1:9,1]<-data[,1]|data[,2]|data[,3]|data[,4]…
data2 would then contain a single column with the different columns combined. But this is not a good approach if I had lots of columns (for example, ncol = 100).
I guess there is some simple way of doing it?
Thanks
Here is another answer that takes advantage of how R converts between logicals and numerics:
When going from logical to numeric, FALSE becomes 0 and TRUE becomes 1, so rowSums gives you the number of TRUEs per row:
rowSums(data)
# [1] 3 3 0 3 3 0 3 3 0
When going from numeric to logical, 0 becomes FALSE and anything else becomes TRUE, so you can feed the output of rowSums to as.logical and it will indicate whether a row has at least one TRUE:
as.logical(rowSums(data))
# [1] TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE FALSE
I like Tyler's answer though; it might be less efficient (to be proven), but I find it more intuitive.
You could use any with apply as in:
mat <- matrix(sample(c(TRUE, FALSE), 100,TRUE), 10)
apply(mat, 1, any)
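As a usage sketch tying this back to the question (with a random stand-in for data rather than retyping the 9x9 matrix above), both routes give the same single combined column:
set.seed(42)
data <- matrix(sample(c(TRUE, FALSE), 81, replace = TRUE), nrow = 9)  # stand-in
data2 <- matrix(apply(data, 1, any), ncol = 1)                 # via any()
identical(data2, matrix(as.logical(rowSums(data)), ncol = 1))  # TRUE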