Combining a matrix of TRUE/FALSE into one - r

If I have this matrix (which I named data):
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE
[2,] FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE
[3,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[4,] FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE
[5,] FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE
[6,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[7,] FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE
[8,] FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE
[9,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
And I want to combine the columns into one single column like this (where at least one TRUE in a row gives TRUE):
[,1]
[1,] TRUE
[2,] TRUE
[3,] FALSE
[4,] TRUE
[5,] TRUE
[6,] FALSE
[7,] TRUE
[8,] TRUE
[9,] FALSE
I know I could do something like (using the |):
data2[1:9,1]<-data[,1]|data[,2]|data[,3]|data[,4]…
data2 would then contain a single column combining all the columns. But this is not a good approach when there are lots of columns (for example ncol=100).
I guess there is some simple way of doing it?
Thanks

Here is another answer that takes advantage of how R converts between logicals and numerics:
When going from logical to numeric, FALSE becomes 0 and TRUE becomes 1 so rowSums gives you the number of TRUE per row:
rowSums(data)
# [1] 3 3 0 3 3 0 3 3 0
When going from numeric to logical, 0 becomes FALSE, anything else is TRUE, so you can feed the output of rowSums to as.logical and it will indicate if a row has at least one TRUE:
as.logical(rowSums(data))
# [1] TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE FALSE
I like Tyler's answer though: it might be less efficient (to be proven), but I find it more intuitive.

You could use any with apply as in:
mat <- matrix(sample(c(TRUE, FALSE), 100,TRUE), 10)
apply(mat, 1, any)
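As a quick check that the two answers agree, here is a minimal sketch that rebuilds the 9x9 matrix from the question and compares both approaches. The names row_a, row_b, row_c, via_sums and via_any are made up for illustration; the last line reshapes the result into the single-column matrix the question asks for.

```r
# Rebuild the matrix from the question: three row patterns, repeated three times.
row_a <- rep(c(FALSE, TRUE, FALSE), 3)   # rows 1, 4, 7
row_b <- rep(c(FALSE, FALSE, TRUE), 3)   # rows 2, 5, 8
row_c <- rep(FALSE, 9)                   # rows 3, 6, 9
data <- matrix(rep(c(row_a, row_b, row_c), 3), nrow = 9, byrow = TRUE)

via_sums <- rowSums(data) > 0      # same idea as as.logical(rowSums(data))
via_any  <- apply(data, 1, any)    # one call to any() per row

identical(via_sums, via_any)
# [1] TRUE

# Single-column matrix, as in the question's desired output
data2 <- matrix(via_any, ncol = 1)
```

Both versions work for any number of columns, so nothing changes for ncol=100.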

Related

Count how many string value appears in a column R Programming

I have this matrix
[,1] [,2] [,3] [,4]
[1,] FALSE TRUE TRUE TRUE
[2,] TRUE TRUE FALSE TRUE
[3,] TRUE TRUE TRUE TRUE
[4,] FALSE TRUE TRUE FALSE
[5,] TRUE TRUE TRUE TRUE
[6,] TRUE TRUE FALSE TRUE
[7,] TRUE TRUE FALSE TRUE
[8,] TRUE FALSE TRUE FALSE
[9,] TRUE TRUE TRUE TRUE
[10,] TRUE TRUE TRUE TRUE
I need to count how many times TRUE and FALSE appear in each of the columns. How can I do that? Thanks
We could use colSums (assuming it is a logical matrix)
n_trues <- colSums(m1)
n_false <- nrow(m1) - n_trues
Or another option is table by column
apply(m1, 2, table)
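One caveat with the table approach, shown as a small sketch on a made-up matrix m (not the m1 from the question): when a column contains only one of the two values, the per-column tables have different lengths and apply returns a list instead of a matrix. Fixing the levels with factor should keep the result rectangular.

```r
# Hypothetical 3 x 2 logical matrix; column 1 has no FALSE at all
m <- matrix(c(TRUE, TRUE, TRUE,
              TRUE, FALSE, TRUE), nrow = 3)

# Tables of different lengths, so apply() returns a list here
is.list(apply(m, 2, table))
# [1] TRUE

# Forcing both levels gives a 2 x ncol(m) matrix:
# first row counts FALSE, second row counts TRUE
counts <- apply(m, 2, function(col) table(factor(col, levels = c(FALSE, TRUE))))

# The TRUE row matches the colSums approach
all(counts[2, ] == colSums(m))
# [1] TRUE
```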

How many elements of a vector are smaller or equal to each element of this vector?

I am interested in writing a program that gives the number of elements of vector x that are smaller or equal to any given value within vector x.
Let's say
x <- c(1,3,8,7,6,4,3,10,12)
I want to calculate the number of elements within x which are smaller than or equal to 1, to 3, to 8, etc. For example, the fifth element, x[5], is 6, and the number of elements smaller than or equal to 6 is 5. However, I only know how to do an element-wise comparison, e.g. x[1]<=x[3]
I suppose that I will be using the for loop and have something like this here:
for (i in length(x)){
if (x[i]<=x[i]){
print(x[i])}
# count number of TRUEs
}
However, this code obviously does not do what I want.
Use outer to make all comparisons at once:
outer(x, x, "<=")
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
# [1,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
# [2,] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
# [3,] FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE TRUE
# [4,] FALSE FALSE TRUE TRUE FALSE FALSE FALSE TRUE TRUE
# [5,] FALSE FALSE TRUE TRUE TRUE FALSE FALSE TRUE TRUE
# [6,] FALSE FALSE TRUE TRUE TRUE TRUE FALSE TRUE TRUE
# [7,] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
# [8,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE
# [9,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
colSums(outer(x, x, "<="))
#[1] 1 3 7 6 5 4 3 8 9
You can also use the *apply family as follows:
sapply(x, function(i) sum(x <= i))
#[1] 1 3 7 6 5 4 3 8 9
We can use findInterval
findInterval(x, sort(x))
#[1] 1 3 7 6 5 4 3 8 9
Another alternative is to use rank, which ranks the values. Setting the ties.method argument to "max" gives tied values the highest rank in their group, which matches the inclusive count ("<=" rather than "<").
rank(x, ties.method="max")
#[1] 1 3 7 6 5 4 3 8 9
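For comparison, here is a sketch of what the loop from the question could look like once it iterates over every index (seq_along(x) rather than length(x)) and compares the whole vector against x[i]. It then checks that the loop and the four vectorized answers above return the same counts. The counts_* names are illustrative only.

```r
x <- c(1, 3, 8, 7, 6, 4, 3, 10, 12)

# Corrected loop: visit every index and count the elements <= x[i]
counts_loop <- integer(length(x))
for (i in seq_along(x)) {
  counts_loop[i] <- sum(x <= x[i])   # each TRUE counts as 1
}

counts_outer <- colSums(outer(x, x, "<="))
counts_apply <- sapply(x, function(i) sum(x <= i))
counts_find  <- findInterval(x, sort(x))
counts_rank  <- rank(x, ties.method = "max")

# All five agree (compared by value, since some are integer and some double)
all(counts_loop == counts_outer,
    counts_loop == counts_apply,
    counts_loop == counts_find,
    counts_loop == counts_rank)
# [1] TRUE
```

Note that outer builds a full length(x) by length(x) matrix, so for long vectors the findInterval and rank versions are likely to be much lighter on memory.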

How to create large similarity matrix as a sparse one, from partitioning data?

I have a vector representing a partitioning of objects into clusters:
#9 objects partitioned into 6 clusters
> part1 <- c(1,2,3,1,4,2,2,5,6)
I can easily create similarity matrix where the measure of similarity is just {0,1}: 0 if two elements are in different clusters, and 1, if in the same:
> sim <- outer(part1,part1,"==")
> sim
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
[2,] FALSE TRUE FALSE FALSE FALSE TRUE TRUE FALSE FALSE
[3,] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
[4,] TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
[5,] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
[6,] FALSE TRUE FALSE FALSE FALSE TRUE TRUE FALSE FALSE
[7,] FALSE TRUE FALSE FALSE FALSE TRUE TRUE FALSE FALSE
[8,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
[9,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
But for large vectors (100,000's of objects) it doesn't work due to memory limits.
Clusters are small on average, so sparse matrix would be compact enough. I've looked through Matrix package and couldn't find anything like outer() for sparse objects.
So is there any other simple way to create such matrix directly from the vector (without looping through all vector's elements' pairs and populating sparse matrix element by element)?
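One possible approach, sketched here under the assumption that the Matrix package is installed: build a sparse objects-by-clusters indicator matrix and multiply it by its own transpose. Entry (i, j) of the product is 1 exactly when objects i and j share a cluster, and only those entries are stored. The names labels, ind and sim_sparse are illustrative, not from the question.

```r
library(Matrix)

part1 <- c(1, 2, 3, 1, 4, 2, 2, 5, 6)

# Map arbitrary cluster labels to 1..k so they can serve as column indices
labels <- as.integer(factor(part1))

# n x k indicator matrix: ind[i, c] is 1 when object i belongs to cluster c
ind <- sparseMatrix(i = seq_along(labels), j = labels, x = 1)

# ind %*% t(ind): 1 where two objects share a cluster, nothing stored elsewhere
sim_sparse <- tcrossprod(ind)

# Check against the dense version on this small example
all(as.matrix(sim_sparse) == outer(part1, part1, "=="))
# [1] TRUE
```

The number of nonzero entries is the sum of the squared cluster sizes, so this stays compact as long as the clusters are small. The dense check on the last line is only for the toy example and should be skipped for large vectors.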

Matrix comparing each element in vector1 to each element in vector2

I want to compare each element in one vector (D) to each element in another vector (E) such that I get a matrix with dimensions length(D)xlength(E).
The comparison in question is of the form:
abs(D[i]-E[j])<0.1
So for
D <- c(1:5)
E <- c(2:6)
I want to get
[,1] [,2] [,3] [,4] [,5]
[1,] FALSE TRUE FALSE FALSE FALSE
[2,] FALSE FALSE TRUE FALSE FALSE
[3,] FALSE FALSE FALSE TRUE FALSE
[4,] FALSE FALSE FALSE FALSE TRUE
[5,] FALSE FALSE FALSE FALSE FALSE
(Or 1s and 0s to the same effect)
I have been able to get that output by doing something clunky like:
rbind(D%in%E[1],D%in%E[2],D%in%E[3],D%in%E[4],D%in%E[5])
and I could write a loop for 1:length(E), but surely there is a simple name and simple code for this operation? I have been struggling to find the language to search for an answer to this question.
You can use outer to perform the calculation in a vectorized manner across all pairs of elements in D and E:
outer(E, D, function(x, y) abs(x-y) < 0.1)
# [,1] [,2] [,3] [,4] [,5]
# [1,] FALSE TRUE FALSE FALSE FALSE
# [2,] FALSE FALSE TRUE FALSE FALSE
# [3,] FALSE FALSE FALSE TRUE FALSE
# [4,] FALSE FALSE FALSE FALSE TRUE
# [5,] FALSE FALSE FALSE FALSE FALSE
I see two benefits over the sort of approach you've included in your question:
It is less typing
It is more efficient: outer calls the function just once, on vectors that hold every pair of x and y values, so it should be quicker than comparing E[1] against every element of D, then E[2], and so on.
Actually a direct approach would be (thanks to @alexis_laz):
# each row of the length(E) x length(D) matrix repeats D; E is recycled down the columns
abs(E - matrix(D, nrow=length(E), ncol=length(D), byrow=TRUE)) < 0.1
# [,1] [,2] [,3] [,4] [,5]
#[1,] FALSE TRUE FALSE FALSE FALSE
#[2,] FALSE FALSE TRUE FALSE FALSE
#[3,] FALSE FALSE FALSE TRUE FALSE
#[4,] FALSE FALSE FALSE FALSE TRUE
#[5,] FALSE FALSE FALSE FALSE FALSE
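The same idea carries over to vectors of different lengths. In this sketch E is given a sixth element purely for illustration; the result has length(E) rows and length(D) columns, and subtracting with outer first and thresholding afterwards gives the same matrix. The names by_function and by_subtract are made up.

```r
D <- 1:5
E <- 2:7   # hypothetical: one element longer than D

# Comparison done inside the function passed to outer()
by_function <- outer(E, D, function(e, d) abs(e - d) < 0.1)

# Pairwise differences first, threshold afterwards
by_subtract <- abs(outer(E, D, "-")) < 0.1

dim(by_function)
# [1] 6 5
identical(by_function, by_subtract)
# [1] TRUE
```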

Exclude multiple words from a vector with grepl

Here sample data:
exclude.words <- c("zoznam","azet","dovera","joj","alza","telecom","google","post","sme")
main.data <- c("zoznam","registration","azet","azet.com","dovera","dna","joj","alza","telecom","google","post","sme")
The following works if the words are equal (match exactly), but azet.com is not excluded. For that we could use agrepl().
main.data[!(main.data %in% exclude.words)]
So how to use agrepl with two vectors?
main.data[!agrepl(main.data, exclude.words)]
As commented, you can use:
main.data[!grepl(paste(exclude.words, collapse = "|"), main.data)]
to exclude any words in main.data that partially or completely match one of the exclude.words.
paste(exclude.words, collapse = "|")
creates a single string with "|" (the regular-expression alternation operator, i.e. OR) between the exclude.words, which can be used as a single pattern in grepl. Therefore, you don't need to loop over the individual words.
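Put together with the sample data from the question, the approach looks like this (the variable name pattern is just for illustration):

```r
exclude.words <- c("zoznam", "azet", "dovera", "joj", "alza",
                   "telecom", "google", "post", "sme")
main.data <- c("zoznam", "registration", "azet", "azet.com", "dovera", "dna",
               "joj", "alza", "telecom", "google", "post", "sme")

# One regular expression that matches any of the words
pattern <- paste(exclude.words, collapse = "|")
pattern
# [1] "zoznam|azet|dovera|joj|alza|telecom|google|post|sme"

# Keep only the entries that contain none of the words
main.data[!grepl(pattern, main.data)]
# [1] "registration" "dna"
```

Because the words are used as a regular expression, any word containing metacharacters such as "." would need escaping first; the words in this example contain none.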
main.data[!as.logical(rowSums(sapply(exclude.words, function(x) agrepl(x, main.data))))]
# [1] "registration" "dna"
# clarification
sapply(exclude.words, function(x) agrepl(x, main.data))
# zoznam azet dovera joj alza telecom google post sme
# [1,] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [2,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [3,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [4,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [5,] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
# [6,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [7,] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
# [8,] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
# [9,] FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
# [10,] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
# [11,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
# [12,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
You can use this functional programming approach:
library(functional)
funcs = lapply(exclude.words, function(u) function(x) x[!grepl(u, x)])
Reduce(Compose, funcs)(main.data)
#[1] "registration" "dna"
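If you would rather not depend on the functional package, a similar fold can be sketched with base R's Reduce alone, starting from main.data and filtering out one word per step. This is an alternative formulation, not the code from the answer above; the names remaining, word and kept are made up.

```r
exclude.words <- c("zoznam", "azet", "dovera", "joj", "alza",
                   "telecom", "google", "post", "sme")
main.data <- c("zoznam", "registration", "azet", "azet.com", "dovera", "dna",
               "joj", "alza", "telecom", "google", "post", "sme")

# Start with main.data, then drop the entries matching each word in turn
kept <- Reduce(function(remaining, word) remaining[!grepl(word, remaining)],
               exclude.words, init = main.data)
kept
# [1] "registration" "dna"
```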
