How to merge logical vectors into a new column - r

I have many columns of logical vectors and would like to merge two or more of them into one, so that the merged column is TRUE whenever any value in the row is TRUE.
Here is an example with 2 columns and the various combinations:
X <- c(T,F,T,F,F,T,F,T,T,F,F,F)
Y <- matrix(X,nrow = 6, ncol = 2)
Y
[,1] [,2]
[1,] TRUE FALSE
[2,] FALSE TRUE
[3,] TRUE TRUE
[4,] FALSE FALSE
[5,] FALSE FALSE
[6,] TRUE FALSE
How can I create a 3rd column that is TRUE if either column is TRUE and FALSE only if both are FALSE? Would this also work with 3 or more columns?

If you have logical vectors in all the columns, you can use rowSums
cbind(Y, rowSums(Y) > 0)
# [,1] [,2] [,3]
#[1,] TRUE FALSE TRUE
#[2,] FALSE TRUE TRUE
#[3,] TRUE TRUE TRUE
#[4,] FALSE FALSE FALSE
#[5,] FALSE FALSE FALSE
#[6,] TRUE FALSE TRUE
This returns TRUE if there is at least one TRUE in a row and FALSE otherwise. It works for any number of columns.
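As a minimal sketch of the "any number of columns" claim, the following builds a hypothetical three-column logical matrix (Y3 is a name introduced here for illustration, not part of the question) and checks that rowSums agrees with calling any() on each row.

```r
# Y3 is a hypothetical 4 x 3 logical matrix used only for illustration
Y3 <- matrix(c(TRUE,  FALSE, FALSE, FALSE,
               FALSE, FALSE, TRUE,  FALSE,
               FALSE, FALSE, FALSE, FALSE),
             nrow = 4, ncol = 3)

# TRUE counts as 1 and FALSE as 0, so a row sum above 0 means "at least one TRUE"
merged <- rowSums(Y3) > 0

# The same result, computed row by row with any()
merged_any <- apply(Y3, 1, any)

merged
# [1]  TRUE FALSE  TRUE FALSE
```

If the columns may contain NA, rowSums(Y3, na.rm = TRUE) > 0 treats the missing values as FALSE; whether that is the right choice depends on the data.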

Here is a base R approach using a data frame:
X <- c(T,F,T,F,F,T,F,T,T,F,F,F)
Y <- as.data.frame(matrix(X,nrow = 6, ncol = 2))
# V1 and V2 are already logical, so combine them directly with |
# (comparing against the string "TRUE" would produce character results)
Y$condition <- Y$V1 | Y$V2

Here is a possible solution using apply() and logical operator | that will work for any number of columns of Y.
result = cbind(Y, apply(Y, 1, FUN = function (x) Reduce(f="|", x)))
result
# [,1] [,2] [,3]
# [1,] TRUE FALSE TRUE
# [2,] FALSE TRUE TRUE
# [3,] TRUE TRUE TRUE
# [4,] FALSE FALSE FALSE
# [5,] FALSE FALSE FALSE
# [6,] TRUE FALSE TRUE
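A possible variation, sketched here as an assumption rather than as part of the answer above, is to apply Reduce across whole columns instead of within each row. Each | call is then vectorized over all rows at once. The helper names cols and any_true are introduced only for this sketch.

```r
Y <- matrix(c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE,
              FALSE, TRUE, TRUE, FALSE, FALSE, FALSE),
            nrow = 6, ncol = 2)

# Split the matrix into a list of columns, then fold them together with |
cols <- lapply(seq_len(ncol(Y)), function(j) Y[, j])
any_true <- Reduce(`|`, cols)

result <- cbind(Y, any_true)
any_true
# [1]  TRUE  TRUE  TRUE FALSE FALSE  TRUE
```

This also works unchanged for three or more columns, since Reduce simply keeps folding in the next column.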

Related

I cannot convert my list to a lower triangular matrix

I have two lower triangular logical matrices (TRUE/FALSE). I would like to replace all the TRUE values with numbers of my choosing. I wrote the code below, but it does not work. The example given is very similar to my real problem, which is much more complicated.
Here is my code:
> Matrix
[[1]]
[,1] [,2] [,3] [,4] [,5]
[1,] FALSE FALSE FALSE FALSE FALSE
[2,] TRUE FALSE FALSE FALSE FALSE
[3,] TRUE TRUE FALSE FALSE FALSE
[4,] TRUE TRUE TRUE FALSE FALSE
[5,] TRUE TRUE TRUE TRUE FALSE
[[2]]
[,1] [,2] [,3] [,4] [,5]
[1,] FALSE FALSE FALSE FALSE FALSE
[2,] TRUE FALSE FALSE FALSE FALSE
[3,] TRUE TRUE FALSE FALSE FALSE
[4,] TRUE TRUE TRUE FALSE FALSE
[5,] TRUE TRUE TRUE TRUE FALSE
np1 <- 10
np2 <- 10
np <-list(np1, np2)
Here are the numbers that need to take the places of the TRUE values.
new <- list(c(1,2,3,4,5,6,7,8,9,10),c(1,2,3,4,5,6,7,8,9,10))
To do so, I wrote this code:
new1 <- lapply(1:2, function(i) matrix(0, 5, 5))
new1 <- lapply(1:2, function(i){new1[[i]][Matrix[[i]]] <- new[[i]][1:np[[i]]]})
Try the following.
new1 <- lapply(1:2, function(i) matrix(0, 5, 5))
new1 <- lapply(1:2, function(i){new1[[i]][Matrix[[i]]][] <- 1:np[[i]]; new1[[i]]})
new1
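The anonymous function in the second lapply needs to return the modified matrix. An assignment evaluates to the assigned values, so your version returned the replacement vector rather than the 5x5 matrix. The trailing empty [] is optional here, since assigning through a logical index already keeps the dimensions of new1[[i]]. The following exact equivalent may be more readable.
new1 <- lapply(1:2, function(i){
  new1[[i]][Matrix[[i]]][] <- 1:np[[i]] # the last [] is optional
  new1[[i]] # return the matrix, not the assigned values
})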
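The same idea can be sketched with Map, which walks over the mask, the values, and the counts in parallel. This is an illustrative alternative, not the answer's method; the objects below are hypothetical stand-ins rebuilt with lower.tri so that the block runs on its own, and the argument names mask, values, and n are made up for the sketch.

```r
# Hypothetical stand-ins for the objects in the question
Matrix <- list(lower.tri(matrix(0, 5, 5)), lower.tri(matrix(0, 5, 5)))
new    <- list(1:10, 1:10)
np     <- list(10, 10)

# Each call fills a fresh zero matrix of the same shape as the mask
new1 <- Map(function(mask, values, n) {
  m <- matrix(0, nrow(mask), ncol(mask))
  m[mask] <- values[seq_len(n)]  # TRUE positions are filled in column-major order
  m                              # return the matrix, not the assigned values
}, Matrix, new, np)

new1[[1]]
```

Because the fill follows column-major order, the first column receives 1 to 4, the second 5 to 7, and so on.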

How many elements of a vector are smaller or equal to each element of this vector?

I am interested in writing a program that, for each value in a vector x, gives the number of elements of x that are smaller than or equal to that value.
Let's say
x <- c(1,3,8,7,6,4,3,10,12)
I want to calculate the number of elements within x which are smaller than or equal to 1, to 3, to 8, etc. For example, the fifth element, x[5], is 6, and the number of elements smaller than or equal to 6 is 5. However, I only know how to do an element-wise comparison, e.g. x[1]<=x[3]
I suppose that I will be using the for loop and have something like this here:
for (i in length(x)){
if (x[i]<=x[i]){
print(x[i])}
# count number of TRUEs
}
However, this code obviously does not do what I want.
Use outer to make all comparisons at once:
outer(x, x, "<=")
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
# [1,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
# [2,] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
# [3,] FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE TRUE
# [4,] FALSE FALSE TRUE TRUE FALSE FALSE FALSE TRUE TRUE
# [5,] FALSE FALSE TRUE TRUE TRUE FALSE FALSE TRUE TRUE
# [6,] FALSE FALSE TRUE TRUE TRUE TRUE FALSE TRUE TRUE
# [7,] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
# [8,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE
# [9,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
colSums(outer(x, x, "<="))
#[1] 1 3 7 6 5 4 3 8 9
You can also use the *apply family as follows,
sapply(x, function(i) sum(x <= i))
#[1] 1 3 7 6 5 4 3 8 9
We can use findInterval
findInterval(x, sort(x))
#[1] 1 3 7 6 5 4 3 8 9
Another alternative is to use rank, which ranks the values. Setting the ties.method argument to "max" gives tied values the highest rank in their group, which makes the count inclusive ("<=" rather than "<").
rank(x, ties.method="max")
#[1] 1 3 7 6 5 4 3 8 9
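As a quick sanity check, the four approaches above can be run side by side and compared. The list name counts and the helper vector agree are introduced only for this sketch.

```r
x <- c(1, 3, 8, 7, 6, 4, 3, 10, 12)

counts <- list(
  outer        = colSums(outer(x, x, "<=")),
  sapply       = sapply(x, function(i) sum(x <= i)),
  findInterval = findInterval(x, sort(x)),
  rank         = rank(x, ties.method = "max")
)

# Compare values with == rather than identical(), since some methods
# return integers and others return doubles
agree <- vapply(counts, function(v) all(v == counts$outer), logical(1))
agree
```

For long vectors, outer builds a full length(x) by length(x) matrix, so the findInterval and rank versions are likely to use far less memory.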

Matrix comparing each element in vector1 to each element in vector2

I want to compare each element in one vector (D) to each element in another vector (E) such that I get a matrix with dimensions length(D)xlength(E).
The comparison in question is of the form:
abs(D[i]-E[j])<0.1
So for
D <- c(1:5)
E <- c(2:6)
I want to get
[,1] [,2] [,3] [,4] [,5]
[1,] FALSE TRUE FALSE FALSE FALSE
[2,] FALSE FALSE TRUE FALSE FALSE
[3,] FALSE FALSE FALSE TRUE FALSE
[4,] FALSE FALSE FALSE FALSE TRUE
[5,] FALSE FALSE FALSE FALSE FALSE
(Or 1s and 0s to the same effect)
I have been able to get that output by doing something clunky like:
rbind(D%in%E[1],D%in%E[2],D%in%E[3],D%in%E[4],D%in%E[5])
and I could write a loop for 1:length(E), but surely there is a simple name and simple code for this operation? I have been struggling to find the language to search for an answer to this question.
You can use outer to perform the calculation in a vectorized manner across all pairs of elements in D and E:
outer(E, D, function(x, y) abs(x-y) < 0.1)
# [,1] [,2] [,3] [,4] [,5]
# [1,] FALSE TRUE FALSE FALSE FALSE
# [2,] FALSE FALSE TRUE FALSE FALSE
# [3,] FALSE FALSE FALSE TRUE FALSE
# [4,] FALSE FALSE FALSE FALSE TRUE
# [5,] FALSE FALSE FALSE FALSE FALSE
I see two benefits over the sort of approach you've included in your question:
It is less typing
It is more efficient: the function is called just once with every single pair of x and y values, so it should be quicker than comparing E[1] against every element of D, then E[2], and so on.
Actually, a direct approach would be (thanks to @alexis_laz):
n = length(E)
abs(E - matrix(D, ncol=n, nrow=n, byrow=TRUE)) < 0.1
# [,1] [,2] [,3] [,4] [,5]
#[1,] FALSE TRUE FALSE FALSE FALSE
#[2,] FALSE FALSE TRUE FALSE FALSE
#[3,] FALSE FALSE FALSE TRUE FALSE
#[4,] FALSE FALSE FALSE FALSE TRUE
#[5,] FALSE FALSE FALSE FALSE FALSE
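The matrix() version above uses length(E) for both dimensions, so as written it presumes D and E have the same length. The outer version has no such restriction. A small sketch with made-up, non-integer inputs of different lengths (the values and the name close are assumptions for illustration):

```r
# Hypothetical inputs of different lengths
D <- c(1.00, 2.05, 3.50)
E <- c(2.00, 3.45, 5.00, 1.02)

# Rows follow E and columns follow D, as in the answers above
close <- outer(E, D, function(e, d) abs(e - d) < 0.1)

dim(close)
# [1] 4 3

# Row/column positions of the pairs that are within the tolerance
which(close, arr.ind = TRUE)
```

Here the close pairs are E[1] with D[2], E[2] with D[3], and E[4] with D[1].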

retrieve specific entries of a matrix based on values from a data frame

I have a data frame of the form:
my.df = data.frame(ID=c(1,2,3,4,5,6,7), STRAND=c('+','+','+','-','+','-','+'), COLLAPSE=c(0,0,1,0,1,0,0))
and a matrix of dimensions nrow(my.df) by nrow(my.df). It is a correlation matrix, but that's not important for the discussion.
For example:
mat = matrix(rnorm(n=nrow(my.df)*nrow(my.df),mean=1,sd=1), nrow = nrow(my.df), ncol=nrow(my.df))
The question is how to retrieve only the upper-triangle elements of mat for pairs of rows of my.df that both have COLLAPSE == 0 and are on the same strand.
In this specific example, I'd be interested in retrieving the following entries from matrix mat in a vector:
mat[1,2]
mat[1,7]
mat[2,7]
mat[4,6]
The logic is as follows: 1 and 2 are on the same strand and both have a collapse value of zero, so mat[1,2] should be retrieved; 3 would never be combined with any other row because its collapse value is 1; 1 and 7 are on the same strand and both have a collapse value of 0, so mat[1,7] should also be retrieved, and so on.
I could write a for loop but I am looking for a more crantastic way to achieve such results...
Here's one way to do it using outer:
First, find pairs of rows with identical STRAND values where both rows have COLLAPSE == 0:
idx <- with(my.df, outer(STRAND, STRAND, "==") &
              outer(COLLAPSE == 0, COLLAPSE == 0, "&"))
idx
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,] TRUE TRUE FALSE FALSE FALSE FALSE TRUE
# [2,] TRUE TRUE FALSE FALSE FALSE FALSE TRUE
# [3,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [4,] FALSE FALSE FALSE TRUE FALSE TRUE FALSE
# [5,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [6,] FALSE FALSE FALSE TRUE FALSE TRUE FALSE
# [7,] TRUE TRUE FALSE FALSE FALSE FALSE TRUE
Second, set values in lower triangle and on the diagonal to FALSE. Create a numeric index:
idx2 <- which(idx & upper.tri(idx), arr.ind = TRUE)
# row col
# [1,] 1 2
# [2,] 4 6
# [3,] 1 7
# [4,] 2 7
Extract values:
mat[idx2]
# [1] 1.72165093 0.05645659 0.74163428 3.83420241
Here's one way to do it.
# select only the 0 collapse records
sel <- my.df$COLLAPSE==0
# split the data frame by strand
groups <- split(my.df$ID[sel], my.df$STRAND[sel])
# generate all possible pairs of IDs within the same strand
pairs <- lapply(groups, combn, 2)
# subset the entries from the matrix
lapply(pairs, function(ij) mat[t(ij)])
Another option is to build the index pairs with combn within each strand:
df <- my.df[my.df$COLLAPSE == 0, ]
strand <- c("+", "-")
idx <- do.call(rbind, lapply(strand, function(strand){
t(combn(x = df$ID[df$STRAND == strand], m = 2))
}))
idx
# [,1] [,2]
# [1,] 1 2
# [2,] 1 7
# [3,] 2 7
# [4,] 4 6
mat[idx]
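To check that the mask-based and the combn-based approaches pick the same entries, here is a minimal reproducible sketch. It assumes, as the question does, that ID coincides with the row and column position in mat; the seed and the names keep, mask, idx_mask, and idx_pairs are arbitrary choices for this example.

```r
set.seed(1)  # arbitrary seed, only to make the sketch repeatable
my.df <- data.frame(ID = 1:7,
                    STRAND = c('+', '+', '+', '-', '+', '-', '+'),
                    COLLAPSE = c(0, 0, 1, 0, 1, 0, 0))
n <- nrow(my.df)
mat <- matrix(rnorm(n * n, mean = 1, sd = 1), nrow = n, ncol = n)

# Approach 1: logical mask restricted to the upper triangle
keep <- my.df$COLLAPSE == 0
mask <- outer(my.df$STRAND, my.df$STRAND, "==") & outer(keep, keep, "&")
idx_mask <- which(mask & upper.tri(mask), arr.ind = TRUE)

# Approach 2: all ID pairs within each strand
# (each strand needs at least two remaining IDs for combn to behave as intended)
df <- my.df[keep, ]
idx_pairs <- do.call(rbind, lapply(unique(df$STRAND), function(s) {
  t(combn(df$ID[df$STRAND == s], 2))
}))

# Same entries, possibly in a different order
setequal(mat[idx_mask], mat[idx_pairs])
```

The two index matrices list the pairs in different orders, which is why the comparison uses setequal rather than identical.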

Vectorize comparison of values in dataframe

I am trying to compare the value of a parameter in each row of a dataframe with the value of the same parameter in all other rows. The result is a matrix that is TRUE/FALSE at the intersection of each row with each other row. This is simple to implement with loops, but it takes too much processing time on a large dataframe. I am blanking on a way to "vectorize" this code (use apply?) and speed it up. Many thanks in advance.
The code that I use so far:
#dim matrix
adjm<- matrix(0,nrow=nrow(df),ncol=nrow(df))
#score
for(i in 1:nrow(df)){
for(t in 1:nrow(df)){
adjm[t,i]=df$varA[i]==df$varA[t]
}
}
You can use outer to vectorize your code
outer(df$varA, df$varA, "==")
For example
df <- data.frame(varA = c(1, 2, 1, 3, 4, 2))
outer(df$varA, df$varA, "==")
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] TRUE FALSE TRUE FALSE FALSE FALSE
## [2,] FALSE TRUE FALSE FALSE FALSE TRUE
## [3,] TRUE FALSE TRUE FALSE FALSE FALSE
## [4,] FALSE FALSE FALSE TRUE FALSE FALSE
## [5,] FALSE FALSE FALSE FALSE TRUE FALSE
## [6,] FALSE TRUE FALSE FALSE FALSE TRUE
With apply:
apply(df,1,function(x) x[1] == df$varA) # `1` should be column number for `varA`
But that's not technically vectorized.
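A small check that the loop from the question and the outer call give the same matrix, using the example data above. The only change to the loop is that the matrix is preallocated as logical rather than numeric, an adjustment made here so the two results can be compared with identical().

```r
df <- data.frame(varA = c(1, 2, 1, 3, 4, 2))
n <- nrow(df)

# Loop-based version from the question, with a logical matrix preallocated
adjm <- matrix(FALSE, nrow = n, ncol = n)
for (i in seq_len(n)) {
  for (t in seq_len(n)) {
    adjm[t, i] <- df$varA[i] == df$varA[t]
  }
}

# Vectorized version
adjm_outer <- outer(df$varA, df$varA, "==")

identical(adjm, adjm_outer)
# [1] TRUE
```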
