I have two matrices that look similar to this example:
> matrix(1:9, nrow = 3, ncol = 3)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> matrix(rexp(9), 3) < 1
      [,1]  [,2]  [,3]
[1,]  TRUE  TRUE FALSE
[2,] FALSE  TRUE FALSE
[3,] FALSE FALSE  TRUE
I want to sum the entries of each row, but only where the logical matrix of the same size is TRUE; elements where it is FALSE should not be included in the sum. Every row has at least one element where the second matrix is TRUE.
The result should look like this:
     [,1]
[1,]    5
[2,]    5
[3,]    9
Thanks for the help.
Multiplying your TRUE/FALSE matrix by your numeric one will zero out all the elements where it is FALSE. You can then sum by row.
m1 <- matrix(1:9, nrow = 3, ncol = 3)
m2 <- matrix(rexp(9), 3) < 1
as.matrix(rowSums(m1 * m2))  # a vector becomes a one-column matrix
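With the matrices shown in the question, this should give:
#      [,1]
# [1,]    5
# [2,]    5
# [3,]    9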
We can replace the flagged elements with NA and use rowSums with na.rm = TRUE (with m2 defined as in the data section below, TRUE marks the elements to drop):
matrix(rowSums(replace(m1, m2, NA), na.rm = TRUE))
#      [,1]
# [1,]    5
# [2,]    5
# [3,]    9
Or use ifelse
matrix(rowSums(ifelse(m2, 0, m1)))
data
m1 <- matrix(1:9, nrow = 3, ncol = 3)
# note the >= 1: TRUE here marks the elements to drop, i.e. the
# complement of the questioner's logical matrix
m2 <- matrix(rexp(9), 3) >= 1
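Because rexp() gives different draws on every run, here is the same idea with the question's logical matrix hard-coded, so the result is reproducible (keep is a name introduced for this sketch):
m1 <- matrix(1:9, nrow = 3, ncol = 3)
# the logical matrix shown in the question, entered column by column
keep <- matrix(c(TRUE, FALSE, FALSE,
                 TRUE, TRUE, FALSE,
                 FALSE, FALSE, TRUE), nrow = 3)
matrix(rowSums(m1 * keep))
#      [,1]
# [1,]    5
# [2,]    5
# [3,]    9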
I want to compare each value of a row of a matrix (here named df1) to its corresponding value in a vector. Here is an example:
df1 <- matrix(c(2,2,4,8,6,9,9,6,4), ncol = 3)
df2 <- c(5,4,6)
> df1
     [,1] [,2] [,3]
[1,]    2    8    9
[2,]    2    6    6
[3,]    4    9    4
> df2
[1] 5 4 6
The comparison checks whether each value in a row of df1 is smaller than the corresponding value in df2, so row 1: 2 < 5, 8 < 5, 9 < 5; row 2: 2 < 4, 6 < 4, 6 < 4; row 3: 4 < 6, 9 < 6, 4 < 6.
> result
      [,1]  [,2]  [,3]
[1,]  TRUE FALSE FALSE
[2,]  TRUE FALSE FALSE
[3,]  TRUE FALSE  TRUE
Is there any way to do this without use of a loop?
Thanks lads!
We can just do a comparison to create the logical matrix
df1 < df2
#       [,1]  [,2]  [,3]
# [1,]  TRUE FALSE FALSE
# [2,]  TRUE FALSE FALSE
# [3,]  TRUE FALSE  TRUE
It works because of vector recycling: a matrix is stored column by column, so the elements of the vector 'df2' are compared against the first column of 'df1', then the vector is recycled for the second column, and so on. This lines up as a row-wise comparison only because the length of 'df2' equals the number of rows of 'df1'.
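A minimal sketch of the recycling, with names (m, v) introduced here: the vector is laid against each column in turn, so element i of v always meets row i.
m <- matrix(1:6, nrow = 3)  # columns (1,2,3) and (4,5,6)
v <- c(10, 0, 10)
m < v
#       [,1]  [,2]
# [1,]  TRUE  TRUE
# [2,] FALSE FALSE
# [3,]  TRUE  TRUE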
To make the row-wise alignment explicit instead of relying on recycling, we can replicate the vector with row():
df1 < df2[row(df1)]
#       [,1]  [,2]  [,3]
# [1,]  TRUE FALSE FALSE
# [2,]  TRUE FALSE FALSE
# [3,]  TRUE FALSE  TRUE
Or another option is sweep
sweep(df1, 1, df2, "<")
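With df1 and df2 as above, this should return the same logical matrix; MARGIN = 1 states the row-wise intent explicitly instead of relying on recycling:
#       [,1]  [,2]  [,3]
# [1,]  TRUE FALSE FALSE
# [2,]  TRUE FALSE FALSE
# [3,]  TRUE FALSE  TRUE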
Assume we have the following logical matrix in R:
A <- matrix(as.logical(c(0,0,0,1,0,1,0,0,1,0,0,0)), nrow=4)
#       [,1]  [,2]  [,3]
# [1,] FALSE FALSE  TRUE
# [2,] FALSE  TRUE FALSE
# [3,] FALSE FALSE FALSE
# [4,]  TRUE FALSE FALSE
I want to convert this matrix into a column-wise index using
B <- column_wise_index(A)
where column_wise_index returns a vector with as many elements as A has rows (here 4), each element giving the column of A that is TRUE in that row. For A above, B should resemble
B <- c(3,2,0,1)
# [1] 3 2 0 1
where 0 indicates a row that has no TRUE value.
The closest I've come is applying which by row:
unlist(apply(A, 1, function(x) which(x)))
# [1] 3 2 1
However, the result skips 0, and I'm not sure how efficient this is for large matrices (say ~100K x 100 entries).
Here is a solution that is more in the spirit of how you started, but you have to admire @rawr's clever solution.
A <- matrix(as.logical(c(0,0,0,1,0,1,0,0,1,0,0,0)), nrow=4)
TrueSpots = apply(A, 1, which)
TrueSpots[!sapply(TrueSpots, length)] = 0
unlist(TrueSpots)
[1] 3 2 0 1
Update including @akrun's suggestion:
TrueSpots = apply(A, 1, which)
TrueSpots[!lengths(TrueSpots)] = 0
unlist(TrueSpots)
[1] 3 2 0 1
max.col(A) identifies the index at which the maximum entry occurs within each row; ties are broken at random by default. rowSums(A) on a logical matrix counts the TRUE values in each row.
Under the assumption that each row has at most one TRUE value, rowSums(A) is a 0/1 vector, so multiplying by it element-wise zeroes out the rows of A with no TRUE at all.
> A <- matrix(as.logical(c(0,0,0,1,0,1,0,0,1,0,0,0)), nrow=4)
> max.col(A)*rowSums(A)
[1] 3 2 0 1
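If you would rather avoid random tie-breaking entirely, max.col also accepts ties.method = "first", which makes the result deterministic; the all-FALSE rows are still zeroed out by the multiplication:
max.col(A, ties.method = "first") * rowSums(A)
# [1] 3 2 0 1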
I have a data frame of the form:
my.df = data.frame(ID=c(1,2,3,4,5,6,7), STRAND=c('+','+','+','-','+','-','+'), COLLAPSE=c(0,0,1,0,1,0,0))
and another matrix of dimensions nrow(my.df) by nrow(my.df). It is a correlation matrix, but that's not important for the discussion.
For example:
mat = matrix(rnorm(n=nrow(my.df)*nrow(my.df),mean=1,sd=1), nrow = nrow(my.df), ncol=nrow(my.df))
The question is how to retrieve, as a vector, only those upper-triangle elements of mat whose row and column both have COLLAPSE == 0 in my.df and are on the same strand.
In this specific example, I'd be interested in retrieving the following entries of mat:
mat[1,2]
mat[1,7]
mat[2,7]
mat[4,6]
The logic is as follows: rows 1 and 2 are on the same strand and both have a COLLAPSE value of zero, so that pair should be retrieved; row 3 would never be combined with any other row because it has COLLAPSE = 1; rows 1 and 7 are on the same strand and have COLLAPSE = 0, so that pair should also be retrieved, ...
I could write a for loop but I am looking for a more crantastic way to achieve such results...
Here's one way to do it using outer:
First, find indices with identical STRAND values and where COLLAPSE == 0:
idx <- with(my.df, outer(STRAND, STRAND, "==") &
                   outer(COLLAPSE, COLLAPSE, Vectorize(function(x, y) !any(x, y))))
#       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]
# [1,]  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE
# [2,]  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE
# [3,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [4,] FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE
# [5,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [6,] FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE
# [7,]  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE
Second, set values in lower triangle and on the diagonal to FALSE. Create a numeric index:
idx2 <- which(idx & upper.tri(idx), arr.ind = TRUE)
#      row col
# [1,]   1   2
# [2,]   4   6
# [3,]   1   7
# [4,]   2   7
Extract the values (these will differ for you, since mat was generated randomly):
mat[idx2]
# [1] 1.72165093 0.05645659 0.74163428 3.83420241
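As an aside on the first step: since COLLAPSE is coded 0/1, the Vectorize call can arguably be avoided by comparing logical vectors directly; a sketch that should build the same idx:
idx <- with(my.df, outer(STRAND, STRAND, "==") &
                   outer(COLLAPSE == 0, COLLAPSE == 0, "&"))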
Here's another way to do it.
# select only the 0 collapse records
sel <- my.df$COLLAPSE==0
# split the data frame by strand
groups <- split(my.df$ID[sel], my.df$STRAND[sel])
# generate all possible pairs of IDs within the same strand
pairs <- lapply(groups, combn, 2)
# subset the entries from the matrix
lapply(pairs, function(ij) mat[t(ij)])
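If a single vector is wanted rather than a per-strand list, unlist collapses the result (use.names = FALSE drops the strand labels):
unlist(lapply(pairs, function(ij) mat[t(ij)]), use.names = FALSE)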
# keep only the rows that may be combined
df <- my.df[my.df$COLLAPSE == 0, ]
strand <- c("+", "-")
# all within-strand pairs of IDs, stacked as a two-column index matrix
idx <- do.call(rbind, lapply(strand, function(s) {
  t(combn(x = df$ID[df$STRAND == s], m = 2))
}))
idx
#      [,1] [,2]
# [1,]    1    2
# [2,]    1    7
# [3,]    2    7
# [4,]    4    6
mat[idx]
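Because the IDs in my.df coincide with the row numbers of mat, the two-column idx does matrix indexing here, returning mat[1,2], mat[1,7], mat[2,7] and mat[4,6] in the row order of idx.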
Thank you for your kind reply to my previous questions. I have two lists: list1 and list2. I would like to know if each object of list1 is contained in each object of list2. For example:
> list1
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
> list2
[[1]]
[1] 1 2 3
[[2]]
[1] 2 3
[[3]]
[1] 2 3
Here are my questions:
1.) How do I ask R to check whether an object is a subset of another object in a list?
For instance, I would like to check whether list2[[3]] = {2,3} is contained in (a subset of) list1[[2]] = {2}. When I do list2[[3]] %in% list1[[2]], I get [1] TRUE FALSE. However, this is not what I want: I just want to check whether list2[[3]] is a subset of list1[[2]], i.e. whether {2,3} is a subset of {2} in the set-theoretic sense. I do not want the element-wise check that R performs with the %in% operator. Any suggestions?
2.) Is there an efficient way to make all pairwise subset comparisons (i.e. list1[[i]] a subset of list2[[j]], for all i,j combinations)? Would something like outer(list1, list2, func.subset) work once question 1 is answered?
Thank you for your feedback!
setdiff compares unique values: if x is a subset of y, setdiff(x, y) is empty.
length(setdiff(5, 1:5)) == 0  # TRUE: 5 is a subset of 1:5
Alternatively, all(x %in% y) will work nicely.
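For example:
all(c(2, 3) %in% c(1, 2, 3))  # TRUE:  every element of {2,3} is in {1,2,3}
all(c(2, 3) %in% 2)           # FALSE: 3 is missing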
To do all comparisons, something like this would work:
dt <- expand.grid(list1, list2)
dt$subset <- apply(dt, 1, function(.v) all(.v[[1]] %in% .v[[2]]))
dt
  Var1    Var2 subset
1    1 1, 2, 3   TRUE
2    2 1, 2, 3   TRUE
3    3 1, 2, 3   TRUE
4    1    2, 3  FALSE
5    2    2, 3   TRUE
6    3    2, 3   TRUE
7    1    2, 3  FALSE
8    2    2, 3   TRUE
9    3    2, 3   TRUE
Note that expand.grid isn't the fastest way to do this when dealing with a lot of data (@DWin's solution is better in that regard), but it allows you to quickly check visually whether this is doing what you want.
You can use the sets package as follows:
library(sets)
is.subset <- function(x, y) as.set(x) <= as.set(y)
outer(list1, list2, Vectorize(is.subset))
#       [,1]  [,2]  [,3]
# [1,]  TRUE FALSE FALSE
# [2,]  TRUE  TRUE  TRUE
# [3,]  TRUE  TRUE  TRUE
@Michael's or @DWin's base version of is.subset will work just as well, but for part two of your question, I'd maintain that outer is the way to go.
is.subset <- function(x,y) {length(setdiff(x,y)) == 0}
First the combos of list1 elements that are subsets of list2 items:
> sapply(1:length(list1), function(i1)
+   sapply(1:length(list2), function(i2) is.subset(list1[[i1]], list2[[i2]])))
      [,1] [,2] [,3]
[1,]  TRUE TRUE TRUE
[2,] FALSE TRUE TRUE
[3,] FALSE TRUE TRUE
Then the unsurprising lack of any of the list2 items (all of length > 1) that are subsets of list1 items (all of length 1):
> sapply(1:length(list1), function(i1)
+   sapply(1:length(list2), function(i2) is.subset(list2[[i2]], list1[[i1]])))
      [,1]  [,2]  [,3]
[1,] FALSE FALSE FALSE
[2,] FALSE FALSE FALSE
[3,] FALSE FALSE FALSE
Adding to @Michael's answer, here's a neat way to avoid the messiness of expand.grid, using I() (the AsIs function):
list2 <- list(1:3,2:3,2:3)
a <- data.frame(list1 = 1:3, I(list2))
a$subset <- apply(a, 1, function(.v) all(.v[[1]] %in% .v[[2]]) )
  list1   list2 subset
1     1 1, 2, 3   TRUE
2     2    2, 3   TRUE
3     3    2, 3   TRUE