Finding intersection entries in a data frame


I've run into an R-programming problem that I can't seem to wrap my head around. I have data like the following:
data = data.frame("start"=c(1,2,4,5),
"length"=c(2,2,2,3),
"decision"=c("yes","no","yes","yes"))
Which looks like:
  start length decision
1     1      2      yes
2     2      2       no
3     4      2      yes
4     5      3      yes
Row one stands for a sequence of integers that start at 1 for length 2 (1,2). Row 3 is 2 integers starting at 4 (4,5). I'm looking for intersections between entries that have a 'yes' decision variable. When the decision variable is 'no', then the sequence is thrown out. Here's what I've attempted so far.
I think I need to create a sequence list first.
sequence.list = lapply(seq(dim(data)[1]),
function(d){
seq(data$start[d],(data$start[d]+data$length[d]-1),by=1)
})
This outputs:
sequence.list
[[1]]
[1] 1 2
[[2]]
[1] 2 3
[[3]]
[1] 4 5
[[4]]
[1] 5 6 7
Which is a start. Then I create a list that counts intersections between items on my list (I stole this idea from another post on here).
count.intersect = lapply(sequence.list,function(a) {
sapply(seq(length(sequence.list)),
function(b) length(intersect(sequence.list[[b]], a)))
})
This creates the list:
count.intersect
[[1]]
[1] 2 1 0 0
[[2]]
[1] 1 2 0 0
[[3]]
[1] 0 0 2 1
[[4]]
[1] 0 0 1 3
The way to read this is that entry 1 in the data frame has 2 trivial intersections with itself and 1 intersection with entry 2.
Here's where I get fuzzy on what to do. Make it a matrix?
intersect.matrix = do.call(rbind,count.intersect)
Then set the rows and columns of non-used entries to zero?
intersect.matrix[,data$decision=="no"]=0
intersect.matrix[data$decision=="no",]=0
intersect.matrix
[,1] [,2] [,3] [,4]
[1,] 2 0 0 0
[2,] 0 0 0 0
[3,] 0 0 2 1
[4,] 0 0 1 3
Now, I would like to return indices 3 and 4 somehow. I want to find the rows (or columns) containing non zeros that are also not on the diagonal.
Sorry for posting the whole procedure, I also want to know if there is a shorter way to go from the starting dataframe to finding intersections in used entries.

Since you are not interested in non zero values on the diagonal, first I would subtract them away:
diag.mat <- diag(intersect.matrix) * diag(ncol(intersect.matrix))
which gives:
intersect.matrix - diag.mat
[,1] [,2] [,3] [,4]
[1,] 0 0 0 0
[2,] 0 0 0 0
[3,] 0 0 0 1
[4,] 0 0 1 0
Then identify which of the columns still hold non zero entries using which:
which(colSums(intersect.matrix - diag.mat) != 0)
[1] 3 4

You asked whether there is a short way to go from your data frame data to the indices. Here it is.
(Note: This may be hard to understand if you're new to R.)
1) Create the sequence list:
sequence.list <- apply(data[1:2], 1, function(x) seq_len(x[2]) + x[1] - 1)
# [[1]]
# [1] 1 2
#
# [[2]]
# [1] 2 3
#
# [[3]]
# [1] 4 5
#
# [[4]]
# [1] 5 6 7
2) Count intersects and create the intersect matrix
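# pairwise intersection of entries a and b
# (Reduce over sequence.list[seq(a, b)] would also intersect every entry in between)
intersect.matrix <- outer(s <- seq_along(sequence.list), s,
Vectorize(function(a, b)
length(intersect(sequence.list[[a]], sequence.list[[b]]))))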
# [,1] [,2] [,3] [,4]
# [1,] 2 1 0 0
# [2,] 1 2 0 0
# [3,] 0 0 2 1
# [4,] 0 0 1 3
3) Set cells corresponding to "no" to zero
idx <- data$decision == "no"
intersect.matrix[idx, ] <- intersect.matrix[ , idx] <- 0
# [,1] [,2] [,3] [,4]
# [1,] 2 0 0 0
# [2,] 0 0 0 0
# [3,] 0 0 2 1
# [4,] 0 0 1 3
4) Find indices of non-zero rows/columns (except diagonal)
result <- which(as.logical(colSums("diag<-"(intersect.matrix, 0))))
# [1] 3 4
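If the sequences themselves are not needed, the overlap test can be done on the start and end points alone. The following is a minimal sketch of that alternative, not the approach used in the answers above. The names end, overlap, keep and result are made up for illustration, and it assumes every length is at least 1.

```r
data <- data.frame(start = c(1, 2, 4, 5),
                   length = c(2, 2, 2, 3),
                   decision = c("yes", "no", "yes", "yes"))

# last integer covered by each entry
end <- data$start + data$length - 1

# entries i and j share at least one integer when
# start[i] <= end[j] and end[i] >= start[j]
overlap <- outer(data$start, end, "<=") & outer(end, data$start, ">=")

# ignore "no" entries and the trivial self-overlap on the diagonal
keep <- data$decision == "yes"
overlap[!keep, ] <- FALSE
overlap[, !keep] <- FALSE
diag(overlap) <- FALSE

# indices of "yes" entries that overlap at least one other "yes" entry
result <- which(rowSums(overlap) > 0)
result
```

For the example data this returns the indices 3 and 4. Unlike the intersect matrix, overlap only records whether two entries intersect, not how many integers they share.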

Related

Generate all possible binary vectors of length n in R

I'm looking to generate all possible binary vectors of length n in R. What is the best way (preferably both computationally efficient and readable code) to do this?

n = 3
expand.grid(replicate(n, 0:1, simplify = FALSE))
# Var1 Var2 Var3
#1 0 0 0
#2 1 0 0
#3 0 1 0
#4 1 1 0
#5 0 0 1
#6 1 0 1
#7 0 1 1
#8 1 1 1

Inspired by this question generating all possible binary vectors of length n containing less than m 1s, I've extended this code to produce all possible combinations. It's not pretty, though.
> n <- 3
> z <- rep(0, n)
> do.call(rbind, lapply(0:n, function(i) t(apply(combn(1:n,i), 2, function(k) {z[k]=1;z}))))
[,1] [,2] [,3]
[1,] 0 0 0
[2,] 1 0 0
[3,] 0 1 0
[4,] 0 0 1
[5,] 1 1 0
[6,] 1 0 1
[7,] 0 1 1
[8,] 1 1 1
What is it doing? Once we strip it back, the core of this one-liner is the following:
apply(combn(1:n,i), 2, function(k) {z[k]=1;z})
To understand this, let's step back one level further. The function combn(x,m) generates all possible combinations of x taken m at a time.
> combn(1:n, 1)
[,1] [,2] [,3]
[1,] 1 2 3
> combn(1:n, 2)
[,1] [,2] [,3]
[1,] 1 1 2
[2,] 2 3 3
> combn(1:n, 3)
[,1]
[1,] 1
[2,] 2
[3,] 3
Using apply with MARGIN=2, we pass one column of this matrix at a time to our inner function function(k) {z[k]=1;z}, which simply replaces the values at the indices k with 1. Since they were originally all 0, this gives us each possible binary vector.
The rest is just window dressing. combn gives us a wide, short matrix; we transpose it with t. lapply returns a list; we bind the matrices in each element of the list together with do.call(rbind, .).
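To see the inner step in isolation, here is a small sketch for a single value of i (i = 2 is chosen arbitrarily, and res is an illustrative name). Each column of the result is one binary vector, which is why the full one-liner transposes it with t.

```r
n <- 3
z <- rep(0, n)

# combn(1:n, 2) has one column per pair of positions: (1,2), (1,3), (2,3)
# for each column k, set those positions of a copy of z to 1 and return the copy
res <- apply(combn(1:n, 2), 2, function(k) { z[k] <- 1; z })

t(res)  # one row per binary vector with exactly two 1s
```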

You should define what is "the best way" (fastest? shortest code?, etc.).
One way is to use the package R.utils and the function intToBin for converting decimal numbers to binary numbers. See the example.
require(R.utils)
n <- 5
strsplit(intToBin(0:(2 ^ n - 1)), split = "")
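If adding a package is not an option, base R's intToBits can probably do the same job. This is a sketch under the assumption that n is at least 2 and well below 32 (intToBits works on 32-bit integers); bits is an illustrative name.

```r
n <- 3

# intToBits() returns the 32 bits of an integer, least significant bit first;
# keep the first n of them for every value from 0 to 2^n - 1
bits <- t(sapply(0:(2^n - 1), function(i) as.integer(intToBits(i))[1:n]))

bits  # 2^n rows, one binary vector per row
```

With the least significant bit first, the rows come out in the same order as the expand.grid answer above, and the result is a numeric matrix rather than a list of character vectors.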

How does [ ] work with a function? Does it mean the function becomes an element?

I have a lower triangular matrix:
> Mat1
[,1] [,2] [,3] [,4] [,5]
[1,] 0 0 0 0 0
[2,] 1 0 0 0 0
[3,] 3 3 0 0 0
[4,] 4 4 4 0 0
[5,] 4 1 1 3 0
lower.tri returns a matrix of logicals the same size of a given matrix with entries TRUE in the lower or upper triangle (R help).
Then let:
lowt <- lower.tri(Mat1)
xx <- Mat1[lowt]
xx
[1] 1 3 4 4 3 4 1 4 1 3
My question is: how does Mat1[lowt] work? How do we use the function as an element by using [ ]? What is the idea? Any help, please?

how do we use the function as an element by using [ ]?!
lowt is not a function, but a boolean matrix, as you said yourself:
lowt <- lower.tri(Mat1) saves the return value of lower.tri in lowt; Mat1[lowt] therefore returns values from Mat1 by logical indexing, a widely used concept in R.
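A quick way to convince yourself of this is to inspect lowt directly. The sketch below rebuilds Mat1 from the values printed in the question (the matrix() call is a reconstruction, not the asker's code):

```r
Mat1 <- matrix(c(0, 1, 3, 4, 4,
                 0, 0, 3, 4, 1,
                 0, 0, 0, 4, 1,
                 0, 0, 0, 0, 3,
                 0, 0, 0, 0, 0), nrow = 5)  # filled column by column

lowt <- lower.tri(Mat1)
is.function(lowt)  # FALSE: lowt is the value returned by lower.tri
is.logical(lowt)   # TRUE: a 5 x 5 matrix of TRUE/FALSE

# a logical index keeps the elements where the index is TRUE,
# walking down the columns; which() gives the same positions explicitly
xx <- Mat1[lowt]
identical(xx, Mat1[which(lowt)])
```

Because the elements are picked column by column, xx starts with the four values below the diagonal in column 1, then the three in column 2, and so on.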

splitting integers and converting into matrix

I was wondering whether it is possible to split each integer in a set of numbers into its digits and transform the result into a transition matrix, e.g.
data<-c(11,123,142,1423,1234,12)
What I would like to do is split each integer in the data into its digits (considering only the first two elements of the dataset here): the first element becomes 1,1 and the second becomes 1,2,3. Each consecutive pair of digits is then a transition: 1,1 is 1 to 1, 1,2 is 1 to 2, and 2,3 is 2 to 3. This generates the following matrix:
  1 2 3 4 5
1 1 1 0 0 0
2 0 0 1 0 0
3 0 0 0 0 0
4 0 0 0 0 0
5 0 0 0 0 0
My matrix will never go past 5x5. Below is what I have done, which works but is really tedious.
data2<-as.matrix(as.character(data))
for(i in 1:nrow(data2)) {
values<-strsplit(data2,"")
}
values2<-t(sapply(values, '[', 1:max(sapply(values, length))))
values2[is.na(values2)]<-0
values3<-apply(values2,2,as.numeric)
from1to1<-0
from1to2<-0
from1to3<-0
from1to4<-0
from1to5<-0
from2to1<-0
from2to2<-0
from2to3<-0
from2to4<-0
...
from5to4<-0
from5to5<-0
for(i in 1:nrow(values3)){
for(j in 1:(ncol(values3)-1))
if (((values3[i,j]==1)&(values3[i,j+1]==1))){
from1to1<-from1to1 + 1
}else{
if (((values3[i,j]==1)&(values3[i,j+1]==2))){
from1to2<-from1to2 + 1
}else{
if (((values3[i,j]==1)&(values3[i,j+1]==3))){
from1to3<-from1to3 + 1
}else{
if (((values3[i,j]==1)&(values3[i,j+1]==4))){
from1to4<-from1to4 + 1
}else{
if (((values3[i,j]==1)&(values3[i,j+1]==5))){
from1to5<-from1to5 + 1
}else{
if (((values3[i,j]==1)&(values3[i,j+1]==1))){
from1to1<-from1to1 + 1
}else{.....continues through all other from2to1...from5to5
I then place every single number into a 5x5 matrix.
This is obviously tedious, long and ridiculous. Is there any way to shorten this? Any suggestions are appreciated.

Here's an option, presented here piped so as to be easy to follow:
library(magrittr) # for the pipe
# initialize a matrix of zeros
mat <- matrix(0, 5, 5)
# split each element into individual digits
strsplit(as.character(data), '') %>%
# turn list elements back to integers
lapply(as.integer) %>%
# make a 2 column matrix of each digit paired with the previous digit
lapply(function(x){matrix(c(x[-length(x)], x[-1]), ncol = 2)}) %>%
# reduce list to a single 2-column matrix
do.call(rbind, .) %>%
# for each row, add 1 to the element of mat they subset
apply(1, function(x){mat[x[1], x[2]] <<- mat[x[1], x[2]] + 1; x})
# output is the transpose of the matrix; the real results are stored in mat
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## [1,] 1 1 2 1 4 1 4 2 1 2 3 1
## [2,] 1 2 3 4 2 4 2 3 2 3 4 2
mat
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 3 0 2 0
## [2,] 0 0 3 0 0
## [3,] 0 0 0 1 0
## [4,] 0 2 0 0 0
## [5,] 0 0 0 0 0
Alternately, if you'd like xtabs as suggested by alexis_laz, replace the last line with xtabs(formula = ~ .[,1] + .[,2]) instead of using mat.
You might also check out the permutations package, which from what I can tell seems to be for working with this kind of data, though it's somewhat high-level.
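For comparison, here is a base R sketch that avoids both the pipe and the <<- assignment: collect the "from" and "to" digits, then let table count the pairs. The names digits, from, to and trans are illustrative, and fixing the factor levels at 1:5 relies on the question's statement that the matrix never goes past 5x5.

```r
data <- c(11, 123, 142, 1423, 1234, 12)

# split each number into its digits
digits <- lapply(strsplit(as.character(data), ""), as.integer)

# every digit except the last is a "from"; every digit except the first is a "to"
from <- unlist(lapply(digits, function(x) x[-length(x)]))
to   <- unlist(lapply(digits, function(x) x[-1]))

# fixed levels keep the table 5 x 5 even for digits that never occur
trans <- table(factor(from, levels = 1:5), factor(to, levels = 1:5))
trans
```

The counts should agree with mat above, for example three transitions from 1 to 2 and two from 4 to 2. A single-digit number contributes no transitions, since both x[-length(x)] and x[-1] are then empty.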

R: Get the row and column name of the minimum element of a matrix but with minimum != 0

I've got a matrix with a lot of zeros and with positive numerical values.
I want to get the row and column number for which the element is
the minimal NONZERO value of the matrix.
I don't find that min() has extra options to exclude zero, so how can I
handle this?

There may be a shorter answer, but you can replace the zeros with NA and use na.rm=T:
test = matrix(c(0:9,0:9),nrow =4,ncol=5)
test[which(test==0)] = NA
minValue = min(test,na.rm=T)
rows = which(apply(test,1,min,na.rm=T)==minValue)
cols = which(apply(test,2,min,na.rm=T)==minValue)
Allows for duplicates
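Note that rows and cols above are computed independently, so when the minimum occurs more than once they do not tell you which row goes with which column. A hedged alternative is which with arr.ind = TRUE, which returns matching (row, col) pairs. The names min_nonzero and pos are illustrative, and the sketch assumes the matrix has at least one nonzero entry.

```r
test <- matrix(c(0:9, 0:9), nrow = 4, ncol = 5)

# smallest value among the nonzero entries (test itself is left unchanged)
min_nonzero <- min(test[test != 0])

# one row of pos per occurrence, with columns "row" and "col"
pos <- which(test == min_nonzero, arr.ind = TRUE)
pos
```

If the matrix has dimnames, rownames(test)[pos[, "row"]] and colnames(test)[pos[, "col"]] give the names instead of the numbers.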

Using x[x>0] you can extract the nonzero elements directly, but getting their row and column numbers takes a little extra effort. I have used apply for this, which returns the column indices for each row and the row indices for each column:
set.seed(1234)
myMat = matrix(sample(0:1,replace=T,25),nrow=5)
[,1] [,2] [,3] [,4] [,5]
[1,] 0 1 1 1 0
[2,] 1 0 1 0 0
[3,] 1 0 0 0 0
[4,] 1 1 1 0 0
[5,] 1 1 0 0 0
#Defining a function which will return the indices of the nonzero (positive) members
FindIndex = function(x){
v=vector()
j=1
for(i in 1:length(x)){
if(x[i]>0){
v[j]=i
j=j+1
}
}
return(v)
}
#Find column number row-wise
apply(myMat, 1, FUN=FindIndex)
[[1]]
[1] 2 3 4
[[2]]
[1] 1 3
[[3]]
[1] 1
[[4]]
[1] 1 2 3
[[5]]
[1] 1 2
#Find row number column-wise
apply(myMat, 2, FUN=FindIndex)
[[1]]
[1] 2 3 4 5
[[2]]
[1] 1 4 5
[[3]]
[1] 1 2 4
[[4]]
[1] 1
[[5]]
logical(0)
You can convert these return values to your required format.
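The loop in FindIndex can likely be replaced by which. Here is a sketch, with the matrix printed above typed in by hand so that it does not depend on the random seed (by_row and by_col are illustrative names):

```r
myMat <- matrix(c(0, 1, 1, 1, 1,
                  1, 0, 0, 1, 1,
                  1, 1, 0, 1, 0,
                  1, 0, 0, 0, 0,
                  0, 0, 0, 0, 0), nrow = 5)  # filled column by column

# column numbers of the positive entries, one list element per row
by_row <- lapply(seq_len(nrow(myMat)), function(i) which(myMat[i, ] > 0))

# row numbers of the positive entries, one list element per column
by_col <- lapply(seq_len(ncol(myMat)), function(j) which(myMat[, j] > 0))

by_row[[1]]
by_col[[5]]
```

lapply is used here instead of apply so that the result is always a list, even when every row happens to have the same number of positive entries. An all-zero column yields integer(0) rather than the logical(0) shown above.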

How to print row index and occurrences count of zeros in rows in R data.frame

I want to print the row index and the number of zeros present in each row of an R data.frame.
The input matrix is like this:
A B
rowIndex1 0 1
rowIndex2 1 1
I thought to use this:
print(which(rowSums(matrix == 0) != 0))
I want it to print something like this:
rowIndex1
1
However it does not print the number of zeros in the rows but a different number (I checked it) - like this:
rowIndex1
2400
How to achieve it?
Thanks

As mentioned in my comment, perhaps arr.ind would be of use.
Using @bartektartanus's sample data:
m <- diag(5) + c(0:6,0,0)
table(which(m == 0, arr.ind=TRUE)[, "row"])
#
# 2 3 4 5
# 1 2 1 1
The "names" (in this case, 2, 3, 4, and 5) are your row numbers and the values (in this case, 1, 2, 1, 1) are the counts.
Here is the output of which, so you can understand what is going on:
which(m == 0, arr.ind=TRUE)
# row col
# [1,] 3 2
# [2,] 4 2
# [3,] 5 2
# [4,] 2 4
# [5,] 3 4

Your code works as written: it gives you the numbers of the rows that contain a zero.
> m <- diag(5) + c(0:6,0,0)
Warning message:
In diag(5) + c(0:6, 0, 0) :
longer object length is not a multiple of shorter object length
> m
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 1 6 2
[2,] 1 7 2 0 3
[3,] 2 0 4 0 4
[4,] 3 0 4 1 5
[5,] 4 0 5 1 7
> which(rowSums(m == 0) != 0)
[1] 2 3 4 5
To obtain what you want, use this:
> x <- rowSums(m==0)
> cbind(which(x!=0),x[x!=0])
[,1] [,2]
[1,] 2 1
[2,] 3 2
[3,] 4 1
[4,] 5 1
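When the rows are named, as in the question, the counts can keep those names. A minimal sketch on the two-row input from the question follows; the matrix is rebuilt by hand, and zeros and res are illustrative names.

```r
m <- matrix(c(0, 1, 1, 1), nrow = 2,
            dimnames = list(c("rowIndex1", "rowIndex2"), c("A", "B")))

# number of zeros in every row, named by row
zeros <- rowSums(m == 0)

# keep only the rows that contain at least one zero
res <- zeros[zeros > 0]
res
```

The difference from the attempt in the question is that which() returns the positions of the matching rows, whereas subsetting zeros returns the counts themselves.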
