Build new adjacency matrix after graph partitioning - graph

I have an adjacency matrix stored in CSR format, e.g.:
xadj = 0 2 5 8 11 13 16 20 24 28 31 33 36 39 42 44
adjncy = 1 5 0 2 6 1 3 7 2 4 8 3 9 0 6 10 1 5 7 11 2 6 8 12 3 7 9 13 4 8 14 5 11 6 10 12 7 11 13 8 12 14 9 13
I am now partitioning this graph using METIS, which gives me the partition vector part of the graph: a list that tells me which partition each vertex belongs to. Is there an efficient way to build the new adjacency matrix for this partitioning so that I can partition the new graph again, e.g. with a function rebuildAdjacency(xadj, adjncy, part)? If possible, it should reuse xadj and adjncy.

I'm assuming that what you mean by "rebuild" is removing the edges between vertices that have been assigned different partitions? If so, the (probably) best you can do is iterate your CSR list, generate a new CSR list, and skip all edges that are between partitions.
In Python:
def rebuild_adjacency(xadj, adjncy, part):
    """Return (new_xadj, new_adjncy): the CSR graph with every edge between
    vertices in different partitions removed. Vertex numbering is unchanged."""
    n = len(xadj) - 1  # number of vertices
    new_xadj = [0]
    new_adjncy = []
    for row in range(n):
        # Neighbours of `row` occupy adjncy[xadj[row]:xadj[row + 1]]
        for col in adjncy[xadj[row]:xadj[row + 1]]:
            if part[row] == part[col]:
                # Both endpoints are in the same partition: keep row->col
                new_adjncy.append(col)
        # The next row starts where this one ends; the final entry is the
        # total number of stored entries
        new_xadj.append(len(new_adjncy))
    return new_xadj, new_adjncy
I don't think you can do this very efficiently by reusing the old xadj and adjncy arrays. However, if you are doing this recursively, you can avoid repeated memory allocation/deallocation by keeping exactly two copies of xadj and adjncy and alternating between them.
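Filtering the edges this way keeps all n vertices with their original numbering, so the result is a single graph whose partitions are simply disconnected from each other. If you instead want to hand each partition to METIS as a separate graph, its vertices have to be renumbered from 0. A minimal sketch, assuming that is the goal (the function name split_by_partition and its return layout are hypothetical, not part of METIS):

```python
def split_by_partition(xadj, adjncy, part):
    """Split a CSR graph into one CSR subgraph per partition.

    Vertices are renumbered 0..k-1 inside each subgraph, and
    local_to_global[p][i] maps local vertex i of partition p back to its
    original vertex id.
    """
    n = len(xadj) - 1
    # First pass: give every vertex a local id within its own partition
    local_id = [0] * n
    local_to_global = {}
    for v in range(n):
        members = local_to_global.setdefault(part[v], [])
        local_id[v] = len(members)
        members.append(v)

    # Second pass: build each partition's CSR arrays using the local ids
    subgraphs = {}
    for p, members in local_to_global.items():
        sub_xadj, sub_adjncy = [0], []
        for v in members:
            for u in adjncy[xadj[v]:xadj[v + 1]]:
                if part[u] == p:  # drop edges that cross partitions
                    sub_adjncy.append(local_id[u])
            sub_xadj.append(len(sub_adjncy))
        subgraphs[p] = (sub_xadj, sub_adjncy)
    return subgraphs, local_to_global
```

Each (sub_xadj, sub_adjncy) pair is again a valid CSR graph that can be partitioned recursively; local_to_global lets you translate the next level's partition vector back to the original vertex ids.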

Related

Count consecutive preceding elements in DolphinDB

Volume  f  Explanation
10      0  No volume before 10
7       0  No smaller volume before 7
13      2  Both 10 and 7 are smaller than 13
6       0  13 is larger than 6
4       0  6 is larger than 4
8       2  Both 6 and 4 are smaller than 8
7       0  8 is larger than 7
3       0  7 is larger than 3
4       1  3 is smaller than 4
As shown in the table above, I'd like to compute the f column from the volume column in DolphinDB.
Suppose the current volume is t. The desired output f is the number of volumes that meet both of the following conditions:
They form a run of consecutive elements in the volume column, each less than t.
The last element of that run is the volume immediately preceding t.
The explanation column illustrates the calculation in detail.
I tried a for-loop, but it didn't work. Does DolphinDB provide any other functions to obtain this result?
t = table(1..10 as volume)
tmp = select volume, iif(deltas(volume)>0, rowNo(volume), NULL) as flag from t
tmp.bfill!()
select volume, cumrank(volume) from tmp context by flag
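The rule behind the f column can also be stated as: f is the number of rows between the current row and the nearest earlier row whose volume is greater than or equal to t. A minimal sketch of that logic in Python (not DolphinDB; the function name count_preceding_smaller is hypothetical), using a monotonic stack so the whole column is processed in one pass:

```python
def count_preceding_smaller(volumes):
    """For each position i, count the run of consecutive elements directly
    before i that are strictly smaller than volumes[i]."""
    result = []
    stack = []  # indices of earlier elements that can still bound a run
    for i, v in enumerate(volumes):
        # Discard earlier elements smaller than v: they belong to the run of
        # smaller values ending right before i, and cannot bound any later
        # run that v itself does not bound first.
        while stack and volumes[stack[-1]] < v:
            stack.pop()
        # Nearest earlier element >= v, or -1 if there is none
        prev = stack[-1] if stack else -1
        result.append(i - prev - 1)
        stack.append(i)
    return result


print(count_preceding_smaller([10, 7, 13, 6, 4, 8, 7, 3, 4]))
# [0, 0, 2, 0, 0, 2, 0, 0, 1]
```

On the question's volumes this reproduces the f column. Equal values stop the run, matching the "less than t" condition.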

How to get the ID of each node from topological sort?

I have a network (a directed acyclic graph):
dag_1 <- barabasi.game(20)
I applied a topological sort:
top1 <- topo_sort(dag_1)
top1
+ 20/20 vertices, from 0ee5d26:
[1] 5 8 11 13 14 15 16 17 18 20 4 7 12 19 2 10 9 6 3 1
If I type top1 and hit Enter, I get the output above. I need to access the vector
5 8 11 13, ..., 1
I tried top1[1] and top1[[1]], but neither of them gives me the vector.
How can I get it?
top1 is an igraph.vs object (a vertex sequence), so indexing it, e.g. top1[1:10], returns vertices of the graph rather than plain numbers. To get a plain vector of the vertex ids, use:
as.vector(top1)

Alternating between reading forwards and backwards in a loop

My array is 1-D, with length m; say m = 16:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
I actually interpret the array as an n x n matrix, where n x n = m:
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
Because of the way my physical environment is set up, I need to read the array in this order:
0 4 8 12 13 9 5 1 2 6 10 14 15 11 7 3
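What I came up with works, but I really don't think it is the best way to do this:
bool isFlipped = false;
for (int x = 0; x < m; x++) {
    // Reverse direction at the start of every column after the first
    if (x != 0 && x % n == 0)
        isFlipped = !isFlipped;
    if (isFlipped)
        newLine[x] = line[((n - 1) - x % n) * n + x / n];  // read upwards
    else
        newLine[x] = line[(x % n) * n + x / n];             // read downwards
}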
This gives me the required result, but I think there must be a way to get rid of the boolean by using a pure math formula. I am squeezing this into an 8 KB microcontroller, and I need to conserve as much space as I can because Bluetooth communication and more math will be added later.
Edit:
Thanks to another user, I arrived at a (more or less) one-line solution, which replaces the lines inside the for-loop:
c = x/n;
newLine[x] = line[((c+1)%2)*((x%n)*n + c) + (c%2)*(((n-1) - x%n)*n + c)];
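You should be able to use the fact that even columns of the n*n matrix are read from top to bottom and odd columns from bottom to top.
The element at index x in newLine lies in column c = floor(x/n) of the n*n matrix, at position r = x%n within that column. For an even column the source index is r*n + c; for an odd column it is ((n-1)-r)*n + c, which differs from the even case by ((n-1)-2r)*n. Since c%2 is 0 for even columns and 1 for odd columns, multiplying that difference by c%2 gives a single formula, so something like this should work: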
int c = x/n;
newLine[x] = line[(x%n)*n + (c%2)*((n-1)-2*(x%n))*n + c];
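To check the index formula for any n before putting it on the device, it can be run off-device. A small Python sketch (not microcontroller code; the function name boustrophedon_order is hypothetical) that applies exactly the formula above:

```python
def boustrophedon_order(line, n):
    """Read an n*n row-major array column by column, going down even
    columns and up odd columns, using the branch-free index formula."""
    out = []
    for x in range(n * n):
        c, r = divmod(x, n)  # column index and position within the column
        # r*n + c for even columns; ((n-1)-r)*n + c for odd columns
        out.append(line[r * n + (c % 2) * ((n - 1) - 2 * r) * n + c])
    return out


print(boustrophedon_order(list(range(16)), 4))
# [0, 4, 8, 12, 13, 9, 5, 1, 2, 6, 10, 14, 15, 11, 7, 3]
```

The output matches the required reading order from the question, with no flag variable needed.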

TraMineR: Can I get the complete sequence if I give an event sub sequence?

I have a sequence dataset like the one below:
customerid flag 0 1 2 3 4 5 6 7 8 9 10 11
abc234 1 3 4 3 4 5 8 4 3 3 2 14 14
abc233 0 4 4 4 4 4 4 4 4 4 4 4 4
qpr81 0 9 8 7 8 8 7 8 8 7 8 8 7
qnr94 0 14 14 14 2 14 14 14 14 14 14 14 14
The values in columns 0 to 11 are the sequences. There are two groups of customers, flag=1 and flag=0, and I have found the event subsequences that differentiate the two groups (only the frequencies and residuals for the two groups are shown here):
Subsequence Freq.0 Freq.1 Resid.0 Resid.1
(3>4) 0.19208177 0.0753386 5.540793 -21.43304
(4>5) 0.15752553 0.059960497 5.115241 -19.78691
(5>4) 0.15950556 0.062782167 5.037413 -19.48586
I want to find the customer IDs and flags of the sequences that contain these event subsequences.
Should I write a Python script to traverse the transactions, or is there a direct method in R to do this?
Here is my code:
library(TraMineR)
custid=c("a1","a2","a3","b4","b5","c6","c7","d8","d9") # sample customer ids (quoted, so they are strings)
flag=c(0,0,0,1,0,1,1,0,1) # flag
col1=c(14,14,14,14,14,5,14,14,2)
col2=c(14,14,3,14,3,14,6,3,3)
col3=c(14,2,2,14,2,14,2,2,2)
col4=c(14,2,2,14,2,14,2,2,14)
df=data.frame(custid,flag,col1,col2,col3,col4) # data frame generation
print(df)
# Define the sequences from col1 to col4
df.s<-seqdef(df,3:6)
print(df.s)
# Find the transitions
transition<-seqetm(df.s,method='transition')
print(transition)
# Convert to TSE format
df.tse=seqformat(df.s,from='SPS',to='TSE',tevent = transition)
print(df.tse)
# Event sequence generation
df.seqe=seqecreate(id=df.tse$id,timestamp=df.tse$time,event=df.tse$event)
print(df.seqe)
# Frequent subsequences
fsubseq <- seqefsub(df.seqe, pMinSupport = 0.01)
print(fsubseq)
groups <- factor(df$flag>0,labels=c(1,0))
# Find the differentiating event subsequences by flag using a chi-square test
diff <- seqecmpgroup(fsubseq, group = df$flag, method = "chisq")
# Use seqeapplysub to find the presence of the subsequences?
presence=seqeapplysub(fsubseq,method="presence")
print(presence[1:3,3:1])
Thanks
From what I understand, you have state sequences and have transformed them into event sequences using the seqecreate function of TraMineR. The events you are considering are the state changes. Thus (3>4) stands for a subsequence with only one event, namely the event 3>4 (switching from 3 to 4). Then, you identify the event subsequences that best discriminate your two flags using the seqefsub and seqecmpgroup functions.
If this is correct, then you can identify the sequences containing each subsequence with the seqeapplysub function. I cannot illustrate here because you do not provide any code in your question. Look at the online help of the seqeapplysub function.
======= update referring to your added code =======
Here is how you get the ids of the sequences that contain the most discriminating subsequences.
First, we extract the three most discriminating subsequences from your diff object. Second, we compute the presence matrix, which has one column per extracted subsequence, with a 1 for each sequence that contains that subsequence and 0 otherwise.
diffseq <- seqefsub(df.seqe, strsubseq = paste(diff$subseq[1:3]))
(presence=seqeapplysub(diffseq, method="presence"))
Now you get the ids for the first subsequence with
custid[presence[,1]==1]
For the second it would be custid[presence[,2]==1] etc.
Likewise you get the flag with
flag[presence[,1]==1]
Hope this helps.
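Since the question also asks whether a Python script would do the job, here is a simplified sketch. It assumes each change of state from a to b counts as one event "a>b" (the notation TraMineR prints) and that a subsequence only has to occur in order, not necessarily adjacently; TraMineR's own matching additionally supports time constraints, which this ignores. The function names are hypothetical and the data is the question's sample:

```python
def transitions(states):
    """Turn a state sequence into its ordered list of transition events 'a>b'."""
    return [f"{a}>{b}" for a, b in zip(states, states[1:]) if a != b]


def contains_subsequence(events, pattern):
    """True if the events in `pattern` occur in `events` in the same order
    (not necessarily adjacent)."""
    it = iter(events)
    # Each `any` consumes the iterator up to the next match, preserving order
    return all(any(e == p for e in it) for p in pattern)


def matching_customers(customers, pattern):
    """Return (customer id, flag) for every customer whose sequence contains pattern."""
    return [(cid, flag) for cid, flag, states in customers
            if contains_subsequence(transitions(states), pattern)]


# The question's sample data: one state column per time point
custid = ["a1", "a2", "a3", "b4", "b5", "c6", "c7", "d8", "d9"]
flag = [0, 0, 0, 1, 0, 1, 1, 0, 1]
cols = [
    [14, 14, 14, 14, 14, 5, 14, 14, 2],
    [14, 14, 3, 14, 3, 14, 6, 3, 3],
    [14, 2, 2, 14, 2, 14, 2, 2, 2],
    [14, 2, 2, 14, 2, 14, 2, 2, 14],
]
customers = [(custid[i], flag[i], [col[i] for col in cols]) for i in range(len(custid))]

print(matching_customers(customers, ["3>2"]))
# [('a3', 0), ('b5', 0), ('d8', 0), ('d9', 1)]
```

For anything beyond this simple case, the seqeapplysub approach above stays within TraMineR and uses its exact subsequence semantics.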

Filter between threshold

I am working with a large dataset and I am trying to first identify clusters of values that meet specific threshold values. My aim then is to only keep clusters of a minimum length. Below is some example data and my progress thus far:
Test = c("A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B")
Sequence = c(1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10)
Value = c(3,2,3,4,3,4,4,5,5,2,2,4,5,6,4,4,6,2,3,2)
Data <- data.frame(Test, Sequence, Value)
Using package evd, I have identified clusters of values >3
C1 <- clusters(Data$Value, u = 3, r = 1, cmax = F, plot = T)
Which produces
C1
$cluster1
4
4
$cluster2
6 7 8 9
4 4 5 5
$cluster3
12 13 14 15 16 17
4 5 6 4 4 6
My problem is twofold:
1) I don't know how to relate this back to the original dataframe (for example to Test A & B)
2) How can I only keep clusters with a minimum size of 3 (thus excluding Cluster 1)
I have looked into various filtering options, but none of them clusters data according to a desired threshold, and none offers a minimum cluster size either.
Any help is much appreciated.
Q1: relate back to the original dataframe: have a look at Carl Witthoft's answer to "detect intervals of the consequent integer sequences". He wrote seqle(), a variant of rle() that looks for runs of consecutive integers rather than repeated values.
Q2: only keep clusters of certain length:
C1[sapply(C1, length) >= 3]
yields the 2 clusters that are long enough:
$cluster2
6 7 8 9
4 4 5 5
$cluster3
12 13 14 15 16 17
4 5 6 4 4 6
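Both parts can also be handled in one pass without evd. A minimal Python sketch, assuming clusters should not span two Test groups and that each test's rows are contiguous and ordered by Sequence (as in the example data); the function name threshold_runs is hypothetical, and u and min_len mirror the question's threshold of 3 and minimum size of 3. Each cluster is reported together with its Test and first/last Sequence, which relates it back to the original data frame:

```python
from itertools import groupby


def threshold_runs(tests, sequence, values, u, min_len):
    """Return runs of consecutive values strictly above u, split by test id,
    keeping only runs with at least min_len members.

    Each run is (test, first Sequence, last Sequence, values in the run).
    """
    runs = []
    rows = list(zip(tests, sequence, values))
    # Outer groupby: one group per contiguous block of the same test id
    for test, group in groupby(rows, key=lambda row: row[0]):
        # Inner groupby: alternating above-threshold / not-above blocks
        for above, block in groupby(group, key=lambda row: row[2] > u):
            block = list(block)
            if above and len(block) >= min_len:
                runs.append((test, block[0][1], block[-1][1], [row[2] for row in block]))
    return runs


Test = ["A"] * 10 + ["B"] * 10
Sequence = list(range(1, 11)) * 2
Value = [3, 2, 3, 4, 3, 4, 4, 5, 5, 2, 2, 4, 5, 6, 4, 4, 6, 2, 3, 2]
print(threshold_runs(Test, Sequence, Value, u=3, min_len=3))
# [('A', 6, 9, [4, 4, 5, 5]), ('B', 2, 7, [4, 5, 6, 4, 4, 6])]
```

The two runs correspond to evd's $cluster2 (rows 6 to 9) and $cluster3 (rows 12 to 17, i.e. Test B, Sequence 2 to 7).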
