How to create all non-isomorphic trees with n=6 nodes in R?

I need to create all non-isomorphic trees with n=6 nodes. I have found the degree sequences and tried to generate the trees from them with the degree.sequence.game() function:
library(igraph)
set.seed(46)
par(mfrow=c(2, 3))
degs <- matrix(c(1, 1, 1, 2, 2, 3,
                 1, 1, 1, 3, 2, 2,
                 1, 1, 2, 2, 2, 2,
                 1, 1, 1, 1, 2, 4,
                 1, 1, 1, 1, 1, 5,
                 1, 1, 1, 1, 3, 3), nrow = 6, byrow = TRUE)
for (i in 1:6) {
  g6 <- degree.sequence.game(degs[i, ], method = "vl")
  plot(g6, vertex.label = NA)
}
The output is shown in the left figure; one can see that graphs A and B there are isomorphic. The expected result is shown in the right figure.
Question. What is an alternative method to create non-isomorphic trees?

Update
It seems I misunderstood your objective. One solution might be to try the simple.no.multiple.uniform method in degree.sequence.game(), i.e.,
g6 <- degree.sequence.game(degs[i, ], method = "simple.no.multiple.uniform")
and we obtain the expected trees.
BTW, the version of igraph I am using is igraph_1.3.5 (you can see it by typing sessionInfo() in the console); you can try with this version, which will hopefully address your problem as well.
Previous Answer
I think the pain point in your problem is "How to find all distinct degree sequences with given number of vertices in a tree graph?".
We can break this primary problem into two sub-problems:
What is the sum of the degrees given n vertices (if we want to generate a tree)? The answer: 2*(n-1), since a tree on n vertices has n-1 edges and each edge contributes 2 to the degree sum.
How do we partition 2*(n-1) into n positive integers, counting each multiset of degrees only once? The answer: use partitions::restrictedparts.
library(partitions)
n <- 6
degs <- t(restrictedparts(2*(n-1), n, include.zero = FALSE))
and you will see
> degs
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    1    1    1    1    5
[2,]    1    1    1    1    2    4
[3,]    1    1    1    1    3    3
[4,]    1    1    1    2    2    3
[5,]    1    1    2    2    2    2
then you can use degree.sequence.game(degs[i, ], method = "vl"), iterating i from 1 to nrow(degs), as in the sketch below.
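Putting the two steps together, a minimal sketch (assuming the igraph and partitions packages are installed; the plotting layout is arbitrary):

library(igraph)
library(partitions)
n <- 6
# all partitions of 2*(n-1) into n positive parts = all possible tree degree sequences
degs <- t(restrictedparts(2*(n-1), n, include.zero = FALSE))
par(mfrow = c(2, 3))
for (i in 1:nrow(degs)) {
  # "vl" samples a connected simple graph; a connected graph on n vertices
  # with n-1 edges is necessarily a tree
  g <- degree.sequence.game(degs[i, ], method = "vl")
  plot(g, vertex.label = NA)
}

One caveat: a single degree sequence can correspond to more than one non-isomorphic tree (for n=6 there are six non-isomorphic trees but only five degree sequences), so drawing one graph per sequence does not guarantee a complete enumeration.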

Related

generate (overlapping) sets of mutually similar elements from binary similarity matrix

Given a symmetric binary similarity matrix M (1 = similarity), I want to extract all (potentially overlapping) subsets, where all elements within a set are mutually similar.
A B C D E
A 1 1 0 0 0
B 1 1 1 1 0
C 0 1 1 1 1
D 0 1 1 1 1
E 0 0 1 1 1
Also, sets contained within other sets should be discarded (e.g. {D,E} is contained in {C,D,E}). For the matrix above, the result would be: {A,B}, {B,C,D}, {C,D,E}
How can I easily achieve this?
I suspect that there is some clustering algorithm for this, but I am unaware of the name for these types of problems. To which (mathematical) class of problems does this task belong?
Code
M <- matrix(c(1, 1, 0, 0, 0,
              1, 1, 1, 1, 0,
              0, 1, 1, 1, 1,
              0, 1, 1, 1, 1,
              0, 0, 1, 1, 1), ncol = 5, byrow = TRUE)
colnames(M) <- rownames(M) <- LETTERS[1:5]
PS. While this may smell like a homework assignment, it's actually a problem I encountered in my job :)
A clique is a subgraph that is completely connected.
What you are looking for is hence (maximal) clique detection.
https://en.wikipedia.org/wiki/Clique_problem
Beware that the results can be much larger than you anticipate. Consider a graph where each edge is present with probability p: for p close to 1, almost any subset is a clique, and finding maximum cliques becomes expensive. p can also be chosen to maximize the number of maximal cliques...
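In R, a minimal sketch with igraph (an illustration, not from the original answer; it assumes the matrix M defined in the question). max_cliques() returns exactly the maximal cliques, so sets contained in larger cliques are discarded automatically:

library(igraph)
# undirected graph from the similarity matrix, ignoring the diagonal (self-similarity)
g <- graph_from_adjacency_matrix(M, mode = "undirected", diag = FALSE)
# maximal cliques with at least 2 members: {A,B}, {B,C,D}, {C,D,E}
lapply(max_cliques(g, min = 2), as_ids)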

How to extract the membership vector for my GN graph in R?

I want to use NMI to compare my community-detection algorithm with other methods, so I am making some graphs with sample_sbm(). I define them to have 10 nodes, and with block.sizes=c(3,3,4) I define the communities: the first has 3 members, the second 3, and the third 4.
Now I want their membership vector. It should be: 1 1 1 2 2 2 3 3 3 3
What is the best way to do it? I was thinking of taking 3 arguments c1, c2, c3 and using them in block.sizes(), so I could build the membership vector with a for loop, but that looks a bit dirty, since the number of communities should be arbitrary.
I would be thankful if you could suggest something nicer.
library(igraph)
p <- cbind(c(1, 0, 0), c(0, 1, 0), c(0, 0, 1))
g <- sample_sbm(10, pref.matrix = p, block.sizes = c(3, 3, 4))
# community detection algorithm
wc <- cluster_walktrap(g)
modularity(wc)
a <- membership(wc)
UPDATE following the original question-asker's comments:
I store the sizes of the blocks in a vector, my_block_sizes. Then I use the rep.int and seq_along functions to create the membership vector according to the sizes of the blocks.
library(NMI)
library(igraph)
my_block_sizes <- c(3,3,4)
# make a membership vector
membership_vector <- rep.int(seq_along(my_block_sizes), my_block_sizes)
membership_vector
[1] 1 1 1 2 2 2 3 3 3 3
p <- cbind(c(1,0,0), c(0,1,0), c(0,0,1))
g <- igraph::sample_sbm(10, pref.matrix=p, block.sizes=my_block_sizes)
# community detection algorithm
wc <- cluster_walktrap(g)
modularity(wc)
a <- membership(wc)
Original answer:
I'm not 100% sure this is what you're after, but based on the information you've provided, this may solve your problem.
I use the length of the wc object to determine the number of communities detected by the community detection algorithm, and the rep.int function to repeat each community number according to the size of the blocks, which I store in advance in the my_block_sizes object.
library(NMI)
library(igraph)
my_block_sizes <- c(3,3,4)
p <- cbind(c(1,0,0), c(0,1,0), c(0,0,1))
g <- igraph::sample_sbm(10, pref.matrix=p, block.sizes=my_block_sizes)
# community detection algorithm
wc <- cluster_walktrap(g)
modularity(wc)
a <- membership(wc)
# make a membership vector
membership_vector <- rep.int(1:length(wc), my_block_sizes)
membership_vector
[1] 1 1 1 2 2 2 3 3 3 3
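To then score the agreement between the detected communities and the ground truth, one option (a sketch; this uses igraph's compare() rather than the NMI package loaded above) is:

# normalized mutual information between detected and true memberships
compare(a, membership_vector, method = "nmi")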

R - Get a matrix with the reduced number of features with SVD

I'm using SVD with R and I'm able to reduce the dimensionality of my matrix by replacing the smallest singular values with 0. But when I recompose my matrix, I still have the same number of features; I could not find out how to effectively delete the least useful features of the source matrix in order to reduce its number of columns.
For example what I'm doing for the moment:
This is my source matrix A:
A B C D
1 7 6 1 6
2 4 8 2 4
3 2 3 2 3
4 2 3 1 3
If I do:
s <- svd(A)
s$d[3:4] <- 0  # replace the 2 smallest singular values with 0
A_prime <- s$u %*% diag(s$d) %*% t(s$v)  # A' below
I get A', which has the same dimensions (4x4), was reconstructed with only 2 "components", and is an approximation of A (containing a little less information, maybe less noise, etc.):
[,1] [,2] [,3] [,4]
1 6.871009 5.887558 1.1791440 6.215131
2 3.799792 7.779251 2.3862880 4.357163
3 2.289294 3.512959 0.9876354 2.386322
4 2.408818 3.181448 0.8417837 2.406172
What I want is a sub-matrix with fewer columns that nevertheless reproduces the distances between the rows, something like this (obtained using PCA; let's call it A''):
PC1 PC2
1 -3.588727 1.7125360
2 -2.065012 -2.2465708
3 2.838545 0.1377343 # The similarity between rows 3
4 2.815194 0.3963005 # and 4 in A is conserved in A''
Here is the code to get A'' with PCA:
p <- prcomp(A)
A_doubleprime <- p$x[, 1:2]  # A'' above
The final goal is to reduce the number of columns in order to speed up clustering algorithms on huge datasets.
Thank you in advance if someone can guide me :)
I would check out this chapter on dimensionality reduction or this cross-validated question. The idea is that the entire data set can be reconstructed from less information. It's not quite like PCA, where you might choose to keep only 2 out of 10 principal components.
When you do the kind of trimming you did above, you're really just taking out some of the "noise" in your data. The data still has the same dimension.
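If the goal is reduced coordinates rather than a denoised reconstruction, a minimal sketch (my own addition, not from the answer above): keep only the first k columns of U, scaled by the first k singular values. For a column-centered matrix this equals the PCA scores up to sign.

k <- 2
s <- svd(scale(A, center = TRUE, scale = FALSE))  # center columns, as prcomp does
A_reduced <- s$u[, 1:k] %*% diag(s$d[1:k])        # n x k coordinates, distances approximately preserved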

Dynamic Network In R

I am currently working on a dynamic temporal network.
Time Sender Receiver
   1      1        2
   1      1        3
   2      2        1
   2      2        1
   3      1        2
   3      1        2
The above is a sample of my dataset. There are 3 time periods (sessions) and edge lists between nodes.
I want to compute centrality measures for each time period.
I am thinking about writing a script that computes centrality measures within the same time period, but I am wondering whether there are R libraries that can handle this problem.
Does anyone know of one?
Jinie
I tried to write code for subsetting the data based on Time as follows:
uniq <- unique(unlist(df$Time))
uniq
[1] 1 2 3
for (i in 1:length(uniq)) {
  t[i] <- subset(df, Time == uniq[i])
  net[i] <- as.matrix(t[i])
  netT[i] <- net[i][, -3]  # removing time column
  # getting edgelist
  netT[i][, 1] <- as.character(net[i][, 1])
  netT[i][, 2] <- as.character(net[i][, 2])
  g[i] <- graph.edgelist(netT[i], directed = T)
  g[i]
}
However, I got an error message:
Error in t[i] <- subset(df, Time == uniq[i]) : object of type 'closure' is not subsettable
Do you know why? I am kind of new to R, so it is hard to figure out. I guess t[i] is the problem; I don't know how to assign t[i] as a data frame.
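For what it's worth, the error occurs because t is base R's transpose function (a closure), and t[i] <- ... tries to subset-assign into that function; in addition, a single-bracket slot cannot hold a whole data frame. A minimal fix (a sketch that keeps the rest of the approach) is to collect results in lists with [[:

library(igraph)
t_list <- list()
g_list <- list()
for (i in seq_along(uniq)) {
  t_list[[i]] <- subset(df, Time == uniq[i])               # one data frame per period
  el <- as.matrix(t_list[[i]][, c("Sender", "Receiver")])  # drop the Time column
  mode(el) <- "character"                                  # graph.edgelist wants characters
  g_list[[i]] <- graph.edgelist(el, directed = TRUE)
}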
The networkDynamic R library is helpful for this sort of thing (disclaimer: I'm a package maintainer).
library(networkDynamic)
# a data frame with your input data
raw <- data.frame(time = c(1, 1, 2, 2, 3, 3),
                  sender = c(1, 1, 2, 2, 1, 1),
                  receiver = c(2, 3, 1, 1, 2, 2))
# add another time column to define a start and end time for each edge spell
raw2 <- cbind(raw$time, raw$time + 1, raw$sender, raw$receiver)
# create a networkDynamic object using this edge timing info
nd <- networkDynamic(edge.spells = raw2)
# load the sna library with static network measures
library(sna)
# apply degree measure to static networks extracted at default time points
lapply(get.networks(nd),degree)
[[1]]
[1] 2 1 1
[[2]]
[1] 1 1 0
[[3]]
[1] 1 1 0
You could try the igraph library. I'm not familiar with it, but I find this question interesting enough to code up an answer, so here we go:
Because you've got a directed network (senders and receivers), you're going to need two measures of centrality: indegree and outdegree.
Calculating these is fairly simple; the complication is splitting by time points. For each time point we need to do the following:
Create an adjacency matrix indicating for each row (sender) the number of connections to each column (receiver).
From that we can simply add up the connections in the rows to get the outdegree, and the connections in the columns for the indegree.
Assuming your data is stored in a data.frame named df we can use split to split your data.frame by time point:
nodes <- unique(c(unique(df$Sender), unique(df$Receiver)))
centrality <- lapply(split(df, df$Time), function(time.df) {
  adj <- matrix(0, length(nodes), length(nodes), dimnames = list(nodes, nodes))
  for (i in 1:nrow(time.df)) {
    sender <- time.df[i, "Sender"]
    receiver <- time.df[i, "Receiver"]
    adj[sender, receiver] <- adj[sender, receiver] + 1
  }
  list(indegree = colSums(adj), outdegree = rowSums(adj))
})
names(centrality) <- paste0("Time.Point.", 1:length(centrality))
names(centrality) <- paste0("Time.Point.", 1:length(centrality))
If we run the code on your data (I've replaced the Senders and Receivers with letters for clarity):
> centrality
$Time.Point.1
$Time.Point.1$indegree
a b c
0 1 1
$Time.Point.1$outdegree
a b c
2 0 0
$Time.Point.2
$Time.Point.2$indegree
a b c
2 0 0
$Time.Point.2$outdegree
a b c
0 2 0
$Time.Point.3
$Time.Point.3$indegree
a b c
0 2 0
$Time.Point.3$outdegree
a b c
2 0 0
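For reference, the same per-period computation can be done with igraph itself (a sketch; graph_from_data_frame and degree are the relevant functions, and the vertices argument keeps nodes that are inactive in a given period):

library(igraph)
all_nodes <- data.frame(name = sort(unique(c(df$Sender, df$Receiver))))
centrality_igraph <- lapply(split(df, df$Time), function(time.df) {
  g <- graph_from_data_frame(time.df[, c("Sender", "Receiver")],
                             directed = TRUE, vertices = all_nodes)
  list(indegree = degree(g, mode = "in"), outdegree = degree(g, mode = "out"))
})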

Find indices of 5 closest samples in distance matrix

I have a distance matrix dMat and want to find the 5 nearest samples to the first one. What function can I use in R? I know how to find the closest sample (cf. 3rd line of code), but can't figure out how to get the other 4 samples.
The code:
Mat <- replicate(10, rnorm(10))
dMat <- as.matrix(dist(Mat))
which(dMat[, 1] == min(dMat[, 1]))
The 3rd line of code finds the index of the closest sample to the first sample.
Thanks for any help!
Best,
Chega
You can use order to do this:
head(order(dMat[-1, 1]), 5) + 1
[1] 10 3 4 8 6
Note that I removed the first entry, as you presumably don't want to include the fact that your reference point is 0 distance away from itself; the + 1 shifts the indices back to match the original matrix.
An alternative using sort:
sort(dMat[, 1], index.return = TRUE)$ix[1:6]
It would be nice to add a set.seed(.) when using random numbers to build the matrix, so that identical results could be shown; I will skip the results here.
Edit (correct solution): The above will only work if the first element is always the smallest! Here is a solution that always gives the 5 values closest to the first element of the column, by computing the distances to that element explicitly:
> sort(abs(dMat[-1, 1] - dMat[1, 1]), index.return = TRUE)$ix[1:5] + 1
Example:
> dMat <- matrix(c(70,4,2,1,6,80,90,100,3), ncol=1)
# James' solution
> head(order(dMat[-1,1]),5) + 1
[1] 4 3 9 2 5 # values are 1,2,3,4,6 (wrong)
# old sort solution
> sort(dMat[,1], index.return = TRUE)$ix[1:6]
[1] 4 3 9 2 5 1 # values are 1,2,3,4,6,70 (wrong)
# Correct solution
> sort(abs(dMat[-1,1] - dMat[1,1]), index.return=TRUE)$ix[1:5] + 1
[1] 6 7 8 5 2 # values are 80,90,100,6,4 (right)
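For completeness, a reproducible version of the original setup (a sketch; the seed value is arbitrary). When dMat is a true distance matrix, as in the question, column 1 already holds the distances to sample 1, so ordering it directly works; the edit above matters when the column holds raw values instead:

set.seed(1)
Mat <- replicate(10, rnorm(10))
dMat <- as.matrix(dist(Mat))
head(order(dMat[-1, 1]), 5) + 1  # indices of the 5 samples nearest to sample 1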
