Node labels are not in sequence in trees using fancyRpartPlot - r

I applied the decision tree algorithm to the well-known breast cancer data set from UCI after removing the 16 records with missing values. The tree I obtained is shown below. As can be seen, the small numbers appearing on top of the nodes are not in order: they are 1, 2, 3, 4, 5, 6, 7, 12, 13, 14, 28 and 20. Were the missing nodes deleted due to pruning? Can I enable an option to see them? Can I renumber the sequence according to the nodes that do appear, i.e. 1 to 13?
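For what it's worth, rpart numbers its nodes with a binary scheme: node n has children 2n and 2n + 1, so gaps in the printed numbers usually just mean that the corresponding branches were never grown (or were pruned away), not that fancyRpartPlot is hiding anything. A minimal sketch for listing the node numbers actually present in a fitted tree, using rpart's built-in kyphosis data in place of the breast cancer set:

library(rpart)

# illustrative fit; substitute your own rpart model for the breast cancer data
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# node numbers assigned by rpart (these should match the small numbers shown above the nodes);
# node n has children 2n and 2n + 1, so the sequence naturally has gaps
sort(as.numeric(rownames(fit$frame)))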

Related

Creating subgraphs with overlapping vertices

I've been looking for packages that would let me create subgraphs with overlapping vertices.
From what I understand, NetworkX and METIS can partition a graph into two or more parts, but I couldn't find a way to partition a graph into subgraphs with overlapping nodes.
Suggestions for libraries that support partitioning with overlapping vertices would be really helpful.
EDIT: I tried the angel algorithm in CDLIB to partition the original graph into subgraphs with 4 overlapping nodes.
import networkx as nx
from cdlib import algorithms

if __name__ == '__main__':
    g = nx.karate_club_graph()
    coms = algorithms.angel(g, threshold=4, min_community_size=10)
    print(coms.method_name)
    print(coms.method_parameters)  # clustering parameters
    print(coms.communities)
    print(coms.overlap)
    print(coms.node_coverage)
Output:
ANGEL
{'threshold': 4, 'min_community_size': 10}
[[14, 15, 18, 20, 22, 23, 27, 29, 30, 31, 32, 8], [1, 12, 13, 17, 19, 2, 21, 3, 7, 8], [14, 15, 18, 2, 20, 22, 30, 31, 33, 8]]
True
0.6470588235294118
From the communities returned, I understand that communities 1 and 3 have an overlap of 4 nodes, but 2 and 3 or 1 and 2 don't have an overlap of 4 nodes.
It is not clear to me how the overlap threshold (4 overlapping nodes) should be specified in algorithms.angel(g, threshold=4, min_community_size=10). I set threshold=4 there to try to define an overlap size of 4 nodes. However, the documentation available for angel says:
:param threshold: merging threshold in [0,1].
I am not sure how to translate 4 overlapping nodes into a value within the bounds [0, 1]. Suggestions would be really helpful.
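As a side note, the actual pairwise overlaps in the output above can be checked directly from coms.communities; this is just a plain-Python sketch, not part of cdlib:

from itertools import combinations

# the list printed above (coms.communities)
communities = [
    [14, 15, 18, 20, 22, 23, 27, 29, 30, 31, 32, 8],
    [1, 12, 13, 17, 19, 2, 21, 3, 7, 8],
    [14, 15, 18, 2, 20, 22, 30, 31, 33, 8],
]

for (i, a), (j, b) in combinations(enumerate(communities, start=1), 2):
    shared = set(a) & set(b)
    print(f"communities {i} and {j}: {len(shared)} shared nodes -> {sorted(shared)}")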
You can check out CDLIB:
They have a large number of community detection algorithms applicable to NetworkX graphs, including several overlapping-community algorithms.
On a side note:
The return type of these functions is called NodeClustering, which can be a little confusing at first, so here are the methods applicable to it; usually you simply want to convert the result to a Python dictionary.
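For example, something along these lines converts the clustering into plain Python structures (to_node_community_map() is part of cdlib's NodeClustering API as far as I recall; treat this as a sketch):

import networkx as nx
from cdlib import algorithms

g = nx.karate_club_graph()
coms = algorithms.angel(g, threshold=0.25, min_community_size=10)

# list of communities, each a plain list of node ids
communities = coms.communities

# dict mapping each node to the ids of the communities it belongs to
node_to_communities = dict(coms.to_node_community_map())
print(node_to_communities)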
Specifically about the angel algorithm in CDLIB:
According to the paper "ANGEL: efficient, and effective, node-centric community discovery in static and dynamic networks", the threshold is not an overlap threshold; it is used as follows:
If the ratio is greater than (or equal to) a given threshold, the merge is applied and the node label updated.
Basically, this value determines whether to further merge the nodes into bigger communities, and is not equivalent to the number of overlapping nodes.
Also, don't confuse these "labels" with node labels (as in nx.relabel_nodes(G, labels)). The "labels" referred to here come from the Label Propagation Algorithm that ANGEL uses internally.
As for the effects of varying this threshold:
[...] Increasing the threshold, we obtain a higher number of communities since lower quality merges cannot take place.
[Based on a comment by J. M. Arnold]
From ANGEL's GitHub repository you can see that when threshold >= 1, only the min_comsize value is used:
self.threshold = threshold
if self.threshold < 1:
    self.min_community_size = max([3, min_comsize, int(1. / (1 - self.threshold))])
else:
    self.min_community_size = min_comsize

R, incomplete elements in time series decomposition

I am using the code below to perform a time series decomposition.
a <- c( 4, 3, 2, 12, 6, 6, 13, 9, 9, 11, 8, 6, 15, 3, 3, 4, 4, 12, 14, 11, 3, 10, 5, 5)
ts_a = ts(a, frequency = 12)
decompose_a = decompose(ts_a, 'additive')
plot(decompose_a)
decompose_a = decompose(ts_a, 'multiplicative')
plot(decompose_a)
The plot shows that the decomposed trend is incomplete. How should I interpret this?
Does it mean that no complete trend (and likewise no complete random component) can be extracted from this time series?
Thank you.
With the arguments you provide, the decompose() function uses a moving average to compute the trend component (see help(decompose) and help(filter) for technical details about the computations). The moving window spans 12 months and is centered on the current month, i.e. it uses the values from the 6 months before and the 6 months after.
Consequently, by definition, you cannot have trend values for the first six and last six months of your data, since the centered moving average cannot be computed for those months.
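You can see this directly by inspecting the trend component; with the data from the question, the first and last six values are NA:

a <- c(4, 3, 2, 12, 6, 6, 13, 9, 9, 11, 8, 6, 15, 3, 3, 4, 4, 12, 14, 11, 3, 10, 5, 5)
decompose_a <- decompose(ts(a, frequency = 12), "additive")

# the centered moving average cannot be computed at the ends of the series,
# so the first 6 and last 6 entries are NA; the random component inherits these NAs
decompose_a$trend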

How to plan the most efficient route for patio lights

I'm trying to string up some patio lights. Based on another question I asked, I realize I need an algorithm for the Route Inspection Problem to figure out the most efficient route for the lights, so that as few edges as possible are covered twice. After some searching I realized that something like this would probably be my best bet: Solving Chinese Postman algorithm with eulerization.
However, I'm having trouble creating the graph.
Here's what it needs to look like:
Pink circles represent places in the structure I can hang lights from.
"Start" is the only available electrical outlet.
The yellow dots represent all the places the lights should cover.
And here's what my graph looks like after referencing this post: Visualizing distance between nodes according to weights - with R:
As you can see, all the nodes are in the correct place, but the edges are connecting where they shouldn't connect. Here's my code:
library(igraph)
gg<-graph.ring(20)
ll=matrix(
c( 0,0, 75.25,0, 150.5,0, 225.8125,0, 302.8125,0,
0,-87, 302.8125,-87,
0,-173.8125, 302.8125,-173.8125,
0,-260.9375, 302.8125,-260.9375,
16,-384.3125, 302.8125,-384.3125,
16,-435.9575, 302.8125,-435.9375,
16,-525.1875, 75.25,-525.1875, 150.5,-525.1875, 225.8125,-525.1875, 302.8175,-525.1875),
ncol=2,byrow=TRUE)
plot(gg,layout=ll)
I think this has something to do with the nature of graph.ring, but I can't figure out another way to define the graph's edges and their lengths without error.
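For what it's worth, graph.ring(20) always connects consecutive vertices in a cycle (1-2, 2-3, ..., 20-1); the layout matrix only positions the vertices and never changes those edges, which is why they end up connecting the wrong places. A quick way to see the fixed edge set:

library(igraph)

# the ring's edges are fixed, independent of any layout passed to plot()
as_edgelist(graph.ring(20))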
I think you can use graph_from_edgelist for a precise specification of which nodes to connect. It is sufficient to specify which nodes to connect in which order. Nice question btw!
gg <- graph_from_edgelist(cbind(c(1:4, 6, 8, 10, 12, 14, 16:19, 1, 6, 8, 21, 12, 14, 5, 7, 9, 11, 13, 15),
c(2:5, 7, 9, 11, 13, 15, 17:20, 6, 8, 10, 12, 14, 16, 7, 9, 11, 13, 15, 20)))
ll=matrix(
c( 0,0, 75.25,0, 150.5,0, 225.8125,0, 302.8125,0,
0,-87, 302.8125,-87,
0,-173.8125, 302.8125,-173.8125,
0,-260.9375, 302.8125,-260.9375,
16,-384.3125, 302.8125,-384.3125,
16,-435.9575, 302.8125,-435.9375,
16,-525.1875, 75.25,-525.1875, 150.5,-525.1875, 225.8125,-525.1875, 302.8175,-525.1875, 16, -260.9375),
ncol=2,byrow=TRUE)
plot(gg,layout=ll, edge.arrow.size = 0, vertex.size = c(rep(18, 20), 0),
edge.color="orange")
I added a node (node 21) to allow a branching similar to your scheme. Does this look more or less as it should?
I had a look at the previous Stack Overflow post (the one you suggested) to try making this an Eulerian cycle. The custom function actually works out of the box, but you may want to double-check whether you can use the resulting solution. Maybe you could try defining a better connection design before "eulerizing" the circuit. This is what I got.
# load custom f(x) as in
# https://stackoverflow.com/questions/40576910/solving-chinese-postman-algorithm-with-eulerization/40596816#40596816
eulerian <- make.eulerian(gg)
eulerian$info
g <- eulerian$graph
# set the layout as before to keep the circuit formatted according to your specs
par(mfrow=c(1,2))
plot(gg,layout=ll, edge.arrow.size = 0, vertex.size = c(rep(18, 20), 0),
edge.color="orange", main = "Proposed")
plot(g,layout=ll, edge.arrow.size = 0, vertex.size = c(rep(18, 20), 0),
edge.color="orange", main = "Eulerized")

R: Loop contract.vertices to calculate network measures for groups in a social network in igraph

I am trying to calculate different network measures, such as betweenness() and constraint(), in my network using igraph in R. My problem is that I am not looking at individuals but at groups of individuals in the network. Therefore I have to contract the vertices before I calculate the different network measures. So far I have been able to create a basic piece of code to calculate the measures, but I have a total of roughly 900 groups (with up to 7 members per group) in a network of about 70,000 nodes and 250,000 edges. So I am trying to create a loop to automate the approach and make life a little bit easier.
Here is my approach to calculating constraint() for a single group.
# load package
library(igraph)
# load data and create a weighted edgelist
df <- data.frame(from=c(6, 9, 10, 1, 7, 8, 8, 4, 5, 2, 5, 10), to=c(3, 4, 2, 5, 10, 1, 9, 10, 6, 9, 3, 6), weight=c(4, 2, 1, 2, 3, 3, 1, 1, 4, 5, 2, 2))
g <- graph.data.frame(df, directed =FALSE)
#import groups
groups <- "
1 5 8
2
10 7 "
subv <- read.table(text = groups, fill = TRUE, header = FALSE)
I would like to loop over the following code so that I do not have to calculate each constraint() separately but can compute it for all three groups in the reproducible example at once.
#create a subvector of the first group and delete all the NA entries
subv1 <- c(as.numeric(as.vector(subv[1,])))
subv1 <- subv1[!is.na(subv1)]
#save the subvector as character (vertex names)
subv1 <- as.character(subv1)
#create a subgraph with the nodes of group 1 and their 1st-order neighbours
g2 <- induced.subgraph(graph=g, vids=unlist(neighborhood(graph=g, order=1, nodes = subv1)))
#identify the igraph IDs of the nodes in the first group
match("1", V(g2)$name)
match("5", V(g2)$name)
match("8", V(g2)$name)
#create a contract vector and contract the vertices from largest to smallest using the output from match
convec1 <- c(1:(5-1), 3, 5:(vcount(g2)-1))
g3 <- contract.vertices(g2, convec1, vertex.attr.comb=toString)
convec2 <- c(1:(4-1), 3, 4:(vcount(g3)-1))
g4 <- contract.vertices(g3, convec2, vertex.attr.comb=toString)
#remove the selfloops and sum the weight attributes for the created graph
g5 <- simplify(g4, remove.loops = TRUE, edge.attr.comb=list(weight="sum"))
# calculate the constraint measure for the vertex 1, 5, 8
constraint(g5, nodes=3, weights=NULL)
So now I have the constraint measure for the first group. For the second and third group I would have to repeat these steps. That would be feasible, but as I said, I have about 900 groups. Is there any way to loop this?
Please let me know if the given example is unclear, as I am new to R and Stack Overflow.
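One way to wrap the steps above into a function and apply it to every row of subv might look roughly like this (an untested sketch; it collapses each group in a single contract.vertices() call, which should be equivalent to the manual convec approach):

library(igraph)

group_constraint <- function(g, members) {
  # keep the group members as vertex names, dropping the NA fill values
  members <- as.character(members[!is.na(members)])
  # subgraph of the group members plus their 1st-order neighbours
  g_sub <- induced.subgraph(graph = g,
                            vids = unlist(neighborhood(graph = g, order = 1, nodes = members)))
  # mapping: all group members collapse onto one vertex id, everyone else stays separate
  idx <- match(members, V(g_sub)$name)
  mapping <- seq_len(vcount(g_sub))
  mapping[idx] <- idx[1]
  mapping <- as.integer(factor(mapping))   # make the ids contiguous for contract.vertices()
  g_con <- contract.vertices(g_sub, mapping, vertex.attr.comb = toString)
  # remove self-loops created by the contraction and sum the edge weights
  g_con <- simplify(g_con, remove.loops = TRUE, edge.attr.comb = list(weight = "sum"))
  # constraint of the contracted vertex that represents the whole group
  constraint(g_con, nodes = mapping[idx[1]], weights = NULL)
}

# one constraint value per group, i.e. per row of subv
apply(subv, 1, function(row) group_constraint(g, row))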

Can breadth first search traverse a disconnected graph?

I have an exam question:
Consider the undirected graph with vertices the numbers 3..16, and edges given by the following rule: two vertices a and b are linked by an edge whenever a is a factor of b, or b is a factor of a. List the vertices in BFS order starting from 14. (Many different orders are possible with a BFS, you may list any one of these).
I'm considering two answers:
1. Because the graph is not connected, a BFS from 14 can only reach 7, so the result is 14, 7.
2. List all the disconnected first-level vertices, then traverse to their child nodes, giving 14, 16, 15, 13, 12, 11, 10, 9, 7, 8, 6, 5, 4, 3.
Which one is correct?
Can BFS traverse to disconnected vertices?
Answer 2 doesn't make sense in my opinion, because it requires knowing in advance which nodes are disconnected, which is exactly what you would need the algorithm to find out. So I'd say answer 1 is correct.
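If you want to verify this mechanically, here is a small sketch in R with igraph (the two isolated vertices, 11 and 13, are left out of the edge list, which does not affect a BFS started from 14):

library(igraph)

v <- 3:16
pairs <- t(combn(v, 2))                    # all pairs a < b from 3..16
keep  <- pairs[, 2] %% pairs[, 1] == 0     # keep pairs where a is a factor of b
g <- graph_from_edgelist(apply(pairs[keep, ], 2, as.character), directed = FALSE)

# with unreachable = FALSE the search stays inside 14's component
bfs(g, root = "14", unreachable = FALSE)$order
# only 14 and 7 are visited: 14's sole neighbour in 3..16 is 7,
# and 7 has no other neighbours in that range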
