Average clustering coefficient of a network (igraph) - r

I want to calculate the average clustering coefficient of a graph (from igraph package). However, I am not sure which approach I should follow.
library(igraph)
graph <- erdos.renyi.game(10000, 10000, type = "gnm")
# Global clustering coefficient
transitivity(graph)
# Average clustering coefficient
transitivity(graph, type = "average")
# The same as above
mean(transitivity(graph, type = "local"), na.rm = TRUE)
I would be grateful for some guidance.

Using transitivity(graph) computes a global clustering coefficient (transitivity):
This is simply the ratio of the triangles and the connected triples in
the graph. For directed graph the direction of the edges is ignored.
Meanwhile, transitivity(graph, type = "average") first computes the local clustering coefficients, as returned by transitivity(graph, type = "local"), and then averages them:
The local transitivity of an undirected graph, this is calculated for
each vertex given in the vids argument. The local transitivity of a
vertex is the ratio of the triangles connected to the vertex and the
triples centered on the vertex. For directed graph the direction of
the edges is ignored.
See, e.g., ?transitivity and Clustering coefficient.
So, first of all, both are valid measures, and the choice should depend on your purpose. The difference between them is explained on the Wikipedia page:
It is worth noting that this metric places more weight on the low
degree nodes, while the transitivity ratio places more weight on the
high degree nodes. In fact, a weighted average where each local
clustering score is weighted by k_i(k_i-1) is identical to the global clustering
coefficient
where k_i is the number of neighbours of vertex i. Hence, reporting both of them may be reasonable too.
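The weighting relationship quoted above can be checked numerically. The following is a minimal sketch, assuming a simple undirected random graph; the graph size, the seed, and the variable names are illustrative choices, not part of the original question.

```r
library(igraph)

set.seed(1)
g <- sample_gnm(200, 1000)   # assumed example: simple undirected random graph

k <- degree(g)                                # k_i, number of neighbours
local_cc <- transitivity(g, type = "local")   # NaN for vertices with k_i < 2
w <- k * (k - 1)                              # weights k_i * (k_i - 1)

# Vertices with an undefined local coefficient have weight 0 anyway
keep <- !is.na(local_cc)
weighted_avg <- sum(w[keep] * local_cc[keep]) / sum(w[keep])

global_cc <- transitivity(g, type = "global")
plain_avg <- mean(local_cc, na.rm = TRUE)     # unweighted average, for comparison

# The weighted average of local coefficients matches the global coefficient
isTRUE(all.equal(weighted_avg, global_cc))
```

Comparing plain_avg with global_cc on your own graph shows how much the two measures disagree, which is driven by how clustering varies with degree.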

Julius Vainora has already answered the question. An additional note for those wondering what type = "average" does, since the igraph documentation doesn't say anything about it:
transitivity(graph, type = "average")
is the same as
transitivity(graph, type = "localaverage")
is the same as
transitivity(graph, type = "localaverageundirected")
is the same as
mean(transitivity(graph, type = "local"), na.rm = TRUE)
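These equivalences can be confirmed on a small graph. This is only a sketch under the assumption of a simple undirected random graph; the graph and the variable names are made up for illustration.

```r
library(igraph)

set.seed(123)
g <- sample_gnm(100, 400)   # assumed example graph

avg_a <- transitivity(g, type = "average")
avg_b <- transitivity(g, type = "localaverage")
avg_c <- transitivity(g, type = "localaverageundirected")
# Manual version: drop vertices whose local coefficient is undefined (NaN)
avg_d <- mean(transitivity(g, type = "local"), na.rm = TRUE)

# All four values agree up to floating-point tolerance
c(isTRUE(all.equal(avg_a, avg_b)),
  isTRUE(all.equal(avg_a, avg_c)),
  isTRUE(all.equal(avg_a, avg_d)))
```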

transitivity(g, type="local")
What about the order of the output vector?
Is it the same as the order of:
degree(g, mode="all")
The degree vector has indices (names), but the clustering coefficient vector does not.

Related

Correct way of calculating modularity for weighted graphs

I have about 13000 genes which I am trying to cluster using igraph as follows:
g.communities <- edge.betweenness.community(as.undirected(g), weights = E(g)$weight)
which returns 97 communities with modularity 0.9773353:
modularity(as.undirected(g), membership = g.communities$membership, weights = E(g)$weight)
#0.9773353
When I tried to set the number of communities manually, as below, I got a modularity of 0.0094:
modularity(as.undirected(g), membership = cutat(g.communities, steps = 97), weights = E(g)$weight)
#0.0094
Shouldn't these functions return similar results? Also, is it possible to use the above function to find the correct number of clusters (since by just increasing the steps the modularity always increases)?
Finally g.communities$modularity returns a number for each vertex.
Can these numbers be interpreted as the correlation of each vertex to its corresponding module?
You are using the steps argument of cut_at. This does not specify the number of communities, but the number of merging steps to perform on the dendrogram. If you want 97 communities, use cut_at(g.communities, no=97) or simply cut_at(g.communities, 97).
That said, I do not suggest using edge.betweenness.community on weighted graphs at this time, for the reasons I described here.
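The difference between the no and steps arguments can be seen on a small unweighted graph. This is a sketch only: the Zachary karate club graph and the value 5 are assumptions chosen for illustration, not the asker's gene network.

```r
library(igraph)

g <- make_graph("Zachary")          # assumed example: 34 vertices, unweighted
eb <- cluster_edge_betweenness(g)   # hierarchical result with a merge dendrogram

# no = 5 asks for a membership vector with exactly 5 communities
by_no <- cut_at(eb, no = 5)

# steps = 5 performs only 5 merges, starting from one community per vertex,
# so it leaves vcount(g) - 5 communities
by_steps <- cut_at(eb, steps = 5)

length(unique(by_no))
length(unique(by_steps))

# Modularity of the two cuts can then be compared directly
modularity(g, by_no)
modularity(g, by_steps)
```

With only a few merge steps almost every vertex stays in its own community, which would be consistent with the near-zero modularity reported in the question.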

How can I calculate degree, eigenvector, bonacich power centrality using R igraph package?

I am trying to calculate degree centrality, eigenvector centrality and Bonacich power centrality using the igraph package in R. My data is South Korea's commuting data.
My data looks like this: 1st column: origin region code, 2nd column: destination region code, 3rd column: commuting times between the two regions.
I have made igraph graph using function graph_from_data_frame() like the picture below.
od18 is the data I used. The same one mentioned at the first picture.
But here are my problems.
I can't make an adjacency matrix using this graph.
: Error in get.adjacency.sparse(graph, type = type, attr = attr, edges = edges, :
Sparse matrices must be either numeric or logical,and the edge attribute is not
this is the code I executed.
this is the error.
Actually, I don't care about the adjacency matrix as long as the centrality calculations are not affected. But I am worried that this error means I will not get correct centrality results.
I tried to calculate the degree centrality using the function degree(), but the resulting values are all the same.
All of my nodes have the same degree value, 250.
I would also appreciate any help with Bonacich power centrality using -beta.
Can you help me with these problems?

How to measure performance of K-Means cluster in R? [image & code included]

I am currently doing a K-means cluster analysis for some customer data at my company. I want to measure the performance of this cluster, I just don't know the library packages used to measure performance of it and I am also unsure if my clusters are grouped too close together.
The data feeding my cluster is a simple RFM (recency, frequency, & monetary value). I also included average order value per transaction by customer. I used the elbow method to determine the optimal number of clusters to use. The data consists of 1400 customers and 4 metric values.
Attached is also an image of the cluster plot & R Code
library(dplyr)       # glimpse()
library(factoextra)  # fviz_cluster()
drop = c('CUST_Business_NM')
# Cleaning & scaling the data
new_cluster_data = na.omit(data)
# Drop the name column from the NA-free data (not from the original data)
new_cluster_data = new_cluster_data[, !(names(new_cluster_data) %in% drop)]
new_cluster_data = scale(new_cluster_data)
glimpse(new_cluster_data)
# Elbow method for optimal clusters
k.max <- 15
data <- new_cluster_data
wss <- sapply(1:k.max,
              function(k) {kmeans(data, k, nstart = 50, iter.max = 15)$tot.withinss})
# Plot out the elbow
wss
plot(1:k.max, wss,
     type = "b", pch = 19, frame = FALSE,
     xlab = "Number of clusters K",
     ylab = "Total within-clusters sum of squares")
# Create the clusters
kmeans_test = kmeans(new_cluster_data, centers = 8, nstart = 1000)
View(kmeans_test$cluster)
# Visualize the clusters
fviz_cluster(kmeans_test, data = new_cluster_data, show.clust.cent = TRUE, geom = c("point", "text"))
You probably do not want to measure the performance of the clusters but the performance of the clustering algorithm, in this case kmeans.
First, you need to be clear about which distance measure you want to use. The clustering result depends on the dissimilarities between observations, so the choice of distance measure is critical. You can experiment with Euclidean, Manhattan, correlation-based, or other distance measures, e.g., like this:
library("factoextra")
dis_pearson <- get_dist(yourdataset, method = "pearson")
dis_pearson
fviz_dist(dis_pearson)
This will give you the distance matrix and visualize it.
The output of kmeans has several bits of information. The most important with regard to your question are:
totss: the total sum of squares
withinss: vector of within-cluster sum of squares
tot.withinss: total within-cluster sum of squares
betweenss: the between-cluster sum of squares
Thus, the goal is to optimize these by experimenting with distances and other methods of clustering the data. With kmeans() from the stats package, you can extract these measures by running mycluster <- kmeans(yourdataframe, centers = 2) and then printing mycluster.
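As a rough illustration of the kmeans output components listed above, the sketch below uses the built-in iris measurements as a stand-in for the customer data; the dataset, the choice of 3 centers, and the variable names are assumptions for the example only.

```r
set.seed(42)
x <- scale(iris[, 1:4])                    # stand-in data: 4 scaled numeric columns
km <- kmeans(x, centers = 3, nstart = 25)

km$totss          # total sum of squares
km$withinss       # one within-cluster sum of squares per cluster
km$tot.withinss   # sum of km$withinss
km$betweenss      # between-cluster sum of squares

# The total sum of squares splits into within and between parts
isTRUE(all.equal(km$totss, km$tot.withinss + km$betweenss))

# Share of the total variation captured by the clustering (between 0 and 1)
explained <- km$betweenss / km$totss
explained
```

A higher explained share means tighter, better separated clusters, but it always grows with the number of centers, so it is best compared across runs with the same number of clusters.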
Side comment: kmeans requires the number of clusters defined by the user (additional effort) and it is very sensitive to outliers.

How to deal with missing values in degree function in R?

I'm currently using degree function in igraph. But the problem is my data set has lots of missing values. So if I use degree function, degree centrality is strongly biased. And my data set consists of several groups, it might be also biased due to the size of groups. This is the syntax I'm using right now.
graph <- graph.adjacency(A, mode = 'directed', diag = FALSE)
indegree <- degree(graph, v = V(graph), mode = "in",
                   loops = TRUE, normalized = TRUE)
So, my question will be:
1) Is there any way to deal with missing values in this function?
2) Can I calculate indegree centrality based on the size of each group?

Input to fit a power-law to degree distribution of a network

I would like to use R to test whether the degree distribution of a network behaves like a power-law with scale-free property. Nonetheless, I've read different people doing this in many different ways, and one confusing point is the input one should use in the model.
Barabasi, for example, recommends fitting a power-law to the 'complementary cumulative distribution' of degrees (see Advanced Topic 3.B of chapter 4, figure 4.22). However, I've seen people fit a power-law to the degrees of the graph (obtained with igraph::degree(g)), and I've also seen others fitting a power-law to a degree distribution, obtained via igraph::degree_distribution(g, cumulative = T)
As you can see in the reproducible example below, these options give very different results. Which one is correct? And how can I get the "complementary cumulative distribution of degrees" from a graph so I can fit a power-law?
library(igraph)
# create a graph
set.seed(202)
g <- static.power.law.game(500, 1000, exponent.out= 2.2, exponent.in = 2.2, loops = FALSE, multiple = T)
# get input to fit power-law.
# 1) degrees of the nodes
d <- degree(g, v = V(g), mode ="all")
d <- d[ d > 0] # remove nodes with no connection
# OR ?
# 2) cumulative degree distribution
d <- degree_distribution(g, mode ="all", cumulative = T)
# Fit power law
fit <- fit_power_law(d, implementation = "R.mle")
Well, the problem is that you are dealing with two different statistics here.
The degree of a node shows how many connections it has to other nodes.
The degree distribution is the probability distribution of those degrees over the network.
For me it doesn't make much sense to apply the igraph::fit_power_law on a degree distribution as the degree distribution is already a power law to a certain extent.
However, don't forget that the igraph::fit_power_law has more options than the implementation argument, which will result in different things, depending on what you're "feeding it".
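Following the distinction above, a minimal sketch of fitting on the raw degrees (rather than on the distribution) might look like this. It also computes an empirical complementary cumulative distribution directly from the degrees, for plotting or inspection. The variable names ks and ccdf are illustrative, and sample_fitness_pl() is used here as the current name of static.power.law.game().

```r
library(igraph)

set.seed(202)
g <- sample_fitness_pl(500, 1000, exponent.out = 2.2, exponent.in = 2.2,
                       loops = FALSE, multiple = TRUE)

# Input for the fit: one degree value per vertex, isolated vertices removed
d <- degree(g, mode = "all")
d <- d[d > 0]

fit <- fit_power_law(d)   # default implementation ("plfit")
fit$alpha                 # estimated exponent
fit$xmin                  # estimated lower cutoff of the power-law tail

# Empirical complementary cumulative distribution: P(degree >= k)
ks <- sort(unique(d))
ccdf <- sapply(ks, function(k) mean(d >= k))

# A power-law tail appears roughly linear on log-log axes
plot(ks, ccdf, log = "xy", xlab = "Degree k", ylab = "P(K >= k)")
```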
