Simulate ERGM with nodal attributes in R

I was wondering whether it is possible to simulate networks that come from an ERGM distribution in which the nodes have attributes. For example, if I wanted to simulate a network where triangles between nodes with similar attributes are more likely, I would do something like:
library(ergm)
g_sim <- simulate(network(n, directed = FALSE) ~ triangles + nodematch,
                  nsim = 1,
                  coef = thetas)
But the problem is that these kinds of statistics that depend on node attributes (e.g. nodematch) require node attributes as arguments, which I don't have because the network doesn't exist beforehand (I'm trying to simulate it).
How could this be done?

Will something like this work?
library(ergm)
# Initialize an empty network with N nodes
N <- 50
g <- network(1, directed = FALSE)
add.vertices(g, N - network.size(g))
# Make up a node classification to go with nodematch
type <- rbinom(N, 1, .25)
g %v% "type" <- ifelse(type, "green", "blue")
# Set the parameters of the model.
# Use large coefficients to make the result clear.
# These coefficients set the base density and the additional
# density of edges between nodes of the same type.
thetas <- c(-2.5, 2.5)
# Simulate one network.
# I'm using edges instead of triangles because of the
# tendency towards degeneracy with triangles (my first attempt
# had a density of 1.0).
g_sim <- simulate(
  g ~ edges + nodematch("type"),
  nsim = 1,
  coef = thetas
)
# Plot to be sure. There should be many more edges between
# nodes of the same color than nodes of different colors.
plot(g_sim, vertex.col = g %v% "type")
# Are the coefficients similar to what they should be?
m <- ergm(g_sim ~ edges + nodematch("type"))
summary(m)
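As a quick numerical check (a minimal sketch, assuming the simulation above ran and that g_sim kept the "type" attribute), you can also compare the observed statistics of the simulated network before fitting anything:
# Observed statistics: total edges and same-type edges.
# With a strongly positive nodematch coefficient, same-type edges
# should account for most of the total.
summary(g_sim ~ edges + nodematch("type"))
# Mixing matrix of edge counts by node type (network package, loaded with ergm)
mixingmatrix(g_sim, "type")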

Related

How can I create a scatter plot in R to visualise the result of a SOM clustering model?

I have a dataset (this is just a dummy, my real datasets are much larger) in which there are five variables: two spatial variables X and Y (basically pairs of coordinates) and three attributes A, B and C associated to each X,Y point:
X Y  A  B  C
1 1 34 11 26
1 2 47 16 31
1 3 60 21 36
1 4 73 26 41
1 5 86 31 46
2 1 99 36 51
... with 15 more rows
If I run a k-Means Clustering model on the dataset, I can easily produce a plot in which each X,Y point is coloured according to the related cluster:
library(tidyverse)
# Read the dataset
My_ds <- read_delim("test_dataset.csv", delim = ",", escape_double = FALSE, trim_ws = TRUE)
# Set the number of clusters
kClusters <- 3
# Create the model
kMeans <- kmeans(My_ds[ , c("A", "B", "C")], centers = kClusters)
# Plot the result
ggplot(My_ds, aes(X, Y)) +
  geom_point(col = kMeans$cluster,
             size = 15) +
  theme_minimal()
k-Means scatter plot
With the kohonen package I can also use a different clustering approach based on self-organising maps (SOM):
library(kohonen)
# Prepare the dataset
My_ds_SOM <- as.matrix(scale(My_ds[ , c("A", "B", "C")]))
# Set the grid
My_Grid <- somgrid(xdim = 3, ydim = 3, topo = "hexagonal")
# Create the model
My_Model <- som(X = My_ds_SOM,
                grid = My_Grid)
However, I cannot find a way to produce a scatter plot similar to the one above based on the SOM clusters. With k-means I used kMeans$cluster to control the colour of the X,Y points; what should I use with SOM?
Update 1
OK, I made some progress thanks to this blog post. The key is to perform clustering on the SOM nodes, to isolate groups of samples with similar metrics.
First, a suitable number of clusters can be estimated by running k-means for a range of cluster counts and looking for an elbow point in the plot of the within-cluster sum of squares (WCSS):
# View WCSS for k-means
mydata <- getCodes(My_Model)
wcss <- (nrow(mydata) - 1) * sum(apply(mydata, 2, var))
for (i in 2:8) {  # the upper bound is of one's choosing (I used number_of_nodes - 1)
  wcss[i] <- sum(kmeans(mydata, centers = i)$withinss)
}
plot(wcss)
WCSS plot
Then I use hierarchical clustering and the SOM plot function to visualise the clusters on the node map:
# Define colour palette
pretty_palette <- c("#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd", "#8c564b", "#e377c2")
# Use hierarchical clustering to cluster the codebook vectors
som_cluster <- cutree(hclust(dist(getCodes(My_Model))), 3)
# Plot these results
plot(My_Model, type = "mapping", bgcol = pretty_palette[som_cluster], main = "Clusters")
add.cluster.boundaries(My_Model, som_cluster)
Clusters on node map
Finally, I assign labels to the original data by combining the som_cluster variable, which maps nodes to clusters, with the My_Model$unit.classif variable, which maps data samples to nodes:
# Get a vector with the cluster value for each original data sample
cluster_assignment <- som_cluster[My_Model$unit.classif]
# Add the assignment as a column in the original data
My_ds$cluster <- cluster_assignment
# Plot the result
ggplot(My_ds, aes(X, Y)) +
  geom_point(col = My_ds$cluster,
             size = 15) +
  theme_minimal()
SOM+hierarchical scatter plot
Applying hierarchical clustering on top of the SOM nodes makes the process a bit convoluted, as SOM already helps reduce the dimensions and cluster neighbouring nodes together. But this was the only way I could get what I wanted.
Update 2
Some more progress. This time I'm focusing on making the whole process fully automatic. Specifically, I want to avoid choosing 1) the SOM grid size and 2) the number of clusters during the clustering of the node map.
Regarding point 1, I used the rule of thumb suggested by Vesanto J, Alhoniemi E. Clustering of the self-organizing map. IEEE Transactions on Neural Networks. 2000 May;11(3):586-600, which is #nodes = 5*sqrt(#observations). Setting the grid for the SOM model therefore works like this:
My_dim <- as.integer(sqrt(5*sqrt(nrow(My_ds_SOM))))
My_Grid <- somgrid(xdim = My_dim, ydim = My_dim, topo = "hexagonal")
Of course, this works best with large datasets. In any case, this approach should only be a starting point: the grid size can (and should) then be adjusted by looking at the resulting node count plot, weight vector plot and heatmaps.
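For reference, here is a minimal sketch of those diagnostic plots using kohonen's plot method (it assumes My_Model was trained as above; the property plot simply uses the first codebook column as an example variable):
# Node counts: how many samples map to each node
# (many empty nodes suggest the grid is too large)
plot(My_Model, type = "counts")
# Codebook (weight) vectors for each node
plot(My_Model, type = "codes")
# Heatmap of a single variable across the map, here the first codebook column
plot(My_Model, type = "property",
     property = getCodes(My_Model)[, 1],
     main = colnames(getCodes(My_Model))[1])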
About point 2, when using hierarchical clustering to cluster the codebook vectors, the kgs function of the maptree package allows the optimal number of clusters to be calculated automatically:
library(maptree)
distance <- dist(getCodes(My_Model))
clustering <- hclust(distance)
optimal_k <- kgs(clustering, distance, maxclus = 20)
clusters <- as.integer(names(optimal_k[which(optimal_k == min(optimal_k))]))
som_cluster <- cutree(clustering, clusters)
Also in this case, the number of clusters determined by the code can be compared to the one suggested by the WCSS plot, to check if there is a significant discrepancy.
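As a quick visual aid (a sketch based on the kgs output above, not part of the original workflow), the penalised statistic can be plotted against the candidate number of clusters and compared with the WCSS elbow:
# kgs returns a named vector: names are candidate cluster counts,
# values are the penalised goodness-of-fit statistic (lower is better)
plot(as.integer(names(optimal_k)), optimal_k,
     xlab = "Number of clusters", ylab = "kgs penalised statistic")
abline(v = clusters, lty = 2)  # the automatically chosen number of clusters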

How can I extract a sublayout to match a subgraph in R?

I'm using igraph in R, trying to extract a subgraph that comprises only the largest cliques in a graph. I want to plot the graph and the subgraph (1) without moving the vertices and (2) while maintaining numbering. I can't seem to make it work. I tried storing the layout coordinates directly in the vertices but plot seems to rescale things. Setting rescale=FALSE didn't work either. Here's what I have, first plotting the random graph, then highlighting the largest cliques, and finally displaying only the largest cliques:
# plot random graph
library(igraph)
g <- sample_gnp(n = 30, p = .1)
l <- layout_with_fr(g)
V(g)$x <- l[, 1]
V(g)$y <- l[, 2]
V(g)$id <- 1:vcount(g)
plot(g, vertex.size = 6, vertex.label.dist = 1, vertex.label = V(g)$id,
     main = "a random network", sub = "where are the cliques?")
# highlight largest cliques
lc <- unlist(largest_cliques(g))
vcol <- rep("grey80", vcount(g))
vcol[lc] <- "gold"
plot(g, vertex.size = 6, vertex.color = vcol, vertex.label.dist = 1,
     vertex.label = V(g)$id, main = "here they are!", layout = l)
# plot only the largest cliques, without changing position or vertex numbers
sg <- induced_subgraph(g, lc)
sl <- cbind(V(sg)$x, V(sg)$y)
plot(sg, vertex.size = 6, vertex.label.dist = 1, vertex.label = V(sg)$id,
     vertex.color = "gold", layout = sl)
First of all, you did not set the random seed, so each time you run this code you will get a different graph and layout. I am setting the seed for reproducibility.
# plot random graph
set.seed(2021)
g <- sample_gnp(n = 30, p = .1)
l <- layout_with_fr(g)
V(g)$x <- l[, 1]
V(g)$y <- l[, 2]
V(g)$id <- 1:vcount(g)
# highlight largest cliques
lc <- unlist(largest_cliques(g))
vcol <- rep("grey80", vcount(g))
vcol[lc] <- "gold"
There are other ways to do this, but I will do it using rescale=FALSE.
When you do that, you have to adjust xlim and ylim to the ranges of the layout. It also changes the scale, so you need to increase the vertex size.
Putting all of that together, I get:
Rx <- range(l[, 1])
Ry <- range(l[, 2])
par(mfrow = c(1, 2))
plot(g, vertex.size = 45, vertex.color = vcol, vertex.label.dist = 1,
     vertex.label = V(g)$id, main = "here they are!", layout = l,
     rescale = FALSE, xlim = Rx, ylim = Ry)
# plot only the largest cliques, without changing position or vertex numbers
sg <- induced_subgraph(g, lc)
sl <- cbind(V(sg)$x, V(sg)$y)
plot(sg, vertex.size = 45, vertex.label.dist = 1, vertex.label = V(sg)$id,
     vertex.color = "gold", layout = sl, rescale = FALSE, xlim = Rx, ylim = Ry,
     main = "Unmoved")
You can use set_vertex_attr to set the name attribute of graph g, and then subset the layout l by those names, like below:
# plot random graph
set.seed(2021)
g <- sample_gnp(n = 30, p = .1)
l <- layout_with_fr(g)
# highlight largest cliques
lc <- unlist(largest_cliques(g))
g %>%
  set_vertex_attr(name = "name", value = seq(vcount(.))) %>%
  induced.subgraph(lc) %>%
  plot(layout = l[as.integer(names(V(.))), ], vertex.color = "gold")
and you will get the subgraph plotted with the original relative positions and vertex numbers.

Color nodes of a graph created with igraph proportionally to eigenvector centrality of that node

I am using the igraph library in R. I have created an MST graph by using the function mst, based on distances stored in a data frame called tree:
gf <- graph_from_data_frame(tree, directed = FALSE)
mstgf <- mst(gf, weights = tree$distance)
I have calculated the eigenvector centrality of each node in the MST as:
ec <- eigen_centrality(mstgf, directed=T, weights=NA)$vector
I have then joined the vector of eigenvector centralities to the data.frame tree:
x <- cbind(names(ec), as.numeric(ec)) %>%
  as_tibble() %>%
  mutate(V2 = as.numeric(V2)) %>%
  rename(from = V1)
tree <- tree %>% inner_join(., x, by = "from")
What I want to do is to plot the MST, colouring the nodes so as to reflect their eigenvector centrality. I am using the following for plotting, but I don't know how to change the vertex.color argument to obtain something like the image below.
plot.igraph(mstgf,
            vertex.color = round(tree$V2, 0),
            edge.color = "blue",
            edge.curved = TRUE,
            edge.width = 1)
Once you've calculated the centralities of your choice, you would like to 1) scale the values to a meaningful categorical range (like 1,2,3,4,5) and 2) associate your centrality categories with colors from a gradient. You don't necessarily have to keep joining and calculating outside igraph.
This is a random network
# Random network
library(igraph)
g <- erdos.renyi.game(100, 250, 'gnm', directed = FALSE)
1) make categories
This forces every eigenvector centrality to assume an integer value between 1 and 10
# Calculate eigenvector centrality and check its distribution. We're attaching
# the result of eigen_centrality() straight onto the vertices as vertex attributes.
V(g)$ec <- eigen_centrality(g, directed = TRUE, weights = NA)$vector
hist(V(g)$ec)
# You could use the scales package, or define this normalisation function:
normalize <- function(x) { (x - min(x)) / (max(x) - min(x)) }
(V(g)$ec_index <- round(normalize(V(g)$ec) * 9) + 1)
# ec_index should now be a category between 1 and 10 for your centralities
You can use any resolution you like.
2) Attach colours from the indexation
There are several packages and ways to load colour-ranges in R (colorspace, colorRamps, RColorBrewer etc).
# Build a color-mapping with 10 categories and set the color of each
# node to the corresponding color of each centrality-measure category
V(g)$color <- colorRampPalette(c("turquoise", "yellow","red"))(10)[V(g)$ec_index]
# Look at what we did
table(V(g)$color)
plot(g, vertex.label=NA, vertex.size=5)
This example should produce a graph whose nodes shade from turquoise through yellow to red as eigenvector centrality increases.

How to apply k-means clustering on Network Graphs in R iGraph?

The following code generates a network graph and separates the data into two groups (factions), but I would like to apply k-means on it and see whether the algorithm recovers a similar set of clusters.
library(igraph)
library(igraphdata)  # package providing the graph data
data(karate)
V(karate) %>% print()  # shows the list of the nodes
# Reproducible layout
set.seed(69)
l <- layout_with_kk(karate)  # sets the layout
# Plot undecorated graph network first.
igraph_options(vertex.size = 10)
par(mfrow = c(1, 1))  # sets plotting parameters
plot(karate, layout = l, vertex.label = V(karate),
     vertex.color = NA)  # plots a basic graph
# Now decorate, starting with labels.
V(karate)$label <- sub("Actor ", "", V(karate)$name)
V(karate)
# Two Club Leaders get shapes different from other club members.
V(karate)$shape <- "circle"
V(karate)[c("Hi", "John")]$shape <- "rectangle" # sets different shapes for these two only
# Differentiate two factions by color. (Similar to clustering & color-coded)
V(karate)[Faction == 1]$color <- "red"
V(karate)[Faction == 2]$color <- "dodgerblue"
# Vertex area proportional to vertex strength
# (i.e., total weight of incident edges).
V(karate)$size <- 4*sqrt(strength(karate))
V(karate)$size2 <- V(karate)$size * .5
# Weight edges by number of common activities
E(karate)$width <- E(karate)$weight
# Color edges by within/between faction.
F1 <- V(karate)[Faction==1] # sets variable for first cluster (faction)
F2 <- V(karate)[Faction==2] # similar to the above.
E(karate)[ F1 %--% F1 ]$color <- "pink"
E(karate)[ F2 %--% F2 ]$color <- "lightblue"
E(karate)[ F1 %--% F2 ]$color <- "green"
# Offset vertex labels for smaller points (size based, default is zero).
V(karate)$label.dist <-
ifelse(V(karate)$size >= 9.0, 0, 1.0)
# Plot decorated graph, using same layout.
plot(karate, layout=l)
Final Output:
You can get the adjacency matrix of the resulting graph and apply k-means clustering on top of that matrix; this clusters the vertices by their rows of the adjacency matrix, i.e. by their connection patterns.
The following is sample code:
adj.matrix <- get.adjacency(karate, sparse = FALSE)
k <- 3  # number of desired clusters
km <- kmeans(adj.matrix, centers = k, nstart = 25)
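To connect this back to the plot above, here is a small sketch (my own addition, not part of the original answer) that colours the karate vertices by their k-means cluster and compares the clusters with the known factions:
# Colour vertices by k-means cluster, reusing the saved layout l
# (this overwrites the faction colours set earlier)
V(karate)$color <- c("red", "dodgerblue", "green")[km$cluster]
plot(karate, layout = l)
# Cross-tabulate k-means clusters against the two factions
table(km$cluster, V(karate)$Faction)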

How to add specific weights to some edges?

In a network I have identified some specific nodes, for example 3, 4 and 5, and an initial node 9. I want to add weights to the edges between them and be able to retrieve those weights later.
More specifically: I need to add weights to the edges (3,9), (4,9) and (5,9), and later recall those weights for some calculation, i.e. something like a = "the weight of edge (3,9)".
Since you do not provide any data, I will use a simple example that has links like the ones you describe.
## A simple example
library(igraph)
set.seed(1234)
g = make_ring(10)
g = add_edges(g, c(3,9,4,9,5,9))
E(g)$weight = 1
LO = layout_nicely(g)
plot(g, layout=LO)
If you have the "Initial Node" and the "Specific Nodes", you can identify the special edges.
## Get the ids of the special edges
InitialNode = 9
ConnectingNodes = c(3,4,5)
ENDS = as.vector(rbind(ConnectingNodes, InitialNode))
SpecialEdges = get.edge.ids(g, ENDS)
With the IDs of the special edges, you can adjust their weights.
## Add weight to the special edges
E(g)$weight[SpecialEdges] = c(2,4,6)
## plot to show the weights
plot(g, edge.width=E(g)$weight)
If you later need to do something with the weights, you can access the weights using:
E(g)$weight[SpecialEdges]
[1] 2 4 6
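And if you need the weight of a single edge by its endpoints, as in the question, the same idea works (a small sketch following the answer's approach; (3,9) is the question's example edge):
# Recall the weight of edge (3, 9) specifically
a <- E(g)$weight[get.edge.ids(g, c(3, 9))]
a
# [1] 2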
