I want to create a function for Burt's effective size.
The formula boils down to:
Effective size = n - 2t/n
where t is the number of ties among ego's contacts (not counting ties to ego)
and n is the number of people in the network (not counting ego).
I'm not really sure where to start with writing functions within/for igraph.
Let me know if more detail would be helpful...
Thanks.
First simulate a basic graph:
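library(igraph)
alters = 50
ties = 10
set.seed(12345)
# ego is vertex 1 (igraph for R numbers vertices from 1); alters are 2..51
edgelist = rbind(1, 2:(alters + 1))
# add random ties between pairs of alters
edgelist = cbind(edgelist, replicate(ties, sample(2:(alters + 1), 2)))
g = make_graph(as.vector(edgelist), directed=FALSE)
dev.new(width=5, height=5)
plot(g, layout=layout_with_kk)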
Then write a simple function to calculate the effective size. (The functions in here that operate on g are all nicely documented in the igraph manual and in various examples around the net.)
EffectiveSize <- function(g, ego=1) {
  alters = neighbors(g, ego)              # ego's contacts
  t = length(E(g)[alters %--% alters])    # ties among alters only, not to ego
  n = length(alters)
  n - 2 * t / n
}
> EffectiveSize(g)
[1] 49.6
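To check the formula independently of igraph, here is a base-R sketch that computes the same quantity from a plain adjacency matrix. The helper effective_size_adj() and the small test network are made up for illustration.

```r
# Hypothetical helper: Burt's effective size, n - 2t/n, from a symmetric
# 0/1 adjacency matrix `adj` and the row index of `ego`.
effective_size_adj <- function(adj, ego) {
  stopifnot(is.matrix(adj), nrow(adj) == ncol(adj))
  alters <- setdiff(which(adj[ego, ] != 0), ego)  # ego's contacts, excluding ego
  n <- length(alters)
  if (n == 0) return(NA_real_)                    # undefined without alters
  sub <- adj[alters, alters, drop = FALSE]        # ties among alters only
  ties <- sum(sub[upper.tri(sub)] != 0)           # each undirected pair once
  n - 2 * ties / n
}

# Example: ego = 1 with five alters, plus ties 2-3 and 4-5 among the alters
adj <- matrix(0, 6, 6)
adj[1, 2:6] <- 1
adj[2:6, 1] <- 1
adj[2, 3] <- adj[3, 2] <- 1
adj[4, 5] <- adj[5, 4] <- 1
effective_size_adj(adj, ego = 1)
# [1] 4.2
```

An igraph graph can be converted with as_adjacency_matrix(g, sparse = FALSE). This sketch counts each pair of alters at most once, so it can differ from the igraph version when the simulated graph happens to contain duplicate ties.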
I was wondering whether it is possible to simulate networks that come from an ERGM distribution in which the nodes have attributes. For example, if I wanted to simulate a network where triangles between nodes with similar attributes are more likely, I would do something like:
library(ergm)
g_sim = simulate(network(n, directed=FALSE) ~ triangles + nodematch,
nsim=1,
coef=thetas)
But the thing is that these kinds of statistics that depend on node attributes (e.g. nodematch) require parameters, which I don't have because the network doesn't exist beforehand (I'm trying to simulate it).
How could this be done?
Will something like this work?
library(ergm)
# Initialize an empty network with N nodes
N <- 50
g <- network(1, directed = FALSE)
add.vertices(g, N - network.size(g))
# Make up a node classification to go with nodematch
type <- rbinom(N, 1, .25)
g %v% "type" <- ifelse(type, "green", "blue")
# Set the parameters of the model.
# Use large coefficients to make the result clear.
# These coefficients should set the base density and the
# density of edges between nodes of the same type.
thetas <- c(-2.5, 2.5)
# Simulate one network
# I'm using edges instead of triangles because of the
# tendency towards degeneracy with triangles (my first attempt
# had a density of 1.0)
g_sim <- simulate(
g ~ edges + nodematch("type"),
nsim = 1,
coef = thetas
)
# Plot to be sure. There should be many more edges between
# nodes of the same color than nodes of different colors.
plot(g_sim, vertex.col = g %v% "type")
# Are the coefficients similar to what they should be?
m <- ergm(g_sim ~ edges + nodematch("type"))
summary(m)
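Because edges and nodematch("type") are both dyad-independent terms, this particular model reduces to a logistic regression on dyads: each possible tie forms independently with log-odds thetas[1] + thetas[2] when both endpoints share a type, and thetas[1] otherwise. That gives a quick sanity check on the simulated network before fitting. The helper names below are hypothetical, and the shortcut no longer applies once triangles (a dyad-dependent term) is in the model.

```r
# Hypothetical helpers: tie probabilities implied by edges + nodematch("type").
# Valid only because both terms are dyad-independent, so each undirected tie is
# an independent Bernoulli draw with log-odds given by the coefficients.
implied_tie_probs <- function(thetas) {
  thetas <- unname(thetas)
  c(
    same_type = plogis(thetas[1] + thetas[2]),  # edges + nodematch term
    diff_type = plogis(thetas[1])               # edges term only
  )
}

expected_edges <- function(type, thetas) {
  p <- implied_tie_probs(thetas)
  same_pairs <- sum(choose(table(type), 2))      # dyads within a type
  diff_pairs <- choose(length(type), 2) - same_pairs
  same_pairs * p[["same_type"]] + diff_pairs * p[["diff_type"]]
}

implied_tie_probs(c(-2.5, 2.5))
# same_type is exactly 0.5; diff_type is about 0.076
```

With the objects above, expected_edges(g %v% "type", thetas) can be compared with the observed counts from summary(g_sim ~ edges + nodematch("type")).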
I've been following documentation and lecture tutorials step by step, but for some reason my plot output looks wrong.
The output doesn't make any sense to me. There is clearly no structure or community in the current plot: as you can see, the bigger circles all overlap. Shouldn't it return only a single community in this case? Additionally, the modularity of my network is ~0.02, which again suggests there is no community structure. So why does it return 3 communities?
This is my code (exactly as in the documentation, but with a different dataset):
m <- data.matrix(df)
g <- graph_from_adjacency_matrix(m, mode = "undirected")
#el <- get.edgelist(g)
wc <- cluster_walktrap(g)
modularity(wc)
membership(wc)
plot(wc,g)
My dataset is a 500x500 adjacency matrix stored as a CSV, with rows and columns named 1-500, each corresponding to a person.
I tried to understand the communities object and to pass different kinds of variables to the plot, e.g. membership(wc)[2]. My guess is that the coloring is simply wrong, but nothing I've tried so far fixes the issue.
Communities can still be connected to each other. You're working with a graph of 500 nodes, each with many connections, so there will be a large number of edges between nodes of different communities; a random walk, however, is still more likely to traverse edges between nodes of the same community.
If you separate the communities in the plot (using G5W's code from "(igraph) Grouped layout based on attribute"), you can see the different groups.
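library(igraph)
set.seed(4321)
g <- sample_gnp(500, .25)
plot(g, vertex.label = '', vertex.size = 5)
wc <- cluster_walktrap(g)
V(g)$community <- membership(wc)
E(g)$weight = 1
# Copy of g with extra weight-2 edges between every pair of vertices in the
# same community; it is used only to compute a layout that pulls groups apart
g_grouped = g
for(i in unique(V(g)$community)){
  groupV = which(V(g)$community == i)
  if (length(groupV) < 2) next  # no pairs (and combn(k, 2) would expand k to 1:k)
  g_grouped = add_edges(g_grouped, combn(groupV, 2), attr=list(weight = 2))
}
l <- layout_nicely(g_grouped)
# draw the original graph and its communities using the grouped layout
plot(wc, g, layout = l, vertex.label = '', vertex.size = 5, edge.width = .1)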
Red edges are inter-community connections and black edges are intra-community connections.
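Modularity compares the fraction of edges inside communities with what a random graph with the same degrees would give, so a value of about 0.02 means the three walktrap communities are barely denser inside than chance. cluster_walktrap() reports the cut of its merge dendrogram with the highest modularity, and putting every node into a single community always scores exactly 0, so a split scoring 0.02 still wins. A small base-R sketch of Newman's formula makes the comparison concrete; modularity_adj() is a hypothetical name, and igraph's own modularity() is what you would use on real graphs.

```r
# Hypothetical helper: Newman modularity of a partition, computed from a
# symmetric 0/1 adjacency matrix `adj` and a membership vector `memb`:
#   Q = 1/(2m) * sum_ij [A_ij - k_i * k_j / (2m)] * [c_i == c_j]
modularity_adj <- function(adj, memb) {
  k <- rowSums(adj)                    # vertex degrees
  two_m <- sum(k)                      # 2m, twice the number of edges
  same <- outer(memb, memb, "==")      # TRUE where i and j share a community
  sum((adj - outer(k, k) / two_m)[same]) / two_m
}

# Two triangles joined by one edge (3-4)
e <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
adj <- matrix(0, 6, 6)
adj[e] <- 1
adj[e[, 2:1]] <- 1
modularity_adj(adj, c(1, 1, 1, 2, 2, 2))  # 5/14, about 0.357
modularity_adj(adj, rep(1, 6))            # one community: essentially 0
```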
Is there a way, in igraph for R, to draw the links or nodes of a network with widths or sizes scaled into a given minimum-maximum range?
Drawing with link and node attributes is very handy in igraph, but in some networks the difference between the minimum and maximum values leads to a very ugly drawing. For instance, see this code:
library(bipartite)  # Safariland data set
library(reshape2)   # melt(); assumed to be the melt() used here
library(igraph)
#Transforming a sample network (Safariland) from the package bipartite into an igraph object
data(Safariland)
mat = Safariland
mat2 = cbind.data.frame(reference=row.names(mat),mat)
# long format: one row per plant-animal pair
list = melt(mat2, na.rm = TRUE)
colnames(list) = c("plant","animal","weight")
list[,1] = as.character(list[,1])
list[,2] = as.character(list[,2])
# keep only observed interactions
list2 = subset(list, weight > 0)
g = graph.data.frame(list2)
g2 = as.undirected(g)
#Plotting the igraph object with edge widths proportional to link weights
plot(g2,
     edge.width = E(g2)$weight)
The result is an odd-looking network, because the differences between link weights are too large. How can I draw those edges within a min-max range so the network looks better?
Thank you very much.
You can apply any math or function to the values before passing them to the plot function.
What you want is, for example, a rescaling function that maps values to a different range, as in this Stack Overflow answer:
mapToRange<-function(x,from,to){
return( (x - min(x)) / max(x - min(x)) * (to - from) + from)
}
Make an example graph with random weights that are bad as line widths:
library(igraph)
g<-erdos.renyi.game(20,0.5)
E(g)$weight<-runif(length(E(g)))^3 *100
Bad plot:
plot(g, edge.width = E(g)$weight)
Better plot, rescaling the edge weights to values between 1 and 10 first with the function above:
weightsRescaled<-mapToRange(E(g)$weight,1,10)
plot(g, edge.width = weightsRescaled)
The same thing, more concisely:
plot(g, edge.width = mapToRange(E(g)$weight,1,10))
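The mapToRange() function above divides by zero (producing NaN widths) when all weights are equal. A guarded variant, rescale_to_range() (a hypothetical name, not an igraph function), maps that case to the midpoint of the target range instead:

```r
# Variant of mapToRange that avoids 0/0 when all values are equal:
# in that case every value maps to the midpoint of [from, to].
rescale_to_range <- function(x, from, to) {
  rng <- range(x)
  if (diff(rng) == 0) {
    return(rep((from + to) / 2, length(x)))
  }
  (x - rng[1]) / diff(rng) * (to - from) + from
}

rescale_to_range(c(0, 5, 10), 1, 10)
# [1]  1.0  5.5 10.0
rescale_to_range(c(3, 3, 3), 1, 10)
# [1] 5.5 5.5 5.5
```

For heavily skewed weights like runif(...)^3 * 100, a transform before rescaling, e.g. rescale_to_range(sqrt(E(g)$weight), 1, 10), may spread the widths more evenly; which transform to use is a judgment call.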
I am using the cluster_infomap function from igraph in R to detect communities in an undirected, unweighted network with ~19,000 edges, but I get a different number of communities each time I run the function. This is the code I am using:
clusters <- list()
clusters[["im"]] <- cluster_infomap(graph)
membership_local_method <- membership(clusters[["im"]])
length(unique(membership_local_method))
The result of the last line of code ranges from 805-837 in the tests I have performed. I tried using set.seed() in case it was an issue of random number generation, but this does not solve the problem.
My questions are (1) why do I get different communities each time, and (2) is there a way to make it stable?
Thanks!
cluster_infomap (see ?igraph::cluster_infomap for help) finds a community structure that minimizes the expected description length of a random walker trajectory.
Since that search involves random number generation, you get different results on each run. Most of the time you can make the results reproducible by setting a seed with set.seed (see ?Random for help) immediately before each call:
identical(cluster_infomap(g), cluster_infomap(g))
# [1] FALSE
identical({set.seed(1);cluster_infomap(g)},{set.seed(1);cluster_infomap(g)})
# [1] TRUE
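The seed has to be set right before each cluster_infomap() call, because every run consumes random numbers from R's generator; one set.seed() at the top of a script only makes the whole script's sequence of results repeat. A hypothetical helper, with_seed() (not part of igraph or base R), that fixes the seed for a single expression and then restores the previous generator state might look like this:

```r
# Hypothetical helper: evaluate `expr` with a fixed seed, then restore the
# caller's random number generator state so later code is unaffected.
with_seed <- function(seed, expr) {
  env <- globalenv()
  had_seed <- exists(".Random.seed", envir = env, inherits = FALSE)
  old_seed <- if (had_seed) get(".Random.seed", envir = env) else NULL
  on.exit({
    if (had_seed) {
      assign(".Random.seed", old_seed, envir = env)
    } else if (exists(".Random.seed", envir = env, inherits = FALSE)) {
      rm(".Random.seed", envir = env)
    }
  })
  set.seed(seed)
  expr  # lazily evaluated here, after the seed is set
}

identical(with_seed(1, runif(3)), with_seed(1, runif(3)))
# [1] TRUE
# with igraph loaded: with_seed(1, cluster_infomap(g))
```

Raising cluster_infomap()'s nb.trials argument may also make the number of communities vary less between runs, since more attempts are made, but only a fixed seed makes runs identical.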
Or graphically:
library(igraph)
set.seed(2)
g <- ba.game(150)
coords <- layout.auto(g)
par(mfrow=c(2,2))
# without seed: different results
for (x in 1:2) {
plot(
cluster_infomap(g),
as.undirected(g),
layout=coords,
vertex.label = NA,
vertex.size = 5
)
}
# with seed: equal results
for (x in 1:2) {
set.seed(1)
plot(
cluster_infomap(g),
as.undirected(g),
layout=coords,
vertex.label = NA,
vertex.size = 5
)
}
I'm trying to find communities in tweet data. The cosine similarity between different words forms the adjacency matrix, and I created a graph from that adjacency matrix. The task here is to visualize the graph:
library(tm)      # DocumentTermMatrix, removeSparseTerms
library(lsa)     # cosine
library(igraph)
# Document Term Matrix
dtm = DocumentTermMatrix(tweets)
### adjust threshold here
dtms = removeSparseTerms(dtm, 0.998)
dim(dtms)
# cosine similarity matrix
t = as.matrix(dtms)
# comparing two word feature vectors
#cosine(t[,"yesterday"], t[,"yet"])
numWords = dim(t)[2]
# cosine measure between all column vectors of a matrix.
adjMat = cosine(t)
r = 3
for(i in 1:numWords)
{
highElement = sort(adjMat[i,], partial=numWords-r)[numWords-r]
adjMat[i,][adjMat[i,] < highElement] = 0
}
# build graph from the adjacency matrix
g = graph.adjacency(adjMat, weighted=TRUE, mode="undirected", diag=FALSE)
V(g)$name
# remove loop and multiple edges
g = simplify(g)
wt = walktrap.community(g, steps=5) # default steps=2
table(membership(wt))
# set vertex color & size
nodecolor = rainbow(length(table(membership(wt))))[as.vector(membership(wt))]
nodesize = as.matrix(round((log2(10*membership(wt)))))
nodelayout = layout.fruchterman.reingold(g,niter=1000,area=vcount(g)^1.1,repulserad=vcount(g)^10.0, weights=NULL)
par(mai=c(0,0,1,0))
plot(g,
layout=nodelayout,
vertex.size = nodesize,
vertex.label=NA,
vertex.color = nodecolor,
edge.arrow.size=0.2,
edge.color="grey",
edge.width=1)
I just want more space between separate clusters/communities.
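The pruning loop in the question keeps, for each word, only its most similar words: sort(adjMat[i,], partial=numWords-r)[numWords-r] is the (r+1)-th largest value in the row, so with the self-similarity on the diagonal it effectively keeps about r other words, and the resulting matrix is generally not symmetric. A hypothetical helper, keep_top_r(), does the same pruning explicitly and then symmetrizes by keeping a tie if either word selected the other. That symmetrization rule is an assumption, one of several reasonable choices.

```r
# Hypothetical helper: keep only the r largest off-diagonal similarities in
# each row of a square similarity matrix, then symmetrize by keeping a tie if
# either endpoint selected it (elementwise maximum of the matrix and its transpose).
keep_top_r <- function(sim, r) {
  stopifnot(is.matrix(sim), nrow(sim) == ncol(sim), r >= 1)
  diag(sim) <- -Inf                                # never select self-similarity
  out <- matrix(0, nrow(sim), ncol(sim), dimnames = dimnames(sim))
  for (i in seq_len(nrow(sim))) {
    top <- order(sim[i, ], decreasing = TRUE)[seq_len(min(r, ncol(sim) - 1))]
    out[i, top] <- sim[i, top]
  }
  pmax(out, t(out))
}

sim <- matrix(c(1, .9, .1, .2,
                .9, 1, .3, .4,
                .1, .3, 1, .8,
                .2, .4, .8, 1), nrow = 4, byrow = TRUE)
keep_top_r(sim, r = 1)   # keeps only the 1-2 and 3-4 ties
```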
To the best of my knowledge, you can't lay out vertices of the same community close to each other using igraph alone. I have implemented this function in my package NetPathMiner, but installing the package just for the visualization function seems a bit much, so I will write a simple version of it here and explain what it does.
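layout.by.attr <- function(graph, wc, cluster.strength=1, layout=layout.auto) {
  # wc is a membership vector, e.g. membership(wt)
  n <- vcount(graph)
  # lightweight copy of graph w/o the attributes (isolated vertices are kept)
  g <- make_empty_graph(n, directed = is_directed(graph))
  g <- add_edges(g, t(as_edgelist(graph, names = FALSE)))
  E(g)$weight <- 1
  # one extra vertex per community, numbered n+1, n+2, ...
  comms <- unique(wc)
  g <- add_vertices(g, length(comms))
  g <- add_edges(g, rbind(1:n, n + match(wc, comms)), weight = cluster.strength)
  # lay out the augmented graph, keep the coordinates of the original vertices
  l <- layout(g, weights = E(g)$weight)[1:n, ]
  return(l)
}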
Basically, the function adds one extra vertex per community and connects it to all vertices belonging to that community. The layout is calculated on this new graph. Since the members of each community now share a common vertex, they tend to cluster together.
As Gabor said in the comment, increasing edge weights has a similar effect. The function leverages this: increasing cluster.strength gives the edges between the created vertices and their communities higher weights.
If this is still not enough, you can extend this principle (calculating the layout on a more connected graph) by adding edges between all vertices of the same community (forming a clique). In my experience, this is a bit of overkill.
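The augmentation step in layout.by.attr() is only edge-list bookkeeping, so it can be sketched in base R without igraph: one hub vertex per community, numbered after the original vertices, and one edge of weight cluster_strength from every vertex to its hub. augment_with_hubs() and its column names are hypothetical; the layout itself still needs igraph.

```r
# Hypothetical helper mirroring the augmentation step of layout.by.attr():
# append one hub vertex per community (IDs n+1, ..., n+k) and connect every
# original vertex to its community's hub with weight `cluster_strength`.
augment_with_hubs <- function(edges, memb, cluster_strength = 1) {
  n <- length(memb)
  hub <- n + match(memb, unique(memb))
  rbind(
    data.frame(from = edges[, 1], to = edges[, 2], weight = rep(1, nrow(edges))),
    data.frame(from = seq_len(n), to = hub, weight = cluster_strength)
  )
}

# Path 1-2-3-4 where vertices 1-2 and 3-4 form two communities
aug <- augment_with_hubs(rbind(c(1, 2), c(2, 3), c(3, 4)), memb = c(1, 1, 2, 2),
                         cluster_strength = 5)
aug
```

With igraph loaded, the result could be turned into a graph with make_graph(as.vector(t(as.matrix(aug[, c("from", "to")]))), n = max(aug$to), directed = FALSE), laid out with layout_with_fr(..., weights = aug$weight), and trimmed to the first rows, one per original vertex.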