What does "not strongly connected graph" mean in centiserve centroid computation? - r

As in the question, I have four different networks, which I load from four different CSV files. Each one fails when I compute the centroid using the centiserve library. On the other hand, if I generate a random ER network, the centroid computation works.
I looked into the centroid function and found that it checks whether the network is connected using the igraph call is.connected(g, mode="strong").
According to Wikipedia, a graph is strongly connected if every node is reachable from every other node. To check this, I calculated the components of my network using igraph's decompose() function, and all the networks have a single connected component: length(decompose(net)) is always equal to 1. But centroid(net) always returns the error.
So the question is: what exactly is this function looking for when it verifies that the graph is suitable? Why does my network have a single connected component while igraph's is.connected() returns FALSE?
Some code:
library(igraph)
library(centiserve)
# load the edge list
finalNet <- read.csv("net.csv", sep = ",", header = TRUE)
# build the network from the first two columns
net <- graph_from_data_frame(finalNet[, c(1, 2)])
# decompose() says that there is a single connected component
length(decompose(net))
# while centroid() does not work!
centroid(net)
The network is available here.

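OK, I found the answer. The problem is that graph_from_data_frame creates a directed network unless specified otherwise. decompose() splits the graph into weakly connected components by default, so it ignores edge direction and reports a single component, whereas is.connected(g, mode="strong") requires a directed path between every pair of nodes.
Hence, the solution to make my example work is to load the network as undirected:
net <- graph_from_data_frame(finalNet[, c(1, 2)], directed = FALSE)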
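The difference between the two checks can be reproduced on a toy graph. This is only a minimal sketch: the three-node edge list is a made-up stand-in for net.csv, and it assumes a recent igraph where is_connected() is the current name of is.connected().

```r
library(igraph)

# Hypothetical edge list (not the data from the question): a chain a -> b -> c
edge_df <- data.frame(from = c("a", "b"), to = c("b", "c"))

g_dir <- graph_from_data_frame(edge_df)                    # directed by default
g_und <- graph_from_data_frame(edge_df, directed = FALSE)  # undirected

# decompose() uses weak components by default, so direction is ignored
n_weak_parts <- length(decompose(g_dir))

weak_ok   <- is_connected(g_dir, mode = "weak")    # ignores edge direction
strong_ok <- is_connected(g_dir, mode = "strong")  # needs a path in both directions
und_ok    <- is_connected(g_und)

# Number of strongly connected components in the directed chain
n_strong <- components(g_dir, mode = "strong")$no

print(c(weak = weak_ok, strong = strong_ok, undirected = und_ok))
print(c(weak_parts = n_weak_parts, strong_parts = n_strong))
```

In the directed chain, c cannot reach a, so the strong check fails even though the graph forms a single piece once direction is ignored.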

Related

OpenStreetMapX.jl routing using subnetworks

I am using the OpenStreetMapX.jl library to calculate routes between two points of my OSM network.
Is there any way to obtain such routes limiting the results to paths of a predefined sub network?
For instance, I would like to calculate the route between two points but only using secondary roadways.
Is it possible?
Most likely you would want to create a separate MapData object for processing such data. There is a nice tool, osmfilter, that lets you filter out specific route types before importing the data into Julia.
In this way you can have separate representations of your map data.
However, if you want to operate directly on the MapData object, you could set the distance weights of the unwanted edges to Inf, which would force the routing algorithm to avoid such edges or return an Inf route length.
Suppose you have the map from the tutorials:
using OpenStreetMapX, Graphs
map_file_path = joinpath(dirname(pathof(OpenStreetMapX)),"..","test/data/reno_east3.osm")
mx = get_map_data(map_file_path, use_cache=false);
Then, to restrict routing to secondary roads, you could set the weight of every edge whose class is not 4 to Inf:
m2 = deepcopy(mx)
edgs = collect(edges(mx.g))[findall(!=(4), mx.class)]
setindex!.(Ref(m2.w), Inf, src.(edgs), dst.(edgs))
Now the selected route will differ between mx and m2:
julia> shortest_route(mx, 3073938243, 3101115892)
([3073938243, 3073938154, 140340636, 140340638, 3101117891, 3101115892], 549.9261168024648, 32.32592992211371)
julia> shortest_route(m2, 3073938243, 3101115892)
([3073938243, 140533189, 140533195, 3101115892], 617.4395184717641, 54.94654519330738)

count number of disconnected sub-networks

In igraph is there a function that will return the number of sub-networks that are not connected to each other?
For example, it would return 3 for the network below.
Was pretty sure I had used a function like this in the past but can't find anything like it now. There are options for communities and for individual isolates but not for these disconnected sub-networks that I can find.
You are looking for components. For instance,
g <- sample_gnp(20, 1/20)
components(g)$no
# [1] 14
gives the number of them in g.
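Because sample_gnp() is random, the count changes from run to run. As a small deterministic sketch (the six-vertex graph below is invented for illustration), the same call also reports which component each vertex belongs to:

```r
library(igraph)

# Hypothetical graph with three disconnected pieces:
# a triangle 1-2-3, a single edge 4-5, and the isolated vertex 6
g <- make_graph(c(1, 2, 2, 3, 3, 1, 4, 5), n = 6, directed = FALSE)

comp <- components(g)
print(comp$no)          # number of disconnected sub-networks
print(comp$csize)       # size of each sub-network
print(comp$membership)  # component id of each vertex
```

Isolated vertices count as components of size one, which is why a sparse random graph can report a large number.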

get.inducedSubgraph and loop function

I have a list containing a network for each row (sna.list.1).
For each of the networks, I need to extract a subgraph where only women are included, in order to calculate the density of women-only networks.
I wrote a loop to set the vertex attribute:
female=vector()
for (i in 1:length(sna.list.1))
  set.vertex.attribute(sna.list.1[[i]], "Female", alter.list.1bis[[i]][, "NIDemo1_c4"])
but when I try to create the subgraphs with get.inducedSubgraph, I receive a warning message saying "Illegal vertex selection in get.inducedSubgraph". The same code works if I apply it to just one row/network.
subnetwork2 <- vector("list", length(sna.list.1))
for (i in 1:length(sna.list.1))
  subnetwork2[[i]] <- get.inducedSubgraph(sna.list.1[[i]], v = which(sna.list.1[[i]] %v% "Female" == "1"))
Does anyone have suggestions?
Assuming that get.inducedSubgraph isolated alters is a continuation of your issue, it sounds like you were trying to induce a subgraph of size zero, i.e. v=which(sna.list.1[[i]]%v%"Female"=="1") was returning integer(0) for some networks.
Ideally, since the network package supports networks of size zero (no vertices), get.inducedSubgraph() should return a network of size zero in this case, but it does not yet do that.
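One way to work around this is to check for an empty selection before calling get.inducedSubgraph(). The following is a minimal sketch only: net_a, net_b, and the helper women_density() are hypothetical stand-ins for the networks in sna.list.1, and returning NA for a network without women is an assumption about what is wanted.

```r
library(network)

# Hypothetical networks: net_a has two connected women (vertices 1 and 3),
# net_b has no women at all
adj <- matrix(c(0, 0, 1,
                0, 0, 0,
                1, 0, 0), nrow = 3, byrow = TRUE)
net_a <- network(adj, directed = FALSE)
net_a %v% "Female" <- c("1", "0", "1")

net_b <- network.initialize(2, directed = FALSE)
net_b %v% "Female" <- c("0", "0")

# Hypothetical helper: density of the women-only subgraph,
# or NA when the selection is empty
women_density <- function(net) {
  v <- which(net %v% "Female" == "1")
  if (length(v) == 0) return(NA_real_)  # skip the illegal empty selection
  network.density(get.inducedSubgraph(net, v = v))
}

densities <- sapply(list(net_a, net_b), women_density)
print(densities)
```

The guard keeps the loop running over every network and makes the networks without women easy to spot afterwards.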

Specifying cross-network (multiplex) terms in R's ERGM package

I have a research question which deals with a multiplex network, i.e. there are multiple edge-types which may (or may not) co-occur between two nodes. For example:
library(igraph)
relations <- data.frame(
from = c(1,1,2,3,4),
to = c(2,3,3,4,5),
link = c(1,1,1,0,0),
funding = c(0,1,0,0,1)
)
graph <- graph_from_data_frame(relations)
Note that two different types of relations exist, links and funding.
In my analysis, I am interested in triad structures, especially transitive triads, i.e. if A -link-> B -link-> C then A -fund-> C. I have seen multiplex networks in the literature (this great piece), which serve as an example and also indicate that R should be capable of doing this.
However, my problem is that I am unable to find any indication of the syntax for such triads across networks. I have seen that the PNet application can do this (TriangleABA in the manual), but I am unable to do it in R. I am looking for something like
fit <- ergm(fundnet ~ edges + odegree + ttriple(funding) + ttriple(link))
Instead of specifying types, I could also build the incidence matrices using tcrossprod(funding, link) for two-steps via funding and then links.
Although I am not dead set on R, my network may, depending on the operationalization, have around 100,000 nodes and 300,000 edges. In R, I am confident I can scale this on a cluster of computers, hence my preference.
I would strongly appreciate any remarks or (hopefully) solutions!

Clustering : two different values for the within sum of square

I'm doing a clustering analysis and I have two issues:
I found two different values for the within sum of squares with these two methods:
1/ First method, found here: http://www.statmethods.net/advstats/cluster.html
set.seed(180)
wss <- (nrow(mydata)-1)*sum(apply(mydata,2,var))
for (i in 1:8)
  wss[i] <- sum(kmeans(mydata, centers=i)$withinss)
wss
[1] 2244832.0 1707497.8 1514193.9 1131349.7 990028.8 698772.0 683106.4 522783.8
2/ Second method
set.seed(180)
fit <- kmeans(mydata, 5)
fit$tot.withinss
[1] 857443.8
As you can see, 990028.8 != 857443.8 even though I used set.seed.
Is there a mistake in the formula on the Statmethods website?
Lastly, sometimes the wss rises with the number of clusters. Is that OK, or should it be impossible?
You use set.seed, but you also draw a lot from the random number generator before getting to kmeans(mydata, 5) in your first example, so you are likely getting a different clustering solution. If you just look at sum(fit$withinss), it should match fit$tot.withinss for a given clustering solution. There is some randomness involved, though, so if you want the same result you need to make sure you set the seed right before the call you want to reproduce.
By setting the initial centers you effectively provide a manual seed.
k-means only uses the random seed during initial generation of the centers.
So in fact, you provide two different initializations, and as such you shouldn't be surprised to get different results. K-means only finds local optima.
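Both points can be checked with a short experiment. This is a sketch on synthetic data (the matrix below is a made-up stand-in for mydata): resetting the seed immediately before each call reproduces the fit, while running other kmeans() calls in between changes the random state that the k = 5 fit starts from.

```r
# Hypothetical data standing in for mydata
set.seed(42)
mydata <- matrix(rnorm(200), ncol = 2)

# Same seed immediately before each call: identical initial centers
set.seed(180)
fit_a <- kmeans(mydata, centers = 5)
set.seed(180)
fit_b <- kmeans(mydata, centers = 5)

same_fit   <- identical(fit_a$tot.withinss, fit_b$tot.withinss)
same_total <- isTRUE(all.equal(sum(fit_a$withinss), fit_a$tot.withinss))

# Seed set once, then four other fits consume random numbers first,
# as in the loop from the question; this fit may land in another local optimum
set.seed(180)
for (k in 1:4) kmeans(mydata, centers = k)
fit_c <- kmeans(mydata, centers = 5)

print(c(same_fit = same_fit, same_total = same_total))
print(c(direct = fit_a$tot.withinss, after_loop = fit_c$tot.withinss))
```

Passing a larger nstart to kmeans() tries several random initializations and keeps the best one, which usually makes the curve of wss against the number of clusters less erratic.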
