I am looking for a way to compute the expected number of occurrences of a given motif/subgraph in a random network. For example, I am given a network motif M, which has three vertices and three edges: (A->B, B->C, C->A). What is the expected number of occurrences of M in a random network with n vertices and k edges?
I know methods that generate a random graph and then enumerate the occurrences of M. Since that is computationally expensive, I wondered if there are more direct methods to compute the expected number of occurrences.
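For this particular motif, a closed form follows from linearity of expectation. The sketch below is one way to write it, and it rests on assumptions not stated in the question: the random network is drawn uniformly from all simple directed graphs with n vertices and exactly k edges, self-loops are excluded, and both A->B and B->A may be present. The function name is made up for illustration.

```python
from math import comb

def expected_directed_triangles(n: int, k: int) -> float:
    """Expected number of directed 3-cycles in a uniform random simple
    digraph with n vertices and exactly k edges (assumed model)."""
    if n < 3 or k < 3:
        return 0.0
    possible_edges = n * (n - 1)      # ordered pairs, no self-loops
    # Each vertex triple supports two cyclic orientations of the motif.
    placements = 2 * comb(n, 3)
    # Probability that three specific edges are all among the k chosen ones.
    prob = (k * (k - 1) * (k - 2)) / (
        possible_edges * (possible_edges - 1) * (possible_edges - 2)
    )
    return placements * prob
```

The same reasoning extends to other motifs: count the possible placements of the motif on n vertices and multiply by the probability that all of its edges are present. The placement count has to account for the motif's symmetries, which is where most of the care goes.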
Is there a graph algorithm for solving the following problem:
Given a weighted undirected graph G (all weights are positive), a start node N and a total weight W*. Generate a random cycle through the graph, starting and ending at node N, of which the total weight (the summed weight of all the edges) approximates the given weight W*.
One could see this as generating the cycle that best approximates W*, but generating a cycle that approximates W* within some margin of error is also fine.
If you want a simple cycle, you want an approximation algorithm for the travelling salesman problem. I believe there are known hardness results, indicating that this is NP-hard for general graphs, but there is a wide range of heuristics; you can check the literature.
Hi, so basically let's say I have a network (A) and I want to find its betweenness centrality.
I used: centr_betw(graph, directed = FALSE, normalized = TRUE)
This returned every node with the value:
[1] 1.827102e+04 3.554450e+04 5.000000e-01 9.524383e+04
[5] 0.000000e+00 0.000000e+00 1.078184e+05 4.768125e+04
I really want to know what these numbers mean.
It also shows the betweenness centralization of the whole network and a max value. Let's say the network (A) as a whole has a betweenness centralization of 0.04. What can you say about this network (A) when it is compared to a random network with a betweenness centralization of 0.001?
MUCH THANKS GUYS
Quite a bit of information can be found simply if you type ?centr_betw. In particular, centr_betw returns a list of three components: res, centralization, theoretical_max.
Each element of res is the betweenness centrality of the corresponding vertex i. Specifically, given a shortest path between some vertices j and k (neither equal to i), i is considered to be more central if this shortest path includes i. Going over all possible pairs of j and k, and counting the share of their shortest paths that pass through i, gives the betweenness centrality of i.
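To make the definition concrete, here is a minimal sketch of betweenness for an unweighted, undirected graph, using breadth-first search with shortest-path counting (the Brandes approach). It is not igraph's implementation, and the adjacency-dict input format is an assumption chosen for illustration.

```python
from collections import deque

def betweenness(adj):
    """adj: dict mapping each node to an iterable of its neighbours
    (undirected, unweighted). Returns unnormalized betweenness per node."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest vertices, accumulating path shares.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every unordered pair {j, k} was visited once from each end.
    return {v: b / 2 for v, b in score.items()}
```

On a 4-cycle a-b-c-d every vertex scores 0.5, because each opposite pair has two tied shortest paths and each intermediate vertex gets half the credit. That is how a fractional value such as 5.000000e-01 can appear in the output.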
Further, centralization and theoretical_max concern the Freeman centralization. centralization is C_x, which measures how central the network's most central vertex is in relation to how central all the other vertices are. theoretical_max is the denominator of C_x, providing the maximal possible value of the numerator across all networks with the same number of vertices.
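As a rough sketch of how C_x is put together from per-vertex scores: the numerator sums the gaps between the top score and every other score, and the denominator is the largest value that sum can take. The denominator used here, (n-1)^2 (n-2) / 2, is the value attained by an undirected star graph. Treating it as the figure centr_betw reports in theoretical_max is an assumption, and the function name is hypothetical.

```python
def freeman_betweenness_centralization(scores, normalized=True):
    """scores: unnormalized betweenness values, one per vertex, for an
    undirected graph with at least 3 vertices."""
    n = len(scores)
    top = max(scores)
    # Numerator: total gap between the most central vertex and the rest.
    numerator = sum(top - c for c in scores)
    if not normalized:
        return numerator
    # Assumed maximum of the numerator: a star on n vertices, whose hub
    # has betweenness (n-1)(n-2)/2 and whose leaves all have 0.
    theoretical_max = (n - 1) ** 2 * (n - 2) / 2
    return numerator / theoretical_max
```

A star graph therefore scores 1 and a ring, where all vertices are equally central, scores 0. Values such as 0.04 or 0.001 sit very close to the "everyone is equally central" end of that scale.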
So, if network A has Freeman centralization 0.04 and network B has 0.001, then we may say that the most central vertex of A stands out from the other vertices of A considerably more than the most central vertex of B stands out from the other vertices of B. If B is random (i.e., Erdos-Renyi), then that makes sense, because in a big enough random network all vertices should play a pretty similar role.
In dynamical networks, one may calculate the Hamming distance to compare the similarity between two graphs. Can anyone explain how?
Assuming the two graphs have equal edge density, what is the difference between their Hamming distance and the expected Hamming distance between two independent Erdos-Renyi random graphs? How does the latter arise?
The Hamming distance measures the minimum number of substitutions required to change (transform) one mathematical 'object' (e.g. a string or a binary vector) into another.
So in network theory it can be defined as the number of connections that differ between two networks (it can also be formulated for networks of unequal size and for weighted or directed graphs). In a simple case in which you have two Erdos-Renyi networks on the same N nodes, with adjacency matrices A and B (an entry is 1 if the node pair is connected and 0 if not), a common normalized form of the distance is:
H(A, B) = (2 / (N (N - 1))) * sum over i < j of |A_ij - B_ij|
The values that are subtracted are the entries of the two adjacency matrices. If you take two Erdos-Renyi networks with wiring probability 0.5 and compute the Hamming distance between them, you should get a value around 0.5. I generated several pairs of Erdos-Renyi graphs, and their Hamming distances followed a roughly Gaussian curve centered on 0.5, as we can expect.
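A small simulation along these lines might look as follows. This is a sketch, not the code referred to in this answer. It assumes undirected graphs on the same node set and normalizes by the number of node pairs, and all names are chosen for illustration. For two independent Erdos-Renyi graphs with wiring probability p, a given pair differs exactly when one graph has the edge and the other does not, so the expected normalized distance is 2p(1-p). That is where the value 0.5 for p = 0.5 comes from.

```python
import random
from itertools import combinations

def er_edges(n, p, rng):
    """Edge set of an Erdos-Renyi graph G(n, p) on nodes 0..n-1."""
    return {pair for pair in combinations(range(n), 2) if rng.random() < p}

def hamming(edges_a, edges_b, n):
    """Fraction of node pairs connected in exactly one of the two graphs."""
    n_pairs = n * (n - 1) // 2
    # The symmetric difference counts the entries where |A_ij - B_ij| = 1.
    return len(edges_a ^ edges_b) / n_pairs

rng = random.Random(0)          # fixed seed so the run is repeatable
n, p = 200, 0.5
samples = [hamming(er_edges(n, p, rng), er_edges(n, p, rng), n)
           for _ in range(20)]
mean_distance = sum(samples) / len(samples)
expected = 2 * p * (1 - p)      # expected distance for independent graphs
print(mean_distance, expected)
```

Comparing an observed Hamming distance between two real networks of equal density with this expected value indicates whether they are more alike than two unrelated random graphs would be.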
If it is needed I can give you the code I used.
I have a large, dense graph (~33,000 nodes, ~345 million edges, so the graph density is approximately 0.63). I'm interested in estimating the number of 3-edge paths in this graph. Is there an accurate estimate that uses only this information (i.e. no adjacency matrices)?
If a rough estimate is good enough (and the number k of edges in the paths is fixed and small): let d be the density. Then you have n starting nodes, about n * d possible second nodes for each of them, about n * d possible third nodes for each of those, and so on for k steps. If k is small, the walks that revisit a node are few compared to the simple paths. The number of all such walks would be about n * (n * d)^k = n^(k+1) * d^k, with each path counted once per direction, so this would be a (quite) rough estimate.
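One way to turn such a rough estimate into numbers is sketched below. It assumes the edges behave as if each node pair were connected independently with probability d, which a real graph need not satisfy. Under that assumption each of the k steps contributes one factor of d. The function names are made up for illustration.

```python
def estimate_walks(n, d, k):
    """Approximate number of k-edge walks (node sequences, repeats allowed,
    each direction counted separately) at density d."""
    return n * (n * d) ** k

def estimate_simple_paths(n, d, k):
    """Approximate number of undirected simple paths with k edges:
    ordered choices of k+1 distinct nodes, times d per edge, halved
    because each path can be read from either end."""
    ordered = 1.0
    for i in range(k + 1):
        ordered *= n - i
    return ordered * d ** k / 2

n = 33_000
m = 345_000_000
d = m / (n * (n - 1) / 2)       # density from node and edge counts
print(d, estimate_simple_paths(n, d, 3))
```

For n this large and k = 3, the correction for repeated nodes is tiny, so half the walk estimate and the simple-path estimate nearly coincide. How far the true count deviates depends on how unevenly the degrees are distributed.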
Suppose there are three sequences to be compared: a, b, and c. Traditionally, the resulting 3-by-3 pairwise distance matrix is symmetric, indicating that the distance from a to b is equal to the distance from b to a.
I am wondering if TraMineR provides some way to produce an asymmetric pairwise distance matrix.
No, TraMineR does not produce 'asymmetric' dissimilarities, precisely for the reasons stressed in Pat's comment.
The main interest of computing pairwise dissimilarities between sequences is that once we have such dissimilarities we can, for instance:
measure the discrepancy among sequences, determine neighborhoods, find medoids, ...
run cluster algorithms, self-organizing maps, MDS, ...
make ANOVA-like analysis of the sequences
grow regression trees for the sequences
Feeding a non-symmetric dissimilarity matrix into those procedures would most probably produce meaningless results.
It is because of this symmetry requirement that the substitution costs used for computing Optimal Matching distances MUST be symmetrical. It is important not to interpret substitution costs as the cost of switching from one state to the other, but to understand them for what they are, i.e., edit costs. When comparing two sequences, for example aabcc and aadcc, we can make them equal either by replacing b with d in the first one or by replacing d with b in the second one. It would therefore make no sense to assign different costs to the two substitutions.
Hope this helps.