Is there a graph algorithm for solving the following problem:
Given a weighted undirected graph G (all weights positive), a start node N, and a target total weight W*, generate a random cycle through the graph, starting and ending at node N, whose total weight (the sum of the weights of all its edges) approximates the given weight W*.
One could see this as generating the cycle that best approximates W*, but generating a cycle that approximates W* within some margin of error is also fine.
If you want a simple cycle, you want an approximation algorithm for the travelling salesman problem. I believe there are known hardness results indicating that this is NP-hard for general graphs, but there is a wide range of heuristics; you can check the literature.
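For illustration, here is one naive heuristic in that spirit (a sketch, not from any specific reference): a bounded random depth-first walk that, whenever an edge back to N exists, checks how close the closed cycle's weight would be to W* and keeps the best candidate found. The dict-of-dicts graph format and all parameters are assumptions made for the sketch.

```python
import random

def random_cycle(graph, start, target_w, tries=1000, max_len=12, seed=0):
    """Randomized search for a simple cycle through `start` whose
    total weight is close to `target_w`. `graph` maps each node to a
    dict of {neighbour: edge_weight}."""
    rng = random.Random(seed)
    best, best_err = None, float("inf")
    for _ in range(tries):
        path, weight, visited = [start], 0.0, {start}
        for _ in range(max_len):
            node = path[-1]
            # Whenever an edge back to the start exists, consider
            # closing the cycle here and record the best candidate.
            if len(path) > 2 and start in graph[node]:
                err = abs(weight + graph[node][start] - target_w)
                if err < best_err:
                    best, best_err = path + [start], err
            unvisited = [v for v in graph[node] if v not in visited]
            if not unvisited:
                break
            nxt = rng.choice(unvisited)
            visited.add(nxt)
            weight += graph[node][nxt]
            path.append(nxt)
    return best, best_err

g = {0: {1: 2.0, 2: 5.0}, 1: {0: 2.0, 2: 1.0, 3: 4.0},
     2: {0: 5.0, 1: 1.0, 3: 2.0}, 3: {1: 4.0, 2: 2.0}}
print(random_cycle(g, start=0, target_w=10.0))
```

This gives no guarantees; it is only meant to make the "try many cycles, keep the closest" idea concrete.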
I am trying to learn about facial landmark detection models, and I notice that many of them use NME (Normalized Mean Error) as a performance metric:
The formula is straightforward: it calculates the l2 distance between the ground-truth points and the model's predictions, then divides it by a normalization factor, which varies from dataset to dataset.
However, when applying this formula to a landmark detector that someone else developed, I have to deal with a non-trivial situation: some detectors may not be able to generate the full number of landmarks for some input images (perhaps because of NMS, problems inherent to the model, image quality, etc.). Thus some of the ground-truth points may have no corresponding point in the prediction result.
So how should I handle this? Should I just add such missing-point results to a "failure set" and use FR (failure rate) to measure the model, ignoring them when doing the NME calculation?
If the output of your neural network is, for example, a 10x1 vector holding your points as [x1, y1, x2, y2, ..., x5, y5], then this vector has a fixed length, because the number of output neurons in your model is fixed.
If you have missing points (say, 4 out of 5), it is usually because some predicted points fall beyond the image width and height, or have negative coordinates like [-0.1, -0.2, 0.5, 0.7, ...]: here the first point is not visible in the image, so it looks missing, but it is still present in the vector, and you can still calculate NME.
In some custom neural nets this is possible because the missing values are simply treated as the points with the largest error.
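To make the question's proposal concrete, here is a minimal sketch (not tied to any particular benchmark or detector) that computes NME only over images where all landmarks were produced and counts the rest toward a failure rate. The NaN convention for missing points and the inter-ocular normalization using eye indices 36/45 are assumptions.

```python
import numpy as np

def nme(pred, gt, norm):
    """Mean L2 error between predicted and ground-truth points,
    divided by a dataset-specific normalization factor."""
    return np.linalg.norm(pred - gt, axis=1).mean() / norm

def evaluate(samples, eye_idx=(36, 45)):
    """`samples` is a list of (pred, gt) landmark arrays of shape (L, 2).
    Missing landmarks in `pred` are marked as NaN rows (an assumed
    convention). Eye indices follow the 68-point layout (an assumption)."""
    errors, failures = [], 0
    for pred, gt in samples:
        present = ~np.isnan(pred).any(axis=1)
        if not present.all():
            failures += 1        # count toward the failure rate (FR)...
            continue             # ...and exclude from the NME average
        norm = np.linalg.norm(gt[eye_idx[0]] - gt[eye_idx[1]])
        errors.append(nme(pred, gt, norm))
    return float(np.mean(errors)), failures / len(samples)
```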
In dynamical networks, one may calculate the Hamming distance to compare the similarity between two graphs. Can anyone explain how?
Assuming that the two graphs have equal edge density, what is the difference between the Hamming distance and the expected Hamming distance between two independent Erdos-Renyi random graphs? How does the latter arise?
The Hamming distance measures the minimum number of substitutions required to change (transform) one mathematical 'object' (e.g. a string or binary vector) into another.
So in network theory it can be defined as the number of differing connections between two networks (it can also be formulated for networks of unequal size and for weighted or directed graphs). In the simple case of two Erdos-Renyi networks (where the adjacency matrix has a 1 if a node pair is connected and 0 if not), the distance between graphs G and G' with adjacency matrices A and A' over N nodes is mathematically defined as follows:
$$d_H(G, G') = \frac{1}{N(N-1)} \sum_{i \neq j} \left| A_{ij} - A'_{ij} \right|$$
The values that are subtracted are the entries of the two adjacency matrices. If you take two Erdos-Renyi networks with wiring probability 0.5 and compute the Hamming distance between them, you should get a value around 0.5. I generated several pairs of Erdos-Renyi graphs, and their Hamming distances produced a Gaussian curve around 0.5, as we would expect.
If needed, I can give you the code I used.
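Not the answerer's original code, but a minimal sketch of the experiment described above, assuming plain 0/1 adjacency matrices:

```python
import numpy as np

def erdos_renyi_adjacency(n, p, rng):
    """Symmetric 0/1 adjacency matrix of G(n, p) with an empty diagonal."""
    coin = rng.random((n, n)) < p        # sample every ordered pair
    upper = np.triu(coin, k=1)           # keep only the upper triangle
    return (upper | upper.T).astype(int) # symmetrize; diagonal stays 0

def hamming_distance(a, b):
    """Fraction of differing off-diagonal entries between two graphs."""
    n = a.shape[0]
    return np.abs(a - b).sum() / (n * (n - 1))

rng = np.random.default_rng(0)
n, p = 200, 0.5
dists = [hamming_distance(erdos_renyi_adjacency(n, p, rng),
                          erdos_renyi_adjacency(n, p, rng))
         for _ in range(100)]
print(np.mean(dists))   # close to 0.5 for wiring probability 0.5
```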
I have a large, dense graph (~33,000 nodes, ~345 million edges, so the graph density is approximately 0.63). I'm interested in estimating the number of 3-edge paths in this graph. Is there an accurate estimate using only this information (i.e. without the adjacency matrix)?
If a rough estimate is good enough (and the number k of edges in the paths is fixed and small): let d be the density and n the number of nodes. You have n choices for the starting node, and each subsequent node is a neighbour of the previous one, so there are about n * d choices at each of the k steps. If k is small, the number of walks that revisit a node is small compared to the number of simple paths. The number of all k-edge paths is therefore about n * (n * d)^k = n^(k+1) * d^k - so this is a (quite) rough estimate.
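As a quick sanity check, plugging the question's numbers into this estimate (assuming "3-edge paths" means paths with k = 3 edges):

```python
# n nodes, density d, path length k (edges), as given in the question.
n, d, k = 33_000, 0.63, 3
estimate = n * (n * d) ** k      # n^(k+1) * d^k
print(f"{estimate:.3e}")         # roughly 3e17 three-edge walks
```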
Given some x data points in an N-dimensional space, I am trying to find a fixed-length representation that could describe any subset s of those x points. For example, the mean of the subset s could describe that subset, but it is not unique to that subset: other points in the space could yield the same mean, so the mean is not a unique identifier. Could anyone tell me of a unique measure that could describe the points without depending on the number of points?
In short - it is impossible (as you would achieve infinite noiseless compression). You have to either use a variable-length representation (or a fixed length proportional to the maximum number of points) or deal with "collisions" (as your mapping will not be injective). In the first scenario you can simply store the coordinates of each point. In the second, you approximate your point cloud with more and more complex descriptors to balance collisions against memory usage; some possibilities (with a sketch after this list) are:
storing the mean and covariance (basically performing maximum-likelihood estimation over the family of Gaussians)
performing some fixed-complexity density estimation, like fitting a Gaussian Mixture Model or training a generative neural network
using a set of simple geometrical/algebraic properties, such as:
number of points
the mean, max, min, and median of the pairwise distances between points
etc.
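A minimal sketch of that last idea, with an arbitrary (assumed) choice of statistics; note that the descriptor length depends on N but not on the number of points, and that collisions remain possible:

```python
import numpy as np
from scipy.spatial.distance import pdist

def describe(points):
    """Map an (m, N) array of points to a fixed-length feature vector."""
    points = np.asarray(points, dtype=float)
    d = pdist(points)                          # all pairwise distances
    return np.concatenate([
        [len(points)],                         # number of points
        points.mean(axis=0),                   # mean (N values)
        np.cov(points, rowvar=False).ravel(),  # covariance (N*N values)
        [d.mean(), d.max(), d.min(), np.median(d)],
    ])

subset = np.random.default_rng(1).normal(size=(20, 3))
print(describe(subset).shape)   # fixed length for fixed N: 1 + 3 + 9 + 4
```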
Any subset can be identified by a bit mask of length x, where bit i is 1 if the corresponding element belongs to the subset. There is no fixed-length representation that is not a function of x.
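A toy illustration of the bit-mask idea (names made up for the example):

```python
points = ["p0", "p1", "p2", "p3"]
subset = {"p1", "p3"}
# One bit per element of the ground set: bit i is set iff points[i] is in
# the subset, so x points require exactly x bits.
mask = sum(1 << i for i, p in enumerate(points) if p in subset)
print(f"{mask:0{len(points)}b}")   # 1010: bits 1 and 3 are set
```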
EDIT
I was wrong. PCA is a good way to perform dimensionality reduction for this problem, but it won't work for some sets.
However, you can almost do it, where "almost" is formally defined by the Johnson-Lindenstrauss lemma. It states that for a given large dimension N, there exist a much lower dimension n and a linear transformation that maps each point from N dimensions to n while keeping the Euclidean distance between every pair of points of the set within some error ε of the original. Such a linear transformation is called the JL transform.
In other words, your problem is only solvable for sets of points in which each pair of points is separated by at least ε. For this case, the JL transform gives you one possible solution. Moreover, there exists a relationship between N, n and ε (see the lemma) such that, for example, if N=100, the JL transform can map each point to a point in 5D (n=5) and uniquely identify each subset if and only if the minimum distance between any pair of points in the original set is at least ~2.8 (i.e. the points are sufficiently different).
Note that n depends only on N and the minimum distance between any pair of points in the original set. It does not depend on the number of points x, so it is a solution to your problem, albeit with some constraints.
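A minimal sketch of a JL-style random projection using scikit-learn; the dimensions below are illustrative assumptions, not values derived from the lemma's bound:

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.random_projection import GaussianRandomProjection

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 100))      # 50 points in N = 100 dimensions

proj = GaussianRandomProjection(n_components=20, random_state=0)
Y = proj.fit_transform(X)           # 50 points in n = 20 dimensions

# Check how well pairwise Euclidean distances are preserved.
ratios = pdist(Y) / pdist(X)
print(ratios.min(), ratios.max())   # close to 1 for a suitable n
```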
I have a finite undirected graph in which a node is marked as "start" and another is marked as "goal".
An agent is initially placed at the start node and it navigates through the graph randomly, i.e. at each step it chooses uniformly at random a neighbor node and moves to it.
When it reaches the goal node it stops.
I am looking for an algorithm that, for each node, gives an indication of the probability that the agent visits it while travelling from start to goal.
Thank you.
As is often the case with graphs, it's simply a matter of knowing an appropriate way to describe the problem.
One way of writing a graph is as an adjacency matrix. If your graph G = (V, E) has |V| nodes (where |V| is the number of vertices), then this matrix will be |V| x |V|. The entry for a pair of vertices is 1 if an edge exists between them and 0 if it doesn't.
A natural extension of this is to weighted graphs, where, rather than 0 or 1, each entry of the adjacency matrix carries some notion of weight.
In the case you're describing, you have a weighted graph in which the weights are the probabilities of transitioning from one node to another. This type of matrix has a special name: it is a stochastic matrix. Depending on how you've arranged your matrix, either its rows or its columns will sum to 1 (right and left stochastic matrices, respectively).
One link between stochastic matrices and graphs is Markov chains. In the Markov-chain literature, the critical thing you need is a transition matrix (the adjacency matrix with weights equal to the probability of transition after one time step). Let's call the transition matrix P.
The probability of transitioning from one state to another after k time steps is given by P^k. If you have a known source state i, then the i-th row of P^k gives you the probability of reaching any other state. This gives you an estimate of the probability of being in a given state in the short term.
Depending on your source graph, P^k may reach a steady-state distribution - that is, P^k = P^(k+1) for some value of k. This gives you an estimate of the probability of being in a given state in the long term.
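A minimal sketch of these pieces on a made-up example graph: build the row-stochastic transition matrix of a uniform random walk from the adjacency matrix, make the goal state absorbing (the agent stops there), and propagate the start distribution for k steps. Note this gives the probability of being at each node after k steps, which is related to, but not the same as, the probability of ever visiting a node:

```python
import numpy as np

A = np.array([[0, 1, 1, 0],      # adjacency matrix of a small example graph
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
start, goal = 0, 3               # assumed node labels

P = A / A.sum(axis=1, keepdims=True)   # uniform choice among neighbours
P[goal] = 0.0
P[goal, goal] = 1.0                    # the agent stops at the goal

dist = np.zeros(len(A))
dist[start] = 1.0                      # the walk begins at `start`
k = 10
for _ in range(k):
    dist = dist @ P                    # distribution after one more step
print(dist)   # probability of being at each node after k steps
```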
As an aside, before you do any of this, you should be able to look at your graph and say some things about what the probability of being in a given state is at some time.
If your graph has disjoint components, the probability of being in a component that you didn't start in is zero.
If your graph has some states that are absorbing - that is, some states (or groups of states) that are inescapable once you've entered them - then you'll need to account for that. This may happen if your graph is tree-like.