Is it possible to create an admissible A* heuristic when paths can have zero cost? - path-finding

When we use A* with a non-admissible heuristic, we can sometimes get a non-optimal path as the result.
But when paths with zero cost are allowed, the only admissible heuristic that comes to my mind is h(x) = 0, which turns A* into a "simple" Dijkstra's algorithm.
Am I correct? Is this the only possible admissible heuristic? What is the real loss of not using an admissible heuristic? Is there another path-finding algorithm that works better with zero-cost paths?
An example:
Consider the following graph (the numbers above the edges show the costs):
   1      1      0      1      1
S --> V1 --> V2 --> V3 --> V4 --> G
S means start vertex
V means inner vertex
G means goal vertex
Looking at the graph, we see that C(S) = 4.
What heuristic function h(x) can I use? If I use Euclidean distance, I get:
f(S) = g(S) + h(S)
f(S) = 0 + 5 = 5
We can see that this heuristic overestimates the real distance, so for a more complex graph A* may not find the optimal solution.

Not true. The heuristic function h(x) has argument x consisting of the current search state. It returns an estimate of the distance from x to the goal. In a simple graph, x is a graph node.
Admissibility requires that h(x) never over-estimates: it must be less than or equal to the true distance from x to the goal. This condition applies to each particular x separately. (You seem to be inferring that a single bound must hold for all possible x at once, which is far too strong. A* would be useless if this were necessary.)
The correct statement regarding the case you propose is that h(x) = 0 is necessary only when x is a state with distance zero to the goal. Any other value would be an over-estimate. However, for any other x (in the same state space) whose cheapest path to the goal has total cost C > 0, any h with h(x) <= C is admissible.
Of course if x's distance to goal is zero, then x is the goal state and the search is complete. So your concern is vacuous - there are no cases where it's of interest.
Information to construct h(x) comes from your knowledge of the search space (e.g. characteristics of the graph). A bare, general graph alone doesn't provide anything useful. The best you can do is h(x) = cost of the minimum-weight outgoing edge of x for non-goal nodes and, as already discussed, h(x) = 0 for the goal. Again, note that this is a lower bound on the distance to the goal. It is so weak that it effectively gives you Dijkstra!
To do better you need to know something about the graph's structure.
In your example, you are providing detailed knowledge, so making a good h is simple. You can use
       / 4 if x == S
       | 3 if x == V1
h(x) = { 2 if x == V2 or V3
       | 1 if x == V4
       \ 0 if x == G
or you can use any other function h'(x) such that h'(x) <= h(x) for all x. For example, this would be admissible:
        / 3 if x == S
        | 2 if x == V1
h'(x) = { 2 if x == V2 or V3
        | 1 if x == V4
        \ 0 if x == G
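As a rough check of the two tables above, here is a minimal Python sketch that runs A* on the example graph and tests admissibility by comparing each heuristic value against the true remaining distance. The names `GRAPH`, `H`, `H_PRIME`, `a_star` and `is_admissible` are made up for this illustration, and the true distances are obtained simply by running the same search with a zero heuristic.

```python
import heapq

# Example graph from the question: node -> list of (successor, edge cost).
GRAPH = {
    "S": [("V1", 1)], "V1": [("V2", 1)], "V2": [("V3", 0)],
    "V3": [("V4", 1)], "V4": [("G", 1)], "G": [],
}
H = {"S": 4, "V1": 3, "V2": 2, "V3": 2, "V4": 1, "G": 0}        # exact distances
H_PRIME = {"S": 3, "V1": 2, "V2": 2, "V3": 2, "V4": 1, "G": 0}  # weaker bound


def a_star(graph, h, start, goal):
    """Return (cost, path) of the path found by A*, or None if unreachable."""
    frontier = [(h[start], 0, start, [start])]  # entries are (f, g, node, path)
    best_g = {start: 0}
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if g > best_g.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was found already
        for succ, cost in graph[node]:
            g2 = g + cost
            if g2 < best_g.get(succ, float("inf")):
                best_g[succ] = g2
                heapq.heappush(frontier, (g2 + h[succ], g2, succ, path + [succ]))
    return None


def is_admissible(graph, h, goal):
    """True if h never exceeds the true distance to goal (found with h = 0)."""
    zero = {node: 0 for node in graph}
    for node in graph:
        result = a_star(graph, zero, node, goal)
        if result is not None and h[node] > result[0]:
            return False
    return True


print(a_star(GRAPH, H, "S", "G"))
print(is_admissible(GRAPH, H, "G"), is_admissible(GRAPH, H_PRIME, "G"))
```

Changing `H["S"]` to 5, the Euclidean value from the question, makes `is_admissible` return False, because 5 exceeds the true distance of 4.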
The OP points out that for many problems, h(x) can be hard to choose! This is precisely correct. If you can't find a good admissible heuristic, then A* is the wrong algorithm! Nonetheless, A* is very effective for problems where heuristics can be found. Examples I've tried myself:
Graphs where Euclidean distance is a good lower bound on the possible distance between any two nodes. For example, each pair of cities A and B is separated by a distance D "as the crow flies," but the road distance from A to B is at least D and possibly much more, i.e. its cost C is greater than or equal to D. In this case, D makes a fine heuristic because it never over-estimates.
Puzzles where "distance" to the winning state involves moving game pieces. In this case, the number of pieces currently out of position with respect to the winning state is a fine heuristic. Examples are the 8-bishop's problem from 7th Guest (number of bishops not yet in their final positions) and the Magic Square Problem (total Manhattan distance from all pieces' current positions to their correct positions in the winning state).


IDA* and Admissibility of one Heuristic?

I am practicing with an old AI exam, came across a challenging question, and need some expert help.
A is the initial state and G is the goal state. Costs are shown on the edges, and the heuristic value "H" is shown in each circle. The IDA* limit is 7.
We want to search this graph with IDA*. In what order are the nodes visited? (Children are selected in alphabetical order, and in case of a tie the node that was generated first is selected first.)
Solution is A,B,D,C,D,G.
My question is how this is calculated, and how we can tell whether this heuristic is admissible and consistent.
Let's start with the definitions of admissible and consistent heuristics:
An admissible heuristic never overestimates the cost of reaching the goal, i.e. the estimated cost to reach the goal is not greater than the cost of the shortest path from that node to the goal node in the graph.
You can easily see that for all nodes n in the graph, the estimate h(n) is always less than or equal to the cost of the real shortest path. For example, h(B) = 0 <= 6 (B->F->G).
Let c(n, n') denote the cost of an optimal path in the graph from a node n
to another node n'. A heuristic estimate function h(n) is consistent when
h(n) <= c(n, n') + h(n') for all nodes n, n' in the graph. Another way of seeing the property of consistency is monotonicity. Consistent heuristic functions are also called monotone functions, because the estimated final cost f = g + h of a partial solution is monotonically non-decreasing along any path. Thus, we can see that your heuristic function is not consistent, since the condition fails for A and B:
h(A) <= c(A, B) + h(B) -> 6 <= 2 + 0, which is false.
Let me use an analogy to explain it in a less mathematical way.
You are going for a run with your friend. At certain points you ask your friend how long it will take to finish the run. He is a very optimistic guy, and he always gives you a time shorter than what you could actually achieve, even if you ran flat out the rest of the way.
However, he is not very consistent in his estimates. At point A he tells you there is at least an hour left to run, and after 30 minutes of running you ask him again. Now he tells you there are at least 5 minutes left from there. His estimate at A (60 minutes) exceeds the time actually run plus his new estimate (30 + 5 = 35 minutes), so the two estimates contradict each other, and therefore your heuristic friend is inconsistent.
Regarding the execution of IDA*, here is the pseudocode of the algorithm copied from Wikipedia (I haven't tested it):
    node              current node
    g                 the cost to reach current node
    f                 estimated cost of the cheapest path (root..node..goal)
    h(node)           estimated cost of the cheapest path (node..goal)
    cost(node, succ)  step cost function
    is_goal(node)     goal test
    successors(node)  node expanding function

    procedure ida_star(root)
      bound := h(root)
      loop
        t := search(root, 0, bound)
        if t = FOUND then return bound
        if t = ∞ then return NOT_FOUND
        bound := t
      end loop
    end procedure

    function search(node, g, bound)
      f := g + h(node)
      if f > bound then return f
      if is_goal(node) then return FOUND
      min := ∞
      for succ in successors(node) do
        t := search(succ, g + cost(node, succ), bound)
        if t = FOUND then return FOUND
        if t < min then min := t
      end for
      return min
    end function
Following the execution for your example is straightforward. First we set the bound (or threshold) to the value of the heuristic function for the start node. We explore the graph depth-first, pruning the branches whose f-value is greater than the bound. For example, f(F) = g(F) + h(F) = 4 + 4 > bound = 6.
The nodes are explored in the following order: A,B,D,C,D,G. In the first iteration of the algorithm, nodes A,B,D are explored and we run out of options with an f-value within the bound.
The bound is updated, and in the second iteration the nodes C,D and G are explored. Once we reach the solution node with an estimate (7) less than the bound (8), we have the optimal shortest path.
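To make the iteration pattern concrete, here is a small Python sketch of IDA* that also records the order in which nodes are expanded. The exam's graph is not reproduced in this post, so the graph `EDGES` and heuristic `H` below are hypothetical stand-ins, not the exam graph; the function and variable names are likewise only illustrative.

```python
import math

FOUND = object()  # sentinel meaning "goal reached"

# Hypothetical graph (NOT the exam graph): node -> [(successor, cost)],
# with successors listed in alphabetical order.
EDGES = {"A": [("B", 2), ("C", 3)], "B": [("G", 5)], "C": [("G", 2)], "G": []}
H = {"A": 4, "B": 3, "C": 2, "G": 0}  # admissible for this graph


def ida_star(graph, h, root, goal):
    """Return (bound, path, visit_order), or None if the goal is unreachable."""
    visited = []   # every node whose f-value passed the bound check, in order
    path = [root]  # current depth-first path

    def search(g, bound):
        node = path[-1]
        f = g + h[node]
        if f > bound:
            return f  # pruned: report the f-value that exceeded the bound
        visited.append(node)
        if node == goal:
            return FOUND
        minimum = math.inf
        for succ, cost in graph[node]:
            if succ in path:
                continue  # avoid cycles along the current path
            path.append(succ)
            t = search(g + cost, bound)
            if t is FOUND:
                return FOUND
            path.pop()
            minimum = min(minimum, t)
        return minimum

    bound = h[root]
    while True:
        t = search(0, bound)
        if t is FOUND:
            return bound, list(path), visited
        if t == math.inf:
            return None
        bound = t  # smallest f-value that exceeded the previous bound


print(ida_star(EDGES, H, "A", "G"))
```

On this stand-in graph the first iteration (bound 4) expands only A, the bound is raised to 5, and the second iteration expands A, B, C and G, which mirrors the two-iteration pattern described above.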

average value when comparing each element with each other element in a list

I have a number of strings (n strings) and I am computing the edit distance between them as follows: I take the first one and compare it to the (n-1) remaining strings, then the second one and compare it to the (n-2) remaining, and so on, until I run out of strings.
Why would the average edit distance be computed as the sum of all the edit distances between all the strings divided by the number of comparisons squared? This squaring is confusing me.
I assume you have seen an answer somewhere that comes with a squared factor, which I'll take as n^2, where n is the number of strings (not the number of distinct comparisons, which is n*(n-1)/2, as +flaschenpost points out). It would be easier to give you a more precise answer if you quoted exactly what that answer is.
From what I understand of your question, it isn't the average, or at least not the usual sample average. It is, however, a valid estimator of central tendency, with the caveat that it is a biased estimator.
Let's define the sample average, which I will denote as X', by
X' = (1/N) \sum_{i=1}^{m} X_i
If N = m, we get the standard average. In your case, m is the number of distinct pairs, m = n*(n-1)/2. Let's call this average Xo.
If instead N = n*n, we get
X' = (n-1)/(2*n) Xo
Xo is an unbiased estimator of the population mean \mu. Therefore, X' is biased by a factor f = (n-1)/(2*n). For very large n this factor tends to 1/2.
That said, it could be that the answer you saw has a sum that runs over more than just the distinct pairs. The normalization would then change, of course. For instance, we could extend the sum to all ordered pairs without changing the average value: the correct normalization would then be N = n*(n-1), and the value of the average would still be Xo, since the number of summands has doubled as well.
These things get easier to understand if you work through a small example by hand with pen and paper.
If you have the 7 strings named a,b,c,d,e,f,g, then the simplest version would be:
Compare a to b, a to c, ... , a to g (that is 6 comparisons)
Compare b to a, b to c, ... , b to g (that is 6 comparisons)
. . .
Compare g to a, g to b, ... , g to f (that is 6 comparisons)
So you have 7*6 or n*(n-1) values, so you divide by nearly 7^2. This is where the square comes from. Maybe you even compare a to a, which should give a distance of 0 and increase the number of values to 7*7 or n*n. But I would count that as a bit of cheating for the average distance.
You could double the speed of the algorithm by changing it just a little:
Compare a to b, a to c, ... , a to g (that is 6 comparisons)
Compare b to c, ... , b to g (that is 5 comparisons)
Compare c to d, ... , c to g (that is 4 comparisons)
. . .
Compare f to g (that is 1 comparison)
That follows good ol' Gauss: 7*6/2, or n*(n-1)/2.
So in essence: try doing a simple example on paper and then count your distance values.
The average is still, very simply, the same as ever:
sum(values) / count(values)
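The counting argument can be checked with a few lines of Python. This is only a sketch: `edit_distance` is a plain Levenshtein implementation written for the illustration, and `average_distances` is a made-up helper that returns both the usual average over the n*(n-1)/2 distinct pairs and the same sum divided by n^2.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def average_distances(strings):
    """Return (sum / number of distinct pairs, sum / n^2)."""
    n = len(strings)
    total = sum(edit_distance(a, b) for a, b in combinations(strings, 2))
    pairs = n * (n - 1) // 2
    return total / pairs, total / (n * n)


words = ["kitten", "sitting", "mitten"]
plain, squared = average_distances(words)
n = len(words)
print(plain, squared, (n - 1) / (2 * n) * plain)
```

The last two printed values agree, which is the factor (n-1)/(2*n) relating the n^2-normalized value to the ordinary average.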

Algorithm intersecting polytope and half line

I am looking for an algorithm in R to intersect a convex polytope with a line segment. I found several posts here on Stack Exchange for the planar case, but I am wondering whether such algorithms exist in higher dimensions. My Google searches didn't really produce many answers.
The line segment is defined by a point inside and a point outside the convex polytope. Are there algorithms available in R that can do this in dimension N <= 10? Or does anyone know a reference so I can implement the algorithm myself? Is there information on the complexity of finding the polytope and the intersection?
For problems in computational geometry, dimension d > 3 usually might as well be d arbitrary. If you have the polytope as a collection of intersected halfspaces, then likely the only sensible thing to do is to intersect the line segment with each of the separating hyperplanes (by solving a system of d linear equations) and take the intersection closest to the point inside.
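A minimal sketch of the halfspace approach, written in Python rather than R, and assuming the polytope is given as {x : A x <= b} with `A` a list of rows and `b` a list of offsets (this representation and the name `exit_point` are assumptions made for the illustration). With the segment parametrized as p_in + t*(p_out - p_in), each hyperplane yields a single scalar equation for t, and the smallest positive t among the facets being approached is where the segment leaves the polytope.

```python
def exit_point(A, b, p_in, p_out):
    """Point where the segment from p_in (inside) to p_out (outside)
    leaves the polytope {x : A x <= b}."""
    d = [o - i for i, o in zip(p_in, p_out)]  # direction of the segment
    t_exit = 1.0                              # t = 1 corresponds to p_out
    for row, b_i in zip(A, b):
        a_dot_d = sum(r * x for r, x in zip(row, d))
        if a_dot_d > 1e-12:  # the segment moves toward this hyperplane
            slack = b_i - sum(r * x for r, x in zip(row, p_in))
            t_exit = min(t_exit, slack / a_dot_d)
    return [i + t_exit * x for i, x in zip(p_in, d)]


# Unit cube in 3 dimensions: 0 <= x_j <= 1 for every coordinate j.
A = [[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0], [0, 0, 1], [0, 0, -1]]
b = [1, 0, 1, 0, 1, 0]
print(exit_point(A, b, [0.5, 0.5, 0.5], [2.5, 0.5, 0.5]))
```

The loop is linear in the number of halfspaces times the dimension, so nothing in it depends on the dimension being small.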
If you have only the vertices of the polytope, or even just a set of points whose convex hull is the polytope, then the easiest approach given R's libraries is probably linear programming. (Conceivably you could compute the facets using an algorithm for high-dimensional convex hulls, but there could be Theta(n^floor(d/2)) of them, where n is the number of vertices.) I'm not familiar with LP solvers in R, so I'll write down the program mathematically. It shouldn't be too hard to translate. Let p_0 be the point outside, p_1 be the point inside, and v_i be the ith point defining the polytope.
    maximize alpha_0
    subject to
      for 1 <= j <= d,
        p_0[j] alpha_0 + p_1[j] alpha_1 - sum_{1 <= i <= n} v_i[j] beta_i = 0
      alpha_0 + alpha_1 = 1
      sum_{1 <= i <= n} beta_i = 1
      alpha_0 >= 0
      alpha_1 >= 0
      for 1 <= i <= n,
        beta_i >= 0
The intersection is defined by the point p_0 alpha_0 + p_1 alpha_1 (unless the program is infeasible, in which case there is no intersection).

Calculate original set size after hash collisions have occurred

You have an empty ice cube tray which has n little ice cube buckets, forming a natural hash space that's easy to visualize.
Your friend has k pennies which he likes to put in ice cube trays. He uses a random number generator repeatedly to choose which bucket to put each penny. If the bucket determined by the random number is already occupied by a penny, he throws the penny away and it is never seen again.
Say your ice cube tray has 100 buckets (i.e, would make 100 ice cubes). If you notice that your tray has c=80 pennies, what is the most likely number of pennies (k) that your friend had to start out with?
If c is low, the odds of collisions are low enough that the most likely value of k is c. E.g. if c = 3, then it's most likely that k was 3. However, collisions become increasingly likely as k grows: after, say, k = 14 the odds are that there has been 1 collision, so maybe it's most likely that k = 15 if c = 14.
Of course if n == c then there would be no way of knowing, so let's set that aside and assume c < n.
What's the general formula for estimating k given n and c (given c < n)?
The problem as it stands is ill-posed.
Let n be the number of buckets in the tray.
Let X be the random variable for the number of pennies your friend started with.
Let Y be the random variable for the number of filled buckets.
What you are asking for is the mode of the distribution P(X|Y=c).
(Or maybe the expectation E[X|Y=c] depending on how you interpret your question.)
Let's take a really simple case: the distribution P(X|Y=1). All k pennies must have landed in the same bucket, so P(Y=1|X=k) = n * (1/n)^k = 1/n^(k-1). Then
P(X=k|Y=1) = (P(Y=1|X=k) * P(X=k)) / P(Y=1)
           = (1/n^(k-1) * P(X=k)) / P(Y=1)
Since P(Y=1) is a normalizing constant, we can say P(X=k|Y=1) is proportional to 1/n^(k-1) * P(X=k).
But P(X=k) is a prior probability distribution. You have to assume some probability distribution on the number of coins your friend has to start with.
For example, here are two priors I could choose:
My prior belief is that P(X=k) = 1/2^k for k > 0.
My prior belief is that P(X=k) = 1/2^(k-100) for k > 100.
Both would be valid priors; the second assumes that X > 100. The two would give wildly different estimates for X: prior 1 would estimate X to be around 1 or 2; prior 2 would estimate X to be around 101 or 102.
I would suggest if you continue to pursue this question you just go ahead and pick a prior. Something like this would work nicely: WolframAlpha. That's a geometric distribution with support k > 0 and mean 10^4.
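Once a prior is fixed, the posterior mode can be computed numerically. The sketch below is one possible way to do it and is not taken from the answer above: it obtains P(Y=c|X=k) by stepping a small Markov chain over "number of occupied buckets" one penny at a time, multiplies by the prior, and takes the argmax. The function names, the truncation at `k_max`, and the parametrization of the geometric prior as p = 1/mean are all assumptions made for the illustration.

```python
def occupancy_likelihoods(n, c, k_max):
    """Return a list L with L[k] = P(Y = c | X = k) for k = 0..k_max."""
    probs = [0.0] * (n + 1)  # probs[j] = P(j buckets occupied so far)
    probs[0] = 1.0
    out = [probs[c]]
    for _ in range(k_max):
        nxt = [0.0] * (n + 1)
        for j, p in enumerate(probs):
            if p == 0.0:
                continue
            nxt[j] += p * j / n                # penny hits an occupied bucket
            if j < n:
                nxt[j + 1] += p * (n - j) / n  # penny lands in an empty bucket
        probs = nxt
        out.append(probs[c])
    return out


def geometric_prior(mean):
    """P(X = k) = p * (1 - p)^(k - 1) for k >= 1, with p = 1 / mean."""
    p = 1.0 / mean
    return lambda k: p * (1.0 - p) ** (k - 1) if k >= 1 else 0.0


def posterior_mode(n, c, prior, k_max):
    """k in 0..k_max maximizing P(Y = c | X = k) * P(X = k)."""
    likelihood = occupancy_likelihoods(n, c, k_max)
    return max(range(k_max + 1), key=lambda k: likelihood[k] * prior(k))


print(posterior_mode(100, 80, geometric_prior(10**4), 400))
print(posterior_mode(100, 3, geometric_prior(10**4), 400))
```

With a prior this flat the result is driven almost entirely by the likelihood, so it lands close to the k that solves n * (1 - (1 - 1/n)^k) = c, which is about 160 for n = 100 and c = 80.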

How to calculate n log n = c

I have a homework problem for my algorithms class asking me to calculate the maximum size of a problem that can be solved in a given number of operations using an O(n log n) algorithm (i.e. n log n = c). I was able to get an answer by approximating, but is there a clean way to get an exact answer?
There is no closed-form formula in elementary functions for this equation. Taking log to be the natural logarithm, you can transform the equation:
n log n = c
log(n^n) = c
n^n = exp(c)
Then, this equation has a solution of the form:
n = exp(W(c))
where W is the Lambert W function (see especially "Example 2"). It has been proved that W cannot be expressed in terms of elementary functions.
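If you want to evaluate n = exp(W(c)) numerically without a special-function library, W can be approximated with Newton's method on w * e^w = c. The sketch below is only an illustration for c >= 0 and the natural logarithm; `lambert_w` and `solve_n_ln_n` are made-up names, and the starting point log(1 + c) is a choice made here because it lies at or above the root for c >= 0.

```python
import math


def lambert_w(c, tol=1e-12, max_iter=100):
    """Approximate the principal branch W(c) for c >= 0 by Newton's method."""
    if c < 0:
        raise ValueError("this sketch only handles c >= 0")
    w = math.log1p(c)  # starting guess, never below the true root
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - c) / (ew * (w + 1.0))  # f(w) / f'(w), f = w*e^w - c
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w


def solve_n_ln_n(c):
    """Return n with n * ln(n) = c, using n = exp(W(c))."""
    return math.exp(lambert_w(c))


n = solve_n_ln_n(1e9)
print(n, n * math.log(n))
```

For a base-2 logarithm, n log2(n) = c is the same equation as n ln(n) = c * ln(2), so `solve_n_ln_n(c * math.log(2))` would be the corresponding call.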
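However, f(n) = n*log(n) is monotonically increasing for n >= 1, so you can simply use bisection (here in Python, with a base-2 logarithm):
    import math

    def nlogn(c):
        """Return n such that n * log2(n) is approximately c (assumes c >= 0)."""
        lower = 0.0
        upper = 10e10  # search ceiling; raise it for larger c
        while True:
            middle = (lower + upper) / 2
            # Stop once the interval can no longer be split in floating point.
            if lower == middle or middle == upper:
                return middle
            if middle * math.log(middle, 2) > c:
                upper = middle
            else:
                lower = middle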
O notation only gives you the dominant term in the equation. That is, the performance of your O(n log n) algorithm could actually be better represented by c = (n log n) + n + 53.
This means that without knowing the exact cost function of your algorithm, you wouldn't be able to calculate the exact number of operations required to process a given amount of data.
But it is possible to calculate that the maximum number of operations required to process a data set of size n is more than a certain number, or conversely that the biggest problem that can be solved using that algorithm and that number of operations is smaller than a certain size.
O notation is useful for comparing two algorithms, e.g. an O(n^2) algorithm is asymptotically faster than an O(n^3) algorithm.
See Wikipedia for more info.
Some help with logs
