A* algorithm when the heuristic cannot be calculated for some nodes

I am working on a dataset of cities and towns spread across North America, with the objective of finding the shortest path between a starting point and an ending point. I decided to use the Haversine distance as my heuristic function. However, my dataset doesn't have latitude and longitude coordinates for some of the towns that could lie on the shortest path. How am I supposed to calculate the heuristic in this case? Would taking the average of the heuristics of the neighboring towns make sense?
It is given that a town/city without its corresponding coordinates can't be the starting point or the ending point.
Is there a different heuristic I should be considering instead of the Haversine distance?

If I remember correctly (don't trust me on this!), a heuristic that returns zero for some nodes is still "legal": zero never overestimates the remaining distance, so the heuristic stays admissible and the path you have when you reach the end node is still optimal. Returning zero for the towns without coordinates would therefore be a crude but valid solution. Obviously, doing this for too many nodes would wreck your search performance, since the search degenerates towards Dijkstra-like behaviour around them!
I think that interpolating between neighbour locations risks creating an inadmissible heuristic.
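To make the zero-fallback idea concrete, here is a minimal Python sketch (the names coords, goal, and haversine_km are my own placeholders, not from the question): the heuristic returns the Haversine distance when both towns have coordinates, and 0 otherwise, so it never overestimates.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def make_heuristic(coords, goal):
    """coords: dict town -> (lat, lon); towns with unknown position are absent."""
    def h(town):
        if town in coords and goal in coords:
            return haversine_km(*coords[town], *coords[goal])
        return 0.0  # unknown coordinates: never overestimates, so still admissible
    return h
```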

Related

Minimum length of lines needed

Suppose we have a set of N points in the Cartesian plane, (x_i, y_i). Suppose we connect those points with lines.
Is there any way, using a graph and something like a shortest-path algorithm or a minimum spanning tree, to make every point reachable from every other point while minimizing the total length of the lines?
I thought that maybe I could set the cost of each edge to the distance between its endpoints and use a shortest-path algorithm, but I'm not sure if this is possible.
Any ideas?
I'm not 100% sure what you want, so I'll cover two algorithms.
First: if you just want a robust algorithm, use Dijkstra's algorithm. The only challenge left is to define the edge cost, which I assume would be 1 for neighboring nodes.
Second: if you want to use a heuristic to estimate the next best node and cut down the running time, use A*, but you need to write a heuristic that underestimates the distance. You could use the Euclidean distance for that. The edge-cost question stays the same.
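To illustrate the A* option, here is a small Python sketch (the graph representation and all names are my own assumptions, not from the question): edge costs are Euclidean lengths, and the heuristic is the straight-line distance to the goal, which never overestimates.

```python
import heapq
import itertools
import math

def euclid(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def astar(points, edges, start, goal):
    """points: dict node -> (x, y); edges: dict node -> iterable of neighbor nodes."""
    counter = itertools.count()  # tie-breaker so the heap never compares nodes
    open_heap = [(euclid(points[start], points[goal]), 0.0, next(counter), start)]
    best = {start: 0.0}
    while open_heap:
        f, g, _, node = heapq.heappop(open_heap)
        if node == goal:
            return g  # length of the shortest path
        for nb in edges[node]:
            ng = g + euclid(points[node], points[nb])
            if ng < best.get(nb, float("inf")):
                best[nb] = ng
                h = euclid(points[nb], points[goal])
                heapq.heappush(open_heap, (ng + h, ng, next(counter), nb))
    return None  # goal not reachable
```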

Finding the nearest zipcode from a list of zipcodes

I have a list of locations with zipcodes. I have another list of distribution centers (DCs) that serve these locations. Is there any way to map the nearest DC to each of these locations? I am an extremely green coder, but I have some experience with R.
I'd need more information to give you concrete code for your problem; however, here is one approach:
Convert your zipcodes to longitudes and latitudes.
I'm not sure what location data you have on your distribution centers, but you should be able to find a way to retrieve the longitude/latitude of each of these.
For each zipcode, compute the distance to each DC (using their respective longitudes/latitudes). To compute the distance, use the Haversine formula. Find the minimum of these distances; that DC is your answer.
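Here is a rough sketch of those steps in Python (doing the equivalent in R is straightforward); the data structures and names are my own assumptions:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) pairs, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def nearest_dc(locations, dcs):
    """locations and dcs map a name to a (lat, lon) pair, e.g. from a zipcode lookup.

    Returns a dict: location name -> name of the closest distribution center."""
    return {
        loc: min(dcs, key=lambda dc: haversine_km(*coord, *dcs[dc]))
        for loc, coord in locations.items()
    }
```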

A* with Manhattan distance or Euclidean distance for maze solving?

I have obtained all the possible paths of a maze through image processing. Now I want to use the A* algorithm to find the shortest path through the maze. However, I am confused as to whether Euclidean distance or Manhattan distance will be the better heuristic. Does it depend on the maze type, or is the choice of heuristic independent of the maze type? Which distance (Manhattan or Euclidean) will be a good choice for the following possible paths, and why? Please suggest.
P.S. Please add your references too, if you have any. It will be helpful.
The objective of a heuristic is to provide contextual information to the pathfinder. The more accurate this information is, the more efficient the pathfinder can be.
You have two contradicting requirements for a good heuristic, which is good because it means there is a sweet spot. Here they are:
A heuristic must be admissible, which means it must never overestimate the distance. Otherwise the algorithm is broken and may return paths that are not even optimal.
A heuristic should return the largest distance it can without overestimating. A heuristic that underestimates the remaining path from a cell will favour that cell when another might have been better.
Of course the optimal heuristic would return the exact, correct length (which generally is not achievable, or defeats the purpose), because it cannot return a longer path without ceasing to be admissible.
In your case, it looks like you're dealing with 4-connected grids. In that case, Manhattan distance is a better metric than Euclidean distance, because Euclidean distance underestimates the cost of every displacement compared to Manhattan (due to the Pythagorean theorem).
Without any further knowledge than "the graph is a 4-connected grid", there is no better metric than Manhattan. If, however, you manage to obtain more data (obstacle density, "highways", etc.), then you might be able to devise a better heuristic, though keeping it admissible would be a very hard problem in itself.
EDIT: Having a closer look, it looks like you have angled vertices in the bottom left. If that is so, you're not in a 4-connected graph, and then you MUST use Euclidean distance, because Manhattan would not be admissible.
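For reference, a minimal Python sketch of the two heuristics being compared (grid cells are (x, y) tuples; the function names are mine):

```python
import math

def manhattan(a, b):
    """Admissible on 4-connected grids with unit step cost."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def euclidean(a, b):
    """Admissible whenever every move costs at least its straight-line length."""
    return math.hypot(a[0] - b[0], a[1] - b[1])
```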
It's not clear what moves are available in your maze. Does your graph form a rectangular grid, like a chess board, and can you move diagonally in one step, like a king in chess? If yes, then the Chebyshev distance is the best choice: https://en.wikipedia.org/wiki/Chebyshev_distance.
Otherwise, use Euclidean distance.
You can't use Manhattan here if you want an optimal path, because the Manhattan heuristic is not admissible on diagonal routes (it overestimates them), so it can lead to suboptimal paths.
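A sketch of the Chebyshev heuristic, under the assumption that straight and diagonal moves both cost 1:

```python
def chebyshev(a, b):
    """Admissible on 8-connected grids where diagonal moves also cost 1."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))
```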

How to compute a Voronoi tessellation based on Manhattan distance in R

I am trying to compute a Voronoi tessellation in 2D with the Manhattan distance in R.
Ideally this would be a function that takes a set of two-dimensional points and outputs a list of polygons that partition the space. I am not certain what representations of Voronoi tessellations are standard.
There are of course many ways to do this with the Euclidean metric (packages like deldir and qhull make this very easy), but I haven't found a way to do this for the Manhattan distance. A search using sos's findFn('voronoi') also yielded no results.
Info: taxicabgeometry.net
Interactive: Manhattan-metric Voronoi diagram (click version)
I've been rolling my own in Python, and can sum up the basics here:
Between neighboring centroids lies a perpendicular bisector which, in the Manhattan metric, is most likely two rays joined by a 45-degree diagonal (if the centroids are randomly generated), though a straight horizontal, vertical, or 45-degree diagonal line may also occur. Given the set of such bisectors for every centroid pair, the edges separating the regions are among them. Collect the intersection points of each pair of bisectors that are equidistant (within an epsilon), in the Manhattan metric, from their 3 nearest centroids. Also collect the two midpoints of each 45-degree diagonal that are similarly equidistant from their nearest two centroids. The outer polygons won't be closed; how to deal with them depends on what you need. The polygon borders and border vertices will need sorting so your polygons aren't a zigzagged mess, and the winding order can be fixed if they need to be clockwise or otherwise. More can be done; it just depends on what you need.
Unfortunately, this slows down dramatically as more points are involved; intersecting every bisector with every other bisector is the bottleneck. I've been attempting an insertion method, with only partial success so far. Now I'm thinking of first building a nearest-neighbor linkage between the centroids: if the neighbors are known, only a minimal set of bisectors needs intersecting, and many centroids can be processed quickly.
Anyway, the brute-force approach does work:
The point near the cursor is actually 2 points of a tiny diagonal. It's a precise method, but more complicated than it first seems. The Java code from the interactive link above may be faster, but it was difficult to get solid and precise geometry out of it.
Sorry, I don't know R.
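For what it's worth, here is a rough Python sketch of the brute-force idea as a grid rasterization: it only approximates the regions by labelling pixels, rather than building the exact bisector polygons described above, and all names are my own:

```python
import numpy as np

def manhattan_voronoi_raster(centroids, width, height):
    """Label each pixel of a width x height grid with the index of its
    nearest centroid under the Manhattan (L1) metric."""
    ys, xs = np.mgrid[0:height, 0:width]
    # dists has shape (n_centroids, height, width)
    dists = np.stack([np.abs(xs - cx) + np.abs(ys - cy) for cx, cy in centroids])
    return dists.argmin(axis=0)

# Example: 3 random centroids on a 200 x 200 grid
rng = np.random.default_rng(0)
cents = rng.integers(0, 200, size=(3, 2))
labels = manhattan_voronoi_raster(cents, 200, 200)
```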
Maybe the question is about finding the maximum-area square that fits inside a circumcircle (of a triangle). The equation of such a square is abs(x) + abs(y) = r (www.mathematische-basteleien.de/taxicabgeometry.htm); that diamond is exactly what a circle of radius r looks like in taxicab geometry. When you have a mesh of triangles, the Voronoi diagram is its dual.

Approaches for spatial geodesic latitude longitude clustering in R -- Follow-Up

These are follow-ups to the question & answer in Approaches for spatial geodesic latitude longitude clustering in R, with geodesic or great-circle distances.
I would like to better understand:
Question #1: If all the lat/long values are within the same city, is it necessary to use either fossil or distHaversine(...) to first calculate great-circle distances?
Or, within a single city, is it OK to run clustering on the lat/long values themselves?
Question #2: jlhoward suggests that:
It's worth noting that these methods require that all points must go into some cluster. If you just ask which points are close together, and allow that some cities don't go into any cluster, you get very different results.
In my case I would like to just ask "which points are close together", without forcing every point into a cluster. How can I do this?
Question #3: To include one or two factor variables in the clustering (in addition to lat/long), is it as easy as including those factor variables in the df upon which the clustering is run?
Please confirm.
Thanks!
"within a single city, is it OK to run clustering on the lat/long values themselves ?"
Yes, as long as your city is on the equator, where a degree of longitude is the same distance as a degree of latitude.
I'm standing very close to the north pole. One degree of longitude is 1/360 of the circumference of the circle round the pole from me. Someone ten degrees east of me might only be ten feet away. Someone one degree south of me is miles away. A clustering algorithm based on lat-long would think that guy miles away was closer to me than the guy I can wave to ten degrees east of me.
The solution for small areas, to save having to compute great-circle or ellipsoid distances, is to project to a coordinate system that is near enough Cartesian that you can use Pythagoras' theorem for distances without too much error. Typically you would use a UTM zone transform, which is essentially a coordinate system whose "equator" (central meridian) runs through your study area.
The spTransform function in sp and rgdal will sort this out for you.
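If you want to see the distortion numerically, or do a quick-and-dirty projection without GIS libraries, here is a small Python sketch (a crude equirectangular approximation, not the UTM transform recommended above; all names are mine): it scales longitude by the cosine of a reference latitude so that Euclidean distances on the result are roughly in kilometres over a small area.

```python
import math

EARTH_RADIUS_KM = 6371.0

def to_local_xy(points, ref_lat=None):
    """Project (lat, lon) pairs to approximate planar (x, y) in kilometres.

    Uses an equirectangular approximation around a reference latitude; fine
    for a single city, increasingly wrong over large areas or near the poles.
    """
    if ref_lat is None:
        ref_lat = sum(lat for lat, _ in points) / len(points)
    k = math.cos(math.radians(ref_lat))
    return [
        (math.radians(lon) * EARTH_RADIUS_KM * k,
         math.radians(lat) * EARTH_RADIUS_KM)
        for lat, lon in points
    ]

# At 60 degrees north, one degree of longitude spans only ~55.6 km,
# versus ~111.2 km for one degree of latitude:
xy = to_local_xy([(60.0, 10.0), (60.0, 11.0), (61.0, 10.0)], ref_lat=60.0)
```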
