Routing Distance One way versus reverse direction - here-api

I am using the Here API to calculate distance between two coordinates. The mileage in one direction is sometimes significantly different than in the other direction. I assume this is because we are using a set date/time and the calculation is taking into account historical traffic patterns. Are there other factors that could be impacting the difference? Below is a call we are making:
https://route.api.here.com/routing/7.2/calculateroute.json?app_id=app_id&app_code=app_code&mode=shortest;car;traffic:disabled&departure=2019-03-04T09:00:00&waypoint0=lat,long&waypoint1=lat,long

Many attributes affect route calculation, and they can be specified as per the use case. To check the dependent factors, please refer to:
http://developer.here.com/documentation/routing/topics/resource-calculate-route.html

Related

Is it possible to order by distance?

I'm asking whether it's possible to order results by distance in a Discover query. I've performed a query for train stations; however, the results seem to be ordered somewhat randomly. I'm only interested in the nearest train station, so having the result with the smallest distance first would be valuable.
I appreciate that it's entirely possible to achieve this programmatically, but before I burn time working on a solution, I figured it was worth reaching out to see if this was already possible within the API.
If it helps, this is the query I was making: https://discover.search.hereapi.com/v1/discover?at=-34.0337365,151.0847004&q=station&limit=10&apikey=. You can see that the closest result is 4th down.
Happy to hear an alternative solution if there is one.
Thanks!
You can do this in two different ways.
Use the browse endpoint instead of discover, and the results will be ordered by distance. The request will look like:
https://browse.search.hereapi.com/v1/browse?at=-34.0337365,151.0847004&name=station&limit=10&apikey=
OR
https://browse.search.hereapi.com/v1/browse?at=-34.0337365,151.0847004&categories=400-4100-0035&limit=10&apikey=
Places category system
https://developer.here.com/documentation/geocoding-search-api/dev_guide/topics-places/places-category-system-full.html
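If you would rather stay with discover, sorting client-side is only a few lines. The Python sketch below assumes each result item carries either a "distance" value in metres or a "position" object with "lat"/"lng"; check those field names against the response you actually receive. haversine_m and sort_by_distance are hypothetical helper names, and the sample items are made up.

```python
import math


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres, assuming a spherical Earth (R = 6,371 km)."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lng2 - lng1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(min(1.0, math.sqrt(a)))


def sort_by_distance(items, at_lat, at_lng):
    """Return result items nearest-first.

    Assumption: each item is a dict with either a 'distance' (metres) or a
    'position' dict holding 'lat' and 'lng'.
    """
    def key(item):
        if "distance" in item:
            return item["distance"]
        pos = item["position"]
        return haversine_m(at_lat, at_lng, pos["lat"], pos["lng"])

    return sorted(items, key=key)


# Usage with made-up items (not real API output):
items = [
    {"title": "Far", "position": {"lat": -34.10, "lng": 151.08}},
    {"title": "Near", "position": {"lat": -34.034, "lng": 151.085}},
]
print([i["title"] for i in sort_by_distance(items, -34.0337365, 151.0847004)])
# prints ['Near', 'Far']
```

Using the server-side ordering of browse avoids this extra step, but the sketch is handy when you need discover's free-text matching.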

Avoid U-Turns in requested HERE Maps route

When requesting a multi-leg route via the HERE API (e.g. Point A to Point B via Point C), is it possible to prevent or restrict u-turns? I am trying to produce a map to be followed by a school bus, and u-turns are not allowed. However, I often find that the directions suggest to do a u-turn upon arrival at the intermediate points. I would like the bus to keep going straight after making its stop. Is this possible to do?
I'm not aware of the possibility to restrict this type of maneuver.
Nevertheless, you can use truck routing in Here Location Platform for Enterprise; see the documentation here:
https://developer.here.com/documentation/download/routing_lbsp/6.2.32.0/Routing%20API%20v6.2.32.0%20Developer%27s%20Guide.pdf
It should return a route much better adapted to your use case, and it might avoid U-turns.
Also, one final simple solution is to calculate the route with alternatives set to 3 (or make multiple calls with other optimizations) and iterate through the results, excluding any route that includes a maneuver whose type corresponds to a U-turn.
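That filtering step might look like the Python sketch below. It assumes the parsed JSON response has already been reduced to a list of route dicts, each with a "leg" list whose items hold a "maneuver" list of dicts carrying an "action" string; those field names and the exact action values are assumptions to verify against the Routing API response you get. Matching "uturn" case-insensitively is a heuristic, and has_u_turn / routes_without_u_turns are hypothetical names.

```python
def has_u_turn(route):
    """True if any maneuver in the route looks like a U-turn.

    Assumed shape (verify against the real response):
    route -> 'leg' (list) -> 'maneuver' (list) -> 'action' (str).
    """
    for leg in route.get("leg", []):
        for maneuver in leg.get("maneuver", []):
            action = maneuver.get("action", "")
            # Heuristic: any action name containing 'uturn' counts as a U-turn.
            if "uturn" in action.lower():
                return True
    return False


def routes_without_u_turns(routes):
    """Keep only the alternative routes that contain no U-turn maneuver."""
    return [r for r in routes if not has_u_turn(r)]
```

If every alternative contains a U-turn, the list comes back empty, which is the signal to retry with another optimization (for example shortest instead of fastest).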

Build an undirected weighted graph by matching N vertices

Problem:
I want to suggest the top 10 most compatible matches for a particular user, by comparing his/her 'interests' with interests of all others. I'm building an undirected weighted graph between users, where the weight = match score between the two users.
I already have a set of N users: S. For any user U in S, I have a set of interests I. After a long time (a week?) I create a new user U with a set of interests and add it to S. To generate a graph for this new user, I'm comparing interest set I of the new user with the interest sets of all the users in S, iteratively. The problem is with this "all the users" part.
Let's talk about the function for comparing interests. An interest in a set of interests I is a string. I'm comparing two strings/interests using WikipediaMiner (it uses Wikipedia links to infer how closely related two strings are, e.g. Billie Jean & Thriller ==> high match, Brad Pitt & Jamaica ==> low match, etc.). I've asked a question about this too (to see if there's a better solution than the one I'm currently using).
So the above function takes non-negligible time, and in total it will take a HUGE amount of time when we compare thousands (maybe millions?) of users and their hundreds of interests. For 100,000 users, I can't afford to make 100,000 user comparisons this way in a short time (<30 sec). But I have to give the top 10 recommendations within 30 seconds, possibly as a preliminary recommendation, and then improve on it over the next minute or so by calculating improved recommendations. Simply comparing 1 user against the N users sequentially is too slow.
Question:
Please suggest an algorithm, method or tool using which I can improve my situation or solve my problem.
I can only suggest an approach rather than a definite solution, since the outcome of the steps below depends on the nature of the inter-relations between interests.
=>Step:1 - As your title says, build an undirected weighted graph with interests as vertices and the weighted match between them as edges.
=>Step:2 - Cluster the interests (the most complex step).
K-means is a commonly used clustering algorithm, but it works on points in a vector space; see Wikipedia for how K-means works.
It minimizes, over all clusters, the sum of squared distances between each point and the center of its cluster. In your case there are no dimensions available, so see whether you can apply the same minimizing logic by defining a rule for the distance between two vertices: higher match => smaller distance, and vice versa (what are the different matching levels provided by WikipediaMiner?). Choose the "mean" of a cluster as, say, the most connected vertex in the chosen set; PageRank sounds like a good option for figuring out the most connected vertex.
"Pair-counting F-Measure" sounds like it suits your needs (weighted graph); check the other options available too.
(Note: keep modifying this step until the right clustering algorithm is found, along with the right calibration for the distance rule, the number of clusters, etc.)
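One way to apply that minimizing logic without coordinates is k-medoids, a K-means variant whose cluster centers are actual vertices; this is a swapped-in technique, not something prescribed above. The Python sketch assumes sim(a, b) returns a score in [0, 1] where higher means a closer match (so the implied distance is 1 - sim), and that interests are hashable strings. Each center is the member with the highest total similarity to its cluster, i.e. the most connected vertex by weighted degree rather than by PageRank, and it uses unsquared distances. k_medoids and its parameters are hypothetical names.

```python
import random


def k_medoids(items, sim, k, iterations=20, seed=0):
    """Voronoi-iteration k-medoids driven by a similarity function.

    sim(a, b) in [0, 1], higher = closer (e.g. a WikipediaMiner score).
    Returns a dict: medoid -> list of member items.
    """
    rng = random.Random(seed)
    medoids = rng.sample(items, k)
    clusters = {}
    for _ in range(iterations):
        # Assignment step: every item joins the medoid it matches best.
        clusters = {m: [] for m in medoids}
        for item in items:
            best = max(medoids, key=lambda m: sim(item, m))
            clusters[best].append(item)
        # Update step: the most connected member becomes the new medoid.
        new_medoids = [
            max(members, key=lambda c: sum(sim(c, o) for o in members))
            for members in clusters.values()
            if members
        ]
        if set(new_medoids) == set(medoids):
            break  # converged
        medoids = new_medoids
    return clusters
```

Each iteration calls sim many times (roughly k times the number of items, plus the square of each cluster size), so with an expensive comparison you would want to cache scores between iterations.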
=>Step:3 - Evaluate the clusters
From here on, it's a matter of calibrating a couple of things to fit your needs.
Examine the clusters and re-evaluate:
the number of clusters, inter-cluster distance, distance between vertices inside clusters, size of clusters,
time/precision trade-off (compare the final match results with those obtained without any clustering).
Go to step 2 until this evaluation is satisfactory.
=>Step:4 - Examine the new interest
Iterate through all clusters, calculate the connectivity of the new interest with each cluster, and sort the clusters by connectivity; for the top x% of the sorted clusters,
sort and filter out the highly connected interests.
=>Step:5 - Match users
Reverse look up the set of all users holding the interests obtained in step 4, compare all interests for both users, and generate a score.
=>Step:6 - Apart from the above
You can distribute the load across multiple systems/processors (e.g. each machine handling n of the clusters), based on the traffic and so on.
What is the application for this problem, and what's the expected traffic?
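Steps 4 and 5 above can be sketched roughly as follows in Python. It assumes the clusters from step 2 are plain lists of interests, that sim(a, b) returns a relatedness score in [0, 1], and that a reverse index from interest to user ids already exists. "Connectivity" is taken here as the mean similarity between the new interest and a cluster's members, which is only one possible reading; top_fraction, min_sim and candidate_users are hypothetical names. The accumulated score is a preliminary ranking; the full interest-by-interest comparison of step 5 would then run only on these candidates.

```python
def candidate_users(new_interest, clusters, sim, users_by_interest,
                    top_fraction=0.2, min_sim=0.5):
    """Find users holding interests related to new_interest.

    clusters: list of lists of interests (output of step 2).
    users_by_interest: dict interest -> set of user ids.
    Returns dict user id -> accumulated preliminary score.
    """
    # Step 4a: rank clusters by mean similarity to the new interest.
    ranked = sorted(
        clusters,
        key=lambda c: sum(sim(new_interest, i) for i in c) / len(c),
        reverse=True,
    )
    keep = max(1, int(len(ranked) * top_fraction))
    scores = {}
    for cluster in ranked[:keep]:
        for interest in cluster:
            s = sim(new_interest, interest)
            # Step 4b: drop weakly connected interests.
            if s < min_sim:
                continue
            # Step 5: reverse look-up of the users holding this interest.
            for user in users_by_interest.get(interest, ()):
                scores[user] = scores.get(user, 0.0) + s
    return scores
```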
Another solution is to find the connectivity between the new interest and the "set of interests in cluster" C.
WikipediaMiner runs on a set of Wikipedia documents; let me call it the UNIVERSE.
1: For each cluster, fetch and maintain (index; Lucene might be handy) the "set of highly relevant docs" (I am calling it HRDC) out of the UNIVERSE, so you have N HRDCs if you have N clusters.
2: When a new interest comes in, find "connectivity with cluster" = "hit ratio of the interest in the HRDC / hit ratio of the interest in the UNIVERSE" for each HRDC.
3: Sort the "connectivity with cluster" values and choose the highly connected clusters.
4: Either compare all the vertices in the cluster with the new interest, or only the highly connected vertices (using PageRank), depending on the time/precision trade-off that suits you.
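Steps 2 and 3 can be sketched in Python as below. It assumes each HRDC and the UNIVERSE are plain lists of document texts and uses a case-insensitive substring test as a "hit"; in practice this would be a query against the Lucene index. hit_ratio and cluster_connectivity are hypothetical names. A connectivity above 1 means the interest appears more often in that cluster's documents than in the UNIVERSE as a whole.

```python
def hit_ratio(interest, docs):
    """Fraction of documents mentioning the interest (case-insensitive substring)."""
    if not docs:
        return 0.0
    needle = interest.lower()
    return sum(needle in d.lower() for d in docs) / len(docs)


def cluster_connectivity(interest, hrdcs, universe):
    """Connectivity = hit ratio in a cluster's HRDC / hit ratio in the UNIVERSE.

    hrdcs: dict cluster id -> list of document texts.
    Returns (cluster id, connectivity) pairs, most connected first.
    """
    base = hit_ratio(interest, universe)
    if base == 0:
        return []  # the interest never occurs, so nothing to rank
    scores = [(cid, hit_ratio(interest, docs) / base) for cid, docs in hrdcs.items()]
    return sorted(scores, key=lambda p: p[1], reverse=True)
```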
One flaw is that you're basing your algorithm's complexity on the wrong thing. The real issue is that you have to compare each unique interest against every other unique interest (and that interest against itself).
If all of the interests are unique, then there is probably nothing you can do. However, if you have a lot of duplicate interests, you can perhaps speed up the algorithm as follows.
Create a graph that associates each interest with the users that have that interest, in a way that allows for fast look-ups.
Create a graph that shows how each interest relates to each other interest, also in such a way that allows for fast look-ups.
Therefore, when a new user is added, their interests are compared to all other interests and stored in a graph. You can then use that information to build a list of users with similar interests. That list of users will then need to be filtered somehow to bring it down to the top 10.
Finally, add that user and their interests to the graph of users and interests. This is done last so that the user with the most closely matched interests isn't the user themselves.
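The two look-up structures above can be sketched in Python as a reverse index (interest to users) plus a cache of pairwise interest scores, so each distinct pair of interests goes through the expensive comparison only once. The similarity argument stands in for your WikipediaMiner call and is assumed to be symmetric (the cache key ignores order); InterestIndex and its methods are hypothetical names, and summing scores over interest pairs is just one possible way to rank users before cutting to the top 10.

```python
from collections import defaultdict


class InterestIndex:
    """Reverse index of users by interest, plus a cache of interest scores."""

    def __init__(self, similarity):
        self.users_by_interest = defaultdict(set)  # interest -> user ids
        self._similarity = similarity              # expensive, assumed symmetric
        self._cache = {}

    def sim(self, a, b):
        if a == b:
            return 1.0
        key = frozenset((a, b))  # unordered pair: (a, b) and (b, a) share a slot
        if key not in self._cache:
            self._cache[key] = self._similarity(a, b)
        return self._cache[key]

    def match(self, interests, top=10):
        """Score existing users against a new user's interests."""
        scores = defaultdict(float)
        for mine in interests:
            for theirs, users in self.users_by_interest.items():
                s = self.sim(mine, theirs)
                for u in users:
                    scores[u] += s
        return sorted(scores.items(), key=lambda p: p[1], reverse=True)[:top]

    def add_user(self, user, interests):
        # Called after match(), so a user never matches themselves.
        for i in interests:
            self.users_by_interest[i].add(user)
```

Usage: call match() for the new user first, then add_user(), mirroring the order described above.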
Note:
There might be some statistical shortcuts you could take, something like this: A is related to B, B is related to C, C is related to D, therefore A is related to B, C, and D. However, using those kinds of shortcuts likely requires a much better understanding of how your comparison function works, which is a bit beyond my expertise.
Approximate solution:
I forgot to mention it earlier, but what you're looking for when comparing users or interests is a "nearest neighbor search" in high dimensions. For exact solutions in high dimensions, a linear search generally works better than specialized data structures, so approximation is probably the best way to go if you need it faster.
To obtain a quick approximate solution (without guarantees as to how close it is), you'll need a data structure that allows for quickly being able to determine which users are likely to be similar to a new user.
One way to build that structure:
Pick 300 random users (roughly sqrt(100,000) ≈ 316). These will be the seed users for 300 clusters. Ideally, you'd use the 300 users that are least closely related, but that's probably not practical; still, it might be wise to ensure that no seed user is too closely related to the other users (as a sum or average of its comparisons to other users).
The clusters are then filled by each user joining the cluster whose representative user most closely matches it.
The top ten can then be determined by picking the 10 most closely related users from that cluster.
If you ensure that the number of clusters and the number of users per cluster are always fairly close to sqrt(number of users), then you obtain a fair approximation in O(sqrt(N)) by only checking the points within the cluster. You can improve that approximation by including users in additional clusters and checking the representative users for each cluster. The more clusters you check, the closer you get to O(N) and an exact solution, although there's probably no way to say how close the current solution is to the exact one. Chances are you start to hit diminishing returns after checking more than log(sqrt(N)) clusters in total, which would put you at O(sqrt(N) log(sqrt(N))).
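A Python sketch of that structure, under loose assumptions: users maps a user id to a set of interests, sim compares two interest sets (Jaccard similarity is used in the check below purely as a stand-in for the real comparison), and the number of seeds is isqrt(N). It does not rebalance cluster sizes toward sqrt(N), and seeds are random rather than chosen to be far apart. build_clusters and approx_top are hypothetical names.

```python
import math
import random


def build_clusters(users, sim, seed=0):
    """Pick ~sqrt(N) random seed users; every user joins its closest seed.

    users: dict user id -> set of interests. Returns dict seed id -> member ids.
    """
    rng = random.Random(seed)
    k = max(1, math.isqrt(len(users)))
    seeds = rng.sample(list(users), k)
    clusters = {s: [] for s in seeds}
    for uid, interests in users.items():
        best = max(seeds, key=lambda s: sim(interests, users[s]))
        clusters[best].append(uid)
    return clusters


def approx_top(new_interests, users, clusters, sim, n_clusters=1, top=10):
    """Search only the members of the n_clusters seeds closest to the new user."""
    nearest = sorted(clusters, key=lambda s: sim(new_interests, users[s]), reverse=True)
    candidates = [u for s in nearest[:n_clusters] for u in clusters[s]]
    candidates.sort(key=lambda u: sim(new_interests, users[u]), reverse=True)
    return candidates[:top]
```

Raising n_clusters trades time for accuracy; setting it to the number of clusters degenerates into the exact linear scan.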
A few thoughts...
Not exactly a graph theory solution.
Assuming a finite set of interests, for each user maintain a bit sequence in which each bit represents whether the user has a particular interest or not.
For a new user, simply multiply (bitwise AND) their bit sequence with each existing user's bit sequence and count the number of set bits in the result, which gives an idea of how closely their interests match.
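In Python, arbitrary-size ints work directly as the bit sequence, and counting the 1 bits of a bitwise AND gives the number of shared interests. A sketch, assuming a fixed interest vocabulary known up front; interest_mask, shared_count and top_matches are hypothetical names. Note that it only counts identical interests, not related ones.

```python
import heapq


def interest_mask(interests, vocabulary):
    """Encode a user's interests as an int bit mask over a fixed vocabulary."""
    index = {name: i for i, name in enumerate(vocabulary)}
    mask = 0
    for name in interests:
        mask |= 1 << index[name]
    return mask


def shared_count(a, b):
    """Number of interests in common: popcount of the bitwise AND."""
    return bin(a & b).count("1")


def top_matches(new_mask, masks, k=10):
    """masks: dict user id -> bit mask. Returns the k best-matching user ids."""
    return heapq.nlargest(k, masks, key=lambda uid: shared_count(new_mask, masks[uid]))
```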

Find objects by pointing in a direction

I have a specific problem, and I find it hard to find a solution!
Using a GPS Device I can find my current position on earth. I need to be able to point to a direction (a compass on iPhone or similar device) and find what important objects (locations) are in that direction! Assume that I do have all those locations stored in a database.
Assuming you have a location and a direction, your goal is to find what items in your database are adjacent to the location, in the appropriate direction.
Obviously, you could scan through every element in your database, and answer for each one, "Is this in the region?". The real magic is efficiency; how you index the data in the database such that you can answer that question without having to examine every record.
MongoDB's geospatial indexing is a great example of this. However, its implementation does not handle direction, so you will need to filter the results: use the database to get all objects within x distance of you, then filter out the elements which are not in the appropriate direction.
If you cannot use a database engine with native geospatial indexing, you'll have to implement it yourself. As mentioned in the comments, the Haversine function is used to compute distance on a sphere (in this case, the earth). Rather than computing the distance between every point and yourself, you could begin by eliminating any elements which are grossly out of range, e.g. (your latitude + your search distance) < (the object's latitude). Then use the Haversine function to filter further. You could also use a geospatial hashing function to do most of the work beforehand.
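A Python sketch of that two-stage filter, assuming places is a list of (name, lat, lon) tuples in degrees and a spherical Earth of radius 6,371 km. The coarse box uses the usual latitude/longitude bounding-box bounds (asin(sin(r/R) / cos(lat)) for longitude) so it never discards a point the exact check would keep, and longitudes are compared modulo 360 so the ±180° meridian does not break it. within_range and haversine_m are hypothetical names.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres on a spherical Earth."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def within_range(user_lat, user_lon, places, radius_m):
    """Cheap lat/lon box first, then the exact haversine check."""
    delta = radius_m / EARTH_RADIUS_M  # angular radius in radians
    dlat = math.degrees(delta)
    cos_lat = math.cos(math.radians(user_lat))
    ratio = math.sin(delta) / cos_lat if cos_lat > 1e-12 else 2.0
    # Near the poles the circle can span every longitude.
    dlon = math.degrees(math.asin(ratio)) if ratio < 1 else 180.0
    hits = []
    for name, lat, lon in places:
        lon_diff = abs((lon - user_lon + 180.0) % 360.0 - 180.0)  # wrap-safe
        if abs(lat - user_lat) > dlat or lon_diff > dlon:
            continue  # grossly out of range
        if haversine_m(user_lat, user_lon, lat, lon) <= radius_m:
            hits.append(name)
    return hits
```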
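Once you have all of the elements within range, you can convert the x-y coordinates in your database into polar coordinates relative to the user. In short:
atan2(item_y - users_y, item_x - users_x) = the angle between the item and the user, measured counterclockwise from the x axis (the two-argument arctangent keeps the correct quadrant, which arctan((item_y - users_y) / (item_x - users_x)) loses, and avoids dividing by zero when item_x equals users_x; a compass heading is measured clockwise from north, so convert one convention to the other before comparing)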
If you compute this for every item within 'range' of the user, and filter out any elements which are not within some bounds of the compass angle (+/- 20 degrees, for example), you will get the elements you need.
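For latitude/longitude data, the planar angle can be replaced by the great-circle initial bearing, which is already measured clockwise from north like a compass heading; that formula is a substitution, not part of the answer above. The Python sketch assumes places are (name, lat, lon) tuples in degrees and heading is the device compass reading in degrees; initial_bearing and in_view are hypothetical names. The angular difference is wrapped to 0..180 so that a heading of 350° still matches a bearing of 0°.

```python
import math


def initial_bearing(lat1, lon1, lat2, lon2):
    """Bearing from point 1 to point 2 in degrees clockwise from north (0..360)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dl = math.radians(lon2 - lon1)
    x = math.sin(dl) * math.cos(p2)
    y = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dl)
    return (math.degrees(math.atan2(x, y)) + 360.0) % 360.0


def in_view(user_lat, user_lon, heading, places, half_width=20.0):
    """Keep places whose bearing is within +/- half_width of the compass heading."""
    hits = []
    for name, lat, lon in places:
        b = initial_bearing(user_lat, user_lon, lat, lon)
        diff = abs((b - heading + 180.0) % 360.0 - 180.0)  # smallest angle, 0..180
        if diff <= half_width:
            hits.append(name)
    return hits
```

Run it only on the candidates that survived the distance filter, so the trigonometry is done for a small set.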
If efficiency is still an issue, you can get more clever by immediately invalidating any elements which, for example, are on the wrong side of the user (if the user is facing west, then any elements which have an x coordinate higher than the user's cannot possibly be in his view). Depending on your programming language, it may also be more efficient to use a static table of arctans with a lower degree of accuracy than is commonly provided.
If you are particularly clever, you may also find ways of indexing the data by angle, which will further lower the computation required.

Way to infer the size of the userbase of a site from sampling taken usernames

Suppose you wanted to estimate the size of a userbase of a site which does not publicize this information.
Different usernames are taken with different probabilities. For instance, if the username 'nick' doesn't exist on the system, the site likely has an extremely small userbase. If the username 'starbaby' is taken, it's likely to be a much larger site. It seems like a straightforward Bayesian problem.
There is the problem that different sites may have a different space of allowable usernames. The biggest problem would be the legality of common characters such as spaces, I imagine. Another issue that could taint the prior distribution is whether the site suggests names when the one you want is taken, or leaves you to think of a more creative name yourself.
How could you build a training set of the frequency of occurrence of usernames across different sized systems? Is there a way to use Bayes to do numeric estimation rather than classification into fixed-width buckets?
What you need to do is accurately estimate the probability that a certain username is present given the number of registered users. Let's say N is the number of users, and u = 1 if the username u is present and 0 if it is absent.
First of all, make the assumption that the probability distributions for each user name are independent of each other. This is not going to be true - and you've already come up with one reason why - but it will probably be necessary since it makes the data collection and the maths a lot easier.
You are going to need a lot of data from sites with registered usernames and the total number of users of each site. Now take any specific username and imagine your data points on a 2D plot (with N on the x axis and u on the y axis): there's going to be one horizontal line of points at y=0 and another at y=1. You can either bin the x axis as you suggest and take the mean y coordinate of all the data points in each bin to get a discrete function, or you could try to fit the points to some class of functions. I don't really know what class of functions that would be; maybe some kind of power law? (I'm thinking of Zipf's law.)
You now have the probability distributions to apply Bayes' rule. I don't know what kind of prior for N you would want to use. A uniform distribution (up to some large number) would make no assumptions, but I would guess most sites have a small user base.
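A minimal Python sketch of Bayes' rule on a grid of candidate N values, which gives a numeric estimate (the posterior mean) rather than a bucket. It assumes, as one guess at the class of functions, that each name u has a per-user pick probability q_u, so P(u present | N) = 1 - (1 - q_u)^N; that names are independent, as above; and a log-uniform prior P(N) ∝ 1/N to reflect the guess that most sites are small. The q values and the grid would have to come from your training data; posterior_over_n and posterior_mean are hypothetical names.

```python
import math


def posterior_over_n(observations, q, n_grid):
    """Posterior P(N | observations) over a grid of candidate user counts.

    observations: dict username -> True if taken, False if free.
    q: dict username -> assumed chance that one random user picks that name.
    Model (assumption): P(taken | N) = 1 - (1 - q)^N, names independent.
    Prior: log-uniform, P(N) proportional to 1/N.
    """
    log_post = []
    for n in n_grid:
        lp = -math.log(n)  # log prior
        for name, taken in observations.items():
            p_free = (1.0 - q[name]) ** n
            p = 1.0 - p_free if taken else p_free
            lp += math.log(max(p, 1e-300))  # guard against log(0)
        log_post.append(lp)
    # Normalise in log space to avoid underflow.
    top = max(log_post)
    weights = [math.exp(v - top) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def posterior_mean(n_grid, probs):
    """Point estimate of N: the posterior mean over the grid."""
    return sum(n * p for n, p in zip(n_grid, probs))
```

Because each name contributes one likelihood factor, adding more sampled names simply multiplies in more evidence; uninformative rare names move the posterior very little, which matches the concern about long-tailed samples.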
I suspect that in order to make this work, when you sample users from a site you will need to do so for a specific set of users. I'm betting that the popularity of user names is going to have a very long tail and so a random sample of users is going to give you a lot of very infrequently used names and therefore a lot of uninformative evidence.
EDIT: I had another thought; in most forums (and on StackOverflow) users have consecutive user ids, so you can use a single site with a large number of users to give you estimates for all smaller N.
I think this is a cool idea!
You may be able to put together a data set by using UserNameCheck.com for some different usernames and cross-referencing the results with the stated userbase sizes of those sites that give them out.
Note that the website does not seem to check whether the usernames are valid for the site, so, e.g., it thinks Gmail would let you register "nick#gmail.com" even though that's too short.
The only way is to get a large set of taken usernames on systems for which you know the size of the userbase. Data may be skewed in userbases where certain names are more common. Even a tiny userbase from a Lord of the Rings forum will likely contain the username Strider, for example.
