How would you find the most specific "filter" that matches a document? (determining which market segment a user fits in) - graph

Imagine you have actions set up for when a user belongs to a certain demographic/market segment. The filters work a bit like a graph, matching on country, region, platform, operating system, and browser.
By default, a property matches any value (if you specify only US, you match all users from the US regardless of region, platform, OS, or browser).
If you specify multiple values for a property of the filter, they work like an OR (the user can have any of the values you specified). For the filter to match, every property must either have at least one matching value or be empty (accept all), which is essentially an AND operation.
So we can have:
Segment #1:
Countries: United States, Canada
Segment #2:
Countries: United States
Regions: New York
Platform: Tablets
Segment #3:
Countries: United States
Browser: Chrome
Segment #4:
Countries: United States
Segment #5
Match all (all filters left empty)
Scenario #1
User from Canada on his Tablet
Result: Segment #1
Scenario #2
User from New York, United States visits from Google Chrome on his Tablet.
Result: Segment #2, because the filter more specifically matches the user (matches country, region, and platform)
Scenario #3
User from Texas visits from his desktop
Result: Segment #4, tie with segment #1 is resolved because Segment #4 only matches United States and is therefore more specific
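For comparison, the matching rules and the three scenarios above can be sketched in plain Python without a graph database. This is only a sketch under assumptions: the Segment class, the property names, and the specificity score (number of constrained properties first, then fewer listed values) are my own reading of the rules, not an established method.

```python
from dataclasses import dataclass, field
from typing import Optional

PROPERTIES = ("country", "region", "platform", "os", "browser")


@dataclass
class Segment:
    name: str
    # Property name -> set of allowed values; a missing or empty set means "match all".
    filters: dict = field(default_factory=dict)

    def matches(self, user: dict) -> bool:
        # AND across properties, OR within the values of one property.
        for prop in PROPERTIES:
            allowed = self.filters.get(prop)
            if allowed and user.get(prop) not in allowed:
                return False
        return True

    def specificity(self) -> tuple:
        constrained = [values for values in self.filters.values() if values]
        # Assumed ranking: more constrained properties first, then fewer listed values.
        return (len(constrained), -sum(len(values) for values in constrained))


def best_segment(segments, user) -> Optional[Segment]:
    candidates = [s for s in segments if s.matches(user)]
    return max(candidates, key=Segment.specificity, default=None)


SEGMENTS = [
    Segment("Segment #1", {"country": {"United States", "Canada"}}),
    Segment("Segment #2", {"country": {"United States"},
                           "region": {"New York"},
                           "platform": {"Tablet"}}),
    Segment("Segment #3", {"country": {"United States"}, "browser": {"Chrome"}}),
    Segment("Segment #4", {"country": {"United States"}}),
    Segment("Segment #5", {}),
]

scenario_2 = {"country": "United States", "region": "New York",
              "platform": "Tablet", "browser": "Chrome"}
print(best_segment(SEGMENTS, scenario_2).name)
```

With a handful of segments a linear scan like this is probably enough; an index or a graph only starts to matter when the number of segments gets large.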
Work so far
I was thinking I could take each segment and load it up into a graph database that looks something like this
Country -> Region -> Platform --> OS -> Browser -> Segment
Each node either has a value (ex: United States, Chrome, Firefox, etc) and relationships that link it to any node below it in the tree (Country -> Browser is okay, Browser -> Country is not) or is null ("match all").
Each relationship (represented by ->) would also store a weight used to resolve ties. Relationships from a catch-all node get the max weight as they will always lose to a more specific filter.
Example database (numbers on the lines are the weights; the lower weight becomes the preferred path)
Potential query
So now I need a query (maybe neo4j can do this?) that does the following:
Find the top level country node with the same value as the user or null
Go through each relationship (sorted by weight in ascending order)
Find the longest path, ties go to the node connected by a relationship with the lowest weight (if the tie is between a relationship to a null/catch-all node, the null node loses)
Continue this loop until we find a segment #
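The traversal described in these steps can be approximated in memory with a nested dictionary (a trie) that has one level per property. This is a rough sketch, not Neo4j code: the names insert, lookup, ANY and LEAF are made up for illustration, the weight is stored on the final segment slot instead of on every relationship, and using the number of listed values as the weight is my assumption.

```python
LEVELS = ("country", "region", "platform", "os", "browser")
ANY = None        # catch-all key, standing in for the "null" node
LEAF = object()   # private marker for the slot that holds the segment


def insert(trie, values, segment, weight):
    """Add one path (one value per level, ANY where unconstrained) ending in `segment`."""
    node = trie
    for value in values:
        node = node.setdefault(value, {})
    current = node.get(LEAF)
    # Two segments can share a path; the lower weight wins the tie.
    if current is None or weight < current[0]:
        node[LEAF] = (weight, segment)


def lookup(trie, user, depth=0):
    """Depth-first search: try the user's exact value first, then the catch-all branch."""
    if depth == len(LEVELS):
        leaf = trie.get(LEAF)
        return leaf[1] if leaf else None
    # dict.fromkeys de-duplicates when the user's own value is missing (None).
    for key in dict.fromkeys((user.get(LEVELS[depth]), ANY)):
        child = trie.get(key)
        if child is not None:
            found = lookup(child, user, depth + 1)
            if found is not None:
                return found
    return None  # dead end: the caller falls back to its catch-all branch


trie = {}
# Segment #1 lists two countries, so it gets one path per country and weight 2.
insert(trie, ("United States", ANY, ANY, ANY, ANY), "Segment #1", weight=2)
insert(trie, ("Canada", ANY, ANY, ANY, ANY), "Segment #1", weight=2)
insert(trie, ("United States", "New York", "Tablet", ANY, ANY), "Segment #2", weight=1)
insert(trie, ("United States", ANY, ANY, ANY, "Chrome"), "Segment #3", weight=1)
insert(trie, ("United States", ANY, ANY, ANY, ANY), "Segment #4", weight=1)
insert(trie, (ANY, ANY, ANY, ANY, ANY), "Segment #5", weight=1)

print(lookup(trie, {"country": "United States", "region": "Texas", "platform": "Desktop"}))
```

Note that this search prefers an exact match at the earliest level (country, then region, and so on), which is not quite the same as "longest path": a segment that matches only on browser loses to one that matches only on region.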
I'm sorry for the long post, it's hard to explain what I'm getting at via text.
What I'm looking for
Am I on the right path to solving this problem?
Are there better ways to go about this?
What would be the best way to store these relationships (graph database?)
How can I build a query that does what I want?
tl;dr: I need a decent/performant way of finding the longest/most specific path in a graph-like data structure. Comments requesting clarification or with any related information/documentation/projects/reading are very welcome.

With Neo4j, you can store properties on a relationship, for example:
(u1:User{name:"foo"})-[:FRIEND_WITH{since : "2015/01/01"}]->(u2:User{name:"bar"})
I think you should store country nodes this way:
(usa:Country{name: "USA", other attributes...})
So you can find every single country by matching with Country label, and then filter with the name property to get the one you're looking for.
Same for the cities, you can do a simple relationship to store every city :
(usa:Country{ name: "USA"})-[:CONTAINS_CITY]->(n:City{name: "New York", other attributes...})
and then you can add platform etc after the city.
To match a segment related to a certain country, you can do this way (example for Scenario #1) :
Match (c:Country{name : "Canada"})-[*1..2]->(p:Platform{name : "Tablet"})-[*1..]->(s:Segment) return s
Then you can create your segments by using nodes and creating relationships between them. The only problem may be in this case:
User1 has a Tablet in Canada
User2 has a Tablet in Canada using Chrome
In this case, because of the variable-length match on the relationship ([*1..]), User1 can end up in the same segment as User2. The solution is to create intermediate nodes with default values for cases where you don't have browser information, for example.

Related

HERE Navstreets LINK_ID Range

My organization has acquired the HERE Navstreets data set. It wishes to update the content while still adhering to the HERE Navstreets data model and relationships.
In this context, it is deemed of value to:
Retain the LINK_ID column as the unique identifier for each street segment.
Make a distinction between the original HERE LINK_ID values and the one added by my organization.
Retain the ability to ingest streets updates from HERE should my organization decide to do so.
In this context, we would like to use a different range of LINK_ID values from the one used by HERE. As an example, if HERE uses values between 10,000,000 and 100,000,000, we would assign only LINK_ID values that are within the range 1,000-9,999,999 (this is only for illustration purposes).
Is this approach already accounted for by HERE for the street data they may get from Map Creator? Is there a specific HERE (for Review or Work in Progress) range of LINK_ID values we should consider?
Based on the HERE KB0015682: Permanent ID concept in HERE Data
Entities with Permanent IDs
Generally, the following features have permanent IDs in the HERE Map products:
Lane
Face
Point Features
Administrative Areas (for , Built-up Areas, Districts, and Administrative Areas)
Complex Features (this includes Complex Administrative Area Features as well as Complex Intersections and Complex Roads)
Permanent IDs are globally unique within a specific Object, e.g., a Link ID occurs once globally. However the same Permanent ID can be used among different Object types (e.g., Node, Link, condition, etc.). Note: When a map is upgraded to Intermediate or HERE map, or when a country undergoes administrative restructuring, there may be a change in Permanent IDs.
The following are examples of permanent IDs in the RDF:
Address Point ID
Admin Place ID
Association ID
Building ID
Carto ID
Complex Feature ID
Condition ID
Country ID (which is one of the Admin Place IDs)
Face ID
Feature Point ID
Lane ID
Lane Nav Strand ID
Link ID
Name ID (with some exceptions)
Nav Strand ID
Node ID
POI ID
Road Link ID
Sign ID
Zone ID
Numeric Range of Permanent IDs
Map object IDs (PVIDs) in the extracts use 32-bit integer values to fit in a N(10) scheme. Note: Exception to N(10) scheme can exist. For example, Lane ID is N(12) in length.
The entire range is divided as follows:
Range -> Designation
0000000001 to 0016777215 -> Non-permanent IDs
0016777215 to 2147483647 -> Permanent IDs
The range dedicated to permanent IDs is used for any entity.
The range dedicated to non-permanent IDs is used in rare situations where an update is made in a copy of the database instead of in the live database itself and this update results in a new ID. This new ID in the database copy would be in the non-permanent range. The update would also be applied to the live database, where it would receive a permanent ID available in the next scheduled release. A cross-reference is not provided between non-permanent IDs and the eventual permanent ID from the live database.
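As a quick illustration of the two ranges quoted above, a small Python check can tell which range a given LINK_ID falls in. This is a sketch only: classify_id is a made-up helper, and because the quoted table lists 16777215 as the end of one range and the start of the other, treating that boundary value as non-permanent is an assumption.

```python
NON_PERMANENT_MAX = 16_777_215   # upper bound of the non-permanent range (2**24 - 1)
PERMANENT_MAX = 2_147_483_647    # upper bound of the permanent range (2**31 - 1)


def classify_id(value: int) -> str:
    """Return which of the quoted ID ranges `value` falls in."""
    if not 1 <= value <= PERMANENT_MAX:
        return "out of range"
    # Assumption: the shared boundary value 16777215 counts as non-permanent.
    return "non-permanent" if value <= NON_PERMANENT_MAX else "permanent"


for link_id in (1_000, 16_777_215, 16_777_216, 2_147_483_647):
    print(link_id, classify_id(link_id))
```

A check like this could be run on locally assigned LINK_ID values before an ingest, to see at a glance which range each one falls in.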

Neo4j: How to return a single path for each pair of nodes that have multiple relationships

Assuming a graph like this:
(Thanks to https://neo4j.com/blog/neo4j-2-0-ga-graphs-for-everyone/ )
(Not shown but assume all countries, all artists, and all recording contracts are in the graph)
What would the CYPHER be for:
Starting with United Kingdom, return one path for each country where there is at least one recording contract
It doesn't matter which path is returned, just that it's a single path
Should return (United Kingdom)<-[]-(Iron Maiden)-[]->(Epic)-[]->(United States), but not (United Kingdom)<-[]-(Hybrid Theory)-[]->(Mad Decent)-[]->(United States) or (United Kingdom)<-[]-(Iron Maiden)-[]->(Columbia)-[]->(United States), for example
Return a single path for each of any two countries that are connected
Should return one path for (United Kingdom)-[]-(United States), one for (Japan)-[]-(Canada), etc. Bonus points for LIMIT 20 limiting it to either 20 paths or 20 country nodes
Also does not matter which path is returned, just that it's a single path
Edit: I've tried various combinations of MATCH (c1:Country)-[]-(c2:Country), MATCH p=((c1:Country)-[]-(c2:Country)), WITH, and UNWIND. I've also tried to use FOREACH to return only one path, but can't quite get the formula right.
This is easier if you are using subqueries (Neo4j 4.1.x or higher). That's because the subquery scopes the operations you need to perform (collect(), in this case) to the expansions from a single country, one country at a time, instead of performing them across all rows of the entire query, which could stress the heap.
In reality, since the number of countries is low, it won't be a problem here, but it's a good approach to use when dealing with larger sets of nodes.
MATCH (country:Country)
CALL {
  WITH country
  MATCH path = (country)<-[:FROM_AREA]-(:Artist)-[:RECORDING_CONTRACT]->(:Label)-[:FROM_AREA]->(other:Country)
  WHERE id(country) < id(other)
  RETURN other, collect(path)[0] AS path
  LIMIT 20
}
RETURN country, path
LIMIT 20
Let's look at what this is doing.
We MATCH to :Country nodes.
Per country we will MATCH to the pattern you're looking for. If these are the only such paths and labels in the graph, then you can omit the labels in the pattern, as the relationship types should be enough to find the correct nodes.
The WHERE id(country) < id(other) is here to prevent mirrored results. For example, in the course of the query if we find a path from (United Kingdom)-[*]-(United States), and we also find a path the other direction, for (United States)-[*]-(United Kingdom), you probably don't want to return both. So we place a restriction on the graph ids so that only one of these will meet the restriction, and the mirrored result gets filtered out.
We use RETURN other, collect(path)[0] as path to get a single path per the country and other nodes. Remember that this is happening inside a subquery being called per country node, so even though country is not present here, this operation is being performed for a specific country node.
When we aggregate (such as with this collect(path)), the grouping keys (usually the non-aggregation variables) become distinct, so for the country and the other country, this will collect all the paths between them and then take the first of that list of paths, so we get our single path between two distinct countries.
We LIMIT the subquery results to 20, since we know in total we don't want more than 20 paths, so per country we don't want more than 20 paths either. This might be a bit redundant for this case, but when the query is more complex it is the right approach to make sure you're not doing more work than is needed.
We also have another LIMIT outside the subquery, so that if there are only a few countries processed, with a few paths per country, the total paths won't exceed 20.
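If it helps to see the grouping idea outside of Cypher, here is a small Python sketch of "one path per pair of countries". It is only an analogy: one_path_per_pair is a made-up helper, paths are plain tuples of names, and it normalizes each pair by sorting the names, whereas the query above simply drops the mirrored direction by comparing node ids.

```python
def one_path_per_pair(paths, limit=20):
    """Keep the first path seen for each unordered pair of end countries."""
    chosen = {}
    for path in paths:
        start, end = path[0], path[-1]
        if start == end:
            continue  # a path back to the same country is not a pair
        pair = tuple(sorted((start, end)))  # (UK, US) and (US, UK) share one key
        chosen.setdefault(pair, path)       # like collect(path)[0]: first one wins
    return list(chosen.values())[:limit]


paths = [
    ("United Kingdom", "Iron Maiden", "Epic", "United States"),
    ("United Kingdom", "Iron Maiden", "Columbia", "United States"),
    ("United States", "Epic", "Iron Maiden", "United Kingdom"),  # mirrored duplicate
]
for path in one_path_per_pair(paths):
    print(" -> ".join(path))
```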

Filter Google Places API results based on City

For one of my applications, I will let the users choose a City and then an Area. What I want to achieve is that based on the user's city selection, the Area field(which is using the autocomplete from Google Places) to display areas from that City. Eg: If user chooses the city as New York, the Area field should autocomplete only the areas from New York. Is this something which can be achieved?
1] In the autocomplete API, pass the "lat,lng" pair in the "location" parameter and "100000" in the "radius" parameter. This biases the search results toward a 100 km radius around that city.
E.g.: Pass "40.7128,-74.0059" for New York and the results will be biased to within 100 km of New York City.
OR
2] There is a trick you can use. If a user chooses a city, just add the city name as a prefix to the search string. It will then only give search suggestions for the city the user is searching in. E.g., pass "New York" as a prefix in your search string, then type any word, and it will only give you results for New York City restaurants, cafes, places, etc.
You can do it by restricting the results of your autocomplete by a specified area.
Here are the ways that you can use:
Location Biasing - you may bias the results to a specified circle by passing a location and a radius parameter. This instructs the Place Autocomplete service to prefer showing results within that circle. Results outside of the defined area may still be displayed. You can use the components parameter to filter results to show only those places within a specified country.
Location Restrict - it can restrict the results to the region defined by location and a radius parameter, by adding the strictbounds parameter. This instructs the Place Autocomplete service to return only results within that region.
Places Types - you can restrict results from a Place Autocomplete request to be of a certain type by passing a types parameter. The parameter specifies a type or a type collection, as listed in the supported types below. If nothing is specified, all types are returned.
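To make the location, radius and strictbounds parameters described above concrete, here is a small Python sketch that only builds the request URL for the Place Autocomplete web service (it does not send anything). The helper name autocomplete_url and the YOUR_API_KEY placeholder are mine, and the New York coordinates are just an example; check the parameter names against the current Places documentation before relying on this.

```python
from urllib.parse import urlencode

AUTOCOMPLETE_URL = "https://maps.googleapis.com/maps/api/place/autocomplete/json"


def autocomplete_url(text, lat, lng, radius_m=100_000, strict=True,
                     api_key="YOUR_API_KEY"):
    """Build a Place Autocomplete request URL centred on the chosen city."""
    params = {
        "input": text,
        "location": f"{lat},{lng}",  # centre of the selected city
        "radius": radius_m,          # metres
        "key": api_key,              # placeholder, replace with a real key
    }
    if strict:
        # Without strictbounds the circle only biases the results.
        params["strictbounds"] = "true"
    return f"{AUTOCOMPLETE_URL}?{urlencode(params)}"


print(autocomplete_url("Brook", 40.7128, -74.0059))
```

Setting strict=False gives the "location biasing" behaviour, where results outside the circle may still be returned.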
Hope this information helps you.

Graph DB get the next best recommended node in Neo4j cypher

I have a graph using NEO4j and currently trying to build a simple recommendation system that is better than text based search.
Nodes are created such as: Album, People, Type, Chart
Relationship are created such as:
People - [:role] -> Album
where roles are: Artist, Producer, Songwriter
Album-[:is_a_type_of]->Type (type is basically Pop, Rock, Disco...)
People -[:POPULAR_ON]->Chart (Chart is which Billboard they might have been)
People -[:SIMILAR_TO]->People (Predetermined similarity connection)
I have written the following cypher:
MATCH (a:Album { id: { id } })-[:is_a_type_of]->(t)<-[:is_a_type_of]-(recommend)
WITH recommend, t, a
MATCH (recommend)<-[:ARTIST_OF]-(p)
OPTIONAL MATCH (p)-[:POPULAR_ON]->()
RETURN recommend, count(DISTINCT t) AS type
ORDER BY type DESC
LIMIT 25;
It works; however, it easily repeats itself if the album has only one type of music connected to it and therefore has the same neighbors.
Is there a suggested way to say:
Find me the next best album that has the most similar connected relationships to the starting Album.
Any recommendation for a tie-breaker scenario? Right now it is ordered by type (so if an album has more than one type of music it is valued more, but if every album has the same number, there is no more significant criterion).
- I made the [:SIMILAR_TO] link to enforce a priority, so that relationship is considered important, but I haven't had a working cypher with it.
- Same goes for [:POPULAR_ON] (maybe drop this relationship?)
You can use four weights (configurations) and order albums by the resulting value, highest first. Keep each weight between 0 and 1 (e.g., 0.6).
a. People Popular on Chart and People are similar
b. People Popular on Chart and People are Not similar
c. People Not Popular on Chart and People are similar
d. People Not Popular on Chart and People are Not similar
Calculate and sum these four values for each album. The higher the value, the more highly recommended the album.
I have temporarily set the config to a = 1, b = 0.8, c = 0.6, d = 0.4, and assumed that a relationship exists which indicates that some People like an Album. If you are basing the logic on Chart only, then use a and b only.
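MATCH (me:People)
WHERE id(me) = 123
// The seed album is named `album` so it does not clash with the People variable `a` below
MATCH (album:Album { id: 456 })-[:is_a_type_of]->(t:Type)<-[:is_a_type_of]-(recommend)
// a: artists popular on a chart and similar to me
OPTIONAL MATCH (recommend)<-[:ARTIST_OF]-(a:People)-[:POPULAR_ON]->(:Chart)
WHERE exists((me)-[:SIMILAR_TO]->(a))
// b: artists popular on a chart but not similar to me
OPTIONAL MATCH (recommend)<-[:ARTIST_OF]-(b:People)-[:POPULAR_ON]->(:Chart)
WHERE NOT exists((me)-[:SIMILAR_TO]->(b))
// c: people who like the album and are similar to me
OPTIONAL MATCH (recommend)<-[:LIKES]-(c:People)
WHERE exists((me)-[:SIMILAR_TO]->(c))
// d: people who like the album but are not similar to me
OPTIONAL MATCH (recommend)<-[:LIKES]-(d:People)
WHERE NOT exists((me)-[:SIMILAR_TO]->(d))
// DISTINCT keeps the chained OPTIONAL MATCH clauses from multiplying the counts
RETURN recommend, (count(DISTINCT a)*1 + count(DISTINCT b)*0.8 + count(DISTINCT c)*0.6 + count(DISTINCT d)*0.4) AS rec_order
ORDER BY rec_order DESC
LIMIT 10;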
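The scoring itself is just a weighted sum, which may be easier to see in a few lines of Python. This is a sketch with made-up album names and counts; only the weights a to d come from the configuration above.

```python
WEIGHTS = {"a": 1.0, "b": 0.8, "c": 0.6, "d": 0.4}


def recommendation_score(counts):
    """Weighted sum of how many people fall into each of the four groups."""
    return sum(weight * counts.get(group, 0) for group, weight in WEIGHTS.items())


def rank(albums, limit=10):
    """Album names ordered by score, highest first."""
    return sorted(albums, key=lambda name: recommendation_score(albums[name]),
                  reverse=True)[:limit]


# Hypothetical counts per album, purely for illustration.
albums = {
    "Album X": {"a": 2, "b": 1},
    "Album Y": {"b": 3, "d": 2},
    "Album Z": {"c": 1},
}
print(rank(albums))
```

Because the groups carry different weights, two albums with the same number of connected people can still end up with different scores, which is what breaks the ties.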

How does Google Maps decide when to use a specific icon?

I am using the Google Maps Places library to do a search for nearby hospitals, but it returns results that aren't necessarily hospitals (but have 'hospital' as one of their types). However, I've noticed that actual hospitals have a hospital icon on the map, so Google must somehow know which establishments are actually hospitals. Does anyone know if the public has access to this data?
This is the icon I'm referring to: https://www.dropbox.com/s/1jfqcayxavjhlyi/Screenshot%202015-03-17%2017.20.19.png?dl=0
Example of request I'm making:
var request = {
  location: self.location,
  radius: 20000,
  types: ['hospital'],
  keyword: 'hospital'
};
Example result that isn't a hospital:
{"geometry":{"location":{"k":44.815958,"D":-68.808244}},
"icon":"http://maps.gstatic.com/mapfiles/place_api/icons/generic_business-71.png","id":"de6e60bd70b90ba4cb86afe149a60169553607f1",
"name":"Penobscot Community Health Center",
"opening_hours":{"open_now":true,"weekday_text":[]},
"photos":[{"height":320,"html_attributions":[],"width":320}],"place_id":"ChIJj--4INRKrkwRN0z2XkoJtVU",
"rating":3.1,
"reference":"CoQBdAAAADmf3YA0659efzMbCSPOK6SZttkfus7aWBDhrZZyX63Szl256BRcpz81LH6rIuONldYv256tsN7Zv-N6ZkOkJadlD2VS01bs7C4ierKvGUMyJOJu657xL5MvidF3Tgs9iejeJcXsxjDJYOwtN3m3sbfClfWYVnnIL4hMLYV8P9TnEhBurfJv_30CAG2wp1V73POVGhR-7fz1mCdh4OYWSa3Pw0mPupckoQ",
"scope":"GOOGLE",
"types":["hospital","pharmacy","store","health","establishment"],
"vicinity":"1012 Union Street, Bangor",
"html_attributions":[]}
My guess is there are a couple of ways to get around this. You might remove the keyword argument from the request, since it acts like a search term rather than a specific match on a type of location, which is what the types field does.
You may want to be careful about your radius value choice.
Next, if you do a search on Google Maps in general you'll get a broad assortment of results. Do you need every result to be an actual hospital or can you do your own filtering afterwards?
If you do your own filtering it looks like type information and even icons are embedded in the result JSON. You might see if there's a distinguishing characteristic between the types of results you want and filter by that. Otherwise, any additional graphical data would not be accessible via the API.
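As a rough idea of what that client-side filtering could look like, here is a Python sketch that works on result dictionaries shaped like the JSON in the question. The rule it applies is a guess, not something the API documents: it assumes that a place showing the generic business icon is not a real hospital. The second place in the sample data and its icon URL are hypothetical.

```python
def looks_like_hospital(place):
    """Heuristic filter: 'hospital' is listed and the icon is not the generic one."""
    types = place.get("types", [])
    icon = place.get("icon", "")
    # Assumption: real hospitals do not use the generic business icon.
    return "hospital" in types and "generic_business" not in icon


results = [
    {   # taken from the example result in the question
        "name": "Penobscot Community Health Center",
        "icon": "http://maps.gstatic.com/mapfiles/place_api/icons/generic_business-71.png",
        "types": ["hospital", "pharmacy", "store", "health", "establishment"],
    },
    {   # hypothetical entry, only here to show a result that passes the filter
        "name": "Example General Hospital",
        "icon": "https://example.com/icons/hospital.png",
        "types": ["hospital", "health", "establishment"],
    },
]

print([place["name"] for place in results if looks_like_hospital(place)])
```

You would want to look at a batch of real responses first to see whether the icon (or the order of the types) actually separates the two groups.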
