Create a "join" query with data from edge and connected vertex - gremlin

I have a Gremlin API Cosmos DB. In the DB I have one type of vertex with the label User, and these are connected to vertices labeled Company. I want to show all companies connected to a user. Running the query g.V('id-of-User').outE() returns the edges to all connected companies. The result might look something like this:
[
{
"id": "08f97a1d-9e81-4ccc-a498-90eb502b1879",
"label": "AuthorizedSignatory",
"type": "edge",
"inVLabel": "Company",
"outVLabel": "User",
"inV": "abd51134-1524-44fe-8a49-60d2d449a1f3",
"outV": "103bf1b9-464f-4f68-a4ca-7dfdbe94ae84"
},
{
"id": "c36b640b-9574-403b-8ab6-fcce695caa90",
"label": "AuthorizedSignatory",
"type": "edge",
"inVLabel": "Company",
"outVLabel": "User",
"inV": "2c14d279-00a4-41ad-a8c0-f3b882864568",
"outV": "103bf1b9-464f-4f68-a4ca-7dfdbe94ae84"
}
]
This is exactly as expected. Now I want to take this a bit further: instead of just showing the GUID in the inV property, I also want to include the company name in the result object, but I do not understand how to do the equivalent of a SQL join here.
Can someone please help me!!
What I want is something similar to the example below:
[
{
"id": "08f97a1d-9e81-4ccc-a498-90eb502b1879",
"label": "AuthorizedSignatory",
"type": "edge",
"inVLabel": "Company",
"outVLabel": "User",
"inV": "abd51134-1524-44fe-8a49-60d2d449a1f3",
"outV": "103bf1b9-464f-4f68-a4ca-7dfdbe94ae84",
"CompanyName": "ACME CORP"
},
{
"id": "c36b640b-9574-403b-8ab6-fcce695caa90",
"label": "AuthorizedSignatory",
"type": "edge",
"inVLabel": "Company",
"outVLabel": "User",
"inV": "2c14d279-00a4-41ad-a8c0-f3b882864568",
"outV": "103bf1b9-464f-4f68-a4ca-7dfdbe94ae84",
"CompanyName": "Giganticorp"
}
]
Here CompanyName is one of the properties of the Company vertex whose GUID is in the inV property.

There is no "join". The data is already connected by way of the edge, so you simply need to traverse further along your graph to get the "CompanyName".
g.V('id-of-User').out().values("CompanyName")
That shows you all of the names of the companies related to that user. If you still want to show the data from the edge in addition to the company name, as in your examples, then project() the edge, being specific about what you want:
g.V('id-of-User').outE().
project('eid','label','companyName').
by(T.id).
by(T.label).
by(inV().values("CompanyName"))
Again, note that there is no "join" for the "CompanyName". As the data is implicitly joined by way of the edge you just need to traverse over inV() to reach the data there.
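To make the shape of that project() result concrete, here is a small Python sketch that mimics the traversal over an in-memory copy of the example data from the question. It does not talk to Cosmos DB or use any Gremlin driver; the `vertices` and `edges` structures and the `project_edges` helper are hypothetical names introduced only for this illustration.

```python
# Hypothetical in-memory stand-in for the graph in the question.
# Vertex ids and company names are copied from the question's examples.
vertices = {
    "abd51134-1524-44fe-8a49-60d2d449a1f3": {"label": "Company", "CompanyName": "ACME CORP"},
    "2c14d279-00a4-41ad-a8c0-f3b882864568": {"label": "Company", "CompanyName": "Giganticorp"},
}

edges = [
    {
        "id": "08f97a1d-9e81-4ccc-a498-90eb502b1879",
        "label": "AuthorizedSignatory",
        "inV": "abd51134-1524-44fe-8a49-60d2d449a1f3",
        "outV": "103bf1b9-464f-4f68-a4ca-7dfdbe94ae84",
    },
    {
        "id": "c36b640b-9574-403b-8ab6-fcce695caa90",
        "label": "AuthorizedSignatory",
        "inV": "2c14d279-00a4-41ad-a8c0-f3b882864568",
        "outV": "103bf1b9-464f-4f68-a4ca-7dfdbe94ae84",
    },
]


def project_edges(user_id):
    """Mimic outE().project('eid', 'label', 'companyName') for one user."""
    return [
        {
            "eid": edge["id"],
            "label": edge["label"],
            # Equivalent of by(inV().values("CompanyName")): follow the
            # edge to its in-vertex and read the property there.
            "companyName": vertices[edge["inV"]]["CompanyName"],
        }
        for edge in edges
        if edge["outV"] == user_id
    ]


result = project_edges("103bf1b9-464f-4f68-a4ca-7dfdbe94ae84")
for row in result:
    print(row)
```

Each row carries the edge id, the edge label, and the name read from the vertex at the other end of the edge, which is the same shape the project() step returns.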

Related

Gremlin querying Edge inVLabel, outVLabel

I have the following example edge labeled "posts". "posts" can have multiple types of parent vertex (outVLabel) such as "channel", "publisher", "user", etc. How do you query for all edges that have an outVLabel of "channel" without interrogating the label on the out() vertex? I want an array of "posts" edges returned.
Query:
g.E().hasLabel('posts').has(???, 'channel')
Edge object:
[{
"id": "83c972b0-315d-49fe-a735-882c4dcbdaa2",
"label": "posts",
"type": "edge",
"inVLabel": "article",
"outVLabel": "channel",
"inV": "7410b6c8-ed70-4a00-800c-489d596907da",
"outV": "c8c5f45d-0195-49c5-b7ae-9eda1d441bc9",
"properties": {
"service": "rss"
}
}]
You would have to do:
g.E().hasLabel('posts').where(outV().hasLabel('channel'))
or if necessary, denormalize and place the outgoing vertex label on the edge as a property, in which case you could then do:
g.E().has('posts', 'outVLabel', 'channel')

Get vertices with a simpler format

Is there a way to get a list of vertices with a simpler format?
Currently, the following query:
g.V().has(label, 'Quantity').has('text', '627 km');
returns an object like this:
{
"id": 42545168,
"label": "Quantity",
"type": "vertex",
"properties": {
"sentence": [
{
"id": "pkbgi-pbw28-745",
"value": "null"
}
],
"updated_text": [
{
"id": "pk9vm-pbw28-5j9",
"value": "627 km"
}
],[...]
And when I get a list of edges it is formatted in a simpler format:
g.E().has(label, 'locatedAt').has('out_entity_id','41573-41579');
returns:
{
"id": "ozfnt-ip8o-2mtx-g8vs",
"label": "locatedAt",
"type": "edge",
"inVLabel": "Location",
"outVLabel": "Location",
"inV": 758008,
"outV": 872520,
"properties": {
"sentence": "Bolloré is a corporation (société anonyme) with a Board of Directors whose registered offi ce is located at Odet, 29500 Ergué-Gabéric in France.",
"in_entity_id": "41544-41548",
"score": "0.795793",
"out_entity_id": "41573-41579"
}
}
How so?
Is there a way to get vertices formatted this way?
My advice is to return only the specific properties you are interested in, rather than having your query return the whole vertex. For example, return the vertex ID, a few selected properties, or a valueMap. Otherwise what you get back is essentially everything. This is the same as avoiding "select *" in SQL and selecting only what you really care about.
Edited to add an example that returns the IDs of matching vertices.
g.V().has(label, 'Quantity').has('text', '627 km').id().fold()
This will yield a result that looks like this:
{"requestId":"73f40519-87c8-4037-a9fc-41be82b3b227","status":{"message":"","code":200,"attributes":{}},"result":{"data":[[20608,28920,32912,106744,123080,135200,139296,143464,143488,143560,151584,155688,155752,159784,188520,254016,282688,286968,311360,323832,348408,4344,835648,8336,1343616,12352]],"meta":{}}}
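If changing the query is not an option, the same simplification can also be done on the client. The following Python sketch flattens the verbose vertex format shown in the question into a plain dictionary. The `flatten_vertex` helper is a hypothetical name, and the sample vertex contains only the two properties visible in the truncated example above.

```python
def flatten_vertex(vertex):
    """Collapse the {"key": [{"id": ..., "value": ...}]} property format
    into a plain {"key": value} dictionary."""
    flat = {"id": vertex["id"], "label": vertex["label"]}
    for key, entries in vertex.get("properties", {}).items():
        values = [entry["value"] for entry in entries]
        # Keep a single value as a scalar; keep multi-valued
        # properties as a list.
        flat[key] = values[0] if len(values) == 1 else values
    return flat


# Only the properties visible in the truncated example are included.
vertex = {
    "id": 42545168,
    "label": "Quantity",
    "type": "vertex",
    "properties": {
        "sentence": [{"id": "pkbgi-pbw28-745", "value": "null"}],
        "updated_text": [{"id": "pk9vm-pbw28-5j9", "value": "627 km"}],
    },
}

simple = flatten_vertex(vertex)
print(simple)
```

Selecting the properties in the query itself is still preferable, since it avoids sending the unwanted data over the wire in the first place.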

correct GeoJSON format. map visualization

First things first: is this data in proper GeoJSON format?
According to the definition of GeoJSON data, as you can see by the MultiPoint & coordinates, I think it is.
It looks like this:
{
"lang": {
"code": "en",
"conf": 1.0
},
"group": "JobServe",
"description": "Work with the data science team to build new products and integrate analytics\ninto existing workflows. Leverage big data solutions, advanced statistical\nmethods, and web apps. Coordinate with domain experts, IT operations, and\ndevelopers. Present to clients.\n\n * Coordinate the workflow of the data science team\n * Join a team of experts in big data, advanced analytics, and visualizat...",
"title": "Data Science Team Lead",
"url": "http://www.jobserve.com/us/en/search-jobs-in-Columbia,-Maryland,-USA/DATA-SCIENCE-TEAM-LEAD-99739A4618F8894B/",
"geo": {
"type": "MultiPoint",
"coordinates": [
[
-76.8582049,
39.2156213
]
]
},
"tags": [
"Job Board"
],
"spider": "jobserveNa",
"employmentType": [
"Unspecified"
],
"lastSeen": "2015-05-13T01:21:07.240000",
"jobLocation": [
"Columbia, Maryland, United States of America"
],
"identifier": "99739A4618F8894B",
"hiringOrganization": [
"Customer Relation Market Research Company"
],
"firstSeen": "2015-05-13T01:21:07+00:00"
},
I want to visualize this as a "zoomable" (viz. interactive) map, as in the examples on the d3js website.
I'm trying to use a tool called mapshaper.org to see an initial visualization of the data in map form, but when I load it up, nothing happens.
To me this doesn't make sense because, according to their website, one can simply
Drag and drop or select a file to import.
Shapefile, GeoJSON and TopoJSON files and Zip archives are supported.
However, in my case it is not working.
Does anyone have any intuition as to what might be going wrong, or a suggestion for a comparable tool to create a zoomable map out of GeoJSON data?
According to the definition of GeoJSON data, I have what I think constitutes data in that format
Well, you don't have a proper GeoJSON object. Just compare what you've got against the example you've linked. It doesn't even come close. That's why mapshaper doesn't know what to do with the JSON you load into it.
A GeoJSON object with the type "FeatureCollection" is a feature collection object. An object of type "FeatureCollection" must have a member with the name "features". The value corresponding to "features" is an array. Each element in the array is a feature object as defined above.
A feature collection looks like this:
{
"type": "FeatureCollection",
"features": [
// Array of features
]
}
http://geojson.org/geojson-spec.html#feature-collection-objects
A GeoJSON object with the type "Feature" is a feature object. A feature object must have a member with the name "geometry". The value of the geometry member is a geometry object as defined above or a JSON null value. A feature object must have a member with the name "properties". The value of the properties member is an object (any JSON object or a JSON null value). If a feature has a commonly used identifier, that identifier should be included as a member of the feature object with the name "id".
A feature looks like this:
{
"id": "Foo",
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [0, 0]
},
"properties": {
"label": "My Foo"
}
}
http://geojson.org/geojson-spec.html#feature-objects
Here are examples of the different geometry objects a feature can support: http://geojson.org/geojson-spec.html#appendix-a-geometry-examples
Put those two together, it would look like this:
{
"type": "FeatureCollection",
"features": [{
"id": "Foo",
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [0, 0]
},
"properties": {
"label": "My Foo"
}
},{
"id": "Bar",
"type": "Feature",
"geometry": {
"type": "LineString",
"coordinates": [
[100.0, 0.0],
[101.0, 1.0]
]
},
"properties": {
"label": "My Bar"
}
}]
}
That really doesn't look like the JSON you've posted. You'll need to convert it to proper GeoJSON somehow, either with a custom script or manually. It's a format I've never seen before, sorry to say.
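As a rough idea of what such a custom script could look like, here is a minimal Python sketch. It assumes that the "geo" member of each record is already a valid GeoJSON geometry object (as the MultiPoint in the question appears to be) and that every other member can simply be moved into "properties". The helper names and the choice of "identifier" as the feature id are assumptions made for this example, and the sample record is trimmed to a few fields.

```python
import json


def to_feature(record):
    """Wrap one record in a GeoJSON Feature.

    Assumes record["geo"] is already a GeoJSON geometry object.
    """
    properties = {key: value for key, value in record.items() if key != "geo"}
    feature = {
        "type": "Feature",
        "geometry": record.get("geo"),  # None becomes JSON null, which the spec allows
        "properties": properties,
    }
    # Assumption: reuse the record's own identifier as the feature id.
    if "identifier" in record:
        feature["id"] = record["identifier"]
    return feature


def to_feature_collection(records):
    """Wrap a list of records in a GeoJSON FeatureCollection."""
    return {
        "type": "FeatureCollection",
        "features": [to_feature(record) for record in records],
    }


# Trimmed copy of the record from the question.
records = [
    {
        "title": "Data Science Team Lead",
        "identifier": "99739A4618F8894B",
        "geo": {
            "type": "MultiPoint",
            "coordinates": [[-76.8582049, 39.2156213]],
        },
    }
]

collection = to_feature_collection(records)
print(json.dumps(collection, indent=2))
```

The printed output has the FeatureCollection shape described above and can be saved to a file for loading into a GeoJSON viewer.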

Freebase beginner: getting an athlete's sport

I'm trying to use Freebase to find out what team a professional athlete belongs to.
So I'm trying to do something like this
[{
"id": null,
"name": "Kobe Bryant",
"type": "/sports/pro_athlete",
"sports_played": []
}]
and then extract the property "sports_played" to find out what sport the player belongs to. My plan is to then do a more specific query for "basketball_player" or so until I find the team name. (Is there a simpler way to do this?)
However, I already fail at the first step, because in the results, while the properties sports_played and sport_played_professionally contain a single entry, that entry is null:
{
"code": "/api/status/ok",
"result": [{
"id": "/en/kobe_bryant",
"name": "Kobe Bryant",
"sports_played": [
null
],
"type": "/sports/pro_athlete"
}],
"status": "200 OK",
"transaction_id": "cache;cache03.p01.sjc1:8101;2012-06-13T13:30:20Z;0053"
}
I'm confused: I know from browsing the database that there should be a sport value for this player. And the result clearly shows that there is a single value in the "sports_played" list in the result.
But why is it null? Shouldn't it rather be a reference to a Basketball object?
Your query is asking for a list of sports_played, but since you only used square brackets it will only return a list of the names of all the matching topics.
If you add curly braces to the query, you'll see that sports_played actually returns one topic with name = null (which is what your previous query was showing):
[{
"id": null,
"name": "Kobe Bryant",
"type": "/sports/pro_athlete",
"sports_played": [{}]
}]
This is because the expected value of sports_played is a CVT called Sports played which links athletes to sports for specific periods of time. This is so that we can keep track of athletes that have played multiple sports and know which one is the most current.
If you want to see the values inside the CVT object, you'll need to drill down further like this:
[{
"id": null,
"name": "Kobe Bryant",
"type": "/sports/pro_athlete",
"sports_played": [{
"type": "/sports/pro_sports_played",
"sport": {
"id": null,
"name": null
},
"career_start": null,
"career_end": null
}]
}]
Try it in the Query Editor
The sports_played property isn't really what you want here since it's not necessarily correlated with the properties which contain the team information.
Instead you want to use something along the lines of:
[{
"id": null,
"name": "Kobe Bryant",
"/basketball/basketball_player/team" : [{"team":null}]
}]
If you wanted to get all the teams for all the Kobe Bryants, you could use something like:
[{
"id": null,
"name": "Kobe Bryant",
"/soccer/football_player/current_team" : [{"team":null,"optional":true}],
"/basketball/basketball_player/team" : [{"team":null,"optional":true}],
"/american_football/football_player/current_team" :[{"team":null,"optional":true}]
}]
Unfortunately you'll need to go through the schema and pull out the properties of interest by hand, since they're not regular enough to query automatically. There really aren't that many sports to consider, though, so it shouldn't take very long to assemble your list.

freebase city regions above neighborhood

Using Freebase, how can I find, say, all the boroughs/subcities of NY (Queens, Brooklyn, etc.)?
And will it be similar to other cities? Say if I want to know the subdivisions of Prague (Zizkov, Old Town, etc.) or Berlin, etc?
I've tried various combos but haven't hit one yet.
{
"id": "/en/new_york",
"guid": null,
"name": null,
"/location/location/containedby": [
],
"/location/location/contains" : [],
"/location/place_with_neighborhoods/neighborhoods": [
]
}
The property /location/location/contains is the one that you want, but you're going to have two problems:
1. It's only sparsely populated
2. It has multiple levels of containment as a hack to work around API limitations
There's not much you can do about #1 unless you want to work on improving the data yourself. For #2, you can subtract the set of locations which are contained in another location in the "contains" set.
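The subtraction described for #2 can be sketched in Python as follows. The `contains` mapping is made-up sample data standing in for what you would assemble from the query results (each location contained by the city, mapped to the locations it contains in turn), and `direct_children` is a hypothetical helper name.

```python
def direct_children(contains):
    """Return the locations that are not contained in any other
    location of the same "contains" set.

    `contains` maps every location contained by the city to the set of
    locations that it, in turn, contains.
    """
    nested = set()
    for children in contains.values():
        nested |= children
    # Everything in the set, minus anything that sits inside another member.
    return set(contains) - nested


# Illustrative sample data, not actual Freebase output.
contains = {
    "Brooklyn": {"Williamsburg"},
    "Queens": {"Astoria"},
    "Williamsburg": set(),
    "Astoria": set(),
}

top_level = direct_children(contains)
print(sorted(top_level))
```

Only locations that no other member of the set contains survive the subtraction, which leaves the top level of subdivisions.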
Someone might be able to give a better answer, but this will get the major districts for a city like NY. It probably won't work for smaller cities, whose subdivisions are more like regions.
{
"id": "/en/new_york",
"guid": null,
"name": null,
"/location/location/containedby": [
],
"/location/location/contains" : [{
"name": null,
"type": "/location/citytown"
}]
}
Or, to match any of several types that might apply:
{
"id": "/en/new_york",
"guid": null,
"name": null,
"/location/location/containedby": [
],
"/location/location/contains" : [{
"name": null,
"type|=" : [
"/location/citytown",
"/location/neighborhood",
"/location/administrative_division",
"/location/de_borough",
"/location/place_with_neighborhoods/neighborhoods"
]
}]
}
