How to represent the Data Model of a Graph

We have been developing some graph-based analysis tools, using Neo4j as the persistence engine in the background. As part of this we are developing a graph data model suitable for our domain, and we want to use it in the application layer to restrict the types of nodes, or to ensure that nodes of certain types carry certain properties: the normal data model restrictions.
So that's the background. What I am asking is whether there is a standard way to represent a data model for a graph DB. The graph equivalent of an XSD, perhaps?

There's an open-source project supporting strong schema definitions in Neo4j: Structr (http://structr.org, see it in action: http://vimeo.com/structr/videos)
With Structr, you can define an in-graph schema of your data model, including:
Type inheritance
Supported data types: Boolean, String, Integer, Long, Double, Date, Enum (+ values)
Default values
Cardinality (1:1, 1:*, *:1)
Not-null constraints
Uniqueness constraints
Full type safety
Validation
Cardinality enforcement
Support for methods (custom action) is currently being added to the schema.
The schema can be edited with an editor, or directly via REST by modifying the JSON representation of the data model:
{
"query_time": "0.001618446",
"result_count": 4,
"result": [
{
"name": "Whisky",
"extendsClass": null,
"relatedTo": [
{
"id": "96d05ddc9f0b42e2801f06afb1374458",
"name": "Flavour"
},
{
"id": "28f85dca915245afa3782354ea824130",
"name": "Location"
}
],
"relatedFrom": [],
"id": "df9f9431ed304b0494da84ef63f5f2d8",
"type": "SchemaNode",
"_name": "String"
},
{
"name": "Flavour",
...
},
{
"name": "Location",
...
},
{
"name": "Region",
...
}
],
"serialization_time": "0.000829985"
}
{
"query_time": "0.001466743",
"result_count": 3,
"result": [
{
"name": null,
"sourceId": "28f85dca915245afa3782354ea824130",
"targetId": "e4139c5db45a4c1cbfe5e358a84b11ed",
"sourceMultiplicity": null,
"targetMultiplicity": "1",
"sourceNotion": null,
"targetNotion": null,
"relationshipType": "LOCATED_IN",
"sourceJsonName": null,
"targetJsonName": null,
"id": "d43902ad7348498cbdebcd92135926ea",
"type": "SchemaRelationship",
"relType": "IS_RELATED_TO"
},
{
"name": null,
"sourceId": "df9f9431ed304b0494da84ef63f5f2d8",
"targetId": "96d05ddc9f0b42e2801f06afb1374458",
"sourceMultiplicity": null,
"targetMultiplicity": "1",
"sourceNotion": null,
"targetNotion": null,
"relationshipType": "HAS_FLAVOURS",
"sourceJsonName": null,
"targetJsonName": null,
"id": "bc9a6308d1fd4bfdb64caa355444299d",
"type": "SchemaRelationship",
"relType": "IS_RELATED_TO"
},
{
"name": null,
"sourceId": "df9f9431ed304b0494da84ef63f5f2d8",
"targetId": "28f85dca915245afa3782354ea824130",
"sourceMultiplicity": null,
"targetMultiplicity": "1",
"sourceNotion": null,
"targetNotion": null,
"relationshipType": "PRODUCED_IN",
"sourceJsonName": null,
"targetJsonName": null,
"id": "a55fb5c3cc29448e99a538ef209b8421",
"type": "SchemaRelationship",
"relType": "IS_RELATED_TO"
}
],
"serialization_time": "0.000403616"
}
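To read the two responses above together, here is a minimal Python sketch that joins the schema relationships to the schema node names by id. The sample data is copied from the excerpts above (only the fields that are used); the function name describe_relationships is hypothetical and not part of Structr. Ids whose node is not shown in the excerpt are printed as raw ids.

```python
# Schema nodes whose ids are visible in the excerpt above.
schema_nodes = [
    {"id": "df9f9431ed304b0494da84ef63f5f2d8", "name": "Whisky"},
    {"id": "96d05ddc9f0b42e2801f06afb1374458", "name": "Flavour"},
    {"id": "28f85dca915245afa3782354ea824130", "name": "Location"},
]

# Schema relationships, reduced to the fields needed here.
schema_relationships = [
    {"sourceId": "28f85dca915245afa3782354ea824130",
     "targetId": "e4139c5db45a4c1cbfe5e358a84b11ed",
     "relationshipType": "LOCATED_IN"},
    {"sourceId": "df9f9431ed304b0494da84ef63f5f2d8",
     "targetId": "96d05ddc9f0b42e2801f06afb1374458",
     "relationshipType": "HAS_FLAVOURS"},
    {"sourceId": "df9f9431ed304b0494da84ef63f5f2d8",
     "targetId": "28f85dca915245afa3782354ea824130",
     "relationshipType": "PRODUCED_IN"},
]


def describe_relationships(nodes, relationships):
    """Return one '(Source)-[:TYPE]->(Target)' line per schema relationship."""
    names = {node["id"]: node["name"] for node in nodes}
    lines = []
    for rel in relationships:
        # Fall back to the raw id when the node is not in the excerpt.
        source = names.get(rel["sourceId"], rel["sourceId"])
        target = names.get(rel["targetId"], rel["targetId"])
        lines.append(f"({source})-[:{rel['relationshipType']}]->({target})")
    return lines


for line in describe_relationships(schema_nodes, schema_relationships):
    print(line)
```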
You can access nodes and relationships stored in Neo4j as JSON objects through a RESTful API which is dynamically configured based on the in-graph schema.
$ curl "try.structr.org:8082/structr/rest/whiskies?name=Ardbeg"
{
"query_time": "0.001267211",
"result_count": 1,
"result": [
{
"flavour": {
"name": "J",
"description": "Full-Bodied, Dry, Pungent, Peaty and Medicinal, with Spicy, Feinty Notes.",
"id": "626ba892263b45e29d71f51889839ebc",
"type": "Flavour"
},
"location": {
"region": {
"name": "Islay",
"id": "4c7dd3fe2779492e85bdfe7323cd78ee",
"type": "Region"
},
"whiskies": [
...
],
"name": "Port Ellen",
"latitude": null,
"longitude": null,
"altitude": null,
"id": "47f90d67e1954cc584c868e7337b6cbb",
"type": "Location"
},
"name": "Ardbeg",
"id": "2db6b3b41b70439dac002ba2294dc5e7",
"type": "Whisky"
}
],
"serialization_time": "0.010824154"
}
In the UI, there's also a data editing (CRUD) tool, and CMS components that support building web applications on Neo4j.
Disclaimer: I'm a developer of Structr and founder of the project.

No, there's no standard way to do this. Indeed, even if there were, keep in mind that the only constraints that neo4j currently supports are uniqueness constraints.
Take for example some sample rules:
All nodes labeled :Person must have non-empty properties fname and lname
All nodes labeled :Person must have >= 1 outbound relationship of type :works_for
The trouble with neo4j at present is that even if you had a standardized schema language that could express these rules, the db engine itself would have no way to enforce them.
So the simple answer is no, there's no standard way of doing that right now.
A few tricks I've seen people use to simulate the same:
Assemble a list of "test suite" cypher queries, with known results. Query for things you know shouldn't be there; non-empty result sets are a sign of a problem/integrity violation. Query for things you know should be there; empty result sets are a problem.
Application-level control -- via some layer like spring-data or similar, control who can talk to the database. This essentially moves your data integrity/testing problem up into the app, away from the database.
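The "test suite" trick can be sketched roughly as follows in Python. This is only an illustration under some assumptions: run_query stands for whatever function your driver or application layer provides to execute a Cypher string and return a list of rows, and the Check type, the CHECKS list and the third check are made up for this example. The first two checks correspond to the two sample :Person rules.

```python
from typing import Callable, List, NamedTuple


class Check(NamedTuple):
    description: str
    cypher: str
    expect_empty: bool  # True: any returned row counts as a violation


# Hypothetical checks; the first two mirror the sample :Person rules.
CHECKS = [
    Check(
        "Person nodes missing fname or lname",
        "MATCH (p:Person) "
        "WHERE p.fname IS NULL OR p.fname = '' "
        "OR p.lname IS NULL OR p.lname = '' "
        "RETURN p LIMIT 10",
        True,
    ),
    Check(
        "Person nodes without an outbound works_for relationship",
        "MATCH (p:Person) WHERE NOT (p)-[:works_for]->() RETURN p LIMIT 10",
        True,
    ),
    # Something that should be there: an empty result is the problem.
    Check("At least one Person exists", "MATCH (p:Person) RETURN p LIMIT 1", False),
]


def run_checks(run_query: Callable[[str], list], checks=CHECKS) -> List[str]:
    """Run every check and return the descriptions of the ones that failed."""
    failures = []
    for check in checks:
        rows = run_query(check.cypher)
        is_empty = len(rows) == 0
        if is_empty != check.expect_empty:
            failures.append(check.description)
    return failures


if __name__ == "__main__":
    # Stand-in for a real database call: pretend the database is empty.
    print(run_checks(lambda cypher: []))
```

Because the runner only takes a callable, the same checks can be run from a scheduled job or a test framework without changing them.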
It's a common (and IMHO annoying) aspect of many NoSQL solutions (not specifically neo4j) that because of their schema-weakness, they tend to force validation up the tech stack into the application. Doing these things in the application tends to be harder and more error-prone. SQL databases permit you to implement all sorts of schema constraints, triggers, etc -- specifically to make it really damn hard to put the wrong data into the database. The NoSQL databases typically either aren't there yet, or don't do this as a design decision. There are indeed flexibility/performance tradeoffs. Databases can insert faster and be more flexible to adapt quickly if they aren't burdened with checking each atom of data against a long list of schema rules.
EDIT: Two relevant resources: the metagraphs proposal talks about how you could represent the schema as a graph, and neoprofiler is an application that attempts to infer the actual structure of a neo4j database and show you its "profile".
With time, I think it's reasonable to hope that neo would include basic integrity features like requiring certain labels to have certain properties (the example above), restricting the data types of certain properties (lname must always be a String, never an integer), and so on. The graph data model is a bit wild and woolly though (in the computational complexity sense), and there are some constraints on graphs that people would desperately want but will probably never get. An example would be the constraint that a graph can't have cycles in it. Enforcing that on the creation of every relationship would be very computationally intensive.
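To see why the no-cycles constraint is expensive, here is a small in-memory Python sketch (not how any database implements it; the class and method names are invented for illustration). Before adding a relationship source -> target, it has to search everything reachable from target to make sure source is not among it, so in the worst case each insert walks a large part of the graph.

```python
class AcyclicGraph:
    """Directed graph that refuses relationships which would close a cycle."""

    def __init__(self):
        self._out = {}  # node -> set of nodes it points to

    def _reachable(self, start, goal):
        # Depth-first search from start, following outgoing relationships.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self._out.get(node, ()))
        return False

    def add_relationship(self, source, target):
        # If source is already reachable from target, the new edge closes a cycle.
        if self._reachable(target, source):
            raise ValueError(f"{source} -> {target} would create a cycle")
        self._out.setdefault(source, set()).add(target)


if __name__ == "__main__":
    g = AcyclicGraph()
    g.add_relationship("a", "b")
    g.add_relationship("b", "c")
    try:
        g.add_relationship("c", "a")
    except ValueError as err:
        print(err)
```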

Related

Workfront API: Human readable status for Issues and Projects

https://support.workfront.com/hc/en-us/articles/115003574147-API-Basics
https://support.workfront.com/hc/en-us/categories/202718477
Querying objects (i.e.: GET /attask/api/v9.0/project/4c78821c0000d6fa8d5e52f07a1d54d0) returns a response similar to the following:
...
"status": "INP",
...
How do you get human-readable statuses from these responses? Do you hard-code all possible values, or can it be achieved using queries?
i.e.: "INP" > "In Progress"
You can pull these values from the Custom Enumeration table (CSTEM). For example, this will return all the task statuses with their labels and descriptions:
https://subdomain.my.workfront.com/attask/api/v9.0/CSTEM/search?apiKey={{apiKey}}&fields=*&enumClass=STATUS_TASK&enumClass_Mod=in
You will get something like this:
{
"color": "FF3939",
"equatesWith": "CPL",
"groupID": "5419c94f00004a056282a15eed58e47f",
"label": "Complete",
"objCode": "CSTEM",
"value": "CPL",
"ID": "57ed3a2000477cfb7368beb5d995bf88",
"customerID": "540f5a3f0019b...",
"description": "Task is fully completed",
"enumClass": "STATUS_TASK",
"extRefID": null,
"isPrimary": true,
"valueAsInt": null,
"valueAsString": "CPL"
},
For issues use "STATUS_OPTASK".
I created a dictionary from these entries and translate the short status codes while reading the responses (INP > In Progress).
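A minimal Python sketch of that dictionary approach, assuming you have already fetched the CSTEM entries and parsed them into a list of dicts. The function names are hypothetical, only the value and label fields are used, and the INP entry is an illustrative one built from the example in the question.

```python
def build_status_labels(cstem_entries):
    """Map short status codes (e.g. 'CPL') to their human-readable labels."""
    return {entry["value"]: entry["label"] for entry in cstem_entries}


def humanize_status(obj, labels):
    """Return the label for obj['status'], or the raw code if it is unknown."""
    code = obj.get("status")
    return labels.get(code, code)


if __name__ == "__main__":
    # Entries reduced to the two fields used here.
    entries = [
        {"value": "CPL", "label": "Complete"},
        {"value": "INP", "label": "In Progress"},  # illustrative entry
    ]
    labels = build_status_labels(entries)
    print(humanize_status({"status": "INP"}, labels))
```

Falling back to the raw code keeps the output usable when a custom status was added after the dictionary was built.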

Why does Freebase think that all wineries lack official websites?

I’m trying to query for wine producers and their websites on Freebase with this query:
[{
  "/common/topic/official_website": [],
  "id": null,
  "name": null,
  "type": "/wine/wine_producer"
}]
Here it is in the Freebase query editor:
http://www.freebase.com/query?lang=%2Flang%2Fen&q=%5B%7B%22%2Fcommon%2Ftopic%2Fofficial_website%22%3A%5B%5D%2C%22id%22%3Anull%2C%22name%22%3Anull%2C%22type%22%3A%22%2Fwine%2Fwine_producer%22%7D%5D
Why do none of the vineyards have official websites? That seems like an unlikely coincidence. Also, none of the other properties of included types have non-null values.
How do I tell Freebase to obtain the properties of included types in addition to the ones on the wine producer type itself?
False premise. 185 of them do have values for the official web site:
[{
"/common/topic/official_website": [{
"value": null
}],
"id": null,
"name": null,
"type": "/wine/wine_producer",
"return": "count"
}]
You need to forget about the notion of included types for anything related to MQL querying. MQL doesn't know and doesn't care.
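For comparing the two queries side by side, here is a small Python sketch that builds both as plain data and turns one into a query editor link. The link format is inferred from the URL in the question (compact JSON, percent-encoded, in the q parameter); the function name query_editor_url is made up. As far as I understand MQL, the bare [] in the first query does not filter out topics without a value, while the [{"value": null}] form in the second only matches topics that have at least one; treat that reading as an assumption.

```python
import json
from urllib.parse import quote, urlencode

# The query from the question: official_website is requested but not required.
all_producers = [{
    "/common/topic/official_website": [],
    "id": None,
    "name": None,
    "type": "/wine/wine_producer",
}]

# The counting query from the answer: a value must be present.
producers_with_site = [{
    "/common/topic/official_website": [{"value": None}],
    "id": None,
    "name": None,
    "type": "/wine/wine_producer",
    "return": "count",
}]


def query_editor_url(mql_query, lang="/lang/en"):
    """Serialize an MQL query as compact JSON and percent-encode it into a link."""
    compact = json.dumps(mql_query, separators=(",", ":"))
    params = urlencode({"lang": lang, "q": compact}, quote_via=quote)
    return "http://www.freebase.com/query?" + params


if __name__ == "__main__":
    print(query_editor_url(all_producers))
    print(json.dumps(producers_with_site, indent=2))
```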

Freebase Obtain All Information On One Subject

I'm trying to find the best way to get the information displayed on a Freebase page via a MQL query.
I've tried the topic API but that includes a lot of metadata.
I've also tried using links/reflection as in:
{
"id": "/en/samsung_electronics",
"/type/reflect/any_master": [{
"link": {
"master_property": null
},
"name": null,
"id": null
}],
"/type/reflect/any_reverse": [{
"link": {
"master_property": null
},
"name": null,
"id": null
}],
"/type/reflect/any_value": [{
"link": {
"master_property": null
},
"value": null
}]
}
But that means I'll be missing some information, such as the number of employees because that's given as a "Dated Integer" which, of course, doesn't get automatically expanded and I won't know what I would have to expand in general. My best attempts at expanding all objects by nesting that query once in itself were met with a
"code": 503,
"message": "Backend Error"
In RDF/SPARQL (e.g. DBpedia) I'd just do select ?p ?o where {URI ?p ?o} and select ?s ?p where {?s ?p URI}, am I missing such a simple way to do this in Freebase?
So to summarize, I'm looking for a way to get the information on a Freebase HTML page with as little overhead as possible and without missing anything.
The Topic API was designed specifically for this use case (and is what's used to construct the Freebase HTML page). It takes a filter parameter which can be used to tailor its output to include only parts of the schema which are of interest. What metadata is getting in your way? Why can't you just skip it?
If you'd prefer to use SPARQL, there's an RDF dump available that you could load in your own triple store and query with SPARQL.

Relate two entities using properties in Freebase

I want to find out how Wenjin SU and Jimei University are related in Freebase. I have found out that Wenjin SU has the type /business/board_member/ which has the property /business/board_member/leader_of. How can I use this information in a Freebase MQL query to extract the term or mid of Jimei University?
If you go to the Freebase page for Wenjin SU, you'll see that he has the type /business/board_member/ and that under that section it lists him as the /business/board_member/leader_of Jimei University.
The first thing you should do is go to the Query Editor and create a skeleton MQL query for that relationship:
{
"id": "/m/0sxhm9v",
"name": null,
"/business/board_member/leader_of": [{}]
}
When you run this query you get the following result:
{
"result": {
"name": "Wenjin SU",
"/business/board_member/leader_of": [{
"name": null,
"type": [
"/organization/leadership"
],
"id": "/m/0sxhm9s"
}],
"id": "/m/0sxhm9v"
}
}
This is not quite what you were asking for. It's saying that he is the leader_of an unnamed topic /m/0sxhm9s. Now, if you visit the Freebase page for that topic, you'll see that it's a mediator node that connects a person and their role to an organization for a specific date range. You'll also notice that Jimei University is listed as the /organization/leadership/organization on this page.
We can now add this mediated property to our MQL query to get the full relationship that you're looking for:
{
"id": "/m/0sxhm9v",
"name": null,
"/business/board_member/leader_of": [{
"/organization/leadership/organization": {
}
}]
}
If you're building an application that has a pre-determined set of relationships like this then you can use this process of exploring the Freebase data to build MQL queries for those relationships. If you're looking to find any arbitrary connection between any two entities in Freebase then you'll need to download the Freebase Data Dumps and run a shortest path algorithm over the entire graph.
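For the arbitrary-connection case, the shortest path idea can be sketched in Python as a breadth-first search over (subject, property, object) triples. This is a toy in-memory version, not something that would scale to the full data dumps as written. The two triples are the ones found above; "jimei_university" is a placeholder id, since the real mid is not given here.

```python
from collections import deque


def shortest_path(triples, start, goal):
    """Breadth-first search over triples, ignoring edge direction."""
    neighbours = {}
    for subject, _prop, obj in triples:
        neighbours.setdefault(subject, set()).add(obj)
        neighbours.setdefault(obj, set()).add(subject)

    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        # sorted() only makes the traversal order deterministic.
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None  # no connection found


if __name__ == "__main__":
    triples = [
        ("/m/0sxhm9v", "/business/board_member/leader_of", "/m/0sxhm9s"),
        # Placeholder id for Jimei University (assumption).
        ("/m/0sxhm9s", "/organization/leadership/organization", "jimei_university"),
    ]
    print(shortest_path(triples, "/m/0sxhm9v", "jimei_university"))
```

The path goes through the mediator node /m/0sxhm9s, which matches what the hand-built MQL query expresses.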

List all Freebase Domains with MQL query or API call

I would like to develop a Freebase Java application that lets you browse Freebase.
I thought a good starting point would be to mimic the Freebase Schema Explorer and allow the user of my app to "drill down" through Domains, Types in a Domain, then Instances in a Type.
Can someone please help me with how to retrieve a list of domains?
Then a list of types in that domain, and so on?
The user can then select a domain, and I would like to present a list of types within that domain, and so on until they have found the entry or entries they are investigating.
MQL for domains:
[{
"id": null,
"name": null,
"type": "/type/domain",
"!/freebase/domain_category/domains": {
"id": "/category/commons"
}
}]
The "!/freebase/domain_category/domains" clause in there is to restrict things to just the Commons (official) domains - otherwise you get the domain which is automatically created for every user and probably isn't what you're after.
Types in a domain:
[{
"id": null,
"name": null,
"type": "/type/type",
"domain": "/cvg"
}]
Replace "/cvg" as appropriate.
Instances of a type:
[{
"id": null,
"name": null,
"type": "/cvg/computer_videogame"
}]
Replace "/cvg/computer_videogame" as appropriate.
This should at least get you started.
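The three drill-down queries can be parameterized so that each level feeds the next. A rough sketch in Python (the question mentions Java, but the idea carries over directly); the function names are made up, and sending the query to the MQL service is left out on purpose.

```python
import json


def domains_query(category="/category/commons"):
    """MQL for all domains in a category (the Commons domains by default)."""
    return [{
        "id": None,
        "name": None,
        "type": "/type/domain",
        "!/freebase/domain_category/domains": {"id": category},
    }]


def types_query(domain_id):
    """MQL for all types in the given domain, e.g. '/cvg'."""
    return [{"id": None, "name": None, "type": "/type/type", "domain": domain_id}]


def instances_query(type_id):
    """MQL for all instances of the given type, e.g. '/cvg/computer_videogame'."""
    return [{"id": None, "name": None, "type": type_id}]


if __name__ == "__main__":
    # Each step uses an id picked from the previous step's results.
    for query in (domains_query(),
                  types_query("/cvg"),
                  instances_query("/cvg/computer_videogame")):
        print(json.dumps(query, indent=2))
```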
