Graph traversal algorithms in the Semantic Web

I am asking about algorithms that would be useful for querying a Semantic Web database to get all the RDF resources related to an original object.
For example, if the original object is the movie "Inception", I want an algorithm that builds queries to get the RDF data for the movie's cast, studio, country, etc., so that I can build a relationship graph.
The closest example is the answer to this question, especially this class. I want similar algorithms, or perhaps topics to search for in order to produce such an algorithm. I think some modification of a graph traversal algorithm might work, but I'm not sure.
NOTE: My project is in ASP.NET, so it would help to use existing .NET libraries.

You should be able to do a simple breadth-first search to get all the objects that are within a certain distance of a given node.
You'll need to know something about the schema, because some neighboring nodes are more meaningful than others. For example, in Freebase, we have intermediate nodes that link a film to an actor and a role. You need to know to go two hops deep to get at the actor and the role, because just saying that the film is related to the intermediate nodes is not very interesting.
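A minimal sketch of that breadth-first search, written in Python rather than .NET and assuming the triples are already in memory (against a live store, each expanded node would typically trigger one query for its outgoing triples). The triples and the `INTERMEDIATE_PREDICATES` set are hypothetical; the set stands in for the schema knowledge that tells the search which nodes are Freebase-style intermediate nodes to step through rather than report. Only outgoing edges are followed.

```python
from collections import deque

# Hypothetical in-memory triples (subject, predicate, object).
TRIPLES = [
    ("Inception", "starring", "perf1"),      # perf1: an intermediate node
    ("perf1", "actor", "LeonardoDiCaprio"),
    ("perf1", "character", "Cobb"),
    ("Inception", "studio", "WarnerBros"),
    ("Inception", "country", "USA"),
    ("WarnerBros", "headquarters", "Burbank"),
]

# Hypothetical schema knowledge: these predicates point at intermediate nodes,
# which are stepped through instead of being reported as neighbours.
INTERMEDIATE_PREDICATES = {"starring"}


def related(triples, start, max_depth):
    """Breadth-first search over outgoing edges.

    Returns {node: hops} for every node within max_depth hops of start.
    A hop through an intermediate node counts as a single hop.
    """
    outgoing = {}
    for s, p, o in triples:
        outgoing.setdefault(s, []).append((p, o))

    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand beyond the requested radius
        for p, o in outgoing.get(node, []):
            if p in INTERMEDIATE_PREDICATES:
                # Skip the intermediate node and go straight to what it links.
                targets = [o2 for _, o2 in outgoing.get(o, [])]
            else:
                targets = [o]
            for t in targets:
                if t not in depth:
                    depth[t] = depth[node] + 1
                    queue.append(t)
    return depth


print(related(TRIPLES, "Inception", 1))
# {'Inception': 0, 'LeonardoDiCaprio': 1, 'Cobb': 1, 'WarnerBros': 1, 'USA': 1}
```

The actor and the role show up at distance 1 and the intermediate node itself never appears, which is the "go two hops deep" rule expressed as data instead of hard-coded queries.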

Did you take a look at "property paths"?
Property Paths give a more succinct way to write parts of basic graph patterns and also extend matching of triple pattern to arbitrary length paths. Property paths do not invalidate or change any existing SPARQL query.
Triple stores and SPARQL engines such as OWLIM and AllegroGraph support them.

Related

Grakn: how can I construct a knowledge graph from a collection of texts?

I have several documents (pdf and txt) in my notebook and I want to construct a knowledge graph using Grakn.
Through Google I found the blog post, but there is no documentation or README explaining how to do that.
The blog also says "The script to mine text can be found on our GitHub repo here", but I don't understand what I have to do.
Can someone here advise me how to construct a knowledge graph from text using Grakn?
Grakn is a knowledge engine/network that understands knowledge through well-defined entities and relations (ontologies), so you need to use NLP (natural language processing) to make human language accessible to a graph network. You also need OCR (optical character recognition) to convert text contained in images into plain text. You should also teach the network basic ontologies so that it can understand the texts. You are actually heading into the Singularity era.
To give an example of how to go from a collection of text to a knowledge graph, let us assume that all of your text is concerned with a certain domain of knowledge - in the example of the blog post you mention, we are dealing with biomedical research publications.
A first step could be to find entities, or defined "things", in the text. To stick with the biomedical example, we could look for drugs and genes mentioned in the publications. This is called named-entity recognition (NER), a technique applied in text mining.
If a certain drug is often mentioned in the same publication as a particular gene, they "co-occur" and are likely related in some way. This would be an example of a relationship. The automated extraction of exactly how they are related is a difficult problem and is called relationship extraction (RE).
Solutions for both NER and RE are usually domain-specific (ranging from simple matching of dictionary terms to AI models).
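As a sketch of the simplest end of that range, the following assumes tiny hypothetical dictionaries of drug and gene names, matches them token by token, and counts how many documents mention each pair of entities together. Real NER has to cope with multi-word terms, synonyms and ambiguity; this only illustrates dictionary matching plus co-occurrence as a candidate relationship. The sample sentences are made up.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical dictionaries; a real system would load curated vocabularies.
DRUGS = {"aspirin", "ibuprofen"}
GENES = {"PTGS2", "TP53"}


def find_entities(text):
    """Naive dictionary-matching NER: returns a set of (type, name) tuples."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    found = {("drug", t.lower()) for t in tokens if t.lower() in DRUGS}
    found |= {("gene", t) for t in tokens if t in GENES}
    return found


def cooccurrences(documents):
    """Count, for each pair of entities, the number of documents mentioning both."""
    counts = Counter()
    for doc in documents:
        # Sorting gives each unordered pair a single canonical key.
        for pair in combinations(sorted(find_entities(doc)), 2):
            counts[pair] += 1
    return counts


docs = [
    "Aspirin inhibits PTGS2 activity.",
    "PTGS2 expression and aspirin use were studied.",
    "TP53 mutations are common.",
]
print(cooccurrences(docs))
# Counter({(('drug', 'aspirin'), ('gene', 'PTGS2')): 2})
```

The resulting entity list and pair counts are the kind of extracted data that could then be loaded into a knowledge graph under a schema of your choosing.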
If you are interested in text mining, a good place to start in Python is NLTK.
The idea of a knowledge graph is to put defined things, called entities, in defined relationships to one another to create context. After you have a list of entities that you have found in all your documents, as well as their relationships (as in the example above, co-occurrence in a document or even a single sentence), you can define a schema, load the entities and relationships into Grakn, and use all of its functionality to analyze your data.
For a tutorial on how to use Grakn with already extracted data, see here.

Best practice: How to specify a vertex's domain 'type' in a graph database

When building a graph, it is usually necessary to specify the 'type' of vertices. Conceptually, I see this could be done by applying a vertex label or property to each vertex (e.g. Bob, label: Man), or alternatively by linking a vertex to another 'type' vertex (e.g. Bob --IS A--> Man).
To find a list of all vertices of type 'Man', I can write Gremlin queries that work for both of these approaches. But what is best practice?
Best practice: keep your data model simple and make sure it is compatible with efficient indexing by the underlying graph database. There is no one size fits all solution at the TinkerPop level.
It really depends on your data model as well as the indexing capabilities of the underlying database, not to mention the way the data is actually serialized on disk. Ultimately, it all boils down to the way you expect to query your graph and the kind of performance you wish to have.
That being said, people typically use vertex labels, sometimes in conjunction with a type property of some kind. Graph implementers should be able to provide efficient indexes for answering such queries. Labels also give a simpler graph model, which is an important thing to consider.
Depending on the size of your graph, you could get performance issues when modeling types as vertices, since a 'Man' type vertex could quickly become a supernode (a vertex with a very large number of incident edges).

Trees / Graphs: How to represent multiple parents and children?

I'm hoping you can help me out with some technical questions on graphs/trees.
I'm trying to display the creation of objects in systems.
It's really a tree structure.
It has some interesting requirements.
a) One node can have many children. Say 20, maybe more (e.g. one library can be used by many objects).
b) A child node can have many parents. Say up to 20 (e.g. many libraries are used by one procedure or object).
c) A particular node can appear in more than one place (e.g. a generic print or logging function is called in many procedures).
Note: This is just an -example- in tech terms I expect you will understand.
It is NOT the issue I need to model. No need to discuss it.
As I've thought about it, I realized that it's not a simple binary tree, or a linked list.
1) What kind of data structure could I save all the data in?
2) How could I produce a graph of this in Java?
3) What free, open-source graphing software could graph such a tree? Such as Neo4j.
Perhaps in formats:
- as a tree, with a root, trunk, branches, and leaves?
- like the graphs you see now depicting social networks, with the root node in the center?
4) Any good websites or tutorials on this subject?
Thanks a lot!
Check out prefuse. It's old but it works. You'll have to invest a bit of time to learn how to use it, though. Once you get there, it's just a matter of creating a prefuse.data.Graph object, filling in your nodes and their neighbors, and then creating the visualization.
If you're open to other solutions, check out d3.js, which draws graphs with JavaScript on an SVG element in your browser.
If this is really about objects, then maybe UML can help. It's designed to generate graphs of object relationships. There are tons of free UML tools out there. I'd download one and see if you can shoehorn your application into it.
JGraphT can represent your graph structure and can use JGraph for visualisation.
For an example visualization, look at this.
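Assuming the structure has no cycles, what the question describes is a directed acyclic graph (DAG) rather than a tree. A minimal Python sketch (the answers above point at Java libraries; the class and node names here are hypothetical) keeps both a child map and a parent map so either direction is cheap to walk, and uses Kahn's algorithm to produce an order in which every node comes after all of its parents, a convenient starting point for a layered drawing. A shared node such as a logging function is stored once and simply has several parents.

```python
from collections import defaultdict, deque


class Dag:
    """Nodes with any number of parents and children, kept as two adjacency maps."""

    def __init__(self):
        self.nodes = set()
        self.children = defaultdict(set)
        self.parents = defaultdict(set)

    def add_edge(self, parent, child):
        self.nodes.update((parent, child))
        self.children[parent].add(child)
        self.parents[child].add(parent)

    def topological_order(self):
        """Kahn's algorithm: each node appears after all of its parents."""
        indegree = {n: len(self.parents[n]) for n in self.nodes}
        ready = deque(sorted(n for n, d in indegree.items() if d == 0))
        order = []
        while ready:
            n = ready.popleft()
            order.append(n)
            for c in sorted(self.children[n]):
                indegree[c] -= 1
                if indegree[c] == 0:
                    ready.append(c)
        if len(order) != len(self.nodes):
            raise ValueError("graph contains a cycle")
        return order


g = Dag()
for p, c in [("procA", "libX"), ("procA", "log"), ("procB", "libX"),
             ("procB", "log"), ("libX", "log")]:
    g.add_edge(p, c)
print(g.topological_order())  # ['procA', 'procB', 'libX', 'log']
print(sorted(g.parents["log"]))  # ['libX', 'procA', 'procB']
```

Keeping the parent map explicitly trades a little memory for being able to answer "who uses this?" without scanning every node.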

Measuring distances among classes in RDF/OWL graphs

Maybe someone could give me a hint. Is it possible to measure the distance between 2 concepts/classes that belong to the same ontology?
For example, let's suppose I have an ontology with the Astronomy class and the Telescope class. There is a link between them, but it is not a direct link. Astronomy has a parent class called Science, and Telescope has a parent class called Optical Instrument, which belongs to its parent called Instrumentation, which is related to a class called Empirical Science, which finally belongs to a class called Science.
So there is an indirect link between Telescope and Astronomy, and I want to find out the number of steps needed to reach one class starting from the other.
Is there an easy SPARQL query that resolves that question? Or are there better ways to do that job? Or is it not possible to find that out using the Semantic Web paradigm?
Any hint would be much appreciated.
SPARQL provides the ability to search for arbitrary length paths in a graph but no mechanism to tell you the length of that path.
So you can do something like:
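# ex: is a placeholder namespace
PREFIX ex: <http://example.org/>
SELECT * WHERE { ?s ex:property+ ?o }
The syntax is very much like regular expressions, so you can express alternatives (|), sequences (/), inverse paths (^) and repetition (*, + and ?). Bounded repetition such as {2,4} appeared in drafts of SPARQL 1.1 but was dropped from the final specification, so it is only available where an engine offers it as an extension.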
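To make concrete what `?s ex:property+ ?o` matches, and why no length comes back, here is a small Python emulation over an in-memory list of triples (the triples and predicate name are made up). It returns the set of (subject, object) pairs joined by one or more edges of the given predicate, which is all the query result contains.

```python
from collections import defaultdict


def one_or_more(triples, predicate):
    """Emulate `?s predicate+ ?o`: all (s, o) pairs where o is reachable from s
    via one or more edges labelled predicate. Only pairs come back, no lengths."""
    succ = defaultdict(set)
    for s, p, o in triples:
        if p == predicate:
            succ[s].add(o)

    pairs = set()
    for start in list(succ):
        seen = set()
        stack = list(succ[start])
        while stack:  # iterative depth-first search from start
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            pairs.add((start, node))
            stack.extend(succ.get(node, ()))
    return pairs


triples = [
    ("Telescope", "subClassOf", "OpticalInstrument"),
    ("OpticalInstrument", "subClassOf", "Instrumentation"),
    ("Astronomy", "subClassOf", "Science"),
]
print(sorted(one_or_more(triples, "subClassOf")))
```

Telescope is paired with both OpticalInstrument (one step) and Instrumentation (two steps), but nothing in the result distinguishes those distances.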
As I understand it, SPARQL doesn't contain any recursive construct that can measure the length of an indirect link of arbitrary length. The best you could do is to prepare a set of queries distance_1(a, b), distance_2(a, b), ... that each check for a specific distance between two concepts.
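A sketch of that idea: a hypothetical Python helper that builds one SPARQL ASK query per distance k. The query succeeds when some chain of exactly k edges, with any predicates and following edge direction, leads from a to b, so the smallest k that answers true is the directed distance. The `ex:` names are placeholders that would need a PREFIX declaration before being sent to a store.

```python
def distance_query(a, b, k):
    """Build a SPARQL ASK query that succeeds if a chain of exactly k edges
    (any predicates, following edge direction) leads from a to b."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Chain of terms: a, ?x1, ..., ?x(k-1), b
    nodes = [a] + [f"?x{i}" for i in range(1, k)] + [b]
    patterns = [f"{nodes[i]} ?p{i + 1} {nodes[i + 1]} ." for i in range(k)]
    return "ASK WHERE { " + " ".join(patterns) + " }"


# Send k = 1, 2, 3, ... to the store and stop at the first query that returns true.
for k in range(1, 4):
    print(distance_query("ex:Telescope", "ex:Science", k))
```

Each extra step adds one triple pattern, so this stays practical only for small distances, and it needs a fixed upper bound on k.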
Another alternative is to discover this information using non-SPARQL technology, for example by writing a graph traversal algorithm in Python with RDFLib.
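A minimal sketch of that alternative, using plain Python tuples instead of RDFLib so it runs without dependencies (with RDFLib, the edge list could be collected by iterating over the relevant triples of a loaded graph). It encodes the hierarchy from the question, treats every link as undirected because the Instrumentation/Empirical Science relation is not specified, and returns the breadth-first-search distance.

```python
from collections import defaultdict, deque

# The example hierarchy from the question, as (child, parent) links.
EDGES = [
    ("Telescope", "OpticalInstrument"),
    ("OpticalInstrument", "Instrumentation"),
    ("Instrumentation", "EmpiricalScience"),
    ("EmpiricalScience", "Science"),
    ("Astronomy", "Science"),
]


def distance(edges, a, b):
    """Number of edges on a shortest path between a and b, ignoring direction.
    Returns None if the two classes are not connected."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    dist = {a: 0}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return dist[node]
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None


print(distance(EDGES, "Telescope", "Astronomy"))  # 5
```

Breadth-first search visits nodes in order of distance, so the first time it reaches b, the recorded count is the shortest distance.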
Since you explicitly mentioned that you are talking about classes and they will be in the same ontology, it is safe to assume that they will always be connected (because ultimately both will be a subclass of "Thing", right?). On the other hand, the path I mentioned in the parentheses (Class1 -> ... -> Thing <- ... <- Class2) is a trivial one, so I assume you want to find... all of the existing paths between two classes, in other words, all of the existing paths between two vertices. Is that true? Or are you looking for the shortest path? Your question is not very clear in that respect; can you clarify it?
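If the goal is every path rather than the shortest one, a depth-first search that never revisits a vertex already on the current path enumerates all simple paths. This is a sketch with hypothetical classes A, B and X, again treating links as undirected; the number of simple paths can grow exponentially with graph size, so on a real ontology you would want a length limit.

```python
from collections import defaultdict


def all_simple_paths(edges, a, b):
    """Yield every path from a to b that visits no vertex twice
    (edges are treated as undirected)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    path = [a]

    def dfs(node):
        if node == b:
            yield list(path)  # copy, since path keeps changing
            return
        for nxt in sorted(adj[node]):
            if nxt not in path:
                path.append(nxt)
                yield from dfs(nxt)
                path.pop()  # backtrack

    yield from dfs(a)


example = [("A", "Thing"), ("B", "Thing"), ("A", "X"), ("X", "B")]
for p in all_simple_paths(example, "A", "B"):
    print(" -> ".join(p))
```

The trivial path through Thing appears alongside the more interesting one through X, which is why filtering it out (or weighting it) is part of the problem.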
As far as I know there is no simple SPARQL construct that will list all the paths between classes or the shortest path. However some semantic web triple stores come with graph traversal algorithms such as breadth-first-search or depth-first-search, please refer to:
http://www.franz.com/agraph/support/documentation/current/lisp-reference.html#sna
You may also find the source code of the following project very useful:
RelFinder, Interactive Relationship Discovery in RDF Data, http://www.visualdataweb.org/relfinder.php

How to serialize a graph?

This is an interview question: How to serialize a graph ? I saw this answer but I am not sure if this is enough.
It looks like a very confusing "open question", and the candidates are probably expected to ask more questions about the requirements: what the nodes and edges are, how they are serialized themselves, whether the graph is weighted, directed, etc., and how many nodes/edges are in the graph. What about the infrastructure? Is it a plain file system, or should/can we use a database?
So, how would you answer this question?
I think the answer you provided is quite reasonable. IMO, you basically need to know the application background. I would ask at least:
is it directed or not?
what are the properties associated with the vertex, edge and graph itself?
is the graph sparse? (If so, we'd better not use an adjacency matrix.)
The simplest way will be storing it as an edge list.
However, in different applications there are some classical ways to do it.
For example, if you are doing circuit simulation, the graph is sparse and the resulting graph/matrix can be stored in compressed sparse column form. If you are solving a (min-cost) max-flow problem, there is already the DIMACS format, which public solvers can read and write. A structured format is also a good choice if you want something human-readable: XML can provide self-validation (GraphML is already the standard there). By the way, the DOT format is quite self-contained.
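As an illustration of how self-contained DOT is, here is a small hypothetical Python helper that writes vertices first and then edges in DOT syntax, quoting every ID and escaping embedded double quotes. It is a sketch rather than a full DOT writer: it emits no attributes such as weights or labels.

```python
def _quote(name):
    """Quote a DOT identifier, escaping embedded double quotes."""
    return '"' + str(name).replace('"', '\\"') + '"'


def to_dot(nodes, edges, directed=True):
    """Serialize nodes first (so isolated ones survive), then edges, as DOT text."""
    keyword, arrow = ("digraph", "->") if directed else ("graph", "--")
    lines = [f"{keyword} G {{"]
    lines += [f"  {_quote(n)};" for n in nodes]
    lines += [f"  {_quote(u)} {arrow} {_quote(v)};" for u, v in edges]
    lines.append("}")
    return "\n".join(lines)


print(to_dot(["a", "b", "c"], [("a", "b")]))
```

The output can be rendered directly by Graphviz, and node "c" is kept even though no edge touches it.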
Meh. Whatever you store it in, it's basically:
Output each vertex in the graph. If you don't have all the vertices first, it's a PITA to rebuild the graph when you're reading it back in.
Now you can store edges between vertices. Hopefully your vertices have some form of ID to uniquely identify them. The version of this I've seen is "store a (graph|tree) in a database". So, read in the nodes and store them in a hashtable or similar for O(1) amortized lookup. Then, for each edge, look up the source ID and the destination ID, and link them.
Voila, you've deserialized it. If it's not a DB, the same idea generally holds - serialize nodes first, then edges.
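A minimal Python sketch of that nodes-then-edges scheme, using JSON as the container (the `Node` class and field names are hypothetical). Edges are stored as pairs of IDs rather than nested objects, so cycles and shared nodes need no special handling, and a dict from ID to node provides the O(1) average lookup during deserialization.

```python
import json


class Node:
    def __init__(self, node_id, payload=None):
        self.id = node_id
        self.payload = payload
        self.neighbors = []  # outgoing links to other Node objects


def serialize(nodes):
    """Write all vertices first, then every edge as a [source id, destination id] pair."""
    return json.dumps({
        "nodes": [{"id": n.id, "payload": n.payload} for n in nodes],
        "edges": [[n.id, m.id] for n in nodes for m in n.neighbors],
    })


def deserialize(text):
    data = json.loads(text)
    # Hashtable from ID to node: O(1) average lookup while linking edges.
    by_id = {d["id"]: Node(d["id"], d["payload"]) for d in data["nodes"]}
    for src, dst in data["edges"]:
        by_id[src].neighbors.append(by_id[dst])
    return list(by_id.values())


a, b, c = Node(1, "a"), Node(2, "b"), Node(3, "c")
a.neighbors.append(b)
b.neighbors.append(a)  # a cycle, plus c as an isolated node
restored = deserialize(serialize([a, b, c]))
print([(n.id, [m.id for m in n.neighbors]) for n in restored])
```

Because the vertex list comes first, the isolated node survives the round trip, and every edge can be resolved in a single pass.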

Resources