I'd like to create different graphs for different domains, so some kind of namespace or schema is needed, just like the "Schema" concept in an RDBMS. Does Gremlin support namespaces or something similar?
Thanks
There is no notion of a schema name in the Gremlin language that is exactly like what you typically have in SQL. Your Gremlin query is bound to the graph to which you connect. If you have two or more domains then you either:
Create one graph per domain, in which case you can't traverse across those domains (you'd have to combine results after separate traversals; without explicit edges to connect the domains, i.e. joins, Gremlin has no way to do those sorts of queries), or
Create one large graph to house both domains and then constrain your traversal to the domain (in TinkerPop this is sometimes accomplished with PartitionStrategy)
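As a rough sketch of the second option in the Gremlin Console, PartitionStrategy can scope both writes and reads to a single domain; the partition key '_partition' and the 'domainA' value below are illustrative names, and 'graph' is assumed to be your Graph instance:
// Build a strategy that writes to and reads from domain A only
strategyA = PartitionStrategy.build().
              partitionKey('_partition').
              writePartition('domainA').
              readPartitions('domainA').create()
gA = graph.traversal().withStrategies(strategyA)
gA.addV('person').property('name','alice')   // stored with _partition = 'domainA'
gA.V().values('name')                        // only sees vertices in domain A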
I'm new to the Docker world. I want to query an ontology locally. I have already configured virtuoso-sparql-endpoint-quickstart.
It works, and my endpoint is http://localhost:8890/sparql.
Now I want to query my own ontology (not DBpedia). Can I still use the same endpoint? How can I add my ontology to Virtuoso?
Please note that an ontology is a vocabulary used to describe one or more classes of entities. The descriptions themselves are typically referred to as instance data, and queries are usually run over such instance data. (There are a few ontologies used to describe ontologies, and these descriptions are also instance data, and queries might be made against them.)
There are a number of ways to load data into Virtuoso. The most useful for most people is the Bulk Load facility. For most purposes, you'll want to load your data into one or more distinct Named Graphs, such that queries can be scoped to one, some, or all of those Named Graphs.
Any and all queries can be made against the same http://localhost:8890/sparql endpoint. Results will vary depending on the Named Graphs identified in your query.
Assuming I have an arbitrary Gremlin query I don't control as input, and a graph database that I run it against, how can I capture the paths of all accessed nodes in the graph? In other words, how can I see which parts of the graph are needed by an arbitrary query?
Clarification:
If I run the arbitrary query, how can I capture all the data it accesses as it runs: not just the result, but all the data touched during the query?
Different databases may have explain-plan options that give some insight into how a query will run, but really the only way to know what a Gremlin query is going to visit in the graph is to run it. If you know the schema of the graph, you could potentially write some code that analyzes the steps and labels used in the query to estimate what it will touch, but I am not aware of any existing tools that do that.
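That said, TinkerPop's profile() and explain() steps at least show which steps execute and how many traversers and elements flow through each of them, which gives a rough picture of what a query touches. A minimal sketch with illustrative property names:
// Execute the traversal and report per-step metrics (element counts and timings)
g.V().has('code','LHR').out('route').has('country','US').values('code').profile()
// Show the provider's compiled traversal plan without running it
g.V().has('code','LHR').out('route').has('country','US').values('code').explain()
Note that profile() reports how many elements each step processed, not the identities of the specific vertices and edges visited.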
We are stamping user permissions as a property (of SET cardinality) on each node and edge. Wondering what the best way is to apply the has() step to all the visited nodes/edges for a given Gremlin traversal query.
For example, a very simple traversal query:
// Flights from London Heathrow (LHR) to airports in the USA
g.V().has('code','LHR').out('route').has('country','US').values('code')
How do we add has('permission', 'team1') to all the vertices and edges visited while traversing with the above query?
There are two approaches you may consider.
Write a custom TraversalStrategy
Develop a Gremlin DSL
For a TraversalStrategy you would develop one similar to SubgraphStrategy or PartitionStrategy which would take your user permissions on construction and then automatically inject the necessary has() steps after out() / in() sorts of steps. The drawback here is that your TraversalStrategy must be written in a JVM language and if using Gremlin Server must be installed on the server. If you intend to configure this TraversalStrategy from the client-side in any way you would need to build custom serializers to make that possible.
For a DSL you would create new navigational steps for out() / in() sorts of steps and they would insert the appropriate combination of navigation step and has() step. The DSL approach is nice because you could write it in any programming language and it would work, but it doesn't allow server-side configuration and you must always ensure clients use the DSL when querying the graph.
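Whichever approach you choose, the net effect is that the original traversal is conceptually rewritten into something like the sketch below (the 'permission'/'team1' values come from the question; exactly where the edge check goes depends on your model):
g.V().has('code','LHR').has('permission','team1').
  outE('route').has('permission','team1').inV().
  has('country','US').has('permission','team1').
  values('code')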
We are stamping user permission as a property (of SET cardinality) on each nodes and edges.
As a final note, by "SET cardinality" I assume that you mean multi-properties. Edges don't allow for those so you would only be able to stamp such a property on vertices.
The Gremlin documentation says:
Many graph vendors do not allow the user to specify an element ID and in such cases, an exception is thrown.
I assume this refers only to specifying an ID when creating a new vertex or edge, not to the overall use of IDs in queries. So which Gremlin implementations do, and which do not, allow specifying an ID along with vertex or edge creation?
It's easier to list the graph databases that do allow id assignment than those that don't, as most graph databases do not allow you to specify the id when you create a vertex/edge.
I'm only aware of two that allow you to specify the id: TinkerGraph and elastic-gremlin. The rest do not support that.
You can always check what a graph supports by calling the features() method on the Graph instance:
http://tinkerpop.incubator.apache.org/docs/3.0.1-incubating/#_features
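For example, in the Gremlin Console you might check the relevant feature like this (TinkerGraph is only used for illustration):
graph = TinkerGraph.open()
graph.features()                                      // prints the full feature matrix
graph.features().vertex().supportsUserSuppliedIds()   // true for TinkerGraph
graph.features().edge().supportsUserSuppliedIds()     // true for TinkerGraph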
I'm building a system which allows the user to call N different graphs through an API. Currently I have a working prototype which pulls graphs from CouchDB. However, for obvious reasons, I would like to move to a graph DB. My understanding is that Neo4j can only handle one graph at a time or requires some sort of tagging system to keep graphs from mixing. Neither of those approaches seems optimal. What's the best-practice approach for this?
A few more things to note: I will be calling these graphs and manipulating them with something like networkx, and I've considered storing the graphs in a "regular" DB and then moving them to Neo4j as requests come in, which seems pretty intense.
Neo4j does not have a concept of multiple databases like most relational databases do with CREATE DATABASE. In Neo4j there is one graph space which you can use.
So you have 2 options:
use separate Neo4j instances (single or clustered) for each graph; maybe using Neo4j in embedded mode is helpful here
use one Neo4j instance (single or clustered) and store your data in distinct subgraphs. If the subgraphs need some interconnections you can use labels to identify to which subgraph a certain node belongs.
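If you happen to access Neo4j through TinkerPop's neo4j-gremlin module, the label-based scoping in the second option might look roughly like this sketch (the directory and the 'graphA' label are illustrative):
graph = Neo4jGraph.open('/tmp/neo4j')      // illustrative database location
g = graph.traversal()
g.addV('graphA').property('name','n1')     // tag the node as belonging to subgraph A
g.V().hasLabel('graphA').out()             // constrain a read to subgraph A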