Add or get vertex in Azure Cosmos DB Graph API - azure-cosmosdb

Using Gremlin, I can create a vertex in an Azure Cosmos DB graph by issuing
g.addV('the-label').property('id', 'the-id')
and subsequently find it using
g.V('the-label').has('id', 'the-id')
However, I haven't found a way to issue a query that will insert the node if it is missing, and just get the reference to it if it already exists. Is there a way?
My concrete use case is that I want to add an edge between two nodes, regardless of whether those nodes (or the edge, for that matter) exist already or not, in a single query. I tried this upsert approach, but apparently Cosmos DB does not support Groovy closures, so it won't work.

The "upsert pattern" is relatively well defined and accepted at this point. It is described here. If you want to extend that to also add an edge, that's possible too:
g.V().has('event','id','1').
  fold().
  coalesce(unfold(),
           addV('event').property('id','1')).as('start').
  coalesce(outE('link').has('id','3'),
           coalesce(V().has('event','id','2'),
                    addV('event').property('id','2')).
           addE('link').from('start').property('id','3'))
If that looks a bit complex you can definitely simplify with a Gremlin DSL (though I'm not sure that CosmosDB supports Gremlin bytecode at this point). Here's an example with even more complex upsert logic simplified by a DSL. It's discussed in this blog post in more detail.
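Without a DSL, one option is to generate the query text on the client and submit it as a script. The following is only a minimal sketch of that idea: the helper names quote and upsert_vertex are made up for illustration, the escaping is simplistic, and a partitioned Cosmos DB graph may also need its partition key property added, which is omitted here. Where the driver supports parameter bindings, those are preferable to string escaping.

```python
def quote(value: str) -> str:
    """Render a Python string as a single-quoted Gremlin string literal.

    Hypothetical helper: escapes only backslashes and single quotes.
    """
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def upsert_vertex(label: str, vertex_id: str) -> str:
    """Build the fold/coalesce/unfold get-or-create traversal as a string."""
    lbl, vid = quote(label), quote(vertex_id)
    return (
        f"g.V().has({lbl},'id',{vid})"
        ".fold()"
        f".coalesce(unfold(),addV({lbl}).property('id',{vid}))"
    )


if __name__ == "__main__":
    # Prints the same get-or-create traversal used for vertex '1' above.
    print(upsert_vertex("event", "1"))
```

The resulting string can be sent through whatever script-submission call the client library offers; the edge part of the pattern can be appended the same way.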

Please look at this.
http://tinkerpop.apache.org/docs/current/reference/#coalesce-step
You can try
g.inject(0).coalesce(__.V().has('id', 'the-id'), __.addV('the-label').property('id', 'the-id'))
By the way, you won't be able to find the vertex using g.V('the-label').has('id', 'the-id').
g.V() accepts vertex ids as parameters, not vertex labels.

Related

How to use geoshape in gremlinpython

In JanusGraph, there are functions like
g.E().has('place', geoWithin(Geoshape.circle(37.97, 23.72, 50)))
for searching place data. Now I want to do the same with gremlinpython, but I can't find a suitable API in the documentation.
Gremlin does not yet support Geo data types and predicates. The bits of syntax that you are referencing are specific to JanusGraph and are part of its libraries. At this point, I don't believe that JanusGraph has a Python-specific library to give you direct access to those things. If you need to use Geo searches then, for now, you will need to submit a Gremlin script to JanusGraph Server with that syntax.
Something like this:
g.V().has('polygon',geoIntersect(Geoshape.point(55.70,37.55)))

Gremlin, filter according to different properties of the same vertex

I have vertices which have two integer properties (int1 and int2). I simply want to select all vertices where int1 is greater than int2.
I already know about this way of doing it,
g.V().hasLabel('person').as('a')
.where('a',gt('a')).by('age').by('k').valueMap('age','k')
but I need another way that works with older versions. This Gremlin syntax needs TinkerPop 3.2.4.
As mentioned in my comment above, the traversal should work in 3.2.4. But anyway, here's another way of doing it:
g.V().hasLabel('person').
sack(assign).by('age').
sack(minus).by('k').
filter(sack().is(gt(0))).
valueMap('age','k')
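To make the sack steps above easier to follow, here is a rough in-memory Python equivalent. It is only an illustration of the arithmetic, not how Gremlin executes the traversal: the sample data and the function name age_greater_than_k are invented, and the output is simplified (a real valueMap() wraps each value in a list).

```python
# Invented sample vertices with the two integer properties from the question.
people = [
    {"name": "a", "age": 30, "k": 10},
    {"name": "b", "age": 5, "k": 8},
    {"name": "c", "age": 7, "k": 7},
]


def age_greater_than_k(vertices):
    """Keep the vertices whose 'age' is strictly greater than 'k'."""
    kept = []
    for v in vertices:
        sack = v["age"]   # sack(assign).by('age')
        sack -= v["k"]    # sack(minus).by('k')
        if sack > 0:      # filter(sack().is(gt(0)))
            kept.append({"age": v["age"], "k": v["k"]})  # valueMap('age','k'), simplified
    return kept


if __name__ == "__main__":
    print(age_greater_than_k(people))
```

Only the first sample vertex passes the filter; equal values are dropped because the difference must be strictly greater than zero.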

cosmos db graph api how to check if exist and update edge?

How can I check whether an edge already exists before creating one with a Gremlin query? And how can I update an existing edge instead of deleting and recreating it?
I'm not sure if you're still looking for an answer; however, the simple answer is that Cosmos DB is somewhat limited in its Gremlin support. See here: https://learn.microsoft.com/en-us/azure/cosmos-db/gremlin-support. At the time of writing this answer, the only way to update an edge is to delete and recreate it. That is true whether you're adding properties or updating them.
In terms of querying if the edge already exists, you can use g.E('<xyz-id-guid>') or g.V('id', '<xyz-id-guid>', '<partition-key-property>', '<xyz-id>').outE('<edge-label>').hasId('<xyz-id-guid>'). The hasId() part is optional but recommended as is the use of the partition key value. Both help performance.
Hope that helps.
Cheers,
Seb

A* search in neo4j

I want to search for the shortest path in a directed acyclic graph with neo4j. I have a graph that looks similar to this:
I want to find a path starting from Root down to Layer 3. Each layer has a different set of properties, and I can calculate a weight using these properties and user input. I need to find all shortest paths with minimal dynamic weight using A* or another search algorithm (it is possible to have several paths with equal weights). Is this possible with neo4j and Cypher or Gremlin?
I don't want to use the embedded version because my project is written in Python, so I can't use the Java library that, as far as I know, can do this.
As of now, Cypher does not allow you to pass in a function, e.g. your cost function. Adding this as a feature must be decided very carefully, as injecting runnable code through a query language raises some security concerns.
That said, here is what you can do now: create an unmanaged extension to the Neo4j server. Inside your unmanaged extension you make use of the provided graph algorithms. Using JAX-RS parameters you provide the data that identifies the start and end node of your traversal and let the graph algos do the dirty work.
You might want to take a look at https://github.com/sarmbruster/unmanaged-extension-archetype, this is a minimalistic sample project using gradle as a build system.
However, the sketched idea involves Java coding for the server-side part. On the client side you can use whatever stack you like.

TinkerPop Blueprints and Frames - How to treat data as collections

I found this question How to store and retrieve different types of Vertices with the Tinkerpop/Blueprints graph API?
But it's not a clear answer for me.
How do I query the 10 most recent articles vs the 10 most recently registered users for an api for instance? Put graphing aside for a moment as this stack must be able to handle both a collection/document paradigm as well as a graphing paradigm (relations between elements of these TYPES of collections). I've read everything including source and am getting closer but not quite there.
Closest document I've read is the multitenant article here: http://architects.dzone.com/articles/multitenant-graph-applications
But it focuses on using gremlin for graph segregation and querying which I'm hoping to avoid until I require analysis on the graphs.
I'm considering using Cassandra/Hadoop at this point with a reference to the graph id, but I can see this biting me down the road.
Indexes in Tinkerpop/Blueprints only support simple lookups based on exact matches of property keys.
If you want to find the 10 most recent articles, or articles with the phrase 'Foo Bar', you may have to maintain an external index using Lucene or Elasticsearch. These technologies use inverted indexes, which support term/phrase lookups, range queries, wildcard searches, and so on. You can store the vertex.getId() as a field in the indexed document to link back to the vertex in the graph database.
I have implemented something like this for my application which uses a Blueprints database (Bitsy) and a fancy index (Lucene). I have documented the high-level design to (a) keep the fancy index up-to-date using batch updates every few seconds and (b) ensure transactional consistency across the graph database and the fancy index. Hope this helps.
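As a very rough illustration of the "external index that links back by vertex id" idea, the sketch below keeps a per-type list sorted by timestamp. It is a plain in-memory stand-in, not Lucene or Elasticsearch and not the design referred to above; the RecencyIndex class, its method names and the sample ids are all hypothetical, and it ignores persistence and transactional consistency entirely.

```python
import bisect


class RecencyIndex:
    """Hypothetical in-memory index: type name -> (timestamp, vertex_id) pairs."""

    def __init__(self):
        self._entries = {}  # each value is a list kept sorted by timestamp

    def add(self, type_name, timestamp, vertex_id):
        """Record that the vertex of the given type was created at timestamp."""
        bisect.insort(self._entries.setdefault(type_name, []), (timestamp, vertex_id))

    def most_recent(self, type_name, n=10):
        """Return up to n vertex ids of this type, newest first."""
        if n <= 0:
            return []
        entries = self._entries.get(type_name, [])
        return [vertex_id for _, vertex_id in reversed(entries[-n:])]


if __name__ == "__main__":
    index = RecencyIndex()
    index.add("article", 100, "v1")
    index.add("article", 300, "v3")
    index.add("article", 200, "v2")
    index.add("user", 150, "u1")
    # The returned ids would then be used to load the vertices from the graph.
    print(index.most_recent("article", 2))
```

The point of the sketch is only the shape of the lookup: the index answers "which ids, in what order", and the graph database is still the place the vertices themselves are read from.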
Answering my own question: the answer is indices.
https://github.com/tinkerpop/blueprints/wiki/Graph-Indices
