Gremlin: find all vertices that have "any" property with a given value

The properties in my graph are dynamic. That means there can be any number of properties on the vertices. It also means that, when I do a search, I will not know which property to look in. Is it possible in Gremlin to query the graph to find all vertices that have any property with a given value?
E.g., with name and desc as properties: if the incoming search request is 'test', the query would be g.V().has('name', 'test').or().has('desc', 'test'). How can I achieve similar functionality when I do not know which properties exist? I need to be able to search across all the properties and check whether any of their values is 'test'.

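You can do this using the following syntax, which returns the matching properties:
g.V().properties().hasValue('test')
To return the vertices that own those properties instead, filter on the same check:
g.V().where(properties().hasValue('test'))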
However, on a dataset of any meaningful size I would expect this traversal to be very slow, as it is the equivalent of asking an RDBMS "Find me any cell in any column in any table where the value equals 'test'". If this is a high-frequency request, I would suggest refactoring your graph model or using a data store optimized for search, such as Elasticsearch.
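To illustrate why this is expensive, here is a minimal Python sketch that models the graph as plain dictionaries. The data layout and the function names are assumptions made for illustration only; this is not how any graph database stores data. The scan has to touch every property of every vertex, whereas a value-to-vertex lookup table (similar in spirit to what a search engine maintains) answers the same question with one lookup, at the cost of maintaining the table on every write.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for a graph: vertex id -> property map.
vertices = {
    1: {"name": "test", "desc": "first"},
    2: {"name": "other", "desc": "test"},
    3: {"title": "unrelated"},
}


def scan_for_value(vertices, value):
    """Full scan: inspect every property of every vertex."""
    return sorted(
        vid
        for vid, props in vertices.items()
        if any(v == value for v in props.values())
    )


def build_value_index(vertices):
    """Precompute value -> set of vertex ids (must be kept up to date on writes)."""
    index = defaultdict(set)
    for vid, props in vertices.items():
        for v in props.values():
            index[v].add(vid)
    return index


index = build_value_index(vertices)
print(scan_for_value(vertices, "test"))
# .get avoids creating empty entries for values that were never indexed.
print(sorted(index.get("test", ())))
```

Both calls report the same vertices; only the amount of work per query differs.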

Related

Gremlin, how to return all vertex pairs that are connected by an edge with a specific label

Take a simple airline connection graph as an example, as in the picture below.
Can we come up with a Gremlin query that returns the pairs of cities connected by SW? For example: [{ATL,CHI},{SFO,CHI},{DAL,CHI},{HSV,DAL}]
Looks like all you probably need is:
g.V().outE('SW').inV().path()
If you don't want the edge in the result, you can use flatMap:
g.V().flatMap(outE('SW').inV()).path()
To get back some properties rather than just vertices all you need to do is add a by modulator to the path step.
g.V().flatMap(outE('SW').inV()).path().by(valueMap())
This will return all the properties for every vertex. In a large result set this is not considered a best practice and you should explicitly ask for the properties you care about. There are many ways you can do this using values, project or valueMap. If you had a property called code representing the airport code you might do this.
g.V().
flatMap(outE('SW').inV()).
path().
by(valueMap('code'))
or just
g.V().flatMap(outE('SW').inV()).
path().
by('code')

How to make property value required on a vertex in janusgraph?

I want to add a property constraint on a specific vertex label to disallow null values, or the insertion of a vertex without specific properties.
I added the name property to the person vertex label as below, so a person will not accept properties other than name, but I also need a constraint on the value so that it cannot be null:
mgmt = graph.openManagement()
person = mgmt.makeVertexLabel('person').make()
name = mgmt.makePropertyKey('name').dataType(String.class).cardinality(Cardinality.SET).make()
mgmt.addProperties(person, name)
mgmt.commit()
The problem is:
A vertex with the label person must always have a name property; otherwise the vertex should not be created.
Is this achievable in JanusGraph?
It is currently not possible to enforce the presence of certain property keys for certain vertex or edge labels in JanusGraph. However, this would be a good addition for the schema constraints that were introduced in JanusGraph 0.3.0. So, feel free to create an issue with JanusGraph for this feature request.
Until something like this is implemented in JanusGraph, you probably have to implement logic that enforces this in the application that inserts the data.
If you for some reason cannot or don't want to implement this in your application (e.g. because you don't control all applications that insert data in your graph), then you could also implement your own TinkerPop TraversalStrategy that checks every addV step to ensure that the property is also added. These strategies are evaluated for all traversals and can change (e.g. as an optimization) the steps of the traversal or even throw an exception if the traversal is not legal which would be the correct behaviour in your case. JanusGraph itself would probably also implement a strategy to add these additional schema constraints.
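The application-level check mentioned above could look roughly like the following Python sketch. The REQUIRED_PROPERTIES table and the validate_vertex function are hypothetical names introduced here for illustration; they are not part of JanusGraph or TinkerPop. The idea is simply to reject a vertex before it is ever sent to the graph.

```python
# Hypothetical schema: vertex label -> property keys that must be present.
REQUIRED_PROPERTIES = {"person": {"name"}}


def validate_vertex(label, properties):
    """Raise ValueError if a required property is missing or None."""
    required = REQUIRED_PROPERTIES.get(label, set())
    missing = sorted(key for key in required if properties.get(key) is None)
    if missing:
        raise ValueError(
            f"vertex label '{label}' requires properties: {', '.join(missing)}"
        )
    return properties


# A valid vertex passes through unchanged.
validate_vertex("person", {"name": "alice"})

# A vertex without a name is rejected before any insert is attempted.
try:
    validate_vertex("person", {"age": 30})
except ValueError as err:
    print(err)
```

Calling such a function in the one code path that performs inserts keeps the rule in a single place, but it only protects writes that go through that path.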

Using timestamp as an Attribute in DynamoDB

I'm quite new to DynamoDB, but have some experience in Cassandra. I'm trying to adapt a pattern I followed in Cassandra, where each column represented a timestamped event, and wondering if it will carry over gracefully into DynamoDB or if I need to change my approach.
My goal is to query a set of documents within a date range by using the milliseconds-since-epoch timestamp as an Attribute name. I'm successfully storing the following as each report is generated with each new report being added under its own column:
{ PartitionKey:customerId,
SortKey:reportName_yyyymm,
'#millis_1#':{'report':doc_1},
'#millis_2#':{'report':doc_2},
. . .
'#millis_n#':{'report':doc_n}
}
My question is, given a millisecond-based date range, and the accompanying Partition and Sort keys, is it possible to query the set of Attributes that fall within that range or must I retrieve all columns for the matching keys and filter them at the client?
Welcome to the most powerful NoSQL database ;)
To kick off with the bad news, there is no way to query for specific attributes. You can project certain attributes in a query, but you would have to write your own logic to determine which attributes (columns) to include in the projection. To get close to your solution, you could use a map attribute inside an item, with the milliseconds as keys. But there is another thing you have to be aware of when starting down this path.
There is a maximum total item size of 400 KB for each item in DynamoDB, including key and attribute names (Limits in DynamoDB Items). This means you can only store so many attributes in an item, especially if you intend to put the actual report inside the attribute. I would advise against that, also because you will burn read capacity units for the whole item every time you read one attribute out of it. You would be better off putting this data in a separate table, with the keys in the map. But truthfully, in DynamoDB I would split this whole thing up: add the milliseconds to the sort key and make every document its own item. That way you can query these items directly and use a "between" condition on the sort key to select specific date-time ranges. Please let me know if you meant something else.
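As a rough sketch of the "one item per document" layout suggested above, the Python below builds a sort key that embeds the timestamp and the parameters for a range query. The table name, the attribute names customerId and reportKey, and the '#' separator are all assumptions chosen for illustration. The dictionary follows the shape of a low-level DynamoDB Query request; no client library is called here.

```python
def report_sort_key(report_name, millis):
    """Build a sort key such as 'sales#0001700000000000'.

    Zero-padding the timestamp keeps string ordering consistent with
    numeric ordering, which a range condition on a string key relies on.
    """
    return f"{report_name}#{millis:016d}"


def build_range_query(table, customer_id, report_name, start_ms, end_ms):
    """Return parameters shaped like a low-level DynamoDB Query request."""
    return {
        "TableName": table,  # hypothetical table name supplied by the caller
        "KeyConditionExpression": "#pk = :pk AND #sk BETWEEN :lo AND :hi",
        # customerId / reportKey are assumed attribute names.
        "ExpressionAttributeNames": {"#pk": "customerId", "#sk": "reportKey"},
        "ExpressionAttributeValues": {
            ":pk": {"S": customer_id},
            ":lo": {"S": report_sort_key(report_name, start_ms)},
            ":hi": {"S": report_sort_key(report_name, end_ms)},
        },
    }


params = build_range_query(
    "Reports", "customer-42", "sales", 1700000000000, 1700003600000
)
print(params["ExpressionAttributeValues"][":lo"]["S"])
```

Because each report is its own item, a query like this reads only the items inside the range instead of one large item holding every report.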

Tinkerpop3/Gremlin. Find (A) Upsert (B) add Edge A to B

I am looking for an upsert functionality in Gremlin.
The client program has a stream of (personId, favoriteMovieNodeId) pairs. For each pair it needs to look up the movie by favoriteMovieNodeId, then UPSERT a Person vertex and create the [favoriteMovie] edge.
This will create duplicate Person nodes:
g.V().has(label,'movies').has('uid',$favoriteMovieNodeId).as('fm')
.addV('Person').property('personId', $personId).addE('favMovie').to('fm')
Is there a way to check whether a node exists, based on its properties, before adding it? I can't seem to find documentation on this very basic graph function that's part of every underlying graph DB.
If the movie is guaranteed to exist, then it's:
g.V().has('movies','uid',$favoriteMovieNodeId).as('fm').
coalesce(V().has('Person','personId', $personId),
addV('Person').property('personId', $personId)).
addE('favMovie').to('fm')

How do I search only first object in an array of nested objects using ElasticSearch

I'm using ElasticaBundle and ElasticSearch with Symfony2 in a system I've written.
A 'person' can have many 'positions' in their work history. The positions are sorted by date desc, and in order to find someone's current position with PHP I retrieve and read the first object in the array.
I am struggling to search only the current (first) position using ElasticSearch. I have set the mappings up as nested, and I am able to perform a Nested Query returning a 'person' who has a 'position' that matches all my criteria. What I can't do is work out how to search for the criteria only in the first listed 'position'. Does anyone have any ideas to set me off on the right path?
The only options I can think of at the moment are:
maintain an order value in each object so I can pick out the 1st,
or create another field in the entity that only has a relationship with the 1st position
I've read in the ElasticSearch documentation that this kind of key isn't supported in ElasticSearch, but now I can't find the page in question. Sorry.
In the end I got round the problem by setting an identifier on the current position, i.e. giving it an identifier of 0, with all other positions numbered in order.
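A query along those lines might be built as in the Python sketch below. The nested path positions, the identifier field order, and the searched field title are assumed names, since the actual mapping is not shown. Both conditions sit inside the same nested query, so they have to match the same position object.

```python
import json


def current_position_query(field, text, path="positions", order_field="order"):
    """Build a nested query that only matches the position whose identifier is 0."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "bool": {
                        # Restrict to the current position (identifier 0).
                        "filter": [{"term": {f"{path}.{order_field}": 0}}],
                        # Apply the search criteria to that same position.
                        "must": [{"match": {f"{path}.{field}": text}}],
                    }
                },
            }
        }
    }


body = current_position_query("title", "engineer")
print(json.dumps(body, indent=2))
```

The resulting dictionary is the request body for a search; how it is sent depends on the client in use.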
