How to get an attribute named ID in gremlin - graph

I'm getting weird results while writing a Gremlin query. I can happily use the has function for most of the attributes of my nodes; for example, "().has('name', 'VerisignCzagExtension').property('id')" will return v5086. But when I attempt to use the has function with the attribute id, it never returns true. For example, "().has('id', 'v5086').property('id')" returns no results. Does anyone have any idea why this is happening?
Thanks.

Internally, Neo4j stores all IDs as java.lang.Long objects. This special behavior applies to the id property only; all other properties are stored with their implied data types. That's why has('name', 'VerisignCzagExtension') works (the name property is excluded from the special handling meant for id). I'm assuming v5086 is being typecast to java.lang.Long, thus losing its real value, which would explain the zero results after a has('id', 'v5086') Gremlin step.
AFAIK, the id property is immutable (it can't be changed). If you need to look up vertices by id using a has Gremlin step, it would look something like has('id', 5086L), assuming the vertex id is 5086 and is stored as a java.lang.Long value. The extra L is for explicit java.lang.Long type-casting; Neo4j would assume java.lang.Integer without it, and your Gremlin step would again return zero results.
Finally, you might want to store your own identifier under something other than id, e.g. as a regular property with the key name.
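For illustration, here is a minimal sketch of both lookup styles in modern gremlin_python syntax (the question uses the older Gremlin 2 pipe syntax); the connection URL, vertex label, and the external_id property key are assumptions, not part of the original setup.
# Sketch only: assumes a Gremlin Server at localhost and the gremlinpython package.
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

conn = DriverRemoteConnection('ws://localhost:8182/gremlin', 'g')
g = traversal().with_remote(conn)  # older gremlinpython versions use withRemote

# Look up by the native id: pass the value in the type the backend stores (a Long, not 'v5086').
v = g.V().hasId(5086).next()

# Alternative suggested above: keep your own identifier under an ordinary property key
# (hypothetically 'external_id' here) so type coercion never gets in the way.
g.addV('extension').property('external_id', 'v5086').iterate()
v2 = g.V().has('external_id', 'v5086').next()

conn.close()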
Hope this helps.


Querying on Global Secondary Indexes with the contains operator

I've been reading the DynamoDB docs and was unable to understand whether it makes sense to query a Global Secondary Index with the 'contains' operator.
My problem is as follows: my DynamoDB document has a list of embedded objects, and every object has a 'code' field which is unique:
{
"entities":[
{"code":"entity1Code", "name":"entity1Name"},
{"code":"entity2Code", "name":"entity2Name"}
]
}
I want to be able to get all documents that contain entities with entity.code = X.
For this purpose I'm considering adding a Global Secondary Index that would contain all entity.codes present in the current DB document, separated by commas. So the example above would look like:
{
"entities":[
{"code":"entity1Code", "name":"entity1Name"},
{"code":"entity2Code", "name":"entity2Name"}
],
"entitiesGlobalSecondaryIndex":"entityCode1,entityCode2"
}
And then I would like to apply a filter expression on entitiesGlobalSecondaryIndex, something like: entitiesGlobalSecondaryIndex contains entityCode1.
Would this be efficient, or does using a global secondary index not make sense in this way, so that DynamoDB will simply check the condition against every document, which is similar to a scan?
Any help is very appreciated,
Thanks
The contains operator of a query cannot be run on a partition key. In order for a query to use any sort of operator (contains, begins_with, >, <, etc.) you must have a range attribute, aka your sort key.
You can very well set up a GSI with some value as your PK and this code as your SK. However, GSIs are replications of the table: there is a slight potential for the data in a GSI to lag behind that of the master copy. If the query you're running against this GSI isn't very frequent, then you're probably safe from that.
However, if you are trying to do this to the entire table at once, then it's no better than a scan.
If what you need is for a specific code to return all its documents at once, then you could do a GSI with that as the PK. If you add a date field as the SK of this GSI, it would even be time-sorted. If you query against that code in that index, you'll get every single one of them.
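As a rough sketch of that pattern with boto3 (the table name, index name, and attribute names are assumptions for illustration, and each item is assumed to carry a scalar entityCode attribute):
# Sketch only: assumes a GSI named CodeDateIndex with entityCode as PK and createdAt as SK.
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource('dynamodb').Table('Documents')

response = table.query(
    IndexName='CodeDateIndex',
    KeyConditionExpression=Key('entityCode').eq('entity1Code'),
    ScanIndexForward=False,  # newest first, thanks to the date sort key
)
documents = response['Items']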
Since you may have multiple codes, and if there aren't too many per document, you could maybe use a sparse index: if you have an entity with code "AAAA", then you also have an attribute named AAAA (or AAAAflag or something). It is always null/does not exist unless the entities list contains that code. If you do a GSI on this AAAAflag attribute, it will only contain documents that contain that entity code and ignore all documents where this attribute does not exist. This may work for you if you can also provide a good PK on it to keep the data well partitioned, and if you don't have too many codes.
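A hedged sketch of the sparse-index idea (the flag attribute, index name, table name, and key schema are all assumptions):
# Sketch only: only documents containing code "AAAA" ever get the AAAAflag attribute,
# so a GSI keyed on AAAAflag contains exactly those documents and nothing else.
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource('dynamodb').Table('Documents')

# Mark one document as containing the code; all other documents simply lack the attribute.
table.update_item(
    Key={'pk': 'DOC#123'},
    UpdateExpression='SET AAAAflag = :v',
    ExpressionAttributeValues={':v': 'AAAA'},
)

# Query the sparse GSI (hypothetically named AAAAflagIndex, with AAAAflag as its PK).
response = table.query(
    IndexName='AAAAflagIndex',
    KeyConditionExpression=Key('AAAAflag').eq('AAAA'),
)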
Filter expressions, by the way, are different from all of the above. Filter expressions are run on the data that would be returned, after it has already been read out of the table. This is useful if you have a multi-access-pattern setup but don't want a particular call to get all the documents associated with a particular PK, in the interest of keeping the data your code works with concise. A query with a filter expression still reads everything the query matches, but only presents what makes it past the filter.
If you are only querying against a particular PK at any given time and you want to know if it contains any entities of X, then a filter expression would work perfectly. Of course, this is only per PK and not for your entire table.
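For example, a single-partition query with a filter expression might look like this in boto3 (table, key, and attribute names are assumptions):
# Sketch only: the whole partition for 'DOC#123' is still read (and billed);
# the filter merely trims what DynamoDB returns to the client.
import boto3
from boto3.dynamodb.conditions import Key, Attr

table = boto3.resource('dynamodb').Table('Documents')

response = table.query(
    KeyConditionExpression=Key('pk').eq('DOC#123'),
    FilterExpression=Attr('entitiesGlobalSecondaryIndex').contains('entityCode1'),
)
matching_items = response['Items']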
If all you need is numbers, then you could keep a count attribute on the document, or a meta document in that partition that contains these values and can be queried directly.
Lastly, and I have no idea if this would work or not: if your entities attribute is a map type, you might very well be able to filter against the entity code, and maybe even with entities.code.contains(value) if it were an SK, but I do not know whether this is possible.

How to get properties that have an id/key together with vertex info in one Gremlin query using Gremlin.Net

I'm trying to get properties that have a certain key or id with the following Gremlin.Net query, but the vertex info (id and label) in VertexProperty is null in the result.
g.V().Properties<VertexProperty>().HasKey(somekey).Promise(p => p.ToList())
So I tried another way, but its return type is Path, and I had to write ugly code for the type conversion.
g.V().Properties<VertexProperty>().HasKey(somekey).Path().By(__.ValueMap<object, object>(true))
Is there a better way to achieve this?
I think basically the only thing missing to get what you want is the Project() step.
In order to find all vertices that have a certain property key and then get their id, label, and then all information about that property, you can use this traversal:
g.V().
Has(someKey).
Project<object>("vertexId", "vertexLabel", "property").
By(T.Id).
By(T.Label).
By(__.Properties<object>(someKey).ElementMap<object>()).
Promise(t => t.ToList());
This returns a Dictionary where the keys are the arguments given to the Project step.
If you instead want to filter by a certain property id instead of a property key, then you can do it in a very similar way:
g.V().
Where(__.Properties<object>().HasId(propertyId)).
Project<object>("vertexId", "vertexLabel", "property").
By(T.Id).
By(T.Label).
By(__.Properties<object>().HasId(propertyId).ElementMap<object>()).
Promise(t => t.ToList());
In both cases, this first filters the vertices down to only those that have the properties we are looking for. That way, we can use the Project() step afterwards to get the desired data back.
ElementMap should give back all the information about the properties that you want.
Note however that these traversals will most likely require a full graph scan in JanusGraph, meaning that it has to iterate over all vertices in your graph. The reason is that these traversals cannot use an index which would make them much more efficient. So, for larger graphs, the traversals will probably not be feasible.
If you had the vertex ids available instead of the property ids in the second traversal, then you could make the traversal a lot more efficient by replacing g.V().Where([...]) simply with g.V(id).
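For comparison, a traversal that starts from a known vertex id could look like the following; this sketch is written in gremlin_python for brevity (the Gremlin.Net form is analogous), and the connection URL, vertex id, and property key are placeholders:
# Sketch only: starting at a known vertex id avoids scanning all vertices.
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.traversal import T
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

conn = DriverRemoteConnection('ws://localhost:8182/gremlin', 'g')
g = traversal().with_remote(conn)  # older gremlinpython versions use withRemote

vertex_id = 4168  # hypothetical vertex id
result = (g.V(vertex_id).
            project('vertexId', 'vertexLabel', 'property').
            by(T.id).
            by(T.label).
            by(__.properties('someKey').elementMap()).
            next())
conn.close()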

DynamoDB: Index on List attribute and query NOT_CONTAINS

I am trying to figure out (at this point I think the answer is no) if it is possible to build an index on a List attribute and query NOT_CONTAINS on that attribute.
Example table:
Tasks
Task_id: string
solved_by: List<String> # stores list of user_ids who previously solved this task.
My query would be:
Get me all the tasks not yet solved by current_user
select * from tasks where tasks.solved_by NOT_CONTAINS current_user_id
Is it possible to do this without full scans? I tried creating an index key attribute of type L, but the AWS CLI errors out saying Member must satisfy enum value set: [B, N, S].
If this is not possible with dynamodb, please suggest what datastore I can use.
Any help is highly appreciated. Thanks!
As you found out, and as the error you got suggests, this is NOT possible.
However, I'd ask whether your design couldn't be improved. Storing a potentially unbounded list of entries (users in your case) inside a single item, which is limited to 400 KB, seems dangerous.
If, instead, you stored the fact that a particular user solved a particular task as a separate item (partition key: task_id, sort key: user_id), then you could easily look up whether a user has solved a task or not. You could also store additional information about the particular solution or attempts.
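A rough boto3 sketch of that item layout (the table name, attribute names, and values are assumptions):
# Sketch only: one item per solved (task, user) pair; partition key task_id, sort key user_id.
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource('dynamodb').Table('Tasks')

# Record that a user solved a task, plus optional details about the solution.
table.put_item(Item={
    'task_id': 'TASK#42',
    'user_id': 'USER#alice',
    'solved_at': '2023-01-01T12:00:00Z',
})

# Cheap point lookup: has this user solved this task?
resp = table.get_item(Key={'task_id': 'TASK#42', 'user_id': 'USER#alice'})
already_solved = 'Item' in resp

# Or list everyone who has solved the task.
solvers = table.query(KeyConditionExpression=Key('task_id').eq('TASK#42'))['Items']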
If you haven't heard of DynamoDB single-table design yet, or of overloading indexes, I can recommend looking at:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-modeling-nosql-B.html
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-gsi-overloading.html
https://www.dynamodbbook.com/
Update
I just realised you care about a negation (NOT_CONTAINS); for those, you can't use an index anyway. For the sort key you can only use positive comparisons (=, <, >, <=, >=, between, begins_with): https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Query.html#Query.KeyConditionExpressions
So you might have to rethink the whole approach: either pre-process the data stored in DDB so it's easier to fetch, or pick a different database.
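To make the limitation concrete: with boto3, the negated condition can only be expressed as a filter expression, which runs after the read, so the straightforward version of the original query degenerates into a scan (table and attribute names are assumptions):
# Sketch only: every item in the table is read (and billed) before the filter is applied.
import boto3
from boto3.dynamodb.conditions import Attr

table = boto3.resource('dynamodb').Table('Tasks')

response = table.scan(
    FilterExpression=~Attr('solved_by').contains('USER#alice')  # ~ negates the condition
)
unsolved_tasks = response['Items']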
In your original question, you defined your access pattern as
Get me all the tasks not yet solved by current_user
In a later comment, you clarified that the access pattern is
A solver should be shown a task that is not yet solved by them.
which is a slightly different access pattern.
Here's one way you could fetch a task not yet solved by a user.
In this data model, I chose to model Users and Tasks as separate items. Tasks have numerically increasing IDs. Each User item should start with the lastSolved attribute set to 1. Each time you fetch a new Task for a user, you fetch TASK#{last_solved+1} and increment the lastSolved attribute by 1.
You could probably take a similar approach by using timestamps instead of numbers... anything sortable, really.
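A hedged sketch of that fetch-and-advance flow in boto3 (the table name and the pk/sk single-table layout are assumptions):
# Sketch only: read the user's lastSolved counter, fetch TASK#{lastSolved + 1},
# then advance the counter with a condition to guard against concurrent fetches.
import boto3

table = boto3.resource('dynamodb').Table('AppTable')

user = table.get_item(Key={'pk': 'USER#alice', 'sk': 'USER#alice'})['Item']
next_task_number = int(user['lastSolved']) + 1

task = table.get_item(
    Key={'pk': f'TASK#{next_task_number}', 'sk': f'TASK#{next_task_number}'}
).get('Item')

if task is not None:
    table.update_item(
        Key={'pk': 'USER#alice', 'sk': 'USER#alice'},
        UpdateExpression='SET lastSolved = :n',
        ConditionExpression='lastSolved = :prev',
        ExpressionAttributeValues={':n': next_task_number, ':prev': next_task_number - 1},
    )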

Gremlin code to find 1 vertex with specific property

I want to return a node that has a specific UUID as a property value, and I just want to return one of them (there could be several matches).
g.V().where('application_uuid', eq(application_uuid).next()
Would the above query return all the nodes? How do I just return 1?
I also want to get the property map of this node. How would I do this?
You would just do:
g.V().has('application_uuid', application_uuid).next()
but even better would be the signature that includes the vertex label (if you can):
g.V().has('vlabel', 'application_uuid', application_uuid).next()
Perhaps going a bit further if you explicitly need just one you could:
g.V().has('vlabel', 'application_uuid', application_uuid).limit(1).next()
so that both the graph provider and/or Gremlin Server know your intent is to only next() back one result. In that way, you may save some extra network traffic/processing.
This is a very basic query; you should read more about Gremlin. I can suggest the Practical Gremlin book.
As for your query, you can use has to filter by property, and limit to get a specific number of results:
g.V().has('application_uuid', application_uuid).limit(1).next()
Running your query without the limit would also return a single result, since next() just pulls one item from the result iterator. Using toList() would return all results in a list.
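The question also asks for the property map of the node. A hedged sketch (gremlin_python syntax; the connection details and UUID value are placeholders): elementMap() returns the vertex id, label, and properties as a single map, and valueMap(True) is the older equivalent.
# Sketch only: limit(1) plus elementMap() returns one vertex with its id, label, and properties.
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

conn = DriverRemoteConnection('ws://localhost:8182/gremlin', 'g')
g = traversal().with_remote(conn)  # older gremlinpython versions use withRemote

application_uuid = 'some-uuid-value'  # placeholder
props = (g.V().
           has('application_uuid', application_uuid).
           limit(1).
           elementMap().
           next())
conn.close()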

EMC Documentum DQL - How to delete repeating attribute

I have a few objects created in my database and I need to delete some of the repeating attributes related to them.
The query I'm trying to run is:
UPDATE gemp1_product objects REMOVE ingredients[1] WHERE (r_object_id = '08015abd8002cd68')
But all I get is the following error message:
Error querying database.
[DM_QUERY_E_UPDATE_INDEX]error: "UPDATE: Unable to REMOVE the attribute ingredients at index 1."
[DM_OBJECT_W_DELETE_ATTR_POSITION_ERROR]warning: "attempt to delete
non-existent attribute 88"
Object 08015abd8002cd68 exists and I can see it in the database. Queries like SELECT and DELETE work fine, but I do not want to delete the whole object.
There is no easy way to do this. The reason is that repeating attributes are ordered, to enable multiple repeating attributes to be synchronized for a given object.
Either:
1. set the attribute value to be empty for the given position, and change your code to discard empty attributes, or
2. use multiple DQL statements to shuffle the order so that the last one becomes empty, or
3. change your data model, e.g. use a single attribute as a property bag with pre-defined delimiters.
Details (1)
UPDATE gemp1_product OBJECTS SET ingredients[1] = '' WHERE ...
Details (2)
For each index, first find the value at index+1:
SELECT ingredients
FROM gemp1_product
WHERE (i_position*-1)-1 = <index+1>
ENABLE (ROW_BASED)
Use the value in a new query:
UPDATE gemp1_product OBJECTS SET ingredients[1] = '<value_from_above>' WHERE ...
It should also be possible to do this by nesting DQL somehow, but it might not be worth the effort.
Something is either wrong with your query or with your repository. I think you are mistyping your attribute name or using the wrong index in your UPDATE query.
If you google for DM_OBJECT_W_DELETE_ATTR_POSITION_ERROR you'll find a bit more detailed explanation:
CAUSE: Program executed a DeleteAttr operation that specified a non-existent attribute position (either a negative number or a number larger than the number of attributes in the object).
From this you could guess that the type isn't in a consistent state, or that you are trying to remove too large an index of your repeating attribute, etc. Did you check your repository with the Consistency Checker job and other similar jobs?
As for removing a repeating property (attribute) value with a DQL query, this is unachievable with a single query, since you need to specify the index position, which you don't know up front. Writing a simple script, or doing it manually if there isn't a big amount of values to delete, is the way you want to go.
