How to add list type properties to vertices in the NebulaGraph database? - nebula-graph

We are testing the NebulaGraph database on AWS and need to add dynamic vertex properties. For example, a Person vertex may have purchased a product many times. How can we add a list-type orderTime property so that every time the person purchases the product, a timestamp or time string is pushed into that list?
In the end we want something like (v:Person{orderTime:[1672898563,1672897563,1672896563]}).

Normally, if the purchase event is an edge/relationship between two vertices and it is many-to-many, it is a typical case where we should use the rank of the edge as the timestamp. In NebulaGraph there is no "id" or key for an edge; instead, a 4-tuple (source vertex, edge type, rank, destination vertex) defines one edge instance, refer to here for details.
If instead the purchase should be considered a "property" of a vertex under TAG: Person, we could leverage a self-loop edge to hold the list of orderTime values, see here for details.
Basically, the two approaches share the same idea under the hood; in the second one, the source and target vertices are simply the same.
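To make the rank idea concrete, here is a minimal in-memory sketch in Python. It is not NebulaGraph client code: the EdgeKey tuple, the "purchased" edge type, and the vertex ids are all hypothetical, and a plain set stands in for the storage layer. It only illustrates that when the timestamp is the rank, each purchase is its own edge, and the "list" is recovered by collecting ranks.

```python
from collections import namedtuple

# Hypothetical in-memory model: an edge is identified by the 4-tuple
# (source vertex, edge type, rank, destination vertex).
EdgeKey = namedtuple("EdgeKey", ["src", "edge_type", "rank", "dst"])


def record_purchase(edges, person, product, order_time):
    """Store one purchase; the timestamp is used as the edge rank."""
    edges.add(EdgeKey(person, "purchased", order_time, product))


def order_times(edges, person, product):
    """Collect the ranks of all matching edges, newest first."""
    return sorted(
        (e.rank for e in edges
         if e.src == person and e.edge_type == "purchased" and e.dst == product),
        reverse=True,
    )


edges = set()
for ts in (1672896563, 1672897563, 1672898563):
    record_purchase(edges, "person_1", "product_1", ts)

# Re-inserting the same 4-tuple does not create a second edge.
record_purchase(edges, "person_1", "product_1", 1672898563)

# Self-loop variant: source and destination are the same vertex.
record_purchase(edges, "person_2", "person_2", 1672898563)

print(order_times(edges, "person_1", "product_1"))
```

In the self-loop variant the same lookup works with the person id passed as both source and destination.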

Related

Get nodes adjacent to child node in Gremlin

I'm fairly new to Gremlin and I'm trying to query a graph starting at my vertex Customer, which is related to various nodes amongst which is the Account node. And so, I want to retrieve all nodes related to the Customer node + all nodes related to the Account node connected to it.
As you can see in the image, my Customer node is related to the account node via the has_account edge. I would like to get all the nodes adjacent to that account node.
[Image: Customer node]
As I said, I'm fairly new to Neptune, so what I've tried aside from the most basic visualizations is:
g.V('id').outE().inV().outE().inV().path()
And that gives me the nodes adjacent to the Account node but omits the other nodes adjacent to the Customer node. I've also tried some other groupings and mappings but I can't seem to make it work.
In the query you wrote above, you are starting at a vertex id and then traversing to all of its connections (the first outE().inV()), and then to the connections of those connections (the second outE().inV()). If a connection does not have any connections of its own, it is filtered out.
If you would like to return both the 1st and optionally 2nd connections, then I would look at using the optional() step for the 2nd+ level connections that may or may not exist like this:
g.V('id').outE().inV().optional(outE().inV()).path()
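The difference between the two queries can be imitated on a toy graph. The following is only a Python sketch of the semantics, not Gremlin itself; the adjacency map, the labels other than has_account, and the helper names are made up for illustration.

```python
# Hypothetical adjacency map: vertex -> list of (edge label, neighbour).
graph = {
    "customer": [("has_account", "account"), ("has_address", "address")],
    "account": [("has_card", "card")],
    "address": [],
    "card": [],
}


def out_paths(paths):
    """Mimic outE().inV(): extend each path by one hop; dead ends are dropped."""
    return [p + [label, nxt] for p in paths for label, nxt in graph[p[-1]]]


def optional_out_paths(paths):
    """Mimic optional(outE().inV()): extend where possible, else keep the path."""
    result = []
    for p in paths:
        extended = out_paths([p])
        result.extend(extended if extended else [p])
    return result


start = [["customer"]]
two_hops = out_paths(out_paths(start))
with_optional = optional_out_paths(out_paths(start))

for path in with_optional:
    print(path)
```

Here two_hops loses the customer-to-address path because address has no outgoing edges, while with_optional keeps it alongside the longer path through account.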

Gremlin query to get in and out edges for a given Vertex

I’m just playing with the Graph API in Cosmos DB, which uses the Gremlin syntax for queries.
I have a number of users (vertices) in the graph and each has ‘knows’ relationships to other users. Some of these are out edges (outE) and others are in edges (inE), depending on how the relationship was created.
I’m now trying to create a query which will return all ‘knows’ relationships for a given user (Vertex).
I can easily get the ID of either inE or outE via:
g.V('7112138f-fae6-4272-92d8-4f42e331b5e1').inE('knows')
g.V('7112138f-fae6-4272-92d8-4f42e331b5e1').outE('knows')
where '7112138f-fae6-4272-92d8-4f42e331b5e1' is the Id of the user I’m querying, but I don’t know ahead of time whether this is an in or out edge, so want to get both (e.g. if the user has in and out edges with the ‘knows’ label).
I’ve tried using a projection and OR operator and various combinations of things e.g.:
g.V('7112138f-fae6-4272-92d8-4f42e331b5e1').where(outE('knows').or().inE('knows'))
but it's not getting me back the data I want.
All I want out is a list of the IDs of all inE and outE that have the label ‘knows’ for a given vertex.
Or is there a simpler/better way to model bi-directional associations such as ‘knows’ or ‘friendOf’?
Thanks
You can use the bothE step in this case:
g.V('7112138f-fae6-4272-92d8-4f42e331b5e1').bothE('knows')
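As a rough illustration of what bothE returns, here is a small Python sketch over a made-up edge list (the edge ids, names, and the both_e helper are hypothetical, not part of any Gremlin API). It selects every edge with the given label that touches the vertex, regardless of direction.

```python
# Hypothetical edge list: (edge id, label, out-vertex, in-vertex).
edges = [
    ("e1", "knows", "alice", "bob"),
    ("e2", "knows", "carol", "alice"),
    ("e3", "likes", "alice", "carol"),
    ("e4", "knows", "bob", "carol"),
]


def both_e(vertex, label):
    """Mimic bothE(label): ids of edges touching the vertex in either direction."""
    return [eid for eid, lbl, out_v, in_v in edges
            if lbl == label and vertex in (out_v, in_v)]


knows_ids = both_e("alice", "knows")
print(knows_ids)
```

By contrast, the where(...) attempt in the question only filters the starting vertex, so it yields the vertex itself rather than its edges.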

ArangoDB anonymous graph traversal

I am planning to use ArangoDB and I am faced with a problem I don't know how to solve. I would like to do simple traversals, but in my case there are two requirements that I don't know how to meet:
1. I will not know in advance the type of vertices that an edge will connect to. I want to be able to connect an edge of one type to any vertex on either side.
2. For one vertex, I want to retrieve all connected vertices (depth 1) no matter the edge type.
For requirement 1, an example would be a Tag vertex (to tag some entity with some information): I want to be able to tag any vertex using, e.g., a HasTag edge in a named graph. From what I currently see, I need to define the "From" collections (the "To" collection is the Tag collection), and this is limited to 10 collections. Since I could have 100 or more From collections, I don't see how to solve this with named graphs.
Option would be to use anonymous graphs but then I have a problem in the second requirement. I also want to have an option, when given a vertex, to find all connected vertices (depth = 1) no matter the type of an edge. In an anonymous graph I would need to specify all of the edge collections in a query and again, there could be 100 or more of them. I don't know if there is a limit to this number but I would assume there is one - maybe I'm mistaken since I haven't yet tried it out.
Does anyone have an idea how to solve this with ArangoDB? I really like the database, but I would like it to be more "typeless", that is, I would rather not have to define the type of vertex collection an edge can connect to.
Best regards
Tomaz
You can have more than 10 vertex collections in a named graph. The limitation of 10 only exists in the webUI. Creating the named graph over the ArangoShell or the server console will work.

Is it possible to create a Vertex that requires an edge in order to be created

I'd like to know if it is possible to create a Vertex that requires an edge in order to be created.
For example, I want to create an Invoice class that has a HasCustomer edge which points to Person.
I want the HasCustomer edge to be mandatory in order to create the Invoice.
You cannot create the Invoice unless you have the HasCustomer edge.
I know we can have a link to Person, but there is no referential integrity. I can delete the Person and the Invoice will simply end up with a link to a customer that is non-existent.
In OrientDB, you have to create the edge yourself. So if you have the invoice vertex created, you then have to create the HasCustomer edge between the invoice and the customer.
However, if you delete that invoice vertex later, ODB will also remove the linked edge you created (and others) automatically, in order to keep up data integrity (i.e. no orphaned edges).
http://orientdb.com/docs/2.1/SQL-Delete-Vertex.html
This is also why you should choose the Graph API over the Document API. With the Document API, keeping up any integrity between links is up to you.
I am also not sure if it is possible, but you could theoretically create a server-side function, which triggers after any invoice vertex is created (onAfterCreation trigger), which could then create the HasCustomer edge automatically. Again, all theory on my part, as I've never done it before.
http://orientdb.com/docs/2.2.x/Functions.html
http://orientdb.com/docs/2.2.x/Dynamic-Hooks.html
Scott
Looking at the official documentation, you can't do the operation you have described. The only mandatory constraint you can use applies to the properties of a class or of an edge. As for using a link, you are right that there is no referential integrity check; this is because performing that check would be very expensive in terms of performance.

How to ensure there is a single edge in a graph for a given order_id?

In my current scenario I have product, customer and seller nodes in my graph ecosystem. The problem I am facing is that I have to ensure uniqueness of
(customer)-[buys]->product
with order_item_id as a property of the buys relation edge. I have to ensure that there is a unique buys edge for a given order_item_id. In this way I want to ensure that my graph insertion remains idempotent and no repeated buys edges are created for a given order_item_id.
Creating an order_item_id property:
// Create the property key only if it does not exist yet
if (!mgmt.getPropertyKey("order_item_id")) {
    order_item_id = mgmt.makePropertyKey("order_item_id").dataType(Integer.class).make();
} else {
    order_item_id = mgmt.getPropertyKey("order_item_id");
}
What I have found so far is that building a unique index might solve my problem, like:
// Reuse the graph index if present, otherwise build a unique composite index on edges
if (mgmt.getGraphIndex('order_item_id')) {
    ridIndexBuilder = mgmt.getGraphIndex('order_item_id')
} else {
    ridIndexBuilder = mgmt.buildIndex("order_item_id", Edge.class).addKey(order_item_id).unique().buildCompositeIndex();
}
Or I can also use something like
mgmt.buildEdgeIndex(graph.getOrCreateEdgeLabel("product"),"uniqueOrderItemId",Direction.BOTH,order_item_id)
How should I ensure this uniqueness of a single buys edge for a given order_item_id? (I don't have a use case to search based on order_item_id.)
What is the basic difference between creating an index on an edge using buildIndex and using buildEdgeIndex?
You cannot enforce the uniqueness of properties at the edge level, i.e. between two vertices (see this question on the mailing list). If I understand your problem correctly, building a CompositeIndex on edges with a uniqueness constraint for a given property should address your problem, even though you do not plan to search these edges by the indexed property. However, this could lead to performance issues when inserting many edges simultaneously, due to locking. Depending on the rate at which you insert data, you may have to skip the locking (= skip the uniqueness constraint) and risk duplicate edges, then handle the deduplication yourself at read time and/or run post-import batch jobs to clean up potential duplicates.
buildIndex() builds a global graph index (either a CompositeIndex or a MixedIndex). These kinds of indexes typically allow you to quickly find the starting points of your traversal within a graph.
However, buildEdgeIndex() allows you to build a local, "vertex-centric index", which is meant to speed up traversal through vertices with a potentially high degree (= huge number of incident edges, incoming and/or outgoing). Such vertices are called "super nodes" (see the "A solution to the supernode problem" blog post from Aurelius), and although they tend to be quite rare, the likelihood of traversing them isn't that low.
Reference: Titan documentation, Ch. 8 - Indexing for better Performance.
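If you do skip the uniqueness constraint, the read-time deduplication mentioned above could look roughly like the Python sketch below. This is only an illustration under assumptions: edges are represented as plain dictionaries, and dedupe_buys is a hypothetical helper, not something Titan provides.

```python
def dedupe_buys(edges):
    """Keep the first 'buys' edge seen for each order_item_id."""
    seen = set()
    unique = []
    for edge in edges:
        key = edge["order_item_id"]
        if key not in seen:
            seen.add(key)
            unique.append(edge)
    return unique


# Hypothetical edges as they might come back from a read.
buys = [
    {"customer": "c1", "product": "p1", "order_item_id": 101},
    {"customer": "c1", "product": "p1", "order_item_id": 101},  # duplicate insert
    {"customer": "c1", "product": "p2", "order_item_id": 102},
]

unique_buys = dedupe_buys(buys)
print([e["order_item_id"] for e in unique_buys])
```

A post-import cleanup job would follow the same idea, except that it would delete the later duplicates instead of just skipping them.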
