How to get all the vertices/edges without specifying the types in NebulaGraph? - nebula-graph

Is there any way to find all vertices without specifying the relevant tag in the Nebula Graph database?
I tried match (v) return (v), but the following error message popped up: "Scan vertices or edges need to specify a limit number, or limit number can not push down."
How can I get all vertices without specifying a tag?

You cannot directly get all vertices without specifying a tag, or all edges without specifying an edge type, unless you use the LIMIT clause to limit the number of results returned.

Think twice before retrieving all the vertices and edges in the NebulaGraph database because when the data set is large enough, such an operation can be time-consuming and may affect or even disable the service.
If you only want to calculate the number of the vertices and edges, run SUBMIT JOB STATS, and then SHOW STATS. This allows you to see the statistics of the graph space, in a much more performance-friendly way.

In nGQL, you cannot directly get all the vertices without specifying tags, or all the edges without specifying edge types, unless you use the LIMIT clause to limit the number of results.
For example, you cannot run MATCH (n) RETURN (n). It returns an error like "Scan vertices or edges need to specify a limit number, or limit number can not push down."
Alternatively, you can use NebulaGraph Algorithm, or retrieve the vertices tag by tag and then combine the results yourself.
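The tag-by-tag approach could look roughly like the sketch below. It only builds the nGQL statements and merges results on the client side; it does not talk to a server. The tag names (player, team), the function names, and the (vertex ID, properties) row shape are hypothetical, and it assumes a NebulaGraph version that accepts MATCH on a tag when a LIMIT is given.

```python
def per_tag_queries(tags, limit):
    """Build one nGQL MATCH statement per tag, each capped by LIMIT."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Backticks guard tag names that clash with nGQL keywords.
    return [f"MATCH (v:`{tag}`) RETURN v LIMIT {limit}" for tag in tags]


def merge_by_vid(result_sets):
    """Combine per-tag results, keeping one entry per vertex ID.

    Each result set is assumed to be a list of (vid, properties) pairs.
    A vertex carrying several tags shows up once per tag, so its
    properties are merged into a single dictionary.
    """
    merged = {}
    for rows in result_sets:
        for vid, props in rows:
            merged.setdefault(vid, {}).update(props)
    return merged


# Hypothetical tags; in practice the list would come from SHOW TAGS.
for statement in per_tag_queries(["player", "team"], 100):
    print(statement)
```

Keeping a LIMIT on every statement preserves the safeguard the error message is asking for, at the cost of having to page through large tags.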

Related

Let A be a Gremlin query, and let t1, t2 be two points in time. Excluding ids of edges and nodes, is A_t1 = A_t2?

This might be a pretty obscure question but I'll try my best here.
Assuming I have a very simple query, for instance:
g.addV('Person').property('name', 'Marko');
And then I run the same query again.
Of course the graph created two different nodes, but regardless of the id, are they "the same"?
Same for querying the graph:
g.V()
Will the graph produce the results in the same order for any run (assuming it didn't change)?
What I'm trying to ask - can I count on the order of the Gremlin execution?
Thanks!
Gremlin does not enforce iteration order unless you explicitly specify it. It is up to the underlying graph to determine order and most that I'm familiar with do not make such guarantees. Therefore, if you want an order, you need to specify it as in: g.V().order().by('name').

Gremlin query to find the entire sub-graph that a specific node is connected in any way to

I am brand new to Gremlin and am using gremlin-python to traverse my graph. The graph is made up of many clusters or sub-graphs which are intra-connected, and not inter-connected with any other cluster in the graph.
A simple example of this is a graph with 5 nodes and 3 edges:
Customer_1 is connected to CreditCard_A with 1_HasCreditCard_A edge
Customer_2 is connected to CreditCard_B with 2_HasCreditCard_B edge
Customer_3 is connected to CreditCard_A with 3_HasCreditCard_A edge
I want a query that will return a sub-graph object of all nodes and edges connected (in or out) to the queried node. I can then store this sub-graph as a variable and then run different traversals on it to calculate different things.
This query would need to be recursive as these clusters could be made up of nodes which are many (inward or outward) hops away from each other. There are also many different types of nodes and edges, and they all must be returned.
For example:
If I specified Customer_1 in the query, the resulting sub-graph would contain Customer_1, Customer_3, CreditCard_A, 1_HasCreditCard_A, and 3_HasCreditCard_A.
If I specified Customer_2, the returned sub-graph would consist of Customer_2, CreditCard_B, 2_HasCreditCard_B.
If I queried Customer_3, the exact same subgraph object as returned from the Customer_1 query would be returned.
I have used both Neo4J with Cypher and Dgraph with GraphQL and found this task quite easy in these two languages, but am struggling a bit more with understanding Gremlin.
EDIT:
From this question, the selected answer should achieve what I want if I change .both('created') to just .both(), so that no edge type is specified.
However, the loop syntax: .loop{true}{true} is invalid in Python of course. Is this loop function available in gremlin-python? I cannot find anything.
EDIT 2:
I have tried this and it seems to be working as expected, I think.
g.V(node_id).repeat(bothE().otherV().simplePath()).emit()
Is this a valid solution to what I am looking for? Is it also possible to include the queried node in this result?
Regarding the second edit, this looks like a valid solution that returns all the vertices connected to the starting vertex.
Some small fixes:
You can change bothE().otherV() to both().
If you also want to get the starting vertex, you need to move the emit step before the repeat.
I would add a dedup step to remove duplicate vertices (there can be more than one path to a vertex).
g.V(node_id).emit().repeat(both().simplePath()).dedup()
Example: https://gremlify.com/jngpuy3dwg9
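To check what such a traversal should return, here is a plain-Python sketch of the same idea (an undirected breadth-first search) over the example graph from the question. It is not gremlin-python and does not use a graph database; the function name and the tuple layout are made up for illustration. Unlike the traversal above, which returns vertices only, it also collects the edge labels it crosses.

```python
from collections import deque

# Example graph from the question: (edge label, from vertex, to vertex).
EDGES = [
    ("1_HasCreditCard_A", "Customer_1", "CreditCard_A"),
    ("2_HasCreditCard_B", "Customer_2", "CreditCard_B"),
    ("3_HasCreditCard_A", "Customer_3", "CreditCard_A"),
]


def connected_subgraph(start, edges):
    """Return (vertices, edge labels) reachable from start, ignoring direction."""
    # Index every edge under both endpoints so it can be walked either way.
    neighbours = {}
    for label, src, dst in edges:
        neighbours.setdefault(src, []).append((label, dst))
        neighbours.setdefault(dst, []).append((label, src))

    seen_vertices = {start}  # the start vertex is part of the result
    seen_edges = set()
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        for label, other in neighbours.get(vertex, []):
            seen_edges.add(label)
            if other not in seen_vertices:  # visit each vertex once
                seen_vertices.add(other)
                queue.append(other)
    return seen_vertices, seen_edges


vertices, edge_labels = connected_subgraph("Customer_1", EDGES)
print(sorted(vertices))
print(sorted(edge_labels))
```

Starting from Customer_1 or from Customer_3 yields the same sets, which matches the behaviour asked for in the question.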

ArangoDB anonymous graph traversal

I am planning to use ArangoDB and I am faced with a problem I don't know how to solve. I would like to do simple traversals, but in my case there are two requirements that I don't know how to meet:
1. I will not know in advance the type of vertices that an edge will connect to. I want to be able to connect an edge of one type to any vertex on either side.
2. For one vertex, I want to retrieve all connected vertices (depth 1) no matter the edge type.
For requirement 1, an example would be a Tag vertex (to tag some entity with some information): I want to be able to tag any vertex using, e.g., a HasTag edge in a named graph. From what I currently see, I need to define the "From" collections (the "To" collection is the Tag collection), and this is limited to 10 collections. Since I could have 100 or more From collections, I don't see how to solve this with named graphs.
One option would be to use anonymous graphs, but then I have a problem with the second requirement. I also want to have an option, when given a vertex, to find all connected vertices (depth = 1) no matter the type of an edge. In an anonymous graph I would need to specify all of the edge collections in a query and again, there could be 100 or more of them. I don't know if there is a limit to this number but I would assume there is one; maybe I'm mistaken since I haven't yet tried it out.
Does anyone have any idea how to solve this with ArangoDB? I really like the database but I would like it to be more "typeless", that is, I would rather not have to define the type of vertex collection an edge can connect to.
Best regards
Tomaz
You can have more than 10 vertex collections in a named graph. The limitation of 10 only exists in the webUI. Creating the named graph over the ArangoShell or the server console will work.

How to reduce the number of same edge label between two Vertex in Titan

Let's say we have two types of vertex, LOGIN_USER (property: user_id) and IP (property: ip), and the edge between them is LOGIN (properties: session_id, login_time).
The problem with this model is that there are too many edges between one USER and one IP (there can be thousands).
Is there any way to reduce the number of edges between the two vertices while still keeping the session_id and login_time properties? We want to filter on these two properties in some queries.
Edge properties don't support cardinality list, which vertex properties do support.
If we put all the edge properties into the vertex, does it impact the performance of fetching the vertex?
When does Titan load the properties of a vertex? When traversing to a vertex, say with g.V(1).next(), does Titan load all of the vertex's properties?
When you say "thousands" of edges between USER and IP, do you think it could actually be "millions" or "tens of millions" or more? If not, then "thousands" should not be a problem for Titan with vertex centric indices. Index your edge properties and you should have fast ordering and traversals.
When you start to get deep into "millions", you might start to experience some problems - for me that has always been with processing global queries with titan-hadoop as the Vertex and its edges must be held in memory. That can make for some trouble spots when you're doing global analytics. From an operational perspective, Titan was always happy to keep writing edges into the millions on a vertex, but I'd tend to avoid it. Of course, much of my experience with this came before vertex cutting in Titan 1.0:
"Cutting a vertex means storing a subset of that vertex’s adjacency list on each partition in the graph. In other words, the vertex and its adjacency list is partitioned thereby effectively distributing the load on that single vertex across all of the instances in the cluster and removing the hot spot."
which you might experiment with as you start to grow supernodes into the millions.
I suppose the other option for supernodes in the millions of edges would be to model around it. Perhaps you introduce some structure between USER and IP. Convert that single LOGIN edge to some vertices/edges that might introduce a time concept between them like:
USER -> LOGIN_YEAR -> LOGIN_MONTH -> IP
So now, instead of creating just one edge between USER and IP you create a LOGIN_YEAR vertex and a LOGIN_MONTH vertex.
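As a rough illustration of that time-bucketed model, the sketch below groups login records by user, year and month in plain Python. It is not Titan code. The record layout, the function name, and the choice to keep session_id and login_time on the final LOGIN_MONTH -> IP hop are assumptions made for the example, not something the answer specifies.

```python
from collections import defaultdict
from datetime import datetime


def bucket_logins(logins):
    """Group (user_id, ip, session_id, login_time) records by user, year and month.

    Each key stands for one USER -> LOGIN_YEAR -> LOGIN_MONTH chain; the
    values are the logins that would hang off that LOGIN_MONTH vertex.
    """
    buckets = defaultdict(list)
    for user_id, ip, session_id, login_time in logins:
        key = (user_id, login_time.year, login_time.month)
        buckets[key].append((ip, session_id, login_time))
    return dict(buckets)


# Hypothetical sample data.
logins = [
    ("u1", "10.0.0.1", "s1", datetime(2016, 1, 3, 9, 0)),
    ("u1", "10.0.0.1", "s2", datetime(2016, 1, 20, 18, 30)),
    ("u1", "10.0.0.1", "s3", datetime(2016, 2, 1, 8, 15)),
]
for key, entries in bucket_logins(logins).items():
    print(key, len(entries))
```

With this layout the USER vertex only fans out to one vertex per year, and the per-login edges are spread across the month vertices instead of piling up on a single USER/IP pair.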

How to ensure there is a single edge in a graph for a given order_id?

In my current scenario I have product, customer and seller nodes in my graph ecosystem. The problem I am facing is that I have to ensure the uniqueness of
(customer)-[buys]->product
with order_item_id as a property of the buys edge. I have to ensure that there is a unique buys edge for a given order_item_id. In this way I want to ensure that my graph insertion remains idempotent and no repeated buys edges are created for a given order_item_id.
Creating an order_item_id property:
if (!mgmt.getPropertyKey("order_item_id")) {
    order_item_id = mgmt.makePropertyKey("order_item_id").dataType(Integer.class).make();
} else {
    order_item_id = mgmt.getPropertyKey("order_item_id");
}
What I have found so far is that building a unique index might solve my problem, like this:
if (mgmt.getGraphIndex('order_item_id')) {
    ridIndexBuilder = mgmt.getGraphIndex('order_item_id')
} else {
    ridIndexBuilder = mgmt.buildIndex("order_item_id", Edge.class).addKey(order_item_id).unique().buildCompositeIndex();
}
Or I can also use something like
mgmt.buildEdgeIndex(graph.getOrCreateEdgeLabel("product"),"uniqueOrderItemId",Direction.BOTH,order_item_id)
1. How should I ensure the uniqueness of a single buys edge for a given order_item_id? (I don't have a use case to search based on order_item_id.)
2. What is the basic difference between creating an index on an edge using buildIndex and using buildEdgeIndex?
You cannot enforce the uniqueness of properties at the edge level, i.e. between two vertices (see this question on the mailing list). If I understand your problem correctly, building a CompositeIndex on edges with a uniqueness constraint for a given property should address your problem, even though you do not plan to search these edges by the indexed property. However, this could lead to performance issues when inserting many edges simultaneously due to locking. Depending on the rate at which you insert data, you may have to skip the locking (= skip the uniqueness constraint) and risk duplicate edges, then handle the deduplication yourself at read time and/or run post-import batch jobs to clean up potential duplicates.
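If you do skip the uniqueness constraint, read-time deduplication could be sketched along these lines. This is plain Python over already-fetched edges, not Titan code; the function name and the dictionary representation of an edge are hypothetical, and keeping the first edge seen is just one possible policy.

```python
def dedup_buys_edges(edges):
    """Keep the first 'buys' edge seen for each order_item_id."""
    seen = set()
    unique = []
    for edge in edges:
        key = edge["order_item_id"]
        if key in seen:
            continue  # a later duplicate of an edge already kept
        seen.add(key)
        unique.append(edge)
    return unique


# Hypothetical edges as they might come back from a read.
fetched = [
    {"order_item_id": 1, "customer": "c1", "product": "p1"},
    {"order_item_id": 1, "customer": "c1", "product": "p1"},
    {"order_item_id": 2, "customer": "c1", "product": "p2"},
]
print(dedup_buys_edges(fetched))
```

A post-import cleanup job could use the same grouping to decide which duplicate edges to delete.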
buildIndex() builds a global graph index (either a CompositeIndex or a MixedIndex). These kinds of indexes typically allow you to quickly find the starting points of your traversal within a graph.
However, buildEdgeIndex() allows you to build a local, "vertex-centric index", which is meant to speed up the traversal between vertices with a potentially high degree (= huge number of incident edges, incoming and/or outgoing). Such vertices are called "super nodes" (see the "A solution to the supernode problem" blog post from Aurelius), and although they tend to be quite rare, the likelihood of traversing them isn't that low.
Reference: Titan documentation, Ch. 8 - Indexing for better Performance.

Resources