ArangoDB anonymous graph traversal - graph

I am planning to use ArangoDB and I am faced with a problem I don't know how to solve. I would like to do simple traversals, but in my case there are two requirements that I don't know how to meet:
1. I will not know in advance the type of vertices that an edge will connect to. I want to be able to connect an edge of one type to any vertex on either side.
2. For one vertex, I want to retrieve all connected vertices (depth 1) no matter the edge type.
For requirement 1, an example would be a Tag vertex (to tag some entity with some information): I want to be able to tag any vertex using e.g. a HasTag edge in a named graph. From what I currently see, I need to define the "From" collections (the "To" collection is the Tag collection), and this is limited to 10 collections. Since I could have 100 or more From collections, I don't see how to solve this with named graphs.
One option would be to use anonymous graphs, but then I have a problem with the second requirement. I also want to have an option, when given a vertex, to find all connected vertices (depth = 1) no matter the type of the edge. In an anonymous graph I would need to specify all of the edge collections in the query and, again, there could be 100 or more of them. I don't know if there is a limit to this number, but I would assume there is one. Maybe I'm mistaken, since I haven't tried it out yet.
Does anyone have an idea how to solve this with ArangoDB? I really like the database, but I would like it to be more "typeless", that is, I would prefer not to have to define the type of vertex collection an edge can connect to.
Best regards
Tomaz

You can have more than 10 vertex collections in a named graph. The limit of 10 exists only in the web UI. Creating the named graph from the ArangoShell (arangosh) or the server console will work.
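As a rough illustration of what such a graph definition could look like, here is a minimal Python sketch that only builds the JSON body for a named graph with 100 "from" collections. The collection names (Entity0 ... Entity99), the graph name "tagging", and both helper functions are hypothetical. To my understanding the body has the shape accepted by ArangoDB's graph creation endpoint (POST /_api/gharial), but the sketch does not contact a server, so treat it as an assumption to verify against your version.

```python
import json


def has_tag_edge_definition(from_collections, edge_collection="HasTag", to_collection="Tag"):
    """Build one edge definition: edges in `edge_collection` may start in any
    of `from_collections` and must end in `to_collection`."""
    return {
        "collection": edge_collection,
        "from": sorted(set(from_collections)),
        "to": [to_collection],
    }


def graph_payload(graph_name, edge_definitions):
    """Assemble the request body for creating a named graph."""
    return {"name": graph_name, "edgeDefinitions": list(edge_definitions)}


# Hypothetical vertex collections; far more than the 10 offered by the web UI.
vertex_collections = [f"Entity{i}" for i in range(100)]
payload = graph_payload("tagging", [has_tag_edge_definition(vertex_collections)])

# Requirement 2: depth-1 neighbours in any direction. With a named graph the
# traversal follows every edge collection of the graph, so none are listed.
NEIGHBOURS_AQL = "FOR v IN 1..1 ANY @start GRAPH 'tagging' RETURN v"

print(len(payload["edgeDefinitions"][0]["from"]))
print(json.dumps(payload)[:60])
```

With a named graph, the depth-1 query does not need to enumerate edge collections, which is what makes the second requirement manageable once the first one is solved this way.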

Related

OpenCypher - Get all nodes that are not connected to the center of the graph

I have a graph on Neptune and I query it with openCypher.
In the middle of the graph there is one big cluster of connected nodes;
at the edges you can see single nodes, or nodes that are connected to only 1-5 other nodes (see the picture).
I want to get all of them. Is there a way to do so?
I thought about options such as taking a random ID from the center and finding all the nodes that have no path to that node, or building a table of all nodes with the number of nodes connected to each, and asking for all nodes that have no more than 10 connected nodes,
but I didn't find a way to write this query.
Note that openCypher on Neptune does not include all the usual helpers, such as the 'all' predicate function, so the solution has to use only the functions that Neptune supports.
Ideally, you would want to run a Weakly Connected Components algorithm to identify the largest component and then return all nodes that are not part of it. It seems that Neptune doesn't support that algorithm out of the box, but you could implement it with Gremlin as discussed in another SO question: Find largest connected components AWS Neptune
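If exporting the node IDs and edge pairs to the client is an option, the same idea can be sketched in plain Python. This is only an illustration of the weakly-connected-components approach, not a Neptune or openCypher feature; the function names and the sample data are made up for the example.

```python
from collections import defaultdict, deque


def weakly_connected_components(nodes, edges):
    """Return a list of node sets, one per weakly connected component.
    Edge direction is ignored, which is what "weakly" means here."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbour in adjacency[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


def nodes_outside_largest_component(nodes, edges):
    """Return every node that is not part of the biggest component."""
    components = weakly_connected_components(nodes, edges)
    if not components:
        return set()
    largest = max(components, key=len)
    return set(nodes) - largest


# Made-up sample: a-b-c-d is the "center", x-y and z sit at the fringe.
sample_nodes = ["a", "b", "c", "d", "x", "y", "z"]
sample_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")]
print(sorted(nodes_outside_largest_component(sample_nodes, sample_edges)))
```

This avoids picking a "random ID from the center": the largest component is found by size, and everything else is returned.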

Gremlin query to find the entire sub-graph that a specific node is connected in any way to

I am brand new to Gremlin and am using gremlin-python to traverse my graph. The graph is made up of many clusters or sub-graphs which are intra-connected, and not inter-connected with any other cluster in the graph.
A simple example of this is a graph with 5 nodes and 3 edges:
Customer_1 is connected to CreditCard_A with 1_HasCreditCard_A edge
Customer_2 is connected to CreditCard_B with 2_HasCreditCard_B edge
Customer_3 is connected to CreditCard_A with 3_HasCreditCard_A edge
I want a query that will return a sub-graph object of all nodes and edges connected (in or out) to the queried node. I can then store this sub-graph as a variable and then run different traversals on it to calculate different things.
This query would need to be recursive as these clusters could be made up of nodes which are many (inward or outward) hops away from each other. There are also many different types of nodes and edges, and they all must be returned.
For example:
If I specified Customer_1 in the query, the resulting sub-graph would contain Customer_1, Customer_3, CreditCard_A, 1_HasCreditCard_A, and 3_HasCreditCard_A.
If I specified Customer_2, the returned sub-graph would consist of Customer_2, CreditCard_B, 2_HasCreditCard_B.
If I queried Customer_3, the exact same subgraph object as returned from the Customer_1 query would be returned.
I have used both Neo4j with Cypher and Dgraph with GraphQL and found this task quite easy in these two languages, but I am struggling a bit more with understanding Gremlin.
EDIT:
The selected answer to this question should achieve what I want, without specifying the edge type, if .both('created') is changed to just .both().
However, the loop syntax: .loop{true}{true} is invalid in Python of course. Is this loop function available in gremlin-python? I cannot find anything.
EDIT 2:
I have tried this and it seems to be working as expected, I think.
g.V(node_id).repeat(bothE().otherV().simplePath()).emit()
Is this a valid solution to what I am looking for? Is it also possible to include the queried node in this result?
Regarding the second edit, this looks like a valid solution that returns all the vertices connected to the starting vertex.
Some small fixes:
You can change bothE().otherV() to both().
If you also want the starting vertex, move the emit step before the repeat.
I would add a dedup step to remove duplicate vertices (there can be more than one path to a vertex).
g.V(node_id).emit().repeat(both().simplePath()).dedup()
Example: https://gremlify.com/jngpuy3dwg9
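To check the expected result on the five-node example from the question without a Gremlin server, here is a small plain-Python model of "follow edges in both directions until nothing new turns up". It is not gremlin-python; the edge list is typed in by hand from the question and the function name is made up. Unlike the traversal above, it also collects the edge IDs, since the question asks for both nodes and edges.

```python
from collections import deque

# (edge id, out vertex, in vertex), copied from the example in the question.
EDGES = [
    ("1_HasCreditCard_A", "Customer_1", "CreditCard_A"),
    ("2_HasCreditCard_B", "Customer_2", "CreditCard_B"),
    ("3_HasCreditCard_A", "Customer_3", "CreditCard_A"),
]


def connected_subgraph(start, edges):
    """Return (vertices, edge_ids) reachable from `start`, ignoring direction."""
    vertices = {start}  # the starting vertex is part of the result
    edge_ids = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for edge_id, out_v, in_v in edges:
            if current == out_v:
                other = in_v
            elif current == in_v:
                other = out_v
            else:
                continue  # edge does not touch the current vertex
            edge_ids.add(edge_id)
            if other not in vertices:  # visit each vertex once (dedup)
                vertices.add(other)
                queue.append(other)
    return vertices, edge_ids


for customer in ("Customer_1", "Customer_2", "Customer_3"):
    found_vertices, found_edges = connected_subgraph(customer, EDGES)
    print(customer, sorted(found_vertices), sorted(found_edges))
```

Customer_1 and Customer_3 yield the same sets, and Customer_2 yields only its own card and edge, which matches the behaviour described in the question.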

Gremlin query to get in and out edges for a given Vertex

I’m just playing with the Graph API in Cosmos DB, which uses the Gremlin syntax for queries.
I have a number of users (vertices) in the graph and each has ‘knows’ edges to other users. Some of these are out edges (outE) and others are in edges (inE), depending on how the relationship was created.
I’m now trying to create a query which will return all ‘knows’ relationships for a given user (Vertex).
I can easily get the ID of either inE or outE via:
g.V('7112138f-fae6-4272-92d8-4f42e331b5e1').inE('knows')
g.V('7112138f-fae6-4272-92d8-4f42e331b5e1').outE('knows')
where '7112138f-fae6-4272-92d8-4f42e331b5e1' is the Id of the user I’m querying, but I don’t know ahead of time whether this is an in or out edge, so want to get both (e.g. if the user has in and out edges with the ‘knows’ label).
I’ve tried using a projection and OR operator and various combinations of things e.g.:
g.V('7112138f-fae6-4272-92d8-4f42e331b5e1').where(outE('knows').or().inE('knows'))
but it's not getting me back the data I want.
All I want out is a list of the IDs of all inE and outE that have the label ‘knows’ for a given vertex.
Or is there a simpler/better way to model bi-directional associations such as ‘knows’ or ‘friendOf’?
Thanks
You can use the bothE step in this case:
g.V('7112138f-fae6-4272-92d8-4f42e331b5e1').bothE('knows')

What is wrong with light weight edges?

I created an edge without any attributes. It was created, but I still cannot query it. Then I created the same edge again, and now both edges have the same RID. Why?
I suggest you start with the OrientDB tutorial. This is an extract:
Starting from OrientDB v1.4.x edges, by default, are managed as lightweight edges: they don't have own identities as record, but are physically stored as links inside vertices. OrientDB automatically uses Lightweight edges only when edges have no properties, otherwise regular edges are used. From the logic point of view, lightweight edges are edges at all the effects, so all the graph functions work correctly. This is to improve performance and reduce the space on disk. But as a consequence, since lightweight edges don't exist as separate records in the database, the following query will not return the lightweight edges:
SELECT FROM E
In most of the cases Edges are used from Vertices, so this doesn't cause any particular problem. In case you need to query Edges directly, even those with no properties, disable lightweight edge feature by executing this command once:
ALTER DATABASE CUSTOM useLightweightEdges=false
This will only take effect for new edges. For more information look at Troubleshooting.
You can query for a list of names of edges with:
select name from ( select expand(classes) from metadata:schema ) where superClass="E"

Getting extra relationships in the Neo4j query

I am new to Neo4j.
For a given node (say, node 'n'), I am trying to find all other nodes in the graph that depend on it in some way. In other words, I want the nodes that have edges directed towards node 'n'. I am getting the correct nodes (let's call them c, d, e) using the following query:
MATCH (depNode)-[r]->(n:AttributeNode)
WHERE n.name='testnode'
RETURN depNode
In the original graph, the nodes c and d are connected as well using a relationship. In the result of the above query, I am also receiving that relationship (edge between c and d). How do I get rid of that edge in my output?
If I get your question correctly, I think you're already getting the correct answer in tabular form but in the visualization form Neo4j shows the "extra edges". You should check out the tabular form and confirm whether it's correctly showing the desired output or not (which it would be).
What's happening here is the default behavior of the Neo4j browser. Whenever you retrieve some nodes, it also shows all the relationships between those nodes. If you want to visualize just the nodes, you cannot do that in the current version of the Neo4j browser. You will have to use a visualization tool like Gephi on your database and filter your results accordingly.
As of Neo4j 2.2.0.RC1 you can disable the extra relationships being used by setting Autocomplete to Off. The toggle appears at the bottom-right of your result graph and seems to be remembered for future requests.
