What is wrong with light weight edges? - graph

I created an edge without attribute and guess what? it was created but still can not query it but then i created the same edge again and now they both are having same rid>?

I suggest you to start using OrientDB from the tutorial. This is an extract:
Starting from OrientDB v1.4.x edges, by default, are managed as lightweight edges: they don't have own identities as record, but are physically stored as links inside vertices. OrientDB automatically uses Lightweight edges only when edges have no properties, otherwise regular edges are used. From the logic point of view, lightweight edges are edges at all the effects, so all the graph functions work correctly. This is to improve performance and reduce the space on disk. But as a consequence, since lightweight edges don't exist as separate records in the database, the following query will not return the lightweight edges:
SELECT FROM E
In most of the cases Edges are used from Vertices, so this doesn't cause any particular problem. In case you need to query Edges directly, even those with no properties, disable lightweight edge feature by executing this command once:
ALTER DATABASE CUSTOM useLightweightEdges=false
This will only take effect for new edges. For more information look at Troubleshooting.

You can query for a list of names of edges with:
select name from ( select expand(classes) from metadata:schema ) where superClass="E"

Related

How to do efficient bulk upserts (insert new vertex, or update property(ies)) in Gremlin?

Context:
I do have a graph with about 2000 vertices, and 6000 edges, this over time might grow to 10000 vertices and 100000 edges. Currently I am upserting the new vertices using the following traversal query:
Upserting Vertices & Edges
queryVertex = "g.V().has(label, name, foo).fold().coalesce(
unfold(), addV(label).property(name, foo).property(model, 2)
).property(model, 2)"
The intent here is to look for vertex, named foo, and if found update its model property, otherwise create a new vertex and set the model property. this is issued twice: once for the source vertex and then for the target vertex.
Once the two related vertices are created, another query is issued to create the edge between them:
queryEdge = "g.V('id_of_source_vertex').coalesce(
outE(edge_label).filter(inV().hasId('id_of_target_vertex')),
addE(edge_label).to(V('id_of_target_vertex'))
).property(model, 2)"
here, if there is an edge between the two vertices, the model property on edge is updated, otherwise it creates the edge between them.
And the pseudocode that does this, is something as follows:
for each edge in the list of new edges:
//upsert source and target vertices:
execute queryVertex for edge.source
execute queryVertex for edge.target
// upsert edge:
execute queryEdge
This works, but it is highly inefficient; for example for the mentioned graph size it takes several minutes to finish, and with some in-app concurrency, it reduces the time only by couple of minutes. Surely, there must be a more efficient way of doing this for such a small graph size.
Question
* How can I make these upserts faster?
Bulk loading should typically be relegated to the provider specific tools that are optimized to handle such tasks. Gremlin really doesn't provide abstractions to cover the diverse group of bulk loader tools that are out there for each of the various graph database systems that implement TinkerPop. For Neptune, which is how you tagged your question, that would mean using the Neptune Bulk Loader.
Speaking specifically to your question, though you might see some optimizations to what you described as your approach. From a Gremlin perspective, I imagine you would see some savings here by submitting a single Gremlin request per edge by combining your existing traversals:
g.V().has(label, name, foo).fold().
coalesce(unfold(),
addV(label).property(name, foo)).
property(model, 2).as('source').
V().has(label, name, bar).fold().
coalesce(unfold(),
addV(label).property(name, bar)).
property(model, 2).as('target').
coalesce(inE(edge_label).where(outV().as('source')),
addE(edge_label).from('source').to('target')).
property(model, 2)
I think I got that right - untested, but hopefully you get the idea. Basically, we just reference the vertices already in memory via step labels so that we don't need to requery them. You might try other tactics as well if you continue with Gremlin-style bulk loading like ordering your edges so that you could batch together more edge loads to reduce the amount of vertex lookups and submit vertex/edge data in a more dynamic fashion as described here.

How to remove an edge between two vertices from DSE schema

We have an edge between two vertices in datastax graph.
How can I remove the edge from the schema itself?
I tried this:
schema.edgeLabel("belongswithin").connection("poi", "region").drop()
but there is no drop method to delete an edge like this.
I can't run:
schema.edgeLabel("belongswithin").drop()
as I am having the same edge being used between the other two vertices which I don't want to drop?
In DSE Graph, an EdgeLabel can be dropped, but not selectively for a particular connection between two VertexLabels. You don't say which version of DSE you are using, but I'm pretty certain you'll need to drop and rewrite the schema, then load your data again. If you are a DataStax customer, you might want to contact Support to see if there is any assistance they can provide.

ArangoDB anonymous graph traversal

I am planing to use ArangoDB and I am faced with a problem I don't know how to solve. I would like to do simple traversals but in my case but there are two requirements that I don't know how to solve:
I will not know in advance the type of vertices than an edge will connect to. I want to be able to connect edge of one type to any vertex on any side.
For one vertex, I want to retrieve all connected vertices (depth 1) no matter the edge type.
For the requirement 1, an example would be a Tag vertex (to tag some entity with some information) and I want to be able to tag any vertex using i.e. HasTag edge in a named graph. From what I currently see is that I need to define the "From" collections ("To" collection is the Tag collection) and this is limited to 10 collections. Since I could have 100 or more From collections I don't see how to solve this with named graphs.
Option would be to use anonymous graphs but then I have a problem in the second requirement. I also want to have an option, when given a vertex, to find all connected vertices (depth = 1) no matter the type of an edge. In an anonymous graph I would need to specify all of the edge collections in a query and again, there could be 100 or more of them. I don't know if there is a limit to this number but I would assume there is one - maybe I'm mistaken since I haven't yet tried it out.
Has anyone any idea how to solve this with ArrangoDB? I really like the database but I would like it to be more "typeless", that is, that I wouldn't have to define the type of vertex collection an edge can connect to.
Best regards
Tomaz
You can have more than 10 vertex collections in a named graph. The limitation of 10 only exists in the webUI. Creating the named graph over the ArangoShell or the server console will work.

How to reduce the number of same edge label between two Vertex in Titan

Let's say we have two type of Vertex: LOGIN_USER(property:user_id) and IP(property:ip), EDGE between them is : LOGIN(property:session_id, login_time).
This model's problem is that two many edges between one USER and IP(Can be thousands).
Is there anyway to reduce the edge number of the two vertexes and at the same time can keep property: sessionId and login_time? We want to filter these two properties for some query.
Edge property doesn't support cardinality:list which vertex property support.
If put all edge property into Vertex, does it impact performance to fetch the Vertex?
When titan load property for a Vertex?? When traversal to a Vertex, let's g.V(1).next(), does Titan load all Property for the Vertex?
When you say "thousands" of edges between USER and IP, do you think it could actually be "millions" or "tens of millions" or more? If not, then "thousands" should not be a problem for Titan with vertex centric indices. Index your edge properties and you should have fast ordering and traversals.
When you start to get deep into "millions", you might start to experience some problems - for me that has always been with processing global queries with titan-hadoop as the Vertex and its edges must be held in memory. That can make for some trouble spots when you're doing global analytics. From an operational perspective, Titan was always happy to keep writing edges into the millions on a vertex, but I'd tend to avoid it. Of course, much of my experience with this came before vertex cutting in Titan 1.0:
Cutting a vertex means storing a subset of that vertex’s adjacency
list on each partition in the graph. In other words, the vertex and
its adjacency list is partitioned thereby effectively distributing the
load on that single vertex across all of the instances in the cluster
and removing the hot spot.
which you might experiment with as you start to grow supernodes into the millions.
I suppose the other option for supernodes in the millions of edges would be to model around it. Perhaps you introduce some structure between USER and IP. Convert that single LOGIN edge to some vertices/edges that might introduce a time concept between them like:
USER -> LOGIN_YEAR -> LOGIN_MONTH -> IP
So now, instead of creating just one edge between USER and IP you create a LOGIN_YEAR vertex and a LOGIN_MONTH vertex.

Light Weight Edge

Can you define Light Weight Edge Properly?
Just make a short example about the concept of Light Weight Edge that you have in your mind?
What is the importance of Light weight edge in Graphs and What are the disadvantages?
From the official documentation:
Lightweight edges
Starting from OrientDB v1.4.x edges, by default, are managed as lightweight edges: they don't have own identities as record, but are physically stored as links inside vertices. OrientDB automatically uses Lightweight edges only when edges have no properties, otherwise regular edges are used. From the logic point of view, lightweight edges are edges at all the effects, so all the graph functions work correctly. This is to improve performance and reduce the space on disk. But as a consequence, since lightweight edges don't exist as separate records in the database, the following query will not return the lightweight edges:
SELECT FROM E
In most of the cases Edges are used from Vertices, so this doesn't cause any particular problem. In case you need to query Edges directly, even those with no properties, disable lightweight edge feature by executing this command once:
ALTER DATABASE CUSTOM useLightweightEdges=false
This will only take effect for new edges. For more information look at: https://github.com/orientechnologies/orientdb/wiki/Troubleshooting#why-i-cant-see-all-the-edges.
For more information look at Graph API.

Resources