NebulaGraph Database: How to get all the vertices of each tag?

I want to get all the vertices of each tag in the Nebula Graph Database.
I tried using fetch prop on player * yield properties(vertex) to get the results, but it did not work.
(root#nebula) [basketballplayer]> fetch prop on player * yield properties(vertex)
[ERROR (-1004)]: SyntaxError: syntax error near `* yield '
And I tried the Neo4j-style statement match (v:player) return v, but it didn't work either.
(root#nebula) [basketballplayer]> match (v:player) return v
[ERROR (-1005)]: Scan vertices or edges need to specify a limit number, or limit number can not push down.
Who can teach me how to use the Nebula Graph database correctly?

By design, a per-tag/edge-type scan (like a full table scan in a tabular DBMS) is prohibited by default.
This is because data is stored in NebulaGraph in a linked/graph-oriented way (think of a graph traversal, which starts from known nodes and then expands multiple hops along edges/relationships). A non-graph scan of data in a distributed graph database like NebulaGraph is therefore costly.
To enable such queries, either explicitly create an index beforehand, or use a LIMIT sample clause [1], which also avoids a full scan.
[1]: an example of a query with a LIMIT clause:
MATCH (v:player) RETURN v LIMIT 100
Note: when an index is required, it is only used to seek the starting node of the query pattern.
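For example, here is a minimal sketch against the basketballplayer space (the index name player_index_0 is made up for illustration, and the syntax assumes NebulaGraph 3.x):
# create and build an empty tag index so the storage layer can seek vertices by tag
CREATE TAG INDEX IF NOT EXISTS player_index_0 ON player();
REBUILD TAG INDEX player_index_0;
# with the index in place, a full per-tag scan becomes possible
LOOKUP ON player YIELD id(vertex) AS vid;
MATCH (v:player) RETURN v;
# without an index, sample the data instead of scanning everything
MATCH (v:player) RETURN v LIMIT 100;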

Related

How to do random node scan in NebulaGraph database?

I tried to fetch 10 random and non-isolated nodes in the Nebula Graph database. According to their docs the query should be MATCH (n:tag)-[e]-() RETURN n LIMIT 10. But it fails to work.
The screenshot of running the query is as follows:
What is wrong with my query? Such a simple query should not be wrong.
First, no index has been created. If you want to execute MATCH (n:tag)-[e]-() RETURN n LIMIT 10, please create at least one index first.
If you don't want to create an index here, you must specify the direction of the edge, for example: MATCH (n:tag)-[e]->() RETURN n LIMIT 10

Why can't I MATCH (v:<tag>)-[e:<edge>]-(v2:<tag>) RETURN v LIMIT 10 in the NebulaGraph database

The Nebula Graph docs say that "When traversing all vertices of the specified Tag or edge of the specified Edge Type, such as MATCH (v:player) RETURN v LIMIT N, there is no need to create an index, but you need to use LIMIT to limit the number of output results." But when I run the statement in the preceding screenshot, it tells me that I did not specify a limit number, even though I did.
What is the correct way to RETURN v without creating indexes?
I met the same issue before. Actually, when you specify both a tag and an edge for a query simultaneously, you need to create an index for the tag or the edge first.
Create an index for the tag company first and then try to execute it again.
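If it helps, here is a minimal sketch of that fix (the company tag is the one mentioned above; the employs edge type, person tag, and index name are placeholders for illustration):
# index the tag used as the starting point of the pattern
CREATE TAG INDEX IF NOT EXISTS company_index ON company();
REBUILD TAG INDEX company_index;
# with the tag index built, the tag-plus-edge pattern can run
MATCH (v:company)-[e:employs]-(v2:person) RETURN v LIMIT 10;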

How to properly use MATCH inside UNWIND for a Nebula query

I’m currently working with the Nebula graph database for the first time and I’m running into some issues with a query. In terms of the schema, I have “Person” nodes, which have a “name” property, as well as Location nodes also with a name property. These node types can be connected by a relationship edge, called HAS_LIVED (to signify whether a person has lived in a certain location). Now for the query, I have a list of names (strings). The query looks like:
UNWIND ["Anna", "Emma", "Zach"] AS n
MATCH (p:Person {name: n})-[:HAS_LIVED]->(loc)
RETURN loc.Location.name
This should return a list of three places, i.e. [“London”, “Paris”, “Berlin”]. However, I am getting nothing as a result from the query. When I get rid of the UNWIND and write three separate MATCH queries with each name, it works individually. Not sure why.
Try this instead. It uses a WHERE clause (note that in nGQL the property must be written as p.Person.name and equality is ==).
UNWIND ["Anna", "Emma", "Zach"] AS n
MATCH (p:Person)-[:HAS_LIVED]->(loc)
WHERE p.Person.name == n
RETURN loc.Location.name

Neo4j - how to include start node in my query?

I'm attempting to build a recommendation engine for a library system.
This is my db schema:
My starting point is a LoanerCard. The flow is then supposed to look like this: Get all copies -> get the material -> get all copies of the material (including the original) -> get LoanerCard from copy -> get all loaned copies -> return the material name of the copies + an aggregated count to indicate the strength of the recommendation.
My best attempt so far has resulted in this query:
MATCH (L:LoanerCard {Barcode:"10007"})-[:LOANED]->(myLoans)-[:COPY_OF]-(masterMaterial),
      (masterMaterial)<-[:COPY_OF]-(allCopies),
      (allCopies)<-[:LOANED]-(coLoaners),
      (coLoaners)-[r:LOANED]->(theirCopies),
      (theirCopies)-[:COPY_OF]-(materials)
RETURN materials.Title as Recommended, count(*) as Strength ORDER BY Strength DESC
My issue here is that when I traverse the graph, it doesn't include the original copy or the adjacent LoanerCards of that copy, so essentially it only traverses the area circled in red and never reaches LoanerCard 10817 and 10558.
How can I design my query so it includes these?
A MATCH clause automatically filters out duplicate relationships. Therefore, in order to traverse the same relationships twice, you need to split your MATCH clause in two.
Try this:
MATCH (:LoanerCard {Barcode:"10007"})-[:LOANED]->()-[:COPY_OF]-(masterMaterial)
MATCH (masterMaterial)<-[:COPY_OF]-()<-[:LOANED]-()-[:LOANED]->()-[:COPY_OF]-(materials)
RETURN materials.Title as Recommended, count(*) as Strength ORDER BY Strength DESC

Indexing in Titan/Janus

I have 2 questions:
How to index this query?
g.V(vertexId).repeat(out().hasLabel('location')).emit().tree().next()
In the Titan 1.0 documentation, the only ways given are to index the graph once the data has already been inserted.
However, in the generate-modern.groovy file here
we see that indexing is done before the creation of vertices, which seems reasonable. However, I am unable to do it when trying to use buildMixedIndex, as it throws an
IllegalArgumentException: Unknown external index backend search
My approach was:
def location = mgmt.makeVertexLabel("location").make()
def displayName = mgmt.makePropertyKey("displayName").dataType(String.class).cardinality(Cardinality.SINGLE).make()
def shortName = mgmt.makePropertyKey("shortName").dataType(String.class).cardinality(Cardinality.SINGLE).make()
def description = mgmt.makePropertyKey("description").dataType(String.class).cardinality(Cardinality.SINGLE).make()
def latitude = mgmt.makePropertyKey("latitude").dataType(String.class).cardinality(Cardinality.SINGLE).make()
def longitude = mgmt.makePropertyKey("longitude").dataType(String.class).cardinality(Cardinality.SINGLE).make()
def locationByName = mgmt.buildIndex("displayNameAndShortNameAndDescriptionAndLatitudeAndLongitude", Vertex.class).addKey(displayName).addKey(shortName).addKey(description)
.addKey(latitude).addKey(longitude).indexOnly(location).buildMixedIndex('search')
Where am I going wrong?
If that query is taking a long time, the problem is likely that it is visiting too many elements or it is stuck in an infinite loop. The existing JanusGraph/Titan indexes won't help for that. You already have a direct vertex lookup by id, g.V(vertexId), and the rest of the query is traversing the neighborhood from that vertex. I'd suggest using edge labels, i.e. out('friends'), to limit the number of edges you visit. You could also use simplePath() to eliminate cyclic paths. You could also use times() or until() to keep a limit on the number of times you loop with the repeat() step.
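For example, a rough sketch combining those suggestions (the partOf edge label and the depth limit of 3 are assumptions, not taken from your schema):
g.V(vertexId).
  repeat(out('partOf').hasLabel('location').simplePath()).
  emit().
  times(3).
  tree().
  next()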
The configuration example you referenced only used composite indexes, which do not require an indexing backend.
Mixed indexes require configuring an indexing backend, either Elasticsearch, Lucene, or Solr. Pick one of these, then make sure you pass the correct configuration properties when you initialize your graph. You can find several examples in the distribution zip file in the conf directory. For example, in the janusgraph-cassandra-es.properties, you'll find:
index.search.backend=elasticsearch
index.search.hostname=127.0.0.1
index.search.elasticsearch.client-only=true
where search in index.X.backend is the chosen index configuration name you must pass to buildMixedIndex(X).
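For example, a rough Gremlin console sketch of wiring the two together (it assumes the JanusGraph distribution and an already-defined displayName property key; on Titan 1.0 you would open the graph with TitanFactory instead):
// open the graph with a properties file that defines index.search.backend
graph = JanusGraphFactory.open('conf/janusgraph-cassandra-es.properties')
mgmt = graph.openManagement()
displayName = mgmt.getPropertyKey('displayName')
// 'search' must match the X in index.X.backend from the properties file
mgmt.buildIndex('displayNameMixed', Vertex.class).addKey(displayName).buildMixedIndex('search')
mgmt.commit()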
Here's another answer.
Both composite and mixed indexes are only available for the first-level Gremlin query (the initial vertex lookup), not for the second level. A vertex-centric index is required for the second-level query, i.e. the traversal steps that follow.
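A rough sketch of what a vertex-centric index looks like (the locatedIn edge label and weight property key are made-up examples, using the Titan 1.0 management API):
mgmt = graph.openManagement()
weight = mgmt.makePropertyKey('weight').dataType(Integer.class).cardinality(Cardinality.SINGLE).make()
locatedIn = mgmt.makeEdgeLabel('locatedIn').make()
// lets steps like outE('locatedIn').has('weight', gt(5)) filter or sort each vertex's
// incident edges without scanning all of them
mgmt.buildEdgeIndex(locatedIn, 'locatedInByWeight', Direction.BOTH, Order.decr, weight)
mgmt.commit()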
