I am new to Gremlin and need to provide the data to plot a chart.
The chart has:
x-axis -> timestamp
y-axis -> count of products liked
Below is the data in graph form:
Nodes relation(properties) Nodes
userA likes(timestamp = 22/02/2013) productXY
userX likes(timestamp = 21/05/2013) productAA
userG likes(timestamp = 22/07/2014) productXB
userT likes(timestamp = 03/02/2013) productXR
userA likes(timestamp = 22/02/2013) productXT
userC likes(timestamp = 19/11/2014) productUY
userD likes(timestamp = 22/07/2014) productPY
userE likes(timestamp = 09/07/2013) productLY
userJ likes(timestamp = 09/07/2013) productXY
userP likes(timestamp = 09/07/2013) productKY
Output of the query should be like this.
[09/07/2013, 3]
[22/02/2013, 2]
[21/05/2013, 1]
[22/07/2014, 2]
[03/02/2013, 1]
[19/11/2014, 1]
Could somebody help me in building the query?
Note: I am using the Rexster REST API to render the data to the application.
Thanks in advance.
Answered in the gremlin-users mailing list:
Basically -
you can use the Gremlin extension and this query:
g.V().outE("likes").timestamp.groupCount().cap().next()
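As a sanity check, the groupCount over edge timestamps that the answer describes is just a per-timestamp frequency count. A minimal plain-Python sketch over the sample data (no Gremlin server required) reproduces the expected output:

```python
from collections import Counter

# The sample "likes" edges from the question, as (user, timestamp, product) tuples.
likes = [
    ("userA", "22/02/2013", "productXY"),
    ("userX", "21/05/2013", "productAA"),
    ("userG", "22/07/2014", "productXB"),
    ("userT", "03/02/2013", "productXR"),
    ("userA", "22/02/2013", "productXT"),
    ("userC", "19/11/2014", "productUY"),
    ("userD", "22/07/2014", "productPY"),
    ("userE", "09/07/2013", "productLY"),
    ("userJ", "09/07/2013", "productXY"),
    ("userP", "09/07/2013", "productKY"),
]

# groupCount() over the edge timestamps is a frequency count per timestamp.
counts = Counter(ts for _, ts, _ in likes)
print(counts["09/07/2013"])  # 3, matching the [09/07/2013, 3] row above
```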
I need to download graphs of European countries. I want the "drive" network and need to categorize it into the following three network types: "highway/motorway and trunk", "primary/secondary", and "tertiary/residential".
Since downloading the "drive" network without a custom_filter is taking forever, I want to download the above three network types separately.
I know that the custom_filter for motorway is this: '["highway"~"motorway|trunk"]'
I would like to ask if the following query is correct or not for "primary/secondary":
countries_list = ['UK', 'BELGIUM', 'LUXEMBURG',
'AUSTRIA', 'POLAND', 'HUNGARY', 'SWEDEN', 'SLOVAKIA',
'NETHERLANDS', 'SPAIN', 'SWITZERLAND', 'PORTUGAL', 'IRELAND',
'ITALY', 'DENMARK', 'ROMANIA', 'FINLAND']
## Download the country graphs and save them
cf = '["highway"~"primary|secondary"]'
for country in countries_list:
    print(country + " graph download started !!")
    G = ox.graph_from_place(country, network_type='drive', simplify=True, custom_filter=cf)
    ox.save_graph_shapefile(G, filepath=country + "/drive_graph_shapefile/")
    ox.save_graphml(G, filepath=country + "/drive_graph.graphml")
    ox.save_graph_geopackage(G, filepath=country + "/drive_graph.gpkg")
    print(country + " graph download successful !!")
Please correct the above query if it is wrong. Also, what should the custom_filter be for tertiary/residential types of roads?
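One detail worth checking programmatically: a stray space inside the regex alternation (e.g. "primary |secondary") would only match the literal tag value "primary " with a trailing space, which never occurs in OSM data. The sketch below builds the three filter strings and verifies they contain no whitespace; the tag values for the third group are an assumption based on common OSM highway tags, not confirmed by the post:

```python
import re

# Overpass-style custom_filter strings for the three requested network types.
# The "tertiary|residential" values are an assumption (common OSM highway tags).
filters = {
    "motorway": '["highway"~"motorway|trunk"]',
    "primary_secondary": '["highway"~"primary|secondary"]',
    "tertiary_residential": '["highway"~"tertiary|residential"]',
}

for name, cf in filters.items():
    # Extract the regex between ~" and "] and make sure it has no stray
    # whitespace, which would silently break tag matching.
    pattern = re.search(r'~"([^"]*)"', cf).group(1)
    assert " " not in pattern, f"stray space in filter {name!r}"
```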
I am new to JanusGraph and Gremlin. When I create an edge between two existing vertices, I expect the returned edge information to reference the same source and target vertex ids I used to create the edge; however, it does not: two new ids are returned. Moreover, when I look for the edges connected to one of the vertices ("tom"), I find that "tom" has an edge connecting from and to itself with a different id, yet the vertex count is just 2.
gremlin> g.V().count()
==>0
gremlin> tom = g.addV("party").property("name", "Tom").property("identity_number", "01234567")\
.property("identity_type", "PASSPORT").property("identity_country", "USA").next()
==>v[57402]
gremlin> mary = g.addV("party").property("name", "Mary").property("identity_number", "76543210")\
.property("identity_type", "PASSPORT").property("identity_country", "USA").next()
==>v[61626]
gremlin> g.V(tom).addE('relationship').to(mary)
==>e[3k4-18ci-80et-1bia][57474-relationship->61570]
gremlin> g.V(tom).bothE().otherV().path().by(__.valueMap().with(WithOptions.tokens))
==>path[{id=57402, label=party, identity_country=[USA], identity_number=[01234567],\
identity_type=[PASSPORT], name=[Tom]}, {id=3k4-18ci-80et-1bia, label=relationship},\
{id=57474, label=party, identity_country=[USA], identity_number=[01234567], identity_type=[PASSPORT],\
name=[Tom]}]
gremlin> g.V().count()
==>2
Could anyone tell me if this is normal behaviour, or if there is some configuration that makes this happen?
Many thanks.
UPDATE:
I find that this happens after I created the JanusGraph index with the following code:
m = amlGraph.openManagement();
party = m.makeVertexLabel('party').partition().make();
relationship = m.makeEdgeLabel('relationship').make();
identity_country_key = m.makePropertyKey('identity_country').dataType(String.class).make();
identity_number_key = m.makePropertyKey('identity_number').dataType(String.class).make();
identity_type_key = m.makePropertyKey('identity_type').dataType(String.class).make();
name_key = m.makePropertyKey('name').dataType(String.class).make();
first_seen_datetime_key = m.makePropertyKey('first_seen_datetime').dataType(Date.class).make();
relationship_type_key = m.makePropertyKey('relationship_type').dataType(String.class).make();
party = m.getVertexLabel('party');
identity_country_key = m.getPropertyKey('identity_country');
identity_number_key = m.getPropertyKey('identity_number');
identity_type_key = m.getPropertyKey('identity_type');
name_key = m.getPropertyKey('name');
m.buildIndex('partyMixed', Vertex.class).
    addKey(identity_country_key, Mapping.TEXTSTRING.asParameter(), Parameter.of('identity_country', 'identity_country')).
    addKey(identity_number_key, Mapping.TEXTSTRING.asParameter(), Parameter.of('identity_number', 'identity_number')).
    addKey(identity_type_key, Mapping.TEXTSTRING.asParameter(), Parameter.of('identity_type', 'identity_type')).
    addKey(name_key, Mapping.TEXTSTRING.asParameter(), Parameter.of('name', 'name')).
    indexOnly(party).
    buildMixedIndex('search');
relationship = m.getEdgeLabel('relationship');
first_seen_datetime_key = m.getPropertyKey('first_seen_datetime');
relationship_type_key = m.getPropertyKey('relationship_type');
m.buildIndex('relationshipMixed', Edge.class).addKey(first_seen_datetime_key).addKey(relationship_type_key).indexOnly(relationship).buildMixedIndex('search');
m.commit()
Which version of JanusGraph are you using? Maybe if you are using an older version it could be a bug.
I used one of the latest versions (0.5.3), tried to reproduce the same scenario, and I am getting the correct IDs.
gremlin>
gremlin> tom = g.addV("party").property("name", "Tom").property("identity_number", "01234567").property("identity_type", "PASSPORT").property("identity_country", "USA").next()
==>v[4112]
gremlin>
gremlin> mary = g.addV("party").property("name", "Mary").property("identity_number", "76543210").property("identity_type", "PASSPORT").property("identity_country", "USA").next()
==>v[40964232]
gremlin>
gremlin> g.V(tom).addE('relationship').to(mary)
==>e[2rm-368-3ehh-oe07c][4112-relationship->40964232]
gremlin>
I have the following vertices and edges:
Person1 -> Device1 <- Person2
   |          ^
   v          |
 Email1 <- Person3
Now I want to write a Gremlin query (JanusGraph) which will give me all persons connected only through a device to the vertices Person1 is connected to.
So according to the above graph, the output should be [Person2].
Person3 is not in the output because Person3 is also connected to "Email1" of "Person1".
g.addV('person').property('name', 'Person1').as('p1').
addV('person').property('name', 'Person2').as('p2').
addV('person').property('name', 'Person3').as('p3').
addV('device').as('d1').
addV('email').as('e1').
addE('HAS_DEVICE').from('p1').to('d1').
addE('HAS_EMAIL').from('p1').to('e1').
addE('HAS_DEVICE').from('p2').to('d1').
addE('HAS_DEVICE').from('p3').to('d1').
addE('HAS_EMAIL').from('p3').to('e1')
The following traversal will give you the person vertices that are connected to "Person1" via one or more "device" vertices and not connected via any other type of vertex.
g.V().has('person', 'name', 'Person1').as('p1').
out().as('connector').
in().where(neq('p1')).
group().
by().
by(select('connector').label().fold()).
unfold().
where(
select(values).
unfold().dedup().fold(). // just in case the persons are connected by multiple devices
is(eq(['device']))
).
select(keys)
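The filtering idea in the traversal above can be sketched in plain Python over the sample graph: collect, for each person sharing a connector with Person1, the labels of those shared connectors, then keep only the persons whose connectors are exclusively devices.

```python
# Edges from the sample graph: (person, connector vertex).
edges = [
    ("Person1", "Device1"), ("Person1", "Email1"),
    ("Person2", "Device1"),
    ("Person3", "Device1"), ("Person3", "Email1"),
]
label = {"Device1": "device", "Email1": "email"}

# Connectors of Person1 (the out() step).
p1_connectors = {dst for src, dst in edges if src == "Person1"}

# For every other person, the labels of the connectors shared with Person1
# (the group().by().by(select('connector').label().fold()) step).
shared = {}
for src, dst in edges:
    if src != "Person1" and dst in p1_connectors:
        shared.setdefault(src, set()).add(label[dst])

# Keep only persons whose shared connectors are all devices
# (the where(... is(eq(['device']))) step).
result = [p for p, labels in shared.items() if labels == {"device"}]
print(result)  # ['Person2']
```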
I am new to Gremlin and am trying to convert a SQL query to Gremlin. I have two vertex types, labeled asset and repo, and here is the Gremlin script to create the vertices and the edges:
g.addV('asset').property(id, 'a1').property('ip', '127.4.8.51').property('scanDate', '2020-09-10').property('repoId', 1)
g.addV('asset').property(id, 'a2').property('ip', '127.4.8.55').property('scanDate', '2020-09-20').property('repoId', 1)
g.addV('asset').property(id, 'a3').property('ip', '127.4.8.57').property('scanDate', '2020-09-21').property('repoId', 1)
g.addV('asset').property(id, 'a4').property('ip', '127.4.10.36').property('scanDate', '2020-09-12').property('repoId', 2)
g.addV('asset').property(id, 'a5').property('ip', '127.4.10.75').property('scanDate', '2020-09-14').property('repoId', 2)
g.addV('repo').property(id, 'r1').property('repoName', 'repo1').property('assetAge', 10).property('repoId', 1)
g.addV('repo').property(id, 'r2').property('repoName', 'repo2').property('assetAge', 9).property('repoId', 2)
g.V('a1').addE('has').to(g.V('r1'))
g.V('a2').addE('has').to(g.V('r1'))
g.V('a3').addE('has').to(g.V('r1'))
g.V('a4').addE('has').to(g.V('r2'))
g.V('a5').addE('has').to(g.V('r2'))
I would like to write a query in Gremlin that does the same thing as the SQL query below:
SELECT *
FROM asset a
JOIN repo r ON a.repoId = r.repoId
WHERE a.scanDate >= CURDATE() - INTERVAL (r.assetAge + 1) DAY
So far I have tried the code below in Python:
from datetime import datetime, timedelta
from gremlin_python.process.traversal import P
d = datetime.today() - timedelta(days=10)  # here I have hard-coded the days
traversal = g.V().hasLabel("asset").has("scanDate", P.gte(d))
traversal.valueMap().toList()
But I do not know how I can pass repo.assetAge from the matched repo vertex to the days parameter of timedelta(). Any help is really appreciated. Thanks.
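One workaround, under the assumption that you first fetch each repo's assetAge and then filter assets per repo, is to compute a per-repo cutoff date on the client side. The comparison logic is sketched here over in-memory stand-ins for the sample vertices (a real solution would fetch them with gremlin_python, and the string scanDate values would need parsing to dates first):

```python
from datetime import date, timedelta

# In-memory stand-ins for the sample vertices.
repos = [{"repoId": 1, "assetAge": 10}, {"repoId": 2, "assetAge": 9}]
assets = [
    {"id": "a1", "repoId": 1, "scanDate": date(2020, 9, 10)},
    {"id": "a2", "repoId": 1, "scanDate": date(2020, 9, 20)},
    {"id": "a3", "repoId": 1, "scanDate": date(2020, 9, 21)},
    {"id": "a4", "repoId": 2, "scanDate": date(2020, 9, 12)},
    {"id": "a5", "repoId": 2, "scanDate": date(2020, 9, 14)},
]

today = date(2020, 10, 1)  # pinned instead of date.today() so the example is stable

# SQL: a.scanDate >= CURDATE() - INTERVAL (r.assetAge + 1) DAY,
# evaluated per repo with that repo's own assetAge.
cutoff = {r["repoId"]: today - timedelta(days=r["assetAge"] + 1) for r in repos}
matching = [a["id"] for a in assets if a["scanDate"] >= cutoff[a["repoId"]]]
print(matching)  # ['a2', 'a3']
```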
I can't save my edge when I'm using Spark, as follows.
For information, the edge can be saved by using the Gremlin console.
val graph = DseGraphFrameBuilder.dseGraph("GRAPH_NAME", spark)
graph.V().has("vertex1","field1","value").as("a").V().has("vertex2","field1","value").addE("myEdgeLabel").to("a")
When I try: graph.edges.show()
I get an empty table
The addE() step is not yet implemented in DseGraphFrames; you should use the DGF-specific updateEdges() function. The function is designed for bulk updates. It takes a Spark DataFrame with new edges in the DGF format:
scala> newEdges.printSchema
root
|-- src: string (nullable = false)
|-- dst: string (nullable = false)
|-- ~label: string (nullable = true)
The src and dst columns are encoded vertex ids. You can either construct them with the g.idColumn() helper function or select them from vertices.
Usually you know the ids and use the helper function:
scala> val df = Seq((1, 2, "myEdgeLabel")).toDF("v1_id", "v2_id", "~label")
scala> val newEdges=df.select(g.idColumn("vertex2", $"v2_id") as "src", g.idColumn("vertex1", $"v1_id") as "dst", $"~label")
scala> g.updateEdges(newEdges)
For your particular case, you can query the ids first and then insert based on them. Never do this in production: this approach is slow and not bulk. On huge graphs, use the first approach (idColumn):
val dst = g.V.has("vertex1","field1","value").id.first.getString(0)
val src = g.V.has("vertex2","field1","value").id.first.getString(0)
val newEdges = Seq((src, dst, "myEdgeLabel")).toDF("src", "dst", "~label")
g.updateEdges(newEdges)
See documentation: https://docs.datastax.com/en/dse/5.1/dse-dev/datastax_enterprise/graph/graphAnalytics/dseGraphFrameImport.html