Even though Freebase was deprecated in June 2015, I was impressed by Freebase's MQL. It is intuitive, concise, declarative, and easy to read and write.
These days I'm learning about TinkerPop3 and Gremlin, and I think Gremlin has many good features. I wonder whether I could convert Freebase MQL to TinkerPop3 Gremlin.
Let's say I have the TinkerPop3 sample data "The Crew" and the following MQL:
[{
"type": "person",
"name": null,
"develops": {
"type": "software",
"release_date": null, // release_date is not in The Crew data;
// it was added just for testing
"name": null,
"sort": "-name", // descending sort
"limit": 2 // return only two pieces of software
},
"uses": {
"type": "software",
"release_date": null,
"name": null,
"sort": "name", // ascending sort
"limit": 2
}
}]
The MQL above means "find every person, and for each person return two pieces of software they developed and two pieces of software they use." Please keep in mind that this MQL is just an example for testing.
I've tried to convert the MQL into a single Gremlin traversal, but I couldn't. So I'm asking: is it possible? If so, how would I convert it, and if not, why not? (If it is possible, it would be better if the resulting Gremlin were efficient to optimize and execute.)
If a single Gremlin traversal is impossible, can two or more traversals be combined to produce the same output as the MQL without a performance loss?
Thanks in advance.
The fastest query to solve this problem should be the following:
gremlin> g.V().hasLabel("person").as("person").
......1>   map(out("develops").order().by("name", decr).limit(2).fold()).as("develops").select("person").
......2>   map(out("uses").order().by("name", incr).limit(2).fold()).as("uses").
......3>   select("person","develops","uses")
==>[person:v[1], develops:[v[11], v[10]], uses:[v[10], v[11]]]
==>[person:v[7], develops:[v[11], v[10]], uses:[v[10], v[11]]]
==>[person:v[8], develops:[v[10]], uses:[v[10], v[11]]]
==>[person:v[9], develops:[], uses:[v[10], v[11]]]
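To make the semantics of each map() branch concrete, here is a plain-Python model of what the query computes. It is not Gremlin and not how TinkerGraph executes the traversal; the edge lists are an assumption reconstructed from the result above (vertex ids 1, 7, 8, 9 are the people and 10/11 are "gremlin"/"tinkergraph").

```python
# Plain-Python model of the Gremlin query above over "The Crew" sample data.
# Only the "develops" and "uses" edges touched by the query are modeled;
# the adjacency lists are reconstructed from the query result, not loaded from TinkerPop.
NAMES = {1: "marko", 7: "stephen", 8: "matthias", 9: "daniel",
         10: "gremlin", 11: "tinkergraph"}
PERSONS = [1, 7, 8, 9]
DEVELOPS = {1: [10, 11], 7: [10, 11], 8: [10], 9: []}
USES = {1: [10, 11], 7: [10, 11], 8: [10, 11], 9: [10, 11]}


def top_software(targets, descending, limit=2):
    """Mimic out(label).order().by("name", decr|incr).limit(n).fold()."""
    ordered = sorted(targets, key=lambda v: NAMES[v], reverse=descending)
    # Like fold(), always return a list, even an empty one.
    return ordered[:limit]


def crew_query():
    """One row per person, like select("person", "develops", "uses")."""
    return [
        {"person": p,
         "develops": top_software(DEVELOPS[p], descending=True),
         "uses": top_software(USES[p], descending=False)}
        for p in PERSONS
    ]


for row in crew_query():
    print(row)
```

Because fold() always emits a list, a person with no "develops" edges (v[9]) still produces a row with an empty list instead of being dropped.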
However, using the match() step, you can probably enhance the readability (although it contains the same steps):
g.V().hasLabel("person").match(
__.as("person").out("develops").order().by("name", decr).limit(2).fold().as("develops"),
__.as("person").out("uses").order().by("name", incr).limit(2).fold().as("uses")).
select("person","develops","uses")
UPDATE
Since you don't want to see me (v[9]) in the result set, you can add a simple filter condition:
g.V().hasLabel("person").as("person").
filter(outE("develops").and().outE("uses")).
map(out("develops").order().by("name", decr).limit(2).fold()).as("develops").select("person").
map(out("uses").order().by("name", incr).limit(2).fold()).as("uses").
select("person","develops","uses")
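The filter keeps a person only when both edge lookups find at least one edge. A minimal Python sketch of that condition, assuming the same adjacency reconstructed from the earlier result (not TinkerPop's actual filter implementation):

```python
# Adjacency reconstructed from "The Crew" result above: person id -> target vertex ids.
DEVELOPS = {1: [10, 11], 7: [10, 11], 8: [10], 9: []}
USES = {1: [10, 11], 7: [10, 11], 8: [10, 11], 9: [10, 11]}


def has_both(person):
    """Mimic filter(outE("develops").and().outE("uses")): both edge lists non-empty."""
    return bool(DEVELOPS.get(person)) and bool(USES.get(person))


kept = [p for p in [1, 7, 8, 9] if has_both(p)]
print(kept)
```

Only v[9] fails the condition, because it has "uses" edges but no "develops" edges.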
I'm looking for a pattern.
I'm working on this query:
g.V().has('objid','7615388501660').as('location')
.in('enhabits').as('population')
.out('isInFaction').as('faction')
.in('isInFaction').out('isOfSpecies').as('species')
.path().by('name')
and I get this back:
"labels": [
["location"],
["population"],
["faction"],
[],
["species"]
],
"objects": [
"Plara",
"Se Bemon",
"Se",
"Se Bemon",
"Wan"
]
But there is an extra, unlabeled step [], which I feel is the wrong approach. The query also traverses through all of the populations in that faction, not just the one I want. What I want is each record of location, population, faction and species in a list. Put another way: for each population in that location, I want that population, its faction and its species.
You can often flatten this kind of backtracking use case by introducing a union() step into the query, something along the lines of:
g.V().has('objid','7615388501660').as('location').
in('enhabits').as('population').
local(
union(
out('isInFaction').as('faction'),
out('isOfSpecies').as('species')).
fold()).
path().
by(unfold().values('name').fold())
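The key idea is that faction and species are both read from the same population vertex instead of walking back through the faction. A rough Python model of the path this produces, using the names from the question plus a hypothetical second population ("Population B", "Species B") to show that rows for different populations don't mix:

```python
# Hypothetical edge list (from_name, label, to_name). "Plara", "Se Bemon", "Se" and
# "Wan" come from the question; "Population B" and "Species B" are invented.
EDGES = [
    ("Se Bemon", "enhabits", "Plara"),
    ("Se Bemon", "isInFaction", "Se"),
    ("Se Bemon", "isOfSpecies", "Wan"),
    ("Population B", "enhabits", "Plara"),
    ("Population B", "isInFaction", "Se"),
    ("Population B", "isOfSpecies", "Species B"),
]


def out(vertex, label):
    """Targets of outgoing edges with the given label."""
    return [dst for src, lbl, dst in EDGES if src == vertex and lbl == label]


def in_(vertex, label):
    """Sources of incoming edges with the given label."""
    return [src for src, lbl, dst in EDGES if dst == vertex and lbl == label]


def rows_for_location(location):
    """One path per population: [[location], [population], [faction..., species...]]."""
    rows = []
    for population in in_(location, "enhabits"):
        # union(out('isInFaction'), out('isOfSpecies')).fold(), both from the population
        branch = out(population, "isInFaction") + out(population, "isOfSpecies")
        rows.append([[location], [population], branch])
    return rows


for row in rows_for_location("Plara"):
    print(row)
```

Each population yields exactly one row, so there is no anonymous [] step and no fan-out over other members of the faction.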
We are using Pinot HLL and were advised to switch from fasthll to distinctcounthll, but the counts are very different: with the same condition there is a roughly 1000x difference.
Example:
SELECT fasthll(my_hll), distinctcounthll(my_hll)
FROM counts_table WHERE timestamp >= 1500768000
I get results:
"aggregationResults": [
{
"function": "fastHLL_my_hll",
"value": "68685244"
}, {
"function": "distinctCountHLL_my_hll",
"value": "50535"
}]
Could anyone explain what causes such a big difference between them?
Please refer to pinot-issue-5153.
FastHll converts each string into a HyperLogLog object, which may represent thousands of unique values. DistinctCountHLL treats the string as a plain value, not as a HyperLogLog object, so it returns an approximation of how many unique serialized HyperLogLog strings there are; that value should be close to the total number of rows scanned.
fasthll is deprecated because deserialization is slow. Instead, you can store the serialized HyperLogLog in a BYTES column, generated with org.apache.pinot.core.common.ObjectSerDeUtils.HYPER_LOG_LOG_SER_DE.serialize(hyperLogLog), and query it with distinctcounthll.
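A toy model of why the two numbers diverge. This is only an illustration: exact Python sets stand in for HyperLogLog sketches and JSON stands in for Pinot's serialization, and the three rows are hypothetical. Merging the deserialized sketches counts the underlying values, while counting distinct serialized strings counts rows.

```python
import json

# Hypothetical rows: each stores a serialized "sketch" of many user ids.
# Exact sets and JSON are stand-ins; a real HLL only approximates these counts.
rows = [
    json.dumps(list(range(0, 1000))),
    json.dumps(list(range(500, 1500))),
    json.dumps(list(range(1000, 2000))),
]


def merged_cardinality(serialized_rows):
    """fasthll-style: deserialize each row into a sketch and merge (union) them."""
    merged = set()
    for s in serialized_rows:
        merged |= set(json.loads(s))
    return len(merged)


def distinct_serialized_values(serialized_rows):
    """distinctcounthll on a STRING column: each serialized string is one value."""
    return len(set(serialized_rows))


print(merged_cardinality(rows))
print(distinct_serialized_values(rows))
```

With three rows covering 2,000 distinct ids, the first function reports 2000 and the second reports 3, the same kind of gap as 68685244 versus 50535.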
I have the following traversal that shows that the selected Vertex has 14 Edges labeled "follows".
g.V().has('user','email','me#email.com').project('name','email','follow-edges').by('name').by('email').by(outE().hasLabel('follows').count())
Which produces the following results:
[{
"name": "David",
"email": "me#email.com",
"follow-edges": 14}]
But when I try to project each "follows" edge's id and inV id, I only get one result item back.
g.V().has('user','email','david#me.com').project('name','email','follow-edges').by('name').by('email').by(outE().hasLabel('follows').project('edge-id', 'inV-id').by('id').by('inV'))
Results:
[{
"name": "David",
"email": "me#email.com",
"follow-edges": {
"edge-id": "ccc06183-f4ca-410d-9c3c-9d2dfd93f5f0",
"inV-id": "f4703a07-f42d-46f9-86be-7f5440f07f12"
}}]
I was expecting to get a list of all the "follows" edges for the selected vertex, similar to the answer given by Stephen Mallette at this link.
Does anyone know why this is not working?
A by() modulator only takes the first object produced by its anonymous traversal, so you need to reduce the stream of objects to a single list. Note my addition of fold():
g.V().has('user','email','david#me.com').
project('name','email','follow-edges').
by('name').
by('email').
by(outE().hasLabel('follows').
project('edge-id', 'inV-id').
by('id').
by('inV').fold())
I assume that "inV" is an actual property and you're not trying to get the "in vertex" of the edge. If you are trying to get the "in vertex" then you need by(inV().id()).
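A small Python analogy of the difference, assuming three hypothetical "follows" edges. A generator plays the role of the anonymous traversal's output stream; taking next() from it models by() without fold(), and collecting it into a list models fold().

```python
# Hypothetical "follows" edges for one user vertex.
FOLLOWS = [
    {"id": "e1", "inV": "v1"},
    {"id": "e2", "inV": "v2"},
    {"id": "e3", "inV": "v3"},
]


def edge_projections():
    """Like outE().hasLabel('follows').project('edge-id', 'inV-id'): a stream of maps."""
    for edge in FOLLOWS:
        yield {"edge-id": edge["id"], "inV-id": edge["inV"]}


def by_without_fold():
    """A by() modulator keeps only the first object its traversal produces."""
    return next(edge_projections(), None)


def by_with_fold():
    """Appending fold() reduces the whole stream to one list, so nothing is dropped."""
    return list(edge_projections())


print(by_without_fold())
print(by_with_fold())
```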
I have a structure like the one below under xyz:
{
"pushKey000": {
"findKey": "john_1",
"userName": "john",
"topic": 1
},
"pushKey001": {
"findKey": "john_2",
"userName": "john",
"topic": 2
},
"pushKey002": {
"findKey": "joel_1",
"userName": "joel",
"topic": 1
}
}
Now I am trying to make a query that returns all entries whose findKey starts with "john". I tried the following (using REST, for example):
https://abc.firebaseio.com/xyz.json?orderBy="findKey"&startAt="john"
This gives me all the results, including 'joel'. Basically, it only uses the first character of startAt, in this case 'j'.
This Firebase video runs the same type of query, but it only searches with the first character.
Am I doing something wrong, or is there another way to retrieve the entries using findKey? Thanks a lot for your help in advance.
PS: My .indexOn is on findKey, and I can't change it.
There is nothing wrong with your code, there is something wrong with your expectations. (I always wanted to write that as an answer :))
The startAt() function works as a starting point for your query, not as a filter. So in your case it will find the first occurrence of "john" and return everything from that point forward (including Joel, Kevin, Tim, etc.).
Unfortunately, there is no direct way to query for findKey values that contain the string "john". Luckily, there is a (partial) workaround using endAt(), which matches values that start with "john".
Your query will look like this:
orderBy="findKey"&startAt="john"&endAt="john\uf8ff"
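Here \uf8ff is a very high code point in the Unicode range (the last character of the Private Use Area in the Basic Multilingual Plane), so it sorts after most regular characters.
With this, you can query for values that start with "john", like "johnnie", "johnn" or "john", but not "1john", "johm" or "joel".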
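The same idea sketched in Python over a sorted list, assuming plain code-point string ordering for the index. The keys are the question's findKey values plus a few hypothetical ones ("johnnie", "kevin_1", "1john"). A lower bound alone returns everything after the start point; adding the \uf8ff upper bound turns the range into a prefix match.

```python
import bisect

# findKey values from the question plus hypothetical extras, sorted in code-point
# order as a stand-in for the orderBy("findKey") index.
KEYS = sorted(["john_1", "john_2", "joel_1", "johnnie", "kevin_1", "1john"])


def start_at(keys, start):
    """startAt only sets a lower bound: everything from `start` onward."""
    i = bisect.bisect_left(keys, start)
    return keys[i:]


def prefix_range(keys, prefix):
    """startAt(prefix) plus endAt(prefix + U+F8FF): only keys beginning with prefix."""
    lo = bisect.bisect_left(keys, prefix)
    # The upper bound is the prefix followed by the \uf8ff code point.
    hi = bisect.bisect_right(keys, prefix + "\uf8ff")
    return keys[lo:hi]


print(start_at(KEYS, "john"))
print(prefix_range(KEYS, "john"))
```

The first call still includes "kevin_1"; the second stops at "johnnie".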
A beginner's question about Freebase:
I am looking for the IMDb ID of a movie called "O". If I use the search box on the freebase.com website and constrain the search by type to all:/film/film, then I get a high-quality result with the best match on top:
http://www.freebase.com/search?query=o&lang=en&all=%2Ffilm%2Ffilm&scoring=entity&prefixed=true
But this does not include the imdb id. When I try to recreate and refine this search using the query editor, I can't figure out how to do a "general query". The best I could come up with was doing a fuzzy name search like this:
[{
"type": "/film/film",
"name": null,
"name~=": "o",
"imdb_id": [],
"rottentomatoes_id": []
}]
The result contains exactly the information I want, but the movie "O" is only the 12th result in the list, buried under lots of nonsense:
http://www.freebase.com/query?lang=%2Flang%2Fen&q=[{%22type%22%3A%22%2Ffilm%2Ffilm%22%2C%22name%22%3Anull%2C%22name~%3D%22%3A%22o%22%2C%22imdb_id%22%3A[]%2C%22rottentomatoes_id%22%3A[]}]
How can I improve the quality of my result? What special magic does the "?query=o" use that "name~=":"o" does not have?
When you use query=o, Freebase does some smart sorting of the results, displaying exact matches first, followed by less exact matches.
With your query name ~= o, you are not searching for movies with the name "O", but for movies that contain "O" in their names (the ~= operator). If you want to search for a specific movie title, specify the exact name:
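A rough Python contrast of the three behaviours described above. The titles are hypothetical, simple case-insensitive substring matching stands in for ~= (Freebase's actual ~= semantics and search scoring are not modeled), and "exact first" is a simple stable sort, not Freebase's ranking algorithm.

```python
# Hypothetical film titles, used only to contrast the kinds of matching.
TITLES = ["Othello", "O Brother, Where Art Thou?", "The Story of O", "O"]


def fuzzy_match(titles, text):
    """Stand-in for name~=: keep every title containing `text`, in storage order."""
    t = text.lower()
    return [title for title in titles if t in title.lower()]


def exact_match(titles, text):
    """Like "name": "o": keep only titles equal to `text` (case-insensitive)."""
    t = text.lower()
    return [title for title in titles if title.lower() == t]


def ranked(titles, text):
    """Search-box style: all fuzzy hits, with exact matches moved to the top."""
    t = text.lower()
    # sorted() is stable; False (exact) sorts before True (partial).
    return sorted(fuzzy_match(titles, text), key=lambda title: title.lower() != t)


print(fuzzy_match(TITLES, "o"))
print(exact_match(TITLES, "o"))
print(ranked(TITLES, "o"))
```

The fuzzy filter returns "O" last because it is unranked, while the exact filter and the ranked search both put it first.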
[{
"type": "/film/film",
"name": "o",
"imdb_id": [],
"rottentomatoes_id": []
}]
This will result in output:
{
"result": [{
"imdb_id": [
"tt0184791"
],
"name": "O",
"type": "/film/film",
"rottentomatoes_id": [
"o"
]
}]
}
If Search gives you the topic that you want, why not just use the output parameter to add the IMDB ID (or whatever else you want) to the output that you request it return?