Azure Cosmos Graph: nest an edge's target vertex in a vertex property

I have two vertices:
1) Vertex 1: { id: 1, name: "john" }
2) Vertex 2: { id: 2, name: "mary" }
There is an edge from 1 to 2 labeled "children".
Is it possible to return vertex 2 nested inside vertex 1 using Gremlin, like this?
{
  id: 1,
  name: "john",
  children: { id: 2, name: "mary" }
}
Thank you!
My solution, with the amazing help of @noam621:
g.V(1)
  .union(
    valueMap(true),
    project('children').by(coalesce(out('children').valueMap(true).fold(), constant([]))),
    project('parents').by(coalesce(out('parents').valueMap(true).fold(), constant([])))
  )
  .unfold().group().by(keys).by(select(values))
It returns the following object:
{
  id: 1,
  name: [ "john" ],
  children: [ { id: 2, name: [ "mary" ] } ],
  parents: []
}
union() combined with project() is the key to merging all the partial results into one object.
valueMap(true).fold() collects every vertex reached through the edge into a list, and coalesce() supplies a default value if the edge doesn't lead to any vertex.
Due to some Azure Cosmos DB Gremlin limitations, property values can only be returned as arrays.
So I finish formatting the object in my application code. That's OK for now.
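The application-side formatting mentioned above could look roughly like the following Python sketch. It assumes the query result has already been deserialized into a plain dict shaped like the object shown earlier; flatten_value_map is a hypothetical helper name, not part of any Gremlin driver.

```python
from typing import Any


def flatten_value_map(vertex: dict[str, Any]) -> dict[str, Any]:
    """Unwrap single-element property lists produced by valueMap(true)."""
    flat: dict[str, Any] = {}
    for key, value in vertex.items():
        if isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            # A folded list of adjacent vertices: flatten each one recursively.
            flat[key] = [flatten_value_map(v) for v in value]
        elif isinstance(value, list) and len(value) == 1:
            # A single-valued property wrapped in a list: unwrap it.
            flat[key] = value[0]
        else:
            # Scalars (id, label), empty lists and multi-valued properties stay as-is.
            flat[key] = value
    return flat


# Sample input copied from the result shown in the post.
result = {
    "id": 1,
    "name": ["john"],
    "children": [{"id": 2, "name": ["mary"]}],
    "parents": [],
}
print(flatten_value_map(result))
```

Multi-valued properties are deliberately left as lists, since unwrapping them would drop data.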
Thank you!

You can do it by using the project step for both vertices:
g.V(1).project('id', 'name', 'children').
  by(id).
  by('name').
  by(out('children').
    project('id', 'name').
      by(id).
      by('name'))
Example:
https://gremlify.com/3j
Query with valueMap:
g.V(1).union(
    valueMap().
      with(WithOptions.tokens).by(unfold()),
    project('children').
      by(out('children').
        valueMap().
          with(WithOptions.tokens).by(unfold()))
  ).unfold().
  group().by(keys).
    by(select(values))
If valueMap().with(WithOptions.tokens) is not supported in Cosmos DB, use valueMap(true) instead.

Related

Gremlin group by vertex property and get sum other properties in the same vertex

We have vertices that store various jobs, with their types and counts as properties. I need to group them by status and total their counts. I tried the following query, which works for one property (receiveCount):
g.V().hasLabel("Jobs").has("Type",within("A","B","C")).group().by("Type").by(fold().match(__.as("p").unfold().values("receiveCount").sum().as("totalRec")).select("totalRec")).next()
I want to include about 10 more properties, such as successCount and FailedCount. Is there a better way to do that?
You could use the cap() step, like this:
g.V().has("name","marko").out("knows").groupCount("a").by("name").group("b").by("name").by(values("age").sum()).cap("a","b")
And the result would be:
"data": [
  {
    "a": {
      "vadas": 1,
      "josh": 1
    },
    "b": {
      "vadas": [
        27.0
      ],
      "josh": [
        32.0
      ]
    }
  }
]

How do I configure OrientDB ETL to import an edge list with attributes

I have a CSV file that contains an edge list, one edge per row. It looks like this:
id1, id2, attr1, attr2, attrX, attrY, attrZ
From this, I want to be able to create (or update) the following, per row:
Vertex A of class X, with id1 and attribute attr1
Vertex B of class X, with id2 and attribute attr2
Edge A->B with edge attributes attrX, attrY, attrZ
This is the configuration file I'm feeding to oetl.sh (using OrientDB 2.2 beta2):
{
"source": { "file": { "path": "/data/sample/test.csv" } },
"extractor": { "row": {} },
"transformers" :
[
{ "csv" : {} },
{ "merge" : { "joinFieldName":"id1", "lookup":"X.id" } },
{ "vertex" : { "class" : "X", "skipDuplicates":true } },
{ "edge" : {
"unresolvedLinkAction" : "WARNING",
"class" : "EdgeTypeClass",
"joinFieldName" : "id2",
"lookup": "X.id",
"edgeFields":{"attrX":"${input.attrX}", "attrY":"${input.attrY}","attrZ":"${input.attrZ}"}
}
},
{ "field" : { "fieldNames" : [ "id1", "id2", "attr1", "attr2", "attrX", "attrY", "attrZ" ], "operation": "remove" } }
],
"loader": {
"orientdb": {
"dbURL": "remote:localhost/test2",
"dbType": "graph"
}
}
}
The sample data I used to run the test is as follows:
10,11,"A","B",100,200,1
11,12,"B","C",110,201,5
12,14,"C","D",90,250,10
14,13,"D","E",105,210,3
When I run the oetl.sh script with the given configuration and sample data, it creates 4 vertices instead of 5 and no edges. There are no attributes on the vertices at all.
So these are the questions:
Is there a way in the vertex clause to specify vertex attributes/fields the same way that one can do for edges (i.e. edgeFields)? The documentation doesn't mention anything about it but it seems odd that you wouldn't be able to do it.
Rather than relying on the edge to create the outbound vertex, should I instead be creating two vertices explicitly and if so how do I specify that in the configuration file? When I try to add two "vertex" clauses it only seems to pick up the last one as the "current" vertex.
It's possible that the specific edge (id1 -> id2) already exists. Is it possible to only update the edge attributes in this case?
My sinking feeling is that given the complexity and number of things I'm trying to pack into this that it will be simpler to write my own ETL (e.g. using the Java API) instead of relying on oetl, but I was hoping I'd be able to avoid doing that if only because it's more maintainable.

Multiple range keys in couchdb views

I've been searching for a solution for a few hours without success.
I just want to run this query in CouchDB using a view:
select * from database where (id >= 3000000 AND id <= 3999999) AND gyro_y >= 1000
I tried this:
function(doc) {
  if (doc.id && doc.Gyro_y) {
    emit([doc.id, doc.Gyro_y], null);
  }
}
Here is my document (record in couchdb):
{
"_id": "f97968bee9674259c75b89658b09f93c",
"_rev": "3-4e2cce33e562ae502d6416e0796fcad1",
"id": "30000002",
"DateHeure": "2016-06-16T02:08:00Z",
"Latitude": 1000,
"Longitude": 1000,
"Gyro_x": -242,
"Gyro_y": 183,
"Gyro_z": -156,
"Accel_x": -404,
"Accel_y": -2424,
"Accel_z": -14588
}
I then do an HTTP request like so:
http://localhost:5984/arduino/_design/filter/_view/bygyroy?startkey=["3000000",1000]&endkey=["3999999",9999999]&include_docs=true
I get this as an answer:
{
total_rows: 10,
offset: 8,
rows: [{
id: "f97968bee9674259c75b89658b09f93c",
key: [
"01000002",
183
],
value: null,
doc: {
_id: "f97968bee9674259c75b89658b09f93c",
_rev: "3-4e2cce33e562ae502d6416e0796fcad1",
id: "30000002",
DateHeure: "2016-06-16T02:08:00Z",
Latitude: 1000,
Longitude: 1000,
Gyro_x: -242,
Gyro_y: 183,
Gyro_z: -156,
Accel_x: -404,
Accel_y: -2424,
Accel_z: -14588
}
}
]
}
So it works for the id, but not for the second key, Gyro_y.
Thanks for your help.
When you specify arrays as your start/end keys, CouchDB compares keys element by element, from left to right: the second element is only consulted when the first elements are equal. The view then returns one contiguous slice of that sorted index.
In this case, every row whose id falls strictly between "3000000" and "3999999" lies inside the range whatever its Gyro_y value is; Gyro_y is only compared for rows whose id equals one of the boundary values.
Your SQL example does not translate exactly to what you are doing in CouchDB. In SQL, each condition is evaluated and the result is the intersection of the matching rows. I would read up on view collation to understand these inner workings of CouchDB.
To solve your problem right now, I would simply switch the order in which you emit your keys. By putting the Gyro_y value first, you should get the results you've described.
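To see why the key order matters, here is a small Python sketch that uses tuple ordering as a stand-in for CouchDB's array collation. This is only an approximation: CouchDB collates strings with ICU rules, which happen to agree with Python for equal-length digit strings. The sample documents and the view_range helper are made up for illustration. The sketch also suggests that, once Gyro_y comes first, the id bounds are no longer enforced by the key range, so they are checked on the client here.

```python
# Hypothetical sample documents (not taken from the question's database).
docs = [
    {"id": "3000001", "Gyro_y": 183},
    {"id": "3000002", "Gyro_y": 1500},
    {"id": "3500000", "Gyro_y": 20},
    {"id": "4000000", "Gyro_y": 2000},
]


def view_range(keys, startkey, endkey):
    """Return the contiguous slice of the sorted index between two keys."""
    return [key for key in sorted(keys) if startkey <= key <= endkey]


# Original key order [id, Gyro_y]: Gyro_y is only compared when ids tie.
by_id = [(d["id"], d["Gyro_y"]) for d in docs]
id_first = view_range(by_id, ("3000000", 1000), ("3999999", 9999999))
print(id_first)  # rows with Gyro_y below 1000 are still included

# Swapped key order [Gyro_y, id]: the range now really bounds Gyro_y.
by_gyro = [(d["Gyro_y"], d["id"]) for d in docs]
gyro_first = view_range(by_gyro, (1000, "3000000"), (9999999, "3999999"))

# id is now only the tie-breaker, so its bounds are applied client-side.
hits = [(g, i) for g, i in gyro_first if "3000000" <= i <= "3999999"]
print(hits)
```

With the original order, the rows with Gyro_y 183 and 20 slip through; with the swapped order plus the client-side check, only the row that satisfies both conditions remains.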

Query to get exact matches of Elastic Field with multiple values in Array

I want to write a query in Elastic that applies a filter based on values I have in an array (in my R program). Essentially the query:
Matches a time range (time field in Elastic)
Matches "trackId" field in Elastic to any value in array oth_usr
Return 2 fields - "trackId", "propertyId"
I have the following primitive version of the query but do not know how to use the oth_usr array in a query (part 2 above).
query <- sprintf('{"query":{"range":{"time":{"gte":"%s","lte":"%s"}}}}',start_date,end_date)
view_list <- elastic::Search(index = "organised_recent",type = "PROPERTY_VIEW",size = 10000000,
body=query, fields = c("trackId", "propertyId"))$hits$hits
You need to add a terms query and embed it as well as the range one into a bool/must query. Try updating your query like this:
terms <- paste(sprintf("\"%s\"", oth_usr), collapse=", ")
query <- sprintf('{"query":{"bool":{"must":[{"terms": {"trackId": [%s]}},{"range": {"time": {"gte": "%s","lte": "%s"}}}]}}}',terms,start_date,end_date)
I'm not fluent in R syntax, but this is a raw JSON query that works.
It checks whether your time field falls within the given range (start_time and end_time) and whether trackId exactly matches one of your terms.
It returns only trackId, propertyId fields, as per your request:
POST /indice/_search
{
  "_source": {
    "include": [
      "trackId",
      "propertyId"
    ]
  },
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "time": {
              "gte": "start_time",
              "lte": "end_time"
            }
          }
        },
        {
          "terms": {
            "trackId": [
              "terms"
            ]
          }
        }
      ]
    }
  }
}
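As a rough illustration of how such a body can be assembled from an array of IDs without hand-quoting strings, here is a Python sketch that builds the same bool/must structure as a dict and serializes it with json.dumps. The build_query helper, the sample IDs and the dates are hypothetical; the field names come from the question. It uses the list form of "_source" to select the returned fields, on the assumption that your Elasticsearch version accepts that shorthand.

```python
import json


def build_query(track_ids, start_date, end_date):
    """Build a bool/must body: time within range AND trackId in track_ids."""
    return {
        "_source": ["trackId", "propertyId"],
        "query": {
            "bool": {
                "must": [
                    {"range": {"time": {"gte": start_date, "lte": end_date}}},
                    {"terms": {"trackId": list(track_ids)}},
                ]
            }
        },
    }


# Hypothetical sample values standing in for oth_usr, start_date and end_date.
body = build_query(["u1", "u2"], "2016-06-01", "2016-06-30")
print(json.dumps(body, indent=2))
```

Serializing a structure instead of formatting a string keeps the quoting and escaping of each ID correct even when the values contain quotes or backslashes.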

Kendo - Grid - Aggregate with Complex Objects

I have a Kendo UI grid. The grid has a datasource with complex object data. For example, {"foo": {"bar" : 10}}. Although the column field can navigate the object graph (i.e. foo.bar), the aggregate field doesn't seem to be able to.
Here's the code:
var grid = $("#grid").kendoGrid({
dataSource: {
data: [
{"foo": {"bar": 10}},
{"foo": {"bar": 20}}
],
aggregate: [
{field: "foo.bar", aggregate: "sum"}
]
},
columns: [
{
field: "foo.bar",
footerTemplate: "Sum: #= sum # "
}
]
}).data("kendoGrid");
Here's the fiddle:
http://jsfiddle.net/e6shF/1/
Firebug reports "TypeError: data.foo is undefined" in line 8 of kendo.all.min.js.
Am I doing something incorrectly? If this is a bug in Kendo, is there a way to work around this? I have to keep the objects complex.
Here's a "better" answer from Kendo Support:
The behavior you are experiencing is caused by the fact that the "path" you have specified will be used as a key in the map created as a result of the aggregation, producing an object similar to the following:
{ "foo.bar" : { sum: 30 } }
Unfortunately, this construct is not supported by the footer template generation and will not be resolved correctly. A possible workaround for this scenario is to use a function instead. I have modified the sample in order to illustrate this.
var grid = $("#grid").kendoGrid({
  dataSource: {
    data: [
      {"foo": {"bar": 10}},
      {"foo": {"bar": 20}}
    ],
    aggregate: [
      {field: "foo.bar", aggregate: "sum"}
    ]
  },
  columns: [
    {
      field: "foo.bar",
      footerTemplate: function(data) { return "Sum: " + data["foo.bar"].sum; }
    }
  ]
}).data("kendoGrid");
It does not seem possible to use complex objects in aggregates, since the dynamically generated function that evaluates them treats foo.bar as the name of a single field.
Do you really need that complex field?
I might understand that the server (providing the data of the grid) sends that complex foo but you can always flatten it using parse or data functions in the datasource. Something like this:
var grid = $("#grid").kendoGrid({
  dataSource: {
    data: [
      {"foo": {"bar": 10}},
      {"foo": {"bar": 20}}
    ],
    aggregate: [
      {field: "foo_bar", aggregate: "sum"}
    ],
    schema: {
      parse: function (data) {
        var res = [];
        $.each(data, function (idx, elem) {
          res.push({ "foo_bar": elem.foo.bar });
        });
        return res;
      }
    }
  },
  columns: [
    {
      field: "foo_bar",
      footerTemplate: "Sum: #= sum # "
    }
  ]
}).data("kendoGrid");
Here I transform the received foo.bar into foo_bar and use that field for the aggregation.
