Groovy to collect and remove duplicates from a complex json structure

Groovy to collect and remove duplicates from a complex json structure - recursion

this is my first question at Stack Overflow, so, firstly, Hello colleagues and many thanks in advance.
I have this json input message I'm dealing with, but I cannot find the key to get the message I need for further processing
{
"callId": "70f354ed47e643bc9d1cd6595e018f9b",
"errorCode": 0,
"apiVersion": 2,
"statusCode": 200,
"statusReason": "OK",
"time": "2022-08-01T07:56:34.631Z",
"results": [
{
"UID": "5abc8d08d8e148158610c7c6776c4ad5",
"groups": {
"organizations": [
{
"businessModels": [
{
"keys": [
"Company Code",
"Sales Org",
"Distribution Channel",
"Division"
],
"businessEntities": [
{
"codes": [
"HU50",
"HU50_HU50",
"HU50_HU50_10",
"HU50_HU50_10_10"
]
}
],
"id": "SalesArea_161185"
},
{
"keys": [
"ShiptoInc_SalesArea",
"ShiptoInc_Id"
],
"businessEntities": [
{
"codes": [
"HU50_HU50_10_10",
"100563692"
]
},
{
"codes": [
"HU50_HU50_10_10",
"100563691"
]
}
],
"id": "ShiptoInc_161185"
},
{
"keys": [
"Payer_SalesArea",
"Payer_Id"
],
"businessEntities": [
{
"codes": [
"HU50_HU50_10_10",
"960004763"
]
}
],
"id": "Payer_161185"
}
]
}
]
}
},
{
"UID": "d9f2b591f58e4aeebaa0b88175d4fe3c",
"groups": {
"organizations": [
{
"businessModels": [
{
"keys": [
"Company Code",
"Sales Org",
"Distribution Channel",
"Division"
],
"businessEntities": [
{
"codes": [
"HU50",
"HU50_HU50",
"HU50_HU50_10",
"HU50_HU50_10_10"
]
}
],
"id": "SalesArea_161185"
},
{
"keys": [
"ShiptoInc_SalesArea",
"ShiptoInc_Id"
],
"businessEntities": [
{
"codes": [
"HU50_HU50_10_10",
"100563692"
]
},
{
"codes": [
"HU50_HU50_10_10",
"100563691"
]
}
],
"id": "ShiptoInc_161185"
},
{
"keys": [
"Payer_SalesArea",
"Payer_Id"
],
"businessEntities": [
{
"codes": [
"HU50_HU50_10_10",
"960004763"
]
}
],
"id": "Payer_161185"
}
]
}
]
}
},
{
"UID": "74a9ccbc9b8549d1a7726ac1f77f7ea9",
"groups": {
"organizations": [
{
"businessModels": [
{
"keys": [
"ShiptoInc_SalesArea",
"ShiptoInc_Id"
],
"businessEntities": [
{
"codes": [
"HU50_HU50_10_10",
"100563692"
]
}
],
"id": "ShiptoInc_161185"
}
]
}
]
}
},
{
"UID": "d5ed356a3c2a48568ccacb8d9c7c5506",
"groups": {
"organizations": [
{
"businessModels": [
{
"keys": [
"Company Code",
"Sales Org",
"Distribution Channel",
"Division"
],
"businessEntities": [
{
"codes": [
"HU50",
"HU50_HU50",
"HU50_HU50_10",
"HU50_HU50_10_10"
]
}
],
"id": "SalesArea_161185"
},
{
"keys": [
"ShiptoInc_SalesArea",
"ShiptoInc_Id"
],
"businessEntities": [
{
"codes": [
"HU50_HU50_10_10",
"100563692"
]
},
{
"codes": [
"HU50_HU50_20_20",
"100563692"
]
},
{
"codes": [
"HU50_HU50_10_10",
"100563691"
]
}
],
"id": "ShiptoInc_161185"
},
{
"keys": [
"Payer_SalesArea",
"Payer_Id"
],
"businessEntities": [
{
"codes": [
"HU50_HU50_10_10",
"960004763"
]
}
],
"id": "Payer_161185"
}
]
}
]
}
}
],
"objectsCount": 4,
"totalCount": 4
}
For a known id ("Payer_161185" or "ShiptoInc_161185") and a given value ("100563692") we need to extract all repetitions of businessEntities.codes of all UIDs and after get the list, remove duplicates.
For example, for "ShiptoInc_161185", the desired output would be:
{ "salesAreas": ["HU50_HU50_10_10","HU50_HU50_20_20"]}
This output is the list of salesAreas for the given value 100563692 into all the id = ShiptoInc_161185
Other case that I would like to solve is:
How could I add the id instead of text salesAreas?. Something like this {"Payer_111":["HU50_HU50_10_10","HU50_HU50_30_20"],"Payer_222":["HU40_HU40_10_10","HU20_HU20_30_20"]}. This means the id wouldn't be provided, just the prefix Payer_
Your help is appreciated.
***I solved the second requirement
def data = new JsonSlurper().parseText(body);
def bModelsIdFiltered = data.results.groups.organizations.businessModels
.collect { it[0] }.flatten()
.findAll { it.id.contains('Payer_') }
def nList = [];
def rembM = bModelsIdFiltered.each{
it.businessEntities.codes.each { code ->
nList.add(code.plus(it.id))
}
}
println "nl " + nList;
def codesFiltered = nList
.findAll { '100563692' in it }
return codesFiltered;

If the structure is rigid and i understood the task correctly (namely, you should find codes that have value 100563692 in the same array), you can doing it like that:
class FindCodesSpec extends Specification {
def testString = '''<insert_your_string_here>'''
def flattenOnce(List array) {
return array.inject([]) { res, el -> res + el }
}
def findCodes(String message, String id, String code) {
def data = new JsonSlurper().parseText(message)
def bModelsIdFiltered = data.results.groups.organizations.businessModels
.collect { it[0] }.flatten()
.findAll { it.id == id }
def codesFiltered = flattenOnce(bModelsIdFiltered.businessEntities.codes)
.findAll { code in it }
def uniqueCodes = codesFiltered.flatten().unique() - code
return JsonOutput.toJson(['salesAreas': uniqueCodes])
}
def 'run test'() {
expect:
'''{"salesAreas":["HU50_HU50_10_10","HU50_HU50_20_20"]}''' == findCodes(testString, 'ShiptoInc_161185', '100563692')
}
}

Related

jq array filter for nested array elements

I am trying to add a new user in below json which matches group NP01-RW. i am able to do without NP01-RW but not able to select users under NP01-RW and return updated json.
{
"id": 181,
"guid": "c9b7dbde-63de-42cc-9840-1b4a06e13364",
"isEnabled": true,
"version": 17,
"service": "Np-Hue",
"name": "DATASCIENCE-CUROPT-RO",
"policyType": 0,
"policyPriority": 0,
"isAuditEnabled": true,
"resources": {
"database": {
"values": [
"hive_cur_acct_1dev",
"hive_cur_acct_1eng",
"hive_cur_acct_1rwy",
"hive_cur_acct_1stg",
"hive_opt_acct_1dev",
"hive_opt_acct_1eng",
"hive_opt_acct_1stg",
"hive_opt_acct_1rwy"
],
"isExcludes": false,
"isRecursive": false
},
"column": {
"values": [
"*"
],
"isExcludes": false,
"isRecursive": false
},
"table": {
"values": [
"*"
],
"isExcludes": false,
"isRecursive": false
}
},
"policyItems": [
{
"accesses": [
{
"type": "select",
"isAllowed": true
},
{
"type": "update",
"isAllowed": true
},
{
"type": "create",
"isAllowed": true
},
{
"type": "drop",
"isAllowed": true
},
{
"type": "alter",
"isAllowed": true
},
{
"type": "index",
"isAllowed": true
},
{
"type": "lock",
"isAllowed": true
},
{
"type": "all",
"isAllowed": true
},
{
"type": "read",
"isAllowed": true
},
{
"type": "write",
"isAllowed": true
}
],
"users": [
"user1",
"user2",
"user3"
],
"groups": [
"NP01-RW"
],
"conditions": [],
"delegateAdmin": false
},
{
"accesses": [
{
"type": "select",
"isAllowed": true
}
],
"users": [
"user1"
],
"groups": [
"NP01-RO"
],
"conditions": [],
"delegateAdmin": false
}
],
"denyPolicyItems": [],
"allowExceptions": [],
"denyExceptions": [],
"dataMaskPolicyItems": [],
"rowFilterPolicyItems": [],
"options": {},
"validitySchedules": [],
"policyLabels": [
"DATASCIENCE-CurOpt-RO_NP01"
]
}
below is what i have tried but it returns part of the JSON matching NP01-RW and not full JSON
jq --arg username "$sync_userName" '.policyItems[] | select(.groups[] | IN("NP01-RO")).users += [$username]' > ${sync_policyName}.json

Operator precedence in jq is not always intuitive. Your program is parsed as:
.policyItems[] | (select(.groups[] | IN("NP01-RO")).users += [$username])
Which first streams all policyItems and only then changes them, leaving you with policyItems only in the output.
You need to make sure that the stream selects the correct values, which you can then assign:
(.policyItems[] | select(.groups[] | IN("NP01-RO")).users) += [$username]
This will do the assignment, but still return the full input (.).

jq: filter nested array objects

Here my documents:
[
{
"id": "3e67b455-8cdb-4bc0-a5e1-f90253870fc9",
"identifier": [
{
"system": {
"value": "urn:oid:2.16.724.4.9.20.91-INVENTAT"
},
"value": {
"value": "04374"
}
},
{
"system": {
"value": "urn:oid:2.16.724.4.9.20.2-INVENTAT"
},
"value": {
"value": "INFP3"
}
},
{
"system": {
"value": "urn:oid:INVENTAT"
},
"value": {
"value": "CBOU035"
}
}
]
},
{
"id": "0f22e5ff-70bc-457f-bdaf-7afe86d478de",
"identifier": [
{
"system": {
"value": "urn:oid:2.16.724.4.9.20.91-INVENTAT"
},
"value": {
"value": "04376"
}
},
{
"system": {
"value": "urn:oid:2.16.724.4.9.20.2-INVENTAT"
},
"value": {
"value": "INF07"
}
},
{
"system": {
"value": "urn:oid:INVENTAT"
},
"value": {
"value": "S527918"
}
}
]
},
{
"id": "a1ea574c-438b-443c-ad87-d31d09d581f0",
"identifier": [
{
"system": {
"value": "urn:oid:2.16.724.4.9.20.91-INVENTAT"
},
"value": {
"value": "08096"
}
},
{
"system": {
"value": "urn:oid:2.16.724.4.9.20.2-INVENTAT"
},
"value": {
"value": "INF04"
}
},
{
"system": {
"value": "urn:oid:INVENTAT"
},
"value": {
"value": "5635132"
}
}
]
}
]
I need to filter .identifier where system.value="urn:oid:2.16.724.4.9.20.91-INVENTAT" or system.value="urn:oid:2.16.724.4.9.20.2-INVENTAT" and pick .value.value.
Desired output:
[
{
"id": "3e67b455-8cdb-4bc0-a5e1-f90253870fc9",
"oid1": "04374",
"oid2": "INFP3"
},
{
"id": "0f22e5ff-70bc-457f-bdaf-7afe86d478de",
"oid1": "04376",
"oid2": "INF07"
},
{
"id": "a1ea574c-438b-443c-ad87-d31d09d581f0",
"oid1": "08096",
"oid2": "INF04"
}
]
I've tried:
map(
{
id,
oid1: select(.identifier?[]?.system.value == "urn:oid:2.16.724.4.9.20.91-INVENTAT") | .identifier[].value.value,
oid2: select(.identifier?[]?.system.value == "urn:oid:2.16.724.4.9.20.2-INVENTAT") | .identifier[].value.value
}
)
But output is not what I need: you can find it on this jqplay.
Any ideas?

This uses IN to check for your query strings, and with_entries on an array to generate the indeces for the oid keys.
jq '
map({id} + (.identifier | map(select(IN(.system.value;
"urn:oid:2.16.724.4.9.20.91-INVENTAT",
"urn:oid:2.16.724.4.9.20.2-INVENTAT"
)).value.value) | with_entries(.key |= "oid\(. + 1)")))
'
[
{
"id": "3e67b455-8cdb-4bc0-a5e1-f90253870fc9",
"oid1": "04374",
"oid2": "INFP3"
},
{
"id": "0f22e5ff-70bc-457f-bdaf-7afe86d478de",
"oid1": "04376",
"oid2": "INF07"
},
{
"id": "a1ea574c-438b-443c-ad87-d31d09d581f0",
"oid1": "08096",
"oid2": "INF04"
}
]
Demo

Here is a ruby to do that:
ruby -r json -e '
def walk(x, filt)
rtr=[]
rep=["uab", "ub"]
x.each{|e|
rd={"id"=>e["id"]}.merge(
e["identifier"].
filter{|ea| filt.include?(ea["system"]["value"])}.
map.with_index(1){|di, i| ["#{rep[i%2]}", "#{di["value"]["value"]}"]}.to_h)
rtr << rd
}
rtr
end
data=JSON.parse($<.read)
puts walk(data, ["urn:oid:2.16.724.4.9.20.91-INVENTAT", "urn:oid:2.16.724.4.9.20.2-INVENTAT"]).to_json
' file
Prints:
[{"id":"3e67b455-8cdb-4bc0-a5e1-f90253870fc9","ub":"04374","uab":"INFP3"},{"id":"0f22e5ff-70bc-457f-bdaf-7afe86d478de","ub":"04376","uab":"INF07"},{"id":"a1ea574c-438b-443c-ad87-d31d09d581f0","ub":"08096","uab":"INF04"}]

How can I duplicate multiple times an existing object within a JSON array using jq?

I have the following json file:
{
"actions": [
{
"values": "test",
"features": [
{
"v1": 100,
"v2": {
"dates": [
"2020-04-08 06:58:26",
"2020-04-08 06:58:26"
]
}
}
]
}
]
}
I would like to append n-times the object within the "actions" array to the end of it, creating n+1 total objects.
Expected output if n=2:
{
"actions": [
{
"values": "test",
"features": [
{
"v1": 100,
"v2": {
"dates": [
"2020-04-08 06:58:26",
"2020-04-08 06:58:26"
]
}
}
]
},
{
"values": "test",
"features": [
{
"v1": 100,
"v2": {
"dates": [
"2020-04-08 06:58:26",
"2020-04-08 06:58:26"
]
}
}
]
},
{
"values": "test",
"features": [
{
"v1": 100,
"v2": {
"dates": [
"2020-04-08 06:58:26",
"2020-04-08 06:58:26"
]
}
}
]
}
]
}
I found this answer [How can I duplicate an existing object within a JSON array using jq? however it only works with one element at the end.

You can just use a reduce() function with range() together to create the index to include the object at.
jq --arg n 2 'reduce range(0, ($n|tonumber)) as $d (.; .actions[$d+1] += .actions[0] )' json

Why is google analytics return different number of results with same parameters?

Reporting API v4
I am a developer. I have my clients google adwords and analytics. I have been using adwords and analytics report API for almost a year now.
I am also using https://ga-dev-tools.appspot.com/query-explorer/. The query builder. For comparing if I have retrieve the right amount of data.
I don't know if its an error or not but its acting weird.
Try number 1 using https://ga-dev-tools.appspot.com/query-explorer/
I tried to add 2 metrics and 7 dimensions. This Account ID, contains 1 million data in only 1 month. I know this because I retrieved 1 million in a range of july 25, 2018 - august 16, 2018.
Then, here's the interesting part. I run the query again with the same parameters, it retrieves 5999 results. I did it again it returns 1 million. The results keep changing. I thought its the error in my code but its also happening in the query builder.
What do you guys think? is it a bug or not?
You can try this if you have more than a million data.
I know its not related to coding. But Google Analytics doesn't have forums just like Adwords.
Try number 2 using this link https://developers.google.com/analytics/devguides/reporting/core/v4/rest/v4/reports/batchGet
this is my request
{
"reportRequests": [
{
"dateRanges": [
{
"endDate": "2018-08-16",
"startDate": "2018-07-16"
}
],
"dimensions": [
{
"name": "ga:dimension2"
},
{
"name": "ga:dimension3"
},
{
"name": "ga:dimension1"
},
{
"name": "ga:adPlacementDomain"
}
],
"pageSize": 5,
"viewId": "********",
"samplingLevel": "LARGE",
"metrics": [
{
"expression": "ga:entrances"
},
{
"expression": "ga:newUsers"
}
],
"includeEmptyRows": true
}
]
}
The return of rowCount is sometimes 2111 and then 1000000.
This my response json with 1million result:
{
"reports": [
{
"columnHeader": {
"dimensions": [
"ga:dimension2",
"ga:dimension3",
"ga:dimension1",
"ga:adPlacementDomain"
],
"metricHeader": {
"metricHeaderEntries": [
{
"name": "ga:entrances",
"type": "INTEGER"
},
{
"name": "ga:newUsers",
"type": "INTEGER"
}
]
}
},
"data": {
"rows": [
{
"dimensions": [
"(other)",
"(other)",
"(other)",
"(other)"
],
"metrics": [
{
"values": [
"120834",
"68730"
]
}
]
},
{
"dimensions": [
"1000025873.1532426892",
"1532426891790.o9z84x",
"2018-07-24T11:08:15.449+01:00",
"unknown"
],
"metrics": [
{
"values": [
"0",
"0"
]
}
]
},
{
"dimensions": [
"1000025873.1532426892",
"1532426891790.o9z84x",
"2018-07-24T11:08:17.589+01:00",
"unknown"
],
"metrics": [
{
"values": [
"0",
"0"
]
}
]
},
{
"dimensions": [
"1000025873.1532426892",
"1532426891790.o9z84x",
"2018-07-24T11:08:31.809+01:00",
"unknown"
],
"metrics": [
{
"values": [
"0",
"0"
]
}
]
},
{
"dimensions": [
"1000025873.1532426892",
"1532427045552.p38pk78",
"2018-07-24T11:09:06.43+01:00",
"unknown"
],
"metrics": [
{
"values": [
"0",
"0"
]
}
]
}
],
"totals": [
{
"values": [
"158626",
"90225"
]
}
],
"rowCount": 1000000,
"minimums": [
{
"values": [
"0",
"0"
]
}
],
"maximums": [
{
"values": [
"120834",
"68730"
]
}
],
"isDataGolden": true
},
"nextPageToken": "5"
}
]
}
another response example when i have less 1million results:
{
"reports": [
{
"columnHeader": {
"dimensions": [
"ga:dimension2",
"ga:dimension3",
"ga:dimension1",
"ga:adPlacementDomain"
],
"metricHeader": {
"metricHeaderEntries": [
{
"name": "ga:entrances",
"type": "INTEGER"
},
{
"name": "ga:newUsers",
"type": "INTEGER"
}
]
}
},
"data": {
"rows": [
{
"dimensions": [
"1002211166.1531434756",
"1531762918308.fjnj7pa6",
"2018-07-16T18:41:58.307+01:00",
"mobileapp::2-com.forsbit.spider"
],
"metrics": [
{
"values": [
"1",
"0"
]
}
]
},
{
"dimensions": [
"1002211166.1531434756",
"1531771001486.jawfrpz8",
"2018-07-16T20:56:41.482+01:00",
"mobileapp::2-com.forsbit.spider"
],
"metrics": [
{
"values": [
"1",
"0"
]
}
]
},
{
"dimensions": [
"1002211166.1531434756",
"1531772475507.7n4w2qzb",
"2018-07-16T21:21:15.503+01:00",
"mobileapp::2-com.forsbit.spider"
],
"metrics": [
{
"values": [
"1",
"0"
]
}
]
},
{
"dimensions": [
"1002211166.1531434756",
"1531859165986.zl7we6a5",
"2018-07-17T21:26:05.977+01:00",
"mobileapp::2-com.forsbit.spider"
],
"metrics": [
{
"values": [
"1",
"0"
]
}
]
},
{
"dimensions": [
"1002211166.1531434756",
"1531859632678.dz7hccsa",
"2018-07-17T21:33:52.673+01:00",
"mobileapp::2-com.forsbit.spider"
],
"metrics": [
{
"values": [
"1",
"0"
]
}
]
},
{
"dimensions": [
"1002211166.1531434756",
"1531861026792.kw71ngx9",
"2018-07-17T21:42:31.667+01:00",
"mobileapp::2-com.forsbit.spider"
],
"metrics": [
{
"values": [
"1",
"0"
]
}
]
}
],
"totals": [
{
"values": [
"2111",
"233"
]
}
],
"rowCount": 2112,
"minimums": [
{
"values": [
"0",
"0"
]
}
],
"maximums": [
{
"values": [
"1",
"1"
]
}
],
"isDataGolden": true
},
"nextPageToken": "6"
}
]
}

I am assuming that you have kept all the queries intact. Double check just to make sure.
Second step would be to check for sampling. Check the field samplingSpaceSizes and samplesReadCounts in the response for sampling. If these fields were not defined that means no sampling was introduced.

Different results between date histogram and date range on Elastic Search

I would like to analyse my logs data with Elastic Search/Kibana and count unique customer by month.
Results are different when I use a date histogram aggregation and date range aggregation.
Here is the date histogram query :
"query": {
"query_string": {
"query": "_type:logs AND created_at:[2015-04-01 TO now]",
"analyze_wildcard": true
}
},
"size": 0,
"aggs": {
"2": {
"date_histogram": {
"field": "created_at",
"interval": "1M",
"min_doc_count": 1
},
"aggs": {
"1": {
"cardinality": {
"field": "customer.id"
}
}
}
}
}
And results :
"aggregations": {
"2": {
"buckets": [
{
"1": {
"value": 595805
},
"key_as_string": "2015-04-01T00:00:00.000Z",
"key": 1427839200000,
"doc_count": 6410438
},
{
"1": {
"value": 647788
},
"key_as_string": "2015-05-01T00:00:00.000Z",
"key": 1430431200000,
"doc_count": 6669555
},...
Here is the date range query :
"query": {
"query_string": {
"query": "_type:logs AND created_at:[2015-04-01 TO now]",
"analyze_wildcard": true
}
},
"size": 0,
"aggs": {
"2": {
"date_range": {
"field": "created_at",
"ranges": [
{
"from": "2015-04-01",
"to": "2015-05-01"
},
{
"from": "2015-05-01",
"to": "2015-06-01"
}
]
},
"aggs": {
"1": {
"cardinality": {
"field": "customer.id"
}
}
}
}
}
And the response :
"aggregations": {
"2": {
"buckets": [
{
"1": {
"value": 592179
},
"key": "2015-04-01T00:00:00.000Z-2015-05-01T00:00:00.000Z",
"from": 1427846400000,
"from_as_string": "2015-04-01T00:00:00.000Z",
"to": 1430438400000,
"to_as_string": "2015-05-01T00:00:00.000Z",
"doc_count": 6411884
},
{
"1": {
"value": 616995
},
"key": "2015-05-01T00:00:00.000Z-2015-06-01T00:00:00.000Z",
"from": 1430438400000,
"from_as_string": "2015-05-01T00:00:00.000Z",
"to": 1433116800000,
"to_as_string": "2015-06-01T00:00:00.000Z",
"doc_count": 6668060
}
]
}
}
In the first case, I have 595,805 for April and 647,788 for May
In the second case, I have 592,179 for April and 616,995 for May
Someone could explain me why I have these differences between these use cases ?
Thank you
I update my first post to add another example
I add another example with fewer data (on 1 day) but with the same issue. Here is the first request with date histogram :
{
"size": 0,
"query": {
"query_string": {
"query": "_type:logs AND logs.created_at:[2015-04-01 TO 2015-04-01]",
"analyze_wildcard": true
}
},
"aggs": {
"2": {
"date_histogram": {
"field": "created_at",
"interval": "1h",
"pre_zone": "00:00",
"pre_zone_adjust_large_interval": true,
"min_doc_count": 1
},
"aggs": {
"1": {
"cardinality": {
"field": "customer.id"
}
}
}
}
}
}
And we can see 660 unique count with 1717 doc count for the first hour :
{
"hits":{
"total":203961,
"max_score":0,
"hits":[
]
},
"aggregations":{
"2":{
"buckets":[
{
"1":{
"value":660
},
"key_as_string":"2015-04-01T00:00:00.000Z",
"key":1427846400000,
"doc_count":1717
},
{
"1":{
"value":324
},
"key_as_string":"2015-04-01T01:00:00.000Z",
"key":1427850000000,
"doc_count":776
},
{
"1":{
"value":190
},
"key_as_string":"2015-04-01T02:00:00.000Z",
"key":1427853600000,
"doc_count":481
}
]
}
}
}
But on the second request with the date range :
{
"size": 0,
"query": {
"query_string": {
"query": "_type:logs AND logs.created_at:[2015-04-01 TO 2015-04-01]",
"analyze_wildcard": true
}
},
"aggs": {
"2": {
"date_range": {
"field": "created_at",
"ranges": [
{
"from": "2015-04-01T00:00:00",
"to": "2015-04-01T01:00:00"
},
{
"from": "2015-04-01T01:00:00",
"to": "2015-04-01T02:00:00"
}
]
},
"aggs": {
"1": {
"cardinality": {
"field": "customer.id"
}
}
}
}
}
}
We can see only 633 unique count with 1717 doc count :
{
"hits":{
"total":203961,
"max_score":0,
"hits":[
]
},
"aggregations":{
"2":{
"buckets":[
{
"1":{
"value":633
},
"key":"2015-04-01T00:00:00.000Z-2015-04-01T01:00:00.000Z",
"from":1427846400000,
"from_as_string":"2015-04-01T00:00:00.000Z",
"to":1427850000000,
"to_as_string":"2015-04-01T01:00:00.000Z",
"doc_count":1717
},
{
"1":{
"value":328
},
"key":"2015-04-01T01:00:00.000Z-2015-04-01T02:00:00.000Z",
"from":1427850000000,
"from_as_string":"2015-04-01T01:00:00.000Z",
"to":1427853600000,
"to_as_string":"2015-04-01T02:00:00.000Z",
"doc_count":776
}
]
}
}
}
Please someone could tell me why ? Thank you

When using the date_histogram aggregation you need to take into account the timezone, which date_range doesn't as it's always using the GMT timezone.
If you look at the long millisecond values in your results, you'll see the following:
For your date histogram, from: 1427839200000 is actually equal to 2015-03-31T22:00:00.000Z which differs from the key_as_string value (i.e. 2015-04-01T00:00:00.000Z) that is formatted according to the GMT timezone.
In your first aggregation, try explicitly specifying the time_zone parameter to be your current timezone (apparently GMT+2) and you should get the same results:
"date_histogram": {
"field": "created_at",
"interval": "1M",
"min_doc_count": 1,
"time_zone": -2
},

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Groovy to collect and remove duplicates from a complex json structure - recursion

Related

jq array filter for nested array elements

jq: filter nested array objects

How can I duplicate multiple times an existing object within a JSON array using jq?

Why is google analytics return different number of results with same parameters?

Different results between date histogram and date range on Elastic Search

Categories

Resources