Is there any function to decrease the date by a year when the key has a null value? - Mule 4

Following is the input that I have:
[{
"date": " "
},
{
"date": "2022-01-21"
},{
"date": " "
},{
"date": " "
},{
"date": " "
}]
And the required output is as follows:
[
{
"Date": "2022-01-21"
},
{
"Date": "2021-01-21"
},
{
"Date": "2020-01-21"
},
{
"Date": "2019-01-21"
},
{
"Date": "2018-01-21"
}
]
Thanks in advance!!

There is no built-in function in DataWeave to achieve this result, but you can create a custom function for it. For example, using a recursive function we can get the expected output from your input:
%dw 2.0
output application/json
// Emit one object per input item, decrementing the date by one year on each recursive step
fun decDates(a, nextDate) =
    [{date: nextDate}]
    ++
    (if (sizeOf(a) > 1) decDates(dw::core::Arrays::drop(a, 1), nextDate - |P1Y|) else [])
---
decDates(payload, |2022-03-21|)
Output:
[
{
"date": "2022-03-21"
},
{
"date": "2021-03-21"
},
{
"date": "2020-03-21"
},
{
"date": "2019-03-21"
},
{
"date": "2018-03-21"
}
]
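If you prefer to avoid recursion, the same series can be built with map and the item index. A minimal sketch, assuming the seed date |2022-03-21| is known up front (the string-to-Period coercion is standard DataWeave):
%dw 2.0
output application/json
---
// index 0 subtracts P0Y, index 1 subtracts P1Y, and so on
payload map ((item, index) -> {date: |2022-03-21| - ("P$(index)Y" as Period)})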

Related

jq array filter for nested array elements

I am trying to add a new user to the JSON below for the policy item whose group matches NP01-RW. I am able to do it without NP01-RW, but I am not able to select the users under NP01-RW and return the updated JSON.
{
"id": 181,
"guid": "c9b7dbde-63de-42cc-9840-1b4a06e13364",
"isEnabled": true,
"version": 17,
"service": "Np-Hue",
"name": "DATASCIENCE-CUROPT-RO",
"policyType": 0,
"policyPriority": 0,
"isAuditEnabled": true,
"resources": {
"database": {
"values": [
"hive_cur_acct_1dev",
"hive_cur_acct_1eng",
"hive_cur_acct_1rwy",
"hive_cur_acct_1stg",
"hive_opt_acct_1dev",
"hive_opt_acct_1eng",
"hive_opt_acct_1stg",
"hive_opt_acct_1rwy"
],
"isExcludes": false,
"isRecursive": false
},
"column": {
"values": [
"*"
],
"isExcludes": false,
"isRecursive": false
},
"table": {
"values": [
"*"
],
"isExcludes": false,
"isRecursive": false
}
},
"policyItems": [
{
"accesses": [
{
"type": "select",
"isAllowed": true
},
{
"type": "update",
"isAllowed": true
},
{
"type": "create",
"isAllowed": true
},
{
"type": "drop",
"isAllowed": true
},
{
"type": "alter",
"isAllowed": true
},
{
"type": "index",
"isAllowed": true
},
{
"type": "lock",
"isAllowed": true
},
{
"type": "all",
"isAllowed": true
},
{
"type": "read",
"isAllowed": true
},
{
"type": "write",
"isAllowed": true
}
],
"users": [
"user1",
"user2",
"user3"
],
"groups": [
"NP01-RW"
],
"conditions": [],
"delegateAdmin": false
},
{
"accesses": [
{
"type": "select",
"isAllowed": true
}
],
"users": [
"user1"
],
"groups": [
"NP01-RO"
],
"conditions": [],
"delegateAdmin": false
}
],
"denyPolicyItems": [],
"allowExceptions": [],
"denyExceptions": [],
"dataMaskPolicyItems": [],
"rowFilterPolicyItems": [],
"options": {},
"validitySchedules": [],
"policyLabels": [
"DATASCIENCE-CurOpt-RO_NP01"
]
}
Below is what I have tried, but it returns the part of the JSON matching the group and not the full JSON:
jq --arg username "$sync_userName" '.policyItems[] | select(.groups[] | IN("NP01-RO")).users += [$username]' > ${sync_policyName}.json
Operator precedence in jq is not always intuitive. Your program is parsed as:
.policyItems[] | (select(.groups[] | IN("NP01-RO")).users += [$username])
This first streams all the policy items and only then updates them, leaving you with only the policy items in the output.
You need to make sure that the stream selects the correct values, which you can then assign:
(.policyItems[] | select(.groups[] | IN("NP01-RO")).users) += [$username]
This will do the assignment, but still return the full input (.).
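Putting it together, the full command would look like this sketch (the input file name policy.json is an assumption; substitute your own):
jq --arg username "$sync_userName" '(.policyItems[] | select(.groups[] | IN("NP01-RO")).users) += [$username]' policy.json > "${sync_policyName}.json"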

jq: filter nested array objects

Here are my documents:
[
{
"id": "3e67b455-8cdb-4bc0-a5e1-f90253870fc9",
"identifier": [
{
"system": {
"value": "urn:oid:2.16.724.4.9.20.91-INVENTAT"
},
"value": {
"value": "04374"
}
},
{
"system": {
"value": "urn:oid:2.16.724.4.9.20.2-INVENTAT"
},
"value": {
"value": "INFP3"
}
},
{
"system": {
"value": "urn:oid:INVENTAT"
},
"value": {
"value": "CBOU035"
}
}
]
},
{
"id": "0f22e5ff-70bc-457f-bdaf-7afe86d478de",
"identifier": [
{
"system": {
"value": "urn:oid:2.16.724.4.9.20.91-INVENTAT"
},
"value": {
"value": "04376"
}
},
{
"system": {
"value": "urn:oid:2.16.724.4.9.20.2-INVENTAT"
},
"value": {
"value": "INF07"
}
},
{
"system": {
"value": "urn:oid:INVENTAT"
},
"value": {
"value": "S527918"
}
}
]
},
{
"id": "a1ea574c-438b-443c-ad87-d31d09d581f0",
"identifier": [
{
"system": {
"value": "urn:oid:2.16.724.4.9.20.91-INVENTAT"
},
"value": {
"value": "08096"
}
},
{
"system": {
"value": "urn:oid:2.16.724.4.9.20.2-INVENTAT"
},
"value": {
"value": "INF04"
}
},
{
"system": {
"value": "urn:oid:INVENTAT"
},
"value": {
"value": "5635132"
}
}
]
}
]
I need to filter .identifier where system.value="urn:oid:2.16.724.4.9.20.91-INVENTAT" or system.value="urn:oid:2.16.724.4.9.20.2-INVENTAT" and pick .value.value.
Desired output:
[
{
"id": "3e67b455-8cdb-4bc0-a5e1-f90253870fc9",
"oid1": "04374",
"oid2": "INFP3"
},
{
"id": "0f22e5ff-70bc-457f-bdaf-7afe86d478de",
"oid1": "04376",
"oid2": "INF07"
},
{
"id": "a1ea574c-438b-443c-ad87-d31d09d581f0",
"oid1": "08096",
"oid2": "INF04"
}
]
I've tried:
map(
{
id,
oid1: select(.identifier?[]?.system.value == "urn:oid:2.16.724.4.9.20.91-INVENTAT") | .identifier[].value.value,
oid2: select(.identifier?[]?.system.value == "urn:oid:2.16.724.4.9.20.2-INVENTAT") | .identifier[].value.value
}
)
But the output is not what I need: you can find it on this jqplay.
Any ideas?
This uses IN to check for your query strings, and with_entries on an array to generate the indices for the oid keys.
jq '
map({id} + (.identifier | map(select(IN(.system.value;
"urn:oid:2.16.724.4.9.20.91-INVENTAT",
"urn:oid:2.16.724.4.9.20.2-INVENTAT"
)).value.value) | with_entries(.key |= "oid\(. + 1)")))
'
[
{
"id": "3e67b455-8cdb-4bc0-a5e1-f90253870fc9",
"oid1": "04374",
"oid2": "INFP3"
},
{
"id": "0f22e5ff-70bc-457f-bdaf-7afe86d478de",
"oid1": "04376",
"oid2": "INF07"
},
{
"id": "a1ea574c-438b-443c-ad87-d31d09d581f0",
"oid1": "08096",
"oid2": "INF04"
}
]
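As a side note, with_entries also works on arrays (to_entries turns array indices into numeric keys), which is what generates oid1 and oid2 above. A minimal standalone demonstration:
jq -n '["04374", "INFP3"] | with_entries(.key |= "oid\(. + 1)")'
produces {"oid1": "04374", "oid2": "INFP3"}.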
Here is a Ruby script to do that:
ruby -r json -e '
def walk(x, filt)
  rtr = []
  x.each { |e|
    # keep only the identifiers whose system.value is in the filter list,
    # then build oid1, oid2, ... keys from their 1-based position
    rd = {"id" => e["id"]}.merge(
      e["identifier"].
      filter { |ea| filt.include?(ea["system"]["value"]) }.
      map.with_index(1) { |di, i| ["oid#{i}", di["value"]["value"]] }.to_h)
    rtr << rd
  }
  rtr
end
data = JSON.parse($<.read)
puts walk(data, ["urn:oid:2.16.724.4.9.20.91-INVENTAT", "urn:oid:2.16.724.4.9.20.2-INVENTAT"]).to_json
' file
Prints:
[{"id":"3e67b455-8cdb-4bc0-a5e1-f90253870fc9","ub":"04374","uab":"INFP3"},{"id":"0f22e5ff-70bc-457f-bdaf-7afe86d478de","ub":"04376","uab":"INF07"},{"id":"a1ea574c-438b-443c-ad87-d31d09d581f0","ub":"08096","uab":"INF04"}]

How to convert JSON data to tidy format in R

I have never worked with JSON data in R and, unfortunately, I was sent a sample of data as:
{
"task_id": "104",
"status": "succeeded",
"metrics": {
"requests_made": 2,
"network_errors": 0,
"unique_locations_visited": 0,
"requests_queued": 0,
"queue_items_completed": 2,
"queue_items_waiting": 0,
"issue_events": 9,
"caption": "",
"progress": 100
},
"message": "",
"issue_events": [
{
"id": "1234",
"type": "issue_found",
"issue": {
"name": "policy not enforced",
"type_index": 123456789,
"serial_number": "123456789183923712",
"origin": "https://test.com",
"path": "/robots.txt",
"severity": "low",
"confidence": "certain",
"caption": "/robots.txt",
"evidence": [
{
"type": "FirstOrderEvidence",
"detail": {
"band_flags": [
"in_band"
]
},
"request_response": {
"url": "https://test.com/robots.txt",
"request": [
{
"type": "DataSegment",
"data": "jaghsdjgasdgaskjdgasdgashdgsahdgasjkdgh==",
"length": 313
}
],
"response": [
{
"type": "DataSegment",
"data": "asudasjdgasaaasgdasgaksjdhgasjdgkjghKGKGgKJgKJgKJGKgh==",
"length": 303
}
],
"was_redirect_followed": false,
"request_time": "1234567890"
}
}
],
"internal_data": "jdfhgjhJHkjhdskfhkjhjs0sajkdfhKHKhkj=="
}
},
{
"id": "1235",
"type": "issue_found",
"issue": {
"name": "certificate",
"type_index": 12345845684,
"serial_number": "123456789165637150",
"origin": "https://test.com",
"path": "/",
"severity": "info",
"confidence": "certain",
"description": "The server description a valid, trusted certificate. This issue is purely informational.<br><br>The server presented the following certificates:<br><br><h4>Server certificate</h4><table><tr><td><b>Issued to:</b> </td><td>test.ie, test.com, www.test.com, www.test.ie</td></tr><tr><td><b>Issued by:</b> </td><td>GeoTrust EV RSA CA 2018</td></tr><tr><td><b>Valid from:</b> </td><td>Tue May 12 00:00:00 UTC 2020</td></tr><tr><td><b>Valid to:</b> </td><td>Tue May 17 12:00:00 UTC 2022</td></tr></table><h4>Certificate chain #1</h4><table><tr><td><b>Issued to:</b> </td><td>GeoTrust EV RSA CA 2018</td></tr><tr><td><b>Issued by:</b> </td><td> High Assurance EV Root CA</td></tr><tr><td><b>Valid from:</b> </td><td>Mon Nov 06 12:22:46 UTC 2017</td></tr><tr><td><b>Valid to:</b> </td><td>Sat Nov 06 12:22:46 UTC 2027</td></tr></table><h4>Certificate chain #2</h4><table><tr><td><b>Issued to:</b> </td><td> High Assurance EV Root CA</td></tr><tr><td><b>Issued by:</b> </td><td> High Assurance EV Root CA</td></tr><tr><td><b>Valid from:</b> </td><td>Fri Nov 10 00:00:00 UTC 2006</td></tr><tr><td><b>Valid to:</b> </td><td>Mon Nov 10 00:00:00 UTC 2031</td></tr></table>",
"caption": "/",
"evidence": [],
"internal_data": "sjhdgsajdggJGJHgjfgjhGJHgjhsdgfgjhGJHGjhsdgfjhsgfdsjfg098867hjhgJHGJHG=="
}
},
{
"id": "1236",
"type": "issue_found",
"issue": {
"name": "without flag set",
"type_index": 1254392,
"serial_number": "12345678965616",
"origin": "https://test.com",
"path": "/robots.txt",
"severity": "info",
"confidence": "certain",
"description": "my description text here....",
"caption": "/robots.txt",
"evidence": [
{
"type": "InformationListEvidence",
"request_response": {
"url": "https://test.com/robots.txt",
"request": [
{
"type": "DataSegment",
"data": "adjkhajksdhaskjdhkjHKJHjkhaskjdhkjasdhKHKJHkjsdhfkjsdhfkjsdhKHJKHjksdfhsdjkfhksdjhKHKJHJKhsdkfjhsdkfjhKHJKHjksdkfjhsdkjfhKHKJHjkhsdkfjhsdkjfhsdjkfhksdjfhKJHKjksdhfsdjkfhksdjfhsdkjhKHJKhsdkfhsdkjfhsdkfhdskjhKHKjhsdfkjhsdjkfh==",
"length": 313
}
],
"response": [
{
"type": "DataSegment",
"data": "adjkhajksdhaskjdhkjHKJHjkhaskjdhkjasdhKHKJHkjsdhfkjsdhfkjsdhKHJKHjksdfhsdjkfhksdjhKHKJHJKhsdkfjhsdkfjhKHJKHjksdkfjhsdkjfhKHKJHjkhsdkfjhsdkjfhsdjkfhksdjfhKJHKjksdhfsdjkfhksdjfhsdkjhKHJKhsdkfhsdkjfhsdkfhdskjhKHKjhsdfkjhsdjkfh=",
"length": 161
},
{
"type": "HighlightSegment",
"data": "adjkhajksdhaskjdhkjHKJHjkhaskjdhkjasdhKHKJHkjsdhfkjsdhfkjsdhKHJKHjksdfhsdjkfhksdjhKHKJHJKhsdkfjhsdkfjhKHJKHjksdkfjhsdkjfhKHKJHjkhsdkfjhsdkjfhsdjkfhksdjfhKJHKjksdhfsdjkfhksdjfhsdkjhKHJKhsdkfhsdkjfhsdkfhdskjhKHKjhsdf=",
"length": 119
},
{
"type": "DataSegment",
"data": "AasjkdhasjkhkjHKJSDHFJKSDFHKhjkHSKADJFHKhjkhjkh=",
"length": 23
}
],
"was_redirect_followed": false,
"request_time": "178454751191465"
},
"information_items": [
"Other: user_id"
]
}
],
"internal_data": "adjkhajksdhaskjdhkjHKJHjkhaskjdhkjasdhKHKJHkjsdhfkjsdhfkjsdhKHJKHjksdfhsdjkfhksdjhKHKJHJKhsdkfjhsdkfjhKHJKHjksdkfjhsdkjfhKHKJHjkhsdkfjhsdkjfhsdjkfhksdjfhKJHKjksdhfsdjkfhksdjfhsdkjhKHJKhsdkfhsdkjfhsdkfhdskjhKH=="
}
},
{
"id": "1237",
"type": "issue_found",
"issue": {
"name": "without flag set",
"type_index": 1234567,
"serial_number": "123456789056704",
"origin": "https://test.com",
"path": "/",
"severity": "info",
"confidence": "certain",
"description": "long description here zjkhasdjkh hsajkdhsajkd hasjkdhbsjkdash d",
"caption": "/",
"evidence": [
{
"type": "InformationListEvidence",
"request_response": {
"url": "https://test.com/",
"request": [
{
"type": "DataSegment",
"data": "adjkhajksdhaskjdhkjHKJHjkhaskjdhkjasdhKHKJHkjsdhfkjsdhfkjsdhKHJKHjksdfhsdjkfhksdjhKHKJHJKhsdkfjhsdkfjhKHJKHjksdkfjhsdkjfhKHKJHjkhsdkfjhsdkjfhsdjkfhksdjfhKJHKjksdhfsdjkfhksdjfhsdkjhKHJKhsdkfhsdkjfhsdkfhdskjhKHKjhsdfkjhsdjkfhsfdsfdsfdsfdsfdsfsdfdsf",
"length": 303
}
],
"response": [
{
"type": "DataSegment",
"data": "adjkhajksdhaskjdhkjHKJHjkhaskjdhkjasdhKHKJHkjsdhfkjsdhfkjsdhKHJKHjksdfhsdjkfhksdjhKHKJHJKhsdkfjhsdkfjhKHJKHjksdkfjhsdkjfhKHKJHjkhsdkfjhsdkjfhsdjkfhksdjfhKJHKjksdhfsdjkfhksdjfhsdkjhKHJKhsdkfhsdkjfhsdkfhdskjhKHKjhsdfkjhsdjkfh==",
"length": 151
},
{
"type": "HighlightSegment",
"data": "adjkhajksdhaskjdhkjHKJHjkhaskjdhkjasdhKHKJHkjsdhfkjsdhfkjsdhKHJKHjksdfhsdjkfhksdjhKHKJHJKhsdkfjhsdkfjhKHJKHjksdkfjhsdkjfhKHKJHjkhsdkfjhsdkjfhsdjkfhksdjfhKJHKjksdhfsdjkfhksdjfhsdkjhKHJKhsdkfhsdkjfhsdkfhdskjhKHKjhsdfkjhsdjkfh=",
"length": 119
},
{
"type": "DataSegment",
"data": "sdfdsfsdfSDFSDFdSFDS546SDFSDFDSFG657=",
"length": 23
}
],
"was_redirect_followed": false,
"request_time": "123541191466"
},
"information_items": [
"Other: user_id"
]
}
],
"internal_data": "adjkhajksdhaskjdhkjHKJHjkhaskjdhkjasdhKHKJHkjsdhfkjsdhfkjsdhKHJKHjksdfhsdjkfhksdjhKHKJHJKhsdkfjhsdkfjhKHJKHjksdkfjhsdkjfhKHKJHjkhsdkfjhsdkjfhsdjkfhksdjfhKJHKjksdhfsdjkfhksdjfhsd=="
}
},
{
"id": "1238",
"type": "issue_found",
"issue": {
"name": "parameter pollution",
"type_index": 4137000,
"serial_number": "123456789810290176",
"origin": "https://test.com",
"path": "/robots.txt",
"severity": "low",
"confidence": "firm",
"description": "very long description text here...",
"caption": "/robots.txt [URL path filename]",
"evidence": [
{
"type": "FirstOrderEvidence",
"detail": {
"payload": {
"bytes": "Q3jkeiZkcmg8MQ==",
"flags": 0
},
"band_flags": [
"in_band"
]
},
"request_response": {
"url": "https://test.com/%3fhdz%26drh%3d1",
"request": [
{
"type": "DataSegment",
"data": "W1QOIC8=",
"length": 5
},
{
"type": "HighlightSegment",
"data": "WRMnBGR6JTI2ZHJoJTNkMQ==",
"length": 16
},
{
"type": "DataSegment",
"data": "adjkhajksdhaskjdhkjHKJHjkhaskjdhkjasdhKHKJHkjsdhfkjsdhfkjsdhKHJKHjksdfhsdjkfhksdjhKHKJHJKhsdkfjhsdkfjhKHJKHjksdkfjhsdkjfhKHKJHjkhsdkfjhsdkjfhsdjkfhksdjfhKJHKjksdhfsdjkfhksdjfhsdkjhKHJKhsdkfhsdkjfhsdkfhdskjhKHKjhsdfkjhsdjkfhcvxxcvklxcvjkxclvjxclkvjxcklvjlxckjvlxckjvklxcjvxcklvjxcklvjxckljvlxckjvxcklvjxckljvxcklvjcklxjvcxkl==",
"length": 298
}
],
"response": [
{
"type": "DataSegment",
"data": "adjkhajksdhaskjdhkjHKJHjkhaskjdhkjasdhKHKJHkjsdhfkjsdhfkjsdhKHJKHjksdfhsdjkfhksdjhKHKJHJKhsdkfjhsdkfjhKHJKHjksdkfjhsdkjfhKHKJHjkhsdkfjhsdkjfhsdjkfhksdjfhKJHKjksdhfsdjkfhksdjfhsdkjhKHJKhsdkfhsdkjfhsdkfhdskjhKHKjhsdfkjhsdjkfh==",
"length": 130
},
{
"type": "HighlightSegment",
"data": "Q4jleiZkcmg9MQ==",
"length": 10
},
{
"type": "DataSegment",
"data": "adjkhajksdhaskjdhkjHKJHjkhaskjdhkjasdhKHKJHkjsdhfkjsdhfkjsdhKHJKHjksdfhsdjkfhksdjhKHKJHJKhsdkfjhsdkfjhKHJKHjksdkfjhsdkjfhKHKJHjkhsdkfjhsdkjfhsdjkfhksdjfhKJHKjksdhfsdjkfhksdjfhsdkjhKHJKhsdkfhsdkjfhsdkfhdskjhKHKjhsdfkjhsdjkfh==",
"length": 163
}
],
"was_redirect_followed": false,
"request_time": "51"
}
}
],
"internal_data": "adjkhajksdhaskjdhkjHKJHjkhaskjdhkjasdhKHKJHkjsdhfkjsdhfkjsdhKHJKHjksdfhsdjkfhksdjhKHKJHJKhsdkfjhsdkfjhKHJKHjksdkfjhsdkjfhKHKJHjkhsdkfjhsdkjfhsdjkfhksdjfhKJHKjksdhfsdjkfhksdjfhsdkjhKHJKhsdkfhsdkjfhsdkfhdskjhKHKjhsdfkjhsdjkfh="
}
}
],
"event_logs": [],
"audit_items": []
}
I read it in R using jsonlite:
df_orig <- fromJSON('dast_sample_output.json', flatten= T)
This gives a nested list-type R object. I wish to convert this list to a data frame in a tidy format, with all the arrays and sub-arrays unnested.
If you run str(df_orig), you can see the nested data frames in there.
How do I convert it to tidy format?
I tried unnest() and purrr, but I am struggling to get it into a tidy format for analysis. Any pointers would be highly appreciated.
Cheers,
Use the jsonlite package function fromJSON().
Edit: set the option flatten = TRUE.
Edit 2: use content(x, 'text') before flattening.
Here is a full example converting to data.table:
library(httr)
library(jsonlite)
library(data.table)

get.json <- GET(apicall.text)  # apicall.text holds your API URL
get.json.text <- content(get.json, 'text')
get.json.flat <- fromJSON(get.json.text, flatten = TRUE)
dt <- as.data.table(get.json.flat)
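For the sample you posted (a local file rather than an API call), a minimal tidyr sketch for unnesting the issues is shown below; the column names such as issue.evidence are assumptions based on what flatten = TRUE produces from your sample:
library(jsonlite)
library(tidyr)
library(tibble)

df_orig <- fromJSON('dast_sample_output.json', flatten = TRUE)
# issue_events arrives as a data frame with list-columns after flattening
issues <- as_tibble(df_orig$issue_events)
# unnest the evidence list-column; keep_empty = TRUE preserves issues with no evidence
issues_tidy <- unnest(issues, cols = `issue.evidence`, names_sep = ".", keep_empty = TRUE)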

MongoDB - Document Structure to create matrix from multiple value pairs

I am new to NoSQL and MongoDB, so please don't bash. I have used SQL databases in the past, but am now looking to leverage the scalability of NoSQL. One application that comes to mind is the collection of experimental results, where they are serialized in some manner with a start date, end date, part number, serial number, etc. Along with each experiment, there are many "measurements" collected, but the list of measurements may be unique in each experiment.
I am looking for ideas on how to structure the document to achieve the following tasks:
1) Query based on date ranges, part numbers, serial numbers
2) See the resulting table in a "spreadsheet" view
3) Perform statistical calculations, perhaps with R, on the different "measurements"
An example might look like:
[
{
"_id": {
"$oid": "5e680d6063cb144f9d1be261"
},
"StartDate": {
"$date": {
"$numberLong": "1583841600000"
}
},
"EndDate": {
"$date": {
"$numberLong": "1583842007000"
}
},
"PartNumber": "1Z45NP7X",
"SerialNumber": "U84A3102",
"Status": "Acceptable",
"Results": [
{
"Sensor": "Pressure",
"Value": "14.68453",
"Units": "PSIA",
"Flag": "1"
},
{
"Sensor": "Temperature",
"Value": {
"$numberDouble": "68.43"
},
"Units": "DegF",
"Flag": {
"$numberInt": "1"
}
},
{
"Sensor": "Velocity",
"Value": {
"$numberDouble": "12.4"
},
"Units": "ft/s",
"Flag": {
"$numberInt": "1"
}
}
]
},
{
"_id": {
"$oid": "5e68114763cb144f9d1be263"
},
"StartDate": {
"$date": {
"$numberLong": "1583842033000"
}
},
"EndDate": {
"$date": {
"$numberLong": "1583842434000"
}
},
"PartNumber": "1Z45NP7X",
"SerialNumber": "U84A3103",
"Status": "Acceptable",
"Results": [
{
"Sensor": "Pressure",
"Value": "14.70153",
"Units": "PSIA",
"Flag": "1"
},
{
"Sensor": "Temperature",
"Value": {
"$numberDouble": "68.55"
},
"Units": "DegF",
"Flag": {
"$numberInt": "1"
}
},
{
"Sensor": "Velocity",
"Value": {
"$numberDouble": "12.7"
},
"Units": "ft/s",
"Flag": {
"$numberInt": "1"
}
}
]
},
{
"_id": {
"$oid": "5e68115f63cb144f9d1be264"
},
"StartDate": {
"$date": {
"$numberLong": "1583842464000"
}
},
"EndDate": {
"$date": {
"$numberLong": "1583842434000"
}
},
"PartNumber": "1Z45NP7X",
"SerialNumber": "U84A3104",
"Status": "Acceptable",
"Results": [
{
"Sensor": "Pressure",
"Value": "14.59243",
"Units": "PSIA",
"Flag": "1"
},
{
"Sensor": "Weight",
"Value": {
"$numberDouble": "67.93"
},
"Units": "lbf",
"Flag": {
"$numberInt": "1"
}
},
{
"Sensor": "Torque",
"Value": {
"$numberDouble": "122.33"
},
"Units": "ft-lbf",
"Flag": {
"$numberInt": "1"
}
}
]
}
]
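As an aside, task 1 is straightforward against this structure; a minimal sketch in the mongo shell (the collection name experiments is an assumption):
db.experiments.find({
  PartNumber: "1Z45NP7X",
  StartDate: { $gte: ISODate("2020-03-10T00:00:00Z") },
  EndDate: { $lte: ISODate("2020-03-11T00:00:00Z") }
})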
Another approach might be:
[
{
"_id": {
"$oid": "5e680d6063cb144f9d1be261"
},
"StartDate": {
"$date": {
"$numberLong": "1583841600000"
}
},
"EndDate": {
"$date": {
"$numberLong": "1583842007000"
}
},
"PartNumber": "1Z45NP7X",
"SerialNumber": "U84A3102",
"Status": "Acceptable",
"Pressure (PSIA)" : "14.68453",
"Pressure - Flag": "1",
"Temperature (degF)": "68.43",
"Temperature - Flag": "1",
"Velocity (ft/s)": "12.4",
"Velocity Flag": "1"
},
{
"_id": {
"$oid": "5e68114763cb144f9d1be263"
},
"StartDate": {
"$date": {
"$numberLong": "1583842033000"
}
},
"EndDate": {
"$date": {
"$numberLong": "1583842434000"
}
},
"PartNumber": "1Z45NP7X",
"SerialNumber": "U84A3103",
"Status": "Acceptable",
"Pressure (PSIA)" : "14.70153",
"Pressure - Flag": "1",
"Temperature (degF)": "68.55",
"Temperature - Flag": "1",
"Velocity (ft/s)": "12.7",
"Velocity Flag": "1"
},
{
"_id": {
"$oid": "5e68115f63cb144f9d1be264"
},
"StartDate": {
"$date": {
"$numberLong": "1583842464000"
}
},
"EndDate": {
"$date": {
"$numberLong": "1583842434000"
}
},
"PartNumber": "1Z45NP7X",
"SerialNumber": "U84A3104",
"Status": "Acceptable",
"Pressure (PSIA)" : "14.59243",
"Pressure - Flag": "1",
"Weight (lbf)": "67.93",
"Weight - Flag": "1",
"Torque (ft-lbf)": "122.33",
"Torque - Flag": : "1"
}
]
An example table might look like (with correct spacing):
StartDate             EndDate               PartNumber  SerialNumber  Pressure  'Pressure - Flag'  Temperature  'Temperature - Flag'  Velocity  'Velocity - Flag'  Torque  'Torque - Flag'  Weight  'Weight - Flag'
2020-03-10T12:00:00Z  2020-03-10T12:06:47Z  1Z45NP7X    U84A3102      14.68453  1                  68.43        1                     12.4      1                  N/A     N/A              N/A     N/A
2020-03-10T12:07:13Z  2020-03-10T12:13:54Z  1Z45NP7X    U84A3103      14.70153  1                  68.55        1                     12.7      1                  N/A     N/A              N/A     N/A
2020-03-10T12:07:13Z  2020-03-10T12:13:54Z  1Z45NP7X    U84A3104      14.59243  1                  N/A          N/A                   N/A       N/A                122.33  1                67.93   1
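A table like this can also be produced server-side from the first (array-based) structure; a hedged sketch using the aggregation pipeline ($arrayToObject requires MongoDB 3.4.4+, $mergeObjects 3.6+; the collection name experiments is an assumption):
db.experiments.aggregate([
  { $unwind: "$Results" },
  { $group: {
      _id: "$_id",
      StartDate: { $first: "$StartDate" },
      EndDate: { $first: "$EndDate" },
      PartNumber: { $first: "$PartNumber" },
      SerialNumber: { $first: "$SerialNumber" },
      readings: { $push: { k: "$Results.Sensor", v: "$Results.Value" } }
  } },
  // fold the sensor name/value pairs back in as top-level fields
  { $replaceRoot: { newRoot: { $mergeObjects: [ "$$ROOT", { $arrayToObject: "$readings" } ] } } },
  { $project: { readings: 0 } }
])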
Any thoughts on the best structure? In reality, there might be 200+ "sensor values".
Thanks,
DG

Different results between date histogram and date range on Elasticsearch

I would like to analyse my log data with Elasticsearch/Kibana and count unique customers by month.
The results are different when I use a date histogram aggregation and a date range aggregation.
Here is the date histogram query:
"query": {
"query_string": {
"query": "_type:logs AND created_at:[2015-04-01 TO now]",
"analyze_wildcard": true
}
},
"size": 0,
"aggs": {
"2": {
"date_histogram": {
"field": "created_at",
"interval": "1M",
"min_doc_count": 1
},
"aggs": {
"1": {
"cardinality": {
"field": "customer.id"
}
}
}
}
}
And the results:
"aggregations": {
"2": {
"buckets": [
{
"1": {
"value": 595805
},
"key_as_string": "2015-04-01T00:00:00.000Z",
"key": 1427839200000,
"doc_count": 6410438
},
{
"1": {
"value": 647788
},
"key_as_string": "2015-05-01T00:00:00.000Z",
"key": 1430431200000,
"doc_count": 6669555
},...
Here is the date range query:
"query": {
"query_string": {
"query": "_type:logs AND created_at:[2015-04-01 TO now]",
"analyze_wildcard": true
}
},
"size": 0,
"aggs": {
"2": {
"date_range": {
"field": "created_at",
"ranges": [
{
"from": "2015-04-01",
"to": "2015-05-01"
},
{
"from": "2015-05-01",
"to": "2015-06-01"
}
]
},
"aggs": {
"1": {
"cardinality": {
"field": "customer.id"
}
}
}
}
}
And the response:
"aggregations": {
"2": {
"buckets": [
{
"1": {
"value": 592179
},
"key": "2015-04-01T00:00:00.000Z-2015-05-01T00:00:00.000Z",
"from": 1427846400000,
"from_as_string": "2015-04-01T00:00:00.000Z",
"to": 1430438400000,
"to_as_string": "2015-05-01T00:00:00.000Z",
"doc_count": 6411884
},
{
"1": {
"value": 616995
},
"key": "2015-05-01T00:00:00.000Z-2015-06-01T00:00:00.000Z",
"from": 1430438400000,
"from_as_string": "2015-05-01T00:00:00.000Z",
"to": 1433116800000,
"to_as_string": "2015-06-01T00:00:00.000Z",
"doc_count": 6668060
}
]
}
}
In the first case, I have 595,805 for April and 647,788 for May.
In the second case, I have 592,179 for April and 616,995 for May.
Could someone explain why I have these differences between these use cases?
Thank you
Update: I added another example with less data (one day) but the same issue. Here is the first request, with the date histogram:
{
"size": 0,
"query": {
"query_string": {
"query": "_type:logs AND logs.created_at:[2015-04-01 TO 2015-04-01]",
"analyze_wildcard": true
}
},
"aggs": {
"2": {
"date_histogram": {
"field": "created_at",
"interval": "1h",
"pre_zone": "00:00",
"pre_zone_adjust_large_interval": true,
"min_doc_count": 1
},
"aggs": {
"1": {
"cardinality": {
"field": "customer.id"
}
}
}
}
}
}
And we can see a unique count of 660 with a doc count of 1717 for the first hour:
{
"hits":{
"total":203961,
"max_score":0,
"hits":[
]
},
"aggregations":{
"2":{
"buckets":[
{
"1":{
"value":660
},
"key_as_string":"2015-04-01T00:00:00.000Z",
"key":1427846400000,
"doc_count":1717
},
{
"1":{
"value":324
},
"key_as_string":"2015-04-01T01:00:00.000Z",
"key":1427850000000,
"doc_count":776
},
{
"1":{
"value":190
},
"key_as_string":"2015-04-01T02:00:00.000Z",
"key":1427853600000,
"doc_count":481
}
]
}
}
}
But on the second request, with the date range:
{
"size": 0,
"query": {
"query_string": {
"query": "_type:logs AND logs.created_at:[2015-04-01 TO 2015-04-01]",
"analyze_wildcard": true
}
},
"aggs": {
"2": {
"date_range": {
"field": "created_at",
"ranges": [
{
"from": "2015-04-01T00:00:00",
"to": "2015-04-01T01:00:00"
},
{
"from": "2015-04-01T01:00:00",
"to": "2015-04-01T02:00:00"
}
]
},
"aggs": {
"1": {
"cardinality": {
"field": "customer.id"
}
}
}
}
}
}
we can see a unique count of only 633 for the same doc count of 1717:
{
"hits":{
"total":203961,
"max_score":0,
"hits":[
]
},
"aggregations":{
"2":{
"buckets":[
{
"1":{
"value":633
},
"key":"2015-04-01T00:00:00.000Z-2015-04-01T01:00:00.000Z",
"from":1427846400000,
"from_as_string":"2015-04-01T00:00:00.000Z",
"to":1427850000000,
"to_as_string":"2015-04-01T01:00:00.000Z",
"doc_count":1717
},
{
"1":{
"value":328
},
"key":"2015-04-01T01:00:00.000Z-2015-04-01T02:00:00.000Z",
"from":1427850000000,
"from_as_string":"2015-04-01T01:00:00.000Z",
"to":1427853600000,
"to_as_string":"2015-04-01T02:00:00.000Z",
"doc_count":776
}
]
}
}
}
Could someone please tell me why? Thank you.
When using the date_histogram aggregation you need to take the timezone into account, while date_range doesn't, as it always uses the GMT timezone.
If you look at the long millisecond values in your results, you'll see the following:
For your date histogram, the key 1427839200000 is actually equal to 2015-03-31T22:00:00.000Z, which differs from the key_as_string value (i.e. 2015-04-01T00:00:00.000Z) that is formatted according to the GMT timezone.
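You can verify the conversion from any shell (GNU date shown; the epoch argument is your key in seconds):
$ date -u -d @1427839200
Tue Mar 31 22:00:00 UTC 2015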
In your first aggregation, try explicitly specifying the time_zone parameter to be your current timezone (apparently GMT+2) and you should get the same results:
"date_histogram": {
"field": "created_at",
"interval": "1M",
"min_doc_count": 1,
"time_zone": -2
},
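Conversely, on Elasticsearch versions where the date_range aggregation accepts a time_zone parameter (newer releases do; check your version), you could instead align the ranges to the same local zone, e.g.:
"date_range": {
"field": "created_at",
"time_zone": "+02:00",
"ranges": [
{ "from": "2015-04-01", "to": "2015-05-01" },
{ "from": "2015-05-01", "to": "2015-06-01" }
]
},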
