How to merge dynamic number data with condition? - azure-data-explorer

Let's take the following dataset:
| timestamp_bin | deviceId | flow | level | pressure |
|---|---|---|---|---|
| 2020-05-15T00:00:00Z | fddf1cec-16db-4461-9057-3d08e46b6bcf | NaN | 55 | NaN |
| 2020-05-15T00:00:00Z | aaaaaaaa-fed4-c23b-422b-e85e0877c092 | 365 | 85 | NaN |
| 2020-05-15T00:00:00Z | cb04ccff-48bc-4108-9d16-7d7db9152895 | NaN | NaN | 130 |
I would like to merge the flow, level, and pressure columns into a single dynamic value, dropping the NaN entries, and without having to name the flow, level, and pressure columns explicitly:
| deviceId | timestamp | value |
|---|---|---|
| fddf1cec-16db-4461-9057-3d08e46b6bcf | 2020-05-15 17:01:35.7750000 | {"level": 55.0} |
| aaaaaaaa-fed4-c23b-422b-e85e0877c092 | 2020-05-15 17:01:35.7750000 | {"flow": 365.0, "level": 85.0} |
| cb04ccff-48bc-4108-9d16-7d7db9152895 | 2020-05-15 17:01:35.7750000 | {"pressure": 130.0} |
The following query achieves this result:
let deviceTelemetry = datatable (deviceId: guid, timestamp: datetime, flow: real, level: real, pressure: real)
[
'fddf1cec-16db-4461-9057-3d08e46b6bcf', '2020-05-15 17:01:35.7750000', real(NaN), 55, real(NaN),
'aaaaaaaa-fed4-c23b-422b-e85e0877c092', '2020-05-15 17:01:35.7750000', 365, 85, real(NaN),
'cb04ccff-48bc-4108-9d16-7d7db9152895', '2020-05-15 17:01:35.7750000', real(NaN), real(NaN), 130,
];
deviceTelemetry
| extend packAll=pack_all()  // pack the whole row into a property bag
| extend packWithNan=bag_remove_keys(packAll, dynamic(['deviceId', 'timestamp']))  // keep only the measurement columns
| project-away packAll
| mv-expand kind=array packWithNan  // one row per [key, value] pair
| where packWithNan[1] != 'NaN'  // drop the NaN measurements
| extend packWithoutNan=pack(tostring(packWithNan[0]), packWithNan[1])  // re-pack each surviving pair
| summarize value=make_bag(packWithoutNan) by deviceId, timestamp  // merge the pairs back into one bag per device
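For comparison, the NaN filter can also be written numerically instead of matching the string 'NaN'. A sketch, assuming every measurement value converts cleanly with toreal():
deviceTelemetry
| extend packWithNan = bag_remove_keys(pack_all(), dynamic(['deviceId', 'timestamp']))
| mv-expand kind=array packWithNan
| where isnotnull(packWithNan[1]) and not(isnan(toreal(packWithNan[1])))  // numeric NaN test
| summarize value = make_bag(bag_pack(tostring(packWithNan[0]), packWithNan[1])) by deviceId, timestamp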
Is there a better way to write this query?
Edit:
This discussion is a continuation of this previous thread

... and just for the fun of it, here is an extension of the previous solution:
let deviceTelemetry = datatable (deviceId:guid, timestamp:datetime, value:dynamic)[
'fddf1cec-16db-4461-9057-3d08e46b6bcf','2020-05-15 17:01:35.7750000', dynamic({ "level": 60}),
'fddf1cec-16db-4461-9057-3d08e46b6bcf','2020-05-15 18:01:35.7750000', dynamic({ "level": 50}),
'aaaaaaaa-fed4-c23b-422b-e85e0877c092','2020-05-15 17:01:35.7750000', dynamic({ "level": 100, "flow": 350}),
'aaaaaaaa-fed4-c23b-422b-e85e0877c092','2020-05-15 18:01:35.7750000', dynamic({ "level": 90, "flow": 360}),
'aaaaaaaa-fed4-c23b-422b-e85e0877c092','2020-05-15 19:01:35.7750000', dynamic({ "level": 80, "flow": 370}),
'aaaaaaaa-fed4-c23b-422b-e85e0877c092','2020-05-15 20:01:35.7750000', dynamic({ "level": 70, "flow": 380}),
'cb04ccff-48bc-4108-9d16-7d7db9152895','2020-05-15 21:01:35.7750000', dynamic({ "pressure": 120}),
'cb04ccff-48bc-4108-9d16-7d7db9152895','2020-05-15 20:01:35.7750000', dynamic({ "pressure": 130}),
'cb04ccff-48bc-4108-9d16-7d7db9152895','2020-05-15 21:01:35.7750000', dynamic({ "pressure": 140}),
];
deviceTelemetry
| mv-expand kind=array value  // one row per [key, value] pair
| extend k = tostring(value[0]), v = toreal(value[1])
| extend timestamp_bin = bin(timestamp , 1d)
| evaluate pivot(k, avg(v), timestamp_bin, deviceId)  // one column per distinct key
// add the following 2 lines of code:
// pack each pivoted row, then rebuild the bag from the properties after the first two
// (timestamp_bin and deviceId), skipping any whose value serializes as NaN
| mv-apply with_itemindex=i pa = pack_all() on (summarize make_bag_if(pa, i >= 2 and tostring(pa) !has_cs "NaN"))
| project timestamp_bin, deviceId, bag_pa
| timestamp_bin | deviceId | bag_pa |
|---|---|---|
| 2020-05-15T00:00:00Z | fddf1cec-16db-4461-9057-3d08e46b6bcf | {"level":55} |
| 2020-05-15T00:00:00Z | aaaaaaaa-fed4-c23b-422b-e85e0877c092 | {"flow":365,"level":85} |
| 2020-05-15T00:00:00Z | cb04ccff-48bc-4108-9d16-7d7db9152895 | {"pressure":130} |
Fiddle

This is why it's important to share the whole scenario and not just a fraction of it :-)
let deviceTelemetry = datatable (deviceId:guid, timestamp:datetime, value:dynamic)[
'fddf1cec-16db-4461-9057-3d08e46b6bcf','2020-05-15 17:01:35.7750000', dynamic({ "level": 60}),
'fddf1cec-16db-4461-9057-3d08e46b6bcf','2020-05-15 18:01:35.7750000', dynamic({ "level": 50}),
'aaaaaaaa-fed4-c23b-422b-e85e0877c092','2020-05-15 17:01:35.7750000', dynamic({ "level": 100, "flow": 350}),
'aaaaaaaa-fed4-c23b-422b-e85e0877c092','2020-05-15 18:01:35.7750000', dynamic({ "level": 90, "flow": 360}),
'aaaaaaaa-fed4-c23b-422b-e85e0877c092','2020-05-15 19:01:35.7750000', dynamic({ "level": 80, "flow": 370}),
'aaaaaaaa-fed4-c23b-422b-e85e0877c092','2020-05-15 20:01:35.7750000', dynamic({ "level": 70, "flow": 380}),
'cb04ccff-48bc-4108-9d16-7d7db9152895','2020-05-15 21:01:35.7750000', dynamic({ "pressure": 120}),
'cb04ccff-48bc-4108-9d16-7d7db9152895','2020-05-15 20:01:35.7750000', dynamic({ "pressure": 130}),
'cb04ccff-48bc-4108-9d16-7d7db9152895','2020-05-15 21:01:35.7750000', dynamic({ "pressure": 140}),
];
deviceTelemetry
| mv-expand kind=array value  // one row per [key, value] pair
| extend k = tostring(value[0]), v = toreal(value[1])
| summarize avg(v) by k, timestamp_bin = bin(timestamp , 1d), deviceId  // daily average per key
| summarize make_bag(pack_dictionary(k, avg_v)) by timestamp_bin, deviceId  // pack_dictionary() is an older alias of bag_pack()
| timestamp_bin | deviceId | bag_ |
|---|---|---|
| 2020-05-15T00:00:00Z | fddf1cec-16db-4461-9057-3d08e46b6bcf | {"level":55} |
| 2020-05-15T00:00:00Z | aaaaaaaa-fed4-c23b-422b-e85e0877c092 | {"level":85,"flow":365} |
| 2020-05-15T00:00:00Z | cb04ccff-48bc-4108-9d16-7d7db9152895 | {"pressure":130} |
Fiddle

Related

MariaDB JSON_ARRAYAGG gives wrong result

I have two problems in MariaDB 15.1 when using JSON_ARRAYAGG:
1. The brackets [] are omitted
2. Incorrect results: values are duplicated or omitted
My database is the following:
user:
+----+------+
| id | name |
+----+------+
| 1 | Jhon |
| 2 | Bob |
+----+------+
car:
+----+---------+-------------+
| id | user_id | model |
+----+---------+-------------+
| 1 | 1 | Tesla |
| 2 | 1 | Ferrari |
| 3 | 2 | Lamborghini |
+----+---------+-------------+
phone:
+----+---------+----------+--------+
| id | user_id | company | number |
+----+---------+----------+--------+
| 1 | 1 | Verzion | 1 |
| 2 | 1 | AT&T | 2 |
| 3 | 1 | T-Mobile | 3 |
| 4 | 2 | Sprint | 4 |
| 5 | 1 | Sprint | 2 |
+----+---------+----------+--------+
1. The brackets [] are omitted
For example, this query gets users with their list of cars:
SELECT
  user.id AS id,
  user.name AS name,
  JSON_ARRAYAGG(
    JSON_OBJECT(
      'id', car.id,
      'model', car.model
    )
  ) AS cars
FROM user
INNER JOIN car ON user.id = car.user_id
GROUP BY user.id;
Result: the brackets [] were omitted in cars (JSON_ARRAYAGG behaves similarly to GROUP_CONCAT here):
+----+------+-----------------------------------------------------------+
| id | name | cars |
+----+------+-----------------------------------------------------------+
| 1 | Jhon | {"id": 1, "model": "Tesla"},{"id": 2, "model": "Ferrari"} |
| 2 | Bob | {"id": 3, "model": "Lamborghini"} |
+----+------+-----------------------------------------------------------+
However, when adding the filter WHERE user.id = 1, the brackets [] are not omitted:
+----+------+-------------------------------------------------------------+
| id | name | cars |
+----+------+-------------------------------------------------------------+
| 1 | Jhon | [{"id": 1, "model": "Tesla"},{"id": 2, "model": "Ferrari"}] |
+----+------+-------------------------------------------------------------+
2. Incorrect results: values are duplicated or omitted
This error is strange, as it only appears when all of the following conditions are met:
- more than two tables are queried
- the DISTINCT option is used
- a user has at least 2 cars and at least 3 phones
Duplicate values
For example, this query gets users with their car list and their phone list:
SELECT
  user.id AS id,
  user.name AS name,
  JSON_ARRAYAGG( DISTINCT
    JSON_OBJECT(
      'id', car.id,
      'model', car.model
    )
  ) AS cars,
  JSON_ARRAYAGG( DISTINCT
    JSON_OBJECT(
      'id', phone.id,
      'company', phone.company,
      'number', phone.number
    )
  ) AS phones
FROM user
INNER JOIN car ON user.id = car.user_id
INNER JOIN phone ON user.id = phone.user_id
GROUP BY user.id;
I will show the output in JSON format, keeping only the elements of interest.
Result: the brackets [] were omitted and the Verzion object is duplicated:
{
  "id": 1,
  "name": "Jhon",
  "phones": // [ Opening bracket expected
  {
    "id": 5,
    "company": "Sprint",
    "number": 2
  },
  {
    "id": 1,
    "company": "Verzion",
    "number": 1
  },
  {
    "id": 1,
    "company": "Verzion",
    "number": 1
  }, // Duplicate object despite the DISTINCT option
  {
    "id": 2,
    "company": "AT&T",
    "number": 2
  },
  {
    "id": 3,
    "company": "T-Mobile",
    "number": 3
  }
  // ] Closing bracket expected
}
Omitted values
This error occurs when phone.id is omitted from the query:
SELECT
  user.id AS id,
  user.name AS name,
  JSON_ARRAYAGG( DISTINCT
    JSON_OBJECT(
      'id', car.id,
      'model', car.model
    )
  ) AS cars,
  JSON_ARRAYAGG( DISTINCT
    JSON_OBJECT(
      -- 'id', phone.id,
      'company', phone.company,
      'number', phone.number
    )
  ) AS phones
FROM user
INNER JOIN car ON user.id = car.user_id
INNER JOIN phone ON user.id = phone.user_id
GROUP BY user.id;
Result: the brackets [] were omitted and the Sprint object was dropped.
Apparently this happens because DISTINCT is applied as a kind of OR across the columns of the JSON_OBJECT: the company value already appears in a different row and the number value in yet another row, so the combination is discarded as a duplicate.
{
  "id": 1,
  "name": "Jhon",
  "phones": // [ Opening bracket expected
  //{
  //  "company": "Sprint",
  //  "number": 2
  //}, `Sprint` was omitted
  {
    "company": "Verzion",
    "number": 1
  },
  {
    "company": "AT&T",
    "number": 2
  },
  {
    "company": "T-Mobile",
    "number": 3
  }
  // ] Closing bracket expected
}
Using GROUP_CONCAT instead of JSON_ARRAYAGG avoids the duplicated and omitted objects.
And, as before, adding the filter WHERE user.id = 1 keeps the brackets [] and also avoids the duplicated and omitted objects:
{
  "id": 1,
  "name": "Jhon",
  "phones": [
    {
      "id": 1,
      "company": "Verzion",
      "number": 1
    },
    {
      "id": 2,
      "company": "AT&T",
      "number": 2
    },
    {
      "id": 3,
      "company": "T-Mobile",
      "number": 3
    },
    {
      "id": 5,
      "company": "Sprint",
      "number": 2
    }
  ]
}
What am I doing wrong?
So far my workaround is the following, but I would like to use JSON_ARRAYAGG, since that query is cleaner:
-- 1
SELECT
  user.id AS id,
  user.name AS name,
  CONCAT(
    '[',
    GROUP_CONCAT( DISTINCT
      JSON_OBJECT(
        'id', car.id,
        'model', car.model
      )
    ),
    ']'
  ) AS cars
FROM user
INNER JOIN car ON user.id = car.user_id
GROUP BY user.id;

-- 2
SELECT
  user.id AS id,
  user.name AS name,
  CONCAT(
    '[',
    GROUP_CONCAT( DISTINCT
      JSON_OBJECT(
        'id', car.id,
        'model', car.model
      )
    ),
    ']'
  ) AS cars,
  CONCAT(
    '[',
    GROUP_CONCAT( DISTINCT
      JSON_OBJECT(
        'id', phone.id,
        'company', phone.company,
        'number', phone.number
      )
    ),
    ']'
  ) AS phones
FROM user
INNER JOIN car ON user.id = car.user_id
INNER JOIN phone ON user.id = phone.user_id
GROUP BY user.id;
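A note on an alternative, in case it helps: the duplicates come from the row fan-out of joining car and phone at the same time, so aggregating each child table in its own derived table avoids the need for DISTINCT altogether. A sketch (whether it also avoids the missing-brackets behavior may depend on the server version):
SELECT
  user.id AS id,
  user.name AS name,
  c.cars,
  p.phones
FROM user
INNER JOIN (
  -- aggregate cars per user before joining, so no fan-out occurs
  SELECT user_id, JSON_ARRAYAGG(JSON_OBJECT('id', id, 'model', model)) AS cars
  FROM car
  GROUP BY user_id
) c ON c.user_id = user.id
INNER JOIN (
  -- same for phones
  SELECT user_id, JSON_ARRAYAGG(JSON_OBJECT('id', id, 'company', company, 'number', number)) AS phones
  FROM phone
  GROUP BY user_id
) p ON p.user_id = user.id;
Also note that CONCAT(...) in the GROUP_CONCAT workaround returns NULL when GROUP_CONCAT aggregates no rows (for example with a LEFT JOIN); wrapping it as COALESCE(GROUP_CONCAT(...), '') keeps the output a valid [].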

How to do 2 summarize operation in one Kusto query?

I am stuck with a Kusto query.
This is what I want to do: show the day-wise sales amount together with the previous month's sales amount for the same day.
datatable(DateStamp:datetime, OrderId:string, SalesAmount:int)
[
"02-01-2019", "I01", 100,
"02-01-2019", "I02", 200,
"02-02-2019", "I03", 250,
"02-02-2019", "I04", 150,
"02-03-2019", "I13", 110,
"01-01-2019", "I10", 20,
"01-02-2019", "I11", 50,
"01-02-2019", "I12", 30,
]
| extend SalesDate = format_datetime(DateStamp, 'MM/dd/yyyy')
| summarize AmountOfSales = sum(SalesAmount) by SalesDate
This gives me one row per day with its AmountOfSales.
Instead, I want each row to also show the sales amount of the same day in the previous month.
I couldn't figure out how to chain multiple summarize operators in one query.
Here's an option:
datatable(DateStamp:datetime, OrderId:string, SalesAmount:int)
[
"02-01-2019", "I01", 100,
"02-01-2019", "I02", 200,
"02-02-2019", "I03", 250,
"02-02-2019", "I04", 150,
"02-03-2019", "I13", 110,
"01-01-2019", "I10", 20,
"01-02-2019", "I11", 50,
"01-02-2019", "I12", 30,
]
| summarize AmountOfSales = sum(SalesAmount) by bin(DateStamp, 1d)  // daily totals
| as hint.materialized = true T  // name the aggregated result T so it can be joined with itself
| extend prev_month = datetime_add("Month", -1, DateStamp)
| join kind=leftouter T on $left.prev_month == $right.DateStamp  // look up the same day one month earlier
| project SalesDate = format_datetime(DateStamp, 'MM/dd/yyyy'), AmountOfSales, AmountOfSalesPrevMonth = coalesce(AmountOfSales1, 0)  // AmountOfSales1 comes from the right side of the join
| SalesDate | AmountOfSales | AmountOfSalesPrevMonth |
|---|---|---|
| 01/01/2019 | 20 | 0 |
| 01/02/2019 | 80 | 0 |
| 02/01/2019 | 300 | 20 |
| 02/02/2019 | 400 | 80 |
| 02/03/2019 | 110 | 0 |
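Since this is a plain left-outer equality match, lookup works here as well. A sketch, assuming T is bound (for example via a let statement) to the same daily aggregation:
T
| extend prev_month = datetime_add("Month", -1, DateStamp)
| lookup kind=leftouter (T | project DateStamp, PrevMonthAmount = AmountOfSales) on $left.prev_month == $right.DateStamp
| project SalesDate = format_datetime(DateStamp, 'MM/dd/yyyy'), AmountOfSales, AmountOfSalesPrevMonth = coalesce(PrevMonthAmount, 0)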

Create column based on calculated % of failures

Here are the sample data and my query. How can I calculate the percentage of failures and create a new column based on that percentage?
let M = datatable (Run_Date:datetime , Job_Name:string , Job_Status:string , Count:int )
["2020-10-21", "Job_A1", "Succeeded", 10,
"2020-10-21", "Job_A1", "Failed", 8,
"10/21/2020", "Job_B2", "Succeeded", 21,
"10/21/2020", "Job_C3", "Succeeded", 21,
"10/21/2020", "Job_D4", "Succeeded", 136,
"10/21/2020", "Job_E5", "Succeeded", 187,
"10/21/2020", "Job_E5", "Failed", 4
];
M
| summarize count() by Job_Name, Count, summary = strcat(Job_Name, " failed " , Count, " out of ", Count ," times.")
And the desired output is below.
You could try something like this:
datatable (Run_Date:datetime , Job_Name:string , Job_Status:string , Count:int )
[
"2020-10-21", "Job_A1", "Succeeded", 10,
"2020-10-21", "Job_A1", "Failed", 8,
"10/21/2020", "Job_B2", "Succeeded", 21,
"10/21/2020", "Job_C3", "Succeeded", 21,
"10/21/2020", "Job_D4", "Succeeded", 136,
"10/21/2020", "Job_E5", "Succeeded", 187,
"10/21/2020", "Job_E5", "Failed", 4
]
| summarize failures = sumif(Count, Job_Status == "Failed"), total = sum(Count) by Job_Name, Run_Date  // sumif counts only the 'Failed' rows
| project Job_Name, failure_rate = round(100.0 * failures / total, 2), Summary = strcat(Job_Name, " failed " , failures, " out of ", total ," times.")
| extend ['Alert if failure rate > 40%'] = iff(failure_rate > 40, "Yes", "No")  // threshold flag
--->
| Job_Name | failure_rate | Summary | Alert if failure rate > 40% |
|----------|--------------|-----------------------------------|-----------------------------|
| Job_A1 | 44.44 | Job_A1 failed 8 out of 18 times. | Yes |
| Job_B2 | 0 | Job_B2 failed 0 out of 21 times. | No |
| Job_C3 | 0 | Job_C3 failed 0 out of 21 times. | No |
| Job_D4 | 0 | Job_D4 failed 0 out of 136 times. | No |
| Job_E5 | 2.09 | Job_E5 failed 4 out of 191 times. | No |
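And if only the breaching jobs are needed, the computed column can drive a filter directly, by appending for example:
| where failure_rate > 40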

Remove slash character in JSON response using jq

The Docker Engine API returns container names with a leading /:
{
  "Id": "8dfafdbc3a40",
  "Names": [
    "/boring_feynman"
  ],
  "Image": "ubuntu:latest",
  "ImageID": "d74508fb6632491cea586a1fd7d748dfc5274cd6fdfedee309ecdcbc2bf5cb82",
  "Command": "echo 1",
  "Created": 1367854155,
  "State": "Exited",
  "Status": "Exit 0",
  "Ports": [{
    "PrivatePort": 2222,
    "PublicPort": 3333,
    "Type": "tcp"
  }],
  "Labels": {
    "com.example.vendor": "Acme",
    "com.example.license": "GPL",
    "com.example.version": "1.0"
  },
  "SizeRw": 12288,
  "SizeRootFs": 0,
  "HostConfig": {
    "NetworkMode": "default"
  },
  "NetworkSettings": {
    "Networks": {}
  },
  "Mounts": [{
    "Name": "fac362...80535",
    "Source": "/data",
    "Destination": "/data",
    "Driver": "local",
    "Mode": "ro,Z",
    "RW": false,
    "Propagation": ""
  }]
}
I want to remove the slash so the response can be rendered as a table with jq:
jq -r '(["Names","Image"] | (., map(length*"-"))), (.[] | [.Names, .Image]) | @tsv'
Currently, when I run the above, I get:
jq: error (at <stdin>:1): array (["/boring_feynman"]) is not valid in a csv row
The problem is not the / in the .Names field but your expression. For filters like @csv or @tsv to work, the values need to be scalars in a flat array, but your .Names field is itself an array.
So basically you are passing this result to the @tsv filter:
[
  [
    "/boring_feynman"
  ],
  "ubuntu:latest"
]
instead of
[
  "/boring_feynman",
  "ubuntu:latest"
]
So, modifying your filter, you can do the following for the JSON in question:
jq -r '(["Names","Image"] | (., map(length*"-"))), ([.Names[], .Image]) | @tsv'
Or, if you still want to remove the /, use the gsub() function:
jq -r '(["Names","Image"] | (., map(length*"-"))), ([ (.Names[] | gsub("^/";"")), .Image]) | @tsv'
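For reference, running that last command against the sample object should print something like this (tab-separated):
Names	Image
-----	-----
boring_feynman	ubuntu:latest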

Jq getting output from nested hash

Hi, I am trying to parse the hash below with jq:
{
  "name": "a",
  "data": [
    {
      "sensitive": false,
      "type": "string",
      "value": "mykeypair"
    },
    {
      "sensitive": false,
      "type": "int",
      "value": 123
    }
  ]
}
and get output like
a,string,mykeypair
a,int,123
So far I am only able to get output like this:
a,string,mykeypair
a,int,mykeypair
a,string,123
a,int,123
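For reference, that cross product is exactly what jq's string interpolation produces when .data[] appears twice in one string, since each interpolation is iterated independently. A filter along these lines (a guess at the original attempt) yields those four lines:
jq -r '"\(.name),\(.data[].type),\(.data[].value)"' file.json
The fix is to iterate .data[] once and build each row from the current element, as in the solution below.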
jq solution:
jq -r '.name as $n | .data[] | [$n, .type, .value] | @csv' file.json
The output:
"a","string","mykeypair"
"a","int",123
If it's mandatory to output unquoted values:
jq -r '.name as $n | .data[] | [$n, .type, "\(.value)"] | join(",")' file.json
The output:
a,string,mykeypair
a,int,123
