CosmosDB, help flatten and filter by nested array - azure-cosmosdb

I'm trying to flatten and filter JSON data that is stored in Cosmos DB.
The data looks like the document below. I would like to flatten everything in the Variables array and then filter by a specific _id and Timestamp inside that array:
{
  "_id": 21032,
  "FirstConnected": {
    "$date": 1522835868346
  },
  "LastUpdated": {
    "$date": 1523360279908
  },
  "Variables": [
    {
      "_id": 99999,
      "Values": [
        {
          "Timestamp": {
            "$date": 1522835868347
          },
          "Value": 1
        }
      ]
    },
    {
      "_id": 99998,
      "Values": [
        {
          "Timestamp": {
            "$date": 1523270312001
          },
          "Value": 8888
        }
      ]
    }
  ]
}

If you want to flatten data from the Variables array together with properties from the root object, you can query your collection like this:
SELECT root._id, root.FirstConnected, root.LastUpdated, var.Values
FROM root
JOIN var IN root.Variables
WHERE var._id = 99998
This results in:
[
  {
    "_id": 21032,
    "FirstConnected": {
      "$date": 1522835868346
    },
    "LastUpdated": {
      "$date": 1523360279908
    },
    "Values": [
      {
        "Timestamp": {
          "$date": 1523270312001
        },
        "Value": 8888
      }
    ]
  }
]
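To make the JOIN semantics concrete, here is a small Python sketch that reproduces the same projection in memory over the sample document: each document is paired with every element of its own Variables array, and the WHERE clause then keeps only the matching pairs. It does not talk to Cosmos DB; join_variables is a hypothetical helper name used only for illustration.

```python
from typing import Any, Iterator

# Sample document from the question.
doc = {
    "_id": 21032,
    "FirstConnected": {"$date": 1522835868346},
    "LastUpdated": {"$date": 1523360279908},
    "Variables": [
        {"_id": 99999, "Values": [{"Timestamp": {"$date": 1522835868347}, "Value": 1}]},
        {"_id": 99998, "Values": [{"Timestamp": {"$date": 1523270312001}, "Value": 8888}]},
    ],
}


def join_variables(roots: list[dict[str, Any]], var_id: int) -> Iterator[dict[str, Any]]:
    """In-memory model of:
    SELECT root._id, root.FirstConnected, root.LastUpdated, var.Values
    FROM root JOIN var IN root.Variables WHERE var._id = <var_id>
    """
    for root in roots:
        # JOIN ... IN pairs each document with every element of its own array.
        for var in root.get("Variables", []):
            # WHERE keeps only the pairs whose array element matches.
            if var.get("_id") == var_id:
                yield {
                    "_id": root["_id"],
                    "FirstConnected": root["FirstConnected"],
                    "LastUpdated": root["LastUpdated"],
                    "Values": var["Values"],
                }


rows = list(join_variables([doc], 99998))
print(rows)
```

Because the join is scoped to a single document's array, a document with N entries in Variables yields at most N rows before filtering.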
If you want to flatten the Values array as well, you need to write something like this:
SELECT root._id, root.FirstConnected, root.LastUpdated,
var.Values[0].Timestamp, var.Values[0]["Value"]
FROM root
JOIN var IN root.Variables
WHERE var._id = 99998
Note that Cosmos DB treats VALUE as a reserved keyword, so you need the escape syntax ["Value"] to reference that property. The result of this query is:
[
  {
    "_id": 21032,
    "FirstConnected": {
      "$date": 1522835868346
    },
    "LastUpdated": {
      "$date": 1523360279908
    },
    "Timestamp": "1970-01-01T00:00:00Z",
    "Value": 8888
  }
]
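Indexing with Values[0] only picks up the first entry of each Values array. If a Variables entry can hold several values, chaining a second JOIN (for example JOIN val IN var.Values) yields one row per value, and the Timestamp filter from the question can then go in the WHERE clause. The Python sketch below models that double flattening in memory; flatten_values and the millisecond cut-off are hypothetical, chosen for illustration.

```python
from typing import Any, Iterator

# Sample document from the question.
doc = {
    "_id": 21032,
    "Variables": [
        {"_id": 99999, "Values": [{"Timestamp": {"$date": 1522835868347}, "Value": 1}]},
        {"_id": 99998, "Values": [{"Timestamp": {"$date": 1523270312001}, "Value": 8888}]},
    ],
}


def flatten_values(
    roots: list[dict[str, Any]], var_id: int, min_ts_ms: int
) -> Iterator[dict[str, Any]]:
    """In-memory model of a double JOIN, roughly:
    SELECT root._id, var._id AS varId, val.Timestamp["$date"] AS Timestamp, val["Value"]
    FROM root JOIN var IN root.Variables JOIN val IN var.Values
    WHERE var._id = <var_id> AND val.Timestamp["$date"] >= <min_ts_ms>
    """
    for root in roots:
        for var in root.get("Variables", []):
            if var.get("_id") != var_id:
                continue
            # Second JOIN: one output row per entry in Values, not just Values[0].
            for val in var.get("Values", []):
                ts = val["Timestamp"]["$date"]  # epoch milliseconds
                if ts >= min_ts_ms:
                    yield {"_id": root["_id"], "varId": var["_id"], "Timestamp": ts, "Value": val["Value"]}


rows = list(flatten_values([doc], 99998, 1523270312000))
print(rows)
```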
For more details, see https://learn.microsoft.com/en-us/azure/cosmos-db/sql-api-sql-query#Advanced

If you only need to filter by the nested _id property, you could use ARRAY_CONTAINS with the partial_match argument set to true. The query would look something like this:
SELECT VALUE c
FROM c
WHERE ARRAY_CONTAINS(c.Variables, {_id: 99998}, true)
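The partial_match flag is what lets {_id: 99998} match an array element that also has a Values property. The Python function below is a rough in-memory approximation of that behaviour, not the Cosmos DB implementation; it assumes only top-level properties of the search object are compared, and array_contains is a hypothetical name.

```python
from typing import Any


def array_contains(arr: list[Any], expr: Any, partial_match: bool = False) -> bool:
    """Rough in-memory model of ARRAY_CONTAINS(arr, expr, partial_match)."""
    for item in arr:
        if partial_match and isinstance(item, dict) and isinstance(expr, dict):
            # Partial match: each property in expr must be present in item with an
            # equal value; extra properties on item (such as Values) are ignored.
            if all(key in item and item[key] == value for key, value in expr.items()):
                return True
        elif item == expr:
            # Without partial matching the whole element must be equal.
            return True
    return False


variables = [
    {"_id": 99999, "Values": [{"Timestamp": {"$date": 1522835868347}, "Value": 1}]},
    {"_id": 99998, "Values": [{"Timestamp": {"$date": 1523270312001}, "Value": 8888}]},
]

print(array_contains(variables, {"_id": 99998}, True))  # True
print(array_contains(variables, {"_id": 99998}))        # False
```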
If you also want to flatten the array, you can iterate over it directly in the FROM clause (or use a JOIN):
SELECT VALUE v
FROM v IN c.Variables
WHERE v._id = 99998

Related

Firestore Pagination: how to define **unique** 'startAt'-cursor for REST?

This is a follow up question to an already solved one.
In that previous question, an answer showed how to define a cursor for query pagination with 'startAt' in the REST API. However, that cursor relates to a range of documents: in the example below, it matches every document whose 'instructionNumber.stringValue' equals "instr. 101". In my testing, this causes documents to be skipped.
New question:
How must the cursor be defined so that it refers not just to the value of the field the query is ordered by, but to one distinct document (usually identified by its document ID)?
{
  "structuredQuery": {
    "from": [{"collectionId": "instructions"}],
    "where": {
      "fieldFilter": {
        "field": {
          "fieldPath": "belongsToDepartementID"
        },
        "op": "EQUAL",
        "value": {
          "stringValue": "toplevel-document-id"
        }
      }
    },
    "orderBy": [
      {
        "field": {
          "fieldPath": "instructionNumber"
        },
        "direction": "ASCENDING"
      }
    ],
    "startAt": {
      "values": [{
        "stringValue": "instr. 101"
      }]
    },
    "limit": 5
  }
}
For better understanding, here is the condensed schema of the documents.
{
  "document": {
    "name": "projects/PROJECT_NAME/databases/(default)/documents/organizations/testManyInstructions/instructions/i104",
    "fields": {
      "belongsToDepartementID": {
        "stringValue": "toplevel-document-id"
      },
      "instructionNumber": {
        "stringValue": "instr. 104"
      },
      "instructionTitle": {
        "stringValue": "dummy Title104"
      },
      "instructionCurrentRevision": {
        "stringValue": "A"
      }
    },
    "createTime": "2022-02-18T13:55:47.300271Z",
    "updateTime": "2022-02-18T13:55:47.300271Z"
  }
}
For a query with no ordering:
"orderBy": [{
  "direction": "ASCENDING",
  "field": {"fieldPath": "__name__"}
}],
"startAt": {
  "before": false,
  "values": [{"referenceValue": "last/doc/ref"}]
}
For a query with ordering:
"orderBy": [
  {
    "direction": "DESCENDING",
    "field": {"fieldPath": "instructionNumber"}
  },
  {
    "direction": "DESCENDING",
    "field": {"fieldPath": "__name__"}
  }
],
"startAt": {
  "before": false,
  "values": [
    {"stringValue": "instr. 101"},
    {"referenceValue": "last/doc/ref"}
  ]
}
Be sure to use the same direction for __name__ as for the preceding orderBy field; otherwise the query will need a composite index.
To make sure the cursor identifies a unique document to start at, always include the document ID in your startAt values.
I'm not sure of the exact syntax for the REST API, but the Firebase SDKs pass this document ID automatically when you call startAt with a DocumentSnapshot.
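Applied to the question's ascending query, the cursor for the next page can be built from the last document of the previous page: one cursor value per orderBy entry, with the document's full resource name as the __name__ tie-breaker. The Python sketch below only builds the runQuery request body as a dict; it assumes the document object has the shape shown in the question (a full "name" plus "fields"), and next_page_query is a hypothetical helper.

```python
import json
from typing import Any


def next_page_query(last_doc: dict[str, Any], department_id: str, page_size: int = 5) -> dict[str, Any]:
    """Build a runQuery body for the page that follows last_doc.

    last_doc is one "document" object from the previous page's response:
    it has the full resource "name" and a "fields" map.
    """
    return {
        "structuredQuery": {
            "from": [{"collectionId": "instructions"}],
            "where": {
                "fieldFilter": {
                    "field": {"fieldPath": "belongsToDepartementID"},
                    "op": "EQUAL",
                    "value": {"stringValue": department_id},
                }
            },
            "orderBy": [
                {"field": {"fieldPath": "instructionNumber"}, "direction": "ASCENDING"},
                # Tie-breaker on the document name, in the same direction as above.
                {"field": {"fieldPath": "__name__"}, "direction": "ASCENDING"},
            ],
            "startAt": {
                # before=False positions the cursor just after these values.
                "before": False,
                # One cursor value per orderBy entry, in the same order.
                "values": [
                    last_doc["fields"]["instructionNumber"],
                    {"referenceValue": last_doc["name"]},
                ],
            },
            "limit": page_size,
        }
    }


# Last document of the previous page, using the schema shown in the question.
last_doc = {
    "name": "projects/PROJECT_NAME/databases/(default)/documents/organizations/testManyInstructions/instructions/i104",
    "fields": {
        "belongsToDepartementID": {"stringValue": "toplevel-document-id"},
        "instructionNumber": {"stringValue": "instr. 104"},
    },
}

body = next_page_query(last_doc, "toplevel-document-id")
print(json.dumps(body, indent=2))
```

Two documents with the same instructionNumber still get distinct cursors here, because their names differ.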

Not able to filter out required property in Azure TSI Gen1 Get Events API response

I am using the request body below to fetch only the required property values:
"searchSpan": {
"from": {
"dateTime": "2021-11-20T00:00:00.000Z"
},
"to": {
"dateTime": "2021-11-20T23:00:00.000Z"
}
},
"predicateString": "[Params.Name] = 'power'",
"take": 100
}
}
The URL looks like this:
https://12345678a-bcde-3e91-blah-2292933292aa.env.timeseries.azure.com/events?api-version=2016-12-12
Despite specifying the required property, the response returns all properties, as if the predicate string had been ignored. What might I be doing wrong?
{
  "warnings": [],
  "events": [
    {
      "schema": {
        "rid": 0,
        "$esn": "my-event-hub",
        "properties": [
          {
            "name": "mytimestamp",
            "type": "DateTime"
          },
          {
            "name": "Params.Name",
            "type": "String"
          },
          {
            "name": "Params.Value",
            "type": "Double"
          }
        ]
      },
      "$ts": "2021-11-20T10:01:50Z",
      "values": [
        "2021-11-20T10:01:50Z",
        "energy",
        60
      ]
    },
    {
      "schemaRid": 0,
      "$ts": "2021-11-20T10:01:50Z",
      "values": [
        "2021-11-20T10:01:50Z",
        "power",
        10
      ]
    },
    {
      "schemaRid": 0,
      "$ts": "2021-11-20T10:01:50Z",
      "values": [
        "2021-11-20T10:01:50Z",
        "strength",
        200
      ]
    }
  ]
}
Edit:
I'm getting a "Properties count error" on the TSI overview page. This might be the root cause, but I don't know for sure:
"For Time Series Insights environment ABC: You have used all 641/600 properties in your environment".
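Whatever the cause on the service side, a response shaped like the one above can at least be decoded and filtered on the client. In it, the first event carries a schema object with an rid, and later events point back to it through schemaRid. The Python sketch below relies only on that observed shape (an assumption, not a documented contract); decode_events is a hypothetical helper, and this is a client-side workaround, not a fix for the ignored predicateString.

```python
from typing import Any


def decode_events(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn each event's positional "values" into a dict keyed by property name."""
    schemas: dict[int, list[str]] = {}
    rows = []
    for event in response.get("events", []):
        if "schema" in event:
            # Full schema: remember its property names under its rid.
            schema = event["schema"]
            rid = schema["rid"]
            schemas[rid] = [prop["name"] for prop in schema["properties"]]
        else:
            # Later events only reference an already-sent schema.
            rid = event["schemaRid"]
        row = dict(zip(schemas[rid], event["values"]))
        row["$ts"] = event["$ts"]
        rows.append(row)
    return rows


# Copy of the response from the question.
response = {
    "warnings": [],
    "events": [
        {
            "schema": {
                "rid": 0,
                "$esn": "my-event-hub",
                "properties": [
                    {"name": "mytimestamp", "type": "DateTime"},
                    {"name": "Params.Name", "type": "String"},
                    {"name": "Params.Value", "type": "Double"},
                ],
            },
            "$ts": "2021-11-20T10:01:50Z",
            "values": ["2021-11-20T10:01:50Z", "energy", 60],
        },
        {"schemaRid": 0, "$ts": "2021-11-20T10:01:50Z", "values": ["2021-11-20T10:01:50Z", "power", 10]},
        {"schemaRid": 0, "$ts": "2021-11-20T10:01:50Z", "values": ["2021-11-20T10:01:50Z", "strength", 200]},
    ],
}

# Client-side equivalent of the predicate [Params.Name] = 'power'.
power = [row for row in decode_events(response) if row["Params.Name"] == "power"]
print(power)
```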

How to query Cosmos DB to have an array from multiple items in the result set

I have the following content in a container, where device_id is the partition key.
[
  {
    "id": "hub-01",
    "device_id": "device-01",
    "created": "2020-12-08T17:47:35",
    "cohort": "test"
  },
  {
    "id": "hub-02",
    "device_id": "device-01",
    "created": "2020-12-08T17:47:36",
    "cohort": "test"
  },
  {
    "id": "hub-01",
    "device_id": "device-02",
    "created": "2020-11-17T20:25:20",
    "cohort": "test"
  },
  {
    "id": "hub-01",
    "device_id": "device-03",
    "created": "2020-11-17T16:05:18",
    "cohort": "test"
  }
]
How do I query all unique devices, with all their metadata collected into a sub-list, so I get the following result set:
[
  {
    "device_id": "device-01",
    "hubs": [
      {
        "id": "hub-01",
        "created": "2020-12-08T17:47:35",
        "cohort": "test"
      },
      {
        "id": "hub-02",
        "created": "2020-12-08T17:47:36",
        "cohort": "test"
      }
    ]
  },
  {
    "device_id": "device-02",
    "hubs": [
      {
        "id": "hub-01",
        "created": "2020-11-17T20:25:20",
        "cohort": "test"
      }
    ]
  },
  {
    "device_id": "device-03",
    "hubs": [
      {
        "id": "hub-01",
        "created": "2020-11-17T16:05:18",
        "cohort": "test"
      }
    ]
  }
]
I was experimenting along the lines of the following sub-query, but it does not behave as I would expect:
SELECT
DISTINCT c.device_id,
ARRAY(
SELECT
c2.id,
c2.created,
c2.cohort
FROM c AS c2
WHERE c2.device_id = c.device_id
) as hubs
FROM c
You could create a UDF (user-defined function) to handle this.
Here is a similar question I answered in another post:
group data by same timestamp using cosmos db sql
I agree with Mo B. You need to deal with this on the client side. I don't think a UDF can handle this, because a UDF can't combine multiple items into one. I think the closest SQL is something like this:
SELECT
c2.device_id,ARRAY_CONCAT([],c2.hubs)
FROM
(SELECT c.device_id,ARRAY(
SELECT
c.id,
c.created,
c.cohort
FROM c
) as hubs FROM c) as c2
GROUP BY c2.device_id
But ARRAY_CONCAT isn't an aggregate function, and there is no aggregate function that can concatenate arrays, so this query does not produce the grouped result.
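Since the grouping has to happen on the client, here is a minimal Python sketch of that step. It assumes the items are first read with a plain projection (for example SELECT c.id, c.device_id, c.created, c.cohort FROM c) so that Cosmos DB system properties such as _rid and _ts do not end up in hubs; group_hubs is a hypothetical helper, and the sample items are the ones from the question.

```python
from collections import defaultdict
from typing import Any


def group_hubs(items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Collect every item's metadata under its device_id."""
    hubs_by_device: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for item in items:
        # Everything except the partition key becomes one entry in "hubs".
        hub = {key: value for key, value in item.items() if key != "device_id"}
        hubs_by_device[item["device_id"]].append(hub)
    # dicts keep insertion order, so devices appear in the order first seen.
    return [{"device_id": device, "hubs": hubs} for device, hubs in hubs_by_device.items()]


items = [
    {"id": "hub-01", "device_id": "device-01", "created": "2020-12-08T17:47:35", "cohort": "test"},
    {"id": "hub-02", "device_id": "device-01", "created": "2020-12-08T17:47:36", "cohort": "test"},
    {"id": "hub-01", "device_id": "device-02", "created": "2020-11-17T20:25:20", "cohort": "test"},
    {"id": "hub-01", "device_id": "device-03", "created": "2020-11-17T16:05:18", "cohort": "test"},
]

result = group_hubs(items)
print(result)
```

Because device_id is the partition key, the same loop could also run per partition if the result set is too large to hold at once.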

Map to an array within an object within an array

I am still having trouble understanding how to use the map function. In this case, my payload is a JSON object that contains an array of "orders", where each order is an object. How do I create a map that lets me get to the array of "ContactEmailAddresses"?
{
  "orders": [
    {
      "OrderGroupNumber": 1,
      "Requester": {
        "Name": "Mickey Mouse"
      },
      "ContactEmailAddresses": [
        "user1#abc.com",
        "user2#abc.com"
      ],
      "CreatedByEmailAddress": "user1#abc.com"
    },
    {
      "OrderGroupNumber": 2,
      "Requester": {
        "Name": "Donald Duck"
      },
      "ContactEmailAddresses": [
        "user3#abc.com",
        "user4#abc.com"
      ],
      "CreatedByEmailAddress": "user3#abc.com"
    },
    {
      "OrderGroupNumber": 3,
      "Requester": {
        "Name": "Goofy"
      },
      "ContactEmailAddresses": [
        "user5#abc.com",
        "user6#abc.com"
      ]
    }
  ]
}
My current attempt that doesn't work is:
payload.*orders map (order, index) ->
{
order.contactEmailAddresses
}
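Use flatMap instead of map. It maps each order to its ContactEmailAddresses array and then flattens the results into a single array:
%dw 2.0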
output application/json
---
payload.orders flatMap $.ContactEmailAddresses
Output:
[
  "user1#abc.com",
  "user2#abc.com",
  "user3#abc.com",
  "user4#abc.com",
  "user5#abc.com",
  "user6#abc.com"
]
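To see why flatMap is needed rather than map, here are the two transformations written side by side in Python for comparison. This is a sketch only, not DataWeave, with the payload trimmed to the fields that matter: map gives one result per order (an array of arrays), while flatMap concatenates those per-order arrays one level deep.

```python
payload = {
    "orders": [
        {"OrderGroupNumber": 1, "ContactEmailAddresses": ["user1#abc.com", "user2#abc.com"]},
        {"OrderGroupNumber": 2, "ContactEmailAddresses": ["user3#abc.com", "user4#abc.com"]},
        {"OrderGroupNumber": 3, "ContactEmailAddresses": ["user5#abc.com", "user6#abc.com"]},
    ]
}

# map alone: one result per order, so the output is an array of arrays.
nested = [order["ContactEmailAddresses"] for order in payload["orders"]]

# flatMap: map each order to its array, then concatenate those arrays.
emails = [email for order in payload["orders"] for email in order["ContactEmailAddresses"]]

print(nested)
print(emails)
```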

Updating item in DynamoDB fails for the UpdateExpression syntax

My table item looks like this:
{
  "id": {
    "S": "alpha-rocket"
  },
  "images": {
    "SS": [
      "apple/value:50",
      "Mango/aa:284_454_51.0.0",
      "Mango/bb:291",
      "Mango/cc:4"
    ]
  },
  "product": {
    "S": "fruit"
  }
}
Below is my code to update the table. The values I pass to the function are: product_id is alpha-rocket, image_val is 284_454_53.0.0, and image is Mango/aa:284_454_53.0.0.
I am trying to update the value of Mango/aa from 284_454_51.0.0 to 284_454_53.0.0, but I get the error "The document path provided in the update expression is invalid for update".
import boto3

def update_player_score(product_id, image_val, image):
    dynamo = boto3.resource('dynamodb')
    tbl = dynamo.Table('orcus')
    # Attempt to set one element of "images" through a document path
    result = tbl.update_item(
        Key={
            "product": "fruit",
            "id": product_id,
        },
        UpdateExpression="SET images.#image_name = :image_val",
        ExpressionAttributeNames={
            "#image_name": "image_name",
        },
        ExpressionAttributeValues={
            ":image_val": image_val,
        },
        ReturnValues="ALL_NEW",
    )
    return result
Is there a way to update the value of Mango/aa, or to replace the full string "Mango/aa:284_454_51.0.0" with "Mango/aa:284_454_53.0.0"?
You cannot update a string in a collection by matching the string. If the attribute is a list (L) and you know the element's index, you can replace the value by index (this does not apply to a string set (SS) like images in your item, because set elements have no index):
SET images[1] = :image_val
It seems like what you may want is not a collection of strings but a nested map. If you restructure your data like this, you can do the update you're looking for:
{
  "id": {
    "S": "alpha-rocket"
  },
  "images": {
    "M": {
      "apple": {
        "M": {
          "value": {
            "S": "50"
          }
        }
      },
      "Mango": {
        "M": {
          "aa": {
            "S": "284_454_51.0.0"
          },
          "bb": {
            "S": "291"
          },
          "cc": {
            "S": "4"
          }
        }
      }
    }
  },
  "product": {
    "S": "fruit"
  }
}
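With the map layout, the update becomes a plain document-path SET such as SET images.#fruit.#key = :value. The Python sketch below only builds the keyword arguments for the boto3 Table.update_item call, so it runs without AWS access; parse_image and nested_image_update are hypothetical helpers, and the key attributes (id plus product) are copied from the question's code, so adjust them to the table's real key schema.

```python
from typing import Any


def parse_image(image: str) -> tuple[str, str, str]:
    """Split "Mango/aa:284_454_53.0.0" into ("Mango", "aa", "284_454_53.0.0")."""
    fruit, rest = image.split("/", 1)
    key, value = rest.split(":", 1)
    return fruit, key, value


def nested_image_update(product_id: str, image: str) -> dict[str, Any]:
    """Build update_item keyword arguments for the map-based "images" layout."""
    fruit, key, value = parse_image(image)
    return {
        # Key attributes copied from the question (assumption about the key schema).
        "Key": {"id": product_id, "product": "fruit"},
        # Placeholders keep attribute names out of the expression text, which avoids
        # clashes with reserved words and special characters.
        "UpdateExpression": "SET images.#fruit.#key = :value",
        "ExpressionAttributeNames": {"#fruit": fruit, "#key": key},
        "ExpressionAttributeValues": {":value": value},
        "ReturnValues": "UPDATED_NEW",
    }


kwargs = nested_image_update("alpha-rocket", "Mango/aa:284_454_53.0.0")
print(kwargs)
# With boto3 these would be passed as:
#   boto3.resource("dynamodb").Table("orcus").update_item(**kwargs)
```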
I would also consider putting the different values in different "rows" in the table and using queries to build the objects.
