Search in array by array of values in Azure Cosmos DB - azure-cosmosdb

Each document contains a field Tags holding an array of strings, e.g. ["tag1", "tag2", "tag3"]. How can I get all documents that have both tag1 and tag2?
I've tried ARRAY_CONTAINS, but with this query Cosmos DB returns every document with tag1 and every document with tag2, not only the documents where tag1 and tag2 appear together.
SELECT VALUE files FROM files JOIN tag IN files.Tags WHERE IS_DEFINED(files.Tags) AND ARRAY_CONTAINS(["tag1", "tag2"], tag, false)
My sample data
[
{
"id": "23244d53-f4ba-4bf5-90a4-a4073a759232",
"Tags": [
"tag1",
"tag2"
]
},
{
"id": "23244d53-f4ba-4bf5-90a4-a4073a759232",
"Tags": [
"tag1",
"tag3"
]
},
{
"id": "0bbd57d1-38cb-4094-81cf-33703448bbcd",
"Tags": [
"tag1"
]
}
]

You can try something like this SQL, which uses one ARRAY_CONTAINS check per required tag and combines them with AND:
SELECT *
FROM files
WHERE IS_DEFINED(files.Tags) AND ARRAY_CONTAINS(files.Tags, "tag1",false) AND ARRAY_CONTAINS(files.Tags, "tag2",false)
Result:
[
{
"id": "23244d53-f4ba-4bf5-90a4-a4073a759232",
"Tags": [
"tag1",
"tag2"
]
}
]
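If the list of required tags varies at runtime, the AND query can be generated instead of hand-written. The sketch below is a minimal illustration, not part of the original answer: build_all_tags_query and matches_all are hypothetical helper names, and it assumes the azure-cosmos Python SDK (v4), whose query_items call accepts named parameters as a list of {"name": ..., "value": ...} dicts. Binding each tag as @tag0, @tag1, ... avoids splicing user input into the query text.

```python
from typing import Any, Dict, List, Tuple


def build_all_tags_query(tags: List[str]) -> Tuple[str, List[Dict[str, Any]]]:
    """Build a query that requires every tag in `tags` to be in files.Tags.

    One ARRAY_CONTAINS check is emitted per tag and the checks are joined
    with AND. Tags are bound as named parameters (@tag0, @tag1, ...).
    """
    if not tags:
        raise ValueError("need at least one tag")
    conditions = ["IS_DEFINED(files.Tags)"]
    parameters: List[Dict[str, Any]] = []
    for i, tag in enumerate(tags):
        name = f"@tag{i}"
        conditions.append(f"ARRAY_CONTAINS(files.Tags, {name}, false)")
        parameters.append({"name": name, "value": tag})
    return "SELECT * FROM files WHERE " + " AND ".join(conditions), parameters


def matches_all(doc: Dict[str, Any], tags: List[str]) -> bool:
    """Local stand-in for the WHERE clause above, for sanity checks only."""
    return "Tags" in doc and all(t in doc["Tags"] for t in tags)


if __name__ == "__main__":
    query, params = build_all_tags_query(["tag1", "tag2"])
    print(query)
    print(params)
```

With the SDK, the pair would be passed along the lines of container.query_items(query=query, parameters=params, enable_cross_partition_query=True), where container is assumed to be an existing ContainerProxy.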

Related

How to create a JSON file in R with a list of lists as elements with fromJSON()

I am trying to create a .json file. The third element should be a list with lists as elements.
What am I doing wrong?
Below is the JSON file I created with R:
{
"list1": [
"element1"
],
"list2": [
"element2"
],
"List_with_lists_as_elements": [
"Child1":{
"Name": "Child1",
"Child1_Title": [
"Title1",
"Title2",
"Title3"]
,
"Child1_Subtitle": [
"Subtitle_1",
"Subtitle_2",
"Subtitle_3"
]
},
"Child2":{
"Name": "Child2",
"Child2_Title": [
"Title1",
"Title2",
"Title3"]
,
"Child2_Subtitle": [
"Subtitle2_1",
"Subtitle2_2",
"Subtitle2_3"
]
},
"Child3":{
"Name": "Child3",
"Child2_Title": [
"Title1",
"Title2",
"Title3"]
,
"Child2_Subtitle": [
"Subtitle3_1",
"Subtitle3_2",
"Subtitle3_3"
]
}
]
}
I then save this as example_json.json and load it with fromJSON(txt = 'example_json.json'), but I get an error message, probably because I don't quite know how to create a .json file:
Error in parse_con(txt, bigint_as_char) :
parse error: after array element, I expect ',' or ']'
_as_elements": [ "Child1":{ "Name": "Child1",
(right here) ------^
How can I create a .json file that gives me a list of lists?
The issue is that you have keys inside your array. A JSON array can only hold values; key/value pairs belong inside objects:
...
"List_with_lists_as_elements": [
"Child1":{
"Name": "Child1",
...
},
"Child2":{
"Name": "Child2",
...
},
"Child3":{
"Name": "Child3",
...
}
]
...
Each object already has a Name field that repeats the key, so you can probably just remove the keys:
...
"List_with_lists_as_elements": [
{
"Name": "Child1",
...
},
{
"Name": "Child2",
...
},
{
"Name": "Child3",
...
}
]
...
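To check the fix outside R, the sketch below uses Python's standard json module (an illustrative substitute for R's jsonlite, not the original tooling) on a shortened version of both shapes: keys placed directly inside [...] are rejected, while an array of anonymous objects parses into a list of dicts. In R, jsonlite's fromJSON may simplify such an array of objects into a data frame by default; passing simplifyVector = FALSE keeps the result as nested lists.

```python
import json

# Invalid: "Child1": {...} puts a key directly inside an array.
broken = '{"List_with_lists_as_elements": ["Child1": {"Name": "Child1"}]}'

# Valid: each array element is an anonymous object; the name lives in "Name".
fixed = """
{
  "List_with_lists_as_elements": [
    {"Name": "Child1", "Child1_Title": ["Title1", "Title2", "Title3"]},
    {"Name": "Child2", "Child2_Title": ["Title1", "Title2", "Title3"]}
  ]
}
"""


def parses(text: str) -> bool:
    """Return True if `text` is valid JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


data = json.loads(fixed)
names = [child["Name"] for child in data["List_with_lists_as_elements"]]
print(parses(broken), parses(fixed), names)
```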

How to query Cosmos DB to have an array from multiple items in the result set

I have the following content in a container, where device_id is the partition key.
[
{
"id": "hub-01",
"device_id": "device-01",
"created": "2020-12-08T17:47:35",
"cohort": "test"
},
{
"id": "hub-02",
"device_id": "device-01",
"created": "2020-12-08T17:47:36",
"cohort": "test"
},
{
"id": "hub-01",
"device_id": "device-02",
"created": "2020-11-17T20:25:20",
"cohort": "test"
},
{
"id": "hub-01",
"device_id": "device-03",
"created": "2020-11-17T16:05:18",
"cohort": "test"
}
]
How do I query all unique devices, with all their metadata collected into a sub-list, so I get the following result set:
[
{
"device_id": "device-01",
"hubs": [
{
"id": "hub-01",
"created": "2020-12-08T17:47:35",
"cohort": "test"
},
{
"id": "hub-02",
"created": "2020-12-08T17:47:36",
"cohort": "test"
}
]
},
{
"device_id": "device-02",
"hubs": [
{
"id": "hub-01",
"created": "2020-11-17T20:25:20",
"cohort": "test"
}
]
},
{
"device_id": "device-03",
"hubs": [
{
"id": "hub-01",
"created": "2020-11-17T16:05:18",
"cohort": "test"
}
]
}
]
I was experimenting along the lines of the following sub-query, but it does not behave as I would expect:
SELECT
DISTINCT c.device_id,
ARRAY(
SELECT
c2.id,
c2.created,
c2.cohort
FROM c AS c2
WHERE c2.device_id = c.device_id
) as hubs
FROM c
You can create a UDF to handle this.
Here is a similar question I answered in another post:
group data by same timestamp using cosmos db sql
I agree with Mo B. You need to handle this on the client side. I don't think a UDF can handle this, because a UDF can't combine multiple items into one. I think the closest SQL is something like this:
SELECT
c2.device_id,ARRAY_CONCAT([],c2.hubs)
FROM
(SELECT c.device_id,ARRAY(
SELECT
c.id,
c.created,
c.cohort
FROM c
) as hubs FROM c) as c2
GROUP BY c2.device_id
But ARRAY_CONCAT isn't an aggregate function, and there is no aggregate function that can concatenate arrays.
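Since the grouping has to happen on the client, here is a minimal Python sketch of that step. It assumes the items have already been fetched (for example with a plain SELECT * FROM c); group_hubs_by_device is a hypothetical helper name. Only the listed fields are copied into each hub so that Cosmos DB system properties such as _rid or _ts stay out of the result.

```python
from typing import Any, Dict, Iterable, List, Sequence


def group_hubs_by_device(
    items: Iterable[Dict[str, Any]],
    fields: Sequence[str] = ("id", "created", "cohort"),
) -> List[Dict[str, Any]]:
    """Collect items that share a device_id into one {"device_id", "hubs"} record.

    Devices keep the order in which they are first seen, because Python
    dicts preserve insertion order (3.7+).
    """
    groups: Dict[str, List[Dict[str, Any]]] = {}
    for item in items:
        hub = {f: item[f] for f in fields if f in item}
        groups.setdefault(item["device_id"], []).append(hub)
    return [{"device_id": d, "hubs": hubs} for d, hubs in groups.items()]


items = [
    {"id": "hub-01", "device_id": "device-01", "created": "2020-12-08T17:47:35", "cohort": "test"},
    {"id": "hub-02", "device_id": "device-01", "created": "2020-12-08T17:47:36", "cohort": "test"},
    {"id": "hub-01", "device_id": "device-02", "created": "2020-11-17T20:25:20", "cohort": "test"},
    {"id": "hub-01", "device_id": "device-03", "created": "2020-11-17T16:05:18", "cohort": "test"},
]
result = group_hubs_by_device(items)
print(result)
```

For large containers, querying one partition (device_id) at a time would keep each group's fetch small, at the cost of more round trips.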

Cosmos DB query syntax WHERE clause with array in array

The following json represents two documents in a Cosmos DB container.
How can I write a query that gets any document that has an item with an id of item_1 and a value of bar?
I've looked into ARRAY_CONTAINS, but I can't get it to work with arrays inside arrays.
I've also tried some things with any. I can't find any documentation on how to use it, but any seems to be a valid function, since I get syntax highlighting for it in the Cosmos DB explorer in the Azure Portal.
For the any function I tried things like SELECT * FROM c WHERE c.pages.any(p, p.items.any(i, i.id = "item_1" AND i.value = "bar")).
The id fields are unique so if it's easier to find any document that contains any object with the right id and value, that would be fine too.
[
{
"type": "form",
"id": "form_a",
"pages": [
{
"name": "Page 1",
"id": "page_1",
"items": [
{
"id": "item_1",
"value": "foo"
}
]
}
]
},
{
"type": "form",
"id": "form_b",
"pages": [
{
"name": "Page 1",
"id": "page_1",
"items": [
{
"id": "item_1",
"value": "bar"
}
]
}
]
}
]
I think a JOIN can handle a WHERE clause on an array inside an array. Please test the SQL below:
SELECT c.id FROM c
join pages in c.pages
where array_contains(pages.items,{"id": "item_1","value": "bar"},true)
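For the sample data, only form_b has an item with id item_1 and value bar, so the query should return form_b alone. The Python sketch below mirrors that logic locally: one row per (document, page) pair for the JOIN, then a partial-match test on each page's items. partial_match is a hypothetical, simplified stand-in for ARRAY_CONTAINS with its third argument set to true, and it only handles flat object patterns like the one in the query.

```python
from typing import Any, Dict, List


def partial_match(element: Any, pattern: Dict[str, Any]) -> bool:
    """Rough stand-in for ARRAY_CONTAINS(arr, pattern, true) with a flat pattern:
    every key in `pattern` must exist in `element` with an equal value."""
    return isinstance(element, dict) and all(
        key in element and element[key] == value for key, value in pattern.items()
    )


def find_form_ids(docs: List[Dict[str, Any]], pattern: Dict[str, Any]) -> List[Dict[str, str]]:
    """Mimic: SELECT c.id FROM c JOIN pages IN c.pages
    WHERE array_contains(pages.items, pattern, true)"""
    rows = []
    for c in docs:
        for page in c.get("pages", []):  # JOIN: one candidate row per page
            if any(partial_match(item, pattern) for item in page.get("items", [])):
                rows.append({"id": c["id"]})
    return rows


docs = [
    {"type": "form", "id": "form_a", "pages": [
        {"name": "Page 1", "id": "page_1", "items": [{"id": "item_1", "value": "foo"}]}]},
    {"type": "form", "id": "form_b", "pages": [
        {"name": "Page 1", "id": "page_1", "items": [{"id": "item_1", "value": "bar"}]}]},
]
print(find_form_ids(docs, {"id": "item_1", "value": "bar"}))
```

Because the JOIN produces one row per matching page, a form with several matching pages would appear more than once; selecting DISTINCT c.id or de-duplicating on the client avoids that.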

CosmosDB, help flatten and filter by nested array

I'm trying to flatten and filter JSON data stored in Cosmos DB.
The data looks like the sample below. I would like to flatten everything in the Variables array and then filter by a specific _id and Timestamp inside the array:
{
"_id": 21032,
"FirstConnected": {
"$date": 1522835868346
},
"LastUpdated": {
"$date": 1523360279908
},
"Variables": [
{
"_id": 99999,
"Values": [
{
"Timestamp": {
"$date": 1522835868347
},
"Value": 1
}
]
},
{
"_id": 99998,
"Values": [
{
"Timestamp": {
"$date": 1523270312001
},
"Value": 8888
}
]
}
]
}
If you want to flatten data from the Variables array with properties from the root object you can query your collection like this:
SELECT root._id, root.FirstConnected, root.LastUpdated, var.Values
FROM root
JOIN var IN root.Variables
WHERE var._id = 99998
This will result into:
[
{
"_id": 21032,
"FirstConnected": {
"$date": 1522835868346
},
"LastUpdated": {
"$date": 1523360279908
},
"Values": [
{
"Timestamp": {
"$date": 1523270312001
},
"Value": 8888
}
]
}
]
If you also want to flatten the Values array, you need to write something like this:
SELECT root._id, root.FirstConnected, root.LastUpdated,
var.Values[0].Timestamp, var.Values[0]["Value"]
FROM root
JOIN var IN root.Variables
WHERE var._id = 99998
Note that Cosmos DB treats "Value" as a reserved keyword, so you need to use the escape syntax (bracket notation, ["Value"]). The result for this query is:
[
{
"_id": 21032,
"FirstConnected": {
"$date": 1522835868346
},
"LastUpdated": {
"$date": 1523360279908
},
"Timestamp": {
"$date": 1523270312001
},
"Value": 8888
}
]
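The query above reads only the first element of Values (Values[0]). A second JOIN over var.Values should yield one row per value instead, and the question also asks to filter by Timestamp. The Python sketch below mirrors that double flattening on the sample document; flatten_values and its since parameter (a lower bound on Timestamp["$date"], in epoch milliseconds) are hypothetical names chosen for illustration.

```python
from typing import Any, Dict, List, Optional


def flatten_values(
    root: Dict[str, Any], variable_id: int, since: Optional[int] = None
) -> List[Dict[str, Any]]:
    """One output row per entry of Variables[*].Values for the chosen variable."""
    rows = []
    for var in root.get("Variables", []):      # like JOIN var IN root.Variables
        if var.get("_id") != variable_id:       # like WHERE var._id = variable_id
            continue
        for val in var.get("Values", []):       # like a second JOIN over var.Values
            if since is not None and val["Timestamp"]["$date"] < since:
                continue                        # optional Timestamp filter
            rows.append({
                "_id": root["_id"],
                "FirstConnected": root["FirstConnected"],
                "LastUpdated": root["LastUpdated"],
                "Timestamp": val["Timestamp"],
                "Value": val["Value"],
            })
    return rows


doc = {
    "_id": 21032,
    "FirstConnected": {"$date": 1522835868346},
    "LastUpdated": {"$date": 1523360279908},
    "Variables": [
        {"_id": 99999, "Values": [{"Timestamp": {"$date": 1522835868347}, "Value": 1}]},
        {"_id": 99998, "Values": [{"Timestamp": {"$date": 1523270312001}, "Value": 8888}]},
    ],
}
print(flatten_values(doc, 99998))
```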
See https://learn.microsoft.com/en-us/azure/cosmos-db/sql-api-sql-query#Advanced for more details.
If you only need to filter by the nested _id property, you can use ARRAY_CONTAINS with the partial-match argument (the third argument) set to true. The query would look something like this:
SELECT VALUE c
FROM c
WHERE ARRAY_CONTAINS(c.Variables, {_id: 99998}, true)
If you also want to flatten the array, you can iterate over it with IN in the FROM clause, which has the same effect as a JOIN here:
SELECT VALUE v
FROM v IN c.Variables
WHERE v._id = 99998

How can I write a DocumentDB query that returns documents whose sub-array satisfies any value in an input list?

Let's assume that I have the following documents:
[
{
"name" : "obj1",
"field": [ "Foo1", "Foo3" ]
},
{
"name": "obj2",
"field": [ "Foo2" ]
},
{
"name": "obj3",
"field": [ "Foo3" ]
},
{
"name": "obj4",
"field": [ "Foo1" ]
}
]
I want to write a query that returns obj1, obj3, and obj4 when field is searched for "Foo1" or "Foo3". Obviously, I can write something like:
SELECT * FROM c WHERE ARRAY_CONTAINS(c.field, "Foo1") OR ARRAY_CONTAINS(c.field, "Foo3")
However, I want to avoid building a long query string by concatenating an ARRAY_CONTAINS clause for each value in the search list.
How can this query be expressed succinctly?
You could rewrite the query using JOIN as follows:
SELECT c
FROM c
JOIN tag IN c.field
WHERE ARRAY_CONTAINS(["Foo1", "Foo3"], tag)
Note that if you have an object with both tags, then it would occur multiple times in the result, and you have to perform distinct/de-duping on the client side.
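The client-side de-duplication mentioned above could look like the Python sketch below. join_filter mimics the JOIN query on the sample documents (one row per matching tag, so obj1 shows up twice), and dedupe keeps the first row per key. Both are hypothetical helper names, and de-duplicating by name is an assumption that fits this sample; with real Cosmos DB documents, id (together with the partition key) would be the more natural key.

```python
from typing import Any, Dict, List, Set


def join_filter(docs: List[Dict[str, Any]], wanted: List[str]) -> List[Dict[str, Any]]:
    """Mimic: SELECT c FROM c JOIN tag IN c.field WHERE ARRAY_CONTAINS(wanted, tag)"""
    rows = []
    for c in docs:
        for tag in c["field"]:   # JOIN: one row per tag in the document
            if tag in wanted:    # ARRAY_CONTAINS(wanted, tag)
                rows.append(c)
    return rows


def dedupe(rows: List[Dict[str, Any]], key: str = "name") -> List[Dict[str, Any]]:
    """Keep the first row for each value of `key`, preserving order."""
    seen: Set[Any] = set()
    out = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            out.append(row)
    return out


docs = [
    {"name": "obj1", "field": ["Foo1", "Foo3"]},
    {"name": "obj2", "field": ["Foo2"]},
    {"name": "obj3", "field": ["Foo3"]},
    {"name": "obj4", "field": ["Foo1"]},
]
rows = join_filter(docs, ["Foo1", "Foo3"])
print([r["name"] for r in rows])
print([r["name"] for r in dedupe(rows)])
```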
