Cosmos DB Query Projection - azure-cosmosdb

I have a Cosmos DB document in the format below:
(Note: Example is just for illustration, the actual document has different properties)
{
"OrderNumber":"12345",
"Version":"5",
"OrderDetails": [
{
"id":1,
"OrderItem": {
"Name": "ABC",
"Description": "ABC Description",
.
.
.
},
"PaymentDetails": {
},
.
.
.
},
{
"id":2,
"OrderItem": {
"Name": "PQR",
"Description": "PQR Description",
.
.
.
},
"PaymentDetails": {
},
.
.
.
},
],
"OrderDate": "12-01-2020",
"CustomerDetails": {
}
.
.
.
}
OrderNumber is the partition key.
I'm trying to project the above document with a Cosmos DB query so that each OrderDetails element, in full, is combined with a few properties from the parent into a single JSON object, filtered on OrderDetails -> id.
The expected projection should be:
{
"Order":
[
{
"OrderNumber":"12345",
"Version":"5",
"id":1,
"OrderItem": {
"Name": "ABC",
"Description": "ABC Description",
.
.
.
},
"PaymentDetails": {
},
.
.
.
},
{
"OrderNumber":"12345",
"Version":"5",
"id":2,
"OrderItem": {
"Name": "PQR",
"Description": "PQR Description",
.
.
.
},
"PaymentDetails": {
},
.
.
.
},
]
}
As the JSON above shows, I want to project the OrderNumber and Version properties into each OrderDetails element and return the result as Order.
OrderDetails is a massive JSON object with a dynamic schema, which makes it difficult for us to project its properties individually.
I have tried a few options, but the closest I got was the query below:
SELECT c.OrderNumber, c.Version, o AS Order
FROM c
JOIN o IN c.OrderDetails
WHERE c.OrderNumber = '12345' AND ARRAY_CONTAINS([1,2], o.id)
However, this query does not give the desired result, because it keeps OrderNumber and Version as separate properties alongside Order rather than inside it.
Is there any way I can achieve this?

You can use a user-defined function (UDF).
First, create a UDF with the following code:
function Converted(version, orderNumber, orderItem) {
    // Copy the parent properties onto the OrderDetails element.
    orderItem.Version = version;
    orderItem.OrderNumber = orderNumber;
    return orderItem;
}
Second, run this SQL:
SELECT VALUE udf.Converted(c.Version, c.OrderNumber, o)
FROM c
JOIN o IN c.OrderDetails
WHERE c.OrderNumber = '12345' AND ARRAY_CONTAINS([1,2], o.id)
Finally, add this result to a new JSON object in your client code.
Here is the result:
[
{
"id": 1,
"OrderItem": {
"Name": "ABC",
"Description": "ABC Description"
},
"PaymentDetails": {},
"Version": "5",
"OrderNumber": "12345"
},
{
"id": 2,
"OrderItem": {
"Name": "PQR",
"Description": "PQR Description"
},
"PaymentDetails": {},
"Version": "5",
"OrderNumber": "12345"
}
]
Does this meet your requirements?
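If the whole document is already available on the client, the same projection, including the final wrapping step, can be sketched without a UDF. This is only an illustration in plain JavaScript, not SDK code: `projectOrder` and `sampleOrder` are hypothetical names, and the sample document is a trimmed copy of the one in the question.

```javascript
// Hypothetical helper: client-side equivalent of the UDF query
// followed by wrapping the rows under an "Order" key.
function projectOrder(doc, ids) {
  const rows = doc.OrderDetails
    // Same filter as ARRAY_CONTAINS([1,2], o.id) in the query.
    .filter((detail) => ids.includes(detail.id))
    // The spread copies every property of the element, whatever its schema.
    .map((detail) => ({
      OrderNumber: doc.OrderNumber,
      Version: doc.Version,
      ...detail,
    }));
  return { Order: rows };
}

// Trimmed sample document (assumed shape, taken from the question).
const sampleOrder = {
  OrderNumber: "12345",
  Version: "5",
  OrderDetails: [
    { id: 1, OrderItem: { Name: "ABC", Description: "ABC Description" }, PaymentDetails: {} },
    { id: 2, OrderItem: { Name: "PQR", Description: "PQR Description" }, PaymentDetails: {} },
  ],
  OrderDate: "12-01-2020",
};

const projection = projectOrder(sampleOrder, [1, 2]);
console.log(JSON.stringify(projection, null, 2));
```

Because the element is copied with a spread, no individual OrderDetails property has to be named, which matters when the schema is dynamic.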

I know this is a late answer and you have probably solved the problem by now, but I'm posting the query for others who may face a similar issue.
Cosmos DB lets you reshape the output JSON directly in the SELECT clause. See the query below for the problem above.
Query:
SELECT
[
{
"OrderNumber": c.OrderNumber,
"Version": c.Version,
"id": od.id,
"OrderItem": {
"Name": od.OrderItem.Name,
"Description": od.OrderItem.Description
}
}
] AS Orders
FROM c
JOIN od IN c.OrderDetails
Happy coding!

Related

Combine multiple json to single json using jq

I am new to jq and have been stuck on this problem for a while. Any help is appreciated.
I have two json files,
In file1.json:
{
"version": 4,
"group1": [
{
"name":"olditem1",
"content": "old content"
}
],
"group2": [
{
"name":"olditem2"
}
]
}
And in file2.json:
{
"group1": [
{
"name" : "newitem1"
},
{
"name":"olditem1",
"content": "new content"
}
],
"group2": [
{
"name" : "newitem2"
}
]
}
Expected result is:
{
"version": 4,
"group1": [
{
"name":"olditem1",
"content": "old content"
},
{
"name" : "newitem1"
}
],
"group2": [
{
"name":"olditem2"
},
{
"name" : "newitem2"
}
]
}
Criteria for the merge:
Only group1 and group2 have to be merged.
Items are matched by name only.
I have tried
jq -S '.group1+=.group1|.group1|unique_by(.name)' file1.json file2.json
but this outputs only the filtered group1, and all other information is lost.
This approach uses INDEX to build a dictionary of unique elements keyed by their .name field and reduce to iterate over the group fields to be merged. The initial state is created by combining the slurped (-s) input files with add, after using del to remove the group fields that are processed separately.
jq -s '
[ "group1", "group2" ] as $gs | . as $in | reduce $gs[] as $g (
map(del(.[$gs[]])) | add; .[$g] = [INDEX($in[][$g][]; .name)[]]
)
' file1.json file2.json
Output:
{
"version": 4,
"group1": [
{
"name": "olditem1",
"content": "new content"
},
{
"name": "newitem1"
}
],
"group2": [
{
"name": "olditem2"
},
{
"name": "newitem2"
}
]
}
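Note that INDEX keeps the last element seen for each name, which is why the output above shows "new content" for olditem1 while the expected result in the question shows "old content". A variant that keeps the first occurrence instead can be sketched as follows. This is a plain JavaScript illustration of the same merge rule, not a jq program; `mergeGroups` is a hypothetical helper name, and the two objects mirror file1.json and file2.json.

```javascript
// Hypothetical helper: merge the named array fields of several objects,
// keeping the first element seen for each distinct `name`.
function mergeGroups(inputs, groupKeys) {
  const result = {};
  // Copy every non-group field; later inputs overwrite earlier ones.
  for (const input of inputs) {
    for (const [key, value] of Object.entries(input)) {
      if (!groupKeys.includes(key)) result[key] = value;
    }
  }
  // Merge each group field across all inputs, matching by name only.
  for (const key of groupKeys) {
    const byName = new Map(); // Map preserves insertion order
    for (const input of inputs) {
      for (const item of input[key] ?? []) {
        if (!byName.has(item.name)) byName.set(item.name, item);
      }
    }
    result[key] = [...byName.values()];
  }
  return result;
}

const file1 = {
  version: 4,
  group1: [{ name: "olditem1", content: "old content" }],
  group2: [{ name: "olditem2" }],
};
const file2 = {
  group1: [{ name: "newitem1" }, { name: "olditem1", content: "new content" }],
  group2: [{ name: "newitem2" }],
};

const merged = mergeGroups([file1, file2], ["group1", "group2"]);
console.log(JSON.stringify(merged, null, 2));
```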

How to query Cosmos DB to have an array from multiple items in the result set

I have the following content in a container, where device_id is the partition key.
[
{
"id": "hub-01",
"device_id": "device-01",
"created": "2020-12-08T17:47:35",
"cohort": "test"
},
{
"id": "hub-02",
"device_id": "device-01",
"created": "2020-12-08T17:47:36",
"cohort": "test"
},
{
"id": "hub-01",
"device_id": "device-02",
"created": "2020-11-17T20:25:20",
"cohort": "test"
},
{
"id": "hub-01",
"device_id": "device-03",
"created": "2020-11-17T16:05:18",
"cohort": "test"
}
]
How do I query all unique devices, with all their metadata collected into a sub-list, so I get the following result set:
[
{
"device_id": "device-01",
"hubs": [
{
"id": "hub-01",
"created": "2020-12-08T17:47:35",
"cohort": "test"
},
{
"id": "hub-02",
"created": "2020-12-08T17:47:36",
"cohort": "test"
}
]
},
{
"device_id": "device-02",
"hubs": [
{
"id": "hub-01",
"created": "2020-11-17T20:25:20",
"cohort": "test"
}
]
},
{
"device_id": "device-03",
"hubs": [
{
"id": "hub-01",
"created": "2020-11-17T16:05:18",
"cohort": "test"
}
]
}
]
I was experimenting along the lines of the following sub-query, but it does not behave as I would expect:
SELECT
DISTINCT c.device_id,
ARRAY(
SELECT
c2.id,
c2.created,
c2.cohort
FROM c AS c2
WHERE c2.device_id = c.device_id
) as hubs
FROM c
You can create a UDF to handle this.
Here is a similar question I answered in another post:
group data by same timestamp using cosmos db sql
I agree with Mo B. that you need to deal with this on the client side. I don't think a UDF can handle it, because a UDF can't combine multiple items into one. The closest SQL I can think of is this:
SELECT
c2.device_id,ARRAY_CONCAT([],c2.hubs)
FROM
(SELECT c.device_id,ARRAY(
SELECT
c.id,
c.created,
c.cohort
FROM c
) as hubs FROM c) as c2
GROUP BY c2.device_id
However, ARRAY_CONCAT is not an aggregate function, and there is no aggregate function that can concatenate arrays.
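The client-side grouping suggested above can be sketched as follows. This is a minimal illustration, assuming the items have already been fetched with a flat query such as `SELECT c.id, c.device_id, c.created, c.cohort FROM c`; `groupByDevice` is a hypothetical helper name and the array literal stands in for the query result.

```javascript
// Hypothetical helper: collect items that share a device_id
// into one entry with a `hubs` sub-list.
function groupByDevice(items) {
  const byDevice = new Map(); // Map preserves first-seen device order
  for (const { device_id, ...hub } of items) {
    if (!byDevice.has(device_id)) {
      byDevice.set(device_id, { device_id, hubs: [] });
    }
    // `hub` holds every property except device_id.
    byDevice.get(device_id).hubs.push(hub);
  }
  return [...byDevice.values()];
}

// Stand-in for the rows returned by the flat query.
const items = [
  { id: "hub-01", device_id: "device-01", created: "2020-12-08T17:47:35", cohort: "test" },
  { id: "hub-02", device_id: "device-01", created: "2020-12-08T17:47:36", cohort: "test" },
  { id: "hub-01", device_id: "device-02", created: "2020-11-17T20:25:20", cohort: "test" },
  { id: "hub-01", device_id: "device-03", created: "2020-11-17T16:05:18", cohort: "test" },
];

const devices = groupByDevice(items);
console.log(JSON.stringify(devices, null, 2));
```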

Search for available items for booking

I am working on an online booking system for items.
I am using MongoDB to store booking and item details.
Item
{
"id": "3",
"name": "",
"description": "",
"extra": [{}]
}
Booking
{
"id": "",
"itemId": "",
"startDate": millis,
"endDate": millis,
"status": "",
"userId": ""
}
I have to implement a search between dates. The search should return only the items available for the specified period. How can I build a scalable search for this? I am also planning to use Elasticsearch for search. Suggestions involving other technologies are welcome too.
I'd suggest making the booking the base object and putting the item info inside it. That is to say:
Set up mapping:
PUT bookings
{
"mappings": {
"properties": {
"id": {
"type": "keyword"
},
"item": {
"properties": {
"id": {
"type": "keyword"
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"description": {
"type": "text"
},
"extra": {
"type": "nested"
}
}
},
"startDate": {
"type": "date",
"format": "epoch_millis"
},
"endDate": {
"type": "date",
"format": "epoch_millis"
},
"status": {
"type": "keyword"
},
"userId": {
"type": "keyword"
}
}
}
}
Ingest the simplest booking:
POST bookings/_doc
{
"item": {
"id": "987"
},
"startDate": 1587110540025,
"endDate": 1587220730025
}
Restricting the *Date fields and only returning the corresponding item:
GET bookings/_search
{
"_source": "item",
"query": {
"bool": {
"must": [
{
"range": {
"startDate": {
"gte": "17/04/2020",
"format": "dd/MM/yyyy"
}
}
},
{
"range": {
"endDate": {
"lte": "18/04/2020",
"format": "dd/MM/yyyy"
}
}
}
]
}
}
}
Note that although our date fields are defined as epoch_millis, we can still query using human-readable date strings, provided we specify the format. You can of course use milliseconds if you prefer.
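The range query above matches bookings that lie entirely inside the requested period. For availability, a booking that merely overlaps the period also blocks the item, so the check that matters is interval overlap. The sketch below shows that rule in plain JavaScript under stated assumptions: bookings use the `itemId`, `startDate`, and `endDate` (epoch milliseconds) fields from the question, both ends of each interval are treated as inclusive, and `overlaps` and `availableItemIds` are hypothetical helper names with made-up sample values.

```javascript
// Two closed intervals overlap when each one starts
// no later than the other one ends.
function overlaps(booking, from, to) {
  return booking.startDate <= to && booking.endDate >= from;
}

// Hypothetical helper: ids of items with no booking overlapping [from, to].
function availableItemIds(itemIds, bookings, from, to) {
  const booked = new Set(
    bookings.filter((b) => overlaps(b, from, to)).map((b) => b.itemId)
  );
  return itemIds.filter((id) => !booked.has(id));
}

// Made-up sample data for illustration only.
const itemIds = ["1", "2", "3"];
const bookings = [
  { itemId: "1", startDate: 100, endDate: 200 },
  { itemId: "2", startDate: 300, endDate: 400 },
];

// Item 1 is booked during part of [150, 250]; items 2 and 3 are free.
const available = availableItemIds(itemIds, bookings, 150, 250);
console.log(available);
```

The same two comparisons translate into a pair of range clauses (startDate lte the period end, endDate gte the period start) if the check is pushed into the search engine instead.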
Alternatively, you can check bookings while indexing items into Elasticsearch. Suppose you are indexing items that you read from MongoDB: you can also fetch the bookings for each item and add a field such as bookingCount to the item document in Elasticsearch. When searching, you can then use the bookingCount field to find items that have no bookings.
In general, indexing is an asynchronous operation, so you can use a queue. This reduces latency for user operations and gives you a place to do whatever processing you need. For example, you can compute a summary of the bookings and store it inside the item:
{
"id": "3",
"name": "",
"description": "",
"extra": [{}],
"bookingCount": "",
"bookingsByStatus": {
"status_1": 1233,
"status_2": 1233,
...
}
}
But this is a business decision. After any update to items or bookings, you need to update the item in the Elasticsearch index. You can also use another solution, like the one mentioned by @jzzfs.

Document Db query filter for an attribute that contains an array

With the sample JSON shown below, I am trying to retrieve all documents that contain at least one category (an array object nested under Categories) whose Text value contains 'drinks', using the following query, but the returned result is empty. Can someone help me get this right?
SELECT items.id
,items.description
,items.Categories
FROM items
WHERE ARRAY_CONTAINS(items.Categories.Category.Text, "drink")
{
"id": "1dbaf1d0-6549-11a0-88a8-001256957023",
"Categories": {
"Category": [{
"Type": "GS1",
"Id": "10000266",
"Text": "Stimulants/Energy Drinks Ready to Drink"
}, {
"Type": "GS2",
"Id": "10000266",
"Text": "Healthy Drink"
}]
}
},
Note: The JSON is a bit weird in that the array is wrapped by an object. It was converted from XML, hence the shape. Please assume I have no control over how this object is saved as JSON.
You need to flatten the document in your query to get the result you want by joining the array back to the main document. The query you want would look like this:
SELECT items.id, items.Categories
FROM items
JOIN Category IN items.Categories.Category
WHERE CONTAINS(LOWER(Category.Text), "drink")
However, because DocumentDB had no concept of a DISTINCT query when this was written, this will produce duplicates equal to the number of Category items that contain the word "drink". So this query would produce your example document twice, like this:
[
{
"id": "1dbaf1d0-6549-11a0-88a8-001256957023",
"Categories": {
"Category": [
{
"Type": "GS1",
"Id": "10000266",
"Text": "Stimulants/Energy Drinks Ready to Drink"
},
{
"Type": "GS2",
"Id": "10000266",
"Text": "Healthy Drink"
}
]
}
},
{
"id": "1dbaf1d0-6549-11a0-88a8-001256957023",
"Categories": {
"Category": [
{
"Type": "GS1",
"Id": "10000266",
"Text": "Stimulants/Energy Drinks Ready to Drink"
},
{
"Type": "GS2",
"Id": "10000266",
"Text": "Healthy Drink"
}
]
}
}
]
This could be problematic and expensive if the Categories array holds a lot of Category items that have "drink" in them.
You can cut that down if you are only interested in a single Category by changing the query to:
SELECT items.id, Category
FROM items
JOIN Category IN items.Categories.Category
WHERE CONTAINS(LOWER(Category.Text), "drink")
This produces a more concise result in which only the id field is repeated and each matching Category item shows up once:
[{
"id": "1dbaf1d0-6549-11a0-88a8-001256957023",
"Category": {
"Type": "GS1",
"Id": "10000266",
"Text": "Stimulants/Energy Drinks Ready to Drink"
}
},
{
"id": "1dbaf1d0-6549-11a0-88a8-001256957023",
"Category": {
"Type": "GS2",
"Id": "10000266",
"Text": "Healthy Drink"
}
}]
Otherwise, you will have to filter the results when you get them back from the query to remove duplicate documents.
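That client-side filtering can be sketched as follows. This is a minimal illustration, assuming the query results are already in memory as an array and that `id` identifies a document; `dedupeById` is a hypothetical helper name and the sample array is a shortened stand-in for the duplicated result shown earlier.

```javascript
// Hypothetical helper: keep the first copy of each document,
// using `id` as the identity.
function dedupeById(docs) {
  const seen = new Map(); // Map preserves first-seen order
  for (const doc of docs) {
    if (!seen.has(doc.id)) seen.set(doc.id, doc);
  }
  return [...seen.values()];
}

// Shortened stand-in for the JOIN result, which repeats the document
// once per matching Category item.
const joined = [
  { id: "1dbaf1d0-6549-11a0-88a8-001256957023", Categories: { Category: [] } },
  { id: "1dbaf1d0-6549-11a0-88a8-001256957023", Categories: { Category: [] } },
];

const unique = dedupeById(joined);
console.log(unique.length);
```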
If it were me and I was building a production system with this requirement, I'd use Azure Search. Here is some info on hooking it up to DocumentDB.
If you don't want to do that and we must live with the constraint that you can't change the shape of the documents, the only way I can think to do this is to use a User Defined Function (UDF) like this:
function GetItemsWithMatchingCategories(categories, matchingString) {
    // Expects the array of category objects; anything else never matches.
    if (!Array.isArray(categories)) {
        return false;
    }
    var lowerMatchingString = matchingString.toLowerCase();
    for (var index = 0; index < categories.length; index++) {
        var text = categories[index].Text;
        // Case-insensitive substring match on the category text.
        if (typeof text === "string" && text.toLowerCase().indexOf(lowerMatchingString) >= 0) {
            return true;
        }
    }
    return false;
}
Note, the code above was modified by the asker after actually trying it out so it's somewhat tested.
You would use it with a query like this:
SELECT * FROM items WHERE udf.GetItemsWithMatchingCategories(items.Categories.Category, "drink")
Also, note that this will result in a full table scan (unless you can combine it with other criteria that can use an index) which may or may not meet your performance/RU limit constraints.

JsonSchemaGenerator adding Minimum

I'm trying to generate a schema using Newtonsoft JsonSchemaGenerator similar to the following schema
{
"title": "Example Schema",
"type": "object",
"properties": {
"firstName": {
"type": "string"
},
"lastName": {
"type": "string"
},
"age": {
"description": "Age in years",
"type": "integer",
"minimum": 0
}
}
}
I'm trying to create a JsonConverter and set it on the ContractResolver, but I cannot find a way to bind it to a specific property.
Tracing the code, I see that GenerateInternal on JsonSchemaGenerator (via GenerateObjectSchema) loops through contract.Properties and calls itself recursively, but passes only property.PropertyType to itself:
foreach (JsonProperty property in contract.Properties)
{
. . .
JsonSchema propertySchema = GenerateInternal(property.PropertyType, property.Required, !optional);
. . .
}
private JsonSchema GenerateInternal(Type type, Required valueRequired, bool required)
So if I have a class with 2 integer properties with different ranges it won't be able to create different schemas for each.

Resources