I am working with Cosmos DB and I want to write a SQL query that finds the values common to the arrays of several documents, selected by document id.
To elaborate, imagine you have the following three documents:
{
"id": "2ECF4568-CB0E-4E11-A5CD-1206638F9C39",
"entityType": "ServiceInformationFacility",
"facilities": [
{
"id": "6F706BA3-27AD-45B8-9831-A531E37C4C17",
"facilityName": "Kat Service Center",
"phoneNumber": "9879561234"
},
{
"id": "7F706BA3-27AD-45B8-9831-A531E37C4C17",
"facilityName": "Honda Service Center",
"phoneNumber": "9879561234"
}]
},
{
"id": "3ECF4568-CB0E-4E11-A5CD-1206638F9C39",
"entityType": "ServiceInformationFacility",
"facilities": [
{
"id": "8F706BA3-27AD-45B8-9831-A531E37C4C17",
"facilityName": "Hyundai Service Center",
"phoneNumber": "9879561234"
},
{
"id": "7F706BA3-27AD-45B8-9831-A531E37C4C17",
"facilityName": "Honda Service Center",
"phoneNumber": "9879561234"
}]
},
{
"id": "6ECF4568-CB0E-4E11-A5CD-1206638F9C39",
"entityType": "ServiceInformationFacility",
"facilities": [
{
"id": "8F706BA3-27AD-45B8-9831-A531E37C4C17",
"facilityName": "Hyundai Service Center",
"phoneNumber": "9879561234"
},
{
"id": "7F706BA3-27AD-45B8-9831-A531E37C4C17",
"facilityName": "Honda Service Center",
"phoneNumber": "9879561234"
} ]
}
I want to write a query that returns all the facilities common to a given set of documents. That is, when I pass a list of document ids, only the facilities that exist in every one of those documents should be returned (not the facilities that appear in just some of them).
So for the above collection, passing the ids ("2ECF4568-CB0E-4E11-A5CD-1206638F9C39", "3ECF4568-CB0E-4E11-A5CD-1206638F9C39", "6ECF4568-CB0E-4E11-A5CD-1206638F9C39") should return only "facilityName": "Honda Service Center".
So far I have tried:
SELECT q.facilityName FROM c
join q in c.facilities
where c.id in ('6ECF4568-CB0E-4E11-A5CD-1206638F9C39', '2ECF4568-CB0E-4E11-A5CD-1206638F9C39') AND c.entityType = 'ServiceInformationFacility'
It gives me all the facility names, but I need only the facilities that are common to the above documents, that is, "facilityName": "Honda Service Center".
Thanks in advance
It gives me all the facility name but I need only facility which are
common in the above documents that is "facilityName": "Honda Service
Center".
I think I see your point now. However, I'm afraid that is not possible with a single Cosmos DB SQL query. I tried counting how many times each facilityName appears across the documents, and the query below is the closest I could get to what you need.
SQL:
SELECT COUNT(c.id) AS cnt, f.facilityName FROM c
JOIN f IN c.facilities
WHERE ARRAY_CONTAINS(['6ECF4568-CB0E-4E11-A5CD-1206638F9C39', '2ECF4568-CB0E-4E11-A5CD-1206638F9C39'], c.id, true)
AND c.entityType = 'ServiceInformationFacility'
GROUP BY f.facilityName
I then tried to extend it with a subquery, but had no luck. So I'd suggest using a stored procedure to finish the job. The idea is to loop over the above result and check whether cnt equals the length of the ids array.
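To make the count-and-compare idea concrete, here is a minimal in-memory sketch in plain JavaScript. It is not a Cosmos DB API: commonFacilities, sampleDocs and the shortened ids ("doc-2E" and so on) are hypothetical names used only for illustration. One difference from the SQL above is that this sketch counts each facility name at most once per document.

```javascript
// Hypothetical helper: returns the facility names present in every requested document.
function commonFacilities(docs, ids) {
  var wanted = new Set(ids);
  var counts = new Map(); // facilityName -> number of requested documents containing it
  docs.forEach(function (doc) {
    if (!wanted.has(doc.id)) return;
    // De-duplicate names inside one document so that each document counts once.
    var names = new Set((doc.facilities || []).map(function (f) { return f.facilityName; }));
    names.forEach(function (name) {
      counts.set(name, (counts.get(name) || 0) + 1);
    });
  });
  var common = [];
  counts.forEach(function (cnt, name) {
    // A facility is common only if it was seen in all requested documents.
    if (cnt === wanted.size) common.push(name);
  });
  return common;
}

// Shortened stand-ins for the three sample documents (ids are illustrative).
var sampleDocs = [
  { id: "doc-2E", facilities: [{ facilityName: "Kat Service Center" }, { facilityName: "Honda Service Center" }] },
  { id: "doc-3E", facilities: [{ facilityName: "Hyundai Service Center" }, { facilityName: "Honda Service Center" }] },
  { id: "doc-6E", facilities: [{ facilityName: "Hyundai Service Center" }, { facilityName: "Honda Service Center" }] }
];

console.log(commonFacilities(sampleDocs, ["doc-2E", "doc-3E", "doc-6E"]));
```

With all three ids, only "Honda Service Center" reaches a count of three, so it is the only name returned.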
Updated answer with the stored procedure code:
Input parameter for @idArray: ["6ECF4568-CB0E-4E11-A5CD-1206638F9C39","2ECF4568-CB0E-4E11-A5CD-1206638F9C39"]
SP code:
function sample(idArray) {
    var collection = getContext().getCollection();
    var length = idArray.length;
    // Count, per facilityName, how many of the requested documents contain it.
    var sqlQuery = {
        "query": 'SELECT COUNT(c.id) AS cnt, f.facilityName FROM c JOIN f IN c.facilities ' +
            'WHERE ARRAY_CONTAINS(@idArray, c.id, true) ' +
            'AND c.entityType = "ServiceInformationFacility" GROUP BY f.facilityName',
        "parameters": [
            {"name": "@idArray", "value": idArray}
        ]
    };
    // Run the grouped query and filter its result in the callback.
    var isAccepted = collection.queryDocuments(
        collection.getSelfLink(),
        sqlQuery,
        function (err, feed, options) {
            if (err) throw err;
            var response = getContext().getResponse();
            if (!feed || !feed.length) {
                response.setBody('no docs found');
            }
            else {
                // Keep only the facilities whose count equals the number of requested ids.
                var returnArray = [];
                for (var i = 0; i < feed.length; i++) {
                    if (feed[i].cnt == length)
                        returnArray.push(feed[i]);
                }
                response.setBody(returnArray);
            }
        });
    if (!isAccepted) throw new Error('The query was not accepted by the server.');
}
Assume this model:
There is a collection of Writer documents. Each Writer has some Posts, and each Post contains an array of Comments.
JSON:
{
"id": "1",
"partitionKey": "somePK",
"name": "John",
"posts": [
{
"id": "20",
"title": "post1",
"comments": [
{
"body": "some body"
}
]
},
{
"id": "21",
"title": "post2",
"comments": [
{
"body": "some new body"
}
]
}
]
}
I need a query that returns the following output:
[
{
"WriterName": "John",
"WriterCommentsCount": 2
}
]
I managed to get the writer's comment count, but I have a problem getting the name (or other properties) alongside WriterCommentsCount. Any idea how to get the writer's name?
This is what I've tried so far (only the writer's comment count)
Following your approach, if we execute the query
SELECT c.name, ARRAY_LENGTH(post.comments) AS countC FROM c JOIN post IN c.posts WHERE c.id = '1'
we get a response like the one below:
[
{
"name": "John",
"countC": 1
},
{
"name": "John",
"countC": 3
},
{
"name": "John",
"countC": 0
}
]
So SELECT c.name, SUM(ARRAY_LENGTH(post.comments)) AS countC FROM c JOIN post IN c.posts WHERE c.id = '1' returns the error 'Property reference 'c.name' is invalid', because c.name is selected next to an aggregate without being grouped. The solution is to add 'GROUP BY c.name'.
My answer in the comment and the one from @404 agree on this point. The difference is that I followed your approach and used the SUM function, whereas he uses COUNT(1) after expanding all the child items. My query costs 3.09 RUs, and his costs the same on my test data.
The easiest approach is to flatten the structure so that there is one entry per comment. You can then use GROUP BY to group all entries with the same name. Try the following:
SELECT
c.name AS WriterName,
COUNT(1) AS WriterCommentsCount
FROM c
JOIN s IN c.posts
JOIN t IN s.comments
WHERE c.id = '1'
GROUP BY c.name
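As a rough illustration of what the two JOINs plus GROUP BY compute, the same flattening can be sketched in plain JavaScript over an in-memory array. countCommentsByWriter and writers are hypothetical names for this sketch only, not part of any Cosmos DB SDK.

```javascript
// Hypothetical helper: one "row" per comment (like the two JOINs), then a count per writer name.
function countCommentsByWriter(writers) {
  var totals = new Map(); // writer name -> number of comments
  writers.forEach(function (writer) {
    (writer.posts || []).forEach(function (post) {
      (post.comments || []).forEach(function () {
        totals.set(writer.name, (totals.get(writer.name) || 0) + 1);
      });
    });
  });
  // Shape the result like the output requested in the question.
  return Array.from(totals, function (entry) {
    return { WriterName: entry[0], WriterCommentsCount: entry[1] };
  });
}

// The sample document from the question, reduced to the fields used here.
var writers = [
  {
    id: "1",
    name: "John",
    posts: [
      { id: "20", title: "post1", comments: [{ body: "some body" }] },
      { id: "21", title: "post2", comments: [{ body: "some new body" }] }
    ]
  }
];

console.log(countCommentsByWriter(writers));
```

Like the JOIN-based query, this sketch produces no row for a writer who has no comments at all.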
I have the following result from a query, where the count field is derived from an aggregate function:
[
{
"count": 1,
"facilityName": "Hyundai Service Center"
},
{
"count": 2,
"facilityName": "Honda Service Center"
},
{
"count": 1,
"facilityName": "Kat Service Center"
}
]
I want to display only those facilityName values where count >= 2.
How can we achieve this?
I tried to implement your requirement with a stored procedure. Please refer to my SP code:
function sample(idArray) {
    var collection = getContext().getCollection();
    var length = idArray.length;
    // Count, per facilityName, how many of the requested documents contain it.
    var sqlQuery = {
        "query": 'SELECT COUNT(c.id) AS cnt, f.facilityName FROM c JOIN f IN c.facilities ' +
            'WHERE ARRAY_CONTAINS(@idArray, c.id, true) ' +
            'AND c.entityType = "ServiceInformationFacility" GROUP BY f.facilityName',
        "parameters": [
            {"name": "@idArray", "value": idArray}
        ]
    };
    // Run the grouped query and filter its result in the callback.
    var isAccepted = collection.queryDocuments(
        collection.getSelfLink(),
        sqlQuery,
        function (err, feed, options) {
            if (err) throw err;
            var response = getContext().getResponse();
            if (!feed || !feed.length) {
                response.setBody('no docs found');
            }
            else {
                // Keep only the facilities whose count equals the number of requested ids.
                var returnArray = [];
                for (var i = 0; i < feed.length; i++) {
                    if (feed[i].cnt == length)
                        returnArray.push(feed[i]);
                }
                response.setBody(returnArray);
            }
        });
    if (!isAccepted) throw new Error('The query was not accepted by the server.');
}
Input param:
["6ECF4568-CB0E-4E11-A5CD-1206638F9C39","2ECF4568-CB0E-4E11-A5CD-1206638F9C39"]
UPDATES:
If your collection is partitioned, a stored procedure may not be suitable for you, because a partition key is required to execute an SP. Please refer to my detailed explanation in this thread: Delete Documents from Cosmos using Query without Partition Key Specification
Actually, there is no complex logic in the SP code above. It just loops over the result of the query and looks for the objects whose cnt equals idArray.length, which means that the object's facilityName exists in all of the documents.
So you don't have to use an SP; you can use any small piece of code to handle the logic described above.
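For instance, the filtering step could be done on the client after running the grouped query. The sketch below is plain JavaScript and assumes the rows have the shape shown in the question (a count field and a facilityName field); keepWithMinCount and groupedRows are hypothetical names for illustration.

```javascript
// Hypothetical helper: keep only the grouped rows whose count reaches the threshold.
function keepWithMinCount(rows, minCount) {
  return rows.filter(function (row) {
    return row.count >= minCount;
  });
}

// The grouped result from the question.
var groupedRows = [
  { count: 1, facilityName: "Hyundai Service Center" },
  { count: 2, facilityName: "Honda Service Center" },
  { count: 1, facilityName: "Kat Service Center" }
];

var frequent = keepWithMinCount(groupedRows, 2);
console.log(frequent);
```

To find the facilities common to all requested documents instead, the threshold would be the length of the id array.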
Given a collection of Cosmos documents similar to the following, I'd like to generate a grouped (distinct?!?) list of "categories" using Cosmos SQL. Any help in this regard would be greatly appreciated.
[
{
"id": "f0136e76-8e66-6a5a-3790-b577001d6420",
"itemId": "analyze-and-visualize-your-data-with-azure-cosmos-db-notebooks",
"title": "Built-in Jupyter notebooks in Azure Cosmos DB are now available",
"categories": [
"Developer",
"Database",
"Data Science"
]
},
{
"id": "f0136e76-8e66-6a5a-3790-b577001d6420",
"itemId": "analyze-and-visualize-your-data-with-azure-cosmos-db-notebooks",
"title": "Built-in Jupyter notebooks in Azure Cosmos DB are now available",
"categories": [
"Developer",
"Database",
"Data Science"
]
},
{
"id": "d98c1dd4-008f-04b2-e980-0998ecf8427e",
"itemId": "improving-azure-virtual-machines-resiliency-with-project-tardigrade",
"title": "Improving Azure Virtual Machines resiliency with Project Tardigrade",
"categories": [
"Virtual Machines",
"Supportability",
"Monitoring"
]
}
]
GROUP BY was not yet supported by Azure Cosmos DB when this answer was written. As an alternative, you can use a stored procedure to implement your requirement.
Based on the sample documents you have given above, here is a sample stored procedure:
function groupBy() {
    var collection = getContext().getCollection();
    var collectionLink = collection.getSelfLink();
    var isValid = collection.queryDocuments(
        collectionLink,
        'SELECT * FROM stackoverflow s',
        {EnableCrossPartitionQuery: true},
        function (err, feed, options) {
            if (err) throw err;
            var response = getContext().getResponse();
            if (!feed || !feed.length) {
                console.log(JSON.stringify(response));
                response.setBody('no docs found');
            }
            else {
                // Use an object as a set: each category becomes a key exactly once.
                var items = {};
                for (var i = 0; i < feed.length; i++) {
                    var categories = feed[i].categories || [];
                    for (var j = 0; j < categories.length; j++) {
                        items[categories[j]] = categories[j];
                    }
                }
                // Build the distinct list once, after all documents have been processed.
                var distinctArray = [];
                for (var distinctObj in items) {
                    distinctArray.push(items[distinctObj]);
                }
                response.setBody(distinctArray);
            }
        });
    if (!isValid) throw new Error('Kindly check your query, which was not accepted by the server.');
}
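The de-duplication at the heart of the stored procedure can also be tried outside Cosmos DB. Below is a minimal sketch in plain JavaScript that uses a Set instead of an object; distinctCategories and articles are hypothetical names, and the documents are reduced to the categories field.

```javascript
// Hypothetical helper: collect every category once, in first-seen order.
function distinctCategories(docs) {
  var seen = new Set();
  docs.forEach(function (doc) {
    (doc.categories || []).forEach(function (category) {
      seen.add(category);
    });
  });
  return Array.from(seen);
}

// The three sample documents from the question, reduced to their categories.
var articles = [
  { categories: ["Developer", "Database", "Data Science"] },
  { categories: ["Developer", "Database", "Data Science"] },
  { categories: ["Virtual Machines", "Supportability", "Monitoring"] }
];

console.log(distinctCategories(articles));
```

For the sample data this yields six distinct categories, since the first two documents share the same three.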
reference: DocumentDB SQL with ARRAY_CONTAINS
Question: Is there a better and more efficient query for what is happening below?
In the above reference question, the UDF was written to check if an object in the array had a match to the passed in string. In this variant, I am passing into the UDF an array of strings.
I have a working O(N^2) version, but I hope Cosmos DB offers a more efficient solution.
function ScopesContainsNames(scopes, names){
    // Declare n as well, so it does not leak as an implicit global.
    var s, n, _i, _j, _ilen, _jLen;
    for (_i = 0, _ilen = scopes.length; _i < _ilen; _i++) {
        s = scopes[_i];
        for (_j = 0, _jLen = names.length; _j < _jLen; _j++) {
            n = names[_j];
            if (s.name === n) {
                return true;
            }
        }
    }
    return false;
}
My query looks like this:
SELECT * FROM c WHERE udf.ScopesContainsNames(c.scopes, ["apples", "strawberries", "bananas"])
The following is an example of my Document:
{
"scopes": [
{
"name": "apples",
"displayName": "3048b61e-06d8-4dbf-a4ab-d4c2ba0a8943/a"
},
{
"name": "bananas",
"displayName": "3048b61e-06d8-4dbf-a4ab-d4c2ba0a8943/a"
}
],
"enabled": true,
"name": "dc1e4c12-95c1-4b7f-bf27-f60f0c29bf52/a",
"displayName": "218aea3d-4492-447e-93be-2d3646802ac6/a",
"description": "4aa62367-7421-4fb6-88c7-2699c9c309dd/a",
"userClaims": [
"98988d5b-38b5-400c-aecf-da57d2b66433/a"
],
"properties": {
"437d7bab-a4fb-4b1d-b0b9-f5111d01882a/a": "863defc1-c177-4ba5-b699-15f4fee78ea5/a"
},
"id": "677d4a49-a46c-4613-b3f6-f390ab0d013a",
"_rid": "q6I9AOf180hJAAAAAAAAAA==",
"_self": "dbs/q6I9AA==/colls/q6I9AOf180g=/docs/q6I9AOf180hJAAAAAAAAAA==/",
"_etag": "\"00000000-0000-0000-1ede-f2bc622201d5\"",
"_attachments": "attachments/",
"_ts": 1560097098
}
If I understand your requirement correctly, you need the documents where the name property of any element of the scopes array is included in ["apples", "strawberries", "bananas"].
There is no need for a UDF. You can use the following SQL:
SELECT DISTINCT c.scopes FROM c
JOIN fruit IN c.scopes
WHERE ARRAY_CONTAINS(["apples", "strawberries", "bananas"], fruit.name, false)
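If a UDF is still preferred over the JOIN-based query, the nested loops from the question can be replaced with a lookup set, so that each scope is checked in roughly constant time. This is only a sketch: it assumes the UDF runtime supports the ES2015 Set type, and the extra guards for missing arrays are an addition that the original UDF does not have.

```javascript
// Sketch of a set-based variant of the question's UDF (same name and arguments).
function ScopesContainsNames(scopes, names) {
  // Guard against documents without a scopes array (an added assumption).
  if (!Array.isArray(scopes) || !Array.isArray(names)) {
    return false;
  }
  var wanted = new Set(names); // built once, O(M)
  // One pass over the scopes, O(N), with a constant-time lookup per element.
  return scopes.some(function (scope) {
    return Boolean(scope) && wanted.has(scope.name);
  });
}

// Local check with the scopes from the sample document.
var sampleScopes = [{ name: "apples" }, { name: "bananas" }];
console.log(ScopesContainsNames(sampleScopes, ["apples", "strawberries", "bananas"]));
```

The call in the query, udf.ScopesContainsNames(c.scopes, [...]), would stay the same as in the question.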
I have a 'Message' collection in DocumentDb:
{
Subject: "Foo",
To: [
{ "Name": "Me", "Address": "me@company.com" }
],
Cc: [
{ "Name": "You", "Address": "you@company.com" }
]
}
and
{
Subject: "Bar",
To: [
{ "Name": "You", "Address": "you@company.com" }
],
Cc: []
}
I would like to select all documents that have 'you@company.com' as the To or Cc address:
SELECT Message.Subject
FROM Message
JOIN To IN Message.To
JOIN Cc IN Message.Cc
WHERE "you@company.com" IN (To.Address, Cc.Address)
This returns the first document, but not the second.
I believe the JOIN Cc in Message.Cc is causing the second document to be removed from the results because it is empty.
Is there a way I can structure the SQL query to include the second document in the result set?
No, this requires two queries. (You could write it as one query using a user-defined function, but that approach might not be able to use the index effectively.)
SELECT Message.Subject
FROM Message
JOIN To IN Message.To
WHERE To.Address = "you@company.com"
SELECT Message.Subject
FROM Message
JOIN Cc IN Message.Cc
WHERE Cc.Address = "you@company.com"
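With the two-query approach, the two result sets still have to be combined on the client. A minimal sketch in plain JavaScript follows; mergeSubjects, toResults and ccResults are hypothetical names, and de-duplicating by Subject is an assumption made for this example (in practice a unique field such as the document id would be a safer key).

```javascript
// Hypothetical helper: union of both result sets, keeping each Subject once.
function mergeSubjects(toResults, ccResults) {
  var seen = new Set();
  var merged = [];
  toResults.concat(ccResults).forEach(function (row) {
    if (!seen.has(row.Subject)) {
      seen.add(row.Subject);
      merged.push(row.Subject);
    }
  });
  return merged;
}

// What the two queries would return for the two sample documents.
var toResults = [{ Subject: "Bar" }]; // matched through the To array
var ccResults = [{ Subject: "Foo" }]; // matched through the Cc array

console.log(mergeSubjects(toResults, ccResults));
```

A message that lists the address in both To and Cc would come back from both queries, which is why the merge drops repeated entries.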
Alternatively, if you have values for both Name and Address, you could write it as:
SELECT Message.Subject
FROM Message
WHERE ARRAY_CONTAINS(Message.To, { "Name": "You", "Address": "you@company.com" })
OR ARRAY_CONTAINS(Message.Cc, { "Name": "You", "Address": "you@company.com" })
If you'd like to see this added to DocumentDB's query, please support/upvote support for sub-queries at https://feedback.azure.com/forums/263030-documentdb/.