Get multiple counts with one Cosmos DB query? - azure-cosmosdb

Consider these queries:
SELECT COUNT(1) AS failures 
FROM c 
WHERE c.time = 1623332779 AND c.status = 'FAILURE'
SELECT COUNT(1) AS successes 
FROM c 
WHERE c.time = 1623332779 AND c.status = 'SUCCESS'
How can I combine these two distinct queries into one query?
I tried repurposing the answers from "How to get multiple counts with one SQL query?", but ran into a few problems:
COUNT(*) throws an error "Syntax error, incorrect syntax near '*'."
UNION throws "Syntax error, incorrect syntax near 'UNION'."
I also experimented with
SELECT 
SUM(CASE WHEN c.time = 1623332779 THEN 1 else 0 end)
FROM c
but this leads to another syntax error. I noticed that
SELECT COUNT(1) AS mycounter, COUNT(1) AS mycounter2 
FROM c
WHERE c.time = 1623332779
returns
[
{
"mycounter": 3,
"mycounter2": 3
}
]
but I was unable to link these distinct counters to distinct queries.

The following should work. The COUNT aggregate skips values that are undefined, which lets you filter rows out of each count:
SELECT
COUNT(c.status = 'SUCCESS' ? 1 : undefined) AS successes,
COUNT(c.status = 'FAILURE' ? 1 : undefined) AS failures
FROM c
WHERE c.time = 1623332779
It hurts performance, though, because the count doesn't use the index at all. So you're better off using two separate queries. If you really want a single request, you could create a stored procedure that runs both queries and combines the results.

Instead of doing counts of the overall query, you can use GROUP BY to get counts in a single query. For example:
SELECT c.time, c.status, COUNT(c.status) AS statuscount
FROM c
WHERE c.time = 1623332779
GROUP BY c.time, c.status
This won't give you explicit counts named "successes" and "failures", but it will return both counts, something like:
[
{
"time": 1623332779,
"status": "FAILURE",
"statuscount": 123
},
{
"time": 1623332779,
"status": "SUCCESS",
"statuscount": 456
}
]
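If the named counters are still needed, the GROUP BY rows can be pivoted on the client. This is a minimal Python sketch, assuming the rows arrive as a list of dicts with the status and statuscount fields shown above; the helper name pivot_status_counts and the mapping to "successes"/"failures" are illustrative and not part of any SDK.

```python
def pivot_status_counts(rows, key_names=None):
    """Turn GROUP BY rows into one dict of named counters."""
    # Hypothetical mapping from status value to the counter name we want.
    key_names = key_names or {"SUCCESS": "successes", "FAILURE": "failures"}
    # Start every counter at 0 so a status with no documents still appears.
    counts = {name: 0 for name in key_names.values()}
    for row in rows:
        name = key_names.get(row["status"])
        if name is not None:  # ignore statuses we were not asked about
            counts[name] += row["statuscount"]
    return counts


rows = [
    {"status": "FAILURE", "statuscount": 123},
    {"status": "SUCCESS", "statuscount": 456},
]
print(pivot_status_counts(rows))  # {'successes': 456, 'failures': 123}
```

Initialising the counters to zero matters because GROUP BY returns no row at all for a status that never occurs.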

Related

Top N per Classification in CosmosDB

I'm kind of stuck on this issue. I have several hundred documents of a certain model stored in Cosmos DB, and I can't seem to get the top 5 of each category.
This is the model:
"id": "06224840-6b88-4394-9324-4d1628383702",
"name": "Reservation",
"description": null,
"client": null,
"reference": null,
"isMonitoring": false,
"monitoringSince": null,
"hasRiskProfile": false,
"riskProfile": -1,
"monitorFrequency": 0,
"mainBindable": null,
"organizationId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"userId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"createDate": "2020-08-18T11:00:02.5266403Z",
"updateDate": "2020-08-18T11:00:02.5266419Z",
"lastMonitorDate": "2020-08-18T11:00:02.5266427Z"
So what I'm trying to do is use C# to get the top 5 from each risk profile where the organizationId matches. GroupBy through LINQ throws an error, and a ROW_NUMBER() query combined with PARTITION BY doesn't seem to work either.
Any way I can get this to work in a single query compatible with cosmos?
EDIT:
Roughly, what I am trying to achieve in Cosmos DB is this:
WITH TopEntries AS (
SELECT *
,ROW_NUMBER() OVER (
PARTITION BY [riskProfile]
ORDER BY [updateDate] DESC
) AS [ROW NUMBER]
FROM [reservations]
WHERE [organizationId] = 'xyz'
)
SELECT * FROM TopEntries
WHERE TopEntries.[ROW NUMBER] <= 5
It sounds like combining TOP and ORDER BY would do the job. For example:
SELECT TOP 5 *
FROM c
WHERE c.organizationId = "xyz"
ORDER BY c.riskProfile
You can build such queries with parameters in the .NET SDK as in this sample.
What you are trying to achieve is not directly possible with a single query in Cosmos DB. It takes two steps (adapt the property names to your own documents).
First, group by the classification property:
SELECT c.city FROM c WHERE c.org = 'xyz' GROUP BY c.city
Then loop through the results of the first query and run a query like this for each one:
SELECT TOP 5 * FROM c WHERE c.city = 'delhi' ORDER BY c.date DESC
A similar issue is discussed here:
https://learn.microsoft.com/en-us/answers/questions/38454/index.html
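The two-step approach can be sketched in Python as follows, using the field names from the question rather than the city example. The run_query callable stands in for whatever SDK call executes a parameterized query and returns a list of documents; it, the name top_n_per_group, and the in-memory fake_run_query are assumptions for illustration only. The sketch also assumes that TOP accepts a parameter (@n), which is worth checking against your SDK version.

```python
def top_n_per_group(run_query, organization_id, n=5):
    """Return {riskProfile: [up to n newest documents]} using 1 + G queries."""
    # Step 1: find the risk profiles that exist for this organization.
    groups = run_query(
        "SELECT c.riskProfile FROM c WHERE c.organizationId = @org "
        "GROUP BY c.riskProfile",
        {"@org": organization_id},
    )
    top = {}
    # Step 2: one TOP query per risk profile, newest first.
    for group in groups:
        profile = group["riskProfile"]
        top[profile] = run_query(
            "SELECT TOP @n * FROM c WHERE c.organizationId = @org "
            "AND c.riskProfile = @profile ORDER BY c.updateDate DESC",
            {"@org": organization_id, "@profile": profile, "@n": n},
        )
    return top


# In-memory stand-in for the database, for demonstration only.
DOCS = [
    {"id": "a", "organizationId": "xyz", "riskProfile": 1, "updateDate": "2020-08-18"},
    {"id": "b", "organizationId": "xyz", "riskProfile": 1, "updateDate": "2020-08-20"},
    {"id": "c", "organizationId": "xyz", "riskProfile": 1, "updateDate": "2020-08-19"},
    {"id": "d", "organizationId": "xyz", "riskProfile": 2, "updateDate": "2020-08-17"},
    {"id": "e", "organizationId": "other", "riskProfile": 1, "updateDate": "2020-08-21"},
]


def fake_run_query(sql, params):
    """Mimic the two query shapes above without parsing the SQL text."""
    docs = [d for d in DOCS if d["organizationId"] == params["@org"]]
    if "@profile" not in params:  # step 1: distinct groups
        return [{"riskProfile": p} for p in sorted({d["riskProfile"] for d in docs})]
    docs = [d for d in docs if d["riskProfile"] == params["@profile"]]
    docs.sort(key=lambda d: d["updateDate"], reverse=True)
    return docs[: params["@n"]]  # step 2: newest n in the group


result = top_n_per_group(fake_run_query, "xyz", n=2)
print({p: [d["id"] for d in docs] for p, docs in result.items()})
# {1: ['b', 'c'], 2: ['d']}
```

The cost is one query per group plus one, so this fits best when the number of distinct risk profiles is small.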

Query DynamoDB with multiple begins_with clause in AppSync

I'm currently trying to create a dynamic query using AppSync and Apache Velocity Template Language (VTL).
I want to evaluate a series of begins_with conditions combined with "OR".
Such as:
{
"operation": "Query",
"query": {
"expression": "pk = :pk and (begins_with(sk,:sk) or begins_with(sk, :sk1)",
"expressionValues": {
":pk": { "S": "tenant:${context.args.tenantId}" },
":sk": {"S": "my-sort-key-${context.args.evidenceId[0]}"},
":sk1": {"S": "my-sort-key-${context.args.evidenceId[1]}"}
}
}
But that isn't working. I've also tried using | instead of or but it hasn't worked either. I get:
Invalid KeyConditionExpression: Syntax error; token: "|", near: ") | begins_with" (Service: AmazonDynamoDBv2;
How can I achieve this using VTL?
Original answer
You're missing a closing parenthesis after begins_with(sk, :sk1). That is, the third line should be:
"expression": "pk = :pk and (begins_with(sk,:sk) or begins_with(sk, :sk1))"
I just ran the fixed expression and it worked as expected.
Revised
Actually, there are subtleties.
The or operator can be used in a filter expression but not in a key condition expression. For instance, a = :v1 and (b = :v2 or b = :v3) will work as long as a and b are "regular" attributes. If a and b are the table's primary key (partition key, sort key), then DynamoDB will reject the query.
Reading this answer, it seems that this isn't possible, as DynamoDB only accepts a single sort key value and a single operation.
There is also no "OR" condition available in the Query operation's key condition expression:
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html#DDB-Query-request-KeyConditionExpression
If you also want to provide a condition for the sort key, it must be combined using AND with the condition for the sort key. Following is an example, using the = comparison operator for the sort key:
I am going to be restructuring the access pattern to better match my request.

How would I return a count of all rows in the table and then the count of each time a specific status is found?

Please forgive my ignorance on sqlalchemy, up until this point I've been able to navigate the seas just fine. What I'm looking to do is this:
Return a count of how many items are in the table.
Return a count of how many times each status appears in the table.
I'm currently using sqlalchemy, but even a pure sqlite solution would be beneficial in figuring out what I'm missing.
Here is how my table is configured:
class KbStatus(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    status = db.Column(db.String, nullable=False)
It's a very basic table but I'm having a hard time getting back the data I'm looking for. I have this working with 2 separate queries, but I have to believe there is a way to do this all in one query.
Here are the separate queries I'm running:
total = len(cls.query.all())
status_count = cls.query.with_entities(KbStatus.status, func.count(KbStatus.id).label("total")).group_by(KbStatus.status).all()
From here I'm converting it to a dict and combining it to make the output look like so:
{
"data": {
"status_count": {
"Assigned": 1,
"In Progress": 1,
"Peer Review": 1,
"Ready to Publish": 1,
"Unassigned": 4
},
"total_requests": 8
}
}
Any help is greatly appreciated.
I don't know about sqlalchemy, but it's possible to generate the results you want in a single query with pure sqlite using the JSON1 extension:
Given the following table and data:
CREATE TABLE data(id INTEGER PRIMARY KEY, status TEXT);
INSERT INTO data(status) VALUES ('Assigned'),('In Progress'),('Peer Review'),('Ready to Publish')
,('Unassigned'),('Unassigned'),('Unassigned'),('Unassigned');
CREATE INDEX data_idx_status ON data(status);
this query
WITH individuals AS (SELECT status, count(status) AS total FROM data GROUP BY status)
SELECT json_object('data'
, json_object('status_count'
, json_group_object(status, total)
, 'total_requests'
, (SELECT sum(total) FROM individuals)))
FROM individuals;
will return one row holding the following (after running it through a JSON pretty printer; the actual string is more compact):
{
"data": {
"status_count": {
"Assigned": 1,
"In Progress": 1,
"Peer Review": 1,
"Ready to Publish": 1,
"Unassigned": 4
},
"total_requests": 8
}
}
If the sqlite instance you're using wasn't built with support for JSON1:
SELECT status, count(status) AS total FROM data GROUP BY status;
will give
status            total
----------------  -----
Assigned          1
In Progress       1
Peer Review       1
Ready to Publish  1
Unassigned        4
which you can iterate through in Python, inserting each row into your dict and adding up all the total values in another variable as you go to get the total_requests value at the end. There's no need for another query just to calculate that number; do it manually. I bet it's really easy to do the same thing with your existing second sqlalchemy query.
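The loop described above can be sketched with Python's built-in sqlite3 module. This is only an illustration against an in-memory copy of the sample table; the variable names are arbitrary, and the same loop should work over the rows returned by the grouped sqlalchemy query.

```python
import sqlite3

# In-memory database holding the sample data from the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data(id INTEGER PRIMARY KEY, status TEXT);
INSERT INTO data(status) VALUES ('Assigned'),('In Progress'),('Peer Review'),
  ('Ready to Publish'),('Unassigned'),('Unassigned'),('Unassigned'),('Unassigned');
""")

status_count = {}
total_requests = 0
rows = conn.execute(
    "SELECT status, count(status) AS total FROM data GROUP BY status"
)
for status, total in rows:
    status_count[status] = total  # one dict entry per status
    total_requests += total       # running sum replaces the second query

result = {"data": {"status_count": status_count, "total_requests": total_requests}}
print(result)
```

Because every row belongs to exactly one status group, the sum of the per-status counts equals the table's row count, so no separate total query is needed.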

CosmosDB SQL String functions not working with a join?

I have a collection in DocumentDB with objects that look like this:
{
"id":"1de03a93-729d-43da-985a-12584079b4f8",
"Components":[
{
"Name":"MyComponentName1",
"Value": 12345
},
{
"Name":"MyComponentName2",
"Value": 34567
},
{
"Name":"MyComponentName3",
"Value": 56789
}
]
...other properties irrelevant to question...
}
When querying CosmosDB, I have the following query:
SELECT VALUE d FROM c
JOIN d IN c.Components
WHERE d.Name="MyComponentName1"
which correctly returns:
{
"Name":"MyComponentName1",
"Value":12345
}
However, when I attempt to query based on a String operator:
SELECT VALUE d FROM c
JOIN d IN c.Components
WHERE CONTAINS(d.Name,'MyComponent') --OR STARTSWITH OR ENDSWITH
I get no results.
If I take the same query as above but I add an id restriction to the where clause:
SELECT VALUE d FROM c
JOIN d IN c.Components
WHERE CONTAINS(d.Name,'MyComponent')
AND c.id = "1de03a93-729d-43da-985a-12584079b4f8"
I get back the results I expect, but obviously only for that id. I need all of the documents that match the String operator.
Is this a bug with CosmosDB, or am I doing something wrong?
Nick,
Make sure that you're following all the continuations when you execute this query. Keep in mind that the query with CONTAINS results in a full scan, so it might not finish in a single continuation. The same applies to ENDSWITH. STARTSWITH, however, should use the index, but only if the collection's indexing policy defines a range index on strings; otherwise it will still be a scan.
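To illustrate why following continuations matters, here is a minimal Python sketch of draining every page of a query. The fetch_page callable is hypothetical (it takes a continuation token and returns the page's items plus the next token) and stands in for the real SDK call; the simulated pages are invented for the demonstration.

```python
def drain_all_pages(fetch_page):
    """Collect results from every page, not just the first one."""
    results = []
    token = None
    while True:
        # fetch_page is a hypothetical callable: token -> (items, next_token)
        items, token = fetch_page(token)
        results.extend(items)  # a page may legitimately be empty
        if token is None:      # only a missing token means the query is done
            break
    return results


# Simulated scan: the first two pages are empty, the match arrives on page three.
PAGES = {
    None: ([], "t1"),
    "t1": ([], "t2"),
    "t2": ([{"Name": "MyComponentName1", "Value": 12345}], None),
}


def fake_fetch_page(token):
    return PAGES[token]


print(drain_all_pages(fake_fetch_page))
# [{'Name': 'MyComponentName1', 'Value': 12345}]
```

Reading only the first page of a scan like this would look exactly like "no results", even though matching documents exist.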

Ora-00904 - Error with creating a View

I'm stuck creating a view in Oracle. Before I create a view I always test its query first, and this time I keep getting the error ORA-00904.
Here is the situation. I have a query, call it Query A, that I need to combine with itself using UNION ALL, with only a few modifications each time, to build a bigger query, Query B. The main constraint forcing me to do this is the database design, which I'm not in a position in the company to change, so I have to adapt to it. Query A is unioned with itself 6 times to create Query B. The other major constraint is that Query B covers only one database user, and there are 54 database users with the same structure that I need to run the same query against. Query B (db user1) is unioned with Query B (db user2), Query B (db user3), and so on up to 54, which finally produces Query C, the final output. My script has already reached 6048 lines, and now I get this error, which I don't get when I test Query A and Query B on their own. My table names, owner names, and column names are all correct, but I still get the error.
This is Query A, the code that needs to be repeated 54 x 6 times. Query B applies only a few similar modifications:
Select
'2013' "YEAR",
Upper(a.text_month) "MONTH",
Upper('Budget') "VERSION",
case
when length(b.level1_name) > 5 then 'Parent'
else 'SUBSIDIARIES'
end "COMPANY_GROUP",
case
when length(b.level1_name) < 6 and b.level1_name <> '1000' then 'Subsidiaries'
else '1000 Parent'
end "COMPANY",
case
when length(b.level1_name) < 6 and b.level1_name <> '1000' then 'SUBS'
else '1000'
end "COMPANY_CODE",
case
when length(b.level1_name) > 5 then 'Parent'
else 'SUBSIDIARIES'
end "COMPANY_NAME",
b.level1_displayname "DIVISION",
b.level1_name "DIVISION_CODE",
case
when length(b.level1_name) > 5 then ltrim(upper(substr(b.level1_displayname, 8)))
else upper(ltrim(substr(b.level1_displayname, 10)))
end "DIVISION_NAME",
upper(a.text_nature_of_trip) "NATURE_OF_TRAVEL",
upper(a.text_placeeventstraining) "TRAVEL_DETAILS",
upper(a.text_country) "COUNTRY",
a.text_name_of_employee "EMPLOYEE_NAME",
a.float_no_of_attendees "NO_OF_ATTENDEES",
a.text_sponsored "SPONSORED",
a.text_remarks "REMARKS",
'OTHER TRAVEL EXPENSES' "COST_ELEMENT",
a.FLOAT_702120005_OTHER_TRAVEL_E "AMOUNT"
From PUBLISH_PNL_AAAA_2013.et_travel_transaction a,
PUBLISH_PNL_AAAA_2013.cy_2elist b
Where a.elist = b.level3_iid
ORA-00904 is "invalid identifier": either you've misspelled a column name, prefixed it with the wrong table alias, omitted the quotes from a string literal, or made one of a number of similar mistakes.
Check the point in the code that the error message mentions for mistakes like that.

Resources