Getting values from array in Cosmos Db - azure-cosmosdb

My document that I save in Cosmos DB looks like this:
{
"id": "abc123",
"myProperty": [
"1905844b-6ca9-4967-ba40-a736b685ca62",
"b03cc85c-ef0b-4f48-9c31-800de089190a"
]
}
As you can see, myProperty holds an array of GUID values. I want to read them as an array/list of GUIDs, but I'm having trouble formulating the correct SELECT statement.
The output I'm looking for is:
[
"1905844b-6ca9-4967-ba40-a736b685ca62",
"b03cc85c-ef0b-4f48-9c31-800de089190a"
]
The closest I could get is this SELECT statement:
SELECT VALUE c.myProperty FROM c WHERE c.id = "abc123"
But this doesn't give me exactly what I want either. It gives me an array within an array, i.e.:
[
[
"1905844b-6ca9-4967-ba40-a736b685ca62",
"b03cc85c-ef0b-4f48-9c31-800de089190a"
]
]
What should my SELECT statement look like to get what I want?

I don't think you can get anything else. Cosmos DB always returns an array in response to a query, because a query can potentially produce anywhere from zero to an unbounded number of results. So you will always get a top-level array that wraps all your results, even if there is only one.
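Since the outer array is the result set itself, the simplest fix is on the client: take the first (and only) item of the result list. A minimal Python sketch of that step, with the query result hard-coded as a plain list; it makes no SDK calls, and `unwrap_single` is a hypothetical helper name, not part of any Cosmos DB library:

```python
from typing import Any, List


def unwrap_single(results: List[Any]) -> Any:
    """Return the only item of a one-item result set, or None if it is empty."""
    if not results:
        return None
    if len(results) > 1:
        raise ValueError(f"expected at most one result, got {len(results)}")
    return results[0]


# Shape returned by: SELECT VALUE c.myProperty FROM c WHERE c.id = "abc123"
query_results = [
    [
        "1905844b-6ca9-4967-ba40-a736b685ca62",
        "b03cc85c-ef0b-4f48-9c31-800de089190a",
    ]
]

guids = unwrap_single(query_results)
print(guids)
```

If the flat shape has to come from the query itself, iterating the array with something like SELECT VALUE g FROM c JOIN g IN c.myProperty WHERE c.id = "abc123" should return one GUID per result item; treat that as an untested suggestion for this particular container.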

Related

Cosmos DB query on key-value pairs

I have a large collection of json documents whose structure is in the form:
{
"id": "00000000-0000-0000-0000-000000001122",
"typeId": 0,
"projectId": "p001",
"properties": [
{
"id": "a6fdd321-562c-4a40-97c7-4a34c097033d",
"name": "projectName",
"value": "contoso"
},
{
"id": "d3b5d3b6-66de-47b5-894b-cdecfc8afc40",
"name": "status",
"value": "open"
},
.....{etc}
]
}
There may be a lot of properties in the collection, all identified by the value of name. The fields in properties are pretty consistent -- there may be some variability, but they will all have the fields that I care about. There's an id, some labels, etc.
I want to combine these with some other data in Power BI, using the projectId, to create some very valuable reports.
I think what I want to do is 'normalize' this data into a table, like:
ProjectId  projectName  status  openDate  closeDate  manager
p001       contoso      open    20200101             me
etc
Where I'm at...
I can go:
SELECT c["value"] AS ProjectName
FROM c IN t.properties
WHERE c["name"] = "projectName"
... this will give me each projectName
I can do that a heap of times to get the 'values' (status, openDate, manager, etc)
If I want to combine them, I would need to join all those sub-queries on 'id'. But 'id' is not in the scope of the SELECT, so how do I get it? And if I were to do this, it sounds like something that would be very expensive (in RUs) to execute.
I think I'm overcomplicating this, but I can't quite get my head around the Cosmos syntax.
Help??
You can achieve it with JOINs and WHERE expressions, although the schema is not ideal for querying and you should consider changing it.
SELECT
c['projectId'], --c.projectId also works, but value is a reserved keyword
n['value'] AS projectName,
s['value'] AS status
FROM c
JOIN n IN c.properties
JOIN s IN c.properties
WHERE n['name'] = 'projectName' AND s['name'] = 'status'
--note all filtered properties must appear exactly once for it to work properly
Edit: a new query that removes the requirement that each filtered property appear exactly once.
SELECT
c['projectId'],
ARRAY(
SELECT VALUE n['value']
FROM n IN c.properties
WHERE n['name'] = 'projectName'
)[0] AS projectName,
ARRAY(
SELECT VALUE n['value']
FROM n IN c.properties
WHERE n['name'] = 'status'
)[0] AS status
FROM c
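To make the intent of the ARRAY(...)[0] subqueries concrete, here is a rough client-side equivalent in Python: for each document, pick the first property whose name matches and place its value in a flat row. It is only an illustration of the pivot, not a replacement for the query; `first_value` and `flatten` are hypothetical helper names, and the sample document is the one from the question with the elided properties left out.

```python
from typing import Any, Dict, List, Optional


def first_value(properties: List[Dict[str, Any]], name: str) -> Optional[Any]:
    """Return the 'value' of the first property with the given name, else None."""
    for prop in properties:
        if prop.get("name") == name:
            return prop.get("value")
    return None


def flatten(doc: Dict[str, Any], names: List[str]) -> Dict[str, Any]:
    """Build one flat row: projectId plus one column per requested property name."""
    row: Dict[str, Any] = {"projectId": doc.get("projectId")}
    for name in names:
        # A missing property becomes None here; the query would omit the field instead.
        row[name] = first_value(doc.get("properties", []), name)
    return row


doc = {
    "id": "00000000-0000-0000-0000-000000001122",
    "typeId": 0,
    "projectId": "p001",
    "properties": [
        {"id": "a6fdd321-562c-4a40-97c7-4a34c097033d", "name": "projectName", "value": "contoso"},
        {"id": "d3b5d3b6-66de-47b5-894b-cdecfc8afc40", "name": "status", "value": "open"},
    ],
}

row = flatten(doc, ["projectName", "status", "manager"])
print(row)
```

Like the second query, this tolerates a property that is missing or that appears more than once, because it only ever takes the first match.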

DynamoDB filter if primary key contains value

CURRENTLY
I have a table in DynamoDB with a single attribute - Primary Key - that contains unique values.
PK
------
#A#B#C#
#B#C#
#C#D#E#
#BC#
ISSUE
I am looking to do 2 searches for #B#C# (1) exact match, and (2) containing match, and therefore only want results:
(1) Exact Match:
#B#C#
(2) Containing Match:
#A#B#C#
#B#C#
Are these 2 searches possible against the primary key?
If so, what is the most efficient query to run? e.g. QUERY or SCAN
Note:
For (2) I am using the following code, but it is returning all items in DB:
params = {
  TableName: 'myTable',
  FilterExpression: "contains(#key, :v)",
  ExpressionAttributeNames: { "#key": "PK" },
  ExpressionAttributeValues: { ":v": "#B#C#" }
}
dynamodb.scan(params, callback)
DynamoDB supports two main types of searches: Query and Scan. The Query operation finds items based on primary key values. The Scan operation returns one or more items and item attributes by accessing every item in a table or a secondary index.
If you wanted to find the item with the primary key #B#C#, you would use the Query API:
ddbClient.query(
  {
    "TableName": "<YOUR TABLE NAME>",
    "KeyConditionExpression": "#pk = :pk",
    "ExpressionAttributeValues": {
      ":pk": {
        "S": "#B#C#"
      }
    },
    "ExpressionAttributeNames": {
      "#pk": "PK"
    }
  }
)
For your second access pattern, you'll need to use the scan API because you are searching across the entire table/secondary index.
You can use scan to test if a primary key has a substring using contains. I don't see anything wrong with the format of your scan operation.
Be careful when using scan this way. Because scan will read your entire table to fetch results, you will have a fairly inefficient operation at scale. If this operation is run infrequently, or you are running it against a sparse index, it's probably fine. However, if it's one of your primary access patterns, you may want to reconsider using the scan API for this operation.
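The difference between the two access patterns can be sketched locally in Python. This is a simulation over an in-memory dict holding the four keys from the question, not DynamoDB code; `exact_match` and `containing_match` are hypothetical names that stand in for a Query on the key and a Scan with a contains filter.

```python
from typing import Dict, List

# Local stand-in for the table: partition key -> item. Not a DynamoDB client.
table: Dict[str, Dict[str, str]] = {
    pk: {"PK": pk} for pk in ["#A#B#C#", "#B#C#", "#C#D#E#", "#BC#"]
}


def exact_match(pk: str) -> List[str]:
    """Query-style lookup: goes straight to the key, no other items are read."""
    return [pk] if pk in table else []


def containing_match(fragment: str) -> List[str]:
    """Scan-style lookup: reads every item, then keeps PKs containing the fragment."""
    return [item["PK"] for item in table.values() if fragment in item["PK"]]


print(exact_match("#B#C#"))
print(containing_match("#B#C#"))
```

The scan-style function touches all four items to return two, which is the cost the answer warns about: the work grows with the size of the table, not with the number of matches.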

Top N per Classification in CosmosDB

I'm kinda stuck on this issue. I have several hundred documents of a certain model stored in Cosmos DB and I can't seem to get the top 5 of each category.
This is the model:
"id": "06224840-6b88-4394-9324-4d1628383702",
"name": "Reservation",
"description": null,
"client": null,
"reference": null,
"isMonitoring": false,
"monitoringSince": null,
"hasRiskProfile": false,
"riskProfile": -1,
"monitorFrequency": 0,
"mainBindable": null,
"organizationId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"userId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"createDate": "2020-08-18T11:00:02.5266403Z",
"updateDate": "2020-08-18T11:00:02.5266419Z",
"lastMonitorDate": "2020-08-18T11:00:02.5266427Z"
So what I'm trying to do is use C# to get the top 5 from each risk profile where the organizationId matches. GroupBy through LINQ throws an error, and a ROW_NUMBER() query combined with PARTITION BY doesn't seem to work either.
Any way I can get this to work in a single query compatible with cosmos?
EDIT:
What I am trying to achieve in Cosmos DB is roughly this:
WITH TopEntries AS (
    SELECT *,
        ROW_NUMBER() OVER (
            PARTITION BY [riskProfile]
            ORDER BY [updateDate] DESC
        ) AS [ROW NUMBER]
    FROM [reservations]
    WHERE [organizationId] = 'xyz'
)
SELECT * FROM TopEntries
WHERE TopEntries.[ROW NUMBER] <= 5
It sounds like combining TOP and ORDER BY would do the job. For example:
SELECT TOP 5 *
FROM c
WHERE c.organizationId = "xyz"
ORDER BY c.riskProfile
You can build such queries with parameters in the .NET SDK as in this sample.
What you are trying to achieve is not directly possible in a single Cosmos DB query. It takes two steps (adjust the property names to match your documents).
First, group by the category to get its distinct values:
SELECT c.city FROM c WHERE c.org = 'xyz' GROUP BY c.city
Then loop through the results of the first query and run a query like this for each value:
SELECT TOP 5 * FROM c WHERE c.city = 'delhi' ORDER BY c.date DESC
You can refer to a similar issue here:
https://learn.microsoft.com/en-us/answers/questions/38454/index.html
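The two-step approach can be sketched in Python over an in-memory list, using the asker's property names (organizationId, riskProfile, updateDate) rather than the city/date placeholders. This simulates the logic only: in practice step 1 and step 2 would each be a Cosmos DB query, `top_n_per_group` is a hypothetical helper, and the sample documents are made up for illustration.

```python
from typing import Any, Dict, List


def top_n_per_group(
    docs: List[Dict[str, Any]],
    organization_id: str,
    group_key: str = "riskProfile",
    sort_key: str = "updateDate",
    n: int = 5,
) -> Dict[Any, List[Dict[str, Any]]]:
    """Return the n newest documents per group for one organization."""
    matching = [d for d in docs if d["organizationId"] == organization_id]
    # Step 1: distinct group values (what the GROUP BY query returns).
    groups = sorted({d[group_key] for d in matching})
    # Step 2: one "TOP n ... ORDER BY ... DESC" pass per group value.
    result: Dict[Any, List[Dict[str, Any]]] = {}
    for group in groups:
        members = [d for d in matching if d[group_key] == group]
        # ISO 8601 timestamps in the same format sort correctly as strings.
        members.sort(key=lambda d: d[sort_key], reverse=True)
        result[group] = members[:n]
    return result


# Made-up sample data, trimmed to the fields that matter here.
docs = [
    {"id": "a", "organizationId": "xyz", "riskProfile": -1, "updateDate": "2020-08-18T11:00:02Z"},
    {"id": "b", "organizationId": "xyz", "riskProfile": -1, "updateDate": "2020-08-19T11:00:02Z"},
    {"id": "c", "organizationId": "xyz", "riskProfile": -1, "updateDate": "2020-08-20T11:00:02Z"},
    {"id": "d", "organizationId": "xyz", "riskProfile": 2, "updateDate": "2020-08-17T11:00:02Z"},
    {"id": "e", "organizationId": "other", "riskProfile": 2, "updateDate": "2020-08-21T11:00:02Z"},
]

top = top_n_per_group(docs, "xyz", n=2)
ids_per_group = {group: [d["id"] for d in members] for group, members in top.items()}
print(ids_per_group)
```

The number of round trips is one plus the number of distinct groups, so this is reasonable for a handful of risk profiles and less so for a high-cardinality category.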

CosmosDB SQL String functions not working with a join?

I have a collection in DocumentDB with objects that look like this:
{
"id":"1de03a93-729d-43da-985a-12584079b4f8",
"Components":[
{
"Name":"MyComponentName1",
"Value": 12345
},
{
"Name":"MyComponentName2",
"Value": 34567
},
{
"Name":"MyComponentName3",
"Value": 56789
}
]
...other properties irrelevant to question...
}
When querying CosmosDB, I have the following query:
SELECT VALUE d FROM c
JOIN d IN c.Components
WHERE d.Name="MyComponentName1"
which correctly returns:
{
"Name":"MyComponentName1",
"Value":12345
}
However, when I attempt to query based on a String operator:
SELECT VALUE d FROM c
JOIN d IN c.Components
WHERE CONTAINS(d.Name,'MyComponent') --OR STARTSWITH OR ENDSWITH
I get no results.
If I take the same query as above but I add an id restriction to the where clause:
SELECT VALUE d FROM c
JOIN d IN c.Components
WHERE CONTAINS(d.Name,'MyComponent')
AND c.id = "1de03a93-729d-43da-985a-12584079b4f8"
I get back the results I expect, but obviously only for that id. I need all of the documents that match the String operator.
Is this a bug with CosmosDB, or am I doing something wrong?
Nick,
Make sure that you're following all the continuations when you execute this query. Keep in mind that a query with CONTAINS results in a full scan, so it might not finish in a single continuation. The same applies to ENDSWITH. STARTSWITH, however, should use the index, but only if the collection's indexing policy defines a range index on strings; otherwise, it will still be a scan.
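"Following all the continuations" means repeating the request with the returned continuation token until no token comes back, and collecting the items from every page, including empty ones. A minimal Python sketch of that loop, assuming a hypothetical `fetch_page(token)` callable that returns `(items, next_token)`; the real SDK call and its signature are deliberately not shown, and the pages below are faked.

```python
from typing import Callable, Dict, List, Optional, Tuple

Page = Tuple[List[dict], Optional[str]]


def drain(fetch_page: Callable[[Optional[str]], Page]) -> List[dict]:
    """Call fetch_page until it stops returning a continuation token."""
    results: List[dict] = []
    token: Optional[str] = None
    while True:
        items, token = fetch_page(token)
        results.extend(items)  # a page may legitimately be empty
        if token is None:
            return results


# Fake pages: the first one is empty but still carries a continuation token,
# which is how a scan-heavy query can look like it returned "no results".
_pages: Dict[Optional[str], Page] = {
    None: ([], "t1"),
    "t1": ([{"Name": "MyComponentName1", "Value": 12345}], "t2"),
    "t2": ([{"Name": "MyComponentName2", "Value": 34567}], None),
}


def fake_fetch(token: Optional[str]) -> Page:
    """Hypothetical stand-in for one query round trip."""
    return _pages[token]


components = drain(fake_fetch)
print(components)
```

Stopping after the first page in this fake setup would yield an empty list, which matches the symptom described in the question.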

How to remove collection or edge document using for loop in ArangoDB?

I'm using the latest ArangoDB 3.1 on Windows 10.
Here I want to remove the collection document and the edge document using a FOR loop, but I'm getting an error like document not found (vName).
vName contains many collection names, but I don't know how to use it in the FOR loop.
This is the AQL I am using to remove the documents from the graph:
LET op = (FOR v, e IN 1..1 ANY 'User/588751454' GRAPH 'my_graph'
COLLECT vid = v._id, eid = e._id
RETURN { vid, eid }
)
FOR doc IN op
COLLECT vName = SPLIT(doc.vid,'/')[0],
vid = SPLIT(doc.vid,'/')[1],
eName = SPLIT(doc.eid,'/')[0],
eid = SPLIT(doc.eid,'/')[1]
REMOVE { _key: vid } in vName
Output I'm getting from the AQL query (Web UI screenshot)
vName is a variable introduced by COLLECT. It is a string with the collection name of a vertex (extracted from vid / v._id). You then try to use it in the removal operation REMOVE { ... } IN vName.
However, AQL does not support dynamic collection names; collection names must be known at query compile time:
Each REMOVE operation is restricted to a single collection, and the collection name must not be dynamic.
Source: https://docs.arangodb.com/3.2/AQL/Operations/Remove.html
So you either have to hardcode the collection into the query, e.g. REMOVE { ... } IN User, or use the special bind parameter syntax for collections, e.g. REMOVE { ... } IN @@coll with bind parameters {"@coll": "User", ...}.
This also means that REMOVE can only delete documents in a single collection.
It's possible to workaround the limitation somewhat by using subqueries like this:
LET x1 = (FOR doc IN User REMOVE aa IN User)
LET x2 = (FOR doc IN relations REMOVE bb IN relations)
RETURN 1
The variables x1 and x2 are syntactically required and receive an empty array as subquery result. The query also requires a RETURN statement, even if we don't expect any result.
Do not attempt to remove from the same collection twice in the same query though, as it would raise an access after data-modification error.
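Another way to live with the one-collection-per-REMOVE rule is to do the splitting on the client: group the document ids by collection and run one parameterized query per collection, passing the collection name through the @@coll bind parameter. The Python sketch below only prepares the bind variables and makes no driver calls; `REMOVE_QUERY`, the helper names, and all ids except 'User/588751454' are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List

# Assumed query text: one REMOVE per key, collection supplied via @@coll.
REMOVE_QUERY = "FOR k IN @keys REMOVE k IN @@coll"


def group_keys_by_collection(ids: List[str]) -> Dict[str, List[str]]:
    """Split 'collection/key' ids and collect the keys per collection."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for doc_id in ids:
        collection, key = doc_id.split("/", 1)
        grouped[collection].append(key)
    return dict(grouped)


def build_bind_vars(ids: List[str]) -> List[Dict[str, object]]:
    """One bind-variable dict per collection, to be run as separate queries."""
    return [
        {"@coll": collection, "keys": keys}
        for collection, keys in group_keys_by_collection(ids).items()
    ]


# Hypothetical ids such as the first query in the question might return.
ids = ["User/588751454", "User/588751455", "relations/1001"]

batches = build_bind_vars(ids)
for bind_vars in batches:
    print(REMOVE_QUERY, bind_vars)
```

Each batch would be executed as its own query, so no single query modifies more than one collection.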
