I have a collection in DocumentDB with objects that look like this:
{
"id":"1de03a93-729d-43da-985a-12584079b4f8",
"Components":[
{
"Name":"MyComponentName1",
"Value": 12345
},
{
"Name":"MyComponentName2",
"Value": 34567
},
{
"Name":"MyComponentName3",
"Value": 56789
}
]
...other properties irrelevant to question...
}
When querying CosmosDB, I have the following query:
SELECT VALUE d FROM c
JOIN d IN c.Components
WHERE d.Name="MyComponentName1"
which correctly returns:
{
"Name":"MyComponentName1",
"Value":12345
}
However, when I attempt to query based on a String operator:
SELECT VALUE d FROM c
JOIN d IN c.Components
WHERE CONTAINS(d.Name,'MyComponent') --OR STARTSWITH OR ENDSWITH
I get no results.
If I take the same query as above but I add an id restriction to the where clause:
SELECT VALUE d FROM c
JOIN d IN c.Components
WHERE CONTAINS(d.Name,'MyComponent')
AND c.id = "1de03a93-729d-43da-985a-12584079b4f8"
I get back the results I expect, but obviously only for that id. I need all of the documents that match the String operator.
Is this a bug with CosmosDB, or am I doing something wrong?
Nick,
Make sure that you're following all the continuations when you execute this query. Keep in mind that a query with CONTAINS results in a full scan, so it might not finish in a single continuation (page); the same is true for ENDSWITH. STARTSWITH, however, should use the index, but only if the collection's indexing policy defines a range index on strings; otherwise it will still be a scan.
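To see why a scanning query can look like it returned nothing, here is a plain-Python simulation of a paged query. The page contents and the `fetch_page` function are hypothetical stand-ins, not a Cosmos DB API. The point is that a scan may return empty pages together with a continuation token, so reading only the first page shows no results. (As far as I know, the SDKs follow continuations for you when you iterate the whole result, for example the iterator returned by `query_items` in the azure-cosmos Python SDK; a manual loop looks like this.)

```python
from typing import Iterator, List, Optional, Tuple

# Hypothetical pages returned by a scanning query such as CONTAINS(d.Name, 'MyComponent').
# A scan can hand back empty pages (with a continuation token) before it reaches any match.
PAGES: List[List[str]] = [
    [],                                        # page 1: documents scanned, no match yet
    [],                                        # page 2: still scanning
    ["MyComponentName1"],                      # page 3: matches start to appear
    ["MyComponentName2", "MyComponentName3"],  # page 4: last page
]


def fetch_page(continuation: Optional[int]) -> Tuple[List[str], Optional[int]]:
    """Stand-in for one query request: returns (results, next continuation or None)."""
    index = 0 if continuation is None else continuation
    next_token = index + 1 if index + 1 < len(PAGES) else None
    return PAGES[index], next_token


def run_query() -> Iterator[str]:
    """Keep requesting pages until no continuation token is returned."""
    continuation: Optional[int] = None
    while True:
        page, continuation = fetch_page(continuation)
        yield from page
        if continuation is None:
            break


first_page_only, _ = fetch_page(None)
all_results = list(run_query())
print(first_page_only)  # [] : reading only the first page looks like "no results"
print(all_results)      # ['MyComponentName1', 'MyComponentName2', 'MyComponentName3']
```

Adding `c.id = ...` to the filter narrows the scan to one document, which is why that variant appears to "work".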
I have a large collection of json documents whose structure is in the form:
{
"id": "00000000-0000-0000-0000-000000001122",
"typeId": 0,
"projectId": "p001",
"properties": [
{
"id": "a6fdd321-562c-4a40-97c7-4a34c097033d",
"name": "projectName",
"value": "contoso"
},
{
"id": "d3b5d3b6-66de-47b5-894b-cdecfc8afc40",
"name": "status",
"value": "open"
},
.....{etc}
]
}
There may be a lot of entries in the properties array, each identified by the value of name. The fields in properties are fairly consistent; there may be some variability, but every entry has the fields I care about (an id, some labels, etc.).
I want to combine these with some other data in Power BI, using projectId, to create some very valuable reports.
I think what I want to do is 'normalize' this data into a table, like:
ProjectId | projectName | status | openDate | closeDate | manager
p001      | contoso     | open   | 20200101 |           | me
etc.
Where I'm at...
I can go:
SELECT c["value"] AS ProjectName
FROM c in t.Properties
WHERE c["name"] = "projectName"
... this will give me each projectName
I can do that a heap of times to get the 'values' (status, openDate, manager, etc)
If I want to combine them, I would need to join all those subqueries together on 'id'. But 'id' is not in the scope of the SELECT, so how do I get it? Even if I could, it sounds like it would be very expensive (in RUs) to execute.
I think I'm overcomplicating this, but I can't quite get my head around the Cosmos syntax.
Help??
You can achieve this with JOINs and WHERE expressions, although the schema is not ideal for querying and you should consider changing it.
SELECT
c['projectId'], --c.projectId also works, but value is a reserved keyword
n['value'] AS projectName,
s['value'] AS status
FROM c
JOIN n IN c.properties
JOIN s IN c.properties
WHERE n['name'] = 'projectName' AND s['name'] = 'status'
--note all filtered properties must appear exactly once for it to work properly
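The "exactly once" caveat follows from JOIN semantics: each JOIN expands the document once per array element, so two JOINs over the same array form a cross product that the WHERE clause then filters. The plain-Python model below (a simulation, not the query engine; the documents "p002"/"p003" and their values are made up) shows a duplicated property producing two rows and a missing property dropping the document entirely.

```python
from itertools import product
from typing import Dict, List

Doc = Dict[str, object]


def join_pivot(docs: List[Doc]) -> List[Dict[str, object]]:
    """Model of: FROM c JOIN n IN c.properties JOIN s IN c.properties
    WHERE n.name = 'projectName' AND s.name = 'status'."""
    rows = []
    for c in docs:
        props = c.get("properties", [])
        # Two JOINs over the same array = every (n, s) pair of its elements.
        for n, s in product(props, props):
            if n["name"] == "projectName" and s["name"] == "status":
                rows.append({"projectId": c["projectId"],
                             "projectName": n["value"],
                             "status": s["value"]})
    return rows


docs = [
    {"projectId": "p001", "properties": [
        {"name": "projectName", "value": "contoso"},
        {"name": "status", "value": "open"}]},
    # "status" appears twice, so this document yields two rows
    {"projectId": "p002", "properties": [
        {"name": "projectName", "value": "fabrikam"},
        {"name": "status", "value": "open"},
        {"name": "status", "value": "closed"}]},
    # no "status" entry, so this document yields no rows at all
    {"projectId": "p003", "properties": [
        {"name": "projectName", "value": "adatum"}]},
]

for row in join_pivot(docs):
    print(row)
```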
Edit: here is a new query that removes the requirement that each filtered property appear exactly once.
SELECT
c['projectId'],
ARRAY(
SELECT VALUE n['value']
FROM n IN c.properties
WHERE n['name'] = 'projectName'
)[0] AS projectName,
ARRAY(
SELECT VALUE n['value']
FROM n IN c.properties
WHERE n['name'] = 'status'
)[0] AS status
FROM c
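The subquery version picks the first matching value per name, and indexing an empty array with [0] gives undefined, which Cosmos DB leaves out of the projected object. A plain-Python model of that behavior (a simulation with made-up sample documents, where None stands in for undefined):

```python
from typing import Dict, List, Optional


def first_value(props: List[Dict[str, object]], name: str) -> Optional[object]:
    """Model of ARRAY(SELECT VALUE n['value'] FROM n IN c.properties
    WHERE n['name'] = name)[0]: the first match, or None (undefined) if none."""
    matches = [p["value"] for p in props if p["name"] == name]
    return matches[0] if matches else None


def subquery_pivot(docs: List[Dict[str, object]], names: List[str]) -> List[Dict[str, object]]:
    rows = []
    for c in docs:
        row = {"projectId": c["projectId"]}
        for name in names:
            value = first_value(c.get("properties", []), name)
            # An undefined projection is omitted from the result object.
            if value is not None:
                row[name] = value
        rows.append(row)
    return rows


docs = [
    {"projectId": "p001", "properties": [
        {"name": "projectName", "value": "contoso"},
        {"name": "status", "value": "open"}]},
    {"projectId": "p002", "properties": [  # duplicate "status": first one wins
        {"name": "projectName", "value": "fabrikam"},
        {"name": "status", "value": "open"},
        {"name": "status", "value": "closed"}]},
    {"projectId": "p003", "properties": [  # no "status": property omitted
        {"name": "projectName", "value": "adatum"}]},
]

for row in subquery_pivot(docs, ["projectName", "status"]):
    print(row)
```

Unlike the JOIN version, every document produces exactly one row.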
Consider these queries:
SELECT COUNT(1) AS failures
FROM c
WHERE c.time = 1623332779 AND c.status = 'FAILURE'
SELECT COUNT(1) AS successes
FROM c
WHERE c.time = 1623332779 AND c.status = 'SUCCESS'
How can I combine these two distinct queries into one query?
I tried repurposing the answers from How to get multiple counts with one SQL query?, but ran into a few problems:
COUNT(*) throws an error "Syntax error, incorrect syntax near '*'."
UNION throws "Syntax error, incorrect syntax near 'UNION'."
I also experimented with
SELECT
SUM(CASE WHEN c.time = 1623332779 THEN 1 else 0 end)
FROM c
but this leads to another syntax error. I noticed that
SELECT COUNT(1) AS mycounter, COUNT(1) AS mycounter2
FROM c
WHERE c.time = 1623332779
returns
[
{
"mycounter": 3,
"mycounter2": 3
}
]
but I was unable to link these distinct counters to distinct queries.
The following should work. COUNT skips values that are undefined, which lets you exclude rows from a particular count:
SELECT
COUNT(c.status = 'SUCCESS' ? 1 : undefined) AS successes,
COUNT(c.status = 'FAILURE' ? 1 : undefined) AS failures
FROM c
WHERE c.time = 1623332779
It hurts performance, though, because the count does not use the index at all. So you're better off using two separate queries. If you really want a single request, you could create a stored procedure that runs both queries and combines the results.
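The trick in the conditional-COUNT query above is that each expression yields either 1 or undefined, and COUNT only counts defined values. A plain-Python model of that (a simulation with made-up documents, using None for undefined):

```python
from typing import List, Optional

UNDEFINED = None  # stand-in for Cosmos DB's undefined


def count_defined(values: List[Optional[int]]) -> int:
    """COUNT(expr) counts only the rows where expr is defined."""
    return sum(1 for v in values if v is not UNDEFINED)


docs = [
    {"time": 1623332779, "status": "SUCCESS"},
    {"time": 1623332779, "status": "FAILURE"},
    {"time": 1623332779, "status": "SUCCESS"},
    {"time": 1623332780, "status": "SUCCESS"},  # excluded by the WHERE clause
]

rows = [d for d in docs if d["time"] == 1623332779]  # WHERE c.time = 1623332779
result = {
    # COUNT(c.status = 'SUCCESS' ? 1 : undefined)
    "successes": count_defined([1 if d["status"] == "SUCCESS" else UNDEFINED for d in rows]),
    # COUNT(c.status = 'FAILURE' ? 1 : undefined)
    "failures": count_defined([1 if d["status"] == "FAILURE" else UNDEFINED for d in rows]),
}
print(result)  # {'successes': 2, 'failures': 1}
```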
Instead of doing counts of the overall query, you can use GROUP BY to get counts in a single query. For example:
SELECT c.time, c.status, COUNT(c.status) AS statuscount
FROM c
WHERE c.time = 1623332779
GROUP BY c.time, c.status
This won't give you explicit counts called "successes" and "failures", but it will return both counts, something like:
[
{
"time": 1623332779,
"status": "FAILURE",
"statuscount": 123
},
{
"time": 1623332779,
"status": "SUCCESS",
"statuscount": 456
}
]
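If you still want named "successes" and "failures" values, one option is to pivot the grouped rows on the client. A plain-Python sketch (made-up documents; `group_counts` only models what the GROUP BY query returns). Because GROUP BY returns only the groups that exist, a status with no matching documents has to default to 0:

```python
from collections import Counter
from typing import Dict, List

docs = [
    {"time": 1623332779, "status": "SUCCESS"},
    {"time": 1623332779, "status": "FAILURE"},
    {"time": 1623332779, "status": "SUCCESS"},
    {"time": 1623332780, "status": "SUCCESS"},
]


def group_counts(docs: List[Dict[str, object]], time: int) -> List[Dict[str, object]]:
    """Model of SELECT c.time, c.status, COUNT(c.status) AS statuscount
    ... WHERE c.time = time GROUP BY c.time, c.status."""
    counts = Counter((d["time"], d["status"]) for d in docs if d["time"] == time)
    return [{"time": t, "status": s, "statuscount": n} for (t, s), n in counts.items()]


def to_named_counts(groups: List[Dict[str, object]]) -> Dict[str, int]:
    """Client-side pivot of the grouped rows into named counters."""
    by_status = {g["status"]: g["statuscount"] for g in groups}
    return {"successes": by_status.get("SUCCESS", 0),
            "failures": by_status.get("FAILURE", 0)}


groups = group_counts(docs, 1623332779)
print(to_named_counts(groups))  # {'successes': 2, 'failures': 1}
```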
I have a document with two list attributes:
{
"CurrentDocument": [
{ "DocName": "name1", "DocType": "Identity" },
{ "DocName": "name2", "DocType": "Authorization" }
],
"ClosedDocument": [
{ "DocName": "name3", "DocType": "Passport" }
]
}
I want a query that returns the DocName and DocType of both lists.
I can't use JOIN because if one of the lists is empty, my query returns nothing.
Furthermore, with a JOIN I can't merge all my attributes into one list.
SELECT cur.DocName AS curName, clo.DocName AS cloName FROM c JOIN cur IN c.CurrentDocument JOIN clo IN c.ClosedDocument
This query is not what I'm looking for because:
if one list is empty, I lose all the data
I get n*m rows, which contain duplicates (n: number of CurrentDocument entries, m: number of ClosedDocument entries)
I tried using a UNION expression, but I can't seem to make it work in a query.
Thanks in advance.
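Both problems come from JOIN producing one row per (cur, clo) pair. A plain-Python model of the two-JOIN query (a simulation, not the query engine) makes the n*m rows and the empty-list case concrete:

```python
from itertools import product
from typing import Dict, List

doc = {
    "CurrentDocument": [
        {"DocName": "name1", "DocType": "Identity"},
        {"DocName": "name2", "DocType": "Authorization"},
    ],
    "ClosedDocument": [{"DocName": "name3", "DocType": "Passport"}],
}


def join_two_lists(doc: Dict[str, List[Dict[str, str]]]) -> List[Dict[str, str]]:
    """Model of FROM c JOIN cur IN c.CurrentDocument JOIN clo IN c.ClosedDocument:
    one row per (cur, clo) pair, so n*m rows, and none if either list is empty."""
    return [{"curName": cur["DocName"], "cloName": clo["DocName"]}
            for cur, clo in product(doc.get("CurrentDocument", []),
                                    doc.get("ClosedDocument", []))]


print(join_two_lists(doc))                          # 2 * 1 = 2 rows
print(join_two_lists({**doc, "ClosedDocument": []}))  # [] : everything is lost
```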
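Use a UDF for this.
Create the following UDF and register it with the id MergeLists (the query calls the UDF by its id, not by the JavaScript function name):
function userDefinedFunction(current, closed) {
    // Treat a missing list as empty so an absent property doesn't break the merge
    return (current || []).concat(closed || []);
}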
Use it in your query
SELECT udf.MergeLists(o.CurrentDocument, o.ClosedDocument) as merged FROM Orders o WHERE o.id = 'a811d13f-a308-4df1-85c1-31e566e9fc1e'
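For reference, here is what the UDF computes, mirrored in plain Python (a sketch; `merge_lists` is a hypothetical Python counterpart of the JavaScript UDF, and the sample document is the one from the question). The query returns one object per matching document, with the concatenated list under "merged":

```python
from typing import Dict, List, Optional

Doc = Dict[str, str]


def merge_lists(current: Optional[List[Doc]], closed: Optional[List[Doc]]) -> List[Doc]:
    """Mirror of the MergeLists UDF: concatenate, treating a missing list as empty."""
    return (current or []) + (closed or [])


doc = {
    "CurrentDocument": [
        {"DocName": "name1", "DocType": "Identity"},
        {"DocName": "name2", "DocType": "Authorization"},
    ],
    "ClosedDocument": [{"DocName": "name3", "DocType": "Passport"}],
}

# Shape of: SELECT udf.MergeLists(o.CurrentDocument, o.ClosedDocument) AS merged ...
result = [{"merged": merge_lists(doc.get("CurrentDocument"), doc.get("ClosedDocument"))}]
print(result)
```

Because the lists are concatenated rather than joined, an empty list simply contributes nothing and there are no duplicates.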
My document that I save in Cosmos DB looks like this:
{
"id": "abc123",
"myProperty": [
"1905844b-6ca9-4967-ba40-a736b685ca62",
"b03cc85c-ef0b-4f48-9c31-800de089190a"
]
}
As you can see, in the myProperty property, I have an array of GUID values and I want to read them as an array/list of GUID values but I'm having trouble formulating the correct SELECT statement.
The output I'm looking for is:
[
"1905844b-6ca9-4967-ba40-a736b685ca62",
"b03cc85c-ef0b-4f48-9c31-800de089190a"
]
The closest I could get is this SELECT statement:
SELECT VALUE c.myProperty FROM c WHERE c.id = "abc123"
But this doesn't give me exactly what I want either. This gives me an array within an array i.e.
[
[
"1905844b-6ca9-4967-ba40-a736b685ca62",
"b03cc85c-ef0b-4f48-9c31-800de089190a"
]
]
What should my SELECT statement look like to get what I want?
I don't think you can ever get anything else, because Cosmos DB always returns an array in response to a query: a query can potentially return anywhere from zero to any number of results. So you will always get a top-level array that wraps all your results (even if there is only one).
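Since the top-level array is always there, what you can control is what each result is. As a hedged alternative not in the original answer, flattening with a JOIN (the same SELECT VALUE ... JOIN pattern as in the first question) should make each GUID its own result: SELECT VALUE p FROM c JOIN p IN c.myProperty WHERE c.id = "abc123". The plain-Python model below (a simulation, not the query engine) contrasts the two result shapes:

```python
doc = {
    "id": "abc123",
    "myProperty": [
        "1905844b-6ca9-4967-ba40-a736b685ca62",
        "b03cc85c-ef0b-4f48-9c31-800de089190a",
    ],
}
container = [doc]

# SELECT VALUE c.myProperty FROM c WHERE c.id = "abc123"
# -> one result per matching document, and that result is the whole array.
nested = [c["myProperty"] for c in container if c["id"] == "abc123"]

# SELECT VALUE p FROM c JOIN p IN c.myProperty WHERE c.id = "abc123"
# -> the JOIN yields one row per array element, so each GUID is its own result.
flattened = [p for c in container if c["id"] == "abc123" for p in c["myProperty"]]

print(nested)     # array within an array
print(flattened)  # the flat list of GUIDs
```

Alternatively, keep the original query and take the first element of the response on the client.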
I'm using the latest ArangoDB 3.1 on Windows 10.
I want to remove the vertex documents and edge documents using a FOR loop, but I'm getting an error like "document not found (vName)".
vName contains the names of many collections, but I don't know how to use it in the FOR loop.
This is the AQL I am using to remove the documents from the graph:
LET op = (FOR v, e IN 1..1 ANY 'User/588751454' GRAPH 'my_graph'
COLLECT vid = v._id, eid = e._id
RETURN { vid, eid }
)
FOR doc IN op
COLLECT vName = SPLIT(doc.vid,'/')[0],
vid = SPLIT(doc.vid,'/')[1],
eName = SPLIT(doc.eid,'/')[0],
eid = SPLIT(doc.eid,'/')[1]
REMOVE { _key: vid } in vName
vName is a variable introduced by COLLECT. It is a string with the collection name of a vertex (extracted from vid / v._id). You then try to use it in the removal operation REMOVE { ... } IN vName.
AQL does not support dynamic collection names, however. Collection names must be known at query compile time:
Each REMOVE operation is restricted to a single collection, and the collection name must not be dynamic.
Source: https://docs.arangodb.com/3.2/AQL/Operations/Remove.html
So you either have to hardcode the collection into the query, e.g. REMOVE { ... } IN User, or use the special bind parameter syntax for collections, e.g. REMOVE { ... } IN @@coll with bind parameters {"@coll": "User", ...}.
This also means that REMOVE can only delete documents in a single collection.
It's possible to work around the limitation somewhat by using subqueries like this (the keys 'aa' and 'bb' are placeholders for the documents to delete):
LET x1 = (FOR key IN ['aa'] REMOVE key IN User)
LET x2 = (FOR key IN ['bb'] REMOVE key IN relations)
RETURN 1
The variables x1 and x2 are syntactically required and receive an empty array as the subquery result. The query also requires a RETURN statement, even if we don't expect any result.
Do not attempt to remove from the same collection twice in the same query, though, as it would raise an "access after data-modification" error.
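Putting the bind-parameter approach together with the original traversal: one option is to run the traversal first, group the returned _id values by collection on the client, and then issue one single-collection REMOVE per collection using @@coll. A plain-Python sketch of that planning step (the ids and the collection name "Group" are hypothetical; executing each query, e.g. with the python-arango driver's db.aql.execute(query, bind_vars=...), is left out). Note that these are separate queries, so the removals are not atomic as a whole:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical ids as returned by the traversal (v._id / e._id values).
traversal_ids = [
    "Group/1001", "relations/2001",
    "Group/1002", "relations/2002",
    "Group/1001",  # the same vertex can be reached twice
]

QUERY = "FOR key IN @keys REMOVE key IN @@coll"


def plan_removals(ids: List[str]) -> List[Tuple[str, Dict[str, object]]]:
    """Group 'collection/key' ids by collection and pair each group with bind
    parameters for one single-collection REMOVE query."""
    keys_by_collection: Dict[str, List[str]] = defaultdict(list)
    for doc_id in ids:
        collection, key = doc_id.split("/", 1)
        if key not in keys_by_collection[collection]:  # skip duplicates
            keys_by_collection[collection].append(key)
    # "@coll" binds the collection (@@coll in the query); "keys" binds @keys.
    return [(QUERY, {"@coll": coll, "keys": keys})
            for coll, keys in sorted(keys_by_collection.items())]


for query, bind_vars in plan_removals(traversal_ids):
    print(query, bind_vars)
```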