Azure DocumentDB ARRAY_CONTAINS on nested documents - azure-cosmosdb

It seems like the ARRAY_CONTAINS function on nested documents never matches any document.
For example, running the following simple query in the Azure DocumentDB Query Playground returns no results, even though some nested documents should match it:
SELECT *
FROM food
WHERE ARRAY_CONTAINS(food.tags.name, "blueberries")
A past question on Stack Overflow also implied that this kind of nested query is valid.
Thank you

The first argument to ARRAY_CONTAINS must be an array. In this case, food.tags is a valid argument, but food.tags.name is not.
Both of the following DocumentDB queries are valid and might be what you're looking for:
SELECT food
FROM food
JOIN tag IN food.tags
WHERE tag.name = "blueberries"
Or
SELECT food
FROM food
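WHERE ARRAY_CONTAINS(food.tags, { name: "blueberries" })
This form requires a tag to equal { name: "blueberries" } exactly. If tag objects can carry other properties besides name, pass true as a third argument to request a partial match: ARRAY_CONTAINS(food.tags, { name: "blueberries" }, true).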
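To make the difference between these forms concrete, here is a rough Python model of how the filters behave on in-memory documents. It is a sketch under assumptions: `array_contains` and `join_filter` are hypothetical helpers, the sample documents are made up, and the real evaluation happens inside the Cosmos DB query engine. It also models the optional third argument of ARRAY_CONTAINS, which requests a partial match when set to true.

```python
from typing import Any

# Hypothetical sample documents shaped like the nutrition dataset's "tags" array.
foods = [
    {"id": "1", "tags": [{"name": "blueberries"}, {"name": "raw"}]},
    {"id": "2", "tags": [{"name": "blueberries", "source": "usda"}]},
    {"id": "3", "tags": [{"name": "apples"}]},
]


def array_contains(arr: Any, value: Any, partial: bool = False) -> bool:
    """Approximates ARRAY_CONTAINS(arr, value[, partial]) for illustration only."""
    if not isinstance(arr, list):
        # Not an array (e.g. food.tags.name, which has no value): nothing matches.
        return False
    for item in arr:
        if partial and isinstance(value, dict) and isinstance(item, dict):
            # Partial match: every property in `value` must appear in `item`.
            if all(k in item and item[k] == v for k, v in value.items()):
                return True
        elif item == value:
            # Full match: the array element must equal `value` exactly.
            return True
    return False


def join_filter(docs: list, tag_name: str) -> list:
    """Mimics SELECT food FROM food JOIN tag IN food.tags WHERE tag.name = tag_name."""
    return [d["id"] for d in docs for tag in d["tags"] if tag.get("name") == tag_name]


# food.tags.name has no value in Cosmos DB; it is modelled here as None.
print([f["id"] for f in foods if array_contains(None, "blueberries")])  # []
print([f["id"] for f in foods if array_contains(f["tags"], {"name": "blueberries"})])  # ['1']
print([f["id"] for f in foods
       if array_contains(f["tags"], {"name": "blueberries"}, partial=True)])  # ['1', '2']
print(join_filter(foods, "blueberries"))  # ['1', '2']
```

Document 2 shows why the partial flag matters: its tag has an extra `source` property, so only the partial match and the JOIN pick it up.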

Related

How to properly use MATCH inside UNWIND for a Nebula query

I’m currently working with the Nebula graph database for the first time and I’m running into some issues with a query. In terms of the schema, I have “Person” nodes, which have a “name” property, as well as Location nodes also with a name property. These node types can be connected by a relationship edge, called HAS_LIVED (to signify whether a person has lived in a certain location). Now for the query, I have a list of names (strings). The query looks like:
UNWIND ["Anna", "Emma", "Zach"] AS n
MATCH (p:Person {name: n})-[:HAS_LIVED]->(loc)
RETURN loc.Location.name
This should return a list of three places, e.g. ["London", "Paris", "Berlin"]. However, the query returns nothing. When I remove the UNWIND and write three separate MATCH queries, one per name, each of them works. I'm not sure why.
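Try this instead; it moves the name filter into a WHERE clause. Note that in NebulaGraph 3.x a vertex property is referenced as variable.Tag.property (as the question already does with loc.Location.name), and equality is written ==.
UNWIND ["Anna", "Emma", "Zach"] AS n
MATCH (p:Person)-[:HAS_LIVED]->(loc)
WHERE p.Person.name == n
RETURN loc.Location.name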
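As a rough illustration of the intended semantics, the following Python sketch models UNWIND as producing one row per name and the WHERE clause as filtering the HAS_LIVED edges for each of those rows. The graph data and the `unwind_match` helper are hypothetical; the sketch does not talk to NebulaGraph, it only shows what result the query is expected to produce.

```python
# Hypothetical graph data mirroring the example in the question.
persons = {"p1": "Anna", "p2": "Emma", "p3": "Zach"}          # Person vertex -> name
locations = {"l1": "London", "l2": "Paris", "l3": "Berlin"}   # Location vertex -> name
has_lived = [("p1", "l1"), ("p2", "l2"), ("p3", "l3")]        # HAS_LIVED edges


def unwind_match(names):
    rows = []
    for n in names:                          # UNWIND [...] AS n: one row per element
        for src, dst in has_lived:           # MATCH (p:Person)-[:HAS_LIVED]->(loc)
            if persons[src] == n:            # WHERE p.Person.name == n
                rows.append(locations[dst])  # RETURN loc.Location.name
    return rows


print(unwind_match(["Anna", "Emma", "Zach"]))  # ['London', 'Paris', 'Berlin']
```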

How to use whereIn filter multiple times in a query

I am using this code to retrieve data from Firestore:
querysnap = FirebaseFirestore.instance
.collection("datas")
.where("cat", whereIn: Cateogryarray )
.where("City", whereIn:Cityarray)
.snapshots();
Then I get this error:
You cannot use 'whereIn' filters more than once.
How can I execute this query?
As the Firestore documentation on query limitations says:
You can use at most one in, not-in, or array-contains-any clause per query. You can't combine these operators in the same query.
Since you're trying to use two in clauses in your query, Firestore gives an error.
The most common workaround is to run the query with just one of these clauses against the database (typically the one you expect to exclude the most documents from the result) and perform the rest of the filtering in your application code.
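A minimal Python sketch of that workaround, with the Firestore call replaced by an in-memory list so it runs on its own. `docs`, `filter_in`, `category_array` and `city_array` are hypothetical stand-ins for the collection, the whereIn semantics and the question's Cateogryarray/Cityarray; with the real Flutter SDK, step 1 would be the single .where(..., whereIn: ...) call.

```python
# Hypothetical documents standing in for the "datas" collection.
docs = [
    {"id": "a", "cat": "fruit", "City": "Paris"},
    {"id": "b", "cat": "fruit", "City": "Berlin"},
    {"id": "c", "cat": "dairy", "City": "Paris"},
    {"id": "d", "cat": "bakery", "City": "Paris"},
]


def filter_in(documents, field, values):
    """Keeps documents whose `field` is one of `values` (the semantics of whereIn)."""
    allowed = set(values)
    return [d for d in documents if d.get(field) in allowed]


category_array = ["fruit", "dairy"]
city_array = ["Paris"]

# Step 1: in a real app, this single whereIn is the one sent to Firestore.
server_result = filter_in(docs, "cat", category_array)
# Step 2: the second whereIn runs in application code on the returned documents.
final = filter_in(server_result, "City", city_array)
print([d["id"] for d in final])  # ['a', 'c']
```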
Another idea is to run a separate query for each field that needs an in clause.
Once you have each of the snapshots, intersect the result lists (for example, by document ID). This returns only the records that satisfy every individual filter.
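The intersection step can be sketched in Python as below, assuming each query's results are available as a list of dicts with an `id` key (a hypothetical shape; with Firestore you would typically compare the document IDs from each snapshot). Matching on IDs rather than whole documents avoids relying on object equality. Note that this approach reads every document matching either filter, so it can cost more reads than filtering a single result set in code.

```python
def intersect_by_id(*result_sets):
    """Returns the documents present in every result set, matched by document ID."""
    if not result_sets:
        return []
    common_ids = set.intersection(*({d["id"] for d in rs} for rs in result_sets))
    # Preserve the order of the first result set.
    return [d for d in result_sets[0] if d["id"] in common_ids]


# Hypothetical results of two separate queries, one per whereIn filter.
by_category = [{"id": "a"}, {"id": "b"}, {"id": "c"}]
by_city = [{"id": "c"}, {"id": "a"}, {"id": "d"}]
print([d["id"] for d in intersect_by_id(by_category, by_city)])  # ['a', 'c']
```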

Cosmos DB composite index best practices?

I've got some pretty high Cosmos usage right now that I'd like to reduce, and I think the way to do that is through composite indices, but I'm a little confused about the best approach.
My actual queries get more complex than this, but let's say I have 2 queries that look like this:
SELECT TOP 100 * FROM c WHERE c.partitionkey=n AND c.data.subdata1="str1" ORDER BY c._ts DESC
SELECT TOP 100 * FROM c WHERE c.partitionkey=n AND c.data.subdata1="str1" AND c.data.subdata2="str2" ORDER BY c._ts DESC
If I create a composite index that looks like this, will it help? Should I create two separate indices, one for each query? Should I put the partitionkey into the composite index, even though I'll only ever be searching on a single partition?
"compositeIndexes": [
    [
        {
            "path": "/data/subdata1",
            "order": "ascending"
        },
        {
            "path": "/_ts",
            "order": "descending"
        }
    ]
]
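In Cosmos DB, composite indexes improve performance for queries that have multiple filters, or both a filter and an ORDER BY clause, so I think the composite index will help in your case. The Cosmos DB indexing documentation also suggests adding the filtered property to the ORDER BY so that such a query can use the composite index, e.g. ORDER BY c.data.subdata1 ASC, c._ts DESC.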
Should I put the partitionkey into the composite index, even though
I'll only ever be searching on a single partition?
I believe that putting the partitionkey into the composite index will improve the performance of your query, even though you only search within a single partition.
The best practice is to test your queries against different composite indexes in the Azure Cosmos DB Emulator and use the query stats (such as the request charge) to decide which one to keep.
I think you should have two composite indexes:
The first on partitionkey, subdata1 and _ts
The second on partitionkey, subdata2 and _ts
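For reference, the two composite indexes suggested above would look roughly like this in the container's indexing policy. The snippet only builds and prints the JSON fragment; the `/partitionkey` path assumes the partition key property is literally named `partitionkey`, as in the question's queries, the ascending order for the leading paths is an arbitrary choice, and a full policy would also contain the usual included/excluded paths, which are omitted here.

```python
import json


def composite_index(*paths_and_orders):
    """Builds one composite index entry from (path, order) pairs."""
    return [{"path": path, "order": order} for path, order in paths_and_orders]


indexing_policy = {
    "compositeIndexes": [
        composite_index(
            ("/partitionkey", "ascending"),
            ("/data/subdata1", "ascending"),
            ("/_ts", "descending"),
        ),
        composite_index(
            ("/partitionkey", "ascending"),
            ("/data/subdata2", "ascending"),
            ("/_ts", "descending"),
        ),
    ]
}

print(json.dumps(indexing_policy, indent=2))
```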
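If your data is too large and you don't want to re-index, I would suggest removing the ORDER BY at the database level and sorting in your code instead. TOP 100 then has to move to your code as well: without ORDER BY, TOP 100 returns an arbitrary 100 documents, so you need to fetch all matching documents before sorting.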
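If you do move the sort into your code, a minimal Python sketch of the client-side step might look like this. It assumes you have already fetched every document matching the WHERE clause (the `docs` list is hypothetical sample data) and applies the TOP limit only after ordering by `_ts`.

```python
import heapq

# Hypothetical documents already filtered by partition key and subdata1,
# fetched without ORDER BY or TOP, so every matching document is present.
docs = [
    {"id": "a", "_ts": 1700000300},
    {"id": "b", "_ts": 1700000100},
    {"id": "c", "_ts": 1700000200},
]


def top_n_by_ts(items, n=100):
    """Client-side equivalent of TOP n ... ORDER BY c._ts DESC."""
    # heapq.nlargest gives the same result as sorted(..., reverse=True)[:n]
    # without sorting the whole list.
    return heapq.nlargest(n, items, key=lambda d: d["_ts"])


print([d["id"] for d in top_n_by_ts(docs, n=2)])  # ['a', 'c']
```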

Azure CosmosDB IS_DEFINED vs NOT IS_DEFINED

I was trying to query a collection that had a few documents. Some of the documents had an "Exception" property, while others didn't.
My final queries look something like this:
Records that do not contain Exception:
select COUNT(1) from doc c WHERE NOT IS_DEFINED(c.Exception)
Records that contain Exception:
select COUNT(1) from doc c WHERE IS_DEFINED(c.Exception)
But this doesn't seem to work: while NOT IS_DEFINED returns some count, IS_DEFINED returns 0 records, even though matching documents exist.
My data looks something like (some documents can contain Exception property & others don't):
[{
'Name': 'Sagar',
'Age': 26,
'Exception': 'Object reference not set to an instance of the object', ...
},
{
'Name': 'Sagar',
'Age': 26, ...
}]
Update
As Dax Fohl said in an answer, NOT IS_DEFINED is now implemented to use the index. See the Azure Cosmos DB dev blog's April updates for more details.
To benefit from this, the queried property should be included in the collection's indexing policy.
Excerpt from the blog post:
Queries with inequality filters or filters on undefined values can now
be run more efficiently. Previously, these filters did not utilize the
index. When executing a query, Azure Cosmos DB would first evaluate
other less expensive filters (such as =, >, or <) in the query. If
there were inequality filters or filters on undefined values
remaining, the query engine would be required to load each of these
documents. Since inequality filters and filters on undefined values
now utilize the index, we can avoid loading these documents and see a
significant improvement in RU charge.
Here’s a full list of query filters with improvements:
Inequality comparison expression (e.g. c.age != 4)
NOT IN expression (e.g. c.name NOT IN (‘Luis’, ‘Andrew’, ‘Deborah’))
NOT IsDefined
Is expressions (e.g. NOT IsDefined(c.age), NOT IsString(c.name))
Coalesce operator expression (e.g. (c.name ?? ‘N/A’) = ‘Thomas’)
Ternary operator expression (e.g. c.name = null ? ‘N/A’ : c.name)
If you have queries with these filters, you should add an index for
the relevant properties.
The main difference between IS_DEFINED and NOT IS_DEFINED is that the former uses the index while the latter does not (the same applies to = vs. !=). Most likely, the IS_DEFINED query finishes in a single continuation, so you get the full COUNT result, whereas the NOT IS_DEFINED query did not finish in a single continuation, so you got a partial COUNT result. You should get the full result by following the query's continuation token.
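To show what following the continuation means for an aggregate like COUNT, here is a minimal Python sketch. `fetch_page` is a hypothetical callable standing in for one request to Cosmos DB that returns a partial count plus the next continuation token (None when the query is done), and the three-page fake backend is made-up data. Many current SDKs run this loop for you when you iterate the whole result, so treat this as an illustration of the idea rather than SDK code.

```python
from typing import Callable, Optional, Tuple

# Hypothetical page fetcher: takes a continuation token, returns (partial_count, next_token).
PageFetcher = Callable[[Optional[str]], Tuple[int, Optional[str]]]


def count_all(fetch_page: PageFetcher) -> int:
    """Sums partial COUNT results until no continuation token is returned."""
    total = 0
    token: Optional[str] = None
    while True:
        partial, token = fetch_page(token)
        total += partial
        if token is None:
            return total


# Fake backend simulating a NOT IS_DEFINED COUNT split across three continuations.
_pages = {None: (40, "t1"), "t1": (35, "t2"), "t2": (25, None)}


def fake_fetch(token: Optional[str]) -> Tuple[int, Optional[str]]:
    return _pages[token]


print(count_all(fake_fetch))  # 100
```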

Get the number of records from MDX query with Subcubes

I'm developing a system that generates MDX queries from "FilterCriterias" entities, along with related info such as the number of records a query returns, so I need a generic way to get the number of records of an MDX query that uses subcubes. In a normal query I do something like:
WITH
MEMBER [MyCount] AS
Count([Date].[Date].MEMBERS)
SELECT
{[MyCount]} ON 0
FROM [Adventure Works];
But I run into problems when I use this approach in slightly more complex queries, like this one:
WITH
MEMBER [MyCount] AS
Count([Date].[Date].MEMBERS)
SELECT
{[MyCount]} ON 0
FROM
(
SELECT
{[Measures].[Sales Amount]} ON 0
,{[Date].[Date].&[20050701] : [Date].[Date].&[20051231]} ON 1
FROM
(
SELECT
{[Sales Channel].[Sales Channel].&[Internet]} ON 0
FROM [Adventure Works]
)
);
I'd expect the logical result to be the number of [Date].[Members] left in the subcube, but I get a result with no columns or rows. I'm new to MDX and don't understand this behavior. Is there a generic way to get the number of records from a "base" query, just like SELECT COUNT(*) FROM () in plain SQL?
The structure is quite different from a relational SELECT COUNT(*) FROM ().
I believe that the structure of a sub-select will be very similar to that of a sub-cube and reading through this definition from MSDN (https://msdn.microsoft.com/en-us/library/ms144774.aspx) of what a sub-cube contains tells us that it isn't a straight filter like in a relational query:
Admittedly I still find this behaviour rather "enigmatic" (a polite way of saying "I do not understand it")
Is there a workaround?
