Composite index for optional field in Cosmos - azure-cosmosdb

I have a collection in Cosmos DB which contains documents of different types (and schemas):
{
"partKey": "...",
"type": "type1",
"data": {
"field1": 123,
"field2": "sdfsdf"
}
}
{
"partKey": "...",
"type": "type2",
"data": {
"field3": ["123", "456", "789"]
}
}
I'm trying to create a composite index [/type, /data/field3/[]/?], but I get the following error:
The indexing path '\\/data\\/field3\\/[]\\/?' could not be accepted, failed near position '15'. Please ensure that the path is a valid path. Common errors include invalid characters or absence of quotes around labels

We don't support wildcards in composite index paths in Cosmos DB. Here is a composite index sample as reference.
I looked over our docs and this restriction is not currently documented. We will update them to make it clearer.
Thanks.

In composite indexes, you just need to specify the paths that you want to index, rather than the values, so for your example:
"compositeIndexes":[
[
{
"path":"/type",
"order":"ascending"
},
{
"path":"/data/field3",
"order":"descending"
}
]
]
Just specify the order type you need for your queries (I've just used these ones as an example).
For documents that have different properties underneath your data property, I believe you will have to add a separate composite index for each use case you need, since composite indexes don't support wildcards. That means listing each path explicitly:
/data/field1, /data/field2, and so on.
Hope this helps.
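As a rough sketch of the approach above, the following Python snippet builds a compositeIndexes fragment with one explicit entry per property combination. The helper name composite_index, the chosen sort orders, and the wildcard check are assumptions made for illustration; the check only mirrors the rule that composite paths must not contain wildcard segments, and it does not call any Cosmos DB SDK.

```python
import json


def composite_index(*paths_with_order):
    """Build one composite index: a list of {"path", "order"} entries."""
    entries = []
    for path, order in paths_with_order:
        # Composite paths are plain property paths. Wildcard segments such as
        # [] , ? or * are rejected here, as in the error from the question.
        if any(token in path for token in ("[]", "?", "*")):
            raise ValueError(f"wildcards are not allowed in composite paths: {path}")
        if order not in ("ascending", "descending"):
            raise ValueError(f"unknown order: {order}")
        entries.append({"path": path, "order": order})
    return entries


# One composite index per property combination that the queries need.
# The sort orders are placeholders; pick the ones your ORDER BY clauses use.
indexing_policy_fragment = {
    "compositeIndexes": [
        composite_index(("/type", "ascending"), ("/data/field1", "ascending")),
        composite_index(("/type", "ascending"), ("/data/field2", "ascending")),
        composite_index(("/type", "ascending"), ("/data/field3", "descending")),
    ]
}

print(json.dumps(indexing_policy_fragment, indent=2))
```

Passing the path from the question, "/data/field3/[]/?", to composite_index raises a ValueError instead of producing a policy that the service would reject.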

Related

How to query documents where contains an array and the value of the array is ["val1", "val2"] Firestore

How can I query a collection with a condition that applies to an array inside each document?
Document example (I would like to know how to find the documents where the brands are fiat and seat):
{
"name":"test 1",
"brands":[
{
"brand":{
"id":1,
"name":"Fiat",
"slug":"fiat",
"image":null,
"year_end":null,
"year_start":null
},
"released_at":"2018-10-26"
},
{
"brand":{
"id":2,
"name":"Seat",
"slug":"seat",
"image":null,
"year_end":null,
"year_start":null
},
"released_at":"2018-10-26"
},
{
"brand":{
"id":3,
"name":"Mercedes",
"slug":"mercedes",
"image":null,
"year_end":null,
"year_start":null
},
"released_at":"2018-10-26"
},
{
"brand":{
"id":4,
"name":"Yamaha",
"slug":"yamaha",
"image":null,
"year_end":null,
"year_start":null
},
"released_at":"2018-10-26"
}
]
}
I have tried something like:
.collection("motors")
.where("brands.slug", "array-contains-any", ["fiat", "seat"])
but this is not working, and I cannot figure out from the documentation how to do this.
When using the array-contains-any operator, you can check the values of your array against a value of type String, and not against an array. There is currently no way you can use the array-contains-any operator on an array. There are two options: create two separate fields and run two separate queries or, since this is only one document, get the entire document and filter the data on the client.
Edit:
What @FrankvanPuffelen has commented is correct. I did some research and found that we can check against any type, even complex types, and not just against strings as mentioned before. The key to solving this issue is to match the entire object, meaning all properties of that object and not just a partial match (for example, one of three properties).
What you are trying to achieve does not work with your current database structure because your slug property exists in an object that is nested within the actual object stored in your array. A possible solution is to duplicate some data: add only the desired values to a separate array and use the array-contains-any operator on this newly created array.
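A minimal sketch of the duplicated-array idea, written as plain Python without the Firestore SDK. The field name brandSlugs and both helper functions are hypothetical, the sample document is a shortened version of the one in the question, and array_contains_any is only a local stand-in that imitates whole-element matching.

```python
def brand_slugs(doc):
    """Collect the nested brand slugs into a flat list (the duplicated field)."""
    return [entry["brand"]["slug"] for entry in doc["brands"]]


def array_contains_any(array, candidates):
    """Local stand-in: true if any array element equals one of the candidates."""
    return any(element in candidates for element in array)


# Shortened version of the document from the question.
motor = {
    "name": "test 1",
    "brands": [
        {"brand": {"id": 1, "name": "Fiat", "slug": "fiat"}, "released_at": "2018-10-26"},
        {"brand": {"id": 2, "name": "Seat", "slug": "seat"}, "released_at": "2018-10-26"},
    ],
}

# Hypothetical extra field, written whenever the brands array changes.
motor["brandSlugs"] = brand_slugs(motor)

# Matching plain strings against the flat array works.
print(array_contains_any(motor["brandSlugs"], ["fiat", "seat"]))  # True

# A partial object does not equal any element of the nested array.
print(array_contains_any(motor["brands"], [{"slug": "fiat"}]))  # False

# Only a complete element, with all of its properties, matches.
print(array_contains_any(motor["brands"], [motor["brands"][0]]))  # True
```

With such a field in place, the query from the question would target brandSlugs rather than brands.slug. Note that array-contains-any matches documents containing at least one of the listed values, so requiring both fiat and seat would still need an extra check on the client.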

AppSync query resolver: are expressionNames and expressionValues necessary?

https://docs.aws.amazon.com/appsync/latest/devguide/resolver-mapping-template-reference-dynamodb.html#aws-appsync-resolver-mapping-template-reference-dynamodb-query
AppSync doc says that expressionNames and expressionValues are optional fields, but they are always populated by code generation. First question, should they be included as a best practice when working with DynamoDB? If so, why?
AppSync resolver for a query on the partition key:
{
"version": "2017-02-28",
"operation": "Query",
"query": {
"expression": "#partitionKey = :partitionKey",
"expressionNames": {
"#partitionKey": "partitionKey"
},
"expressionValues": {
":partitionKey": {
"S": "${ctx.args.partitionKey}"
}
}
}
}
Second question, what exactly is the layman translation of the expression field here in the code above? What exactly is that statement telling DynamoDB to do? What is the use of the # in "expression": "#partitionKey = :partitionKey" and are the expression names and values just formatting safeguards?
Let me answer your second question first:
expressionNames
expressionNames are used for interpolation. What this means is after interpolation, this filter expression object:
"expression": "#partitionKey = :value",
"expressionNames": {
"#partitionKey": "id"
}
will be transformed to:
"expression": "id = :value",
Here #partitionKey acts as a placeholder for your column name id, and '#' happens to be the delimiter.
But why?
expressionNames are necessary because certain keywords are reserved by DynamoDB, meaning you can't use these words inside a DynamoDB expression.
expressionValues
When you need to compare anything in a DynamoDB expression, you will need also to use a substitute for the actual value using a placeholder, because the DynamoDB typed value is a complex object.
In the following example:
"expression": "myKey = :partitionKey",
"expressionValues": {
":partitionKey": {
"S": "123"
}
}
:partitionKey is the placeholder for the complex value
{
"S": "123"
}
':' is a different delimiter; it tells DynamoDB to use the expressionValues map when replacing.
Why are expressionNames and expressionValues always used by code generation?
It is just simpler for the code generation logic to always use expressionNames and expressionValues because there is no need to have two code paths for reserved/non-reserved DynamoDB words. Using expressionNames will always prevent collisions!
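To make the placeholder idea concrete, here is a small illustrative Python sketch. DynamoDB resolves placeholders internally, so the functions resolve_names and undefined_placeholders are hypothetical helpers that only imitate the substitution described above; they are not part of AppSync or any AWS SDK.

```python
import re


def resolve_names(expression, expression_names):
    """Replace each #placeholder with the attribute name it stands for."""
    return re.sub(r"#\w+", lambda m: expression_names[m.group(0)], expression)


def undefined_placeholders(query):
    """Return placeholders used in the expression but missing from the maps."""
    names = set(re.findall(r"#\w+", query["expression"]))
    values = set(re.findall(r":\w+", query["expression"]))
    missing_names = names - set(query.get("expressionNames", {}))
    missing_values = values - set(query.get("expressionValues", {}))
    return missing_names | missing_values


# The query block from the resolver in the question, with a literal value.
query = {
    "expression": "#partitionKey = :partitionKey",
    "expressionNames": {"#partitionKey": "partitionKey"},
    "expressionValues": {":partitionKey": {"S": "123"}},
}

# Names are swapped in; the value placeholder stays and is looked up
# in expressionValues, where the typed value {"S": "123"} lives.
print(resolve_names(query["expression"], query["expressionNames"]))
# partitionKey = :partitionKey

print(undefined_placeholders(query))  # set()
```

Read in plain terms, the expression says: return the items whose partitionKey attribute equals the string value supplied for :partitionKey.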

Firebase nested data using "reference" : true instead of array

The Firebase "Structure your data" section shows how to structure data in a many-to-many user-group situation. But why have they used "reference": true on both sides instead of a simple array of ids?
It could be done either way:
A user having an array of groups
"groups" : [ "groupId1", "groupId2", ... ]
A user having a map of groups
"groups": {
"groupId1" : true,
"groupId2" : true,
..
}
They have done it the second way. What is the reason for that?
Something was said about this in a video from Google I/O 2016, but I'm unable to recall it.
Example from structure your data:
// An index to track Ada's memberships
{
"users": {
"alovelace": {
"name": "Ada Lovelace",
// Index Ada's groups in her profile
"groups": {
// the value here doesn't matter, just that the key exists
"techpioneers": true,
"womentechmakers": true
}
},
...
},
"groups": {
"techpioneers": {
"name": "Historical Tech Pioneers",
"members": {
"alovelace": true,
"ghopper": true,
"eclarke": true
}
},
...
}
}
Firebase recommends against using arrays in its database for most cases. Instead of repeating the reasons here, I'll refer you to this classic blog post on arrays in Firebase.
Let's look at one simple reason you can easily see from your example. Since Firebase arrays in JavaScript are just associative objects with sequential, integer keys, your first sample is stored as:
"groups" : {
0: "groupId1",
1: "groupId2"
}
To detect whether this user is in groupId2, you have to scan all the values in the array. When there are only two values, that may not be too bad. But it quickly gets slower as you add more values. You also won't be able to query or secure this data, since neither Firebase Queries nor its security rules support a contains() operator.
Now look at the alternative data structure:
"groups": {
"groupId1" : true,
"groupId2" : true
}
In this structure you can see whether the user is in groupId2 by checking precisely one location: /groups/groupId2. If that key exists, the user is a member of groupId2. The actual value doesn't really matter in this case; we just use true as a marker value (since Firebase will delete a path if there's no value).
This will also work better with queries and security rules, because you now "just" need an exists() operator.
For some great insights into this type of modeling, I highly recommend that article on NoSQL data modeling.
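The difference between the two structures can be sketched in plain Python, with a list standing in for the array form and a dict for the keyed form. The helper to_membership_map is a hypothetical name, and nothing here talks to Firebase; it only illustrates scan versus direct key lookup.

```python
def to_membership_map(group_ids):
    """Turn a list of ids into the {"id": True} shape used in the docs."""
    return {group_id: True for group_id in group_ids}


groups_as_array = ["groupId1", "groupId2"]
groups_as_map = to_membership_map(groups_as_array)
print(groups_as_map)  # {'groupId1': True, 'groupId2': True}

# Array form: membership means scanning the values one by one.
in_array = "groupId2" in groups_as_array

# Map form: membership is a single key lookup, like reading /groups/groupId2.
in_map = "groupId2" in groups_as_map

# Leaving a group is also a single-key operation in the map form,
# with no need to find an index or renumber the remaining entries.
groups_as_map.pop("groupId1", None)
print(groups_as_map)  # {'groupId2': True}
```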

Freebase currency query

I have a little problem; maybe you can help me. I'm trying to get the "currency features" from Freebase. So I tried "/base/schemastaging/person_extra/net_worth": null, but I can't get the value shown on Freebase (for example, for Madonna it's 650,000,000). Do you know why it's not working?
First of all, as the property path suggests, /base/schemastaging/person_extra/net_worth is just being staged right now so the final property ID will be something else (follow the mailing list to discuss new schema). You should NOT be using this property for anything other than experimentation.
The reason you don't see the data you want with the following query is that this property is a CVT.
{
"id": "/en/madonna",
"type": "/base/schemastaging/person_extra",
"net_worth": null
}
CVT values are complex objects that need to be expanded to access the values that you want. In this case, net_worth is a CVT so that we can record a person's net worth at different points in time.
If you expand your query to include the relevant properties from /measurement_unit/dated_money_value you'll see the data that you're after.
{
"id": "/en/madonna",
"type": "/base/schemastaging/person_extra",
"net_worth": {
"amount": null,
"currency": null,
"valid_date": null
}
}
One other issue, which isn't obvious from this example, is that since there can be multiple dated money values, you'll need to make your query more precise so as to get only the latest value. You can do that like this:
{
"id": "/en/madonna",
"type": "/base/schemastaging/person_extra",
"net_worth": {
"amount": null,
"currency": null,
"valid_date": null,
"sort": "-valid_date",
"limit": 1,
"optional": true
}
}
Update: Made net worth an optional property value.
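What the "sort", "limit" and "optional" keys achieve can be imitated locally in Python. This is only a sketch: the function latest_dated_value is a hypothetical helper, and the list below is made-up sample data (only the 650,000,000 amount comes from the question; the other amount, the dates, and the currency label are invented for illustration).

```python
def latest_dated_value(values):
    """Return the entry with the greatest valid_date, or None if there is none."""
    dated = [v for v in values if v.get("valid_date")]
    if not dated:
        # Mirrors "optional": true, where a missing value is not an error.
        return None
    # Mirrors "sort": "-valid_date" with "limit": 1. Date strings in the same
    # ISO format compare correctly as plain strings.
    return max(dated, key=lambda v: v["valid_date"])


# Made-up sample of a CVT with several dated money values.
net_worth = [
    {"amount": 500000000, "currency": "United States Dollar", "valid_date": "2010"},
    {"amount": 650000000, "currency": "United States Dollar", "valid_date": "2012"},
]

print(latest_dated_value(net_worth)["amount"])  # 650000000
print(latest_dated_value([]))  # None
```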

Freebase MQL - Don't show parent object if a value in array element is present?

Trying to get some movies and their genres but leave out any records that contain the genre "Thriller" in the array of genres.
How do I not only ignore the genre key itself for "Thriller", but squelch that entire movie result? With my current query, Thriller is removed from the array of genres, but the parent object (film) is still displayed.
Here's my current workup in the query editor:
http://tinyurl.com/d2g54lj
[{
'type':'/film/film',
'limit':5,
'name':null,
'/film/film/genre': [],
'/film/film/genre!=': "Thriller",
}]
The answer provided is correct, but it changes some other things in the query too. Here's the direct equivalent of the original query:
[{
"type": "/film/film",
"limit": 5,
"name": null,
"genre": [],
"x:genre": {"name":"Thriller",
"optional":"forbidden"}
}]
The important part is the "optional":"forbidden". The default property used is "name", but we need to specify it explicitly when we use a subclause (to allow us to specify the "optional" keyword). Using ids instead of names, as @kook did, is actually more reliable, so that's an improvement, but I wanted people to be able to see the minimum necessary to fix the broken query.
We can abbreviate the property name to "genre" from "/film/film/genre" since "type":"/film/film" is included (we also never need to use /type/object for properties like /type/object/name).
Answering my own question.
So the trick is to not use the != (but not) operator, but to actually flip it on its head and use the "|=" (one of) operator with 'forbid', like so:
[{
'type':'/film/film',
'limit':5,
'name':null,
'/film/film/genre': [{
"id": null,
"optional": true
}],
"forbid:/film/film/genre": {
"id|=": [
"/en/thriller",
"/en/slapstick"
],
"optional": "forbidden"
}
}]
Thanks to the following post:
Freebase query - exclusion of certain values
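The difference between filtering a value out of the genre array and forbidding the whole film can be modelled in a few lines of Python. This is a rough model of the behaviour reported in the question, not of the MQL engine itself; the film names and both function names are placeholders invented for illustration.

```python
# Placeholder data: film names and genre assignments are invented.
films = [
    {"name": "Film A", "genre": ["Thriller", "Drama"]},
    {"name": "Film B", "genre": ["Comedy"]},
]


def drop_genre_value(films, excluded):
    """Model of the reported != result: the value disappears, the film stays."""
    return [
        {"name": film["name"], "genre": [g for g in film["genre"] if g != excluded]}
        for film in films
    ]


def forbid_genres(films, forbidden):
    """Model of "optional": "forbidden": any match removes the whole film."""
    return [film for film in films if not any(g in forbidden for g in film["genre"])]


print([f["name"] for f in drop_genre_value(films, "Thriller")])
# ['Film A', 'Film B']

print([f["name"] for f in forbid_genres(films, ["Thriller", "Slapstick"])])
# ['Film B']
```

Passing a list to forbid_genres corresponds to the "id|=" (one of) form in the last query, where a film is excluded if it has any of the listed genres.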
