Because of array-ordering issues with change detection, and because the schema of the objects in the array is inconsistent, I'd prefer to structure arrays of objects as dictionaries, i.e. plain JSON objects keyed by each object's id.
For example, instead of:
{id: 'abc', items: [{id: 'xyz', text:'foo'},{id: 'wxy', name:'bar'}]}
I'd prefer:
{id: 'abc', items: {'xyz': {text:'foo'}, 'wxy': {name:'bar'}}}
Is there any way in the Cosmos SQL API to query an object and list its keys and/or its values, given that the keys are inconsistent between objects?
I'm looking for something equivalent to Object.keys() or Object.values() in JavaScript.
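For illustration, a JavaScript user-defined function could emulate this, though I'd prefer a built-in. A minimal sketch (the UDF name keysOf is hypothetical, and note that UDFs won't use the index):

function keysOf(obj) {
    // Equivalent of Object.keys() for a stored object.
    return Object.keys(obj || {});
}

Once registered on the collection, it could be called like:

SELECT c.id, udf.keysOf(c.items) AS itemKeys FROM c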
Related
Consider the following data structure:
[
{
"id": "parent",
"links":[{
"href": "child"
}]
},
{
"id": "child",
"links":[{
"href": "grand_child"
}]
},
{
"id": "grand_child",
"links":[]
},
{
"id": "some-other-item",
"links":[{
"href": "anywhere"
}]
}
]
Is there a way to select all the documents related to the "parent" in a single query using Cosmos DB SQL?
The result must include parent, child and grand_child documents.
I'm assuming here that the JSON array shown in the OP is the array of documents in your collection.
No, this cannot be done using the SQL API's querying tools. Cosmos DB does not support joining different documents, let alone recursively; joins in Cosmos DB SQL are self-joins.
But if you definitely need this to happen server-side, then you can implement the recursive gathering algorithm by scripting it in a user-defined function. Rather ugly, though, imho.
I would suggest just implementing this on the client side: one query per depth level, merging the results as you go (see the sketch below). This also keeps a nice separation of logic and data, and performance should be acceptable if you correctly query all new links of a level together in a single query to avoid the 1+N problem.
If your actual needs get much more complex around graph traversal, then you'd have to migrate your documents (or just the links) to somewhere capable of querying graphs (e.g. the Gremlin API).
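For what it's worth, a minimal sketch of that client-side approach, assuming the Node SDK (@azure/cosmos) and documents shaped like the ones in the question (the function name fetchRelated is my own):

async function fetchRelated(container, rootId) {
  const docs = [];
  const seen = new Set([rootId]);
  let ids = [rootId];
  while (ids.length > 0) {
    // One query per depth level; all ids of the level are batched together
    // to avoid the 1+N problem mentioned above.
    const { resources } = await container.items
      .query({
        query: "SELECT * FROM c WHERE ARRAY_CONTAINS(@ids, c.id)",
        parameters: [{ name: "@ids", value: ids }],
      })
      .fetchAll();
    docs.push(...resources);
    // Collect the next level of links, skipping anything already fetched
    // (also guards against cycles).
    ids = resources
      .flatMap(d => (d.links || []).map(l => l.href))
      .filter(id => !seen.has(id) && seen.add(id));
  }
  return docs; // for rootId "parent": parent, child, grand_child
}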
I was trying to query a collection that had a few documents. Some of the documents had an "Exception" property, while others didn't.
My end query looks something like:
Records that do not contain Exception:
SELECT COUNT(1) FROM doc c WHERE NOT IS_DEFINED(c.Exception)
Records that contain Exception:
SELECT COUNT(1) FROM doc c WHERE IS_DEFINED(c.Exception)
But this doesn't seem to be working: while the NOT IS_DEFINED query returns a count, the IS_DEFINED query returns 0 records even though there is matching data.
My data looks something like this (some documents contain the Exception property and others don't):
[{
'Name': 'Sagar',
'Age': 26,
'Exception': 'Object reference not set to an instance of the object', ...
},
{
'Name': 'Sagar',
'Age': 26, ...
}]
Update
As Dax Fohl said in an answer, NOT IS_DEFINED is implemented now. See the Cosmos DB April updates blog post for more details.
To use it properly, the queried property should be added to the collection's index (see the sketch after the excerpt below).
Excerpt from the blog post:
Queries with inequality filters or filters on undefined values can now be run more efficiently. Previously, these filters did not utilize the index. When executing a query, Azure Cosmos DB would first evaluate other less expensive filters (such as =, >, or <) in the query. If there were inequality filters or filters on undefined values remaining, the query engine would be required to load each of these documents. Since inequality filters and filters on undefined values now utilize the index, we can avoid loading these documents and see a significant improvement in RU charge.
Here's a full list of query filters with improvements:
Inequality comparison expression (e.g. c.age != 4)
NOT IN expression (e.g. c.name NOT IN ('Luis', 'Andrew', 'Deborah'))
NOT IsDefined / Is expressions (e.g. NOT IsDefined(c.age), NOT IsString(c.name))
Coalesce operator expression (e.g. (c.name ?? 'N/A') = 'Thomas')
Ternary operator expression (e.g. c.name = null ? 'N/A' : c.name)
If you have queries with these filters, you should add an index for the relevant properties.
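For example, a minimal sketch of an indexing policy that keeps the Exception property from the question indexed. With the default policy the /* path already covers it; the key point is that the property must not sit under excludedPaths:

{
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
        { "path": "/Exception/?" },
        { "path": "/*" }
    ],
    "excludedPaths": [
        { "path": "/\"_etag\"/?" }
    ]
}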
The main difference between IS_DEFINED and NOT IS_DEFINED is that the former utilizes the index while the latter does not (same with = vs. !=). Most likely what is happening here is that the IS_DEFINED query finishes in a single continuation, so you get the full COUNT result, while the NOT IS_DEFINED query does not finish in a single continuation, so you got a partial COUNT. You should get the full result by following the query continuation.
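A sketch of what following the continuation can look like with the Node SDK (@azure/cosmos); newer SDK versions may drain continuations for you, so treat this as illustrative:

async function countAll(container, sql) {
  let total = 0;
  const iterator = container.items.query(sql);
  while (iterator.hasMoreResults()) {
    const { resources } = await iterator.fetchNext();
    // With SELECT VALUE COUNT(1), each page yields a partial count; sum them.
    for (const partial of resources || []) total += partial;
  }
  return total;
}

// usage (inside an async function):
const n = await countAll(container,
  "SELECT VALUE COUNT(1) FROM doc c WHERE NOT IS_DEFINED(c.Exception)");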
We are migrating from MongoDB to Cosmos DB using the Mongo API.
We have encountered the following difference in query behavior around sorting.
Using the Cosmos DB Mongo API, sorting by a field removes all documents that don't have that field. Is it possible to modify the query to include the nulls and replicate the Mongo behavior?
For example, if we have the following two documents
[{
"id":"p1",
"priority":1
},{
"id":"p2"
}]
performing:
sort({"priority":1})
Cosmos DB will return a single result, 'p1'.
Mongo will return both results in the order 'p2', 'p1'; the documents with null/missing values come first.
As far as I know, null values are not included in the sort scan of the query result.
As a workaround, you can add a non-existent field to the sort to force the engine to scan all the documents.
Like this:
db.getCollection('brandotestcollections').find().sort({"test": 1, "aaaa":1})
The result then includes both documents.
I had the same problem and solved it after some reading.
Refer to the documentation...
You have to update the indexing policy of the container to change the default way Cosmos DB sorts!
The following example performs an unordered insert of three documents. With unordered inserts, if an error occurs during an insert of one of the documents, MongoDB continues to insert the remaining documents in the array:
db.products.insert(
[
{ _id: 20, item: "lamp", qty: 50, type: "desk" },
{ _id: 21, item: "lamp", qty: 20, type: "floor" },
{ _id: 22, item: "bulk", qty: 100 }
],
{ ordered: false }
)
Is this possible with mongolite? I am using a data frame to insert data into Mongo.
The mongo shell converts multiple insert statements into a bulk insert operation, which is where the ordered vs unordered behaviour applies. The Bulk API was introduced in MongoDB 2.6; older versions of MongoDB had a batch insert API which had an option to "continue on error" that defaulted to false.
The mongolite R package builds on the officially supported libmongoc driver, but as of mongolite 1.2 it does not expose an option to control the behaviour of bulk inserts. However, the underlying mongolite C functions do have a stop_on_error boolean (default: TRUE) which maps to ordered vs unordered inserts.
I've submitted a pull request (mongolite #99) which will pass through the stop_on_error parameter for bulk inserts.
This doesn't change the default mongolite behaviour, which will be to stop at the first error encountered in a bulk insert. With stop_on_error set to FALSE, errors will be summarised for each batch of bulk inserts.
Sample usage (where data could be any valid parameter for insert() such as a data frame, named list, or character vector with JSON strings):
coll$insert(data, stop_on_error = FALSE)
It may make more sense to rename the parameter from stop_on_error to ordered for consistency with the bulk API, but I'll leave that to the discretion of the mongolite maintainer.
I have a collection in Azure DocumentDB wherein I have documents clustered into 3 sets using a JSON property called clusterName on each document. The 3 clusters of documents are templated somewhat like these:
{
"clusterName": "CustomerInformation",
"id": "CustInfo1001",
"custName": "XXXX"
},
{
"clusterName": "ZoneInformation",
"id": "ZoneInfo5005",
"zoneName": "YYYY"
},
{
"clusterName": "CustomerZoneAssociation",
"id": "CustZoneAss9009",
"custId": "CustInfo1001",
"zoneId": "ZoneInfo5005"
}
As you can see, the document for CustomerZoneAssociation links the documents of CustomerInformation and ZoneInformation by their ids. I need help querying out information from the CustomerInformation and ZoneInformation clusters with the help of the ids associated in the CustomerZoneAssociation cluster. The result I am expecting from the query is:
{
"clusterName": "CustomerZoneAssociation",
"id": "CustZoneAss9009",
"custId": "CustInfo1001",
"custName": "XXXX",
"zoneId": "ZoneInfo5005",
"zoneName": "YYYY"
}
Please suggest a solution that would take only one trip to DocumentDB.
DocumentDB does not support inter-document JOINs... instead, the JOIN keyword is used to perform intra-document cross-products (to be used with nested arrays).
I would recommend one of the following approaches:
Keep in mind that you do not have to normalize every entity as you would with a traditional RDBMS. It may be worth revisiting your data model and de-normalizing parts of your data where appropriate. Keep in mind, too, that de-normalizing comes with its own trade-offs (fanning out writes vs. issuing follow-up reads). Check out the following SO answer to read more on the trade-offs between normalizing and de-normalizing data.
Write a stored procedure to batch a sequence of operations within a single network request; a sketch follows below. Check out the following SO answer for a code sample on this approach.
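For the second approach, a sketch of such a stored procedure (the name getCustomerZone and the error handling are illustrative; remember that a stored procedure executes within a single partition key, so this assumes all three documents are reachable under it):

function getCustomerZone(assocId) {
    var coll = getContext().getCollection();

    fetchById(assocId, function (assoc) {
        fetchById(assoc.custId, function (cust) {
            fetchById(assoc.zoneId, function (zone) {
                // Merge the three documents into the shape the question expects.
                getContext().getResponse().setBody({
                    clusterName: assoc.clusterName,
                    id: assoc.id,
                    custId: assoc.custId,
                    custName: cust.custName,
                    zoneId: assoc.zoneId,
                    zoneName: zone.zoneName
                });
            });
        });
    });

    function fetchById(id, next) {
        var accepted = coll.queryDocuments(
            coll.getSelfLink(),
            {
                query: "SELECT * FROM c WHERE c.id = @id",
                parameters: [{ name: "@id", value: id }]
            },
            function (err, docs) {
                if (err) throw err;
                if (!docs || docs.length !== 1) throw new Error("Document not found: " + id);
                next(docs[0]);
            });
        if (!accepted) throw new Error("Query not accepted; retry the stored procedure.");
    }
}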