How can I select a filtered child document collection when querying a top-level document in Cosmos DB? - azure-cosmosdb

I'm trying to filter the child documents returned when querying a parent document using the SQL API in Cosmos DB.
For example, given this document:
{
"customerName": "Wallace",
"customerReference": 666777,
"orders": [
{
"date": "20181105T00:00:00",
"amount": 118.84,
"description": "Laptop Battery"
},
{
"date": "20181105T00:00:00",
"amount": 81.27,
"description": "Toner"
},
{
"date": "20181105T00:00:00",
"amount": 55.12,
"description": "Business Cards"
},
{
"date": "20181105T00:00:00",
"amount": 281.00,
"description": "Espresso Machine"
}]
}
I would like to query the customer to retrieve the name, reference, and orders over 100.00, producing a result like this:
[{
"customerName": "Wallace",
"customerReference": 666777,
"orders": [
{
"date": "20181105T00:00:00",
"amount": 118.84,
"description": "Laptop Battery"
},
{
"date": "20181105T00:00:00",
"amount": 281.00,
"description": "Espresso Machine"
}]
}]
The query I have so far is as follows:
SELECT c.customerName, c.customerReference, c.orders
from c
where c.customerReference = 666777
and c.orders.amount > 100
This returns an empty set:
[]
If you remove "and c.orders.amount > 100", it matches the document and returns all orders.
To reproduce this issue, I simply set up a new database, added a new collection, and copied the JSON example in as the only document. The index policy is left as the default, which I've copied below.
{
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [
{
"path": "/*",
"indexes": [
{
"kind": "Range",
"dataType": "Number",
"precision": -1
},
{
"kind": "Range",
"dataType": "String",
"precision": -1
},
{
"kind": "Spatial",
"dataType": "Point"
}
]
}
],
"excludedPaths": []
}

Cosmos DB doesn't support deep filtering in the way I attempted in my original query: c.orders is an array, so c.orders.amount does not resolve to a value and the comparison never matches.
To achieve the results described, use a subquery that combines ARRAY and VALUE, as follows:
SELECT
c.customerName,
c.customerReference,
ARRAY(SELECT VALUE ord FROM ord IN c.orders WHERE ord.amount > 100) AS orders
FROM c
WHERE c.customerReference = 666777
Note the use of 'ord' as the alias: 'order' is a reserved word.
The query then produces the correct result, e.g.:
[{
"customerName": "Wallace",
"customerReference": 666777,
"orders": [
{
"date": "20181105T00:00:00",
"amount": 118.84,
"description": "Laptop Battery"
},
{
"date": "20181105T00:00:00",
"amount": 281.00,
"description": "Espresso Machine"
}
]
}]
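As a rough illustration of what the ARRAY subquery does, here is a minimal Python sketch that applies the same per-element filter to the example document in memory. It does not talk to Cosmos DB; the function name filter_orders is hypothetical and only mirrors the projection in the query above.

```python
from typing import Any, Dict

def filter_orders(doc: Dict[str, Any], min_amount: float) -> Dict[str, Any]:
    """Mimic the projection in the answer's query on a plain dict.

    Keeps the customer fields and only those orders whose amount
    is strictly greater than min_amount.
    """
    return {
        "customerName": doc["customerName"],
        "customerReference": doc["customerReference"],
        # Equivalent in spirit to:
        # ARRAY(SELECT VALUE ord FROM ord IN c.orders WHERE ord.amount > 100)
        "orders": [o for o in doc["orders"] if o["amount"] > min_amount],
    }

# The example document from the question.
customer = {
    "customerName": "Wallace",
    "customerReference": 666777,
    "orders": [
        {"date": "20181105T00:00:00", "amount": 118.84, "description": "Laptop Battery"},
        {"date": "20181105T00:00:00", "amount": 81.27, "description": "Toner"},
        {"date": "20181105T00:00:00", "amount": 55.12, "description": "Business Cards"},
        {"date": "20181105T00:00:00", "amount": 281.00, "description": "Espresso Machine"},
    ],
}

result = filter_orders(customer, 100)
print([o["description"] for o in result["orders"]])
```

The filter runs once per array element, which is why it has to live in a subquery over c.orders rather than in the top-level WHERE clause.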

Related

Firestore Pagination: how to define **unique** 'startAt'-cursor for REST?

This is a follow up question to an already solved one.
The answer to that previous question showed how to define a 'startAt' cursor for query pagination over REST, but that cursor relates to a range of documents. In the example below, the cursor relates to all documents whose 'instructionNumber.stringValue' equals "instr. 101". According to my testing, this results in documents being skipped.
New question:
How does the cursor have to be defined so that it relates not only to the stringValue of the field the query is ordered by, but to a distinct document (usually identified by its document ID)?
{
"structuredQuery": {
"from": [{"collectionId": "instructions"}],
"where": {
"fieldFilter": {
"field": {
"fieldPath": "belongsToDepartementID"
},
"op": "EQUAL",
"value": {
"stringValue": "toplevel-document-id"
}
}
},
"orderBy": [
{
"field": {
"fieldPath": "instructionNumber"
},
"direction": "ASCENDING"
}
],
"startAt": {
"values": [{
"stringValue": "instr. 101"
}]
},
"limit": 5
}
}
For better understanding, here is the condensed schema of the documents.
{
"document": {
"name": "projects/PROJECT_NAME/databases/(default)/documents/organizations/testManyInstructions/instructions/i104",
"fields": {
"belongsToDepartementID": {
"stringValue": "toplevel-document-id"
},
"instructionNumber": {
"stringValue": "instr. 104"
},
"instructionTitle": {
"stringValue": "dummy Title104"
},
"instructionCurrentRevision": {
"stringValue": "A"
}
},
"createTime": "2022-02-18T13:55:47.300271Z",
"updateTime": "2022-02-18T13:55:47.300271Z"
}
}
For a query with no ordering:
"orderBy": [{
"direction": "ASCENDING",
"field": {"fieldPath": "__name__"}
}],
"startAt": {
"before": false,
"values": [{"referenceValue": "last/doc/ref"}]
}
For a query with ordering:
"orderBy": [
{
"direction": "DESCENDING",
"field": {"fieldPath": "instructionNumber"}
},
{
"direction": "DESCENDING",
"field": {"fieldPath": "__name__"}
}
],
"startAt":
{
"before": false,
"values": [
{"stringValue": "instr. 101"},
{"referenceValue": "last/doc/ref"}
]
}
Be sure to use the same direction for __name__ as the preceding "orderBy" field; otherwise the query will need a composite index.
To ensure you identify a unique document to start at, always include the document ID in your call to startAt.
I'm not sure of the exact syntax for the REST API, but the Firebase SDKs automatically pass this document ID when you call startAt with a DocumentSnapshot.
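To make the ordered case concrete, here is a minimal Python sketch that assembles the request body for the next page from the last document of the previous page. It only builds the JSON and sends nothing; the function name build_page_query and its parameters are hypothetical, and the field names are taken from the question.

```python
import json
from typing import Any, Dict

def build_page_query(last_instruction_number: str,
                     last_doc_ref: str,
                     page_size: int = 5) -> Dict[str, Any]:
    """Build a structuredQuery body whose cursor points at one document.

    The cursor carries one value per orderBy clause, in the same order:
    first the ordered field, then the document reference for __name__.
    """
    return {
        "structuredQuery": {
            "from": [{"collectionId": "instructions"}],
            "where": {
                "fieldFilter": {
                    "field": {"fieldPath": "belongsToDepartementID"},
                    "op": "EQUAL",
                    "value": {"stringValue": "toplevel-document-id"},
                }
            },
            "orderBy": [
                {"field": {"fieldPath": "instructionNumber"}, "direction": "ASCENDING"},
                # Same direction as the field above, so no composite index is needed.
                {"field": {"fieldPath": "__name__"}, "direction": "ASCENDING"},
            ],
            "startAt": {
                # Mirrors "before": false from the answer above.
                "before": False,
                "values": [
                    {"stringValue": last_instruction_number},
                    {"referenceValue": last_doc_ref},
                ],
            },
            "limit": page_size,
        }
    }

# Document name as it appears in the "name" field of the schema in the question.
body = build_page_query(
    "instr. 104",
    "projects/PROJECT_NAME/databases/(default)/documents/"
    "organizations/testManyInstructions/instructions/i104",
)
print(json.dumps(body, indent=2))
```

The value to pass as last_doc_ref is the "name" field of the last document returned by the previous page.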

Not able to filter out required property in Azure TSI Gen1 Get Events API response

I am using the below request body to fetch only the required property values.
"searchSpan": {
"from": {
"dateTime": "2021-11-20T00:00:00.000Z"
},
"to": {
"dateTime": "2021-11-20T23:00:00.000Z"
}
},
"predicateString": "[Params.Name] = 'power'",
"take": 100
}
}
The URL is like below:
https://12345678a-bcde-3e91-blah-2292933292aa.env.timeseries.azure.com/events?api-version=2016-12-12
Despite specifying the required property, the response returns all properties, as if the predicate string had been ignored. What might I be doing wrong?
{
"warnings": [],
"events": [
{
"schema": {
"rid": 0,
"$esn": "my-event-hub",
"properties": [
{
"name": "mytimestamp",
"type": "DateTime"
},
{
"name": "Params.Name",
"type": "String"
},
{
"name": "Params.Value",
"type": "Double"
}
]
},
"$ts": "2021-11-20T10:01:50Z",
"values": [
"2021-11-20T10:01:50Z",
"energy",
60
]
},
{
"schemaRid": 0,
"$ts": "2021-11-20T10:01:50Z",
"values": [
"2021-11-20T10:01:50Z",
"power",
10
]
},
{
"schemaRid": 0,
"$ts": "2021-11-20T10:01:50Z",
"values": [
"2021-11-20T10:01:50Z",
"strength",
200
]
}
]
}
Edit
I'm getting a "Properties count error" on the TSI overview page. This might well be the root cause, but I don't know for sure:
"For Time Series Insights environment ABC: You have used all 641/600 properties in your environment".

GraphDb Indexing Policies

Consider the following JSON responses.
If you run the graph query g.V().hasLabel('customer'), the response is:
[
{
"id": "75b9bddc-4008-43d7-a24c-8b138735a36a",
"label": "customer",
"type": "vertex",
"properties": {
"partitionKey": [
{
"id": "75b9bddc-4008-43d7-a24c-8b138735a36a|partitionKey",
"value": 1
}
]
}
}
]
If you run the SQL query select * from c where c.label = 'customer', the response is:
[
{
"label": "customer",
"partitionKey": 1,
"id": "75b9bddc-4008-43d7-a24c-8b138735a36a",
"_rid": "0osWAOso6VYBAAAAAAAAAA==",
"_self": "dbs/0osWAA==/colls/0osWAOso6VY=/docs/0osWAOso6VYBAAAAAAAAAA==/",
"_etag": "\"2400985f-0000-0c00-0000-5e2066190000\"",
"_attachments": "attachments/",
"_ts": 1579181593
}
]
Q: Given this difference in structure around the partitionKey section, should it be referenced as /properties/partitionKey/* or /partitionKey/? in the indexing policy?
Currently I have hedged my bets with:
{
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [{
"path": "/properties/partitionKey/*"
},{
"path": "/partitionKey/?"
},{
"path": "/label/?"
}
],
"excludedPaths": [{
"path": "/*"
},{
"path": "/\"_etag\"/?"
}
]
}
TIA! 😁
This should be referenced as "/partitionKey" in your indexing policy, not "/properties/partitionKey".
Another thing to point out: it's generally better to exclude only the paths you will never query on, rather than having to include those you will. That way, if you add properties to your graph, you won't have to rebuild the index to query on the new properties.
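As a sketch of that opt-out approach, the snippet below builds an indexing policy that includes everything and excludes only one path. Excluding "_etag" is carried over from the policy in the question; which paths are safe to exclude is an assumption that depends on your own queries.

```python
import json

# Opt-out policy: index every path by default, then exclude only
# the paths that are never used in a query.
policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [
        {"path": "/*"},  # covers /partitionKey, /label and any future property
    ],
    "excludedPaths": [
        {"path": "/\"_etag\"/?"},  # taken from the question's policy
    ],
}

print(json.dumps(policy, indent=2))
```

With this shape, a property added to the graph later is indexed automatically, at the cost of indexing more paths on every write.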

How to Query Google Cloud Datastore for array

I have written a query to get the list of all Event entities. The result coming back from Google Cloud Datastore looks like this:
[{
"key": {
"id": 5678669024460800,
"kind": "Event",
"path": [
"Event",
5678669024460800
]
},
"data": {
"createdAt": "2017-03-27T06:28:58.000Z",
"users":["test1#xxx.com","test2#xxx.com","test3#xxx.com"]
}
},
{
"key": {
"id": 5678669024460800,
"kind": "Event",
"path": [
"Event",
5678669024460800
]
},
"data": {
"createdAt": "2017-03-27T06:28:58.000Z",
"users":["test1#xxx.com"]
}
},
{
"key": {
"id": 5678669024460800,
"kind": "Event",
"path": [
"Event",
5678669024460800
]
},
"data": {
"createdAt": "2017-03-27T06:28:58.000Z",
"users":["test2#xxx.com","test3#xxx.com"]
}
}]
But I need to write a query that filters by email ID, i.e. fetches only the entities whose users list contains the given email. For example, if I pass the email ID "test1#xxx.com", I should get a final result like this. Can anybody help me with this?
[{
"key": {
"id": 5678669024460800,
"kind": "Event",
"path": [
"Event",
5678669024460800
]
},
"data": {
"createdAt": "2017-03-27T06:28:58.000Z",
"users":["test1#xxx.com","test2#xxx.com","test3#xxx.com"]
}
},
{
"key": {
"id": 5678669024460800,
"kind": "Event",
"path": [
"Event",
5678669024460800
]
},
"data": {
"createdAt": "2017-03-27T06:28:58.000Z",
"users":["test1#xxx.com"]
}
}]
The GQL query would be something like:
SELECT * FROM Event WHERE users='test1#xxx.com'
You need to make sure the users property is indexed for the search to work; otherwise you may not get any results back.
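The query works on an array because an equality filter on a multi-valued property matches an entity when any one of its values equals the filter value. The Python sketch below imitates that behaviour on in-memory data shaped like the results in the question; it does not call Datastore, and the function name filter_by_user is hypothetical.

```python
from typing import Any, Dict, List

def filter_by_user(entities: List[Dict[str, Any]], email: str) -> List[Dict[str, Any]]:
    """Keep entities whose 'users' array contains the given email.

    Imitates: SELECT * FROM Event WHERE users='<email>'
    """
    return [e for e in entities if email in e["data"]["users"]]

# Trimmed-down versions of the three entities from the question.
events = [
    {"data": {"createdAt": "2017-03-27T06:28:58.000Z",
              "users": ["test1#xxx.com", "test2#xxx.com", "test3#xxx.com"]}},
    {"data": {"createdAt": "2017-03-27T06:28:58.000Z",
              "users": ["test1#xxx.com"]}},
    {"data": {"createdAt": "2017-03-27T06:28:58.000Z",
              "users": ["test2#xxx.com", "test3#xxx.com"]}},
]

matches = filter_by_user(events, "test1#xxx.com")
print(len(matches))
```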

OpenRefine reconciliation service not working - multiple vs single queries

I have been using OpenRefine 2.6 Beta 1, and later versions, without problems since its release, with the reconciliation service at:
http://reconcile.freebaseapps.com/reconcile
However, in the past few days, I have not been able to use it at all. If I go to the URL:
http://reconcile.freebaseapps.com/
and type the multiple query:
{
"query": "Ford",
"type": "/people/person",
"properties": [
{
"pid": "/people/person/place_of_birth",
"v": "Detroit"
}
]
}
I obtain:
{
"result": [
{
"id": "/m/0j8pb6y",
"name": "Ford",
"type": [
{
"id": "/people/person",
"name": "Person"
},
{
"id": "/common/topic",
"name": "Topic"
},
{
"id": "/geography/mountaineer",
"name": "Mountaineer"
}
],
"notable": [],
"score": 1.1546246,
"match": false
},
{
"id": "/m/01vd3gv",
"name": "Ford",
"type": [
{
"id": "/common/topic",
"name": "Topic"
},
{
"id": "/music/artist",
"name": "Musical Artist"
}
],
"notable": [],
"score": 1.0330245999999998,
"match": false
},
{
"id": "/m/0cmdhzt",
"name": "James Meredith",
"type": [
{
"id": "/common/topic",
"name": "Topic"
},
{
"id": "/people/person",
"name": "Person"
},
{
"id": "/military/military_person",
"name": "Military Person"
},
{
"id": "/people/deceased_person",
"name": "Deceased Person"
}
],
"notable": [],
"score": 0.0681692,
"match": false
}
],
"duration": 369
}
But if I try a simple query:
{
"query": "Ford"
}
I get:
Status: error Error:undefined
Any insights into what's happening with the reconciliation service? Is there any other service I could use to replace freebaseapps.com?
Thanks
Try this in the Queries parameter at http://reconcile.freebaseapps.com/:
{
"q0": {
"query": "Ford"
}
}
For some reason, single queries are not accepted in the Query parameter, but they are accepted in the Queries parameter in the format above. I have not tested this in OpenRefine, so you might have to modify it.
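The wrapping described above can be sketched in a few lines of Python. The helper name as_queries_param is hypothetical, and the snippet only builds the JSON string for the Queries parameter without contacting the service.

```python
import json
from typing import Any, Dict

def as_queries_param(single_query: Dict[str, Any], key: str = "q0") -> str:
    """Wrap one reconciliation query in the keyed batch format.

    The batch format is a JSON object mapping arbitrary keys
    ("q0", "q1", ...) to individual query objects.
    """
    return json.dumps({key: single_query})

queries_value = as_queries_param({"query": "Ford"})
print(queries_value)
```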
I don't know the exact date for certain, but earlier this year it was announced that Freebase would be shut down by Jun 30, 2015, for some services. Maybe service is intermittent until the full shutdown? Sorry, this answer probably doesn't help much.
