Spatial Indexing not working with ST_DISTANCE queries and '<' - azure-cosmosdb

Spatial indexing does not seem to be working on a collection which contains a document with GeoJSON coordinates. I've tried using the default indexing policy, which inherently provides spatial indexing on all fields.
I've tried creating a new Cosmos DB account, database, and collection from scratch, without any success in getting spatial indexing to work with an ST_DISTANCE query.
I've set up a simple collection with the following indexing policy:
{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    {
      "path": "/\"location\"/?",
      "indexes": [
        {
          "kind": "Spatial",
          "dataType": "Point"
        },
        {
          "kind": "Range",
          "dataType": "Number",
          "precision": -1
        },
        {
          "kind": "Range",
          "dataType": "String",
          "precision": -1
        }
      ]
    }
  ],
  "excludedPaths": [
    {
      "path": "/*"
    },
    {
      "path": "/\"_etag\"/?"
    }
  ]
}
The document that I've inserted into the collection:
{
  "id": "document1",
  "type": "Type1",
  "location": {
    "type": "Point",
    "coordinates": [
      -50,
      50
    ]
  },
  "name": "TestObject"
}
The following query should return the single document in the collection:
SELECT * FROM f WHERE f.type = "Type1" and ST_DISTANCE(f.location, {'type': 'Point', 'coordinates':[-50,50]}) < 200000
However, it is not returning any results. If I explicitly query without using the spatial index, like so:
SELECT * FROM f WHERE f.type = "Type1" and ST_DISTANCE({'type': 'Point', 'coordinates':[f.location.coordinates[0],f.location.coordinates[1]]}, {'type': 'Point', 'coordinates':[-50,50]}) < 200000
It returns the document as it should, but it doesn't take advantage of the index, which I will need because I will be storing a lot of coordinates.
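As a side check (this is my addition, not something from the original post), Cosmos DB's built-in ST_ISVALIDDETAILED function can be used to rule out malformed GeoJSON as the cause, since it reports whether a spatial value is valid and, if not, why:
SELECT f.id, ST_ISVALIDDETAILED(f.location) AS validity FROM f WHERE f.type = "Type1"
For the document above this should report the location as valid, since longitude -50 / latitude 50 are within range.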
This seems to be the same issue referenced here. If I add a second document far away and change the '<' to '>' in the first query, it works!
I should mention this is only occurring on Azure. When I use the Azure Cosmos DB Emulator, it works perfectly! What is going on here?! Any tips or suggestions are much appreciated.
UPDATE: I found out why the query works on the Emulator and not on Azure: the database on the emulator doesn't have provisioned (shared) throughput among its collections, while I created the database in Azure with provisioned throughput to keep costs down (i.e., 4 collections sharing 400 RU/s). I created a database in Azure without database-level (shared) provisioned throughput, and the query works with spatial indexing! I will log this issue with Microsoft to see whether there is a reason why this is the case.
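For reference, here is a minimal sketch of creating a container with its own dedicated throughput and the spatial indexing policy from the question, using the azure-cosmos Python SDK (the original post doesn't say which SDK was used; the endpoint, key, database and container names are placeholders):
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://<account>.documents.azure.com:443/", "<key>")  # placeholders
database = client.get_database_client("mydb")  # hypothetical database name

# Indexing policy copied from the question (older "kind"/"dataType" shape;
# newer API versions describe spatial indexes in a "spatialIndexes" section instead).
indexing_policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [
        {
            "path": "/\"location\"/?",
            "indexes": [
                {"kind": "Spatial", "dataType": "Point"},
                {"kind": "Range", "dataType": "Number", "precision": -1},
                {"kind": "Range", "dataType": "String", "precision": -1},
            ],
        }
    ],
    "excludedPaths": [{"path": "/*"}, {"path": "/\"_etag\"/?"}],
}

# offer_throughput gives the container dedicated RU/s instead of sharing the
# database-level provisioned throughput, which is what made the difference above.
container = database.create_container(
    id="locations",  # hypothetical container name
    partition_key=PartitionKey(path="/type"),
    indexing_policy=indexing_policy,
    offer_throughput=400,
)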

Thanks for following up with additional details regarding a fixed collection being the solution, but I did want to get some additional information.
The Cosmos DB Emulator now supports the following container configurations:
By default, you can create up to 25 fixed size containers (only supported using Azure Cosmos DB SDKs), or 5 unlimited containers, using the Azure Cosmos Emulator. By modifying the PartitionCount value, you can create up to 250 fixed size containers or 50 unlimited containers, or any combination of the two that does not exceed 250 fixed size containers (where one unlimited container = 5 fixed size containers). However, it's not recommended to set up the emulator to run with more than 200 fixed size containers, because of the overhead it adds to disk I/O operations, which results in unpredictable timeouts when using the endpoint APIs.
So I wanted to check which version of the Emulator you were using. The current version is azure-cosmosdb-emulator-2.2.2.

Related

Can't push data from AWS IoT Core to AWS Timestream

I have spent two days trying to search for and solve my problem with no result; I hope I can get some help from you.
I am pushing data from a local PC running KEPServerEX to AWS IoT Core using the MQTT agent. I can see the data updating on AWS without issue. I then created a database and a table on Timestream, called Kep_DB and Table_Kep1 respectively.
My issue is that I am trying to create a rule on AWS IoT Core to send data to the database table. For a rule to work, it requires an SQL-like statement and dimensions.
Below is what I have tried:
`SELECT (SELECT v FROM values) as value FROM 'iotgateway'`
Then as a dimension I put one dimension like below:
name: id value: ${id}
My payload on AWS IoT Core has this format:
{
  "timestamp": 1668852877344,
  "values": [
    {
      "id": "Simulation Examples.Functions.Random1",
      "v": 7,
      "q": true,
      "t": 1668852868880
    },
    {
      "id": "Simulation Examples.Functions.Ramp2",
      "v": 161,
      "q": true,
      "t": 1668852868880
    },
    {
      "id": "Simulation Examples.Functions.Sine4",
      "v": 39.9302559,
      "q": true,
      "t": 1668852868880
    }
  ]
}
I am still not able to see data coming into my database, even though I have tried several dimension names and several SQL statement formats.
Does anyone have any experience with this?
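Purely as an illustration (this is not from the question and not a verified fix): the rule's ${id} substitution refers to a top-level field, but here id lives inside each element of the values array, so each reading would need to become its own record before it can carry an id dimension. A small Python sketch of that flattening, using the payload shape above:
# Payload shape taken from the question; the flattening itself is illustrative only.
payload = {
    "timestamp": 1668852877344,
    "values": [
        {"id": "Simulation Examples.Functions.Random1", "v": 7, "q": True, "t": 1668852868880},
        {"id": "Simulation Examples.Functions.Ramp2", "v": 161, "q": True, "t": 1668852868880},
    ],
}

# One record per tag reading: the id would feed the rule's ${id} dimension,
# v is the measure value, and t is the per-reading timestamp in milliseconds.
records = [
    {"id": item["id"], "measure_value": item["v"], "time_ms": item["t"]}
    for item in payload["values"]
]

for record in records:
    print(record)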

How to convert pipeline().parameters.windowStart to epoch in Azure Data Factory copy pipeline query

I have a timestamp column in DocDb, and I would like to query it in an Azure Data Factory copy pipeline, which copies DocDb to Azure Data Lake.
I would like to
select * from c
where c._ts > '@{pipeline().parameters.windowStart}'
But I got
Errors":["An invalid query has been specified with filters against path(s) that are not range-indexed.
In the DocDb policy, I have
"includedPaths": [
{
"path": "/*",
"indexes": [
{
"kind": "Range",
"dataType": "Number",
"precision": -1
},
{
"kind": "Hash",
"dataType": "String",
"precision": 3
}
]
}
]
I think this should allow _ts int64 to be queried by range.
Where did I go wrong?
Thanks.
I reproduced your issue with your SQL and your index policy.
Based on my observation, it seems that the filter is treated as a String, not an Int. You could remove the ' in your SQL and try again; it works for me.
SQL:
select * from c
where c._ts > @{pipeline().parameters.windowStart}
Thanks, @Jay.
I ended up using a UDF
// Converts an ISO date/time string to Unix epoch seconds.
function dateTime2Epoch(dateTimeString) {
  return Math.trunc(new Date(dateTimeString).getTime() / 1000);
}
in Cosmos DB.
Then, in the Azure Data Factory query:
select * from c
where c._ts >= udf.dateTime2Epoch('@{pipeline().parameters.windowStart}')
and c._ts < udf.dateTime2Epoch('@{pipeline().parameters.windowEnd}')
However, the query seems to be very slow. I will update this when I find out more.
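A likely reason for the slowness is that a UDF in the filter cannot be served from the range index, so every document gets evaluated. A hedged alternative (my suggestion, not from the original thread, and assuming the Data Factory v2 expression functions ticks(), sub() and div() are available) is to compute the epoch seconds in the pipeline expression itself, so Cosmos DB only sees numeric literals:
select * from c
where c._ts >= @{div(sub(ticks(pipeline().parameters.windowStart), ticks('1970-01-01T00:00:00Z')), 10000000)}
and c._ts < @{div(sub(ticks(pipeline().parameters.windowEnd), ticks('1970-01-01T00:00:00Z')), 10000000)}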
Update: I ended up copying the whole thing.

Cosmos DocumentDb: Inefficient ORDER BY

I'm doing some early trials on Cosmos, and populated a table with a set of DTOs. While some simple WHERE queries seem to return quite quickly, others are horribly inefficient. A simple COUNT(1) from c took several seconds and used over 10K request units. Even worse, a little experiment with ordering was also very discouraging. Here's my query:
SELECT TOP 20 c.Foo, c.Location from c
ORDER BY c.Location.Position.Latitude DESC
My collection (if the count was correct; I got super weird results running it while populating the DB, but that's another issue) contains about 300K DTOs. The above query ran for about 30 seconds (I currently have the DB configured to perform at 4K RU/s), and ate 87453.439 RUs over 6 round trips. Obviously, that's a no-go.
According to the documentation, the numeric Latitude property should be indexed, so I'm not sure whether it's me screwing up here, or whether reality didn't really catch up with the marketing here ;)
Any idea on why this doesn't perform properly? Thanks for your advice!
Here's a document as returned:
{
  "Id": "y-139",
  "Location": {
    "Position": {
      "Latitude": 47.3796977,
      "Longitude": 8.523499
    },
    "Name": "Restaurant Eichhörnli",
    "Details": "Nietengasse 16, 8004 Zürich, Switzerland"
  },
  "TimeWindow": {
    "ReferenceTime": "2017-07-01T15:00:00",
    "ReferenceTimeUtc": "2017-07-01T15:00:00+02:00",
    "Direction": 0,
    "Minutes": 45
  }
}
The DB/collection I use is just the default one that can be created for the ToDo application from within the Azure portal. This apparently created the following indexing policy:
{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    {
      "path": "/*",
      "indexes": [
        {
          "kind": "Range",
          "dataType": "Number",
          "precision": -1
        },
        {
          "kind": "Hash",
          "dataType": "String",
          "precision": 3
        }
      ]
    }
  ],
  "excludedPaths": []
}
Update as of Dec 2017:
I revisited my unchanged database and ran the same query again. This time, it's fast and instead of > 87000 RUs, it eats up around 6 RUs. Bottom line: It appears there was something very, very wrong with Cosmos DB, but whatever it was, it seems to be gone.
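For anyone who wants to verify this themselves, here is a small sketch of measuring the request charge of the same query with the azure-cosmos Python SDK (not the SDK used in the post; the endpoint, key, and database/container names are placeholders):
from azure.cosmos import CosmosClient

client = CosmosClient("https://<account>.documents.azure.com:443/", "<key>")  # placeholders
container = client.get_database_client("mydb").get_container_client("items")  # placeholders

query = "SELECT TOP 20 c.Foo, c.Location FROM c ORDER BY c.Location.Position.Latitude DESC"
items = list(container.query_items(query, enable_cross_partition_query=True))

# The RU charge of the last round trip is reported in the response headers.
charge = container.client_connection.last_response_headers.get("x-ms-request-charge")
print(f"returned {len(items)} items; last page cost {charge} RU")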

How to represent the Data Model of a Graph

So we have been developing some graph-based analysis tools, using neo4j as a persistence engine in the background. As part of this, we are developing a graph data model suitable for our domain, and we want to use it in the application layer to restrict the types of nodes, or to ensure that nodes of certain types must carry certain properties: normal data model restrictions.
So that's the background. What I am asking is whether there is some standard way to represent a data model for a graph DB? The graph equivalent of an XSD, perhaps?
There's an open-source project supporting strong schema definitions in Neo4j: Structr (http://structr.org, see it in action: http://vimeo.com/structr/videos)
With Structr, you can define an in-graph schema of your data model, including:
Type inheritance
Supported data types: Boolean, String, Integer, Long, Double, Date, Enum (+ values)
Default values
Cardinality (1:1, 1:*, *:1)
Not-null constraints
Uniqueness constraints
Full type safety
Validation
Cardinality enforcement
Support for methods (custom actions) is currently being added to the schema.
The schema can be edited with an editor, or directly via REST by modifying the JSON representation of the data model:
{
  "query_time": "0.001618446",
  "result_count": 4,
  "result": [
    {
      "name": "Whisky",
      "extendsClass": null,
      "relatedTo": [
        {
          "id": "96d05ddc9f0b42e2801f06afb1374458",
          "name": "Flavour"
        },
        {
          "id": "28f85dca915245afa3782354ea824130",
          "name": "Location"
        }
      ],
      "relatedFrom": [],
      "id": "df9f9431ed304b0494da84ef63f5f2d8",
      "type": "SchemaNode",
      "_name": "String"
    },
    {
      "name": "Flavour",
      ...
    },
    {
      "name": "Location",
      ...
    },
    {
      "name": "Region",
      ...
    }
  ],
  "serialization_time": "0.000829985"
}
{
  "query_time": "0.001466743",
  "result_count": 3,
  "result": [
    {
      "name": null,
      "sourceId": "28f85dca915245afa3782354ea824130",
      "targetId": "e4139c5db45a4c1cbfe5e358a84b11ed",
      "sourceMultiplicity": null,
      "targetMultiplicity": "1",
      "sourceNotion": null,
      "targetNotion": null,
      "relationshipType": "LOCATED_IN",
      "sourceJsonName": null,
      "targetJsonName": null,
      "id": "d43902ad7348498cbdebcd92135926ea",
      "type": "SchemaRelationship",
      "relType": "IS_RELATED_TO"
    },
    {
      "name": null,
      "sourceId": "df9f9431ed304b0494da84ef63f5f2d8",
      "targetId": "96d05ddc9f0b42e2801f06afb1374458",
      "sourceMultiplicity": null,
      "targetMultiplicity": "1",
      "sourceNotion": null,
      "targetNotion": null,
      "relationshipType": "HAS_FLAVOURS",
      "sourceJsonName": null,
      "targetJsonName": null,
      "id": "bc9a6308d1fd4bfdb64caa355444299d",
      "type": "SchemaRelationship",
      "relType": "IS_RELATED_TO"
    },
    {
      "name": null,
      "sourceId": "df9f9431ed304b0494da84ef63f5f2d8",
      "targetId": "28f85dca915245afa3782354ea824130",
      "sourceMultiplicity": null,
      "targetMultiplicity": "1",
      "sourceNotion": null,
      "targetNotion": null,
      "relationshipType": "PRODUCED_IN",
      "sourceJsonName": null,
      "targetJsonName": null,
      "id": "a55fb5c3cc29448e99a538ef209b8421",
      "type": "SchemaRelationship",
      "relType": "IS_RELATED_TO"
    }
  ],
  "serialization_time": "0.000403616"
}
You can access nodes and relationships stored in Neo4j as JSON objects through a RESTful API which is dynamically configured based on the in-graph schema.
$ curl try.structr.org:8082/structr/rest/whiskies?name=Ardbeg
{
  "query_time": "0.001267211",
  "result_count": 1,
  "result": [
    {
      "flavour": {
        "name": "J",
        "description": "Full-Bodied, Dry, Pungent, Peaty and Medicinal, with Spicy, Feinty Notes.",
        "id": "626ba892263b45e29d71f51889839ebc",
        "type": "Flavour"
      },
      "location": {
        "region": {
          "name": "Islay",
          "id": "4c7dd3fe2779492e85bdfe7323cd78ee",
          "type": "Region"
        },
        "whiskies": [
          ...
        ],
        "name": "Port Ellen",
        "latitude": null,
        "longitude": null,
        "altitude": null,
        "id": "47f90d67e1954cc584c868e7337b6cbb",
        "type": "Location"
      },
      "name": "Ardbeg",
      "id": "2db6b3b41b70439dac002ba2294dc5e7",
      "type": "Whisky"
    }
  ],
  "serialization_time": "0.010824154"
}
In the UI, there's also a data editing (CRUD) tool, and CMS components that support building web applications on Neo4j.
Disclaimer: I'm a developer of Structr and founder of the project.
No, there's no standard way to do this. Indeed, even if there were, keep in mind that the only constraints that neo4j currently supports are uniqueness constraints.
Take for example some sample rules:
All nodes labeled :Person must have non-empty properties fname and lname
All nodes labeled :Person must have >= 1 outbound relationship of type :works_for
The trouble with present-day neo4j is that even if you did have a (standardized) schema language that could express these things, there wouldn't be a way for the DB engine itself to actually enforce those constraints.
So the simple answer is no, there's no standard way of doing that right now.
A few tricks I've seen people use to simulate the same:
Assemble a list of "test suite" cypher queries with known results (see the sketch after this list). Query for things you know shouldn't be there; non-empty result sets are a sign of a problem/integrity violation. Query for things you know should be there; empty result sets are a problem.
Application-level control -- via some layer like spring-data or similar, control who can talk to the database. This essentially moves your data integrity/testing problem up into the app, away from the database.
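As a rough sketch of the first trick (my illustration, not part of the original answer; the connection details are placeholders and the rules are the :Person examples above), each query should return zero rows when its rule holds:
from neo4j import GraphDatabase

# Each check is a Cypher query that should come back empty if the data is consistent.
integrity_checks = {
    "Person missing fname/lname":
        "MATCH (p:Person) WHERE p.fname IS NULL OR p.lname IS NULL RETURN p",
    "Person with no outbound :works_for":
        "MATCH (p:Person) WHERE NOT (p)-[:works_for]->() RETURN p",
}

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))  # placeholders
with driver.session() as session:
    for name, query in integrity_checks.items():
        violations = list(session.run(query))
        if violations:
            print(f"integrity violation: {name} ({len(violations)} nodes)")
driver.close()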
It's a common (and IMHO annoying) aspect of many NoSQL solutions (not specifically neo4j) that because of their schema-weakness, they tend to force validation up the tech stack into the application. Doing these things in the application tends to be harder and more error-prone. SQL databases permit you to implement all sorts of schema constraints, triggers, etc -- specifically to make it really damn hard to put the wrong data into the database. The NoSQL databases typically either aren't there yet, or don't do this as a design decision. There are indeed flexibility/performance tradeoffs. Databases can insert faster and be more flexible to adapt quickly if they aren't burdened with checking each atom of data against a long list of schema rules.
EDIT: Two relevant resources: the metagraphs proposal talks about how you could represent the schema as a graph, and neoprofiler is an application that attempts to infer the actual structure of a neo4j database and show you its "profile".
With time, I think it's reasonable to hope that neo4j would include basic integrity features like requiring certain labels to have certain properties (the example above), restricting the data types of certain properties (lname must always be a String, never an integer), and so on. The graph data model is a bit wild and woolly, though (in the computational complexity sense), and there are some constraints on graphs that people desperately want but will probably never get. An example would be the constraint that a graph can't have cycles in it; enforcing that on the creation of every relationship would be very computationally intensive.

Google Cloud Datastore runQuery returning 412 "no matching index found"

** UPDATE **
Thanks to Alfred Fuller for pointing out that I need to create a manual index for this query.
Unfortunately, using the JSON API from a .NET application, there does not appear to be an officially supported way of doing so. In fact, there does not appear to be an official way to do this at all from an app outside of App Engine, which is strange since the Cloud Datastore API was designed to allow access to the Datastore from outside App Engine.
The closest hack I could find was to POST the index definition using RPC to http://appengine.google.com/api/datastore/index/add. Can someone give me the raw spec for how to do this exactly (i.e. URL parameters, what exactly should the body look like, etc), perhaps using Fiddler to inspect the call made by appcfg.cmd?
** ORIGINAL QUESTION **
According to the docs, "a query can combine equality (EQUAL) filters for different properties, along with one or more inequality filters on a single property".
However, this query fails:
{
  "query": {
    "kinds": [
      {
        "name": "CodeProse.Pogo.Tests.TestPerson"
      }
    ],
    "filter": {
      "compositeFilter": {
        "operator": "and",
        "filters": [
          {
            "propertyFilter": {
              "operator": "equal",
              "property": {
                "name": "DepartmentCode"
              },
              "value": {
                "integerValue": "123"
              }
            }
          },
          {
            "propertyFilter": {
              "operator": "greaterThan",
              "property": {
                "name": "HourlyRate"
              },
              "value": {
                "doubleValue": 50
              }
            }
          },
          {
            "propertyFilter": {
              "operator": "lessThan",
              "property": {
                "name": "HourlyRate"
              },
              "value": {
                "doubleValue": 100
              }
            }
          }
        ]
      }
    }
  }
}
with the following response:
{
  "error": {
    "errors": [
      {
        "domain": "global",
        "reason": "FAILED_PRECONDITION",
        "message": "no matching index found.",
        "locationType": "header",
        "location": "If-Match"
      }
    ],
    "code": 412,
    "message": "no matching index found."
  }
}
The JSON API does not yet support local index generation, but we've documented a process that you can follow to generate the xml definition of the index at https://developers.google.com/datastore/docs/tools/indexconfig#Datastore_Manual_index_configuration
Please give this a shot and let us know if it doesn't work.
This is a temporary solution that we hope to replace with automatic local index generation as soon as we can.
The error "no matching index found." indicates that an index needs to be added for the query to work. See the auto index generation documentation.
In this case you need an index with the properties DepartmentCode and HourlyRate (in that order).
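For reference, in the index.yaml format discussed in the next answer, such an index would look roughly like this (the kind name is taken from the query above; this is a sketch I have not run against the project in question):
indexes:
- kind: CodeProse.Pogo.Tests.TestPerson
  properties:
  - name: DepartmentCode
  - name: HourlyRate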
For gcloud-node, I fixed it with these three links:
https://github.com/GoogleCloudPlatform/gcloud-node/issues/369
https://github.com/GoogleCloudPlatform/gcloud-node/blob/master/system-test/data/index.yaml
and the most important link:
https://cloud.google.com/appengine/docs/python/config/indexconfig#Python_About_index_yaml to write your index.yaml file
As explained in the last link, an index is what allows complex queries to run faster by storing the result set of the queries in an index. When you get "no matching index found", it means that you tried to run a complex query involving ordering or filtering. To make your query work, you need to create the index in the Google Datastore indexes by manually creating a config file that defines the indexes representing the query you are trying to run. Here is how to fix it:
Create an index.yaml file in a folder named, for example, indexes in your app directory, following the directives for the Python config file (https://cloud.google.com/appengine/docs/python/config/indexconfig#Python_About_index_yaml) or taking inspiration from the gcloud-node tests in https://github.com/GoogleCloudPlatform/gcloud-node/blob/master/system-test/data/index.yaml
Create the indexes from the config file with this command:
gcloud preview datastore create-indexes indexes/index.yaml
See https://cloud.google.com/sdk/gcloud/reference/preview/datastore/create-indexes
Wait for the indexes to serve in your developer console under Cloud Datastore/Indexes; the interface should display "serving" once the index is built.
Once it is serving, your query should work.
For example, for this query:
var q = ds.createQuery('project')
  .filter('tags =', category)
  .order('-date');
index.yaml looks like:
indexes:
- kind: project
  ancestor: no
  properties:
  - name: tags
  - name: date
    direction: desc
Try not to order the result. After removing orderby(), it worked for me.
