Question 1:
I am using Google Datastore (Java Client Libraries) with App Engine Flex and I am storing location data. I would like to query for a list of locations within a rectangular region.
My location entity looks like this:
Location
- lat (eg. -56.1)
- long (eg. 45.6)
Google Datastore only allows inequality filters on a single property, so I can't query using GQL like this:
SELECT * FROM Location WHERE lat <= @maxLat AND lat >= @minLat AND long <= @maxLong AND long >= @minLong
Where maxLat, minLat, maxLong, and minLong represent the bounding rectangle to search for Locations.
Currently I am querying using just one filter:
SELECT * FROM Location WHERE lat <= @maxLat AND lat >= @minLat
Then, from the returned results, I filter out the ones outside the longitude bounds. Is there a better way to do this without resorting to this strategy?
Question 2:
If I store the latitude/longitude combination as a Geopoint in Google Datastore, how can I filter on the latitude and longitude?
For example if the location is stored like this:
Location
- location (Geopoint)
I cannot filter on the lat/long within the Geopoint using the Java Client Libraries.
SELECT * FROM Location WHERE location.long <= @maxLong
Is there a workaround for the Datastore Geopoint property?
For an example of how to do this, look at the 'geomodel' library in Python.
Basic premise:
Divide your latitude and longitude values into discrete values (for example, 1-degree blocks), thereby defining 'cells' that each value is contained in. Combine/hash them together so you have a single value.
Now, you can convert your queries for locations in a bounded rectangle by doing equality filters against these cells. Post filter at the end if your bounded rectangle doesn't entirely line up with the cell edges.
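As a rough illustration of this premise (not the actual geomodel code), the cell computation might look like the following; the cell ID format and the 1-degree default are arbitrary choices:

```python
import math

def cell_id(lat, lng, cell_size=1.0):
    """Hash a lat/lng pair into a single discrete cell identifier."""
    row = math.floor(lat / cell_size)
    col = math.floor(lng / cell_size)
    return f"{row}:{col}"

def cells_for_box(min_lat, min_lng, max_lat, max_lng, cell_size=1.0):
    """All cells overlapping a bounding rectangle. Query each cell with an
    equality filter, then post-filter results against the exact bounds."""
    cells = []
    row = math.floor(min_lat / cell_size)
    while row <= math.floor(max_lat / cell_size):
        col = math.floor(min_lng / cell_size)
        while col <= math.floor(max_lng / cell_size):
            cells.append(f"{row}:{col}")
            col += 1
        row += 1
    return cells
```

Each returned cell ID then becomes one equality filter in a Datastore query, which sidesteps the multiple-inequality restriction entirely.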
A related example for doing this with time is: Is datastore good for storing hr shifts?
Check out Datastore's REST API. There is an explicit value type for geopoints which you can find documented at the link below:
https://cloud.google.com/datastore/docs/reference/rest/v1/projects/runQuery#Value
All you'll need to do then is create a geopoint value property (let's call it geo) on an entity, and you'll be able to write GQL queries like this:
SELECT * FROM Address WHERE geo.latitude <= @maxLat AND geo.latitude >= @minLat
Related
Is it possible to query database extents directly, without specifying the name of a table?
For example, when I run the command .show database extents, I get a list of extents in the given database. If I pick a specific extent ID from the result, or in general any extent ID belonging to that database, is there a way to query it without reference to a table name?
It's not recommended to take any kind of dependency on an extent ID.
Though your use case is neither clear nor standard, it is possible to run a query as follows:
union *
| where extent_id() == '6810147e-1234-1234-1234-d3649e3d3a83'
| take 10
I'm struggling with DynamoDB schema design of a table storing locations. The table will have [userId, lastUpdatedTime, locationGooglePlaceId, longitude, latitude, hideOnUI(bool)]
One of the main queries is: given the user's current location (x, y) as a GPS coordinate, find nearby userIds based on their longitude and latitude.
The problem is how I would design an index for this purpose. The table itself can have HASH key UserId and SORT key lastUpdatedTime; but how would the GSI go? I can't seem to identify any partition key for the "equals" operation.
In SQL it'll be something like:
select * from table
where x-c <= longitude and longitude < x+c
AND y-c <= latitude and latitude < y+c
Thanks
First of all, I am not sure that DynamoDB is a good fit here; it may be better to use another database, since DynamoDB does not support complex indexes.
Nonetheless, here is a design that you can try.
First, you can split your map into multiple square blocks; every square block has an ID and a known position and size.
Then, if you have a location and you want to find all nearby points, you can do the following.
Every point in your database will be stored in the Points table with the following keys:
BlockId (String, UUID, partition key) - id of a block this point belongs to
Latitude (Number, sort key) - latitude of a point
Longitude (Number) - plain attribute
Now, if you know which square the user's location is in and which squares are nearby, you can perform the following search in each nearby square:
BlockId = <nearby_block_id>
Latitude between (y-c, y+c)
and use a filter expression based on the Longitude attribute:
Longitude between (x-c, x+c)
It does not really matter whether you use latitude or longitude as the sort key here.
Between is a DynamoDB operator that can be used with sort keys or for filtering expressions:
BETWEEN : Greater than or equal to the first value, and less than or equal to the second value. AttributeValueList must contain two AttributeValue elements of the same type, either String, Number, or Binary (not a set type). A target attribute matches if the target value is greater than, or equal to, the first element and less than, or equal to, the second element. If an item contains an AttributeValue element of a different type than the one provided in the request, the value does not match. For example, {"S":"6"} does not compare to {"N":"6"}. Also, {"N":"6"} does not compare to {"NS":["6", "2", "1"]}
Now, the downside of this is that no more than 10GB of data can be stored under a single partition key, so the number of points that you can put in a single square is limited. You can get around this if your squares are small enough, or if your squares have variable sizes and you use big squares for sparsely populated areas and small squares for very crowded areas; but that seems to be a non-trivial project.
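A minimal sketch of the block bookkeeping described above; the BlockId format and grid math are assumptions, and each returned ID would back one Query (key condition on BlockId and Latitude, filter expression on Longitude):

```python
import math

def block_id(lat, lng, block_size=1.0):
    """BlockId of the square grid block containing a point (format is arbitrary)."""
    return f"{math.floor(lat / block_size)}:{math.floor(lng / block_size)}"

def nearby_block_ids(lat, lng, c, block_size=1.0):
    """All BlockIds overlapping the window [lat-c, lat+c] x [lng-c, lng+c].
    Each ID becomes one query: BlockId = <id>, Latitude between (lat-c, lat+c),
    with a filter expression Longitude between (lng-c, lng+c)."""
    ids = []
    for row in range(math.floor((lat - c) / block_size),
                     math.floor((lat + c) / block_size) + 1):
        for col in range(math.floor((lng - c) / block_size),
                         math.floor((lng + c) / block_size) + 1):
            ids.append(f"{row}:{col}")
    return ids
```

Note that a small search radius usually touches a single block, while a radius comparable to the block size can touch up to nine.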
I am using the gcloud-python library for a project that needs to serve the following use case:
Get a batch of entities with a subset of its properties (projection)
gcloud.datastore.api.get_multi() provides me batch get but not projection
and gcloud.datastore.api.Query() provides me projection but not batch get (like an IN query).
AFAIK, GQLQuery provides both IN queries (batch get) and projections. Is there a plan to support GQL queries in the gcloud-python library? Or is there another way to get batching and projection in a single request?
Currently there is no way to request a subset of an entity's properties. When you have the list of keys that you need, you should use get_multi().
Projection Query Background
In Datastore, projection queries are simply index scans.
For example, consider you are writing the query SELECT * FROM MyKind ORDER BY myFirstProp, mySecondProp. This query will execute against an index: Index(MyKind, myFirstProp, mySecondProp). This index may look something like:
myFirstProp | mySecondProp | __key__
------------------------------------
a           | 1            | k1
a           | 2            | k2
b           | 1            | k3
For each result in the index, Datastore then looks up the entity associated with that result's key. If you do a projection query where you project only myFirstProp or mySecondProp or both, Datastore can avoid doing the random-access lookup to find the associated entity for each result. This is generally where you get the large performance gain from using projections -- not from the savings of transporting less data over the network.
Likewise, if you know the list of keys that you need, you can look them up directly -- there is no need to look in an index first.
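A toy model of the index above (hypothetical data, not a real API) makes the difference concrete: the projection is answered from the index scan alone, while SELECT * pays one entity lookup per result.

```python
# The index rows from the example: (myFirstProp, mySecondProp, __key__).
index = [("a", 1, "k1"), ("a", 2, "k2"), ("b", 1, "k3")]

# The entities themselves, keyed by __key__; reading these is the extra
# random-access step a SELECT * query has to pay for.
entities = {
    "k1": {"myFirstProp": "a", "mySecondProp": 1, "other": "x"},
    "k2": {"myFirstProp": "a", "mySecondProp": 2, "other": "y"},
    "k3": {"myFirstProp": "b", "mySecondProp": 1, "other": "z"},
}

def projection_query():
    # Served entirely from the index: no per-result entity lookup.
    return [{"myFirstProp": f, "mySecondProp": s} for f, s, _ in index]

def full_query():
    # SELECT *: one lookup in `entities` per index result.
    return [entities[k] for _, _, k in index]
```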
IN Operator
In Python GQL (not in the similar Cloud Datastore GQL), there is the IN operator, which allows you to write a query that looks something like:
SELECT * FROM MyKind WHERE myFirstProp IN ['a', 'b'].
However, Datastore does not actually support this query natively. Inside the Python client, it gets converted into disjunctive normal form:
SELECT * FROM MyKind WHERE myFirstProp = 'a'
UNION
SELECT * FROM MyKind WHERE myFirstProp = 'b'
This means for each value inside your IN, you'll be issuing a separate Datastore query.
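The expansion can be sketched in plain Python (the helper is hypothetical, not part of the client); note how multiple IN filters multiply the number of queries:

```python
from itertools import product

def expand_in_filters(in_filters):
    """Expand IN filters into the equality-only queries Datastore actually
    runs: one query per combination of values (disjunctive normal form)."""
    props = list(in_filters)
    return [dict(zip(props, combo))
            for combo in product(*(in_filters[p] for p in props))]
```

So myFirstProp IN ['a', 'b'] costs two round trips, and combining it with a second three-value IN would cost six.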
Background:
I'm working on an SQLite tile cache database (similar to the MBTiles specification), consisting for now of just a single table Tiles with the following columns:
X [INTEGER] - horizontal tile index (not map coordinate)
Y [INTEGER] - vertical tile index (not map coordinate)
Z [INTEGER] - zoom level of a tile
Data [BLOB] - stream with a tile image data (currently PNG images)
All the coordinate-to-tile calculations are done in the application, so the SQLite R*Tree module with the corresponding TADSQLiteRTree class is of no use to me. All I need is to load the Data field blob stream of a record, found by given X, Y, Z values, as fast as possible.
The application will, except this database, have also a memory cache implemented by a hash table like this TTileCache type:
type
TTileIdent = record
X: Integer;
Y: Integer;
Z: Integer;
end;
TTileData = TMemoryStream;
TTileCache = TDictionary<TTileIdent, TTileData>;
The workflow when asking for a certain tile, with the X, Y, Z values already calculated, will be simple. I will first ask the memory cache (partially filled from the above table at app startup), and if the tile isn't found there, ask the database (and if it isn't found even there, download it from the tile server).
Question:
Which AnyDAC (FireDAC) component(s) would you use for frequent querying of 3 integer column values in a SQLite table (with, let's say, 100k records), with optional loading of the found blob stream?
Would you use:
a query-type component (I'd say repeated execution of the same prepared query should be efficient, shouldn't it?)
a memory table (I'm afraid of its size, since there might be several GB stored in the tiles table; or is it somehow streamed, for instance?)
something different ?
Definitely use TADQuery. Unless you set the query to Unidirectional, it will buffer the records returned from the database in memory (fetching 50 rows at a time by default). Since you are dealing with blobs, your query should be written to retrieve the minimum number of records you need.
Use a parameterized query like the following:
SELECT * FROM ATable
WHERE X = :X AND Y = :Y AND Z = :Z
Once you have initially opened the query, you can change the parameters and then call the Refresh method to retrieve the record for the new values.
A memory table could not be used to retrieve data from the database, it would have to be populated via a query. It could be used to replace your TTileCache records, but I would not recommend it because it would have more overhead than your memory cache implementation.
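The prepare-once, rebind-and-refetch pattern is easy to try outside Delphi; here is a sketch using Python's sqlite3 (the table and index names are just examples):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tiles (X INTEGER, Y INTEGER, Z INTEGER, Data BLOB)")
# Composite index so the three-integer lookup is a single index seek.
conn.execute("CREATE UNIQUE INDEX idx_tiles ON Tiles (Z, X, Y)")
conn.executemany("INSERT INTO Tiles VALUES (?, ?, ?, ?)",
                 [(1, 2, 3, b"png-1"), (4, 5, 3, b"png-2")])

def get_tile(x, y, z):
    # sqlite3 caches the prepared statement, so repeated calls only rebind
    # the three parameters -- analogous to changing the TADQuery params
    # and calling Refresh.
    row = conn.execute("SELECT Data FROM Tiles WHERE X = ? AND Y = ? AND Z = ?",
                       (x, y, z)).fetchone()
    return row[0] if row else None
```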
I would use TFDQuery with a query like the following. Assuming you're about to display the fetched tiles on a map, you may consider fetching all tiles for the missing (non-cached) tile area at once, rather than fetching tiles one by one for your tile grid:
SELECT
X,
Y,
Data
FROM
Tiles
WHERE
(X BETWEEN :HorzMin AND :HorzMax) AND
(Y BETWEEN :VertMin AND :VertMax) AND
(Z = :Zoom)
For the above query I would consider excluding fiBlobs from the FetchOptions to save some I/O time for cases when the user moves the map view whilst you're reading tiles from the result set and the requested area falls out of the visible view (you stop reading and never read the rest of them).
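Translated to Python's sqlite3 for illustration (schema as in the question, parameter names mirroring the query above), the area fetch is one range query instead of one query per tile:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tiles (X INTEGER, Y INTEGER, Z INTEGER, Data BLOB)")
# A 4x4 grid of tiles at zoom level 3.
conn.executemany("INSERT INTO Tiles VALUES (?, ?, ?, ?)",
                 [(x, y, 3, b"png") for x in range(4) for y in range(4)])

def fetch_area(horz_min, horz_max, vert_min, vert_max, zoom):
    """All tiles in the requested rectangle at the given zoom level."""
    return conn.execute(
        "SELECT X, Y, Data FROM Tiles "
        "WHERE X BETWEEN ? AND ? AND Y BETWEEN ? AND ? AND Z = ?",
        (horz_min, horz_max, vert_min, vert_max, zoom)).fetchall()
```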
I have a SQL table:
SuburbId int,
SuburbName varchar,
StateName varchar,
Postcode varchar,
Latitude Decimal,
Longitude Decimal
and in my C# I have created code that builds a bounding box so I can search by distance.
And my stored procedure to get the suburbs is:
[dbo].[Lookup] (
@MinLat decimal(18,15),
@MaxLat decimal(18,15),
@MinLon decimal(18,15),
@MaxLon decimal(18,15)
)
AS
BEGIN
SELECT SuburbId, SuburbName, StateName, Latitude, Longitude
FROM SuburbLookup
WHERE (Latitude >= @MinLat AND Latitude <= @MaxLat AND Longitude >= @MinLon AND Longitude <= @MaxLon)
END
My Question is.. this is a Clustered Index Scan... Is there a more efficient way of doing this?
This type of query tends to perform quite poorly with the standard B-tree index. For better performance you can use the geography column type and add a spatial index.
Queries such as WHERE geography.STDistance(geography2) < number can use a spatial index.
Here are a couple of links that should help. Of course, depending on the scope of your project, you may already have the best solution.
That said, if you care to, you can create a custom index in SQL Server for your locations:
Custom Indexing
Additionally, if you wanted to, you could look into quadtrees and quadtiles. There is a technique where you calculate a key via interleaved addressing: the lat and lon pair is combined, bit by bit, into a single integer, which can then be truncated to a coarser level to see how keys relate to each other.
see more here
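A sketch of the interleaving idea (a simple quadkey with arbitrary precision; my own illustration, not taken from the linked articles):

```python
def quadkey(lat, lng, levels=16):
    """Interleave lat/lng into a single quadtree key, one digit per level.
    Truncating the key gives the enclosing coarser cell, so nearby points
    share key prefixes."""
    key = ""
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    for _ in range(levels):
        digit = 0
        lng_mid = (lng_lo + lng_hi) / 2
        lat_mid = (lat_lo + lat_hi) / 2
        if lng >= lng_mid:          # east half contributes bit 0
            digit |= 1
            lng_lo = lng_mid
        else:
            lng_hi = lng_mid
        if lat >= lat_mid:          # north half contributes bit 1
            digit |= 2
            lat_lo = lat_mid
        else:
            lat_hi = lat_mid
        key += str(digit)
    return key
```

A prefix comparison on such keys then answers "are these points in the same coarse cell?" with plain string operations, which is exactly the kind of equality/range predicate the databases above index well.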