I have a SQL table:
SuburbId int,
SuburbName varchar,
StateName varchar,
Postcode varchar,
Latitude Decimal,
Longitude Decimal
and in my C# code I have created a bounding box so I can search by distance.
And my stored procedure to get the suburbs is:
[dbo].[Lookup] (
@MinLat decimal(18,15),
@MaxLat decimal(18,15),
@MinLon decimal(18,15),
@MaxLon decimal(18,15)
)
AS
BEGIN
SELECT SuburbId, SuburbName, StateName, Latitude, Longitude
FROM SuburbLookup
WHERE (Latitude >= @MinLat AND Latitude <= @MaxLat AND Longitude >= @MinLon AND Longitude <= @MaxLon)
END
My question is: this results in a Clustered Index Scan... is there a more efficient way of doing this?
This type of query tends to perform quite poorly with the standard B-tree index. For better performance you can use the geography column type and add a spatial index.
Queries such as WHERE geography.STDistance(geography2) < number can use a spatial index.
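For example, a minimal sketch of that approach on the table above (the GeoPoint column, index name, and sample values are illustrative assumptions, and it assumes SuburbId is the table's clustered primary key on SQL Server 2008 or later):

-- Add a geography column, populate it from the existing lat/long pair,
-- and create a spatial index over it.
ALTER TABLE SuburbLookup ADD GeoPoint geography;
GO

UPDATE SuburbLookup
SET GeoPoint = geography::Point(Latitude, Longitude, 4326);

CREATE SPATIAL INDEX IX_SuburbLookup_GeoPoint ON SuburbLookup (GeoPoint);
GO

-- Radius search in metres; STDistance against the indexed column can be
-- answered from the spatial index rather than a clustered index scan.
DECLARE @origin geography = geography::Point(-33.8688, 151.2093, 4326);

SELECT SuburbId, SuburbName, StateName, Latitude, Longitude
FROM SuburbLookup
WHERE GeoPoint.STDistance(@origin) < 10000;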
A couple of links that should help. Of course, depending on the scope of your project, you may already have the best solution.
That said, if you care to, you can create a custom index in SQL Server for your locations.
Custom Indexing
Additionally, if you wanted to, you could look into quadtrees and quadtiles. There is a technique where you calculate a key via interleaved addressing: a combination of the lat/lon pair that can be represented as an integer and then truncated to a base level to see how locations relate to each other.
see more here
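A much simplified sketch of that idea in T-SQL, truncating to one-degree cells rather than doing true bit interleaving (the CellKey column, index, and sample bounding box are assumptions, not part of the original schema):

-- Persisted computed cell key: the one-degree grid cell a row falls into.
ALTER TABLE SuburbLookup
ADD CellKey AS (CAST(FLOOR(Latitude) AS int) * 1000 + CAST(FLOOR(Longitude) AS int)) PERSISTED;
GO

CREATE INDEX IX_SuburbLookup_CellKey ON SuburbLookup (CellKey);
GO

-- Seek on the handful of cells covered by the bounding box, then refine
-- with the exact lat/long range.
SELECT SuburbId, SuburbName, StateName, Latitude, Longitude
FROM SuburbLookup
WHERE CellKey IN (-34 * 1000 + 151, -34 * 1000 + 152)
  AND Latitude BETWEEN -34.0 AND -33.5
  AND Longitude BETWEEN 151.0 AND 152.5;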
Related
Is there a storage and performance gain from denormalizing an sqlite3 blob column into the primary table, but treating it as a foreign key only in some cases? I have two implementations, and both seem to run slower. Are there some sqlite3 internals that preclude such usage?
I have a ~100 GB sqlite file with two tables. The first maps z,x,y coordinates to an ID -- a 32-character hex string stored as TEXT. The second table maps that ID to a BLOB, usually a few kilobytes. There are unique indexes for both (z,x,y) and ID. There is a VIEW that joins both tables.
For ~30% of coordinates, BLOBs are unique per coordinate combination. The rest reference the same ~100 frequently occurring BLOBs.
I would like to optimize for space and performance: move unique BLOBs into the first table, and keep the second table only as a small 100-row lookup for the few shared BLOBs. The first table's blob could be checked at run time -- if it is exactly the size of the hash key, treat it as a lookup; otherwise, treat it as a value.
My thinking is that this will often avoid a lookup into the large second table, keep the small lookup table fully in cache, and avoid storing keys for most of the blobs. My perf testing does not confirm this theory, and I don't understand why.
Original implementation:
CREATE TABLE map (z INTEGER, x INTEGER, y INTEGER, id TEXT);
CREATE TABLE blobs (id TEXT, data BLOB);
CREATE VIEW tiles AS
SELECT z, x, y, data FROM map JOIN blobs ON blobs.id = map.id;
CREATE UNIQUE INDEX map_index ON map (z, x, y);
CREATE UNIQUE INDEX blobs_id ON blobs (id);
The optimized implementation changes the ID column in the map table from id TEXT to mix BLOB:
CREATE TABLE map (z INTEGER, x INTEGER, y INTEGER, mix BLOB);
I tried two VIEW implementations; both run ~10% slower than the INNER JOIN method above. The LEFT JOIN method:
CREATE VIEW tiles AS
SELECT z, x, y,
COALESCE(blobs.data, map.mix) AS data
FROM map LEFT JOIN blobs ON LENGTH(map.mix) = 32 AND map.mix = blobs.id;
And I tried the sub-query approach:
CREATE VIEW tiles AS
SELECT z, x, y,
CASE
WHEN LENGTH(map.mix) = 32 THEN
(SELECT COALESCE(blobs.data, map.mix) FROM blobs WHERE map.mix = blobs.id)
ELSE map.mix
END AS data
FROM map;
P.S. COALESCE() ensures that if the data length happens to be 32 but it is not actually a foreign key, the query still returns the data as is.
P.P.S. This is an mbtiles file with map tiles; duplicate tiles represent empty water and land, whereas unique tiles represent places with some unique features, like city streets.
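One way to investigate the slowdown is to compare the plans SQLite chooses for each VIEW definition; a minimal check, assuming the schema above:

-- Run this once per VIEW variant and compare the output: it shows whether the
-- blobs_id index is still used for the join and in which order the tables are
-- visited for a point lookup through the view.
EXPLAIN QUERY PLAN
SELECT data FROM tiles WHERE z = 10 AND x = 512 AND y = 384;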
I'm struggling with the DynamoDB schema design for a table storing locations. The table will have [userId, lastUpdatedTime, locationGooglePlaceId, longitude, latitude, hideOnUI(bool)].
One of the main queries is: given the user's current location (x, y) as a GPS coordinate, find nearby userIds based on their longitude and latitude.
The problem is how I would design an index for this purpose. The table itself can have HASH key userId and SORT key lastUpdatedTime, but how would the GSI go? I can't seem to identify any partition key for an "equals" operation.
In SQL it'll be something like:
select * from table
where x-c <= longitude and longitude < x+c
AND y-c <= latitude and latitude < y+c
Thanks
First of all, I am not sure if DynamoDB is a good fit here; maybe it's better to use another database, since DynamoDB does not support complicated indexes.
Nonetheless, here is a design that you can try.
First of all, you can split your map into multiple square blocks; every square block would have an id and a known position and size.
Then if you have a location and you want to find all nearby points, you can do the following.
Every point in your database will be stored in the Points table and will have the following keys:
BlockId (String, UUID, partition key) - id of a block this point belongs to
Latitude (Number, sort key) - latitude of a point
Longitude (Number) - simple attribute
Now if you know what square a user's location is in and which squares are nearby, you can perform the following search in every nearby square:
BlockId = <nearby_block_id>
Latitude between(y-c, y+c)
and use a filter expression based on the Longitude attribute:
Longitude between(x-c, x+c)
It does not really matter whether you use latitude or longitude as the sort key here.
Between is a DynamoDB operator that can be used with sort keys or in filter expressions:
BETWEEN: Greater than or equal to the first value, and less than or equal to the second value. AttributeValueList must contain two AttributeValue elements of the same type, either String, Number, or Binary (not a set type). A target attribute matches if the target value is greater than, or equal to, the first element and less than, or equal to, the second element. If an item contains an AttributeValue element of a different type than the one provided in the request, the value does not match. For example, {"S":"6"} does not compare to {"N":"6"}. Also, {"N":"6"} does not compare to {"NS":["6", "2", "1"]}.
Now the downside of this is that for every partition key there can be no more than 10 GB of data with this key, so the number of points that you can put in a single square is limited. You can get around this if your squares are small enough, or if your squares have variable sizes and you use big squares for sparsely populated areas and small squares for very crowded areas, but that seems to be a non-trivial project.
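For concreteness, the per-block query could look roughly like this in PartiQL, DynamoDB's SQL-compatible query language (table and attribute names are the ones assumed above; the BlockId value is a made-up example that would be computed client-side from the block grid):

SELECT *
FROM "Points"
WHERE BlockId = 'block-151--34'
  AND Latitude BETWEEN -33.90 AND -33.85
  AND Longitude BETWEEN 151.20 AND 151.25

The BlockId and Latitude conditions are served by the key; the Longitude condition is applied as a filter, and you issue one such statement per nearby block because the partition key only supports equality.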
Question 1:
I am using Google Datastore (Java Client Libraries) with App Engine Flex and I am storing location data. I would like to query for a list of locations within a rectangular region.
My location entity looks like this:
Location
- lat (eg. -56.1)
- long (eg. 45.6)
Google Datastore has limitations for querying with multiple inequalities on different properties so I can't query using GQL like this:
SELECT * FROM Location WHERE lat <= @maxLat AND lat >= @minLat AND long <= @maxLong AND long >= @minLong
Where maxLat, minLat, maxLong, and minLong represent the bounding rectangle to search for Locations.
Currently I am querying using just one filter:
SELECT * FROM Location WHERE lat <= @maxLat AND lat >= @minLat
And from the returned results, I filter down to the ones within the longitude bounds as well. Is there a better way to do this without resorting to this strategy?
Question 2:
If I store the latitude/longitude combination as a Geopoint in Google Datastore, how can I query against the latitude and longitude?
For example if the location is stored like this:
Location
- location (Geopoint)
I cannot filter on the lat/long within the Geopoint using the Java Client Libraries.
SELECT * FROM Location WHERE location.long <= @maxLong
Is there a workaround for the Datastore Geopoint property?
For an example of how to do this, look at the 'geomodel' library in Python.
Basic premise:
Divide your latitude and longitude values into discrete values (for example, 1-degree blocks), thereby defining 'cells' that the value is contained in. Combine/hash them together so you have a single value.
Now you can convert your queries for locations in a bounded rectangle into equality filters against these cells. Post-filter at the end if your bounded rectangle doesn't entirely line up with the cell edges.
A related example for doing this with time is: Is datastore good for storing hr shifts?
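As a rough sketch of what those queries end up looking like in GQL, assuming each Location entity gets an extra indexed string property, say cell, holding its 1-degree cell id such as '-57:45' (that property is an assumption, not part of the original model):

SELECT * FROM Location WHERE cell = '-57:45'

You run one such equality query per cell covered by the bounding rectangle and then post-filter the results against the exact lat/long bounds on the client.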
Check out Datastore's REST API. There is an explicit value type for geopoints which you can find documented at the link below:
https://cloud.google.com/datastore/docs/reference/rest/v1/projects/runQuery#Value
All you'll need to do then is create a geopoint value property (let's call it geo) on an entity and then you'll be able to write GQL queries like this:
SELECT * FROM Address WHERE geo.latitude <= @maxLat AND geo.latitude >= @minLat
Is it possible to convert rowset variables to a scalar value? For example:
@maxKnownId =
SELECT MAX(Id) AS maxID
FROM @PrevDayLog;
DECLARE @max int = @maxKnownId;
There is no implicit conversion of a single-cell rowset to a scalar value in U-SQL (yet).
What are you interested in using the value for?
Most of the time you can write your U-SQL expression in a way that you do not need the scalar variable. E.g., if you want to use the value in a condition in another query, you could just use the single value rowset in a join with the other query (and with the right statistics, I am pretty sure that the optimizer would turn it into a broadcast join).
If you feel you cannot easily write the expression without the rowset to a scalar, please let us know via http://aka.ms/adlfeedback by providing your scenario.
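For example, a sketch of that join pattern (the @TodayLog rowset and the Id and Payload columns are placeholders for whatever your script defines):

// One-row rowset holding the known maximum id.
@maxKnownId =
    SELECT MAX(Id) AS MaxId
    FROM @PrevDayLog;

// Use the value in another query's condition via a CROSS JOIN; the optimizer
// can broadcast the single-row rowset to every vertex.
@newRows =
    SELECT l.Id,
           l.Payload
    FROM @TodayLog AS l
         CROSS JOIN @maxKnownId AS m
    WHERE l.Id > m.MaxId;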
Thanks for the input; below is the business case.
We have catalog data coming from a source for which we need to generate unique ids. With the ROW_NUMBER() OVER() AS Id method we can generate unique ids, but while merging new records it also changes the ids of existing records and causes issues with relational data.
Below is a simple solution:
//get max id from existing catalog
@maxId =
    SELECT (int)MAX(Id) AS lastId
    FROM @ExistingCat;

//because @maxId is not scalar, we do a CROSS JOIN so that maxId is repeated for every record.
//ROW_NUMBER() always starts from 1, so we can generate the next Id with maxId + ROW_NUMBER()
@newRecordsWithId =
    SELECT (int)lastId + (int)ROW_NUMBER() OVER() AS Id,
           CatalogItemName
    FROM @newRecords CROSS JOIN @maxId;
Is there a way to further restrict the lookup performed by a database lookup functoid to include another column?
I have a table containing four columns.
Id (identity not important for this)
MapId int
Ident1 varchar
Ident2 varchar
I'm trying to get Ident2 for a match on Ident1, but want the lookup restricted to rows where MapId = 1.
The functoid only allows the four inputs; any ideas?
UPDATE
It appears there is a technique if you are interested in searching across columns that are string data types. For those interested, I found this out here:
Google Books: BizTalk 2006 Recipes
Seeing as I wish to restrict on a numeric column, this doesn't work for me. If anyone has any ideas I'd appreciate it. Otherwise I may need to think about my MapId column becoming a string.
I changed the MapId to MapCode of type char(3) and used the technique described in the book I linked to in the update to the original question.
The only issue I faced was that my column collations were not in line, so I was getting an error from the SQL when they were concatenated in the statement generated by the map:
exec sp_executesql N'SELECT * FROM IdentMap WHERE MapCode+Ident1= @P1',N'@P1 nvarchar(17)',N'<MapCode><Ident2>'
I sniffed this using SQL Profiler.
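If aligning the column collations isn't practical, one possible workaround (assuming the functoid can point at a view instead of the base table; the view name and collation here are only examples) is to expose the two columns under a single explicit collation:

-- The functoid's generated WHERE MapCode+Ident1 = @P1 comparison then runs
-- against columns that share one collation, avoiding the conflict error.
CREATE VIEW dbo.IdentMapLookup
AS
SELECT MapCode COLLATE SQL_Latin1_General_CP1_CI_AS AS MapCode,
       Ident1 COLLATE SQL_Latin1_General_CP1_CI_AS AS Ident1,
       Ident2
FROM dbo.IdentMap;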