Here is my question for everyone.
I already have an existing system that does the "Near By Zipcodes" search.
The purpose of this post is to see if I can optimize the process.
Currently, I have about 43k zipcode records.
The way my current algorithm works is: I select one record and then do a full table scan across all 43k records to compute the distance to each.
Is there a way to only take a subset (lat range, long range) and do the calculation that way?
You can get lat/lon for the centroids of all the zip codes from Google, geocoder.us, or wherever you get your geo data. Use them in your WHERE clause as a rough guide to narrow your search.
What I usually do is bracket my searches plus and minus one degree. As mentioned, a degree of latitude is about 69 miles. With longitude, things get a little more complicated: the closer you get to the pole, the shorter the distance spanned by one degree of longitude, roughly 69 × cos(latitude) miles. For instance, at the Florida / Georgia border, one degree of longitude is about 59 miles, but at the California / Oregon border, it's only about 51 miles.
You first need the latitude / longitude of your starting location, and you will probably want to set four variables: beglat, endlat, beglong and endlong. Set beglat to the starting location's latitude minus one degree, endlat to the latitude plus one degree, and likewise for the longitudes. For a tighter search radius, you can use half a degree or some other bracket.
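Here is a minimal sketch of that two-pass search in R, assuming a hypothetical data frame zips with columns zip, lat and lon for the ~43k centroids, and using the geosphere package for the exact great-circle pass:

    library(geosphere)  # distHaversine() for the exact pass

    # Hypothetical table of ~43k zip centroids: columns zip, lat, lon
    nearby_zips <- function(zips, origin_lat, origin_lon, radius_miles = 50) {
      # Rough pass: +/- 1 degree of latitude (~69 miles) and a longitude
      # window widened by 1/cos(latitude) so it spans the same distance
      lat_window <- 1
      lon_window <- 1 / cos(origin_lat * pi / 180)

      cand <- zips[zips$lat >= origin_lat - lat_window &
                   zips$lat <= origin_lat + lat_window &
                   zips$lon >= origin_lon - lon_window &
                   zips$lon <= origin_lon + lon_window, ]

      # Exact pass: great-circle distance only on the small candidate subset
      d_m <- distHaversine(cbind(cand$lon, cand$lat),
                           c(origin_lon, origin_lat))
      cand[d_m / 1609.34 <= radius_miles, ]
    }

The same bounds translate directly into a SQL WHERE clause (lat BETWEEN beglat AND endlat AND long BETWEEN beglong AND endlong), so the database only examines the candidate rows instead of all 43k.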
I have a large set of data with 7 stations along a river. During the collection of the data, samples were taken up and down a range of the river to get a representative idea of the environment. These aren't insignificant distances; a reach can be over 50 miles along a curvy river. However, for the purposes of analysis, I've been asked to compress this data to a representative site in the middle of this reach using the latitude and longitude. This would be easy to do in GIS, but working in R I haven't been able to find a good method. I've tried taking the median of the latitude and of the longitude separately and comparing them, but there's no overlap and I don't think that's a justifiable method. Is there a good package in R, or a mathematical trick I'm missing, to find a median latitude/longitude pair?
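One possible approach in R (a sketch, not the only answer): the geosphere package has a geomean() function that averages points on the sphere through 3D Cartesian coordinates, which avoids the pitfalls of taking raw medians of latitude and longitude separately. The sample locations below are made up for illustration:

    library(geosphere)

    # Made-up sample locations along one reach, as (lon, lat) pairs
    reach_pts <- cbind(lon = c(-91.52, -91.48, -91.45, -91.41),
                       lat = c( 38.71,  38.74,  38.76,  38.79))

    # geomean() averages the points through 3D Cartesian coordinates on the
    # sphere, giving a single centre that respects the geometry of lat/lon
    centre <- geomean(reach_pts)
    centre  # one lon/lat pair to use as the representative site

Note this is a geometric centre, so for a very curvy reach it may fall off the river itself; if the representative site must lie on the river, you could instead pick the sampled location closest to this centre.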
I am running a taxicab distance function on a list of coordinates and I would like to convert the resulting value, which is in degrees, to a distance in miles or kilometres. For example:
0.0117420 = |40.721319 - 40.712278| + |-73.844311 - -73.841610|
Where 0.0117420 is the output I would like to convert to mi/km. How could I go about this?
This appears to be a situation where you are trying to navigate from (40.721319, -73.844311) to (40.712278, -73.841610) where these are lat / lon pairs, and you want to navigate using a "Manhattan" routing rather than a direct great circle route.
It looks like you are considering these points as opposite corners of a "rectangle" where travel is only allowed along north, south, east and west headings to move from one point to another and where travel along the path always brings the traveler closer to the destination point.
An approximation of this is to route through one of the corners of the bounding rectangle of all such paths. There are two of them, one at (40.721319, -73.841610) and the other at (40.712278, -73.844311). So you can pick one of these and choose it as a waypoint for approximating the length of each possible "Manhattan route" between the two points. If we choose the first, you need to calculate the distance from the starting point to the waypoint and then to the destination point, such as:
l(0) = (40.721319, -73.844311)
l(1) = (40.721319, -73.841610)
l(2) = (40.712278, -73.841610)
Using the Haversine equations we see the distance from l(0) to l(1) is approximately 0.2276km and the distance from l(1) to l(2) is approximately 1.005km making the entire route about 1.2326km.
This is approximately the length of any "Manhattan route" you pick where the distance is strictly decreasing along the path taken between the two points. There are also some errors due to the curvature of the Earth, but for points this close to each other and so distant from either of the poles, this should be good enough for most applications.
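As a sketch in R (using the geosphere package's haversine distance), the waypoint construction above reduces to a few lines, and the numbers reproduce the ~1.23 km total from the text:

    library(geosphere)

    # Approximate "Manhattan" distance between two lon/lat points by routing
    # through one corner of their bounding rectangle
    manhattan_m <- function(p0, p2) {   # points given as c(lon, lat)
      p1 <- c(p2[1], p0[2])             # waypoint: destination's lon, origin's lat
      distHaversine(p0, p1) + distHaversine(p1, p2)
    }

    p0 <- c(-73.844311, 40.721319)
    p2 <- c(-73.841610, 40.712278)
    manhattan_m(p0, p2) / 1000          # ~1.233 km, matching the route above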
I have a list of GPS coordinates and I want to see if they fall in a certain range of GPS coordinates. My list is about 6,000 points, but I have 44,000 ranges. These GPS coordinates are based on street addresses, so I was thinking to narrow my ranges by street name first and then see if the coordinates fall within the subset of ranges, which are street blocks. Otherwise I would have 6,000 × 44,000 comparisons, and that would take forever. Does anyone have an idea of the most efficient way to do this? I'm completely new to R and coding in general, so I have no idea where to start.
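A sketch of that street-name pre-filter in R, with hypothetical column names: points has street, lat, lon (~6,000 rows) and ranges has street plus a lat/lon bounding box per block (~44,000 rows), with street read as character rather than factor. Splitting the ranges by street once means each point is only tested against its own street's blocks:

    match_block <- function(points, ranges) {
      # Group the 44k block ranges by street name once, up front
      ranges_by_street <- split(ranges, ranges$street)

      vapply(seq_len(nrow(points)), function(i) {
        p <- points[i, ]
        r <- ranges_by_street[[p$street]]
        if (is.null(r)) return(NA_integer_)   # street not in the range table

        # Test the point against only this street's blocks
        hit <- which(p$lat >= r$lat_min & p$lat <= r$lat_max &
                     p$lon >= r$lon_min & p$lon <= r$lon_max)
        if (length(hit)) as.integer(rownames(r)[hit[1]]) else NA_integer_
      }, integer(1))                          # row index into ranges, or NA
    }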
I am working on a building-recognition project, and I want to ask people experienced in collecting GPS data from a map.
I want to partition the map into grid cells of 30 metres × 30 metres, and in each cell store the GPS coordinate of its centre (i.e. the point (15, 15)).
What is the best way to do this?
Here's an image that demonstrates what I need.
This is not so easy:
There are two ways:
The professional solution:
Draw the 30 × 30 m grid using the UTM coordinate system of that country / city.
UTM is measured in metres and is a flat Cartesian coordinate system, while latitude / longitude are spherical and not linear in x, y.
Align your grid so that it corresponds to integral UTM coordinates.
You then need a method to transform from latitude/longitude (hopefully WGS84, but in some countries other ellipsoids are used) to UTM.
The map display software must be able to use the UTM coordinate system.
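A sketch of the professional route in R with the sp and rgdal packages (which the last answer in this thread also mentions); the points and UTM zone 33N here are placeholders, so substitute your city's zone:

    library(sp)
    library(rgdal)   # supplies the projection engine behind spTransform()

    # Placeholder building locations in WGS84 lon/lat
    pts <- SpatialPoints(cbind(lon = c(13.400, 13.412), lat = c(52.520, 52.531)),
                         proj4string = CRS("+proj=longlat +datum=WGS84"))

    # Project to the local UTM zone (zone 33N here; use the zone of your city)
    utm_crs <- CRS("+proj=utm +zone=33 +datum=WGS84")
    utm <- spTransform(pts, utm_crs)

    # Snap each point to the centre of its 30 m x 30 m cell: floor down to
    # the cell origin, then add (15, 15) metres
    xy <- coordinates(utm)
    cell_centre <- floor(xy / 30) * 30 + 15

    # Transform the cell centres back to lat/lon for the map display
    back <- spTransform(SpatialPoints(cell_centre, proj4string = utm_crs),
                        CRS("+proj=longlat +datum=WGS84"))
    coordinates(back)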
And the simpler one:
Keep using the lat/long coordinates and calculate latitudinalGridWidthDegrees, the latitude difference in degrees that corresponds to 30 m (measured in the middle of the map / city, e.g. at the main church of the city).
Since latitude and longitude do not use the same scale (one degree of latitude difference is not the same number of metres as one degree of longitude, except at the equator), additionally calculate longitudinalGridWidthDegrees.
You will get two different values (they differ by a factor of cos(mapCenterLatitudeRadians)).
To calculate these values, either work through the geodesic maths yourself or simply use a function that creates an offset point at a given distance and heading from a start point; use the centre of the map as that start point.
Create one point offset by 30 m at heading 0° (north), measure the latitude difference by subtraction, and store it in latitudinalGridWidthDegrees.
Do the same at heading 90° (east), measure the longitude difference, and store it in longitudinalGridWidthDegrees.
Now you can draw the grid using those two values as the latitude and longitude step sizes.
These steps give the corner points of each square, and you always use latitude/longitude as the interface to the mapping software (see the sketch below).
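A sketch of those offset steps in R, assuming the geosphere package's destPoint() as the "offset point by radius and direction" function; the map centre below is a placeholder:

    library(geosphere)

    center <- c(lon = -73.98, lat = 40.75)   # placeholder map centre (lon, lat)

    # Offset the centre 30 m due north (heading 0) and 30 m due east (heading 90)
    p_north <- destPoint(center, b = 0,  d = 30)
    p_east  <- destPoint(center, b = 90, d = 30)

    # Grid step sizes in degrees, one per axis
    latitudinalGridWidthDegrees  <- p_north[1, "lat"] - center["lat"]
    longitudinalGridWidthDegrees <- p_east[1, "lon"]  - center["lon"]

    # Grid lines are then plain arithmetic sequences of lat/lon values
    lat_lines <- center["lat"] + (-5:5) * latitudinalGridWidthDegrees
    lon_lines <- center["lon"] + (-5:5) * longitudinalGridWidthDegrees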
If you live in a place like Australia or Norway, where the land mass moves noticeably relative to the global reference frame (the Australian plate drifts several centimetres per year, which adds up to metres over decades), it is more difficult.
Elsewhere the drift is small enough that you, like every other app, can simply ignore it.
Advantages / disadvantages of each solution
Simple solution:
- the grid is not exactly square towards the corners of the map, though for a single city this is probably not visible
- a grid cell is exactly 30 m only at the centre of the map (the reference point chosen for the offset calculation)
+ easier implementation
+ the mapping-software interface is simpler and always supported
Professional solution:
+ the grid will match professional paper maps
+ a grid cell is always exactly 30 m
- needs geo-transformation software or a transformation method
- unclear whether the map display software supports UTM (in most cases it does not)
However, it may be easier to assign the roads and house numbers to be surveyed so that they more or less match the grid.
These are follow-ups to the question and answer in Approaches for spatial geodesic latitude longitude clustering in R with geodesic or great circle distances.
I would like to better understand:
Question #1: If all the lat/long values are within the same city, is it necessary to use either fossil or distHaversine(...) to first calculate great-circle distances?
Or, within a single city, is it OK to run clustering on the lat/long values themselves?
Question #2: jlhoward suggests that :
It's worth noting that these methods require that all points must go into some cluster. If you just ask which points are close together, and allow that some cities don't go into any cluster, you get very different results.
In my case I would like to ask just "which points are close together", without forcing every point into a cluster. How can I do this?
Question #3: To include one or two factor variables into the clustering (in addition to lat/long), is it as easy as including those factor variables in the df upon which the clustering is run ?
Please confirm.
Thanks!
"within a single city, is it OK to run clustering on the lat/long values themselves ?"
Yes, as long as your city is on the equator, where a degree of longitude is the same distance as a degree of latitude.
I'm standing very close to the north pole. One degree of longitude is 1/360 of the circumference of the circle round the pole from me. Someone ten degrees east of me might only be ten feet away. Someone one degree south of me is miles away. A clustering algorithm based on lat-long would think that guy miles away was closer to me than the guy I can wave to ten degrees east of me.
The solution for small areas, to save having to compute great-circle ellipsoid distances, is to project to a coordinate system that is near enough Cartesian that you can use Pythagoras' theorem for distance without too much error. Typically you would use a UTM zone transform, which is essentially a Mercator-style coordinate system turned on its side so that its "equator" (the central meridian) runs through your study area.
The spTransform function in sp and rgdal will sort this out for you.
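As a minimal sketch (the London points and UTM zone 30N are placeholders): project with spTransform, then any ordinary Euclidean, distance-based clustering works directly on the projected metre coordinates:

    library(sp)
    library(rgdal)

    # Placeholder lon/lat points within one city (London here)
    df  <- data.frame(lon = c(-0.12, -0.10, -0.13, -0.05),
                      lat = c(51.50, 51.51, 51.52, 51.47))
    pts <- SpatialPoints(df, proj4string = CRS("+proj=longlat +datum=WGS84"))

    # Project to the local UTM zone (zone 30N covers London)
    utm <- spTransform(pts, CRS("+proj=utm +zone=30 +datum=WGS84"))

    # Coordinates are now in metres, so plain Euclidean distances are fine
    d  <- dist(coordinates(utm))
    hc <- hclust(d, method = "complete")
    cutree(hc, h = 1500)   # groups of points within ~1.5 km of each other

On question #2: cutting the tree at a height h (rather than into k clusters) already avoids fixing a cluster count, and points far from everything else end up as singleton clusters, i.e. effectively unclustered.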