Number of geocoded points in concentric circles in R

I have a data set of around 36K hotels geocoded with latitude and longitude.
For each of them, I would need to know how many other hotels (and also which of the others) fall within different concentric circles around each point (2 miles, 5 miles, 10 miles).
For example, the dataset looks like this:
ID Latitude Longitude Rooms
1 N K 200
2 N K 150
3 N K 80
4 N K 140
5 N K 100
I would need a measure of density for each hotel in each concentric circle (normally calculated by dividing the focal hotel's number of rooms by the total number of rooms in its concentric circle).
Normally, I would calculate the distance between every pair of points and then filter for the ones within each radius. With 36K points, however, that would take a long time: I would be computing all pairwise distances when each point probably only needs the distance to 4-5 others at most.
Do you have an idea on how to calculate the distance and then the density efficiently using R or ArcGIS?
Thanks

It seems the best way to make your code more efficient is not to find a more efficient distance-calculating algorithm, but to apply that algorithm to only a handful of hotels.
You could do a rough "square" approximation very quickly:
1. Make a new dataset of hotels sorted by latitude.
2. Make a new dataset of hotels sorted by longitude.
3. For each hotel, make two new empty lists: hotels_in_lat_range and hotels_in_long_range.
4. Start at your hotel in the latitude-sorted dataset and go up until you reach an upper limit.
5. Go back down until you reach a lower limit, adding the hotels to hotels_in_lat_range as you go along.
6. Repeat steps 4 and 5 for the longitude-sorted dataset, adding hotels to hotels_in_long_range.
7. For every hotel that is in both lists, calculate the distance between your test hotel and that hotel. If the distance is less than your circle radius, include it when you calculate the density.
For the upper and lower limits of latitude and longitude, I'd recommend using the following approximation (I wrote this in Python because I don't know R):
import math

# test_rad and Earth_rad must be in the same units; the factor of 4 is a
# safety margin that keeps the box conservative
min_lat = max(-89.9, test_lat - 4 * math.degrees(test_rad / Earth_rad))
max_lat = min(89.9, test_lat + 4 * math.degrees(test_rad / Earth_rad))
min_long = max(
    -180.0,
    test_long - 4 * math.degrees(
        test_rad / (Earth_rad * min(math.cos(math.radians(min_lat)),
                                    math.cos(math.radians(max_lat))))
    )
)
max_long = min(
    180.0,
    test_long + 4 * math.degrees(
        test_rad / (Earth_rad * min(math.cos(math.radians(min_lat)),
                                    math.cos(math.radians(max_lat))))
    )
)
This is a reasonable approximation when your testing radius is significantly smaller than the Earth's radius. I'd recommend staying within 100 miles.
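In R, the same prefilter-then-check idea might look like the following minimal sketch. It assumes a data frame hotels with the Latitude, Longitude and Rooms columns shown above, and it uses distHaversine() from the geosphere package for the exact distances; treat it as a starting point, not a finished implementation.

library(geosphere)  # for distHaversine()

hotel_density <- function(hotels, radius_m) {
  deg_lat <- radius_m / 111320                   # metres per degree of latitude
  sapply(seq_len(nrow(hotels)), function(i) {
    h <- hotels[i, ]
    deg_lon <- deg_lat / cos(h$Latitude * pi / 180)
    # cheap bounding-box prefilter, the same idea as the sorted lists above
    cand <- hotels[abs(hotels$Latitude  - h$Latitude)  <= deg_lat &
                   abs(hotels$Longitude - h$Longitude) <= deg_lon, ]
    d <- distHaversine(c(h$Longitude, h$Latitude),
                       cbind(cand$Longitude, cand$Latitude))
    # focal hotel's rooms over total rooms in its circle (focal included here)
    h$Rooms / sum(cand$Rooms[d <= radius_m])
  })
}

dens_2mi <- hotel_density(hotels, 2 * 1609.34)   # 2 miles in metres

Because the bounding box discards almost all of the 36K hotels before any haversine call, this stays fast even though the outer loop is a plain sapply().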


Given a set of points with x, y and z coordinates whose bounds are 0 to 1 (inclusive), determine if they're all uniformly distributed (or close to)

I'm trying to determine whether a set of points are uniformly distributed in a 1 x 1 x 1 cube. Each point comes with an x, y, and z coordinate that corresponds to their location in the cube.
A trivial way that I can think of is to flatten the set of points into 2 graphs and check how normally distributed both are; however, I do not know whether that's a correct way of doing so.
Does anyone have any ideas?
I would compute a point density map and then check for anomalies in it:
definitions
Let's assume we have N points to test. If the points are uniformly distributed, they should form a uniform grid of m × m × m points:
m * m * m = N
m = N^(1/3)
To account for disturbances from the uniform grid and to assess statistics, you need to divide your cube into a grid of cubes, where each cube holds several points (so statistical properties can be computed). Let's assume k >= 5 points per grid cube, so:
cubes = m / k^(1/3)
create a 3D array of counters
We simply need an integer counter per grid cube:
int map[cubes][cubes][cubes];
Fill it with zeroes.
process all points p(x,y,z) and update map[][][]
Simply loop through all of your points, compute the grid cube position each belongs to, and increment its counter (the coordinates are in the range [0, 1]):
map[(int)(x*(cubes-1))][(int)(y*(cubes-1))][(int)(z*(cubes-1))]++;
compute the average count of map[][][]
A simple average like this will do:
avg=0;
for (xx=0;xx<cubes;xx++)
for (yy=0;yy<cubes;yy++)
for (zz=0;zz<cubes;zz++)
avg+=map[xx][yy][zz];
avg/=cubes*cubes*cubes;
now just compute the average absolute distance to this average
d=0;
for (xx=0;xx<cubes;xx++)
for (yy=0;yy<cubes;yy++)
for (zz=0;zz<cubes;zz++)
d+=fabs(map[xx][yy][zz]-avg);
d/=cubes*cubes*cubes;
Now d holds a metric telling how far the points are from uniform density, where 0 means a perfectly uniform distribution. So just threshold it. The d also depends on the number of points, and my intuition tells me that d >= k means totally non-uniform, so if you want to make it more robust you can do something like this (the threshold might need tweaking):
d/=k;
if (d<0.25) uniform;
else nonuniform;
As you can see, all of this is O(N) time, so it should be fast enough for you. If it isn't, you can evaluate only every 10th point by skipping points; however, that can be done only if the order of the points is random. If not, you would need to pick N/10 random points instead. The 10 could be any constant, but keep in mind that you still need enough points for the statistics to represent your set, so I would not go below 250 points (though that depends on what exactly you need).
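For completeness, here is the same recipe condensed into an R sketch (the function name, the k default, and the demo calls at the end are my own; it assumes pts is an N x 3 matrix with coordinates in [0, 1], and it returns the metric d so you can threshold it as above):

uniformity <- function(pts, k = 5) {
  N <- nrow(pts)
  cubes <- max(1, floor((N / k)^(1/3)))          # grid resolution: ~k points per cube
  idx <- pmin(floor(pts * cubes) + 1L, cubes)    # grid cube index per coordinate
  counts <- table(factor(idx[, 1], 1:cubes),     # the 3D counter map, built in one shot
                  factor(idx[, 2], 1:cubes),
                  factor(idx[, 3], 1:cubes))
  avg <- mean(counts)                            # average count per cube
  mean(abs(counts - avg)) / k                    # normalized absolute deviation d
}

set.seed(42)
uniformity(matrix(runif(3000), ncol = 3))        # a uniform cloud
uniformity(matrix(rbeta(3000, 2, 5), ncol = 3))  # points clustered toward low values

The table() call over three factors replaces the explicit triple loops above; comparing the two demo values shows how d grows as the cloud departs from uniformity.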
Here are a few of my answers using the density map technique:
Finding holes in 2d point sets?
Location of highest density on a sphere

Getting limits on latitude and longitude

I have a service that looks for nearby locations (300 m) from a user-specified point.
I'm using the haversine formula to check if a location is near the point
https://en.wikipedia.org/wiki/Haversine_formula
My problem is that it's slow since it's checking against all of the points in my DB.
What I want to do is limit the initial query and apply the haversine formula to a list of points in a smaller bounded area, e.g.:
results = (SELECT * FROM location
           WHERE location.latitude BETWEEN 14.223 AND 14.5
           AND location.longitude BETWEEN 121.5 AND 122)
haversine(results, user_point)
Is there a loose way of getting the bounds from a given point?
Or basically a dirty conversion of lat/long to meters?
If you can modify your database structure, there's one super-easy way to do it: instead of (or in addition to) storing latitude and longitude, convert your location coordinates into 3D space, with columns for x, y, and z in meters. Then you can just do
SELECT * FROM location
WHERE location.x BETWEEN center.x - 300 AND center.x + 300
AND location.y BETWEEN center.y - 300 AND center.y + 300
AND location.z BETWEEN center.z - 300 AND center.z + 300
That will trim down your list pretty well, and you can do the haversine calculation on the resulting set.
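If the conversion itself is the sticking point, a one-off R sketch like this could populate those columns (assuming a spherical Earth; the function name and radius default are my own):

to_xyz <- function(lat_deg, lon_deg, R = 6371000) {
  lat <- lat_deg * pi / 180                      # degrees to radians
  lon <- lon_deg * pi / 180
  data.frame(x = R * cos(lat) * cos(lon),
             y = R * cos(lat) * sin(lon),
             z = R * sin(lat))
}

to_xyz(14.36, 121.75)

Note that the ±300 box is a cube measured along 3D chords rather than along the surface, but at this scale the difference is negligible, and the haversine pass cleans up the corners anyway.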
If you're stuck with using a database that has only longitude and latitude in it, you can still narrow down the search. It's easy for latitude: one degree of latitude due north or south always corresponds to 111 km of distance, as long as you ignore the complications that arise when you get close to the poles. That means a distance of 300 m is 0.0027... degrees of latitude, although you might as well be a bit conservative and use 0.003 or 0.004.
Longitude is a bit trickier because the conversion factor changes depending on how far north or south you are, but it's still not too complicated: you just multiply by the cosine of the latitude.
distance = cos(latitude) * 111.19... km/degree * delta_angle
At the equator, it's the same as with latitude: one degree change in longitude at the equator is 111 km. At 80 degrees north (or south), you multiply by a factor of cos(80 degrees) = 0.17..., with the result that 1 degree change in longitude is only 19.3 km. For your purposes, you could invert this and find the range of longitudes to select as 300 m / cos(latitude) / (111.19... km/degree) = (0.0027... degrees) / cos(latitude). That coefficient is the same quantity from the first paragraph; it's not a coincidence.
The tricky problems come up near the discontinuities of the coordinate system, for example when you get near the poles. You can see why when you start plugging in latitudes like 89.9996 degrees:
0.0027... degrees / cos(89.9996 degrees) = 386... degrees
Well, how can that be when there are only 360 degrees in a whole circle? This is an indicator that you've gotten to the point where your 300 m radius extends all the way around the pole and comes back to include your starting location, in a manner of speaking. At that point, you might as well just search all points in your database close enough to the pole. Of course you should really start doing this at 89.999 degrees or so, because that's where the 600 m diameter of the region you're searching just about encircles the pole completely.
There's another issue at (well, near) the International Date Line, or more precisely the "antimeridian", having to do with the jump from -180 to +180 degrees of longitude. A point at +179.9999 degrees and one at -179.9999 degrees, both on the equator, will have very different coordinates even though they are geographically just a few meters apart. Since you're just doing this as a preliminary filter for a more detailed search, it's probably easiest to just pass through every point within 0.006 degrees (that's roughly the diameter of a 300 m-radius circle) of the antimeridian, and then the haversine calculation will determine whether the points are actually close.
To sum up, you can use the bounds on latitude and longitude I mentioned above and just add special cases for the poles and the antimeridian. In some kind of pseudo-SQL/code hybrid:
IF abs(center.latitude) > 89.999
    SELECT * FROM location WHERE abs(location.latitude - center.latitude) < 0.003
ELSE
    IF abs(center.longitude) > 179.997
        SELECT * FROM location
        WHERE abs(location.latitude - center.latitude) < 0.003
        AND 180 - abs(location.longitude) < (0.006 / cos(center.latitude))
    ELSE
        SELECT * FROM location
        WHERE abs(location.latitude - center.latitude) < 0.003
        AND abs(location.longitude - center.longitude) < (0.003 / cos(center.latitude))
    ENDIF
ENDIF
If you want a pithy statement at the expense of having to test potentially twice as many points, you can compare only the absolute values of longitude:
SELECT * FROM location
WHERE abs(location.latitude - center.latitude) < 0.003
AND abs(abs(location.longitude) - abs(center.longitude)) <= min(0.003 / cos(center.latitude), 180)
Approximating the earth with a sphere, the distance between two consecutive degrees of latitude can be calculated by
dPerLat = pi * r / 180°,
where r is the radius of the earth. This will be about 111 km.
So, if your reference point is (lat, long) and your search radius is d then you want to search for latitudes in the range
lat* ∈ [lat - d / dPerLat, lat + d / dPerLat]
Then, for a given latitude, the distance between two consecutive degrees of longitude is:
dPerLong = pi * r * cos(lat) / 180°
Again, the range of longitudes to search is ± d / dPerLong. You should use the lat value that gives you a conservative (maximal) range, i.e. the lat value with the highest absolute value.
Be careful at the poles.
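Putting those two formulas together, a minimal R sketch might look like this (the function name and spherical radius default are mine; d and r are in metres, and the longitude bound uses the most extreme latitude in range, as recommended above):

search_bounds <- function(lat, lon, d, r = 6371000) {
  d_per_lat <- pi * r / 180                           # metres per degree of latitude
  lat_range <- c(lat - d / d_per_lat, lat + d / d_per_lat)
  worst_lat <- lat_range[which.max(abs(lat_range))]   # conservative choice
  d_per_lon <- pi * r * cos(worst_lat * pi / 180) / 180
  list(lat = lat_range,
       lon = c(lon - d / d_per_lon, lon + d / d_per_lon))
}

search_bounds(14.36, 121.75, 300)                     # 300 m box for the query above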

How to calculate distance from points to multipolygons within a buffer distance in R

I have a file with 52,000 points distributed in Brazil and a map of forest remnants (in polygon format).
What I want to do is calculate the distance from each point to each forest fragment that is within a buffer of, for example, 500 m. So, if I have 3 fragments within a buffer of 500 m, I want all three (Euclidean) distances calculated from the centroid (focal point) to these fragments.
At the end I would like to take the mean distance from each focal point to their respective fragments.
I tried the function gWithinDistance, from the package "rgeos", like below:
near_frag_500 <- gWithinDistance(points, veg_natural, 500, byid=T)
with the argument "points" being my focal points and "veg_natural" my forest remnant polygons. The number 500 refers to the 500 m buffer I want to calculate the distance within. However, the output of this function is a matrix of TRUE or FALSE values: TRUE for those polygons which fall within the 500 m buffer and FALSE for those which fall outside it. It doesn't give me the actual values of the distances calculated. I guess what I am looking for is an equivalent to the "Generate Near Table" function in ArcGIS.
I would really appreciate it if someone could help me with that! I also have my forest remnant polygons in raster format if there is any solution using a raster file.
I have made a simple test set with 7 points and 8 polygons. Everything has to be projected to a Cartesian system in metres, so not lat-long. Use a local UTM zone if nothing else.
I compute the distance matrix from points to polygons:
> dmat = gDistance(points, veg_natural,byid=TRUE)
Then mask out anything over 500, and compute the row means:
> dmat[dmat>500]=NA
> apply(dmat, 1, mean, na.rm=TRUE)
0 1 2 3 4 5 6 7
331.5823 262.7129 380.2073 187.2068 111.9961 NaN 224.6962 360.7995
and that is the mean of the distances from each point to the features within 500 m. Note the NaN for point 5, which is because it is not within 500 m of any polygon features.
If this matrix is too big for your case with 52,000 points (and ?? polygons?), then just do it for 1,000 points at a time in a loop, or whatever your computer can cope with (see the sketch at the end of this answer). I think mine would fall over with 52,000.
If you want to know which of the polygons are the ones within 500m of each point, then something like:
> apply(dmat,1, function(r){which(!is.na(r))})
$`0`
5 6
5 6
$`1`
4 5 7
4 5 7
shows my first point (labelled 0) is near to polygons 5 and 6.
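That chunked loop could look roughly like this, reusing the same gDistance() call and masking as above (the chunk size and variable names are mine):

chunk <- 1000
n <- length(points)
mean_dist <- vector("list", ceiling(n / chunk))
for (j in seq_along(mean_dist)) {
  idx <- ((j - 1) * chunk + 1):min(j * chunk, n)   # this chunk's point indices
  dmat <- gDistance(points[idx, ], veg_natural, byid = TRUE)
  dmat[dmat > 500] <- NA                           # drop anything beyond 500 m
  mean_dist[[j]] <- apply(dmat, 1, mean, na.rm = TRUE)
}
mean_dist <- unlist(mean_dist)                     # one mean per focal point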

Trying to find lat lon given original lat lon + 3 meters

I have this problem I have to solve.
I am given a lat/lon coordinate, and I need to find a random point within 3 meters of the original point. Approximations are fine, but all I could find was this https://gis.stackexchange.com/questions/2951/algorithm-for-offsetting-a-latitude-longitude-by-some-amount-of-meters which has a 10 meter error. Thank you.
Not sure what "find" and "random" mean in this question.
The earth is about 10 million meters from the equator to either pole (that's actually how they defined the size of the meter at first; it's been modified slightly since). The width of latitude lines doesn't vary, so one meter north or south is always one ten-millionth of 90 degrees, or 9e-6 degrees. Just multiply that by the north-south displacement in meters of your desired point from the initial point, and you'll get the number to add to the initial point in degrees: delta_lat = y_meters * 9e-6.
The width of longitude lines does vary, but it works out simply as x_meters * 9e-6 = delta_lon * cos(lat), which means you can use the east-west displacement from your initial point to figure the difference in degrees: delta_lon = x_meters * 9e-6 / cos(lat).
You'll have to be careful with that last part around the poles, because cos(lat) will approach zero. Navigational systems use quaternions to do these things because they don't have singularities in spherical coordinates.
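As a concrete illustration, here is a small R sketch of that approximation (the function name and the uniform-disc sampling are my own additions):

random_nearby <- function(lat, lon, radius = 3) {
  r     <- radius * sqrt(runif(1))   # sqrt() keeps the density uniform over the disc
  theta <- runif(1, 0, 2 * pi)
  c(lat = lat + r * sin(theta) * 9e-6,
    lon = lon + r * cos(theta) * 9e-6 / cos(lat * pi / 180))
}

random_nearby(40.7128, -74.0060)     # a random point within 3 m of the input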

Calculating angle from latitude and longitude

I have a set of latitudes and longitudes; this is the data for an animal as it moves in time. What I want to do is calculate the turning angle, that is, by what angle it turns between every movement. So say I have point 1, point 2 and point 3, with latitude and longitude values corresponding to each point (the animal moves from point 1 to point 2 to point 3 and so on), and I want to calculate the angle between these 3 points, point 2 being the middle point. What should I do? My OS is Windows and I am using R for analysis.
Here is my sample data:
longitude latitude
36.89379547 0.290166977
36.89384037 0.290194109
36.88999724 0.286821044
36.88708721 0.288339411
36.88650313 0.29010232
36.88563203 0.289939416
36.88545224 0.290924863
They are in decimal degrees.
Using the function trackAzimuth in maptools:
library(maptools)
trackAngle <- function(xy) {
angles <- abs(c(trackAzimuth(xy), 0) -
c(0, rev(trackAzimuth(xy[nrow(xy):1, ]))))
angles <- ifelse(angles > 180, 360 - angles, angles)
angles[is.na(angles)] <- 180
angles[-c(1, length(angles))]
}
The trackAzimuth function is a simple loop wrapper around gzAzimuth. See ?gzAzimuth for references on calculating directions on the sphere.
Using your data:
x <- read.table(text = "longitude latitude
36.89379547 0.290166977
36.89384037 0.290194109
36.88999724 0.286821044
36.88708721 0.288339411
36.88650313 0.29010232
36.88563203 0.289939416
36.88545224 0.290924863", header = TRUE)
trackAngle(as.matrix(x))
[1] 10.12946 111.17211 135.88514 97.73801 89.74684
EDIT: I had to remove first/last angles from the function, something I was doing after the fact with this function elsewhere. Should be right now. :)
Also, the packages adehabitatLT and argosfilter contain functions to calculate track directions and angles.
Your data points vary over only a small range, so we can look at one small patch of Earth's surface and pretend it's flat and two-dimensional. You have to figure out the scale: how many km, meters, miles, or whatever your favorite unit is, corresponds to one degree of latitude, and to one degree of longitude. The latter depends on latitude: it'll be the same as the scale for latitude near the equator, but if you are standing within arm's length of the north pole, one step will take you through fifty degrees of longitude. Set up x,y coordinates where x=0 is at longitude 36.88000 and y=0 is at latitude 0.29000.
So now you have a series of (x,y) points. Take the differences from each point to the next: P2-P1, P3-P2, etc. These could be called "displacement vectors", though other terms may be used in fields other than the one I'm from. Call them V1, V2, etc. Use dot products and norms: dot(V1,V2) = magnitude(V1) * magnitude(V2) * cos(a), where a is the angle by which V2 deviates from the direction of V1. Repeat for V3 and V2, and so on.
R has all the tools to do this, but I don't know enough syntax of R to give examples.
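To make that concrete, here is a minimal R sketch of the flat-plane, dot-product method (the scale constant and the names are my own assumptions):

turn_angles <- function(lon, lat) {
  m_per_deg <- 111190                        # metres per degree of latitude
  x <- lon * m_per_deg * cos(mean(lat) * pi / 180)
  y <- lat * m_per_deg
  V <- cbind(diff(x), diff(y))               # displacement vectors V1, V2, ...
  dots <- rowSums(V[-nrow(V), , drop = FALSE] * V[-1, , drop = FALSE])
  mags <- sqrt(rowSums(V^2))
  cosa <- dots / (mags[-length(mags)] * mags[-1])
  acos(pmin(pmax(cosa, -1), 1)) * 180 / pi   # clamp guards against rounding
}

turn_angles(x$longitude, x$latitude)         # x as read in the answer above

This returns the angle by which each vector deviates from the previous one; note that trackAngle() above reports the supplementary vertex angle, i.e. 180 minus these values.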
