I need to calculate the shortest distance between two point matrices. I am new to R and have no clue how to do this. This is the code that I used to call in the data and convert them to points
library(dismo)
laurus <- gbif("Laurus", "nobilis")
locs <- subset(laurus, select = c("country", "lat", "lon"))
#uk observations
locs.uk <-subset(locs, locs$country=="United Kingdom")
#ireland observations
locs.ire <- subset(locs, locs$country=="Ireland")
uk_coord <-SpatialPoints(locs.uk[,c("lon","lat")])
ire_coord <-SpatialPoints(locs.ire[,c("lon","lat")])
crs.geo<-CRS("+proj=longlat +ellps=WGS84 +datum=WGS84") # geographical, datum WGS84
proj4string(uk_coord) <-crs.geo #define projection
proj4string(ire_coord) <-crs.geo #define projection
I need to calculate the shortest distance (Euclidean) from points in Ireland to points in UK. In other words I need to calculate the distance from each point in Ireland to its closet point in the UK points layer.
Can some one tell me what function or package I need to use in order to do this. I looked at gdistance and could not find a function that calculate the shortest distance.
You can use the FNN package which uses spatial trees to make the search efficient. It works with euclidean geometry, so you should transform your points to a planar coordinate system. I'll use rgdal package to convert to UK grid reference (stretching it a bit to use it over ireland here, but your original data was New York and you should use a New York planar coord system for that):
> require(rgdal)
> uk_coord = spTransform(uk_coord, CRS("+init=epsg:27700"))
> ire_coord = spTransform(ire_coord, CRS("+init=epsg:27700"))
Now we can use FNN:
> require(FNN)
> g = get.knnx(coordinates(uk_coord), coordinates(ire_coord),k=1)
> str(g)
List of 2
$ nn.index: int [1:69, 1] 202 488 202 488 253 253 488 253 253 253 ...
$ nn.dist : num [1:69, 1] 232352 325375 87325 251770 203863 ...
g is a list of indexes and distances of the uk points that are nearest to the 69 irish points. The distances are in metres because the coordinate system is in metres.
You can illustrate this by plotting the points then joining irish point 1 to uk point 202, irish 2 to uk 488, irish 3 to uk 202 etc. In code:
> plot(uk_coord, col=2, xlim=c(-1e5,6e5))
> plot(ire_coord, add=TRUE)
> segments(coordinates(ire_coord)[,1], coordinates(ire_coord)[,2], coordinates(uk_coord[g$nn.index[,1]])[,1], coordinates(uk_coord[g$nn.index[,1]])[,2])
gDistance() from the rgeos package will give you the distance matrix
library(rgeos)
gDistance(uk_coord, ire_coord, byid = TRUE)
Another option is nncross() from the spatstat package. Pro: it gives the distance to the nearest neighbour. Contra: you 'll need to convert the SpatialPoints to a SpatialPointPattern (see ?as.ppp in statstat)
library(spatstat)
nncros(uk.ppp, ire.ppp)
The package geosphere offers a lot of dist* functions to evaluate distances from two lat/lon points. In your example, you could try:
require(geosphere)
#get the coordinates of UK and Ireland
pointuk<-uk_coord#coords
pointire<-ire_coord#coords
#prepare a vector which will contain the minimum distance for each Ireland point
res<-numeric(nrow(pointire))
#get the min distance
for (i in 1:length(res)) res[i]<-min(distHaversine(pointire[i,,drop=FALSE],pointuk))
The distances you'll obtain are in meters (you can change by setting the radius of the earth in the call to distHaversine).
The problem with gDistance and other rgeos functions is that they evaluate the distance as the coordinates were planar. Basically, the number you obtain is not much useful.
Related
I have data points of a species observed using camera traps and would like to measure the distance of each camera trap site (CameraStation) to the edge of a national park using R. I have a shapefile of the park (shp) and want to apply a criterion to CameraStation(s) which are <5km from the edge. My data frame (df) consists of multiple events/observations (EventID) per CameraStation. The aim is to analyse when events near the park edge are most frequent given other environmental factors such as Season, Moon Phase and DayNight (also columns in DF).
I found a package called distance in R but this is for distance sampling and not what I want to do. Which package is relevant in this situation?
I expect the following outcome:
EventID CameraStation Distance(km) Within 5km
0001 Station 1 4.3 Yes
0002 Station 1 4.3 Yes
0003 Station 2 16.2 No
0004 Station 3 0.5 Yes
...
Here's a general solution, adapted from Spacedmans answer to this question at gis.stackexchange. Note: This solution requires working in a projected coordinate system. You can transform to a projected CRS if needed using spTransform.
The gDistance function of the rgeos package calculates the distance between geometries, but for the case of points inside a polygon the distance is zero. The trick is to create a new "mask" polygon where the original polygon is a hole cut out from the mask. Then we can measure the distance between points in the hole and the mask, which is the distance to the edge of the original polygon that we really care about.
We'll use the shape file of the Yellowstone National Park Boundary found on this page.
library(sp) # for SpatialPoints and proj4string
library(rgdal) # to read shapefile with readOGR
library(rgeos) # for gDistance, gDifference, and gBuffer
# ab67 was the name of the shape file I downloaded.
yellowstone.shp <- readOGR("ab67")
# gBuffer enlarges the boundary of the polygon by the amount specified by `width`.
# The units of `width` (meters in this case) can be found in the proj4string
# for the polygon.
yellowstone_buffer <- gBuffer(yellowstone.shp, width = 5000)
# gDifference calculates the difference between the polygons, i.e. what's
# in one and not in the other. That's our mask.
mask <- gDifference(yellowstone_buffer, yellowstone.shp)
# Some points inside the park
pts <- list(x = c(536587.281264245, 507432.037861251, 542517.161278414,
477782.637790409, 517315.171218198),
y = c(85158.0056377799, 77251.498952222, 15976.0721391485,
40683.9055315169, -3790.19457474617))
# Sanity checking the mask and our points.
plot(mask)
points(pts)
# Put the points in a SpatialPointsDataFrame with camera id in a data field.
spts.df <- SpatialPointsDataFrame(pts, data = data.frame(Camera = ordered(1:length(pts$x))))
# Give our SpatialPointsDataFrame the same spatial reference as the polygon.
proj4string(spts.df) <- proj4string(yellowstone.shp)
# Calculate distances (km) from points to edge and put in a new column.
spts.df$km_to_edge <- apply(gDistance(spts.df, difference, byid=TRUE),2,min)/1000
# Determine which records are within 5 km of an edge and note in new column.
spts.df$edge <- ifelse(spts.df$km_to_edge < 5, TRUE, FALSE)
# Results
spts.df
# coordinates Camera km_to_edge edge
# 1 (536587.3, 85158.01) 1 1.855010 TRUE
# 2 (507432, 77251.5) 2 9.762755 FALSE
# 3 (542517.2, 15976.07) 3 11.668700 FALSE
# 4 (477782.6, 40683.91) 4 4.579638 TRUE
# 5 (517315.2, -3790.195) 5 8.211961 FALSE
Here's a quick solution.
Simplify the outline of your shapefile into N points. Then calculate the minimum distance for each camera trap to every point in the outline of the national park.
library(geosphrere)
n <- 500 ##The number of points summarizing the shapefile
NPs <- ##Your shapefile goes here
NP.pts <- spsample(NPs, n = n, type = "regular")
CP.pts <- ## Coordinates for a single trap
distances<-distm(coordinates(CP.pts), coordinates(NP.pts), fun = distHaversine)/1000
##Distance in Km between the trap to each point in the perimeter of the shapefile:
distances
Use distances to find the minimum distance between the shapefile and that given trap. This approach can easily be generalizable using for loops or apply functions.
I had a problem with the points data frame and shape file being projected so instead I used the example in this link to answer my question
https://gis.stackexchange.com/questions/225102/calculate-distance-between-points-and-nearest-polygon-in-r
Basically, I used this code;
df # my data frame with points
shp # my shapefile (non-projected)
dist.mat <- geosphere::dist2Line(p = df2, line = shp)
coordinates(df2)<-~Longitude+Latitude # Longitude and Latitude are columns in my df
dmat<-data.frame(dist.mat) # turned it into a data frame
dmat$km5 <- ifelse(dmat$distance < 5000, TRUE, FALSE) # in meters (5000)
coordinates(dmat)<-~lon+lat
df2$distance <- dmat$distance # added new Distance column to my df
I want to calculate road network distances between a reference line (or a reference point if a single point facilitates the possible solution) and a dataframe of long/lat points. I have the following data frame:
Latitude Longitude
1 40.66858 22.88713
2 40.66858 22.88713
3 40.66858 22.88713
4 40.66858 22.88713
5 40.66858 22.88714
6 40.66857 22.88715
7 40.66858 22.88716
8 40.66858 22.88717
9 40.66859 22.88718
10 40.66861 22.88719
and the following reference line with start/end coordinates:
22.88600 40.66885
22.88609 40.66880
(If we want a single reference point in the middle of the line (instead of the whole line) its coordinates are: 22.88602844465866,40.66883357487465)
Here is a screenshot from google earth after plotting the points and the line:
I have tried to compute the distances of each point with the reference line with the following way:
dist2Line(points, line, distfun=distHaversine) #from geosphere package
The distance which is computed (e.g. for the first point) is the one with the yellow line in the following screenshot. The desired one is the one with the red
line (road network distance). How can I solve this? I want to compute the road network distances for all points!
Thank you in advance!
library(sp)
library(rgeos)
library(geosphere)
Let's join the midpoint of your line to the other line:
pt1 <- matrix(c(22.88600, 40.66885), ncol=2)
pt2 <- matrix(c(22.88609, 40.66880), ncol=2)
midpt <- as.data.frame(midPoint(pt1, pt2))
NOTE: The first 4 line points are the same in your supplied data
read.csv(text="lat,lon
40.66858,22.88713
40.66858,22.88713
40.66858,22.88713
40.66858,22.88713
40.66858,22.88714
40.66857,22.88715
40.66858,22.88716
40.66858,22.88717
40.66859,22.88718
40.66861,22.88719", stringsAsFactors = FALSE) -> l
l <- rbind.data.frame(midpt, l)
Using the midpoint on the line isn't perfect so you could use the spatial intersection operations as well to find the correct intersecting point.
Now, make it a spatial object and give it the boring longlat "projection".
l <- SpatialLines(list(Lines(Line(l[,2:1]), "1")), proj4string = CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"))
Convert said "projection" to something meaningful (I picked EPSG:3265, but choose whatever you want so you can get real distance):
l <- spTransform(l, CRS("+init=epsg:3265"))
Get the points from the line:
pts <- as(l, "SpatialPoints")
Follow How to calculate geographic distance between two points along a line in R? to get the distance between points which you can do the rest from there:
diff(sort(gProject(l, pts, normalized = FALSE)))
## [1] 372.553928 0.000000 0.000000 0.000000 3.360954 4.581859
## [7] 4.581860 3.360956 4.581862 7.077129
It'd be 👍🏼 if someone who knows how to do this with sf could do that as well since I couldn't find a gProject equivalent.
I have what may be a very simplistic question on the KEST function in Spatstat.KEST graph output I'm using the KEST function in Spatstat to assess spatial randomness in a dataset. I have uploaded lat and long values spread over London and converted them to a PPP object, using the ripras function to specify the spatial domain. When I run my KEST analysis on my ppp, and plot the graph, I end up with an r value on the x, but although I know this is a distance measurement, I don't know what units it's using. I get this summary output:
Planar point pattern: 113 points
Average intensity 407.9378 points per square unit
Coordinates are given to 9 decimal places
Window: polygonal boundary
single connected closed polygon with 14 vertices
enclosing rectangle: [-0.5532963, 0.3519148] x [51.2901, 51.7022] units
Window area = 0.277003 square units
with the max r on the x axis being 0.1 units, and the K(r) on the y axis being 0.04. How do I figure out what unit of distance these equate to?
Your lat,lon coordinates correspond to points on a sphere (or ellipsoid or whatever) used as a model for planet Earth. Essentially, spatstat assumes you are using coordinates projected on a flat map. This conversion could be done with e.g. the sp package (using Buckingham Palace as an example):
library(sp)
lat = c(51.501476)
lon = c(-0.140634)
xy = data.frame(lon, lat)
coordinates(xy) <- c("lon", "lat")
proj4string(xy) <- CRS("+proj=longlat +datum=WGS84")
NE <- spTransform(xy, CRS("+proj=utm +zone=30 ellps=WGS84"))
NE <- as.data.frame(NE)
The result is a data.frame with projected coordinates in Easting, Northing in metres. Then you can continue your analysis from there. To assign a unit label like "m" for prettier labels in figures use the function unitname on your ppp object (assuming the object is called X): unitname(X) <- "m"
If the function is able to accept geographic coordinates, then it is using a great circle equation to calculate distance. This normally results in units that are in Kilometers.
It is not very good practice to perform PPA on non-projected data. If possible, you should project your data into a coordinate system that is in distance units. I believe that most of the functions in spatstat use Euclidean distance, which is quite inappropriate for projection units in decimal degrees. Since there is not a latlong argument in the Kest function, I do not believe that your results are valid.
The K function itself (i.e. the theoretical K-function, not just the computer code) assumes that the space is flat rather than curved.
This would probably be a reasonable approximation in your case (points scattered over a few dozen kilometres) but not for a point pattern scattered over a continent. That is, in general the planar K-function should not be used for point patterns on a sphere.
The other posts are correct. The Kest function expects the coordinates to be given in an isometric coordinate system. You just need to express the spatial locations in a coordinate system in which the x and y coordinates are measured in the same distance units. Longitude and latitude are not measured in the same distance units because one degree (say) of longitude does not represent the same distance as one degree of latitude. Ege Rubak's example using spTransform is probably the best way to go.
I assumed (probably wrongly) that in the easiest cases the output of variog in the geoR package and variogram in the sp package would have been the same.
I have this dataset:
head(final)
lat lon elev seadist tradist samples rssi
1 60.1577 24.9111 2.392 125 15.21606 200 -58
2 60.1557 24.9214 3.195 116 15.81549 200 -55
3 60.1653 24.9221 4.604 387 15.72119 200 -70
4 60.1667 24.9165 7.355 205 15.39796 200 -62
5 60.1637 24.9166 3.648 252 15.43457 200 -73
6 60.1530 24.9258 2.733 65 16.10631 200 -57
that is made of (I guess) unprojected data, so I project them
#data projection
#convert to sp object:
coordinates(final) <- ~ lon + lat #longitude first
library(rgdal)
proj4string(final) = "+proj=longlat +datum=WGS84"
UTM <- spTransform(final, CRS=CRS("+proj=utm +zone=35V+north+ellps=WGS84+datum=WGS84"))
and produce the variogram without trend according to the gstat library
var.notrend.sp<-variogram(rssi~1, UTM)
plot(var.notrend.sp)
trying to get the same output in geoR I go with
UTM1<-as.data.frame(UTM)
UTM1<-cbind(UTM1[,6:7], UTM1[,1:5])
UTM1
coords<-UTM1[,1:2]
coords
var.notrend.geoR <- variog(coords=coords, data=rssi,estimator.type='classical')
plot(var.notrend.geoR)
A couple of points.
gstat can work with unprojected data, and will compute the great-circle distance
setting the "projection" to be "+proj=longlat +datum=WGS84" does not transform the data to a cartesian grid-based system (such as UTM)
What you are seeing in the output of variogram is the fact that is (sensibly) using great circle distances. If you look at the scale of the distance axis, you will see that the ranges are quite different, because geoR doesn't know (and can't account for) the fact you are not using a grid-based projection.
If you want to compare apples with apples use rgdal and spTransform to transform the coordinate system to an appropriate projection and then create variograms with similar specifications. (Note that gstat defines a cutoff ( the length of the diagonal of the box spanning the data is divided by three.)).
The empirical variogram is highly dependent on the definition of distance and the choice of binning. (see the brilliant model-based geostatistics by Diggle and Ribeiro, especially chapter 5 which deals with this issue in detail.
I have a set of about 5000 geographical (WGS84) coordinates. All of them are inside 40km square.
Is there any algorithm / R function to find point, inside square and not in the given set, farthest from any point from set?
I mean how to find point in the square where the distance to the nearest point from set is longest?
Now I do it by generating grid of coordinates equally spaced and finding distance from each grid point to the nearest set point. Is there any less numerical / not brute force method?
EDIT:
I made mistake in previous version of the question. Maybe this will help:
Set of points are coordinates of the 5000 shops in the city. I want to find place in the city where distance to the nearest shop is the longest.
I think that if the point you seek isn't on the edge of the box then it has to be at a vertex of the voronoi tesselation of the points. If it is on the edge of the box then it has to be on the intersection of the box and an edge of the voronoi tesselation.
So if you compute the voronoi tesselation and then use rgeos to intersect it with the box, that gives you a set of possible points. You can then use the FNN package to compute the neighbour distances from those possible points to the data points, sort, and find the possible point with the biggest nearest neighbour.
That gives you an exact point without any of this gridding business. If it wasn't so close to bedtime I'd sort out some code to do it. You probably want the deldir package or voronoi tesselations. It might even already do the box intersection...
Okay, not quite bedtime. Here's the solution:
findM <- function(pts,xmin,xmax,ymin,ymax){
require(deldir)
require(FNN)
d = deldir(pts[,1],pts[,2],rw=c(xmin,xmax,ymin,ymax))
vpts = rbind(as.matrix(d$dirsgs[,1:2]),as.matrix(d$dirsgs[,3:4]))
vpts = rbind(vpts,cbind(c(xmin,xmax,xmin,xmax),c(ymin,ymin,ymax,ymax)))
vpts = vpts[!duplicated(vpts),]
nn = get.knnx(pts,vpts,k=1)
ptmin = which(nn$nn.dist==max(nn$nn.dist))
list(point = vpts[ptmin,,drop=FALSE], dist = nn$nn.dist[ptmin])
}
Edited version now returns one point and adds the corner points as possibles.
Here's an example that uses several functions (distanceFromPoints(), maxValue(), Which(), and xyFromCell()) from the raster package to perform the key calculations:
# Load required libraries
library(sp)
library(rgdal)
library(raster)
# Create a SpatialPoints object with 10 points randomly sampled from
# the area lying between longitudes 0 and 1 and latitudes 0 and 1
bbox <- matrix(c(0,0,1,1), ncol=2, dimnames = list(NULL, c("min", "max")))
PRJ4 <- CRS("+proj=longlat +datum=WGS84 +ellps=WGS84")
S <- Spatial(bbox = bbox, proj4string = PRJ4)
SP <- spsample(S, 10, type="random")
# Create a raster object covering the same area
R <- raster(extent(bbox), nrow=100, ncol=100, crs=PRJ4)
# Find the coordinates of the cell that is farthest from all of the points
D <- distanceFromPoints(object = R, xy = SP)
IDmaxD <- Which(D == maxValue(D), cells=TRUE)
(XY <- xyFromCell(D, IDmaxD))
# x y
# [1,] 0.005 0.795
# Plot the results
plot(D, main = "Distance map, with most distant cell in red")
points(SP)
points(XY, col="red", pch=16, cex=2)