I am working on a university project which combines data science and GIS. We need to find an open-source solution capable of obtaining additional information from a massive dataset of GPS coordinates. Clearly, I cannot use any API with a daily request limit.
THE DATA
Here you can find a sample of the dataset the Professor provided us:
longitude <- c(10.86361, 10.96062, 10.93032, 10.93103, 10.93212)
latitude <- c(44.53355, 44.63234, 44.63470, 44.63634, 44.64559)
longlat <- data.frame(longitude, latitude)
ID <- seq.int(1, 5)  # one ID per point (the sample has 5 points)
FIRST TASK: Already Accomplished!
The first step was joining my SpatialPoints with a SpatialPolygonsDataFrame using over() from sp. The SpatialPolygonsDataFrame was obtained through getData('GADM', country='ITA', level=3) from raster.
For this first task, the objective was to associate each GPS coordinate with the City and Region it belongs to.
An example of the result I was able to obtain is:
require(sp)
require(raster)  # getData() comes from raster
my_spdf <- SpatialPointsDataFrame(coords = longlat, data = data.frame(ID = ID), proj4string = CRS(" +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 "))
italy_administrative_boundaries_level3 <- getData('GADM', country='ITA', level=3)
result <- over(my_spdf, italy_administrative_boundaries_level3)[, c("NAME_0", "NAME_1", "NAME_2", "NAME_3")]
result$ID <- ID
print(result)
SECOND TASK: MY QUESTION
Now things become tricky because I need to associate additional, more detailed information such as road_name and road_type.
This information is contained in the shapefiles created from OpenStreetMap and available at: download.geofabrik.de/europe/italy.html.
I loaded the shapefile in R obtaining a SpatialLinesDataFrame:
require(rgdal)
shapefile_roads <- readOGR(dsn = "./road", layer = "roads")
Then, I naively tried to apply the same technique as for joining SpatialPoints and SpatialPolygonsDataFrame:
result <- over(my_spdf, shapefile_roads)
Clearly, the result is just NA. One possible reason that came to mind is that the coordinates of my_spdf are not in the exact position of the Lines in shapefile_roads, so I may need some kind of radius parameter. However, I am not really sure.
Can you suggest the correct approach to perform this spatial join between my SpatialPoints and the attributes of the SpatialLinesDataFrame obtained from the OpenStreetMap roads shapefile?
Please if something is not very clear do not hesitate to ask.
Your example data
library(raster)
longitude <- c(10.86361, 10.96062, 10.93032, 10.93103, 10.93212)
latitude <- c(44.53355, 44.63234, 44.63470, 44.63634, 44.64559)
longlat <- data.frame(longitude, latitude)
ID <- data.frame(ID=1:5)
ita_gadm3 <- getData('GADM', country='ITA', level=3)[, c("NAME_0", "NAME_1", "NAME_2", "NAME_3")]
#use `sp::over` or `raster::extract`
result <- extract(ita_gadm3, longlat)
Some roads:
road <- spLines(cbind(longitude+.1, latitude), cbind(longitude-.1, rev(latitude)), cbind(longitude-.1, latitude+1), crs=crs(ita_gadm3))
Now find the nearest road segment. You can use geosphere::dist2Line because you are using angular (lon/lat) coordinates.
library(geosphere)
geosphere::dist2Line(longlat, road)
# distance lon lat ID
#[1,] 2498.825 10.83212 44.53355 2
#[2,] 5527.646 11.03032 44.63470 1
#[3,] 5524.227 10.86062 44.63634 2
#[4,] 5577.372 10.86062 44.63634 2
#[5,] 5756.113 10.86062 44.63634 2
Note the variable ID, which refers back to the roads. The problem is that dist2Line is currently slow and you have a large data set.
The alternative is to transform your spatial data to a planar coordinate system appropriate for Italy and use gDistance.
library(rgeos)
sp <- SpatialPoints(longlat, proj4string=crs(ita_gadm3))
spita <- spTransform(sp, "+proj=tmerc +lat_0=0 +lon_0=15 +k=0.9996 +x_0=2520000 +y_0=0 +ellps=intl +units=m")
rdita <- spTransform(road, "+proj=tmerc +lat_0=0 +lon_0=15 +k=0.9996 +x_0=2520000 +y_0=0 +ellps=intl +units=m")
gd <- gDistance(rdita, spita, byid=TRUE)
a <- apply(gd, 1, which.min)
a
#1 2 3 4 5
#2 1 2 2 2
That is, point 2 is closest to road 1. The other points are closest to road 2.
You probably need to do that in batches of points or tiles to avoid getting a distance matrix that is too large.
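A minimal base-R sketch of the batching idea, assuming points and road segments are already in a planar CRS with metre units. The helpers dist_to_segment() and nearest_segment_batched() and the toy coordinates are made up for illustration; with real data, gDistance on each batch would take the place of the hand-written distance.

```r
# Planar distance from each row of p (n x 2 matrix) to the segment a-b.
dist_to_segment <- function(p, a, b) {
  ab <- b - a
  len2 <- sum(ab^2)
  if (len2 == 0) {
    t <- rep(0, nrow(p))                 # degenerate segment: a equals b
  } else {
    t <- ((p[, 1] - a[1]) * ab[1] + (p[, 2] - a[2]) * ab[2]) / len2
    t <- pmin(1, pmax(0, t))             # clamp the projection onto the segment
  }
  sqrt((p[, 1] - (a[1] + t * ab[1]))^2 + (p[, 2] - (a[2] + t * ab[2]))^2)
}

# Index of the nearest segment for every point, computed batch by batch so
# that only a (batch_size x number of segments) matrix is held at once.
nearest_segment_batched <- function(p, seg_start, seg_end, batch_size = 1000) {
  nearest <- integer(nrow(p))
  for (s in seq(1, nrow(p), by = batch_size)) {
    idx <- s:min(s + batch_size - 1, nrow(p))
    batch <- p[idx, , drop = FALSE]
    d <- vapply(seq_len(nrow(seg_start)),
                function(j) dist_to_segment(batch, seg_start[j, ], seg_end[j, ]),
                numeric(length(idx)))
    d <- matrix(d, nrow = length(idx))   # rows = points, columns = segments
    nearest[idx] <- apply(d, 1, which.min)
  }
  nearest
}

# Hypothetical toy data: three points, two horizontal "roads"
pts <- rbind(c(0, 1), c(5, 5), c(10, -1))
seg_start <- rbind(c(0, 0), c(0, 5))
seg_end   <- rbind(c(10, 0), c(10, 5))
nearest <- nearest_segment_batched(pts, seg_start, seg_end, batch_size = 2)
nearest
```

The batch size only trades memory for the number of loop iterations; the result does not depend on it.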
The buffer solution suggested by Sébastien could work in principle, but gets really complicated as there is no good buffer size. On the one hand, points may fall outside every buffer; on the other hand, they may fall inside several overlapping buffers. If you use buffers, sp::over returns an arbitrary match if there are multiple matches, whereas raster::extract will return them all. Neither is pretty, and I would avoid this approach. Illustrated here:
b <- buffer(road, width=.15, dissolve=FALSE)
plot(b)
lines(road, col='red', lwd=2)
points(longlat, pch=20, col='blue')
extract(b, longlat)
# point.ID poly.ID
#1 1 1
#2 1 2
#3 2 2
#4 2 1
#5 3 2
#6 3 1
#7 4 2
#8 4 1
#9 5 1
#10 5 2
over(sp, b)
#1 2 3 4 5
#2 2 2 2 2
You need to join polygons with your points, not Lines. To do so, you can create a buffer area around your Lines using rgeos::gBuffer(). Be careful, because the buffer width is expressed in the coordinate system of your Lines, probably degrees (WGS84) in your case (verify it). Choose the distance (width) that suits your case.
# byid = TRUE keeps one buffer per road instead of merging them all into one geometry
LinesBuffer <- rgeos::gBuffer(shapefile_roads, width = 0.01, byid = TRUE)
Then you will be able to join points with "LinesBuffer" using over (if they are in the same coordinates system).
result <- over(my_spdf, LinesBuffer)
Related
I have data points of a species observed using camera traps and would like to measure the distance of each camera trap site (CameraStation) to the edge of a national park using R. I have a shapefile of the park (shp) and want to apply a criterion to CameraStation(s) which are <5km from the edge. My data frame (df) consists of multiple events/observations (EventID) per CameraStation. The aim is to analyse when events near the park edge are most frequent given other environmental factors such as Season, Moon Phase and DayNight (also columns in DF).
I found a package called distance in R but this is for distance sampling and not what I want to do. Which package is relevant in this situation?
I expect the following outcome:
EventID CameraStation Distance(km) Within 5km
0001 Station 1 4.3 Yes
0002 Station 1 4.3 Yes
0003 Station 2 16.2 No
0004 Station 3 0.5 Yes
...
Here's a general solution, adapted from Spacedman's answer to this question at gis.stackexchange. Note: This solution requires working in a projected coordinate system. You can transform to a projected CRS if needed using spTransform.
The gDistance function of the rgeos package calculates the distance between geometries, but for the case of points inside a polygon the distance is zero. The trick is to create a new "mask" polygon where the original polygon is a hole cut out from the mask. Then we can measure the distance between points in the hole and the mask, which is the distance to the edge of the original polygon that we really care about.
We'll use the shape file of the Yellowstone National Park Boundary found on this page.
library(sp) # for SpatialPoints and proj4string
library(rgdal) # to read shapefile with readOGR
library(rgeos) # for gDistance, gDifference, and gBuffer
# ab67 was the name of the shape file I downloaded.
yellowstone.shp <- readOGR("ab67")
# gBuffer enlarges the boundary of the polygon by the amount specified by `width`.
# The units of `width` (meters in this case) can be found in the proj4string
# for the polygon.
yellowstone_buffer <- gBuffer(yellowstone.shp, width = 5000)
# gDifference calculates the difference between the polygons, i.e. what's
# in one and not in the other. That's our mask.
mask <- gDifference(yellowstone_buffer, yellowstone.shp)
# Some points inside the park
pts <- list(x = c(536587.281264245, 507432.037861251, 542517.161278414,
477782.637790409, 517315.171218198),
y = c(85158.0056377799, 77251.498952222, 15976.0721391485,
40683.9055315169, -3790.19457474617))
# Sanity checking the mask and our points.
plot(mask)
points(pts)
# Put the points in a SpatialPointsDataFrame with camera id in a data field.
spts.df <- SpatialPointsDataFrame(pts, data = data.frame(Camera = ordered(1:length(pts$x))))
# Give our SpatialPointsDataFrame the same spatial reference as the polygon.
proj4string(spts.df) <- proj4string(yellowstone.shp)
# Calculate distances (km) from points to edge and put in a new column.
spts.df$km_to_edge <- apply(gDistance(spts.df, mask, byid=TRUE),2,min)/1000
# Determine which records are within 5 km of an edge and note in new column.
spts.df$edge <- spts.df$km_to_edge < 5
# Results
spts.df
# coordinates Camera km_to_edge edge
# 1 (536587.3, 85158.01) 1 1.855010 TRUE
# 2 (507432, 77251.5) 2 9.762755 FALSE
# 3 (542517.2, 15976.07) 3 11.668700 FALSE
# 4 (477782.6, 40683.91) 4 4.579638 TRUE
# 5 (517315.2, -3790.195) 5 8.211961 FALSE
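To get from per-station distances to the per-event table the question asks for, one option is a plain merge(). This is only a sketch: the tables events and stations and the columns km_to_edge and within_5km are made-up names, and the distances are copied from the question's expected output rather than computed.

```r
# Hypothetical event table: several events per camera station
events <- data.frame(
  EventID = c("0001", "0002", "0003", "0004"),
  CameraStation = c("Station 1", "Station 1", "Station 2", "Station 3"),
  stringsAsFactors = FALSE
)

# Hypothetical per-station distances (in practice, the km_to_edge column above)
stations <- data.frame(
  CameraStation = c("Station 1", "Station 2", "Station 3"),
  km_to_edge = c(4.3, 16.2, 0.5),
  stringsAsFactors = FALSE
)

# Attach the station distance to every event, then flag events within 5 km
out <- merge(events, stations, by = "CameraStation", all.x = TRUE)
out$within_5km <- out$km_to_edge < 5
out <- out[order(out$EventID),
           c("EventID", "CameraStation", "km_to_edge", "within_5km")]
out
```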
Here's a quick solution.
Simplify the outline of your shapefile into N points. Then calculate the minimum distance for each camera trap to every point in the outline of the national park.
library(sp)        # for spsample() and coordinates()
library(geosphere) # for distm() and distHaversine()
n <- 500 ##The number of points summarizing the shapefile
NPs <- ##Your shapefile goes here
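## Coerce the polygon to lines first so the sample follows the outline, not the interior
NP.pts <- spsample(as(NPs, "SpatialLines"), n = n, type = "regular")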
CP.pts <- ## Coordinates for a single trap
distances<-distm(coordinates(CP.pts), coordinates(NP.pts), fun = distHaversine)/1000
##Distance in Km between the trap to each point in the perimeter of the shapefile:
distances
Use distances to find the minimum distance between the shapefile and that given trap. This approach can easily be generalized using for loops or apply functions.
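A rough sketch of that generalization to several traps, in base R only. The function haversine_m() is a hand-written stand-in for geosphere::distHaversine (same default radius of 6378137 m), and the outline and trap coordinates are invented for illustration.

```r
# Great-circle distance in metres on a sphere (haversine formula)
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Hypothetical outline points (lon/lat) and camera traps
outline <- data.frame(lon = c(0, 1, 1, 0), lat = c(0, 0, 1, 1))
traps <- data.frame(id = c("T1", "T2"), lon = c(0.5, 2), lat = c(0.5, 0))

# For each trap: distance to every outline point, keep the minimum (km)
traps$min_km <- vapply(seq_len(nrow(traps)), function(i) {
  min(haversine_m(traps$lon[i], traps$lat[i], outline$lon, outline$lat)) / 1000
}, numeric(1))
traps$within_5km <- traps$min_km < 5
traps
```

The accuracy of the minimum depends on how densely the outline is sampled, so n should be chosen with the 5 km threshold in mind.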
I had a problem with the points data frame and shape file being projected so instead I used the example in this link to answer my question
https://gis.stackexchange.com/questions/225102/calculate-distance-between-points-and-nearest-polygon-in-r
Basically, I used this code:
df  # my data frame with points
shp # my shapefile (non-projected)
df2 <- df # working copy of df (assumed; the copy step was not shown)
coordinates(df2) <- ~Longitude+Latitude # Longitude and Latitude are columns in my df
dist.mat <- geosphere::dist2Line(p = df2, line = shp)
dmat <- data.frame(dist.mat) # turned it into a data frame
dmat$km5 <- dmat$distance < 5000 # distance is in meters (5000 m = 5 km)
coordinates(dmat) <- ~lon+lat
df2$distance <- dmat$distance # added new Distance column to my df
I want to calculate road network distances between a reference line (or a reference point if a single point facilitates the possible solution) and a dataframe of long/lat points. I have the following data frame:
Latitude Longitude
1 40.66858 22.88713
2 40.66858 22.88713
3 40.66858 22.88713
4 40.66858 22.88713
5 40.66858 22.88714
6 40.66857 22.88715
7 40.66858 22.88716
8 40.66858 22.88717
9 40.66859 22.88718
10 40.66861 22.88719
and the following reference line with start/end coordinates:
22.88600 40.66885
22.88609 40.66880
(If we want a single reference point in the middle of the line (instead of the whole line) its coordinates are: 22.88602844465866,40.66883357487465)
Here is a screenshot from google earth after plotting the points and the line:
I have tried to compute the distances of each point with the reference line with the following way:
dist2Line(points, line, distfun=distHaversine) #from geosphere package
The distance which is computed (e.g. for the first point) is the one shown by the yellow line in the following screenshot. The desired one is the one shown by the red line (road network distance). How can I solve this? I want to compute the road network distances for all points!
Thank you in advance!
library(sp)
library(rgeos)
library(geosphere)
Let's join the midpoint of your line to the other line:
pt1 <- matrix(c(22.88600, 40.66885), ncol=2)
pt2 <- matrix(c(22.88609, 40.66880), ncol=2)
midpt <- as.data.frame(midPoint(pt1, pt2))
NOTE: The first 4 line points are the same in your supplied data
read.csv(text="lat,lon
40.66858,22.88713
40.66858,22.88713
40.66858,22.88713
40.66858,22.88713
40.66858,22.88714
40.66857,22.88715
40.66858,22.88716
40.66858,22.88717
40.66859,22.88718
40.66861,22.88719", stringsAsFactors = FALSE) -> l
l <- rbind.data.frame(midpt, l)
Using the midpoint on the line isn't perfect so you could use the spatial intersection operations as well to find the correct intersecting point.
Now, make it a spatial object and give it the boring longlat "projection".
l <- SpatialLines(list(Lines(Line(l[,2:1]), "1")), proj4string = CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"))
Convert said "projection" to something meaningful (I picked EPSG:3265, but choose whatever you want so you can get real distance):
l <- spTransform(l, CRS("+init=epsg:3265"))
Get the points from the line:
pts <- as(l, "SpatialPoints")
Follow How to calculate geographic distance between two points along a line in R? to get the distance between points which you can do the rest from there:
diff(sort(gProject(l, pts, normalized = FALSE)))
## [1] 372.553928 0.000000 0.000000 0.000000 3.360954 4.581859
## [7] 4.581860 3.360956 4.581862 7.077129
It'd be 👍🏼 if someone who knows how to do this with sf could do that as well since I couldn't find a gProject equivalent.
I have a data frame of lat/long coordinates and a polygon, which represents a coastline. I am trying to find the distance between each point and the nearest coastline feature. I would like to end up with an output data frame that includes columns for my original lat/long values and a new distance column.
I have attempted to use the gDistance function after reading answers to similar questions online, but I think that I am missing a few steps and am having trouble figuring it out. At present, I only end up with a single distance value. I am quite new to R and would really appreciate any help that anyone might be able to give.
Thanks!
#Load data
Locs = structure(list(id = 1:5, Lat = c(29.59679167, 29.43586667, 29.37642222,29.52786111, 30.10603611), Long = c(-81.02547778, -80.92573889,
-80.97714167, -81.08721667, -80.94368611)), .Names = c("id","Lat", "Long"), class = "data.frame", row.names = c(NA, -5L))
#Extract lat/long coordinates
xy = Locs[,c("Lat","Long")]
#Create SpatialPointsDataFrame from xy data and change projection to metres
spdf = SpatialPointsDataFrame(coords=xy, data=xy, proj4string = CRS("+proj=aea +zone=17 ellps=WGS84"))
#Read in shapefile as a spatialdataframe object
coast = readOGR(dsn="land data", layer="coast")
#Transform to AEA (m) projection to match projection of points
land_poly = spTransform(coast, CRS("+proj=aea +zone=17 ellps=WGS84"))
#OR load map from map package (but unfortunately map objects do not work in gDistance)
library(maps)
library(mapdata)
coast2 = map('usa', col = "grey90", fill=TRUE)
#Calculate distance between each point and the nearest land feature
for(i in 1:dim(spdf)[1]){
g = gDistance(spdf[i,],land_poly)
}
EDIT: Using AEF's code alterations below (for the for loop step), I am able to get gDistance values for each row, however the output distances are not correct (see below). According to arcGIS they should be between 4-37km, not >500km. Any thoughts on what I am doing wrong here? My land polygon and points are both in the same projection.
gDistance output
id Lat Long dist_gDist
1 1 29.59679 -81.02548 516299.0
2 2 29.43587 -80.92574 516298.8
3 3 29.37642 -80.97714 516298.9
4 4 29.52786 -81.08722 516299.0
5 5 30.10604 -80.94369 516299.0
The correct distances (calculated in GIS)
id Lat Long dist_arc
1 1 29.59679 -81.02548 13.630
2 2 29.43587 -80.92574 15.039
3 3 29.37642 -80.97714 8.111
4 4 29.52786 -81.08722 4.784
5 5 30.10604 -80.94369 36.855
I think you get only one distance value because you overwrite g in every iteration of your for-loop. However, I do not know if this is the only problem because I cannot reproduce your issue without suitable data.
Try changing the last loop to this:
g = rep(NA, dim(spdf)[1])
for(i in 1:dim(spdf)[1]){
g[i] = gDistance(spdf[i,],land_poly)
}
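The same pattern in isolation, as a small sketch: toy_distance() is a hypothetical stand-in for gDistance(spdf[i,], land_poly), used only to show the difference between overwriting and collecting.

```r
# Hypothetical stand-in for a per-point distance calculation
toy_distance <- function(i) i * 10

n <- 5

# Overwriting: only the value from the last iteration survives
for (i in 1:n) {
  g <- toy_distance(i)
}
length(g)

# Collecting: vapply returns one value per point
g_all <- vapply(seq_len(n), toy_distance, numeric(1))
length(g_all)
```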
I want to do some spatial statistical analysis with the county-level crop yield data in Nebraska for a STAT class. For that I need the longitude and latitude of the geographic centroid of each county. Does anybody know how to do it in R? I know it can be done in ArcGIS but I have no access to it now.
You didn't give any details where you got your shapefile from, but I got one from here and you can use gCentroid from rgeos thusly:
library(rgdal)
library(sp)
library(rgeos)
nebraska <- readOGR("CountyBoundsUTM/", "CountyUTM")
gCentroid(nebraska, byid=TRUE)
## SpatialPoints:
## x y
## 0 721768.5 4636738
## 1 430938.8 4524651
## 2 698036.4 4566570
## 3 370970.6 4641340
## ...
## 89 623301.6 4603228
## 90 618883.0 4486931
## 91 439295.3 4582756
## 92 493680.8 4522680
## Coordinate Reference System (CRS) arguments: +proj=utm +zone=14 +datum=NAD83
## +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0
You can also extract centroids of SpatialPolygons* objects with coordinates, though the centroids won't be returned as SpatialPoints as with rgeos::gCentroid.
For example:
library(rgdal)
download.file('http://dds.cr.usgs.gov/pub/data/nationalatlas/countyp020_nt00009.tar.gz',
f <- tempfile()) # ~ 4.5 Mb
untar(f, exdir=tempdir())
counties <- readOGR(tempdir(), 'countyp020')
xy <- coordinates(counties)
head(xy)
# [,1] [,2]
# 0 -153.3905 69.30193
# 1 -156.0582 71.33094
# 2 -155.6695 71.24763
# 3 -155.5164 71.23148
# 4 -155.1846 71.18189
# 5 -155.6126 71.00725
Note that, as pointed out by @Spacedman in the comments, the polygons should be projected to a planar coordinate system first.
Recent versions of the sf package (I think since version 1) use the S2 library from Google for spherical geometry calculations. The advantage is that centroid calculations are not simply planar. The relevant method is st_centroid(). An example for a region with a significant spatial extent:
library(rnaturalearth)
# ne_countries() returns 'sp'-type data by default
nc <- ne_countries(continent = "Asia", returnclass = "sf")
library(sf)
# long-lat data in WGS84
st_crs(nc)
# use st_geometry() to plot only the polygons and not the associated data
plot(st_geometry(nc), axes = T)
plot(st_centroid(st_geometry(nc)), pch = "+", col = "red", add = T)
# 'sf' integrates nicely with 'ggplot2':
library(ggplot2)
ggplot(nc) + geom_sf() +
geom_sf(aes(geometry = st_centroid(st_geometry(nc))), colour = "red")
You can use the map_data() function from the ggplot2 package to extract the US county map data from the maps package into a data frame. Then you can calculate the midpoints of the ranges of the lat/long columns by county (or use whatever method you prefer to define the geographic center).
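A small sketch of the "midpoint of the range" calculation in base R. The data frame verts is invented for illustration: long and lat follow the column names used in ggplot2 map data, while county is a made-up grouping column.

```r
# Hypothetical boundary vertices for two counties
verts <- data.frame(
  county = c("A", "A", "A", "A", "B", "B", "B"),
  long = c(0, 2, 2, 0, 3, 5, 4),
  lat = c(0, 0, 1, 1, 0, 0, 2)
)

# Midpoint of the range (centre of the bounding box along one axis)
mid_range <- function(x) mean(range(x))

# One row per county with the midpoint of its long and lat ranges
centers <- aggregate(cbind(long, lat) ~ county, data = verts, FUN = mid_range)
centers
```

This gives the centre of each county's bounding box, which can differ noticeably from the area-weighted centroid for irregularly shaped counties.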
I'm trying to buffer the points in my dataset with a radius of 100km. I'm using the function gBuffer from the package rgeos. Here's what I have so far:
head( sampledf )
# postalcode lat lon city province
#1 A0A0A0 47.05564 -53.20198 Gander NL
#4 A0A1C0 47.31741 -52.81218 St. John's NL
coordinates( sampledf ) <- c( "lon", "lat" )
proj4string( sampledf ) <- CRS( "+proj=longlat +datum=WGS84" )
distInMeters <- 1000
pc100km <- gBuffer( sampledf, width=100*distInMeters, byid=TRUE )
I get the following warning:
In gBuffer(sampledf, width = 100 * distInMeters, byid = TRUE) :
Spatial object is not projected; GEOS expects planar coordinates
From what I understand/read, I need to change the Coordinate Reference System (CRS), in particular the projection, of the dataset from 'geographic' to 'projected'. I'm not sure how to change this. These are all Canadian addresses, I might add. So NAD83 seems to me a natural projection to choose but I may be wrong.
Any/all help would be greatly appreciated.
With a little bit more digging, it turns out that using a 'projected' coordinates reference system is as simple as
# To get Statscan CRS, see here:
# http://spatialreference.org/ref/epsg/3347/
pc <- spTransform( sampledf, CRS( "+init=epsg:3347" ) )
EPSG:3347, used by STATSCAN (adequate for Canadian addresses), uses a Lambert conformal conic projection. Note that NAD83 is inappropriate: it is a 'geographic', rather than a 'projected' CRS. To buffer the points:
pc100km <- gBuffer( pc, width=100*distInMeters, byid=TRUE )
# Add data, and write to shapefile
pc100km <- SpatialPolygonsDataFrame( pc100km, data=pc100km@data )
writeOGR( pc100km, "pc100km", "pc100km", driver="ESRI Shapefile" )
As @MichaelChirico pointed out, projecting your data in order to use rgeos::gBuffer() should be done with care. I am not an expert in geodesy, but as far as I understood from this ESRI article (Understanding Geodesic Buffering), projecting and then applying gBuffer actually produces Euclidean buffers as opposed to geodesic ones. Euclidean buffers are affected by the distortions introduced by projected coordinate systems. These distortions might be something to worry about if your analysis involves wide buffers, especially over a wide range of latitudes across big areas (I presume Canada is a good candidate).
I came across the same issue some time ago and I targeted my question towards gis.stackexchange - Euclidean and Geodesic Buffering in R. I think the R code that I proposed then and also the given answer are relevant to this question here as well.
The main idea is to make use of geosphere::destPoint(). For more details and a faster alternative, see the mentioned gis.stackexchange link above. Here is my older attempt applied on your two points:
library(geosphere)
library(sp)
pts <- data.frame(lon = c(-53.20198, -52.81218),
lat = c(47.05564, 47.31741))
pts
#> lon lat
#> 1 -53.20198 47.05564
#> 2 -52.81218 47.31741
make_GeodesicBuffer <- function(pts, width) {
# A) Construct buffers as points at given distance and bearing ---------------
dg <- seq(from = 0, to = 360, by = 5)
# Construct equidistant points defining circle shapes (the "buffer points")
buff.XY <- geosphere::destPoint(p = pts,
b = rep(dg, each = nrow(pts)),  # one bearing per point (pts is a 2-column matrix)
d = width)
# B) Make SpatialPolygons -------------------------------------------------
# Group (split) "buffer points" by id
buff.XY <- as.data.frame(buff.XY)
id <- rep(1:dim(pts)[1], times = length(dg))
lst <- split(buff.XY, id)
# Make SpatialPolygons out of the list of coordinates
poly <- lapply(lst, sp::Polygon, hole = FALSE)
polys <- lapply(list(poly), sp::Polygons, ID = NA)
spolys <- sp::SpatialPolygons(Srl = polys,
proj4string = CRS("+proj=longlat +ellps=WGS84 +datum=WGS84"))
# Disaggregate (split in unique polygons)
spolys <- sp::disaggregate(spolys)
return(spolys)
}
pts_buf_100km <- make_GeodesicBuffer(as.matrix(pts), width = 100*10^3)
# Make a kml file and check the results on Google Earth
library(plotKML)
#> plotKML version 0.5-9 (2019-01-04)
#> URL: http://plotkml.r-forge.r-project.org/
kml(pts_buf_100km, file.name = "pts_buf_100km.kml")
#> KML file opened for writing...
#> Writing to KML...
#> Closing pts_buf_100km.kml
Created on 2019-02-11 by the reprex package (v0.2.1)
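To make the idea behind the function more concrete, here is a base-R sketch of the "destination point" step for a single buffer. Note the assumptions: dest_point() is a hand-written spherical approximation (radius 6378137 m), not geosphere::destPoint() itself, so results may differ slightly from an ellipsoidal calculation.

```r
# Destination point on a sphere given start lon/lat (degrees),
# bearing (degrees clockwise from north) and distance (metres).
dest_point <- function(lon, lat, bearing_deg, dist_m, r = 6378137) {
  to_rad <- pi / 180
  phi1 <- lat * to_rad
  lam1 <- lon * to_rad
  theta <- bearing_deg * to_rad
  delta <- dist_m / r                      # angular distance
  phi2 <- asin(sin(phi1) * cos(delta) + cos(phi1) * sin(delta) * cos(theta))
  lam2 <- lam1 + atan2(sin(theta) * sin(delta) * cos(phi1),
                       cos(delta) - sin(phi1) * sin(phi2))
  lon2 <- ((lam2 / to_rad + 540) %% 360) - 180   # wrap to [-180, 180)
  cbind(lon = lon2, lat = phi2 / to_rad)
}

# Ring of points 100 km around the first postal code, one every 5 degrees
bearings <- seq(0, 355, by = 5)
ring <- dest_point(-53.20198, 47.05564, bearings, 100e3)
ring <- rbind(ring, ring[1, ])             # repeat the first vertex to close the ring
head(ring)
```

The closed ring can then be passed to sp::Polygon() in the same way as the list elements in the function above.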
And to toy around, I wrapped the function in a package - geobuffer
Here is an example:
# install.packages("devtools") # if you do not have devtools, then install it
devtools::install_github("valentinitnelav/geobuffer")
library(geobuffer)
pts <- data.frame(lon = c(-53.20198, -52.81218),
lat = c(47.05564, 47.31741))
pts_buf_100km <- geobuffer_pts(xy = pts, dist_m = 100*10^3)
Others might come up with better solutions, but for now, this worked well for my problems and hopefully can solve others' problems as well.