I'm trying to better understand how various R packages calculate areas when in lat-long geographic coordinate reference systems, and which functions provide the most accurate estimates.
The estimate of area with the s2 geometry library turned on in {sf} provides a different estimate than when s2 is turned off and GEOS is used. The {sf} GEOS area calculation is the the same as {terra}'s estimate.
Which is the most accurate way to calculate area?
library(terra)
library(sf)
# create a polygon using terra
p <- vect('POLYGON ((2 2, 7 6, 4 9, 2 2))', crs = 'EPSG:4326')
# calculate area with terra
expanse(p, unit = 'km')
# 165515 km2
# calculate area with sf
st_area(st_as_sf(p))/1e6
# 166235 km2
# turn off s2, then terra and sf area estimates are the same
sf_use_s2(FALSE)
round((expanse(p, unit = 'km')) - (as.numeric(st_area(st_as_sf(p))/1e6)))
It appears that the S2 library represents the earth as a sphere. The webpage states that
the S2 library represents all data on a three-dimensional sphere
The high precision GeographicLib uses a spheroid to represent the earth's shape. GeographicLib was first used by "geosphere" and it is now also used by "terra", and by "sf" when S2 is turned off.
The earth bulges at the equator and that is why you can better approximate the earth's shape with a spheroid than with a sphere. This suggests that the computation is less precise when S2 is used.
GEOS is not used in this context, as it can only be used to compute areas for planar polygons, not for lon/lat polygons.
Related
I'm new to using spatial data so this probably seems like a very simple question, but something I'm struggling to get my head around.
I have a global dataset of sample sites and corresponding coordinates. I am using st_buffer in the sf package to create buffers of different sizes around these points. However, I need these buffers to be in km, for example 2,10,50 km radius, rather than in the units of the CRS projection (currently in long/lat, WGS84). As I understand it, UTM is the only planar projection, but does this mean I have to split my global dataset into each of the UTM zones before converting to UTM, and then create the buffers for each of these separately?
Is it possible to then convert the buffers back to my previous CRS projection?
Thank you!
It is not necessary to step out of the comfort of WGS84 to do a metric buffer; most of the tools are ready to combine longlat CRS with a metric definition of distance (it is a very common use case).
When buffering in WGS84 I kind of prefer terra::buffer() to sf::st_buffer() as it is likely to produce a smoother shape - the S2 functions that work under the hood of unprojected {sf} do not work smooth enough for me and the outcome is somewhat grainy. But I digress...
Consider this piece of code, what it does is:
looks up the coordinates of a semi random landmark (Greenwich Observatory in London)
buffers it by 25 kilometers
displays the result
Note how the terra documentation states that the buffer is in meters for unprojected coordinates.
library(nominatimlite)
library(sf)
library(terra)
a_point <- geo_lite_sf("Royal Observatory, Greenwich")
st_crs(a_point) # WGS84
terra_buffer <- a_point |>
vect() |>
buffer(width = 25000) %>% # 25 kilometers
st_as_sf()
mapview::mapview(terra_buffer)
lat long
7.16 124.21
8.6 123.35
8.43 124.28
8.15 125.08
Consider these coordinates, these coordinates correspond to weather stations that measure rainfall data.
The intro to the gstat package in R uses the meuse dataset. At some point in this tutorial: https://rpubs.com/nabilabd/118172, the guys makes use of a "meuse.grid" in this line of code:
data("meuse.grid")
I do not have such a file and I do not know how to create it, can I create one using these coordinates? Or at least point me to material that discusses how to create a custom grid for a custom area (i.e not using administrative boundaries from GADM).
Probably wording this wrong, don't even know if this question makes sense to R savvy people. Still, would love to hear some direction, or at least tips. Thanks a lot!
Total noob at R and statistics.
EDIT: See the sample grid that the tutorial I posted looks like, that's the thing I want to make.
EDIT 2: Would this method be viable? https://rstudio-pubs-static.s3.amazonaws.com/46259_d328295794034414944deea60552a942.html
I am going to share my approach to create a grid for kriging. There are probably more efficient or elegant ways to achieve the same task, but I hope this will be a start to facilitate some discussions.
The original poster was thinking about 1 km for every 10 pixels, but that is probably too much. I am going to create a grid with cell size equals to 1 km * 1 km. In addition, the original poster did not specify an origin of the grid, so I will spend some time determining a good starting point. I also assume that the Spherical Mercator projection coordinate system is the appropriate choice for the projection. This is a common projection for Google Map or Open Street Maps.
1. Load Packages
I am going to use the following packages. sp, rgdal, and raster are packages provide many useful functions for spatial analysis. leaflet and mapview are packages for quick exploratory visualization of spatial data.
# Load packages
library(sp)
library(rgdal)
library(raster)
library(leaflet)
library(mapview)
2. Exploratory Visualization of the station locations
I created an interactive map to inspect the location of the four stations. Because the original poster provided the latitude and longitude of these four stations, I can create a SpatialPointsDataFrame with Latitude/Longitude projection. Notice the EPSG code for Latitude/Longitude projection is 4326. To learn more about EPSG code, please see this tutorial (https://www.nceas.ucsb.edu/~frazier/RSpatialGuides/OverviewCoordinateReferenceSystems.pdf).
# Create a data frame showing the **Latitude/Longitude**
station <- data.frame(lat = c(7.16, 8.6, 8.43, 8.15),
long = c(124.21, 123.35, 124.28, 125.08),
station = 1:4)
# Convert to SpatialPointsDataFrame
coordinates(station) <- ~long + lat
# Set the projection. They were latitude and longitude, so use WGS84 long-lat projection
proj4string(station) <- CRS("+init=epsg:4326")
# View the station location using the mapview function
mapview(station)
The mapview function will create an interactive map. We can use this map to determine what could be a suitable for the origin of the grid.
3. Determine the origin
After inspecting the map, I decided that the origin could be around longitude 123 and latitude 7. This origin will be on the lower left of the grid. Now I need to find the coordinate representing the same point under Spherical Mercator projection.
# Set the origin
ori <- SpatialPoints(cbind(123, 7), proj4string = CRS("+init=epsg:4326"))
# Convert the projection of ori
# Use EPSG: 3857 (Spherical Mercator)
ori_t <- spTransform(ori, CRSobj = CRS("+init=epsg:3857"))
I first created a SpatialPoints object based on the latitude and longitude of the origin. After that I used the spTransform to perform project transformation. The object ori_t now is the origin with Spherical Mercator projection. Notice that the EPSG code for Spherical Mercator is 3857.
To see the value of coordinates, we can use the coordinates function as follows.
coordinates(ori_t)
coords.x1 coords.x2
[1,] 13692297 781182.2
4. Determine the extent of the grid
Now I need to decide the extent of the grid that can cover all the four points and the desired area for kriging, which depends on the cell size and the number of cells. The following code sets up the extent based on the information. I have decided that the cell size is 1 km * 1 km, but I need to experiment on what would be a good cell number for both x- and y-direction.
# The origin has been rounded to the nearest 100
x_ori <- round(coordinates(ori_t)[1, 1]/100) * 100
y_ori <- round(coordinates(ori_t)[1, 2]/100) * 100
# Define how many cells for x and y axis
x_cell <- 250
y_cell <- 200
# Define the resolution to be 1000 meters
cell_size <- 1000
# Create the extent
ext <- extent(x_ori, x_ori + (x_cell * cell_size), y_ori, y_ori + (y_cell * cell_size))
Based on the extent I created, I can create a raster layer with number all equal to 0. Then I can use the mapview function again to see if the raster and the four stations matches well.
# Initialize a raster layer
ras <- raster(ext)
# Set the resolution to be
res(ras) <- c(cell_size, cell_size)
ras[] <- 0
# Project the raster
projection(ras) <- CRS("+init=epsg:3857")
# Create interactive map
mapview(station) + mapview(ras)
I repeated this process several times. Finally I decided that the number of cells is 250 and 200 for x- and y-direction, respectively.
5. Create spatial grid
Now I have created a raster layer with proper extent. I can first save this raster as a GeoTiff for future use.
# Save the raster layer
writeRaster(ras, filename = "ras.tif", format="GTiff")
Finally, to use the kriging functions from the package gstat, I need to convert the raster to SpatialPixels.
# Convert to spatial pixel
st_grid <- rasterToPoints(ras, spatial = TRUE)
gridded(st_grid) <- TRUE
st_grid <- as(st_grid, "SpatialPixels")
The st_grid is a SpatialPixels that can be used in kriging.
This is an iterative process to determine a suitable grid. Throughout the process, users can change the projection, origin, cell size, or cell number depends on the needs of their analysis.
#yzw and #Edzer bring up good points for creating a regular rectangular grid, but sometimes, there is the need to create an irregular grid over a defined polygon, usually for kriging.
This is a sparsely documented topic. One good answer can be found here. I expand on it with code below:
Consider the the built in meuse dataset. meuse.grid is an irregularly shaped grid. How do we make an grid like meuse.grid for our unique study area?
library(sp)
data(meuse.grid)
ggplot(data = meuse.grid) + geom_point(aes(x, y))
Imagine an irregularly shaped SpatialPolygon or SpatialPolygonsDataFrame, called spdf. You first build a regular rectangular grid over it, then subset the points in that regular grid by the irregularly-shaped polygon.
# First, make a rectangular grid over your `SpatialPolygonsDataFrame`
grd <- makegrid(spdf, n = 100)
colnames(grd) <- c("x", "y")
# Next, convert the grid to `SpatialPoints` and subset these points by the polygon.
grd_pts <- SpatialPoints(
coords = grd,
proj4string = CRS(proj4string(spdf))
)
# subset all points in `grd_pts` that fall within `spdf`
grd_pts_in <- grd_pts[spdf, ]
# Then, visualize your clipped grid which can be used for kriging
ggplot(as.data.frame(coordinates(grd_pts_in))) +
geom_point(aes(x, y))
If you have your study area as a polygon, imported as a SpatialPolygons, you could either use package raster to rasterize it, or use sp::spsample to sample it using sampling type regular.
If you don't have such a polygon, you can create points regularly spread over a rectangular long/lat area using expand.grid, using seq to generate a sequence of long and lat values.
I have what may be a very simplistic question on the KEST function in Spatstat.KEST graph output I'm using the KEST function in Spatstat to assess spatial randomness in a dataset. I have uploaded lat and long values spread over London and converted them to a PPP object, using the ripras function to specify the spatial domain. When I run my KEST analysis on my ppp, and plot the graph, I end up with an r value on the x, but although I know this is a distance measurement, I don't know what units it's using. I get this summary output:
Planar point pattern: 113 points
Average intensity 407.9378 points per square unit
Coordinates are given to 9 decimal places
Window: polygonal boundary
single connected closed polygon with 14 vertices
enclosing rectangle: [-0.5532963, 0.3519148] x [51.2901, 51.7022] units
Window area = 0.277003 square units
with the max r on the x axis being 0.1 units, and the K(r) on the y axis being 0.04. How do I figure out what unit of distance these equate to?
Your lat,lon coordinates correspond to points on a sphere (or ellipsoid or whatever) used as a model for planet Earth. Essentially, spatstat assumes you are using coordinates projected on a flat map. This conversion could be done with e.g. the sp package (using Buckingham Palace as an example):
library(sp)
lat = c(51.501476)
lon = c(-0.140634)
xy = data.frame(lon, lat)
coordinates(xy) <- c("lon", "lat")
proj4string(xy) <- CRS("+proj=longlat +datum=WGS84")
NE <- spTransform(xy, CRS("+proj=utm +zone=30 ellps=WGS84"))
NE <- as.data.frame(NE)
The result is a data.frame with projected coordinates in Easting, Northing in metres. Then you can continue your analysis from there. To assign a unit label like "m" for prettier labels in figures use the function unitname on your ppp object (assuming the object is called X): unitname(X) <- "m"
If the function is able to accept geographic coordinates, then it is using a great circle equation to calculate distance. This normally results in units that are in Kilometers.
It is not very good practice to perform PPA on non-projected data. If possible, you should project your data into a coordinate system that is in distance units. I believe that most of the functions in spatstat use Euclidean distance, which is quite inappropriate for projection units in decimal degrees. Since there is not a latlong argument in the Kest function, I do not believe that your results are valid.
The K function itself (i.e. the theoretical K-function, not just the computer code) assumes that the space is flat rather than curved.
This would probably be a reasonable approximation in your case (points scattered over a few dozen kilometres) but not for a point pattern scattered over a continent. That is, in general the planar K-function should not be used for point patterns on a sphere.
The other posts are correct. The Kest function expects the coordinates to be given in an isometric coordinate system. You just need to express the spatial locations in a coordinate system in which the x and y coordinates are measured in the same distance units. Longitude and latitude are not measured in the same distance units because one degree (say) of longitude does not represent the same distance as one degree of latitude. Ege Rubak's example using spTransform is probably the best way to go.
I need to calculate the shortest distance between two point matrices. I am new to R and have no clue how to do this. This is the code that I used to call in the data and convert them to points
library(dismo)
laurus <- gbif("Laurus", "nobilis")
locs <- subset(laurus, select = c("country", "lat", "lon"))
#uk observations
locs.uk <-subset(locs, locs$country=="United Kingdom")
#ireland observations
locs.ire <- subset(locs, locs$country=="Ireland")
uk_coord <-SpatialPoints(locs.uk[,c("lon","lat")])
ire_coord <-SpatialPoints(locs.ire[,c("lon","lat")])
crs.geo<-CRS("+proj=longlat +ellps=WGS84 +datum=WGS84") # geographical, datum WGS84
proj4string(uk_coord) <-crs.geo #define projection
proj4string(ire_coord) <-crs.geo #define projection
I need to calculate the shortest distance (Euclidean) from points in Ireland to points in UK. In other words I need to calculate the distance from each point in Ireland to its closet point in the UK points layer.
Can some one tell me what function or package I need to use in order to do this. I looked at gdistance and could not find a function that calculate the shortest distance.
You can use the FNN package which uses spatial trees to make the search efficient. It works with euclidean geometry, so you should transform your points to a planar coordinate system. I'll use rgdal package to convert to UK grid reference (stretching it a bit to use it over ireland here, but your original data was New York and you should use a New York planar coord system for that):
> require(rgdal)
> uk_coord = spTransform(uk_coord, CRS("+init=epsg:27700"))
> ire_coord = spTransform(ire_coord, CRS("+init=epsg:27700"))
Now we can use FNN:
> require(FNN)
> g = get.knnx(coordinates(uk_coord), coordinates(ire_coord),k=1)
> str(g)
List of 2
$ nn.index: int [1:69, 1] 202 488 202 488 253 253 488 253 253 253 ...
$ nn.dist : num [1:69, 1] 232352 325375 87325 251770 203863 ...
g is a list of indexes and distances of the uk points that are nearest to the 69 irish points. The distances are in metres because the coordinate system is in metres.
You can illustrate this by plotting the points then joining irish point 1 to uk point 202, irish 2 to uk 488, irish 3 to uk 202 etc. In code:
> plot(uk_coord, col=2, xlim=c(-1e5,6e5))
> plot(ire_coord, add=TRUE)
> segments(coordinates(ire_coord)[,1], coordinates(ire_coord)[,2], coordinates(uk_coord[g$nn.index[,1]])[,1], coordinates(uk_coord[g$nn.index[,1]])[,2])
gDistance() from the rgeos package will give you the distance matrix
library(rgeos)
gDistance(uk_coord, ire_coord, byid = TRUE)
Another option is nncross() from the spatstat package. Pro: it gives the distance to the nearest neighbour. Contra: you 'll need to convert the SpatialPoints to a SpatialPointPattern (see ?as.ppp in statstat)
library(spatstat)
nncros(uk.ppp, ire.ppp)
The package geosphere offers a lot of dist* functions to evaluate distances from two lat/lon points. In your example, you could try:
require(geosphere)
#get the coordinates of UK and Ireland
pointuk<-uk_coord#coords
pointire<-ire_coord#coords
#prepare a vector which will contain the minimum distance for each Ireland point
res<-numeric(nrow(pointire))
#get the min distance
for (i in 1:length(res)) res[i]<-min(distHaversine(pointire[i,,drop=FALSE],pointuk))
The distances you'll obtain are in meters (you can change by setting the radius of the earth in the call to distHaversine).
The problem with gDistance and other rgeos functions is that they evaluate the distance as the coordinates were planar. Basically, the number you obtain is not much useful.
I have a shapefile with 50+ different polygonal shapes (representing 50+ different regions) and 10,000+ data points that are supposed to be present in one of the regions. The thing is, the 10,000+ points are already coded with a region they are supposed to be in, and I want to figure out how far they are from this coded region in geo-spatial distance.
My current approach (code below), which involves converting shapefiles to owin objects from the sp library and using distfun gets me distances in lat,long euclidean space. But I would like to get geo-spatial distances (eventually to convert to km). Where should I go next?
#basically cribbed from http://cran.r-project.org/web/packages/spatstat/vignettes/shapefiles.pdf (page 9)
shp <- readShapeSpatial("myShapeFile.shp", proj4string=CRS("+proj=longlat +datum=WGS84"))
regions <- lapply(slot(shp, "polygons"), function(x) SpatialPolygons(list(x)))
windows <- lapply(regions, as.owin)
# need to convert this to geo distance
distance_from_region <- function(regionData, regionName) {
w <- windows[[regionName]]
regionData$dists <- distfun(w)(regionData$lat, regionData$long)
regionData
}
I'd project the data to a euclidean (or near euclidean) coordinate system - unless you are spanning a large chunk of the globe then this is feasible. Use spTransform from maptools or sp or rgdal (I forget which) and convert to a UTM zone near your data.
You also might do better with package rgeos and the gDistance function:
gDistance by default returns the cartesian minimum distance
between the two geometries in the units of the current projection.
If your data is over a large chunk of globe then... tricky... 42...
Barry