I am trying to adapt the solution to this question for a slightly different purpose.
While searching for a solution, I found J. Win's nice answer here for finding a path via sea from A to B.
So I tried to adapt the code to find distances as well but did not get the expected output.
Problem 1: Is this a scaling issue or incorrect use of the function? For example, with coordinates running from the north to the south of Spain, the expected distance is roughly 1,600 km, but the function outputs 66349.24.
Problem 2: The intended application requires routing around small islands. Is there another data model that will simply slot into this when required?
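As a quick sanity check on Problem 1, the straight-line great-circle distance between the two ports is a lower bound on any sea route; a minimal sketch, assuming the geosphere package (note that distGeo expects coordinates in (long, lat) order and returns meters):
library(geosphere)
# great-circle distance between the two ports, in km;
# any valid sea route must be at least this long
distGeo(c(-4.350867, 44.206963), c(-5.134768, 43.831159)) / 1000
Here is the adapted code: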
library(raster)
library(gdistance)
library(maptools)
library(rgdal)
library(maps)
# make a raster of the world, low resolution for simplicity
# with all land having "NA" value
# use your own shapefile or raster if you have it
# the wrld_simpl data set is from maptools package
data(wrld_simpl)
# a typical world projection
world_crs <- crs(wrld_simpl)
world <- wrld_simpl
worldshp <- spTransform(world, world_crs)
ras <- raster(nrow=300, ncol=300)
# rasterize will set ocean to NA, so we just invert it
# and set water to "1"
# land is equal to zero because it is "NOT" NA
worldmask <- rasterize(worldshp, ras)
worldras <- is.na(worldmask)
# originally I set land to "NA"
# but that would make some ports impossible to visit
# so at 999, land (i.e. everything that was zero)
# becomes very expensive to cross but not "impossible"
worldras[worldras==0] <- 999
# each cell of worldras now has a value of 1 or 999, nothing else
# create a Transition object from the raster
# this calculation took a bit of time
tr <- transition(worldras, function(x) 1/mean(x), 16)
tr <- geoCorrection(tr, scl = FALSE)
# the distance matrix excluding land must be calculated
# from a point of origin, specified in the CRS of your raster
# let's start with a lat/long point off the northern coast of Spain
# (note: rgdal's project() expects coordinates in (long, lat) order)
port_origin <- structure(c(44.206963342648436, -4.350866822197724), .Dim = 1:2)
port_origin <- project(port_origin, crs(world_crs, asText = TRUE))
# function accCost uses the transition object and point of origin
A <- accCost(tr, port_origin)
# now A still shows the expensive travel over land
# so we mask it out to display sea travel only
A <- mask(A, worldmask, inverse=TRUE)
# and calculate a travel path to a second port
port_destination <- structure(c(43.83115853071615, -5.134767965894603), .Dim = 1:2)
port_destination <- project(port_destination, crs(world_crs, asText = TRUE))
path <- shortestPath(tr, port_origin, port_destination, output = "SpatialLines")
t_path <- shortestPath(tr, port_origin, port_destination)
distance <- costDistance(tr, port_origin, port_destination)
# make a demonstration plot
plot(A)
points(rbind(port_origin, port_destination))
lines(path)
# we can wrap all this in a customized function,
# given two points in the projected coordinate system
# and a raster whose cells are weighted
# according to ease of shipping
RouteShip <- function(from_port, to_port, cost_raster, land_mask) {
  tr <- transition(cost_raster, function(x) 1/mean(x), 16)
  tr <- geoCorrection(tr, scl = FALSE)
  A <- accCost(tr, from_port)
  A <- mask(A, land_mask, inverse = TRUE)
  path <- shortestPath(tr, from_port, to_port, output = "SpatialLines")
  plot(A)
  points(rbind(from_port, to_port))
  lines(path)
}
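For example, a minimal call reusing the objects built above:
# example usage with the objects created above
RouteShip(port_origin, port_destination, worldras, worldmask)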
Related
I'm trying to get a boolean vector where, for example, v[i] = 1 tells me whether the i-th point (a latitude/longitude pair from a train data frame) falls within one of the geographical areas identified by an OGRGeoJSON file.
The OGR file is structured roughly like this:
District 1: 24 polygons
District 2: 4 polygons
District 3: 27 polygons
District 4: 18 polygons
District 5: 34 polygons
That's what I tried to do.
However, the results obtained are not correct, because the polygon that is generated is a mix of all the various areas present in the OGR file.
library(rgdal)
library(httr)
library(sp)
r <- GET('https://data.cityofnewyork.us/api/geospatial/tqmj-j8zm?method=export&format=GeoJSON')
nyc_neighborhoods <- readOGR(content(r,'text'), 'OGRGeoJSON', verbose = F)
#New York City polygonal
pol_lat <- c(nyc_neighborhoods_df$lat)
pol_long <- c(nyc_neighborhoods_df$long)
xy <- cbind(pol_lat, pol_long)
p = Polygon(xy)
ps = Polygons(list(p),1)
pol = SpatialPolygons(list(ps))
#Points to analyse (pair of coordinates)
ny_lat <- c(train$pickup_latitude, train$dropoff_latitude)
ny_long <- c(train$pickup_longitude, train$dropoff_longitude)
ny_coord <- cbind(ny_lat, ny_long)
pts <- SpatialPoints(ny_coord)
#Query: Does the point to analyze fall in or out NYC?
over(pts, pol, returnList = TRUE)
How can I fix this to get the correct result?
sp is an older package which is being phased out in favor of the newer "Simple Features" sf package. Let me know if you are open to using the pipe operator %>% from the magrittr package, as it works nicely with the sf package (as do dplyr and purrr).
Using sf, you could do:
library(sf)
# Replace this with the path to the geojson file
geojson_path <- "path/to/file.geojson"
boroughs <- sf::st_read(dsn = geojson_path, stringsAsFactors = FALSE)
Now let's make a very simple spatial point object to stand in for the "train" data.
# Make test data.frame
test_df <-
data.frame(
# Random test point I chose, a couple of blocks from Central Park
a = "manhattan_point",
y = 40.771959,
x = -73.964128,
stringsAsFactors = FALSE)
# Turn the test_df into a spatial object
test_point <-
sf::st_as_sf(
test_df,
# The coords argument tells the st_as_sf function
# what columns store the longitude and latitude data
# which it uses to associate a spatial point to each
# row in the data.frame
coords = c("x", "y"),
crs = 4326 # WGS84
)
Now we are ready to determine what polygon(s) our point falls in:
# Get the sparse binary predicate. This will give a list with as
# many elements as there are spatial objects in the first argument,
# in this case, test_point, which has 1 element.
# It also has attributes which detail what the relationship is
# (intersection, in our case)
sparse_bin_pred <- sf::st_intersects(test_point, boroughs)
# Output the boro_name that matched. I think the purrr package
# offers a more intuitive way to do this, but lapply works:
lapply(
sparse_bin_pred,
function(x) boroughs$boro_name[x]
)
That last part outputs:
[[1]]
[1] "Manhattan"
I have data points of a species observed using camera traps and would like to measure the distance of each camera trap site (CameraStation) to the edge of a national park using R. I have a shapefile of the park (shp) and want to apply a criterion to CameraStation(s) which are <5 km from the edge. My data frame (df) consists of multiple events/observations (EventID) per CameraStation. The aim is to analyse when events near the park edge are most frequent, given other environmental factors such as Season, Moon Phase and DayNight (also columns in df).
I found a package called Distance in R, but that is for distance sampling and not what I want to do. Which package is relevant in this situation?
I expect the following outcome:
EventID CameraStation Distance(km) Within 5km
0001 Station 1 4.3 Yes
0002 Station 1 4.3 Yes
0003 Station 2 16.2 No
0004 Station 3 0.5 Yes
...
Here's a general solution, adapted from Spacedman's answer to this question at gis.stackexchange. Note: this solution requires working in a projected coordinate system. You can transform to a projected CRS if needed using spTransform.
The gDistance function of the rgeos package calculates the distance between geometries, but for the case of points inside a polygon the distance is zero. The trick is to create a new "mask" polygon where the original polygon is a hole cut out from the mask. Then we can measure the distance between points in the hole and the mask, which is the distance to the edge of the original polygon that we really care about.
We'll use the shape file of the Yellowstone National Park Boundary found on this page.
library(sp) # for SpatialPoints and proj4string
library(rgdal) # to read shapefile with readOGR
library(rgeos) # for gDistance, gDifference, and gBuffer
# ab67 was the name of the shape file I downloaded.
yellowstone.shp <- readOGR("ab67")
# gBuffer enlarges the boundary of the polygon by the amount specified by `width`.
# The units of `width` (meters in this case) can be found in the proj4string
# for the polygon.
yellowstone_buffer <- gBuffer(yellowstone.shp, width = 5000)
# gDifference calculates the difference between the polygons, i.e. what's
# in one and not in the other. That's our mask.
mask <- gDifference(yellowstone_buffer, yellowstone.shp)
# Some points inside the park
pts <- list(x = c(536587.281264245, 507432.037861251, 542517.161278414,
477782.637790409, 517315.171218198),
y = c(85158.0056377799, 77251.498952222, 15976.0721391485,
40683.9055315169, -3790.19457474617))
# Sanity checking the mask and our points.
plot(mask)
points(pts)
# Put the points in a SpatialPointsDataFrame with camera id in a data field.
spts.df <- SpatialPointsDataFrame(pts, data = data.frame(Camera = ordered(1:length(pts$x))))
# Give our SpatialPointsDataFrame the same spatial reference as the polygon.
proj4string(spts.df) <- proj4string(yellowstone.shp)
# Calculate distances (km) from points to edge and put in a new column.
spts.df$km_to_edge <- apply(gDistance(spts.df, mask, byid = TRUE), 2, min) / 1000
# Determine which records are within 5 km of an edge and note in new column.
spts.df$edge <- ifelse(spts.df$km_to_edge < 5, TRUE, FALSE)
# Results
spts.df
# coordinates Camera km_to_edge edge
# 1 (536587.3, 85158.01) 1 1.855010 TRUE
# 2 (507432, 77251.5) 2 9.762755 FALSE
# 3 (542517.2, 15976.07) 3 11.668700 FALSE
# 4 (477782.6, 40683.91) 4 4.579638 TRUE
# 5 (517315.2, -3790.195) 5 8.211961 FALSE
Here's a quick solution.
Simplify the outline of your shapefile into N points. Then calculate the minimum distance for each camera trap to every point in the outline of the national park.
library(geosphere)
n <- 500 ## The number of points summarizing the shapefile
NPs <- ## Your shapefile goes here
# note: for a SpatialPolygons object, spsample draws points from the interior;
# convert to lines first, e.g. NPs <- as(NPs, "SpatialLines"), to place the
# sample points along the outline itself
NP.pts <- spsample(NPs, n = n, type = "regular")
CP.pts <- ## Coordinates for a single trap
distances <- distm(coordinates(CP.pts), coordinates(NP.pts), fun = distHaversine) / 1000
##Distance in Km between the trap to each point in the perimeter of the shapefile:
distances
Use distances to find the minimum distance between the shapefile and that given trap. This approach generalizes easily to all traps using for loops or apply functions, as in the sketch below.
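For instance, a minimal sketch handling all traps at once, assuming CP.all is a hypothetical SpatialPoints object holding every camera trap (distm returns a traps-by-outline-points matrix, so the row minima are the per-trap distances):
all_dist <- distm(coordinates(CP.all), coordinates(NP.pts), fun = distHaversine) / 1000
min_dist_km <- apply(all_dist, 1, min) # minimum distance (km) per trap
within_5km <- min_dist_km < 5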
I had a problem with the points data frame and shape file being projected so instead I used the example in this link to answer my question
https://gis.stackexchange.com/questions/225102/calculate-distance-between-points-and-nearest-polygon-in-r
Basically, I used this code:
df2 # my data frame with points
shp # my shapefile (non-projected)
coordinates(df2) <- ~Longitude+Latitude # Longitude and Latitude are columns in my df
dist.mat <- geosphere::dist2Line(p = df2, line = shp)
dmat <- data.frame(dist.mat) # turn the result into a data frame
dmat$km5 <- ifelse(dmat$distance < 5000, TRUE, FALSE) # distance is in meters (5000 m = 5 km)
coordinates(dmat) <- ~lon+lat
df2$distance <- dmat$distance # add the new Distance column to my df
I need to calculate the proportion of each raster cell in a high resolution grid (raster stack with 8 layers) covered by a polygon using R.
My standard approach would be to use raster::rasterize(..., getCover = TRUE); however, this takes a very long time, particularly as the size of the polygon increases.
As an alternative, I tried cropping the raster stack to the extent of the polygon, transforming the raster stack to polygons, and calculating the proportion from the intersection of the resulting shapes with the original polygon. This works well for small polygons but breaks down as the polygon grows, because R runs out of memory (I am limited to 16 GB) or because the calculation of the intersection takes too long.
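For reference, the baseline approach mentioned above is a one-liner; a minimal sketch, where poly and r stand in for a Spatial* polygon layer and the target raster (depending on the raster version, the values returned are a per-cell fraction or percentage of cover):
# slow baseline: per-cell polygon cover via rasterize
cover <- raster::rasterize(poly, r, getCover = TRUE)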
Here is a reproducible example using my current solution with a very small shape file.
library(raster)
library(spex)
library(dplyr)
library(sf)
library(data.table)
# setup a dummy example
r <- raster(nrow = 21600, ncol = 43200)
r[] <- 1:933120000
r_stack <- stack(r,r,r,r,r,r,r,r)
# get a small dummy shapefile
shp_small <- raster::getData(name = "GADM", country = "CHE", level = 2, download = TRUE)
shp_small <- st_as_sf(shp_small)[1, ]
# for comparison, use a big dummy shapefile
# shp_big <- raster::getData(name = "GADM", country = "BRA", level = 0, download = TRUE)
## Approach for a small shape file
stack_small <- raster::crop(r_stack, shp_small, snap = "out")
## transform to polygon
stack_small_poly <- spex::polygonize(stack_small)
stack_small_poly$ID <- 1:nrow(stack_small_poly)
## I can then perform the necessary calculations on the polygons to obtain
## the proportional overlay
# transform to mollweide for area calculation
mollw <- "+proj=moll +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs"
stack_small_crs <- st_crs(stack_small_poly)
stack_small_poly <- st_transform(stack_small_poly, mollw)
# calculate area for each cell
stack_small_poly$area_org <- st_area(stack_small_poly)
# transform to world equidistant cylindrical
stack_small_poly<- st_transform(stack_small_poly, 4087)
shp_small <- st_transform(shp_small, 4087)
# get all the cells that intersect with the shape (this might take a while)
i <- st_intersects(stack_small_poly, shp_small, sparse = FALSE)
stack_small_poly <- dplyr::filter(stack_small_poly, i)
# now calculate the exact intersection (this might take a while)
st_agr(stack_small_poly) <- "constant"
stack_small_poly <- st_intersection(stack_small_poly, st_geometry(shp_small))
# calculate the new areas and backtransform
stack_small_poly <- st_transform(stack_small_poly, mollw)
stack_small_poly$new_area <- st_area(stack_small_poly)
stack_small_poly <- st_transform(stack_small_poly, stack_small_crs)
# calculate proportion
stack_small_poly$proportion <- as.numeric(stack_small_poly$new_area/stack_small_poly$area_org)
# finally transform to data.table for subsequent analysis
st_geometry(stack_small_poly) <- NULL
setDT(stack_small_poly)
I am looking for a solution in R that is able to perform the task in 10-15 minutes (preferably faster) with a memory limit of 16 GB RAM for the shapefile representing Brazil (see shp_big in code above).
I am well aware that this optimum might not be achievable, and every suggestion leading to a reduction in execution time and/or memory usage is more than welcome.
New to spatial analysis on R here. I have a shapefile for the USA that I downloaded from HERE. I also have a set of lat/long points (half a million) that lie within the contiguous USA.
I'd like to find the "most remote spot" -- the spot within the contiguous USA that's farthest from the set of points.
I'm using the rgdal, raster and sp packages. Here's a reproducible example with a random sample of 10 points:
library(rgdal)  # readOGR
library(raster) # raster, rasterize, distanceFromPoints, extent
library(sp)     # spsample
# Set wd to the folder tl_2010_us_state_10
usa <- readOGR(dsn = ".", layer = "tl_2010_us_state10")
# Sample 10 points in USA
sample <- spsample(usa, 10, type = "random")
# Set extent for contiguous united states
ext <- extent(-124.848974, -66.885444, 24.396308, 49.384358)
# Rasterize USA
r <- raster(ext, nrow = 500, ncol = 500)
rr <- rasterize(usa, r)
# Find distance from sample points to cells of USA raster
D <- distanceFromPoints(object = rr, xy = sample)
# Plot distances and points
plot(D)
points(sample)
After the last two lines of code, I get a plot of the distances over the full bounding box.
However, I'd like it to be over the rasterized map of the USA. And, I'd like it to only consider distances from cells that are in the contiguous USA, not all cells in the bounding box. How do I go about doing this?
I'd also appreciate any other tips regarding the shape file I'm using -- is it the best one? Should I be worried about using the right projection, since my actual dataset is lat/long? Will distanceFromPoints be able to efficiently process such a large dataset, or is there a better function?
To limit raster D to the contiguous USA you could find the elements of rr assigned values of NA (i.e. raster cells within the bounding box but outside of the usa polygons), and assign these same elements of D a value of NA.
D[which(is.na(rr[]))] <- NA
plot(D)
lines(usa)
You can use 'proj4string(usa)' to find the projection info for the usa shapefile. If your coordinates of interest are based on a different projection, you can transform them to match the usa shapefile projection as follows:
my_coords_xform <- spTransform(my_coords, CRS(proj4string(usa)))
Not sure about the relative efficiency of distanceFromPoints, but it only took ~ 1 sec to run on my computer using your example with 10 points.
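A minimal way to time it on your own data:
# benchmark the distance calculation for your own number of points
system.time(D <- distanceFromPoints(object = rr, xy = sample))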
I think you were looking for the mask function.
library(raster)
usa <- getData('GADM', country='USA', level=1)
# exclude Alaska and Hawaii
usa <- usa[!usa$NAME_1 %in% c( "Alaska" , "Hawaii"), ]
# get the extent and create raster with preferred resolution
r <- raster(floor(extent(usa)), res=1)
# rasterize polygons
rr <- rasterize(usa, r)
set.seed(89)
sample <- spsample(usa, 10, type = "random")
# Find distance from sample points to cells of USA raster
D <- distanceFromPoints(object = rr, xy = sample)
# remove areas outside of polygons
Dm <- mask(D, rr)
# an alternative would be mask(D, usa)
# cell with highest value
mxd <- which.max(Dm)
# coordinates of that cell
pt <- xyFromCell(r, mxd)
plot(Dm)
points(pt)
The distances should be fine, also when using long/lat data. But distanceFromPoints could indeed be a bit slow with a large data set, as it uses a brute-force algorithm.
I am trying to calculate the closest distance between locations in the ocean and points on land, without the path crossing a coastline. Ultimately, I want to create a distance-to-land-features map.
This map was created using rdist.earth and shows straight-line distance. Therefore it is not always correct, because it does not take into account the curvature of the coastline.
coast <- matrix(coast_lonlat[, 1], 332, 316, byrow = TRUE)
image(1:316, 1:332, t(coast))
min_dist2_feature <- NULL
for (q in 1:nrow(coast_lonlat)) {
  diff_lonlat <- rdist.earth(matrix(coast_lonlat[q, 2:3], 1, 2),
                             as.matrix(feature[, 1:2]), miles = FALSE)
  min_dist2_feature <- c(min_dist2_feature, min(diff_lonlat, na.rm = TRUE))
}
distmat <- matrix(min_dist2_feature, 316, 332)
image(1:316, 1:332, distmat)
Land feature data is a two column matrix of xy coordinates, e.g.:
ant_x <- c(85, 90, 95, 100)
ant_y <- c(-68, -68, -68, -68)
feature <- cbind(ant_x, ant_y)
Does anyone have any suggestions? Thanks
Not fully error-checked yet, but it may get you started. Rather than coastlines, I think you need to start with a raster whose no-go areas are set to NA.
library(raster)
library(gdistance)
library(maptools)
library(rgdal)
# a mockup of the original features dataset (no longer available)
# as I recall it, these were just a two-column matrix of xy coordinates
# along the coast of East Antarctica, in degrees of lat/long
ant_x <- c(85, 90, 95, 100)
ant_y <- c(-68, -68, -68, -68)
feature <- cbind(ant_x, ant_y)
# a projection I found for Antarctica
antcrs <- crs("+proj=stere +lat_0=-90 +lat_ts=-71 +datum=WGS84")
# set projection for your features
# function 'project' is from the rgdal package
antfeat <- project(feature, crs(antcrs, asText=TRUE))
# make a raster similar to yours
# with all land having "NA" value
# use your own shapefile or raster if you have it
# the wrld_simpl data set is from maptools package
data(wrld_simpl)
world <- wrld_simpl
ant <- world[world$LAT < -60, ]
antshp <- spTransform(ant, antcrs)
ras <- raster(nrow=300, ncol=300)
crs(ras) <- crs(antshp)
extent(ras) <- extent(antshp)
# rasterize will set ocean to NA, so we just invert it
# and set water to "1"
# land is equal to zero because it is "NOT" NA
antmask <- rasterize(antshp, ras)
antras <- is.na(antmask)
# originally I set land to "NA"
# but that seemed to make some of your features not visible
# so at 999, land (i.e. everything that was zero)
# becomes very expensive to cross but not "impossible"
antras[antras==0] <- 999
# each cell of antras now has a value of 1 or 999, nothing else
# create a Transition object from the raster
# this calculation took a bit of time
tr <- transition(antras, function(x) 1/mean(x), 8)
tr <- geoCorrection(tr, scl = FALSE)
# distance matrix excluding the land
# just pick a few features to prove it works
sel_feat <- head(antfeat, 3)
A <- accCost(tr, sel_feat)
# now A still shows the expensive travel over land
# so we mask it out for sea travel only
A <- mask(A, antmask, inverse=TRUE)
plot(A)
points(sel_feat)
Seems to be working because the left side ocean has higher values than the right side ocean, and likewise as you go down into the Ross Sea.
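If you want the accumulated cost at specific ocean locations rather than reading it off the map, a minimal sketch (the query coordinates here are hypothetical and must be projected into the raster's CRS, like the features above):
# hypothetical lon/lat query points, projected as before
query <- project(cbind(c(90, 95), c(-65, -64)), crs(antcrs, asText = TRUE))
extract(A, query)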