how to merge a shapefile with a dataframe with latitude/longitude data - r

I am struggling with the following issue
I have downloaded the PLUTO NYC Manhattan Shapefile for the NYC tax lots from here https://www1.nyc.gov/site/planning/data-maps/open-data/dwn-pluto-mappluto.page
I am able to read them in sf with a simple st_read
> mydf
Simple feature collection with 42638 features and 90 fields
geometry type: MULTIPOLYGON
dimension: XY
bbox: xmin: 971045.3 ymin: 188447.4 xmax: 1010027 ymax: 259571.5
epsg (SRID): NA
proj4string: +proj=lcc +lat_1=40.66666666666666 +lat_2=41.03333333333333 +lat_0=40.16666666666666 +lon_0=-74 +x_0=300000 +y_0=0 +datum=NAD83 +units=us-ft +no_defs
First 10 features:
Borough Block Lot CD CT2010 CB2010 SchoolDist Council ZipCode FireComp PolicePrct HealthCent HealthArea
1 MN 1545 52 108 138 4000 02 5 10028 E022 19 13 3700
My problem is the following: I have a dataframe as follows
> data_frame('lat' = c(40.785091,40.785091), 'lon' = c(-73.968285, -73.968285))
# A tibble: 2 x 2
lat lon
<dbl> <dbl>
1 40.785091 -73.968285
2 40.785091 -73.968285
I would like to merge this data to the mydf dataframe above, so that I can count how many latitude/longitude observations I have within each tax lot (remember, mydf is at the tax lot granularity), and plot the corresponding map of it. I need to do so using sf.
In essence something similar to
pol <- mydf %>% select(SchoolDist)
plot(pol)
but where the counts for each tax lot come from counting how many points in my latitude/longitude dataframe fall into them.
Of course, in my small example I just have 2 points in the same tax lot, so that would just highlight one single tax lot in the whole area. My real data contains a lot more points.
I think there is an easy way to do it, but I was not able to find it.
Thanks!

This is how I would do it with arbitrary polygon and point data. I wouldn't merge the two and instead just use a geometry predicate to get the counts that you want. Here we:
Use the built in nc dataset and transform to 3857 crs, which is projected rather than lat-long (avoids a warning in st_contains)
Create 1000 random points within the bounding box of nc, using st_bbox and runif. Note that st_as_sf can turn a data.frame with lat long columns into sf points.
Use lengths(st_contains(polygons, points)) to get the counts of points per polygon. sgbp objects created by a geometry predicate are basically "for each geometry in sf x, what indices of geometries in sf y satisfy the predicate". So lengths effectively gives the number of points that satisfy the predicate for each geometry, in this case the number of points contained within each polygon.
Once the counts are in the sf object as a column, we can just select and plot them with the plot.sf method.
For your data, simply replace nc with mydf and leave out the call to tibble, instead use your data.frame with the right lat long pairs.
library(tidyverse)
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.2.3, proj.4 4.9.3
nc <- system.file("shape/nc.shp", package="sf") %>%
read_sf() %>%
st_transform(3857)
set.seed(1000)
points <- tibble(
x = runif(1000, min = st_bbox(nc)[1], max = st_bbox(nc)[3]),
y = runif(1000, min = st_bbox(nc)[2], max = st_bbox(nc)[4])
) %>%
st_as_sf(coords = c("x", "y"), crs = 3857)
plot(nc$geometry)
plot(points$geometry, add = TRUE)
nc %>%
mutate(pt_count = lengths(st_contains(nc, points))) %>%
select(pt_count) %>%
plot()
Created on 2018-05-02 by the reprex package (v0.2.0).
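To make the sgbp idea concrete, here is a minimal sketch on made-up geometry. The two squares, the four points and the helper sq() are invented for illustration and are not part of the answer's data; no CRS is set, so everything is treated as planar.

```r
library(sf)

# Hypothetical helper: a unit square whose lower-left corner is (x0, 0)
sq <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
}

polys <- st_sf(id = c("a", "b"), geometry = st_sfc(sq(0), sq(2)))
pts <- st_sfc(
  st_point(c(0.5, 0.5)),  # inside square a
  st_point(c(0.2, 0.8)),  # inside square a
  st_point(c(2.5, 0.5)),  # inside square b
  st_point(c(5, 5))       # outside both squares
)

# One list element per polygon, holding the indices of the points it contains
hits <- st_contains(polys, pts)

# The length of each element is the number of points in that polygon
polys$pt_count <- lengths(hits)
polys
```

Because the result has one entry per polygon, polygons with no points get a count of 0 rather than being dropped, which is convenient for plotting.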

I tried this on your data, but the intersection is empty for both sets of points you provided. However, the code should work.
EDIT: Simplified group_by + mutate with add_count:
mydf = st_read("MN_Dcp_Mappinglot.shp")
xydf = data.frame(lat=c(40.758896,40.758896), lon=c(-73.985130, -73.985130))
xysf = st_as_sf(xydf, coords=c('lon', 'lat'), crs=st_crs(mydf))
## NB: make sure to st_transform both to common CRS, as Calum You suggests
xysf %>%
sf::st_intersection(mydf) %>%
dplyr::add_count(LOT)
Reproducible example:
nc = sf::st_read(system.file("shape/nc.shp", package="sf"))
ncxy = sf::st_as_sf(data.frame(lon=c(-80, -80.1, -82), lat=c(35.5, 35.5, 35.5)),
coords=c('lon', 'lat'), crs=st_crs(nc))
ncxy = ncxy %>%
sf::st_intersection(nc) %>%
dplyr::add_count(FIPS)
## a better approach
ncxy = ncxy %>%
sf::st_join(nc, join=st_intersects) %>%
dplyr::add_count(FIPS)
The new column n includes the total number of points per FIPS code.
ncxy %>% dplyr::group_by(FIPS) %>% dplyr::distinct(n)
> although coordinates are longitude/latitude, st_intersects assumes
that they are planar
# A tibble: 2 x 2
# Groups: FIPS [2]
FIPS n
<fctr> <int>
1 37123 2
2 37161 1
I'm not sure why your data results in an empty intersection, but since the code works on the example above there must be a separate issue.
HT: st_join approach from this answer.
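One possible explanation for the empty intersection, offered as an assumption and not verified against the PLUTO file: the proj4string of mydf shows a projected CRS measured in US feet, while the points are longitude/latitude in degrees. Passing crs = st_crs(mydf) to st_as_sf only labels the degree values as feet; it does not convert them. The sketch below reproduces that situation with the nc data, using EPSG:2264 (NAD83 / North Carolina, ftUS) as a stand-in for a feet-based shapefile and assuming the points are WGS 84.

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
# Stand-in for a shapefile stored in a projected CRS measured in US feet
nc_ft <- st_transform(nc, 2264)

xy <- data.frame(lon = c(-80, -80.1, -82), lat = c(35.5, 35.5, 35.5))

# Labels the degree values as if they were already feet (no conversion happens)
pts_mislabelled <- st_as_sf(xy, coords = c("lon", "lat"), crs = st_crs(nc_ft))

# Declares the values as lon/lat first (ASSUMPTION: WGS 84), then reprojects
pts_reprojected <- st_transform(
  st_as_sf(xy, coords = c("lon", "lat"), crs = 4326),
  st_crs(nc_ft)
)

# Total number of point/polygon matches under each treatment
n_mislabelled <- sum(lengths(st_intersects(pts_mislabelled, nc_ft)))
n_reprojected <- sum(lengths(st_intersects(pts_reprojected, nc_ft)))
c(mislabelled = n_mislabelled, reprojected = n_reprojected)
```

If the same pattern applies to the tax-lot data, declaring crs = 4326 and then calling st_transform(st_crs(mydf)) on the points would be the thing to try.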

Related

How to extract the largest polygon in a raster?

I have this code to read a raster and a set of polygons, and to extract the raster value for each polygon:
library(raster)
library(geojsonsf)
library(sf)
library(exactextractr)
r <- raster(matrix(rnorm(10*12), nrow=10), xmn = -180, xmx= 180, ymn = -90, ymx= 90)
myurl <- "http://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_050_00_500k.json"
geo <- readLines(myurl)
geo <- paste0(geo, collapse = "")
system.time({ sf <- geojson_sf(geo)})
#add crs information for the raster 'r'
crs(r) <- 4326
# extract the 'r' raster value for each polygon 'NAME' in 'sf'
res <- do.call(rbind, exactextractr::exact_extract(r, sf, include_cols = 'NAME'))[-3]
It can be that several polygons are within one pixel and I need to extract not all polygons but only the largest polygon in a pixel.
Here's some code that does what I think you want, which is to find the polygon that is largest within each of the raster cells. I have modified the code so raster cells can be uniquely identified, and I use GEO_ID because NAME is not unique in the data (there are 31 Washingtons, for example). I use dplyr to find the maximum coverage for each raster cell and mapview to view the results and convince myself that the code is working.
library(raster)
library(geojsonsf)
library(sf)
library(exactextractr)
library(mapview)
library(dplyr)
# Give the raster cells a unique identifier so we can use this to find which
# polygon is the largest in a given raster cell
r <- raster(matrix(1:120, nrow=10), xmn = -180, xmx= 180, ymn = -90, ymx= 90)
crs(r) <- 4326
myurl <- "http://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_050_00_500k.json"
geo <- readLines(myurl)
geo <- paste0(geo, collapse = "")
sf <- geojson_sf(geo)
# Find the polygons that overlap with the raster cells
# the coverage fraction says how much overlap so
# simply find the largest whilst grouping by the id
# of the raster cell
# Use GEO_ID because NAME has many duplicates
overlay <- do.call(rbind, exactextractr::exact_extract(r, sf, include_cols = c('GEO_ID', 'NAME')))
# Find the maximum in each raster cell
maximum_in_raster <- overlay %>% group_by(value) %>% top_n(1, coverage_fraction)
# Create a subset of polygons corresponding to the maxima
# This is where GEO_ID is important
maximum_polygons <- sf[sf$GEO_ID %in% maximum_in_raster$GEO_ID, 'NAME']
# Make a grid to display
sf_grid <- st_make_grid(r, n=c(12,10))
# Use mapview to have a look at the results to convince ourselves that it's working
mapview(maximum_polygons, alpha.regions=1) + mapview(sf_grid, col.regions='white')
Here's an example showing the maxima in the south west of the US.
The 4 polygons shown are Lake, Nye, San Bernardino and San Luis Obispo. The horizontal and vertical lines show the boundaries of the raster cells.
And to make the point about duplicate names, here is the code to count how many names appear in each raster cell.
counts <- overlay %>% count(value, NAME, sort = T)
head(counts)
value NAME n
1 33 Washington 13
2 23 Lincoln 12
3 33 Franklin 12
4 23 Washington 10
5 23 Douglas 9
6 23 Grant 9
Which shows 13 Washingtons in cell 33.
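The "largest polygon per cell" step can be tried in isolation on a small hand-made table. The sketch below uses a hypothetical overlay_toy data frame with the same column names as the exact_extract output above (the values are invented), and swaps in dplyr's slice_max for top_n; with_ties = FALSE keeps exactly one row per cell even when two polygons cover the same fraction.

```r
library(dplyr)

# Hypothetical overlay: two raster cells (value 1 and 2), two polygons in each
overlay_toy <- data.frame(
  GEO_ID = c("A", "B", "C", "D"),
  value = c(1, 1, 2, 2),
  coverage_fraction = c(0.2, 0.7, 0.5, 0.1)
)

# Keep, for every cell, the single row with the largest coverage fraction
largest_per_cell <- overlay_toy %>%
  group_by(value) %>%
  slice_max(coverage_fraction, n = 1, with_ties = FALSE) %>%
  ungroup()

largest_per_cell
```

Since coverage_fraction is the share of a cell covered by a polygon, the row kept for each cell is the polygon occupying the most area inside that cell, not necessarily the polygon with the largest total area.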

Cleaning Geocode data with r

I am cleaning my dataset and I don't know how to clean the GPS data.
When I use the table function I find that the values were entered in different formats:
"547140",
"35.6997",
"251825.7959",
"251470.43",
"54/4077070001",
and "54/305495"
I don't know how to clean this variable when the values differ this much.
I would be thankful if you could help me or suggest a website where I can learn this.
Your main issue is standardizing the GPS data by reprojecting it to a single coordinate system of your choice. Say we have the coordinates of Amsterdam in two different coordinate systems, one in Amersfoort / RD New (EPSG 28992) and one in WGS 84 (EPSG 4326); note that the column is spelled espg in the data below:
x y location espg
1: 1.207330e+05 486632.35593 amsterdam 28992
2: 4.884088e+00 52.36651 amsterdam 4326
dt <- structure(list(x = c(120733.012428048, 4.88408811380055), y = c(486632.355933105,
52.3665054922233), location = c("amsterdam", "amsterdam"), espg = c(28992,
4326)), row.names = c(NA, -2L), class = "data.frame")
What we want to do is reproject our coordinates to one coordinate system of choice. In this case I used WGS 84 (EPSG 4326).
library(sf)
# dt is the data.frame shown above
# here I convert the table to a spatial object and tell sf which columns contain the coordinates
dt <- st_as_sf(dt, coords = c("x", "y"))
# here I split by the different EPSG codes present
dt <- split(dt, dt$espg)
# here I loop through every individual EPSG code present in the dataset
for (i in seq_along(dt)) {
  # here I say which coordinate system (EPSG code) the GPS data is in
  st_crs(dt[[i]]) <- unique(dt[[i]]$espg)
  # here I transform the coordinates to another projection (in this case WGS 84, EPSG 4326)
  dt[[i]] <- st_transform(dt[[i]], 4326)
}
# here I bind the items of the list together
dt <- do.call(rbind, dt)
head(dt)
Simple feature collection with 2 features and 2 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 4.884088 ymin: 52.36651 xmax: 4.884088 ymax: 52.36651
Geodetic CRS: WGS 84
location espg geometry
4326 amsterdam 4326 POINT (4.884088 52.36651)
28992 amsterdam 28992 POINT (4.884088 52.36651)
In the geometry column you now see that the coordinates are equal to one another.
Bottom line is that you need to know the geographic coordinate system the GPS data is in. Then you can convert your data from a table to a spatial object and transform the GPS data to a projection of choice.
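As a first pass at working out which system each raw value might be in, the values from the question can be sorted by magnitude. This is only a rough sketch in base R: it assumes (without confirmation from the asker) that "/" was typed as a decimal separator, and it only flags values that could plausibly be degrees; it cannot tell you which projected system the remaining values belong to.

```r
raw <- c("547140", "35.6997", "251825.7959", "251470.43",
         "54/4077070001", "54/305495")

# ASSUMPTION: a "/" stands in for the decimal point
num <- as.numeric(sub("/", ".", raw, fixed = TRUE))

# Longitude and latitude in degrees can never exceed 180 in absolute value,
# so anything larger must be in some projected system (metres, feet, ...)
looks_like_degrees <- !is.na(num) & abs(num) <= 180

data.frame(raw, num, looks_like_degrees)
```

Values flagged FALSE still need their EPSG code identified by other means, for example from the data provider or by plotting candidates on a map.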
In addition, it is always a good idea to check whether your assumption about the original EPSG code is right, for example by plotting the data.
library(ggplot2)
library(ggspatial)
ggplot(dt) + annotation_map_tile() + geom_sf(size = 4) + theme(text = element_text(size = 15)) + facet_wrap(~espg)
In the figure below we see that the projection went well for both EPSG codes.

Remove line from polygon crossing the international dateline in R (e.g. Russia in rnaturalearth)

Question: Polygons that cross the international dateline frequently have a North-South line through them. Eastern Russia in the rnaturalearth package is a good example of this, but I have also encountered it with other spatial data. I would like to be able to remove this line for plotting.
Attempts:
I primarily use the sf package in R for mapping. I have tried various solutions involving st_union, st_combine, st_wrap_dateline, st_remove_holes, as well as using functions from other packages such as aggregate, merge, and gUnaryUnion, but my efforts have been fruitless so far.
Example: The following code demonstrates the problem lines in Russia along the international dateline using the popular rnaturalearth package.
library(tidyverse)
library(rnaturalearth)
library(sf)
#Import data
world <- ne_countries(scale = "medium",
returnclass = "sf")
#I use the Alaska albers projection for this map,
#limit extent (https://spatialreference.org/ref/epsg/nad83-alaska-albers/)
xmin <- -2255938
xmax <- 1646517
ymin <- 449981
ymax <- 2676986
#plot
ggplot()+
geom_sf(data=world, color="black", size=1)+
coord_sf(crs=3338)+
xlim(c(xmin,xmax))+ylim(c(ymin,ymax))+
theme_bw()
Thanks!
Short answer
EPSG:3338 is the problem - use a UTM (326XX or 327XX) code instead.
Long answer
My gut feeling is this is related to the challenges of projecting geographic (long-lat) data to a flat surface - either a projected CRS, or more simply the flat surface of the plot viewer pane in RStudio.
We know that on an ellipsoidal model of Earth, the (minimum) on-ground distance between longitudes of -179 and +179 is the same as the distance between -1 and +1, a distance of 2 degrees. However, from a numerical perspective, these two lines of longitude have a distance of 358 degrees between them.
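The difference between the two ways of measuring can be written out in a few lines of base R. The helper lon_gap() is a hypothetical name introduced here for illustration; it returns the shorter way around the globe between two longitudes, in degrees.

```r
# Hypothetical helper: smallest angular separation between two longitudes
lon_gap <- function(a, b) {
  d <- abs(a - b) %% 360
  pmin(d, 360 - d)
}

naive_gap <- abs(-179 - 179)   # plain subtraction, as a flat map sees it
wrapped_gap <- lon_gap(-179, 179)  # going across the 180th meridian
near_zero_gap <- lon_gap(-1, 1)    # the same separation near longitude 0

c(naive = naive_gap, wrapped = wrapped_gap, near_zero = near_zero_gap)
```

Plain subtraction is what a planar treatment of lon/lat coordinates effectively does, which is why the two halves of a feature split at the dateline look far apart numerically.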
Imagine you are an alien (or a flat-earther) and looking at the following projection of world, and you didn't know that Earth was ellipsoidal in shape (or you didn't know this was a projection). You would be forgiven for thinking that to get from one part of Russia (red) to the other, you would have to get wet. I guess by default, ggplot is a flat-earther.
Imagine each polygon in the above plot is a piece of a jigsaw. In your plot, I guess you are setting the origin to the centre of EPSG:3338 (coord_sf(crs = 3338)), which I think is somewhere in Alaska/Canada? (I'm guessing here as I don't use this notation, rather I prefer to transform data before sending to ggplot). Regardless, ggplot knows it should rearrange its 'puzzle pieces', so longitude -179 and +179 are next to each other - but this is purely visual, as in your plot:
So, my guess is that when you try and use st_union() or st_simplify(), the polygons aren't actually next to each other in space so are not joined. This is where a projected CRS should solve the problem, transforming the coords to values relative to an origin other than (long 0, lat 0).
This I think is one source of trouble for you - a quick google of EPSG:3338 says it is good for Alaska, but no mention of Russia. The first thing that came up when I googled 'utm russia' was EPSG:32635. So, let's take a look at the values for longitude for EPSG codes 4326 (WGS84 longlat), 3338 (NAD83 Alaska) and 32635.
# pull out russia
world %>%
filter(
str_detect(name_long, 'Russia')
) %>%
select(name_long, geometry) %>%
{. ->> russia}
# extract coords of each projection
russia %>%
st_transform(3338) %>%
{. ->> russia_3338} %>%
st_coordinates %>%
as_tibble %>%
select(X) %>%
mutate(
crs = 'epsg_3338'
) %>%
{. ->> russia_coords_3338}
russia %>%
st_transform(4326) %>%
{. ->> russia_4326} %>%
st_coordinates %>%
as_tibble %>%
select(X) %>%
mutate(
crs = 'epsg_4326'
) %>%
{. ->> russia_coords_4326}
russia %>%
st_transform(32635) %>%
{. ->> russia_32635} %>%
st_coordinates %>%
as_tibble %>%
select(X) %>%
mutate(
crs = 'epsg_32635'
) %>%
{. ->> russia_coords_32635}
Let's combine them and look at a histogram of longitude values
# inspect X coords on a histogram
bind_rows(
russia_coords_3338,
russia_coords_4326,
russia_coords_32635,
) %>%
ggplot(aes(X))+
geom_histogram()+
facet_wrap(~crs, ncol = 1, scales = 'free')
So, as you can see projections 4326 and 3338 have 2 distinct groups of coords at either ends of the earth, with a big break (spanning x = 0) in between. Projection 32635 though, has only one group of coords, suggesting that the 2 parts of Russia, according to this projection, are numerically positioned next to each other. Projection 32635 works because it transforms the coords into '(minimum?) distance from an origin'; the origin of which (unlike long-lat coords) is not on the opposite side of the world and doesn't need to go 2 different directions around the globe to determine minimum distance to either end of the country (this is what causes the break in longitude coords for the other 2 projections). I don't know enough about EPSG:3338 to explain why it does this too, but suspect it's because it is Alaska-focused so they didn't consider crossing the 180th meridian.
If we plot russia_32635 we can see these pieces are next to each other, but remember we don't trust ggplot just yet. When we use st_simplify() this date line (red) disappears, proving that the 2 polygons are next to each other and can be simplified/unioned.
ggplot()+
geom_sf(data = russia_32635, colour = 'red')+
geom_sf(data = russia_32635 %>% st_simplify, fill = NA)
st_simplify() has dissolved the 2 boundaries on the date line, reducing our number of individual polygons from 100 to 98.
russia_32635 %>%
st_cast('POLYGON')
# Simple feature collection with 100 features and 1 field
# Geometry type: POLYGON
# Dimension: XY
# Bounding box: xmin: 21006.08 ymin: 4772449 xmax: 6273473 ymax: 13233690
# Projected CRS: WGS 84 / UTM zone 35N
russia_32635 %>%
st_simplify %>%
st_cast('POLYGON')
# Simple feature collection with 98 features and 1 field
# Geometry type: POLYGON
# Dimension: XY
# Bounding box: xmin: 21006.08 ymin: 4772449 xmax: 6273473 ymax: 13233690
# Projected CRS: WGS 84 / UTM zone 35N
Alternatively, it looks like st_union(..., by_feature = TRUE) also works - see ?st_union:
If by_feature is TRUE each feature geometry is unioned. This can for instance be used to resolve internal boundaries after polygons were combined using st_combine.
russia_32635 %>%
st_union(by_feature = TRUE) %>%
st_cast('POLYGON')
# Simple feature collection with 98 features and 1 field
# Geometry type: POLYGON
# Dimension: XY
# Bounding box: xmin: 21006.08 ymin: 4772449 xmax: 6273473 ymax: 13233690
# Projected CRS: WGS 84 / UTM zone 35N
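The effect of by_feature = TRUE is easier to see on a toy feature than on Russia. In this sketch two adjacent unit squares (invented geometry, planar, no CRS; sq() is a hypothetical helper) are combined into one MULTIPOLYGON feature that still carries the shared edge, much like a country split along the date line, and then unioned within the feature.

```r
library(sf)

# Hypothetical helper: a unit square whose lower-left corner is (x0, 0)
sq <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
}

# Two halves that touch along x = 1, stored as a single feature
halves <- st_sfc(sq(0), sq(1))
combined <- st_sf(name = "toy", geometry = st_combine(halves))
n_parts_before <- length(st_cast(st_geometry(combined), "POLYGON"))

# Union within the feature: the internal boundary is resolved
dissolved <- st_union(combined, by_feature = TRUE)
n_parts_after <- length(st_cast(st_geometry(dissolved), "POLYGON"))

c(before = n_parts_before, after = n_parts_after)
```

This only works when the pieces really do touch in the coordinate system being used, which is the point made above about choosing a CRS in which both sides of the date line are numerically adjacent.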
So, technically there is your plot of Russia without the date line. I think Russia is tricky to plot because a) it's close to the poles, and b) it covers such a vast area meaning most projections are going to skew from one end of the country to another.
However to me, it makes sense to orient the plot 'north-up'. A way to do this is to make your own 'Mollweide' projection and assign the origin to the approximate centre of Russia (lon 99, lat 65). Without st_buffer(0), this plots with the date line for some reason (see here and here for examples, and section 6.5 here for explanation).
my_proj <- '+proj=moll +lon_0=99 +lat_0=65 +units=m'
russia_32635 %>%
st_buffer(0) %>%
st_transform(crs = my_proj) %>% # pass the proj string directly; crs() comes from raster, which is not loaded here
st_simplify %>%
ggplot()+
geom_sf()
Bonus
I tried plotting russia_32635 %>% st_simplify with tmap and leaflet, but did not get the desired results. I assume this is because these packages prefer geographic (lon-lat) coords; leaflet only accepts longlat format as far as I can tell, and although tmap can certainly handle projected data, my guess is that under the bonnet it transforms it (or similar) to its preferred projection. Workarounds look to be available at the same links as above if you really really want this visualisation (here, here and here).
library(tmap)
russia_32635 %>%
st_simplify %>%
tm_shape()+
tm_polygons()
library(leaflet)
russia_32635 %>%
st_simplify %>%
st_transform(4326) %>% # because leaflet only works with longlat projections
leaflet %>%
addTiles %>%
addPolygons()
Ultimately, you can only preserve 2/3 primary characteristics when projecting data: area, direction or distance. This is made even more obvious when projecting something as big and polar as Russia. Hopefully one of these options is suitable for your problem.
I feel like I made significant progress, so I'm posting, but this isn't a complete answer.
# This is the portion containing the international dateline
df <- world[184, ]
# Split MULTIPOLYGON into individuals
df2 <- st_cast(df, "POLYGON")
# The little blob at the top is in df2[36, ] and df2[38, ]
# Simplify it with the right tolerance and the line is gone
ggplot()+
geom_sf(data=st_simplify(st_union(df2[36, ], df2[38, ]), dTolerance = 2), color="black", size=1)+
coord_sf(crs=3338)+
xlim(c(xmin,xmax))+ylim(c(ymin,ymax))+
theme_bw()
Result:
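Another solution is to use ms_dissolve() from the rmapshaper package
# chukotka is not defined in the original answer; presumably an sf object covering the Chukotka region
chukotka %>%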
st_transform(32660) %>%
rmapshaper::ms_dissolve() %>%
ggplot()+
geom_sf()

Length of polygon in R

I want to calculate the length of each polygon.
-Around each polygon I created points (st_sample),
-from combinations of the points I created all possible polylines,
-for the polylines which are inside the polygon I calculated the length,
-the longest polyline is my result (max length of the polygon).
I wrote code which gets me results, but it is really slow. Do you have a suggestion for improving my code? I know that with two loops I cannot expect a miracle in terms of speed, but I do not know how to get the results another way.
If nothing else, maybe at least an alternative way of creating all polylines from the combinations of points for one polygon in one step, without a loop? :)
Thank you
library(sf)
library(data.table)
poly=st_read(system.file("shape/nc.shp", package="sf"))
poly=poly[1:10,]
poly=st_cast(poly,"POLYGON")
poly$max_length=0
##Combinations of 10 points, without repetition
aa=CJ(1:10,1:10)
aa=aa[!duplicated(t(apply(aa[,.(V1, V2)], 1, sort))),][V1!=V2]
##for each polygon create sample of coordinates along line, from them I create polyline and calculated length for linestring which are inside polygon
for (ii in 1:nrow(poly)){
ncl=st_cast(poly[ii,],"LINESTRING")
##sample of point along line
ncp=st_cast(st_sample(ncl,10, type="regular", exact=T),"POINT")
##create empty sf
aaa=st_sf(st_sfc())
st_crs(aaa)="NAD27"
##for each combination of points create linestring and calculate length only for polylines which are inside polygon
for (i in 1:nrow(aa)){
aaa=rbind(aaa,st_sf(geometry=st_cast(st_union(ncp[t(aa[i])]),"LINESTRING")))
}
poly$max_length[ii]=as.numeric(max(st_length(aaa[unlist(st_contains(poly[ii,],aaa)),])))
}
Second attempt with running function inside data.table. One loop less but problem is probably second loop.
poly=st_read(system.file("shape/nc.shp", package="sf"))
poly=poly[1:10,]
poly=st_cast(poly,"POLYGON")
poly$max_length=0
##Combinations of 10 points, without repetition
aa=CJ(1:10,1:10)
aa=aa[!duplicated(t(apply(aa[,.(V1, V2)], 1, sort))),][V1!=V2]
overFun <- function(x){
ncl=st_cast(x[,geometry],"LINESTRING")
##sample of point along line
ncp=st_cast(st_sample(ncl,40, type="regular", exact=T),"POINT")
##create empty sf
aaa=st_sf(st_sfc())
st_crs(aaa)="NAD27"
##for each combination of points create linestring and calculate length
for (i in 1:nrow(aa)){
aaa=rbind(aaa,st_sf(geometry=st_cast(st_union(ncp[t(aa[i])]),"LINESTRING")))
}
as.numeric(max(st_length(aaa[unlist(st_contains(x[,geometry],aaa)),])))}
setDT(poly)
##run function inside data.table
poly[,max_length:=overFun(poly), by=seq(nrow(poly))]
Edit: I found a solution for my problem which is fast enough for my needs.
It uses the parallel library inside data.table, with a function which also works on a data.table. There is still the question of why some polylines are excluded by st_contains (see the picture above). Maybe a precision problem?
library(sf)
library(data.table)
poly=st_read(system.file("shape/nc.shp", package="sf"))
poly=st_cast(poly,"POLYGON")
setDT(poly)
##Combinations of 10 points, without repetition
aa=CJ(1:10,1:10)
aa=aa[!duplicated(t(apply(aa[,.(V1, V2)], 1, sort))),][V1!=V2]
overFun <- function(x){
ncl=st_cast(poly[1,geometry],"LINESTRING")
##sample of point along line
ncp=st_cast(st_sample(ncl,10, type="regular", exact=T),"POINT")
df=data.table(ncp[aa[,V1]],ncp[aa[,V2]] )
df[,v3:=st_cast(st_union(st_as_sf(V1),st_as_sf(V2)),"LINESTRING"), by=seq(nrow(df))]
as.numeric(max(st_length(df[unlist(st_contains(poly[1,geometry], df$v3)),]$v3)))}
library(parallel)
cl <- makeCluster(detectCores() - 1)
clusterExport(cl, list("overFun","data.table","st_cast","CJ","poly","st_sample","st_sf","st_sfc","aa","st_length","st_union",
"st_as_sf","st_contains"))
system.time(poly[,c("max_length"):=.(clusterMap(cl, overFun, poly$geometry)),])
stopCluster(cl)
I encountered a similar problem and frankly have not found any ready-made solution.
I will use the same Ashe county from sf package.
library(sf)
library(dplyr)
shape <- st_read(system.file("shape/nc.shp", package="sf")) %>%
dplyr::filter(CNTY_ID == 1825) %>% # Keep only one polygon
st_transform(32617) # Reproject to WGS 84 / UTM zone 17N
Solution 1
What you can do with just dplyr, tidyr, and sf is to turn the polygon into points and calculate the distances between all of the points. From these distances, take the maximum. It would be the green line from your example figure.
library(tidyr)
shape %>%
st_cast("POINT") %>% # turn polygon into points
distinct() %>% # remove duplicates
st_distance() %>% # calculate distance matrix
as.data.frame() %>%
gather(point_id, dist) %>% # convert to long format
pull(dist) %>% # keep only distance column
max()
#> Warning in st_cast.sf(., "POINT"): repeating attributes for all sub-geometries
#> for which they may not be constant
#> 45865.15 [m]
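The same "largest pairwise distance between vertices" idea can be checked by hand on a shape where the answer is known. This is a minimal base R sketch on an invented 3 by 4 rectangle; it assumes planar (projected) coordinates, so it would not be appropriate for raw lon/lat values.

```r
# Corner coordinates of a hypothetical 3 x 4 rectangle
coords <- rbind(c(0, 0), c(3, 0), c(3, 4), c(0, 4))

# dist() returns every pairwise Euclidean distance between the rows
max_length <- max(dist(coords))
max_length
```

For a rectangle the longest vertex-to-vertex distance is the diagonal. Note that for a concave polygon the longest such line may pass outside the shape, so this measures the maximum extent and not the longest line contained in the polygon.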
Solution 2
You can also use the Momocs package. It was created for 2D morphometric analysis. While it wasn't essential to reproject our shape to UTM in the first case (sf can handle geographic coordinates), your polygon should be projected in the case of the Momocs package.
library(Momocs)
shape %>%
st_cast("POINT") %>% # Polygon to points
distinct() %>% # remove duplicates
st_coordinates() %>% # get coordinates matrix
coo_calliper() # calculate max length
#> Warning in st_cast.sf(., "POINT"): repeating attributes for all sub-geometries
#> for which they may not be constant
#> [1] 45865.15
Comments
There are several other functions in the Momocs package. For example, you can calculate the length of a shape along its inertia axis, i.e. after alignment to the x-axis. coo_length returns 44432.02 here (in metres, since the shape is projected to UTM).
For example, one can apply several functions from the Momocs package to the coordinate matrix as follows:
point_matrix <- shape %>%
st_cast("POINT") %>%
distinct() %>%
st_coordinates()
#> Warning in st_cast.sf(., "POINT"): repeating attributes for all sub-geometries
#> for which they may not be constant
funs <- list("length" = coo_length,
"width" = coo_width,
"elongation" = coo_elongation)
sapply(funs, function(fun, x) fun(x), x = point_matrix)
#> length width elongation
#> 4.443202e+04 3.921162e+04 1.174917e-01
If you are after the circumference (perimeter) of your polygons, consider this code:
library(sf)
library(dplyr)
shape <- st_read(system.file("shape/nc.shp", package="sf")) # included with sf package
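lengths <- shape %>%
  # cast the polygon boundaries to lines first: the current sf documentation says
  # st_length() returns zero for POLYGON and MULTIPOLYGON geometries
  mutate(circumference = st_length(st_cast(geometry, "MULTILINESTRING"))) %>%
  st_drop_geometry() %>%
  select(NAME, circumference)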
head(lengths)
NAME circumference
1 Ashe 141665.4 [m]
2 Alleghany 119929.0 [m]
3 Surry 160497.7 [m]
4 Currituck 301515.3 [m]
5 Northampton 211953.8 [m]
6 Hertford 160892.0 [m]
If you have some holes inside and do not want them included in the circumference consider removing them via nngeo::st_remove_holes().

Make a vector of coordinates to filter data within a certain area

Rookie R user here, and I would greatly appreciate any help someone could give me.
My project requires me to create a vector bounding box around a city of my choice and then filter a lot of data so that I only have the data relevant to the area. However, it is several years since I have used RStudio and it's fair to say I remember little to nothing about the language.
I have initially used
geocode("Hereford, UK")
bbox <-c(Longitude=-2.72,Latitude=52.1)
myMap <- get_map(location = "Hereford, UK",source="google",maptype="roadmap")
I then must create a new tibble which filters out and gives only the relevant data to the area.
I am unsure how to proceed with this and I then must overlay the data onto the map which I have created.
As I only have a centre point of coordinates, is it possible to create a circle with a radius of say 3 miles around the centre of my location so I can then filter this area?
Thank you all for taking the time to read my post. Cheers!
Most spatial work can now be done pretty easily using the sf package.
Example code for a similar problem is below. The comments explain most of what it does.
The difficult part may be in understanding map projections (the crs). Some use linear units (meters, feet, etc.) and others use latitude / longitude. Which one you choose depends on what area of the globe you're working with and what you're trying to accomplish. Most web mapping data comes as crs 4326, but its units are degrees, so it does not give an easily usable distance measurement.
The map below shows points outside ~3 miles from Hereford as red, and those inside in dark maroon. The blue point is used as the center for Hereford & the buffer zone.
library(tidyverse)
library(sf)
#> Linking to GEOS 3.6.2, GDAL 2.2.3, PROJ 4.9.3
library(mapview)
set.seed(4)
#hereford approx location, ggmap requires api key
#note: x is the longitude (about -2.7) and y is the latitude (about 52)
hereford <- data.frame(place = 'hereford', lon = -2.7160, lat = 52.0564) %>%
  st_as_sf(coords = c('lon', 'lat')) %>% st_set_crs(4326)
#simulation of data points near-ish hereford
random_points <- data.frame(point_num = 1:20,
                            lon = runif(20, min = -2.8, max = -2.6),
                            lat = runif(20, min = 52, max = 52.1)) %>%
  st_as_sf(coords = c('lon', 'lat')) %>% st_set_crs(4326) %>% st_transform(27700)
#make a buffer of ~3miles (4800m) around hereford
h_buffer <- hereford %>% st_transform(27700) %>% #change crs to one measured in meters
st_buffer(4800)
#only points inside ~3mi buffer
points_within <- random_points[st_within( random_points, h_buffer, sparse = F), ]
head(points_within)
#> Simple feature collection with 6 features and 1 field
#> geometry type: POINT
#> dimension: XY
#> bbox: xmin: 346243.2 ymin: 239070.3 xmax: 355169.8 ymax: 243011.4
#> CRS: EPSG:27700
#> point_num geometry
#> 1 1 POINT (353293.1 241673.9)
#> 3 3 POINT (349265.8 239397)
#> 4 4 POINT (349039.5 239217.7)
#> 6 6 POINT (348846.1 243011.4)
#> 7 7 POINT (355169.8 239070.3)
#> 10 10 POINT (346243.2 239690.3)
#shown in mapview
mapview(hereford, color = 'blue') +
mapview(random_points, color = 'red', legend = F, col.regions = 'red') +
mapview(h_buffer, legend = F) +
mapview(points_within, color = 'black', legend = F, col.regions = 'black')
Created on 2020-04-12 by the reprex package (v0.3.0)
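If only a yes/no filter is needed and the buffer polygon itself is not, a distance test gives the same selection without building the circle. The sketch below is an alternative under stated assumptions, not the method used above: the two test points are invented, the radius uses the exact conversion of 1609.344 metres per mile, and everything is reprojected to EPSG:27700 so that distances are in metres.

```r
library(sf)

# Centre of Hereford (approximate, as in the answer above), reprojected to metres
centre <- st_transform(
  st_sfc(st_point(c(-2.7160, 52.0564)), crs = 4326),
  27700
)

# Two hypothetical observations: one close to the centre, one well outside
pts <- st_transform(
  st_sfc(st_point(c(-2.70, 52.06)), st_point(c(-2.50, 52.20)), crs = 4326),
  27700
)

radius_m <- 3 * 1609.344  # 3 miles expressed in metres

# For each point, list the centres within the radius; non-empty means "inside"
inside <- lengths(st_is_within_distance(pts, centre, dist = radius_m)) > 0
inside
```

The logical vector can be used to subset an sf object or tibble directly, for example pts[inside].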
