Aggregate features by distance in sf objects - r

I have two sf objects: a polygon layer county (note: this is a multipolygon layer, i.e. many counties) and a point layer monitor2.
The county looks like below. Chinese characters cannot be displayed properly, but it's not a big deal.
Simple feature collection with 6 features and 4 fields
geometry type: MULTIPOLYGON
dimension: XY
bbox: xmin: 113.15 ymin: 20.58265 xmax: 124.5656 ymax: 40.10793
epsg (SRID): 4326
proj4string: +proj=longlat +datum=WGS84 +no_defs
City District Province Code geometry
1 <U+53F0><U+6E7E><U+7701> <U+53F0><U+6E7E><U+7701> <U+53F0><U+6E7E><U+7701> 710000 MULTIPOLYGON (((116.7346 20...
2 <U+5317><U+4EAC><U+5E02> <U+671D><U+9633><U+533A> <U+5317><U+4EAC><U+5E02> 110105 MULTIPOLYGON (((116.4834 40...
3 <U+4E0A><U+6D77><U+5E02> <U+666E><U+9640><U+533A> <U+4E0A><U+6D77><U+5E02> 310107 MULTIPOLYGON (((121.3562 31...
4 <U+4E0A><U+6D77><U+5E02> <U+5B9D><U+5C71><U+533A> <U+4E0A><U+6D77><U+5E02> 230506 MULTIPOLYGON (((121.4855 31...
5 <U+5E7F><U+5DDE><U+5E02> <U+767D><U+4E91><U+533A> <U+5E7F><U+4E1C><U+7701> 440111 MULTIPOLYGON (((113.4965 23...
6 <U+798F><U+5DDE><U+5E02> <U+9F13><U+697C><U+533A> <U+798F><U+5EFA><U+7701> 320106 MULTIPOLYGON (((119.2611 26...
The monitor2 looks like below.
Simple feature collection with 6 features and 5 fields
geometry type: POINT
dimension: XY
bbox: xmin: 116.17 ymin: 39.8673 xmax: 116.473 ymax: 40.2865
epsg (SRID): 4326
proj4string: +proj=longlat +datum=WGS84 +no_defs
# A tibble: 6 x 6
code name city ref value geometry
<chr> <chr> <chr> <chr> <dbl> <POINT [°]>
1 1001A 万寿西宫 北京 N 47.8 (116.366 39.8673)
2 1002A 定陵 北京 Y 45.9 (116.17 40.2865)
3 1003A 东四 北京 N 42.2 (116.434 39.9522)
4 1004A 天坛 北京 N 51.2 (116.434 39.8745)
5 1005A 农展馆 北京 N 46.9 (116.473 39.9716)
6 1006A 官园 北京 N 49.5 (116.361 39.9425)
The first task is to join the value feature in monitor2 to county. I did this with st_is_within_distance and st_join. See the code below. I set distance to be 50 km. Some counties in the new polygon may have values from multiple points within the 50 km buffer.
new = st_join(county, monitor2,
  join = st_is_within_distance, dist = 50000) # for long/lat data dist is in metres, so 50 km = 50000
Here comes the second task. I need to aggregate values from different points within that 50 km buffer by their distances to the centroid of the county. How do I achieve this task?
Any comments are welcome.

It's difficult to know exactly what you want without reproducible data, but here's an attempt to show how you can do this.
Get sample data. We reproject from lat/long to a CRS in metres so we can do distance-based spatial operations. We'll use three counties from the sample data, take the middle county as the polygon we want to measure distances from, and scatter a random sample of points across the three counties.
library(sf)
nc <- st_read(system.file("shape/nc.shp", package="sf"))
nc <- st_transform(nc, 32119) # NC state plane projection in metres
county = st_cast(nc[2,],"POLYGON")
p1 = st_as_sf(st_sample(nc[1:3, ], 200)) # random points
# Visualize
plot(st_geometry(nc)[1:3])
plot(county, col = "grey80", add = TRUE)
We want to focus only on points within some distance from our target county. Let's see what that looks like by adding a buffer using st_buffer.
plot(st_buffer(county, dist = 10000), col = NA, border = "red", lty = 3, add = TRUE)
We can subset the points within 10000 m of the central county by using st_is_within_distance, which accomplishes the same as intersecting the points with the st_buffer object.
p1_10 <- p1[st_is_within_distance(county,p1,dist = 10000, sparse = FALSE),]
Measuring the distance between the centroid and each element of this subset is straightforward. We can then assign the distance measurement as a variable in the subset spatial object.
p1_10$distance_to_centroid <- as.vector(st_distance(st_centroid(county), p1_10))
Here's what that looks like plotted all together:
plot(st_geometry(nc)[1:3])
plot(county, col = "grey80", add = TRUE)
plot(p1, add = TRUE, pch = 19)
plot(st_buffer(county, dist = 10000), col = NA, border = "red", lty = 3, add = TRUE)
plot(st_centroid(county), col = "red", pch = 15, cex = 1, axes = TRUE, add = TRUE)
plot(p1_10["distance_to_centroid"], add = TRUE, pch = 19)
This is what the p1_10 obj looks like here:
> p1_10
Simple feature collection with 78 features and 1 field
geometry type: POINT
dimension: XY
bbox: xmin: 389967.6 ymin: 293489.4 xmax: 448197.1 ymax: 315140.7
CRS: EPSG:32119
First 10 features:
x distance_to_centroid
1 POINT (437228.1 294079.7) 21703.5425
2 POINT (425029.8 305656.7) 5868.4917
3 POINT (425131.4 309137.8) 6665.0253
4 POINT (409851.2 294971.7) 14549.0585
5 POINT (393070.6 303879.7) 26207.5651
6 POINT (436666.3 296282.2) 20070.5879
7 POINT (442623.8 295976.3) 25549.5662
8 POINT (400517.2 307897.4) 18746.6918
9 POINT (418763.7 306728) 724.6165
10 POINT (405001.4 294845.7) 18125.0738
So from here you can aggregate your features by distance using whatever method you want. In dplyr, it's pretty straightforward. Suppose for example here I wanted to aggregate in 5km intervals.
library(dplyr)
p1_10 %>%
mutate(dist_group = ceiling(distance_to_centroid/5000)) %>%
group_by(dist_group) %>%
tally() %>% # stop here if you want the result to retain geography
as_tibble() %>%
select(dist_group, n)
# A tibble: 7 x 2
dist_group n
<dbl> <int>
1 1 7
2 2 15
3 3 22
4 4 13
5 5 11
6 6 9
7 7 1
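If the goal is a single distance-weighted value per county rather than a count per distance band, one option is an inverse-distance weighted mean. The sketch below is only an illustration under stated assumptions: the value column is simulated, the 10 km threshold is reused from above, and the weighting 1 / distance is one choice among many (the question does not specify a weighting scheme).

```r
library(sf)

set.seed(1)
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc <- st_transform(nc, 32119)            # projected CRS in metres
counties <- nc[1:3, "NAME"]

# Simulated monitoring points with a made-up `value` column (assumption)
g <- st_sample(counties, 200)
pts <- st_sf(value = runif(length(g), 40, 55), geometry = g)

# For each county: indices of the points within 10 km of the polygon
idx <- st_is_within_distance(counties, pts, dist = 10000)
cent <- st_centroid(st_geometry(counties))

# Inverse-distance weighted mean of `value`, distances measured to the centroid
counties$idw_value <- vapply(seq_along(idx), function(i) {
  j <- idx[[i]]
  if (length(j) == 0) return(NA_real_)   # no points near this county
  d <- as.numeric(st_distance(cent[i], pts[j, ]))
  w <- 1 / pmax(d, 1)                    # guard against a zero distance
  sum(w * pts$value[j]) / sum(w)
}, numeric(1))

counties
```

Swapping the weight line for something else (for example 1 / d^2, or equal weights within each 5 km band) changes the aggregation without touching the rest.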

Related

having trouble testing point/geometry intersections in sf

i am trying to figure out how to use st_intersects() to test whether or not point data that i have falls inside the geometries of some map data i have.
data i'm working with: https://osfm.fire.ca.gov/media/5818/fhszs19sn.zip
other data i'm working with too: https://osfm.fire.ca.gov/media/7564/c19fhszl06_5.zip
for now, i'm just trying to see if this data falls in the polygons of the above shapefile:
la_test_points <- data.frame(y = runif(1000, 33.6, 34.8), x = runif(1000, -119, -117.6))
when i put my map data and point data together, this is what it looks like:
so far, so good. now i attempt to test point/geometry intersections. as the figure suggests, i should be able to get quite a few.
# changing coordinate system of map created by shape file
la_fire_sra <- st_transform(st_as_sf(la_fire_sra), crs = 3857)
# merging test points with map data
la_test_points_merged <- st_as_sf(la_test_points, coords = c('y', 'x'), crs = st_crs(la_fire_sra))
# seeing if points fall within any of the geometries in the shapefile
la_test_points_merged <- la_test_points_merged %>%
mutate(intersection = st_intersects(geometry, la_fire_sra))
that last bit is where it all goes wrong. rstudio doesn't throw an error, but when i print la_test_points_merged to see my results, this is what i see:
> la_test_points_merged
Simple feature collection with 1000 features and 1 field
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 33.60155 ymin: -118.9959 xmax: 34.79907 ymax: -117.6015
Projected CRS: WGS 84 / Pseudo-Mercator
First 10 features:
Error in xj[i, , drop = FALSE] : incorrect number of dimensions
the last line above is in red.
when i try using st_intersection() instead of st_intersects(), i get a different error:
> la_test_points_merged <- la_test_points_merged %>%
+ mutate(intersection = st_intersection(geometry, la_fire_sra))
Error in `stopifnot()`:
! Problem while computing `intersection = st_intersection(geometry, la_fire_sra)`.
x `intersection` must be size 1000 or 1, not 0.
Run `rlang::last_error()` to see where the error occurred.
i would like to end up with a result that tells me whether or not each of the points in la_test_points is contained by any of the geometry values in la_fire_sra.
how can i fix this to make my code work? i have looked at lots of other similar questions, but i can't seem to find any answers that apply to my current situation.
thanks in advance for any help.
You can join the points to the shapefile, and the result will show you the fire hazard for each point that falls within a polygon. The default for an st_join is st_intersects, but you can change it if you'd like.
Below I've used one of the shapefiles you linked. If you need to use both you can combine them for a single dataframe with all the polygons. Looks like they have different columns though, so some cleaning might be needed.
library(tidyverse)
library(sf)
set.seed(3) #to make la_test_points reproducible
a <- read_sf('fhszs06_3_19.shp')
# Create synthetic data, make it an sf object, and set the crs
la_test_points <- data.frame(y = runif(1000, 33.6, 34.8), x = runif(1000, -119, -117.6)) %>%
st_as_sf(coords = c('x','y')) %>%
st_set_crs(4326) %>%
st_transform(st_crs(a))
# join the points with the fire hazard area
joined <- st_join(la_test_points, a)
# the sf dataframe, lots of NA's so they're removed for a look:
joined %>% filter(!is.na(HAZ_CODE)) %>% head()
#> Simple feature collection with 6 features and 5 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 125951 ymin: -433789.6 xmax: 177186.8 ymax: -369094
#> Projected CRS: NAD_1983_Albers
#> SRA HAZ_CODE HAZ_CLASS Shape_Leng Shape_Area geometry
#> 1 SRA 3 Very High 613618.0 686671532 POINT (163249.3 -395328.4)
#> 2 SRA 3 Very High 250826.8 233414399 POINT (127980.6 -433789.6)
#> 3 SRA 3 Very High 613618.0 686671532 POINT (167675.9 -386506.6)
#> 4 SRA 3 Very High 391522.6 297194108 POINT (143421.2 -369094)
#> 5 SRA 2 High 208122.8 211364977 POINT (177186.8 -388738.9)
#> 6 SRA 3 Very High 613618.0 686671532 POINT (125951 -399105.6)
# Plotting points, colored according to fire hazard code
ggplot() +
geom_sf(data = a) +
geom_sf(data = joined, aes(color = HAZ_CODE)) +
scale_color_gradient(low = 'yellow', high = 'red')
Created on 2022-11-08 with reprex v2.0.2
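If all that is needed is a TRUE/FALSE column rather than the joined attributes, the sparse result of st_intersects() can be turned into a logical vector with lengths(). The sketch below uses the nc dataset shipped with sf as a stand-in for the fire hazard polygons (an assumption, since the original shapefile is not bundled here). It also shows the two details that matter in the question's code: coords must be given as x (longitude) first, and the points need their own CRS (4326) before being transformed to the CRS of the polygons.

```r
library(sf)

# Stand-in polygons (assumption): ten counties from the nc sample data
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
polys <- st_transform(nc[1:10, ], 32119)

set.seed(3)
pts <- data.frame(x = runif(100, -82, -76), y = runif(100, 34, 36.5))
pts <- st_as_sf(pts, coords = c("x", "y"), crs = 4326)  # x = lon, y = lat
pts <- st_transform(pts, st_crs(polys))                 # match the polygon CRS

# One list element per point; a non-empty element means at least one hit
pts$in_polygon <- lengths(st_intersects(pts, polys)) > 0
table(pts$in_polygon)
```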
Edit to address joining the example shapefiles:
# Keeping the columns that the example shapefiles have in common,
# and joining them together.
# (b is assumed to be the second example shapefile, read with read_sf() in the same way as a)
ax <- a %>% select(HAZ_CODE, HAZ_CLASS, Shape_Leng, Shape_Area)
bx <- b %>% select(HAZ_CODE, HAZ_CLASS, Shape_Leng, Shape_Area)
fires <- rbind(ax, bx)
head(fires)
Simple feature collection with 6 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 151597.9 ymin: -470591.9 xmax: 198216 ymax: -443900.4
Projected CRS: NAD83 / California Albers
# A tibble: 6 × 5
HAZ_CODE HAZ_CLASS Shape_Leng Shape_Area geometry
<int> <chr> <dbl> <dbl> <MULTIPOLYGON [m]>
1 3 Very High 5415. 1355567. (((152996.8 -469302.2, 152996.9 -469302.2, 152965.9 -469339.9, 152957.5 -…
2 3 Very High 2802. 423658. (((153701.7 -468506, 153703.9 -468590.6, 153708 -468758.1, 153707.6 -4687…
3 3 Very High 802. 32272. (((191491 -449977.1, 191494.3 -449973.2, 191517.3 -449946.5, 191521.5 -44…
4 3 Very High 1097. 40800. (((182453.8 -445649.1, 182216.3 -445706.6, 182215.4 -445655.7, 182170.4 -…
5 3 Very High 59226. 9379764. (((198201 -446611.2, 198199.9 -446580, 198199.1 -446551.3, 198200 -446580…
6 3 Very High 1255. 70800. (((186617.7 -444161.6, 186619 -444164.5, 186630.5 -444192.8, 186561.8 -44…

Why is my intersection of lat lon points in R not aligning with the correct zip codes?

This has been driving me nuts all day. I am mapping lat/lon coordinates to zip codes in North Carolina. ~5000 out of the 6000 points are mapping just fine, however, points around the military bases (Fort Bragg) were not mapping to any zip code. I wrote code to get the nearest zip code to each point to map it to those zip codes, but when I go back to check to make sure it worked correctly, it is mapping the points to zip codes not even close to where the lat/lon coordinates are.
Here is a link to the shapefile
https://www.nconemap.gov/datasets/nconemap::zip-code-tabulation-areas/about
library(sf)
library(tidyverse)
### Sample data from 3 points that did not work.
POINT_LAT = c(35.18, 35.181, 35.182)
POINT_LON = c(-79.19, -79.272, -79.24)
all_points = cbind(POINT_LAT, POINT_LON)
zipcode_nc = read.sf(NC_zipcodes.shp)
zipmap = st_transform(zipcode_nc, crs = 4326)
locations_zip = st_as_sf(all_points, coords = "POINT_LON", "POINT_LAT"), crs = st_crs(zipmap))
point_zips = locations_zip %>%
mutate(intersection = as.integer(st_intersects(geometry, zipmap)), area = if_else(is.na(intersection), ' ', zipmap$GEOID10,[intersection]))
## Try to map missing points to closest zip code.
points_zips_external_nearest = point_zips %>%
filter(is.na(intersection)) %>%
st_nearest_feature(zipmap)
points_zips_external = points_zips %>%
filter(is.na(intersection)) %>%
mutate(zip = point_zips$ZIP[points_zips_external_nearest])
This gives the wrong zip codes for the points.
Firstly, your example code caused me a couple of errors:
Couldn't find read.sf() function - do you mean read_sf()?
I couldn't make locations_zip using st_as_sf() without making all_points a tibble first.
When making points_zips, specifically when mutating area, the comma in zipmap$GEOID10,[intersection] isn't needed and causes an error.
As well as making sure your code is working, it would be helpful if you included the "wrong" results you are getting, and also the "right" results which you are expecting. That way other people can see if they are getting the right/wrong results themselves.
I simplified some of the code a bit, and I think I am getting the correct ZIP codes.
First, load libraries and data. I used st_read() to import the shapefile, keeping the folder name it had when downloaded.
library(sf)
library(tidyverse)
library(tmap)
# sample points
POINT_LAT = c(35.18, 35.181, 35.182)
POINT_LON = c(-79.19, -79.272, -79.24)
all_points = cbind(POINT_LAT, POINT_LON)
# zipcode_nc = read.sf(NC_zipcodes.shp)
zipcode_nc <- st_read('ZIP_Code_Tabulation_Areas/ZIP_Code_Tabulation_Areas.shp')
zipmap = st_transform(zipcode_nc, crs = 4326)
zipmap
# Simple feature collection with 808 features and 8 fields
# Geometry type: MULTIPOLYGON
# Dimension: XY
# Bounding box: xmin: -84.32186 ymin: 33.84231 xmax: -75.46062 ymax: 36.58811
# Geodetic CRS: WGS 84
# First 10 features:
# OBJECTID ZCTA5CE10 AFFGEOID10 GEOID10 ALAND10 AWATER10 ShapeSTAre ShapeSTLen geometry
# 1 1 28306 8600000US28306 28306 177888344 2457841 180219025.11 155040.4114 MULTIPOLYGON (((-79.06381 3...
# 2 2 28334 8600000US28334 28334 414866754 3998968 418736167.65 161264.1606 MULTIPOLYGON (((-78.73546 3...
# 3 3 28169 8600000US28169 28169 1450978 0 1413700.15 6504.1990 MULTIPOLYGON (((-81.43786 3...
# 4 4 27278 8600000US27278 27278 262556156 2234207 264733578.27 138272.0708 MULTIPOLYGON (((-79.20634 3...
# 5 5 28325 8600000US28325 28325 2203868 0 2175834.07 7088.4067 MULTIPOLYGON (((-78.11489 3...
# 6 6 28472 8600000US28472 28472 469545124 1646985 471198707.46 225630.0920 MULTIPOLYGON (((-78.85764 3...
# 7 7 27841 8600000US27841 27841 684051 0 669653.06 4155.6886 MULTIPOLYGON (((-77.28181 3...
# 8 8 28280 8600000US28280 28280 19577 0 19572.48 560.1441 MULTIPOLYGON (((-80.8442 35...
# 9 9 28560 8600000US28560 28560 304195338 65656182 314791215.62 206410.4354 MULTIPOLYGON (((-77.15578 3...
# 10 10 27881 8600000US27881 27881 1760084 0 1765449.38 8334.0644 MULTIPOLYGON (((-77.44656 3...
# locations_zip = st_as_sf(all_points, coords = c("POINT_LON", "POINT_LAT"), crs = st_crs(zipmap))
locations_zip = st_as_sf(all_points %>% as_tibble, coords = c("POINT_LON", "POINT_LAT"), crs = st_crs(zipmap))
locations_zip
# Geometry type: POINT
# Dimension: XY
# Bounding box: xmin: -79.272 ymin: 35.18 xmax: -79.19 ymax: 35.182
# Geodetic CRS: WGS 84
# # A tibble: 3 x 1
# geometry
# * <POINT [°]>
# 1 (-79.19 35.18)
# 2 (-79.272 35.181)
# 3 (-79.24 35.182)
First up I tried plotting the data on a map using the tmap package:
# a map
tm_shape(zipmap)+
tm_polygons()
# Error: Shape contains invalid polygons. Please fix it or set tmap_options(check.and.fix = TRUE) and rerun the plot
The error indicates that some of the polygons are invalid, which can be due to various reasons such as self-intersections. See ?st_make_valid for more details.
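As a small illustration of what "invalid" means here, the sketch below builds a deliberately self-intersecting "bowtie" polygon (a made-up example, not taken from the ZIP code data), checks it with st_is_valid(), and repairs it with st_make_valid().

```r
library(sf)

# A ring that crosses itself at (0.5, 0.5): a classic invalid polygon
bowtie <- st_sfc(st_polygon(list(rbind(
  c(0, 0), c(1, 1), c(1, 0), c(0, 1), c(0, 0)
))))

st_is_valid(bowtie)                 # the self-intersection makes this invalid

fixed <- st_make_valid(bowtie)      # splits the bowtie into valid parts
st_is_valid(fixed)
st_geometry_type(fixed)
```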
So, I make the polygons valid using st_make_valid(), and then use st_cast() to separate each individual polygon. Without this some ZIP codes have multiple polygons so plotting the labels in the next step is tricky.
# make the polygons valid
zipmap %>%
st_make_valid() %>%
st_cast('POLYGON') %>%
{. ->> zipmap_2}
zipmap_2
# Simple feature collection with 966 features and 8 fields
# Geometry type: POLYGON
# Dimension: XY
# Bounding box: xmin: -84.32186 ymin: 33.84231 xmax: -75.46062 ymax: 36.58811
# Geodetic CRS: WGS 84
# First 10 features:
# OBJECTID ZCTA5CE10 AFFGEOID10 GEOID10 ALAND10 AWATER10 ShapeSTAre ShapeSTLen geometry
# 1 1 28306 8600000US28306 28306 177888344 2457841 180219025 155040.411 POLYGON ((-79.06394 34.9998...
# 1.1 1 28306 8600000US28306 28306 177888344 2457841 180219025 155040.411 POLYGON ((-78.8688 34.89214...
# 1.2 1 28306 8600000US28306 28306 177888344 2457841 180219025 155040.411 POLYGON ((-78.86443 34.9142...
# 1.3 1 28306 8600000US28306 28306 177888344 2457841 180219025 155040.411 POLYGON ((-78.8571 34.90978...
# 1.4 1 28306 8600000US28306 28306 177888344 2457841 180219025 155040.411 POLYGON ((-79.05672 34.9803...
# 2 2 28334 8600000US28334 28334 414866754 3998968 418736168 161264.161 POLYGON ((-78.73424 35.1739...
# 3 3 28169 8600000US28169 28169 1450978 0 1413700 6504.199 POLYGON ((-81.43807 35.3547...
# 4 4 27278 8600000US27278 27278 262556156 2234207 264733578 138272.071 POLYGON ((-79.20665 35.9857...
# 5 5 28325 8600000US28325 28325 2203868 0 2175834 7088.407 POLYGON ((-78.11554 35.1526...
# 6 6 28472 8600000US28472 28472 469545124 1646985 471198707 225630.092 POLYGON ((-78.85624 34.4161...
Also make an id column for the three points, so we can plot a label on the map to see which one is which and which ZIP code is closest.
# make point ID for the 3 points
locations_zip %>%
mutate(
id = row_number()
) %>%
{. ->> locations_zip_2}
locations_zip_2
#
# Simple feature collection with 3 features and 1 field
# Geometry type: POINT
# Dimension: XY
# Bounding box: xmin: -79.272 ymin: 35.18 xmax: -79.19 ymax: 35.182
# Geodetic CRS: WGS 84
# # A tibble: 3 x 2
# geometry id
# * <POINT [°]> <int>
# 1 (-79.19 35.18) 1
# 2 (-79.272 35.181) 2
# 3 (-79.24 35.182) 3
Now we can plot a map of the ZIP codes and the three points. We include labels for each feature too. I set a bbox inside tm_shape() so it focuses on our three points, and add ZIP code labels (I assume the values in the ZCTA5CE10 column are the ZIP codes?). I use st_centroid() to find the centre of each polygon, as the coordinates to place the ZIP code label.
# now make the map again
tmap_mode('view')
tm_shape(zipmap_2, bbox = locations_zip_2 %>% st_buffer(3000) %>% st_bbox)+
tm_polygons()+
tm_shape(zipmap_2 %>% st_centroid)+
tm_text(text = 'ZCTA5CE10', size = 2)+
tm_shape(locations_zip_2)+
tm_dots(col = 'red')+
tm_text(text = 'id', ymod = -3, col = 'red', size = 2)
Then, we use st_nearest_feature() to find the nearest polygon by index, and then pull out the ZIP code at that index.
# which is the closest zip code to each point?
locations_zip_2 %>%
mutate(
nearest_index = st_nearest_feature(., zipmap_2),
nearest_zip_code = zipmap_2$ZCTA5CE10[nearest_index]
)
# Simple feature collection with 3 features and 3 fields
# Geometry type: POINT
# Dimension: XY
# Bounding box: xmin: -79.272 ymin: 35.18 xmax: -79.19 ymax: 35.182
# Geodetic CRS: WGS 84
# # A tibble: 3 x 4
# geometry id nearest_index nearest_zip_code
# * <POINT [°]> <int> <int> <chr>
# 1 (-79.19 35.18) 1 76 28315
# 2 (-79.272 35.181) 2 117 28394
# 3 (-79.24 35.182) 3 76 28315
As far as I can tell, these ZIP codes match the map so I assume they are "correct". If not, feel free to edit the question with the outputs you are getting, and what you are expecting.
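One detail worth spelling out: st_nearest_feature() returns row indices into the polygon layer, so the lookup has to be done on the polygon object (the question's code indexed the points object instead, which is a likely source of the wrong ZIP codes). A minimal sketch of "use the containing polygon, fall back to the nearest one" follows. It uses the nc counties from sf as a stand-in for the ZIP code layer, and the third point is a made-up offshore location added to trigger the fallback.

```r
library(sf)

# Stand-in polygon layer (assumption): NC counties instead of ZIP code areas
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
polys <- st_transform(nc[, "NAME"], 32119)

pts <- data.frame(lon = c(-79.19, -79.272, -70.0),   # third point is offshore
                  lat = c(35.18, 35.181, 35.182))
pts <- st_as_sf(pts, coords = c("lon", "lat"), crs = 4326)
pts <- st_transform(pts, st_crs(polys))

hit <- st_intersects(pts, polys)            # containing polygon(s), if any
inside <- lengths(hit) > 0
first_hit <- vapply(hit, function(h) if (length(h)) h[1] else NA_integer_,
                    integer(1))
nearest <- st_nearest_feature(pts, polys)   # row index into polys

# Index the POLYGON layer, not the points, with the chosen row index
idx <- ifelse(inside, first_hit, nearest)
pts$area <- polys$NAME[idx]
pts
```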

Strange behaviour when ggplotting line thickness depending on field of a sf "MULTILINESTRING"

I have a simple feature collection (lines) with the geometry type "MULTILINESTRING" that i want to plot with ggplot2. The line thickness should represent the value of a field. Unfortunately I am not able to provide a reprex. But maybe there is an obvious mistake that can be easily identified.
lines
Output:
Simple feature collection with 171 features and 1 field
geometry type: MULTILINESTRING
dimension: XY
bbox: xmin: 579649.7 ymin: 5801899 xmax: 625387 ymax: 5851019
projected CRS: ETRS89 / UTM zone 32N
First 10 features:
strahler geometry
1 1 MULTILINESTRING ((580016.3 ...
2 1 MULTILINESTRING ((582188.7 ...
3 1 MULTILINESTRING ((581499.9 ...
4 1 MULTILINESTRING ((586581.9 ...
5 1 MULTILINESTRING ((584296.4 ...
6 1 MULTILINESTRING ((583833.5 ...
7 1 MULTILINESTRING ((584608.8 ...
8 1 MULTILINESTRING ((583096.1 ...
9 1 MULTILINESTRING ((588869.2 ...
10 1 MULTILINESTRING ((587474.7 ...
I am plotting it with
lines %>%
mutate(strahler = as.integer(strahler)) %>%
ggplot() +
geom_sf(aes(size = strahler))
The result looks really messy. It seems that using size this way affects the vertices of the lines.
Reprex
When I tried to provide a reprex, this messy look didn't occur
lines <-
  tibble(
    x = c(1, 2, 5, 2, 4),
    y = c(0, 3, 4, 3, 2),
    strahler = c(2L, 2L, 2L, 1L, 1L)
  ) %>%
  st_as_sf(coords = c("x", "y")) %>%
  st_sf(crs = 25832) %>%
  group_by(strahler) %>%
  summarise() %>%
  st_cast("MULTILINESTRING")
lines
Simple feature collection with 2 features and 1 field
geometry type: MULTILINESTRING
dimension: XY
bbox: xmin: 1 ymin: 0 xmax: 5 ymax: 4
projected CRS: ETRS89 / UTM zone 32N
# A tibble: 2 x 2
strahler geometry
<int> <MULTILINESTRING [m]>
1 1 ((2 3, 4 2))
2 2 ((1 0, 2 3, 5 4))
Plotting it
lines %>%
mutate(strahler = as.integer(strahler)) %>%
ggplot() +
geom_sf(aes(size = strahler))
I guess it has to do with the geometry field which looks different but I don't know how to convert it. I am happy about any hint!
Thanks to @mrhellmann: setting the argument lineend = "round" in geom_sf(), in combination with scale_size_identity(), solved this issue, although it seems more like a workaround than a proper solution to me.
Using this code works:
lines %>%
mutate(strahler = as.integer(strahler)) %>%
ggplot() +
geom_sf(aes(size = strahler), lineend = "round") +
scale_size_identity()
However, no legend is shown with this solution!
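The missing legend is the default behaviour of the identity scales, which use guide = "none"; passing guide = "legend" should bring it back. The sketch below assumes ggplot2 3.4.0 or later, where line thickness in geom_sf() is controlled by the linewidth aesthetic. On older versions the equivalent would be aes(size = strahler) with scale_size_identity(guide = "legend"). The two lines are the toy geometries from the reprex above.

```r
library(sf)
library(ggplot2)

# Toy data mirroring the reprex: two lines with a Strahler order each
l1 <- st_linestring(rbind(c(2, 3), c(4, 2)))
l2 <- st_linestring(rbind(c(1, 0), c(2, 3), c(5, 4)))
lines <- st_sf(strahler = c(1L, 2L),
               geometry = st_sfc(l1, l2, crs = 25832))

p <- ggplot(lines) +
  geom_sf(aes(linewidth = strahler), lineend = "round") +
  scale_linewidth_identity(guide = "legend")  # identity scales hide the legend by default

p
```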

How can I bin data into hexagons of a shapefile and plot it?

I am new to R and also to this website. I ran into some trouble with my current distribution project. My goal is to create a map with hexagons that have a colour gradient based on different attributes of each hexagon, for example the number of records, the number of species, rarefaction, etc. I started with two shapefiles.
One for the hexagons:
Simple feature collection with 10242 features and 4 fields
geometry type: MULTIPOLYGON
dimension: XY
bbox: xmin: -180 ymin: -90 xmax: 180 ymax: 90
CRS: 4326
First 10 features:
ID CENTRELAT CENTRELON AREA geometry
1 -43.06618 41.95708 41583.14 MULTIPOLYGON (((43.50039 -4...
2 -73.41802 -144.73583 41836.20 MULTIPOLYGON (((-147.695 -7...
4862 -82.71189 -73.45815 50247.96 MULTIPOLYGON (((-78.89901 -...
7162 88.01938 53.07438 50258.17 MULTIPOLYGON (((36.63494 87...
3 -75.32015 -145.44626 50215.61 MULTIPOLYGON (((-148.815 -7...
4 -77.21239 -146.36437 50225.85 MULTIPOLYGON (((-150.2982 -...
5 -79.11698 -147.60550 50234.84 MULTIPOLYGON (((-152.3518 -...
6 -81.03039 -149.37750 50242.49 MULTIPOLYGON (((-155.3729 -...
7 -82.94618 -152.11105 50248.70 MULTIPOLYGON (((-160.2168 -...
8 -84.84996 -156.85274 50253.03 MULTIPOLYGON (((-169.0374 -...
And one for the map: geometry type: POLYGON; dimension: XY; bbox: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513; CRS: 4326
It is the land shapefile from this link:
natural earth data
I loaded them with the st_read function. And created a map with this code:
ggplot() +
geom_sf(data = hex5) +
geom_sf(data = land) +
coord_sf(1, xlim = c(100, 180), ylim = c(0, 90))
The map
I have a data frame that contains species names, longitude and latitude. Roughly 6300 entries.
scientific lat lon
1 Acoetes melanonota 11.75690 124.8010
2 Acoetes melanonota 11.97500 102.7350
3 Acoetes melanonota 13.33000 100.9200
4 Acrocirrus muroranensis 42.31400 140.9670
5 Acrocirrus uchidai 43.04800 144.8560
6 Acrocirrus validus 35.30000 139.4830
7 Acutomunna minuta 29.84047 130.9178
8 Admetella longipedata 13.35830 120.5090
9 Admetella longipedata 13.60310 120.7570
10 Aega acuticauda 11.95750 124.1780
How can I bin this data into the hexagons of the map and colour them with a gradient?
Thank you very much!
As I understand it, you have some points and some polygons. You want to summarise the values of the points by the polygon they are in. I made a reproducible example of a possible solution:
library(sf)
library(data.table)
library(dplyr)
# Create a hexagonal grid
sfc = sf::st_sfc(sf::st_polygon(list(rbind(c(0,0), c(1,0), c(1,1), c(0,0)))))
G = sf::st_make_grid(sfc, cellsize = .1, square = FALSE)
# Convert to sf object (seq_along() avoids hard-coding the number of cells)
G = sf::st_as_sf(data.table(id_hex=seq_along(G), geom=sf::st_as_text(G)), wkt='geom')
# Create random points on the grid with random value
n=500
p = data.table(id_point=1:n,
value = rnorm(n),
x=sample(seq(0,1,0.01), n, replace=T),
y=sample(seq(0,1,0.01), n, replace=T)
)
p = p[x >= y]
P = sf::st_as_sf(p, coords=c('x', 'y'))
# Plot geometry
plot(sf::st_geometry(G))
plot(P, add=TRUE)
# Join the geometries to associate each polygon to the points it contains
# Group by and summarise
J = sf::st_join(G, P, join=sf::st_contains) %>%
  dplyr::group_by(id_hex) %>%
  dplyr::summarise(sum_value=sum(value, na.rm=F),
                   count_value=sum(!is.na(value)), # empty hexagons count 0 rather than 1
                   mean_value=mean(value, na.rm=F))
plot(J)
# Plot interactive map with mapview package
mapview::mapview(J, zcol="count_value") +
mapview::mapview(P)
Created on 2020-04-25 by the reprex package (v0.3.0)
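Applied to the data in the question, the same idea could look like the sketch below. Several things here are assumptions made so the example is self-contained: the four records are copied from the table in the question, the hexagon layer is generated with st_make_grid() instead of being read from the shapefile, and the column names n_records and n_species are made up. With the real data, hex would be the hexagon shapefile and ID its identifier column.

```r
library(sf)
library(dplyr)

# A few records copied from the question's table
records <- data.frame(
  scientific = c("Acoetes melanonota", "Acoetes melanonota",
                 "Acrocirrus validus", "Aega acuticauda"),
  lat = c(11.7569, 11.975, 35.3, 11.9575),
  lon = c(124.801, 102.735, 139.483, 124.178)
)
pts <- st_as_sf(records, coords = c("lon", "lat"), crs = 4326)

# Stand-in hexagon layer (assumption): 5-degree hexagons over the study area
bb <- st_bbox(c(xmin = 100, ymin = 5, xmax = 145, ymax = 45), crs = st_crs(4326))
g <- st_make_grid(st_as_sfc(bb), cellsize = 5, square = FALSE)
hex <- st_sf(ID = seq_along(g), geometry = g)

# Tag each point with its hexagon, then summarise per hexagon
counts <- st_join(pts, hex) %>%
  st_drop_geometry() %>%
  group_by(ID) %>%
  summarise(n_records = n(), n_species = n_distinct(scientific))

# Attach the summaries back to the hexagons (NA where a hexagon has no records)
hex_counts <- left_join(hex, counts, by = "ID")

# Colour gradient, e.g.:
# ggplot2::ggplot(hex_counts) + ggplot2::geom_sf(ggplot2::aes(fill = n_records))
```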

How to convert X and Y coordinates into Latitude and longitude?

Following is an example of the data frame I have that was obtained from a publicly available crime data set for St. Louis. The documentation related to the data states that the Xcoord and Ycoord are in
State Plane North American Datum 1983 (NAD83) format
CodedMonth Description XCoord YCoord
1: 2019-09 AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR 908297.3 1018623.0
2: 2019-09 ASSLT-AGGRAV-OTH-WPN-2ND-CHILD-DOMESTIC 903995.7 1014255.0
3: 2019-09 FORGERY-ISSUING FALSE INSTRUMENT OR CERTIFICAT 0.0 0.0
4: 2019-09 STLG BY DECEIT/IDENTITY THEFT REPORT 890704.7 1010659.0
5: 2019-09 STALKING (HARASSMENT ONLY, NO THREAT) 881105.8 1008297.0
6: 2019-09 LARCENY-MTR VEH PARTS UNDER $500 882929.6 992941.3
How do I convert the XCoord and YCoord columns into lon and lat format so that I can plot them using ggmap?
I have found a couple of answers, such as "Convert latitude/longitude to state plane coordinates",
but I can't seem to get them to work for my data.
You can use the sf package to convert it to a simple features object.
In order to get this to work you need to know what coordinate system you are working with. Based on the description you provide (State Plane NAD83, near St. Louis), my first guess was EPSG 26996 (NAD83 / Missouri East USFT), but that plotted in the middle of Lake Huron, so I tried ESRI:102696. You can look up projections at spatialreference.org.
library(sf)
library(tidyverse)
library(ggmap)
my_df <- read_csv("C:/Users/Brian/Documents/temp.csv")
my_sf_df <- st_as_sf(my_df, coords = c("XCoord", "YCoord"), crs = 102696)
This sets the x and y to spatial coordinates. You need to re-project into a geographic system like WGS84 to convert to lat long. st_transform does this for us using crs = 4326, which is the WGS 84 coordinate system
my_latlon_df <- st_transform(my_sf_df, crs = 4326 )
my_latlon_df <- my_latlon_df %>%
  mutate(lat = st_coordinates(my_latlon_df)[, 2], # Y is latitude
         lon = st_coordinates(my_latlon_df)[, 1]) # X is longitude
my_latlon_df
# Simple feature collection with 6 features and 5 fields
# geometry type: POINT
# dimension: XY
# bbox: xmin: -93.26566 ymin: 35.80151 xmax: -90.19163 ymax: 38.63065
# epsg (SRID): 4326
# proj4string: +proj=longlat +datum=WGS84 +no_defs
# # A tibble: 6 x 6
# X1 CodedMonth Description geometry lat lon
# * <chr> <chr> <chr> <POINT [°]> <dbl> <dbl>
# 1 1: 2019-09 AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR (-90.19163 38.63065) 38.6 -90.2
# 2 2: 2019-09 ASSLT-AGGRAV-OTH-WPN-2ND-CHILD-DOMESTIC (-90.20674 38.6187) 38.6 -90.2
# 3 3: 2019-09 FORGERY-ISSUING FALSE INSTRUMENT OR CERTIFICAT (-93.26566 35.80151) 35.8 -93.3
# 4 4: 2019-09 STLG BY DECEIT/IDENTITY THEFT REPORT (-90.25329 38.60893) 38.6 -90.3
# 5 5: 2019-09 STALKING (HARASSMENT ONLY, NO THREAT) (-90.2869 38.60251) 38.6 -90.3
# 6 6: 2019-09 LARCENY-MTR VEH PARTS UNDER $500 (-90.28065 38.56034) 38.6 -90.3
We now have geographic coordinates with lat and lon as columns of our data frame. The record with no location information (point 3, with coordinates 0, 0) is going to cause problems, since it will plot at the origin of the state plane coordinate system, which is down in Arkansas somewhere. Let's remove it so we can focus on the good points.
# let's exclude point 3 for now
my_latlon_df <- my_latlon_df[-3,]
box <- st_bbox(my_latlon_df) # bounding box
names(box) <- NULL # removing non-compliant labels
buffer = .2
box2 <- box + c(-buffer, -buffer, buffer, buffer) # buffering
base_map <- get_map(location = box2, source = "osm") # getting base map
# plotting
ggmap(base_map)+
geom_sf(data = my_latlon_df,
color = "red",
size = 2
)+
scale_x_continuous(limits = c(-90.35, -90.1))+
scale_y_continuous(limits = c(38.5, 38.7))
Unfortunately, if you don't know what coordinate system your x and y points are in, it can become a frustrating game of trial and error. The projected coordinate systems basically create a Cartesian plane on the surface of the globe, and the choice of origin, scale and other parameters are specific to each projection. There isn't nearly as much difference in geographic coordinate systems such as WGS84.
The correct projection is "ESRI:102696", given with its authority prefix, so the st_as_sf() call should read:
my_sf_df <- st_as_sf(my_df, coords = c("XCoord", "YCoord"), crs = "ESRI:102696" )
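Putting the pieces together, a compact version of the whole conversion might look like the sketch below. It is a sketch under assumptions: the three coordinate pairs are copied from the question's table, rows with 0/0 coordinates are treated as "no location" and dropped before conversion, and the installed PROJ database is assumed to know the ESRI:102696 definition.

```r
library(sf)

# Coordinates copied from the question (third row has no location)
df <- data.frame(
  XCoord = c(908297.3, 903995.7, 0.0),
  YCoord = c(1018623.0, 1014255.0, 0.0)
)

# Drop records without a location before converting
df <- df[df$XCoord > 0 & df$YCoord > 0, ]

# State plane feet -> sf points -> WGS 84 longitude/latitude
pts <- st_as_sf(df, coords = c("XCoord", "YCoord"), crs = "ESRI:102696")
ll <- st_transform(pts, 4326)

# st_coordinates() returns columns X (longitude) and Y (latitude)
xy <- st_coordinates(ll)
df$lon <- xy[, "X"]
df$lat <- xy[, "Y"]
df
```

Keeping lon and lat as plain columns on the original data frame makes it easy to pass them to ggmap layers such as geom_point(aes(x = lon, y = lat)).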
