Spatial Points to Polygons query - r

I have a question regarding converting spatial data in R and bringing it from R into QGIS.
I have a GeoTiff of Antarctic sea ice concentration, downloaded from the link below:
https://seaice.uni-bremen.de/databrowser/#day=13&month=10&year=2022&img=%7B%22image%22%3A%22image-1%22%2C%22product%22%3A%22AMSR%22%2C%22type%22%3A%22visual%22%2C%22region%22%3A%22Antarctic3125%22%7D
I want to extract the contour of the sea ice edge (defined as 15%), and then have this contour in a file type that I can open in QGIS and reproject for use in making other maps. My current understanding is that to do this, I would need to convert the contour to a spatial points df, and then convert that to a spatial polygons df which I would then be able to open as a shapefile in QGIS. However, I think I'm going wrong here as I cannot make the conversion with the below code - any suggestions?
**This is my current workflow:**
library(raster)
library(tidyverse)
library(sp)
library(sf)
#Load in sea ice geotiff
sic <- raster('Environmental_Data/SIC/AMSR2/asi-AMSR2-s3125-20220107-v5.4.tif')/1
plot(sic)
#Make all values over land NA
sic[sic>100] = NA
#Crop to make area smaller (I have a specific area of interest)
sic = crop(sic, extent(sic)*c(0.5,0.5,0,1))
plot(sic)
#Pull out the sea ice edge (15% contour) (this makes it a spatial lines df)
ie = rasterToContour(sic, levels=15)
#Convert to spatial points
ie.pt = as(ie, "SpatialPointsDataFrame")
plot(ie.pt, add=T, pch=16, cex=0.4)
#Convert to spatial polygons
ie.pt_poly <-as(ie.pt, "SpatialPolygons")
#Then I get this error:
Error in as(ie.pt, "SpatialPolygons"):
no method or default for coercing “SpatialPointsDataFrame” to “SpatialPolygons”

Reworking your process with terra and sf:
library(terra)
library(sf)
sic <- rast('~/Downloads/asi-AMSR2-s3125-20221113-v5.4.tif')
sic[sic>100] = NA
sic2 = crop(sic, ext(sic)*c(0.5,0.5,0,1))
sic2_contour <- terra::contour(sic2, maxcells=100000, filled = TRUE) # plot side effect
sic2_cont <- as.contour(sic2)
sic2_cont_disagg <- disagg(sic2_cont)
y <- sf::st_as_sf(sic2_cont_disagg)
y
Simple feature collection with 6519 features and 1 field
Geometry type: LINESTRING
Dimension: XY
Bounding box: xmin: -1968608 ymin: 206384.6 xmax: 1968608 ymax: 3660462
Projected CRS: WGS 84 / NSIDC Sea Ice Polar Stereographic South
First 10 features:
level geometry
1 10 LINESTRING (-1968608 340765...
2 10 LINESTRING (-1955825 232458...
3 10 LINESTRING (-1968608 259539...
4 10 LINESTRING (-1968608 262189...
5 10 LINESTRING (-1960827 264530...
6 10 LINESTRING (-1968608 265308...
7 10 LINESTRING (-1968608 278293...
8 10 LINESTRING (-1943042 251270...
9 10 LINESTRING (-1943042 275001...
10 10 LINESTRING (-1930259 331948...
unique(y$level)
[1] 10 20 30 40 50 60 70 80 90 100
y10 <- y[which(y$level == 10),]
plot(sic2)
plot(y10, col = 'pink', lwd =3, add = TRUE)
I can't think why one would go to points, except perhaps to then buffer and fill one's contour. Instead, write the results out with terra::writeRaster(sic2..., and terra::writeVector(y,..., (or y10), then pull them into QGIS and see. Note that writeVector expects a SpatVector, so pass it sic2_cont_disagg, or write the sf object y with sf::st_write.
Some pink lines fall in the interior of sic2. These are presumably holes in the sea ice that have the same value as the northernmost contour; they could perhaps be removed by testing for within.

I think this is what you are looking for.
library(terra)
r <- rast("asi-AMSR2-s3125-20221113-v5.4.tif")
# crop to the area of interest
e <- ext(-1975000, 1975000, 2e+05, 4350000)
re <- crop(r, e)
# get contour and save to file
v <- as.contour(re, levels=15)
writeVector(v, "contour_lines.shp")
Contours are normally lines (neither points nor polygons). But if you wanted a polygon you could do
x <- ifel(re < 15 | re > 100, NA, 1)
p <- as.polygons(x)
writeVector(p, "contour_polygons.shp")
Or, more generally, use terra::classify to create regions before using as.polygons.
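To make the classify suggestion concrete, here is a minimal sketch. It runs on a small synthetic raster rather than the AMSR2 file, and the 15% threshold and the class codes 0 and 1 are assumptions chosen for illustration.

```r
library(terra)

# synthetic stand-in for a sea ice concentration raster (values 0 to 95)
r <- rast(nrows = 20, ncols = 20, xmin = 0, xmax = 20, ymin = 0, ymax = 20)
values(r) <- rep(seq(0, 95, by = 5), each = 20)

# reclassification matrix: from, to, new class
# class 0 = below the 15% ice edge, class 1 = at or above it
rcl <- matrix(c(0, 15, 0,
                15, 100, 1), ncol = 3, byrow = TRUE)

# right = FALSE makes the intervals [from, to), so a value of exactly 15 is ice
cls <- classify(r, rcl, right = FALSE, include.lowest = TRUE)

# as.polygons dissolves by value by default: one polygon feature per class
p <- as.polygons(cls)
p
```

With real data you would run the same two calls on the cropped raster and then pass the result to writeVector.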

Related

Cleaning Geocode data with r

I am cleaning my dataset and I don't know how to clean the GPS data.
When I use the table function, I find that the values were entered in different formats:
"547140",
"35.6997",
"251825.7959",
"251470.43",
"54/4077070001",
and "54/305495"
I don't know how to clean this variable given how much the formats differ.
I would be thankful if you could help me or suggest a website for training.
Your main issue is standardizing the GPS data by projecting it to a coordinate system of choice. Say we have the GPS coordinates of Amsterdam in two different coordinate systems, one in Amersfoort / RD New (EPSG 28992) and one in WGS 1984 (EPSG 4326):
x y location espg
1: 1.207330e+05 486632.35593 amsterdam 28992
2: 4.884088e+00 52.36651 amsterdam 4326
dt <- structure(list(x = c(120733.012428048, 4.88408811380055), y = c(486632.355933105,
52.3665054922233), location = c("amsterdam", "amsterdam"), espg = c(28992,
4326)), row.names = c(NA, -2L), class = "data.frame")
What we want to do is reproject our coordinates to one coordinate system of choice. In this case I used WGS 1984 (EPSG 4326).
library(sf)
#here I tell R which columns contain the coordinates and convert the table to a spatial object
#(the CRS is left unset for now because it differs per row)
dt <- st_as_sf(dt, coords = c("x", "y"))
#here I split by the different EPSG codes present
dt <- split(dt, dt$espg)
#here I loop through every individual EPSG code present in the dataset
for (i in seq_along(dt)) {
  #here I say which coordinate system (EPSG) the GPS data is in
  st_crs(dt[[i]]) <- unique(dt[[i]]$espg)
  #here I transform the coordinates to another CRS (in this case WGS 1984, EPSG 4326)
  dt[[i]] <- st_transform(dt[[i]], 4326)
}
#here I bind the items of the list together
dt <- do.call(rbind, dt)
head(dt)
head(dt)
Simple feature collection with 2 features and 2 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 4.884088 ymin: 52.36651 xmax: 4.884088 ymax: 52.36651
Geodetic CRS: WGS 84
location espg geometry
4326 amsterdam 4326 POINT (4.884088 52.36651)
28992 amsterdam 28992 POINT (4.884088 52.36651)
In the geometry column you now see that the coordinates are equal to one another.
Bottom line is that you need to know the coordinate reference system the GPS data is in. Then you can convert your data from a table to a spatial object and transform the GPS data to a CRS of choice.
In addition, it is always a good idea to check whether your assumption about the original EPSG code is right, for example by plotting the data.
library(ggplot2)
library(ggspatial)
ggplot(dt) + annotation_map_tile() + geom_sf(size = 4) + theme(text = element_text(size = 15)) + facet_wrap(~espg)
In the figure below we see that the projection went well for both EPSG codes.
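Before any reprojection, the raw strings from the question have to be sorted into groups that might share a coordinate system. The sketch below is one rough way to do that in base R. The helper guess_format and its 180 cutoff are hypothetical, and the labels are only hints: the actual EPSG code of each group still has to come from whoever collected the data.

```r
# raw values copied from the question
raw <- c("547140", "35.6997", "251825.7959", "251470.43",
         "54/4077070001", "54/305495")

# as.numeric() gives NA for anything that is not a plain number
num <- suppressWarnings(as.numeric(raw))

# hypothetical helper: label each value by what it could plausibly be
guess_format <- function(x) {
  ifelse(is.na(x), "not numeric",
         ifelse(abs(x) <= 180, "possibly degrees", "possibly projected"))
}

data.frame(raw = raw, value = num, guess = guess_format(num))
```

Values labelled "not numeric" (the ones containing a slash here) need a separate parsing rule before they can be used at all.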

Issues plotting a combined shapefile in R using ggplot, no defined error given

Libraries used:
library(sp)
library(sf)
library(ggplot2)
library(ggmap)
Created a dataframe called "coordinate.data" with longitude & latitude as column names, & weather station locations as row names.
longitude <- c(-73.964482,-73.953678,-73.893522,-73.815856,-74.148499)
latitude <- c(40.767544,40.631762,40.872481,40.734335,40.604014)
coordinate.data <- data.frame(longitude,latitude)
rownames(coordinate.data) <- c("MANH","BKLN","BRON","QUEE","STAT")
I then retrieved shapefile data of NJ counties and NYC boroughs, & deleted all unnecessary columns so only the geometry field was left in both shapefiles. The NYC Boroughs shapefile data was downloaded from NYC Open Data, & the NJ county boundaries were downloaded from NJGIN Open Data.
nj.shp <- st_read("~/Downloads/NJ/NJ_Counties.shp")
nj <- nj.shp[,-(1:21)]
nyc.shp <- st_read("~/Downloads/NY/NYC_Boroughs.shp")
nyc <- nyc.shp[,-(1:4)]
I formatted both shapefiles to have the same projection (EPSG code 3857) and combined them into a shapefile dataframe with 26 observations (counties/boroughs) in one variable (geometry).
same.projection <- CRS("+init=EPSG:3857")
nj.data <- st_transform(nj,same.projection)
new.projection <- CRS("+init=EPSG:3857")
nyc.data <- st_transform(nyc,new.projection)
combined.data <- rbind(nj.data,nyc.data)
I am now attempting to plot the combined shapefile ("combined.data") on a map, in addition to the weather station locations ("coordinate.data"). When I attempt this, it runs indefinitely & R shuts down. If I remove geom_sf(...), it plots the stations & formats everything correctly, so I assume the issue is with this line of code.
mesonet.map <-ggplot() +
ggtitle("NY Mesonet Site Locations") +
xlab("Longitude") +
ylab("Latitude") +
geom_point(data=coordinate.data,aes(x=longitude,y=latitude))+
geom_text(aes(x=longitude,y=latitude,label=rownames(coordinate.data)),size=3.25,nudge_y=0.02)+
geom_sf(data=combined.data,fill='darkgreen') +
mesonet.map + theme(
panel.background=element_rect(fill="lightblue",color="lightblue",size=0.5,linetype="solid"),
panel.grid.major=element_line(size=0.5,linetype='solid',color="white"),
panel.grid.minor=element_line(size=0.25,linetype='solid',color="white")
)
I'm not too sure what you were deleting from those shape files. I didn't delete anything. Nor did I combine anything. There are four separate layers projected in my output: the base map, NY shape file, NJ shape file, and the shape file called sites that was created and described below.
This map was created using 3 existing shape files, and the site shape file that was created in the following steps. The variables that were created from each step are printed and shown with the descriptions of each step.
First, created the spatial geometry variable:
MULTIPOINT ((-73.96448 40.76754), (-73.95368 40.63176), (-73.89352
40.87248), (-73.81586 40.73434), (-74.1485 40.60401))
Then, created the geometry column (combined the multipoint variable with the crs)
MULTIPOINT ((-73.96448 40.76754), (-73.95368 40...
Geometry set for 1 feature
geometry type: MULTIPOINT
dimension: XY
bbox: xmin: -74.1485 ymin: 40.60401 xmax: -73.81586 ymax: 40.87248
projected CRS: WGS 84 / Pseudo-Mercator
Then created a regular df consisting of one column, the site names.
Then created the simple feature object - (combined the geometry column with the df). Output shown:
A sf: 5 × 2
site pts.sfc
<fct> <MULTIPOINT [m]>
MANH MULTIPOINT ((-73.96448 40.7...
BKLN MULTIPOINT ((-73.96448 40.7...
BRON MULTIPOINT ((-73.96448 40.7...
QUEE MULTIPOINT ((-73.96448 40.7...
STAT MULTIPOINT ((-73.96448 40.7...
Then this sf object was written to the shape folder:
Writing layer `Weather_sites' to data source `C:/pathR' using driver `ESRI Shapefile'
Writing 5 features with 1 fields and geometry type Multi Point.
The output shows the 5 sites in NY and the states of NJ and NY. The NY shape file seems less inclusive and only includes some limited NY data.
Getting the entire mapping thing right involves doing the four steps in order. Geometry, geometry column, sf object, write to shape folder.
The map exported to .png
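The code for the four steps did not survive in this answer, so here is a minimal sketch of how the sites layer could be built with sf. It is not the answerer's code: it uses one POINT per station instead of a repeated MULTIPOINT, declares the coordinates as WGS 84 (EPSG:4326) before reprojecting to EPSG:3857, and writes to tempdir(), which is only a placeholder location.

```r
library(sf)

# station coordinates from the question (WGS 84 longitude/latitude)
sites_df <- data.frame(
  site = c("MANH", "BKLN", "BRON", "QUEE", "STAT"),
  longitude = c(-73.964482, -73.953678, -73.893522, -73.815856, -74.148499),
  latitude = c(40.767544, 40.631762, 40.872481, 40.734335, 40.604014)
)

# steps 1 to 3 in one call: geometry, geometry column with its CRS, sf object
sites <- st_as_sf(sites_df, coords = c("longitude", "latitude"), crs = 4326)

# reproject to the CRS used for the other layers (EPSG:3857 in the question)
sites_3857 <- st_transform(sites, 3857)

# step 4: write the layer to a shapefile
out <- file.path(tempdir(), "Weather_sites.shp")
st_write(sites_3857, out, delete_layer = TRUE, quiet = TRUE)
```

Declaring 4326 first matters: the station coordinates are in degrees, so labelling them as EPSG:3857 directly would place all five sites within a few metres of the origin.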

Make a vector of coordinates to filter data within a certain area

Rookie R user here and I would greatly appreciate any help someone could give me.
My project requires me to create a vector bounding box around a city of my choice and then filter a lot of data so I only have the data relevant to the area. However, it is several years since I have used RStudio and it's fair to say I remember little to nothing about the language.
I have initially used
geocode("Hereford, UK")
bbox <-c(Longitude=-2.72,Latitude=52.1)
myMap <- get_map(location = "Hereford, UK",source="google",maptype="roadmap")
I then must create a new tibble which filters out and gives only the relevant data to the area.
I am unsure how to proceed with this and I then must overlay the data onto the map which I have created.
As I only have a centre point of coordinates, is it possible to create a circle with a radius of say 3 miles around the centre of my location so I can then filter this area?
Thank you all for taking the time to read my post. Cheers!
Most spatial work can now be done pretty easily using the sf package.
Example code for a similar problem is below. The comments explain most of what it does.
The difficult part may be in understanding map projections (the crs). Some use linear units (meters, feet, etc.) and others use latitude / longitude. Which one you choose depends on what area of the globe you're working with and what you're trying to accomplish. Most web mapping uses crs 4326, but that does not include an easily usable distance measurement.
The map below shows points outside ~3 miles from Hereford as red, and those inside in dark maroon. The blue point is used as the center for Hereford & the buffer zone.
library(tidyverse)
library(sf)
#> Linking to GEOS 3.6.2, GDAL 2.2.3, PROJ 4.9.3
library(mapview)
set.seed(4)
#hereford approx location, ggmap requires api key
hereford <- data.frame(place = 'hereford', lon = -2.7160, lat = 52.0564) %>%
st_as_sf(coords = c('lon', 'lat')) %>% st_set_crs(4326)
#simulation of data points near-ish hereford
random_points <- data.frame(point_num = 1:20,
lon = runif(20, min = -2.8, max = -2.6),
lat = runif(20, min = 52, max = 52.1)) %>%
st_as_sf(coords = c('lon', 'lat')) %>% st_set_crs(4326) %>% st_transform(27700)
#make a buffer of ~3miles (4800m) around hereford
h_buffer <- hereford %>% st_transform(27700) %>% #change crs to one measured in meters
st_buffer(4800)
#only points inside ~3mi buffer
points_within <- random_points[st_within( random_points, h_buffer, sparse = F), ]
head(points_within)
#> Simple feature collection with 6 features and 1 field
#> geometry type: POINT
#> dimension: XY
#> bbox: xmin: 346243.2 ymin: 239070.3 xmax: 355169.8 ymax: 243011.4
#> CRS: EPSG:27700
#> point_num geometry
#> 1 1 POINT (353293.1 241673.9)
#> 3 3 POINT (349265.8 239397)
#> 4 4 POINT (349039.5 239217.7)
#> 6 6 POINT (348846.1 243011.4)
#> 7 7 POINT (355169.8 239070.3)
#> 10 10 POINT (346243.2 239690.3)
#shown in mapview
mapview(hereford, color = 'blue') +
mapview(random_points, color = 'red', legend = F, col.regions = 'red') +
mapview(h_buffer, legend = F) +
mapview(points_within, color = 'black', legend = F, col.regions = 'black')
Created on 2020-04-12 by the reprex package (v0.3.0)

create density raster and extract sum by polygon feature

I have a polygon (zones) and a set of coordinates (points). I'd like to create a spatial kernel density raster for the entire polygon and extract the sum of the density by zone. Points outside of the polygon should be discarded.
library(raster)
library(tidyverse)
library(sf)
library(spatstat)
library(maptools)
load(url("https://www.dropbox.com/s/iv1s5butsx2v01r/example.RData?dl=1"))
# alternatively, links to gists for each object
# https://gist.github.com/ericpgreen/d80665d22dfa1c05607e75b8d2163b84
# https://gist.github.com/ericpgreen/7f4d3cee3eb5efed5486f7f713306e96
ggplot() +
geom_sf(data = zones) +
geom_sf(data = points) +
theme_minimal()
I tried converting to ppp with {spatstat} and then using density(), but I'm confused by the units in the result. I believe the problem is related to the units of the map, but I'm not sure how to proceed.
Update
Here's the code to reproduce the density map I created:
zones_owin <- as.owin(as_Spatial(zones))
pts <- st_coordinates(points)
p <- ppp(pts[,1], pts[,2], window=zones_owin, unitname=c("metre","metres"))
ds <- density(p)
r <- raster(ds)
plot(r)
Units are difficult when you work directly with geographic coordinates (lon, lat). If possible you should convert to planar coordinates (which is a requirement for spatstat) and proceed from there. The planar coordinates would typically be in units of meters, but I guess it depends on the specific projection and underlying ellipsoid etc. You can see this answer for how to project to planar coordinates with sf and export to spatstat format using maptools. Note: You have to manually choose a sensible projection (you can use http://epsg.io to find one) and you have to project both the polygon and the points.
Once everything is in spatstat format you can use density.ppp to do kernel smoothing. The resulting grid values (object of class im) are intensities of points, i.e., number of points per square unit (e.g. square meter). If you want to aggregate over some region you can use integral.im(..., domain = ...) to get the expected number of points in this region for a point process model with the given intensity.
I'm not sure if this answers all of your question, but should be a good start. Clarify in a comment or in your question should you need a different type of output.
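As a rough illustration of the density.ppp and integral.im steps described above, here is a sketch on simulated data. The window, the two rectangular zones, the number of points and the bandwidth sigma = 100 are all made up for the example; with the real data, the zones would be the projected polygons converted to owin objects.

```r
library(spatstat)
set.seed(42)

# hypothetical study area in planar coordinates (metres), split into two zones
W <- owin(c(0, 2000), c(0, 1000), unitname = c("metre", "metres"))
zone_a <- owin(c(0, 1000), c(0, 1000))
zone_b <- owin(c(1000, 2000), c(0, 1000))

# simulated points standing in for the projected point data
p <- runifpoint(200, win = W)

# kernel-smoothed intensity in points per square metre (sigma chosen arbitrarily)
ds <- density(p, sigma = 100)

# expected number of points per zone = integral of the intensity over that zone
n_a <- integral(ds, domain = zone_a)
n_b <- integral(ds, domain = zone_b)
n_all <- integral(ds)
c(zone_a = n_a, zone_b = n_b, total = n_all)
```

The per-zone values are counts, not densities, so they can be compared directly with the number of points observed in each zone.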
The code below removes all points that are not inside one of the 'zone' polygons, counts them by zone and plots the zones colored by the number of points that fall within.
library(raster)
library(tidyverse)
library(sf)
#> Linking to GEOS 3.6.2, GDAL 2.2.3, PROJ 4.9.3
library(spatstat)
library(maptools)
#> Checking rgeos availability: TRUE
load(url("https://www.dropbox.com/s/iv1s5butsx2v01r/example.RData?dl=1"))
# alternatively, links to gists for each object
# https://gist.github.com/ericpgreen/d80665d22dfa1c05607e75b8d2163b84
# https://gist.github.com/ericpgreen/7f4d3cee3eb5efed5486f7f713306e96
p1 <- ggplot() +
geom_sf(data = zones) +
geom_sf(data = points) +
theme_minimal()
#Remove points outside of zones
points_inside <- st_intersection(points, zones)
#> although coordinates are longitude/latitude, st_intersection assumes that they are planar
#> Warning: attribute variables are assumed to be spatially constant throughout all
#> geometries
nrow(points)
#> [1] 308
nrow(points_inside)
#> [1] 201
p2 <- ggplot() +
geom_sf(data = zones) +
geom_sf(data = points_inside)
points_per_zone <- st_join(zones, points_inside) %>%
count(LocationID.x)
#> although coordinates are longitude/latitude, st_intersects assumes that they are planar
p3 <- ggplot() +
geom_sf(data = points_per_zone,
aes(fill = n)) +
scale_fill_viridis_c(option = 'C')
points_per_zone
#> Simple feature collection with 4 features and 2 fields
#> geometry type: POLYGON
#> dimension: XY
#> bbox: xmin: 34.0401 ymin: -1.076718 xmax: 34.17818 ymax: -0.9755066
#> epsg (SRID): 4326
#> proj4string: +proj=longlat +ellps=WGS84 +no_defs
#> # A tibble: 4 x 3
#> LocationID.x n geometry
#> * <dbl> <int> <POLYGON [°]>
#> 1 10 129 ((34.08018 -0.9755066, 34.0803 -0.9757393, 34.08046 -0.975…
#> 2 20 19 ((34.05622 -0.9959458, 34.05642 -0.9960835, 34.05665 -0.99…
#> 3 30 29 ((34.12994 -1.026372, 34.12994 -1.026512, 34.12988 -1.0266…
#> 4 40 24 ((34.11962 -1.001829, 34.11956 -1.002018, 34.11966 -1.0020…
cowplot::plot_grid(p1, p2, p3, nrow = 2, ncol = 2)
It seems I underestimated the difficulty of your problem. Is something like the plot below (& underlying data) what you're looking for?
It uses raster with ~50x50 grid, raster::focal with a window of 9x9 using the mean to interpolate the data.

how to merge a shapefile with a dataframe with latitude/longitude data

I am struggling with the following issue
I have downloaded the PLUTO NYC Manhattan Shapefile for the NYC tax lots from here https://www1.nyc.gov/site/planning/data-maps/open-data/dwn-pluto-mappluto.page
I am able to read them in sf with a simple st_read
> mydf
Simple feature collection with 42638 features and 90 fields
geometry type: MULTIPOLYGON
dimension: XY
bbox: xmin: 971045.3 ymin: 188447.4 xmax: 1010027 ymax: 259571.5
epsg (SRID): NA
proj4string: +proj=lcc +lat_1=40.66666666666666 +lat_2=41.03333333333333 +lat_0=40.16666666666666 +lon_0=-74 +x_0=300000 +y_0=0 +datum=NAD83 +units=us-ft +no_defs
First 10 features:
Borough Block Lot CD CT2010 CB2010 SchoolDist Council ZipCode FireComp PolicePrct HealthCent HealthArea
1 MN 1545 52 108 138 4000 02 5 10028 E022 19 13 3700
My problem is the following: I have a dataframe as follows
> data_frame('lat' = c(40.785091,40.785091), 'lon' = c(-73.968285, -73.968285))
# A tibble: 2 x 2
lat lon
<dbl> <dbl>
1 40.785091 -73.968285
2 40.785091 -73.968285
I would like to merge this data to the mydf dataframe above, so that I can count how many latitude/longitude observations I have within each tax lot (remember, mydf is at the tax lot granularity), and plot the corresponding map of it. I need to do so using sf.
In essence something similar to
pol <- mydf %>% select(SchoolDist)
plot(pol)
but where the counts for each tax lot come from counting how many points in my latitude/longitude dataframe fall into them.
Of course, in my small example I just have 2 points in the same tax lot, so that would just highlight one single tax lot in the whole area. My real data contains a lot more points.
I think there is an easy way to do it, but I was not able to find it.
Thanks!
This is how I would do it with arbitrary polygon and point data. I wouldn't merge the two and instead just use a geometry predicate to get the counts that you want. Here we:
Use the built in nc dataset and transform to 3857 crs, which is projected rather than lat-long (avoids a warning in st_contains)
Create 1000 random points within the bounding box of nc, using st_bbox and runif. Note that st_as_sf can turn a data.frame with lat long columns into sf points.
Use lengths(st_contains(polygons, points)) to get the counts of points per polygon. sgbp objects created by a geometry predicate are basically "for each geometry in sf x, what indices of geometries in sf y satisfy the predicate". So lengths effectively gives the number of points that satisfy the predicate for each geometry, in this case the number of points contained within each polygon.
Once the counts are in the sf object as a column, we can just select and plot them with the plot.sf method.
For your data, simply replace nc with mydf and leave out the call to tibble, instead use your data.frame with the right lat long pairs.
library(tidyverse)
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.2.3, proj.4 4.9.3
nc <- system.file("shape/nc.shp", package="sf") %>%
read_sf() %>%
st_transform(3857)
set.seed(1000)
points <- tibble(
x = runif(1000, min = st_bbox(nc)[1], max = st_bbox(nc)[3]),
y = runif(1000, min = st_bbox(nc)[2], max = st_bbox(nc)[4])
) %>%
st_as_sf(coords = c("x", "y"), crs = 3857)
plot(nc$geometry)
plot(points$geometry, add = TRUE)
nc %>%
mutate(pt_count = lengths(st_contains(nc, points))) %>%
select(pt_count) %>%
plot()
Created on 2018-05-02 by the reprex package (v0.2.0).
I tried this on your data, but the intersection is empty for both sets of points you provided. However, the code should work.
EDIT: Simplified group_by + mutate with add_count:
mydf = st_read("MN_Dcp_Mappinglot.shp")
xydf = data.frame(lat=c(40.758896,40.758896), lon=c(-73.985130, -73.985130))
xysf = st_as_sf(xydf, coords=c('lon', 'lat'), crs=st_crs(mydf))
## NB: crs= only declares the CRS of the coordinates; make sure to st_transform both to a common CRS, as Calum You suggests
xysf %>%
sf::st_intersection(mydf) %>%
dplyr::add_count(LOT)
Reproducible example:
library(sf)
library(dplyr) # for %>% and add_count
nc = sf::st_read(system.file("shape/nc.shp", package="sf"))
ncxy = sf::st_as_sf(data.frame(lon=c(-80, -80.1, -82), lat=c(35.5, 35.5, 35.5)),
coords=c('lon', 'lat'), crs=st_crs(nc))
## first approach: st_intersection (stored under a separate name so ncxy stays untouched)
ncxy_int = ncxy %>%
sf::st_intersection(nc) %>%
dplyr::add_count(FIPS)
## a better approach
ncxy = ncxy %>%
sf::st_join(nc, join=st_intersects) %>%
dplyr::add_count(FIPS)
The new column n includes the total number of points per FIPS code.
ncxy %>% dplyr::group_by(FIPS) %>% dplyr::distinct(n)
> although coordinates are longitude/latitude, st_intersects assumes
that they are planar
# A tibble: 2 x 2
# Groups: FIPS [2]
FIPS n
<fctr> <int>
1 37123 2
2 37161 1
I'm not sure why your data results in an empty intersection, but since the code works on the example above there must be a separate issue.
HT: st_join approach from this answer.
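One possible cause of an empty intersection, offered as a guess rather than a diagnosis of the asker's data: passing crs=st_crs(mydf) to st_as_sf only labels the longitude/latitude numbers as if they were already in the projected CRS of the polygons, it does not convert them. The sketch below shows the difference on the nc dataset. EPSG:32119 (NAD83 / North Carolina) is just an assumed projected CRS for illustration.

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
# projected copy of the polygons, standing in for a state-plane shapefile
nc_proj <- st_transform(nc, 32119)

xydf <- data.frame(lon = c(-80, -80.1, -82), lat = c(35.5, 35.5, 35.5))

# declared only: degrees are labelled as projected metres, no conversion happens
declared_only <- st_as_sf(xydf, coords = c("lon", "lat"), crs = st_crs(nc_proj))

# declared as lon/lat (EPSG:4326) first, then transformed to the polygon CRS
transformed <- st_transform(
  st_as_sf(xydf, coords = c("lon", "lat"), crs = 4326),
  st_crs(nc_proj)
)

# number of polygons hit by each point under the two approaches
lengths(st_intersects(declared_only, nc_proj))
lengths(st_intersects(transformed, nc_proj))
```

If the first call returns zeros while the second finds a polygon for each point, the same declare-then-transform pattern is worth trying on the tax lot data.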

Resources