How to extract the largest polygon in a raster?

I can read a raster and extract polygons with the following code:
library(raster)
library(geojsonsf)
library(sf)
library(exactextractr)
r <- raster(matrix(rnorm(10*12), nrow=10), xmn = -180, xmx= 180, ymn = -90, ymx= 90)
myurl <- "http://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_050_00_500k.json"
geo <- readLines(myurl)
geo <- paste0(geo, collapse = "")
system.time({ sf <- geojson_sf(geo)})
#add crs information for the raster 'r'
crs(r) <- 4326
# extract the 'r' raster value for each polygon 'NAME' in 'sf'
res <- do.call(rbind, exactextractr::exact_extract(r, sf, include_cols = 'NAME'))[-3]
Several polygons can fall within a single pixel, and I need to extract not all of them but only the largest polygon in each pixel.

Here's some code that does what I think you want, which is to find the polygon with the largest coverage of each raster cell. I have modified the code so the raster cells can be uniquely identified, and I use GEO_ID because NAME is not unique in the data (there are 31 Washingtons, for example). I use dplyr to find the maximum coverage fraction for each cell and mapview to view the results and convince myself that the code is working.
library(raster)
library(geojsonsf)
library(sf)
library(exactextractr)
library(mapview)
library(dplyr)
# Give the raster cells a unique identifier so we can use it to find which
# polygon is the largest in a given cell
r <- raster(matrix(1:120, nrow=10), xmn = -180, xmx= 180, ymn = -90, ymx= 90)
crs(r) <- 4326
myurl <- "http://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_050_00_500k.json"
geo <- readLines(myurl)
geo <- paste0(geo, collapse = "")
sf <- geojson_sf(geo)
# Find the polygons that overlap with the raster cells.
# The coverage fraction says how much overlap there is, so
# simply find the largest while grouping by the id
# of the raster cell.
# Use GEO_ID because NAME has many duplicates
overlay <- do.call(rbind, exactextractr::exact_extract(r, sf, include_cols = c('GEO_ID', 'NAME')))
# Find the maximum in each raster cell
maximum_in_raster <- overlay %>% group_by(value) %>% top_n(1, coverage_fraction)
# Create a subset of polygons corresponding to the maxima
# This is where GEO_ID is important
maximum_polygons <- sf[sf$GEO_ID %in% maximum_in_raster$GEO_ID, 'NAME']
# Make a grid to display
sf_grid <- st_make_grid(r, n=c(12,10))
# Use mapview to have a look at the results to convince ourselves that it's working
mapview(maximum_polygons, alpha.regions=1) + mapview(sf_grid, col.regions='white')
Here's an example showing the maxima in the south west of the US.
The 4 polygons shown are Lake, Nye, San Bernardino and San Luis Obispo. The horizontal and vertical lines show the boundaries of the raster cells.
And to make the point about duplicate names, here is the code to count how many times each name appears in each raster cell.
counts <- overlay %>% count(value, NAME, sort = T)
head(counts)
value NAME n
1 33 Washington 13
2 23 Lincoln 12
3 33 Franklin 12
4 23 Washington 10
5 23 Douglas 9
6 23 Grant 9
Which shows 13 Washingtons in cell 33.
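As a side note, top_n() has since been superseded in dplyr; on a recent version the grouped maximum can be written with slice_max(). A minimal sketch (note that top_n() keeps ties, whereas with_ties = FALSE picks a single row per cell):
library(dplyr)
maximum_in_raster <- overlay %>%
  group_by(value) %>%
  slice_max(coverage_fraction, n = 1, with_ties = FALSE) %>%
  ungroup()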

Related

create evenly spaced polylines over counties using R

I would like to create evenly spaced polylines going north to south, with 50 miles of spacing between each line and each line 10 miles long. I'm not sure if this is possible using the sf package. In the example below, I would like to have the lines filling the counties across the state of Washington.
library(tigris)
library(leaflet)
library(dplyr) # for filter() and %>%
states <- states(cb = TRUE)
counties <- counties(cb = TRUE)
counties <- counties %>% filter(STATEFP == 53)
states <- states %>% filter(NAME == "Washington")
leaflet(states) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addPolygons(fillColor = "white",
              color = "black",
              weight = 0.5) %>%
  addPolygons(data = counties, color = 'red', fillColor = 'white') %>%
  setView(-120.5, 47.3, zoom = 8)
I've updated to include an image of what I'd like to do below.
You can create a multilinestring sf object from scratch by specifying coordinates.
You can get these coordinates from the extent (bounding box) of Washington, but you may also be interested in knowing how to create a grid, which I will demonstrate below because it may be helpful.
Copy and paste this reproducible example:
library(tidyverse)
library(tigris)
library(leaflet)
library(sf)
library(raster)
states <- states(cb = TRUE)
# subset for WA and transform to a meter-based CRS
states <- states %>%
filter(NAME == "Washington") %>%
st_transform(crs = 3857) # Mercator
# fifty miles in meters
fm <- 80467.2
# convert the sf object to sp for use with raster
states_sp <- as(states, "Spatial")
# create a grid, convert it to polygons to plot
grid <- raster(extent(states_sp),
resolution = c(fm, fm),
crs = proj4string(states_sp))
grid <- rasterToPolygons(grid)
plot(states_sp)
plot(grid, add = TRUE)
# find the top y coordinate and calculate 50 mile intervals moving south
ty <- extent(grid)[4] # y coordinate along northern WA edge
ty <- ty - (fm * 0:7) # y coordinates moving south at 50 mile intervals
# create a list of sf linestring objects
l <- vector("list", length(ty))
for (i in seq_along(l)) {
  l[[i]] <-
    st_linestring(
      rbind(
        c(extent(grid)[1], ty[i]),
        c(extent(grid)[2], ty[i])
      )
    )
}
# create the multilinestring, which expects a list of linestrings
ml <- st_multilinestring(l)
plot(states_sp)
plot(as(ml, "Spatial"), add = TRUE, col = "red")
As you can see, I switch back and forth between sf and sp objects using the functions as(sf_object, "Spatial") and st_as_sf(sp_object). Use these to transform the data to your needs.
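If you would rather avoid the sp detour entirely, sf can build the same 50-mile grid on its own. A sketch using st_make_grid() with the fm spacing defined above:
# a 50-mile grid directly from the sf object (no raster/sp conversion)
grid_sf <- st_make_grid(states, cellsize = fm)
plot(st_geometry(states))
plot(grid_sf, add = TRUE)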

sp::over(): does a point belong to one of the polygons identified in an OGRGeoJSON file?

I'm trying to get a boolean vector where, for example, v[i] = 1 tells me whether the i-th point (a latitude/longitude pair from a train dataframe) falls within one of the geographical areas identified by an OGRGeoJSON file.
The OGR file is structured roughly like this:
District 1: 24 polygons
District 2: 4 polygons
District 3: 27 polygons
District 4: 18 polygons
District 5: 34 polygons
Here's what I tried. However, the results are not correct, because the polygon that is generated is a mix of all the various areas present in the OGR file.
library(rgdal)
library(httr)
library(sp)
r <- GET('https://data.cityofnewyork.us/api/geospatial/tqmj-j8zm?method=export&format=GeoJSON')
nyc_neighborhoods <- readOGR(content(r,'text'), 'OGRGeoJSON', verbose = F)
#New York City polygonal
pol_lat <- c(nyc_neighborhoods_df$lat)
pol_long <- c(nyc_neighborhoods_df$long)
xy <- cbind(pol_lat, pol_long)
p = Polygon(xy)
ps = Polygons(list(p),1)
pol = SpatialPolygons(list(ps))
#Points to analyse (pair of coordinates)
ny_lat <- c(train$pickup_latitude, train$dropoff_latitude)
ny_long <- c(train$pickup_longitude, train$dropoff_longitude)
ny_coord <- cbind(ny_lat, ny_long)
pts <- SpatialPoints(ny_coord)
#Query: Does the point to analyze fall in or out NYC?
over(pts, pol, returnList = TRUE)
How can I fix this to get the correct result?
sp is an older package which is being phased out in favor of the newer "Simple Features" sf package. Let me know if you are open to using the pipe operator %>% from the magrittr package, as it works nicely with sf (as do dplyr and purrr).
Using sf, you could do:
library(sf)
# Replace this with the path to the geojson file
geojson_path <- "path/to/file.geojson"
boroughs <- sf::st_read(dsn = geojson_path, stringsAsFactors = FALSE)
Now let's make a very simple spatial point object to stand in for the "train" data.
# Make test data.frame
test_df <-
  data.frame(
    # Random test point I chose, a couple of blocks from Central Park
    a = "manhattan_point",
    y = 40.771959,
    x = -73.964128,
    stringsAsFactors = FALSE)
# Turn the test_df into a spatial object
test_point <-
  sf::st_as_sf(
    test_df,
    # The coords argument tells the st_as_sf function
    # which columns store the longitude and latitude data
    # that it uses to associate a spatial point with each
    # row of the data.frame
    coords = c("x", "y"),
    crs = 4326 # WGS84
  )
Now we are ready to determine what polygon(s) our point falls in:
# Get the sparse binary predicate. This will give a list with as
# many elements as there are spatial objects in the first argument,
# in this case, test_point, which has 1 element.
# It also has attributes which detail what the relationship is
# (intersection, in our case)
sparse_bin_pred <- sf::st_intersects(test_point, boroughs)
# Output the boro_name that matched. I think the purrr package
# offers some more intuitive ways to do this, but lapply works fine:
lapply(
  sparse_bin_pred,
  function(x) boroughs$boro_name[x]
)
That last part outputs:
[[1]]
[1] "Manhattan"

how to merge a shapefile with a dataframe with latitude/longitude data

I am struggling with the following issue.
I have downloaded the PLUTO NYC Manhattan Shapefile for the NYC tax lots from here: https://www1.nyc.gov/site/planning/data-maps/open-data/dwn-pluto-mappluto.page
I am able to read it into sf with a simple st_read:
> mydf
Simple feature collection with 42638 features and 90 fields
geometry type: MULTIPOLYGON
dimension: XY
bbox: xmin: 971045.3 ymin: 188447.4 xmax: 1010027 ymax: 259571.5
epsg (SRID): NA
proj4string: +proj=lcc +lat_1=40.66666666666666 +lat_2=41.03333333333333 +lat_0=40.16666666666666 +lon_0=-74 +x_0=300000 +y_0=0 +datum=NAD83 +units=us-ft +no_defs
First 10 features:
Borough Block Lot CD CT2010 CB2010 SchoolDist Council ZipCode FireComp PolicePrct HealthCent HealthArea
1 MN 1545 52 108 138 4000 02 5 10028 E022 19 13 3700
My problem is the following: I have a dataframe as follows
> data_frame('lat' = c(40.785091,40.785091), 'lon' = c(-73.968285, -73.968285))
# A tibble: 2 x 2
lat lon
<dbl> <dbl>
1 40.785091 -73.968285
2 40.785091 -73.968285
I would like to merge this data with the mydf dataframe above, so that I can count how many latitude/longitude observations fall within each tax lot (remember, mydf is at the tax-lot granularity), and plot the corresponding map. I need to do this using sf.
In essence something similar to
pol <- mydf %>% select(SchoolDist)
plot(pol)
but where the counts for each tax lot come from counting how many points in my latitude/longitude dataframe fall into them.
Of course, in my small example I just have 2 points in the same tax lot, so that would just highlight one single tax lot in the whole area. My real data contains a lot more points.
I think there is an easy way to do it, but I was not able to find it.
Thanks!
This is how I would do it with arbitrary polygon and point data. I wouldn't merge the two and instead just use a geometry predicate to get the counts that you want. Here we:
Use the built in nc dataset and transform to 3857 crs, which is projected rather than lat-long (avoids a warning in st_contains)
Create 1000 random points within the bounding box of nc, using st_bbox and runif. Note that st_as_sf can turn a data.frame with lat long columns into sf points.
Use lengths(st_contains(polygons, points)) to get the counts of points per polygon. sgbp objects created by a geometry predicate are basically "for each geometry in sf x, which indices of geometries in sf y satisfy the predicate?". So lengths() effectively gives the number of points that satisfy the predicate for each geometry, in this case the number of points contained within each polygon.
Once the counts are in the sf object as a column, we can just select and plot them with the plot.sf method.
For your data, simply replace nc with mydf and leave out the call to tibble, instead use your data.frame with the right lat long pairs.
library(tidyverse)
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.2.3, proj.4 4.9.3
nc <- system.file("shape/nc.shp", package="sf") %>%
read_sf() %>%
st_transform(3857)
set.seed(1000)
points <- tibble(
  x = runif(1000, min = st_bbox(nc)[1], max = st_bbox(nc)[3]),
  y = runif(1000, min = st_bbox(nc)[2], max = st_bbox(nc)[4])
) %>%
  st_as_sf(coords = c("x", "y"), crs = 3857)
plot(nc$geometry)
plot(points$geometry, add = TRUE)
nc %>%
mutate(pt_count = lengths(st_contains(nc, points))) %>%
select(pt_count) %>%
plot()
Created on 2018-05-02 by the reprex package (v0.2.0).
I tried this on your data, but the intersection is empty for the both sets of points you provided. However, the code should work.
EDIT: Simplified group_by + mutate with add_count:
mydf = st_read("MN_Dcp_Mappinglot.shp")
xydf = data.frame(lat=c(40.758896,40.758896), lon=c(-73.985130, -73.985130))
xysf = st_as_sf(xydf, coords=c('lon', 'lat'), crs=st_crs(mydf))
## NB: make sure to st_transform both to common CRS, as Calum You suggests
xysf %>%
sf::st_intersection(mydf) %>%
dplyr::add_count(LOT)
Reproducible example:
nc = sf::st_read(system.file("shape/nc.shp", package="sf"))
ncxy = sf::st_as_sf(data.frame(lon=c(-80, -80.1, -82), lat=c(35.5, 35.5, 35.5)),
coords=c('lon', 'lat'), crs=st_crs(nc))
ncxy = ncxy %>%
sf::st_intersection(nc) %>%
dplyr::add_count(FIPS)
## a better approach
ncxy = ncxy %>%
sf::st_join(nc, join=st_intersects) %>%
dplyr::add_count(FIPS)
The new column n includes the total number of points per FIPS code.
ncxy %>% dplyr::group_by(FIPS) %>% dplyr::distinct(n)
# Warning: although coordinates are longitude/latitude, st_intersects
# assumes that they are planar
# A tibble: 2 x 2
# Groups: FIPS [2]
FIPS n
<fctr> <int>
1 37123 2
2 37161 1
I'm not sure why your data results in an empty intersection, but since the code works on the example above there must be a separate issue.
HT: st_join approach from this answer.
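If the end goal is the choropleth of counts per tax lot from the question, you can also count from the polygon side and skip the join entirely. A minimal sketch on the same nc example:
library(sf)
nc <- st_read(system.file("shape/nc.shp", package = "sf"))
pts <- st_as_sf(
  data.frame(lon = c(-80, -80.1, -82), lat = c(35.5, 35.5, 35.5)),
  coords = c("lon", "lat"), crs = st_crs(nc)
)
# one count per polygon: how many points intersect each county
nc$pt_count <- lengths(st_intersects(nc, pts))
plot(nc["pt_count"])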

R - find point farthest from set of points on rasterized USA map

New to spatial analysis on R here. I have a shapefile for the USA that I downloaded from HERE. I also have a set of lat/long points (half a million) that lie within the contiguous USA.
I'd like to find the "most remote spot" -- the spot within the contiguous USA that's farthest from the set of points.
I'm using the rgdal, raster and sp packages. Here's a reproducible example with a random sample of 10 points:
# Set wd to the folder tl_2010_us_state_10
usa <- readOGR(dsn = ".", layer = "tl_2010_us_state10")
# Sample 10 points in USA
sample <- spsample(usa, 10, type = "random")
# Set extent for contiguous united states
ext <- extent(-124.848974, -66.885444, 24.396308, 49.384358)
# Rasterize USA
r <- raster(ext, nrow = 500, ncol = 500)
rr <- rasterize(usa, r)
# Find distance from sample points to cells of USA raster
D <- distanceFromPoints(object = rr, xy = sample)
# Plot distances and points
plot(D)
points(sample)
After the last two lines of code, I get this plot.
However, I'd like it to be over the rasterized map of the USA. And, I'd like it to only consider distances from cells that are in the contiguous USA, not all cells in the bounding box. How do I go about doing this?
I'd also appreciate any other tips regarding the shape file I'm using -- is it the best one? Should I be worried about using the right projection, since my actual dataset is lat/long? Will distanceFromPoints be able to efficiently process such a large dataset, or is there a better function?
To limit raster D to the contiguous USA you could find the elements of rr assigned values of NA (i.e. raster cells within the bounding box but outside of the usa polygons), and assign these same elements of D a value of NA.
D[which(is.na(rr[]))] <- NA
plot(D)
lines(usa)
You can use 'proj4string(usa)' to find the projection info for the usa shapefile. If your coordinates of interest are based on a different projection, you can transform them to match the usa shapefile projection as follows:
my_coords_xform <- spTransform(my_coords, CRS(proj4string(usa)))
Not sure about the relative efficiency of distanceFromPoints, but it only took ~ 1 sec to run on my computer using your example with 10 points.
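Once D is limited to the contiguous USA, the "most remote spot" itself is just the cell with the largest remaining value. A small sketch:
# cell with the largest distance to any sample point
most_remote <- xyFromCell(D, which.max(D[]))
plot(D)
lines(usa)
points(most_remote, pch = 4)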
I think you were looking for the mask function.
library(raster)
usa <- getData('GADM', country='USA', level=1)
# exclude Alaska and Hawaii
usa <- usa[!usa$NAME_1 %in% c( "Alaska" , "Hawaii"), ]
# get the extent and create raster with preferred resolution
r <- raster(floor(extent(usa)), res=1)
# rasterize polygons
rr <- rasterize(usa, r)
set.seed(89)
sample <- spsample(usa, 10, type = "random")
# Find distance from sample points to cells of USA raster
D <- distanceFromPoints(object = rr, xy = sample)
# remove areas outside of polygons
Dm <- mask(D, rr)
# an alternative would be mask(D, usa)
# cell with highest value
mxd <- which.max(Dm)
# coordinates of that cell
pt <- xyFromCell(r, mxd)
plot(Dm)
points(pt)
The distances should be fine, also when using long/lat data. But distanceFromPoints could indeed be a bit slow with a large data set, as it uses a brute-force algorithm.
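On the efficiency point: for half a million points, one alternative worth considering is to replace the brute-force distance raster with a k-nearest-neighbour search. A sketch, assuming the RANN package (not used elsewhere in this thread) and projected/planar coordinates, since nn2() computes Euclidean distances:
library(RANN)  # assumption: RANN provides nn2(), a fast k-nearest-neighbour search
# coordinates of the land cells only (cells outside the polygons are NA)
land_cells <- which(!is.na(rr[]))
cell_xy <- xyFromCell(rr, land_cells)
# nearest sample point for every land cell
nn <- nn2(coordinates(sample), cell_xy, k = 1)
# the most remote cell is the one whose nearest point is farthest away
most_remote <- cell_xy[which.max(nn$nn.dists), , drop = FALSE]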

rworldmap coordinates, how to match NetCDF data to the map?

Rworldmap looks like exactly what I need for mapping climate data, but I'm having a problem lining up the base map with the climate data. What I am mapping is ocean temperature data from JAMSTEC for August, 2015 from here:
http://www.jamstec.go.jp/ARGO/argo_web/ancient/MapQ/Mapdataset_e.html
The dataset name is TS_201508_GLB.nc. The R script I'm using is below. The country outlines are fine, but the data, which is for the oceans only, does not show up in the oceans; it is offset somehow. Can you tell me how to align the data to the map?
I've read lots of articles but I cannot tell how to align the two. The data I have is in longitude and latitude, with south latitude negative and west longitude negative, so I don't see how they could be confused. How is the map drawn? Is there some sort of special convention for the lat/longs?
Thanks for any help you can provide.
The code:
library(RNetCDF)
library(sp)
library(rworldmap)
library(rgeos)
library(RColorBrewer)
library(classInt)
library(grid)
library(spam)
library(maps)
library(maptools)
library(fields)
library(methods)
library(rgdal)
library(rworldxtra)
fname <- "G:/Climate_Change/Ocean_Warming/MOAA_GPV_Jamstec_Temperature/TS_201508_GLB.nc"
moaa <- open.nc(fname)
# moaa
print.nc(moaa)
file.inq.nc(moaa)
#TOI is the temperature array extracted from the NCDF file
TOI = var.get.nc(moaa,"TOI",start=c(1,1,1),count=c(360,132,25))
TOI[1,1,1]
Long = var.get.nc(moaa,"LONGITUDE")
Lat = var.get.nc(moaa, "LATITUDE")
Pres = var.get.nc(moaa,"PRES")
# create grid
offset=c(-179.5,-60.50)
cellsize = c(abs(Long[1]-Long[2]),abs(Lat[1]-Lat[2]))
cells.dim = c(dim(Long), dim(Lat))
# create gt
gt <- GridTopology(cellcentre.offset=offset,cellsize=cellsize,cells.dim=cells.dim)
# create map window
mapDevice()
# Create a colour palette
colourPalette=c('blue','lightblue','white',brewer.pal(9,'YlOrRd'))
# Values at 2000 decibar for August 2015
ncMatrix <- TOI[,,25]
# Gridvalues
gridVals <-data.frame(att=as.vector(ncMatrix))
# create a spatialGridDataFrame
sGDF <-SpatialGridDataFrame(gt,data=gridVals)
# Vector to classify data
catMethod=seq(from=0,to=4,by=.33)
# plotting the map and getting params for legend
mapParams <- mapGriddedData( sGDF, nameColumnToPlot='att',catMethod=catMethod,colourPalette=colourPalette,addLegend=FALSE)
I finally figured it out. rworldmap wants the data organized from the upper left of the map (the northwest corner), that is Long = -180, Lat = 90. The NetCDF data starts at Long = 0 and Lat = -90 (the middle of the map and the south edge). So we have to reverse the values in the north-south direction:
#
# Flip the Latitude values so south is last
ncMatrix2 <- ncMatrix[,dim(Lat):1]
Then switch the values for east longitude and west longitude:
#
#Longitude values need to be from -180 to 0 then 0 to 180
# So we divide into East and West, then recombine with rbind
East_Long_values <-ncMatrix2[1:180,]
West_Long_Values <-ncMatrix2[181:360,]
ncMatrix3 <- rbind(West_Long_Values,East_Long_values)
Then everything else works.
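For what it's worth, both reorderings can be expressed in a single subscripting step. A sketch that should be equivalent to the two-stage version above:
# rotate longitude (rows) and flip latitude (columns) in one go
ncMatrix3 <- ncMatrix[c(181:360, 1:180), ncol(ncMatrix):1]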
