Extract shapefiles from longitude/latitude gridded data - r

I have some gridded sea surface temperature data for the Mediterranean to which I've applied clustering. I have 420 files, each with a three-column structure (long, lat, value). The data for a particular file looks like this map.
Now I want to extract the cluster areas as shapefiles for postprocessing. I found this post (https://gis.stackexchange.com/a/187800/9227) and tried to use its code like this:
# Packages
library(sp)
library(rgdal)
library(raster)
# Paths
ruta_datos<-"/home/meteo/PROJECTES/VERSUS/OUTPUT/DATA/CLUSTER_MED/"
setwd("~/PROJECTES/VERSUS/temp")
# File list
files <- list.files(path = ruta_datos, pattern = "SST-cluster-mitja-mensual")
for (i in 1:length(files)){
  datos <- read.csv(paste0(ruta_datos, files[i]), header = TRUE)
  nclusters <- max(datos$cluster)
  for (j in 1:nclusters){
    clust.dat <- subset(datos, cluster == j)
    coordinates(clust.dat) = ~longitud+latitud
    proj4string(clust.dat) = CRS("+init=epsg:4326")
    pts = spTransform(clust.dat, CRS("+init=epsg:4326"))
    gridded(pts) = TRUE
    r = raster(pts)
    projection(r) = CRS("+init=epsg:4326")
    # make all values the same (TRUE) so that adjacent cells dissolve together
    s <- r > -Inf
    # convert to polygons
    pp <- rasterToPolygons(s, dissolve = TRUE)
    # save shapefile
    shname <- paste("SST-shape-", substr(files[i], 27, 32), "-", j, sep = "")
    writeOGR(pp, dsn = '.', layer = shname, driver = "ESRI Shapefile")
  }
}
But the code stops with this error message:
gridded(pts) = TRUE
suggested tolerance minimum: 1
Error in points2grid(points, tolerance, round) : dimension 2 : coordinate intervals are not constant
Warning message: In points2grid(points, tolerance, round) : grid has empty column/rows in dimension 1
I don't understand why, for a certain file, it says that the coordinate intervals are not constant when they are: the original SST data from which the clustering was derived are on a regular grid over the whole globe. All cluster data files have the same size, 4248 points. A sample data file is available here.
What does the tolerance suggestion mean? I've been looking for a solution and found suggestions to use SpatialPixelsDataFrame, but couldn't work out how to apply them.
Any help would be appreciated. Thanks.

I am not an expert on geospatial data, but as far as I can tell, once you filter on cluster the data are indeed no longer on a grid. As I understand it, you start from a grid (a convex set of regularly spaced points), and a single cluster is only part of it.
I tried the following modifications to your code; some files are generated, but I can't test whether they are correct or not.
The principle is to build the grid on all the data and only filter on cluster just before calling raster.
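A small base-R illustration of that point, using a made-up 1-degree grid rather than the real SST files: the full grid has a single spacing between neighbouring unique longitudes, while a hypothetical two-patch "cluster" cut out of it does not. This only shows the kind of irregularity that points2grid() reports; it is not sp's actual check.

```r
# Hypothetical 1-degree grid standing in for the full SST grid
full <- expand.grid(longitud = seq(0, 10, by = 1),
                    latitud  = seq(30, 40, by = 1))

# Hypothetical "cluster": two separate patches of that grid
clust <- subset(full, (longitud <= 2 | longitud >= 7) & latitud <= 33)

# Distinct spacings between neighbouring unique coordinate values
spacings <- function(v) unique(diff(sort(unique(v))))

spacings(full$longitud)   # 1   -> one constant interval
spacings(clust$longitud)  # 1 5 -> the gap between the patches breaks the regular spacing
```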
This gives:
files <- list.files(path = ruta_datos, pattern = "SST-cluster-mitja-mensual")
for (i in 1:length(files)){
  datos <- read.csv(paste0(ruta_datos, files[i]), header = TRUE)
  nclusters <- max(datos$cluster)
  for (j in 1:nclusters){
    ## clust.dat <- subset(datos, cluster == j)
    clust.dat <- datos
    coordinates(clust.dat) = ~longitud+latitud
    proj4string(clust.dat) = CRS("+init=epsg:4326")
    pts = spTransform(clust.dat, CRS("+init=epsg:4326"))
    gridded(pts) = TRUE
    ## r = raster(pts)
    r = raster(pts[pts$cluster == j, ])
    projection(r) = CRS("+init=epsg:4326")
    # make all values the same (TRUE) so that adjacent cells dissolve together
    s <- r > -Inf
    # convert to polygons
    pp <- rasterToPolygons(s, dissolve = TRUE)
    # save shapefile
    shname <- paste("SST-shape-", substr(files[i], 27, 32), "-", j, sep = "")
    writeOGR(pp, dsn = '.', layer = shname, driver = "ESRI Shapefile")
  }
}
So: two lines are commented out, and each one is replaced by the line just below it.

Related

How to select one point per raster grid cell?

I have a point shapefile ("search_effort.shp") that is highly clustered and an NDVI raster (resolution in m: 30.94948, 30.77829). I would like to subset my search_effort.shp by selecting 1 point per raster grid cell and create a new search_effort shapefile. I am using R version 4.0.3.
I think I could have used Package ‘gridsample’ (in 'raster' v1.3-1), but it was removed from the CRAN repository and I would prefer not to use the archived version. Is there another way to do this in R?
I have also tried sample.grid but I do not know how to specify my raster as the grid, and have tried the following:
# NDVI raster to be used as the reference extent
NDVI_extent <-readGDAL('C:/Model_layers/NDVI.tif')
# Load the file names
layername <- "SearchEffort"
# Read in the shapefile
search_effort <- readOGR(dsn= ".", layer = layername)
plot(search_effort)
# Set the reference extent
r <- raster(NDVI_extent)
# Extract coordinates from the shapefile
search_effort@coords <- search_effort@coords[, 1:2]
#Subset points
sample.grid(search_effort, cell.size = c(30.94948, 30.77829), n = 1)
I get the following error:
"Error in validObject(.Object) : invalid class “GridTopology” object: cellsize has incorrect dimension."
I get the same error regardless of the cell.size I specify.
Example data
library(raster)
r <- raster(res=30)
values(r) <- 1:ncell(r)
x <- runif(1000,-180,180)
y <- runif(1000,-90,90)
xy <- cbind(x, y)
Solution
library(dismo)
s <- gridSample(xy, r, n=1)
Illustration
plot(as(r, "SpatialPolygons"))
points(s, col="red")
points(xy, cex=.1, col="blue")
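If dismo is not an option, the same idea can be sketched in base R: work out which cell each point falls in, shuffle the points, and keep the first point seen in each cell. This assumes the global 30-degree grid of the example raster (xmin = -180, ymax = 90) and numbers cells row by row from the top-left, as raster does; it is a sketch, not a re-implementation of gridSample().

```r
set.seed(1)
# Same kind of random points as in the example data
xy <- cbind(x = runif(1000, -180, 180), y = runif(1000, -90, 90))

# Grid geometry of the example raster (global extent, 30-degree cells)
xmin <- -180; ymax <- 90; res <- 30
ncols <- 360 / res
nrows <- 180 / res

# Row and column of the cell holding each point (clamped in case a point sits on the far edge)
cell_col <- pmin(floor((xy[, "x"] - xmin) / res), ncols - 1)
cell_row <- pmin(floor((ymax - xy[, "y"]) / res), nrows - 1)
cell <- cell_row * ncols + cell_col + 1   # cells numbered row by row from the top-left

# Shuffle the points, then keep the first one seen in each cell: one random point per cell
ord  <- sample(nrow(xy))
keep <- ord[!duplicated(cell[ord])]
s    <- xy[keep, , drop = FALSE]
nrow(s)   # at most ncols * nrows = 72
```

With real data you would take xmin, ymax and the resolution from the NDVI raster instead of the hard-coded values.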

sp::over(): does the point belong to one of the polygons in an OGRGeoJSON file?

I'm trying to get a Boolean vector where, for example, v[i] = 1 tells me whether the i-th point (a latitude/longitude pair in a train data frame) falls within one of the geographical areas identified by an OGRGeoJSON file.
The OGR file is structured roughly like this:
District 1: 24 polygons
District 2: 4 polygons
District 3: 27 polygons
District 4: 18 polygons
District 5: 34 polygons
This is what I tried. However, the results are not correct, because the polygon that gets generated merges all the different areas in the OGR file into one.
library(rgdal)
library(httr)
library(sp)
r <- GET('https://data.cityofnewyork.us/api/geospatial/tqmj-j8zm?method=export&format=GeoJSON')
nyc_neighborhoods <- readOGR(content(r,'text'), 'OGRGeoJSON', verbose = F)
#New York City polygon
pol_lat <- c(nyc_neighborhoods_df$lat)
pol_long <- c(nyc_neighborhoods_df$long)
xy <- cbind(pol_lat, pol_long)
p = Polygon(xy)
ps = Polygons(list(p),1)
pol = SpatialPolygons(list(ps))
#Points to analyse (pair of coordinates)
ny_lat <- c(train$pickup_latitude, train$dropoff_latitude)
ny_long <- c(train$pickup_longitude, train$dropoff_longitude)
ny_coord <- cbind(ny_lat, ny_long)
pts <- SpatialPoints(ny_coord)
#Query: Does the point to analyze fall in or out NYC?
over(pts, pol, returnList = TRUE)
How can I fix this to get the correct result?
sp is an older package which is being phased out in favor of the newer "Simple Features" sf package. Let me know if you are open to using the pipe operator %>% from the magrittr package, as it works nicely with the sf package (as does dplyr and purrr).
Using sf, you could do:
library(sf)
# Replace this with the path to the geojson file
geojson_path <- "path/to/file.geojson"
boroughs <- sf::st_read(dsn = geojson_path, stringsAsFactors = FALSE)
Now make a very simple spatial point object to stand in for the "train" data.
# Make test data.frame
test_df <-
  data.frame(
    # Random test point I chose, a couple of blocks from Central Park
    a = "manhattan_point",
    y = 40.771959,
    x = -73.964128,
    stringsAsFactors = FALSE)
# Turn the test_df into a spatial object
test_point <-
  sf::st_as_sf(
    test_df,
    # The coords argument tells the st_as_sf function
    # what columns store the longitude and latitude data
    # which it uses to associate a spatial point to each
    # row in the data.frame
    coords = c("x", "y"),
    crs = 4326 # WGS84
  )
Now we are ready to determine what polygon(s) our point falls in:
# Get the sparse binary predicate. This will give a list with as
# many elements as there are spatial objects in the first argument,
# in this case, test_point, which has 1 element.
# It also has attributes which detail what the relationship is
# (intersection, in our case)
sparse_bin_pred <- sf::st_intersects(test_point, boroughs)
# Output the boro_name that matched. I think the package purrr
# offers some more intuitive ways to do this, but base R works too:
lapply(
  sparse_bin_pred,
  function(x) boroughs$boro_name[x]
)
That last part outputs:
[[1]]
[1] "Manhattan"
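For the Boolean vector the question asks for, lengths(sf::st_intersects(points, polygons)) > 0 gives TRUE for each point that falls in at least one polygon. Note also that sp and sf expect coordinates in (x, y) = (longitude, latitude) order, whereas the question's code binds latitude first. The base-R sketch below swaps in the classic even-odd ray-casting test to show what "inside any of the polygons" means; the two square districts and the test points are made up, and polygon holes are not handled.

```r
# Even-odd ray casting: is the point (px, py) inside the ring with vertices (vx, vy)?
point_in_ring <- function(px, py, vx, vy) {
  n <- length(vx)
  j <- c(n, seq_len(n - 1))            # index of the previous vertex for each vertex
  crosses <- ((vy > py) != (vy[j] > py)) &
    (px < (vx[j] - vx) * (py - vy) / (vy[j] - vy) + vx)
  sum(crosses) %% 2 == 1               # odd number of edge crossings = inside
}

# TRUE for every point that lies inside at least one polygon
in_any <- function(px, py, polys) {
  vapply(seq_along(px), function(k) {
    any(vapply(polys, function(p) point_in_ring(px[k], py[k], p$x, p$y), logical(1)))
  }, logical(1))
}

# Two hypothetical square "districts"; x = longitude, y = latitude
districts <- list(
  list(x = c(-74.0, -73.9, -73.9, -74.0), y = c(40.70, 40.70, 40.80, 40.80)),
  list(x = c(-73.8, -73.7, -73.7, -73.8), y = c(40.60, 40.60, 40.70, 40.70))
)
lon <- c(-73.964128, -73.75, -73.5)
lat <- c(40.771959, 40.65, 40.65)

v <- in_any(lon, lat, districts)
v              # TRUE TRUE FALSE
as.integer(v)  # 1 1 0, the v[i] = 1 form from the question
```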

Calculate zonal statistics in R as in GIS

I have multiple rasters in a folder. I need to extract the mean of each of these rasters over a polygon shapefile (with more than 2500 polygons).
I came across two functions, zonal and extract. The documentation says extract can be used for points, lines and polygons too. Is that the only difference? (Yes/No expected)
How can I extract the mean from these multiple rasters and name the columns of extracted mean values after their file names?
Edit:
I found some code and implemented it, but it is taking forever and makes no progress at all.
grids <- list.files("my_path", pattern = "*.tif$")
#check the number of files in the raster list (grids)
length <- length(grids)
#read-in the polygon shapefile
poly <- readShapePoly("my_path/supplimentY.shp")
#create a raster stack
s <- stack(paste0("my_path/", grids))
#extract raster cell count (sum) within each polygon area (poly)
for (i in 1:length(grids)){
  ex <- extract(s, poly, fun='mean', na.rm=TRUE, df=TRUE, weights = TRUE)
  # the code does not progress from here onwards.
  # I checked it by adding this line: print(i)
}
#write to a data frame
dfr <- data.frame(ex)
You do not need the loop (you repeat the same operation at each iteration!).
It should be like this:
library(raster)
ff <- list.files("my_path", pattern = "\\.tif$", full.names=TRUE)
s <- stack(ff)
poly <- shapefile("my_path/supplimentY.shp")
ex <- extract(s, poly, fun='mean', na.rm=TRUE, df=TRUE, weights = TRUE)
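On the column-naming part of the question: as far as I know, stack() names each layer after its file (without the extension), so the data frame returned by extract(..., df=TRUE) already has one column per file. The base-R sketch below shows what a per-polygon mean amounts to, using made-up cell values, a made-up polygon ID for each cell and hypothetical file names. Unlike weights = TRUE, it does not weight cells by the fraction a polygon covers.

```r
# Hypothetical inputs: 6 cells x 2 layers, layers named after made-up files
files <- c("my_path/ndvi_2019.tif", "my_path/ndvi_2020.tif")
vals <- matrix(c(1, 2, 3, 4, NA, 6,
                 10, 20, 30, 40, 50, 60), ncol = 2)
colnames(vals) <- tools::file_path_sans_ext(basename(files))
zone <- c(1, 1, 2, 2, 2, NA)   # polygon ID of each cell (NA = outside every polygon)

# Mean of each layer within each polygon, ignoring NA cell values
inside <- !is.na(zone)
zonal_mean <- aggregate(vals[inside, , drop = FALSE],
                        by = list(ID = zone[inside]),
                        FUN = mean, na.rm = TRUE)
zonal_mean   # columns: ID, ndvi_2019, ndvi_2020
```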
I used the same code to calculate the zonal mean of climatology for an area boundary; it took 5-6 minutes for 2736 raster layers.
layers <- length(clim)
for (i in 1:length(clim)) {
  ex <- extract(clim, shpwb, fun=mean, na.rm=TRUE, df=TRUE)
}
df <- data.frame(ex)
write.csv(df, file = "E:/Central University of Jharkhand/3rd Semester/Climatology/R Studio/CSV.csv")

Line density function in R equivalent to Line density tool in ArcMap (arcpy)

I need to calculate the magnitude per unit area of polylines that fall within a radius around each cell. Essentially, I need to calculate a km/km2 road density within a 500 m pixel search radius. ArcMap has a quick and easy tool that handles this, but I need a pure R solution.
Here is a link on how line density works: http://desktop.arcgis.com/en/arcmap/10.3/tools/spatial-analyst-toolbox/how-line-density-works.htm
And this is how to use it in a python (arcpy) script: http://desktop.arcgis.com/en/arcmap/10.3/tools/spatial-analyst-toolbox/line-density.htm
I currently use a roundabout approach with the raster::focal function, calculating the density of burned-in road features, and then convert the km2/km2 output to km/km2.
#Import libraries
library(raster)
library(rgdal)
library(gdalUtils)
#Read-in an already created raster mask (cells are all set to 0)
mask <- raster("x://path to raster mask...")
#Make a copy of the mask to burn features in, keeping the original untouched
roads_mask <- file.copy(mask, "x://output path ...//roads.tif")
#Read-in road features (shapefile format)
roads_sldf <- readOGR("x://path to shapefile" , "roads")
#Rasterize spatial lines data frame ie. burn road features into mask
#Where road features get a value of 1, mask extent gets a value of 0
roads_raster <- gdalUtils::gdal_rasterize(src_datasource = roads_sldf,
                                          dst_filename = "x://output path ...//roads.tif", b = 1,
                                          burn = 1, l = "roads", output_Raster = TRUE)
#Run a 1km circular radius density function (be mindful of edge effects)
weight <- raster::focalWeight(roads_raster,1000,type = "circle")
rdDensity_1km <- raster::focal(roads_raster, weight, fun=sum, filename = '',
                               na.rm=TRUE, pad=TRUE, NAonly=FALSE, overwrite=TRUE)
#Convert km2/km2 road density to km/km2
#Set up the moving window
weight <- raster::focalWeight(roads_raster,1000,type = "circle")
#Count how many records in each column of the moving window are > 0
columnCount <- apply(weight,2,function(x) sum(x > 0))
#Get the sum of the column count
number_of_cells <- sum(columnCount)
#multiply km2/km2 density by number of cells in the moving window
step1 <- rdDensity_1km * number_of_cells
#Rescale step1 output with respect to cell size (30 m) and the area of a 1 km radius circle
final_rdDensity <- (step1*0.03)/pi
#Write out final km/km2 road density raster
writeRaster(final_rdDensity,"X://path to output...", datatype = 'FLT4S', overwrite = TRUE)
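The conversion at the end of that code can be checked with plain arithmetic. The sketch below assumes 30 m cells, a 1 km radius and a made-up focal result of 5% road cells. It builds a circular window similar in spirit to focalWeight(type = "circle") (not necessarily identical at the edge), turns the road-cell share into a rough length of road, and divides by the window area. Because the window holds roughly pi * r^2 / cell^2 cells, the result comes out close to share / cell width, here 0.05 / 0.03, about 1.67 km/km2.

```r
cell_km   <- 0.03   # 30 m cells (assumption)
radius_km <- 1      # 1 km search radius (assumption)
frac_road <- 0.05   # hypothetical focal output: share of window cells that are road

# Circular window: cells whose centres lie within the radius of the centre cell
d    <- ceiling(radius_km / cell_km)
offs <- seq(-d, d) * cell_km
win  <- outer(offs, offs, function(a, b) sqrt(a^2 + b^2) <= radius_km)
n_cells <- sum(win)

road_km <- frac_road * n_cells * cell_km            # road cells x cell width, roughly km of road
density_km_per_km2 <- road_km / (pi * radius_km^2)  # divide by the window area
density_km_per_km2                                  # close to frac_road / cell_km
```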
After some more research, I think I may be able to use a kernel function; however, I don't want to apply the smoothing algorithm. Also, the output is an 'im' object, which I would need to write out as a 'tif'.
#Import libraries
library(spatstat)
library(rgdal)
#Read-in road features (shapefile format)
roads_sldf <- readOGR("x://path to shapefile" , "roads")
#Convert roads spatial lines data frame to psp object
psp_roads <- as.psp(roads_sldf)
#Apply kernel density, however this is where I am unsure of the arguments
road_density <- spatstat::density.psp(psp_roads, sigma = 0.01, eps = 500)
Cheers.
See this question: https://gis.stackexchange.com/questions/138861/calculating-road-density-in-r-using-kernel-density
I tried to mark this as a duplicate, but that doesn't work because the other question is on GIS Stack Exchange.
The short answer is to use spatstat.geom::pixellate().
I also needed spatstat.geom::as.psp(sf::st_geometry(x)) to convert an sf lines object to the correct format, and maptools::as.im.RasterLayer(r) to convert a raster. I was able to convert the result to a RasterLayer with raster::raster(pix_res).
Perhaps you can use terra::rasterizeGeom, which is available in the development version; you can install it with install.packages('terra', repos='https://rspatial.r-universe.dev')
Example data
library(terra)
f <- system.file("ex/lux.shp", package="terra")
v <- vect(f) |> as.lines()
r <- rast(v, res=.1)
Solution
x <- rasterizeGeom(v, r, fun="length", "km")
You can then use a focal sum, but the window would not be a perfect circle.
What you could do instead, if your dataset is not too large, is create a circle for each grid cell and use intersect. Something like this:
p <- xyFromCell(r, 1:ncell(r)) |> vect(crs="+proj=longlat")
p$id <- 1:ncell(r)
b <- buffer(p, 10000)
values(v) <- NULL
i <- intersect(v, b)
x <- aggregate(perim(i), list(id=i$id), sum)
r[x$id] <- x[,2]

Using R intersections to create a polygons-inside-a-polygon key using two shapefile layers

The data
I have two shapefiles marking the boundaries of national and provincial electoral constituencies in Pakistan.
The objective
I am attempting to use R to create a key that will generate a list of which provincial-level constituencies are "contained within" or otherwise intersecting with which national-level constituencies, based on their coordinates in this data. For example, NA-01 corresponds with PA-01, PA-02, PA-03; NA-02 corresponds with PA-04 and PA-05, etc. (The key will ultimately be used to link separate dataframes containing electoral results at the national and provincial level; that part I've figured out.)
I have only basic/intermediate R skills learned largely through trial and error and no experience working with GIS data outside of R.
The attempted solution
The closest solution I could find for this problem comes from this guide to calculating intersection areas in R. However, I have been unable to successfully replicate any of the three proposed approaches (either the questioner's use of a general TRUE/FALSE report on intersections, or the more precise calculations of area of overlap).
The code
# import map files
NA_map <- readOGR(dsn = "./National_Constituency_Boundary", layer = "National_Constituency_Boundary")
PA_map <- readOGR(dsn = "./Provincial_Constituency_Boundary", layer = "Provincial_Constituency_Boundary")
# Both are now SpatialPolygonsDataFrame objects of 273 and 577 elements, respectively.
# If relevant, I used spdplyr to tweak some of the data attribute names (for use later when joining to electoral dataframes):
NA_map <- NA_map %>%
  rename(constituency_number = NA_Cons,
         district_name = District,
         province = Province)
PA_map <- PA_map %>%
  rename(province = PROVINCE,
         district_name = DISTRICT,
         constituency_number = PA)
# calculate intersections, take one
Results <- gIntersects(NA_map, PA_map, byid = TRUE)
# this creates a large matrix of 157,521 elements
rownames(Results) <- NA_map@data$constituency_number
colnames(Results) <- PA_map@data$constituency_number
Attempting to add the rowname/colname labels, however, gives me the error message:
Error in dimnames(x) <- dn :
length of 'dimnames' [1] not equal to array extent
Without the rowname/colname labels, I'm unable to read the overlay matrix, and unsure how to filter them so as to produce a list of only TRUE intersections that would help make a NA-PA key.
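Judging by the error, the matrix has 577 rows (one per PA_map polygon) and 273 columns (one per NA_map polygon), since 157,521 = 577 x 273 and 273 row names were rejected; so the row and column labels appear to be the wrong way round. Once the dimnames are set that way, the TRUE cells can be turned into a key with which(..., arr.ind = TRUE). A minimal sketch with a made-up 3 x 2 matrix:

```r
# Hypothetical intersection matrix: rows = provincial, columns = national constituencies
m <- matrix(c(TRUE,  TRUE,  FALSE,
              FALSE, FALSE, TRUE), nrow = 3,
            dimnames = list(c("PA-01", "PA-02", "PA-03"), c("NA-01", "NA-02")))

hits <- which(m, arr.ind = TRUE)   # row/col index of every TRUE cell
key <- data.frame(NA_cons = colnames(m)[hits[, "col"]],
                  PA_cons = rownames(m)[hits[, "row"]],
                  stringsAsFactors = FALSE)
key <- key[order(key$NA_cons, key$PA_cons), ]
key
```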
I also attempted to replicate the other two proposed solutions for calculating exact area of overlap:
# calculate intersections, take two
pi <- intersect(NA_map, PA_map)
# this generates a SpatialPolygons object with 273 elements
areas <- data.frame(area=sapply(pi@polygons, FUN = function(x) {slot(x, 'area')}))
# this calculates the area of intersection but has no other variables
row.names(areas) <- sapply(pi@polygons, FUN=function(x) {slot(x, 'ID')})
This generates the error message:
Error in `row.names<-.data.frame`(`*tmp*`, value = c("2", "1", "4", "5", :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique value when setting 'row.names': ‘1’
So that when I attempt to attach areas to attributes info with
attArrea <- spCbind(pi, areas)
I get the error message
Error in spCbind(pi, areas) : row names not identical
Attempting the third proposed method:
# calculate intersections, take three
pi <- st_intersection(NA_map, PA_map)
Produces the error message:
Error in UseMethod("st_intersection") :
no applicable method for 'st_intersection' applied to an object of class "c('SpatialPolygonsDataFrame', 'SpatialPolygons', 'Spatial', 'SpatialPolygonsNULL', 'SpatialVector')"
I understand that my SPDF maps can't be used for this third approach, but wasn't clear from the description what steps would be needed to transform it and attempt this method.
The plea for help
Any suggestions on corrections necessary to use any of these approaches, or pointers towards some other method of figuring this, would be greatly appreciated. Thanks!
Here is some example data:
library(raster)
p <- shapefile(system.file("external/lux.shp", package="raster"))
p1 <- aggregate(p, by="NAME_1")
p2 <- p[, 'NAME_2']
So we have p1 with regions, and p2 with lower level divisions.
Now we can do
x <- intersect(p1, p2)
# or x <- union(p1, p2)
data.frame(x)
Which should be (and is) the same as the original
data.frame(p)[, c('NAME_1', 'NAME_2')]
To get the area of the polygons, you can do
x$area <- area(x) / 1000000 # divide to get km2
There are likely to be many "slivers", very small polygons because of slight variations in borders. That might not matter to you.
But another approach could be matching by centroid:
y <- p2
e <- extract(p1, coordinates(p2))
y$NAME_1 <- e$NAME_1
data.frame(y)
Your code isn't self-contained, so I didn't try to replicate the errors you report.
However, getting the 'key' you want is very simple using the sf package (which is intended to supersede rgeos, rgdal and sp in the near future). See here:
library(sf)
# Download shapefiles
national.url <- 'https://data.humdata.org/dataset/5d48a142-1f92-4a65-8ee5-5d22eb85f60f/resource/d85318cb-dcc0-4a59-a0c7-cf0b7123a5fd/download/national-constituency-boundary.zip'
provincial.url <- 'https://data.humdata.org/dataset/137532ad-f4a9-471e-8b5f-d1323df42991/resource/c84c93d7-7730-4b97-8382-4a783932d126/download/provincial-constituency-boundary.zip'
download.file(national.url, destfile = file.path(tempdir(), 'national.zip'))
download.file(provincial.url, destfile = file.path(tempdir(), 'provincial.zip'))
# Unzip shapefiles
unzip(file.path(tempdir(), 'national.zip'), exdir = file.path(tempdir(), 'national'))
unzip(file.path(tempdir(), 'provincial.zip'), exdir = file.path(tempdir(), 'provincial'))
# Read map files
NA_map <- st_read(dsn = file.path(tempdir(), 'national'), layer = "National_Constituency_Boundary")
PA_map <- st_read(dsn = file.path(tempdir(), 'provincial'), layer = "Provincial_Constituency_Boundary")
# Get sparse list representation of intersections
intrs.sgpb <- st_intersects(NA_map, PA_map)
length(intrs.sgpb) # One list element per national constituency
# [1] 273
print(intrs.sgpb[[1]]) # Indices of provincial constituencies intersecting with first national constituency
# [1] 506 522 554 555 556
print(PA_map$PROVINCE[intrs.sgpb[[1]]][1]) # Name of first province intersecting with first national constituency
# [1] KHYBER PAKHTUNKHWA
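To turn the sparse list into a two-column key, repeat each national ID once per intersecting provincial constituency and unlist the indices. The sketch below uses a made-up list in place of intrs.sgpb; with the real data you would substitute the constituency-number columns of NA_map and PA_map (NA_Cons and PA in the question, before renaming).

```r
# Hypothetical stand-in for intrs.sgpb: element i = PA indices intersecting NA polygon i
intrs  <- list(c(1L, 2L, 3L), c(4L, 5L))
na_ids <- c("NA-01", "NA-02")
pa_ids <- sprintf("PA-%02d", 1:5)

key <- data.frame(NA_cons = rep(na_ids, lengths(intrs)),
                  PA_cons = pa_ids[unlist(intrs)],
                  stringsAsFactors = FALSE)
key
```

Keep in mind that st_intersects() also reports polygons that merely share a border, so the key may contain neighbours that only touch; the sliver issue mentioned in the other answer applies here too.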

Resources