Subset SpatialPolygonsDataFrame - r

I have a SpatialPolygonsDataFrame that I created by reading in a shapefile using readOGR in the rgdal package. I am attempting to use it to generate a sampling grid with spsample in the sp package, to be used for interpolation from survey data collected in the area. However, the SpatialPolygonsDataFrame encompasses a much larger area than the survey, and as a result the interpolation is predicting values far away from where any surveys were conducted. The survey data and the shapefile were both projected using the same proj4string.
I would like to subset the SpatialPolygonsDataFrame using the coordinates set by the survey stations, but I'm not sure where the relevant values are stored in the object.
I'm afraid that I can't provide the relevant data as the shapefile is not hosted online. However, I will borrow some code from Paul Hiemstra's response to this post for the Netherlands:
library(ggplot2)
library(sp)
library(automap)
library(rgdal)
#get the spatial data for the Netherlands
con <- url("http://gadm.org/data/rda/NLD_adm0.RData")
print(load(con))
close(con)
class(gadm)
bbox(gadm)
>            min      max
> r1  3.360782  7.29271
> r2 50.755165 53.55458
Let's say that the surveys were conducted in this area:
bbox(surveys)
>        min    max
> r1  4.000  7.000
> r2 51.000 53.000
How can I crop out that area of the SpatialPolygonsDataFrame?
EDIT: This question appears to answer mine. Apologies for not searching hard enough (although the comments did give me some sense of where to turn with rgeos). However, gIntersection causes R to crash...

Depending upon the size of the polygons, you could do something like:
range <- cbind(c(4, 7), c(51, 53))   # column 1: x limits, column 2: y limits
centroids <- coordinates(spdf)       # one x/y centroid per polygon
spdf.subset <- spdf[centroids[, 1] > range[1, 1] &
                    centroids[, 1] < range[2, 1] &
                    centroids[, 2] > range[1, 2] &
                    centroids[, 2] < range[2, 2], ]
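For the question's setup the same filter can be driven directly by the survey bounding box; a minimal sketch, assuming spdf is the SpatialPolygonsDataFrame and surveys the survey points (note that bbox() returns rows = x/y and columns = min/max, so the indexing differs from the cbind() layout above):
b <- bbox(surveys)               # rows: r1 = x, r2 = y; columns: min, max
centroids <- coordinates(spdf)
spdf.subset <- spdf[centroids[, 1] > b[1, 1] & centroids[, 1] < b[1, 2] &
                    centroids[, 2] > b[2, 1] & centroids[, 2] < b[2, 2], ]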

Related

Use netCDF file in R as panel data sf object

I have a netCDF file from:
Kummu, Matti; Taka, Maija; Guillaume, Joseph H. A. (2020), Data from: Gridded global datasets for Gross Domestic Product and Human Development Index over 1990-2015, Dryad, Dataset, https://doi.org/10.5061/dryad.dk1j0
More specifically, I want to use this dataset:
https://drive.google.com/file/d/1-PUa0R0ZcUCJbA1VIMe0KKdElHX2C3Cc/view?usp=sharing
which is in netCDF format.
I would like to import it as a panel-data sf object so that I can spatially join it with a different sf object.
I tried using
ppp <- stars::read_stars('~/GDP_per_capita_PPP_1990_2015_v2.nc', var="GDP_per_capita_PPP")
ppp <- sf::st_as_sf(ppp)
The result is a panel dataset with over 9 million observations, but the geometry seems to be off. I am only interested in African countries, which might help reduce the dataset, but I do not know how to proceed from the sf object I get.
The code below combines the gridded GDP with the centroid of an administrative region (an sf object, represented by obj2 in the code snippet).
If you need to aggregate the grids into an administrative region (say, by averaging over the region), have a look at exactextractr; a sketch follows the snippet below.
library(sf)
library(raster)
library(terra)
# multi-layer GDP grid: one layer per year, 1990-2015
obj1 <- stack("./doi_10.5061_dryad.dk1j0__v2/GDP_per_capita_PPP_1990_2015_v2.nc")
# obj2 is the sf object of administrative-region centroids described above
extract.df <- terra::extract(obj1, obj2, df = T)
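For the aggregation route, a minimal sketch with exactextractr, assuming regions is a (hypothetical) sf polygon layer of the administrative units of interest:
library(sf)
library(raster)
library(exactextractr)
gdp <- stack("./doi_10.5061_dryad.dk1j0__v2/GDP_per_capita_PPP_1990_2015_v2.nc")
# mean GDP per capita within each polygon, one column per year/layer
gdp_mean <- exact_extract(gdp, regions, "mean")
regions <- cbind(regions, gdp_mean)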

tmap is plotting a different legend (range of values?) for a cropped rasterlayer compared to original raster

I am extremely new to working with spatial data and so most of what I'm about to say is me trying to speak a foreign language. Right now I am trying to learn how to do this all in R (I am slightly more capable with this data in QGIS but for this solution, I am looking for R only).
My research involves ecological data in Pennsylvania (PA) and so I am playing around with cropping the US NLCD dataset to PA. I have a raster layer for the NLCD and a shapefile for the boundary of Pennsylvania. I am able to successfully crop the larger US raster down to PA as follows:
library(raster)
library(rgdal)
pabound <- readOGR(dsn = "...", layer = "PAbound")
nlcdRast <- raster(".../NLCD_2016_Land_Cover_L48_20190424.img")
pabound <- spTransform(pabound, CRS(proj4string(nlcdRast)))
PAnlcd <- raster::crop(nlcdRast, pabound)
If I run the simple plot command for both nlcdRast and PAnlcd (i.e. plot(nlcdRast)), they maintain the same color scheme. But when I run the cropped raster through tmap, it seems to treat the data differently, and I am not exactly sure how to figure this out. Please see the plots below:
library(tmap)
tm_shape(nlcdRast) +
tm_raster()
And then when I plot the cropped version in tmap:
tm_shape(PAnlcd) +
tm_raster()
As you can see, it is not simply the color palette that is changing (I am confident I could figure that out) but the real problem is I'm losing the important information as seen in the legend. Whereas the full plot actually shows the categorical values for the raster NLCD, the cropped version now seems to show just some unknown numerical range. Even though it looks bad at the moment, I'd like to have the same legend/information as seen in the full US map.
I apologize for not having a more reproducible example but I am completely lost on what is happening here so I can't quite replicate it. I suppose right now I'm just looking for where to look to try and figure out what changed. Thank you in advance.
Cropping is changing the way the pixels are represented. To maintain your values use the stars package (also note I'm using the sf package for the shapefile):
library(stars)
library(sf)
# load in NLCD
nlcdRast <- read_stars(".../NLCD_2016_Land_Cover_L48_20190424.img")
# read in study area
pabound <- st_read(dsn = "...", layer = "PAbound")
# reproject pabound to match NLCD
pabound <- st_transform(pabound, crs = st_crs(nlcdRast))
# now crop
panlcd <- st_crop(nlcdRast, pabound)
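Assuming a tmap version that supports stars objects (tmap >= 3.0), the cropped result can then be plotted the same way as before to check that the categorical legend is retained:
library(tmap)
tm_shape(panlcd) +
  tm_raster()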

Any workaround to construct temperature distribution over multi-layers raster in R

I found a very interesting blog post, critical threshold in temperature effects, whose empirical approach I want to implement in R. However, my data are multi-layer rasters of Germany's historical daily mean temperatures (15 years of daily values) held in a large RasterBrick object. According to the empirical approach discussed in that post, I need to construct the distribution of temperature from my multi-layer raster data.
Update 2: reproducible shapefile:
I am aware that downloading a shapefile from a third-party website is not practical, so here is a reproducible shapefile to try:
library(sf)
library(maps)
library(rgeos)
library(mapdata)
germany <- st_as_sf(map("world", "Germany", plot = FALSE, fill = TRUE))
write_sf(germany, "germany.shp")
To make my post easy to follow, I created reproducible raster data to work with in R. I also provide Germany's shapefile taken from the eurostat website (the link is safe and the file is very small): eurostat' shapefile. And here is the handy reproducible raster data:
reproducible data
library(raster)
library(lubridate)
library(tidyverse)
r <- raster(xmn = 5.75, xmx = 15, ymn = 47.25, ymx = 55, res = c(0.25, 0.25))
Deu_crop <- do.call(stack, lapply(1:5479, function(i) setValues(r, round(runif(n = ncell(r), min = -10, max = 25)))))
names(Deu_crop) <- paste0('X', gsub('-', '.', ymd('1980.01.01') + days(1:5479)))
shp <- shapefile('eurostat_NUTS3/deu_adm_2006.shp')
e <- raster::extract(Deu_crop,shp)
names(e) <- shp$NUTS_ID
To test the workflow presented in the inspiring post, I need to set up several global variables that its helper functions rely on, but I don't understand how to design them. The post recommends defining globals such as: w (weather data); tempDat (particular aggregated weather data); Trows (span of the aggregated grid data); and T (vector of integer temperatures) (details can be found here: details).
I want to estimate the distribution of temperatures over time from the gridded daily weather data, but I have difficulty testing the empirical steps presented in the inspiring post because it does not cover the case of multi-layer raster data, so I don't know how to adapt its idea on my own in R.
Here is my approach to aggregating the multi-layer raster data for each polygon of the shapefile (eurostat' shapefile) before using the helper functions from the inspiring post:
Initial attempt to manipulate the multi-layer raster:
rasterHelper <- function(ix, e){
  gather(data.frame(e[[ix]], stringsAsFactors = FALSE), 'colname', 'temp') %>%
    group_by(colname) %>% summarise(temp = mean(temp)) %>% ungroup() %>%  # spatial mean
    mutate(year = sub('X(\\d{4}).+', '\\1', colname)) %>%
    group_by(year) %>% summarise_all(funs(sum)) %>% mutate(NUTS_ID = names(e)[ix])
}
do.call(rbind, lapply(1:length(e), function(ix) rasterHelper(ix, e)))
But my attempt above is not working. My intention is to aggregate the temperature raster data for each polygon. The implementation in the inspiring post is very useful, but it is still hard to follow for multi-layer raster data. I assume I should work on each raster layer and construct the temperature distribution over time, but I don't have a solid idea how to do that in R. Any idea?
Update:
Here is the paper I took the inspiration from: nonlinear temperature effect ..., but implementing the proposed method is still challenging for me, even after following the workflow presented in the respective blog: searching critical threshold in temperature effect.
Can anyone point me to how to apply this empirical approach to multi-layer raster data in R? How can I estimate the distribution of temperature over time? How can I make this happen in R? Any more thoughts? Thanks.
I am not quite sure what you want to do.
To set up a smaller example:
library(raster)
lux <- shapefile(system.file("external/lux.shp", package="raster"))
r <- raster(lux)
s <- stack(lapply(1:12, function(i) setValues(r, 1:ncell(r))))
e <- extract(s, lux)
Now you say that you want to aggregate; that is a bit ambiguous, but perhaps what you want is
x <- lapply(e, function(i) apply(i,2,mean))
equivalent to
y <- extract(s, lux, fun='mean')
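If what is wanted is the distribution of temperatures rather than a mean, the extracted values can be binned instead of averaged. A rough sketch against the question's own extracted list e (from raster::extract(Deu_crop, shp), simulated values between -10 and 25), with hypothetical 1-degree bins:
breaks <- seq(-10, 25, by = 1)  # hypothetical bin edges matching the simulated range
# per-polygon counts of cell-days falling into each temperature bin
temp.dist <- lapply(e, function(m) table(cut(as.vector(m), breaks = breaks)))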

Query raster brick layer based on another raster in R

I have a NetCDF file of global oceanographic (OmegaA) data at relatively coarse spatial resolution with 33 depth levels. I also have a global bathymetry raster at much finer resolution. My goal is to get the seabed OmegaA data from the NetCDF file, using the bathymetry data to determine the desired depth. My code so far:
library(raster)
library(rgdal)
library(ncdf4)
# Aragonite data. Defaults to CRS WGS84
ncin <- nc_open("C:/..../GLODAPv2.2016b.OmegaA.nc")
ncin.depth <- ncvar_get(ncin, "Depth")  # 33 depth levels
omegaA.brk <- brick("C:/.../GLODAPv2.2016b.OmegaA.nc")
omegaA.brk <- rotate(omegaA.brk)  # because the netCDF is in Lon 0-360
# depth raster. CRS WGS84
r <- raster("C:/....GEBCO.tif")
# resample the raster brick to the resolution that matches the bathymetry raster
omegaA.brk <- resample(omegaA.brk, r, method = "bilinear")
# create blank final raster
omegaA.rast <- raster(ncol = r@ncols, nrow = r@nrows)
extent(omegaA.rast) <- extent(r)
omegaA.rast[] <- NA_real_
# create vector of indices of desired depth values
depth.values <- getValues(r)
depth.values.index <- which(!is.na(depth.values))
# loop to find the appropriate raster brick layer, extract the value at the desired index, and insert it into the blank raster
for (p in depth.values.index) {
  dep.index <- which(abs(ncin.depth + depth.values[p]) == min(abs(ncin.depth + depth.values[p])))  # this sometimes results in multiple levels being selected
  brk.level <- omegaA.brk[[dep.index]]  # can be more than one level if multiple layers selected above
  omegaA.rast[p] <- omegaA.brk[[1]][p]  # here I choose the first level if multiple levels have been selected above
  print(paste(p, "of", length(depth.values.index)))  # counter to look at progress
}
The problem: The result is a raster with massive gaps (NAs) in it where there should be data. The gaps often take a distinctive shape - eg, follow a contour, or along a long straight line. I've pasted a cropped example.
[image: cropped example of the output raster showing the NA gaps]
I think this could be because either 1) for some reason the 'which' statement in the loop is not finding a match or 2) a misalignment of the projections is created which I've read can happen when using 'Rotate'.
I've tried to make sure all the extents, resolutions, number of cells, and CRS's are all the same, which they seem to be.
To speed up the process I've cropped the global brick and bathy raster to my area of interest, again checking that all the spatial resolutions, etc etc match - I've not included those steps here for simplicity.
At a loss. Any help welcome!
Without a reproducible example, this kind of problem is hard to solve. I can't tell where your problem is, but I'll present the approach I would try. Maybe it's good, maybe it's bad, I don't know, but it may inspire you to find a way around your problem.
To my understanding, you have a brick of OmegaA (33 layers/depths) and a bathymetry raster. You want to get the OmegaA value at the bottom of the sea. Here is how I would do it:
Resample the OmegaA raster to the same resolution and extent as the bathymetry one.
Transform the bathymetry raster into a raster brick of 33 layers of 0-1. E.g. if the sea bottom is at 200 m for one particular pixel, then this pixel is 0 on all depth layers other than 200 and 1 on the 200 layer. To program this, I would go the long way, something like:
r_1 <- r
values(r_1) <- values(r)==10 # where 10 is the depth (it could be a range with < or >)
r_2 <- r
values(r_2) <- values(r)==20
...
r_33 <- r
values(r_33) <- values(r)==250
r_brick <- brick(r_1, r_2, ..., r_33)
Then you multiply both raster bricks. They have the same dimensions, so it should be easy. The output should be a raster brick of 33 layers with 0 everywhere that isn't the bottom of the sea and the OmegaA value where it is.
Combine all the layers of the brick obtained previously into a single raster with a sum.
This should work. If you have problems dealing with raster bricks, you could convert the data into base R arrays, which might be simpler.
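A compact sketch of the same 0/1-mask idea, under these assumptions: ncin.depth holds the 33 GLODAP depth levels, r is the bathymetry raster on the same grid as the resampled omegaA.brk (negative below sea level, hence the abs()), and the bands are a hypothetical midpoint split between consecutive depth levels:
# band edges halfway between consecutive depth levels
edges <- c(-Inf, head(ncin.depth, -1) + diff(ncin.depth) / 2, Inf)
# one 0/1 layer per depth level: 1 where the seabed falls in that level's band
mask.brk <- stack(lapply(seq_along(ncin.depth), function(i) {
  abs(r) >= edges[i] & abs(r) < edges[i + 1]
}))
# multiply and sum across layers: one raster holding the seabed OmegaA value per cell
omegaA.bottom <- sum(omegaA.brk * mask.brk)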
Good luck.

How to get count of non-NA raster cells within polygon

I've been running into all sorts of issues using ArcGIS ZonalStats and thought R could be a great alternative. That said, I'm fairly new to R, though I have a coding background.
The situation is that I have several rasters and a polygon shape file with many features of different sizes (though all features are bigger than a raster cell and the polygon features are aligned to the raster).
I've figured out how to get the mean value for each polygon feature using the raster library with extract:
#load packages required
require(rgdal)
require(sp)
require(raster)
require(maptools)
# ---Set the working directory-------
datdir <- "/test_data/"
#Read in a ESRI grid of water depth
ras <- readGDAL("test_data/raster/pl_sm_rp1000/w001001.adf")
#convert it to a format recognizable by the raster package
ras <- raster(ras)
#read in polygon shape file
proxNA <- readShapePoly("test_data/proxy/PL_proxy_WD_NA_test")
#plot raster and shp
plot(ras)
plot(proxNA)
#calc mean depth per polygon feature
#unweighted - only assigns grid to district if centroid is in that district
proxNA@data$RP1000 <- extract(ras, proxNA, fun = mean, na.rm = TRUE, weights = FALSE)
#check results
head(proxNA)
#plot depth values
spplot(proxNA[,'RP1000'])
The issue I have is that I also need an area-based ratio between the area of the polygon and all non-NA cells in the same polygon. I know what the cell size of the raster is and I can get the area for each polygon, but the missing link is the count of all non-NA cells in each feature. I managed to get the cell numbers of all the cells in each polygon with proxNA@data$Cnumb1000 <- cellFromPolygon(ras, proxNA), and I'm sure there is a way to get the actual value of each raster cell, which would then require a loop to count all the non-NA cells, etc.
BUT, I'm sure there is a much better and quicker way to do that! If any of you has an idea or can point me in the right direction, I would be very grateful!
I do not have access to your files, but based on what you described, this should work:
library(raster)
mask_layer <- shapefile(paste0(shapedir, "AOI.shp"))
original_raster <- raster(paste0(template_raster_dir, "temp_raster_DecDeg250.tif"))
nonNA_raster <- !is.na(original_raster)
masked_img <- mask(nonNA_raster, mask_layer)  # based on centroid location of cells
nonNA_count <- cellStats(masked_img, sum)
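If a count per polygon feature is needed rather than a single total for the whole mask, the same non-NA indicator raster can be passed through extract with a sum; a sketch using the question's ras and proxNA objects:
# per-feature count of non-NA raster cells
proxNA@data$nonNA_cells <- as.vector(extract(!is.na(ras), proxNA, fun = sum, na.rm = TRUE))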
