Extracting gridded (netcdf) climate data for an entire watershed polygon - r

My goal is to extract precipitation data from the Daymet database (https://daymet.ornl.gov/) for each of my 68 watershed polygons. I was able to use the daymetr package to download the data:
download_daymet_ncss(location = c(53.5,-116.6, 48,-98),
start = 1992,
end = 2000,
param = "prcp",
path = "./Daymet gridded data/Precip_raw")#download the data
I realize this is quite a large area and might be part of the issue.
Once the data is downloaded, the challenge begins.
I've tried two approaches. The first was to aggregate the data into annual values (using daymet_grid_agg from the daymetr package), but extracting the correct areas from the resulting raster is challenging (and I haven't been able to do so successfully).
I then tried to use the RavenR package to generate a grid overlay from a netcdf (rvn_netcdf_to_gridshp).
fn<-"prcp_daily_1992_ncss.nc"
ncfile<-nc_open(fn)
outshp<-rvn_netcdf_to_gridshp(ncfile, projID = 4326)
This fails completely: either RStudio freezes, R cannot allocate 2.7 GB, or the shapefile is empty. I have tried increasing the memory size, but then R just runs forever and nothing seems to happen.
Next, I tried this simple approach (as per https://www.researchgate.net/post/How-to-get-data-only-within-shapefile-boundary-from-a-netcdf-data-file-in-R-software):
shp<-st_read(file.choose())
data<-brick(file.choose())
crs(data) <- "+proj=lcc +lon_0=-90 +lat_1=33 +lat_2=45"
output<-raster::mask(data, shp)
The output brick raster is full of NAs...
I have made the downloaded netcdf quite a bit smaller, but none of these approaches seem to work... And yes, my data are gridded (unlike in this case: How to extract NetCDF data frame by region using a polygon shapefile)
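One thing worth checking in the mask attempt above is whether the polygons and the grid share a CRS: the brick is in a projected Lambert conformal conic system, while watershed shapefiles are often in lon/lat, and a mask across mismatched coordinates returns only NAs. The following is a minimal sketch on synthetic data, not a tested fix for the files in the question. The `daymet_crs` string is an assumption based on the published Daymet projection parameters (in practice, keep the CRS that `brick()` reads from the file rather than overwriting it), and the grid, polygon and values are made up.

```r
library(raster)
library(sf)

# Assumed Daymet projection string; prefer the CRS read from the NetCDF itself
daymet_crs <- "+proj=lcc +lat_1=25 +lat_2=60 +lat_0=42.5 +lon_0=-100 +x_0=0 +y_0=0 +ellps=WGS84 +units=m +no_defs"

# Synthetic stand-in for the precipitation brick: 3 layers with constant values 1, 2, 3
r1 <- raster(nrows = 20, ncols = 20, xmn = 0, xmx = 20000,
             ymn = 0, ymx = 20000, crs = daymet_crs)
values(r1) <- 1
b <- brick(r1, r1 * 2, r1 * 3)

# Synthetic watershed, stored in lon/lat as many shapefiles are
ring <- rbind(c(5000, 5000), c(15000, 5000), c(15000, 15000),
              c(5000, 15000), c(5000, 5000))
shp <- st_sf(id = 1, geometry = st_sfc(st_polygon(list(ring)), crs = daymet_crs))
shp <- st_transform(shp, 4326)

# Reproject the polygon to the grid's CRS (not the other way round), then
# take the mean of the cells inside each polygon, one column per layer
shp_lcc <- st_transform(shp, st_crs(projection(b)))
vals <- raster::extract(b, as(shp_lcc, "Spatial"), fun = mean, na.rm = TRUE)
vals
```

With 68 polygons the result would have one row per watershed and one column per layer, which avoids building a grid shapefile at all.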

Related

I want to rasterise a shape file but it is taking me ages to complete and have not succeeded yet. Can anyone tell me why?

I have a shapefile that I made in qgis of a national park with landcover type. I clipped a large shapefile of Thailand with a smaller shapefile of just the park. This file (DPKY.lc5) is now 10.5 Mb. When I run the code it takes forever and has not been successful. Why is that?
library(raster)
DPKY.lc5 <- shapefile("dpky.lc5.shp")
DPKY.lc5 <- spTransform(DPKY.lc5, CRS('+init=EPSG:4326'))
DPKY.lc5$VALUE<-as.numeric(DPKY.lc5$VALUE)
rr <- raster(DPKY.lc5, res=0.01)
rr1 <- rasterize(DPKY.lc5, rr, field="VALUE")
rr1
I would expect this to work as it is only 10.5 Mb, but it takes forever. I need this raster to work so I can use data points in another data frame to show the frequency of elephant habitat use in the park. I succeeded with other raster files to get elevation and slope aspect, but this time it doesn't like it.

Query raster brick layer based on another raster in R

I have a NetCDF file of global oceanographic (OmegaA) data at relatively coarse spatial resolution with 33 depth levels. I also have a global bathymetry raster at much finer resolution. My goal is to get the seabed OmegaA data from the NetCDF file, using the bathymetry data to determine the desired depth. My code so far:
library(raster)
library(rgdal)
library(ncdf4)
# Aragonite data. Defaults to CRS WGS84
ncin <- nc_open("C:/..../GLODAPv2.2016b.OmegaA.nc")
ncin.depth <- ncvar_get(ncin, "Depth")# 33 depth levels
omegaA.brk <- brick("C:/.../GLODAPv2.2016b.OmegaA.nc")
omegaA.brk <-rotate(omegaA.brk)# because netCDF is in Lon 0-360.
# depth raster. CRS WGS84
r<-raster("C:/....GEBCO.tif")
# resample the raster brick to the resolution that matches the bathymetry raster
omegaA.brk <-resample(omegaA.brk, r, method="bilinear")
# create blank final raster
omegaA.rast <- raster(ncol = r@ncols, nrow = r@nrows)
extent(omegaA.rast) <- extent(r)
omegaA.rast[] <- NA_real_
# create vector of indices of desired depth values
depth.values<-getValues(r)
depth.values.index<-which(!is.na(depth.values))
# loop to find appropriate raster brick layer, and extract the value at the desired index, and insert into blank raster
for (p in depth.values.index) {
dep.index <-which(abs(ncin.depth+depth.values[p]) == min(abs(ncin.depth+depth.values[p]))) ## this sometimes results in multiple levels being selected
brk.level <-omegaA.brk[[dep.index]] # can be more than on level if multiple layers selected above.
omegaA.rast[p] <-omegaA.brk[[1]][p] ## here I choose the first level if multiple levels have been selected above
print(paste(p, "of", length(depth.values.index))) # counter to look at progress.
}
The problem: The result is a raster with massive gaps (NAs) in it where there should be data. The gaps often take a distinctive shape - eg, follow a contour, or along a long straight line. I've pasted a cropped example.
I think this could be because either 1) for some reason the 'which' statement in the loop is not finding a match or 2) a misalignment of the projections is created which I've read can happen when using 'Rotate'.
I've tried to make sure all the extents, resolutions, number of cells, and CRS's are all the same, which they seem to be.
To speed up the process I've cropped the global brick and bathy raster to my area of interest, again checking that all the spatial resolutions, etc etc match - I've not included those steps here for simplicity.
At a loss. Any help welcome!
Without a reproducible example, this kind of problem is hard to solve. I can't tell where your problem is, but I'll present the approach I would try. Maybe it's good, maybe it's bad, I don't know, but it may inspire you to find a way around your problem.
To my understanding, you have a brick of OmegaA (33 layers/depths) and a bathymetry raster. You want to get the OmegaA value at the bottom of the sea. Here is how I would do it:
Make the OmegaA raster the same resolution and extent as the bathymetry one.
Transform the bathymetry raster into a raster brick of 33 layers of 0/1 values. E.g. if the sea bottom is at 200 m for one particular pixel, then this pixel is 1 on the 200 m layer and 0 on all other depth layers. To program this, I would go the long way, something like:
r_1 <- r
values(r_1) <- values(r)==10 # where 10 is the depth (it could be a range with < or >)
r_2 <- r
values(r_2) <- values(r)==20
...
r_33 <- r
values(r_33) <- values(r)==250
r_brick <- brick(r_1, r_2, ..., r_33)
Then multiply the two raster bricks. They have the same dimensions, so it should be easy. The output should be a raster brick of 33 layers with 0 everywhere that isn't the bottom of the sea and the value of OmegaA where it is.
Combine all the layers of the brick obtained previously into a single raster with a sum.
This should work. If you have problems dealing with raster bricks, you could convert the data to base R arrays, which could be simpler.
Good luck.
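The base R array route mentioned above could be sketched as follows. This is only an illustration on made-up toy data (the names `depth_levels`, `omega`, `bathy` and `nearest_level` are hypothetical), assuming depth levels are positive downwards and bathymetry is stored as negative elevations, as in the question. Using which.min() instead of which(... == min(...)) always returns exactly one level, so ties resolve to the shallower one.

```r
# Toy data: 3 depth levels on a 2 x 2 grid (hypothetical values)
depth_levels <- c(0, 100, 500)                 # metres, positive down
omega <- array(1:12, dim = c(2, 2, 3))         # [row, col, level]
bathy <- matrix(c(-20, -90, -450, NA), 2, 2)   # negative elevations; NA = land

# Index of the depth level closest to the seabed in each cell (NA on land)
nearest_level <- function(z) {
  if (is.na(z)) NA_integer_ else which.min(abs(depth_levels + z))
}
lev <- vapply(bathy, nearest_level, integer(1))

# Matrix indexing picks one value per cell; rows containing NA yield NA
idx <- cbind(as.vector(row(bathy)), as.vector(col(bathy)), lev)
seabed <- matrix(omega[idx], nrow(bathy), ncol(bathy))
seabed
```

The resulting matrix can be written back into a raster with the same extent as the bathymetry layer.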

Rasterize error: Polygon to raster conversion produces horizontal lines

I am working with shapefiles in R that I need to convert from polygon to raster. While the vectors look perfect when plotted, when converted to raster using 'rasterize' they produce erroneous horizontal lines. Here is an example of the problem:
Here is a generic example of the code that I am using (sorry that I cannot upload the data itself as it is proprietary):
spdf.dat <- readOGR("directory here", "layer here")
# Plot polygon
plot(spdf.dat, col = 'dimgrey', border = 'black')
# Extract boundaries
ext <- extent(spdf.dat)
# Set resolution for rasterization
res <- 1
# determine no. of columns from extents and resolution
yrow <- round((ext@ymax - ext@ymin) / res)
xcol <- round((ext@xmax - ext@xmin) / res)
# Rasterize base
rast.base <- raster(ext, yrow, xcol, crs = projection(spdf.dat))
# Rasterize substrate polygons
rast <- rasterize(spdf.dat, rast.base, field = 1, fun = 'min', progress='text')
plot(rast, col = 'dimgrey')
Does this seem to be a problem with the source data or the rasterize function? Has anyone seen this sort of error before? Thank you for any advice that you can provide.
To make it official so the question is considered answered, I'll copy my commented responses here. You can therefore accept it.
When I look at your figure, it seems to me that the problematic lines in the raster are situated at the same latitude as some islands. Try to remove these islands from your dataset. If the problem disappears, you'll know that your data is the problem and where in your data the problem lies.
Another option is to try the gdalUtils package, which has a function gdal_rasterize. Maybe gdal is less demanding about the input data.
I had a similar problem rasterizing the TIGER areal water data for the San Juan Islands in Washington State, as well as for Maui - both of these spatial polygon data frames at the default resolution returned by package Tigris using a raster defined by points 1 arc-second of lat/lon apart. There were several horizontal stripes starting at what appeared to be sharp bends of the coastline. Various simplification algorithms helped, but not predictably, and not perfectly.
Try package Velox, which takes some getting used to as it uses Reference Classes. It probably has size limits, as it uses the Boost geometry libraries and works in memory. You don't need to understand it all, I don't. It is fast compared to raster::rasterize (especially for large and complicated spatial lines dataframes), although I didn't experience the hundred-fold speedups claimed, I am not gonna complain about a mere factor of 10 or 20 speedup. Most importantly, velox$rasterize() doesn't leave streaks for the locations I found where raster::rasterize did!
I found that it leaves a lot of memory garbage, and when converting large rasterLayers derived from velox$rasterize, running gc() was helpful before writing the raster in native R .grd format (in INT1S format to save disk space).
Just as a follow up to this question based on my experiences.
The horizontal lines are a result of these 'islands' as described above. However, the problem only occurs if the polygon is 'multi-part'. If the 'islands' are distinct polygons rather than separate parts of one polygon, then raster::rasterize() works fine.
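Splitting multi-part polygons into single parts before rasterizing can be sketched like this. It uses the sf package rather than the readOGR workflow in the question, which is a substitution on my part, and the two squares (`sq`, `multi`) are made-up stand-ins for a mainland and an island stored as one feature.

```r
library(sf)

# Helper that builds a closed unit square starting at (x0, y0)
sq <- function(x0, y0) {
  rbind(c(x0, y0), c(x0 + 1, y0), c(x0 + 1, y0 + 1), c(x0, y0 + 1), c(x0, y0))
}

# One feature made of two separate parts (a multi-part polygon)
multi <- st_sf(VALUE = 1,
               geometry = st_sfc(st_multipolygon(list(list(sq(0, 0)),
                                                      list(sq(5, 0))))))

# Cast to single-part polygons: each part becomes its own row and the
# attribute is repeated (sf warns about that, hence suppressWarnings)
single <- suppressWarnings(st_cast(multi, "POLYGON"))
nrow(single)
```

The single-part object can then be converted with as(single, "Spatial") if the rasterizing function expects an sp object.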

Creating a raster object from a gridded data set in R

I have a dataset with dimensions 360x180x720 (lon x lat x time). I would like to change its resolution from 1 deg to 0.25 deg using the disaggregate function that's part of R's raster package. I haven't been able to figure out how to create a raster object from this dataset. I think I need to do something involving stacking the data corresponding to each time step in layers, but what I've tried so far has been unsuccessful. I apologize if this is a trivial question, but most of the online guidance I've seen has been directed towards data formatted spreadsheet style in (lat, lon) pairs (rather than a gridded dataset).
I appreciate any suggestions!
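A minimal sketch of one way to do this, on a small made-up array rather than the real 360x180x720 data. It assumes the array is ordered lon x lat x time, that latitude runs south to north, and that the grid covers the whole globe; all three are assumptions to check against the actual file. brick() treats an array as row x column x layer with row 1 at the top, so the dimensions need reordering first.

```r
library(raster)

# Hypothetical stand-in for the real data: 4 lon x 2 lat x 3 time steps
a <- array(seq_len(4 * 2 * 3), dim = c(4, 2, 3))

# Reverse latitude (assumed south-to-north) so row 1 is the northern edge,
# then reorder the dimensions to lat x lon x time
a_rc <- aperm(a[, dim(a)[2]:1, , drop = FALSE], c(2, 1, 3))

# One layer per time step on a global lon/lat grid
b <- brick(a_rc, xmn = -180, xmx = 180, ymn = -90, ymx = 90,
           crs = "+proj=longlat +datum=WGS84")

# Split every cell into 4 x 4 cells; without a method argument the values
# are repeated, not interpolated
b025 <- disaggregate(b, fact = 4)
dim(b025)
```

At full size the disaggregated brick holds 1440 x 720 x 720 cells, so memory may become the limiting factor.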

Complex Maps in R Using ggplot2

I recently needed to create a map using combined statistical areas for the US - I want to share how this can be done. The same applies if one wants to use zip codes or other divisions (county, congressional districts, etc.)
1) You need to obtain shapefiles that are needed by any program to draw a map - these basically define the polygons that create the shapes (states, zips, csa, etc.). In the US the Census bureau provides those at
http://www.census.gov/cgi-bin/geo/shapefiles2013/main
(I am sorry but these are so large that I can't provide the data for an example). Unzip them and place them into your directory of choice. Make sure you set it as the working directory using setwd()
2) The files need to be imported into a spatial dataframe object in R, using readShapeSpatial in package maptools. In my case I read 2 files, the state file to draw the boundary and the csa file to draw the metro areas. For simplicity I created an id field in each; this is not needed. Please note rgeos is required as well: by default maptools tries to use gpclib, but there are some license issues; rgeos does not have those.
require(maptools)
require(rgeos)
cbsa_bdry = readShapeSpatial("tl_2013_us_csa.shp")
cbsa_bdry@data$id <- as.character(cbsa_bdry@data$GEOID)
state_bdry = readShapeSpatial("tl_2013_us_state.shp")
state_bdry@data$id <- as.character(state_bdry@data$STUSPS)
3) The files need to be fortified to be used in ggplot - as far as I can tell this creates a dataframe for plotting from the spatial data set. I do not fully understand it, but it is needed. Here you use the id field - in my case the state and the csa number
cbsa_map_f <- fortify(cbsa_bdry, region="id")
state_map_f <- fortify(state_bdry, region="id")
At this point the fortified files can be mapped using ggplot, however normally you want some other data to be mapped with them, for example a variable of interest that defines the color of the shape (it can be say the party that holds the governor mansion, or the average income, or whatever your analysis dictates)
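4) I will merge back the data portion of the spatial dataframe for this example; in this case I am not merging any other data. Please note, I learned the hard way that using merge can mess up the data due to sorting. If you use merge, make sure the data is not sorted as part of it; I am using join for that reason.
require(plyr)
state_map_df <-join(state_map_f, state_bdry@data, by="id")
state_map_df$state <- state_map_df$STUSPS
# cbsa_map_df is used in step 5 but is not built in the original listing;
# assumed here to be joined the same way. It also needs a state column for the subset below.
cbsa_map_df <-join(cbsa_map_f, cbsa_bdry@data, by="id")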
5) Do your chart - you need to draw the layers in order, the bottom layer first, then second one, or one can obscure the other. For simplicity I am only drawing Texas
tx <-subset(cbsa_map_df,state=="TX" )
tx1 <- subset(state_map_df,state=="TX")
require(ggplot2)
require(mapproj)
sample <- ggplot(NULL) +geom_polygon(data=tx1,aes(long,lat,group=group),color="blue",fill="light gray")+coord_map(projection="mercator")+theme_bw()
sample <- sample + geom_polygon(data=tx,aes(long,lat,fill=NAME,group=group),color="black")+theme(legend.position="bottom",axis.text.x = element_blank(),axis.text.y = element_blank(),axis.ticks = element_blank())
sample<- sample+guides(fill=guide_legend(title="MSA Name",nrow = 3,title.position="top",title.hjust=0.5,title.theme=element_text(face="bold",angle=0)))
sample <- sample + scale_x_continuous("")+scale_y_continuous("")
sample
I am not passing a dataframe to the initial ggplot call to open the chart, as each layer uses a different dataframe for data. If I had merged data to use a continuous scale, I would use that in fill= for the csa portion
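The sorting issue mentioned in step 4 can be seen on a tiny made-up example in base R (the `pts` and `attrs` frames are hypothetical). Polygon vertices must stay in their original order, and merge() sorts the result by the key column by default, so the drawing order is lost unless it is restored afterwards.

```r
# Hypothetical fortified-style data: vertex order matters for drawing
pts <- data.frame(id = c("b", "b", "a", "a"), order = 1:4)
attrs <- data.frame(id = c("a", "b"), name = c("A", "B"))

# merge() sorts on the key by default, so the "a" rows now come first
m <- merge(pts, attrs, by = "id")
m$order

# Restoring the original vertex order after the merge
m_fixed <- m[order(m$order), ]
m_fixed$order
```

plyr::join keeps the row order of its first argument, which is why it is used in the steps above.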

Resources