I'm processing yearly multilayer netCDF files with daily precipitation data from CHIRPS. I have the files for the whole world, each about 1.2 GB. I need to calculate indices from the precipitation data for each raster cell in a specific region. To do that, I'm trying to crop the files to a rectangle covering my area of interest, using the raster R package.
This is the code I'm using, shown for the first file:
library(ncdf4)
library(raster)
library(rgdal)
# Crop extent
crop_extent <- as(raster::extent(79, 89, 25, 31), "SpatialPolygons")
proj4string(crop_extent) <- "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"
# Set directory with original files
setwd("~/data")
# Read file
chirps81 <- stack("chirps-v2.0.1981.days_p05.nc")
chirps81crop <- crop(chirps81, crop_extent)
# Write cropped file back to different folder
setwd("~/croppeddata")
writeRaster(chirps81crop, "chirps81crop.nc", overwrite=TRUE)
For some reason, however, the layers lose their names when the file is written. In the original files and after cropping, the layers have names of the format "X1981.01.01". But after writing the netCDF file and reading it back with newfile <- stack("chirps81crop.nc"), the layer names change to the format 'X1' up to 'X365'. I think it should be fine to work with it, assuming the order of the layers didn't get mixed up, but I don't understand what is happening to the layer names and whether this happens because something is wrong with my code.
It's the writeRaster() function that loses the layer names, not the crop operation. You can use the lower-level ncdf4 functions to assign a numeric value (unfortunately not a string) to each layer, which then shows up in the layer names after reading. Taking inspiration from the example here, I created some code that shows this.
library(ncdf4)
library(raster)
library(rgdal)
# Crop extent
crop_extent <- as(raster::extent(5.74, 5.75, 50.96, 50.97), "SpatialPolygons")
proj4string(crop_extent) <- "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"
# make a sample file
r <- raster(system.file("external/test.grd", package="raster"))
r.latlon <- projectRaster(r, crs = proj4string(crop_extent))
writeRaster(x=r.latlon, filename = 'test.nc', format = 'CDF', overwrite=TRUE)
# read the sample as a 2 layer stack and crop it
test <- stack('test.nc', 'test.nc')
writeRaster(test, 'teststack.nc', overwrite=TRUE, format='CDF')
testcrop <- crop(test, crop_extent)
names(testcrop)
# [1] "test.1" "test.2"
# write the cropped file and make the zname equal to Layer
writeRaster(testcrop, 'testcrop.nc', overwrite=TRUE, format='CDF', zname='Layer')
# open the cdf file directly
nc <- nc_open('testcrop.nc', write = TRUE)
# give the layers numbers starting from 10 so
# we can see them easily
layers = 1:nlayers(testcrop) + 10
layers
# [1] 11 12
ncvar_put(nc, 'Layer', layers)
nc_close(nc)
newtestcrop <- stack('testcrop.nc')
names(newtestcrop)
# [1] "X11" "X12"
nc <- nc_open('testcrop.nc', write = FALSE)
layers = ncvar_get(nc, 'Layer')
layers
# [1] 11 12
nc_close(nc)
So it is possible to control the numbers that end up in the layer names when writing the raster, but I don't know enough about your environment to say whether this helps, since it might be tricky to map the names you need to a single unambiguous number.
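One unambiguous way to turn date names such as X1981.01.01 into a single number is to encode each date either as a YYYYMMDD integer or as the number of days since a fixed origin. A minimal base-R sketch, where the three names and the 1900-01-01 origin are only examples: the resulting integers are the kind of values that could be written to the z variable with ncvar_put() as in the code above, and the original-style names can be rebuilt from them after reading.

```r
# Layer names in the format reported for the original CHIRPS stack (examples)
layer_names <- c("X1981.01.01", "X1981.01.02", "X1981.12.31")

# Parse the dates out of the "X%Y.%m.%d" names
dates <- as.Date(sub("^X", "", layer_names), format = "%Y.%m.%d")

# Option 1: YYYYMMDD integers (human-readable, but not evenly spaced)
ymd <- as.integer(format(dates, "%Y%m%d"))

# Option 2: days since an arbitrary fixed origin (evenly spaced)
origin <- as.Date("1900-01-01")
days <- as.integer(dates - origin)

# Converting the numbers back into dates
back_from_ymd  <- as.Date(as.character(ymd), format = "%Y%m%d")
back_from_days <- origin + days

# Rebuild names in the original "X1981.01.01" style
restored <- format(back_from_ymd, "X%Y.%m.%d")
```

YYYYMMDD is easier to read in the layer names; days since an origin is easier to do arithmetic with.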
I hope you don't mind me offering a non-R solution, but this task is much easier from the command line using CDO:
cdo sellonlatbox,79,89,25,31 chirps-v2.0.1981.days_p05.nc cropped_file.nc
Which indices do you want to calculate? I suspect those can be calculated quickly and easily with CDO functions too.
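On the R side, once the cropped data for one year are in a rows x columns x days array (for example via as.array() on the cropped brick; that conversion is assumed here and not shown), simple per-cell indices take one apply() call each. The three indices below (annual total, number of wet days with at least 1 mm, maximum 1-day amount) are common examples, not necessarily the ones needed here, and the data are synthetic:

```r
# Synthetic stand-in for one cropped year: 4 x 5 cells, 365 daily layers (mm/day)
set.seed(42)
precip <- array(rgamma(4 * 5 * 365, shape = 0.3, scale = 10), dim = c(4, 5, 365))

# Annual total precipitation per cell (sum over the day dimension)
annual_total <- apply(precip, c(1, 2), sum)

# Number of wet days (>= 1 mm) per cell
wet_days <- apply(precip, c(1, 2), function(x) sum(x >= 1))

# Maximum 1-day precipitation per cell
max_1day <- apply(precip, c(1, 2), max)
```

Each result is a rows x columns matrix, i.e. one value per grid cell, which can be put back on the cropped grid.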
I have more than 50 raster files (ASCII format) that I need to crop. I have already exported the mask from ArcMap in ASCII format as well and loaded it into R. How can I apply this to all rasters in one go and export them with the same names as before (in a different folder, of course, so as not to overwrite them)?
I know there is a crop function in the raster package, but I have never used it so far. I have only stacked the rasters for further habitat analysis.
My code so far:
#### Use only part of area
files2 <- list.files(path="D:/",full.names=TRUE, pattern = "\\.asc$")
files2
# Create a RasterLayer from first file
mask_raster <- raster(files2[1])
# Crop. But how??
crop(x = , y=mask_raster)
writeRaster(...)
I did not find an easy way to crop multiple rasters with a raster, but I did with a shapefile. So I went back to ArcMap and converted the raster into a shapefile. Then, in R, crop and mask were the crucial steps. See the code below (modified from Mauricio Zambrano-Bigiarini's code). Hope this helps.
library(raster)
library(rgdal)
# Reading the shapefile (mask to crop later)
Maskshp <- readOGR("mask.shp")
# Getting the spatial extent of the shapefile
e <- extent(Maskshp)
# List the rasters to crop once, outside the loop
files <- list.files(path="...your path", full.names=TRUE, pattern = "\\.asc$")
# Output folder, so the original files are not overwritten (folder name is an example)
out_dir <- "cropped"
dir.create(out_dir, showWarnings = FALSE)
# Loop over all files found, instead of a hard-coded number of runs
for (i in seq_along(files)) {
  # Reading the raster to crop
  Env_raster <- raster(files[i])
  # Output filename: the original name, placed in out_dir
  filename <- file.path(out_dir, basename(files[i]))
  # Crop the raster to the extent of the shapefile
  Env_raster.crop <- crop(Env_raster, e, snap="out")
  # Dummy raster with a spatial extent equal to the cropped raster,
  # but full of NA values
  crop_template <- setValues(Env_raster.crop, NA)
  # Rasterize the catchment boundaries, with NA outside the catchment boundaries
  Maskshp.r <- rasterize(Maskshp, crop_template)
  # Putting NA values in all the raster cells outside the shapefile boundaries
  Maskshp.masked <- mask(x=Env_raster.crop, mask=Maskshp.r)
  plot(Maskshp.masked)
  # Export file with the original name as new name
  writeRaster(Maskshp.masked, filename)
}
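The loop does two things per file: crop() cuts the raster down to the rectangular extent of the polygon, and mask() then sets every cell outside the polygon to NA. A base-R illustration of the same two steps on plain matrices, assuming the values and the mask already sit on the same grid (the 6 x 6 toy data are made up):

```r
# Toy value grid and a mask on the same grid (NA = outside the area of interest)
grid_vals <- matrix(1:36, nrow = 6, ncol = 6)
mask_grid <- matrix(NA, nrow = 6, ncol = 6)
mask_grid[2:4, 3:5] <- 1
mask_grid[2, 3] <- NA  # make the "polygon" irregular

# Step 1 (crop): keep only the rows/columns spanned by non-NA mask cells,
# i.e. the bounding box of the area of interest
inside <- which(!is.na(mask_grid), arr.ind = TRUE)
rows <- range(inside[, "row"])
cols <- range(inside[, "col"])
cropped <- grid_vals[rows[1]:rows[2], cols[1]:cols[2]]
mask_cropped <- mask_grid[rows[1]:rows[2], cols[1]:cols[2]]

# Step 2 (mask): set cells inside the bounding box but outside the mask to NA
masked <- cropped
masked[is.na(mask_cropped)] <- NA
```

Cropping first keeps the masking step cheap, because mask() then only has to visit the cells inside the bounding box.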
Trying to filter a shapefile to make plotting easier
I have a shapefile downloaded from the UK government:
http://geoportal.statistics.gov.uk/datasets/7ff28788e1e640de8150fb8f35703f6e_1/data?geometry=-76.678%2C45.365%2C69.572%2C63.013&orderBy=lad16cd&orderByAsc=false&where=%20(UPPER(lad16cd)%20like%20%27%25E0800000%25%27%20OR%20UPPER(lad16cd)%20like%20%27%25E08000010%25%27)%20
Based on this: https://www.r-bloggers.com/r-and-gis-working-with-shapefiles/
I wrote the following, but I do not know how to filter it:
setwd("~/Documents/filename")
getwd() # --double confirm real data directory
#install.packages("maptools")
library(maptools)
crswgs84=CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs")
ukmap=readShapePoly("filename.shp",proj4string=crswgs84,verbose=TRUE)
class(ukmap)
str(ukmap@data)
str(ukmap@polygons[[1]])
ukmap@bbox
# <-- need to do some filtering
plot(ukmap) # this takes too long, and I don't want to plot the whole UK
For example, I only want "E06000001" to "E06000020".
(The file name is "Local_Authority_Districts_December_2016_Full_Extent_Boundaries_in_the_UK"; I'm not sure how to include it in the code.)
Consider using the sf package to read the shapefile and plot the data. Filtering an sf object works the same way as filtering a data frame.
library(sf)
# Read the shapefile
ukmap <- st_read("Local_Authority_Districts_December_2016_Full_Extent_Boundaries_in_the_UK.shp")
# Subset the sf object to the codes E06000001 to E06000020
ukmap_sub <- ukmap[ukmap$lad16cd %in% sprintf("E06%06d", 1:20), ]
# Plot the boundary of ukmap_sub
plot(st_geometry(ukmap_sub))
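Because the codes are fixed-width and zero-padded, a range like E06000001 to E06000020 can be selected either by spelling out every code or by comparing the strings directly. A small sketch on a hypothetical data frame standing in for the shapefile's attribute table:

```r
# Hypothetical attribute table standing in for ukmap's data
lad <- data.frame(
  lad16cd = c("E06000001", "E06000005", "E06000020", "E06000021", "E07000008"),
  stringsAsFactors = FALSE
)

# Option 1: generate the full set of codes E06000001 ... E06000020
wanted <- sprintf("E06%06d", 1:20)
sub1 <- lad[lad$lad16cd %in% wanted, , drop = FALSE]

# Option 2: string range comparison; this relies on all codes having the same
# fixed-width format and on the column being character (not factor)
sub2 <- lad[lad$lad16cd >= "E06000001" & lad$lad16cd <= "E06000020", , drop = FALSE]
```

Option 1 also works when the column is a factor; option 2 needs a character column, so convert with as.character() first if necessary.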
If you prefer to work with a SpatialPolygonsDataFrame, you can use the readOGR function from the rgdal package. After that, you can subset the SpatialPolygonsDataFrame like a regular data frame.
library(maptools)
library(rgdal)
ukmap <- readOGR(dsn = getwd(), layer = "Local_Authority_Districts_December_2016_Full_Extent_Boundaries_in_the_UK")
ukmap_sub <- ukmap[ukmap$lad16cd %in% sprintf("E06%06d", 1:20), ]
plot(ukmap_sub)
Currently I'm trying to work with CALIPSO LiDAR data. The files are distributed in HDF4 format (I checked with HDFView). The subdataset I'm interested in is the Cloud Top Height, which is stored in a simple table (56160 rows, 5 columns).
So far, I tried two things:
A) Converting the HDF subdataset to GeoTIFF using gdal_translate (gdalUtils package)
library(gdalUtils)
library(raster)
# Get a list of sds names
sds <- get_subdatasets('C:\\Users\\Thomas\\Desktop\\NASA Sat Data\\Cloud Layer Data\\CAL_LID_L2_333mCLay-ValStage1-V3-01.2011-08-16T03-02-01ZN.hdf')
# Isolate name of subdataset and convert
name <- sds[25]
filename <- 'Cloud_Top_Height_Test.tif'
gdal_translate(name, dst_dataset = filename)
# Load the GeoTIFF created into R
r <- raster(filename)
# fix extent, projection and negative values
# (note: raster::extent() takes xmin, xmax, ymin, ymax in that order)
e <- extent(-80.016243, 64.831413, -179.183868, -1.711731)
r <- setExtent(r, e)
values(r)[values(r) < 0] <- NA # replace negative fill values (e.g. -9999) with NA
crs(r) <- '+proj=utm +zone=28 +ellps=WGS84 +datum=WGS84 +units=m +no_defs'
The result was a raster that was obviously missing all spatial information (dimensions, extent, CRS, resolution), which is why I fixed some things afterwards.
This is the result (image not included).
As you can see, the result consists of strange lines where there should be square pixels. I suppose I'm missing some arguments in the gdal_translate call? Also, all the adjustments afterwards seem a bit clumsy, or let's say 'unprofessional'.
B) Converting the table (subdataset) into a data frame (to convert it into a raster in a second step)
A few days ago I came up with the idea of converting the subdataset (which is essentially 'only' a table) to a data frame and then converting that into a raster in a second step, specifying all spatial information 'from scratch' rather than adjusting it afterwards as in my first attempt. However, I found no commands or packages to do so, and I'm still not confident that this is the right way.
What I need in the end: a proper (georeferenced, etc.) raster (56160 rows, 5 columns) holding the values of the subdataset from the HDF4 file.
My question: How can I extract an HDF4 subdataset and convert it to a raster with the specifications mentioned above?
Any help is greatly appreciated (and since this is my first post on SO: sorry if this question is missing anything required in this forum; criticism is also appreciated).
To convert a raster to a data frame (one row per cell, with the x and y coordinates of the cell centres as extra columns):
as.data.frame(r, xy = TRUE)
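Going the other way, from a table with coordinates to a grid, the "specify everything from scratch" idea in option B can be sketched in base R by binning the points into cells and averaging the values per cell. This assumes the subdataset can be read into a data frame with longitude, latitude and value columns; the column names lon, lat, cth, the random data and the 2-degree cell size are all hypothetical. The resulting matrix could then be passed to raster() together with explicit xmn/xmx/ymn/ymx and a longlat CRS.

```r
# Hypothetical point table: one row per observation (column names are assumptions)
set.seed(1)
pts <- data.frame(lon = runif(200, -179, -2),
                  lat = runif(200, -80, 64),
                  cth = runif(200, 0, 15))

# Target grid definition (2-degree cells, chosen for illustration only)
cell_size <- 2
lon_breaks <- seq(-180, 0, by = cell_size)
lat_breaks <- seq(-80, 66, by = cell_size)
n_col <- length(lon_breaks) - 1
n_row <- length(lat_breaks) - 1

# Column and row index of the cell each point falls into
col_id <- cut(pts$lon, lon_breaks, labels = FALSE, include.lowest = TRUE)
row_id <- cut(pts$lat, lat_breaks, labels = FALSE, include.lowest = TRUE)

# Mean value per cell; cells that receive no points stay NA
grid <- tapply(pts$cth,
               list(factor(row_id, levels = seq_len(n_row)),
                    factor(col_id, levels = seq_len(n_col))),
               mean)

# Row 1 is currently the southernmost band; flip so that row 1 is the
# northern edge, the orientation a raster built from a matrix assumes
grid <- grid[n_row:1, ]
```

Averaging is only one choice of aggregation; max() or a count of points per cell would work the same way via tapply().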
I would like to rasterize a very large vector file to 25 m resolution and have had some success with the 'cluster' package, adapting the questions here and here, which worked nicely for that particular data.
However, I now have a larger vector file that needs rasterizing, and I have access to a cluster that uses snowfall. I'm not used to cluster functions and I'm not sure how to set up sfLapply. I consistently get the following error when sfLapply is called on the cluster:
Error in checkForRemoteErrors(val) :
one node produced an error: 'quote(96)' is not a function, character or symbol
Calls: sfLapply ... clusterApply -> staticClusterApply -> checkForRemoteErrors
My full code:
library(snowfall)
library(rgeos)
library(maptools)
library(raster)
library(sp)
setwd("/home/dir/")
# Initialise the cluster...
hosts = as.character(read.table(Sys.getenv('PBS_NODEFILE'),header=FALSE)[,1]) # read the nodes to use
sfSetMaxCPUs(length(hosts)) # make sure the maximum allowed number of CPUs matches the number of hosts
sfInit(parallel=TRUE, type="SOCK", socketHosts=hosts, cpus=length(hosts), useRscript=TRUE) # initialise a socket cluster session with the named nodes
sfLibrary(snowfall)
# read in required data
shp <- readShapePoly("my_data.shp")
BNG <- "+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +datum=OSGB36 +units=m +no_defs"
crs(shp) <- BNG
### rasterize the uniques to 25m and write (GB and clipped) ###
rw <- raster(res=c(25,25), xmn=0, xmx=600000, ymn=0, ymx=1000000, crs=BNG)
# Number of polygons features in SPDF
features <- 1:nrow(shp[,])
# Split features in n parts
n <- 96
parts <- split(features, cut(features, n))
rasFunction = function(X, shape, raster, nparts){
ras = rasterize(shape[nparts[[X]],], raster, 'CODE')
return(ras)
}
# Export everything in the workspace onto the cluster...
sfExportAll()
# Distribute calculation across the cluster nodes...
rDis = sfLapply(n, fun=rasFunction,X=n, shape=shp, raster=rw, nparts=parts) # equivalent of sapply
rMerge <- do.call(merge, rDis)
writeRaster(rMerge, filename="my_data_25m", format="GTiff", overwrite=TRUE)
# Stop the cluster...
sfStop()
I've tried a number of things, changing both the function and the sfLapply call, but I just can't get this to run. Thanks.
Because I can't do formatting in comments:
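library(maptools)
library(sp)
library(rgdal) # provides the spTransform() methods for sp objects
# readShapePoly() does not read the .prj file, so the shapefile's original CRS
# has to be assigned (if known) before spTransform() can reproject it
shp <- readShapePoly("my_data.shp")
BNG <- "+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +datum=OSGB36 +units=m +no_defs"
shp.2 <- spTransform(shp, CRS(BNG))
#Continue as before
Overwriting a projection is not the same as reprojecting the data: crs(shp) <- BNG only changes the label attached to the coordinates, whereas spTransform() actually transforms the coordinates.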
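The "'quote(96)' is not a function" error in the snowfall question most likely comes from argument matching rather than from the cluster itself: the call passes X = n by name, and X is also the name of the first argument of the apply functions that sfLapply hands the work on to, so 96 ends up in the slot where the function is expected. The same mistake can be reproduced with plain lapply (f is a hypothetical stand-in for rasFunction):

```r
# A worker function whose first argument is called X, like rasFunction
f <- function(X, k) X * k

# Passing X = ... by name collides with lapply()'s own X argument:
# X = 3 binds to lapply's X, so 1:3 is matched to FUN, which is not a function
err <- tryCatch(lapply(1:3, f, X = 3, k = 2), error = function(e) e)

# Correct: give the index vector as the first argument and pass only the
# remaining arguments by name; each element is then supplied as f's first argument
ok <- lapply(1:3, f, k = 2)
```

Applied to the question (untested on a cluster), the call would become sfLapply(1:n, rasFunction, shape = shp, raster = rw, nparts = parts). The workers also need the raster package loaded, e.g. sfLibrary(raster), rather than snowfall.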
OK, so I abandoned snowfall and looked into gdalUtils::gdal_rasterize instead, and found a lot of benefits to using it (with one downside that someone might be able to answer).
Context and issue: my vector data live in an ESRI file geodatabase and require some processing before rasterization. That is no problem; rgdal::readOGR works fine. However, gdal_rasterize requires a path to the vector data, and I could not write out my processed vector data as a single shapefile because it exceeds the maximum file size for a shapefile outside of a geodatabase. gdal_rasterize will not accept R objects, paths to .gdb folders, or .Rdata/.rds files. How do I pass an object to gdal_rasterize?
So I wrote out the large shapefile in segments, one per processor.
Originally I used raster::rasterize, because I could simply pass the vector object stored in memory to it without the writing problem (though I would have liked to have it written), and rasterized the data to 25 m. This took a long time, even in parallel.
Solution: gdal_rasterize in parallel.
# gdal_rasterize in parallel
require(gdalUtils)
require(rgdal)
require(rgeos)
require(cluster)
require(parallel)
require(raster)
# read in vector data
shape <- readOGR("./mygdb.gdb", layer="mydata",stringsAsFactors=F)
## do all the vector processing etc ##
# split vector data into n parts, the same as number of processors (minus 1)
npar <- detectCores() - 1
features <- 1:nrow(shape[,])
parts <- split(features, cut(features, npar))
# write the vector parts out
for(n in 1:npar){
writeOGR(shape[parts[[n]],], ".\\parts", paste0("mydata_p",n), driver="ESRI Shapefile")
}
# set up and write a blank raster for gdal_rasterize for EACH vector segment created above
r <- raster(res=c(25,25), xmn=234000, xmx=261000, ymn=229000, ymx=256000, crs=projection(shape))
for(n in 1:npar){
writeRaster(r, filename=paste0(".\\gdal_p",n,".tif"), format="GTiff", overwrite=TRUE)
}
# set up cluster and pass required packages and objects to cluster
cl <- makeCluster(npar)
clusterEvalQ(cl, sapply(c('raster', 'gdalUtils', "rgdal"), require, character.only=TRUE))
clusterExport(cl, list("r","npar"))
# parallel apply the gdal_rasterize function against the vector parts that were written,
# same number as processors, against the pre-prepared rasters.
# Use the function argument x (not the loop variable n) so each worker gets its own files.
parLapply(cl = cl, X = 1:npar, fun = function(x) gdal_rasterize(src_datasource=paste0(".\\parts\\mydata_p",x,".shp"),
dst_filename=paste0(".\\gdal_p",x,".tif"),b=1,a="code",verbose=FALSE,output_Raster=TRUE))
# There are now n rasters representing the n segments of the original vector file
# read in the rasters as a list, merge and write to a new tif.
s <- lapply(X=1:npar, function(x) raster(paste0(".\\gdal_p",x,".tif")))
s$filename <- "myras_final.tif"
rMerged <- do.call(merge, s)
stopCluster(cl)
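The split-and-rasterize scheme only works if every worker gets a distinct, complete chunk and writes to its own file, which means the file names inside the anonymous function passed to parLapply have to be built from its argument x rather than from the loop variable n of the earlier for loops. A small base-R check of both properties, using a made-up feature count and npar value:

```r
# Split feature indices into npar contiguous chunks, as in the code above
npar <- 3
features <- 1:10
parts <- split(features, cut(features, npar))

# One output file name per part, built from the part index itself
part_file <- function(x) paste0(".\\gdal_p", x, ".tif")
out_files <- vapply(seq_len(npar), part_file, character(1))
```

With very few features relative to npar, cut() can produce empty intervals, and the matching empty parts would then need to be skipped before writeOGR().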
The entire job (roughly 60% vector reading/processing/writing and 40% raster creation and rasterization) ran about 9 times faster with this code than with raster::rasterize in parallel.
Note: I initially tried this by splitting the vectors into n parts but creating only one blank raster. I then wrote to that same blank raster from all cluster nodes simultaneously, but this corrupted the raster and made it unusable in R, ArcGIS or anything else (even though the function ran without error). The code above is more stable, but n blank rasters have to be created instead of one, which increases processing time, and merging n rasters adds further processing.
Caveat: in the raster::rasterize run, writeRaster was not called inside the rasterize function but as a separate line, which will have increased the original processing time due to storage in temporary files etc.
EDIT: Why are the frequency tables of the raster from gdal_rasterize not the same as those from raster::rasterize? With 100 million cells I expect a bit of difference, but for some codes the counts differed by a few thousand cells. I thought they both rasterized by cell centroid?
I have a BIP binary file https://drive.google.com/open?id=0BwkLXkUMkLd4SnBmUHNmMW5CcDg
with the following information:
ncol <- 193
nrow <- 94
xmin <- -180
xmax<- 180
ymin <- -88.542000
ymax <- 88.542000
The file has 365 bands, and I want to convert them into raster grids (a raster stack).
I can read the files in R, but I don't know how to deal with the bands. Here is what I have done so far:
cnt <- ncol*nrow*365
data <- readBin(file,what= "double", n=cnt, size=4,endian=.Platform$endian)
data.m <- matrix(data, ncol=193, nrow=94) # I don't know how to add the bands
data.r <- raster(data.m, xmn=xmin, xmx=xmax, ymn=ymin, ymx=ymax)
plot(data.r)
You can also write a header file. For example, rename 'Tavg.dat' to 'Tavg.gri' and save the text below to a text file called 'Tavg.grd':
[georeference]
nrows=94
ncols=193
xmin=-180
ymin=-88.542
xmax=180
ymax=88.542
projection=+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
[data]
datatype=FLT4S
byteorder=little
nbands=365
bandorder=BIP
nodatavalue=-9999
[description]
layername=''
Then you can do
library(raster)
b <- brick('Tavg.grd')
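If there are several such files, the header can also be generated from R instead of by hand. A small sketch that writes the same keys as the header above with writeLines(); the function name write_grd_header is hypothetical, the values are the ones given in the question, and the header is written to a temporary folder here:

```r
# Write a raster-package style .grd header for a headerless BIP file
# (write_grd_header is a hypothetical helper, not part of any package)
write_grd_header <- function(path, nrows, ncols, xmin, xmax, ymin, ymax, nbands,
                             datatype = "FLT4S", byteorder = "little",
                             nodata = -9999,
                             projection = "+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0") {
  lines <- c(
    "[georeference]",
    paste0("nrows=", nrows),
    paste0("ncols=", ncols),
    paste0("xmin=", xmin),
    paste0("ymin=", ymin),
    paste0("xmax=", xmax),
    paste0("ymax=", ymax),
    paste0("projection=", projection),
    "[data]",
    paste0("datatype=", datatype),
    paste0("byteorder=", byteorder),
    paste0("nbands=", nbands),
    "bandorder=BIP",
    paste0("nodatavalue=", nodata),
    "[description]",
    "layername=''"
  )
  writeLines(lines, path)
  invisible(lines)
}

hdr <- file.path(tempdir(), "Tavg.grd")
write_grd_header(hdr, nrows = 94, ncols = 193, xmin = -180, xmax = 180,
                 ymin = -88.542, ymax = 88.542, nbands = 365)
# The binary data must sit next to the header with the same base name and a
# .gri extension, e.g. after file.rename("Tavg.dat", "Tavg.gri")
```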
Use raster::brick, which creates multi-band rasters from files with multiple bands, including BIL and BIP.
If it does not recognize your georeferencing information, you can supply it as arguments when reading the file: ncols=193, xmn=-180, xmx=180, ymn=-88.542000, ymx=88.542000, nl=365.
Here is an example implementation based on the file you provided. Since the file was not recognized as a BIP file, you first have to read it and convert it into a 3D array. Then you can use the brick command to create the raster stack. The tricky part is that the dimensions of the data file were in an order incompatible with brick, so you have to transpose them using the aperm function. After that, it is straightforward to convert the entire array into a georeferenced stack.
wd <- "D:/temp/"
setwd(wd)
library(raster)
ncol <- 193
nrow <- 94
nbands <- 365
cnt <- ncol*nrow*nbands
# read 4-byte floats in the platform's native byte order
data <- readBin("Tavg.dat", what = "double", n = cnt, size = 4, endian = .Platform$endian)
# BIP order: band varies fastest, then column, then row
data2 <- array(data, c(nbands, ncol, nrow))
# transpose to rows x columns x bands, the order brick() expects
data2 <- aperm(data2, c(3,2,1))
raster_brick <- brick(data2, xmn=-180, xmx=180, ymn=-88.542000, ymx=88.542000)
plot(raster_brick[[1]])
Here is the first band image (image not included).
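The array()/aperm() step is easiest to trust after a round trip on data whose layout is known. A small self-contained check, using a synthetic file of 2 x 3 pixels with 4 bands (sizes and values are made up for the test), writes the values in BIP order and reads them back the same way as the answer does:

```r
# Tiny synthetic BIP file: 2 rows, 3 columns, 4 bands
nr <- 2; nc <- 3; nb <- 4

# Known values: 100 * row + 10 * column + band
truth <- array(0, dim = c(nr, nc, nb))
for (i in seq_len(nr)) for (j in seq_len(nc)) for (k in seq_len(nb)) {
  truth[i, j, k] <- 100 * i + 10 * j + k
}

# BIP order: for each row, for each column, all bands of that pixel
bip <- as.vector(aperm(truth, c(3, 2, 1)))
f <- tempfile(fileext = ".dat")
writeBin(bip, f, size = 4)  # 4-byte floats, native byte order

# Read back as in the answer and restore rows x columns x bands
vals <- readBin(f, what = "double", n = nr * nc * nb, size = 4)
arr <- aperm(array(vals, c(nb, nc, nr)), c(3, 2, 1))
```

If arr matches truth, the same array() dimensions and aperm() permutation are the right ones for the real 94 x 193 x 365 file.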