Writing a loop for upscaling precipitation for the USA in R

I am writing code to calculate the mean amount of precipitation for different regions of the conterminous USA. My data are on a 300 x 120 (lon x lat) grid in NetCDF format. I want to write a loop in R that takes the average of each 10 x 10 block of grid cells, assigns that average to all of the cells inside the block, and repeats this for the next block. At the end, instead of a 120 x 300 grid I will have a 12 x 30 grid, so this is a kind of upscaling method I want to apply to my data. I could use a for-loop for each region separately, but that makes my code very long and I don't want to do that. Any idea would be appreciated. Thanks.
P.S.: Here is the function I have written for one 10 x 10 (lat x lon) region.
upscaling <- function(file, variable, start.time = 1, count.time = 1)
{
  library(ncdf) # load the ncdf library to manipulate NetCDF data (ncdf4 is its modern successor)
  ncdata <- open.ncdf(file) # open the NetCDF file
  lon <- get.var.ncdf(ncdata, "lon")
  lat <- get.var.ncdf(ncdata, "lat")
  time <- get.var.ncdf(ncdata, "time")
  start.lon <- 1
  end.lon <- length(lon)
  start.lat <- 1
  end.lat <- length(lat)
  count.lon <- end.lon - start.lon + 1 # number of longitudes
  count.lat <- end.lat - start.lat + 1 # number of latitudes
  dat <- get.var.ncdf(ncdata, variable,
                      start = c(start.lon, start.lat, start.time),
                      count = c(count.lon, count.lat, count.time))
  temp.data <- array(0, dim = c(10, 10))
  for (i in 1:10) {
    for (j in 1:10) {
      # was overwriting a scalar; index into the array instead
      # (assumes count.time > 1, so dat keeps its time dimension)
      temp.data[i, j] <- mean(dat[i, j, ])
    }
  }
  temp.data
}

There is no need to make a messy loop to spatially aggregate your data. Just use the aggregate function in the raster package:
library(raster)
a <- matrix(1:100, nrow = 10, ncol = 10)
a <- raster(a)
ra <- aggregate(a, fact = 5, fun = mean) # fact=5 aggregates over a 5x5 window
ra <- as.matrix(ra)
ra
Now for your netCDF data, use raster's rasterFromXYZ to create the raster that can then be aggregated with the above method. A bonus is the option to define your projection as an argument in the function, so you end up with a georeferenced object at the end. This is important, because if you aggregate your data without it you will then have to figure out by hand how to georeference the resulting matrix.
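A minimal sketch of that path (the file name precip.nc and variable name pr are placeholders, not taken from the question; this uses the newer ncdf4 package rather than the old ncdf):
library(ncdf4)
library(raster)
nc <- nc_open("precip.nc") # hypothetical file name
lon <- ncvar_get(nc, "lon")
lat <- ncvar_get(nc, "lat")
pr <- ncvar_get(nc, "pr", start = c(1, 1, 1), count = c(-1, -1, 1)) # one time step; -1 reads a whole dimension
nc_close(nc)
xyz <- cbind(expand.grid(lon = lon, lat = lat), pr = as.vector(pr)) # lon varies fastest in both
r <- rasterFromXYZ(xyz, crs = "+proj=longlat +datum=WGS84")
r10 <- aggregate(r, fact = 10, fun = mean) # 10x10 blocks: 120x300 -> 12x30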
EDIT: If you want a resulting raster with the same dimensions as the original one, disaggregate the data right after aggregating it. While this seems redundant, these raster methods are very fast.
library(raster)
a <- matrix(1:100, nrow = 10, ncol = 10)
a <- raster(a)
ra <- aggregate(a, fact = 5, fun = mean) # fact=5 aggregates over a 5x5 window
ra <- disaggregate(ra, fact = 5) # back to the original dimensions
ra <- as.matrix(ra)
ra

If your grid definitions follow standard NetCDF conventions, then you might be able to remap using the CDO remapping operators. For first-order conservative remapping you can try
cdo remapcon,grid_specification_here in.nc out.nc
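The grid specification can be a plain-text grid description file. A hypothetical 12 x 30 target grid might look like the following (every value below is an assumption about the domain, not taken from the question), saved as e.g. target_grid.txt and used as cdo remapcon,target_grid.txt in.nc out.nc:
gridtype = lonlat
xsize    = 30
ysize    = 12
xfirst   = -124.0
xinc     = 2.0
yfirst   = 25.0
yinc     = 2.0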
Note that the answer given above is approximate and not quite correct, because grid cell area is not constant as a function of latitude, so an unweighted mean weights cells unevenly. The size of the error is likely small for this particular task, as the cells are fine, but the result will nevertheless be slightly off.

Related

Daily time steps interpolation between several netcdf layers over time in R or CDO

I have several observations over time of a spatial variable (let's say var_A_1985.nc, var_A_2000.nc and var_A_2010.nc), each with only one time layer. Would it be possible to efficiently reconstruct daily values from 01/01/1985 to 01/01/2010 by linearly interpolating between these observations?
I currently explored two options:
Climate Data Operators (CDO): I saw the operators inttime and intyear, but these do not seem to work for my case.
R, manipulating the data as RasterBrick objects,
something like:
library(raster)
library(ncdf4)
setwd(mypath)
var_A_1985 <- raster("var_A_1985.nc")
var_A_2000 <- raster("var_A_2000.nc")
var_A_brick <- brick(var_A_1985, var_A_2000)
brick_2 <- brick(lapply(1:n_days, function(x) raster::raster(matrix(NA, nrow(var_A_1985), ncol(var_A_1985))))) # n_days = number of days between 01/01/1985 and 01/01/2000; a closing paren was missing
extremes<- c(1, n_days)
for (i in 1:2) {
  brick_2[[extremes[i]]] <- var_A_brick[[i]] # was var_A_brick[[1]]: place each observed layer at its own end
}
var_A_brick_filled <- approxNA(brick_2, method = 'linear')
# ...and then write the new NetCDF with daily values
This second method actually works, but it takes forever. Is there a more efficient way to do this?
You can do this with CDO as follows:
cdo -inttime,1985-01-01,12:00:00,1day -mergetime var*.nc outfile
Here -mergetime first concatenates the input files along the time axis, and inttime then interpolates linearly to a daily time step starting at 1985-01-01 12:00. Adjust the start time, of course.

How to streamline and speed up loop with getData package in R

I am trying to download high-resolution climate data for a bunch of lat/long coordinates, and combine them into a single dataframe. I've come up with a solution (below), but it will take forever with the large list of coordinates I have. I asked a related question on the GIS StackExchange to see if anyone knew of a better approach for downloading and merging the data, but I'm wondering if I could somehow just speed up the operation of the loop? Does anyone have any suggestions on how I might do that? Here is a reproducible example:
# Download and merge 0.5 minute MAT/MAP data from WorldClim for a list of lon/lat coordinates
# This is based on https://emilypiche.github.io/BIO381/raster.html
# Make a dataframe with coordinates
coords <- data.frame(Lon = c(-83.63, 149.12), Lat=c(10.39,-35.31))
# Load package
library(raster)
# Make an empty dataframe for dumping data into
coords3 <- data.frame(Lon = numeric(), Lat = numeric(), MAT_10 = numeric(), MAP_MM = numeric())
# Get WorldClim data for all the coordinates, and dump into coords 3
for (i in seq_along(coords$Lon)) {
  r <- getData("worldclim", var = "bio", res = 0.5, lon = coords[i, 1], lat = coords[i, 2]) # download the tile containing the lat/lon
  r <- r[[c(1, 12)]] # reduce the RasterStack to just the variables we want (MAT*10 and MAP_mm)
  names(r) <- c("MAT_10", "MAP_mm") # rename the layers to something intelligible
  points <- SpatialPoints(na.omit(coords[i, 1:2]), proj4string = r@crs) # give lon,lat to SpatialPoints
  values <- extract(r, points)
  coords2 <- cbind.data.frame(coords[i, 1:2], values)
  coords3 <- rbind(coords3, coords2)
}
# Convert MAT*10 from WorldClim into MAT in Celsius
coords3$MAT_C <- coords3$MAT_10/10
Edit: Thanks to advice from Dave2e, I've first made a list, then put intermediate results in the list, and rbind it at the end. I haven't timed this yet to see how much faster it is than my original solution. If anyone has further suggestions on how to improve the speed, I'm all ears! Here is the new version:
library(dplyr) # needed for bind_rows() below
coordsList <- list()
for (i in seq_along(coordinates$lon_stm)) {
  r <- getData("worldclim", var = "bio", res = 0.5, lon = coordinates[i, 7], lat = coordinates[i, 6]) # download the tile containing the lat/lon
  r <- r[[c(1, 12)]] # reduce the RasterStack to just the variables we want (MAT*10 and MAP_mm)
  names(r) <- c("MAT_10", "MAP_mm") # rename the layers to something intelligible
  points <- SpatialPoints(na.omit(coordinates[i, 7:6]), proj4string = r@crs) # give lon,lat to SpatialPoints
  values <- extract(r, points)
  coordsList[[i]] <- cbind.data.frame(coordinates[i, 7:6], values)
}
coords_new <- bind_rows(coordsList)
Edit 2: I used system.time() to time the execution of both of the above approaches. When I did the timing I had already downloaded all of the data, so download time isn't included in my estimates. My first approach took 45.01 minutes and the revised approach took 44.15 minutes, so I'm not seeing a substantial saving from the latter. Still open to advice on how to revise the code to improve its speed!

Normalize RasterLayer as Matrix to use as Clip Frame

I was assigned the task of clipping a raster from a .nc file using a .tif file.
edit (from comment):
I want to extract temperature info from the .nc because I need to check the yearly mean temperature of a specific region. To be comparable, the comparison has to occur over exactly the same area. The .nc file is larger than the previously checked area, so I need to "clip" it to the extent of a .tif I have. The .tif data is 0|1 coded; where it is 0 (or where the .tif is smaller than the .nc) the .nc data should be clipped away. In the end I want to keep the .nc data, but at the extent of the .tif, while still retaining its resolution and projection. (The .tif and .nc have different projections and pixel sizes.)
Ordinarily that wouldn't be a problem, as I could use raster::crop. However, crop doesn't deal with different projections and different pixel sizes/resolutions. (I still used it to generate an approximation, but it is not precise enough for the final information, as can be seen in the code snippet below.) The obvious method to generate a more reliable dataset would be to first homogenize the datasets with a method like raster::projectRaster or sp::spTransform, but this approach takes too much time, as I have to do it for quite a few .nc files.
I was told the best method would be to generate a normalized matrix from the smaller "clip_frame" raster and then just multiply it with the "nc_to_clip" raster. Doing so should avoid errors from map projections or other factors. This makes a lot of sense to me in theory, but I have no idea how to do it in practice. I would be very grateful for any hint, code snippet, or other help.
I have looked at similar problems on StackOverflow (and other sites) like:
convert matrix to raster in R
Convert raster into matrix with R
https://www.researchgate.net/post/Hi_Is_there_a_way_to_multiply_Raster_value_by_Raster_Latitude
As I am not even sure how to frame the question correctly, I might have overlooked an answer to this problem, if so please point me there!
My (working) code so far, just to give you an idea of how I want to approach the topic (here using the crop function):
#library(ncdf4)
library(raster)
library(rgdal)
library(tidyverse)
nc_list<-list.files(pattern = ".*0.nc$") # list of .nc files containing raster and temperature information
#nc_to_clip <- lapply(nc_list, raster, varname="GST") # read in as raster
nc_to_clip <- raster("ABC.nc", varname = "GST")
clip_frame <- raster("XYZ.tif") # read in .tif for further use as frame
mean_temp_from_raster <- function(input_clip_raster, input_clip_frame) { # input_clip_raster = raster to clip; input_clip_frame = clip frame
  r2_coord <- rasterToPoints(input_clip_raster, spatial = TRUE) # step 1: extract coordinates
  map_clip <- crop(input_clip_raster, extent(input_clip_frame)) # use crop to cut the input_clip_raster (this being the function I have to extend on)
  temp <- raster::extract(map_clip, r2_coord@coords) # step 2: extract values at those coordinates
  temp_C <- temp * 0.01 - 273.15 # convert Kelvin*100 to Celsius
  temp_C <- na.omit(temp_C)
  return_list <- list(map_clip, mean(temp_C))
  return(return_list)
}
mean_tempC<-lapply(nc_to_clip, mean_temp_from_raster,clip_frame)
Thanks!
PS:
I don't have much experience working with .nc files and/or RasterLayers in R as I used to work with ArcGIS/Python (arcpy) for problems like this, which is not an option right now.
Perhaps something like this?
library(raster)
nc <- raster("ABC.nc", varname = "GST")
clip <- raster("XYZ.tif")
x <- as(extent(clip), "SpatialPolygons")
crs(x) <- crs(clip)
y <- sp::spTransform(x, crs(nc))
clipped <- crop(nc, y)
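If the 0|1 values of the .tif (not just its extent) have to be honored, one could follow this by reprojecting just the small clipped raster onto the .tif grid and masking it; reprojecting only the cropped piece avoids the cost of reprojecting whole .nc files. A sketch, assuming 0 marks the cells to discard:
clipped_proj <- projectRaster(clipped, clip, method = "bilinear") # align the clipped .nc with the .tif grid
masked <- mask(clipped_proj, clip, maskvalue = 0) # drop cells where the .tif is 0
mean_C <- cellStats(masked, mean) * 0.01 - 273.15 # Kelvin*100 -> degrees Celsius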

How to write a loop creating a cropped raster for every ID of a shapefile, given a base raster?

I'm still new to R and don't know how to write a loop to make my work process more efficient.
I have a Digital Elevation Model (raster Barrow_5m.tif) and shapefiles for lakes and buffers, each with 10 IDs in its attribute table.
In the script below I created a new raster file for all values of the lake and the buffer shapefile with the data from the DEM raster. This works fine.
library(raster)
library(sf) # for st_read()
setwd("...")
Barrow_5m <- raster("Barrow_5m.tif")
Barrow_DTLB <- st_read("Barrow_DTLB.shp")
Barrow_DTLB_Buffer <- st_read("Barrow_DTLB_BufferOUT.shp")
Barrow_lake <- crop(Barrow_5m, extent(Barrow_DTLB))
raster_lake <- rasterize(Barrow_DTLB, Barrow_lake, mask = TRUE)
Barrow_buffer <- crop(Barrow_5m, extent(Barrow_DTLB_Buffer)) # was Barrow_2m, apparently a typo for Barrow_5m
raster_buffer <- rasterize(Barrow_DTLB_Buffer, Barrow_buffer, mask = TRUE)
writeRaster(raster_lake, "raster_lake.tif")
writeRaster(raster_buffer, "raster_buffer.tif")
But now I want to have a separate raster file for every ID of the lake and the buffer shapefiles, so 2 x 10 files.
I thought it's best to write a loop for this, but my skills are not enough so far to do this.
Other questions didn't bring me to a solution so far, though I tried to adapt them.
Alternatively, I could take the end-product .tif from the script above and split it back into files for every ID.
I want to write the loop and not do it by hand for all the IDs of the shapefiles, because afterwards I am going to do the same with an even bigger shapefile of more values.
I found a solution now, by extracting the data by ID.
It creates a large list with 11 elements holding all values of each ID, which is sufficient for my further work. You can also directly compute the mean, max, min, etc. of each element (so of each ID).
k <- Barrow_DTLB$ID # k = the vector of feature IDs
LakesA <- extract(raster_lakeA, Barrow_DTLB[k, ])
LakesA_mean <- extract(raster_lakeA, Barrow_DTLB[k, ], fun=mean)
Maybe this solution is also helpful for a few, who already viewed the question.
I think this should work:
for (i in unique(raster_lake)) { # loop over the unique cell values (the IDs)
  r <- raster_lake
  r[!(values(r) == i)] <- NA # keep only the cells with this ID
  r <- trim(r) # drop the all-NA margins
  writeRaster(r, paste0("raster_lake_", i, ".tif"))
}
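If the IDs live in the shapefile's attribute table rather than in the raster values, a per-feature crop-and-mask loop is an alternative sketch (assuming, as in the question, that Barrow_DTLB has an ID column; the same pattern applies to the buffer shapefile):
for (i in seq_len(nrow(Barrow_DTLB))) {
  feat <- Barrow_DTLB[i, ] # one lake polygon
  r <- crop(Barrow_5m, extent(feat)) # crop the DEM to this feature
  r <- rasterize(feat, r, mask = TRUE) # mask the cells outside the polygon
  writeRaster(r, paste0("raster_lake_", feat$ID, ".tif"), overwrite = TRUE)
}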

Efficient use of raster functions in R

I have 500+ points in a SpatialPointsDataFrame object and a 1.7GB raster object (200,000 rows x 200,000 columns). I want a tabulation of the values of the raster cells within a buffer around each of the 500+ points.
I have managed to achieve that with the code below (I got a lot of inspiration from here.). However, it is slow to run and I would like to make it run faster. It actually runs OK for buffers with "small" widths, say 5 km or even 15 km (~1 million cells), but it becomes super slow when the buffer increases to, say, 100 km (~42 million cells).
I could easily improve the loop below by using something from the apply family and/or a parallel loop. But my suspicion is that it is slow because the raster package writes 400MB+ temporary files on each iteration of the loop.
# packages
library(rgeos)
library(raster)
library(rgdal)
myPoints = readOGR(points_path, 'myLayer')
myRaster = raster(raster_path)
myFunction <- function(polygon_obj, raster_obj) {
  # returns a tabulation of the values of the raster cells inside a polygon (buffer)
  clip1 <- crop(raster_obj, extent(polygon_obj)) # crop to the extent of the polygon
  clip2 <- rasterize(polygon_obj, clip1, mask = TRUE) # crop to the polygon edge
  ext <- getValues(clip2) # much faster than extract
  tab <- table(ext) # tabulate the values of the raster in the polygon
  return(tab)
}
# loop over the points
ids <- unique(myPoints$ID)
for (id in ids) {
  myPoint <- myPoints[myPoints$ID == id, ] # select the point
  myPolygon <- gBuffer(spgeom = myPoint, byid = FALSE, width = myWidth) # create the buffer
  tab <- myFunction(myPolygon, myRaster) # extract the data I want (projections, etc. are fine)
  # do stuff with tab ...
}
My questions:
Am I right to partially blame the writing operations? If I managed to avoid all of them, would this code run faster? I have access to a machine with 32GB of RAM, so I guess it is safe to assume I could load the raster into memory and avoid writing temporary files.
What else could I do to improve efficiency in this code?
I think you should approach it like this
library(raster)
library(rgdal)
myPoints <- readOGR(points_path, 'myLayer')
myRaster <- raster(raster_path)
e <- extract(myRaster, myPoints, buffer=myWidth)
And then something like
etab <- sapply(e, table)
It is hard to answer your question #1, as we do not know enough about your data (we do not know how many cells are covered by a "100 km" buffer). But you can control when raster writes to file with the rasterOptions function. You note that getValues is faster than extract, based on the post you link to, but I think that is wrong, or at least not very important. The combination of crop, rasterize and getValues should perform about the same as extract (which does almost exactly that under the hood). If you go that route anyway, you should pass an empty RasterLayer, created with raster(myRaster), for faster cropping.
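For example, a sketch of raising the in-memory thresholds (the values and the scratch path are illustrative assumptions; units and defaults differ between raster versions):
library(raster)
rasterOptions(maxmemory = 1e10, chunksize = 1e9) # let larger objects be processed in memory before raster falls back to temp files
rasterOptions(tmpdir = "/path/to/fast/scratch") # hypothetical path: put unavoidable temp files on fast local storage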
