Creating a new raster image from averages - r

I have a long list of rasters and I need to take an average of a certain number and create a new image. For example, if I have rasters r1 r2 r3 r4 r5 r6 r7 r8
I want to take an average of r1 and r2 to give me an image, lets say new1. Then I want an average of r3 and r4 to give me image new2. I tried using the runmean in caTools but I couldn't get the output I needed. If I have 8 raster images then using a window of two should leave me with four raster images. I know raster usually belong on the GIS site but I needed help with the code so I hope it is ok here.

Supposing you have all the rasters in one folder: rasdir (and nothing else in this folder but the rasters to loop over), set environmental vars:
rasdir="myrasters/"
raspaths <- list.files(path=rasdir, full.names=T)
Assuming all rasters have the same extent and resolution, they can be stacked:
rascube <- stack(raspaths)
Create function that executes some function e.g. mean across bands
rascube is the stack of images to loop over, win the window size and outdir is the output directory
rasfun <- function(x=rascube, win=2, outdir=getwd()){
#Sanity check
if(!(length(raspaths)/win)%%1==0){stop("Number of rasters must be divisible by window size")}
#Create ```mat``` , an index that determines how rasters in ```rascube``` are aggregated:
#indices in the same row refer to rasters to be averaged into the ith output.
mat <- matrix(data=1:length(raspaths), ncol=win, byrow=T)
#Loop over ```rascube```, calculating the moving average as controlled by ```mat```
for (i in 1:nrow(mat)){
#Compute ith moving mean, You can alter this to compute a moving "whatever you like"
#Note the usage of ```[[ ]]``` to subset raster bands: see ```raster``` docu.
#Also Note the usage of ```na.rm=T```, just in case your images have NA's you dont care about
res_i <- sum(x[[ mat[i,1]:mat[i,win] ]], na.rm=T)/win #
#Write output to file: note how output filename is derived from the respective input rasters
#makes it possible to trace the outputs back to their source rasters.
writeRaster(x=res_i, filename=paste(mat[i,1:win], collapse=""),
format="GTiff", overwrite=T)
}
}
#Run newly created function on your stack of inputs with whatever args:
rasfun(x=rascube, win=2, outdir="moving_mean_rasters/")
Note: the number of rasters must be divisible by the window size e.g. attempting to run a moving window computation on 7 rasters with a window size of 2 will be made to fail by the sanity check. Of course you can alter the function to behave as you think is best for your use case. Cheers!

Related

Normalize RasterLayer as Matrix to use as Clip Frame

I was assigned the task to clip a raster from .nc file from a .tif file.
edit (from comment):
i want to extract temp. info from the .nc because i need to check the yearly mean temperature of a specific region. to be comparable the comparison has to occur on exactly the same area. The .nc file is larger than the previously checked area so i need to "clip" it to the extent of a .tif I have. The .tif data is in form 0|1 where it is 0 (or the .tif is smaller than the .nc) the .nc data should be "cliped". In the end i want to keep the .nc data but at the extent of the .tif while still retaining its resolution & projection. (.tif and .nc have different projections&pixel sizes)
Now ordinarily that wouldn't be a problem as i could use raster::crop. This doesn't deal with different projections and different pixel size/resolution though. (I still used it to generate an approximation, but it is not precise enough for the final infromation, as can be seen in the code snippet below). The obvious method to generate a more reliable dataset/rasterset would be to first use a method like raster::projectRaster or raster::sp.Transform # adding sp.transform was done in an edit to the original question and homogenize the datasets but this approach takes too much time, as i have to do this for quite a few .nc files.
I was told the best method would be to generate a normalized matrix from the smaller raster "clip_frame" and then just multiply it with the "nc_to_clip" raster. Doing so should prevent any errors through map projections or other factors. This makes a lot of sense to me in theory but I have no idea how to do this in practice. I would be very grateful to any kind of hint/code snippet or any other help.
I have looked at similar problems on StackOverflow (and other sites) like:
convert matrix to raster in R
Convert raster into matrix with R
https://www.researchgate.net/post/Hi_Is_there_a_way_to_multiply_Raster_value_by_Raster_Latitude
As I am not even sure how to frame the question correctly, I might have overlooked an answer to this problem, if so please point me there!
My (working) code so far, just to give you an idea of how I want to approach the topic (here using the crop-function).
#library(ncdf4)
library(raster)
library(rgdal)
library(tidyverse)
nc_list<-list.files(pattern = ".*0.nc$") # list of .nc files containing raster and temperature information
#nc_to_clip <- lapply(nc_list, raster, varname="GST") # read in as raster
nc_to_clip < -raster(ABC.nc, vername="GST)
clip_frame <- raster("XYZ.tif") # read in .tif for further use as frame
mean_temp_from_raster<-function(input_clip_raster, input_clip_frame){ # input_clip_raster= raster to clip, input_clip_frame
r2_coord<-rasterToPoints(input_clip_raster, spatial = TRUE) # step 1 to extract coordinates
map_clip <- crop(input_clip_raster, extent(input_clip_frame)) # use crop to cut the input_clip_raster (this being the function I have to extend on)
temp<-raster::extract(map_clip, r2_coord#coords) # step 2 to extract coordinates
temp_C<-temp*0.01-273.15 # convert kelvin*100 to celsius
temp_C<-na.omit(temp_C)
mean(temp_C)
return_list<-list(map_clip, mean(temp_C))
return(return_list)
}
mean_tempC<-lapply(nc_to_clip, mean_temp_from_raster,clip_frame)
Thanks!
PS:
I don't have much experience working with .nc files and/or RasterLayers in R as I used to work with ArcGIS/Python (arcpy) for problems like this, which is not an option right now.
Perhaps something like this?
library(raster)
nc <- raster(ABC.nc, vername="GST)
clip <- raster("XYZ.tif")
x <- as(extent(clip), "SpatialPolygons")
crs(x) <- crs(clip)
y <- sp::spTransform(x, crs(nc))
clipped <- crop(nc, y)

How to write a loop for creating cropped raster for every id of a shapefile with a raster base?

I'm still new to R and don't know how to create a loop for my workprocess to make it more efficient.
I have a Digital Elevation Model (raster Barrow_5m.tif), a shapefile for lakes and buffer with 10 iDs in a row of the table each.
In the script below I created a new raster file for all values of the lake and the buffer shape file with the data from the DEM raster. This works fine.
setwd("...")
Barrow_5m <- raster("Barrow_5m.tif")
Barrow_DTLB <- st_read("Barrow_DTLB.shp")
Barrow_DTLB_Buffer <- st_read("Barrow_DTLB_BufferOUT.shp")
Barrow_lake <- crop(Barrow_5m, extent(Barrow_DTLB))
raster_lake <- rasterize(Barrow_DTLB, Barrow_lake, mask = TRUE)
Barrow_buffer <- crop(Barrow_2m, extent(Barrow_DTLB_Buffer))
raster_buffer <- rasterize(Barrow_DTLB_Buffer, Barrow_buffer, mask = TRUE)
writeRaster(raster_lake, "raster_lake.tif")
writeRaster(raster_buffer, "raster_buffer.tif")
But now I want to have a raster file for every id of the lake and the buffer shapefile seperately, so 2x10 files.
I thought it's best to write a loop for this, but my skills are not enough so far to do this.
Also other questions didn't bring the solution so far. I tried to help me with this.
Alternatively I could use my end product tif from the script above and undo this in files for every ID.
I want to write the loop and not do it by hand for all the IDs of the shapefiles, because afterwards I am going to do the same with an even bigger shapefile of more values.
I found a solution now, by extracting data by the ID.
It creates a largelist with 11 elements and all values of each id, which is sufficient for my further work. You can also directly creat the mean, max, min, etc values of each element (so each ID).
k <- Barrow_DTLB$ID #k= number of rows
LakesA <- extract(raster_lakeA, Barrow_DTLB[k, ])
LakesA_mean <- extract(raster_lakeA, Barrow_DTLB[k, ], fun=mean)
Maybe this solution is also helpful for a few, who already viewed the question.
I think this should work:
for (i in unique(raster_lake)){
r <- raster_lake
r[!(values(r) == i)] <- NA
r <- trim(r)
writeRaster(r, paste0("raster_lake_", i, ".tif"))
}

writeRaster to .img file sizes are much bigger than source files

I am trying to take multiple raster files, add them together, and write the resulting raster file to disk. However, I've noticed that each time I've done this the resulting raster.img file is ~877 mb in size, while the individual raster files I am adding together are no bigger than 4 mb. Additionally, sometimes I'm combining 3 rasters into 1, sometimes I'm combining 10 rasters in to 1. No matter how many I start with, the resulting raster files are all the same size.
Here is some example code for how I'm adding the rasters together and writing to file
library(raster)
r1 <- r2 <- r3 <- raster(nrow=10, ncol=10)
values(r1) <- runif(ncell(r1))
values(r2) <- runif(ncell(r2))
values(r3) <- runif(ncell(r3))
combined <- r1 + r2 + r3
writeRaster(combined, "data/filename.img", overwrite = T)
Based on a similar question here I checked that the datatypes of my input rasters and the combined raster were the same, and they were. All of them are FLT4S. The resulting values in the combined raster don't seem wrong - I have no negative values or absurdly high values or anything.
Is there something else that I'm missing? Could there be some quality of the rasters I'm starting with that I'm overlooking that would affect this?
Data type can plays a role:
x <- writeRaster(combined, "filename1.img", datatype='FLT4S', overwrite=TRUE)
y <- writeRaster(combined, "filename2.img", datatype='INT2S', overwrite=TRUE)
file.size("filename1.img")
#[1] 23249
file.size("filename2.img")
#[1] 15057
But you state that all files are FLT4S. And given the very large (> 200 times) difference in file size, something else must be going on. The input files are probably compressed. You can compress the output files (see manual)
z <- writeRaster(combined, "filename3.img", datatype='FLT4S', options="COMPRESSED=YES", overwrite=TRUE)
file.size("filename3.img")
#[1] 7429
That is about 3 times smaller that without compression. Perhaps the original data can be compressed more, but it still seems unlikely to get 200 times reduction.

efficient use of raster functions in r

I have 500+ points in a SpatialPointsDataFrame object; I have a 1.7GB (200,000 rows x 200,000 cols) raster object. I want to have a tabulation of the values of the raster cells within a buffer around each of the 500+ points.
I have managed to achieve that with the code below (I got a lot of inspiration from here.). However, it is slow to run and I would like to make it run faster. It actually runs OK for buffers with "small" widths, say 5km ro even 15km (~1 million cells), but it becomes super slow when buffer increases to say 100km (~42 million cells).
I could easily improve on the loop below by using something from the apply family and/or a parallel loop. But my suspicion is that it is slow because the raster package writes 400Mb+ temporary files for each interaction of the loop.
# packages
library(rgeos)
library(raster)
library(rgdal)
myPoints = readOGR(points_path, 'myLayer')
myRaster = raster(raster_path)
myFunction = function(polygon_obj, raster_obj) {
# this function return a tabulation of the values of raster cells
# inside a polygon (buffer)
# crop to extent of polygon
clip1 = crop(raster_obj, extent(polygon_obj))
# crops to polygon edge & converts to raster
clip2 = rasterize(polygon_obj, clip1, mask = TRUE)
# much faster than extract
ext = getValues(clip2)
# tabulates the values of the raster in the polygon
tab = table(ext)
return(tab)
}
# loop over the points
ids = unique(myPoints$ID)
for (id in ids) {
# select point
myPoint = myPoints[myPoints$ID == id, ]
# create buffer
myPolygon = gBuffer(spgeom = myPoint, byid = FALSE, width = myWidth)
# extract the data I want (projections, etc are fine)
tab = myFunction(myPolygon, myRaster)
# do stuff with tab ...
}
My questions:
Am I right to partially blame the writing operations? If I managed to avoid all those writing operations, would this code run faster? I have access to a machine with 32GB of RAM -- so I guess it is safe to assume I could load the raster to the memory and need not to write temporary files?
What else could I do to improve efficiency in this code?
I think you should approach it like this
library(raster)
library(rgdal)
myPoints <- readOGR(points_path, 'myLayer')
myRaster <- raster(raster_path)
e <- extract(myRaster, myPoints, buffer=myWidth)
And then something like
etab <- sapply(e, table)
It is hard to answer your question #1 as we do not know enough about your data (we do not know how many cells are covered by a "100 km" buffer). But you can set options about when to write to file with the rasterOptions function. You notice that getValues is faster than extract, based on the post you link to, but I think that is wrong, or at least not very important. The combination of crop, rasterize and getValues should have a similar performance as extract (which does almost exactly that under the hood). If you go this route anyway, you should pass an empty RasterLayer, created by raster(myRaster) for faster cropping.

writing a loop for upscaling precipitation for USA

I am writing a code to calculate the mean amount of precipitation for different regions of conterminous USA. My total data has 300 times 120 (lon*lat) grids in Netcdf format. I want to write a loop in R to take the average of each 10 by 10 number of grids and assign that value (average) to all of the grids inside the region and repeat this for the next region. At the end instead of a 120 by 300 grids I will have 12 by 30 grids. So this is kind a upscaling method I want to apply to my data. I can use a for-loop for each region separately but It makes my code very huge and I don’t want to do that. Any idea would be appreciated. Thanks.
P.S: Here is the function I have written for one region (10by10) lat*lon.
upscaling <- function(file, variable, start.time=1, count.time=1)
{
library(ncdf) # load ncdf library to manipulate ncdf data
ncdata <- open.ncdf(file); # open ncdf file
lon <- get.var.ncdf(ncdata, "lon");
lat <- get.var.ncdf(ncdata, "lat");
time <- get.var.ncdf(ncdata, "time");
start.lon <- 1
end.lon <- length(lon)
start.lat <- 1
end.lat <- length(lat)
count.lon <- end.lon - start.lon + 1; # count number of longitude
count.lat <- end.lat - start.lat + 1; # count number of latitude
dat <- get.var.ncdf(ncdata, variable, start=c(start.lon, start.lat, 1),
count=c(count.lon, count.lat, 1))
temp.data<- array(0,dim=c(10,10))
for (i in 1:10)
{
for (j in 1:10)
{
temp.data <- mean(dat[i,j,])
}
}
}
There is no need to make a messy loop to spatially aggregate your data. Just use the aggregate function in the raster package:
library(raster)
a=matrix(data=c(1:100),nrow=10,ncol=10)
a=raster(a)
ra <- aggregate(a, fact=5, fun=mean) #fact=5 will aggregate using a 5x5 window
ra=as.matrix(ra)
ra
Now for your netcdf data, use raster's rasterFromXYZ to create the raster that can then be aggregated with the above method. Bonus includes the option to define your projection as an argument in the function so you end up with a georeferenced object at the end. This is important because if you aggregate your data without it you will then have to figure out by hand how to georeference the resulting matrix.
EDIT: If you want a resulting raster with the same dimensions as the original one, disaggregate the data right after aggregating it. While this seems redundant, these raster methods are very fast.
library(raster)
a=matrix(data=c(1:100),nrow=10,ncol=10)
a=raster(a)
ra <- aggregate(a, fact=5, fun=mean) #fact=5 will aggregate using a 5x5 window
ra <- disaggregate(ra, fact=5)
ra=as.matrix(ra)
ra
If you grid definitions follow standard netcdf conventions, then you might be able to remap using the CDO remapping functions. For first order conservative remapping you can try
cdo remapcon,grid_specification_here in.nc out.nc
Note that the answer given above is approximate, and not quite correct as the grid cell size is not the same as a function of latitude. The size of the error is likely small for this particular task as the cell sizes are fine, but nevertheless the answer will be slightly off.

Resources