R- Raster math while preserving integer format - r

I have a some large rasters (~110 MB each) I want to perform some raster calculations on. For the purposes of this example, I want to average the files SNDPPT_M_sl1_1km_ll.tif and SNDPPT_M_sl2_1km_ll.tif, available at this website. In reality, the math is a bit more complex (some multiplication and division of several rasters).
Both input rasters are integer (INT1U) data, and I would like the output to also be INT1U. However, whenever I try to do a raster calculation, it creates intermediate temporary files in floating point format which are very large in size. I am working on a laptop with about 7 GB of free hard drive space, which gets filled before the calculation is complete.
# load packages
require(raster)
## script control
# which property?
prop <- "SNDPPT"
# load layers
r.1 <- raster(paste0("1raw/", prop, "_M_sl1_1km_ll.tif"))
r.2 <- raster(paste0("1raw/", prop, "_M_sl2_1km_ll.tif"))
# allocate space for output raster - this is about 100 MB (same size as input files)
r.out <- writeRaster(r.1,
filename=paste0("2derived/", prop, "_M_meanTop200cm_1km_ll.tif"),
datatype="INT1U")
# perform raster math calculation
r.out <- integer(round((r.out+r.2)/2))
# at this point, my hard drive fills due to temporary files > 7 GB in size
Is anyone aware of a workaround to perform raster math in R with integer input and output files while minimizing or avoiding the very large intermediate files?

The trick here could be to use raster::overlay to make the computation and save the results as a compressed tiff at the same time. Something like this should work:
library(raster)
#> Loading required package: sp
# load layers
r.1 <- raster("C:/Users/LB_laptop/Downloads/SNDPPT_M_sl1_1km_ll.tif")
r.2 <- raster("C:/Users/LB_laptop/Downloads/SNDPPT_M_sl1_1km_ll.tif")
out <- raster::overlay(r.1, r.2,
fun = function(x, y) (round((x + y) / 2)),
filename = "C:/Users/LB_laptop/Downloads/SNDPPT_out.tif",
datatype = "INT1U",
options = "COMPRESS=DEFLATE")
> out
class : RasterLayer
dimensions : 16800, 43200, 725760000 (nrow, ncol, ncell)
resolution : 0.008333333, 0.008333333 (x, y)
extent : -180, 180, -56.00083, 83.99917 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
data source : C:\Users\LB_laptop\Downloads\SNDPPT_out.tif
names : SNDPPT_out
values : 0, 242 (min, max)
HTH.

Related

Fast Extraction from Raster Datasets using Points - How to speed up the raster::extract() function

I need to extract values from large RasterLayer (a Digital Terrain Model or DTM), using XY coordinates. Coordinates are in another large data.table.
> DTM
class : RasterLayer
dimensions : 93690, 74840, 7011759600 (nrow, ncol, ncell)
resolution : 16, 16 (x, y)
extent : -80000, 1117440, 6448080, 7947120 (xmin, xmax, ymin, ymax)
crs : +proj=utm +zone=33 +ellps=GRS80 +units=m +no_defs
source : dtm_16x16_utm33.tif
names : dtm_16x16_utm33
values : -6.648066, 2273.72 (min, max)
> XY
x y
1: 986488 7930296
2: 986504 7930296
3: 986536 7930296
4: 986552 7930296
5: 986488 7930280
---
454986003: 61832 6451208
454986004: 61848 6451208
454986005: 61864 6451208
454986006: 61912 6451208
454986007: 61928 6451208
extract() function from raster r package could do the task. According to the package manual, the code would be:
Altitude <- extract(DTM, XY)
This, however, takes a long time!!
I have tried using the following line of code to run the function in parallel.
beginCluster(30)
Altitude <- extract(DTM, XY)
endCluster()
However, I can see that of the 64 available cores, only 1 is being used by R, the code is not running in parallel, and the function continues to take a long time.
Any ideas how I can speed this up?
Note1: I have successfully used lines of code similar to this one...
beginCluster(20)
raster3 <- projectRaster(raster1, crs=crs(raster2))
endCluster()
...and the server has worked with multiple cores at the same time.
One of the fastest solutions is to use the velox package.
https://www.rdocumentation.org/packages/velox/versions/0.2.0
library("velox")
library("raster")
library("sp")
library("rgeos")
## convert XY data to SpatialPoints
XY <- SpatialPoints(coords = XY)
## create a velox object from the original raster
DTMV <- velox(DTM)
## fast extraction of altitude data
Altitude <- DTM1V$extract_points(sp = XY)

How to get values for a pixel from a geoTIFF in R?

I'm trying to get RGB components from a geoTIFF file in R. The colours on the image correspond to different land classification types and I have a legend for each classification type in RGB components.
I'm using the raster library. My code so far is
library(raster)
my.map = raster("mygeoTIFFfile.tif")
Here is the information on the file once it has been read in:
> my.map[[1]]
class : RasterLayer
dimensions : 55800, 129600, 7231680000 (nrow, ncol, ncell)
resolution : 0.002777778, 0.002777778 (x, y)
extent : -180.0014, 179.9986, -64.99861, 90.00139 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
data source : filepah/filename.tif
names : filename.tif
values : 11, 230 (min, max)
The specific geoTIFF file I'm working on can be found here:
http://due.esrin.esa.int/page_globcover.php
(just click on "Globcover2009_V2.3_Global_.zip")
Can someone please help me get the value from a single pixel location from this file please?
The rasterToPoints() function will convert your raster data to a matrix containing x, y, and value for each point. This will be very large, but may be what you're looking for if you want to do a broad analysis of the data.
library(raster)
map <- raster("GLOBCOVER_L4_200901_200912_V2.3.tif")
data <- rasterToPoints(map, progress="text")
head(data)
Another option is to use the extract() function to return a single point by passing a SpatialPoints object with latitude/longitude. If you only want a few individual data points, this will be a lot faster than loading the entire thing into a matrix.
library(raster)
map <- raster("GLOBCOVER_L4_200901_200912_V2.3.tif")
extract(map, SpatialPoints(cbind(-123.3680884, 48.4252848)))
It seems that you are asking the wrong question.
To get a value for a single pixel (grid cell), you can do use indexing. For example, for cell number 10,000 and 10,001 you can do r[10000:10001].
You could get all values by doing values(r). But that will fail for a very large raster like this (unless you have lots of RAM).
However, the question you need answered, it seems, is how to make a map by matching integer cell values with RGB colors.
Let's set up an example raster
library(raster)
r <- raster(nrow=4, ncol=4)
values(r) <- rep(c(11, 14, 20, 30), each=4)
And some matching RGB values
legend <- read.csv(text="Value,Label,Red,Green,Blue
11,Post-flooding or irrigated croplands (or aquatic),170,240,240
14,Rainfed croplands,255,255,100
20,Mosaic cropland (50-70%) / vegetation (grassland/shrubland/forest) (20-50%),220,240,100
30,Mosaic vegetation (grassland/shrubland/forest) (50-70%) / cropland (20-50%) ,205,205,102")
Compute the color code
legend$col <- rgb(legend$Red, legend$Green, legend$Blue, maxColorValue=255)
set up a "color table"
# start with white for all values (1 to 255)
ct <- rep(rgb(1,1,1), 255)
# fill in where necessary
ct[legend$Value+1] <- legend$col
colortable(r) <- ct
plot
plot(r)
You can also try:
tb <- legend[, c('Value', 'Label')]
colnames(tb)[1] = "ID"
tb$Label <- substr(tb$Label, 1,10)
levels(r) <- tb
library(rasterVis)
levelplot(r, col.regions=legend$col, at=0:length(legend$col))

Can't change raster's extent

I want to crop an elevation raster to add it to a raster stack. It's easy, I did this before smoothly, adding a ecoregions raster to the same stack. But with the elevation one, just doesn't work. Now, there are several questions here in overflow adressing this issue and I tryed a lot of things...
First of all, we need this:
library(rgdal)
library(raster)
My stack is predictors2:
#Downloading the stack
predictors2_full<-getData('worldclim', var='bio', res=10)
#Cropping it, I don' need the whole world
xmin=-120; xmax=-35; ymin=-60; ymax=35
limits <- c(xmin, xmax, ymin, ymax)
predictors2 <- crop(predictors2_full,limits)
Then I've downloaded the terr_ecorregions shapefile here: http://maps.tnc.org/files/shp/terr-ecoregions-TNC.zip
setwd("~/ORCHIDACEAE/Ecologicos/w2/layers/terr-ecoregions-TNC")
ecoreg = readOGR("tnc_terr_ecoregions.shp") # I've loaded...
ecoreg2 <- crop(ecoreg,extent(predictors2)) # cropped...
ecoreg2 <- rasterize(ecoreg2, predictors2) # made the shapefile a raster
predictors4<-addLayer(predictors2,elevation,ecoreg2) # and added the raster
# to my stack
With elevation, I just can't. The Digital elevation model is based in GMTED2010, which can be downloaded here: http://edcintl.cr.usgs.gov/downloads/sciweb1/shared/topo/downloads/GMTED/Grid_ZipFiles/mn30_grd.zip
elevation<-raster("w001001.adf") #I've loaded
elevation<-crop(elevation,predictors2) # and cropped
But elevation gets a slightly different extent instead of predictors2's extent:
> extent(elevation)
class : Extent
xmin : -120.0001
xmax : -35.00014
ymin : -60.00014
ymax : 34.99986
>
I tried to make then equal by all means I read about in questions here...
I tried to extend so elevation's ymax would meet predictors2's ymax
elevation<-extend(elevation,predictors2) #didn't work, extent remains the same
I tried the opposite... making predictors2 extent meet elevation's extent... nothing either.
But then I read that
You might not want to play with setExtent() or extent() <- extent(), as you could end with wrong geographic coordinates of your rasters - #ztl, Jun 29 '15
And I tried to get the minimal common extent of my rasters, following #zlt answer in another extent question, by doing this
# Summing your rasters will only work where they are not NA
r123 = r1+r2+r3 # r123 has the minimal common extent
r1 = crop(r1, r123) # crop to that minimal extent
r2 = crop(r2, r123)
r3 = crop(r3, r123)
For that, first I had to set the resolutions:
res(elevation)<-res(predictors2) #fixing the resolutions... This one worked.
But then, r123 = r1+r2+r didn't work:
> r123=elevation+ecoreg2+predictors2
Error in elevation + ecoreg2 : first Raster object has no values
Can anyone give me a hint on this? I really would like to add my elevation to the raster. Funny thing is, I have another stack named predictors1 with the exact same elevation's extent... And I was able to crop ecoreg and add ecoreg to both predictors1 and predictors2... Why can't I just do the same to elevation?
I'm quite new to this world and runned out of ideas... I appreciate any tips.
EDIT: Solution, Thanks to #Val
I got to this:
#Getting the factor to aggregate (rasters are multiples of each other)
res(ecoreg2)/res(elevation)
[1] 20 20 #The factor is 20
elevation2<-aggregate(elevation, fact=20)
elevation2 <- crop(elevation2,extent(predictors2))
#Finally adding the layer:
predictors2_eco<-addLayer(predictors2,elevation2,ecoreg)
New problem, thought...
I can't write stack to a geotiff
writeRaster(predictors2_eco, filename="cropped_predictors2_eco.tif", options="INTERLEAVE=BAND", overwrite=TRUE)
Error in .checkLevels(levs[[j]], value[[j]]) :
new raster attributes (factor values) should be in a data.frame (inside a list)
I think you're having issues because you're working with rasters of different spatial resolutions. So when you crop both rasters to the same extent, they'll have a slightly different actual extent because of that.
So if you want to stack rasters, you need to get them into the same resolution. Either you disaggregate the raster with the coarser resolution (i.e. increase the resolution by resampling or other methods) or you aggregate the raster with the higher resolution (i.e. decrease the resolution with for instance taking the mean over n pixel).
Please note that if you change the extent or resolution with setExtent(x), extent(x) <-, res(x) <- or similar will NOT work, since you're just changing slots in the raster object, not the actual underlying data.
So to bring the rasters into a common resolution, you need to change the data. You can use the functions (amongst others) aggregate, disaggregate and resample for that purpose. But since you're changing data, you need to be clear on what you're and the function you use is doing.
The most handy way for you should be resample, where you can resample a raster to another raster so they match in extent and resolution. This will be done using a defined method. Per default it's using nearest neighbor for computing the new values. If you're working with continuous data such as elevation, you might want to opt for bilinear which is bilinear interpolation. In this case you're actually creating "new measurements", something to be aware of.
If your two resolutions are multiples of each other, you could look into aggregate and disaggregate. In the case of disaggregate you would split a rastercell by a factor to get a higher resolution (e.g. if your first resolution is 10 degrees and your desired resolution is 0.05 degrees, you could disaggregate with a factor of 200 giving you 200 cells of 0.05 degree for every 10 degree cell). This method would avoid interpolation.
Here's a little working example:
library(raster)
library(rgeos)
shp <- getData(country='AUT',level=0)
# get centroid for downloading eco and dem data
centroid <- coordinates(gCentroid(shp))
# download 10 degree tmin
ecovar <- getData('worldclim', var='tmin', res=10, lon=centroid[,1], lat=centroid[,2])
ecovar_crop <- crop(ecovar,shp)
# output
> ecovar_crop
class : RasterBrick
dimensions : 16, 46, 736, 12 (nrow, ncol, ncell, nlayers)
resolution : 0.1666667, 0.1666667 (x, y)
extent : 9.5, 17.16667, 46.33333, 49 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
data source : in memory
names : tmin1, tmin2, tmin3, tmin4, tmin5, tmin6, tmin7, tmin8, tmin9, tmin10, tmin11, tmin12
min values : -126, -125, -102, -77, -33, -2, 19, 20, 5, -30, -74, -107
max values : -31, -21, 9, 51, 94, 131, 144, 137, 106, 60, 18, -17
# download SRTM elevation - 90m resolution at eqt
elev <- getData('SRTM',lon=centroid[,1], lat=centroid[,2])
elev_crop <- crop(elev, shp)
# output
> elev_crop
class : RasterLayer
dimensions : 3171, 6001, 19029171 (nrow, ncol, ncell)
resolution : 0.0008333333, 0.0008333333 (x, y)
extent : 9.999584, 15.00042, 46.37458, 49.01708 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
data source : in memory
names : srtm_39_03
values : 198, 3865 (min, max)
# won't work because of different resolutions (stack is equal to addLayer)
ecoelev <- stack(ecovar_crop,elev_crop)
# resample
elev_crop_RS <- resample(elev_crop,ecovar_crop,method = 'bilinear')
# works now
ecoelev <- stack(ecovar_crop,elev_crop_RS)
# output
> ecoelev
class : RasterStack
dimensions : 16, 46, 736, 13 (nrow, ncol, ncell, nlayers)
resolution : 0.1666667, 0.1666667 (x, y)
extent : 9.5, 17.16667, 46.33333, 49 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
names : tmin1, tmin2, tmin3, tmin4, tmin5, tmin6, tmin7, tmin8, tmin9, tmin10, tmin11, tmin12, srtm_39_03
min values : -126.0000, -125.0000, -102.0000, -77.0000, -33.0000, -2.0000, 19.0000, 20.0000, 5.0000, -30.0000, -74.0000, -107.0000, 311.7438
max values : -31.000, -21.000, 9.000, 51.000, 94.000, 131.000, 144.000, 137.000, 106.000, 60.000, 18.000, -17.000, 3006.011

Resample raster

I am trying to resample a forest cover raster with high resolution (25 meters) and categorical data (1 to 13) to a new RasterLayer with a lower resolution (~ 1 km). My idea is to combine the forest cover data with other lower-resolution raster data :
I tried raster::resample(), but since the data is categorical I lost a lot of information:
summary(as.factor(df$loss_year_mosaic_30m))
0 1 2 3 4 5 6 7 8 9 10 11 12 13
3777691 65 101 50 151 145 159 295 291 134 102 126 104 91
As you can see, the new raster has the desired resolution but have lots of zeros as well. I suppose that is normal since I used the ´ngb´ option in resample.
The second strategy was using raster::aggregate() but I find difficult to define a factor integer since the change of resolution is not straightforward (like the double of the resolution or alike).
My high-resolution raster has the following resolution, and I want it to aggregate it to a 0.008333333, 0.008333333 (x, y) resolution to the same extent.
loss_year
class : RasterLayer
dimensions : 70503, 59566, 4199581698 (nrow, ncol, ncell)
resolution : 0.00025, 0.00025 (x, y)
extent : -81.73875, -66.84725, -4.2285, 13.39725 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
data source : /Volumes/LaCie/Deforestacion/Hansen/loss_year_mosaic_30m.tif
names : loss_year_mosaic_30m
values : 0, 13 (min, max)
I have tried a factor of ~33.33 following the description of the aggregate help: "The number of cells is the number of cells of x divided by fact*fact (when fact is a single number)." Nonetheless, the resulting raster data do not seem to have the same number of rows and columns as my other low-resolution rasters.
I have never used this high-resolution data, and I am also computationally limited (some of this commands can be parallelized using clusterR, but sometimes they took the same time than the non-parallelized commands, especially since they do not work for nearest neighboor calculations).
I am short of ideas; maybe I can try layerize to obtain a count raster, but I have to ´aggregate´ and the factor problem arises. Since this processes are taking me days to process, I do want to know the most efficient way to create a lower resolution raster without losing much information
A reproducible example could be the following:
r_hr <- raster(nrow=70, ncol=70) #High resolution raster with categorical data
set.seed(0)
r_hr[] <- round(runif(1:ncell(r_hr), 1, 5))
r_lr <- raster(nrow=6, ncol=6) #Low resolution raster
First strategy: loss of information
r <- resample(r_hr, r_lr, method = "ngb") #The raster data is categorical
Second strategy: difficult to define an aggregate factor
r <- aggregate(r_hr, factor) #How to define a factor to get exactly the same number of cells of h_lr?
Another option: layerize
r_brick <- layerize(r_hr)
aggregate(r_brick, factor) #How to define factor to coincide with the r_lr dimensions?
Thanks for your help!
r_hr <- raster(nrow=70, ncol=70) #High resolution raster with categorical data
set.seed(0)
r_hr[] <- round(runif(1:ncell(r_hr), 1, 5))
r_lr <- raster(nrow=6, ncol=6)
r_hr
#class : RasterLayer
#dimensions : 70, 70, 4900 (nrow, ncol, ncell)
#resolution : 5.142857, 2.571429 (x, y)
#extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
#coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
#data source : in memory
#names : layer
#values : 1, 5 (min, max)
r_lr
#class : RasterLayer
#dimensions : 6, 6, 36 (nrow, ncol, ncell)
#resolution : 60, 30 (x, y)
#extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
#coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
Direct aggregate is not possible, because 70/6 is not an integer.
dim(r_hr)[1:2] / dim(r_lr)[1:2]
#[1] 11.66667 11.66667
Nearest neighbor resampling is not a good idea either as the results would be arbitrary.
Here is a by layer approach that you suggested and dww also showed already.
b <- layerize(r_hr)
fact <- round(dim(r_hr)[1:2] / dim(r_lr)[1:2])
a <- aggregate(b, fact)
x <- resample(a, r_lr)
Now you have proportions. If you want a single class you could do
y <- which.max(x)
In that case, another approach would be to aggregate the classes
ag <- aggregate(r_hr, fact, modal)
agx <- resample(ag, r_lr, method='ngb')
Note that agx and y are the same. But they can both be problematic as you might have cells with 5 classes with each about 20%, making it rather unreasonable to pick one winner.
It is pretty standard practice to aggregate land cover maps into layers of %cover. I.e you should aim to produce 13 layers, each being something like %cover in that grid cell. Doing this allows you to reduce the resolution while retaining as much information as possible. N.B if you require a different summary statistic than %, should be easy to adapt the following method to whatever statistic you want, by changing the fun = function in aggregate.
The following method is pretty fast (it takes just a few seconds on my laptop to process raster with 100 million cells):
First, let's create some dummy rasters to use
Nhr <- 1e4 # resolution of high-res raster
Nlr <- 333 # resolution of low-res raster
r.hr <- raster(ncols=Nhr, nrows=Nhr)
r.lr <- raster(ncols=Nlr, nrows=Nlr)
r.hr[] <- sample(1:13, Nhr^2, replace=T)
Now, we begin by aggregating the high res raster to almost the same resolution as the low res one (to nearest integer number of cells). Each resulting layer contains the fraction of area within that cell in which value of original raster is N.
Nratio <- as.integer(Nhr/Nlr) # ratio of high to low resolutions, to nearest integer value for aggregation
layer1 <- aggregate(r.hr, Nratio, fun=function(x, na.rm=T) {mean(x==1, na.rm=na.rm)})
layer2 <- aggregate(r.hr, Nratio, fun=function(x, na.rm=T) {mean(x==2, na.rm=na.rm)})
And finally, resample low res raster to the desired resolution
layer1 <- resample(layer1, r.lr, method = "ngb")
layer2 <- resample(layer2, r.lr, method = "ngb")
repeat for each layer, and build your layers into a stack or a multi-band raster

Getting Data out of raster file in R

I'm new to raster files, but they seem to be the best way to open up the large gov't files that have all the weather data, so I'm trying to figure out how to use them. For reference, I'm downloading the files located here (just some run of the mill weather stuff). When I use the raster package of R to import the file like this
> r <- raster("/path/to/file.grb")
Everything works fine. I can even get a little metadata when I type in
> r
class : RasterLayer
band : 1 (of 37 bands)
dimensions : 224, 464, 103936 (nrow, ncol, ncell)
resolution : 0.125, 0.125 (x, y)
extent : -125.0005, -67.0005, 25.0005, 53.0005 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +a=6371200 +b=6371200 +no_defs
data source : /path/to/file.grb
names : NLDAS_MOS0125_H.A20140629.0100.002
All I've managed to do at this point is index the raster in a very obvious way.
> r[100,100]
267.1
So, I guess I can "index" it, but I have no idea what the number 267.1 means. It's certainly not all there is in the cell. There should be a bunch of variables including, but not limited to, soil moisture, surface runoff, and evaporation.
How can I access this information in the same way using R?
# create two rasters
r1 <- raster(matrix(ncol = 10, nrow = 10, runif(100)))
r2 <- raster(matrix(ncol = 10, nrow = 10, runif(100)))
# creates a raster stack -- the stack (or brick function) allows you to
# to use multilayer band rasters
# http://www.inside-r.org/packages/cran/raster/docs/stack
st_r <- stack(r1, r2)
# extract values -- will create a matrix with 100 rows and two columns
vl <- getValues(st_r)
r <- raster("/path/to/file.grb")
values <- getValues(r)
You can read about the function here:
http://www.inside-r.org/packages/cran/raster/docs/values
I believe that the problem is that you are using raster and not stack. The raster function results in a single layer (matrix) whereas stack or brick read an array with all of the raster layers. Here is an example that demonstrates extracting values using an [i,j,z] index.
library(raster)
setwd("D:/TMP")
download.file("ftp://hydro1.sci.gsfc.nasa.gov/data/s4pa/NLDAS/NLDAS_MOS0125_H.002/2014/180/NLDAS_MOS0125_H.A20140629.0000.002.grb",
destfile="NLDAS_MOS0125_H.A20140629.0000.002.grb", mode="wb")
r <- stack("NLDAS_MOS0125_H.A20140629.0000.002.grb")
names(r) <- paste0("L", seq(1:nlayers(r)))
class(r)
# Values for [i,j]
i=100
j=100
r[i,j]
# Values for i,j and z at layer(s) 1, 5 and 10
z=c(1,5,10)
r[i,j][z]

Resources