Fast Extraction from Raster Datasets using Points - How to speed up the raster::extract() function - r

I need to extract values from large RasterLayer (a Digital Terrain Model or DTM), using XY coordinates. Coordinates are in another large data.table.
> DTM
class : RasterLayer
dimensions : 93690, 74840, 7011759600 (nrow, ncol, ncell)
resolution : 16, 16 (x, y)
extent : -80000, 1117440, 6448080, 7947120 (xmin, xmax, ymin, ymax)
crs : +proj=utm +zone=33 +ellps=GRS80 +units=m +no_defs
source : dtm_16x16_utm33.tif
names : dtm_16x16_utm33
values : -6.648066, 2273.72 (min, max)
> XY
x y
1: 986488 7930296
2: 986504 7930296
3: 986536 7930296
4: 986552 7930296
5: 986488 7930280
---
454986003: 61832 6451208
454986004: 61848 6451208
454986005: 61864 6451208
454986006: 61912 6451208
454986007: 61928 6451208
extract() function from raster r package could do the task. According to the package manual, the code would be:
Altitude <- extract(DTM, XY)
This, however, takes a long time!!
I have tried using the following line of code to run the function in parallel.
beginCluster(30)
Altitude <- extract(DTM, XY)
endCluster()
However, I can see that of the 64 available cores, only 1 is being used by R, the code is not running in parallel, and the function continues to take a long time.
Any ideas how I can speed this up?
Note1: I have successfully used lines of code similar to this one...
beginCluster(20)
raster3 <- projectRaster(raster1, crs=crs(raster2))
endCluster()
...and the server has worked with multiple cores at the same time.

One of the fastest solutions is to use the velox package.
https://www.rdocumentation.org/packages/velox/versions/0.2.0
library("velox")
library("raster")
library("sp")
library("rgeos")
## convert XY data to SpatialPoints
XY <- SpatialPoints(coords = XY)
## create a velox object from the original raster
DTMV <- velox(DTM)
## fast extraction of altitude data
Altitude <- DTM1V$extract_points(sp = XY)

Related

How do I get this raster and this shapefile on the same projection?

I have a shapefile of 10 CA counties projected in NAD83(Hard)/CA Albers. I have a raster (a netCDF file of temperature) for the entire US projected in WGS84/WGS84. I want to use the shapefile to clip the raster. I know that I need to get them on the same datum/projection first. But I've tried re-projecting the raster using raster::projectRaster(). That failed (as in the data disappeared). So then I tried re-projecting the shapefile instead using sp::spTransform(). This also failed (as in the data don't overlap). I've searched through stackoverflow but didn't see anything that seemed to help. I'm not getting an error, but projectRaster is not working and re-projecting the shapefile using spTransform doesn't produce the desired outcome. I feel like there is something specific going on here, like the transformation from WGS84 to NAD83 or loading the raster in using raster() is the problem ... but then again, it could easily be something stupid that I'm missing! =)
my shapefile and raster are here: https://www.dropbox.com/sh/l3b2syzcioeqmyy/AAA5CstBZty4ofOcVFkAumNYa?dl=0
here is my code:
library(raster) #for creating rasters from .bil files
library(rgdal) #for reading .bil files and .gdb files
library(ncdf4) #for working with ncdf files
library(sp) #for working with spatial data files
load(my_counties.RData)
myraster <- raster(myraster.nc)
my.crs <- CRS("+init=EPSG:3311") #NAD83(HARN) / California Albers (HARN is high resolution)
newraster <- projectRaster(myraster, res = 6000, crs = my.crs) #raster resolution is 1/16th of a degree
#There is data in the raster.
plot(myraster)
#but none in newraster
plot(newraster)
#Now try re-projecting the shapefile
my.crs2 <- crs(myraster)
newshapefile <- spTransform(my_counties, my.crs2)
#but the data don't overlap
plot(newshapefile); plot(myraster, add = T)
You can do
library(raster)
library(rgdal)
load("my_counties.RData")
b <- brick("myraster.nc")
Now look at b
b
#class : RasterBrick
#dimensions : 490, 960, 470400, 365 (nrow, ncol, ncell, nlayers)
#resolution : 0.0625, 0.0625 (x, y)
#extent : 234, 294, 23.375, 54 (xmin, xmax, ymin, ymax)
#crs : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
#source : myraster.nc
#names : X2005.01.01, X2005.01.02, X2005.01.03, X2005.01.04, X2005.01.05, X2005.01.06, X2005.01.07, X2005.01.08, X2005.01.09, X2005.01.10, X2005.01.11, X2005.01.12, X2005.01.13, X2005.01.14, X2005.01.15, ...
#Date : 2005-01-01, 2005-12-31 (min, max)
#varname : tasmax
The horizontal extent is between 234 and 294 degrees. That points to a system with longitudes that start in Greenwich at 0 and continues to 360 (again in Greenwich). Climatologist do that. To go to the more conventional -180 to 180 degrees system:
r <- shift(b, -360)
(if your data had a global extent, you would use raster::rotate instead)
Now, transform the counties to lonlat and show that they overlap.
counties <- spTransform(my_counties, crs(r))
plot(r, 1)
lines(counties)
It is generally best to transform vector data, not raster data if you can avoid it.

How to get values for a pixel from a geoTIFF in R?

I'm trying to get RGB components from a geoTIFF file in R. The colours on the image correspond to different land classification types and I have a legend for each classification type in RGB components.
I'm using the raster library. My code so far is
library(raster)
my.map = raster("mygeoTIFFfile.tif")
Here is the information on the file once it has been read in:
> my.map[[1]]
class : RasterLayer
dimensions : 55800, 129600, 7231680000 (nrow, ncol, ncell)
resolution : 0.002777778, 0.002777778 (x, y)
extent : -180.0014, 179.9986, -64.99861, 90.00139 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
data source : filepah/filename.tif
names : filename.tif
values : 11, 230 (min, max)
The specific geoTIFF file I'm working on can be found here:
http://due.esrin.esa.int/page_globcover.php
(just click on "Globcover2009_V2.3_Global_.zip")
Can someone please help me get the value from a single pixel location from this file please?
The rasterToPoints() function will convert your raster data to a matrix containing x, y, and value for each point. This will be very large, but may be what you're looking for if you want to do a broad analysis of the data.
library(raster)
map <- raster("GLOBCOVER_L4_200901_200912_V2.3.tif")
data <- rasterToPoints(map, progress="text")
head(data)
Another option is to use the extract() function to return a single point by passing a SpatialPoints object with latitude/longitude. If you only want a few individual data points, this will be a lot faster than loading the entire thing into a matrix.
library(raster)
map <- raster("GLOBCOVER_L4_200901_200912_V2.3.tif")
extract(map, SpatialPoints(cbind(-123.3680884, 48.4252848)))
It seems that you are asking the wrong question.
To get a value for a single pixel (grid cell), you can do use indexing. For example, for cell number 10,000 and 10,001 you can do r[10000:10001].
You could get all values by doing values(r). But that will fail for a very large raster like this (unless you have lots of RAM).
However, the question you need answered, it seems, is how to make a map by matching integer cell values with RGB colors.
Let's set up an example raster
library(raster)
r <- raster(nrow=4, ncol=4)
values(r) <- rep(c(11, 14, 20, 30), each=4)
And some matching RGB values
legend <- read.csv(text="Value,Label,Red,Green,Blue
11,Post-flooding or irrigated croplands (or aquatic),170,240,240
14,Rainfed croplands,255,255,100
20,Mosaic cropland (50-70%) / vegetation (grassland/shrubland/forest) (20-50%),220,240,100
30,Mosaic vegetation (grassland/shrubland/forest) (50-70%) / cropland (20-50%) ,205,205,102")
Compute the color code
legend$col <- rgb(legend$Red, legend$Green, legend$Blue, maxColorValue=255)
set up a "color table"
# start with white for all values (1 to 255)
ct <- rep(rgb(1,1,1), 255)
# fill in where necessary
ct[legend$Value+1] <- legend$col
colortable(r) <- ct
plot
plot(r)
You can also try:
tb <- legend[, c('Value', 'Label')]
colnames(tb)[1] = "ID"
tb$Label <- substr(tb$Label, 1,10)
levels(r) <- tb
library(rasterVis)
levelplot(r, col.regions=legend$col, at=0:length(legend$col))

R- Raster math while preserving integer format

I have a some large rasters (~110 MB each) I want to perform some raster calculations on. For the purposes of this example, I want to average the files SNDPPT_M_sl1_1km_ll.tif and SNDPPT_M_sl2_1km_ll.tif, available at this website. In reality, the math is a bit more complex (some multiplication and division of several rasters).
Both input rasters are integer (INT1U) data, and I would like the output to also be INT1U. However, whenever I try to do a raster calculation, it creates intermediate temporary files in floating point format which are very large in size. I am working on a laptop with about 7 GB of free hard drive space, which gets filled before the calculation is complete.
# load packages
require(raster)
## script control
# which property?
prop <- "SNDPPT"
# load layers
r.1 <- raster(paste0("1raw/", prop, "_M_sl1_1km_ll.tif"))
r.2 <- raster(paste0("1raw/", prop, "_M_sl2_1km_ll.tif"))
# allocate space for output raster - this is about 100 MB (same size as input files)
r.out <- writeRaster(r.1,
filename=paste0("2derived/", prop, "_M_meanTop200cm_1km_ll.tif"),
datatype="INT1U")
# perform raster math calculation
r.out <- integer(round((r.out+r.2)/2))
# at this point, my hard drive fills due to temporary files > 7 GB in size
Is anyone aware of a workaround to perform raster math in R with integer input and output files while minimizing or avoiding the very large intermediate files?
The trick here could be to use raster::overlay to make the computation and save the results as a compressed tiff at the same time. Something like this should work:
library(raster)
#> Loading required package: sp
# load layers
r.1 <- raster("C:/Users/LB_laptop/Downloads/SNDPPT_M_sl1_1km_ll.tif")
r.2 <- raster("C:/Users/LB_laptop/Downloads/SNDPPT_M_sl1_1km_ll.tif")
out <- raster::overlay(r.1, r.2,
fun = function(x, y) (round((x + y) / 2)),
filename = "C:/Users/LB_laptop/Downloads/SNDPPT_out.tif",
datatype = "INT1U",
options = "COMPRESS=DEFLATE")
> out
class : RasterLayer
dimensions : 16800, 43200, 725760000 (nrow, ncol, ncell)
resolution : 0.008333333, 0.008333333 (x, y)
extent : -180, 180, -56.00083, 83.99917 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
data source : C:\Users\LB_laptop\Downloads\SNDPPT_out.tif
names : SNDPPT_out
values : 0, 242 (min, max)
HTH.

Getting Data out of raster file in R

I'm new to raster files, but they seem to be the best way to open up the large gov't files that have all the weather data, so I'm trying to figure out how to use them. For reference, I'm downloading the files located here (just some run of the mill weather stuff). When I use the raster package of R to import the file like this
> r <- raster("/path/to/file.grb")
Everything works fine. I can even get a little metadata when I type in
> r
class : RasterLayer
band : 1 (of 37 bands)
dimensions : 224, 464, 103936 (nrow, ncol, ncell)
resolution : 0.125, 0.125 (x, y)
extent : -125.0005, -67.0005, 25.0005, 53.0005 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +a=6371200 +b=6371200 +no_defs
data source : /path/to/file.grb
names : NLDAS_MOS0125_H.A20140629.0100.002
All I've managed to do at this point is index the raster in a very obvious way.
> r[100,100]
267.1
So, I guess I can "index" it, but I have no idea what the number 267.1 means. It's certainly not all there is in the cell. There should be a bunch of variables including, but not limited to, soil moisture, surface runoff, and evaporation.
How can I access this information in the same way using R?
# create two rasters
r1 <- raster(matrix(ncol = 10, nrow = 10, runif(100)))
r2 <- raster(matrix(ncol = 10, nrow = 10, runif(100)))
# creates a raster stack -- the stack (or brick function) allows you to
# to use multilayer band rasters
# http://www.inside-r.org/packages/cran/raster/docs/stack
st_r <- stack(r1, r2)
# extract values -- will create a matrix with 100 rows and two columns
vl <- getValues(st_r)
r <- raster("/path/to/file.grb")
values <- getValues(r)
You can read about the function here:
http://www.inside-r.org/packages/cran/raster/docs/values
I believe that the problem is that you are using raster and not stack. The raster function results in a single layer (matrix) whereas stack or brick read an array with all of the raster layers. Here is an example that demonstrates extracting values using an [i,j,z] index.
library(raster)
setwd("D:/TMP")
download.file("ftp://hydro1.sci.gsfc.nasa.gov/data/s4pa/NLDAS/NLDAS_MOS0125_H.002/2014/180/NLDAS_MOS0125_H.A20140629.0000.002.grb",
destfile="NLDAS_MOS0125_H.A20140629.0000.002.grb", mode="wb")
r <- stack("NLDAS_MOS0125_H.A20140629.0000.002.grb")
names(r) <- paste0("L", seq(1:nlayers(r)))
class(r)
# Values for [i,j]
i=100
j=100
r[i,j]
# Values for i,j and z at layer(s) 1, 5 and 10
z=c(1,5,10)
r[i,j][z]

netCDF to raster and projection

I've recently started using R for spatial data. I'd be really grateful if you could could help me with this. Thanks!
I've extracted data from a multidimensional netCDF file. This file had longitude, latitude and temperature data (for 12 months of a specific year).
From this netCDF I've got a data frame for January with these variables: longitude, latitude, temperature.
With this data frame I've created a raster.
# Packages
library("sp")
library("raster")
library("rgdal")
library("ncdf")
library("maptools")
library("rgeos")
library("sm")
library("chron")
# Dataframe to raster
# Create spatial points data frame
coordinates(tmp.df01) <- ~ lon + lat
# Coerce to SpatialPixelsDataFrame
gridded(tmp.df01) <- T
# Coerce to raster
rasterDF1 <- raster(tmp.df01)
> print(tmp.df01)
class : SpatialPixelsDataFrame
dimensions : 103, 241, 24823, 1 (nrow, ncol, npixels, nlayers)
resolution : 0.02083333, 0.02083333 (x, y)
extent : 5.739583, 10.76042, 45.73958, 47.88542 (xmin, xmax, ymin, ymax)
coord. ref. : NA
names : TabsM_1
min values : -18.1389980316162
max values : 2.26920962333679
There is no value for 'coord. ref.'
The projection of the original netCDF was WGS84. So I gave this projection to the raster.
proj4string(rasterDF1) <- "+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"
Then, I wanted to reproject my raster to another projection:
# Reprojecting into CH1903_LV03
# First, change the coordinate reference system (crs)
proj4string(rasterDF1) <- "+init=epsg:21781"
# Second, reproject the raster
rasterDF1.CH <- spTransform(rasterDF1, crs("+init=epsg:21781"))
At this point I get the following error:
Error in spTransform(rasterDF1, crs("+init=epsg:21781")) :
load package rgdal for spTransform methods
But the package rgdal is already uploaded! It must be something wrong in the code!
Here the code to solve the problem described.
Solution provided by Frede Aakmann Tøgersen.
tmp.df01 # tmp.df01 is a data.frame
coordinates(tmp.df01) <- ~ lon + lat # tmp.df01 is now a SpatialPointsDataFrame
# Assign orignial data projection
proj4string(tmp.df01) <- CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0")
gridded(tmp.df01) <- T # tmp.df01 is now a SpatialPixelFrame
# Coerce to raster
rasterDF1 <- raster(tmp.df01) # rasterDF1 is a RasterLayer
# To reproject the raster layer
rasterDF1.proj <- projectRaster(rasterDF1, crs=CRS("+init=epsg:21781"))
Here is another option GDAL. Using gdal_translate you can convert netcdf file with bands to geotiffs.
gdal_translate -a_srs EPSG:4326 NETCDF:File_Name.nc:Band_Name -of ‘Gtiff’ Output_FileName.geotiff
To explore more options in gdal_translate you can visit this link

Resources