Random sampling from a large RasterLayer in R

I have a large RasterLayer with integers ranging from 0 to 44.
class : RasterLayer
dimensions : 29800, 34470, 1027206000 (nrow, ncol, ncell)
resolution : 10, 10 (x, y)
extent : 331300, 676000, 5681995, 5979995 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=utm +zone=32 +ellps=GRS80 +units=m +no_defs
data source : /home/mkoehler/stk_rast_whz
names : stk_rast_whz
values : 0, 44 (min, max)
I want to do a stratified sampling of 5000 points per stratum.
I get the following error:
POINTS <- sampleStratified(b, size=5000, na.rm=TRUE, xy=FALSE)
Error in ys[[i]] <- y : attempt to select less than one element
Here is code that reproduces the problem (even when selecting only 1 item per stratum):
library(raster)
set.seed(10)
r <- raster(ncol=5000, nrow=5000)
names(r) <- 'stratum'
r[] <- round((runif(ncell(r)))*44)
sampleStratified(r, size=1,xy=T)
Error in ys[[i]] <- y : attempt to select less than one element
Using fewer strata or changing the "size" or "exp" arguments has no effect.
R version: [64-bit] C:\Program Files\R\R-3.1.1
Any ideas?
Thanks in advance!

This appears to be a bug (as at raster 2.3-12), and occurs when (1) your raster contains cells with value 0, and (2) the raster can't be processed in memory (i.e. canProcessInMemory(r) is FALSE).
The function loops over the unique cell values produced by freq(r), and then indexes a list by each of these values in turn. If one of those values is zero, the error will be triggered since the 0th element does not exist. For example:
list()[[0]]
# Error in list()[[0]] : attempt to select less than one element
You'll notice that the error doesn't occur if you fill r with, e.g., r[] <- sample(44, ncell(r), replace=TRUE), since it won't have any zeroes.
When the raster can be processed in memory, the function loops over the row numbers of freq(r), and so the subsequent list indexing is sensible.
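To make the failure mode concrete, here is a minimal base-R sketch (no raster needed). The matrix `f` only mimics the value/count layout returned by `freq()`, and the two loops are simplified stand-ins for the pattern described above, not the actual `sampleStratified` source.

```r
# f mimics freq(r): column 1 = cell value, column 2 = count (made-up numbers).
f <- cbind(value = c(0, 1, 2), count = c(5, 7, 3))

# Buggy pattern: loop over the cell *values* and use them as list indices.
buggy <- function(f) {
  ys <- list()
  for (i in f[, 1]) {
    ys[[i]] <- i            # fails on the first iteration, where i == 0
  }
  ys
}

# Fixed pattern: loop over *row numbers* and look the value up in column 1.
fixed <- function(f) {
  ys <- vector("list", nrow(f))
  for (i in seq_len(nrow(f))) {
    ys[[i]] <- f[i, 1]
  }
  ys
}

# Capture the error message instead of stopping.
res <- tryCatch(buggy(f), error = function(e) conditionMessage(e))
print(res)
print(unlist(fixed(f)))
```

The exact wording of the message varies a little between R versions, but it is the same "attempt to select less than one element" error reported in the question.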
I've contacted the maintainer to report this bug.
Meanwhile, as a temporary fix, you could use something like the following to make a corrected copy of the function (which will remain available in the current R session).
sampleStratified2 <-
eval(parse(text=sub('sr\\[, 2\\] == i', 'sr[, 2] == f[i, 1]',
sub('i in f\\[, 1\\]', 'i in seq_len(nrow(f))',
deparse(getMethod(sampleStratified,
signature='RasterLayer')@.Data))
)))
sampleStratified2(r, size=1, xy=TRUE)
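If patching the method feels fragile, stratified sampling can also be sketched directly on a vector of cell values. The helper `sample_per_stratum()` below is hypothetical (it is not a raster function) and assumes the values fit in memory (for a raster, e.g. `v <- getValues(r)`), so cell numbers are simply positions in `v`. Because it groups cells with `split()` instead of using the values as list indices, a stratum value of 0 causes no trouble.

```r
# Hypothetical helper (not part of raster): draw up to `size` cell numbers
# per stratum from a vector of cell values v (cell number = position in v).
sample_per_stratum <- function(v, size) {
  cells  <- which(!is.na(v))              # ignore NA cells
  strata <- split(cells, v[cells])        # cell numbers grouped by value (0 included)
  picked <- lapply(strata, function(idx) {
    # sample.int() avoids sample()'s surprise when idx has length 1
    if (length(idx) <= size) idx else idx[sample.int(length(idx), size)]
  })
  data.frame(cell    = unlist(picked, use.names = FALSE),
             stratum = rep(as.numeric(names(picked)), lengths(picked)))
}

set.seed(10)
v <- round(runif(10000) * 44)   # values 0..44, like the example raster
v[sample.int(10000, 500)] <- NA # a few NA cells
s <- sample_per_stratum(v, size = 5)
head(s)
```

For a raster, `xyFromCell(r, s$cell)` turns the sampled cell numbers into coordinates. With about 1.03 billion cells, as in the question, the full value vector would need roughly 8 GB as doubles, so this sketch applies as-is only to smaller rasters or to chunks you read yourself.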

Related

Project MaxENT fitted model into geographic space

A similar question was answered here; however, this problem is a bit different, so I cannot apply that solution. I have fitted maxent using the site-with-data format. The problem is that I cannot project the fitted model. The output of m1 in the path D:/maxent looks fine. I suspect these two errors (below) are related to rJava, but I don't know the solution. Please see my code below:
> m1 <- maxent(x = d, p = id, path = "D:/maxent",
args = c("-P", "noautofeature", "nolinear", "noquadratic", "nothreshold",
"noproduct", "betamultiplier=1", "replicates=10", "crossvalidate"))
# here d is a dataframe containing 11213 rows and 20 predictor columns (with numeric values), id is a vector containing numeric values of 1 and 0 (representing species presence and absence)
Loading required namespace: rJava
> plot(m1, xlim=c(0,100))
Error in as.double(y) :
cannot coerce type 'S4' to vector of type 'double'
> ras <- raster("E:/bio12.tif") # raster to project the fitted model 'm1'
> pred.m1 <- raster::predict(m1, ras)
Error in .local(object, ...) : missing layers (or wrong names)
Here are the properties of the raster file:
> ras
class : RasterLayer
dimensions : 4292, 4936, 21185312 (nrow, ncol, ncell)
resolution : 0.008333333, 0.008333333 (x, y)
extent : 112.8917, 154.025, -43.75833, -7.991667 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
data source : E:/Predictors_grasshoppers/selected.predictors/bio12.tif
names : bio12
values : 79, 7625 (min, max)
Update: I have tried using single quotes in m1 and the problem is still there.
> m2 <- maxent(x = d, p = id, path = 'D:/PhD related/Historic climate data Australia/maxent2',
args = c('-P', 'noautofeature', 'nolinear', 'noquadratic', 'nothreshold',
'noproduct', 'betamultiplier=1', 'replicates=10', 'crossvalidate'))
Following the comment of @Bappa Das, I found the solution. To project a fitted maxent model into geographic space, use a raster stack (not a single raster) containing the variables that were used to fit the model. The layer names (and order) in the stack should match the predictors in the fitted model.
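The same principle can be sketched without Java. Below, `glm()` stands in for `maxent()` (which needs rJava and maxent.jar), and the layers and their names (`bio1`, `bio12`) are made up. The point carries over: the model is fitted on columns named like the layers, and `predict()` is then called on a stack whose layer names match those columns.

```r
library(raster)

# Two synthetic predictor layers on the same grid (made-up data).
set.seed(1)
bio1  <- setValues(raster(nrow = 20, ncol = 20), runif(400))
bio12 <- setValues(raster(nrow = 20, ncol = 20), runif(400))

# The stack's layer names must match the predictor names used in the model.
preds <- stack(bio1, bio12)
names(preds) <- c("bio1", "bio12")

# Training table with the same column names (stand-in for 'd' and 'id').
d <- as.data.frame(preds)
d$pa <- as.integer(d$bio1 + rnorm(400, sd = 0.2) > 0.5)

# glm() stands in for maxent() here.
m <- glm(pa ~ bio1 + bio12, data = d, family = binomial)

# Project the fitted model onto the grid: one predicted value per cell.
pred <- predict(preds, m, type = "response")
pred
```

With the real model, the equivalent would presumably be a stack of all 20 predictors, with `names()` set to `colnames(d)`, passed to `predict()` in place of the single `bio12` layer.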

Random sampling from a large raster using clusterR

I need to take a 5% random sample from a very large raster and return a new raster. I am trying to use sampleRandom from the raster package, but the process is very slow (I only have 8 GB of RAM on my machine, running 64-bit R). The raster has also been cropped/masked to match an irregularly shaped study area boundary, so it has NA values in the rectangular extent around the polygon boundary as well as some internal NA values. I'm trying to sample only from the non-NA values. I've tried both sampling 5% and, conversely, sampling 95%; both ran for more than 2 hours without producing a result, at which point I terminated the process.
I am trying to speed it up by running it in parallel with the clusterR command, but I'm new to both sampleRandom and clusterR. My code runs, but all of the non-NA pixels are returned, so the sampling doesn't seem to be working. Is this a problem with my code, or is it that sampleRandom can't run with clusterR?
Here is a description of my raster layer:
conv.mod
class : RasterLayer
dimensions : 23828, 19095, 454995660 (nrow, ncol, ncell)
resolution : 56, 56 (x, y)
extent : -1220192, -150872, 87580, 1421948 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=aea +lat_1=44.75 +lat_2=55.75 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0
data source : C:\GIS\carbon_cows\Intact\conv_mod.tif
names : conv_mod
values : 1, 1 (min, max)
And here is the code I have tried:
library(raster)
library(parallel)
tot<-cellStats(conv.mod,'sum', na.rm=TRUE) #get the total pixels in conv.mod
sampsize<-tot * 0.05 #calculate how many pixels would represent 5%
removeTmpFiles() #delete old raster temporary files from disk (does not free RAM)
numcores<-detectCores() -1
start<-Sys.time()
beginCluster(numcores)
cl<-getCluster()
clusterExport(cl,"sampsize", envir = .GlobalEnv)
conv.perc <- clusterR(conv.mod,sampleRandom,args=list(size=sampsize,na.rm=TRUE,asRaster=TRUE))
endCluster()
end<-Sys.time()
difftime(end,start)
Here is the total number of non-NA cells in the original raster layer (all values are 1, so the sum equals the count):
tot<-cellStats(conv.mod,'sum', na.rm=TRUE)
tot
105193858
and the number that should be a 5% sample:
sampsize<-tot * 0.05
sampsize
5259693
But, the resulting raster has the same number of non-NA pixels as the original raster:
tot_convperc<-cellStats(conv.perc,'sum',na.rm=T)
tot_convperc
105193858
I've also tried reversing the sample size calculation and running sampleRandom, so that I'm requesting a 95% sample. But, I get the same result.
I'd appreciate any help in understanding why this code is not running as expected. Thanks!
Never mind. I was able to take advantage of this post: https://gis.stackexchange.com/questions/17255/random-sampling-of-raster-using-r and the reply by whuber.
The following code solved my problem, without the use of a cluster:
col.conv <- ncol(conv.mod)
row.conv<-nrow(conv.mod)
r<-conv.mod
start<-Sys.time()
r[runif(col.conv*row.conv) >= 0.95] <- NA # Randomly *unselect* 5% of the data
end<-Sys.time()
difftime(end,start)
That code ran in about 3 minutes, as opposed to over an hour for running sampleRandom inside clusterR. I still wonder why sampleRandom was not actually taking a sample, and why this new code is so much more efficient, but I'm happy to have the problem solved.
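The `runif()` mask above is a Bernoulli sample: each cell is dropped independently, so the share kept is only approximately 95%. For comparison, here is a sketch of an exact-size sample drawn only from non-NA cells, on a small stand-in for `conv.mod` (the 100 by 100 grid and the 3000 NA cells are made up). Drawing cell numbers with `sample.int()` gives exactly `n` cells, at the cost of holding all cell values in memory (for 455 million cells, roughly 3.6 GB as a numeric vector).

```r
library(raster)

# Small stand-in for conv.mod: all 1s, with some NA cells as if masked.
set.seed(42)
r <- setValues(raster(nrow = 100, ncol = 100), rep(1, 10000))
r[sample.int(ncell(r), 3000)] <- NA

frac <- 0.05
ok   <- which(!is.na(getValues(r)))     # cell numbers of non-NA cells
n    <- round(length(ok) * frac)        # exact sample size
keep <- ok[sample.int(length(ok), n)]   # n distinct non-NA cells

out <- setValues(r, rep(NA_real_, ncell(r)))  # all-NA copy with the same geometry
out[keep] <- r[keep]                          # copy back only the sampled cells
```

Here `out` has exactly 350 non-NA cells (5% of the 7000 non-NA cells in `r`).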

How to get values for a pixel from a geoTIFF in R?

I'm trying to get RGB components from a geoTIFF file in R. The colours on the image correspond to different land classification types and I have a legend for each classification type in RGB components.
I'm using the raster package. My code so far is:
library(raster)
my.map = raster("mygeoTIFFfile.tif")
Here is the information on the file once it has been read in:
> my.map[[1]]
class : RasterLayer
dimensions : 55800, 129600, 7231680000 (nrow, ncol, ncell)
resolution : 0.002777778, 0.002777778 (x, y)
extent : -180.0014, 179.9986, -64.99861, 90.00139 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
data source : filepah/filename.tif
names : filename.tif
values : 11, 230 (min, max)
The specific geoTIFF file I'm working on can be found here:
http://due.esrin.esa.int/page_globcover.php
(just click on "Globcover2009_V2.3_Global_.zip")
Can someone please help me get the value of a single pixel location from this file?
The rasterToPoints() function will convert your raster data to a matrix containing x, y, and value for each (non-NA) cell. This will be very large, but may be what you're looking for if you want to do a broad analysis of the data.
library(raster)
map <- raster("GLOBCOVER_L4_200901_200912_V2.3.tif")
data <- rasterToPoints(map, progress="text")
head(data)
Another option is to use the extract() function to return the value at a single point, by passing a SpatialPoints object built from longitude/latitude (x, y) coordinates. If you only want a few individual data points, this will be a lot faster than loading the entire raster into a matrix.
library(raster)
map <- raster("GLOBCOVER_L4_200901_200912_V2.3.tif")
extract(map, SpatialPoints(cbind(-123.3680884, 48.4252848)))
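A small sketch of two equivalent routes, using a coarse global grid in place of the GlobCover file (the 10-degree grid and the second point are made up; the first point is the one used above). Coordinates go x (longitude) first, then y (latitude); `extract()` also accepts a plain two-column matrix.

```r
library(raster)

# Coarse global lon/lat grid (10-degree cells) standing in for GlobCover.
r <- raster(nrow = 18, ncol = 36)
values(r) <- 1:ncell(r)          # value = cell number, so results are easy to check

# Coordinates in (x, y) = (longitude, latitude) order.
pts <- cbind(x = c(-123.3680884, 0.5), y = c(48.4252848, -15))

v1    <- extract(r, pts)         # extract() with a two-column matrix
cells <- cellFromXY(r, pts)      # or find the cell numbers first...
v2    <- r[cells]                # ...and index the raster by cell number
```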
It seems that you are asking the wrong question.
To get a value for a single pixel (grid cell), you can use indexing. For example, for cells 10,000 and 10,001 you can do r[10000:10001].
You could get all values by doing values(r). But that will fail for a very large raster like this (unless you have lots of RAM).
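A sketch of cell-number and row/column indexing on a small made-up raster, where each cell's value equals its cell number so the results are easy to check. `getValuesBlock()` reads a rectangular window, which can help when only a small area of a huge file is needed.

```r
library(raster)

r <- raster(nrow = 200, ncol = 100)
values(r) <- 1:ncell(r)          # value = cell number

r[10000:10001]                   # by cell number
cellFromRowCol(r, 100, 100)      # row 100, column 100 is cell 10000 on this grid
r[100, 100]                      # the same cell, by row and column
# A 2-row by 3-column window starting at row 100, column 98.
getValuesBlock(r, row = 100, nrows = 2, col = 98, ncols = 3)
```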
However, the question you need answered, it seems, is how to make a map by matching integer cell values with RGB colors.
Let's set up an example raster
library(raster)
r <- raster(nrow=4, ncol=4)
values(r) <- rep(c(11, 14, 20, 30), each=4)
And some matching RGB values
legend <- read.csv(text="Value,Label,Red,Green,Blue
11,Post-flooding or irrigated croplands (or aquatic),170,240,240
14,Rainfed croplands,255,255,100
20,Mosaic cropland (50-70%) / vegetation (grassland/shrubland/forest) (20-50%),220,240,100
30,Mosaic vegetation (grassland/shrubland/forest) (50-70%) / cropland (20-50%) ,205,205,102")
Compute the color code:
legend$col <- rgb(legend$Red, legend$Green, legend$Blue, maxColorValue=255)
Set up a "color table":
# start with white for all possible values (0 to 255);
# element k+1 holds the color for cell value k
ct <- rep(rgb(1,1,1), 256)
# fill in where necessary
ct[legend$Value+1] <- legend$col
colortable(r) <- ct
Plot:
plot(r)
You can also try:
tb <- legend[, c('Value', 'Label')]
colnames(tb)[1] <- "ID"  # raster attribute tables need an "ID" column
tb$Label <- substr(tb$Label, 1,10)
levels(r) <- tb
library(rasterVis)
levelplot(r, col.regions=legend$col, at=0:length(legend$col))
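Once you have a pixel value (from `r[cell]` or `extract()`), turning it into a class label is a table lookup. A base-R sketch with `match()`, using the four legend rows above; the pixel values are hypothetical.

```r
# The four legend rows from above (labels only).
legend <- data.frame(
  Value = c(11, 14, 20, 30),
  Label = c("Post-flooding or irrigated croplands (or aquatic)",
            "Rainfed croplands",
            "Mosaic cropland (50-70%) / vegetation (grassland/shrubland/forest) (20-50%)",
            "Mosaic vegetation (grassland/shrubland/forest) (50-70%) / cropland (20-50%)"),
  stringsAsFactors = FALSE)

# Hypothetical pixel values, e.g. returned by r[cells] or extract().
px <- c(14, 30, 11, 99)

# match() finds each value's row in the legend; 99 has no row, so it gives NA.
lab <- legend$Label[match(px, legend$Value)]
lab
```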

Getting Data out of raster file in R

I'm new to raster files, but they seem to be the best way to open up the large government files that have all the weather data, so I'm trying to figure out how to use them. For reference, I'm downloading the files located here (just some run-of-the-mill weather data). When I use the raster package in R to import a file like this:
> r <- raster("/path/to/file.grb")
Everything works fine. I can even get a little metadata when I type in:
> r
class : RasterLayer
band : 1 (of 37 bands)
dimensions : 224, 464, 103936 (nrow, ncol, ncell)
resolution : 0.125, 0.125 (x, y)
extent : -125.0005, -67.0005, 25.0005, 53.0005 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +a=6371200 +b=6371200 +no_defs
data source : /path/to/file.grb
names : NLDAS_MOS0125_H.A20140629.0100.002
All I've managed to do at this point is index the raster in a very obvious way.
> r[100,100]
267.1
So, I guess I can "index" it, but I have no idea what the number 267.1 means. It's certainly not all there is in the cell. There should be a bunch of variables including, but not limited to, soil moisture, surface runoff, and evaporation.
How can I access this information in the same way using R?
library(raster)
# create two rasters
r1 <- raster(matrix(ncol = 10, nrow = 10, runif(100)))
r2 <- raster(matrix(ncol = 10, nrow = 10, runif(100)))
# create a raster stack -- the stack (or brick) function allows you
# to use multiband rasters
# http://www.inside-r.org/packages/cran/raster/docs/stack
st_r <- stack(r1, r2)
# extract values -- will create a matrix with 100 rows and two columns
vl <- getValues(st_r)
r <- raster("/path/to/file.grb")
values <- getValues(r)
You can read about the function here:
http://www.inside-r.org/packages/cran/raster/docs/values
I believe the problem is that you are using raster and not stack. The raster function returns a single layer (one band), whereas stack or brick read all of the layers. Here is an example that demonstrates extracting values with an [i, j] index and then selecting layers (z).
library(raster)
setwd("D:/TMP")
download.file("ftp://hydro1.sci.gsfc.nasa.gov/data/s4pa/NLDAS/NLDAS_MOS0125_H.002/2014/180/NLDAS_MOS0125_H.A20140629.0000.002.grb",
destfile="NLDAS_MOS0125_H.A20140629.0000.002.grb", mode="wb")
r <- stack("NLDAS_MOS0125_H.A20140629.0000.002.grb")
names(r) <- paste0("L", seq_len(nlayers(r)))
class(r)
# Values for [i,j]
i <- 100
j <- 100
r[i,j]
# Values for i,j and z at layer(s) 1, 5 and 10
z <- c(1,5,10)
r[i,j][z]
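A sketch of the same multi-layer indexing on a small synthetic stack, so it runs without the download. The three layers stand in for GRIB bands (made-up values: layer k holds k times the cell number). Which band holds which variable (soil moisture, runoff, and so on) has to be looked up in the dataset's own documentation; indexing only returns the numbers.

```r
library(raster)

# Three small layers standing in for GRIB bands (made-up data):
# layer k holds k * cell number, so every value is easy to predict.
s <- stack(lapply(1:3, function(k) setValues(raster(nrow = 5, ncol = 5), k * (1:25))))
names(s) <- paste0("L", seq_len(nlayers(s)))

v <- s[2, 3]                               # all layers at row 2, column 3 (cell 8)
s[[c(1, 3)]][2, 3]                         # only layers 1 and 3 at that cell
getValues(s)[cellFromRowCol(s, 2, 3), ]    # the same values via getValues()
```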

Raster overlay from a matrix

I have a matrix of 100 raster layers and I'd like to create one new layer that is the average. I understand if there were two layers I could simply use the overlay function or perhaps just use c <- mean (a, b). However, I'm not sure how to proceed with the matrix.
Here is a sample of the matrix:
[[1]]
class : RasterLayer
dimensions : 175, 179, 31325 (nrow, ncol, ncell)
resolution : 1, 1 (x, y)
extent : 0, 179, 0, 175 (xmin, xmax, ymin, ymax)
coord. ref. : NA
data source : in memory
names : layer
values : 0, 100 (min, max)
I have tried
a.avg <- mean (a.total[,])
and I receive the warning "argument is not numeric or logical: returning NA"
I assume you have a list of RasterLayers (or perhaps a stack). If you already have a stack, skip step 1. Otherwise I assume you have a list, not a matrix, which I have called mylistofrasters:
#1 - Get all rasters in the list into a stack
mystack <- do.call( stack , mylistofrasters )
#2 - Take mean of each pixel in the stack returning a single raster that is the average
mean.stack <- calc( mystack , mean , na.rm = TRUE )
This answer is similar to @SimonO101's answer but uses simpler code.
First, let's build a list of RasterLayer (you can skip this step if you already have the list):
library(raster)
r <- raster(nrow=10, ncol=10)
r <- init(r, runif)
lr <- lapply(1:8, function(i)r)
The raster package defines a stack method for lists, so you can use it directly without do.call:
s <- stack(lr)
Besides, there is a mean method for Raster* objects. Therefore, you don't really need calc:
mean(s, na.rm=TRUE)
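To check that the approaches agree, here is a sketch on five different random layers (the example above stacks eight copies of one layer, whose mean is just that layer). A few NA cells are added to show `na.rm = TRUE` at work, and `rowMeans()` on the cell-by-layer matrix serves as a base-R reference.

```r
library(raster)

set.seed(1)
# Five layers with different random values (made-up data).
lr <- lapply(1:5, function(i) setValues(raster(nrow = 10, ncol = 10), runif(100)))
lr[[1]][1:3] <- NA                   # a few NA cells in the first layer

s  <- stack(lr)                      # stack() accepts a list directly
m1 <- mean(s, na.rm = TRUE)          # Raster* method for mean()
m2 <- calc(s, mean, na.rm = TRUE)    # the same via calc()
m3 <- rowMeans(getValues(s), na.rm = TRUE)   # cell-by-cell reference in base R
```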
