I need to plot a countrywide raster. I tried cellsize=300 on a 64-bit system; memory.limit() reports 8148 (MB). When I run the code, it still gives me an "Error: cannot allocate vector of size 3.6 Gb". Sometimes Windows stops responding altogether, which is even worse.
Is there any other way I can deal with such a large dataset? By the way, I am more familiar with ArcGIS. Thanks!
x.min <- -6328997.74765339; x.max <- 2182662.25234661 # extent of easting coordinates
y.min <- 310413.438361092; y.max <- 5448183.43836109  # extent of northing coordinates
n <- 2351
center <- read.csv("J:\\...", header=TRUE) # (path truncated)
center <- as.matrix(center)                # XY coordinates
emp <- read.csv("J:\\...", header=TRUE)    # (path truncated)
n.rows <- 17126
cellsize <- 300
n.cols <- 28373
x.max <- x.min + n.cols * cellsize # ensures square cells are used
y.0 <- seq(y.max - cellsize/2, y.min + cellsize/2, length.out=n.rows)
x.0 <- seq(x.min + cellsize/2, x.max - cellsize/2, length.out=n.cols)
system.time({
  i <- order(emp, decreasing=TRUE)
  emp <- emp[i]
  center <- center[i, , drop=FALSE]
  owner <- matrix(0, n.rows, n.cols)
  gravity.max <- matrix(0, n.rows, n.cols)
  for (i in 1:n) {
    r <- emp[i] / outer((y.0 - center[i,2])^2, (x.0 - center[i,1])^2, "+")
    update <- which(r >= gravity.max)
    gravity.max[update] <- r[update]
    owner[update] <- i
  }
})
You might try the raster package. It can work with rasters that are too large to fit in memory, and it is used by many people who do spatial analyses in R.
You will save a lot of time if you first read the vignette "Introduction to the raster package" to get familiar with the package.
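For data of this size, the main trick is to keep the output on disk and process it in blocks, rather than allocating giant in-memory matrices as in your script. Below is a minimal sketch of that block-wise pattern (the zeros are only a placeholder where your per-chunk gravity calculation would go; the grid dimensions and extent are copied from your script):
library(raster)

# empty output raster on the question's grid; values are written to disk
out <- raster(nrows=17126, ncols=28373,
              xmn=-6328997.74765339, xmx=-6328997.74765339 + 28373*300,
              ymn=310413.438361092, ymx=5448183.43836109)
bs  <- blockSize(out)                  # row chunks chosen by the package
out <- writeStart(out, filename=tempfile(fileext=".grd"), overwrite=TRUE)
for (i in 1:bs$n) {
  # compute values for rows bs$row[i] .. bs$row[i] + bs$nrows[i] - 1 here
  v <- rep(0, bs$nrows[i] * ncol(out)) # placeholder for the real computation
  out <- writeValues(out, v, bs$row[i])
}
out <- writeStop(out)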
Related
I have a raster brick (ncell=28536 and nlayers=181). I need to run mathematical functions on the original brick and create two more bricks of the same size, where the two output bricks depend on each other.
inputBrick has 181 layers and 28536 cells per layer. outputBrick1 calculates the values of its 1st layer by analyzing outputBrick2's 1st layer. Then outputBrick2 calculates the values of its 2nd layer by analyzing outputBrick1's 1st layer, and so on.
I created a function that works fine with 24 cells and 181 layers, but it takes forever for 28000 cells and 181 layers. I know I shouldn't be using for loops for this, but as I'm not a programmer I'm struggling.
Here is some example data for a much smaller dataset. There are three RasterBricks; the input has values while both outputs are empty:
library(raster)
b <- brick(ncols=5, nrows=5, nl=5)
inBrick <- setValues(b, runif(ncell(b) * nlayers(b)))
inBrick[c(1,2,3,22,23,24,25)] <- NA
outBrick1 <- inBrick
outBrick1[] <- NA
outBrick2 <- outBrick1
ini <- 0.3
p <- 0.15
p1 <- p/3
p2 <- p-(p/3)
fc <- 0.3
var1 <- which(!is.na(inBrick[[1]][]))
outBrick2[[1]][var1] <- ini
### now outBrick2 has initial values in 1st layer
weather <- c(0.1, 0, 0, 0, 0.3)
These are the calculations I want to do; I have no idea how to do them efficiently without for loops:
var3 <- 1:ncell(inBrick)
### outBrick1 calculations
for (i in 1:nlayers(inBrick)) {
  varr1 <- inBrick[[i]][] * (((outBrick2[[i]][] - p1) / p2)^2)
  for (j in 1:ncell(inBrick)) {
    if (!is.na(outBrick2[[i]][j])) {
      if (outBrick2[[i]][j] > p) {
        outBrick1[[i]][j] <- inBrick[[i]][j]
      } else {
        outBrick1[[i]][j] <- varr1[j]
      }
    }
  }
  ### outBrick2 calculations
  for (k in 2:nlayers(inBrick)) {
    var2 <- outBrick2[[k-1]][] + (weather[k-1] - outBrick1[[k-1]][]) / 100
    for (l in 1:ncell(inBrick)) {
      var3[l] <- min(fc, var2[l])
    }
    outBrick2[[k]][] <- var3
  }
}
Now I basically want to understand the best approach for dealing with situations like this. I also tried increasing the memory available with the following commands:
rasterOptions(maxmemory = 5.17e+10)
rasterOptions(memfrac = 0.8)
rasterOptions(chunksize = 5.17e+10)
but when I look at CPU and RAM usage, it is barely 6% and 10% respectively; R uses only 5% CPU and 1 GB RAM. My system has 64 GB RAM and a 16 GB GPU.
Here is an attempt. This is much more concise. It is only three times faster on this example, but the gain may be larger on your real data.
library(terra)
b <- r1 <- r2 <- rast(ncols=5, nrows=5, nlyrs=5, vals=NA)
set.seed(0)
values(b) <- runif(ncell(b) * nlyr(b))
b[c(1,2,3,22,23,24,25)] <- NA
p <- 0.15
p1 <- p/3
p2 <- p - (p/3)
fc <- 0.3
weather <- c(0.1, 0, 0, 0, 0.3)
r2[[1]] <- ifel(is.na(b[[1]]), NA, 0.3)
for (i in 1:nlyr(b)) {
  varr1 <- b[[i]] * (((r2[[i]] - p1) / p2)^2)
  r1[[i]] <- ifel(r2[[i]] > p, b[[i]], varr1)
  for (k in 2:nlyr(b)) {
    r2[[k]] <- min(r2[[k-1]] + (weather[k-1] - r1[[k-1]]) / 100, fc)
  }
}
If speed is the issue, and the raster data can be loaded into RAM, this may be much faster:
db  <- values(b)   # one matrix per raster; columns correspond to layers
dr1 <- values(r1)
dr2 <- values(r2)
for (i in 1:ncol(db)) {
  varr1 <- db[, i] * (((dr2[, i] - p1) / p2)^2)
  dr1[, i] <- ifelse(dr2[, i] > p, db[, i], varr1)
  for (k in 2:ncol(db)) {
    dr2[, k] <- pmin(dr2[, k-1] + (weather[k-1] - dr1[, k-1]) / 100, fc)
  }
}
values(r1) <- dr1
values(r2) <- dr2
I want to predict vegetation health using two remote-sensing vegetation indices (VIs) for multiple tree stands across multiple months. I previously approached this with a for() loop that iterates through a list of multi-band rasters and calculates the two VIs for each raster (month) using a given equation, and then used raster::extract() to pull out the pixels corresponding to each stand. However, I now would like to include some additional variables in my predictions of vegetation health, and am having trouble integrating them with the same method because they are simply columns in a dataframe, not rasters. I'm open to different ways to do this; I just can't think of any.
example:
#Part 1: Loading libraries and creating some sample data
library(sf)
library(raster)
library(terra)
#polygons to generate random points into
v <- vect(system.file("ex/lux.shp", package="terra"))
v <- v[c(1:12)]
v_sf <- st_as_sf(v) # Convert 'SpatVector' into 'sf' object
#5 rasters (months) with 5 bands each
r <- rast(system.file("ex/elev.tif", package="terra"))
r <- rep(r, 5) * 1:5
names(r) <- paste0("band", 1:5)
ras_list <- list(r,r,r,r,r)
#generating some points (10 forest stands)
pnts <- st_sample(v_sf, size = 10, type = "random")
pnts<- as_Spatial(pnts)
#Part 2: Loop to predict vegetation health using two VI variables
vis <- list() # empty list to store the VI rasters
for (i in seq_along(ras_list)) {
  b <- ras_list[[i]]
  # vegetation health = 1.23 + (0.45 * VI1) - (0.67 * VI2)
  vis[[i]] <- 1.23 + 0.45*((b[[4]] + b[[3]] - b[[1]]) / (b[[4]] + b[[3]])) - 0.67*(b[[1]] * b[[3]] - b[[4]])
}
#Part 3: Loop to extract pixel values for each forest stand
vi_vals <- list() # empty list to store extracted pixel values
for (i in seq_along(vis)) {
  n <- raster(vis[[i]])
  vi_vals[[i]] <- raster::extract(n, pnts, method = "bilinear")
}
This method works fine, but as I mentioned above, I now need to repeat the same process using a new equation that incorporates variables that can't be calculated from a raster. These values are simply 3 columns in a dataframe, identified by a stand ID.
Let's first simplify your example a bit
Example data
library(terra)
v <- vect(system.file("ex/lux.shp", package="terra"))
r <- rast(system.file("ex/elev.tif", package="terra"))
r <- rep(r, 5) * 1:5
names(r) <- paste0("b", 1:5)
ras_list <- list(r,r,r,r,r)
set.seed(1)
pnts <- spatSample(v, 10, "random")
values(pnts) <- data.frame(id=10, a=5:14, b=3:12, d=6:15)
Compute VI and extract
vis <- list()
for (i in seq_along(ras_list)) {
  b <- ras_list[[i]]
  vis[[i]] <- 1.23 + 0.45*((b[[4]] + b[[3]] - b[[1]]) / (b[[4]] + b[[3]])) - 0.67*(b[[1]] * b[[3]] - b[[4]])
}
vis <- rast(vis)
names(vis) <- paste0("set", 1:5)
vi_vals <- extract(vis, pnts, method = "bilinear")
And now you can do something with the tree parameters:
out <- as.matrix(vi_vals[,-1]) * pnts$a + pnts$b / pnts$d # each row (one point) uses that point's a, b and d
It would be more efficient to first extract the values and then apply the function
e <- list()
for (i in seq_along(ras_list)) {
  x <- extract(ras_list[[i]], pnts, method="bilinear")[,-1]
  e[[i]] <- (1.23 + 0.45*((x$b4 + x$b3 - x$b1) / (x$b4 + x$b3)) - 0.67*(x$b1 * x$b3 - x$b4)) * pnts$a + pnts$b / pnts$d
}
e <- do.call(cbind, e)
The results are not exactly the same; I assume this is because of floating-point precision differences between the two methods.
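If you want to see how big the discrepancy actually is, here is a quick check (assuming out and e from the two approaches above are both in the workspace):
max(abs(out - e), na.rm=TRUE) # largest absolute difference between the two result matrices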
I have converted a raster to a point matrix in R. The file has 3 columns: x (lon), y (lat), and v (pixel value). I am now looking to delete every second column by x and every second row by y, as shown in the upper left corner of the image, but am at a loss how to do this. The idea is to thin the data without any interpolation or resampling.
Sample data as shown can be accessed here: https://drive.google.com/file/d/1XGEPsPEyrVNLEcZy-C6ES5915kWIaqGz/view?usp=sharing
When asking an R question, please always include a minimal, reproducible, self-contained example; that is, show some code and do not rely on files that must be downloaded.
As you started out with raster data, it is probably easiest to manipulate the raster data before creating points.
With the raster package:
Example data
library(raster)
r <- raster(nrow=20, ncol=20, xmn=0, xmx=1, ymn=0, ymx=1, crs="+proj=utm +zone=1 +datum=WGS84")
values(r) <- 1:ncell(r)
p <- rasterToPoints(r)
plot(r)
points(p, cex=.5)
Solution
i <- seq(1, nrow(r), 2)
j <- seq(1, ncol(r), 2)
r[i,] <- NA
r[, j] <- NA
pp <- rasterToPoints(r)
points(pp, pch=20, cex=2)
Or with the terra package:
library(terra)
r <- rast(nrow=20, ncol=20, xmin=0, xmax=1, ymin=0, ymax=1, crs="+proj=utm +zone=1 +datum=WGS84")
values(r) <- 1:ncell(r)
p <- as.points(r)
plot(r)
points(p, cex=.5)
i <- seq(1, nrow(r), 2)
j <- seq(1, ncol(r), 2)
r[i,] <- NA
r[, j] <- NA
pp <- as.points(r)
points(pp, pch=20, cex=2)
Does this work? It is hard to know what to manipulate without a reproducible example and desired output, but this should remove the even rows and columns from your matrix.
library(dplyr)
matrix(1:100, nrow = 10) %>%
  as.data.frame() %>%
  filter(row_number() %% 2 != 0) %>%
  select(seq(1, ncol(.), 2)) %>%
  as.matrix()
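If you prefer base R, the same thinning can be done with plain matrix indexing; a small sketch on made-up data:
m <- matrix(1:100, nrow = 10)
m[seq(1, nrow(m), 2), seq(1, ncol(m), 2)] # keep only the odd rows and columns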
## input rasters
s <- stack(list.files("~/dailyraster", full.names=TRUE)) # daily raster stack
r_start <- raster("~/stackSumSTART.asc") # this raster contains the starting Julian day
r_end <- raster("~/stackSumEND.asc")     # this raster contains the ending Julian day
noNAcells <- which(!is.na(r_start[]))    # cell numbers that contain values
## dummy raster
x <- r_start
x[] <- NA
## loop
for (i in noNAcells) {
  x[i] <- sum(s[[r_start[i]:r_end[i]]][i])
}
I would like to create a function like stackApply(), but one that works on a per-cell basis.
Above is a for() loop version; it works well, but it takes too much time.
The point is that each cell gets its own range for sum(), taken from the two raster layers r_start and r_end in the script above.
Now I am struggling to rewrite this code with the apply() family.
Is there any way to improve the speed of the for() loop? Or please give me some tips for writing this with apply().
Any comments will help me. Thank you.
Your approach
x <- s$layer.1
system.time(
  for (i in 1:ncell(x)) {
    x[i] <- sum(s[[r_start[i]:r_end[i]]][i], na.rm = TRUE)
  }
)
   user  system elapsed
  0.708   0.000   0.710
My proposal
You can add the rasters used as indices at the end of your stack and then use calc to highly speed up the process (~30-50x).
s2 <- stack(s, r_start, r_end)
# layers 6 and 7 of s2 now hold each cell's start and end layer indices
sum_time <- function(x) { sum(x[x[6]:x[7]], na.rm = TRUE) }
system.time(
  output <- calc(s2, fun = sum_time)
)
   user  system elapsed
  0.016   0.000   0.015
all.equal(x, output)
[1] TRUE
Sample Data
library(raster)
# Generate rasters of random values
r1 <- r2 <- r3 <- r4 <- r5 <- r_start <- r_end <- raster(ncol=10, nrow=10)
r1[] <- rnorm(ncell(r1), 1, 0.2)
r2[] <- rnorm(ncell(r2), 1, 0.2)
r3[] <- rnorm(ncell(r3), 1, 0.2)
r4[] <- rnorm(ncell(r4), 1, 0.2)
r5[] <- rnorm(ncell(r5), 1, 0.2)
s <- stack(r1,r2,r3,r4,r5)
r_start[] <- sample(1:2, ncell(r_start), replace = TRUE)
r_end[] <- sample(3:5, ncell(r_end), replace = TRUE)
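If you can use the terra package (the successor of raster), its rapp() ("range apply") function is meant for exactly this kind of per-cell layer range; a sketch under that assumption, reusing the sample data above:
library(terra)
s2    <- rast(s)       # RasterStack -> SpatRaster
first <- rast(r_start) # per-cell index of the first layer to include
last  <- rast(r_end)   # per-cell index of the last layer to include
output2 <- rapp(s2, first, last, fun = sum, na.rm = TRUE)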
I have two raster layers that I wish to combine into one. Let's call them mask (with values 1 and NA) and vrs.
library(raster)
mask <- raster(ncol=10, nrow=10)
mask[] <- c(rep(0, 50), rep(1, 50))
mask[mask < 0.5] <- NA
vrs <- raster(ncol=10, nrow=10)
vrs[] <- rpois(100, 2)
vrs[vrs >= 4] <- NA
I wish to combine two big layers, but for the sake of understanding, these small examples are fine. What I wish to do is set the pixel values of my output layer to zero where the mask layer is 1 and the vrs layer is NA. All other pixels should keep the values of the original vrs.
This is my only thought as to how:
zero.for.NA <- function(x, y, filename) {
  out <- raster(y)
  if (canProcessInMemory(out, n = 4)) { # wild guess..
    val <- getValues(y)                    # values
    NA.pos <- which(is.na(val))            # positions of all NA values in the values layer
    NA.t.noll.pos <- which(x[NA.pos] == 1) # positions where mask is 1 within the
                                           # vector of positions of NA values in vrs
    val[NA.pos[NA.t.noll.pos]] <- 0        # set values layer to 0 where the condition is met
    out <- setValues(out, val)
    return(out)
  } else { # for large rasters, do the same thing in chunks
    bs <- blockSize(out)
    out <- writeStart(out, filename, overwrite=TRUE)
    for (i in 1:bs$n) {
      v <- getValues(y, row=bs$row[i], nrows=bs$nrows[i])
      xv <- getValues(x, row=bs$row[i], nrows=bs$nrows[i])
      NA.pos <- which(is.na(v))
      NA.t.noll.pos <- which(xv[NA.pos] == 1)
      v[NA.pos[NA.t.noll.pos]] <- 0
      out <- writeValues(out, v, bs$row[i])
    }
    out <- writeStop(out)
    return(out)
  }
}
This function did work on the small example and seems to work on the bigger ones. Is there a faster/better way of doing this, some way that scales better to larger files? I will have to use this on many sets of layers, and I would appreciate any help in making the process safer and/or quicker!
I'd use cover(). Subtracting 1 turns the mask's 1s into 0s while leaving its NA cells as NA, and cover() then fills the NA cells of vrs from that layer:
r <- cover(vrs, mask-1)
plot(r)
You can do this with overlay, as well:
r <- overlay(mask, vrs, fun=function(x, y) ifelse(x==1 & is.na(y), 0, y))
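If you work with terra instead of raster, the cover() trick should carry over directly; a sketch, assuming the example layers created above:
library(terra)
r <- cover(rast(vrs), rast(mask) - 1) # fill NA cells of vrs with mask-1 (i.e. 0 where mask is 1)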