Looping dataframe values into a raster through time - r

I have a raster N showing the overall distribution of a species.
The raster cells have a value of 1 where the species is present, and a value of 0 otherwise.
I also have a data frame DF showing the relative biomass of this same species over time:
Biomass <- c(0.9, 1.2, 1.3)
Year <- c(1975, 1976, 1977)
DF <- data.frame(Biomass, Year)
I would like to create (and save) a new raster for each year of my time series through a loop, where all the raster cells originally equal to 1 (N[N == 1]) are replaced by the biomass value found in DF for that specific year.
For example, all the cells originally equaling 1 would be replaced by 0.9 and the raster would be saved as N-1975.
The idea would be to create a loop, but I cannot find anything on looping values of a dataframe into a raster.
I would like to end up with one raster per year "N-1975", "N-1976"...
Thank you!

What spatial information are you working with? If you have a simple x-y coordinate system you can use rasterFromXYZ(df) to get a raster layer for each column of your data frame, as long as the first two columns are the x and y coordinates respectively. If you're using some other projection, you can specify it in the function (https://rdrr.io/cran/raster/man/rasterFromXYZ.html):
library(raster)

# make some random data
x <- c(1, 4, 3, 2, 4)
y <- c(4, 3, 1, 1, 4)
# best to avoid purely numeric column names
X1975 <- rnorm(5, 4, 1)
X1976 <- rnorm(5, 5, 1)
# make df
df <- cbind(x, y, X1975, X1976)
# make raster
biomass_raster <- rasterFromXYZ(df)
biomass_raster
#returns
class : RasterBrick
dimensions : 4, 4, 16, 2 (nrow, ncol, ncell, nlayers)
resolution : 1, 1 (x, y)
extent : 0.5, 4.5, 0.5, 4.5 (xmin, xmax, ymin, ymax)
crs : NA
source : memory
names : X1975, X1976
min values : 1.290337, 4.523350
max values : 4.413451, 6.512719
# plot all layers: plot(biomass_raster)
# access a specific layer by calling biomass_raster$X1975
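If the end goal is still one file per year, a short follow-up sketch (assuming the raster package's writeRaster with bylayer = TRUE, which writes each layer to its own file suffixed with the layer name):
# writes one .asc file per layer, suffixed with the layer names X1975 and X1976
writeRaster(biomass_raster, filename = "N", bylayer = TRUE,
            suffix = "names", format = "ascii", overwrite = TRUE)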

I ended up finding how to solve this issue, so I will post it here in case anybody runs into the same problem :)
N_loop <- N
years <- 1975:2020

for (i in seq_along(years)) {
  N_loop[N == 1] <- DF$Biomass[i]
  writeRaster(N_loop, paste0("N", years[i], ".asc"), overwrite = TRUE)
}
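As a small variation (just a sketch, using the example DF built above), the loop can also be driven by the rows of DF, so the biomass value and the year in the file name always come from the same row and the output matches the "N-1975" naming from the question:
for (i in seq_len(nrow(DF))) {
  N_year <- N
  N_year[N == 1] <- DF$Biomass[i]   # replace presence cells with that year's biomass
  writeRaster(N_year, paste0("N-", DF$Year[i], ".asc"), overwrite = TRUE)
}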

Related

Using raster predict with SIMCA model error: incorrect number of subscripts on matrix

I would like to generate a raster with categorical (factor) predictions for each pixel in a hyperspectral image. I use rasters to represent the image on an x y coordinate system. The model I have built is a SIMCA model using training data from 4 known classes (roughly 4000 spectra) using 230 wavelengths in the short-wave infrared.
There are no pixel coordinates for the training data because the data has come from an assortment of images, the location is irrelevant, only the correct class label is used to group the data into 4 unique classes.
SIMCA_class1 <- simca(
  TrainX_class1,
  "Class_1",
  ncomp = min(nrow(TrainX_class1) - 1, ncol(TrainX_class1) - 1, 5),
  x.test = TestX_SIMCA,
  c.test = TestY_SIMCA,
  cv = list("rand", 5, 5)
)
# repeat for classes 2, 3 and 4
# collect all models into a classifier called "model"
model <- simcam(list(SIMCA_class1, SIMCA_class2, SIMCA_class3, SIMCA_class4))
Training data is of class data.frame and has wavelengths as columns (230 total) with colnames in the format "X1078", 1078 being the wavelength in nm. Individual spectra are rows.
I'd like to import rasters and classify each pixel (spectrum) using the SIMCA model, with the output being a rasterStack with layers corresponding to the four classes. I'm using raster::predict as follows and getting an error:
#raster format
class : RasterBrick
dimensions : 362, 319, 115478, 230 (nrow, ncol, ncell, nlayers)
resolution : 1, 1 (x, y)
extent : -0.5, 318.5, -0.5, 361.5 (xmin, xmax, ymin, ymax)
crs : NA
source : memory
names : X1076, X1082, X1088, X1094, X1100, X1106, X1112, X1118, X1124, X1130, X1136, X1142, X1148, X1154, X1160, ...
min values : -0.3505875, -0.3798712, -0.3721167, -0.3709650, -0.3550088, -0.3486861, -0.3705409, -0.4246334, -0.4115991, -0.4514934, -0.5019244, -0.5359062, -0.7215977, -0.9135590, -1.0096027, ...
max values : 0.5234201, 0.5179902, 0.4630154, 0.4388970, 0.5348250, 0.7429393, 0.8700208, 0.9548087, 0.8703760, 0.6833284, 0.3929231, 0.3720537, 0.3276540, 0.3088802, 0.3346077, ...
res <- raster::predict(new_raster, model, fun = predict)
# Error in v[cells, ] <- predv : incorrect number of subscripts on matrix
I've made sure the dimensions and names of the columns match the names of the raster layers; the only difference is that the image I'm classifying has 115478 spectra, but that shouldn't matter, should it?
So: what does this error mean here?
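One thing worth checking (not a confirmed fix, just a sketch): raster::predict expects the function passed as fun to return a plain numeric vector or matrix with one row per pixel, whereas mdatools' predict method returns a result object, which can trip up the assignment shown in the error. A hypothetical wrapper, assuming the simcam prediction stores its class calls in a c.pred array (verify the exact structure with str() on a prediction made on a few spectra), could look like:
# Hypothetical wrapper: coerce the SIMCA prediction into a pixels x classes matrix
predfun <- function(model, data, ...) {
  p <- predict(model, as.matrix(data))   # mdatools prediction result object
  matrix(p$c.pred, nrow = nrow(data))    # assumed layout: one column per class
}
res <- raster::predict(new_raster, model, fun = predfun, index = 1:4)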

How can I subset a raster by conditional statement in R using `terra`?

I am trying to plot only certain values from a categorical land cover raster I am working with. I have loaded it in to R using the terra package and it plots fine. However, since the original data did not come with a legend, I am trying to find out which raster value corresponds to what on the map.
Similar to the answer provided here: How to subset a raster based on grid cell values
I have tried using the following line:
> landcover
class : SpatRaster
dimensions : 20057, 63988, 1 (nrow, ncol, nlyr)
resolution : 0.0005253954, 0.0005253954 (x, y)
extent : -135.619, -102, 59.99989, 70.53775 (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326)
source : spat_n5WpgzBuVAV3Ijm.tif
name : CAN_LC_2015_CAL_wgs
min value : 1
max value : 18
> plot(landcover[landcover == 18])
Error: cannot allocate vector of size 9.6 Gb
However, this line takes a very long time to run and produces a vector memory error. The object is 1.3 kb in the global environment and the original tif is about 300 mb.
You can use cats to find out which values correspond to which categories.
library(terra)
set.seed(0)
r <- rast(nrows=10, ncols=10)
values(r) <- sample(3, ncell(r), replace=TRUE) - 1
cls <- c("forest", "water", "urban")
levels(r) <- cls
names(r) <- "land cover"
cats(r)[[1]]
# ID category
#1 0 forest
#2 1 water
#3 2 urban
To plot a logical (Boolean) layer for one category, you can do
plot(r == "water")
And from the above you can see that in this case that is equivalent to
plot(r == 1)
I think I found the solution: write the conditional within the plot function, as below:
plot(landcover == 18)
For those looking for a reproducible example, just load the R logo:
s <- rast(system.file("ex/logo.tif", package="terra"))
s <- s$red
plot(s == 255)
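If the comparison on the full-size landcover raster still runs out of memory, a possible workaround (a sketch, assuming terra's ifel helper) is to let terra write the Boolean result to disk instead of holding it in RAM:
# write the class-18 mask to a file rather than keeping it in memory
mask18 <- ifel(landcover == 18, 1, NA, filename = "class18.tif")
plot(mask18)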

time and geographical subset of netcdf raster stack or raster brick using R

For the following netcdf file with daily global sea surface temperatures for 2016, I'm trying to (i) subset temporally, (ii) subset geographically, (iii) then take long-term means for each pixel and create a basic plot.
Link to file: here
library(raster)
library(ncdf4)
Open the netcdf after setting my working directory:
nc_data <- nc_open('sst.day.mean.2016.v2.nc')
Change the time variable so it's easy to interpret:
time <- ncdf4::ncvar_get(nc_data, varid="time")
head(time)
Change to dates that I can interpret:
time_d <- as.Date(time, format="%j", origin=as.Date("1800-01-01"))
Now I'd like to subset only September 1 to October 15, but can't figure that out...
Following the temporal subset, create a raster brick (or stack) and then subset geographically:
b <- brick('sst.day.mean.2016.v2.nc') # I would change this name to my file with the time subset
b <- crop(b, extent(144, 146, 14, 16))
Finally, I'd like to take the average for each pixel across all my days of data, assign this to a single raster, and make a simple plot...
Thanks for any help and guidance.
After b <- brick('sst.day.mean.2016.v2.nc'), we can type b to see information about the raster brick.
b
# class : RasterBrick
# dimensions : 720, 1440, 1036800, 366 (nrow, ncol, ncell, nlayers)
# resolution : 0.25, 0.25 (x, y)
# extent : 0, 360, -90, 90 (xmin, xmax, ymin, ymax)
# coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
# data source : C:\Users\basaw\Downloads\sst.day.mean.2016.v2.nc
# names : X2016.01.01, X2016.01.02, X2016.01.03, X2016.01.04, X2016.01.05, X2016.01.06, X2016.01.07, X2016.01.08, X2016.01.09, X2016.01.10, X2016.01.11, X2016.01.12, X2016.01.13, X2016.01.14, X2016.01.15, ...
# Date : 2016-01-01, 2016-12-31 (min, max)
# varname : sst
Notice that the Date slot holds information from 2016-01-01 to 2016-12-31, which means the Z values already contain date information, and we can use that to subset the raster brick.
We can use the getZ function to access the values stored in the Z slot. Typing getZ(b), we see a series of dates.
head(getZ(b))
# [1] "2016-01-01" "2016-01-02" "2016-01-03" "2016-01-04" "2016-01-05" "2016-01-06"
class(getZ(b))
# [1] "Date"
We can thus use the following code to subset the raster brick.
b2 <- b[[which(getZ(b) >= as.Date("2016-09-01") & getZ(b) <= as.Date("2016-10-15"))]]
We can then crop the image based on the code you provided.
b3 <- crop(b2, extent(144, 146, 14, 16))
To calculate the average, just use the mean function.
b4 <- mean(b3, na.rm = TRUE)
Finally, we can plot the average.
plot(b4)
The subsetting and averaging task is easy to do in CDO:
cdo timmean -sellonlatbox,lon1,lon2,lat1,lat2 -seldate,date1,date2 in.nc out.nc
where the lon1,lon2 etc define the lon-lat area to cut out and date1,date2 are the date bounds.
You can call this command directly from R using the climate operators package as per this question.
So, for example, without the piping, the three steps in R would be:
cdo("seldate,date1,date2", in.fname, out1.fname, debug = TRUE)
cdo("sellonlatbox,lon1,lon2,lat1,lat2", out1.fname, out2.fname, debug = TRUE)
cdo("timmean", out2.fname, out.fname, debug = TRUE)

How to properly crop() raster data extent in R

I'm trying to crop some raster data and do some calculations (getting the mean sea surface temperature, specifically).
However, cropping the extent of the raster data before doing the calculations gives me the same result as doing the calculations first and cropping the result afterwards.
The original extent of the raster data is -180, 180, -90, 90 (xmin, xmax, ymin, ymax), and I need to crop it to any desired region defined by latitude and longitude coordinates.
This is the script I'm doing tests with:
library(raster) # Crop raster data
library(stringr)
# hadsstR functions ----------------------------------------
load_hadsst <- function(file = "./HadISST_sst.nc") {
  b <- brick(file)
  NAvalue(b) <- -32768 # Land
  return(b)
}

# Transform basin coordinates into numbers
morph_coords <- function(coords) {
  coords[1] <- ifelse(str_extract(coords[1], "[A-Z]") == "W",
                      -as.numeric(str_extract(coords[1], "[^A-Z]+")),
                      as.numeric(str_extract(coords[1], "[^A-Z]+")))
  coords[2] <- ifelse(str_extract(coords[2], "[A-Z]") == "W",
                      -as.numeric(str_extract(coords[2], "[^A-Z]+")),
                      as.numeric(str_extract(coords[2], "[^A-Z]+")))
  coords[3] <- ifelse(str_extract(coords[3], "[A-Z]") == "S",
                      -as.numeric(str_extract(coords[3], "[^A-Z]+")),
                      as.numeric(str_extract(coords[3], "[^A-Z]+")))
  coords[4] <- ifelse(str_extract(coords[4], "[A-Z]") == "S",
                      -as.numeric(str_extract(coords[4], "[^A-Z]+")),
                      as.numeric(str_extract(coords[4], "[^A-Z]+")))
  return(coords)
}
# Comparison test ------------------------------------------
hadsst.raster <- load_hadsst(file = "~/Hadley/HadISST_sst.nc")
x <- hadsst.raster
nms <- names(x)
months <- c("01","02","03","04","05","06","07","08","09","10","11","12")
coords <- c("85E", "90E", "5N", "10N")
coords <- morph_coords(coords)
years = 1970:1974
range = 5:12
# Crop before calculating mean
x <- crop(x, extent(as.numeric(coords[1]), as.numeric(coords[2]),
                    as.numeric(coords[3]), as.numeric(coords[4])))
xMeans <- vector(length = length(years) - 1, mode = 'list')
for (ix in seq_along(years[1:length(years)])) {
  xMeans[[ix]] <- mean(x[[c(sapply(range, function(x) grep(paste0(years[ix], '.', months[x]), nms)))]], na.rm = TRUE)
}
mean.brick1 <- do.call(brick, xMeans)

# Calculate mean before cropping
x <- hadsst.raster
xMeans <- vector(length = length(years) - 1, mode = 'list')
for (ix in seq_along(years[1:length(years)])) {
  xMeans[[ix]] <- mean(x[[c(sapply(range, function(x) grep(paste0(years[ix], '.', months[x]), nms)))]], na.rm = TRUE)
}
mean.brick2 <- do.call(brick, xMeans)
mean.brick2 <- crop(mean.brick2, extent(as.numeric(coords[1]), as.numeric(coords[2]),
                                        as.numeric(coords[3]), as.numeric(coords[4])))
# Compare the two rasters
mean.brick1 - mean.brick2
This is the output of mean.brick1 - mean.brick2:
class : RasterBrick
dimensions : 5, 5, 25, 5 (nrow, ncol, ncell, nlayers)
resolution : 1, 1 (x, y)
extent : 85, 90, 5, 10 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84
data source : in memory
names : layer.1, layer.2, layer.3, layer.4, layer.5
min values : 0, 0, 0, 0, 0
max values : 0, 0, 0, 0, 0
As you can see, both RasterBricks are exactly the same, which I thought should be impossible for an arbitrary choice of coordinates.
Is there something I'm doing wrong? Cropping the data before doing calculations with them should unequivocally give me different results.
Ok, I'll continue from my post in your previous question:
We start out with the full hadsst.raster brick (which, for a reproducible example, can be faked with the first part of my solution in my previous answer).
So this dataset has the dimensions 180, 360, 516, meaning 180 rows, 360 columns and 516 temporal layers.
Technically, each raster layer being a matrix, you can picture this brick as 516 matrix layers stacked on top of each other, with the pixels of every layer exactly aligned.
So if we do temporal averaging, we basically extract all the values for a single pixel across all layers and take the mean of them (or apply any other averaging operation).
This also shows why cropping does not influence the temporal averaging:
if we take the cropped extent as our area of interest and perform the cropping operation before the averaging, we simply discard all the values around that area. After that we again take all the values for each remaining pixel over all layers and perform our average.
This should make clear why it doesn't matter when you discard the pixels around the area of interest: you could just as well calculate the average for them too and discard those values afterwards, leaving you with just the values inside your area. It simply doesn't make sense to do so if you are already sure you won't need them for further calculations.
Either way, the values inside the area of interest won't be affected.
When we talk about spatial averaging, it generally means averaging over pixels within a single layer, in this case over the values inside the area of interest.
Two common operations for that are
focal averaging (also known as neighbourhood averaging)
aggregation
Focal averaging takes, for each pixel, the average of all values within a defined neighbourhood of adjacent pixels (most commonly a 3x3 square, where the pixel being assigned is the central one).
Aggregation literally takes a number of pixels and combines them into one bigger pixel. This means not only that the values of these pixels are averaged, but also that the resulting raster has fewer individual pixels and a coarser resolution.
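For reference, minimal sketches of both operations with the raster package (self-contained toy raster; the window size and aggregation factor are just examples):
library(raster)
r <- raster(matrix(runif(100), nrow = 10, ncol = 10))
# focal (neighbourhood) mean: each cell becomes the mean of its 3x3 window
r_focal <- focal(r, w = matrix(1, 3, 3), fun = mean)
# aggregation: merge 2x2 blocks of cells into one coarser cell
r_agg <- aggregate(r, fact = 2, fun = mean)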
Alright, coming to the actual solution for you:
I assume you have an area of interest defined by an extent aoi:
aoi <- extent(xmin,xmax,ymin,ymax)
The first thing you would do is crop the initial brick to reduce the computational burden:
hadsst.raster_crp <- crop(hadsst.raster,aoi)
The next step is the temporal averaging, where we use the function I've defined in the solution from my other post:
hadsst.raster_crp_avg <- hadSSTmean(hadsst.raster_crp, 1969:2011, first.range = 11:12, second.range = 1:4)
Alright, now you have your temporal averages just for your region of interest. The next step depends on what your ultimate goal is.
As far as I understood, you just need a single average per temporal average for your region of interest.
If that is the case, it might be the right time to leave the actual raster domain and continue with base R:
res <- lapply(1:nlayers(hadsst.raster_crp_avg),function(ix) mean(as.matrix(hadsst.raster_crp_avg[[ix]])))
This will give you a list with as many elements as your brick hadsst.raster_crp_avg has.
Using lapply, we iterate through the layers, converting each layer into a matrix and then calculating the mean over all elements leaving us with a single value per averaged-timestep for the entire area of interest.
Going further, you can use unlist to convert it to a vector and then add it to a data.frame, or perform any other operation you like.
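For example (a sketch; the label column is just illustrative):
# collect the per-period spatial means into a small data.frame
res_df <- data.frame(layer = names(hadsst.raster_crp_avg), mean_sst = unlist(res))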
Hopefully that was clear and this is what you were looking for.
Best

Getting Data out of raster file in R

I'm new to raster files, but they seem to be the best way to open up the large gov't files that have all the weather data, so I'm trying to figure out how to use them. For reference, I'm downloading the files located here (just some run of the mill weather stuff). When I use the raster package of R to import the file like this
> r <- raster("/path/to/file.grb")
Everything works fine. I can even get a little metadata when I type in
> r
class : RasterLayer
band : 1 (of 37 bands)
dimensions : 224, 464, 103936 (nrow, ncol, ncell)
resolution : 0.125, 0.125 (x, y)
extent : -125.0005, -67.0005, 25.0005, 53.0005 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +a=6371200 +b=6371200 +no_defs
data source : /path/to/file.grb
names : NLDAS_MOS0125_H.A20140629.0100.002
All I've managed to do at this point is index the raster in a very obvious way.
> r[100,100]
267.1
So, I guess I can "index" it, but I have no idea what the number 267.1 means. It's certainly not all there is in the cell. There should be a bunch of variables including, but not limited to, soil moisture, surface runoff, and evaporation.
How can I access this information in the same way using R?
# create two rasters
r1 <- raster(matrix(ncol = 10, nrow = 10, runif(100)))
r2 <- raster(matrix(ncol = 10, nrow = 10, runif(100)))
# creates a raster stack -- the stack (or brick) function allows you
# to use multilayer band rasters
# http://www.inside-r.org/packages/cran/raster/docs/stack
st_r <- stack(r1, r2)
# extract values -- will create a matrix with 100 rows and two columns
vl <- getValues(st_r)
r <- raster("/path/to/file.grb")
values <- getValues(r)
You can read about the function here:
http://www.inside-r.org/packages/cran/raster/docs/values
I believe that the problem is that you are using raster and not stack. The raster function results in a single layer (matrix) whereas stack or brick read an array with all of the raster layers. Here is an example that demonstrates extracting values using an [i,j,z] index.
library(raster)
setwd("D:/TMP")

download.file("ftp://hydro1.sci.gsfc.nasa.gov/data/s4pa/NLDAS/NLDAS_MOS0125_H.002/2014/180/NLDAS_MOS0125_H.A20140629.0000.002.grb",
              destfile = "NLDAS_MOS0125_H.A20140629.0000.002.grb", mode = "wb")

r <- stack("NLDAS_MOS0125_H.A20140629.0000.002.grb")
names(r) <- paste0("L", seq_len(nlayers(r)))
class(r)

# Values for [i,j]
i <- 100
j <- 100
r[i, j]

# Values for i, j and z at layers 1, 5 and 10
z <- c(1, 5, 10)
r[i, j][z]
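As a side note (a sketch on the same file): if you already know which band you want, raster() can read just that single layer via its band argument, which avoids building the whole stack:
# read only band 5 of the GRIB file as a single RasterLayer
r5 <- raster("NLDAS_MOS0125_H.A20140629.0000.002.grb", band = 5)
r5[100, 100]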
