Subsetting a rasterbrick to give mean of three minimum months in each year - r

I'm interested in creating two variables from a time series of spatial raster data in R, read from a NetCDF file. Opening the data, subsetting by Z and creating a mean value is straightforward:
# Working example below using a ~16mb nc file of sea ice from HadISST
library(R.utils)
library(raster)
library(tidyverse)
download.file("https://www.metoffice.gov.uk/hadobs/hadisst/data/HadISST_ice.nc.gz","HadISST_ice.nc.gz")
gunzip("HadISST_ice.nc.gz", ext="gz", FUN=gzfile)
hadISST <- brick('HadISST_ice.nc')
# subset to a decade time period and create decadal mean from monthly data
hadISST_a <- hadISST %>%
  subset(., which(getZ(.) >= as.Date("1900-01-01") &
                    getZ(.) <= as.Date("1909-12-31"))) %>%
  mean(., na.rm = TRUE)
But I'm interested in extracting 1) annual mean values, and 2) the annual mean of the three minimum monthly values, for the subsetted time period. My current workflow uses nc_open() and ncvar_get() to open the data, raster::extract() to get the values, and then tidyverse group_by() and slice_min() to get the annual coolest months, but it's a slow and CPU-intensive approach. Is there a more effective way of doing this without converting from raster to data.frame?
Questions:
Using the above code, how can I extract annual means rather than a mean of ALL months over the decadal period?
Is there a way of using slice_min(order_by = sst, n = 3) or similar with brick objects to get the minimum three values per year prior to annual averaging?

Example data
if (!file.exists("HadISST_ice.nc")) {
  download.file("https://www.metoffice.gov.uk/hadobs/hadisst/data/HadISST_ice.nc.gz", "HadISST_ice.nc.gz")
  R.utils::gunzip("HadISST_ice.nc.gz")
}
library(terra)
hadISST <- rast('HadISST_ice.nc')
Annual mean
y <- format(time(hadISST), "%Y")
m <- tapp(hadISST, y, mean)
Mean of the lowest three monthly values by year (this takes much longer because a user-defined R function is used). I now see that there is a bug in the CRAN version; you can instead use version 1.5-47, which you can install like this: install.packages('terra', repos='https://rspatial.r-universe.dev').
f <- function(i) mean(sort(i)[1:3])
m3 <- tapp(hadISST, y, f)
To make this faster (if you have multiple cores):
m3 <- tapp(hadISST, y, f, cores=4)
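To tie this back to the question's decadal subset, here is a minimal sketch (assuming time(hadISST) was read correctly from the NetCDF file, as above): restrict the layers by time first, then group by year.
# restrict to 1900-1909, then group layers by year
d <- hadISST[[which(time(hadISST) >= as.Date("1900-01-01") &
                      time(hadISST) <= as.Date("1909-12-31"))]]
y10 <- format(time(d), "%Y")
annual_mean <- tapp(d, y10, mean)          # question 1: annual means
annual_min3 <- tapp(d, y10, f, cores = 4)  # question 2: mean of the three lowest months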

There are likely much more intelligent ways to do this, but here is a thought on your annual means (calling hadISST hadI here):
v <- getValues(hadI)       # matrix: one row per cell, one column per layer
v_t <- t(v)                # transpose so each row is a layer; this takes a while
v_mean <- vector(mode = 'numeric')
for (k in 1:nrow(v_t)) {   # mean of each layer, i.e. of each month
  v_mean[k] <- mean(v_t[k, ], na.rm = TRUE)
}
length(v_mean)
[1] 1828
v_mean[1:11]
[1] 0.1651351 0.1593368 0.1600364 0.1890360 0.1931470 0.1995657 0.1982052
[8] 0.1917534 0.1917840 0.1911638 0.1911657
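The same per-layer means can be computed without the transpose and the loop; a sketch (not part of the original answer) using base colMeans or raster's cellStats:
# colMeans over the cells-by-layers matrix gives the same per-layer means
v_mean <- colMeans(getValues(hadI), na.rm = TRUE)
# or, staying within raster:
v_mean <- cellStats(hadI, stat = 'mean', na.rm = TRUE)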
I'm a little unclear on proposition 2, as it seems it would just be the average of three 0's...

Related

R: Calculate standard deviation for specific time interval

I have a dataset with daily bond returns for some unique RIC codes (in total approx. 200,000 observations).
Now I want to calculate the standard deviation of those returns for the combined period t-30 to t-6 and t+6 to t+30. This means for every observation i,t, I need the 24 returns before t in the window t-30 to t-6 and 24 returns in the window t+6 to t+30 and calculate the standard deviation based on those 48 observations.
Here is a small snippet of my dataset:
#My data:
date <- c("2022-05-11", "2022-05-12","2022-05-13","2022-05-16","2022-05-17","2022-05-11", "2022-05-12","2022-05-13","2022-05-16","2022-05-17")
ric <- c("AT0000A1D541=", "AT0000A1D541=", "AT0000A1D541=", "AT0000A1D541=", "AT0000A1D541=", "SE247827293=", "SE247827293=", "SE247827293=", "SE247827293=", "SE247827293=")
return <- c(0.001009681, 0.003925873, 0.000354606, -0.000472641, -0.002935700, 0.003750854, 0.012317347, -0.001314047, 0.001014453, -0.007234452)
df <- data.frame(ric, date, return)
I have tried to use the slider package to generate two lists with the returns of the specific time frame. However, I feel that there is some more efficient way to solve this problem. I hope to find some help here.
This is what I tried before:
library(slider)
x <- slide(df$return, ~.x, .before = 30, .after = -6)  # returns in t-30 to t-6
y <- slide(df$return, ~.x, .before = -6, .after = 30)  # returns in t+6 to t+30
z <- mapply(c, x, y, SIMPLIFY = FALSE)  # SIMPLIFY = FALSE keeps z a list
for (i in 1:length(z)) {
  df$sd[i] <- sd(z[[i]])
}
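No answer was posted for this one; as a minimal sketch of one way to tidy it up (assuming rows are sorted by date within each RIC, and that the windows are positional rather than calendar-based), grouping by ric keeps the windows from crossing bonds:
library(dplyr)
library(slider)
df_sd <- df %>%
  arrange(ric, date) %>%
  group_by(ric) %>%
  mutate(
    pre  = slide(return, ~.x, .before = 30, .after = -6),   # t-30 .. t-6
    post = slide(return, ~.x, .before = -6, .after = 30),   # t+6 .. t+30
    sd   = mapply(function(a, b) sd(c(a, b)), pre, post)
  ) %>%
  ungroup()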

What causes the difference between calc and cellStats in raster calculations in R?

I am working with a dataset that consists of 20 layers, stacked in a RasterBrick (originating from an array). I have looked into the sum of the layers, calculated with both 'calc' and 'cellStats'. I have used calc to calculate the sum of the total values and cellStats to look at the average of the values per layer (useful for a time series).
However, when I sum the average of each layer, it is half the value of the other calculated sum. What causes this difference? What am I overlooking?
Code looks like this:
# testarray should contain a 127 x 147 raster with 2904 time layers;
# array() with those dims does it (54214776 = 127 * 147 * 2904)
testarray <- array(runif(54214776, 0, 1), dim = c(127, 147, 2904))
x <- stack()
for (i in 1830:1849) {
  slice <- testarray[,,i]
  r <- raster(nrow = 127, ncol = 147, vals = slice)
  x <- stack(x, r)
}
brickhp2 <- brick(x)
r_sumhp2 <- calc(brickhp2, sum, na.rm=TRUE)
r_sumhp2[r_sumhp2<= 0] <- NA
SWEavgpertimestepM <- cellStats(brickhp2, stat='mean', na.rm=TRUE)
The goal is to compare the sum of the layers calculated with 'calc(x, sum)' with the sum of the mean calculated with 'cellStats(x, mean)'.
Rasterbrick looks like this (600kb, GTiff) : http://www.filedropper.com/brickhp2
*If there is a better way to share this, please let me know.
The confusion arises because calc operates pixel-wise on a brick (i.e. it performs the calculation on the 20 values at each pixel and returns a single raster layer), while cellStats performs the calculation on each raster layer individually and returns a single value per layer. You can see that the results are comparable if you use this code:
library(raster)
##set seed so you get the same runif vals
set.seed(999)
##create example rasters
ls <- list()
for (i in 1:20) {
  r <- raster(nrow = (127*5), ncol = (147*5), vals = runif(127*5*147*5))
  ls[[i]] <- r
}
##create raster brick
brickhp2 <- brick(ls)
##calc sum (pixel-wise)
r_sumhp2 <- calc(brickhp2, sum, na.rm=TRUE)
r_sumhp2 ##returns raster layer
##calc mean (layer-wise)
r_meanhp2 <- cellStats(brickhp2, stat='mean', na.rm=TRUE)
r_meanhp2 ##returns vector of length nlayers(brickhp2)
##to get equivalent values you need to divide r_sumhp2 by the number of layers
##and then calculate the mean
cellStats(r_sumhp2/nlayers(brickhp2),stat="mean")
[1] 0.4999381
##and for r_meanhp2 you need to calculate the mean of the means
mean(r_meanhp2)
[1] 0.4999381
You will need to determine for yourself whether the pixel-wise or the layer-wise result is right for your application.

looping through large rasterbrick in raster R

How can I run a for-loop over a large daily RasterBrick to get annual stacks and calculate the maximum value (annual maxima) for each year (each stack of 365 layers)?
Basically, I have the same question as this one. So, taking that question as a sample, how can I write a for-loop that calculates the maximum value for each of the 46 stacks (each stack with 8 layers)?
I tried using only stackApply, but it gives all black/zero values when I run it for the whole period; however, it gives max values if I run it for individual years (tested separately for 10 years; I have more than 100 years of data).
library(raster)
# example data
sca <- brick(nrow=108,ncol=132,nl=365)
values(sca) <- runif(ncell(sca)*nlayers(sca))
# indices grouping sets of 8
i <- rep(1:ceiling(365/8), each=8)
# the last period is not a complete set of 8 days
i <- i[1:nlayers(sca)]
# This does not work for me, gives output as zero.
x <- stackApply(sca, i, max)
for (i in 1:nlayers(sca)) {
  x <- sca[[i]]
  xx <- stackApply(sca, i, max)
  plot(xx)
  # etc.
}
You could loop like this:
library(raster)
sca <- brick(nrow=108,ncol=132,nl=365)
values(sca) <- runif(ncell(sca)*nlayers(sca))
i <- rep(1:ceiling(365/8), each=8)
i <- i[1:nlayers(sca)]
for (j in unique(i)) {
  x <- sca[[which(j == i)]]
  xx <- max(x, na.rm=TRUE)
  # or
  # xx <- calc(x, fun=max, na.rm=TRUE, filename=paste0(j, '.tif'))
}
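For what it's worth, the same grouping can be done without an explicit loop in terra (a sketch, not part of the original answer):
library(terra)
sca <- rast(nrows = 108, ncols = 132, nlyrs = 365)
values(sca) <- runif(ncell(sca) * nlyr(sca))
i <- rep(1:ceiling(365/8), each = 8)[1:nlyr(sca)]
x <- tapp(sca, i, max, na.rm = TRUE)  # one max layer per 8-day group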

Simple DLNM in R

What I am trying to do is find the relative risk of mortality at the 10th, 50th and 90th percentiles of diurnal temperature range, and its additive effects at lags of 0, 1, 3 and 5 days. I'm doing this for a subset of months, May-September (subset here for mortality; temperature is already subsetted when read in). The code below runs, but no matter what city and what lag I introduce, I get a RR of essentially 1.0, so I believe that something is off or I am missing an argument somewhere. If anyone has more experience with these problems than I do, your help would be greatly appreciated.
library('dlnm')
library('splines')
mortdata <- read.table('STLmort.txt', sep="\t", header=T)
morts <- subset(mortdata, Month %in% 5:9)
deaths <- morts$AllMort
tempdata <- read.csv('STLRanges.csv',sep=',',header=T)
temp <- tempdata$Trange
HI <- tempdata$HIrange
#basis.var <- onebasis(1:5, knots=3)
#mklagbasis(maxlag=5, type="poly", degree=3)
basis.temp <- crossbasis(temp,vardegree=3,lag=5)
summary(basis.temp)
model <- glm(deaths ~ basis.temp, family = quasipoisson())
pred.temp <- crosspred(basis.temp, model, at=quantile(temp,c(.10,.50,.90),na.rm=TRUE) , cumul=T)
plot(pred.temp, "slices", var=c(quantile(temp, c(.10, .50, .90),na.rm=TRUE)) ,lag=c(0,1,5))
The problem is that you did not include any time variables to control for the long-term and seasonal trends in the time series when using DLNM.
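A minimal sketch of what that fix might look like (the spline df and the days-per-season figure are my assumptions, not from the answer): add a natural spline of time so the cross-basis isn't absorbing the seasonal trend.
# Control for trend/seasonality with a smooth function of time.
# Assumes one row per day, in chronological order; df is a tuning choice.
time <- seq_along(deaths)
nyears <- ceiling(length(deaths) / 153)  # ~153 days in May-Sept each year
model2 <- glm(deaths ~ basis.temp + ns(time, df = 4 * nyears),
              family = quasipoisson())
pred.temp2 <- crosspred(basis.temp, model2,
                        at = quantile(temp, c(.10, .50, .90), na.rm = TRUE),
                        cumul = TRUE)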

calculate lag from phase arrows with biwavelet in r

I'm trying to understand the cross wavelet function in R, but can't figure out how to convert the phase lag arrows to a time lag with the biwavelet package. For example:
require(gamair)
data(cairo)
data_1 <- within(cairo, Date <- as.Date(paste(year, month, day.of.month, sep = "-")))
data_1 <- data_1[,c('Date','temp')]
data_2 <- data_1
# add a lag
n <- nrow(data_1)
nn <- n - 49
data_1 <- data_1[1:nn,]
data_2 <- data_2[50:nrow(data_2),]
data_2[,1] <- data_1[,1]
require(biwavelet)
d1 <- data_1[,c('Date','temp')]
d2 <- data_2[,c('Date','temp')]
xt1 <- xwt(d1,d2)
plot(xt1, plot.phase = TRUE)
These are my two time series. Both are identical, but one lags the other. The arrows suggest a phase angle of 45 degrees; apparently pointing down or up means 90 degrees (in or out of phase), so my interpretation is that I'm looking at a lag of 45 degrees.
How would I now convert this to a time lag, i.e. how would I calculate the time lag between these signals?
I've read online that this can only be done for a specific wavelength (which I presume means for a certain period?). So, given that we're interested in a period of 365 and the time step between the signals is one day, how would one calculate the time lag?
So I believe you're asking how you can determine the lag time given two time series (in this case you artificially added a lag of 49 days).
I'm not aware of any packages that make this a one-step process, but since we are essentially dealing with sine waves, one option would be to "zero out" the waves and then find the zero crossing points. You could then calculate the average distance between the zero crossing points of wave 1 and wave 2. If you know the time step between measurements, you can easily calculate the lag time (in this case the time between measurement steps is one day).
Here is the code I used to accomplish this:
# smooth the data to get rid of noise that would introduce excess zero crossings
# subtract 70 from the temp to put a "zero" approximately in the middle of the wave
spline1 <- smooth.spline(data_1$Date, y = (data_1$temp - 70), df = 30)
plot(spline1)
#add the smoothed y back into the original data just in case you need it
data_1$temp_smoothed <- spline1$y
#do the same for wave 2
spline2 <- smooth.spline(data_2$Date, y = (data_2$temp - 70), df = 30)
plot(spline2)
data_2$temp_smoothed <- spline2$y
# function for finding zero crossing points, borrowed from the msProcess package
# (its helpers checkVectorType, checkScalarType, lowerCase and ifelse1 are
#  from the splus2R package, so load that if you run this standalone)
zeroCross <- function(x, slope = "positive")
{
  checkVectorType(x, "numeric")
  checkScalarType(slope, "character")
  slope <- match.arg(slope, c("positive", "negative"))
  slope <- match.arg(lowerCase(slope), c("positive", "negative"))
  ipost <- ifelse1(slope == "negative",
                   sort(which(c(x, 0) < 0 & c(0, x) > 0)),
                   sort(which(c(x, 0) > 0 & c(0, x) < 0)))
  offset <- apply(matrix(abs(x[c(ipost - 1, ipost)]), nrow = 2, byrow = TRUE),
                  MARGIN = 2, order)[1, ] - 2
  ipost + offset
}
#find zero crossing points for the two waves
zcross1 <- zeroCross(data_1$temp_smoothed, slope = 'positive')
length(zcross1)
[1] 10
zcross2 <- zeroCross(data_2$temp_smoothed, slope = 'positive')
length(zcross2)
[1] 11
#join the two vectors as a data.frame (using only the first 10 crossing points for wave2 to avoid any issues of mismatched lengths)
zcrossings <- as.data.frame(cbind(zcross1, zcross2[1:10]))
#calculate the mean of the crossing point differences
mean(zcrossings$zcross1 - zcrossings$V2)
[1] 49
I'm sure there are more eloquent ways of going about this, but it should get you the information that you need.
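As a quick numerical cross-check (not from the thread), the classical cross-correlation function estimates the same lag directly from the smoothed series:
# lag at which the cross-correlation of the two smoothed series peaks
cc <- ccf(data_1$temp_smoothed, data_2$temp_smoothed, lag.max = 100, plot = FALSE)
cc$lag[which.max(cc$acf)]  # should recover ~49, the shift introduced above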
In my case, for the semidiurnal tide, 90 degrees equals about 3 hours (90 * 12.5 hours / 360 = 3.125 hours), where 12.5 hours is the period of the semidiurnal tide; 45 degrees equals 45 * 12.5 / 360 = 1.56 hours. In general, time lag = (phase angle / 360) * period.
Thus in your case, with a period of 365 days:
90 degrees -> 90 * 365 / 360 = 91.25 days
45 degrees -> 45 * 365 / 360 = 45.625 days
My understanding is as follows: for there to be a simple cause-and-effect relationship between the phenomena recorded in the time series, we would expect the oscillations to be phase-locked (Grinsted et al. 2004), so the period where you find the "in phase" arrow (--->) indicates the lag between the signals.
See the simulated examples with different distances between cause-and-effect phenomena; observe that the greater the distance, the longer the period at which the "in phase" arrow occurs in the cross wavelet transform.
Grinsted, A., Moore, J. C., and Jevrejeva, S. (2004). Application of the cross wavelet transform and wavelet coherence to geophysical time series. Nonlinear Processes in Geophysics, 11, 561-566.
