findpeaks replacement zero length error - r

I am currently working with a raster composed of 1,750,000 data points from a storm recorded last winter. I am using the pracma::findpeaks() function to quantify and analyze periods of the storm. Each hour of the storm spans 90,000 data points, and I would like an hour-by-hour analysis. Over some hourly intervals, the function works perfectly:
findpeaks(winddf$s1[1609931:1699931], nups = 3, ndowns = 3, minpeakheight = 10.79, minpeakdistance = 5)
which returns 110 peaks over this interval with these peak parameters.
However, over another 90,000-point interval, I get this error message when I run:
findpeaks(winddf$s1[179133:269132], nups = 3, ndowns = 3, minpeakheight = 8.84, minpeakdistance = 5)
Error in xp[i] <- which.max(x[x1[i]:x2[i]]) + x1[i] - 1 :
replacement has length zero
The only changes made were the minpeakheight threshold and the interval being viewed. The function works for all 3,000-point intervals, some 8,000-point intervals, and a few 15,000-point intervals, but I would much rather perform 20 analyses of the 20 hours over 90,000-point intervals of the storm than 600 analyses using 3,000-point intervals. I cannot share the complete code or data, as the data file is too large. Thank you.

I had the same issue with the findpeaks function. It turned out that I had some non-numeric values in my vector that were producing NAs. I converted all NAs to zero with the line of code below, and this resolved the issue.
dt$X5[is.na(dt$X5)] <- 0
Where dt is a dataframe and X5 is the vector you are plugging into the findpeaks function.
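Applied to the vector from the question, a minimal sketch of the same fix would be (winddf$s1 and the interval are taken from the question; replacing NAs with zero mirrors the line above, though dropping or interpolating them may suit wind data better):
sum(is.na(winddf$s1[179133:269132]))  # count the NAs in the problematic interval
winddf$s1[is.na(winddf$s1)] <- 0      # replace NAs before calling findpeaks()
findpeaks(winddf$s1[179133:269132], nups = 3, ndowns = 3, minpeakheight = 8.84, minpeakdistance = 5)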

Related

Time series daily data modeling

I am looking to forecast my time series. I have daily data for the period 2021-Jan-01 to 2022-Jul-01, so I have a column with one observation for each day.
What I have tried so far:
d1=zoo(data, seq(from = as.Date("2021-01-01"), to = as.Date("2022-07-01"), by = 1))
tsdata <- ts(d1, frequency = 365)
ddata <- decompose(tsdata, "multiplicative")
I get the following error here:
Error in decompose(tsdata, "multiplicative") :
time series has no or less than 2 periods
From what I have read, it seems this is because I do not have two full years of data? Is that correct? I have tried doing it weekly as well:
series <- ts(data, frequency = 52, start = c(2021, 1))
and I get the same error.
How do I go about decomposing the series without having to extend my dataset to two years, since I do not have that?
Also, when I actually try to forecast it, the forecast isn't good enough:
[plot with forecast]
My data somewhat resembles a bell curve over that period, so is there a better-fitting time series model I can apply instead?
A weekly frequency for daily data should have frequency = 7, not 52. It's possible that this fix to your code will produce a model with a seasonal term.
I don't think you'll be able to produce a time series model with annual seasonality with less than 2 years of data.
You can either produce a model with only weekly seasonality (I expect this is what most folks would recommend), or if you truly believe in the annual seasonal pattern exhibited in your data, your "forecast" can be a seasonal naive forecast that is simply last year's value for that particular day. I wouldn't recommend this, because it just seems risky, and I don't really see the same trajectory in your screenshot over 2022 that's apparent in 2021.
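As a minimal sketch of the weekly-seasonality route (the forecast package and auto.arima() are my own choices here, not something prescribed above):
library(forecast)
y <- ts(data, frequency = 7)  # 'data' is the daily series from the question
fit <- auto.arima(y)          # let auto.arima() pick a (possibly seasonal) ARIMA with period 7
fc <- forecast(fit, h = 28)   # forecast four weeks ahead
autoplot(fc)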
decompose requires two full cycles and that a full cycle represent 1 time unit. The ts class can't use the Date class anyway. To use frequency 7, we must use times 1/7 apart, such as 1, 1 + 1/7, 1 + 2/7, etc., so that 1 cycle (7 days) covers 1 unit. Then just label the plot appropriately rather than using those times on the X axis. In the code below, use %Y in place of %y if the years start in 19?? and end in 20??, so that tapply maintains the order.
# test data
set.seed(123)
s <- seq(from = as.Date("2021-01-01"), to = as.Date("2022-07-01"), by = 1)
data <- rnorm(length(s))
tsdata <- ts(data, freq = 7)                            # 1 time unit = 1 week
ddata <- decompose(tsdata, "multiplicative")
plot(ddata, xaxt = "n")                                 # suppress the default time axis
m <- tapply(time(tsdata), format(s, "%y/%m"), head, 1)  # first time value of each month
axis(1, m, names(m))                                    # label the axis with year/month

Simulation on the probability of doing laundry together

I am trying to simulate a self-made probability problem: "Suppose there are 6 households living in a unit of an apartment complex. On average, a single household does laundry twice a week, for 2 hours each time. Find the probability that any two households are doing laundry at the same time."
However, I was only able to simulate the case where a single household does laundry ONCE a week (R code below), and would appreciate any help extending the code to the scenario of doing laundry TWICE a week.
I also attempted to find a theoretical solution but it did not match with my simulation results below. Any help is appreciated. Thanks!
dist.min <- function(x) {
  # TRUE if the two closest laundry start times are less than 2 hours (in seconds) apart
  min(dist(x)) <= 2 * 3600 - 1
}
set.seed(12345)
N <- 100000
# each row holds 6 random laundry start times (in seconds) within one week
mat <- matrix(sample(1:(24 * 60 * 60 * 7), N * 6, replace = TRUE), ncol = 6)
is.same <- apply(mat, 1, dist.min)
mean(is.same) # 0.30602
Hi, if I understood the problem correctly, I would take the following approach.
This is a binomial distribution where n = 6 is the number of families and the probability p that a family is doing laundry at any given moment is 4/168, i.e. 4 hours divided by the number of hours in a week.
Then the theoretical probability of at least 2 families doing laundry at the same time is
sum(dbinom(2:6, 6, 4/168))
which gives about 0.8%.
As for the simulation, let's create a matrix with 6 columns, one per family, and 10,000 rows, one per simulation. Then let's fill the matrix with 1 (doing laundry) and 0 (not), where the probabilities correspond to the probability of doing laundry at any point in time.
Running this code I get about a 0.7% probability of 2 or more families doing laundry at the same time:
mat <- replicate(6, sample(0:1, size = 10000, replace = TRUE, prob = c(164/168, 4/168)))
table(rowSums(mat))
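To turn that table into the probability directly, a one-line addition (mine, not part of the original answer) would be:
mean(rowSums(mat) >= 2)  # proportion of simulated time points with 2 or more families doing laundry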

R: How do I aggregate losses by a Poisson observation?

I'm new to R, but I am trying to use it to aggregate losses that are observed from a severity distribution by an observation from a frequency distribution - essentially what rcompound does. However, I need a more granular approach, as I need to manipulate the severity distribution before 'aggregation'.
Let's take an example. Suppose you have:
rpois(10,lambda=3)
which gives you something like:
[1] 2 2 3 5 2 5 6 4 3 1
Additionally, suppose we have severity of losses determined by:
rgamma(20,shape=1,scale=10000)
So that we also have the following output:
[1] 233.0257 849.5771 7760.4402 731.5646 8982.7640 24172.2369 30824.8424 22622.8826 27646.5168 1638.2333 6770.9010 2459.3722 782.0580 16956.1417 1145.4368 5029.0473 3485.6412 4668.1921 5637.8359 18672.0568
My question is: what is an efficient way to get R to take each Poisson observation in turn and then aggregate losses from my severity distribution? For example, the first Poisson observation is 2. Therefore, adding two observations (the first two) from my Gamma distribution gives 1082.60.
I say this needs to be 'efficient' (in run time) because:
- The Poisson parameter may become significantly large, i.e. up to 1000 or so.
- The number of realisations is likely to be up to 1,000,000, i.e. up to a million Poisson and Gamma observations to sort through.
Any help would be greatly appreciated.
Thanks, Dave.
It looks like you want to split the gamma vector at the positions indicated by the cumulative sums of the Poisson vector.
The following function (from here) does the splitting:
splitAt <- function(x, pos) unname(split(x, cumsum(seq_along(x) %in% pos)))
pois <- c(2, 2, 3, 5, 2, 5, 6, 4, 3, 1)
gam <- c(233.0257, 849.5771, 7760.4402, 731.5646, 8982.7640, 24172.2369, 30824.8424, 22622.8826, 27646.5168, 1638.2333, 6770.9010, 2459.3722, 782.0580, 16956.1417, 1145.4368, 5029.0473, 3485.6412, 4668.1921, 5637.8359, 18672.0568)
posits <- cumsum(pois)
Then do the following:
sapply(splitAt(gam, posits + 1), sum)
[1] 1082.603 8492.005 63979.843 61137.906 17738.200 19966.153 18672.057
According to the post I linked to above, the splitAt() function slows down for large arrays, so you could (if necessary) consider the alternatives proposed in that post. For my part, I generated 1e6 Poissons and 1e6 gammas, and the above function ran in 0.78 sec on my machine.
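If splitAt() ever becomes the bottleneck, a vectorised sketch in the same spirit (my own assumption about how the full simulation would be set up, drawing exactly one gamma severity per claim) is:
set.seed(1)
n_sims <- 1e6
freq <- rpois(n_sims, lambda = 3)                           # number of losses per realisation
sev <- rgamma(sum(freq), shape = 1, scale = 10000)          # one severity per loss
agg <- rowsum(sev, group = rep.int(seq_len(n_sims), freq))  # aggregate loss per realisation
# realisations with zero losses do not appear in 'agg'; add them back as zeros if needed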

calculate lag from phase arrows with biwavelet in r

I'm trying to understand the cross wavelet function in R, but can't figure out how to convert the phase lag arrows to a time lag with the biwavelet package. For example:
require(gamair)
data(cairo)
data_1 <- within(cairo, Date <- as.Date(paste(year, month, day.of.month, sep = "-")))
data_1 <- data_1[,c('Date','temp')]
data_2 <- data_1
# add a lag
n <- nrow(data_1)
nn <- n - 49
data_1 <- data_1[1:nn,]
data_2 <- data_2[50:nrow(data_2),]
data_2[,1] <- data_1[,1]
require(biwavelet)
d1 <- data_1[,c('Date','temp')]
d2 <- data_2[,c('Date','temp')]
xt1 <- xwt(d1,d2)
plot(xt1, plot.phase = TRUE)
These are my two time series. Both are identical but one lags the other. The arrows suggest a phase angle of 45 degrees - apparently pointing straight up or down means a 90-degree phase difference - so my interpretation is that I'm looking at a lag of 45 degrees.
How would I now convert this to a time lag i.e. how would I calculate the time lag between these signals?
I've read online that this can only be done for a specific wavelength (which I presume means for a certain period?). So, given that we're interested in a period of 365 days, and the time step between the signals is one day, how would one calculate the time lag?
So I believe you're asking how you can determine the lag time given two time series (in this case you artificially added a lag of 49 days).
I'm not aware of any packages that make this a one-step process, but since we are essentially dealing with sine waves, one option would be to "zero out" the waves and then find the zero crossing points. You could then calculate the average distance between the zero crossing points of wave 1 and wave 2. If you know the time step between measurements, you can easily calculate the lag time (in this case the time between measurement steps is one day).
Here is the code I used to accomplish this:
# smooth the data to get rid of the noise that would introduce excess zero crossings
#subtracted 70 from the temp to introduce a "zero" approximately in the middle of the wave
spline1 <- smooth.spline(data_1$Date, y = (data_1$temp - 70), df = 30)
plot(spline1)
#add the smoothed y back into the original data just in case you need it
data_1$temp_smoothed <- spline1$y
#do the same for wave 2
spline2 <- smooth.spline(data_2$Date, y = (data_2$temp - 70), df = 30)
plot(spline2)
data_2$temp_smoothed <- spline2$y
# function for finding zero crossing points, borrowed from the msProcess package
# (its helper functions checkVectorType, checkScalarType, ifelse1 and lowerCase are not base R and come along with that package)
zeroCross <- function(x, slope = "positive")
{
  checkVectorType(x, "numeric")
  checkScalarType(slope, "character")
  slope <- match.arg(slope, c("positive", "negative"))
  slope <- match.arg(lowerCase(slope), c("positive", "negative"))
  ipost <- ifelse1(slope == "negative",
                   sort(which(c(x, 0) < 0 & c(0, x) > 0)),
                   sort(which(c(x, 0) > 0 & c(0, x) < 0)))
  offset <- apply(matrix(abs(x[c(ipost - 1, ipost)]), nrow = 2, byrow = TRUE), MARGIN = 2, order)[1, ] - 2
  ipost + offset
}
#find zero crossing points for the two waves
zcross1 <- zeroCross(data_1$temp_smoothed, slope = 'positive')
length(zcross1)
[1] 10
zcross2 <- zeroCross(data_2$temp_smoothed, slope = 'positive')
length(zcross2)
[1] 11
#join the two vectors as a data.frame (using only the first 10 crossing points for wave2 to avoid any issues of mismatched lengths)
zcrossings <- as.data.frame(cbind(zcross1, zcross2[1:10]))
#calculate the mean of the crossing point differences
mean(zcrossings$zcross1 - zcrossings$V2)
[1] 49
I'm sure there are more elegant ways of going about this, but it should get you the information that you need.
In my case, for the semidiurnal tidal wave, 90 degrees corresponds to about 3 hours (90 * 12.5 hours / 360 = 3.125 hours), where 12.5 hours is the semidiurnal period. So 45 degrees corresponds to 45 * 12.5 / 360 = 1.56 hours.
Thus in your case, with a 365-day period:
90 degrees -> 90 * 365 / 360 = 91.25 days
45 degrees -> 45 * 365 / 360 = 45.625 days
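The same conversion written out in R (a minimal sketch; the 365-day period and daily time step come from the question):
phase_deg <- 45                            # phase angle read from the arrows
period_days <- 365                         # period of interest at which the arrows were read
lag_days <- phase_deg / 360 * period_days
lag_days                                   # 45.625, close to the 49-day lag built into the example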
My understanding is as follows:
For there to be a simple cause-and-effect relationship between the phenomena recorded in the time series, we would expect that the oscillations are phase-locked (Grinsted 2004); so, the period where you find the "in phase" arrow (--->) indicates the lag between the signals.
See the simulated examples with different distances between cause-and-effect phenomena; observe that the greater the distance, the greater the period of occurrence of the "in phase" arrow in the cross wavelet transform.
Grinsted, A., Moore, J. C., and Jevrejeva, S. (2004). Application of the cross wavelet transform and wavelet coherence to geophysical time series. Nonlinear Processes in Geophysics, 11, 561–566. SRef-ID: 1607-7946/npg/2004-11-561.

R time series frequency

I have a time series data set which consists of 374 days of data points (one data point for each day). I am struggling to understand the frequency parameter of the ts function, so I left it empty:
ts_0615391206 <- ts(demand_rev_0615391206$estimated_demand,
                    start = as.Date(min(demand_rev_0615391206$date), format = "%d/%m/%Y"),
                    end = as.Date(max(demand_rev_0615391206$date), format = "%d/%m/%Y"),
                    #frequency = 1
                    )
plot.ts(ts_0615391206)
however, when I try to decompose using:
ts_0615391206_components <- decompose(ts_0615391206)
I am getting the error:
Error in decompose(ts_0615391206) :
time series has no or less than 2 periods
How do I decide how many periods there are in my data, and consequently what the value of the "frequency" parameter should be?
> dput(head(ts_0615391206))
c(2.71, 2.47, 3.86, 3.61, 5.78, 5.59)
>
> str(ts_0615391206)
Time-Series [1:194] from 16125 to 16318: 2.71 2.47 3.86 3.61 5.78 5.59 3.28 3.4 3.34 3.68 ...
ts_06153912061 <- ts(ts_0615391206, start = c(1999, 1), frequency = 365)  # daily
ts_06153912061 <- ts(ts_0615391206, start = c(1999, 1), frequency = 52)   # weekly
ts_06153912061 <- ts(ts_0615391206, start = c(1999, 1), frequency = 12)   # monthly
ts_06153912061 <- ts(ts_0615391206, start = c(1999, 1), frequency = 4)    # quarterly
ts_06153912061 <- ts(ts_0615391206, start = c(1999, 1), frequency = 1)    # yearly or annual
Per the documentation ?ts:
...one could use a value of 7 for frequency when the data are sampled daily, and the natural time period is a week, or 12 when the data are sampled monthly and the natural time period is a year. Values of 4 and 12 are assumed in (e.g.) print methods to imply a quarterly and monthly series respectively.
Try setting frequency = 7.
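A minimal sketch of that suggestion (assuming ts_0615391206 holds the daily demand values shown in the question):
ts_weekly <- ts(as.numeric(ts_0615391206), frequency = 7)  # 7 observations per cycle, i.e. one week
ts_0615391206_components <- decompose(ts_weekly)
plot(ts_0615391206_components)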
The decompose() function from the stats R package decomposes a given time series into a trend, a seasonal component and a remainder part. The seasonal component is a strictly periodic time series, and the period's length equals the frequency of the time series. For example, if you set frequency = m, decompose() builds a decomposition whose seasonal component has period m.
The decompose() function works if m is an integer, m > 1, and the length of the time series is greater than or equal to 2m.
The help page ?decompose notes that the function only works well if the series "covers an integer number of complete periods", so it may be better if the series length is a multiple of m.
There is no clear periodicity in your data. Maybe this discussion will be useful for you, because it contains Rob Hyndman's R script for revealing periodicity in a series.
If you have daily data, then the frequency is 365, since a year contains 365 daily entries; that is what the definition of frequency refers to.
When you try to decompose using:
ts_0615391206_components <- decompose(ts_0615391206)
you get the error:
Error in decompose(ts_0615391206) :
time series has no or less than 2 periods
because decompose() needs at least two full periods, i.e. two years of data at this frequency; otherwise it throws this error.
