Autoplotting a dataset that has been converted to a time series - R

I have a dataset named "BEDATA_GROUPED" that I am trying to autoplot. However, whenever I attempt to convert it into a time series and use the autoplot function, I get the following error:
Error in `ggplot2::autoplot()`:
! Objects of type tbl_df/tbl/data.frame not supported by autoplot.
Run `rlang::last_error()` to see where the error occurred.
To convert it to a time series I did the following:
BEDATA_GROUPED %>% mutate(occurrence_yrmn = yearmonth(occurrence_yrmn)) %>% as_tsibble(index = occurrence_yrmn)
Weirdly enough, when I use the following, I am able to use the autoplot function:
BEDATA_GROUPEDts <- ts(BEDATA_GROUPED[,2], frequency = 12, start = c(2014, 1))
I'm wondering why one allows me to autoplot whilst the other does not. The first way is referenced in https://otexts.com/fpp3/tsibbles.html#:~:text=This%20can%20be%20converted%20to%20a%20tsibble%20object%20using%20the%20following%20code%3A.
Dataset:
structure(list(occurrence_yrmn = c("2014-January", "2014-February",
"2014-March", "2014-April", "2014-May", "2014-June", "2014-July",
"2014-August", "2014-September", "2014-October", "2014-November",
"2014-December", "2015-January", "2015-February", "2015-March",
"2015-April", "2015-May", "2015-June", "2015-July", "2015-August",
"2015-September", "2015-October", "2015-November", "2015-December",
"2016-January", "2016-February", "2016-March", "2016-April",
"2016-May", "2016-June", "2016-July", "2016-August", "2016-September",
"2016-October", "2016-November", "2016-December", "2017-January",
"2017-February", "2017-March", "2017-April", "2017-May", "2017-June",
"2017-July", "2017-August", "2017-September", "2017-October",
"2017-November", "2017-December", "2018-January", "2018-February",
"2018-March", "2018-April", "2018-May", "2018-June", "2018-July",
"2018-August", "2018-September", "2018-October", "2018-November",
"2018-December", "2019-January", "2019-February", "2019-March",
"2019-April", "2019-May", "2019-June", "2019-July", "2019-August",
"2019-September", "2019-October", "2019-November", "2019-December",
"2020-January", "2020-February", "2020-March", "2020-April",
"2020-May", "2020-June", "2020-July", "2020-August", "2020-September",
"2020-October", "2020-November", "2020-December", "2021-January",
"2021-February", "2021-March", "2021-April", "2021-May", "2021-June",
"2021-July", "2021-August", "2021-September", "2021-October",
"2021-November", "2021-December"), MCI = c(586, 482, 567, 626,
625, 610, 576, 634, 636, 663, 657, 556, 513, 415, 510, 542, 549,
618, 623, 666, 641, 632, 593, 617, 541, 523, 504, 536, 498, 552,
522, 519, 496, 541, 602, 570, 571, 492, 560, 525, 507, 523, 593,
623, 578, 657, 683, 588, 664, 582, 619, 512, 630, 644, 563, 654,
635, 732, 639, 748, 719, 567, 607, 746, 739, 686, 805, 762, 696,
777, 755, 675, 704, 617, 732, 609, 464, 487, 565, 609, 513, 533,
505, 578, 526, 418, 428, 421, 502, 452, 509, 492, 478, 469, 457,
457)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-96L))

You have created a tsibble. The autoplot() method for tsibbles is in the feasts package, so add library(feasts) before you call autoplot(). I see you are using the wonderful Forecasting: Principles and Practice; you'll get all the packages you need with library(fpp3).
You'll also need to assign the result of your pipe to a variable (BEDATA_GROUPED <- BEDATA_GROUPED %>% ...); otherwise the converted tsibble is printed and discarded, and autoplot() still sees the original tibble.
library(tsibble)
library(feasts)
library(dplyr)
df <- df |>
  mutate(occurrence_yrmn = yearmonth(occurrence_yrmn)) |>
  as_tsibble(index = occurrence_yrmn)
autoplot(df)
Hope this helps :-)

Related

Conditional peak & valley signal detection in realtime timeseries data [R]

I have timeseries data containing some peaks and valleys that differ significantly from the threshold level (an example vector is provided below).
The peak/valley height/width may vary, as may the noise level.
I am interested in finding and reporting both peaks and valleys.
Currently I am using a function based on this thread:
Peak signal detection in realtime timeseries data
However, I would like to improve it for my needs by introducing an additional parameter: peaks/valleys should be ignored if they are not significant enough (peaks below a certain height, valleys that do not dip far enough). In the attached picture, I put red circles on the peaks I would like reported; unmarked ones should be ignored.
Here is an example vector:
signal <- c(659, 613, 586, 642, 685, 695, 691, 733, 638, 708, 706, 691, 703, 712, 715, 700, 693, 682, 717, 711, 722, 700, 704, 704, 715, 691, 670, 684, 689, 711, 680, 692, 686, 710, 702, 699, 702, 715, 691, 670, 684, 689, 711, 673, 688, 699, 677, 701, 680, 692, 686, 710, 702, 699, 717, 691, 703, 712, 715, 700, 693, 682, 717, 711, 722, 700, 704, 704, 715, 691, 670, 684, 689, 711, 680, 692, 686, 710, 702, 699, 702, 706, 651, 712, 722, 734, 705, 714, 691, 704, 704, 669, 712, 715, 689, 715, 691, 670, 684, 689, 711, 651, 712, 722, 734, 705, 714, 691, 686, 676, 690, 693, 702, 694, 682, 693, 724, 693, 707, 684, 687, 705, 680, 680, 705, 680, 693, 700, 704, 704, 669, 712, 715, 689, 715, 691, 670, 684, 689, 711, 673, 678, 699, 677, 680, 682, 676, 690, 658, 675, 663, 667, 682, 673, 675, 656, 652, 563, 544, 542, 540, 532, 538, 505, 526, 565, 629, 720, 713, 720, 720, 773, 732, 740, 695, 689, 723, 685, 726, 710, 684, 693, 715, 692, 683, 712, 707, 693, 699, 717, 703, 687, 682, 690, 716, 708, 713, 700, 676, 708, 691, 717, 711, 722, 688, 695, 641, 666, 638, 639, 600, 635, 609, 653, 671, 649, 716, 708, 713, 700, 676, 708, 691, 717, 711, 722, 700, 704, 704, 669, 712, 715, 689, 715, 691, 670, 684, 689, 711, 673, 688, 699, 677, 701, 680, 692, 686, 710, 702, 699, 717, 691, 703, 712, 715, 700, 693, 682, 717, 711, 722, 700, 704, 704, 715, 691, 670, 684, 689, 711, 680, 692, 686, 710, 702, 699, 702, 715, 691, 670, 684, 689, 711, 673, 688, 699, 677, 701, 680, 692, 686, 710, 702, 699, 717, 691, 703, 712, 715, 700, 693, 682, 717, 711, 722, 700, 704, 704, 715, 691, 670, 684, 769, 767, 740, 752, 686, 710, 702, 699, 702, 706, 651, 712, 722, 734, 705, 714, 691, 704, 704, 669, 712, 715, 689, 715, 691, 670, 684, 689, 711, 704, 669, 712, 715, 689, 715, 691, 670, 684, 689, 711, 673, 688, 699, 677, 701, 680, 692, 686, 710, 702, 699, 717, 691, 703, 712, 715, 700, 693, 682, 717, 711, 722, 700, 704, 704, 715, 691, 670, 684, 689, 711, 680, 692, 686, 710, 702, 699, 702, 715, 691, 670, 684, 689, 711, 673, 
688, 699, 677, 701, 680, 692, 686, 710, 702, 699, 665, 630, 808, 686, 787, 781, 796, 815, 786, 793, 664, 717, 691, 703, 712, 715, 700, 693, 682, 717, 711, 722, 700, 704, 704, 715, 691, 670, 684, 689, 711, 680, 692, 686, 710, 702, 699, 702, 706, 651, 712, 722, 734, 705, 714, 691, 704, 704, 669, 712, 715, 689, 715, 691, 670, 684, 689, 711)
And here is a depiction of what I want (red circles - peaks to be reported):
I am the author of the original algorithm you were referring to.
To answer your question, let's first discuss the characteristics of your data:
The time series is stationary: the average value seems to be constant at around 700
There are infrequent peaks, both up and down
The peaks do not change the distribution or the trend of the data
The peaks are clustered: when there is a peak (signal), it persists for approximately 20-30 data points
Because your time series is stationary, you should set the influence parameter close to zero. Because peaks are infrequent, you should set the threshold high (above 3 standard deviations). Because the peaks are clustered, you should set the lag parameter as high as you can.
For example, using:
lag <- 130
threshold <- 4
influence <- 0
Yields the following:
Here is the accompanying code:
ThresholdingAlgo <- function(y, lag, threshold, influence) {
  signals <- rep(0, length(y))
  filteredY <- y[1:lag]
  avgFilter <- NULL
  stdFilter <- NULL
  avgFilter[lag] <- mean(y[1:lag], na.rm = TRUE)
  stdFilter[lag] <- sd(y[1:lag], na.rm = TRUE)
  for (i in (lag + 1):length(y)) {
    if (abs(y[i] - avgFilter[i - 1]) > threshold * stdFilter[i - 1]) {
      if (y[i] > avgFilter[i - 1]) {
        signals[i] <- 1
      } else {
        signals[i] <- -1
      }
      filteredY[i] <- influence * y[i] + (1 - influence) * filteredY[i - 1]
    } else {
      signals[i] <- 0
      filteredY[i] <- y[i]
    }
    avgFilter[i] <- mean(filteredY[(i - lag):i], na.rm = TRUE)
    stdFilter[i] <- sd(filteredY[(i - lag):i], na.rm = TRUE)
  }
  return(list("signals" = signals, "avgFilter" = avgFilter, "stdFilter" = stdFilter))
}
y <- c(659, 613, 586, 642, 685, 695, 691, 733, 638, 708, 706, 691, 703, 712, 715, 700, 693, 682, 717, 711, 722, 700, 704, 704, 715, 691, 670, 684, 689, 711, 680, 692, 686, 710, 702, 699, 702, 715, 691, 670, 684, 689, 711, 673, 688, 699, 677, 701, 680, 692, 686, 710, 702, 699, 717, 691, 703, 712, 715, 700, 693, 682, 717, 711, 722, 700, 704, 704, 715, 691, 670, 684, 689, 711, 680, 692, 686, 710, 702, 699, 702, 706, 651, 712, 722, 734, 705, 714, 691, 704, 704, 669, 712, 715, 689, 715, 691, 670, 684, 689, 711, 651, 712, 722, 734, 705, 714, 691, 686, 676, 690, 693, 702, 694, 682, 693, 724, 693, 707, 684, 687, 705, 680, 680, 705, 680, 693, 700, 704, 704, 669, 712, 715, 689, 715, 691, 670, 684, 689, 711, 673, 678, 699, 677, 680, 682, 676, 690, 658, 675, 663, 667, 682, 673, 675, 656, 652, 563, 544, 542, 540, 532, 538, 505, 526, 565, 629, 720, 713, 720, 720, 773, 732, 740, 695, 689, 723, 685, 726, 710, 684, 693, 715, 692, 683, 712, 707, 693, 699, 717, 703, 687, 682, 690, 716, 708, 713, 700, 676, 708, 691, 717, 711, 722, 688, 695, 641, 666, 638, 639, 600, 635, 609, 653, 671, 649, 716, 708, 713, 700, 676, 708, 691, 717, 711, 722, 700, 704, 704, 669, 712, 715, 689, 715, 691, 670, 684, 689, 711, 673, 688, 699, 677, 701, 680, 692, 686, 710, 702, 699, 717, 691, 703, 712, 715, 700, 693, 682, 717, 711, 722, 700, 704, 704, 715, 691, 670, 684, 689, 711, 680, 692, 686, 710, 702, 699, 702, 715, 691, 670, 684, 689, 711, 673, 688, 699, 677, 701, 680, 692, 686, 710, 702, 699, 717, 691, 703, 712, 715, 700, 693, 682, 717, 711, 722, 700, 704, 704, 715, 691, 670, 684, 769, 767, 740, 752, 686, 710, 702, 699, 702, 706, 651, 712, 722, 734, 705, 714, 691, 704, 704, 669, 712, 715, 689, 715, 691, 670, 684, 689, 711, 704, 669, 712, 715, 689, 715, 691, 670, 684, 689, 711, 673, 688, 699, 677, 701, 680, 692, 686, 710, 702, 699, 717, 691, 703, 712, 715, 700, 693, 682, 717, 711, 722, 700, 704, 704, 715, 691, 670, 684, 689, 711, 680, 692, 686, 710, 702, 699, 702, 715, 691, 670, 684, 689, 711, 673, 688, 
699, 677, 701, 680, 692, 686, 710, 702, 699, 665, 630, 808, 686, 787, 781, 796, 815, 786, 793, 664, 717, 691, 703, 712, 715, 700, 693, 682, 717, 711, 722, 700, 704, 704, 715, 691, 670, 684, 689, 711, 680, 692, 686, 710, 702, 699, 702, 706, 651, 712, 722, 734, 705, 714, 691, 704, 704, 669, 712, 715, 689, 715, 691, 670, 684, 689, 711)
lag <- 130
threshold <- 4
influence <- 0
# Run algo with lag = 130, threshold = 4, influence = 0
result <- ThresholdingAlgo(y,lag,threshold,influence)
# Plot result
par(mfrow = c(2,1),oma = c(2,2,0,0) + 0.1,mar = c(0,0,2,1) + 0.2)
plot(1:length(y),y,type="l",ylab="",xlab="")
lines(1:length(y),result$avgFilter,type="l",col="cyan",lwd=2)
lines(1:length(y),result$avgFilter+threshold*result$stdFilter,type="l",col="green",lwd=2)
lines(1:length(y),result$avgFilter-threshold*result$stdFilter,type="l",col="green",lwd=2)
plot(result$signals,type="S",col="red",ylab="",xlab="",ylim=c(-1.5,1.5),lwd=2)
Now if you also wish to find the minimum or maximum value within each signal, you can simply overlay the result array on your y (data) array and identify where it achieves the highest or lowest value.
For example (using explicit loops instead of a quick one-liner):
# Initialise the cluster bookkeeping as a list before the loop
cluster <- list(flag = FALSE, sign = 0, data = NULL, index = NULL)
for (i in 1:length(result$signals)) {
  if (abs(result$signals[i]) == 1) {
    cluster$flag <- TRUE
    cluster$sign <- result$signals[i]
    cluster$data <- append(cluster$data, y[i])
    cluster$index <- append(cluster$index, i)
  } else {
    if (cluster$flag) {
      if (cluster$sign == 1) {
        cat("The positive cluster starting at", cluster$index[1], "has maximum", max(cluster$data), "\n")
      } else {
        cat("The negative cluster starting at", cluster$index[1], "has minimum", min(cluster$data), "\n")
      }
    }
    cluster$flag <- FALSE
    cluster$sign <- 0
    cluster$data <- NULL
    cluster$index <- NULL
  }
}
This gives the following output:
The negative cluster starting at 158 has minimum 505
The positive cluster starting at 410 has maximum 808
The positive cluster starting at 412 has maximum 815
Hope that helps!
Let's start by plotting your data, along with its mean:
plot(signal, type = 'l')
abline(h = mean(signal), col = 'blue3', lty = 2)
Since you want the upwards peak near the end of the series to be included but not the downwards spike at the very start, we need to find the number of standard deviations that would identify this peak but not the one at the start. This turns out to be about 3.5 standard deviations, as we can see if we plot the 3.5 standard deviation lines:
abline(h = mean(signal) + c(3.5, -3.5) * sd(signal), lty = 2, col = 'red')
Now comes the tricky part. We use run length encoding to identify which contiguous parts of the sequence are outside of the 3.5 standard deviations, and find the point of largest absolute deviation for each:
big <- abs(signal - mean(signal)) > 3.5 * sd(signal)
exceed <- split(seq_along(big), data.table::rleid(big))[rle(big)$value]
peaks <- sapply(exceed, function(x) x[which.max(abs(signal[x] - mean(signal)))])
Now the vector peaks will contain only the maximum or minimum point within each spike. Note that although you only have two spikes in your sample data, this will work for however many spikes you have.
To demonstrate, let us plot the points:
points(peaks, signal[peaks], col = 'red', pch = 16)
Maybe you want something like this:
max.index <- which.max(signal)
max <- max(signal)
min.index <- which.min(signal)
min <- min(signal)
plot(signal, type = "l")
points(max.index, max, col = "red", pch = 16)
points(min.index, min, col = "red", pch = 16)
Output:

R geom_forecast use case interpretation

I just started getting familiar with forecasting, and I stumbled upon the example here, based on which I have a few questions:
How can I forecast for the next 5 years?
What are the red and blue shaded areas around the forecast lines and what's the interpretation?
Why is there a break between the forecast lines and the historical lines?
What forecasting model does geom_forecast use?
lungDeaths data:
structure(c(2134, 1863, 1877, 1877, 1492, 1249, 1280, 1131, 1209,
1492, 1621, 1846, 2103, 2137, 2153, 1833, 1403, 1288, 1186, 1133,
1053, 1347, 1545, 2066, 2020, 2750, 2283, 1479, 1189, 1160, 1113,
970, 999, 1208, 1467, 2059, 2240, 1634, 1722, 1801, 1246, 1162,
1087, 1013, 959, 1179, 1229, 1655, 2019, 2284, 1942, 1423, 1340,
1187, 1098, 1004, 970, 1140, 1110, 1812, 2263, 1820, 1846, 1531,
1215, 1075, 1056, 975, 940, 1081, 1294, 1341, 901, 689, 827,
677, 522, 406, 441, 393, 387, 582, 578, 666, 830, 752, 785, 664,
467, 438, 421, 412, 343, 440, 531, 771, 767, 1141, 896, 532,
447, 420, 376, 330, 357, 445, 546, 764, 862, 660, 663, 643, 502,
392, 411, 348, 387, 385, 411, 638, 796, 853, 737, 546, 530, 446,
431, 362, 387, 430, 425, 679, 821, 785, 727, 612, 478, 429, 405,
379, 393, 411, 487, 574), .Dim = c(72L, 2L), .Dimnames = list(
NULL, c("mdeaths", "fdeaths")), .Tsp = c(1974, 1979.91666666667,
12), class = c("mts", "ts", "matrix"))
Code:
library(forecast)
# Data
lungDeaths = cbind(mdeaths, fdeaths)
# Plot
autoplot(lungDeaths) + geom_forecast()
Output:
To remove the gap you can use showgap:
If showgap=FALSE, the gap between the historical observations and the
forecasts is removed.
Code:
library(forecast)
autoplot(lungDeaths) +
geom_forecast(showgap = FALSE)
Output:
To forecast 5 years you can use h to set the number of forecasts:
autoplot(lungDeaths) +
geom_forecast(h = 60, showgap = FALSE)
Output:
To remove the confidence intervals use PI:
If FALSE, confidence intervals will not be plotted, giving only the
forecast line.
library(forecast)
autoplot(lungDeaths) +
geom_forecast(h = 60, showgap = FALSE, PI = FALSE)
Output:
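To address the remaining two questions, as I understand the forecast package's documentation: the red and blue shaded bands are the prediction intervals for each of the two series, at the default 80% and 95% levels (darker to lighter); and geom_forecast obtains its forecasts by calling forecast::forecast() on the series, which for a monthly series like this one selects an ETS (exponential smoothing) model automatically. The interval levels can be changed via the level argument; a sketch:

```r
library(forecast)

# 50% and 90% prediction intervals instead of the default c(80, 95)
autoplot(lungDeaths) +
  geom_forecast(h = 60, showgap = FALSE, level = c(50, 90))
```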

find identical values in different numerics (or columns of a dataframe)

I have two numeric vectors of different lengths, and I need to find a value in the 2nd vector that is identical to one of the values in the 1st vector (currently only one value is identical, but I do not know which). e.g.:
x <- c(15,43,46,76,111,138,205,227,242,330,333,339,348,380,402,403,498,534,579)
y <- c(391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487,
488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 503, 504, 505, 506, 507)
My solutions so far have failed. I was able to construct a data frame with the vectors as columns:
df <- data.frame(x = x, y = y)
But:
With which(df$y == 15) I can only compare one value at a time.
With duplicated() I can only find identical values within one column or within the same row.
Currently, I work with just two columns. But it would also be helpful to have code when there are three columns and the goal is to find a value from column 1 in column 2 and 3.
Has anyone an idea?
You can find identical values using intersect for multiple vectors like this:
Reduce(intersect, list(y,x))
Output:
[1] 498
You can use the %in% operator to ask which elements of y are in x e.g.
which(y %in% x)
#> [1] 105
that gives the index of the element(s) in y that are also in x. You can subset with that to find the actual values:
y[which(y %in% x)]
#> [1] 498
You can then combine %in% operations with & if you have more than one vector you want to check.
z <- 498
y[which(y %in% x & y %in% z)]
#> [1] 498
You'll get back an empty vector if there are no matches.
z <- 500
y[which(y %in% x & y %in% z)]
#> numeric(0)

Google OR-Tools Pickup Delivery on graph structure

Being new to the OR-Tools libraries, I am unable to modify the existing code for my requirements. I'm trying to solve a routing problem based on a graph structure. Right now the distance matrix is configured so that each node/location always has a connection to every other node/location. Is it possible to change the distance matrix in some way (e.g. by entering -1) to show OR-Tools that some nodes/locations don't have connections to certain other nodes?
"""Simple Pickup Delivery Problem (PDP)."""
from ortools.constraint_solver import routing_enums_pb2
from ortools.constraint_solver import pywrapcp
def create_data_model():
"""Stores the data for the problem."""
data = {}
data['distance_matrix'] = [
[
0, 548, 776, 696, 582, 274, 502, 194, 308, 194, 536, 502, 388, 354,
468, 776, 662
],
[
548, 0, 684, 308, 194, 502, 730, 354, 696, 742, 1084, 594, 480, 674,
1016, 868, 1210
],
[
776, 684, 0, 992, 878, 502, 274, 810, 468, 742, 400, 1278, 1164,
1130, 788, 1552, 754
],
[
696, 308, 992, 0, 114, 650, 878, 502, 844, 890, 1232, 514, 628, 822,
1164, 560, 1358
],
[
582, 194, 878, 114, 0, 536, 764, 388, 730, 776, 1118, 400, 514, 708,
1050, 674, 1244
],
[
274, 502, 502, 650, 536, 0, 228, 308, 194, 240, 582, 776, 662, 628,
514, 1050, 708
],
[
502, 730, 274, 878, 764, 228, 0, 536, 194, 468, 354, 1004, 890, 856,
514, 1278, 480
],
[
194, 354, 810, 502, 388, 308, 536, 0, 342, 388, 730, 468, 354, 320,
662, 742, 856
],
[
308, 696, 468, 844, 730, 194, 194, 342, 0, 274, 388, 810, 696, 662,
320, 1084, 514
],
[
194, 742, 742, 890, 776, 240, 468, 388, 274, 0, 342, 536, 422, 388,
274, 810, 468
],
[
536, 1084, 400, 1232, 1118, 582, 354, 730, 388, 342, 0, 878, 764,
730, 388, 1152, 354
],
[
502, 594, 1278, 514, 400, 776, 1004, 468, 810, 536, 878, 0, 114,
308, 650, 274, 844
],
[
388, 480, 1164, 628, 514, 662, 890, 354, 696, 422, 764, 114, 0, 194,
536, 388, 730
],
[
354, 674, 1130, 822, 708, 628, 856, 320, 662, 388, 730, 308, 194, 0,
342, 422, 536
],
[
468, 1016, 788, 1164, 1050, 514, 514, 662, 320, 274, 388, 650, 536,
342, 0, 764, 194
],
[
776, 868, 1552, 560, 674, 1050, 1278, 742, 1084, 810, 1152, 274,
388, 422, 764, 0, 798
],
[
662, 1210, 754, 1358, 1244, 708, 480, 856, 514, 468, 354, 844, 730,
536, 194, 798, 0
],
]
data['pickups_deliveries'] = [
[1, 6],
[2, 10],
[4, 3],
[5, 9],
[7, 8],
[15, 11],
[13, 12],
[16, 14],
]
data['num_vehicles'] = 4
data['depot'] = 0
return data
def print_solution(data, manager, routing, solution):
"""Prints solution on console."""
print(f'Objective: {solution.ObjectiveValue()}')
total_distance = 0
for vehicle_id in range(data['num_vehicles']):
index = routing.Start(vehicle_id)
plan_output = 'Route for vehicle {}:\n'.format(vehicle_id)
route_distance = 0
while not routing.IsEnd(index):
plan_output += ' {} -> '.format(manager.IndexToNode(index))
previous_index = index
index = solution.Value(routing.NextVar(index))
route_distance += routing.GetArcCostForVehicle(
previous_index, index, vehicle_id)
plan_output += '{}\n'.format(manager.IndexToNode(index))
plan_output += 'Distance of the route: {}m\n'.format(route_distance)
print(plan_output)
total_distance += route_distance
print('Total Distance of all routes: {}m'.format(total_distance))
def main():
"""Entry point of the program."""
# Instantiate the data problem.
data = create_data_model()
# Create the routing index manager.
manager = pywrapcp.RoutingIndexManager(len(data['distance_matrix']),
data['num_vehicles'], data['depot'])
# Create Routing Model.
routing = pywrapcp.RoutingModel(manager)
# Define cost of each arc.
def distance_callback(from_index, to_index):
"""Returns the manhattan distance between the two nodes."""
# Convert from routing variable Index to distance matrix NodeIndex.
from_node = manager.IndexToNode(from_index)
to_node = manager.IndexToNode(to_index)
return data['distance_matrix'][from_node][to_node]
transit_callback_index = routing.RegisterTransitCallback(distance_callback)
routing.SetArcCostEvaluatorOfAllVehicles(transit_callback_index)
# Add Distance constraint.
dimension_name = 'Distance'
routing.AddDimension(
transit_callback_index,
0, # no slack
3000, # vehicle maximum travel distance
True, # start cumul to zero
dimension_name)
distance_dimension = routing.GetDimensionOrDie(dimension_name)
distance_dimension.SetGlobalSpanCostCoefficient(100)
# Define Transportation Requests.
for request in data['pickups_deliveries']:
pickup_index = manager.NodeToIndex(request[0])
delivery_index = manager.NodeToIndex(request[1])
routing.AddPickupAndDelivery(pickup_index, delivery_index)
routing.solver().Add(
routing.VehicleVar(pickup_index) == routing.VehicleVar(
delivery_index))
routing.solver().Add(
distance_dimension.CumulVar(pickup_index) <=
distance_dimension.CumulVar(delivery_index))
# Setting first solution heuristic.
search_parameters = pywrapcp.DefaultRoutingSearchParameters()
search_parameters.first_solution_strategy = (
routing_enums_pb2.FirstSolutionStrategy.PARALLEL_CHEAPEST_INSERTION)
# Solve the problem.
solution = routing.SolveWithParameters(search_parameters)
# Print solution on console.
if solution:
print_solution(data, manager, routing, solution)
if __name__ == '__main__':
main()
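There is no built-in "missing edge" marker such as -1 in OR-Tools: the transit callback must return a valid cost for every arc the solver considers. Two common workarounds are to return a prohibitively large cost for forbidden arcs, or to remove the arc from the successor variables entirely. A minimal sketch of the first approach (the sentinel value, BIG_COST, and make_distance_callback are my own illustrative names, not part of the sample):

```python
FORBIDDEN = -1    # sentinel placed in the matrix where no connection exists
BIG_COST = 10**7  # far larger than any real distance, so such arcs are never chosen

def make_distance_callback(matrix, manager):
    """Wrap the matrix lookup so the -1 sentinel maps to a huge cost."""
    def distance_callback(from_index, to_index):
        from_node = manager.IndexToNode(from_index)
        to_node = manager.IndexToNode(to_index)
        d = matrix[from_node][to_node]
        return BIG_COST if d == FORBIDDEN else d
    return distance_callback

# The stricter alternative is to forbid the arc outright, e.g. for a
# forbidden pair (i, j):
#   routing.NextVar(manager.NodeToIndex(i)).RemoveValue(manager.NodeToIndex(j))
```

Note that with the Distance dimension capped at 3000, any route using a BIG_COST arc also becomes infeasible, which reinforces the prohibition; RemoveValue is stricter still because the solver never even branches on the arc.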

Detect peaks with two adjacent identical values using pracma::findpeaks [duplicate]

This question already has answers here:
Find sustained peaks using pracma::findpeaks
(1 answer)
Identify sustained peaks using pracma::findpeaks
(2 answers)
Closed 2 years ago.
I've got some data with 23 peaks and have used pracma::findpeaks to pick them out. However, one of the peaks has two identical values adjacent to each other, at time=7524 and time=7525, and findpeaks seems to deal with this by ignoring that peak.
Could someone please help me make it recognise this peak? I'd like it to pick out the first of the two points, though it would also be good to know how to pick out the last of them as well.
data <- data.frame(time=c(1562, 1563, 1564, 1565, 1566, 1810, 1811, 1812, 1813, 1814,
2058, 2059, 2060, 2061, 2306, 2307, 2308, 2309, 2310, 2560, 2561,
2562, 2563, 2564, 3064, 3065, 3066, 3067, 3580, 3581, 3582, 3583,
3584, 4095, 4096, 4097, 4098, 4099, 4610, 4611, 4612, 4613, 4614,
5128, 5129, 5130, 5131, 5132, 5133, 5637, 5638, 5639, 5640, 5641,
5876, 5877, 5878, 5879, 5880, 5881, 5882, 6125, 6126, 6127, 6128,
6129, 6130, 6607, 6608, 6609, 6610, 6611, 6612, 6613, 7072, 7073,
7074, 7075, 7076, 7077, 7078, 7079, 7519, 7520, 7521, 7522, 7523,
7524, 7525, 7526, 7527, 7528, 7941, 7942, 7943, 7944, 7945, 7946,
7947, 7948, 7949, 8342, 8343, 8344, 8345, 8346, 8347, 8348, 8349,
8350, 8351, 8708, 8709, 8710, 8711, 8712, 8713, 8714, 8715, 8716,
8717, 8718, 9045, 9046, 9047, 9048, 9049, 9050, 9051, 9052, 9053,
9054, 9055, 9352, 9353, 9354, 9355, 9356, 9357, 9358, 9359, 9360,
9361, 9362, 9363, 9624, 9625, 9626, 9627, 9628, 9629, 9630, 9631,
9632, 9633, 9634, 9867, 9868, 9869, 9870, 9871, 9872, 9873, 9874,
9875, 9876),
value=c(509, 672, 758, 686, 584, 559, 727, 759, 688, 528, 562, 711,
768, 678, 644, 750, 822, 693, 531, 566, 738, 793, 730, 511, 587,
739, 761, 651, 579, 747, 768, 705, 544, 551, 687, 756, 749, 645,
564, 680, 724, 691, 596, 535, 625, 685, 689, 612, 512, 537, 616,
657, 653, 573, 506, 598, 675, 685, 668, 609, 515, 575, 656, 687,
678, 626, 533, 509, 587, 641, 680, 663, 602, 515, 505, 583, 646,
693, 696, 684, 630, 549, 500, 572, 637, 681, 725, 736, 736, 703,
649, 556, 568, 637, 682, 743, 765, 767, 709, 660, 587, 548, 622,
690, 761, 779, 764, 749, 694, 631, 525, 571, 646, 724, 788, 811,
834, 818, 776, 712, 616, 536, 556, 649, 738, 801, 857, 866, 837,
808, 718, 647, 568, 508, 605, 714, 823, 872, 917, 916, 890, 825,
742, 642, 543, 549, 656, 766, 851, 921, 947, 951, 892, 830, 730,
617, 586, 675, 760, 804, 816, 795, 740, 690, 613, 522))
library(pracma)
library(ggplot2)
peaks <- data.frame(findpeaks(data$value, npeaks=23, threshold=100, sortstr=TRUE))
data$n <- seq(1,length(data$value))
data <- merge(x=data, y=peaks, by.x="n", by.y="X2", all.x=TRUE, all.y=TRUE)
ggplot(data, aes(x=time, y=value)) +
geom_col(fill="red") +
geom_point(aes(x=time, y=X1))
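One way to handle the flat-topped peak (the approach taken in the linked duplicates) is findpeaks' peakpat argument: the function encodes successive differences as "+", "0" and "-" and matches that string against a regular expression, and the default pattern "[+]{1,}[-]{1,}" contains no "0", so a peak whose top is two identical values is skipped. Allowing optional zeros between the rise and the fall picks it up; a sketch:

```r
library(pracma)

# Allow a flat top of any length between the rise and the fall.
# findpeaks reports the position of the maximum, which for a plateau
# is its first point.
peaks <- findpeaks(data$value, npeaks = 23, threshold = 100, sortstr = TRUE,
                   peakpat = "[+]{1,}[0]*[-]{1,}")
```

To get the last point of the plateau instead, one option is to run findpeaks on rev(data$value) and map the reported positions back.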
