R geom_forecast use case interpretation

I have just started getting familiar with forecasting and stumbled upon the example here, about which I have a few questions:
How can I forecast for the next 5 years?
What are the red and blue shaded areas around the forecast lines and what's the interpretation?
Why is there a break between the forecast lines and the historical lines?
What forecasting model does geom_forecast use?
lungDeaths data:
structure(c(2134, 1863, 1877, 1877, 1492, 1249, 1280, 1131, 1209,
1492, 1621, 1846, 2103, 2137, 2153, 1833, 1403, 1288, 1186, 1133,
1053, 1347, 1545, 2066, 2020, 2750, 2283, 1479, 1189, 1160, 1113,
970, 999, 1208, 1467, 2059, 2240, 1634, 1722, 1801, 1246, 1162,
1087, 1013, 959, 1179, 1229, 1655, 2019, 2284, 1942, 1423, 1340,
1187, 1098, 1004, 970, 1140, 1110, 1812, 2263, 1820, 1846, 1531,
1215, 1075, 1056, 975, 940, 1081, 1294, 1341, 901, 689, 827,
677, 522, 406, 441, 393, 387, 582, 578, 666, 830, 752, 785, 664,
467, 438, 421, 412, 343, 440, 531, 771, 767, 1141, 896, 532,
447, 420, 376, 330, 357, 445, 546, 764, 862, 660, 663, 643, 502,
392, 411, 348, 387, 385, 411, 638, 796, 853, 737, 546, 530, 446,
431, 362, 387, 430, 425, 679, 821, 785, 727, 612, 478, 429, 405,
379, 393, 411, 487, 574), .Dim = c(72L, 2L), .Dimnames = list(
NULL, c("mdeaths", "fdeaths")), .Tsp = c(1974, 1979.91666666667,
12), class = c("mts", "ts", "matrix"))
Code:
library(forecast)
# Data
lungDeaths = cbind(mdeaths, fdeaths)
# Plot
autoplot(lungDeaths) + geom_forecast()

To remove the gap you can use showgap:
If showgap=FALSE, the gap between the historical observations and the
forecasts is removed.
Code:
library(forecast)
autoplot(lungDeaths) +
geom_forecast(showgap = FALSE)
To forecast 5 years, use h to set the number of forecast periods. The data are monthly, so 5 years corresponds to h = 60:
autoplot(lungDeaths) +
geom_forecast(h = 60, showgap = FALSE)
To remove the shaded prediction intervals (the documentation calls them confidence intervals), use PI:
If FALSE, confidence intervals will not be plotted, giving only the
forecast line.
library(forecast)
autoplot(lungDeaths) +
geom_forecast(h = 60, showgap = FALSE, PI = FALSE)
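The questions about the shaded areas and the model can be explored by calling forecast() directly on one of the series. This is a minimal sketch, not a description of the internals: as far as I understand, geom_forecast hands each series to forecast(), which for a monthly ts object picks an exponential smoothing (ETS) model automatically. Treat that as an assumption and read the chosen model from fc$method. The two bands are the 80% and 95% prediction intervals requested through level.

```r
library(forecast)

# Forecast 60 months (5 years) ahead for the male series from the datasets package
fc <- forecast(mdeaths, h = 60, level = c(80, 95))

fc$method        # name of the model that was selected automatically
head(fc$mean)    # point forecasts (the forecast line)
head(fc$lower)   # lower bounds of the 80% and 95% prediction intervals
head(fc$upper)   # upper bounds of the 80% and 95% prediction intervals
```

The wider band is the 95% interval: under the fitted model, the future value is expected to fall inside it about 95% of the time.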


ARIMA Modeling running time issue

My data set is weekly and contains two variables, Production and Shipment. Production is the independent variable and Shipment is the dependent variable. I first forecast Production and then use it as a regressor to forecast Shipment.
If I fit the ARIMA model on the training range from "2018-12-31" to "2021-11-22", it runs within 10 minutes and I can see the model values.
Using the same model, if I extend the training range to run from "2018-12-31" to "2021-12-27", the model runs for so long that it never finishes, and I cannot view the model output.
Could you please help me with this?
Thank you for the support.
Original.df<-structure(list(YearWeek = c("201901", "201902", "201903", "201904",
"201905", "201906", "201907", "201908", "201909", "201910", "201911",
"201912", "201913", "201914", "201915", "201916", "201917", "201918",
"201919", "201920", "201921", "201922", "201923", "201924", "201925",
"201926", "201927", "201928", "201929", "201930", "201931", "201932",
"201933", "201934", "201935", "201936", "201937", "201938", "201939",
"201940", "201941", "201942", "201943", "201944", "201945", "201946",
"201947", "201948", "201949", "201950", "201951", "201952", "202001",
"202002", "202003", "202004", "202005", "202006", "202007", "202008",
"202009", "202010", "202011", "202012", "202013", "202014", "202015",
"202016", "202017", "202018", "202019", "202020", "202021", "202022",
"202023", "202024", "202025", "202026", "202027", "202028", "202029",
"202030", "202031", "202032", "202033", "202034", "202035", "202036",
"202037", "202038", "202039", "202040", "202041", "202042", "202043",
"202044", "202045", "202046", "202047", "202048", "202049", "202050",
"202051", "202052", "202053", "202101", "202102", "202103", "202104",
"202105", "202106", "202107", "202108", "202109", "202110", "202111",
"202112", "202113", "202114", "202115", "202116", "202117", "202118",
"202119", "202120", "202121", "202122", "202123", "202124", "202125",
"202126", "202127", "202128", "202129", "202130", "202131", "202132",
"202133", "202134", "202135", "202136", "202137", "202138", "202139",
"202140", "202141", "202142", "202143", "202144", "202145", "202146",
"202147", "202148", "202149", "202150", "202151", "202152", "202201",
"202202", "202203"), Shipment = c(399, 1336, 1018, 1126, 1098,
1235, 1130, 1258, 897, 1333, 1221, 1294, 1628, 1611, 1484, 1238,
1645, 1936, 1664, 1482, 2060, 1964, 1875, 1645, 2039, 1640, 733,
1764, 1639, 1968, 1692, 1677, 1542, 1299, 1328, 1130, 1741, 1929,
1843, 1427, 1467, 1450, 1041, 1238, 1721, 1757, 1813, 1001, 1208,
1916, 1435, 540, 681, 1436, 1170, 938, 1206, 1648, 1169, 1311,
1772, 1333, 1534, 1365, 1124, 846, 732, 753, 1266, 1652, 1772,
1814, 1649, 1191, 1298, 986, 1296, 1066, 777, 1041, 1388, 1289,
1097, 1356, 1238, 1732, 1109, 1104, 1155, 1334, 1094, 770, 1411,
1304, 1269, 1093, 1096, 1121, 943, 695, 1792, 2033, 1586, 768,
685, 993, 1406, 1246, 1746, 1740, 938, 160, 1641, 1373, 1023,
1173, 1611, 928, 1038, 1009, 1274, 1369, 1231, 1053, 1163, 880,
870, 1131, 882, 1143, 632, 394, 510, 543, 535, 824, 874, 591,
512, 448, 247, 452, 470, 747, 545, 639, 326, 414, 604, 640, 458,
272, 524, 589, 666, 217, 215, 348, 537, 466), Production = c(794,
1400, 1505, 1055, 1396, 1331, 1461, 1623, 1513, 1667, 1737, 1264,
1722, 1587, 2094, 1363, 2007, 1899, 1749, 1693, 1748, 1455, 2078,
1702, 1736, 1885, 860, 1372, 1716, 1290, 1347, 1451, 1347, 1409,
1203, 1235, 1397, 1557, 1406, 1451, 1704, 670, 1442, 1336, 1611,
1401, 1749, 744, 1558, 1665, 1317, 41, 441, 1351, 1392, 1180,
1447, 1265, 1485, 1494, 1543, 1581, 1575, 1597, 1191, 1386, 889,
1002, 1573, 1380, 1346, 1243, 1009, 965, 1051, 905, 1094, 1194,
891, 1033, 921, 880, 1135, 1058, 1171, 1022, 956, 880, 902, 983,
1014, 945, 1021, 1058, 1191, 1139, 1292, 573, 1173, 514, 1292,
1310, 1239, 41, 41, 1182, 1028, 1028, 1196, 1214, 1045, 256, 1451,
1344, 1352, 1257, 1444, 786, 1369, 1185, 1262, 1025, 949, 1051,
941, 727, 911, 951, 987, 1136, 884, 770, 959, 1102, 1109, 1098,
988, 983, 1002, 904, 1147, 1149, 919, 1058, 1112, 479, 1028,
1154, 1126, 1155, 1208, 536, 839, 1178, 1225, 539, 41, 862, 839,
873)), row.names = c(NA, 160L), class = "data.frame")
library(dplyr)
library(tsibble)
library(feasts)
library(fable)
# Converting the df to accommodate leap years for weekly observations
Original.df <- Original.df %>%
mutate(
isoweek =stringr::str_replace(YearWeek, "^(\\d{4})(\\d{2})$", "\\1-W\\2-1"),
date = ISOweek::ISOweek2date(isoweek)
)
#creating test and train data- 1st case- Training data until WK47("2021-11-22")
Original.train.df <- Original.df %>%
filter(date >= "2018-12-31", date <= "2021-11-22")
Original.test.df <- Original.df %>%
filter(date >= "2021-11-29", date <= "2021-12-27")
Shipment.Test.df<- Original.test.df %>%
dplyr::select(-YearWeek, -Production, -date,-isoweek) %>% as_tibble()
# splitting the original train data to contain only Week, Dependent and Independent variables
Total.train.df<-Original.train.df %>%
mutate(Week.1 = yearweek(ISOweek::ISOweek(date))) %>%
dplyr::select(-YearWeek,-date,-isoweek) %>%
as_tsibble(index = Week.1)
#Model.1-Fitting forecast model(Arima with Fourier terms) to Production.qty with the training
#until WK47(2021-11-22)
lambda_production<-Total.train.df %>% features(Production,features = guerrero) %>% pull(lambda_guerrero)
bestfit.Prod.1.AICc <- Inf
for(K in seq(25)){
fit.Prod.1 <- Total.train.df %>%
model(ARIMA(box_cox(Production,lambda_production) ~ fourier(K = K), stepwise = FALSE, approximation = FALSE))
if(purrr::pluck(glance(fit.Prod.1), "AICc") < bestfit.Prod.1.AICc)
{
bestfit.Prod.1.AICc <- purrr::pluck(glance(fit.Prod.1), "AICc")
bestfit.Prod.1<- fit.Prod.1
bestK.Prod.1 <- K
}
}
bestK.Prod.1
glance(bestfit.Prod.1)
#creating test and train data- 2nd case- Training data until WK52("2021-12-27")
Original.train.df_2 <- Original.df %>%
filter(date >= "2018-12-31", date <= "2021-12-27")
Original.test.df_2 <- Original.df %>%
filter(date >= "2022-01-03", date <= "2022-01-17")
Shipment.Test.df_2<- Original.test.df_2 %>%
dplyr::select(-YearWeek, -Production, -date,-isoweek) %>% as_tibble()
# splitting the original train data to contain only Week, Dependent and Independent variables
Total.train.df_2<-Original.train.df_2 %>%
mutate(Week.1 = yearweek(ISOweek::ISOweek(date))) %>%
dplyr::select(-YearWeek,-date,-isoweek) %>%
as_tsibble(index = Week.1)
#Model.2-Fitting forecast model(Arima with Fourier terms) to Production.qty with the training
#until WK52
lambda_production_2<-Total.train.df_2 %>% features(Production,features = guerrero) %>% pull(lambda_guerrero)
bestfit.Prod.2.AICc <- Inf
for(K in seq(25)){
fit.Prod.2 <- Total.train.df_2 %>%
model(ARIMA(box_cox(Production,lambda_production_2) ~ fourier(K = K), stepwise = FALSE, approximation = FALSE))
if(purrr::pluck(glance(fit.Prod.2), "AICc") < bestfit.Prod.2.AICc)
{
bestfit.Prod.2.AICc <- purrr::pluck(glance(fit.Prod.2), "AICc")
bestfit.Prod.2<- fit.Prod.2
bestK.Prod.2 <- K
}
}
bestK.Prod.2
glance(bestfit.Prod.2)
Model 2 above never finishes executing; it is still running.
As you can see, model 1 and model 2 differ only in the training data, so could you please let me know what I am missing here?
Thank you
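The YearWeek-to-ISO-week conversion used in the code above can be checked on its own before any model is fitted. This is a minimal base-R sketch of just the string step, using three YearWeek values that occur in the data (including the week 53 entry). The names year_week and iso_week are only for illustration.

```r
# Three YearWeek codes taken from the data, including a 53-week year
year_week <- c("201901", "202053", "202201")

# Same pattern as in the question: "YYYYWW" -> "YYYY-Www-1" (Monday of that ISO week)
iso_week <- sub("^(\\d{4})(\\d{2})$", "\\1-W\\2-1", year_week)
print(iso_week)
```

The resulting strings are in the form that ISOweek::ISOweek2date() is given in the question, so any malformed YearWeek entry would show up here as an unchanged string.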

Trying to replicate an IR signal from a remote with an unknown protocol on ESP8266 using Arduino

I have some problems with resending IR signals from a remote to control my shutters.
I recorded the raw IR codes, but even another Arduino does not receive anything. It does not print any data.
I am a bit confused about the library ESP8266irRemote. It needs a frequency for sending raw IR data. As the timings are given in ms, I do not understand what this frequency is supposed to be. Where could I read this frequency from? What are some default values? EDIT: cleared up, it is the carrier frequency. It seems the default of 38kHz should be right.
And why could it be that my Arduino does not receive anything? If I simply use an example for a Samsung TV, it receives everything fine.
Thanks for any help!
EDIT:
uint16_t up3[95] = {444, 1190, 442, 1190, 1256, 376, 1258, 374, 440, 1190, 440, 1192, 440, 1192, 440, 1192, 440, 1192, 440, 1192, 1282, 350, 440, 1192, 440, 1192, 440, 1190, 440, 1192, 440, 1192, 440, 1192, 440, 1192, 440, 1192, 440, 1192, 438, 1194, 1256, 374, 1258, 374, 1256, 19240, 440, 1192, 440, 1192, 1282, 350, 1256, 376, 440, 1192, 440, 1192, 440, 1192, 440, 1192, 440, 1192, 440, 1192, 1256, 374, 440, 1192, 440, 1192, 440, 1192, 440, 1192, 438, 1192, 440, 1192, 438, 1192, 440, 1192, 440, 1192, 464, 1168, 1256, 376, 1256, 376, 1256}; // UNKNOWN 87FDCA19
uint16_t stop3[95] = {1288, 346, 448, 1182, 1214, 418, 1222, 410, 444, 1188, 438, 1194, 466, 1164, 448, 1184, 440, 1192, 438, 1192, 1258, 374, 380, 1252, 448, 1182, 466, 1166, 448, 1184, 466, 1166, 448, 1182, 404, 1228, 468, 1164, 378, 1252, 1280, 350, 1256, 376, 448, 1184, 1264, 19234, 1220, 414, 402, 1230, 1284, 348, 1252, 380, 406, 1226, 378, 1252, 404, 1228, 404, 1228, 404, 1228, 438, 1192, 1266, 366, 468, 1164, 406, 1226, 446, 1186, 448, 1184, 448, 1184, 378, 1252, 448, 1184, 400, 1232, 448, 1184, 1264, 368, 1254, 376, 468, 1164, 1264}; // UNKNOWN 6CE4F608
uint16_t dwn3[95] = {398, 1252, 1280, 352, 1284, 348, 1250, 380, 446, 1188, 462, 1170, 432, 1198, 378, 1254, 446, 1186, 442, 1188, 1282, 348, 402, 1230, 464, 1166, 434, 1196, 446, 1186, 446, 1186, 434, 1198, 462, 1168, 446, 1186, 446, 1186, 378, 1252, 400, 1230, 1218, 414, 378, 20118, 466, 1168, 1216, 414, 1262, 370, 1194, 436, 398, 1232, 398, 1232, 380, 1252, 464, 1168, 464, 1166, 466, 1164, 1196, 436, 400, 1232, 444, 1188, 400, 1230, 446, 1188, 466, 1164, 378, 1254, 446, 1186, 444, 1186, 466, 1166, 402, 1230, 458, 1172, 1282, 348, 464}; // UNKNOWN 2744EDAC
uint16_t up2[95] = {466, 1186, 444, 1186, 1262, 370, 444, 1186, 1260, 370, 446, 1186, 444, 1186, 446, 1186, 468, 1162, 446, 1186, 1262, 370, 444, 1188, 444, 1186, 444, 1188, 444, 1188, 444, 1186, 446, 1186, 444, 1188, 444, 1186, 444, 1188, 1262, 368, 1262, 370, 444, 1186, 1262, 19236, 446, 1186, 446, 1186, 1260, 370, 444, 1188, 1262, 370, 444, 1186, 446, 1186, 446, 1186, 446, 1186, 444, 1186, 1262, 370, 446, 1186, 444, 1188, 444, 1188, 446, 1186, 446, 1184, 446, 1186, 446, 1186, 446, 1186, 446, 1184, 1262, 370, 1260, 372, 446, 1186, 1260}; // UNKNOWN 2D1A9455
uint16_t stop2[95] = {1260, 374, 442, 1190, 1256, 376, 440, 1190, 1258, 374, 440, 1190, 440, 1192, 442, 1190, 440, 1192, 440, 1192, 1256, 374, 440, 1190, 440, 1192, 440, 1190, 440, 1192, 440, 1192, 440, 1192, 440, 1192, 440, 1192, 440, 1192, 440, 1192, 1256, 374, 1258, 374, 1256, 19240, 1258, 374, 440, 1192, 1256, 374, 440, 1192, 1256, 374, 440, 1192, 440, 1192, 440, 1190, 440, 1190, 440, 1192, 1256, 374, 440, 1192, 440, 1192, 440, 1192, 440, 1192, 440, 1192, 440, 1190, 440, 1192, 440, 1192, 440, 1190, 440, 1192, 1256, 374, 1256, 376, 1256}; // UNKNOWN B54FF968
uint16_t dwn2[95] = {478, 1156, 1288, 342, 1288, 344, 450, 1182, 1288, 342, 450, 1182, 476, 1154, 452, 1180, 450, 1180, 450, 1182, 1290, 342, 450, 1182, 476, 1156, 478, 1154, 478, 1154, 474, 1158, 450, 1182, 450, 1182, 474, 1156, 450, 1180, 1292, 340, 476, 1156, 474, 1158, 450, 20048, 476, 1156, 1290, 340, 1266, 366, 450, 1182, 1266, 364, 450, 1182, 476, 1156, 476, 1156, 450, 1182, 474, 1156, 1266, 366, 450, 1182, 474, 1156, 476, 1156, 476, 1156, 474, 1156, 450, 1182, 450, 1182, 474, 1158, 474, 1158, 1266, 366, 450, 1180, 450, 1182, 450}; // UNKNOWN 983238A8
IRsend irsend(4);
void setup() {
// put your setup code here, to run once:
irsend.begin();
}
void loop() {
// put your main code here, to run repeatedly:
irsend.sendRaw(dwn3, 95, 999);
delay(10000);
}
That's the code I used. I recorded the raw arrays using the raw dump example provided with the esp8266ir library.
I cut the include part, but be assured, the correct headers were included. The code compiles without any issue.
Thanks for the suggested edit. I am sorry about the first, poorly organized question.
As you did not provide any code and not much information in general I can only guess.
Possible issues:
wrong emitter wavelength
wrong carrier frequency, typically between 30 and 60kHz. 38kHz is most common.
or some error in sending what you have recorded.
I suggest you first find out how an IR remote control works before you attempt to build one yourself.

Detect peaks with two adjacent identical values using pracma::findpeaks [duplicate]

This question already has answers here:
Find sustained peaks using pracma::findpeaks
(1 answer)
Identify sustained peaks using pracma::findpeaks
(2 answers)
Closed 2 years ago.
I've got some data with 23 peaks. I've used pracma::findpeaks to pick out the peaks. However, one of the peaks has two identical values adjacent to each other, at time=7524 and time=7525. It seems findpeaks deals with this by ignoring the peak.
Could someone please help me make it recognise this peak? I'd like it to pick out the first of the two values, though it would also be good to know how to make it pick out the last of them instead.
data <- data.frame(time=c(1562, 1563, 1564, 1565, 1566, 1810, 1811, 1812, 1813, 1814,
2058, 2059, 2060, 2061, 2306, 2307, 2308, 2309, 2310, 2560, 2561,
2562, 2563, 2564, 3064, 3065, 3066, 3067, 3580, 3581, 3582, 3583,
3584, 4095, 4096, 4097, 4098, 4099, 4610, 4611, 4612, 4613, 4614,
5128, 5129, 5130, 5131, 5132, 5133, 5637, 5638, 5639, 5640, 5641,
5876, 5877, 5878, 5879, 5880, 5881, 5882, 6125, 6126, 6127, 6128,
6129, 6130, 6607, 6608, 6609, 6610, 6611, 6612, 6613, 7072, 7073,
7074, 7075, 7076, 7077, 7078, 7079, 7519, 7520, 7521, 7522, 7523,
7524, 7525, 7526, 7527, 7528, 7941, 7942, 7943, 7944, 7945, 7946,
7947, 7948, 7949, 8342, 8343, 8344, 8345, 8346, 8347, 8348, 8349,
8350, 8351, 8708, 8709, 8710, 8711, 8712, 8713, 8714, 8715, 8716,
8717, 8718, 9045, 9046, 9047, 9048, 9049, 9050, 9051, 9052, 9053,
9054, 9055, 9352, 9353, 9354, 9355, 9356, 9357, 9358, 9359, 9360,
9361, 9362, 9363, 9624, 9625, 9626, 9627, 9628, 9629, 9630, 9631,
9632, 9633, 9634, 9867, 9868, 9869, 9870, 9871, 9872, 9873, 9874,
9875, 9876),
value=c(509, 672, 758, 686, 584, 559, 727, 759, 688, 528, 562, 711,
768, 678, 644, 750, 822, 693, 531, 566, 738, 793, 730, 511, 587,
739, 761, 651, 579, 747, 768, 705, 544, 551, 687, 756, 749, 645,
564, 680, 724, 691, 596, 535, 625, 685, 689, 612, 512, 537, 616,
657, 653, 573, 506, 598, 675, 685, 668, 609, 515, 575, 656, 687,
678, 626, 533, 509, 587, 641, 680, 663, 602, 515, 505, 583, 646,
693, 696, 684, 630, 549, 500, 572, 637, 681, 725, 736, 736, 703,
649, 556, 568, 637, 682, 743, 765, 767, 709, 660, 587, 548, 622,
690, 761, 779, 764, 749, 694, 631, 525, 571, 646, 724, 788, 811,
834, 818, 776, 712, 616, 536, 556, 649, 738, 801, 857, 866, 837,
808, 718, 647, 568, 508, 605, 714, 823, 872, 917, 916, 890, 825,
742, 642, 543, 549, 656, 766, 851, 921, 947, 951, 892, 830, 730,
617, 586, 675, 760, 804, 816, 795, 740, 690, 613, 522))
library(pracma)
library(ggplot2)
peaks <- data.frame(findpeaks(data$value, npeaks=23, threshold=100, sortstr=TRUE))
data$n <- seq(1,length(data$value))
data <- merge(x=data, y=peaks, by.x="n", by.y="X2", all.x=TRUE, all.y=TRUE)
ggplot(data, aes(x=time, y=value)) +
geom_col(fill="red") +
geom_point(aes(x=time, y=X1))
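One way to handle a flat-topped peak without relying on findpeaks is to collapse runs of equal values first and then look for local maxima among the runs. The following is a base-R sketch of that idea, not pracma's method. The function name plateau_peaks and its keep argument are made up for illustration, and x is the stretch of value around time=7519 to 7528 from the data above.

```r
# Local maxima where a run of equal values (a plateau) counts as a single peak.
# keep = "first" returns the first index of the plateau, keep = "last" the last one.
plateau_peaks <- function(x, keep = c("first", "last")) {
  keep <- match.arg(keep)
  runs <- rle(x)
  ends <- cumsum(runs$lengths)            # last index of each run
  starts <- ends - runs$lengths + 1L      # first index of each run
  v <- runs$values
  n <- length(v)
  if (n < 3) return(integer(0))
  # a run is a peak if it is strictly higher than both neighbouring runs
  inner <- v[2:(n - 1)] > v[1:(n - 2)] & v[2:(n - 1)] > v[3:n]
  is_peak <- c(FALSE, inner, FALSE)
  if (keep == "first") starts[is_peak] else ends[is_peak]
}

# Values for time = 7519 ... 7528, with the tied maximum 736, 736
x <- c(500, 572, 637, 681, 725, 736, 736, 703, 649, 556)
plateau_peaks(x, "first")   # index of the first 736
plateau_peaks(x, "last")    # index of the second 736
```

Applied to data$value, the returned indices can be used in the same way as the X2 column of the findpeaks result. Filtering by height or threshold would still have to be added separately.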

How to perform a bootstrap and find 95% confidence interval for the median of a dataset

I am working to perform a bootstrap using the statistic median for dataset "file", containing only one column "Total". This is it:
Total <-
c(2089, 1567, 1336, 1616, 1590, 1649, 1341, 1614, 1590, 1621,
1621, 1631, 1295, 107, 18, 195, 2059, 870, 2371, 787, 98, 2422,
655, 1277, 1336, 2109, 1811, 1337, 1290, 1308, 1359, 1600, 1296,
693, 107, 1359, 89, 89, 89, 89, 2411, 1639, 89, 89, 1283, 89,
89, 89, 2341, 1012, 1295, 1853, 1277, 1571, 1288, 1300, 1619,
107, 555, 1612, 1300, 1300, 2093, 133, 1674, 988, 132, 647, 606,
544, 873, 274, 120, 1620, 1601, 1601, 906, 1603, 1613, 1592,
1603, 1610, 1321, 2380, 1575, 1575, 1277, 2354, 1561, 1579, 2367,
2341, 876, 1612, 1588, 2087, 1612, 890, 1586, 1580, 611, 1797,
2079, 1937, 189, 171, 706, 1647, 1642, 1278, 1650, 1623, 1647,
1661, 1692, 1632, 1684, 2474, 403, 842, 593, 98, 2354, 1265,
866, 1483, 2379, 1650, 1875, 1655, 1632, 1691, 1329, 867, 1632,
1693, 1623, 829, 1659, 1685, 666, 1585, 1659, 2169, 1623, 1645,
1654, 1698, 2172, 789, 1698, 579, 2443, 335, 132, 1952, 1265,
978, 1624, 979, 1729, 607, 181, 752, 424, 386, 309, 998, 1435,
2476, 392, 1657, 348, 1652, 1646, 1345, 2445, 1655, 840, 1624,
1652, 1321, 1321, 2201, 957, 917, 2458, 4096, 2458, 1346, 2459,
1634, 2459, 2459, 2459, 2508, 714, 2457, 2457, 1703, 669, 976,
1634, 2459, 2491, 2393, 625, 1763, 879, 886, 1085, 731, 924,
1649, 1216, 1647, 2470, 668, 2326, 757, 215, 276, 186, 901, 1402,
429, 554, 2457, 1643, 986, 730, 1028, 971, 1952, 1584, 1023,
1352, 839, 2434, 430, 2462, 1327, 1004, 385, 1099, 1067, 758,
679, 1423, 2495, 1664, 2495, 2495, 1345, 2530, 1754, 1804, 2525,
1652, 2536, 1646, 2529, 1380, 1845, 963, 1339, 2482, 1417, 1729,
1384, 1648, 344, 1648, 955, 609, 485, 1822, 513, 223, 222, 193,
1410, 1159, 586, 585, 2671, 2702, 2529, 2212, 1658, 741, 2529,
861, 1758, 905, 2529, 597, 1049, 2529, 619, 2620, 2596, 1688,
2590, 2545, 2590, 883, 287, 723, 2565, 1835, 1738, 2243, 1693,
2565, 250, 2529, 1880, 1777, 701, 444, 927, 1127, 825, 2726,
1977, 235, 241, 269, 660, 1523, 420, 678, 213, 544, 940, 983,
605, 2716, 1848, 1848, 182, 1225, 365, 993, 224, 267, 309, 271,
324, 178, 2657, 1772, 546, 456, 2637, 1771, 677, 1409, 653, 2359,
690, 828, 2742, 1812, 2777, 552, 1572, 2742, 2792, 2819, 1753,
265, 1901, 1753, 2716, 2800, 2742, 453, 2742, 586, 1920, 929,
1897, 2742, 1859, 1899, 1106, 1135, 759, 730, 1838, 863, 1929,
2751, 2751, 2751, 2751, 713, 430, 2788, 1784, 966, 2483, 1784,
1786, 2727, 857, 1798, 1815, 730, 390, 593, 1489, 1448, 1784,
1510, 2788, 812, 856, 808, 941, 2797, 2757, 1852, 2757, 2412,
486, 1034, 615, 845, 974, 727, 969, 2916, 1841, 1926, 1926, 533,
446, 733, 696, 1214, 1857, 1907, 2824, 2631, 3556, 2496, 1617,
1000, 707, 936, 761, 960, 1936, 857, 423, 1130, 1165, 2453, 338,
988, 1869, 1951, 1932, 2820, 2742, 628, 447, 866, 637, 932, 2742,
1795, 2881, 695, 762, 2778, 427, 714, 2781, 1865, 1861, 678,
1465, 1770, 845, 356, 817, 385, 1820, 2692, 1787, 1510, 1814,
857, 2616, 204, 465, 1773, 2754, 1793, 1773, 1900, 185, 2706,
1162, 766, 2742, 1816, 2742, 1790, 1803, 1795, 1026, 334, 832,
478, 1849, 2679, 1773, 797, 2649, 1814, 1808, 99, 2037, 2616,
2719, 1813, 2637, 2648, 1813, 865, 1717, 2588, 2711, 2818, 1828,
2553, 2720, 1791, 1780, 2706, 2565, 1717, 1881, 1037, 329, 893,
723, 1821, 2692, 2586, 2729, 1755, 1793, 2670, 2602, 2638, 2684,
1813, 1755, 1755, 2626, 832, 739, 724, 1968, 2598, 2627, 851,
749, 684, 625, 2673, 2778, 1764, 2644, 1800, 1792, 511, 2776,
1890, 1764, 2776, 1040, 1049, 2699, 2061, 897, 1764, 274, 2755,
1912, 2581, 1780, 820, 1803, 2692, 2783, 572, 2751, 2699, 1830,
1875, 633, 1083)
Then I tried to use the bootstrap function:
> boot (Total, median, 1000)
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = Total, statistic = median, R = 1000)
Bootstrap Statistics :
original bias std. error
t1* 1603 0 0
There were 50 or more warnings (use warnings() to see the first 50)
The warning message was:
the condition has length > 1 and only the first element will be used
Can you please advise me on how to perform a bootstrap to generate a 95% confidence interval for the median? I am a beginner at this, and your help would be much appreciated.
Thank you so much in advance.
Admittedly the boot function from the boot package has a slightly non-intuitive aspect to it. But if you read the documentation (or look at the examples in the documentation) you'll see specific instructions about the statistic argument:
In all other cases statistic must take at least two arguments. The
first argument passed will always be the original data. The second
will be a vector of indices, frequencies or weights which define the
bootstrap sample.
So instead of:
x <- rnorm(10)
boot(data = x,statistic = median,R = 1000)
You want this:
boot(data = x,statistic = function(x,i) median(x[i]),R = 1000)
Once you're that far, the function boot.ci() can be used to compute the confidence intervals (only some of them are available in this particular example I believe).
b <- boot(data = x,statistic = function(x,i) median(x[i]),R = 1000)
boot.ci(b)
Though the answer by @joran is right, since I already had tested code with the CI computation, here it goes.
library(boot)
bootMedian <- function(data, indices) median(data[indices])
b <- boot(Total, bootMedian, R = 1000)
boot.ci(b)
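If only one interval type is wanted, boot.ci() accepts a type argument, and the bounds can be pulled out of the returned object. This is a minimal sketch on synthetic data (x is a simulated stand-in, not the Total vector), and med_stat is a hypothetical name. The percentile interval is shown because it needs nothing beyond the bootstrap replicates themselves.

```r
library(boot)

set.seed(1)                               # reproducible resampling
x <- rexp(200, rate = 1 / 1500)           # synthetic stand-in data, not the Total vector

# statistic must accept the data and a vector of resampled indices
med_stat <- function(data, idx) median(data[idx])
b <- boot(x, med_stat, R = 2000)

# request only the percentile interval at the 95% level
ci <- boot.ci(b, conf = 0.95, type = "perc")
ci$percent[4:5]                           # lower and upper bound of the interval
```

Replacing x with Total gives the interval for the data in the question.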
This is how you would "roll your own" bootstrap:
# number of bootstrap replicates
B <- 10000
# create empty storage container
result_vec <- vector(length=B)
for(b in 1:B) {
# draw a bootstrap sample
this_sample <- sample(Total, size=length(Total), replace=TRUE)
# calculate your statistic
m <- median(this_sample)
# save your calculated statistic
result_vec[b] <- m
}
# then probably draw a histogram of your bootstrapped replicates
hist(result_vec)
# get 95% confidence interval
result_vec <- result_vec[order(result_vec)]
lower_bound <- result_vec[round(0.025*B)]
upper_bound <- result_vec[round(0.975*B)]
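The same percentile interval can be written more compactly with replicate() and quantile(). This is a sketch only: x holds just the first ten values of Total to keep the example short, and the names meds and ci are for illustration.

```r
set.seed(42)                                   # reproducible resampling

# first ten values of Total, used here only to keep the example small
x <- c(2089, 1567, 1336, 1616, 1590, 1649, 1341, 1614, 1590, 1621)

# median of 5000 resamples drawn with replacement
meds <- replicate(5000, median(sample(x, replace = TRUE)))

# the 2.5% and 97.5% quantiles of the replicates give a 95% percentile interval
ci <- quantile(meds, probs = c(0.025, 0.975))
ci
```

quantile() does the sorting and index arithmetic from the loop version internally, which avoids off-by-one and typing mistakes in the cut-off fractions.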
I use the standard normal random generator in this code:
B <- i
bs.result <- matrix(NA, nrow=i, ncol=...)
for (b in 1:i) {
sample.n <- rnorm(n, mean=..., sd=...)
optim.b <- optim(c(mu=0, sd=1), loglik, control=list(fnscale=-1), z=sample.n)
bs.result[b, ] <- c(optim.b$par, optim.b$convergence)
}
With the last column of the table you can check whether your optimize function had converged.

Unused argument in GA package

I'm trying to use TSP package with GA. I want to do something similar to this
My code:
library(GA)
library(globalOptTests)
library(TSP)
data("USCA50")
fitFun <-
function(x)
-tour_length(solve_TSP(USCA50))
dist <- as.matrix(USCA50)
GA <- ga(
type = "permutation",
fitness = fitFun,
distMatrix = dist,
min =1,
max = 50
)
The error I get:
Error in fitness(Pop[i, ], ...) :
unused argument (distMatrix = c(0, 1167, 1579, 437, 3575, 1453, 226, 2976, 1107, 1006, 1046, 891, 1488, 1030, 1803, 190, 1122, 1373, 1860, 523, 1047, 1152, 370, 1453, 1629, 1323, 1032, 654, 1462, 752, 993, 813, 1178, 1705, 816, 1206, 1285, 1641, 1578, 1703, 1343, 1317, 1647, 1157, 1479, 1703, 1166, 1211, 795, 1572, 1167, 0, 413, 1422, 2895, 316, 1172, 3094, 140, 382, 189, 530, 392, 526, 635, 1174, 2056, 286, 692, 910, 207, 211, 1035, 303, 2046, 2164, 1385, 845, 297, 597, 1033, 393, 1766, 546, 386, 1076,
153, 476, 432, 546, 184, 184, 481, 1579, 1686, 543, 20, 2008, 527, 434, 1579, 413, 0, 1832, 2766, 167, 1585, 3265, 508, 677, 547, 842, 229, 775, 229, 1575, 2451, 275, 289, 1277, 582, 514, 1420, 207, 2347, 2544, 1720, 1189, 116, 947, 1350, 800, 2117, 138, 777, 1338, 334, 62, 106, 145, 260, 312, 128, 1911, 1961, 136, 413, 2384, 913, 131, 437, 1422, 1832, 0, 3437, 1732, 272, 2607, 1327, 1355, 1345, 1269, 1787, 1409, 2041, 615, 697, 1670, 2093, 954, 1256, 1345, 807, 1672, 1242, 8
Is there something wrong with my GA package? RStudio doesn't show me this parameter but somehow others are able to run it.
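The error message shows the call fitness(Pop[i, ], ...), so any extra argument given to ga(), here distMatrix, is forwarded to the fitness function. fitFun only accepts x, hence the unused-argument error. Below is a minimal sketch of a fitness function that accepts the forwarded argument and actually scores the candidate permutation. The names tour_len and fit_fun are hypothetical, and the four-point unit square is made-up data for illustration.

```r
# Length of the closed tour that visits the cities in the order given by `tour`
tour_len <- function(tour, distMatrix) {
  nxt <- c(tour[-1], tour[1])               # successor of each city, wrapping around
  sum(distMatrix[cbind(tour, nxt)])
}

# ga() maximises, so return the negative tour length; distMatrix arrives via `...`
fit_fun <- function(tour, distMatrix) -tour_len(tour, distMatrix)

# Made-up example: the four corners of a unit square
d <- as.matrix(dist(cbind(c(0, 0, 1, 1), c(0, 1, 1, 0))))
fit_fun(1:4, distMatrix = d)                # walking around the square has length 4
```

With that signature, a call along the lines of ga(type = "permutation", fitness = fit_fun, distMatrix = dist, lower = 1, upper = 50) should no longer complain about an unused argument. As far as I know, recent GA versions expect lower and upper where older examples use min and max.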
