Count the number of observations in a predetermined timestep - R

I have a large dataset of over 75.000 observations. Of these observations I have a list of date and time combinations. I want to calculate the observation frequency in a predetermined timestep (15, 30 or 60 minutes). The study period is from 2014-10-21 00:00 to 2015-10-21 23:59.
The raw data is stored in a DF, but date (as POSIXlt) and time (as character) are in different columns, so I combine them back into one column to create a POSIXct timestamp.
receiver$date2 = as.POSIXct(paste(receiver$date, receiver$time), format="%Y-%m-%d %H:%M:%S")
dateseq = receiver$date2
dateseq is now (only a small fragment using dput()):
dateseq = structure(c(1414140420, 1414140720, 1414140960, 1414141080, 1414143540, 1414144980, 1414145940, 1414147380, 1414147440, 1414148100, 1414148280, 1414152720, 1414153740, 1414154520, 1414154580, 1414158540, 1414159380, 1414159680, 1414164240, 1414164300, 1414164840, 1414164900, 1414165500, 1414166100, 1414166220, 1414166460, 1414166520, 1414166820, 1414166880, 1414166940, 1414167300, 1414167360, 1414167480, 1414167780, 1414168380, 1414168440, 1414168800, 1414168860, 1414202040, 1414202220, 1414202280, 1414202700, 1414202820, 1414202880, 1414203660, 1414203960, 1414215180, 1414215300, 1414215900, 1414216560, 1414216860, 1414217220, 1414217280, 1414217460, 1414217580, 1414217700, 1414217820, 1414217880, 1414218240, 1414218720, 1414219380, 1414219800, 1414219920, 1414219980, 1414220160, 1414220280, 1414220820, 1414220880, 1414221000, 1414221960, 1414222080, 1414222200, 1414222320, 1414222500, 1414222560, 1414222860, 1414223640, 1414224780, 1414225800, 1414225920, 1414225980, 1414226040, 1414226100, 1414226220, 1414227240, 1414227420, 1414227600, 1414230300, 1414230540, 1414230840, 1414231140, 1414231320, 1414231440, 1414231560, 1414231800, 1414231860, 1414232040, 1414232160, 1414232400, 1414232520, 1414232640, 1414232700, 1414232760, 1414232880, 1414232940, 1414233060, 1414233180, 1414233240, 1414233300, 1414233420, 1414233480, 1414233660, 1414233720, 1414233780, 1414233840, 1414233960, 1414234080, 1414234320, 1414234440, 1414234560, 1414234620, 1414234740, 1414234860, 1414234980, 1414235040, 1414235280, 1414236240, 1414236300, 1414236420, 1414236540, 1414236840, 1414236900, 1414236960, 1414237020, 1414237260, 1414237560, 1414237860, 1414238280, 1414238400, 1414238460, 1414238580, 1414238640, 1414239180, 1414239300, 1414239360, 1414239480, 1414239540, 1414240440, 1414240860, 1414240920, 1414240980, 1414241040, 1414242000, 1414242180, 1414242480, 1414242540, 1414242660, 1414242720, 1414242840, 1414242900, 1414243800, 1414243920, 1414244280, 1414244460, 
1414245240, 1414245600, 1414245660, 1414246080, 1414246500, 1414246680, 1414246740, 1414246920, 1414247340, 1414248180, 1414249320, 1414249560, 1414249860, 1414250340, 1414250520, 1414250640, 1414250760, 1414250880, 1414250940, 1414251060, 1414251240, 1414251900, 1414252020, 1414252080, 1414252200, 1414252260, 1414252380, 1414252440, 1414252440, 1414252500, 1414252560, 1414252680, 1414252980, 1414253160, 1414253460, 1414253580), class = c("POSIXct", "POSIXt"), tzone = "")
Then I want to have a timeseq that runs for the whole period (so also the days that don't have any observations) divided by the predetermined timestep.
timestep = 1800 # 1800 sec = 30 min
start = "2014-10-21 00:00"
end = "2015-10-21 23:59"
receiver = R125926
timeseq = seq(from = as.POSIXct(start), to = as.POSIXct(end), by = timestep)
Now I want to 'fill' a new dataframe with the timeseq in one column and the count data of how many observations (from dateseq) occurred in that time period.
EDIT
After some searching on the forum and adjusting some code, I came to one very simple method that brings me very close to what I want my results to look like:
det_interval = data.frame(table(cut(dateseq, breaks = "30 min")))
There are only two adjustments that I don't know how to make. Right now the intervals begin at the first record (e.g. when my first record is at 05:17, the intervals run from x:17 to x:47 for a 30-minute step), not at the start that I want (see the timeseq created above). So how can I make sure that the intervals start and end at a predetermined date/time?
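One way to pin the intervals to a fixed start and end is to pass the pre-built sequence itself as the breaks of cut(), instead of the string "30 min". The following is only a minimal sketch: the three timestamps, the two-hour window and the UTC time zone are made-up stand-ins for dateseq and the real study period, and the column names interval_start and count are illustrative.

```r
# Hypothetical stand-in for dateseq (three observations on one morning)
dateseq <- as.POSIXct(c("2014-10-24 08:47:00",
                        "2014-10-24 08:52:00",
                        "2014-10-24 09:31:00"), tz = "UTC")

timestep <- 1800  # 1800 sec = 30 min
start <- as.POSIXct("2014-10-24 08:00:00", tz = "UTC")
end   <- as.POSIXct("2014-10-24 10:00:00", tz = "UTC")

# Fixed break points covering the whole period, including empty intervals
timeseq <- seq(from = start, to = end, by = timestep)

# Assign each observation to the interval [break, next break)
bins <- cut(dateseq, breaks = timeseq, right = FALSE)

# table() keeps empty factor levels, so intervals without observations get 0
det_interval <- data.frame(
  interval_start = timeseq[-length(timeseq)],
  count = as.vector(table(bins))
)
det_interval
```

Observations that fall outside the range of timeseq become NA in bins and are not counted, so the sequence should span the full study period.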

Related

Determine start date within time series

I hope you're doing well! I have a theoretical time-series analysis problem I hope that you can help me sort out.
To start, you'll find a reproducible example of my dataset below. Date is on a daily timescale. Q25 is the 25th percentile (lower quartile) of my non-transformed data, Q75 is the 75th percentile (upper quartile) of my non-transformed data, fit is a locally weighted fit of the median, firstder is the first derivative of fit, and secondder is the second derivative of fit.
Plotting out fit produces two oscillations and then a steady increase in the data. Plotting the quartiles around that produces a large spread of data that narrows towards the increase in fit data. The first derivative shows the rate of change of the fit and this is where my issue comes in. I'm not sure where the increase in fit data starts based on the first derivative data. Logically, I know the signal-to-noise start date has to occur after March 7 (based on the quartiles), and before March 20 (before the steady increase in fit data). And this is also represented in the first derivative for about the same interval where the negative-to-positive inflection point changes on March 5th, becomes positive on March 16th, and then produces a stationary time series.
All that being said, should my exact start date be the change in the inflection point from the first derivative, or be the first positive value on March 16th?
I appreciate your time in this problem and any thoughts you may have!
data<-structure(list(Date = structure(c(1485950474, 1486036874, 1486123274,
1486209674, 1486296074, 1486382474, 1486468874, 1486555274, 1486641674,
1486728074, 1486814474, 1486900874, 1486987274, 1487073674, 1487160074,
1487246474, 1487332874, 1487419274, 1487505674, 1487592074, 1487678474,
1487764874, 1487851274, 1487937674, 1488024074, 1488110474, 1488196874,
1488283274, 1488369674, 1488456074, 1488542474, 1488628874, 1488715274,
1488801674, 1488888074, 1488974474, 1489060874, 1489147274, 1489233674,
1489320074, 1489406474, 1489492874, 1489579274, 1489665674, 1489752074,
1489838474, 1489924874, 1490011274, 1490097674, 1490184074, 1490270474,
1490356874, 1490443274, 1490529674, 1490616074, 1490702474, 1490788874,
1490875274, 1490961674, 1491048074, 1491134474, 1491220874, 1491307274,
1491393674, 1491480074, 1491566474, 1491652874, 1491739274, 1491825674,
1491912074, 1491998474, 1492084874, 1492171274, 1492257674, 1492344074,
1492430474, 1492516874, 1492603274, 1492689674, 1492776074, 1492862474,
1492948874, 1493035274, 1493121674, 1493208074, 1493294474, 1493380874,
1493467274, 1493553674, 1493640074, 1493726474, 1493812874, 1493899274,
1493985674, 1494072074, 1494158474, 1494244874, 1494331274, 1494417674,
1494504074, 1494590474, 1494676874, 1494763274, 1494849674, 1494936074,
1495022474, 1495108874, 1495195274, 1495281674, 1495368074), tzone = "UTC", class = c("POSIXct",
"POSIXt")), Q25 = c(-1.61495132528742, -3.86616056128065, -3.92140420424278,
-4.8011229557052, -8.64427034627082, -3.11323607034871, -4.3673083843457,
-1.45023104534208, 0.395769745934938, -1.49394189431791, -3.54063822876105,
-4.36090193633662, -0.966958995958447, -2.43233048854294, -0.181367797683111,
0.826258942687981, 3.36833418895383, -6.8991417494414, -1.15773470862185,
-1.75360705873163, 1.83790453304777, 2.11575746130393, -3.82025172988123,
0.679651741170909, -4.64628184041103, -6.91923314565111, 0.550274303541761,
0.104011128328036, -0.895257855280075, -0.801630235696042, 2.27958927430356,
2.98003963398985, 3.41649824319921, 1.56559818977215, -2.20923132476973,
0.552658760232765, 0.15158829140461, -4.75454688546242, -0.595460561248954,
-2.53729443345183, -0.826010503400985, -5.20578683534568, -2.78364193219594,
-3.62503323095109, 3.37820215582788, -2.53645164034493, -1.76051141957494,
-1.0256290530567, 1.94178279643985, 0.261239031590387, 0.00321585342072063,
2.87814873140354, -2.26732156613212, 2.65097224867168, -4.16746046231376,
1.64816233695592, 3.50505415841016, 2.83685877611882, 1.66353660199615,
2.27900517713667, 5.47721995923733, -5.31044894311933, 7.30753839733595,
5.50143585044911, -1.25129055380416, -2.41051058119916, 3.69266303212359,
2.28752278841533, -0.275687673398348, 5.74597173218469, 6.5773422259343,
3.72096844335478, 2.05388534852328, 5.41063696868948, 0.526467452167141,
1.60445671702256, -1.80394989627014, 1.56432488418924, 5.95370989889123,
7.94953250403525, 4.09121878799004, 2.11516919787794, -2.12808005361608,
6.77215849921842, 9.53718510298556, 4.16562173164636, 10.4573226478082,
7.703077796612, 7.55811710979136, 4.47194951592662, 10.2104312432178,
11.3454383477984, 0.997649090931488, 4.84898050707927, 10.8819209584302,
8.06296236341084, 11.3317616787558, 7.51878628894305, 7.87729934765305,
11.9108509727303, 6.77401202490232, 5.36297357453455, 10.6362047038983,
8.68979831512869, 4.0465996534104, 11.9579904470733, 9.41141176380086,
10.5754750604254, 12.6944336852953, 7.61563466861022), Q75 = c(5.93775779359077,
26.4536084846094, 7.92690107568623, 16.195405687679, 3.47567054091916,
34.9690262666155, 15.5126126583077, 24.4425589002446, 29.7425859431597,
23.1420118192775, 26.827758017105, 18.6306368759596, 19.759179203689,
10.0667740183259, 30.9080218485755, 10.0628623899296, 21.1120424008512,
12.1232187464341, 14.9571040303508, 11.4927011052638, 16.1617172813173,
19.0606972964125, 8.39991659547325, 9.5080530252195, 10.2717546026802,
12.018391863395, 27.2666992661895, 12.5172584337237, 19.9658806224003,
6.90019918091751, 18.4119063276997, 23.2991253786256, 27.95161418973,
16.9477966472485, 26.3880458021082, 19.2178725103802, 5.58699033890406,
9.82525729279156, 6.22139350667344, 5.6625294221828, 8.18283315939774,
4.78856479855966, 4.91215612536983, 5.35278870440784, 15.7471499356884,
7.95473965312171, 7.58463611165082, 6.03119890210746, 9.88624343762245,
6.66377352843609, 6.92675024060609, 7.20403099201013, 6.96877369392089,
17.7034248870798, 6.22890341708267, 6.1624397247754, 23.3856864094132,
7.13518162203812, 6.96344109315883, 7.69414570220079, 18.0859103957135,
7.52300478408242, 10.1635801549871, 10.021556657451, 8.51746254314866,
7.83000625461296, 15.4938419153615, 8.6844260972191, 8.07596479745038,
13.1423674521087, 8.04161364299224, 10.7442773622841, 8.58410892324644,
9.08436532340561, 8.84748510783176, 9.27529549461203, 9.01978932806698,
9.99776533859531, 9.61123990151036, 11.2228855544025, 10.3285714984086,
10.7107229417799, 11.452541129334, 11.9951421202043, 11.3568792509498,
11.139621487692, 12.957244784325, 13.1010906952192, 11.8445972599726,
12.8124554609003, 12.1817389611984, 13.4529860098547, 13.1808997426024,
12.568956945967, 13.9405958892683, 14.4445923505263, 14.5816203429081,
12.798362023978, 13.7926596005317, 14.3284196983115, 14.3967490595795,
14.3699332949429, 13.8061418130819, 15.4045229902535, 15.328632395916,
15.5928587109464, 15.5111381098579, 15.7167488979248, 16.4121827249844,
16.7700564366026), fit = c(1.3157822724014, 1.44491806546299,
1.67963756121542, 1.96834398237369, 2.32222986513481, 2.73223146146706,
3.16143742264514, 3.74278329406317, 4.4673163398484, 5.08529278937518,
5.58735598987316, 6.01592790788482, 6.19893270175371, 6.0219082198616,
5.64253432163072, 5.29694818196536, 4.89670493804841, 4.35145910275626,
3.89449691453349, 3.48150649031492, 3.06858491643756, 2.88963188544926,
3.13399806321574, 3.62311989322663, 4.03902573446563, 4.40598627768245,
4.84291047423098, 5.1737840740012, 5.3972440468493, 5.5747020603732,
5.62430591107552, 5.42843052467024, 5.07513358262307, 4.79108701506415,
4.59907825712695, 4.39731440509327, 4.22559688081583, 4.10100609028878,
4.00444369172723, 3.92144298531529, 3.82259220819525, 3.72499526558926,
3.68395895980124, 3.69588308031619, 3.73924432798967, 3.84246487218137,
4.07884774763199, 4.41108295888359, 4.70167312999791, 4.95537881350854,
5.2206483181831, 5.42551590243433, 5.52148736399275, 5.55736071284688,
5.60710852579646, 5.65757759073701, 5.68911425674423, 5.76594044238814,
5.93786454015275, 6.15175825295678, 6.31743846502224, 6.40077523837882,
6.45704948591979, 6.53019436816257, 6.59356685208809, 6.63353784524384,
6.71356141899707, 6.88849022040772, 7.11437487009308, 7.30646639975639,
7.43724432723552, 7.55279324817994, 7.67877181101032, 7.76924002146674,
7.83161170884946, 7.97157625691941, 8.25223488219952, 8.60947602940562,
8.95816992458796, 9.34076728750423, 9.77554331222275, 10.1411049362597,
10.3842988541376, 10.5696053585185, 10.7520817841281, 10.9357595672387,
11.0970528791622, 11.2495931571849, 11.3764752236255, 11.4864715266717,
11.6317299424136, 11.8381584436134, 12.0667779318613, 12.2724056764894,
12.462010561811, 12.6517333832877, 12.8101492769744, 12.9055352762602,
12.9678598772259, 13.0582354099638, 13.1489397497677, 13.2204738414797,
13.346284619515, 13.6054940294766, 13.9436193637562, 14.2337005769519,
14.5449448398809, 14.8895799498019, 15.0551768009747, 15.0689572800127
), firstder = c(0.0542499277820437, 0.193160412687084, 0.264645386746196,
0.318230646770668, 0.390583391620104, 0.410606699200811, 0.484714112557398,
0.683182658658343, 0.699350916534123, 0.546311900646561, 0.476582322984034,
0.33921923563074, 0.000346679118119919, -0.32275830659655, -0.377372654859586,
-0.342379980870621, -0.492111485610006, -0.524917784293232, -0.414059192641829,
-0.430018688265099, -0.343482693656914, 0.0295127267198723, 0.42189373253822,
0.482044173095213, 0.364522990904745, 0.40991488301477, 0.40715895907959,
0.264020778627613, 0.200548459021332, 0.136695124879259, -0.0667758528503706,
-0.308783766357995, -0.344835056787729, -0.22338628389576, -0.19056674389956,
-0.195775242472453, -0.146360055189657, -0.107867992742261, -0.0856184200473131,
-0.0883963049921002, -0.106496806989568, -0.0747428483921662,
-0.0103234284849929, 0.028493059030597, 0.0620691939868203, 0.163240621281308,
0.304123951137378, 0.325609827601989, 0.261609166722046, 0.261432729205552,
0.249586474110962, 0.150199026157553, 0.0521536950613295, 0.0370628072573624,
0.0565243651980056, 0.0371337817771211, 0.0409727028064402, 0.124422569131023,
0.207609809433488, 0.20232516927351, 0.121600832063498, 0.058433044321534,
0.0638003776220697, 0.0745713396178918, 0.0471802520722933, 0.0467708263829785,
0.126045851395065, 0.213953247074989, 0.220308792495525, 0.158550331399022,
0.11422743390592, 0.123806714974779, 0.114997378074604, 0.0651990840907102,
0.0828996021118185, 0.210617558388392, 0.336478451788591, 0.356675237198802,
0.354610868118913, 0.419333862640583, 0.419974858146042, 0.301270480481834,
0.201419853206041, 0.17882844049566, 0.186628379656891, 0.172934534594114,
0.156583940148236, 0.142289490196014, 0.11234075824169, 0.119081439314575,
0.177295034391252, 0.226764155293772, 0.22057696671022, 0.193643700730051,
0.190744241252391, 0.181381161962744, 0.127949080877661, 0.0681406193708671,
0.0729227267768433, 0.0983314755975622, 0.0766175682172481, 0.0819140886989596,
0.188151474480757, 0.320764600927798, 0.32011829707578, 0.283266397091015,
0.351814702578002, 0.276441515194414, 0.0724489974588587, -0.0273030060468944
), secondder = c(0.172623240328004, 0.105197729482076, 0.0377722186361492,
0.0693983014127931, 0.0753071882860794, -0.0352605731246656,
0.18347539983784, 0.213461692364051, -0.181125176612492, -0.124952855162631,
-0.0145063001624228, -0.260219874544165, -0.417525238481075,
-0.228684732948264, 0.119456036422192, -0.0494706884442619, -0.249992321034509,
0.184379723668058, 0.037337459634748, -0.0692564508812885, 0.242328440097658,
0.503662400655915, 0.281099610980781, -0.160798729866795, -0.0742436345141417,
0.165027418734192, -0.170539266604553, -0.1157370942994, -0.0112075449131641,
-0.116499123370982, -0.290442832088276, -0.193572994926973, 0.121470414067507,
0.12142713171643, -0.055788051724031, 0.0453710545782453, 0.0534593199873461,
0.0235248049074466, 0.0209743404824492, -0.0265301103720232,
-0.00967089362291196, 0.0731788108177152, 0.0556600289966314,
0.0219729460345484, 0.0451793238778984, 0.157163530711077, 0.124603129001063,
-0.081631376071841, -0.0463699456880455, 0.0460170706550578,
-0.0697095808442372, -0.129065315062581, -0.0670253471298663,
0.0368435715219322, 0.0020795443593542, -0.0408607112011232,
0.0485385532597613, 0.118361179389404, 0.0480133012155273, -0.058582581535485,
-0.102866092884539, -0.0234694825993884, 0.0342041492004599,
-0.0126622252088158, -0.0421199498823812, 0.0413010985037516,
0.117248951520421, 0.0585658398394289, -0.045854748998357, -0.0776621731946507,
-0.0109836217915529, 0.0301421839292724, -0.0477608577296227,
-0.0518357302381656, 0.0872367662803821, 0.168199146272765, 0.0835226405276321,
-0.0431290697072093, 0.039000331547431, 0.0904456574959092, -0.0891636664849909,
-0.148245088843424, -0.0514561657081618, 0.00627334028739845,
0.00932653803506511, -0.036714228160621, 0.00401303926886598,
-0.0326019391733094, -0.0272955247353401, 0.0407768868811118,
0.0756503032722406, 0.0232879385327998, -0.0356623156999039,
-0.0182042162604343, 0.012405297305115, -0.0311314558844096,
-0.0757327062857556, -0.0438842167278324, 0.0534484315397847,
-0.00263093389834701, -0.0407968808622812, 0.0513899218257041,
0.161084849737891, 0.10414140315619, -0.105434010860225, 0.0317302108906947,
0.105366400083279, -0.256112774850456, -0.151872260620654, -0.0476317463908522
)), row.names = c(NA, -110L), class = c("tbl_df", "tbl", "data.frame"
))
If the problem is to find where the fit column starts rising then fit a curve made up of a horizontal line segment followed by a sloped line segment (red in the graph) and report the changepoint (Date0 and dashed line in graph).
# calculate starting values, st
fm0 <- lm(fit ~ Date, data, subset = seq(to = nrow(data), length = 20))
st <- c(mean(data$fit[1:20]), coef(fm0))
names(st) <- c("a0", "a", "b")
fm <- nls(fit ~ pmax(a0, a + b * as.numeric(Date)), data, start = st)
# solve a0 = a + b * Date0 for Date0 using calculated a0, a and b
Date0 <- with(as.list(coef(fm)), .POSIXct((a0 - a)/b))
plot(fit ~ Date, data, ylab = "")
lines(fitted(fm) ~ Date, data, col = "red")
abline(v = Date0, lty = 2)
Date0
## [1] "2017-03-21 07:53:56 EDT"
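To see the flat-then-sloped fit in isolation, here is a small sketch on synthetic data where the true changepoint is known. Everything in it is assumed for illustration: the series is flat at 5 until x = 60 and then rises by 0.2 per step, with a little Gaussian noise, and x is a plain index rather than a date.

```r
# Synthetic series (hypothetical): flat at 5 until x = 60, then rising
set.seed(42)
x <- 1:100
y <- pmax(5, 5 + 0.2 * (x - 60)) + rnorm(length(x), sd = 0.1)
d <- data.frame(x = x, y = y)

# Starting values: level from the first 20 points, line from the last 20
fm0 <- lm(y ~ x, data = d, subset = 81:100)
st <- c(a0 = mean(d$y[1:20]),
        a  = unname(coef(fm0)[1]),
        b  = unname(coef(fm0)[2]))

# Horizontal segment (a0) followed by a sloped segment (a + b * x)
fm <- nls(y ~ pmax(a0, a + b * x), data = d, start = st)

# The changepoint x0 solves a0 = a + b * x0
cf <- coef(fm)
x0 <- unname((cf["a0"] - cf["a"]) / cf["b"])
x0
```

With noise this small the estimate should land close to 60. On real data the result depends on how well a single flat segment describes everything before the rise, which is worth checking against the plot.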

Remove all rows above and below a value in R

We have citizen scientists recording data for us using In-Situ Aqua TROLL 600 instruments. The instrument is similar to a CTD, but the data format is different enough that I cannot use ctdTrim from the oce package in R. I need to remove all the rows recorded during the soak time (time in the water before they start lowering the instrument) and during the upcast, that is, all the rows after the instrument reached the maximum depth. So I just need the center portion of my data frame.
My Data
Date Time Salinity (ppt) (672441) Chlorophyll-a Fluorescence (RFU) (671721) RDO Concentration (mg/L) (672144) Temperature (°C) (676121) Depth (ft) (671051)
16:29.0 0 0.01089297 7.257619 31.91303 0.008220486
16:31.0 0 0.01765913 7.246986 31.93175 0.1499496
16:33.0 0 0.0130412 7.258863 31.93253 0.5387784
16:35.0 0 0.01299242 7.274049 31.93806 0.6187978
16:37.0 0 0.01429801 7.26965 31.94401 0.6640261
16:39.0 0 0.01342988 7.271608 31.93595 0.681709
16:41.0 0 0.01337719 7.271549 31.93503 0.684597
16:43.0 7.087267 0.007094439 6.98015 31.89018 1.598019
16:45.0 28.3442 0.007111916 6.268753 31.83806 1.687673
16:47.0 31.06357 0.007945394 6.197834 31.77821 1.418773
16:49.0 32.07076 0.0080788 6.166986 31.76881 1.382685
16:51.0 31.95504 0.004382414 6.191305 31.72906 1.358556
16:53.0 36.21165 0.01983912 5.732656 29.3942 123.4148
16:55.0 36.37849 0.02243886 5.626586 28.82502 125.2927
16:57.0 36.43061 0.02416219 5.450325 28.23787 126.7997
16:59.0 36.44484 0.02441683 5.421676 28.14037 127.0321
17:01.0 36.46815 4.510316 5.318929 28.09501 127.2064
17:03.0 36.41381 4.012657 5.241654 28.14595 127.2227
17:05.0 36.42724 0.7891375 5.174401 28.20383 127.2019
17:07.0 36.41064 0.4351442 5.120181 28.18592 127.197
17:09.0 36.38155 0.2253969 5.033384 28.21021 127.1895
17:11.0 36.37671 0.2089337 5.019629 28.21222 127.1885
17:13.0 36.43813 0.08728585 4.981099 28.17526 127.2223
17:15.0 36.47644 0.904435 4.951878 28.13579 127.2108
17:17.0 36.54742 0.1230291 4.93056 28.06166 127.2307
17:19.0 36.60466 10.04291 4.908442 27.9397 126.6003
17:21.0 36.61511 11.33922 4.904828 27.92038 126.5161
17:23.0 36.68179 0.6680982 4.87018 27.78319 123.707
17:25.0 36.74612 0.06539913 4.848994 27.72977 119.906
17:27.0 36.75729 0.02414635 4.826871 27.72545 114.9537
17:29.0 37.1578 0.01556828 4.804105 27.81129 113.3405
> depthmax<- max(WS$`Depth (ft) (671051)`, na.rm = TRUE)
> output <- WS[WS$"Depth (ft) (671051)" < depthmax,]
> Output2 <- output[output$"Depth (ft) (671051)" > 1,]
I tried these and got output2 to work but can't seem to get output to work. Is there a more elegant way to do this? To recap, I need to remove all rows after depthmax (127.2307) and all the rows before the depth at which they start lowering the instrument (~2.41).
Your code does remove the maximum depth, but not the rows after the maximum depth is reached. You want to locate the row index of the maximum depth and delete that row and the ones after it. Note that which() and which.max() already skip NA values, whereas wrapping the column in na.omit() would shift the indices whenever a depth is missing:
depth <- WS$`Depth (ft) (671051)`
start <- tail(which(depth < 2.41), 1) + 1
end <- which.max(depth) - 1
output <- WS[start:end, ]
start is the index of the last row shallower than 2.41 plus 1, which gives the first row to keep. end is the index of the maximum depth minus 1, which gives the row just before the maximum.
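Here is the same idea as a self-contained sketch on a toy cast. The data frame, its depth column and the values are made up; the 2.41 threshold comes from the question. The one change from the code above is that the search for the end of the soak is limited to rows before the maximum, which is an added assumption to cover casts where the upcast comes back near the surface.

```r
# Toy cast (hypothetical values): soak, downcast, maximum, upcast to surface
WS <- data.frame(
  depth = c(0.1, 0.5, 1.2, 10, 50, 100, 127, 120, 60, 1.0)
)

soak_limit <- 2.41  # depth at which lowering starts, from the question

# Row just before the maximum depth (which.max ignores NA values)
end <- which.max(WS$depth) - 1

# Last shallow row *before* the maximum, plus one; restricting the search
# keeps shallow upcast rows from being mistaken for the soak
start <- tail(which(WS$depth[seq_len(end)] < soak_limit), 1) + 1

output <- WS[start:end, , drop = FALSE]
output
```

If no row before the maximum is shallower than the threshold, start comes back empty and the subset fails, so a real script would want a check for that case.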

Error while fitting data in auto.arima - R

I am running auto.arima to forecast time series data and getting the following warnings:
1: The time series frequency has been rounded to support seasonal differencing.
2: In value[3L] : The chosen test encountered an error, so no seasonal differencing is selected. Check the time series data.
This is what I am executing:
fit <- auto.arima(data,seasonal = TRUE, approximation = FALSE)
I have weekly time series data.
This is what dput(data) looks like:
structure(c(12911647L, 12618317L, 12827388L, 12967840L, 13264925L,
13557838L, 13701131L, 13812463L, 13971928L, 13837658L, 13550635L,
13022371L, 13507596L, 13456736L, 12992393L, 12831883L, 13262301L,
12831691L, 12808893L, 12726330L, 11893457L, 12434051L, 12363464L,
12077055L, 12107221L, 11986124L, 11997087L, 12264971L, 12164412L,
12438279L, 12733842L, 12543251L, 12627134L, 12480153L, 12276238L,
12443655L, 12497753L, 12279060L, 12549138L, 12308591L, 12416680L,
12516725L, 12326545L, 12772578L, 12524848L, 13429830L, 14188044L,
16611840L, 16476565L, 15659941L, 10785585L, 12150894L, 13436366L,
12985213L, 13097555L, 13204872L, 13786040L, 13760281L, 13295389L,
14734578L, 15043941L, 14821169L, 14361765L, 14300180L, 14357964L,
14271892L, 13248168L, 13813784L, 14092489L, 14100024L, 13378374L,
13225650L, 12582444L, 13267163L, 13026181L, 12747286L, 12707074L,
12534595L, 12546094L, 13030406L, 12950360L, 12814398L, 13405187L,
13277755L, 13142375L, 12742153L, 12610817L, 12267747L, 12570075L,
12704157L, 12835948L, 12851893L, 12978880L, 13104906L, 12754018L,
13213958L, 13584642L, 13963433L, 14471672L, 16312595L, 16630000L,
16443882L, 11555299L, 12018373L, 13031876L, 13013945L, 13164137L,
13313246L, 13652605L, 13803606L, 13308310L, 14466211L, 15092736L,
15346015L, 14467260L, 14767785L, 13914271L, 14185070L, 13851028L,
13605858L, 13597999L, 13876994L, 13026270L, 13113250L, 12288727L,
12925846L, 13525010L, 12594472L, 12654512L, 12888260L), .Tsp = c(2016.00819672131,
2018.48047598209, 52.1785714285714), class = "ts")
This is how I am reading data from the csv
library(lubridate)  # provides decimal_date() and ymd()
read_data <- read.csv(file="data.csv", header=TRUE)
data_ts <- ts(read_data, freq=365.25/7, start=decimal_date(ymd("2016-1-4")))
data <- data_ts[, 2]
This is the data in the csv:
Year si_act
1/4/16 12911647
1/11/16 12618317
1/18/16 12827388
1/25/16 12967840
2/1/16 13264925
2/8/16 13557838
2/15/16 13701131
2/22/16 13812463
2/29/16 13971928
3/7/16 13837658
3/14/16 13550635
3/21/16 13022371
3/28/16 13507596
4/4/16 13456736
4/11/16 12992393
4/18/16 12831883
4/25/16 13262301
5/2/16 12831691
5/9/16 12808893
5/16/16 12726330
5/23/16 11893457
5/30/16 12434051
6/6/16 12363464
6/13/16 12077055
6/20/16 12107221
6/27/16 11986124
7/4/16 11997087
7/11/16 12264971
7/18/16 12164412
7/25/16 12438279
8/1/16 12733842
8/8/16 12543251
8/15/16 12627134
8/22/16 12480153
8/29/16 12276238
9/5/16 12443655
9/12/16 12497753
9/19/16 12279060
9/26/16 12549138
10/3/16 12308591
10/10/16 12416680
10/17/16 12516725
10/24/16 12326545
10/31/16 12772578
11/7/16 12524848
11/14/16 13429830
11/21/16 14188044
11/28/16 16611840
12/5/16 16476565
12/12/16 15659941
12/19/16 10785585
12/26/16 12150894
1/2/17 13436366
1/9/17 12985213
1/16/17 13097555
1/23/17 13204872
1/30/17 13786040
2/6/17 13760281
2/13/17 13295389
2/20/17 14734578
2/27/17 15043941
3/6/17 14821169
3/13/17 14361765
3/20/17 14300180
3/27/17 14357964
4/3/17 14271892
4/10/17 13248168
4/17/17 13813784
4/24/17 14092489
5/1/17 14100024
5/8/17 13378374
5/15/17 13225650
5/22/17 12582444
5/29/17 13267163
6/5/17 13026181
6/12/17 12747286
6/19/17 12707074
6/26/17 12534595
7/3/17 12546094
7/10/17 13030406
7/17/17 12950360
7/24/17 12814398
7/31/17 13405187
8/7/17 13277755
8/14/17 13142375
8/21/17 12742153
8/28/17 12610817
9/4/17 12267747
9/11/17 12570075
9/18/17 12704157
9/25/17 12835948
10/2/17 12851893
10/9/17 12978880
10/16/17 13104906
10/23/17 12754018
10/30/17 13213958
11/6/17 13584642
11/13/17 13963433
11/20/17 14471672
11/27/17 16312595
12/4/17 16630000
12/11/17 16443882
12/18/17 11555299
12/25/17 12018373
1/1/18 13031876
1/8/18 13013945
1/15/18 13164137
1/22/18 13313246
1/29/18 13652605
2/5/18 13803606
2/12/18 13308310
2/19/18 14466211
2/26/18 15092736
3/5/18 15346015
3/12/18 14467260
3/19/18 14767785
3/26/18 13914271
4/2/18 14185070
4/9/18 13851028
4/16/18 13605858
4/23/18 13597999
4/30/18 13876994
5/7/18 13026270
5/14/18 13113250
5/21/18 12288727
5/28/18 12925846
6/4/18 13525010
6/11/18 12594472
6/18/18 12654512
6/25/18 12888260
I was able to read the data without any errors. Initially I had 160 records and the model did not throw any error, but when I removed the last 30 records for an 80/20 train/test split, this error cropped up. Even now, if I run it with all the data I don't get any error, but if I run it with the first 130 records as the 80% split, I get this error.
When using auto.arima with seasonal = TRUE, the seasonal period S is not estimated but taken from the frequency of the ts object you provide. So in your case S ≈ 52.18.
If the frequency of the time series is not an integer, S is rounded to the nearest integer, so auto.arima takes S = 52.
With S = 52 and data of length 150 it becomes difficult to calibrate a seasonal ARIMA model: e.g. if P = 2 and all other orders are zero, the first 104 observations cannot be used. I guess that is what the warning is about. You are being told that the seasonal component cannot be calibrated because of the large period S (or because your series is short).
So either get a longer data history, or aggregate your data to monthly data (so that S = 12).
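As a rough illustration of the arithmetic and of the monthly aggregation, here is a base-R sketch. The weekly values are random stand-ins rather than the data from the question (only the dates match), and using the monthly mean instead of the sum is an assumption about what si_act measures.

```r
# Hypothetical weekly data standing in for the CSV (random values, same dates)
set.seed(1)
weekly <- data.frame(
  date   = seq(as.Date("2016-01-04"), by = "week", length.out = 130),
  si_act = round(runif(130, 1.2e7, 1.6e7))
)

# Rounded seasonal period, and the observations used up by two
# seasonal AR lags (P = 2)
S    <- round(365.25 / 7)  # 52
lost <- 2 * S              # 104

# Aggregate to monthly means and rebuild the series with frequency 12
weekly$month <- format(weekly$date, "%Y-%m")
monthly    <- aggregate(si_act ~ month, data = weekly, FUN = mean)
monthly_ts <- ts(monthly$si_act, start = c(2016, 1), frequency = 12)

length(monthly_ts)
frequency(monthly_ts)
```

The monthly series can then be passed to auto.arima in place of the weekly one. It still covers only two and a half years, so a seasonal model may remain hard to estimate, but each seasonal lag now costs 12 observations instead of 52.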

momentjs calculates date difference incorrectly

In my angular web application, I want to compare two dates to see if a person is less than 18 years old when she/he entered the company. Here is the code I use to do this:
const dayOfBirth = moment(formControl.value, this.dateFormat, true).startOf('day');
const entranceDateControl = this.wizardFormGroup.get('entranceDate');
const entranceDate = moment(entranceDateControl.value, this.dateFormat, true).startOf('day');
// Check validation rule R3: Age is less than 18 compared to entrance date
const difference = moment.duration(Math.abs(entranceDate.diff(dayOfBirth)));
if (difference.years() < 18) {
  const validationMessage = this.getValidationMessage('R3', formControlName);
  return validationMessage ? validationMessage.message : null;
}
As you can see, I am using startOf('day') to get rid of any time component so that I only handle dates. I use diff() to get the difference between two dates and then duration() to convert the difference to years, months, days, etc. Using this code, the validation message should NOT show when the person is turning 18 years old on the day when she/he entered the company.
Upon testing this, I came across what is, in my opinion, strange behavior. Depending on months and years used, it gave different results. For instance, for these dates it was Ok:
dayOfBirth = 1998-03-01, 1998-04-01, ..., 2000-02-01
entranceDate = 2016-03-01, 2016-04-01, ..., 2018-02-01
But the following dates returned the validation message:
dayOfBirth = 2000-03-01, 2000-04-01, ..., 2002-02-01
entranceDate = 2018-03-01, 2018-04-01, ..., 2020-02-01
After these dates, i.e. using 2002-03-01 and onward, it works again. I also got wrong result for the dates preceding 1998-03-01.
Now, I had a closer look at the Duration object and noticed that in the cases where it reported less than 18 years, it had calculated 86,400,000 milliseconds (one day) less than in the cases where it correctly concluded that there were 18 years between the dates.
Correct duration
----------------
dayOfBirth = 1998-03-01, 1998-04-01, ..., 2000-02-01
entranceDate = 2016-03-01, 2016-04-01, ..., 2018-02-01
Duration = 568080000000 ms
Wrong duration
--------------
dayOfBirth = 2000-03-01, 2000-04-01, ..., 2002-02-01
entranceDate = 2018-03-01, 2018-04-01, ..., 2020-02-01
Duration = 567993600000 ms
Duration difference
-------------------
568080000000 - 567993600000 = 86400000 ms = 24 hours = 1 day
Has anyone an explanation for this? Can it be considered a bug in momentjs? Any viable workaround for this?
I didn't go into the details of the moment source code, but it seems duration() is playing tricks on you. Simplify the code and rely only on diff as follows and you should be good (at least it seems to work for the samples you provided). And it's easier on the eyes :)
const moment = require('moment')
const dayOfBirth = moment('2000-03-01').startOf('day');
const entranceDate = moment('2018-03-01').startOf('day');
const difference = entranceDate.diff(dayOfBirth, 'years')
if (difference < 18) {
  console.log('<18')
} else {
  console.log('>=18')
}
will output >=18
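The one-day gap in the question is a leap-day effect, and plain date arithmetic is enough to show it. The sketch below uses R only to count days; it does not call moment. The comparison with an average Gregorian year of 365.2425 days is an assumption about how a bare millisecond duration could be turned into years, not something checked against the moment source.

```r
# Days between birth date and entrance date for one pair from each group
d_ok  <- as.numeric(as.Date("2016-03-01") - as.Date("1998-03-01"))
d_bad <- as.numeric(as.Date("2018-03-01") - as.Date("2000-03-01"))

# The first span contains five leap days (2000, 2004, 2008, 2012, 2016),
# the second only four (2004, 2008, 2012, 2016)
d_ok   # 6575
d_bad  # 6574

# Same figures in milliseconds, matching the durations in the question
d_ok  * 86400000  # 568080000000
d_bad * 86400000  # 567993600000

# 18 years measured with an average year length (assumed conversion)
avg18 <- 18 * 365.2425  # 6574.365 days
c(d_bad < avg18, avg18 < d_ok)
```

Any fixed year length puts one of the two spans just under 18 years, because a bare number of milliseconds no longer knows which calendar dates it came from. Comparing the two dates directly, as diff(..., 'years') does, avoids that.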

Extend dates in frequency table

I have different data sets that look like the following:
structure(c(1414406460, 1414635660, 1414636260, 1414636920, 1414637040,
1414639020, 1414711140, 1414714500, 1414718760, 1414718820, 1414727520,
1414727640, 1414727760, 1414898220, 1414898880, 1414899000, 1414899120,
1414899240, 1414899300, 1414899480, 1414899600, 1414900140, 1414900260,
1414900560, 1414900620, 1414900860, 1414901040, 1414919940, 1414920420,
1414951320, 1414971720, 1414977600, 1414977660, 1414978620, 1414984980,
1414988100, 1414988160, 1414989000, 1414989240, 1414989900, 1414990020,
1414990080, 1414990320, 1414990500, 1414990920, 1414991100, 1414991220,
1414991340, 1414991940, 1414992240, 1414992780, 1414992900, 1414993020,
1414993140, 1415001600, 1415001660, 1415001720, 1415001840, 1415001960,
1415002140, 1415002260, 1415003280, 1415003340, 1415018040, 1415040780,
1415040900, 1415040960, 1415041080, 1415041320, 1415041440, 1415041560,
1415041980, 1415042220, 1415042280, 1415042400, 1415042820, 1415043240,
1415043360, 1415043420, 1415043540, 1415043600, 1415043660, 1415043720,
1415043840, 1415043900, 1415044020, 1415044260, 1415044320, 1415044440,
1415044560, 1415044620, 1415044680, 1415044800, 1415044860, 1415044920,
1415044980, 1415045100, 1415045160, 1415045280, 1415045340, 1415045400,
1415045460, 1415045520, 1415045700, 1415045760, 1415045820, 1415045940,
1415046000, 1415046120, 1415046240, 1415046360, 1415046420, 1415046540,
1415046600, 1415046720, 1415046780, 1415046900, 1415047020, 1415047080,
1415047140, 1415047260, 1415047320, 1415047380, 1415047440, 1415047560,
1415047620, 1415047680, 1415047740, 1415047800, 1415047920, 1415048040,
1415048160, 1415048280, 1415048340, 1415048460, 1415048520, 1415048580,
1415048640, 1415048760, 1415048820, 1415048940, 1415049000, 1415049060,
1415049120, 1415049180, 1415049420, 1415049600, 1415049720, 1415049780,
1415049900, 1415049960, 1415050020, 1415050140, 1415050320, 1415050500,
1415050620, 1415050860, 1415050980, 1415051220, 1415051520, 1415051580,
1415051700, 1415051760, 1415051880, 1415051940, 1415052000, 1415052120,
1415052180, 1415052240, 1415052360, 1415052540, 1415052660, 1415052720,
1415052780, 1415052900, 1415053020, 1415053080, 1415053140, 1415053260,
1415053380, 1415053440, 1415053500, 1415053560, 1415053620, 1415053680,
1415053800, 1415053860, 1415053980, 1415054100, 1415054160, 1415054760,
1415054820, 1415055120, 1415055180, 1415055480, 1415056920, 1415057160,
1415057760, 1415057760, 1415058720, 1415067660, 1415067780, 1415067900,
1415068020, 1415068080, 1415068140, 1415068980, 1415069100, 1415069400,
1415069520, 1415069580, 1415070000, 1415070060, 1415075760, 1415076720,
1415076840, 1415077200, 1415077500, 1415077620, 1415082600, 1415082660,
1415083140, 1415083260, 1415083380, 1415083440, 1415083500, 1415083800,
1415084100, 1415136600, 1415141340, 1415142780, 1415212920, 1415304360,
1415319300, 1415319900, 1415320320, 1415329140, 1415337660, 1415338680,
1415338980, 1415339040, 1415339160, 1415348640, 1415348820, 1415348940,
1415349000, 1415349060, 1415349120, 1415349180, 1415371200, 1415371320,
1415371500, 1415371800, 1415371920, 1415372040, 1415372100, 1415372160,
1415372220, 1415372280, 1415372340, 1415384520, 1415384580, 1415384640,
1415391720, 1415391840, 1415396160, 1415396580, 1415396760, 1415396940,
1415397780, 1415398080, 1415398380, 1415398380, 1415413620, 1415413680,
1415413740, 1415413800, 1415413860, 1415414040, 1415414400, 1415421900,
1415461920, 1415472240, 1415479740, 1415482920, 1415483280, 1415508780,
1415509020, 1415509140, 1415509620, 1415509800, 1415513760, 1415513880,
1415513940, 1415521980, 1415522040, 1415522100, 1415542260, 1415546520,
1415566740, 1415566800, 1415568300, 1415568360, 1415581380, 1415581560,
1415581620, 1415595000, 1415595240, 1415595480, 1415595540, 1415595720,
1415596560, 1415597100, 1415598360, 1415598540, 1415598960, 1415609160,
1415639700, 1415639760, 1415639880, 1415656320, 1415664360, 1415664480,
1415667960, 1415668500, 1415668740, 1415671620, 1415686740, 1415733660,
1415744280, 1415753340, 1415833500, 1415833620, 1415863260, 1415920320,
1415927760, 1415929020, 1415929260, 1415929320, 1415929500, 1415929920,
1415938560, 1415938920, 1415939400, 1416018540, 1416018600, 1416090120,
1416090300, 1416090360, 1416090480, 1416099900, 1416188640, 1416188700,
1416189120, 1416189240, 1416635940, 1416636000, 1416638400, 1416638520,
1416639180, 1416702720, 1416811860, 1416811980, 1416812040, 1416875940,
1416876000, 1416876060, 1416977640, 1416978360, 1417047780, 1417047840,
1417066620, 1417066680, 1417219320, 1417221240, 1417221300, 1417221540,
1417221600, 1417222740, 1417226580, 1417226640, 1417226700, 1417240560,
1417300800, 1417301400, 1417307280, 1417307400, 1417314780, 1417314900,
1417484160, 1417489200, 1418166300, 1418166540, 1418280240, 1418342280,
1418342340, 1418703240, 1418703240, 1420096320, 1420761360, 1420761720,
1420761840, 1420762020, 1421724780, 1422230580, 1422238380, 1422238500,
1422238560, 1422238620, 1422506820, 1423182120, 1423273080, 1423273200,
1423273260, 1423355820, 1424655360, 1424657520, 1424661840, 1424661900,
1424671260, 1424832840, 1424839980, 1424840340, 1424841900, 1424842020,
1424842800, 1424843040, 1425436080, 1425436200, 1425436320, 1425438960,
1425439020, 1425959880, 1426120200, 1426996140, 1426996380, 1427074800,
1427078100, 1427334360, 1427334420, 1427334660, 1427587800, 1427676780,
1427676900, 1427676960, 1429924860, 1429925220, 1429925520, 1442449440,
1443917580, 1444026600, 1444026780, 1444085880, 1444091040, 1444113300,
1444122900, 1444432680, 1444462920, 1444462980, 1444463040, 1445056380,
1445057700, 1445142420, 1445219160, 1445224500), class = c("POSIXct",
"POSIXt"), tzone = "America/Anguilla")
Now I want to count the frequency of each date/time combination in timesteps of 30 minutes. I do that with the following code:
timestep = "30 min"
det_interval = data.frame(table(cut(dateseq, breaks = timestep)))
But my study period starts earlier than the first observation and lasts exactly one year. How can I extend the frequency table so that it starts at "2014-10-21 00:00:00" and stops at "2015-10-21 23:59:59"?
I tried to create a separate time sequence (below), but then I would still have to merge it with the frequency table:
timestep = "30 min"
start = "2014-10-21 00:00"
end = "2015-10-21 23:59"
timeseq = seq(from = as.POSIXct(start), to = as.POSIXct(end), by = timestep)
I think this should do it. Instead of a step size, you can pass a sequence of break points to cut(). With a step size, the bins start at the minimum of the data and stop at its maximum; with explicit breaks, they cover the whole period you specify.
# timestep is "30 min", as defined in the question
breaks = seq.POSIXt(strptime("2014-10-21 00:00:00", "%Y-%m-%d %H:%M:%S"), strptime("2015-10-21 23:59:59", "%Y-%m-%d %H:%M:%S"), by = timestep)
det_interval = data.frame(table(cut(dateseq, breaks = breaks, right = TRUE, include.lowest = TRUE)))
head(det_interval)
tail(det_interval)
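The following is a minimal sketch of the same explicit-breaks idea, not a definitive implementation. It makes a few assumptions that are not in the original posts: the short `dateseq` vector is a stand-in built from four values of the question's dput output, the breaks are created in the data's own time zone ("America/Anguilla") rather than the session's local one, and the last break is set to midnight after the final study day so that observations between 23:30 and 23:59 on 2015-10-21 still fall into a bin. It also uses left-closed bins (the default `right = FALSE`), which matches what `cut(dateseq, breaks = "30 min")` does.

```r
# Stand-in for the question's `dateseq` (four values copied from its dput output).
tz <- "America/Anguilla"
dateseq <- as.POSIXct(c(1414406460, 1414635660, 1414636260, 1445224500),
                      origin = "1970-01-01", tz = tz)

timestep <- "30 min"

# Breaks span the whole study period. The final break is midnight *after* the
# last study day (an assumption), so the 23:30-23:59 bin is not dropped.
breaks <- seq(from = as.POSIXct("2014-10-21 00:00:00", tz = tz),
              to   = as.POSIXct("2015-10-22 00:00:00", tz = tz),
              by   = timestep)

# Default right = FALSE gives left-closed bins [start, end).
det_interval <- as.data.frame(table(cut(dateseq, breaks = breaks)),
                              stringsAsFactors = FALSE)
names(det_interval) <- c("interval_start", "n")  # illustrative column names

nrow(det_interval)   # one row per bin, empty bins included
sum(det_interval$n)  # equals length(dateseq) when no observation is outside the breaks
```

Because empty bins are kept as factor levels, no separate merge with a full time sequence is needed. Comparing `sum(det_interval$n)` with `length(dateseq)` is a quick way to check that no observation fell outside the break range and was silently turned into NA.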

Resources