I am attempting to assign values to a column based on conditional statements but the POSIXct format seems to be throwing me off. I have a column of times and would like to assign them to day/night/dawn/dusk with something like this:
if (t40636$time > t40636$dawn.b & t40636$time < t40636$dawn.e) {
  t40636$time.periods = 1
} else {
  if (t40636$time > t40636$mid.day.b & t40636$time < t40636$mid.day.e) {
    t40636$time.periods = 2
  } else {
    if (t40636$time > t40636$dusk.b & t40636$time < t40636$dusk.e) {
      t40636$time.periods = 3
    } else {
      if (t40636$time > t40636$mid.night.b & t40636$time < t40636$mid.night.e) {
        t40636$time.periods = 4
      } else {
        t40636$time.periods = 0
      }
    }
  }
}
However, this code does not work because of the format of the columns and yields the matrix seen below (only 0s in the time.periods column).
Date Temp..ºC. Depth..m. Light time time.at.depth dawn.b dawn.e dusk.b
1 2012-06-19 14.47 -21.5 255 15:32 0 01:42 04:42 19:13
2 2012-06-19 16.99 -20.2 255 15:37 5 01:42 04:42 19:13
3 2012-06-19 12.60 -18.8 255 15:41 4 01:42 04:42 19:13
4 2012-06-19 16.36 -17.5 255 15:46 5 01:42 04:42 19:13
5 2012-06-19 16.36 -13.4 255 15:51 5 01:42 04:42 19:13
6 2012-06-19 17.94 -2.7 255 15:56 5 01:42 04:42 19:13
dusk.e mid.day.b mid.day.e mid.night.b mid.night.e time.periods
1 22:13 10:27 13:27 22:27 01:27 0
2 22:13 10:27 13:27 22:27 01:27 0
3 22:13 10:27 13:27 22:27 01:27 0
4 22:13 10:27 13:27 22:27 01:27 0
5 22:13 10:27 13:27 22:27 01:27 0
6 22:13 10:27 13:27 22:27 01:27 0
ifelse yields something close to what I want but I can't do multiple statements with it. Any suggestions are greatly appreciated.
t40636$time.periods=ifelse(t40636$time>t40636$dawn.b&t40636$time<t40636$dawn.e,1,0)
The answer to "fix my multiple if-else statements" is nearly always "Don't use multiple if-else constructions."
The R language has a very nice switch function, and its help page has some excellent examples.
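For what it's worth, part of the problem in the question is that if() evaluates a single condition, not one per row: older R versions silently use only the first element (with a warning), and R 4.2 or later raises an error. Since the first row (15:32) falls in none of the windows, every row ends up as 0. Below is a minimal vectorized sketch, assuming the time columns are stored as "HH:MM" character strings as they are printed in the question. The helper names to_min and in_window and the toy data frame d are made up for illustration; they are not from the original post.

```r
# Convert "HH:MM" strings to minutes after midnight (hypothetical helper)
to_min <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]), numeric(1))
}

# TRUE where t lies strictly between b and e; also handles windows
# that wrap past midnight, such as 22:27 to 01:27 (hypothetical helper)
in_window <- function(t, b, e) {
  ifelse(b <= e, t > b & t < e, t > b | t < e)
}

# Toy data using the window boundaries shown in the question
d <- data.frame(
  time        = c("15:32", "03:00", "11:00", "20:00", "23:30"),
  dawn.b      = "01:42", dawn.e      = "04:42",
  mid.day.b   = "10:27", mid.day.e   = "13:27",
  dusk.b      = "19:13", dusk.e      = "22:13",
  mid.night.b = "22:27", mid.night.e = "01:27",
  stringsAsFactors = FALSE
)

t <- to_min(d$time)

# Start with 0 everywhere, then overwrite the rows that fall in each window
d$time.periods <- 0
d$time.periods[in_window(t, to_min(d$dawn.b),      to_min(d$dawn.e))]      <- 1
d$time.periods[in_window(t, to_min(d$mid.day.b),   to_min(d$mid.day.e))]   <- 2
d$time.periods[in_window(t, to_min(d$dusk.b),      to_min(d$dusk.e))]      <- 3
d$time.periods[in_window(t, to_min(d$mid.night.b), to_min(d$mid.night.e))] <- 4

print(d[, c("time", "time.periods")])
```

Logical indexing is used instead of nested ifelse() calls so that each period reads as one line. If the columns really are POSIXct, the same comparisons should work directly on them, provided every value carries the same date.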
I have a time series that represents the amount of a certain product sold throughout 2018 (from 2018/01/01 to 2018/12/31). Is it correct to think of a frequency of 7 observations per cycle? And if so, what is my cycle: one week? I am trying to understand this in order to decompose my time series and avoid the error "Error in decompose(tsData) : time series has no or less than 2 periods". This is my R script and my data.
library(forecast)
library(sweep)
library(timetk)
Data <- read.delim("R Project/Dataset/MyData.txt")
DataFrame <- data.frame(Data,
                        Date = seq(as.Date("2018-01-01"), as.Date("2018-12-31"),
                                   by = "day"))
inds <- seq(as.Date("2018-01-01"), as.Date("2018-12-31"), by = "day")
tsData <- ts(Data, start = c(2018, as.numeric(format(inds[1], "%j"))),
             frequency = 365)
print(tsData)
plot(tsData)
Axis(inds, side = 1, at = seq(inds[1], tail(inds, 1) + 60,
                              by = "1 months"), format = "%b %Y")
comp = decompose(tsData)
#comp = stl(tsData)
plot(comp)
fit <- auto.arima(tsData)
fore <- forecast(fit, h = 15, level = 99.5)
plot(fore, xaxt = "n")
Axis(inds, side = 1, at = seq(inds[1], tail(inds, 1) + 60, by = "1 months"),
     format = "%b %Y")
This is the MyData.txt file:
Daily Data
0
2621
3407
3644
3569
1212
0
0
4473
3885
3671
3641
1453
0
4182
3812
3650
3444
3557
1612
0
4004
3631
3342
3203
3424
1597
0
4280
3644
3642
3696
3793
1753
0
4416
3935
3522
3544
3569
1649
0
3871
3442
3144
3158
3693
1780
0
4322
3682
3499
3279
3485
1716
0
4255
3713
3470
3673
3983
1931
0
4771
3986
3833
3501
3620
1710
0
4407
3799
3654
3332
3693
1780
0
0
4574
4016
3748
3559
1625
0
4548
3726
2780
0
0
122
0
5005
4300
3772
3929
3917
2021
0
4820
4117
3668
3664
3639
1742
0
4473
4151
3844
3499
3736
1838
0
4346
3693
3297
3327
3639
1773
0
4519
0
4352
4079
4143
1970
0
4693
4018
3679
3838
3606
1601
0
0
4289
4011
3742
3710
1781
0
4186
3707
3600
3484
3702
1747
0
4195
3838
3504
3609
3934
1943
0
0
5243
4754
4164
4121
1854
0
0
5173
4518
3875
3889
1904
0
5105
4056
4186
4079
3953
1846
0
4543
4341
4013
2998
4048
1767
0
0
4317
5260
5185
4969
2046
0
5683
5004
4567
4542
4266
2065
0
4357
5281
4830
4510
0
1567
0
5818
4906
4518
4218
4275
2074
0
5005
4645
4543
4558
4574
2129
0
4755
0
4458
3845
3746
1689
0
4285
3476
3447
2959
3470
1584
0
0
4159
3881
3533
3360
1643
0
4152
3748
3329
3112
3303
1790
0
3852
4190
3482
3313
3400
1582
0
4042
3706
3451
3137
3178
1518
0
4077
3754
3429
3369
3307
1467
0
3918
3620
3442
3302
3168
1630
0
3967
3707
3397
3294
3314
1646
0
4196
3812
3478
3111
3113
1411
0
0
3717
3501
3282
3366
1554
0
3737
3428
3028
2960
2977
1513
0
3608
3306
2941
2918
3238
1543
0
0
3959
3678
3367
3237
1024
0
0
4057
3562
3344
3367
1602
0
3784
3581
3395
2948
3009
1446
0
3676
3276
3112
3125
3133
1502
0
4200
4027
3739
3531
3222
2
0
4446
4342
4066
3811
2932
1643
0
4587
4534
4146
3994
3350
1400
0
1248
0
4248
4629
4346
1844
0
168
The frequency parameter of the ts() function is the number of observations per seasonal cycle, that is, before the pattern repeats. If you set a seasonality of 365 (1 year) with only 1 year of data, the series contains just 1 period, so decompose() tells you: time series has no or less than 2 periods.
As you said "7 observations per cycle", you may want to set frequency equal to 7. Or, if you want to analyze yearly seasonality, put more data (at least two full years) in tsData.
Just change:
# ....
tsData <- ts(Data, start = c(2018, as.numeric(format(inds[1], "%j"))), frequency = 365)
# ...
to:
# ...
### weekly seasonality
tsData <- ts(Data, start = c(2018, as.numeric(format(inds[1], "%j"))), frequency = 7)
#...
and now decompose works:
comp = decompose(tsData) # NO ERROR
### get the plot
plot(comp)
# ... rest of your code ...
[Figure: decomposition plot of tsData with frequency = 7]
EDIT on your comment:
The x-axis of the plot depends on how you declare the start; please have a look at the ts documentation.
If you want the axis to show the year 2018, you can simply use autoplot() (see its documentation):
# ... rest of code ...
autoplot(tsData)
# ... rest of code ...
It is also highly customizable. The plot is built with the ggplot2 package, so if you want to customize it, have a look at the ggplot2 documentation and the many existing posts on this site.
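To see the frequency point in isolation, here is a small sketch that only needs base R. The series x is a made-up stand-in for one year of daily sales (a fixed weekly profile plus noise), not the data from the question.

```r
set.seed(1)

# Hypothetical weekly profile, repeated to cover 365 days, plus some noise
weekly_profile <- c(4300, 3800, 3500, 3400, 3500, 1700, 0)
x <- rep(weekly_profile, length.out = 365) + rnorm(365, sd = 100)

# frequency = 365 with 365 observations gives a single period, so decompose() fails
yearly <- ts(x, start = c(2018, 1), frequency = 365)
err <- tryCatch(decompose(yearly), error = function(e) conditionMessage(e))
print(err)

# frequency = 7 gives about 52 periods, so decompose() works
weekly <- ts(x, frequency = 7)
comp <- decompose(weekly)

# One seasonal effect per position in the weekly cycle
print(round(comp$figure))
```

The seasonal component in comp$figure has one value per position in the cycle, so with frequency = 7 it contains seven numbers.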
I have a time series which represents the amount of a certain product sold throughout the year 2018. I am trying to decompose the time series but I get the following error Error in decompose(myzoo) : time series has no or less than 2 periods. This is my code in R
## require packages
library(forecast)
library(sweep)
library(timetk)
library(zoo)
## Read the Data
Data <- read.delim("R Project/Dataset/MyData.txt")
## Create a daily Date object
inds <- seq(as.Date("2018-01-01"), as.Date("2018-12-31"), by = "day")
## Create a time series object
myzoo <- zoo(Data, inds)
## print myzoo
print(myzoo)
## plot myzoo
plot(myzoo)
plot(myzoo, xaxt = "n")
Axis(inds, side = 1, at = seq(inds[1], tail(inds, 1) + 60, by = "1 months"), format = "%b %Y")
## Decompose myzoo
composition = decompose(myzoo)
stl(myzoo)
## use auto.arima to choose ARIMA terms
fit <- auto.arima(myzoo)
## forecast for next 15 time points
fore <- forecast(fit, h = 15, level=c(99.5))
## plot it with no x-axis
plot(fore, xaxt = "n")
Axis(inds, side = 1, at = seq(inds[1], tail(inds, 1) + 60, by = "1 months"), format = "%b %Y")
And this is my data (MyData.txt):
X
0
2621
3407
3644
3569
1212
0
0
4473
3885
3671
3641
1453
0
4182
3812
3650
3444
3557
1612
0
4004
3631
3342
3203
3424
1597
0
4280
3644
3642
3696
3793
1753
0
4416
3935
3522
3544
3569
1649
0
3871
3442
3144
3158
3693
1780
0
4322
3682
3499
3279
3485
1716
0
4255
3713
3470
3673
3983
1931
0
4771
3986
3833
3501
3620
1710
0
4407
3799
3654
3332
3693
1780
0
0
4574
4016
3748
3559
1625
0
4548
3726
2780
0
0
122
0
5005
4300
3772
3929
3917
2021
0
4820
4117
3668
3664
3639
1742
0
4473
4151
3844
3499
3736
1838
0
4346
3693
3297
3327
3639
1773
0
4519
0
4352
4079
4143
1970
0
4693
4018
3679
3838
3606
1601
0
0
4289
4011
3742
3710
1781
0
4186
3707
3600
3484
3702
1747
0
4195
3838
3504
3609
3934
1943
0
0
5243
4754
4164
4121
1854
0
0
5173
4518
3875
3889
1904
0
5105
4056
4186
4079
3953
1846
0
4543
4341
4013
2998
4048
1767
0
0
4317
5260
5185
4969
2046
0
5683
5004
4567
4542
4266
2065
0
4357
5281
4830
4510
0
1567
0
5818
4906
4518
4218
4275
2074
0
5005
4645
4543
4558
4574
2129
0
4755
0
4458
3845
3746
1689
0
4285
3476
3447
2959
3470
1584
0
0
4159
3881
3533
3360
1643
0
4152
3748
3329
3112
3303
1790
0
3852
4190
3482
3313
3400
1582
0
4042
3706
3451
3137
3178
1518
0
4077
3754
3429
3369
3307
1467
0
3918
3620
3442
3302
3168
1630
0
3967
3707
3397
3294
3314
1646
0
4196
3812
3478
3111
3113
1411
0
0
3717
3501
3282
3366
1554
0
3737
3428
3028
2960
2977
1513
0
3608
3306
2941
2918
3238
1543
0
0
3959
3678
3367
3237
1024
0
0
4057
3562
3344
3367
1602
0
3784
3581
3395
2948
3009
1446
0
3676
3276
3112
3125
3133
1502
0
4200
4027
3739
3531
3222
2
0
4446
4342
4066
3811
2932
1643
0
4587
4534
4146
3994
3350
1400
0
1248
0
4248
4629
4346
1844
0
168
The zeros represent sales on holidays and Sundays. The purpose of this script is to be able to make forecasts.
Thanks in advance.
I can't help you with the software; perhaps contact the author. Your data is better suited to deterministic effects than to ARIMA memory effects. There are strong monthly effects and even stronger daily effects, along with a host of pulses, probably reflecting holiday or promotion effects, that are currently omitted from the model.
The Actual/Fit and Forecast plot should give you motivation to pursue this approach, together with the statistical summary.
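As a rough illustration of what "deterministic effects" could look like in R, here is a minimal sketch that regresses sales on day-of-week and month dummies with lm(). This is not the software used for the plots mentioned above. The data are simulated, the variable names are made up, and holiday or promotion pulses are left out; they would need extra dummy columns.

```r
set.seed(42)

# Hypothetical daily sales for 2018: fixed weekday profile plus noise
dates   <- seq(as.Date("2018-01-01"), as.Date("2018-12-31"), by = "day")
wday    <- as.POSIXlt(dates)$wday                    # 0 = Sunday, ..., 6 = Saturday
profile <- c(0, 4300, 3800, 3500, 3400, 3500, 1700)  # Sunday first
sales   <- profile[wday + 1] + rnorm(length(dates), sd = 150)
sales[wday == 0] <- 0                                # closed on Sundays

train <- data.frame(
  sales = sales,
  dow   = factor(wday, levels = 0:6),
  month = factor(as.POSIXlt(dates)$mon + 1, levels = 1:12)
)

# Deterministic model: calendar dummies only, no ARIMA terms
fit <- lm(sales ~ dow + month, data = train)

# Forecast the next 15 days from their calendar position alone
future  <- seq(as.Date("2019-01-01"), by = "day", length.out = 15)
newdata <- data.frame(
  dow   = factor(as.POSIXlt(future)$wday, levels = 0:6),
  month = factor(as.POSIXlt(future)$mon + 1, levels = 1:12)
)
pred <- predict(fit, newdata = newdata)
print(round(pred))
```

Any autocorrelation left in residuals(fit) could then be checked with acf() to decide whether ARIMA terms are needed on top of the dummies.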
I have daily sales data with zero values (for holidays and Sundays) and I want to apply the BoxCox.lambda() function, but with the zero values this is clearly impossible. My options currently are:
1 - Replace the zero values with values approaching zero, but I do not know how this would affect my forecast.
I would be grateful for any suggestions.
This is my data:
Data
0
2621
3407
3644
3569
1212
0
0
4473
3885
3671
3641
1453
0
4182
3812
3650
3444
3557
1612
0
4004
3631
3342
3203
3424
1597
0
4280
3644
3642
3696
3793
1753
0
4416
3935
3522
3544
3569
1649
0
3871
3442
3144
3158
3693
1780
0
4322
3682
3499
3279
3485
1716
0
4255
3713
3470
3673
3983
1931
0
4771
3986
3833
3501
3620
1710
0
4407
3799
3654
3332
3693
1780
0
0
4574
4016
3748
3559
1625
0
4548
3726
2780
0
0
122
0
5005
4300
3772
3929
3917
2021
0
4820
4117
3668
3664
3639
1742
0
4473
4151
3844
3499
3736
1838
0
4346
3693
3297
3327
3639
1773
0
4519
0
4352
4079
4143
1970
0
4693
4018
3679
3838
3606
1601
0
0
4289
4011
3742
3710
1781
0
4186
3707
3600
3484
3702
1747
0
4195
3838
3504
3609
3934
1943
0
0
5243
4754
4164
4121
1854
0
0
5173
4518
3875
3889
1904
0
5105
4056
4186
4079
3953
1846
0
4543
4341
4013
2998
4048
1767
0
0
4317
5260
5185
4969
2046
0
5683
5004
4567
4542
4266
2065
0
4357
5281
4830
4510
0
1567
0
5818
4906
4518
4218
4275
2074
0
5005
4645
4543
4558
4574
2129
0
4755
0
4458
3845
3746
1689
0
4285
3476
3447
2959
3470
1584
0
0
4159
3881
3533
3360
1643
0
4152
3748
3329
3112
3303
1790
0
3852
4190
3482
3313
3400
1582
0
4042
3706
3451
3137
3178
1518
0
4077
3754
3429
3369
3307
1467
0
3918
3620
3442
3302
3168
1630
0
3967
3707
3397
3294
3314
1646
0
4196
3812
3478
3111
3113
1411
0
0
3717
3501
3282
3366
1554
0
3737
3428
3028
2960
2977
1513
0
3608
3306
2941
2918
3238
1543
0
0
3959
3678
3367
3237
1024
0
0
4057
3562
3344
3367
1602
0
3784
3581
3395
2948
3009
1446
0
3676
3276
3112
3125
3133
1502
0
4200
4027
3739
3531
3222
2
0
4446
4342
4066
3811
2932
1643
0
4587
4534
4146
3994
3350
1400
0
1248
0
4248
4629
4346
1844
0
168
I'd recommend you just drop all the Sundays from your data. Since we know they will always be zero, there is no point in spending time and effort on forecasting them.
The periodicity is very strong even with them removed, and diagnosing the data by looking at ACF plots etc. is much more straightforward.
# Removing every Sunday and creating a ts object of appropriate frequency
# (x is assumed to be the numeric vector of daily values, starting on a Monday)
x6 <- x[seq_along(x) %% 7 != 0]
x6.ts <- ts(x6, frequency=6)
# Plenty of periodic structure left
par(mfcol=c(2, 1))
sp <- split(x6.ts, (seq_along(x6.ts)-1) %% 6 + 1)
stripchart(sp, vertical=TRUE, col=rainbow(6, alpha=0.2, start=0.97), pch=16,
           method="jitter", group.names=c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))
plot.default(x6.ts, type="p", pch=16, col=rainbow(6, alpha=0.6, start=0.97))
Then we could, for example, apply a SARIMA model:
acf(x6.ts, adj=c(0.5))
title("x6.ts", cex.main=0.9)
acf(diff(x6.ts, lag=6))
title("diff(x6.ts, lag=6)", cex.main=0.9)
I see a seasonal random walk there, and once we take the seasonal difference we see that there's at least a couple of seasonal autoregressive components, and maybe a non-seasonal autoregression.
aa6.1 <- arima(x6.ts, order=c(0, 0, 0), seasonal=c(1, 1, 0))
aa6.2 <- arima(x6.ts, order=c(0, 0, 0), seasonal=c(2, 1, 0))
aa6.3 <- arima(x6.ts, order=c(1, 0, 0), seasonal=c(2, 1, 0))
aa6.4 <- arima(x6.ts, order=c(1, 0, 0), seasonal=c(3, 1, 0))
dummy11 <- model.matrix(~ as.factor(seq_along(x6.ts) %% 11))[,2]
aa6.5 <- arima(x6.ts, order=c(1, 0, 0), seasonal=c(3, 1, 0),
               xreg=dummy11)
AIC(aa6.1, aa6.2, aa6.3, aa6.4, aa6.5)
# df AIC
# aa6.1 2 5244.846
# aa6.2 3 5195.019
# aa6.3 4 5192.212
# aa6.4 5 5179.310
# aa6.5 6 5164.567
acfr <- function(x){
  a <- acf(residuals(x), plot=FALSE)
  a$acf[1, 1, 1] <- 0
  plot(a, main="", frame.plot=FALSE, ylim=c(-0.2, 0.2))
  mod <- paste(paste(names(x$call),
                     as.character(x$call), sep="=")[-1], collapse=", ")
  text(-0.1, 0.19, pos=4, xpd=NA,
       paste0("AIC: ", round(x$aic), "\n", "Mod: ", mod))
}
par(mfcol=c(5, 1))
k <- lapply(list(aa6.1, aa6.2, aa6.3, aa6.4, aa6.5), acfr)
Seems like (1 0 0) (3 1 0)[6] does a decent job, but there's a persistent autocorrelation at lag 11. This is an artefact of the removal of Sundays, but we can address it by including an external regressor of dummies.
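One caveat on the Sunday-removal step at the top of this answer: filtering on seq_along(x) %% 7 only works because the series starts on a Monday and has no missing days. A date-based variant is sketched below, assuming the series covers 2018-01-01 to 2018-12-31 as in the question. The values in x are random stand-ins, not the real sales.

```r
# Calendar for the series (assumed: one value per day of 2018)
dates <- seq(as.Date("2018-01-01"), as.Date("2018-12-31"), by = "day")

# Hypothetical stand-in for the daily sales vector
set.seed(1)
x <- round(runif(length(dates), 1000, 5000))

# wday is 0 for Sunday regardless of locale
is_sunday <- as.POSIXlt(dates)$wday == 0
x[is_sunday] <- 0

# Drop Sundays by date rather than by position
x6    <- x[!is_sunday]
x6.ts <- ts(x6, frequency = 6)

# For this particular calendar the positional filter gives the same result
same <- identical(x6, x[seq_along(x) %% 7 != 0])
print(c(n_days = length(x), n_kept = length(x6)))
print(same)
```

With a date-based mask, the same code keeps working if the data later start on a different weekday.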
I tried to extract the indexes from okresy where the values fulfil the ifelse condition. The results of the lapply loop shown below are confusing to me. What are these ascending large numbers, and how can I extract the indexes in each vector of the list?
okresy <- list(okres96, okres97, okres98, okres99, okres00, okres01, okres02, okres03, okres04, okres05, okres06, okres07, okres08, okres09, okres10, okres11, okres12, okres13, okres14, okres15, okres16, okres17)
day1 <- "1996-05-31"
day2 <- "2012-05-02"
day1 <- as.Date(day1, "%Y-%m-%d")
day2 <- as.Date(day2, "%Y-%m-%d")
values <- lapply(okresy, function(x) ifelse(day1 <= x & x <= day2, x, 0))
values
Results:
[[1]]
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[35] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[69] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[103] 0 0 9647 9650 9651 9652 9654 9657 9658 9659 9660 9661 9664 9665 9666 9667 9668 9671 9672 9673 9674 9675 9678 9679 9680 9681 9682 9685 9686 9687 9688 9689 9692 9693
[137] 9694 9695 9696 9699 9700 9701 9702 9703 9706 9707 9708 9709 9710 9713 9714 9715 9716 9717 9720 9721 9722 9724 9727 9728 9729 9730 9731 9734 9735 9736 9737 9738 9741 9742
[171] 9743 9744 9745 9748 9749 9750 9751 9752 9755 9756 9757 9758 9759 9762 9763 9764 9765 9766 9769 9770 9771 9772 9773 9776 9777 9778 9779 9780 9783 9784 9785 9786 9787 9790
[205] 9791 9792 9793 9794 9797 9798 9799 9800 9804 9805 9806 9807 9808 9812 9813 9814 9815 9818 9819 9820 9821 9822 9825 9826 9827 9828 9829 9832 9833 9834 9835 9836 9839 9840
[239] 9841 9842 9843 9846 9847 9848 9849 9850 9853 9854 9860 9861
[[2]]
[1] 9863 9864 9867 9868 9869 9870 9871 9874 9875 9876 9877 9878 9881 9882 9883 9884 9885 9888 9889 9890 9891 9892 9895 9896 9897 9898 9899 9902
[29] 9903 9904 9905 9906 9909 9910 9911 9912 9913 9916 9917 9918 9919 9920 9923 9924 9925 9926 9927 9930 9931 9932 9933 9934 9937 9938 9939 9940
[57] 9941 9944 9945 9946 9947 9948 9952 9953 9954 9955 9958 9959 9960 9961 9962 9965 9966 9967 9968 9969 9972 9973 9974 9975 9976 9979 9980 9981
[85] 9986 9987 9988 9989 9990 9993 9994 9995 9996 9997 10000 10001 10002 10003 10004 10007 10008 10009 10011 10014 10015 10016 10017 10018 10021 10022 10023 10024
[113] 10025 10028 10029 10030 10031 10032 10035 10036 10037 10038 10039 10042 10043 10044 10045 10046 10049 10050 10051 10052 10053 10056 10057 10058 10059 10060 10063 10064
[141] 10065 10066 10067 10070 10071 10072 10073 10074 10077 10078 10079 10080 10081 10084 10085 10086 10087 10091 10092 10093 10094 10095 10098 10099 10100 10101 10102 10105
[169] 10106 10107 10108 10109 10112 10113 10114 10115 10116 10119 10120 10121 10122 10123 10126 10127 10128 10129 10130 10133 10134 10135 10136 10137 10140 10141 10142 10143
[197] 10144 10147 10148 10149 10150 10151 10154 10155 10156 10157 10158 10161 10162 10163 10164 10165 10168 10169 10170 10171 10172 10177 10178 10179 10182 10183 10184 10185
[225] 10186 10189 10190 10191 10192 10193 10196 10197 10198 10199 10200 10203 10204 10205 10206 10207 10210 10211 10212 10213 10214 10217 10218 10219 10224 10225 10226
(...)
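A note on the output above: ifelse() drops the Date class, so the large numbers are the dates themselves stored as days since 1970-01-01 (9647 is 1996-05-31). To get positions instead of values, which() can be applied to the logical condition. A minimal sketch follows; the two short vectors in okresy are made-up stand-ins for okres96, okres97 and so on.

```r
day1 <- as.Date("1996-05-31")
day2 <- as.Date("2012-05-02")

# Hypothetical stand-ins for the real okres vectors
okresy <- list(
  as.Date(c("1996-05-29", "1996-05-30", "1996-05-31", "1996-06-03")),
  as.Date(c("2012-04-30", "2012-05-02", "2012-05-04"))
)

# ifelse() returns plain numbers: days since 1970-01-01
stripped <- ifelse(day1 <= okresy[[1]] & okresy[[1]] <= day2, okresy[[1]], 0)
print(stripped)

# The numbers can be turned back into dates
print(as.Date(9647, origin = "1970-01-01"))

# which() gives the indexes of the elements inside the date range
idx <- lapply(okresy, function(x) which(day1 <= x & x <= day2))
print(idx)
```

If the dates themselves are wanted rather than the indexes, x[day1 <= x & x <= day2] keeps the Date class.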
Sorry for the very specific question, but I have a file as such:
Adj Year man mt wm wmt by bytl gr grtl
3 careless 1802 0 126 0 54 0 13 0 51
4 careless 1803 0 166 0 72 0 1 0 18
5 careless 1804 0 167 0 58 0 2 0 25
6 careless 1805 0 117 0 5 0 5 0 7
7 careless 1806 0 408 0 88 0 15 0 27
8 careless 1807 0 214 0 71 0 9 0 32
...
560 mean 1939 21 5988 8 1961 0 1152 0 1512
561 mean 1940 20 5810 6 1965 1 914 0 1444
562 mean 1941 10 6062 4 2097 5 964 0 1550
563 mean 1942 8 5352 2 1660 2 947 2 1506
564 mean 1943 14 5145 5 1614 1 878 4 1196
565 mean 1944 42 5630 6 1939 1 902 0 1583
566 mean 1945 17 6140 7 2192 4 1004 0 1906
Now I have to call for specific values (e.g. [careless, 1804, man] or [mean, 1944, wmt]).
I have no clue how to do that. One possibility would be to split the data.frame and create an array, if I'm correct, but I'd love to have a simpler solution.
Thank you in advance!
Subsetting for specific values in the Adj and Year columns and selecting the man column will give you the required output.
df[df$Adj == "careless" & df$Year == 1804, "man"]
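For a self-contained version of the same idea, the sketch below rebuilds a few rows of the table from the question (only the man and wmt columns) and wraps the lookup in a small helper. The name get_value is made up for illustration.

```r
# A few rows copied from the table in the question
df <- data.frame(
  Adj  = c("careless", "careless", "mean", "mean"),
  Year = c(1803, 1804, 1944, 1945),
  man  = c(0, 0, 42, 17),
  wmt  = c(72, 58, 1939, 2192),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up one cell by adjective, year and column name
get_value <- function(data, adj, year, column) {
  data[data$Adj == adj & data$Year == year, column]
}

print(get_value(df, "careless", 1804, "man"))  # 0
print(get_value(df, "mean", 1944, "wmt"))      # 1939
```

If a combination of Adj and Year does not exist, the helper returns a zero-length vector, which may be worth checking for before using the result.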