Converting output format of a time series forecast in R

I am working with time series data for 2 products, X1 and X2, running from Jan 2016 to Dec 2019, and I am applying an NNAR forecast model to it. The code is below:
nnar.Accounts_ts = ts(df, start = c(2016, 1), frequency = 12)
nnar.Accounts_ts
V1 V2
Jan 2016 2792 8882
Feb 2016 3317 10803
Mar 2016 4292 14059
Apr 2016 4500 15617
May 2016 5234 19211
Jun 2016 6657 23632
Jul 2016 6329 25435
Aug 2016 7208 30671
Sep 2016 7046 32429
Oct 2016 7242 35794
Nov 2016 7692 39138
Dec 2016 7860 43767
Jan 2017 6941 42172
Feb 2017 7076 40690
Mar 2017 8943 50362
Apr 2017 8435 50890
May 2017 9757 59852
Jun 2017 9510 62762
Jul 2017 8665 64176
Aug 2017 9538 70739
Sep 2017 8832 69643
Oct 2017 9983 77886
Nov 2017 9541 79059
Dec 2017 9397 82658
Jan 2018 10350 90879
Feb 2018 9853 84161
Mar 2018 12472 98436
Apr 2018 11942 101095
May 2018 12706 109782
Jun 2018 11733 108488
Jul 2018 11114 114713
Aug 2018 12731 122221
Sep 2018 10750 114816
Oct 2018 12319 129158
Nov 2018 12391 127707
Dec 2018 12442 132581
Jan 2019 14218 143658
Feb 2019 13628 131456
Mar 2019 15629 149794
Apr 2019 16457 157845
May 2019 16880 166019
Jun 2019 15362 160458
Jul 2019 15509 175690
Aug 2019 16195 178887
Sep 2019 14645 173253
Oct 2019 16930 189340
Nov 2019 16586 181478
Dec 2019 16520 199305
set.seed(54321)
Brand_nnar_Accounts_forecast <- lapply(nnar.Accounts_ts, function(x) forecast(nnetar(x, PI = TRUE),h = 30))
The output of the forecast model is of the format:
$X1
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2020 17055.65 16935.27 17235.31 17290.44 17310.37 17215.72 17227.75 17276.04 17134.86 17311.74 17297.26 17293.96
2021 17317.46 17312.78 17324.23 17326.22 17326.92 17323.58 17324.00 17325.72 17320.61 17326.95 17326.47 17326.35
2022 17327.17 17327.01 17327.40 17327.47 17327.50 17327.38
$X2
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2020 208483.0 187984.3 220114.3 225257.3 235741.0 225815.4 248283.1 248211.2 241011.6 261533.2 249032.7 270361.6
2021 275791.9 254342.5 286256.4 286638.9 292843.6 286383.4 298840.4 298059.8 294739.4 303357.4 298140.3 305902.2
2022 307070.0 300128.1 309649.4 309460.6 310678.2 309377.8
I want to convert it into the following format:
Jan 2020 Feb 2020 Mar 2020 Apr 2020 May 2020 Jun 2020 Jul 2020 Aug 2020 Sep 2020 Oct 2020 Nov 2020
X1 17055.65 16935.27 17235.31 17290.44 17310.37 17215.72 17227.75 17276.04 17134.86 17311.74 17297.26
X2 208483.04 187984.26 220114.30 225257.26 235741.04 225815.39 248283.10 248211.23 241011.62 261533.17 249032.70
Dec 2020 Jan 2021 Feb 2021 Mar 2021 Apr 2021 May 2021 Jun 2021 Jul 2021 Aug 2021 Sep 2021 Oct 2021 Nov 2021
X1 17293.96 17317.46 17312.78 17324.23 17326.22 17326.92 17323.58 17324.0 17325.72 17320.61 17326.95 17326.47
X2 270361.60 275791.92 254342.50 286256.43 286638.89 292843.56 286383.42 298840.4 298059.78 294739.42 303357.43 298140.34
Dec 2021 Jan 2022 Feb 2022 Mar 2022 Apr 2022 May 2022 Jun 2022
X1 17326.35 17327.17 17327.01 17327.4 17327.47 17327.5 17327.38
X2 305902.18 307070.02 300128.09 309649.4 309460.55 310678.2 309377.82
The NNAR forecast output contains nested lists, which causes problems when I use the proposed solution to convert the format.

One option is to transpose each element of the list after converting it to xts, and then rbind the results:
library(xts)
`row.names<-`(do.call(rbind, lapply(lst1, function(x) t(as.xts(x)))), names(lst1))
# Jan 2020 Feb 2020 Mar 2020 Apr 2020 May 2020 Jun 2020 Jul 2020 Aug 2020 Sep 2020
#Product 1 41 56 2 16 78 60 89 31 68
#Product 2 52 23 57 48 80 53 63 36 10
# Oct 2020 Nov 2020 Dec 2020 Jan 2021 Feb 2021 Mar 2021 Apr 2021 May 2021 Jun 2021
#Product 1 73 2 40 45 11 43 63 58 29
#Product 2 24 24 39 4 56 85 6 20 69
# Jul 2021 Aug 2021 Sep 2021 Oct 2021 Nov 2021 Dec 2021
#Product 1 39 76 8 89 14 3
#Product 2 93 26 23 95 79 56
data
lst1 <- list(`Product 1` = ts(sample(100, 24, replace = TRUE), start = c(2020, 1),
frequency =12), `Product 2` = ts(sample(100, 24, replace = TRUE),
start = c(2020, 1), frequency =12))
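For the forecast list in the question, the time series sits one level deeper: objects returned by forecast() keep their point forecasts in the `mean` component, so that has to be pulled out before reshaping. Below is a minimal base R sketch of that step. It assumes a stand-in list `fc_list` (hypothetical, built by hand so the example runs without the forecast package) holding only the first three values from the question.

```r
# Hypothetical stand-in for the forecast list: each element mimics a
# forecast object by carrying its point forecasts in a `mean` ts component.
fc_list <- list(
  X1 = list(mean = ts(c(17055.65, 16935.27, 17235.31),
                      start = c(2020, 1), frequency = 12)),
  X2 = list(mean = ts(c(208483.0, 187984.3, 220114.3),
                      start = c(2020, 1), frequency = 12))
)

# Extract the point forecasts from each element.
means <- lapply(fc_list, function(f) f$mean)

# Build "Mon YYYY" labels from the ts index of the first series
# (assumes all series share the same start and frequency).
first <- means[[1]]
labels <- paste(month.abb[as.integer(cycle(first))],
                floor(as.numeric(time(first)) + 1e-8))

# One row per product, one column per month.
wide <- do.call(rbind, lapply(means, as.numeric))
colnames(wide) <- labels
wide
```

With a real forecast list, `fc_list` would be replaced by the object returned from the lapply() call in the question, and the same `f$mean` extraction could feed the xts approach above.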

Related

How to dynamically loop through a split dataframe in R

I have dataframe df3
df3
x d y
1 bbc Sep 2020 123
2 rsb Sep 2020 234
3 atc Sep 2020 345
4 svc Sep 2020 543
5 mwe Sep 2020 567
6 bpa Oct 2020 322
7 mwe Oct 2020 456
8 uhs Oct 2020 786
9 se Oct 2020 543
10 db Oct 2020 778
11 rsb Nov 2020 358
12 svc Nov 2020 678
13 db Nov 2020 321
14 rb Nov 2020 689
15 bpa Nov 2020 765
After applying the split function:
> df5 = split(df3,f=df3$d)
> df5
$`Sep 2020`
x d y
1 bbc Sep 2020 123
2 rsb Sep 2020 234
3 atc Sep 2020 345
4 svc Sep 2020 543
5 mwe Sep 2020 567
$`Oct 2020`
x d y
6 bpa Oct 2020 322
7 mwe Oct 2020 456
8 uhs Oct 2020 786
9 se Oct 2020 543
10 db Oct 2020 778
$`Nov 2020`
x d y
11 rsb Nov 2020 358
12 svc Nov 2020 678
13 db Nov 2020 321
14 rb Nov 2020 689
15 bpa Nov 2020 765
I would like to loop through the split dataframe dynamically.
I need to find out whether any values present in Nov 2020 are also present in Oct 2020.
If a name is present in both, the previous month, Sep 2020, has to be checked as well, and so on, counting the number of consecutive months in which the name occurs. Here df3$d is in as.yearmon format. Any names in df5[["Nov 2020"]]$x that are also present in df5[["Oct 2020"]]$x should be extracted and stored in an object along with their count. Here the count is 2, since the names are present in Nov 2020 and Oct 2020. The previous months should be checked only for names that are present in the most recent month. For this example, the output should be
> df4
names_present present_for
1 bpa 2
2 db 2
Thank you in advance

Find the number of times a name occurs in each 'month year' using R

I have a dataframe like
x d y
bbc Sep 2020 123
rsb Sep 2020 234
atc Sep 2020 345
svc Sep 2020 543
mwe Sep 2020 567
bpa Oct 2020 322
mwe Oct 2020 456
uhs Oct 2020 786
se Oct 2020 543
db Oct 2020 778
rsb Nov 2020 358
svc Nov 2020 678
db Nov 2020 321
rb Nov 2020 689
bpa Nov 2020 765
The column 'd' is in as.yearmon format.
I want to find the values in column x that are repeated across the month-year values in column d.
In this dataframe, bpa and db are present in both Nov 2020 and Oct 2020, so the output needs to look like
names_present present_for
bpa 2
db 2
A value of x should be included in the output only if it is present in consecutive months. Here svc is present in Nov 2020 and Sep 2020, but it cannot be part of the output because it is absent in Oct 2020.
I tried splitting the dataframe on the df$d column, but I couldn't loop through the split dataframes and get the required output. The split dataframe looks like
$`Nov 2020`
x d y
11 rsb Nov 2020 358
12 svc Nov 2020 678
13 db Nov 2020 321
14 rb Nov 2020 689
15 bpa Nov 2020 765
$`Oct 2020`
x d y
6 bpa Oct 2020 322
7 mwe Oct 2020 456
8 uhs Oct 2020 786
9 se Oct 2020 543
10 db Oct 2020 778
$`Sep 2020`
x d y
1 bbc Sep 2020 123
2 rsb Sep 2020 234
3 atc Sep 2020 345
4 svc Sep 2020 543
5 mwe Sep 2020 567
The code needs to check the most recent month-year first, followed by the previous month-year, and so on: check whether any of the x names are present in Nov 2020, then in Oct 2020, then in Sep 2020, and count their occurrences. If a name is present in Nov 2020 and Sep 2020 but not in Oct 2020, it cannot be part of the output dataframe. There can be any number of month-year values; the dataframe given here is just a small sample. I want to find this information dynamically.
I've been struggling with this for a long time. It would be great if anyone could help me solve this. Thank you in advance.
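The check described above can be sketched in base R as follows. This is only one possible approach under stated assumptions: `d` is kept as a plain character label, and the month order is written out by hand in `months_desc` (a hypothetical helper vector); with real yearmon values it could be derived by sorting the unique months in decreasing order. The function name `consecutive_presence` is made up for this illustration.

```r
df3 <- data.frame(
  x = c("bbc", "rsb", "atc", "svc", "mwe",
        "bpa", "mwe", "uhs", "se", "db",
        "rsb", "svc", "db", "rb", "bpa"),
  d = rep(c("Sep 2020", "Oct 2020", "Nov 2020"), each = 5),
  y = c(123, 234, 345, 543, 567, 322, 456, 786, 543, 778,
        358, 678, 321, 689, 765),
  stringsAsFactors = FALSE
)

# Assumption: months listed from most recent to oldest.
months_desc <- c("Nov 2020", "Oct 2020", "Sep 2020")
by_month <- split(df3$x, df3$d)

consecutive_presence <- function(by_month, months_desc) {
  # Start from the names in the most recent month, each counted once.
  alive <- unique(by_month[[months_desc[1]]])
  counts <- setNames(rep(1L, length(alive)), alive)
  # Walk backwards; a name drops out at the first month it is missing from.
  for (m in months_desc[-1]) {
    alive <- intersect(alive, by_month[[m]])
    if (length(alive) == 0) break
    counts[alive] <- counts[alive] + 1L
  }
  keep <- counts[counts > 1L]
  data.frame(names_present = names(keep), present_for = unname(keep),
             stringsAsFactors = FALSE)
}

df4 <- consecutive_presence(by_month, months_desc)
df4 <- df4[order(df4$names_present), ]
rownames(df4) <- NULL
df4
```

Because `alive` only ever shrinks, svc is dropped at Oct 2020 and is never counted for Sep 2020, which is the "consecutive months only" behaviour the question asks for.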

Convert time series to data.frame without losing the year and Month items

I have a time series, dt_ts. I want to convert it to a dataframe without losing the year and month.
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2005 41.26 40.02 38.24 35.37 39.35 38.90 43.51 40.32 38.14 41.04 41.78 40.48
2006 40.55 42.15 42.30 39.93 38.12 35.79 34.71 34.29 36.27 37.33 37.97 40.16
2007 40.74 39.59 36.74 37.87 38.87 39.35 37.17 38.31 32.44
I want something like:
Year Month Sales
2005 Jan 41.26
etc etc etc
A solution using dplyr, tidyr, and tibble.
library(dplyr)
library(tidyr)
library(tibble)
dt2 <- dt %>%
rownames_to_column("Year") %>%
gather(Month, Sales, -Year) %>%
mutate(Month = factor(Month, levels = colnames(dt))) %>%
arrange(Year, Month)
dt2
Year Month Sales
1 2005 Jan 41.26
2 2005 Feb 40.02
3 2005 Mar 38.24
4 2005 Apr 35.37
5 2005 May 39.35
6 2005 Jun 38.90
7 2005 Jul 43.51
8 2005 Aug 40.32
9 2005 Sep 38.14
10 2005 Oct 41.04
11 2005 Nov 41.78
12 2005 Dec 40.48
13 2006 Jan 40.55
14 2006 Feb 42.15
15 2006 Mar 42.30
16 2006 Apr 39.93
17 2006 May 38.12
18 2006 Jun 35.79
19 2006 Jul 34.71
20 2006 Aug 34.29
21 2006 Sep 36.27
22 2006 Oct 37.33
23 2006 Nov 37.97
24 2006 Dec 40.16
25 2007 Jan 40.74
26 2007 Feb 39.59
27 2007 Mar 36.74
28 2007 Apr 37.87
29 2007 May 38.87
30 2007 Jun 39.35
31 2007 Jul 37.17
32 2007 Aug 38.31
33 2007 Sep 32.44
34 2007 Oct NA
35 2007 Nov NA
36 2007 Dec NA
DATA
dt <- read.table(text = " Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2005 41.26 40.02 38.24 35.37 39.35 38.90 43.51 40.32 38.14 41.04 41.78 40.48
2006 40.55 42.15 42.30 39.93 38.12 35.79 34.71 34.29 36.27 37.33 37.97 40.16
2007 40.74 39.59 36.74 37.87 38.87 39.35 37.17 38.31 32.44",
header = TRUE, fill = TRUE)
One option would be to convert to xts, get the 'index', split it into two columns, and cbind them with the values of 'ts1':
library(xts)
cbind(read.table(text = as.character(index(as.xts(ts1))),
col.names = c('Month', 'Year')), Sales = c(ts1))
data
set.seed(24)
ts1 <- ts(sample(50), start = c(2001, 1), frequency = 12)
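A base R alternative that avoids the xts dependency, sketched here with the first four values from the question: `time()` gives the fractional year of each observation and `cycle()` gives its position within the year, which maps onto `month.abb`. The names `sales_ts` and `sales_df` are illustrative, not from the original post.

```r
# First four monthly values from the question, starting Jan 2005.
sales_ts <- ts(c(41.26, 40.02, 38.24, 35.37),
               start = c(2005, 1), frequency = 12)

sales_df <- data.frame(
  # Small epsilon guards against floating-point error at year boundaries.
  Year  = as.integer(floor(as.numeric(time(sales_ts)) + 1e-8)),
  Month = month.abb[as.integer(cycle(sales_ts))],
  Sales = as.numeric(sales_ts),
  stringsAsFactors = FALSE
)
sales_df
```

Unlike the wide-table approach, this produces no trailing NA rows when the series ends partway through a year.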

Why the value of x-axis in time series plot in R shows 2012.0, 2013.0 rather than Jan 2012, etc?

I have created a time series matrix with the code and output below:
ts2 <-ts(cbind(LRC_3PDMUM, LRC_3PDMMS),frequency=12,start=c(2012,1))
ts2
LRC_3PDMUM LRC_3PDMMS
Jan 2012 0.029256 0.025904
Feb 2012 0.051945 0.055827
Mar 2012 0.078153 0.084049
Apr 2012 0.100596 0.110188
May 2012 0.126015 0.136850
Jun 2012 0.149349 0.162446
Jul 2012 0.173949 0.186486
Aug 2012 0.198704 0.212683
Sep 2012 0.220277 0.237433
Oct 2012 0.244358 0.262342
Nov 2012 0.272664 0.286019
Dec 2012 0.293653 0.309429
Jan 2013 0.320472 0.331575
Feb 2013 0.339880 0.356900
Mar 2013 0.362203 0.384612
Apr 2013 0.383525 0.408996
May 2013 0.403316 0.431810
Jun 2013 0.430651 0.454040
Jul 2013 0.453148 0.475161
Aug 2013 0.484378 0.496460
Sep 2013 0.501923 0.518307
Oct 2013 0.525252 0.541631
Nov 2013 0.544958 0.563007
Dec 2013 0.564571 0.582775
However, when I do plot(ts2), the plot has x-axis values like 2012.0, 2013.0, whereas I would expect Jan 2012, Feb 2013, etc. Please advise how to revise the code. Thanks!
Assuming an example that looks like yours:
a <- ts( matrix(1:100,ncol=2), frequency = 12, start = c(1959, 1))
> a
Series 1 Series 2
Jan 1959 1 51
Feb 1959 2 52
Mar 1959 3 53
Apr 1959 4 54
May 1959 5 55
Jun 1959 6 56
Jul 1959 7 57
Aug 1959 8 58
Sep 1959 9 59
Oct 1959 10 60
Nov 1959 11 61
Dec 1959 12 62
Jan 1960 13 63
Feb 1960 14 64
#and so on...
The easiest way would be to use the xts package like this:
library(xts)
#transform to xts that uses this date format
b <- as.xts(a)
#plot first series
plot (b[, 'Series 1'], ylim=c(0,100))
#plot second series
lines(b[, 'Series 2'], col='red')
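If adding a package is not an option, a similar result can be sketched in base R by suppressing the default axis and drawing one with month-year labels. This is an illustration under assumptions: the names `tt`, `labs`, and `idx` and the choice of one tick every six months are arbitrary.

```r
a <- ts(matrix(1:100, ncol = 2), frequency = 12, start = c(1959, 1))

# Numeric time index and matching "Mon YYYY" labels, one per row.
tt   <- as.numeric(time(a))
labs <- paste(month.abb[as.integer(cycle(a))], floor(tt + 1e-8))

# Label every sixth observation to keep the axis readable.
idx <- seq(1, length(tt), by = 6)

# Plot the first series without the default x-axis, add the second series,
# then draw the custom axis.
plot(a[, 1], xaxt = "n", ylim = c(0, 100), xlab = "", ylab = "value")
lines(a[, 2], col = "red")
axis(1, at = tt[idx], labels = labs[idx])
```

The tick spacing in `idx` may need adjusting for longer series so that labels do not overlap.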

extracting directory name and putting it on top of the list

Aug 1 2013 /home/s/tone/TONE/gong1
Aug 1 2013 /home/s/tone/TONE/gong1.x
Aug 1 2013 /home/s/tone/TONE/gong2
Aug 1 2013 /home/s/tone/TONE/gong1.kbd
Aug 1 2013 /home/s/tone/TONE/gong2.x
Aug 1 2013 /home/s/tone/TONE/gong2.kbd
Aug 1 2013 /home/s/tone/TONE/gong3.kbd
Oct 10 2013 /home/s/man/whatisSPEC
Oct 10 2013 /home/s/man/man3/ctx.3
Oct 10 2013 /home/s/man/man3/sos.3
Oct 10 2013 /home/s/man/man3/dt.3
Oct 10 2013 /home/s/man/man3/timexpr.3
Oct 10 2013 /home/s/man/man3/mpusw.3
Oct 10 2013 /home/s/man/man3/mpu.err.3
Oct 10 2013 /home/s/man/man3/dbr.3
Oct 10 2013 /home/s/man/man3/psi.err.3
Oct 10 2013 /home/s/man/man3/stapo.3
Hi guys,
Is there a way to insert the directory name above each group and reprint the list so that it looks like this? Thanks so much.
TONE
Aug 1 2013 /home/s/tone/TONE/gong1
Aug 1 2013 /home/s/tone/TONE/gong1.x
Aug 1 2013 /home/s/tone/TONE/gong2
Aug 1 2013 /home/s/tone/TONE/gong1.kbd
Aug 1 2013 /home/s/tone/TONE/gong2.x
Aug 1 2013 /home/s/tone/TONE/gong2.kbd
Aug 1 2013 /home/s/tone/TONE/gong3.kbd
man
Oct 10 2013 /home/s/man/whatisSPEC
man3
Oct 10 2013 /home/s/man/man3/ctx.3
Oct 10 2013 /home/s/man/man3/sos.3
Oct 10 2013 /home/s/man/man3/dt.3
Oct 10 2013 /home/s/man/man3/timexpr.3
Oct 10 2013 /home/s/man/man3/mpusw.3
Oct 10 2013 /home/s/man/man3/mpu.err.3
Oct 10 2013 /home/s/man/man3/dbr.3
Oct 10 2013 /home/s/man/man3/psi.err.3
Oct 10 2013 /home/s/man/man3/stapo.3
It's not clear where you get your list from, so I make the same assumption as Mari, namely that it is in a file:
$ cat sample.txt
> Aug 1 2013 /home/s/tone/TONE/gong1
> Aug 1 2013 /home/s/tone/TONE/gong1.x
> Aug 1 2013 /home/s/tone/TONE/gong2
> Aug 1 2013 /home/s/tone/TONE/gong1.kbd
> Aug 1 2013 /home/s/tone/TONE/gong2.x
> Aug 1 2013 /home/s/tone/TONE/gong2.kbd
> Aug 1 2013 /home/s/tone/TONE/gong3.kbd
> Oct 10 2013 /home/s/man/whatisSPEC
> Oct 10 2013 /home/s/man/man3/ctx.3
> Oct 10 2013 /home/s/man/man3/sos.3
> Oct 10 2013 /home/s/man/man3/dt.3
> Oct 10 2013 /home/s/man/man3/timexpr.3
> Oct 10 2013 /home/s/man/man3/mpusw.3
> Oct 10 2013 /home/s/man/man3/mpu.err.3
> Oct 10 2013 /home/s/man/man3/dbr.3
> Oct 10 2013 /home/s/man/man3/psi.err.3
> Oct 10 2013 /home/s/man/man3/stapo.3
awk can handle this:
$ awk -f script.awk sample.txt
> TONE
> Aug 1 2013 /home/s/tone/TONE/gong1
> Aug 1 2013 /home/s/tone/TONE/gong1.x
> Aug 1 2013 /home/s/tone/TONE/gong2
> Aug 1 2013 /home/s/tone/TONE/gong1.kbd
> Aug 1 2013 /home/s/tone/TONE/gong2.x
> Aug 1 2013 /home/s/tone/TONE/gong2.kbd
> Aug 1 2013 /home/s/tone/TONE/gong3.kbd
> man
> Oct 10 2013 /home/s/man/whatisSPEC
> man3
> Oct 10 2013 /home/s/man/man3/ctx.3
> Oct 10 2013 /home/s/man/man3/sos.3
> Oct 10 2013 /home/s/man/man3/dt.3
> Oct 10 2013 /home/s/man/man3/timexpr.3
> Oct 10 2013 /home/s/man/man3/mpusw.3
> Oct 10 2013 /home/s/man/man3/mpu.err.3
> Oct 10 2013 /home/s/man/man3/dbr.3
> Oct 10 2013 /home/s/man/man3/psi.err.3
> Oct 10 2013 /home/s/man/man3/stapo.3
and the script.awk used in this example looks like:
BEGIN {
FS="/"
}
lastDir!=$(NF-1){
lastDir=$(NF-1)
print lastDir
}
{
print $0
}
At the beginning we set the field separator FS to /. This is the same as calling awk with awk -F "/", but for clarity I put everything in a script instead of writing an awk one-liner.
The NF variable gives the number of fields in the current line, so $(NF-1) is the second-to-last field (separated by /), which is exactly the name of the directory that contains the file. If the lastDir variable differs from $(NF-1) (the current directory), we overwrite lastDir and print it. In any case, we print the whole line with $0. Note that the lastDir variable doesn't need to be initialized: an unset awk variable evaluates to an empty string.
I assume your input data is in a file, so I created a file with your input data. It looks like this on my server:
cat sample.txt
Aug 1 2013 /home/s/tone/TONE/gong1
Aug 1 2013 /home/s/tone/TONE/gong1.x
Aug 1 2013 /home/s/tone/TONE/gong2
Aug 1 2013 /home/s/tone/TONE/gong1.kbd
Aug 1 2013 /home/s/tone/TONE/gong2.x
Aug 1 2013 /home/s/tone/TONE/gong2.kbd
Aug 1 2013 /home/s/tone/TONE/gong3.kbd
Oct 10 2013 /home/s/man/whatisSPEC
Oct 10 2013 /home/s/man/man3/ctx.3
Oct 10 2013 /home/s/man/man3/sos.3
Oct 10 2013 /home/s/man/man3/dt.3
Oct 10 2013 /home/s/man/man3/timexpr.3
Oct 10 2013 /home/s/man/man3/mpusw.3
Oct 10 2013 /home/s/man/man3/mpu.err.3
Oct 10 2013 /home/s/man/man3/dbr.3
Oct 10 2013 /home/s/man/man3/psi.err.3
Oct 10 2013 /home/s/man/man3/stapo.3
You can get the directory names with this command:
awk -F "/" '{print $(NF-1)}' sample.txt | uniq
output
TONE
man
man3
This only gets you the directory names. I am not sure how to print them at the top of each group of lines.
