I want to plot a time series together with its moving average, like the example in Forecasting: Principles and Practice. I use my own time series, called salests:
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2015 110 115 92 120 125 103 132 136 114 139 143 119
2016 150 156 130 169 166 142 170 173 151 180 184 163
I then use similar code as in the book:
autoplot(salests, series="Sales") +
forecast::autolayer(ma(salests, 5), series="5 Moving Average")
But I receive the error:
Error: Invalid input: date_trans works with objects of class Date only
What am I doing wrong? As far as I can tell, I am just following the book.
Thanks in advance
Here are some ideas that could help you.
# I start reading your dataset
df1 <- read.table(text='
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2015 110 115 92 120 125 103 132 136 114 139 143 119
2016 150 156 130 169 166 142 170 173 151 180 184 163
', header=T)
# Set locale to 'English' if you have a different setting
Sys.setlocale( locale='English' )
# I reshape your dataset in long format
library(reshape)
df2 <- melt(df1)
df2$time <- paste0("01-",df2$variable,'-',rep(rownames(df1), ncol(df1)))
df2$time <- as.Date(df2$time, "%d-%b-%Y")
( df2 <- df2[order(df2$time),] )
# variable value time
# 1 Jan 110 2015-01-01
# 3 Feb 115 2015-02-01
# 5 Mar 92 2015-03-01
# 7 Apr 120 2015-04-01
# 9 May 125 2015-05-01
# 11 Jun 103 2015-06-01
# 13 Jul 132 2015-07-01
# 15 Aug 136 2015-08-01
# 17 Sep 114 2015-09-01
# 19 Oct 139 2015-10-01
# 21 Nov 143 2015-11-01
# 23 Dec 119 2015-12-01
# 2 Jan 150 2016-01-01
# 4 Feb 156 2016-02-01
# 6 Mar 130 2016-03-01
# 8 Apr 169 2016-04-01
# 10 May 166 2016-05-01
# 12 Jun 142 2016-06-01
# 14 Jul 170 2016-07-01
# 16 Aug 173 2016-08-01
# 18 Sep 151 2016-09-01
# 20 Oct 180 2016-10-01
# 22 Nov 184 2016-11-01
# 24 Dec 163 2016-12-01
Now create a time-series ts object
( salests <- ts(df2$value, frequency=12, start = c(2015,1)) )
# Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
# 2015 110 115 92 120 125 103 132 136 114 139 143 119
# 2016 150 156 130 169 166 142 170 173 151 180 184 163
and plot it:
library(ggfortify)
library(forecast)
autoplot(salests) +
forecast::autolayer(ma(salests, 5), series="5 Moving Average")
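If mixing plotting methods from two packages keeps causing trouble, the same picture can be sketched in base R only. This is a minimal sketch, not the book's method: it assumes that a centred 5-term moving average is what is wanted (for an odd order this is what ma(salests, 5) computes) and uses stats::filter with equal weights. The name ma5 is just an illustrative choice.

```r
# Rebuild the series from the question
salests <- ts(c(110, 115, 92, 120, 125, 103, 132, 136, 114, 139, 143, 119,
                150, 156, 130, 169, 166, 142, 170, 173, 151, 180, 184, 163),
              frequency = 12, start = c(2015, 1))

# Centred 5-term moving average: equal weights, two-sided filter.
# The first two and last two values are NA because the window is incomplete there.
ma5 <- stats::filter(salests, rep(1 / 5, 5), sides = 2)

# Base graphics: original series in black, moving average in red
plot(salests, ylab = "Sales")
lines(ma5, col = "red")
legend("topleft", legend = c("Sales", "5-MA"), col = c("black", "red"), lty = 1)
```

The result keeps the time attributes of salests, so lines() draws it on the same time axis without any date conversion.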
What is the simplest way of turning a frequency data table into a prop table in R?
This is the data:
Time Total Blog News Social.Network Microblog Other Forums Pictures Video
1 15.KW 2022 1816 23 326 39 678 99 27 523 0
2 16.KW 2022 2535 32 690 42 815 135 26 644 1
3 17.KW 2022 2181 20 362 79 805 110 14 634 1
4 18.KW 2022 2583 19 895 25 692 127 6 658 0
5 19.KW 2022 2337 21 555 22 908 148 8 599 0
6 20.KW 2022 2091 23 392 18 851 119 5 554 0
7 21.KW 2022 1658 17 344 16 650 129 1 417 0
8 22.KW 2022 2476 24 798 24 937 150 7 443 0
9 23.KW 2022 1687 14 341 17 691 102 9 400 0
10 24.KW 2022 2476 21 521 29 984 110 19 509 0
11 25.KW 2022 2412 22 696 31 845 115 29 561 0
12 26.KW 2022 2197 22 715 13 709 128 59 445 0
13 27.KW 2022 2111 20 429 10 937 86 28 474 1
14 28.KW 2022 752 5 121 4 373 42 3 172 0
Your data frame df has a 2nd column called Total. It seems that you want to divide subsequent columns by this one.
df[-1] <- df[-1] / df$Total
After this, the 1st column Time does not change. 2nd column Total becomes 1. Other columns become proportions.
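To see why this works, here is a small sketch on made-up data with the same layout (the values and the short column list are invented for illustration). Dividing a data frame by a vector with one element per row recycles the vector down each column, so every column is divided by the Total of its own row.

```r
# Made-up data: Time, Total, then category counts
df <- data.frame(Time  = c("w1", "w2"),
                 Total = c(10, 20),
                 Blog  = c(2, 5),
                 News  = c(8, 15))

# Divide every column except the first by Total (row by row)
df[-1] <- df[-1] / df$Total
df
```

Multiplying by 100 afterwards, for example df[-1] <- round(100 * df[-1], 1), would give percentages instead of proportions.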
I have the following Gompertz parameters:
A <- 100 # A is always 100
mu <- 35
lambda <- 265 # day of the year. Also the start day
I can use the above parameters to evaluate a Gompertz curve with the following function:
grofit::gompertz(time,A,mu,lambda)
time is basically the vector lambda:end.day.
Now the issue is that I know lambda (the start day) but not the end day. I want to find the end day, i.e. the day on which the curve reaches 100.
For example, if I supply lambda:end.day as 265:270, I do not reach 100.
time <- 265:270
x <- round(grofit::gompertz(time,A,mu,lambda),2)
x
[1] 6.60 35.00 66.67 85.51 94.13 97.69
By multiple trials, I know if I give a vector of 265:277, I will reach 100.
time <- 265:277
x <- round(grofit::gompertz(time,A,mu,lambda),2)
x
[1] 6.60 35.00 66.67 85.51 94.13 97.69
[7] 99.10 99.65 99.87 99.95 99.98 99.99
[13] 100.00
I have a data frame that has the lambda (same as the start day) and mu.
df <- data.frame(id = c(1,1,2,2), year = c(1981,1982,1981,1982), mu= c(35,32,33,28), lambda = c(275,278,284,296))
For each id and year, I want two columns: one called day, whose first value equals lambda, and a second called x, which gives the value of the curve for each day until it reaches 100 (the end day).
How do I apply the above function for each id and year so that I get a data frame something like this:
id year day x
1 1981 275 6.6
1 1981 276 35
1 1981 277 66.67
1 1981 278 85.51
1 1981 279 94.13
1 1981 280 97.69
1 1981 281 99.1
1 1981 282 99.65
1 1981 283 99.87
1 1981 284 99.95
1 1981 285 99.98
1 1981 286 99.99
1 1981 287 100
. . . .
. . . .
2 1982 296 8
2 1982 297 33
2 1982 298 45
2 1982 299 63
2 1982 300 61
2 1982 301 73
2 1982 302 81
2 1982 303 91
2 1982 304 94
2 1982 305 98
2 1982 306 99
2 1982 307 100
Using dplyr and tidyr:
library(dplyr)
library(tidyr)
A <- 100 # A is always 100
df <-
data.frame(
id = c(1, 1, 2, 2),
year = c(1981, 1982, 1981, 1982),
mu = c(35, 32, 33, 28),
lambda = c(275, 278, 284, 296)
)
df2 <- df %>%
crossing(day = 1:365) %>%
group_by(id, year) %>%
filter(day >= lambda) %>%
mutate(x = round(grofit::gompertz(day, A, mu, lambda), 2)) %>%
group_by(id, year, x) %>%
filter(x != 100 | row_number() == 1)
df2 %>%
as.data.frame()
Result:
id year mu lambda day x
1 1 1981 35 275 275 6.60
2 1 1981 35 275 276 35.00
3 1 1981 35 275 277 66.67
4 1 1981 35 275 278 85.51
5 1 1981 35 275 279 94.13
6 1 1981 35 275 280 97.69
7 1 1981 35 275 281 99.10
8 1 1981 35 275 282 99.65
9 1 1981 35 275 283 99.87
10 1 1981 35 275 284 99.95
11 1 1981 35 275 285 99.98
12 1 1981 35 275 286 99.99
13 1 1981 35 275 287 100.00
14 1 1982 32 278 278 6.60
15 1 1982 32 278 279 32.01
16 1 1982 32 278 280 62.05
17 1 1982 32 278 281 81.87
18 1 1982 32 278 282 91.96
19 1 1982 32 278 283 96.55
20 1 1982 32 278 284 98.54
21 1 1982 32 278 285 99.39
22 1 1982 32 278 286 99.74
23 1 1982 32 278 287 99.89
24 1 1982 32 278 288 99.95
25 1 1982 32 278 289 99.98
26 1 1982 32 278 290 99.99
27 1 1982 32 278 291 100.00
28 2 1981 33 284 284 6.60
29 2 1981 33 284 285 33.01
30 2 1981 33 284 286 63.64
31 2 1981 33 284 287 83.17
32 2 1981 33 284 288 92.76
33 2 1981 33 284 289 96.98
34 2 1981 33 284 290 98.76
35 2 1981 33 284 291 99.49
36 2 1981 33 284 292 99.79
37 2 1981 33 284 293 99.92
38 2 1981 33 284 294 99.97
39 2 1981 33 284 295 99.99
40 2 1981 33 284 296 99.99
41 2 1981 33 284 297 100.00
42 2 1982 28 296 296 6.60
43 2 1982 28 296 297 28.09
44 2 1982 28 296 298 55.26
45 2 1982 28 296 299 75.80
46 2 1982 28 296 300 87.86
47 2 1982 28 296 301 94.13
48 2 1982 28 296 302 97.21
49 2 1982 28 296 303 98.69
50 2 1982 28 296 304 99.39
51 2 1982 28 296 305 99.71
52 2 1982 28 296 306 99.87
53 2 1982 28 296 307 99.94
54 2 1982 28 296 308 99.97
55 2 1982 28 296 309 99.99
56 2 1982 28 296 310 99.99
57 2 1982 28 296 311 100.00
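An alternative without dplyr, tidyr or grofit can be sketched in base R. Two assumptions are made here and should be checked against your setup: the curve is written out by hand as A * exp(-exp(mu * e * (lambda - time) / A + 1)), a parameterisation that reproduces the values printed in the question, and "reaching 100" means the value rounded to 2 decimals equals A. The names gompertz_curve and days_until_A are hypothetical helpers, not part of any package.

```r
# Hand-written Gompertz curve (assumed parameterisation, see note above)
gompertz_curve <- function(time, A, mu, lambda) {
  A * exp(-exp(mu * exp(1) * (lambda - time) / A + 1))
}

# Hypothetical helper: days from lambda up to the first day whose rounded value reaches A
days_until_A <- function(A, mu, lambda, digits = 2, max_days = 365) {
  day <- lambda + 0:max_days
  x <- round(gompertz_curve(day, A, mu, lambda), digits)
  end <- which(x >= A)[1]             # first index at A, NA if never reached
  if (is.na(end)) end <- length(day)  # fall back to the whole search window
  data.frame(day = day[1:end], x = x[1:end])
}

df <- data.frame(id = c(1, 1, 2, 2),
                 year = c(1981, 1982, 1981, 1982),
                 mu = c(35, 32, 33, 28),
                 lambda = c(275, 278, 284, 296))

# One block of rows per (id, year), stacked into a single data frame
pieces <- lapply(seq_len(nrow(df)), function(i) {
  curve <- days_until_A(A = 100, mu = df$mu[i], lambda = df$lambda[i])
  data.frame(id = df$id[i], year = df$year[i], day = curve$day, x = curve$x)
})
result <- do.call(rbind, pieces)
head(result)
```

Unlike the crossing(day = 1:365) approach, this searches forward from lambda, so it is not cut off at day 365 when a curve starts late in the year.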
I have a dataframe, with the following data:
data1$YEAR data1$WEEK data1$TOTAL.PATIENTS
1 2009 1 579428
9 2009 2 565631
17 2009 3 582932
25 2009 4 611176
33 2009 5 638613
41 2009 6 648304
49 2009 7 624583
57 2009 8 659573
65 2009 9 623389
73 2009 10 637672
81 2009 11 605503
89 2009 12 608342
97 2009 13 586651
105 2009 14 564460
113 2009 15 558837
121 2009 16 577836
129 2009 17 624734
137 2009 18 598189
145 2009 19 550300
153 2009 20 544432
161 2009 21 531526
169 2009 22 538177
177 2009 23 493761
185 2009 24 521701
193 2009 25 512268
201 2009 26 475877
209 2009 27 480680
217 2009 28 502466
225 2009 29 503971
233 2009 30 485804
241 2009 31 496666
249 2009 32 506019
257 2009 33 544827
265 2009 34 588916
273 2009 35 573972
281 2009 36 571201
289 2009 37 638302
296 2009 38 608464
303 2009 39 606458
311 2009 40 855346
319 2009 41 853912
327 2009 42 906536
335 2009 43 898860
343 2009 44 899425
351 2009 45 864348
359 2009 46 853552
367 2009 47 654101
375 2009 48 814550
383 2009 49 781811
391 2009 50 728401
399 2009 51 536961
407 2009 52 583299
2 2010 1 721138
...
The second column is the year, from 2009 to 2015.
The third column is the week of the year.
I would like to plot this data frame. On the x-axis of this plot I would like to see the weeks of each year separately.
something like this. How can I do that?
Does this work, or do you need to relabel the x-axis with the year only? (In the following plot the x-axis shows year-weeks.)
head(df)
Year Week TOTAL.PATIENTS
1 2009 11 605503
2 2009 12 608342
3 2009 13 586651
4 2009 14 564460
5 2009 15 558837
6 2009 16 577836
df$Year_Week <- paste(df$Year, sprintf('%02d', df$Week), sep='-')
df$Year <- as.factor(df$Year)
library(ggplot2)
library(scales)
ggplot(df, aes(Year_Week,TOTAL.PATIENTS,col=Year, group=Year)) +
geom_line(lwd=2) + scale_y_continuous(labels = comma) +
xlab('Year-Week') +
theme(axis.text.x = element_text(angle=90, vjust = 0.5))
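The sprintf('%02d', ...) step matters because Year_Week is a character label, and character labels are ordered alphabetically on a discrete axis. A short sketch with a few made-up week numbers shows the difference zero-padding makes:

```r
weeks <- c(2, 10, 1)

# Without padding, "2009-10" sorts before "2009-2"
unpadded <- paste(2009, weeks, sep = "-")
sort(unpadded)

# With two-digit padding, alphabetical order matches week order
padded <- paste(2009, sprintf("%02d", weeks), sep = "-")
sort(padded)
```

With padded labels the weeks appear on the x-axis in calendar order within each year.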
My data is like this:
dat <- read.table(header=TRUE, text="
ID Veh oct nov dec jan feb
1120 1 7 47 152 259 140
2000 1 5 88 236 251 145
2000 2 14 72 263 331 147
1133 1 6 71 207 290 242
2000 3 7 47 152 259 140
2002 1 5 88 236 251 145
2006 1 14 72 263 331 147
2002 2 6 71 207 290 242
")
dat
ID Veh oct nov dec jan feb
1 1120 1 7 47 152 259 140
2 2000 1 5 88 236 251 145
3 2000 2 14 72 263 331 147
4 1133 1 6 71 207 290 242
5 2000 3 7 47 152 259 140
6 2002 1 5 88 236 251 145
7 2006 1 14 72 263 331 147
8 2002 2 6 71 207 290 242
Using the duplicated function:
Unique Cells in Column 1
dat[!duplicated(dat[,1]),]
ID Veh oct nov dec jan feb
1 1120 1 7 47 152 259 140
2 2000 1 5 88 236 251 145
4 1133 1 6 71 207 290 242
6 2002 1 5 88 236 251 145
7 2006 1 14 72 263 331 147
Duplicate cells in Column 1
dat[duplicated(dat[,1]),]
ID Veh oct nov dec jan feb
3 2000 2 14 72 263 331 147
5 2000 3 7 47 152 259 140
8 2002 2 6 71 207 290 242
But I want to keep every row whose ID occurs more than once, including the first occurrence, like the following (which I am struggling to code):
ID Veh oct nov dec jan feb
2000 1 5 88 236 251 145
2000 2 14 72 263 331 147
2000 3 7 47 152 259 140
2002 1 5 88 236 251 145
2002 2 6 71 207 290 242
Try
dat[duplicated(dat[,1])|duplicated(dat[,1],fromLast=TRUE),]
# ID Veh oct nov dec jan feb
#2 2000 1 5 88 236 251 145
#3 2000 2 14 72 263 331 147
#5 2000 3 7 47 152 259 140
#6 2002 1 5 88 236 251 145
#8 2002 2 6 71 207 290 242
Or
library(data.table)
setDT(dat)[, .SD[.N>1], ID]
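The first approach works because duplicated() only flags occurrences after the first one, while fromLast = TRUE flags occurrences before the last one. Combining both with | flags every member of a repeated group. A small sketch on the ID column alone (the names fwd, bwd, keep and alt are only illustrative):

```r
ids <- c(1120, 2000, 2000, 1133, 2000, 2002, 2006, 2002)

fwd <- duplicated(ids)                   # TRUE for every occurrence after the first
bwd <- duplicated(ids, fromLast = TRUE)  # TRUE for every occurrence before the last
keep <- fwd | bwd                        # TRUE for every ID that occurs more than once

which(keep)
# 2 3 5 6 8

# Equivalent formulation: keep IDs that appear among the duplicated ones
alt <- ids %in% ids[duplicated(ids)]
identical(keep, alt)
```

The row positions 2, 3, 5, 6 and 8 are the same rows returned by the data frame version above.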
I have some monthly data like this:
1/1/90,620
2/1/90,591
3/1/90,574
4/1/90,542
5/1/90,534
6/1/90,545
#...etc
If I use ts() function, it's easy to make the data into time series structure like:
Jan Feb Mar ... Nov Dec
1990 620 591 574 ... 493 464
1991 100 200 300 ...........
Is there any way to change it into a quarterly layout like this:
1st 2nd 3rd 4th
1990-Q1 620 591 574 464
1990-Q2 100 200 300 400
1990-Q3 ...
1990-Q4 ...
1991-Q1 ...
I tried to change
ts(mydata,start=c(1990,1),frequency=12)
to
ts(mydata,start=c(as.yearqrt("1990-1",1)),frequency=4)
but it does not seem to work.
Could anyone help me? Thank you very much.
monthly <- ts(mydata, start = c(1990, 1), frequency = 12)
quarterly <- aggregate(monthly, nfrequency = 4)
I don't agree with Hyndman on this one. Which is rare as Hyndman can usually do no wrong. However, I can show you his solution doesn't give the OP what he wants.
test<-c(1:100)
test_ts <- ts(test, start=c(2000,1), frequency=12)
test_ts
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2000 1 2 3 4 5 6 7 8 9 10 11 12
2001 13 14 15 16 17 18 19 20 21 22 23 24
2002 25 26 27 28 29 30 31 32 33 34 35 36
2003 37 38 39 40 41 42 43 44 45 46 47 48
2004 49 50 51 52 53 54 55 56 57 58 59 60
2005 61 62 63 64 65 66 67 68 69 70 71 72
2006 73 74 75 76 77 78 79 80 81 82 83 84
2007 85 86 87 88 89 90 91 92 93 94 95 96
2008 97 98 99 100
test_agg <- aggregate(test_ts, nfrequency=4)
test_agg
Qtr1 Qtr2 Qtr3 Qtr4
2000 6 15 24 33
2001 42 51 60 69
2002 78 87 96 105
2003 114 123 132 141
2004 150 159 168 177
2005 186 195 204 213
2006 222 231 240 249
2007 258 267 276 285
2008 294
Well, wait, that first quarter isn't the average of the 3 months, it's the sum (1+2+3 = 6, but you want it to show the mean, 2). So you will need to modify that a tad.
test_agg <- aggregate(test_ts, nfrequency=4)/3
# divisor is (old freq)/(new freq) = 12/4 = 3
Qtr1 Qtr2 Qtr3 Qtr4
2000 2 5 8 11
2001 14 17 20 23
2002 26 29 32 35
2003 38 41 44 47
2004 50 53 56 59
2005 62 65 68 71
2006 74 77 80 83
2007 86 89 92 95
2008 98
Which now shows you the mean of the monthly data written as quarterly.
The divisor is the trick here. If you had weekly (freq=52) and wanted quarterly (freq=4) you'd divide by 52/4=13.
If you want the mean instead of the sum, just add "mean":
quarterly <- aggregate(monthly, nfrequency=4,mean)
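As a quick check, the two approaches can be compared on the 1:100 test series used earlier. This sketch only relies on base R's aggregate method for ts objects; the names by_mean and by_sum are illustrative.

```r
test_ts <- ts(1:100, start = c(2000, 1), frequency = 12)

# Quarterly mean computed directly
by_mean <- aggregate(test_ts, nfrequency = 4, FUN = mean)

# Quarterly sum divided by the number of months per quarter (12 / 4 = 3)
by_sum <- aggregate(test_ts, nfrequency = 4) / 3

all.equal(as.numeric(by_mean), as.numeric(by_sum))
```

Both give the same quarterly series. Passing FUN = mean avoids having to work out the divisor by hand, and other summaries (for example FUN = max) can be swapped in the same way.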