I have a time series object ts, shown in full below. It holds monthly data from Jan 2013 to Dec 2017. I am trying to compute a daily average, that is, each monthly value divided by the number of days in that month.
Expected output
The first value in ts, for Jan 2013, is 23770; I want 23770/31, where 31 is the number of days in January. The second value, for Feb 2013, is 23482; I want 23482/28, as February 2013 had 28 days, and so on.
Tried so far:
I know monthdays() can do this, with something like ts/monthdays(); monthdays() returns the number of days in a month. I am not able to implement it here. I read about tapply somewhere, but it does not give the desired result, since I need a value for each month-year combination.
ts
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2013 23770 23482 23601 22889 23401 24240 23873 23647 23378 23871 22624 23496
2014 26765 27619 26341 27320 27389 27418 26874 27005 27538 26324 27267 27583
2015 28354 27452 28336 28998 28595 28338 27806 28660 27226 28317 28666 28574
2016 30209 30659 31554 30248 30358 31091 30389 30247 31227 31839 30602 30609
2017 32180 32203 31639 31784 32375 30856 31863 32827 32506 31702 31681 32176
> cycle(ts_actual_group2)
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2013 1 2 3 4 5 6 7 8 9 10 11 12
2014 1 2 3 4 5 6 7 8 9 10 11 12
2015 1 2 3 4 5 6 7 8 9 10 11 12
2016 1 2 3 4 5 6 7 8 9 10 11 12
2017 1 2 3 4 5 6 7 8 9 10 11 12
I tried tapply, since I had read about it, but it does not give the desired output:
tapply(ts_actual_group2, cycle(ts_actual_group2), mean)
1 2 3 4 5 6 7 8 9 10 11 12
28255.6 28283.0 28294.2 28247.8 28423.6 28388.6 28161.0 28477.2 28375.0 28410.6 28168.0 28487.6
I am not able to implement it here.
I'm not sure why you couldn't. The monthdays function from the forecast package, when applied to a ts object, returns the number of days in each month of the series. The object returned is a time-series of the same dimension as the input. So you can simply divide them.
library(forecast)
ts/monthdays(ts)
Jan Feb Mar Apr May Jun
2013 766.7742 838.6429 761.3226 762.9667 754.8710 808.0000
2014 863.3871 986.3929 849.7097 910.6667 883.5161 913.9333
2015 914.6452 980.4286 914.0645 966.6000 922.4194 944.6000
2016 974.4839 1057.2069 1017.8710 1008.2667 979.2903 1036.3667
2017 1038.0645 1150.1071 1020.6129 1059.4667 1044.3548 1028.5333
monthdays(ts) # Accepts a time-series object
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2013 31 28 31 30 31 30 31 31 30 31 30 31
2014 31 28 31 30 31 30 31 31 30 31 30 31
2015 31 28 31 30 31 30 31 31 30 31 30 31
2016 31 29 31 30 31 30 31 31 30 31 30 31
2017 31 28 31 30 31 30 31 31 30 31 30 31
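If the forecast package is not available, the same division can be sketched in base R. This is only a sketch under stated assumptions, not the answer's method: it assumes a monthly ts and derives the day counts from the gaps between consecutive first-of-month dates. The names x, first_days and days are illustrative, and only the first three values of the series are used.

```r
# Monthly series (first three values from the question, for brevity)
x <- ts(c(23770, 23482, 23601), start = c(2013, 1), frequency = 12)

# First day of the starting month, built from start(x) = c(year, month)
s <- start(x)
first_day <- as.Date(sprintf("%d-%02d-01", s[1], s[2]))

# One first-of-month date per observation, plus one extra to close the last month
first_days <- seq(first_day, by = "month", length.out = length(x) + 1)

# Gaps between consecutive month starts are the month lengths in days
days <- as.numeric(diff(first_days))

# Dividing a ts by a plain vector of the same length keeps the ts attributes
daily_avg <- x / days
print(days)
print(daily_avg)
```

Because the day counts come from real calendar dates, leap years such as February 2016 are handled without special cases.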
Related
I have grouped data that I want to convert to ungrouped data.
year<-c(rep(2014,4),rep(2015,4))
Age<-rep(c(22,23,24,25),2)
n<-c(1,1,3,2,0,2,3,1)
mydata<-data.frame(year,Age,n)
I would like to have a dataset like the one below created from the previous one.
year Age
1 2014 22
2 2014 23
3 2014 24
4 2014 24
5 2014 24
6 2014 25
7 2014 25
8 2015 23
9 2015 23
10 2015 24
11 2015 24
12 2015 24
13 2015 25
Try
mydata[rep(1:nrow(mydata),mydata$n),]
year Age n
1 2014 22 1
2 2014 23 1
3 2014 24 3
3.1 2014 24 3
3.2 2014 24 3
4 2014 25 2
4.1 2014 25 2
6 2015 23 2
6.1 2015 23 2
7 2015 24 3
7.1 2015 24 3
7.2 2015 24 3
8 2015 25 1
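The output above still carries the n column and the duplicated row names. A small variation, sketched here with the illustrative name expanded, selects only year and Age and resets the row names so the result matches the requested layout:

```r
# Grouped data from the question
mydata <- data.frame(
  year = rep(c(2014, 2015), each = 4),
  Age  = rep(c(22, 23, 24, 25), 2),
  n    = c(1, 1, 3, 2, 0, 2, 3, 1)
)

# Repeat each row index n times (rows with n = 0 drop out), keep two columns
expanded <- mydata[rep(seq_len(nrow(mydata)), mydata$n), c("year", "Age")]

# Replace row names such as "3.1" with a plain 1..N sequence
rownames(expanded) <- NULL
print(expanded)
```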
Here's a tidyverse solution:
library(tidyverse)
mydata %>%
uncount(n)
which gives:
year Age
1 2014 22
2 2014 23
3 2014 24
4 2014 24
5 2014 24
6 2014 25
7 2014 25
8 2015 23
9 2015 23
10 2015 24
11 2015 24
12 2015 24
13 2015 25
You can also use tidyr syntax for this:
library(tidyr)
year<-c(rep(2014,4),rep(2015,4))
Age<-rep(c(22,23,24,25),2)
n<-c(1,1,3,2,0,2,3,1)
mydata<-data.frame(year,Age,n)
uncount(mydata, n)
#> year Age
#> 1 2014 22
#> 2 2014 23
#> 3 2014 24
#> 4 2014 24
#> 5 2014 24
#> 6 2014 25
#> 7 2014 25
#> 8 2015 23
#> 9 2015 23
#> 10 2015 24
#> 11 2015 24
#> 12 2015 24
#> 13 2015 25
But of course you shouldn't use tidyr just because it is tidyr :) See "An alternate view of the Tidyverse 'dialect' of the R language, and its promotion by RStudio".
We can use tidyr::complete
library(tidyr)
library(dplyr)
mydata %>%
# drop zero-count rows first; complete() would otherwise keep each as one row
filter(n > 0) %>%
group_by(year, Age) %>%
complete(n = seq_len(n)) %>%
select(-n) %>%
ungroup()
# A tibble: 13 × 2
year Age
<dbl> <dbl>
1 2014 22
2 2014 23
3 2014 24
4 2014 24
5 2014 24
6 2014 25
7 2014 25
8 2015 23
9 2015 23
10 2015 24
11 2015 24
12 2015 24
13 2015 25
I am using R for my time series analysis and I have the following csv file that I have loaded into R:
CSV file:
I have used the zoo package to convert my data frame into a ts object:
library(zoo)
df1_ts <- as.ts(read.zoo(df1, FUN = as.yearmon))
Running:
class(df1_ts)
# [1] "mts" "ts" "matrix"
However when I run head(df1_ts), I get the following results:
head(df1_ts)
# Time Series:
# Start = 2014
# End = 2018
# Frequency = 1
# Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
# 2014 4621 3569 4249 4593 3320 1970 2483 3474 4302 5670 5788 5570
# 2015 5747 4346 5176 5362 5360 3707 3883 5138 5568 6034 5989 5648
# 2016 5821 5164 5781 5346 5339 4743 5417 5514 5880 5899 6014 5641
# 2017 5980 5341 5890 5596 5753 5470 5589 5545 5749 5938 5864 5567
# 2018 5655 5392 5766 5268 5680 5337 5197 5714 5802 5935 5955 5637
Why am I getting Frequency = 1? I expected the frequency to be 12, as these are monthly data.
How can I fix this?
I have tried the following, without success:
df1_ts <- as.ts(read.zoo(df1, FUN = as.yearmon), freq=12)
The code shown in the question is creating a multivariate time series consisting of 12 series (one for each month column) whose time index is the year; however, what is wanted is a single univariate monthly series.
Using df1 shown reproducibly in the Note at the end, first convert the data.frame df1 to a matrix using transpose and then unravel this transposed matrix column by column into a single vector using c. Now we can define the ts series directly:
tt <- ts(c(t(df1[-1])), start = df1$Year[1], freq = 12)
giving:
frequency(tt)
## [1] 12
tt
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 2014 1 2 3 4 5 6 7 8 9 10 11 12
## 2015 13 14 15 16 17 18 19 20 21 22 23 24
## 2016 25 26 27 28 29 30 31 32 33 34 35 36
## 2017 37 38 39 40 41 42 43 44 45 46 47 48
## 2018 49 50 51 52 53 54 55 56 57 58 59 60
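The transpose matters because c() unravels a matrix column by column, while the months of one year sit in a row. A tiny sketch with an illustrative 2 x 3 matrix m shows the difference:

```r
# Two "years" (rows) of three "months" (columns)
m <- matrix(1:6, nrow = 2, byrow = TRUE)

# Without transposing, c() walks down the columns and interleaves the years
by_column <- c(m)

# After transposing, each original row becomes a column, so c() reads year by year
by_row <- c(t(m))

print(by_column)
print(by_row)
```

Here by_column mixes the two rows, whereas by_row keeps each row's values together in order, which is the layout ts() needs for a single monthly series.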
Note
Please do not use images to show your input data as it means that anyone wanting to answer with it would need to retype it. Provide it reproducibly as R code. I have done this for you this time, changing the data to avoid typing all those numbers.
df1 <- as.data.frame(cbind(2014:2018, matrix(1:60, ncol = 12, byrow = TRUE)))
names(df1) <- c("Year", month.abb)
Good afternoon
I have a time series
v2<-c(12,13,15,17,18,12,11,12)
which runs from July 1996 to October 1997, covering only the months July to October of each year.
When I try to convert it to a time series with
v2.ts<-ts(v2, frequency=12, start=c(1996,7), end=c(1997,10))
I get this result:
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1996 12 13 15 17 18 12
1997 11 12 12 13 15 17 18 12 11 12
what parameters can I use to make it like:
Jul Aug Sep Oct
1996 12 13 15 17
1997 18 12 11 12
Thanks in advance for the help
A ts series must be regularly spaced. In the output shown, consecutive points are one month apart except between October of the first year and July of the second, so it cannot be represented as a ts series.
There are several packages that can represent irregularly spaced series. With the zoo package it would be done like this:
library(zoo)
z <- as.zoo(v2.ts)
z[cycle(z) %in% 7:10]
## Jul 1996 Aug 1996 Sep 1996 Oct 1996 Jul 1997 Aug 1997 Sep 1997 Oct 1997
## 12 13 15 17 18 12 11 12
If you are not looking for a time series but just a matrix with the indicated elements then:
tapply(c(v2.ts), list(floor(time(v2.ts)), cycle(v2.ts)), c)[, 7:10]
## 7 8 9 10
## 1996 12 13 15 17
## 1997 18 12 11 12
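If only the year-by-month table is needed, it can also be built straight from the original vector v2, without going through v2.ts. This is a minimal sketch that assumes v2 holds exactly four values (July to October) per year, in order; the name m is illustrative.

```r
v2 <- c(12, 13, 15, 17, 18, 12, 11, 12)

# Fill row by row: one row per year, one column per month from Jul to Oct
m <- matrix(
  v2,
  nrow = 2,
  byrow = TRUE,
  dimnames = list(1996:1997, month.abb[7:10])
)
print(m)
```

The result is a plain matrix, not a time series, so functions that expect a ts object will not treat it as one.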
I have this data.frame:
counts <- data.frame(year = sort(rep(2000:2009, 12)), month = rep(month.abb, 10), count = sample(1:500, 120, replace = TRUE))
First 20 rows of data:
head(counts, 20)
year month count
1 2000 Jan 14
2 2000 Feb 182
3 2000 Mar 462
4 2000 Apr 395
5 2000 May 107
6 2000 Jun 127
7 2000 Jul 371
8 2000 Aug 158
9 2000 Sep 147
10 2000 Oct 41
11 2000 Nov 141
12 2000 Dec 27
13 2001 Jan 72
14 2001 Feb 7
15 2001 Mar 40
16 2001 Apr 351
17 2001 May 342
18 2001 Jun 81
19 2001 Jul 442
20 2001 Aug 389
Let's say I try to calculate the standard deviation of these data using the usual R code:
library(plyr)
ddply(counts, .(month), summarise, s.d. = sd(count))
month s.d.
1 Apr 145.3018
2 Aug 140.9949
3 Dec 173.9406
4 Feb 127.5296
5 Jan 148.2661
6 Jul 162.4893
7 Jun 133.4383
8 Mar 125.8425
9 May 168.9517
10 Nov 93.1370
11 Oct 167.9436
12 Sep 166.8740
This gives the standard deviation around the mean of each month. How can I get R to output standard deviation around maximum value of each month?
You want "max of values per month and the average from this maximum value", which is not the same as the standard deviation.
counts <- data.frame(year = sort(rep(2000:2009, 12)), month = rep(month.abb, 10), count = sample(1:500, 120, replace = TRUE))
library(data.table)
counts <- as.data.table(counts)
# Mean difference between each count and its month's maximum
counts[, mean(count - max(count)), by = month]
This question is highly vague. If you want to calculate the standard deviation of the differences to the maximum, you can use this code:
> library(plyr)
> ddply(counts, .(month), summarise, sd = sd(count - max(count)))
month sd
1 Apr 182.5071
2 Aug 114.3068
3 Dec 117.1049
4 Feb 184.4638
5 Jan 138.1755
6 Jul 167.0677
7 Jun 100.8841
8 Mar 144.8724
9 May 173.3452
10 Nov 132.0204
11 Oct 127.4645
12 Sep 152.2162
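One point worth checking: subtracting a constant, such as a month's maximum, does not change sd(), so sd(count - max(count)) equals sd(count) on the same data. If "standard deviation around the maximum" is read as the root-mean-square deviation from the maximum, which is only one possible interpretation of the question, it could be sketched in base R as follows. The seed and the names sd_plain, sd_shifted and rms_from_max are illustrative.

```r
set.seed(1)  # illustrative seed so the sample is reproducible
counts <- data.frame(
  year  = rep(2000:2009, each = 12),
  month = rep(month.abb, 10),
  count = sample(1:500, 120, replace = TRUE)
)

# sd() is unaffected by shifting every value by the same constant
sd_plain   <- tapply(counts$count, counts$month, sd)
sd_shifted <- tapply(counts$count, counts$month, function(x) sd(x - max(x)))
print(all.equal(sd_plain, sd_shifted))

# Root-mean-square deviation of each month's counts from that month's maximum
rms_from_max <- tapply(
  counts$count, counts$month,
  function(x) sqrt(mean((x - max(x))^2))
)
print(rms_from_max)
```

The two outputs in the thread differ only because sample() drew new random data each time, not because the shift changed the spread.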
I wanted to cut a quarterly time series and did the following:
cuttedts <- initialts[time(initialts) > 1984.00]
which worked inasmuch as I got all data after the first quarter of 1984. Strikingly
is.ts(initialts)
# returns TRUE
while
is.ts(cuttedts)
# returns FALSE
What did I do wrong? Should I use subset? What's the best way to do this?
You can use the window function to extract a subset of a time series.
For example:
R> myts <- ts(data=1:40, start=2001, end=c(2010,4), frequency=4)
R> myts
Qtr1 Qtr2 Qtr3 Qtr4
2001 1 2 3 4
2002 5 6 7 8
2003 9 10 11 12
2004 13 14 15 16
2005 17 18 19 20
2006 21 22 23 24
2007 25 26 27 28
2008 29 30 31 32
2009 33 34 35 36
2010 37 38 39 40
And then:
R> subts <- window(myts, start=c(2005,2), end=c(2008,3))
R> subts
Qtr1 Qtr2 Qtr3 Qtr4
2005 18 19 20
2006 21 22 23 24
2007 25 26 27 28
2008 29 30 31
The result is still a ts object:
R> is.ts(subts)
[1] TRUE
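Applied to the question's case, keeping everything after the first quarter of 1984 means starting the window at the second quarter of 1984. The series below is a hypothetical stand-in for initialts, since the original data is not shown:

```r
# Hypothetical quarterly series standing in for the question's initialts
initialts <- ts(1:40, start = c(1980, 1), frequency = 4)

# Start at 1984 Q2, the first quarter with time(initialts) > 1984.00
cuttedts <- window(initialts, start = c(1984, 2))

print(is.ts(cuttedts))
print(start(cuttedts))
```

Logical subsetting with [ returns a plain vector and drops the time attributes, which is why is.ts() reported FALSE in the question; window() keeps them.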