Converting to time series using ts() in R

Good afternoon
I have a time series
v2<-c(12,13,15,17,18,12,11,12)
which runs from July 1996 to October 1997, but covers only the months July through October of each year.
When I try to convert it to a time series with
v2.ts<-ts(v2, frequency=12, start=c(1996,7), end=c(1997,10))
it yields this result:
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1996                          12  13  15  17  18  12
1997  11  12  12  13  15  17  18  12  11  12
What parameters can I use to make it look like this:
     Jul Aug Sep Oct
1996  12  13  15  17
1997  18  12  11  12
Thanks in advance for the help

A ts series must be regularly spaced, but the desired output has points that are one month apart except for the jump from October of the first year to July of the second year, so it cannot be represented as a ts object.
There are several packages that can represent irregularly spaced series. With the zoo package it would be done like this:
library(zoo)
z <- as.zoo(v2.ts)
z[cycle(z) %in% 7:10]
## Jul 1996 Aug 1996 Sep 1996 Oct 1996 Jul 1997 Aug 1997 Sep 1997 Oct 1997
##       12       13       15       17       18       12       11       12
If you are not looking for a time series but just a matrix with the indicated elements then:
tapply(c(v2.ts), list(floor(time(v2.ts)), cycle(v2.ts)), c)[, 7:10]
##       7  8  9 10
## 1996 12 13 15 17
## 1997 18 12 11 12
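If the series should stay a regular ts object, another option is to keep every month and mark the out-of-season months as NA. The following is a minimal base R sketch, assuming the eight values in v2 belong to July through October of 1996 and 1997; the name padded is only illustrative.

```r
v2 <- c(12, 13, 15, 17, 18, 12, 11, 12)

# Regular monthly series covering Jul 1996 .. Oct 1997 (16 months), all NA at first
padded <- ts(rep(NA_real_, 16), frequency = 12, start = c(1996, 7))

# Fill only the positions whose month is July (7) through October (10)
padded[cycle(padded) %in% 7:10] <- v2

print(padded)
```

The months from November 1996 to June 1997 remain NA, so the spacing stays regular and nothing is recycled.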

Related

Daily Average of Time series derived from monthly data R monthdays()

I have a time series object ts. I have mentioned the entire object here. It has data from Jan 2013 to Dec 2017 for all years. I am trying to find the daily average value so that the value is divided by the number of days in a month.
Expected output
The first value for Jan 2013 in ts is 23770, I want the value to be 23770/31 where 31 is the number of days in Jan, second value for Feb 2013 is 23482. I want the value to be 23482/28 as 28 was the number of days in Feb 2013 and so on
Tried so far:
I know monthdays() can do this, with something like ts/monthdays(). monthdays() returns the number of days in a month, but I am not able to implement it here. I read about tapply somewhere, but it is not giving me the desired result, since I need values corresponding to each month-year combination.
ts
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2013 23770 23482 23601 22889 23401 24240 23873 23647 23378 23871 22624 23496
2014 26765 27619 26341 27320 27389 27418 26874 27005 27538 26324 27267 27583
2015 28354 27452 28336 28998 28595 28338 27806 28660 27226 28317 28666 28574
2016 30209 30659 31554 30248 30358 31091 30389 30247 31227 31839 30602 30609
2017 32180 32203 31639 31784 32375 30856 31863 32827 32506 31702 31681 32176
> cycle(ts_actual_group2)
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2013 1 2 3 4 5 6 7 8 9 10 11 12
2014 1 2 3 4 5 6 7 8 9 10 11 12
2015 1 2 3 4 5 6 7 8 9 10 11 12
2016 1 2 3 4 5 6 7 8 9 10 11 12
2017 1 2 3 4 5 6 7 8 9 10 11 12
I tried tapply since I had read about it, but it does not give the desired output:
tapply(ts_actual_group2, cycle(ts_actual_group2), mean)
1 2 3 4 5 6 7 8 9 10 11 12
28255.6 28283.0 28294.2 28247.8 28423.6 28388.6 28161.0 28477.2 28375.0 28410.6 28168.0 28487.6
"I am not able to implement it here."
I'm not sure why you couldn't. The monthdays function from the forecast package, when applied to a ts object, returns the number of days in each month of the series. The object returned is a time-series of the same dimension as the input. So you can simply divide them.
library(forecast)
ts/monthdays(ts)
           Jan       Feb       Mar       Apr       May       Jun
2013  766.7742  838.6429  761.3226  762.9667  754.8710  808.0000
2014  863.3871  986.3929  849.7097  910.6667  883.5161  913.9333
2015  914.6452  980.4286  914.0645  966.6000  922.4194  944.6000
2016  974.4839 1057.2069 1017.8710 1008.2667  979.2903 1036.3667
2017 1038.0645 1150.1071 1020.6129 1059.4667 1044.3548 1028.5333
monthdays(ts) # Accepts a time-series object
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2013 31 28 31 30 31 30 31 31 30 31 30 31
2014 31 28 31 30 31 30 31 31 30 31 30 31
2015 31 28 31 30 31 30 31 31 30 31 30 31
2016 31 29 31 30 31 30 31 31 30 31 30 31
2017 31 28 31 30 31 30 31 31 30 31 30 31
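For comparison, the same division can be sketched in base R without the forecast package. This assumes a monthly series like the one in the question; the names x, days and daily are illustrative and not part of any package. The number of days in a month is taken as the difference between the first day of the next month and the first day of the current one.

```r
vals <- c(23770, 23482, 23601, 22889, 23401, 24240, 23873, 23647, 23378, 23871, 22624, 23496,
          26765, 27619, 26341, 27320, 27389, 27418, 26874, 27005, 27538, 26324, 27267, 27583,
          28354, 27452, 28336, 28998, 28595, 28338, 27806, 28660, 27226, 28317, 28666, 28574,
          30209, 30659, 31554, 30248, 30358, 31091, 30389, 30247, 31227, 31839, 30602, 30609,
          32180, 32203, 31639, 31784, 32375, 30856, 31863, 32827, 32506, 31702, 31681, 32176)
x <- ts(vals, start = c(2013, 1), frequency = 12)

# Year and month of every observation as plain integers
yr <- as.integer(floor(as.numeric(time(x)) + 1e-8))
mo <- as.integer(cycle(x))

# First day of each month and of the month that follows it
first_day  <- as.Date(sprintf("%d-%02d-01", yr, mo))
next_month <- as.Date(sprintf("%d-%02d-01", yr + (mo == 12L), mo %% 12L + 1L))

# Days per month (handles leap years), then the daily average
days  <- as.integer(next_month - first_day)
daily <- x / days

print(round(daily, 4))
```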

Dealing with nonexistent data when converting to time-series in CRAN R

I have the following data set and I am trying to convert the consumption column to a time series. Some months are missing (e.g. there is no data for 10/2014).
year month consumption
2014 7 10617
2014 8 8318
2014 9 3199
2014 12 2066
2015 1 10825
2015 2 3096
2015 3 1665
2015 4 3651
2015 5 5807
2015 7 2951
2015 8 5885
2015 9 3653
2015 10 4266
2015 11 9706
When I use ts() in R, the missing months are filled with wrong values:
ts(mkt$consumption, start = c(2014, 7), end = c(2015, 11), frequency = 12)
       Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
2014                                     10617  8318  3199  2066 10825  3096
2015  1665  3651  5807  2951  5885  3653  4266  9706 10617  8318  3199
My question is how to simply replace the nonexistent values with zero or blank?
The "ts" class requires the data to be regularly spaced, i.e. every month must be present (possibly as NA), which is not the case here. The zoo package can handle irregularly spaced series. Read the input into zoo using the "yearmon" class for the year/month, and then either use it as a "zoo" series or convert it to "ts". If the input is in a file that otherwise looks exactly like Lines, replace text = Lines with the file name, e.g. "myfile.dat".
Lines <- "year month consumption
2014 7 10617
2014 8 8318
2014 9 3199
2014 12 2066
2015 1 10825
2015 2 3096
2015 3 1665
2015 4 3651
2015 5 5807
2015 7 2951
2015 8 5885
2015 9 3653
2015 10 4266
2015 11 9706"
library(zoo)
toYearmon <- function(y, m) as.yearmon(paste(y, m), "%Y %m")
z <- read.zoo(text = Lines, header = TRUE, index = 1:2, FUN = toYearmon)
as.ts(z)
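The same idea can be sketched in base R alone, which also covers the "replace with zero" part of the question. This assumes the data sit in a data frame with the columns year, month and consumption shown in the question; the names mkt, idx, x and x0 are illustrative. Each observation is placed at its month offset from the start, so absent months stay NA instead of being filled by recycling.

```r
mkt <- data.frame(
  year  = c(2014, 2014, 2014, 2014, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015),
  month = c(7, 8, 9, 12, 1, 2, 3, 4, 5, 7, 8, 9, 10, 11),
  consumption = c(10617, 8318, 3199, 2066, 10825, 3096, 1665,
                  3651, 5807, 2951, 5885, 3653, 4266, 9706)
)

# Position of each observation counted in months from Jul 2014 (position 1)
idx <- (mkt$year - 2014) * 12 + (mkt$month - 7) + 1

# Start from an all-NA vector and drop the observed values into place
vals <- rep(NA_real_, max(idx))
vals[idx] <- mkt$consumption
x <- ts(vals, start = c(2014, 7), frequency = 12)

# Optional: replace the missing months with zero
x0 <- x
x0[is.na(x0)] <- 0

print(x)
print(x0)
```

Whether zero is a sensible replacement depends on the analysis; NA keeps "no data" distinguishable from "no consumption".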

Why the value of x-axis in time series plot in R shows 2012.0, 2013.0 rather than Jan 2012, etc?

I have created a time series matrix with the code and output below:
ts2 <-ts(cbind(LRC_3PDMUM, LRC_3PDMMS),frequency=12,start=c(2012,1))
ts2
LRC_3PDMUM LRC_3PDMMS
Jan 2012 0.029256 0.025904
Feb 2012 0.051945 0.055827
Mar 2012 0.078153 0.084049
Apr 2012 0.100596 0.110188
May 2012 0.126015 0.136850
Jun 2012 0.149349 0.162446
Jul 2012 0.173949 0.186486
Aug 2012 0.198704 0.212683
Sep 2012 0.220277 0.237433
Oct 2012 0.244358 0.262342
Nov 2012 0.272664 0.286019
Dec 2012 0.293653 0.309429
Jan 2013 0.320472 0.331575
Feb 2013 0.339880 0.356900
Mar 2013 0.362203 0.384612
Apr 2013 0.383525 0.408996
May 2013 0.403316 0.431810
Jun 2013 0.430651 0.454040
Jul 2013 0.453148 0.475161
Aug 2013 0.484378 0.496460
Sep 2013 0.501923 0.518307
Oct 2013 0.525252 0.541631
Nov 2013 0.544958 0.563007
Dec 2013 0.564571 0.582775
However, when I do plot(ts2), the plot has x-axis values like 2012.0, 2013.0, whereas I would expect Jan 2012, Feb 2013, etc. Please advise how to revise the code. Thanks!
Assuming an example that looks like yours:
a <- ts( matrix(1:100,ncol=2), frequency = 12, start = c(1959, 1))
> a
Series 1 Series 2
Jan 1959 1 51
Feb 1959 2 52
Mar 1959 3 53
Apr 1959 4 54
May 1959 5 55
Jun 1959 6 56
Jul 1959 7 57
Aug 1959 8 58
Sep 1959 9 59
Oct 1959 10 60
Nov 1959 11 61
Dec 1959 12 62
Jan 1960 13 63
Feb 1960 14 64
#and so on...
The easiest way would be to use the xts package like this:
library(xts)
# convert to xts, which uses a month-year index
b <- as.xts(a)
# plot the first series
plot(b[, 'Series 1'], ylim = c(0, 100))
# add the second series
lines(b[, 'Series 2'], col = 'red')
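If adding a package is not desirable, a base R alternative is to suppress the default axis and draw one with month-year labels. The following is only a sketch on the same example object a; the label spacing of six months and the names tt, labs and sel are arbitrary choices for illustration.

```r
a <- ts(matrix(1:100, ncol = 2), frequency = 12, start = c(1959, 1))

# Numeric time of each observation and a "Mon YYYY" label for it
tt   <- as.numeric(time(a))
labs <- paste(month.abb[as.integer(cycle(a))], floor(tt + 1e-8))

# Plot without the default x-axis, then add the second series
plot(a[, 1], xaxt = "n", ylim = c(0, 100), xlab = "", ylab = "value")
lines(a[, 2], col = "red")

# Draw a custom axis with a label every six months
sel <- seq(1, length(tt), by = 6)
axis(1, at = tt[sel], labels = labs[sel])
```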

Fill last spot of matrix with NA

I think I might be missing something very simple, but:
How can I fill the last spots of a matrix with NA instead of it just repeating previous values?
Data example:
x <- 1:27
m <- matrix(x, nrow = 12, ncol = ceiling(length(x)/12), byrow = FALSE)
col_names <- c("2013", "2014", "2015")
row_names <- c("Jan", "Feb", "Mar", "Apr", "May",
"Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
dimnames(m) <- list(row_names, col_names)
m
2013 2014 2015
Jan 1 13 25
Feb 2 14 26
Mar 3 15 27
Apr 4 16 1 # NOT NA?
May 5 17 2
Jun 6 18 3
Jul 7 19 4
Aug 8 20 5
Sep 9 21 6
Oct 10 22 7
Nov 11 23 8
Dec 12 24 9
I would like all values after 2015 March to be filled with NA.
If you assign a shorter vector to a longer vector in R, it recycles the values in the shorter vector. That's what you are observing here. (Note that a matrix is just a vector with dimension attribute.) This behaviour cannot be avoided. So, you should assign NA after creating the matrix:
m[28:length(m)] <- NA
Or, alternatively, you could append the necessary number of NA values to 1:27 when creating the matrix.
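The second option can be sketched as follows, assuming the 27 values from the question; n_col is an illustrative helper name. The vector is padded with NA up to a full multiple of 12 before it is handed to matrix(), so there is nothing left to recycle.

```r
x <- 1:27

# Number of year columns needed for 12 rows
n_col <- ceiling(length(x) / 12)

# Pad with NA so the data exactly fill a 12 x n_col matrix
m <- matrix(c(x, rep(NA, 12 * n_col - length(x))), nrow = 12,
            dimnames = list(month.abb, as.character(seq(2013, length.out = n_col))))

print(m)
```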
Create a list of dimnames and from that create a matrix of NAs. Finally, fill it:
x <- 1:27 # input as per question
dnm <- list(month.abb, 2013:2015) # list of dimnames
m <- matrix(NA, nrow = length(dnm[[1]]), ncol = length(dnm[[2]]), dimnames = dnm)
m[seq_along(x)] <- x
Note: You might not want to do this at all and instead create a monthly time series:
library(zoo)
z <- zooreg(x, as.yearmon("2013-01"), freq = 12)
giving:
> z
Jan 2013 Feb 2013 Mar 2013 Apr 2013 May 2013 Jun 2013 Jul 2013 Aug 2013
1 2 3 4 5 6 7 8
Sep 2013 Oct 2013 Nov 2013 Dec 2013 Jan 2014 Feb 2014 Mar 2014 Apr 2014
9 10 11 12 13 14 15 16
May 2014 Jun 2014 Jul 2014 Aug 2014 Sep 2014 Oct 2014 Nov 2014 Dec 2014
17 18 19 20 21 22 23 24
Jan 2015 Feb 2015 Mar 2015
25 26 27
or a ts series:
> as.ts(z)
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2013 1 2 3 4 5 6 7 8 9 10 11 12
2014 13 14 15 16 17 18 19 20 21 22 23 24
2015 25 26 27
or directly:
ts(x, start = c(2013, 1), freq = 12)

Summarising data frame using maximum counts

I have this data.frame:
counts <- data.frame(year = sort(rep(2000:2009, 12)), month = rep(month.abb, 10), count = sample(1:500, 120, replace = TRUE))
First 20 rows of data:
head(counts, 20)
year month count
1 2000 Jan 14
2 2000 Feb 182
3 2000 Mar 462
4 2000 Apr 395
5 2000 May 107
6 2000 Jun 127
7 2000 Jul 371
8 2000 Aug 158
9 2000 Sep 147
10 2000 Oct 41
11 2000 Nov 141
12 2000 Dec 27
13 2001 Jan 72
14 2001 Feb 7
15 2001 Mar 40
16 2001 Apr 351
17 2001 May 342
18 2001 Jun 81
19 2001 Jul 442
20 2001 Aug 389
Let's say I try to calculate the standard deviation of these data using the usual R code:
library(plyr)
ddply(counts, .(month), summarise, s.d. = sd(count))
month s.d.
1 Apr 145.3018
2 Aug 140.9949
3 Dec 173.9406
4 Feb 127.5296
5 Jan 148.2661
6 Jul 162.4893
7 Jun 133.4383
8 Mar 125.8425
9 May 168.9517
10 Nov 93.1370
11 Oct 167.9436
12 Sep 166.8740
This gives the standard deviation around the mean of each month. How can I get R to output the standard deviation around the maximum value of each month?
You want "the max of the values per month and the average deviation from this maximum value" (which is not the same as the standard deviation):
counts <- data.frame(year = sort(rep(2000:2009, 12)), month = rep(month.abb, 10), count = sample(1:500, 120, replace = TRUE))
library(data.table)
counts <- data.table(counts)
counts[, mean(count - max(count)), by = month]
This question is highly vague. If you want to calculate the standard deviation of the differences to the maximum, you can use this code:
> library(plyr)
> ddply(counts, .(month), summarise, sd = sd(count - max(count)))
month sd
1 Apr 182.5071
2 Aug 114.3068
3 Dec 117.1049
4 Feb 184.4638
5 Jan 138.1755
6 Jul 167.0677
7 Jun 100.8841
8 Mar 144.8724
9 May 173.3452
10 Nov 132.0204
11 Oct 127.4645
12 Sep 152.2162
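One caveat about the code above: subtracting a constant, such as the monthly maximum, does not change the standard deviation, so sd(count - max(count)) is the same as sd(count) on the same data. If what is wanted is the spread measured from the maximum rather than from the mean, one possible reading is the root-mean-square deviation from the maximum. The following base R sketch works under that assumption; dev_from_max is a hypothetical helper, and the seed is added only to make the example reproducible.

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible
counts <- data.frame(year  = rep(2000:2009, each = 12),
                     month = rep(month.abb, 10),
                     count = sample(1:500, 120, replace = TRUE))

# Hypothetical helper: like sd(), but deviations are taken from the maximum
dev_from_max <- function(x) sqrt(sum((x - max(x))^2) / (length(x) - 1))

# One value per month
res <- aggregate(count ~ month, data = counts, FUN = dev_from_max)
print(res)

# Shifting by a constant leaves sd() unchanged
all.equal(sd(counts$count - max(counts$count)), sd(counts$count))
```

Because squared deviations are smallest around the mean, this quantity is never smaller than the ordinary standard deviation of the same month.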
