I have the following data set and I am trying to convert the consumption column to a time series. Some months are missing (e.g. there is no data for 10/2014).
year month consumption
2014 7 10617
2014 8 8318
2014 9 3199
2014 12 2066
2015 1 10825
2015 2 3096
2015 3 1665
2015 4 3651
2015 5 5807
2015 7 2951
2015 8 5885
2015 9 3653
2015 10 4266
2015 11 9706
When I use ts() in R, the missing months are filled with the wrong values: the later observations are shifted into the gaps and the series is recycled at the end.
ts(mkt$consumptions, start = c(2014,7),end=c(2015,11), frequency=12)
       Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
2014                                     10617  8318  3199  2066 10825  3096
2015  1665  3651  5807  2951  5885  3653  4266  9706 10617  8318  3199
My question is: how can I simply replace the missing values with zero or blank?
The "ts" class requires regularly spaced data, i.e. every month must be present, either with a value or as NA, which is not the case here. The zoo package can handle irregularly spaced series. Read the input into zoo using the "yearmon" class for the year/month index, and then either use it as a "zoo" series or convert it to "ts"; as.ts fills the missing months with NA. If the input is in a file but is otherwise exactly the same as in Lines, replace text = Lines with the file name, e.g. "myfile.dat".
Lines <- "year month consumption
2014 7 10617
2014 8 8318
2014 9 3199
2014 12 2066
2015 1 10825
2015 2 3096
2015 3 1665
2015 4 3651
2015 5 5807
2015 7 2951
2015 8 5885
2015 9 3653
2015 10 4266
2015 11 9706"
library(zoo)
toYearmon <- function(y, m) as.yearmon(paste(y, m), "%Y %m")
z <- read.zoo(text = Lines, header = TRUE, index = 1:2, FUN = toYearmon)
as.ts(z)
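If zoo is not available, the same idea can be sketched in base R: build the full run of months, drop each observation into its slot, and leave the rest as NA. This is only an illustration; the data frame name mkt and the helper names pos, full, and values are assumptions, not part of the original post.

```r
# Data from the question (data frame name `mkt` is assumed)
mkt <- data.frame(
  year  = c(rep(2014, 4), rep(2015, 10)),
  month = c(7, 8, 9, 12, 1, 2, 3, 4, 5, 7, 8, 9, 10, 11),
  consumption = c(10617, 8318, 3199, 2066, 10825, 3096, 1665,
                  3651, 5807, 2951, 5885, 3653, 4266, 9706)
)

# Month counter: consecutive months differ by exactly 1
pos  <- mkt$year * 12 + (mkt$month - 1)
full <- seq(min(pos), max(pos))

# Put each observation in its slot; months without data stay NA
values <- rep(NA_real_, length(full))
values[match(pos, full)] <- mkt$consumption

cons_ts <- ts(values,
              start = c(min(pos) %/% 12, min(pos) %% 12 + 1),
              frequency = 12)

# Optional: zeros instead of NA (this changes sums and means)
cons_ts0 <- cons_ts
cons_ts0[is.na(cons_ts0)] <- 0
```

Keeping NA is usually the safer default, because a zero is indistinguishable from a real observation of zero consumption.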
I have a data frame df3:
df3
x d y
1 bbc Sep 2020 123
2 rsb Sep 2020 234
3 atc Sep 2020 345
4 svc Sep 2020 543
5 mwe Sep 2020 567
6 bpa Oct 2020 322
7 mwe Oct 2020 456
8 uhs Oct 2020 786
9 se Oct 2020 543
10 db Oct 2020 778
11 rsb Nov 2020 358
12 svc Nov 2020 678
13 db Nov 2020 321
14 rb Nov 2020 689
15 bpa Nov 2020 765
After applying the split function:
> df5 = split(df3,f=df3$d)
> df5
$`Sep 2020`
x d y
1 bbc Sep 2020 123
2 rsb Sep 2020 234
3 atc Sep 2020 345
4 svc Sep 2020 543
5 mwe Sep 2020 567
$`Oct 2020`
x d y
6 bpa Oct 2020 322
7 mwe Oct 2020 456
8 uhs Oct 2020 786
9 se Oct 2020 543
10 db Oct 2020 778
$`Nov 2020`
x d y
11 rsb Nov 2020 358
12 svc Nov 2020 678
13 db Nov 2020 321
14 rb Nov 2020 689
15 bpa Nov 2020 765
I would like to loop through the split data frame dynamically.
I need to find out whether any values present in Nov 2020 are also present in Oct 2020.
If a name is present in both, I then have to check the previous month, Sep 2020, and so on, and count the number of consecutive months in which the name occurs. Here df3$d is in as.yearmon format. If any names in df5[["Nov 2020"]]$x are also present in df5[["Oct 2020"]]$x, they should be extracted and stored in an object along with their count. Here the count is 2, since the names are present in Nov 2020 and Oct 2020. The previous months should be checked only if the name is present in the most recent month. For this example, the output should be:
> df4
names_present present_for
1 bpa 2
2 db 2
Thank you in advance
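One possible reading of this requirement is "for each name in the most recent month, count how many consecutive months, walking backwards, it appears in, and keep the names with a count of at least 2". A minimal base R sketch under that assumption follows; the month labels are stored as plain strings and their order is listed by hand in months_desc, and count_streak is a hypothetical helper. With a real yearmon column, sort(unique(df3$d), decreasing = TRUE) would give the order instead.

```r
df3 <- data.frame(
  x = c("bbc", "rsb", "atc", "svc", "mwe",
        "bpa", "mwe", "uhs", "se", "db",
        "rsb", "svc", "db", "rb", "bpa"),
  d = rep(c("Sep 2020", "Oct 2020", "Nov 2020"), each = 5),
  y = c(123, 234, 345, 543, 567, 322, 456, 786, 543, 778,
        358, 678, 321, 689, 765)
)

# Months from newest to oldest (written out by hand in this sketch)
months_desc <- c("Nov 2020", "Oct 2020", "Sep 2020")
by_month <- split(df3$x, df3$d)

# Count consecutive months, starting at the newest, that contain `name`
count_streak <- function(name) {
  n <- 0
  for (m in months_desc) {
    if (!(name %in% by_month[[m]])) break
    n <- n + 1
  }
  n
}

latest <- unique(by_month[[months_desc[1]]])
streak <- vapply(latest, count_streak, numeric(1))

# Keep names that also occur in at least the month before the newest one
df4 <- data.frame(names_present = names(streak),
                  present_for   = unname(streak))
df4 <- df4[df4$present_for >= 2, ]
df4 <- df4[order(df4$names_present), ]
rownames(df4) <- NULL
df4
```

Names such as rsb and svc occur in Nov 2020 and Sep 2020 but not in Oct 2020, so their streak stops at 1 and they are dropped.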
Good afternoon
I have a time series
v2<-c(12,13,15,17,18,12,11,12)
which runs from July 1996 to October 1997 but only covers the months July to October of each year.
When I try to convert it to a time series with
v2.ts<-ts(v2, frequency=12, start=c(1996,7), end=c(1997,10))
it gives me this result:
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1996                          12  13  15  17  18  12
1997  11  12  12  13  15  17  18  12  11  12
What parameters can I use to get this instead:
     Jul Aug Sep Oct
1996  12  13  15  17
1997  18  12  11  12
Thanks in advance for the help
A ts series must be regularly spaced. In the desired output the points are one month apart, except for the jump from October of the first year to July of the second, so it cannot be represented as a ts series.
There are several packages that can represent irregularly spaced series. With the zoo package it would be done like this:
library(zoo)
z <- as.zoo(v2.ts)
z[cycle(z) %in% 7:10]
## Jul 1996 Aug 1996 Sep 1996 Oct 1996 Jul 1997 Aug 1997 Sep 1997 Oct 1997
##       12       13       15       17       18       12       11       12
If you are not looking for a time series but just a matrix with the indicated elements then:
tapply(c(v2.ts), list(floor(time(v2.ts)), cycle(v2.ts)), c)[, 7:10]
##       7  8  9 10
## 1996 12 13 15 17
## 1997 18 12 11 12
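If only the year-by-month table is needed, ts() can be skipped altogether. The sketch below assumes that v2 holds exactly four values per year, in July-to-October order; the name m is chosen here for illustration.

```r
v2 <- c(12, 13, 15, 17, 18, 12, 11, 12)

# One row per year, one column per month; month.abb is a built-in
# constant with the English month abbreviations
m <- matrix(v2, ncol = 4, byrow = TRUE,
            dimnames = list(as.character(1996:1997), month.abb[7:10]))
m
```

This gives a plain matrix, not a time series, so functions that expect a ts object will not treat the gap between October and July specially.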
I have a data frame that looks like this:
Year Month Cont
1 2011 Apr 1376
2 2012 Apr 1232
3 2013 Apr 1360
4 2014 Apr 1294
5 2015 Apr 1344
6 2011 Aug 1933
7 2012 Aug 1930
8 2013 Aug 1821
9 2014 Aug 1845
10 2015 Aug 1855
My question is how to turn the values of the "Month" column into columns. The result should look like this:
Cont Apr Aug
1 2011 1376 1933
2 2012 1232 1930
3 2013 1360 1821
4 2014 1294 1845
5 2015 1344 1855
You can use reshape2:
library(reshape2)
dcast(df, Year~Month, value.var="Cont")
Or tidyr:
library(tidyr)
spread(df, Month, Cont)
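For completeness, the same long-to-wide step can be sketched without extra packages using base R's reshape(). The data frame below rebuilds the example from the question, and the renaming step is only there to strip the "Cont." prefix that reshape() adds to the new columns.

```r
df <- data.frame(
  Year  = rep(2011:2015, times = 2),
  Month = rep(c("Apr", "Aug"), each = 5),
  Cont  = c(1376, 1232, 1360, 1294, 1344,
            1933, 1930, 1821, 1845, 1855)
)

# One row per Year, one column per Month value
wide <- reshape(df, idvar = "Year", timevar = "Month", direction = "wide")

# reshape() names the new columns Cont.Apr, Cont.Aug; drop the prefix
names(wide) <- sub("^Cont\\.", "", names(wide))
wide
```

In current versions of tidyr, spread() is superseded by pivot_wider(df, names_from = Month, values_from = Cont), which does the same job.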
Please refer to the following code:
> dat <- read.table("data.txt", quote="\"", comment.char="")
> dat
V1 V2 V3
1 2011 Apr 1376
2 2012 Apr 1232
3 2013 Apr 1360
4 2014 Apr 1294
5 2015 Apr 1344
6 2011 Aug 1933
7 2012 Aug 1930
8 2013 Aug 1821
9 2014 Aug 1845
10 2015 Aug 1855
> library(reshape2)
> dcast(dat, V1~V2)
Using V3 as value column: use value.var to override.
V1 Apr Aug
1 2011 1376 1933
2 2012 1232 1930
3 2013 1360 1821
4 2014 1294 1845
5 2015 1344 1855
I have created a time series matrix with the code and output below:
ts2 <-ts(cbind(LRC_3PDMUM, LRC_3PDMMS),frequency=12,start=c(2012,1))
ts2
LRC_3PDMUM LRC_3PDMMS
Jan 2012 0.029256 0.025904
Feb 2012 0.051945 0.055827
Mar 2012 0.078153 0.084049
Apr 2012 0.100596 0.110188
May 2012 0.126015 0.136850
Jun 2012 0.149349 0.162446
Jul 2012 0.173949 0.186486
Aug 2012 0.198704 0.212683
Sep 2012 0.220277 0.237433
Oct 2012 0.244358 0.262342
Nov 2012 0.272664 0.286019
Dec 2012 0.293653 0.309429
Jan 2013 0.320472 0.331575
Feb 2013 0.339880 0.356900
Mar 2013 0.362203 0.384612
Apr 2013 0.383525 0.408996
May 2013 0.403316 0.431810
Jun 2013 0.430651 0.454040
Jul 2013 0.453148 0.475161
Aug 2013 0.484378 0.496460
Sep 2013 0.501923 0.518307
Oct 2013 0.525252 0.541631
Nov 2013 0.544958 0.563007
Dec 2013 0.564571 0.582775
However, when I do plot(ts2), the x-axis shows values like 2012.0, 2013.0, rather than what I would expect: Jan 2012, Feb 2013, etc. Please advise how to revise the code. Thanks!
Assuming an example that looks like yours:
a <- ts( matrix(1:100,ncol=2), frequency = 12, start = c(1959, 1))
> a
Series 1 Series 2
Jan 1959 1 51
Feb 1959 2 52
Mar 1959 3 53
Apr 1959 4 54
May 1959 5 55
Jun 1959 6 56
Jul 1959 7 57
Aug 1959 8 58
Sep 1959 9 59
Oct 1959 10 60
Nov 1959 11 61
Dec 1959 12 62
Jan 1960 13 63
Feb 1960 14 64
#and so on...
The easiest way would be to use the xts package like this:
library(xts)
#transform to xts that uses this date format
b <- as.xts(a)
#plot first series
plot (b[, 'Series 1'], ylim=c(0,100))
#plot second series
lines(b[, 'Series 2'], col='red')
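If an extra package is not an option, the month labels can also be drawn by hand with base graphics: suppress the default axis and add one whose labels are built from cycle() and time(). This is a rough sketch; the names tt, labs, and ticks and the choice of a tick every six months are assumptions made for illustration.

```r
a <- ts(matrix(1:100, ncol = 2), frequency = 12, start = c(1959, 1))

tt <- as.numeric(time(a))
# Labels such as "Jan 1959"; the small offset guards against
# floating-point error before floor()
labs <- paste(month.abb[as.integer(cycle(a))], floor(tt + 1e-8))

# Label every sixth month so the axis does not get crowded
ticks <- seq(1, length(tt), by = 6)

plot(tt, as.numeric(a[, 1]), type = "l", xaxt = "n",
     ylim = range(a), xlab = "", ylab = "value")
lines(tt, as.numeric(a[, 2]), col = "red")
axis(1, at = tt[ticks], labels = labs[ticks])
```

Because month.abb is a fixed English constant, the labels do not depend on the session locale.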
I have a matrix in this format:
year month Freq
1 2014 April 466
2 2015 April 59535
3 2014 August 10982
4 2015 August 0
5 2014 December 35881
6 2015 December 0
7 2014 February 17
8 2015 February 24258
9 2014 January 0
10 2015 January 22785
11 2014 July 2981
12 2015 July 0
13 2014 June 1279
14 2015 June 31356
15 2014 March 289
16 2015 March 40274
I need to sort the months in calendar order, i.e. Jan, Feb, Mar, ... When I sort, they get sorted alphabetically. I used this:
mat <- mat[order(mat[,1], decreasing = TRUE), ]
and it looks like this:
row.names April August December February January July June March May November October September
1 2015 59535 0 0 24258 22785 0 31356 40274 84211 0 0 0
2 2014 466 10982 35881 17 0 2981 1279 289 879 8911 8565 4000
Can we sort months in calendar order in R?
Suppose DF is the data frame from which you derived your matrix. We provide such a data frame in reproducible form at the end. Ensure that month and year are factors with appropriate levels. Note that month.name is a builtin variable in R that is used here to ensure that the month levels are appropriately sorted and we have assumed year is a numeric column. Then use levelplot like this:
DF2 <- transform(DF,
month = factor(as.character(month), levels = month.name),
year = factor(year)
)
library(lattice)
levelplot(Freq ~ year * month, DF2)
Note: Here is DF in reproducible form:
Lines <- " year month Freq
1 2014 April 466
2 2015 April 59535
3 2014 August 10982
4 2015 August 0
5 2014 December 35881
6 2015 December 0
7 2014 February 17
8 2015 February 24258
9 2014 January 0
10 2015 January 22785
11 2014 July 2981
12 2015 July 0
13 2014 June 1279
14 2015 June 31356
15 2014 March 289
16 2015 March 40274 "
DF <- read.table(text = Lines, header = TRUE)
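The same factor trick also answers the sorting question directly, without any plotting. The sketch below rebuilds DF inline so that it runs on its own; DF_sorted and wide are names chosen here for illustration.

```r
DF <- data.frame(
  year  = rep(c(2014, 2015), times = 8),
  month = rep(c("April", "August", "December", "February",
                "January", "July", "June", "March"), each = 2),
  Freq  = c(466, 59535, 10982, 0, 35881, 0, 17, 24258,
            0, 22785, 2981, 0, 1279, 31356, 289, 40274)
)

# Give the months their calendar order; month.name is built in and
# always holds the English names
DF$month <- factor(DF$month, levels = month.name)

# Long format, sorted by year and then by calendar month
DF_sorted <- DF[order(DF$year, DF$month), ]

# Wide format: columns follow the factor levels, not the alphabet;
# droplevels() removes months that never occur in the data
wide <- xtabs(Freq ~ year + month, data = droplevels(DF))
wide
```

Without droplevels(), the table would also contain all-zero columns for the months that are absent from this sample.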
Assuming you want to sort based on time (a dummy day 1 has to be added to convert to a date):
# %B expects full month names in the current locale (English assumed here)
month_date <- as.Date(paste(1, mat$month, mat$year), format = "%d %B %Y")
mat <- mat[order(month_date), ]
Or if you don't care about the year:
month_date <- as.Date(paste(1, mat$month, 2000), format = "%d %B %Y")
mat <- mat[order(month_date), ]