Time series numeric again after indexing? - r

I wanted to cut a quarterly time series and did the following:
cuttedts <- initialts[time(initialts) > 1984.00]
which worked inasmuch as I got all data after the first quarter of 1984. Strikingly
is.ts(initialts)
# returns TRUE
while
is.ts(cuttedts)
# returns FALSE
What did I do wrong? Should I use subset? What's the best way to do this?

You can use the window function to extract a subset of a time series.
For example:
R> myts <- ts(data=1:40, start=2001, end=c(2010,4), frequency=4)
R> myts
     Qtr1 Qtr2 Qtr3 Qtr4
2001    1    2    3    4
2002    5    6    7    8
2003    9   10   11   12
2004   13   14   15   16
2005   17   18   19   20
2006   21   22   23   24
2007   25   26   27   28
2008   29   30   31   32
2009   33   34   35   36
2010   37   38   39   40
And then :
R> subts <- window(myts, start=c(2005,2), end=c(2008,3))
R> subts
     Qtr1 Qtr2 Qtr3 Qtr4
2005        18   19   20
2006   21   22   23   24
2007   25   26   27   28
2008   29   30   31
The result is still a ts object :
R> is.ts(subts)
[1] TRUE
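As for why the original attempt lost its class: indexing a ts object with `[` and a logical vector returns a plain vector, since an arbitrary subset need not be regularly spaced. A minimal sketch of the difference, rebuilding myts from the example above (the 2005.00 cutoff is chosen only for illustration):

```r
# Quarterly series, as in the example above
myts <- ts(data = 1:40, start = 2001, end = c(2010, 4), frequency = 4)

# Logical indexing with `[` returns a plain vector: the ts attributes are dropped
cut_vec <- myts[time(myts) > 2005.00]
is.ts(cut_vec)   # FALSE

# window() keeps the ts class; c(2005, 2) is the first quarter strictly after 2005.00
cut_ts <- window(myts, start = c(2005, 2))
is.ts(cut_ts)    # TRUE
```

Both objects hold the same values; only cut_ts still carries its start, end and frequency.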

Related

Daily Average of Time series derived from monthly data R monthdays()

I have a time series object ts. I have mentioned the entire object here. It has data from Jan 2013 to Dec 2017 for all years. I am trying to find the daily average value so that the value is divided by the number of days in a month.
Expected output
The first value, for Jan 2013, is 23770; I want 23770/31, where 31 is the number of days in January. The second value, for Feb 2013, is 23482; I want 23482/28, as February 2013 had 28 days, and so on.
Tried so far:
I know monthdays() can do this, with something like ts/monthdays(). monthdays() returns the number of days in a month, but I am not able to implement it here. I read about tapply somewhere, but it does not give me the desired result, since I need values corresponding to each month-year combination.
ts
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2013 23770 23482 23601 22889 23401 24240 23873 23647 23378 23871 22624 23496
2014 26765 27619 26341 27320 27389 27418 26874 27005 27538 26324 27267 27583
2015 28354 27452 28336 28998 28595 28338 27806 28660 27226 28317 28666 28574
2016 30209 30659 31554 30248 30358 31091 30389 30247 31227 31839 30602 30609
2017 32180 32203 31639 31784 32375 30856 31863 32827 32506 31702 31681 32176
> cycle(ts_actual_group2)
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2013 1 2 3 4 5 6 7 8 9 10 11 12
2014 1 2 3 4 5 6 7 8 9 10 11 12
2015 1 2 3 4 5 6 7 8 9 10 11 12
2016 1 2 3 4 5 6 7 8 9 10 11 12
2017 1 2 3 4 5 6 7 8 9 10 11 12
I tried tapply since I had read about it, but it does not give the desired output:
tapply(ts_actual_group2, cycle(ts_actual_group2), mean)
1 2 3 4 5 6 7 8 9 10 11 12
28255.6 28283.0 28294.2 28247.8 28423.6 28388.6 28161.0 28477.2 28375.0 28410.6 28168.0 28487.6
You wrote "I am not able to implement it here", but I'm not sure why you couldn't. The monthdays function from the forecast package, when applied to a ts object, returns the number of days in each month of the series. The object returned is a time series of the same dimension as the input, so you can simply divide one by the other.
library(forecast)
ts/monthdays(ts)
           Jan       Feb       Mar       Apr       May       Jun
2013  766.7742  838.6429  761.3226  762.9667  754.8710  808.0000
2014  863.3871  986.3929  849.7097  910.6667  883.5161  913.9333
2015  914.6452  980.4286  914.0645  966.6000  922.4194  944.6000
2016  974.4839 1057.2069 1017.8710 1008.2667  979.2903 1036.3667
2017 1038.0645 1150.1071 1020.6129 1059.4667 1044.3548 1028.5333
monthdays(ts) # Accepts a time-series object
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2013 31 28 31 30 31 30 31 31 30 31 30 31
2014 31 28 31 30 31 30 31 31 30 31 30 31
2015 31 28 31 30 31 30 31 31 30 31 30 31
2016 31 29 31 30 31 30 31 31 30 31 30 31
2017 31 28 31 30 31 30 31 31 30 31 30 31
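If the forecast package is not available, the same divisor can be built in base R from the gaps between consecutive month starts. This is only a sketch: x and its placeholder values are hypothetical, and the start date is hard-coded to match a series beginning in January 2013.

```r
# Monthly series covering Jan 2013 to Dec 2017 (placeholder values, not the real data)
x <- ts(rep(1000, 60), start = c(2013, 1), frequency = 12)

# First day of every month in the series, plus the first day of the month after the end
first_days <- seq(as.Date("2013-01-01"), by = "month", length.out = length(x) + 1)

# Days per month are the gaps between consecutive month starts
days <- as.numeric(diff(first_days))

# Wrap in a ts with the same time attributes so the division stays aligned
days_ts <- ts(days, start = start(x), frequency = frequency(x))
daily_avg <- x / days_ts
```

Because days_ts shares the start and frequency of x, the result is again a monthly ts object, and leap years such as February 2016 are handled by the date arithmetic.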

Why am I getting a frequency of 1 for this monthly time series data in R?

I am using R for my time series analysis and have loaded a CSV file into R as the data frame df1.
I have used the zoo package to convert my data frame into a ts object:
library(zoo)
df1_ts <- as.ts(read.zoo(df1, FUN = as.yearmon))
Running:
class(df1_ts)
# [1] "mts" "ts" "matrix"
However when I run head(df1_ts), I get the following results:
head(df1_ts)
# Time Series:
# Start = 2014
# End = 2018
# Frequency = 1
# Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
# 2014 4621 3569 4249 4593 3320 1970 2483 3474 4302 5670 5788 5570
# 2015 5747 4346 5176 5362 5360 3707 3883 5138 5568 6034 5989 5648
# 2016 5821 5164 5781 5346 5339 4743 5417 5514 5880 5899 6014 5641
# 2017 5980 5341 5890 5596 5753 5470 5589 5545 5749 5938 5864 5567
# 2018 5655 5392 5766 5268 5680 5337 5197 5714 5802 5935 5955 5637
Why am I getting Frequency = 1? I expected the frequency to be 12, as these are monthly data.
How can I fix this?
I have tried the following, without success:
df1_ts <- as.ts(read.zoo(df1, FUN = as.yearmon), freq=12)
The code shown in the question is creating a multivariate time series consisting of 12 series (one for each month column) whose time index is the year; however, what is wanted is a single univariate monthly series.
Using df1 shown reproducibly in the Note at the end, first convert the data.frame df1 to a matrix using transpose and then unravel this transposed matrix column by column into a single vector using c. Now we can define the ts series directly:
tt <- ts(c(t(df1[-1])), start = df1$Year[1], freq = 12)
giving:
frequency(tt)
## [1] 12
tt
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 2014 1 2 3 4 5 6 7 8 9 10 11 12
## 2015 13 14 15 16 17 18 19 20 21 22 23 24
## 2016 25 26 27 28 29 30 31 32 33 34 35 36
## 2017 37 38 39 40 41 42 43 44 45 46 47 48
## 2018 49 50 51 52 53 54 55 56 57 58 59 60
Note
Please do not use images to show your input data as it means that anyone wanting to answer with it would need to retype it. Provide it reproducibly as R code. I have done this for you this time, changing the data to avoid typing all those numbers.
df1 <- as.data.frame(cbind(2014:2018, matrix(1:60, ncol = 12, byrow = TRUE)))
names(df1) <- c("Year", month.abb)
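The transpose matters because c() flattens a matrix column by column, whereas the months of one year sit in a row. A tiny illustration with a hypothetical 2 x 3 matrix m (not part of the original data):

```r
# Two "years" of three "months" each, filled row by row
m <- matrix(1:6, nrow = 2, byrow = TRUE,
            dimnames = list(c("2014", "2015"), c("Jan", "Feb", "Mar")))

by_col <- c(m)      # column by column: mixes the two years together
by_row <- c(t(m))   # row by row: all of 2014 first, then all of 2015
```

Only by_row is in chronological order, which is what ts() needs when given start and freq.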

R stacked percentage bar plot with percentage of binary factor and labels

I want to produce a stacked percentage bar plot in R, with percentage labels and a legend.
My original data is:
AIRBUS BOEING EMBRAER
2002 18 21 30
2003 20 23 31
2004 23 26 29
2005 22 25 26
2006 22 25 25
2007 22 27 17
2008 21 21 16
2009 17 19 22
2010 14 22 24
2011 17 27 22
2012 16 22 19
2013 11 24 19
There are similar questions on SO already, but I seem to lack the sufficient amount of intelligence (or understanding of R) to extrapolate from them to a solution to my particular problem.
First, gather or melt your data into long format. Then it's easy.
library(tidyverse)
df <- read.table(
  text = "
YEAR AIRBUS BOEING EMBRAER
2002 18 21 30
2003 20 23 31
2004 23 26 29
2005 22 25 26
2006 22 25 25
2007 22 27 17
2008 21 21 16
2009 17 19 22
2010 14 22 24
2011 17 27 22
2012 16 22 19
2013 11 24 19",
  header = TRUE
)
df_long <- df %>%
  gather(company, percentage, AIRBUS:EMBRAER)
ggplot(df_long, aes(x = YEAR, y = percentage, fill = company)) +
  geom_col() +
  ggtitle("Departure delays by company and Year") +
  scale_x_continuous(breaks = 2002:2013)
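The question also asked for labels on the bars. One possible way, sketched here rather than taken from the original answer, is to add a geom_text layer that uses the same stacking as the bars. The df_long below is a shortened, hand-built copy of the long-format data (first two years only) so the snippet runs on its own.

```r
library(ggplot2)

# Shortened long-format data: values for 2002 and 2003 from the table above
df_long <- data.frame(
  YEAR = rep(c(2002, 2003), times = 3),
  company = rep(c("AIRBUS", "BOEING", "EMBRAER"), each = 2),
  percentage = c(18, 20, 21, 23, 30, 31)
)

p <- ggplot(df_long, aes(x = YEAR, y = percentage, fill = company)) +
  geom_col() +
  # position_stack(vjust = 0.5) centres each label inside its bar segment
  geom_text(aes(label = paste0(percentage, "%")),
            position = position_stack(vjust = 0.5)) +
  scale_x_continuous(breaks = 2002:2003)
p
```

The legend comes for free from the fill aesthetic; swapping geom_col() for geom_col(position = "fill") would rescale every bar to the same height, in which case the labels would need position_fill(vjust = 0.5) instead.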

Subset dataframe according to maxima of groups

I am trying to create a subset of a dataframe conditional on grouped cumulative sums of one of the columns (i.e., cumsum of Total, grouped by Year, below).
I have a population table that looks as follows (simplified)
Year Age Total Cum.Sum
1991 20 94619 94619
1991 21 97455 192074
1991 22 101418 293492
1991 23 104192 397684
1991 24 108332 506016
1991 25 111355 617371
1991 26 114569 731940
1991 27 113852 845792
1991 28 112264 958056
1991 29 110230 1068286
1991 30 109149 1177435
1991 31 108222 1285657
1991 32 106641 1392298
1991 33 106658 1498956
1991 34 104730 1603686
1991 35 103383 1707069
1991 36 101441 1808510
1991 37 99773 1908283
1991 38 100621 2008904
1991 39 98135 2107039
1991 40 101946 2208985
2010 20 93470 93470
2010 21 94762 188232
2010 22 92527 280759
2010 23 94696 375455
2010 24 95416 470871
2010 25 98016 568887
2010 26 98387 667274
2010 27 102254 769528
2010 28 103343 872871
2010 29 105179 978050
2010 30 104278 1082328
2010 31 104099 1186427
2010 32 105240 1291667
2010 33 105316 1396983
2010 34 106250 1503233
2010 35 109019 1612252
2010 36 110044 1722296
2010 37 113949 1836245
2010 38 118086 1954331
2010 39 119845 2074176
2010 40 123647 2197823
Now I'd like to subset this dataframe so that the cumulative sum of each year does not exceed a certain threshold, e.g.
1991 2010
1605897 1803476
I do not want to have separate datasets per year.
This will do:
t.h <- read.table(header=TRUE, text=
'Year th
1991 1605897
2010 1803476')
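# Attach each year's threshold (column th) to the rows of that year
d <- merge(dataset, t.h)
# Keep only rows whose cumulative sum is below the year's threshold
subset(d, Cum.Sum < th)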
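An alternative that avoids the merge is to keep the thresholds in a named vector and look them up by year. The sketch below is an assumption-laden illustration: dataset is a three-row-per-year stand-in for the population table, and the thresholds in th are made up so that the effect is visible on such a small sample.

```r
# Small stand-in for the population table (first three ages of each year)
dataset <- data.frame(
  Year  = c(1991, 1991, 1991, 2010, 2010, 2010),
  Age   = c(20, 21, 22, 20, 21, 22),
  Total = c(94619, 97455, 101418, 93470, 94762, 92527)
)

# Illustrative thresholds keyed by year (not the ones from the question)
th <- c("1991" = 200000, "2010" = 100000)

# Cumulative sum of Total within each Year
dataset$Cum.Sum <- ave(dataset$Total, dataset$Year, FUN = cumsum)

# Look up each row's threshold by its year and keep rows at or below it
kept <- dataset[dataset$Cum.Sum <= th[as.character(dataset$Year)], ]
```

Here kept retains ages 20 and 21 for 1991 and only age 20 for 2010, all in one data frame.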

Creating a vector with multiple sequences based on number of IDs' repetitions

I've got a data frame with panel data: subjects' characteristics over time. I need to create a column with a sequence from 1 to the number of years observed for each subject. For example, if subject 1 is in the data frame from 2000 to 2005, I need the following sequence: 1,2,3,4,5,6.
Below is a small fraction of my data. The last column (exp) is what I am trying to get. Additionally, if you look at the first subject (13), you'll see that in 2008 the value of qtty is zero. In this case I just need an NA or a code (0, 1, -9999); it doesn't matter which one.
Below the data is what I did to get that vector, but it didn't work.
Any help will be much appreciated.
subject season qtty exp
13 2000 29 1
13 2001 29 2
13 2002 29 3
13 2003 29 4
13 2004 29 5
13 2005 27 6
13 2006 27 7
13 2007 27 8
13 2008 0 NA
28 2000 18 1
28 2001 18 2
28 2002 18 3
28 2003 18 4
28 2004 18 5
28 2005 18 6
28 2006 18 7
28 2007 18 8
28 2008 18 9
28 2009 20 10
28 2010 20 11
28 2011 20 12
28 2012 20 13
35 2000 21 1
35 2001 21 2
35 2002 21 3
35 2003 21 4
35 2004 21 5
35 2005 21 6
35 2006 21 7
35 2007 21 8
35 2008 21 9
35 2009 14 10
35 2010 11 11
35 2011 11 12
35 2012 10 13
My code:
numbY<-aggregate(season ~ subject, data = toCountY,length)
colnames(numbY)<-c("subject","inFish")
toCountY$inFish<-numbY$inFish[match(toCountY$subject,numbY$subject)]
numbYbyFisher<-unique(numbY)
seqY<-aggregate(numbYbyFisher$inFish, by=list(numbYbyFisher$subject), function(x)seq(1,x,1))
I would use ddply from the plyr package, and I distinguish two cases:
Either you generate a sequence along subject and replace it with NA wherever qtty is zero
library(plyr)
ddply(dat,.(subject),transform,new.exp=ifelse(qtty==0,NA,seq_along(subject)))
Or you generate a sequence over the rows where qtty is nonzero, with a gap (NA) wherever qtty is zero
ddply(dat,.(subject),transform,new.exp={
  hh <- seq_along(which(qtty !=0))
  if(length(which(qtty ==0))>0)
    hh <- append(hh,NA,which(qtty==0)-1)
  hh
})
EDITED
ind=qtty!=0
exp=numeric(length(subject))
temp=0
for(i in 1:length(unique(subject[ind]))){
temp[i]=list(seq(from=1,to=table(subject[ind])[i]))
}
exp[ind]=unlist(temp)
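This will provide what you need, assuming the columns subject and qtty are available as vectors (for example via attach) and the rows are sorted by subject.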
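For comparison, the first case (a running counter per subject, with NA where qtty is zero) can also be sketched in base R with ave, without plyr or an explicit loop. The data frame dat below is a shortened, hypothetical excerpt of the data in the question.

```r
# Shortened excerpt: last three seasons of subject 13, first two of subject 28
dat <- data.frame(
  subject = c(13, 13, 13, 28, 28),
  season  = c(2006, 2007, 2008, 2000, 2001),
  qtty    = c(27, 27, 0, 18, 18)
)

# Running counter within each subject
dat$exp <- ave(seq_along(dat$subject), dat$subject, FUN = seq_along)

# Blank out the counter where nothing was recorded
dat$exp[dat$qtty == 0] <- NA
```

ave returns its result in the original row order, so this does not depend on the rows being sorted by subject.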
