Simple time series analysis with R: aggregating and subsetting

I want to convert monthly data into quarterly averages. These are my two datasets:
gas <- UKgas
dd <- UKDriverDeaths
I was able to do this (I think) for the dd data as follows:
library(zoo)
dd.zoo <- zoo(dd)
ddq <- aggregate(dd.zoo, as.yearqtr, mean)
However, I cannot figure out how to do this with the gas data. Any help?
Follow-up
When I try to subset the data by date (1969-1984), the resulting data does not include 1969 Q1 and instead includes 1985 Q1. Any suggestions on how to fix this? I was just trying to subset with gas[1969:1984].

Originally I did not plan to post an answer, since it looks like you did not check your UKgas dataset first: it is already a quarterly time series.
But the follow-up question is worth answering. A "ts" object comes with many handy generic functions, and window() lets us subset a time series by time rather than by position. To extract the section between the first quarter of 1969 and the final quarter of 1984, we can use:
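A quick way to spot this is frequency(). As a base-R alternative to the zoo approach in the question, aggregate() on a "ts" object can also turn the monthly UKDriverDeaths into quarterly means. A minimal sketch, assuming only base R (the datasets and stats packages):

```r
# Both series ship with R's datasets package
frequency(UKgas)           # 4: already quarterly
frequency(UKDriverDeaths)  # 12: monthly

# aggregate.ts() collapses each block of 12 / 4 = 3 consecutive months into
# one quarterly value. Blocks are counted from the first observation, so they
# match calendar quarters here because UKDriverDeaths starts in January 1969.
ddq <- aggregate(UKDriverDeaths, nfrequency = 4, FUN = mean)
start(ddq)  # 1969 1
```

If a monthly series started mid-quarter, the blocks would no longer line up with calendar quarters, so it is worth trimming it with window() first.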
window(UKgas, start = c(1969,1), end = c(1984,4))
The result will still be a quarterly time series.
On the other hand, if we use "[" for subsetting, we lose the object's class:
class(UKgas[1:12])
#[1] "numeric"
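To make the difference between positional and time-based subsetting concrete, here is a small base-R sketch on UKgas (quarterly, 1960 Q1 to 1986 Q4). The positional indices 37 and 100 are simply worked out from that start date:

```r
# "[" indexes by position, not by year: UKgas has only 108 quarterly values,
# so positions 1969 to 1984 do not exist and come back as NA
length(UKgas)                 # 108
all(is.na(UKgas[1969:1984]))  # TRUE

# window() selects by time and keeps the "ts" class
g <- window(UKgas, start = c(1969, 1), end = c(1984, 4))
length(g)  # 64 quarters

# Equivalent positional subset: 1969 Q1 is element (1969 - 1960) * 4 + 1 = 37
# and 1984 Q4 is element (1984 - 1960) * 4 + 4 = 100
all.equal(as.numeric(g), UKgas[37:100])  # TRUE

# Selecting by year via time(): floor() maps each quarter to its calendar year.
# The result is a plain numeric vector, so the "ts" class is lost again.
by_year <- UKgas[floor(time(UKgas)) %in% 1969:1984]
```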

Related

Quantmod - Chop data and constructing matrix of return series

I am having trouble with an R assignment I am working on this semester.
Here is the part of the task that confuses me:
iv. Download the 3-month T-bill rate from FRED for the same sample period, 01/01/1993 to 12/31/2013.
Useful Hints: You may have to chop the data to match the sample period.
v. Construct a matrix of return series combining Stock, S&P500, and TBill for the sample period.
Useful Hints:
Note that the rownames for the TBill may not match those of the other two return series, as the dates do not match, although the month and year match.
You have to construct the row names for each of the series as Year – Month format (e.g. 1993-01) or delete the rownames from T-bill before you can combine all three series into one Return matrix.
You have to convert the Return matrix to a dataframe before you use the lm() function.
I tried the call below, the same way I have used getSymbols before for SPY and AAPL, but it pulls the entire data set rather than the specified date range. How can I chop the data so it fits the desired date range?
getSymbols('TB3MS', src = 'FRED', from = "1993-01-01", to = "2013-12-31")
Next, how would I go about constructing the matrix of return series combining all of the stocks? Can anyone point me in the right direction?
To filter an xts object by date, index it with a date-range string; see the examples in the xts documentation (?xts).
# filter 1993 until 2013
TB3MS["1993/2013"]
But these dates are off: the T-bill observations are dated on the first day of each month, whereas the stock dates fall on the last day of the month. With coredata() you can extract the raw T-bill values and attach them to the other time series, provided the rows match.
Taking the data example from your previous question, you could do something like this (I'm using more steps than necessary; you could combine a few statements into one):
# create monthly returns of the spy data and give the column a better name than monthly.returns
spy_returns <- monthlyReturn(SPY)
colnames(spy_returns) <- "SPY_returns"
# filter the tbill data
TB3MS_1993_2013 <- TB3MS["1993/2013"]
# add tbill data to spy data
spy_returns$TB3MS <- coredata(TB3MS_1993_2013)
Merging xts objects can be done with merge(); the objects are joined on their dates.
merge(spy_returns, aapl_returns) would combine these two. If you have a lot of tickers, use Reduce (see ?Reduce and Stack Overflow for how to combine it with merge), though the tidyquant package would be a better choice if you are allowed to use it.
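A minimal base-R sketch of the assignment's "Year – Month rownames" hint combined with Reduce() and merge(). It uses plain data frames rather than xts objects, and every number below is a hypothetical toy value, not real FRED or price data:

```r
# Hypothetical toy values: three monthly series whose dates fall on different days
stock <- data.frame(date = as.Date(c("1993-01-29", "1993-02-26", "1993-03-31")),
                    Stock = c(0.010, -0.020, 0.015))
sp500 <- data.frame(date = as.Date(c("1993-01-29", "1993-02-26", "1993-03-31")),
                    SP500 = c(0.007, 0.010, 0.019))
tbill <- data.frame(date = as.Date(c("1993-01-01", "1993-02-01", "1993-03-01")),
                    TBill = c(3.0, 2.9, 3.0))

# Replace each full date with a Year-Month key so that month-start and
# month-end dates line up
add_key <- function(d) {
  d$ym <- format(d$date, "%Y-%m")
  d$date <- NULL
  d
}
series <- lapply(list(stock, sp500, tbill), add_key)

# Reduce() applies merge() pairwise: merge(merge(stock, sp500), tbill)
returns_df <- Reduce(function(a, b) merge(a, b, by = "ym"), series)
rownames(returns_df) <- returns_df$ym

# Return matrix with Year-Month rownames
returns_mat <- as.matrix(returns_df[, c("Stock", "SP500", "TBill")])
```

returns_df is already a data frame, which is the form lm() expects according to the assignment's last hint.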

time series in R with sales prediction with only date values

I have data with dates (all in 2015) in mm/dd/yy format, plus sales. I need to predict sales for 2016 from this data. I know I need to use time series forecasting, but I have no idea how. Many examples have only years (1960, 1970, ...), whereas my data covers only one year with several months. I also don't know how to plot it. Can you give me a clear structure for how to proceed?
Assuming that the date is a string in the format mm/dd/yy, convert it into a Date with this code:
a <- "07/23/15"
b <- as.Date(a, format = "%m/%d/%y")
fullYear <- format(b, "%Y")  # "2015": four-digit year
halfYear <- format(b, "%y")  # "15": two-digit year
After this you can work on
I have found the solution. I converted the sales figures into time series format, plotted the data, and checked whether there was any trend or seasonality.
Since the data showed only a trend, I applied Holt's exponential smoothing from the forecast package. The 2016 sales were then forecast and plotted.
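The steps described above can be sketched in base R. This is a swapped-in technique, not the forecast-package call the poster used: stats::HoltWinters() with gamma = FALSE fits Holt's linear trend method. The daily sales below are hypothetical toy values. With only one year of data a seasonal model is not an option anyway, since HoltWinters() needs at least two full periods to estimate seasonality.

```r
# Hypothetical daily sales for 2015 (toy values with a linear trend),
# with dates stored as mm/dd/yy strings as in the question
days <- seq(as.Date("2015-01-01"), as.Date("2015-12-31"), by = "day")
daily <- data.frame(date = format(days, "%m/%d/%y"),
                    sales = 100 + 0.5 * seq_along(days))

# Parse the strings into Date objects and total the sales per calendar month
daily$date <- as.Date(daily$date, format = "%m/%d/%y")
monthly <- tapply(daily$sales, format(daily$date, "%Y-%m"), sum)

# Monthly "ts" object starting in January 2015
sales_ts <- ts(as.numeric(monthly), start = c(2015, 1), frequency = 12)

# Holt's linear trend method: exponential smoothing with a level and a trend
# but no seasonal component (gamma = FALSE)
fit <- HoltWinters(sales_ts, gamma = FALSE)
fc <- predict(fit, n.ahead = 12)  # January to December 2016

# Observed 2015 series (solid) followed by the 2016 forecast (dashed)
ts.plot(sales_ts, fc, lty = c(1, 2))
```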

Time series analysis applicability?

I have a sample data frame like this (date column format is mm-dd-YYYY):
date count grp
01-09-2009 54 1
01-09-2009 100 2
01-09-2009 546 3
01-10-2009 67 4
01-11-2009 80 5
01-11-2009 45 6
I want to convert this data frame into a time series using ts(), but the problem is that the data frame has multiple values for the same date. Can time series methods be applied in this case?
Can I convert the data frame into a time series and build a model (ARIMA) that forecasts the count value on a daily basis?
Or should I forecast the count value based on grp? In that case I would have to select only the grp and count columns of the data frame and skip the date column, so a daily forecast of the count value would not be possible?
Suppose I want to aggregate the count value per day. I tried the aggregate function, but there I have to specify the date value, and I have a very large data set. Is there any other option available in R?
Can somebody please suggest a better approach? My assumption is that time series forecasting works only for bivariate data. Is this assumption right?
It seems like there are two aspects of your problem:
i want to convert this data frame into time series using ts(), but the
problem is- current data frame having multiple values for the same
date. can we apply time series in this case?
If you are happy to use the xts package, you could try the following (the question gives the date format as mm-dd-YYYY, hence "%m-%d-%Y"):
dta2$date <- as.Date(dta2$date, "%m-%d-%Y")
dtaXTS <- xts::as.xts(dta2[,2:3], dta2$date)
which would result in:
>> head(dtaXTS)
count grp
2009-01-09 54 1
2009-01-09 100 2
2009-01-09 546 3
2009-01-10 67 4
2009-01-11 80 5
2009-01-11 45 6
with the following classes:
>> class(dtaXTS)
[1] "xts" "zoo"
You could then use your time series object either as a univariate time series, referring to the selected variable, or as a multivariate time series, for example with the PerformanceAnalytics package:
PerformanceAnalytics::chart.TimeSeries(dtaXTS)
Side points
Concerning your second question:
can somebody plz suggest me what is the better approach to follow, my
assumption is time series forcast is works only for bivariate data? is
this assumption also right?
IMHO, this is rather broad. I would suggest that you use the created xts object and elaborate on the model you want to use and why. If it's a conceptual question about the nature of time series analysis, you may prefer to post your follow-up question on Cross Validated.
Data sourced via dta2 <- read.delim(pipe("pbpaste"), sep = "") from the provided example (pbpaste reads the macOS clipboard).
Since daily forecasts are wanted, we need to aggregate to daily values. Using DF from the Note at the end, read the first two columns of data into a zoo series z using read.zoo with the argument aggregate = sum. We could optionally convert that to a "ts" series (tser <- as.ts(z)), although this is unnecessary for many forecasting functions. In particular, the source code of auto.arima shows that it runs x <- as.ts(x) on its input before further processing. Finally, run auto.arima, forecast or another forecasting function.
library(forecast)
library(zoo)
z <- read.zoo(DF[1:2], format = "%m-%d-%Y", aggregate = sum)
auto.arima(z)
forecast(z)
Note: DF is given reproducibly here:
Lines <- "date count grp
01-09-2009 54 1
01-09-2009 100 2
01-09-2009 546 3
01-10-2009 67 4
01-11-2009 80 5
01-11-2009 45 6"
DF <- read.table(text = Lines, header = TRUE)
Updated: Revised after re-reading question.
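On the concern that aggregate() needs the date values spelled out: its formula interface groups by every distinct date automatically, so nothing has to be listed by hand. A minimal base-R sketch, offered as an alternative to read.zoo(..., aggregate = sum) and reusing the example data from the Note:

```r
# Same example data as in the Note
Lines <- "date count grp
01-09-2009 54 1
01-09-2009 100 2
01-09-2009 546 3
01-10-2009 67 4
01-11-2009 80 5
01-11-2009 45 6"
DF <- read.table(text = Lines, header = TRUE)

# Parse the mm-dd-YYYY strings, then sum count over all rows sharing a date
DF$date <- as.Date(DF$date, format = "%m-%d-%Y")
daily <- aggregate(count ~ date, data = DF, FUN = sum)

# Give every calendar day in the range a row (NA where nothing was recorded)
all_days <- data.frame(date = seq(min(daily$date), max(daily$date), by = "day"))
daily <- merge(all_days, daily, all.x = TRUE)

# Plain daily "ts" (frequency 1) that forecasting functions can take as input
count_ts <- ts(daily$count)
```

Whether missing days should be NA or 0 depends on what a gap means in your data; that choice here is an assumption.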

how to extract all values from a data frame by month for years

I have data in a zoo data structure. I want to pull all August daily values over 10 years and compute monthly statistics for the period of record. Any thoughts on an easy way to do this?
An example of the specific date format would help; in the meantime, try format().
For example:
x <- as.POSIXct("2009-08-03 12:01:59.23")
format(x,"%b")
For simplicity, just create a new column with format() and then subset it on the month you're looking for.
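A sketch of that approach on a plain data frame with hypothetical daily values over ten years. It uses "%m" instead of "%b" because month abbreviations depend on the locale:

```r
# Hypothetical daily data over ten years; value is simply the day of the month
dates <- seq(as.Date("2000-01-01"), as.Date("2009-12-31"), by = "day")
df <- data.frame(date = dates, value = as.numeric(format(dates, "%d")))
# With a zoo object z, the same columns would come from index(z) and coredata(z)

# New column holding the month number; "%m" gives "01".."12" in every locale,
# whereas "%b" gives locale-dependent abbreviations such as "Aug"
df$month <- format(df$date, "%m")

# All August days in the record
aug <- df[df$month == "08", ]

# August mean for each year, and over the whole period of record
aug_by_year <- tapply(aug$value, format(aug$date, "%Y"), mean)
aug_overall <- mean(aug$value)
```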

Calendar Year Return Calculation

I am trying to calculate calendar-year GDP growth for the GDPC96 time series from FRED (i.e. for an xts object). I am looking for a simple function without loops that calculates the calendar-year growth, where the arguments are the data object (here GDPC96), the frequency (here quarterly) and whether incomplete periods (such as 2013) should be shown or not.
For example:
library(quantmod)
getSymbols("GDPC96",src="FRED")
a <- annualReturn(GDPC96,leading=FALSE)
tail(a)
I would like it to be such that the changes are per calendar year, i.e. it should calculate from 01.01.1947 to 01.01.1948 and so on. Then, for 2012, where data is only available through Oct, it should be omitted.
As far as I have seen, none of the functions in PerformanceAnalytics and related packages can do this properly.
It seems you want something like a year-over-year return calculation. I'm not aware of a function that does this automatically, but it's easy to do with the ROC function in the TTR package.
library(quantmod)
getSymbols("GDPC96",src="FRED")
ROC(GDPC96, 4) # 4-period returns for quarterly data
getSymbols("SPY")
spy <- to.monthly(SPY)
ROC(spy, 12) # 12-period returns for monthly data
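One detail worth knowing: TTR::ROC() defaults to type = "continuous", i.e. log returns. For a "ts" object the same year-over-year numbers can be computed in base R. A sketch on a hypothetical quarterly series that grows exactly 2% per quarter:

```r
# Hypothetical quarterly series growing 2% per quarter, 2000 Q1 to 2002 Q4
x <- ts(100 * 1.02^(0:11), start = c(2000, 1), frequency = 4)

# Discrete year-over-year growth: each quarter against the same quarter one year
# earlier. stats::lag(x, -4) shifts the series four quarters later in time, and
# "ts" arithmetic keeps only the overlapping window (2001 Q1 to 2002 Q4).
yoy_discrete <- x / stats::lag(x, -4) - 1

# Log ("continuous") version, matching the non-NA values of TTR::ROC(x, 4)
yoy_log <- diff(log(x), lag = 4)
```

Pass type = "discrete" to ROC() if simple percentage changes are wanted instead of log changes.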
Update based on comments:
first.obs.by.year <- lapply(split(GDPC96, "years"),first)
last.obs.by.year <- lapply(split(GDPC96, "years"),last)
ROC(do.call(rbind, first.obs.by.year))
ROC(do.call(rbind, last.obs.by.year))
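The question asked for a single function taking the data, its frequency and an incomplete-period switch. Here is a base-R sketch for "ts" input. calendar_growth() is a hypothetical helper, not part of any package. Like the first-observation approach above, it measures growth from the first observation of one year to the first of the next, but it returns discrete growth rather than ROC's default log change:

```r
# Hypothetical helper: calendar-year growth from the first observation of one
# year to the first observation of the next (Q1 to Q1 for quarterly data)
calendar_growth <- function(x, drop_incomplete = TRUE) {
  f <- frequency(x)
  # Calendar year of each observation; the small offset guards against
  # floating-point error in time() for frequencies such as 12
  year <- as.numeric(floor(time(x) + 1e-6))
  v <- as.numeric(x)
  if (drop_incomplete) {
    # Keep only years that contain all f periods
    counts <- table(year)
    full_years <- as.numeric(names(counts)[counts == f])
    keep <- year %in% full_years
    v <- v[keep]
    year <- year[keep]
  }
  first_obs <- sapply(split(v, year), function(z) z[1])
  # Each growth rate is labelled with the later year of the pair
  first_obs[-1] / first_obs[-length(first_obs)] - 1
}

# Toy quarterly data (hypothetical values), 2000 Q1 to 2003 Q3: 2003 is incomplete
x <- ts(100:114, start = c(2000, 1), frequency = 4)
calendar_growth(x)                           # years 2001 and 2002 only
calendar_growth(x, drop_incomplete = FALSE)  # also includes 2003
```

For an xts object such as GDPC96, the same logic would need the index dates converted to years first.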
