starting a daily time series in R - r

I have a daily time series of the number of visitors to a web site. The series runs from 01/06/2014 until today, 14/10/2015, and I want to predict the number of visitors in the future. How can I read my series into R? I'm thinking:
series <- ts(visitors, frequency=365, start=c(2014, 6))
If that is right, then after fitting my time series model with arimadata=auto.arima() I want to predict the number of visitors for the next 60 days. How can I do this?
h=..?
forecast(arimadata,h=..),
What should the value of h be?
Thanks in advance for your help.

The ts specification is wrong; if you are setting this up as daily observations, then you need to specify what day of the year 2014 is June 1st and specify this in start:
## Create a daily Date sequence - makes working with dates easier
inds <- seq(as.Date("2014-06-01"), as.Date("2015-10-14"), by = "day")
## Create a time series object
set.seed(25)
myts <- ts(rnorm(length(inds)), # random data
start = c(2014, as.numeric(format(inds[1], "%j"))),
frequency = 365)
Note that I specify start as c(2014, as.numeric(format(inds[1], "%j"))). All the complicated bit is doing is working out what day of the year June 1st is:
> as.numeric(format(inds[1], "%j"))
[1] 152
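As a quick check of how ts() interprets that start value, here is a minimal sketch in base R. The variable names doy, n_obs and start_time are illustrative and not part of the original answer, and the data are placeholders. It shows that day 152 corresponds to a fractional year of 2014 + 151/365 and that the series holds one observation per calendar day.

```r
## Daily dates covered by the series in the question
inds <- seq(as.Date("2014-06-01"), as.Date("2015-10-14"), by = "day")

## Day of the year of the first observation ("%j" returns a zero-padded string)
doy <- as.numeric(format(inds[1], "%j"))

## Placeholder data; only the time index matters for this check
myts <- ts(seq_along(inds), start = c(2014, doy), frequency = 365)

n_obs      <- length(inds)   # number of daily observations
start_time <- tsp(myts)[1]   # start expressed as a fractional year
```

With frequency = 365, each step of the series advances the time index by 1/365 of a year, which is why the position within the year has to be given in days rather than months.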
Once you have this, you're effectively there:
## use auto.arima to choose ARIMA terms
library(forecast) # provides auto.arima() and forecast()
fit <- auto.arima(myts)
## forecast for next 60 time points
fore <- forecast(fit, h = 60)
## plot it
plot(fore)
That seems suitable given the random data I supplied...
You'll need to select appropriate arguments for auto.arima() as suits your data.
Note that the x-axis labels refer to 0.5 (half) of a year.
Doing this via zoo
This might be easier to do via a zoo object created using the zoo package:
## create the zoo object as before
library(zoo)
set.seed(25)
myzoo <- zoo(rnorm(length(inds)), inds)
Note you now don't need to specify any start or frequency info; just use inds computed earlier from the daily Date object.
Proceed as before
## use auto.arima to choose ARIMA terms
fit <- auto.arima(myzoo)
## forecast for next 60 time points
fore <- forecast(fit, h = 60)
The plot, though, will cause an issue as the x-axis is in days since the epoch (1970-01-01), so we need to suppress the automatic plotting of this axis and then draw our own. This is easy as we have inds:
## plot it
plot(fore, xaxt = "n") # no x-axis
Axis(inds, side = 1)
This only produces a couple of labeled ticks; if you want more control, tell R where you want the ticks and labels:
## plot it
plot(fore, xaxt = "n") # no x-axis
Axis(inds, side = 1,
at = seq(inds[1], tail(inds, 1) + 60, by = "3 months"),
format = "%b %Y")
Here we label the axis every 3 months.
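To see which tick positions that seq() call produces, here is a small base R sketch. The names h, axis_end, ticks and tick_labels are illustrative, and the "%m/%Y" format is used instead of "%b %Y" only so that the labels do not depend on the locale.

```r
## Dates of the observed series, as in the answer
inds <- seq(as.Date("2014-06-01"), as.Date("2015-10-14"), by = "day")

h <- 60                          # forecast horizon in days
axis_end <- tail(inds, 1) + h    # last date covered by the forecast

## One tick every 3 months, from the first observation to the end of the forecast
ticks <- seq(inds[1], axis_end, by = "3 months")

## Numeric month/year labels
tick_labels <- format(ticks, "%m/%Y")
```

Because seq() stops at the last 3-month step that does not pass axis_end, the final tick falls on 2015-12-01, a little before the end of the forecast on 2015-12-13.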

A ts object does not work well for daily time series. I suggest you use the zoo package instead.
library(zoo)
zoo(visitors, seq(from = as.Date("2014-06-01"), to = as.Date("2015-10-14"), by = 1))

Here's how I created a time series when I was given some daily observations with quite a few observations missing. @gavin-simpson's answer was a big help. Hopefully this saves someone some grief.
The original data looked something like this:
library(lubridate)
set.seed(42)
minday = as.Date("2001-01-01")
maxday = as.Date("2005-12-31")
dates <- seq(minday, maxday, "days")
dates <- dates[sample(1:length(dates),length(dates)/4)] # create some holes
df <- data.frame(date=sort(dates), val=sin(seq(from=0, to=2*pi, length=length(dates))))
To create a time-series with this data I created a 'dummy' dataframe with one row per date and merged that with the existing dataframe:
df <- merge(df, data.frame(date=seq(minday, maxday, "days")), all=TRUE)
This dataframe can be cast into a timeseries. Missing dates are NA.
nts <- ts(df$val, frequency=365, start=c(year(minday), as.numeric(format(minday, "%j"))))
plot(nts)
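The effect of the merge is easier to see on a tiny example. The sketch below uses made-up values and illustrative names (obs, full, filled) in base R. It fills two missing days with NA in the same way.

```r
## Observed data with two days missing (2001-01-03 and 2001-01-05)
obs <- data.frame(
  date = as.Date(c("2001-01-01", "2001-01-02", "2001-01-04", "2001-01-06")),
  val  = c(1.5, 2.0, 3.5, 4.0)
)

## 'Dummy' data frame with one row per calendar day in the observed range
full <- data.frame(date = seq(min(obs$date), max(obs$date), by = "day"))

## Full outer join on the shared 'date' column:
## days without an observation get NA in 'val'
filled <- merge(obs, full, all = TRUE)
```

The result has six rows sorted by date, with NA in val for the third and fifth day, so it can be passed to ts() with a regular daily index.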

series <- ts(visitors, frequency=365, start=c(2014, 152))
The second element of start is 152 because 01-06-2014 is the 152nd day of the year; with frequency=365, positions within a year are counted in days.
To forecast 60 days ahead, set h=60.
forecast(arimadata , h=60)
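Since h counts steps of the series, and each step here is one day, the 60 forecasts correspond to the 60 calendar days after the last observation. A minimal base R sketch of those dates (the names last_obs and forecast_dates are illustrative):

```r
last_obs <- as.Date("2015-10-14")   # last observed day in the question
h <- 60                             # forecast horizon in days

## Calendar dates that the h forecast values refer to
forecast_dates <- seq(last_obs + 1, by = "day", length.out = h)
```

The forecasts therefore run from 2015-10-15 to 2015-12-13.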

Related

Can we plot multiple time series in one plot using hydroTSM?

I have daily precipitation data in the following format:
> head(df)
I_2004 G_2004 T_2004 Date
1 3628.79853 2199.310 12741.413 2004-01-01
2 1556.66704 4322.884 5464.395 2004-01-02
3 20.43379 5592.103 72.998 2004-01-03
4 265.94247 8145.041 942.344 2004-01-04
5 914.93958 9668.531 3227.579 2004-01-05
6 2585.63558 6825.905 9043.866 2004-01-06
Usually I plot the time series of all 3 variables together using ggplot2:
dfmelt<-melt(df,id.vars="Date")
ggplot(dfmelt,aes(x=Date,y=value,
col=variable,group=12))+
labs(title='ANNUAL')+
geom_line()
I have used hydroTSM to plot time series, but never a multivariable one. I was wondering if there is any way to achieve this using packages like hydroTSM?
My current method requires subsetting, and doing so for multiple years is time consuming. I'm hoping to shorten this using hydroTSM or any other suitable package.
My aim is to plot monthly and seasonal time series plots.
We use a larger data frame below (see Note at end) so that it is possible to display month plots. Convert the data frame df to a zoo series -- hydroTSM makes zoo available -- and use autoplot.zoo. Use aggregate with tail or mean to create a monthly plot and convert that to ts to create the seasonal plot. Except for ggplot2, the following only uses packages already pulled in by hydroTSM.
library(ggplot2)
library(hydroTSM)
z <- read.zoo(df, index = "Date")
autoplot(z) # separate panels
autoplot(z, facets = NULL) # single panel
# monthly plot
zm <- aggregate(z, as.yearmon, tail, 1, frequency = 12)
autoplot(zm)
# for seasonal plot
tt <- as.ts(zm)
nc <- ncol(tt)
opar <- par(mfrow = c(nc, 1), mar = c(2, 4, 0, 4))
for(j in 1:nc) monthplot(tt[, j], ylab = colnames(tt)[j])
par(opar)
Note
df in reproducible form. Larger than in question so that monthly plots can be shown.
set.seed(123)
n <- 700
df <- data.frame(I_2004 = rnorm(n),
G_2004 = rnorm(n),
T_2004 = rnorm(n),
Date = as.Date("2004-01-01") + 1:n - 1)
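For comparison, the monthly aggregation step can also be sketched in base R without zoo. This is an alternative to the aggregate(z, as.yearmon, ...) call above, not the same code: it uses monthly means rather than the last value of each month, and the month column and the monthly data frame are illustrative names.

```r
## Same reproducible data as in the Note
set.seed(123)
n <- 700
df <- data.frame(I_2004 = rnorm(n),
                 G_2004 = rnorm(n),
                 T_2004 = rnorm(n),
                 Date = as.Date("2004-01-01") + 1:n - 1)

## Year-month key, e.g. "2004-01"
df$month <- format(df$Date, "%Y-%m")

## Monthly mean of each variable
monthly <- aggregate(cbind(I_2004, G_2004, T_2004) ~ month, data = df, FUN = mean)
```

The 700 days span January 2004 to November 2005, so monthly has 23 rows, one per month.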

Create a boxplot per year out of a ts object in R

I have a ts object: 240 monthly observations starting from January 2000:
data <- runif(240)
data_ts <- ts(data,
start = c(2000, 1),
frequency = 12)
And I want to create a boxplot per year out of my data_ts.
I know how to create a boxplot per month:
boxplot(data_ts ~ cycle(data_ts))
But I don't know how to create a boxplot per year, that is, a boxplot of the observations of each year (a boxplot of year 2000, a boxplot of 2001, and so on).
Any idea?
Thanks!
The year is given as shown:
year <- as.integer(time(data_ts))
boxplot(data_ts ~ year)
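If you would rather not rely on truncating the fractional values returned by time(), the year can also be derived with integer arithmetic from start() and frequency(). This is only a sketch of an alternative; the names freq, pos and obs_per_year are illustrative.

```r
data_ts <- ts(runif(240), start = c(2000, 1), frequency = 12)

freq <- frequency(data_ts)

## Zero-based position of each observation, counted from period 1 of the start year
pos <- seq_along(data_ts) - 1 + (start(data_ts)[2] - 1)

## Integer division by the frequency gives the number of whole years elapsed
year <- start(data_ts)[1] + pos %/% freq

## Sanity check: how many observations fall in each year
obs_per_year <- table(year)

boxplot(as.numeric(data_ts) ~ year)
```

For this series every year from 2000 to 2019 gets exactly 12 observations.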
I use the window() function to subset the years, and a for() loop to iterate each year and create a boxplot(). The title() function adds the title to the plot, and png() and dev.off() work together to save the image to disk:
getwd() # print the location files will be saved to
for (i in 2010:2012) { # small loop for testing
png(file=paste("boxplot_",i,".png",sep="")) # create a png
boxplot(window(x=data_ts, start=c(i, 1), end=c(i, 12))) # boxplot, of yearly data.
title(i) # add the year as a title to the plot
dev.off() # save the png
}
Maybe this also helps:
data <- runif(240)
data_ts <- ts(data,
start = c(2000, 1),
frequency = 12)
frame<-data.frame(values=as.matrix(data_ts), date=lubridate::year(zoo::as.Date(data_ts)))
library(ggplot2)
ggplot(frame,aes(y=values,x=date,group=date))+
geom_boxplot()
It is not the most elegant solution though as it uses both the zoo and lubridate packages to convert the date into a year that ggplot understands.

ts.plot() not plotting Time Series data against custom x-axis

I am having issues with trying to plot some Time Series data; namely, trying to plot the date (increments in months) against a real number (which represents price).
I can plot the data with just plot(months, mydata) with no issue, but it's in a scatter plot format.
However, when I try the same with ts.plot, i.e. ts.plot(months, mydata), I get the following error:
Error in .cbind.ts(list(...), .makeNamesTs(...), dframe = dframe, union = TRUE) : no time series supplied
I tried to bypass this by doing ts.plot(ts(months, mydata)), but with this I get a straight line (which I know isn't correct).
I have made sure that both months and mydata have the same length
EDIT: What I mean by custom x-axis
I need the data to be in monthly increments (specifically from 03/1998 to 02/2018) - so I ran the following in R:
d <- seq(as.Date("1998-03-01"), as.Date("2018-02-01"), "day")
months <- seq(min(d), max(d), "month")
Now that I have attained the monthly increments, I need the above variable, months, to act as the x-axis for the Time Series plot (perhaps more accurately, the time index).
With package zoo you can do the following.
library(zoo)
z <- zoo(mydata, order.by = months)
labs <- seq(min(index(z)), max(index(z)), length.out = 10)
plot(z, xaxt = "n")
axis(1, at = labs, labels = format(labs, "%m/%Y"))
Data creation code.
set.seed(1234)
d <- seq(as.Date("1998-03-01"), as.Date("2018-02-01"), "day")
months <- seq(min(d), max(d), "month")
n <- length(months)
mydata <- cumsum(rnorm(n))
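Since the data are regular monthly observations, another option is to stay in base R and give ts() the start month and frequency instead of passing the dates as data, which is what ts.plot() expects. A minimal sketch, using the same simulated data and the illustrative name price_ts:

```r
set.seed(1234)
months <- seq(as.Date("1998-03-01"), as.Date("2018-02-01"), by = "month")
mydata <- cumsum(rnorm(length(months)))

## Monthly series starting in March 1998; the dates are implied by start and frequency
price_ts <- ts(mydata, start = c(1998, 3), frequency = 12)

## Line plot with a time axis in years
ts.plot(price_ts, gpars = list(xlab = "Year", ylab = "Price"))
```

The trade-off is that the x-axis is labelled in years rather than in the custom month/year format that the zoo approach gives you.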

Plot time series knowing only start time/date and sampling periods

I want to plot a density time series with following data:
density vector (4,2,5,8,4,6,4)
sampling period vector (unit: seconds) (2,2,2,2,3,2,2)
As you can see, the sampling period is not constant. I only know the starting date and time.
I somehow need to assign the start date and time to the first measurement and then compute the dates and times of the following measurements, but I don't know exactly how to code it.
Try first converting the density vector into a ts, given an initial start time and the cumulative sum of the sampling periods.
I assumed that you are sampling a continuous process (there are no gaps or dead times between samples).
require (lubridate)
require (tidyr)
require (ggplot2)
require (ggfortify)
require (timetk)
density <- c (4,2,5,8,4,6,4)
seconds <- c (2,2,2,2,3,2,2)
starttime <- 0
time <- starttime + cumsum (seconds)
df <- as.data.frame (cbind (time, seconds, density))
df$time <- as_datetime(df$time)
df$ts <- tk_ts (df, select = density)
autoplot (df$ts, ts.geom = 'bar', fill = 'blue')
Plot the density against the cumulative sum of the seconds added to the start.
dens <- c(4,2,5,8,4,6,4)
secs <- c(2,2,2,2,3,2,2)
st <- as.POSIXct("2000-01-01 00:00:00")
plot(st + cumsum(secs), dens, xlab = "", type = "l")
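If the start time should belong to the first measurement itself, as the question asks, the offsets need to begin at zero. The sketch below assumes that each sampling period is the gap to the next sample; that interpretation, the UTC time zone, and the names offsets, times and samples are assumptions for illustration.

```r
dens <- c(4, 2, 5, 8, 4, 6, 4)
secs <- c(2, 2, 2, 2, 3, 2, 2)
st <- as.POSIXct("2000-01-01 00:00:00", tz = "UTC")

## Assumption: each period is the gap to the NEXT sample,
## so the first sample sits exactly at the start time
offsets <- cumsum(c(0, head(secs, -1)))
times <- st + offsets

samples <- data.frame(time = times, density = dens)
plot(samples$time, samples$density, xlab = "", ylab = "density", type = "l")
```

Under this assumption the seven samples fall at 0, 2, 4, 6, 8, 11 and 13 seconds after the start, and the last sampling period is not used.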

time series in R, unwanted variable class changes

I'm trying to program the Coppock Curve in R and finding time series exceedingly difficult to work with in R. The S&P 500 data can be downloaded from finance.yahoo.com. Just bring in the date and the adjusted close.
sp500 = read.csv(file="/.../sp500.csv",header=TRUE)
attach(zoo)
sp500.z = zoo(sp500)
lag11 = lag(sp500.z$SP500, -11, na.pad=TRUE)
lag14 = lag(sp500.z$SP500, -14, na.pad=TRUE)
sp500.z = cbind(sp500.z, lag11, lag14)
str(sp500.z)
sp500.z[1:25,]
data = (sp500.z)
data[1:25,] ### everything looks good up to here
str(data)
data = as.data.frame(data) ### problem arises here, everything becomes factor even if it wasn't before, so I try to convert, but it doesn't work
data$SP500 = as.numeric(data$SP500)
data$lag11 = as.numeric(data$lag11)
data$lag14 = as.numeric(data$lag14)
data$date = as.Date(data$date)
In order to do further data manipulation I need to convert to a data frame, because you cannot attach a zoo matrix or perform dataset$variable operations on it. When I convert to data frame the lag11 and lag14 variables turn into index numbers. The data frame conversion makes everything a factor, and when the variable types are corrected the problem occurs.
The Coppock Curve is calculated as a 10-month weighted moving average of the sum of the 14-month rate of change and the 11-month rate of change for the index.
Coppock Curve = 10-month weighted MA(of 14-month ROC + 11-month ROC)
Where the ROC is:
ROC = [(Close - Close n periods ago) / (Close n periods ago)] * 100
where n is 11 and 14. The weights on the ROC terms go backwards in time from 10/55 for period t, 9/55 for t-1,..., 1/55 for t-9.
You do not need to convert to a data.frame. While you cannot use $ on a matrix, you can use it on zoo and xts objects. And you really shouldn't be using attach, especially if this is something you plan to put into a reusable script.
What you want to do is very easy with xts/zoo, quantmod, and TTR.
library(quantmod) # also loads TTR, xts, and zoo
# download data from Yahoo Finance
sp500 <- getSymbols("^GSPC", auto.assign=FALSE)
# convert to monthly
sp500m <- to.monthly(sp500)
# add lags (via $<-, like you claimed couldn't be done)
sp500m$lag11 <- ROC(Ad(sp500m), n=11, type="discrete")
sp500m$lag14 <- ROC(Ad(sp500m), n=14, type="discrete")
# calculate Coppock Curve (wts needs one weight per period, n = 10; the most recent month gets the largest weight)
sp500m$Coppock <- WMA(sp500m$lag11 + sp500m$lag14, n=10, wts=1:10)
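To make the formula from the question concrete, here is a base R sketch on made-up data, independent of quantmod and TTR. The roc() helper and the synthetic close series are illustrative assumptions; this is not how TTR implements ROC() or WMA().

```r
## Synthetic monthly closes growing 1% per month (made-up data, for illustration only)
close <- 100 * 1.01^(0:39)

## Percentage rate of change over n periods; the first n values are NA
roc <- function(x, n) {
  prev <- head(x, -n)
  c(rep(NA_real_, n), (tail(x, -n) - prev) / prev * 100)
}

roc_sum <- roc(close, 11) + roc(close, 14)

## Weights 10/55 for period t down to 1/55 for t-9; with sides = 1 the first
## coefficient is applied to the current value and the rest to past values
w <- (10:1) / 55
coppock <- as.numeric(stats::filter(roc_sum, w, sides = 1))
```

With a constant growth rate both ROC terms are constant, and because the weights sum to 1 the curve equals their sum once 14 + 9 months of history are available, that is, from the 24th observation on.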
One option would be not to put the data into zoo format and then use the lag() function in dplyr instead. So:
library(dplyr)
sp500 = read.csv(file="/.../sp500.csv",header=TRUE)
sp500.v2 <- sp500 %>%
mutate(lag11 = lag(SP500, 11),
lag14 = lag(SP500, 14))
Are these data grouped somehow, like maybe by ticker symbol? If so, you could accommodate that like this:
sp500.v2 <- sp500 %>%
group_by([grouping variable name, no quotes]) %>%
mutate(lag11 = lag(SP500, 11),
lag14 = lag(SP500, 14))
And if the data aren't pre-sorted by date or you want to make sure that's done before you lag, you can use arrange() like so:
sp500.v2 <- sp500 %>%
group_by([grouping variable name, no quotes]) %>%
arrange([date variable]) %>%
mutate(lag11 = lag(SP500, 11),
lag14 = lag(SP500, 14))
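For reference, the same lagging can be sketched in base R without dplyr. The helper lag_by() and the small data frame below are made up for illustration; the column names follow the question.

```r
## Shift a vector n positions back in time, padding the start with NA
lag_by <- function(x, n) c(rep(NA, n), head(x, -n))

## Made-up monthly data standing in for the CSV from the question
sp500 <- data.frame(
  date  = seq(as.Date("2000-01-01"), by = "month", length.out = 20),
  SP500 = 1001:1020
)

## Make sure rows are in date order before lagging
sp500 <- sp500[order(sp500$date), ]

sp500$lag11 <- lag_by(sp500$SP500, 11)
sp500$lag14 <- lag_by(sp500$SP500, 14)
```

All columns keep their original types here, so no factor conversion is involved.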
