ts.plot() not plotting Time Series data against custom x-axis - r

I am having trouble plotting some time series data: the date (in monthly increments) against a real number (which represents price).
I can plot the data with plot(months, mydata) without any issue, but it comes out as a scatter plot.
However, when I try the same with ts.plot, i.e. ts.plot(months, mydata), I get the following error:
Error in .cbind.ts(list(...), .makeNamesTs(...), dframe = dframe, union = TRUE) : no time series supplied
I tried to bypass this with ts.plot(ts(months, mydata)), but that gives me a straight line (which I know isn't correct).
I have made sure that both months and mydata have the same length.
EDIT: What I mean by custom x-axis
I need the data to be in monthly increments (specifically from 03/1998 to 02/2018) - so I ran the following in R:
d <- seq(as.Date("1998-03-01"), as.Date("2018-02-01"), "day")
months <- seq(min(d), max(d), "month")
Now that I have attained the monthly increments, I need the above variable, months, to act as the x-axis for the Time Series plot (perhaps more accurately, the time index).

With package zoo you can do the following.
library(zoo)
z <- zoo(mydata, order.by = months)
labs <- seq(min(index(z)), max(index(z)), length.out = 10)
plot(z, xaxt = "n")
axis(1, at = labs, labels = format(labs, "%m/%Y"))
Data creation code.
set.seed(1234)
d <- seq(as.Date("1998-03-01"), as.Date("2018-02-01"), "day")
months <- seq(min(d), max(d), "month")
n <- length(months)
mydata <- cumsum(rnorm(n))
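For comparison, here is a base-R sketch that avoids zoo, assuming the data are strictly monthly with no gaps. ts() takes the data first and the start second, so ts(months, mydata) treats the dates themselves as the series, which would explain the straight line. The name my_ts is only illustrative.

```r
# Recreate the example data (same as the data creation code above)
set.seed(1234)
months <- seq(as.Date("1998-03-01"), as.Date("2018-02-01"), by = "month")
mydata <- cumsum(rnorm(length(months)))

# Data go first; start = c(year, month) with 12 observations per year
my_ts <- ts(mydata, start = c(1998, 3), frequency = 12)

# ts.plot() now receives a real time series and draws a line plot
ts.plot(my_ts, gpars = list(ylab = "price"))
```

The x-axis is then in decimal years; the zoo approach above is the better fit if month/year labels are needed.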

Related

Moving average on several time series using ggplot

Hi, I am trying desperately to plot several time series with a 12-month moving average.
Here is an example with two time series of flower and seed densities. (I have many more time series to work on...)
library(lubridate) # provides ymd(), used below
#datasets
taxon <- c(rep("Flower",36),rep("Seeds",36))
density <- c(seq(20, 228, length=36),seq(33, 259, length=36))
year <- rep(c(rep("2000",12),rep("2001",12),rep("2002",12)),2)
ymd <- c(rep(seq(ymd('2000-01-01'),ymd('2002-12-01'), by = 'months'),2))
#dataframe
df <- data.frame(taxon, density, year, ymd)
library(forecast)
#create function that does a Symmetric Weighted Moving Average (2x12) of the monthly log density of flowers and seeds
ma_12 <- function(x) {
ts_x <- ts(x, freq = 12, start = c(2000, 1), end = c(2002, 12)) # transform to time-series object as it is necessary to run the ma function
return(ma(log(ts_x + 1), order = 12, centre = TRUE))
}
#trial of the function
ma_12(df[df$taxon=="Flower",]$density) #works well
library(ggplot2)
#Trying to plot flower and seeds log density as two time series
ggplot(df,aes(x=year,y=density,colour=factor(taxon),group=factor(taxon))) +
stat_summary(fun.y = ma_12, geom = "line") #or geom = "smooth"
#Warning message:
#Computation failed in `stat_summary()`:
#invalid time series parameters specified
The function ma_12 works correctly. The problem comes when I try to plot both time series (Flower and Seeds) using ggplot. I cannot define the two taxa as different time series and apply a moving average to each. It seems to have something to do with "stat_summary"...
Any help would be more than welcome! Thanks in advance.
Note: The following link is quite useful but cannot help me directly, as I want to apply a specific function and plot the result according to the levels of a grouping variable. For now, I can't find any solution. Anyway, thank you for suggesting it.
Multiple time series in one plot
Is this what you need?
f <- ma_12(df[df$taxon=="Flower", ]$density)
s <- ma_12(df[df$taxon=="Seeds", ]$density)
f <- cbind(f,time(f))
s <- cbind(s,time(s))
serie <- data.frame(rbind(f,s),
taxon=c(rep("Flower", dim(f)[1]), rep("Seeds", dim(s)[1])))
serie$density <- exp(serie$f) - 1 # invert the log(x + 1) transform applied in ma_12
library(lubridate)
serie$time <- ymd(format(date_decimal(serie$time), "%Y-%m-%d"))
library(ggplot2)
ggplot() + geom_point(data=df, aes(x=ymd, y=density, color=taxon, group=taxon)) +
geom_line(data=serie, aes(x= time, y=density, color=taxon, group=taxon))
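stat_summary() summarises the y values at each x position, so it is not a natural fit for a smoother that needs the whole series. A minimal sketch of another route, assuming the rows are already sorted by date within each taxon: compute the moving average per group first, then plot the resulting column. The weights are intended to mirror a centred 2x12 moving average, and the names w, ma_2x12 and the ma column are illustrative only.

```r
library(ggplot2)

# Rebuild the example data with base R only
df <- data.frame(
  taxon   = rep(c("Flower", "Seeds"), each = 36),
  density = c(seq(20, 228, length.out = 36), seq(33, 259, length.out = 36)),
  ymd     = rep(seq(as.Date("2000-01-01"), by = "month", length.out = 36), 2)
)

# 2x12 weights: half weight on the two end months, full weight on the 11 between
w <- c(0.5, rep(1, 11), 0.5) / 12
ma_2x12 <- function(x) as.numeric(stats::filter(log(x + 1), w, sides = 2))

# ave() applies the function within each taxon and keeps the row order
df$ma <- ave(df$density, df$taxon, FUN = ma_2x12)

p <- ggplot(df, aes(x = ymd, y = ma, colour = taxon)) +
  geom_line(na.rm = TRUE)  # the first and last 6 months of each series are NA
print(p)
```

Because the grouping is done with ave(), this scales to more taxa without writing one call per series.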

Plotting monthly time series in R should be simpler

R could be amazingly powerful and frustrating at the same time. This makes teaching R to non-statisticians (business students in my case) rather challenging. Let me illustrate this with a simple task.
Let's say you are working with a monthly time series dataset. Most business data are usually plotted as monthly time series. We would like to plot the data such that the x-axis depicts a combination of month and year. For instance, January 2017 could be depicted as 2017-01. It should be straightforward with the plot command. Not true.
Data Generation
Let's illustrate this with an example. I'll generate a random time series of monthly data for 120 observations representing 10 years of information starting in January 2007 and ending in December 2016. Here's the code.
set.seed(1234)
x <- rnorm(120)
d <- .07
y <- cumsum(x+d)*-1
Since we have not declared the data as time series, plotting it with the plot command would not return the intended labels for the x-axis. See the code and the chart below.
plot(y, type="l")
Now there should be an option in the plot or the plot.ts command to display the time series specific x-axis. I couldn't find one. So here's the workaround.
1. Declare the data set to be time series.
2. Use tsp and seq to generate the required x-axis labels.
3. Plot the chart but suppress the x-axis.
4. Use the axis command to add the custom x-axis labels.
5. Add an extra step to draw a vertical line at 2012.
Here's the code.
my.ts <- ts(y, start=c(2007, 1), frequency=12) # 120 monthly values: Jan 2007 to Dec 2016
tsp.attr <- tsp(my.ts) # start, end and frequency of the series
dates <- seq(as.Date("2007-01-01"), by = "month", along.with = my.ts)
plot(my.ts, xaxt = "n", main= "Plotting outcome over time",
ylab="outcome", xlab="time")
axis(1, at = seq(tsp.attr[1], tsp.attr[2], along.with = my.ts), labels = format(dates, "%Y-%m"))
abline(v=2012, col="blue", lty=2, lwd=2)
The result is charted below.
This is a workable solution for most data scientists. But if your audience comprises business students or professionals there are too many lines of code to write.
Question: Is it possible to plot a time series variable (object) using the plot command with the format option controlling how the x-axis will be displayed?
--
The ggplot2 package has the scale_x_date function for plotting time series with the desired scales, labels, breaks and limits (day, month, year formats).
All you need is a Date class object and the values y. For example:
dates = seq(as.Date("01-01-2007", format = "%d-%m-%Y"), length.out = 120, by = "month")
df <- data.frame(dates, y)
# use the format you need in your plot using scale_x_date
library(ggplot2)
ggplot(df, aes(dates, y)) + geom_line() + scale_x_date(date_labels = "%b-%Y") +
geom_vline(xintercept = as.Date("01-01-2012", format = "%d-%m-%Y"), linetype = 'dotted', color = 'blue')
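A base-graphics counterpart, sketched under the assumption that plotting against a Date vector (rather than a ts object) is acceptable: axis.Date() takes a format argument, which comes close to what the question asks for. The variable names here are illustrative.

```r
# Same simulated series as in the question, with a Date vector for the x-axis
set.seed(1234)
y <- cumsum(rnorm(120) + 0.07) * -1
dates <- seq(as.Date("2007-01-01"), by = "month", length.out = 120)

# Draw the line without an x-axis, then add one labelled as year-month
plot(dates, y, type = "l", xaxt = "n", xlab = "time", ylab = "outcome")
ticks <- seq(dates[1], dates[120], by = "year")  # one tick per January
axis.Date(1, at = ticks, format = "%Y-%m")

# Vertical marker at January 2012
abline(v = as.numeric(as.Date("2012-01-01")), col = "blue", lty = 2)
```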
I think the question boils down to wanting a pre-written function for the custom axis you have in mind. Note that plot(my.ts) does give a plot with ticks every month and labels every year, which to me looks better than the plot shown in the question. But if you want a custom axis, R is a programming language, so you can certainly write a simple function for that, and from then on it's just a matter of calling that function.
For example, to get you started, here is a function that accepts a frequency 12 ts object. It draws an X axis with ticks for each month, labelling the years and every every'th month, where the every argument can be any divisor of 12. The default is 3, so a label for every third month is shown (except Jan, which is shown as the year). len is the number of letters of the month shown and can be 1, 2 or 3: 1 means show Jul as J, 2 means Ju and 3 means Jul. The default is 1.
xaxis12 <- function(ser, every = 3, len = 1) {
tt <- time(ser)
axis(side = 1, at = tt, labels = FALSE)
is.every <- cycle(ser) %in% seq(1, 12, every)[-1]
month.labs <- substr(month.abb[cycle(ser)][is.every], 1, len)
axis(side = 1, at = tt[is.every], labels = month.labs,
cex.axis = 0.7, tcl = -0.75)
is.jan <- cycle(ser) == 1
year.labs <- sprintf("'%02d", as.integer(tt)[is.jan] %% 100)
axis(side = 1, at = tt[is.jan], labels = year.labs,
cex.axis = 0.7, tcl = -1)
}
# test
plot(my.ts, xaxt = "n")
xaxis12(my.ts)
Gabor is spot-on. It really just depends on what you want, and what you are willing to dig up or alter. Here is a simple alternative using a newer and less-well-known package that is excellent for plotting xts types:
## alternative
library(rtsplot) # load the plotting package
library(xts) # load the xts time-series container package
xx <- as.xts(my.ts) # create an xts object
rtsplot(xx, main= "Plotting outcome over time")
rtsplot.x.highlight(xx, which(index(xx)=="Jan 2012"), 1)
As you can see, the plotting then is two calls -- rtsplot has lots of nice defaults. Below is a screenshot as I am lazy, the plot window does of course not have a title bar...

Plot time series knowing only start time/date and sampling periods

I want to plot a density time series with following data:
density vector (4,2,5,8,4,6,4)
sampling period vector (unit: seconds) (2,2,2,2,3,2,2)
as you can see, the sampling period is not constant. I only know the starting date and time.
I somehow need to assign the start date and time to the first measurement and then compute the dates and times of the following measurements, but I don't know exactly how to code it.
Try first converting the desired vector to a ts, given an initial start time and the cumulative sum of the periods.
I assumed that you are sampling a continuous process (there are no gaps or dead times).
require (lubridate)
require (tidyr)
require (ggplot2)
require (ggfortify)
require (timetk)
density <- c (4,2,5,8,4,6,4)
seconds <- c (2,2,2,2,3,2,2)
starttime <- 0
time <- starttime + cumsum (seconds)
df <- as.data.frame (cbind (time, seconds, density))
df$time <- as_datetime(df$time)
df$ts <- tk_ts (df, select = density)
autoplot (df$ts, ts.geom = 'bar', fill = 'blue')
Plot the density against the cumulative sum of the seconds added to the start.
dens <- c(4,2,5,8,4,6,4)
secs <- c(2,2,2,2,3,2,2)
st <- as.POSIXct("2000-01-01 00:00:00")
plot(st + cumsum(secs), dens, xlab = "", type = "l")
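The line above places the first measurement one sampling period after the start. If the first measurement should sit exactly at the start time, as the question describes, the offsets can be shifted by one. This is a sketch assuming each sampling period is the gap that follows its measurement; the time zone and the name times are illustrative choices.

```r
dens <- c(4, 2, 5, 8, 4, 6, 4)
secs <- c(2, 2, 2, 2, 3, 2, 2)
st <- as.POSIXct("2000-01-01 00:00:00", tz = "UTC")

# Offset 0 for the first measurement, then the running sum of the earlier periods
times <- st + c(0, cumsum(head(secs, -1)))

plot(times, dens, xlab = "", type = "l")
```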

Set default zoom in plotly

I have a time series I'm plotting in R using ggplotly to auto-convert my ggplot2 graph to plotly. My time series goes back 20 years, but when it's brought up I only want it to display the most recent 4 years of data. I've used
layout(ggplotly_object, xaxis=list(range=c(min_date,max_date)))
This does not appear to even be working to limit the date ranges, which I'm setting using lubridate to subtract 4 years from the maximum date.
I have not found any documentation on changing the default zoom of a plotly plot to a limited range of data while still allowing the user to zoom out and pan to past data. Any tips would be appreciated
The date axis is measured in milliseconds, so you need to convert to this unit first. Here's an example:
library(plotly)
library(lubridate)
set.seed(42)
# Dummy data
t1 <- ymd_hms("2006-03-14 12:00:00")
t2 <- ymd_hms("2016-03-14 12:00:00")
df <- data.frame(t = seq(t1, t2, by = 'week'),
y = rexp(522, rate = 0.25))
# Full plot
p <- plot_ly(df, x = ~t, y = ~y, type = 'scatter', mode = 'markers')
p
# Now zoom. Needs to be the number of milliseconds since 01/01/1970.
# I'm deliberately using lubridate functions.
min_Date <- ymd_hms("2010-03-14 12:00:00")
min_Date_ms <- interval("1970-01-01 00:00:00", min_Date) / dmilliseconds(1)
max_Date <- ymd_hms("2012-03-14 12:00:00")
max_Date_ms <- interval("1970-01-01 00:00:00", max_Date) / dmilliseconds(1)
p %>% layout(xaxis = list(range = c(min_Date_ms, max_Date_ms)))
There's probably a more elegant way of doing this but it should work.
So for range, you should set it to a vector of length 2, i.e. c(min value, max value).
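The millisecond conversion can also be sketched in base R, since as.numeric() on a POSIXct value gives seconds since 1970-01-01. The helper to_ms and the variable names are hypothetical, and UTC is assumed throughout.

```r
# Hypothetical helper: POSIXct (or a date-time string) to milliseconds since the epoch
to_ms <- function(x) as.numeric(as.POSIXct(x, tz = "UTC")) * 1000

max_date <- as.POSIXct("2016-03-14 12:00:00", tz = "UTC")
# Step back four calendar years from the latest date
min_date <- seq(max_date, by = "-4 years", length.out = 2)[2]

range_ms <- c(to_ms(min_date), to_ms(max_date))
range_ms

# The pair would then be passed as the axis range, e.g.
# layout(p, xaxis = list(range = range_ms))
```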

starting a daily time series in R

I have a daily time series of the number of visitors to a web site. The series starts on 01/06/2014 and runs until today, 14/10/2015, and I wish to predict the number of visitors in the future. How can I read my series with R? I'm thinking:
series <- ts(visitors, frequency=365, start=c(2014, 6))
If so, and after running my time series model arimadata=auto.arima(), I want to predict the number of visitors for the next 60 days. How can I do this?
h=..?
forecast(arimadata,h=..),
What should the value of h be?
Thanks in advance for your help.
The ts specification is wrong; if you are setting this up as daily observations, then you need to specify what day of the year 2014 is June 1st and specify this in start:
## Create a daily Date object - helps my work on dates
inds <- seq(as.Date("2014-06-01"), as.Date("2015-10-14"), by = "day")
## Create a time series object
set.seed(25)
myts <- ts(rnorm(length(inds)), # random data
start = c(2014, as.numeric(format(inds[1], "%j"))),
frequency = 365)
Note that I specify start as c(2014, as.numeric(format(inds[1], "%j"))). All the complicated bit is doing is working out what day of the year June 1st is:
> as.numeric(format(inds[1], "%j"))
[1] 152
Once you have this, you're effectively there:
library(forecast) # provides auto.arima() and forecast()
## use auto.arima to choose ARIMA terms
fit <- auto.arima(myts)
## forecast for next 60 time points
fore <- forecast(fit, h = 60)
## plot it
plot(fore)
That seems suitable given the random data I supplied...
You'll need to select appropriate arguments for auto.arima() as suits your data.
Note that the x-axis labels refer to 0.5 (half) of a year.
Doing this via zoo
This might be easier to do via a zoo object created using the zoo package:
library(zoo)
## create the zoo object from the same data and the daily dates
set.seed(25)
myzoo <- zoo(rnorm(length(inds)), inds)
Note you now don't need to specify any start or frequency info; just use inds computed earlier from the daily Date object.
Proceed as before
## use auto.arima to choose ARIMA terms
fit <- auto.arima(myzoo)
## forecast for next 60 time points
fore <- forecast(fit, h = 60)
The plot though will cause an issue as the x-axis is in days since the epoch (1970-01-01), so we need to suppress the auto plotting of this axis and then draw our own. This is easy as we have inds
## plot it
plot(fore, xaxt = "n") # no x-axis
Axis(inds, side = 1)
This only produces a couple of labeled ticks; if you want more control, tell R where you want the ticks and labels:
## plot it
plot(fore, xaxt = "n") # no x-axis
Axis(inds, side = 1,
at = seq(inds[1], tail(inds, 1) + 60, by = "3 months"),
format = "%b %Y")
Here we plot every 3 months.
A ts object does not work well for daily time series. I suggest you use the zoo library.
library(zoo)
zoo(visitors, seq(from = as.Date("2014-06-01"), to = as.Date("2015-10-14"), by = 1))
Here's how I created a time series when I was given some daily observations with quite a few observations missing. @gavin-simpson's answer was a big help. Hopefully this saves someone some grief.
The original data looked something like this:
library(lubridate)
set.seed(42)
minday = as.Date("2001-01-01")
maxday = as.Date("2005-12-31")
dates <- seq(minday, maxday, "days")
dates <- dates[sample(1:length(dates),length(dates)/4)] # create some holes
df <- data.frame(date=sort(dates), val=sin(seq(from=0, to=2*pi, length=length(dates))))
To create a time-series with this data I created a 'dummy' dataframe with one row per date and merged that with the existing dataframe:
df <- merge(df, data.frame(date=seq(minday, maxday, "days")), all=T)
This dataframe can be cast into a timeseries. Missing dates are NA.
nts <- ts(df$val, frequency=365, start=c(year(minday), as.numeric(format(minday, "%j"))))
plot(nts)
series <- ts(visitors, frequency=365, start=c(2014, 152))
152 is the day of the year for 01-06-2014; the series starts at position 152 because frequency=365.
To forecast for 60 days, h=60.
forecast(arimadata , h=60)
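To see which calendar days h=60 covers, the dates can be worked out in base R. This is a small sketch assuming the series ends on 14/10/2015 with one observation per day; the variable names are illustrative.

```r
first_day <- as.Date("2014-06-01")
last_day  <- as.Date("2015-10-14")

# Day of the year of the first observation, used as the second element of start
doy <- as.numeric(format(first_day, "%j"))
doy

# With daily data, h counts days: the 60 dates following the last observation
h <- 60
forecast_dates <- seq(last_day + 1, by = "day", length.out = h)
range(forecast_dates)
```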

Resources