Can we plot multiple time series in one plot using hydroTSM?

I have daily precipitation data in the following format:
> head(df)
I_2004 G_2004 T_2004 Date
1 3628.79853 2199.310 12741.413 2004-01-01
2 1556.66704 4322.884 5464.395 2004-01-02
3 20.43379 5592.103 72.998 2004-01-03
4 265.94247 8145.041 942.344 2004-01-04
5 914.93958 9668.531 3227.579 2004-01-05
6 2585.63558 6825.905 9043.866 2004-01-06
Usually I plot the time series of all 3 variables together using ggplot2:
library(reshape2) # provides melt()
library(ggplot2)
dfmelt<-melt(df,id.vars="Date")
ggplot(dfmelt,aes(x=Date,y=value,
col=variable,group=12))+
labs(title='ANNUAL')+
geom_line()
I have used hydroTSM to plot time series, but never a multivariable one. Is there any way to achieve this using packages like hydroTSM?
My current method requires subsetting, and doing so for multiple years is time-consuming. I'm hoping to shorten this using hydroTSM or any other suitable package.
My aim is to plot monthly and seasonal time series.

We use a larger data frame below (see Note at end) so that it is possible to display monthly plots. Convert the data frame df to a zoo series (hydroTSM makes zoo available) and use autoplot.zoo. Use aggregate with tail or mean to create a monthly series and plot it, then convert that to ts to create the seasonal plot. Except for ggplot2, the following only uses packages already pulled in by hydroTSM.
library(ggplot2)
library(hydroTSM)
z <- read.zoo(df, index = "Date")
autoplot(z) # separate panels
autoplot(z, facets = NULL) # single panel
# monthly plot
zm <- aggregate(z, as.yearmon, tail, 1, frequency = 12)
autoplot(zm)
# for seasonal plot
tt <- as.ts(zm)
nc <- ncol(tt)
opar <- par(mfrow = c(nc, 1), mar = c(2, 4, 0, 4))
for(j in 1:nc) monthplot(tt[, j], ylab = colnames(tt)[j])
par(opar)
Note
df in reproducible form. Larger than in question so that monthly plots can be shown.
set.seed(123)
n <- 700
df <- data.frame(I_2004 = rnorm(n),
G_2004 = rnorm(n),
T_2004 = rnorm(n),
Date = as.Date("2004-01-01") + 1:n - 1)
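For the monthly and seasonal series the question asks about, the same aggregate call can take mean instead of tail, and as.yearqtr instead of as.yearmon. This is a minimal sketch, assuming calendar quarters are an acceptable stand-in for seasons; zm_mean and zq_mean are names introduced here for illustration.

```r
library(zoo)

# same reproducible data as in the Note
set.seed(123)
n <- 700
df <- data.frame(I_2004 = rnorm(n),
                 G_2004 = rnorm(n),
                 T_2004 = rnorm(n),
                 Date = as.Date("2004-01-01") + 1:n - 1)

z <- read.zoo(df, index.column = "Date")

# monthly means: one row per year/month
zm_mean <- aggregate(z, as.yearmon, mean)

# quarterly means: one row per year/quarter (assumed proxy for seasons)
zq_mean <- aggregate(z, as.yearqtr, mean)

dim(zm_mean)
dim(zq_mean)
```

Either result should be usable with autoplot in the same way as zm, for example with facets = NULL to get a single panel.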

Related

Moving average on several time series using ggplot

Hi I try desperately to plot several time series with a 12 months moving average.
Here is an example with two time series of flower and seeds densities. (I have much more time series to work on...)
library(lubridate) # provides ymd()
#datasets
taxon <- c(rep("Flower",36),rep("Seeds",36))
density <- c(seq(20, 228, length=36),seq(33, 259, length=36))
year <- rep(c(rep("2000",12),rep("2001",12),rep("2002",12)),2)
ymd <- c(rep(seq(ymd('2000-01-01'),ymd('2002-12-01'), by = 'months'),2))
#dataframe
df <- data.frame(taxon, density, year, ymd)
library(forecast)
#create function that does a Symmetric Weighted Moving Average (2x12) of the monthly log density of flowers and seeds
ma_12 <- function(x) {
ts_x <- ts(x, freq = 12, start = c(2000, 1), end = c(2002, 12)) # transform to time-series object as it is necessary to run the ma function
return(ma(log(ts_x + 1), order = 12, centre = T))
}
#trial of the function
ma_12(df[df$taxon=="Flower",]$density) #works well
library(ggplot2)
#Trying to plot flower and seeds log density as two time series
ggplot(df,aes(x=year,y=density,colour=factor(taxon),group=factor(taxon))) +
stat_summary(fun.y = ma_12, geom = "line") #or geom = "smooth"
#Warning message:
#Computation failed in `stat_summary()`:
#invalid time series parameters specified
The function ma_12 works correctly. The problem comes when I try to plot both time series (Flower and Seeds) using ggplot. I cannot define the two taxa as separate time series and apply a moving average to each. It seems to have something to do with stat_summary.
Any help would be more than welcome. Thanks in advance.
Note: The following link is quite useful but cannot directly help me, as I want to apply a specific function and plot the result according to the levels of a grouping variable. For now I can't find a solution. Anyway, thank you for suggesting it.
Multiple time series in one plot
Is this what you need?
f <- ma_12(df[df$taxon=="Flower", ]$density)
s <- ma_12(df[df$taxon=="Seeds", ]$density)
f <- cbind(f,time(f))
s <- cbind(s,time(s))
serie <- data.frame(rbind(f,s),
taxon=c(rep("Flower", dim(f)[1]), rep("Seeds", dim(s)[1])))
serie$density <- exp(serie$f) - 1 # back-transform: ma_12 smooths log(x + 1)
library(lubridate)
serie$time <- ymd(format(date_decimal(serie$time), "%Y-%m-%d"))
library(ggplot2)
ggplot() + geom_point(data=df, aes(x=ymd, y=density, color=taxon, group=taxon)) +
geom_line(data=serie, aes(x= time, y=density, color=taxon, group=taxon))
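The per-taxon smoothing can also be done in one step with base R, which avoids calling the moving average inside stat_summary. This is a sketch only: it assumes that the centred 2x12 moving average produced by ma_12 corresponds to the weights c(0.5, rep(1, 11), 0.5) / 12, and ma_2x12 is a helper name introduced here.

```r
# data from the question, built without lubridate
taxon <- rep(c("Flower", "Seeds"), each = 36)
density <- c(seq(20, 228, length.out = 36), seq(33, 259, length.out = 36))
date <- rep(seq(as.Date("2000-01-01"), as.Date("2002-12-01"), by = "month"), 2)
df <- data.frame(taxon, density, date)

# weights of a centred 2x12 moving average (assumed equivalent to ma_12)
w <- c(0.5, rep(1, 11), 0.5) / 12
ma_2x12 <- function(x) as.numeric(stats::filter(log(x + 1), w, sides = 2))

# apply the smoother within each taxon; the first and last 6 values
# of each series are NA because the window does not fit there
df$ma <- ave(df$density, df$taxon, FUN = ma_2x12)

# base graphics: one line per taxon
plot(df$date, df$ma, type = "n", xlab = "date", ylab = "MA of log(density + 1)")
for (tx in unique(df$taxon)) {
  sub <- df[df$taxon == tx, ]
  lines(sub$date, sub$ma, col = if (tx == "Flower") "red" else "blue")
}
```

Because the smoothed values live in an ordinary column, the same data frame can be handed to ggplot with geom_line and colour = taxon instead of the base plot.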

ts.plot() not plotting Time Series data against custom x-axis

I am having issues with trying to plot some Time Series data; namely, trying to plot the date (increments in months) against a real number (which represents price).
I can plot the data with just plot(months, mydata) with no issue, but it's in a scatter plot format.
However, when I try the same with ts.plot, i.e. ts.plot(months, mydata), I get the following error:
Error in .cbind.ts(list(...), .makeNamesTs(...), dframe = dframe, union = TRUE) : no time series supplied
I tried to bypass this by doing ts.plot(ts(months, mydata)), but with this I get a straight line (which I know isn't correct).
I have made sure that both months and mydata have the same length
EDIT: What I mean by custom x-axis
I need the data to be in monthly increments (specifically from 03/1998 to 02/2018) - so I ran the following in R:
d <- seq(as.Date("1998-03-01"), as.Date("2018-02-01"), "day")
months <- seq(min(d), max(d), "month")
Now that I have attained the monthly increments, I need the above variable, months, to act as the x-axis for the Time Series plot (perhaps more accurately, the time index).
With package zoo you can do the following.
library(zoo)
z <- zoo(mydata, order.by = months)
labs <- seq(min(index(z)), max(index(z)), length.out = 10)
plot(z, xaxt = "n")
axis(1, at = labs, labels = format(labs, "%m/%Y"))
Data creation code.
set.seed(1234)
d <- seq(as.Date("1998-03-01"), as.Date("2018-02-01"), "day")
months <- seq(min(d), max(d), "month")
n <- length(months)
mydata <- cumsum(rnorm(n))
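If a plain ts object is preferred over zoo, the monthly index can be encoded through start and frequency rather than passed as a separate vector. A minimal sketch using the same simulated data; x is a name introduced here.

```r
set.seed(1234)
months <- seq(as.Date("1998-03-01"), as.Date("2018-02-01"), by = "month")
mydata <- cumsum(rnorm(length(months)))

# monthly series starting in March 1998; the time index is implied
x <- ts(mydata, start = c(1998, 3), frequency = 12)

# ts.plot needs a ts object, not separate x and y vectors
ts.plot(x, gpars = list(ylab = "price"))

start(x)
end(x)
```

With this form the x-axis is labelled in years; the zoo approach above remains the one to use when custom month/year labels are needed.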

How to draw time series plot for data in date format in R

I have data where there are dates of visits of children.
date
16.08.13
16.08.13
16.08.13
17.08.13
27.08.13
03.09.13
04.09.13
05.09.13
07.09.13
07.09.13
I want to draw a time series plot in R that shows the dates and corresponding number of visits. For example, above there are 3 children on 16.08.2013.
In addition, my data cover 3 years. So, I would like to see the seasonal change over 3 years.
First let us create a longer data set called r. Use table to compute the frequencies, convert to a zoo time series and plot. Then compute the mean of each year/month and create a monthplot. Finally plot the means over all months vs month.
# test data
set.seed(123)
r <- as.Date("2000-01-01") + cumsum(rpois(1000, 1))
library(zoo)
opar <- par(mfrow = c(2,2)) # create a 2x2 grid of plots - optional
# plot freq vs. time
tab <- table(r)
z <- zoo(c(tab), as.Date(names(tab)))
plot(z) # this will be the upper left plot
# plot each month separately
zm <- aggregate(z, as.yearmon, mean)
monthplot(zm) # upper right plot
# plot month means
# zc <- aggregate(zm, cycle(zm), mean) # alternative but not equivalent
zc <- aggregate(z, cycle(as.yearmon(time(z))), mean)
plot(zc) # lower left plot
par(opar) # reset grid
Note: The sum of z for each year/month is zym and the average of those for all the January months, all the February months, ...., all December months is:
zym <- aggregate(z, as.yearmon(time(z)), sum)
aggregate(zym, cycle(as.yearmon(time(zym))), mean)
With the ggplot2 and scales packages you can try something like this (an excerpt from my own working code):
library(ggplot2)
library(lubridate)
library(scales)
g_sm_ddply <- ggplot(final_data, aes(x = as.Date(dates), y = scon_me, fill = tipo))
g_sm_ddply + geom_bar(position = "dodge", stat = "identity") +
labs(title = "SCONTRINO MEDIO ACQ_ISS_KPMG NUOVA CLUSTERIZZAZIONE", x = "data", y = "scontrino medio")+
scale_x_date(breaks = date_breaks("month"), labels = date_format("%Y/%m"))
I assume that you are already familiar with basic data manipulation in R.
One way to do what you want is to tabulate the date vector and create a proper time series object or a data.frame:
df <- as.data.frame(table(date)) ### tabulate
df$date <- as.Date(df$date, "%d.%m.%y") ### turn your date to Date class
df
## date Freq
## 1 2013-09-03 1
## 2 2013-09-04 1
## 3 2013-09-05 1
## 4 2013-09-07 2
## 5 2013-08-16 3
## 6 2013-08-17 1
## 7 2013-08-27 1
plot(Freq ~ date, data = df, pch = 19) ### plot
So far we are still missing the seasonal trend analysis the OP asked for. I think it is the more difficult part of the question.
If your data cover only 3 years, you may be able to observe the seasonal changes simply by looking at the monthly average of daily visits.
Depending on your needs, you can go with a simple monthly plot, or you might have to prepare your data further to compute the exact seasonal trend.
Below is a suggestion for computing and plotting the monthly average number of visits per day (over days with at least one visit):
library(ggplot2)
df<-read.table(text="
16.08.13
16.08.13
16.08.13
17.08.13
27.08.13
03.09.13
04.10.13
05.09.13
07.09.13
07.01.14
03.02.14
04.03.14
04.03.14
04.03.14
15.05.14
15.05.14
15.09.14
20.10.14
20.09.14 ", col.names="date")
df <- as.data.frame(table(df)) #get the frequency count (daily)
df$date <- as.Date(df$df, "%d.%m.%y") # turn your date variable to Date class
df$year<-sapply(df$df,function(x) strptime(x,"%d.%m.%Y")$year+1900) #extract year of the visit
df$month<-sapply(df$df,function(x) strptime(x,"%d.%m.%Y")$mon+1) #extract month of the visit
#plot daily frequency
ggplot(aes(x=date, y=Freq), data = df) +
geom_bar(stat = 'identity', position = 'dodge')+
ggtitle("Daily visits")
#compute monthly average visit per day (for days with at least one visit)
library(dplyr)
df2<-df[,c("year","month","Freq")]%>%
group_by(year,month) %>%
summarise_each(funs(mean=mean(., na.rm=TRUE)))
#recreate a date for the graph
df2$date<-as.Date(paste(rep("01",length(df2)),df2$month,df2$year),"%d %m %y")
ggplot(aes(x=date, y=Freq), data = df2) +
geom_bar(stat = 'identity', position = 'dodge')+
ggtitle("Average daily visits per month")
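The averages above only count days with at least one visit. A base R sketch of the alternative, using the ten dates from the question, tabulates against the full calendar so that days without visits count as zero; counts, full and monthly are names introduced here.

```r
visits <- c("16.08.13", "16.08.13", "16.08.13", "17.08.13", "27.08.13",
            "03.09.13", "04.09.13", "05.09.13", "07.09.13", "07.09.13")
d <- as.Date(visits, format = "%d.%m.%y")

# every calendar day between the first and last visit
full <- seq(min(d), max(d), by = "day")

# daily counts, including days with zero visits
counts <- table(factor(as.character(d), levels = as.character(full)))

# total visits per year-month
monthly <- table(format(d, "%Y-%m"))

barplot(monthly, main = "Visits per month")
```

Dividing the monthly totals by the number of calendar days in each month would then give an average that includes the zero-visit days.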

starting a daily time series in R

I have a daily time series of the number of visitors to a web site. My series runs from 01/06/2014 until today, 14/10/2015, and I wish to predict the number of visitors in the future. How can I read my series into R? I'm thinking:
series <- ts(visitors, frequency=365, start=c(2014, 6))
If that is right, then after running my time series model arimadata=auto.arima() I want to predict the number of visitors for the next 60 days. How can I do this?
h=..?
forecast(arimadata,h=..),
What should the value of h be?
Thanks in advance for your help.
The ts specification is wrong; if you are setting this up as daily observations, then you need to specify what day of the year 2014 is June 1st and specify this in start:
## Create a daily Date object - helps my work on dates
inds <- seq(as.Date("2014-06-01"), as.Date("2015-10-14"), by = "day")
## Create a time series object
set.seed(25)
myts <- ts(rnorm(length(inds)), # random data
start = c(2014, as.numeric(format(inds[1], "%j"))),
frequency = 365)
Note that I specify start as c(2014, as.numeric(format(inds[1], "%j"))). All the complicated bit is doing is working out what day of the year June 1st is:
> as.numeric(format(inds[1], "%j"))
[1] 152
Once you have this, you're effectively there:
## use auto.arima to choose ARIMA terms
fit <- auto.arima(myts)
## forecast for next 60 time points
fore <- forecast(fit, h = 60)
## plot it
plot(fore)
That seems suitable given the random data I supplied...
You'll need to select appropriate arguments for auto.arima() as suits your data.
Note that the x-axis labels refer to 0.5 (half) of a year.
Doing this via zoo
This might be easier to do via a zoo object created using the zoo package:
library(zoo)
## create the zoo object as before
set.seed(25)
myzoo <- zoo(rnorm(length(inds)), inds)
Note you now don't need to specify any start or frequency info; just use inds computed earlier from the daily Date object.
Proceed as before
## use auto.arima to choose ARIMA terms
fit <- auto.arima(myzoo)
## forecast for next 60 time points
fore <- forecast(fit, h = 60)
The plot though will cause an issue as the x-axis is in days since the epoch (1970-01-01), so we need to suppress the auto plotting of this axis and then draw our own. This is easy as we have inds
## plot it
plot(fore, xaxt = "n") # no x-axis
Axis(inds, side = 1)
This only produces a couple of labeled ticks; if you want more control, tell R where you want the ticks and labels:
## plot it
plot(fore, xaxt = "n") # no x-axis
Axis(inds, side = 1,
at = seq(inds[1], tail(inds, 1) + 60, by = "3 months"),
format = "%b %Y")
Here we plot every 3 months.
The ts class does not work well for daily time series. I suggest you use the zoo package instead.
library(zoo)
zoo(visitors, seq(from = as.Date("2014-06-01"), to = as.Date("2015-10-14"), by = 1))
Here's how I created a time series when I was given daily observations with quite a few missing. @gavin-simpson was a big help. Hopefully this saves someone some grief.
The original data looked something like this:
library(lubridate)
set.seed(42)
minday = as.Date("2001-01-01")
maxday = as.Date("2005-12-31")
dates <- seq(minday, maxday, "days")
dates <- dates[sample(1:length(dates),length(dates)/4)] # create some holes
df <- data.frame(date=sort(dates), val=sin(seq(from=0, to=2*pi, length=length(dates))))
To create a time-series with this data I created a 'dummy' dataframe with one row per date and merged that with the existing dataframe:
df <- merge(df, data.frame(date=seq(minday, maxday, "days")), all=T)
This dataframe can be cast into a timeseries. Missing dates are NA.
nts <- ts(df$val, frequency=365, start=c(year(minday), as.numeric(format(minday, "%j"))))
plot(nts)
series <- ts(visitors, frequency=365, start=c(2014, 152))
152 is the day of the year of 01-06-2014; because frequency=365, the series must start at position 152 of 2014.
To forecast for 60 days, h=60.
forecast(arimadata , h=60)
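The numbers used above can be checked with base R alone. This sketch derives the day of the year for the start date, the number of daily observations, and the calendar dates covered by h = 60; all variable names are introduced here.

```r
first_day <- as.Date("2014-06-01")
last_day  <- as.Date("2015-10-14")

# day of the year of the first observation, used in start = c(2014, doy)
doy <- as.numeric(format(first_day, "%j"))

# number of daily observations in the series
n_obs <- length(seq(first_day, last_day, by = "day"))

# calendar dates that a 60-step forecast would cover
horizon <- seq(last_day + 1, by = "day", length.out = 60)

doy
n_obs
range(horizon)
```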

Selecting and plotting months in ggplot2

I have a time series dataset with two columns: date (e.g. Jan 1980, Feb 1980 ... Dec 2013) and its corresponding temperature. The dataset runs from 1980 to 2013. I am trying to subset it and plot a time series in ggplot for each month separately (e.g. I want only the February values so that I can plot them using ggplot). I tried the following, but Feb1 is empty:
Feb1 <- subset(temp, date ==5)
The structure of my dataset is:
'data.frame': 408 obs. of 2 variables:
$ date :Class 'yearmon' num [1:359] 1980 1980 1980 1980 1980 ...
$ temp: int 16.9 12.7 13 6 6.0 5 6 10.9 0.9 16 ...
What about this?:
library(zoo)
# Generating some data:
df <- data.frame(date = as.yearmon("1980-01") + 0:407/12, val = rnorm(408))
# Subsetting to get a specific month:
df.sub <- subset(df, format(df$date,"%b")=="Jan")
# The actual plot:
library(ggplot2)
ggplot(df.sub) + geom_line(aes(x = as.Date(date), y = val))
I believe a column of class 'yearmon' prints in the format "Mon YYYY" (e.g. "Feb 1980"), so the first three characters give the month. I'm a little confused by how you are subsetting the data with 'date==5'. Below I try one method.
temp$month<-substr(temp$date,1,3)
Feb1<-subset(temp,month=='Feb')
#more elegant
Feb1<-subset(temp,substr(temp$date,1,3)=='Feb')
You can also directly plot the subset in ggplot2 without creating a new data frame.
Based on RStudent's solution:
library(zoo)
# Generating some data:
df <- data.frame(date = as.yearmon("1980-01") + 0:407/12, val = rnorm(408))
library(ggplot2)
ggplot(df[format(df$date,"%b")=="Jan", ], aes(x = as.Date(date), y = val))+
geom_line()
Convert the data to zoo, use cycle to split into months and autoplot.zoo to plot. Below we show four different ways to plot. First we plot just January. Then we plot all the months with each month in a separate panel and then we plot all months with each month as a separate series all in the same panel. Finally we use monthplot (not ggplot2) to plot them all in a single panel in a different manner.
library(zoo)
library(ggplot2)
# test data
set.seed(123)
temp <- data.frame(date = as.yearmon(1980 + 0:479/12), value = rnorm(480))
z <- read.zoo(temp, FUN = identity) # convert to zoo
# split into 12 series and cbind them together so zz480 is 480 x 12
# Then aggregate to zz which is 40 x 12
zz480 <- do.call(cbind, split(z, cycle(z)))
zz <- aggregate(zz480, as.numeric(trunc(time(zz480))), na.omit)
### now we plot this 4 different ways
#####################################
# 1. plot just January
autoplot(zz[, 1]) + ggtitle("Jan")
# 2. plot each in separate panel
autoplot(zz)
# 3. plot them all in a single panel
autoplot(zz, facets = NULL)
# 4. plot them all in a single panel in a different way (not using ggplot2)
monthplot(z)
Note that an alternative way to calculate zz would be:
zz <- zoo(matrix(coredata(z), 40, 12, byrow=TRUE), unique(as.numeric(trunc(time(z)))))
Update: Added plot types and improved the approach.
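As to why Feb1 came out empty in the question: a yearmon value is stored as year plus a fraction of a year, so it never equals a bare month number such as 5. A minimal sketch on simulated data (not the asker's) showing two ways to test the month instead; feb and feb2 are names introduced here.

```r
library(zoo)

set.seed(123)
temp <- data.frame(date = as.yearmon(1980 + 0:407/12), temp = rnorm(408))

# underlying numeric values: year + (month - 1)/12
as.numeric(temp$date[1:3])

# select February via the formatted month number (locale independent)
feb <- temp[format(temp$date, "%m") == "02", ]

# equivalent selection via the position in the yearly cycle
feb2 <- temp[cycle(temp$date) == 2, ]

nrow(feb)
```

Matching on "%m" rather than "%b" avoids depending on the month abbreviations of the current locale.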
