Weekly time series plot in R - r

I am trying to create a plot of weekly data. Though this is not the exact problem I am having it illustrates it well. Basically imagine you want to make a plot of 1,2,....,7 for for 7 weeks from Jan 1 2015. So basically my plot should just be a line that trends upward but instead I get 7 different lines. I tried the code (and some other to no avail). Help would be greatly appreciated.
startDate = "2015-01-01"
endDate = "2015-02-19"
y=c(1,2,3,4,5,6,7)
tsy=ts(y,start=as.Date(startDate),end=as.Date(endDate))
plot(tsy)

You are plotting both the time and y together as individual plots.
Instead use:
plot(y)
lines(y)
Also, create a date column based on the specifics you gave which will be a time series. From here you can add the date on the x-axis to easily see how your variable changes over time.

To make your life easier I think your first step should be to create a (xts) time series object (install/load the xts-package), then it is a piece of cake to plot, subset or do whatever you like with the series.
Build your vector of dates as a sequence with start/end date:
seq( as.Date("2011-07-01"), by=1, len=7)
and your data vector: 1:7
a one-liner builds and plots the above time series object:
plot(as.xts(1:7,order.by=seq( as.Date("2011-07-01"), by=1, len=7)))

Related

Converting dataset into Time Series

mydatasetI have this data set which consists of two attributes i.e Year(2016,2017,2018) and Month(JAN TO DEC). The data set contains the average sales value for all the months for the years 2016, 2017 & 2018. Now when I import this data set, it shows that the data set is a "data.frame" . However I want it to be in "ts" . Then I ran this command
data.ts<- as.ts(myData)
to convert my data into "ts". The result is as follows:
class(data.ts)
[1] "mts" "ts" "matrix"
Now, I want my data set to be in "ts" only, meaning when I run the command class(data.ts). It should show "ts" only. How can I convert my data in "ts" only? And does this "mts" and "matrix" matters or not?
Also, when I plot my data using the command
plot(data.ts)
It shows a plot in which Time is on x-axis while Year and Sales are on y-axis. On the other hand, I want to plot a graph which shows the Year in x axis and Sales values of Months on y-axis.
How do I arrange my data such that when I import the dataset, it is already in ts? Or is there any other way to do it? Also, how to arrange the dataset that it shows the Year on x axis by default. I'm really confused as all the videos that I have seen on YouTube has their data already in "ts". Also, their plot shows Year on x-axis. Hope I have made myself clear. Any help would be appreciated.
How can I plot the graph such that Year is on x axis?
Reorder the data in a single variable:
data=as.matrix(data)
data= as.data.frame(t(data))
names(data)=c('x2016','x2017','x2018')
series=c(data$x2016,data$x2017,data$x2018)
Then take just index accordingly to the start point and the frequency of data. In your case looks like monthly from 2016 hence:
data.ts=ts(series ,start=c(2016,1), frequency=12)
plot(data.ts)

Correct imputation for a zooreg object?

My objective is to impute NAs in a zooreg time series object. The pattern of the time series is cyclic. My code is:
#load libraries required
library("zoo")
# create sequence every 15 minutes from 1st Dec to 20th Dec, 2018
timeStamp <- seq.POSIXt(from=as.POSIXct('2018-01-01 00:00:00', tz="UTC"), to=as.POSIXct('2018-01-20 23:45:00', tz="UTC"), by = "15 min")
# data which increases from 12am to 12pm, then decreases till 12 am of next day, for 20 days
readings <- rep(c(seq(1,48,1), seq(48,1,-1)), 20)
dF <- data.frame(timeStamp=timeStamp, readings=readings)
# create a regular zooreg object, frequency is 1 day( 4 readings * 24 hours)
readingsZooReg <- zooreg(dF$readings, order.by = dF$timeStamp, frequency = 4*24)
plot(readingsZooReg)
# force some data to be NAs
window(readingsZooReg, start = as.POSIXct("2018-01-14 00:00:00", tz="UTC"), end = as.POSIXct("2018-01-16 23:45:00", tz="UTC")) <- NA
plot(readingsZooReg)
# plot imputed values
plot(na.approx(readingsZooReg))
The plots are:
Full time series, NAs added, Imputed time series
I'm purposely using zoo here, since the time series I work on are irregular(eg. solar, oil wells, etc)
1) Is my usage of "zooreg" correct? Or would a "zoo" object suffice ?
2) Is my frequency variable right?
3) Why won't na.approx work? I've also tried na.StructTs, the R script hangs.
4) Is there a solution using any other package? xts, ts, etc?
Your current example time-series is a regular time-series.
(a irregular time series would have time-steps with different time distances between observations)
E.g.:
10:00:10, 10:00:20, 10:00:30, 10:00:40, 10:00:50 (regular spaced)
10:00:10, 10:00:17, 10:00:33, 10:00:37, 10:00:50 (irregular spaced)
If you really need to handle irregular spaced time-series, zoo is your go to package. Otherwise you can also use other time series classes as xts and ts.
About the frequency:
You set the frequency of a time-series usually according to a value where you expect patterns to repeat. (in your example this could be 96). In real live this is often 1 day, 1 week, 1 month,....but it can be also different from these like 1,5 days. (e.g. if you have daily returning patterns and 1 minute observations you would set the frequency to 1440).
na.approx of zoo workes perfectly. It is exactly doing what it is expected to. A interpolation between the points 0 before the gap and 0 at the end of the gap will give a straight line at 0. Of course that is probably not the result you expected, because it does not account for seasonality. That is why G. Grothendieck suggests you na.StructTS as a method to choose. (this method is usually better in accounting for seasonality)
The best choice if you are not bound to zoo would in this specific case be using na_seadec from the imputeTS package ( a package solely dedicated to time series imputation).
I have added you a example also with nice plots from the imputeTS package
library(imputeTS)
yourTS <- ts(coredata(readingsZooReg), frequency = 96)
ggplot_na_distribution(yourTS)
imputedTS <- na_seadec(yourTS)
ggplot_na_imputations(yourTS, imputedTS)
Usually imputeTS also works perfectly with zoo time-series as input. I only changed it to ts again, because something with your zoo object seems odd...that is also why na.StructTS from zoo itself breaks. Maybe somebody with better knowledge can help out here.
Beware, if you really should have irregular time series do not use other packages / imputation functions than from zoo. Because they all assume the data to be regular spaced and will give results accordingly.

How to produce a scatter plot of dates vs magnitudes in R?

This is what i have done so far but its wrong.
earthquakes<- c(6.6,6.8,8.4)
dates <- (13/02/2001 ,28/02/2001,23/06/2001)
plot(earthquakes,dates)
I have only started learning R. Please help.
earthquakes<- c(6.6,6.8,8.4)
dates <- as.Date(c("13/02/2001", "28/02/2001", "23/06/2001"), format="%d/%m/%Y")
plot(dates, earthquakes)
You had a few issues:
Dates should be in quotes (otherwise R will think you're trying to do arithmetic (i.e. 13 divided by 02 divied by 2001)
To convert dates to actual date objects, use as.Date, pass a vector of dates (this is the c(... part), and then specify the format that they are in so that R knows what to do with the strings
you had x and y swapped
Note, the as.Date step is not strictly necessary, but if you don't do that, then the x axis of the plot will plot every item equidistant, irrespective of how far apart the dates actually are in time.

Plotting truncated times from zoo time series

Let's say I have a data frame with lots of values under these headers:
df <- data.frame(c("Tid", "Value"))
#Tid.format = %Y-%m-%d %H:%M
Then I turn that data frame over to zoo, because I want to handle it as a time series:
library("zoo")
df <- zoo(df$Value, df$Tid)
Now I want to produce a smooth scatter plot over which time of day each measurement was taken (i.e. discard date information and only keep time) which supposedly should be done something like this: https://stat.ethz.ch/pipermail/r-help/2009-March/191302.html
But it seems the time() function doesn't produce any time at all; instead it just produces a number sequence. Whatever I do from that link, I can't get a scatter plot of values over an average day. The data.frame code that actually does work (without using zoo time series) looks like this (i.e. extracting the hour from the time and converting it to numeric):
smoothScatter(data.frame(as.numeric(format(df$Tid,"%H")),df$Value)
Another thing I want to do is produce a density plot of how many measurements I have per hour. I have plotted on hours using a regular data.frame with no problems, so the data I have is fine. But when I try to do it using zoo then I either get errors or I get the wrong results when trying what I have found through Google.
I did manage to get something plotted through this line:
plot(density(as.numeric(trunc(time(df),"01:00:00"))))
But it is not correct. It seems again that it is just producing a sequence from 1 to 217, where I wanted it to be truncating any date information and just keep the time rounded off to hours.
I am able to plot this:
plot(density(df))
Which produces a density plot of the Values. But I want a density plot over how many values were recorded per hour of the day.
So, if someone could please help me sort this out, that would be great. In short, what I want to do is:
1) smoothScatter(x-axis: time of day (0-24), y-axis: value)
2) plot(density(x-axis: time of day (0-24)))
EDIT:
library("zoo")
df <- data.frame(Tid=strptime(c("2011-01-14 12:00:00","2011-01-31 07:00:00","2011-02-05 09:36:00","2011-02-27 10:19:00"),"%Y-%m-%d %H:%M"),Values=c(50,52,51,52))
df <- zoo(df$Values,df$Tid)
summary(df)
df.hr <- aggregate(df, trunc(df, "hours"), mean)
summary(df.hr)
png("temp.png")
plot(df.hr)
dev.off()
This code is some actual values that I have. I would have expected the plot of "df.hr" to be an hourly average, but instead I get some weird new index that is not time at all...
There are three problems with the aggregate statement in the question:
We wish to truncate the times not df.
trunc.POSIXt unfortunately returns a POSIXlt result so it needs to be converted back to POSIXct
It seems you did not intend to truncate to the hour in the first place but wanted to extract the hours.
To address the first two points the aggregate statement needs to be changed to:
tt <- as.POSIXct(trunc(time(df), "hours"))
aggregate(df, tt, mean)
but to address the last point it needs to be changed entirely to
tt <- as.POSIXlt(time(df))$hour
aggregate(df, tt, mean)

Line up ts or zoo timeseries of different frequencies at "midperiod" on X axis

I need to plot a number of time series of different frequencies in R, and I need them to have the points centered on a period instead of starting at the beginning of each period. Here is an illustration of what I'm running into:
test1 <- ts(rnorm(24), start=2004, freq=12)
test2 <- ts(rnorm(2), start=2004, freq=1)
plot(test1, type='l')
lines(test2, col='red')
I'd like the red line to essentially be shifted forward 6 months, to the middle oaf each year. I've spent a little time with the R documentation for "ts" and haven't figured out how to do this -- any suggestions?
How about changing the time-series start?
test2 <- ts(rnorm(2), start=2004.5, freq=1)
I agree with #haggai_e that shifting the 'start' parameter makes sense, but if you already have a ts-object then the code to use those values would be:
lines(ts(test2, start=2004.5, freq=frequency(test2)) )
ts-objects are really just numeric vectors with attributes. You recover those attributes with start, end and frequency. The end is actually calculated on the fly from(length/frequency -1 ) of the vector added to start.

Resources