Time series with missing weekend values and keeping dates in the plot - r

I have 1241 daily observations from 2012-11-19 to 2017-10-16, but only for weekdays (the number of services in a cafeteria). I'm trying to make a prediction, but I have trouble initializing my time series:
timeseries = ts(passage, frequency = 365,
start = c(2012, as.numeric(format(as.Date("2012-11-19"), "%j"))),
end = c(2017, as.numeric(format(as.Date("2017-10-16"), "%j"))) )
If I do it like that, because of the missing weekends my variable loops back after reaching 1241, all the way to 1791 (which corresponds to the number of days between my two dates). And if I want to make a training time series, choosing a date with the "end" parameter will not correspond to the actual date's data.
So how can I overcome this problem? I know that I can create my time series directly like this (but am I choosing the right frequency? If I put 5 or 7, the axis goes into very distant years):
timeseries = ts(passage, frequency = 365)
but then I lose the ability to choose a start and end date, and I can't see that information in a plot.
Edit: The reason I want to keep it as weekly data with 5 days is so that, when I plot the forecast, I don't get lots of zeros in the plot:
plot(forecast(timeseries_00))
Plot: forecast of timeseries_00, with many zero values

If I understand your problem correctly, this could be a solution:
Step 1) I create a time series (passage) with length 1241 like yours.
passage <- 1:1241  # dummy data with the same length as yours
"passage" time series
Step 2) I convert the time series into a matrix where each column is a working day (adding 4 zeros because the time series ends on a Monday). After that I add two columns of zeros to the matrix (Saturday and Sunday), turn it back into a vector with the function unmatrix (package gdata), and delete the last 6 zeros (4 added by me and 2 coming from the Saturday and Sunday columns).
library(gdata)
# one row per week: 5 working days, then Saturday and Sunday set to 0
passage_matrix <- cbind(t(matrix(c(passage, c(0, 0, 0, 0)), nrow = 5)), 0, 0)
# read the matrix row by row (week by week) back into a vector
passage_00 <- as.numeric(unmatrix(passage_matrix, byrow = TRUE))
# drop the 6 trailing zeros (4 padding values + Saturday + Sunday)
passage_00 <- passage_00[1:(length(passage_00) - 6)]
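If gdata is not available, the same padding can be sketched in base R. This is a minimal sketch assuming, as in the steps above, that the series starts on a Monday and contains only working days; `n_pad` is a name introduced here for illustration. `matrix(..., byrow = TRUE)` gives one row per week directly, and `as.vector(t(...))` reads it back week by week, which is what `unmatrix(..., byrow = TRUE)` does above.

```r
passage <- 1:1241  # dummy data, as in Step 1

# Number of zeros needed to complete the last 5-day week (4 for 1241 values)
n_pad <- (5 - length(passage) %% 5) %% 5

# One row per week: Monday..Friday, then Saturday and Sunday set to 0
weeks <- cbind(matrix(c(passage, rep(0, n_pad)), ncol = 5, byrow = TRUE), 0, 0)

# Read the matrix row by row (week by week) back into a vector
passage_00 <- as.vector(t(weeks))

# Drop the trailing padding zeros and the last weekend
passage_00 <- passage_00[seq_len(length(passage_00) - n_pad - 2)]

# Calendar dates matching passage_00 (the series starts on Monday 2012-11-19)
date <- seq(from = as.Date("2012-11-19"), by = 1, length.out = length(passage_00))
```

Computing `n_pad` instead of hard-coding 4 zeros keeps the padding correct if the series ends on a different weekday.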
Step 3) I create my new time series
timeseries_00 = ts(passage_00,
frequency = 365,
start = c(2012, as.numeric(format(as.Date("2012-11-19"),
"%j"))))
Step 4) Now I'm able to plot the time series with correct date labels (only for working days in my example below)
date<-seq(from=as.Date("2012-11-19"),by=1,length.out=length(timeseries_00))
plot(timeseries_00[timeseries_00>0],axes=F)
axis(1, at=1:length(timeseries_00[timeseries_00>0]), labels=date[timeseries_00>0])
"passage" time series with right date
Step 5) Forecast the time series
library(forecast)
for_00 <- forecast(timeseries_00)
Step 6) I have to modify my original time series so that the forecast data and the original data have the same length
length(for_00$mean) # length of the prediction (730 here: the default horizon is 2 * frequency)
passage_00extended <- c(passage_00, rep(0, length(for_00$mean))) # add zeros for future dates
timeseries_00extended = ts(passage_00extended, frequency = 365,
start = c(2012, as.numeric(format(as.Date("2012-11-19"), "%j"))))
date <- seq(from = as.Date("2012-11-19"), by = 1, length.out = length(timeseries_00extended))
Step 7) I have to modify the predicted data so that it has the same length as timeseries_00extended; all fake data (0 values) are changed to NA
pred_mean <- c(rep(NA, length(passage_00)), for_00$mean) # prediction mean
pred_upper <- c(rep(NA, length(passage_00)), for_00$upper[, 2]) # upper 95%
pred_lower <- c(rep(NA, length(passage_00)), for_00$lower[, 2]) # lower 95%
passage_00extended[passage_00extended == 0] <- NA
Step 8) I plot the original data (passage_00extended) and the predictions on the same plot (with different colours for the mean [blue] and the upper and lower bounds [orange])
plot(passage_00extended,axes=F,ylim=c(1,max(pred_upper[!is.na(pred_upper)])))
lines(pred_mean,col="Blue")
lines(pred_upper,col="orange")
lines(pred_lower,col="orange")
axis(1, at=1:length(timeseries_00extended), labels=date)
Plot: Forecast
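Labelling every one of the 1241 working-day points makes the x-axis unreadable. A possible refinement of the Step 4 plot, sketched under the assumption that `passage_00` and `date` are built as in Steps 2 to 4 (recreated here in base R so the block runs on its own), is to draw only a handful of evenly spaced date ticks and to keep the y-axis by using `xaxt = "n"` instead of `axes = F`.

```r
# Recreate the zero-padded series and its calendar dates from Steps 1-4
passage <- 1:1241
passage_matrix <- cbind(t(matrix(c(passage, 0, 0, 0, 0), nrow = 5)), 0, 0)
passage_00 <- as.vector(t(passage_matrix))
passage_00 <- passage_00[1:(length(passage_00) - 6)]
date <- seq(from = as.Date("2012-11-19"), by = 1, length.out = length(passage_00))

# Keep working days only (the padded weekend values are 0)
working <- passage_00 > 0
y <- passage_00[working]
d <- date[working]

# Suppress only the x-axis, then add about 8 date ticks at sequential positions
plot(y, type = "l", xaxt = "n", xlab = "", ylab = "passage")
at <- round(seq(1, length(y), length.out = 8))
axis(1, at = at, labels = format(d[at], "%Y-%m"))
```

The x positions are still consecutive working days, so weekends take no space; only the tick labels come from the calendar dates.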

Related

Interpretation of a graph created by the R package seas

I am relatively new to RStudio and R in general, and I am not even sure this is the right place to ask this question. I was instructed to draw a graph showing seasonality using daily rainfall over a number of years. I need help interpreting the graph more than plotting it.
There is an example in R using mscdata that I was able to replicate with my own data; the code for the example is below. Any help with what this graph means or shows will be greatly appreciated. Thank you.
install.packages("seas")
library(seas)
data(mscdata)
dat <- mksub(mscdata, id=1108447)
dat.ss <- seas.sum(dat, width="mon")
x<-mscdata
# Structure in R
str(dat.ss)
tail(mscdata)
# Annual data
dat.ss$ann
# Demonstrate how to slice through a cubic array
dat.ss$seas["1990",,]
dat.ss$seas[,2,] # or "Feb", if using English locale
dat.ss$seas[,,"precip"]
# Simple calculation on an array
(monthly.mean <- apply(dat.ss$seas[,,"precip"], 2, mean,na.rm=TRUE))
barplot(monthly.mean, ylab="Mean monthly total (mm/month)",
main="Un-normalized mean precipitation in Vancouver, BC")
text(6.5, 150, paste("Un-normalized rates given 'per month' should be",
"avoided since ~3-9% error is introduced",
"to the analysis between months", sep="\n"))
# Normalized precip
norm.monthly <- dat.ss$seas[,,"precip"] / dat.ss$days
norm.monthly.mean <- apply(norm.monthly, 2, mean,na.rm=TRUE)
print(round(norm.monthly, 2))
print(round(norm.monthly.mean, 2))
barplot(norm.monthly.mean,
ylab="Normalized mean monthly total (mm/day)",
main="Normalized mean precipitation in Vancouver, BC")
# Better graphics of data
dat.ss <- seas.sum(dat, width=11)
image(dat.ss)
This code gives a graph showing sample quartiles and annual rainfall, but I don't really know what it means. Any help whatsoever will be appreciated.
The graph produced with the package seas is below:
Plot
I'll start with the top-left graph:
You've probably guessed that each row is a year (as shown by the y-axis), while the day groups/months of the year are on the x-axis. The color of each box in the heatmap is proportionally darker according to the millimeters of rain in that day group, with the scale displayed on the far right. I assume the red X's mark missing values.
The top-right panel is like a barplot of the total rainfall for each year (row), just plotted continuously. The red bar should be the overall average precipitation (I'm not sure about the orange one).
The bottom-left panel is a bit trickier. Think of it as if you reordered the rows in each column so that the heaviest rainfall of the day group is at the top (ignoring the year information here). The y-axis shows the quantiles. The rainfall value at a given quantile changes for each day group, so the lines you see on top of the plot mark key rainfall values in mm (4, 6, 8, 10, 12). For example, if you look at the 2 mm line (the lowest one), you'll see that in January about 20% of rainfall values (across all years) are below this threshold, while at the end of July over 80% are below 2 mm (as expected, there is less rainfall in summer).
Lastly, the bottom-right panel is similar to the one above it. It's the sum of all rows, this time with respect to the quantiles rather than the years, which produces the staircase pattern.
You'll notice that since the scale of this plot is the same as the one showing the average per year, the top of the staircase is outside the plot.
Hope I made that clear enough.
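To make the quantile reading of the bottom-left panel concrete, here is a small base R sketch on synthetic daily rainfall (not mscdata). For each month it computes the share of days below 2 mm, which is what the height of the 2 mm line encodes under the reading above, plus the per-month quantiles. It groups by calendar month rather than the 11-day groups of `seas.sum(dat, width=11)`, and seas may treat dry days differently, so treat it only as an illustration of the idea.

```r
set.seed(1)

# Synthetic daily rainfall (mm) for 10 years: wetter winters, drier summers
days  <- seq(as.Date("1990-01-01"), as.Date("1999-12-31"), by = "day")
month <- as.integer(format(days, "%m"))
mean_rain <- 6 + 4 * cos(2 * pi * (month - 1) / 12)  # highest in January, lowest in July
rain <- rexp(length(days), rate = 1 / mean_rain)

# Share of days in each month (across all years) with less than 2 mm
below_2mm <- tapply(rain < 2, month, mean)

# Per-month quantiles: one column per month, one row per quantile level
q <- sapply(split(rain, month), quantile, probs = c(0.1, 0.25, 0.5, 0.75, 0.9))

round(below_2mm, 2)
```

A low value of `below_2mm` (January here) means the 2 mm line sits low on that part of the plot; a high value (July) pushes it up, matching the description above.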

R-Project: How to limit axes in SPI plot? ylim & xlim don't work

Dear Stack Overflow community,
I'm quite new to R and this is my first Stack Overflow post, so please bear with me if the question isn't perfectly phrased.
I'm calculating the standardized precipitation index (SPI) with the package "SPEI" for a time series from a climate station with 20 years of monthly precipitation data. I have done this for time scales of 1 and 12 months like this:
spi1 <- spi(SPI_Anu_input_ts[,'PRCP_Anu'], 1)
spi12 <- spi(SPI_Anu_input_ts[,'PRCP_Anu'], 12)
The output of spi() is not a matrix or a data frame; it's a list. Inside this list, under the entry fitted, you find a time series with the calculated index values.
To plot these index values you don't have to supply x and y as usual:
plot(x, y, ...)
You can just pass the complete list:
par(mfcol=c(2,1))
plot(spi1, 'Anuradhapura, SPI-1')
plot(spi12, 'Anuradhapura, SPI-12')
Then it looks like this:
Plot SPI1 & SPI12
Part of the SPI calculation is that the time scale sets the first month with an index value. The precipitation data start in January 1990, so the SPI-1 indices start in January, but the SPI-12 indices start in December (the first 11 months are NA).
As you can see in the graphic, both the x and y axes are shifted. Neither
xlim=as.Date(c("1990-01-01","2017-09-01"))
nor any axis limit like
ylim=c(-2.5,2.5)
works to give both graphics the same value range.
Does anyone know how to solve this?
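One possible workaround, sketched under the assumption stated in the question that `spi1$fitted` and `spi12$fitted` are monthly `ts` objects: plot those series directly, because the base `plot()` method for `ts` accepts `xlim` and `ylim` (the SPEI plot method may not pass them through). A `ts` time axis is in decimal years, so `xlim` has to be numeric, e.g. `c(1990, 2018)`, not `Date` values. The block uses simulated series (`fitted1`, `fitted12`, hypothetical names) so it runs without SPEI; with real data you would use `spi1$fitted` and `spi12$fitted` instead. This draws plain lines rather than the coloured bars of the SPEI plot.

```r
set.seed(1)
n <- 333  # Jan 1990 to Sep 2017, monthly

# Simulated stand-ins for spi1$fitted and spi12$fitted
fitted1  <- ts(rnorm(n), start = c(1990, 1), frequency = 12)
fitted12 <- ts(c(rep(NA, 11), rnorm(n - 11)), start = c(1990, 1), frequency = 12)

common_xlim <- c(1990, 2018)   # decimal years, not Date objects
common_ylim <- c(-2.5, 2.5)

op <- par(mfcol = c(2, 1))
plot(fitted1, xlim = common_xlim, ylim = common_ylim,
     main = "Anuradhapura, SPI-1", ylab = "SPI")
abline(h = 0, lty = 2)
plot(fitted12, xlim = common_xlim, ylim = common_ylim,
     main = "Anuradhapura, SPI-12", ylab = "SPI")
abline(h = 0, lty = 2)
par(op)
```

Because both panels share `common_xlim` and `common_ylim`, the leading NA months of SPI-12 simply leave a gap instead of shifting the axis.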

Automate calculation of Area under curve of moving window

I am trying to find the area under a solar radiation intensity curve at a series of consecutive time points.
Basically I want the integral of the past 24 hours of solar radiation, computed every hour over a 7-day period, i.e. a moving sum over the past 24 hours (I suspect that soil temperature results from the past 24 hours of solar radiation).
Here is the code I am using. It works, but I would like to automate it so I can easily change the integration window (e.g. 12, 18, 24, 36 hours) and obtain a printed/saved table of hourly integrated solar radiation values that I can plot against my hourly temperature data to see if there is a relationship.
Here: Rg - solar radiation in 10-min measurements
num - entry number in the data frame
AUC_xxx - total solar radiation over the past 24 hours
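library(zoo)  # for rollmean()
# 10-min Rg values from 2015-09-13 14:10 to 2015-09-14 14:00 (24 h)
y <- as.numeric(xx$Rg[xx$num["2015-09-13 14:10"]:xx$num["2015-09-14 14:00"]])
x <- c(1:length(y))
id <- order(x)
# trapezoidal rule: step width times the mean of neighbouring values
AUC_s14_14 <- sum(diff(x[id]) * rollmean(y[id], 2))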
I tried rollapply, but I'm stuck again ("INTEGRAL" stands for the function I don't know how to supply):
rollapply(xx$Rg[xx$num["2015-09-13 00:10"]:xx$num["2015-09-14 00:00"]], width = 144, by = 6, FUN = "INTEGRAL", na.rm = TRUE, align = "left")
Thanks for the help !
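A base R sketch of the automation, under these assumptions: Rg is sampled every 10 minutes with POSIXct timestamps, the area uses the same trapezoidal rule and sample-index step (dx = 1) as the code above, and a value is reported at each full clock hour (hh:00) whose preceding window is complete. `trapz_equal` and `hourly_moving_integral` are hypothetical helper names and the data are synthetic; with real data you would pass the timestamp column and `xx$Rg`. A window that contains NA returns NA (there is no `na.rm`).

```r
# Trapezoidal area under equally spaced samples, with a step of dx per sample.
# With dx = 1 this equals sum(diff(x) * rollmean(y, 2)) for x = 1:length(y).
trapz_equal <- function(y, dx = 1) {
  n <- length(y)
  if (n < 2) return(NA_real_)
  dx * sum((y[-1] + y[-n]) / 2)
}

# Hypothetical helper: area over the past `hours` hours, evaluated at every
# full clock hour (hh:00) that has a complete window behind it.
hourly_moving_integral <- function(time, rg, hours, step_min = 10) {
  width <- hours * 60 / step_min
  ends <- which(format(time, "%M") == "00" & seq_along(time) >= width)
  auc <- vapply(ends, function(p) trapz_equal(rg[(p - width + 1):p]), numeric(1))
  data.frame(time = time[ends], auc = auc)
}

# Synthetic stand-in for 7 days of 10-min solar radiation data
times <- seq(as.POSIXct("2015-09-10 00:10", tz = "UTC"), by = "10 min", length.out = 7 * 144)
rg <- pmax(0, sin(2 * pi * seq_along(times) / 144)) * 800

# One column per integration window, merged on the hourly timestamps
windows <- c(12, 18, 24, 36)
tables <- lapply(windows, function(h) {
  out <- hourly_moving_integral(times, rg, h)
  names(out)[2] <- paste0("AUC_", h, "h")
  out
})
auc_table <- Reduce(function(a, b) merge(a, b, by = "time", all = TRUE), tables)
head(auc_table)
# write.csv(auc_table, "auc_table.csv", row.names = FALSE)
```

Each window ending at hh:00 starts at hh:10 of the previous period, like the 2015-09-13 14:10 to 2015-09-14 14:00 example. Changing `windows` is all that is needed to try other integration times; early hours get NA for the longer windows.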

How to plot multiple series of a y variable against a single x variable and fit quantile trend lines

I am dealing with some micro-climate data, where temperature was measured at 3-hour intervals over the course of 11 days.
I am hoping to generate a scatter plot of this data in RStudio, but am struggling as I am quite new to the program.
I would like to plot the time of day along the x-axis and temperature along the y-axis, with each day's temperature measurements as a separate series ("Temp day 1", "Temp day 2", "Temp day 3" ... "Temp day 11") on the plot, and each series shown with uniquely coloured/symbolized data points.
My data column headings look like this in Excel, and I have imported the data into RStudio in this layout:
Time of Day | Temperature day 1 | Temperature day 2 | Temperature day 3 ... Temperature day 11
Once I have plotted this scatter plot in RStudio, is it then possible to fit trend lines to the data points at the 90th, 50th and 10th quantiles? If so, would I be able to get the slope values of these trend lines for comparison?
Any help with the appropriate codes to run, in order to perform these tasks would be greatly appreciated.
Many Thanks
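A sketch in base R, assuming the spreadsheet is read in as a data frame with a time-of-day column (in hours) followed by 11 temperature columns; the data and the column names `time_of_day` and `temp_day_1` ... `temp_day_11` are made up here. `matplot()` draws one series per column with its own colour and symbol. For the 10th, 50th and 90th quantile lines, the usual tool is `rq()` from the quantreg package, fitted on the data stacked into long format; the block only runs that part if quantreg is installed. The slopes are the second row of `coef(fit)`, one column per `tau`.

```r
set.seed(1)

# Made-up stand-in for the imported sheet: 8 readings per day (every 3 h), 11 days
hours <- seq(0, 21, by = 3)
wide <- data.frame(time_of_day = hours)
for (d in 1:11) {
  wide[[paste0("temp_day_", d)]] <- 15 + 8 * sin(2 * pi * (hours - 9) / 24) +
    rnorm(length(hours), sd = 1.5)
}

# Scatter plot: one series per day, each with its own colour and symbol
cols <- rainbow(11)
matplot(wide$time_of_day, as.matrix(wide[, -1]), type = "p", pch = 1:11, col = cols,
        xlab = "Time of day (h)", ylab = "Temperature")
legend("topleft", legend = paste("Day", 1:11), pch = 1:11, col = cols, cex = 0.6, ncol = 2)

# Stack the day columns into long format: one row per (time, day) reading
long <- data.frame(
  time_of_day = rep(wide$time_of_day, times = 11),
  temp        = unlist(wide[, -1], use.names = FALSE),
  day         = rep(1:11, each = nrow(wide))
)

# Quantile regression lines at tau = 0.1, 0.5, 0.9 (only if quantreg is installed)
if (requireNamespace("quantreg", quietly = TRUE)) {
  fit <- quantreg::rq(temp ~ time_of_day, tau = c(0.1, 0.5, 0.9), data = long)
  cf <- coef(fit)  # 2 x 3 matrix: intercepts and slopes, one column per tau
  for (j in seq_len(ncol(cf))) abline(a = cf[1, j], b = cf[2, j], lty = j, lwd = 2)
  print(cf[2, ])   # slopes for comparison
}
```

The quantile lines are fitted to all 11 days pooled together, which is what "trend lines at the 90th, 50th and 10th quantiles" of the whole scatter plot amounts to.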

Expand a Time Series to a specific number of periods

I'm new to R and I am attempting to take a set of time series and run them through a Conditional Inference Tree to help classify the shape of the time series. The problem is that not all of the time series have the same number of periods. I am trying to expand each time series to be 30 periods long while maintaining the same "shape". This is as far as I have got:
library(zoo)
test<-c(606,518,519,541,624,728,560,512,777,728,1014,1100,930,798,648,589,680,635,607,544,566)
accordion<-function(A,N){
x<-ts(scale(A), start=c(1,1), frequency=1)
X1 <- zoo(x,seq(from = 1, to = N, by =(N-1)/(length(x)-1) ))
X2<-merge(X1, zoo(order.by=seq(start(X1), end(X1)-1, by=((N-1)/length(x))/(N/length(x)))))
X3<-na.approx(X2)
return(X3)}
expand.test<-accordion(test,30)
plot(expand.test); lines(scale(test))
length(expand.test)
The above code scales the time series, spaces it out evenly over 30 periods and interpolates the missing values. However, the returned series is 42 units long, not 30, although it retains the same "shape" as the original time series. Does anyone know how to modify this so that the results produced by the function accordion are 30 periods long and the shape of the time series remains relatively unchanged?
I think there's a base R solution here. Check out approx(), which does linear (or constant) interpolation with as many points n as you specify. Here I think you want n = 30.
test2 <- approx(test, n=30)
plot(test2)
points(test, pch="*")
This returns a list test2 whose second element y holds your interpolated values. I haven't used your time series object, but it seems that was entirely internal to your function, correct?
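Wrapping the approx() idea into a replacement for accordion() might look like this. It is a sketch assuming you want the z-scored values that scale() produced in the original function; accordion2 is a hypothetical name. approx() places the n output points evenly between the first and last index, so the endpoints are kept exactly and the result has exactly N values.

```r
test <- c(606,518,519,541,624,728,560,512,777,728,1014,1100,930,798,648,589,680,635,607,544,566)

# Hypothetical replacement for accordion(): z-score the series (like scale()),
# then linearly interpolate it onto N evenly spaced points.
accordion2 <- function(A, N) {
  z <- as.numeric(scale(A))
  out <- approx(x = seq_along(z), y = z, n = N)
  ts(out$y, start = 1, frequency = 1)
}

expand.test <- accordion2(test, 30)
length(expand.test)  # 30

# Compare shapes: expanded series drawn on the original index scale
plot(seq(1, length(test), length.out = 30), expand.test, type = "l",
     xlab = "original index", ylab = "scaled value")
points(seq_along(test), as.numeric(scale(test)), pch = "*")
```

Since every series comes back with the same length N, the expanded series can be stacked as equal-length rows for the classification step.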
