X axis in DateTime format in R script/plot - r

I am trying to build a forecast plot in R but, in spite of trying many solutions, I am unable to plot my X axis as dates.
My data is in the form of:
Datetime(MM/DD/YYYY) ConsumedSpace
01-01-2015 2488
02-01-2015 7484
03-01-2015 4747
Below is the forecast script I am using:
library(forecast)
library(calibrate)
# group searches by date
dataset <- aggregate(ConsumedSpace ~ Date, data = dataset, FUN= sum)
# create a time series based on day of week
ts <- ts(dataset$ConsumedSpace, frequency=6)
# pull out the seasonal, trend, and irregular components from the time series (train the forecast model)
decom <- stl(ts, s.window = "periodic")
#predict the next 7 days of searches
Pred <- forecast(decom)
# plot the forecast model
plot(Pred)
#text(Pred,ts ,labels = dataset$ConsumedSpace)
The output looks like this. As you can see, the X axis is displayed as periods (numbers) rather than in date format.
Any help is highly appreciated.

Try specifying the x values explicitly in your plot: plot(x=Date, ...)
If that does not work, try:
timeline<-seq(from=your.first.date, to=your.last.date, by="week")
plot(x=...,y=..., xlab=NA, xaxt="n") # no x axis
axis.Date(1, at=timeline, format="%Y-%m-%d", labels=TRUE) # date axis drawn separately
Edit :
Sorry for my first solution, which does not fit your time series. The problem is that a ts object holds no dates, only an index defined by "start" and "frequency". Here, your problem comes from your use of "frequency", which is supposed to specify the number of observations per unit of time, i.e. 4 for quarterly data, 12 for monthly data, and so on. Here your unit of time is the week, with 6 open days, which is why your graph's axis shows the index of the weeks. To get a more readable axis you can try this:
dmin<-as.Date("2015-01-01") # Starting date
# Dummy data
ConsumedSpace=rep(c(5488, 7484, 4747, 4900, 4747, 6548, 6548, 7400, 6300, 8484, 5161, 6161),2)
ts<-ts(ConsumedSpace, frequency=6)
decom <- stl(ts, s.window = "periodic")
Pred <- forecast(decom)
plot(Pred, xlab=NA, xaxt="n") # Plot with no axis
ticks<-seq(from=dmin, to= dmin+(length(time(Pred))-1)*7, by = 7) # Tick sequence, i.e. one date label per week
axis(1, at=time(Pred), labels=ticks) # axis with the weekly date labels at the week indices
You have to use an interval of 7 days for the weekly labels because of the closed day.
It's ugly but it works. There is surely a better way: look closely at your ts() call to specify that the data are daily, and adapt your forecasting function accordingly.
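As a rough sketch of that "better way", and assuming you have one observation per calendar day, you can keep a Date vector alongside the values and draw the axis with axis.Date. The data below are dummy values, and the forecast is a seasonal-naive stand-in (repeat the last week) rather than the output of forecast(). The names dates, consumed, pred and pred_dates are made up for illustration.

```r
# Dummy daily data (assumption): 4 weeks, one value per calendar day
dates <- seq(as.Date("2015-01-01"), by = "day", length.out = 28)
consumed <- rep(c(5488, 7484, 4747, 4900, 4747, 6548, 6548), 4)

# Stand-in forecast: repeat the last observed week (seasonal naive),
# used here only so the example needs no extra packages
h <- 7
pred <- tail(consumed, 7)
pred_dates <- seq(max(dates) + 1, by = "day", length.out = h)

# Empty plot over the full date range, without the default x axis
plot(c(dates, pred_dates), c(consumed, pred), type = "n",
     xaxt = "n", xlab = "", ylab = "ConsumedSpace")
lines(dates, consumed)                          # observed values
lines(pred_dates, pred, col = "blue", lty = 2)  # forecast values

# One tick per week, labelled as real dates
ticks <- seq(min(dates), max(pred_dates), by = "week")
axis.Date(1, at = ticks, format = "%d-%m-%Y")
```

Because the x values are real dates, no conversion from the ts index is needed. The trade-off is that you carry the dates yourself instead of relying on plot() for forecast objects.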

Related

Trying to change time labels in R

I'm posting this because I've been having a little problem with my code. What I want to do is forecast COVID cases in a province for the next 30 days using auto.arima. Everything works, but when I plot the forecast model, the date labels appear as fractional years (i.e. 2020.2, 2020.4, etc.), whereas I want to label that axis in YMD format. This is my code:
library(readxl)
library(ggplot2)
library(forecast)
data <- read_xlsx("C:/Users/XXXX/Documents/Casos ARIMA Ejemplo.xlsx")
provincia_1 <- ts(data$Provincia_1, frequency = 365, start = c(2020,64))
autoarima_provincia1 <- auto.arima(provincia_1)
forecast_provincia1 <- forecast(autoarima_provincia1, h = 30)
plot(forecast_provincia1, main = "Proyeccion Provincia 1", xlab = "Meses", ylab = "Casos Diarios")
When I plot the forecast, this is what appears (with the date-label problem described above):
The database is here:
https://github.com/pgonzalezp/Casos-Covid-provincias
Try to create a data.frame with your predictions in one column and the daily dates in the other, then plot it.
Enter your start and end dates as shown below; for the "by" argument, please check the documentation at this link:
https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/as.Date
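df <- data.frame(
  # placeholder dates: use your first and last forecast days (30 days for h = 30)
  date = seq(as.Date("1999-01-01"), as.Date("1999-01-30"), by = "day"),
  # point forecasts only; a forecast object itself cannot be a data.frame column
  pred_val = as.numeric(forecast_provincia1$mean)
)
with(df, plot(date, pred_val, type = "l"))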
I got inspired from here:
R X-axis Date Labels using plot()
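Here is a minimal sketch of the same idea using only base R, assuming the series starts on day 64 of 2020 as in the question's ts() call. The data are simulated, stats::arima with a fixed order stands in for auto.arima, and all variable names are hypothetical.

```r
set.seed(42)

# Simulated daily series (assumption), starting on day 64 of 2020
n <- 120
cases <- ts(100 + cumsum(rnorm(n)), frequency = 365, start = c(2020, 64))

# Fixed-order ARIMA as a stand-in for auto.arima()
fit <- stats::arima(cases, order = c(0, 1, 1))
h <- 30
pred <- predict(fit, n.ahead = h)$pred

# Rebuild calendar dates: day 64 of 2020 is 1 January plus 63 days
first_date <- as.Date("2020-01-01") + 63
obs_dates <- first_date + seq_len(n) - 1
pred_dates <- max(obs_dates) + seq_len(h)

# Plot against the dates and label the axis in year-month-day format
plot(c(obs_dates, pred_dates), c(as.numeric(cases), as.numeric(pred)),
     type = "n", xaxt = "n", xlab = "Date", ylab = "Daily cases")
lines(obs_dates, as.numeric(cases))
lines(pred_dates, as.numeric(pred), col = "blue")
axis.Date(1, at = seq(first_date, max(pred_dates), by = "month"),
          format = "%Y-%m-%d")
```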

How do I plot multiple lines on the same graph?

I am using R. I am trying to use the "lines" command in ggplot2 to show the predicted values vs. the actual values for a statistical model (ARIMA, time series). Yet, when I run the code, I can only see a line of one color.
I simulated some data in R and then tried to make plots that show actual vs predicted:
#set seed
set.seed(123)
#load libraries
library(xts)
library(stats)
#create data
date_decision_made = seq(as.Date("2014/1/1"), as.Date("2016/1/1"),by="day")
date_decision_made <- format(as.Date(date_decision_made), "%Y/%m/%d")
property_damages_in_dollars <- rnorm(731,100,10)
final_data <- data.frame(date_decision_made, property_damages_in_dollars)
#aggregate
y.mon<-aggregate(property_damages_in_dollars~format(as.Date(date_decision_made),
format="%W-%y"),data=final_data, FUN=sum)
y.mon$week = y.mon$`format(as.Date(date_decision_made), format = "%W-%y")`
ts = ts(y.mon$property_damages_in_dollars, start = c(2014,1), frequency = 12)
#statistical model
fit = arima(ts, order = c(4, 1, 1))
Here were my attempts at plotting the graphs:
#first attempt at plotting (no second line?)
plot(fit$residuals, col="red")
lines(fitted(fit),col="blue")
#second attempt at plotting (no second line?)
par(mfrow = c(2,1),
oma = c(0,0,0,0),
mar = c(2,4,1,1))
plot(ts, main="as-is") # plot original sim
lines(fitted(fit), col = "red") # plot fitted values
legend("topleft", legend = c("original","fitted"), col = c("black","red"),lty = 1)
#third attempt (plot actual, predicted and 5 future values - here, the actual and future values show up, but not the predicted)
pred = predict(fit, n.ahead = 5)
ts.plot(ts, pred$pred, lty = c(1,3), col=c(5,2))
However, none of these seem to be working correctly. Could someone please tell me what I am doing wrong? (note: the computer I am using for my work does not have an internet connection or a usb port - it only has R with some preloaded packages. I do not have access to the forecast package.)
Thanks
Sources:
In R plot arima fitted model with the original series
R fitted ARIMA off by one timestep? pkg:Forecast
Plotting predicted values in ARIMA time series in R
You seem to be confusing a couple of things:
fitted usually does not work on an object of class arima. Usually, you can load the forecast package first and then use fitted.
But since you do not have access to the forecast package, you cannot use fitted(fit): it always returns NULL. I had problems with fitted before.
You want to compare the actual series (x) to the fitted series (y), yet in your first attempt you work with the residuals (e = x - y).
You say you are using ggplot2, but actually you are not.
So here is a small example on how to plot the actual series and the fitted series without ggplot.
set.seed(1)
x <- cumsum(rnorm(10))
y <- stats::arima(x, order = c(1, 0, 0))
plot(x, col = "red", type = "l")
lines(x - y$residuals, col = "blue")
I hope this answer helps you get back on track.
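If you also want the future values on the same graph (your third attempt), something along these lines might work with base R only. It is a sketch on simulated monthly data, with an ARIMA order picked arbitrarily for illustration. The fitted series is rebuilt as the actual series minus the residuals, as above.

```r
set.seed(123)

# Simulated monthly series (assumption), 5 years starting January 2014
y <- ts(100 + cumsum(rnorm(60)), start = c(2014, 1), frequency = 12)

# Order chosen arbitrarily for this illustration
fit <- stats::arima(y, order = c(1, 1, 0))

# Fitted values without the forecast package: actual minus residuals
fitted_vals <- y - residuals(fit)

# Five future values from the fitted model
pred <- predict(fit, n.ahead = 5)$pred

# Actual, fitted and predicted series on one plot
ts.plot(y, fitted_vals, pred,
        gpars = list(col = c("black", "red", "blue"), lty = c(1, 1, 3)))
legend("topleft", legend = c("actual", "fitted", "predicted"),
       col = c("black", "red", "blue"), lty = c(1, 1, 3))
```

ts.plot accepts series covering different time spans as long as they share the same frequency, which is why the five predicted values appear after the end of the actual series.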

Plotting Basic Time Series Data in R - Not Plotting Correctly

I'm trying to plot some time series data. My plot looks like the following:
I'm uncertain as to why it displays the date as such. I'm using R Markdown in R studio. Below is my code:
agemployment<-read.csv("Employment-Level1.csv", header=TRUE)
Tried to change the class of Date:
as.Date(as.character(agemployment$Date),format="%m%d%Y")
That did nothing. Rest of code here:
attach(agemployment)
View(agemployment)
head(agemployment)
agemployment<-ts(agemployment,frequency=12,start=c(2008, 1))
plot(agemployment, col="black", main="Agriculture Employment Level",
ylab="Total Employment Level (Thousands)", ylim=c(0, 250),lwd=2,
xaxs="i", yaxs="i", lty=1)
This produces the above plot. I'm uncertain what I'm doing wrong. I would appreciate any help. Thank you!
EDIT:
Data here:
I suspect your issues are somehow driven by attach; generally, attaching data frames is not good practice. The following super-simple code worked for me:
# small dataset from your example, I use package readr to load it as data frame
df = readr::read_csv("DATE,Employment
1/1/2008,1245
2/1/2008,1280
3/1/2008,1343
4/1/2008,1251
5/1/2008,1236
6/1/2008,1265")
ts <- ts(data = df$Employment, frequency = 12, start = c(2008, 1))
plot(ts)
Using the file generated reproducibly in the Note at the end, read the file into a zoo object, making the index of class "yearmon" (representing year and month without day). Then plot it.
library(zoo)
z <- read.csv.zoo("Employment-Level1.csv", format = "%m/%d/%Y", FUN = as.yearmon)
plot(z)
or
library(ggplot2)
autoplot(z) + scale_x_yearmon()
If you wanted to convert z to a ts object or data frame:
tt <- as.ts(z)
DF <- fortify.zoo(z)
Note
Lines <- "DATE,Employment
1/1/2008,1245
2/1/2008,1280
3/1/2008,1343
4/1/2008,1251
5/1/2008,1236
6/1/2008,1265"
cat(Lines, file = "Employment-Level1.csv") # write out file
Note that providing an image in the question means everyone who answers must retype your data, so in the future please provide the input data in a reproducible form, as we have done here.
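If you would rather stay in base R, here is a small sketch using the same sample rows as the Note. It assumes the dates are month/day/year with slashes, as in that sample, so the format string is "%m/%d/%Y". The "%m%d%Y" in the question has no separators, and the result of as.Date was never assigned back to the column. Only the numeric column is passed to ts().

```r
# Same sample rows as in the Note, read from a string instead of a file
Lines <- "DATE,Employment
1/1/2008,1245
2/1/2008,1280
3/1/2008,1343
4/1/2008,1251
5/1/2008,1236
6/1/2008,1265"
df <- read.csv(text = Lines, stringsAsFactors = FALSE)

# Parse the dates and assign the result back to the column
df$DATE <- as.Date(df$DATE, format = "%m/%d/%Y")

# Option 1: monthly ts built from the numeric column only
emp <- ts(df$Employment, frequency = 12, start = c(2008, 1))
plot(emp, ylab = "Employment")

# Option 2: plot against the Date column and label the axis by month
plot(df$DATE, df$Employment, type = "l", xaxt = "n",
     xlab = "", ylab = "Employment")
axis.Date(1, at = df$DATE, format = "%b %Y")
```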

plot same data two different ways, get different results (lattice xyplot)

I am trying to produce a scatter plot of some data. I do so in two different ways, as shown in code below (most of the code is just arranging data, the only graphing part is at the bottom). One uses a direct reference to the variables in the workspace, and the other arranges the data into an xts object first and then uses column indices to refer to them.
The resulting scatter plots are different, even though I have checked that the source data is the same in both ways.
I am wondering why these plots are different, thanks in advance.
# Get data
# =============
library('quantmod')
# Set monthly time interval
StartPeriod = paste0("1980-01")
EndPeriod = paste0("2014-07")
DateString = paste0(StartPeriod,"/", EndPeriod)
# CPI (monthly)
getSymbols("CPIAUCSL", src="FRED")
# QoQ growth, Annualized
CPIAUCSL = ((CPIAUCSL/lag(CPIAUCSL))^4-1)*100
CPIAUCSL = CPIAUCSL[DateString]
# Oil prices (monthly)
getSymbols(c("MCOILWTICO"), src="FRED")
# QoQ growth, annualized
MCOILWTICO = ((MCOILWTICO/lag(MCOILWTICO))^4-1)*100
MCOILWTICO = MCOILWTICO[DateString]
# Produce plots
# ===============
library('lattice')
# Method 1, direct reference
xyplot(CPIAUCSL~lag(MCOILWTICO,1), ylim=c(-5,6),
ylab="CPI",
xlab="Oil Price, 1 month lag",
main="Method 1: Inflation vs. Lagged Oil Price",
grid=TRUE)
# Method 2, refer to column indices of xts object
basket = merge(CPIAUCSL, MCOILWTICO)
xyplot(basket[ ,1] ~ lag(basket[ ,2],1), ylim=c(-5, 6),
ylab="CPI",
xlab="Oil Price, 1 month lag",
main="Method 2: Inflation vs. Lagged Oil Price",
grid=TRUE)
# Double check data fed into plots is the same
View(merge(CPIAUCSL, lag(MCOILWTICO,1)))
View(merge(basket[ ,1], lag(basket[ ,2],1))) # yes, matches
Method 1 is definitely incorrect as it will pair points 6 years apart! For instance, CPIAUCSL[3] is the data for 1980-03-01, while lag(MCOILWTICO,1)[3] corresponds to 1986-03-01 - however, on the scatterplot they will be paired! In contrast, basket[ ,1][3] and basket[ ,2][3] both belong to 1980-03-01.
(Your double check didn't show the problem, because there you used merge - as opposed to Method 1! - which solves the problem.)
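To see the effect in isolation, here is a toy sketch with plain data frames instead of xts objects. The numbers and the names cpi, oil and both are invented, and it assumes, as described above, that the two series start in different years. Pairing by position matches rows that are years apart, while merging on the date keeps each value with its own date.

```r
# Two toy monthly series starting six years apart (invented values)
cpi <- data.frame(date = seq(as.Date("1980-01-01"), by = "month", length.out = 6),
                  cpi = 1:6)
oil <- data.frame(date = seq(as.Date("1986-01-01"), by = "month", length.out = 6),
                  oil = 11:16)

# Pairing by position: the third elements belong to different dates
cpi$date[3]
oil$date[3]

# Merging on the date column aligns by date instead of by position
both <- merge(cpi, oil, by = "date", all = TRUE)

# No date has both values here, so an aligned scatter plot would be empty
sum(complete.cases(both))
```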

Plot Timeseries ts object in R

How do I plot a ts object, with month on the x axis and monthly returns on the y axis, for each year in the same graph? Please find the code that I am using below.
stock<-"^GSPC"
getSymbols(stock,from = "2000-01-01",to = Sys.Date())
GSPC_pr<-monthlyReturn(GSPC)
GSPC_pr<-ts(GSPC_pr,frequency=12, start=c(2000,1))
Presumably you're using the quantmod package. You can plot it like this:
plot(as.xts(GSPC_pr), major.format = "%Y-%m")
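If what you want is one line per year with the months on the x axis, a base R sketch along these lines may help. It uses simulated returns instead of downloaded ^GSPC data (an assumption so the example runs offline), and it relies on the series covering whole years so that it can be reshaped into a 12-row matrix.

```r
set.seed(1)

# Simulated monthly returns for 3 full years (stand-in for monthlyReturn output)
ret <- ts(rnorm(36, mean = 0, sd = 0.04), frequency = 12, start = c(2000, 1))

# Reshape: one row per month, one column per year
m <- matrix(ret, nrow = 12)
colnames(m) <- 2000:2002

# One line per year, months on the x axis
matplot(1:12, m, type = "l", lty = 1, col = 1:3,
        xaxt = "n", xlab = "Month", ylab = "Monthly return")
axis(1, at = 1:12, labels = month.abb)
legend("topright", legend = colnames(m), col = 1:3, lty = 1)
```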

Resources