Add changepoint lines to plot in prophet (R) - r

I am using prophet in R (https://facebook.github.io/prophet/), and I would like to overlay changepoints on top of the forecasting plot prophet makes. Here's my code (df is a dataframe containing dates (column ds) and values (column y):
m <- prophet(df)
future <- make_future_dataframe(m, periods=5)
forecast <- predict(m, future)
plot(m, forecast, xlab="Day", ylab="Counts")
i = 0
while (i <= length(m$changepoints.t)) {
tmp <- m$changepoints[i]
abline(v=as.POSIXct(tmp), col='red')
i = i + 1
}
However, the changepoint lines never show up. I double-checked that the changepoints are in range of the main plot, and I tried both as.Date and as.POSIXct in abline. No errors in either case, but no changepoint lines either. Could someone please help?

If you check the class of the plot object:
class(plot(m,forecast,xlab="Day", ylab="Counts"))
... you'll see that it's a ggplot graphic. Additionally, if you access m$changepoints instead of m$changepoints.t, you'll find the changepoints of your model already in POSIXct format.
To plot all of your changepoints as vertical lines, you can use geom_vline like so:
p <- plot(m,forecast,xlab="Day", ylab="Counts")
for (changepoint in m$changepoints) {
p <- p + geom_vline(xintercept = changepoint)
}
print(p)

The prophet package has a dedicated function for this, add_changepoints_to_plot.
In your case, this would do the trick:
plot(m, forecast, xlab="Day", ylab="Counts") +
add_changepoints_to_plot(m)
See ?add_changepoints_to_plot for customisations.

Related

How do I plot multiple lines on the same graph?

I am using the R. I am trying to use the "lines' command in ggplot2 to show the predicted values vs. the actual values for a statistical model (arima, time series). Yet, when I ran the code, I can only see a line of one color.
I simulated some data in R and then tried to make plots that show actual vs predicted:
#set seed
set.seed(123)
#load libraries
library(xts)
library(stats)
#create data
date_decision_made = seq(as.Date("2014/1/1"), as.Date("2016/1/1"),by="day")
date_decision_made <- format(as.Date(date_decision_made), "%Y/%m/%d")
property_damages_in_dollars <- rnorm(731,100,10)
final_data <- data.frame(date_decision_made, property_damages_in_dollars)
#aggregate
y.mon<-aggregate(property_damages_in_dollars~format(as.Date(date_decision_made),
format="%W-%y"),data=final_data, FUN=sum)
y.mon$week = y.mon$`format(as.Date(date_decision_made), format = "%W-%y")`
ts = ts(y.mon$property_damages_in_dollars, start = c(2014,1), frequency = 12)
#statistical model
fit = arima(ts, order = c(4, 1, 1))
Here were my attempts at plotting the graphs:
#first attempt at plotting (no second line?)
plot(fit$residuals, col="red")
lines(fitted(fit),col="blue")
#second attempt at plotting (no second line?)
par(mfrow = c(2,1),
oma = c(0,0,0,0),
mar = c(2,4,1,1))
plot(ts, main="as-is") # plot original sim
lines(fitted(fit), col = "red") # plot fitted values
legend("topleft", legend = c("original","fitted"), col = c("black","red"),lty = 1)
#third attempt (plot actual, predicted and 5 future values - here, the actual and future values show up, but not the predicted)
pred = predict(fit, n.ahead = 5)
ts.plot(ts, pred$pred, lty = c(1,3), col=c(5,2))
However, none of these seem to be working correctly. Could someone please tell me what I am doing wrong? (note: the computer I am using for my work does not have an internet connection or a usb port - it only has R with some preloaded packages. I do not have access to the forecast package.)
Thanks
Sources:
In R plot arima fitted model with the original series
R fitted ARIMA off by one timestep? pkg:Forecast
Plotting predicted values in ARIMA time series in R
You seem to be confusing a couple of things:
fitted usually does not work on an object of class arima. Usually, you can load the forecast package first and then use fitted.
But since you do not have acces to the forecast package you cannot use fitted(fit): it always returns NULL. I had problems with fitted
before.
You want to compare the actual series (x) to the fitted series (y), yet in your first attempt you work with the residuals (e = x - y)
You say you are using ggplot2 but actually you are not
So here is a small example on how to plot the actual series and the fitted series without ggplot.
set.seed(1)
x <- cumsum(rnorm(10))
y <- stats::arima(x, order = c(1, 0, 0))
plot(x, col = "red", type = "l")
lines(x - y$residuals, col = "blue")
I Hope this answer helps you get back on tracks.

Is there a ggplot2 analogue to the avPlots function in R?

When undertaking regression modelling it is useful to produce added variable plots for the explanatory variables in the model, to check whether the posited relationships to the response variable are appropriate to the data. The avPlots function in the car package in R takes a model input, and produces a grid of added-variable plots using the base graphics system. This function is extremely user-friendly, insofar as all you need to do is put in the model object as an argument, and it automatically produces all the added variable plots for each explanatory variable. This matrix of plots contains all the desired information, but unfortunately the plots look poor, owing to the fact that it uses the base graphics system rather than the ggplot2 package. For example, using data found here (downloaded as the file Trucking.csv) here is the output of the avPlots function.
#Load required libraries
library(car);
#Import data, fit model, and show AV plots
DATA <- read.csv('Trucking.csv');
MODEL <- lm(log(PRICPTM) ~ DISTANCE + PCTLOAD + ORIGIN + MARKET + DEREG + PRODUCT,
data = DATA);
avPlots(MODEL);
Question: Is there an equivalent function in ggplot2 that produces a matrix of each of the added-variable plots for a model, but with "prettier" plots? Is it possible to produce these plots, but then customise them using standard ggplot syntax?
I am not aware of any automated function that produces the added variable plots using ggplot. However, as well as giving a plot output as a side-effect of the function call, the avPlots function produces an object that is a list containing the data values used in each of the added variable plots. It is relatively simple to extract data frames of these variables and use these to generate added variable plots using ggplot. This can be done for a general model object using the following functions.
avPlots.invis <- function(MODEL, ...) {
ff <- tempfile()
png(filename = ff)
OUT <- car::avPlots(MODEL, ...)
dev.off()
unlink(ff)
OUT }
ggAVPLOTS <- function(MODEL, YLAB = NULL) {
#Extract the information for AV plots
AVPLOTS <- avPlots.invis(MODEL)
K <- length(AVPLOTS)
#Create the added variable plots using ggplot
GGPLOTS <- vector('list', K)
for (i in 1:K) {
DATA <- data.frame(AVPLOTS[[i]])
GGPLOTS[[i]] <- ggplot2::ggplot(aes_string(x = colnames(DATA)[1],
y = colnames(DATA)[2]),
data = DATA) +
geom_point(colour = 'blue') +
geom_smooth(method = 'lm', se = FALSE,
color = 'red', formula = y ~ x, linetype = 'dashed') +
xlab(paste0('Predictor Residual \n (',
names(DATA)[1], ' | others)')) +
ylab(paste0('Response Residual \n (',
ifelse(is.null(YLAB),
paste0(names(DATA)[2], ' | others'), YLAB), ')')) }
#Return output object
GGPLOTS }
The function ggAVPLOTS will take an input model and produce a list of ggplot objects for each of the added variable plots. These have been constructed to give "pretty" plots with blue points and a dashed red regression line through each plot. If you want all the added variable plots to show up in a single plot, it is relatively simple to do this using the grid.arrange function in the gridExtra package. Below we apply this to your model and show the resulting plot.
#Produce matrix of added variable plots
library(gridExtra)
PLOTS <- ggAVPLOTS(MODEL)
K <- length(PLOTS)
NCOL <- ceiling(sqrt(K))
AVPLOTS <- do.call("arrangeGrob", c(PLOTS, ncol = NCOL, top = 'Added Variable Plots'))
ggsave('AV Plots - Trucking.jpg', width = 10, height = 10)
It is possible to make whatever alterations you want to these plots in the ggplot code above, so if a user prefers to change the colours, font sizes, etc., this is done using standard syntax in ggplot. This method works by importing the data for the added variable plots from the avPlots function, but once you have done that, you can use this data to produce any kind of plot.

Save plots as R objects and displaying in grid

In the following reproducible example I try to create a function for a ggplot distribution plot and saving it as an R object, with the intention of displaying two plots in a grid.
ggplothist<- function(dat,var1)
{
if (is.character(var1)) {
var1 <- which(names(dat) == var1)
}
distribution <- ggplot(data=dat, aes(dat[,var1]))
distribution <- distribution + geom_histogram(aes(y=..density..),binwidth=0.1,colour="black", fill="white")
output<-list(distribution,var1,dat)
return(output)
}
Call to function:
set.seed(100)
df <- data.frame(x = rnorm(100, mean=10),y =rep(1,100))
output1 <- ggplothist(dat=df,var1='x')
output1[1]
All fine untill now.
Then i want to make a second plot, (of note mean=100 instead of previous 10)
df2 <- data.frame(x = rep(1,1000),y = rnorm(1000, mean=100))
output2 <- ggplothist(dat=df2,var1='y')
output2[1]
Then i try to replot first distribution with mean 10.
output1[1]
I get the same distibution as before?
If however i use the information contained inside the function, return it back and reset it as a global variable it works.
var1=as.numeric(output1[2]);dat=as.data.frame(output1[3]);p1 <- output1[1]
p1
If anyone can explain why this happens I would like to know. It seems that in order to to draw the intended distribution I have to reset the data.frame and variable to what was used to draw the plot. Is there a way to save the plot as an object without having to this. luckly I can replot the first distribution.
but i can't plot them both at the same time
var1=as.numeric(output2[2]);dat=as.data.frame(output2[3]);p2 <- output2[1]
grid.arrange(p1,p2)
ERROR: Error in gList(list(list(data = list(x = c(9.66707664902549, 11.3631137069225, :
only 'grobs' allowed in "gList"
In this" Grid of multiple ggplot2 plots which have been made in a for loop " answer is suggested to use a list for containing the plots
ggplothist<- function(dat,var1)
{
if (is.character(var1)) {
var1 <- which(names(dat) == var1)
}
distribution <- ggplot(data=dat, aes(dat[,var1]))
distribution <- distribution + geom_histogram(aes(y=..density..),binwidth=0.1,colour="black", fill="white")
plot(distribution)
pltlist <- list()
pltlist[["plot"]] <- distribution
output<-list(pltlist,var1,dat)
return(output)
}
output1 <- ggplothist(dat=df,var1='x')
p1<-output1[1]
output2 <- ggplothist(dat=df2,var1='y')
p2<-output2[1]
output1[1]
Will produce the distribution with mean=100 again instead of mean=10
and:
grid.arrange(p1,p2)
will produce the same Error
Error in gList(list(list(plot = list(data = list(x = c(9.66707664902549, :
only 'grobs' allowed in "gList"
As a last attempt i try to use recordPlot() to record everything about the plot into an object. The following is now inside the function.
ggplothist<- function(dat,var1)
{
if (is.character(var1)) {
var1 <- which(names(dat) == var1)
}
distribution <- ggplot(data=dat, aes(dat[,var1]))
distribution <- distribution + geom_histogram(aes(y=..density..),binwidth=0.1,colour="black", fill="white")
plot(distribution)
distribution<-recordPlot()
output<-list(distribution,var1,dat)
return(output)
}
This function will produce the same errors as before, dependent on resetting the dat, and var1 variables to what is needed for drawing the distribution. and similarly can't be put inside a grid.
I've tried similar things like arrangeGrob() in this question "R saving multiple ggplot2 plots as R-object in list and re-displaying in grid " but with no luck.
I would really like a solution that creates an R object containing the plot, that can be redrawn by itself and can be used inside a grid without having to reset the variables used to draw the plot each time it is done. I would also like to understand wht this is happening as I don't consider it intuitive at all.
The only solution I can think of is to draw the plot as a png file, saved somewhere and then have the function return the path such that i can be reused - is that what other people are doing?.
Thanks for reading, and sorry for the long question.
Found a solution
How can I reference the local environment within a function, in R?
by inserting
localenv <- environment()
And referencing that in the ggplot
distribution <- ggplot(data=dat, aes(dat[,var1]),environment = localenv)
made it all work! even with grid arrange!

R superimposing bivariate normal density (ellipses) on scatter plot

There are similar questions on the website, but I could not find an answer to this seemingly very simple problem. I fit a mixture of two gaussians on the Old Faithful Dataset:
if(!require("mixtools")) { install.packages("mixtools"); require("mixtools") }
data_f <- faithful
plot(data_f$waiting, data_f$eruptions)
data_f.k2 = mvnormalmixEM(as.matrix(data_f), k=2, maxit=100, epsilon=0.01)
data_f.k2$mu # estimated mean coordinates for the 2 multivariate Gaussians
data_f.k2$sigma # estimated covariance matrix
I simply want to super-impose two ellipses for the two Gaussian components of the model described by the mean vectors data_f.k2$mu and the covariance matrices data_f.k2$sigma. To get something like:
For those interested, here is the MatLab solution that created the plot above.
If you are interested in the colors as well, you can use the posterior to get the appropriate groups. I did it with ggplot2, but first I show the colored solution using #Julian's code.
# group data for coloring
data_f$group <- factor(apply(data_f.k2$posterior, 1, which.max))
# plotting
plot(data_f$eruptions, data_f$waiting, col = data_f$group)
for (i in 1: length(data_f.k2$mu)) ellipse(data_f.k2$mu[[i]],data_f.k2$sigma[[i]], col=i)
And for my version using ggplot2.
# needs ggplot2 package
require("ggplot2")
# ellipsis data
ell <- cbind(data.frame(group=factor(rep(1:length(data_f.k2$mu), each=250))),
do.call(rbind, mapply(ellipse, data_f.k2$mu, data_f.k2$sigma,
npoints=250, SIMPLIFY=FALSE)))
# plotting command
p <- ggplot(data_f, aes(color=group)) +
geom_point(aes(waiting, eruptions)) +
geom_path(data=ell, aes(x=`2`, y=`1`)) +
theme_bw(base_size=16)
print(p)
You can use the ellipse-function from package mixtools. The initial problem was that this function swaps x and y from your plot. I'll try to figure this out and update the answe. (I'll leave the colors to somebody else...)
plot( data_f$eruptions,data_f$waiting)
for (i in 1: length(data_f.k2$mu)) ellipse(data_f.k2$mu[[i]],data_f.k2$sigma[[i]])
Using mixtools internal plotting function:
plot.mixEM(data_f.k2, whichplots=2)

function lines() is not working

I have a problem with the function lines.
this is what I have written so far:
model.ew<-lm(Empl~Wage)
summary(model.ew)
plot(Empl,Wage)
mean<-1:500
lw<-1:500
up<-1:500
for(i in 1:500){
mean[i]<-predict(model.ew,data.frame(Wage=i*100),interval="confidence",level=0.90)[1]
lw[i]<-predict(model.ew,data.frame(Wage=i*100),interval="confidence",level=0.90)[2]
up[i]<-predict(model.ew,data.frame(Wage=i*100),interval="confidence",level=0.90)[3]
}
plot(Wage,Empl)
lines(mean,type="l",col="red")
lines(up,type="l",col="blue")
lines(lw,type="l",col="blue")
my problem i s that no line appears on my plot and I cannot figure out why.
Can somebody help me?
You really need to read some introductory manuals for R. Go to this page, and select one that illustrates using R for linear regression: http://cran.r-project.org/other-docs.html
First we need to make some data:
set.seed(42)
Wage <- rnorm(100, 50)
Empl <- Wage + rnorm(100, 0)
Now we run your regression and plot the lines:
model.ew <- lm(Empl~Wage)
summary(model.ew)
plot(Empl~Wage) # Note. You had the axes flipped here
Your first problem was that you flipped the axes. The dependent variable (Empl) goes on the vertical axis. That is the main reason you didn't get any lines on the plot. To get the prediction lines requires no loops at all and only a single plot call using matlines():
xval <- seq(min(Wage), max(Wage), length.out=101)
conf <- predict(model.ew, data.frame(Wage=xval),
interval="confidence", level=.90)
matlines(xval, conf, col=c("red", "blue", "blue"))
That's all there is to it.

Resources