I have been able to use an lm polynomial model to model and predict some time series data. However, when I switch to a Holt model, I get an error in the R console.
Here is what I am trying to do:
library(ggplot2)
library(matrixStats)
library(forecast)
df_input <- read.csv("postprocessed.csv")
x <- df_input$time
y <- df_input$value
df <- data.frame(x, y)
#poly4model <- lm(y~poly(x, degree=4), data=df)
holtmodel <- holt(df$y) # might need df$value here ?
v <- seq(1, 44)
v2 <- seq(44, 55)
pdf("postprocessed_holts.pdf")
plot(df, xlim=c(0, 55))
##lines(v, predict(poly4model, data.frame(x=v)), col="blue", pch=20, lwd=3)
##lines(v2, predict(poly4model, data.frame(x=v2)), col="red", pch=20, lwd=3)
lines(v, predict(holtmodel, data.frame(x=v)), col="blue", pch=20, lwd=3)
lines(v2, predict(holtmodel, data.frame(x=v2)), col="red", pch=20, lwd=3)
dev.off()
This is the error which shows up
Error in xy.coords(x, y) : 'x' and 'y' lengths differ
I am a bit confused as to what x and y refer to here. The objects x and y which are in the Environment (R Studio Environment) both have length 44.
The code appears to error on both lines starting with lines.
Here's a copy of the input data...
"","time","value"
"1",1,2.61066016308988
"2",2,3.41246054742996
"3",3,3.8608767964033
"4",4,4.28686048552237
"5",5,4.4923132964825
"6",6,4.50557049744317
"7",7,4.50944447661246
"8",8,4.51097373134893
"9",9,4.48788748823809
"10",10,4.34603985656981
"11",11,4.28677073671406
"12",12,4.20065901625172
"13",13,4.02514194962519
"14",14,3.91360194972916
"15",15,3.85865748409081
"16",16,3.81318053258601
"17",17,3.70380706527433
"18",18,3.61552922363713
"19",19,3.61405310598722
"20",20,3.64591327503384
"21",21,3.70234435835577
"22",22,3.73503970503372
"23",23,3.81003078640584
"24",24,3.88201196162666
"25",25,3.89872518158949
"26",26,3.97432743542362
"27",27,4.2523675144599
"28",28,4.34654855854847
"29",29,4.49276038902684
"30",30,4.67830892029687
"31",31,4.91896819673664
"32",32,5.04350767355202
"33",33,5.09073406942046
"34",34,5.18510849382162
"35",35,5.18353176529036
"36",36,5.2210776270173
"37",37,5.22643491929207
"38",38,5.11137006553725
"39",39,5.01052467981257
"40",40,5.0361056705898
"41",41,5.18149486951409
"42",42,5.36334869132276
"43",43,5.43053620818444
"44",44,5.60001072279525
Edit
I tried an alternative method as well. I noticed that the object holtmodel contains two objects which might be useful. They are fitted and mean. As far as I can tell this is the fitted timeseries and the mean timeseries for the next 10 steps/predictions.
I tried plotting these objects with
lines(holtmodel$fitted, col="orange", lwd=2)
lines(holtmodel$mean, col="blue", lwd=2)
however the second of these fails to plot anything, despite no error being produced in the console. The first line plots an orange timeseries as expected.
Your issue
The objects you are trying to add as lines don't have the same length:
length(predict(holtmodel, data.frame(x=v)))
# 10
length(v)
# 44
length(predict(holtmodel, data.frame(x=v2)))
# 10
length(v2)
# 12
This means you can't add them as new lines.
Also, you can't predict from a Holt model the way you would from a linear regression, by passing new x values to predict(). Exponential smoothing methods build their forecasts from the historical observations alone, so there is no x at which to evaluate the model: for the past you get the fitted values, and for the future you get the forecast.
Also, you are not specifying the number of periods you want to forecast (the h argument, which defaults to 10); see the documentation of the holt function. The output of holt() is already a forecast of future values, so calling predict() on it doesn't change the result:
holt_predict <- predict(holtmodel)
length(setdiff(holt_predict, holtmodel))
# 0 which means they are the same objects
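To see the mismatch concretely, here is a minimal sketch on a synthetic 44-point series (the series and the horizon h = 12 are assumptions for illustration, not your data). The forecast has exactly h values, which is why it cannot be drawn against a 44-element x vector:

```r
library(forecast)

# Synthetic stand-in for the 44 observations (assumption, not the real data)
set.seed(1)
y <- 3 + cumsum(rnorm(44, mean = 0.07, sd = 0.1))

h <- 12                      # number of future periods to forecast
holt_fc <- holt(y, h = h)    # holt() already returns the forecast

length(holt_fc$mean)         # one value per forecast period (h of them)
length(holt_fc$fitted)       # one fitted value per observation
time(holt_fc$mean)           # the forecast continues the series' time index
```

Setting h explicitly makes the length of the forecast predictable, so any x vector you build for it can be made to match.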
Solution
What you can do is use fitted and mean directly and plot them with lines, expanding the plot area with xlim and ylim so that the predicted values are visible. Since they are time series objects, holtmodel$fitted and holtmodel$mean can be drawn straight onto your chart:
plot(df, xlim=c(0, 60), ylim=c(2.5, 10))
lines(holtmodel$fitted, col="blue", pch=20, lwd=3)
lines(holtmodel$mean, col="red", pch=20, lwd=3)
And the result:
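If you would rather not guess xlim and ylim, one hedged variant is to derive them from the data and the forecast. The series below is synthetic, and h = 11 is chosen only so that the forecast reaches time 55; both are assumptions:

```r
library(forecast)

# Synthetic stand-in for the question's series (assumption)
set.seed(1)
y <- 3 + cumsum(rnorm(44, mean = 0.07, sd = 0.1))

h <- 11                                   # forecast times 45..55
fc <- holt(y, h = h)

# Axis limits wide enough for observations, fitted values and forecasts
x_limits <- c(0, length(y) + h)
y_limits <- range(y, fc$fitted, fc$mean)

plot(seq_along(y), y, xlim = x_limits, ylim = y_limits,
     xlab = "time", ylab = "value")
lines(fc$fitted, col = "blue", lwd = 3)   # in-sample fitted values
lines(fc$mean, col = "red", lwd = 3)      # out-of-sample forecasts
```

A forecast that falls outside the current ylim is a likely reason why lines(holtmodel$mean) appeared to draw nothing in your edit: the line is drawn, but off the visible area.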
Easy alternative
To save you the hassle of building the plot by hand, there is an easier method. Have you tried the autoplot function? The generic comes from ggplot2, and the forecast package supplies a method for forecast objects, so it gives you what you want directly (unless you don't want the prediction intervals). It is very straightforward and will probably yield results close to what you want:
autoplot(holtmodel)
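If the shaded intervals are not wanted, autoplot for forecast objects takes a PI argument. A small sketch on synthetic data (the series and the axis labels are assumptions):

```r
library(forecast)
library(ggplot2)

# Synthetic stand-in for the question's series (assumption)
set.seed(1)
y <- ts(3 + cumsum(rnorm(44, mean = 0.07, sd = 0.1)))

fc <- holt(y, h = 10)

# PI = FALSE drops the shaded prediction intervals
p <- autoplot(fc, PI = FALSE) +
  labs(x = "time", y = "value")
print(p)
```

Because the result is a ggplot object, further ggplot2 layers and themes can be added to it in the usual way.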
Related
Hi I am plotting the spectral profile of each individual feature in the table below. For the road feature I am able to plot it but for the others I get this error. Please assist.
This is the code I used:
plot((sli[,1:3]),col="grey",type="l")
This is the error I got:
Error in plot.default(...) : formal argument "type" matched by multiple actual arguments
This is the table
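plot wants a vector and you give it a data frame with three columns. For a data frame with more than two columns, plot() hands over to pairs(), which sets type itself, hence the clash over the type argument. You could make an empty plot and add the lines afterwards by looping over the columns with sapply.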
plot(1:nrow(sli), type="n", ylim=c(0, max(sli[1:3])))
sapply(1:3, function(x) lines(sli[x], col="grey", lty=x + 1))
Or simply use matplot which is designed for this purpose.
matplot(sli[,1:3], col="grey", type="l")
Data:
set.seed(42)
sli <- data.frame(41:47*.01, sample(200:900, 7), sample(20:90, 7))
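For completeness, the empty-plot approach can also be written with a plain for loop, which avoids the list of NULLs that sapply prints. This is only a sketch using the same made-up data; the column names a, b and c are assumptions:

```r
set.seed(42)
sli <- data.frame(a = 41:47 * 0.01,
                  b = sample(200:900, 7),
                  c = sample(20:90, 7))

cols <- 1:3
y_limits <- range(sli[cols])          # common y range over all columns

# Empty canvas first, then one line per column
plot(NA, xlim = c(1, nrow(sli)), ylim = y_limits,
     xlab = "Index", ylab = "Value")
for (j in cols) {
  lines(seq_len(nrow(sli)), sli[[j]], col = "grey", lty = j)
}
```

Note that the columns live on very different scales here, so the small-valued ones hug the bottom of the plot; matplot would show the same thing.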
I need to plot the following graph so I can choose the optimal threshold for a logistic regression model.
However, I can't use the packages (epi and roc) that appear in much of the material I have found. I do have the package pROC. Is there any way to plot the graph using this package? Also, how else could I choose the optimal threshold? How does it work using only the ROC curve?
If you are using the pROC package, the first step is to extract the coordinates of the curve. For instance:
library(pROC)
data(aSAH)
myroc <- roc(aSAH$outcome, aSAH$ndka)
mycoords <- coords(myroc, "all", transpose = FALSE)
Once you have that you can plot anything you like. This should be somewhat close to your example. (Recent pROC versions return a data frame from coords(); transpose = FALSE makes that explicit, so the columns are accessed with $.)
plot(mycoords$threshold, mycoords$specificity, type="l",
     col="red", xlab="Cutoff", ylab="Performance")
lines(mycoords$threshold, mycoords$sensitivity, type="l",
      col="blue")
legend(100, 0.4, c("Specificity", "Sensitivity"),
       col=c("red", "blue"), lty=1)
Choosing the "optimal" cutpoint is as difficult as defining what is optimal in the first place. It highly depends on the context and your application.
A common shortcut is to use the Youden index, which simply picks the cutoff that maximizes specificity + sensitivity. Again with pROC:
best.coords <- coords(myroc, "best", best.method="youden", transpose = FALSE)
abline(v=best.coords$threshold, lty=2, col="grey")
abline(h=best.coords$specificity, lty=2, col="red")
abline(h=best.coords$sensitivity, lty=2, col="blue")
With pROC you can change the criteria for the "best" threshold. See the ?coords help page and the best.method and best.weights arguments for quick ways to tune it. You may want to look at the OptimalCutpoints package for more advanced ways to select your own optimum.
The output plot should look something like this:
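To make the Youden criterion less of a black box, here is a hedged sketch that computes it by hand from the full coordinate table. It assumes a pROC version whose coords() accepts transpose = FALSE and returns a data frame:

```r
library(pROC)
data(aSAH)

myroc <- roc(aSAH$outcome, aSAH$ndka)

# One row per candidate threshold, with its specificity and sensitivity
all_coords <- coords(myroc, "all", transpose = FALSE)

# Youden's J = sensitivity + specificity - 1; the best cutoff maximizes it
youden <- all_coords$sensitivity + all_coords$specificity - 1
best_row <- all_coords[which.max(youden), ]
best_row
```

which.max returns the first maximum, so if several thresholds tie you only see one of them here; coords(myroc, "best", ...) is the safer call when that matters.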
I am trying to plot the inverse of a survival function, as the data I have is actually an increase in the proportion of an event over time. I can produce Kaplan-Meier survival plots, but I want to produce the 'opposite' of these. I can kind of get what I want using fun="cloglog":
plot(survfit(Surv(Days_until_workers,Workers)~Queen_Number+Treatment,data=xdata),
fun="cloglog", lty=c(1:4), lwd=2, ylab="Colonies with Workers",
xlab="Days", las=1, font.lab=2, bty="n")
But I don't understand quite what this has done to the time (i.e. doesn't start at 0 and distance decreases?), and why the survival lines extend above the y axis.
Would really appreciate some help with this!
Cheers
Use fun="event" to get the desired output
library(survival)
fit <- survfit(Surv(time, status) ~ x, data = aml)
par(mfrow=1:2, las=1)
plot(fit, col=2:3)
plot(fit, col=2:3, fun="event")
The reason for fun="cloglog" screwing up the axes is that it does not plot a fraction at all. It is instead plotting this according to ?plot.survfit:
"cloglog" creates a complimentary log-log survival plot (f(y) = log(-log(y)) along with log scale for the x-axis)
Moreover, the fun argument is not limited to predefined functions like "event" or "cloglog", so you can easily give it your own custom function.
plot(fit, col=2:3, fun=function(y) 3*sqrt(1-y))
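As a quick check of what these transformations do, fun="event" is just 1 - S(t), which you can reproduce yourself. The sketch below uses the aml data shipped with the survival package, and also shows why the cloglog curve cannot start at time 0 on a finite axis:

```r
library(survival)

fit <- survfit(Surv(time, status) ~ x, data = aml)

# Cumulative event proportion computed by hand from the survival estimate
cum_event <- 1 - fit$surv

# Same curve as fun = "event", written as a custom function
plot(fit, col = 2:3, fun = function(s) 1 - s,
     xlab = "Days", ylab = "Proportion with event")

# At S = 1 (the start of follow-up) the cloglog transform is -Inf
cloglog_at_start <- log(-log(1))
cloglog_at_start
```

The log-scaled x axis used by "cloglog" has the same problem at time 0, which is why that plot neither starts at the origin nor keeps equal spacing between days.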
I am starting on a bit of analysis on pairs of stocks (pairs trading) and here is the function I wrote for producing a graph (pairs.report - listed below).
I need to plot three different lines in a single plot. The function I have listed does what I want it to do, but it will take a bit of work if I want a fine customisation in the x-axis (the time line). As it is, it prints just the years (for 10 years of data) or the months (for 6 months of data) in the x-axis, with no formatting for ticks.
If I use an xts object, i.e., if I use
plot(xts-object-with-date-asset1-asset2, ...)
instead of
plot(date, asset2, ...)
I get a nicely formatted x-axis right away (along with the grid and the box), but subsequent additions to the plot using functions like points(), text() and lines() fail. I suppose points.xts() and text.xts() are not coming out any time soon.
I would like the convenience of xts objects, but I will also require a fine grained control over my plot. So what should my work-flow be like? Do I have to stick to basic graphics and do all the customisation manually? Or is there a way I can make xts work for me?
I am aware of lattice and ggplot2, but I don't want to use them now. Here is the function I mentioned (any criticism/ suggestions for improvement of the code is welcome) -
library(xts)
pairs.report <- function(asset1, asset2, dataset) {
  #create data structures
  attach(dataset)
  datasetlm <- lm(formula = asset1 ~ asset2 + 0, data = dataset)
  beta <- coef(datasetlm)[1]
  #add extra space to right margin of plot within frame
  par(mar=c(5, 4, 4, 4) + 0.1)
  # Plot first set of data and draw its axis
  ylim <- c(min(asset2,asset1), max(asset2,asset1))
  plot(date,
       asset2,
       axes=TRUE,
       ylim=ylim,
       xlab="Timeline",
       ylab="asset2 and asset1 equity",
       type="l",
       col="red",
       main="Comparison between asset2 and asset1")
  lines(date, asset1, col="green")
  box()
  grid(lwd=3)
  # Allow a second plot on the same graph
  par(new=TRUE)
  # Plot the second plot and
  ylim <- c(min(asset1-beta*asset2), max(asset1-beta*asset2))
  plot(date,
       asset1-beta*asset2,
       xlab="", ylab="",
       ylim=ylim,
       axes=FALSE,
       type="l",
       col="blue")
  #put axis scale on right
  axis(side=4,
       ylim=ylim,
       col="blue",
       col.axis="blue")
  mtext("Residual Spread",side=4,col="blue",line=2.5)
  abline(h=mean(asset1-beta*asset2))
}
plot.xts is a base-graphics plot function, which means you can use points.default() and lines.default() as long as you pass the same x values that plot.xts uses. But that is not even necessary: the xts and zoo packages already take care of it. Once those packages are loaded, methods(lines) and methods(points) show that the corresponding methods are available. points.zoo is documented on the ?plot.zoo page.
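A minimal sketch of that workflow with zoo objects, using made-up prices (the series, dates and colours are all assumptions). It deliberately uses zoo rather than xts, since the zoo methods are the ones documented on ?plot.zoo:

```r
library(zoo)

# Two made-up daily price series on a shared date index (assumption)
set.seed(1)
dates <- seq(as.Date("2020-01-01"), by = "day", length.out = 100)
asset1 <- zoo(50 + cumsum(rnorm(100)), dates)
asset2 <- zoo(50 + cumsum(rnorm(100)), dates)

y_limits <- range(coredata(asset1), coredata(asset2))

# plot.zoo draws the date axis; lines() and points() dispatch to zoo methods
plot(asset1, col = "green", ylim = y_limits,
     xlab = "Timeline", ylab = "Price")
lines(asset2, col = "red")
peak <- asset1[which.max(coredata(asset1))]
points(peak, pch = 19, col = "blue")   # mark the highest value of asset1
```

An xts object can be converted with as.zoo() if you want to stay on this path while keeping your data in xts.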
I have the following code
frame()
Y = read.table("Yfile.txt",header=T,row.names=NULL,sep='')
X = read.table("Xfile.txt",header=F,sep='')
plot(Y$V1~X$V1,pch=20,xlim=c(0,27))
par(new=T)
plot(Y$V1~X$V2,pch=20,xlim=c(0,27),col='red')
par(new=T)
plot(Y$V1~Y$V3,pch=20,xlim=c(0,27),col='blue')
par(new=T)
All is well and I get the 3 graphs on the same plot. However, when I divide the x variables by Y$V2 to normalise the data, as in
plot(Y$V1~X$V1/Y$V2,pch=20,xlim=c(0,27))
par(new=T)
plot(Y$V1~X$V2/Y$V2,pch=20,xlim=c(0,27),col='red')
par(new=T)
plot(Y$V1~Y$V3/Y$V2,pch=20,xlim=c(0,27),col='blue')
par(new=T)
I get the message
Hit Return to see next plot:
and the graphs just won't show in the same plot. Could anybody tell me what is happening and how to solve it?
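If you want to use arithmetic operations in a formula you have to wrap them in I(). Without it, / is read as a formula operator (a/b expands to a + a:b), so the right-hand side has several terms and plot draws one graph per term, which is where the "Hit Return to see next plot" prompt comes from. So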
plot(Y$V1~I(X$V1/Y$V2),pch=20,xlim=c(0,27))
par(new=T)
plot(Y$V1~I(X$V2/Y$V2),pch=20,xlim=c(0,27),col='red')
par(new=T)
plot(Y$V1~I(Y$V3/Y$V2),pch=20,xlim=c(0,27),col='blue')
par(new=T)
works.
From the help page for formula:
To avoid this confusion, the function I() can be used to bracket those portions of a model formula where the operators are used in their arithmetic sense. For example, in the formula y ~ a + I(b+c), the term b+c is to be interpreted as the sum of b and c.
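You can check how R reads each version of the formula by looking at its terms. A small base R sketch (the variable names y, a and b are placeholders and need not exist):

```r
# Without I(): "/" is the nesting operator, so the formula has two terms
labels_plain <- attr(terms(y ~ a / b), "term.labels")
print(labels_plain)        # "a" "a:b"

# With I(): the division is protected and stays a single term
labels_protected <- attr(terms(y ~ I(a / b)), "term.labels")
print(labels_protected)
```

Two terms on the right-hand side means two plots from plot(), one term means one plot.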
Edit. You could do it without a formula in one command (each = nrow(Y) repeats every colour once per observation):
plot(c(X$V1/Y$V2, X$V2/Y$V2, Y$V3/Y$V2), rep(Y$V1, 3),
     pch=20, xlim=c(0,27),
     col=rep(c("black", "red", "blue"), each=nrow(Y))
)
I'm not sure why you get the error, but using points instead of plot for the second and third graph is a much more elegant solution (and gets rid of those par calls).
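A sketch of the points approach, with randomly generated stand-ins for Xfile.txt and Yfile.txt (the data, the 30 rows and the axis labels are assumptions):

```r
# Made-up data in place of the two input files (assumption)
set.seed(1)
n <- 30
Y <- data.frame(V1 = runif(n, 0, 10),
                V2 = runif(n, 1, 2),
                V3 = runif(n, 0, 27))
X <- data.frame(V1 = runif(n, 0, 27),
                V2 = runif(n, 0, 27))

# First series sets up the axes; the others are added to the same plot
plot(X$V1 / Y$V2, Y$V1, pch = 20, xlim = c(0, 27),
     xlab = "x / Y$V2", ylab = "Y$V1")
points(X$V2 / Y$V2, Y$V1, pch = 20, col = "red")
points(Y$V3 / Y$V2, Y$V1, pch = 20, col = "blue")
```

Passing x and y as plain vectors also sidesteps the formula interpretation of / entirely, so no I() is needed here.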