Adding Points, Legends and Text to plots using xts objects

I am starting on a bit of analysis on pairs of stocks (pairs trading) and here is the function I wrote for producing a graph (pairs.report - listed below).
I need to plot three different lines in a single plot. The function below does what I want, but fine-grained customisation of the x-axis (the timeline) would take some work. As it stands, it labels the x-axis with just the years (for 10 years of data) or the months (for 6 months of data), with no formatting of the ticks.
If I use an xts object, i.e., if I use
plot(xts-object-with-date-asset1-asset2, ...)
instead of
plot(date, asset2, ...)
I get a nicely formatted x-axis right away (along with the grid and the box), but subsequent additions to the plot with functions like points(), text() and lines() fail. I suppose points.xts() and text.xts() are not coming out any time soon.
I would like the convenience of xts objects, but I also need fine-grained control over my plot. So what should my workflow be? Do I have to stick to base graphics and do all the customisation manually, or is there a way to make xts work for me?
I am aware of lattice and ggplot2, but I don't want to use them for now. Here is the function I mentioned (any criticism of or suggestions for improving the code are welcome):
library(xts)

pairs.report <- function(asset1, asset2, dataset) {
  # Create data structures; detach the dataset again when the function exits
  attach(dataset)
  on.exit(detach(dataset))
  datasetlm <- lm(formula = asset1 ~ asset2 + 0, data = dataset)
  beta <- coef(datasetlm)[1]
  # Add extra space to the right margin of the plot within the frame
  par(mar = c(5, 4, 4, 4) + 0.1)
  # Plot the first set of data and draw its axis
  ylim <- c(min(asset2, asset1), max(asset2, asset1))
  plot(date, asset2,
       axes = TRUE,
       ylim = ylim,
       xlab = "Timeline",
       ylab = "asset2 and asset1 equity",
       type = "l",
       col = "red",
       main = "Comparison between asset2 and asset1")
  lines(date, asset1, col = "green")
  box()
  grid(lwd = 3)
  # Allow a second plot on the same graph
  par(new = TRUE)
  # Plot the residual spread on its own y scale
  spread <- asset1 - beta * asset2
  ylim <- range(spread)
  plot(date, spread,
       xlab = "", ylab = "",
       ylim = ylim,
       axes = FALSE,
       type = "l",
       col = "blue")
  # Put the axis scale for the spread on the right
  axis(side = 4, col = "blue", col.axis = "blue")
  mtext("Residual Spread", side = 4, col = "blue", line = 2.5)
  abline(h = mean(spread))
}

plot.xts is a base-graphics plotting function, which means you can use points.default() and lines.default() if you pass the same x values that plot.xts uses. That is not necessary, though, because the xts and zoo packages already handle it: once those packages are loaded, methods(lines) and methods(points) show that such methods are already available. points.zoo is documented on the ?plot.zoo help page.
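A minimal sketch of that route, assuming zoo and xts are installed. The price series is simulated purely for illustration, and the names `prices`, `z`, `a1` and `hi` are hypothetical. Plotting the zoo version with `plot.type = "single"` draws with base graphics against the Date index, so `points()` (dispatching to `points.zoo`), `text()` and `legend()` can be layered on afterwards, as long as the x positions come from `index()`.

```r
library(xts)  # xts depends on zoo, so zoo is attached as well

# Simulated daily prices for two assets (illustrative data only)
set.seed(1)
dates <- seq(as.Date("2020-01-01"), by = "day", length.out = 100)
prices <- xts(cbind(asset1 = 50 + cumsum(rnorm(100)),
                    asset2 = 50 + cumsum(rnorm(100))),
              order.by = dates)

# plot.zoo draws with base graphics, using the Date index as the x coordinate
z <- as.zoo(prices)
plot(z, plot.type = "single", col = c("green", "red"),
     xlab = "Timeline", ylab = "Equity")

# Mark the maximum of asset1: points.zoo takes its x positions from the index
a1 <- as.numeric(prices$asset1)
hi <- which.max(a1)
points(z[hi, "asset1"], pch = 19)

# text() has no zoo method, so pass the index value explicitly as x
text(index(z)[hi], a1[hi], labels = "max", pos = 3)
legend("topleft", legend = colnames(z), col = c("green", "red"), lty = 1)
```

The same pattern (index values as x, plain numbers as y) works for `abline(v = ...)` or `lines()` with any other series sharing the time axis.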

Related

R: Holt Model. Unable to plot timeseries prediction (predict)

I have been able to use a polynomial lm model to model and predict some time-series data. However, when I switch to a Holt model, I get an error in the R console.
Here is what I am trying to do:
library(ggplot2)
library(matrixStats)
library(forecast)
df_input <- read.csv("postprocessed.csv")
x <- df_input$time
y <- df_input$value
df <- data.frame(x, y)
#poly4model <- lm(y~poly(x, degree=4), data=df)
holtmodel <- holt(df$y) # might need df$value here ?
v <- seq(1, 44)
v2 <- seq(44, 55)
pdf("postprocessed_holts.pdf")
plot(df, xlim=c(0, 55))
##lines(v, predict(poly4model, data.frame(x=v)), col="blue", pch=20, lwd=3)
##lines(v2, predict(poly4model, data.frame(x=v2)), col="red", pch=20, lwd=3)
lines(v, predict(holtmodel, data.frame(x=v)), col="blue", pch=20, lwd=3)
lines(v2, predict(holtmodel, data.frame(x=v2)), col="red", pch=20, lwd=3)
dev.off()
This is the error that shows up:
Error in xy.coords(x, y) : 'x' and 'y' lengths differ
I am a bit confused about what x and y refer to here. The objects x and y in the environment (the RStudio Environment pane) both have length 44.
Both calls to lines() appear to raise the error.
Here's a copy of the input data...
"","time","value"
"1",1,2.61066016308988
"2",2,3.41246054742996
"3",3,3.8608767964033
"4",4,4.28686048552237
"5",5,4.4923132964825
"6",6,4.50557049744317
"7",7,4.50944447661246
"8",8,4.51097373134893
"9",9,4.48788748823809
"10",10,4.34603985656981
"11",11,4.28677073671406
"12",12,4.20065901625172
"13",13,4.02514194962519
"14",14,3.91360194972916
"15",15,3.85865748409081
"16",16,3.81318053258601
"17",17,3.70380706527433
"18",18,3.61552922363713
"19",19,3.61405310598722
"20",20,3.64591327503384
"21",21,3.70234435835577
"22",22,3.73503970503372
"23",23,3.81003078640584
"24",24,3.88201196162666
"25",25,3.89872518158949
"26",26,3.97432743542362
"27",27,4.2523675144599
"28",28,4.34654855854847
"29",29,4.49276038902684
"30",30,4.67830892029687
"31",31,4.91896819673664
"32",32,5.04350767355202
"33",33,5.09073406942046
"34",34,5.18510849382162
"35",35,5.18353176529036
"36",36,5.2210776270173
"37",37,5.22643491929207
"38",38,5.11137006553725
"39",39,5.01052467981257
"40",40,5.0361056705898
"41",41,5.18149486951409
"42",42,5.36334869132276
"43",43,5.43053620818444
"44",44,5.60001072279525
Edit
I tried an alternative method as well. I noticed that the object holtmodel contains two components that might be useful: fitted and mean. As far as I can tell, these are the fitted time series and the mean forecast for the next 10 steps.
I tried plotting these objects with
lines(holtmodel$fitted, col="orange", lwd=2)
lines(holtmodel$mean, col="blue", lwd=2)
However, the second of these plots nothing, even though no error appears in the console. The first line plots an orange time series as expected.
Your issue
The objects you are trying to add as lines don't have the same length:
length(predict(holtmodel, data.frame(x=v)))
# 10
length(v)
# 44
length(predict(holtmodel, data.frame(x=v2)))
# 10
length(v2)
# 12
In both calls the x and y vectors differ in length, so lines() cannot pair them up, hence the error.
Also, you can't predict with a Holt model the way you would with a linear regression, by supplying arbitrary x values. Exponential smoothing methods use historical data points to produce future ones, so you can't use predict() to get values for past time points.
Also, you are not specifying h, the number of periods you want to forecast; see the documentation of the holt() function (h defaults to 10, which is why both predictions above have length 10). The output of holt() is already a forecast of future values, so calling predict() on it doesn't change the result:
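The mismatch can be reproduced without the forecast package. In this sketch `fc` is a hypothetical stand-in for a 10-step forecast (not actual holt() output): passing 44 x-values with 10 y-values makes lines() hand both to xy.coords(), which stops with the same message, while supplying one x position per forecast value (periods 45 to 54) works.

```r
# Hypothetical stand-in for a 10-step forecast (not real holt() output)
fc <- 5.6 + 0.1 * (0:9)
plot(1:44, type = "n", xlim = c(0, 55), ylim = c(0, 7))

# 44 x-values against 10 y-values: xy.coords() refuses to pair them
msg <- tryCatch(lines(1:44, fc), error = function(e) conditionMessage(e))
print(msg)

# One x position per forecast value (periods 45 to 54) works
lines(45:54, fc, col = "red", lwd = 3)
```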
holt_predict <- predict(holtmodel)
length(setdiff(holt_predict, holtmodel))
# 0 which means they are the same objects
Solution
What you can do instead is use mean and fitted directly and plot them with lines(), expanding the plotting area with xlim and ylim so the predicted values are visible. Because holtmodel$fitted and holtmodel$mean are time-series objects, lines() places them at the right time points on your chart:
plot(df, xlim=c(0, 60), ylim=c(2.5, 10))
lines(holtmodel$fitted, col="blue", pch=20, lwd=3)
lines(holtmodel$mean, col="red", pch=20, lwd=3)
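If the forecast package is not at hand, a comparable chart can be sketched with base R's `stats::HoltWinters()`. With `gamma = FALSE` it fits Holt's linear (level plus trend) method, although it estimates the smoothing parameters differently from `forecast::holt()`, so the numbers will not match exactly. The series `y` is simulated and only stands in for `df$y`. The ideas are the same as above: keep fitted values and forecasts as `ts` objects so `lines()` places them by time, and widen `xlim`/`ylim` so the forecast period is visible.

```r
# Simulated stand-in for the 44 observations in df$y
set.seed(42)
y <- ts(3 + 0.05 * (1:44) + cumsum(rnorm(44, sd = 0.1)))

# Holt's linear method: exponential smoothing with trend, no seasonal part
hw  <- HoltWinters(y, gamma = FALSE)
fit <- hw$fitted[, "xhat"]          # one-step-ahead fitted values (a ts)
fc  <- predict(hw, n.ahead = 10)    # forecasts for periods 45 to 54 (a ts)

# Widen both axes so the forecast period and its values fit on the chart
plot(y, type = "p", xlim = c(1, 55), ylim = range(y, fit, fc))
lines(fit, col = "blue", lwd = 3)
lines(fc, col = "red", lwd = 3)
```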
Easy alternative
To save you the hassle of this kind of solution, there are easier methods. Have you tried the autoplot() function for forecast objects provided by the forecast package? It is built on ggplot2 and gives you what you want directly (unless you don't want the prediction intervals it also draws). It is very straightforward and will probably give results close to what you want:
autoplot(holtmodel)

RStudio - Big dataset making the graph look like a bar chart

I'm working with a large data set of 2524 rows. When I run the plot code below, the graph looks like a bar chart, and I can't figure out why. I tried ggplot2, and it ended up treating the data as text.
plot(SamAll$OpenSam, type="l", lwd=2, main= "2014-2018 Samsung Open Stock")
A line plot is really what I want to achieve.
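Help would be great, thanks.
Perhaps your variable is stored as a factor. You may need to convert it to a numeric variable, or you can convert it within the plot function. Go through as.character() first, because as.numeric() applied directly to a factor returns the internal level codes rather than the original values:
plot(as.numeric(as.character(SamAll$OpenSam)),
     type="l", lwd=2, main="2014-2018 Samsung Open Stock")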
Consider the following illustrative example:
# Save 2524 values as a factor
x <- as.factor(floor(rnorm(2524, 100, 10)))
plot(x, type="l", lwd=2)
Because x is a factor, plot() dispatches to plot.factor(), which draws a bar chart of the level counts instead of a line.
Convert x to numeric:
plot(as.numeric(as.character(x)), type="l", lwd=2)
Now we get the line graph.
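One caution about converting a factor: as.numeric() applied directly returns the internal level codes, not the numbers the labels spell out, and the levels are sorted as strings. This small sketch (the vector `f` is made up for illustration) shows the difference and two equivalent correct conversions.

```r
# Numbers that were read in as text end up as a factor
f <- factor(c("10", "9", "100"))
levels(f)                              # sorted as strings: "10" "100" "9"

codes   <- as.numeric(f)               # internal level codes: 1 3 2
values  <- as.numeric(as.character(f)) # the actual numbers: 10 9 100
values2 <- as.numeric(levels(f))[f]    # same result, converting each level once
```

For a price column, using the level codes would distort the shape of the line, so the as.character() step matters.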

How do I plot an abline() when I don't have any data points (in R)

I have to plot a few different simple linear models on one chart, the main point being to comment on them. I have no data for the models, and I can't get R to create a plot with appropriate axes, i.e. I can't get the axis ranges right. I'd like the y-axis to run from 0 to 400 and the x-axis from 0 to 50.
Models are:
$$\widehat{y} = 108 + 0.20x_1$$
$$\widehat{y} = 101 + 2.15x_1$$
$$\widehat{y} = 132 + 0.20x_1$$
$$\widehat{y} = 119 + 8.15x_1$$
I know I could possibly do this much more easily in a different software or create a dataset from the model and estimate and plot the model from that but I'd love to know if there is a better way in R.
As @Glen_b noted, type = "n" in plot produces a plot with nothing on it. Because plot() requires data, you have to supply something as x; it can be NA or actual data. If you supply actual data, plot() works out the axis ranges from the data; otherwise you have to set them by hand with the xlim and ylim arguments. Then use abline(), whose a and b arguments are the intercept and slope (or h and v if you just want a horizontal or vertical line).
plot(x=NA, type="n", ylim=c(100, 250), xlim=c(0, 50),
xlab=expression(x[1]), ylab=expression(hat(y)))
abline(a=108, b=0.2, col="red")
abline(a=101, b=2.15, col="green")
abline(a=132, b=0.2, col="blue")
abline(a=119, b=8.15, col="orange")
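The limits above are chosen by hand. A small variation, sketched with the four models from the question stored in hypothetical vectors `intercepts` and `slopes`, computes the y-range from the lines themselves so every line stays visible over x from 0 to 50, and adds a legend. Since each model is a straight line, its extremes lie at the ends of the x-range; the steepest model reaches 526.5 at x = 50, so a 0 to 400 y-axis would clip it.

```r
# Intercepts and slopes of the four models from the question
intercepts <- c(108, 101, 132, 119)
slopes     <- c(0.20, 2.15, 0.20, 8.15)
cols       <- c("red", "green", "blue", "orange")
xr <- c(0, 50)

# Value of each line at both ends of the x-range (row i is model i)
ends <- outer(slopes, xr) + intercepts
yr <- range(ends)   # 101 to 526.5

plot(NA, type = "n", xlim = xr, ylim = yr,
     xlab = expression(x[1]), ylab = expression(hat(y)))
for (i in seq_along(intercepts)) {
  abline(a = intercepts[i], b = slopes[i], col = cols[i])
}
legend("topleft", legend = sprintf("%g + %g x1", intercepts, slopes),
       col = cols, lty = 1)
```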

R: par(mfg) resets ylim values

I'm having a frustrating experience trying to use par(mfg) to move between subplots of a figure. It seems that changing which plot I'm working in with this command resets something about the way the y-axes are specified, so that the ylim=c(a,b) argument is useless. This thread (puzzled by xlim/ylim behavior in R) makes me believe that asp may play a role here, but I can't figure out how, or how to correct the error.
Briefly, to plot results from density() for multiple datasets on two subplots of a single window, I've written a loop that iterates through two lists of density() output, adding new lines to subplot 1, then subplot 2, then back to subplot 1, and so on.
DATA.A<-vector("list",length=6)
DATA.B<-vector("list",length=6)
par(mfrow=c(2,1))
plot(0,0, main="title", xlab="X", ylab="Y", xlim=c(c,d), ylim=c(0,30))
plot(0,0, main="title", xlab="X", ylab="Y", xlim=c(c,d), ylim=c(-5,5))
for(i in 1:6){
DATA.A[[i]]<-density(RAWDATA.A[[i]][,"varname"], from=c, to=d, by=e)
DATA.B[[i]]<-density(RAWDATA.B[[i]][,"varname"], from=c, to=d, by=e)
par(mfg=c(1,1))
lines(DATA.A[[i]]$x,DATA.A[[i]]$y,ylim=c(0,30),col="black", lty=i)
lines(DATA.B[[i]]$x,DATA.B[[i]]$y,ylim=c(0,30),col="red", lty=i)
par(mfg=c(2,1))
lines(DATA.A[[i]]$x,DATA.B[[i]]$y-DATA.A[[i]]$y,
ylim=c(-5,5), col="red", lty=i)
abline(v=median(RAWDATA.A[[i]][,"varname"]),lty=i, col="black")
}
EDIT: I am realizing that it fails mostly for the first subplot, where it is supposed to plot densities over the range 0 to 30 but instead always resets the axis to the range -1 to 1. After calling plot(0,0), the y tick labels correspond to the ylim values I provide, but the data is plotted on the -1 to 1 range. I'd be very grateful for any suggestions.

plot multiple line segments on one graph using R

How can I duplicate this style of graph, with multiple series on one graph and, preferably, legends attached as below?
I have tried the concept of "facets", but ggplot2 and lattice::xyplot both treat facets as separate panels rather than overlaid plots.
I can do it using plain-vanilla plot() and lines(), but I was using ggplot2 and would like to get multiple lines on one plot in that package.
Here is some example data in long form (captured from the plot using a nifty app called "Graphclick")
comp <- read.table(pipe("pbpaste"), header=T, sep=',')
company, year, sales
Apple,1975.003,17298.457
Apple,1977.302,16784.502
Apple,1978.314,17298.457
Apple,1980.246,20730.098
Apple,1981.533,27608.426
Apple,1984.293,40862.852
Apple,1986.408,50468.617
Apple,1987.328,48236.188
Apple,1988.892,35676.547
Apple,1989.904,34616.582
Apple,1991.192,44732.742
Apple,1992.387,44732.742
Apple,1993.399,39055.324
Apple,1995.791,37894.922
Apple,1996.895,39648.746
Apple,1998.274,52804.367
Apple,1999.378,61399.512
Apple,2001.770,2.350e5
Apple,2005.265,7.735e5
Toshiba,1999.378,86856.6
Toshiba,2001.862,1.192e5
Toshiba,2004.069,1.495e5
Toshiba,2004.069,1.495e5
IBM,1975.003,22019.092
IBM,1975.830,27195.193
IBM,1976.934,30682.320
IBM,1978.130,31148.527
IBM,1980.430,35676.547
IBM,1981.625,35676.547
IBM,1983.005,39648.746
IBM,1985.305,40862.852
IBM,1986.408,46102.508
IBM,1987.512,64241.156
IBM,1989.996,75832.898
IBM,1991.100,84276.039
IBM,1992.295,85556.641
IBM,1993.307,79342.539
IBM,1994.779,79342.539
IBM,1995.791,84276.039
IBM,1996.895,95082.484
IBM,1996.895,95082.484
Commodore,1975.003,33588.051
Commodore,1975.830,34616.582
Commodore,1977.118,25219.982
Commodore,1978.130,23388.229
Commodore,1979.326,25992.234
Commodore,1980.521,21689.514
Commodore,1981.717,25219.982
Commodore,1984.201,6999.029
Commodore,1985.213,1670.460
Commodore,1986.408,1458.447
(source: asymco.com)
If you're looking for the most control, you could just use the low-level plot and lines commands. Use "plot" to generate the first graph (with title, xlimits, and ylimits), then use "lines" to add lines to that graph.
plot(0,type="n", xlim=c(0,10), ylim=c(0,10), xlab="X Label", ylab="Y Label", main="Title")
Then add lines using the lines command:
lines(1:10, 1:10, type="l", lty=2)
lines(2:4, 10:8, col=2, type="l")
lines(6:9, c(5,6,5,6), col=3, type="l")
You can fine-tune the look by using all of the parameters listed in the "par" help file ("?par")
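Applied to the company data in the question, this route needs a little support code: size the axes from all groups, then loop over the companies. A minimal sketch, using only a handful of rows copied from the data above (so the curves are coarser than with the full set); the variable names are illustrative, and text labels at the end of each line stand in for the attached legends the question asks for.

```r
# A few rows copied from the data above (not the full set)
comp <- read.csv(text = "company,year,sales
Apple,1975.003,17298.457
Apple,1986.408,50468.617
Apple,2005.265,7.735e5
Toshiba,1999.378,86856.6
Toshiba,2004.069,1.495e5
IBM,1975.003,22019.092
IBM,1996.895,95082.484
Commodore,1975.003,33588.051
Commodore,1986.408,1458.447")

companies <- unique(as.character(comp$company))
cols <- seq_along(companies)   # one palette colour per company

# Empty plot covering every group; extra room on the right for the labels
plot(NA, type = "n", xlim = range(comp$year) + c(0, 4),
     ylim = range(comp$sales), log = "y",
     xlab = "Year", ylab = "Sales", main = "Sales by company")
for (i in seq_along(companies)) {
  d <- comp[comp$company == companies[i], ]
  d <- d[order(d$year), ]
  lines(d$year, d$sales, col = cols[i])
  # Label each line at its last point instead of using a separate legend
  text(tail(d$year, 1), tail(d$sales, 1), companies[i], col = cols[i], pos = 4)
}
```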
So, in ggplot2, this code works:
qplot(year, sales, data=comp, colour=as.factor(company), group= company, geom="path", log="y")
The only things left now are to format the values on the y-axis as plain numbers (not scientific notation), and to move the labels from the off-graph legend onto the plot itself. Final suggestions are welcome.
This is a lot easier in the end than plot() + lines(), as that required support code to get the ranges, iterate over the group levels, etc.
