I have been able to use a polynomial lm model to model and predict some time-series data. However, when I switch to a Holt model, I get an error in the R console.
Here is what I am trying to do:
library(ggplot2)
library(matrixStats)
library(forecast)
df_input <- read.csv("postprocessed.csv")
x <- df_input$time
y <- df_input$value
df <- data.frame(x, y)
#poly4model <- lm(y~poly(x, degree=4), data=df)
holtmodel <- holt(df$y) # might need df$value here ?
v <- seq(1, 44)
v2 <- seq(44, 55)
pdf("postprocessed_holts.pdf")
plot(df, xlim=c(0, 55))
##lines(v, predict(poly4model, data.frame(x=v)), col="blue", pch=20, lwd=3)
##lines(v2, predict(poly4model, data.frame(x=v2)), col="red", pch=20, lwd=3)
lines(v, predict(holtmodel, data.frame(x=v)), col="blue", pch=20, lwd=3)
lines(v2, predict(holtmodel, data.frame(x=v2)), col="red", pch=20, lwd=3)
dev.off()
This is the error that shows up:
Error in xy.coords(x, y) : 'x' and 'y' lengths differ
I'm a bit confused about what x and y refer to here. The objects x and y in the RStudio Environment pane both have length 44.
The error seems to come from both lines() calls.
Here's a copy of the input data:
"","time","value"
"1",1,2.61066016308988
"2",2,3.41246054742996
"3",3,3.8608767964033
"4",4,4.28686048552237
"5",5,4.4923132964825
"6",6,4.50557049744317
"7",7,4.50944447661246
"8",8,4.51097373134893
"9",9,4.48788748823809
"10",10,4.34603985656981
"11",11,4.28677073671406
"12",12,4.20065901625172
"13",13,4.02514194962519
"14",14,3.91360194972916
"15",15,3.85865748409081
"16",16,3.81318053258601
"17",17,3.70380706527433
"18",18,3.61552922363713
"19",19,3.61405310598722
"20",20,3.64591327503384
"21",21,3.70234435835577
"22",22,3.73503970503372
"23",23,3.81003078640584
"24",24,3.88201196162666
"25",25,3.89872518158949
"26",26,3.97432743542362
"27",27,4.2523675144599
"28",28,4.34654855854847
"29",29,4.49276038902684
"30",30,4.67830892029687
"31",31,4.91896819673664
"32",32,5.04350767355202
"33",33,5.09073406942046
"34",34,5.18510849382162
"35",35,5.18353176529036
"36",36,5.2210776270173
"37",37,5.22643491929207
"38",38,5.11137006553725
"39",39,5.01052467981257
"40",40,5.0361056705898
"41",41,5.18149486951409
"42",42,5.36334869132276
"43",43,5.43053620818444
"44",44,5.60001072279525
Edit
I tried an alternative method as well. I noticed that the holtmodel object contains two components that might be useful: fitted and mean. As far as I can tell, these are the fitted values for the observed series and the mean forecasts for the next 10 steps.
I tried plotting these objects with
lines(holtmodel$fitted, col="orange", lwd=2)
lines(holtmodel$mean, col="blue", lwd=2)
However, the second call plots nothing, even though no error appears in the console. The first call plots an orange time series as expected.
Your issue
The x and y vectors you pass to lines() don't have the same length:
length(predict(holtmodel, data.frame(x=v)))
# 10
length(v)
# 44
length(predict(holtmodel, data.frame(x=v2)))
# 10
length(v2)
# 12
That mismatch is exactly what the "'x' and 'y' lengths differ" error is about, so these vectors can't be drawn as lines.
Also, you can't really use an exponential smoothing model the way you would a linear regression, i.e. by calling predict() with arbitrary x values, including past ones. Exponential smoothing methods use the historical observations to build forecasts of future periods; for past periods, all you have are the in-sample fitted values (the fitted component).
Also, you are not specifying h, the number of periods to forecast; it defaults to 10 (see the documentation of the holt() function). The output of holt() is already a forecast of future periods, so calling predict() on it doesn't change the result:
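To see why the model only extends forward in time, here is a minimal hand-rolled sketch of Holt's linear trend method. It is not forecast::holt() itself: the smoothing parameters alpha = 0.8 and beta = 0.2 are fixed, hypothetical values (holt() estimates them from the data), and the initialisation is the simplest possible one. The level and trend are updated one observation at a time, and every forecast is just the last level plus h times the last trend.

```r
# Minimal sketch of Holt's linear trend method (not forecast::holt()).
# alpha and beta are fixed, hypothetical values; holt() estimates them.
holt_sketch <- function(y, h, alpha = 0.8, beta = 0.2) {
  n <- length(y)
  level <- y[1]              # simple initialisation: first observation
  trend <- y[2] - y[1]       # ... and first difference
  fitted <- rep(NA_real_, n) # no one-step prediction exists for y[1]
  for (t in 2:n) {
    fitted[t] <- level + trend                 # one-step-ahead prediction of y[t]
    prev_level <- level
    level <- alpha * y[t] + (1 - alpha) * (prev_level + trend)
    trend <- beta * (level - prev_level) + (1 - beta) * trend
  }
  # Forecasts exist only for periods n + 1, ..., n + h
  list(fitted = fitted, mean = level + seq_len(h) * trend)
}

y <- c(2.61, 3.41, 3.86, 4.29, 4.49, 4.51)   # first six values from the question, rounded
res <- holt_sketch(y, h = 3)
res$mean                                      # three forecasts on a straight line
```

The forecasts form a straight line in h, which is why they only make sense for future periods; the past is covered by the fitted values.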
holt_predict <- predict(holtmodel)
length(setdiff(holt_predict, holtmodel))
# 0 which means they are the same objects
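A minimal sketch of making the x and y lengths match, assuming a stand-in series in place of postprocessed.csv. To stay self-contained it swaps in base R's stats::HoltWinters() with gamma = FALSE (Holt's linear trend, but estimated differently from forecast::holt(), so the numbers will not match exactly). Forecasting 11 steps ahead gives values for times 45 to 55, so the x values given to lines() must be those 11 times, not seq(44, 55). With forecast, the analogous change should be holt(df$y, h = 11).

```r
# Stand-in for df$y (44 observations); replace with the real column.
y <- 4 + sin(seq_len(44) / 6) + seq_len(44) / 40
y_ts <- ts(y, start = 1)               # times 1..44, like df$x

# Holt's linear trend via base R (gamma = FALSE switches off seasonality)
hw <- HoltWinters(y_ts, gamma = FALSE)
fc <- predict(hw, n.ahead = 11)        # forecasts as a ts starting at time 45

x_new <- as.numeric(time(fc))          # the x positions of the forecasts
length(x_new) == length(fc)            # TRUE, so lines() gets equal lengths

pdf(tempfile(fileext = ".pdf"))
plot(y_ts, xlim = c(0, 56), ylim = range(y, fc))
lines(x_new, as.numeric(fc), col = "red", lwd = 3)
dev.off()
```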
Solution
Instead, use the mean and fitted components directly and draw them with lines(). Because they are time-series objects, lines() takes their x positions from their time index, so you don't need to supply x. Also widen the plotting region with xlim and ylim so the forecast values are visible:
plot(df, xlim=c(0, 60), ylim=c(2.5, 10))
lines(holtmodel$fitted, col="blue", pch=20, lwd=3)
lines(holtmodel$mean, col="red", pch=20, lwd=3)
And the result:
Easy alternative
There is an easier way that saves you this hassle: have you tried the autoplot() method that the forecast package provides for forecast objects? It is built on ggplot2 and draws the data, the forecasts, and the prediction intervals in one call (pass PI = FALSE if you don't want the intervals). It is very straightforward and will probably produce something close to what you want:
autoplot(holtmodel)
I've spent the last four hours trying to find an efficient way to plot the curve(s) of a function of two variables, but to no avail. The only answer I could actually put into practice didn't produce the multiple-line graph I expected.
I created a function of two variables, x and y, that returns a continuous numeric value. I want to plot, on a single screen, the result of this function for certain values of x and all values of y within a given range (y is also continuous).
Something like that:
These two questions did help a little, but I still can't get there:
Plotting a function curve in R with 2 or more variables
How to plot function of multiple variables in R by initializing all variables but one
I also tried the plotFun function from the mosaic package, but the results were rather unappealing and not very readable: https://www.youtube.com/watch?v=Y-s7EEsOg1E.
Maybe the problem is my lack of proficiency with R - though I've been using it for months so I'm not such a noob. Please enlighten me.
Say we have a simple function with two arguments:
fun <- function(x, y) 0.5*x - 0.01*x^2 + sqrt(abs(y)/2)
And we want to evaluate it on the following x and y values:
xs <- seq(-100, 100, by=1)
ys <- c(0, 100, 300)
The line below might be a bit hard to understand, but it does all the work:
res <- mapply(fun, list(xs), ys)
mapply lets us call a function of several arguments over matching sets of argument values. Here we give it only one value for the x argument (xs is a long vector, but because it is wrapped in a list it counts as a single element, which mapply recycles) and three values for the y argument. So the function runs 3 times, each time with the same x vector and a different value of y.
mapply collects the results column-wise, so in the end we have a matrix with 3 columns (one per value of y) and 201 rows (one per value of x). Now we only have to plot:
cols <- c("black", "cornflowerblue", "orange")
matplot(xs, res, col=cols, type="l", lty=1, lwd=2, xlab="x", ylab="result")
legend("bottomright", legend=ys, title="value of y", lwd=2, col=cols)
Here the matplot function does all the work: it plots a line for every column of the provided matrix. Everything else is decoration.
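Because fun() only uses vectorised arithmetic, base R's outer() is an alternative that builds the same matrix without the list() trick. This sketch relies on fun being vectorised in both arguments, since outer() calls it once with both arguments expanded to full length; it checks that the two approaches agree.

```r
fun <- function(x, y) 0.5*x - 0.01*x^2 + sqrt(abs(y)/2)
xs <- seq(-100, 100, by = 1)
ys <- c(0, 100, 300)

# mapply version from the answer: one column per value of y
res <- mapply(fun, list(xs), ys)

# outer() evaluates fun on every (x, y) pair; rows follow xs, columns follow ys
res_outer <- outer(xs, ys, fun)

dim(res_outer)                  # 201 3
all.equal(res, res_outer)       # TRUE
```

Either matrix can be passed straight to matplot(xs, ...).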
Here is the result:
Hopefully a simple question today:
I'm plotting an RDA (in RStudio) and would like to remove the second X and Y axes (top and right), purely for aesthetic purposes. The code I'm using is below. I've managed to remove the first axes (I'll replace them with something nicer later) with xaxt="n" and yaxt="n", but the others are still drawn.
The question: How do I remove the top and right axes from a plot in R?
To make this example reproducible, you need two data frames with the same number of rows, called "bio" and "Abio".
library (vegan) ##not sure which package I'm actually employing
library(MASS) ##these are just my defaults
rdaY1<-rda(bio,Abio) #any dummy data will do so long as they're of equal length
par(bg="transparent",new=FALSE)
plot(rdaY1,type="n",bty="n",main="Y1. P<0.001 R2=XXX",
ylab="XXX% variance explained",
xlab="XXX% variance explained",
col.main="black",col.lab="black", col.axis="white",
xaxt="n",yaxt="n",axes=FALSE)
abline(h=0,v=0,col="black",lwd=1)
points(rdaY1,display="species",col="gray",pch=20)
#text(rdaY1,display="species",col="gray")
points(rdaY1,display="cn",col="black",lwd=2)
text(rdaY1,display="cn",col="black")
UPDATE: Using the comments below, I've tried various ways to get rid of the axes, and it seems that the second points() call, where I plot the vectors, is the problem. Any ideas?
bty="L" worked for me. I generated some random data using rnorm() to test:
library(vegan)
mat <- matrix(rnorm(100), nrow = 10)
pl <- rda(mat)
plot(pl, bty="L")
Here's the result.
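The same idea works in plain base graphics. This sketch uses random data without vegan, and writes to a temporary PDF purely for illustration. It shows two options: bty = "l" in the plot call, or axes = FALSE followed by your own axis() calls and an L-shaped box().

```r
set.seed(1)
x <- rnorm(50)
y <- rnorm(50)

out <- tempfile(fileext = ".pdf")
pdf(out, width = 8, height = 4)
par(mfrow = c(1, 2))

# Option 1: let plot() draw an L-shaped box (left and bottom only)
plot(x, y, bty = "l", main = 'bty = "l"')

# Option 2: suppress axes and box entirely, then add only what you want
plot(x, y, axes = FALSE, main = "axes = FALSE + axis()")
axis(1)            # bottom axis
axis(2)            # left axis
box(bty = "l")     # optional L-shaped frame
dev.off()
```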
I know most programmers would point me to the lattice or ggplot2 packages as a solution to this question, but there must be a way to do it with base graphics. I want to plot multiple graphs, each with its regression line and correlation coefficient, using simple loops. Some simple example data might look like this:
a=list(cbind(c(1,2,3), c(4,8,12)), cbind(c(5,15,25), c(10,30,50)))
par(mfrow=c(1,2))
lapply(1:length(a), function(i)
plot(a[[i]][,1], a[[i]][,2]))
lapply(1:length(a), function(i)
abline(lm(a[[i]][,2]~a[[i]][,1])))
require(plotrix)
lapply(1:length(a), function(i)
boxed.labels(a[[i]][,1][1], a[[i]][,2][3],
labels=paste(round(cor(a[[i]][,2], a[[i]][,1], use = "pairwise.complete.obs"),2)),
border=FALSE, adj=0.5, cex=0.8))
If you run the script above, you'll notice that all the regression lines and r values are drawn on top of the last graph. Is there a way to include the regression call along with the plot command, or some other clever way to loop so that each regression is drawn on its corresponding figure? It works fine for a single plot (shown below), but I'm working with a rather large list!
plot(a[[1]][,1], a[[1]][,2])
abline(lm(a[[1]][,2]~a[[1]][,1]))
boxed.labels(a[[1]][,1][1], a[[1]][,2][3],
labels=paste(round(cor(a[[1]][,2], a[[1]][,1], use = "pairwise.complete.obs"),2)),
border=FALSE, adj=0.5, cex=0.8)
Once you call plot(), you start drawing in a new "cell" of the mfrow layout, and low-level functions such as abline() and boxed.labels() always draw on the current cell. So if you want to add more to a plot before moving on to the next one, do all of that drawing before calling the next plot().
For example:
library(plotrix) # provides boxed.labels()
a=list(cbind(c(1,2,3), c(4,8,12)), cbind(c(5,15,25), c(10,30,50)))
par(mfrow=c(1,2))
lapply(a, function(d) {
d <- setNames(data.frame(d), c("x","y"))
plot( y~x, d )
abline( lm( y ~ x, d ) )
boxed.labels(min(d$x), max(d$y),
labels=paste(round(cor(d$y, d$x, use = "pairwise.complete.obs"),2)),
border=FALSE, adj=0.5, cex=0.8)
})
Note how we do all the drawing inside a single lapply() so that abline and boxed.labels are called in between the multiple plot calls rather than after they are all done.
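If you'd rather avoid plotrix and lapply(), a plain for loop does the same thing, as long as every low-level call for a panel happens before the next plot(). In this sketch legend() is swapped in for boxed.labels(), n2mfrow() picks a grid size that also suits a long list, and the r values are returned so they can be checked.

```r
a <- list(cbind(c(1, 2, 3), c(4, 8, 12)), cbind(c(5, 15, 25), c(10, 30, 50)))

out <- tempfile(fileext = ".pdf")
pdf(out, width = 8, height = 4)
par(mfrow = n2mfrow(length(a)))   # rows x columns chosen from the list length

r <- numeric(length(a))
for (i in seq_along(a)) {
  x <- a[[i]][, 1]
  y <- a[[i]][, 2]
  r[i] <- cor(x, y, use = "pairwise.complete.obs")
  plot(x, y)                                  # opens panel i
  abline(lm(y ~ x))                           # drawn on panel i
  legend("topleft", legend = paste("r =", round(r[i], 2)), bty = "n")
}
dev.off()
r
```

Both example data sets are perfectly linear, so both r values are 1.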
Is there any way for me to add some points to a pairs plot?
For example, I can plot the Iris dataset with pairs(iris[1:4]), but I wanted to execute a clustering method (for example, kmeans) over this dataset and plot its resulting centroids on the plot I already had.
It would also help if there's a way to plot the whole data set and the centroids together in a single pairs plot, with the centroids drawn differently. The idea is to plot pairs(rbind(iris[1:4], centers)) (where centers holds the three centroids) but draw the last three rows of this matrix differently, for example with a different cex or pch. Is that possible?
You give the solution yourself in the last paragraph of your question. Yes, you can use pch and col in the pairs function.
pairs(rbind(iris[1:4], kmeans(iris[1:4],3)$centers),
pch=rep(c(1,2), c(nrow(iris), 3)),
col=rep(c(1,2), c(nrow(iris), 3)))
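A slightly more explicit version of the same idea, as a sketch: set.seed() makes the k-means result reproducible, and a helper factor (called type here, an illustrative name) marks which rows are observations and which are centroids, so pch and col are derived from it instead of from rep() counts. The plot goes to a temporary PDF purely for illustration.

```r
set.seed(42)                                  # kmeans uses random starts
cl <- kmeans(iris[1:4], centers = 3)

combined <- rbind(iris[1:4], as.data.frame(cl$centers))
type <- factor(rep(c("obs", "centroid"), c(nrow(iris), nrow(cl$centers))),
               levels = c("obs", "centroid"))

out <- tempfile(fileext = ".pdf")
pdf(out)
pairs(combined,
      pch = c(1, 8)[type],                    # circles for data, stars for centroids
      col = c("grey50", "red")[type])
dev.off()
table(type)
```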
Another option is to use a panel function:
cl <- kmeans(iris[1:4],3)
idx <- subset(expand.grid(x=1:4,y=1:4),x!=y)
i <- 1
pairs(iris[1:4],bg=cl$cluster,pch=21,
panel=function(x, y,bg, ...) {
points(x, y, pch=21,bg=bg)
points(cl$centers[,idx[i,'x']],cl$centers[,idx[i,'y']],
cex=4,pch=10,col='blue')
i <<- i +1
})
But I think it is safer and easier to use the splom() function from lattice, which also generates the legend automatically:
cl <- kmeans(iris[1:4],3)
library(lattice)
splom(iris[1:4],groups=cl$cluster,pch=21,
panel=function(x, y,i,j,groups, ...) {
panel.points(x, y, pch=21,col=groups)
panel.points(cl$centers[,j],cl$centers[,i],
pch=10,col='blue')
},auto.key=TRUE)