Given data that looks like this:
x<-c(0.287,0.361,0.348,0.430,0.294)
y<-c(105,230,249,758,379)
I'm trying to fit several different models to this data. For this question I'm comparing 2nd-order polynomial fits with Loess fits. To get a smoother curve, I'd like to expand the x data to give me more points to predict over. So for my Loess curve I do this:
Loess_Fit<-loess(y ~ x)
MakeSmooth<-seq(min(x), max(x), (max(x)-min(x))/1000)
plot(x,y)
#WithoutSmoothing
lines(x=sort(x), y=predict(Loess_Fit)[order(x)], col="red", type="l")
#With Smoothing
lines(x=sort(MakeSmooth), y=predict(Loess_Fit,MakeSmooth)[order(MakeSmooth)], col="blue", type="l")
When I attempt to do the same thing with a 2nd-order polynomial fit, I get an error:
Poly2<-lm(y ~ poly(x,2,raw=TRUE))
plot(x,y)
#WithoutSmoothing
lines(x=sort(x), y=predict(Poly2)[order(x)], col="red", type="l")
#With Smoothing
lines(x=sort(MakeSmooth), y=predict(Poly2,MakeSmooth)[order(MakeSmooth)], col="blue", type="l")
Obviously there is some difference between Poly2 and Loess_Fit, but I don't know what the difference is. Is there a way to smooth out the Poly2 fit as I did with the Loess_Fit?
For lm, the new data needs to be a data frame with a column named after the predictor in the formula (here, x):
lines(x=sort(MakeSmooth), y=predict(Poly2,data.frame(x=MakeSmooth))[order(MakeSmooth)], col="blue", type="l")
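As a minimal sketch of the same fix, the snippet below rebuilds the model from the question and predicts over a finer grid. The names grid and pred are introduced here for illustration only. Because seq() already returns increasing values, no sort() or order() is needed. As far as I understand the documentation, predict() for loess also accepts a bare vector when there is a single predictor, which would explain why the Loess version ran without a data frame.

```r
# Data from the question
x <- c(0.287, 0.361, 0.348, 0.430, 0.294)
y <- c(105, 230, 249, 758, 379)

Poly2 <- lm(y ~ poly(x, 2, raw = TRUE))

# A fine, already-sorted grid over the observed range of x
grid <- seq(min(x), max(x), length.out = 1001)

# newdata must be a data frame whose column is named like the predictor
pred <- predict(Poly2, newdata = data.frame(x = grid))

plot(x, y)
lines(grid, pred, col = "blue")
```

Predicting at the original x values through newdata should reproduce fitted(Poly2), which is a quick way to check that the column name was matched.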
I have been able to use an lm polynomial model to fit and predict some time series data. However, when I switch to a Holt model, I get an error in the R console.
Here is what I am trying to do:
library(ggplot2)
library(matrixStats)
library(forecast)
df_input <- read.csv("postprocessed.csv")
x <- df_input$time
y <- df_input$value
df <- data.frame(x, y)
#poly4model <- lm(y~poly(x, degree=4), data=df)
holtmodel <- holt(df$y) # might need df$value here ?
v <- seq(1, 44)
v2 <- seq(44, 55)
pdf("postprocessed_holts.pdf")
plot(df, xlim=c(0, 55))
##lines(v, predict(poly4model, data.frame(x=v)), col="blue", pch=20, lwd=3)
##lines(v2, predict(poly4model, data.frame(x=v2)), col="red", pch=20, lwd=3)
lines(v, predict(holtmodel, data.frame(x=v)), col="blue", pch=20, lwd=3)
lines(v2, predict(holtmodel, data.frame(x=v2)), col="red", pch=20, lwd=3)
dev.off()
This is the error which shows up
Error in xy.coords(x, y) : 'x' and 'y' lengths differ
I am a bit confused as to what x and y refer to here. The objects x and y in the RStudio environment both have length 44.
The error appears on both of the calls to lines().
Here's a copy of the input data...
"","time","value"
"1",1,2.61066016308988
"2",2,3.41246054742996
"3",3,3.8608767964033
"4",4,4.28686048552237
"5",5,4.4923132964825
"6",6,4.50557049744317
"7",7,4.50944447661246
"8",8,4.51097373134893
"9",9,4.48788748823809
"10",10,4.34603985656981
"11",11,4.28677073671406
"12",12,4.20065901625172
"13",13,4.02514194962519
"14",14,3.91360194972916
"15",15,3.85865748409081
"16",16,3.81318053258601
"17",17,3.70380706527433
"18",18,3.61552922363713
"19",19,3.61405310598722
"20",20,3.64591327503384
"21",21,3.70234435835577
"22",22,3.73503970503372
"23",23,3.81003078640584
"24",24,3.88201196162666
"25",25,3.89872518158949
"26",26,3.97432743542362
"27",27,4.2523675144599
"28",28,4.34654855854847
"29",29,4.49276038902684
"30",30,4.67830892029687
"31",31,4.91896819673664
"32",32,5.04350767355202
"33",33,5.09073406942046
"34",34,5.18510849382162
"35",35,5.18353176529036
"36",36,5.2210776270173
"37",37,5.22643491929207
"38",38,5.11137006553725
"39",39,5.01052467981257
"40",40,5.0361056705898
"41",41,5.18149486951409
"42",42,5.36334869132276
"43",43,5.43053620818444
"44",44,5.60001072279525
Edit
I tried an alternative method as well. I noticed that the object holtmodel contains two objects which might be useful. They are fitted and mean. As far as I can tell this is the fitted timeseries and the mean timeseries for the next 10 steps/predictions.
I tried plotting these objects with
lines(holtmodel$fitted, col="orange", lwd=2)
lines(holtmodel$mean, col="blue", lwd=2)
However, the second of these fails to plot anything, despite no error being produced in the console. The first line plots an orange time series as expected.
Your issue
The objects you are trying to add as lines don't have the same length:
length(predict(holtmodel, data.frame(x=v)))
# 10
length(v)
# 44
length(predict(holtmodel, data.frame(x=v2)))
# 10
length(v2)
# 12
This means you can't add them as lines against v or v2.
Also, you can't predict the way you would with a linear regression, by handing the fitted model a set of older x values. Exponential smoothing methods use historical data points to build future data points, so you can't ask them for predictions at arbitrary past x values.
Also, you are not specifying the number of periods you want to forecast (the h parameter); I'll let you refer to the documentation of the holt function. The output of holt() is already a forecast of future values, so calling predict() on it doesn't change the result:
holt_predict <- predict(holtmodel)
length(setdiff(holt_predict, holtmodel))
# 0 which means they are the same objects
Solution
What you could do is use mean and fitted directly and plot them with lines, expanding the plotting area with xlim and ylim so the predicted values are visible. You can plot holtmodel$fitted and holtmodel$mean directly on your chart, since they are time series objects:
plot(df, xlim=c(0, 60), ylim=c(2.5, 10))
lines(holtmodel$fitted, col="blue", pch=20, lwd=3)
lines(holtmodel$mean, col="red", pch=20, lwd=3)
Easy alternative
To save you the hassle of going through this kind of solution, there is an easier method. Have you tried the autoplot function? It is a ggplot2 generic, and the forecast package provides a method for forecast objects, so it will give you what you want directly (unless you don't want the prediction intervals). It is very straightforward and will probably yield results close to what you want:
autoplot(holtmodel)
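To make the length mismatch concrete, here is a minimal sketch that sets h explicitly and reads the time index off the returned series. It assumes the forecast package is installed, and it uses a simulated 44-point series as a hypothetical stand-in for the question's CSV. The names h, fitted_t and future_t are introduced for illustration only.

```r
library(forecast)

# Hypothetical stand-in for the 44 observations in postprocessed.csv
set.seed(1)
y <- 3 + cumsum(rnorm(44, mean = 0.1, sd = 0.2))

h <- 11                      # number of future periods to forecast (assumed)
holtmodel <- holt(y, h = h)

# Time indices carried by the two series
fitted_t <- as.numeric(time(holtmodel$fitted))  # same positions as the data
future_t <- as.numeric(time(holtmodel$mean))    # starts after the last observation

plot(seq_along(y), y, xlim = c(0, 60),
     ylim = range(y, holtmodel$fitted, holtmodel$mean))
lines(fitted_t, as.numeric(holtmodel$fitted), col = "blue", lwd = 2)
lines(future_t, as.numeric(holtmodel$mean), col = "red", lwd = 2)
```

The forecast has exactly h values and its index begins where the data ends, so it can only be drawn against its own time index, not against a vector such as v or v2 of a different length.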
I want to add a curve to an existing plot.
This curve should be a Poisson distribution curve that approaches the mean 3.
I've tried the code below.
Here, points is a vector with 1000 values:
plot(c(1:1000), points,type="l")
abline(h=3)
x = 0:1000
curve(dnorm(x, 3, sqrt(3)), lwd=2, col="red", add=TRUE)
I am getting a plot, but without any curve.
I would like to see a curve that approaches 3.
You can do something like this:
plot(0:20, 3+dpois( x=0:20, lambda=3 ), xlim=c(-2,20))
normden <- function(x){3+dnorm(x, mean=3, sd=sqrt(3))}
curve(normden, from=-4, to=20, add=TRUE, col="red")
Running this code plots the Poisson probabilities shifted up by 3, with the shifted normal density overlaid in red.
Is that what you intended?
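One possible reason the original curve() call showed nothing is scale. This is an assumption, since the contents of points are not shown. A normal density with mean 3 and standard deviation sqrt(3) never rises above roughly 0.23, so on a plot whose y-axis sits around 3 it would fall below the visible region. The sketch below draws both the unshifted and the shifted density on one plot for comparison; the names peak and shifted are illustrative.

```r
x <- 0:20

# Height of the unshifted normal density at its mean: 1 / sqrt(2 * pi * 3)
peak <- dnorm(3, mean = 3, sd = sqrt(3))

# Poisson probabilities shifted up by 3, as in the answer
shifted <- 3 + dpois(x, lambda = 3)

# y-axis starts at 0 so that both curves are visible
plot(x, shifted, type = "h", ylim = c(0, 3.3))
abline(h = 3, lty = 2)

# Unshifted density stays near 0; shifted density sits on the line at 3
curve(dnorm(x, 3, sqrt(3)), from = 0, to = 20, add = TRUE, col = "red")
curve(3 + dnorm(x, 3, sqrt(3)), from = 0, to = 20, add = TRUE, col = "blue")
```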
I'm running into an odd problem (my dataset is here: dataset).
All I need is a simple graph showing the best-fit quadratic regression between rao and Obs_Richness, but instead I am getting very different polynomial models. Any suggestions on how to fix this?
#read in data
F_Div<-read.csv('F_Div.csv', header=T)
str(F_Div)
pairs(F_Div[2:12], pch=16)
#richness vs functional diversity
par(mfrow=c(1,1))
lm1<-lm ( rao~Obs_Richness, data=F_Div)
summary (lm1)
plot (rao~Obs_Richness, data=F_Div, pch=16, xlab="Species Richness", ylab="Rao's Q")
abline(lm1, lty=3)
lines (lowess (F_Div$rao~F_Div$Obs_Richness))
poly.mod<- lm (F_Div$rao ~ poly (F_Div$Obs_Richness, 2, raw=T))
summary (poly.mod)
lines (F_Div$Obs_Richness, predict(poly.mod))
I need the line that best approximates the lowess line (a simple curve), not this squiggly mess.
I also tried this, but it is not what I need:
xx <- seq(0,30, length=67)
plot (rao~Obs_Richness, data=F_Div, pch=16, xlab="Species Richness", ylab="Rao's Q")
lines(xx, predict(poly.mod, data.frame(x=xx)), col="blue")
The squiggly mess happens because lines(...) draws segments between successive points in the data's original order, which is not sorted by x. Try this at the end:
p <- data.frame(x=F_Div$Obs_Richness,y=predict(poly.mod))
p <- p[order(p$x),]
lines(p)
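An alternative sketch, under the assumption that refitting the model is acceptable: fit with data= and a bare column name in the formula, so that predict() can match a newdata column by name, and then evaluate on a sorted grid. The data below are simulated as a hypothetical stand-in for F_Div.csv; only the column names follow the question.

```r
# Hypothetical stand-in data with the question's column names
set.seed(42)
F_Div <- data.frame(Obs_Richness = sample(1:30, 67, replace = TRUE))
F_Div$rao <- 0.5 + 0.08 * F_Div$Obs_Richness -
  0.0015 * F_Div$Obs_Richness^2 + rnorm(67, sd = 0.05)

# Bare column names plus data= let predict() find the predictor in newdata
poly.mod <- lm(rao ~ poly(Obs_Richness, 2, raw = TRUE), data = F_Div)

# Sorted grid over the observed range; the column name must match the formula
xx <- seq(min(F_Div$Obs_Richness), max(F_Div$Obs_Richness), length.out = 200)
yy <- predict(poly.mod, newdata = data.frame(Obs_Richness = xx))

plot(rao ~ Obs_Richness, data = F_Div, pch = 16,
     xlab = "Species Richness", ylab = "Rao's Q")
lines(xx, yy, col = "blue")
```

This would also explain why the attempt with data.frame(x=xx) in the question did not work: the model was fitted on F_Div$Obs_Richness, so a column named x matches nothing in the formula.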
I have the following data:
someFactor = 500
x = c(1:250)
y = x^-.25 * someFactor
which I show in a double logarithmic plot:
plot(x, y, log="xy")
Now I "find out" the slope of the data using a linear model:
model = lm(log(y) ~ log(x))
model
which gives:
Call:
lm(formula = log(y) ~ log(x))
Coefficients:
(Intercept) log(x)
6.215 -0.250
Now I'd like to plot the linear regression as a red line, but abline does not work:
abline(model, col="red")
What is the easiest way to add a regression line to my plot?
lines(log(x), exp(predict(model, newdata=list(x=log(x)))) ,col="red")
The range of values for x plotted on the log-scale and for log(x) being used as the independent variable are actually quite different. This will give you the full range:
lines(x, exp(predict(model, newdata=list(x=x))) ,col="red")
Your line is being plotted; you just can't see it in the window because the values are quite different. When you include the log='xy' argument, the space underneath the plot (so to speak) is distorted (stretched and/or compressed), but the original numbers are still being used. (Imagine you are plotting these points by hand on graph paper; you are still marking a point where the faint blue graph lines for, say, (1,500) cross, but the graph paper has been continuously stretched such that the lines are not equally spaced anymore.) Your model, on the other hand, is using the transformed data.
You need to make your plot with the same transformed data as your model, and then simply re-mark your axes in a way that will be sufficiently intuitively accessible. This is a first try:
plot(log(x), log(y), axes=FALSE, xlab="X", ylab="Y")
box()
axis(side=1, at=log(c(1,2, 10,20, 100,200)),
labels=c( 1,2, 10,20, 100,200))
axis(side=2, at=log(c(125,135, 250,260, 350, 500)),
labels=c( 125,135, 250,260, 350, 500))
abline(model, col="red")
Instead of transforming the axes, plot the log-transformed x and y.
plot(log(x), log(y))
abline(model, col="red")
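If you would rather keep the log="xy" axes from the question, a minimal sketch is to back-transform the fitted values and draw them with lines() on the original scale. The name y_hat is introduced here for illustration.

```r
# Data from the question
someFactor <- 500
x <- 1:250
y <- x^-0.25 * someFactor

model <- lm(log(y) ~ log(x))

# fitted() returns values on the log scale; exp() maps them back to y's scale
y_hat <- exp(fitted(model))

plot(x, y, log = "xy")
lines(x, y_hat, col = "red")
```

Because the axes are only stretched and the coordinates stay on the original scale, the back-transformed fit lands on the points.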
I am trying to plot the inverse of a survival function, as the data I have is actually an increase in the proportion of an event over time. I can produce Kaplan-Meier survival plots, but I want to produce the 'opposite' of these. I can kind of get what I want using fun="cloglog":
plot(survfit(Surv(Days_until_workers,Workers)~Queen_Number+Treatment,data=xdata),
fun="cloglog", lty=c(1:4), lwd=2, ylab="Colonies with Workers",
xlab="Days", las=1, font.lab=2, bty="n")
But I don't understand quite what this has done to the time (i.e. doesn't start at 0 and distance decreases?), and why the survival lines extend above the y axis.
Would really appreciate some help with this!
Cheers
Use fun="event" to get the desired output:
fit <- survfit(Surv(time, status) ~ x, data = aml)
par(mfrow=1:2, las=1)
plot(fit, col=2:3)
plot(fit, col=2:3, fun="event")
The reason for fun="cloglog" screwing up the axes is that it does not plot a fraction at all. It is instead plotting this according to ?plot.survfit:
"cloglog" creates a complimentary log-log survival plot (f(y) = log(-log(y)) along with log scale for the x-axis)
Moreover, the fun argument is not limited to predefined functions like "event" or "cloglog", so you can easily give it your own custom function.
plot(fit, col=2:3, fun=function(y) 3*sqrt(1-y))
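To see what these transformations do to the numbers, here is a small sketch that applies them by hand to the survival estimates stored in the fit. It assumes the survival package and its aml data set, as in the example above. The names surv, event and cloglog are illustrative, and the formulas follow the descriptions quoted from the help page.

```r
library(survival)

fit <- survfit(Surv(time, status) ~ x, data = aml)

surv <- fit$surv            # Kaplan-Meier survival estimates S(t)
event <- 1 - surv           # cumulative proportion with the event
cloglog <- log(-log(surv))  # not a proportion; infinite where S(t) is 0 or 1

head(cbind(time = fit$time, surv = surv, event = event))
```

The event values stay between 0 and 1, whereas the cloglog values can be negative or infinite, which is consistent with curves running off a 0-to-1 axis.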