How to filter a trend? - r

Is it possible to filter a trend like this:
set.seed(1)
n=1000
mu = c(rep(1,100),rep(3,100),rep(5,100),rep(2,100),rep(1,600))
y = mu + rnorm(n)
and then obtain a numerical vector that defines the new trend?
It would be optimal if you could also use different threshold values.

It sounds like you are looking for a smoothing function. There are many ways to achieve this: for example a rolling average, loess, or generalized additive models. If you want the trend to be composed of straight-line sections, as in your example, you could try a regression with B-splines of degree 1.
This little function would perform such a task:
library(splines)
smoother <- function(x, n = floor(length(x) / 10), deg = 1) {
  predict(lm(x ~ bs(seq_along(x), knots = seq(1, length(x), n), degree = deg)))
}
The x argument is the data from which you are trying to find the trend, and n is the number of measurements between knots (that is, the points where the gradient of the line can change). deg is the degree of the polynomial used (1 for straight line segments, and higher numbers for smoother polynomial fits).
Trying this on your example, we would get something like this:
plot(y, type = 'l')
trend <- smoother(y, 50)
lines(trend, col = 'red')
Or if you wanted a less jagged line:
plot(y, type = 'l')
lines(smoother(y, 75, 4), col = 'red')
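For comparison, here is a quick sketch of two of the other smoothers mentioned at the start (the window width and loess span are illustrative choices, not tuned values):
roll <- stats::filter(y, rep(1/50, 50), sides = 2)   # 50-point rolling average
lo <- predict(loess(y ~ seq_along(y), span = 0.1))   # loess fit
plot(y, type = 'l')
lines(roll, col = 'blue')
lines(lo, col = 'green')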

Related

Syntax for three-piece segmented regression using NLS in R when concave

My goal is to fit a three-piece (i.e., two break-point) regression model to make predictions using propagate's predictNLS function, making sure to define knots as parameters, but my model formula seems off.
I've used the segmented package to estimate the breakpoint locations (used as starting values in NLS), but would like to keep my models in the NLS format, specifically nlsLM {minpack.lm}, because I am fitting other types of curves to my data using NLS, want to allow NLS to optimize the knot values, am sometimes using variable weights, and need to be able to easily calculate the Monte Carlo confidence intervals from propagate. Though I'm very close to having the right syntax for the formula, I'm not getting the expected/required behaviour near the breakpoint(s). The segments SHOULD meet directly at the breakpoints (without any jumps), but at least on this data I'm getting a weird local minimum at the breakpoint (see plots below).
Below is an example of my data and general process. I believe my issue to be in the NLS formula.
library(minpack.lm)
library(segmented)
y <- c(-3.99448113, -3.82447011, -3.65447803, -3.48447030, -3.31447855, -3.14448753, -2.97447972, -2.80448401, -2.63448380, -2.46448069, -2.29448796, -2.12448912, -1.95448783, -1.78448797, -1.61448563, -1.44448719, -1.27448469, -1.10448651, -0.93448525, -0.76448637, -0.59448626, -0.42448586, -0.25448588, -0.08448548, 0.08551417, 0.25551393, 0.42551411, 0.59551395, 0.76551389, 0.93551398)
x <- c(61586.1711, 60330.5550, 54219.9925, 50927.5381, 48402.8700, 45661.9175, 37375.6023, 33249.1248, 30808.6131, 28378.6508, 22533.3782, 13901.0882, 11716.5669, 11004.7305, 10340.3429, 9587.7994, 8736.3200, 8372.1482, 8074.3709, 7788.1847, 7499.6721, 7204.3168, 6870.8192, 6413.0828, 5523.8097, 3961.6114, 3460.0913, 2907.8614, 2016.1158, 452.8841)
df<- data.frame(x,y)
#Use Segmented to get estimates for parameters with 2 breakpoints
my.seg2 <- segmented(lm(y ~ x, data = df), seg.Z = ~ x, npsi = 2)
#extract knot, intercept, and coefficient values to use as NLS start points
my.knot1 <- my.seg2$psi[1,2]
my.knot2 <- my.seg2$psi[2,2]
my.m_2 <- slope(my.seg2)$x[1,1]
my.b1 <- my.seg2$coefficients[[1]]
my.b2 <- my.seg2$coefficients[[2]]
my.b3 <- my.seg2$coefficients[[3]]
#Fit a NLS model to ~replicate segmented model. Presumably my model formula is where the problem lies
my.model <- nlsLM(y ~ m*x + b + (b2 * (ifelse(x >= knot1 & x <= knot2, 1, 0) * (x - knot1)) + (b3 * ifelse(x > knot2, 1, 0) * (x - knot2 - knot1))),
                  data = df,
                  start = c(m = my.m_2, b = my.b1, b2 = my.b2, b3 = my.b3, knot1 = my.knot1, knot2 = my.knot2))
How it should look
plot(my.seg2)
How it does look
plot(x, y)
lines(x=x, y=predict(my.model), col='black', lty = 1, lwd = 1)
I was pretty sure I had it "right", but when the 95% confidence intervals are plotted with the line and prediction resolution (e.g., the density of x points) is increased, things seem dramatically incorrect.
Thank you all for your help.
Define g to be a grouping vector, of the same length as x, which takes on the values 1, 2, 3 for the three sections of the x axis, and create an nls model from these. The resulting plot looks OK.
my.knots <- c(my.knot1, my.knot2)
g <- cut(x, c(-Inf, my.knots, Inf), label = FALSE)
fm <- nls(y ~ a[g] + b[g] * x, df, start = list(a = c(1, 1, 1), b = c(1, 1, 1)))
plot(y ~ x, df)
lines(fitted(fm) ~ x, df, col = "red")
Constraints
Although the above looks ok and may be sufficient it does not guarantee that the segments intersect at the knots. To do that we must impose the constraints that both sides are equal at the knots:
a[2] + b[2] * my.knots[1] = a[1] + b[1] * my.knots[1]
a[3] + b[3] * my.knots[2] = a[2] + b[2] * my.knots[2]
so
a[2] = a[1] + (b[1] - b[2]) * my.knots[1]
a[3] = a[2] + (b[2] - b[3]) * my.knots[2]
= a[1] + (b[1] - b[2]) * my.knots[1] + (b[2] - b[3]) * my.knots[2]
giving:
# returns a vector of the three a values
avals <- function(a1, b) unname(cumsum(c(a1, -diff(b) * my.knots)))
fm2 <- nls(y ~ avals(a1, b)[g] + b[g] * x, df, start = list(a1 = 1, b = c(1, 1, 1)))
To get the three a values we can use:
co <- coef(fm2)
avals(co[1], co[-1])
To get the residual sum of squares:
deviance(fm2)
## [1] 0.193077
Polynomial
Although it involves a larger number of parameters, a polynomial fit could be used in place of the segmented linear regression. A 12th degree polynomial involves 13 parameters but has a lower residual sum of squares than the segmented linear regression. A lower degree could be used, with a corresponding increase in the residual sum of squares. A 7th degree polynomial involves 8 parameters and visually does not look too bad, although it has a higher residual sum of squares.
fm12 <- nls(y ~ cbind(1, poly(x, 12)) %*% b, df, start = list(b = rep(1, 13)))
deviance(fm12)
## [1] 0.1899218
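For reference, the 7th degree fit mentioned above can be obtained with the same idiom (its deviance is not shown here):
fm7 <- nls(y ~ cbind(1, poly(x, 7)) %*% b, df, start = list(b = rep(1, 8)))
deviance(fm7)  ## expected to be higher than fm12 (see text)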
This may, in part, reflect a limitation in segmented: segmented returns a single change-point value without quantifying the associated uncertainty. Redoing the analysis using mcp, which returns Bayesian posteriors, we see that the second change point is bimodally distributed:
library(mcp)
model = list(
  y ~ 1 + x,  # Intercept + slope in first segment
  ~ 0 + x,    # Only slope changes in the next segments
  ~ 0 + x
)
# Fit it with a large number of samples and plot the change point posteriors
fit = mcp(model, data = data.frame(x, y), iter = 50000, adapt = 10000)
plot_pars(fit, regex_pars = "^cp*", type = "dens_overlay")
FYI, mcp can plot credible intervals as well (the red dashed lines):
plot(fit, q_fit = TRUE)
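For numerical summaries (posterior means and credible intervals for the change points and segment parameters), the usual summary method works on the mcp fit:
summary(fit)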

get the derivative of an ECDF

Is it possible to differentiate an ECDF? Take the one obtained in the following example.
set.seed(1)
a <- sort(rnorm(100))
b <- ecdf(a)
plot(b)
I would like to take the derivative of b in order to obtain its probability density function (PDF).
n <- length(a) ## `a` must be sorted in non-decreasing order already
plot(a, 1:n / n, type = "s") ## "staircase" plot; not "line" plot
However, I'm looking to find the derivative of b.
In sample-based statistics, the estimated density (for a continuous random variable) is not obtained from the ECDF by differentiation, because the sample size is finite and the ECDF is not differentiable. Instead, we estimate the density directly. I guess plot(density(a)) is what you are really looking for.
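A minimal sketch of that direct approach, overlaying the kernel density estimate on the true standard normal density for reference (the default bandwidth is used):
d <- density(a)                       ## kernel density estimate of the sample
plot(d)
curve(dnorm(x), add = TRUE, col = 2)  ## true N(0, 1) density, since `a` was drawn from rnorm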
a few days later..
Warning: the following is just a numerical solution without statistical grounding!
I take it as an exercise to learn about R package scam for shape constrained additive models, a child package of mgcv by Prof Wood's early PhD student Dr Pya.
The logic is as follows:
1. Using scam::scam, fit a monotonically increasing P-spline to the ECDF (you have to specify how many knots you want). [Note that monotonicity is not the only theoretical constraint: the smoothed ECDF should also be "clipped" at its two edges, the left edge at 0 and the right edge at 1. I currently impose this constraint through weights, giving very large weights to the two edge points.]
2. Using stats::splinefun, reparametrize the fitted spline as a monotone interpolation spline through the knots and the predicted values at the knots.
3. Return the interpolation spline function, which can also evaluate the 1st, 2nd and 3rd derivatives.
Why I expect this to work:
As sample size grows,
ECDF converges to CDF;
P-spline is consistent so a smoothed ECDF will be increasingly unbiased for ECDF;
the 1st derivative of smoothed ECDF will be increasingly unbiased for PDF.
Use with caution:
You have to choose number of knots yourself;
the derivative is NOT normalized so that the area under the curve is 1;
the result can be rather unstable, and is only good for large sample size.
function arguments:
x: a vector of samples;
n.knots: number of knots;
n.cells: number of grid points when plotting derivative function
You need to install the scam package from CRAN.
library(scam)
test <- function(x, n.knots, n.cells) {
  ## get ECDF
  n <- length(x)
  x <- sort(x)
  y <- 1:n / n
  dat <- data.frame(x = x, y = y)  ## make sure `scam` can find `x` and `y`
  ## fit a monotonically increasing P-spline for ECDF
  fit <- scam::scam(y ~ s(x, bs = "mpi", k = n.knots), data = dat,
                    weights = c(n, rep(1, n - 2), 10 * n))
  ## interior knots
  xk <- with(fit$smooth[[1]], knots[4:(length(knots) - 3)])
  ## spline values at interior knots
  yk <- predict(fit, newdata = data.frame(x = xk))
  ## reparametrization into a monotone interpolation spline
  f <- stats::splinefun(xk, yk, "hyman")
  par(mfrow = c(1, 2))
  plot(x, y, pch = 19, col = "gray")  ## ECDF
  lines(x, f(x), type = "l")          ## smoothed ECDF
  title(paste0("number of knots: ", n.knots,
               "\neffective degree of freedom: ", round(sum(fit$edf), 2)),
        cex.main = 0.8)
  xg <- seq(min(x), max(x), length = n.cells)
  plot(xg, f(xg, 1), type = "l")      ## density estimated by scam
  lines(stats::density(x), col = 2)   ## a proper density estimate by density
  ## return smoothed ECDF function
  f
}
## try large sample size
set.seed(1)
x <- rnorm(1000)
f <- test(x, n.knots = 20, n.cells = 100)
f is a function as returned by stats::splinefun (read ?splinefun).
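As a small usage note, the returned f can be evaluated anywhere, and its deriv argument gives the derivatives, so the estimated PDF is just the first derivative of the smoothed ECDF (the comments give rough expected values for this standard normal sample):
f(0)              ## smoothed ECDF at 0; should be near 0.5
f(0, deriv = 1)   ## estimated density at 0; should be near dnorm(0), about 0.40
xg <- seq(-3, 3, length = 100)
plot(xg, f(xg, deriv = 1), type = "l")   ## the estimated PDF over a grid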
A naive, similar solution is to fit an interpolation spline to the ECDF without smoothing. But this is a very bad idea, as it has no consistency.
g <- splinefun(sort(x), 1:length(x) / length(x), method = "hyman")
curve(g(x, deriv = 1), from = -3, to = 3)
A reminder: it is highly recommended to use stats::density for a direct density estimation.

Segmented linear regression with discontinuous data

I have a dataset that looks to be piecewise linear. I would like to perform a segmented linear regression in R. The issue is that there is a discontinuity at the breakpoint. By using some pieces of code from this question I managed to get something, but I am not satisfied.
Dataset
Here is a dummy dataset.
NB = 100
A1 = 2 # coeff for first part
A2 = 1 # coeff for second part
B1 = 0 # intercept for first part
B2 = 300 # intercept for second part
df = data.frame(n=1:NB)
df$n = sample(500, size=NB, replace=TRUE)
df$noise = sample(20, size=NB, replace=TRUE)-10
my_func <- function(n, noise) {
  if (n < 100) {
    return(A1*n + B1 + noise)
  } else {
    return(A2*n + B2 + noise)
  }
}
df$fn = mapply(my_func, df$n, df$noise)
Using segmented package
This is quite straightforward: we simply perform a classical linear regression and pass it to segmented.
library(segmented)
library(ggplot2)
model_segmented = segmented(lm(fn~n, data=df), seg.Z = ~ n)
predict_segmented = data.frame(n = df$n, fn = broken.line(model_segmented)$fit)
ggplot(df, aes(x = n, y = fn)) +
  geom_point() + geom_line(data = predict_segmented, color = 'blue')
Gives:
Obviously, segmented expects the data to be continuous. That is not the case here, so the regression is not correct.
“Manual” method
This method is more tedious. First, we compute the breakpoint by trying all possible breakpoints and keeping the one which yields the lowest residual standard error. Then, we add a new factor to the linear regression, which indicates whether the predictor variable is above or below this breakpoint.
# Computation of the break-point
Break <- sort(unique(df$n))
Break <- Break[2:(length(Break) - 1)]
d <- numeric(length(Break))
for (i in 1:length(Break)) {
  model_manual <- lm(fn ~ (n < Break[i])*n + (n >= Break[i])*n, data = df)
  d[i] <- summary(model_manual)[[6]]  # residual standard error (sigma)
}
breakpoint = Break[which.min(d)]
# Linear regression using this break-point
df$group = df$n >= breakpoint
model_manual<-lm(fn~n*group, data=df)
dat_pred = data.frame(n = df$n, fn = predict(model_manual, df))
ggplot(df, aes(x = n, y = fn)) +
  geom_point() +
  geom_line(data = dat_pred[dat_pred$n < breakpoint, ], color = 'blue') +
  geom_line(data = dat_pred[dat_pred$n >= breakpoint, ], color = 'blue')
Gives:
Here, the regression is great.
Question
Is there a better way to achieve this goal? Can the segmented package take discontinuous data, or is there a package that can do this?
My concern is that the second method is a bit long and not very readable.
After spending a tremendous amount of time digging, I believe the chngpt package is the way to go. It can do both continuous and discontinuous segmented regressions. Link here: https://cran.r-project.org/web/packages/chngpt/vignettes/chngpt-vignette.pdf
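I have not verified this against the example data, but from the vignette the call looks roughly like the sketch below; the argument names and the type value are assumptions to check against the vignette ("segmented" fits a continuous broken line, while "stegmented" also allows a jump at the change point, which is what this question needs):
library(chngpt)
# sketch only: argument names taken from the chngpt vignette, not verified here
fit_chngpt <- chngptm(formula.1 = fn ~ 1, formula.2 = ~ n, data = df,
                      type = "stegmented", family = "gaussian")
summary(fit_chngpt)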
strucchange will detect the breakpoint using statistically valid methods. Then, you can fit each piece with whatever model you want. For example, with a seasonal time series you can apply separate ARIMA models to each segment.
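A rough sketch of that workflow on the dummy data above (standard strucchange usage as far as I know, but check how many breaks it actually selects):
library(strucchange)
df2 <- df[order(df$n), ]                  # order by the predictor first
bp <- breakpoints(fn ~ n, data = df2)     # estimate the breakpoint location(s)
summary(bp)
df2$seg <- breakfactor(bp)                # segment membership for each observation
fit <- lm(fn ~ seg / n - 1, data = df2)   # separate intercept and slope per segment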

R smooth.spline(): smoothing spline is not smooth but overfitting my data

I have several data points which seem suitable for fitting a spline through them. When I do this, I get a rather bumpy fit, like overfitting, which is not what I understand as smoothing.
Is there a special option / parameter for getting back the function of a really smooth spline, like here?
The usage of the penalty parameter for smooth.spline didn't have any visible effect. Maybe I did it wrong?
Here are data and code:
results <- structure(
list(
beta = c(
0.983790622281964, 0.645152464354322,
0.924104713597375, 0.657703886566088, 0.788138034115623, 0.801080207252363,
1, 0.858337365965949, 0.999687052533693, 0.666552625121279, 0.717453633245958,
0.621570152961453, 0.964658181346544, 0.65071758770312, 0.788971505000918,
0.980476054183113, 0.670263506919246, 0.600387040967624, 0.759173403408052,
1, 0.986409675965, 0.982996471134736, 1, 0.995340781899163, 0.999855895958986,
1, 0.846179233381267, 0.879226324448832, 0.795820998892035, 0.997586607285667,
0.848036806290156, 0.905320944437968, 0.947709125535428, 0.592172373022407,
0.826847031044922, 0.996916006944244, 0.785967729206612, 0.650346929853076,
0.84206351833549, 0.999043126652724, 0.936879214753098, 0.76674066557003,
0.591431233516217, 1, 0.999833445117791, 0.999606223666537, 0.6224971799303,
1, 0.974537160571494, 0.966717133936379
), inventoryCost = c(
1750702.95138889,
442784.114583333, 1114717.44791667, 472669.357638889, 716895.920138889,
735396.180555556, 3837320.74652778, 872873.4375, 2872414.93055556,
481095.138888889, 538125.520833333, 392199.045138889, 1469500.95486111,
459873.784722222, 656220.486111111, 1654143.83680556, 437511.458333333,
393295.659722222, 630952.170138889, 4920958.85416667, 1723517.10069444,
1633579.86111111, 4639909.89583333, 2167748.35069444, 3062420.65972222,
5132702.34375, 838441.145833333, 937659.288194444, 697767.1875,
2523016.31944444, 800903.819444444, 1054991.49305556, 1266970.92013889,
369537.673611111, 764995.399305556, 2322879.6875, 656021.701388889,
458403.038194444, 844133.420138889, 2430700, 1232256.68402778,
695574.479166667, 351348.524305556, 3827440.71180556, 3687610.41666667,
2950652.51736111, 404550.78125, 4749901.64930556, 1510481.59722222,
1422708.07291667
)
), .Names = c("beta", "inventoryCost"), class = c("data.frame")
)
plot(results$beta,results$inventoryCost)
mySpline <- smooth.spline(results$beta,results$inventoryCost, penalty=999999)
lines(mySpline$x, mySpline$y, col="red", lwd = 2)
Transform your data sensibly before modelling
Based on the scale of your results$inventoryCost, log transform is appropriate. For simplicity, in the following I am using x, y. I am also reordering your data so that x is ascending:
x <- results$beta; y <- log(results$inventoryCost)
reorder <- order(x); x <- x[reorder]; y <- y[reorder]
par(mfrow = c(1,2))
plot(x, y, main = "take log transform")
hist(x, main = "x is skewed")
Doesn't the left figure look better? Also, it is highly recommended to further transform x, because it is skewed (see the right figure).
The following transform is appropriate:
x1 <- -(1-x)^(1/3)
The cubic root of (1 - x) makes the data more spread out around x = 1. I added a minus sign so that there is a positive monotonic relation, rather than a negative one, between x and x1. Now let's check the relationship:
par(mfrow = c(1,2))
plot(x1, y, main = expression(y %~% ~ x1))
hist(x1, main = "x1 is well spread out")
Fitting a spline
Now we are ready for statistical modelling. Try the following call:
fit <- smooth.spline(x1, y, nknots = 10)
pred <- stats:::predict.smooth.spline(fit, x1)$y ## predict at all x1
## or you can simply call: pred <- predict(fit, x1)$y
plot(x1, y) ## scatter plot
lines(x1, pred, lwd = 2, col = 2) ## fitted spline
Does it look nice? Note that nknots = 10 tells smooth.spline to place 10 interior knots (by quantile); therefore, we are fitting a penalized regression spline rather than a smoothing spline. In fact, smooth.spline() almost never fits a smoothing spline, unless you set all.knots = TRUE (see the later example).
I also dropped penalty = 999999, as that has nothing to do with smoothness control. If you really want to control the smoothness yourself, rather than letting smooth.spline figure out the optimal amount by GCV, you should use the argument df or spar; a small sketch follows, and a further example with a fixed df appears later.
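For instance (the values 6 and 0.8 are arbitrary, purely to show the two arguments):
fit.df <- smooth.spline(x1, y, nknots = 10, df = 6)        ## fix the effective degrees of freedom
fit.spar <- smooth.spline(x1, y, nknots = 10, spar = 0.8)  ## or fix the smoothing parameter directly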
To transform fit back to original scale, do:
plot(x, exp(y), main = expression(Inventory %~%~ beta))
lines(x, exp(pred), lwd = 2, col = 2)
As you can see, the fitted spline is as smooth as you had expected.
Explanation on fitted spline
Let's see the summary of your fitted spline:
> fit
Smoothing Parameter spar= 0.4549062 lambda= 0.0008657722 (11 iterations)
Equivalent Degrees of Freedom (Df): 6.022959
Penalized Criterion: 0.08517417
GCV: 0.004288539
We used 10 knots, ending up with about 6 degrees of freedom, so the penalization suppresses roughly 4 parameters. The smoothing parameter chosen by GCV, after 11 iterations, is lambda = 0.0008657722.
Why do we have to transform x to x1
A spline is penalized by its 2nd derivative, yet such penalization acts on the averaged/integrated 2nd derivative over all data points. Now, look at your data (x, y). For x before 0.98, the relationship is relatively steady; as x approaches 1, the relationship quickly becomes steeper. The "change point", 0.98, has a very high second derivative, much higher than the second derivatives at other locations.
y0 <- as.numeric(tapply(y, x, mean)) ## remove tied values
x0 <- unique(x) ## remove tied values
dy0 <- diff(y0)/diff(x0) ## 1st order difference
ddy0 <- diff(dy0)/diff(x0[-1]) ## 2nd order difference
plot(x0[1:43], abs(ddy0), pch = 19)
Look at that huge spike in 2nd order difference/derivative! Now, if we fit a spline directly, the spline curve around this change point will be heavily penalized.
bad <- smooth.spline(x, y, all.knots = TRUE)
bad.pred <- predict(bad, x)$y
plot(x, exp(y), main = expression(Inventory %~% ~ beta))
lines(x, exp(bad.pred), col = 2, lwd = 3)
abline(v = 0.98, lwd = 2, lty = 2)
You can see clearly that the spline is having some difficulty in approximating data after x = 0.98.
There are of course some ways to achieve a better approximation after this change point, for example by manually setting a smaller smoothing parameter or a higher degree of freedom. But then we go to the other extreme. Remember, both the penalization and the degrees of freedom are global measures. Increasing model complexity will give a better approximation after x = 0.98, but will also make other parts more bumpy. Now let's try a model with 45 degrees of freedom:
worse <- smooth.spline(x, y, all.knots = TRUE, df = 45)
worse.pred <- predict(worse, x)$y
plot(x, exp(y), main = expression(Inventory %~% ~ beta))
lines(x, exp(worse.pred), col = 2, lwd = 2)
As you can see, the curve is bumpy. Sure, we have overfitted our dataset of 50 observations with 45 degrees of freedom.
In fact, your original misuse of smooth.spline() is doing the same thing:
> mySpline
Call:
smooth.spline(x = results$beta, y = results$inventoryCost, penalty = 999999)
Smoothing Parameter spar= -0.8074624 lambda= 3.266077e-19 (17 iterations)
Equivalent Degrees of Freedom (Df): 45
Penalized Criterion: 5.598386
GCV: 0.03824885
Oops: 45 degrees of freedom, overfitting!
I don't think you should use / want splinefun. I would suggest fitting a GAM instead:
library(mgcv)
fit <- gam(inventoryCost ~ s(beta, bs = "cr", k = 20), data = results)
summary(fit)
gam.check(fit)
plot(fit)
plot(inventoryCost ~ beta, data = results, col = "dark red", pch = 16)
curve(predict(fit, newdata = data.frame(beta = x)), add = TRUE,
      from = min(results$beta), to = max(results$beta), n = 1e3, lwd = 2)
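If you also want an approximate confidence band around the GAM fit, predict.gam can return standard errors; a sketch adding a rough ±2·SE band to the plot above (Gaussian family, so this is on the response scale):
newd <- data.frame(beta = seq(min(results$beta), max(results$beta), length = 200))
pr <- predict(fit, newdata = newd, se.fit = TRUE)
lines(newd$beta, pr$fit + 2 * pr$se.fit, lty = 2)
lines(newd$beta, pr$fit - 2 * pr$se.fit, lty = 2)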

How to compute prediction intervals for a circle fit in R

I wish to compute the prediction interval of the radius from a circle fit with the formula r² = (x-h)² + (y-k)², where r is the radius of the circle, x, y are Gaussian coordinates, and h, k mark the center of the fitted circle.
# data
x <- c(1,2.2,1,2.5,1.5,0.5,1.7)
y <- c(1,1,3,2.5,4,1.7,0.8)
# using nls.lm from minpack.lm (minimising the sum of squared residuals)
library(minpack.lm)
residFun <- function(par, x, y) {
  res <- sqrt((x - par$h)^2 + (y - par$k)^2) - par$r
  return(res)
}
parStart <- list("h" = 1.5, "k" = 2.5, "r" = 1.7)
out <- nls.lm(par = parStart, x = x, y = y, lower =NULL, upper = NULL, residFun)
The problem is that predict() doesn't work with nls.lm, hence I am trying to compute the circle fit using nlsLM. (I could compute it by hand, but I have trouble creating my design matrix.)
So this is what I tried next:
dat = list("x" = x,"y" = y)
out1 <- nlsLM(y ~ sqrt(-(x-h)^2+r^2)+k, start = parStart )
which results in:
Error in stats:::nlsModel(formula, mf, start, wts) :
singular gradient matrix at initial parameter estimates
Question 1a: How does nlsLM() work with circle fits? (The advantage being that the generic predict() is available.)
Question 1b: How do I get the prediction interval for my circle fit?
EXAMPLE from linear regression (this is what I want for the circle regression)
attach(faithful)
eruption.lm = lm(eruptions ~ waiting)
newdata = data.frame(waiting=seq(45,90, length = 272))
# confidence interval
conf <- predict(eruption.lm, newdata, interval="confidence")
# prediction interval
pred <- predict(eruption.lm, newdata, interval="predict")
# plot of the data [1], the regression line [1], confidence interval [2], and prediction interval [3]
plot(eruptions ~ waiting)
lines(conf[,1] ~ newdata$waiting, col = "black") # [1]
lines(conf[,2] ~ newdata$waiting, col = "red") # [2]
lines(conf[,3] ~ newdata$waiting, col = "red") # [2]
lines(pred[,2] ~ newdata$waiting, col = "blue") # [3]
lines(pred[,3] ~ newdata$waiting, col = "blue") # [3]
Kind regards
Summary of Edits:
Edit1: Rearranged formula in nlsLM, but parameter (h,k,r) results are now different in out and out1 ...
Edit2: Added 2 Wikipedia links to clarify the terminology used (cf. below):
confidence interval
prediction interval
Edit3: Some rephrasing of the question(s)
Edit4: Added a working example for linear regression
I am having a hard time figuring out what you want to do. Let me illustrate what the data looks like and something about the "prediction".
plot(x,y, xlim=range(x)*c(0, 1.5), ylim=range(y)*c(0, 1.5))
lines(out$par$h + c(-1, -1, 1, 1, -1)*out$par$r,  # extremes of x-coord
      out$par$k + c(-1, 1, 1, -1, -1)*out$par$r,  # extremes of y-coord
      col = "red")
So what "prediction interval" are we speaking about? ( I do realize that you were thinking of a circle and if you just want to plot a circle on this background that's going to be pretty easy as well.)
lines(out$par$h + cos(seq(-pi, pi, by = 0.1))*out$par$r,  # center + r*cos(theta)
      out$par$k + sin(seq(-pi, pi, by = 0.1))*out$par$r,  # center + r*sin(theta)
      col = "red")
I think that this question is not answerable in its current form. Any predict() function that is based on a linear model will require the predicted variable to be a linear function of the input design matrix. r^2 = (x-x0)^2 + (y-y0)^2 is not a linear function of the design matrix (which would be something like [x0 x y0 y]), so I don't think you're going to be able to find a linear model fit that will give you confidence intervals. If someone more clever than I am has a way to do it, though, I'd be very interested in hearing about it.
The general way to approach these sorts of problems is to create a hierarchical nonlinear model, where your hyperparameters would be x0 and y0 (your h and k) with uniform distribution over your search space, and then the r^2 would be distributed ~N((x-x0)^2+(y-y0)^2, \sigma). You would then use MCMC sampling or similar to get your posterior confidence intervals.
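To make that concrete, here is a bare-bones Metropolis sketch of the idea in base R (flat priors, Gaussian radial residuals; purely illustrative, not a substitute for a proper MCMC package):
loglik <- function(p) {                        # p = (h, k, r, log sigma)
  res <- sqrt((x - p[1])^2 + (y - p[2])^2) - p[3]
  sum(dnorm(res, 0, exp(p[4]), log = TRUE))
}
set.seed(1)
n.iter <- 20000
chain <- matrix(NA_real_, n.iter, 4)
chain[1, ] <- c(1.5, 2.5, 1.7, log(0.2))       # start near the parStart values above
for (i in 2:n.iter) {
  prop <- chain[i - 1, ] + rnorm(4, 0, 0.05)   # random-walk proposal
  if (log(runif(1)) < loglik(prop) - loglik(chain[i - 1, ])) {
    chain[i, ] <- prop
  } else {
    chain[i, ] <- chain[i - 1, ]
  }
}
post <- chain[-(1:5000), ]                     # drop burn-in
quantile(post[, 3], c(0.025, 0.5, 0.975))      # rough credible interval for r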
Here's a solution to find h, k, r using base R's optim function. You essentially create a cost function that is a closure containing the data you wish to optimize over. I had to square the summed residuals, else the optimizer would drive the value to -Inf. There is a local-optima problem, so you need to run this a few times...
# data
x <- c(1,2.2,1,2.5,1.5,0.5,1.7)
y <- c(1,1,3,2.5,4,1.7,0.8)
residFunArg <- function(xVector, yVector) {
  function(theta, xVec = xVector, yVec = yVector) {
    # squared sum of the radial residuals (see note above)
    sum(sqrt((xVec - theta[1])^2 + (yVec - theta[2])^2) - theta[3])^2
  }
}
rFun = residFunArg(x,y);
o = optim(f=rFun,par=c(0,0,0))
h = o$par[1]
k = o$par[2]
r = o$par[3]
Run this command in the REPL a few times to observe the local minima:
o <- optim(par = runif(3), fn = rFun, method = "CG"); o$par
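One way to handle the local-minimum issue mentioned above is a simple multi-start: run optim from several random starting points and keep the best result (20 starts and the (0, 3) range are arbitrary choices):
set.seed(1)
fits <- replicate(20, optim(par = runif(3, 0, 3), fn = rFun), simplify = FALSE)
best <- fits[[which.min(sapply(fits, function(o) o$value))]]
best$par   # h, k, r from the best of the 20 runs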
