Curve fitting "best fit in 3d " with matlab or R - r

I have a problem fitting a curve to a 3D point set (point cloud) in space. The curve-fitting tools I have looked at mostly create a surface when given a point set [x,y,z], but that is not what I want: I would like to fit a curve to the point set, not a surface.
So please tell me the best solution for curve fitting in space (3D).
In particular, my data look like a polynomial curve in 3D.
The equation is
z ~ ax^2 + bxy + cy^2 + d
and there are no pre-estimated coefficients [a,b,c,d].
Thanks.
xyz <- read.table( text="x y z
518315,750 4328698,260 101,139
518315,429 4328699,830 101,120
518315,570 4328700,659 101,139
518315,350 4328702,050 101,180
518315,389 4328702,849 101,190
518315,239 4328704,020 101,430", header=TRUE, dec=",")

With a bit of data we can now demonstrate a rather hackish effort in the direction you suggest, although this really is estimating a surface, despite your best efforts to convince us otherwise:
xyz <- read.table(text="x y z
518315,750 4328698,260 101,139
518315,429 4328699,830 101,120
518315,570 4328700,659 101,139
518315,350 4328702,050 101,180
518315,389 4328702,849 101,190
518315,239 4328704,020 101,430", header=TRUE, dec=",")
lm( z ~ I(x^2)+I(x*y) + I(y^2), data=xyz)
#---------------
Call:
lm(formula = z ~ I(x^2) + I(x * y) + I(y^2), data = xyz)
Coefficients:
(Intercept)       I(x^2)     I(x * y)       I(y^2)
 -1.182e+05   -3.187e-07    9.089e-08           NA
The collinearity of x^2 and x*y with y^2 is preventing an estimate of the y^2 coefficient, since y = (x*y)/x. You can also use nls to estimate parameters for non-linear surfaces (a sketch follows).
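For illustration, here is a minimal nls sketch on simulated data (the six sample points above are too nearly collinear to be a useful test case; the coefficients and noise level below are made up):
## simulate a quadratic surface and recover its parameters with nls
set.seed(1)
sim <- data.frame(x = runif(50), y = runif(50))
sim$z <- 2*sim$x^2 - sim$x*sim$y + 0.5*sim$y^2 + 3 + rnorm(50, sd = 0.01)
nfit <- nls(z ~ a*x^2 + b*x*y + c*y^2 + d, data = sim,
            start = list(a = 1, b = 0, c = 1, d = 0))
coef(nfit)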

I suppose that you want to fit a parametrized curve of this type:
r(t) = a + bt + ct^2
Therefore, you will have to do three independent fits:
x = ax + bx*t + cx*t^2
y = ay + by*t + cy*t^2
z = az + bz*t + cz*t^2
and obtain nine fitting parameters ax, ay, az, bx, by, bz, cx, cy, cz. Your data contain the positions x, y, z; you also need to include a time variable t = 1, 2, 3, ..., n, assuming the points are sampled at equal time intervals.
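A minimal R sketch of the three fits, reusing the xyz data read above and assuming equal time spacing:
## add a time index t = 1..n (equal spacing assumed)
d <- transform(xyz, t = seq_len(nrow(xyz)))
fx <- lm(x ~ t + I(t^2), data = d)
fy <- lm(y ~ t + I(t^2), data = d)
fz <- lm(z ~ t + I(t^2), data = d)
## the nine parameters (ax,bx,cx, ay,by,cy, az,bz,cz)
rbind(x = coef(fx), y = coef(fy), z = coef(fz))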
If the 'time' parameter of your data points is unknown/random, then I suppose that you will have to estimate it yourself as another fitting parameter, one per data point. So what I suggest is the following (an R sketch follows the list):
1) Assume some reasonable parameters a, b, c.
2) Write a function which calculates the time t_i of each data point by minimizing the square distance between that point and the tentative curve r(t).
3) Calculate the sum of all (r(t)-R(t))^2 between the curve and your dataset R. This will be your fitting score, or the Figure of Merit.
4) Use MATLAB's genetic algorithm ga() routine to obtain optimal a, b, c which minimize the Figure of Merit as defined above.
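Since the question also mentions R, here is a structural sketch of that scheme using the general-purpose optim() in place of MATLAB's ga(); the parameter layout, the t search interval, and the zero starting values are illustrative assumptions only:
## p = (ax,bx,cx, ay,by,cy, az,bz,cz); layout is an assumption
r_t <- function(p, t) c(p[1] + p[2]*t + p[3]*t^2,
                        p[4] + p[5]*t + p[6]*t^2,
                        p[7] + p[8]*t + p[9]*t^2)
## Figure of Merit: for each point, find the t minimizing its squared
## distance to the curve, then sum those minimal squared distances
fom <- function(p, R) {
  sum(apply(R, 1, function(pt)
    optimize(function(t) sum((r_t(p, t) - pt)^2),
             interval = c(0, 10))$objective))  # t range assumed
}
## centre the data first; raw coordinates this large would defeat zero starts
R <- scale(as.matrix(xyz), scale = FALSE)
fit <- optim(rep(0, 9), fom, R = R, control = list(maxit = 5000))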
Good luck!

Related

Plotting a line y = aX^b with known a and b parameters

I found the parameters a and b of the above equation by fitting a linear model to log(y) = log(a) + b*log(X). I want to back-transform the model into a non-linear plot of the line following the equation y = aX^b using R. I understand there are functions in R to fit such a model (e.g., nls()); however, I am not interested in fitting a non-linear model, I only want to plot the non-linear line that was found using the log-log transformation. Any suggestions?
Thank you in advance!
If you have a fully parameterized equation, you just need to make a vector of the domain you want to view (the X values), directly compute the Y values, and plot them.
a <- 1; b <- 2
x <- seq(-10, 10, 0.1)
y <- a * x^b
plot(x, y)
You can try the code below
a <- 1
b <- 2
f <- function(x) a * x^b
curve(f, -10, 10)
and you will see the curve drawn over the interval from -10 to 10.

How to find coefficients in a power function?

I'm working on LTV prediction and am stuck on a problem.
I need to solve the power equation a*x**b = y, where x and y are variables (I know the first 30 pairs) and a and b are constants which I don't know.
The task is to find a and b such that the predicted y has the smallest squared deviation from the known values.
So far I have only found a solution in Excel.
A=EXP(INDEX(LINEST(LN(Known Ys), LN(Known Xs)), 2))
B=INDEX(LINEST(LN(Known Ys), LN(Known Xs)), 1)
In R this should be something like
## fit a log-log model and extract coefficients
cc <- coef(lm(log(y) ~ log(x)))
## the slope of the log-log model is the exponent
b <- cc[["y"]]
## exp(intercept) is the multiplicative coefficient
a <- exp(cc[["(Intercept)"]])
Note that these solutions minimize the squared error on the log scale: if you want to minimize the squared error on the linear scale, you need to fit
glm(y~log(x), family=gaussian(link="log"))
and then extract its coefficients etc.
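A quick simulated check of the log-log recipe (the true values a = 2.5 and b = 0.8 are made up for illustration):
set.seed(42)
x <- 1:30
y <- 2.5 * x^0.8 * exp(rnorm(30, sd = 0.05))  # multiplicative noise
cc <- coef(lm(log(y) ~ log(x)))
c(a = exp(cc[["(Intercept)"]]), b = cc[["log(x)"]])
## both estimates should come back close to 2.5 and 0.8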

Multiple linear regression: Plot a straight line with confidence intervals

Here is my question:
1) I ran a multiple linear regression, something like:
lm(attitude ~ quality + price + location + Income)
I mainly care about the relationship between attitude and quality; the other variables are control variables.
2) Then I wanted to do a scatter plot between attitude and quality. It is easy:
Q <- ggplot(data = data, aes(x = quality, y = attitude))
Q + geom_point(size = 1)
3) I further wanted to plot the fitted line between x and y, where the slope is the partial regression coefficient from the multiple linear regression. That is, it should be b1 in attitude = b1*quality + b2*price + b3*location + b4*Income, rather than b in attitude = b*quality. The following code therefore does not work correctly, as it plots the slope b rather than b1.
g <- g + geom_smooth(method = lm)
Someone asked a very similar question, see here
The answer provided looks like this (replaced with my variables):
g <- g + geom_smooth(data=data, aes(x=quality, y=attitude, ymin=lcl, ymax=ucl))
However, this is a LOWESS plot (as you can see from the figure posted there), not a straight-line fit.
My question: how can I add a straight line of slope b1, with confidence interval band?
As far as I know, if you want to see b1 you should draw partial regression (added-variable) plots.
In this case:
1) regress attitude on every variable except quality,
2) regress quality on the other predictors,
3) plot the residuals of these fits against each other:
library(ggplot2)
library(dplyr)  # for %>%

X <- data.frame(
  x = resid(lm(quality ~ . - attitude, data = data)),
  y = resid(lm(attitude ~ . - quality, data = data))
)
X %>%
  ggplot(aes(x, y)) +
  geom_point() +          # show the residual pairs
  geom_smooth(method = "lm")
The slope of this line equals b1, although the plotted points are residuals rather than the original x, y values.
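A simulated check that the added-variable slope reproduces b1 (variable names follow the question; the data-generating values are made up):
set.seed(1)
data <- data.frame(quality = rnorm(100), price = rnorm(100),
                   location = rnorm(100), Income = rnorm(100))
data$attitude <- 2*data$quality + 0.5*data$price + rnorm(100)
b1 <- coef(lm(attitude ~ quality + price + location + Income, data))["quality"]
av <- coef(lm(resid(lm(attitude ~ . - quality, data)) ~
              resid(lm(quality ~ . - attitude, data))))[2]
c(b1, added_variable_slope = av)  # the two slopes agree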

`gam` package: extra shift spotted when sketching data on `plot.gam`

I am trying to fit a GAM using the gam package (I know mgcv is more flexible, but I need to use gam here). The problem is that the model looks good, but compared with the original data it seems to be offset along the y-axis by a constant value, and I cannot figure out where this offset comes from.
This code reproduces the problem:
library(gam)
data(gam.data)
x <- gam.data$x
y <- gam.data$y
fit <- gam(y ~ s(x,6))
fit$coefficients
#(Intercept) s(x, 6)
# 1.921819 -2.318771
plot(fit, ylim = range(y))
points(x, y)
points(x, y -1.921819, col=2)
legend("topright", pch=1, col=1:2, legend=c("Original", "Minus intercept"))
Chambers, J. M. and Hastie, T. J. (1993) Statistical Models in S (Chapman & Hall) shows that there should not be an offset, and this is also intuitively correct (the smooth should describe the data).
I noticed something comparable in mgcv, which can be solved by passing the model's intercept to the shift parameter (because the smooth is apparently centred). I thought the same might be true here, so I subtracted the intercept from the original data points. However, the plot above shows this idea is wrong. I don't know where the extra shift comes from; I hope someone here can help.
(R version 3.3.1; gam version 1.12)
I think I should first explain various output in the fitted GAM model:
library(gam)
data(gam.data)
x <- gam.data$x
y <- gam.data$y
fit <- gam(y ~ s(x,6), model = FALSE)
## coefficients for parametric part
## this includes intercept and null space of spline
beta <- coef(fit)
## null space of spline smooth (a linear term, just `x`)
nullspace <- fit$smooth.frame[,1]
nullspace - x ## all 0
## smooth space that are penalized
## note, the backfitting procedure guarantees that this is centred
pensmooth <- fit$smooth[,1]
sum(pensmooth) ## centred
# [1] 5.89806e-17
## estimated smooth function (null space + penalized space)
smooth <- nullspace * beta[2] + pensmooth
## centred smooth function (this is what `plot.gam` is going to plot)
c0 <- mean(smooth)
censmooth <- smooth - c0
## additive predictors (this is just fitted values in Gaussian case)
addpred <- beta[1] + smooth
You can first verify that addpred is what fit$additive.predictors gives; since we are fitting an additive model with Gaussian response, it is also the same as fit$fitted.values.
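A quick check of those identities (using the objects defined above):
all.equal(unname(addpred), unname(fit$additive.predictors))
all.equal(unname(addpred), unname(fit$fitted.values))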
What plot.gam does, is to plot censmooth:
plot.gam(fit, col = 4, ylim = c(-1.5,1.5))
points(x, censmooth, col = "gray")
Remember, there is
addpred = beta[1] + censmooth + c0
so if you want to shift your original data y to match this plot, you need to subtract not only the intercept (beta[1]) but also c0 from y:
points(x, y - beta[1] - c0)

Fitting logarithmic curve in R

If I have a set of points in R that are linear I can do the following to plot the points, fit a line to them, then display the line:
x=c(61,610,1037,2074,3050,4087,5002,6100,7015)
y=c(0.401244, 0.844381, 1.18922, 1.93864, 2.76673, 3.52449, 4.21855, 5.04368, 5.80071)
plot(x,y)
Estimate = lm(y ~ x)
abline(Estimate)
Now, suppose I have a set of points where a logarithmic curve fit is more appropriate, such as the following:
x=c(61,610,1037,2074,3050,4087,5002,6100,7015)
y=c(0.974206,1.16716,1.19879,1.28192,1.30739,1.32019,1.35494,1.36941,1.37505)
I know I can get the standard regression fit against the log of the x values with the following:
logEstimate = lm(y ~ log(x))
But then how do I transform that logEstimate back to normal scaling and plot the curve against my linear curve from earlier?
Hmmm, I'm not quite sure what you mean by "plot the curve against my linear curve from earlier".
d <- data.frame(x,y) ## need to use data in a data.frame for predict()
logEstimate <- lm(y~log(x),data=d)
Here are three ways to get predicted values:
(1) Use predict:
plot(x,y)
xvec <- seq(1,7000,length=101)  ## start above 0 to avoid log(0)
logpred <- predict(logEstimate,newdata=data.frame(x=xvec))
lines(xvec,logpred)
(2) Extract the numeric coefficient values:
coef(logEstimate)
## (Intercept) log(x)
## 0.6183839 0.0856404
curve(0.61838+0.08564*log(x),add=TRUE,col=2)
(3) Use with() magic (you need back-quotes around the parameter estimate names because they contain parentheses)
with(as.list(coef(logEstimate)),
curve(`(Intercept)`+`log(x)`*log(x),add=TRUE,col=4))
Maybe what you want is
est1 <- predict(lm(y~x,data=d),newdata=data.frame(x=xvec))
plot(est1,logpred)
... although I'm not sure why ...
I'm not exactly sure what you mean either... but I guessed a little differently. I think you want to fit two models to those points, one linear and one logarithmic, and then plot the points along with the fitted curves of both models. Here is the code for that:
x=c(61,610,1037,2074,3050,4087,5002,6100,7015)
y=c(0.974206,1.16716,1.19879,1.28192,1.30739,1.32019,1.35494,1.36941,1.37505)
Estimate = lm(y ~ x)
logEstimate = lm(y ~ log(x))
plot(x,predict(Estimate),type='l',col='blue')
lines(x,predict(logEstimate),col='red')
points(x,y)
In response to your second question in the comment: linear regression always returns a linear combination of your predictors, but that doesn't necessarily mean the result is a straight line in x. Think about what the log transformation really means. If you fit
y = log(x)
that is the same as fitting
exp(y) = x
which means that x must grow exponentially for y to grow linearly; equivalently, y grows only logarithmically as x increases, which is clearly not a straight line. However, if you transform your x-axis to the log scale, the displayed line will be straight.
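To see that directly (reusing x, y, and logEstimate from above):
## on a log-scaled x-axis the logarithmic fit plots as a straight line
cf <- coef(logEstimate)
plot(x, y, log = "x")
curve(cf[1] + cf[2]*log(x), add = TRUE, col = 2)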
