I found the parameters a and b of the above equation by fitting a linear model to log(y) = log(a) + b*log(X). I now want to back-transform the model and plot the non-linear line following the equation y = a*X^b in R. I understand there are functions in R to fit non-linear models (e.g., nls()); however, I am not interested in fitting a non-linear model, I only want to plot the non-linear line whose parameters were found using the log-log transformation. Any suggestions?
Thank you in advance!
If you have a fully parameterized equation, you just need to make a vector of the domain you want to view (the X values), directly compute the Y values, and plot them.
a=1; b=2;
x = seq(-10, 10, 0.1)
y = a*(x^b)
plot(x,y)
You can try the code below
a <- 1
b <- 2
f <- function(x) a * x^b
curve(f, -10, 10)
and you will see the same curve (for these example parameters, a parabola).
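To tie this back to the original question, here is a minimal sketch that pulls a and b out of the log-log fit and overlays the back-transformed line on the data (the object names X and y are placeholders for your own data):
cc <- coef(lm(log(y) ~ log(X)))
a <- exp(cc[[1]]); b <- cc[[2]]            # back-transform the intercept; the slope is the exponent
plot(X, y)
curve(a * x^b, add = TRUE, col = "red")    # the line y = a*X^b over the plotted x-range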
I'm working with LTV prediction and am stuck on a problem.
I need to solve the power equation a*x**b = y, where x and y are variables (I know the first 30 values of each), while a and b are unknown constants.
The task is to find a and b such that the predicted y has the smallest squared deviation from the known values.
So far I have only found a solution in Excel:
A=EXP(INDEX(LINEST(LN(Known Ys), LN(Known Xs)), 2))
B=INDEX(LINEST(LN(Known Ys), LN(Known Xs)), 1)
In R this should be something like
## fit a log-log model and extract coefficients
cc <- coef(lm(log(y) ~ log(x)))
## the slope of the log-log model is the exponent
b <- cc[["y"]]
## exp(intercept) is the multiplicative coefficient
a <- exp(cc[["(Intercept)"]])
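A quick sanity check with simulated data (the true values a = 2 and b = 0.5 below are made up for illustration):
set.seed(1)
xs <- 1:30
ys <- 2 * xs^0.5 * exp(rnorm(30, sd = 0.05))   # y = a*x^b with multiplicative noise
cc <- coef(lm(log(ys) ~ log(xs)))
exp(cc[["(Intercept)"]])   # close to 2
cc[["log(xs)"]]            # close to 0.5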
Note that these solutions minimize the squared error on the log scale: if you want to minimize the squared error on the linear scale, you need to fit
glm(y~log(x), family=gaussian(link="log"))
and then extract its coefficients etc.
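For completeness, a sketch of the linear-scale fit and its coefficient extraction (same naming as above; the fit may need starting values if the default iteration diverges):
fit <- glm(y ~ log(x), family = gaussian(link = "log"))
a <- exp(coef(fit)[["(Intercept)"]])   # multiplicative coefficient
b <- coef(fit)[["log(x)"]]             # exponent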
I am trying to fit a GAM using the gam package (I know mgcv is more flexible, but I need to use gam here). The problem is that the fitted model looks good, but compared with the original data it appears to be offset along the y-axis by a constant value, and I cannot figure out where this offset comes from.
This code reproduces the problem:
library(gam)
data(gam.data)
x <- gam.data$x
y <- gam.data$y
fit <- gam(y ~ s(x,6))
fit$coefficients
#(Intercept) s(x, 6)
# 1.921819 -2.318771
plot(fit, ylim = range(y))
points(x, y)
points(x, y -1.921819, col=2)
legend("topright", pch=1, col=1:2, legend=c("Original", "Minus intercept"))
Chambers, J. M. and Hastie, T. J. (1993) Statistical Models in S (Chapman & Hall) shows that there should not be an offset, and this is also intuitively correct (the smooth should describe the data).
I noticed something comparable in mgcv, where it can be solved by passing the model's intercept as the shift parameter (because the smooth is seemingly centred). I thought the same could be true here, so I subtracted the intercept from the original data points. However, the plot above shows this idea to be wrong, and I don't know where the extra shift comes from. I hope someone here may be able to help me.
(R version. 3.3.1; gam version 1.12)
I think I should first explain the various outputs in the fitted GAM model:
library(gam)
data(gam.data)
x <- gam.data$x
y <- gam.data$y
fit <- gam(y ~ s(x,6), model = FALSE)
## coefficients for parametric part
## this includes intercept and null space of spline
beta <- coef(fit)
## null space of spline smooth (a linear term, just `x`)
nullspace <- fit$smooth.frame[,1]
nullspace - x ## all 0
## smooth space that are penalized
## note, the backfitting procedure guarantees that this is centred
pensmooth <- fit$smooth[,1]
sum(pensmooth) ## centred
# [1] 5.89806e-17
## estimated smooth function (null space + penalized space)
smooth <- nullspace * beta[2] + pensmooth
## centred smooth function (this is what `plot.gam` is going to plot)
c0 <- mean(smooth)
censmooth <- smooth - c0
## additive predictors (this is just fitted values in Gaussian case)
addpred <- beta[1] + smooth
You can first verify that addpred is what fit$additive.predictors gives; and since we are fitting an additive model with Gaussian response, it is also the same as fit$fitted.values.
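A quick check of those claims (a minimal sketch; as.numeric drops names so all.equal compares values only):
all.equal(as.numeric(addpred), as.numeric(fit$additive.predictors))   # TRUE
all.equal(as.numeric(addpred), as.numeric(fit$fitted.values))         # TRUE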
What plot.gam does, is to plot censmooth:
plot.gam(fit, col = 4, ylim = c(-1.5,1.5))
points(x, censmooth, col = "gray")
Remember, there is
addpred = beta[1] + censmooth + c0
If you want to shift the original data y to match this plot, you need to subtract not only the intercept (beta[1]) but also c0 from y:
points(x, y - beta[1] - c0)
I have a problem with fitting a curve to a 3D point set (or point cloud) in space. The curve-fitting tools I have looked at mostly create a surface when given a point set [x,y,z], but that is not what I want: I would like to fit a curve to the point set, not a surface.
So please suggest the best solution for fitting a curve in space (3D).
In particular, my data looks like a polynomial curve in 3D.
The equation is
z ~ a*x^2 + b*x*y + c*y^2 + d
and there are no pre-estimated coefficients [a, b, c, d].
Thanks.
xyz <- read.table( text="x y z
518315,750 4328698,260 101,139
518315,429 4328699,830 101,120
518315,570 4328700,659 101,139
518315,350 4328702,050 101,180
518315,389 4328702,849 101,190
518315,239 4328704,020 101,430", header=TRUE, dec=",")
With a bit of data we can now demonstrate a rather hackish effort in the direction you suggest, although this really is estimating a surface, despite your best efforts to convince us otherwise:
xyz <- read.table(text="x y z
518315,750 4328698,260 101,139
518315,429 4328699,830 101,120
518315,570 4328700,659 101,139
518315,350 4328702,050 101,180
518315,389 4328702,849 101,190
518315,239 4328704,020 101,430", header=TRUE, dec=",")
lm( z ~ I(x^2)+I(x*y) + I(y^2), data=xyz)
#---------------
Call:
lm(formula = z ~ I(x^2) + I(x * y) + I(y^2), data = xyz)
Coefficients:
(Intercept) I(x^2) I(x * y) I(y^2)
-1.182e+05 -3.187e-07 9.089e-08 NA
The collinearity of x^2 and x*y with y^2 is preventing an estimate of the y^2 coefficient: with x nearly constant in this sample, the three terms are essentially proportional to one another. You can also use nls to estimate parameters for non-linear surfaces.
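For instance, here is a hedged sketch of that nls() route; the starting values are guesses, the variables are centred to tame the huge magnitudes, and the call may still fail to converge on this tiny, nearly collinear sample:
xyz2 <- transform(xyz, xc = x - mean(x), yc = y - mean(y))   # centre x and y
fit <- try(nls(z ~ a*xc^2 + b*xc*yc + c*yc^2 + d, data = xyz2,
               start = list(a = 0, b = 0, c = 0, d = mean(xyz$z))))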
I suppose that you want to fit a parametrized curve of this type:
r(t) = a + bt + ct^2
Therefore, you will have to do three independent fits:
x = ax + bx*t + cx*t^2
y = ay + by*t + cy*t^2
z = az + bz*t + cz*t^2
and obtain nine fitting parameters ax, ay, az, bx, by, bz, cx, cy, cz. Your data contains the positions x, y, z, and you also need to include the time variable t = 1, 2, 3, ..., N (one value per point), assuming the points are sampled at equal time intervals; a sketch of this case follows.
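For the equally spaced case this is just three ordinary polynomial regressions. A minimal sketch, assuming the rows of the data frame xyz are in sampling order:
t <- seq_len(nrow(xyz))
fit.x <- lm(x ~ t + I(t^2), data = xyz)   # gives ax, bx, cx
fit.y <- lm(y ~ t + I(t^2), data = xyz)   # gives ay, by, cy
fit.z <- lm(z ~ t + I(t^2), data = xyz)   # gives az, bz, cz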
If the 'time' parameter of your data points is unknown/random, then I suppose you will have to estimate it yourself as another fitting parameter, one per data point. So what I suggest is the following (a rough R sketch appears after this list):
1. Assume some reasonable parameters a, b, c.
2. Write a function that calculates the time t_i of each data point by minimizing the squared distance between that point and the tentative curve r(t).
3. Calculate the sum of all (r(t_i) - R_i)^2 between the curve and your dataset R. This will be your fitting score, or figure of merit.
4. Use a global optimizer, such as Matlab's genetic algorithm routine ga(), to find the a, b, c that minimize the figure of merit defined above.
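Here is the rough R sketch of steps 1-4 promised above. It uses plain optim() instead of a genetic algorithm (an assumption on my part, since the answer suggests Matlab's ga()):
## r(t) = a + b*t + c*t^2, fitted to the rows of P with unknown times t_i
fit3dcurve <- function(P) {   # P: n x 3 numeric matrix of points
  n <- nrow(P)
  ## step 2: best time for one point, minimizing squared distance to the curve
  t_best <- function(p, a, b, c)
    optimize(function(t) sum((a + b*t + c*t^2 - p)^2), c(0, n + 1))$minimum
  ## step 3: figure of merit = total squared distance over all points
  fom <- function(theta) {
    a <- theta[1:3]; b <- theta[4:6]; c <- theta[7:9]
    sum(sapply(seq_len(n), function(i) {
      t <- t_best(P[i, ], a, b, c)
      sum((a + b*t + c*t^2 - P[i, ])^2)
    }))
  }
  ## step 1: reasonable start -- a straight line through the first and last points
  b0 <- (P[n, ] - P[1, ]) / (n - 1)
  start <- c(P[1, ] - b0, b0, rep(0, 3))
  ## step 4: minimize the figure of merit (a local optimizer here, not a GA)
  optim(start, fom, method = "BFGS")
}
fit3dcurve(as.matrix(xyz))   # xyz as read in above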
Good luck!
I have fitted a logistic regression model that takes 3 variables into account. I would like to make a 3D plot of the datapoints and draw the decision boundary (which I suppose would be a plane here).
I found an online example that applies to the case (so that you can load the data directly)
mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
mylogit <- glm(admit ~ gre + gpa + rank, data = mydata, family = "binomial")
I was thinking of using the scatterplot3d package, but I am not sure what equation I should use to draw the boundary. Any ideas?
Many thanks,
The decision boundary will be a 3-d plane, which you could plot with any 3-d plotting package in R. The boundary is where the linear predictor equals zero (i.e., predicted probability 0.5), so solving coef[1] + coef[2]*x + coef[3]*y + coef[4]*z = 0 for z gives the plane. I'll use persp, defining an x-y grid and then calculating the corresponding z values with the outer function:
# Use iris dataset for example logistic regression
data(iris)
iris$long <- as.numeric(iris$Sepal.Length > 6)
mod <- glm(long~Sepal.Width+Petal.Length+Petal.Width, data=iris, family="binomial")
# Plot 50% decision boundary; another cutoff can be achieved by changing the intercept term
x <- seq(2, 5, by=.1)
y <- seq(1, 7, by=.1)
z <- outer(x, y, function(x, y) (-coef(mod)[1] - coef(mod)[2]*x - coef(mod)[3]*y) /
coef(mod)[4])
persp(x, y, z, col="lightblue")
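As a follow-up: persp() invisibly returns the projection matrix, so you can overlay the original data points with trans3d() by re-drawing the surface and capturing that matrix (the viewing angles below are arbitrary choices, and points outside the plotted z-range may land outside the box):
pmat <- persp(x, y, z, col = "lightblue", theta = 30, phi = 20)
with(iris, points(trans3d(Sepal.Width, Petal.Length, Petal.Width, pmat),
                  col = ifelse(long == 1, "red", "blue"), pch = 16))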
I want to use y=a^(b^x) to fit the data below,
y <- c(1.0385, 1.0195, 1.0176, 1.0100, 1.0090, 1.0079, 1.0068, 1.0099, 1.0038)
x <- c(3,4,5,6,7,8,9,10,11)
data <- data.frame(x,y)
When I use the non-linear least squares procedure,
f <- function(x,a,b) {a^(b^x)}
(m <- nls(y ~ f(x,a,b), data = data, start = c(a=1, b=0.5)))
it produces an error: singular gradient matrix at initial parameter estimates. The result is roughly a = 1.1466, b = 0.6415, so there shouldn't be a problem with the initial parameter estimates, which I have defined as a = 1, b = 0.5.
I have read in other topics that it is convenient to transform the model. I was thinking about something like log(y) = (b^x) * log(a), but I don't know how to deal with the function specification. Any ideas?
I will expand my comment into an answer.
If I use the following:
y <- c(1.0385, 1.0195, 1.0176, 1.0100, 1.0090, 1.0079, 1.0068, 1.0099, 1.0038)
x <- c(3,4,5,6,7,8,9,10,11)
data <- data.frame(x,y)
f <- function(x,a,b) {a^b^x}
(m <- nls(y ~ f(x,a,b), data = data, start = c(a=0.9, b=0.6)))
or
(m <- nls(y ~ f(x,a,b), data = data, start = c(a=1.2, b=0.4)))
I obtain:
Nonlinear regression model
model: y ~ f(x, a, b)
data: data
a b
1.0934 0.7242
residual sum-of-squares: 0.0001006
Number of iterations to convergence: 10
Achieved convergence tolerance: 3.301e-06
I always obtain an error if I use 1 as a starting value for a, perhaps because 1 raised to any power is 1, which leaves the model insensitive to b at that starting point.
As for automatically generating starting values, I am not familiar with a procedure to do that. One method I have read about is to simulate curves and use starting values that generate a curve that appears to approximate your data.
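A minimal sketch of that idea: loop over a small grid of candidate starting values and keep the first pair for which nls() converges (the grid bounds here are arbitrary guesses):
grid <- expand.grid(a = seq(0.8, 1.3, by = 0.1), b = seq(0.2, 0.9, by = 0.1))
for (i in seq_len(nrow(grid))) {
  m <- try(nls(y ~ a^(b^x), data = data,
               start = list(a = grid$a[i], b = grid$b[i])), silent = TRUE)
  if (!inherits(m, "try-error")) break   # stop at the first converged fit
}
coef(m)   # should be near a = 1.0934, b = 0.7242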
Here is the plot generated from the above parameter estimates, using the following code. I admit that the lower-right portion of the line could perhaps fit a little better:
setwd('c:/users/mmiller21/simple R programs/')
jpeg(filename = "nlr.plot.jpeg")
plot(x,y)
curve(1.0934^(0.7242^x), from=0, to=11, add=TRUE)
dev.off()