I am attempting to optimize (minimize) a function with two parameters that should have a unique solution.
foo <- function(x) {
  x1 <- x[1]
  x2 <- x[2]
  t = 5 - sqrt((0-x1)^2 + (0-x2)^2)
  u = 4 - sqrt((0-x1)^2 + (4-x2)^2)
  v = 3 - sqrt((3-x1)^2 + (0-x2)^2)
  return(sum(t, u, v))
}
optim(c(0,0), foo)
Those of you who loved middle school math may recognize the formula for the distance between two points on a Cartesian (x,y) coordinate plane. The function foo is written so that x1 is the x coordinate and x2 is the y coordinate of a point I'm trying to find. In this case, that point is (3,4). However, I'm getting a whacky output:
optim(c(0,0), foo)
$par
[1] -3.938866e+54 1.293779e+54
$value
[1] -1.243772e+55
$counts
function gradient
501 NA
$convergence
[1] 1
$message
NULL
Any idea what's going wrong?
As is, your function has no minimum, since t, u and v decrease as x1 and x2 go farther away from the three fixed points (0,0), (0,4) and (3,0).
It seems that you want to find a point (x1, x2) that minimizes the total distance to those three points. If that's the case, you should define foo as:
foo <- function(x) {
  x1 <- x[1]
  x2 <- x[2]
  t = 5 + sqrt((0-x1)^2 + (0-x2)^2)
  u = 4 + sqrt((0-x1)^2 + (4-x2)^2)
  v = 3 + sqrt((3-x1)^2 + (0-x2)^2)
  return(sum(t, u, v))
}
Note that the only change is -sqrt to +sqrt.
The answer is not the point (3,4) you expect; the minimum is at a point inside the triangle that has those three points as vertices.
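Running optim on the corrected function produces the result below; the call is assumed (only the output appeared in the original), with c(0, 0) as the presumed starting point:
# Presumed call that produced the output shown below
z <- optim(c(0, 0), foo)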
Solution:
> z
$par
[1] 0.7510095 0.6954136
$value
[1] 18.76643
$counts
function gradient
59 NA
$convergence
[1] 0
$message
NULL
The red point (shown in the original answer's plot) marks this solution.
The main issue is that the 3 and 4 were flipped: the distance to (0,4) should be 3 and the distance to (3,0) should be 4. The other problem is that your differences were signed, so moving far away from the fixed points makes the objective arbitrarily negative. Also note that the routine can converge to a local minimum which is not the global minimum. Instead:
foo <- function(x) {
  t = 5 - sqrt((0-x[1])^2 + (0-x[2])^2)
  u = 3 - sqrt((0-x[1])^2 + (4-x[2])^2)
  v = 4 - sqrt((3-x[1])^2 + (0-x[2])^2)
  return(sum(abs(t), abs(u), abs(v)))
}
optim(c(4,0), foo)
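For comparison, a least-squares variant is sketched below (my assumption, not part of the original answer: squared rather than absolute residuals). Since the distances from (3, 4) to the three fixed points are exactly 5, 3 and 4, this objective has a minimum of 0 there, so it should also converge near (3, 4):
# Sketch: square the residuals instead of taking abs()
foo_ls <- function(x) {
  t <- 5 - sqrt((0 - x[1])^2 + (0 - x[2])^2)
  u <- 3 - sqrt((0 - x[1])^2 + (4 - x[2])^2)
  v <- 4 - sqrt((3 - x[1])^2 + (0 - x[2])^2)
  t^2 + u^2 + v^2
}
optim(c(4, 0), foo_ls)  # should report par close to c(3, 4)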
I am using a simple polynomial to fit a curve.
poly <- function(a, b, c, x) a * x^2 + b * x + c
I'd like to find the value of x that results in the maximum value of the curve. Currently I create a grid with a range of x from 20000 to 50000, run the function for each row, then use max() on the result. It works, but I have a lot of groups and it creates a big dataframe every time I do it. It is very clunky and I feel like there must be a better way.
Some typical coefficients are:
a <- -0.000000179
b <- 0.011153167
c <- 9.896420781
If you rearrange your function so that the variable you want to maximize over comes first, and you set the default coefficient values like so:
poly <- function(x, a, b, c) a * x^2 + b * x + c
formals(poly)$a <- -0.000000179
formals(poly)$b <- 0.011153167
formals(poly)$c <- 9.896420781
Then you can use the optimize function to maximize over your interval:
optimize(poly, c(20000, 50000), maximum = T)
$`maximum`
[1] 31154.1
$objective
[1] 183.6298
Where $maximum is the x value at which the maximum occurs and $objective is the height.
If a is negative, the maximum of the parabola a * x^2 + b * x + c is reached at x = -b/(2*a):
a<0
#[1] TRUE
-b/(2*a)
#[1] 31154.1
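As a quick cross-check (my addition, not part of the original answer), plugging the vertex back into the polynomial should reproduce the height that optimize reported as $objective:
# Height of the parabola at its vertex; should match $objective (about 183.63)
x_max <- -b / (2 * a)
a * x_max^2 + b * x_max + c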
You could use optim. I think the other answers in this thread are more appealing, but I'll write this up for completeness:
a <- -0.000000179
b <- 0.011153167
c <- 9.896420781
o <- optim(
  par = list(x = 0),
  fn = function(x){ -poly(a, b, c, x = x) },
  method = "Brent",
  lower = -50e3, upper = 50e3
)
Output:
> o
$par
[1] 31154.1
$value
[1] -183.6298
$counts
function gradient
NA NA
$convergence
[1] 0
$message
NULL
I'd like to minimize several differences. For one difference, this seems straightforward:
target1 <- 1.887
data <- seq(0,1, by=.001)
#Step 1
somefunction <- function(dat, target1, X){
  #some random function...
  t <- sum(dat)
  y <- t * X
  #minimize this difference
  diff <- target1 - y
  return(diff)
}
V1 <- optimize(f = somefunction,
               interval = c(0, 1),
               dat = data,
               target1 = target1,
               maximum = T)
V1$maximum
6.610696e-05
#--> This value for `X` should minimize the difference...
V1$maximum * sum(data)
#0.03308653
#--> as close to zero as we get
Now, I'd like to minimize several differences in one step relying on optim but this does not work properly:
#Step 2
set.seed(1)
data2 <- data.frame(dat1 = seq(0, 1, by = .01),
                    dat2 = runif(101),
                    dat3 = runif(101))

somefunction_general <- function(dat, target1, target2, target3, X){
  #some random function...
  y  <- sum(dat[,1]) * X[1]
  y1 <- sum(dat[,2]) * X[2]
  y2 <- sum(dat[,3]) * X[3]
  #minimize these differences...
  diff1 <- target1 - y
  diff2 <- target2 - y1
  diff3 <- target3 - y2
  #almost certain that this is wrong...
  vtr <- sum(abs(diff1), abs(diff2), abs(diff3))
  return(vtr)
}
V2 <- optim(par = c(1, 1, 1),
            fn = somefunction_general,
            dat = data2,
            target1 = 1.8,
            target2 = 2,
            target3 = 4,
            control = list(fnscale = -1))
sum(data2[,1])
[1] 50.5
sum(data2[,2])
[1] 44.27654
sum(data2[,3])
[1] 51.73668
V2$par[1]*sum(data2[,1])
#[1] 1.469199e+45
V2$par[2]*sum(data2[,2])
#[1] 1.128977e+45
V2$par[3]*sum(data2[,3])
#[1] 2.923681e+45
Looks like there's some disagreement between the first function and the second? In the first function, you're returning target1-sum(dat)*X and then trying to find the maximum over X values in [0, 1].
But since you're returning the raw difference and not the absolute value, you're actually just maximizing -sum(dat)*X, or, equivalently, minimizing sum(dat)*X. Since the dat is constant, naturally the optimize function is going to return the smallest value on the interval each time (0 in the example).
For the first function, I think what you want to do is return the absolute value of the difference and then find the minimum rather than the maximum (a sketch of this fix appears after the output below). The fix for the second function, somefunction_general, is even simpler, since you're already returning sum(abs(diff1), abs(diff2), abs(diff3)): just make sure the minimum is returned by getting rid of control = list(fnscale = -1).
V2 <- optim(par = c(1, 1, 1),
            fn = somefunction_general,
            dat = data2,
            target1 = 1.8,
            target2 = 2,
            target3 = 4)
V2$par
[1] 0.03564358 0.03837754 0.07748929
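For completeness, here is a sketch of the first-function fix described above (assumptions on my part: take abs() inside the objective and minimize, i.e. drop maximum = T):
# Sketch: return the absolute difference and let optimize minimize it (its default)
somefunction_abs <- function(X, dat, target1) {
  abs(target1 - sum(dat) * X)
}
V1 <- optimize(f = somefunction_abs,
               interval = c(0, 1),
               dat = data,
               target1 = 1.887)
V1$minimum              # the X that brings sum(data) * X closest to 1.887
V1$minimum * sum(data)  # should be close to 1.887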
You should write the function so that optim works on it whether there is one parameter or several:
somefunction_general <- function(X, dat, target){
  dat <- as.matrix(dat)
  y <- colSums(dat) * X
  sum((target - y)^2)  # sum of squared errors, a common objective
}
Let us test this:
data2 <- data.frame(dat1 = seq(0, 1, by = .01),
                    dat2 = runif(101),
                    dat3 = runif(101))
data <- seq(0, 1, by = .001)
(a <- optim(0, somefunction_general, dat = data, target = 1.887, method = "BFGS"))
$par
[1] 0.00377023
$value
[1] 3.64651e-28
$counts
function gradient
25 3
$convergence
[1] 0
$message
NULL
We can note that the function value is essentially zero, thus the parameter a$par is what we want. Check this out:
a$par*sum(data)
[1] 1.887
We can also have 3 parameters and 1 target, e.g.:
(b <- optim(c(0,0,0), somefunction_general, dat = data2, target = 1.887))
$par
[1] 0.03736837 0.04262253 0.03647203
$value
[1] 4.579334e-08
$counts
function gradient
100 NA
$convergence
[1] 0
$message
NULL
b$par*colSums(data2)
dat1 dat2 dat3
1.887103 1.887178 1.886942
Each one almost got to the target of 1.887. Note that this is similar to running the first example 3 times.
Lastly:
(d <- optim(c(0,0,0), somefunction_general, dat = data2, target = c(1.8, 2, 4)))
$par
[1] 0.03564672 0.04516916 0.07730660
$value
[1] 2.004725e-07
$counts
function gradient
88 NA
$convergence
[1] 0
$message
NULL
The targets were achieved:
d$par*colSums(data2)
dat1 dat2 dat3
1.800160 1.999934 3.999587
This one function can work in n dimensions. Please use the method BFGS unless it does not converge.
What if there is one parameter with three targets? Well, this is quite difficult: unless a single parameter exists that can hit all three targets, it won't converge.
Suppose we say the parameter is 0.01; what is the target?
colSums(data2)*0.01
dat1 dat2 dat3
0.5050000 0.4427654 0.5173668
Okay, suppose we were given this target, can we get the 0.01 back?
(e <- optim(10, somefunction_general, dat = data2, target = c(0.505, 0.4427654, 0.5173668), method = "BFGS"))
$par
[1] 0.01
$value
[1] 7.485697e-16
$counts
function gradient
12 3
$convergence
[1] 0
$message
NULL
Huh, we were able to converge. This is because there was a parameter that could take us there. Note that I did change the starting point to 10.
I have 2 equations that need to be solved by using optim:
4*x^2 - 20*x + (1/4)*y^2 + 8 = 0
(1/2)*x*y^2 + 2*x - 5*y + 8 = 0
I have already run the code, but I'm confused about whether there should be one answer or two, because a function only returns the result of its last line.
Should I do it like this
> myfunc=function(x){
+ 4*x[1]^2-20*x[1]+(x[2]^2/4)+8
+ }
> optim(c(0,0),myfunc,method="BFGS")
and
> myfunc=function(x){
+ (1/2)*(x[1]*x[2]^2)+2*x[1]-5*x[2]+8
+ }
> optim(c(0,0),myfunc,method="BFGS")
or should I do it like this
> myfunc=function(x){
+ 4*x[1]^2-20*x[1]+(x[2]^2/4)+8
+ (1/2)*(x[1]*x[2]^2)+2*x[1]-5*x[2]+8
+ }
> optim(c(0,0),myfunc,method="BFGS")
For the second one it still gives me only the value of the second expression, so which method is correct?
Minimize the sum of the squares of the two expressions that should equal zero and ensure that the value at the optimum equals 0 (up to floating point approximation).
myfunc <- function(z) {
  x <- z[1]
  y <- z[2]
  (4*x^2 - 20*x + 1/4*y^2 + 8)^2 + (1/2*x*y^2 + 2*x - 5*y + 8)^2
}
optim(c(0, 0), myfunc)
giving:
$par
[1] 0.5000553 2.0002986
$value
[1] 1.291233e-06
$counts
function gradient
67 NA
$convergence
[1] 0
$message
NULL
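This is consistent with the exact solution (x, y) = (1/2, 2); substituting it back (a quick check I added, not part of the original answer) gives zero for both expressions:
# Check the exact solution (x, y) = (1/2, 2) in both equations
x <- 1/2; y <- 2
4*x^2 - 20*x + 1/4*y^2 + 8    # 0
1/2*x*y^2 + 2*x - 5*y + 8     # 0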
You can also use a package for solving systems of non-linear equations, such as nleqslv.
Slightly redefine your function by making it return a vector containing the result for each equation:
myfunc <- function(x){
  y <- numeric(length(x))
  y[1] <- 4*x[1]^2 - 20*x[1] + (x[2]^2/4) + 8
  y[2] <- (1/2)*(x[1]*x[2]^2) + 2*x[1] - 5*x[2] + 8
  y
}
Define a starting value for the solver
xstart <- c(0,0)
Then do this
library(nleqslv)
nleqslv(xstart,myfunc)
giving
$x
[1] 0.5 2.0
$fvec
[1] -1.472252e-09 -7.081979e-10
$termcd
[1] 1
$message
[1] "Function criterion near zero"
$scalex
[1] 1 1
$nfcnt
[1] 7
$njcnt
[1] 1
$iter
[1] 7
There are more packages that can solve systems of equations, such as BB and pracma.
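For example, BB follows the same convention of a function that returns one residual per equation (a sketch, assuming the BB package is installed):
# Sketch using BB: pass the starting vector and the residual function
library(BB)
BBsolve(xstart, myfunc)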
Suppose I have the following system of equations:
a * b = 5
sqrt(a * b^2) = 10
How can I solve these equations for a and b in R ?
I guess this problem can be stated as an optimisation problem, with the following function... ?
fn <- function(a, b) {
  rate <- a * b
  shape <- sqrt(a * b^2)
  return(c(rate, shape))
}
In a comment the poster specifically asks about using solve and optim so we show how to solve this (1) by hand, (2) using solve, (3) using optim and (4) a fixed point iteration.
1) by hand First note that if we write a = 5/b based on the first equation and substitute that into the second equation we get sqrt(5/b * b^2) = sqrt(5 * b) = 10 so b = 20 and a = 0.25.
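A quick numerical check of these values (my addition, not part of the original answer):
a <- 0.25; b <- 20
c(a * b, sqrt(a * b^2))  # 5 10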
2) solve Regarding the use of solve these equations can be transformed into linear form by taking the log of both sides giving:
log(a) + log(b) = log(5)
0.5 * (log(a) + 2 * log(b)) = log(10)
which can be expressed as:
m <- matrix(c(1, .5, 1, 1), 2)
exp(solve(m, log(c(5, 10))))
## [1] 0.25 20.00
3) optim Using optim we can write this where fn is from the question. fn2 is formed by subtracting off the RHS of the equations and using crossprod to form the sum of squares.
fn2 <- function(x) crossprod( fn(x[1], x[2]) - c(5, 10))
optim(c(1, 1), fn2)
giving:
$par
[1] 0.2500805 19.9958117
$value
[1] 5.51508e-07
$counts
function gradient
97 NA
$convergence
[1] 0
$message
NULL
4) fixed point For this one rewrite the equations in fixed point form, i.e. in the form c(a, b) = f(c(a, b)), and then iterate. In general, there will be several ways to do this and not all of them will converge, but in this case the following works. We use starting values of 1 for both a and b; dividing both sides of the first equation by b puts it in fixed point form, and dividing both sides of the second equation by sqrt(a) does the same for the second:
a <- b <- 1  # starting values
for(i in 1:100) {
  a = 5 / b
  b = 10 / sqrt(a)
}
data.frame(a, b)
##      a  b
## 1 0.25 20
Use this library.
library("nleqslv")
You need to define the multivariate function you want to solve for.
fn <- function(x) {
  rate <- x[1] * x[2] - 5
  shape <- sqrt(x[1] * x[2]^2) - 10
  return(c(rate, shape))
}
Then you're good to go.
nleqslv(c(1,5), fn)
Always look at the detailed results. Numerical calculations can be tricky. In this case I got this:
Warning message:
In sqrt(x[1] * x[2]^2) : NaNs produced
That just means the procedure searched a region that included x[1] < 0 and then presumably noped the heck back to the right hand side of the plane.
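If the warning bothers you, one option (my suggestion, not from the original answer) is to square the second equation away: since sqrt(a * b^2) = 10 implies a * b^2 = 100, the system below is equivalent and never evaluates sqrt on a negative argument:
# Equivalent system with the sqrt removed
fn_sq <- function(x) {
  c(x[1] * x[2] - 5,        # a * b = 5
    x[1] * x[2]^2 - 100)    # a * b^2 = 100, from squaring sqrt(a * b^2) = 10
}
nleqslv(c(1, 5), fn_sq)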
Could you please explain to me the difference between these two cases?
> cor(1:10, rep(10,10))
[1] NA
Warning message:
In cor(1:10, rep(10, 10)) : the standard deviation is zero
> cor(1:10, 1:10)
[1] 1
The first one is just a straight line, as is the second, so I would expect the correlation to be one. What am I not considering? Thanks
Plot the data and it should be clear. The data set
## y doesn't vary
plot(1:10, rep(10,10))
is just a horizontal line. The correlation coefficient is undefined for a horizontal line, since the estimate of the standard deviation of y is 0 (this appears in the denominator of the correlation coefficient). While
plot(1:10, 1:10)
is the line:
y = x
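A quick look at the standard deviations makes the point (a small illustration I added):
sd(rep(10, 10))  # 0, so the denominator of cor() is zero and the result is NA
sd(1:10)         # positive, so cor(1:10, 1:10) is well defined (and equals 1)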
If you want to measure how much "in line" the points are,
you can use (one minus) the ratio of the eigenvalues of the variance matrix.
f <- function(x, y) {
  e <- eigen(var(cbind(x, y)))$values
  1 - e[2] / e[1]
}
# To have values closer to 0, you can square that quantity.
f <- function(x, y) {
  e <- eigen(var(cbind(x, y)))$values
  (1 - e[2] / e[1])^2
}
f( 1:10, 1:10 )
f( 1:10, rep(1,10) )
f( rnorm(100), rnorm(100) ) # Close to 0
f( rnorm(100), 2 * rnorm(100) ) # Closer to 1
f( 2 * rnorm(100), rnorm(100) ) # Similar
It will be 1 if the points are aligned and 0 if the cloud they form has a spherical shape; it is invariant under translations and rotations, non-negative, and symmetric.
If your situation is not symmetric, i.e., if x and y do not play the same role,
the regression-based approach suggested in Roland's comment makes more sense.