I have two functions, f(x) and g(x): f(x) is the objective function to minimize, and g(x) is its gradient. My problem is that for each trial x, the body of f(x) computes a complicated matrix A(x), which is also needed in g(x). For the sake of efficiency, I don't want g(x) to repeat the computation of A. I am considering making A(x) global by assigning A <<- ... in the body of f(x), so that g(x) can use A(x) directly. Because I don't know how optim in R interleaves calls to f(x) and g(x), I am not sure whether this strategy is correct and efficient. Any suggestions and comments are welcome. Thanks.
Because you don't know how optim is going to call f and g, you have to make sure that any stashed A(x) comes from the same x when you need it. It might call f(x1), f(x2), f(x3) and then g(x1).
One solution might be memoisation:
http://cran.r-project.org/web/packages/memoise/index.html
A memoised A(x) will store the return value for given input values and return that when given the same input values without recomputing. Obviously, this only works for non-stochastic functions (ones that don't call any random number generators).
I'm not sure how you control the size of the cache, but the source code is all there.
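To illustrate the memoisation idea outside R, here is a minimal Python sketch. The names A, f and g mirror the question, but the bodies are made-up stand-ins (A just computes exponentials), and functools.lru_cache is swapped in for the memoise package. Its maxsize argument is one way to bound the cache.

```python
from functools import lru_cache
import math

@lru_cache(maxsize=16)  # bounded cache: least recently used entries are evicted
def A(x):
    """Stand-in for the expensive intermediate; x must be a hashable tuple."""
    return tuple(math.exp(xi) for xi in x)

def f(x):
    """Objective: sum of x_i * exp(x_i), built from the cached A(x)."""
    return sum(xi * ai for xi, ai in zip(x, A(tuple(x))))

def g(x):
    """Gradient: (x_i + 1) * exp(x_i), reusing A(x) when x was seen before."""
    return [(xi + 1.0) * ai for xi, ai in zip(x, A(tuple(x)))]
```

Because the cache is keyed on x itself, the order in which the optimizer calls f and g does not matter: g(x1) gets the A computed for x1 even if f was called on other points in between.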
I am sorry about the title, but I couldn't find a better one.
Let's define
function test(n)
    print("test executed")
    return n
end
f(n) = test(n)
Every time we call f we get
f(5)
test executed
5
Is there a way to tell julia to evaluate test once in the definition of f?
I expect that this is probably not going to be possible, in which case I have a slightly different question. If ar=[1,2,:x,-2,2*:x] is there any way to define f(x) to be the sum of ar, i.e. f(x) = 3*x+1?
If you want to compile based on type information, you can use @generated functions. But it seems like you want to compile based on the runtime values of the input. In this case, you might want to do memoization. There is a library, Memoize, that provides a macro for memoizing functions.
Can someone explain to me how the optim function works in Scilab and give me a short example of it?
What I am trying to do is to maximize this function and find the optimal value
function [f, g, ind]=cost(x, ind)
    f= -x.^2
    g=2*x
endfunction
// Simplest call
x0 = [1; -1; 1];
[fopt, xopt] = optim(cost, x0)
When I try to run this, I receive the error
Variable returned by scilab argument function is incorrect.
I think I am making some very basic mistake but can't understand where.
I think the answer is that -x.^2 does not return a scalar but a vector (x is a vector and x.^2 is an elementwise operation). You probably want to say something like x'*x. The objective function of an optimization problem should always be scalar (otherwise we end up with a multi-objective or multi-criteria problem which is a whole different type of problem).
Minimizing -x'*x is probably not a good idea, since it is unbounded below.
The gradient is not correct for f=-x'*x, which would need g=-2*x (but see the previous point).
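To make the scalar-objective point concrete, here is a small Python sketch (not Scilab). It assumes the intended problem is to minimize x'*x, so the gradient is 2*x. The helper numeric_grad is a hypothetical name for a finite-difference check of the analytic gradient.

```python
def cost(x):
    """Return a scalar objective x'x and its gradient 2x."""
    f = sum(xi * xi for xi in x)      # scalar, not an elementwise vector
    g = [2.0 * xi for xi in x]        # one gradient entry per variable
    return f, g

def numeric_grad(func, x, h=1e-6):
    """Central-difference gradient of the scalar part of func at x."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((func(xp)[0] - func(xm)[0]) / (2.0 * h))
    return grad
```

Comparing cost(x0)[1] against numeric_grad(cost, x0) is a cheap way to catch a sign or factor error in a hand-written gradient before handing it to an optimizer.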
I have a function f(x,y) whose outcome is random (I take the mean of 20 random numbers depending on x and y). I see no way to modify this function to make it symbolic.
And when I run
x,y = var('x,y')
d = plot_vector_field((f(x),x), (x,0,1), (y,0,1))
it says it can't cast a symbolic expression to a real or rational number. In fact it stops when I write:
a=matrix(RR,1,N)
a[0]=x
What is the way to change this variable to real numbers in the beginning, compute f(x) and draw a vector field? Or just draw a lot of arrows with slope (f(x),x)?
I can create something sort of like yours, though with no errors. At least it doesn't do what you want.
def f(m,n):
    return m*randint(100,200)-n*randint(100,200)
var('x,y')
plot_vector_field((f(x,y),f(y,x)),(x,0,1),(y,0,1))
The reason is that Python functions evaluate immediately: in this case, f(x,y) was 161*x - 114*y, though that will change with each invocation.
My suspicion is that your problem is similar, the immediate evaluation of the Python function once and for all. Instead, try lambda functions. They are annoying but very useful in this case.
var('x,y')
plot_vector_field((lambda x,y: f(x,y), lambda x,y: f(y,x)),(x,0,1),(y,0,1))
Wow, now I have to find an excuse to show off this picture, cool stuff. I hope your error ends up being very similar.
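The difference between immediate and deferred evaluation can be seen in plain Python, without Sage. This is only a sketch: it uses the standard random module in place of Sage's randint, and eager_samples and deferred_samples are hypothetical helper names.

```python
import random

def f(m, n):
    """Random-coefficient function, as in the answer above."""
    return m * random.randint(100, 200) - n * random.randint(100, 200)

def eager_samples(count):
    """f runs once up front; the single result is reused for every point."""
    value = f(1.0, 1.0)
    return [value for _ in range(count)]

def deferred_samples(count):
    """The lambda defers the call, so f runs again for every point."""
    deferred = lambda x, y: f(x, y)
    return [deferred(1.0, 1.0) for _ in range(count)]
```

eager_samples always returns identical values, while deferred_samples draws fresh random numbers on each call, which is the behaviour the lambda version gives the plotting routine.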
I would like to use optim() to optimize a cost function (fn argument), and I will be providing a gradient (gr argument). I can write separate functions for fn and gr. However, they have a lot of code in common and I don't want the optimizer to waste time repeating those calculations. So is it possible to provide one function that computes both the cost and the gradient? If so, what would be the calling syntax to optim()?
As an example, suppose the function I want to minimize is
cost <- function(x) {
  x*exp(x)
}
Obviously, this is not the function I'm trying to minimize. That's too complicated to list here, but the example serves to illustrate the issue. Now, the gradient would be
grad <- function(x) {
  (x+1)*exp(x)
}
So as you can see, the two functions, if called separately, would repeat some of the work (in this case, the exponential function). However, since optim() takes two separate arguments (fn and gr), it appears there is no way to avoid this inefficiency, unless there is a way to define a function like
costAndGrad <- function(x) {
  ex <- exp(x)
  list(cost=x*ex, grad=(x+1)*ex)
}
and then pass that function to optim(), which would need to know how to extract the cost and gradient.
Hope that explains the problem. Like I said my function is much more complicated, but the idea is the same: there is considerable code that goes into both calculations (cost and gradient), which I don't want to repeat unnecessarily.
By the way, I am an R novice, so there might be something simple that I'm missing!
Thanks very much
The nlm function does optimization, and it expects the gradient information to be returned as an attribute (named "gradient") of the value returned by the objective function. That is similar to what you show above. See the examples in the help for nlm.
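If the optimizer insists on two separate callables, one combined function can still be shared between them. Below is a Python sketch of that pattern (not R, and not how optim or nlm work internally). SharedEval is a hypothetical helper that remembers the last point evaluated, and it assumes x is a plain scalar so that != compares values.

```python
import math

def cost_and_grad(x):
    """Compute cost x*exp(x) and gradient (x+1)*exp(x) with one exp call."""
    ex = math.exp(x)
    return x * ex, (x + 1.0) * ex

class SharedEval:
    """Hypothetical wrapper exposing fn and gr that share one evaluation."""

    def __init__(self, func):
        self.func = func
        self.last_x = None    # point of the most recent evaluation
        self.last = None      # (cost, grad) at last_x
        self.n_evals = 0      # how often func was really called

    def _eval(self, x):
        if x != self.last_x:  # recompute only when x has changed
            self.last = self.func(x)
            self.last_x = x
            self.n_evals += 1
        return self.last

    def fn(self, x):
        return self._eval(x)[0]

    def gr(self, x):
        return self._eval(x)[1]
```

With shared = SharedEval(cost_and_grad), passing shared.fn and shared.gr as the two arguments means a gradient request at the point just evaluated costs nothing extra. It only helps when the optimizer asks for both at the same x back to back.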
I have a function that takes a floating point number and returns a floating point number. It can be assumed that if you were to graph the output of this function it would be 'n' shaped, i.e. there would be a single maximum point, and no other points on the function with a zero slope. We also know that the input value that yields this maximum output will lie between two known points, perhaps 0.0 and 1.0.
I need to efficiently find the input value that yields the maximum output value to some degree of approximation, without doing an exhaustive search.
I'm looking for something similar to Newton's Method which finds the roots of a function, but since my function is opaque I can't get its derivative.
I would like to down-thumb all the other answers so far, for various reasons, but I won't.
An excellent and efficient method for minimizing (or maximizing) smooth functions when derivatives are not available is parabolic interpolation. It is common to write the algorithm so it temporarily switches to the golden-section search (Brent's minimizer) when parabolic interpolation does not progress as fast as golden-section would.
I wrote such an algorithm in C++. Any offers?
UPDATE: There is a C version of the Brent minimizer in GSL. The archives are here: ftp://ftp.club.cc.cmu.edu/gnu/gsl/ Note that it will be covered by some flavor of GNU "copyleft."
As I write this, the latest-and-greatest appears to be gsl-1.14.tar.gz. The minimizer is located in the file gsl-1.14/min/brent.c. It appears to have termination criteria similar to what I implemented. I have not studied how it decides to switch to golden section, but for the OP, that is probably moot.
UPDATE 2: I googled up a public domain Java version, translated from FORTRAN. I cannot vouch for its quality. http://www1.fpl.fs.fed.us/Fmin.java I notice that the hard-coded machine epsilon ("machine precision" in the comments) is 1/2 the value for a typical PC today. Change the value of eps to 2.22045e-16.
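For readers who only want the fallback half of the method, here is a minimal Python sketch of golden-section search for a maximum. It is not Brent's full algorithm (there is no parabolic interpolation step), and the function name and tolerance are my own choices. It assumes func has a single maximum on [a, b].

```python
import math

def golden_section_max(func, a, b, tol=1e-8):
    """Locate the maximizer of a unimodal func on [a, b]."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0   # 1/phi, about 0.618
    c = b - invphi * (b - a)                # left interior point
    d = a + invphi * (b - a)                # right interior point
    fc, fd = func(c), func(d)
    while (b - a) > tol:
        if fc > fd:
            # Maximum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = func(c)
        else:
            # Maximum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = func(d)
    return (a + b) / 2.0
```

Each iteration reuses one of the two interior evaluations, so the bracket shrinks by a factor of about 0.618 per function call and no derivative is needed.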
Edit 2: The method described in Jive Dadson's answer is a better way to go about this. I'm leaving my answer up since it's easier to implement, if speed isn't too much of an issue.
Use a form of binary search, combined with numeric derivative approximations.
Given the interval [a, b], let x = (a + b) / 2
Let epsilon be something very small.
Is (f(x + epsilon) - f(x)) positive? If yes, the function is still growing at x, so you recursively search the interval [x, b]
Otherwise, search the interval [a, x].
There might be a problem if the max lies between x and x + epsilon, but you might give this a try.
Edit: The advantage to this approach is that it exploits the known properties of the function in question. That is, I assumed by "n"-shaped, you meant, increasing-max-decreasing. Here's some Python code I wrote to test the algorithm:
def f(x):
    return -x * (x - 1.0)

def findMax(function, a, b, maxSlope):
    x = (a + b) / 2.0
    e = 0.0001
    slope = (function(x + e) - function(x)) / e
    if abs(slope) < maxSlope:
        return x
    if slope > 0:
        return findMax(function, x, b, maxSlope)
    else:
        return findMax(function, a, x, maxSlope)
Calling findMax(f, 0, 3, 0.01) should return approximately 0.504, as desired.
For optimizing a concave function, which is the type of function you are talking about, without evaluating the derivative I would use the secant method.
Given the two initial values x[0]=0.0 and x[1]=1.0 I would proceed to compute the next approximations as:
def next_x(x, xprev):
    return x - f(x) * (x - xprev) / (f(x) - f(xprev))
and thus compute x[2], x[3], ... until the change in x becomes small enough.
Edit: As Jive explains, this solution is for root finding which is not the question posed. For optimization the proper solution is the Brent minimizer as explained in his answer.
The Levenberg-Marquardt algorithm is a Newton's-method-like optimizer. It has a C/C++ implementation, levmar, that doesn't require you to define the derivative function. Instead it will evaluate the objective function in the current neighborhood to move to the maximum.
BTW: this website appears to have been updated since I last visited it; I hope it's even the same one I remembered. Apparently it now also supports other languages.
Given that it's only a function of a single variable and has one extremum in the interval, you don't really need Newton's method. Some sort of line search algorithm should suffice. This Wikipedia article is actually not a bad starting point, if short on details. Note in particular that you could just use the method described under "direct search", starting with the end points of your interval as your two points.
I'm not sure if you'd consider that an "exhaustive search", but it should actually be pretty fast I think for this sort of function (that is, a continuous, smooth function with only one local extremum in the given interval).
You could reduce it to a simple linear fit on the deltas, finding the place where it crosses the x axis. A linear fit can be done very quickly.
Or just take 3 points (left/top/right) and fit a parabola.
It depends mostly on the nature of the underlying relation between x and y, I think.
Edit: this is in case you have an array of values, as the question's title states. When you have a function, use Newton-Raphson.
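The three-point parabola fit mentioned above can be sketched in a few lines of Python. The function name parabola_vertex is my own. It uses the standard closed form for the vertex of the parabola through three points, and it assumes the points are distinct and not collinear.

```python
def parabola_vertex(x1, y1, x2, y2, x3, y3):
    """x-coordinate of the vertex of the parabola through three points."""
    num = (x2 - x1) ** 2 * (y2 - y3) - (x2 - x3) ** 2 * (y2 - y1)
    den = (x2 - x1) * (y2 - y3) - (x2 - x3) * (y2 - y1)
    if den == 0.0:
        # Collinear points: no unique parabola, so no vertex to report.
        raise ValueError("points are collinear")
    return x2 - 0.5 * num / den
```

For data that really is quadratic, one fit lands on the extremum exactly. Otherwise the vertex can replace the worst of the three points and the fit can be repeated.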