I am dealing with a nested optimization problem in Julia 1.7.3. In particular, I need to optimize a function (say f1) that, in turn, depends on the optimization result of another function (say f2). Here is a minimal example to illustrate my problem:
using Optim
function f2(x::Float64, y::Float64)
    return (x^2 - x - y)^2
end
function f1(y::Float64)
    x₁ = optimize(x -> f2(x, y), -10, 10).minimizer
    return (y*x₁ - 0.5)^2
end
To get the optimizer of f1, I do
y₁ = optimize(y -> f1(y), -10, 10).minimizer
To get the optimizer of f2, I do
x₁ = optimize(x -> f2(x,y₁), -10, 10).minimizer
However, this last step seems very inefficient because it requires an extra optimization call. The optimizer of f2 is indeed already computed while optimizing f1 (i.e., x₁). Is there a way to retrieve x₁ without an extra optimization step (e.g., saving x₁ during the last iteration step of f1)?
Note: one option is to merge the two optimization problems and simultaneously optimize the objective function with respect to x and y. However, I cannot follow this approach in the actual application I am dealing with.
You may store x₁ in a mutable struct, e.g.
using Optim
mutable struct Minimizer{T}
    x::T
    y::T
end
function f2(x::Float64, y::Float64)
    return (x^2 - x - y)^2
end
function f1!(m::Minimizer, y::Float64)
    x₁ = optimize(x -> f2(x, y), -10, 10).minimizer
    m.x = x₁
    return (y*x₁ - 0.5)^2
end
M = Minimizer(NaN, NaN)
M.y = optimize(y -> f1!(M, y), -10, 10).minimizer
@show M
# M = Minimizer{Float64}(1.2971565074975993, 0.3854585027525779)
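An equivalent variant of the same trick (a sketch, with f2 as defined above) stores the inner minimizer in a Ref captured by a closure instead of a mutable struct:

using Optim

x_last = Ref(NaN)  # holds the inner minimizer from the most recent f1 evaluation

function f1(y::Float64)
    x_last[] = optimize(x -> f2(x, y), -10, 10).minimizer
    return (y * x_last[] - 0.5)^2
end

y₁ = optimize(f1, -10, 10).minimizer
x₁ = x_last[]  # retrieved without an extra optimization call

In both versions, the stored value comes from the outer optimizer's most recent evaluation of f1, which is at (or very close to) the returned minimizer but not guaranteed to be exactly at y₁.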
I want to implement a simproc capable of rewriting the argument of sin into a linear combination x + k * pi + k' * pi / 2 (where ideally k' = 0 or k' = 1) and then applying existing lemmas about sums in the argument of sin.
The steps could be as follows:
Pattern match the goal to extract the argument of sin(expr):
fun dest_sine t =
  case t of
    (@{term "(sin) :: real ⇒ real"} $ t') => t'
  | _ => raise TERM ("dest_sine", [t]);
Prove that for some x, k, k': expr = x + k*pi + k' * pi/2.
Use existing lemmas to rewrite to a simpler trigonometric function:
fun rewriter x k k' =
  if (k mod 2 = 0 andalso k' = 0) then @{term "sin"} $ x
  else if (k mod 2 = 0 andalso k' = 1) then @{term "cos"} $ x
  else if (k mod 2 = 1 andalso k' = 0) then @{term "-sin"} $ x
  else @{term "-cos"} $ x
I'm stuck at step two. The idea is to use algebra simplifications to obtain the x, k, k' for which the theorem holds. I believe schematic goals should do this, but I have never used them.
My thoughts
Could I rather assume that the expression is of this form and let the simplifier find it so that the simproc can be triggered?
If I first assume the linear form x + k*pi + k' * pi/2, then:
Extract x, k, k' from this combination.
Apply rewriter and obtain the corresponding term to be rewritten to.
Apply in sequence: rules dealing with + pi/2, then rules dealing with + 2 pi.
I would start easy and ignore the pi / 2 part for now.
You probably want to build a simproc that matches on anything of the form sin x. Then you want to write a conversion that takes that term x (which is assumed to be a sum of several terms) and brings it into the form a + of_int b * pi.
A conversion is essentially a function of type cterm → thm which takes a cterm ct and returns a theorem of the form ct ≡ …, i.e. it's a form of deterministic rewriting (a conversion can also fail by throwing a CTERM exception, by convention). There are a lot of combinators for building and using these in Pure/conv.ML.
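For illustration, the simplest possible conversion is the identity conversion (a trivial sketch just to show the type; real simproc code composes the combinators from Pure/conv.ML, such as Conv.arg_conv or Conv.abs_conv):

(* the identity conversion: returns the reflexivity theorem  ct ≡ ct *)
fun id_conv (ct : cterm) : thm = Conv.all_conv ct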
This is probably a bit fiddly. You essentially have to descend through the term and, for each atom (i.e. anything not of the form _ + _), figure out whether it can be brought into the form of_int … * pi, e.g. again by writing a conversion that does this transformation (to make it easy, you can omit this part so that your procedure only works if the terms are already in that form). Then group all the terms of the form of_int … * pi to the right and all the terms not of that form to the left, using associativity and commutativity.
I would suggest this:
Define a function SIN_SIMPROC_ATOM x n = x + of_int n * pi
Write a conversion sin_atom_conv that rewrites of_int n * pi to SIN_SIMPROC_ATOM 0 n and everything else into SIN_SIMPROC_ATOM x 0
Write a conversion that descends through +, applies sin_atom_conv to every atom, and then applies some kind of combination rule like SIN_SIMPROC_ATOM x1 n1 + SIN_SIMPROC_ATOM x2 n2 = SIN_SIMPROC_ATOM (x1 + x2) (n1 + n2)
In the end, you have rewritten the entire term into the form sin (SIN_SIMPROC_ATOM x n), and then you can apply some suitable rule to that.
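For concreteness, a minimal Isabelle sketch of the atom from step 1 and the combination rule from step 3 (the proof method is an assumption and may need adjusting):

definition SIN_SIMPROC_ATOM :: "real ⇒ int ⇒ real" where
  "SIN_SIMPROC_ATOM x n = x + of_int n * pi"

lemma SIN_SIMPROC_ATOM_add:
  "SIN_SIMPROC_ATOM x1 n1 + SIN_SIMPROC_ATOM x2 n2 =
     SIN_SIMPROC_ATOM (x1 + x2) (n1 + n2)"
  by (simp add: SIN_SIMPROC_ATOM_def algebra_simps)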
It's not quite clear to me how to best handle the parity of n. You could rewrite sin (SIN_SIMPROC_ATOM x n) = (-1) ^ nat |n| * sin x, but I'm not sure if that's what the user really wants in most cases. It might make more sense to only do that if you can deduce the parity of n statically (e.g. by using the simplifier) and then directly simplify to sin x or -sin x.
The situation becomes even more complicated if you want to include halves of π. You can of course extend SIN_SIMPROC_ATOM with a second term for halves of π (and one for doubles of π as well, to make it more uniform). Or you could add all of them together, so that you just have a single integer n that describes your multiples of π/2, where k multiples of π simply contribute 2k to that term. And then you have to figure out what n mod 4 is – possibly again with the simplifier or with some clever static method.
I am having some trouble with my CS assignment. I am trying to call another rule that I created previously within a new rule that will calculate the factorial of a power function (e.g., Y = (N^X)!). I think the problem with my code is that Y in exp(Y,X,N) is not carrying over when I call factorial(Y,Z), but I am not entirely sure. I have been trying to find an example of this, but I haven't been able to find anything.
I am not expecting an answer since this is homework, but any help would be greatly appreciated.
Here is my code:
/* 1.2: Write recursive rules exp(Y, X, N) to compute mathematical function Y = X^N, where Y is used
to hold the result, X and N are non-negative integers, and X and N cannot be 0 at the same time
as 0^0 is undefined. The program must print an error message if X = N = 0.
*/
exp(_, 0, 0) :-
    write('0^0 is undefined').
exp(1, _, 0).
exp(Y, X, N) :-
    N > 0, !, N1 is N - 1, exp(Y1, X, N1), Y is X * Y1.
/* 1.3: Write recursive rules factorial(Y,X,N) to compute Y = (X^N)! This function can be described as the
factorial of exp. The rules must use the exp that you designed.
*/
factorial(0, X) :-
    X is 1.
factorial(N, X) :-
    N > 0, N1 is N - 1, factorial(N1, X1), X is X1 * N.
factorial(Y, X, N) :-
    exp(Y, X, N), factorial(Y, Z).
The Z variable in factorial/3 is mentioned only once (a so-called 'singleton variable'), so whatever it gets unified with can never be used anywhere else.
As noted in the comments under the question, short-circuiting it to _ won't work; you have to unify it with a sensible value. Think about what you want to compute: the head of the clause must be linked with exp and factorial through their parameters, which means introducing a parameter "in the middle" that is not mentioned in the head.
Edit: I'll rename your variables for you; maybe you'll see more clearly what you did:
factorial(Y, X, Result) :-
    exp(Y, X, Result), factorial(Y, UnusedResult).
Now you should see what your factorial/3 really computes, and how to fix it.
Version: v"0.5.0-dev+1259"
Context: The goal is to calculate the Rademacher penalty bound on a given number of data points n, with respect to the VC dimension dvc and a probability expressed by delta.
Please consider Julia code:
# Growth function on any n points with respect to the VC dimension
function mh(n, dvc)
    if n <= dvc
        2^n   #A
    else
        n^dvc #B
    end
end
# Rademacher penalty bound
function rademacher_penalty_bound(n::Int, dvc::Int, delta::Float64)
    sqrt((2.0*log(2.0*n*mh(n,dvc)))/n) + sqrt((2.0/n)*log(1.0/delta)) + 1.0/n
end
and the equivalent code in Octave/Matlab:
% Growth function on n points for a given VC dimension (dvc)
function md = mh(n, dvc)
    if n <= dvc
        md = 2^n;
    else
        md = n^dvc;
    end
end
% Rademacher penalty bound
function epsilon = rademacher_penalty_bound(n, dvc, delta)
    epsilon = sqrt((2*log(2*n*mh(n,dvc)))/n) + sqrt((2/n)*log(1/delta)) + 1/n;
end
Problem:
When I start testing it I receive the following results:
Julia first:
julia> rademacher_penalty_bound(50, 50, 0.05) #50 points
1.619360057204432
julia> rademacher_penalty_bound(500, 50, 0.05) #500 points
ERROR: DomainError:
[inlined code] from math.jl:137
in rademacher_penalty_bound at none:2
in eval at ./boot.jl:264
Now Octave:
octave:17> rademacher_penalty_bound(50, 50, 0.05)
ans = 1.6194
octave:18> rademacher_penalty_bound(500, 50, 0.05)
ans = 1.2387
Question: According to Noteworthy differences from MATLAB, I think I followed the rule of thumb ("literal numbers without a decimal point (such as 42) create integers instead of floating point numbers..."). The code crashes when the number of points exceeds 51 (line #B in mh). Can someone with more experience look at the code and say what I should improve/change?
While BigInt and BigFloat will work here, they're serious overkill. The real issue is that you're doing integer exponentiation in Julia and floating-point exponentiation in Octave/Matlab. So you just need to change mh to use floats instead of integers for exponents:
mh(n, dvc) = n <= dvc ? 2^float(n) : n^float(dvc)
rademacher_penalty_bound(n, dvc, δ) =
    √((2log(2n*mh(n,dvc)))/n) + √(2log(1/δ)/n) + 1/n
With these definitions, you get the same results as Octave/Matlab:
julia> rademacher_penalty_bound(50, 50, 0.05)
1.619360057204432
julia> rademacher_penalty_bound(500, 50, 0.05)
1.2386545010981596
In Octave/Matlab, even when you enter a literal without a decimal point, you still get a float; to get an integer you have to do an explicit cast to an int type. Also, exponentiation in Octave/Matlab always converts to float first. In Julia, x^2 with an integer x is computed entirely in integer arithmetic (it is equivalent to x*x), so no conversion to floating point takes place, and integer arithmetic silently wraps around on overflow.
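To see the size of the problem directly, a quick sketch (the exact wrapped value is meaningless, so it is omitted):

typemax(Int64)  # 9223372036854775807, about 9.2e18
500^50          # far exceeds typemax(Int64), so it wraps around to a garbage Int64
500.0^50        # ≈ 8.881784197001252e134, representable as a Float64

The garbage integer can be negative, and taking log of a negative number is what raises the DomainError shown above.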
Although BigInt and BigFloat are excellent tools when they are necessary, they should usually be avoided, since they are overkill and slow.
In this case, the problem is indeed the difference between Octave, which treats everything as a floating-point number, and Julia, which treats e.g. 2 as an integer.
So the first thing to do is to use floating-point numbers in Julia too:
function mh(n, dvc)
    if n <= dvc
        2.0 ^ n
    else
        Float64(n) ^ dvc
    end
end
This already helps, e.g. mh(50, 50) works.
However, the correct solution for this problem is to look at the code more carefully, and realise that the function mh only occurs inside a log:
log(2.0*n*mh(n,dvc))
We can use the laws of logarithms to rewrite this as
log(2.0*n) + log_mh(n, dvc)
where log_mh is a new function, which returns the logarithm of the result of mh. Of course, this should not be implemented directly as log(mh(n, dvc)), but rather as a new function:
function log_mh(n, dvc)
    if n <= dvc
        n * log(2.0)
    else
        dvc * log(n)
    end
end
In this way, you will be able to use huge numbers without overflow.
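Putting it together, the bound function then becomes (a sketch combining the definitions above):

# Rademacher penalty bound via log_mh, so mh itself is never materialized
function rademacher_penalty_bound(n::Int, dvc::Int, delta::Float64)
    sqrt((2.0*(log(2.0*n) + log_mh(n, dvc)))/n) + sqrt((2.0/n)*log(1.0/delta)) + 1.0/n
end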
I don't know whether it is acceptable to get the result as a BigFloat, but in the Julia part you can use BigInt:
# Growth function on any n points with respect to the VC dimension
function mh(n, dvc)
    if n <= dvc
        (BigInt(2))^n #A
    else
        n^dvc         #B
    end
end
# Rademacher penalty bound
function rademacher_penalty_bound(n::BigInt, dvc::BigInt, delta::Float64)
    sqrt((2.0*log(2.0*n*mh(n,dvc)))/n) + sqrt((2.0/n)*log(1.0/delta)) + 1.0/n
end
rademacher_penalty_bound(BigInt(500), BigInt(500), 0.05)
# => 1.30055251010957621105182244420.....
By default a Julia Int is a "machine-size" integer (a 64-bit integer on the common x86-64 platform), whereas Octave uses floating point, so in Julia mh(500, 50) overflows. You can fix it by replacing mh() as follows:
function mh(n, dvc)
    n2 = BigInt(n) # or n2 = Float64(n)
    if n <= dvc
        2^n2   #A
    else
        n2^dvc #B
    end
end
All,
I've just been starting to play around with the Julia language and am enjoying it quite a bit. At the end of the 3rd tutorial there's an interesting problem: genericize the quadratic formula such that it solves for the roots of any n-order polynomial equation.
This struck me as (a) an interesting programming problem and (b) an interesting Julia problem. Has anyone out there solved this one? For reference, here is the Julia code with a couple of toy examples. Again, the idea is to make this generic for any n-order polynomial.
Cheers,
Aaron
function derivative(f)
    return function(x)
        # pick a small value for h
        h = x == 0 ? sqrt(eps(Float64)) : sqrt(eps(Float64)) * x
        # floating point arithmetic gymnastics
        xph = x + h
        dx = xph - x
        # evaluate f at x + h
        f1 = f(xph)
        # evaluate f at x
        f0 = f(x)
        # divide the difference by h
        return (f1 - f0) / dx
    end
end
function quadratic(f)
    f1 = derivative(f)
    c = f(0.0)
    b = f1(0.0)
    a = f(1.0) - b - c
    return (-b + sqrt(b^2 - 4a*c + 0im))/2a, (-b - sqrt(b^2 - 4a*c + 0im))/2a
end
quadratic((x) -> x^2 - x - 2)
quadratic((x) -> x^2 + 2)
The package PolynomialRoots.jl provides the function roots() to find all (real and complex) roots of polynomials of any order. The only mandatory argument is the array with coefficients of the polynomial in ascending order.
For example, in order to find the roots of
6x^5 + 5x^4 + 4x^3 + 3x^2 + 2x + 1
after loading the package (using PolynomialRoots) you can use
julia> roots([1, 2, 3, 4, 5, 6])
5-element Array{Complex{Float64},1}:
0.294195-0.668367im
-0.670332+2.77556e-17im
0.294195+0.668367im
-0.375695-0.570175im
-0.375695+0.570175im
The package is a Julia implementation of the root-finding algorithm described in this paper: http://arxiv.org/abs/1203.1034
PolynomialRoots.jl also has support for arbitrary precision calculations. This is useful for solving equations that cannot be solved in double precision. For example,
julia> r = roots([94906268.375, -189812534, 94906265.625]);
julia> (r[1], r[2])
(1.0000000144879793 - 0.0im,1.0000000144879788 + 0.0im)
gives the wrong result for this polynomial; passing the input array in arbitrary precision instead forces arbitrary-precision calculations, which provide the right answer (see https://en.wikipedia.org/wiki/Loss_of_significance):
julia> r = roots([BigFloat(94906268.375), BigFloat(-189812534), BigFloat(94906265.625)]);
julia> (Float64(r[1]), Float64(r[2]))
(1.0000000289759583,1.0)
There are no algebraic formulae for general polynomials of degree five and above (in fact, there cannot be; see here). So theoretically you could proceed using the same methodology for the solutions to cubics and quartics, but even that would be a lot of hard work, given the very unwieldy formulae for the roots of quartics. You could also use a CAS like SymPy to find out those formulae.
Often we are interested in computing ∑_{i=m}^{n} f(i), the sum of the function values f(i) for i = m through n. Define ‘sigma f m n’, which computes ∑_{i=m}^{n} f(i). This is different from defining ‘sigma (f, m, n)’.
I'm required to write a curried version of this function. I'm having a bit of trouble understanding how this would actually work. I understand that a curried function is something that takes in a function and produces a function. Would this be an example of a curried function?
fun myCurry f x = f(x)
As far as setting up my problem, would this be an acceptable start?
fun sigma f m n =
I haven't gotten any further, because I can't really grasp what I'm being asked to do.
A curried function is not, in fact, a function that takes in a function and produces another function. That is a higher-order function.
A curried function is simply one that takes more than one argument and can be partially applied by only giving it one of its arguments.
For example, with your sigma question,
fun sigma (f,m,n) = ...
is not a curried function, as it takes only one argument (the tuple (f,m,n)).
fun sigma f m n = ...
, however, is a curried function, as it takes three arguments, and it is valid to say something like
val sigmasquare = sigma (fn x => x * x)
, partially applying sigma by giving it its first argument.
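For the sigma function itself, a minimal curried sketch along these lines (one possible recursion, not the only way to write it):

fun sigma f m n = if m > n then 0 else f m + sigma f (m + 1) n

With this definition, sigmasquare 1 3 evaluates to 1 + 4 + 9 = 14.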
A simpler example would be
fun add (x,y) = x + y
This is a noncurried function. To evaluate it, you must give it its argument, which includes both x and y. add (3,5) will evaluate to 8, in this case.
fun add x y = x + y
is the curried version of this same function. This can be partially evaluated by just giving it x. For example, add 3 will evaluate to a function which will add three to its argument.
This is more clearly seen by looking at the previous examples as anonymous or lambda functions.
The first is equivalent to fn (x,y) => x + y, which clearly takes a pair of ints and evaluates to an int.
The second is equivalent to fn x => fn y => x + y, which takes an int and evaluates to a function taking another int and evaluating to an int.
Thus, the type of the first is (int * int) -> int, while the type of the second is int -> int -> int.
Hopefully, this clears currying up somewhat.