So I've combed through the various websites pertaining to Julia's JuMP and using functions as arguments to @objective or @NLobjective, but let me try to state my problem. I'm certain that I'm doing something silly, and that this is a quick fix.
Here is a brief code snippet and what I would like to do:
using JuMP
tiLim = 1800
x = 1:M  # index set; M defined elsewhere
solver_opt = "bonmin.time_limit=$tiLim"
m = Model(solver=AmplNLSolver("bonmin", [solver_opt]))
@variables m begin
    T[x]
    ... # other decision variables, some of which are matrices
end
@NLobjective(m, Min, maximum(T[i] for i in x))
Now, from my understanding, the `maximum` function makes the problem nonlinear and is not allowed inside the JuMP objective function, so people will do one of two things:
(1) play the auxiliary variable + constraint trick, or
(2) create a function and then `register` this function with JuMP.
However, I can't seem to do either correctly.
Here is an attempt at using the auxiliary variable + constraint trick:
mymx(vec::Array) = maximum(vec)  # generic function in Julia
@variable(m, aux)
@constraint(m, aux == mymx(T))
@NLobjective(m, Min, aux)
I was hoping to get some assistance with doing this seemingly trivial task of minimizing a maximum.
Also, it should be noted that this is a MILP problem I'm trying to solve. I've previously implemented the problem in CPLEX using ILOG script for OPL, where this objective function seems much more straightforward, though that's probably just my ignorance of JuMP.
Thanks.
You can model this as a linear problem as follows:
@variable(m, aux)
for i in x
    @constraint(m, aux >= T[i])
end
@objective(m, Min, aux)
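For completeness, here is a minimal self-contained sketch of this min-max reformulation in the same JuMP 0.18-era syntax used above (the size M and the bounds on T are made-up stand-ins for the real constraints):

using JuMP

M = 3                               # made-up size
m = Model()                         # attach any LP/MILP solver here
@variable(m, 0 <= T[i = 1:M] <= i)  # placeholder feasible region for T
@variable(m, aux)                   # epigraph variable
for i in 1:M
    @constraint(m, aux >= T[i])     # aux is an upper bound on every T[i]
end
@objective(m, Min, aux)             # so minimizing aux minimizes max(T)

At an optimum, aux equals max_i T[i], because the solver pushes aux down until at least one of the constraints aux >= T[i] is tight.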
How do I integrate an interval-valued nonlinear function using the "IntervalArithmetic" package in Julia, for example dF(X)/dX = a/(1+cX), where a = [1, 2] and c = [2, 3] are interval constants? Could you please give an example, or point me to a relevant document? F(X) should come out as an interval with bounds, F(X) = [p, q].
Just numerical integration?
As long as the integration code is written in Julia (otherwise I suspect it will struggle to understand IntervalArithmetic) and there isn't some snag about how it should interpret tolerances, it should just work, more or less the way you might expect it to handle, e.g., complex numbers.
using IntervalArithmetic
f(x) = interval(1,2)/(1+interval(2,3)*x)
and combined with e.g.
using QuadGK
quadgk(f, 0, 1)
gives ([0.462098, 1.09862], [0, 0.636515]) (so I guess the interpretation here is that the error estimate lies in the interval 0 to 0.636515).
Just as a sanity check, let's go with the good old trapezoidal rule.
using Trapz
xs = range(0, 1, length=100)
trapz(xs, f.(xs))
which again gives us the expected interval [0.462098, 1.09862].
For a stochastic solver that will run on a GPU, I'm currently trying to draw Poisson-distributed random numbers, one for each entry of a large array. The array lives in device memory and will also be updated deterministically afterwards. The problem I'm facing is that the mean of the distribution depends on the old value of the entry, so naively I would have to do something like:
CUDA.rand_poisson!(lambda=array*constant)
or:
array = CUDA.rand_poisson(lambda=array*constant)
Neither works, which does not really surprise me, but maybe I just need to get a better understanding of broadcasting?
Then I tried writing a kernel which looks like this:
function cu_draw_rho!(rho::CuDeviceVector{FloatType}, λ::FloatType)
    idx = (blockIdx().x - 1i32) * blockDim().x + threadIdx().x
    stride = gridDim().x * blockDim().x
    @inbounds for i = idx:stride:length(rho)
        l = rho[i] * λ
        # variant 1
        rho[i] > 0f0 && (rho[i] = FloatType(CUDA.rand_poisson(UInt32, 1; lambda=l)))
        # variant 2
        rho[i] > 0f0 && (rho[i] = FloatType(rand(Poisson(lambda=l))))
    end
    return
end
And many slight variations of the above. I get tons of errors about dynamic function calls, which I attribute to calling functions that are meant for arrays from inside my kernel. The second variant, using rand(), works only without the Poisson argument (which comes from the Distributions package, I guess?).
What is the correct way to do this?
You may want CURAND.jl, which provides curand_poisson.
using CURAND
n = 10
lambda = .5
curand_poisson(n, lambda)
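As a hedged usage sketch (assuming curand_poisson returns a device vector of n draws, as in the call above; note this covers only a single fixed rate, not the per-entry rate asked about):

using CUDA, CURAND

arr = CUDA.rand(Float32, 4, 4)            # example device array
draws = curand_poisson(length(arr), 0.5)  # one Poisson(0.5) draw per entry
arr .= reshape(draws, size(arr))          # write the draws back in place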
I am writing a script that converts a Python Keras (v1.1.0) model to a Julia Flux model, as a way to get to know Julia, and I am struggling with implementing regularization (I have read https://fluxml.ai/Flux.jl/stable/models/regularisation/).
So, in the Keras JSON model I have something like "W_regularizer": {"l2": 0.0010000000474974513, "name": "WeightRegularizer", "l1": 0.0} for each Dense layer, and I want to use these coefficients to create regularization in the Flux model. The problem is that in Flux, regularization is added directly to the loss instead of being defined as a property of the layer itself.
To avoid posting too much code here, I've added it to a repo; here is a small script that takes the JSON and creates a Flux Chain: https://github.com/iegorval/Keras2Flux.jl/blob/master/Keras2Flux/src/Keras2Flux.jl
Now, I want to create a penalty for each Dense layer with the predefined l1/l2 coefficients. I tried to do it like this:
using Pkg
pkg"activate /home/username/.julia/dev/Keras2Flux"
using Flux
using Keras2Flux
using LinearAlgebra

function get_penalty(model::Chain, regs::Array{Any, 1})
    index_model = 1
    index_regs = 1
    penalties = []
    for layer in model
        if layer isa Dense
            println(regs[index_regs](layer.W))
            penalty(m) = regs[index_regs](m[index_model].W)
            push!(penalties, penalty)
            #println(regs[i])
            index_regs += 1
        end
        index_model += 1
    end
    total_penalty(m) = sum([p(m) for p in penalties])
    println(total_penalty)
    println(total_penalty(model))
    return total_penalty
end

model, regs = convert_keras2flux("examples/keras_1_1_0.json")
penalty = get_penalty(model, regs)
So, I create a penalty function for each Dense layer and then sum them up into the total penalty. However, it gives me this error:
ERROR: LoadError: BoundsError: attempt to access 3-element Array{Any,1} at index [4]
I understand what it means, but I really don't understand how to fix it. It seems that when I call total_penalty(model), it uses index_regs == 4 (i.e., the values of index_regs and index_model as they are AFTER the for loop). Instead, I want to use the actual indices they had when I pushed the given penalty onto the list of penalties.
On the other hand, if I built a list of values instead of a list of functions, it would also not be correct, because the loss will be defined as:
loss(x, y) = binarycrossentropy(model(x), y) + total_penalty(model)
With a list of values I would have a static total_penalty, while it should be recalculated for every Dense layer every time during model training.
I would be thankful if somebody with Julia experience could give me some advice, because I am definitely failing to understand how it works in Julia and, specifically, in Flux. How would I create a total_penalty that is recalculated automatically during training?
There are a couple parts to your question, and since you are new to Flux (and Julia?), I will answer in steps. But I suggest the solution at the end as a cleaner way to handle this.
First, there is the issue of p(m) calculating the penalty using index_regs and index_model as the values after the for-loop. This is because of the scoping rules in Julia. When you define the closure penalty(m) = regs[index_regs](m[index_model].W), index_regs is bound to the variable defined in get_penalty. So, as index_regs changes, so does the output of p(m). The other issue is the naming of the function as penalty(m). Every time you run this line, you are redefining penalty and all references to it that you pushed onto penalties. Instead, you should prefer to create an anonymous function. Here is how we incorporate these changes:
function get_penalty(model::Chain, regs::Array{Any, 1})
    index_model = 1
    index_regs = 1
    penalties = []
    for layer in model
        if layer isa Dense
            println(regs[index_regs](layer.W))
            penalty = let i = index_regs, index_model = index_model
                m -> regs[i](m[index_model].W)
            end
            push!(penalties, penalty)
            index_regs += 1
        end
        index_model += 1
    end
    total_penalty(m) = sum([p(m) for p in penalties])
    return total_penalty
end
I used i and index_model in the let block to drive home the scoping rules. I'd encourage you to replace the anonymous function in the let block with global penalty(m) = ... (and remove the assignment to penalty before the let block) to see the difference between using anonymous and named functions.
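To make the scoping point concrete, here is a tiny standalone illustration (plain Julia, no Flux; all names are made up for the demo):

function capture_broken()
    fs = []
    i = 1
    while i <= 3
        g(x) = x + i     # named inner function; captures the shared, reassigned i
        push!(fs, g)
        i += 1
    end
    [f(0) for f in fs]   # [4, 4, 4]: every closure sees the final value of i
end

function capture_fixed()
    fs = []
    i = 1
    while i <= 3
        f = let i = i    # shadow i with a fresh local per iteration
            x -> x + i   # anonymous closure over that fresh local
        end
        push!(fs, f)
        i += 1
    end
    [f(0) for f in fs]   # [1, 2, 3]: each closure kept its own i
end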
But, if we go back to your original issue, you want to calculate the regularization penalty for your model using the stored coefficients. Ideally, these would be stored with each Dense layer as in Keras. You can recreate the same functionality in Flux:
using Flux, Functors, LinearAlgebra

struct RegularizedDense{T, LT<:Dense}
    layer::LT
    w_l1::T
    w_l2::T
end

@functor RegularizedDense

(l::RegularizedDense)(x) = l.layer(x)

penalty(l) = 0
penalty(l::RegularizedDense) =
    l.w_l1 * norm(l.layer.W, 1) + l.w_l2 * norm(l.layer.W, 2)
penalty(model::Chain) = sum(penalty(layer) for layer in model)
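For instance (with made-up layer sizes and coefficients), the chain-level penalty then composes automatically:

m = Chain(RegularizedDense(Dense(10, 5, relu), 0.0, 1e-3),
          RegularizedDense(Dense(5, 1, σ), 0.0, 1e-3))
penalty(m)  # sum of the per-layer L1/L2 penalties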
Then, in your Keras2Flux source, you can redefine get_regularization to return w_l1_reg and w_l2_reg instead of functions. And in create_dense you can do:
function create_dense(config::Dict{String,Any}, prev_out_dim::Int64=-1)
    # ... code you have already written
    dense = Dense(in, out, activation; initW = init, initb = zeros)
    w_l1, w_l2 = get_regularization(config)
    return RegularizedDense(dense, w_l1, w_l2)
end
Lastly, you can compute your loss function like so:
loss(x, y, m) = binarycrossentropy(m(x), y) + penalty(m)
# ... later for training
train!((x, y) -> loss(x, y, m), params(m), training_data, opt)  # opt: your optimiser, e.g. ADAM()
We define loss as a function of (x, y, m) to avoid performance issues, since a loss that closes over the non-constant global model would be type-unstable.
So, in the end, this approach is cleaner because, after model construction, you don't need to pass around an array of regularization functions and figure out how to index each function with its corresponding Dense layer.
If you prefer to keep the regularizer and model separate (i.e. have standard Dense layers in your model chain), then you can do that too. Let me know if you want that solution, but I'll leave it out for now.
Can someone explain to me how the optim function works in Scilab and give a short example?
What I am trying to do is maximize this function and find the optimal value:
function [f, g, ind] = cost(x, ind)
    f = -x.^2
    g = 2*x
endfunction

// simplest call
x0 = [1; -1; 1];
[fopt, xopt] = optim(cost, x0)
When I try to run it, I receive the error
Variable returned by scilab argument function is incorrect.
I think I am making some very basic mistake but can't see where.
I think the issue is that -x.^2 does not return a scalar but a vector (x is a vector and .^ is an elementwise operation), while the objective of an optimization problem must always be scalar (otherwise you have a multi-objective or multi-criteria problem, which is a whole different type of problem). You probably want something like x'*x. Two further notes:
(1) minimizing -x'*x is probably not a good idea, since it is unbounded below, and
(2) the gradient g = 2*x is not correct for f = -x'*x (but see the previous point).
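A corrected sketch (assuming the intent is the well-posed problem of minimizing x'*x; optim minimizes, so a maximization is done by negating a scalar objective and its gradient):

// scalar objective f = x'*x with matching gradient g = 2*x
function [f, g, ind] = cost(x, ind)
    f = x' * x       // scalar value
    g = 2 * x        // gradient, same shape as x
endfunction

x0 = [1; -1; 1];
[fopt, xopt] = optim(cost, x0)  // converges to xopt = [0; 0; 0], fopt = 0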
I'm modelling an overhead crane and obtained the following equations:
I'm a noob when it comes to Scilab, and so far I have only simulated (using ode) linear systems with no more than two degrees of freedom, i.e., simple systems that I can easily convert to a matrix and integrate.
But this system in particular I have no clue how to simulate, not because of the sin and cos functions, but because I don't know how to put it into a state-space form.
I've looked at a few tutorials (listed below), but I didn't understand any of them; can somebody tell me how to do it, or at least point me to where I could learn it?
http://www.openeering.com/sites/default/files/Nonlinear_Systems_Scilab.pdf
http://www.math.univ-metz.fr/~sallet/ODE_Scilab.pdf
Thank you, and sorry about my English.
The usual form means writing the system in terms of first-order derivatives only, so each second-order term is rewritten as
x'' = d(x')/dt
Substitute these into the equations you have; for example, introducing the velocities v1 = x', v2 = L', v3 = theta', v4 = q' turns each second-order equation into two first-order ones. You'll end up with eight simultaneous first-order ODEs to solve instead of four second-order ones, with the corresponding initial conditions.
Although this ODE system is implicit, you can solve it with a classical (explicit) ODE solver by reformulating it this way: if you define X = (x, L, theta, q)^T, then your system can be written, using matrix algebra, as A(X, X') * X'' = B(X, X'). Note that the first-order form of this system is
d/dt (X, X') = (X', A(X, X')^(-1) * B(X, X'))
Suppose now that you have defined two Scilab functions A and B which compute their values from X and X':
function out = A(X, Xprime)
    x = X(1)
    L = X(2)
    theta = X(3)
    q = X(4)
    xd = Xprime(1)
    Ld = Xprime(2)
    thetad = Xprime(3)
    qd = Xprime(4)
    ...
endfunction

function out = B(X, Xprime)
    ...
endfunction
then the right-hand side of the system of 8 ODEs, as it can be given to Scilab's ode function, can be coded as follows:

function dstate_dt = rhs(t, state)
    X = state(1:4);
    Xprime = state(5:8);
    dstate_dt = [Xprime;
                 A(X, Xprime) \ B(X, Xprime)];  // backslash solves A*X'' = B for X''
endfunction
Writing the code of A() and B() according to the given equations is the only remaining (but quite easy) task.
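A hedged usage sketch of the final integration call (the initial conditions and time grid below are placeholders):

state0 = zeros(8, 1);           // placeholder initial state [X(0); X'(0)]
t = 0:0.01:10;                  // output time grid
state = ode(state0, 0, t, rhs); // each column of state is [X; X'] at one time in t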