How to initialize weights in Flux.jl - julia

I want to initialize some weights and bias for a specific layer in my model. How is this possible in Flux.jl?

Most of the layer functions such as Dense and Conv take in the weight and bias. You can see the function definition of each by doing ? Dense or ? Conv which would reveal that for example, the Dense function can be simple called with weights and bars by doing: Dense(W::AbstractMatrix, [bias, σ]) and for the Conv function, something like this:
julia> weight = rand(3, 4, 5);
julia> bias = zeros(5);
julia> c1 = Conv(weight, bias, sigmoid) # expects 1 spatial dimension
Conv((3,), 4=>5, σ)
where sigmoid is syntactical sugar for the sigmoid function, often denoted as σ().

Related

Range for continuos distribution in Julia

I am trying to calculate the density function of a continuos random variable in range in Julia using Distributions, but I am not able to define the range. I used Truncator constructor to construct the distribution, but I have no idea how to define the range. By density function I mean P(a
Would appreciate any help. The distribution I'm using is Gamma btw!
Thanks
To get the maximum and minimum of the support of distribution d just write maximum(d) and minimum(d) respectively. Note that for some distributions this might be infinity, e.g. maximum(Normal()) is Inf.
What version of Julia and Distributions du you use? In Distribution v0.16.4, it can be easily defined with the second and third arguments of Truncated.
julia> a = Gamma()
Gamma{Float64}(α=1.0, θ=1.0)
julia> b = Truncated(a, 2, 3)
Truncated(Gamma{Float64}(α=1.0, θ=1.0), range=(2.0, 3.0))
julia> p = rand(b, 1000);
julia> extrema(p)
(2.0007680527633305, 2.99864177354943)
You can see the document of Truncated by typing ?Truncated in REPL and enter.

Solver for non-linear least squares with boundary constraints

I'm looking for an analog to Matlab's lsqnonlin function in Julia.
LsqFit.jl looks great, but doesn't accept the same arguments Matlab's implementation does; specifically:
Lower bounds
Upper bounds
Initial conditions
where initial conditions, lower, and upper bounds are vectors of length 6.
Any advice would be awesome. Thanks!
Actually, it does, it's just not explained in the readme (for good measure, here is a stable link README.md).
It is unclear what you mean by initial conditions. If you mean initial parameters, this is very much possible.
using LsqFit
# a two-parameter exponential model
# x: array of independent variables
# p: array of model parameters
model(x, p) = p[1]*exp.(-x.*p[2])
# some example data
# xdata: independent variables
# ydata: dependent variable
xdata = linspace(0,10,20)
ydata = model(xdata, [1.0 2.0]) + 0.01*randn(length(xdata))
p0 = [0.5, 0.5]
fit = curve_fit(model, xdata, ydata, p0)
(taken from the manual). Here p0 is the initial parameter vector.
This will give you something very close to [1.0, 2.0]. But what if we want to constrain the parameter to be in [0,1]x[0,1]? Then we simply set the keyword arguments lower and upper to be vectors of lower and upper bounds
fit = curve_fit(model, xdata, ydata, p0; lower = zeros(2), upper = ones(2))
That should give something like [1.0, 1.0] depending on your exact data.
Maybe it's not a proper answer, but I have had some success in the past
adding a penalization term to the cost function outside the bounds, something like a strong exponential with a step-like behaivour. The downside is defining your cost function manually, of course.

univariate nonlinear optimization with quadratic constraint in R

I have a quadratic function f where, f = function (x) {2+.1*x+.23*(x*x)}. Let's say I have another quadratic fn g where g = function (x) {3+.4*x-.60*(x*x)}
Now, I want to maximize f given the constraints 1. g>0 and 2. 600<x<650
I have tried the packages optim,constrOptim and optimize. optimize does one dim. optimization, but without constraints and constrOptim I couldn't understand. I need to this using R. Please help.
P.S. In this example, the values may be erratic as I have given two random quadratic functions, but basically I want maximization of a quadratic fn given a quadratic constraint.
If you solve g(x)=0 for x by the usual quadratic formula then that just gives you another set of bounds on x. If your x^2 coefficent is negative then g(x) > 0 between the solutions, otherwise g(x)>0 outside the solutions, so within (-Inf, x1) and (x2, Inf).
In this case, g(x)>0 for -1.927 < x < 2.59. So in this case both your constraints cannot be simultaneously achieved (g(x) is LESS THAN 0 for 600<x<650).
But supposing your second condition was 1 < x < 5, then you'd just combine the solution from g(x)>0 with that interval to get 1 < x < 2.59, and then maximise f in that interval using standard univariate optimisation.
And you don't even need to run an optimisation algorithm. Your target f is quadratic. If the coefficient of x^2 is positive the maximum is going to be at one of your limits of x, so you only have a small number of values to try. If the coefficient of x^2 is -ve then the maximum is either at a limit or at the point where f(x) peaks (solve f'(x)=0) if that is within your limits.
So you can do this precisely, there's just a few conditions to test and then some intervals to compute and then some values of f at those interval limits to calculate.

What mathematical methods work for interpolation 2d to 2d functions?

So we have a matrix like
12,32
24,12
...
with length 2xN and another
44,32
44,19
...
with length 2xN and there is some function f(x, y) that returns z[1], z[2]. That 2 matrices that we were given represent known value pairs for x,y and z[1],z[2]. What are interpolation formulas that would help in such case?
If you solve the problem for one return value, you can find two functions f_1(x,y) and f_2(x,y) by interpolation, and compose your function as f(x, y) = [f_1(x,y), f_2(x,y)]. Just pick any method for solving the interpolation function suitable for your problem.
For the actual interpolation problem in two dimensions, there are a lot of ways you can handle this. If simple is what you require, you can go with linear interpolation. If you are OK with piecewise functions, you can go for bezier curves, or splines. Or, if data is uniform, you could get away with a simple polynomial interpolation (well, not quite trivial when in 2D, but easy enough).
EDIT: More information and some links.
A piecewise solution is possible using Bilinear interpolation (wikipedia).
For polynomial interpolation, if your data is on a grid, you can use the following algorithm (I cannot find the reference for it, it is from memory).
If the data points are on a k by l grid, rewrite your polynomial as follows:
f(x,y) = cx_1(x)*y^(k-1) + cx_2(x)*y^(k-2) + ... + cx_k(x)
Here, each coefficient cx_i(x) is also a polynomial of degree l. The first step is to find k polynomials of degree l by interpolating each row or column of the grid. When this is done, you have l coefficient sets (or, in other words, l polynomials) as interpolation points for each cx_i(x) polynomials as cx_i(x0), cx_i(x1), ..., cx_i(xl) (giving you a total of l*k points). Now, you can determine these polynomials using the above constants as the interpolation points, which give you the resulting f(x,y).
The same method is used for bezier curves or splines. The only difference is that you use control points instead of polynomial coefficients. You first get a set of splines that will generate your data points, and then you interpolate the control points of these intermediate curves to get the control points of the surface curve.
Let me add an example to clarify the above algorithm. Let's have the following data points:
0,0 => 1
0,1 => 2
1,0 => 3
1,1 => 4
We start by fitting two polynomials: one for data points (0,0) and (0,1), and another for (1, 0) and (1, 1):
f_0(x) = x + 1
f_1(x) = x + 3
Now, we interpolate in the other direction to determine the coefficients.When we read these polynomial coefficients vertically, we need two polynomials. One evaluates to 1 at both 0 and 1; and another that evaluates to 1 at 0, and 3 at 1:
cy_1(y) = 1
cy_2(y) = 2*y + 1
If we combine these into f(x,y), we get:
f(x,y) = cy_1(y)*x + cy_2(y)
= 1*x + (2*y + 1)*1
= x + 2*y + 1

Curve fitting: Find the smoothest function that satisfies a list of constraints

Consider the set of non-decreasing surjective (onto) functions from (-inf,inf) to [0,1].
(Typical CDFs satisfy this property.)
In other words, for any real number x, 0 <= f(x) <= 1.
The logistic function is perhaps the most well-known example.
We are now given some constraints in the form of a list of x-values and for each x-value, a pair of y-values that the function must lie between.
We can represent that as a list of {x,ymin,ymax} triples such as
constraints = {{0, 0, 0}, {1, 0.00311936, 0.00416369}, {2, 0.0847077, 0.109064},
{3, 0.272142, 0.354692}, {4, 0.53198, 0.646113}, {5, 0.623413, 0.743102},
{6, 0.744714, 0.905966}}
Graphically that looks like this:
(source: yootles.com)
We now seek a curve that respects those constraints.
For example:
(source: yootles.com)
Let's first try a simple interpolation through the midpoints of the constraints:
mids = ({#1, Mean[{#2,#3}]}&) ### constraints
f = Interpolation[mids, InterpolationOrder->0]
Plotted, f looks like this:
(source: yootles.com)
That function is not surjective. Also, we'd like it to be smoother.
We can increase the interpolation order but now it violates the constraint that its range is [0,1]:
(source: yootles.com)
The goal, then, is to find the smoothest function that satisfies the constraints:
Non-decreasing.
Tends to 0 as x approaches negative infinity and tends to 1 as x approaches infinity.
Passes through a given list of y-error-bars.
The first example I plotted above seems to be a good candidate but I did that with Mathematica's FindFit function assuming a lognormal CDF.
That works well in this specific example but in general there need not be a lognormal CDF that satisfies the constraints.
I don't think you've specified enough criteria to make the desired CDF unique.
If the only criteria that must hold is:
CDF must be "fairly smooth" (see below)
CDF must be non-decreasing
CDF must pass through the "error bar" y-intervals
CDF must tend toward 0 as x --> -Infinity
CDF must tend toward 1 as x --> Infinity.
then perhaps you could use Monotone Cubic Interpolation.
This will give you a C^2 (twice continously differentiable) function which,
unlike cubic splines, is guaranteed to be monotone when given monotone data.
This leaves open the question, exactly what data should you use to generate the
monotone cubic interpolation. If you take the center point (mean) of each error
bar, are you guaranteed that the resulting data points are monotonically
increasing? If not, you might as well make some arbitrary choice to guarantee
that the points you select are monotonically increasing (because the criteria does not force our solution to be unique).
Now what to do about the last data point? Is there an X which is guaranteed to
be larger than any x in the constraints data set? Perhaps you can again make an
arbitrary choice of convenience and pick some very large X and put (X,1) as the
final data point.
Comment 1: Your problem can be broken into 2 sub-problems:
Given exact points (x_i,y_i) through which the CDF must pass, how do you generate CDF? I suspect there are infinitely many possible solutions, even with the infinite-smoothness constraint.
Given y-errorbars, how should you pick (x_i,y_i)? Again, there infinitely many possible solutions. Some additional criteria may need to be added to force a unique choice. Additional criteria would also probably make the problem even harder than it currently is.
Comment 2: Here is a way to use monotonic cubic interpolation, and satisfy criteria 4 and 5:
The monotonic cubic interpolation (let's call it f) maps R --> R.
Let CDF(x) = exp(-exp(f(x))). Then CDF: R --> (0,1). If we could find the appropriate f, then by defining CDF this way, we could satisfy criteria 4 and 5.
To find f, transform the CDF constraints (x_0,y_0),...,(x_n,y_n) using the transformation xhat_i = x_i, yhat_i = log(-log(y_i)). This is the inverse of the CDF transformation. If the y_i's were increasing, then the yhat_i's are decreasing.
Now apply monotone cubic interpolation to the (x_hat,y_hat) data points to generate f. Then finally, define CDF(x) = exp(-exp(f(x))). This will be a monotonically increasing function from R --> (0,1), which passes through the points (x_i,y_i).
This, I think, satisfies all the criteria 2--5. Criteria 1 is somewhat satisfied, though there certainly could exist smoother solutions.
I have found a solution that gives reasonable results for a variety of inputs.
I start by fitting a model -- once to the low ends of the constraints, and again to the high ends.
I'll refer to the mean of these two fitted functions as the "ideal function".
I use this ideal function to extrapolate to the left and to the right of where the constraints end, as well as to interpolate between any gaps in the constraints.
I compute values for the ideal function at regular intervals, including all the constraints, from where the function is nearly zero on the left to where it's nearly one on the right.
At the constraints, I clip these values as necessary to satisfy the constraints.
Finally, I construct an interpolating function that goes through these values.
My Mathematica implementation follows.
First, a couple helper functions:
(* Distance from x to the nearest member of list l. *)
listdist[x_, l_List] := Min[Abs[x - #] & /# l]
(* Return a value x for the variable var such that expr/.var->x is at least (or
at most, if dir is -1) t. *)
invertish[expr_, var_, t_, dir_:1] := Module[{x = dir},
While[dir*(expr /. var -> x) < dir*t, x *= 2];
x]
And here's the main function:
(* Return a non-decreasing interpolating function that maps from the
reals to [0,1] and that is as close as possible to expr[var] without
violating the given constraints (a list of {x,ymin,ymax} triples).
The model, expr, will have free parameters, params, so first do a
model fit to choose the parameters to satisfy the constraints as well
as possible. *)
cfit[constraints_, expr_, params_, var_] :=
Block[{xlist,bots,tops,loparams,hiparams,lofit,hifit,xmin,xmax,gap,aug,bests},
xlist = First /# constraints;
bots = Most /# constraints; (* bottom points of the constraints *)
tops = constraints /. {x_, _, ymax_} -> {x, ymax};
(* fit a model to the lower bounds of the constraints, and
to the upper bounds *)
loparams = FindFit[bots, expr, params, var];
hiparams = FindFit[tops, expr, params, var];
lofit[z_] = (expr /. loparams /. var -> z);
hifit[z_] = (expr /. hiparams /. var -> z);
(* find x-values where the fitted function is very close to 0 and to 1 *)
{xmin, xmax} = {
Min#Append[xlist, invertish[expr /. hiparams, var, 10^-6, -1]],
Max#Append[xlist, invertish[expr /. loparams, var, 1-10^-6]]};
(* the smallest gap between x-values in constraints *)
gap = Min[(#2 - #1 &) ### Partition[Sort[xlist], 2, 1]];
(* augment the constraints to fill in any gaps and extrapolate so there are
constraints everywhere from where the function is almost 0 to where it's
almost 1 *)
aug = SortBy[Join[constraints, Select[Table[{x, lofit[x], hifit[x]},
{x, xmin,xmax, gap}],
listdist[#[[1]],xlist]>gap&]], First];
(* pick a y-value from each constraint that is as close as possible to
the mean of lofit and hifit *)
bests = ({#1, Clip[(lofit[#1] + hifit[#1])/2, {#2, #3}]} &) ### aug;
Interpolation[bests, InterpolationOrder -> 3]]
For example, we can fit to a lognormal, normal, or logistic function:
g1 = cfit[constraints, CDF[LogNormalDistribution[mu,sigma], z], {mu,sigma}, z]
g2 = cfit[constraints, CDF[NormalDistribution[mu,sigma], z], {mu,sigma}, z]
g3 = cfit[constraints, 1/(1 + c*Exp[-k*z]), {c,k}, z]
Here's what those look like for my original list of example constraints:
(source: yootles.com)
The normal and logistic are nearly on top of each other and the lognormal is the blue curve.
These are not quite perfect.
In particular, they aren't quite monotone.
Here's a plot of the derivatives:
Plot[{g1'[x], g2'[x], g3'[x]}, {x, 0, 10}]
(source: yootles.com)
That reveals some lack of smoothness as well as the slight non-monotonicity near zero.
I welcome improvements on this solution!
You can try to fit a Bezier curve through the midpoints. Specifically I think you want a C2 continuous curve.

Resources