Multi-label classification in Flux.jl? - julia

I am currently working with a dataset in which the boundary between classes is not very well defined. I don't want to use regular classification, since the nuanced overlap between these classes might not be representable in that setup.
I've seen a similar setup in PyTorch where the Binary Cross Entropy loss function was used, but other than that I am not sure what needs to be done to reformulate my problem from classification to multi-label classification in Flux. From the Flux.jl docs, it looks like I may want to use a custom Split layer?

This question made me think about how to implement multi-label classification in BetaML, my own ML library, and it turned out to be relatively easy:
(EDIT: Model simplified to just a couple of DenseLayers, with the second layer's activation function f = x -> (tanh(x) + 1)/2)
using BetaML
# Creating test data..
X = rand(2000,2)
# note that the Y are 0.0/1.0 floats
Y = hcat(round.(tanh.(0.5 .* X[:,1] + 0.8 .* X[:,2])),
round.(tanh.(0.5 .* X[:,1] + 0.3 .* X[:,2])),
round.(tanh.(max.(0.0,-3 .* X[:,1].^2 + 2 * X[:,1] + 0.5 .* X[:,2]))))
# Creating the NN model...
l1 = DenseLayer(2,10,f = relu)
l2 = DenseLayer(10,3,f = x -> (tanh(x) + 1)/2)
mynn = buildNetwork([l1,l2],squaredCost,name="Multinomial multilabel regression Model")
# Training the model...
train!(mynn,X,Y,epochs=100,batchSize=8)
# Predictions...
ŷ = round.(predict(mynn,X))
(nrec,ncat) = size(Y)
# Just a basic accuracy measure. I could extend the ConfusionMatrix measures to multi-label classification if needed..
overallAccuracy = sum(ŷ .== Y)/(nrec*ncat) # 0.999
I initially thought of using softmax with a learnable beta parameter, but then I realised that such an approach cannot work: how would the model be able to distinguish between Y = [0 0 0] and Y = [1 1 1]? So I ended up with a final layer using an adjusted tanh activation that guarantees an output in the [0,1] range for each label independently, and set the decision threshold at 0.5, the midpoint of that range (in BetaML the output is already a vector if the last layer has more than a single node).
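For reference, here is a hedged sketch of what an equivalent setup might look like in Flux.jl itself, which is what the question asked about (untested, assuming Flux's current explicit-gradient API, with X and Y as generated above): one independent sigmoid output per label, trained with binary cross-entropy, so no custom Split layer should be needed.
using Flux
# Two dense layers; the final layer outputs one logit per label
model = Chain(Dense(2 => 10, relu), Dense(10 => 3))
# Binary cross-entropy applied to each label independently
loss(m, x, y) = Flux.logitbinarycrossentropy(m(x), y)
# Flux expects observations in columns, so transpose the BetaML-style data
Xt, Yt = permutedims(X), permutedims(Y)
opt_state = Flux.setup(Adam(), model)
for epoch in 1:100
    Flux.train!(loss, model, [(Xt, Yt)], opt_state)
end
# Threshold each label's probability at 0.5, as in the BetaML version
ŷ = Flux.sigmoid.(model(Xt)) .> 0.5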

Related

How to fit a function to measurements with error?

I am currently using Measurements.jl for error propagation and LsqFit.jl for fitting functions to data. Is there a simple way to fit a function to data with errors? It would be no problem to use another package if that makes things easier.
Thanks in advance for your help.
While in principle it should be possible to make these packages work together, the implementation of LsqFit.jl does not seem to play nicely with the Measurement type. However, if one writes a simple least-squares linear regression directly
# Generate test data, with noise
x = 1:10
y = 2x .+ 3
using Measurements
x_observed = (x .+ randn.()) .± 1
y_observed = (y .+ randn.()) .± 1
# Simple least-squares linear regression
# for an equation of the form y = a + bx
# using `\` for matrix division
linreg(x, y) = hcat(fill!(similar(x), 1), x) \ y
(a, b) = linreg(x_observed, y_observed)
then
julia> (a, b) = linreg(x_observed, y_observed)
2-element Vector{Measurement{Float64}}:
3.9 ± 1.4
1.84 ± 0.23
This ought to work with x uncertainties, y uncertainties, or both.
If you need a nonlinear least-squares fit, it should also be possible to extend the above approach -- though in that case it may be easier to just find where the incompatibility is in LsqFit.jl and make a PR.
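As a small hedged illustration of that extension (a sketch, not from the original answer): any model that is still linear in its fit parameters, even if nonlinear in x, works with the same backslash operator, so Measurement propagation carries through unchanged. For example, a quadratic fit y = a + b*x + c*x^2:
# Same design-matrix trick, now with a quadratic column
quadreg(x, y) = hcat(fill!(similar(x), 1), x, x .^ 2) \ y
(a, b, c) = quadreg(x_observed, y_observed)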

General Equilibrium Problem using SymPy in Julia

I am trying to solve an economic problem using the SymPy package in Julia. In this economic problem I have exogenous variables and endogenous variables, and I am indexing them all. I have two questions:
How do I access the indexed variables in order to pass either calibrated values (to the exogenous variables, calibrated in another environment) or formulas (to the endogenous variables, determined by the first-order conditions of the agents' maximization problem, worked out with pencil and paper)? This will also allow me to study the behavior of the equilibrium when I perturb the exogenous variables. First, consider my attempt to pass calibrated values to the exogenous variables.
using SymPy
# To index
n,N = sympy.symbols("n N", integer=true)
N = 3 # It can change
# Household
#exogenous variables
α = sympy.IndexedBase("α")
@syms γ
α2 = sympy.Sum(α[n], (n, 1, N))
equation_1 = Eq(α2 + γ, 1)
equation_1 says that the α's plus γ sum to one. So I would like to pass values to the α vector according to another vector, alpha3, of calibrated parameters.
# Suppose
alpha3 = [1,2,3]
for n in 1:N
    α[n] = alpha3[n]
end
MethodError: no method matching setindex!(::Sym, ::Int64, ::Int64)
I will certainly do this step once the system is solved. Now I want to pass formulas or expressions as functions of prices. Prices are endogenous and unknown variables. (As said before, the expressions were calculated with paper and pencil.)
# Price vector, Endogenous, unknown in the system equations
P = sympy.IndexedBase("P")
# Other exogenous variables to be calibrated.
z = sympy.IndexedBase("z")
s = sympy.IndexedBase("s")
Y = sympy.IndexedBase("Y")
# S[n] and D[n], Supply and Demand, are endogenous, but determined by the first-order conditions of the maximization problem of the agents
# Supply and Demand
S = sympy.IndexedBase("S")
D = sympy.IndexedBase("D")
# (Hypothetical functions that I have to pass)
# S[n] = s[n]*P[n]
# D[n] = z[n]/P[n]
Once I can write the formulas on S[n] and D[n], consider the second question:
How do I specify the indexed endogenous variables (all prices in their indexed format P[n]) as the unknowns of the system of non-linear equations? I will ignore the possibility of the system being unsolvable. Suppose my system has a single solution or infinitely many (a manifold). So let's assume that I have more equations than variables:
# For all n, I want determine N indexed equations (looping?)
Eq_n = Eq(S[n] - D[n],0)
# Some other equations relating the P[n]'s
Eq0 = Eq(sympy.Sum(P[n]*Y[n] , (n, 1, N)), 0 )
# Equations system
eq_system = [Eq_n,Eq0]
# Solving
solveset(eq_system,P[n])
Many thanks
There isn't any direct support for the IndexedBase feature of SymPy. As such, the syntax alpha[n] is not available. You can call the method __getitem__ directly, as with
alpha.__getitem__(n)
I don't see a corresponding __setitem__ documented, so I'm not sure whether
α[n]= alpha3[n]
is valid in sympy itself. But if there is some other assignment method, you would likely just call that instead of using [ for assignment.
As for the last question about equations, I'm not sure, but you would presumably find the size of the IndexedBase object and use that to loop.
If possible, using native Julia constructs would be preferred. For this example, you might just consider an array of variables. The recently changed @syms macro makes this easy to generate.
For example, I think the following mostly replicates what you are trying to do:
@syms n::integer, N::integer
# exogenous variables
N = 3
@syms α[1:3] # hard-code 3 here or use `α = [Sym("α$i") for i ∈ 1:N]`
@syms γ
α2 = sum(α[i] for i ∈ 1:N)
equation_1 = Eq(α2 + γ, 1)
alpha3 = [1,2,3]
for n in 1:N
    α[n] = alpha3[n]
end
@syms P[1:3], z[1:3], s[1:3], Y[1:3], S[1:3], D[1:3]
Eq_n = [Eq(S[n], D[n]) for n ∈ 1:N]
Eq0 = Eq(sum(P .* Y), 0)
eq_system = vcat(Eq_n, Eq0)
solve(eq_system, P)
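As a hedged follow-up sketch (not part of the original answer): with plain arrays of symbols, calibrated values can also be substituted in with subs instead of overwriting the array entries, which keeps the symbols reusable afterwards. Here β is a fresh, hypothetical symbol array, since α was overwritten by the loop above:
@syms β[1:3]
ex = sum(β[i] for i ∈ 1:N) + γ
# Substitute the calibrated values alpha3 = [1,2,3]; the result is γ + 6
ex_cal = subs(ex, (β[i] => alpha3[i] for i ∈ 1:N)...)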

How is Hard Sigmoid defined

I am working on deep nets using Keras. There is an activation "hard sigmoid". What's its mathematical definition?
I know what the sigmoid is. Someone asked a similar question on Quora: https://www.quora.com/What-is-hard-sigmoid-in-artificial-neural-networks-Why-is-it-faster-than-standard-sigmoid-Are-there-any-disadvantages-over-the-standard-sigmoid
But I could not find the precise mathematical definition anywhere.
Since Keras supports both Tensorflow and Theano, the exact implementation might be different for each backend - I'll cover Theano only. For the Theano backend, Keras uses T.nnet.hard_sigmoid, which is in turn a linearly approximated standard sigmoid:
slope = tensor.constant(0.2, dtype=out_dtype)
shift = tensor.constant(0.5, dtype=out_dtype)
x = (x * slope) + shift
x = tensor.clip(x, 0, 1)
i.e. it is: max(0, min(1, x*0.2 + 0.5))
For reference, the hard sigmoid function may be defined differently in different places. In Courbariaux et al. 2016 [1] it's defined as:
σ is the “hard sigmoid” function: σ(x) = clip((x + 1)/2, 0, 1) =
max(0, min(1, (x + 1)/2))
The intent is to provide a probability value (hence constraining it to be between 0 and 1) for use in stochastic binarization of neural network parameters (e.g. weights, activations, gradients). You use the probability p = σ(x) returned by the hard sigmoid to set the parameter x to +1 with probability p, or to -1 with probability 1-p.
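To make that recipe concrete, here is a minimal sketch (in Julia; the names hardsigmoid and binarize are illustrative, not from the paper):
# Hard sigmoid from Courbariaux et al. 2016: clip((x + 1)/2, 0, 1)
hardsigmoid(x) = clamp((x + 1) / 2, 0, 1)
# Stochastic binarization: +1 with probability p = hardsigmoid(x), else -1
binarize(x) = rand() < hardsigmoid(x) ? one(x) : -one(x)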
[1] https://arxiv.org/abs/1602.02830 - "Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1", Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio, (Submitted on 9 Feb 2016 (v1), last revised 17 Mar 2016 (this version, v3))
The hard sigmoid is normally a piecewise linear approximation of the logistic sigmoid function. Depending on what properties of the original sigmoid you want to keep, you can use a different approximation.
I personally like to keep the function correct at zero, i.e. σ(0) = 0.5 (shift) and σ'(0) = 0.25 (slope). This could be coded as follows:
import numpy as np

def hard_sigmoid(x):
    return np.maximum(0, np.minimum(1, (x + 2) / 4))
It is
clip((x + 1)/2, 0, 1)
or, in coding parlance:
max(0, min(1, (x + 1)/2))

OpenMDAO 1.2.0 implicit component

I'm new to OpenMDAO and I'm still learning how to formulate problems.
For a simple example, let's say I have 3 input variables with the given bounds:
1 <= x <= 10
0 <= y <= 10
1 <= z <= 10
and I have 4 outputs, defined as:
f1 = x * y
f2 = 2 * z
g1 = x + y - 1
g2 = z
My goal is to minimize f1 * g1 while enforcing the constraints f1 = f2 and g1 = g2. For example, one solution is x=3, y=4, z=6 (no idea if this is optimal).
For this simple problem, you can probably just feed the output equality constraints to the driver. However, for my actual problem it's hard to find an initial starting point that satisfies all the constraints, and as a result the optimizer fails to make progress. I figure maybe I could define y and z as states in an implicit component and have a nonlinear solver figure out the right values of y and z given x, then feed x to the optimization driver.
Is this a possible approach? If so, what would the implicit component look like in this case? I looked at the Sellar problem tutorial but I wasn't able to translate it to this case.
You could create an implicit component if you want. In that case, you would define an apply_nonlinear method in your component to compute the residuals. That is done for the Sellar problem here.
In your case since you have a 2 equation set of residuals which are both dependent on the state variables, I suggest you create a single array state variable of length 2, call it foo (I used a new variable to avoid any confusion, but name it whatever you want!). Then you will define two residuals, one for each element of the residual array of the new state variable.
Something like:
# residual for foo[0]: f1 - f2 = x*y - 2*z, with unknowns['foo'][0] = y and unknowns['foo'][1] = z
resids['foo'][0] = params['x'] * unknowns['foo'][0] - 2 * unknowns['foo'][1]
# residual for foo[1]: g1 - g2 = (x + y - 1) - z
resids['foo'][1] = params['x'] + unknowns['foo'][0] - 1 - unknowns['foo'][1]
If you wanted to keep the state variable names separate, you could, and it would still work. You'd just have to arbitrarily assign one residual equation to one variable and one to the other.
Then the only thing left is to add a nonlinear solver to the group containing your implicit component, and it should work. If you choose to use a Newton solver, you'll either need to set fd_options['force_fd'] = True or define the derivatives of your residuals with respect to all params and state variables.

R Nonlinear Least Squares (nls) Model Fitting

I'm trying to fit the information from the G function of my data to the following mathematical model: y = A / ((1 + (B^2)*(x^2))^((C+1)/2)). The shape of this graph can be seen here:
http://www.wolframalpha.com/input/?i=y+%3D+1%2F+%28%281+%2B+%282%5E2%29*%28x%5E2%29%29%5E%28%282%2B1%29%2F2%29%29
Here's a basic example of what I've been doing:
library(spatstat)
data(simdat)
simdat.Gest <- Gest(simdat) # Gest is a function within spatstat (explained below)
Gvalues <- simdat.Gest$rs
Rvalues <- simdat.Gest$r
GvsR_dataframe <- data.frame(R = Rvalues, G = rev(Gvalues))
themodel <- nls(rev(Gvalues) ~ (1 / (1 + (B^2)*(R^2))^((C+1)/2)), data = GvsR_dataframe, start = list(B=0.1, C=0.1), trace = FALSE)
"Gest" is a function found within the 'spatstat' library. It is the G function, or the nearest-neighbour function, which displays the distance between particles on the independent axis, versus the probability of finding a nearest neighbour particle on the dependent axis. Thus, it begins at y=0 and hits a saturation point at y=1.
If you plot simdat.Gest, you'll notice that the curve is 's'-shaped, meaning that it starts at y = 0 and ends up at y = 1. For this reason, I reversed the vector Gvalues, which holds the dependent variable. Thus, the information is in the correct orientation to be fitted to the above model.
You may also notice that I've automatically set A = 1. This is because G(r) always saturates at 1, so I didn't bother keeping it in the formula.
My problem is that I keep getting errors. For the above example, I get this error:
Error in nls(rev(Gvalues) ~ (1/(1 + (B^2) * (R^2))^((C + 1)/2)), data = GvsR_dataframe, :
singular gradient
I've also been getting this error:
Error in nls(Gvalues1 ~ (1/(1 + (B^2) * (x^2))^((C + 1)/2)), data = G_r_dataframe, :
step factor 0.000488281 reduced below 'minFactor' of 0.000976562
I haven't a clue as to where the first error is coming from. The second, however, I believe was occurring because I did not pick suitable starting values for B and C.
I was hoping that someone could help me figure out where the first error was coming from. Also, what is the most effective way to pick starting values to avoid the second error?
Thanks!
As noted, your problem is most likely the starting values. There are two strategies you could use:
Use brute force to find starting values. See package nls2 for a function to do this.
Try to get a sensible guess for starting values.
Depending on your values it could be possible to linearize the model.
G = (1 / (1 + (B^2)*(R^2))^((C+1)/2))
ln(G)=-(C+1)/2*ln(B^2*R^2+1)
If B^2*R^2 is large, this becomes approx. ln(G) = -(C+1)*(ln(B)+ln(R)), which is linear.
If B^2*R^2 is close to 1, it is approx. ln(G) = -(C+1)/2*ln(2), which is constant.
(Please check for errors, it was late last night due to the soccer game.)
Edit after additional information has been provided:
The data looks like it follows a cumulative distribution function. If it quacks like a duck, it most likely is a duck. And in fact ?Gest states that a CDF is estimated.
library(spatstat)
data(simdat)
simdat.Gest <- Gest(simdat)
Gvalues <- simdat.Gest$rs
Rvalues <- simdat.Gest$r
plot(Gvalues~Rvalues)
#let's try the normal CDF
fit <- nls(Gvalues~pnorm(Rvalues,mean,sd),start=list(mean=0.4,sd=0.2))
summary(fit)
lines(Rvalues,predict(fit))
#Looks not bad. There might be a better model, but not the one provided in the question.
