How to combine multiple objectives for optimization? - math

I don't know why this is so hard for me to figure out.
For example, I have two functions, f(x, y) and g(x, y). I want to find the values of x and y such that:
f(x, y) is at a target value (minimize the difference from the target)
g(x, y) is minimized (can be negative, doesn't stop at 0)
x and y are bounded (so g's minimum doesn't necessarily have a gradient of 0)
So if I were just finding a solution for f, I could minimize abs(f(x, y) - target), for instance, and it will hit zero when it's found a solution. But there are multiple such solutions and I also want to find the one that minimizes g.
So how do I combine these two functions into a single expression that I can then minimize (using a Newton-like method)?
My first attempt was 100*abs(f(x, y) - target) + g(x, y) to strongly emphasize hitting the target first, and it worked for some cases of the target value, but failed for others, since g(x, y) could go so negative that it dominated the combination and the optimizer stopped caring about f. How do I guarantee that f hitting the target is always dominant?
Are there general rules for how to combine multiple objectives into a single objective?

There is a rich literature about multi-objective optimization. Two popular methods are the weighted-sum objective and the lexicographic approach.
A weighted objective could be designed as:
min w1 * [f-target]^2 + w2 * g
for some weights w1, w2 >= 0. Often we have w1+w2=1 so we can also write:
min w1 * [f-target]^2 + (1-w1) * g
Set w1 to a larger value than w2 to put emphasis on the f objective.
The lexicographic method assumes an ordering of objectives. It can look like:
Solve with first objective z = min [f-target]^2. Let z* be the optimal objective.
Solve with the second objective while staying close to z*:
min g subject to [f-target]^2-z* <= tolerance
To measure the deviation between the target and f I used a quadratic function here. You can also use an absolute value.
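As a rough illustration of the lexicographic steps above, here is a minimal Python sketch. The functions f and g, the target, the bounds and the grid resolution are all made-up assumptions, and a brute-force grid search stands in for whatever solver you actually use.

```python
from itertools import product

# Hypothetical objectives, chosen only for illustration.
def f(x, y):
    return x + y

def g(x, y):
    return x - y

target = 1.0
tolerance = 1e-9

# Bounded search space: x and y in [0, 1], sampled on a coarse grid.
grid = [i / 20 for i in range(21)]
points = list(product(grid, grid))

def deviation(p):
    return (f(*p) - target) ** 2

# Stage 1: best achievable deviation from the target.
z_star = min(deviation(p) for p in points)

# Stage 2: minimize g among points that stay close to z_star.
feasible = [p for p in points if deviation(p) - z_star <= tolerance]
best = min(feasible, key=lambda p: g(*p))

print(z_star, best, g(*best))
```

Because g is only ever compared among points that already hit the target, it cannot dominate f no matter how negative it gets, which is the failure mode of the fixed-weight sum in the question.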

Since you cannot exactly get f(x,y)-target to be zero, you have to accept some amount of error. I will use the relative error r = abs((f(x, y) - target)/target).
A function which grows extremely rapidly with r should do the trick.
exp(r/epsilon) + g(x, y)
If I choose epsilon = 1e-10, then I know r has to be less than 1e-7, because exp(1000) is an enormous number. When r is small, like r = 1e-12, the exponential changes very slowly and g(x, y) will be the dominant term. You can even take it a step further and calculate how close x and y are to their true values, but it's usually just easier to adjust the parameter until you get what you need.
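To make the two regimes concrete, here is a small Python sketch of the combined objective. The clamp on the exponent (cap) is my own assumption and not part of the formula above: in double precision exp overflows a little above exp(709), and Python's math.exp raises OverflowError there, so exp(1000) cannot be evaluated directly.

```python
import math

def combined(r, g_value, epsilon=1e-10, cap=700.0):
    """exp(r / epsilon) + g, with the exponent clamped at `cap`.

    `cap` is an assumption added here to avoid floating-point overflow.
    """
    return math.exp(min(r / epsilon, cap)) + g_value

# Small relative error: the exponential is about 1, so g decides.
print(combined(1e-12, -5.0))

# Relative error of 1e-7: the penalty swamps any realistic g.
print(combined(1e-7, -5.0))
```

Note that the clamp makes the penalty flat once r / epsilon exceeds cap, so a gradient-based optimizer started far from the target may see no slope there; starting from a point that already roughly satisfies f = target sidesteps this.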


Solve a particular linear system efficiently in julia

I make extensive use of Julia's linear equation solver res = X\b. I have to call it millions of times in my program because of parameter variation. This was working fine because I was using small dimensions (up to 30). Now that I want to analyse bigger systems, up to 1000, the linear solver is no longer efficient.
I think there may be a workaround. However, I must say that sometimes my X matrix is dense and sometimes it is sparse, so I need something that works well in both cases.
The b vector is a vector with all zeroes, except for one entry which is always 1 (actually it is always the last entry). Moreover, I don't need all the res vector, just the first entry of it.
If your problem is of the form (A - µI)x = b, where µ is a variable parameter and A, b are fixed, you might work with diagonalization.
Let A = PDP° where P° denotes the inverse of P. Then (PDP° - µI)x = b can be transformed to
(D - µI)P°x = P°b,
P°x = P°b / (D - µI),
x = P(P°b / (D - µI)).
(Here the / operation denotes element-wise division of the vector entries by the scalars D_ii - µ, the diagonal entries of D - µI.)
After you have diagonalized A, computing a solution for any µ reduces to two matrix/vector products, or a single one if you can also precompute P°b.
Numerical instability will show up when µ is in the vicinity of the eigenvalues of A, because the diagonal entries of D - µI then approach zero.
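Here is a minimal Python sketch of this idea on a hand-diagonalized 2x2 example. The matrix A = [[2, 1], [1, 2]] and its eigenpairs are my own illustrative choice, not taken from the question. Since the question only needs the first entry of the solution and b has a single 1 in the last position, only the first row of P and the precomputed vector P°b are used per value of µ.

```python
import math

# A = [[2, 1], [1, 2]] has eigenvalues 1 and 3 with orthonormal
# eigenvectors (1, -1)/sqrt(2) and (1, 1)/sqrt(2).
s = 1.0 / math.sqrt(2.0)
P = [[s, s], [-s, s]]        # columns are the eigenvectors
D = [1.0, 3.0]               # eigenvalues, in the same column order
P_inv = [[s, -s], [s, s]]    # P is orthogonal, so its inverse is its transpose

b = [0.0, 1.0]               # all zeros except the last entry

# Precompute c = P_inv * b once; it does not depend on mu.
c = [sum(P_inv[i][k] * b[k] for k in range(2)) for i in range(2)]

def first_entry(mu):
    """First component of x solving (A - mu*I) x = b."""
    return sum(P[0][j] * c[j] / (D[j] - mu) for j in range(2))

def first_entry_direct(mu):
    """Reference value from Cramer's rule for the 2x2 system."""
    det = (2.0 - mu) ** 2 - 1.0
    return -1.0 / det

for mu in (0.0, 0.5, 2.0):
    print(mu, first_entry(mu), first_entry_direct(mu))
```

For an n x n matrix the per-µ cost of the first entry drops to O(n) after the one-time diagonalization, but this only applies if X really has the form A - µI.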
Usually when people talk about speeding up linear solvers res = X \ b, it’s for multiple bs. But since your b isn’t changing, and you just keep changing X, none of those tricks apply.
The only way to speed this up, from a mathematical perspective, seems to be to ensure that Julia is picking the fastest solver for X \ b, i.e., if you know X is positive-definite, use Cholesky, etc. Matlab’s flowcharts for how it picks the solver to use for X \ b, for dense and sparse X, are available—most likely Julia implements something close to these flowcharts too, but again, maybe you can find some way to simplify or shortcut it.
All programming-related speedups (multiple threads—while each individual solver is probably already multi-threaded, it may be worth running multiple solvers in parallel when each solver uses fewer threads than cores; @simd if you’re willing to dive into the solvers themselves; OpenCL/CUDA libraries; etc.) then can be applied.
Best approach for efficiency would be to use: JuliaMath/IterativeSolvers.jl. For A * x = b problems, I would recommend x = lsmr(A, b).
The second-best alternative would be to give a bit more information to the compiler: instead of x = inv(A'A) * A' * b, do x = inv(cholfact(A'A)) * A' * b if Cholesky decomposition works for you (cholfact was renamed to cholesky in Julia 1.0). Otherwise, you could try U, S, V = svd(A) and x = V * Diagonal(1 ./ S) * U' * b.
Unsure if x = pinv(A) * b is optimized, but might be slightly more efficient than x = A \ b.

How to adjust coefficient of equations to obtain high correlation between y and x_i?

Given a set of variables, x's. I want to find the values of coefficients for this equation:
y = a_1*x_1 +... +a_n*x_n + c
where a_1, a_2, ..., a_n are all unknowns. Thinking of this in terms of a data frame, I want to create this value of y for every row in the data.
My question is: since y, a_1, ..., a_n and c are all unknown, is there a way for me to find a set of solutions a_1, ..., a_n under the condition that corr(y,x_1), corr(y,x_2), ..., corr(y,x_n) are all greater than 0.7? For simplicity, take correlation here to be the Pearson correlation. I know there would not be a unique solution. But how can I construct a set of solutions for a_1, ..., a_n that fulfills this condition?
I spent a day searching for ideas but could not find any information. A solution in any programming language is welcome, or at least some reference for this.
No, it is not possible in general. It may be possible in some special cases.
Given x₁, x₂, ... you want to find y = a₁x₁ + a₂x₂ + ... + c so that all the correlations between y and the x's are greater than some target R. Since the correlation is
Corr(y, xi) = Cov(y, xi) / Sqrt[ Var(y) * Var(xi) ]
your constraint is
Cov(y, xi) / Sqrt[ Var(y) * Var(xi) ] > R
which can be rearranged to
Cov(y, xi)² > R² * Var(y) * Var(xi)
and this needs to be true for all i.
Consider the simple case where there are only two columns x₁ and x₂, and further assume that they both have mean zero (so you can ignore the constant c) and variance 1, and that they are uncorrelated. In that case y = a₁x₁ + a₂x₂ and the covariances and variances are
Cov(y, x₁) = a₁
Cov(y, x₂) = a₂
Var(x₁) = 1
Var(x₂) = 1
Var(y) = (a₁)² + (a₂)²
so you need to simultaneously satisfy
(a₁)² > R² * ((a₁)² + (a₂)²)
(a₂)² > R² * ((a₁)² + (a₂)²)
Adding these inequalities together, you get
(a₁)² + (a₂)² > 2 * R² * ((a₁)² + (a₂)²)
which means that in order to satisfy both of the inequalities, you must have R < Sqrt(1/2) (by cancelling common factors on both sides of the inequality). So the very best you could do in this simple case is to choose a₁ = a₂ (the exact value doesn't matter as long as they are equal and positive), and both of the correlations Corr(y, x₁) and Corr(y, x₂) will be equal to 0.707. You cannot achieve correlations higher than this between y and all of the x's simultaneously in this case.
For the more general case with n columns (each of which has mean zero, variance 1 and zero correlation between columns) you cannot simultaneously achieve correlations greater than 1 / sqrt(n) (as pointed out in the comments by @kazemakase).
In general, the more independent variables there are, the lower the correlation you will be able to achieve between y and the x's. Also (although I haven't mentioned it above) the correlations between the x's matter. If they are in general positively correlated, you will be able to achieve a higher target correlation between y and the x's. If they are in general uncorrelated or negatively correlated, you will only be able to achieve low correlations between y and the x's.
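The two-column case can be checked numerically. The sketch below is only an illustration: the columns x1 and x2 are hand-picked toy data with mean zero, unit variance and zero correlation, and pearson is a small helper written here, not a library function.

```python
import math

def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

# Toy columns: mean 0, variance 1, mutually uncorrelated.
x1 = [1.0, -1.0, 1.0, -1.0]
x2 = [1.0, 1.0, -1.0, -1.0]

# Equal weights: both correlations sit at the 1/sqrt(2) ceiling.
y_equal = [a + b for a, b in zip(x1, x2)]
r1, r2 = pearson(y_equal, x1), pearson(y_equal, x2)

# Unequal weights: one correlation rises, the other falls.
y_skewed = [2 * a + b for a, b in zip(x1, x2)]
q1, q2 = pearson(y_skewed, x1), pearson(y_skewed, x2)

print(r1, r2)
print(q1, q2)
```

With equal weights both correlations come out at about 0.707; with weights (2, 1) they split into 2/sqrt(5) and 1/sqrt(5), so pushing one above the ceiling drags the other below it.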
I am not expert in this field so read with extreme prejudice!
I am a bit confused by your y
Your y is a single constant and you want the correlation between it and all the x_i values to be > 0.7? I am no math/statistics expert, but my feeling is that this is achievable only if the correlations between x_i and x_j uphold the same condition. In that case you can simply take the average of the x_i like this:
y=(x_1+x_2+x_3+...+x_n)/n
so a_i = 1.0/n and c = 0.0. But the question still is:
What meaning does a correlation between only 2 numbers have?
More reasonable would be if y is a function dependent on x
for example like this:
y(x) = a_1*(x-x_1)+... +a_n*(x-x_n) + c
or any other equation (hard to make any without knowing where it came from and for what purpose). Then you can compute the correlation between two sets
X = { x_1 , x_2 ,..., x_n }
Y = { y(x_1),y(x_2),...y(x_n) }
In that case I would try an approximation search for the constants c, a_i to maximize the correlation between X and Y, but the complexity of searching over all of them at once would be insane. So instead I would tweak just one constant at a time:
1. Set some safe initial c, a_1, a_2, ... constants.
2. Tweak a_1: compute the correlation for (a_1-delta) and (a_1+delta) and choose the direction that favors the correlation. Keep going in that direction until the correlation coefficient starts to drop. Then you can recursively do this again with a smaller delta. Btw this is exactly what my approx class does from the link above.
3. Loop #2 through all the a_i.
4. Loop this whole thing a few times to enhance precision.
Maybe you could compute c after each run to minimize the distance between the X and Y sets.

average value when comparing each element with each other element in a list

I have a number of strings (n strings) and I am computing the edit distance between them: I take the first one and compare it to the (n-1) remaining strings, then the second one and compare it to the (n-2) remaining, and so on until I run out of strings.
Why would the average edit distance be computed as the sum of all the edit distances between all the strings divided by the number of comparisons squared? This squaring is confusing me.
Thanks,
Jannine
I assume you have somewhere an answer that seems to come with a squared factor -which I'll take as n^2, where n is the number of strings (not the number of distinct comparisons, which is n*(n-1)/2, as +flaschenpost points to ). It would be easier to give you a more precise answer if you'd exactly quote what that answer is.
From what I understand of your question, it isn't, at least it's not the usual sample average. It is, however, a valid estimator of central tendency with the caveat that it is a biased estimator.
See https://en.wikipedia.org/wiki/Bias_of_an_estimator.
Let's define the sample average, which I will denote as X', by
X' = \sum^m_i X_i/N
If N = m, we get the standard average. In your case, m is the number of distinct pairs, m = n*(n-1)/2. Let's call this average Xo.
Then if N=n*n, it is
X' = (n-1)/(2*n) Xo
Xo is an unbiased estimator of the population mean \mu. Therefore, X' is biased by a factor f=(n-1)/(2*n). For n very large this bias tends to 1/2.
That said, it could be that the answer you see has a sum that runs not just over distinct pairs. The normalization would then change, of course. For instance, we could extend that sum to all ordered pairs without changing the average value: the correct normalization would then be N = n*(n-1), and the value of the average would still be Xo, since the number of summands has doubled as well.
Those things are getting easier to understand if done by hand with pen and paper for a small example.
If you have the 7 strings named a,b,c,d,e,f,g, then the simplest version would be:
Compare a to b, a to c, ... , a to g (these are 6)
Compare b to a, b to c, ... , b to g (these are 6)
. . .
Compare g to a, g to b, ... , g to f (these are 6)
So you have 7*6 or n*(n-1) values, so you divide by nearly 7^2. This is where the square comes from. Maybe you even compare a to a, which should give a distance of 0 and increase the number of values to 7*7 or n*n. But I would count that as a bit of cheating for the average distance.
You could double the speed of the algorithm by changing it just a little:
Compare a to b, a to c, ... , a to g (these are 6)
Compare b to c, ... , b to g (these are 5)
Compare c to d, ... , c to g (these are 4)
. . .
Compare f to g (this is 1)
That follows good ol' Gauss: 7*6/2, or n*(n-1)/2.
So in Essence: Try doing a simple example on paper and then count your distance values.
Since Average is still and very simply the same as ever:
sum(values) / count(values)
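The counting can also be checked in a few lines of Python. This is only a sketch: the four sample strings are arbitrary, and edit_distance is a plain Levenshtein implementation written here for the example.

```python
from itertools import combinations

def edit_distance(s, t):
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (cs != ct)))   # substitution
        prev = cur
    return prev[-1]

strings = ["kitten", "sitting", "mitten", "fitting"]
n = len(strings)

# Each unordered pair exactly once: n*(n-1)/2 comparisons.
distances = [edit_distance(a, b) for a, b in combinations(strings, 2)]

mean_distinct = sum(distances) / len(distances)    # sum(values) / count(values)
mean_over_n_squared = sum(distances) / (n * n)     # the "squared" normalization

print(len(distances), mean_distinct, mean_over_n_squared)
```

Dividing the distinct-pair sum by n*n shrinks the plain average by the factor (n-1)/(2*n), which is the bias discussed above.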

How to implement fuzzy minimum function via fuzzy maximum

I know that I can represent a fuzzy max via a power function (I need it in a neural network), i.e.
import scala.math.pow
def max(p: Double)(a: Double, b: Double) =
  pow(pow(a, p) + pow(b, p), 1 / p)
// assumption: a >= 0 and b >= 0
It becomes the maximum as p -> infinity and the sum when p = 1.
I am not sure how to correctly implement a fuzzy minimum.
If you are willing to replace "sum" with "harmonic sum" for the p=1 case, you can use
1/(pow(pow(a,-p) + pow(b,-p),1/p))
This converges to min(a,b) as p goes to infinity.
For p=1 it's 1/(1/a + 1/b), which is related to the harmonic mean but without the factor of 2. Just like in your original formula, a+b is related to the arithmetic mean but without the factor of 2.
However, note that both of these formulas (yours and mine) converge much more slowly to the limit as p goes to infinity, for cases where a and b are closer together.
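For a quick numerical check, here is the same pair of formulas as a Python sketch. The names soft_max and soft_min are mine, and both assume a > 0 and b > 0 (the minimum divides by powers of a and b, so zero is not allowed there).

```python
def soft_max(a, b, p):
    """(a^p + b^p)^(1/p): a + b at p = 1, tends to max(a, b) as p grows."""
    return (a ** p + b ** p) ** (1.0 / p)

def soft_min(a, b, p):
    """1 / (a^-p + b^-p)^(1/p): harmonic sum at p = 1, tends to min(a, b)."""
    return 1.0 / (a ** -p + b ** -p) ** (1.0 / p)

for p in (1, 2, 10, 50):
    print(p, soft_min(2.0, 3.0, p), soft_max(2.0, 3.0, p))

# Nearly equal inputs converge more slowly, as noted above.
print(soft_min(2.0, 2.01, 50), soft_min(2.0, 3.0, 50))
```

Very large p can overflow or underflow the intermediate powers, so in practice p is kept moderate or the inputs are rescaled first.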

how to evaluate derivative of function in matlab?

This should be very simple. I have a function f(x), and I want to evaluate f'(x) for a given x in MATLAB.
All my searches have come up with symbolic math, which is not what I need, I need numerical differentiation.
E.g. if I define: fx = inline('x.^2')
I want to find say f'(3), which would be 6, I don't want to find 2x
If your function is known to be twice differentiable, use
f'(x) = (f(x + h) - f(x - h)) / (2h)
which is second order accurate in h. If it is only once differentiable, use
f'(x) = (f(x + h) - f(x)) / h (*)
which is first order in h.
This is theory. In practice, things are quite tricky. I'll take the second formula (first order) as the analysis is simpler. Do the second order one as an exercise.
The very first observation is that you must make sure that (x + h) - x = h, otherwise you get huge errors. Indeed, f(x + h) and f(x) are close to each other (say 2.0456 and 2.0467), and when you subtract them, you lose a lot of significant figures (here it is 0.0011, which has 3 significant figures fewer than the operands). So any error on h is likely to have a huge impact on the result.
So, first step, fix a candidate h (I'll show you in a minute how to choose it), and take as h for your computation the quantity h' = (x + h) - x. If you are using a language like C, you must take care to define h or x as volatile so that this computation is not optimized away.
Next, the choice of h. The error in (*) has two parts: the truncation error and the roundoff error. The truncation error is because the formula is not exact:
(f(x + h) - f(x)) / h = f'(x) + e1(h)
where |e1(h)| <= h / 2 * sup_{t in [x, x+h]} |f''(t)|.
The roundoff error comes from the fact that f(x + h) and f(x) are close to each other. It can be estimated roughly as
e2(h) ~ epsilon_f |f(x) / h|
where epsilon_f is the relative precision in the computation of f(x) (or f(x + h), which is close). This has to be assessed from your problem. For simple functions, epsilon_f can be taken as the machine epsilon. For more complicated ones, it can be worse than that by orders of magnitude.
So you want h which minimizes e1(h) + e2(h). Plugging everything together and optimizing in h yields
h ~ sqrt(2 * epsilon_f * f / f'')
which has to be estimated from your function. You can take rough estimates. When in doubt, take h ~ sqrt(epsilon) where epsilon = machine accuracy. For the optimal choice of h, the relative accuracy to which the derivative is known is sqrt(epsilon_f), i.e. half the significant figures are correct.
In short: too small an h => roundoff error, too large an h => truncation error.
For the second order formula, same computation yields
h ~ (3 * epsilon_f * f / f''')^(1/3)
and a fractional accuracy of (epsilon_f)^(2/3) for the derivative (which is typically one or two significant figures better than the first order formula, assuming double precision).
If this is too imprecise, feel free to ask for more methods; there are a lot of tricks to get better accuracy. Richardson extrapolation is a good start for smooth functions. But those methods typically compute f quite a few times, which may or may not be what you want if your function is complex.
If you are going to use numerical derivatives a lot of times at different points, it becomes interesting to construct a Chebyshev approximation.
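The recipe above (representable step h' = (x + h) - x, with h ~ sqrt(epsilon) for the first-order formula and h ~ epsilon^(1/3) for the second-order one) can be sketched as follows. Python is used here only for illustration; the function names are mine, and the default steps are the "when in doubt" choices rather than the function-specific optimum.

```python
import math
import sys

EPS = sys.float_info.epsilon  # machine epsilon for double precision

def forward_diff(f, x, h=math.sqrt(EPS)):
    """First-order estimate (f(x + h) - f(x)) / h with a representable step."""
    xh = x + h
    h = xh - x          # the step actually taken, exactly representable
    return (f(xh) - f(x)) / h

def central_diff(f, x, h=EPS ** (1.0 / 3.0)):
    """Second-order estimate (f(x + h) - f(x - h)) / (2h)."""
    xp, xm = x + h, x - h
    return (f(xp) - f(xm)) / (xp - xm)   # divide by the step actually taken

def square(x):
    return x * x

print(forward_diff(square, 3.0))     # close to 6
print(central_diff(square, 3.0))     # closer to 6
print(central_diff(math.sin, 1.0), math.cos(1.0))
```

For x far from order 1 it is common to scale the step with the magnitude of x; that refinement is left out here to keep the sketch close to the formulas in the text.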
To get a numerical difference (symmetric difference), you calculate (f(x+dx)-f(x-dx))/(2*dx)
fx = @(x) x.^2;
fPrimeAt3 = (fx(3.1)-fx(2.9))/0.2;
Alternatively, you can create a vector of function values and apply DIFF, i.e.
xValues = 2:0.1:4;
fValues = fx(xValues);
df = diff(fValues)./0.1;
Note that diff takes the forward difference and assumes that dx equals 1, hence the division by the actual step 0.1.
However, in your case, you may be better off to define fx as a polynomial, and evaluating the derivative of the function, rather than the function values.
Lacking the symbolic toolbox, nothing stops you from using Derivest, a tool for automatic adaptive numerical differentiation.
derivest(@sin,pi)
ans =
-1
For your example it does very nicely. In fact, it even provides an estimate of the error in the resulting approximation.
fx = inline('x.^2');
[fp,errest] = derivest(fx,3)
fp =
6
errest =
3.6308e-14
Did you try the diff (calculates differences and approximates a derivative), gradient, or polyder (calculates the derivative of a polynomial) functions?
You can read more on these functions by using help <commandname> on MATLAB console, or use the function browser in the Help menu.
For a given function in analytical form, you can evaluate the derivative at a desired point with the following code:
syms x
df = diff(x^2);
df3 = subs(df, x, 3);
fprintf('f''(3)=%f\n', double(df3));  % convert the symbolic result to a number
For pure numerical derivatives use the already given solutions by Jonas and posdef.
