Convexity of ratio of two linear functions - convex-optimization

I am working on optimization of an objective function which is a ratio of two linear functions given as mx + b/-mx+c. Can somebody comment about convexity of this function and/or give me some reference?

You might consider consulting Stephen Boyd's convex optimization book. Section 3.4 (example 3.32) is what you are interested in. Your example is called a linear fractional function and is indeed quasiconvex and quasiconcave if you restrict the domain of the denominator to be either greater or less than 0. Quasiconvex optimization problems can be solved using a method like bisection which involves solving a series of feasibility problems

The easiest litmus test for convexivity of a function is to take the derivative and consider the region where this derivative is zero - these are potential local minima, though they could be global minima or saddle points.
In this case, your derivative is: (d)/(dx)((m x + b)/(-m x + c)) = (m (b + c))/(c - m x)^2
There is no zero point at all that depends on x, except at infinity. There is no minimum.

Related

Quadrature to approximate a transformed beta distribution in R

I am using R to run a simulation in which I use a likelihood ratio test to compare two nested item response models. One version of the LRT uses the joint likelihood function L(θ,ρ) and the other uses the marginal likelihood function L(ρ). I want to integrate L(θ,ρ) over f(θ) to obtain the marginal likelihood L(ρ). I have two conditions: in one, f(θ) is standard normal (μ=0,σ=1), and my understanding is that I can just pick a number of abscissa points, say 20 or 30, and use Gauss-Hermite quadrature to approximate this density. But in the other condition, f(θ) is a linearly transformed beta distribution (a=1.25,b=10), where the linear transformation B'=11.14*(B-0.11) is such that B' also has (approximately) μ=0,σ=1.
I am confused enough about how to implement quadrature for a beta distribution but then the linear transformation confuses me even more. My question is threefold: (1) can I use some variation of quadrature to approximate f(θ) when θ is distributed as this linearly transformed beta distribution, (2) how would I implement this in R, and (3) is this a ridiculous waste of time such that there is an obviously much faster and better method to accomplish this task? (I tried writing my own numerical approximation function but found that my implementation of it, being limited to the R language, was just too slow to suffice.)
Thanks!
First, I assume you can express your L(θ,ρ) and f(θ) in terms of actual code; otherwise you're kinda screwed. Given that assumption, you can use integrate to perform the necessary computations. Something like this should get you started; just plug in your expressions for L and f.
marglik <- function(rho) {
integrand <- function(theta, rho) L(theta, rho) * f(theta)
# set your lower/upper integration limits as appropriate
integrate(integrand, lower=-5, upper=5, rho=rho)
}
For this to work, your integrand has to be vectorized; ie, given a vector input for theta, it must return a vector of outputs. If your code doesn't fit the bill, you can use Vectorize on the integrand function before passing it to integrate:
integrand <- Vectorize(integrand, "theta")
Edit: not sure if you're also asking how to define f(θ) for the transformed beta distribution; that seems rather elementary for someone working with joint and marginal likelihoods. But if you are, then the density of B' = a*B + b, given f(B), is
f'(B') = f(B)/a = f((B' - b)/a) / a
So in your case, f(theta) is dbeta(theta/11.14 + 0.11, 1.25, 10) / 11.14

Algorithm for finding an equidistributed solution to a linear congruence system

I face the following problem in a cryptographical application: I have given a set of linear congruences
a[1]*x[1]+a[2]*x[2]+a[3]*x[3] == d[1] (mod p)
b[1]*x[1]+b[2]*x[2]+b[3]*x[3] == d[2] (mod p)
c[1]*x[1]+c[2]*x[2]+c[3]*x[3] == d[3] (mod p)
Here, x is unknown an a,b,c,d are given
The system is most likely underdetermined, so I have a large solution space. I need an algorithm that finds an equidistributed solution (that means equidistributed in the solution space) to that problem using a pseudo-random number generator (or fails).
Most standard algorithms for linear equation systems that I know from my linear algebra courses are not directly applicable to congruences as far as I can see...
My current, "safe" algorithm works as follows: Find all variable that appear in only one equation, and assign a random value. Now if in each row, only one variable is unassigned, assign the value according to the congruence. Otherwise fail.
Can anyone give me a clue how to solve this problem in general?
You can use gaussian elimination and similar algorithms just like you learned in your linear algebra courses, but all arithmetic is performed mod p (p is a prime). The one important difference is in the definition of "division": to compute a / b you instead compute a * (1/b) (in words, "a times b inverse"). Consider the following changes to the math operations normally used
addition: a+b becomes a+b mod p
subtraction: a-b becomes a-b mod p
multiplication: a*b becomes a*b mod p
division: a/b becomes: if p divides b, then "error: divide by zero", else a * (1/b) mod p
To compute the inverse of b mod p you can use the extended euclidean algorithm or alternatively compute b**(p-2) mod p.
Rather than trying to roll this yourself, look for an existing library or package. I think maybe Sage can do this, and certainly Mathematica, and Maple, and similar commercial math tools can.

Get branch points of equation

If I have a general function,f(z,a), z and a are both real, and the function f takes on real values for all z except in some interval (z1,z2), where it becomes complex. How do I determine z1 and z2 (which will be in terms of a) using Mathematica (or is this possible)? What are the limitations?
For a test example, consider the function f[z_,a_]=Sqrt[(z-a)(z-2a)]. For real z and a, this takes on real values except in the interval (a,2a), where it becomes imaginary. How do I find this interval in Mathematica?
In general, I'd like to know how one would go about finding it mathematically for a general case. For a function with just two variables like this, it'd probably be straightforward to do a contour plot of the Riemann surface and observe the branch cuts. But what if it is a multivariate function? Is there a general approach that one can take?
What you have appears to be a Riemann surface parametrized by 'a'. Consider the algebraic (or analytic) relation g(a,z)=0 that would be spawned from this branch of a parametrized Riemann surface. In this case it is simply g^2 - (z - a)*(z - 2*a) == 0. More generally it might be obtained using Groebnerbasis, as below (no guarantee this will always work without some amount of user intervention).
grelation = First[GroebnerBasis[g - Sqrt[(z - a)*(z - 2*a)], {x, a, g}]]
Out[472]= 2 a^2 - g^2 - 3 a z + z^2
A necessary condition for the branch points, as functions of the parameter 'a', is that the zero set for 'g' not give a (single valued) function in a neighborhood of such points. This in turn means that the partial derivative of this relation with respect to g vanishes (this is from the implicit function theorem of multivariable calculus). So we find where grelation and its derivative both vanish, and solve for 'z' as a function of 'a'.
Solve[Eliminate[{grelation == 0, D[grelation, g] == 0}, g], z]
Out[481]= {{z -> a}, {z -> 2 a}}
Daniel Lichtblau
Wolfram Research
For polynomial systems (and some class of others), Reduce can do the job.
E.g.
In[1]:= Reduce[Element[{a, z}, Reals]
&& !Element[Sqrt[(z - a) (z - 2 a)], Reals], z]
Out[1]= (a < 0 && 2a < z < a) || (a > 0 && a < z < 2a)
This type of approach also works (often giving very complicated solutions for functions with many branch cuts) for other combinations of elementary functions I checked.
To find the branch cuts (as opposed to the simple class of branch points you're interested in) in general, I don't know of a good approach. The best place to find the detailed conventions that Mathematica uses is at the functions.wolfram site.
I do remember reading a good paper on this a while back... I'll try to find it....
That's right! The easiest approach I've seen for branch cut analysis uses the unwinding number. There's a paper "Reasoning about the elementary functions of complex analysis" about this the the journal "Artificial Intelligence and Symbolic Computation". It and similar papers can be found at one of the authors homepage: http://www.apmaths.uwo.ca/~djeffrey/offprints.html.
For general functions you cannot make Mathematica calculate it.
Even for polynomials, finding an exact answer takes time.
I believe Mathematica uses some sort of quantifier elimination when it uses Reduce,
which takes time.
Without any restrictions on your functions (are they polynomials, continuous, smooth?)
one can easily construct functions which Mathematica cannot simplify further:
f[x_,y_] := Abs[Zeta[y+0.5+x*I]]*I
If this function is real for arbitrary x and any -0.5 < y < 0 or 0<y<0.5,
then you will have found a counterexample to the Riemann zeta conjecture,
and I'm sure Mathematica cannot give a correct answer.

how to evaluate derivative of function in matlab?

This should be very simple. I have a function f(x), and I want to evaluate f'(x) for a given x in MATLAB.
All my searches have come up with symbolic math, which is not what I need, I need numerical differentiation.
E.g. if I define: fx = inline('x.^2')
I want to find say f'(3), which would be 6, I don't want to find 2x
If your function is known to be twice differentiable, use
f'(x) = (f(x + h) - f(x - h)) / 2h
which is second order accurate in h. If it is only once differentiable, use
f'(x) = (f(x + h) - f(x)) / h (*)
which is first order in h.
This is theory. In practice, things are quite tricky. I'll take the second formula (first order) as the analysis is simpler. Do the second order one as an exercise.
The very first observation is that you must make sure that (x + h) - x = h, otherwise you get huge errors. Indeed, f(x + h) and f(x) are close to each other (say 2.0456 and 2.0467), and when you substract them, you lose a lot of significant figures (here it is 0.0011, which has 3 significant figures less than x). So any error on h is likely to have a huge impact on the result.
So, first step, fix a candidate h (I'll show you in a minute how to chose it), and take as h for your computation the quantity h' = (x + h) - x. If you are using a language like C, you must take care to define h or x as volatile for that computation not to be optimized away.
Next, the choice of h. The error in (*) has two parts: the truncation error and the roundoff error. The truncation error is because the formula is not exact:
(f(x + h) - f(x)) / h = f'(x) + e1(h)
where e1(h) = h / 2 * sup_{x in [0,h]} |f''(x)|.
The roundoff error comes from the fact that f(x + h) and f(x) are close to each other. It can be estimated roughly as
e2(h) ~ epsilon_f |f(x) / h|
where epsilon_f is the relative precision in the computation of f(x) (or f(x + h), which is close). This has to be assessed from your problem. For simple functions, epsilon_f can be taken as the machine epsilon. For more complicated ones, it can be worse than that by orders of magnitude.
So you want h which minimizes e1(h) + e2(h). Plugging everything together and optimizing in h yields
h ~ sqrt(2 * epsilon_f * f / f'')
which has to be estimated from your function. You can take rough estimates. When in doubt, take h ~ sqrt(epsilon) where epsilon = machine accuracy. For the optimal choice of h, the relative accuracy to which the derivative is known is sqrt(epsilon_f), ie. half the significant figures are correct.
In short: too small a h => roundoff error, too large a h => truncation error.
For the second order formula, same computation yields
h ~ (6 * epsilon_f / f''')^(1/3)
and a fractional accuracy of (epsilon_f)^(2/3) for the derivative (which is typically one or two significant figures better than the first order formula, assuming double precision).
If this is too imprecise, feel free to ask for more methods, there are a lot of tricks to get better accuracy. Richardson extrapolation is a good start for smooth functions. But those methods typically compute f quite a few times, this may or not be what you want if your function is complex.
If you are going to use numerical derivatives a lot of times at different points, it becomes interesting to construct a Chebyshev approximation.
To get a numerical difference (symmetric difference), you calculate (f(x+dx)-f(x-dx))/(2*dx)
fx = #(x)x.^2;
fPrimeAt3 = (fx(3.1)-fx(2.9))/0.2;
Alternatively, you can create a vector of function values and apply DIFF, i.e.
xValues = 2:0.1:4;
fValues = fx(xValues);
df = diff(fValues)./0.1;
Note that diff takes the forward difference, and that it assumes that dx equals to 1.
However, in your case, you may be better off to define fx as a polynomial, and evaluating the derivative of the function, rather than the function values.
Lacking the symbolic toolbox, nothing stops you from using Derivest, a tool for automatic adaptive numerical differentiation.
derivest(#sin,pi)
ans =
-1
For your example it does very nicely. In fact, it even provides an estimate of the error in the resulting approximation.
fx = inline('x.^2');
[fp,errest] = derivest(fx,3)
fp =
6
errest =
3.6308e-14
did you try diff (calculates differences and approximates a derivative), gradient, or polyder (calculates the derivative of a polynomial) functions?
You can read more on these functions by using help <commandname> on MATLAB console, or use the function browser in the Help menu.
For a given function in analytical form, you can evaluate the derivative at a desired point with the following code:
syms x
df = diff(x^2);
df3 = subs(df, 'x', 3);
fprintf('f''(3)=%f\n', df3);
For pure numerical derivatives use the already given solutions by Jonas and posdef.

Fitting polynomials to data

Is there a way, given a set of values (x,f(x)), to find the polynomial of a given degree that best fits the data?
I know polynomial interpolation, which is for finding a polynomial of degree n given n+1 data points, but here there are a large number of values and we want to find a low-degree polynomial (find best linear fit, best quadratic, best cubic, etc.). It might be related to least squares...
More generally, I would like to know the answer when we have a multivariate function -- points like (x,y,f(x,y)), say -- and want to find the best polynomial (p(x,y)) of a given degree in the variables. (Specifically a polynomial, not splines or Fourier series.)
Both theory and code/libraries (preferably in Python, but any language is okay) would be useful.
Thanks for everyone's replies. Here is another attempt at summarizing them. Pardon if I say too many "obvious" things: I knew nothing about least squares before, so everything was new to me.
NOT polynomial interpolation
Polynomial interpolation is fitting a polynomial of degree n given n+1 data points, e.g. finding a cubic that passes exactly through four given points. As said in the question, this was not want I wanted—I had a lot of points and wanted a small-degree polynomial (which will only approximately fit, unless we've been lucky)—but since some of the answers insisted on talking about it, I should mention them :) Lagrange polynomial, Vandermonde matrix, etc.
What is least-squares?
"Least squares" is a particular definition/criterion/"metric" of "how well" a polynomial fits. (There are others, but this is simplest.) Say you are trying to fit a polynomial
p(x,y) = a + bx + cy + dx2 + ey2 + fxy
to some given data points (xi,yi,Zi) (where "Zi" was "f(xi,yi)" in the question). With least-squares the problem is to find the "best" coefficients (a,b,c,d,e,f), such that what is minimized (kept "least") is the "sum of squared residuals", namely
S = ∑i (a + bxi + cyi + dxi2 + eyi2 + fxiyi - Zi)2
Theory
The important idea is that if you look at S as a function of (a,b,c,d,e,f), then S is minimized at a point at which its gradient is 0. This means that for example ∂S/∂f=0, i.e. that
∑i2(a + … + fxiyi - Zi)xiyi = 0
and similar equations for a, b, c, d, e.
Note that these are just linear equations in a…f. So we can solve them with Gaussian elimination or any of the usual methods.
This is still called "linear least squares", because although the function we wanted was a quadratic polynomial, it is still linear in the parameters (a,b,c,d,e,f). Note that the same thing works when we want p(x,y) to be any "linear combination" of arbitrary functions fj, instead of just a polynomial (= "linear combination of monomials").
Code
For the univariate case (when there is only variable x — the fj are monomials xj), there is Numpy's polyfit:
>>> import numpy
>>> xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> ys = [1.1, 3.9, 11.2, 21.5, 34.8, 51, 70.2, 92.3, 117.4, 145.5]
>>> p = numpy.poly1d(numpy.polyfit(xs, ys, deg=2))
>>> print p
2
1.517 x + 2.483 x + 0.4927
For the multivariate case, or linear least squares in general, there is SciPy. As explained in its documentation, it takes a matrix A of the values fj(xi). (The theory is that it finds the Moore-Penrose pseudoinverse of A.) With our above example involving (xi,yi,Zi), fitting a polynomial means the fj are the monomials x()y(). The following finds the best quadratic (or best polynomial of any other degree, if you change the "degree = 2" line):
from scipy import linalg
import random
n = 20
x = [100*random.random() for i in range(n)]
y = [100*random.random() for i in range(n)]
Z = [(x[i]+y[i])**2 + 0.01*random.random() for i in range(n)]
degree = 2
A = []
for i in range(n):
A.append([])
for xd in range(degree+1):
for yd in range(degree+1-xd):
A[i].append((x[i]**xd)*(y[i]**yd)) #f_j(x_i)
c,_,_,_ = linalg.lstsq(A,Z)
j = 0
for xd in range(0,degree+1):
for yd in range(0,degree+1-xd):
print " + (%.2f)x^%dy^%d" % (c[j], xd, yd),
j += 1
prints
+ (0.01)x^0y^0 + (-0.00)x^0y^1 + (1.00)x^0y^2 + (-0.00)x^1y^0 + (2.00)x^1y^1 + (1.00)x^2y^0
so it has discovered that the polynomial is x2+2xy+y2+0.01. [The last term is sometimes -0.01 and sometimes 0, which is to be expected because of the random noise we added.]
Alternatives to Python+Numpy/Scipy are R and Computer Algebra Systems: Sage, Mathematica, Matlab, Maple. Even Excel might be able to do it. Numerical Recipes discusses methods to implement it ourselves (in C, Fortran).
Concerns
It is strongly influenced by how the points are chosen. When I had x=y=range(20) instead of the random points, it always produced 1.33x2+1.33xy+1.33y2, which was puzzling... until I realised that because I always had x[i]=y[i], the polynomials were the same: x2+2xy+y2 = 4x2 = (4/3)(x2+xy+y2). So the moral is that it is important to choose the points carefully to get the "right" polynomial. (If you can chose, you should choose Chebyshev nodes for polynomial interpolation; not sure if the same is true for least squares as well.)
Overfitting: higher-degree polynomials can always fit the data better. If you change the degree to 3 or 4 or 5, it still mostly recognizes the same quadratic polynomial (coefficients are 0 for higher-degree terms) but for larger degrees, it starts fitting higher-degree polynomials. But even with degree 6, taking larger n (more data points instead of 20, say 200) still fits the quadratic polynomial. So the moral is to avoid overfitting, for which it might help to take as many data points as possible.
There might be issues of numerical stability I don't fully understand.
If you don't need a polynomial, you can obtain better fits with other kinds of functions, e.g. splines (piecewise polynomials).
Yes, the way this is typically done is by using least squares. There are other ways of specifying how well a polynomial fits, but the theory is simplest for least squares. The general theory is called linear regression.
Your best bet is probably to start with Numerical Recipes.
R is free and will do everything you want and more, but it has a big learning curve.
If you have access to Mathematica, you can use the Fit function to do a least squares fit. I imagine Matlab and its open source counterpart Octave have a similar function.
For (x, f(x)) case:
import numpy
x = numpy.arange(10)
y = x**2
coeffs = numpy.polyfit(x, y, deg=2)
poly = numpy.poly1d(coeffs)
print poly
yp = numpy.polyval(poly, x)
print (yp-y)
Bare in mind that a polynomial of higher degree ALWAYS fits the data better. Polynomials of higher degree typically leads to highly improbable functions (see Occam's Razor), though (overfitting). You want to find a balance between simplicity (degree of polynomial) and fit (e.g. least square error). Quantitatively, there are tests for this, the Akaike Information Criterion or the Bayesian Information Criterion. These tests give a score which model is to be prefered.
If you want to fit the (xi, f(xi)) to an polynomial of degree n then you would set up a linear least squares problem with the data (1, xi, xi, xi^2, ..., xi^n, f(xi) ). This will return a set of coefficients (c0, c1, ..., cn) so that the best fitting polynomial is *y = c0 + c1 * x + c2 * x^2 + ... + cn * x^n.*
You can generalize this two more than one dependent variable by including powers of y and combinations of x and y in the problem.
Lagrange polynomials (as #j w posted) give you an exact fit at the points you specify, but with polynomials of degree more than say 5 or 6 you can run into numerical instability.
Least squares gives you the "best fit" polynomial with error defined as the sum of squares of the individual errors. (take the distance along the y-axis between the points you have and the function that results, square them, and sum them up) The MATLAB polyfit function does this, and with multiple return arguments, you can have it automatically take care of scaling/offset issues (e.g. if you have 100 points all between x=312.1 and 312.3, and you want a 6th degree polynomial, you're going to want to calculate u = (x-312.2)/0.1 so the u-values are distributed between -1 and +=).
NOTE that the results of least-squares fits are strongly influenced by the distribution of x-axis values. If the x-values are equally spaced, then you'll get larger errors at the ends. If you have a case where you can choose the x values and you care about the maximum deviation from your known function and an interpolating polynomial, then the use of Chebyshev polynomials will give you something that is close to the perfect minimax polynomial (which is very hard to calculate). This is discussed at some length in Numerical Recipes.
Edit: From what I gather, this all works well for functions of one variable. For multivariate functions it is likely to be much more difficult if the degree is more than, say, 2. I did find a reference on Google Books.
at college we had this book which I still find extremely useful: Conte, de Boor; elementary numerical analysis; Mc Grow Hill. The relevant paragraph is 6.2: Data Fitting.
example code comes in FORTRAN, and the listings are not very readable either, but the explanations are deep and clear at the same time. you end up understanding what you are doing, not just doing it (as is my experience of Numerical Recipes).
I usually start with Numerical Recipes but for things like this I quickly have to grab Conte-de Boor.
maybe better posting some code... it's a bit stripped down, but the most relevant parts are there. it relies on numpy, obviously!
def Tn(n, x):
if n==0:
return 1.0
elif n==1:
return float(x)
else:
return (2.0 * x * Tn(n - 1, x)) - Tn(n - 2, x)
class ChebyshevFit:
def __init__(self):
self.Tn = Memoize(Tn)
def fit(self, data, degree=None):
"""fit the data by a 'minimal squares' linear combination of chebyshev polinomials.
cfr: Conte, de Boor; elementary numerical analysis; Mc Grow Hill (6.2: Data Fitting)
"""
if degree is None:
degree = 5
data = sorted(data)
self.range = start, end = (min(data)[0], max(data)[0])
self.halfwidth = (end - start) / 2.0
vec_x = [(x - start - self.halfwidth)/self.halfwidth for (x, y) in data]
vec_f = [y for (x, y) in data]
mat_phi = [numpy.array([self.Tn(i, x) for x in vec_x]) for i in range(degree+1)]
mat_A = numpy.inner(mat_phi, mat_phi)
vec_b = numpy.inner(vec_f, mat_phi)
self.coefficients = numpy.linalg.solve(mat_A, vec_b)
self.degree = degree
def evaluate(self, x):
"""use Clenshaw algorithm
http://en.wikipedia.org/wiki/Clenshaw_algorithm
"""
x = (x-self.range[0]-self.halfwidth) / self.halfwidth
b_2 = float(self.coefficients[self.degree])
b_1 = 2 * x * b_2 + float(self.coefficients[self.degree - 1])
for i in range(2, self.degree):
b_1, b_2 = 2.0 * x * b_1 + self.coefficients[self.degree - i] - b_2, b_1
else:
b_0 = x*b_1 + self.coefficients[0] - b_2
return b_0
Remember, there's a big difference between approximating the polynomial and finding an exact one.
For example, if I give you 4 points, you could
Approximate a line with a method like least squares
Approximate a parabola with a method like least squares
Find an exact cubic function through these four points.
Be sure to select the method that's right for you!
It's rather easy to scare up a quick fit using Excel's matrix functions if you know how to represent the least squares problem as a linear algebra problem. (That depends on how reliable you think Excel is as a linear algebra solver.)
The lagrange polynomial is in some sense the "simplest" interpolating polynomial that fits a given set of data points.
It is sometimes problematic because it can vary wildly between data points.

Resources