How do I change the inequality representation in PORTA output?

I am trying to derive facet Bell inequalities given the extreme points of a 16-dimensional polytope. I have been using the software PORTA to go from the standard vertex representation to the inequality representation, using the traf.bat function. However, my PORTA output states the inequalities I am mainly looking for ((17)-(24) at the bottom of the picture) in terms of eight equalities ((1)-(8) above in the picture). Is there any way to make PORTA simply give me the eight inequalities directly in the 16 unknown variables, rather than in terms of the eight equalities? These inequalities should end up looking like the familiar CHSH inequalities, where you have a combination of the 16 unknowns <= 2.
[Picture showing the given inequalities and equalities]


Why do we calculate the margin in SVM?

I'm learning SVM (Support Vector Machine); several points remain ambiguous to me (linearly separable, primal case):
I know how to find the weight w and the hyperplane equation, but if we can deduce the support vectors from it, why do we calculate the margin? What do I need to calculate first, and in which case? (Sorry for the mixed questions, but I'm really lost with it.)
I saw in some examples that the margin is calculated in this manner:
1 / ||w||
while in others, this way :
2 / ||w||
so what is the difference between those two cases ?
Thanks
The optimization objective of an SVM is to find w and b such that the margin around the separating hyperplane is maximized.
Mathematically speaking, it is a nonlinear optimization task which is solved via the KKT (Karush-Kuhn-Tucker) conditions, using Lagrange multipliers.
The following video explains this in simple terms for the linearly separable case:
https://www.youtube.com/watch?v=1NxnPkZM9bc
How the margin is calculated is also explained in more detail here, for both the linear and the primal case:
https://www.csie.ntu.edu.tw/~cjlin/talks/rome.pdf
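For reference, the Lagrangian whose KKT conditions are meant here is the standard one (my notation, not taken from the talk):
L(w, b, a) = 1/2 ||w||^2 - sum_i a_i [ y_i (w^T x_i + b) - 1 ],  with a_i >= 0
Setting the derivatives with respect to w and b to zero gives w = sum_i a_i y_i x_i and sum_i a_i y_i = 0, which is what leads to the dual problem.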
The margin between the separating hyperplane and the class boundaries of an SVM is an essential feature of this algorithm.
See, you have two hyperplanes: (1) w^T x + b >= 1 if y = 1, and (2) w^T x + b <= -1 if y = -1. This says that any vector with label y = 1 must lie either on or behind hyperplane (1). The same applies to the vectors with label y = -1 and hyperplane (2).
Note: If those requirements can be fulfilled, it implicitly means the dataset is linearly separable. This makes sense, because otherwise no such margin can be constructed.
So, what an SVM tries to find is a decision boundary which is half-way between (1) and (2). Let's define this boundary as (3) w^T x + b = 0. What you see here is that (1), (2) and (3) are parallel hyperplanes, because they share the same parameters w and b. The vector w determines the orientation of those planes; recall that a vector always has a direction and a magnitude/length.
The question is now: how can one calculate hyperplane (3)? Equations (1) and (2) tell us that any vector with label y = 1 which is closest to (3) lies exactly on hyperplane (1), hence (1) becomes w^T x + b = 1 for such an x. The same applies to the closest vectors with negative labels and (2). The vectors lying on the planes are called 'support vectors', and the decision boundary (3) depends only on those, because one can simply subtract (2) from (1) for two support vectors x_1 on plane (1) and x_2 on plane (2) and get:
(w^T x_1 + b) - (w^T x_2 + b) = 1 - (-1)  =>  w^T (x_1 - x_2) = 2
Now we want the shortest distance between (3) and the other two planes, which is measured along a line segment perpendicular to (3); for that we need the direction of w while ignoring its length. Dividing by the length of w gives the unit normal vector of (3), hence w^T (x_1 - x_2) / ||w|| = 2 / ||w||. The left-hand side is exactly the perpendicular distance between the two planes, so that distance is 2/||w||, and it is this distance that must be maximized.
Edit:
As others state here, use Lagrange multipliers or the SMO algorithm to minimize the term
1/2 ||w||^2
s.t. y_i (w^T x_i + b) >= 1 for all i
This is the convex form of the optimization problem for the primal SVM.
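For intuition, here is a minimal sketch (assuming scikit-learn and a made-up toy dataset, both my choices) that fits a linear SVM and recovers the margin as 2/||w||:

import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data: two clusters around (-2, -2) and (+2, +2).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

clf = SVC(kernel="linear", C=1e6).fit(X, y)  # large C approximates a hard margin
w = clf.coef_.ravel()

print("distance from boundary to a class plane:", 1 / np.linalg.norm(w))
print("full margin between the class planes:  ", 2 / np.linalg.norm(w))
print("support vectors:", clf.support_vectors_)

This also answers the 1/||w|| vs 2/||w|| confusion from the question: both describe the same geometry, one measured from the decision boundary to one class plane, the other between the two class planes.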

julia angle function returns different values for same angle

While using the function angle I came across this result:
julia> angle(-1+im*0.0)
3.141592653589793
julia> angle(-1-im*0.0)
-3.141592653589793
which is not wrong as such, but could perhaps cause confusion, because
the angle is usually taken to lie in the interval (-pi, pi].
This is correct and intentional. You are encountering what is known as a "branch cut": a point where a multi-valued function has to choose between multiple values it could return. John D. Cook has a short but good article introducing the concept and outlining how Common Lisp approaches the problem of defining various branch cuts for various related functions consistently.
In the case of the angle function applied to the point -1 in the complex plane, any odd multiple of π is theoretically a correct answer. Angle values are normalized to lie in the range [-π, π], however, which leaves only two odd multiples of π to choose between: ±π. Which one should be returned at -1 in the complex plane? In some sense, the question is "Which way did you approach the real line from?" If you approach -1 from above in the complex plane, then π is the answer that respects continuity, since the angles of values slightly above the real line are close to π. If you approach -1 from below, however, then -π is the continuous answer, since angles just below the real line are close to -π. Accordingly, when we evaluate angle near -1, the sign of the imaginary part is significant – even if its value is zero (±0.0) – so we give different answers for -1 ± 0.0im:
angle(-1 + 0.0im) == +3.141592653589793
angle(-1 - 0.0im) == -3.141592653589793
These answers respect the continuity of angle with respect to the sign of the imaginary part of the argument. Many complex functions have similar branch cuts on the real line with different results depending on the sign of the zero-valued imaginary part of their argument.
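The same signed-zero convention shows up in other languages. As a cross-check, here is the equivalent experiment in Python's cmath (my choice of illustration, not from the original answer):

import cmath

# Approaching -1 from just above and just below the real axis:
print(cmath.phase(complex(-1.0, 1e-12)))   # approximately +pi
print(cmath.phase(complex(-1.0, -1e-12)))  # approximately -pi

# The signed zeros pick the limit from the matching side.
# Note: complex(-1.0, -0.0) is used rather than -1 - 0.0j, because the
# literal arithmetic computes 0.0 - 0.0 == +0.0 and loses the sign.
print(cmath.phase(complex(-1.0, 0.0)))     # +3.141592653589793
print(cmath.phase(complex(-1.0, -0.0)))    # -3.141592653589793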

Flexagon Simulation

What is the best way to simulate a flexagon?
My best guess at a starting point is to represent the faces and edges, and simulate transformations based on where edges meet. I'm thinking that in the process of implementing a transformation, it will become apparent when folding in a given direction is physically impossible.
I'm going to try to figure this out by experimentation, but it definitely feels like the kind of problem where a gap in my facility with mathematics is holding me back.
Edit: To clarify, I'm interested in what sort of data structures I could use to represent a flexagon and how I can manipulate those data structures to simulate the folding of a flexagon.
If you write all of the invariants of the flexagon as a system of equations, small deviations around legal states may be written as a linear system. For instance, the stiffness of a piece of paper between (x1,y1) and (x2,y2) enforces
(x1 - x2)**2 + (y1 - y2)**2 - L**2 == 0
This can be softened (squaring the residual, so that chi2 is minimized exactly when the constraint holds) to
chi2 = ((x1 - x2)**2 + (y1 - y2)**2 - L**2)**2 + other constraints...
Derivatives of chi2 with respect to x1, x2, y1, y2 yield linear equations. A system of linear equations is a matrix, and an eigenvalue/eigenvector decomposition of that matrix gives you linear combinations of the x1, x2, y1, y2 parameters that are easy or hard to bend. The eigenvectors are a basis set of possible directions, and each one's corresponding eigenvalue tells you how hard it is to bend in that direction. Larger eigenvalues are more constrained.
A problem with the above is that if there are any directions that are truly allowed, that is, the derivative of chi2 with respect to p is 0 (the original constraint is absolutely satisfied), then the matrix is singular and can't be inverted to get the eigensystem. If you only want to know what those absolutely allowed directions are, you can compute the null space of the matrix instead of its eigensystem. However, I suspect (never having played with a flexagon) that the "allowed" directions involve a little bit of bending, in which case chi2 is small but nonzero. Then you'd be looking for small but nonzero eigenvalues. Other degrees of freedom are allowed and uninteresting, such as translation or rotation of the whole object. To turn it into a pure eigensystem problem (no null space at all), add constraints to the system with arbitrarily small constants lambda:
chi2 += lambda_x * (x1 + x2)**2/4.0 + lambda_y * (y1 + y2)**2/4.0
You'll recognize them in your solution because they'll vary as you vary each lambda. (The example above gives a penalty lambda_x to translating in x and lambda_y to translating in y.)
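As a minimal sketch of this recipe (the rest length, the lambda penalties, and the single-strip chi2 are illustrative assumptions, not a full flexagon model; a real model would add one squared residual per stiffness constraint):

import numpy as np

LENGTH = 1.0   # rest length of the strip (assumed)
LAM = 1e-6     # small penalty pinning x/y translation (assumed)

def chi2(p):
    # One stiffness constraint, squared, plus the translation penalties.
    x1, y1, x2, y2 = p
    residual = (x1 - x2)**2 + (y1 - y2)**2 - LENGTH**2
    penalty = LAM * ((x1 + x2)**2 + (y1 + y2)**2) / 4.0
    return residual**2 + penalty

def hessian(f, p, h=1e-5):
    # Numerical Hessian by central differences.
    n = len(p)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                q = p.copy(); q[i] += si * h; q[j] += sj * h
                return f(q)
            H[i, j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
    return H

p0 = np.array([0.0, 0.0, 1.0, 0.0])  # a legal state: points exactly LENGTH apart
vals, vecs = np.linalg.eigh(hessian(chi2, p0))
for val, vec in zip(vals, vecs.T):
    # Small eigenvalues = soft (allowed) directions; large = stiff ones.
    print(f"{val:12.4e}   {np.round(vec, 3)}")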
In terms of implementation, you can use any linear algebra software to compute solutions and check for variation with the lambdas. I used Python to prototype a problem like this (detector alignment in high energy physics, in which the constraints are measurements like "this detector is 3 cm from that detector" and the chi2 was derived from the uncertainties "3 cm +- 0.1 cm") and then ported the solution to C++ (BLAS) for production. The Numpy library for Python had enough linear algebra (it's BLAS under the hood), though I also used the generic, non-linear minimizers in Scipy to debug the matrix solution. The hardest part is getting the indexes to line up right, which is necessary when casting it as a matrix and not when you give an objective function to a generic minimizer (because you use variable names instead). This is more of a Matlab or Mathematica problem, so if you're more comfortable with one of them, use it instead. This problem will require a lot of trial and error, so use the most interactive system possible (one with a good REPL or worksheet/notebook-style interface).
It can also be helpful to draw a graph of the connections (graph-theory graph, not a plot), on which to label their constraints. For me, that was a necessary first step before writing out the equations.
It might also help to visualize the system by writing a set of functions that take parameter values (x1, etc.) and draw the figure with OpenGL (or another 3-D mesh renderer). This can show you if some constraint is being violated, because the mesh tiles would pass through each other. It can also help you identify the degrees of freedom represented by each eigenvector: vary the parameters by the linear combination represented by the eigenvector and you'll see if it's just translating/rotating or if it's doing some interesting twist or fold.

If there are M different boxes and N identical balls

and we need to put these balls into boxes.
How many possible states could there be?
This is part of a computer simulation puzzle. I've almost forgotten all my math knowledge.
I believe you are looking for the Multinomial Coefficient.
I will check myself and expand my answer.
Edit:
If you take a look at the wikipedia article I gave a link to, you can see that the M and N you defined in your question correspond to the m and n defined in the Theorem section.
This means that your question corresponds to: "What is the number of possible coefficient orderings when expanding a polynomial raised to an arbitrary power?", where N is the power, and M is the number of variables in the polynomial.
In other words:
What you are looking for is the sum over the multinomial coefficients of a polynomial of M variables expanded when raised to the power N.
The exact equations are a bit long, but they are explained very clearly in wikipedia.
Why is this true:
The multinomial coefficient gives you the number of ways to distribute identical balls between baskets in a specific grouping (for example, 4 balls grouped into 3, 1, and 1 – in this case N=4 and M=3). Summing over all grouping options gives all possible combinations.
I hope this helped you out.
These notes explain how to solve the "balls in boxes" problem in general: whether the balls are labeled or not, whether the boxes are labeled or not, whether you have to have at least one ball in each box, etc.
This is a basic combinatorial question (distribution of identical objects into non-identical slots).
The number of states is (N+M-1) choose (M-1).
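A quick sanity check of that stars-and-bars formula (a small Python sketch; the brute-force enumeration is only feasible for tiny N and M):

from math import comb
from itertools import product

def states(N, M):
    # Number of ways to put N identical balls into M distinct boxes.
    return comb(N + M - 1, M - 1)

def brute_force(N, M):
    # Enumerate every occupancy vector (b1, ..., bM) summing to N.
    return sum(1 for occ in product(range(N + 1), repeat=M) if sum(occ) == N)

for N, M in [(4, 3), (5, 2), (3, 4)]:
    assert states(N, M) == brute_force(N, M)
    print(f"N={N}, M={M}: {states(N, M)} states")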

Approximating nonparametric cubic Bezier

What is the best way to approximate a cubic Bezier curve? Ideally I would want a function y(x) which would give the exact y value for any given x, but this would involve solving a cubic equation for every x value, which is too slow for my needs, and there may be numerical stability issues as well with this approach.
Would this be a good solution?
Just solve the cubic.
If you're talking about Bezier plane curves, where x(t) and y(t) are cubic polynomials, then y(x) might be undefined or have multiple values. An extreme degenerate case would be the line x = 1.0, which can be expressed as a cubic Bezier (control point 2 is the same as end point 1; control point 3 is the same as end point 4). In that case, y(x) has no solutions for x != 1.0, and infinitely many solutions for x == 1.0.
A method of recursive subdivision will work, but I would expect it to be much slower than just solving the cubic. (Unless you're working with some sort of embedded processor with unusually poor floating-point capacity.)
You should have no trouble finding code that solves a cubic and has already been thoroughly tested and debugged. If you implement your own solution using recursive subdivision, you won't have that advantage.
Finally, yes, there may be numerical stability problems, like when the point you want is near a tangent, but a subdivision method won't make those go away. It will just make them less obvious.
EDIT: responding to your comment, but I need more than 300 characters.
I'm only dealing with bezier curves where y(x) has only one (real) root. Regarding numerical stability, using the formula from http://en.wikipedia.org/wiki/Cubic_equation#Summary, it would appear that there might be problems if u is very small. – jtxx000
The wackypedia article is math with no code. I suspect you can find some cookbook code that's more ready-to-use somewhere, maybe Numerical Recipes or the ACM Collected Algorithms.
To your specific question, and using the same notation as the article, u is only zero or near zero when p is also zero or near zero. They're related by the equation:
u^6 + q u^3 == p^3 / 27
Near zero, you can use the approximation:
q u^3 == p^3 / 27
or, equivalently, p / (3u) == cube root of q
So the computation of x from u should contain something like:
(fabs(u) >= somesmallvalue) ? (p / u / 3.0) : cuberoot (q)
How "near" zero is near? Depends on how much accuracy you need. You could spend some quality time with Maple or Matlab looking at how much error is introduced for what magnitudes of u. Of course, only you know how much accuracy you need.
The article gives 3 formulas for u, one for each of the 3 roots of the cubic. Given the three u values, you can get the 3 corresponding x values. The three values for u and x are in general complex numbers with an imaginary component. If you're sure that there has to be only one real solution, then you expect one of the roots to have a zero imaginary component, and the other two to be complex conjugates. It looks like you have to compute all three and then pick the real one. (Note that a complex u can correspond to a real x!) However, there's another numerical stability problem there: floating-point arithmetic being what it is, the imaginary component of the real solution will not be exactly zero, and the imaginary components of the non-real roots can be arbitrarily close to zero. So numeric round-off can result in picking the wrong root. It would be helpful if there's some sanity check from your application that you could apply there.
If you do pick the right root, one or more iterations of Newton-Raphson can improve its accuracy a lot.
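As a concrete sketch of the "just solve the cubic" approach (this delegates root-finding to numpy.roots rather than hand-coding Cardano's formulas, and keeps the real roots with t in [0, 1] via a tolerance, which stands in for the sanity check discussed above; the control points in the example are made up):

import numpy as np

def power_coeffs(p0, p1, p2, p3):
    # Convert one coordinate of a cubic Bezier from Bernstein to power basis.
    return np.array([
        -p0 + 3*p1 - 3*p2 + p3,   # t^3
        3*p0 - 6*p1 + 3*p2,       # t^2
        -3*p0 + 3*p1,             # t^1
        p0,                       # t^0
    ], dtype=float)

def bezier_y_of_x(ctrl, x_target, eps=1e-9):
    # ctrl: four (x, y) control points. Returns all y values with y(x_target).
    xs, ys = zip(*ctrl)
    cx = power_coeffs(*xs)
    cx[-1] -= x_target            # solve x(t) - x_target == 0
    cy = power_coeffs(*ys)
    ts = [r.real for r in np.roots(cx)
          if abs(r.imag) < eps and -eps <= r.real <= 1 + eps]
    return [np.polyval(cy, t) for t in ts]

print(bezier_y_of_x([(0, 0), (0.3, 1.0), (0.7, -1.0), (1, 0)], 0.5))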
Yes, the de Casteljau algorithm would work for you. However, I don't know if it will be faster than solving the cubic equation by Cardano's method.
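For comparison, a minimal subdivision-style sketch (plain de Casteljau evaluation plus bisection on t; it assumes x(t) is monotone increasing on [0, 1], which is my assumption and not guaranteed for every curve):

def de_casteljau(ctrl, t):
    # Evaluate a Bezier curve at parameter t by repeated linear interpolation.
    pts = list(ctrl)
    while len(pts) > 1:
        pts = [((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:])]
    return pts[0]

def y_of_x_bisect(ctrl, x_target, tol=1e-9):
    # Find y(x_target) by bisecting on t.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if de_casteljau(ctrl, mid)[0] < x_target:
            lo = mid
        else:
            hi = mid
    return de_casteljau(ctrl, 0.5 * (lo + hi))[1]

print(y_of_x_bisect([(0, 0), (0.3, 1.0), (0.7, -1.0), (1, 0)], 0.5))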
