LIBSVM scaling vs. PyML normalization, scaling, and translation

What is the proper way to normalize feature vectors for use in a linear-kernel SVM?
Looking at LIBSVM, it looks like it's done by just rescaling each feature to a single standard upper/lower range. However, it doesn't seem like PyML provides a way to scale the data this way. Instead, there are options to normalize the vectors by their length, shift each feature value by its mean while rescaling by the standard deviation, etc.
I am dealing with a case when most features are binary, except a few that are numeric.

I am not an expert in this, but I believe centering and scaling each feature vector by subtracting its mean and dividing thereafter by the standard deviation is a typical way to normalize feature vectors for use with SVMs. In R, this can be done with the scale function.
Another way is to transform each feature vector to the [0,1] range:
(x - min(x)) / (max(x) - min(x))
Maybe some features could benefit from a log-transformation if the distribution is very skewed, but this would change the shape of the distribution as well and not only "move" it.
I am not sure what you gain in an SVM-setting by normalizing the vectors by their L1 or L2 norm like PyML does with its normalize method. I guess binary features (0 or 1) don't need to be normalized.
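The two per-feature rescalings mentioned above can be sketched in plain Python. This is only an illustration: the function names standardize and min_max are made up, each function works on one feature column at a time, and the sample standard deviation (n - 1) is used because that is what R's scale does.

```python
import statistics

def standardize(column):
    """Center a feature column on its mean and divide by its standard deviation."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)  # sample standard deviation (n - 1), as in R's scale()
    return [(v - mean) / sd for v in column]

def min_max(column):
    """Rescale a feature column to the [0, 1] range: (x - min(x)) / (max(x) - min(x))."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

# Hypothetical numeric feature; binary 0/1 columns could simply be left as they are.
ages = [2.0, 4.0, 6.0]
scaled = standardize(ages)
ranged = min_max(ages)
```

Both functions divide by zero on a constant column, so such columns would have to be dropped or special-cased first.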

Related

Looking for good scale factor for converting log to 8.8 fixed point

I have a range of numbers in (0, 1]
I would like to take the natural log of these numbers, and then store as 8.8
fixed point.
My formula is K*ln(x) + (1<<16)
but I am not sure what the best value is for K .
My thinking is that if x doubles, then ln(x) increases by ln(2), so the fixed point value should increase by 1 in fixed point (i.e. 256)
So, this would mean K = 256/ln(2)
Does this make sense?
As x approaches 0, ln(x) will diverge to negative infinity. So you are essentially trying to map an infinite domain to a finite range.
If you do so in a linear way, you have to cut off at some point. If you choose your cut-off at too low a value, you'll be wasting precision for the numbers you represent. If you choose too high a cut-off, too many values will be clamped to the minimal element of the range. Without knowledge about the distribution of the points, it will be very hard to guess a suitable balance here.
So perhaps you could apply a non-linear map instead of the linear one you proposed. Something like the exponential function? Which would mean you'd actually store x instead of ln(x). So I'd say if you want to store values from [0,1) in 16 bit without too much loss of information, you'd just use Q0.16, i.e. all the digits in the fractional part. For (0,1] you can either store 1 − x or do a special case for x = 1 so that you encode that as 0 instead. If you have Q8.8 numbers, you'd multiply your numbers by 2^8 = 256 first, but if you have access to the bit representation that multiplication would be a waste of time.
I guess you had a reason you'd want to store logarithms, so this answer may not be what you were hoping for. I don't see an easier way around the underlying problem, though, so you may have to reconsider some of your ideas.
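To make the cut-off trade-off concrete, here is a minimal Python sketch of the linear mapping from the question with K = 256/ln(2). The name encode_log_q88 and the offset 0xFFFF are assumptions for illustration only (the question's 1<<16 does not fit in 16 bits), and anything below the representable range is clamped to 0.

```python
import math

OFFSET = 0xFFFF  # assumed offset: the largest unsigned 16-bit value, so x = 1 sits at the top

def encode_log_q88(x):
    """Map x in (0, 1] to an unsigned 16-bit code; halving x lowers the code by 256."""
    if not 0.0 < x <= 1.0:
        raise ValueError("x must be in (0, 1]")
    # 256 * log2(x) is the same as K * ln(x) with K = 256 / ln(2)
    code = OFFSET + round(256 * math.log2(x))
    return max(code, 0)  # clamp: everything below the cut-off collapses to 0

top = encode_log_q88(1.0)
half = encode_log_q88(0.5)
```

With this offset the cut-off only kicks in around x = 2**-256, so unless the inputs really span that many octaves, most of the 16-bit range is spent on values that never occur. That is the wasted precision described above.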

Calculate derivative of an array with apache-commons-math

Good Morning,
I have an array with about 3000 double values, and I need to find all local minima and maxima. For this I'm interested in the first and second derivatives. What's the best way to achieve this with Apache Commons Math? My trouble is that I'm starting directly from the array, not from a function like sin(x).
Thanks
With just an array you won't be able to find a min/max.
If the array was calculated from a known function, then you could differentiate it numerically (just calculate at X and X + epsilon, take the difference, and divide by epsilon, assuming that there's a single parameter that you're differentiating with respect to).
Alternatively, is the array actually the list of coefficients of a big polynomial? If so, then the same approach might work.
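For the case where the generating function is known, the "X and X + epsilon" idea might look like the sketch below. It is written in Python only for brevity (the question is about Java and Apache Commons Math), and the name forward_difference and the default epsilon are made up for illustration.

```python
import math

def forward_difference(f, x, eps=1e-6):
    """Approximate f'(x) as (f(x + eps) - f(x)) / eps for a single-parameter function."""
    return (f(x + eps) - f(x)) / eps

# Example with a known function: the derivative of sin at 0 is cos(0) = 1.
slope_at_zero = forward_difference(math.sin, 0.0)
```

The choice of eps is a trade-off: too large and the approximation error grows, too small and floating-point cancellation takes over.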

Flexagon Simulation

What is the best way to simulate a flexagon?
My best guess at a starting point is to represent the faces and edges, and simulate transformations based on where edges meet. I'm thinking that in the process of implementing a transformation, it will be apparent when folding in a given direction is physically impossible.
I'm going to try to figure this out by experimentation, but it definitely feels like the kind of problem where a gap in my facility with mathematics is holding me back.
Edit: To clarify, I'm interested in what sort of data structures I could use to represent a flexagon and how I can manipulate those data structures to simulate the folding of a flexagon.
If you write all of the invariants of the flexagon as a system of equations, small deviations around legal states may be written as a linear system. For instance, the stiffness of a piece of paper between (x1,y1) and (x2,y2) enforces
(x1 - x2)**2 + (y1 - y2)**2 - L**2 == 0
This can be softened to
chi2 = (x1 - x2)**2 + (y1 - y2)**2 - L**2 + other constraints...
Derivatives of chi2 with respect to x1, x2, y1, y2 yield linear equations. A system of linear equations is a matrix, and an eigenvalue/eigenvector decomposition of that matrix gives you linear combinations of the x1, x2, y1, y2 parameters that are easy or hard to bend. The eigenvectors are a basis set of possible directions and each one's corresponding eigenvalue tells you how hard it is to bend in that direction. Larger eigenvalues are more constrained.
A problem with the above is that if there are any directions that are truly allowed, that is, the derivative of chi2 with respect to p is 0 (the original constraint is absolutely satisfied), then the matrix is singular and can't be inverted to get the eigensystem. If you only want to know what those absolutely allowed directions are, you can compute the null space of the matrix instead of its eigensystem. However, I suspect (never having played with a flexagon) that the "allowed" directions involve a little bit of bending, in which case chi2 is small but nonzero. Then you'd be looking for small but nonzero eigenvalues. Other degrees of freedom are allowed and uninteresting, such as translation or rotation of the whole object. To turn it into a pure eigensystem problem (no null space at all), add constraints to the system with arbitrarily small constants lambda:
chi2 += lambda_x * (x1 + x2)**2/4.0 + lambda_y * (y1 + y2)**2/4.0
You'll recognize them in your solution because they'll vary as you vary each lambda. (The example above gives a penalty lambda_x to translating in x and lambda_y to translating in y.)
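As a small illustration of the eigenvalue picture, here is a NumPy sketch for a single rod between (x1,y1) and (x2,y2). It is not the code from the detector-alignment work, just a toy under these assumptions: the parameters are ordered (x1, y1, x2, y2), the matrix is the second-derivative (Hessian) matrix of the chi2 written above, and the name chi2_matrix is made up.

```python
import numpy as np

def chi2_matrix(lambda_x=0.0, lambda_y=0.0):
    """Second-derivative matrix of chi2 for one rod, parameters ordered (x1, y1, x2, y2)."""
    # From (x1 - x2)**2 + (y1 - y2)**2 - L**2 (the constant L**2 drops out)
    rod = 2.0 * np.array([[ 1,  0, -1,  0],
                          [ 0,  1,  0, -1],
                          [-1,  0,  1,  0],
                          [ 0, -1,  0,  1]], dtype=float)
    # From lambda_x * (x1 + x2)**2 / 4.0: a small penalty on translating in x
    shift_x = 0.5 * lambda_x * np.array([[1, 0, 1, 0],
                                         [0, 0, 0, 0],
                                         [1, 0, 1, 0],
                                         [0, 0, 0, 0]], dtype=float)
    # From lambda_y * (y1 + y2)**2 / 4.0: a small penalty on translating in y
    shift_y = 0.5 * lambda_y * np.array([[0, 0, 0, 0],
                                         [0, 1, 0, 1],
                                         [0, 0, 0, 0],
                                         [0, 1, 0, 1]], dtype=float)
    return rod + shift_x + shift_y

# eigh is for symmetric matrices and returns eigenvalues in ascending order.
free_values, free_vectors = np.linalg.eigh(chi2_matrix())
pinned_values, pinned_vectors = np.linalg.eigh(chi2_matrix(lambda_x=0.01, lambda_y=0.02))
```

Without the lambda terms, two eigenvalues are zero (the translations) and two equal 4 (changing the rod's length). With them, the zero eigenvalues become lambda_x and lambda_y, which is how the translation modes can be told apart: they move when the lambdas move, and the others do not.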
In terms of implementation, you can use any linear algebra software to compute solutions and check for variation with the lambdas. I used Python to prototype a problem like this (detector alignment in high energy physics, in which the constraints are measurements like "this detector is 3 cm from that detector" and the chi2 was derived from the uncertainties "3 cm +- 0.1 cm") and then ported the solution to C++ (BLAS) for production. The Numpy library for Python had enough linear algebra (it's BLAS under the hood), though I also used the generic, non-linear minimizers in Scipy to debug the matrix solution. The hardest part is getting the indexes to line up right, which is necessary when casting it as a matrix and not when you give an objective function to a generic minimizer (because you use variable names instead). This is more of a Matlab or Mathematica problem, so if you're more comfortable with one of them, use it instead. This problem will require a lot of trial and error, so use the most interactive system possible (one with a good REPL or worksheet/notebook-style interface).
It can also be helpful to draw a graph of the connections (graph-theory graph, not a plot), on which to label their constraints. For me, that was a necessary first step before writing out the equations.
It might also help to visualize the system by writing a set of functions that take parameter values (x1, etc.) and draw the figure with OpenGL (or other 3-D mesh renderer). This can show you if some constraint is being violated, because the mesh tiles would pass through each other. It can also help you identify the degrees of freedom represented by each eigenvector: vary the parameters by the linear combination represented by the eigenvector and you'll see if it's just translating/rotating or if it's doing some interesting twist or fold.

how to reduce dimensionality of vector

I have a set of vectors. I'm working on ways to reduce an n-dimensional vector to a unary value (1-d), say
(x1,x2,....,xn) ------> y
This single value needs to be the characteristic value of the vector. Each unique vector produces a unique output value. Which of the following methods is appropriate:
1- norm of the vector - square root of sum of squares that measures Euclidean distance from origin
2- compute hash of F, using some hashing techniques avoiding collision
3- use linear regression to compute, y = w1*x1 + w2*x2 + ... + wn*xn - unlikely to be good if there is no good dependence of input values on output
4- feature extraction technique like PCA that assigns weights to each of x1,x2,..xn based on
the set of input vectors
It's unclear from the method what properties you need this transform to have, so I'm making a guess that you don't need the transformation to preserve any properties other than uniqueness, and possibly invertibility.
None of the techniques you suggest can in general avoid collisions:
Norm - two vectors pointing in opposite directions have the same norm.
Hash - unless the input is known a priori, this is no good: what is generally meant by a hash function has a finite image, and you have an infinite number of possible vectors.
It's easy to find two vectors which give the same result for any linear regression (think about it).
PCA is a specific kind of linear transformation - hence the same problem as with linear regression.
So - if you're just looking for uniqueness, you could "stringify" your vectors. One way to do it is to write them down as text strings, with the different coordinates separated by a special character (underscore, for example). Then take the binary value of this string as your representation.
If space is important and you need a more efficient representation, you could think of a more efficient bit encoding: each character in the set 0,1,...,9,'.','_' can be represented by 4 bits - a hexadecimal digit (map '.' to A and '_' to B). Now encode this string as a hexadecimal number, saving half the space.
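A rough Python sketch of this "stringify, then pack into hex digits" idea follows. The function names are hypothetical, and two details are my own assumptions rather than part of the answer: a minus sign is mapped to C so that negative coordinates work, and coordinates are written in plain decimal notation (no exponent) so that only digits, '.', '-' and the underscore separator ever appear. It only handles finite numbers.

```python
from decimal import Decimal

# 0-9 map to themselves, '.' -> A, '_' -> B as suggested above; '-' -> C is an added assumption.
_TO_HEX = {**{str(d): str(d) for d in range(10)}, ".": "A", "_": "B", "-": "C"}
_FROM_HEX = {h: c for c, h in _TO_HEX.items()}

def encode_vector(vec):
    """Write the coordinates as text joined by '_' and map each character to one hex digit."""
    # repr() round-trips floats exactly; Decimal + 'f' removes any exponent notation.
    parts = [format(Decimal(repr(float(v))), "f") for v in vec]
    return "".join(_TO_HEX[ch] for ch in "_".join(parts))

def decode_vector(code):
    """Invert encode_vector: hex digits back to text, then back to floats."""
    text = "".join(_FROM_HEX[h] for h in code)
    return [float(p) for p in text.split("_")]

code = encode_vector([1.5, 0.25])
```

The result is kept as a string of hex digits rather than converted to an integer, so leading zeros are not lost. Packing two digits per byte (for example with bytes.fromhex, after padding to an even length) gives the halved storage mentioned above.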

Normalize to scale

I have a 2-D array of data (C), where C(:,1) has values corresponding to C(:,2). C(:,2) varies from 0.0001:0.0001:1, i.e. 10,000 values. I need to calculate d(log(C(i,1))) / d(log(C(i,2))), which I do by simply calculating log(C(i,1)) / log(C(i,2)). But as C(i,2) approaches 1, the denominator approaches zero, and the quotient shoots up. One way to keep this in check would be to normalize it using a parameter, but I'm not sure how to do that. Does anyone have an idea about this?
Since this is discrete differentiation, the answer is bound to be a little inelegant.
You're interested in the derivative d(log(C(i,1))) / d(log(C(i,2)))
=∆(log(C(i,1))) / ∆(log(C(i,2)))
=(log(C(i+1,1))-log(C(i,1))) / (log(C(i+1,2)) - log(C(i,2)))
which is tractable. The denominator does not go to zero; as C(i,2) approaches 1 it approaches the step size (0.0001).
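The finite-difference quotient above can be checked on made-up data. The sketch below is in Python rather than MATLAB, the name loglog_slopes is hypothetical, and the test data y = x**3 is an assumption chosen because its log-log slope is known to be 3 everywhere.

```python
import math

def loglog_slopes(x, y):
    """Discrete d(log y) / d(log x) between consecutive samples (one fewer value than inputs)."""
    return [
        (math.log(y[i + 1]) - math.log(y[i])) / (math.log(x[i + 1]) - math.log(x[i]))
        for i in range(len(x) - 1)
    ]

# Same grid as in the question: 0.0001, 0.0002, ..., 1 (10,000 values).
x = [i * 0.0001 for i in range(1, 10001)]
y = [v ** 3 for v in x]  # stand-in for C(:,1); a power law has a constant log-log slope
slopes = loglog_slopes(x, y)
```

The slopes stay near 3 all the way up to x = 1, whereas log(y) / log(x) alone would blow up there because log(1) = 0.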
