Why do we use a bias in neural networks?

It may be easy to see, but I still don't understand why we use a bias in a neural network. The weights will be adjusted during training, which is what lets the algorithm learn. So why use a bias on top of that?

Because of linear equations.
Bias is another learned parameter. A single neuron will compute w*x + b where w is your weight parameter and b your bias.
Perhaps this helps you: let's assume you are dealing with a 2D Euclidean space that you'd like to classify with two labels. You can do that by computing a linear function and then classifying everything below it with one label and everything above it with the other. If you did not use a bias, you could only change the slope of your function, and it would always pass through (0, 0). The bias lets you define where that linear function intersects the y-axis, i.e. the point (0, y). For example, without a bias you could not separate data whose only separating line does not pass through the origin.
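To make that concrete, here is a minimal sketch (NumPy; the data and names are made up) of a threshold neuron that cannot separate points lying above and below the line y = 2 until a bias shifts the boundary away from the origin:

```python
import numpy as np

def predict(x, w, b=0.0):
    # A single neuron with a hard threshold: label 1 if w.x + b > 0.
    return 1 if np.dot(w, x) + b > 0 else 0

w = np.array([0.0, 1.0])      # boundary w.x + b = 0 is the line y = -b

below = np.array([0.0, 1.0])  # lies below y = 2, should get label 0
above = np.array([0.0, 3.0])  # lies above y = 2, should get label 1

# Without a bias the boundary must pass through the origin, so both
# points fall on the same side:
print(predict(below, w), predict(above, w))                   # 1 1
# With b = -2 the boundary shifts to y = 2 and separates them:
print(predict(below, w, b=-2.0), predict(above, w, b=-2.0))   # 0 1
```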

Related

hyperplane equation in SVM

How does the SVM algorithm find the optimum hyperplane? The positive margin hyperplane equation is w.x-b=1, the negative margin hyperplane equation is w.x-b=-1, and the middle (optimum) hyperplane equation is w.x-b=0.
I understand how a hyperplane equation can be obtained from a normal vector of the plane and a known point on it, as in this tutorial. Let's say the known point is x1; then the vector (x-x1) lies in the plane, for some x. If w is the normal vector of the plane, then w.(x-x1)=0, and eventually we get the form w.x=b.
Now, to construct a hyperplane we need a normal vector and a known point. How does the algorithm create a hyperplane in the middle, where there is no data point (which I think is the known point needed in the equation), from the training data?
Maybe I misunderstand something or my logic is not correct.
You misunderstand one basic fact: the algorithm is not required to represent a hyperplane in terms of w.x-b = 0, using a given data point. The algorithm is free to change this into any form convenient to each of its functions.
The solution is obvious, as you've already found it: the algorithm does not have to use one of the points from the data set. In fact, if the partition is ideal (no data on the wrong side), there is no point in the middle.
However, finding that hyperplane is trivial. (1) The positive and negative hyperplanes are parallel, and (2) the optimum plane bisects their separation. By (1), all three planes have the same normal vector. By (2), the reference point can be the midpoint of any segment connecting two points on opposite planes.
Briefly, pick a positive support vector and a negative support vector; these lie one on each of the margin planes. Find the midpoint between them, combine it with the normal vector, and there's your optimum plane.
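As a sketch of that construction (NumPy; the support vectors and normal here are made-up values), the shared normal plus the midpoint of the two support vectors pins down the middle plane:

```python
import numpy as np

w = np.array([2.0, 0.0])      # shared normal of all three parallel planes
x_pos = np.array([1.0, 0.5])  # support vector on the positive plane
x_neg = np.array([0.0, 0.3])  # support vector on the negative plane

midpoint = (x_pos + x_neg) / 2       # reference point on the middle plane
b = np.dot(w, midpoint)              # middle plane: w.x - b = 0

# Check: the support vectors land on w.x - b = +1 and w.x - b = -1.
print(np.dot(w, x_pos) - b, np.dot(w, x_neg) - b)   # 1.0 -1.0
```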

Find the optimized rotation

I have an application where I must find a rotation from a set of 15 ordered, indexed 3D points (X1, X2, ..., X15) to another set of 15 points with the same indexing (each initial point corresponds to one final point).
I've read many things about finding the rotation with Euler angles (evil, according to some people), with quaternions, or by projecting the vectors onto the basis axes. But I have an additional constraint: a few points of my final set can be wrong (i.e. have wrong coordinates), so I want to discard the points that imply a rotation very far from the median rotation.
My issue is: for every set of 3 points (non-aligned ones) and their images I can compute a quaternion (given that the transformation matrix won't be a pure rotation I need some additional calculations, but it can be done). So I get a set of quaternions (455 at most) and I want to remove the wrong ones.
Is there a way to find which points give rotations far from the mean rotation? Do the "mean" and the "standard deviation" mean anything for quaternions, or must I compute Euler angles? And once I have the set of "good" quaternions, how can I compute the "mean" quaternion/rotation?
Cheers,
Ricola3D
In computer vision, there's a technique called RANSAC for doing something like what you propose. Instead of finding all of the possible quaternions, you would use a minimal set of point correspondences to find a single quaternion/transformation matrix. You'd then evaluate all of the points for quality of fit, discarding those that don't fit well enough. If you don't have enough good matches, perhaps you got a bad match in your original set. So you'll throw away that attempt and try again. If you do get enough good matches, you'll do a least-squares regression fit of all the inlier points to get a new transformation matrix and then iterate until you're happy with the results.
Alternatively, you could take all of your normalized quaternions and find the dot-product between them all. The dot-product should always be positive; if it's not for any given calculation, you should negate all of the components of one of the two quaternions and re-compute. You then have a distance measure between the quaternions and you can cluster or look for gaps.
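A small sketch of that distance measure (NumPy; the function name is mine): the sign flip makes q and -q, which represent the same rotation, compare as identical:

```python
import numpy as np

def quat_distance(q1, q2):
    # Distance between unit quaternions via their dot product; q and -q
    # encode the same rotation, so negate when the dot product is negative.
    d = np.dot(q1, q2)
    if d < 0:
        d = -d                        # same as negating one quaternion
    return np.arccos(min(d, 1.0))     # clamp guards against rounding

identity = np.array([1.0, 0.0, 0.0, 0.0])
same_rotation = -identity             # opposite sign, same rotation
quarter_turn = np.array([np.cos(np.pi/4), 0.0, 0.0, np.sin(np.pi/4)])

print(quat_distance(identity, same_rotation))                    # 0.0
print(abs(quat_distance(identity, quarter_turn) - np.pi/4) < 1e-9)  # True
```

With this in hand you can cluster the 455 quaternions or look for gaps, as suggested above.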
There are 2 problems here:
how do you compute a "best fit" for an arbitrary number of points?
how do you decide which points to accept, and which points to reject?
The general answer to the first is, "do a least squares fit". Quaternions would probably be better than Euler angles for this; try the following:
foreach point pair (a -> b), ideal rotation by unit quaternion q is:
b = q a q* -> q a - b q = 0
So, look for a least-squares fit for q:
minimize sum[over i] of |q a_i - b_i q|^2
under the constraint: |q|^2 = 1
As presented above, the least-squares problem is linear except for the constraint, which should make it easier to solve than an Euler angle formulation.
For the second problem, I can see two approaches:
if your points aren't too far off, you could try running the least-squares solver with all points, then go back, throw out the "outliers" (those point pairs whose squared-error is greatest), and try again.
if wildly inconsistent points are throwing off the above procedure, you could try selecting random, small subsets of 3 or 4 pairs, and find a least-squares fit for each. If a large group of these results have similar rotations with low total error, you can use this to identify "good" pairs (and thereby eliminate bad pairs); then go back and find a least-squares fit for all good pairs.
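For reference, the constrained least-squares above has a closed-form solution: writing q a_i as R(a_i) q and b_i q as L(b_i) q turns the cost into q^T M q with M symmetric, so the minimizer under |q|^2 = 1 is the eigenvector of M for the smallest eigenvalue. A NumPy sketch (function names are my own; quaternions are stored as (w, x, y, z)):

```python
import numpy as np

def quat_mult(p, q):
    # Hamilton product of quaternions stored as (w, x, y, z).
    w0, x0, y0, z0 = p
    w1, x1, y1, z1 = q
    return np.array([w0*w1 - x0*x1 - y0*y1 - z0*z1,
                     w0*x1 + x0*w1 + y0*z1 - z0*y1,
                     w0*y1 - x0*z1 + y0*w1 + z0*x1,
                     w0*z1 + x0*y1 - y0*x1 + z0*w1])

def right_mult_matrix(a):
    # R(a) with q*a == R(a) @ q
    w, x, y, z = a
    return np.array([[w, -x, -y, -z],
                     [x,  w,  z, -y],
                     [y, -z,  w,  x],
                     [z,  y, -x,  w]])

def left_mult_matrix(b):
    # L(b) with b*q == L(b) @ q
    w, x, y, z = b
    return np.array([[w, -x, -y, -z],
                     [x,  w, -z,  y],
                     [y,  z,  w, -x],
                     [z, -y,  x,  w]])

def fit_rotation(points_a, points_b):
    # Minimize sum |q a_i - b_i q|^2 over unit quaternions q, with the
    # points embedded as pure quaternions (0, x, y, z): the cost equals
    # q^T M q, so take the eigenvector for the smallest eigenvalue of M.
    M = np.zeros((4, 4))
    for pa, pb in zip(points_a, points_b):
        a = np.concatenate(([0.0], pa))
        b = np.concatenate(([0.0], pb))
        D = right_mult_matrix(a) - left_mult_matrix(b)
        M += D.T @ D
    return np.linalg.eigh(M)[1][:, 0]   # eigh sorts eigenvalues ascending

# Point pairs produced by a 90-degree rotation about the z-axis:
q = fit_rotation([(1.0, 0, 0), (0, 1.0, 0), (0, 0, 1.0)],
                 [(0, 1.0, 0), (-1.0, 0, 0), (0, 0, 1.0)])
# q is now +/-(0.707, 0, 0, 0.707), the quaternion of that rotation.
```

The squared error of each pair against the fitted q is then |q a_i - b_i q|^2, which is what you would rank to throw out outliers.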

Mathematical indicator for the "flattness" of a curve?

I am currently working on a computer science project where I have to evaluate charts. The charts are simple lines in an x-y coordinate system, given by CSV files. The flatter the curve, the better for me. Now I am looking for an indicator for the "flatness" of these curves.
My first idea was to calculate the first derivative of the function and then calculate the average between two points. If this value is near 0, then the function is pretty flat.
Is that a good idea? Is there any better solution?
Edit:
Here is a picture as an example. Which curve is flatter between x1 and x2?
You might consider using the standard deviation as a measure of distance from a perfectly flat line: first do a simple linear regression to find the best-fitting line, then compute the standard deviation of the residuals.
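A sketch of that measure (NumPy; flatness is my name for it): any perfectly straight line scores 0, and the score grows as the curve deviates from its own best-fit line:

```python
import numpy as np

def flatness(x, y):
    # Fit the best straight line, then return the standard deviation of
    # the residuals: 0 means the curve is exactly a line.
    slope, intercept = np.polyfit(x, y, 1)
    return np.std(y - (slope * x + intercept))

x = np.linspace(0.0, 10.0, 101)
print(flatness(x, 2 * x + 1) < 1e-8)   # True: a line is perfectly "flat"
print(flatness(x, np.sin(x)) > 0.1)    # True: a wavy curve scores higher
```

Note this measures deviation from linearity; if a steep straight line should also count as non-flat, compare the fitted slope against zero as well.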
If the values are all positive, you could try calculating the integral, i.e. the area under the line. The lower the integral, the better: just what you need. If you also expect negative values, you can do the same after taking absolute values (flipping the sign alone only helps when all values are negative).
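A sketch with the trapezoidal rule (NumPy; the curves are made up), taking absolute values so negative excursions count toward the area too:

```python
import numpy as np

def area_under(x, y):
    # Trapezoidal rule on |y|: the area between the curve and the x-axis.
    y = np.abs(y)
    return float(np.sum((y[1:] + y[:-1]) / 2 * np.diff(x)))

x = np.linspace(0.0, 1.0, 101)
flat = np.full_like(x, 0.1)
bumpy = 0.1 + np.abs(np.sin(4 * np.pi * x))

print(abs(area_under(x, flat) - 0.1) < 1e-9)        # True
print(area_under(x, flat) < area_under(x, bumpy))   # True
```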
If the quickness of change is important to the answer (that is, many small zig-zags are considered flatter than a gradual rise), the slope of the autocorrelation function might be interesting.
Look at max(abs(d)), where d is the (numerical) derivative of the curve. That gives you how steep the curve gets compared to the flat curve y = CONSTANT, but won't tell you how far away from the flat curve you end up.
The peakedness of a statistical distribution is called "kurtosis".
Kurtosis = E[(x - mu)^4] / (E[(x - mu)^2])^2 - 3
mu = average value of x in the population
E[y] = the expected value of y
Since this is usually used with probability functions, I would suggest you divide all values in the curve by the area under it.
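A sketch of that formula (NumPy; excess_kurtosis is my name for it):

```python
import numpy as np

def excess_kurtosis(x):
    # E[(x - mu)^4] / (E[(x - mu)^2])^2 - 3: the -3 makes a normal
    # distribution score 0, so higher values mean a sharper peak.
    d = x - np.mean(x)
    return np.mean(d**4) / np.mean(d**2)**2 - 3

# Mass split evenly between two extremes: the flattest possible case.
print(excess_kurtosis(np.array([-1.0, -1.0, 1.0, 1.0])))   # -2.0
```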
1. First apply linear regression to find the best-fitting line.
2. Then measure the sum of the squared residuals.

Math function to find saturation point of a curve

Does anybody know an algorithm in c to find the saturation point in a curve of saturation?
The curve could change its rate of increase sharply or smoothly, and it includes noise, so it's not as simple as I thought.
I tried calculating the angle of the derivative, atan(delta_y/delta_x), but it doesn't work well for all the curves.
It appears you're trying to ascertain, numerically, when the gradient of a function, fitted to some data points from a chemistry experiment, is less than one. It also seems like your data is noisy and you want to determine when the gradient would be less than one if the noise wasn't there.
Firstly, let's forget about the noise. You do not want to do this:
atan((y(i)-y(i-1))/(x(i)-x(i-1)))*180/PI
There is no need to compute the angle of the gradient when the gradient itself is right there. Just compare (y(i)-y(i-1))/(x(i)-x(i-1)) to 1.
Secondly, if there is noise you can't trust derivatives computed like that. But to do better we really need to know more about your problem; there are infinitely many ways to interpret your data. Is there noise in the x values, or just in the y values? Do we expect this curve to have a characteristic shape, or can it do anything?
I'll make a guess: This is some kind of chemistry thing where the y values rapidly increase but then the rate of increase slows down, so that in the absence of noise we have y = A(1-exp(-B*x)) for some A and B. If that's the case then you can use a non-linear regression algorithm to fit such a curve to your points and then test when the gradient of the fitted curve is less than 1.
But without more data, your question will be hard to answer. If you really are unwilling to give more information, I'd suggest a quick and dirty filtering of your data. E.g. at any time, estimate the true value of y by using a weighted average of the previous y values, with weights that drop off exponentially the further back in time you go. I.e. instead of using y[i], use z[i], where
z[i] = sum over j = 0 to i of w[i,j]*y[j] / sum over j = 0 to i of w[i,j]
where
w[i,j] = exp(A*(x[j]-x[i]))
and A is some number that you tune by hand until you get the results you want. Try this, plotting z[i] as you tweak A, and see if it does what you want.
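A sketch of that filter (NumPy; the function name and the value of A are my own choices):

```python
import numpy as np

def smooth(x, y, A):
    # z[i] = weighted average of y[0..i] with w[i, j] = exp(A*(x[j] - x[i])),
    # so samples further back in x count exponentially less.
    z = np.empty_like(y)
    for i in range(len(y)):
        w = np.exp(A * (x[:i + 1] - x[i]))
        z[i] = np.sum(w * y[:i + 1]) / np.sum(w)
    return z

x = np.linspace(0.0, 5.0, 200)
rng = np.random.default_rng(0)
y = 1.0 + rng.normal(0.0, 0.2, x.size)   # noisy samples of a flat signal
z = smooth(x, y, A=2.0)
print(np.std(z) < np.std(y))   # True: the filtered curve is much calmer
```

As the answer says, tune A by plotting z for a few values: a large A tracks the data closely, a small A smooths harder but lags behind genuine changes.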
We can get the maxima or minima of a curve quite easily from the function parameters of the curve. I can't see why you are getting inconsistent results.
I think the problem might be in how you combine the noise curve with the original one, so make sure you mix these curves properly. There is nothing wrong with atan or any other math function you used; the problem is with your implementation, which you haven't specified here.

Point Sequence Interpolation

Given an arbitrary sequence of points in space, how would you produce a smooth continuous interpolation between them?
2D and 3D solutions are welcome. Solutions that produce a list of points at arbitrary granularity and solutions that produce control points for bezier curves are also appreciated.
Also, it would be cool to see an iterative solution that could approximate early sections of the curve as it received the points, so you could draw with it.
The Catmull-Rom spline is guaranteed to pass through all the control points. I find this to be handier than trying to adjust intermediate control points for other types of splines.
This PDF by Christopher Twigg has a nice brief introduction to the mathematics of the spline. The best summary sentence is:
Catmull-Rom splines have C1 continuity, local control, and interpolation, but do not lie within the convex hull of their control points.
Said another way, if the points indicate a sharp bend to the right, the spline will bank left before turning to the right (there's an example picture in that document). The tightness of those turns is controllable, in this case using his tau parameter in the example matrix.
Here is another example with some downloadable DirectX code.
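For reference, a single Catmull-Rom segment is easy to code directly; a NumPy sketch (function name mine, tau as in the cited notes):

```python
import numpy as np

def catmull_rom(p0, p1, p2, p3, t, tau=0.5):
    # One segment from p1 to p2 (t in [0, 1]); p0 and p3 only shape the
    # tangents m1 = tau*(p2 - p0) and m2 = tau*(p3 - p1).
    t2, t3 = t * t, t * t * t
    return (p1 * (2*t3 - 3*t2 + 1) +
            p2 * (-2*t3 + 3*t2) +
            tau * (p2 - p0) * (t3 - 2*t2 + t) +
            tau * (p3 - p1) * (t3 - t2))

pts = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [3.0, 1.0]])
curve = np.array([catmull_rom(*pts, t) for t in np.linspace(0.0, 1.0, 20)])
print(curve[0], curve[-1])   # [1. 1.] [2. 0.]: passes through p1 and p2
```

For a longer point list, slide a window of four points along the sequence and emit one segment per consecutive pair; increasing tau loosens the turns.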
One way is the Lagrange polynomial, which is a method for producing a polynomial that goes through all given data points.
During my first year at university, I wrote a little tool to do this in 2D, and you can find it on this page, it is called Lagrange solver. Wikipedia's page also has a sample implementation.
How it works is this: you have a polynomial of degree n-1, p(x), where n is the number of points you have. It has the form a_(n-1) x^(n-1) + a_(n-2) x^(n-2) + ... + a_0, where _ is subscript and ^ is power. You then turn this into a set of simultaneous equations:
p(x_1) = y_1
p(x_2) = y_2
...
p(x_n) = y_n
You convert the above into an augmented matrix and solve for the coefficients a_0 ... a_(n-1). Then you have a polynomial that goes through all the points, and you can interpolate between them.
Note, however, that this may not suit your purpose, as it offers no way to adjust the curvature: you are stuck with a single solution that cannot be changed.
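The simultaneous equations above can be solved directly; a NumPy sketch using the Vandermonde matrix (names mine):

```python
import numpy as np

def interpolating_polynomial(xs, ys):
    # Solve p(x_i) = y_i for the coefficients a_0 ... a_(n-1): each row of
    # the Vandermonde matrix is (1, x_i, x_i^2, ..., x_i^(n-1)).
    V = np.vander(xs, increasing=True)
    return np.linalg.solve(V, ys)

def evaluate(coeffs, x):
    return sum(a * x**k for k, a in enumerate(coeffs))

xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = np.array([1.0, 2.0, 0.0, 5.0])
coeffs = interpolating_polynomial(xs, ys)
print(all(abs(evaluate(coeffs, x) - y) < 1e-9 for x, y in zip(xs, ys)))  # True
```

In practice the Vandermonde system becomes badly conditioned as the number of points grows, which is another reason the spline suggestions in the other answers scale better.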
You should take a look at B-splines. Their advantage over Bezier curves is that each part is only dependent on local points. So moving a point has no effect on parts of the curve that are far away, where "far away" is determined by a parameter of the spline.
The problem with the Lagrange polynomial is that adding a point can have extreme effects on seemingly arbitrary parts of the curve; there's no "localness" as described above.
Have you looked at the Unix spline command? Can that be coerced into doing what you want?
There are several algorithms for interpolating (and extrapolating) between an arbitrary (but finite) set of points. You should check out Numerical Recipes; it also includes C++ implementations of those algorithms.
Unfortunately, Lagrange or other forms of polynomial interpolation will not work on an arbitrary set of points. They only work on a set that is ordered in one dimension, e.g. in x:
x_i < x_(i+1)
For an arbitrary set of points, e.g. an aeroplane flight path where each point is a (longitude, latitude) pair, you will be better off simply modelling the aeroplane's journey with a current longitude & latitude and a velocity. By adjusting the rate at which the aeroplane can turn (its angular velocity) depending on how close it is to the next waypoint, you can achieve a smooth curve.
The resulting curve would not be mathematically significant, nor give you bezier control points. However, the algorithm would be computationally simple regardless of the number of waypoints and could produce an interpolated list of points at arbitrary granularity. It would also not require you to provide the complete set of points up front; you could simply add waypoints to the end of the set as required.
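A sketch of that steering idea (pure Python; all names and constants are made up): clamp the turn rate, step forward, repeat until the waypoint is reached:

```python
import math

def steer(pos, heading, waypoint, speed, max_turn, dt):
    # Turn toward the waypoint at no more than max_turn rad/s, then move
    # forward at constant speed for one time step.
    desired = math.atan2(waypoint[1] - pos[1], waypoint[0] - pos[0])
    diff = (desired - heading + math.pi) % (2 * math.pi) - math.pi
    heading += max(-max_turn * dt, min(max_turn * dt, diff))
    return (pos[0] + speed * dt * math.cos(heading),
            pos[1] + speed * dt * math.sin(heading)), heading

pos, heading, waypoint = (0.0, 0.0), 0.0, (5.0, 5.0)
for _ in range(500):
    pos, heading = steer(pos, heading, waypoint, speed=1.0,
                         max_turn=2.0, dt=0.05)
    if math.dist(pos, waypoint) < 0.2:
        break   # close enough; move on to the next waypoint
print(math.dist(pos, waypoint) < 0.2)   # True
```

The positions visited along the way are the interpolated point list, at whatever granularity dt gives you.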
I came up against the same problem and implemented it with some friends the other day. I'd like to share the example project on GitHub.
https://github.com/johnjohndoe/PathInterpolation
Feel free to fork it.
Google "orthogonal regression".
Whereas least-squares techniques try to minimize vertical distance between the fit line and each f(x), orthogonal regression minimizes the perpendicular distances.
Addendum
In the presence of noisy data, the venerable RANSAC algorithm is worth checking out too.
In the 3D graphics world, NURBS are popular. Further info is easily googled.
