I have seen people a few times using -1 as opposed to 0 when working with neural networks for the input data. How is this better, and does it affect any of the mathematics needed to implement it?
Edit: Using feedforward and back prop
Edit 2: I gave it a go but the network stopped learning so I assume the maths would have to change somewhere?
Edit 3: Finally found the answer. The mathematics for binary is different to bipolar. See my answer below.
Recently found that the sigmoid and sigmoid-derivative formulas need to change if using bipolar over binary.
Bipolar Sigmoid Function: f(x) = -1 + 2 / (1 + e^-x)
Bipolar Sigmoid Derivative: f'(x) = 0.5 * (1 + f(x)) * (1 - f(x))
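For anyone who wants to drop these straight into code, here is a direct transcription (a quick sketch in Julia; the function names are mine):

bipolar_sigmoid(x) = -1 + 2 / (1 + exp(-x))    # range (-1, 1)
bipolar_deriv(x) = 0.5 * (1 + bipolar_sigmoid(x)) * (1 - bipolar_sigmoid(x))

Note that -1 + 2/(1 + e^-x) is algebraically identical to tanh(x/2), which is why tanh is so often used for bipolar networks.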
It's been a long time, but as I recall, it has no effect on the mathematics needed to implement the network (assuming you're not working with a network type that for some reason limits any part of the process to non-negative values). One of the advantages is that it makes a larger distinction between inputs, and helps amplify the learning signal. Similarly for outputs.
Someone who's done this more recently probably has more to say (like about whether the 0-crossing makes a difference; I think it does). And in reality some of this depends on exactly what type of neural network you're using. I'm assuming you're talking about backprop or a variant thereof.
The network learns more quickly using -1/1 inputs compared to 0/1. Also, if you use -1/1 inputs, 0 can mean "unknown entry / noise / does not matter". I would use -1/1 as the input to my neural network.
This is a newbie question. I am trying to minimize the following QP problem:
x'Qx + b'x + c, subject to A.x >= lb
where:
x is a vector of coordinates,
Q is a sparse, strongly diagonally dominant, symmetric matrix, typically of size 500,000 x 500,000 to 1M x 1M,
b is a vector of constants
c is a constant
A is an identity matrix
lb is a vector containing lower bounds on vector x
Following are the packages I have tried:
Optim.jl: They have a primal interior-point algorithm for simple "box" constraints. I have tried playing around with the inner_optimizer, setting it to GradientDescent() / ConjugateGradient(). No matter what, this seems to be very slow for my problem set.
IterativeSolvers.jl: They have a conjugate gradient solver, but they do not provide a way to impose the constraints of the QP problem.
MathProgBase.jl: It provides access to a dedicated solver for quadratic programming, Ipopt. It works wonderfully for small data sets, typically around a 3K x 3K matrix, but it takes too long for the kind of data sets I am looking at. I am aware that changing the linear system solver from MUMPS to HSL or WSMP may produce a significant improvement, but is there a way to add third-party linear system solvers to Ipopt through Julia?
OSQP.jl: This again takes too long to converge for the data sets that I am interested in.
Also, I was wondering: has anybody here worked with large data sets like these and can suggest a way to solve a problem of this scale really fast in Julia using the existing packages?
You can try the OSQP solver with different parameters to speed up convergence for your specific problem. In particular:
If you have multiple cores, MKL Pardiso can significantly reduce the execution time. You can find details on how to install it here (it basically consists of running the default MKL installer). After that, you can use it in OSQP as follows:
model = OSQP.Model()
OSQP.setup!(model; P=Q, q=b, A=A, l=lb, u=ub, linsys_solver="mkl pardiso")
results = OSQP.solve!(model)
The number of iterations depends on your step size rho. OSQP automatically updates it, trying to find the best one. If you have a specific problem, you can disable the automatic detection and play with it yourself. Here is an example of trying a different rho:
model = OSQP.Model()
OSQP.setup!(model; P=Q, q=b, A=A, l=lb, u=ub, linsys_solver="mkl pardiso",
adaptive_rho=false, rho=1e-3)
results = OSQP.solve!(model)
I suggest you try different rho values, maybe log-spaced between 1e-06 and 1e+06.
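For example, a sweep over log-spaced rho values could look like this (a sketch reusing the setup above; the results.info.iter field name is from memory, so check it against your OSQP.jl version):

for rho in exp10.(range(-6, 6, length=13))
    model = OSQP.Model()
    OSQP.setup!(model; P=Q, q=b, A=A, l=lb, u=ub,
                adaptive_rho=false, rho=rho)
    results = OSQP.solve!(model)
    println("rho = $rho -> $(results.info.iter) iterations")
end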
You can also reduce the number of iterations by rescaling the problem data so that the condition number of your matrices is not too high.
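A simple symmetric diagonal (Jacobi) scaling often helps; here is a sketch, assuming Q has a positive diagonal (you solve the scaled problem in y and recover x = D*y afterwards):

using LinearAlgebra, SparseArrays

d = 1 ./ sqrt.(diag(Q))            # Q assumed to have a positive diagonal
D = spdiagm(0 => d)
Q_scaled = D * Q * D               # unit diagonal, usually better conditioned
b_scaled = D * b
lb_scaled = lb ./ d                # x >= lb  <=>  y >= lb ./ d, since d .> 0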
I'm pretty sure that if you follow these 3 steps you can make OSQP work pretty well. I am happy to try OSQP on your problem if you are willing to share your data (I am one of the developers).
Slightly unrelated: you can call OSQP using MathProgBase.jl and JuMP.jl. It also supports the latest MathOptInterface.jl package that will replace MathProgBase.jl in the newest version of JuMP.
Given an indicator function that maps a finite set X to {0, 1}, and assuming the exact function performing this mapping is unknown, can I learn how each element of X contributes to the output simply by generating random samples of X and using ML?
In general, yes. This is a very basic machine learning problem: binary classification, supervised learning.
Use random samples of X and the indicator classification (0 | 1) as your data set. You can feed this to a component analysis tool, or perhaps train a model and then inspect the learned function (which, ideally, will approximate the behaviour of your indicator function).
Learning to use these tools will take more learning on your part. There is probably a ML or statistics support package in your favourite language that can help. Since Stack Overflow is not really a tutorial site, I'll leave off here and let you get to those search terms and your next set of reading.
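As a concrete sketch in Julia: generate random samples, label them with a made-up indicator, fit a plain logistic regression by gradient descent, and read each element's contribution off the learned weights. Everything named here is illustrative, and a linear model like this only captures linear contributions; a nonlinear indicator would need a more flexible model.

using Random, LinearAlgebra

Random.seed!(0)
n, d = 1000, 5
X = rand(n, d)                                     # random samples of X
indicator(x) = 2*x[1] - x[3] > 0.5 ? 1.0 : 0.0     # stand-in for the unknown rule
y = [indicator(X[i, :]) for i in 1:n]

function fit_logistic(X, y; iters=2000, lr=0.5)
    n, d = size(X)
    w = zeros(d); b = 0.0
    for _ in 1:iters                               # plain gradient descent
        p = 1 ./ (1 .+ exp.(-(X * w .+ b)))
        w .-= lr .* (X' * (p .- y)) ./ n
        b -= lr * sum(p .- y) / n
    end
    w, b
end

w, b = fit_logistic(X, y)
# Large |w[j]| suggests element j contributes strongly to the indicator.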
I'm hoping some of the more experienced users here might have some suggestions for me.
I am implementing a neural network with 2 inputs, 2 hidden nodes, and 1 output.
I have used the sigmoid activation function on both the hidden layer and the output and I'm using back propagation. I am fairly certain I understand the theory correctly. I have the program calculating gradients, updating weights and biases, and I use momentum and strength variables for adjusting.
The point of using multiple layers is to solve non-linearly separable problems, but so far I have only been able to solve the linearly separable AND and OR boolean functions. I have tried playing with all kinds of different momentum and strength settings to no avail.
My outputs are always exactly the same for all 4 input patterns. They were near 0.55 for a while until I played with the settings, and now they're all outputting 0.9. If I remove the bias, the first value goes to zero, but not the fourth.
Any suggestions?
To answer my own question ...
After a lot of trial and error, I threw caution to the wind and tried using tanh(x) instead of sigmoid ... and after only a little tweaking, it WORKS!
If anyone else has been struggling with one of these nets, it might work for you.
The derivative is (1 - tanh(x))(1 + tanh(x)).
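For reference, here is a minimal from-scratch 2-2-1 tanh network on XOR (a sketch in Julia; the bipolar -1/1 targets, learning rate, seed, and epoch count are my own choices, and an unlucky initialisation can still get stuck, so rerun with another seed if it does):

using Random, LinearAlgebra

function train_xor(; epochs=5000, lr=0.3)
    Random.seed!(1)
    X = [0.0 0.0; 0.0 1.0; 1.0 0.0; 1.0 1.0]    # one input pattern per row
    T = [-1.0, 1.0, 1.0, -1.0]                   # XOR with bipolar targets
    W1 = randn(2, 2); b1 = randn(2)              # hidden layer, 2 units
    w2 = randn(2);    b2 = randn()               # output unit
    for _ in 1:epochs, n in 1:4
        x = X[n, :]
        h = tanh.(W1 * x .+ b1)                  # forward pass
        y = tanh(dot(w2, h) + b2)
        dy = (y - T[n]) * (1 - y) * (1 + y)      # uses tanh'(z) = (1 - y)(1 + y)
        dh = (w2 .* dy) .* (1 .- h) .* (1 .+ h)
        w2 .-= lr .* dy .* h;   b2 -= lr * dy    # gradient-descent updates
        W1 .-= lr .* (dh * x'); b1 .-= lr .* dh
    end
    [tanh(dot(w2, tanh.(W1 * X[n, :] .+ b1)) + b2) for n in 1:4]
end

train_xor()    # should return values close to [-1, 1, 1, -1]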
I'm trying to solve a 6th-order nonlinear PDE (1D) with fixed boundary values (the extended Fisher-Kolmogorov equation, EFK). After failing with FTCS, my next attempt is MoL (either central differences in space or FEM) using e.g. LSODES.
How can this be implemented? I'm using Python/C + OpenMP so far, but I need some pointers on how to do this efficiently.
EFK with additional 6th order term:
u_t = d u_6x - g u_4x + u_xx + u - u^3
where d, g are real coefficients.
u(x,0) = exp(-x^2/16),
u_x = 0 on the boundary.
The domain is [0,300] and dx << 1, since I'm looking for pattern formation (subject to the values of d, g).
I hope this is sufficient information.
All PDE solutions like this will ultimately end up being expressed using linear algebra in your program, so the trick is to figure out how to get the PDE into that form before you start coding.
Finite element methods usually begin with a weighted residual method. Non-linear equations will require a linear approximation and iterative methods like Newton-Raphson. I would recommend that you start there.
Yours is a transient solution, so you'll have to do time stepping. You can either use an explicit method and live with the small time steps that stability limits will demand or an implicit method, which will force you to do a matrix inversion at each step.
I'd do a Fourier analysis first of the linear piece to get an idea of the stability requirements.
The only term in that equation that makes it non-linear is the last one: -u^3. Have you tried starting by leaving that term off and solving the linear equation that remains?
UPDATE: Some additional thoughts prompted by comments:
I understand how important the u^3 term is. Diffusion is a 2nd order derivative w.r.t. space, so I wouldn't be so certain that a 6th order equation will follow suit. My experience with PDEs comes from branches of physics that don't have 6th order equations, so I honestly don't know what the solution might look like. I'd solve the linear problem first to get a feel for it.
As for stability and explicit methods, it's dogma that the stability limits placed on time-step size make them likely to fail, but the probability isn't 1.0. I think map-reduce and cloud computing might make an explicit solution more viable than it was even 10-20 years ago. Explicit dynamics have become a mainstream way to solve difficult statics problems, because they don't require a matrix inversion.
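To make the method-of-lines idea concrete, here is a sketch of the spatial discretisation (in Julia for brevity; the stencils translate directly to Python/C): central differences for all three derivatives, with ghost points obtained by even reflection so that u_x = 0 holds at each boundary. Grid size, coefficients, and the time integrator are all placeholders; the (du, u, p, t) signature matches DifferentialEquations.jl's ODEProblem, where a stiff method would be the sensible choice given the 6th-order term.

function efk_rhs!(du, u, p, t)
    d, g, dx = p
    n = length(u)
    # Even reflection across each boundary kills the odd derivatives (u_x = 0).
    pad = [u[4]; u[3]; u[2]; u; u[n-1]; u[n-2]; u[n-3]]
    for i in 1:n
        j = i + 3                                   # index into padded array
        uxx = (pad[j-1] - 2*pad[j] + pad[j+1]) / dx^2
        u4x = (pad[j-2] - 4*pad[j-1] + 6*pad[j] - 4*pad[j+1] + pad[j+2]) / dx^4
        u6x = (pad[j-3] - 6*pad[j-2] + 15*pad[j-1] - 20*pad[j] +
               15*pad[j+1] - 6*pad[j+2] + pad[j+3]) / dx^6
        du[i] = d*u6x - g*u4x + uxx + u[i] - u[i]^3
    end
end

# Usage sketch, assuming DifferentialEquations.jl:
#   x = 0:dx:300
#   u0 = exp.(-x .^ 2 ./ 16)                       # the given initial condition
#   prob = ODEProblem(efk_rhs!, u0, (0.0, T), (d, g, dx))
#   sol = solve(prob)                              # pick a stiff integrator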
If I have a system of springs, not just one but, for example, a 3-degree-of-freedom system of springs connected to each other in some way, I can write down a system of differential equations for it, but it is impossible to solve in a general way. The question is: are there any papers or methods for filtering such complex oscillations, in order to remove the oscillations and recover the real signal as much as possible? For example, if I connect 3 springs in some way and push them to start the vibrations, or put some weight on them, and then record the vibrations of each spring, are there filtering methods that make it easy to determine the weight placed on each mass? I am interested in filtering complex spring-like systems.
Three springs, six degrees of freedom? This is trivial to solve using finite element methods and numerical integration. It's a system of six coupled ODEs; you can apply any form of numerical integration, such as 5th-order Runge-Kutta.
I'd recommend doing an eigenvalue analysis of the system first to find out something about its frequency characteristics and normal modes. I'd also do an FFT of the dynamic forces you apply to the system. You don't mention any damping, so if you happen to excite your system at a natural frequency that's close to a resonance you might have some interesting behavior.
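In Julia, that eigenvalue analysis is only a few lines for a hypothetical 3-mass chain (the masses and stiffnesses below are made up, with both ends fixed):

using LinearAlgebra

m = [1.0, 2.0, 1.5]                       # masses (assumed)
k = [100.0, 150.0, 150.0, 100.0]          # spring stiffnesses (assumed)

M = Diagonal(m)
K = [k[1]+k[2]  -k[2]        0.0;
     -k[2]       k[2]+k[3]  -k[3];
      0.0       -k[3]        k[3]+k[4]]

# Generalized eigenproblem K*phi = lambda*M*phi; natural frequencies in rad/s.
vals, vecs = eigen(K, Matrix(M))
omegas = sqrt.(vals)

Comparing the FFT peaks of your measured response against omegas tells you which normal modes the forcing is exciting.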
Suppose the dynamic equation has this general form (sorry, I don't have LaTeX here to make it look nice):
Ma + Kx = F
where M is the mass matrix (diagonal), a is the acceleration (2nd derivative of displacements w.r.t. time), K is the stiffness matrix, and F is the forcing function.
If you're saying you know the response, you'll have to pre-multiply by the transpose of the response function and try to solve for M. It's diagonal, so you have a shot at it.
Are you connecting the springs in such a way that the behavior of the system is approximately linear? (e.g. at least as close to linear as musical instrument springs/strings?) Is this behavior consistent over time? (e.g. the springs don't melt or break.) If so, LTI (linear time invariant) systems theory might be applicable. Given enough measurements relative to the number of degrees of freedom in the LTI system, one might be able to estimate a pole-zero plot of the system response and go from there. Or something like a linear predictor might be useful.
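One concrete form of the linear-predictor idea: fit an order-p autoregressive model to one measured signal by least squares, and take the roots of the AR polynomial as estimates of the system poles. A sketch in Julia (assumes the Polynomials.jl package; x holds evenly sampled measurements):

using LinearAlgebra, Polynomials

function ar_poles(x::Vector{Float64}, p::Int)
    N = length(x)
    A = hcat((x[p+1-j:N-j] for j in 1:p)...)   # lagged copies of the signal
    a = A \ x[p+1:N]                           # least-squares AR coefficients
    roots(Polynomial([-reverse(a); 1.0]))      # roots of z^p - a1*z^(p-1) - ... - ap
end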
Actually it is possible to solve the resulting system of differential equations as long as you know the masses, etc.
The standard approach is to use a Laplace Transform. In particular you start with a set of linear differential equations. Add variables until you have a set of first order linear differential equations. (So if you have y'' in your equation, you'd add the equation z = y' and replace y'' with z'.) Rewrite this in the form:
v' = Av + w
where v is a vector of variables, A is a matrix, and w is a constant vector. (An example of something that winds up in w is gravity.)
Now apply a Laplace transform (the constant vector w transforms to w/s) to get
s L(v) - v(0) = A L(v) + w/s
Solve it to get
L(v) = inv(s I - A) (v(0) + w/s)
where inv inverts a matrix and I is the identity matrix. Apply the inverse Laplace transform (if you read up on Laplace transforms you can find tables of inverses of common types of functions; getting a complete list of the ones you actually encounter shouldn't be that hard), and you have your solution. (Be warned, these computations quickly get very complex.)
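If you would rather avoid inverse-transform tables altogether, the same constant-coefficient system has a closed form via the matrix exponential, which is easy to evaluate numerically. A sketch (assumes A is invertible):

using LinearAlgebra

# v' = A*v + w with constant w has the solution
# v(t) = exp(A*t)*v(0) + inv(A)*(exp(A*t) - I)*w
function propagate(A, w, v0, t)
    E = exp(A * t)                 # matrix exponential
    E * v0 + A \ ((E - I) * w)
end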
Now you have the ability to take a particular setup and solve for the future behavior. You also have the ability to (if you do things really carefully) figure out how the model responds to a small perturbation in parameters. But your problem is that you don't know the parameters to use. However you do have the ability to measure the positions in the system at repeated times.
If you put this together, here is what you can do. Measure your positions at a number of points in time. First estimate the initial values of all of the parameters, then predict the values a second later, and adjust your parameters (using Newton's method) until the prediction comes close enough to the measured values. Take the measurements from 5 seconds later and use that estimate as your starting point to refine your calculations for what is happening at 5 seconds. Repeat with longer intervals to get all of your answers.
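That refinement loop might look like the following sketch, where predict(theta, t) is your forward solver from above and measured holds the observed positions; it takes Gauss-Newton steps with a finite-difference Jacobian. All names here are hypothetical:

using LinearAlgebra

function refine(predict, theta, t, measured; steps=10, eps=1e-6)
    for _ in 1:steps
        r = predict(theta, t) .- measured                    # residual
        J = hcat((begin
                      th = copy(theta); th[j] += eps         # perturb parameter j
                      (predict(th, t) .- predict(theta, t)) ./ eps
                  end for j in 1:length(theta))...)
        theta = theta .- J \ r                               # least-squares update
    end
    theta
end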
Writing and debugging this should take you some time. :-) I would strongly recommend investigating how much of this Mathematica knows how to do for you already...