Neural network accuracy improvement by using polynomials

I am new to the theory and history behind neural networks and learned that the core neuron computation is a linear expression of the form
w1 x1 + w2 x2 + w3 x3 + ... + b
My question: why not use a higher-degree polynomial expression instead? Something like
a1 (w1 x1 + w2 x2 + w3 x3 + ... + b)^n + a2 (w1 x1 + w2 x2 + w3 x3 + ... + b)^(n-1) + a3 (w1 x1 + w2 x2 + w3 x3 + ... + b)^(n-2) + ... + an
Would this improve accuracy, assuming ideal, limitless computation? Sorry if this question sounds stupid.

This is actually a field of active research with GMDH networks, where the familiar weighted sums are replaced by Kolmogorov-Gabor polynomial transfer functions.
Working with multidimensional data, a 'complete' polynomial with all cross terms would get really large. In the process of training a GMDH network, a polynomial of the form
y = a0 + sum_i a_i x_i + sum_ij a_ij x_i x_j + sum_ijk a_ijk x_i x_j x_k + ...
is adaptively built up, appending cross terms until a target complexity is reached. This is nice, as it captures (unknown) cross dependencies in the input data and helps prevent over- and underfitting. It is, however, really demanding to design and implement correctly. Also, the math is more involved, leading to longer computation times.
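To get a feel for how quickly such a 'complete' polynomial grows, here is a minimal Python sketch (not a GMDH implementation; the inputs and coefficients are made up) that builds the monomial features of a degree-2 Kolmogorov-Gabor polynomial and evaluates one polynomial "neuron":

import numpy as np
from itertools import combinations_with_replacement

def kg_features(x, degree=2):
    """Return [1, x_i, x_i*x_j, ...]: all monomials up to `degree`, cross terms included."""
    feats = [1.0]
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(len(x)), d):
            feats.append(np.prod([x[i] for i in idx]))
    return np.array(feats)

x = np.array([0.5, -1.0, 2.0])     # three inputs
phi = kg_features(x, degree=2)     # 1 bias + 3 linear + 6 quadratic = 10 terms already
a = np.random.randn(phi.size)      # coefficients a training procedure would have to adapt
y = a @ phi                        # output of a single polynomial transfer function
print(phi.size, y)

With only three inputs and degree 2 there are already 10 coefficients per unit; the count explodes combinatorially with the input dimension and the degree, which is why GMDH builds the polynomial up term by term.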
On the other hand, 'normal' ANNs come with the Universal Approximation Theorem, which guarantees they can approximate any continuous function arbitrarily well given enough units. This makes them easy to design, as you don't really have to think about the shape (or even dimensionality) of the function you want to fit.
I don't think a comparable universal approximation result has been proven for GMDH networks (yet), which limits their application.
So, kinda, yeah. Given unlimited computation and perfect design, you could probably build better networks using more involved transfer functions, but since weighted-sum networks are so much easier to design, that formulation is ubiquitous.


Julia Differential Algebraic Equation as a Boundary Value Problem

Does Julia support boundary value differential algebraic equations? I have an implicit ODE with a variable mass matrix that is sometimes singular, so I have to use the DAEProblem. My problem is two coupled second order ODEs for x1(t) and x2(t) that I have transformed into four first order equations by setting x1'(t) = y1(t) and x2'(t)=y2(t). I have values for x1 and x2 at the start and end of my domain, but don't have values for y1 or y2 anywhere, so I have a need for both a DAE and a BVP.
This GitHub post suggests that it is possible, but I'm afraid I don't understand the machinery well enough to see how to couple DAEProblem with BVProblem.
I've had success writing multiple-shooting code following Numerical Recipes to solve the problem, but it's fairly clunky. Ultimately, I would like to pair this with DiffEqFlux (I have quite a few measurements of x1 and x2 along the domain and don't know the exact form of the differential equation), but I suspect that would be much simpler if there were a more direct way to link BVProblem with DAEProblem.
Just go directly to DiffEqFlux, since parameter estimation encompasses BVPs. Write the boundary conditions as part of the loss function on a DAEProblem (i.e. that the starting values should equal x and the final values should equal y), and optimize the initial conditions at the same time as any parameters. Optimizing only the initial conditions and not any parameters is equivalent to a single-shooting BVP solver in this form, and it also allows simultaneous parameter estimation. Or use the multiple shooting layer functions to do multiple shooting. Or use a BVProblem with a mass matrix.
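Concretely, the single-shooting formulation above amounts to minimizing something like (a sketch; a1, a2 and b1, b2 stand for the prescribed values of x1 and x2 at the two ends of the domain, and the exact penalty is a modeling choice)
loss(u0, p) = (x1(0) - a1)^2 + (x2(0) - a2)^2 + (x1(T) - b1)^2 + (x2(T) - b2)^2
where x1(t) and x2(t) come from solving the DAEProblem with candidate initial state u0 and parameters p. The optimizer adjusts (u0, p) together, and a data-fitting term for the interior measurements can simply be added to the same loss.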
For any more help, you'll need to share the code you tried and what didn't work, since it isn't anything more difficult than that, and it's hard to give more generic help than just "use constructor x".

Running a GLM with a Gamma distribution, but data includes zeros

I'm trying to run a GLM in R for biomass data (reproductive biomass and the ratio of reproductive biomass to vegetative biomass) as a function of habitat type ("hab"), year the data was collected ("year"), and site of data collection ("site"). My data look like they would fit a Gamma distribution well, but I have 8 observations with zero biomass (out of ~800 observations), so the model won't run. What's the best way to deal with this? What would be another error distribution to use? Or would adding a very small value (such as .0000001) to my zero observations be viable?
My model is:
reproductive_biomass <- glm(repro.biomass ~ hab * year + site, data = biom, family = Gamma(link = "log"))
Ah, zeroes - gotta love them.
Depending on the system you're studying, I'd be tempted to check out zero-inflated or hurdle models. The basic idea is that there are two components to the model: a binomial process deciding whether the response is zero or nonzero, and then a Gamma that works on the nonzeroes. The slick part is that you can then do inference on the coefficients of both components, and even use different predictors for each.
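In symbols, the two-part (hurdle) model looks roughly like this (a sketch; each part can have its own linear predictor and covariates):
Pr(Y = 0) = 1 - p,  with logit(p) = X b_zero
Y | Y > 0 ~ Gamma with mean mu,  with log(mu) = X b_gamma
so that E[Y] = p * mu, and the zero observations only ever enter the binomial part.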
http://seananderson.ca/2014/05/18/gamma-hurdle.html ... but a search for "zero-inflated gamma" or "tweedie models" might also yield something informative and/or scholarly.
In an ideal world, your analytic tool should fit your system and your intended inferences. The zero-inflated world is pretty sweet, but is conditional on the assumption of separate processes. Thus an important question to answer, of course, is what zeroes "mean" in the context of your study, and only you can answer that - whether they're numbers that just happened to be really really small, or true zeroes that are the result of some confounding process like your coworker spilling the bleach (or something otherwise uninteresting to your study), or else true zeroes that ARE interesting.
Another thought: ask the same question over on crossvalidated, and you'll probably get an even more statistically informed answer. Good luck!

High order PDEs

I'm trying to solve a 6th-order nonlinear PDE in 1D with fixed boundary values (the extended Fisher-Kolmogorov equation, EFK). After failing with FTCS, my next attempt is the method of lines (either central differences in space or FEM) using e.g. LSODES.
How can this be implemented? I'm using Python/C + OpenMP so far, but need some pointers on how to do this efficiently.
EFK with additional 6th order term:
u_t = d u_6x - g u_4x + u_xx + u - u^3
where d and g are real coefficients.
u(x,0) = exp(-x^2/16),
u_x = 0 on the boundary.
The domain is [0, 300] and dx << 1, since I'm looking for pattern formation (subject to the values of d and g).
I hope this is sufficient information.
All PDE solutions like this will ultimately end up being expressed using linear algebra in your program, so the trick is to figure out how to get the PDE into that form before you start coding.
Finite element methods usually begin with a weighted residual method. Non-linear equations will require a linear approximation and iterative methods like Newton-Raphson. I would recommend that you start there.
Yours is a transient solution, so you'll have to do time stepping. You can either use an explicit method and live with the small time steps that stability limits will demand or an implicit method, which will force you to do a matrix inversion at each step.
I'd do a Fourier analysis first of the linear piece to get an idea of the stability requirements.
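As a quick sketch of that Fourier check (taking the equation exactly as written above), substituting a mode u ~ exp(s t + i k x) into the linear part u_t = d u_6x - g u_4x + u_xx + u gives
s(k) = -d k^6 - g k^4 - k^2 + 1
If the high wavenumbers are damped (s(k) -> -infinity as k grows), the stable time step of an explicit scheme shrinks like dx^6 because of the 6th-order term, which is the usual argument for an implicit/stiff integrator here.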
The only term in that equation that makes it non-linear is the last one: -u^3. Have you tried starting by leaving that term off and solving the linear equation that remains?
UPDATE: Some additional thoughts prompted by comments:
I understand how important the u^3 term is. Diffusion is a 2nd order derivative w.r.t. space, so I wouldn't be so certain that a 6th order equation will follow suit. My experience with PDEs comes from branches of physics that don't have 6th order equations, so I honestly don't know what the solution might look like. I'd solve the linear problem first to get a feel for it.
As for stability and explicit methods, it's dogma that the stability limits placed on the time step size make them likely to fail, but the probability isn't 1.0. I think map-reduce and cloud computing might make an explicit solution more viable than it was even 10-20 years ago. Explicit dynamics have become a mainstream way to solve difficult statics problems, because they don't require a matrix inversion.
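To make the method-of-lines idea concrete, here is a minimal Python/SciPy sketch, not a tuned implementation: the values of d and g, the grid spacing and the final time are made up, and the reflective boundary handling (which imposes zero odd derivatives) is just one possible reading of the stated u_x = 0 condition.

import numpy as np
from scipy.integrate import solve_ivp
from scipy.sparse import diags

d, g = 1.0, 2.0                  # made-up coefficients
L, N = 300.0, 1501               # domain [0, 300]; dx = 0.2 here, refine for real runs
x = np.linspace(0.0, L, N)
dx = x[1] - x[0]

def d2(u):
    """Second central difference with reflective (Neumann-type) boundaries."""
    up = np.pad(u, 1, mode="reflect")
    return (up[2:] - 2.0 * up[1:-1] + up[:-2]) / dx**2

def rhs(t, u):
    # Build u_xx, u_4x, u_6x by repeated application of the second-difference operator
    uxx = d2(u)
    u4x = d2(uxx)
    u6x = d2(u4x)
    return d * u6x - g * u4x + uxx + u - u**3

u0 = np.exp(-x**2 / 16.0)

# The semi-discrete system is stiff (6th derivative), so use an implicit integrator;
# declaring the banded Jacobian sparsity keeps the linear algebra cheap.
sparsity = diags([1] * 7, list(range(-3, 4)), shape=(N, N))
sol = solve_ivp(rhs, (0.0, 50.0), u0, method="BDF",
                jac_sparsity=sparsity, t_eval=np.linspace(0.0, 50.0, 11))
u_final = sol.y[:, -1]           # profile at the final time, to inspect for patterns

Dropping the -u**3 term from rhs gives the linear problem suggested above, which is a good first test of the discretization.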

Filtering methods for complex oscillations

Suppose I have a system of springs, not one, but for example a 3-degree-of-freedom system of springs connected to each other in some way. I can write a system of differential equations for it, but it is impossible to solve in a general way. The question is: are there any papers or methods for filtering such complex oscillations, in order to get rid of the oscillations and recover the real signal as closely as possible? For example, if I connect 3 springs in some way and push them to start vibrating, or put some weight on them, and then record the vibrations of each spring, are there filtering methods that make it easy to determine the weight (in case some mass is put on top) at each mass? I am interested in filtering complex spring-like systems.
Three springs, six degrees of freedom? This is trivial to solve using finite element methods and numerical integration. It's a system of six coupled ODEs. You can apply any form of numerical integration, such as 5th-order Runge-Kutta.
I'd recommend doing an eigenvalue analysis of the system first to find out something about its frequency characteristics and normal modes. I'd also do an FFT of the dynamic forces you apply to the system. You don't mention any damping, so if you happen to excite your system at a natural frequency that's close to a resonance you might have some interesting behavior.
The dynamic equation has this general form (sorry, I don't have LaTeX here to make it look nice):
Ma + Kx = F
where M is the mass matrix (diagonal), a is the acceleration (2nd derivative of displacements w.r.t. time), K is the stiffness matrix, and F is the forcing function.
If you're saying you know the response, you'll have to pre-multiply by the transpose of the response function and try to solve for M. It's diagonal, so you have a shot at it.
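To make the eigenvalue-analysis suggestion concrete, here is a minimal Python sketch; the masses and spring constants are made up, and the chain is assumed fixed at both ends.

import numpy as np
from scipy.linalg import eigh

m = np.array([1.0, 1.0, 2.0])                    # masses (kg), made up
k1, k2, k3, k4 = 100.0, 100.0, 100.0, 100.0      # spring constants (N/m), made up

M = np.diag(m)                                   # diagonal mass matrix
K = np.array([[k1 + k2, -k2,      0.0],
              [-k2,      k2 + k3, -k3],
              [0.0,     -k3,      k3 + k4]])     # stiffness matrix of the chain

# Normal modes: solve the generalized eigenproblem  K v = w^2 M v
w2, modes = eigh(K, M)
natural_freqs_hz = np.sqrt(w2) / (2.0 * np.pi)
print(natural_freqs_hz)                          # compare against an FFT of the applied forces
print(modes)                                     # columns are the mode shapes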
Are you connecting the springs in such a way that the behavior of the system is approximately linear? (e.g. at least as close to linear as musical instrument springs/strings?) Is this behavior consistent over time? (e.g. the springs don't melt or break.) If so, LTI (linear time invariant) systems theory might be applicable. Given enough measurements versus the number of degrees of freedom in the LTI system, one might be able to estimate a pole-zero plot of the system response, and go from there. Or something like a linear predictor might be useful.
Actually it is possible to solve the resulting system of differential equations as long as you know the masses, etc.
The standard approach is to use a Laplace Transform. In particular you start with a set of linear differential equations. Add variables until you have a set of first order linear differential equations. (So if you have y'' in your equation, you'd add the equation z = y' and replace y'' with z'.) Rewrite this in the form:
v' = Av + w
where v is a vector of variables, A is a matrix, and w is a constant vector. (An example of something that winds up in w is gravity.)
Now apply a Laplace transform to get
s L(v) - v(0) = A L(v) + w/s
Solve it to get
L(v) = inv(s I - A)(v(0) + w/s)
where inv inverts a matrix and I is the identity matrix. Apply the inverse Laplace transform (if you read up on Laplace transforms you can find tables of inverses for common types of functions - getting a complete list of the functions you actually encounter shouldn't be that hard), and you have your solution. (Be warned, these computations quickly get very complex.)
Now you have the ability to take a particular setup and solve for the future behavior. You also have the ability to (if you do things really carefully) figure out how the model responds to a small perturbation in parameters. But your problem is that you don't know the parameters to use. However you do have the ability to measure the positions in the system at repeated times.
If you put this together, what you can do is this. Measure the positions at a number of time points. First estimate all of the initial values and parameters, then compute the predicted values one second later. You can adjust your parameters (using Newton's method) until the prediction comes close enough to the measured values one second later. Then take the measurements from 5 seconds later and use that estimate as your starting point to refine the fit for what is happening 5 seconds later. Repeat with longer intervals to get all of your answers.
Writing and debugging this should take you some time. :-) I would strongly recommend investigating how much of this Mathematica knows how to do for you already...
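If the symbolic route gets unwieldy, the same first-order form v' = Av + w can simply be integrated numerically. A minimal Python sketch, with made-up masses, stiffnesses and forcing, and without the parameter-fitting loop:

import numpy as np
from scipy.integrate import solve_ivp

M = np.diag([1.0, 2.0])                              # masses (kg), made up
K = np.array([[30.0, -10.0],
              [-10.0, 10.0]])                        # stiffness matrix (N/m), made up
g = np.array([0.0, -9.81])                           # constant acceleration term, e.g. gravity

n = M.shape[0]
Minv = np.linalg.inv(M)

# With v = [x, x'] and x'' = -M^{-1} K x + g, the system becomes v' = A v + w
A = np.block([[np.zeros((n, n)), np.eye(n)],
              [-Minv @ K,        np.zeros((n, n))]])
w = np.concatenate([np.zeros(n), g])

v0 = np.array([0.1, -0.05, 0.0, 0.0])                # initial displacements and velocities
sol = solve_ivp(lambda t, v: A @ v + w, (0.0, 10.0), v0, max_step=0.01)
x_hist = sol.y[:n]                                   # displacement histories of the two masses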

Fitting a binormal distribution in R

As the title says, I have some data that looks roughly like a mixture of two normal distributions, and I would like to find its two underlying components.
I am fitting the data with the sum of two normal distributions with means m1 and m2 and standard deviations s1 and s2. The two Gaussians are scaled by weight factors such that w1 + w2 = 1.
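In other words, the density being fitted is
f(x) = w1 * N(x; m1, s1) + w2 * N(x; m2, s2),  with w1 + w2 = 1
where N(x; m, s) denotes a normal density with mean m and standard deviation s.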
I can do this using the vglm function of the VGAM package, e.g.:
fitRes <- vglm(mydata ~ 1,
               mix2normal1(equalsd = FALSE, iphi = w, imu1 = m1, imu2 = m2, isd1 = s1, isd2 = s2))
This is painfully slow and it can take several minutes depending on the data, but I can live with that.
Now I would like to see how the distribution of my data changes over time, so essentially I break up my data in a few (30-50) blocks and repeat the fit process for each of those.
So, here are the questions:
1) How do I speed up the fit? I tried nls and mle, which look much faster, but mostly failed to get a good fit (while succeeding in getting every possible error these functions could throw at me). It is also not clear to me how to impose constraints with those functions (w in [0;1] and w1 + w2 = 1).
2) How do I automagically choose good starting parameters (I know this is a $1 million question, but you never know, maybe someone has the answer)? Right now I have a little interface that allows me to choose the parameters and visually see what the initial distribution would look like, which is very cool, but I would like to do it automatically for this task.
I thought of using the x values corresponding to the 3rd and 4th quartiles of y as starting values for the two means. Do you think that would be a reasonable thing to do?
First things first:
did you try to search for fit mixture model on RSeek.org?
did you look at the Cluster Analysis + Finite Mixture Modeling Task View?
There has been a lot of research into mixture models so you may find something.
