I am trying to determine the exact mathematics used to train feed-forward classification networks in deeplearning4j with stochastic gradient descent. I have tried stepping through the code, but I am getting lost in the forest.
Is this documented anywhere?
You can find it in the preOutput and backpropGradient methods in the various layer implementations:
Here's the basic implementation used in DenseLayers
And here's the more complex one used in the convolution layers.
The implementation might be swapped out when using cuDNN or MKL, however.
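For a plain dense layer, the mathematics behind those two methods is essentially the textbook feed-forward/backprop recipe. A minimal sketch (assuming row-vector inputs x, weight matrix W, bias b, activation f, learning rate η, and an error signal ε arriving from the layer above or from the loss; the actual code also folds in the chosen loss function, updater, and any regularization):

```latex
% Forward pass (roughly what preOutput computes, before the activation is applied):
z = x W + b, \qquad a = f(z)

% Backward pass (roughly what backpropGradient computes), given the upstream error \varepsilon:
\delta = \varepsilon \odot f'(z)
\frac{\partial L}{\partial W} = x^{\top} \delta, \qquad
\frac{\partial L}{\partial b} = \sum_{\text{batch}} \delta, \qquad
\varepsilon_{\text{prev}} = \delta\, W^{\top}

% Plain SGD step (updaters such as momentum or Adam modify this):
W \leftarrow W - \eta \frac{\partial L}{\partial W}, \qquad
b \leftarrow b - \eta \frac{\partial L}{\partial b}
```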
Could the autograd or jax packages be used to generate the equivalent of analytic derivatives for OpenMDAO explicit components? That is, something more accurate than finite differences (and perhaps more accurate or more general than the complex-step method), but without the work of manually deriving and programming the analytic gradients?
I'm not an expert on either of these packages, but they seem to be designed for just this purpose.
I'm using the caret package in R to create prediction models for maximum energy demand. What I need to use is a neural network multilayer perceptron, but in the caret package I found that there are two mlp methods, "mlp" and "mlpML". What is the difference between the two?
I have read the description in a book (Advanced R Statistical Programming and Data Models: Analysis, Machine Learning, and Visualization), but it still doesn't answer my question.
The caret package has 238 different models available! However, many of them are just different interfaces to the same basic algorithm.
Besides mlp there are 9 other methods for calling a multi-layer perceptron, one of which is mlpML. The real difference lies only in the tuning parameters of the call; which model you need depends on your use case and on what you want to adapt in the basic model.
Chances are, if you don't know what mlpML, mlpWeightDecay, etc. do, you are fine just using the basic mlp.
Looking at the official documentation, we can see that mlp exposes a single tuning parameter (size), while mlpML exposes three (layer1, layer2, layer3). So with the first method you can only tune the size of the single hidden layer, while with the second you can tune each hidden layer individually.
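As a sketch of what that looks like in practice (demand_df and the target column load are placeholders for your own data), the difference shows up in the tuneGrid you pass to train():

```r
library(caret)

# method = "mlp": one hidden layer, a single tuning parameter 'size'
fit_mlp <- train(load ~ ., data = demand_df, method = "mlp",
                 tuneGrid = expand.grid(size = c(3, 5, 10)))

# method = "mlpML": up to three hidden layers, tuned via 'layer1', 'layer2', 'layer3'
fit_mlpML <- train(load ~ ., data = demand_df, method = "mlpML",
                   tuneGrid = expand.grid(layer1 = c(5, 10),
                                          layer2 = c(0, 5),
                                          layer3 = 0))
```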
Looking at the source code here:
https://github.com/topepo/caret/blob/master/models/files/mlp.R
and here:
https://github.com/topepo/caret/blob/master/models/files/mlpML.R
It seems that the difference is that mlpML allows several hidden layers:
modelInfo <- list(label = "Multi-Layer Perceptron, with multiple layers",
while mlp has a single layer of hidden units.
The official documentation also hints at this difference. In my opinion, it is not particularly useful to have many different models that differ only very slightly, and the documentation does not explain those slight differences well.
I need to estimate the parameters of a continuous-discrete nonlinear stochastic dynamic system using Kalman filtering techniques.
I'm going to use Julia's ode45() from ODE and implement an Extended Kalman Filter myself to compute the log-likelihood. ODE is written fully in Julia, and ForwardDiff supports differentiation of native Julia functions, including nested differentiation, which is also what I need, because I want to use ForwardDiff inside my EKF implementation.
Will ForwardDiff handle differentiation of such an involved function as the log-likelihood I've described?
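For reference, the quantity I want to differentiate is the usual Gaussian prediction-error (innovation) form of the EKF log-likelihood:

```latex
% \theta are the parameters, y_1,\dots,y_N the measurements of dimension m,
% \nu_k the innovation and S_k its covariance, both produced by the filter recursion.
\ell(\theta) = -\tfrac{1}{2} \sum_{k=1}^{N} \Big( m \log 2\pi
             + \log\det S_k(\theta)
             + \nu_k(\theta)^{\top} S_k(\theta)^{-1} \nu_k(\theta) \Big),
\qquad \nu_k = y_k - \hat{y}_{k|k-1}(\theta).
```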
ODE.jl is in maintenance mode so I would recommend using DifferentialEquations.jl instead. In the DiffEq FAQ there is an explanation about using ForwardDiff through the ODE solvers. It works, but as in the FAQ I would recommend using sensitivity analysis since that's a better way of calculating the derivatives (it will take a lot less compilation time). But yes, DiffEqParamEstim.jl is a whole repository for parameter estimation of ODEs/SDEs/DAEs/DDEs and it uses ForwardDiff.jl through the solvers.
(BTW, what you're looking to do sounds interesting. Feel free to get in touch with us in the JuliaDiffEq channel to talk about the development of parameter estimation tooling!)
I would like to implement an adaptive sampling algorithm in Julia, in n dimensions, for plotting and numerical integration purposes. As a starting point I found:
https://mathematica.stackexchange.com/questions/216/adaptive-sampling-for-slow-to-compute-functions-in-2d
As I am quite new to Julia, any help would be much appreciated. First of all, is any of this functionality already implemented in existing libraries? I mean, anything I could use as a starting base?
PS: I plan to update this thread as I progress with the programming.
The Julia package Mamba has a few samplers implemented:
Adaptive Mixture Metropolis (AMM)
Adaptive Metropolis within Gibbs (AMWG)
Missing Values Sampler (MISS)
No-U-Turn Sampler (NUTS)
Shrinkage Slice (Slice)
See documentation for details:
http://mambajl.readthedocs.org/en/latest/index.html
I'm checking a simple moving-average crossover strategy in R. Instead of running a huge simulation over the two-dimensional parameter space (length of the short-term moving average, length of the long-term moving average), I'd like to implement the Particle Swarm Optimization algorithm to find the optimal parameter values. I've been browsing the web and read that this algorithm is very effective. Moreover, the way the algorithm works fascinates me...
Does anybody have experience implementing this algorithm in R? Are there useful packages that can be used?
Thanks a lot for your comments.
Martin
Well, there is a package available on CRAN called pso, and indeed it is a particle swarm optimizer (PSO).
I recommend this package.
It is under active development (last update 22 Sep 2010) and is consistent with the reference implementation for PSO. In addition, the package includes functions for diagnostics and plotting results.
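For orientation, the canonical per-particle update that standard PSO applies at each iteration is, roughly:

```latex
% Particle i has position x_i and velocity v_i; w is the inertia weight,
% c_1 and c_2 the acceleration constants, r_1, r_2 ~ U(0,1) fresh random draws,
% p_i the particle's own best position so far, and g the swarm's best position so far.
v_i \leftarrow w\, v_i + c_1 r_1 (p_i - x_i) + c_2 r_2 (g - x_i)
x_i \leftarrow x_i + v_i
```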
It certainly appears to be a sophisticated package, yet the main function interface (the function psoptim) is straightforward: just pass in a few parameters that describe your problem domain, plus a cost function.
More precisely, the key arguments to pass in when you call psoptim:
the dimensions of the problem, as a vector (par);
lower and upper bounds for each variable (lower, upper); and
a cost function (fn).
There are other parameters in the psoptim method signature; those are generally related to convergence criteria and the like.
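A minimal sketch for the moving-average use case, assuming you supply your own backtesting code (backtest_sharpe below is a placeholder, and the bounds are arbitrary examples):

```r
library(pso)

# Cost function over par = c(short, long): backtest the crossover strategy
# and return a value to minimise, e.g. the negative Sharpe ratio.
# backtest_sharpe() is a placeholder for your own backtest function.
neg_sharpe <- function(par) {
  short <- round(par[1])
  long  <- round(par[2])
  if (short >= long) return(Inf)        # require short MA < long MA
  -backtest_sharpe(short, long)
}

result <- psoptim(par = rep(NA, 2),     # length 2 => a two-dimensional problem
                  fn = neg_sharpe,
                  lower = c(2, 20),     # example bounds for (short, long)
                  upper = c(50, 250))
result$par                              # best (short, long) lengths found
```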
Are there any other PSO implementations in R?
There is an R package called ppso for parallel PSO. It is available on R-Forge. I do not know much about this package; I have downloaded it and skimmed the documentation, but that's it.
Beyond those two, none that I am aware of. About three months ago I looked for R implementations of the more popular meta-heuristics, and these are the only PSO implementations I am aware of. The R bindings to the GNU Scientific Library (GSL) include a simulated annealing algorithm, but none of the biologically inspired meta-heuristics.
The other place to look is of course the CRAN Task View for Optimization. I did not find another PSO implementation beyond what I've mentioned here, though there are quite a few packages listed there, and for most of them I only looked at the name and the one-sentence summary.