Derivation of the Backpropagation Algorithm

I'm studying the Backpropagation algorithm and I want to derive it by myself. Therefore, I've constructed a very simple network with one input layer, one hidden layer and one output layer. You can find the details in the graphic.
The network can be found here:
t is the true output, i is the input, and z and w are the weights. Moreover, I have the activation function ϕ(x).
I think that I have understood the algorithm more or less, so I've started with the following error function:
That function should be correct, shouldn't it? The next step is to calculate the partial derivative:
That seems correct to me, but is everything right up to here (especially the formal math aspects)? I'm asking because I have some problems with the next steps (differentiating with respect to the z's ...).
Thanks for helping!
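For reference (since the graphic and the formulas aren't reproduced here), a sketch of what the standard derivation looks like for a network of this shape, assuming i is the input vector, w are the input-to-hidden weights, z are the hidden-to-output weights, and ϕ is the activation in both layers:

    h_j = \phi\Big(\sum_k w_{jk}\, i_k\Big), \qquad
    y = \phi\Big(\sum_j z_j\, h_j\Big), \qquad
    E = \frac{1}{2}\,(t - y)^2

    \frac{\partial E}{\partial z_j}
        = -(t - y)\, \phi'\Big(\sum_m z_m h_m\Big)\, h_j

    \frac{\partial E}{\partial w_{jk}}
        = -(t - y)\, \phi'\Big(\sum_m z_m h_m\Big)\, z_j\,
          \phi'\Big(\sum_l w_{jl}\, i_l\Big)\, i_k

Both partials are just the chain rule applied from the output back towards the input; the w-derivatives reuse the factors already computed for the z-derivatives, which is the whole point of backpropagation.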

Related

Graph Clustering

I've been searching for papers reviewing graph clustering methods, but haven't found anything that satisfies me.
Please tell me which method you consider best for graph clustering. Sorry if my question is very general.
Thanks
With such an open question, I guess I can recommend you to try WEKA.
It has a nice set of user interfaces to let you import your dataset and then try and compare various classification and clustering algorithms on your data, without writing even one line of code.
After you have identified an algorithm that works for your problem, you can then search for a nice and fast implementation in the programming language of your choice.
EDIT: since you mentioned the graph tag, maybe you should have a look at the Markov Cluster Algorithm; otherwise, you will have a hard time trying to represent your graph data in a format suitable for the distance-based clustering algorithms in WEKA.
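If you do end up going the MCL route, the core iteration is small enough to prototype yourself before hunting for a library. A minimal sketch in Python (dense numpy matrices; the expansion/inflation values here are just the usual defaults, not tuned for your data):

    import numpy as np

    def mcl(adjacency, expansion=2, inflation=2.0, max_iter=100, tol=1e-6):
        """Minimal Markov Cluster (MCL) iteration on a dense adjacency matrix."""
        A = np.asarray(adjacency, dtype=float) + np.eye(len(adjacency))  # add self-loops
        M = A / A.sum(axis=0)                            # make columns stochastic
        for _ in range(max_iter):
            previous = M.copy()
            M = np.linalg.matrix_power(M, expansion)     # expansion: simulate longer random walks
            M = M ** inflation                           # inflation: boost strong flows, damp weak ones
            M = M / M.sum(axis=0)                        # renormalize columns
            if np.abs(M - previous).max() < tol:
                break
        # rows that kept non-zero mass act as attractors; each defines one cluster
        return [set(np.nonzero(row > 1e-8)[0]) for row in M if row.max() > 1e-8]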

How to implement the inverse Laplace transform in JavaScript?

I'm writing a JavaScript applet to make it easy for others to see how a system behaves with and without a proportional controller and what the outputs are.
First, a little explanation of the applet (you can skip this if you want; the real question is in the last paragraph):
I managed to implement a way of entering the system (in the frequency domain), so the applet can do the math and show the users their provided system. At the moment the applet computes the poles and zeros of the system, plots them together with the root locus, plots the Nyquist curve of the system, and plots the Bode plots of the system.
The next thing I want the applet to do is calculate and plot the impulse response. To do so I need to perform an inverse Laplace transform on the transfer function of the system.
Now the real question:
I have a function (the transfer function) in the frequency domain. The function is a rational function, stored in the program as two polynomials (numerator and denominator, each stored by its coefficients). What would be the best way of transforming this function to the time domain (inverse Laplace)? Or is there an open-source library which implements this already? I've searched for one but only found math libraries covering simpler mathematics.
Thanks in advance
This is a fairly complex and interesting problem. A couple of ideas.
(1) If the solution must be strictly JS: the inverse LT of some rational functions can be found via partial fraction decomposition. You have numerical coefficients for the polynomials, right? You can try implementing a partial fraction decomposition in JS or maybe find one. The difficulty here is that it is not guaranteed that you can find the inverse LT via partial fractions.
(2) Use JS as glue code and send the rational function to another process (running e.g. Sympy or Maxima) to compute the inverse LT. That way you can take advantage of all the functions available, but it will take some work to connect to the other process and parse the result. For Maxima at least, there have been many projects which use Maxima as the computational back-end; see: http://maxima.sourceforge.net/relatedprojects.html
Problem is solved now. After checking out some numerical methods, I went for the partial fraction decomposition, using the poles of the system and the least squares method to calculate the coefficients. After this the inverse LT wasn't that hard to find.
Thx for your suggestions ;)
Ask me if you want to look at the code.
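For anyone finding this later: outside of JavaScript the partial-fraction route is only a few lines, e.g. in Python with SciPy. This is only a sketch of the approach (the example coefficients are placeholders for your own numerator/denominator), not the asker's code:

    import numpy as np
    from scipy.signal import residue

    # Example transfer function H(s) = (s + 3) / (s^2 + 3s + 2)
    num = [1, 3]
    den = [1, 3, 2]

    # Partial fraction expansion: H(s) = sum_i r[i] / (s - p[i]) + direct terms k
    r, p, k = residue(num, den)

    # For simple poles (and empty k) the impulse response is a sum of exponentials:
    #   h(t) = sum_i r[i] * exp(p[i] * t)
    t = np.linspace(0.0, 10.0, 500)
    h = sum(ri * np.exp(pi * t) for ri, pi in zip(r, p)).real

Repeated poles add t^m * exp(p*t) terms, which residue also reports, so a full implementation has to handle that case as well.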

Graph partitioning optimization

The problem
I have a set of locations on a plane (actually they are pins in a KML file) and I want to partition this graph into subgraphs. Connectivity is pretty good - as with all real world road networks - so I assume that if two locations are close they have some kind of connection. The resulting set of subgraphs should adhere to these constraints:
Every node has to be covered by a subgraph
Every node should be in exactly 1 subgraph
The nodes within a subgraph should be close to each other (L2-norm distances)
Every subgraph should contain at least 5 locations
The number of subgraphs should be minimal
Right now the number of locations is no more than 100, so I thought about brute-forcing through every possibility, but this obviously won't scale well.
I thought about using some k-Nearest-Neighbors algorithm (e.g. using QuickGraph), but I can't get my head around where to start and how to extend/shrink the subgraphs along the way. Maybe it's possible to map this problem to another problem that can easily be solved with some numerical procedure (e.g. Simplex) ...
Maybe someone has experience with this kind of optimization problem and is willing to help me find a solution? I don't have access to Mathematica/Matlab or the like ... but I do have sufficient .NET programming skills and, hmm, Excel :-)
Thanks a lot!
As soon as there are multiple criteria that need to be satisfied in the best possible way simultaneously, things usually start to get difficult.
A numerical solution could work as follows: you could define a utility function that maps partitionings of your locations to positive real values, describing how "good" a partition is by assigning it a "rating" (good could be high, "bad" could be near zero).
Once you have such a function assigning partitions their corresponding "values", you simply need to optimize it, and then you hopefully obtain a good solution if you defined your utility function reasonably. Evolutionary algorithms are good at that task, since your utility function is probably too complex to optimize analytically due to its discrete nature.
The problem is then only how you assign "values" to partitions via this utility function. This is your task. It can be done, for example, by weighting each criterion with a factor and summing the results up, or with even more complex functions (least squares etc.). The factors you use in the definition of the utility function are tuning parameters and can be varied until the result seems good.
Some CAS software would help a lot for testing if you can get your hands on one, but I guess that to obtain a black-box solver for your partitioning problem, you need to implement the complete procedure yourself in a language of your choice.
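To make the utility-function idea concrete, here is a rough Python sketch (the weights and the simple move-one-node local search are illustrative choices only; an evolutionary algorithm would plug into the same utility function, and porting this to .NET is straightforward):

    import math
    import random

    def utility(labels, points, min_size=5, w_spread=1.0, w_count=10.0, w_small=100.0):
        """Higher is better: compact groups, few groups, none smaller than min_size."""
        groups = {}
        for idx, lab in enumerate(labels):
            groups.setdefault(lab, []).append(points[idx])
        spread, too_small = 0.0, 0
        for pts in groups.values():
            cx = sum(p[0] for p in pts) / len(pts)
            cy = sum(p[1] for p in pts) / len(pts)
            spread += sum(math.hypot(p[0] - cx, p[1] - cy) for p in pts)  # L2 distances to centroid
            too_small += len(pts) < min_size
        return -(w_spread * spread + w_count * len(groups) + w_small * too_small)

    def hill_climb(points, max_groups, steps=20000):
        """Move one node at a time to another group, keeping moves that don't hurt the utility."""
        labels = [random.randrange(max_groups) for _ in points]
        best = utility(labels, points)
        for _ in range(steps):
            i = random.randrange(len(points))
            old = labels[i]
            labels[i] = random.randrange(max_groups)
            score = utility(labels, points)
            if score >= best:
                best = score
            else:
                labels[i] = old  # revert a worsening move
        return labels, best

The w_* factors are exactly the tuning parameters mentioned above: raise w_small if undersized subgraphs keep appearing, raise w_count if you end up with too many subgraphs.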

How can I do blind fitting on a list of x, y value pairs if I don't know the form of f(x) = y?

If I have a function f(x) = y that I don't know the form of, and if I have a long list of x and y value pairs (potentially thousands of them), is there a program/package/library that will generate potential forms of f(x)?
Obviously there's a lot of ambiguity to the possible forms of any f(x), so something that produces many non-trivial unique answers (in reduced terms) would be ideal, but something that could produce at least one answer would also be good.
If x and y are derived from observational data (i.e. experimental results), are there programs that can create approximate forms of f(x)? On the other hand, if you know beforehand that there is a completely deterministic relationship between x and y (as in the input and output of a pseudorandom number generator), are there programs that can create exact forms of f(x)?
Soooo, I found the answer to my own question. Cornell has released a piece of software for doing exactly this kind of blind fitting called Eureqa. It has to be one of the most polished pieces of software that I've ever seen come out of an academic lab. It's seriously pretty nifty. Check it out:
It's even got turnkey integration with Amazon's ec2 clusters, so you can offload some of the heavy computational lifting from your local computer onto the cloud at the push of a button for a very reasonable fee.
I think that I'm going to have to learn more about GUI programming so that I can steal its interface.
(This is more of a numerical methods question.) If there is some kind of observable pattern (you can kinda see the function), then yes, there are several ways you can approximate the original function, but they'll be just that, approximations.
What you want to do is called interpolation. Two very simple (and not very good) methods are Newton's and Lagrange's methods of interpolation. They both work on the same principle but are implemented differently (Lagrange's is a direct formula, Newton's builds up the result recursively from divided differences, for one).
If there's not much going on between any two of your data points (i.e., the actual function doesn't have any "bumps" whose "peaks" are not represented by one of your data points), then the spline method of interpolation is one of the best choices you can make. It's a bit harder to implement, but it produces nice results.
Edit: Sometimes, depending on your specific problem, these methods above might be overkill. Sometimes, you'll find that linear interpolation (where you just connect points with straight lines) is a perfectly good solution to your problem.
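For example, with SciPy both the linear and the spline variants mentioned above are essentially one-liners (a sketch only; x and y stand in for your sampled data):

    import numpy as np
    from scipy.interpolate import CubicSpline, interp1d

    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # example sample points
    y = np.array([0.0, 0.8, 0.9, 0.1, -0.8])

    linear = interp1d(x, y)                    # piecewise-linear interpolation
    spline = CubicSpline(x, y)                 # cubic spline interpolation

    print(linear(2.5), spline(2.5))            # evaluate between the samples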
It depends.
If you're using data acquired from the real-world, then statistical regression techniques can provide you with some tools to evaluate the best fit; if you have several hypothesis for the form of the function, you can use statistical regression to discover the "best" fit, though you may need to be careful about over-fitting a curve -- sometimes the best fit (highest correlation) for a specific dataset completely fails to work for future observations.
If, on the other hand, the data was generated synthetically (say, you know it was generated by a polynomial), then you can use polynomial curve fitting methods that will give you the exact answer you need.
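As a quick illustration of the synthetic case (a sketch, with made-up data): if the data really did come from a polynomial, an ordinary least-squares polynomial fit recovers the coefficients essentially exactly.

    import numpy as np

    x = np.linspace(-5, 5, 50)
    y = 2.0 * x**3 - 1.0 * x + 4.0     # data generated by a known cubic

    coeffs = np.polyfit(x, y, deg=3)   # least-squares fit of a degree-3 polynomial
    print(coeffs)                      # approximately [ 2.  0. -1.  4.]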
Yes, there are such things.
If you plot the values and see that there's some functional relationship that makes sense, you can use least squares fitting to calculate the parameter values that minimize the error.
If you don't know what the function should look like, you can use simple spline or interpolation schemes.
You can also use software to guess what the function should be. Maybe something like Maxima can help.
Wolfram Alpha can help you guess:
http://blog.wolframalpha.com/2011/05/17/plotting-functions-and-graphs-in-wolframalpha/
Polynomial Interpolation is the way to go if you have a totally random set
http://en.wikipedia.org/wiki/Polynomial_interpolation
If your set is nearly linear, then regression will give you a good approximation.
Creating the exact form from the x's and y's is mostly impossible.
Notice that what you are trying to achieve is at the heart of many machine learning algorithms, and therefore you might find what you are looking for in some specialized libraries.
A list of N x/y value pairs can always be generated by a polynomial of degree at most N-1 (assuming no two x values are the same). See this article for more details:
http://en.wikipedia.org/wiki/Polynomial_interpolation
Some lists may also match other function types, such as exponential, sinusoidal, and many others. It is impossible to find the 'simplest' matching function, but the best you can do is go through a list of common ones like exponential, sinusoidal, etc. and if none of them match, interpolate the polynomial.
I'm not aware of any software that can do this for you, though.
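A sketch of that "try a list of common forms" idea in Python (the candidate models and the residual threshold are arbitrary illustrations, and Lagrange interpolation is only sensible for a small number of points):

    import numpy as np
    from scipy.optimize import curve_fit
    from scipy.interpolate import lagrange

    def exponential(x, a, b):
        return a * np.exp(b * x)

    def sinusoid(x, a, w, phi):
        return a * np.sin(w * x + phi)

    def guess_form(x, y):
        for model in (exponential, sinusoid):
            try:
                params, _ = curve_fit(model, x, y, maxfev=10000)
                residual = np.sum((model(x, *params) - y) ** 2)
                if residual < 1e-6:          # arbitrary "good enough" threshold
                    return model.__name__, params
            except RuntimeError:             # the fit did not converge
                pass
        # fall back to the interpolating polynomial through all the points
        return "polynomial", lagrange(x, y)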

R script - nls function

Can anyone give me a good explanation for what the parameter "algorithm" does in the nls function in R?
Also, how does the formula work? I know it uses a tilde, but I can't really find a down-to-earth explanation of it.
Also, how important are the start values? Do I need to try multiple start values, or is there a guarantee that nls will find the correct parameters regardless of the start values I use?
In brief:
nls() is going to vary the parameters to try to minimize the squared error between your model and your data. There are several good methods it can try to find the minimum. Reading the details about "method" in ?optim will provide some good info and references.
In general, for nonlinear models, your results can be sensitive to initial guess. You should try several different guesses to make sure that the outputs are close. If your results are very sensitive to your guess, you can try re-parameterizing, using a different algorithm, or rethinking your model.
As for the formula, I'd echo the previous answer. Work through the examples at the bottom of ?nls and then try to ask a more specific question.
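Since the question is about R, take this only as a language-neutral illustration of the start-value point, sketched in Python with scipy.optimize.curve_fit, which, like nls(), does iterative nonlinear least squares from an initial guess:

    import numpy as np
    from scipy.optimize import curve_fit

    def model(x, a, b):
        return a * np.exp(b * x)

    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 2.0, 40)
    y = model(x, 2.0, 1.5) + rng.normal(scale=0.05, size=x.size)

    # Two different starting guesses; for a well-behaved problem both should
    # converge to roughly the same parameters. If they do not, reparameterize
    # or rethink the model, as suggested above.
    p1, _ = curve_fit(model, x, y, p0=[1.0, 1.0])
    p2, _ = curve_fit(model, x, y, p0=[5.0, 0.1])
    print(p1, p2)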
