I'm hoping some of the more experienced users here might have some suggestions for me.
I am implementing a neural network with 2 inputs, 2 hidden nodes, and 1 output.
I have used the sigmoid activation function on both the hidden layer and the output, and I'm using backpropagation. I am fairly certain I understand the theory correctly. I have the program calculating gradients and updating weights and biases, and I use momentum and strength variables for adjusting.
The point of using multiple layers is to solve non-linearly separable problems, but so far I have only been able to solve the linearly separable AND and OR boolean functions. I have tried all kinds of different momentum and strength settings, to no avail.
My outputs are usually exactly the same for all 4 input patterns. They were near 0.55 for a while, until I played with the settings, and now they all output 0.9. If I remove the bias, the first output goes to zero, but not the fourth.
Any suggestions?
To answer my own question:
After a lot of trial and error, I threw caution to the wind and tried using tanh(x) instead of sigmoid, and after only a little tweaking, it WORKS!
If anyone else has been struggling with one of these nets, it might work for you.
The derivative is (1 - tanh(x))(1 + tanh(x)).
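A quick sanity check of that derivative (a sketch, not the original poster's code): since the forward pass already computed t = tanh(net), the backprop derivative is just (1 - t)(1 + t), and it should agree with a central finite difference of math.tanh.

```python
import math

def tanh_prime(x):
    # (1 - tanh x)(1 + tanh x), which equals 1 - tanh(x)**2.
    t = math.tanh(x)
    return (1.0 - t) * (1.0 + t)

def central_difference(f, x, h=1e-6):
    # Numerical derivative, used only to check the analytic formula.
    return (f(x + h) - f(x - h)) / (2.0 * h)

for x in (-2.0, 0.0, 0.5, 3.0):
    print(x, tanh_prime(x), central_difference(math.tanh, x))
```

In a real network you would pass in the stored activation t rather than recomputing tanh, which is why the product form is convenient.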
I had to add some circular dependencies to my model, so I added NonlinearBlockGS and LinearBlockGS solvers to the Group with the circular dependency. I get messages like this:
LN: LNBGSSolver 'LN: LNBGS' on system 'XXX' failed to converge in 10 iterations.
These appear in the phase where it's computing the coloring of the problem. There is a Dymos trajectory in the problem, but the circular dependency is not in the Trajectory group; it's upstream. However, it converges very easily when actually solving the problem. The number of FWD solves is the same as before, and everything seems to work fine. Should I be worried about anything?
The way our total derivative coloring works is that we replace the partial derivatives with random numbers and then solve the linear system. So the linear solver should be converging. Now, whether or not it should converge with LNBGS in 10 iterations... probably not.
It's hard to speak definitively when putting random numbers into a matrix to invert it... but generally speaking it should remain invertible (though we can't promise that). That does not mean it will remain easily invertible. How close does the linear residual get during the coloring? Is it decreasing, but slowly? Would more iterations get it there?
If your problem is working well, I don't think you need to freak out about this. If you would like it to converge better, it won't hurt anything and might give you better coloring. You can increase the iprint of that solver to get more information on the convergence history.
Another option, if your system is small enough, is to use the DirectSolver instead of LNBGS. For most models with fewer than 10,000 variables, a DirectSolver will be faster overall than LNBGS. There is a nice symmetry to using LNBGS with NLBGS... but while the nonlinear solver tends to be a good choice (i.e. fast and stable) for cyclic dependencies, the same can't be said for its linear counterpart.
So my go-to combination is NLBGS and DirectSolver. You can't always use the DirectSolver, though. If you have distributed components in your model, or components that use the matrix-free derivative APIs (apply_linear, compute_jacvec_product), then LNBGS is a good option. But if everything is explicit components with compute_partials, or implicit components that provide partials in the linearize method, then I suggest using the DirectSolver as your first option.
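A toy illustration of why block Gauss-Seidel can stall where a direct solve does not (plain NumPy, not OpenMDAO's LinearBlockGS): with two scalar "subsystems" coupled with strength c, each sweep shrinks the error by a factor of c², so weak coupling converges in a handful of sweeps while strong coupling (c = 0.95) needs far more than 10. The direct solve factors the whole matrix once and does not care about the coupling strength. The coupled() helper and the values of c are made up for illustration.

```python
import numpy as np

def block_gauss_seidel(A11, A12, A21, A22, b1, b2, maxiter=10, atol=1e-10):
    """Block Gauss-Seidel for [[A11, A12], [A21, A22]] @ [x1, x2] = [b1, b2].

    Each sweep solves the diagonal blocks in order using the latest values.
    Returns (x1, x2, iterations_used, converged).
    """
    x1 = np.zeros(len(b1))
    x2 = np.zeros(len(b2))
    for it in range(1, maxiter + 1):
        x1 = np.linalg.solve(A11, b1 - A12 @ x2)
        x2 = np.linalg.solve(A22, b2 - A21 @ x1)
        # Residual of the full coupled system (r2 is ~0 right after x2's update).
        r1 = b1 - A11 @ x1 - A12 @ x2
        r2 = b2 - A21 @ x1 - A22 @ x2
        if np.sqrt(r1 @ r1 + r2 @ r2) < atol:
            return x1, x2, it, True
    return x1, x2, maxiter, False

def coupled(c):
    """Two scalar subsystems coupled with strength c (hypothetical toy problem)."""
    one = np.eye(1)
    return one, c * one, c * one, one, np.ones(1), np.ones(1)

for c in (0.1, 0.95):
    *_, it, ok = block_gauss_seidel(*coupled(c), maxiter=10)
    A = np.array([[1.0, c], [c, 1.0]])
    x_direct = np.linalg.solve(A, np.ones(2))  # one factorization, no iterations
    print(f"c={c}: BGS converged={ok} after {it} sweeps; direct solution={x_direct}")
```

Raising maxiter lets the strongly coupled case converge to the same answer as the direct solve; it just takes many more sweeps.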
I think you may have discovered a coloring performance issue in OpenMDAO. When we compute coloring, internally we replace the component partials with random arrays matching the declared sparsity. Since we're not trying to find an actual solution when we compute coloring, we probably don't need to iterate more than once in any given system. And we shouldn't be generating convergence warnings when computing the coloring. I don't think you need to be worried in this case. I'll put a story in our bug tracker to look into this.
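Here is a toy NumPy version of the idea described in these answers (a sketch under simplifying assumptions, not OpenMDAO's implementation): fill the declared nonzeros of dR/dy with random values, invert to find which total-derivative entries can be nonzero, then greedily group columns that never share a nonzero row, so that each group (color) costs one linear solve. It assumes a square system where the whole inverse is the total Jacobian, and it forces a dominant diagonal so inversion is safe; random values in a real model carry no such guarantee, which is plausibly why an iterative solver like LNBGS can struggle during coloring. All function names here are hypothetical.

```python
import numpy as np

def total_sparsity(partials_pattern, n_samples=3, seed=0, rel_tol=1e-12):
    """Estimate which entries of the total Jacobian can be nonzero.

    partials_pattern: boolean (n x n) declared sparsity of dR/dy.
    Each sample fills the declared nonzeros with random values, inverts the
    matrix, and records which entries come out nonzero; samples are OR-ed.
    """
    rng = np.random.default_rng(seed)
    pattern = np.asarray(partials_pattern, dtype=bool)
    n = pattern.shape[0]
    found = np.zeros((n, n), dtype=bool)
    for _ in range(n_samples):
        A = np.where(pattern, rng.uniform(0.5, 1.0, (n, n)), 0.0)
        # Simplification: force a dominant diagonal so the random matrix is safely invertible.
        A[np.diag_indices(n)] = n + rng.uniform(2.0, 3.0, n)
        inv = np.linalg.inv(A)
        found |= np.abs(inv) > rel_tol * np.abs(inv).max()
    return found

def greedy_column_coloring(sparsity):
    """Give two columns the same color only if no row is nonzero in both."""
    colors = []
    color_rows = []  # rows already touched by each color
    for j in range(sparsity.shape[1]):
        rows = sparsity[:, j]
        for c, used in enumerate(color_rows):
            if not np.any(used & rows):
                colors.append(c)
                color_rows[c] = used | rows
                break
        else:
            colors.append(len(color_rows))
            color_rows.append(rows.copy())
    return colors

# Hypothetical 4-variable model: variables 0 and 1 form a cycle, 2 and 3 are independent.
pattern = np.eye(4, dtype=bool)
pattern[0, 1] = pattern[1, 0] = True
S = total_sparsity(pattern)
colors = greedy_column_coloring(S)
print(S.astype(int))
print(colors)  # [0, 1, 0, 0]: 2 solves instead of 4
```

Since only the sparsity pattern matters here, not the numerical values, a loosely converged linear solve during this step mostly risks missing or adding a few nonzeros rather than corrupting the optimization itself.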
I had a similar issue as expressed in this question. I followed Rob Flack's answer but had issues. If anyone could help me out, I would appreciate it.
I used the code suggested in the answer but had an issue: It changed the simulation results. I added a line in the script for the min_time_climb example that goes like this:
phase.add_timeseries_output('aero.mach', units=None, shape=(1,), output_name = "recorded_mach")
I used the name "recorded_mach" so as not to override anything else Dymos may or may not have been recording. The issue is that the default Altitude (h) vs. time graph actually changed, both the discrete points and the simulation curve. I ended up recording 4 variables with commands similar to the one shown above, and that somehow made the simulation track the discrete optimisation points on the graph better. When I recorded another 4 variables on top of that, it tracked worse. I find this very strange, because I don't see why recording the simulation should change its output.
Have you ever come across this? Any insight you could provide into the issue would be greatly appreciated.
Notes:
I have somewhat modified the example to fit a different situation (different thrust and fuel burn data, different lift and drag polars, different height and speed goals) before implementing the code described above. However, it was still working fine.
Without some kind of example to look at, I can only make an educated guess. So please take my answer with a grain of salt.
Some optimization problems have very ill-conditioned Jacobians and/or KKT matrices (which you as a user would not normally see, but which can be problematic nonetheless). There are many potential causes for this ill-conditioning, but some common ones are very large derivatives (i.e. approaching infinity) or very large ranges in magnitude between different derivatives. Another common cause is the introduction of a saddle point, where you have an infinite number of answers that are all equally good. Sometimes you can fix the problem with scaling; other times you need to rework the problem formulation.
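To see how much scaling alone can help, here is a made-up 2x2 Jacobian where one constraint is expressed in "large" units and the other in "small" units; dividing each row by its largest entry (equivalent to rescaling each constraint) drops the condition number by many orders of magnitude. This is a plain NumPy sketch; in OpenMDAO the corresponding knobs are the ref/ref0 or scaler/adder arguments on add_design_var, add_constraint and add_objective.

```python
import numpy as np

def row_scaled(J):
    """Divide each row by its largest absolute entry; all-zero rows are left unchanged."""
    J = np.asarray(J, dtype=float)
    scale = np.abs(J).max(axis=1)
    scale[scale == 0.0] = 1.0
    return J / scale[:, None]

# Hypothetical Jacobian: one constraint in "large" units, one in "small" units.
J = np.array([[2.0e6, 1.0e6],
              [1.0e-3, 3.0e-3]])
print(f"cond before: {np.linalg.cond(J):.3e}")
print(f"cond after:  {np.linalg.cond(row_scaled(J)):.3e}")
```

Scaling cannot fix genuine singularity (a zero row stays a zero row), which is why reformulation is sometimes the only fix.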
Ill-conditioning has two bad effects on the optimizer. First, it makes it very hard for the numerics inside to compute the inverses needed to compute step sizes. It will get an answer, but that answer may be highly subject to numerical noise. Second, it may prevent certain approximations (like BFGS) from performing well in the first place.
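To make "highly subject to numerical noise" concrete, here is a hypothetical nearly singular 2x2 system: perturbing one right-hand-side entry by 1e-8, about the size of typical solver or recording round-off, moves the solution by order 1.

```python
import numpy as np

eps = 1e-8
A = np.array([[1.0, 1.0],
              [1.0, 1.0 + eps]])    # nearly singular; condition number is roughly 4/eps
b = np.array([2.0, 2.0 + eps])       # chosen so the exact solution is x = [1, 1]
x = np.linalg.solve(A, b)

b_noisy = b + np.array([0.0, 1e-8])  # tiny perturbation of one entry
x_noisy = np.linalg.solve(A, b_noisy)
print(np.linalg.cond(A), x, x_noisy)  # x_noisy lands near [0, 2], far from [1, 1]
```

An optimizer computing steps from a matrix like this can take a visibly different path after an almost invisible change upstream.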
In these cases, small changes in execution order or extra steps (e.g. case recording) can cause the optimizer to take a different path. If you find that the path ultimately leads one case to work and another to fail, then you might have a marginally stable problem where you got lucky one time and not the other.
Look carefully for anything singular-like in your Jacobian. Zero rows or columns? A constraint that happens to be satisfied but still has a zero row is a problem that comes up in Dymos cases if you forget to add additional degrees of freedom when you add constraints. Saddle points can also arise if you're not careful with your objective.
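A cheap first check, assuming you can get the total Jacobian as a dense array (for example from OpenMDAO's compute_totals, converted to an array): list the all-zero rows and columns and compare the numerical rank with the matrix size. The helper name and the example Jacobian are made up for illustration.

```python
import numpy as np

def singular_like(J, tol=1e-12):
    """Return (zero_rows, zero_cols, numerical_rank) for a Jacobian."""
    J = np.asarray(J, dtype=float)
    zero_rows = np.flatnonzero(np.all(np.abs(J) <= tol, axis=1))
    zero_cols = np.flatnonzero(np.all(np.abs(J) <= tol, axis=0))
    rank = np.linalg.matrix_rank(J)
    return zero_rows.tolist(), zero_cols.tolist(), int(rank)

# Hypothetical 3-constraint x 3-design-variable Jacobian where the last constraint
# does not depend on any design variable (a zero row).
J = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [0.0, 0.0, 0.0]])
print(singular_like(J))  # ([2], [], 2)
```

A rank below the number of rows without any zero row points to linearly dependent constraints, which deserve the same scrutiny.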
I have two OpenMDAO groups with a cyclic dependency between them. I calculate the derivatives using complex step. I have a nonlinear solver for the dependency and use SLSQP to optimize my objective function. The issue is with the choice of the nonlinear solver. When I use NonlinearBlockGS, the optimization is successful in 12 iterations. But when I use NewtonSolver with DirectSolver or ScipyKrylov, the optimization fails (Iteration limit exceeded), even with maxiter=2000. The cyclic connections converge; it's just that the design variables do not reach the optimal values. The difference between the design variables in consecutive iterations is on the order of 1e-5, and this increases the number of iterations needed. Also, when I change the initial guess to a value closer to the optimal value, it works.
To check further, I converted the model into IDF (by creating copies of coupling variables and consistency constraints) thereby removing the need for a solver. Now the optimization is successful in 5 iterations and the results are similar to the results when NonlinearBlockGS is used.
Why does this happen? Am I missing something? When should I use NewtonSolver over the others? I know that it is difficult to answer without seeing the code, but my code is long, with multiple components, and I couldn't recreate the issue with a toy model. So any general insight is much appreciated.
Without seeing the code, you're right that it's hard to give specifics.
Very broadly speaking, Newton can sometimes have a lot more trouble converging than NLBGS (Note: this is not absolutely true, but is a good rule of thumb). So what I would guess is happening is that on your first or second iteration, the newton solver isn't actually converging. You can check this by setting newton.options['iprint']=2 and looking at the iteration history as the optimizer iterates.
When you have a solver in your optimization, it's critical that you also set it to throw an error on non-convergence. Some optimizers can handle this error and will backtrack on the line search. Others will just die. Either way, it's important. Otherwise, you end up giving the optimizer an unconverged case that it doesn't know is unconverged.
This is bad for two reasons. First, the objective and constraint values it gets are going to be wrong! Second, and perhaps more importantly, the derivatives it computes are going to be wrong! You can read the details in the theory manual, but in summary, the analytic derivative methods that OpenMDAO uses assume that the residuals have gone to 0. If that's not the case, the math breaks down. Even if you were doing full-model finite difference, non-converged models are a problem. You'll just get noisy garbage when you try to FD it.
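To make the last two paragraphs concrete, here is a one-variable sketch (plain Python, not OpenMDAO; in OpenMDAO the corresponding switch is the err_on_non_converge solver option). The hypothetical solve() raises when it runs out of iterations unless told otherwise, and the implicit-function derivative is only the true derivative when evaluated at a converged state: at a silently unconverged state, both the value and the derivative are off.

```python
import math

class NonConvergenceError(RuntimeError):
    """Hypothetical stand-in for a solver's 'did not converge' error."""

def residual(y, x):
    # R(y, x) = y - 0.5*cos(y) - x; a converged state satisfies R = 0.
    return y - 0.5 * math.cos(y) - x

def solve(x, maxiter=100, atol=1e-14, err_on_fail=True):
    """Fixed-point sweeps y <- 0.5*cos(y) + x starting from y = 0.

    With err_on_fail=True an unconverged result raises instead of being
    handed back silently.
    """
    y = 0.0
    for _ in range(maxiter):
        y_new = 0.5 * math.cos(y) + x
        if abs(y_new - y) < atol:
            return y_new
        y = y_new
    if err_on_fail:
        raise NonConvergenceError(f"not converged in {maxiter} iterations")
    return y

def analytic_dydx(y):
    # Implicit function theorem: dy/dx = -(dR/dy)^-1 * dR/dx = 1 / (1 + 0.5*sin(y)).
    # This equals the derivative of the solution only if R(y, x) = 0 at this y.
    return 1.0 / (1.0 + 0.5 * math.sin(y))

x, h = 1.0, 1e-5
y_conv = solve(x)
y_bad = solve(x, maxiter=2, err_on_fail=False)  # silently unconverged
fd = (solve(x + h) - solve(x - h)) / (2 * h)    # FD of the converged solution
print(residual(y_conv, x), residual(y_bad, x))
print(fd, analytic_dydx(y_conv), analytic_dydx(y_bad))
```

Calling solve(x, maxiter=5) raises NonConvergenceError, which is the signal an optimizer needs in order to backtrack instead of trusting the numbers.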
So, assuming you have set up your model correctly and that you have the linear solvers set up properly (it sounds like you do, since it works with NLBGS), then it's most likely that the Newton solver isn't converging. Use iprint, possibly combined with driver debug printing, to check this for yourself. If that's the case, you need to figure out how to get Newton to behave better.
There are some tips here that are pretty general. You could also try using the Armijo line search, which can often stabilize a Newton solve at the cost of some speed.
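A scalar toy of why a line search helps (plain Python, not OpenMDAO's ArmijoGoldsteinLS): Newton on r(x) = arctan(x) converges from x0 = 1 but overshoots and diverges from x0 = 2, which loosely mirrors the "works from a closer initial guess" symptom in the question. Backtracking the step until 0.5*r² decreases enough (the Armijo condition) rescues the start from x0 = 2. The function names and constants are illustrative.

```python
import math

def newton(residual, deriv, x0, maxiter=20, atol=1e-10, armijo=False, c=1e-4):
    """Scalar Newton iteration, optionally with Armijo backtracking on phi = 0.5*r**2.

    Returns (x, iterations, converged).
    """
    x = x0
    for it in range(maxiter):
        r = residual(x)
        if abs(r) < atol:
            return x, it, True
        step = -r / deriv(x)  # full Newton step
        alpha = 1.0
        if armijo:
            phi0 = 0.5 * r * r
            # Along the Newton step the slope of phi is -2*phi0, so the Armijo
            # condition reads phi(x + alpha*step) <= (1 - 2*c*alpha) * phi0.
            while alpha > 1e-8 and 0.5 * residual(x + alpha * step) ** 2 > (1 - 2 * c * alpha) * phi0:
                alpha *= 0.5  # backtrack
        x = x + alpha * step
        if not math.isfinite(x) or abs(x) > 1e8:
            return x, it + 1, False  # treat runaway iterates as divergence
    return x, maxiter, abs(residual(x)) < atol

def atan_prime(x):
    return 1.0 / (1.0 + x * x)

print(newton(math.atan, atan_prime, 2.0))               # plain Newton from x0 = 2
print(newton(math.atan, atan_prime, 1.0))               # plain Newton from x0 = 1
print(newton(math.atan, atan_prime, 2.0, armijo=True))  # with backtracking from x0 = 2
```

The cost of the line search is extra residual evaluations per iteration, which is the "some speed" trade-off mentioned above.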
Finally... Newton isn't the best answer in all situations. If NLBGS is more stable and computationally cheaper, you should use it. I applaud your desire to get it to work with Newton. You should definitely track down why it's not working, but if it turns out that Newton just can't solve your coupled problem reliably, that's OK too!
The "set it to throw an error on non-convergence" link in your answer is broken. I have added the link I think is the right one. Please correct it if that's not the one you meant to link.
Obviously an R (and math) amateur. I've been working 10+ hours trying to get this to work, so I thought I'd take a shot at posting here.
I have data collected from an experiment with two variables: Iq and q. These data are linear when plotted in loglog space. I am trying to solve for two other variables, por and r, in the following equation:
Iq=SLD^2*(por/Vra)*integral{Rmin to Rmax}((Vr)^2*f(r)*F dr)
Where:
SLD=known constant
por=unknown
Vra=integral{0 to Inf}(Vr*f(r)dr)
Vr=(4/3)*pi*r^3
Rmin and Rmax = known constants
f(r)=((r^-(1+fd))/(Rmin^(-fd) - Rmax^(-fd))/fd)
r=unknown
fd=known constant
F=(3*(sin(q*r)-q*r*cos(q*r))/(q*r)^3)^2
I've made many attempts at this, but I can't seem to wrap my brain around turning the variables inside the variables into code. This problem used to be solved with an Excel Solver routine that optimized the parameter values using non-linear least squares, which (imo) only works in Windows 95 Excel, and we're trying to adapt it into a more user-friendly data processing method. But I'm a geochemist, so basically useless. Any help would be much appreciated! I can include more details if some kind soul out there is willing to help out.
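A sketch of how the nested definitions can be coded, in Python with NumPy (the same structure maps onto R's integrate() for the integrals and nls() or optim() for the fit). Assumptions, all worth checking against the original Excel sheet: r is the integration variable, so it is integrated out rather than fitted, leaving por as the unknown; f(r) is taken to be zero outside [Rmin, Rmax], so the Vra integral from 0 to infinity reduces to Rmin..Rmax; and the numerical constants below are made up. Because I(q) is proportional to por, the least-squares fit has a closed form; it is done in log space here since the data are linear on log-log axes. Any constant factor in f(r) cancels between the integral and Vra, so its exact normalisation does not affect the fit.

```python
import numpy as np

def trapezoid(y, x):
    # Trapezoidal rule along the last axis (written out to avoid NumPy version differences).
    return np.sum((y[..., 1:] + y[..., :-1]) * np.diff(x), axis=-1) / 2.0

def model_iq(q, por, sld, fd, rmin, rmax, n_r=4000):
    """I(q) from the equation in the question, integrating over r numerically."""
    r = np.geomspace(rmin, rmax, n_r)                        # r is integrated out
    f = r ** -(1.0 + fd) / (rmin ** -fd - rmax ** -fd) / fd  # f(r) as written in the post
    vr = (4.0 / 3.0) * np.pi * r ** 3                        # Vr
    vra = trapezoid(vr * f, r)  # ASSUMPTION: f(r) = 0 outside [Rmin, Rmax]
    qr = np.outer(q, r)                                      # shape (len(q), n_r)
    F = (3.0 * (np.sin(qr) - qr * np.cos(qr)) / qr ** 3) ** 2
    return sld ** 2 * (por / vra) * trapezoid(vr ** 2 * f * F, r)

def fit_por(q, iq_obs, sld, fd, rmin, rmax):
    """Least-squares por in log space; the model is proportional to por."""
    shape = model_iq(q, 1.0, sld, fd, rmin, rmax)
    return float(np.exp(np.mean(np.log(iq_obs) - np.log(shape))))

# Made-up constants purely for illustration.
consts = dict(sld=2.0, fd=2.5, rmin=10.0, rmax=500.0)
q = np.geomspace(0.005, 0.2, 40)
iq_synthetic = model_iq(q, 0.3, **consts)
print(fit_por(q, iq_synthetic, **consts))  # recovers 0.3 up to round-off
```

If fd, Rmin or Rmax also need to be fitted, por is no longer the only unknown and the residuals would go to a nonlinear least-squares routine instead (for example scipy.optimize.least_squares, or nls() in R).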
I am trying to use a regression model to establish a relationship between two parameters, A and B (more specifically, runtime and workload, so that I can recommend what an optimal workload might be, or how strongly one affects the other, etc.). I am using 'rlm' (robust linear model) for this purpose, since it saves me the trouble of dealing with outliers beforehand.
However, rather than output a single regression line, I would like to determine a band that confidently explains most of the points. Here is an image I took from the web; those additional red lines are what I want to determine.
This is what I had in mind :
1. I found the mean of the residuals of all the points lying above the line. Then shift the original regression line by mean + k*sigma. The same can be done for the points below the line.
2. In SVM, in order to find the support vectors, we draw parallel lines (essentially shifting the middle line until we find support vectors on either side). I had something like that in mind: play around with the intercepts a little and find the number of points that can be explained by the band. Keep a threshold so you know where to stop.
The problem is that I am unable to implement this in R. For that matter, I am not sure these approaches even work. I would like to know what you would suggest. Also, is there a classic way to do this using one of the many R packages?
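One way to prototype approach 2, sketched in Python with NumPy (the same steps translate to R: fit with rlm, take quantile() of the residuals, shift the intercept): fit a robust line, then move the intercept to the lower and upper residual quantiles so the two parallel lines bracket a chosen fraction of the points. The huber_line helper is a hand-rolled stand-in that roughly mimics rlm's default Huber weighting, and the data are synthetic. A more standard alternative is quantile regression (for example rq() in R's quantreg package), which also lets the band's slope differ from the center line's.

```python
import numpy as np

def huber_line(x, y, k=1.345, iters=50):
    """Robust straight-line fit via IRLS with Huber weights; returns (intercept, slope)."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least squares start
    for _ in range(iters):
        r = y - X @ beta
        s = np.median(np.abs(r)) / 0.6745        # robust (MAD-style) residual scale
        if s == 0.0:
            break
        w = np.minimum(1.0, k * s / np.maximum(np.abs(r), 1e-12))  # Huber weights
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta

def residual_band(x, y, beta, coverage=0.9):
    """Intercepts of two lines parallel to the fit that bracket ~coverage of the points."""
    r = y - (beta[0] + beta[1] * x)
    lo, hi = np.quantile(r, [(1.0 - coverage) / 2.0, (1.0 + coverage) / 2.0])
    return beta[0] + lo, beta[0] + hi

# Synthetic "workload vs runtime" data with a few gross outliers (made up for illustration).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, 200)
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, x.size)
y[:5] += 50.0
beta = huber_line(x, y)
b_lo, b_hi = residual_band(x, y, beta)
inside = (y >= b_lo + beta[1] * x) & (y <= b_hi + beta[1] * x)
print(beta, b_lo, b_hi, inside.mean())
```

Using residual quantiles instead of mean + k*sigma keeps the band from being stretched by the same outliers the robust fit is trying to ignore; the coverage argument plays the role of the stopping threshold in approach 2.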
Thanks a lot for helping. Appreciate it.