Recently I started getting familiar with the DiffEqPhysics and DifferentialEquations packages in Julia. I'm wondering if there is a way (such as a callback function) to terminate the solver once the output satisfies a convergence condition. For example, if the result obtained in the previous step (or over a range of previous steps) differs from the solution at the current step by less than a threshold value, then end the process after the current step.
Yes, you can use callbacks to do this. Calling terminate!(integrator) inside a callback will halt the integration. The docs include an example that shows using callbacks with terminate! in more detail.
But instead of building your own, for terminating at a steady state you can use TerminateSteadyState from the callback library. The callback library is just a set of pre-built callbacks, and this one terminates the integration when the derivative is sufficiently small.
I am creating this new topic because I am using the OpenMDAO platform, and more specifically its design-of-experiments option. I would like to know if there is a proper way to interrupt and stop the computations when a condition is met in my program.
I have already used OpenMDAO optimizers to study and solve some problems, and to stop the computations I used to raise an exception. This strategy seems to work for optimizers, but not so well with the LatinHypercubeGenerator driver: the OpenMDAO program keeps trying to compute the remaining points even if an Exception or RuntimeError is raised within the OpenMDAO explicit component's "compute" function.
In that respect, I am wondering if there is a way to kill OpenMDAO during the calculations. I checked whether an OpenMDAO built-in attribute or method could do the job, but I have not found anything.
Does anyone know how to stop OpenMDAO DOE computations?
Many thanks in advance for any advice/help
As of OpenMDAO V3.18, there is no way to add some kind of stopping condition to the DOE driver. You mention using AnalysisError to achieve this for other optimizers. That won't work in general either, since some drivers intentionally catch those errors, react, and attempt to keep running the optimization.
You can see the run code of the driver, where a for loop iterates over the cases and try/except blocks record the success or failure of each one.
My suggestion for achieving what you want would be to copy the driver code into your model directory and make your own custom driver. You can then add whatever kind of termination condition you like, based either on the results of a single case or on some statistical analysis of the cases run so far.
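For illustration, a lighter-weight variation on that idea is to wrap the case generator so that it stops yielding new cases once a flag is set from inside your model. This is only a sketch written against the OpenMDAO 3.x DOEGenerator interface, so check the signatures against your version before relying on it:

```python
import threading
import openmdao.api as om

class EarlyStopGenerator(om.DOEGenerator):
    """Wraps another DOE generator and stops producing cases once a flag is set."""

    def __init__(self, base_generator, stop_event):
        super().__init__()
        self._base = base_generator
        self._stop = stop_event  # e.g. a threading.Event set from a component

    def __call__(self, design_vars, model=None):
        # signature assumed from the OpenMDAO 3.x DOEGenerator base class
        for case in self._base(design_vars, model):
            if self._stop.is_set():
                return  # the DOE driver simply runs out of cases
            yield case

# hypothetical usage:
# stop_event = threading.Event()
# prob.driver = om.DOEDriver(EarlyStopGenerator(
#     om.LatinHypercubeGenerator(samples=50), stop_event))
# ...and inside your component's compute(), call stop_event.set()
# instead of raising an exception.
```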
If you come up with a clean way of doing it, you can always submit a POEM and/or a pull request to propose adding your new functionality to the mainline of OpenMDAO.
I've been trying to figure out a good way to record basic function calls and calls to the sensitivity functions of my models in OpenMDAO. I have not found an easy way to do this, but I think I must be missing something. What is the best way to record function calls and sensitivity function calls during an optimization? I need this information from regular runs, not just during debugging.
The OpenMDAO documentation has extensive info on this topic:
http://openmdao.org/twodocs/versions/latest/features/recording/index.html
Here is how to save your data:
http://openmdao.org/twodocs/versions/latest/features/recording/saving_data.html
And how to read it:
http://openmdao.org/twodocs/versions/latest/features/recording/reading_data.html
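To make the pattern concrete, here is a small, self-contained sketch along the lines of those docs. The paraboloid model is just a stand-in, and the option names follow recent OpenMDAO versions, so treat it as a sketch rather than a canonical recipe:

```python
import openmdao.api as om

# stand-in model: a simple paraboloid with two design variables
prob = om.Problem()
prob.model.add_subsystem(
    'comp', om.ExecComp('f = (x - 3)**2 + x*y + (y + 4)**2 - 3'), promotes=['*'])
prob.model.add_design_var('x', lower=-10, upper=10)
prob.model.add_design_var('y', lower=-10, upper=10)
prob.model.add_objective('f')

prob.driver = om.ScipyOptimizeDriver(optimizer='SLSQP')

# attach a recorder to the driver so every driver iteration is saved
prob.driver.add_recorder(om.SqliteRecorder('cases.sql'))
prob.driver.recording_options['record_derivatives'] = True  # also record total derivatives

prob.setup()
prob.run_driver()
prob.cleanup()

# read the recorded iterations back
cr = om.CaseReader('cases.sql')
for case_id in cr.list_cases('driver'):
    case = cr.get_case(case_id)
    print(case['x'], case['y'], case['f'])
```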
I have been having trouble implementing an asynchronous gradient descent in a multithreaded environment.
To describe the skeleton of my code, for each thread,
loop
    synchronize with the global parameters
    <do some work / accumulate gradients on a mini-batch>
    apply gradient descent to the global network, specifically
        self.optimizer.apply_gradients(grads_and_vars)
end
where each thread has its own optimizer.
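Roughly, the skeleton in code (simplified; global_vars, local_vars and local_loss stand in for my actual networks, and the optimizer type is not the point) looks like this:

```python
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

def build_worker_ops(global_vars, local_vars, local_loss, lr, use_locking):
    # copy the global parameters into this worker's local network
    sync_op = tf.group(*[l.assign(g) for l, g in zip(local_vars, global_vars)])
    # gradients of the local loss w.r.t. the local network's variables
    grads = tf.gradients(local_loss, local_vars)
    # each worker thread builds its own optimizer; use_locking controls whether
    # the read-modify-write on each global variable happens under a lock
    opt = tf.train.RMSPropOptimizer(lr, use_locking=use_locking)
    apply_op = opt.apply_gradients(zip(grads, global_vars))
    return sync_op, apply_op

def worker_loop(sess, sync_op, apply_op, feed_fn, n_updates):
    for _ in range(n_updates):
        sess.run(sync_op)                        # synchronize with the global params
        sess.run(apply_op, feed_dict=feed_fn())  # mini-batch work + apply gradients
```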
Now the problem is that when I define the optimizer with 'use_locking=False', it does not work, as evidenced by the rewards generated by my reinforcement learning agent.
However, when I set 'use_locking=True', it works, so the algorithm itself is correct; it's just that the local gradients are not applied properly to the global parameters.
So some possible reasons I thought of were the following:
1. While one thread is updating the global parameters, another thread accessing them causes the first thread's remaining updates to be cancelled. With too many threads accessing the global parameters concurrently, the threads do all their hard work for nothing.
2. Referring to "How does asynchronous training work in distributed Tensorflow?", reading asynchronously at the top of the loop is certainly fine. However, it may be that as soon as a thread finishes applying its gradients, it synchronizes from the global parameters so quickly that it does not pick up the updates from the other threads.
Can someone, hopefully a TensorFlow developer, help me understand what is really happening with 'use_locking' in this specific loop?
I have been spending days on this simple example. Although setting use_locking = True does solve the issue, it is not asynchronous in nature and it is also very slow.
I appreciate your help.
To use an OpenCL kernel, the following steps are needed:
Put the kernel code in a string
call clCreateProgramWithSource
call clBuildProgram
call clCreateKernel
call clSetKernelArg (once for each argument)
call clEnqueueNDRangeKernel
This needs to be done for each kernel. Is there a way to do this with less repeated code per kernel?
There is no way to shortcut the process; you need to go step by step, as you listed.
But it is important to understand why these steps are needed, and how flexible the chain is as a result.
clCreateProgramWithSource: Allows you to combine strings from different sources to generate the program. Some strings might be static, some downloaded from a server, others loaded from disk. This lets the CL code be dynamic and updated over time.
clBuildProgram: Builds the program for a given device. Maybe you have 8 devices, so you need to call this multiple times; each device will produce a different binary.
clCreateKernel: Creates a kernel. A kernel is just an entry point into a binary, so you can create multiple kernels from one program (one per function). The same kernel can also be created multiple times, since each kernel object holds its own arguments; this is useful for keeping ready-to-launch instances with their parameters already set.
clSetKernelArg: Changes the parameters in that instance of the kernel. (They are stored there, so the kernel can be enqueued multiple times in the future.)
clEnqueueNDRangeKernel: Launches the kernel, configuring the size of the launch and the chain of dependencies with other operations.
So even if there were a single "getKernelFromString()" call, its functionality would be very limited and not very flexible.
You can have a look at wrapper libraries:
https://streamhpc.com/knowledge/for-developers/opencl-wrappers/
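For instance, with a wrapper such as pyopencl the whole chain above collapses into a few calls. This is just a hedged sketch (a trivial element-wise add kernel), with pyopencl as one example rather than a specific recommendation:

```python
import numpy as np
import pyopencl as cl

kernel_src = """
__kernel void add(__global const float *a, __global const float *b, __global float *out) {
    int i = get_global_id(0);
    out[i] = a[i] + b[i];
}
"""

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

a = np.random.rand(1024).astype(np.float32)
b = np.random.rand(1024).astype(np.float32)
out = np.empty_like(a)

mf = cl.mem_flags
a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, out.nbytes)

# Program(...).build() wraps clCreateProgramWithSource + clBuildProgram;
# calling prg.add(...) wraps clCreateKernel, clSetKernelArg and
# clEnqueueNDRangeKernel in a single call.
prg = cl.Program(ctx, kernel_src).build()
prg.add(queue, a.shape, None, a_buf, b_buf, out_buf)
cl.enqueue_copy(queue, out, out_buf)
```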
I suggest you look into SYCL. The build steps are performed offline, saving execution time by skipping clCreateProgramWithSource. The argument setting is done automatically by the runtime, which extracts the information from the user's lambda.
There is also CLU: https://github.com/Computing-Language-Utility/CLU - see https://www.khronos.org/assets/uploads/developers/library/2012-siggraph-opencl-bof/OpenCL-CLU-and-Intel-SIGGRAPH_Aug12.pdf for more info. It is a very simple tool, but should make life a bit easier.
The case I am solving is a two-discipline aerospace problem. The architecture is IDF. I am using recorders to record the data at each iteration, finite differences for the derivatives, and the SLSQP optimizer from SciPy.
If, after a few major iterations, the optimization crashes during the line search, how can I restart the line search from the same point?
Apart from that, I want to check from inside the component whether a call to the component's solve_nonlinear() is being made for derivative calculation or for the line search. Is there a way to do that?
SLSQP doesn't offer any built-in restart capability, so there isn't a whole lot you can do there. pyOptSparse does have some restart capability that OpenMDAO can use; it's called "hot start" in their code.
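As a rough sketch of what that looks like (attribute names follow the pyOptSparseDriver in recent OpenMDAO versions, so check the docs for your version; the model setup is omitted):

```python
import openmdao.api as om

prob = om.Problem()
# ... build the two-discipline IDF model here (omitted) ...

driver = om.pyOptSparseDriver()
driver.options['optimizer'] = 'SLSQP'
driver.hist_file = 'slsqp_hist.hst'  # first run: write an optimization history file
prob.driver = driver

# on a later run, replay the recorded iterations before continuing:
# driver.hotstart_file = 'slsqp_hist.hst'

prob.setup()
prob.run_driver()
```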
As for knowing whether a solve_nonlinear call is for derivative calculations or not, I assume you mean you want to know whether the call is for an FD step. We don't currently have that feature.