parallelize openmdao optimization with different initial guesses - openmdao

In order to find a better minimum, I currently create and run multiple instances of openmdao problem with different initial guesses, then select the solution with the best performance. To make this process faster, I currently use Python's multiprocessing module and solve each openmdao problem in a parallel subprocess.
However, as my problem becomes more complex, I would like to parallelize the optimization process too (by using ParallelGroup and/or distributed components), and I'm unsure if mpi will interact with Python's multiprocessing in strange ways. Is there any openmdao features that will handle both parallelism in solving individual problems and multiple problem instances?

You can run multiple instances of an OpenMDAO problem under MPI (without subprocesses) by splitting the comm and passing a sub-comm to the Problem constructor. See a basic example here:
https://github.com/OpenMDAO/OpenMDAO/blob/master/mpitest/test_mpi.py#L207-237
Your Problem can have ParallelGroup(s) and it will be fine as long as you give it a large enough comm.

Related

Limit GPU Memory in Julia using CuArrays

I'm fairly new to julia and I'm currently trying out some deep convolution networks with recurrent structures. I'm training the networks on a GPU using
CuArrays(CUDA Version 9.0).
Having two separate GPU's, I started two instances with different datasets.
Soon after some training both julia instances allocated all available Memory (2 x 11GB) and I couldn't even start another instance on my own using CuArrays (Memory allocation error). This became quite a problem, since this is running on a Server which is shared among many people.
I'm assuming that this is a normal behavior to use all available memory to train as fast as possible. But, under these circumstances I would like to limit the memory which can be allocated to run two instances at the same time and don't block me or other people from using the GPU.
To my surprise I found only very, very little information about this.
I'm aware of the CUDA_VISIBLE_DEVICES Option but this does not help since I want to train simultaneously on both devices.
Another one suggested to call GC.gc() and CuArrays.clearpool()
The second call throws an unknown function error and seems not to be within the CuArray Package anymore. The first one I'm currently testing but not exactly what I need. Is there any possibilty to limit the allocation of RAM on a GPU using CuArrays and Julia?
Thanks in advance
My Batchsize is 100 and one batch should have less than 1MB...
There is currently no such functionality. I quickly whipped something up, see https://github.com/JuliaGPU/CuArrays.jl/pull/379, you can use it to define CUARRAYS_MEMORY_LIMIT and set it to an amount of bytes that the allocator will not go beyond. Note that this might significantly increase memory pressure, a situation for which the CuArrays.jl memory allocator is currently not optimized (though it is one of my top priorities for the Julia GPU infrastructure).

Using ExternalCodeComp as the single comp and OpenMDAO concept

I am very much attracted to the idea of using the OpenMDAO. However I am not sure if it is worthwhile to use OpenMDAO in an optimization scenario where I use an external code as a single component and nothing else.
Is there any difference between the implementation using an optimizer available in SciPy versus the aforementioned openmdao implementation.
Or any difference between that and implementation of similar approach in some other language like matlab optimization toolbox etc?
(Of course the way optimizers are implemented may differ but i mean conceptually am i taking advantage of OpenMDAO with this approach?)
As far as I read the articles; openMDAO is powerful in cases where multiple components ''interact'' with each other and "global derivatives"" are obtained?
Am I taking advantage of openMDAO by using single ExternalCodeComp
Using just a single ExternalCodeComp would not be using the full potential of OpenMDAO. There would still be some advantages, because the ExternalCodeComp handles a lot of file wrapping details for you. Additionally, there are often details in an optimization, such as adding constraints, the will commonly require additional components. In that case you might use an ExecComp to add a few additional calculations.
Lastly, using OpenMDAO would allow you to potentially grow your model in the future to include other disciplines.
If you are sure that you'll never do anything other than optimize the one external code, then OpenMDAO does reduce down to a similarly functionality to using the bare pyoptsparse, scipy, or matlab optimizers though. In this corner case, OpenMDAO doesn't bring a whole lot to the table, other than the ease of use of the ExternalCodeComp.

Constraint handling, integer & parallel optimization

I have recently been assigned to a project where an optimization tool will be developed in python.
Various online search points out there are multiple libraries/platforms that come with pros and cons. As far as I have looked up with the existing openmdao framework we can not have an optimizer that can do constraint handling, mixed-integer, parallel optimization. Here with parallel it is meant that each iteration should be parallellized as in GADriver. I wanted to ask some advice from the developers considering the future possible improvements on openmdao:
Is it a good idea to look into writing a wrapper for an existing optimizer that can handle the aforementioned request or should one opt out from openmdao completely as openmdao may not be the strongest platform in this specific problem?
if writing a wrapper is a good idea i assume one should look for driver routines in the openmdao 2.2.X github. Do you have any advice for an optimizer type within python (paid or free) that can be easily compatible with openmdao.
There is an AIAA paper titled "Next generation aircraft design considering airline operations and economics", which described current state-of-the-art research into mixed integer programming problems. The approach here used a hybrid method that takes advantage of the efficient gradient based capabilities of OpenMDAO to handle larger numbers of continuous design variables.
In general, there is no limitation on mixed integer programming. You just need to write your own driver to handle it. These algorithsm are complex, but SimpleGADriver is a decent place to start to see how to run the model in parallel.

openmdao example optimization with parallelgroup

Is there an example of an OpenMDAO optimization where each iteration of the optimization is designed to run in parallel? The examples I saw seemed focused on the design-of-experiment drivers.
Once you get into parallelism within the model, there are potentially many many different ways to split up a problem. However, the simplest case, and the one that is likely the most relevant to something like running multiple directional cases at the same time is just to use a multi-point like setup. This is done with a parallel group, as shown in the multi-point doc.

Memory virtualization with R on cluster

I don't know almost anything about parallel computing so this question might be very stupid and it is maybe impossible to do what I would like to.
I am using linux cluster with 40 nodes, however since I don't know how to write parallel code in R I am limited to using only one. On this node I am trying to analyse data which floods the memory (arround 64GB). So my problem isn't lack of computational power but rather memory limitation.
My question is, whether it is even possible to use some R package (like doSnow) for implicit parallelisation to use 2-3 nodes to increase the RAM limit or would I have to rewrite the script from ground to make it explicit parallelised ?
Sorry if my question is naive, any suggestions are welcomed.
Thanks,
Simon
I don't think there is such a package. The reason is that it would not make much sense to have one. Memory access is very fast, and accessing data from another computer over the network is very slow compared to that. So if such a package existed it would be almost useless, since the processor would need to wait for data over the network all the time, and this would make the computation very very slow.
This is true for common computing clusters, built from off-the-shelf hardware. If you happen to have a special cluster where remote memory access is fast, and is provided as a service of the operating system, then of course it might be not that bad.
Otherwise, what you need to do is to try to divide up the problem into multiple pieces, manually, and then parallelize, either using R, or another tool.
An alternative to this would be to keep some of the data on the disk, instead of loading all of it into the memory. You still need to (kind of) divide up the problem, to make sure that the part of the data in the memory is used for a reasonable amount of time for computation, before loading another part of the data.
Whether it is worth (or possible at all) doing either of these options, depends completely on your application.
Btw. a good list of high performance computing tools in R is here:
http://cran.r-project.org/web/views/HighPerformanceComputing.html
For future inquiry:
You may want to have a look at two packages "snow" and "parallel".
Library "snow" extends the functionality of apply/lapply/sapply... to work on more than one core and/or one node.
Of course, you can perform simple parallel computing using more than one core:
#SBATCH --cpus-per-task= (enter some number here)
You can also perform parallel computing using more than one node (preferably with the previously mentioned libraries) using:
#SBATCH --ntasks-per-node= (enter some number here)
However, for several implications, you may wanna think of using Python instead of R where parallelism can be much more efficient using "Dask" workers.
You might want to take a look at TidalScale, which can allow you to aggregate nodes on your cluster to run a single instance of Linux with the collective resources of the underlying nodes. www.tidalscale.com. Though the R application may be inherently single threaded, you'll be able to provide your R application with a single, simple coherent memory space across the nodes that will be transparent to your application.
Good luck with your project!

Resources