Guidelines for permissible maximum linearly-independent cyclomatic complexity? - software-design

For software engineering metrics, what are some guidelines on the maximum permissible linearly-independent cyclomatic complexity? For a properly designed module, what is the upper bound on cyclomatic complexity?

The recommendation in the documentation of the tool NDepend concerning method Cyclomatic Complexity is:
Methods where CC is higher than 15 are hard to understand and maintain.
Methods where CC is higher than 30 are extremely complex and should be split into smaller methods (except if they are automatically generated by a tool).
For a properly designed module, what is the upper bound on cyclomatic complexity?
CC applies well to methods because a method is a unit of code-flow understanding. There are other metrics to estimate the design and complexity of classes and modules (as a graph of classes), for example:
Lack of Cohesion Of Methods
Relational Cohesion
Distance from main sequence
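For intuition about what those CC thresholds measure, here is a minimal sketch (not taken from NDepend's documentation) of how cyclomatic complexity is usually counted for a method: one plus the number of independent decision points (branches, loops, and so on). The function below is a made-up example with a CC of 4.

```python
# Hypothetical example: CC = number of decision points + 1.
def classify(values):
    result = []
    for v in values:          # decision point 1
        if v < 0:             # decision point 2
            result.append("negative")
        elif v == 0:          # decision point 3
            result.append("zero")
        else:
            result.append("positive")
    return result             # CC = 3 + 1 = 4
```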
Disclaimer: I work for NDepend

Related

Understanding the reasons behind OpenMDAO design

I am reading about MDO and I find OpenMDAO really interesting. However, I have trouble understanding/justifying the reasons behind some basic choices.
Why gradient-based optimization? Since gradient-based optimizers can never guarantee a global optimum, why are they preferred? I understand that finding a global minimum is really hard for MDO problems with numerous design variables, and that a local optimum is far better than a human design. But considering that the application is generally expensive systems like aircraft or satellites, why settle for local minima? Wouldn't it be better to use meta-heuristics, or meta-heuristics on top of gradient methods, to converge to the global optimum? The computation time would be high, but now that almost every university and leading industry has access to supercomputers, I would say it is an acceptable trade-off.
Speaking of computation time, why Python? I agree that Python makes scripting convenient and can be interfaced with compiled languages. Does this alone tip the scales in favor of Python? But if computation time is one of the primary reasons that makes finding the global minimum really hard, wouldn't it be better to use C++ or any other more energy-efficient language?
To clarify, the only intention of this post is to justify (to myself) using OpenMDAO, as I am just starting to learn about MDO.
No algorithm can guarantee that it finds a global optimum in finite time, but gradient-based methods generally find local optima faster than gradient-free methods. OpenMDAO concentrates on gradient-based methods because they can traverse the design space much more rapidly than gradient-free methods.
Gradient-free methods are generally good for exploring the design space more broadly for better local optima, and there's nothing to prevent users from wrapping the gradient-based optimization drivers under a gradient-free caller. (see the literature about algorithms like Monotonic Basin Hopping, for instance)
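For illustration, here is a minimal sketch of that pattern using SciPy rather than an OpenMDAO driver: a gradient-free global strategy (basin hopping) repeatedly calls a gradient-based local optimizer. The toy objective, its gradient, and the starting point are made up for this example.

```python
import numpy as np
from scipy.optimize import basinhopping

def objective(x):
    # multimodal toy function: many local minima, one global minimum
    return np.sum(x**2) + 2.0 * np.sin(5.0 * x[0])

def gradient(x):
    g = 2.0 * x
    g[0] += 10.0 * np.cos(5.0 * x[0])
    return g

# gradient-free outer loop (random hops) around a gradient-based inner optimizer
result = basinhopping(
    objective,
    x0=np.array([3.0, -2.0]),
    minimizer_kwargs={"method": "L-BFGS-B", "jac": gradient},
    niter=50,
)
print(result.x, result.fun)
```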
Python was chosen because, while it's not the most efficient in run-time, it considerably reduces the development time. Since using OpenMDAO means writing code, the relatively low learning curve, ease of access, and cross-platform nature of Python made it attractive. There's also a LOT of open-source code out there that's written in Python, which makes it easier to incorporate things like 3rd party solvers and drivers. OpenMDAO is only possible because we stand on a lot of shoulders.
Despite being written in Python, we achieve relatively good performance because the algorithms involved are very efficient and we try to minimize Python's performance issues by, for example, using vectorization via NumPy rather than Python loops.
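As a small illustration (not code from OpenMDAO itself), the kind of NumPy vectorization referred to above replaces an explicit Python loop with a single call into compiled array code:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 1_000_000)

# plain Python loop: interpreted, element by element
y_loop = np.empty_like(x)
for i in range(x.size):
    y_loop[i] = 3.0 * x[i]**2 + 2.0 * x[i]

# vectorized NumPy expression: the same math in compiled array operations
y_vec = 3.0 * x**2 + 2.0 * x

assert np.allclose(y_loop, y_vec)
```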
Also, the calculations that Python handles at the core of OpenMDAO are generally very low cost. For complex engineering calculations such as PDE solvers (e.g. CFD or FEA), the expensive parts of the code can be written in C, C++, Fortran, or even Julia. These languages are easy to interface with Python, and many OpenMDAO users do just that.
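As a hedged sketch of one common way to do this (ctypes is just one option, alongside f2py, Cython, and pybind11), the shared library name and function signature below are hypothetical placeholders:

```python
import ctypes

# hypothetical compiled library exposing: double residual(double x);
lib = ctypes.CDLL("./libsolver.so")
lib.residual.argtypes = [ctypes.c_double]
lib.residual.restype = ctypes.c_double

print(lib.residual(1.5))
```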
OpenMDAO is actively used in a number of applications, and the needs of those applications drive its design. While we don't have a built-in monotonic-basin-hopping capability right now (for instance), if that was determined to be a need by our stakeholders we'd look to add it. As our development continues, if we were to hit roadblocks that could be overcome by switching to a different language, we would consider it, but backwards compatibility (the ability of users to use their existing Python-based models) would be a requirement.

Does it make sense to use a gradient free optimizer within openmdao framework

Is my understanding correct that using a gradient-free optimizer wraps the whole problem and treats it as a black box (even though the problem has multiple groups/components attached to inner solvers with gradients, etc.)?
If so, the actual capabilities of OpenMDAO are not exploited well, and its advantage boils down to easily tracking your calculations with smaller routines, etc.
Though it is true that OpenMDAO's most unique and powerful feature is its automatic derivatives capability, IMO that does not mean it is only useful in the context of gradient-based optimization. The framework offers several other features that are useful regardless of which optimizer you choose. For example:
support for parallelization
library of powerful nonlinear solvers
modular model construction
You could certainly hand-code a large, complex model without OpenMDAO, then wrap that with a gradient free optimizer, but I would argue that you would ultimately end up doing a bit more work in the long run. Using a framework provides organization and structure to your model that pays off long term.
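For concreteness, here is a minimal sketch of an OpenMDAO model driven by a gradient-free optimizer (COBYLA via ScipyOptimizeDriver). The component, bounds, and values are illustrative, and the exact syntax may differ slightly between OpenMDAO versions.

```python
import openmdao.api as om

class Paraboloid(om.ExplicitComponent):
    """Made-up component: f(x, y) = (x - 3)^2 + x*y + (y + 4)^2 - 3."""

    def setup(self):
        self.add_input('x', val=0.0)
        self.add_input('y', val=0.0)
        self.add_output('f', val=0.0)

    def compute(self, inputs, outputs):
        x, y = inputs['x'], inputs['y']
        outputs['f'] = (x - 3.0)**2 + x * y + (y + 4.0)**2 - 3.0

prob = om.Problem()
ivc = om.IndepVarComp()
ivc.add_output('x', 3.0)
ivc.add_output('y', -4.0)
prob.model.add_subsystem('des_vars', ivc, promotes=['*'])
prob.model.add_subsystem('parab', Paraboloid(), promotes=['*'])

# gradient-free optimizer: derivatives are never requested
prob.driver = om.ScipyOptimizeDriver()
prob.driver.options['optimizer'] = 'COBYLA'

prob.model.add_design_var('x', lower=-50.0, upper=50.0)
prob.model.add_design_var('y', lower=-50.0, upper=50.0)
prob.model.add_objective('f')

prob.setup()
prob.run_driver()
print(prob['x'], prob['y'], prob['f'])
```

Even with a gradient-free driver, the model itself still benefits from the framework's solver hierarchy, data passing, and modular structure.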

Algorithmic Differentiation vs Multiple Explicit Components with Analytical Derivatives

I have a problem composed of around 6 nested mathematical expressions, i.e. f(g(z(y(x)))), where x consists of two independent arrays.
I can divide this expression into multiple explicit components with analytical derivatives, or use an algorithmic differentiation method to get the derivatives, which reduces the system to a single explicit component.
As far as I understand, it is not easy to tell in advance the possible computational performance difference between these two approaches.
It might depend on the algorithmic differentiation tool's capabilities in the reverse mode, but perhaps the system with multiple explicit components would become so large that it would still be better to use algorithmic differentiation.
My question is:
Is algorithmic differentiation a common tool used by any of the developers/users?
I found AlgoPY, but I am not sure about other Python tools.
As of OpenMDAO v2.4, the OpenMDAO development team has not heavily used AD tools on any pure-Python components. We have experimented with it a bit and found roughly a 2x increase in computational cost versus hand-differentiated components. While some computational cost increase is expected, I do not want to suggest that 2x is the final rule of thumb. We simply don't have enough data to provide such an estimate.
Python-based AD tools are much less well developed than those for compiled languages. The dynamic typing and general language flexibility both make it much more challenging to write good AD tools.
We have interfaced OpenMDAO with compiled codes that use AD, such as CFD and FEA tools. In these cases you're always using the matrix-free derivative APIs for OpenMDAO (apply_linear and compute_jacvec_product).
If your component is small enough to fit in memory and fast enough to run on a single process, I suggest you hand differentiate your code. That will give you the best overall performance for now.
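As a minimal sketch of that hand-differentiation route (the function y = x**2 + sin(x) is just a stand-in for one link of the f(g(z(y(x)))) chain), an ExplicitComponent can supply its analytic partials through compute_partials and verify them with check_partials; exact syntax may vary slightly between OpenMDAO versions.

```python
import numpy as np
import openmdao.api as om

class HandDiffComp(om.ExplicitComponent):
    """Stand-in component: y = x**2 + sin(x), differentiated by hand."""

    def setup(self):
        self.add_input('x', val=np.zeros(3))
        self.add_output('y', val=np.zeros(3))
        # y_i depends only on x_i, so the Jacobian is diagonal
        rows = cols = np.arange(3)
        self.declare_partials('y', 'x', rows=rows, cols=cols)

    def compute(self, inputs, outputs):
        outputs['y'] = inputs['x']**2 + np.sin(inputs['x'])

    def compute_partials(self, inputs, partials):
        # analytic derivative: dy/dx = 2x + cos(x)
        partials['y', 'x'] = 2.0 * inputs['x'] + np.cos(inputs['x'])

prob = om.Problem()
prob.model.add_subsystem('comp', HandDiffComp())
prob.setup()
prob['comp.x'] = np.array([0.5, 1.0, 2.0])
prob.run_model()
prob.check_partials(compact_print=True)  # compare against finite differences
```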
AD support for small serial components is something we'll look into in the future, but we don't have anything to offer you in the near term (as of OpenMDAO v2.4).

Why is graph processing difficult to distribute?

Recently I read the paper Scalability! But at what cost?. In this paper, the authors take graph computation as an example and compare performance on a single-threaded machine to performance on several distributed frameworks.
In section 2, the authors state that graph computation represents one of the simplest classes of data-parallel computation that is not trivially parallelized. Can anybody tell me what the main barriers to parallelizing graph computation are?
The main barriers are the commutative and associative properties of the graph operations: these two properties determine whether an algorithm is trivially parallelizable. On the page you linked, the authors state the following:
The updates are commutative and associative, and consequently admit a scalable implementation [7].
Actually the cited paper at [7] is a PhD dissertation which explains it quite well:
At the core of this dissertation's approach is this scalable commutativity rule: In any situation where several operations commute—meaning there's no way to distinguish their execution order using the interface—they have an implementation that is conflict-free during those operations—meaning no core writes a cache line that was read or written by another core. Empirically, conflict-free operations scale, so this implementation scales. Or, more concisely, whenever interface operations commute, they can be implemented in a way that scales. This rule makes intuitive sense: when operations commute, their results (return values and effect on system state) are independent of order. Hence, communication between commutative operations is unnecessary, and eliminating it yields a conflict-free implementation. On modern shared-memory multicores, conflict-free operations can execute entirely from per-core caches, so the performance of a conflict-free implementation will scale linearly with the number of cores.
For example, the Cartesian graph product is a commutative and associative operation: the resulting vertices can be calculated in any order, making parallelization easy in this case. However, most graph operations lack one or both of these properties.
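As a tiny illustration of why those properties matter, the commutative and associative degree-count update below can be computed over arbitrary partitions of the edge list and merged afterwards, which is exactly what makes the work easy to distribute. The graph and partitioning are made up for this sketch.

```python
import random

edges = [(0, 1), (1, 2), (2, 0), (2, 3)]

def degree_counts(edge_chunk, n_vertices=4):
    # each edge contributes +1 to the degree of both endpoints
    deg = [0] * n_vertices
    for u, v in edge_chunk:
        deg[u] += 1
        deg[v] += 1
    return deg

def merge(a, b):
    # element-wise sum: commutative and associative
    return [x + y for x, y in zip(a, b)]

# process two arbitrary partitions "in parallel", then merge the partial results
random.shuffle(edges)
part1, part2 = edges[:2], edges[2:]
assert merge(degree_counts(part1), degree_counts(part2)) == degree_counts(edges)
```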

Constraint handling, integer & parallel optimization

I have recently been assigned to a project where an optimization tool will be developed in Python.
Various online searches point out that there are multiple libraries/platforms that come with pros and cons. As far as I have looked, with the existing OpenMDAO framework we cannot have an optimizer that does constraint handling, mixed-integer, and parallel optimization. By parallel I mean that each iteration should be parallelized, as in the GA driver. I wanted to ask some advice from the developers, considering possible future improvements to OpenMDAO:
Is it a good idea to look into writing a wrapper for an existing optimizer that can handle the aforementioned requests, or should one opt out of OpenMDAO completely, as OpenMDAO may not be the strongest platform for this specific problem?
If writing a wrapper is a good idea, I assume one should look at the driver routines in the OpenMDAO 2.2.X GitHub repository. Do you have any advice on an optimizer within Python (paid or free) that would be easily compatible with OpenMDAO?
There is an AIAA paper titled "Next generation aircraft design considering airline operations and economics", which describes current state-of-the-art research into mixed-integer programming problems. The approach used there is a hybrid method that takes advantage of OpenMDAO's efficient gradient-based capabilities to handle larger numbers of continuous design variables.
In general, there is no limitation on mixed-integer programming; you just need to write your own driver to handle it. These algorithms are complex, but SimpleGADriver is a decent place to start to see how to run the model in parallel.
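As a hedged sketch of how that can look with the built-in GA driver, the toy model below mixes an integer-like and a bit-encoded continuous design variable. The problem is made up, and the option names and integer-encoding behavior should be checked against the documentation for your OpenMDAO version.

```python
import openmdao.api as om

prob = om.Problem()
model = prob.model

ivc = om.IndepVarComp()
ivc.add_output('n', 3.0)   # intended as an integer-like design variable (no 'bits' entry)
ivc.add_output('x', 0.5)   # continuous design variable
model.add_subsystem('des_vars', ivc, promotes=['*'])
model.add_subsystem('obj_comp', om.ExecComp('f = (n - 4)**2 + (x - 0.3)**2'),
                    promotes=['*'])

prob.driver = om.SimpleGADriver()
prob.driver.options['max_gen'] = 50
prob.driver.options['bits'] = {'x': 8}        # bit-encode the continuous variable
# prob.driver.options['run_parallel'] = True  # evaluate each generation in parallel under MPI

model.add_design_var('n', lower=0, upper=10)
model.add_design_var('x', lower=0.0, upper=1.0)
model.add_objective('f')

prob.setup()
prob.run_driver()
print(prob['n'], prob['x'], prob['f'])
```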

Resources