What are the specific reasons for CVXPY to throw `SolverError` exception? - convex-optimization

I am using CVXPY (version 1.0) to solve a quadratic program (QP) and I often get this exception:
SolverError: Solver 'xxx' failed. Try another solver.
which makes my program really fragile. I have tried different solvers, including CVXOPT, OSQP, ECOS, ECOS_BB, and SCS; they all have more or less the same problem. I noticed that when I make the solver's stopping criteria stricter (e.g., decrease the absolute error tolerance), I get SolverError more frequently, and when I make them less strict, the problem is attenuated or even disappears. I also find that CVXPY throws SolverError stochastically: if I run the same program many times, some runs raise SolverError while others return the optimal result.
Although I can avoid SolverError just by retrying and loosening the stopping criteria, I really want to understand the specific reasons behind the exception:
SolverError: Solver 'xxx' failed. Try another solver.
This error is not very informative and gives me no clues about how to make my problem solving more robust. Are its causes specific to each solver? Is this exception thrown in a set of well-defined situations, or is it just a way of saying "something went wrong for unknown reasons"? What reasons might those be?

If you get a solver error, you need to either debug by calling the solve method with verbose=True to see the detailed error message, or use a more robust commercial solver like MOSEK. The specific reasons for solver errors depend on the solver used. A common cause is too tight a numerical tolerance or badly scaled data (i.e., the dynamic range of the floats in your program is too large). I will modify the SolverError message to mention using verbose=True.
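For illustration, here is a minimal sketch of turning on verbose output, loosening OSQP's tolerances, and falling back to a second solver. The QP data here is made up, not the original poster's problem:

import cvxpy as cp
import numpy as np
from cvxpy.error import SolverError

# Made-up QP data, for illustration only.
np.random.seed(0)
P = np.random.randn(5, 5)
P = P.T @ P  # make P positive semidefinite
q = np.random.randn(5)

x = cp.Variable(5)
prob = cp.Problem(cp.Minimize(0.5 * cp.quad_form(x, P) + q @ x),
                  [x >= 0, cp.sum(x) == 1])

try:
    # verbose=True prints the solver's own log, which usually names the
    # real failure (e.g. "max iterations reached" or "numerical problems").
    # eps_abs/eps_rel are OSQP-specific tolerances; looser values reduce
    # the chance of a SolverError at the cost of solution accuracy.
    prob.solve(solver=cp.OSQP, verbose=True, eps_abs=1e-5, eps_rel=1e-5)
except SolverError:
    # Fall back to another solver rather than letting the program die.
    prob.solve(solver=cp.ECOS, verbose=True)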

Related

OpenMDAO: interrupt a design of experiment computations

I am creating this new topic because I am using the OpenMDAO platform, and more specifically its design of experiment option. I would like to know if there is a proper way to interrupt and stop the computations if a condition is met in my program.
I have already used OpenMDAO optimizers to study and solve some problems, and to stop the computations I used to raise an Exception. This strategy seems to work for optimizers, but not so much for the LatinHypercubeGenerator driver: it is as if the OpenMDAO program keeps trying to compute the points even if an Exception or RuntimeError is raised within the OpenMDAO explicit component's "compute" function.
In that respect I am wondering if there is a way to kill OpenMDAO during calculations. I tried to check if an OpenMDAO built-in attribute or method could do the job, but I have not found anything.
Does anyone know how to stop OpenMDAO DOE computations?
Many thanks in advance for any advice/help
As of OpenMDAO V3.18, there is no way to add some kind of stopping condition to the DOE driver. You mention using AnalysisError to achieve this for other optimizers. That won't work in general either, since some drivers will intentionally catch those errors, react, and attempt to keep running the optimization.
You can look at the run code of the driver, where a for loop iterates over the cases and some try/except blocks record the success or failure of each one.
My suggestion for achieving what you want would be to copy the driver code into your model directory and make your own custom driver. You can add whatever kind of termination condition you like, based either on the results of a single case or on some statistical analysis of the cases run so far.
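For what it's worth, a rough sketch of such a custom driver might look like the following. This is not OpenMDAO's public API: the attributes and helpers used here (_designvars, _problem, _run_case) are internal names based on the OpenMDAO 3.x DOEDriver source, so verify them against the version you actually run.

import openmdao.api as om

class StoppableDOEDriver(om.DOEDriver):
    """DOEDriver with a user-supplied stopping condition (sketch only)."""

    def __init__(self, generator=None, stop_condition=None, **kwargs):
        super().__init__(generator=generator, **kwargs)
        # stop_condition: callable taking the driver, returning True to stop.
        self._stop_condition = stop_condition

    def run(self):
        # Mirrors the loop in DOEDriver.run(); _problem is a weakref in
        # recent OpenMDAO versions, hence the call.
        model = self._problem().model
        self.iter_count = 0
        for case in self.options['generator'](self._designvars, model=model):
            if self._stop_condition is not None and self._stop_condition(self):
                break  # terminate the DOE early
            self._run_case(case)  # internal helper: sets DVs, runs, records
            self.iter_count += 1
        return False

You would then install it with something like prob.driver = StoppableDOEDriver(om.LatinHypercubeGenerator(samples=50), stop_condition=my_check), where my_check is your own function.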
If you come up with a clean way of doing it, you can always submit a POEM and/or a pull request to propose adding your new functionality to the mainline of OpenMDAO.

Does non-deterministic nature of property-based testing hurt build repeatability?

I am learning FP and got introduced to the concept of property-based testing (PBT), and to someone from the OOP world PBT looks both useful and dangerous. It does check a lot of inputs, but what if there are one or more inputs that fail, yet they didn't fail during your first (say, Jenkins) build? Then the next time you run the build the test may or may not fail. Doesn't that kill the entire idea of repeatable builds?
I see that some people have explored options to make the tests deterministic, but then if such a test doesn't catch an error, it never will.
So what's the better approach here? Do we sacrifice build repeatability to eventually uncover a bug, or do we take the risk of never uncovering it but get our repeatability back?
(I hope that I have properly understood the concept of PBT; if I haven't, I would appreciate it if somebody could point out my misconceptions.)
Doing a lot of property-based testing I don’t see indeterminism as a big problem. I basically experience three types of it:
A property is really indeterministic because some external factor - e.g. a timeout, delay, or DB config - makes it so. Such flaky tests also show up in example-based testing and should be eliminated by making the external factor deterministic.
A property fails rarely because the triggering condition is only sometimes met by pseudo-random data generation. Most PBT libraries have ways to reproduce those failing runs, e.g. by re-using the random seed of the failing test run or even remembering the exact constellation in a database of some sort (a sketch follows this list). Those failures reveal real problems and are one of the reasons why we do random test-case generation in the first place.
Coverage assertions ("this condition will be hit in at least 5 percent of all cases") may fail from time to time even though they are generally true. This can be mitigated by raising the number of tries. Some libraries, e.g. QuickCheck, do their own calculation of how many tries are needed to prove or disprove coverage assumptions and thereby mostly eliminate those false positives.
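As an illustration of the second point, here is how reproducing a failure via its seed looks with Hypothesis in Python; jqwik and QuickCheck offer analogous mechanisms, and Hypothesis additionally remembers failing examples in a local example database automatically. The seed value and property are made up:

from hypothesis import given, seed, strategies as st

@seed(20240117)  # pin the seed reported by a failing run to reproduce it
@given(st.lists(st.integers()))
def test_sorting_is_idempotent(xs):
    assert sorted(sorted(xs)) == sorted(xs)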
The important thing is to always follow up on flaky failures and find the bug, the indeterministic external factor, or the wrong assumption in the property's invariant. When you do that, sporadic failures will occur less and less often. My personal experience is mostly with jqwik, but other people have been telling me similar stories.
You can have both non-determinism and reproducible builds by generating the randomness outside the build process. You could generate it during development or during external testing.
One example would be to seed your property-based tests and automatically modify this seed on commit (sketched below). You're still making a tradeoff: a developer could be alerted to a bug unrelated to what they're working on, and you lose some test capacity since the tests might change less often.
You can tip the tradeoff further in the deterministic direction by making the seed change less often. You could for example have one seed for each program component or file, and only change it when a related file is committed.
A different approach would be to not change the seed during development at all. You would instead have automatic QA doing periodic or continuous testing with random seeds and use them to generate bug reports/issues that can be dealt with when convenient.
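A minimal sketch of the seed-on-commit idea, assuming a git repository (the helper name is made up):

import hashlib
import subprocess

def seed_from_commit() -> int:
    # Derive the test seed from the current commit hash: a given commit
    # always tests the same inputs, while each new commit rotates them.
    commit = subprocess.check_output(["git", "rev-parse", "HEAD"]).strip()
    return int(hashlib.sha256(commit).hexdigest(), 16) % (2 ** 32)

# e.g. with Hypothesis:
# from hypothesis import given, seed, strategies as st
# @seed(seed_from_commit())
# @given(st.integers())
# def test_property(n): ...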
johanneslink's analysis of non-determinism is spot on.
There's one thing I would like to add: non-determinism is not only a rare and small cost, it's also beneficial. If the first run of your test suite is successful, insisting on determinism means insisting that future runs (of the same suite against the same system) will find zero bugs.
Usually most test suites contain many independent tests of many independent system parts, and commits rarely change large parts of the system. So even across commits, most tests test exactly the same thing before and after, where once again determinism guarantees that you will find zero bugs.
Allowing for randomness means every run has at least a chance of discovering a bug.
That of course raises the question of regression tests. I think the standard argument is something like this: to maximize value per effort you should focus your testing on the most bug-prone parts of the code. Having observed a bug in the past provides evidence about which part of the code is buggy (and which kind of bug it's likely to have). You should use that evidence to guide your testing effort. (Often with a laser-like focus on one concrete bug.)
I think this is a very reasonable argument. I also think there's more than one way of making good use of the evidence provided by bugs.
For example, you might write a generator which produces data of the same kind and shape as the data which triggered the bug the first time, and/or which is tailor-made to trigger the bug.
And/or, you might want to write tests verifying specifically those properties that were violated by the buggy behavior.
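For instance (a made-up bug, illustrated with Hypothesis): suppose whitespace-only input once crashed a tokenizer. A generator shaped like the triggering data keeps stressing that region of the input space:

from hypothesis import given, strategies as st

def tokenize(text):
    # Stand-in for the function that had the bug.
    return text.split()

# Generator tailor-made for the old failure: short, whitespace-only strings.
whitespacey = st.text(alphabet=" \t\n\r", max_size=10)

@given(whitespacey)
def test_tokenize_whitespace_only(s):
    # The property the buggy behaviour violated (illustrative):
    # whitespace-only input must tokenize to an empty list, not raise.
    assert tokenize(s) == []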
If you want to judge how good these tests are, I recommend running them a couple of times (on normally sized input batches). If they trigger the bug every time, they are likely to do so in the future as well.
Here's a (hopefully thought-)provoking question: is it worse to release software with a bug it has had before, or to release software with new bugs? In other words: is catching past bugs more important than catching new ones, or do we do it primarily because it's easier?
If you think we do it in part because it's easier, then I don't think it matters that re-catching the bug is probabilistic: what you should really care about is something like the average bug-catching ability of property testing; its benefits elsewhere should outweigh the fairly small chance that an old bug squeaks through, even though it got caught in (say) 5 consecutive runs when you evaluated your regression tests.
Now, if you can't reliably generate random inputs that trigger the bug even though you understand the bug just fine, or the generator which does it is large and complicated and thus costly to maintain, hand-picking a regression example seems like a perfectly reasonable choice.
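In Hypothesis, hand-picking a regression example is a one-liner: @example pins a concrete input that is always run in addition to the random ones. The input and property here are made up:

from hypothesis import example, given, strategies as st

@example(" \t\n")  # the hand-picked input that once triggered the bug
@given(st.text())
def test_split_yields_no_empty_tokens(s):
    assert all(tok for tok in s.split())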

Math library design decision: throw exceptions or fail silently

Say you're designing a math library (in JS) from scratch: the usual Vector2/3/4, Matrix2/3/4, Quaternion and so on (standard stuff for WebGL apps). What would be the best way to handle bad input? (division by zero, inverting a singular matrix, computing the intersection point between 2 parallel lines and so on).
The two ways to deal with this would be to:
throw exceptions
I know there are plenty of people who like to know precisely when their code fails - sort of the same people who hate dynamic typing - but I can't help but think of the dreaded "Error 200: Division by zero" errors that I got so many of in my early days of programming. The only solution was to sprinkle the code with checks to prevent any of these errors, which only made the code UGLY. I also can't help but wonder why programming languages nowadays have adopted +/-Infinity and NaN.
or to fail silently
In this case, the possible scenarios when trying to execute the line:
singularMatrix.invert().add(otherMatrix)
would be:
singularMatrix.invert() would return BAD_MATRIX, and BAD_MATRIX.add() would do nothing, "stopping the computation" (jQuery-style)
singularMatrix.invert() would fail but return itself unchanged and .add() would work
singularMatrix.invert() would fill the matrix with +/-Infinity and the computation would continue
I would personally prefer one of the latter options, but I'm totally open to arguments and alternatives (that's why I'm asking here on SO).
I don't know if the "best way" for this sort of thing has been invented yet.
Whatever you do, don't fail silently. There's no point in continuing the computation if the result is going to be wrong, and you don't want to show an incorrect result to the user and claim that it's correct. Nothing good can come of that, especially in a reusable library where you don't necessarily know what the caller will do with the result.
Throw an exception or return a special value that the caller can check for, such as undefined.
NaN and status codes (option 3)
The IEEE 754 standard was created to resolve a lot of these issues in a completely consistent way. For example, 1/0 evaluates to +Inf and 0/0 evaluates to NaN, and these special values propagate through later arithmetic. The standard is baked into processors themselves. It is neither a thrown exception (which would make some simple code very complex) nor a silent failure. You can trace the NaNs all the way back to where they first appeared, giving you the debug information you need to fix the bug.
As far as large routines like matrix inversion go, numerical libraries generally follow the Unix convention of returning a status code. In JavaScript you can do this by returning an object with a status property.
Taking your example:
singularMatrix.invert().add(otherMatrix)
If invert were to return a matrix object full of NaNs with the status property 'invalid matrix', then add can be called and return another matrix full of NaNs.
This permits you to call invert and check later whether the result was valid; with exceptions you have to handle the failure immediately, and when you want to defer the decision until later, you end up recreating the same set of status properties yourself.
There is still useful information in a matrix that is partially or entirely filled with NaNs: the shape information can be used to create a new matrix to replace a bad initial vector, and known-good values can still be used in calculations.
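To make the idea concrete, here is a sketch of a 2x2 matrix type in Python (the question is about JS, and every name here is invented): invert never throws; on a singular matrix it returns a NaN-filled result carrying a status, and add simply lets the NaNs propagate.

import math

class Mat2:
    def __init__(self, a, b, c, d, status="ok"):
        self.a, self.b, self.c, self.d = a, b, c, d
        self.status = status  # "ok" or an error string

    def invert(self):
        det = self.a * self.d - self.b * self.c
        if det == 0.0:
            nan = float("nan")
            return Mat2(nan, nan, nan, nan, status="singular matrix")
        return Mat2(self.d / det, -self.b / det, -self.c / det, self.a / det)

    def add(self, other):
        # NaNs propagate through IEEE 754 arithmetic, so no check is needed.
        return Mat2(self.a + other.a, self.b + other.b,
                    self.c + other.c, self.d + other.d,
                    status=self.status if self.status != "ok" else other.status)

singular = Mat2(1.0, 2.0, 2.0, 4.0)  # det == 0
result = singular.invert().add(Mat2(1.0, 0.0, 0.0, 1.0))
print(result.status)         # "singular matrix" -- check validity when convenient
print(math.isnan(result.a))  # True -- the failure is traceable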
TLDR: Do the NaN thing, and recreate it in matrices.

fault tolerance in MPICH/OpenMPI

I have two questions-
Q1. Is there a more efficient way to handle an error situation in MPI, other than checkpoint/rollback? I see that if a node "dies", the program halts abruptly. Is there any way to go ahead with the execution after a node dies (no issue if it comes at the cost of accuracy)?
Q2. I read in http://stackoverflow.com/questions/144309/what-is-the-best-mpi-implementation that OpenMPI has better fault tolerance, and that MPICH-2 has recently come up with similar features. Does anybody know what those features are and how to use them? Is it a "mode"? Can they help in the situation stated in Q1?
Kindly reply. Thank you.
All MPI implementations have had the ability to continue after an error for a while now. The default is to die - that is, the default error handler is MPI_ERRORS_ARE_FATAL - but that can be changed (e.g., see the discussion here). The standard doesn't currently go much beyond that, though; that is, it's hard to recover and continue after such an error. If your program is sufficiently simple - some sort of master-worker setup - it may be possible to continue this way.
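For illustration, here is what switching the error handler looks like with mpi4py in Python; the calls map one-to-one onto the C API (MPI_Comm_set_errhandler with MPI_ERRORS_RETURN):

from mpi4py import MPI

comm = MPI.COMM_WORLD
# Replace the default MPI_ERRORS_ARE_FATAL handler so that failed calls
# raise an exception (in C: return an error code) instead of aborting.
comm.Set_errhandler(MPI.ERRORS_RETURN)

try:
    if comm.size >= 2:
        if comm.rank == 0:
            comm.send(b"payload", dest=1, tag=0)
        elif comm.rank == 1:
            data = comm.recv(source=0, tag=0)
except MPI.Exception as err:
    # In a simple master-worker setup you might mark the peer dead and go on.
    print("rank %d: MPI error: %s" % (comm.rank, err))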
The MPI forum is currently working on what will become MPI-3, and error handling and fault tolerance will be an important component of the new standard (there's a working group dedicated to the topic). Until that work is complete, however, the only way to get stronger fault tolerance out of MPI is to use earlier, nonstandard extensions. FT-MPI was a project that developed a very robust MPI, but unfortunately it's based on MPI 1.2, a very early version of the standard. The claim here is that they're now working with OpenMPI, but I don't know what's become of that. There's MPICH-V, based on MPI-2, but that's more checkpoint-restart based than what I think you're looking for.
Updated to add: The fault tolerance didn't make it into MPI-3, but the working group continues its work and the expectation is that something will result from that before too long.

to throw, to return or to errno?

I am creating a system. What I want to know is: if a msg is unsupported, what should it do? Should I throw, saying the msg is unsupported? Should I return 0 or -1? Or should I set an errno (base->errno_)? For some messages I wouldn't care if there was an error (such as setBorderColour); for others I would (addText, or perhaps save if I create a save cmd).
I want to know the best method for 1) coding quickly, 2) debugging, 3) extending and maintenance. I may rank debugging third; it's hard to debug at the moment, but that's because there is a lot of missing code which I didn't fill in, and actual bugs aren't hard to correct. What's the best way to let the user know there is an error?
The system works something like this, but not exactly the same. This is C-style, and my code has a bunch of inline functions such as settext(const char *text) that wrap msg(this, esettext, text):
Base base2, base;
base = get_root();
base2 = msg(base, create, BASE_TYPE);
msg(base2, setText, "my text");
const char *p = (const char *)msg(base2, getText);
Generally, if it's C++, prefer exceptions unless performance is critical or you may be running in an environment (e.g., an embedded platform) that does not support exceptions. Exceptions are by far the best choice for debugging because they are very noticeable when they occur, even if they are ignored. Further, exceptions are self-documenting: they have their own type name and usually a contained message that explains the error. Return codes and errno require separate code definitions and some out-of-band way of communicating what the codes mean in any given context (e.g., man pages, comments).
For coding quickly, return codes are probably easier since they don't involve potentially defining your own exception types, and often the error checking code is not as verbose as with exceptions. But of course the big risk is that it is much easier to silently ignore error return codes, leading to problems that may not be noticed until well after they occur, making debugging and maintenance a nightmare.
Try to avoid ever using errno, since it's very error-prone itself. It's a global, so you never know who is resetting it, and it is most definitely not thread-safe.
Edit: I just realized you meant an errno member variable and not the C-style errno. That's better in that it's not global, but you still need additional constructs to make it thread safe (if your app is multi-threaded), and it retains all the problems of a return code.
Returning an error code requires discipline because the error code must be explicitly checked and then passed up. We wrote a large C-based system that used this approach, and it took a while to remove all the "lost" errors. We eventually developed some techniques to catch this problem, such as storing the error code in a thread-global location and checking at the top level that the returned error code matched the saved error code (sketched below).
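A sketch of that technique, in Python for brevity (the original system was C, and all names here are invented): every error is recorded in a thread-local slot, and the top level verifies that the returned code matches the saved one, so a dropped error is detected.

import threading

_last_error = threading.local()

def record_error(code):
    _last_error.code = code  # remember what we returned
    return code

def leaf_operation():
    return record_error(-1)  # fails and reports an error code

def careless_middle_layer():
    leaf_operation()  # BUG: drops the returned error code
    return 0

def top_level():
    rc = careless_middle_layer()
    saved = getattr(_last_error, "code", 0)
    if rc == 0 and saved != 0:
        # An error was recorded below, but the return chain lost it.
        raise RuntimeError("lost error code %d" % saved)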
Exception handling is easier to code quickly because if you're writing code and you're not sure how to handle the error, you can just let it propagate upwards (assuming you're not using Java, where you have to deal with checked exceptions). It's better for debugging because you can get a stack trace of where the exception occurred (and because you can build in top-level exception handlers to catch problems that should have been caught elsewhere). It's better for maintenance because, if you've done things right, you'll be notified of problems faster.
However, there are some design issues with exception handling, and if you get it wrong, you'll be worse off. In short: if you're writing code and you don't know how to handle an exception, let it propagate up. Too many coders trap errors just to convert the exception and then rethrow it, resulting in spaghetti exception code that sometimes loses information about the original cause of the problem. This assumes that you have exception handlers at the top level (the entry points), as sketched below.
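A minimal sketch of that top-level (entry-point) handler pattern, in Python for brevity: lower layers never catch-convert-rethrow; the entry point logs one full stack trace.

import logging
import traceback

def do_work():
    raise ValueError("demonstration failure")  # any layer below may raise

def main():
    do_work()  # no try/except here: just let it propagate

if __name__ == "__main__":
    try:
        main()
    except Exception:
        # One top-level handler: full context is preserved in the traceback.
        logging.error("unhandled error:\n%s", traceback.format_exc())
        raise SystemExit(1)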
Personally when it comes to output graphics, I feel a silent fail is fine. It just makes your picture wrong.
Graphical errors are super easy to spot anyways.
Personally, I would add an errno to your Base struct if it is pure 'C'. If this is C++ I'd throw an exception.
It depends on how 'fatal' these errors are. Does the user really need to see the error, or is it for other developers edification?
For maintainability you need to clearly document the errors that can occur and include clear examples of error handling.
