z3: How to identify conflicts in an unsat model?

In an interactive scenario, user actions cause constraints to be created. These constraints are then evaluated using the Microsoft Z3 solver. When the constraints are satisfiable, I can extract the resolved values using (get-model) and all is well.
When user actions result in an over-constrained model (i.e. conflicting constraints), is there a way to identify which of the input asserts actually cause the conflict that leads to the unsat result?
I'd like to use this information to provide UI to users that guides them to choose among conflicting demands they have made in their model.

You're looking for unsatisfiable cores, which Z3 supports; see smtc_core for an example.
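As a minimal sketch with the Python bindings (z3py; the constraint names below are made up for illustration): tag each user-generated assertion with assert_and_track, and on unsat, unsat_core() returns the labels of a conflicting subset, which you can map back to the user actions that created them.

    from z3 import Solver, Int, unsat

    x = Int('x')
    s = Solver()
    s.set(unsat_core=True)  # enable unsat-core extraction

    # Track each user-generated constraint under a label so the
    # core can name it. The labels are hypothetical.
    s.assert_and_track(x > 10, 'wants_big')
    s.assert_and_track(x < 5, 'wants_small')
    s.assert_and_track(x >= 0, 'nonneg')

    if s.check() == unsat:
        # A (not necessarily minimal) conflicting subset of labels,
        # e.g. [wants_big, wants_small] -- feed these back to the UI.
        print(s.unsat_core())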

Related

Can I update the spacy's Entity Linking knowledge base after training?

Let's suppose I have successfully trained an Entity Linking model, and it is working just fine. But eventually I'm going to update some aliases of the knowledge base. Just some aliases, not the descriptions, and no new entities.
I know that spaCy has a method to do so, which is: kb.add_alias(alias="Emerson", entities=qids, probabilities=probs). But what if I have to do that after the training process? Should I re-run everything, or will updating the KB do?
Best thing is to try it and see.
If you're just adding new aliases, it really depends on how much they overlap with existing aliases. If there's no overlap it won't make any difference, but if there is overlap, that could have resulted in different evaluations during training, which could modify the model. Whether those differences are significant or not is hard to say.
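As a rough sketch of what "try it and see" could look like (spaCy v3-style API; the pipeline path and QID are hypothetical, and the entity must already exist in the KB):

    import spacy

    # Load the already-trained pipeline and grab its knowledge base.
    nlp = spacy.load("my_trained_el_pipeline")
    entity_linker = nlp.get_pipe("entity_linker")
    kb = entity_linker.kb

    # Add the new alias directly -- no retraining step here.
    kb.add_alias(alias="Emerson", entities=["Q312545"], probabilities=[1.0])

    # Persist the modified KB and pipeline.
    kb.to_disk("kb_updated")
    nlp.to_disk("my_trained_el_pipeline_updated")

Then evaluate the updated pipeline on your dev set and compare against the original to see whether the overlap with existing aliases actually changes anything.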

Avoiding singularity in analysis - does OpenMDAO automatically enable 'fully-simultaneous' solution?

Turbulent boundary layer calculations break down at the point of flow separation when solved with a prescribed boundary layer edge velocity, ue, in what is called the direct method.
This can be alleviated by solving the system in a fully-simultaneous or quasi-simultaneous manner. Details about both methods are available here (https://www.rug.nl/research/portal/files/14407586/root.pdf), pages 38 onwards. Essentially, the fully-simultaneous method combines the inviscid and viscous equations into a single large system of equations, and solves them with Newton iteration.
I have currently implemented an inviscid panel solver entirely in ExplicitComponents. I intend to implement the boundary layer solver also entirely with ExplicitComponents. I am unsure whether coupling these two groups would then result in an execution procedure like the direct method, or whether it would work like the fully-simultaneous method. I note that in the OpenMDAO paper, it is stated that the components are solved "as a single nonlinear system of equations", and that the reformulation from explicit components to the implicit system is handled automatically by OpenMDAO.
Does this mean that if I couple my two analyses (again, consisting purely of ExplicitComponents) and set the group to solve with the Newton solver, I'll get a fully-simultaneous solution 'for free'? This seems too good to be true, as ultimately the component that integrates the boundary layer equations will have to take some prescribed ue as an input, and then will run into the singularity in the execution of its compute() method.
If doing the above would instead make it execute like the direct method and lead to the singularity, (briefly) what changes would I need to make to avoid it? Would it require defining the boundary layer components implicitly?
Despite seeming too good to be true, you can in fact change the structure of your system by changing out the top-level solver.
If you used a NonlinearBlockGS solver at the top, it would solve in the weak form. If you used a NewtonSolver at the top, it would solve as one large monolithic system. This property does indeed derive from the unique structure of how OpenMDAO stores things.
There are some caveats. I would guess that your panel code is implemented as a set of intermediate calculations broken up across several components. If that's the case, then the NewtonSolver will be treating each intermediate variable as if it were its own state variable. In other words, you would have more than just delta and u_e as states, but all the intermediate calculations too.
This might be somewhat unstable (though it might work just fine, so try it!). You might need a hybrid between the weak and strong forms, which can be achieved via the solve_subsystems option on the NewtonSolver. This approach is called the Hierarchical Newton Method in section 5.1.2 of the OpenMDAO paper. It will do a sub-iteration of NLBGS for every top-level Newton iteration. This acts as a form of nonlinear preconditioner which can help stabilize the strong form. You can limit how many sub-iterations are done, and in your case you may want to use just 2 or 3 because of the risk of singularity.
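A minimal sketch of that solver wiring (the two components below are made-up stand-ins for the inviscid and viscous analyses; only the coupling and solver setup are the point):

    import openmdao.api as om

    # Two ExplicitComponents whose inputs/outputs form a cycle (ue <-> delta),
    # standing in for the inviscid and viscous analyses.
    class Inviscid(om.ExplicitComponent):
        def setup(self):
            self.add_input('delta', val=1.0)
            self.add_output('ue', val=1.0)
            self.declare_partials('ue', 'delta')

        def compute(self, inputs, outputs):
            outputs['ue'] = 2.0 - 0.5 * inputs['delta']

        def compute_partials(self, inputs, partials):
            partials['ue', 'delta'] = -0.5

    class Viscous(om.ExplicitComponent):
        def setup(self):
            self.add_input('ue', val=1.0)
            self.add_output('delta', val=1.0)
            self.declare_partials('delta', 'ue')

        def compute(self, inputs, outputs):
            outputs['delta'] = 0.2 * inputs['ue'] ** 2

        def compute_partials(self, inputs, partials):
            partials['delta', 'ue'] = 0.4 * inputs['ue']

    p = om.Problem()
    cycle = p.model.add_subsystem('cycle', om.Group())
    cycle.add_subsystem('inviscid', Inviscid())
    cycle.add_subsystem('viscous', Viscous())
    cycle.connect('inviscid.ue', 'viscous.ue')
    cycle.connect('viscous.delta', 'inviscid.delta')

    # Newton at the top treats the coupled cycle as one monolithic system
    # (the strong form); solve_subsystems=True adds the hierarchical
    # sub-iterations, limited here because of the singularity risk.
    cycle.nonlinear_solver = om.NewtonSolver(solve_subsystems=True)
    cycle.nonlinear_solver.options['max_sub_solves'] = 2
    cycle.linear_solver = om.DirectSolver()

    p.setup()
    p.run_model()

Swapping the NewtonSolver for om.NonlinearBlockGS() is the only change needed to get the weak (direct-method-like) execution instead.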

How to check for missing partials

I implemented a system that is composed of a few groups and multiple components. It is relatively intricate, and some component inputs/outputs have partials declared on them while others do not.
Gradient-based optimizers seem to get stuck at the initial values and never go past iteration 0 (not stuck at a local optimum). I have encountered this error before when I was missing declare_partials calls for some variables. Is there a way to automatically check which component input/output is missing partials, similar to checking for missing connections in the N^2 diagram?
There are two tools that you need to use to check for bad derivatives. The first is check_partials. That will go component by component and use either finite difference or complex step to verify the partial derivatives for every component (regardless of whether or not you declared them in the setup of that component). That will catch the problem if you are missing any partials, because the FD/CS check will see them as non-zero and will show you that there is an error.
check_partials should be your first stop, always. If you can, use complex step to verify your derivatives; that way you know they are totally accurate. Also, check_partials will do the check around whatever point is currently initialized, so sometimes you might have a degenerate case (e.g. some input is 0) where your check passes but your derivatives are still wrong. For example, if your component represented y=2*x and you forgot to define derivatives, but you ran check_partials at x=0, then the check would pass. But if you ran it at x=1, then the check would show an error.
If all of your partial derivatives are correct but you're still getting bad results, then you can try check_totals. Depending on the structure of your model, and if you have any coupling in it (i.e. you need to use some kind of nonlinear solver), it's possible that you don't have a correctly configured linear solver to compute total derivatives. In a lot of cases, if you have coupling you can just put a DirectSolver at the same level as the nonlinear solver you put in the model.
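For example, a small sketch of that workflow (the Doubler component is made up, with a deliberately wrong partial so the check has something to flag):

    import openmdao.api as om

    class Doubler(om.ExplicitComponent):
        def setup(self):
            self.add_input('x', val=1.0)   # check at x=1, not the degenerate x=0
            self.add_output('y', val=0.0)
            self.declare_partials('y', 'x')

        def compute(self, inputs, outputs):
            outputs['y'] = 2.0 * inputs['x']

        def compute_partials(self, inputs, partials):
            partials['y', 'x'] = 1.0  # wrong on purpose; should be 2.0

    p = om.Problem()
    p.model.add_subsystem('doubler', Doubler())
    p.setup(force_alloc_complex=True)  # needed for method='cs'
    p.run_model()

    # Complex-step verification of every partial, declared or not.
    p.check_partials(method='cs', compact_print=True)

    # If all partials pass but results are still bad, check totals too:
    # p.check_totals(of=['doubler.y'], wrt=['doubler.x'])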

Can I choose clustering methods base on internal validation result?

From what I understand, it is usually difficult to select the best possible clustering method for your data a priori, and we can use cluster validity to compare the results of different clustering algorithms and choose the one with the best validation scores.
I use an internal validation function from R stats package on my clustering result (for clustering methods I used R igraph fast.greedy and walk.trap).
The outcome is a list of many validation scores.
In the list, the fast.greedy method has better scores than walk.trap in almost every validation, except for entropy, where walk.trap has a better score.
Can I use this validation result list as one of my reasons to explain to others why I chose the fast.greedy method rather than the walk.trap method?
Also, is there any way to validate a disconnected graph?
Short answer: NO!
You can't use an internal index to justify the choice of one algorithm over another. Why?
Because evaluation indexes were designed to evaluate clustering results, i.e., partitions and hierarchies. You can only use them to assess the quality of a clustering and therefore justify its choice over the other options. But again, you can't use them to justify choosing a particular algorithm to apply to a different dataset based on a single previous experiment.
For this task, several benchmarks are needed to determine which algorithms are generally better and should be tried first. Here is a paper about it: Community detection algorithms: a comparative analysis.
Edit: What I am saying is, your validation indexes may show that fast.greedy's solution is better than walk.trap's. However, they do not explain why you chose these algorithms instead of any others. Only your data, your assumptions, and your constraints can do that.
Also, is there any way to validate a disconnected graph?
Theoretically, any evaluation index can do this. Technically, some implementations don't handle disconnected components.
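For what it's worth, a rough Python mirror of that workflow (python-igraph rather than the R packages you used; the Zachary karate-club graph stands in for your data), which makes the point concrete: an internal index scores the partitions, not the algorithms in general.

    import igraph as ig

    g = ig.Graph.Famous("Zachary")  # placeholder for your graph

    # Run both algorithms and cut the dendrograms into flat clusterings.
    fg = g.community_fastgreedy().as_clustering()
    wt = g.community_walktrap().as_clustering()

    # Modularity rates each *partition* of this particular graph.
    print("fast.greedy modularity:", fg.modularity)
    print("walk.trap   modularity:", wt.modularity)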

OCL : express the unique constraint in a more optimal way

I am learning OCL (using "USE"), and I have a question about the isUnique() constraint. Here's the following example:
We want to establish the uniqueness constraint on customer numbers across the whole class, as follows:
context Client
inv NoClientUnique : Client.allInstances -> isUnique(noClient)
but this expression is far from optimal, because the constraint may be evaluated repeatedly. Can anyone explain when this is the case and why, and give another way to express the uniqueness constraint on Client.noClient that is more optimal? I'll appreciate any help.
OCL is a declarative language. Therefore you express what you want to happen, not how to do it. It doesn't make sense to discuss how optimal an OCL expression is when optimal refers to execution time. The translation engine should be able to translate this declarative expression into the most efficient imperative traversal of the object graph in order to verify it.
Today, you can avoid the inefficiency by placing the constraint in a neutral scoping class such as perhaps the ClientManager that just needs to assert once that all-my-clients are unique.
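A minimal sketch of that idea, assuming a hypothetical ClientManager class with a clients association end that owns all the Client instances:

context ClientManager
inv NoClientUnique : self.clients->isUnique(noClient)

This way the invariant is asserted once per ClientManager rather than once per Client instance.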
Realistically there should always be such a scoping class, since if you take your subsystem and instantiate it multiple times, the constraint that all Clients are unique is a fallacy. There is no reason why two different companies shouldn't have the same client if uniqueness is defined by a set of attributes rather than object identity.
(It has been suggested that it should be possible to have Package constraints so that in this case you could have used the Package as the neutral location.)
