Difficulty satisfying hard and medium constraints simultaneously with OptaPlanner

I've implemented a sensor scheduling problem using OptaPlanner 6.2 that has 1 hard constraint, 1 medium constraint, and 1 soft constraint. The trouble I'm having is that the solver satisfies some of the hard constraints after 30 seconds or so, and then makes very little progress on the remaining ones even with additional minutes before termination. I don't think the problem is over-constrained; I also don't know how to help the local search process significantly increase the score.
My problem is a scheduling one, where I precalculate all possible times (intervals) that a sensor can observe objects prior to solving. I've modeled the problem as follows:
Hard constraint - no intervals can overlap
when
$s1: A( interval!=null,$id: id, $doy : interval.doy, $interval: interval, $sensor: interval.getSensor())
exists A( id > $id, interval!=null, $interval2: interval, $interval2.getSensor() == $sensor, $interval2.getDoy() == $doy, $interval.getStartSlot() <= $interval2.getEndSlot(), $interval.getEndSlot() >= $interval2.getStartSlot() )
then
scoreHolder.addHardConstraintMatch(kcontext,-10000);
Medium constraint - every assignment should have an Interval
when
A(interval==null)
then
scoreHolder.addMediumConstraintMatch(kcontext,-100);
Soft constraint - maximize a property/value in the Interval class
when
$s1: A( interval!=null)
then
scoreHolder.addSoftConstraintMatch(kcontext,-1 * $s1.getInterval().getSomeProperty());
A: the planning entity class; each instance is an assignment for a particular object (i.e. it has a member objectid that corresponds to one in the Interval class)
Interval: the planning variable class; all possible intervals (start time, stop time) for the sensors and objects
In A, I restrict the value range to just those Interval instances for the object associated with that assignment. For my test case, there are 40000 or so Intervals, but only dozens for each object. There are about 1100 instances of A (so dozens of possible Intervals for each one).
@PlanningVariable(valueRangeProviderRefs = {"intervalRange"}, strengthComparatorClass = IntervalStrengthComparator.class, nullable = true)
public Interval getInterval() {
return interval;
}
@ValueRangeProvider(id = "intervalRange")
public List<Interval> getPossibleIntervalList() {
return task.getAvailableIntervals();
}
In my solution class:
//have tried commenting this out since the overall interval list does not apply to all A
@ValueRangeProvider(id = "intervalRangeAll")
public List<Interval> getIntervalList() {
return intervals;
}
@PlanningEntityCollectionProperty
public List<A> getAList() {
return AList;
}
I've reviewed the documentation and tried a lot of things. My problem is somewhat of a cross between the nurse rostering and course scheduling examples, which I have looked at. I am using the FIRST_FIT_DECREASING construction heuristic.
What I have tried:
Turning on and off nullable in the planning variable annotation for A.getInterval()
Late Acceptance, Tabu Search, and both together.
Benchmarking. I wasn't seeing any problems there.
Adding an IntervalChangeFactory as a moveListFactory, and restricting the custom ChangeMove based on whether the interval can be accepted or not (i.e. enforcing, or not, the hard constraint in IntervalChangeMove.isDoable).
Here is an example run, where most of the hard constraints are not satisfied, but the medium ones are:
[main] INFO org.optaplanner.core.impl.solver.DefaultSolver - Solving started: time spent (202), best score (0hard/-112500medium/0soft), environment mode (REPRODUCIBLE), random (WELL44497B with seed 987654321).
[main] INFO org.optaplanner.core.impl.constructionheuristic.DefaultConstructionHeuristicPhase - Construction Heuristic phase (0) ended: step total (1125), time spent (2296), best score (-9100000hard/0medium/-72608soft).
[main] INFO org.optaplanner.core.impl.localsearch.DefaultLocalSearchPhase - Local Search phase (1) ended: step total (92507), time spent (30000), best score (-8850000hard/0medium/-74721soft).
[main] INFO org.optaplanner.core.impl.solver.DefaultSolver - Solving ended: time spent (30000), best score (-8850000hard/0medium/-74721soft), average calculate count per second (5643), environment mode (REPRODUCIBLE).
So what I don't understand is why the hard constraints can't be dealt with by the search process. Also, my calculate count per second has dropped below 10000 due to all the tinkering I've done.

If it's not due to the score trap (see the docs; this is the first thing to rule out), it's probably because the solver gets stuck in a local optimum and there are no moves that go from one feasible solution to another feasible solution except those that don't really change much. There are several ways to deal with that:
Add coarse-grained moves (but still leave the fine-grained moves such as ChangeMove in!). You can add generic coarse-grained moves (such as the pillar moves) or add a custom Move (a rough sketch of such a custom move follows this list). Don't start making smarter selectors that try to only select feasible moves; that's a bad idea, as it will kill your average calculate count (ACC) or limit your search space. Just mix in coarse-grained moves (= diversification) to complement the fine-grained moves (= intensification).
A better domain model might help too. The Project Job Scheduling and Cheap Time scheduling examples have a domain model which naturally leads to a smaller search space while still allowing all feasible solutions.
Increase the Tabu list size or the Late Acceptance size, or include the hard score level in the SA starting temperature. But I presume you've tried that with the benchmarker.
Turn on TRACE logging to see OptaPlanner's decision making. Only look at the part after it reaches the local optimum.
In the future, I'll also add Ruin&Recreate moves, which will be far less code than custom moves or a better domain model (but they will be less efficient).
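For illustration, here is a rough sketch (not code from the question or the docs) of what such a coarse-grained custom move could look like for this model. A, Interval, the "interval" variable name and getInterval() come from the question; setInterval(), the class name and the grouping idea are assumptions. The Move method names follow the OptaPlanner 6.x custom-move documentation, so verify them against your exact version (if your version's Move interface declares additional methods, extend AbstractMove instead):

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

import org.optaplanner.core.impl.heuristic.move.Move;
import org.optaplanner.core.impl.score.director.ScoreDirector;

// Coarse-grained move: reassigns a whole group of A instances in one step,
// e.g. all assignments competing for the same sensor and day.
public class GroupReassignMove implements Move {

    private final List<A> assignments;          // entities changed together
    private final List<Interval> newIntervals;  // same size; a null entry unassigns

    public GroupReassignMove(List<A> assignments, List<Interval> newIntervals) {
        this.assignments = assignments;
        this.newIntervals = newIntervals;
    }

    public boolean isMoveDoable(ScoreDirector scoreDirector) {
        // Doable as soon as it changes at least one assignment. Do NOT check the
        // hard constraint here - that would shrink the search space.
        for (int i = 0; i < assignments.size(); i++) {
            Interval current = assignments.get(i).getInterval();
            Interval next = newIntervals.get(i);
            if (current == null ? next != null : !current.equals(next)) {
                return true;
            }
        }
        return false;
    }

    public Move createUndoMove(ScoreDirector scoreDirector) {
        List<Interval> oldIntervals = new ArrayList<Interval>(assignments.size());
        for (A assignment : assignments) {
            oldIntervals.add(assignment.getInterval());
        }
        return new GroupReassignMove(assignments, oldIntervals);
    }

    public void doMove(ScoreDirector scoreDirector) {
        for (int i = 0; i < assignments.size(); i++) {
            A assignment = assignments.get(i);
            scoreDirector.beforeVariableChanged(assignment, "interval");
            assignment.setInterval(newIntervals.get(i));
            scoreDirector.afterVariableChanged(assignment, "interval");
        }
    }

    public Collection<? extends Object> getPlanningEntities() {
        return assignments;
    }

    public Collection<? extends Object> getPlanningValues() {
        return newIntervals;
    }
}

A MoveListFactory (like the IntervalChangeFactory already mentioned in the question) can generate these and be mixed into a unionMoveSelector next to the existing fine-grained selectors.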

Related

How do I write constraints that are only active during the Construction Heuristics phase?

I would like to have some constraints that are only active during the Construction Heuristics phase, so I wrote this:
fun aConstraintOnlyActiveInCHPhase(constraintFactory: ConstraintFactory): Constraint {
    return constraintFactory.from(MyPlanningEntity::class.java)
        .ifExists(MyPlanningEntity::class.java,
            Joiners.filtering({ entity1, entity2 -> entity2.myplanningvariable == null })
        )
        ...
        ...
        .penalize("aConstraintOnlyActiveInCHPhase", HardSoftScore.ONE_HARD)
}
However, this works for all but the last planning entity: when the last planning entity is being initialized, there is no other uninitialized planning entity, so this constraint will not be active.
How do I write constraints that are active for all planning entities during the Construction Heuristics phase?
Furthermore, how do I write constraints that are active in different phases during solving?
The short answer is that you do not. The score represents the measure of quality of your solution, as it pertains to a particular problem. The problem you are solving is the same in every solver phase.
If you change your constraints, you are changing the optimization problem, and therefore might as well run a new solver with a new configuration. Whatever solution you got until that point may as well be thrown out of the window, because it was optimized for different criteria which are no longer valid.
That said, the constraint above will do what you want if you start using forEachIncludingNullVars(...) instead. This will include uninitialized entities, helping you avoid the ifExists(...) hack.
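For reference, a minimal sketch of that rewrite, shown here in Java (MyPlanningEntity and the variable name are the placeholders from the question; the getter name getMyplanningvariable() is assumed):

import org.optaplanner.core.api.score.buildin.hardsoft.HardSoftScore;
import org.optaplanner.core.api.score.stream.Constraint;
import org.optaplanner.core.api.score.stream.ConstraintFactory;
import org.optaplanner.core.api.score.stream.ConstraintProvider;

public class MyConstraintProvider implements ConstraintProvider {

    @Override
    public Constraint[] defineConstraints(ConstraintFactory constraintFactory) {
        return new Constraint[] {
                penalizeUnassignedEntities(constraintFactory)
        };
    }

    Constraint penalizeUnassignedEntities(ConstraintFactory constraintFactory) {
        // forEachIncludingNullVars also matches entities whose planning variables
        // are still null, so every uninitialized entity is penalized during the
        // Construction Heuristic phase, including the last one being placed.
        return constraintFactory.forEachIncludingNullVars(MyPlanningEntity.class)
                .filter(entity -> entity.getMyplanningvariable() == null)
                .penalize("penalizeUnassignedEntities", HardSoftScore.ONE_HARD);
    }
}

Note that this is still a permanent constraint, not one scoped to a phase; it simply stops being violated once every entity is assigned.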

Difference between shuffle() and rebalance() in Apache Flink

I am working on my bachelor's final project, which is a comparison between Apache Spark Streaming and Apache Flink (streaming only), and I have just arrived at "Physical partitioning" in Flink's documentation. The problem is that the documentation doesn't explain well how these two transformations work. Directly from the documentation:
shuffle(): Partitions elements randomly according to a uniform distribution.
rebalance(): Partitions elements round-robin, creating equal load per partition. Useful for performance optimisation in the presence of data skew.
Source: https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/datastream_api.html#physical-partitioning
Both are done automatically, so what I understand is that they both redistribute the data equally (shuffle() > uniform distribution, rebalance() > round-robin) and randomly. I then deduce that rebalance() distributes the data in a better way ("equal load per partition"), so the tasks have to process the same amount of data, whereas shuffle() may create bigger and smaller partitions. In which cases would you prefer to use shuffle() over rebalance()?
The only thing that comes to my mind is that rebalance() probably requires some processing time, so in some cases it might spend more time doing the rebalancing than it saves in the subsequent transformations.
I have been looking into this and nobody has discussed it, except in a Flink mailing list thread, but there they don't explain how shuffle() works.
Thanks to Sneftel, who helped me improve my question by asking things that made me rethink what I wanted to ask, and to Till, who answered my question quite well. :D
As the documentation states, shuffle will randomly distribute the data whereas rebalance will distribute the data in a round-robin fashion. The latter is more efficient since you don't have to compute a random number. Moreover, depending on the randomness, you might end up with a somewhat non-uniform distribution.
On the other hand, rebalance will always start sending the first element to the first channel. Thus, if you have only a few elements (fewer elements than subtasks), then only some of the subtasks will receive elements, because you always start by sending the first element to the first subtask. In the streaming case this should eventually not matter because you usually have an unbounded input stream.
The actual reason why both methods exist is historical. shuffle was introduced first. In order to make the batch and streaming APIs more similar, rebalance was then introduced.
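For completeness, here is a minimal usage sketch (the socket source and the pipeline are made up for illustration; both calls only change how records are routed to the downstream operator's parallel subtasks):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class PartitioningExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> lines = env.socketTextStream("localhost", 9999);

        // shuffle(): each record goes to a uniformly random downstream channel.
        DataStream<String> shuffled = lines.shuffle();
        // rebalance(): channels are cycled in order, starting from the first one.
        DataStream<String> rebalanced = lines.rebalance();

        shuffled.print();
        rebalanced.print();
        env.execute("shuffle vs rebalance");
    }
}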
This statement by Flink is misleading:
Useful for performance optimisation in the presence of data skew.
Since it's used to describe rebalance but not shuffle, it suggests it's the distinguishing factor. My understanding of it was that if some items are slow to process and some are fast, the partitioner would use the next free channel to send the item to. But that is not the case; compare the code for rebalance and shuffle. rebalance just advances to the next channel regardless of how busy it is.
// rebalance
nextChannelToSendTo = (nextChannelToSendTo + 1) % numberOfChannels;
// shuffle
nextChannelToSendTo = random.nextInt(numberOfChannels);
The statement can also be understood differently: the "load" doesn't mean actual processing time, just the number of items. If your original partitioning has skew (a vastly different number of items per partition), the operation will assign items to partitions uniformly. However, in this case it applies to both operations.
My conclusion: shuffle and rebalance do the same thing, but rebalance does it slightly more efficiently. The difference is so small that it's unlikely you'll notice it; java.util.Random can generate 70m random numbers in a single thread on my machine.

Modeling an HTTP transition system in Alloy

I want to model an HTTP interaction, i.e. a sequence of HTTPRequest/HTTPResponse, and I am trying to model this as a transition system.
I defined an ordering on a class State by using:
open util/ordering[State]
where a State is simply a set of Messages:
sig State {
msgSet: set Message
}
Each pair of (HTTPRequest->HTTPResponse) and (HTTPResponse->HTTPRequest) is represented as a rule in my transition system.
The rules are expressed in Alloy as predicates that let one move from one state to another.
E.g., this is a rule generating an HTTPResponse after a particular HTTPRequest is received:
pred rsp1 [s, s': State] {
    one msg: Request, msg': Response | (
        // Preconditions (previous Request)
        msg.method = get &&
        msg.address.url = sample_com &&
        // Postconditions (next Response)
        msg'.status = OK_200 &&
        // previous Request has to be in previous state
        msg in s.msgSet &&
        // Response generated is added to next state
        s'.msgSet = s.msgSet + msg'
    )
}
Unfortunately, the model created seems to be too complex: we have a dozen rules (more complex than the one above, but following the same pattern) and the execution is very slow.
EDIT: In particular, the CNF generation is extremely slow, while the solving takes a reasonable amount of time.
Do you have any suggestion on how to model a similar transition system?
Thank you very much!
This is a model with an impressive level of detail; thank you for sharing it!
None of the various forms of honestAction by itself takes more than two or three minutes to find an instance (or in some cases to fail to find any instance), except for rsp8, which takes quite a while by itself (it ran for fifteen minutes or so before I stopped it).
So the long CNF preparation times you are observing are apparently caused by either (a) predicate rsp8 alone, (b) the size of the disjunction in the honestAction predicate, or (c) both.
I suspect but have not proved that the time issue is caused by combinatorial explosion in the number of individuals required to populate a model and the number of constraints in the model.
My first instinct (it's not more than that) would be to cut back on the level of detail in the model, in particular the large number of singleton signatures which instantiate your abstract signatures. These seem (I could be wrong) to be present either for bookkeeping purposes (so you can identify which rule licenses the transition from one state to another), or because the modeler doesn't trust Alloy to generate concrete instances of signatures like UserName, Password, Code, etc.
As the model now is, it looks as if you're doing a lot of work to define all the individuals involved in a particular example, instead of defining constraints and letting Alloy do the work of finding examples. (Using Alloy to check the properties a particular concrete example can be useful, but there are other ways to do that.)
Since so many of the concrete signatures in the model are constrained to singleton cardinality, I don't actually know that defining them makes the task of finding models more complex; for all I know, it makes it simpler. But my instinct is to think that it would be more useful to know (as well as possibly easier for Alloy to establish) that state transitions have a particular property in general, no matter what hosts, users, and URIs are involved, than to know that property rsp1 applies in all the cases where the host is named examplecom and the address URI is example_url_https and whatnot.
I conjecture that reducing the number of individuals whose existence and properties are prescribed, and the constraints on which individuals can be involved in which state transitions, will reduce the CNF generation time.
If your long-term goal is to test long sequences of state transitions to test whether from a given starting point it's possible or impossible to arrive at a particular state (or kind of state), you may need to re-think the approach to enable shorter sequences of state transitions to do the job.
A second conjecture would involve less restructuring of the model. For reasons I don't think I understand fully, sometimes quantification with one seems to hurt rather than help performance, as in this example, where explicitly quantifying some variables with some instead of one turned out to make a problem tractable instead of intractable.
That question involves quantification in a predicate, not in the model overall, and the quantification with one wasn't intended in the first place, so it may not be relevant here. But we can test the effect of the one keyword on this model in a simple way: I commented out everything in honestAction except rsp8 and ran the predicate first != last in a scope of 8, once with most of the occurrences of one commented out and once with those keywords intact. With the one keywords commented out, the Analyser ran the problem in 24 seconds or so; with the one keywords in place, it ran for 500 seconds so far before I decided the point was made and terminated it.
So I'd try removing the keyword one from all of the signatures with instance-specific individuals, leaving it only on get, post, OK_200, etc., and appData. I would also try doing without the various subtypes of Key, SessionID, URL, Host, UserName, and Password, or at least constraining their cardinality in the run command.

MS Project single resource assigned to multiple concurrent activities

In Microsoft Project (MSP), is it possible to have two parallel/concurrent activities with the same resource assigned to both? For example:
The worker resource "matt" is assigned to both activities, hence the overallocation error showing in the info column (red human icon).
Why do I require such a structure you may ask? In reality, one task would continually be temporarily interrupted to do the other one, and vice versa. But in MSP, it would be counter-productive to do so. Hence, my simplified approach - parallel activities.
So how should I fix this issue?
Ignore it?
Allocate 50% of the resource to each (and fix up finish dates which are automatically extended by MSP?)
Or is there a better solution?
If they do indeed overlap, then the best way to put this in your project is to allocate 50% of the resource to each task. But why fix up the finish dates? The program will recalculate the new duration of each task based on the new work: when the allocation is lowered to 50%, the duration should double. Alternatively, you can declare the tasks as duration-driven if that is the case; for such tasks the duration will stay as set even though the allocation is changed.
I do not think there is a better solution, so option 2 (adjusting assignment units) is the best in your case.

Are all scheduling problems NP-Hard?

I know there are some scheduling problems out there that are NP-hard/NP-complete ... however, none of them are stated in such a way as to show that this situation is as well.
If you have a set of tasks constrained to a startAfter, startBy, and duration all trying to use a single resource ... can you resolve a schedule or identify that it cannot be resolved without an exhaustive search?
If the answer is "sorry pal, but this is NP-complete", what would be the best heuristic(s) to use, and are there ways to decrease the time it takes to a) resolve a schedule and b) identify an unresolvable schedule?
I've implemented (in prolog) a basic conflict resolution goal through recursion that implements a "smallest window first" heuristic. This actually finds solutions rather quickly, but is exceptionally slow at finding invalid schedules. Is there a way to overcome this?
Yay for compound questions!
The hardest part of most scheduling problems in real life is getting hold of a reliable and complete set of constraints. If we take the example of creating a university timetable:
Professor A will not get up in the morning, he is on a lot of committees, but no-one will tell the timetable office about this sort of constraint
Department 1 needs the timetable by the start of term, however, Department 2 that uses the same rooms is unwilling to decide on the courses that will be run until after all the students have arrived
Etc
Then you need a schedule system that can cope with changes, so when one constraint is changed at the last minute you don’t have to change the complete timetable.
All of the above is normally ignored in research papers about scheduling systems. As to NP completeness of a given scheduling problem, in real life you don’t care as even if it is not NP complete you are unlikely to even be able to define what the “best solution” is, so good enough is good enough.
See http://www.asap.cs.nott.ac.uk/watt/resources/university.html for a list of papers that may help get you started; there are still many PHDs to be had in scheduling software.
There are often good approximation algorithms for NP-hard/complete optimization problems like scheduling. You might skim the course notes by Ahmed Abu Safia on Approximation Algorithms for scheduling or various papers.
In a sense, all public key cryptography is done with "less hard" problems like factoring partially because NP-hard problems offer up too many easy cases. It's the same NP-completeness that makes them "morally hard" which also gives them too many easy problems, which often fall within some error bound of optimal.
There is a deeper theory of hardness of approximation that discusses the limitations of approximation algorithms though.
You can use dynamic programming to solve some of these things. Greedy algorithms also come to mind. Scheduling theory is both deep and beautiful but those two I find will solve most of the problems I've faced. Perhaps I've been lucky.
What do you mean by startBy?
With startAfter, and if there is only one resource, a fast solution could be to use topological sorting. The example algorithm runs in linear time, but it does not handle the error case where the graph contains cycles (a sketch with a cycle check follows).
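A minimal sketch of that idea in Java (Kahn's algorithm; the task count and the startAfter precedence edges are hypothetical inputs), including the cycle check mentioned above:

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class TopologicalScheduler {

    // Returns an execution order for tasks 0..n-1, or null if the precedence
    // graph contains a cycle (i.e. no valid schedule exists).
    static List<Integer> topologicalOrder(int n, int[][] startAfterEdges) {
        List<List<Integer>> successors = new ArrayList<>();
        int[] inDegree = new int[n];
        for (int i = 0; i < n; i++) {
            successors.add(new ArrayList<>());
        }
        for (int[] edge : startAfterEdges) {       // edge[1] must start after edge[0]
            successors.get(edge[0]).add(edge[1]);
            inDegree[edge[1]]++;
        }
        Deque<Integer> ready = new ArrayDeque<>();
        for (int i = 0; i < n; i++) {
            if (inDegree[i] == 0) {
                ready.add(i);
            }
        }
        List<Integer> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            int task = ready.poll();
            order.add(task);
            for (int next : successors.get(task)) {
                if (--inDegree[next] == 0) {
                    ready.add(next);
                }
            }
        }
        return order.size() == n ? order : null;   // leftover tasks mean a cycle
    }
}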
Here's one that isn't.
Schedule a set of jobs i= 1,2...n on a single machine which each take time t(i) so that the average waiting time is minimized.
Solution: Sort in increasing order of t(i). O(n log n)
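A quick sketch of that rule in Java (shortest processing time first; the job times are made-up example data):

import java.util.Arrays;

public class SptSchedule {
    public static void main(String[] args) {
        double[] t = {4.0, 1.0, 3.0, 2.0};  // processing times t(i), example data
        Arrays.sort(t);                     // shortest processing time first

        double waited = 0.0;     // how long the current job waits before starting
        double totalWait = 0.0;
        for (double duration : t) {
            totalWait += waited;
            waited += duration;  // the next job waits for everything scheduled so far
        }
        System.out.println("average waiting time = " + totalWait / t.length);
    }
}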
Consider this scheduling problem, which is in the class P:
Input: a list of activities, each with a start time and a finish time.
Sort by finish time, then walk the sorted list greedily, selecting each activity that starts no earlier than the finish of the last activity selected; this yields the maximum number of activities you can schedule (see the sketch below).
You can add caveats like "all activities must end by 5pm"; in that case, as you work through the list, stop once you reach an activity which ends after this time.
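A sketch of that greedy selection in Java (the activity times are made-up example data; the 5pm caveat would just be an extra check on the finish time):

import java.util.Arrays;
import java.util.Comparator;

public class ActivitySelection {
    public static void main(String[] args) {
        // {start, finish} pairs, example data
        int[][] activities = {{1, 4}, {3, 5}, {0, 6}, {5, 7}, {6, 8}, {8, 9}};
        Arrays.sort(activities, Comparator.comparingInt(a -> a[1]));  // sort by finish time

        int count = 0;
        int lastFinish = Integer.MIN_VALUE;
        for (int[] activity : activities) {
            if (activity[0] >= lastFinish) {  // starts after the last selected one finishes
                count++;
                lastFinish = activity[1];
            }
        }
        System.out.println("maximum number of activities = " + count);
    }
}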
