Optaplanner: Dynamic number of planning variable based on ProblemFactCollectionProperty - constraints

Problem: Truck assignment with limited quantity to deliver to the customers.
A truck has to be assigned to each customer. Because a truck carries a limited quantity, it needs to return to the depot to reload before delivering to the next customer.
Trip: load at the depot, unload at one or more customers, come back to the depot.
The problem facts are the available trucks and the customers to be delivered to. We need to determine dynamically how many trips a truck can make, based on a few timing-related conditions (truck availability time, driver hours, etc.).
The solution I can think of:
Pre-compute the maximum number of trips per truck based on business understanding and use this as a planning variable. Provide a hard score for violating the time constraints, so a few trips will be left unassigned if a truck exceeds the available truck/trip time.
Need Help:
In every solved example, we have a fixed number of planning variables before planning. Even with chained planning variables (as in TSP and VRP), we have a fixed number of trucks beforehand.
Any help is appreciated. If there is no direct solution, is the approach I have come up with the best possible one?

That solution is indeed recommended currently:
Provide enough trucks in the anchorValueRange to make sure a feasible solution can be found. Defining that number can be tricky: typically double the average usage. For example, if you have 300 visits and do on average 100 visits per truck, give it 6 trucks, as you never expect it to use more than 6 trucks (and probably a lot less). If trucks have skills or affinity, this becomes a bunch more complex.
Add an extra score level: if you're on HardSoftScore, switch to HardMediumSoftScore.
Add a medium constraint to penalize the number of trucks used. This is softer than the hard constraints (capacity etc.) and harder than the soft constraints (distance etc.); a sketch follows below.
(The alternative, adding/removing values to the value ranges on the fly, is only theoretically possible in OptaPlanner's architecture at the moment (don't use addProblemFactChanges for this!). It might sound like the perfect solution, but there are many subsystems that profit from a fixed value range, so that approach would have severe trade-offs.)
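A minimal ConstraintStreams sketch of that medium constraint, assuming a chained model with a Customer planning entity whose anchor is exposed as getTruck(), a HardMediumSoftScore, and a recent OptaPlanner 8/9 API; the class and method names are illustrative, not taken from the question:

```java
import org.optaplanner.core.api.score.buildin.hardmediumsoft.HardMediumSoftScore;
import org.optaplanner.core.api.score.stream.Constraint;
import org.optaplanner.core.api.score.stream.ConstraintFactory;
import org.optaplanner.core.api.score.stream.ConstraintProvider;

// Hypothetical sketch: penalize each truck that serves at least one customer,
// on the medium score level. Customer and getTruck() are assumed names.
public class TruckTripConstraintProvider implements ConstraintProvider {

    @Override
    public Constraint[] defineConstraints(ConstraintFactory factory) {
        return new Constraint[] {
                minimizeTrucksUsed(factory)
                // ... plus the existing hard (capacity, time) and soft (distance) constraints
        };
    }

    private Constraint minimizeTrucksUsed(ConstraintFactory factory) {
        return factory.forEach(Customer.class)
                .filter(customer -> customer.getTruck() != null) // only assigned customers
                .groupBy(Customer::getTruck)                     // count each used truck once
                .penalize(HardMediumSoftScore.ONE_MEDIUM)
                .asConstraint("Minimize trucks used");
    }
}
```

Because this lives on the medium level, the solver first satisfies capacity and time (hard), then tries to leave whole trucks unused, and only then optimizes distance (soft).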

Related

Why pipelining cannot operate at its maximum theoretical speed?

First of all, what is the maximum theoretical speed/speed up?
Can anyone explain why pipelining cannot operate at its maximum theoretical speed?
The maximum theoretical speedup is equal to the increase in pipeline depth. In a scalar (one instruction wide execution) design, the ideal instructions per cycle is one. Ideally, the clock frequency could increase by a factor equal to the increase in pipeline depth.
The actual frequency increase will be less than this ideal due to latching overheads, clock skew, and imbalanced division of work/latency. (While one can theoretically place latches at any point, the amount of state latched, its position, and other factors make certain points more friendly for stage divisions.)
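As a rough illustration of the latch-overhead effect (a standard textbook approximation with made-up numbers, not figures from any specific design): an n-stage pipeline with total logic latency T and per-stage latch overhead d has a cycle time of T/n + d, so its throughput speedup over the unpipelined design is T / (T/n + d) rather than the ideal n.

```java
// Rough approximation: speedup of an n-stage pipeline over an unpipelined
// design, given total combinational latency T and latch overhead d per stage.
// The latencies below are illustrative assumptions.
public class PipelineSpeedup {
    public static void main(String[] args) {
        double logicLatencyNs = 10.0;  // total combinational latency T (assumed)
        double latchOverheadNs = 0.2;  // clock-to-q + setup per stage (assumed)
        for (int stages : new int[] {1, 5, 10, 20}) {
            double cycleTime = logicLatencyNs / stages + latchOverheadNs;
            double speedup = logicLatencyNs / cycleTime;  // vs. one result per T
            System.out.printf("%2d stages: ideal %2dx, with latch overhead %.2fx%n",
                    stages, stages, speedup);
        }
    }
}
```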
Manufacturing variation also means that work designed to take an equal amount of time will not do so for all stages in the pipeline. A designer can provide more slack so that more chips will meet minimal timing in all stages. Another technique to handle such variation is to accept that not all chips will meet the target frequency (whether one exclusively uses the "golden samples" or use lower frequency chips as well is a marketing decision).
As one might expect, with shallow pipelines variation in a stage is spread out over more logic and so is less likely to affect frequency.
Wave pipelining, where multiple signal waves (corresponding to pipeline stages) can be passing through a block of logic at the same time, provides a limited method to avoid latch overhead. However, besides other design issues, such a design is more sensitive to variation both from manufacturing and from run-time conditions such as temperature and voltage (which one might wish to intentionally vary to target different power/performance behaviors).
Even if one did have incredible hardware that provided a perfect frequency increase, hazards (as mentioned in Peter Cordes' comment) would prevent the perfect utilization of available execution resources.

How can I create my own GUID algorithm with smaller "global"?

I have my own application with a far smaller "global" than the real globe, and I wanted a shorter version of a GUID. Now suppose I have a concrete number of IDs that I estimate will never be exceeded (for example 100 million IDs). How can I determine the number of random bits required to have the same properties as a GUID (globally unique, requiring no central authority to generate one)? Using a normal GUID would be overkill.
My "overkill" refers to this: I need the ID to be as easy to type, say, and write down as possible, while at the same time having an astronomically low collision chance, like a GUID. I have heard a GUID could be assigned to every grain of sand on earth. My application is a game; each player gets one generated ID, and obviously my player count is nowhere near the amount of sand on earth.
It would be best if a player could say something like "My ID is XXXX-XXXX". In that case, I am not sure whether 8 characters of randomized hex is too little or too much for 100 million players. (In reality I encode it to A-Z 0-9 instead of hex, though.) My game is not online-only, so I would like each player to be able to obtain a unique ID even when offline (no server to check for ID collisions).
A GUID has been designed to be globally unique, but I don't know why that results in a 128-bit sequence. Maybe they just chose a "very large" number that is a power of 2? I don't know what they were thinking when designing the GUID to ensure that it will not clash. (Did they use the world population times something? If that is the case, I could use 10 million times something too.)
A 128-bit guid will generally perform well, because most compilers are smart enough to reduce operations on it to a pair of 64-bit operations (and on some CPUs, a single 128-bit extended operation). Java and C#/VB.NET would likely have quite a bit more overhead than C++, but if you are using Java or C#/VB.NET, you've already accepted quite a bit more overhead, and a GUID won't add much to it.
However, if you really need smaller values, you could manually reduce GUIDs, by XOR-ing the upper 64 bits with the lower 64 bits (thereby preserving some of the uniqueness of the original) to create a compact 64-bit mostly-unique number.
You could reduce to 32-bit or 48-bit in a similar way, always a multiple of the size of the original GUID. This has the advantage that you are starting out with a number that is intended to be unique across a very large set. However, keep in mind that 100 million items require a fairly high number of bits to preserve a non-overlapping guarantee, so you may just be setting yourself up for a very difficult-to-find problem later on if you aren't careful.
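A minimal sketch of that XOR fold using Java's UUID type (illustrative only; the folded value loses the uniqueness guarantees of the full 128 bits):

```java
import java.util.UUID;

// Fold a 128-bit UUID down to 64 or 32 bits by XOR-ing its halves.
// The result is NOT guaranteed unique; it only spreads the original
// randomness over fewer bits.
public class GuidFold {
    public static long fold64(UUID guid) {
        return guid.getMostSignificantBits() ^ guid.getLeastSignificantBits();
    }

    public static int fold32(UUID guid) {
        long folded = fold64(guid);
        return (int) (folded ^ (folded >>> 32));
    }

    public static void main(String[] args) {
        UUID guid = UUID.randomUUID();
        System.out.printf("%s -> %016x -> %08x%n", guid, fold64(guid), fold32(guid));
    }
}
```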
A crude but probably equally effective approach is to use a cryptographically-secure random number generator and construct a number as large as you need (probably minimum 48-bit). It is important not to do modulo operations on the results, or you could significantly reduce the uniqueness (due to the period of the random number generator).
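A sketch of that random-bits approach (the 48-bit width and the base-32 rendering are assumptions for illustration, not a sizing recommendation), drawing exactly the bits needed so no modulo step is required:

```java
import java.math.BigInteger;
import java.security.SecureRandom;

// Draw the required number of random bits directly from a CSPRNG.
// new BigInteger(bits, random) is uniform over [0, 2^bits), so there is
// no modulo bias to worry about.
public class ShortRandomId {
    private static final SecureRandom RANDOM = new SecureRandom();

    public static String newId(int bits) {
        return new BigInteger(bits, RANDOM).toString(32).toUpperCase();
    }

    public static void main(String[] args) {
        // Prints a short base-32 string; 48 bits is at most 10 characters.
        System.out.println(newId(48));
    }
}
```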
I am assuming you cannot use a sequential id, although you may want to revisit that idea and see if there is a way to make a sequential id work. For example, you could use a sequential id paired with a random seed number, guaranteeing uniqueness without requiring a large number, and allowing internal indexing operations and similar optimizations that are common with large data sets.
OK, I discussed this with a friend and we came up with a solution. This is how we decided the number of characters in my game ID.
A character consists of 0-9 and A-Z instead of hex, which gives 36 kinds of characters. We took out 0, O, 1, and I so the ID prints without confusion in a variety of fonts, which leaves 32 kinds of characters.
Then, if every character is pseudo-randomized, how many players can we safely have?
We used the birthday paradox's square approximation. The formula on that page indicates how many people are needed for a 50% chance of two people colliding; it gives 22.99 people for the birthday problem (365 possible choices).
Now we substitute 32^k (k = the number of characters) into the equation instead of 365, which gives the number of players at which there is a 50% chance of two players having the same ID.
Finally, we agreed to choose a 9-character ID, so the game can register up to about 6.9 million players before there is a 50% chance that just two of them share the same ID.
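For reference, a small script reproducing those numbers with the square approximation n ≈ sqrt(2 · 32^k · ln 2) (a sketch; the exact cut-off depends on which approximation you use):

```java
// Birthday-bound sketch: how many random IDs of k base-32 characters can be
// issued before two of them collide with roughly 50% probability.
public class IdBirthdayBound {
    public static void main(String[] args) {
        for (int k = 6; k <= 12; k++) {
            double space = Math.pow(32, k);                    // 32^k possible IDs
            double n50 = Math.sqrt(2 * space * Math.log(2));   // ~50% collision point
            System.out.printf("%2d chars: ~%.1e IDs for a 50%% collision chance%n", k, n50);
        }
    }
}
```

For k = 9 this prints roughly 7.0e6, matching the 6.9 million figure above.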
The game isn't even online-only! A collision only matters if those two players are still actively playing at the same time and both decide to send a score to the scoreboard in the same week, because of the weekly score reset. So the actual number of players the game can hold is somewhat higher than that. (The game will probably never have that many players; that is just the small happy dream of every game startup. Well, at least the computation was fun.)
It will probably look like this for easier reading: 5XT-339-A67

Estimating the heat generated by a process or job

Is it possible to estimate the heat generated by an individual process at runtime?
Temperature readings of the processor are easily accessible, but what I need is process-specific information.
Is it possible to map information such as CPU utilization, IO, running time, memory usage, etc. to get some kind of estimate?
I'm going to say no, because the overall temperature of your system components isn't governed by a simple mathematical equation either, with everything that's moving and switching.
The heat generated by and inside a computer depends on many external factors, like the hardware setup, the ambient temperature of the room, possibly the age of the components, whether there is dust on them or in the fans, whether the cooling paste was correctly applied on the CPU or elsewhere, where heat sinks are present, how heat is being dissipated, and so on. In short, again, no.
Additionally, your computer runs a LOT of processes at any given time apart from the ones that you control (and "control" is a relative term). Even if it is possible to access certain sensor data for individual components (like you can see to some extent in the BIOS), attributing one single process's generated heat relative to the total is, well, impossible.
At the lowest levels (gate networks, control signalling etc.), an external individual no longer has any means to observe or measure what's going on but there as well, things are in a changing state, a variable amount of electricity is being used and thus a variable amount of heat generated.
Pertaining to your second question: that's basically what your task manager does. There are countless examples and articles on the internet on how to get that done in a plethora of programming languages.
That is, unless some of the actually smart people in this merry little community of keytappers and screengazers say that it IS actually possible, at which point I will be thoroughly amazed...
EDIT: Monitoring the processes is a first step in what you're looking for. Take a look at How to detect a process start & end using c# in windows? and be sure to follow up on duplicates like the one mentioned by Hans.
You could take a look at PowerTOP or some other tool that monitors power usage. I am not sure how accurate it is across different systems, but a power estimate should provide at least some relative information about the heat generated, assuming the processes you are comparing run in a similar manner on the same hardware. In reality there are just too many factors to predict power, much less heat, effectively, but you may be able to get an idea of the usage.
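In the same relative-comparison spirit, a very crude sketch is to rank processes by accumulated CPU time and treat that as a rough proxy for dissipated energy. This is a strong assumption that ignores I/O, memory, GPU activity, and per-core power states, and it says nothing about actual temperatures:

```java
import java.time.Duration;
import java.util.Comparator;

// Crude proxy: rank processes by accumulated CPU time (Java 9+ ProcessHandle).
// Heat does not actually track CPU time faithfully; this only gives a rough
// relative ordering of CPU-bound work.
public class CpuTimeRanking {
    public static void main(String[] args) {
        ProcessHandle.allProcesses()
                .filter(p -> p.info().totalCpuDuration().isPresent())
                .sorted(Comparator.comparing(
                        (ProcessHandle p) -> p.info().totalCpuDuration().orElse(Duration.ZERO))
                        .reversed())
                .limit(10)
                .forEach(p -> System.out.printf("pid %d: %s cpu, %s%n",
                        p.pid(),
                        p.info().totalCpuDuration().orElse(Duration.ZERO),
                        p.info().command().orElse("?")));
    }
}
```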

What are the tradeoffs when generating unique sequence numbers in a distributed and concurrent environment?

I am curious about the constraints and tradeoffs for generating unique sequence numbers in a distributed and concurrent environment.
Imagine this: I have a system where all it does is give back an unique sequence number every time you ask it. Here is an ideal spec for such a system (constraints):
Stay up under high-load.
Allow as many concurrent connections as possible.
Distributed: spread load across multiple machines.
Performance: run as fast as possible and have as much throughput as possible.
Correctness: numbers generated must:
not repeat.
be unique per request (there must be a way to break ties if any two requests happen at the exact same time).
be in (increasing) sequential order.
have no gaps between requests: 1, 2, 3, 4... (effectively a counter for the total number of requests).
Fault tolerant: if one or more machines, or all of them, go down, the system can resume from the state before the failure.
Obviously, this is an idealized spec and not all constraints can be satisfied fully (see the CAP theorem). However, I would love to hear your analysis of various relaxations of the constraints: what types of problems are we left with, and what algorithms would we use to solve them? For example, if we get rid of the counter (no-gaps) constraint, the problem becomes much easier: since gaps are allowed, we can just partition the numeric ranges and map them onto different machines, as sketched below.
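A minimal sketch of that relaxation (the interleaving scheme is one illustrative choice; pre-allocated blocks work just as well): machine i hands out the arithmetic progression i, i + n, i + 2n, ..., so no coordination is needed, at the cost of gaps and only approximate global ordering.

```java
import java.util.concurrent.atomic.AtomicLong;

// Each of n machines hands out i, i + n, i + 2n, ... where i is its machine id.
// IDs are unique and strictly increasing per machine, but the global sequence
// has gaps and is only roughly ordered across machines.
public class PartitionedSequence {
    private final long machineId;     // 0 <= machineId < machineCount
    private final long machineCount;
    private final AtomicLong counter = new AtomicLong();

    public PartitionedSequence(long machineId, long machineCount) {
        this.machineId = machineId;
        this.machineCount = machineCount;
    }

    public long next() {
        return machineId + counter.getAndIncrement() * machineCount;
    }

    public static void main(String[] args) {
        PartitionedSequence node2 = new PartitionedSequence(2, 10);
        System.out.println(node2.next());  // 2
        System.out.println(node2.next());  // 12
    }
}
```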
Any references (papers, books, code) are welcome. I'd also like to keep a list of existing software (open source or not).
Software:
Snowflake: a network service for generating unique ID numbers at high scale with some simple guarantees.
keyspace: a publicly accessible, unique 128-bit ID generator, whose IDs can be used for any purpose
RFC-4122 implementations exist in many languages. The RFC spec is probably a really good base, as it prevents the need for any inter-system coordination, the UUIDs are 128-bit, and when using IDs from software implementing certain versions of the spec, they include a time code portion that makes sorting possible, etc.
If you must be sequential (per machine) but can drop the gap/counter requirements, look for an implementation of the Version 1 UUID as specified in RFC 4122.
If you're working in .NET and can eliminate the sequential and gap/counter requirements, just use System.Guids. They implement RFC 4122 Version 4 and are already unique (very low collision probability) across machines and requests. This could be easily implemented as a web service or just used locally.
Here's a high-level idea for an approach that may fulfill all the requirements, albeit with a significant caveat that may not match many use cases.
If you can tolerate having two sequence numbers (a logical one returned immediately, guaranteed unique and ordered but with gaps, and a separate physical one guaranteed to be in sequential order with no gaps and available a short while later), then the solution seems straightforward:
One distributed system that can serve up a high-resolution clock + machine id as the logical sequence number (see the sketch below).
Stream all the logical sequence numbers into a separate distributed system that orders the logical sequence numbers and maps them to the physical sequence numbers.
The mapping from logical to physical can happen on demand as soon as the second system is done with processing.
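A minimal sketch of the first component, in the spirit of Snowflake (the timestamp/10-bit machine id/12-bit counter layout is an illustrative assumption; counter overflow within a millisecond and clocks moving backwards are not handled here):

```java
// Snowflake-style logical sequence number: millisecond timestamp, machine id,
// and a per-millisecond counter packed into one long. Bit widths are an
// illustrative assumption, not a specification.
public class LogicalSequence {
    private final long machineId;        // assumed to fit in 10 bits: 0..1023
    private long lastMillis = -1;
    private long perMillisCounter = 0;   // assumed to fit in 12 bits: 0..4095

    public LogicalSequence(long machineId) {
        this.machineId = machineId;
    }

    public synchronized long next() {
        long now = System.currentTimeMillis();  // real systems subtract a custom epoch
        if (now == lastMillis) {
            perMillisCounter++;                 // breaks ties within one millisecond
        } else {
            lastMillis = now;
            perMillisCounter = 0;
        }
        return (now << 22) | (machineId << 12) | perMillisCounter;
    }
}
```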

Is it safe to assume a GUID will always be unique?

I know there is a minute possibility of a clash but if I generated a batch of 1000 GUIDs (for example), would it be safe to assume they're all unique to save testing each one?
Bonus question
What is an optimal way to test a batch of GUIDs for uniqueness? A Bloom filter, maybe?
Yes, you can. Since GUIDs are 128 bits long, there is admittedly a minute possibility of a clash—but the word "minute" is nowhere near strong enough. There are so many GUIDs that if you generate several trillion of them randomly, you're still more likely to get hit by a meteorite than to have even one collision (from Wikipedia). And if you aren't generating them randomly, but are e.g. using the MAC-address-and-time-stamp algorithm, then they're also going to be unique, as MAC addresses are unique among computers and time stamps are unique on your computer.
Edit 1: To answer your bonus question, the optimal way to test a set of GUIDs for uniqueness is to just assume that they are all unique. Why? Because, given the number of GUIDs you're generating, the odds of a GUID collision are smaller than the odds of a cosmic ray flipping a bit in your computer's memory and screwing up the answer given by any "accurate" algorithm you'd care to run. (See this StackOverflow answer for the math.)
There are an enormous number of GUIDs out there. To quote Douglas Adams's Hitchhiker's Guide to the Galaxy:
"Space," it says, "is big. Really big. You just won't believe how vastly hugely mindbogglingly big it is. I mean you may think it's a long way down the road to the chemist, but that's just peanuts to space, listen…"
And since there are about 7×10^22 stars in the universe, and just under 2^128 GUIDs, there are approximately 4.86×10^15 (almost five quadrillion) GUIDs for every single star. If every one of those stars had a world with a thriving population like ours, then around each and every star, every human or alien who had ever lived would be entitled to over forty-five thousand GUIDs. For every person in history at every star in the universe. The GUID space is at the same level of hugeness as the size of the entire universe. You do not need to worry.
(Edit 2: Reflecting on this: wow. I hadn't realized myself what this meant. The GUID space is incomprehensibly massive. I'm sort of in awe of it.)
Short answer: for practical purposes, yes.
However, you have to consider the birthday paradox!
I have calculated a few representative collision probabilities. With 122-bit UUIDs as specified in the Wikipedia article, the probability of collision is 1/2 if you generate at least 2.71492e18 UUIDs. With 10^19 UUIDs, the probability is 0.999918. With 10^17 UUIDs, 0.000939953.
Some numbers for comparison can be found on Wikipedia. So you can safely assign a UUID for each human that has lived, each galaxy in the observable universe, each fish in the ocean, and each individual ant on Earth. However, collisions are almost certain if you generate a UUID for each transistor humanity produces in a year, each insect on Earth, each grain of sand on Earth, each star in the observable universe, or anything larger.
If you generate 1 billion UUIDs per second, it would take about 36 years to get a collision probability of 10%.
Eventually, there will probably be a collision among the set of UUIDs generated over the course of human history. Still, the probability that collided UUIDs will be used for the same purpose is vanishingly small, so there's no problem in practice.
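Those figures follow from the standard birthday approximation p ≈ 1 - exp(-n^2 / 2N) with N = 2^122 equally likely values; a quick sketch to reproduce them:

```java
// Birthday approximation for random (version 4) UUIDs: 122 random bits,
// so N = 2^122 possible values and p(collision) ~ 1 - exp(-n^2 / (2N)).
public class UuidCollisionProbability {
    public static void main(String[] args) {
        double space = Math.pow(2, 122);
        for (double n : new double[] {1e17, 2.71492e18, 1e19}) {
            double p = -Math.expm1(-n * n / (2 * space));  // 1 - exp(-x), computed stably
            System.out.printf("n = %.3e  ->  p ~= %.6f%n", n, p);
        }
    }
}
```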
An analysis of the possibility of collision is available on Wikipedia: http://en.wikipedia.org/wiki/Uuid#Random_UUID_probability_of_duplicates
As mentioned in the link, this will be affected by the properties of the random number generator.
There is also the possibility of a bug in GUID generator code; while the chances are low, they are probably higher than the chances of a collision based on the mathematics.
A Bloom filter might be appropriate; it can quickly tell you if a GUID is unique, but there's a chance for a false indication of a collision. An alternate method if you're testing a batch at a time is to sort the batch and compare each successive element.
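For a batch of the size mentioned in the question, an exact check is trivial; a sketch using a hash set (no false positives, unlike a Bloom filter):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

// Exact duplicate check for a batch of GUIDs. For a few thousand entries a
// HashSet is simpler than a Bloom filter and never reports a false collision.
public class GuidBatchCheck {
    public static boolean allUnique(List<UUID> batch) {
        Set<UUID> seen = new HashSet<>(batch.size() * 2);
        for (UUID guid : batch) {
            if (!seen.add(guid)) {
                return false;  // add() returns false when the element already exists
            }
        }
        return true;
    }
}
```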
In general, yes it is safe to assume.
If your GUID generator is truly random, the possibilities of a clash within a 1000 GUIDs is extraordinarily small.
Of course, that assumes a good GUID generator. So the question is really about how much you trust the tool you're using to generate GUIDs, and whether it has its own tests.
This topic reminds me of the deck-of-cards scenario: there are so many ways a deck of 52 cards can be arranged (52! ≈ 8×10^67 orderings) that it's pretty much certain that no two properly shuffled decks of cards that have ever existed have been in the same order.
If you take a deck now and shuffle it, that sequence will be unique and will probably never be seen again in all of humanity. Indeed, the potential number of ways to arrange 52 of anything is so unimaginably vast that the chances of any two decks happening to be in the same order are close to zero.
In this example of having 40 shuffled decks and wanting to know for sure they are all unique, it's not impossible that two of them are the same, but it's something that most likely would not occur even if you were able to shuffle all the decks once every tenth of a second, starting at the birth of the universe.
While a collision is possible, it is HIGHLY unlikely. (Math here.) It is safe to assume they are in fact distinct.
Usually it is a pretty safe assumption.
http://en.wikipedia.org/wiki/Globally_Unique_Identifier
Is a GUID unique 100% of the time?
