Partially re-create Risk-like game based on incomplete log files - math

I'm trying to re-create this conquerclub (Risk-like) game:
http://conquerclub.barrycarter.info/ONEOFF/7460216.html
In other words, I want to know who owned each territory at each point
in time, and how many troops they had on that territory. My primary
source of information is the Game Log. Notes:
* It's not in the Game Log, but all territories start with 3 troops.
* Since we know the territory owners at the end of the game, and the Game Log mentions all owner changes, determining territory owners at any point in time is easy.
* The challenge is to find the number of troops on a territory at a given time.
* The Game Log gives information on troop deployment, reinforcement, and conquest.
* However, the Game Log is incomplete. Suppose territory X attacks territory Y unsuccessfully, but both territories lose troops in the process. The Game Log will not mention this.
* It's probably not possible (in general) to find the exact number of troops on a territory at a given time, so I'm looking for a range.
* I tried feeding the data to Mathematica as a series of inequalities, but as the manual warns, the computation time increases exponentially with the number of inequalities. Even with a fairly small number of inequalities, it hangs. Plus, I'm not convinced Mathematica is the right tool here.
* Any thoughts? Another example is: http://conquerclub.barrycarter.info/ONEOFF/7562013.html
* I know about http://userscripts.org/scripts/show/83035 but that only tracks owners, not number of troops.

You could make use of Prolog's constraint programming support (specifically, CLP(FD)). It would require you to encode all the rules in Prolog, which might be a non-trivial task. However, Prolog would then be able to show you all possible valid ways of playing such a game (legal in terms of the encoded rules), or just the ranges of possible values.
Also, while CLP(FD) in Prolog is sometimes quite fast, it might be difficult to get it to solve your problem quickly; most free solvers have many quirks.
Again, I think this is a non-trivial task, and an even greater one if you haven't programmed in Prolog before. But I am pretty sure it would give you the answers you seek.
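Not the CLP(FD) encoding suggested above, but to give a feel for the weaker "range" idea, here is a minimal Python sketch of interval bookkeeping per territory. The event names and widening rules are illustrative assumptions, not ConquerClub's actual log format:

    # Sketch: track (lo, hi) bounds on a territory's troop count.

    def start():
        return (3, 3)                # every territory starts with 3 troops

    def deploy(bounds, n):
        lo, hi = bounds              # the log states exact deployment counts,
        return (lo + n, hi + n)      # so both bounds shift by n

    def silent_turn(bounds):
        lo, hi = bounds              # an unlogged failed attack may have cost
        return (1, hi)               # troops, but a survivor always keeps >= 1

    b = start()                      # (3, 3)
    b = deploy(b, 2)                 # (5, 5)
    b = silent_turn(b)               # (1, 5): the log told us nothing this turn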

Related

Within Distributed System, when can a logical clock increment by more than 1?

I'm doing a coursework for a distributed systems module, and within it I need to apply a variable clock incrementor; my tutor has gone over both Lamport and vector clocks, but said "I can't hint at that" when I asked him about applying a variable length/size per clock.
I wish I knew what to do,
Andy
I suppose you mean vector clocks of variable size?
This is technically not possible, due to the way vector clocks are defined and used: you would need to know, right at the beginning, about all the nodes that will communicate together and use a vector clock. That way you wouldn't be allowed to expand your service; also, if you tear down a node, never to start it again, its entry would still be sent around and waste resources.
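For reference, a minimal fixed-membership vector clock in Python (a dict keyed by node id; the names are illustrative). The point is that merge only ever adds entries, which is exactly why entries for torn-down nodes linger:

    class VectorClock:
        def __init__(self, node_id):
            self.node_id = node_id
            self.clock = {node_id: 0}

        def tick(self):
            # Local event: increment only our own entry.
            self.clock[self.node_id] += 1

        def merge(self, received):
            # On receive: element-wise max with the incoming clock, then tick.
            # Entries are only ever added, never removed.
            for node, t in received.items():
                self.clock[node] = max(self.clock.get(node, 0), t)
            self.tick()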
One of my professors in distributed systems mentioned that Amazon is/was using "dynamic" vector clocks for some services, and that they had an algorithm which automatically removed "old" entries from the vector clocks. They supposedly concluded that this had worked fine so far. However, I never saw the paper about this.

Estimating the heat generated by a process or job

Is it possible to estimate the heat generated by an individual process at runtime?
Temperature readings of the processor are easily accessible, but what I need is process-specific information.
Is it possible to map information such as CPU utilization, IO, running time, memory usage etc. to get some kind of an estimate?
I'm going to say no, because the overall temperature of your system components isn't a simple mathematical equation either, with everything that's moving and switching.
Heat generated by and inside a computer depends on many external factors: the hardware setup, the ambient temperature of the room, possibly the age of the components, whether there is dust on them or in the fans, whether the cooling paste was correctly applied on the CPU or elsewhere, where heat sinks are present, how heat is being dissipated, etc. In short, again: no.
Additionally, your computer runs a LOT of processes at any given time apart from the ones you control (and "control" is a relative term). Even if it is possible to access certain sensor data for individual components (as you can, to some extent, in the BIOS), interpolating one single process's generated temperature from the total is, well, impossible.
At the lowest levels (gate networks, control signalling, etc.), an external individual no longer has any means to observe or measure what's going on, but there as well things are in a changing state: a variable amount of electricity is being used, and thus a variable amount of heat is generated.
Pertaining to your second question: that's basically what your task manager does. There are countless examples and articles on the internet on how to get that done in a plethora of programming languages.
That is, unless some of the actually smart people in this merry little community of keytappers and screengazers say that it IS actually possible, at which point I will be thoroughly amazed...
EDIT: Monitoring the processes is a first step towards what you're looking for. Take a look at How to detect a process start & end using c# in windows? and be sure to follow up on duplicates like the one mentioned by Hans.
You could take a look at PowerTOP or some other tool that monitors power usage. I am not sure how accurate it is across different systems, but a power estimate should provide at least some relative information about the heat generated, assuming the processes you are comparing are running in similar manners on similar hardware. In reality there are just too many factors to predict power, much less heat, effectively, but you may be able to get an idea of the usage.
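If relative load is all you need, here is a crude sketch using the psutil library. The "share of total CPU work" proxy is my own assumption; it says nothing about actual watts or degrees:

    import time
    import psutil  # third-party: pip install psutil

    def cpu_share(pid, interval=5.0):
        """A process's rough share of total CPU work over a sampling window.
        At best a relative proxy for power/heat, never an absolute estimate."""
        proc = psutil.Process(pid)
        proc.cpu_percent(None)            # prime the per-process counter
        psutil.cpu_percent(None)          # prime the system-wide counter
        time.sleep(interval)
        proc_pct = proc.cpu_percent(None)             # % of a single core
        total_pct = psutil.cpu_percent(None) * psutil.cpu_count()
        return proc_pct / total_pct if total_pct else 0.0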

Is it safe to assume a GUID will always be unique?

I know there is a minute possibility of a clash, but if I generated a batch of 1000 GUIDs (for example), would it be safe to assume they're all unique, to save testing each one?
Bonus question
An optimal way to test a GUID for uniqueness? Bloom filter maybe?
Yes, you can. Since GUIDs are 128 bits long, there is admittedly a minute possibility of a clash—but the word "minute" is nowhere near strong enough. There are so many GUIDs that if you generate several trillion of them randomly, you're still more likely to get hit by a meteorite than to have even one collision (from Wikipedia). And if you aren't generating them randomly, but are e.g. using the MAC-address-and-time-stamp algorithm, then they're also going to be unique, as MAC addresses are unique among computers and time stamps are unique on your computer.
Edit 1: To answer your bonus question, the optimal way to test a set of GUIDs for uniqueness is to just assume that they are all are unique. Why? Because, given the number of GUIDs you're generating, the odds of a GUID collision are smaller than the odds of a cosmic ray flipping a bit in your computer's memory and screwing up the answer given by any "accurate" algorithm you'd care to run. (See this StackOverflow answer for the math.)
There are an enormous number of GUIDs out there. To quote Douglas Adams's Hitchhiker's Guide to the Galaxy:
"Space," it says, "is big. Really big. You just won't believe how vastly hugely mindbogglingly big it is. I mean you may think it's a long way down the road to the chemist, but that's just peanuts to space, listen…"
And since there are about 7×10^22 stars in the universe, and just under 2^128 GUIDs, there are approximately 4.86×10^15 (almost five quadrillion) GUIDs for every single star. If every one of those stars had a world with a thriving population like ours, then around each and every star, every human or alien who had ever lived would be entitled to over forty-five thousand GUIDs. For every person in history at every star in the universe. The GUID space is at the same level of hugeness as the size of the entire universe. You do not need to worry.
(Edit 2: Reflecting on this: wow. I hadn't realized myself what this meant. The GUID space is incomprehensibly massive. I'm sort of in awe of it.)
Short answer: for practical purposes, yes.
However, you have to consider the birthday paradox!
I have calculated a few representative collision probabilities. With 122-bit UUIDs as specified in the Wikipedia article, the probability of collision reaches 1/2 once you generate about 2.71492×10^18 UUIDs. With 10^19 UUIDs the probability is 0.999918; with 10^17 UUIDs it is 0.000939953.
Some numbers for comparison can be found on Wikipedia. So you can safely assign a UUID for each human that has lived, each galaxy in the observable universe, each fish in the ocean, and each individual ant on Earth. However, collisions are almost certain if you generate a UUID for each transistor humanity produces in a year, each insect on Earth, each grain of sand on Earth, each star in the observable universe, or anything larger.
If you generate 1 billion UUIDs per second, it would take about 36 years to get a collision probability of 10%.
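Those figures are easy to reproduce with the birthday approximation p ~ 1 - exp(-n^2 / (2N)), where N = 2^122 for random UUIDs; a quick Python check:

    import math

    N = 2 ** 122                          # random bits in a version-4 UUID

    def p_collision(n):
        # Birthday-bound approximation: p ~ 1 - exp(-n^2 / (2N))
        return 1.0 - math.exp(-n * n / (2 * N))

    print(p_collision(2.71492e18))        # ~0.5
    print(p_collision(1e19))              # ~0.999918
    print(p_collision(1e17))              # ~0.000939953
    print(p_collision(1e9 * 36 * 365.25 * 86400))   # ~0.1: 36 years at 1e9/s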
Eventually, there will probably be a collision among the set of UUIDs generated over the course of human history. Still, the probability that collided UUIDs will be used for the same purpose is vanishingly small, so there's no problem in practice.
An analysis of the possibility of collision is available on Wikipedia: http://en.wikipedia.org/wiki/Uuid#Random_UUID_probability_of_duplicates
As mentioned in the link, this will be affected by the properties of the random number generator.
There is also the possibility of a bug in GUID generator code; while the chances are low, they are probably higher than the chances of a collision based on the mathematics.
A Bloom filter might be appropriate: it can quickly tell you whether a GUID is definitely new, though there is a chance of a false indication of a collision. An alternate method, if you're testing a batch at a time, is to sort the batch and compare each successive pair of elements; see the sketch below.
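That batch check in Python, using the standard uuid module; unlike a Bloom filter it never reports a false collision:

    import uuid

    def all_unique(guids):
        batch = sorted(guids)             # O(n log n)
        return all(a != b for a, b in zip(batch, batch[1:]))

    print(all_unique(uuid.uuid4() for _ in range(1000)))   # True, in practice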
In general, yes it is safe to assume.
If your GUID generator is truly random, the possibility of a clash within 1000 GUIDs is extraordinarily small.
Of course, that assumes a good GUID generator. So the question is really about how much you trust the tool you're using to generate GUIDs, and whether it has its own tests.
This topic reminds me of the deck-of-cards scenario: there are so many ways a deck of 52 cards can be arranged that it's pretty much certain no two properly shuffled decks that have ever existed have been in the same order.
If you take a deck now and shuffle it, that sequence will be unique, and will probably never be seen again in all of humanity. Indeed, the number of ways to arrange 52 of anything is so unimaginably vast that the chance of any two decks happening to be in the same order is close to zero.
In this example of having 40 shuffled decks and wanting to know for sure they are all unique, it's not impossible that two of them are the same, but it's something that would almost certainly not occur even if you had shuffled all the decks once every tenth of a second since the birth of the universe.
While a collision is possible, it is HIGHLY unlikely. (Math here.) It is safe to assume they are in fact distinct.
Usually it is a pretty safe assumption.
http://en.wikipedia.org/wiki/Globally_Unique_Identifier
Is a GUID unique 100% of the time?

Are all scheduling problems NP-Hard?

I know there are some scheduling problems out there that are NP-hard/NP-complete; however, none of them is stated in such a way as to show that this situation is NP-hard as well.
If you have a set of tasks constrained to a startAfter, startBy, and duration all trying to use a single resource ... can you resolve a schedule or identify that it cannot be resolved without an exhaustive search?
If the answer is "sorry pal, but this is NP-complete", what would be the best heuristic(s) to use, and are there ways to decrease the time it takes to (a) resolve a schedule and (b) identify an unresolvable schedule?
I've implemented (in Prolog) a basic conflict-resolution goal through recursion that applies a "smallest window first" heuristic. This actually finds solutions rather quickly, but is exceptionally slow at finding invalid schedules. Is there a way to overcome this?
Yay for compound questions!
The hardest part of most scheduling problems in real life is getting hold of a reliable and complete set of constraints. If we take the example of creating a university timetable:
* Professor A will not get up in the morning; he is on a lot of committees, but no one will tell the timetable office about this sort of constraint.
* Department 1 needs the timetable by the start of term; however, Department 2, which uses the same rooms, is unwilling to decide on the courses that will be run until after all the students have arrived.
* Etc.
Then you need a scheduling system that can cope with changes, so that when one constraint is changed at the last minute you don't have to redo the complete timetable.
All of the above is normally ignored in research papers about scheduling systems. As to the NP-completeness of a given scheduling problem: in real life you don't care, as even if it is not NP-complete you are unlikely to be able to define what the "best solution" is, so good enough is good enough.
See http://www.asap.cs.nott.ac.uk/watt/resources/university.html for a list of papers that may help get you started; there are still many PhDs to be had in scheduling software.
There are often good approximation algorithms for NP-hard/complete optimization problems like scheduling. You might skim the course notes by Ahmed Abu Safia on Approximation Algorithms for scheduling or various papers.
In a sense, all public-key cryptography is done with "less hard" problems like factoring, partly because NP-hard problems offer up too many easy cases. The same NP-completeness that makes them "morally hard" also gives them too many easy instances, which often fall within some error bound of optimal.
There is a deeper theory of hardness of approximation that discusses the limitations of approximation algorithms though.
You can use dynamic programming to solve some of these things. Greedy algorithms also come to mind. Scheduling theory is both deep and beautiful, but I find those two will solve most of the problems I've faced. Perhaps I've been lucky.
What do you mean by startBy?
With startAfter only, and if there is just one resource, a fast solution could be topological sorting; see the sketch below. The textbook algorithm runs in linear time, but as given it does not handle the error case where the graph contains cycles.
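For instance, Kahn's algorithm with the cycle case reported; encoding startAfter as an `after` adjacency map is my assumption:

    from collections import deque

    def topo_order(tasks, after):
        # `after` maps each task to the tasks that must start after it.
        indeg = {t: 0 for t in tasks}
        for t in tasks:
            for succ in after.get(t, ()):
                indeg[succ] += 1
        queue = deque(t for t in tasks if indeg[t] == 0)
        order = []
        while queue:
            t = queue.popleft()
            order.append(t)
            for succ in after.get(t, ()):
                indeg[succ] -= 1
                if indeg[succ] == 0:
                    queue.append(succ)
        if len(order) != len(tasks):
            raise ValueError("startAfter constraints contain a cycle")
        return order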
Here's one that isn't.
Schedule a set of jobs i = 1, 2, ..., n on a single machine, where job i takes time t(i), so that the average waiting time is minimized.
Solution: sort in increasing order of t(i). O(n log n).
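A quick sketch of that rule (shortest processing time first), returning both the order and the resulting average waiting time:

    def spt(durations):
        # Shortest processing time first minimizes average waiting time
        # on a single machine.
        order = sorted(range(len(durations)), key=lambda i: durations[i])
        waits, elapsed = [], 0
        for i in order:
            waits.append(elapsed)        # job i waits for all jobs before it
            elapsed += durations[i]
        return order, sum(waits) / len(waits)

    print(spt([3, 1, 2]))                # ([1, 2, 0], 1.333...)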
Good list here
Consider this scheduling problem, which is in the class P:
Input: a list of activities, each with a start time and a finish time.
Sort by finish time.
Walk the sorted list, selecting every activity whose start time is no earlier than the finish time of the last activity you selected; this finds the maximum number of activities you can schedule.
You can add caveats like "all activities must end at 5pm"; in that case, as you work through the list, stop once you reach an activity which ends after this time.
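That greedy in a few lines of Python; the 5pm caveat would be one extra cutoff test inside the loop:

    def max_activities(intervals):
        chosen, last_finish = [], float("-inf")
        for start, finish in sorted(intervals, key=lambda a: a[1]):
            if start >= last_finish:      # compatible with the last pick
                chosen.append((start, finish))
                last_finish = finish
        return chosen

    print(max_activities([(1, 3), (2, 5), (4, 7), (6, 8), (5, 9), (8, 10)]))
    # [(1, 3), (4, 7), (8, 10)]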

Intelligent Voice Recording: Request for Ideas

Say you have a conference room, and meetings take place at arbitrary, impromptu times. You would like to keep an audio record of all meetings. To make the system as easy to use as possible, no action would be required of meeting attendees; they just know that when they have a meeting in a specific room, they will have a record of it.
Obviously, just recording nonstop would be inefficient: it would waste data storage and be a pain to sift through.
I figure there are two basic ways to go about it.
1. Recording simply starts and stops according to sound-level thresholds.
2. Recording is continuous but split into X-minute blocks; blocks found to contain no content are discarded.
I like the second way better because I feel there is less risk of losing data to late starts or failed triggers.
I would like to implement this in Python, and on Windows if possible.
Implementation suggestions?
Bonus considerations that probably deserve their own questions:
* best audio format and compression for this purpose
* any way of determining how many speakers are present, assuming identification is unrealistic
This is one of those projects where the path is going to be defined more by what's on hand for ready reuse.
You'll probably find it easier to record continuously and save the data off in chunks (for example, hour-long pieces).
The format is going to depend on what you have in the way of recording tools and audio-processing libraries. You may even find that you use two: one format, like PCM-encoded WAV, for recording and processing, and compressed MP3 for storage.
Once you have an audio stream, you'll need to access it in PCM form (a list of amplitude values). A simple averaging approach will probably be good enough to detect when there is a conversation (see the sketch after this list). Typical tuning attributes:
* average energy level that triggers recording
* amount of time at or below the energy level needed to identify a stop or a start (I recommend two different values)
* size of the analysis window for averaging
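A minimal sketch of that averaging approach, assuming 16-bit mono PCM samples already in a NumPy array; every constant below is a placeholder to tune by ear:

    import numpy as np

    def speech_regions(samples, rate, window_s=0.5, threshold=500.0, hang_s=2.0):
        """Return (start_s, end_s) spans whose RMS amplitude exceeds `threshold`.
        A span stays open until the audio has been quiet for `hang_s` seconds."""
        win = int(window_s * rate)
        n = len(samples) // win
        frames = samples[: n * win].astype(np.float64).reshape(n, win)
        rms = np.sqrt((frames ** 2).mean(axis=1))     # energy per window
        regions, start, quiet = [], None, 0
        for i, loud in enumerate(rms > threshold):
            if loud:
                if start is None:
                    start = i                         # a region begins
                quiet = 0
            elif start is not None:
                quiet += 1
                if quiet * window_s >= hang_s:        # quiet long enough: close
                    regions.append((start * window_s, (i - quiet + 1) * window_s))
                    start, quiet = None, 0
        if start is not None:                         # region still open at EOF
            regions.append((start * window_s, n * window_s))
        return regions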
As for the number of participants, unless you find a library that does this, I don't see an easy solution. I've used speech recognition engines before and have done a reasonable amount of audio processing, and I haven't seen any 'easy' way to do this. If you were to look, search out universities doing speech-analysis research; you may find some prototypes you can modify to give your software some clues.
I think you'll have difficulty doing this entirely in Python. You're talking about doing frequency/amplitude analysis of MP3 files: you would have to open up the file, look for a volume threshold, and then cut out the portions that fall below that threshold. Figuring out how many speakers are present would require very advanced signal processing.
A cursory Google search turned up nothing for me. You might have better luck looking for an off-the-shelf solution.
As an aside: there may be legal complications to having a recorder running 24/7 without letting people know.
