Scrabble - the best move [closed] - math

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
I have written an algorithm for Scrabble. It uses the strategy of always playing the highest-scoring word, but I do not think that is the best way to play the game.
My question is: is there any advanced math for Scrabble that suggests not the highest-scoring word but another one that will increase the probability of winning?
Or, in other words, is there a strategy different from highest score?
I have my own ideas about how this could work. For example, suppose two words have almost the same score (s1 > s2), but the second word does not open a new path to a 3W or 2W square. Even though its score is lower than that of the first word, it is better to play the second word and not the first one.

From my experience with scrabble, you are correct in that you don't necessarily want to always suggest the highest scoring word. Rather, you want to suggest the best word. I don't think this requires a lot of advanced math to pull off.
Here are some suggestions:
In your current algorithm, rank all your letters, particularly consonants, by ease of use. For example, the letter "S" would have the highest ease of use because it is the most flexible. That is, when you play a given word and hold back the letter "S", you are essentially opening up the possibility of better word choices with the new letters that come into play on your next turn.
Balance out vowel and consonant usage in your words. As a regular Scrabble player, I don't always play the best-scoring word if the best-scoring word doesn't use enough vowels. For example, if I use 4 letters that contain no vowels and I have 3 vowels left in my array of letters, chances are I am going to draw at least two vowels on my next turn, which would leave me with 5 vowels and 2 consonants, which is unlikely to open up many opportunities for high-scoring words. It is almost always better to use more vowels than consonants in your words, especially the letter I. Your algorithm should reflect some of this when selecting the best word.
I hope this gives you a good start. Once your algorithm is able to select the best scoring word, you can fine tune it with these suggestions in order to be an overall better scorer in your scrabble games. (I am assuming this is some sort of AI you are creating)
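The two suggestions above can be sketched as a "leave" adjustment that is added to each candidate's raw score. This is only a minimal sketch: the per-letter values in LEAVE_VALUE, the 1.5-point imbalance penalty, and the function names are illustrative assumptions, not tuned or published figures.

```python
from collections import Counter

VOWELS = set("AEIOU")

# Hypothetical "keep" values: positive for flexible letters, negative for awkward ones.
LEAVE_VALUE = {"S": 8.0, "Q": -7.0}


def leave(rack, word):
    """Letters still on the rack after playing `word` (blanks are ignored here)."""
    remaining = Counter(rack) - Counter(word)
    return "".join(sorted(remaining.elements()))


def leave_score(letters):
    """Estimated value of the letters kept back for the next turn."""
    value = sum(LEAVE_VALUE.get(ch, 0.0) for ch in letters)
    vowels = sum(ch in VOWELS for ch in letters)
    consonants = len(letters) - vowels
    # Assumed penalty for an unbalanced vowel/consonant leave.
    return value - 1.5 * abs(vowels - consonants)


def best_move(rack, candidates):
    """candidates: (word, raw_score) pairs, e.g. from the existing move generator."""
    return max(candidates, key=lambda m: m[1] + leave_score(leave(rack, m[0])))


rack = "STAPLEI"
candidates = [("PLATES", 22), ("PLATE", 20)]
print(best_move(rack, candidates))  # -> ('PLATE', 20)
```

With these made-up numbers, keeping the "S" outweighs the two extra points that PLATES scores.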

My question is: is there any advanced math for Scrabble that suggests not the highest-scoring word but another one that will increase the probability of winning?
As ROFLwTIME mentioned, you need to also account for the letters that you haven't played.
In doing that accounting, you need to account for how letters interact with one another. For example, suppose you have a Q, a U, and five other letters. Suppose the best you can score playing both the Q and the U is 30 points, but you can score more by playing the U but leaving the Q unplayed. Unless that "more" is much more than 30, either play the word with the Q or find a third word that leaves both the Q and the U unplayed.
You also need to account for the opportunities the word you play creates for your opponents. A typical game theory strategy is to maximize your score while minimizing your opponent's score, maximin for short. Playing a 20-point word that allows your opponent to play a 50-point word is not a good idea.
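One way to read the maximin idea is to rank each candidate by its score minus the best reply it hands to the opponent. The sketch below assumes a hypothetical `best_reply_score` callable (for example, the existing move generator run on the resulting board); the reply scores in the example are made up.

```python
def pick_move(candidates, best_reply_score):
    """Choose the (word, score) pair with the best net result.

    best_reply_score is a hypothetical callable that returns the highest
    score the opponent could make after the given move is played.
    """
    return max(candidates, key=lambda move: move[1] - best_reply_score(move))


# Made-up numbers: the 20-point play opens a 50-point reply.
replies = {"ZAP": 50, "PAT": 12}
candidates = [("ZAP", 20), ("PAT", 15)]
choice = pick_move(candidates, lambda move: replies[move[0]])
print(choice)  # -> ('PAT', 15)
```

A real implementation would not know the opponent's rack, so the reply score would have to be an estimate over the unseen tiles.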


Counting to a million in Python - Theory

I'm learning Python and came across a question that went something like "How long would it take to count to 1,000,000 out loud?" The only parameter it gave was, "you count, on average, 1 digit per second." I did that problem, which wasn't very difficult. Then I started thinking about counting aloud, enunciating each numeral. That parameter seems off to me, and indeed the answer Google gives to the question alone "how long to count to a million" suggests it's off. Given that each number in the sequence takes progressively longer (an exponential increase??), there must be a better way.
Any ideas or general guidance would be of assistance. Would sampling various people's "counting rates" at various intervals work? Would programming the # of syllables work? I am really curious, and have looked all over SO and Google for solutions that don't revolve around that seemingly inaccurate "average time".
Thanks, and sorry if this isn't on topic or in the appropriate place. I'm a long time lurker, but new to posting, so let me know if you need more info or anything. Thanks!
Let us suppose for the sake of simplicity that you don't say 1502 as "fifteen hundred and two", but as "one thousand five hundred and two". Then we can break it down hierarchically.
And let's ignore for now whether you say "and" or not (though apparently it is said more often than not). I will use this reference for how to pronounce numbers (in British English, because I like it more and it's more consistent): http://forum.wordreference.com/showthread.php?t=15&langid=6
In fact, to formally describe this, let t be a function of a set of numbers, that tells you how much time it takes to pronounce every number in that set. Then your question is how to compute t([1..1000000]), and we will use M=t([1..999])
Triplet time as a function of the previous one
To read a large number we start at the left and read the three-digit groups. The group at the left, of course, may have only one or two digits.
Thus for every number x of thousands you will say x thousand y where y will describe all the numbers from 1 to 999.
Thus the time you spend in the x thousand ... is 1000 t({1000x}) + M, as detailed hereafter:
t([1000x..1000x+999]) = sum over y in [0..999] of (t({1000x}) + t({y})) = 1000 t({1000x}) + M
Note that this formula is generalizable to numbers below 1000, by simply defining t({0}) = 0.
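The per-thousand identity can be checked by brute force. This is a minimal sketch under two assumptions that are not in the answer: speaking time is approximated by the number of letters pronounced, and no "and" is said. The helper names are illustrative.

```python
ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def group_words(n):
    """Words for one three-digit group, 0 <= n <= 999, without "and"."""
    words = []
    if n >= 100:
        words += [ONES[n // 100], "hundred"]
        n %= 100
    if n >= 20:
        words.append(TENS[n // 10])
        n %= 10
    if n > 0:
        words.append(ONES[n])
    return words


def number_words(n):
    """Words for 0 <= n <= 999999, read group by group from the left."""
    thousands, rest = divmod(n, 1000)
    words = []
    if thousands:
        words += group_words(thousands) + ["thousand"]
    return words + group_words(rest)


def t(numbers):
    """Stand-in for speaking time: total letters pronounced (an assumption)."""
    return sum(len(word) for n in numbers for word in number_words(n))


M = t(range(1, 1000))

# t([1000x..1000x+999]) == 1000 t({1000x}) + M, with t({0}) == 0 for x == 0.
for x in (0, 1, 42, 999):
    assert t(range(1000 * x, 1000 * x + 1000)) == 1000 * t([1000 * x]) + M
```

Any other per-word cost (syllables, measured seconds) could replace `len(word)` without changing the identity, as long as the time for a number is the sum of the times of its words.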
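Now the time to say "x thousand" is, per our hypothesis, equal to the time to say "x" plus the time to say "thousand" (when x > 0). Thus, summing over x from 0 to 999, your answer is:
t([1..999999]) = 1000 M + 1000 (M + 999 tau("thousand")) = 2000 M + 999000 tau("thousand")
where tau("thousand") is the time it takes to say the word "thousand"; the final number, 1000000 itself, adds t({1000000}) on top. This supposes you say 1000 as "one thousand". You may want to remove 1000 tau("one") if you would only say "thousand".
However, I stick with the reference: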
The numbers 100-199 begin with one hundred... or a hundred...
You can in exactly the same way express the time it takes to count to a billion from and the number above, and so on for all the greater powers of 10^3, i.e.
Taking into account the "and"
There is a small correction to be done. Let us suppose that M is the time it takes to pronounce numbers from 1 to 999 when they are preceded by at least a non-0 group of numbers, including initial "and"s.
Our reference (well, the wordreference post I linked) says the following :
What do we say to join the groups?
Normally, we don’t use any joining word.
The exception is the last group.
If the last group after the thousands is 1-99 it is joined with and.
Thus our correction applies only to the numbers between 0 and 999 (where there is no non-zero group preceding) :
Getting M
Or rather, let's get t([1..999]) since it's more natural and we know how it is related to M.
Let C = t([1..99]), X = t([1..9]).
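Between 1 and 999 we have all the numbers from [1..99] and the 9 exact hundreds, where you don't say "and"; that is 108 occurrences, so "and" is said for the other 891 numbers. There are 900 numbers prefixed with a hundreds number.
Thus
t([1..999]) = 10 C + 100 X + 900 tau("hundred") + 891 tau("and")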
C is probably hard to break down, so I'm not going to try.
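The breakdown of t([1..999]) into C and X can be checked by brute force: each of the 10 hundreds blocks contains the endings 1..99 once (10 C), each hundreds digit is said 100 times (100 X), "hundred" is said 900 times, and "and" is said 999 - 108 = 891 times. A minimal sketch, assuming speaking time is proportional to letter count (a stand-in for tau, not a measured value) and a British-style "and" after the hundreds; the function names are illustrative.

```python
ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def spoken(n):
    """British-style words for 1 <= n <= 999, e.g. 'one hundred and two'."""
    hundreds, rest = divmod(n, 100)
    words = []
    if hundreds:
        words += [ONES[hundreds], "hundred"]
        if rest:
            words.append("and")  # only said when something follows the hundreds
    if rest >= 20:
        words.append(TENS[rest // 10])
        rest %= 10
    if rest:
        words.append(ONES[rest])
    return words


def tau(word):
    """Assumed cost of one word: its letter count."""
    return len(word)


def t(numbers):
    return sum(tau(word) for n in numbers for word in spoken(n))


C = t(range(1, 100))
X = t(range(1, 10))
total = t(range(1, 1000))

assert total == 10 * C + 100 * X + 900 * tau("hundred") + 891 * tau("and")
```

Setting `tau("and")` to zero in the right-hand side corresponds to dropping every "and".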
Final result
The corrected formula is :
And as a function of C and X :
Note that your measures of tau(word), C, and X need to be very precise if you plan on doing this multiplication and having any kind of correct order of magnitude.
Conclusion : Brits end up saying "and" a whole lot. The nice thing about the last formulation is that you can remove all the "and"s if you decide you actually don't want to pronounce them.

Graph traversal

At a party with n people P1, ..., Pn, certain pairs of individuals cannot stand each other.
Given a list of such pairs, determine if we can divide the n people into two groups such that all the people
in each group are amicable, that is, they can stand each other.
Suppose we build a graph G in which two people who cannot stand each other are joined by an edge. Run DFS on G: assign the start vertex s to Group 1, each of its neighbours to Group 2, their neighbours to Group 1, and so on, alternating. If the traversal finishes without a conflict, we have found a valid division; if some edge ends up joining two people in the same group, there is a collision, which means we can't divide them into two groups as the question asks.
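The alternating assignment described above is a two-colouring (bipartiteness) check. A minimal sketch with an explicit DFS stack, assuming people are numbered 1..n and the pairs arrive as tuples; the function name is made up for illustration.

```python
def split_party(n, hostile_pairs):
    """Return (group1, group2) as sorted lists, or None if no split exists."""
    neighbours = {person: [] for person in range(1, n + 1)}
    for a, b in hostile_pairs:
        neighbours[a].append(b)
        neighbours[b].append(a)

    group = {}
    for start in neighbours:          # handle every connected component
        if start in group:
            continue
        group[start] = 1
        stack = [start]
        while stack:
            person = stack.pop()
            for other in neighbours[person]:
                if other not in group:
                    group[other] = 3 - group[person]   # the opposite group
                    stack.append(other)
                elif group[other] == group[person]:
                    return None       # two enemies forced into one group

    group1 = sorted(p for p, g in group.items() if g == 1)
    group2 = sorted(p for p, g in group.items() if g == 2)
    return group1, group2


print(split_party(4, [(1, 2), (2, 3), (3, 4)]))  # -> ([1, 3], [2, 4])
print(split_party(3, [(1, 2), (2, 3), (1, 3)]))  # -> None
```

Each person and each pair is visited a constant number of times, so the running time is linear in n plus the number of pairs.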
One brute force solution would be to find all possible combinations of n choose n/2 people and then verify that everyone in the group is amicable, if so, then you must check everyone in the other half as well. If both sides are happy then you've found a solution. Otherwise, move on to the next combination. Obviously, this is not an ideal solution, but it does work deterministically. Typically in an interview, it is best to start with something that works and iterate on to better ideas.
A more sophisticated solution would compute the complement graph, then remove any edges that are not bi-directional, pick an arbitrary node to start from, use depth-first search, mark every node found in group 1. Then pick any unmarked node, and mark every node found in group 2. If there are any remaining unmarked nodes, then the individuals cannot be divided into two amicable groups.

How to select stop words using tf-idf? (non english corpus)

I have managed to evaluate the tf-idf function for a given corpus. How can I find the stopwords and the best words for each document? I understand that a low tf-idf for a given word and document means that it is not a good word for selecting that document.
Stop-words are those words that appear very commonly across the documents, therefore losing their representativeness. The best way to observe this is to measure the number of documents a term appears in and filter those that appear in more than 50% of them, or the top 500, or some other threshold that you will have to tune.
The best (as in more representative) terms in a document are those with higher tf-idf because those terms are common in the document, while being rare in the collection.
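Both selections can be sketched in a few lines of plain Python. The sketch assumes the documents are already tokenised into lists of terms and uses idf = log(N / df), which is one common variant and not necessarily the one used in the question; the function names and the toy corpus are illustrative.

```python
import math
from collections import Counter


def document_frequency(docs):
    """Number of documents each term appears in."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def stop_words(docs, max_ratio=0.5):
    """Terms that appear in more than max_ratio of the documents."""
    df = document_frequency(docs)
    return {term for term, count in df.items() if count / len(docs) > max_ratio}


def top_terms(docs, doc_index, k=3):
    """The k terms of one document with the highest tf-idf."""
    df = document_frequency(docs)
    tf = Counter(docs[doc_index])
    scores = {term: freq * math.log(len(docs) / df[term])
              for term, freq in tf.items()}
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]


docs = [
    ["the", "cat", "sat", "the", "mat"],
    ["the", "dog", "barked"],
    ["the", "cat", "purred", "purred"],
    ["a", "dog", "ran"],
]
print(stop_words(docs))        # -> {'the'}
print(top_terms(docs, 2, k=1)) # -> ['purred']
```

The 50% ratio is the threshold mentioned above and would need tuning on a real corpus.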
As a quick note, as @Kevin pointed out, very common terms in the collection (i.e., stop-words) produce very low tf-idf anyway. However, they will change some computations, and this would be wrong if you assume they are pure noise (which might not be true depending on the task). In addition, if they are included your algorithm would be slightly slower.
edit:
As @FelipeHammel says, you can directly use the IDF (remember to invert the order) as a measure that is inversely related to df. This is completely equivalent for ranking purposes, and therefore for selecting the top "k" terms. However, it cannot be used directly to select based on ratios (e.g., words that appear in more than 50% of the documents), although a simple threshold will fix that (i.e., selecting terms with idf lower than a specific value). In general, a fixed number of terms is used.
I hope this helps.
From "Introduction to Information Retrieval" book:
tf-idf assigns to term t a weight in document d that is
highest when t occurs many times within a small number of documents (thus lending high discriminating power to those documents);
lower when the term occurs fewer times in a document, or occurs in many documents (thus offering a less pronounced relevance signal);
lowest when the term occurs in virtually all documents.
So words with the lowest tf-idf can be considered stop words.

How can determine dice sum probabilities? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
In trying to solve a particular Project Euler question, I ran into difficulties with a particular mathematical formula. According to this web page (http://www.mathpages.com/home/kmath093.htm), the formula for determining the probability for rolling a sum, T, on a number of dice, n, each with number of sides, s, each numbered 1 to s, can be given as follows:
alt text http://www.freeimagehosting.net/uploads/8294d47194.gif
After I started getting nonsensical answers in my program, I started stepping through, and tried this for some specific values. In particular, I decided to try the formula for a sum T=20, for n=9 dice, each with s=4 sides. As the sum of 9 4-sided dice should give a bell-like curve of results, ranging from 9 to 36, a sum of 20 seems like it should be fairly (relatively speaking) likely. Dropping the values into the formula, I got:
alt text http://www.freeimagehosting.net/uploads/8e7b339e32.gif
Since j runs from 0 to 7, we must add over all j...but for most of these values, the result is 0, because at least one of the choose terms is 0. The only values for j that seem to return non-0 results are 3 and 4. Dropping 3 and 4 into this formula, I got
alt text http://www.freeimagehosting.net/uploads/490f943fa5.gif
Which, when simplified, seemed to go to:
alt text http://www.freeimagehosting.net/uploads/603ca84541.gif
which eventually simplifies down to ~30.75. Now, as a probability, of course, 30.75 is way off...the probability must be between 0 and 1, so something has gone terribly wrong. But I'm not clear what it is.
Could I be misunderstanding the formula? Very possible, though I'm not clear at all where the breakdown would be occurring. Could it be transcribed wrong on the web page? Also possible, but I've found it difficult to find another version of it online to check it against. Could I just be making a silly math error? Also possible...though my program comes up with a similar value, so I think it's more likely that I'm misunderstanding something.
Any hints?
(I would post this on MathOverflow.com, but I don't think it even comes close to being the kind of "postgraduate-level" mathematics that is required to survive there.)
Also: I definitely do not want the answer to the Project Euler question, and I suspect that other people who may stumble across this would feel the same way. I'm just trying to figure out where my math skills are breaking down.
According to mathworld (formula 9 is the relevant one), the formula from your source is wrong.
The correct formula is supposed to be n choose j, not n choose T. That'll really reduce the size of the values within the summation.
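The n-choose-j version can be checked numerically against a direct count for the question's own example (T=20, n=9, s=4). A minimal sketch; the function names are illustrative, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def p_formula(total, n, s):
    """P(sum == total) for n dice with s sides, using the n-choose-j sum."""
    if total < n or total > n * s:
        return 0.0
    ways = sum((-1) ** j * comb(n, j) * comb(total - s * j - 1, n - 1)
               for j in range((total - n) // s + 1))
    return ways / s ** n


def p_bruteforce(total, n, s):
    """Same probability, by counting outcomes one die at a time."""
    counts = {0: 1}
    for _ in range(n):
        nxt = {}
        for subtotal, ways in counts.items():
            for face in range(1, s + 1):
                nxt[subtotal + face] = nxt.get(subtotal + face, 0) + ways
        counts = nxt
    return counts.get(total, 0) / s ** n


print(p_formula(20, 9, 4), p_bruteforce(20, 9, 4))
```

Both functions give 23607/262144, roughly 0.09, for this case, which is a plausible value for a sum slightly below the mean of 22.5.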
The mathworld formula uses k instead of j and p instead of T:
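Written out in that notation, the formula reads as follows. It is transcribed here from the standard result rather than copied from the page, so treat the exact typography as an assumption; p is the target sum, n the number of dice, and s the number of sides.

```latex
P(p, n, s) = \frac{1}{s^{n}} \sum_{k=0}^{\lfloor (p-n)/s \rfloor}
             (-1)^{k} \binom{n}{k} \binom{p - sk - 1}{n - 1}
```

For p=20, n=9, s=4 the sum runs over k = 0, 1, 2 only.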
Take a look at the Wikipedia article on dice.
The formula there looks almost the same, but has one difference. I think it will solve your problem.
I'm going to have to show my ignorance here.... Isn't 9 choose 20 = 0? More generally, isn't n choose T always going to be 0 whenever T > n? Perhaps I'm reading this formula incorrectly (I'm not a math expert), but looking at de Moivre's work, I'm not sure how this formula was derived; it seems slightly off. You might try working up from de Moivre's original math, page 39, in the lemma.

Looking for a good reference on calculating permutations

As a programmer, I frequently need to know
how to calculate the number of permutations of a set, usually
for estimation purposes.
There are a lot of different ways to specify the allowable
combinations, depending on the problem at hand. For example,
given the set of letters A,B,C,D
Assuming a 4 digit result, how many ways can those letters
be arranged?
What if you can have 1,2,3 or 4 digits, then how many ways?
What if you are only allowed to use each letter at most once?
twice?
What if you must avoid the same letter appearing twice in
a row, but if they are not in a row, then twice is ok?
Etc. I'm sure there are many more.
Does anyone know of a web reference or book that talks about
this subject in terms that a non-mathematician can understand?
Thanks!
Assuming a 4 digit result, how many
ways can those letters be arranged?
When picking the 1st digit, you have 4 choices: one of A, B, C and D. It is the same when picking the 2nd, 3rd and 4th, since repetition is allowed,
so you have in total 4*4*4*4 = 256 choices.
What if you can have 1,2,3 or 4
digits, then how many ways?
It is easy to deduce from question 1: add up the counts for each length, 4 + 4*4 + 4*4*4 + 4*4*4*4 = 4 + 16 + 64 + 256 = 340.
What if you are only allowed to use
each letter at most once?
When picking the 1st digit, you have 4 choices: one of A, B, C and D. When picking the 2nd, you have 3 choices, since you cannot reuse the one you picked for the 1st; then 2 choices for the 3rd and 1 choice for the 4th.
So you have in total 4 * 3 * 2 * 1 = 24 choices.
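These counts, and the other variants from the question, can be checked by brute-force enumeration, which is often good enough for estimation on small alphabets. A sketch using itertools; the helper name `arrangements` and its parameters are made up for illustration.

```python
from collections import Counter
from itertools import permutations, product

LETTERS = "ABCD"


def arrangements(length, max_uses=None, allow_adjacent_repeat=True):
    """Count strings of the given length over LETTERS under simple restrictions."""
    count = 0
    for candidate in product(LETTERS, repeat=length):
        if max_uses is not None and max(Counter(candidate).values()) > max_uses:
            continue  # some letter is used too many times
        if not allow_adjacent_repeat and any(
                a == b for a, b in zip(candidate, candidate[1:])):
            continue  # the same letter appears twice in a row
        count += 1
    return count


print(arrangements(4))                               # repetition allowed -> 256
print(sum(arrangements(k) for k in range(1, 5)))     # lengths 1 to 4 -> 340
print(arrangements(4, max_uses=1))                   # each letter once -> 24
print(arrangements(4, max_uses=2))                   # each letter at most twice -> 204
print(arrangements(4, allow_adjacent_repeat=False))  # no letter twice in a row -> 108
```

Enumeration grows as 4^length, so for larger problems the closed forms (4^k, 4!, 4 * 3^3 for the no-adjacent case) are the practical route.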
The topics involved here include combinations, permutations and probability. Here is a good tutorial to understand their difference.
First of all the topics you are speaking of are
Permutations (where the order matters)
Combinations (order doesn't matter)
I would recommend Math Tutor DVD for teaching yourself math topics. The "probability and statistics" disk set will give you the formulas and skill you need to solve the problems. It's great because it's the closest thing you can get to going back to school, because a teacher solves problems on a white board for you.
I've found a clip on the Combinations chapter of the video for you to check out.
If you need to do more than just count the number of combinations and permutations, that is, if you actually need to generate the sequences, then see Donald Knuth's books Generating All Combinations and Partitions and Generating All Tuples and Permutations. He goes into great detail regarding algorithms subject to various restrictions, looking at the advantages and disadvantages of different solutions for each problem.
It all depends on how simple you need the explanation to be.
The topic you are looking for is called "Permutations and Combinations".
Here's a fairly simple introduction. There are dozens like this on the first few pages from Google.
