Making a cryptaritmetic solver in C++ - math

I am planning out a C++ program that takes 3 strings that represent a cryptarithmetic puzzle. For example, given TWO, TWO, and FOUR, the program would find digit substitutions for each letter such that the mathematical expression
TWO
+ TWO
------
FOUR
is true, with the inputs assumed to be right justified. One way to go about this would of course be to just brute force it, assigning every possible substitution for each letter with nested loops, trying the sum repeatedly, etc., until the answer is finally found.
My thought is that though this is terribly inefficient, the underlying loop-check thing may be a feasible (or even necessary) way to go--after a series of deductions are performed to limit the domains of each variable. I'm finding it kind of hard to visualize, but would it be reasonable to first assume a general/padded structure like this (each X represents a not-necessarily distinct digit, and each C is a carry digit, which in this case, will either be 0 or 1)? :
CCC.....CCC
XXX.....XXXX
+ XXX.....XXXX
----------------
CXXX.....XXXX
With that in mind, some more planning thoughts:
-Though leading zeros will not be given in the problem, I probably ought to add enough of them where appropriate to even things out/match operands up.
-I'm thinking I should start with a set of possible values 0-9 for each letter, perhaps stored as vectors in a 'domains' table, and eliminate values from this as deductions are made. For example, if I see some letters lined up like this
A
C
--
A
, I can tell that C is zero and this eliminate all other values from its domain. I can think of quite a few deductions, but generalizing them to all kinds of little situations and putting it into code seems kind of tricky at first glance.
-Assuming I have a good series of deductions that run through things and boot out lots of values from the domains table, I suppose I'd still just loop over everything and hope that the state space is small enough to generate a solution in a reasonable amount of time. But it feels like there has to be more to it than that! -- maybe some clever equations to set up or something along those lines.
Tips are appreciated!

You could iterate over this problem from right to left, i.e. the way you'd perform the actual operation. Start with the rightmost column. For every digit you encounter, you check whether there already is an assignment for that digit. If there is, you use its value and go on. If there isn't, then you enter a loop over all possible digits (perhaps omitting already used ones if you want a bijective map) and recursively continue with each possible assignment. When you reach the sum row, you again check whether the variable for the digit given there is already assigned. If it is not, you assign the last digit of your current sum, and then continue to the next higher valued column, taking the carry with you. If there already is an assignment, and it agrees with the last digit of your result, you proceed in the same way. If there is an assignment and it disagrees, then you abort the current branch, and return to the closest loop where you had other digits to choose from.
The benefit of this approach should be that many variables are determined by a sum, instead of guessed up front. Particularly for letters which only occur in the sum row, this might be a huge win. Furthermore, you might be able to spot errors early on, thus avoiding choices for letters in some cases where the choices you made so far are already inconsistent. A drawback might be the slightly more complicated recursive structure of your program. But once you got that right, you'll also have learned a good deal about turning thoughts into code.

I solved this problem at my blog using a randomized hill-climbing algorithm. The basic idea is to choose a random assignment of digits to letters, "score" the assignment by computing the difference between the two sides of the equation, then altering the assignment (swap two digits) and recompute the score, keeping those changes that improve the score and discarding those changes that don't. That's hill-climbing, because you only accept changes in one direction. The problem with hill-climbing is that it sometimes gets stuck in a local maximum, so every so often you throw out the current attempt and start over; that's the randomization part of the algorithm. The algorithm is very fast: it solves every cryptarithm I have given it in fractions of a second.

Cryptarithmetic problems are classic constraint satisfaction problems. Basically, what you need to do is have your program generate constraints based on the inputs such that you end up with something like the following, using your given example:
O + O = 2O = R + 10Carry1
W + W + Carry1 = 2W + Carry1 = U + 10Carry2
T + T + Carry2 = 2T + Carry2 = O + 10Carry3 = O + 10F
Generalized pseudocode:
for i in range of shorter input, or either input if they're the same length:
shorterInput[i] + longerInput2[i] + Carry[i] = result[i] + 10*Carry[i+1] // Carry[0] == 0
for the rest of the longer input, if one is longer:
longerInput[i] + Carry[i] = result[i] + 10*Carry[i+1]
Additional constraints based on the definition of the problem:
Range(digits) == {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
Range(auxiliary_carries) == {0, 1}
So for your example:
Range(O, W, T) == {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
Range(Carry1, Carry2, F) == {0, 1}
Once you've generated the constraints to limit your search space, you can use CSP resolution techniques as described in the linked article to walk the search space and determine your solution (if one exists, of course). The concept of (local) consistency is very important here and taking advantage of it allows you to possibly greatly reduce the search space for CSPs.
As a simple example, note that cryptarithmetic generally does not use leading zeroes, meaning if the result is longer than both inputs the final digit, i.e. the last carry digit, must be 1 (so in your example, it means F == 1). This constraint can then be propagated backwards, as it means that 2T + Carry2 == O + 10; in other words, the minimum value for T must be 5, as Carry2 can be at most 1 and 2(4)+1==9. There are other methods of enhancing the search (min-conflicts algorithm, etc.), but I'd rather not turn this answer into a full-fledged CSP class so I'll leave further investigation up to you.
(Note that you can't make assumptions like A+C=A -> C == 0 except for in least significant column due to the possibility of C being 9 and the carry digit into the column being 1. That does mean that C in general will be limited to the domain {0, 9}, however, so you weren't completely off with that.)

Related

Why reverse the remainder during decimal to binary conversion?

There is a particular method of converting a decimal (with a decimal point, like xx.xx) to a binary number. It is detailed here: https://www.geeksforgeeks.org/convert-decimal-fraction-binary-number/
I can apply this process but am having trouble understanding WHY it works.
Basically, it calculates the left side of the decimal point separately from the right side - this part I have no issue with.
For example, if we have 6.9, it will start by calculating the left side: 6.
6 divided by 2 gives us 3, with a remainder of 0.
3 divided by 2 gives us 1, with a remainder of 1.
1 divided by 2 gives us 0, with a remainder of 1.
For some reason, it now takes the REVERSE of this, which is 110, and this magically becomes 6.
I don't understand why the remainder of the least significant division (1 divided by 2) is now used in the most significant bit of the answer, and this somehow works.
Similarly confused about why the method for the right hand side works.
Does anyone have some intuition they can share about this particular process of converting decimals to binaries? Again, I have no problem performing the calculation as the computation is quite easy. I simply don't understand why this works.
Think of it like this :
A binary representation b_n, b_(n-1), .., b_0 (least significant bit on the right) represents the number
k = b_n*2^n + b_(n-1)*2^(n-1) + ... + b_0*2^0 (remember that 2^0 is just 1).
To get the least significant bit, you want to know whether this number divides evenly into 2's, because if it doesn't then you know that b_0 == 1 because all the other terms surely divide evenly, as they all have some positive power of 2 in front. Thus the remainder from the division by two is b_0. Don't divide just yet, only get the remainder and write it down.
Now we would like to get rid of that last bit and start over again to get the next one. How can we do that? Simply divide k by two. Because then you get:
k/2 = b_n*2^(n-1) + b_(n-1)*2^(n-2) + ... + b_1*2^0 (Divide each term in the sum by 2, thus decreasing the power. The last term disappears because it was either 0 or 1, which both give 0 when divided by 2)
Or written in binary (without the powers of two) : b_n, b_(n-1), .., b_1.
Now we get a new number which is simply the same as before where the least significant bit has been thrown away and everything shifted to the right. So we can start this whole process again with k/2 to get b_1. And then b_2. And so on.
Here I separated getting the remainder and dividing to make it clearer, but you can do them at the same time if you want to, it's the same thing.
I hope you see how, during this process, we get the bits from right to left, which is why you want to flip the whole thing in the end if you have been writing them down from left to right.

How to represent Graham's number in programming language (specifically python)?

I'm new to programming, and I would like to know how to represent Graham's number in python (the language I decided to learn as my first). I can show someone in real life on paper how to somewhat get to Graham's number, but for anyone who doesn't know what it is, here it is.
So imagine you have a 3. Now I can represent powers of 3 using ^ (up arrow). So 3^3 = 27. 3^^3 = 3^3^3 = 3^(3^3) = 3^27 = 7625597484987. 3^^^3 = 3^7625597484987 = scary big number. Now we can see that every time you add an up arrow, the number gets massively big.
Now imagine 3^^^^3 = 3^(3^7625597484987)... this number is stupid big.
So now that we have 3^(3^7625597484987), we will call this G1. That's 3^^^^3 (4 arrows in between the 3's).
Now G2 is basically 3 with G1 number of arrows in between them. Whatever number 3^(3^7625597484987) is, is the number of arrows in between the 2 3's of G2. So this number has 3^(3^7625597484987) number of arrows.
Now G3 has G2 number of arrows in between the 2 3's. We can see that this number (G3) is just huge. Stupidly huge. So each G has the number of arrows represented by the G before the current G.
Do this over and over again, placing and previous number of "G" arrows into the next number. Do this until G64, THAT'S Graham's number.
Now here is my question. How do you represent "the number of a certain thing (in this case arrows) is the number of arrows in the next G"? How do you represent the number of something "goes into" the next string in programming. If I can't do it in python, please post which languages this would be possible in. Thanks for any responses.
This is an example of a recursively defined number, which can be expressed using a base case and a recursive case. These two cases capture the idea of "every output is calculated from a previous output following the rule '____,' except for the first one which is ____."
A simple canonical one would be the factorial, which uses the base case 1! = 1 (or sometimes 0! = 1) and the recursive case n! = n * (n-1)! (when n>1). Or, in a more code-like fashion, f(1) = 1 and f(n) = n * f(n-1). For more information on how to write a function that behaves like this, find a good resource to learn about recursion.
The G1, G2, G3, etc. in the definition of Graham's number are like these calls to the previous results. You need a function to represent an "arrow" so you can call an arbitrary number of them. So to translate your mental model into code:
"[...] we will call this [3^^^^3] G1. [...] 'The number of a certain thing (in this case arrows) is the number of arrows in the next G' [...]"
Let the aforementioned function be arrow(n) = 3^^^...^3 (where there are n arrows). Then to get Graham's number, your base case will be G(1) = arrow(4), and your recursive case will be G(n) = arrow(G(n-1)) (when n>1).
Note that arrow(n) will also have a recursive definition (see Knuth's up-arrow notation), as you showed but didn't describe, by calculating each output from the previous (emphasis added):
3^3 = 27.
3^^3 = 3^3^3 = 3^(3^3) = 3^27 = 7625597484987.
3^^^3 = 3^7625597484987 = scary big number.
You might describe this as "every output [arrow(n)] is 3 to the power of the previous output, except for the first one [arrow(1)] which is just 3." I'll leave the translation from description to definition as an exercise.
(Also, I hope you didn't actually want to represent Graham's number, but rather just its calculation, since it's too big even for Python or any computer for that matter.)
class GrahamsNumber(object):
pass
G = GrahamsNumber()
For most purposes, this is as good of a representation of Graham's number as any other one. Sure, you can't do anything useful with G, but for the most part you can't do anything useful with Graham's number either, so it accurately reflects realistic use cases.
If you do have some specific use cases (and they're feasible), then a representation can be tailored to allow those use cases; but we can't guess what you want to be able to do with G, you have to spell it out explicitly.
For fun, I implemented hyperoperators in python as the module hyperop and one of my examples is Graham's number:
def GrahamsNumber():
# This may take awhile...
g = 4
for n in range(1,64+1):
g = hyperop(g+2)(3,3)
return g
The magic is in the recursion, which loosely stated looks like H[n](x,y) = reduce(lambda x,y: H[n-1](y,x), [a,]*b).
To answer your question, this function (in theory) will calculate the number in question. Since there is no way this program will ever finish before the heat death of the Universe, you gain more of an understanding from the "function" itself than the actual value.

derangements and permutations in cryptography

i have a problem that i am having a bit of trouble with;
we are given a partial key (missing 11 letters) for a mono-alphabetic substitution cipher and asked to calculate the number of possible keys given that no plaintext letter can be mapped to itself.
ordinarily, the number of possible keys would be the number of derangements of the missing letters (!11), however 5 of the plaintext letters that are missing mappings already exist as mappings in the partial key, so logically it shouldnt matter what the mapping of those plaintext letters is, because they can never map to themselves.
so shouldnt the number of possible keys be 5! * !6, ie. (the number of permutations of the 5 already mapped free letters) * (the number of derangements of the remaining 6)?
the problem is that 5! * !6 = 31800 which is much less than !11 = 14684570
intuitively the set of derangements should be a smaller subset of !11, shouldnt it?
am i just getting something wrong in my arithmetic? or am i completely missing the concepts? any help would be greatly appreciated
thanks gus
ps. i know this isn't strictly a programming question, but it is a computing question and related to a programming project, so i thought it might be pertinent. also, i posted it on math.stackexchange.com yesterday but havent had any responses yet..
EDIT: corrected the value of !11
I think your problem can be rephrased as the following:
How many permutation has a list with elements a_0, a_1, ... a_n-1, b_0, b_1, ..., b_m-1, in which no a_k element is at position k? (Let us denote this number with p_{n,m} - your specific question is the value of p_{6,5}.)
Please note that your suggested formula 5!*!6 is not correct because of the following:
it only counts the cases, where the a_ks are in the first 6 positions (without any of them being in the position of its own index), and the b_ks on the last 5.
You do not count any other configurations like: a_3, b_4, b_1, a_0, a_5, b_0, a_2, b_2, b_3, a_1, a_4, where the order is totally mixed.
Your other idea about the result being a subset of the !11-element derangement on all the elements is also not correct, as any of the b_ks can be at any position.
However, we can easily add a recursive formula for p_{n,m} by separating it into two cases based on the position of a_0.
If a_0 gets in one of the positions 1, 2, ..., n-1. (n-1 different possibilities.)
This means that neither a_0 is at position 0, and it also prevents another a_k from being at position k by occupying that position. Thus this a_k becomes 'free', it can go to any other positions. If a_0 gets fixed this way, the other elements can be permutated in p_{n-2,m+1} different ways.
If a_0 gets in one of the positions n, n+1, ..., n+m-1. (m different possibilities.)
This way no other a_k gets prevented to be at the position corresponding to it's index. The other elements can be permutated in p_{n-1,m} different ways.
Adding this together gives the recursion: p_{n,m} = (n-1)*p_{n-2,m+1} + m*p_{n-1,m}. The halting conditions are p_{0,m}=m! for every m, as it means, that each element can be at any location.
I also coded it in python:
import math
def derange(n,m):
if n<0:
return 0
elif n==0:
return math.factorial(m)
else:
return (n-1)*derange(n-2, m+1) + m*derange(n-1, m)
print derange(6,5)
gives 22852200.
If you are interested in the general case, you can find some related sequences on OEIS.
The search term 'differences of factorial numbers' can be interesting, e.g. in triangular form: http://oeis.org/A047920.
There is also an article mentioned there: http://www.pmfbl.org/janjic/enumfun.pdf, maybe it can help if you are interested in a generic closed formula for n and m.
Suddenly I didn't have any good idea to come up with, but I think this can be a good starting point.

log-sum-exp trick why not recursive

I have been researching the log-sum-exp problem. I have a list of numbers stored as logarithms which I would like to sum and store in a logarithm.
the naive algorithm is
def naive(listOfLogs):
return math.log10(sum(10**x for x in listOfLogs))
many websites including:
logsumexp implementation in C?
and
http://machineintelligence.tumblr.com/post/4998477107/
recommend using
def recommend(listOfLogs):
maxLog = max(listOfLogs)
return maxLog + math.log10(sum(10**(x-maxLog) for x in listOfLogs))
aka
def recommend(listOfLogs):
maxLog = max(listOfLogs)
return maxLog + naive((x-maxLog) for x in listOfLogs)
what I don't understand is if recommended algorithm is better why should we call it recursively?
would that provide even more benefit?
def recursive(listOfLogs):
maxLog = max(listOfLogs)
return maxLog + recursive((x-maxLog) for x in listOfLogs)
while I'm asking are there other tricks to make this calculation more numerically stable?
Some background for others: when you're computing an expression of the following type directly
ln( exp(x_1) + exp(x_2) + ... )
you can run into two kinds of problems:
exp(x_i) can overflow (x_i is too big), resulting in numbers that you can't add together
exp(x_i) can underflow (x_i is too small), resulting in a bunch of zeroes
If all the values are big, or all are small, we can divide by some exp(const) and add const to the outside of the ln to get the same value. Thus if we can pick the right const, we can shift the values into some range to prevent overflow/underflow.
The OP's question is, why do we pick max(x_i) for this const instead of any other value? Why don't we recursively do this calculation, picking the max out of each subset and computing the logarithm repeatedly?
The answer: because it doesn't matter.
The reason? Let's say x_1 = 10 is big, and x_2 = -10 is small. (These numbers aren't even very large in magnitude, right?) The expression
ln( exp(10) + exp(-10) )
will give you a value very close to 10. If you don't believe me, go try it. In fact, in general, ln( exp(x_1) + exp(x_2) + ... ) will give be very close to max(x_i) if some particular x_i is much bigger than all the others. (As an aside, this functional form, asymptotically, actually lets you mathematically pick the maximum from a set of numbers.)
Hence, the reason we pick the max instead of any other value is because the smaller values will hardly affect the result. If they underflow, they would have been too small to affect the sum anyway, because it would be dominated by the largest number and anything close to it. In computing terms, the contribution of the small numbers will be less than an ulp after computing the ln. So there's no reason to waste time computing the expression for the smaller values recursively if they will be lost in your final result anyway.
If you wanted to be really persnickety about implementing this, you'd divide by exp(max(x_i) - some_constant) or so to 'center' the resulting values around 1 to avoid both overflow and underflow, and that might give you a few extra digits of precision in the result. But avoiding overflow is much more important about avoiding underflow, because the former determines the result and the latter doesn't, so it's much simpler just to do it this way.
Not really any better to do it recursively. The problem's just that you want to make sure your finite-precision arithmetic doesn't swamp the answer in noise. By dealing with the max on its own, you ensure that any junk is kept small in the final answer because the most significant component of it is guaranteed to get through.
Apologies for the waffly explanation. Try it with some numbers yourself (a sensible list to start with might be [1E-5,1E25,1E-5]) and see what happens to get a feel for it.
As you have defined it, your recursive function will never terminate. That's because ((x-maxlog) for x in listOfLogs) still has the same number of elements as listOfLogs.
I don't think that this is easily fixable either, without significantly impacting either the performance or the precision (compared to the non-recursive version).

modifying an element of a list in-place in J, can it be done?

I have been playing with an implementation of lookandsay (OEIS A005150) in J. I have made two versions, both very simple, using while. type control structures. One recurs, the other loops. Because I am compulsive, I started running comparative timing on the versions.
look and say is the sequence 1 11 21 1211 111221 that s, one one, two ones, etc.
For early elements of the list (up to around 20) the looping version wins, but only by a tiny amount. Timings around 30 cause the recursive version to win, by a large enough amount that the recursive version might be preferred if the stack space were adequate to support it. I looked at why, and I believe that it has to do with handling intermediate results. The 30th number in the sequence has 5808 digits. (32nd number, 9898 digits, 34th, 16774.)
When you are doing the problem with recursion, you can hold the intermediate results in the recursive call, and the unstacking at the end builds the results so that there is minimal handling of the results.
In the list version, you need a variable to hold the result. Every loop iteration causes you to need to add two elements to the result.
The problem, as I see it, is that I can't find any way in J to modify an extant array without completely reassigning it. So I am saying
try. o =. o,e,(0&{y) catch. o =. e,(0&{y) end.
to put an element into o where o might not have a value when we start. That may be notably slower than
o =. i.0
.
.
.
o =. (,o),e,(0&{y)
The point is that the result gets the wrong shape without the ravels, or so it seems. It is inheriting a shape from i.0 somehow.
But even functions like } amend don't modify a list, they return a list that has a modification made to it, and if you want to save the list you need to assign it. As the size of the assigned list increases (as you walk the the number from the beginning to the end making the next number) the assignment seems to take more time and more time. This assignment is really the only thing I can see that would make element 32, 9898 digits, take less time in the recursive version while element 20 (408 digits) takes less time in the loopy version.
The recursive version builds the return with:
e,(0&{y),(,lookandsay e }. y)
The above line is both the return line from the function and the recursion, so the whole return vector gets built at once as the call gets to the end of the string and everything unstacks.
In APL I thought that one could say something on the order of:
a[1+rho a] <- new element
But when I try this in NARS2000 I find that it causes an index error. I don't have access to any other APL, I might be remembering this idiom from APL Plus, I doubt it worked this way in APL\360 or APL\1130. I might be misremembering it completely.
I can find no way to do that in J. It might be that there is no way to do that, but the next thought is to pre-allocate an array that could hold results, and to change individual entries. I see no way to do that either - that is, J does not seem to support the APL idiom:
a<- iota 5
a[3] <- -1
Is this one of those side effect things that is disallowed because of language purity?
Does the interpreter recognize a=. a,foo or some of its variants as a thing that it should fastpath to a[>:#a]=.foo internally?
This is the recursive version, just for the heck of it. I have tried a bunch of different versions and I believe that the longer the program, the slower, and generally, the more complex, the slower. Generally, the program can be chained so that if you want the nth number you can do lookandsay^: n ] y. I have tried a number of optimizations, but the problem I have is that I can't tell what environment I am sending my output into. If I could tell that I was sending it to the next iteration of the program I would send it as an array of digits rather than as a big number.
I also suspect that if I could figure out how to make a tacit version of the code, it would run faster, based on my finding that when I add something to the code that should make it shorter, it runs longer.
lookandsay=: 3 : 0
if. 0 = # ,y do. return. end. NB. return on empty argument
if. 1 ~: ##$ y do. NB. convert rank 0 argument to list of digits
y =. (10&#.^:_1) x: y
f =. 1
assert. 1 = ##$ y NB. the converted argument must be rank 1
else.
NB. yw =. y
f =. 0
end.
NB. e should be a count of the digits that match the leading digit.
e=.+/*./\y=0&{y
if. f do.
o=. e,(0&{y),(,lookandsay e }. y)
assert. e = 0&{ o
10&#. x: o
return.
else.
e,(0&{y),(,lookandsay e }. y)
return.
end.
)
I was interested in the characteristics of the numbers produced. I found that if you start with a 1, the numerals never get higher than 3. If you start with a numeral higher than 3, it will survive as a singleton, and you can also get a number into the generated numbers by starting with something like 888888888 which will generate a number with one 9 in it and a single 8 at the end of the number. But other than the singletons, no digit gets higher than 3.
Edit:
I did some more measuring. I had originally written the program to accept either a vector or a scalar, the idea being that internally I'd work with a vector. I had thought about passing a vector from one layer of code to the other, and I still might using a left argument to control code. With I pass the top level a vector the code runs enormously faster, so my guess is that most of the cpu is being eaten by converting very long numbers from vectors to digits. The recursive routine always passes down a vector when it recurs which might be why it is almost as fast as the loop.
That does not change my question.
I have an answer for this which I can't post for three hours. I will post it then, please don't do a ton of research to answer it.
assignments like
arr=. 'z' 15} arr
are executed in place. (See JWiki article for other supported in-place operations)
Interpreter determines that only small portion of arr is updated and does not create entire new list to reassign.
What happens in your case is not that array is being reassigned, but that it grows many times in small increments, causing memory allocation and reallocation.
If you preallocate (by assigning it some large chunk of data), then you can modify it with } without too much penalty.
After I asked this question, to be honest, I lost track of this web site.
Yes, the answer is that the language has no form that means "update in place, but if you use two forms
x =: x , most anything
or
x =: most anything } x
then the interpreter recognizes those as special and does update in place unless it can't. There are a number of other specials recognized by the interpreter, like:
199(1000&|#^)199
That combined operation is modular exponentiation. It never calculates the whole exponentiation, as
199(1000&|^)199
would - that just ends as _ without the #.
So it is worth reading the article on specials. I will mark someone else's answer up.
The link that sverre provided above ( http://www.jsoftware.com/jwiki/Essays/In-Place%20Operations ) shows the various operations that support modifying an existing array rather than creating a new one. They include:
myarray=: myarray,'blah'
If you are interested in a tacit version of the lookandsay sequence see this submission to RosettaCode:
las=: ,#((# , {.);.1~ 1 , 2 ~:/\ ])&.(10x&#.inv)#]^:(1+i.#[)
5 las 1
11 21 1211 111221 312211

Resources