How to bruteforce a lossy AND routine? - math

I'm wondering whether there are any standard approaches to reversing AND routines by brute force.
For example I have the following transformation:
MOV(eax, 0x5b3e0be0) # Here we move 0x5b3e0be0 to EAX.
MOV(edx, eax) # Here we copy 0x5b3e0be0 to EDX as well.
SHL(edx, 0x7) # Shift 0x5b3e0be0 left by 0x7, which results in 0x9f05f000
AND(edx, 0x9d2c5680) # AND 0x9f05f000 with 0x9d2c5680 which results in 0x9d045000
XOR(edx, eax) # XOR 0x9d045000 with original value 0x5b3e0be0 which results in 0xc63a5be0
My question is how to brute force and reverse this routine (i.e. transform 0xc63a5be0 back into 0x5b3e0be0).
One idea I had (which didn't work) was this PeachPy implementation:
#Input values
MOV(esi, 0xffffffff) < Initial value to AND with, which will be decreased by 1 in a loop.
MOV(cl, 0x1) < Initial value to SHR with which will be increased by 1 until 0x1f.
MOV(eax, 0xc63a5be0) < Target result which I'm looking to get using the below loop.
MOV(edx, 0x5b3e0be0) < Input value which will be transformed.
sub_esi = peachpy.x86_64.Label()
with loop:
#End the loop if ESI = 0x0
TEST(esi, esi)
JZ(loop.end)
#Test the routine and check if it matches end result.
MOV(ebx, eax)
SHR(ebx, cl)
TEST(ebx, ebx)
JZ(sub_esi)
AND(ebx, esi)
XOR(ebx, eax)
CMP(ebx, edx)
JZ(loop.end)
#Add to the CL register which is used for SHR.
#Also check if we've reached the last potential value of CL which is 0x1f
ADD(cl, 0x1)
CMP(cl, 0x1f)
JNZ(loop.begin)
#Decrement ESI by 1, reset CL and restart routine.
peachpy.x86_64.LABEL(sub_esi)
SUB(esi, 0x1)
MOV(cl, 0x1)
JMP(loop.begin)
#The ESI result here will either be 0x0 or a valid value to AND with and get the necessary result.
RETURN(esi)
Maybe an article or a book you can recommend specific to this?

It's not lossy: the final operation is an XOR.
The whole routine can be modeled in C as
#define K 0x9d2c5680
uint32_t hash(uint32_t num)
{
return num ^ ( (num << 7) & K);
}
Now, if we have two bits x and y and the operation x XOR y, when y is zero the result is x.
So given two numbers n1 and n2 and considering their XOR, the bits of n1 that pair with a zero in n2 make it to the result unchanged (the others are flipped).
So in considering num ^ ( (num << 7) & K) we can identify num with n1 and (num << 7) & K with n2.
Since n2 is an AND, we can tell that it must have at least the same zero bits that K has.
This means that each bit of num that corresponds to a zero bit in the constant K will make it unchanged into the result.
Thus, by extracting those bits from the result we already have a partial inverse function:
/*hash & ~K extracts the bits of hash that pair with a zero bit in K*/
partial_num = hash & ~K
Technically, the factor num << 7 would also introduce other zeros in the result of the AND. We know for sure that the lowest 7 bits must be zero.
However K already has the lowest 7 bits zero, so we cannot exploit this information.
So we will just use K here, but if its value were different you'd need to consider the AND (which, in practice, means to zero the lower 7 bits of K).
This leaves us with 13 bits unknown (the ones corresponding to the bits that are set in K).
If we forget about the AND for a moment, we would have x ^ (x << 7) meaning that
h_i = num_i for i from 0 to 6 inclusive
h_i = num_i ^ num_(i-7) for i from 7 to 31 inclusive
(The first line is due to the fact that the lower 7 bits of the right-hand operand are zero.)
From this, starting from h7 and going up, we can retrieve num7 as h7 ^ num0 = h7 ^ h0.
For k from 7 onward the equality num_k = h_k no longer holds, so we need to use num_k itself (for the suitable k), but luckily we have already computed its value in a previous step (that's why we go from lower to higher bits).
What the AND does to this is just restricting the values the index i runs in, specifically only to the bits that are set in K.
So to fill in the thirteen remaining bits one has to do:
part_num7 = h7 ^ part_num0
part_num9 = h9 ^ part_num2
part_num10 = h10 ^ part_num3
part_num12 = h12 ^ part_num5
...
part_num31 = h31 ^ part_num24
Note that we exploited the fact that part_num0..6 = h0..6.
Here's a C program that inverts the function:
#include <stdio.h>
#include <stdint.h>
#define BIT(i, hash, result) ( (((result >> i) ^ (hash >> (i+7))) & 0x1) << (i+7) )
#define K 0x9d2c5680
uint32_t base_candidate(uint32_t hash)
{
uint32_t result = hash & ~K;
result |= BIT(0, hash, result);
result |= BIT(2, hash, result);
result |= BIT(3, hash, result);
result |= BIT(5, hash, result);
result |= BIT(7, hash, result);
result |= BIT(11, hash, result);
result |= BIT(12, hash, result);
result |= BIT(14, hash, result);
result |= BIT(17, hash, result);
result |= BIT(19, hash, result);
result |= BIT(20, hash, result);
result |= BIT(21, hash, result);
result |= BIT(24, hash, result);
return result;
}
uint32_t hash(uint32_t num)
{
return num ^ ( (num << 7) & K);
}
int main()
{
uint32_t tester = 0x5b3e0be0;
uint32_t candidate = base_candidate(hash(tester));
printf("candidate: %x, tester %x\n", candidate, tester);
return 0;
}
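The same low-to-high recovery can also be written as a loop over the set bits of K instead of thirteen unrolled macro calls. Below is a minimal Python sketch of that idea; the names hash32 and invert_hash32 are made up for this illustration and are not part of the original answer.

```python
K = 0x9D2C5680
MASK32 = 0xFFFFFFFF


def hash32(num):
    """Forward routine: num ^ ((num << 7) & K), truncated to 32 bits."""
    return (num ^ ((num << 7) & K)) & MASK32


def invert_hash32(h):
    """Recover num from h, one bit at a time, from low to high."""
    # Bits that pair with a zero bit of K pass through unchanged.
    num = h & ~K & MASK32
    for i in range(7, 32):
        if (K >> i) & 1:
            # num_i = h_i ^ num_(i-7); bit i-7 is already known at this point.
            bit = ((h >> i) ^ (num >> (i - 7))) & 1
            num |= bit << i
    return num


print(hex(invert_hash32(0xC63A5BE0)))  # 0x5b3e0be0
```

Looping over the bits of K means a different constant or shift amount only changes the two numbers at the top, at the cost of a few more iterations than the unrolled C version.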

Since the original question was how to "bruteforce" rather than solve, here's something that I eventually came up with which works just as well. Obviously it's prone to errors depending on the input (there might be more than one result).
import peachpy.x86_64
from peachpy import *
from peachpy.x86_64 import *
input = 0xc63a5be0
x = Argument(uint32_t)
with Function("DotProduct", (x,), uint32_t) as asm_function:
    LOAD.ARGUMENT(edx, x) # EDX = target value to match (0xc63a5be0 here)
    MOV(esi, 0xffffffff)
    with Loop() as loop:
        TEST(esi,esi)
        JZ(loop.end)
        MOV(eax, esi)
        SHL(eax, 0x7)
        AND(eax, 0x9d2c5680)
        XOR(eax, esi)
        CMP(eax, edx)
        JZ(loop.end)
        SUB(esi, 0x1)
        JMP(loop.begin)
    RETURN(esi)
#Read Assembler Return
abi = peachpy.x86_64.abi.detect()
encoded_function = asm_function.finalize(abi).encode()
python_function = encoded_function.load()
print(hex(python_function(input)))
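If brute force is still the goal, the search space can be cut from 2^32 candidates to 2^13, because only the 13 bits that line up with set bits of the constant 0x9d2c5680 are unknown; the other bits can be copied from the target. A rough Python sketch under that assumption (brute_force and hash32 are names invented here, not part of the PeachPy code):

```python
K = 0x9D2C5680
MASK32 = 0xFFFFFFFF


def hash32(num):
    """Forward routine: num ^ ((num << 7) & K), truncated to 32 bits."""
    return (num ^ ((num << 7) & K)) & MASK32


def brute_force(target):
    """Try every assignment of the 13 bits selected by K; return all matches."""
    known = target & ~K & MASK32     # bits copied straight from the target
    matches = []
    sub = K
    while True:
        candidate = known | sub
        if hash32(candidate) == target:
            matches.append(candidate)
        if sub == 0:
            break
        sub = (sub - 1) & K          # next smaller submask of K
    return matches


print([hex(m) for m in brute_force(0xC63A5BE0)])  # ['0x5b3e0be0']
```

For this particular routine the list should contain exactly one entry, since the transformation is invertible.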

Related

Dynamic programming to solve the fibwords problem

Problem Statement: The Fibonacci word sequence of bit strings is defined as:
F(0) = 0, F(1) = 1
F(n) = F(n − 1) + F(n − 2) if n ≥ 2
For example : F(2) = F(1) + F(0) = 10, F(3) = F(2) + F(1) = 101, etc.
Given a bit pattern p and a number n, how often does p occur in F(n)?
Input:
The first line of each test case contains the integer n (0 ≤ n ≤ 100). The second line contains the bit
pattern p. The pattern p is nonempty and has a length of at most 100 000 characters.
Output:
For each test case, display its case number followed by the number of occurrences of the bit pattern p in
F(n). Occurrences may overlap. The number of occurrences will be less than 2^63.
Sample input:
6
10
Sample output:
Case 1: 5
I implemented a divide and conquer algorithm to solve this problem, based on the hints that I found on the internet: We can think of the process of going from F(n-1) to F(n) as a string replacement rule: every '1' becomes '10' and '0' becomes '1'. Here is my code:
#include <string>
#include <iostream>
using namespace std;
#define LL long long int
LL count = 0;
string F[40];
void find(LL n, char ch1,char ch2 ){//Find occurrences of either "11" / "01" / "10" in F[n]
LL n1 = F[n].length();
for (int i = 0;i+1 <n1;++i){
if (F[n].at(i)==ch1&&F[n].at(i+1)==ch2) ++ count;
}
}
void find(char ch, LL n){
LL n1 = F[n].length();
for (int i = 0;i<n1;++i){
if (F[n].at(i)==ch) ++count;
}
}
void solve(string p, LL n){//Recursion
// cout << p << endl;
LL n1 = p.length();
if (n<=1&&n1>=2) return;//return if string pattern p's size is larger than F(n)
//When p's size is reduced to 2 or 1, it's small enough now that we can search for p directly in F(n)
if (n1<=2){
if (n1 == 2){
if (p=="00") return;//Return since there can't be two subsequent '0' in F(n) for any n
else find(n,p.at(0),p.at(1));
return;
}
if (n1 == 1){
if (p=="1") find('1',n);
else find('0',n);
return;
}
}
string p1, p2;//if the last character in p is 1, we can replace it with either '1' or '0'
//p1 stores the substring ending in '1' and p2 stores the substring ending in '0'
for (LL i = 0;i<n1;++i){//We replace every "10" with 1, "1" with 0.
if (p[i]=='1'){
if (p[i+1]=='0'&&(i+1)!= n1){
if (p[i+2]=='0'&&(i+2)!= n1) return;//Return if there are two subsequent '0'
p1.append("1");//Replace "10" with "1"
++i;
}
else {
p1.append("0");//Replace "1" with "0"
}
}
else {
if (p[i+1]=='0'&&(i+1)!= n1){//Return if there are two subsequent '0'
return;
}
p1.append("1");
}
}
solve(p1,n-1);
if (p[n1-1]=='1'){
p2 = p1;
p2.back() = '1';
solve(p2,n-1);
}
}
int main(){
F[0] = "0";F[1] = "1";
for (int i = 2;i<38;++i){
F[i].append(F[i-1]);
F[i].append(F[i-2]);
}//precalculate F(0) to F(37)
LL t = 0;//NumofTestcases
int n; string p;
while (cin >> n >> p) {
count = 0;
solve(p,n);
cout << "Case " << ++t << ": " << count << endl;
}
}
The above program works fine, but only with small inputs. When I submitted it to Codeforces I got a wrong answer because, although I shorten the pattern string p and reduce n to n', the size of F[n'] is still very large (n' >= 50). How can I modify my code to make it work in this case, or is there another approach (such as dynamic programming)? Many thanks for any advice.
More details about the problem can be found here: https://codeforces.com/group/Ir5CI6f3FD/contest/273369/problem/B
I don't have time now to try to code this up myself, but I have a suggested approach.
First, I should note, that while that hint you used is certainly accurate, I don't see any straightforward way to solve the problem. Perhaps the correct follow-up to that would be simpler than what I'm suggesting.
My approach:
Find the first two ns such that length(F(n)) >= length(pattern). Calculating these is a simple recursion. The important insight is that every subsequent value will start with one of these two values, and will also end with one of them. (This is true for all adjacent values -- for any m > n, F(m) will begin either with F(n) or with F(n - 1). It's not hard to see why.)
Calculate and cache the number of occurrences of the pattern in these two Fs, by whatever index shifting technique makes sense.
For F(n+1) (and all subsequent values) calculate by adding together
The count for F(n)
The count for F(n - 1)
The count for those spanning both F(n) and F(n - 1). We can achieve that by testing every breakdown of pattern into (nonempty) prefix and suffix values (i.e., splitting at every internal index) and counting those where F(n) ends in prefix and F(n - 1) starts with suffix. But we don't have to have all of F(n) and F(n - 1) to do this. We just need the tail of F(n) and the head of F(n - 1) of the length of the pattern. So we don't need to calculate all of F(n). We just need to know which of those two initial values our current one ends with. But the start is always the predecessor, and the end oscillates between the previous two. It should be easy to keep track.
The time complexity then should be proportional to the product of n and the length of the pattern.
If I find time tomorrow, I'll see if I can code this up. But it won't be in C -- those years were short and long gone.
Collecting the list of prefix/suffix pairs can be done once ahead of time
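A minimal Python sketch of this idea follows. Instead of the full strings it keeps, for each F(n), only the match count plus the first and last len(p) - 1 characters, which is all that is needed to count matches that straddle a boundary. The names fibword_count and count_overlapping are hypothetical, the boundary matches are counted by a plain search rather than the prefix/suffix table suggested above, and the code has not been run against the judge.

```python
def count_overlapping(text, pattern):
    """Count occurrences of pattern in text, overlaps included."""
    count = 0
    pos = text.find(pattern)
    while pos != -1:
        count += 1
        pos = text.find(pattern, pos + 1)
    return count


def fibword_count(n, pattern):
    keep = len(pattern) - 1  # characters needed on each side of a boundary

    def summarize(word):
        # (prefix, suffix, count) of a word small enough to hold in full
        return (word[:keep], word[max(0, len(word) - keep):],
                count_overlapping(word, pattern))

    older, newer = summarize("0"), summarize("1")  # F(0), F(1)
    if n == 0:
        return older[2]
    for _ in range(2, n + 1):
        pre1, suf1, cnt1 = newer   # F(i-1)
        pre2, suf2, cnt2 = older   # F(i-2)
        # Any match inside suf1 + pre2 is longer than either piece,
        # so it must straddle the boundary between F(i-1) and F(i-2).
        crossing = count_overlapping(suf1 + pre2, pattern)
        tail = suf1 + suf2
        current = ((pre1 + pre2)[:keep],
                   tail[max(0, len(tail) - keep):],
                   cnt1 + cnt2 + crossing)
        older, newer = newer, current
    return newer[2]


print(fibword_count(6, "10"))  # 5
```

Python integers do not overflow, so the counts stay exact even for n = 100; a C or C++ port would need a 64-bit counter, which the problem statement says is sufficient.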

Finding (a ^ x) % m from a % m. This is about utilizing a % m to calculate (a ^ x) % m. % is the modulus operator [duplicate]

I want to calculate a^b mod n for use in RSA decryption. My code (below) returns incorrect answers. What is wrong with it?
unsigned long int decrypt2(int a,int b,int n)
{
unsigned long int res = 1;
for (int i = 0; i < (b / 2); i++)
{
res *= ((a * a) % n);
res %= n;
}
if (b % n == 1)
res *=a;
res %=n;
return res;
}
You can try this C++ code. I've used it with 32 and 64-bit integers. I'm sure I got this from SO.
template <typename T>
T modpow(T base, T exp, T modulus) {
base %= modulus;
T result = 1;
while (exp > 0) {
if (exp & 1) result = (result * base) % modulus;
base = (base * base) % modulus;
exp >>= 1;
}
return result;
}
You can find this algorithm and related discussion in the literature on p. 244 of
Schneier, Bruce (1996). Applied Cryptography: Protocols, Algorithms, and Source Code in C, Second Edition (2nd ed.). Wiley. ISBN 978-0-471-11709-4.
Note that the multiplications result * base and base * base are subject to overflow in this simplified version. If the modulus is more than half the width of T (i.e. more than the square root of the maximum T value), then one should use a suitable modular multiplication algorithm instead - see the answers to Ways to do modulo multiplication with primitive types.
In order to calculate pow(a,b) % n to be used for RSA decryption, the best algorithm I came across is Primality Testing 1) which is as follows:
int modulo(int a, int b, int n){
long long x=1, y=a;
while (b > 0) {
if (b%2 == 1) {
x = (x*y) % n; // multiplying with base
}
y = (y*y) % n; // squaring the base
b /= 2;
}
return x % n;
}
See below reference for more details.
1) Primality Testing : Non-deterministic Algorithms – topcoder
Usually it's something like this:
while (b)
{
if (b % 2) { res = (res * a) % n; }
a = (a * a) % n;
b /= 2;
}
return res;
The only actual logic error that I see is this line:
if (b % n == 1)
which should be this:
if (b % 2 == 1)
But your overall design is problematic: your function performs O(b) multiplications and modulus operations, but your use of b / 2 and a * a implies that you were aiming to perform O(log b) operations (which is usually how modular exponentiation is done).
Doing the raw power operation is very costly, hence you can apply the following logic to simplify the decryption.
From here,
Now say we want to encrypt the message m = 7, c = m^e mod n = 7^3 mod 33
= 343 mod 33 = 13. Hence the ciphertext c = 13.
To check decryption we compute m' = c^d mod n = 13^7 mod 33 = 7. Note
that we don't have to calculate the full value of 13 to the power 7
here. We can make use of the fact that a = bc mod n = (b mod n).(c mod
n) mod n so we can break down a potentially large number into its
components and combine the results of easier, smaller calculations to
calculate the final value.
One way of calculating m' is as follows:- Note that any number can be
expressed as a sum of powers of 2. So first compute values of 13^2,
13^4, 13^8, ... by repeatedly squaring successive values modulo 33. 13^2
= 169 ≡ 4, 13^4 = 4.4 = 16, 13^8 = 16.16 = 256 ≡ 25. Then, since 7 = 4 + 2 + 1, we have m' = 13^7 = 13^(4+2+1) = 13^4.13^2.13^1 ≡ 16 x 4 x 13 = 832
≡ 7 mod 33
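The repeated-squaring steps in the worked example above can be sketched in Python as follows. This is an illustrative version (the name modpow is chosen here); Python's built-in three-argument pow(base, exponent, modulus) computes the same result.

```python
def modpow(base, exponent, modulus):
    """Right-to-left binary exponentiation: O(log exponent) multiplications."""
    result = 1 % modulus
    base %= modulus
    while exponent > 0:
        if exponent & 1:                 # this power of two is part of the exponent
            result = (result * base) % modulus
        base = (base * base) % modulus   # 13 -> 13^2 -> 13^4 -> ... (mod 33)
        exponent >>= 1
    return result


print(modpow(7, 3, 33))    # 13 (encryption step of the example)
print(modpow(13, 7, 33))   # 7  (decryption step of the example)
```

Because every intermediate value is reduced modulo n, no number larger than (n - 1)^2 is ever formed; in a fixed-width language that product still has to fit the integer type.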
Are you trying to calculate (a^b)%n, or a^(b%n) ?
If you want the first one, then your code only works when b is an even number, because of that b/2. The "if b%n==1" is incorrect because you don't care about b%n here, but rather about b%2.
If you want the second one, then the loop is wrong because you're looping b/2 times instead of (b%n)/2 times.
Either way, your function is unnecessarily complex. Why do you loop until b/2 and try to multiply in 2 a's each time? Why not just loop until b and multiply in one a each time? That would eliminate a lot of unnecessary complexity and thus eliminate potential errors. Are you thinking that you'll make the program faster by cutting the number of times through the loop in half? Frankly, that's a bad programming practice: micro-optimization. It doesn't really help much: You still multiply by a the same number of times, all you do is cut down on the number of times testing the loop. If b is typically small (like one or two digits), it's not worth the trouble. If b is large -- if it can be in the millions -- then this is insufficient, you need a much more radical optimization.
Also, why do the %n each time through the loop? Why not just do it once at the end?
Calculating pow(a,b) mod n
A key problem with OP's code is a * a. This is int overflow (undefined behavior) when a is large enough. The type of res is irrelevant in the multiplication of a * a.
The solution is to ensure either:
the multiplication is done with 2x wide math or
with modulus n, n*n <= type_MAX + 1
There is no reason to return a wider type than the type of the modulus, as the result is always representable in that type.
// unsigned long int decrypt2(int a,int b,int n)
int decrypt2(int a,int b,int n)
Using unsigned math is certainly more suitable for OP's RSA goals.
Also see Modular exponentiation without range restriction
// (a^b)%n
// n != 0
// Test if unsigned long long has at least 2x the value bits of unsigned
#if ULLONG_MAX/UINT_MAX - 1 > UINT_MAX
unsigned decrypt2(unsigned a, unsigned b, unsigned n) {
unsigned long long result = 1u % n; // Ensure result < n, even when n==1
while (b > 0) {
if (b & 1) result = (result * a) % n;
a = (1ULL * a * a) %n;
b >>= 1;
}
return (unsigned) result;
}
#else
unsigned decrypt2(unsigned a, unsigned b, unsigned n) {
// Detect if UINT_MAX + 1 < n*n
if (UINT_MAX/n < n-1) {
return TBD_code_with_wider_math(a,b,n);
}
a %= n;
unsigned result = 1u % n;
while (b > 0) {
if (b & 1) result = (result * a) % n;
a = (a * a) % n;
b >>= 1;
}
return result;
}
#endif
ints are generally not enough for RSA (unless you are dealing with small simplified examples).
You need a data type that can store integers up to 2^256 (for 256-bit RSA keys) or 2^512 for 512-bit keys, etc.
Here is another way. Recall that a modular multiplicative inverse of a under mod m exists only when a and m are coprime.
We can use the extended gcd algorithm to calculate the modular multiplicative inverse.
Computing a^b mod m is tricky when a and b can have more than 10^5 digits.
The code below does the computing part:
#include <iostream>
#include <string>
using namespace std;
/*
* May this code live long.
*/
long pow(string,string,long long);
long pow(long long ,long long ,long long);
int main() {
string _num,_pow;
long long _mod;
cin>>_num>>_pow>>_mod;
//cout<<_num<<" "<<_pow<<" "<<_mod<<endl;
cout<<pow(_num,_pow,_mod)<<endl;
return 0;
}
long pow(string n,string p,long long mod){
long long num=0,_pow=0;
for(char c: n){
num=(num*10+c-48)%mod;
}
for(char c: p){
_pow=(_pow*10+c-48)%(mod-1);
}
return pow(num,_pow,mod);
}
long pow(long long a,long long p,long long mod){
long res=1;
if(a==0)return 0;
while(p>0){
if((p&1)==0){
p/=2;
a=(a*a)%mod;
}
else{
p--;
res=(res*a)%mod;
}
}
return res;
}
This code works because a^b mod m can be written as (a mod m)^(b mod (m-1)) mod m, which holds when m is prime (Fermat's little theorem) and a is not a multiple of m.
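The exponent reduction can be sketched in Python like this. It assumes the modulus is prime, which is what the reduction modulo m - 1 relies on; the function names mod_of_decimal and pow_big are invented for this sketch, and the inputs are decimal strings as in the C++ code.

```python
def mod_of_decimal(digits, modulus):
    """Reduce a decimal string modulo `modulus` without building the full number."""
    value = 0
    for ch in digits:
        value = (value * 10 + int(ch)) % modulus
    return value


def pow_big(base_digits, exp_digits, prime):
    """a^b mod prime for huge decimal strings a and b (prime must be prime)."""
    base = mod_of_decimal(base_digits, prime)
    if base == 0:
        # A multiple of the prime: the result is 0 unless the exponent is 0.
        return 0 if exp_digits.strip("0") else 1
    # Fermat's little theorem: base^(prime-1) == 1, so reduce the exponent.
    exponent = mod_of_decimal(exp_digits, prime - 1)
    return pow(base, exponent, prime)


a = "123456789123456789123456789"
b = "987654321987654321"
p = 1000000007
print(pow_big(a, b, p) == pow(int(a), int(b), p))  # True
```

For a composite modulus the exponent would have to be reduced modulo Euler's totient of m instead, and only when a and m are coprime.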
Hope it helped { :)
use fast exponentiation maybe..... gives same O(log n) as that template above
int power(int base, int exp,int mod)
{
if(exp == 0)
return 1;
int p=power(base, exp/2,mod);
p=(p*p)% mod;
return (exp%2 == 0)?p:(base * p)%mod;
}
This (encryption) is more of an algorithm design problem than a programming one. The important missing part is familiarity with modern algebra. I suggest that you look for a huge optimization in group theory and number theory.
If n is a prime number, pow(a,n-1)%n==1 (assuming infinite-digit integers and a not divisible by n). So, basically you need to calculate pow(a,b%(n-1))%n. According to group theory, you can find e such that every other number is equivalent to a power of e modulo n. Therefore the range [1..n-1] can be represented as a permutation on powers of e. Given the algorithm to find e for n and the logarithm of a base e, calculations can be significantly simplified. Cryptography needs a ton of math background; I'd rather stay off that ground without enough background.
For my code a^k mod n in php:
function pmod($a, $k, $n)
{
    if ($n==1) return 0;
    $power = 1;
    for($i=1; $i<=$k; $i++)
    {
        $power = ($power*$a) % $n;
    }
    return $power;
}
#include <cmath>
...
static_cast<int>(std::pow(a,b))%n
but my best bet is you are overflowing int (i.e. the number is too large for the int) on the power. I had the same problem creating the exact same function.
I'm using this function:
int CalculateMod(int base, int exp ,int mod){
int result;
result = (int) pow(base,exp);
result = result % mod;
return result;
}
I parse the variable result because pow give you back a double, and for using mod you need two variables of type int, anyway, in a RSA decryption, you should just use integer numbers.

How to get the first x leading binary digits of 5**x without big integer multiplication

I want to efficiently and elegantly compute with perfect precision the first x leading binary digits of 5**x?
For example 5**20 is 10101101011110001110101111000101101011000110001. The first 8 leading binary digits are 10101101.
In my use case, x is only up to 1-60. I don't want to create a table. A solution using 64-bit integers would be fine. I just don't want to use big integers.
first x leading binary digits of 5**x without big integer multiplication
efficiently and elegantly compute with perfect precision the first x leading binary digits of 5**x?
"compute with perfect precision" leaves out pow(). Too many implementations will return an imperfect result and FP math might not use 64 bit precision, even with long double.
Form an integer with a 64-bit whole number part .ms and a 64-bit fraction part .ls. Then loop 60 times, multiplying by 5 and dividing by 2 as needed, to keep the leading bits from growing too big.
Note there is some precision lost in the fraction, with N > 42, yet that is not significant enough to affect the whole number part OP is seeking.
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
typedef struct {
uint64_t ms, ls;
} uint128;
// Simplifications possible here, leave for OP
uint128 times5(uint128 x) {
uint128 y = x;
for (int i=1; i<5; i++) {
// y += x
y.ms += x.ms;
y.ls += x.ls;
if (y.ls < x.ls) y.ms++;
}
return y;
}
uint128 div2(uint128 x) {
x.ls = (x.ls >> 1) | (x.ms << 63);
x.ms >>= 1;
return x;
}
int main(void) {
uint128 y = {.ms = 1};
uint64_t pow2 = 2;
for (unsigned x = 1; x <= 60; x++) {
y = times5(y);
while (y.ms >= pow2) {
y = div2(y);
}
printf("%2u %16" PRIX64 ".%016" PRIX64 "\n", x, y.ms, y.ls);
pow2 <<= 1;
}
}
Output
whole part.fraction
1 1.4000000000000000
2 3.2000000000000000
3 7.D000000000000000
4 9.C400000000000000
...
57 14643E5AE44D12B.8F5FEE5AA432560D
58 32FA9BE33AC0AEC.E66FD3E29A7DD720
59 7F7285B812E1B50.401791B6823A99D0
60 9F4F2726179A224.501D762422C94044
^-------------^ This is the part OP is seeking.
The key to solving this task is: divide and conquer. Form an algorithm, (which is simply *5 and /2 as needed), and code a type and functions to do each small step.
Is a loop of 60 efficient? Perhaps not. Another approach would use Exponentiation by squaring. Certainly would be worth it for large N, yet for N == 60, a loop was simple enough for a quick turn.
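To check the whole-number column of the table above, a reference value can be computed with arbitrary-precision integers. This deliberately ignores the "no big integers" constraint and is only meant as a testing aid; leading_bits_reference is a name made up for this sketch.

```python
def leading_bits_reference(x, ndigits=None):
    """First ndigits binary digits of 5**x (defaults to the first x digits)."""
    if ndigits is None:
        ndigits = x
    return bin(5 ** x)[2:2 + ndigits]


print(leading_bits_reference(20, 8))           # 10101101
print(hex(int(leading_bits_reference(4), 2)))  # 0x9, the whole part in the row for x = 4
```

Comparing int(leading_bits_reference(x), 2) with y.ms for every x from 1 to 60 would show whether the precision lost in the fraction part ever reaches the whole part.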
5^n = 2^(-n) • 10^n
Using this identity, we can easily compute the leading N base-2 digits of (the nearest integer to) any given power of 5.
This code example is in C, but it's the same idea in any other language.
Example output: https://wandbox.org/permlink/Fs205DDzQR0gaLSo
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdint.h>
#define STATIC_ASSERT(CONDITION) ((void)sizeof(int[(CONDITION) ? 1 : -1]))
uint64_t pow5_leading_digits(double power, uint8_t ndigits)
{
STATIC_ASSERT(DBL_MANT_DIG <= 64);
double pow5 = exp2(-power) * pow(10, power);
const double binary_digits = ceil(log2(pow5));
assert(ndigits <= DBL_MANT_DIG);
if (!ndigits || binary_digits < 0)
return 0;
// If pow5 can fit in the number of digits requested, return it
if (binary_digits <= ndigits)
return pow5;
// If pow5 is too big to return, divide by 2 until it fits
if (binary_digits > DBL_MANT_DIG)
pow5 /= exp2(binary_digits - DBL_MANT_DIG + 1);
return (uint64_t)pow5 >> (DBL_MANT_DIG - ndigits);
}
Edit: Now limits the returned value to those exactly representable with double's.

create a random sequence, skip to any part of the sequence

On Linux there is an srand() function: you supply a seed, and it guarantees the same sequence of pseudorandom numbers from subsequent calls to rand().
Let's say I want to store this pseudorandom sequence by remembering this seed value.
Furthermore, let's say I want the 100 thousandth number in this pseudorandom sequence later.
One way would be to supply the seed using srand(), call rand() 100 thousand times, and remember the last number.
Is there a better way to skip the other 99,999 numbers in the pseudorandom list and get the 100 thousandth number directly?
thanks,
m
I'm not sure there's a defined standard for implementing rand on any platform; however, picking this one from the GNU Scientific Library:
— Generator: gsl_rng_rand
This is the BSD rand generator. Its sequence is
x_(n+1) = (a * x_n + c) mod m
with a = 1103515245, c = 12345 and m = 2^31. The seed specifies the initial value, x1. The period of this generator is 2^31, and it uses 1 word of storage per generator.
So to "know" x_n requires you to know x_(n-1). Unless there's some obvious pattern I'm missing, you can't jump to a value without computing all the values before it. (But that's not necessarily the case for every rand implementation.)
If we start with x1...
x2 = (a * x1 + c) % m
x3 = (a * ((a * x1 + c) % m) + c) % m
x4 = (a * ((a * ((a * x1 + c) % m) + c) % m) + c) % m
x5 = (a * ((a * ((a * ((a * x1 + c) % m) + c) % m) + c) % m) + c) % m
It gets out of hand pretty quickly. Is that function easily reducible? I don't think it is.
(There's a statistics phrase for a series where x_n depends on x_(n-1) -- can anyone remind me what that word is?)
If they're available on your system, you can use rand_r instead of rand & srand, or use initstate and setstate with random. rand_r takes an unsigned * as an argument, where it stores its state. After calling rand_r numerous times, save the value of this unsigned integer and use it as the starting value the next time.
For random(), use initstate rather than srandom. Save the contents of the state buffer for any state that you want to restore. To restore a state, fill a buffer with the saved contents and call setstate. If a buffer is already the current state buffer, you can skip the call to setstate.
This is developed from @Mark's answer using the BSD rand() function.
rand1() computes the nth random number, starting at seed, by stepping through n times.
rand2() computes the same using a shortcut. It can step up to 2^24-1 steps in one go. Internally it requires only 24 steps.
If the BSD random number generator is good enough for you then this will suffice:
#include <stdio.h>
const unsigned int m = (1U<<31)-1;
unsigned int a[24] = {
1103515245, 1117952617, 1845919505, 1339940641, 1601471041,
187569281 , 1979738369, 387043841 , 1046979585, 1574914049,
1073647617, 285024257 , 1710899201, 1542750209, 2011758593,
1876033537, 1604583425, 1061683201, 2123366401, 2099249153,
2051014657, 1954545665, 1761607681, 1375731713
};
unsigned int b[24] = {
12345, 1406932606, 1449466924, 1293799192, 1695770928, 1680572000,
422948032, 910563712, 519516928, 530212352, 98880512, 646551552,
940781568, 472276992, 1749860352, 278495232, 556990464, 1113980928,
80478208, 160956416, 321912832, 643825664, 1287651328, 427819008
};
unsigned int rand1(unsigned int seed, unsigned int n)
{
int i;
for (i = 0; i<n; ++i)
{
seed = (1103515245U*seed+12345U) & m;
}
return seed;
}
unsigned int rand2(unsigned int seed, unsigned int n)
{
int i;
for (i = 0; i<24; ++i)
{
if (n & (1<<i))
{
seed = (a[i]*seed+b[i]) & m;
}
}
return seed;
}
int main()
{
printf("%u\n", rand1 (10101, 100000));
printf("%u\n", rand2 (10101, 100000));
}
It's not hard to adapt to any linear congruential generator. I computed the tables in a language with a proper integer type (Haskell), but I could have computed them another way in C using only a few lines more code.
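The tables can also be replaced by computing the composed maps on the fly. Each step is the affine map x -> (a*x + c) mod m, and composing two affine maps gives another affine map, so n steps can be built from O(log n) compositions by repeated squaring. A minimal Python sketch, using the BSD constants quoted in this thread (lcg_skip and lcg_step_by_step are illustrative names):

```python
A, C, M = 1103515245, 12345, 2 ** 31   # constants quoted earlier in the thread


def lcg_step_by_step(seed, n):
    """Reference: advance the generator n times, one step at a time."""
    for _ in range(n):
        seed = (A * seed + C) % M
    return seed


def lcg_skip(seed, n):
    """Apply x -> (A*x + C) % M n times using O(log n) map compositions."""
    mul, add = 1, 0            # identity map x -> 1*x + 0
    step_mul, step_add = A, C  # map for 2^k steps, starting with k = 0
    while n > 0:
        if n & 1:
            # Compose the accumulated map with the current 2^k-step map.
            mul, add = (step_mul * mul) % M, (step_mul * add + step_add) % M
        # Square the step map: applying it twice covers 2^(k+1) steps.
        step_mul, step_add = ((step_mul * step_mul) % M,
                              (step_mul * step_add + step_add) % M)
        n >>= 1
    return (mul * seed + add) % M


print(lcg_skip(10101, 100000) == lcg_step_by_step(10101, 100000))  # True
```

This only applies to generators whose whole state is advanced by a single affine map; generators with larger or non-linear state need a different jump-ahead method.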
If you always want the 100,000th item, just store it for later.
Or you could gen the sequence and store that... and query for the particular element by index later.

Designing function f(f(n)) == -n

A question I got on my last interview:
Design a function f, such that:
f(f(n)) == -n
Where n is a 32 bit signed integer; you can't use complex numbers arithmetic.
If you can't design such a function for the whole range of numbers, design it for the largest range possible.
Any ideas?
You didn't say what kind of language they expected... Here's a static solution (Haskell). It's basically messing with the 2 most significant bits:
import Data.Bits
f :: Int -> Int
f x | (testBit x 30 /= testBit x 31) = negate $ complementBit x 30
    | otherwise = complementBit x 30
It's much easier in a dynamic language (Python). Just check if the argument is a number X and return a lambda that returns -X:
def f(x):
    if isinstance(x,int):
        return (lambda: -x)
    else:
        return x()
How about:
f(n) = sign(n) - (-1)ⁿ * n
In Python:
def f(n):
    if n == 0: return 0
    if n >= 0:
        if n % 2 == 1:
            return n + 1
        else:
            return -1 * (n - 1)
    else:
        if n % 2 == 1:
            return n - 1
        else:
            return -1 * (n + 1)
Python automatically promotes integers to arbitrary length longs. In other languages the largest positive integer will overflow, so it will work for all integers except that one.
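A quick way to check the closed form sign(n) - (-1)ⁿ * n is to evaluate it directly and test the property over a range. This is only a sketch with unbounded Python integers, so it says nothing about the 32-bit overflow case:

```python
def f(n):
    """Closed form sign(n) - (-1)**n * n, written with integer operations only."""
    if n == 0:
        return 0
    sign = 1 if n > 0 else -1
    parity = -1 if n % 2 else 1   # (-1)**n; Python's % yields 0 or 1 for negative n too
    return sign - parity * n


print(all(f(f(n)) == -n for n in range(-1000, 1001)))  # True
```

For example, f(1) = 2 and f(2) = -1, so f(f(1)) = -1.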
To make it work for real numbers you need to replace the n in (-1)ⁿ with { ceiling(n) if n>0; floor(n) if n<0 }.
In C# (works for any double, except in overflow situations):
static double F(double n)
{
if (n == 0) return 0;
if (n < 0)
return ((long)Math.Ceiling(n) % 2 == 0) ? (n + 1) : (-1 * (n - 1));
else
return ((long)Math.Floor(n) % 2 == 0) ? (n - 1) : (-1 * (n + 1));
}
Here's a proof of why such a function can't exist, for all numbers, if it doesn't use extra information(except 32bits of int):
We must have f(0) = 0. (Proof: Suppose f(0) = x. Then f(x) = f(f(0)) = -0 = 0. Now, -x = f(f(x)) = f(0) = x, which means that x = 0.)
Further, for any x and y, suppose f(x) = y. We want f(y) = -x then. And f(f(y)) = -y => f(-x) = -y. To summarize: if f(x) = y, then f(-x) = -y, and f(y) = -x, and f(-y) = x.
So, we need to divide all integers except 0 into sets of 4, but we have an odd number of such integers; not only that, if we remove the integer that doesn't have a positive counterpart, we still have 2 (mod 4) numbers.
If we remove the 2 maximal numbers left (by abs value), we can get the function:
int sign(int n)
{
if(n>0)
return 1;
else
return -1;
}
int f(int n)
{
if(n==0) return 0;
switch(abs(n)%2)
{
case 1:
return sign(n)*(abs(n)+1);
case 0:
return -sign(n)*(abs(n)-1);
}
}
Of course another option, is to not comply for 0, and get the 2 numbers we removed as a bonus. (But that's just a silly if.)
Thanks to overloading in C++:
double f(int var)
{
return double(var);
}
int f(double var)
{
return -int(var);
}
int main(){
int n(42);
std::cout<<f(f(n));
}
Or, you could abuse the preprocessor:
#define f(n) (f##n)
#define ff(n) -n
int main()
{
int n = -42;
cout << "f(f(" << n << ")) = " << f(f(n)) << endl;
}
This is true for all negative numbers.
f(n) = abs(n)
Because there is one more negative number than there are positive numbers for two's complement integers, f(n) = abs(n) is valid for one more case than the f(n) = n > 0 ? -n : n solution, which is the same as f(n) = -abs(n). Got you by one ... :D
UPDATE
No, it is not valid for one more case, as I just realized from litb's comment ... abs(Int.Min) will just overflow ...
I thought about using mod 2 information, too, but concluded too early that it does not work. If done right, it will work for all numbers except Int.Min because this will overflow.
UPDATE
I played with it for a while, looking for a nice bit manipulation trick, but I could not find a nice one-liner, while the mod 2 solution fits in one.
f(n) = 2n(abs(n) % 2) - n + sgn(n)
In C#, this becomes the following:
public static Int32 f(Int32 n)
{
return 2 * n * (Math.Abs(n) % 2) - n + Math.Sign(n);
}
To get it working for all values, you have to replace Math.Abs() with (n > 0) ? +n : -n and include the calculation in an unchecked block. Then you get even Int.Min mapped to itself as unchecked negation does.
UPDATE
Inspired by another answer I am going to explain how the function works and how to construct such a function.
Lets start at the very beginning. The function f is repeatedly applied to a given value n yielding a sequence of values.
n => f(n) => f(f(n)) => f(f(f(n))) => f(f(f(f(n)))) => ...
The question demands f(f(n)) = -n, that is two successive applications of f negate the argument. Two further applications of f - four in total - negate the argument again yielding n again.
n => f(n) => -n => f(f(f(n))) => n => f(n) => ...
Now there is an obvious cycle of length four. Substituting x = f(n) and noting that the obtained equation f(f(f(n))) = f(f(x)) = -x holds, yields the following.
n => x => -n => -x => n => ...
So we get a cycle of length four with two numbers and the two numbers negated. If you imagine the cycle as a rectangle, negated values are located at opposite corners.
One of many solutions to construct such a cycle is the following, starting from n.
n => negate and subtract one
-n - 1 = -(n + 1) => add one
-n => negate and add one
n + 1 => subtract one
n
A concrete example of such a cycle is +1 => -2 => -1 => +2 => +1. We are almost done. Noting that the constructed cycle contains an odd positive number, its even successor, and both numbers negated, we can easily partition the integers into many such cycles (2^32 is a multiple of four) and have found a function that satisfies the conditions.
But we have a problem with zero. The cycle must contain 0 => x => 0 because zero is negated to itself. And because the cycle already states 0 => x, it follows that 0 => x => 0 => x. This is only a cycle of length two, and x is turned into itself after two applications, not into -x. Luckily there is one case that solves the problem. If x equals zero we obtain a cycle of length one containing only zero, and we have solved that problem by concluding that zero is a fixed point of f.
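The four steps of the construction can be written out directly. This is a minimal Python sketch on unbounded integers; the helper `cycle_from` is introduced only for illustration, and zero is simply treated as a fixed point.

```python
def f(n):
    if n == 0:
        return 0          # zero is mapped to itself
    if n > 0 and n % 2 == 1:
        return -n - 1     # odd positive: negate and subtract one
    if n < 0 and n % 2 == 0:
        return n + 1      # even negative: add one
    if n < 0:
        return -n + 1     # odd negative: negate and add one
    return n - 1          # even positive: subtract one

def cycle_from(n, steps=4):
    """Return n followed by the results of `steps` applications of f."""
    out = [n]
    for _ in range(steps):
        out.append(f(out[-1]))
    return out

print(cycle_from(1))  # [1, -2, -1, 2, 1]
```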
Done? Almost. We have 2^32 numbers, zero is a fixed point leaving 2^32 - 1 numbers, and we must partition that number into cycles of four numbers. Bad that 2^32 - 1 is not a multiple of four - there will remain three numbers not in any cycle of length four.
I will explain the remaining part of the solution using the smaller set of 3 bit signed integers ranging from -4 to +3. We are done with zero. We have one complete cycle +1 => -2 => -1 => +2 => +1. Now let us construct the cycle starting at +3.
+3 => -4 => -3 => +4 => +3
The problem that arises is that +4 is not representable as 3 bit integer. We would obtain +4 by negating -3 to +3 - which is still a valid 3 bit integer - but then adding one to +3 (binary 011) yields 100 binary. Interpreted as unsigned integer it is +4 but we have to interpret it as signed integer -4. So actually -4 for this example or Int.MinValue in the general case is a second fixed point of integer arithmetic negation - 0 and Int.MinValue are mapped to themselves. So the cycle is actually as follows.
+3 => -4 => -3 => -4 => -3
It is a cycle of length two and additionally +3 enters the cycle via -4. In consequence -4 is correctly mapped to itself after two function applications, +3 is correctly mapped to -3 after two function applications, but -3 is erroneously mapped to itself after two function applications.
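The 3 bit case is small enough to enumerate. The sketch below, in Python, wraps every result into the range -4..+3 to mimic 3 bit two's complement arithmetic; `wrap` and `BITS` are names introduced for illustration, and the wrapping model is an assumption.

```python
BITS = 3

def wrap(x, bits=BITS):
    """Wrap x into the signed range of a `bits`-wide two's complement integer."""
    half = 1 << (bits - 1)
    return (x + half) % (1 << bits) - half

def f(n):
    # Same four-step construction as before, with every result wrapped.
    if n == 0:
        return 0
    if n > 0 and n % 2 == 1:
        return wrap(-n - 1)   # odd positive: negate and subtract one
    if n < 0 and n % 2 == 0:
        return wrap(n + 1)    # even negative: add one
    if n < 0:
        return wrap(-n + 1)   # odd negative: negate and add one
    return wrap(n - 1)        # even positive: subtract one

# Inputs for which two applications do not give the (wrapped) negation.
failures = [n for n in range(-4, 4) if f(f(n)) != wrap(-n)]
print(failures)  # [-3]
```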
So we constructed a function that works for all integers but one. Can we do better? No, we cannot. Why? We have to construct cycles of length four and are able to cover the whole integer range up to four values. The remaining values are the two fixed points 0 and Int.MinValue that must be mapped to themselves and two arbitrary integers x and -x that must be mapped to each other by two function applications.
To map x to -x and vice versa they must form a four cycle and they must be located at opposite corners of that cycle. In consequence 0 and Int.MinValue have to be at opposite corners, too. This will correctly map x and -x but swap the two fixed points 0 and Int.MinValue after two function applications and leave us with two failing inputs. So it is not possible to construct a function that works for all values, but we have one that works for all values except one and this is the best we can achieve.
Using complex numbers, you can effectively divide the task of negating a number into two steps:
multiply n by i, and you get n*i, which is n rotated 90° counter-clockwise
multiply again by i, and you get -n
The great thing is that you don't need any special handling code. Just multiplying by i does the job.
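For comparison, this is what the idea looks like when complex numbers are allowed. A minimal Python sketch using the built-in complex type:

```python
def f(z):
    """Rotate z by 90 degrees counter-clockwise in the complex plane."""
    return z * 1j

print(f(5))     # 5j
print(f(f(5)))  # (-5+0j)
```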
But you're not allowed to use complex numbers. So you have to somehow create your own imaginary axis, using part of your data range. Since you need exactly as many imaginary (intermediate) values as initial values, you are left with only half the data range.
I tried to visualize this on the following figure, assuming signed 8-bit data. You would have to scale this for 32-bit integers. The allowed range for initial n is -64 to +63.
Here's what the function does for positive n:
If n is in 0..63 (initial range), the function call adds 64, mapping n to the range 64..127 (intermediate range)
If n is in 64..127 (intermediate range), the function subtracts n from 64, mapping n to the range 0..-63
For negative n, the function uses the intermediate range -65..-128.
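The 8-bit mapping described above could be sketched as follows in Python. The two positive branches follow the text; the two negative branches are my assumption of the mirrored rule, since the text only names the intermediate range.

```python
def f(n):
    if 0 <= n <= 63:
        return n + 64      # initial range -> intermediate range 64..127
    if 64 <= n <= 127:
        return 64 - n      # intermediate range -> 0..-63
    if -64 <= n <= -1:
        return n - 64      # assumed mirror: initial range -> -65..-128
    if -128 <= n <= -65:
        return -64 - n     # assumed mirror: intermediate range -> 1..64
    raise ValueError("n does not fit in a signed 8-bit integer")

# Every allowed initial value is negated after two calls.
print(all(f(f(n)) == -n for n in range(-64, 64)))
```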
Works except int.MaxValue and int.MinValue
public static int f(int x)
{
if (x == 0) return 0;
if ((x % 2) != 0)
return x * -1 + (-1 *x) / (Math.Abs(x));
else
return x - x / (Math.Abs(x));
}
The question doesn't say anything about what the input type and return value of the function f have to be (at least not the way you've presented it)...
...just that when n is a 32-bit integer then f(f(n)) = -n
So, how about something like
Int64 f(Int64 n)
{
return(n > Int32.MaxValue ?
-(n - 4L * Int32.MaxValue):
n + 4L * Int32.MaxValue);
}
If n is a 32-bit integer then the statement f(f(n)) == -n will be true.
Obviously, this approach could be extended to work for an even wider range of numbers...
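The same arithmetic can be checked at the edges of the 32-bit range with a short Python sketch. Python integers are unbounded, so the last line only confirms that the intermediate value would fit in a signed 64-bit integer.

```python
INT32_MAX = 2**31 - 1
INT32_MIN = -2**31

def f(n):
    if n > INT32_MAX:
        return -(n - 4 * INT32_MAX)   # second call: undo the offset and negate
    return n + 4 * INT32_MAX          # first call: push n above the 32-bit range

for n in (INT32_MIN, -1, 0, 1, INT32_MAX):
    print(n, f(f(n)))

print(f(INT32_MAX) < 2**63)  # the largest intermediate value fits in 64 bits
```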
for javascript (or other dynamically typed languages) you can have the function accept either an int or an object and return the other. i.e.
function f(n) {
if (n.passed) {
return -n.val;
} else {
return {val:n, passed:1};
}
}
giving
js> f(f(10))
-10
js> f(f(-10))
10
alternatively you could use overloading in a strongly typed language although that may break the rules ie
int f(long n) {
return n;
}
long f(int n) {
return -n;
}
Depending on your platform, some languages allow you to keep state in the function. VB.Net, for example:
Function f(ByVal n As Integer) As Integer
Static flag As Integer = -1
flag *= -1
Return n * flag
End Function
IIRC, C++ allowed this as well. I suspect they're looking for a different solution though.
Another idea is that since they didn't define the result of the first call to the function you could use odd/evenness to control whether to invert the sign:
int f(int n)
{
int sign = n>=0?1:-1;
if (abs(n)%2 == 0)
return (abs(n)+1)*sign * -1;
else
return (abs(n)-1)*sign;
}
Add one to the magnitude of all even numbers, subtract one from the magnitude of all odd numbers. The result of two calls has the same magnitude, but the one call where it's even we swap the sign. There are some cases where this won't work (-1, max or min int), but it works a lot better than anything else suggested so far.
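To see which small inputs fail, the same logic can be run in Python. This sketch uses unbounded integers, so it cannot show the max or min int cases, only the failure near zero.

```python
def f(n):
    sign = 1 if n >= 0 else -1
    if abs(n) % 2 == 0:
        return (abs(n) + 1) * sign * -1   # even magnitude: grow by one, flip sign
    return (abs(n) - 1) * sign            # odd magnitude: shrink by one, keep sign

failures = [n for n in range(-100, 101) if f(f(n)) != -n]
print(failures)  # [-1]
```

The input -1 fails because it is mapped to 0, and 0 is then treated as non-negative.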
Exploiting JavaScript exceptions.
function f(n) {
try {
return n();
}
catch(e) {
return function() { return -n; };
}
}
f(f(0)) => 0
f(f(1)) => -1
For all 32-bit values (with the caveat that -0 is -2147483648)
int rotate(int x)
{
static const int split = INT_MAX / 2 + 1;
static const int negativeSplit = INT_MIN / 2 + 1;
if (x == INT_MAX)
return INT_MIN;
if (x == INT_MIN)
return x + 1;
if (x >= split)
return x + 1 - INT_MIN;
if (x >= 0)
return INT_MAX - x;
if (x >= negativeSplit)
return INT_MIN - x + 1;
return split -(negativeSplit - x);
}
You basically need to pair each -x => x => -x loop with a y => -y => y loop. So I paired up opposite sides of the split.
e.g. For 4 bit integers:
0 => 7 => -8 => -7 => 0
1 => 6 => -1 => -6 => 1
2 => 5 => -2 => -5 => 2
3 => 4 => -3 => -4 => 3
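The 4 bit cycles listed above can be reproduced by parameterising the function on the bit width. This is a Python sketch; `make_rotate`, `wrap` and `orbit` are names introduced for illustration, and the explicit wraparound in the third branch is an assumption about what the C arithmetic does on overflow.

```python
def make_rotate(bits):
    int_max = 2 ** (bits - 1) - 1
    int_min = -(2 ** (bits - 1))
    split = int_max // 2 + 1
    negative_split = int_min // 2 + 1   # int_min is even, so the division is exact

    def wrap(x):
        """Wrap x into the signed `bits`-wide range (assumed overflow model)."""
        return (x - int_min) % (2 ** bits) + int_min

    def rotate(x):
        if x == int_max:
            return int_min
        if x == int_min:
            return x + 1
        if x >= split:
            return wrap(x + 1 - int_min)   # overflows in fixed-width arithmetic
        if x >= 0:
            return int_max - x
        if x >= negative_split:
            return int_min - x + 1
        return split - (negative_split - x)

    return rotate

def orbit(func, start, steps=4):
    out = [start]
    for _ in range(steps):
        out.append(func(out[-1]))
    return out

rotate4 = make_rotate(4)
for start in range(4):
    print(" => ".join(str(v) for v in orbit(rotate4, start)))
```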
A C++ version, probably bending the rules somewhat but works for all numeric types (floats, ints, doubles) and even class types that overload the unary minus:
template <class T>
struct f_result
{
T value;
};
template <class T>
f_result <T> f (T n)
{
f_result <T> result = {n};
return result;
}
template <class T>
T f (f_result <T> n)
{
return -n.value;
}
int main ()
{
int n = 45;
cout << "f(f(" << n << ")) = " << f(f(n)) << endl;
float p = 3.14f;
cout << "f(f(" << p << ")) = " << f(f(p)) << endl;
}
x86 asm (AT&T style):
# input %edi
# output %eax
# clobbered regs: %ecx, %edx
f:
testl %edi, %edi
je .zero
movl %edi, %eax
movl $1, %ecx
movl %edi, %edx
andl $1, %eax
addl %eax, %eax
subl %eax, %ecx
xorl %eax, %eax
testl %edi, %edi
setg %al
shrl $31, %edx
subl %edx, %eax
imull %ecx, %eax
subl %eax, %edi
movl %edi, %eax
imull %ecx, %eax
ret
.zero:
xorl %eax, %eax
ret
Code checked, all possible 32bit integers passed, error with -2147483647 (underflow).
Uses globals...but so?
bool done = false;
int f(int n)
{
int out = n;
if(!done)
{
out = n * -1;
done = true;
}
return out;
}
This Perl solution works for integers, floats, and strings.
sub f {
my $n = shift;
return ref($n) ? -$$n : \$n;
}
Try some test data.
print $_, ' ', f(f($_)), "\n" for -2, 0, 1, 1.1, -3.3, 'foo', '-bar';
Output:
-2 2
0 0
1 -1
1.1 -1.1
-3.3 3.3
foo -foo
-bar +bar
Nobody ever said f(x) had to be the same type.
def f(x):
if type(x) == list:
return -x[0]
return [x]
f(2) => [2]
f(f(2)) => -2
I'm not actually trying to give a solution to the problem itself, but do have a couple of comments, as the question states this problem was posed as part of a (job?) interview:
I would first ask "Why would such a function be needed? What is the bigger problem this is part of?" instead of trying to solve the actual posed problem on the spot. This shows how I think and how I tackle problems like this. Who knows? That might even be the actual reason the question is asked in an interview in the first place. If the answer is "Never you mind, assume it's needed, and show me how you would design this function." I would then continue to do so.
Then, I would write the C# test case code I would use (the obvious: loop from int.MinValue to int.MaxValue, and for each n in that range call f(f(n)) and check that the result is -n), explaining that I would then use Test Driven Development to get to such a function.
Only if the interviewer continues asking for me to solve the posed problem would I actually start to try and scribble pseudocode during the interview itself to try and get to some sort of an answer. However, I don't really think I would be jumping to take the job if the interviewer would be any indication of what the company is like...
Oh, this answer assumes the interview was for a C# programming related position. Would of course be a silly answer if the interview was for a math related position. ;-)
I would change the 2 most significant bits.
00.... => 01.... => 10.....
01.... => 10.... => 11.....
10.... => 11.... => 00.....
11.... => 00.... => 01.....
As you can see, it's just an addition, leaving out the carried bit.
How did I get to the answer? My first thought was just a need for symmetry: 4 turns to get back where I started. At first I thought, that's 2-bit Gray code. Then I thought actually standard binary is enough.
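On raw 32-bit patterns, the table above amounts to adding 1 to the two-bit number formed by the top two bits and dropping the carry. A minimal Python sketch, treating values as unsigned bit patterns (the names `MASK32` and `TOP_BIT` are introduced for illustration):

```python
MASK32 = 0xFFFFFFFF
TOP_BIT = 1 << 31

def f(bits):
    """Add 1 to the two most significant bits, discarding the carry."""
    return (bits + (1 << 30)) & MASK32

x = 0x12345678
print(format(f(x), "08X"))        # 52345678
print(format(f(f(x)), "08X"))     # 92345678
print(f(f(x)) == x ^ TOP_BIT)     # True
```

Two applications flip only the most significant bit, and four applications return the original pattern.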
Here is a solution that is inspired by the requirement or claim that complex numbers can not be used to solve this problem.
Multiplying by the square root of -1 is an idea, that only seems to fail because -1 does not have a square root over the integers. But playing around with a program like mathematica gives for example the equation
(1849436465^2 + 1) mod (2^32 - 3) = 0.
and this is almost as good as having a square root of -1. The result of the function needs to be a signed integer. Hence I'm going to use a modified modulo operation mods(x,n) that returns the integer y congruent to x modulo n that is closest to 0. Only very few programming languages have such a modulo operation, but it can easily be defined. E.g. in python it is:
def mods(x, n):
y = x % n
if y > n/2: y-= n
return y
Using the equation above, the problem can now be solved as
def f(x):
return mods(x*1849436465, 2**32-3)
This satisfies f(f(x)) = -x for all integers in the range [-(2^31 - 2), 2^31 - 2]. The results of f(x) are also in this range, but of course the computation would need 64-bit integers.
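The mechanism is easier to follow on a scaled-down analogue. The Python sketch below uses the toy modulus 13 and the multiplier 5, both chosen by me for illustration (5^2 + 1 = 26 is a multiple of 13); they are not values from the answer.

```python
def mods(x, n):
    """Residue of x modulo n that is closest to zero."""
    y = x % n
    if y > n / 2:
        y -= n
    return y

P = 13   # toy modulus (assumption, stands in for 2**32 - 3)
R = 5    # toy square root of -1 modulo P, since 5*5 + 1 == 26 == 2*13

def f(x):
    return mods(x * R, P)

for x in range(-6, 7):
    print(x, f(x), f(f(x)))
```

Because R squared is congruent to -1, applying f twice multiplies by -1 modulo P, and mods picks the representative in -6..6.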
C# for a range of 2^32 - 1 numbers, all int32 numbers except (Int32.MinValue)
Func<int, int> f = n =>
n < 0
? (n & (1 << 30)) == (1 << 30) ? (n ^ (1 << 30)) : - (n | (1 << 30))
: (n & (1 << 30)) == (1 << 30) ? -(n ^ (1 << 30)) : (n | (1 << 30));
Console.WriteLine(f(f(Int32.MinValue + 1))); // -2147483648 + 1
for (int i = -3; i <= 3 ; i++)
Console.WriteLine(f(f(i)));
Console.WriteLine(f(f(Int32.MaxValue))); // 2147483647
prints:
2147483647
3
2
1
0
-1
-2
-3
-2147483647
Essentially the function has to divide the available range into cycles of size 4, with -n at the opposite end of n's cycle. However, 0 must be part of a cycle of size 1, because otherwise 0->x->0->x != -x. Because of 0 being alone, there must be 3 other values in our range (whose size is a multiple of 4) not in a proper cycle with 4 elements.
I chose these extra weird values to be MIN_INT, MAX_INT, and MIN_INT+1. Furthermore, MIN_INT+1 will map to MAX_INT correctly, but get stuck there and not map back. I think this is the best compromise, because it has the nice property of only the extreme values not working correctly. Also, it means it would work for all BigInts.
int f(int n):
if n == 0 or n == MIN_INT or n == MAX_INT: return n
return ((Math.abs(n) mod 2) * 2 - 1) * n + Math.sign(n)
Nobody said it had to be stateless.
int32 f(int32 x) {
static bool idempotent = false;
if (!idempotent) {
idempotent = true;
return -x;
} else {
return x;
}
}
Cheating, but not as much as a lot of the examples. Even more evil would be to peek up the stack to see if your caller's address is &f, but this is going to be more portable (although not thread safe... the thread-safe version would use TLS). Even more evil:
int32 f (int32 x) {
static int32 answer = -x;
return answer;
}
Of course, neither of these works too well for the case of MIN_INT32, but there is precious little you can do about that unless you are allowed to return a wider type.
I could imagine using the 31st bit as an imaginary (i) bit would be an approach that would support half the total range.
works for n= [0 .. 2^31-1]
int f(int n) {
if (n & (1 << 31)) // highest bit set?
return -(n & ~(1 << 31)); // return negative of original n
else
return n | (1 << 31); // return n with highest bit set
}
The problem states "32-bit signed integers" but doesn't specify whether they are twos-complement or ones-complement.
If you use ones-complement then all 2^32 values occur in cycles of length four - you don't need a special case for zero, and you also don't need conditionals.
In C:
int32_t f(int32_t x)
{
return (((x & 0xFFFFU) << 16) | ((x & 0xFFFF0000U) >> 16)) ^ 0xFFFFU;
}
This works by
Exchanging the high and low 16-bit blocks
Inverting one of the blocks
After two passes we have the bitwise inverse of the original value. Which in ones-complement representation is equivalent to negation.
Examples:
Pass | x
-----+-------------------
0 | 00000001 (+1)
1 | 0001FFFF (+131071)
2 | FFFFFFFE (-1)
3 | FFFE0000 (-131071)
4 | 00000001 (+1)
Pass | x
-----+-------------------
0 | 00000000 (+0)
1 | 0000FFFF (+65535)
2 | FFFFFFFF (-0)
3 | FFFF0000 (-65535)
4 | 00000000 (+0)
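The passes tabulated above can be reproduced on raw 32-bit patterns. This Python sketch works on unsigned values and only prints the bit patterns; the helper `passes` is introduced for illustration.

```python
def f(x):
    """Swap the 16-bit halves of a 32-bit pattern, then invert the low half."""
    swapped = ((x & 0xFFFF) << 16) | ((x & 0xFFFF0000) >> 16)
    return swapped ^ 0xFFFF

def passes(x, count=4):
    """Return x followed by the results of `count` successive applications."""
    out = [x]
    for _ in range(count):
        out.append(f(out[-1]))
    return out

print([format(v, "08X") for v in passes(1)])
# ['00000001', '0001FFFF', 'FFFFFFFE', 'FFFE0000', '00000001']
```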
:D
boolean inner = true;
int f(int input) {
if(inner) {
inner = false;
return input;
} else {
inner = true;
return -input;
}
}
return x ^ ((x%2) ? 1 : -INT_MAX);
I'd like to share my point of view on this interesting problem as a mathematician. I think I have the most efficient solution.
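If I remember correctly, you negate a signed 32-bit integer by just flipping the first bit (this is how a sign-and-magnitude representation works; two's complement negation is not a single bit flip). For example, if n = 1001 1101 1110 1011 1110 0000 1110 1010, then -n = 0001 1101 1110 1011 1110 0000 1110 1010.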
So how do we define a function f that takes a signed 32-bit integer and returns another signed 32-bit integer with the property that taking f twice is the same as flipping the first bit?
Let me rephrase the question without mentioning arithmetic concepts like integers.
How do we define a function f that takes a sequence of zeros and ones of length 32 and returns a sequence of zeros and ones of the same length, with the property that taking f twice is the same as flipping the first bit?
Observation: If you can answer the above question for the 32 bit case, then you can also answer it for the 64 bit case, the 100 bit case, etc. You just apply f to the first 32 bits.
Now if you can answer the question for 2 bit case, Voila!
And yes it turns out that changing the first 2 bits is enough.
Here's the pseudo-code
1. take n, which is a signed 32-bit integer.
2. swap the first bit and the second bit.
3. flip the first bit.
4. return the result.
Remark: Step 2 and step 3 together can be summarised as (a,b) --> (-b, a). Looks familiar? That should remind you of the 90 degree rotation of the plane and the multiplication by the square root of -1.
If I had presented the pseudo-code alone without the long prelude, it would have seemed like a rabbit out of the hat. I wanted to explain how I got the solution.
