will carry flag be set after cmp[.b] #4, #-1 ? [MSP430 16 bit] - unsigned

[MSP430 16 bit]
0x437c mov[.b] #-1, r12
0x926c cmp[.b] #4, r12
0x2801 jlo 0xda36
Could anyone help me calculation (cmp[.b] #4, r12 ) with Binary?
Example : r12-#4= 1111111111111111 - 0100 ##?
I dont know how to calculation cmp with Binary in unsigned and signed case.

Nothing changes between signed and unsigned subtraction. What changes is what kind of jump you use: JLO or JL.
JL and JGE is used for signed comparison. JLO and JHS are used for unsigned comparison.
So, in your case, you are using unsigned comparison and the jlo will not jump to 0xda36 as CMP will set the carry bit. Note that JLO is an alias for JNC (jump if not carry).
To compute manually the subraction you can traslate it to the sum of the first addend plus the one complement of the second + 1. So in you case you have:
0xFFFF - 0x0004 =
0xFFFF + 0xFFFB + 1 = //carry is set
0xFFFB

Related

How to bruteforce a lossy AND routine?

Im wondering whether there are any standard approaches to reversing AND routines by brute force.
For example I have the following transformation:
MOV(eax, 0x5b3e0be0) <- Here we move 0x5b3e0be0 to EDX.
MOV(edx, eax) # Here we copy 0x5b3e0be0 to EAX as well.
SHL(edx, 0x7) # Bitshift 0x5b3e0be0 with 0x7 which results in 0x9f05f000
AND(edx, 0x9d2c5680) # AND 0x9f05f000 with 0x9d2c5680 which results in 0x9d045000
XOR(edx, eax) # XOR 0x9d045000 with original value 0x5b3e0be0 which results in 0xc63a5be0
My question is how to brute force and reverse this routine (i.e. transform 0xc63a5be0 back into 0x5b3e0be0)
One idea i had (which didn't work) was this using PeachPy implementation:
#Input values
MOV(esi, 0xffffffff) < Initial value to AND with, which will be decreased by 1 in a loop.
MOV(cl, 0x1) < Initial value to SHR with which will be increased by 1 until 0x1f.
MOV(eax, 0xc63a5be0) < Target result which I'm looking to get using the below loop.
MOV(edx, 0x5b3e0be0) < Input value which will be transformed.
sub_esi = peachpy.x86_64.Label()
with loop:
#End the loop if ESI = 0x0
TEST(esi, esi)
JZ(loop.end)
#Test the routine and check if it matches end result.
MOV(ebx, eax)
SHR(ebx, cl)
TEST(ebx, ebx)
JZ(sub_esi)
AND(ebx, esi)
XOR(ebx, eax)
CMP(ebx, edx)
JZ(loop.end)
#Add to the CL register which is used for SHR.
#Also check if we've reached the last potential value of CL which is 0x1f
ADD(cl, 0x1)
CMP(cl, 0x1f)
JNZ(loop.begin)
#Decrement ESI by 1, reset CL and restart routine.
peachpy.x86_64.LABEL(sub_esi)
SUB(esi, 0x1)
MOV(cl, 0x1)
JMP(loop.begin)
#The ESI result here will either be 0x0 or a valid value to AND with and get the necessary result.
RETURN(esi)
Maybe an article or a book you can recommend specific to this?
It's not lossy, the final operation is an XOR.
The whole routine can be modeled in C as
#define K 0x9d2c5680
uint32_t hash(uint32_t num)
{
return num ^ ( (num << 7) & K);
}
Now, if we have two bits x and y and the operation x XOR y, when y is zero the result is x.
So given two numbers n1 and n2 and considering their XOR, the bits or n1 that pairs with a zero in n2 would make it to the result unchanged (the others will be flipped).
So in considering num ^ ( (num << 7) & K) we can identify num with n1 and (num << 7) & K with n2.
Since n2 is an AND, we can tell that it must have at least the same zero bits that K has.
This means that each bit of num that corresponds to a zero bit in the constant K will make it unchanged into the result.
Thus, by extracting those bits from the result we already have a partial inverse function:
/*hash & ~K extracts the bits of hash that pair with a zero bit in K*/
partial_num = hash & ~K
Technically, the factor num << 7 would also introduce other zeros in the result of the AND. We know for sure that the lowest 7 bits must be zero.
However K already has the lowest 7 bits zero, so we cannot exploit this information.
So we will just use K here, but if its value were different you'd need to consider the AND (which, in practice, means to zero the lower 7 bits of K).
This leaves us with 13 bits unknown (the ones corresponding to the bits that are set in K).
If we forget about the AND for a moment, we would have x ^ (x << 7) meaning that
hi = numi for i from 0 to 6 inclusive
hi = numi ^ numi-7 for i from 7 to 31 inclusive
(The first line is due to the fact that the lower 7 bits of the right-hand are zero)
From this, starting from h7 and going up, we can retrive num7 as h7 ^ num0 = h7 ^ h0.
From bit 7 onward, the equality doesn't work and we need to use numk (for the suitable k) but luckily we already have computed its value in a previous step (that's why we start from lower to higher).
What the AND does to this is just restricting the values the index i runs in, specifically only to the bits that are set in K.
So to fill in the thirteen remaining bits one have to do:
part_num7 = h7 ^ part_num0
part_num9 = h9 ^ part_num2
part_num12 = h12 ^ part_num5
...
part_num31 = h31 ^ part_num24
Note that we exploited that fact that part_num0..6 = h0..6.
Here's a C program that inverts the function:
#include <stdio.h>
#include <stdint.h>
#define BIT(i, hash, result) ( (((result >> i) ^ (hash >> (i+7))) & 0x1) << (i+7) )
#define K 0x9d2c5680
uint32_t base_candidate(uint32_t hash)
{
uint32_t result = hash & ~K;
result |= BIT(0, hash, result);
result |= BIT(2, hash, result);
result |= BIT(3, hash, result);
result |= BIT(5, hash, result);
result |= BIT(7, hash, result);
result |= BIT(11, hash, result);
result |= BIT(12, hash, result);
result |= BIT(14, hash, result);
result |= BIT(17, hash, result);
result |= BIT(19, hash, result);
result |= BIT(20, hash, result);
result |= BIT(21, hash, result);
result |= BIT(24, hash, result);
return result;
}
uint32_t hash(uint32_t num)
{
return num ^ ( (num << 7) & K);
}
int main()
{
uint32_t tester = 0x5b3e0be0;
uint32_t candidate = base_candidate(hash(tester));
printf("candidate: %x, tester %x\n", candidate, tester);
return 0;
}
Since the original question was how to "bruteforce" instead of solve here's something that I eventually came up with which works just as well. Obviously its prone to errors depending on input (might be more than 1 result).
from peachpy import *
from peachpy.x86_64 import *
input = 0xc63a5be0
x = Argument(uint32_t)
with Function("DotProduct", (x,), uint32_t) as asm_function:
LOAD.ARGUMENT(edx, x) # EDX = 1b6fb67c
MOV(esi, 0xffffffff)
with Loop() as loop:
TEST(esi,esi)
JZ(loop.end)
MOV(eax, esi)
SHL(eax, 0x7)
AND(eax, 0x9d2c5680)
XOR(eax, esi)
CMP(eax, edx)
JZ(loop.end)
SUB(esi, 0x1)
JMP(loop.begin)
RETURN(esi)
#Read Assembler Return
abi = peachpy.x86_64.abi.detect()
encoded_function = asm_function.finalize(abi).encode()
python_function = encoded_function.load()
print(hex(python_function(input)))

how to encode 27 vector3's into a 0-256 value?

I have 27 combinations of 3 values from -1 to 1 of type:
Vector3(0,0,0);
Vector3(-1,0,0);
Vector3(0,-1,0);
Vector3(0,0,-1);
Vector3(-1,-1,0);
... up to
Vector3(0,1,1);
Vector3(1,1,1);
I need to convert them to and from a 8-bit sbyte / byte array.
One solution is to say the first digit, of the 256 = X the second digit is Y and the third is Z...
so
Vector3(-1,1,1) becomes 022,
Vector3(1,-1,-1) becomes 200,
Vector3(1,0,1) becomes 212...
I'd prefer to encode it in a more compact way, perhaps using bytes (which I am clueless about), because the above solution uses a lot of multiplications and round functions to decode, do you have some suggestions please? the other option is to write 27 if conditions to write the Vector3 combination to an array, it seems inefficient.
Thanks to Evil Tak for the guidance, i changed the code a bit to add 0-1 values to the first bit, and to adapt it for unity3d:
function Pack4(x:int,y:int,z:int,w:int):sbyte {
var b: sbyte = 0;
b |= (x + 1) << 6;
b |= (y + 1) << 4;
b |= (z + 1) << 2;
b |= (w + 1);
return b;
}
function unPack4(b:sbyte):Vector4 {
var v : Vector4;
v.x = ((b & 0xC0) >> 6) - 1; //0xC0 == 1100 0000
v.y = ((b & 0x30) >> 4) - 1; // 0x30 == 0011 0000
v.z = ((b & 0xC) >> 2) - 1; // 0xC == 0000 1100
v.w = (b & 0x3) - 1; // 0x3 == 0000 0011
return v;
}
I assume your values are float not integer
so bit operations will not improve speed too much in comparison to conversion to integer type. So my bet using full range will be better. I would do this for 3D case:
8 bit -> 256 values
3D -> pow(256,1/3) = ~ 6.349 values per dimension
6^3 = 216 < 256
So packing of (x,y,z) looks like this:
BYTE p;
p =floor((x+1.0)*3.0);
p+=floor((y+1.0)*3.0*6.0);
p+=floor((y+1.0)*3.0*6.0*6.0);
The idea is convert <-1,+1> to range <0,1> hence the +1.0 and *3.0 instead of *6.0 and then just multiply to the correct place in final BYTE.
and unpacking of p looks like this:
x=p%6; x=(x/3.0)-1.0; p/=6;
y=p%6; y=(y/3.0)-1.0; p/=6;
z=p%6; z=(z/3.0)-1.0;
This way you use 216 from 256 values which is much better then just 2 bits (4 values). Your 4D case would look similar just use instead 3.0,6.0 different constant floor(pow(256,1/4))=4 so use 2.0,4.0 but beware case when p=256 or use 2 bits per dimension and bit approach like the accepted answer does.
If you need real speed you can optimize this to force float representation holding result of packet BYTE to specific exponent and extract mantissa bits as your packed BYTE directly. As the result will be <0,216> you can add any bigger number to it. see IEEE 754-1985 for details but you want the mantissa to align with your BYTE so if you add to p number like 2^23 then the lowest 8 bit of float should be your packed value directly (as MSB 1 is not present in mantissa) so no expensive conversion is needed.
In case you got just {-1,0,+1} instead of <-1,+1>
then of coarse you should use integer approach like bit packing with 2 bits per dimension or use LUT table of all 3^3 = 27 possibilities and pack entire vector in 5 bits.
The encoding would look like this:
int enc[3][3][3] = { 0,1,2, ... 24,25,26 };
p=enc[x+1][y+1][z+1];
And decoding:
int dec[27][3] = { {-1,-1,-1},.....,{+1,+1,+1} };
x=dec[p][0];
y=dec[p][1];
z=dec[p][2];
Which should be fast enough and if you got many vectors you can pack the p into each 5 bits ... to save even more memory space
One way is to store the component of each vector in every 2 bits of a byte.
Converting a vector component value to and from the 2 bit stored form is as simple as adding and subtracting one, respectively.
-1 (1111 1111 as a signed byte) <-> 00 (in binary)
0 (0000 0000 in binary) <-> 01 (in binary)
1 (0000 0001 in binary) <-> 10 (in binary)
The packed 2 bit values can be stored in a byte in any order of your preference. I will use the following format: 00XXYYZZ where XX is the converted (packed) value of the X component, and so on. The 0s at the start aren't going to be used.
A vector will then be packed in a byte as follows:
byte Pack(Vector3<int> vector) {
byte b = 0;
b |= (vector.x + 1) << 4;
b |= (vector.y + 1) << 2;
b |= (vector.z + 1);
return b;
}
Unpacking a vector from its byte form will be as follows:
Vector3<int> Unpack(byte b) {
Vector3<int> v = new Vector<int>();
v.x = ((b & 0x30) >> 4) - 1; // 0x30 == 0011 0000
v.y = ((b & 0xC) >> 2) - 1; // 0xC == 0000 1100
v.z = (b & 0x3) - 1; // 0x3 == 0000 0011
return v;
}
Both the above methods assume that the input is valid, i.e. All components of vector in Pack are either -1, 0 or 1 and that all two-bit sections of b in Unpack have a (binary) value of either 00, 01 or 10.
Since this method uses bitwise operators, it is fast and efficient. If you wish to compress the data further, you could try using the 2 unused bits too, and convert every 3 two-bit elements processed to a vector.
The most compact way is by writing a 27 digits number in base 3 (using a shift -1 -> 0, 0 -> 1, 1 -> 2).
The value of this number will range from 0 to 3^27-1 = 7625597484987, which takes 43 bits to be encoded, i.e. 6 bytes (and 5 spare bits).
This is a little saving compared to a packed representation with 4 two-bit numbers packed in a byte (hence 7 bytes/56 bits in total).
An interesting variant is to group the base 3 digits five by five in bytes (hence numbers 0 to 242). You will still require 6 bytes (and no spare bits), but the decoding of the bytes can easily be hard-coded as a table of 243 entries.

4 byte sequence 0x86 0x65 0x71 0xA5 in a little-endian architecture interpreted as a 32-bit signed integer represents what decimal?

I know how to convert 0x86 0x65 0x71 0xA5 into decimals, I'm just not sure how to approach it past that point.
I assume it is (from least significant to most significant) 134 101 113 165, but what exactly do I do past this point? I'm guessing 134,101,113,165 is not correct. Do I need to convert anything into binary to do this? Kind of lost conceptually.
By converting each octet into decimal, you've essentially converted the number into base 256. You can do it that way, but it's not particularly easy. You'd have to combine the parts as follows:
134 x (256^0) + 101 x (256^1) + 113 x (256^2) + 165 x (256^3)
0x86 0x65 0x71 0xA5 as a 32-bit unsigned integer in little-endian notation would mean that the integer in hex is 0xA5716586. Then just convert from hex to decimal normally.
Either way, you will get 2,775,672,198.
However, this is a signed integer, not an unsigned integer. And because the most significant byte is A5, the most significant bit is 1. Therefore, this is a negative number.
So we need to do some math:
FFFFFFFF - A5716586 = 5A8E9A79
So:
A5716586 + 5A8E9A79 = FFFFFFFF
Also, in 32-bit arithmetic:
FFFFFFFF + 1 = 0
So:
FFFFFFFF => -1
Combining these two:
A5716586 + 5A8E9A79 => -1
A5716586 = -1 -5A8E9A79 = - (5A8E9A79 + 1) = - 5A8E9A7A
Also:
5A8E9A7A => 1,519,295,098 (decimal)
So our final answer is -1,519,295,098

Convert signed to unsigned integer mathematically

I am in need of an unsigned 16 bit integer value in my OPC server but can only send it a signed 16 bit integer. I need to change this signed integer to unsigned mathematically but am unsure how. My internet research has not lead me in the right path either. Could someone please give some advise? Thanks in advance.
Mathematically, the conversion from signed to unsigned works as follows: (1) do the integer division of the signed integer by 1 + max, (2) codify the signed integer as the non-negative remainder of the division. Here max is the maximum integer you can write with the available number of bits, 16 in your case.
Recall that the non-negative remainder is the only integer r that satisfies
1. s = q(1+max) + r
2. 0 <= r < 1+max.
Note that when s >= 0, the non-negative remainder is s itself (you cannot codify integers greater than max). Therefore, there is actually something to do only when s < 0:
if s >= 0 then return s else return 1 + max + s
because the value r = 1 + max + s satisfies conditions 1 and 2 above for the non-negative reminder.
For this convention to work as expected the signed s must satisfy
- (1 + max)/2 <= s < (1 + max)/2
In your case, given that you have 16 bits, we have max = 0xFFFF and 1 + max = 0x10000 = 65536.
Note also that if you codify a negative integer with this convention, the result will have its highest bit on, i.e., equal to 1. This way, the highest bit becomes a flag that tells whether the number is negative or positive.
Examples:
2 -> 2
1 -> 1
0 -> 0
-1 -> 0xFFFF
-2 -> 0xFFFE
-3 -> 0xFFFD
...
-15 -> 0xFFF1
...
-32768 -> 0x8000 = 32768 (!)
-32769 -> error: cannot codify using only 16 bits.

x86-64 Big Integer Representation?

How do hig-performance native big-integer libraries on x86-64 represent a big integer in memory? (or does it vary? Is there a most common way?)
Naively I was thinking about storing them as 0-terminated strings of numbers in base 264.
For example suppose X is in memory as:
[8 bytes] Dn
.
.
[8 bytes] D2
[8 bytes] D1
[8 bytes] D0
[8 bytes] 0
Let B = 264
Then
X = Dn * Bn + ... + D2 * B2 + D1 * B1 + D0
The empty string (i.e. 8 bytes of zero) means zero.
Is this a reasonable way? What are the pros and cons of this way? Is there a better way?
How would you handle signedness? Does 2's complement work with this variable length value?
(Found this: http://gmplib.org/manual/Integer-Internals.html Whats a limb?)
I would think it would be as an array lowest value to highest. I implemented addition of arbitrary sized numbers in assembler. The CPU provides the carry flag that allows you to easily perform these sorts of operations. You write a loop that performs the operation in byte size chunks. The carry flag is included in the next operation using the "Add with carry" instruction (ADC opcode).
Here I have some examples of processing Big Integers.
Addition
Principle is pretty simple. You need to use CF (carry-flag) for any bigger overflow, with adc (add with carry) propagating that carry between chunks. Let's think about two 128-bit number addition.
num1_lo: dq 1<<63
num1_hi: dq 1<<63
num2_lo: dq 1<<63
num2_hi: dq 1<<62
;Result of addition should be 0xC0000000 0x000000001 0x00000000 0x00000000
mov eax, dword [num1_lo]
mov ebx, dword [num1_lo+4]
mov ecx, dword [num1_hi]
mov edx, dword [num1_hi+4]
add eax, dword [num2_lo]
adc ebx, dword [num2_lo+4]
adc ecx, dword [num2_hi]
adc edx, dword [num2_hi+4]
; 128-bit integer sum in EDX:ECX:EBX:EAX
jc .overflow ; detect wrapping if you want
You don't need all of it in registers at once; you could store a 32-bit chunk before loading the next, because mov doesn't affect FLAGS. (Looping is trickier, although dec/jnz is usable on modern CPUs which don't have partial-flag stalls for ADC reading CF after dec writes other FLAGS. See Problems with ADC/SBB and INC/DEC in tight loops on some CPUs)
Subtraction
Very similar to addition, although you CF is now called borrow.
mov eax, dword [num1_lo]
mov ebx, dword [num1_lo+4]
mov ecx, dword [num1_hi]
mov edx, dword [num1_hi+4]
sub eax, dword [num2_lo]
sbb ebx, dword [num2_lo+4]
sbb ecx, dword [num2_hi]
sbb edx, dword [num2_hi+4]
jb .overflow ;or jc
Multiplication
Is much more difficult. You need to multiply each part of first number with each part of second number and add the results. You don't have to multiply only two highest parts that will surely overflow. Pseudocode:
long long int /*128-bit*/ result = 0;
long long int n1 = ;
long long int n2 = ;
#define PART_WIDTH 32 //to be able to manipulate with numbers in 32-bit registers
int i_1 = 0; /*iteration index*/
for(each n-bit wide part of first number : n1_part) {
int i_2 = 0;
for(each n-bit wide part of second number : n2_part) {
result += (n1_part << (i_1*PART_WIDTH))*(n2_part << (i_2*PART_WIDTH));
i_2++;
}
i++;
}
Division
is even more complicated. User Brendan on OsDev.org forum posted example pseudocode for division of n-bit integers. I'm pasting it here because principle is the same.
result = 0;
count = 0;
remainder = numerator;
while(highest_bit_of_divisor_not_set) {
divisor = divisor << 1;
count++;
}
while(remainder != 0) {
if(remainder >= divisor) {
remainder = remainder - divisor;
result = result | (1 << count);
}
if(count == 0) {
break;
}
divisor = divisor >> 1;
count--;
}
Dividing a wide number by a 1-chunk (32 or 64-bit number) can use a sequence of div instructions, using the remainder of the high element as the high half of the dividend for the next lower chunk. See Why should EDX be 0 before using the DIV instruction? for an example of when div is useful with non-zero EDX.
But this doesn't generalize to N-chunk / N-chunk division, hence the above manual shift / subtract algorithm.

Resources