Error in LLVM 13 documentation for little-endian vectors? - llvm-ir

LLVM 13 added a short note on the bit representation of sub-byte elements to its documentation on vector types. I can follow everything it says except for the memory diagram for little endian, which doesn't look right and disagrees with my experiments. I'm wondering if I'm misunderstanding something:
The same example for little endian:
%val = bitcast <4 x i4> <i4 1, i4 2, i4 3, i4 5> to i16
; Bitcasting from a vector to an integral type can be seen as
; concatenating the values:
; %val now has the hexadecimal value 0x5321.
store i16 %val, i16* %ptr
; In memory the content will be (8-bit addressing):
;
; [%ptr + 0]: 01010011 (0x53)
; [%ptr + 1]: 00100001 (0x21)
I agree that %val has value 0x5321, but shouldn't the memory layout be 0x21 0x53 (33 83 in decimal) instead of 0x53 0x21? For example, bitcasting %val to <2 x i8> yields <i8 33, i8 83>.
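For what it's worth, here is a small C check (my own sketch, not from the LLVM docs) showing how a little-endian store lays out the 16-bit value 0x5321; it gives 0x21 at the lower address, which is what I expected rather than what the diagram shows:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    uint16_t val = 0x5321;
    uint8_t bytes[2];
    memcpy(bytes, &val, sizeof val);       // inspect the bytes as stored in memory
    printf("[+0]: 0x%02X\n", bytes[0]);    // 0x21 on a little-endian machine
    printf("[+1]: 0x%02X\n", bytes[1]);    // 0x53 on a little-endian machine
    return 0;
}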

Related

Plotting images from array containing datatype UInt8

I have a bunch of images (of cats) and want to plot one of them. The image values are UInt8 and contain 3 bands. When I try to plot using Plots I get the following error: ERROR: StackOverflowError.
using Plots
# Get data
train_data_x = fid["train_set_x"] |> HDF5.read
#Out >
3×64×64×209 Array{UInt8, 4}:
[:, :, 1, 1] =
0x11 0x16 0x19 0x19 0x1b … 0x01 0x01 0x01 0x01
0x1f 0x21 0x23 0x23 0x24 0x1c 0x1c 0x1a 0x16
0x38 0x3b 0x3e 0x3e 0x40 0x3a 0x39 0x38 0x33
...
# Reshape to be in the format, no_of_images x length x width x channels
train_data_rsp = reshape(train_data_x, (209,64,64,3))
# Get first image
first_img = train_data_rsp[1, :, :, :]
plot(first_img)
Out >
ERROR: StackOverflowError:
# I also tried plotting one band and I get a line plot
plot(train_data_rsp[1,:,:,1])
#Out >
Any ideas what's incorrect with my code?
First, I'd be careful about how you're reshaping; I think this will merely rearrange the pixels in your images instead of swapping the dimensions, which is what you seem to want. You may want train_data_rsp = permutedims(train_data_x, (4, 2, 3, 1)), which will actually swap the dimensions around and give you a 209×64×64×3 array while preserving which pixels belong to which images.
Then, Julia's Images package has a colorview function that lets you combine the separate R,G,B channels into a single image. You'll first need to convert your array element type into N0f8 (a single-byte format where 0 corresponds to 0 and 255 to 1) so that Images can work with it. It would look something like this:
julia> arr_rgb = N0f8.(first_img // 255) # rescale UInt8 in range [0,255] to Rational with denominator 255 in range [0,1]
64×64×3 Array{N0f8,3} with eltype N0f8:
[...]
julia> img = colorview(RGB, map(i->selectdim(arr_rgb, 3, i), 1:3)...)
64×64 mappedarray(RGB{N0f8}, ImageCore.extractchannels, view(::Array{N0f8,3}, :, :, 1), view(::Array{N0f8,3}, :, :, 2), view(::Array{N0f8,3}, :, :, 3)) with eltype RGB{N0f8}:
[...]
Then you should be able to plot this image.

how to encode 27 vector3's into a 0-256 value?

I have 27 combinations of 3 values from -1 to 1 of type:
Vector3(0,0,0);
Vector3(-1,0,0);
Vector3(0,-1,0);
Vector3(0,0,-1);
Vector3(-1,-1,0);
... up to
Vector3(0,1,1);
Vector3(1,1,1);
I need to convert them to and from an 8-bit sbyte / byte array.
One solution is to say the first digit of the value is X, the second digit is Y and the third is Z (mapping -1, 0, 1 to 0, 1, 2)...
so
Vector3(-1,1,1) becomes 022,
Vector3(1,-1,-1) becomes 200,
Vector3(1,0,1) becomes 212...
I'd prefer to encode it in a more compact way, perhaps using bytes (which I am clueless about), because the above solution uses a lot of multiplications and round functions to decode. Do you have any suggestions? The other option is to write 27 if conditions to write the Vector3 combination to an array, which seems inefficient.
Thanks to Evil Tak for the guidance. I changed the code a bit to add 0-1 values in the first bits and to adapt it for Unity3D:
function Pack4(x:int,y:int,z:int,w:int):sbyte {
var b: sbyte = 0;
b |= (x + 1) << 6;
b |= (y + 1) << 4;
b |= (z + 1) << 2;
b |= (w + 1);
return b;
}
function unPack4(b:sbyte):Vector4 {
var v : Vector4;
v.x = ((b & 0xC0) >> 6) - 1; //0xC0 == 1100 0000
v.y = ((b & 0x30) >> 4) - 1; // 0x30 == 0011 0000
v.z = ((b & 0xC) >> 2) - 1; // 0xC == 0000 1100
v.w = (b & 0x3) - 1; // 0x3 == 0000 0011
return v;
}
I assume your values are float, not integer, so bit operations will not improve speed much in comparison to conversion to an integer type. So my bet is that using the full range will be better. I would do this for the 3D case:
8 bit -> 256 values
3D -> pow(256,1/3) = ~ 6.349 values per dimension
6^3 = 216 < 256
So packing of (x,y,z) looks like this:
BYTE p;
p = floor((x+1.0)*3.0);
p += floor((y+1.0)*3.0)*6.0;
p += floor((z+1.0)*3.0)*6.0*6.0;
The idea is to convert <-1,+1> to <0,2> (hence the +1.0) and scale by 3.0 (instead of 6.0) to get a digit in <0,6), then multiply that digit into the correct place in the final BYTE.
and unpacking of p looks like this:
x=p%6; x=(x/3.0)-1.0; p/=6;
y=p%6; y=(y/3.0)-1.0; p/=6;
z=p%6; z=(z/3.0)-1.0;
This way you use 216 of the 256 values, which is much better than just 2 bits (4 values) per dimension. Your 4D case would look similar; instead of the constants 3.0, 6.0 use floor(pow(256,1/4)) = 4, i.e. 2.0, 4.0, but beware the case when p = 256, or use 2 bits per dimension and a bit approach like the accepted answer does.
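A rough C translation of the pseudocode above (my own sketch; the names pack3/unpack3 are made up, and it assumes components strictly below +1.0 so a digit never reaches 6):

#include <math.h>

unsigned char pack3(float x, float y, float z)
{
    unsigned char p = 0;
    p += (unsigned char)floorf((x + 1.0f) * 3.0f);        // digit 0..5
    p += (unsigned char)floorf((y + 1.0f) * 3.0f) * 6;    // weight 6
    p += (unsigned char)floorf((z + 1.0f) * 3.0f) * 36;   // weight 6*6
    return p;                                             // 0..215
}

void unpack3(unsigned char p, float *x, float *y, float *z)
{
    *x = (p % 6) / 3.0f - 1.0f; p /= 6;
    *y = (p % 6) / 3.0f - 1.0f; p /= 6;
    *z = (p % 6) / 3.0f - 1.0f;
}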
If you need real speed you can optimize this by forcing the float representation holding the packed BYTE to a specific exponent and extracting the mantissa bits as your packed BYTE directly. As the result will be <0,216> you can add any bigger number to it; see IEEE 754-1985 for details, but you want the mantissa to align with your BYTE, so if you add a number like 2^23 to p then the lowest 8 bits of the float should be your packed value directly (as the leading 1 is not present in the mantissa), so no expensive conversion is needed.
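As a sketch of that exponent trick (my own code, assuming IEEE 754 single precision; the helper name is made up):

#include <stdint.h>
#include <string.h>

uint8_t packed_byte_from_float(float p)      // p already holds the packed value 0..216
{
    float biased = p + 8388608.0f;           // add 2^23 so the integer part lands in the mantissa
    uint32_t bits;
    memcpy(&bits, &biased, sizeof bits);     // reinterpret the float's bit pattern
    return (uint8_t)(bits & 0xFF);           // the low 8 mantissa bits are the packed BYTE
}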
In case you have just {-1,0,+1} instead of <-1,+1>, then of course you should use an integer approach like bit packing with 2 bits per dimension, or use a LUT table of all 3^3 = 27 possibilities and pack the entire vector in 5 bits.
The encoding would look like this:
int enc[3][3][3] = { 0,1,2, ... 24,25,26 };
p=enc[x+1][y+1][z+1];
And decoding:
int dec[27][3] = { {-1,-1,-1},.....,{+1,+1,+1} };
x=dec[p][0];
y=dec[p][1];
z=dec[p][2];
This should be fast enough, and if you have many vectors you can pack each p into 5 bits ... to save even more memory space.
One way is to store the component of each vector in every 2 bits of a byte.
Converting a vector component value to and from the 2 bit stored form is as simple as adding and subtracting one, respectively.
-1 (1111 1111 as a signed byte) <-> 00 (in binary)
0 (0000 0000 in binary) <-> 01 (in binary)
1 (0000 0001 in binary) <-> 10 (in binary)
The packed 2 bit values can be stored in a byte in any order of your preference. I will use the following format: 00XXYYZZ where XX is the converted (packed) value of the X component, and so on. The 0s at the start aren't going to be used.
A vector will then be packed in a byte as follows:
byte Pack(Vector3<int> vector) {
byte b = 0;
b |= (vector.x + 1) << 4;
b |= (vector.y + 1) << 2;
b |= (vector.z + 1);
return b;
}
Unpacking a vector from its byte form will be as follows:
Vector3<int> Unpack(byte b) {
Vector3<int> v = new Vector3<int>();
v.x = ((b & 0x30) >> 4) - 1; // 0x30 == 0011 0000
v.y = ((b & 0xC) >> 2) - 1; // 0xC == 0000 1100
v.z = (b & 0x3) - 1; // 0x3 == 0000 0011
return v;
}
Both the above methods assume that the input is valid, i.e. all components of vector in Pack are either -1, 0 or 1, and all two-bit sections of b in Unpack have a (binary) value of either 00, 01 or 10.
Since this method uses bitwise operators, it is fast and efficient. If you wish to compress the data further, you could try using the 2 unused bits too, and convert every 3 two-bit elements processed to a vector.
The most compact way is to write a 27-digit number in base 3 (using a shift: -1 -> 0, 0 -> 1, 1 -> 2).
The value of this number will range from 0 to 3^27-1 = 7,625,597,484,986, which takes 43 bits to encode, i.e. 6 bytes (and 5 spare bits).
This is a little saving compared to a packed representation with 4 two-bit numbers packed in a byte (hence 7 bytes/56 bits in total).
An interesting variant is to group the base 3 digits five by five in bytes (hence numbers 0 to 242). You will still require 6 bytes (and no spare bits), but the decoding of the bytes can easily be hard-coded as a table of 243 entries.
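A C sketch of that five-digits-per-byte variant (my own illustration; the trits are assumed to be already shifted to 0..2):

#include <stdint.h>

void pack27(const uint8_t trits[27], uint8_t out[6])
{
    for (int b = 0; b < 6; ++b) {
        uint8_t v = 0;
        for (int t = 4; t >= 0; --t) {
            int idx = b * 5 + t;
            v = v * 3 + (idx < 27 ? trits[idx] : 0);   // the last byte only holds 2 trits
        }
        out[b] = v;                                    // each byte is in 0..242
    }
}

void unpack27(const uint8_t in[6], uint8_t trits[27])
{
    for (int b = 0; b < 6; ++b) {
        uint8_t v = in[b];
        for (int t = 0; t < 5; ++t) {
            int idx = b * 5 + t;
            if (idx < 27) trits[idx] = v % 3;
            v /= 3;
        }
    }
}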

4 byte sequence 0x86 0x65 0x71 0xA5 in a little-endian architecture interpreted as a 32-bit signed integer represents what decimal?

I know how to convert 0x86 0x65 0x71 0xA5 into decimals, I'm just not sure how to approach it past that point.
I assume it is (from least significant to most significant) 134 101 113 165, but what exactly do I do past this point? I'm guessing 134,101,113,165 is not correct. Do I need to convert anything into binary to do this? Kind of lost conceptually.
By converting each octet into decimal, you've essentially converted the number into base 256. You can do it that way, but it's not particularly easy. You'd have to combine the parts as follows:
134 x (256^0) + 101 x (256^1) + 113 x (256^2) + 165 x (256^3)
0x86 0x65 0x71 0xA5 as a 32-bit unsigned integer in little-endian notation would mean that the integer in hex is 0xA5716586. Then just convert from hex to decimal normally.
Either way, you will get 2,775,672,198.
However, this is a signed integer, not an unsigned integer. And because the most significant byte is A5, the most significant bit is 1. Therefore, this is a negative number.
So we need to do some math:
FFFFFFFF - A5716586 = 5A8E9A79
So:
A5716586 + 5A8E9A79 = FFFFFFFF
Also, in 32-bit arithmetic:
FFFFFFFF + 1 = 0
So:
FFFFFFFF => -1
Combining these two:
A5716586 + 5A8E9A79 => -1
A5716586 = -1 -5A8E9A79 = - (5A8E9A79 + 1) = - 5A8E9A7A
Also:
5A8E9A7A => 1,519,295,098 (decimal)
So our final answer is -1,519,295,098
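You can confirm this on a little-endian machine with a short C check (my own snippet, not part of the original answer):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    uint8_t bytes[4] = { 0x86, 0x65, 0x71, 0xA5 };  // lowest address first
    int32_t value;
    memcpy(&value, bytes, sizeof value);            // reinterpret as a 32-bit signed integer
    printf("%d\n", value);                          // prints -1519295098 on a little-endian host
    return 0;
}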

Hex Addition Overflow detection

I'm trying to detect whether hex arithmetic results in overflow or not, using just 8-bit two's complement signed operations.
0xFF + 0x1
But first, I'm having trouble determining whether a number is negative or positive in hexadecimal.
In 2's complement, overflow occurs when the result is the wrong sign.
Example:
Two positives yield a negative result:
01111111 (+127)
+ 00000001 (+ 1)
-------------------
10000000 (-128) <-- overflow (wrong sign)
Two negatives yield a positive result:
11111111 ( -1)
+ 10000000 (-128)
-------------------
01111111 (+127) <-- overflow (wrong sign)
Note: overflow cannot occur if adding numbers with opposite signs.
01111111 (+127)
+ 10000000 (-128)
-------------------
11111111 ( -1)
Concerning the sign, the leftmost bit is the sign bit. "0" is positive
and "1" is negative.
Example:
+------- sign bit
|
v
0xFF = 11111111 = -1
0x80 = 10000000 = -128
0x01 = 00000001 = +1
0x7F = 01111111 = +127
If the leftmost hexadecimal digit is 0, 1, 2, 3, 4, 5, 6, or 7,
then it is positive. If the leftmost hexadecimal digit is 8, 9, A, B,
C, D, E, or F, then it is negative.
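Putting both rules into code, here is a small C sketch (my own, assuming 8-bit two's complement): the sum overflows exactly when the operands have the same sign and the result has a different sign.

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

bool add_overflows(int8_t a, int8_t b)
{
    int8_t sum = (int8_t)((uint8_t)a + (uint8_t)b);  // wrap around like the 8-bit hardware
    return ((a ^ sum) & (b ^ sum)) < 0;              // negative only if both operands differ in sign from the result
}

int main(void)
{
    printf("%d\n", add_overflows((int8_t)0xFF, 0x01));  // -1 + 1 = 0, no overflow -> 0
    printf("%d\n", add_overflows(0x7F, 0x01));          // 127 + 1 wraps to -128, overflow -> 1
    return 0;
}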

How to interpret a binary integer as ternary (base 3)?

My CPU register contains a binary integer 0101, equal to the decimal number 5:
0101 ( 4 + 1 = 5 )
I want the register to contain instead the binary integer equal to decimal 10, as if the original binary number 0101 were ternary (base 3) and every digit happens to be either 0 or 1:
0101 ( 9 + 1 = 10 )
How can I do this on a contemporary CPU or GPU with 1. the fewest memory reads and 2. the fewest hardware instructions?
Use an accumulator. C-ish Pseudocode:
var accumulator = 0
foreach digit in string
    accumulator = accumulator * 3 + (digit - '0')
return accumulator
To speed up the multiply by 3, you might use ((accumulator << 1) + accumulator), but a good compiler will be able to do that for you.
If a large percentage of your numbers are within a relatively small range, you can also pregenerate a lookup table to make the transformation from base2 to base3 instantaneous (using the base2 value as the index). You can also use the lookup table to accelerate lookup of the first N digits, so you only pay for the conversion of the remaining digits.
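As a sketch of the lookup-table idea (my own code; the names are made up): a 256-entry table handles the low 8 bits at once, and the next 8-bit chunk is combined with the weight 3^8 = 6561.

#include <stdint.h>

static uint32_t lut[256];

void init_lut(void)
{
    for (int i = 0; i < 256; ++i) {
        uint32_t v = 0, pow3 = 1;
        for (int bit = 0; bit < 8; ++bit) {
            if (i & (1 << bit)) v += pow3;   // binary digit reused as a ternary digit
            pow3 *= 3;
        }
        lut[i] = v;
    }
}

uint32_t binary_digits_as_ternary(uint16_t x)
{
    return lut[x & 0xFF] + 6561u * lut[x >> 8];   // 6561 = 3^8, weight of the high chunk
}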
This C program will do it:
#include <stdio.h>

int main()
{
    int binary = 5000; // Example
    int ternary = 0;
    int po3 = 1;
    do
    {
        ternary += (binary & 1) * po3;
        po3 *= 3;
    }
    while (binary >>= 1 != 0); // parses as binary >>= (1 != 0), i.e. shift by 1, and loops while binary is nonzero
    printf("%d\n", ternary);
    return 0;
}
The loop compiles into this machine code on my 32-bit Intel machine:
do
{
ternary += (binary & 1) * po3;
0041BB33 mov eax,dword ptr [binary]
0041BB36 and eax,1
0041BB39 imul eax,dword ptr [po3]
0041BB3D add eax,dword ptr [ternary]
0041BB40 mov dword ptr [ternary],eax
po3 *= 3;
0041BB43 mov eax,dword ptr [po3]
0041BB46 imul eax,eax,3
0041BB49 mov dword ptr [po3],eax
}
while (binary >>= 1 != 0);
0041BB4C mov eax,dword ptr [binary]
0041BB4F sar eax,1
0041BB51 mov dword ptr [binary],eax
0041BB54 jne main+33h (41BB33h)
For the example value (decimal 5000 = binary 1001110001000), the ternary value it produces is 559899.
