How to generate an instruction for an LLVM vector comparison

In LLVM, we can create a comparison instruction for scalar operands easily. For example, if we have:
%a = fsub double %1, %2
%b = fadd double %3, %4
%c = fcmp one double %a, %b
where instruction %c can be generated by:
c = new FCmpInst(insertAt, FCmpInst::FCMP_ONE, a, b, instName)
(here a and b are the Value pointers for %a and %b). Can we do a similar comparison conveniently for vector instructions? For example, if %a and %b are, respectively:
%a = fsub <2 x double> %5, %6 ; %5 and %6 are of vector type
%b = fadd <2 x double> %7, %8
Can we use a comparison similar to the scalar version to check the equivalence of %a and %b?
I actually tried the same instruction as above, but when I load the pass it produces the following error: "void llvm::BranchInst::AssertOK(): Assertion `getCondition()->getType()->isIntegerTy(1) && "May only branch on boolean predicates!"' failed."

fcmp on a vector returns a vector of boolean results (<2 x i1> here), which cannot feed a branch directly. You'll need to change the way you reduce that mask into a single boolean predicate for branching purposes.
For your case, you'll want to reduce the result of the fcmp to an integer and compare it with zero.

I think http://lists.llvm.org/pipermail/llvm-dev/2012-September/053046.html may contain the answer to your question.
It can be summarized as:
sign-extend your boolean vector to an integer vector
bitcast the integer vector to an integer of the same size
make the comparison on the integer
%a = fsub <2 x double> %5, %6
%b = fadd <2 x double> %7, %8
%c = fcmp one <2 x double> %a, %b ; gives a <2 x i1> mask
%d = sext <2 x i1> %c to <2 x i64>
%e = bitcast <2 x i64> %d to i128
%f = icmp ne i128 %e, 0
br i1 %f, label %true1, label %false2
The sign extension is used so that backends for standard vector instruction sets can map these instructions to native ones without switching vector size.
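To emit that sequence from a pass, a sketch with the C++ IRBuilder API might look like the following (hedged: the names a, b, insertAt, trueBB and falseBB are assumptions from your snippet, and the exact signature of VectorType::get varies between LLVM versions):

// Sketch: generate fcmp + sext + bitcast + icmp + br.
// a and b are the <2 x double> Values; trueBB/falseBB are the branch targets.
llvm::IRBuilder<> builder(insertAt);
llvm::Value *c = builder.CreateFCmp(llvm::FCmpInst::FCMP_ONE, a, b, "c");   // <2 x i1>
llvm::Type *i64x2 = llvm::VectorType::get(builder.getInt64Ty(), 2);         // <2 x i64>
llvm::Value *d = builder.CreateSExt(c, i64x2, "d");
llvm::Value *e = builder.CreateBitCast(d, builder.getIntNTy(128), "e");     // i128
llvm::Value *f = builder.CreateICmpNE(e, llvm::ConstantInt::get(builder.getIntNTy(128), 0), "f");
builder.CreateCondBr(f, trueBB, falseBB);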

Related

Plotting images from array containing datatype UInt8

I have a bunch of images (of cats) and want to plot one of them. The image values are in the format UInt8 and contain 3 bands. When I try to plot using Plots I get the following error: ERROR: StackOverflowError.
using Plots
# Get data
train_data_x = fid["train_set_x"] |> HDF5.read
#Out >
3×64×64×209 Array{UInt8, 4}:
[:, :, 1, 1] =
0x11 0x16 0x19 0x19 0x1b … 0x01 0x01 0x01 0x01
0x1f 0x21 0x23 0x23 0x24 0x1c 0x1c 0x1a 0x16
0x38 0x3b 0x3e 0x3e 0x40 0x3a 0x39 0x38 0x33
...
# Reshape to be in the format, no_of_images x length x width x channels
train_data_rsp = reshape(train_data_x, (209,64,64,3))
# Get first image
first_img = train_data_rsp[1, :, :, :]
plot(first_img)
Out >
ERROR: StackOverflowError:
# I also tried plotting one band and I get a line plot
plot(train_data_rsp[1,:,:,1])
#Out >
Any ideas what's incorrect with my code?
First, I'd be careful about how you're reshaping; I think this will merely rearrange the pixels in your images instead of swapping the dimensions, which is what it seems like you want to do. You may want train_data_rsp = permutedims(train_data_x, (4, 2, 3, 1)), which will actually swap the dimensions around and give you a 209×64×64×3 array with the semantics of which pixels belong to which images preserved.
Then, Julia's Images package has a colorview function that lets you combine the separate R,G,B channels into a single image. You'll first need to convert your array element type into N0f8 (a single-byte format where 0 corresponds to 0 and 255 to 1) so that Images can work with it. It would look something like this:
julia> arr_rgb = N0f8.(first_img // 255) # rescale UInt8 in range [0,255] to Rational with denominator 255 in range [0,1]
64×64×3 Array{N0f8,3} with eltype N0f8:
[...]
julia> img = colorview(RGB, map(i->selectdim(arr_rgb, 3, i), 1:3)...)
64×64 mappedarray(RGB{N0f8}, ImageCore.extractchannels, view(::Array{N0f8,3}, :, :, 1), view(::Array{N0f8,3}, :, :, 2), view(::Array{N0f8,3}, :, :, 3)) with eltype RGB{N0f8}:
[...]
Then you should be able to plot this image.

How to bruteforce a lossy AND routine?

I'm wondering whether there are any standard approaches to reversing AND routines by brute force.
For example I have the following transformation:
MOV(eax, 0x5b3e0be0) # Here we move 0x5b3e0be0 to EAX.
MOV(edx, eax) # Here we copy 0x5b3e0be0 to EDX as well.
SHL(edx, 0x7) # Shift 0x5b3e0be0 left by 0x7, which results in 0x9f05f000.
AND(edx, 0x9d2c5680) # AND 0x9f05f000 with 0x9d2c5680, which results in 0x9d045000.
XOR(edx, eax) # XOR 0x9d045000 with the original value 0x5b3e0be0, which results in 0xc63a5be0.
My question is how to brute-force and reverse this routine (i.e. transform 0xc63a5be0 back into 0x5b3e0be0).
One idea I had (which didn't work) was this PeachPy implementation:
#Input values
MOV(esi, 0xffffffff) # Initial value to AND with, which will be decreased by 1 in a loop.
MOV(cl, 0x1) # Initial value to SHR with, which will be increased by 1 until 0x1f.
MOV(eax, 0xc63a5be0) # Target result which I'm looking to get using the below loop.
MOV(edx, 0x5b3e0be0) # Input value which will be transformed.
sub_esi = peachpy.x86_64.Label()
with Loop() as loop:
    # End the loop if ESI == 0x0.
    TEST(esi, esi)
    JZ(loop.end)
    # Run the routine and check whether it matches the end result.
    MOV(ebx, eax)
    SHR(ebx, cl)
    TEST(ebx, ebx)
    JZ(sub_esi)
    AND(ebx, esi)
    XOR(ebx, eax)
    CMP(ebx, edx)
    JZ(loop.end)
    # Add to the CL register which is used for SHR.
    # Also check if we've reached the last potential value of CL, which is 0x1f.
    ADD(cl, 0x1)
    CMP(cl, 0x1f)
    JNZ(loop.begin)
    # Decrement ESI by 1, reset CL and restart the routine.
    peachpy.x86_64.LABEL(sub_esi)
    SUB(esi, 0x1)
    MOV(cl, 0x1)
    JMP(loop.begin)
# The ESI result here will either be 0x0 or a valid value to AND with and get the necessary result.
RETURN(esi)
Is there perhaps an article or a book you can recommend that is specific to this?
It's not lossy; the final operation is an XOR.
The whole routine can be modeled in C as:
#include <stdint.h>

#define K 0x9d2c5680

uint32_t hash(uint32_t num)
{
    return num ^ ((num << 7) & K);
}
Now, if we have two bits x and y and the operation x XOR y, when y is zero the result is x.
So given two numbers n1 and n2 and considering their XOR, the bits of n1 that pair with a zero in n2 make it into the result unchanged (the others are flipped).
So, considering num ^ ((num << 7) & K), we can identify num with n1 and (num << 7) & K with n2.
Since n2 is the result of an AND, we can tell that it must have at least the same zero bits that K has.
This means that each bit of num that corresponds to a zero bit in the constant K makes it unchanged into the result.
Thus, by extracting those bits from the result we already have a partial inverse function:
/*hash & ~K extracts the bits of hash that pair with a zero bit in K*/
partial_num = hash & ~K
Technically, the term num << 7 also introduces other zeros into the result of the AND: we know for sure that its lowest 7 bits must be zero.
However, K already has its lowest 7 bits zero, so we cannot exploit this information.
So we will just use K here; but if its value were different, you'd need to take the AND into account (which, in practice, means zeroing the lower 7 bits of K).
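As a small sketch of that adjustment (K_eff is a hypothetical name; with this particular K it changes nothing, since K's low 7 bits are already zero):

/* Bits where the data-dependent mask (num << 7) & K is guaranteed zero
   are exactly those where K & ~0x7f is zero. */
uint32_t K_eff = K & ~0x7fu;          /* num << 7 always has its low 7 bits clear */
uint32_t partial_num = hash & ~K_eff; /* with this K, identical to the line above */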
This leaves us with 13 bits unknown (the ones corresponding to the bits that are set in K).
If we forget about the AND for a moment, we would have x ^ (x << 7), meaning that
h_i = num_i for i from 0 to 6 inclusive
h_i = num_i ^ num_(i-7) for i from 7 to 31 inclusive
(The first line is due to the fact that the lower 7 bits of the right-hand operand are zero.)
From this, starting from h_7 and going up, we can retrieve num_7 as h_7 ^ num_0 = h_7 ^ h_0.
From bit 14 onward the equality num_(i-7) = h_(i-7) no longer holds and we need num_(i-7) itself, but luckily we have already computed its value in a previous step (that's why we proceed from lower bits to higher ones).
What the AND does to this is just restrict the values the index i runs over, specifically to the bits that are set in K.
So to fill in the thirteen remaining bits one has to do:
part_num_7 = h_7 ^ part_num_0
part_num_9 = h_9 ^ part_num_2
part_num_12 = h_12 ^ part_num_5
...
part_num_31 = h_31 ^ part_num_24
Note that we exploited the fact that part_num_0..6 = h_0..6.
Here's a C program that inverts the function:
#include <stdio.h>
#include <stdint.h>
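/* BIT(i, hash, result) recovers bit i+7 of num: it XORs bit i+7 of the
   hash with bit i of the partial result (a bit of num we already know). */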
#define BIT(i, hash, result) ( (((result >> i) ^ (hash >> (i+7))) & 0x1) << (i+7) )
#define K 0x9d2c5680
uint32_t base_candidate(uint32_t hash)
{
    uint32_t result = hash & ~K;
    result |= BIT(0, hash, result);
    result |= BIT(2, hash, result);
    result |= BIT(3, hash, result);
    result |= BIT(5, hash, result);
    result |= BIT(7, hash, result);
    result |= BIT(11, hash, result);
    result |= BIT(12, hash, result);
    result |= BIT(14, hash, result);
    result |= BIT(17, hash, result);
    result |= BIT(19, hash, result);
    result |= BIT(20, hash, result);
    result |= BIT(21, hash, result);
    result |= BIT(24, hash, result);
    return result;
}

uint32_t hash(uint32_t num)
{
    return num ^ ((num << 7) & K);
}

int main()
{
    uint32_t tester = 0x5b3e0be0;
    uint32_t candidate = base_candidate(hash(tester));
    printf("candidate: %x, tester: %x\n", candidate, tester);
    return 0;
}
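A more compact inverse (a sketch, not part of the original answer) uses fixed-point iteration: starting from num = hash, the low 7 bits are already correct, and each pass of num = hash ^ ((num << 7) & K) fixes 7 more low bits, so four passes recover all 32. Incidentally, 0x9d2c5680 is the tempering constant of the Mersenne Twister, where the same untempering trick is used.

/* Sketch: iterative inverse of num ^ ((num << 7) & K). */
uint32_t unhash(uint32_t h)
{
    uint32_t num = h;               /* low 7 bits of num equal low 7 bits of h */
    for (int i = 0; i < 4; i++)
        num = h ^ ((num << 7) & K); /* each pass extends the correct low bits by 7 */
    return num;
}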
Since the original question was how to "bruteforce" rather than solve analytically, here's something that I eventually came up with which works just as well. Since the transformation is invertible, the loop finds exactly one match, though it may have to try every 32-bit value to get there.
from peachpy import *
from peachpy.x86_64 import *

input = 0xc63a5be0
x = Argument(uint32_t)

with Function("DotProduct", (x,), uint32_t) as asm_function:
    LOAD.ARGUMENT(edx, x)  # EDX = the target hash, 0xc63a5be0
    MOV(esi, 0xffffffff)
    with Loop() as loop:
        TEST(esi, esi)
        JZ(loop.end)
        MOV(eax, esi)
        SHL(eax, 0x7)
        AND(eax, 0x9d2c5680)
        XOR(eax, esi)
        CMP(eax, edx)
        JZ(loop.end)
        SUB(esi, 0x1)
        JMP(loop.begin)
    RETURN(esi)

# Read the assembled function's return value.
abi = peachpy.x86_64.abi.detect()
encoded_function = asm_function.finalize(abi).encode()
python_function = encoded_function.load()
print(hex(python_function(input)))
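For comparison, a sketch of the same search in plain C (like the ESI loop above, it tries every 32-bit value from the top down):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    const uint32_t target = 0xc63a5be0u;
    for (uint32_t n = 0xffffffffu; n != 0; n--) {
        if ((n ^ ((n << 7) & 0x9d2c5680u)) == target) {
            printf("%x\n", n); /* prints 5b3e0be0 */
            break;
        }
    }
    return 0;
}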

How to write more than 7 columns into a CSV file with IDL?

I have been trying to write 8 columns to a CSV file using IDL, but it seems the maximum number of columns I can write is 7.
IDL> write_csv,ffout,date_time,tmin_tmax,precp,wind,rh,sun_hrs,glb_rad,net_rad
WRITE_CSV: Incorrect number of arguments
The documentation for WRITE_CSV indicates you can write up to 8 columns, which works for me:
x = findgen(10)
write_csv, 'test.csv', x, x, x, x, x, x, x, x
In any case, if you need to write more columns, it is easy enough to use PRINTF to output each line:
openw, lun, 'test.csv', /get_lun
for i = 0L, n_lines - 1L do begin
  printf, lun, a[i], b[i], c[i], d[i], e[i], f[i], g[i], h[i], $
    format='(%"%f, %f, %f, %f, %f, %f, %f, %f")'
endfor
free_lun, lun
You can change the format codes as appropriate, or use the Fortran-style format codes as you prefer.
As long as your input arrays are one-dimensional you can just concatenate them (and do an additional transpose):
a = FINDGEN(3)
b = TRANSPOSE([ [a], [a], [a], [a], [a], [a], [a], [a], [a], [a] ])
WRITE_CSV, 'test.csv', b

Type stability with container types and matrix-vector multiply in Julia

I am trying to use Julia's A_mul_B! with a container type, something like
# my composite type, contains 2 vectors and 1 matrix of same Float type
type MyContainer{T <: Float}
    z :: Vector
    x :: Matrix
    y :: Vector
    MyContainer(z::Vector{T}, x::Matrix{T}, y::Vector{T}) = new(z,x,y)
end
I then use an instance of MyContainer with A_mul_B! followed by some arithmetic with the Vector objects:
# only work with single/double precision
typealias Float Union{Float32, Float64}
# function to perform mat-vec multiply
function f{T <: Float}(v::MyContainer{T})
    Base.A_mul_B!(v.z, v.x, v.y)
    return sumabs2(v.z) * sumabs2(v.y)
end
As defined, f is curiously not type-stable, even though the constructor itself is type-stable. Is there a place where I can annotate the types of z, x, and y so that A_mul_B! sees them?
Here is a minimal working example:
MyModule.jl
module MyModule

export MyContainer, f

# only work with single/double precision
typealias Float Union{Float32, Float64}

# my composite type; contains 2 vectors and 1 matrix of the same Float type
type MyContainer{T <: Float}
    z :: Vector
    x :: Matrix
    y :: Vector
    MyContainer(z::Vector{T}, x::Matrix{T}, y::Vector{T}) = new(z,x,y)
end

# testing routine initializes all arrays with a single value
function MyContainer{T <: Float}(n::Int, t::T)
    z = t*ones(T, n)
    x = t*ones(T, (n,n))
    y = t*ones(T, n)
    return MyContainer{eltype(z)}(z, x, y)
end

# function to perform mat-vec multiply
function f{T <: Float}(v::MyContainer{T})
    Base.A_mul_B!(v.z, v.x, v.y)
    return sumabs2(v.z) * sumabs2(v.y)
end

end
test.jl
include("MyModule.jl")

function g()
    # check type stability:
    # @code_warntype MyModule.MyContainer(10, 1.0) shows the constructor is type-stable
    # @code_warntype MyModule.f(v) flags Array{T,1}, Array{T,2}, and Any in red
    # make a container
    v = MyModule.MyContainer(10, 1.0)
    # does type-stability matter for performance?
    @time 1+1
    MyModule.f(v)
    @time MyModule.f(v) # maybe... note the small memory allocation
end

g()
partial output
# output of @code_warntype omitted for conciseness
0.000000 seconds
0.000001 seconds (3 allocations: 48 bytes)
10000.0
As David Sanders pointed out, the problem is
type MyContainer{T <: Float}
    z :: Vector
    x :: Matrix
    y :: Vector
    MyContainer(z::Vector{T}, x::Matrix{T}, y::Vector{T}) = new(z,x,y)
end
Since Vector and Matrix are abstract types (they leave the element type unspecified), the compiler cannot infer concrete types for these fields, so every access to them is type-unstable. The fix is to concretely type them:
type MyContainer{T <: Float}
    z :: Vector{T}
    x :: Matrix{T}
    y :: Vector{T}
end

How to interpret a binary integer as ternary (base 3)?

My CPU register contains a binary integer 0101, equal to the decimal number 5:
0101 ( 4 + 1 = 5 )
I want the register to contain instead the binary integer equal to decimal 10, as if the original binary number 0101 were ternary (base 3) and every digit happened to be either 0 or 1:
0101 ( 9 + 1 = 10 )
How can I do this on a contemporary CPU or GPU with (1) the fewest memory reads and (2) the fewest hardware instructions?
Use an accumulator. C-ish Pseudocode:
var accumulator = 0
foreach digit in string
    accumulator = accumulator * 3 + (digit - '0')
return accumulator
To speed up the multiply by 3, you might use ((accumulator << 1) + accumulator), but a good compiler will be able to do that for you.
If a large percentage of your numbers are within a relatively small range, you can also pregenerate a lookup table to make the transformation from base2 to base3 instantaneous (using the base2 value as the index). You can also use the lookup table to accelerate lookup of the first N digits, so you only pay for the conversion of the remaining digits.
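Here is a minimal sketch of the table idea (hypothetical names; it assumes 16-bit inputs, and note that results for inputs much wider than 20 bits can overflow 32 bits, since 3^21 already exceeds 2^32):

#include <stdint.h>

static uint32_t table[256]; /* table[b] = the 8 bits of b read as base-3 digits */

void init_table(void)
{
    for (uint32_t b = 0; b < 256; b++) {
        uint32_t t = 0, po3 = 1;
        for (int i = 0; i < 8; i++) { /* the same accumulator idea, bit by bit */
            t += ((b >> i) & 1) * po3;
            po3 *= 3;
        }
        table[b] = t;
    }
}

/* Two lookups cover 16 bits: the high byte's digits carry weight 3^8 = 6561. */
uint32_t binary_as_ternary16(uint16_t x)
{
    return table[x & 0xff] + 6561u * table[x >> 8];
}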
This C program will do it:
#include <stdio.h>

int main()
{
    int binary = 5000; // Example
    int ternary = 0;
    int po3 = 1;
    do
    {
        ternary += (binary & 1) * po3;
        po3 *= 3;
    }
    while ((binary >>= 1) != 0);
    printf("%d\n", ternary);
}
The loop compiles into this machine code on my 32-bit Intel machine:
do
{
ternary += (binary & 1) * po3;
0041BB33 mov eax,dword ptr [binary]
0041BB36 and eax,1
0041BB39 imul eax,dword ptr [po3]
0041BB3D add eax,dword ptr [ternary]
0041BB40 mov dword ptr [ternary],eax
po3 *= 3;
0041BB43 mov eax,dword ptr [po3]
0041BB46 imul eax,eax,3
0041BB49 mov dword ptr [po3],eax
}
while ((binary >>= 1) != 0);
0041BB4C mov eax,dword ptr [binary]
0041BB4F sar eax,1
0041BB51 mov dword ptr [binary],eax
0041BB54 jne main+33h (41BB33h)
For the example value (decimal 5000 = binary 1001110001000), the ternary value it produces is 559899.
