Understanding a code-example from the Intel Intrinsics Guide

I am trying to learn what _mm256_permute2f128_ps() does, but I can't fully understand Intel's code example:
DEFINE SELECT4(src1, src2, control) {
CASE(control[1:0]) OF
0: tmp[127:0] := src1[127:0]
1: tmp[127:0] := src1[255:128]
2: tmp[127:0] := src2[127:0]
3: tmp[127:0] := src2[255:128]
ESAC
IF control[3]
tmp[127:0] := 0
FI
RETURN tmp[127:0]
}
dst[127:0] := SELECT4(a[255:0], b[255:0], imm8[3:0])
dst[255:128] := SELECT4(a[255:0], b[255:0], imm8[7:4])
dst[MAX:256] := 0
Specifically, I don't understand:
the imm8[3:0] notation. Are they using it as a 4-byte mask? I've seen people call _mm256_permute2f128_pd(myVec, myVec, 5), where imm8 is used as a number (the number 5).
what control[1:0] means inside the SELECT4 function. Is control a byte mask, or is it used as a number? How many bytes is it made of?
why IF control[3] is used in Intel's example. Doesn't it undo choice 3: inside CASE? Why would we ever want to set tmp[127:0] to zero if we've just been writing into it?

The [x:y] notation always refers to bit numbers in this case. For example, if you pass 5 as the imm8 argument, then (because 5 == 0b00000101) imm8[3:0] == 0b0101 == 5, and if that is passed as control to the SELECT4 macro, you get control[3] == 0 == false and control[1:0] == 0b01 == 1. The control[2] bit is ignored.
Fully evaluating this, you get
dst[127:0] := SELECT4(a[255:0], b[255:0], 5) == a[255:128]
dst[255:128] := SELECT4(a[255:0], b[255:0], 0) == a[127:0]
That means this would switch the upper and lower half of the a register and store it into the dst register.
The dst[MAX:256] := 0 line is only relevant on architectures with wider registers (that is, if you have AVX-512): it sets everything above bit 255 to zero. This is in contrast to legacy SSE instructions, which (if executed on CPUs with AVX support) leave the upper half unchanged (and produce false dependencies; see this related question).
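The bit-field decoding described above can be sketched in plain Go. This is only a software model of the pseudocode, not the intrinsic itself: the types lane and vec256 and the functions select4 and permute2f128 are hypothetical names chosen for illustration, and each 256-bit register is assumed to hold eight float32 values split into two 128-bit halves.

```go
package main

import "fmt"

// lane models one 128-bit half of a 256-bit register as four float32 values.
type lane [4]float32

// vec256 models a 256-bit register: index 0 is bits [127:0], index 1 is bits [255:128].
type vec256 [2]lane

// select4 mirrors the SELECT4 pseudocode. control is a 4-bit number:
// bits [1:0] pick the source half, bit 2 is ignored, bit 3 zeroes the result.
func select4(src1, src2 vec256, control uint8) lane {
	var tmp lane
	switch control & 0x3 { // control[1:0]
	case 0:
		tmp = src1[0]
	case 1:
		tmp = src1[1]
	case 2:
		tmp = src2[0]
	case 3:
		tmp = src2[1]
	}
	if control&0x8 != 0 { // control[3]
		tmp = lane{}
	}
	return tmp
}

// permute2f128 splits imm8 into its low and high nibble, one per output half.
func permute2f128(a, b vec256, imm8 uint8) vec256 {
	return vec256{
		select4(a, b, imm8&0x0F), // imm8[3:0] -> dst[127:0]
		select4(a, b, imm8>>4),   // imm8[7:4] -> dst[255:128]
	}
}

func main() {
	a := vec256{{0, 1, 2, 3}, {4, 5, 6, 7}}
	b := vec256{{10, 11, 12, 13}, {14, 15, 16, 17}}

	fmt.Println(permute2f128(a, a, 5))    // [[4 5 6 7] [0 1 2 3]]: halves of a swapped
	fmt.Println(permute2f128(a, b, 0x28)) // [[0 0 0 0] [10 11 12 13]]: low half zeroed
}
```

The second call shows why control[3] exists: it lets one immediate both select a half for one output lane and force the other output lane to zero, which the CASE alone cannot express.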

Related

Modifying R variable assigned to a literal through C leading to strange behavior

Does anyone know what is going on with the following R code?
library(inline)
increment <- cfunction(c(a = "integer"), "
INTEGER(a)[0]++;
return R_NilValue;
")
print_one = function(){
one = 0L
increment(one)
print(one)
}
print_one() # prints 1
print_one() # prints 2
The printed results are 1, 2. Replacing one = 0L with one = integer(1) gives result 1, 1.
Just to clarify: I know I'm passing the variable one by reference to the C function, so its value should change (become 1). What I don't understand is why resetting one = 0L seems to have no effect at all after the first call to print_one (the second call to print_one prints 2 instead of 1).

This really (as the last comment hinted) has little to do with Rcpp. It is mostly about the .C() interface used by the old inline package and, here, by the cfunction approach in the question.
That leads to two answers.
First, the consensus among R developers is that .C() is deprecated and should no longer be used. Statements to that effect can be found on the r-devel and r-package-devel lists. .C() uses plain old types as pointers in the interface so here the integer value is passed as int* by reference and can be altered.
If we switch to Rcpp, and hence to the underlying .Call() interface, which uses only SEXP types for input and output, then an int is no longer passed by reference. So the code behaves as expected and prints only 0:
Rcpp::cppFunction("void increment2(int a) { a++; }")
print_two <- function(){
two <- 0L
increment2(two)
print(two)
}
print_two() # prints 0
print_two() # prints 0
Lastly, Rcpp (capital R) is of course not the "successor" to inline, as it does a whole lot more than inline. But among all its functionality, Rcpp Attributes has been a quasi-replacement for inline since around 2013. So with Rcpp as-is, you no longer need the examples and approach from inline.

How to explain this strange phenomenon about pointer of slice in Golang? [duplicate]

Okay it's hard to describe it in words but let's say I have a map that stores int pointers, and want to store the result of an operation as another key in my hash:
m := make(map[string]*int)
m["d"] = &(*m["x"] + *m["y"])
This doesn't work and gives me the error: cannot take the address of *m["x"] + *m["y"]
Thoughts?

A pointer is a memory address. For example a variable has an address in memory.
The result of an operation like 3 + 4 does not have an address because there is no specific memory allocated for it. The result may just live in processor registers.
You have to allocate memory whose address you can put into the map. The easiest and most straightforward is to create a local variable for it.
See this example:
x, y := 1, 2
m := map[string]*int{"x": &x, "y": &y}
d := *m["x"] + *m["y"]
m["d"] = &d
fmt.Println(m["d"], *m["d"])
Output (try it on the Go Playground):
0x10438300 3
Note: If the code above is in a function, the local variable (d) whose address we just put into the map will continue to live even after we return from the function (provided the map is returned or was created outside, e.g. as a global variable). In Go it is perfectly safe to take and return the address of a local variable. The compiler analyzes the code, and if the address (pointer) escapes the function, the variable is automatically allocated on the heap (and not on the stack). For details see FAQ: How do I know whether a variable is allocated on the heap or the stack?
Note #2: There are other ways to create a pointer to a value (as detailed in this answer: How do I do a literal *int64 in Go?), but they are just "tricks" and are not nicer or more efficient. Using a local variable is the cleanest and recommended way.
For example this also works without creating a local variable, but it's obviously not intuitive at all:
m["d"] = &[]int{*m["x"] + *m["y"]}[0]
Output is the same. Try it on the Go Playground.
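To illustrate the note about escaping local variables, here is a minimal sketch in which the local variable lives inside a helper function and its address is stored in the map after the helper has returned. The helper name sumPtr is hypothetical and not part of the original answer.

```go
package main

import "fmt"

// sumPtr (hypothetical helper) returns the address of a local variable.
// Because the pointer escapes the function, the Go compiler allocates d
// on the heap, so the pointer stays valid after sumPtr returns.
func sumPtr(a, b int) *int {
	d := a + b
	return &d
}

func main() {
	x, y := 1, 2
	m := map[string]*int{"x": &x, "y": &y}

	// Each call yields a fresh int, so m["d"] does not alias x or y.
	m["d"] = sumPtr(*m["x"], *m["y"])
	fmt.Println(*m["d"]) // 3

	// Changing x afterwards does not affect the stored sum.
	x = 100
	fmt.Println(*m["d"]) // 3
}
```

Building with go build -gcflags=-m prints the compiler's escape-analysis decisions if you want to see where d ends up.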

The result of the addition is placed somewhere transient (on the stack) and it would therefore not be safe to take its address. You should be able to work around this by explicitly allocating an int on the heap to hold your result:
result := new(int)
*result = *m["x"] + *m["y"]
m["d"] = result

In Go, you cannot take the address of a literal or temporary value (often called an r-value). Try the following:
package main
import "fmt"
func main() {
x := 3
y := 2
m := make(map[string]*int)
m["x"] = &x
m["y"] = &y
f := *m["x"] + *m["y"]
m["d"] = &f
fmt.Printf("Result: %d\n",*m["d"])
}
Have a look at this tutorial.

How to check for potential overflow in Ada when dealing with expression?

I am relatively new to Ada and have been using Ada 2005. However, I feel like this question is pertinent to all languages.
I am currently using static analysis tools such as Codepeer to address potential vulnerabilities in my code.
One problem I'm debating is how to handle checks before assigning an expression that may cause overflow to a variable.
This can be explained better with an example. Let's say I have a variable of type unsigned 32-bit integer. I am assigning an expression to this variable CheckMeForOverflow:
CheckMeForOverflow := (Val1 + Val2) * Val3;
My dilemma is how to efficiently check for overflow in cases such as this - which would seem to appear quite often in code. Yes, I could do this:
if ((Val1 + Val2) * Val3) < Unsigned_Int'Size then
CheckMeForOverflow := (Val1 + Val2) * Val3;
end if;
My issue with this is that this seems inefficient to check the expression and then immediately assign that same expression if there is no potential for overflow.
However, when I look online, this seems to be pretty common. Could anyone explain better alternatives or explain why this is a good choice? I don't want this scattered throughout my code.
I also realize I could make another variable of a bigger type to hold the expression, do the evaluation against the new variable, and then assign that variable's value to CheckMeForOverflow, but then again, that would mean making a new variable and using it just to perform a single check and then never using it again. This seems wasteful.
Could someone please provide some insight?
Thanks so much!

Personally I would do something like this
begin
CheckMeForOverflow := (Val1 + Val2) * Val3;
exception
when constraint_error =>
null; -- or log that it overflowed
end;
But bear in mind that, if the exception is raised, your variable may not hold a usable value.
It's clearer than an if construct and we don't perform the calculation twice.

This is exactly the problem SPARK can help solve. It allows you to prove you won't have runtime errors given certain assumptions about the inputs to your calculations.
If you start with a simple function like No_Overflow in this package:
with Interfaces; use Interfaces;
package Show_Runtime_Errors is
type Unsigned_Int is range 0 .. 2**32 - 1;
function No_Overflow (Val1, Val2, Val3 : Unsigned_Int) return Unsigned_Int;
end Show_Runtime_Errors;
package body Show_Runtime_Errors is
function No_Overflow (Val1, Val2, Val3 : Unsigned_Int) return Unsigned_Int is
Result : constant Unsigned_Int := (Val1 + Val2) * Val3;
begin
return Result;
end No_Overflow;
end Show_Runtime_Errors;
Then when you run SPARK on it, you get the following:
Proving...
Phase 1 of 2: generation of Global contracts ...
Phase 2 of 2: flow analysis and proof ...
show_runtime_errors.adb:4:55: medium: range check might fail (e.g. when Result = 10)
show_runtime_errors.adb:4:55: medium: overflow check might fail (e.g. when
Result = 9223372039002259450 and Val1 = 4 and Val2 = 2147483646 and
Val3 = 4294967293)
gnatprove: unproved check messages considered as errors
exit status: 1
Now if you add a simple precondition to No_Overflow like this:
function No_Overflow (Val1, Val2, Val3 : Unsigned_Int) return Unsigned_Int with
Pre => Val1 < 2**15 and Val2 < 2**15 and Val3 < 2**16;
Then SPARK produces the following:
Proving...
Phase 1 of 2: generation of Global contracts ...
Phase 2 of 2: flow analysis and proof ...
Success!
Your actual preconditions on the ranges of the inputs will obviously depend on your application.
The alternatives are the approach you describe, where you put lots of explicit guards in your code before the expression is evaluated, or catching runtime errors via exception handling. The advantage of SPARK over these approaches is that you do not need to build your software with runtime checks if you can prove ahead of time that there will be no runtime errors.
Note that preconditions are a feature of Ada 2012. You can also use pragma Assert throughout your code which SPARK can take advantage of for doing proofs.
For more on SPARK there is a tutorial here:
https://learn.adacore.com/courses/intro-to-spark/index.html
To try it yourself, you can paste the above code in the example here:
https://learn.adacore.com/courses/intro-to-spark/book/03_Proof_Of_Program_Integrity.html#runtime-errors
Incidentally, the code you suggested:
if ((Val1 + Val2) * Val3) < Unsigned_Int'Size then
CheckMeForOverflow := (Val1 + Val2) * Val3;
end if;
won't work for two reasons:
Unsigned_Int'Size is the number of bits needed to represent Unsigned_Int. You likely wanted Unsigned_Int'Last instead.
((Val1 + Val2) * Val3) can overflow before the comparison to Unsigned_Int'Last is even done. Thus you will generate an exception at this point and either crash or handle it in an exception handler.
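Since the question notes that the problem applies to any language, the second point can be illustrated with a small Go sketch of a guard that cannot itself overflow. It is not Ada and not part of the original answer: the function name checkedExpr is hypothetical, and the operands are assumed to be 32-bit unsigned values. The sum is widened to 64 bits, and the multiplication is checked by division before it is performed.

```go
package main

import (
	"fmt"
	"math"
)

// checkedExpr (hypothetical) computes (v1 + v2) * v3 and reports whether the
// exact result fits in 32 unsigned bits. The guard never overflows itself.
func checkedExpr(v1, v2, v3 uint32) (uint32, bool) {
	// The sum of two uint32 values is at most 2^33 - 2, which fits in uint64.
	sum := uint64(v1) + uint64(v2)

	// Check by division instead of multiplying first: sum*v3 exceeds the
	// 32-bit maximum exactly when sum > floor(max / v3).
	if v3 != 0 && sum > math.MaxUint32/uint64(v3) {
		return 0, false
	}
	return uint32(sum * uint64(v3)), true
}

func main() {
	fmt.Println(checkedExpr(1, 2, 3))                   // 9 true
	fmt.Println(checkedExpr(4, 2147483646, 4294967293)) // 0 false
}
```

The second call uses the counterexample values reported by SPARK above. The design choice is the same trade-off discussed in the thread: the guard costs one extra division per expression, whereas a proven precondition costs nothing at run time.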

wire in always block/case statement - Verilog

The following sample code uses a case statement and an always @(*) block. I don't get how the always block is triggered and why it works even when x is declared as wire.
wire [2:0] x = 0;
always @(*)
begin
case (1'b1)
x[0]: $display("Bit 0 : %0d",x[0]);
x[1]: $display("Bit 1 : %0d",x[1]);
x[2]: $display("Bit 2 : %0d",x[2]);
default: $display("In default case");
endcase
end
Any help is appreciated.
Thanks.

As we know, a reg can be driven by a wire, so we can certainly use a wire on the right-hand side of an assignment in any procedural block.
Here, your code checks which bit of x is 1'b1 (giving priority to bit 0, of course). Let's say x changes to 3'b010. Then Bit 1 is displayed, and so on. If x = 3'b011, then Bit 0 is displayed, since bit 0 is checked first.
As you can see, there is no assignment to x; the procedural block only reads its value. Likewise, the system task $display only reads the value of x.
This block does not change any signal value, so the code works fine. If, by contrast, we had something like x[0] = ~x[0] instead of $display, the code would fail to compile.
More information can be found at this and this link.
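The priority behavior of case (1'b1) described above can be modeled in software. The following Go sketch is only an analogy, not Verilog: the function name firstSetBit is hypothetical, and x is assumed to be a 3-bit value. A Go switch without an operand evaluates its cases top to bottom and takes the first true one, just as the case items x[0], x[1], x[2] are compared against 1'b1 in order.

```go
package main

import "fmt"

// firstSetBit (hypothetical) mimics `case (1'b1)` over x[0], x[1], x[2]:
// the items are tested in order and the first one equal to 1 wins.
func firstSetBit(x uint8) string {
	switch {
	case x&0x1 != 0: // x[0]
		return "Bit 0"
	case x&0x2 != 0: // x[1]
		return "Bit 1"
	case x&0x4 != 0: // x[2]
		return "Bit 2"
	default:
		return "In default case"
	}
}

func main() {
	for _, x := range []uint8{0x0, 0x2, 0x3, 0x4} {
		fmt.Printf("x=%03b -> %s\n", x, firstSetBit(x))
	}
}
```

As in the Verilog block, the function only reads x and never writes it, which is why the wire declaration causes no problem there.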

Here, the always block does not assign a value to x; it only checks the value of x. So it's a legal use of a wire.

The part of your question about how always @(*) is triggered is explained as follows:
"Nets and variables that appear on the right-hand side of assignments, in subroutine calls, in case and conditional expressions, as an index variable on the left-hand side of assignments, or as variables in case item expressions shall all be included in always @(*)."
Ref: IEEE Std 1800-2012 Sec 9.4.2.2
As an extension of @sharvil111's answer, if your code were something like this
always @(*)
begin
case (sel)
x[0]: $display("Bit 0 : %0d",x[0]);
x[1]: $display("Bit 1 : %0d",x[1]);
x[2]: $display("Bit 2 : %0d",x[2]);
default: $display("In default case");
endcase
end
The procedural block would be triggered whenever the sel signal or x changes, i.e. it would be equivalent to always @(sel or x).

VHDL OR logic with 32 bit vector

zero <= result_i(31) OR result_i(30) OR result_i(29) OR result_i(28)
OR result_i(27) OR result_i(26) OR result_i(25) OR result_i(24)
OR result_i(23) OR result_i(22) OR result_i(21) OR result_i(20)
OR result_i(19) OR result_i(18) OR result_i(17) OR result_i(16)
OR result_i(15) OR result_i(14) OR result_i(13) OR result_i(12)
OR result_i(11) OR result_i(10) OR result_i(9) OR result_i(8)
OR result_i(7) OR result_i(6) OR result_i(5) OR result_i(4)
OR result_i(3) OR result_i(2) OR result_i(1) OR result_i(0);
How can I make this shorter?

I am assuming you are using std_logic/std_logic_vector types.
Then you can use or_reduce from ieee.std_logic_misc.
library ieee;
use ieee.std_logic_misc.or_reduce;
...
zero <= or_reduce(result_i);
Or write your own function:
function or_reduce(vector : std_logic_vector) return std_logic is
variable result : std_logic := '0';
begin
for i in vector'range loop
result := result or vector(i);
end loop;
return result;
end function;
A general tip if you are just starting out with VHDL is to not forget about functions and procedures. Unlike Verilog (without SystemVerilog), VHDL has good support for writing clean, high-level code, even for synthesis, using functions and procedures. If you are doing something repetitive, it is a sure sign that it should be wrapped in a function or procedure. In this case, though, a standard function was already available.
You might also want to consider pipelining the OR reduction by inserting flip-flops between the stages. The 32-bit reduction in your example should probably still run at a reasonably high frequency in an FPGA device, but if you are going to use more bits or target a really high frequency, you might want to use an OR tree where no more than 6-8 bits are ORed in each pipeline stage. You can still reuse the or_reduce function for the intermediate operations.

You can achieve this with VHDL-2008.
VHDL-2008 defines unary reduction operators, like these:
outp <= and "11011";
outp <= xor "11011";
So in your case it would be:
zero <= or result_i;
