Removing branches when setting local variables - opencl

Which code is more efficient:
Routine A
local int a[5];
bool condition;
...
a[0] = 0;
if (condition) {
a[0] = 1;
}
or Routine B
local int a[5];
bool condition;
....
a[0] = 0;
a[0] = select(a[0], 1, condition);
The second listing removes the branch, but the select statement may access local memory twice, if condition is false. Hopefully, the compiler would put in a no-op if condition is false.

Between your source code and the target machine sits a compiler. The compiler, together with the machine it is compiling for, ultimately determines what assembly code is generated.
So you can't say things like 'removing the if statement gets rid of the branch'. What if the compiler decides to use a compare-and-set instruction instead of a compare and branch followed by a move?
If the condition is always false the code will be considered dead code and the compiler may decide to put nothing instead of a no-op.
If you manage to generate the assembly code corresponding to Routine A and Routine B, and get a table of instruction costs for the target machine, only then can you talk about machine code efficiency. Compilers usually have such tables embedded in them so they can choose the most suitable combination of instructions.
select looks like it's designed for vectors rather than single values, most likely so it can use SIMD instructions, which operate on several elements at once rather than on a single value like the one at hand. See OpenCL built-in function 'select'.
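To make the comparison concrete, here is a minimal sketch of the two routines in plain C (not OpenCL). `select_int` is a hypothetical stand-in that mimics the scalar form of `select(a, b, c)`, which yields `b` when `c` is nonzero and `a` otherwise; the local array is replaced by an ordinary variable, so this says nothing about local-memory traffic on a real device.

```c
#include <stdbool.h>

/* Hypothetical stand-in for OpenCL's scalar select(a, b, c):
   returns b if c is true, otherwise a. */
static int select_int(int a, int b, bool c) {
    return c ? b : a;
}

/* Routine A: default store followed by a conditional store. */
int routine_a(bool condition) {
    int a0 = 0;
    if (condition) {
        a0 = 1;
    }
    return a0;
}

/* Routine B: default store followed by an unconditional select. */
int routine_b(bool condition) {
    int a0 = 0;
    a0 = select_int(a0, 1, condition);
    return a0;
}
```

Both functions compute the same value. Whether either one contains a branch is only visible in the generated assembly, for example from `-O2 -S` with GCC or Clang, and a compiler may well emit identical code for both.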

Related

How do I do jumps to a label in an enclosing function in LLVM IR?

I want to do an LLVM compiler for a very old language, PL/M. This has some peculiar features, not least of which is having nested functions with the ability to jump out of an enclosing function. In pseudocode:
toplevel() {
    nested() {
        if (something)
            goto label;
    }
    nested();
label:
    print("finished!");
}
The constraints here are:
you can only jump into the top-level function, luckily
the stack does get unwound (the language does not support destructors, so this is easy)
you do not have to have executed the statement at label before jumping (so the naive setjmp/longjmp method doesn't work).
code at label can be executed normally, i.e. it's not like catch
LLVM has a number of non-local jump mechanisms, such as the exception handling system, but I've never used that. Can this be implemented using LLVM exceptions, or are they not suitable for this? Is there an easier way?
If you want the stack to get unwound, you'll likely want it to be in a separate function, at least a separate LLVM IR function. (The only real exception is if your language does not have a construct like C's "alloca()" and you don't allow calling a nested function by address in which case you could inline it.)
The part of the problem you mentioned, jumping out of an enclosing function, is best handled by having some way for the callee to communicate "how it exited" to the caller, and having the caller "switch()" on that value. You could stick it in the return value (if the function already returns a value, make it a struct of both values), you could add a pointer parameter that it writes to, you could add a thread-local global variable and fill that in before calling longjmp, or you could use exceptions.
Exceptions are complex (I can't describe how to make them work offhand, but the docs are here: https://llvm.org/docs/ExceptionHandling.html ), slow when the exception path is taken, and really intended for exceptional situations, not for normal code. Setjmp/longjmp does the same thing as exceptions, except that it is simpler to use and comes without the performance trade-off when executed, but unfortunately there are miscompiles in LLVM which you will need to be the one to fix if you start using them in earnest (see the postscript at the end of the answer).
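As a rough illustration of the "return how it exited, then switch" idea, here is a minimal sketch in C rather than LLVM IR. The enum, the function names and the `something` parameter are all hypothetical; a real front end would emit the equivalent IR, and would also need some way to reach the enclosing function's variables.

```c
#include <stdio.h>

/* Hypothetical codes telling the caller how nested() finished. */
enum how_exited { RET_NORMAL, RET_GOTO_LABEL };

/* Stand-in for the nested procedure: it reports the non-local
   jump to its caller instead of performing it. */
static enum how_exited nested(int something) {
    if (something) {
        return RET_GOTO_LABEL;   /* was: goto label; */
    }
    return RET_NORMAL;
}

/* Stand-in for the top-level procedure.
   Returns 1 if label was reached through the jump, 0 otherwise. */
int toplevel(int something) {
    int jumped = 0;
    switch (nested(something)) {
    case RET_GOTO_LABEL:
        jumped = 1;
        goto label;              /* now an ordinary local goto */
    case RET_NORMAL:
        break;
    }
    /* code between the call and the label would run here */
label:
    puts("finished!");
    return jumped;
}
```

Because the jump becomes a normal return followed by a local branch, the stack is unwound by the usual return sequence and the code at the label remains reachable by falling through.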
Those two options cover the ways you can do it without changing the function signature, which may be necessary if your language allows the address to be taken then called later.
If you do need to take the address of nested, then LLVM supports trampolines. See https://llvm.org/docs/LangRef.html#trampoline-intrinsics . Trampolines solve the problem of accessing the local variables of the calling function from the callee, even when the function is called by address.
PS. LLVM miscompiles setjmp/longjmp today. The current model is that a call to setjmp may return twice, and only functions with the returns_twice attribute may return twice. Note that this doesn't affect the whole call stack, only the direct caller of a function that returns twice has to deal with the twice-returning call-- just because function F calls setjmp does not mean that F itself can return twice. So far, so good.
The problem is that in a function with a setjmp, all function calls may themselves call longjmp. I'd say "unless proven otherwise", as with all things in optimizers, but LLVM has no doesnotlongjmp attribute, nor any code that attempts to answer the question of whether a function could call longjmp. Adding that would be a good optimization, but it's a separate issue from the miscompile.
If you have code like this pseudo-code:
%entry block:
allocate val
val <- 0
setjmpret <- call setjmp
br i1 setjmpret, %first setjmp return block, %second setjmp return block
%first setjmp return block:
val <- 1;
call foo();
goto after;
%second setjmp return block:
call print(val);
goto after;
%after:
return
The control flow graph shows that there is no path from val <- 0 to val <- 1 to print(val). The only path with "print(val)" has "val <- 0" before it, therefore constant propagation may turn print(val) into print(0). The problem here is a missing control flow edge from foo() back to the %second setjmp return block. In a function that contains a setjmp, all calls which may call longjmp must have a CFG edge to the second setjmp return block. In LLVM that control flow edge is missing, and LLVM miscompiles code because of it.
This problem also manifests in the backend. The first time I heard of this problem it was in the context of the backend losing track of the placement of variables on the stack, and this issue was the underlying root cause.
For the most part setjmp/longjmp seems to work because LLVM isn't usually able to analyze what calling foo() might do and can't perform the optimization. For instance if val was not a fresh allocation but was a pointer, then who's to say that foo() doesn't have access to the same pointer, and then performs "val <- 1" on it? If LLVM can't prove that impossible, that precludes the transform to print(0). Secondly, setjmp/longjmp are just not used often in real code.
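For reference, the pseudo-code above corresponds roughly to the following C, under the assumption that foo() always calls longjmp. The names `env`, `foo` and `run` are hypothetical. Note that the C standard itself leaves a non-volatile local indeterminate after longjmp if it was modified between setjmp and longjmp, so this sketch declares val volatile; it illustrates the control flow only and does not reproduce the miscompile.

```c
#include <setjmp.h>

static jmp_buf env;

/* Hypothetical foo(): always jumps back to the setjmp call site. */
static void foo(void) {
    longjmp(env, 1);
}

/* Returns the value of val observed on the second return from setjmp. */
int run(void) {
    /* volatile: without it, C leaves val indeterminate after longjmp,
       because val is modified between setjmp and longjmp. */
    volatile int val = 0;
    if (setjmp(env) == 0) {
        /* first return from setjmp */
        val = 1;
        foo();
        return -1;   /* not reached in this sketch */
    }
    /* second return from setjmp, reached via longjmp inside foo() */
    return val;
}
```

The edge from the call to foo() back to the second return is exactly the one that the control flow graph described above does not contain.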

STM32F4 UART half word addressing

Trying to roll my own code for STM32F4 UART.
A peculiarity of this chip is that if you use byte addressing as the GNAT compiler does when setting a single bit, the corresponding bit in the other byte of the half word is set. The data sheet says use half word addressing. Is there a way to tell the compiler to do this? I tried
for CR1_register'Size use 16;
but this had no effect. Writing the whole 16 bit word works, but you lose the ability to set named bits.
The GNAT way to do this, as used in the AdaCore Ada Drivers Library, is to use the GNAT-only aspect Volatile_Full_Access, about which the GNAT Reference Manual says
This is similar in effect to pragma Volatile, except that any reference to the object is guaranteed to be done only with instructions that read or write all the bits of the object. Furthermore, if the object is of a composite type, then any reference to a subcomponent of the object is guaranteed to read and/or write all the bits of the object.
The intention is that this be suitable for use with memory-mapped I/O devices on some machines. Note that there are two important respects in which this is different from pragma Atomic. First a reference to a Volatile_Full_Access object is not a sequential action in the RM 9.10 sense and, therefore, does not create a synchronization point. Second, in the case of pragma Atomic, there is no guarantee that all the bits will be accessed if the reference is not to the whole object; the compiler is allowed (and generally will) access only part of the object in this case.
Their code is
-- Control register 1
type CR1_Register is record
-- Send break
SBK : Boolean := False;
...
end record
with Volatile_Full_Access, Size => 32,
Bit_Order => System.Low_Order_First;
for CR1_Register use record
SBK at 0 range 0 .. 0;
...
end record;
The portable way is to do this explicitly: read the whole record, modify it, then write it back. As long as the object is declared Volatile, the compiler will not optimize the reads and writes away.
-- excerpt from my working code --
declare
R : Control_Register_1 := Module.CR1;
begin
R.UE := True;
Module.CR1 := R;
end;
This is very verbose, but it does its work.
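For comparison, the same read-modify-write pattern can be sketched in C. Everything here is an assumption for illustration: `fake_cr1` is an ordinary variable standing in for the memory-mapped register, and the bit position used for UE should be checked against the reference manual for the actual part.

```c
#include <stdint.h>

/* Assumed bit position of the UE flag; verify against the reference manual. */
#define CR1_UE_BIT 13u

/* Stand-in for the memory-mapped register; on hardware this would
   live at a fixed peripheral address. */
static volatile uint16_t fake_cr1;

/* Read the whole half word, change one bit in a local copy,
   then write the whole half word back. */
void cr1_set_bit(volatile uint16_t *reg, unsigned bit) {
    uint16_t r = *reg;               /* one 16-bit read  */
    r |= (uint16_t)(1u << bit);
    *reg = r;                        /* one 16-bit write */
}

/* Small demonstration: start with only bit 0 set, then enable UE. */
uint16_t demo(void) {
    fake_cr1 = 0x0001u;
    cr1_set_bit(&fake_cr1, CR1_UE_BIT);
    return fake_cr1;
}
```

A compiler would normally be expected to use half-word loads and stores for a volatile 16-bit object, but that is worth confirming in the disassembly for the toolchain in use.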

Do you make safe and unsafe version of your functions or just stick to the safe version? (Embedded System)

Let's say you have a function that sets an index and then updates a few variables based on the value stored in the array element the index points to. Do you check the index to make sure it is in range? (This is in an embedded environment, Arduino to be specific.)
So far I have made a safe and an unsafe version of every function. Is that a good idea? In some of my other code I noticed that having only safe functions results in checking the same conditions multiple times as the libraries get larger, so I started to develop both. The safe function checks the condition and calls the unsafe function, as shown in the example below for the case explained above.
Safe version:
bool RcChannelModule::setFactorIndexAndUpdateBoundaries(factorIndex_T factorIndex)
{
if(factorIndex < N_FACTORS)
{
setFactorIndexAndUpdateBoundariesUnsafe(factorIndex);
return true;
}
return false;
}
Unsafe version:
void RcChannelModule::setFactorIndexAndUpdateBoundariesUnsafe(factorIndex_T factorIndex)
{
setCuurentFactorIndexUnsafe(factorIndex);
updateOutputBoundaries();
}
If I am doing it fundamentally wrong, please let me know why and how I could avoid that. Also, I would like to know: when you program, do you generally consider the future user to be a fool, or do you expect them to follow the minimal documentation provided? (The reason I say minimal is that I do not have the time to write proper documentation.)
void RcChannelModule::setCuurentFactorIndexUnsafe(const factorIndex_T factorIndex)
{
currentFactorIndex_ = factorIndex;
}
Safety checks, such as array index range checks, null checks, and so on, are intended to catch programming errors. When these checks fail, there is no graceful recovery: the best the program can do is to log what happened, and restart.
Therefore, the only time when these checks become useful is during debugging and testing of your code. C++ provides built-in functionality for dealing with this through asserts, which are kept in the debug versions of the code, but compiled out from the release version:
void RcChannelModule::setFactorIndexAndUpdateBoundariesUnsafe(factorIndex_T factorIndex) {
assert(factorIndex < N_FACTORS);
setCuurentFactorIndexUnsafe(factorIndex);
updateOutputBoundaries();
}
Note: [When you make a library for external use] an argument-checking version of each external function perhaps makes sense, with non-argument-checking implementations of those and all internal-only functions. If you perform argument checking then do it (only) at the boundary between your library and the client code. But it's pointless to offer a choice to your users, for if you want to protect them from usage errors then you cannot rely on them to choose the "safe" versions of your functions. (John Bollinger)
Do you make safe and unsafe version of your functions or just stick to the safe version?
For higher level code, I recommend one version, a safe one.
In high-level code, with a large set of related functions and data, the combinations of interactions between data and code cannot be fully checked at development time. When an error is detected, the data should be set to indicate an error state. Subsequent uses of that data within these functions would then be aware of the error state.
For the lowest-level, time-critical routines, I'd go with @dasblinkenlight's answer: write one source that compiles two ways, for debug and for release builds.
Yet keep in mind @pete becker's point: is doing a check really likely to be a performance bottleneck?
With floating-point related routines, use the NaN to help keep track of an unrecoverable error.
Lastly, where possible, create functions that cannot fail and so avoid the issue. For many functions, though not all, this only requires small code additions. It often adds only a constant-time performance penalty rather than an O(n) penalty.
Example: Consider a function to lop off the first character of a string - in place.
#include <string.h>
// This works fine as long as s[0] != 0
char *slop_1(char *s) {
size_t len = strlen(s); // most work is here
return memmove(s, s + 1, len); // and here
}
Instead define the function, and code it, to do nothing when s[0] == 0
char *slop_2(char *s) {
size_t len = strlen(s);
if (len > 0) { // negligible additional work
memmove(s, s + 1, len);
}
return s;
}
Similar code can be applied to OP's example. Note that it is "safe", at least within the function. The assert() scheme can still be used to discover development issues. Yet the released code, without the assert(), still checks the range.
void RcChannelModule::setFactorIndexAndUpdateBoundaries(factorIndex_T factorIndex)
{
if(factorIndex < N_FACTORS) {
setFactorIndexAndUpdateBoundariesUnsafe(factorIndex);
} else {
assert(0);
}
}
Since you tagged this Arduino and embedded, you have a very resource-constrained system, one of the crappiest processors still manufactured.
On such a system you cannot afford extra error handling. It is better to properly document what values the parameters passed to the function must have, then leave the checking of this to the caller.
The caller can then either check this in run-time, if needed, or otherwise in compile-time with a static assert. Your function would however not be able to implement it as a static assert, as it can't know if factorIndex is a run-time variable or a compile-time constant.
As for "I have no time to write proper documentation", that's nonsense. It takes far less time to document this function than to post this SO question. You don't necessarily have to write an essay in some Word file. You don't necessarily have to use Doxygen or similar.
But you do need to write the bare minimum of documentation: In the header file, document the purpose and expected values of all function parameters in the form of comments. Preferably you should have a coding standard for how to document such functions. A minimal documentation of public API functions in the form of comments is part of your job as programmer. The code is not complete until this is written.
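A minimal sketch of what this could look like, written in C for brevity. All names and the table size are hypothetical. The function documents its precondition in a comment, a caller with a compile-time constant checks it with a static assertion, and a caller with a run-time value checks it before calling.

```c
#include <assert.h>

#define N_FACTORS 4u   /* hypothetical table size */

static unsigned current_factor_index;

/* Sets the current factor index.
   Precondition: factor_index < N_FACTORS. Not checked in release builds;
   the caller is responsible for passing a valid index. */
void set_factor_index_unchecked(unsigned factor_index) {
    assert(factor_index < N_FACTORS);   /* compiled out when NDEBUG is defined */
    current_factor_index = factor_index;
}

/* Caller with a compile-time constant: the check costs nothing at run time. */
#define DEFAULT_FACTOR 2u
_Static_assert(DEFAULT_FACTOR < N_FACTORS, "DEFAULT_FACTOR out of range");

void select_default_factor(void) {
    set_factor_index_unchecked(DEFAULT_FACTOR);
}

/* Caller with a run-time value: it checks before calling.
   Returns 1 on success, 0 if the requested index is out of range. */
int select_factor_from_input(unsigned requested) {
    if (requested >= N_FACTORS) {
        return 0;
    }
    set_factor_index_unchecked(requested);
    return 1;
}
```

The range check then exists in exactly one place per call path, and only where the value is not already known at compile time.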

How can I list available operating system signals by name in a cross-platform way in Go?

Let's say I'm implementing the kill program in Go. I can accept numeric signals and PIDs from the commandline and send them to syscall.Kill no problem.
However, I don't know how to implement the "string" form of signal dispatch, e.g. kill -INT 12345.
The real use case is a part of a larger program that prompts the user to send kill signals; not a replacement for kill.
Question:
How can I convert valid signal names to signal numbers on any supported platform, at runtime (or at least without writing per-platform code to be run at compile time)?
What I've tried:
Keep a static map of signal names to numbers. This doesn't work in a cross-platform way (for example, kill -l returns different signal lists on Mac OSX, on a modern Linux and on an older Linux). The only way to make this solution work in general would be to make maps for every OS, which would require me to know the behavior of every OS and keep up to date as they add new signal support.
Shell out to the GNU kill tool and capture the signal lists from it. This is inelegant and kind of a paradox, and also requires a) being able to find kill, b) having the ability/permission to exec subprocesses, and c) being able to predict/parse the output of kill-the-binary.
Use the various Signal types' String method. This just returns strings containing the signal number, e.g. os.Signal(4).String() == "signal 4", which is not useful.
Call the private function runtime.signame, which does exactly what I want. //go:linkname hacks will work, but I'm assuming that this sort of thing is frowned upon for a reason.
Ideas/Things I Haven't Tried:
Use CGo somehow. I'd rather not venture into CGO territory for a project that is otherwise not low-level/needful of native integration at all. If that's the only option, I will, but have no idea where to start.
Use templating and code generation to build lists of signals based on external sources at compile time. This is not preferable for the same reasons as CGo.
Reflect and parse the members of syscall that start with SIG somehow. I am told that this is not possible because names are compiled away; is it possible that, for something as fundamental as signal names, there's someplace they're not compiled away?
Commit d455e41 added this feature in March 2019 as sys/unix.SignalNum() and is thus available at least since Go 1.13. More details in GitHub issue #28027.
From the documentation of the golang.org/x/sys/unix package:
func SignalNum(s string) syscall.Signal
SignalNum returns the syscall.Signal for signal named s, or 0 if a signal with such name is not found. The signal name should start with "SIG".
To answer a similar question, "how can I list the names of all available signals (on a given Unix-like platform)", we can use the inverse function sys/unix.SignalName():
package main

import (
	"fmt"
	"syscall"

	"golang.org/x/sys/unix"
)

func main() {
	// See https://github.com/golang/go/issues/28027#issuecomment-427377759
	// for why looping in range 0,255 is enough.
	for i := syscall.Signal(0); i < syscall.Signal(255); i++ {
		name := unix.SignalName(i)
		// Signal numbers are not guaranteed to be contiguous.
		if name != "" {
			fmt.Println(name)
		}
	}
}
Update: some time after I posted the answer below, Go's golang.org/x/sys/unix package acquired this functionality. An answer describing how to use it was posted by @marco.m and accepted; the approach below is not recommended unless the version of Go you are using pre-dates the availability of the right tool for the job.
Since no answers were posted, I'll post the less-than-ideal solution I was able to use by "breaking into" a private signal-enumeration function inside Go's standard library.
The signame internal function can get a signal name by number on Unix and Windows. To call it, you have to use the linkname/assembler workaround. Basically, make a file in your project called empty.s or similar, with no contents, and then a function declaration like so:
//go:linkname signame runtime.signame
func signame(sig uint32) string
Then, you can get a list of all signals known by the operating system by calling signame on an increasing number until it doesn't return a value, like so:
signum := uint32(0)
signalmap := make(map[uint32]string)
for len(signame(signum)) > 0 {
words := strings.Fields(signame(signum))
if words[0] == "signal" || !strings.HasPrefix(words[0], "SIG") {
signalmap[signum] = ""
} else {
// Remove leading SIG and trailing colon.
signalmap[signum] = strings.TrimRight(words[0][3:], ":")
}
signum++
}
After that runs, signalmap will have keys for every signal that can be sent on the current operating system. Each key maps either to a string name, e.g. "INT", where a name can be found, or to an empty string where Go doesn't think the OS has a name for the signal. (I've found that kill(1) may name some signals that Go won't return names for, but these are usually the higher-numbered, nonstandard ones.)
This behavior is undocumented, subject to change, and may not hold true on some platforms. It would be nice if this were made public, though.

Behaviour of non-const int pointer on a const int

#include<stdio.h>
int main()
{
const int sum=100;
int *p=(int *)&sum;
*p=101;
printf("%d, %d",*p,sum);
return 0;
}
/*
output
101, 101
*/
p points to a constant integer variable, then why/how does *p manage to change the value of sum?
It's undefined behavior - it's a bug in the code. The fact that the code 'appears to work' is meaningless. The compiler is allowed to make it so your program crashes, or it's allowed to let the program do something nonsensical (such as change the value of something that's supposed to be const). Or do something else altogether. It's meaningless to 'reason' about the behavior, since there is no requirement on the behavior.
Note that if the code is compiled as C++ you'll get an error since C++ won't implicitly cast away const. Hopefully, even when compiled as C you'll get a warning.
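For contrast, here is a small sketch of the case where casting away const is well defined: the object itself was never declared const, only the pointer used to view it was. The function and variable names are hypothetical.

```c
/* Defined behavior: value is an ordinary int; only the pointer
   that views it is const-qualified. */
int modify_through_cast(void) {
    int value = 100;
    const int *view = &value;
    int *p = (int *)view;   /* casts away const from the pointer type */
    *p = 101;               /* fine: value was never declared const */
    return value;
}

/* By contrast, if value were declared "const int value = 100;",
   the same write through p would be undefined behavior. */
```

The difference lies in how the object was defined, not in how the pointer is spelled, which is why the cast alone cannot make the code in the question valid.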
p contains the memory address of the variable sum. The syntax *p means the actual value of sum.
When you say
*p=101
you're saying: go to the address p (which is the address where the variable sum is stored) and change the value there. So you're actually changing sum.
You can see const as a compile-time flag that tells the compiler "I shouldn't modify this variable, tell me if I do." It does not enforce anything on whether you can actually modify the variable or not.
And since you are modifying that variable through a non-const pointer, the compiler is indeed going to tell you:
main.c: In function 'main':
main.c:6:16: warning: initialization discards qualifiers from pointer target type
You broke your own promise, the compiler warns you but will let you proceed happily.
The behavior is undefined, which means that it may produce different outcomes on different compiler implementations, architecture, compiler/optimizer/linker options.
For the sake of analysis, here it is:
(Disclaimer: I don't know compilers. This is just a logical guess at how the compiler may choose to handle this situation, from a naive assembly-language debugger perspective.)
When a constant integer is declared, the compiler has the choice of making it addressable or non-addressable.
Addressable means that the integer value will actually occupy a memory location, such that:
The lifetime will be static.
The value might be hard-coded into the binary, or initialized during program startup.
It can be accessed with a pointer.
It can be accessed from any binary code that knows of its address.
It can be placed in either read-only or writable memory section.
For everyday CPUs, non-writability is enforced by the memory management unit (MMU). Messing with the MMU is messy or impossible from user space, and it is not worth it for a mere const integer value.
Therefore, it will be placed into writable memory section, for simplicity's sake.
If the compiler chooses to place it in non-writable memory, your program will crash (access violation) when it tries to write to the non-writable memory.
Setting aside microcontrollers - you would not have asked this question if you were working on microcontrollers.
Non-addressable means that it does not occupy a memory address. Instead, every piece of code that references the variable (i.e. uses the value of that integer) will receive an r-value, as if you did a find-and-replace to change every instance of sum into a literal 100.
In some cases, the compiler cannot make the integer non-addressable: if the compiler knows that you're taking the address of it, then surely the compiler knows that it has to put that value in memory. Your code belongs to this case.
Yet, with some aggressively-optimizing compiler, it is entirely possible to make it non-addressable: the variable could have been eliminated and the printf will be turned into int main() { printf("%s, %s", (b1? "100" : "101"), (b2? "100" : "101")); return 0; } where b1 and b2 will depend on the mood of the compiler.
The compiler will sometimes take a split decision - it might do one of those, or even something entirely different:
Allocate a memory location, but replace every reference with a constant literal. When this happens, a debugger will tell you the value is zero but any code that uses that location will appear to contain a hard-coded value.
Some compilers may be able to detect that the cast causes undefined behavior and refuse to compile.
