Why does the Eva plugin of Frama-C return Unknown when it actually found a counterexample to an assertion

I am trying to insert an assertion inside a function. Here is what I did:
void foo(int a) {
  //@ assert a == 1;
}

void main() {
  foo(1);
  foo(2);
}
I expect to get an Invalid status, but Frama-C returns Unknown, even though it can provide a counterexample with the call stack.
Here is the screenshot when I run my example with Frama-C:

Eva indicates that the status is Unknown because it has observed a callstack where the assertion was valid and another where it was invalid. However, because the plug-in performs over-approximations (well, not here, as your program is trivial, but in the general case there will be some), it has no way to be sure that both can happen in a concrete execution: one of the branches (either the one validating the assertion or the one invalidating it) might be impossible to reach in the concrete world, because of conditions that are out of reach of the abstractions used by Eva. Hence the only sound possibility, which encompasses all outcomes, is to put an Unknown status here.
You can also see the same issue arise if you comment out the foo(1) call. Eva will then report that the assertion is invalid, but only provided that it can indeed be reached.
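For instance, a minimal sketch of that variant (the same program with the foo(1) call commented out):

void foo(int a) {
  //@ assert a == 1;
}

void main() {
  // foo(1);
  foo(2);   // the only remaining callstack violates the assertion
}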
Finally, this kind of annotation is indeed the one you usually want to investigate first (as opposed to annotations with a "plain" Unknown status), and in newer versions of Frama-C (starting from v17.0), there is an additional Red Alarms panel that lists properties that are invalid for at least one callstack.

Related

How do I do jumps to a label in an enclosing function in LLVM IR?

I want to write an LLVM-based compiler for a very old language, PL/M. This has some peculiar features, not least of which is nested functions with the ability to jump out of the enclosing function. In pseudocode:
toplevel() {
    nested() {
        if (something)
            goto label;
    }
    nested();
label:
    print("finished!");
}
The constraints here are:
you can only jump into the top-level function, luckily
the stack does get unwound (the language does not support destructors, so this is easy)
you do not have to have executed the statement at label before jumping (so the naive setjmp/longjmp method doesn't work).
code at label can be executed normally, i.e. it's not like catch
LLVM has a number of non-local jump mechanisms, such as the exception handling system, but I've never used that. Can this be implemented using LLVM exceptions, or are they not suitable for this? Is there an easier way?
If you want the stack to get unwound, you'll likely want the nested function to be a separate function, at least a separate LLVM IR function. (The only real exception is if your language does not have a construct like C's alloca() and you don't allow calling a nested function by address, in which case you could inline it.)
The part of the problem you mentioned, jumping out of an enclosing function, is best handled by having some way for the callee to communicate "how it exited" to the caller, and the caller having a switch() on that value, as sketched below. You could stick it in the return value (if it already returns a value, make it a struct of both values), you could add a pointer parameter that it writes to, you could use a thread-local global variable and fill that in before calling longjmp, or you could use exceptions.
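A rough C-level sketch of that first option, mirroring the pseudocode above (the exit_kind encoding and names are assumptions; the actual lowering would happen in the LLVM IR you generate):

#include <stdio.h>

/* The nested routine reports how it exited; the caller dispatches on that. */
enum exit_kind { EXIT_NORMAL, EXIT_GOTO_LABEL };

static int something;                  /* stand-in for the source-level condition */

static enum exit_kind nested(void) {
    if (something)
        return EXIT_GOTO_LABEL;        /* corresponds to "goto label" in the source */
    /* ... rest of the nested body ... */
    return EXIT_NORMAL;
}

void toplevel(void) {
    switch (nested()) {
    case EXIT_GOTO_LABEL:
        goto label;                    /* resume at the target inside toplevel */
    case EXIT_NORMAL:
        break;
    }
    /* ... code between the call site and the label would go here ... */
label:
    printf("finished!\n");
}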
Exceptions are complex (I can't describe how to make them work offhand, but the docs are here: https://llvm.org/docs/ExceptionHandling.html ), slow when the exception path is taken, and really intended for exceptional situations, not for normal code. Setjmp/longjmp does the same thing as exceptions, except it is simpler to use and has no performance trade-off when executed, but unfortunately there are miscompiles in LLVM which you will likely be the one who has to fix if you start using them in earnest (see the postscript at the end of the answer).
Those two options cover the ways you can do it without changing the function signature, which may be necessary if your language allows a function's address to be taken and called later.
If you do need to take the address of nested, then LLVM supports trampolines. See https://llvm.org/docs/LangRef.html#trampoline-intrinsics . Trampolines solve the problem of accessing the local variables of the calling function from the callee, even when the function is called by address.
PS. LLVM miscompiles setjmp/longjmp today. The current model is that a call to setjmp may return twice, and only functions with the returns_twice attribute may return twice. Note that this doesn't affect the whole call stack: only the direct caller of a function that returns twice has to deal with the twice-returning call; just because function F calls setjmp does not mean that F itself can return twice. So far, so good.
The problem is that in a function with a setjmp, all function calls may themselves call longjmp. I'd say "unless proven otherwise" as with all things in optimizers, but there is no doesnotlongjmp attribute in LLVM, nor any code within LLVM that attempts to answer the question of whether a function could call longjmp. Adding that would be a good optimization, but it's a separate issue from the miscompile.
If you have code like this pseudo-code:
%entry block:
    allocate val
    val <- 0
    setjmpret <- call setjmp
    br i1 setjmpret, %first setjmp return block, %second setjmp return block
%first setjmp return block:
    val <- 1;
    call foo();
    goto after;
%second setjmp return block:
    call print(val);
    goto after;
%after:
    return
The control flow graph shows that there is no path from val <- 0 through val <- 1 to print(val). The only path with print(val) has val <- 0 before it, therefore constant propagation may turn print(val) into print(0). The problem here is a missing control flow edge from foo() back to the %second setjmp return block. In a function that contains a setjmp, all calls which may call longjmp must have a CFG edge to the second setjmp return block. In LLVM that control flow edge is missing, and LLVM miscompiles code because of it.
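For concreteness, a C-level sketch of the same shape (note that portable C additionally requires val to be volatile for its value to be reliable after longjmp; the IR-level issue described above exists regardless):

#include <setjmp.h>
#include <stdio.h>

static jmp_buf env;

/* For the sketch, foo() always longjmps; in general the optimizer cannot know. */
static void foo(void) {
    longjmp(env, 1);
}

void demo(void) {
    volatile int val = 0;
    if (setjmp(env) == 0) {            /* direct return of setjmp */
        val = 1;
        foo();                         /* longjmp sends control back to the setjmp above */
    } else {                           /* return via longjmp */
        printf("%d\n", val);           /* should print 1, not a constant-folded 0 */
    }
}

int main(void) {
    demo();
    return 0;
}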
This problem also manifests in the backend. The first time I heard of this problem it was in the context of the backend losing track of the placement of variables on the stack, and this issue was the underlying root cause.
For the most part setjmp/longjmp seems to work because LLVM usually isn't able to analyze what calling foo() might do, and so can't perform the optimization. For instance, if val was not a fresh allocation but a pointer, then who's to say that foo() doesn't have access to the same pointer and performs "val <- 1" through it? If LLVM can't prove that impossible, the transform to print(0) is precluded. Secondly, setjmp/longjmp are just not used often in real code.

MPI_Comm_dup() not working when sending MPI_COMM_NULL as argument

Somewhere I used
MPI_Comm_dup(row_comm, &bigger_row_comm);
and I noticed it caused 'fatal' error when row_comm was equal to MPI_COMM_NULL. I changed it with
if (row_comm != MPI_COMM_NULL)
    MPI_Comm_dup(row_comm, &bigger_row_comm);
else
    bigger_row_comm = MPI_COMM_NULL;
Now it works. I use MPICH and found this in its documentation in the entry for MPI_Comm_dup:
A common error is to use a null communicator in a call (not even allowed in MPI_Comm_rank).
I wonder if this behavior is standard and I should expect other implementations to do the same. Why haven't they just handled it like I did? One expects the duplicate of MPI_COMM_NULL to be MPI_COMM_NULL.
The MPI standard does not specify what MPI_Comm_dup shall do when called with a null communicator (see section 6.4.2). Therefore, one cannot assume that such a call is allowed, especially since MPI_COMM_NULL is defined as "the value used for invalid communicator handles".
For what it's worth, OpenMPI 4.0.1 also treats the call as an error.

Custom errors in golang and pointer receivers

Reading about value receivers vs pointer receivers across the web and stackoverflow, I understand the basic rule to be: If you don't plan to modify the receiver, and the receiver is relatively small, there is no need for pointers.
Then, reading about implementing the error interface (e.g. https://blog.golang.org/error-handling-and-go), I see that the examples of the Error() function all use a pointer receiver.
Yet, we are not modifying the receiver, and the struct is very small.
I feel like the code is much nicer without pointers (return &appError{} vs return appError{}).
Is there a reason why the examples are using pointers?
First, in the blog post you linked and took your example from, appError is not an error. It's a wrapper that carries an error value and other related info used by the implementation of the examples; it is not exposed, and neither appError nor *appError is ever used as an error value.
So the example you quoted has nothing to do with your actual question. But to answer the question in the title:
In general, consistency may be the reason. If a type has many methods and some need pointer receiver (e.g. because they modify the value), often it's useful to declare all methods with pointer receiver, so there's no confusion about the method sets of the type and the pointer type.
Answering regarding error implementations: when you use a struct value to implement an error value, it's dangerous to use a non-pointer to implement the error interface. Why is it so?
Because error is an interface. And interface values are comparable. And they are compared by comparing the values they wrap. And you get different comparison results based on what values / types are wrapped inside them! If you store pointers in them, the error values will be equal if they store the same pointer. And if you store non-pointers (structs) in them, they are equal if the struct values are equal.
To elaborate on this and show an example:
The standard library has an errors package. You can create error values from string values using the errors.New() function. If you look at its implementation (errors/errors.go), it's simple:
// Package errors implements functions to manipulate errors.
package errors

// New returns an error that formats as the given text.
func New(text string) error {
	return &errorString{text}
}

// errorString is a trivial implementation of error.
type errorString struct {
	s string
}

func (e *errorString) Error() string {
	return e.s
}
The implementation returns a pointer to a very simple struct value. This is so that if you create 2 error values with the same string value, they won't be equal:
e1 := errors.New("hey")
e2 := errors.New("hey")
fmt.Println(e1, e2, e1 == e2)
Output:
hey hey false
This is intentional.
Now if you would return a non-pointer:
func New(text string) error {
	return errorString{text}
}

type errorString struct {
	s string
}

func (e errorString) Error() string {
	return e.s
}
2 error values with the same string would be equal:
e1 = New("hey")
e2 = New("hey")
fmt.Println(e1, e2, e1 == e2)
Output:
hey hey true
Try the examples on the Go Playground.
A shining example of why this is important: look at the error value stored in the variable io.EOF:
var EOF = errors.New("EOF")
It is expected that io.Reader implementations return this specific error value to signal end of input. So you can peacefully compare the error returned by Reader.Read() to io.EOF to tell if end of input is reached. You can be sure that if they occasionally return custom errors, those will never be equal to io.EOF; this is what errors.New() guarantees (because it returns a pointer to an unexported struct value).
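A minimal sketch of that comparison in practice (standard library only):

package main

import (
	"fmt"
	"io"
	"strings"
)

func main() {
	r := strings.NewReader("hi")
	buf := make([]byte, 4)
	for {
		n, err := r.Read(buf)
		fmt.Printf("read %q\n", buf[:n])
		if err == io.EOF { // identity comparison against the sentinel error value
			fmt.Println("end of input")
			return
		}
		if err != nil {
			fmt.Println("unexpected error:", err)
			return
		}
	}
}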
Errors in Go need only satisfy the error interface, i.e. provide an Error() method. When creating custom errors, or digging through the Go source code, you will find there is much more to errors behind the scenes. If a struct is being populated in your application, it is more efficient to pass it as a pointer to avoid making copies in memory. Furthermore, as illustrated in The Go Programming Language book:
The fmt.Errorf function formats an error message using fmt.Sprintf and returns a new error value. We use it to build descriptive errors by successively prefixing additional context information to the original error message. When the error is ultimately handled by the program’s main function, it should provide a clear causal chain from the root problem to the overall failure, reminiscent of a NASA accident investigation:
genesis: crashed: no parachute: G-switch failed: bad relay orientation
Because error messages are frequently chained together, message strings should not be capitalized and newlines should be avoided. The resulting errors may be long, but they will be self-contained when found by tools like grep.
From this we can see that if a single 'error type' holds a wealth of information, and on top of that we are 'chaining' errors together to create a detailed message, using pointers is the best way to achieve this; a small sketch follows.
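A small sketch of that chaining style with fmt.Errorf (the file name and the helper are made up for illustration):

package main

import (
	"fmt"
	"os"
)

// openConfig is a hypothetical helper that prefixes context onto the
// underlying error, building the kind of causal chain quoted above.
func openConfig(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return fmt.Errorf("loading config %s: %v", path, err)
	}
	defer f.Close()
	// ... parse the file ...
	return nil
}

func main() {
	if err := openConfig("no-such-file.toml"); err != nil {
		// Prints e.g.: loading config no-such-file.toml: open no-such-file.toml: no such file or directory
		fmt.Println(err)
	}
}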
We can look at this from the error handler's perspective, instead of the error creator's.
Error Definition Side's Story
type ErrType1 struct{}

func (e *ErrType1) Error() string {
	return "ErrType1"
}

type ErrType2 struct{}

func (e ErrType2) Error() string {
	return "ErrType2"
}
Error Handler Side's Story
err := someFunc()
switch err.(type) {
case *ErrType1:
	...
case ErrType2, *ErrType2:
	...
default:
	...
}
As you can see, if you implement an error type on a value receiver, then when you are doing the type switch, you need to worry about both cases.
For ErrType2, both &ErrType2{} and ErrType2{} satisfy the interface.
Because someFunc returns an error interface, you never know if it returns a struct value or a struct pointer, especially when someFunc isn't written by you.
Therefore, using a value receiver for Error() doesn't stop a user from returning a pointer as an error.
That being said, all other aspects, such as stack vs. heap allocation (memory allocation, GC pressure), still apply.
Choose your implementation according to your use cases.
In general, I prefer a pointer receiver, for the reason I demonstrated above. I prefer a friendly API over performance, and sometimes, when the error type contains a lot of information, a pointer is also more performant.
No :)
https://blog.golang.org/error-handling-and-go#TOC_2.
Go interfaces allow for anything that complies with the error interface to be handled by code expecting error
type error interface {
	Error() string
}
Like you mentioned, if you don't plan to modify state there is little incentive to pass around pointers:
allocating to heap
GC pressure
Mutable state and concurrency, etc
On a random rant, and anecdotally, I personally think that seeing examples like this one is why new Go programmers favor pointer receivers by default.
The tour of go explains the general reasons for pointer receivers pretty well:
https://tour.golang.org/methods/8
There are two reasons to use a pointer receiver.
The first is so that the method can modify the value that its receiver points to.
In general, all methods on a given type should have either value or pointer receivers, but not a mixture of both.

Do you make safe and unsafe version of your functions or just stick to the safe version? (Embedded System)

Let's say you have a function that sets an index and then updates a few variables based on the value stored in the array element the index points to. Do you check the index to make sure it is in range? (In an embedded system environment, to be specific Arduino.)
So far I have made a safe and an unsafe version of all functions; is that a good idea? In some of my other code I noticed that having only safe functions results in checking conditions multiple times as the libraries get larger, so I started to develop both. The safe function checks the condition and calls the unsafe function, as shown in the example below for the case explained above.
Safe version:
bool RcChannelModule::setFactorIndexAndUpdateBoundaries(factorIndex_T factorIndex)
{
    if(factorIndex < N_FACTORS)
    {
        setFactorIndexAndUpdateBoundariesUnsafe(factorIndex);
        return true;
    }
    return false;
}
Unsafe version:
void RcChannelModule::setFactorIndexAndUpdateBoundariesUnsafe(factorIndex_T factorIndex)
{
    setCuurentFactorIndexUnsafe(factorIndex);
    updateOutputBoundaries();
}
If I am doing it wrong fundamentally, please let me know why and how I could avoid that. Also I would like to know: generally, when you program, do you consider the future user to be a fool, or do you expect them to follow the minimal documentation provided? (The reason I say minimal is because I do not have the time to write proper documentation.)
void RcChannelModule::setCuurentFactorIndexUnsafe(const factorIndex_T factorIndex)
{
    currentFactorIndex_ = factorIndex;
}
Safety checks, such as array index range checks, null checks, and so on, are intended to catch programming errors. When these checks fail, there is no graceful recovery: the best the program can do is to log what happened, and restart.
Therefore, the only time when these checks become useful is during debugging and testing of your code. C++ provides built-in functionality for dealing with this through asserts, which are kept in the debug versions of the code, but compiled out from the release version:
#include <cassert>

void RcChannelModule::setFactorIndexAndUpdateBoundariesUnsafe(factorIndex_T factorIndex) {
    assert(factorIndex < N_FACTORS);
    setCuurentFactorIndexUnsafe(factorIndex);
    updateOutputBoundaries();
}
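For reference, assert() comes from <cassert> and is compiled out whenever NDEBUG is defined, which release builds conventionally do; a minimal sketch of the two build modes (file and function names are made up):

// Debug build:   g++ -c rc_channel.cpp            -> assert() is active
// Release build: g++ -DNDEBUG -c rc_channel.cpp   -> assert() expands to nothing
#include <cassert>

void useFactorIndex(int factorIndex, int nFactors)
{
    assert(factorIndex < nFactors);   // evaluated only when NDEBUG is not defined
    // ... use factorIndex ...
}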
Note: [When you make a library for external use] an argument-checking version of each external function perhaps makes sense, with non-argument-checking implementations of those and all internal-only functions. If you perform argument checking then do it (only) at the boundary between your library and the client code. But it's pointless to offer a choice to your users, for if you want to protect them from usage errors then you cannot rely on them to choose the "safe" versions of your functions. (John Bollinger)
Do you make safe and unsafe version of your functions or just stick to the safe version?
For higher level code, I recommend one version, a safe one.
In high-level code, with a large set of related functions and data, the combinations of interactions of data and code cannot be fully checked at development time. When an error is detected, the data should be set to indicate an error state, and subsequent use of the data within these functions should be aware of that error state; a sketch follows this paragraph.
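A small sketch of that "error state travels with the data" idea (all names and the bound are made up for illustration):

class FactorTable {
public:
    void setFactorIndex(int factorIndex) {
        if (factorIndex < 0 || factorIndex >= kNFactors) {
            errorState_ = true;              // record the error in the data itself
            return;
        }
        currentFactorIndex_ = factorIndex;
    }

    void updateOutputBoundaries() {
        if (errorState_) return;             // later operations honor the error state
        // ... normal work using currentFactorIndex_ ...
    }

    bool hasError() const { return errorState_; }

private:
    static const int kNFactors = 4;          // made-up bound
    int currentFactorIndex_ = 0;
    bool errorState_ = false;
};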
For the lowest-level, time-critical routines, I'd go with @dasblinkenlight's answer: create one source code that compiles two ways, for the debug and release builds.
Yet keep in mind @Pete Becker's point: is doing a check really likely to be a performance bottleneck?
With floating-point related routines, use NaN to help keep track of an unrecoverable error.
Lastly, where possible, create functions that do not fail and avoid the issue. With many, though not all, this only requires small code additions. It often adds only a constant-time performance penalty, not an O(n) one.
Example: Consider a function to lop off the first character of a string - in place.
// This works fine as long as s[0] != 0
char *slop_1(char *s) {
    size_t len = strlen(s);        // most work is here
    return memmove(s, s + 1, len); // and here
}
Instead, define the function, and code it, to do nothing when s[0] == 0:
char *slop_2(char *s) {
    size_t len = strlen(s);
    if (len > 0) {                 // negligible additional work
        memmove(s, s + 1, len);
    }
    return s;
}
Similar code can be applied to OP's example. Note that it is "safe", at least within the function. The assert() scheme can still be used to discover development issues. Yet the released code, without the assert(), still checks the range.
void RcChannelModule::setFactorIndexAndUpdateBoundaries(factorIndex_T factorIndex)
{
    if(factorIndex < N_FACTORS) {
        setFactorIndexAndUpdateBoundariesUnsafe(factorIndex);
    } else {
        assert(0);
    }
}
Since you tagged this Arduino and embedded, you have a very resource-constrained system, one of the crappiest processors still manufactured.
On such a system you cannot afford extra error handling. It is better to properly document what values the parameters passed to the function must have, then leave the checking of this to the caller.
The caller can then either check this at run-time, if needed, or otherwise at compile-time with a static assert, as sketched below. Your function would however not be able to implement it as a static assert, as it can't know whether factorIndex is a run-time variable or a compile-time constant.
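A minimal sketch of that caller-side split (the typedef, the bound, and the values are assumptions for illustration):

#include <cstdint>

using factorIndex_T = uint8_t;               // assumed, mirroring the question
constexpr factorIndex_T N_FACTORS = 4;       // made-up bound

void setFactorIndexAndUpdateBoundariesUnsafe(factorIndex_T) { /* ... */ }

void caller(factorIndex_T runtimeIndex)
{
    // Compile-time constant: the caller can prove the range statically.
    constexpr factorIndex_T kChosenIndex = 3;
    static_assert(kChosenIndex < N_FACTORS, "factor index out of range");
    setFactorIndexAndUpdateBoundariesUnsafe(kChosenIndex);

    // Run-time value: the caller checks it only where needed.
    if (runtimeIndex < N_FACTORS) {
        setFactorIndexAndUpdateBoundariesUnsafe(runtimeIndex);
    }
}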
As for "I have no time to write proper documentation", that's nonsense. It takes far less time to document this function than to post this SO question. You don't necessarily have to write an essay in some Word file. You don't necessarily have to use Doxygen or similar.
But you do need to write the bare minimum of documentation: In the header file, document the purpose and expected values of all function parameters in the form of comments. Preferably you should have a coding standard for how to document such functions. A minimal documentation of public API functions in the form of comments is part of your job as programmer. The code is not complete until this is written.

Readable exception information when testing DateTime objects in PHPUnit

When testing DateTime objects in PHPUnit (3.7.22), I am making the assertion like this:
$this->assertEquals(date_create('2014-01-01'), $myobject->getDate());
This works nicely until you get a failure, and the exception message is not clear enough (unlike for primitives, where it clearly states, for example, that 1 does not equal the expected 2).
PHPUnit_Framework_ExpectationFailedException : Failed asserting that two objects are equal.
I could pass the $message parameter to the assertEquals method with a string containing the object value, but I feel it could be easier.
Any ideas?
You could do something like
$expected_date = new DateTime('2014-01-01');
$this->assertEquals($expected_date->format('Y-m-d'), $myobject->getDate()->format('Y-m-d'));
This would make your error message say something like "Failed asserting that '2014-02-03' matches expected '2014-01-01'".
I had a poke around the PHPUnit source, and I don't see any hook to allow a better display. What I usually do is what you already mention in your question:
$this->assertEquals(date_create('2014-01-01'), $myobject->getDate(), print_r($myobject,true) );
OR:
$this->assertEquals(date_create('2014-01-01'), $myobject->getDate(), print_r($myobject->getDate(),true) );
depending on which is more useful, case-by-case.
I do this anyway, because often I want to include other helpful data in the message, perhaps some previous calculations, to give it context. I.e. to know why a test is failing I often need to know more than just the two objects that should've been equal.
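If this comes up often, the two ideas can be combined into a small reusable helper in a base test case (a sketch; the class and method names are made up):

<?php
// Hypothetical base class for tests that compare DateTime objects.
abstract class DateAssertionsTestCase extends PHPUnit_Framework_TestCase
{
    protected function assertDateEquals(DateTime $expected, DateTime $actual, $message = '')
    {
        $this->assertEquals(
            $expected->format('Y-m-d H:i:s'),
            $actual->format('Y-m-d H:i:s'),
            $message !== '' ? $message : print_r($actual, true)
        );
    }
}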
