Mvapich2 buffer aliasing - mpi

I am launched an MPI program with MVAPICH2 and got this error:
Fatal error in PMPI_Gather:
Invalid buffer pointer, error stack:
PMPI_Gather(923): MPI_Gather() failed
PMPI_Gather(857): Buffers must not be aliased
There are two ways I think I could solve this:
Rewrite my MPI program (use different buffers)
Disable checking buffer aliasing
Do someone know how I could do this with MVAPICH2? Some compiler option, parameter, environmental variable, etc?
Something like MV2_NO_BUFFER_ALIAS_CHECK, but it does not work.

What you're doing is an incorrect program and you should rewrite your code to use separate buffers
Alternatively, you might be able to use MPI_IN_PLACE if you want to use the same buffer as both the input and output values of your MPI_GATHER. Without seeing your code, I can't tell you how you could do that. You can check out some documentation about MPI_GATHER and read more about how MPI_IN_PLACE works and see if that solves your problem.

Related

Very odd OpenCL CL_OUT_OF_RESOURCES behavior

I am writing a rather large OpenCL program with lots of function calls. I've been having problems with CL_OUT_OF_RESOURCES errors, but I managed to fix the problem with a simple printf statement. This is the code fragment in question:
...
const float color = raytrace(depthMap, triangles, ...tonMoreParameters...);
if (i == 1234) {
printf("hello\n");
}
outImage[i] = color;
...
This works fine, but if I remove the printf function, the program crashes. If I keep it in, it doesn't.
When it crashes, it gives a CL_OUT_OF_RESOURCES error. Can anyone explain why adding printf makes the program not run out of resources? How can I make this work without this useless printf?
Relevant specs:
OpenCL 1.2
NVIDIA GTX 660
Using Java JOCL as host code
EDIT:
I've noticed that putting printf statements in other places changes the way the code operates. Some printf statements cause the program to output different numeric results, while others cause it to crash.
Even changing code that is never executed hugely changes the calculations.
It's as if changing any code randomizes the way it executes.
Is this a sign of a faulty graphics card?
Or perhaps a bug in the OpenCL compiler?
EDIT 2
As it turns out, recursion is not the problem. I removed all recursive calls, but printfs and other harmless changes sill change the way the code runs depending on where they are put.
This is definitely a problem rooted during compilation time.
large OpenCL program with lots of function calls and recursion
OpenCL C 2.2 pdf, page 46:
Recursion is not supported.
I have no idea why printf changes things, but your program relies on a feature that's explicitly not supported.
I found the solution to my own problem.
The problem was caused by a simple array out-of-bounds error. Apparently, OpenCL does not catch these types of errors. So any attempts to read or write out of bounds could cause a silent memory corruption as it did in my case. The memory that got corrupted were the instructions for the program itself, hence the random results of execution.
The problem was also partially caused by the illegal use of recursion as mogu mentioned. Again, the OpenCL compiler lets this silently corrupt the program memory.
So be careful OpenCL developers.

Is there a way to simplify OpenCl kernels usage ?

To use OpenCL kernel the following is needed:
Put the kernel code in a string
call clCreateProgramWithSource
call clBuildProgram
call clCreateKernel
call clSetKernelArg (x number of arguments)
call clEnqueueNDRangeKernel
This need to be done for each kernel. Is there a way to do this repeating less code for each kernel?
There is no way to speed up the process. You need to go step by step as you listed.
But it is important to know why it is needed these steps, to understand how flexible the chain is.
clCreateProgramWithSource: Allows to add different strings from different sources to generate the program. Some string might be static, but some might be downloaded from a server, or loaded from disk. It allows the CL code to be dynamic and updated over time.
clBuildProgram: Builds the program for a given device. Maybe you have 8 devices, so you need to call this multiple times. Each device will produce a different binary code.
clCreateKernel: Creates a kernel. But a kernel is an entry point in a binary. So it is possible you create multiple kernels from a program (for different functions). Also the same kernel might be created multiple times, since it holds the arguments. This is useful for having ready-to-be-launched instances with proper parameters.
clSetKernelArg: Changes the parameters in the instance of the kernel. (it is stored there, so it can used multiple times in the future).
clEnqueueNDRangeKernel: Launches it, configuring the size of the launch and the chain of dependencies with other operations.
So, even if you could have a way to just call "getKernelFromString()", the functionality will be very limited, and not very flexible.
You can have look at wrapper libraries
https://streamhpc.com/knowledge/for-developers/opencl-wrappers/
I suggest you look into SYCL. The building steps are performed offline, saving execution time by skipping the clCreateProgramWithSource. The argument setting is done automatically by the runtime, extracting the information from the user lambda
There is also CLU: https://github.com/Computing-Language-Utility/CLU - see https://www.khronos.org/assets/uploads/developers/library/2012-siggraph-opencl-bof/OpenCL-CLU-and-Intel-SIGGRAPH_Aug12.pdf for more info. It is a very simple tool, but should make life a bit easier.

Writing an entire array to QFile

The QFile::write() documentation says:
Writes at most maxSize bytes of data from data to the device. Returns the number of bytes that were actually written, or -1 if an error occurred.
(Yes that's the entire documentation - unusually poor for Qt.)
This seems to imply that it is common for it not to write all of the data you pass. Does that mean I should call write() in a loop, passing in the remaining unwritten data until it has fully written it?
If so that seems like a pain. Is there some convenience function in Qt that will do it for me?
This seems to imply that it is common for it not to write all of the data you pass
Common, no, but it is possible, which we'll see, below.
Is there some convenience function in Qt that will do it for me?
If you're writing a whole file, a better method would be to use QSaveFile, for "safely writing to files".
As it states in the documentation for QSaveFile:
QSaveFile automatically detects errors while writing, such as the full partition situation, where write() cannot write all the bytes.
So a full partition would be one instance that could cause a QIODevice::write to fail to fully write data to disk. Disk failure, removal of a portable storage device or network drive during a write to those can also cause only partial data to be written.
Yes that's the entire documentation - unusually poor for Qt.
There's not much more to it.
Does that mean I should call write() in a loop
Something has to prevent write from finishing, perhaps the missing bit is that it won't fail to write maxBytes for no reason. I.e. it's not done to spite you, but because write can't proceed.
If you remove the condition that caused it to fail you can call it again. In most cases, there's no point in retrying - when write fails to write what you meant to write, you're done and you handle it as an error.

Difference between write() and printf()

Recently I am studying operating system..I just wanna know:
What’s the difference between a system call (like write()) and a standard library function (like printf())?
A system call is a call to a function that is not part of the application but is inside the kernel. The kernel is a software layer that provides you some basic functionalities to abstract the hardware to you. Roughly, the kernel is something that turns your hardware into software.
You always ultimately use write() to write anything on a peripheral whatever is the kind of device you write on. write() is designed to only write a sequence of bytes, that's all and nothing more. But as write() is considered too basic (you may want to write an integer in ten basis, or a float number in scientific notation, etc), different libraries are provided to you by different kind of programming environments to ease you.
For example, the C programming langage gives you printf() that lets you write data in many different formats. So, you can understand printf() as a function that convert your data into a formatted sequence of bytes and that calls write() to write those bytes onto the output. But C++ gives you cout; Java System.out.println, etc. Each of these functions ends to a call to write() (at least on POSIX systems).
One thing to know (important) is that such a system call is costly! It is not a simple function call because you need to call something that is outside of your own code and the system must ensure that you are not trying to do nasty things, etc. So it is very common in higher print-like function that some buffering is built-in; such that write is not always called, but your data are kept into some hidden structure and written only when it is really needed or necessary (buffer is full or you really want to see the result of your print).
This is exactly what happens when you manage your money. If many people gives you 5 bucks each, you won't go deposit each to the bank! You keep them on your wallet (this is the print) up to the point it is full or you don't want to keep them anymore. Then you go to the bank and make a big deposit (this is the write). And you know that putting 5 bucks to your wallet is much much faster than going to the bank and make the deposit. The bank is the kernel/OS.
System calls are implemented by the operating system, and run in kernel mode. Library functions are implemented in user mode, just like application code. Library functions might invoke system calls (e.g. printf eventually calls write), but that depends on what the library function is for (math functions usually don't need to use the kernel).
System Call's in OS are used in interacting with the OS. E.g. Write() could be used something into the system or into a program.
While Standard Library functions are program specific, E.g. printf() will print something out but it will only be in GUI/command line and wont effect system.
Sorry couldnt comment, because i need 50 reputation to comment.
EDIT: Barmar has good answer
I am writing a small program. At the moment it just reads each line from stdin and prints it to stdout. I can add a call to write in the loop, and it would add a few characters at the end of each line. But when I use printf instead, then all the extra characters are clustered and appear all at once, instead of appearing on each line.
It seems that using printf causes stderr to be buffered. Adding fflush(stdout); after calling printf fixes the discrepancy in output.
I'd like to mention another point that the stdio buffers are maintained in a process’s user-space memory, while system call write transfers data directly to a kernel buffer. It means that if you fork a process after write and printf calls, flushing may bring about to give output three times subject to line-buffering and block-buffering, two of them belong to printf call since stdio buffers are duplicated in the child by fork.
printf() is one of the APIs or interfaces exposed to user space to call functions from C library.
printf() actually uses write() system call. The write() system call is actually responsible for sending data to the output.

to throw, to return or to errno?

i am creating a system. What i want to know is if a msg is unsupported what should it do? should i throw saying unsupported msg? should i return 0 or -1? or should i set an errno (base->errno_). Some messages i wouldnt care if there was an error (such as setBorderColour). Others i would (addText or perhaps save if i create a save cmd).
I want to know what the best method is for 1) coding quickly 2) debugging 3) extending and maintenance. I may make debugging 3rd, its hard to debug ATM but thats bc there is a lot of missing code which i didnt fill in. Actual bugs arent hard to correct. Whats the best way to let the user know there is an error?
The system works something like this but not exactly the same. This is C style and mycode has a bunch of inline functions that wrap settext(const char*text){ to msg(this, esettext, text)
Base base2, base;
base = get_root();
base2 = msg(base, create, BASE_TYPE);
msg(base2, setText, "my text");
const char *p = (const char *)msg(base2, getText);
Generally if it's C++, prefer exceptions unless performance is critical or unless you may be running in an environment (e.g. an embedded platform) that does not support exceptions. Exceptions are by far the best choice for debugging because they are very noticeable when they occur and are ignored. Further, exceptions are self-documenting. They have their own type name and usually a contained message that explains the error. Return codes and errno require separate code definitions and some kind of out-of-band way of communicating what the codes mean in any given context (e.g. man pages, comments).
For coding quickly, return codes are probably easier since they don't involve potentially defining your own exception types, and often the error checking code is not as verbose as with exceptions. But of course the big risk is that it is much easier to silently ignore error return codes, leading to problems that may not be noticed until well after they occur, making debugging and maintenance a nightmare.
Try to avoid ever using errno, since it's very error-prone itself. It's a global, so you never know who is resetting it, and it is most definitively not thread safe.
Edit: I just realized you meant an errno member variable and not the C-style errno. That's better in that it's not global, but you still need additional constructs to make it thread safe (if your app is multi-threaded), and it retains all the problems of a return code.
Returning an error code requires discipline because the error code must be explicitly checked and then passed up. We wrote a large C-based system that used this approach and it took a while to remove all the "lost" errors. We eventually developed some techniques to catch this problem (such as storing the error code in a thread-global location and checking at the top level to see that the returned error code matched the saved error code).
Exception handling is easier to code quickly because if you're writing code and you're not sure how to handle the error, you can just let it propagate upwards (assuming you're not using Java where you have to deal with checked exceptions). It's better for debugging because you can get a stack trace of where the exception occurred (and because you can build in top level exception handlers to catch problems that should have been caught elsewhere). It's better for maintaining because if you've done things right, you'll be notified of problem faster.
However, there are some design issues with exception handling, and if you get it wrong, you'll be worse off. In short, if you're writing code that you don't know how to handle an exception, you should let it propagate up. Too many coders trap errors just to convert the exception and then rethrow it, resulting in spaghetti exception code that sometimes loses information about the original cause of the problem. This assumes that you have exception handlers at the top level (entry points).
Personally when it comes to output graphics, I feel a silent fail is fine. It just makes your picture wrong.
Graphical errors are super easy to spot anyways.
Personally, I would add an errno to your Base struct if it is pure 'C'. If this is C++ I'd throw an exception.
It depends on how 'fatal' these errors are. Does the user really need to see the error, or is it for other developers edification?
For maintainability you need to clearly document the errors that can occur and include clear examples of error handling.

Resources