Here is a simple piece of code where a division by zero occurs. I'm trying to catch it:
#include <iostream>
int main(int argc, char *argv[]) {
int Dividend = 10;
int Divisor = 0;
try {
std::cout << Dividend / Divisor;
} catch(...) {
std::cout << "Error.";
}
return 0;
}
But the application crashes anyway (even though I compile with MinGW's -fexceptions option).
Is it possible to catch such an exception (which I understand is not a C++ exception, but a FPU exception) ?
I'm aware that I could check for the divisor before dividing, but I made the assumption that, because a division by zero is rare (at least in my app), it would be more efficient to try dividing (and catching the error if it occurs) than testing each time the divisor before dividing.
I'm doing these tests on a WindowsXP computer, but would like to make it cross platform.
It's not an exception. It's an error which is determined at hardware level and is returned back to the operating system, which then notifies your program in some OS-specific way about it (like, by killing the process).
I believe that what happens in this case is not an exception but a signal. If so, the operating system interrupts your program's main control flow and calls a signal handler, which in turn terminates your program.
It's the same type of error which appears when you dereference a null pointer (then your program crashes by SIGSEGV signal, segmentation fault).
You could try to use the functions from <csignal> header to try to provide a custom handler for the SIGFPE signal (it's for floating point exceptions, but it might be the case that it's also raised for integer division by zero - I'm really unsure here). You should however note that the signal handling is OS-dependent and MinGW somehow "emulates" the POSIX signals under Windows environment.
Here's the test on MinGW 4.5, Windows 7:
#include <csignal>
#include <iostream>
using namespace std;
void handler(int a) {
cout << "Signal " << a << " here!" << endl;
}
int main() {
signal(SIGFPE, handler);
int a = 1/0;
}
Output:
Signal 8 here!
And right after executing the signal handler, the system kills the process and displays an error message.
Using this, you can close any resources or log an error after a division by zero or a null pointer dereference... but unlike exceptions that's NOT a way to control your program's flow even in exceptional cases. A valid program shouldn't do that. Catching those signals is only useful for debugging/diagnosing purposes.
(Some signals are genuinely useful in low-level programming and don't cause your program to be killed right after the handler returns, but that's a deep topic.)
Dividing by zero is a logical error, a bug by the programmer. You shouldn't try to cope with it; you should debug and eliminate it. In addition, catching exceptions is extremely expensive, far more so than checking the divisor.
You can use Structured Exception Handling to catch the divide-by-zero error. How that's achieved depends on your compiler. MSVC offers a compiler option (/EHa) that makes catch(...) catch Structured Exceptions, provides a function (_set_se_translator) to translate Structured Exceptions into regular C++ exceptions, and also offers __try/__except/__finally. However, I'm not familiar enough with MinGW to tell you how to do it with that compiler.
There isn't a language-standard way of catching the divide-by-zero from the CPU.
Don't prematurely "optimize" away a branch. Is your application really CPU-bound in this context? I doubt it, and it isn't really an optimization if you break your code. Otherwise, I could make your code even faster:
int main(int argc, char *argv[]) { /* Fastest program ever! */ }
Divide by zero is not an exception in C++, see https://web.archive.org/web/20121227152410/http://www.jdl.co.uk/briefings/divByZeroInCpp.html
Somehow the real explanation is still missing.
Is it possible to catch such an exception (which I understand is not a C++ exception, but a FPU exception) ?
Yes, your catch block should work on some compilers. But the problem is that your exception is not an FPU exception. You are doing integer division. I don’t know whether that’s also a catchable error but it’s not an FPU exception, which uses a feature of the IEEE representation of floating point numbers.
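To make the distinction concrete, here is a minimal sketch of what an IEEE floating-point "exception" looks like from C++: it is a sticky status flag you can query through <cfenv>, not something a catch block sees. The helper name float_division_raises_divbyzero is made up for illustration, and the sketch assumes an IEEE 754 platform where floating-point traps are left at their default (disabled) setting.

```cpp
#include <cfenv>

// Hypothetical helper: returns true if computing num / den in floating point
// raised the IEEE "divide by zero" status flag.
// The volatile variables stop the compiler from folding the division at
// compile time or moving it across the <cfenv> calls.
bool float_division_raises_divbyzero(double num, double den) {
    volatile double n = num;
    volatile double d = den;
    std::feclearexcept(FE_ALL_EXCEPT);  // reset all sticky status flags
    volatile double q = n / d;          // no trap by default; q becomes +/-inf
    (void)q;
    return std::fetestexcept(FE_DIVBYZERO) != 0;
}
```

Calling it with (1.0, 0.0) should report the flag, while (1.0, 2.0) should not. None of this applies to the integer division in the question, which never touches these flags.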
On Windows (with Visual C++), try this:
BOOL SafeDiv(INT32 dividend, INT32 divisor, INT32 *pResult)
{
__try
{
*pResult = dividend / divisor;
}
__except(GetExceptionCode() == EXCEPTION_INT_DIVIDE_BY_ZERO ?
EXCEPTION_EXECUTE_HANDLER : EXCEPTION_CONTINUE_SEARCH)
{
return FALSE;
}
return TRUE;
}
MSDN: http://msdn.microsoft.com/en-us/library/ms681409(v=vs.85).aspx
Well, if there were exception handling for this, some component would still have to do the check. So you don't lose anything by checking it yourself. And there's not much that's faster than a simple comparison (a single CPU instruction, "jump if equal to zero" or something like that; I don't remember the name).
To avoid infinite "Signal 8 here!" messages, just add 'exit' to Kos's nice code:
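For what it's worth, the explicit check is only a couple of lines. Below is a minimal sketch; the name checked_div and its bool-plus-out-parameter shape are my own choices, not anything from the question. It also rejects INT_MIN / -1, which overflows and is undefined behavior in the same way.

```cpp
#include <limits>

// Hypothetical helper: performs dividend / divisor only when the result is
// well defined. Returns false (and leaves result untouched) otherwise.
bool checked_div(int dividend, int divisor, int& result) {
    if (divisor == 0) {
        return false;  // division by zero
    }
    if (dividend == std::numeric_limits<int>::min() && divisor == -1) {
        return false;  // quotient does not fit in an int
    }
    result = dividend / divisor;
    return true;
}
```

The caller decides what a failed division means, for example logging it or throwing an ordinary C++ exception that a normal catch block can handle.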
#include <csignal>
#include <iostream>
#include <cstdlib> // exit
using namespace std;
void handler(int a) {
cout << "Signal " << a << " here!" << endl;
exit(1);
}
int main() {
signal(SIGFPE, handler);
int a = 1/0;
}
As others said, it's not a C++ exception; in floating-point arithmetic, dividing by zero just generates a NaN or Inf.
Zero-divide is only one way to do that. If you do much math, there are lots of ways, like
log(not_positive_number), exp(big_number), etc.
If you can check for valid arguments before doing the calculation, then do so, but sometimes that's hard to do, so you may need to generate and handle an exception.
In MSVC, the header <float.h> declares a function _finite(x) that tells you whether a number is finite.
I'm pretty sure MinGW has something similar.
You can test that after a calculation and throw/catch your own exception or whatever.
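A portable version of that idea could look like the sketch below. It uses std::isfinite from <cmath> (C++11) instead of the MSVC-specific _finite; the wrapper name checked_log and the choice of std::domain_error are just assumptions for the example.

```cpp
#include <cmath>
#include <stdexcept>

// Hypothetical wrapper: computes log(x), then converts a NaN or infinite
// result into an ordinary C++ exception that a catch block can handle.
double checked_log(double x) {
    const double y = std::log(x);
    if (!std::isfinite(y)) {
        throw std::domain_error("checked_log: result is not finite");
    }
    return y;
}
```

A caller can then wrap checked_log(value) in a normal try/catch for std::domain_error, which works the same on every compiler.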
I want to run a particular MPI function under google benchmark. Something like:
#include <mpi.h>
#include <benchmark/benchmark.h>
template<class Real>
void MPIInitFinalize(benchmark::State& state)
{
auto mpi = []() {
MPI_Init(nullptr, nullptr);
foo();
MPI_Finalize();
};
for(auto _ : state) {
mpi();
}
}
BENCHMARK_TEMPLATE(MPIInitFinalize, double);
BENCHMARK_MAIN();
Of course, we know what will happen:
*** The MPI_Init() function was called after MPI_FINALIZE was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
I understand that MPI isn't cool with what I want to do. But google benchmark is simply too useful to not at least try to find a hack to make this work.
Is there anything that can be done? Can I fork a process and pass the lambda to it? Is there a threading pattern that will work? Even expensive things will be helpful, as I can just subtract the cost of doing whatever hack works without a call to foo() from the one which calls foo().
If you don't need to include MPI_Init and MPI_Finalize in your timing (which you probably don't want anyway), you can take a look at this gist: https://gist.github.com/mdavezac/eb16de7e8fc08e522ff0d420516094f5
It contains an example of how to benchmark MPI-enabled code with google benchmark. The basic idea is to call google benchmark from your own main function (using ::benchmark::Initialize(&argc, argv) and ::benchmark::RunSpecifiedBenchmarks()), synchronize using MPI_Barrier, time your code using std::chrono::high_resolution_clock, and use MPI_Allreduce to find the slowest process. You can then publish that time using state.SetIterationTime (but only on the main process).
In my program, I have wrapped up some MPI communicators into a data structure. Unfortunately, sometimes the destructor of an object of this type might get called before it has been initialized. In my destructor, I of course call MPI_Comm_free. But if this is called on an invalid communicator, the code crashes.
I've been looking through the MPI standard, but I can't find a function to test if a communicator is valid. I also assume I can't use MPI_Comm_set_errhandler to try and catch the free exception because there isn't a valid communicator to set the handler of. I could maintain a flag value of my own saying if the communicator is valid, but I prefer to avoid duplicating state information like that. Is there any built in way I can safely check if a communicator is valid?
Here is a basic program demonstrating my problem:
#include <mpi.h>
typedef struct {
MPI_Comm comm;
} mystruct;
void cleanup(mystruct* a) {
MPI_Comm_free(&(a->comm));
}
int main(int argc, char* argv[]) {
MPI_Init(&argc, &argv);
mystruct a;
/* Some early exit condition triggers cleanup without
initialization */
cleanup(&a);
MPI_Finalize();
return 0;
}
MPI_COMM_NULL is a constant used for invalid communicators. However, you cannot determine whether an MPI communicator has been initialized. In C, it is impossible to determine whether a variable has been initialized. Non-static local variables start with an indeterminate value, and reading it causes undefined behavior.
You must initialize the communicator with MPI_COMM_NULL yourself. The cleanup code can then compare against MPI_COMM_NULL and skip MPI_Comm_free for a null handle. This only makes sense if you cannot possibly create the actual communicator during initialization.
Note: MPI_Comm_free also sets comm to MPI_COMM_NULL.
I am running a script that does multiple subsequent mpirun calls through Slurm's squeue command. Each call to mpirun writes its output to its own directory, but there is a dependency between them: a given run uses data from the previous run's output directory.
The MPI program internally performs an iterative optimization algorithm, which terminates when some convergence criteria are met. Every once in a while, the algorithm reaches a state in which those criteria are not quite met yet, but by plotting the output (which is continuously written to disk) one can quite easily tell that the important things have converged and that further iterations would not change the nature of the final result anymore.
What I am therefore looking for is a way to manually terminate the run in a controlled way and have the outer script proceed to the next mpirun call. What is the best way to achieve this? I do not have direct access to the node on which the calculation is actually performed, but I do of course have access to all of Slurm's commands and the working directories of the individual runs. I have access to the MPI program's full source code.
One solution that would work is the following: if one wants to terminate a run manually, one places a file with a special name like killme in the working directory, which could easily be done with touch killme. The MPI program would regularly check for the existence of this file and terminate in a controlled manner if it exists. The outer script or Slurm would not be involved at all here, and the script would just continue with the next mpirun call. What do you think of this solution? Can you think of anything better?
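For reference, the polling side of that idea is small. Here is a minimal sketch in plain C++; stop_requested, run_until_stopped and the polling interval are hypothetical names chosen for the example, and the MPI part is deliberately left out. In a real MPI program one would presumably let a single rank do the file check and share the decision with the others (for example with MPI_Bcast), so that all ranks leave the loop in the same iteration.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: true if the sentinel file (e.g. "killme") exists and
// can be opened for reading.
bool stop_requested(const std::string& sentinel_path) {
    std::ifstream probe(sentinel_path.c_str());
    return probe.good();
}

// Hypothetical driver: runs up to max_iters steps and polls the sentinel
// every check_every steps (check_every must be > 0).
// Returns the number of steps that actually ran.
template <class Step>
int run_until_stopped(int max_iters, int check_every,
                      const std::string& sentinel_path, Step step) {
    int done = 0;
    for (; done < max_iters; ++done) {
        if (done % check_every == 0 && stop_requested(sentinel_path)) {
            break;  // leave the loop cleanly; normal shutdown code follows
        }
        step(done);
    }
    return done;
}
```

Polling only every few iterations keeps the file system traffic negligible compared to the cost of an optimization step.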
Here is a short code snippet for getting SIGUSR1 as a signal.
More detailed explanation can be found here.
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>
void sighandler(int signum, siginfo_t *info, void *ptr) {
fprintf(stderr, "Received signal %d\n", signum);
fprintf(stderr, "Signal originates from process %lu\n",
(unsigned long) info->si_pid);
fprintf(stderr, "Shutting down properly.\n");
exit(0);
}
int main(int argc, char** argv) {
struct sigaction act;
printf("pid %lu\n", (unsigned long) getpid());
memset(&act, 0, sizeof(act));
act.sa_sigaction = sighandler;
act.sa_flags = SA_SIGINFO;
sigaction(SIGUSR1, &act, NULL);
while (1) {
};
return 0;
}
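One caveat about the snippet above: fprintf and exit are not async-signal-safe, so calling them from inside a handler is not strictly allowed. A more conservative variant, sketched below, only sets a flag in the handler and lets the main loop notice it between iterations. The function names are hypothetical, SIGUSR1 assumes a POSIX system, and how the signal actually reaches each MPI rank depends on the launcher.

```cpp
#include <csignal>

namespace {
// Written by the handler, read by the main loop.
volatile std::sig_atomic_t g_stop_requested = 0;
}

// The handler does nothing except record that the signal arrived.
extern "C" void on_stop_signal(int) {
    g_stop_requested = 1;
}

// Hypothetical helpers for the main program.
void install_stop_handler() {
    std::signal(SIGUSR1, on_stop_signal);
}

bool stop_signal_received() {
    return g_stop_requested != 0;
}
```

The iteration loop would call install_stop_handler() once at startup and test stop_signal_received() at the end of each iteration, then write its final output and return normally.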
I am developing a Windows 64-bit application that will manage concurrent execution of different CUDA-algorithms on several GPUs.
My design requires a way of passing pointers to device memory
around c++ code. (E.g. remember them as members in my c++ objects).
I know that it is impossible to declare class members with __device__ qualifiers.
However I couldn't find a definite answer whether assigning __device__ pointer to a normal C pointer and then using the latter works. In other words: Is the following code valid?
__device__ float *ptr;
cudaMalloc(&ptr, size);
float *ptr2 = ptr;
some_kernel<<<1,1>>>(ptr2);
For me it compiled and behaved correctly but I would like to know whether it is guaranteed to be correct.
No, that code isn't strictly valid. While it might work on the host side (more or less by accident), if you tried to dereference ptr directly from device code, you would find it would have an invalid value.
The correct way to do what your code implies would be like this:
__device__ float *ptr;
__global__ void some_kernel()
{
float val = ptr[threadIdx.x];
....
}
float *ptr2;
cudaMalloc(&ptr2, size);
// copy the pointer value itself, so pass the host address of ptr2
cudaMemcpyToSymbol("ptr", &ptr2, sizeof(float *));
some_kernel<<<1,1>>>();
For CUDA 4.x or newer, change the cudaMemcpyToSymbol call to:
cudaMemcpyToSymbol(ptr, &ptr2, sizeof(float *));
If the static device symbol ptr is really superfluous, you can just do something like this:
float *ptr2;
cudaMalloc(&ptr2, size);
some_kernel<<<1,1>>>(ptr2);
But I suspect that what you are probably looking for is something like the thrust library device_ptr class, which is a nice abstraction wrapping the naked device pointer and makes it absolutely clear in code what is in device memory and what is in host memory.