Is there an easy way to find the source of "Attempting to use an MPI routine after finalizing MPI"? It is certainly being triggered by a logging message, possibly in a destructor, but I can't find the call.
Suppose we have the following situation:
launch_kernel_a<<<n_blocks, n_threads>>>(...);
launch_kernel_b<<<n_blocks, n_threads>>>(...);
cudaDeviceSynchronize();
if (cudaGetLastError() != cudaSuccess)
{
// Handle error
...
}
My understanding is that in the above, execution errors occurring during the asynchronous execution of either kernel may be returned by cudaGetLastError(). In that case, how do I figure out which kernel caused the error to occur during runtime?
My understanding is that in the above, execution errors occurring during the asynchronous execution of either kernel may be returned by cudaGetLastError().
That is correct. The runtime API will return the last error which was encountered. It isn't possible to know from which call in a sequence of asynchronous API calls an error was generated.
In that case, how do I figure out which kernel caused the error to occur during runtime?
You can't. You would need some kind of additional API call between the two kernel launches to determine the error. The crudest would be a cudaDeviceSynchronize() call, although that would serialize the operations if they actually did overlap (I see no stream usage, though, so that is probably not happening here).
As noted in the comments, most kernel runtime errors result in context destruction, so if the first kernel caused an error, the second kernel will abort or refuse to run anyway, and that is probably fatal to your whole application.
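In real CUDA code, the technique described above amounts to calling cudaDeviceSynchronize() and then checking cudaGetLastError() after each launch. Because that needs nvcc and a GPU, the following is only a toy C++ model of the idea, not the CUDA runtime: ToyDevice, Err, NamedKernel and firstFailingKernel are hypothetical names. It shows why a single check at the end only tells you that something failed, while synchronizing after every launch tells you which kernel failed.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for cudaError_t (hypothetical; NOT the CUDA runtime type).
enum class Err { Success, LaunchFailure };

// Toy model of an asynchronous device: launch() only enqueues work, and
// synchronize() runs everything queued and reports the last error seen,
// similar to checking cudaGetLastError() once after cudaDeviceSynchronize().
class ToyDevice {
public:
    void launch(std::function<Err()> kernel) { pending_.push_back(std::move(kernel)); }

    Err synchronize() {
        Err last = Err::Success;
        for (auto& kernel : pending_) {
            Err e = kernel();
            if (e != Err::Success) last = e;  // which kernel failed is not recorded
        }
        pending_.clear();
        return last;
    }

private:
    std::vector<std::function<Err()>> pending_;
};

using NamedKernel = std::pair<std::string, std::function<Err()>>;

// Debug-style checking: synchronize after every launch so that a failure can be
// attributed to the kernel that was just launched. Returns the name of the first
// failing kernel, or an empty string if all of them succeed.
std::string firstFailingKernel(ToyDevice& dev, const std::vector<NamedKernel>& kernels) {
    for (const auto& entry : kernels) {
        dev.launch(entry.second);
        if (dev.synchronize() != Err::Success) return entry.first;
    }
    return "";
}
```

The extra synchronization is exactly what removes any overlap between the kernels, so this kind of checking is often enabled only while debugging.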
Does anyone understand what this RocksDB error refers to?
/column_family.cc:275: rocksdb::ColumnFamilyData::~ColumnFamilyData():
Assertion `refs_ == 0' failed. Aborted (core dumped)
This is an assertion failure raised by RocksDB, and it intentionally terminates the execution of the program.
In general, assertions are used by programmers to ensure certain invariants in the program. Assertions have some runtime overhead, and therefore can be completely disabled. Often they are compiled into development or debug builds, but are omitted for production builds.
When an assertion fails, program execution is intentionally aborted immediately by calling std::abort. This may cause your OS to write a core dump (as it evidently did here, judging by the message above), but whether and where core dumps are written depends on the OS configuration.
In the case of this specific assertion, the destructor of rocksdb::ColumnFamilyData raised the assertion because it requires its refs_ member to have a value of 0. refs_ is a reference counter, and it makes sense to assert that no references are still held when the object's destructor is called.
From just looking at the destructor code, it is unclear whether this is a bug in the RocksDB library itself, or an error caused by using it the wrong way, e.g. destroying column family objects when they are still in use by other objects.
For reference, here is the part of the code that raised the assertion (currently on line 365 of rocksdb/db/column_family.cc):
ColumnFamilyData::~ColumnFamilyData() {
  assert(refs_.load(std::memory_order_relaxed) == 0);
  // ... (remainder of the destructor omitted)
}
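To make the invariant concrete, here is a minimal sketch of a reference-counted object that checks its counter in the destructor. It is not RocksDB's implementation; the class RefCounted and its methods are hypothetical names chosen for illustration.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical, simplified analogue of a reference-counted object such as
// rocksdb::ColumnFamilyData (this is NOT RocksDB's actual class).
class RefCounted {
public:
    // Taking a reference only needs atomicity, not ordering.
    void Ref() { refs_.fetch_add(1, std::memory_order_relaxed); }

    // Drops a reference; returns true if it was the last one, in which case
    // the caller would be responsible for destroying the object.
    bool Unref() {
        int old = refs_.fetch_sub(1, std::memory_order_acq_rel);
        assert(old > 0 && "Unref() without a matching Ref()");
        return old == 1;
    }

    int refs() const { return refs_.load(std::memory_order_relaxed); }

    ~RefCounted() {
        // Same invariant as the RocksDB destructor: no references may remain.
        // In a debug build, destroying the object while refs_ != 0 aborts here.
        assert(refs_.load(std::memory_order_relaxed) == 0);
    }

private:
    std::atomic<int> refs_{0};
};
```

Destroying such an object while a Ref() is still outstanding would trip the destructor's assert in a debug build, which is the situation the message above reports for ColumnFamilyData.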
If the error persists, it would help to post the code that uses RocksDB here; otherwise it may be impossible to find the source of the error.
The core dump may also provide useful information, because it contains the stack trace of the code that actually invoked the object's destructor.
I noticed that all the column_family.cc errors (core dumped, memory_order_relaxed, etc.) occurred after an incorrect RocksDB installation. In my Vagrant script I found a way that works.
Instead of following
https://github.com/facebook/rocksdb/blob/master/INSTALL.md
I created this script:
cd /opt
git clone https://github.com/facebook/rocksdb.git
cd rocksdb
git checkout tags/v4.1
PORTABLE=1 make shared_lib
export LD_LIBRARY_PATH=/opt/rocksdb
It is better to set LD_LIBRARY_PATH permanently in your environment (e.g. in ~/.bashrc or /etc/environment).
A failure of the assertion refs_ == 0 in ~ColumnFamilyData() means that the reference count of a column family is not zero when the column family is deleted. Most likely some column family handles were not deleted before the DB was closed. Note that all column family handles must be deleted before closing the DB; otherwise the assertion will fail.
// Before delete DB, you have to close All column families by calling
// DestroyColumnFamilyHandle() with all the handles.
static Status Open(const DBOptions& db_options, const std::string& name,
const std::vector<ColumnFamilyDescriptor>& column_families,
std::vector<ColumnFamilyHandle*>* handles, DB** dbptr);
To fix this assertion failure, make sure you delete all column family handles before closing the DB.
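With the real API, this means calling db->DestroyColumnFamilyHandle() on every handle returned by DB::Open (as the header comment above says) and only then deleting the DB. The following toy C++ model illustrates why the order matters; ToyDB, ToyHandle and OpenHandlesWhenClosing are hypothetical names, not RocksDB types, and the counter stands in for the column family reference count.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical toy types, NOT RocksDB classes. ToyDB tracks how many column
// family handles are still open and asserts on destruction that none remain,
// mirroring the refs_ == 0 check that fails in ~ColumnFamilyData().
class ToyDB {
public:
    ToyDB() = default;
    ToyDB(const ToyDB&) = delete;
    ToyDB& operator=(const ToyDB&) = delete;
    ~ToyDB() { assert(open_handles_ == 0 && "destroy all handles before the DB"); }
    int open_handles() const { return open_handles_; }

private:
    friend class ToyHandle;
    int open_handles_ = 0;
};

// Each handle holds a "reference" to the DB for as long as it exists.
class ToyHandle {
public:
    explicit ToyHandle(ToyDB& db) : db_(db) { ++db_.open_handles_; }
    ~ToyHandle() { --db_.open_handles_; }
    ToyHandle(const ToyHandle&) = delete;
    ToyHandle& operator=(const ToyHandle&) = delete;

private:
    ToyDB& db_;
};

// Correct shutdown order: release every handle first, then destroy the DB.
// Returns how many handles were still open at the moment the DB was closed.
int OpenHandlesWhenClosing() {
    auto db = std::make_unique<ToyDB>();
    std::vector<std::unique_ptr<ToyHandle>> handles;
    handles.push_back(std::make_unique<ToyHandle>(*db));
    handles.push_back(std::make_unique<ToyHandle>(*db));

    handles.clear();                    // step 1: like DestroyColumnFamilyHandle() on each handle
    int remaining = db->open_handles();
    db.reset();                         // step 2: like `delete db`
    return remaining;
}
```

If the two steps were swapped, ~ToyDB() would run while handles were still open and its assert would fire, analogous to the RocksDB failure. Since locals are destroyed in reverse order of declaration, declaring the DB owner before the handles would also produce the correct order automatically.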
In Go, a call to the net.Listener type's Accept method returns an error. However, is there a way to tell the difference between a transient error (i.e., this connection failed to set up) and a permanent error (i.e., the listener is dead, such as a Unix domain socket file that was forcibly removed)? If I can't tell the difference, I run the risk of looping infinitely and spitting out errors as fast as I can, since each Accept call will immediately return an error.
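Figured it out. Errors returned by the net package may be of the net.Error type, which defines the Temporary() bool method that reports whether the error is temporary. (Note that Temporary() has been deprecated since Go 1.18, because whether an error is "temporary" is not well defined.)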
I am trying to make a multithreaded Qt application that uses QGLWidgets, and I keep getting this error. (I am trying to paint from another thread using QPainter.)
And it also looks like I have a huge memory leak because of it.
The error is "QGLContext::makeCurrent() : wglMakeCurrent failed: The operation completed successfully"
I believe this is related to a rather old issue discussed on the Qt mailing list. In short, if the thread calling makeCurrent() is not the thread in which the device context was retrieved, GetDC() is called. As outlined in that thread, the problem is that no matching ReleaseDC() call is made, resulting in a handle leak that eventually makes Windows return NULL from GetDC(), which in turn makes wglMakeCurrent() fail. I don't know, however, why GetLastError() reports "The operation completed successfully" in this case.
For a user application (not a driver) using WinUSB, I use WinUsb_ControlTransfer in combination with overlapped I/O to send a control message asynchronously. Is it possible to cancel the asynchronous operation? WinUsb_AbortPipe works for all other endpoints but gives an 'invalid parameter' error when the control endpoint is passed (0x00 or 0x80 as the pipe address). I also tried CancelIo and CancelIoEx, but both give an 'invalid handle' error on the WinUSB handle. The only related information I could find is at http://www.winvistatips.com/winusb-bugchecks-t335323.html, but it offers no solution. Is this just impossible?
Probably not useful to the original asker any more, but in case anyone else comes across this: you can use CancelIo() or CancelIoEx() with the file handle that you originally passed in to WinUsb_Initialize().
This is consistent with the documentation of WinUsb_GetOverlappedResult, which says:
This function is like the Win32 API routine, GetOverlappedResult, with one difference—instead of passing a file handle that is returned from CreateFile, the caller passes an interface handle that is returned from WinUsb_Initialize. The caller can use either API routine, if the appropriate handle is passed. The WinUsb_GetOverlappedResult function extracts the file handle from the interface handle and then calls GetOverlappedResult.