Does anyone understand what this RocksDB error refers to?
/column_family.cc:275: rocksdb::ColumnFamilyData::~ColumnFamilyData():
Assertion `refs_ == 0' failed. Aborted (core dumped)
This is an assertion failure raised by RocksDB, and it intentionally terminates the execution of the program.
In general, assertions are used by programmers to ensure certain invariants in the program. Assertions have some runtime overhead, and therefore can be completely disabled. Often they are compiled into development or debug builds, but are omitted for production builds.
When an assertion fails, program execution is intentionally aborted immediately by calling std::abort. This may lead to your OS writing a core dump (as it evidently did here, judging by the message above), but whether and where core dumps are written depends on the OS configuration.
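As a minimal, self-contained illustration of both points (this is not RocksDB code): the following aborts at the assert in a normal build, but the check disappears entirely when compiled with -DNDEBUG.
#include <cassert>
#include <cstdio>

int main()
{
    int refs = 1;  // pretend something still holds a reference

    // Active in debug builds: fails and calls std::abort(), which typically
    // produces a core dump. Compiled out entirely when NDEBUG is defined.
    assert(refs == 0);

    std::puts("never reached unless built with -DNDEBUG");
    return 0;
}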
In case of this specific assertion, the destructor of rocksdb::ColumnFamilyData raised the assertion because it requires its refs_ member to have a value of 0. refs_ is a reference counter and it makes sense to assert that no references are actually held when the object's destructor is called.
From just looking at the destructor code, it is unclear whether this is a bug in the RocksDB library itself, or an error caused by using it the wrong way, e.g. destroying column family objects when they are still in use by other objects.
For reference, here's the code part that raised the assertion (currently on line 365 in file rocksdb/db/column_family.cc):
ColumnFamilyData::~ColumnFamilyData() {
assert(refs_.load(std::memory_order_relaxed) == 0);
If the error persists, it would be useful to post the code that uses RocksDB here; otherwise it may be impossible to track down the source of the error.
The core dump may also provide useful information, because it contains the stack trace of the code that actually invoked the object's destructor.
I noticed that these column_family.cc errors (the core dump, the memory_order_relaxed assertion, etc.) occurred after an incorrect RocksDB installation. I found a working approach for my Vagrant script.
Instead of following
https://github.com/facebook/rocksdb/blob/master/INSTALL.md
I built it with this script:
cd /opt
git clone https://github.com/facebook/rocksdb.git
cd rocksdb
git checkout tags/v4.1
PORTABLE=1 make shared_lib
export LD_LIBRARY_PATH=/opt/rocksdb
It's better to add LD_LIBRARY_PATH permanently to your environment (e.g., in .bashrc or /etc/environment).
The assertion refs_ == 0 failing in ~ColumnFamilyData() means the reference count of a column family is not zero when the column family is deleted. Most likely you still have undeleted column family handles when closing the DB. Note that all column family handles must be deleted before closing the DB; otherwise the assertion will fail.
// Before delete DB, you have to close All column families by calling
// DestroyColumnFamilyHandle() with all the handles.
static Status Open(const DBOptions& db_options, const std::string& name,
const std::vector<ColumnFamilyDescriptor>& column_families,
std::vector<ColumnFamilyHandle*>* handles, DB** dbptr);
To fix this assertion failure, make sure you delete all column family handles before closing the DB.
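To make the required ordering concrete, here is a minimal sketch (the path and the "metrics" column family name are made up for illustration): every handle that Open() filled in is destroyed with DestroyColumnFamilyHandle() before the DB object itself is deleted.
#include <cassert>
#include <string>
#include <vector>

#include "rocksdb/db.h"

int main() {
  rocksdb::DBOptions db_options;
  db_options.create_if_missing = true;
  db_options.create_missing_column_families = true;

  // "default" must always be listed; "metrics" is just an example name.
  std::vector<rocksdb::ColumnFamilyDescriptor> descriptors = {
      {rocksdb::kDefaultColumnFamilyName, rocksdb::ColumnFamilyOptions()},
      {"metrics", rocksdb::ColumnFamilyOptions()}};

  std::vector<rocksdb::ColumnFamilyHandle*> handles;
  rocksdb::DB* db = nullptr;
  rocksdb::Status s =
      rocksdb::DB::Open(db_options, "/tmp/example_db", descriptors, &handles, &db);
  assert(s.ok());

  // ... use the column families via the handles ...

  // Destroy every handle first; deleting the DB while handles are still
  // alive is what trips the refs_ == 0 assertion.
  for (rocksdb::ColumnFamilyHandle* h : handles) {
    db->DestroyColumnFamilyHandle(h);
  }
  delete db;
  return 0;
}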
Suppose we have the following situation:
launch_kernel_a<<<n_blocks, n_threads>>>(...);
launch_kernel_b<<<n_blocks, n_threads>>>(...);
cudaDeviceSynchronize();
if (cudaGetLastError() != cudaSuccess)
{
// Handle error
...
}
My understanding is that in the above, execution errors occurring during the asynchronous execution of either kernel may be returned by cudaGetLastError(). In that case, how do I figure out which kernel caused the error to occur during runtime?
My understanding is that in the above, execution errors occurring during the asynchronous execution of either kernel may be returned by cudaGetLastError().
That is correct. The runtime API will return the last error which was encountered. It isn't possible to know from which call in a sequence of asynchronous API calls an error was generated.
In that case, how do I figure out which kernel caused the error to occur during runtime?
You can't. You would need some kind of additional API call between the two kernel launches to determine the error. The crudest would be a cudaDeviceSynchronize() call, although that would serialize the operations if they actually did overlap (with no stream usage in sight, that is probably not happening here anyway).
As noted in comments -- most kernel runtime errors will result in context destruction, so if you got an error from the first kernel, the second kernel will abort or refuse to run anyway and that is probably fatal to your whole application.
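For illustration, here is a rough sketch of that per-launch checking (the kernels and the launch_ok() helper are made-up stand-ins): calling cudaGetLastError() right after a launch catches launch errors, and synchronizing immediately afterwards surfaces that kernel's execution errors before the next one is launched, at the cost of serializing the two.
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernels standing in for launch_kernel_a / launch_kernel_b.
__global__ void kernel_a(int *out) { out[threadIdx.x] = 1; }
__global__ void kernel_b(int *out) { out[threadIdx.x] = 2; }

// Check the launch itself, then synchronize so that any execution error
// can still be attributed to the kernel that just ran.
static bool launch_ok(const char *label)
{
    cudaError_t err = cudaGetLastError();       // launch-configuration errors
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();          // execution errors of this kernel
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", label, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main()
{
    int *buf = nullptr;
    cudaMalloc((void **)&buf, 256 * sizeof(int));

    kernel_a<<<1, 256>>>(buf);
    if (!launch_ok("kernel_a")) return 1;   // error belongs to kernel_a

    kernel_b<<<1, 256>>>(buf);
    if (!launch_ok("kernel_b")) return 1;   // error belongs to kernel_b

    cudaFree(buf);
    return 0;
}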
We do explicit soft-locking with
serviceHub.vaultService.softLockReserve(txBuilder.lockId, NonEmptySet.of(balanceStateS2R.ref))
Things were working until today when we got this exception repeatedly.
And now we cannot run the flow anymore. There was no double-spend going on. What is causing it and how can we get out of it?
E 17:01:57-0500 [Node thread] vault.NodeVaultService.softLockReserve - soft lock update error attempting to reserve states for acace05e-b0fb-4d4e-9b96-a7d0d4728f68 and [392D84F9CF931F17438399D36607CAFDB549C02A5E7B63E8F8D2B2FADE1AFF57(1)]
Soft locking error: Attempted to reserve [392D84F9CF931F17438399D36607CAFDB549C02A5E7B63E8F8D2B2FADE1AFF57(1)] for acace05e-b0fb-4d4e-9b96-a7d0d4728f68 but only 0 rows available.
Without more context (e.g. what actions the flow is executing), I cannot provide an accurate answer, but some points to raise:
The error message indicates that the previously locked state has been consumed in some way (perhaps by the flow that originally locked it) and was not explicitly released after consumption
Use the softLockRelease() API call to explicitly release states that were previously soft-locked
We have a new beta version of our software with some changes, but not around our database layer.
We've just started getting Error 3128 reported in our server logs. It seems that once it happens, it happens for as long as the app is open. The part of the code where it is most apparent is where we log data every second via SQLite. We've generated 47k errors on our server this month alone.
3128 Disk I/O error occurred. Indicates that an operation could not be completed because of a disk I/O error. This can happen if the runtime is attempting to delete a temporary file and another program (such as a virus protection application) is holding a lock on the file. This can also happen if the runtime is attempting to write data to a file and the data can't be written.
I don't know what could be causing this error. Maybe an anti-virus program? Maybe our app is getting confused and writing data on top of each other? We're using async connections.
It's causing lots of issues and we're at a loss. It has happened in our older version, but maybe 100 times in a month rather than 47,000 times. Either way I'd like to make it happen "0" times.
Possible solution: Exception Message: Some kind of disk I/O error occurred
Summary: There is probably not a problem with the database but a problem creating (or deleting) the temporary file once the database is opened. AIR may have permissions to the database, but not to create or delete files in the directory.
One answer that has worked for me is to use the PRAGMA statement to set the journal_mode value to something other than DELETE. You do this by issuing a PRAGMA statement in the same way you would issue a query statement.
PRAGMA journal_mode = OFF
Unfortunately, if the application crashes in the middle of a transaction when the OFF journaling mode is set, then the database file will very likely go corrupt. [1]
[1] http://www.sqlite.org/pragma.html#pragma_journal_mode
The solution was to make sure database deletes, updates, and inserts only happened one at a time by putting a small wrapper around them. On top of that, we had to watch for error 3128 and retry; I think this is because we have a trigger that could lock the database after we inserted data. A sketch of that idea follows below.
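The wrapper itself isn't shown in the post, but as a rough sketch of the same idea, serializing writes and retrying on transient errors, here is what it could look like against the plain SQLite C API (the original app uses AIR's async SQLConnection instead, and the retry count and delay below are arbitrary):
#include <sqlite3.h>
#include <chrono>
#include <mutex>
#include <thread>

static std::mutex g_write_mutex;  // only one write statement at a time

bool exec_with_retry(sqlite3 *db, const char *sql, int max_attempts = 5)
{
    std::lock_guard<std::mutex> lock(g_write_mutex);
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        int rc = sqlite3_exec(db, sql, nullptr, nullptr, nullptr);
        if (rc == SQLITE_OK)
            return true;
        // Transient errors (database locked or a disk I/O hiccup):
        // wait a moment and try again instead of failing immediately.
        if (rc == SQLITE_BUSY || rc == SQLITE_LOCKED || rc == SQLITE_IOERR) {
            std::this_thread::sleep_for(std::chrono::milliseconds(50));
            continue;
        }
        return false;  // non-retryable error
    }
    return false;
}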
I am trying to make a multi-threaded Qt application that uses QGLWidgets and I keep getting this error. (I am trying to paint from another thread using QPainter.)
And it also looks like I have a huge memory leak because of it.
The error is "QGLContext::makeCurrent() : wglMakeCurrent failed: The operation completed successfully"
I believe this is related to a rather old issue from the Qt mailing list as described here. In short, if the thread calling makeCurrent() does not equal the thread where the device context was retrieved, GetDC() is called. As outlined in the linked thread, the problem is that ReleaseDC() is not called accordingly, resulting in a handle leak, and triggering Windows to return NULL in the call to GetDC() at some point, which makes wglMakeCurrent() fail. I don't know, however, why GetLastError() claims "The operation completed successfully" in this case.
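As a bare Win32 illustration of the leak described above (this is not Qt code): device contexts obtained with GetDC() count against the process's GDI handle quota, so leaking them without a matching ReleaseDC() eventually makes GetDC() return NULL, which is what later makes wglMakeCurrent() fail.
#include <windows.h>
#include <cstdio>

// Illustrative only: deliberately leak window DCs until GetDC() gives up.
int main()
{
    HWND hwnd = GetConsoleWindow();  // any valid window handle works here
    for (int leaked = 0; ; ++leaked) {
        HDC dc = GetDC(hwnd);
        if (dc == NULL) {
            std::printf("GetDC() returned NULL after %d leaked DCs\n", leaked);
            break;
        }
        // The matching ReleaseDC(hwnd, dc) is intentionally omitted --
        // this is the same pattern the described Qt issue produces over time.
    }
    return 0;
}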