OpenCL User Events / Can't Release Objects?

NOTE: Enqueued commands that specify user events in the
event_wait_list argument of clEnqueue*** commands must ensure that
the status of these user events being waited on are set using
clSetUserEventStatus before any OpenCL APIs that release OpenCL
objects except for event objects are called; otherwise the behavior
is undefined.
So if a user event is being waited on in a queue, I can't call release on any OpenCL object (other than events) until its status is set?
This seems like a strange requirement. What is its purpose?
The example they give is:
ev1 = clCreateUserEvent(ctx, NULL);
clEnqueueWriteBuffer(cq, buf1, CL_FALSE, ..., 1, &ev1, NULL);
clEnqueueWriteBuffer(cq, buf2, CL_FALSE,...);
clReleaseMemObject(buf2); // <--- UNDEFINED
clSetUserEventStatus(ev1, CL_COMPLETE);
Which causes undefined behaviour?

Consider the example that they give.
We have an in-order queue and we create a user event:
ev1 = clCreateUserEvent(ctx, NULL); // (1)
We then want to enqueue a write to a buffer, but we want it to wait for our event:
clEnqueueWriteBuffer(cq, buf1, CL_FALSE, ..., 1, &ev1, NULL); // (2)
We then enqueue a second write, to another buffer. Because the queue is in-order, it sits behind the previous write, which is still waiting on our event:
clEnqueueWriteBuffer(cq, buf2, CL_FALSE,...); // (3)
We then release buf2, the buffer used by the second clEnqueueWriteBuffer, even though that write has not executed yet because the queue is still waiting on the user event. So (4) takes effect before (3), and we cannot know what will happen once the memory object is freed.
clReleaseMemObject(buf2); // <--- UNDEFINED // (4)
Finally, we complete our user event, which lets (2) and (3) run only after (4) has already completed.
clSetUserEventStatus(ev1, CL_COMPLETE); // (5)
Basically, this ordering causes problems because clReleaseMemObject is a host-side call that is not inserted into the command queue (cl_command_queue), so it can break the dependencies we expect.
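The ordering argument can be replayed with a toy model. This is only a sketch in standard C++, not the OpenCL API: ToyQueue, complete_user_event, and replay_problem_order are hypothetical names, and the "release" is just a log entry standing in for the host call at step (4). It assumes an in-order queue whose commands are all held back by a single user event.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy model of an in-order queue gated by one user event.
// NOT the OpenCL API; every name here is hypothetical.
struct ToyQueue {
    bool gate_open = false;                     // models the user event status
    std::deque<std::function<void()>> pending;  // commands held behind the gate

    // Models clEnqueue*: the command is stored and runs only once the gate is open.
    void enqueue(std::function<void()> cmd) {
        pending.push_back(std::move(cmd));
        drain();
    }

    // Models clSetUserEventStatus(ev, CL_COMPLETE).
    void complete_user_event() {
        gate_open = true;
        drain();
    }

private:
    // Runs queued commands in order, but only while the gate is open.
    void drain() {
        while (gate_open && !pending.empty()) {
            std::function<void()> cmd = std::move(pending.front());
            pending.pop_front();
            cmd();
        }
    }
};

// Replays steps (2) to (5) and records the order in which things actually happen.
std::vector<std::string> replay_problem_order() {
    std::vector<std::string> log;
    ToyQueue q;
    q.enqueue([&log] { log.push_back("write buf1"); });  // (2) waits on the event
    q.enqueue([&log] { log.push_back("write buf2"); });  // (3) queued behind (2)
    log.push_back("release buf2");                       // (4) host call, immediate
    q.complete_user_event();                             // (5) now (2) and (3) run
    return log;
}
```

In this model the release is recorded before either write runs, which mirrors the answer's point that (4) happens before (3). Moving the release after the event has been completed (and the writes have finished) restores the expected order.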

Related

When to use MPI_BUFFER_ATTACH?

As far as I know, MPI_BUFFER_ATTACH must be called by a process if it is going to do buffered communication. But does this include the standard MPI_SEND as well? We know that MPI_SEND may behave either as a synchronous send or as a buffered send.
You need to call MPI_Buffer_attach() only if you plan to perform (explicitly) buffered sends via MPI_Bsend().
If you only plan to MPI_Send() or MPI_Isend(), then you do not need to invoke MPI_Buffer_attach().
FWIW, buffered sends are error prone and I strongly encourage you not to use them.
MPI_Buffer_attach
Attaches a user-provided buffer for sending
Synopsis
int MPI_Buffer_attach(void *buffer, int size)
Input Parameters
buffer
initial buffer address (choice)
size
buffer size, in bytes (integer)
Notes
The size given should be the sum of the sizes of all outstanding
Bsends that you intend to have, plus MPI_BSEND_OVERHEAD for each Bsend
that you do. For the purposes of calculating size, you should use
MPI_Pack_size. In other words, in the code
MPI_Buffer_attach( buffer, size );
MPI_Bsend( ..., count=20, datatype=type1, ... );
...
MPI_Bsend( ..., count=40, datatype=type2, ... );
the value of size in the MPI_Buffer_attach call should be greater than the value computed by
MPI_Pack_size( 20, type1, comm, &s1 );
MPI_Pack_size( 40, type2, comm, &s2 );
size = s1 + s2 + 2 * MPI_BSEND_OVERHEAD;
MPI_BSEND_OVERHEAD gives the maximum amount of buffer space that the Bsend routines may use for their own purposes, per send. This value is defined in mpi.h (for C) and mpif.h (for Fortran).
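The size rule above is plain arithmetic, so it can be sketched without MPI. The helper below is a hypothetical illustration in standard C++: packed_sizes stands in for the values MPI_Pack_size would report, and overhead_per_send stands in for MPI_BSEND_OVERHEAD, whose real value depends on the MPI implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Lower bound for the buffer passed to MPI_Buffer_attach:
// the packed size of every outstanding Bsend plus one overhead per Bsend.
// Hypothetical helper; it does not call MPI.
std::size_t bsend_buffer_size(const std::vector<std::size_t>& packed_sizes,
                              std::size_t overhead_per_send) {
    std::size_t total = 0;
    for (std::size_t s : packed_sizes) {
        total += s + overhead_per_send;  // payload plus per-send overhead
    }
    return total;
}
```

With made-up packed sizes of 80 and 320 bytes and a made-up overhead of 96 bytes, the helper returns 80 + 320 + 2 * 96 = 592. In a real program the inputs would come from MPI_Pack_size and MPI_BSEND_OVERHEAD.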
Thread and Interrupt Safety
The user is responsible for ensuring that multiple threads do not try to update the same MPI object from different threads. This routine should not be used from within a signal handler.
The MPI standard defined a thread-safe interface but this does not mean that all routines may be called without any thread locks. For example, two threads must not attempt to change the contents of the same MPI_Info object concurrently. The user is responsible in this case for using some mechanism, such as thread locks, to ensure that only one thread at a time makes use of this routine. Because the buffer for buffered sends (e.g., MPI_Bsend) is shared by all threads in a process, the user is responsible for ensuring that only one thread at a time calls this routine or MPI_Buffer_detach.
Notes for Fortran
All MPI routines in Fortran (except for MPI_WTIME and MPI_WTICK) have an additional argument ierr at the end of the argument list. ierr is an integer and has the same meaning as the return value of the routine in C. In Fortran, MPI routines are subroutines, and are invoked with the call statement.
All MPI objects (e.g., MPI_Datatype, MPI_Comm) are of type INTEGER in Fortran.
Errors
All MPI routines (except MPI_Wtime and MPI_Wtick) return an error value; C routines as the value of the function and Fortran routines in the last argument. Before the value is returned, the current MPI error handler is called. By default, this error handler aborts the MPI job. The error handler may be changed with MPI_Comm_set_errhandler (for communicators), MPI_File_set_errhandler (for files), and MPI_Win_set_errhandler (for RMA windows). The MPI-1 routine MPI_Errhandler_set may be used but its use is deprecated. The predefined error handler MPI_ERRORS_RETURN may be used to cause error values to be returned. Note that MPI does not guarantee that an MPI program can continue past an error; however, MPI implementations will attempt to continue whenever possible.
MPI_SUCCESS
No error; MPI routine completed successfully.
MPI_ERR_BUFFER
Invalid buffer pointer. Usually a null buffer where one is not valid.
MPI_ERR_INTERN
An internal error has been detected. This is fatal. Please send a bug report to mpi-bugs@mcs.anl.gov.
See Also MPI_Buffer_detach, MPI_Bsend
Refer Here For More
Buffer allocation and usage
Programming with MPI
MPI - Bsend usage

Qt crash when drawing, even though in GUI thread

I think I'm having a problem similar to the one in "Qt crash when redrawing Widget", and switching to Qt::QueuedConnection fixed the problem. However, in my case, both the signal emitter and the receiver are always in the same thread (the main thread).
I have a QAbstractItemModel with entry rows, and a QSortFilterProxyModel for filtering. Since the model can be very large, I wanted to make a progress bar when filtering. Updating the filter basically does this in a slot that is connected to a QAction::toggled signal:
m_ProgressBar = new QProgressBar(); // Put into status bar etc.
auto connection = connect(m_filteredModel, SIGNAL(filterProgressChanged(int)), m_ProgressBar, SLOT(setValue(int)), Qt::QueuedConnection);
m_filteredModel->UpdateFilter();
delete m_ProgressBar;
disconnect(connection);
UpdateFilter basically does some housekeeping and then calls invalidate, making the filter model requery filterAcceptsRow for every row.
The filter model then emits the filterProgressChanged(int) signal within filterAcceptsRow (works by incrementing a counter and dividing by the source model's row count, and is only emitted when the actual int progress value changes).
UpdateFilter returns when the filtering is complete. The progress bar is not deleted until then (verified), so it should work in my opinion. Not deleting the progress bar leads to getting a new one every call, but the crash is still the same.
Everything is done in the main thread: Creating the progress bar, calling UpdateFilter, emitting the filterProgressChanged signal. However, when the connection is created as Qt::AutoConnection, i.e. direct, it crashes (only when disabling the filter, for some reason) within repainting the progress bar. The same happens when I call setValue directly in my own event handler, which is what I did prior to switching to the current code.
Now I have a solution that works, but I don't understand why the original code does not work. I thought that a DirectConnection would only make an actual difference when sender and receiver of the signal are in different threads, but they're not. You can easily see in the stack trace that everything happens within the same thread, and that is even true with the queued connection.
So, what's going wrong in the original code? Is there something I just missed? Is there any way to get more information out of the actual crash?
I only found out that in void QRasterPaintEngine::clip(const QRect &rect, Qt::ClipOperation op), state() returns 0, and the code assumes it never returns 0. That is the immediate cause of the crash, but likely not the root cause. The stack trace points to painting as the problem area; that's all I saw when debugging this.
I'm on Windows with Qt 5.4.2 (also tried 5.7), and with MSVC 2013, if any of that matters.
Edit: As requested by code_fodder, I added the UpdateFilter and emit code (actualFilterFunction performs the actual filtering, but has nothing to do with signals or GUI or anything).
void MyModel::UpdateFilter() {
    m_filterCounter = 0;
    m_lastReportedProgress = -1;
    invalidate();
}
bool MyModel::filterAcceptsRow(int sourceRow, const QModelIndex &sourceParent) const {
    m_filterCounter++;
    int progress = (100 * m_filterCounter) / m_sourceModel->rowCount();
    if (progress != m_lastReportedProgress) {
        emit filterProgressChanged(m_lastReportedProgress = progress);
    }
    return actualFilterFunction();
}

Resume execution at arbitrary positions inside a callback function

I am using Pin for dynamic analysis.
In my dynamic analysis task on 64-bit x86 binary code, I would like to resume the execution at arbitrary program positions (e.g., the second instruction of current executed function) after I fix certain memory access error inside the signal handling callbacks.
It would be something like this:
BOOL catchSignalSEGV(THREADID tid, INT32 sig, CONTEXT *ctx, BOOL hasHandler, const EXCEPTION_INFO *pExceptInfo, VOID *v)
{
    // I will first fix the memory access error according to certain rules.
    fix();
    // then I would like to resume the execution at an arbitrary position, say, at the beginning of current monitored function
    set_reg(rip, 0x123456); // set the rip register
    PIN_ExecuteAt(ctx); // resume the execution
    return false;
}
However, I got this exception: E: PIN_ExecuteAt() cannot be called from a callback.
I know I can resume the execution at the "current instruction" by returning false at the end of the signal handling function, but can I resume at arbitrary positions?
Am I clear? Thank you for your help!
The documentation is clear on this:
A tool can call this API to abandon the current analysis function and resume execution of the calling thread at a new application register state. Note that this API does not return back to the caller's analysis function.
This API can be called from an analysis function or a replacement routine, but not from a callback.
The signal handler is considered a callback. You can only use PIN_ExecuteAt in an analysis function or a replacement routine.
One thing you may try is to save the context you are interested in and allow the application to resume, ensuring that the next instruction to be executed has an analysis routine attached. You may be able to use if-then instrumentation to improve performance. Then you can call PIN_ExecuteAt from that analysis routine.

Mysterious EXC_BAD_ACCESS on dispatch_async *serial queue*

I have a location-based app that gets location every 1 second, and saves a batch of the locations at a time to the CoreData DB so as not to make the locations array too large. However, for some reason it crashes with EXC_BAD_ACCESS even though I use dispatch_async on a SERIAL QUEUE:
Created a global serial custom queue like this:
var MyCustomQueue : dispatch_queue_t =
dispatch_queue_create("uniqueID.for.custom.queue", DISPATCH_QUEUE_SERIAL);
In the didUpdateToLocation protocol function, I have this bit of code that saves the latest batch of CLLocation items to disk once the batch size reaches a preset number:
if(myLocations.count == MAX_LOCATIONS_BUFFER)
{
    dispatch_async(MyCustomQueue)
    {
        for myLocation in self.myLocations
        {
            // the next line is where it crashes with EXC_BAD_ACCESS
            newLocation = NSEntityDescription.insertNewObjectForEntityForName(locationEntityName, inManagedObjectContext: context as NSManagedObjectContext) ;
            newLocation.setValue(..)
            // all the code to set the attributes
        }
        appDelegate.saveContext();
        dispatch_async(dispatch_get_main_queue()) {
            // once the latest batch is saved, empty the in-memory array
            self.myLocations = [];
        }
    }
}
The mystery is, isn't a SERIAL QUEUE supposed to execute all the items on it in order, even though you use dispatch_async (which simply makes the execution concurrent with the MAIN thread, that doesn't touch the SQLite data)?
Notes:
The crash happens at random times...sometimes it'll crash after 0.5 miles, other times after 2 miles, etc.
If I just take out any of the dispatch_async code (just make everything executed in order on the main thread), there is no issue.
Your issue is probably that there is no managed object context (MOC) at the time you are trying to insert an item into it. Take a look at this related question:
Core Data : inserting Objects crashed in global queue [ARC - iPhone simulator 6.1]

Sample Grabber Sink release() issue

I use a Sample Grabber Sink in my media session, with most of the code taken from the MSDN sample.
In the OnProcessSample method I memcpy the data into a media buffer, attach it to an MFSample, and hand that sample to the main thread through a shared pointer. The problem is that I get either memory leaks or crashes in ntdll.dll:
ntdll.dll!@RtlpLowFragHeapFree@8() Unknown
SampleGrabberSink:
OnProcessSample(...)
{
    MFCreateMemoryBuffer(dwSampleSize,&tmpBuff);
    tmpBuff->Lock(&data,NULL,NULL);
    memcpy(data,pSampleBuffer,dwSampleSize);
    tmpBuff->Unlock();
    MFCreateSample(&tmpSample);
    tmpSample->AddBuffer(tmpBuff);
    while(!(*Free) && (*pSample)!=NULL)
    {
        Sleep(1);
    }
    (*Free)=false;
    (*pSample)=tmpSample;
    (*Free)=true;
    SafeRelease(&tmpBuff);
}
in main thread
ReadSample()
{
    if(pSample==NULL)
        return;
    while(!Free)
        Sleep(1);
    Free=false;
    //process sample into dx surface//
    SafeRelease(&pSample);
    Free=true;
}
//hr checks omitted//
With this code I get that ntdll.dll error after playing a few videos.
I also tried pushing samples into a queue so OnProcessSample doesn't have to wait, but then some memory was not freed after the video ended.
(Even now it practically doesn't wait: the session rate is 1 and the main process can read at more than 60 fps.)
EDIT: It was a thread synchronization problem. Solved by using a critical section, thanks to Roman R.
It is not easy to see from the code snippet, but I suppose you are burning cycles on a streaming thread (the one your callback is called on) until a global/shared variable is NULL, and then you store a copy of the media sample there.
You need to look at synchronization APIs and serialize access to shared variables. You don't do that, and eventually you either access freed memory or break the reference count of a COM object.
You need an event that is set externally when you are ready to accept a new buffer from the callback. The callback then sees the event, enters a critical section (or a reader/writer lock), does your *pSample magic there, leaves the critical section, and sets another event indicating that a buffer is available.
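The handoff described above might look roughly like the following sketch. It is written in portable standard C++ rather than Win32: std::mutex plays the role of the critical section and two std::condition_variable objects play the role of the two events. Sample, SampleSlot, and run_handoff are hypothetical names, and a plain struct stands in for the COM sample, so no reference counting is shown. Unlike the original ReadSample, the reader here blocks until a sample arrives.

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the media sample handed between threads.
struct Sample { int id; };

// Single-slot handoff: every access to the shared slot happens under one mutex.
class SampleSlot {
public:
    // Callback (streaming) thread: wait until the slot is empty, then store.
    void put(std::unique_ptr<Sample> s) {
        std::unique_lock<std::mutex> lock(m_);
        slot_free_.wait(lock, [this] { return slot_ == nullptr; });
        slot_ = std::move(s);
        slot_full_.notify_one();  // "a buffer is available"
    }

    // Reader thread: wait until a sample is present, then take ownership.
    std::unique_ptr<Sample> take() {
        std::unique_lock<std::mutex> lock(m_);
        slot_full_.wait(lock, [this] { return slot_ != nullptr; });
        std::unique_ptr<Sample> s = std::move(slot_);  // slot_ is now null
        slot_free_.notify_one();  // "ready to accept a new buffer"
        return s;
    }

private:
    std::mutex m_;
    std::condition_variable slot_free_;
    std::condition_variable slot_full_;
    std::unique_ptr<Sample> slot_;
};

// Pushes `count` samples through the slot and returns the ids the reader saw.
std::vector<int> run_handoff(int count) {
    SampleSlot slot;
    std::vector<int> seen;
    std::thread producer([&slot, count] {
        for (int i = 0; i < count; ++i) {
            slot.put(std::make_unique<Sample>(Sample{i}));
        }
    });
    for (int i = 0; i < count; ++i) {
        seen.push_back(slot.take()->id);
    }
    producer.join();
    return seen;
}
```

Because ownership moves through the slot under the lock, each sample is released exactly once by whoever took it, and neither thread spins in a Sleep loop on unsynchronized flags.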
