How can I do other operations on the CPU while kernels are running on the GPU? - opencl

The kernel needs a few minutes to finish.
I want to display a moving bar in the console while the kernel in GPU is processing.
Normally, the function clEnqueueNDRangeKernel executes the kernels, and when they are finished, the CPU continues to execute the following operations like clWaitForEvents, clReleaseMemObject, etc.
However, I want the CPU to print a progress bar continuously after clEnqueueNDRangeKernel but before the kernels finish.
Is there any way to do that?

Create one thread that handles the GPU queue and one thread that handles the console output.
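You can share information between the two threads through global variables. Make them std::atomic rather than volatile: volatile does not make concurrent access from two threads safe, while std::atomic does.
#include <atomic>
#include <chrono>
#include <iostream>
#include <thread>
using namespace std;
atomic<bool> not_finished(true);
atomic<float> progress(0.0f);
void do_console_output() {
    while (not_finished) {
        // do console output ...
        cout << "\rprogress: " << progress * 100.0f << " %" << flush;
        this_thread::sleep_for(chrono::milliseconds(100)); // don't spin at 100% CPU
    }
    cout << endl;
}
void do_opencl_stuff() {
    while (progress < 1.0f) {
        // do OpenCL stuff ...
        progress = progress + 0.01f; // only this thread writes, so load-then-store is fine
    }
    not_finished = false; // tell the console thread to stop
}
int main() {
    thread console_thread(do_console_output); // launch a separate thread
    do_opencl_stuff(); // execute this in the main thread
    console_thread.join();
    return 0;
}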
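Rather than sharing a flag, the thread that draws the bar can also poll for completion. In this sketch, a std::async task (a hypothetical stand-in for the long-running, blocking part of the OpenCL work) runs in the background while the calling thread redraws a bar until std::future::wait_for reports it ready. The function names show_progress_until_ready and demo are illustrative only.

```cpp
#include <chrono>
#include <future>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>

// Redraws a bar on `out` until `job` has finished, then returns the number of redraws.
int show_progress_until_ready(std::future<void>& job, std::ostream& out,
                              std::chrono::milliseconds tick = std::chrono::milliseconds(100)) {
    int ticks = 0;
    for (;;) {
        // wait_for blocks for at most `tick` and returns early once the job is done
        std::future_status status = job.wait_for(tick);
        if (status != std::future_status::timeout)
            break; // ready, or deferred (a deferred job only runs inside get() below)
        ++ticks;
        out << '\r' << std::string(ticks % 40, '#') << std::flush;
    }
    job.get(); // runs a deferred job, or rethrows any exception thrown by the job
    out << "\rdone" << std::endl;
    return ticks;
}

// Usage sketch: a hypothetical 300 ms sleep stands in for the long GPU computation.
std::string demo() {
    std::ostringstream out;
    std::future<void> job = std::async(std::launch::async, [] {
        std::this_thread::sleep_for(std::chrono::milliseconds(300));
    });
    show_progress_until_ready(job, out, std::chrono::milliseconds(50));
    return out.str();
}
```

Launching with std::launch::async matters: with a deferred job nothing runs until get(), so the bar would never move.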

Related

Using POSIX threads in Qt Widget App

I'm relatively new to both Qt and pthreads, but I'm trying to use a pthread to do work in the background of a basic test app I'm making. I'm aware of the Qt framework's own threading framework, but there's a lot of complaint surrounding it, so I'd like to use pthreads if possible. The code is as below
#include "drawwindow.h"
#include "ui_drawwindow.h"
#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>
#include "QThread"
pthread_t th1;
DrawWindow::DrawWindow(QWidget *parent) :
QMainWindow(parent),
ui(new Ui::DrawWindow)
{
ui->setupUi(this);
}
DrawWindow::~DrawWindow()
{
delete ui;
}
void DrawWindow::on_pushButton_clicked()
{
pthread_create(&th1, NULL, &DrawWindow::alter_text, NULL);
}
void DrawWindow::alter_text()
{
while(1)
{
ui->pushButton->setText("1");
QThread::sleep(1);
ui->pushButton->setText("one");
QThread::sleep(1);
}
}
With the header
#ifndef DRAWWINDOW_H
#define DRAWWINDOW_H
#include <QMainWindow>
namespace Ui {
class DrawWindow;
}
class DrawWindow : public QMainWindow
{
Q_OBJECT
public:
explicit DrawWindow(QWidget *parent = 0);
~DrawWindow();
void alter_text();
private slots:
void on_pushButton_clicked();
private:
Ui::DrawWindow *ui;
};
#endif // DRAWWINDOW_H
And I'm getting the error
error: cannot convert 'void (DrawWindow::*)()' to 'void* (*)(void*)' for argument '3' to 'int pthread_create(pthread_t*, const pthread_attr_t*, void* (*)(void*), void*)'
pthread_create(&th1, NULL, &DrawWindow::alter_text, NULL);
^
Does anyone know what is wrong?
TL;DR: The way you're using pthreads is precisely the discouraged way of using QThread. Just because you use a different API doesn't mean that what you're doing is OK.
There's absolutely no problem with either QThread or std::thread. Forget about pthreads: they are not portable, their API is C and thus abhorrent from a C++ programmer's perspective, and you'll be making your life miserable for no reason by sticking to pthreads.
Your real issue is that you've not understood the concerns with QThread. There are two:
Neither QThread nor std::thread is destructible at all times. Good C++ design mandates that classes are destructible at any time.
You cannot destroy a running QThread or std::thread. You must first ensure that it's stopped, by calling QThread::wait() or std::thread::join(), respectively. It wouldn't have been a big stretch to have their destructors do that, and also stop the event loop in the case of QThread.
Way too often, people use QThread by reimplementing the run method, or they use std::thread by running a functor on it. This is, of course, precisely how you use pthreads: you run some function in a dedicated thread. The way you're using pthreads is just as bad as the discouraged way of using QThread!
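The first concern can be worked around with a small RAII wrapper. The sketch below (the class name JoiningThread is illustrative) joins in its destructor, so it can be destroyed at any time; C++20's std::jthread does the same and adds cooperative cancellation.

```cpp
#include <thread>
#include <utility>

// A std::thread that is safe to destroy at any time: the destructor joins first,
// whereas std::thread's own destructor calls std::terminate() if still joinable.
class JoiningThread {
public:
    template <typename F, typename... Args>
    explicit JoiningThread(F&& f, Args&&... args)
        : t_(std::forward<F>(f), std::forward<Args>(args)...) {}

    JoiningThread(const JoiningThread&) = delete;
    JoiningThread& operator=(const JoiningThread&) = delete;

    ~JoiningThread() {
        if (t_.joinable())
            t_.join(); // wait for the thread function to return
    }

    std::thread::id id() const { return t_.get_id(); }

private:
    std::thread t_;
};
```

Blocking in a destructor has its own cost (the owner waits as long as the thread runs), which is why the thread function should be able to finish promptly.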
There are many ways of doing multithreading in Qt, and you should understand the pros and cons of each of them.
Thus, how do you do threading in C++/Qt?
First, keep in mind that threads are expensive resources, and you should ideally have no more threads in your application than the number of available CPU cores. There are some situations when you're forced to have more threads; we'll discuss when that's the case.
Use a QThread without subclassing it. The default implementation of run() simply spins an event loop that allows the objects to run their timers and receive events and queued slot calls. Start the thread, then move some QObject instances to it. The instances will run in that thread, and can do whatever work they need done, away from the main thread. Of course, everything that the objects do should be short, run-to-completion code that doesn't block the thread.
The downside of this method is that you're unlikely to exploit all the cores in the system, as the number of threads is fixed. For any given system, you might have exactly as many as needed, but more likely you'll have too few or too many. You also have no control over how busy the threads are. Ideally, they should all be "equally" busy.
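To make "a thread that just spins an event loop" concrete without Qt, here is a rough standard-library analogue: one long-lived thread waits on a queue and runs short, run-to-completion tasks posted from other threads, much as queued slot calls are delivered to objects living in a QThread. The class EventLoopThread is hypothetical and is not how Qt implements its event loop.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

class EventLoopThread {
public:
    EventLoopThread() : worker_([this] { loop(); }) {}

    ~EventLoopThread() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stop_ = true;
        }
        cv_.notify_one();
        worker_.join(); // tasks already posted still run before the thread exits
    }

    // Roughly what a queued slot call does: enqueue and return immediately.
    void post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_);
            tasks_.push_back(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
                if (tasks_.empty())
                    return; // stop_ is set and nothing is left to do
                task = std::move(tasks_.front());
                tasks_.pop_front();
            }
            task(); // run outside the lock so post() never waits on a task
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> tasks_;
    bool stop_ = false;
    std::thread worker_; // declared last so it starts after the members above exist
};
```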
Use QtConcurrent::run. This is similar to Apple's GCD. There is a global QThreadPool. When you run a functor, one thread from the pool will be woken up and will execute the functor. By default, the number of threads in the pool is limited to the number of cores available on the system (QThread::idealThreadCount()); for CPU-bound work, using more threads than that would only decrease performance.
The functors you pass to run will do self-contained tasks that would otherwise block the GUI leading to usability problems. For example, use it to load or save an image, perform a chunk of computations, etc.
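For comparison outside Qt, here is a standard-library sketch of the same sizing rule: the work is cut into roughly one chunk per core, as reported by std::thread::hardware_concurrency(), and each chunk runs as a std::async task. Note that std::async is not a thread pool like QThreadPool; each task here may get its own thread. The function name parallel_sum is illustrative only.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <numeric>
#include <thread>
#include <vector>

long long parallel_sum(const std::vector<int>& data) {
    // hardware_concurrency() may report 0 when unknown, so use at least one chunk
    unsigned cores = std::max(1u, std::thread::hardware_concurrency());
    std::size_t chunk = (data.size() + cores - 1) / cores;
    std::vector<std::future<long long>> parts;
    for (std::size_t begin = 0; begin < data.size(); begin += chunk) {
        std::size_t end = std::min(begin + chunk, data.size());
        // std::launch::async forces a real thread instead of lazy (deferred) evaluation
        parts.push_back(std::async(std::launch::async, [&data, begin, end] {
            return std::accumulate(data.begin() + static_cast<std::ptrdiff_t>(begin),
                                   data.begin() + static_cast<std::ptrdiff_t>(end), 0LL);
        }));
    }
    long long total = 0;
    for (auto& part : parts)
        total += part.get(); // waits for each chunk in turn
    return total;
}
```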
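Suppose you wish to have a responsive GUI that loads a multitude of images. A Loader class could do the job without blocking the GUI.
#include <QtConcurrent>
class Loader : public QObject {
    Q_OBJECT
    QStringList m_paths;  // QtConcurrent::map works on the sequence in place, so it must outlive the operation
    QFuture<void> m_done;
public:
    Q_SIGNAL void hasImage(const QImage &, const QString & path);
    explicit Loader(const QStringList & imagePaths, QObject * parent = 0) :
        QObject(parent), m_paths(imagePaths) {
        m_done = QtConcurrent::map(m_paths, [this](const QString & path){
            QImage image;
            image.load(path);
            emit hasImage(image, path);
        });
    }
    ~Loader() { m_done.waitForFinished(); } // don't let pool threads use a destroyed Loader
};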
If you wish to run a short-lived QObject in a thread from the thread pool, the functor can spin the event loop as follows:
auto foo = QSharedPointer<Object>(new Object); // Object inherits QObject
foo->moveToThread(0); // removes the thread affinity, so the object can be pulled into any thread
QtConcurrent::run([foo]{
    foo->moveToThread(QThread::currentThread());
    QEventLoop loop;
    QObject::connect(foo.data(), &Object::finished, &loop, &QEventLoop::quit);
    loop.exec();
});
This should only be done when the object is not expected to take long to finish what it's doing. It should not use timers, for example, since as long as the object is not done, it occupies an entire thread from the pool.
Use a dedicated thread to run a functor or a method. The difference between QThread and std::thread is mostly that std::thread lets you use functors, whereas QThread requires subclassing (Qt 5.10 added QThread::create, which also accepts a callable). The pthread API is similar to std::thread, except of course that it is C and is awfully unsafe compared to the C++ APIs.
// QThread
#include <QDebug>
#include <QThread>
int main() {
    class MyThread : public QThread {
        void run() override { qDebug() << "Hello from another thread"; }
    } thread;
    thread.start();
    thread.wait();
    return 0;
}
// std::thread
#include <QDebug>
#include <thread>
int main() {
    // C++98-style function object (std::thread itself requires C++11)
    struct Functor {
        void operator()() { qDebug() << "Hello from another thread"; }
    } functor;
    std::thread thread98(functor);
    thread98.join();
    // C++11 lambda
    std::thread thread11([]{ qDebug() << "Hello from another thread"; });
    thread11.join();
    return 0;
}
// pthread
#include <QDebug>
#include <pthread.h>
extern "C" void* functor(void*) { qDebug() << "Hello from another thread"; return NULL; }
int main()
{
    pthread_t thread;
    pthread_create(&thread, NULL, &functor, NULL);
    void * result;
    pthread_join(thread, &result);
    return 0;
}
So, what is this good for? Sometimes, you have no choice but to use a blocking API. Most database drivers, for example, have blocking-only APIs. They expose no way for your code to get notified when a query has finished. The only way to use them is to run a blocking query function/method that doesn't return until the query is done. Suppose now that you're using a database in a GUI application that you wish to remain responsive. If you're running the queries from the main thread, the GUI will block each time a database query runs. Given long-running queries, a congested network, or a dev server with a flaky cable that makes TCP perform on par with sneakernet, you're facing huge usability issues.
Thus you have no choice but to open the database connection on, and execute the queries from, a dedicated thread that can block as much as necessary.
Even then, it might still be helpful to use some QObject on that thread and spin an event loop, since this lets you easily queue the database requests without writing your own thread-safe queue. Qt's event loop already implements a nice, thread-safe event queue, so you might as well use it. Note that a Qt SQL database connection can only be used from the thread that created it, so you can't prepare a QSqlQuery in the main thread and execute it in the worker :(
Note that this example is very simplistic: you'd likely want to provide a thread-safe way of iterating over the query results, instead of pushing the entire query's worth of data at once.
#include <QtCore>
#include <QtSql>
class DBWorker : public QObject {
    Q_OBJECT
    QScopedPointer<QSqlDatabase> m_db;
    QScopedPointer<QSqlQuery> m_qBooks, m_qCars;
    Q_SLOT void init() {
        // runs in the worker thread: the connection must be created in the thread that uses it
        m_db.reset(new QSqlDatabase(QSqlDatabase::addDatabase("QSQLITE")));
        m_db->setDatabaseName(":memory:");
        if (!m_db->open()) { emit openFailed(); return; }
        m_qBooks.reset(new QSqlQuery(*m_db));
        m_qBooks->prepare("SELECT * FROM Books");
        m_qCars.reset(new QSqlQuery(*m_db));
        m_qCars->prepare("SELECT * FROM Cars");
    }
    QList<QVariantList> read(QSqlQuery * query) {
        QList<QVariantList> result;
        if (query->size() > 0) // size() is -1 when the driver can't report it (e.g. SQLite)
            result.reserve(query->size());
        while (query->next()) {
            QVariantList row;
            auto record = query->record();
            row.reserve(record.count());
            for (int i = 0; i < record.count(); ++i)
                row << query->value(i);
            result << row;
        }
        return result;
    }
public:
    typedef QList<QVariantList> Books, Cars;
    DBWorker(QObject * parent = 0) : QObject(parent) {
        // Queue a call to init(). A queued event is delivered in the thread the worker
        // lives in at delivery time, so after moveToThread() init() runs in the worker thread.
        QObject src;
        connect(&src, &QObject::destroyed, this, &DBWorker::init, Qt::QueuedConnection);
    }
    Q_SIGNAL void openFailed();
    Q_SIGNAL void gotBooks(const DBWorker::Books &);
    Q_SIGNAL void gotCars(const DBWorker::Cars &);
    Q_SLOT void getBooks() {
        Q_ASSERT(QThread::currentThread() == thread());
        m_qBooks->exec();
        emit gotBooks(read(m_qBooks.data()));
    }
    Q_SLOT void getCars() {
        Q_ASSERT(QThread::currentThread() == thread());
        m_qCars->exec();
        emit gotCars(read(m_qCars.data()));
    }
};
// Books and Cars are the same type, QList<QVariantList>, which Qt 5 already knows as a
// metatype, so no Q_DECLARE_METATYPE is needed for the queued signals above.
// True C++ RAII thread: stops the event loop and waits before being destroyed.
class Thread : public QThread { using QThread::run; public: ~Thread() { quit(); wait(); } };
int main(int argc, char ** argv) {
    QCoreApplication app(argc, argv);
    DBWorker worker; // declared before the thread, so it is destroyed after the thread has stopped
    Thread thread;
    worker.moveToThread(&thread);
    // the context object (&app) makes the lambda run in the main thread
    QObject::connect(&worker, &DBWorker::gotCars, &app, [](const DBWorker::Cars & cars){
        qDebug() << "got cars:" << cars;
        qApp->quit();
    });
    thread.start();
    // ...
    QMetaObject::invokeMethod(&worker, "getCars"); // safely invoke `getCars`: queued, since the worker lives in another thread
    return app.exec();
}
#include "main.moc" // needed when Q_OBJECT classes are defined in main.cpp
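Make alter_text a static member function (or a free function) with the signature void* alter_text(void*). A non-static member function can't be used, because calling it needs a hidden this pointer, which is exactly why the types in the error message don't match. Pass this as the last argument of pthread_create, cast the void* back to DrawWindow* inside the function, and end it with return NULL; (returning from the start routine is equivalent to calling pthread_exit). Note that Qt widgets such as ui->pushButton must only be touched from the GUI thread, so the loop in alter_text still isn't safe as written.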
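A minimal sketch of the static trampoline pattern, without Qt: the hypothetical Worker class stands in for DrawWindow and only counts iterations instead of touching widgets. In practice GCC and Clang accept a static member function here even though pthread_create formally expects a C-linkage function; a free extern "C" function is the strictly portable choice.

```cpp
#include <cstddef>
#include <pthread.h>

// Hypothetical stand-in for DrawWindow.
class Worker {
public:
    int changes = 0;

    // Has the plain type void* (*)(void*) that pthread_create expects;
    // a non-static member function would also need a hidden `this`.
    static void* thread_entry(void* arg) {
        Worker* self = static_cast<Worker*>(arg); // the object passed as pthread_create's last argument
        self->alter_text();
        return NULL; // same effect as pthread_exit(NULL)
    }

    void alter_text() {
        for (int i = 0; i < 3; ++i)
            ++changes; // placeholder for the real work
    }

    bool start(pthread_t* tid) {
        return pthread_create(tid, NULL, &Worker::thread_entry, this) == 0;
    }
};
```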

Access to callable object inside an async task conflicts with use of std::cin

#include <iostream>
#include <functional>
#include <future>
#include <tchar.h>
void StartBackground(std::function<void()> notify)
{
auto background = std::async([&]
{
notify(); // (A)
});
}
int _tmain(int argc, _TCHAR* argv[])
{
StartBackground([](){});
char c; std::cin >> c; // (B)
while (1);
return 0;
}
1) Build and run the code above using Visual Studio 2012.
2) Line (A) triggers an Access Violation in _VARIADIC_EXPAND_P1_0(_CLASS_FUNC_CLASS_0, , , , ):
First-chance exception at 0x0F96271E (msvcp110d.dll) in
ConsoleApplication1.exe: 0xC0000005: Access violation writing location
0x0F9626D8
Most confusingly, the exception can be avoided by removing line (B).
Questions
Why does the callable object notify apparently conflict with the use of std::cin?
What's wrong with this code?
The real-world scenario for this simplified example is a function that executes some code in parallel and has that code call a user-supplied notify function when done.
Edit
I found at least one problem in my code: the background variable is destroyed as soon as StartBackground() exits. Since std::async may or may not start a separate thread, and std::thread's destructor calls terminate() if the thread is still joinable, this might be causing the problem.
The following variant works because it gives the task enough time to complete:
void StartBackground(std::function<void()> notify)
{
auto background = std::async([&]
{
notify(); // (A)
});
std::this_thread::sleep_for(std::chrono::seconds(1));
}
Keeping the std::future object alive over a longer period instead of sleeping should also work. But the following code also causes the same access violation:
std::future<void> background;
void StartBackground(std::function<void()> notify)
{
background = std::async([&]
{
notify(); // (A)
});
}
whereas using a std::thread in the same manner works as expected:
std::thread background;
void StartBackground(std::function<void()> notify)
{
background = std::thread([&]
{
notify(); // (A)
});
}
I'm completely puzzled.
I must be missing some very crucial points here regarding std::async and std::thread.
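The result of async is a future, not a running thread. You have to synchronize on the task by calling background.get() (or background.wait()). With the default launch policy the task may be deferred, in which case it only runs when get() or wait() is called; without that, the client procedure may never get executed.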
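One more problem is worth spelling out: the lambda captures notify by reference ([&]), but notify is a parameter of StartBackground and is destroyed when StartBackground returns. If the task has not run yet at that point, as in the variant that stores the future in a global, (A) calls through a dangling reference; the std::thread variant has the same bug and only works by luck of timing. A sketch of a safer variant that captures by value and hands the future back so the caller can synchronize:

```cpp
#include <functional>
#include <future>

// `notify` is captured BY VALUE, so the task owns its own copy of the
// std::function and it stays valid after StartBackground has returned.
std::future<void> StartBackground(std::function<void()> notify)
{
    return std::async(std::launch::async, [notify] {
        notify(); // (A) calls the lambda's own copy
    });
}
```

The caller keeps the returned future and calls get() when it needs the work to be done; get() also rethrows any exception thrown by notify.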

How to create multiple threads using QtConcurrent::run()

I tried running QtConcurrent::run() in a loop, but the program crashes (I am using libsmbclient):
void Scanner::scan()
{
for(int i=0;i<ipList.length();i++)
{
QtConcurrent::run(this,&Scanner::scanThread,i);
}
}
void Scanner::scanThread(int i)
{
int dh;
QString ip;
ip="smb://"+ipList[i]+"/";
dh= smbc_opendir(ip.toAscii()); // debugger points to this location
if(dh<0)
return;
emit updateTree(i,dh); // commenting out this line does not stop the crash
}
Error:
talloc: access after free error - first free may be at ../lib/util/talloc_stack.c:103
Bad talloc magic value - access after free
The program has unexpectedly finished.

Qt 4.8 killing and restarting the GUI

There is a requirement to write a Qt application for a MIPS-based platform.
But there are lots of constraints, including freeing a few resources (QGFX plugin, GPU memory, etc.) when required and reusing them later. The application itself cannot be killed, as it's handling lots of other requests and running other things.
Basically, the GUI needs to be killed, freeing all the resources related to it, and restarted later when required.
One of the approaches tried is:
main() -> create a New-Thread
In the New-Thread,
while(<Condition>)
{
sem_wait(..)
m_wnd = new myMainWindow();
...
..
app->exec();
}
Whenever there is a kill command, it comes out of the event loop and waits for a signal from the other threads. Once the other threads have made the required changes, it gets the signal, creates a new window and goes back into the event loop.
In main(), a few other threads are also created, which control other devices etc. and signal the start and stop of the Qt GUI.
The above seems to work, but I am not sure if this is the right design. Could it cause any problems?
Can anyone suggest a better way?
I was able to find the required answer on the Qt forums.
Since the main intention was to remove everything related to the GUI (on screen), I could use void setQuitOnLastWindowClosed(bool quit). This makes sure the GUI / main window is closed while the app still doesn't leave the event loop, and I can restart the main window later.
Thanks
When I needed a way to ensure that my app kept running, I forked it into a sub-process. That way, even if it seg-faulted, the main process would catch it and start a new child process. In the child process, I had multiple threads for GUI and non-GUI tasks. The fork code is short and is based on the example given in the wait(2) man page. The main() simply calls createChild() in a while loop. createChild() starts a new process using zmain(). zmain() is your Qt app's main; if the child exits with status 111, the restart loop stops.
#include <QtGui/QApplication>
#include <QThread>
int zmain(int argc, char *argv[])
{
QApplication app(argc, argv, true);
app.setQuitOnLastWindowClosed(false);
QThread powerThread;
Power p;
p.moveToThread(&powerThread);
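powerThread.start();
int rc = app.exec();
powerThread.quit(); // stop the worker thread's event loop...
powerThread.wait(); // ...and wait, so the QThread isn't destroyed while still running
return rc;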
}
// The following code is taken from the wait(2) man page and has been modified to run
// our Qt main() above in a child process. When the child terminates, it is automatically
// restarted.
#include <sys/wait.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
int createChild(int argc, char *argv[]) {
pid_t cpid, w;
int status;
cpid = fork();
if (cpid == -1) {
perror("fork");
exit(EXIT_FAILURE);
}
if (cpid == 0) { /* Code executed by child */
fprintf(stderr, "Child PID is %ld\n", (long) getpid());
exit(zmain(argc, argv));
} else { /* Code executed by parent */
do {
w = waitpid(cpid, &status, WUNTRACED | WCONTINUED);
if (w == -1) {
perror("waitpid");
return(EXIT_FAILURE);
}
if (WIFEXITED(status)) {
fprintf(stderr, "exited, status=%d\n", WEXITSTATUS(status));
} else if (WIFSIGNALED(status)) {
fprintf(stderr, "killed by signal %d\n", WTERMSIG(status));
} else if (WIFSTOPPED(status)) {
fprintf(stderr, "stopped by signal %d\n", WSTOPSIG(status));
} else if (WIFCONTINUED(status)) {
fprintf(stderr, "continued\n");
}
} while (!WIFEXITED(status) && !WIFSIGNALED(status));
if (WIFEXITED(status) && WEXITSTATUS(status) == 111)
return 111;
return EXIT_SUCCESS;
}
}
int
main(int argc, char *argv[])
{
while (111 != createChild(argc, argv)) {
}
}

Device sync during async memcpy in CUDA

Suppose I want to perform an async memcpy host to device in CUDA, then immediately run the kernel. How can I test in the kernel if the async transfer has completed ?
Sequencing your asynchronous copy and kernel launch using a CUDA "stream" ensures that the kernel executes after the asynchronous transfer has completed. The following code example demonstrates:
#include <stdio.h>
__global__ void kernel(const int *ptr)
{
printf("Hello, %d\n", *ptr);
}
int main()
{
int *h_ptr = 0;
// allocate pinned host memory with cudaMallocHost
// pinned memory is required for asynchronous copy
cudaMallocHost(&h_ptr, sizeof(int));
// look for thirteen in the output
*h_ptr = 13;
// allocate device memory
int *d_ptr = 0;
cudaMalloc(&d_ptr, sizeof(int));
// create a stream
cudaStream_t stream;
cudaStreamCreate(&stream);
// sequence the asynchronous copy on our stream
cudaMemcpyAsync(d_ptr, h_ptr, sizeof(int), cudaMemcpyHostToDevice, stream);
// sequence the kernel on our stream after the copy
// the kernel will execute after the copy has completed
kernel<<<1,1,0,stream>>>(d_ptr);
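// wait for the copy and the kernel to finish; this also flushes the kernel's printf output
cudaStreamSynchronize(stream);
// clean up after ourselves
cudaStreamDestroy(stream);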
cudaFree(d_ptr);
cudaFreeHost(h_ptr);
}
And the output:
$ nvcc -arch=sm_20 async.cu -run
Hello, 13
I don't believe there's any supported way to test from within a kernel whether some asynchronous condition (such as the completion of an asynchronous transfer) has been met. CUDA thread blocks are assumed to execute completely independently from other threads of execution.
