Related
Intro
This is an open ended question that I thought could be beneficial to the community because I have been unable to find great documentaton in regards to this. Unfortunately, I learned the hard way that implementing a DLL in Qt is different than in other languages as I will explain later
Problem Statement
Implement a multithreaded DLL in Qt that can easily be used by non-Qt applications
Background Info
Qt is the tool of choice because its inherent cross-platform compatibility
The API makes use of callback functions to tell the calling application when certain events have occurred
Assumptions
-The applications that will link to the Qt dll are compatible with Qt compilers (c/c++ -mingw, C# -msvc)
-Signals/slots are used to communicate from main thread to worker threads (e.g. tell a worker thread to collect data) as well as from worker threads back to main threads (e.g. notify the main thread via a callback function that data collection has completed)
Problem Description
I learned the hard way that writing a multi-threaded DLL in QT is different than other languages because of Qt's architecture. Problems arise due to QT event loops that handle spwaning threads, timers, sending signals, and receiving slots. This Qt even loop (QApplication.exec()) can be called from the main application when the main app is Qt (Qt has access to QT specific libraries). However, when the calling application is not Qt, C# for example, the calling application (aka the main thread) does not have the ability to call Qt specific libraries therefore, making it necessary to design your DLL with an event loop embedded inside of it. It is important that this is considered upfront in the design because it is difficult to shoe-horn it in later because QApplication.exec is blocking.
In a nutshell, I am looking for oppinions on the best way to architect a multithreaded dll in Qt such that it is compatible w/ non-QT applications.
In Summary
Where does the event loop reside in the overall architecture?
What special considerations should you make in regards to signals/slots?
Are there any gotchas that the community has come across when
implementing something similar to what I have described?
[...] when the calling application is not Qt, C# for example, the calling application (aka the main thread) does not have the ability to call Qt specific libraries therefore, making it necessary to design your DLL with an event loop embedded inside of it.
This is not accurate. On Windows, you need one event loop per thread, and that event loop can be implemented using pure WINAPI, or using C#, or whatever language/framework you need. As long as that event loop is dispatching windows messages, Qt code will work.
The only Qt specific thing that needs to exist is an instance of QApplication (or QGuiApplication or QCoreApplication, depending on your needs) created from the main thread.
You must not call exec() on that instance, since native code (the main application) is already pumping windows messages. You do need to call QCoreApplication::processEvents once after you create the application instance, to "prime" it. That you do need to do so is a bug (omission) and I'm not sure that it's even necessary in Qt 5.5.
Afterwards, the gui thread in the native application will properly dispatch native events to Qt widgets and objects.
Any worker threads you create using unaltered QThread::run will spin native event loops and each of these threads can host native objects (windows handles) and QObjects alike, as well as execute asynchronous procedure calls.
The simplest way of setting it all up is to provide an initialize function in the DLL that is called by the main application once to get Qt going:
static int argc = 1;
static char arg0[] = "";
static char * argv[] = { arg0, nullptr };
Q_GLOBAL_STATIC_WITH_ARGS(QApplication, app, (arc, argv))
extern "C" __declspec(dllexport) void initialize() {
app->processEvents(); // prime the application instance
new MyWindow(app)->show();
}
The requirement for initialization not to be done in DllMain is not specific to Qt. Native WINAPI-using code is forbidden from doing mostly anything in DllMain - it's not possible to create windows, etc.
I reiterate that it's an error to do anything that could be allocating memory, window handles, threads, etc. from DllMain. You can only call kernel32 APIs, with some exceptions. Allocating a QThread or QApplication instance there is an obvious no-no. Queuing the APC call from a "current" (random) thread is the best you can do, and there's still not a firm guarantee that the thread will survive long enough to execute your APC, or that it will ever alertably wait so that the APCs can get a chance to run.
If you're feeling adventurous, according to this answer, you could queue the call to initialize() as an APC. The major problem then is that you can't ever be sure that DllMain is called from the right thread. The thread it's called from must end up in alertable wait state (say pumping the message loop). You can create a dedicated application thread then, and it's not possible to figure out if there is any particular other "main" thread that should be used instead of the new one. The thread handle must be detached, thus we must use std::thread instead of QThread.
void guiWorker() {
int argc = 1;
const char dummy[] = "";
char * argv[] = { const_cast<char*>(dummy), 0 };
QApplication app(argc, argv);
QLabel label("Hello, World!");
label.show();
app.exec();
}
VOID CALLBACK Start(_In_ ULONG_PTR) {
std::thread thread { guiWorker };
thread.detach();
}
BOOL WINAPI DllMain(HINSTANCE, DWORD reason, LPVOID)
{
switch (reason) {
case DLL_PROCESS_ATTACH:
QueueUserAPC(Start, GetCurrentThread(), NULL);
break;
case DLL_PROCESS_DETACH:
// Reasonably safe, doesn't allocate
if (QCoreApplication::instance()) QCoreApplication::instance()->quit();
break;
}
}
Generally speaking, you should never need such code. The main application must call the initialization function from a thread with an event pump (usually the main thread), and everything will work then - just as it would were it initializing a DLL using native functionality only.
Just to provide a quick update on this so you can learn from our mistake. We ran into all types of problems when we tried to integrate our Qt written dll with non-Qt languages such as C# because of the issues listed above. While Qt is great at providing a multi-platform solution it has the drawback of not being very DLL friendly as it is very difficult to get the DLL working any language other than Qt. We are currently investigating whether or not we want to rewrite our entire DLL in standard portable C++ and scrap the Qt implementation which would be very expensive.
Fair warning, I would avoid using QT as your framework when creating DLLs.
This comment by Stephen Cleary says this:
AspNetSynchronizationContext is the strangest implementation. It treats Post as synchronous rather than asynchronous and uses a lock to execute its delegates one at a time.
Similarly, the article that he wrote on synchronization contexts and linked to in that comment suggests:
Conceptually, the context of AspNetSynchronizationContext is complex. During the lifetime of an asynchronous page, the context starts with just one thread from the ASP.NET thread pool. After the asynchronous requests have started, the context doesn’t include any threads. As the asynchronous requests complete, the thread pool threads executing their completion routines enter the context. These may be the same threads that initiated the requests but more likely would be whatever threads happen to be free at the time the operations complete.
If multiple operations complete at once for the same application, AspNetSynchronizationContext will ensure that they execute one at a time. They may execute on any thread, but that thread will have the identity and culture of the original page.
Digging in reflector seems to validate this as it takes a lock on the HttpApplication while invoking any callback.
Locking the app object seems like scary stuff. So my first question: Does that mean that today, all asynchronous completions for the entire app execute one at a time, even ones that originated from separate requests on separate threads with separate HttpContexts? Wouldn't this be a huge bottleneck for any apps that make 100% use of async pages (or async controllers in MVC)? If not, why not? What am I missing?
Also, in .NET 4.5, it looks like there's a new AspNetSynchronizationContext, and the old one is renamed LegacyAspNetSynchronizationContext and only used if the new app setting UseTaskFriendlySynchronizationContext is not set. So question #2: Does the new implementation change this behavior? Otherwise, I imagine with the new async/await support marshaling completions through the synchronization context, this kind of bottleneck would be noticed much more frequently going forward.
The answer to this forum post (linked from SO answer here) suggests that something fundamentally changed here, but I want to be clear on what that is and what behaviors have improved, since we have a .NET 4 MVC 3 app which is pretty much 100% async action methods making web service calls.
Let me answer your first question. In your assumption you didn't consider the fact that separate ASP.NET requests are processed by different HttpApplication objects. HttpApplication objects are stored in pool. Once you request a page, an application object is retrieved from pool and belongs to the request till its completion. So, my answer to your question:
all asynchronous completions for the entire app execute one at a time, even ones that originated from separate requests on separate threads with separate HttpContexts
is: No, they don't
Separate requests are processed by separate HttpApplication objects, locked HttpApplication will affect only single request.
Synchronization context is a powerful thing that helps developers to synchronize access to shared (in scope of request) resources. That is why all callbacks are executed under lock. Synchronization context is a heart of event-based synchronization pattern.
I am trying to use Qt as a library (similar to this), because I want to reuse Qt classes in some currently non-Qt applications, and in shared libraries as cross-platform glue. Everything is non-GUI.
Some problems are easily avoided by DirectConnection, some can be solved with private event loops, even one can run a fake QCoreApplication in a thread and it works (last resort).
I want to know what modules rely on a running instance of QCoreApplication and cannot work without it.
Some of the Qt modules (in QtCore) do need an instance of QCoreApplication to run properly. For example QTimer relies on QCoreApplication to dispatch timer events.
I was reading the documentation for QtConcurrentRun and it seems to be relying on a global instance of QThreadPool, I am about to try and see if the application execution is vital, or maybe the instance is created on first access, or maybe not.
I am going to study QCoreApplicationPrivate source (Windows and Linux for now) but any hints in the right direction is greatly appreciated.
What are other functionality dependencies to the core application? Note that it might depend on the OS.
Edit1: Thanks to Kuba's answer, It seems QCoreApplication event loop is not necessary for timer and socket events to be dispatched. So some QtCore modules require and instance of QCoreApplication, but there is no need to have a running application event loop.
You are conflating the existence of a QCoreApplication with a running event loop. Those two are separate concepts. You may well need the former for the latter, but the latter doesn't have to run in the same thread as the former.
Most notably, you don't really have to call qApp->exec() if you don't have any events to dispatch in the thread where you constructed QCoreApplication.
The existence of a QCoreApplication is, as it were, a non-issue. Things get hairier with QApplication -- you can start it in a non-gui thread, but it's not portable and won't work on OS X. I'm trying to figure out why it doesn't work, but I don't have much time now to offer a satisfactory solution -- not yet.
It is also a misconception that QCoreApplication's event loop needs to be running for socket notifications and timer events to be dispatched to other threads. A QCoreApplication's event loop is nothing special. There is a platform-specific instance of QAbstractEventDispatcher that gets created for a thread when you instantiate the first QEventLoop in that thread. The QEventLoop doesn't know anything specific about the platform.
QCoreApplication's exec() method is quite simple and creates an instance of QEventLoop, and thus will create an instance of platform-specific QAbstractEventDispatcher. This instance is not special in any way. It's the same as any other event dispatcher created in any other thread, as far as my code reading tells so far.
If all underlying window systems would support it, it'd be in fact possible to make Qt GUI code multithreaded -- the per-thread event reception and dispatch is already there as a small first step. The big obstacle, probably the only one, would be the X library and its display lock. The display lock would be an obvious matter of contention between threads. You'd need each thread that wants to talk to the GUI open up a separate connection to the X server, and I don't know if there's a supported way of doing that from Xlib.
Since the very begining of writing ASP.NET applications when I wanted to add a threading there are 3 simple ways I can accomplish threading within my ASP.NET application :
Using the System.Threading.ThreadPool.
Using a custom delegate and calling its BeginInvoke method.
Using custom threads with the aid of System.Threading.Thread class.
The first two methods offer a quick way to fire off worker threads for your application. But unfortunately, they hurt the overall performance of your application since they consume threads from the same pool used by ASP.NET to handle HTTP requests.
Then I wanted to use a new Task or async/await to write IHttpAsyncHandler. One example you can find is what Drew Marsh explains here : https://stackoverflow.com/a/6389323/261950
My guess is that using Task or async/await still consume the thread from the ASP.NET thread pool and I don't want for the obvious reason.
Could you please tell me if I can use Task (async/await) on the background thread like with System.Threading.Thread class and not from thread pool ?
Thanks in advance for your help.
Thomas
This situation is where Task, async, and await really shine. Here's the same example, refactored to take full advantage of async (it also uses some helper classes from my AsyncEx library to clean up the mapping code):
// First, a base class that takes care of the Task -> IAsyncResult mapping.
// In .NET 4.5, you would use HttpTaskAsyncHandler instead.
public abstract class HttpAsyncHandlerBase : IHttpAsyncHandler
{
public abstract Task ProcessRequestAsync(HttpContext context);
IAsyncResult IHttpAsyncHandler.BeginProcessRequest(HttpContext context, AsyncCallback cb, object extraData)
{
var task = ProcessRequestAsync(context);
return Nito.AsyncEx.AsyncFactory.ToBegin(task, cb, extraData);
}
void EndProcessRequest(IAsyncResult result)
{
Nito.AsyncEx.AsyncFactory.ToEnd(result);
}
void ProcessRequest(HttpContext context)
{
EndProcessRequest(BeginProcessRequest(context, null, null));
}
public virtual bool IsReusable
{
get { return true; }
}
}
// Now, our (async) Task implementation
public class MyAsyncHandler : HttpAsyncHandlerBase
{
public override async Task ProcessRequestAsync(HttpContext context)
{
using (var webClient = new WebClient())
{
var data = await webClient.DownloadDataTaskAsync("http://my resource");
context.Response.ContentType = "text/xml";
context.Response.OutputStream.Write(data, 0, data.Length);
}
}
}
(As noted in the code, .NET 4.5 has a HttpTaskAsyncHandler which is similar to our HttpAsyncHandlerBase above).
The really cool thing about async is that it doesn't take any threads while doing the background operation:
An ASP.NET request thread kicks off the request, and it starts downloading using the WebClient.
While the download is going, the await actually returns out of the async method, leaving the request thread. That request thread is returned back to the thread pool - leaving 0 (zero) threads servicing this request.
When the download completes, the async method is resumed on a request thread. That request thread is briefly used just to write the actual response.
This is the optimal threading solution (since a request thread is required to write the response).
The original example also uses threads optimally - as far as the threading goes, it's the same as the async-based code. But IMO the async code is easier to read.
If you want to know more about async, I have an intro post on my blog.
I've been looking for information through internet for a couple of days. Let me sum up what I found until now :
ASP.NET ThreadPool facts
As Andres said: When async/await will not consume an additional ThreadPool thread ? Only in the case you are using BCL Async methods. that uses an IOCP thread to execute the IO bound operation.
Andres continues with ...If you are trying to async execute some sync code or your own library code, that code will probably use an additional ThreadPool thread unless you explicitely use the IOCP ThreadPool or your own ThreadPool.
But as far as I know you can't chose whetever you want to use a IOCP thread, and making correct implementation of the threadPool is not worth the effort. I doubt someone does a better one that already exists.
ASP.NET uses threads from a common language runtime (CLR) thread pool to process requests. As long as there are threads available in the thread pool, ASP.NET has no trouble dispatching incoming requests.
Async delegates use the threads from ThreadPool.
When you should start thinking about implementing asynchronous execution ?
When your application performs relatively lengthy I/O operations (database queries, Web service calls, and other I/O operations)
If you want to do I/O work, then you should be using an I/O thread (I/O Completion Port) and specifically you should be using the async callbacks supported by whatever library class you're using. Theirs names start with the words Begin and End.
If requests are computationally cheap to process, then parallelism is probably an unnecessary overhead.
If the incoming request rate is high, then adding more parallelism will likely yield few benefits and could actually decrease performance, since the incoming rate of work may be high enough to keep the CPUs busy.
Should I create new Threads ?
Avoid creating new threads like you would avoid the plague.
If you are actually queuing enough work items to prevent ASP.NET from processing further requests, then you should be starving the thread pool! If you are running literally hundreds of CPU-intensive operations at the same time, what good would it do to have another worker thread to serve an ASP.NET request, when the machine is already overloaded.
And the TPL ?
TPL can adapt to use available resources within a process. If the server is already loaded, the TPL can use as little as one worker and make forward progress. If the server is mostly free, they can grow to use as many workers as the ThreadPool can spare.
Tasks use threadpool threads to execute.
References
http://msdn.microsoft.com/en-us/magazine/cc163463.aspx
http://blogs.msdn.com/b/pfxteam/archive/2010/02/08/9960003.aspx
https://stackoverflow.com/a/2642789/261950
Saying that "0 (zero) threads will be servicing this request" is not accurate entirely.
I think you mean "from the ASP.NET ThreadPool", and in the general case that will be correct.
When async/await will not consume an additional ThreadPool thread?
Only in the case you are using BCL Async methods (like the ones provided by WebClient async extensions) that uses an IOCP thread to execute the IO bound operation.
If you are trying to async execute some sync code or your own library code, that code will probably use an additional ThreadPool thread unless you explicitely use the IOCP ThreadPool or your own ThreadPool.
Thanks,
Andrés.
The Parallel Extensions team has a blog post on using TPL with ASP.NET that explains how TPL and PLINQ use the ASP.NET ThreadPool. The post even has a decision chart to help you pick the right approach.
In short, PLINQ uses one worker thread per core from the threadpool for the entire execution of the query, which can lead to problems if you have high traffic.
The Task and Parallel methods on the other hand will adapt to the process's resources and can use as little as one thread for processing.
As far as the Async CTP is concerned, there is little conceptual difference between the async/await construct and using Tasks directly. The compiler uses some magic to convert awaits to Tasks and Continuations behind the scenes. The big difference is that your code is much MUCH cleaner and easier to debug.
Another thing to consider is that async/await and TPL (Task) are not the same thing.
Please read this excellent post http://blogs.msdn.com/b/ericlippert/archive/2010/11/04/asynchrony-in-c-5-0-part-four-it-s-not-magic.aspx to understand why async/await doesn't mean "using a background thread".
Going back to our topic here, in your particular case where you want to perform some expensive calculations inside an AsyncHandler you have three choices:
1) leave the code inside the Asynchandler so the expensive calculation will use the current thread from the ThreadPool.
2) run the expensive calculation code in another ThreadPool thread by using Task.Run or a Delegate
3) Run the expensive calculation code in another thread from your custom thread pool (or IOCP threadPool).
The second case MIGHT be enough for you depending on how long your "calculation" process run and how much load you have. The safe option is #3 but a lot more expensive in coding/testing. I also recommend always using .NET 4 for production systems using async design because there are some hard limits in .NET 3.5.
There's a nice implementation of HttpTaskAsyncHandler for the .NET 4.0 in the SignalR project. You may want to ckeck it out: http://bit.ly/Jfy2s9
the web layer is coded in asp.net with pages marked as async. Yes, the recommended way to code for aync is using the RegisterAsyncTask
I have a problem now - there are a few pages that have used AutoResetEvent and ManualResetEvent for aync and not the standard RegisterAsyncTask.
Would these objects servicing the async calls, use up the worker threads from the threadpool? (not recommended, as this would exhaust the worker threads and the server would not be able to serve other client requests
OR
would they use the IO threads? (typically IO threads are used for async calls with the RegsterAsyncTask, this is desired)
I would need to propose change to these pages based on your insights.
Any opinions please?
The reset event objects don't use different threads themselves - they just block/release the current thread based on the current state and the activities of other threads.
When you say there are other pages "that have used AutoResetEvent and ManualResetEvent for a[s]ync" what exactly do you mean? These are synchronization objects, and don't provide a way (in themselves) of making operations asynchronous. Something else must be starting a thread or using the thread pool.