Fast (HW-accelerated) drawing to foreign window (probably using Direct3D) - gdi

I am working on a project where the task is to paint bitmaps (currently HBITMAP/BitBlt/AlphaBlend) into the non-client areas of all visible windows (i.e. windows which do not belong to my application). It must be done very fast: the bitmap is updated when a window is moved, resized, etc. Also, some kind of blur algorithm must be applied to this bitmap. It should work on Win7 and Win8 (so XP is not required).
I have managed to get this working with GDI. I call GetWindowDC and GetWindowRect, AlphaBlend the bitmap into a buffer (CreateCompatibleDC/CreateCompatibleBitmap) and then BitBlt it into the DC returned by GetWindowDC. This works perfectly... except... it is not as fast as I want. And if I apply a blur algorithm to the bitmap, then everything is slow as hell.
So my question would be how to improve the speed? I'm thinking about hardware accelerated drawing.
a) I tried GDI-compatible Direct2D (using ID2D1DCRenderTarget+BindDC) but it is much slower than pure GDI.
b) I am thinking about Direct3D. Problem is that I probably don't know how to use it. If I call D3D10CreateDeviceAndSwapChain with swapchain's OutputWindow set to HWND of my application then it returns S_OK but when I set OutputWindow to HWND of any foreign window then the method fails. So I am not sure how to render into foreign windows.
c) how to properly apply blur on a part of image? I found many algorithms but all of them are processed on CPU. How to make it on GPU?
Thanks in advance for any idea how to solve my problem.
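For reference on point (c), here is a minimal sketch of a separable box blur, written in Python only to show the structure. It runs on the CPU; the reason it is relevant is that the same two-pass split (one horizontal pass, one vertical pass) is how blurs are commonly arranged on a GPU, with each pass as a shader over a texture. The function names and the edge handling (the window shrinks at the borders) are my own assumptions, not something taken from a specific library.

```python
def box_blur_1d(row, radius):
    """Replace each value with the mean of its neighbours within `radius`.

    The window is clamped at the ends of the row, so edge pixels average
    over fewer samples.
    """
    n = len(row)
    out = [0.0] * n
    for i in range(n):
        lo = max(0, i - radius)
        hi = min(n - 1, i + radius)
        window = row[lo:hi + 1]
        out[i] = sum(window) / len(window)
    return out


def box_blur(image, radius):
    """Separable box blur on a list of rows of grayscale values."""
    # Horizontal pass over every row.
    rows = [box_blur_1d(list(r), radius) for r in image]
    # Vertical pass: transpose, blur each column as a row, transpose back.
    cols = [box_blur_1d(list(c), radius) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]
```

Splitting the blur this way costs roughly 2 x (2 x radius + 1) samples per pixel instead of (2 x radius + 1) squared, which is why it is usually the first thing to try before moving the work to the GPU.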

Have you thought about using DComp? For why using DComp may be appropriate, take a look at this: http://msdn.microsoft.com/en-us/library/windows/desktop/hh449195%28v=vs.85%29.aspx
For a brief summary of what DComp is (from MSDN):
Microsoft DirectComposition is a Windows component that enables high-performance bitmap composition with transforms, effects, and animations. Application developers can use the DirectComposition API to create visually engaging user interfaces that feature rich and fluid animated transitions from one visual to another.
DirectComposition enables rich and fluid transitions by achieving a high framerate, using graphics hardware, and operating independently of the UI thread. DirectComposition can accept bitmap content drawn by different rendering libraries, including Microsoft DirectX bitmaps, and bitmaps rendered to a window (HWND bitmaps). Also, DirectComposition supports a variety of transformations, such as 2D affine transforms and 3D perspective transforms, as well as basic effects such as clipping and opacity.
DirectComposition is designed to simplify the process of composing visuals and creating animated transitions. If your application already contains rendering code or already uses the recommended DirectX API, you only need to do a minimal amount of work to use DirectComposition effectively.

Related

Render QML onscreen and save video offscreen simultaneously (OpenGL ES 2.0)

I am trying to achieve a 30 fps screen recording (of a QML scene) while also rendering the same QML scene to my display. So far I have followed (http://blog.qt.io/blog/2017/02/21/making-movies-qml/) and have been able to achieve onscreen and offscreen rendering by use of two QML engines. The issue is that any saving method with calls to glReadPixels (QOpenGLFramebufferObject->toImage) will block the onscreen rendering.
I've learned that a way around this is to use pixel buffer objects (PBOs) to achieve asynchronous transfers; I have achieved this on my desktop platform, but I need a solution for my embedded platform, which has OpenGL ES 2.0 and Qt 5.7.1.
Is there any other way to use frame buffer objects or textures to achieve this goal? Is there a way to copy the texture / color attachment in the GPU memory space and transfer the image back in chunks?
Thanks,
Most embedded platforms use a unified memory architecture, so it's entirely possible that the glReadPixels approach won't trigger nearly as much overhead. You're probably still better off using a ring of buffer textures that are copied from the actual framebuffer texture and then have fences protecting the reads from those, but you're not going to have the same issues that arise from crossing the CPU / GPU memory boundary on desktop platforms.
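To make the "ring of buffers" idea concrete, here is a small bookkeeping sketch in Python. It makes no OpenGL calls at all: `ReadbackRing`, `submit` and `drain` are hypothetical names, and the Python lists stand in for buffer textures. The point is only the ordering: a frame is read back `size` frames after it was copied, which is the moment a fence on that copy would normally have signaled.

```python
from collections import deque


class ReadbackRing:
    """Delay readback of each frame copy by `size` frames (simulation only)."""

    def __init__(self, size):
        self.size = size
        self.in_flight = deque()  # (frame_id, pixels) copies awaiting readback

    def submit(self, frame_id, pixels):
        """Queue a copy of the frame; return the oldest copy once the ring is full."""
        ready = None
        if len(self.in_flight) == self.size:
            # In a real renderer this is where the oldest slot's fence would be
            # checked; the GPU has had `size` frames to finish that copy.
            ready = self.in_flight.popleft()
        self.in_flight.append((frame_id, list(pixels)))
        return ready

    def drain(self):
        """Return all remaining copies, oldest first (e.g. when recording stops)."""
        remaining = list(self.in_flight)
        self.in_flight.clear()
        return remaining
```

With a ring of two slots, the encoder would receive frame 0 while frame 2 is being rendered, so the onscreen path never waits on the frame it has just drawn.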

Which one for a desktop app with a good looking UI : QtWebkit or Qml?

I have been researching these two technologies for creating a good-looking desktop application using Qt. I see people talking about QML as the next big thing for desktop apps, since it provides all those "good" eye-candy effects for a desktop application. On the other hand, with QtWebKit we could bring in the same state-of-the-art UI look and feel that we have on the web. Now I need help choosing the right technology for a cross-platform application with a decent, good-looking UI. So QML, or QtWebKit with HTML5?
You will eventually hit webkit's limitations. First of all, webkit is really heavyweight. Just its javascript engine is about 5MB IIRC. Qt 5.2 dropped the V8 javascript engine for its own engine, and saved about that much from the executable size.
QML gives you all the benefits of javascript with a couple things that are simply nowhere in webkit, namely:
A declarative, property-binding style of hooking things together, with a lot of well-performing elements, such as animations. In html you have to deal with dom and css separately, and there is an obvious impedance mismatch between the two - the designs got nothing to do with each other.
An ever-improving OpenGL-ES-based scene graph. WebGL gives you a much lower-level interface than that, and DOM is something else entirely.
A lighter V4 engine (in 5.2) optimized for QML.
Never mind that webkit simply doesn't use hardware acceleration for its rendering, at all. In QML, graphics hardware nominally does all the rendering. With webkit as-is (as opposed to, say Awesomium), you're leaving yourself behind, performance-wise. It may let you do "flashy" stuff, but it will not be anywhere near as fluid as QML is.

DWM in Win7/8 + GDI

A problem I've noticed once on my Win7 system, thought it was a DWM bug as it was fixed after a reboot. But now I'm realizing that it's happening (as default behavior) on other people's systems, and it's the normal behavior on Surface Pro as well.
How to reproduce the problem: implement, using GDI, a basic lasso system. Define a rectangle controlled by the mouse, when the rectangle changes, invalidate the old one & the new one (or invalidate a union of both rectangles, either as a new rectangle or a complex region, it doesn't matter, the "bug" still shows anyway).
During wm_paint, you just erase the background & paint the rectangle (it has to be a rectangle outline, the problem won't be visible if it's a filled rectangle). You can do double-buffering if you wanna be sure that it's not a flickering problem (& trust me it's not).
So what you'll see, if you have a system like mine (desktop Win7 with a geforce, aero on), is a normal lasso system, with no more ghosting than the monitor's own.
On other systems (like Surface Pro, to define a fully known system), you'll see, as you extend the lasso outwards, the border of the lasso disappearing. A bit like LCD ghosting, but massively more noticeable.
Now, instead of invalidating the lasso's rectangle, try invalidating the whole window. And there, no more ghosting.
What I found out is that it's not the invalidation that "fixes" it, it's GDI access. You can just as well invalidate the whole window but only paint the lasso's zone, and you still get ghosting. But if you paint the lasso's zone and draw a single little pixel in each corner of the window, no more ghosting.
There must be something in DWM, probably since version 1.1, that uses some kind of caching of the bounding box of the last GDI access, and for some weird reason, what falls within the last bounding box will get immediately on screen, while the new part will be delayed by at least 1 frame.
This is pretty bad because it's breaking very basic window invalidation that everyone uses, and I haven't found any way to fix it (other than by invalidating the whole window of course, but that'd be stupid, & besides it's a problem that affects the whole GDI, so you get poor visual results everywhere).
Again it's most likely in DWM 1.1, I don't think you can get this in Vista, but I'm not sure. I also don't know why it doesn't do that on my desktop, possibly it depends on the graphic card's driver.
So if anyone happens to know more about this...
An update on this. I didn't find any clean solution, but a hack that works.
First, I also discovered that this "bug" seems to affect all systems, just not the same way.
Using DwmGetCompositionTimingInfo, one can make v-synced GDI animation. That's in theory, because in practice it won't work, because of the same "bug", even on systems not visibly affected by what I describe above. The DWM will simply decide not to refresh anything when there isn't enough to refresh, and that will cause frameskips when scrolling a window using ScrollWindowEx and not enough pixels are invalidated.
So what's the solution? Fool the DWM into thinking that it has to swap buffers immediately, as it's what the whole problem is about. When doing GDI operations, one might think that the result will appear at the DWM's next vblank-synced refresh, but it's not the case. Some parts will, some will be delayed. So the trick is to force the DWM to swap, here's what I found about it:
-first, DwmFlush does not do that (nor does GdiFlush). DwmFlush is more or less a WaitForVBlank anyway
-DwmSetPresentParameters doesn't seem to allow this either, though I didn't bother much with that function since it's already gone in Windows 8
-the DWM seems to refresh when there are enough pixels to swap, but it also seems to be partitioned, quite possibly into wide but short rectangles (that's what it appears to be on Surface Pro, but not on my desktop). Whether it's related to apertures to the VRAM or segmentation into small textures, I have no idea, really, maybe someone knows?
So what works for me: telling the GDI to refresh vertical stripes, 1 pixel wide, about 500 pixels apart, all over the screen. If you only do it on parts of the screen, you'll still get flicker on other parts.
It also works using horizontal stripes but then you'll need a lot more of them, that's why I believe that the segmentation is done in wide, short rectangles. You can kinda see those rectangles when tricking the DWM.
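As a rough illustration of the stripe layout described above, this Python sketch only computes the rectangles; it does not touch GDI. The 500-pixel spacing is the figure from this post, `vertical_stripes` is a hypothetical helper name, and each rectangle is what would then be handed to whichever GDI call ends up forcing the refresh.

```python
def vertical_stripes(screen_width, screen_height, spacing=500):
    """Return (left, top, right, bottom) rects for 1-pixel-wide vertical stripes.

    Stripes start at x = 0 and repeat every `spacing` pixels across the
    full height of the screen.
    """
    return [(x, 0, x + 1, screen_height) for x in range(0, screen_width, spacing)]
```

On a 1920x1080 screen this gives four stripes (x = 0, 500, 1000, 1500), which is far fewer pixels than a full-screen blit even though the CPU cost was observed to be in the same ballpark.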
Which GDI functions work? Many do, but they don't all have the same CPU usage.
First, forget GetPixel, it works but the CPU usage is obviously extremely high.
Second, the GDI seems to be intelligent enough to detect things that can be buffered just as commands, so filling a rectangle with a null brush, or drawing empty text, won't force the DWM to refresh.
What does work is BitBlt or AlphaBlend using empty vertical stripes. The CPU usage is still ok, even though it's not so far from blitting a whole screen.
It has to be done on a top-level window, not the desktop.
Creating a Direct2D render target for the DC & doing a begin/enddraw also works, but it's normal since it will force a refresh of the full rectangle, & thus the CPU cost is higher.
If anyone knows a better way to force a DWM refresh, I'd like to know. But since Windows 8 already made obsolete most of the interesting DWM functions (fortunately DwmGetCompositionTimingInfo still partially works, or it would be impossible to do vsynced timers without DirectX), I'm not sure there's any better way.
Also, it doesn't have to be done on all top-level windows. You can see the effect working on a Windows desktop blue lasso when it gets near a top-level window that invalidates stripes, it stops flickering as soon as it enters a zone near it (the segmentation I'm talking of above).
Here's a couple of general GDI-related pointers regarding your code:
When you call InvalidateRect API it may not immediately dispatch WM_PAINT notification, so calling InvalidateRect two times in a row like you did in your TForm1.FormMouseMove method is most certainly causing this visual effect. The first redrawing is not yet processed when you call it the second time.
I'm not sure what Canvas.Rectangle exactly does "on the inside" or which APIs it calls. I'm assuming it is using DrawFocusRect, in which case you have to be aware that its drawing is XORed, so doing it twice over the same rectangle will erase it.
OK, so here's what I'd do if I were drawing that selection box:
Call the InvalidateRect API only once. For that you will have to calculate the bounding rectangle that includes the position of the selection box before and after the move. If I'm not mistaken, you can use the UnionRect API for that, or just calculate the bounding rect yourself.
If avoiding a visual lag is important, I'd do all the drawing of the selection box in TForm1.FormMouseMove. You can get a device context by calling GetDC API on your window handle. After that do the same drawing. In this case you will not need any invalidation. (Sorry, I can't give you a procedure for Delphi. This is how I'd do it with plain WinAPIs.)
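If you calculate the bounding rect yourself, the arithmetic is just a min/max over the edges. Here is a sketch in Python rather than Delphi or C++, with rectangles as (left, top, right, bottom) tuples; `union_rect` is a hypothetical helper, and unlike the real UnionRect API it does not special-case empty rectangles.

```python
def union_rect(a, b):
    """Smallest (left, top, right, bottom) rectangle containing both a and b."""
    return (
        min(a[0], b[0]),  # leftmost edge
        min(a[1], b[1]),  # topmost edge
        max(a[2], b[2]),  # rightmost edge
        max(a[3], b[3]),  # bottommost edge
    )
```

For example, an old selection box of (10, 10, 50, 40) and a new one of (30, 20, 80, 90) give a single rectangle (10, 10, 80, 90) to pass to one InvalidateRect call.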
EDIT: After reviewing your C++ code I was able to reproduce the visual flicker you're describing. I also made two C++ projects, derived from your original code in hopes of solving it: Here's a simple version and here's the one with mouse dragging support. None of them solved the issue though.
I was able to reproduce the flicker on a Windows 7 desktop. After having done some tests I am convinced now that this visual artifact is caused by a display driver. In my case the desktop has ATI Radeon video card. This is how I was able to conclude that: I ran the test executable on a different (older) desktop with Windows XP OS and the flicker was not present. I ran the exact same executable in a virtual machine with Windows XP installed in it on the desktop computer that was producing the flicker. The flicker was present when the executable ran in a Windows XP in that virtual machine. Since the same video driver is responsible for rendering the actual desktop and the one in the virtual machine, there's most definitely some "funny business" going on with optimization or caching in the video driver, that is causing this artifact.
So, the question now is how to resolve it. In my second build of your project, I added the code to render the whole client area for the window to eliminate the possibility of a calculation error, but that did not help. So at this point you're helpless to resolve it via plain APIs.
Your next steps should probably be these:
Contact the driver maker for the video card that you're experiencing this issue on. See if they help. I'd show them your YouTube video of the issue and give them the C++ project to reproduce it with.
Post a question on Windows Driver Development forums, preferably for video driver developers. Unfortunately it's a dwindling community, so expect long delays.
Change the tags on your question here. Add these: C, C++, WinAPI, GDI, DWM and remove what you have there now. This way it will be visible for a Win32 dev community so you'll get more views.
Other than that, good luck! This is such a minor bug (even if one can call it that) that I doubt that you'll get any serious attention to it. Although this is very admirable, in my book, that you're trying to achieve such perfection for your software.
Also, if you need me to second you on this, I can do so.

Qt+OpenGl+SimpleGl

I have enabled Qt+OpenGL+SimpleGL on an ARM platform and was able to run the OpenGL example programs.
I also have Qt+WebKit, which works with a graphics plugin.
I wanted to use the SimpleGL context for everything instead of the normal graphics screen. But when I try to run Qt+WebKit with SimpleGL, I just get a blank screen.
Does Qt support this? If so, how can we make it work?
Yes, this is correct. OpenGL draws directly to the framebuffer. The simplegl driver doesn't handle what is drawn using the raster paint engine of the QWS, so you may see only black.
Using simplegl for "everything" means you want everything to be drawn using OpenGL in your EGL full-screen window? This is possible under some assumptions. You have to write all your applications to be rendered using the Qt OpenGL paint engine (using the opengl graphics system is not supported under Qt/E). This is possible also for QtWebKit, I'm doing it now. Note that this does not mean that everything is rendered using hardware acceleration. You'll have to write your applications "the right way" to get all actually hardware accelerated. Consider that you'll have to handle the mouse pointer some other way in this case.
The other way is to just modify the simplegl driver to allow for the use of Qt applications using the raster paint engine. This is possible as well with some limitations. Qt can use blit to place its own windows over OpenGL. Look for the framebuffer driver inside the Qt source tree to know how to do this. You can then have common Qt applications and OpenGL Qt applications some way. I'm doing this as well. Not everything can be done anyway.
EDIT: I'm sure you already did, but just in case, read http://doc.qt.io/qt-4.8/qt-embeddedlinux-opengl.html carefully.
Unfortunately I don't know anything about SimpleGL, but I do know that there is a way to render a standard Qt widget in a QGLWidget. Maybe have a look at this Qt Quarterly which I think is somewhat related to your question:
http://doc.qt.nokia.com/qq/qq26-openglcanvas.html

How can I get a smooth text crawl using Flex?

I'm working on a standalone Flash application (written using Flex 3/ActionScript 3) that features a text crawl, like what you might see at the bottom of your TV when watching a cable news channel; it's a long narrow box that text moves across from right to left.
I've implemented it by creating a Label element, populating it with text, and then moving it using a mx:Move object with a Linear.easeNone easing function. It works, but it has ample room for improvement. It looks a bit jerky, and tends to have a fair amount of "tearing" (the top and bottom halves of the text sometimes fall out of sync).
I tried throwing math at the problem to get the crawl's movement rate synced with the monitor's refresh rate, but that was a bust. I found out the hard way that the app's frame rate jumps around too much; the "optimized" crawl varied between looking silky smooth and like it had epilepsy.
Is there anything else folks would recommend I try to smooth this thing out? Is there some alternate design you'd recommend I try?
Edit: Some context: the crawl is part of a digital signage application (played from a standalone Flash projector -- no web browser) that does stuff elsewhere on the screen, including video playback and rendering text and images. It definitely gets choppier during video playback, but it's never as smooth as I'd like it to be.
There are two potential solutions to this problem, but both have caveats, the first because of your use of Flex and a standalone projector, the second because it is a mitigator, not a complete solution.
Hardware Acceleration
When publishing your file, you can attempt to have Flash utilize hardware acceleration to alleviate the vertical refresh issue you are running into that is causing tearing. Sadly, Flex Builder 3 is incapable of enabling this setting at the SWF (projector) level (Link to bug). This has yet to be resolved and has been pushed from 4.0 to 4.1 to 4.x... If and when it is resolved, it will likely be a compiler argument in the project settings of Flash Builder 4.
You may be able to determine if this solution works for you by outputting your projector as a standard SWF and embedding it on an HTML document with the wmode set to "direct" or "gpu". Sadly, if it does (it should), you can't use it right now anyway. If you have Flash Builder 4, certain projects are capable of making round trips between FB4 and Flash Professional CS5, though I am not sure what the criteria for that is (my current AIR project has all the project modification menu options grayed out). If you do manage to get your project into Flash, you can enable hardware acceleration in the Publish Settings of the project (File->Publish Settings->Flash tab->Hardware Acceleration option in CS5).
This method is almost a certain solution for your problem, though it has two issues, one already highlighted above, and (for people publishing for the web) that by utilizing direct or GPU rendering on a webpage, you are unable to layer any DOM elements on top of flash.
direct: This mode tries to use the fastest path to screen, or direct path if you will. In most cases it will ignore whatever the browser would want to do to have things like overlapping HTML menus or such work. A typical use case for this mode is video playback. On Windows this mode is using DirectDraw or Direct3D on Vista, on OSX and Linux we are using OpenGL. Fidelity should not be affected when you use this mode.
gpu: This is fully fledged compositing (+some extras) using some functionality of the graphics card. Think of it being similar to what OSX and Vista do for their desktop managers, the content of windows (in flash language that means movie clips) is still rendered using software, but the result is composited using hardware. When possible we also scale video natively in the card. More and more parts of our software rasterizer might move to the GPU over the next few Flash Player versions, this is just a start. On Windows this mode uses Direct3D, on OSX and Linux we are using OpenGL.
(Source)
Direct is the ideal option for this situation, as you can actually have performance degradation with "gpu" as well as visual differences from graphics card to graphics card.
Lower your framerate
The Flash player will continue to play video at its native refresh rate independent of the rest of your project as long as you keep the framerate at or above approximately 2FPS (though I suggest 5FPS minimum). You won't want to run that low for this example, but you are able to lower the framerate of the entire scene without impacting video performance. The closer your framerate is to the screen refresh rate, the more apt you are to actually create the tearing effect unless you are able to absolutely sync with the monitor's refresh rate, which you probably cannot do without the above... Hardware Acceleration.
This problem has existed in the Flash Player for as long as it has been able to move objects horizontally. What happens is that Flash updates a buffered snapshot of the running animation at the same time that the screen is refreshing. If the buffered snapshot changes partway through a screen refresh, you get a tear. This is why lowering the framerate actually reduces the amount of tearing, you are refreshing the buffer less frequently.
As @Tegeril mentioned, using Flex is one of the reasons. Flex is a pretty heavy framework and it does a lot of things behind the scenes. This will be familiar if you know the life cycle of a component (especially invalidating properties, invalidating the display list, etc.).
As a few minor things that might improve performance:
try to keep a simple display list. If you know the app will always be displayed at one size, then Flex won't waste time traversing the display list/tree up to the top and back for measurements. Also, try to use a Canvas. I know, it's not very clean, but since it uses absolute values and doesn't check with the 'parents' much, it should be faster than other containers (like HBox, VBox, etc.)
try to display the video at its full size (make sure the encoded video dimensions are right so that no CPU cycles are spent on resizing the video)
Ok, this was Flex stuff.
It might be very handy to read senocular's article on Asynchronous ActionScript Execution, which explains how Flash Player handles updates and renders.
(source: senocular.com)
Frames both execute ActionScript and render the screen
(source: senocular.com)
ActionScript taking a long time to complete delays rendering
I imagine the jerkiness is related to this. Also, I'm guessing you might get moments of smooth movement then sudden halts, every now and then, when Flash Player catches its breath (the garbage collector cleans up).
Victor Drâmbă's article on “Multithreading” in ActionScript might also be useful.
Soo, to recap:
use Profiler or something and see if the Flex framework is slowing you down, or where the 'bottleneck' is
improve as much as you can on that side then check if it's how Flash Player handles all the actionscript('elastic' frames)
If the bottleneck comes from the Flex framework, worst case, you can try to minimise the number of components that traverse the display list, and use pure ActionScript for the other things (as @PatrickS suggested, use TweenLite, etc.)
If it helps, try to preload data(fetch rss feed and all that) at the start, and when you've got most of the important bits that don't require 'refreshes'/loads frequently, display the app. You will use more memory, but will have more cpu cycles to spare for other tasks.
Also, if it's display objects that are the 'bottleneck' and there's plenty of them, check if you can reuse them using Object Pools.
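In case the Object Pool idea is unfamiliar, this is the gist of it, sketched in Python rather than ActionScript. `ObjectPool`, `acquire` and `release` are names I made up for illustration; the factory is whatever creates your display object.

```python
class ObjectPool:
    """Hand out previously released objects before creating new ones."""

    def __init__(self, factory):
        self._factory = factory  # callable that builds a fresh object
        self._free = []          # released objects waiting to be reused

    def acquire(self):
        """Return a recycled object if one is available, else create one."""
        return self._free.pop() if self._free else self._factory()

    def release(self, obj):
        """Give an object back to the pool instead of discarding it."""
        self._free.append(obj)
```

Reusing objects this way means fewer allocations and less for the garbage collector to clean up, which is the usual motivation when the pauses described above are the problem. The caller is responsible for resetting an object's state before reusing it.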
HTH
TweenMax or even TweenLite (http://www.greensock.com) handles this sort of job pretty well. What else is your app doing while the text is scrolling, though? Is it possible that some other processes are interfering?
This may not be helpful, but have you considered putting the crawling text into the HTML DOM and using CSS transitions to crawl the text? Obviously there's the IE problem, but it should be supported in IE9 and you could use JavaScript as a fallback.
This may seem silly, but CSS transitions are getting hardware acceleration and separate processes for plugins meaning on a multicore machine you could get parallel threads.
One thing you might consider is to move your label incrementally using a Timer instead of an easing function. That way you can take advantage of the updateAfterEvent method to get smoother rendering. Here's a link to an article/video from Chet Haase (Adobe's Flex graphics dude) that explains usage along with an example app with code:
http://graphics-geek.blogspot.com/2010/04/video-event-performance-in-flex.html
Hope that helps.
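One variation on the Timer idea, sketched here in Python just to show the arithmetic: instead of adding a fixed step on every tick, derive the position from the elapsed time, so an uneven tick rate affects smoothness but not speed. This is my own assumption about how to structure it, not something taken from the linked article, and `crawl_x` and its parameters are hypothetical names.

```python
def crawl_x(box_width, text_width, speed_px_per_s, elapsed_s):
    """X position of the text's left edge for a right-to-left crawl.

    The text starts just off the right edge (x == box_width) and wraps
    around once it has fully left the box on the left side.
    """
    travel = box_width + text_width  # distance until the text is fully gone
    offset = (speed_px_per_s * elapsed_s) % travel
    return box_width - offset
```

Each timer tick would call this with the time elapsed since the crawl started and assign the result to the label's x before requesting a redraw.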

Resources