I am trying to implement a CoverFlow-like effect using a QGLWidget; the problem is the texture loading process.
I have a worker (QThread) for loading images from disk, and the main thread checks for newly loaded images; if it finds any, it uses bindTexture to load them into the QGLContext. While a texture is being bound, the main thread is blocked, so I get an fps drop.
What is the right way to do this?
I have found that the default behaviour of bindTexture in Qt4 is extremely slow:
bindTexture(image, target, format, LinearFilteringBindOption | InvertedYBindOption | MipmapBindOption)
Using only LinearFilteringBindOption in the bind options speeds things up a lot; this is my current call:
bindTexture(image, GL_TEXTURE_2D, GL_RGBA, QGLContext::LinearFilteringBindOption);
More info here: load time for a 3800x2850 bmp file reduced from 2 seconds to 34 milliseconds.
Of course, if you need mipmapping, this is not the solution. In this case, I think that the way to go is Pixel Buffer Objects.
Binding in the main thread (single QGLWidget solution):
Decide on a maximum texture size. You could base it on the maximum possible widget size, for example. Say you know the widget can be at most (approximately) 800x600 pixels and the largest visible cover has 30-pixel margins at the top and bottom and a 1:2 aspect ratio -> 600 - 2*30 = 540 -> the maximum cover size is 270x540, stored e.g. in m_maxCoverSize.
Scale the incoming images to that size in the loader thread. It doesn't make sense to bind larger textures, and the larger the image, the longer it takes to upload to the graphics card. Use QImage::scaled(m_maxCoverSize, Qt::KeepAspectRatio) to scale the loaded image and pass it to the main thread.
Limit the number of textures, or better, the time spent binding them per frame. I.e. remember the time at which you started binding textures (e.g. QTime bindStartTime;) and after binding each texture do:
if (bindStartTime.elapsed() > BIND_TIME_LIMIT)
break;
BIND_TIME_LIMIT would depend on the frame rate you want to keep. Of course, if binding even a single texture takes much longer than BIND_TIME_LIMIT, you haven't solved anything. (A condensed sketch of the whole bind pass follows below.)
You might still experience framerate drops while loading images on slower machines / graphics cards. The rest of the code should be prepared to live with that (e.g. use the actual elapsed time to drive the animation).
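Putting those points together, a condensed sketch of such a per-frame bind pass might look like this (names such as CoverFlowWidget, m_pendingImages, m_coverTextures and BIND_TIME_LIMIT are placeholders, not from the original code; the widget is assumed to derive from QGLWidget):

void CoverFlowWidget::bindPendingTextures()
{
    QTime bindStartTime;
    bindStartTime.start();

    // m_pendingImages is filled by the loader thread with images that were
    // already scaled down to m_maxCoverSize.
    while (!m_pendingImages.isEmpty()) {
        QImage img = m_pendingImages.dequeue();
        GLuint tex = bindTexture(img, GL_TEXTURE_2D, GL_RGBA,
                                 QGLContext::LinearFilteringBindOption);
        m_coverTextures.append(tex);

        // Stop early so a single frame never spends too long uploading.
        if (bindStartTime.elapsed() > BIND_TIME_LIMIT)
            break;
    }
}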
An alternative solution is to bind in a separate thread (using a second, invisible QGLWidget; see the documentation):
2. Texture uploading in a thread.
Doing texture uploads in a thread may be very useful for applications handling large amounts of images that needs to be displayed, like for instance a photo gallery application. This is supported in Qt through the existing bindTexture() API. A simple way of doing this is to create two sharing QGLWidgets. One is made current in the main GUI thread, while the other is made current in the texture upload thread. The widget in the uploading thread is never shown, it is only used for sharing textures with the main thread. For each texture that is bound via bindTexture(), notify the main thread so that it can start using the texture.
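A very condensed sketch of that approach (the class, signal and member names below are my own, not from the Qt documentation):

#include <QThread>
#include <QGLWidget>

class TextureUploadThread : public QThread
{
    Q_OBJECT
public:
    // 'shareWith' is the visible QGLWidget; the second widget is never shown,
    // it only exists so its context shares textures with the main one.
    TextureUploadThread(QGLWidget *shareWith, const QList<QImage> &images)
        : m_glWidget(new QGLWidget(0, shareWith)), m_images(images)
    {
        m_glWidget->doneCurrent();   // release the context from the GUI thread
    }

signals:
    void textureReady(GLuint textureId);

protected:
    void run()
    {
        m_glWidget->makeCurrent();   // current in the upload thread only
        foreach (const QImage &img, m_images) {
            GLuint tex = m_glWidget->bindTexture(
                img, GL_TEXTURE_2D, GL_RGBA,
                QGLContext::LinearFilteringBindOption);
            emit textureReady(tex);  // notify the GUI thread that it can use it
        }
        m_glWidget->doneCurrent();
    }

private:
    QGLWidget *m_glWidget;
    QList<QImage> m_images;
};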
Related
I am writing an application and there will potentially be tens of thousands of labels (a log-viewing application of a sort), most of them hidden with QWidget::hide(). I imagine a QLabel, when created, takes up some video memory. Now, does hide() free that video memory? Or will I have to QWidget::remove() most of those hidden labels to keep video memory usage at a reasonable level?
In general, most widgets do not store their pre-rendered images in memory. Instead, they render themselves on demand after being invalidated. However, some do cache if rendering is time-consuming. Taking a look at the QLabel source code (http://code.qt.io/cgit/qt/qtbase.git/tree/src/widgets/widgets/qlabel.cpp), it seems that QLabel caches its pixmap when scaledContents is enabled and scaling is necessary. Plain text-only labels are painted as-is without any caching.
Still, as @G.M mentioned, each widget consumes some system memory to store its own data and some processing time due to event handling, so creating 10k labels is a considerable waste of resources. In contrast, item views are single widgets that draw their items on their own surface: no event-handling overhead, no unnecessary caches. Like QLabels, item view items are perfectly stylable, see http://doc.qt.io/archives/qt-5.8/stylesheet-examples.html#customizing-qlistview and http://doc.qt.io/archives/qt-5.8/stylesheet-examples.html#customizing-qtreeview for details. More complex looks, such as multi-line list items, are achievable with QItemDelegate: Qt QListWidgetItem Multiple Lines
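For illustration, a minimal sketch of the item-view approach (QListWidget here; parentWidget and logLines are placeholders for your own parent widget and log data):

#include <QListWidget>

// One widget with many lightweight items, instead of thousands of QLabel children.
QListWidget *logView = new QListWidget(parentWidget);
logView->setUniformItemSizes(true);        // lets the view optimize for huge lists
foreach (const QString &line, logLines)    // e.g. tens of thousands of entries
    logView->addItem(line);

// Hiding an entry costs no widget at all:
logView->item(0)->setHidden(true);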
I need to render lots (hundreds) of similar spheres and cylinders with different transforms. Currently this is achieved by creating hundreds of identical QEntity objects. The result is that the app constantly consumes 20-70% of CPU, even when the scene is still.
Is there a default source of update events for the widget? If there is one, can I turn it off or reduce its frequency? There seems to be no other source of CPU load but the Qt3D widget.
Did you have a look at the RenderPolicy enum of the QRenderSettings class? It seems you can set the render policy to OnDemand, which causes Qt to only render the scene when something changes.
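A minimal sketch, assuming the scene lives in a Qt3DExtras::Qt3DWindow (rootEntity is your existing scene root; if you embed the scene differently, set the policy on whatever QRenderSettings your frame graph uses):

#include <Qt3DExtras/Qt3DWindow>
#include <Qt3DRender/QRenderSettings>

Qt3DExtras::Qt3DWindow view;
// Only re-render when something in the scene actually changes,
// instead of running the render loop continuously.
view.renderSettings()->setRenderPolicy(Qt3DRender::QRenderSettings::OnDemand);
view.setRootEntity(rootEntity);
view.show();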
Alternatively, if you don't need interaction with the scene, you could have a look at my implementation of an offline renderer. The underlying QAspectEngine starts and stops whenever you set or unset a root entity. You could set your root entity, capture the frame and then unset the root entity, causing the graphics loop to stop, which I would guess results in less CPU load.
My current code follows this: Disk > bytearray > QImage (or QPixmap) > Painter
// Step 1: disk to qbytearray
QFile file("/path/to/file");
file.open(QIODevice::ReadOnly);
QByteArray blob = file.readAll();
file.close();
// Step 2: bytearray to QImage
this->mRenderImg = new QImage();
this->mRenderImg->loadFromData(blob, "JPG");
// Step 3: render
QPainter p(this);
QRect target; // draw rectangle
p.drawImage(target, *(this->mRenderImg));
Step 1 takes 0.5s
Step 2 takes 3.0s - decoding jpeg is the slowest step
Step 3 takes 0.1s
Apparently, decoding jpeg data is the slowest step. How do I make it faster?
Is there a third party library that can transform bytearray of jpeg data to bytearray of ppm that can be loaded to QImage faster?
Btw, using QPixmap takes the same time as QImage.
I was able to reduce QImage load time significantly using libjpeg-turbo. Here are the steps
Step 1: Load .jpeg in memory filebuffer
Step 2a: Use tjDecompress2() to decompress filebuffer to uncompressedbuffer
Step 2b: Use QImage(uncompressedbuffer, int width, int height, Format pixfmt) to load the QImage
Step 3: Render
Steps 2a and 2b combined offer at least a 3x speedup compared to QImage::loadFromData()
Notes
The PixelFormat used in libjpeg-turbo's tjDecompress2() should match the format specified in step 2b
You can obtain the width and height needed in step 2b from tjDecompressHeader2()
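For illustration, a sketch of steps 2a/2b under these notes (the function name decodeJpegTurbo is mine, TJPF_RGB / QImage::Format_RGB888 is just one matching format pair, and error handling is omitted):

#include <turbojpeg.h>
#include <QImage>
#include <QByteArray>

QImage decodeJpegTurbo(const QByteArray &blob)
{
    tjhandle handle = tjInitDecompress();
    int width = 0, height = 0, subsamp = 0;

    // Read width/height from the JPEG header (needed for step 2b).
    tjDecompressHeader2(handle,
                        reinterpret_cast<unsigned char *>(const_cast<char *>(blob.data())),
                        blob.size(), &width, &height, &subsamp);

    // Step 2a: decompress into a tightly packed RGB buffer.
    QByteArray pixels(width * height * tjPixelSize[TJPF_RGB], 0);
    tjDecompress2(handle,
                  reinterpret_cast<const unsigned char *>(blob.constData()),
                  blob.size(),
                  reinterpret_cast<unsigned char *>(pixels.data()),
                  width, 0 /* pitch: tightly packed */, height,
                  TJPF_RGB, TJFLAG_FASTDCT);
    tjDestroy(handle);

    // Step 2b: wrap the buffer in a QImage; copy() detaches it from 'pixels'.
    return QImage(reinterpret_cast<const uchar *>(pixels.constData()),
                  width, height, width * tjPixelSize[TJPF_RGB],
                  QImage::Format_RGB888).copy();
}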
I think your question addresses the end result of a design issue rather than the design issue itself. Here are my thoughts on the loading times of jpegs, or images in general, in GUIs.
Loading data from a hard drive is a common bottleneck in software design. You have to wait for the drive to spin to that location, read the data out, and copy it into RAM. Then, when you are ready, you have RAM push it to the video buffer or maybe to the graphics card.
An SSD will be much faster, but demanding one of your end users is not practical in most situations. Loading the image once on startup and never closing the program is one way to avoid hitting this delay multiple times.
That is a big reason why lots of programs show a loading bar or a splash screen when starting up, so that the user doesn't get a slow experience while data is being pulled up.
Other common ways programs handle long operations are the classic hourglass, the spinning beach ball, or a "wait a bit" animation.
Probably the best examples of handling lots of large jpegs are Google Maps and some of the higher-quality photo manager programs such as Picasa.
Google Maps stores many different resolutions of the same area as tiles, and then loads only the tiles appropriate for the resolution being viewed. Picasa "processes" all the images it finds and stores a few different thumbnails for each one, which can be loaded much faster than the full-resolution image (in most cases).
My suggestion would be to either store a copy of your jpeg at a lower resolution, load that one first and replace it with the high-resolution version once that has loaded, or to look into breaking your image into tiles and loading them as needed.
On another related note: if your UI is getting slowed down by the jpeg loading, move the loading into a thread and keep your UI responsive!
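For illustration, a minimal sketch of that threading idea using QtConcurrent (class and member names such as Viewer, m_watcher and m_renderImg are placeholders; m_watcher is a QFutureWatcher<QImage> connected once, e.g. in the constructor, to onImageLoaded()):

#include <QtConcurrent/QtConcurrent>
#include <QFutureWatcher>
#include <QImage>

void Viewer::loadImageAsync(const QString &path)
{
    // Decode on a worker thread from the global pool; the GUI thread stays responsive.
    m_watcher.setFuture(QtConcurrent::run([path] {
        QImage img;
        img.load(path);                // the slow JPEG decode happens off the GUI thread
        return img;
    }));
}

void Viewer::onImageLoaded()
{
    m_renderImg = m_watcher.result();  // delivered back on the GUI thread
    update();                          // schedule a repaint with the decoded image
}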
Hope that helps.
You asked about third-party software:
ImageMagick can do this task quickly (even with larger files):
convert 1.jpg 2.jpg 3.jpg outputfilenamehere.ppm
While you have the filestream open, you can do numerous operations...
Hope that helps,
Gette
Is there a way to set the render target to a GDI bitmap in SlimDX so that as soon as the scene is rendered I can immediately BitBlt the render out of there for processing in another thread and continue rendering?
Is it necessary to render to a texture and then copy the contents out to the bitmap? I would like to be able to do this without any unnecessary copying. I'm going to need every speedup I can get.
Sorry, you do need to render to a RenderTarget, then copy that resource into a Texture2D; then you can map the data and get the pixels into your bitmap.
The memory for RenderTargets is marked for a special kind of use by the graphics card and cannot be read from directly.
The memory for Textures can be marked so that it can be read, but only through the API, as it is still held on the graphics card (there are some exceptions, but DirectX has to go with the lowest common denominator).
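To make that concrete, here is the copy-then-map pattern sketched with native Direct3D 11 calls in C++ (SlimDX wraps the same API; the g_device, g_context and g_renderTargetTex variables are placeholders, and a non-multisampled render target is assumed):

#include <d3d11.h>

// Describe a CPU-readable staging copy of the render-target texture.
D3D11_TEXTURE2D_DESC desc = {};
g_renderTargetTex->GetDesc(&desc);
desc.Usage          = D3D11_USAGE_STAGING;
desc.BindFlags      = 0;
desc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
desc.MiscFlags      = 0;

ID3D11Texture2D* staging = nullptr;
g_device->CreateTexture2D(&desc, nullptr, &staging);

// Copy the rendered frame into the staging texture, then map it on the CPU.
g_context->CopyResource(staging, g_renderTargetTex);

D3D11_MAPPED_SUBRESOURCE mapped;
if (SUCCEEDED(g_context->Map(staging, 0, D3D11_MAP_READ, 0, &mapped)))
{
    // mapped.pData / mapped.RowPitch now hold the pixels; copy them into your
    // (reused) bitmap buffer for the other thread, then unmap.
    g_context->Unmap(staging, 0);
}
staging->Release();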
If you need the extra speed, reuse the same bitmap, or have an array of prepared bitmaps ready to fill and keep them in rotation.
And as ever, measure how much time these things are consuming with a profiler so that you can quantify bottlenecks.
I want to create a large compatible DC, draw a large image on it, and then BitBlt part of the image to another DC, in order to achieve high performance. I am using the following code to create the compatible memory DC, but when the rect becomes very large, e.g. 5000x5000, the created DC becomes unstable: sometimes it is OK, sometimes it fails. Is there anything wrong with my code?
input :pInputDC
output:pOutputMemDC
{
    pOutputMemDC = new CDC();
    VERIFY(pOutputMemDC->CreateCompatibleDC(pInputDC));
    CRect rect(0, 0, nDCWidth, nDCHeight);
    CBitmap bitmap;
    if (bitmap.CreateCompatibleBitmap(pInputDC, rect.Width(), rect.Height()))
    {
        pOutputMemDC->SetViewportOrg(-rect.left, -rect.top);
        m_pOldBitmap = pOutputMemDC->SelectObject(&bitmap);
    }
    CBrush brush;
    VERIFY(brush.CreateSolidBrush(RGB(255, 0, 0)));
    brush.UnrealizeObject();
    pOutputMemDC->FillRect(rect, &brush);
}
Instead of creating a large DC and then blitting a portion of it to another, smaller DC, create a DC the same size as the destination DC, or at least the same size as the blit destination. Then offset all your drawing commands by the (-x,-y) of the subsection you want to copy. If your destination is (100,200)-(400,400) on the source, then create a 300x200 DC and offset everything by (-100,-200).
This has two big advantages: firstly, the memory required is much smaller; secondly, GDI will clip your drawing operations to the size of the DC (it always clips anyway). Although the act of clipping takes CPU time, the time saved by not drawing pixels that are never seen more than makes up for it.
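A minimal sketch of that idea (pInputDC matches the question; pDestDC and the 300x200 / (100,200) numbers are the example values from above):

CDC memDC;
memDC.CreateCompatibleDC(pInputDC);

CBitmap bmp;
bmp.CreateCompatibleBitmap(pInputDC, 300, 200);     // destination-sized, not 5000x5000
CBitmap* pOldBmp = memDC.SelectObject(&bmp);

// Shift the origin so drawing code written in full-image coordinates lands
// inside the small DC; GDI clips everything that falls outside it.
memDC.SetViewportOrg(-100, -200);
// ... run the normal drawing commands here ...

// Reset the origin, blit the result, and restore the original bitmap before
// memDC and bmp go out of scope.
memDC.SetViewportOrg(0, 0);
pDestDC->BitBlt(0, 0, 300, 200, &memDC, 0, 0, SRCCOPY);
memDC.SelectObject(pOldBmp);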
Now, if this large DC holds something like an image (a JPEG, for example), then you need to look into other methods. One technique used by many image-editing programs is to split the image into tiles and page the tiles to/from memory/hard disk. Each tile is its own DC, and you only keep enough source DCs to fill the target DC. As the view window moves across the large image, unload tiles that have moved out of the target rectangle and load tiles that have become visible.
Each 5000x5000 pixel image needs ca. 100MB of RAM. Depending on how much RAM your PC has, this might already be the problem.
If you have 1GB of RAM or more, then that's probably not the issue. In this case, you must have a memory leak. Where do you free the allocated bitmap? I see that you unrealize the brush but how about the bitmap?
Note that increasing your swap won't help since that will kill your performance.
Make sure you are selecting all the original GDI objects back into the DCs.
The problem may be that your bitmap is still selected into pOutputMemDC when it is destroyed, so one or both of them can't be deleted properly, and memory problems begin from there.
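One way to address that, sketched here with the names from the question: make sure the bitmap outlives the DC (or is deselected before it is destroyed), select the old bitmap back in, and then delete the DC:

// Select the original bitmap back in so GDI can delete 'bitmap' cleanly,
// then tear down the memory DC itself.
pOutputMemDC->SelectObject(m_pOldBitmap);
pOutputMemDC->DeleteDC();
delete pOutputMemDC;
pOutputMemDC = NULL;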