How to create a large Compatible Memory DC in GDI programming?

I want to create a large compatible DC, draw a large image on it, then BitBlt part of the image to another DC, in order to achieve high performance. I am using the following code to create the compatible memory DC. But when the rect becomes very large, e.g. 5000*5000, creating the compatible DC becomes unreliable: sometimes it works, sometimes it fails. Is there anything wrong with my code?
input:  pInputDC
output: pOutputMemDC
{
    pOutputMemDC = new CDC();
    VERIFY(pOutputMemDC->CreateCompatibleDC(pInputDC));
    CRect rect(0, 0, nDCWidth, nDCHeight);
    CBitmap bitmap;
    if (bitmap.CreateCompatibleBitmap(pInputDC, rect.Width(), rect.Height()))
    {
        pOutputMemDC->SetViewportOrg(-rect.left, -rect.top);
        m_pOldBitmap = pOutputMemDC->SelectObject(&bitmap);
    }
    CBrush brush;
    VERIFY(brush.CreateSolidBrush(RGB(255, 0, 0)));
    brush.UnrealizeObject();
    pOutputMemDC->FillRect(rect, &brush);
}

Instead of creating a large DC and then blitting a portion of it to another, smaller DC, create a DC the same size as the destination DC, or at least the same size as the blit destination. Then offset all your drawing commands by the (-x,-y) of the subsection you want to copy. If your destination is (100,200)-(400,400) on the source, create a 300x200 DC and offset everything by (-100,-200).
This has two big advantages: firstly, the memory required is much smaller. Secondly, GDI will clip your drawing operations to the size of the DC (it always clips anyway). Although the act of clipping takes CPU time, the time saved by not drawing pixels that aren't seen more than makes up for it.
Now, if this large DC is something like an image (JPEG for example) then you need to look into other methods. One technique used by many image editing programs is to split the image into tiles and page the tiles to/from memory/hard disk. Each tile is its own DC and you only have enough source DCs to fill the target DC. As the view window moves across the large image, unload tiles that have moved out of the target rectangle and load tiles that have become visible.
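The tiling idea above comes down to working out which tiles the view rectangle overlaps. A minimal sketch of that arithmetic follows, assuming square tiles and non-negative coordinates; TileRange and visibleTiles are hypothetical names for illustration, not part of GDI or MFC.

```cpp
#include <cassert>

// Hypothetical helper type: an inclusive range of tile columns and rows.
struct TileRange {
    int firstCol, firstRow;
    int lastCol, lastRow;
};

// Returns the tiles overlapped by the view rectangle
// [viewX, viewX + viewW) x [viewY, viewY + viewH), for square tiles of
// tileSize pixels. Coordinates are assumed to be non-negative.
TileRange visibleTiles(int viewX, int viewY, int viewW, int viewH, int tileSize) {
    assert(tileSize > 0 && viewW > 0 && viewH > 0 && viewX >= 0 && viewY >= 0);
    TileRange r;
    r.firstCol = viewX / tileSize;
    r.firstRow = viewY / tileSize;
    r.lastCol = (viewX + viewW - 1) / tileSize;  // last pixel column that is visible
    r.lastRow = (viewY + viewH - 1) / tileSize;  // last pixel row that is visible
    return r;
}
```

Tiles inside the returned range would be loaded and kept; tiles outside it are candidates for unloading as the view moves.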

Each 5000x5000 pixel image needs ca. 100MB of RAM at 32 bits per pixel (5000 * 5000 * 4 bytes). Depending on how much RAM your PC has, this might already be the problem.
If you have 1GB of RAM or more, then that's probably not the issue. In this case, you must have a memory leak. Where do you free the allocated bitmap? I see that you unrealize the brush but how about the bitmap?
Note that increasing your swap won't help since that will kill your performance.
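The 100MB estimate can be checked with a few lines of arithmetic. This is only a sketch: bitmapBytes is a hypothetical helper, and it assumes scan lines padded to a multiple of 4 bytes, as Windows DIBs use; a device-dependent bitmap created by CreateCompatibleBitmap may be laid out differently.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for the pixel data of an uncompressed bitmap, assuming each
// scan line is padded to a multiple of 4 bytes (DIB-style alignment).
std::uint64_t bitmapBytes(std::uint32_t width, std::uint32_t height,
                          std::uint32_t bitsPerPixel) {
    assert(bitsPerPixel > 0);
    // Round the row size in bits up to a whole number of 32-bit words.
    std::uint64_t strideBytes =
        ((static_cast<std::uint64_t>(width) * bitsPerPixel + 31) / 32) * 4;
    return strideBytes * height;
}
```

For 5000x5000 at 32 bits per pixel this gives 100,000,000 bytes; at 24 bits per pixel it gives 75,000,000 bytes.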

Make sure you select all original GDI objects back into the DCs before destroying them.
The problem may be that your bitmap is still selected into pOutputMemDC when it is destroyed, so that one or both of them can't be deleted properly. Memory problems can follow from that.

Related

Cropping YUV_420_888 images for Firebase barcode decoding

I'm using the Firebase-ML barcode decoder in a streaming (live) fashion using the Camera2 API. The way I do it is to set up an ImageReader that periodically gives me Images. The full image is the resolution of my camera, so it's big - it's a 12MP camera.
The barcode scanner takes about 0.41 seconds to process an image on a Samsung S7 Edge, so I set up the ImageReaderListener to decode one at a time and throw away any subsequent frames until the decoder is complete.
The image format I'm using is YUV_420_888 because that's what the documentation recommends, and because if you try to feed the ML Barcode decoder anything else it complains (run time message to debug log).
All this is working but I think if I could crop the image it would work better. I'd like to leave the camera resolution the same (so that I can display a wide SurfaceView to help the user align his camera to the barcode) but I want to give Firebase a cropped version (basically a center rectangle). By "work better" I mean mostly "faster" but I'd also like to eliminate distractions (especially other barcodes that might be on the edge of the image).
This got me trying to figure out the best way to crop a YUV image, and I was surprised to find very little help. Most of the examples I have found online do a multi-step process where you first convert the YUV image into a JPEG, then render the JPEG into a Bitmap, then scale that. This has a couple of problems in my mind:
It seems like that would have significant performance implications (in real time). Avoiding the conversion would help me accomplish a few things, including reducing power consumption, improving the response time, and allowing me to return Images to the ImageReader via image.close() more quickly.
This approach doesn't get you back to an Image, so you have to feed Firebase a Bitmap instead, and that doesn't seem to work as well. I don't know what Firebase is doing internally, but I kind of suspect it's working mostly (maybe entirely) off of the Y plane and that the translation of Image -> JPEG -> Bitmap muddies that up.
I've looked around for YUV libraries that might help. There is something in the wild called libyuv-android but it doesn't work exactly in the format firebase-ml wants, and it's a bunch of JNI which gives me cross-platform concerns.
I'm wondering if anybody else has thought about this and come up with a better solution for cropping YUV_420_888 images in Android. Am I not able to find this because it's a relatively trivial operation? There's stride and padding to be concerned with, among other things. I'm not an image/color expert, and I kind of feel like I shouldn't attempt this myself; my particular concern is that I would come up with something that works on my device but not on others.
Update: this may actually be kind of moot. As an experiment I looked at the Image that comes back from ImageReader. It's an instance of ImageReader.SurfaceImage which is a private (to ImageReader) class. It also has a bunch of native tie-ins. So it's possible that the only choice is to do the compress/decompress method, which seems lame. The only other thing I can think of is to make the decision myself to only use the Y plane and make a bitmap from that, and see if Firebase-ML is OK with that. That approach still seems risky to me.
I tried to scale down the YUV_420_888 output image today. I think it is quite similar to cropping the image.
In my case, I take the three byte buffers from the Image planes, representing Y, U and V:
yBytes = image.getPlanes()[0].getBuffer()
uBytes = image.getPlanes()[1].getBuffer()
vBytes = image.getPlanes()[2].getBuffer()
Then I convert them to an RGB array to build the bitmap. I found that if I read the YUV arrays over the original Image width and height with a step of 2, I get a Bitmap scaled to half size.
What you need to prepare:
yBytes,
uBytes,
vBytes,
width of Image,
height of Image,
y row stride,
uv RowStride,
uv pixelStride,
output array (its length should equal the output image width * height; for me that is 1/4 of the original pixel count, since both width and height are halved)
This means that if you can find the position of the cropping area (its four corners) in your image, you can fill the new RGB array with just the YUV data from that area.
Hope it can help you to solve the problem.
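The stride handling described above can be sketched independently of Android. The block below is written in C++ for illustration and would need porting to Java or Kotlin; cropPlane is a hypothetical helper, and the plane is assumed to have already been copied out of its ByteBuffer into a byte array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Copies a w x h window starting at sample (x, y) out of one image plane into
// a tightly packed buffer. rowStride is the distance in bytes between rows,
// pixelStride the distance in bytes between neighbouring samples in a row.
std::vector<std::uint8_t> cropPlane(const std::vector<std::uint8_t>& src,
                                    std::size_t rowStride, std::size_t pixelStride,
                                    std::size_t x, std::size_t y,
                                    std::size_t w, std::size_t h) {
    std::vector<std::uint8_t> out;
    out.reserve(w * h);
    for (std::size_t row = 0; row < h; ++row) {
        for (std::size_t col = 0; col < w; ++col) {
            std::size_t index = (y + row) * rowStride + (x + col) * pixelStride;
            assert(index < src.size());  // the window must lie inside the plane
            out.push_back(src[index]);
        }
    }
    return out;
}
```

For the Y plane the crop rectangle is used as-is. For the U and V planes of a 4:2:0 image the same call would be made with x/2, y/2, w/2 and h/2 and the UV row and pixel strides, which is why an even-aligned crop rectangle is the safe choice.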

OpenCL: know local work group size in advance?

I'm working on optimizing a separable image downscaler. My next step is reduction of multiple samplings (nearest) of the same texel by reading all necessary texels into local memory. Here begins the fun...
The downscaler is versatile, so it can downscale anything larger into anything smaller and even take sections of an image and downscale it into a destination image. Thus the final resolution divider never is a whole number. Most of the time it will be something around 3.97 or such. This means: I do not know the required size for that local array at compile time.
To me that means: before enqueuing a task, I'll have to create a local mem object of the required size.
How do I know what workgroup sizes OpenCL will select?
If there is no way, is there a "best practice" to overcome this problem?
P.S.: I'm writing for OpenCL 1.1 compatibility.
Since you are using images, the texture cache can be relied upon instead of using shared local memory.
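If local memory is wanted anyway, one possible approach, sketched under assumptions that are not from the thread, is for the host to choose the work-group size itself by passing an explicit local_work_size to clEnqueueNDRangeKernel, and to size a __local kernel argument at run time through clSetKernelArg with a null pointer. The hypothetical helper below only estimates an upper bound on the source texels one group can touch along one axis, assuming nearest sampling where output i reads source texel floor(offset + i * scale).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Upper bound on the number of distinct source texels along one axis that
// groupSize consecutive outputs can read, for a source/destination ratio of
// scale (e.g. 3.97 when downscaling). The bound holds for any offset.
std::size_t maxSourceTexels(std::size_t groupSize, double scale) {
    assert(groupSize > 0 && scale > 0.0);
    double span = static_cast<double>(groupSize - 1) * scale;
    return static_cast<std::size_t>(std::floor(span)) + 2;
}
```

The local buffer size in bytes would then be this count (per axis) multiplied by the texel size; the result is deliberately a little generous so that it stays valid for every group position.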

Qt4/Opengl bindTexture in separated thread

I am trying to implement a CoverFlow-like effect using a QGLWidget; the problem is the texture loading process.
I have a worker (QThread) for loading images from disk, and the main thread checks for new loaded images, if it finds any then uses bindTexture for loading them into QGLContext. While the texture is being bound, the main thread is blocked, so I have a fps drop.
What is the right way to do this?
I have found that the default behaviour of bindTexture in Qt4 is extremely slow:
bindTexture(image,target,format,LinearFilteringBindOption | InvertedYBindOption | MipmapBindOption)
Using only LinearFilteringBindOption in the binding options speeds things up a lot. This is my current call:
bindTexture(image, GL_TEXTURE_2D,GL_RGBA,QGLContext::LinearFilteringBindOption);
With this change, the load time for a 3800x2850 bmp file dropped from 2 seconds to 34 milliseconds.
Of course, if you need mipmapping, this is not the solution. In this case, I think that the way to go is Pixel Buffer Objects.
Binding in the main thread (single QGLWidget solution):
Decide on a maximum texture size. You could base it on the maximum possible widget size, for example. Say you know that the widget can be at most (approximately) 800x600 pixels and the largest visible cover has 30-pixel margins top and bottom and a 1:2 aspect ratio -> 600-2*30 = 540 -> the maximum size of the cover is 270x540, e.g. stored in m_maxCoverSize.
Scale the incoming images to that size in the loader thread. It doesn't make sense to bind larger textures, and the larger a texture is, the longer it takes to upload to the graphics card. Use QImage::scaled(m_maxCoverSize, Qt::KeepAspectRatio) to scale the loaded image and pass it to the main thread.
Limit the number of textures or, better, the time spent binding them per frame. I.e. remember the time at which you started binding textures (e.g. QTime bindStartTime;) and after binding each texture do:
if (bindStartTime.elapsed() > BIND_TIME_LIMIT)
    break;
BIND_TIME_LIMIT would depend on the frame rate you want to keep. But of course, if binding a single texture takes much longer than BIND_TIME_LIMIT, you haven't solved anything.
You might still experience framerate drop while loading images though on slower machines / graphics cards. The rest of the code should be prepared to live with it (e.g. use actual time to drive animation).
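The per-frame time limit described in the steps above can be sketched without Qt. In this minimal version std::chrono stands in for QTime, the handler stands in for the bindTexture call, and processWithinBudget is a hypothetical name, not a Qt API.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>

// Handles queued items until the queue is empty or the time budget is used
// up, and returns how many were handled. At least one item is handled per
// call so the queue always makes progress.
template <typename Item, typename Handler>
std::size_t processWithinBudget(std::deque<Item>& pending,
                                std::chrono::milliseconds budget,
                                Handler handle) {
    assert(budget.count() >= 0);
    const auto start = std::chrono::steady_clock::now();
    std::size_t processed = 0;
    while (!pending.empty()) {
        handle(pending.front());  // stands in for bindTexture(image, ...)
        pending.pop_front();
        ++processed;
        if (std::chrono::steady_clock::now() - start >= budget) {
            break;  // leave the remaining items for the next frame
        }
    }
    return processed;
}
```

Called once per frame from the paint or update path, this keeps whatever is left in the queue for later frames instead of stalling the current one.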
An alternative solution is to bind in a separate thread (using a second, invisible QGLWidget; see the documentation):
2. Texture uploading in a thread.
Doing texture uploads in a thread may be very useful for applications handling large amounts of images that needs to be displayed, like for instance a photo gallery application. This is supported in Qt through the existing bindTexture() API. A simple way of doing this is to create two sharing QGLWidgets. One is made current in the main GUI thread, while the other is made current in the texture upload thread. The widget in the uploading thread is never shown, it is only used for sharing textures with the main thread. For each texture that is bound via bindTexture(), notify the main thread so that it can start using the texture.

How to render a SlimDX scene directly to a GDI bitmap

Is there a way to set the render target to a GDI bitmap in SlimDX so that as soon as the scene is rendered I can immediately BitBlt the render out of there for processing in another thread and continue rendering?
Is it necessary to render to a texture and then copy the contents out to the bitmap? I would like to be able to do this without any unnecessary copying. I'm going to need every speedup I can get.
Sorry, you do need to render to a RenderTarget, then copy that resource into a Texture2D; then you can map the data and get the pixels into your bitmap.
The memory for RenderTargets is marked for a special kind of use by the graphics card and cannot be read from directly.
The memory for Textures can be marked so that it can be read, but only through the API, as it is still held on the graphics card (there are some exceptions, but DirectX has to go with the lowest common denominator).
If you need the extra speed, reuse the same bitmap, or keep an array of prepared bitmaps ready to fill and use them in rotation.
And as ever, measure how much time these things are consuming with a profiler so that you can quantify bottlenecks.

Best way to show image sequence as a movie in Adobe AIR

I need to show an image sequence as a movie in an Adobe AIR application - i.e. treat lots of images as video frames and show the result. For now I am going to try simply loading them and displaying in a movie clip but this might be too slow. Any advanced ideas how to make it work? Images are located on a hard drive or very fast network share, so the bandwidth should be enough. There can be thousands of them, so preloading everything to memory doesn't seem feasible.
Adobe AIR is not 100% decided, I am open to other ideas how to create a cross-platform desktop application for this purpose quickly enough.
You could have an image control as your movie frame, then load up a buffer of BitmapData objects. Fill the BitmapData objects with the images as they come in, and then call the image load function to load the next image in the buffer.
private function drawNextImage(bitmapData:BitmapData):void {
    movieFrame.load(new Bitmap(bitmapData));
}
In case the images aren't big but you have a lot of them, it can be interesting to group sequences on single bitmaps (à la mipmap). This way you can load, say, one bitmap containing 50 images, forming 2 seconds of video playback at 25 fps.
This method is especially useful online, where you want to limit the number of pings and handshakes causing slowness, but I reckon it can also be useful for optimizing loading, unloading and memory access.
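Locating a frame inside such a grouped bitmap is simple index arithmetic. The sketch below is in C++ for illustration and assumes the frames are laid out row by row in a grid of equal-sized cells; FrameRect and frameRectInSheet are hypothetical names, not AIR or Flash APIs.

```cpp
#include <cassert>

// Hypothetical helper type: a frame's rectangle inside the grouped bitmap.
struct FrameRect {
    int x, y, width, height;
};

// Returns the rectangle of frame number frameIndex (0-based) in a sheet whose
// frames are stored row by row, with the given number of columns per row.
FrameRect frameRectInSheet(int frameIndex, int columns,
                           int frameWidth, int frameHeight) {
    assert(frameIndex >= 0 && columns > 0);
    FrameRect r;
    r.x = (frameIndex % columns) * frameWidth;   // column within the row
    r.y = (frameIndex / columns) * frameHeight;  // row within the sheet
    r.width = frameWidth;
    r.height = frameHeight;
    return r;
}
```

With 50 frames in 10 columns, playback would step frameIndex from 0 to 49 and copy the matching rectangle to the display for each frame.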
