Qt and "SGX Out of mem event" on Maemo - qt

I'm still fighting with Qt and I managed to get an "SGX Out of mem event" on Nokia N900. It happens when I load some .obj models in my QGraphicsScene (usually after the fourth-fifth). Any idea on what is causing it or how I could trace it?

My guess is that you are running out of graphics memory (the N900's GPU is an Imagination Technologies PowerVR SGX 530).
As far as I can see, the N900 does not have any EGL extensions for directly querying graphics memory usage. In this case, the best you can do may be to reduce graphics memory usage by limiting the complexity of the scene you are trying to render - in other words, load fewer OBJ models, or reduce the complexity (number of polygons) of individual models.

Related

Mouse pointer lags in Qt 5 app on TI Sitara

We are using a TI Sitara AM33 system on chip with a 600 MHz clock and 256 MB of RAM.
The OS is OE Yocto v2.1 Krogoth, kernel 4.4.19. Video driver: DRM/KMS.
We are having issues with mouse performance.
I have made a short video to demonstrate the effect:
https://www.youtube.com/watch?v=5dRDGzhcnn0
Note how the mouse pointer moves smoothly over the blank area of the window and lags over the controls. It's as if it is going through jelly. The more controls there are on the window, the laggier the mouse becomes, to the point of being unusable. CPU load is minimal though.
There can be no error in the example app in the video: we created a blank Qt Widgets project and put the controls on the form, and that's it. It is not doing anything else at all.
Has anyone seen such mouse issues?
If you're not using an X server, then you need to check which platform plugin Qt is using on your platform. Perhaps that plugin is broken or not the best choice in your situation.
Your application is also very unlikely to use the GPU in any capacity other than to composite the windows (if at all), so the low CPU load is rather telling.
It seems as if the event dispatch system on your platform gets slower the more widgets there are. This is unlikely to have much to do with the graphics side of things. As a process of elimination, you could first benchmark the performance of the synchronization primitives (QBasicMutex and QMutex) and of atomic integers and pointers, to ensure that they are configured correctly for your platform.
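A minimal sketch of such a benchmark is below. It uses std::mutex and std::atomic as stand-ins so that it compiles without Qt; this is an assumption made for illustration, and in a Qt build you would time QMutex and QAtomicInt in the same loops. The function names are hypothetical. The absolute numbers matter less than whether they look unreasonable for a 600 MHz core.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <mutex>

// Average cost, in nanoseconds, of one uncontended lock/unlock pair.
// std::mutex is a stand-in; swap in QMutex to test Qt's primitive.
double nsPerMutexLockUnlock(std::uint64_t iterations) {
    assert(iterations > 0);
    std::mutex m;
    std::uint64_t guarded = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        std::lock_guard<std::mutex> lock(m);
        ++guarded;
    }
    const auto stop = std::chrono::steady_clock::now();
    assert(guarded == iterations);
    const double totalNs =
        std::chrono::duration<double, std::nano>(stop - start).count();
    return totalNs / static_cast<double>(iterations);
}

// Average cost, in nanoseconds, of one atomic increment.
// std::atomic is a stand-in; swap in QAtomicInt to test Qt's primitive.
double nsPerAtomicIncrement(std::uint64_t iterations) {
    assert(iterations > 0);
    std::atomic<std::uint64_t> counter{0};
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        counter.fetch_add(1, std::memory_order_relaxed);
    }
    const auto stop = std::chrono::steady_clock::now();
    assert(counter.load() == iterations);
    const double totalNs =
        std::chrono::duration<double, std::nano>(stop - start).count();
    return totalNs / static_cast<double>(iterations);
}
```

Calling each function with, say, one million iterations and printing the result gives a rough per-operation cost. If either figure comes out in the microsecond range, the primitives are probably falling back to a slow generic implementation.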

Shipping reliable OpenCL applications - Tools/Techniques/Tips?

I want to ship OpenCL code that should work on all OpenCL 1.1 compatible GPUs. Rather than buying a bunch of GPUs and testing on them, are there any tools that can help ensure reliability?
If anyone has experience shipping OpenCL applications to a wide hardware base, I'd be interested in knowing about any other methods for testing reliability.
I have a bit of knowledge on this. Unfortunately, the answer is: it depends on what the kernel is doing.
My biggest gripe is with NVIDIA and OpenCL, since they don't seem to support vectors (float2, float4, etc.) and global offsets. Kind of obnoxious. Intel and ATI are both solid, but even then vector sizes can differ. None of this really matters if you are doing image convolution.
It does matter if you want to run AMD FFT on an NVIDIA card, are doing matrix math, etc. To address the vector issue, you can write multiple kernels that each use a different vector size and call the right one: MatrixMult_float4(...).
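One way to pick "the right one" on the host side could look like the sketch below. The helper name and the set of available variants are hypothetical; the only thing taken from the answer is the MatrixMult_float4 naming pattern. In a real program, preferredWidth would come from clGetDeviceInfo with CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT, and the returned name would be passed to clCreateKernel.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: maps a device's preferred float vector width to the
// name of a kernel variant, e.g. ("MatrixMult", 4) -> "MatrixMult_float4".
// Assumes the .cl source defines variants for widths 16, 8, 4 and 2, plus a
// scalar fallback named "<base>_float".
std::string kernelNameForWidth(const std::string& base, unsigned preferredWidth) {
    assert(!base.empty());
    const unsigned available[] = {16, 8, 4, 2};  // widest first
    for (unsigned w : available) {
        if (preferredWidth >= w) {
            return base + "_float" + std::to_string(w);
        }
    }
    return base + "_float";  // scalar variant
}
```

Choosing the widest variant that does not exceed the reported width keeps the selection conservative on devices that report unusual values.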
You can check whether your code compiles by using the AMD KernelAnalyzer2, although this needs some component of the Catalyst drivers, so it only works for me on PCs with AMD GPUs. There is also the Intel Kernel Builder, which works for devices with Intel OpenCL SDK support. In my experience NVIDIA's implementation has bugs, especially on newer GPUs, so there the best approach is to test on one GPU from each generation.
To avoid extensions and validate CL language versions, one could try test-compiling the code with LLVM, or just obtain the grammar for validation, e.g. as BNF.
There's a promising open source project, which probably contains useful stuff: http://bazaar.launchpad.net/~pocl/pocl/master/files/head:/lib/CL/
However, the problems I encountered were:
Newline characters: different line endings (CR, LF, CRLF) in OpenCL source files broke the build on certain implementations. Specifying one of these as the only valid line ending would be just stupid, and it gets inconvenient when source files are edited on different platforms in conjunction with an SCM. So I remove comments and clean up line breaks before compilation.
Performance: feeding the GPU efficiently using multithreading. Different hardware configurations have different bottlenecks. Here I needed a client-side pipeline with multiple dispatcher threads. Of course, the amount of work that remains for the CPU depends on the task and on the capabilities, number and resources of the computing devices. Work that needed serialized execution or dynamic loop counts was a typical candidate for staying on the CPU.
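The clean-up step for the newline problem can be sketched roughly as follows. This is not the answerer's actual code, just one plausible way to do it: the function name is hypothetical, and the comment stripper is deliberately simple (it respects double-quoted strings but ignores corner cases such as backslash-continued line comments).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical pre-processing step applied to OpenCL source before it is
// handed to the driver: converts CRLF and lone CR to LF, then removes
// // and /* */ comments. Newlines inside block comments are kept so that
// compiler line numbers still match the original file.
std::string normalizeKernelSource(const std::string& in) {
    // Pass 1: unify line endings.
    std::string src;
    src.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\r') {
            src += '\n';
            if (i + 1 < in.size() && in[i + 1] == '\n') ++i;  // swallow LF of CRLF
        } else {
            src += in[i];
        }
    }

    // Pass 2: strip comments with a small state machine.
    enum class State { Code, LineComment, BlockComment, String };
    State state = State::Code;
    std::string out;
    out.reserve(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        const char c = src[i];
        const char next = (i + 1 < src.size()) ? src[i + 1] : '\0';
        switch (state) {
        case State::Code:
            if (c == '/' && next == '/') { state = State::LineComment; ++i; }
            else if (c == '/' && next == '*') { state = State::BlockComment; ++i; }
            else { if (c == '"') state = State::String; out += c; }
            break;
        case State::LineComment:
            if (c == '\n') { state = State::Code; out += c; }
            break;
        case State::BlockComment:
            if (c == '*' && next == '/') { state = State::Code; ++i; out += ' '; }
            else if (c == '\n') out += c;
            break;
        case State::String:
            out += c;
            if (c == '\\' && next != '\0') { out += next; ++i; }  // escaped char
            else if (c == '"') state = State::Code;
            break;
        }
    }
    assert(out.find('\r') == std::string::npos);
    return out;
}
```

The result is what would be passed to clCreateProgramWithSource, so every implementation sees the same LF-only, comment-free text regardless of the platform the file was edited on.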

Resizing images (jpeg or decompressed image)

In my last question I asked whether there was a better way to rotate images than I had thought of. I ended up discovering jpegtran and have since found libjpeg-turbo.
Now I am looking for a better way to resize the images (jpegs) than imagemagick and graphicsmagick.
Is there a specialized command-line tool that resizes images more efficiently than imagemagick or graphicsmagick? Maybe the resizing can be done on the GPU using OpenCL or OpenGL?
The provided hardware is the same as in the other post:
Intel Atom D525 (1.8 GHz)
Mobility Radeon HD 5430 Series
4 GB of RAM
SSD Vertility 3
Check this link out: http://leocharre.com/articles/faster-image-resizing-in-linux/
In particular the author mentions that imgresize is faster than imagemagick, and epeg is extremely fast.
epeg (http://www.systhread.net/texts/200507epeg1.php) seems quite well documented for generating thumbnails. If the quality is good enough, this could be the solution.
OpenCL is a standard for cross-platform, parallel programming of modern processors found in personal computers, servers and handheld/embedded devices. It's directly supported by ATI. You'll need to get AMD APP SDK (formerly known as AMD Stream SDK) to get GPU support (also check out this getting started guide).
Take a look at Intel's IPP - Integrated Performance Primitives. It's a multi-threaded software library of functions for multimedia and data processing applications. Among other features, it has functions to resize images (bilinear, nearest neighbor, etc.). Unfortunately, it is not free (the cheapest version costs $199).
VIPS is a free image processing system. It claims that compared to most image processing libraries, VIPS needs little memory and runs quickly, especially on machines with more than one CPU. See the Speed and Memory Use page for a simple benchmark against other similar systems.
You can actually do a lot of bulk processing like this with GIMP's CLI options.
http://www.gimp.org/tutorials/Basic_Batch/
There are also djpeg and cjpeg from the Independent JPEG Group, which can rescale an image by an M/N fraction. Not perfect but very fast.
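Because the decoder only scales by fixed fractions, a common pattern is to let it do the coarse reduction and finish with an exact resize in another tool. The sketch below picks the denominator for something like "djpeg -scale 1/N in.jpg | cjpeg > out.jpg". It assumes the classic 1/1, 1/2, 1/4 and 1/8 factors and that scaled dimensions are rounded up; newer libjpeg and libjpeg-turbo builds accept more fractions. The function name is hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: returns the largest N in {8, 4, 2, 1} such that the
// source scaled by 1/N is still at least as large as the target in both
// dimensions. Assumes the decoder rounds scaled dimensions up.
unsigned pickScaleDenominator(unsigned srcW, unsigned srcH,
                              unsigned dstW, unsigned dstH) {
    assert(srcW > 0 && srcH > 0 && dstW > 0 && dstH > 0);
    const unsigned denominators[] = {8, 4, 2};  // most aggressive first
    for (unsigned n : denominators) {
        const unsigned w = (srcW + n - 1) / n;  // ceil(srcW / n)
        const unsigned h = (srcH + n - 1) / n;  // ceil(srcH / n)
        if (w >= dstW && h >= dstH) {
            return n;
        }
    }
    return 1;  // no reduction possible without undershooting the target
}
```

For a 4000x3000 source and an 800x600 target this selects 1/4, which leaves a 1000x750 intermediate for the final resize.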
Simply use FFmpeg. It can resize, convert, change quality and so on.
It also works with almost all known types of video, audio and picture files.
It works on Linux/Unix too, and it is open source, written in C.
You can get it Here (for Windows/compiled exe) or Here (source code and so on).
If you are developing a program, I recommend using the standard GDI+ library.
It does everything with pictures.

Developing with OpenCL on ATI and NVIDIA at the same time

Our workgroup is slowly trying out a little bit of OpenCL in a side project. So far everybody is working on an NVIDIA Quadro FX 580. Now we are planning to buy new computers for new colleagues, and instead of the FX 580 we could buy the ATI FirePro V4800, which costs only 15 EUR more and gives us 1 GB instead of 512 MB of RAM, which will be beneficial for our data-intensive tasks.
So, how much trouble is it to develop OpenCL code for NVIDIA and ATI at the same time?
I read the following SO question, Running OpenCL on hardware from mixed vendors, which was very pessimistic about developing on/for different vendors. On the other hand, the question is already a year old.
What do you recommend?
I have previously worked extensively with CUDA.
I have been planning to start developing apps using OpenCL. As you mentioned, one of the best features of OpenCL is that it runs on hardware from many vendors (Intel, AMD and NVIDIA).
One project I came across that uses OpenCL extensively for large-scale development is http://sourceforge.net/projects/hypgad/. It might be a good idea to look at this group's source code and understand how they developed their application for so many kinds of hardware, including the Sony Cell processor.
Another approach would be to use PyOpenCL, which provides a higher level of abstraction than OpenCL and can significantly reduce the coding effort.
Do you need the code to run unchanged on both bits of hardware? If so you may have to develop for a limited subset of common functions.
If you can run slightly different code on each you will probably get better performance. In CUDA/OpenCL you generally have to tune the algorithms for the amount of RAM and the number of GPU engines anyway, so it shouldn't be much more work to also tweak for NVIDIA/AMD.
The biggest problem is workgroup sizes. Some ATI cards I have used crash at sizes above 64, but then it may be the Apple OS X 10.6 drivers I am using.
Developing for both ATI and NVIDIA is actually not too difficult so long as you avoid using any part of either vendor's SDK. Stick to OpenCL as it is defined in the OpenCL spec (www.khronos.org/opencl) and your code will stay syntax portable. Due to differences in the underlying architectures, performance portability may be an issue. Local and global work sizes really have to be determined independently for each card to maximize performance. Another thing to pay attention to is the types being used. Vector types (float2, float4) are especially useful on ATI cards, as each processing element actually contains 4 execution units (one for each RGB color channel, plus alpha).
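The per-card work size arithmetic can be kept in two small helpers, sketched below with hypothetical names. The device limit would come from clGetDeviceInfo with CL_DEVICE_MAX_WORK_GROUP_SIZE (or clGetKernelWorkGroupInfo with CL_KERNEL_WORK_GROUP_SIZE for a specific kernel). The rounding matters because OpenCL 1.x requires the global size to be a multiple of the local size.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: limits a requested local work size to what the
// device (or kernel) reports as its maximum.
std::size_t clampLocalSize(std::size_t requested, std::size_t deviceMax) {
    assert(requested > 0 && deviceMax > 0);
    return requested < deviceMax ? requested : deviceMax;
}

// Hypothetical helper: rounds the number of work items up to the next
// multiple of the local size, giving a valid global size for
// clEnqueueNDRangeKernel.
std::size_t roundUpGlobalSize(std::size_t workItems, std::size_t localSize) {
    assert(localSize > 0);
    return ((workItems + localSize - 1) / localSize) * localSize;
}
```

Since the rounded global size can exceed the real item count, the kernel should take the item count as an argument and return early for indices beyond it.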

OpenGL Threaded Tile Texture Loading with Qt 4.5 / 4.6

I am trying to develop a map application for scientific purposes at my university. For this I have access to a lot of tiles (256x256). I can access them and save them to a QImage in a separate QThread. My problem is: how can I actually load the QImage into a texture within the separate QThread (not the GUI main thread)? Or even better, give me a tip on how to approach this problem.
I thought about multithreaded OpenGL, but I also require OpenGL picking and I did not come across anything useful for that.
Point me to any useful example code if you feel like it; I am thankful for everything that compiles on Linux :)
Note1: I am using event based rendering, so only if the scene changes it gets redrawn.
Note2: OSG is NOT an option, it's far too heavy for that purpose; a lightweight approach is needed.
Note3: The Application is entirely written in C++
Thanks for any reply.
P.S. Be patient, I am not as advanced as this topic may (or may not) suggest.
OpenGL is not thread-safe. You can only use one GL context in one thread at a time. Depending on the OS, you also have to explicitly release the context in one thread before you can use it in another.
You cannot speed up the texture loading by threading given that the bottleneck here is the bandwidth to the graphics card.
Let the delivery thread(s) that load the tiles fill up a ring buffer, and let the GL thread consume from it. With two mutexes it is easy to make the ring buffer thread-safe.
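A minimal sketch of such a ring buffer is below. Note that it swaps in a single mutex guarding both ends rather than the two mutexes mentioned above, which is simpler and usually sufficient for a handful of loader threads. The class name is hypothetical, and both operations are non-blocking so the GL thread never stalls while drawing. In the tile loader, T would be something like a QImage plus tile coordinates.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <utility>

// Hypothetical fixed-capacity ring buffer shared between loader threads
// (producers) and the GL thread (consumer). One mutex guards all state.
template <typename T, std::size_t Capacity>
class TileRing {
    static_assert(Capacity > 0, "ring buffer needs at least one slot");

public:
    // Called by loader threads. Returns false if the buffer is full, in
    // which case the caller can retry later.
    bool tryPush(T item) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (count_ == Capacity) return false;
        slots_[(head_ + count_) % Capacity] = std::move(item);
        ++count_;
        return true;
    }

    // Called by the GL thread. Returns false if nothing is waiting.
    bool tryPop(T& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (count_ == 0) return false;
        out = std::move(slots_[head_]);
        head_ = (head_ + 1) % Capacity;
        --count_;
        assert(count_ < Capacity);
        return true;
    }

private:
    std::mutex mutex_;
    std::array<T, Capacity> slots_{};
    std::size_t head_ = 0;   // index of the oldest element
    std::size_t count_ = 0;  // number of stored elements
};
```

The GL thread would call tryPop at the start of each repaint and upload whatever tiles it receives with glTexImage2D, so all GL calls stay in the thread that owns the context.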
That would be my suggestion.
Two tricks I use to speed things up:
pixel buffer objects: map GPU memory so the loading thread can write directly to the GPU;
sync objects: with a sync object I know when the texture is really ready to be used (glTexImage2D with a PBO is asynchronous, so there is no guarantee the texture is ready to be bound; i.e., binding the texture blocks if the DMA transfer has not finished updating the texture data).

Resources