Render YUV in JavaFX - javafx

I need to render a yuyv422 stream in JavaFX with minimum latency. If I convert it to RGB, I can use an ImageView with a WritableImage with a PixelFormat instance, and it works, but the RGB conversion consumes a lot of CPU, specially with high resolutions. I saw this exact feature request
https://bugs.openjdk.java.net/browse/JDK-8091933
but seems it will not be implemented in Java 9. And if it does, I wonder if it won't introduce latency or demand too much CPU. Is there another way using JavaFX?

In General:
Image processing is always expensive, this is why Vectorization or Hardware Acceleration is used for these tasks. Simple looping through an Image with just one thread is already really slow, especially in java. On top of that people tend to use Color objects for color modifications which is tremendously slow.
Pure Java:
If you want to keep your code in pure Java. You should check which internal format is used for the WriteableImage by calling:
myImage.getPixelWriter().getPixelFormat().getType()
If the internal format isn't RGB adapt your color conversion to the given format to avoid double conversion.
Additionally make sure that your code is optimized as much as possible:
-Don't use any objects except arrays
-Minimize the use of local variables
You can also try to multithread the conversion process via parallel loops.
JNI:
Moving away from Java opens up a lot of possibilities. There are several platform independent libraries for converting YUV to RGB and back:
OpenCV:
Easy to use and coming already with an java API:
byte[] myYuvImage = null; //your image here
byte[] myRgbImage = new byte[width * height * 3]; //the output image
Mat yuvMat = new Mat(height, width, CvType.CV_8UC2); //YUV422 should be 2 channel
Mat rgbMat = new Mat(height, width, CvType.CV_8UC3);
yuvMat.put(0,0, myYuvImage);
Imgproc.cvtColor(yuvMat, rgbMat, Imgproc.COLOR_YUV2RGB_Y422);
rgbMat.get(0, 0, myRgbImage);
Intel IPP:
Only available via JNI. You would use ippiRGBToYUV422_8u_C3C2R see RGBToYUV422 for more information.
SwScale as part of FFmpeg:
Only available via JNI. See this answer and adapt the example.
My personal experience is that IPP offers by far the best performance even on AMD machines. However the license it comes with may be free but it prohibits decompiling which might be an not compatible with LGPL libraries.

Related

How would I handle extra-large images in OpenCL?

I've been working on a PyOpenCL program that will take in an OpenCL kernel (representing an image filter) and an image and apply said filter to generate an output image. The issue is that I need to make this program run on an image of any size.
I've written a similar program before with C# and OpenCL using the Cloo (http://sourceforge.net/projects/cloo/) framework, but I wanted to make something more portable (since the Cloo framework fails to run properly on Linux).
Now, in my C# implementation, I simply split the image up into chunks and ran the kernel on each chunk. I did this by handling the images as plain byte arrays in my kernel. However, the issue I'm having now is that I'm attempting to use the image2d_t datatype in my PyOpenCL implementation, and I'm not sure how to go about breaking the image into chunks and passing them to the kernel.
Does the image2d_t class add padding to the returned images (that I would need to post-process), or perhaps it supports some sort of automated methodology that would handle this for me?
Any resources that would point me in the right direction are greatly appreciated!
Edit: I figured I should mention that the reason why I want to do this is because I run into memory allocation exceptions with my current build (because the images are too large).
I managed to work around it by splitting the image up using the Python Imaging Library's crop and paste functionality to process subimages and replace them into the output image after processing.

QT, convert raw image to jpg using Hardware acceleration (gpu)

I need to convert an Raw image buffer into a jpg image buffer.
At the moment, I do this operation in the following way:
QImage tmpImage
= QImage(rawImgBuffer, img_width, img_height, image.format ); //image.format=RGB888
QBuffer bufferJpeg(&ba);
bufferJpeg.open(QIODevice::WriteOnly);
tmpImage.save(&bufferJpeg, "JPG");
QByteArray finalJpgBuffer = bufferJpeg.data();
It works fine but the cpu load is too high (I have a lot of threads that do this operation a lot of time each second).
Reading the Qt documentation I found this article: Hardware Acceleration &amp Embedded Platforms.
If i understood, I can use the QPainter class to execute gpu operations...
Is it possible to do this convertion (from raw to jpg) using this class? (or another similar Qt class that use hardware acceleration (gpu))!!
My application need to be platform indipendent.
Thanx at all.
I don't think QImage uses the GPU to generate a jpeg.
This probably wouldn't help (except on very limited CPUs) since the transfer time of getting the data back out of the GPU would normally dominate. The reason for using hardware acceleration for display is that the result is then already in the GPU ready for display.
As far as I know decoding of image formats (jpeg in this case) is not handled by QPainter. It is done by Qt using libjpeg, which is controlled by Qt using a plugin. You can find the plugin in qt_source_tree/src/plugins/imageformats/jpeg. That is simply using the library you have available on your system (libjpeg.so in Linux). If it is hardware accelerated or not, it is up to your system.
I had a case in which hardware decoding required to use a specific library. In that case I had to create a specific Qt plugin to handle that.

Implement IP camera

We have a device that has an analog camera. We have a card that samples it and digitizes it. This is all done in directx. At this point in time, replacing hardware is not an option, but we need to code such that we can see this video feed real-time regardless of any hardware or underlying operating system changes occur in the future.
Along this line, we've chosen Qt to implement a GUI to view this camera feed. However, if we move to a linux or other embedded platform in the future and change other hardware (including the physical device where the camera/video sampler lives), we will need to change the camera display software as well, and that's going to be a pain because we need to integrate it into our GUI.
What i proposed was migrating to a more abstract model where data is sent over a socket to the GUI and the video is displayed live after being parsed from the socket stream.
First, is this a good idea or a bad idea?
Secondly, how would you implement such a thing? How do the video samplers usually give usable output? How can I push this output over a socket? Once I am on the receiving end parsing the output, how do I know what to do with the output (as in how to get the output to render)? The only thing I can think of would be to write each sample to a file and then to display the contents of the file every time a new sample arrives. This seems like an inefficient solution to me, if it would work at all.
How do you recommend I handle this? Are there any cross-platform libraries available for such a thing?
Thank you.
edit: i am willing to accept suggestions of something different rather than what is listed above.
Have you looked at QVision? It is a Qt based framework for managing video and video processing. You don't need the processing, but I think it will do what you want.
Anything that duplicates the video stream is going to cost you in performance, especially in an embedded space. In most situations for video, I think you're better off trying to use local hardware acceleration to blast the video directly to the screen. With some proper encapsulation, you should be able to use Qt for the GUI surrounding the video, and have a class that is platform specific that you use to control the actual video drawing to the screen (where to draw, and how big, etc.).
Edit:
You may also want to look at the Phonon library. I haven't looked at it much, but it appears to support showing video that may be acquired from a range of different sources.

Actionscript PNGEncoder performance and UI blocking

I'm trying to use PNGEncoder to encode a bitmapData object into a png ByteArray so I can send the data to the server. Everything would be peachy except the bitmapData is 4000x4000px and when I run the PNGEncoder.encode function on it the whole app stops (UI is blocked) for 5-8 seconds while it runs. Does anybody have any suggestions on how to not make it block so bad, I read about chunking up the process (since you can't multithread in AS3) but can't find any sample code on chunking up the process.
Thanks,
Sam
In addition to Arthur's comment, you could also write it in C/C++ for Alchemy, since alchemy supports green threads. Like PixelBender, Alchemy also requires Flash 10.
There are mainly two ways to do this.
a) Use pixel bender:
You can off load the work to pixel bender (a shade like language in as3). This has the advantage of using the gpu on some cases, but it also is assynchronous and non blocking (runs on another thread). But it does require player 10+. I haven't seen a pixel bender png encoder, and to be honest, it may not be possible (I am not familiar enough with png encoding to tell), but it might be an option. This is, performance wise, the best you can get. More info here
b) Use chuncking. Basically, you rewrite the encoder to encode blocks (lines, columns or a smaller area), and hook that to an enter frame event, each frame you'd call next on your encoder, until there is no more encoding to do. Zeh has a neat LWZ chunked encoder with source code that might give you insights into the details.
Cheers
Arthur
Another shameless plug!
You can use my recently completed PNGEncoder2 library (also requires Flash 10+), which handily supports gigantic images. It does proper asynchronous encoding, with no single compression step at the end. Additionally, it's really fast ;-)
Grab it from GitHub (README), and check out the benchmark comparing it with other encoders on my blog post.
It's highly tuned for speed, and uses the Alchemy opcodes and domain memory to speed it up (thanks to Haxe), so it should be comparable to anything you compile using Alchemy.
You could encode multiple PNG files separately and send them to the server. Once on the server you can reconstruct the larger image.
It's for JPEG encoding, but should be useful - look a this post http://segfaultlabs.com/blog/post/asynchronous-jpeg-encoding/
As Arthur Debert said, you can use chunking. I'd suggest that instead of encoding once/frame, you try a setTimeout( chunkingFunction, 0 ); approach. A timeout with a 0 ms delay will happen as soon as possible, allowing the chunking to process quickly but without crushing the UI.

OpenGL Threaded Tile Texture Loading with Qt 4.5 / 4.6

I am trying to develop am map application for scientific purposes at my university. Therefor I got access to a lot of tiles (256x256). I can access them and save them to an QImage in a seperate QThread. My Problem is, how can I actually manage to load the QImage into a texture within the seperate QThread (not the GUI main thread)? Or even better give me a Tipp how to approach this problem.
I though about multithreaded OpenGL but I also require OpenGL picking and I did not fall over anything usefull for that.#
Point me to any usefully example code if you feel like, I am thankfull for everything that compiles on Linux :)
Note1: I am using event based rendering, so only if the scene changes it gets redrawn.
Note2: OSG is NOT an option, it's far to heavy for that purpose, a leightweight approach is needed.
Note3: The Application is entirely written in C++
Thanks for any reply.
P.S. be patient, I am not that adavanced as this Topic may (or may not) suggest.
OpenGL is not thread-safe. You can only use one GL context in one thread at a time. Depending on OS you also have to explicitely give up on the context handle in one thread to use it in another.
You cannot speed up the texture loading by threading given that the bottleneck here is the bandwidth to the graphics card.
Let your delivery thread(s) that load the tiles fill up a ring buffer. The GL thread feeds from the ring buffer. With two mutexes it is easy to control the ring buffer to make this thread-safe operation.
That would be my suggestion.
Two tricks I use to speed things up:
pixel buffer objects: map GPU memory so the loading thread can write directly to gpu;
sync objects: with a sync object I know when the texture is really ready to be used (glTexImage2D with PBO is async so there is no guarantee the texture is ready to be binded, ie, when binding a texture, it blocks if DMA didn't finish updating texture data)

Resources