wxGraphicsContext dreadfully slow on Windows - GDI+

I've implemented a plotter using wxGraphicsContext. The development was done using wxGTK, and the graphics were very fast.
Then I switched to Windows (XP) with wxWidgets 2.9.0, and the same code is extremely slow. It takes about 350 ms to render a frame. Since the user can drag the plotter with the mouse to navigate, it feels very sluggish at such a slow update rate.
I've tried to implement some parts using wxDC and benchmarked the difference. With wxDC the code runs just about 100 times faster.
As far as I know both Cairo and GDI+ are implemented in software at this point, so there's no real reason Cairo should be so much faster than GDI+.
Am I doing something wrong? Or is the GDI+ implementation just not on par with Cairo?
One small note: I'm rendering to a wxBitmap now, with the wxGraphicsContext created from a wxMemoryDC. This is to avoid flicker on XP, since double buffering doesn't work there.
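For context, the off-screen setup looks roughly like this (a simplified sketch; the panel class, bitmap size and the path drawing are placeholders for the real plotter code, and the paint-handler binding is omitted):

    #include <wx/wx.h>
    #include <wx/graphics.h>   // wxGraphicsContext
    #include <wx/dcmemory.h>   // wxMemoryDC

    // Hypothetical panel; m_bitmap holds the pre-rendered frame.
    class MyPlotPanel : public wxPanel
    {
    public:
        MyPlotPanel(wxWindow* parent) : wxPanel(parent), m_bitmap(800, 600) {}

        void RenderFrame()
        {
            wxMemoryDC memDC(m_bitmap);
            wxGraphicsContext* gc = wxGraphicsContext::Create(memDC);
            if (gc)
            {
                gc->SetPen(*wxBLACK_PEN);
                wxGraphicsPath path = gc->CreatePath();   // plotter curves go here
                path.MoveToPoint(0.0, 0.0);
                path.AddLineToPoint(100.0, 100.0);
                gc->StrokePath(path);
                delete gc;                                // flush drawing to the bitmap
            }
            memDC.SelectObject(wxNullBitmap);
        }

        void OnPaint(wxPaintEvent&)
        {
            wxPaintDC dc(this);
            dc.DrawBitmap(m_bitmap, 0, 0);                // blit the finished frame
        }

    private:
        wxBitmap m_bitmap;
    };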

Excerpt from the cairo homepage:
Cairo is designed to produce consistent output on all output media while taking advantage of display hardware acceleration when available (eg. through the X Render Extension).

Related

OpenCL: clBuildProgram failed with error code -5

I ran into a problem when using clBuildProgram() on a GTX 750. The kernel failed to build with error code -5 (CL_OUT_OF_RESOURCES) and an empty build log.
One possible workaround is adding '-cl-nv-verbose' as a build option to clBuildProgram(); however, it doesn't work for all kernels.
Based on that, I tried another option, '-cl-opt-disable', which also works for some kernels.
Now I'm confused:
I cannot find the real cause of the error;
Why do different build options help for some kernels but not others?
The error also seems architecture dependent, since the same OpenCL code builds successfully on GTX 750 but fails on Tesla P100.
Does anyone have any ideas?
Possible reasons I can think of:
Running out of registers. This happens if you have a lot of (private) variables in your kernel code, especially arrays. Each core only has a certain amount of registers available (architecture dependent), and it may not be possible for the compiler to "spill" them to global memory. If this is the problem, you can try to rearrange your code so your variables have more limited scope, or you can try to move some arrays to local memory (bearing in mind this is shared between work items in a group, and also limited in size). A good GPU profiler/code analysis tool should be able to tell you how much register pressure there is, so if you've got the kernel working on some hardware, you should be able to find out register pressure for that, and draw conclusions for other hardware too. (A sketch of retrieving the build log, which with NVIDIA's '-cl-nv-verbose' option should include register statistics, follows this list.)
Code size itself. I wouldn't expect this to be much of a problem on modern GPUs, but it might be if you have truly gigantic kernels.
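Either way, it's worth dumping whatever build log the driver produces; on NVIDIA, passing '-cl-nv-verbose' should make the log include ptxas statistics such as register counts. A minimal host-side sketch (it assumes program and device already exist, and trims error handling):

    /* Build with NVIDIA's verbose option and print the build log. */
    #include <CL/cl.h>
    #include <stdio.h>
    #include <stdlib.h>

    static void build_and_report(cl_program program, cl_device_id device)
    {
        /* '-cl-nv-verbose' asks the NVIDIA compiler to emit ptxas statistics
           (including register usage) into the build log. */
        cl_int err = clBuildProgram(program, 1, &device, "-cl-nv-verbose", NULL, NULL);

        size_t log_size = 0;
        clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
                              0, NULL, &log_size);

        char *log = (char *)malloc(log_size + 1);
        clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
                              log_size, log, NULL);
        log[log_size] = '\0';

        printf("clBuildProgram returned %d\n", (int)err);
        printf("Build log:\n%s\n", log);
        free(log);
    }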

Is there any way to recover code from an Arduino?

So I made this device for my car using an Arduino. Great, it works almost perfectly; I only have minor debugging left to do. Then my computer crashed the other day and I lost all my files because I never thought to back anything up. I've put at least 20 hours into writing this code and I really don't want to have to try to write the whole thing again from memory just for minor debugging. Is there any way I can pull the current version of the code off the Arduino and store it as a file on my computer? I'm using a Mega 2560 and a MacBook Pro.
At best you may be able to read out the machine code from the Atmel device.
You'd then need to use a disassembler / decompiler to convert the code back to assembly / C.
Working with the disassembled code won't be easy:
you won't get the original names for your constants / functions / subroutines, just arbitrarily assigned labels, e.g. sub1.
To be honest, it would probably be quicker to start over.
Edit
If it were me, I'd whip the HDD out of the Mac and throw it at the mercy of some decent data recovery software.
Good luck

Resizing images (jpeg or decompressed image)

In my last question I asked whether there was a better way to rotate images than I had thought of. I ended up discovering jpegtran and have since found libjpeg-turbo.
Now I am looking for a better way to resize the images (JPEGs) than ImageMagick and GraphicsMagick.
Is there a specialized command-line tool that can resize the images more efficiently than ImageMagick or GraphicsMagick? Maybe the resizing could be done on the GPU using OpenCL or OpenGL?
The provided hardware is the same as in the other post:
Intel Atom D525 (1.8 GHz)
Mobility Radeon HD 5430 Series
4 GB of RAM
SSD Vertility 3
Check this link out: http://leocharre.com/articles/faster-image-resizing-in-linux/
In particular, the author mentions that imgresize is faster than ImageMagick, and that epeg is extremely fast.
epeg (http://www.systhread.net/texts/200507epeg1.php) seems quite well documented for generating thumbnails. If the quality is good enough, this could be the solution.
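If you end up calling epeg from code rather than from the command line, the C API is tiny. A rough sketch based on the functions declared in Epeg.h (verify the exact names against the version you install; the 1/4 scale and quality value are just example numbers):

    /* Thumbnail a JPEG with the Epeg library. */
    #include <Epeg.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s input.jpg output.jpg\n", argv[0]);
            return 1;
        }

        Epeg_Image *im = epeg_file_open(argv[1]);
        if (!im) {
            fprintf(stderr, "could not open %s\n", argv[1]);
            return 1;
        }

        int w = 0, h = 0;
        epeg_size_get(im, &w, &h);

        /* Decode straight to a quarter-size image; epeg is fast because it
           never decodes the full-resolution picture. */
        epeg_decode_size_set(im, w / 4, h / 4);
        epeg_quality_set(im, 85);
        epeg_file_output_set(im, argv[2]);
        epeg_encode(im);
        epeg_close(im);
        return 0;
    }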
OpenCL is a standard for cross-platform, parallel programming of modern processors found in personal computers, servers and handheld/embedded devices. It is directly supported by ATI. You'll need the AMD APP SDK (formerly known as the AMD Stream SDK) to get GPU support (also check out this getting started guide).
Take a look at Intel's IPP - Integrated Performance Primitives. It's a multi-threaded software library of functions for multimedia and data processing applications. Among other features, it has functions to resize images (bilinear, nearest neighbor, etc.). Unfortunately, it is not free (the cheapest version costs $199).
VIPS is a free image processing system. It claims that compared to most image processing libraries, VIPS needs little memory and runs quickly, especially on machines with more than one CPU. See the Speed and Memory Use page for a simple benchmark against other similar systems.
You can actually do a lot of bulk processing like this with GIMP's CLI options.
http://www.gimp.org/tutorials/Basic_Batch/
There are also djpeg and cjpeg from the Independent JPEG Group, which can rescale an image to an M/N fraction. Not perfect, but very fast.
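The same M/N trick is exposed programmatically through libjpeg's scale_num/scale_denom fields, in case you would rather do it in code than shell out to djpeg. A rough sketch (the file name and the 1/4 factor are placeholders, and the decoded scanlines are simply discarded here):

    /* Decode a JPEG at 1/4 size using libjpeg's DCT scaling
       (the same mechanism behind djpeg -scale M/N). */
    #include <stdio.h>
    #include <jpeglib.h>

    int main(int argc, char **argv)
    {
        if (argc != 2) return 1;
        FILE *in = fopen(argv[1], "rb");
        if (!in) return 1;

        struct jpeg_decompress_struct cinfo;
        struct jpeg_error_mgr jerr;
        cinfo.err = jpeg_std_error(&jerr);

        jpeg_create_decompress(&cinfo);
        jpeg_stdio_src(&cinfo, in);
        jpeg_read_header(&cinfo, TRUE);

        /* M/N downscaling done by the decoder itself: classic libjpeg
           supports 1/1, 1/2, 1/4, 1/8; libjpeg-turbo supports M/8. */
        cinfo.scale_num = 1;
        cinfo.scale_denom = 4;

        jpeg_start_decompress(&cinfo);
        printf("decoding at %ux%u\n", cinfo.output_width, cinfo.output_height);

        JSAMPARRAY row = (*cinfo.mem->alloc_sarray)
            ((j_common_ptr)&cinfo, JPOOL_IMAGE,
             cinfo.output_width * cinfo.output_components, 1);

        while (cinfo.output_scanline < cinfo.output_height)
            jpeg_read_scanlines(&cinfo, row, 1);   /* row[0] holds one scanline */

        jpeg_finish_decompress(&cinfo);
        jpeg_destroy_decompress(&cinfo);
        fclose(in);
        return 0;
    }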
Simply use FFmpeg. It can resize, convert, change quality and so on.
It also works with almost all known types of videos/audios/pictures.
It runs on Linux/Unix too, and its source code is freely available.
You can get it here (Windows, compiled exe) or here (source code and so on).
If you are developing a program, I recommend using the standard GDI+ library.
It can do everything with pictures.
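If you do go the GDI+ route, a minimal resize might look roughly like this (file name and the 1/2 scale are placeholders; saving back to JPEG needs an encoder CLSID lookup, which is left out to keep the sketch short):

    // Load an image and draw it scaled into a smaller bitmap with GDI+.
    #include <windows.h>
    #include <gdiplus.h>
    #pragma comment(lib, "gdiplus.lib")
    using namespace Gdiplus;

    int main()
    {
        GdiplusStartupInput startupInput;
        ULONG_PTR token = 0;
        GdiplusStartup(&token, &startupInput, NULL);

        {
            Bitmap source(L"input.jpg");                 // placeholder file name
            const INT newW = source.GetWidth()  / 2;
            const INT newH = source.GetHeight() / 2;

            Bitmap resized(newW, newH, PixelFormat24bppRGB);
            Graphics g(&resized);
            g.SetInterpolationMode(InterpolationModeHighQualityBicubic);
            g.DrawImage(&source, 0, 0, newW, newH);      // scaled draw
            // resized.Save(L"output.jpg", &jpegClsid) would go here.
        }

        GdiplusShutdown(token);
        return 0;
    }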

Actionscript PNGEncoder performance and UI blocking

I'm trying to use PNGEncoder to encode a BitmapData object into a PNG ByteArray so I can send the data to the server. Everything would be peachy except the BitmapData is 4000x4000 px, and when I run the PNGEncoder.encode function on it the whole app stops (the UI is blocked) for 5-8 seconds while it runs. Does anybody have any suggestions on how to make it not block so badly? I've read about chunking up the process (since you can't multithread in AS3) but can't find any sample code for doing that.
Thanks,
Sam
In addition to Arthur's comment, you could also write it in C/C++ for Alchemy, since Alchemy supports green threads. Like Pixel Bender, Alchemy also requires Flash Player 10.
There are mainly two ways to do this.
a) Use Pixel Bender:
You can offload the work to Pixel Bender (a shader-like language usable from AS3). This has the advantage of using the GPU in some cases, and it is also asynchronous and non-blocking (it runs on another thread). But it does require Player 10+. I haven't seen a Pixel Bender PNG encoder, and to be honest, it may not be possible (I am not familiar enough with PNG encoding to tell), but it might be an option. This is, performance-wise, the best you can get. More info here.
b) Use chunking. Basically, you rewrite the encoder to encode blocks (lines, columns or a smaller area) and hook that to an enter-frame event; each frame you'd call next on your encoder, until there is no more encoding to do. Zeh has a neat chunked LZW encoder with source code that might give you insights into the details.
Cheers
Arthur
Another shameless plug!
You can use my recently completed PNGEncoder2 library (also requires Flash 10+), which handily supports gigantic images. It does proper asynchronous encoding, with no single compression step at the end. Additionally, it's really fast ;-)
Grab it from GitHub (README), and check out the benchmark comparing it with other encoders on my blog post.
It's highly tuned for speed, and uses the Alchemy opcodes and domain memory to speed it up (thanks to Haxe), so it should be comparable to anything you compile using Alchemy.
You could encode multiple PNG files separately and send them to the server. Once on the server you can reconstruct the larger image.
It's for JPEG encoding, but it should still be useful - look at this post: http://segfaultlabs.com/blog/post/asynchronous-jpeg-encoding/
As Arthur Debert said, you can use chunking. I'd suggest that instead of encoding once per frame, you try a setTimeout( chunkingFunction, 0 ); approach. A timeout with a 0 ms delay will fire as soon as possible, allowing the chunking to proceed quickly but without crushing the UI.

In terms of performance, which is better Flex or Silverlight?

In general, which performs better? What are they like when processing vector graphics?
Bubblemark is the premier benchmarking site for RIAs.
Note at least one comparison image showing better drawing in Flash.
The GUIMark test is very interesting; the initial results showed Silverlight performance as poor. If you read down into the comments, the cause was identified as being partly a coding problem with timing and partly a major speed issue with Silverlight's text rendering.
So, one key issue is whether text forms a major part of what you wish to render at speed.
Aside: I did a realtime graphing engine in Cocoa back in 2003 where I ended up using traditional QuickDraw rendering, because the heavily anti-aliased and fancy text rendering of Quartz graphics on Mac OS X at the time was way too slow. Fast, good-looking text is not easy!

Resources