Cropping YUV_420_888 images for Firebase barcode decoding - firebase

I'm using the Firebase-ML barcode decoder in a streaming (live) fashion using the Camera2 API. The way I do it is to set up an ImageReader that periodically gives me Images. The full image is the resolution of my camera, so it's big - it's a 12MP camera.
The barcode scanner takes about 0.41 seconds to process an image on a Samsung S7 Edge, so I set up the ImageReaderListener to decode one at a time and throw away any subsequent frames until the decoder is complete.
The image format I'm using is YUV_420_888 because that's what the documentation recommends, and because if you try to feed the ML Barcode decoder anything else it complains (run time message to debug log).
All this is working but I think if I could crop the image it would work better. I'd like to leave the camera resolution the same (so that I can display a wide SurfaceView to help the user align his camera to the barcode) but I want to give Firebase a cropped version (basically a center rectangle). By "work better" I mean mostly "faster" but I'd also like to eliminate distractions (especially other barcodes that might be on the edge of the image).
This got me trying to figure out the best way to crop a YUV image, and I was surprised to find very little help. Most of the examples I have found online do a multi step process where you first convert the YUV image into a JPEG, then render the JPEG into a Bitmap, then scale that. This has a couple of problems in my mind
It seems like that would have significant performance implications (in real time). This would help me accomplish a few things including reducing some power consumption, improving the response time, and allowing me to return Images to the ImageReader via image.close() more quickly.
This approach doesn't get you back to Image, so you have to feed firebase a Bitmap instead, and that doesn't seem to work as well. I don't know what firebase is doing internally but I kind of suspect it's working mostly (maybe entirely) off of the Y plane and that the translation if Image -> JPEG -> Bitmap muddies that up.
I've looked around for YUV libraries that might help. There is something in the wild called libyuv-android but it doesn't work exactly in the format firebase-ml wants, and it's a bunch of JNI which gives me cross-platform concerns.
I'm wondering if anybody else has thought about this and come up with a better solution for croppying YUV_420_488 images in Android. Am I not able to find this because it's a relatively trivial operation? There's stride and padding to be concerned with among other things. I'm not an image/color expert, and i kind of feel like I shouldn't attempt this myself, my particular concern being I figure out something that works on my device but not others.
Update: this may actually be kind of moot. As an experiment I looked at the Image that comes back from ImageReader. It's an instance of ImageReader.SurfaceImage which is a private (to ImageReader) class. It also has a bunch of native tie-ins. So it's possible that the only choice is to do the compress/decompress method, which seems lame. The only other thing I can think of is to make the decision myself to only use the Y plane and make a bitmap from that, and see if Firebase-ML is OK with that. That approach still seems risky to me.

I tried to scale down the YUV_420_888 output image today. I think it quite similar to the cropping image.
In my case, I will put the 3 byte arrays from the Image.plane for representing the Y,U and V.
yBytes = Image.plane[0]
uBytes = Image.plane[1]
vBytes = Image.plane[2]
Then I convert it to the RGB array for bitmap converting. I found that if I read the YUV array with the original Image width and height by step 2 Then I can scale half of my Bitmap Image.
What should you prepare:
yBytes,
uBytes,
vBytes,
width of Image,
height of Image,
y row stride,
uv RowStride,
uv pixelStride,
output array (The length of output array length should be equal to the output image width * height. For me the size is 1/4 original width and height)
It means if you can find the cropping area positions(The four corner of the Image) in your image, you can just fill the new RGB array with the YUV data.
Hope it can help you to solve the problem.

Related

Proximity Distortion in Depth Image

Description:
The goal of my current project is to determine the location of an "object" with just its 3D-coordinates.
To achieve that I figured it'd be best to turn off the "Fill"-Mode of my Camera (ZED 2 from Stereolabs), because I want some hard edges in my depth-image.
The Problem:
The depth image is being distorted to a major degree due to proximity of other "objects".
The following image shows the depth image from the side, it is viewing some bars before a smooth woodwall. The wall is mostly plain, so everything is fine here.
I blacked the Color-Image and Myself, do not worry about those parts.
When I put my hand or another object in front of the wood wall parts that are bigger than my actual hand get "pulled" towards the camera around the location of the hand or other object. These parts seem to "stick" to other elevated parts in the proximity, as the area between the bars and my arm gets pulled entirely.
Question(s):
Is this normal?
Is there an easy way to get rid of it?
What is the reason behind it?
My own assumption(s):
Feel like this is some sort of approximation of unknown parts
Hopefully.. Glad the camera was calibrated by default, as that usually is a pain to do right.
Due to the new object that gets put in front of the wall, there is more stuff hidden and therefore more areas that the camera cannot see with both lenses, maybe it just "guesses" that the area between is not so far off due to some underlying algorithms that make the image smoother..
First of all I would advice you to change the depth mode also with keeping the sensing mode in STANDARD:
ULTRA: offers the highest depth range and better preserves Z-accuracy along the sensing range.
QUALITY: has a strong filtering stage giving smooth surfaces.
PERFORMANCE: designed to be smooth, can miss some details.
*********************From your description, it seems like you are using the Performance mode
The ZED Camera uses a matching alogorithm to generate the disparity/depth map, which is a closed source and I have recently contacted stereolabs about that and they've said "We cannot disclose this information to you because it's internal information and proprietary to Stereolabs."
Other works on the zed camera showed some limitations in depth sensing, specially when there is a variation in lightning and shadows. """Depth Data Error Modeling of the ZED 3D Vision Sensor from
Stereolabs"""
In addition to this, the depth error is directly proportional to the distance of the object from the camera, so make sure to set your depth range properly.

What device/instrument/technology should I use for detecting object’s lying on a given surface?

First of: Thanks for taking the time to help me with my problem. It is much appreciated :)
I am building a natural user interface. I’d like the interface to detect several (up to 40) objects lying on it. The interface should detect if the objects are moved on it’s the canvas. It is not important what the actual object on surface is
e.x. “bottle”
or what color it has – only the shape and the placement of the object is of interest
e.x. “circle” .
So far I’m using a webcam connected to my computer and Processing’s blob functionality to detect the objects on the surface of the interface (see picture 1). This has some major disadvantages to what I am trying to accomplish:
I do not want the user to see the camera or any alternative device because this is detracting the user’s attention. Actually the surface should be completely dark.
Whenever I am reaching with my hand to rearrange the objects on the interface, the blob detection gets very busy and is recognizing objects (my hand) which are not touching the canvas directly. This problem can hardly be tackled using a Kinect, because the depth functionality is not working through glass/acrylic glass – correct me if I am wrong.
It would be nice to install a few LEDs on the canvas controlled by an Arduino. Unfortunately, the light of the LEDs would disturb the blob detection.
Because of the camera’s focal length, the table needs to be unnecessarily high (60 cm / 23 inch).
Do you have any idea on an alternative device/technology to detect the objects? Would be nice if the device would work well with Processing and Arduino.
Thanks in advance! :)
Possibilities:
Use Reflective tinted glass so that the surface would dark or reflective
Illuminate the area, where you place the webcam with array of IR LED's.
I would suggest colour based detection and contouring of the objects.
If you are using colour based detection convert frames to HSV and CrCb colour space. These are much better for segmentation of required area while using colour based detection.
I do recommend you to check out https://github.com/atduskgreg/opencv-processing. This interfaces Open-CV with processing, you will be getting lot functionalities of Open-CV in processing .
One possibility:
Use a webcam with infrared capability (such as a security camera with built-in IR illumination). Apparently some normal webcams can be converted to IR use by removing a filter, I have no idea how common that is.
Make the tabletop out of some material that is IR-transparent, but opaque or nearly so to visible light. (Look at the lens on most any IR remote control for an example.)
This doesn't help much with #2, unfortunately. Perhaps you can be a bit pickier about the size/shape of the blobs you recognize as being your objects?
If you only need a few distinct points of illumination for #3, you could put laser diodes under the table, out of the path of the camera - that should make a visible spot on top, if the tabletop material isn't completely opaque. If you need arbitrary positioning of the lights - perhaps a projector on the ceiling, pointing down?
Look into OpenCV. It's an open source computer vision project.
In addition to existing ideas (which are great), I'd like to suggest trying TUIO Processing.
Once you have the camera setup (with the right field of view/lens/etc. based on your physical constraints) you could probably get away with sticking TUIO markers to the bottom of your objects.
The software will pickup detect the markers and you'll differentiate the objects by ID, but also be able to get position/rotation/etc. and your hands will not be part of that.

IMediaSample(DirectShow) to IDirect3DSurface9/IMFSample(MediaFoundation)

I am working on a custom video player. I am using a mix of DirectShow/Media Foundation in my architecture. Basically, I'm using DS to grab VOB frames(unsupported by MF). I am able to get a sample from DirectShow but am stuck on passing it to the renderer. In MF, I get a Direct3DSurface9 (from IMFSample), and present it on the backbuffer using the IDirect3D9Device.
Using DirectShow, I'm getting IMediaSample as my data buffer object. I don't know how to convert and pass this as IMFSample. I found others getting bitmap info from the sample and use GDI+ to render. But my video data may not always have RGB data. I wish to get a IDirect3DSurface9 or maybe IMFSample from IMediaSample and pass it for rendering, where I will not have to bother about color space conversion.
I'm new to this. Please correct me if I'm going wrong.
Thanks
IMediaSample you have from upstream decoder in DirectShow is nothing but a wrapper over memory backed buffer. There is no and cannot be any D3D surface behind it (unless you take care of it on your own and provide a custom allocator, in which case you would not have a question in first place). Hence, you are to memory-copy data from this buffer into MF sample buffer.
There you come to the question that you want buffer formats (media types) match, so that you could copy without conversion. One of the ways - and there might be perhaps a few - is to first establish MF pipeline and find out what exactly pixel type you are offered with buffers on the video hardware. Then make sure you have this pixel format and media type in DirectShow pipeline, by using respective grabber initialization or color space conversion filters, or via color space conversion DMO/MFT.

Storing pixel based world data

I am making a 2d game with destructable terrain. It will be on iOS but I am looking for ideas or pseudocode, not actual code. I'm wondering how to store a large amount of data. (It will be a large world, approximately 64000 pixels wide and 9600 tall. Each pixel needs a way to store what type of object it is.) I was hoping to use a 2D array but a quick load test showed that this is not feasable (even using a 640x480 grid I dropped below 1 fps)
I also tried the method detailed here: http://gmc.yoyogames.com/index.php?showtopic=315851 (I used to use Game Maker and remembered this method) however is seems a bit cumbersome and recombining the objects again is nearly impossible.
So what other methods are there? Does anyone know how Worms worked? What about image editors, how do they store the colour of each pixel?
Thankyou,
YM
Run-length encoding can help with your memory issues
I am most likely going to use Polygon based storage.

How to create a large Compatible Memory DC in GDI programming?

I want to create a large CompatibleDC, draw a large image on it, then bitblt part of the image to other DC, in order to achieve high performance. I am using the following code to create compatible Memory DC. But when the rect becomes very large, etc: 5000*5000, the CompatibleDC created become unstable. sometimes it is OK, sometimes it failed. is there any thing wrong with my code?
input :pInputDC
output:pOutputMemDC
{
pOutputMemDC=new CDC();
VERIFY(pOutputMemDC->CreateCompatibleDC(pInputDC));
CRect rect(0,0,nDCWidth,nDCHeight);
CBitmap bitmap;
if (bitmap.CreateCompatibleBitmap(pInputDC, rect.Width(), rect.Height()))
{
pOutputMemDC->SetViewportOrg(-rect.left, -rect.top);
m_pOldBitmap = pOutputMemDC->SelectObject(&bitmap);
}
CBrush brush;
VERIFY(brush.CreateSolidBrush(RGB(255,0, 0)));
brush.UnrealizeObject();
pOutputMemDC->FillRect(rect, &brush);
}
Instead of creating a large DC and then blitting a portion of it another, smaller DC, create a DC the same size as the destination DC, or at least the same size as the blit destination. Then, offset all your drawing commands by the (-x,-y) of the sub section you want to copy. If your destination is (100,200)-(400,400) on the source then create a DC (300x200) and offset everything by (-100,-200).
This has two big advantages: firstly, the memory required is much smaller. Secondly, GDI will clip your drawing operations to the size of the DC (it always clips anyway). Although the act of clipping takes CPU time, the time saved by not drawing pixels that aren't seen more than makes up for it.
Now, if this large DC is something like an image (JPEG for example) then you need to look into other methods. One technique used by many image editing programs is to split the image into tiles and page the tiles to/from memory/hard disk. Each tile is its own DC and you only have enough source DCs to fill the target DC. As the view window moves across the large image, unload tiles that have moved out of the target rectangle and load tiles that have become visible.
Each 5000x5000 pixel image needs ca. 100MB of RAM. Depending on how much RAM your PC has, this might already be the problem.
If you have 1GB of RAM or more, then that's probably not the issue. In this case, you must have a memory leak. Where do you free the allocated bitmap? I see that you unrealize the brush but how about the bitmap?
Note that increasing your swap won't help since that will kill your performance.
Make sure you are selecting all original GDI objects to the DCs.
The problem may be that your Bitmap is still selected into the pOutputMemDC when it is being destroyed and one of them or both can't be deleted properly. Thus problems with memory might begin.

Resources