I have an image from which I would like to extract the number, but in a dynamic way (I don't want to specify an ROI because the image may vary), so I have to filter it. I tried to detect the horizontal line (to crop the image), but it failed. I would like to detect high-density zones in the binary image (the face and the top of the image).
PS: my problem isn't how to extract the numbers but how to specify the ROI,
and all the images have the same format.
Any help would be appreciated (even without code, just the big lines).
Thanks.
the image
I would start from detecting the frame of the whole document.
If you google "rectangle detection opencv", you will find lots of examples.
In the second stage I would apply inRange to filter the purple line and detect it with HoughLines.
This should be enough to calculate the ROI.
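A minimal sketch of that second stage, assuming the line is purple-ish in HSV (the bounds and the file name are placeholders to tune):

import cv2
import numpy as np

img = cv2.imread("document.png")   # placeholder path
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Keep only purple-ish pixels; these HSV bounds are assumptions to tune.
mask = cv2.inRange(hsv, (125, 50, 50), (155, 255, 255))

# Detect line segments on the mask.
lines = cv2.HoughLinesP(mask, 1, np.pi / 180, threshold=100,
                        minLineLength=100, maxLineGap=10)
if lines is not None:
    x1, y1, x2, y2 = lines[0][0]   # crop above/below this line to get the ROI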
I have developed a framework in R to automatically measure vegetation structure variables from whiteboard photos taken on grasslands for ecology-related studies. Until now we have preprocessed the images by hand; however, we now need to automate the rotation and cropping of the images.
The idea is the following: use reference marks on the whiteboard, and detect these markings to rotate and crop the original photographs. I need help detecting the reference markings. After knowing the positions of the reference markings (centroids), we can calculate the coordinates/pixels at which to crop the image. In the end, we want to get a picture like that.
We can use some special colour for the markings, but these can be obstructed by the vegetation. The bottom of the whiteboard is always obstructed; the cropped part (without the reference markings) should be 25×100 cm.
Possibly edge detection could be a solution. I'm familiar only with R programming.
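Not R, but here is a minimal OpenCV/Python sketch of the colour-marker idea; the colour bounds and file name are assumptions, and the same steps should translate to R image-processing packages:

import cv2
import numpy as np

img = cv2.imread("whiteboard.jpg")   # placeholder path
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Threshold on the marker colour; these HSV bounds are assumptions to tune.
mask = cv2.inRange(hsv, (40, 80, 80), (80, 255, 255))

# Centroids of the detected blobs (ideally one per visible reference mark).
n, labels, stats, centroids = cv2.connectedComponentsWithStats(mask)
marks = centroids[1:]   # row 0 is the background

# With two marks, the angle between them gives the rotation to undo.
(x1, y1), (x2, y2) = marks[0], marks[1]
angle = np.degrees(np.arctan2(y2 - y1, x2 - x1))
M = cv2.getRotationMatrix2D((x1, y1), angle, 1.0)
rotated = cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))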
I am new to the field of medical imaging and trying to solve this (potentially basic) problem. For a machine learning purpose, I am trying to standardize and normalize a library of DICOM images, to ensure that all images have the same rotation and are at the same scale (e.g. in mm). I have been playing around with the Mango viewer, and understand that one can create transformation matrices that might be helpful in this regard. I have, however, the following basic questions:
I would have thought that a scaling of the image would have changed the pixel spacing in the image header. Does this tag not provide the distance between pixels, and should this not change as a result of scaling?
What is the easiest way to standardize a library of images (ideally in Python)? Is it possible (and advisable) to extract a mean pixel spacing across all images and then scale all images to match that mean? Or is there a smarter way to ensure consistency in scaling and rotation?
Many thanks in advance, W
Does this tag not provide the distance between pixels, and should this not change as a result of scaling?
Think of the image voxels as fixed units of space, which are sampling your image. When you apply your transform, you are translating/rotating/scaling your image around within these fixed units of space. That is, the size and shape of the voxels doesn't change. They just sample different parts of your image.
You can resample your image by making your voxels bigger or smaller or changing their shape (pixel spacing), but this can be independent of the transform you are applying to the image.
What is the easiest way to standardize a library of images (ideally in Python)?
One option is FSL-FLIRT, although it only accepts data in NIFTI format, so you'd have to convert your DICOMs to NIFTI. There is also this Python interface to FSL.
Is it possible (and advisable) to extract a mean pixel spacing across all images and then scale all images to match that mean? Or is there a smarter way to ensure consistency in scaling and rotation?
I think you'd just have to pick a reference image to register all your other images to. There's no right answer: picking the highest-resolution image's voxel dimensions, or an average, or resampling into some other set of dimensions all sound reasonable.
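One concrete way to do that in Python would be to pick a reference image and resample everything else onto its grid, e.g. with SimpleITK; this is a minimal sketch with placeholder file names and an identity transform (a registration result would normally go there):

import SimpleITK as sitk

# The reference image defines the output grid (spacing, size, orientation).
reference = sitk.ReadImage("reference.dcm")   # placeholder path
moving = sitk.ReadImage("other.dcm")          # placeholder path

# Resample 'moving' onto the reference grid.
resampled = sitk.Resample(
    moving,
    reference,
    sitk.Transform(),     # identity transform (placeholder)
    sitk.sitkLinear,      # linear interpolation
    0,                    # default value for pixels outside the input
    moving.GetPixelID(),
)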
Recently I had much fun with the Laplacian Pyramid algorithm (http://persci.mit.edu/pub_pdfs/pyramid83.pdf). But one big problem is that the original paper is limited to images of size (2^m+1)×(2^n+1). My question is: what is the best way to deal with an arbitrary w×h instead? I can think of a couple of options:
Upsample the input to the next (2^m+1)×(2^n+1) up front
Pad even lines. How exactly? Wouldn't it shift the signal?
Shift even lines by half a sample? Wouldn't it lose half a sample?
Does anybody have experience with this? What is the most practical and efficient approach? Also any pointers to papers dealing with this would be very welcome.
One approach is to create an image with a width and height equal to the next 2^m+1,2^n+1, but instead of up-sampling the image to fill the expanded dimensions, just place it in the top-left corner and fill the empty space to the right and below with a constant value (the average value for the image is a good choice for this). Then encode in the normal way, storing the original image dimensions along with the pyramid. When decoding, decode and then crop to the original size.
This won't introduce any visual artifacts or degradation because you aren't stretching or offsetting the image in any way.
Because the empty space to the right and below the original image is a constant value, the high-pass bands at each level in the image pyramid will be all zero in this area. So if you are using a compression scheme like run-length encoding to store each level, this will automatically be taken care of and these areas will compress to almost nothing. If not, you can simply store the top-left (potentially non-zero) area of each level and then fill out the rest with zeros when decoding.
You could find the min and max x and y bounding rectangle of the non-zero values for each level and store this along with the level, cropped to include only non-zero values. The decoder could also be optimized so that areas of the image that are going to be cropped away are not actually decoded in the first place, by only processing the top-left of each level.
Here's an illustration of the technique:
Instead of just filling the lower-right area with a flat color, you could fill it with horizontally and vertically mirrored copies of the image to the right and below, and a copy mirrored in both directions to the bottom-right, like this:
This will avoid the discontinuities of the first technique, although there will be a discontinuity in dx (e.g. if the value was gradually increasing from left to right it will suddenly be decreasing). Choosing a mirror that keeps dx constant and ddx zero will avoid this second-order discontinuity by linearly extrapolating the values.
Another technique, which is similar to what some JPEG encoders do to pad out an image to a whole number of MCU blocks, is to take the last pixel value of each row and repeat it, and likewise for columns, with the bottom-right-most pixel of the image used to fill the bottom-right area:
This last technique could easily be modified to extrapolate the gradient of values or even the gradient of gradients instead of just repeating the same value for the remainder of the row or column.
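As a rough numpy sketch of this last technique (the helper names are hypothetical, and a 2D single-channel image is assumed):

import numpy as np

def next_pyramid_size(n, levels):
    # Smallest size of the form k * 2^levels + 1 that is >= n.
    k = -(-(n - 1) // (1 << levels))   # ceil((n - 1) / 2^levels)
    return k * (1 << levels) + 1

def pad_for_pyramid(img, levels, mode="edge"):
    # mode="edge" repeats the last row/column (the JPEG-style technique);
    # mode="reflect" mirrors the image content instead.
    h, w = img.shape
    pad_h = next_pyramid_size(h, levels) - h
    pad_w = next_pyramid_size(w, levels) - w
    return np.pad(img, ((0, pad_h), (0, pad_w)), mode=mode)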
I have a graph like this:
I would like to generate a set of (x,y) pairs that correspond to points of this graph.
Maybe one for each horizontal pixel.
How would I go about doing this?
If I had the image in uncompressed bitmap format, maybe cropped to the actual graph, I could examine each vertical strip for the blackest point...
I would prefer to work in Python, but I'm interested in any technique.
I answered a question like this a while back. It should be fairly easy to detect the grid, and from there you can get the pixel coordinates relative to the grid. However, it wasn't clear how to extract the numbers, which you need to do in order to get the scale of the grid. It might be fairly easy if you can match the font and font size (which might be possible via scaling). Otherwise, you'd have to enter the numbers manually.
To extract the grid, you'd start from the top right and move diagonally until you find the start of the grid. From there you can follow the vertical and horizontal lines (of the grid) until they end. This should allow you to say with fairly high probability where the outer rectangle of the grid is and what the x and y intervals of the grid are in terms of pixels. The blackest parts within the grid should do for finding the curve, but it may require some interpolation depending on how many data points you need/want.
It may also be useful to look into techniques for reversing anti-aliasing effects, although the uncompressed bitmap image may not need them.
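As a rough Python sketch of the "blackest point per column" step (the file name is a placeholder, and the image is assumed to be cropped to the grid already):

import cv2

# Load the cropped graph as grayscale.
img = cv2.imread("graph.png", cv2.IMREAD_GRAYSCALE)

# For each pixel column, take the row of the darkest pixel as the curve's y.
ys = img.argmin(axis=0)
points = [(x, int(y)) for x, y in enumerate(ys)]

# Note: pixel rows grow downward, so these y values still have to be mapped
# onto the grid's axes (flipped and scaled) to get real data coordinates.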
I'm trying to find a similar or equivalent function to MATLAB's bwareaopen function in OpenCV.
In MATLAB, bwareaopen(image, P) removes from a binary image all connected components (objects) that have fewer than P pixels.
In my 1-channel image I simply want to remove small regions that are not part of bigger ones. Is there any trivial way to solve this?
Take a look at cvBlobsLib; it has functions to do what you want. In fact, the code example on the front page of that link does exactly what you want, I think.
Essentially, you can use CBlobResult to perform connected-component labeling on your binary image, and then call Filter to exclude blobs according to your criteria.
There is no such function, but you can (see the sketch after this list):
1) find the contours
2) compute each contour's area
3) filter out all external contours with an area less than the threshold
4) create a new black image
5) draw the remaining contours on it
6) mask it with the original image
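A minimal sketch of those steps (assuming OpenCV 4's findContours signature; min_area is a hypothetical parameter):

import cv2
import numpy as np

def remove_small_regions(binary, min_area):
    # 1-2) find the external contours and their areas
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    # 3) keep only the contours of at least min_area pixels
    kept = [c for c in contours if cv2.contourArea(c) >= min_area]
    # 4-5) draw the kept contours, filled, on a new black image
    mask = np.zeros_like(binary)
    cv2.drawContours(mask, kept, -1, 255, thickness=cv2.FILLED)
    # 6) mask the original image with it
    return cv2.bitwise_and(binary, mask)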
I had the same problem and came up with a function that uses connectedComponentsWithStats():
import cv2

def bwareaopen(img, min_size, connectivity=8):
    """Remove small objects from a binary image (approximation of
    bwareaopen in MATLAB for 2D images).

    Args:
        img: a binary image (dtype=uint8) to remove small objects from
        min_size: minimum size (in pixels) for an object to remain in the image
        connectivity: pixel connectivity; either 4 (connected via edges) or 8
            (connected via edges and corners)

    Returns:
        the binary image with small objects removed
    """
    # Find all connected components (called here "labels")
    num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(
        img, connectivity=connectivity)

    # Check the size of each connected component (area in pixels)
    for i in range(num_labels):
        label_size = stats[i, cv2.CC_STAT_AREA]

        # Remove connected components smaller than min_size
        if label_size < min_size:
            img[labels == i] = 0

    return img
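For example, a hypothetical call on a thresholded mask:

# Drop every blob smaller than 100 pixels; the path is a placeholder.
mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)
_, mask = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)
clean = bwareaopen(mask, min_size=100)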
For clarification regarding connectedComponentsWithStats(), see:
How to remove small connected objects using OpenCV
https://www.programcreek.com/python/example/89340/cv2.connectedComponentsWithStats
https://python.hotexamples.com/de/examples/cv2/-/connectedComponentsWithStats/python-connectedcomponentswithstats-function-examples.html
The closest OpenCV solution to your question is morphological opening or closing.
Say you have white regions in your image that you need to remove. You can use morphological opening. Opening is erosion + dilation, in that order. Erosion is when the white regions in your image are shrunk. Dilation is (the opposite) where white regions in your image are enlarged. When you perform an opening operation, your small white region is eroded until it vanishes. Larger white features will not vanish but will be eroded from the boundary. The subsequent dilation step restores their original size. However, since the small element(s) vanished during the erosion step, they will not appear in the final image after dilation.
For example, consider this image, where we want to remove the small white regions but retain the 3 large white ellipses. Running the following code removes the white regions and displays the cleaned image:
import cv2
import numpy as np

im = cv2.imread('sample.png')
clean = cv2.morphologyEx(im, cv2.MORPH_OPEN, np.ones((10, 10), np.uint8))
cv2.imshow("Clean image", clean)
cv2.waitKey(0)
The clean image output would be like this.
The command above uses a square kernel of size 10×10. You can modify this to suit your requirements. You can even generate a more advanced kernel using the function getStructuringElement().
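For instance, a small variant of the snippet above with an elliptical kernel (same 10×10 size assumed):

# Elliptical kernel instead of a square block.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (10, 10))
clean = cv2.morphologyEx(im, cv2.MORPH_OPEN, kernel)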
Note that if your image is inverted, i.e., with black noise on a white background, you simply need to use the morphological closing operation (cv2.MORPH_CLOSE) instead of opening. Closing reverses the order of operations: the image is first dilated and then eroded.