how to ascribe (x,y) coordinates to text? - coordinate-systems

Greetings, Does anyone have a suggestion on how I can either, a) map text on a grid (what i imagine is a Cartesian system where each character has a (x,y) coordinate, or b) if a grid is not possible, somehow measure text for i)location of a character, difference (delta) between one text location and another x2-x1 horizontal or difference between one text location and another vertically y2-y1? I am on a PC and would require a suggested programming method or program suggestion (is there a PC based text program with this feature that anyone knows of)? Thanks so much in advance,
c~tea

In the absence of a defined programming language, I suggest PostScript.
PostScript is a page description language which has the feature you ask. It is understood by many printers directly and for the rest there are interpreters. It is also the basis of the PDF file format. On the wikipedia page is an example that demonstrates the use of the coordinate system:
%!PS
/Courier % name the desired font
20 selectfont % choose the size in points and establish
% the font as the current one
72 500 moveto % position the current point at
% coordinates 72, 500 (the origin is at the
% lower-left corner of the page)
(Hello world!) show % stroke the text in parentheses
showpage % print all on the page
The coordinate system has its origin in the lower left corner and is measured in points (1/72 inch)

Related

Is it possible to retrieve beam real world position from RT PLAN

I am currently working on some radiotherapy plan generation and I am trying to retrieve the beam source position from a DICOM RTPLAN file and point it on a related CT-Scan 3D image.
With the RTPLAN, I am able to access the isocenter position of each beam but this is in patient coordinates and I am note quite sure how to find the coordinates in the basis that is used by the 3D CT-Scan image.
I have access to the attributes ImagePosition and ImageOrientation of the DICOM of the CT. Moreover, the CT DICOM-like file (it is in practice a json regrouping some DICOM information) and the RTPLAN share the same FrameOfReference (Does it mean that they share the Patient coordinate system ?).
What does ImagePosition truely indicates ? As well as I can understand I think this is the position of the point (0, 0, 0) of the CT-3DImage in the Patient Coordinates. I am also a bit confused about the ImageOrientation attribute.
As you can read in this answer here, the ImagePosition attribute gives you the x, y, and z coordinates of the upper left hand corner of the image, in mm, i.e. the coordinates of the center of the upper left pixel of the image.
For your convenience I copy-paste below a table from the DICOM Documentation Part 3 (page 561).
Regarding the ImageOrientation attribute, as described in the documentation, gives you the direction cosines of the first row and the first column with respect​ to the patient. To understand better this attribute take a look at the very useful website, DICOM is Easy, by Roni Zaharia. In one of his images (below), you can clearly see that when the attribute is not equal to 1\0\0\0\1\0, then the coordinate system of the image is not align with the coordinate system of the patient. To align them, you have to use the direction cosines provided by the attribute and apply a rotation (take a look at the transformation matrix at page 562 of the aforementioned DICOM documentation).

OCR and character similarity

I am currently working on some kind of OCR (Optical Character Recognition) system. I have already written a script to extract each character from the text and clean (most of the) irregularities out of it. I also know the font. The images I have now for example are:
M (http://i.imgur.com/oRfSOsJ.png (font) and http://i.imgur.com/UDEJZyV.png (scanned))
K (http://i.imgur.com/PluXtDz.png (font) and http://i.imgur.com/TRuDXSx.png (scanned))
C (http://i.imgur.com/wggsX6M.png (font) and http://i.imgur.com/GF9vClh.png (scanned))
For all of these images I already have a sort of binary matrix (1 for black, 0 for white). I was now wondering if there was some kind of mathematical projection-like formula to see the similarity between these matrices. I do not want to rely on a library, because that was not the task given to me.
I know this question may seem a bit vague and there are similar questions, but I'm looking for the method, not for a package and so far I couldn't find any comments regarding the method. The reason this question being vague is that I really have no point to start. What I want to do is actually described here on wikipedia:
Matrix matching involves comparing an image to a stored glyph on a pixel-by-pixel basis; it is also known as "pattern matching" or "pattern recognition".[9] This relies on the input glyph being correctly isolated from the rest of the image, and on the stored glyph being in a similar font and at the same scale. This technique works best with typewritten text and does not work well when new fonts are encountered. This is the technique the early physical photocell-based OCR implemented, rather directly. (http://en.wikipedia.org/wiki/Optical_character_recognition#Character_recognition)
If anyone could help me out on this one, I would appreciate it very much.
for recognition or classification most OCR's use neural networks
These must be properly configured to desired task like number of layers internal interconnection architecture , and so on. Also problem with neural networks is that they must be properly trained which is pretty hard to do properly because you will need to know for that things like proper training dataset size (so it contains enough information and do not over-train it). If you do not have experience with neural networks do not go this way if you need to implement it yourself !!!
There are also other ways to compare patterns
vector approach
polygonize image (edges or border)
compare polygons similarity (surface area, perimeter, shape ,....)
pixel approach
You can compare images based on:
histogram
DFT/DCT spectral analysis
size
number of occupied pixels per each line
start position of occupied pixel in each line (from left)
end position of occupied pixel in each line (from right)
these 3 parameters can be done also for rows
points of interest list (points where is some change like intensity bump,edge,...)
You create feature list for each tested character and compare it to your font and then the closest match is your character. Also these feature list can be scaled to some fixed size (like 64x64) so the recognition became invariant on scaling.
Here is sample of features I use for OCR
In this case (the feature size is scaled to fit in NxN) so each character has 6 arrays by N numbers like:
int row_pixels[N]; // 1nd image
int lin_pixels[N]; // 2st image
int row_y0[N]; // 3th image green
int row_y1[N]; // 3th image red
int lin_x0[N]; // 4th image green
int lin_x1[N]; // 4th image red
Now: pre-compute all features for each character in your font and for each readed character. Find the most close match from font
min distance between all feature vectors/arrays
not exceeding some threshold difference
This is partially invariant on rotation and skew up to a point. I do OCR for filled characters so for outlined font it may have use some tweaking
[Notes]
For comparison you can use distance or correlation coefficient

Find the closest tile along a path

I have a tile based game and I need to find the closest tile within a 32px radius. So say a user is at 400, 200 and the user clicks at 500, 400. I need to create a path or line from the player to the mouse position on click and the closest tile that is underneath the path within 32px (or 2 tiles) must be chosen. The map is tiled at 16px.
A function call to see if a tile is at a given tile position is available Map.at(x,y).
I just don't know the maths to use to work this out.
The block blocks are within 16px, the red are within 32px. The grey block is the tile to be destroyed and the blue line is the invisible path between the player and mouse.
If you work in terms of tile coordinates, the problem becomes a line-drawing problem from the title the user is at to the tile the mouse was clicked in. A line drawing algorithm would generate, in sequence, all the tiles along a straight-line path between those two tiles. Just pick the first one where Map.at(x,y) satisfies your requirements and exit the line-drawer.
A number of line drawing algorithms exist. Two simple ones are DDA and Bresenham's. Both generate the discrete "pixels" (tiles, in your question) in the correct order. The DDA is the simple choice if floating point arithmetic can be used in your application. Bresenham's uses only integer math.
With a lot of games, it's not necessary a straight line, but a search for the shortest path. If that's where you are heading then you might want to look at the A* algorithm.

Matlab Bwareaopen equivalent function in OpenCV

I'm trying to find similar or equivalent function of Matlabs "Bwareaopen" function in OpenCV?
In MatLab Bwareaopen(image,P) removes from a binary image all connected components (objects) that have fewer than P pixels.
In my 1 channel image I want to simply remove small regions that are not part of bigger ones? Is there any trivial way to solve this?
Take a look at the cvBlobsLib, it has functions to do what you want. In fact, the code example on the front page of that link does exactly what you want, I think.
Essentially, you can use CBlobResult to perform connected-component labeling on your binary image, and then call Filter to exclude blobs according to your criteria.
There is not such a function, but you can
1) find contours
2) Find contours area
3) filter all external contours with area less then threshhold
4) Create new black image
5) Draw left contours on it
6) Mask it with a original image
I had the same problem and came up with a function that uses connectedComponentsWithStats():
def bwareaopen(img, min_size, connectivity=8):
"""Remove small objects from binary image (approximation of
bwareaopen in Matlab for 2D images).
Args:
img: a binary image (dtype=uint8) to remove small objects from
min_size: minimum size (in pixels) for an object to remain in the image
connectivity: Pixel connectivity; either 4 (connected via edges) or 8 (connected via edges and corners).
Returns:
the binary image with small objects removed
"""
# Find all connected components (called here "labels")
num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(
img, connectivity=connectivity)
# check size of all connected components (area in pixels)
for i in range(num_labels):
label_size = stats[i, cv2.CC_STAT_AREA]
# remove connected components smaller than min_size
if label_size < min_size:
img[labels == i] = 0
return img
For clarification regarding connectedComponentsWithStats(), see:
How to remove small connected objects using OpenCV
https://www.programcreek.com/python/example/89340/cv2.connectedComponentsWithStats
https://python.hotexamples.com/de/examples/cv2/-/connectedComponentsWithStats/python-connectedcomponentswithstats-function-examples.html
The closest OpenCV solution to your question is the morphological closing or opening.
Say you have white regions in your image that you need to remove. You can use morphological opening. Opening is erosion + dilation, in that order. Erosion is when the white regions in your image are shrunk. Dilation is (the opposite) where white regions in your image are enlarged. When you perform an opening operation, your small white region is eroded until it vanishes. Larger white features will not vanish but will be eroded from the boundary. The subsequent dilation step restores their original size. However, since the small element(s) vanished during the erosion step, they will not appear in the final image after dilation.
For example consider this image where we want to remove the small white regions but retain the 3 large white ellipses. Running the following code removes the white regions and displays the clean image
import cv2
im = cv2.imread('sample.png')
clean = cv2.morphologyEx(im, cv2.MORPH_OPEN, np.ones((10, 10)))
cv2.imshwo("Clean image", clean)
The clean image output would be like this.
The command above uses a square block of size 10 as the kernel. You can modify this to suit your requirement. You can even generate a more advanced kernel using the function getStructuringElement().
Note that if your image is inverted, i.e., with black noise on a white background, you simply need to use the morphological closing operation (cv2.MORPH_CLOSE method) instead of opening. This reverses the order of operation - first the image is eroded and then dilated.

Where can I find information on line growing algorithms?

I'm doing some image processing, and I need to find some information on line growing algorithms - not sure if I'm using the right terminology here, so please call me out on this is needs be.
Imagine my input image is simply a circle on a black background. I'd basically like extract the coordinates, so that I may draw this circle elsewhere based on the coordinates.
Note: I am already using edge detection image filters, but I thought it best to explain with a simple example.
Basically what I'm looking to do is detect lines in an image, and store the result in a data type where by I have say a class called Line, and various different Point objects (containing X/Y coordinates).
class Line
{
Point points[];
}
class Point
{
int X, Y;
}
And this is how I'd like to use it...
Line line;
for each pixel in image
{
if pixel should be added to line
{
add pixel coordinates to line;
}
}
I have no idea how to approach this as you can probably establish, so pointers to any subject matter would be greatly appreciated.
I'm not sure if I'm interpreting you right, but the standard way is to use a Hough transform. It's a two step process:
From the given image, determine whether each pixel is an edge pixel (this process creates a new "binary" image). A standard way to do this is Canny edge-detection.
Using the binary image of edge pixels, apply the Hough transform. The basic idea is: for each edge pixel, compute all lines through it, and then take the lines that went through the most edge pixels.
Edit: apparently you're looking for the boundary. Here's how you do that.
Recall that the Canny edge detector actually gives you a gradient also (not just the magnitude). So if you pick an edge pixel and follow along (or against) that vector, you'll find the next edge pixel. Keep going until you don't hit an edge pixel anymore, and there's your boundary.
What you are talking about is not an easy problem! I have found that this website is very helpful in image processing: http://homepages.inf.ed.ac.uk/rbf/HIPR2/wksheets.htm
One thing to try is the Hough Transform, which detects shapes in an image. Mind you, it's not easy to figure out.
For edge detection, the best is Canny edge detection, also a non-trivial task to implement.
Assuming the following is true:
Your image contains a single shape on a background
You can determine which pixels are background and which pixels are the shape
You only want to grab the boundary of the outside of the shape (this excludes donut-like shapes where you want to trace the inside circle)
You can use a contour tracing algorithm such as the Moore-neighbour algorithm.
Steps:
Find an initial boundary pixel. To do this, start from the bottom-left corner of the image, travel all the way up and if you reach the top, start over at the bottom moving right one pixel and repeat, until you find a shape pixel. Make sure you keep track of the location of the pixel that you were at before you found the shape pixel.
Find the next boundary pixel. Travel clockwise around the last visited boundary pixel, starting from the background pixel you last visited before finding the current boundary pixel.
Repeat step 2 until you revisit first boundary pixel. Once you visit the first boundary pixel a second time, you've traced the entire boundary of the shape and can stop.
You could take a look at http://processing.org/ the project was created to teach the fundamentals of computer programming within a visual context. There is the language, based on java, and an IDE to make 'sketches' in. It is a very good package to quickly work with visual objects and has good examples of things like edge detection that would be useful to you.
Just to echo the answers above you want to do edge detection and Hough transform.
Note that a Hough transform for a circle is slightly tricky (you are solving for 3 parameters, x,y,radius) you might want to just use a library like openCV

Resources