What are the parts of an image ndarray when passing it into a CNN? - multidimensional-array

I have been trying to fit images converted to ndarrays into my AI and I am curious about what each part in the ndarray is.
Looking at how I pass an image into my AI,
prediction = int(model.predict(img_array)[0][0])
I know that (in debug mode, hovering over the variable) img_array is an ndarray of shape (1, 224, 224, 3). I know that 224 is the width and height. What would the 1 and 3 be?
I actually sampled this line from this link and, although it's not the main question, can someone explain why they make the prediction an int and add the [0][0] on the end when predicting?
Thanks!
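A minimal sketch of how such an array is usually laid out, assuming the Keras-style "channels last" convention (batch, height, width, channels) and a model with a single output unit, which the [0][0] indexing suggests. The array fake_predictions is a hypothetical stand-in for the return value of model.predict, not real model output:

```python
import numpy as np

# Assumption: "channels last" layout, i.e. (batch, height, width, channels).
image = np.zeros((224, 224, 3), dtype=np.float32)  # one image: 224 rows, 224 columns, 3 colour channels (RGB)
img_array = np.expand_dims(image, axis=0)           # add a leading batch axis: "1 image in this batch"
print(img_array.shape)

# Hypothetical stand-in for model.predict(img_array) with one output unit:
# one row per image in the batch, one column per output unit.
fake_predictions = np.array([[0.87]], dtype=np.float32)
score = fake_predictions[0][0]  # first image in the batch, first (only) output value
prediction = int(score)         # int() truncates toward zero, so 0.87 becomes 0
```

Under these assumptions the 1 is the number of images in the batch and the 3 is the number of colour channels. Note that int() truncates rather than rounds, so whether it is the right conversion depends on how the linked code interprets the score.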

Related

OCR and character similarity

I am currently working on some kind of OCR (Optical Character Recognition) system. I have already written a script to extract each character from the text and clean (most of the) irregularities out of it. I also know the font. The images I have now for example are:
M (http://i.imgur.com/oRfSOsJ.png (font) and http://i.imgur.com/UDEJZyV.png (scanned))
K (http://i.imgur.com/PluXtDz.png (font) and http://i.imgur.com/TRuDXSx.png (scanned))
C (http://i.imgur.com/wggsX6M.png (font) and http://i.imgur.com/GF9vClh.png (scanned))
For all of these images I already have a sort of binary matrix (1 for black, 0 for white). I was now wondering if there was some kind of mathematical projection-like formula to see the similarity between these matrices. I do not want to rely on a library, because that was not the task given to me.
I know this question may seem a bit vague and there are similar questions, but I'm looking for the method, not for a package, and so far I couldn't find any comments regarding the method. The reason this question is vague is that I really have no starting point. What I want to do is actually described here on Wikipedia:
Matrix matching involves comparing an image to a stored glyph on a pixel-by-pixel basis; it is also known as "pattern matching" or "pattern recognition".[9] This relies on the input glyph being correctly isolated from the rest of the image, and on the stored glyph being in a similar font and at the same scale. This technique works best with typewritten text and does not work well when new fonts are encountered. This is the technique the early physical photocell-based OCR implemented, rather directly. (http://en.wikipedia.org/wiki/Optical_character_recognition#Character_recognition)
If anyone could help me out on this one, I would appreciate it very much.
For recognition or classification, most OCR systems use neural networks.
These must be properly configured for the desired task (number of layers, internal interconnection architecture, and so on). Another problem with neural networks is that they must be properly trained, which is pretty hard to do because you need to know things like the proper training dataset size (so that it contains enough information but does not over-train the network). If you do not have experience with neural networks, do not go this way if you need to implement it yourself!
There are also other ways to compare patterns
vector approach
polygonize image (edges or border)
compare polygons similarity (surface area, perimeter, shape ,....)
pixel approach
You can compare images based on:
histogram
DFT/DCT spectral analysis
size
number of occupied pixels per each line
start position of occupied pixel in each line (from left)
end position of occupied pixel in each line (from right)
these 3 parameters can also be computed for rows
points of interest list (points where there is some change, like an intensity bump, edge, ...)
You create a feature list for each tested character and compare it to your font; the closest match is your character. These feature lists can also be scaled to some fixed size (like 64x64) so that the recognition becomes invariant to scaling.
Here is sample of features I use for OCR
In this case the feature size is scaled to fit in NxN, so each character has 6 arrays of N numbers, like:
int row_pixels[N]; // 1st image
int lin_pixels[N]; // 2nd image
int row_y0[N]; // 3rd image green
int row_y1[N]; // 3rd image red
int lin_x0[N]; // 4th image green
int lin_x1[N]; // 4th image red
Now: pre-compute all features for each character in your font and for each read character. Find the closest match from the font:
min distance between all feature vectors/arrays
not exceeding some threshold difference
This is partially invariant to rotation and skew, up to a point. I do OCR for filled characters, so for an outlined font it may need some tweaking.
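The feature-and-match idea described above could be sketched roughly as follows. This is not the answerer's code: the function names, the order of the six lists, the use of -1 for empty rows or columns, and the sum-of-absolute-differences distance are all assumptions made for illustration. Glyphs are assumed to be binary matrices already scaled to the same size.

```python
def extract_features(glyph):
    """Return six per-row / per-column feature lists for a binary glyph.

    glyph is a list of rows, each a list of 0/1 values (1 = black).
    Empty rows or columns get -1 as their start and end position (an assumption).
    """
    height, width = len(glyph), len(glyph[0])
    columns = [[glyph[y][x] for y in range(height)] for x in range(width)]

    def first(line):
        return next((i for i, v in enumerate(line) if v), -1)

    def last(line):
        return next((i for i in range(len(line) - 1, -1, -1) if line[i]), -1)

    return [
        [sum(row) for row in glyph],      # occupied pixels per row
        [sum(col) for col in columns],    # occupied pixels per column
        [first(row) for row in glyph],    # first occupied pixel in each row
        [last(row) for row in glyph],     # last occupied pixel in each row
        [first(col) for col in columns],  # first occupied pixel in each column
        [last(col) for col in columns],   # last occupied pixel in each column
    ]


def distance(features_a, features_b):
    """Sum of absolute differences over all feature lists (L1 distance)."""
    return sum(
        abs(a - b)
        for list_a, list_b in zip(features_a, features_b)
        for a, b in zip(list_a, list_b)
    )


def classify(glyph, font, threshold):
    """Return the label of the closest font glyph, or None if none is close enough."""
    features = extract_features(glyph)
    label, best = min(
        ((name, distance(features, extract_features(ref))) for name, ref in font.items()),
        key=lambda pair: pair[1],
    )
    return label if best <= threshold else None
```

In practice the font features would be computed once and cached rather than recomputed for every scanned character, and the threshold has to be tuned on real scans.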
[Notes]
For comparison you can use distance or correlation coefficient
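As one possible reading of "correlation coefficient", here is a small sketch of the Pearson correlation between two equally sized binary matrices, flattened pixel by pixel. The function name and the choice to return 0.0 for a blank or fully black glyph are assumptions.

```python
import math


def correlation(glyph_a, glyph_b):
    """Pearson correlation of two equally sized binary glyphs, in [-1, 1]."""
    a = [v for row in glyph_a for v in row]
    b = [v for row in glyph_b for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # constant image: correlation is undefined, report "no match"
    return cov / math.sqrt(var_a * var_b)
```

A value near 1 means the two glyphs agree on almost every pixel, a value near -1 means one is close to the inverse of the other.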

Creating seamless rotated background image

I want to repeat a background image that is rotated. Trying to make it seamless is destroying my soul.
Starting with something simple, consider each image is laid out like bricks. Creating a seamless repeating background image is pretty simple:
(the red area is the crop). You can see this working as expected at http://jsfiddle.net/mPqfB.
Now let's say I want to rotate the image by 45 degrees:
Unfortunately, the same crop no longer works, as you can see on http://jsfiddle.net/mPqfB/1.
I'm trying to figure out how to crop the image correctly so that we have a seamless repeat. There's probably some fairly trivial maths involved to do this but I can't for the life of me figure it out.
[Update]
I'm attempting to follow @oezi's calculations, so to make things easier I have created an image with dimensions 100px x 50px.
Therefore:
Least Common Multiple = 100
Hypotenuse = 100² + 100² = 20000
Now I'm assuming this means we don't have to create an image of 20000px x 20000px. I'm hoping that @oezi can clarify how he performs his resizing?
If this is a² + b² = c², which is equal to c = square root of (a² + b²),
Then we can concur that our crop should be 141px?
Finally, this doesn't actually explain where we take the crop from?
[Update 2]
It does look like this is how the resize should be created. Taking a 141px x 141px crop of the image yielded the correct results - http://jsfiddle.net/EfuV2/
As far as where to crop from, it doesn't actually matter!
If the rotation is exactly 45 degrees, you'll have to find the least common multiple of the width and height of your unrotated pattern.
In your case, that's 15100 (width 100 and height 151).
It would be much better to scale your pattern to width 100 and height 150, so the least common multiple is only 300.
Take that number and some math (Pythagorean theorem). Assume your number is the length of the two short arms and calculate the length of the hypotenuse; that's the result (make a square image of that size to get your pattern).
In your case, that's 21355.
With resizing, it's ~424.
Note that this is just typed straight from my head because I can't try it out practically at the moment, but I'm really sure it's correct.
edit: a fast (and messy) test got me to this:
http://i.imgur.com/rZuu9.jpg
http://jsfiddle.net/mPqfB/2/ (click the image-link first, otherwise jsfiddle doesn't show the image)
Accidentally I made the pattern only 423 in height and the rotation isn't perfect (I don't have Photoshop here), but it's good enough to prove that my math is correct.
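The arithmetic in this answer can be checked with a short sketch. The function name is made up for illustration, and rounding to the nearest whole pixel is an assumption:

```python
import math


def seamless_tile_size(width, height):
    """Side length in pixels of the square tile for a pattern rotated by 45 degrees."""
    # Least common multiple of the unrotated pattern's width and height.
    lcm = width * height // math.gcd(width, height)
    # Hypotenuse of a right triangle whose two short sides both equal the LCM.
    return round(math.hypot(lcm, lcm))


print(seamless_tile_size(100, 151))  # the original 100 x 151 pattern
print(seamless_tile_size(100, 150))  # after rescaling to 100 x 150
print(seamless_tile_size(100, 50))   # the 100 x 50 test image from the question update
```

These calls reproduce the 21355, ~424 and 141 figures quoted in the thread.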
The trick is to crop the pattern at points where the section being cut off matches the section remaining on the opposite side of the crop area (see example cuts in blue). It'll probably take some trial and error to get it right but you should be able to do it easily enough.

Math problem with smooth zooming into canvas

To all mathematicians out there please read the whole question, this could be answered without any HTML5 knowledge :-)
I am zooming (scaling) an HTML5 canvas and I want to do that in a lot of small steps instead of one big step to make the scrolling seem "smooth". To explain:
When the user turns the scroll wheel once, my goal is to scale the canvas by 20% in total. Now I want to divide this into, let's say, 20 small steps.
My problem is that, due to the restricted library of the canvas, I can only use the scale(xscale,yscale) method of the 2d context to do so. So I can't set the units to a specific size, which would make it easy. I have to use a factor in every single step and I can't figure out how this could be done.
So if the units of my canvas were 100 in step 1, I want them to be 130 in step 20, with 19 small steps in between.
oldscale: 100, factor: x, newscale: 101
oldscale: 101, factor: x, newscale: 102
oldscale: 102, factor: x, newscale: 103
......
The variables I know are the current step number (0-19), totalsteps (20) and the target zoom in percent (20).
That is a very simple example of what I need. The main problem is that I only have access to the x and I can't affect the scales directly.
Can anyone help? I hope I explained it well.
You need the 20-th root of 130/100. To compute it use
k = Math.exp(Math.log(130/100) / 20)
k raised to the 20-th power will be 130/100.
Not sure what your problem is,
but I believe you want to find x such that:
(1+x)**numberOfSteps - 1 = pourcentageOfIncrease // ** is power
=> x = (1+pourcentageOfIncrease)**(1/numberOfSteps) - 1
for numberOfSteps = 20
and pourcentageOfIncrease = 20%
you find: x = pow(1+20%, 1/20.) - 1 = 0.00916 = 0.916% increase at each step
Note: This is similar to calculating compound interest but for zooming. You calculate the zoom at each step to get the total zoom.
Hope I understood your problem and it helps
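Both answers come down to taking the n-th root of the total zoom ratio. A small sketch (the function name is hypothetical) that applies the per-step factor repeatedly, the way repeated scale() calls would compound:

```python
def step_factor(total_ratio, steps):
    """Constant factor to apply at every step so that `steps` steps multiply to total_ratio."""
    return total_ratio ** (1.0 / steps)


factor = step_factor(1.20, 20)  # 20% total zoom spread over 20 steps
scale = 100.0
for _ in range(20):
    scale *= factor             # each step scales relative to the current size
print(factor - 1)               # per-step increase, about 0.00916
print(scale)                    # ends at about 120
```

The key point is that the same factor is used at every step, because each scale call multiplies the current scale instead of setting an absolute one.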

How to randomly but evenly distribute nodes on a plane

I need to place 1 to 100 nodes (actually 25px dots) on a html5 canvas. I need to make them look randomly distributed so using some kind of grid is out. I also need to ensure these dots are not touching or overlapping. I would also like to not have big blank areas. Can someone tell me what this kind of algorithm is called? A reference to an open source project that does this would also be appreciated.
Thanks all
Guido
What you are looking for is called a Poisson-disc distribution. It occurs in nature in the distribution of photoreceptor cells on your retina. There is a great article about this by Mike Bostock (StackOverflow profile) called Visualizing Algorithms. It has JavaScript demos and a lot of code to look at.
In the interest of doing more than dropping a link into the answer, I will try to give a brief summary of the article:
Mitchell's best-candidate algorithm
A simple approximation is known as Mitchell’s best-candidate algorithm. It is easy to implement but both crowds some spaces and leaves gaps in others. The algorithm adds new points one at a time. For each new sample, the best-candidate algorithm generates a fixed number of candidates, say 10. The candidate furthest from any existing point is added to the set and the process is repeated until the desired density is achieved.
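A minimal sketch of the best-candidate idea as summarized above. The function name, parameters and the optional rng argument are assumptions, and this version does not by itself guarantee that 25px dots never touch:

```python
import math
import random


def best_candidate_points(count, width, height, candidates=10, rng=random):
    """Place `count` points; each new point is the best of `candidates` random tries."""
    if count <= 0:
        return []
    points = [(rng.uniform(0, width), rng.uniform(0, height))]
    while len(points) < count:
        best, best_dist = None, -1.0
        for _ in range(candidates):
            c = (rng.uniform(0, width), rng.uniform(0, height))
            # Distance from this candidate to its nearest existing point.
            d = min(math.hypot(c[0] - p[0], c[1] - p[1]) for p in points)
            if d > best_dist:
                best, best_dist = c, d
        points.append(best)  # keep the candidate that is furthest from everything else
    return points
```

Raising the number of candidates per point gives a more even spread at the cost of more distance checks.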
Bridson's Algorithm
Bridson’s algorithm for Poisson-disc sampling (original paper pdf) scales linearly and is easy to implement as well. This algorithm grows from an initial point and (IMHO) is quite fun to watch (again see Mike Bostock's article). All points in the set are either active or inactive. All points are initially added as active. One point is chosen from the active set and some number of candidate points are generated in the annulus (a.k.a. ring) that extends from the sample, with the inner circle having a radius r and the outer circle having a radius 2r. Candidate samples less than r distance away from any point in the FinalSet are rejected. Once a sample is found that is not rejected, it is added to the FinalSet. If all the candidate samples are rejected, the original point is marked as inactive on the assumption that it has so many neighboring points that no more can be added around it. When all samples are inactive the algorithm terminates.
A grid of size r/√2 can be used to greatly increase the speed of checking candidate points. Only one point may ever be in a grid square and only a limited number of adjacent squares need to be checked.
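The steps above, including the r/√2 grid, could be sketched like this. It is a simplified illustration rather than the code from the paper or the article: the names, the default of 30 candidates per point, and drawing the candidate radius uniformly between r and 2r are assumptions.

```python
import math
import random


def poisson_disc(width, height, r, k=30, rng=random):
    """Bridson-style Poisson-disc sampling; returned points are at least r apart."""
    cell = r / math.sqrt(2)  # cell diagonal equals r, so a cell holds at most one point
    cols, rows = math.ceil(width / cell), math.ceil(height / cell)
    grid = [[None] * cols for _ in range(rows)]
    points, active = [], []

    def cell_of(p):
        return min(int(p[1] / cell), rows - 1), min(int(p[0] / cell), cols - 1)

    def fits(p):
        if not (0 <= p[0] < width and 0 <= p[1] < height):
            return False
        row, col = cell_of(p)
        # Any point closer than r must lie within two cells in each direction.
        for rr in range(max(row - 2, 0), min(row + 3, rows)):
            for cc in range(max(col - 2, 0), min(col + 3, cols)):
                q = grid[rr][cc]
                if q is not None and math.hypot(p[0] - q[0], p[1] - q[1]) < r:
                    return False
        return True

    def add(p):
        row, col = cell_of(p)
        grid[row][col] = p
        points.append(p)
        active.append(p)

    add((rng.random() * width, rng.random() * height))
    while active:
        index = rng.randrange(len(active))
        origin = active[index]
        for _ in range(k):
            angle = rng.uniform(0, 2 * math.pi)
            radius = rng.uniform(r, 2 * r)  # candidate in the ring between r and 2r
            candidate = (origin[0] + radius * math.cos(angle),
                         origin[1] + radius * math.sin(angle))
            if fits(candidate):
                add(candidate)
                break
        else:
            active.pop(index)  # every candidate was rejected: mark the point inactive
    return points
```

For 25px dots, choosing r somewhat larger than 25 keeps the dots from touching. The number of points is an outcome of the canvas size and r, so hitting an exact count means adjusting r or discarding surplus points.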
The easiest way would be to just generate random (x, y) coordinates for each one, repeating if they are touching or overlapping.
Pseudocode:
do N times
{
    start:
    x = rand(0, width)
    y = rand(0, height)
    for each other point, p
        if distance(p.x, p.y, x, y) < radius * 2
            goto start
    add_point(x, y)
}
This is O(n^2), but if n is only going to be 100 then that's fine.
I don't know if this is a named algorithm, but it sounds like you could assign each node a position on a “grid”, then pick a random offset. That would give the appearance of some chaos while still guaranteeing that there are no big empty spaces.
For example (with height being the number of grid rows, and positions in grid units):
node.x = Math.floor(node.number / height) + (Math.random() - 0.5) * SOME_SCALE;
node.y = node.number % height + (Math.random() - 0.5) * SOME_SCALE;
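A slightly fuller sketch of the grid-plus-random-offset idea, written in Python for illustration. The function name, the way the grid dimensions are chosen, and the rule that each dot stays entirely inside its own cell (which is what prevents overlaps) are all assumptions:

```python
import math
import random


def jittered_grid(count, width, height, dot_size=25, rng=random):
    """Place `count` dots, one per grid cell, each at a random spot inside its cell."""
    if count < 1:
        return []
    # Pick a grid with roughly square cells and at least `count` cells.
    cols = math.ceil(math.sqrt(count * width / height))
    rows = math.ceil(count / cols)
    cell_w, cell_h = width / cols, height / rows
    if cell_w < dot_size or cell_h < dot_size:
        raise ValueError("canvas too small for this many dots")
    points = []
    for i in range(count):
        col, row = i % cols, i // cols
        # Keep the whole dot inside its own cell so neighbouring dots cannot overlap.
        x = col * cell_w + dot_size / 2 + rng.random() * (cell_w - dot_size)
        y = row * cell_h + dot_size / 2 + rng.random() * (cell_h - dot_size)
        points.append((x, y))
    return points
```

When the count does not fill the last grid row, that row is left partly empty; shuffling which cells are used would hide that, at the cost of a few more lines.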
Maybe you could use a grid of circles and place one 25px-dot in every circle? Wouldn't really be random, but look good.
Or you could place dots randomly and then make empty areas attract dots and give dots a limited-range-repulsion, but that is maybe too complicated and takes too much CPU time for this simple task.

How do I rotate an image?

See also: Why is my image rotation algorithm not working?
This question isn't language specific, and is a math problem. I will however use some C++ code to explain what I need as I'm not experienced with the mathematical equations needed to express the problem (but if you know about this, I’d be interested to learn).
Here's how the image is composed:
ImageMatrix image;
image[0][0][0] = 1;
image[0][1][0] = 2;
image[0][2][0] = 1;
image[1][0][0] = 0;
image[1][1][0] = 0;
image[1][2][0] = 0;
image[2][0][0] = -1;
image[2][1][0] = -2;
image[2][2][0] = -1;
Here's the prototype for the function I'm trying to create:
ImageMatrix rotateImage(ImageMatrix image, double angle);
I'd like to rotate only the first two indices (rows and columns) but not the channel.
The usual way to solve this is by doing it backwards. Instead of calculating where each pixel in the input image ends up in the output image, you calculate where each pixel in the output image is located in the input image (by rotating the same amount in the other direction). This way you can be sure that all pixels in the output image will have a value.
output = new Image(input.size())
for each pixel in output:
{
    p2 = rotate(pixel, -angle)
    value = interpolate(input, p2)
    output(pixel) = value
}
There are different ways to do interpolation. For the formula of rotation I think you should check https://en.wikipedia.org/wiki/Rotation_matrix#In_two_dimensions
But just to be nice, here it is (rotation of point (x,y) by angle):
newX = cos(angle)*x - sin(angle)*y
newY = sin(angle)*x + cos(angle)*y
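Putting the backwards mapping and the rotation formula together, a minimal Python sketch might look like the following. It rotates about the image centre and uses nearest-neighbour sampling in place of a real interpolation; both choices, and the function name, are assumptions. Each pixel value is copied as-is, so a pixel holding a list of channels is carried along unchanged.

```python
import math


def rotate_image(image, angle, fill=0):
    """Rotate a 2D list of pixels by `angle` radians about its centre.

    Inverse mapping: every output pixel looks up the input pixel it came from,
    so no output pixel is left without a value.
    """
    height, width = len(image), len(image[0])
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)  # rotate backwards
    output = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = x - cx, y - cy
            src_x = cos_a * dx - sin_a * dy + cx
            src_y = sin_a * dx + cos_a * dy + cy
            sx, sy = int(round(src_x)), int(round(src_y))  # nearest neighbour
            if 0 <= sx < width and 0 <= sy < height:
                output[y][x] = image[sy][sx]
    return output


kernel = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]
rotated = rotate_image(kernel, math.pi / 2)
```

With the row index growing downwards, a positive angle appears as a clockwise turn on screen: the top row of the 3x3 example ends up as the right-hand column.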
To rotate an image, you create 3 points:
A----B
|
|
C
and rotate that around A. To get the new rotated image you do this:
rotate ABC around A in 2D, so this is a single euler rotation
traverse in the rotated state from A to B. For every pixel you traverse also from left to right over the horizontal line in the original image. So if the image is an image of width 100, height 50, you'll traverse from A to B in 100 steps and from A to C in 50 steps, drawing 50 lines of 100 pixels in the area formed by ABC in their rotated state.
This might sound complicated but it's not. Please see this C# code I wrote some time ago:
rotoZoomer by me
When drawing, I alter the source pointers a bit to get a rubber-like effect, but if you disable that, you'll see the code rotates the image without problems. Of course, on some angles you'll get an image which looks slightly distorted. The source code contains comments on what's going on, so you should be able to grasp the math/logic behind it easily.
If you like Java better, I also have made a java version once, 14 or so years ago ;) ->
http://www.xs4all.nl/~perseus/zoom/zoom.java
Note there's another solution apart from rotation matrices that doesn't lose image information through aliasing.
You can separate 2D image rotation into skews and scalings, which preserve the image quality.
Here's a simpler explanation
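One well-known decomposition of this kind is the three-shear factorization, in which a rotation is written as an x-shear, a y-shear and another x-shear. Whether this is exactly the variant the linked explanation describes is an assumption; the sketch below only checks the matrix identity numerically and does not resample an image.

```python
import math


def matmul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def three_shear_rotation(angle):
    """Compose x-shear, y-shear, x-shear into a single 2x2 rotation matrix."""
    shear_x = [[1.0, -math.tan(angle / 2)], [0.0, 1.0]]
    shear_y = [[1.0, 0.0], [math.sin(angle), 1.0]]
    return matmul(shear_x, matmul(shear_y, shear_x))


angle = 0.7
composed = three_shear_rotation(angle)
direct = [[math.cos(angle), -math.sin(angle)],
          [math.sin(angle), math.cos(angle)]]
```

Each shear moves whole rows or whole columns sideways, which is why the approach can be implemented as three simple passes over the image.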
It seems like the example you've provided is some edge detection kernel. So if what you want is to detect edges at different angles, you'd better choose some continuous function (which in your case might be a parametrized Gaussian of x1 multiplied by x2) and then rotate it according to the formulae provided by kigurai. As a result you would be able to produce a discrete kernel more efficiently and without aliasing.
