I am researching RNNs as a beginner and have walked through some examples and tutorials. Regarding the partial derivative of the loss with respect to V (the weight matrix between the hidden states and the outputs), an outer product is used in one example and an inner (dot) product in another. The two examples are at the following links, respectively:
https://mmuratarat.github.io/2019-02-07/bptt-of-rnn
https://www.analyticsvidhya.com/blog/2019/01/fundamentals-deep-learning-recurrent-neural-networks-scratch-python/
Which one is correct?
I captured screenshots of the relevant derivations from each tutorial as well (below, respectively):
[Screenshot of the dL/dV derivation from the first tutorial]
[Screenshot of the dL/dV derivation from the second tutorial]
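For reference, here is the shape check I have been doing (a minimal NumPy sketch with made-up dimensions; I am assuming a softmax output with cross-entropy loss, so the output error at a time step is simply y_hat_t - y_t):

import numpy as np

# Made-up sizes: 3 output classes, hidden state of size 4.
dy = np.array([0.2, -0.7, 0.5])        # output error (y_hat - y) at one time step
s  = np.array([0.1, 0.4, -0.3, 0.8])   # hidden state s_t at the same time step

# Outer-product form: a (3, 4) matrix, the same shape as V.
dV_outer = np.outer(dy, s)

# "Dot-product" form: the same quantity written with 2-D column/row vectors.
dV_dot = np.dot(dy.reshape(-1, 1), s.reshape(1, -1))

print(np.allclose(dV_outer, dV_dot))   # True

Both forms give a matrix with the same shape as V, so it looks to me as though the two tutorials may just be writing the same quantity differently (1-D vectors combined with an outer product versus 2-D column/row matrices combined with a dot product); this is the part I would like confirmed.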
I am trying to implement the radial distortion correction from this technical report: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr98-71.pdf
I am having quite a few problems interpreting/understanding the equations in the report. The equations of interest are number 11 and 12 with the matrix formulation provided just afterwards.
The problem I am having is that it's really unclear what the descriptions of some of the variables actually mean.
If we look at the descriptions of [u,v] and [x,y], the technical report states that [u,v] are the pixel image coordinates and that [x,y] are the normalised image coordinates. My intuitive understanding of this is that [x,y] = [u,v] - [principal point], but then the presence of the [u - u_0] term would be redundant if that were correct.
So far I have been able to determine the intrinsic parameters of the camera, and the only thing preventing me from doing the distortion correction is my understanding of these equations.
The steps are clearly laid out in the documentation of the OpenCV routines implementing Zhang's method. Please see the "Detailed Description" section here.
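As a rough sketch of how I read the model (the conventional pinhole-plus-radial-distortion formulation; the intrinsic values and distortion coefficients below are made up): the normalised coordinates [x,y] are obtained by going through the whole intrinsic matrix, including division by the focal lengths, not just by subtracting the principal point, and the distortion factor depends on the normalised radius while the shift itself is expressed in pixels.

# Hypothetical intrinsics and radial distortion coefficients.
fx, fy = 800.0, 800.0      # focal lengths in pixels (alpha, beta)
skew   = 0.0               # gamma
u0, v0 = 320.0, 240.0      # principal point
k1, k2 = -0.25, 0.07

def distort(X, Y, Z):
    # Ideal (distortion-free) normalised image coordinates.
    x, y = X / Z, Y / Z
    # Ideal pixel coordinates via the intrinsic matrix.
    u = u0 + fx * x + skew * y
    v = v0 + fy * y
    # Radial distortion factor, a function of the normalised radius only.
    r2 = x * x + y * y
    d = k1 * r2 + k2 * r2 * r2
    # Distorted pixel coordinates: the shift is proportional to (u - u0) and (v - v0).
    return u + (u - u0) * d, v + (v - v0) * d

print(distort(0.1, -0.05, 1.0))

In other words, (u - u_0) is not redundant: it equals fx*x + skew*y, i.e. the offset from the principal point in pixel units, while (x, y) is the same geometry expressed in normalised, focal-length-free units, which is what the radius of the distortion depends on.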
What kind of QR-like code does not have squares in the corners and is divided into 4 quarters by a solid black line? I would like to replicate this, since I think it looks more professional than the variety I have seen before, but I cannot figure out what kind of code it is.
It's a data matrix 2D barcode.
http://en.wikipedia.org/wiki/Data_Matrix
http://en.wikipedia.org/wiki/Barcode
It only looks divided into 4 blocks when you get above a certain size (such as the 20x20 shown below):
http://jpgraph.net/download/manuals/chunkhtml/images/datamatrix-structure-details.png
This article talks more about these blocks (or more officially, 'Data Regions' or 'sub matrices'). See page 12 for a table of common sizes and data region breakdowns:
http://www.gs1.org/docs/barcodes/GS1_DataMatrix_Introduction_and_technical_overview.pdf
The matrix symbol (square or rectangle) will be composed of several areas of data (or: Data Regions), which together encode the data.
I am currently working on some kind of OCR (Optical Character Recognition) system. I have already written a script to extract each character from the text and clean (most of the) irregularities out of it. I also know the font. The images I have now for example are:
M (http://i.imgur.com/oRfSOsJ.png (font) and http://i.imgur.com/UDEJZyV.png (scanned))
K (http://i.imgur.com/PluXtDz.png (font) and http://i.imgur.com/TRuDXSx.png (scanned))
C (http://i.imgur.com/wggsX6M.png (font) and http://i.imgur.com/GF9vClh.png (scanned))
For all of these images I already have a sort of binary matrix (1 for black, 0 for white). I was now wondering if there was some kind of mathematical projection-like formula to see the similarity between these matrices. I do not want to rely on a library, because that was not the task given to me.
I know this question may seem a bit vague and there are similar questions, but I'm looking for the method, not for a package, and so far I couldn't find any comments regarding the method. The reason the question is vague is that I really have no starting point. What I want to do is actually described here on Wikipedia:
Matrix matching involves comparing an image to a stored glyph on a pixel-by-pixel basis; it is also known as "pattern matching" or "pattern recognition".[9] This relies on the input glyph being correctly isolated from the rest of the image, and on the stored glyph being in a similar font and at the same scale. This technique works best with typewritten text and does not work well when new fonts are encountered. This is the technique the early physical photocell-based OCR implemented, rather directly. (http://en.wikipedia.org/wiki/Optical_character_recognition#Character_recognition)
If anyone could help me out on this one, I would appreciate it very much.
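For concreteness, this is the kind of pixel-by-pixel ("matrix matching") comparison I have in mind, as a rough sketch (assuming both glyphs are already binarised NumPy arrays of the same shape; the tiny 5x5 example pattern is made up):

import numpy as np

def similarity(glyph, template):
    # Fraction of pixels that agree between two same-sized binary (0/1) images.
    assert glyph.shape == template.shape
    return float(np.mean(glyph == template))

# Hypothetical tiny glyphs: a scanned character versus a font template.
scanned = np.array([[0, 1, 1, 1, 0],
                    [1, 0, 0, 0, 1],
                    [1, 1, 1, 1, 1],
                    [1, 0, 0, 0, 1],
                    [1, 0, 0, 0, 1]])
font_A = scanned.copy()             # pretend the font 'A' matches perfectly

print(similarity(scanned, font_A))  # 1.0 for a perfect match

What I do not know is whether a plain agreement count like this is robust enough, or whether something like a correlation coefficient or a projection onto the template would work better, hence the question.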
For recognition or classification, most OCR systems use neural networks.
These must be properly configured for the desired task: number of layers, internal interconnection architecture, and so on. Another problem with neural networks is that they must be properly trained, which is hard to do well because you need to know things like an appropriate training dataset size (so that it contains enough information without over-training). If you do not have experience with neural networks, do not go this way if you need to implement it yourself!
There are also other ways to compare patterns
vector approach
polygonize image (edges or border)
compare polygon similarity (surface area, perimeter, shape, ...)
pixel approach
You can compare images based on:
histogram
DFT/DCT spectral analysis
size
number of occupied pixels in each line
start position of the occupied pixels in each line (from the left)
end position of the occupied pixels in each line (from the right)
the last three parameters can also be computed for rows
list of points of interest (points where there is some change, like an intensity bump, an edge, ...)
You create a feature list for each tested character and compare it to the features of your font; the closest match is your character. These feature lists can also be scaled to some fixed size (like 64x64) so that the recognition becomes invariant to scaling.
Here is a sample of the features I use for OCR.
In this case the feature size is scaled to fit in NxN, so each character has 6 arrays of N numbers each, like:
int row_pixels[N]; // 1st image
int lin_pixels[N]; // 2nd image
int row_y0[N];     // 3rd image, green
int row_y1[N];     // 3rd image, red
int lin_x0[N];     // 4th image, green
int lin_x1[N];     // 4th image, red
Now: pre-compute all the features for each character in your font and for each read (scanned) character, then find the closest match from the font:
min distance between all feature vectors/arrays
not exceeding some threshold difference
This is partially invariant to rotation and skew, up to a point. I do OCR for filled characters, so for an outlined font it may need some tweaking.
[Notes]
For the comparison you can use a distance metric or a correlation coefficient.
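Here is a rough Python sketch of the whole idea; it is a simplified variant of the features above (not the exact set from my images), and the nearest-neighbour resampling and Euclidean distance are just one possible choice:

import numpy as np

N = 64  # fixed feature size so the recognition becomes scale-invariant

def features(glyph):
    # Build a feature vector from a binary (0/1) image, resampled to N x N.
    ys = np.arange(N) * glyph.shape[0] // N
    xs = np.arange(N) * glyph.shape[1] // N
    g = glyph[np.ix_(ys, xs)]

    row_pixels = g.sum(axis=1)                            # occupied pixels per row
    col_pixels = g.sum(axis=0)                            # occupied pixels per column
    first_in_row = np.argmax(g, axis=1)                   # start of ink in each row (from left)
    last_in_row = N - 1 - np.argmax(g[:, ::-1], axis=1)   # end of ink in each row (from right)
    empty = row_pixels == 0                               # rows with no ink get position 0
    first_in_row[empty] = 0
    last_in_row[empty] = 0
    return np.concatenate([row_pixels, col_pixels, first_in_row, last_in_row]).astype(float)

def recognize(glyph, font):
    # font: dict mapping character -> precomputed feature vector of its template.
    f = features(glyph)
    return min(font, key=lambda ch: np.linalg.norm(f - font[ch]))

# Usage: font = {'A': features(template_A), 'B': features(template_B), ...}
#        best = recognize(scanned_glyph, font)

You could also add a threshold on the minimum distance so that characters not present in the font at all are rejected instead of matched to the nearest one.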
I've been told that a tournament chart, read from the bottom to the top where there is a winner, is somehow connected with the Gray code. I know that the Gray code is an alternative binary code, that it is recursive, and that it is useful for finding the best solution in various games, for space-filling curves, error-correcting codes, and hard-disk positioning, and as a shorthand for piano players. But how is this code related to a tournament chart?
Parsed the following from here:
A tournament is really a node in a binary tree. The value in each node contains the ranking of the best-ranking team contained in the tournament tree. It turns out that the Gray code of the ranking - 1 has a bit pattern that conveniently helps us descend the binary tree to the appropriate place at which to put the team. When descending the tree, the bits in the Gray code of the ranking, from least significant to most significant, indicate which branch to take.
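A small sketch to make that concrete (assuming a standard single-elimination bracket whose size is a power of two): for each ranking r, take the Gray code of r - 1 and read its bits from least significant to most significant as left/right choices while descending the tree.

def bracket_positions(num_teams):
    # Place rankings 1..num_teams into bracket slots using the Gray code of ranking-1.
    depth = num_teams.bit_length() - 1          # num_teams must be a power of two
    slots = [0] * num_teams
    for rank in range(1, num_teams + 1):
        gray = (rank - 1) ^ ((rank - 1) >> 1)   # binary-reflected Gray code
        leaf = 0
        for level in range(depth):              # descend the tree, least-significant bit first
            bit = (gray >> level) & 1           # 0 = left branch, 1 = right branch
            leaf = (leaf << 1) | bit
        slots[leaf] = rank
    return slots

print(bracket_positions(8))  # [1, 8, 4, 5, 2, 7, 3, 6]

Reading the output in adjacent pairs gives the familiar first-round pairings 1 v 8, 4 v 5, 2 v 7 and 3 v 6, i.e. the usual seeding in which the top two rankings can only meet in the final.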
I'm trying to create a very specific geodesic tessellation, but I can't find anything online about it.
It is normal to subdivide the triangles of an icosahedron into triangle patches and project them onto the sphere. However, I noticed an animated GIF on the Wikipedia entry for Geodesic Domes that appears not to follow this scheme. Geodesic spheres generally comprise a mixture of mostly hexagonal triangle patches, with pentagonal patches forming at the vertices of the original icosahedron; in most cases, these pentagons are linked together; that is, following a straight edge from the center of one pentagon leads to the center of another pentagon. In the Wikipedia animation, however, the edge from the center of one pentagon doesn't appear to intersect the center of an adjacent pentagon; instead, it intersects the side of the other pentagon.
Where can I go to learn about the math behind this particular geometry? Ideally, I'd like to know of an algorithm for generating such tessellations.
Marcelo,
The most-commonly employed geodesic tessellations are either Class-I or Class-II. The image you reference is of a Class-III tessellation, more-specifically, 4v{3,1}. The classes can be diagrammed, so:
Class-III tessellations are chiral, and can have left-handed or right-handed twist. Here's the mirror-image of the sample you referenced:
You can find some 3D models of Class-III spheres, at Google's 3D Warehouse:
http://sketchup.google.com/3dwarehouse/cldetails?mid=b926c2713e303860a99d92cd8fe533cd
Having it properly identified should get you off to a good start.
Feel free to stop by the Geodesic Help Group; http://groups.google.com/group/GeodesicHelp?hl=en
TaffGoch
Here's an image from one of Joe Clinton's NASA publications:
I believe it is actually just a matter of resolution (i.e., the number of subdivisions). The tessellation you show does seem to emanate from an icosahedron scheme: cf. p. 7 here, mid-page example. Check out the rest of the document for some calculation details, as well as its cited references and some further code samples here.
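If it helps, here is a minimal sketch of the straight (Class-I) variant of that icosahedron scheme: subdivide one face on a triangular grid and push every point out to the unit sphere. Class-III tessellations like the one identified above use a skewed (m,n) grid on each face instead, but the projection step is the same. The corner vectors in the example are three mutually adjacent vertices of a regular icosahedron.

import numpy as np

def subdivide_face(v0, v1, v2, freq):
    # Subdivide one icosahedron face at the given frequency and project onto the unit sphere.
    v0, v1, v2 = (np.asarray(v, dtype=float) for v in (v0, v1, v2))
    points = []
    for i in range(freq + 1):
        for j in range(freq + 1 - i):
            k = freq - i - j
            p = (i * v0 + j * v1 + k * v2) / freq   # point on the flat face (barycentric grid)
            points.append(p / np.linalg.norm(p))    # project it onto the sphere
    return np.array(points)

# Three mutually adjacent icosahedron vertices (one face), normalised to the unit sphere.
phi = (1 + 5 ** 0.5) / 2
a, b, c = [np.array(v) / np.linalg.norm(np.array(v)) for v in
           [(0, 1, phi), (0, -1, phi), (phi, 0, 1)]]
print(subdivide_face(a, b, c, freq=4).shape)        # (15, 3): vertices of a 4v breakdown of one face

Repeat this over all 20 faces and merge the duplicated vertices along shared edges to get the whole sphere; the pentagons then appear automatically at the 12 original icosahedron vertices.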
Marcelo,
If you want to devise algorithms to generate any class of geodesic spheres, you can do it here:
http://thomson.phy.syr.edu/thomsonapplet.htm
Start by using the "custom(m,n)" option, select your desired parameters, then hit the "pause" button. Switch to "lattice energy" and hit the "Auto" button.
If you're intimately familiar with Java, you can save the "jar" file(s) for this app and examine the contents to reverse-engineer the algorithms.
BTW, this Java app also has a "File" menu option which can open a new window listing the "Point set" (vertex coordinates). I copy and paste them into an Excel spreadsheet, from which I can generate a "csv" file that can subsequently be imported into 3D-graphics programs.
Taff