License plate recognition: from A to 9

Discussions of how license plate recognition works today have come up on Habr a couple of times already, but until now Habr had no article that surveys the different approaches. So here we will try to sort out how it all works. Then, if the article draws interest, we will continue and publish a working model that you can experiment with.





Software vs. hardware

One of the key parameters of a recognition system is the hardware used for shooting. The more powerful and better the lighting and the better the camera, the higher the chance of recognizing the plate. A good infrared (IR) illuminator can shine through even the dust and dirt on the plate and outshine all interfering factors. I suspect some readers have received a "chain letter" of this kind (slang for a mailed traffic fine notice) in which nothing but the plate is visible.


The better the imaging system, the more reliable the results. Even the best algorithm is useless without good imaging: you can always find a plate that it fails to recognize. Here are two very different pictures:


This article covers only the software part, with the emphasis on the case where the plate is poorly visible and distorted (simply shot handheld with an arbitrary camera).

Structure of the algorithm

• Preliminary plate search: detecting the region that contains the plate
• Plate normalization: determining the exact boundaries of the plate and normalizing contrast
• OCR: reading everything found in the normalized image

This is the basic structure. Of course, when the plate is positioned straight and well lit, and you have an excellent text recognition algorithm at your disposal, the first two steps drop out. In some algorithms, plate search and normalization are combined.

Part 1: preliminary search algorithms

Analysis of edges and shapes, contour analysis

The most obvious way to isolate a plate is to look for a rectangular contour. It works only when there is a clearly readable contour that nothing obscures, at sufficiently high resolution and with a smooth boundary.




The image is filtered to find edges, then all contours are extracted and analyzed. Almost all student projects in image processing are done this way. The internet is full of examples. It works poorly, but it works somehow.

Analysis of only part of the borders

A much more interesting, stable and practical approach analyzes only part of the plate frame. Contours are extracted, and then all vertical lines are found. For any two lines located close to each other, with a small offset along the y axis and the right ratio of the distance between them to their length, the hypothesis is considered that a plate lies between them. In essence, this approach resembles a simplified HOG method.
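
The pairing rule above can be sketched roughly as follows. This is a minimal illustration, not the implementation behind the article: the vertical segments are assumed to be already extracted and are passed as hypothetical `(x, y_top, y_bottom)` tuples, and every threshold (`min_ratio`, `max_ratio`, `max_y_shift`, the 30% length tolerance) is an assumption chosen for the example.

```python
from itertools import combinations

def plate_hypotheses(segments, min_ratio=3.5, max_ratio=6.0, max_y_shift=0.25):
    """Pair vertical segments that could be the left and right plate edges.

    segments: list of (x, y_top, y_bottom) tuples with y_top < y_bottom.
    Returns a list of (x_left, y_top, x_right, y_bottom) boxes.
    All thresholds are illustrative assumptions, not tuned values.
    """
    boxes = []
    for a, b in combinations(sorted(segments), 2):
        (xa, ta, ba), (xb, tb, bb) = a, b
        len_a, len_b = ba - ta, bb - tb
        mean_len = (len_a + len_b) / 2
        # The two edges must have similar length...
        if abs(len_a - len_b) > 0.3 * mean_len:
            continue
        # ...only a slight shift along the y axis...
        if abs(ta - tb) > max_y_shift * mean_len:
            continue
        # ...and a distance-to-length ratio plausible for a plate.
        ratio = (xb - xa) / mean_len
        if min_ratio <= ratio <= max_ratio:
            boxes.append((xa, min(ta, tb), xb, max(ba, bb)))
    return boxes

# Two short, aligned segments plus one long unrelated line.
segments = [(10, 50, 70), (102, 52, 72), (300, 10, 200)]
print(plate_hypotheses(segments))  # [(10, 50, 102, 72)]
```

Each returned box is only a hypothesis; it still has to be confirmed by a later stage such as normalization and OCR.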


Histogram analysis of regions

One of the most popular approaches is the analysis of image histograms (1, 2). It rests on the assumption that the frequency characteristics of the region containing the plate differ from those of its surroundings.




Edges are extracted from the image (that is, its high spatial-frequency components are selected). The image is then projected onto the y axis (sometimes onto the x axis as well). The maximum of the resulting projection may coincide with the location of the plate.
This approach has a significant drawback: the car must be comparable in size to the frame, because the background may contain inscriptions or other detailed objects.
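
A minimal sketch of the projection idea, under simplifying assumptions: the image is a grayscale list of rows, the edge filter is a plain horizontal difference, and the band is grown around the maximum down to a hypothetical `threshold` fraction of the peak. Plain Python lists are used only to keep the example dependency-free.

```python
def horizontal_edge_strength(image):
    """Absolute horizontal gradient: a crude high-pass (edge) filter."""
    return [[abs(row[x + 1] - row[x]) for x in range(len(row) - 1)]
            for row in image]

def row_projection(edges):
    """Project the edge image onto the y axis: one sum per image row."""
    return [sum(row) for row in edges]

def plate_band(image, threshold=0.5):
    """Return (first_row, last_row) of the band around the projection maximum."""
    proj = row_projection(horizontal_edge_strength(image))
    peak = max(range(len(proj)), key=proj.__getitem__)
    level = threshold * proj[peak]
    top = bottom = peak
    # Grow the band while neighbouring rows stay above the chosen level.
    while top > 0 and proj[top - 1] >= level:
        top -= 1
    while bottom < len(proj) - 1 and proj[bottom + 1] >= level:
        bottom += 1
    return top, bottom

# Synthetic 8x12 frame: flat background, rows 3..4 carry "text" stripes.
image = [[100] * 12 for _ in range(8)]
for y in (3, 4):
    image[y] = [0 if x % 2 else 255 for x in range(12)]
print(plate_band(image))  # (3, 4)
```

The drawback mentioned above is easy to see here: any other detailed object in the frame would produce its own peak in the same projection.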

Statistical analysis, classifiers

What is the downside of all the previous methods? Real, mud-stained plates have neither pronounced borders nor pronounced statistics. Below are a few examples of such plates. And I must say that for Moscow these examples are not the worst case.




The best methods, although not used very often, are those based on various classifiers. For example, a trained Haar cascade works well. These methods analyze a region for the relations, points or gradients characteristic of a plate. The most elegant, in my view, is the method based on a specially synthesized transform. Admittedly, I have not tried it, but at first glance it should work stably.
Such methods find not just a plate, but a plate in difficult and atypical conditions. The same Haar cascade, on a database collected in winter in central Moscow, gave about 90% correct plate detections and 2-3% false detections. No edge-detection or histogram algorithm can deliver such detection quality on images this bad.
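
A Haar cascade evaluates many simple rectangle features on an integral image and rejects candidate windows stage by stage. The sketch below shows only that building block, an integral image and one two-rectangle feature; it is not a trained cascade, and the function names are hypothetical.

```python
def integral_image(image):
    """ii[y][x] = sum of image[0..y-1][0..x-1]; has one extra row and column."""
    h, w = len(image), len(image[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += image[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle with top-left corner (x, y), in O(1)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def haar_two_rect(ii, x, y, w, h):
    """Two-rectangle feature: left half minus right half of a w x h window."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

# Bright left half, dark right half: a vertical edge gives a strong response.
image = [[200, 200, 10, 10] for _ in range(3)]
ii = integral_image(image)
print(rect_sum(ii, 0, 0, 4, 3))       # 1260, the sum of all pixels
print(haar_two_rect(ii, 0, 0, 4, 3))  # 1140
```

Because every rectangle sum costs four lookups regardless of its size, thousands of such features can be evaluated per window, which is what makes a cascade practical.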

Weak points

Many methods in real algorithms rely, directly or indirectly, on the presence of plate borders. Even if the borders are not used to detect the plate, they may be used in further analysis.
Surprisingly, a difficult case for statistical algorithms can be even a relatively clean plate in a chrome (light) frame on a white car, since it occurs much less often than dirty plates and may not appear often enough in the training set.

Part 2: normalization algorithms

Most of the algorithms above locate the plate imprecisely and require further refinement of its position as well as an improvement in image quality. For example, this case requires rotating the image and cropping the edges:


Rotating the plate to a horizontal orientation

Once only the neighborhood of the plate is left, edge extraction starts to work much better, since all the long horizontal lines that can be extracted will be the borders of the plate.
The simplest filter capable of extracting such lines is the Hough transform:


The Hough transform makes it possible to quickly pick out the two main lines and crop the image along them:
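
A minimal sketch of the voting step, assuming the edge pixels are already available as `(x, y)` pairs and that only near-horizontal lines (80 to 100 degrees in the usual `x*cos(theta) + y*sin(theta) = rho` parametrization) are of interest. The angle range and `rho_step` are illustrative assumptions.

```python
import math
from collections import Counter

def hough_lines(points, angles_deg, rho_step=1.0):
    """Vote in (theta, rho) space; line model: x*cos(theta) + y*sin(theta) = rho."""
    votes = Counter()
    for theta_deg in angles_deg:
        theta = math.radians(theta_deg)
        c, s = math.cos(theta), math.sin(theta)
        for x, y in points:
            rho = round((x * c + y * s) / rho_step)
            votes[(theta_deg, rho * rho_step)] += 1
    return votes

def two_main_lines(points, angles_deg=range(80, 101)):
    """Return the two strongest near-horizontal lines as (theta_deg, rho)."""
    votes = hough_lines(points, angles_deg)
    return [line for line, _ in votes.most_common(2)]

# Edge pixels of two horizontal plate borders at y = 5 and y = 25.
points = [(x, 5) for x in range(40)] + [(x, 25) for x in range(40)]
print(sorted(two_main_lines(points)))  # [(90, 5.0), (90, 25.0)]
```

On real images, neighbouring accumulator cells of one border often collect almost as many votes as the border itself, so some form of non-maximum suppression is usually needed before taking the top two lines.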


Increasing contrast

Next, it is a good idea to improve the contrast of the resulting image in one way or another. Strictly speaking, what needs amplifying is the range of spatial frequencies we are interested in:
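
One simple way to emphasize a band of spatial frequencies is a difference of two blurs followed by a linear stretch. The sketch below is an assumption about how this could be done, not the filter used for the figures in the article; the window radii `small` and `large` are hypothetical and would have to match the stroke width of the characters.

```python
def box_blur_row(row, radius):
    """Moving average with window 2*radius+1, clamped at the row ends."""
    n = len(row)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(row[lo:hi]) / (hi - lo))
    return out

def band_pass_row(row, small=1, large=8):
    """Difference of two blurs: drops fine noise and slow illumination drift."""
    fine = box_blur_row(row, small)
    coarse = box_blur_row(row, large)
    return [f - c for f, c in zip(fine, coarse)]

def stretch(values, lo=0, hi=255):
    """Linearly map the value range onto [lo, hi]."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [lo for _ in values]
    scale = (hi - lo) / (vmax - vmin)
    return [round(lo + (v - vmin) * scale) for v in values]

def enhance(image, small=1, large=8):
    """Band-pass every row, then stretch the whole image to 0..255."""
    filtered = [band_pass_row(row, small, large) for row in image]
    flat = stretch([v for row in filtered for v in row])
    w = len(image[0])
    return [flat[i:i + w] for i in range(0, len(flat), w)]

# Low-contrast strokes (period 8 px) on top of a strong brightness ramp.
row = [100 + 2 * x + (10 if (x // 4) % 2 else 0) for x in range(32)]
out = enhance([row])[0]
print(min(out), max(out))  # 0 255
```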


Splitting into letters

After rotation we have a horizontal plate whose left and right edges are determined imprecisely. Cropping the excess precisely is no longer necessary; it is enough to cut out the letters present on the plate and work with them during recognition.


(In the figure a binarization operation has already been performed, i.e., some rule was applied to split the pixels into two classes. Binarization is not required for splitting the plate into characters, and later on it may even be harmful.)
Now it is enough to find the maxima of the horizontal profile; these will be the gaps between the letters. Especially if we expect a certain number of characters and the distances between the characters are roughly equal, splitting into letters by this histogram works very well.
All that remains is to cut out the letters and proceed to recognizing them.
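
The splitting step can be sketched as follows, assuming dark characters on a bright plate background, so that gap columns have the highest brightness sums. The `gap_level` fraction is an illustrative assumption.

```python
def column_profile(image):
    """Sum brightness in every column (the horizontal profile)."""
    return [sum(col) for col in zip(*image)]

def split_characters(image, gap_level=0.9):
    """Return (start, end) column ranges of characters, end exclusive.

    Columns whose brightness sum is close to the maximum are treated as
    gaps (bright plate background); gap_level is an assumed fraction.
    """
    profile = column_profile(image)
    threshold = gap_level * max(profile)
    spans, start = [], None
    for x, value in enumerate(profile):
        is_gap = value >= threshold
        if not is_gap and start is None:
            start = x                    # a character begins
        elif is_gap and start is not None:
            spans.append((start, x))     # a character ends
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans

# 4 rows x 11 columns: bright background (255) with three dark characters.
row = [255, 0, 0, 255, 255, 40, 40, 40, 255, 0, 255]
image = [row[:] for _ in range(4)]
print(split_characters(image))  # [(1, 3), (5, 8), (9, 10)]
```

Knowing the expected number of characters and their roughly equal spacing, the resulting spans can additionally be checked or merged, which the sketch leaves out.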

Weaknesses

When the plate is heavily soiled, the periodic peaks used for splitting into characters may simply fail to appear, even though the characters may still be quite readable visually.
The horizontal borders of a plate are not always a good reference. Plates can be bent (Mercedes C-Class) or carefully tucked into the almost square plate recess found on American cars. And the upper border of rear plates is often simply covered by body elements.
Of course, taking all such issues into account is a serious challenge for a plate recognition system.

Part 3: character recognition algorithms

The problem of recognizing text or individual characters (optical character recognition, OCR) is difficult on the one hand, but quite classical on the other. There are many algorithms for it, some of which have reached perfection. Yet the best algorithms are not publicly available. There is, of course, Tesseract OCR and a few analogues, but these do not solve every problem. In general, text recognition methods fall into two classes: structural methods, based on morphology and contour analysis, which work with a binarized image; and raster methods, based on direct analysis of the image. A combination of structural and raster methods is often used.

Differences from the standard OCR task

First, at least in Russia, car plates use a standard font. That is simply a gift for an automatic recognition system: 90% of the effort in OCR goes into handwriting.
Second, dirt.


This is where the vast majority of known character recognition methods have to be discarded, especially those that binarize the image along the way in order to check the connectivity of regions or to separate characters.

Tesseract OCR

This is open-source software that automatically recognizes both single letters and whole text. Tesseract is convenient because it is available for any operating system, runs stably and is easy to train. But it works very poorly with blurred, broken, dirty and deformed text. When I tried to use it for plate recognition, at best only 20-30% of the plates in the database were recognized correctly, namely the clearest and straightest ones. Although, of course, with ready-made libraries something always depends on how crooked the user's hands are.

K-nearest neighbors

A very easy-to-understand character recognition method which, despite its primitiveness, can often beat less successful implementations of SVM or neural network methods.
It works as follows:
1) a decent number of images of real characters is recorded in advance, already sorted into classes by one's own eyes and hands;
2) a distance measure between characters is introduced (if the image is binarized, the XOR operation is optimal);
3) then, when we try to recognize a character, the distance between it and every character in the database is computed in turn. Among the k nearest neighbors there may be representatives of different classes. Naturally, the character is assigned to the class that has the most representatives among the neighbors.
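
The three steps above can be sketched as follows. This is a toy illustration: the 3x3 "font" and the database are made up, binarized images are packed into integers so that the XOR distance is a single operation plus a bit count, and all function names are hypothetical.

```python
from collections import Counter

def to_bits(rows):
    """Pack a binarized image given as strings of '0'/'1' into one integer."""
    return int("".join(rows), 2)

def xor_distance(a, b):
    """Number of differing pixels between two packed binary images."""
    return bin(a ^ b).count("1")

def knn_classify(sample, database, k=3):
    """database: list of (bitmask, label). Majority vote among the k nearest."""
    nearest = sorted(database, key=lambda item: xor_distance(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy 3x3 "font": a vertical bar for '1' and a ring for '0', with noisy copies.
database = [
    (to_bits(["010", "010", "010"]), "1"),
    (to_bits(["010", "010", "011"]), "1"),
    (to_bits(["110", "010", "010"]), "1"),
    (to_bits(["111", "101", "111"]), "0"),
    (to_bits(["111", "101", "110"]), "0"),
    (to_bits(["011", "101", "111"]), "0"),
]
noisy_one = to_bits(["010", "011", "010"])
print(knn_classify(noisy_one, database))  # 1
```

The linear scan over the database is the expensive part, which is why a cheap distance such as XOR matters once the database becomes large.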

In theory, if you collect a very large database with examples of characters shot at different angles and under different lighting, with all kinds of wear, then k-nearest neighbors is all you need. But then the distance between images has to be computed very quickly, which means binarizing and using XOR. And that is exactly where dirty or worn plates cause problems: binarization changes a character unpredictably.
The method has one very important advantage: it is simple and transparent, and hence easy to debug and tune for the best result. In many cases it is very important to understand how your algorithm works.

Correlation

Most methods used in image recognition are built on an empirical approach. But nobody forbids using the mathematical apparatus of probability theory, which was polished to a shine in signal detection problems for radar systems. We know the font on car plates, while the camera noise or the dust on the plate can be called Gaussian only at a stretch. There is some uncertainty about the position of the character and its tilt, but these parameters can be enumerated. If we leave the image non-binarized, the amplitude of the signal, i.e., the brightness of the character, is also unknown.
I do not want to go deep into the rigorous solution of this problem within this article. In essence, it all comes down to computing the covariance of the input signal with a hypothesis (taking into account the set of shifts and rotations):
$$\operatorname{cov}(X, Y) = E\big[(X - E[X])\,(Y - E[Y])\big]$$
Here X is the input signal, Y is the hypothesis, and E denotes the expectation.
If we need to choose among different characters, hypotheses over rotation and shift are built for each character. If we know that the input image contains a character, the maximum covariance over all hypotheses determines the character, its shift and its tilt. Here, of course, the problem of similar-looking characters arises ("p" and "c", "o" and "c", etc.). The simplest remedy is to introduce a matrix of weighting coefficients for each character.
Such methods are sometimes called "template matching", which fully reflects their essence. Templates are given, and the input image is compared with them. If some parameters are uncertain, either all possible variants are enumerated or adaptive approaches are used, though for those you already have to know and understand the mathematics.
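
A minimal sketch of covariance-based template matching, under simplifying assumptions: only shifts are enumerated (rotations are left out), the templates are a made-up 3x3 "font", and the expectation is replaced by an average over the pixels of the window. The function names are hypothetical.

```python
def covariance(xs, ys):
    """cov(X, Y) = E[(X - E[X]) * (Y - E[Y])] over paired pixel values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

def window(image, dx, dy, w, h):
    """Flatten the w x h window whose top-left corner is (dx, dy)."""
    return [image[dy + y][dx + x] for y in range(h) for x in range(w)]

def match(image, templates):
    """Try every template at every shift; return (label, dx, dy) of the best."""
    best, best_score = None, float("-inf")
    for label, tpl in templates.items():
        h, w = len(tpl), len(tpl[0])
        flat_tpl = [v for row in tpl for v in row]
        for dy in range(len(image) - h + 1):
            for dx in range(len(image[0]) - w + 1):
                score = covariance(window(image, dx, dy, w, h), flat_tpl)
                if score > best_score:
                    best, best_score = (label, dx, dy), score
    return best

templates = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}
# 5x5 image: background 100, an "L" of unknown amplitude (+60) at shift (1, 1).
image = [[100] * 5 for _ in range(5)]
for y, row in enumerate(templates["L"]):
    for x, v in enumerate(row):
        image[1 + y][1 + x] += 60 * v
print(match(image, templates))  # ('L', 1, 1)
```

Subtracting the means removes the unknown background level, and a constant brightness factor scales all scores equally, so it does not change which hypothesis wins. Raw covariance does favor templates with more variance, which is one reason a normalized correlation or per-character weights are often used instead. The nested loops also show why the method is computationally expensive.
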
Advantages of the method:
- Predictable and well-studied result, provided the noise at least roughly matches the chosen model;
- If the font is strictly fixed, as in our case, it can make out a heavily dusted / dirty / worn character.
Disadvantages:
- Computationally very expensive.

Neural networks

A lot has already been written on Habr about artificial neural networks. Nowadays they are divided into two generations:
- classic 2-3-layer neural networks trained by gradient methods with error backpropagation (a 3-layer network is shown in the figure);
- so-called deep-learning neural networks and convolutional networks.
For the past 7 years, second-generation networks have been winning various image recognition competitions, producing results slightly better than other methods.
There is an open database of images of handwritten digits. Its results table demonstrates very clearly how the various methods, including neural-network-based algorithms, have evolved.
It is also worth noting separately that for printed fonts a single-layer or two-layer (a question of terminology) network works perfectly well, and in essence it is no different from template-matching approaches.
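
The kinship with template matching can be seen in a toy single-layer classifier, where each class owns one weight vector that is compared with the input by a dot product. The sketch below is only an illustration: it uses the simple multi-class perceptron update rather than gradient training with backpropagation, pixels are encoded as -1/+1 so that no bias term is needed, and the 3x3 patterns are made up.

```python
def predict(weights, x):
    """Single layer: one weight vector (a learned template) per class."""
    scores = {label: sum(w * v for w, v in zip(ws, x))
              for label, ws in weights.items()}
    return max(scores, key=scores.get)

def train(samples, labels, epochs=20):
    """Multi-class perceptron rule; samples are flat lists of -1/+1 pixels."""
    weights = {label: [0.0] * len(samples[0]) for label in sorted(set(labels))}
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            guess = predict(weights, x)
            if guess != target:
                # Pull the right template towards x, push the wrong one away.
                weights[target] = [w + v for w, v in zip(weights[target], x)]
                weights[guess] = [w - v for w, v in zip(weights[guess], x)]
    return weights

# Flattened 3x3 patterns: a vertical bar "I" and a corner "L".
I0 = [-1, 1, -1, -1, 1, -1, -1, 1, -1]
L0 = [1, -1, -1, 1, -1, -1, 1, 1, 1]
weights = train([I0, L0], ["I", "L"])

noisy_i = I0[:-1] + [1]          # last pixel flipped
noisy_l = [L0[0], 1] + L0[2:]    # second pixel flipped
print(predict(weights, noisy_i), predict(weights, noisy_l))  # I L
```

After training, the weight vector of each class plays the role of a template, and classification is a comparison of the input with every template, just as in the correlation approach.
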
Advantages of the method:
- With proper configuration and training it can work better than other known methods;
- With a large training dataset it is robust to character distortions.
Disadvantages:
- The most complex of the methods described;
- Diagnosing anomalous behavior in multilayer networks is simply impossible.

Conclusion

The article has covered the basic recognition methods and their typical failures and errors. Perhaps this will help you make your plate a bit more readable when driving around the city, or vice versa.
I also hope I managed to show that there is no magic at all in the plate recognition problem. Everything is clear and intuitive. It is by no means a daunting task for a student's coursework in the relevant specialty.
In a few days ZlodeiBaal will publish a small plate recognizer based on the work of ours that this article grew out of. You will be able to put it through its paces.
P.S. All the plates shown in the article were retrieved from Google and Yandex with simple queries.

References

1) Ondrej Martinsky, "Algorithmic and Mathematical Principles of Automatic Number Plate Recognition Systems" - a review article.
2) Kuo-Ming Hung and Ching-Tang Hsieh, "A Real-Time Mobile Vehicle License Plate Detection and Recognition" - the histogram approach to plate recognition.
3) Fatih Porikli and Tekin Kocak, "Robust License Plate Detection Using Covariance Descriptor in a Neural Network Framework" - a neural network approach to finding plates.
4) Saqib Rasheed, Asad Naeem and Omer Ishaq, "Automated Number Plate Recognition Using Hough Lines and Template Matching" - finding plates via HOG descriptors of vertical lines.
5) Suruchi G. Dedgaonkar, Anjali A. Chandavale and Ashok M. Sapkal, "Survey of Methods for Character Recognition" - a short review article on recognizing letters and digits.
7) V. R. Krasheninnikov, "Fundamentals of the Theory of Image Processing" - a textbook.

Source: habrahabr.ru/post/221891/
