WO2018133717A1

WO2018133717A1 - Image thresholding method and device, and terminal

Info

Publication number: WO2018133717A1
Application number: PCT/CN2018/072047
Authority: WO
Inventors: 刘银松; 郭安泰
Original assignee: 腾讯科技（深圳）有限公司
Priority date: 2017-01-17
Filing date: 2018-01-10
Publication date: 2018-07-26
Also published as: CN106874906A; CN106874906B

Abstract

Disclosed are an image thresholding method and device, and a terminal. The present invention only requires independently using various thresholding methods having favorable compensation performance to process an image to be processed, then using an optical character recognition-based learning engine to obtain confidence values of respective characters so as to further calculate a confidence value of a text, thereby dynamically selecting an optimal processing result. The invention does not need to take into consideration global information or local texture and realizes a seamless shift between processing results produced by various thresholding methods. The present invention enables dynamic selection of an optimal thresholding result under different scenarios, thereby meeting diverse requirements of different scenarios, and enabling thresholding of an image in various scenarios.

Description

Image binarization method, device and terminal

This application claims priority to Chinese Patent Application No. JP-A No. No. No. No. No. No. No. No. No. No. No. Combined in this application.

Technical field

This application relates to the field of image processing.

Background technique

The binarization of the image is to set the gray value of the pixel on the image to 0 or 255, so that the entire image presents a distinct black and white visual effect. Binarization is the basic operation of image processing and its application is very extensive. Accordingly, there are many binarization methods, such as the bimodal method, the P-parameter method, the iterative method, and the maximum inter-class variance method.

However, the diversity of binarization methods and the limitations of each binarization method lead to the difficulty of quickly finding a suitable binarization method when it is necessary to binarize pictures of multiple scenes, thus affecting the image. Binary effect.

Summary of the invention

In order to solve the above technical problem, the present application proposes a method, a device and a terminal for binarization of pictures.

The technical solution of the embodiment of the present application is as follows:

In one aspect, a method for binarization of a picture is provided, the method comprising:

The binarization device of the image acquires a to-be-processed image, and the to-be-processed image contains text;

The binarization device of the image separately performs independent binarization processing on the to-be-processed image by using a plurality of preset binarization processing methods, and each binarization method obtains a processing result;

The binarization device of the picture obtains a set of processing results according to the processing result;

The binarization device of the picture calculates a text confidence of each processing result in the set of processing results;

The binarization device of the picture selects the processing result with the highest text confidence as the binarization result of the to-be-processed picture.

On the other hand, a binarization device for pictures is provided,

In a possible implementation manner, the device includes:

The image acquisition module to be processed is used to obtain a picture to be processed;

The processing result obtaining module is configured to perform independent binarization processing on the to-be-processed image by using a plurality of preset binarization processing methods, and each binarization method obtains a processing result;

Processing a result set obtaining module, configured to obtain a processing result set according to the processing result;

a text confidence calculation module, configured to calculate a text confidence of each processing result in the processing result set;

The binarization result obtaining module is configured to select a processing result with the highest degree of text confidence as a binarization result of the to-be-processed picture.

In another possible implementation, the device includes:

Transceiver, processor and bus;

The transceiver and the processor are connected by the bus;

The processor performs the following steps:

Get the image to be processed;

Performing independent binarization processing on the to-be-processed image by using a plurality of preset binarization processing methods, and each binarization method obtains a processing result;

Obtaining a set of processing results according to the processing result;

Calculating a text confidence of each processing result in the set of processing results;

The processing result with the highest degree of confidence in the text is selected as the binarization result of the image to be processed.

In another aspect, a binarization terminal for a picture is provided, the terminal comprising a binarization device of the above picture.

In another aspect, an embodiment of the present application provides a computer readable storage medium, comprising instructions that, when executed on a computer, perform the method described in the first aspect above.

In another aspect, an embodiment of the present application provides a computer program product comprising instructions for performing the method of the first aspect described above when the computer program product is run on a computer.

The present application provides a method, a device and a terminal for binarization of pictures, which have the following beneficial effects:

The present application calculates the text confidence of the binarization result of the image to be processed based on the optical character recognition, and dynamically selects an optimal binarization method according to the text confidence, thereby obtaining an optimal binarization result for the image to be processed. The present application can dynamically select an optimal binarization result in different scenarios to meet the diversity requirements of different scenarios, and implement full scene adaptation for picture binarization.

DRAWINGS

1 is a flowchart of a method for binarization of a picture in the embodiment of the present application;

2 is a flowchart of a method for acquiring text confidence in an embodiment of the present application;

3 is a flowchart of a weighted average algorithm in the embodiment of the present application;

4 is a flowchart of a binarization method based on a sliding window in the embodiment of the present application;

5 is a flowchart of a local binarization method in an embodiment of the present application;

6 is a flowchart of a binarization method based on color value statistics in an embodiment of the present application;

7 is a structural diagram of a convolutional neural network in an embodiment of the present application;

Figure 8 is a picture to be processed in the embodiment of the present application;

9 is a processing result of the to-be-processed picture in FIG. 8 in the binarization method based on the sliding window in the embodiment of the present application;

10 is a processing result of the to-be-processed picture in FIG. 8 by the binarization method based on color value statistics in the embodiment of the present application;

Figure 11 is another picture to be processed in the embodiment of the present application;

FIG. 12 is a processing result of the to-be-processed picture in FIG. 11 according to the binarization method based on the sliding window in the embodiment of the present application; FIG.

FIG. 13 is a processing result of the binarization method based on color value statistics in the embodiment of the present application for the to-be-processed picture in FIG. 11;

FIG. 14 is a block diagram of a binarization apparatus for a picture in the embodiment of the present application; FIG.

15 is a block diagram of a text confidence calculation unit in the embodiment of the present application;

16 is a block diagram of a processing result obtaining module in the embodiment of the present application;

17 is a block diagram of a sliding window binarization unit in the embodiment of the present application;

18 is a block diagram of a color value statistical binarization unit in the embodiment of the present application;

FIG. 19 is a structural block diagram of a terminal in an embodiment of the present application.

detailed description

The technical solutions in the embodiments of the present application are clearly and completely described in the following with reference to the drawings in the embodiments of the present application. It is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments.

Please refer to FIG. 1, which is a flowchart of a binarization method for a picture provided by an embodiment of the present application. The method can include the following steps.

Step 101: Acquire a picture to be processed, where the picture to be processed includes text.

Step 102: Perform independent binarization processing on the to-be-processed image by using a plurality of preset binarization processing methods, and each binarization method obtains a processing result.

The preset binarization method can select an existing binarization method, and the number of preset binarization methods can be two or more.

The existing binarization methods are mainly divided into two categories. One is a global method, which determines a unified segmentation threshold from a global perspective and binarizes by segmentation thresholds. The other is a locally adaptive method based on different regions of the image. In the case, different thresholds are determined and binarized according to the threshold.

The global method mostly calculates a segmentation threshold that can achieve the maximum binarization effect according to the global color statistics of the image, and then performs simple binarization according to the segmentation threshold. This method works well only in images with simple background and single color, and poor effect on images with complex texture information or low contrast.

The local adaptive method mostly calculates the binarization threshold based on the local texture information, which can avoid the misjudgment of the global threshold to a certain extent, but often it is too focused on local information, ignoring the global coordination information, and often causing adjacent The effect of local binarization is very different, and the effect of binarization in adjacent areas is inconsistent.

It can be seen that the existing binarization methods can only deal with fixed scenes, and the adaptive ability is not strong. In order to optimize the binarization effect of the image, the accuracy of the text extraction in the image to be processed in step 101 is improved. In step 102, a plurality of binarization methods can be enumerated, and the text confidence of the subsequent calculation is used to select the most The good binarization method is used to collect the effects of various binarization methods and lengthen the shortcomings, thereby expanding the binarization scene of the image and obtaining the best binarization effect.

Step 103: Obtain a processing result set according to the processing result.

In step 102, a plurality of binarization methods are enumerated, and each binarization method obtains a processing result, thereby constituting a processing result set.

Step 104: Calculate a text confidence of each processing result in the processing result set.

The text confidence is used to characterize the probability that the text in the processing result can be accurately recognized, and the text confidence can be used as an evaluation index of the processing effect of the binarization method. The high confidence of the text indicates that the binarization processing effect is ideal, and the text confidence is low, indicating that the binarization processing effect is not satisfactory.

Step 105: Select a processing result with the highest degree of confidence in the text as a binarization result for the to-be-processed picture.

If there are multiple processing results with the highest degree of confidence in the text, one of the processing results is selected as the binarization result for the to-be-processed image according to a preset selection method. The preset selection method may be a random selection or other selection method.

The embodiment of the present application can enumerate various binarization methods and dynamically select an optimal processing result, so that the pictures of each scene can be binarized and a better binarization effect is obtained. Improve the compatibility of image processing; use text confidence as the evaluation standard of image binarization effect, which can make the selected processing result get the best text recognition result, which is beneficial to other word processing for the result later. .

Further, please refer to FIG. 2, which shows a flow chart of a method for acquiring text confidence in step 104, including:

Step 1041: Acquire a confidence level of each character in the processing result.

Specifically, the confidence of the output of the learning engine is obtained by inputting the processing result into a preset optical character recognition (OCR)-based learning engine. The learning engine may be a deep learning engine based on a Convolutional Neural Network (CNN), and the deep learning engine based on the CNN can better recognize single-word images, and has high accuracy, accurate confidence, and the like. Features are superior to the traditional traditional recognition engine. In addition, one of the advantages of the CNN-implemented deep learning engine over the conventional image processing algorithm is that the pre-processing process (extracting artificial features, etc.) for complex images is avoided, and the original image can be directly input. For example, you can directly input a single-word picture with a resolution of 28*28 and output the confidence directly. The output of confidence is more reliable than traditional methods.

In addition, the traditional Tesseract learning engine and the Nhocr engine also support the recognition of confidence, which can also be used in this embodiment.

In the deep learning engine based on the Convolutional Neural Network (CNN), the traditional Tesseract learning engine and the Nhocr engine, the output of a single text has a confidence of between 0 and 1.

Step 1042: Calculate the text confidence of each of the processing results according to a preset text confidence algorithm and a confidence level of each text.

Specifically, the preset text confidence algorithm includes, but is not limited to, a weighted average algorithm using a weighted average of confidence as a text confidence, and a geometric mean algorithm using a geometric mean of confidence as a text confidence. The squared mean of the confidence is used as the squared mean algorithm of the text confidence and the harmonic mean of the confidence is used as the harmonic mean of the text confidence. Taking the weighted average algorithm as an example, please refer to FIG. 3, which shows a flowchart of the weighted average algorithm, including:

S1. Set the weight corresponding to each text.

Taking N characters as an example, according to the order of the characters in the picture, the weights of the characters can be Q ₀ ... Q _{n-1 respectively} . The weight corresponding to each text can be set randomly by the program, or can be set according to actual needs.

S2. Weighting and summing the confidence according to the confidence of each character and the weight corresponding to the text.

According to the order of the text in the picture, the confidence of each text can be Z ₀ ... Z _{n-1 respectively} , then the process of weighted summation can be expressed as

S3. The weighted average confidence is obtained by dividing the result of the weighted sum by the number of words in the processing result.

Weighted average confidence is

S4. The weighted average confidence is taken as the text confidence.

Weighted average confidence

Applyed as text confidence to step S105

In the embodiment of the present application, by selecting different text confidence algorithms and setting different parameters in the selected text confidence algorithm, the reliability and distinguishing ability of the text confidence can be improved, and the distinction between multiple processing results is supported, and The optimal processing result is distinguished from the plurality of processing results.

Further, in step 102, an existing binarization method may be used, or a custom binarization method may be used, and the following possible binarization methods are taken as an example:

One possible implementation: direct binarization.

After the image is grayed out, for each pixel value of the scanned image, the gray value of the pixel whose gray value is less than 127 is set to 0 (black), and the gray value of the pixel whose gray value is greater than or equal to 127 is set to 255. (White), the advantage of the method is that the amount of calculation is small and fast. The disadvantage is that the pixel distribution and pixel value characteristics of the image are not considered.

Another possible implementation: binarization based on the mean K.

After the image is grayscaled, the average value K of the pixels in the image is calculated; the gray value of each pixel of the scanned image, if the gray value is greater than K, the gray value of the pixel is set to 255 (white), If the gradation value is less than or equal to K, the gradation value of the pixel is set to 0 (black). This method uses the average value as the binarization threshold, although simple, but may result in partial object pixels or background pixels being lost. The binarization result is difficult to truly reflect the source image information.

Another possible implementation: the maximum interclass variance method.

Suppose the image is composed of two parts, the foreground area and the background area. By traversing, calculate the gray histogram of the foreground area and the background area in the segmentation result under different thresholds (usually within the range of [0, 255]), and then compare the two. The variance between the variances, the grayscale threshold that maximizes the variance, is the desired binarization threshold.

For each pixel value of the scanned image, if the gray value is greater than the binarization threshold, the gray value of the pixel is set to 255 (white), and if the gray value is less than or equal to the binarization threshold, then The gray value value of the pixel is set to 0 (black).

The largest interclass variance method is a classical method of binarization, which achieves a good balance between computational speed and binarization effect, but as a global binarization method, its texture information is complex or low contrast. The image is less effective.

The above-mentioned common binarization method or other binarization methods listed in the embodiments of the present application can be applied to step S102. In order to achieve a better binarization effect, the embodiment of the present application performs binarization processing on the picture to be processed by the binarization method based on the sliding window and the binarization method based on the color value statistics in step S102.

Please refer to FIG. 4, which shows a flowchart of a sliding window based binarization method, including:

Step T1. Set a window to a preset position of the to-be-processed picture.

The size and shape of the window may be set according to actual needs. Taking a window containing M*N pixels as an example, the window is set at a preset position. The preset position may also be set according to actual needs. Specifically, it may be located at the upper left corner or the lower right corner of the to-be-processed picture. For a picture to be processed with a width of M pixels, it may be located at the leftmost end of the to-be-processed picture or The far right.

Step T2. Determine whether the pixels in the window and the related pixels belong to a continuous pattern.

The related pixel is a pixel adjacent to the window outside the window. The purpose of step T2 is to determine whether the M*N pixels and the related pixels in the picture to be processed falling within the window belong to a continuous pattern. If not, it is determined that the window contains text.

Step T3. If no, the pixels in the window are locally binarized.

If the window contains text, it is binarized for the pixels in the window. In this embodiment, only the binarization effect of the text portion in the image to be processed is concerned. Therefore, if the window does not contain characters, the processing is not performed, or the window may be directly set to a uniform gray scale.

Step T4. Determine whether the window reaches the end point of the preset track.

The window slides according to a preset trajectory, and the sliding ends at the end of the preset trajectory.

Step T5. If no, the window is slid according to a preset trajectory.

The preset track can be set by itself according to actual needs. For a picture to be processed having a width of M pixels, it may be located to move the window along its length.

Step T6. Return to step T2.

Specifically, the local binarization method used in step T3 can be implemented by using the existing binarization method. In this embodiment, please refer to FIG. 5, which shows the local binarization method in step T3 in this embodiment. Flow chart, including:

Step T31. Obtain a color distribution statistical result of the pixels in the window.

Step T32. Set a threshold according to the statistical result, where the threshold is used to distinguish the foreground and the background of the to-be-processed picture.

Assuming that the picture is composed of two parts, a foreground area and a background area, the foreground pixel and the background pixel of the image are distinguished by selecting the threshold. For example, a pixel whose color is larger than the threshold is classified as a foreground pixel, and vice versa is a background pixel; or a pixel whose color is smaller than the threshold is classified as a foreground pixel, and vice versa. The threshold can be such that the color mean of the foreground pixels and the color mean of the background pixels have the largest difference after segmentation based on the threshold.

Step T33. Binarize pixels in the window according to the threshold.

Specifically, the foreground pixel can be set to 255 (white) and the background pixel to 0 (black); the foreground pixel can be set to 0 (black) and the background pixel to 255 (white).

The sliding window-based binarization method provided by the embodiment of the present invention belongs to a local adaptive method, and is more suitable for a scene in which a text line is divided among pictures, and can obtain better in a scene in which color information is relatively simple and the background texture is not complicated. Binary effect.

Please refer to FIG. 6, which shows a flowchart of a binarization method based on color value statistics, including:

Step P1. Obtain a color distribution statistical result of the pixels of the to-be-processed picture.

Step P2. Based on the color distribution statistical result, two target colors are obtained using a preset color clustering algorithm.

Clustering is an aggregation of data that aggregates similar data into one class. Clustering is an unsupervised classification that has the advantage of not requiring an advance training process. Under normal circumstances, the color clustering algorithm can reduce the range of color space, increase the distance between each color, and get the color clustering result (target color). Currently, the commonly used color clustering method is K-means, mixed Gaussian. Gaussian Mixture Models (GMM), Mean shift and other methods.

Step P3. Set the foreground color and the background color according to the two target colors.

Step P4. The first distance and the second distance of the pixels of the to-be-processed picture are sequentially calculated, and the attribution of the pixel is determined according to the calculation result.

Specifically, the first distance is a Euclidean distance between a color of the pixel and the foreground color, and the second distance is a Euclidean between the pixel color and the background color distance. If the first distance is smaller than the second distance, it is determined that the pixel belongs to the foreground; if the first distance is greater than the second distance, it is determined that the pixel belongs to the background.

Step P5. Perform binarization on the pixels in the to-be-processed picture according to the determination result.

Assuming that the picture is composed of two parts, a foreground area and a background area, the pixel is determined to belong to the foreground pixel or the background pixel by calculating the first distance and the second distance of the pixels in the picture. Specifically, the foreground pixel can be set to 255 (white) and the background pixel to 0 (black); the foreground pixel can be set to 0 (black) and the background pixel to 255 (white).

The binarization method based on the color value statistics provided by the embodiment of the present application belongs to the global method. Since the target color is calculated by the clustering method, it can be applied to a complex scene and has a wide application range.

Further, in the step S104, the present application evaluates the processing result of the binarization processing of the image to be processed by the binarization method based on the sliding window and the binarization method based on the color value statistics by the depth learning engine implemented by the CNN. The text confidence of the processing result of the binarization method based on the sliding window and the text confidence of the processing result of the binarization method based on the color value statistics are respectively obtained.

CNN is one of the most representative network structures in deep learning technology. It has achieved great success in the field of image processing. On the international standard ImageNet dataset, many successful models are based on CNN, which is used in this embodiment. The deep learning engine is also based on the convolutional neural network CNN. One of the advantages of CNN over traditional image processing algorithms is that it avoids complex pre-processing of images (extracting artificial features, etc.), can directly input the original image, and output confidence for a single text.

In image processing, an image is usually regarded as one or more two-dimensional vectors. For a grayscale image, it can be regarded as a two-dimensional vector, and the gray value of the pixel is the two-dimensional vector. The color picture represented by RGB (RGB color mode is a color standard in the industry) has three color channels, which can be represented as three two-dimensional vectors. The traditional neural network adopts the full connection mode, that is, the neurons from the input layer to the hidden layer are all connected, which will result in a large amount of parameters, making the network training time-consuming and even difficult to train, and is used in this embodiment. The convolutional neural network avoids this difficulty through methods such as local connection and weight sharing. Therefore, compared with the traditional learning engine, the time complexity in the deep learning engine operation based on CNN is greatly reduced in this embodiment. Therefore, it has more excellent performance.

In this embodiment, there are mainly two types of network layers in the CNN, which are a convolution layer and a pooling/sampling layer. The function of the convolutional layer is to extract various features of the image; the role of the pooling layer is to abstract the original feature signal, thereby greatly reducing the training parameters, and also reducing the degree of overfitting of the model.

The convolutional layer is obtained by calculating the convolution kernel on the input layer of the previous stage by sliding the window one by one. Each parameter in the convolution kernel is equivalent to the weight parameter in the traditional neural network, and is connected with the corresponding local pixel. The sum of the parameters of the convolution kernel and the corresponding local pixel values (usually plus an offset parameter) to obtain the result on the convolutional layer.

After the features of the image are obtained by the convolutional layer, in order to further reduce the over-fitting degree of the network training parameters and the model of the deep learning engine implemented by the convolutional neural network in this embodiment, the convolution layer is pooled/sampled. . There are usually two ways to pool/sample:

Max-Pooling: Select the maximum value in the Pooling window as the sample value;

Mean-Pooling: Adds all the values in the Pooling window to the average and takes the average as the sampled value.

Please refer to FIG. 7, which shows a structural diagram of a convolutional neural network in this embodiment. In this embodiment, a classical convolutional neural network structure is used.

The C1 layer is a convolutional layer. In the C1 layer, six feature maps are obtained. Each neuron in each feature map is connected to a 5*5 neighborhood in the input. The feature map size is 28*28; each volume The product neuron has 25 unit parameters and a base parameter; there are 122,304 connections.

The S2 layer is a downsampling layer with six 14*14 feature maps. Each cell in each graph is connected to a 2*2 neighborhood in the C1 feature map, and does not overlap. Therefore, each in S2 The size of the feature map is 1/4 of the size of the feature map in C1; the four inputs of each unit of the S2 layer are added, multiplied by a trainable parameter W, and the trainable offset b is added, and the result is calculated by the sigmoid function. . The number of connections in the S2 layer is 5,880.

The C3 layer is a convolutional layer with 16 convolution kernels, and 16 feature maps are obtained. The feature map size is 10*10; each neuron in each feature map and a plurality of layers in S2 are 5 *5 neighbors are connected.

S4 is a downsampling layer composed of 16 5*5 size feature maps. Each unit in the feature map is connected to the 2*2 neighborhood of the corresponding feature map in C3; the number of connections is 2000.

The C5 layer is a convolutional layer consisting of 120 neurons and 120 feature maps. Each feature map has a size of 1*1; each unit is connected to the 5*5 neighborhood of all 16 units of the S4 layer. 48120 connections.

The F6 layer has 84 units, which are fully connected to the C5 layer and have 10,164 connections.

After processing the image to be processed using the binarization method, the processing result needs to be analyzed. Generally, when analyzing the processing results, the prior art generally has difficulty in obtaining high-accuracy results for binarization analysis in a scene with low contrast or complex texture. The deep learning engine based on the convolutional neural network provided by the embodiment is a deep learning neural network based on big data, and the output of the confidence is accurate and the output speed is fast, which makes up the scene requirements of the prior art. Insufficient high and poor accuracy. Therefore, based on the engine, the processing result in step 103 is evaluated, and the robustness and accuracy are higher than the traditional evaluation method of binarization processing results.

Based on the engine, the embodiment of the present application can adaptively calculate the text confidence of the processing result of the binarization method based on the sliding window and the text confidence of the processing result of the binarization method based on the color value statistics, and thereby The processing result is selected in S105.

In one scenario of an embodiment of the present application, please refer to FIG. 8, which shows a picture to be processed. Please refer to FIG. 9 , which shows the processing result of the to-be-processed picture in FIG. 8 in the binarization method based on the sliding window in the embodiment of the present application. Please refer to FIG. 10 , which shows the processing result of the binarization method based on the color value statistics in the embodiment of the present application on the to-be-processed image in FIG. 8 . 9 and FIG. 10 are input into the deep learning engine based on CNN, and the confidence of each character in FIG. 9 and FIG. 10 is obtained, and the text confidence of FIG. 9 and FIG. 10 is calculated, and the text of FIG. 9 in this embodiment is calculated. The confidence level is 0.88, and the text confidence of FIG. 10 is 0.97. Therefore, the processing result of FIG. 10 is selected as the binarization result for the picture to be processed in FIG.

In another scenario of an embodiment of the present application, please refer to FIG. 11, which shows a picture to be processed. Please refer to FIG. 12 , which shows the processing result of the to-be-processed picture in FIG. 11 in the binarization method based on the sliding window in the embodiment of the present application. Please refer to FIG. 13 , which shows the processing result of the binarization method based on color value statistics in the embodiment of the present application on the to-be-processed image in FIG. 11 . 12 and FIG. 13 are input into the deep learning engine based on CNN, and the confidence of each character in FIG. 12 and FIG. 13 is obtained, and the text confidence of FIG. 12 and FIG. 13 is calculated, and the text of FIG. 12 in this embodiment is calculated. The confidence level is 0.99, and the text confidence of Fig. 13 is 0.94. Therefore, the processing result of Fig. 13 is selected as the binarization result for the picture to be processed in Fig. 11.

In the embodiment of the present application, only the image to be processed needs to be separately processed by various complementary methods of binarization, and then the confidence of the single text is obtained by using the learning engine based on optical character recognition, thereby calculating the text confidence, that is, The optimal processing result can be dynamically selected. Seamless switching of processing results of various binarization methods can be achieved without concern for global information or local textures.

The following is an embodiment of the apparatus of the present application, which may be used to implement the method embodiments of the present application. For details not disclosed in the device embodiments of the present application, please refer to the method embodiments of the present application.

Please refer to FIG. 14, which shows a block diagram of a binarization device for a picture. The device has a function of implementing the above method, and the function may be implemented by hardware or may be implemented by hardware. The device can include:

The to-be-processed picture acquisition module 201 is configured to acquire a picture to be processed. It can be used to perform step 101.

The processing result obtaining module 202 is configured to perform independent binarization processing on the to-be-processed image by using a plurality of preset binarization processing methods, and each binarization method obtains a processing result. It can be used to perform step 102.

The processing result set obtaining module 203 is configured to obtain a processing result set according to the processing result. It can be used to perform step 103.

The text confidence calculation module 204 is configured to calculate a text confidence of each processing result in the processing result set. It can be used to perform step 104.

The binarization result obtaining module 205 is configured to select a processing result with the highest degree of confidence in the text as a binarization result of the to-be-processed picture. It can be used to perform step 105.

Further, the text confidence calculation module 204 includes:

The confidence acquiring unit 2041 is configured to obtain a confidence level of each character in the processing result. It can be used to perform step 1041.

The text confidence calculation unit 2042 is configured to calculate the text confidence of the processing result according to a preset text confidence algorithm and a confidence level of each character. It can be used to perform step 1042.

Referring to FIG. 15, which is a block diagram of a text confidence calculation unit, the text confidence calculation unit 2042 may include:

The weight setting module 20421 is configured to set a weight corresponding to each character. It can be used to perform step S1.

The average confidence calculation module 20422 is configured to calculate a weighted average confidence of the processing result. It can be used to perform steps S2 and S3.

The text confidence obtaining module 20423 is configured to use the weighted average confidence as a text confidence. It can be used to perform step S4.

Please refer to FIG. 16, which shows a block diagram of a processing result obtaining module. The processing result obtaining module 202 includes:

The sliding window binarization unit 2021 is configured to perform binarization processing on the image to be processed based on the binarization method of the sliding window.

The color value statistical binarization unit 2022 is configured to perform binarization processing on the image to be processed based on the binarization method of the color value statistics.

Specifically, please refer to FIG. 17, which shows a block diagram of a sliding window binarization unit, which includes:

The window setting module 20211 is configured to set a window to a preset position of the to-be-processed picture. It can be used to perform step T1.

The first determining module 20212 is configured to determine whether the pixel and the related pixel in the window belong to a continuous pattern; and the related pixel is a pixel adjacent to the window outside the window. It can be used to perform step T2.

The local binarization module 20213 is configured to perform local binarization on pixels in the window. It can be used to perform step T3.

The second determining module 20214 is configured to determine whether the sliding of the window reaches an end point of the preset trajectory. It can be used to perform step T4.

The moving module 20215 is configured to move the window according to a preset trajectory. It can be used to perform step T5.

Specifically, please refer to FIG. 18, which shows a block diagram of a color value statistical binarization unit, and the color value statistical binarization unit 2022 includes:

The statistical result obtaining module 20221 is configured to obtain a color distribution statistical result of the pixel of the to-be-processed picture. It can be used to perform step P1.

The target color obtaining module 20222 is configured to obtain two target colors using a preset color clustering algorithm based on the color distribution statistical result. It can be used to perform step P2.

The setting module 20223 is configured to set a foreground color and a background color according to the two target colors. It can be used to perform step P3.

The determining module 20224 is configured to sequentially calculate a first distance and a second distance of the pixel of the to-be-processed picture, and determine a attribution of the pixel according to a calculation result; the first distance is a color of the pixel and the foreground The Euclidean distance between the colors, the second distance being the Euclidean distance between the pixel color and the background color. It can be used to perform step P4.

The binarization module 20225 is configured to binarize pixels in the to-be-processed image according to the determination result. It can be used to perform step P5.

It should be noted that, when the device provided by the foregoing embodiment implements its function, only the division of each functional module described above is illustrated. In actual applications, the function distribution may be completed by different functional modules as needed. The internal structure of the device is divided into different functional modules to perform all or part of the functions described above. In addition, the apparatus and method embodiments provided in the foregoing embodiments are in the same concept, and the specific implementation process is described in detail in the method embodiment, and details are not described herein again.

Referring to FIG. 19, it is a schematic structural diagram of a terminal provided by an embodiment of the present application. The terminal is used to implement the binarization method of a picture provided in the foregoing embodiment.

The terminal may include a radio frequency (RF) circuit 110, a memory 120 including one or more computer readable storage media, an input unit 130, a display unit 140, a sensor 150, an audio circuit 160, and wireless fidelity (wireless) A fidelity, WiFi) module 170, a processor 180 including one or more processing cores, and a power supply 190 and the like. It will be understood by those skilled in the art that the terminal structure shown in FIG. 19 does not constitute a limitation to the terminal, and may include more or less components than those illustrated, or a combination of certain components, or different component arrangements. among them:

The RF circuit 110 can be used for transmitting and receiving information or during a call, and receiving and transmitting signals. Specifically, after receiving downlink information of the base station, the downlink information is processed by one or more processors 180. In addition, the data related to the uplink is sent to the base station. . Generally, the RF circuit 110 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a Subscriber Identity Module (SIM) card, a transceiver, a coupler, and a Low Noise Amplifier (LNA). , duplexer, etc. In addition, RF circuitry 110 can also communicate with the network and other devices via wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System of Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (Code). Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), E-mail, Short Messaging Service (SMS), etc.

The memory 120 can be used to store software programs and modules, and the processor 180 executes various functional applications and data processing by running software programs and modules stored in the memory 120. The memory 120 may mainly include a storage program area and an storage data area, wherein the storage program area may store an operating system, an application required for the function, and the like; the storage data area may store data or the like created according to the use of the terminal. Moreover, memory 120 can include high speed random access memory, and can also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, memory 120 may also include a memory controller to provide access to memory 120 by processor 180 and input unit 130.

The input unit 130 can be configured to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function controls. In particular, input unit 130 can include touch-sensitive surface 131 as well as other input devices 132. Touch-sensitive surface 131, also referred to as a touch display or trackpad, can collect touch operations on or near the user (such as a user using a finger, stylus, etc., on any suitable object or accessory on touch-sensitive surface 131 or The operation near the touch-sensitive surface 131) and driving the corresponding connecting device according to a preset program. Alternatively, the touch sensitive surface 131 can include two portions of a touch detection device and a touch controller. Wherein, the touch detection device detects the touch orientation of the user, and detects a signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts the touch information into contact coordinates, and sends the touch information. The processor 180 is provided and can receive commands from the processor 180 and execute them. In addition, the touch-sensitive surface 131 can be implemented in various types such as resistive, capacitive, infrared, and surface acoustic waves. In addition to the touch-sensitive surface 131, the input unit 130 can also include other input devices 132. Specifically, other input devices 132 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control buttons, switch buttons, etc.), trackballs, mice, joysticks, and the like.

Display unit 140 can be used to display information entered by the user or information provided to the user as well as various graphical user interfaces of the terminal, which can be composed of graphics, text, icons, video, and any combination thereof. The display unit 140 may include a display panel 141. Alternatively, the display panel 141 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like. Further, the touch-sensitive surface 131 may cover the display panel 141, and when the touch-sensitive surface 131 detects a touch operation thereon or nearby, it is transmitted to the processor 180 to determine the type of the touch event, and then the processor 180 according to the touch event The type provides a corresponding visual output on display panel 141. Although in FIG. 19, touch-sensitive surface 131 and display panel 141 are implemented as two separate components to implement input and input functions, in some embodiments, touch-sensitive surface 131 can be integrated with display panel 141 for input. And output function.

The terminal may also include at least one type of sensor 150, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor, wherein the ambient light sensor may adjust the brightness of the display panel 141 according to the brightness of the ambient light, and the proximity sensor may close the display panel 141 when the terminal moves to the ear. And / or backlight. As a kind of motion sensor, the gravity acceleration sensor can detect the magnitude of acceleration in each direction (usually three axes). When it is stationary, it can detect the magnitude and direction of gravity. It can be used to identify the attitude of the terminal (such as horizontal and vertical screen switching, related Game, magnetometer attitude calibration), vibration recognition related functions (such as pedometer, tapping), etc.; as for the terminal can also be configured with gyroscopes, barometers, hygrometers, thermometers, infrared sensors and other sensors, here No longer.

An audio circuit 160, a speaker 161, and a microphone 162 can provide an audio interface between the user and the terminal. The audio circuit 160 can transmit the converted electrical data of the received audio data to the speaker 161 for conversion to the sound signal output by the speaker 161; on the other hand, the microphone 162 converts the collected sound signal into an electrical signal by the audio circuit 160. After receiving, it is converted into audio data, and then processed by the audio data output processor 180, transmitted to the terminal, for example, via the RF circuit 110, or outputted to the memory 120 for further processing. The audio circuit 160 may also include an earbud jack to provide communication of the peripheral earphones with the terminal.

WiFi is a short-range wireless transmission technology, and the terminal can help users to send and receive emails, browse web pages, and access streaming media through the WiFi module 170, which provides wireless broadband Internet access for users. Although FIG. 19 shows the WiFi module 170, it can be understood that it does not belong to the necessary configuration of the terminal, and may be omitted as needed within the scope of not changing the essence of the application.

The processor 180 is the control center of the terminal, connecting various portions of the entire terminal using various interfaces and lines, by running or executing software programs and/or modules stored in the memory 120, and recalling data stored in the memory 120. Performing various functions and processing data of the terminal to perform overall monitoring on the terminal. Optionally, the processor 180 may include one or more processing cores; preferably, the processor 180 may integrate an application processor and a modem processor, where the application processor mainly processes an operating system, a user interface, an application, and the like. The modem processor primarily handles wireless communications. It can be understood that the above modem processor may not be integrated into the processor 180.

The terminal further includes a power source 190 (such as a battery) for supplying power to each component. Preferably, the power source can be logically connected to the processor 180 through the power management system to manage functions such as charging, discharging, and power management through the power management system. . Power supply 190 may also include any one or more of a DC or AC power source, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.

Although not shown, the terminal may further include a camera, a Bluetooth module, and the like, and details are not described herein again. Specifically, in this embodiment, the display unit of the terminal is a touch screen display, the terminal further includes a memory, and one or more programs, wherein one or more programs are stored in the memory and configured to be processed by one or more The execution of one or more programs includes instructions for performing the binarization method of one of the above pictures.

In an exemplary embodiment, there is also provided a non-transitory computer readable storage medium comprising instructions, such as a memory comprising instructions executable by a processor of a terminal to perform the various steps of the above method embodiments. For example, the non-transitory computer readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, and an optical data storage device.

It should be understood that "a plurality" as referred to herein means two or more. "and/or", describing the association relationship of the associated objects, indicating that there may be three relationships, for example, A and/or B, which may indicate that there are three cases where A exists separately, A and B exist at the same time, and B exists separately. The character "/" generally indicates that the contextual object is an "or" relationship.

The serial numbers of the embodiments of the present application are merely for the description, and do not represent the advantages and disadvantages of the embodiments.

A person skilled in the art may understand that all or part of the steps of implementing the above embodiments may be completed by hardware, or may be instructed by a program to execute related hardware, and the program may be stored in a computer readable storage medium. The storage medium mentioned may be a read only memory, a magnetic disk or an optical disk or the like.

The above is only the preferred embodiment of the present application, and is not intended to limit the present application. Any modifications, equivalent substitutions, improvements, etc. made within the spirit and principles of the present application are included in the protection of the present application. Within the scope.

A person skilled in the art can clearly understand that for the convenience and brevity of the description, the specific working process of the system, the device and the unit described above can refer to the corresponding process in the foregoing method embodiment, and details are not described herein again.

In the several embodiments provided by the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the device embodiments described above are merely illustrative. For example, the division of the unit is only a logical function division. In actual implementation, there may be another division manner, for example, multiple units or components may be combined or Can be integrated into another system, or some features can be ignored or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit. The above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence or the contribution to the prior art, or all or part of the technical solution may be embodied in the form of a software product stored in a storage medium. A number of instructions are included to cause a computer device (which may be a personal computer, server, or network device, etc.) to perform all or part of the steps of the methods described in various embodiments of the present application. The foregoing storage medium includes: a U disk, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk, and the like. .

The above embodiments are only used to explain the technical solutions of the present application, and are not limited thereto; although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they can still The technical solutions described in the embodiments are modified, or the equivalents of the technical features are replaced by the equivalents. The modifications and substitutions of the embodiments do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

A binarization method for a picture, wherein the method includes:

The binarization device of the picture acquires a picture to be processed, and the picture to be processed includes a text;

The binarization device of the picture performs independent binarization processing on the to-be-processed image by using a plurality of preset binarization processing methods, and each binarization method obtains a processing result;

The binarization device of the picture obtains a set of processing results according to the processing result;

a binarization device of the picture calculates a text confidence of each of the processing result sets;

The binarization device of the picture selects a processing result with the highest degree of text confidence as a binarization result of the to-be-processed picture.
The method according to claim 1, wherein the binarization means of the picture calculates the text confidence of each of the processing result sets including:

The binarization device of the picture acquires a confidence level of each word in the processing result;

The binarization means of the picture calculates the text confidence of the processing result according to a preset text confidence algorithm and a confidence level of each word.
The method according to claim 2, wherein the confidence of each of the characters in the processing result of the binarization means of the picture comprises:

The binarization device of the picture inputs the processing result into a preset optical character recognition-based learning engine;

The binarization device of the picture obtains a confidence level of the learning engine output.
The method according to claim 2, wherein the binarization means of the picture calculates the text confidence of the processing result according to a preset text confidence algorithm and a confidence level of each character, including:

The binarization device of the picture sets a weight corresponding to each character in the processing result;

The binarization device of the picture calculates a weighted average confidence of the processing result;

The binarization device of the picture weights and satisfies the confidence according to the confidence of each character and the weight corresponding to the text;

The binarization means of the picture obtains a weighted average confidence by dividing the result of the weighted sum by the number of words in the processing result;

The binarization means of the picture takes the weighted average confidence as a literal confidence.
The method according to claim 1, wherein the preset binarization processing method comprises a binarization method based on a sliding window and a binarization method based on color value statistics.
The method of claim 5 wherein said sliding window based binarization method comprises:

The binarization device of the picture sets a window at a preset position of the to-be-processed picture;

The binarization device of the picture determines whether pixels and related pixels in the window belong to a continuous pattern; the related pixels are pixels adjacent to the window outside the window;

If not, the binarization device of the picture performs local binarization on pixels in the window;

The binarization device of the picture determines whether the window reaches an end point of the preset track;

If not, the binarization device of the picture slides the window according to a preset trajectory;

The binarization means of the picture returns a step of determining whether the pixels in the window and adjacent pixels outside the window belong to a continuous pattern.
The method according to claim 6, wherein the binarization means of the picture performs local binarization on pixels in the window, including:

The binarization device of the picture obtains a statistical result of color distribution of pixels in the window;

The binarization device of the picture sets a threshold according to the statistical result, and the threshold is used to distinguish a foreground and a background of the to-be-processed picture;

The binarization means of the picture binarizes pixels in the window according to the threshold.
The method of claim 5, wherein the binarization method based on color value statistics comprises:

The binarization device of the picture obtains a color distribution statistical result of the pixel of the picture to be processed;

The binarization device of the picture obtains two target colors using a preset color clustering algorithm based on the color distribution statistical result;

The binarization device of the picture sets a foreground color and a background color according to the two target colors;

The binarization device of the picture sequentially calculates a first distance and a second distance of the pixel of the to-be-processed picture, and determines a attribution of the pixel according to a calculation result, where the first distance is a color and a color of the pixel a Euclidean distance between foreground colors, the second distance being a Euclidean distance between the pixel color and the background color;

The binarization device of the picture binarizes pixels in the to-be-processed picture according to the determination result.
The method according to claim 8, wherein the binarization means of the picture sequentially calculates the first distance and the second distance of the pixels of the picture to be processed, and determines that the attribution of the pixel according to the calculation result comprises:

If the first distance is smaller than the second distance, the binarization device of the picture determines that the pixel belongs to the foreground;

If the first distance is greater than the second distance, the binarization device of the picture determines that the pixel belongs to the background.
A binarization device for a picture, wherein the device comprises:

The image acquisition module to be processed is used to obtain a picture to be processed;

The processing result obtaining module is configured to perform independent binarization processing on the to-be-processed image by using a plurality of preset binarization processing methods, and each binarization method obtains a processing result;

Processing a result set obtaining module, configured to obtain a processing result set according to the processing result;

a text confidence calculation module, configured to calculate a text confidence of each processing result in the processing result set;

The binarization result obtaining module is configured to select a processing result with the highest degree of text confidence as the binarization result of the to-be-processed picture.
The apparatus of claim 10, wherein the text confidence calculation module comprises:

a confidence acquisition unit, configured to obtain a confidence level of each text in the processing result;

The text confidence calculation unit is configured to calculate the text confidence of the processing result according to a preset text confidence algorithm and a confidence level of each text.
The apparatus according to claim 10, wherein said text confidence calculation unit comprises:

a weight setting module for setting a weight corresponding to each character;

An average confidence calculation module, configured to calculate a weighted average confidence of the processing result;

A text confidence level module is used to use the weighted average confidence as a text confidence.
The apparatus according to claim 10, wherein said processing result obtaining module comprises:

a sliding window binarization unit for performing binarization processing on a picture to be processed based on a binarization method of a sliding window;

The color value statistical binarization unit is configured to perform binarization processing on the image to be processed based on the binarization method of color value statistics.
The apparatus of claim 13, wherein the sliding window binarization unit comprises:

a window setting module, configured to set a window to a preset position of the to-be-processed picture;

a first determining module, configured to determine whether a pixel and a related pixel in the window belong to a continuous pattern; and the related pixel is a pixel adjacent to the window outside the window;

a local binarization module, configured to perform local binarization on pixels in the window;

a second determining module, configured to determine whether the sliding of the window reaches an end point of the preset track;

a moving module, configured to move the window according to a preset trajectory.
The apparatus according to claim 13, wherein said color value statistical binarization unit comprises:

a statistical result obtaining module, configured to obtain a color distribution statistical result of the pixel of the to-be-processed picture;

a target color obtaining module, configured to obtain two target colors by using a preset color clustering algorithm based on the color distribution statistical result;

a setting module, configured to set a foreground color and a background color according to the two target colors;

a determining module, configured to sequentially calculate a first distance and a second distance of the pixel of the to-be-processed picture, and determine a attribution of the pixel according to a calculation result; the first distance is a color of the pixel and the foreground color a Euclidean distance between the second distance being a Euclidean distance between the pixel color and the background color;

And a binarization module, configured to binarize pixels in the to-be-processed image according to the determination result.
A binarization terminal for a picture, wherein the terminal comprises a binarization device for a picture according to any one of claims 10-15.
A computer readable storage medium comprising instructions for performing the method of any one of claims 1 to 9 when the instruction is run on a computer.
A computer program product comprising instructions for performing the method of any of claims 1 to 9 when the computer program product is run on a computer.
A binarization device for a picture, wherein the device comprises:

Transceiver, processor and bus;

The transceiver and the processor are connected by the bus;

The processor performs the following steps:

Get the image to be processed;

Performing independent binarization processing on the to-be-processed image by using a plurality of preset binarization processing methods, and each binarization method obtains a processing result;

Obtaining a set of processing results according to the processing result;

Calculating a text confidence of each processing result in the set of processing results;

The processing result with the highest degree of confidence in the text is selected as the binarization result of the image to be processed.