
WO2018133717A1 - Image thresholding method and device, and terminal - Google Patents


Info

Publication number
WO2018133717A1
Authority
WO
WIPO (PCT)
Prior art keywords
binarization
picture
confidence
processed
processing result
Prior art date
Application number
PCT/CN2018/072047
Other languages
French (fr)
Chinese (zh)
Inventor
刘银松 (Liu Yinsong)
郭安泰 (Guo Antai)
Original Assignee
Tencent Technology (Shenzhen) Company Limited (腾讯科技(深圳)有限公司)
Priority date
Filing date
Publication date
Application filed by Tencent Technology (Shenzhen) Company Limited
Publication of WO2018133717A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/20 - Image preprocessing
    • G06V 10/28 - Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/60 - Type of objects
    • G06V 20/62 - Text, e.g. of license plates, overlay texts or captions on TV images
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 - Character recognition

Definitions

  • This application relates to the field of image processing.
  • Binarization of an image sets the gray value of each pixel to 0 or 255, so that the entire image presents a distinct black-and-white visual effect. Binarization is a basic operation in image processing and is very widely applied. Accordingly, there are many binarization methods, such as the bimodal method, the P-parameter method, the iterative method, and the maximum inter-class variance method.
  • The present application proposes a method, a device, and a terminal for binarization of pictures.
  • A method for binarization of a picture comprises the following steps:
  • the binarization device of the picture acquires a to-be-processed picture, where the to-be-processed picture contains text;
  • the binarization device of the picture separately performs independent binarization processing on the to-be-processed picture by using a plurality of preset binarization processing methods, where each binarization method obtains a processing result;
  • the binarization device of the picture obtains a set of processing results according to the processing results;
  • the binarization device of the picture calculates a text confidence of each processing result in the set of processing results; and
  • the binarization device of the picture selects the processing result with the highest text confidence as the binarization result of the to-be-processed picture.
  • the device includes:
  • a to-be-processed picture acquisition module, configured to acquire a picture to be processed;
  • a processing result obtaining module, configured to perform independent binarization processing on the to-be-processed picture by using a plurality of preset binarization processing methods, where each binarization method obtains a processing result;
  • a processing result set obtaining module, configured to obtain a processing result set according to the processing results;
  • a text confidence calculation module, configured to calculate a text confidence of each processing result in the processing result set; and
  • a binarization result obtaining module, configured to select the processing result with the highest text confidence as the binarization result of the to-be-processed picture.
  • the device includes:
  • a transceiver, a processor, and a bus;
  • the transceiver and the processor are connected by the bus;
  • the processor performs the following steps:
  • the processing result with the highest text confidence is selected as the binarization result of the picture to be processed.
  • a binarization terminal for a picture comprising a binarization device of the above picture.
  • An embodiment of the present application provides a computer readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the method described in the first aspect above.
  • An embodiment of the present application provides a computer program product comprising instructions that, when the computer program product is run on a computer, cause the computer to perform the method of the first aspect described above.
  • the present application provides a method, a device and a terminal for binarization of pictures, which have the following beneficial effects:
  • The present application calculates the text confidence of each binarization result of the picture to be processed based on optical character recognition, and dynamically selects the optimal binarization method according to the text confidence, thereby obtaining the optimal binarization result for the picture to be processed.
  • The present application can dynamically select the optimal binarization result in different scenarios to meet the diverse requirements of different scenarios, implementing full-scene adaptation for picture binarization.
  • FIG. 1 is a flowchart of a method for binarization of a picture in an embodiment of the present application;
  • FIG. 2 is a flowchart of a method for acquiring text confidence in an embodiment of the present application;
  • FIG. 3 is a flowchart of a weighted average algorithm in an embodiment of the present application;
  • FIG. 4 is a flowchart of a binarization method based on a sliding window in an embodiment of the present application;
  • FIG. 5 is a flowchart of a local binarization method in an embodiment of the present application;
  • FIG. 6 is a flowchart of a binarization method based on color value statistics in an embodiment of the present application;
  • FIG. 7 is a structural diagram of a convolutional neural network in an embodiment of the present application;
  • FIG. 8 is a picture to be processed in an embodiment of the present application;
  • FIG. 9 is a processing result of the binarization method based on the sliding window for the to-be-processed picture in FIG. 8;
  • FIG. 10 is a processing result of the binarization method based on color value statistics for the to-be-processed picture in FIG. 8;
  • FIG. 11 is another picture to be processed in an embodiment of the present application;
  • FIG. 12 is a processing result of the binarization method based on the sliding window for the to-be-processed picture in FIG. 11;
  • FIG. 13 is a processing result of the binarization method based on color value statistics for the to-be-processed picture in FIG. 11;
  • FIG. 14 is a block diagram of a binarization apparatus for a picture in an embodiment of the present application;
  • FIG. 15 is a block diagram of a text confidence calculation unit in an embodiment of the present application;
  • FIG. 16 is a block diagram of a processing result obtaining module in an embodiment of the present application;
  • FIG. 17 is a block diagram of a sliding window binarization unit in an embodiment of the present application;
  • FIG. 18 is a block diagram of a color value statistical binarization unit in an embodiment of the present application;
  • FIG. 19 is a structural block diagram of a terminal in an embodiment of the present application.
  • FIG. 1 is a flowchart of a binarization method for a picture provided by an embodiment of the present application.
  • the method can include the following steps.
  • Step 101 Acquire a picture to be processed, where the picture to be processed includes text.
  • Step 102 Perform independent binarization processing on the to-be-processed image by using a plurality of preset binarization processing methods, and each binarization method obtains a processing result.
  • the preset binarization method can select an existing binarization method, and the number of preset binarization methods can be two or more.
  • Existing binarization methods fall mainly into two categories. One is the global method, which determines a single unified segmentation threshold from a global perspective and binarizes the whole picture by that threshold. The other is the locally adaptive method, which determines different thresholds for different regions of the image and binarizes each region according to its own threshold.
  • The global method mostly calculates, from the global color statistics of the image, a segmentation threshold that achieves the best binarization effect, and then performs simple binarization according to that threshold. This works well only for images with a simple background and a single dominant color, and poorly for images with complex texture information or low contrast.
  • The locally adaptive method mostly calculates the binarization threshold from local texture information, which avoids the misjudgment of a global threshold to a certain extent; however, it often focuses too much on local information and ignores global coordination, so the binarization effects of adjacent local regions can differ greatly and be inconsistent.
  • In step 101, the picture to be processed contains text, which improves the accuracy of text extraction from the picture.
  • In step 102, a plurality of binarization methods can be enumerated, and the text confidence calculated later is used to select the best one, so that the strengths of the various methods compensate for one another's weaknesses, expanding the range of scenes in which pictures can be binarized and yielding the best binarization effect.
  • Step 103 Obtain a processing result set according to the processing result.
  • step 102 a plurality of binarization methods are enumerated, and each binarization method obtains a processing result, thereby constituting a processing result set.
  • Step 104 Calculate a text confidence of each processing result in the processing result set.
  • The text confidence characterizes the probability that the text in a processing result can be accurately recognized, and can serve as an evaluation index of the processing effect of a binarization method.
  • A high text confidence indicates that the binarization processing effect is good; a low text confidence indicates that the effect is unsatisfactory.
  • Step 105 Select a processing result with the highest degree of confidence in the text as a binarization result for the to-be-processed picture.
  • If a plurality of processing results share the highest text confidence, one of them is selected as the binarization result for the to-be-processed picture according to a preset selection method.
  • The preset selection method may be random selection or another selection method.
  • This embodiment of the present application can enumerate various binarization methods and dynamically select the optimal processing result, so that pictures from every scene can be binarized with a good effect, improving the compatibility of image processing. Using text confidence as the evaluation standard of the binarization effect ensures that the selected processing result yields the best text recognition result, which benefits subsequent text processing of that result.
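The flow of steps 101-105 can be sketched in a few lines of Python. This is a minimal illustration, not the embodiment's implementation: the binarization methods and the OCR confidence function are hypothetical stand-ins passed in as callables.

```python
def select_best_binarization(picture, binarizers, text_confidence):
    """Steps 102-105: run every preset binarization method, score each
    processing result by its text confidence, and keep the best one."""
    results = [binarize(picture) for binarize in binarizers]   # steps 102-103
    scores = [text_confidence(r) for r in results]             # step 104
    best = max(range(len(results)), key=scores.__getitem__)    # step 105
    return results[best], scores[best]

# Toy stand-ins so the sketch runs: "pictures" are strings, the two
# "binarizers" transform them, and the "confidence" counts uppercase letters.
binarizers = [str.upper, str.lower]
result, score = select_best_binarization(
    "AbC", binarizers, lambda s: sum(c.isupper() for c in s))
```

With real inputs, `binarizers` would be the preset binarization methods of step 102 and `text_confidence` the OCR-based score of step 104.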
  • FIG. 2 shows a flow chart of a method for acquiring text confidence in step 104, including:
  • Step 1041 Acquire a confidence level of each character in the processing result.
  • The confidence of each character is obtained by inputting the processing result into a preset learning engine based on optical character recognition (OCR) and taking the confidence output by the engine.
  • the learning engine may be a deep learning engine based on a Convolutional Neural Network (CNN), and the deep learning engine based on the CNN can better recognize single-word images, and has high accuracy, accurate confidence, and the like.
  • An advantage of the CNN-based deep learning engine over conventional image processing algorithms is that the pre-processing of complex images (extracting hand-crafted features, etc.) is avoided, and the original image can be input directly. For example, a single-character picture with a resolution of 28*28 can be input directly and the confidence output directly.
  • The confidence output is more reliable than that of traditional methods.
  • The traditional Tesseract learning engine and the Nhocr engine also support outputting a confidence and can likewise be used in this embodiment.
  • The confidence output for a single character is between 0 and 1.
  • Step 1042 Calculate the text confidence of each of the processing results according to a preset text confidence algorithm and a confidence level of each text.
  • The preset text confidence algorithm includes, but is not limited to: a weighted average algorithm that uses the weighted average of the confidences as the text confidence; a geometric mean algorithm that uses the geometric mean of the confidences; a squared mean algorithm that uses the squared mean of the confidences; and a harmonic mean algorithm that uses the harmonic mean of the confidences.
  • FIG. 3 shows a flowchart of the weighted average algorithm, including:
  • For a processing result containing n characters, the weights of the characters may be Q0, ..., Qn-1, respectively.
  • The weight corresponding to each character can be set randomly by the program, or set according to actual needs.
  • The confidences of the characters may be Z0, ..., Zn-1, respectively; the weighted summation can then be expressed as S = Q0*Z0 + Q1*Z1 + ... + Qn-1*Zn-1.
  • The weighted average confidence is obtained by dividing the weighted sum S by the number n of characters in the processing result.
  • The weighted average confidence is taken as the text confidence.
  • In this way, the reliability and discriminating ability of the text confidence can be improved, supporting the comparison of multiple processing results so that the optimal one can be picked out.
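Steps S1-S4 above amount to the following small function; this is a sketch only, with the per-character weights Q and confidences Z passed in as plain lists (uniform weights when none are given).

```python
def weighted_average_confidence(confidences, weights=None):
    """Weighted average algorithm: multiply each character's confidence
    Z_i by its weight Q_i, sum, and divide by the character count n."""
    n = len(confidences)
    if weights is None:
        weights = [1.0] * n            # default: every character weighs 1
    weighted_sum = sum(q * z for q, z in zip(weights, confidences))
    return weighted_sum / n
```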
  • In step 102, an existing binarization method may be used, or a custom binarization method may be used. Several possible binarization methods are described below as examples:
  • In one possible implementation, a fixed threshold is used: the gray value of a pixel whose gray value is less than 127 is set to 0 (black), and the gray value of a pixel whose gray value is greater than or equal to 127 is set to 255 (white). The advantage of this method is that the amount of calculation is small and it is fast; the disadvantage is that the pixel distribution and pixel value characteristics of the image are not considered.
  • Another possible implementation is binarization based on the mean value K.
  • The average gray value K of the pixels in the image is calculated; the gray value of each pixel is then scanned, and if it is greater than K, the pixel is set to 255 (white); if it is less than or equal to K, the pixel is set to 0 (black).
  • This method uses the average value as the binarization threshold. Although simple, it may cause some object pixels or background pixels to be lost, so the binarization result may not truly reflect the source image information.
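The two simple implementations above can be sketched as follows, with a picture represented as a list of rows of gray values (0-255); this is a minimal illustration rather than an optimized implementation.

```python
def binarize_fixed(gray, threshold=127):
    """Fixed-threshold binarization: gray value < 127 -> 0 (black),
    gray value >= 127 -> 255 (white)."""
    return [[0 if px < threshold else 255 for px in row] for row in gray]

def binarize_mean(gray):
    """Mean-based binarization: compute the average gray value K, then
    set pixels > K to 255 (white) and pixels <= K to 0 (black)."""
    pixels = [px for row in gray for px in row]
    k = sum(pixels) / len(pixels)
    return [[255 if px > k else 0 for px in row] for row in gray]
```

Note how the same picture can binarize differently under the two rules: a pixel of value 120 is black under the fixed 127 threshold but white when the image mean happens to be lower.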
  • In another possible implementation, the maximum inter-class variance (Otsu) method is used: the image is regarded as composed of two parts, a foreground area and a background area.
  • The inter-class variance between the two parts is computed for each candidate grayscale threshold, and the threshold that maximizes this variance is the desired binarization threshold.
  • If the gray value of a pixel is greater than the binarization threshold, the gray value of the pixel is set to 255 (white); if it is less than or equal to the threshold, it is set to 0 (black).
  • The maximum inter-class variance method is a classical binarization method that strikes a good balance between computational speed and binarization effect; however, as a global method, it is less effective on images with complex texture information or low contrast.
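The maximum inter-class variance criterion can be sketched as below: for each candidate threshold, compute the weighted variance between the two classes it induces and keep the maximizing threshold. This is an illustrative reimplementation, not the embodiment's code.

```python
def otsu_threshold(gray_values):
    """Return the threshold t (0-255) maximizing the inter-class variance
    w_b * w_f * (mean_b - mean_f)**2, where pixels <= t form one class
    and pixels > t the other."""
    hist = [0] * 256
    for px in gray_values:
        hist[px] += 1
    total = len(gray_values)
    total_sum = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_b = sum_b = 0                       # running count / sum of class "<= t"
    for t in range(256):
        w_b += hist[t]
        if w_b == 0:
            continue                      # no pixels in the first class yet
        w_f = total - w_b
        if w_f == 0:
            break                         # no pixels left in the second class
        sum_b += t * hist[t]
        mean_b = sum_b / w_b
        mean_f = (total_sum - sum_b) / w_f
        var_between = w_b * w_f * (mean_b - mean_f) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

Pixels greater than the returned threshold would then be set to 255 (white) and the rest to 0 (black), as described above.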
  • The common binarization methods mentioned above, or other binarization methods, can be applied in step 102.
  • In this embodiment of the present application, the picture to be processed is binarized in step 102 by the binarization method based on the sliding window and the binarization method based on color value statistics.
  • FIG. 4 shows a flowchart of a sliding window based binarization method, including:
  • Step T1. Set a window at a preset position of the to-be-processed picture.
  • The size and shape of the window may be set according to actual needs; take a window containing M*N pixels, set at the preset position, as an example.
  • The preset position may also be set according to actual needs; specifically, it may be the upper left corner or the lower right corner of the to-be-processed picture. For a to-be-processed picture with a width of M pixels, the window may be located at the leftmost end or the far right of the picture.
  • Step T2. Determine whether the pixels in the window and the related pixels belong to a continuous pattern.
  • The related pixels are the pixels outside the window that are adjacent to it.
  • The purpose of step T2 is to determine whether the M*N pixels of the to-be-processed picture falling within the window, together with the related pixels, belong to a continuous pattern. If they do not, it is determined that the window contains text.
  • Step T3. If the window contains text, binarization is performed on the pixels in the window. In this embodiment, only the binarization effect of the text portion of the picture is of concern; therefore, if the window does not contain text, no processing is performed, or the window may be directly set to a uniform gray level.
  • Step T4. Determine whether the window has reached the end point of the preset track.
  • The window slides along a preset trajectory, and the sliding ends at the end point of that trajectory.
  • The preset track can be set according to actual needs. For a to-be-processed picture with a width of M pixels, the window may be moved along the picture's length.
  • Step T5. If the end point has not been reached, slide the window to the next position along the preset track.
  • Step T6. Return to step T2.
  • The local binarization method used in step T3 can be implemented with an existing binarization method.
  • FIG. 5 shows the local binarization method used in step T3 in this embodiment, which includes:
  • Step T31 Obtain a color distribution statistical result of the pixels in the window.
  • Step T32 Set a threshold according to the statistical result, where the threshold is used to distinguish the foreground and the background of the to-be-processed picture.
  • The foreground pixels and background pixels of the picture are distinguished by the threshold. For example, pixels whose color value is greater than the threshold are classified as foreground pixels and the rest as background pixels; or, conversely, pixels whose color value is smaller than the threshold are classified as foreground pixels and the rest as background pixels.
  • The threshold can be chosen such that, after segmentation based on it, the color mean of the foreground pixels and the color mean of the background pixels differ as much as possible.
  • Step T33 Binarize pixels in the window according to the threshold.
  • The foreground pixels can be set to 255 (white) and the background pixels to 0 (black); alternatively, the foreground pixels can be set to 0 (black) and the background pixels to 255 (white).
  • The sliding-window-based binarization method provided by this embodiment of the present application belongs to the locally adaptive methods. It is well suited to scenes in which the picture is divided into text lines, and can obtain a better binarization effect in scenes where the color information is relatively simple and the background texture is not complicated.
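A much-simplified sketch of steps T1-T6: a full-height window of fixed width slides left to right, and each window is binarized independently. The text-detection test of step T2 is omitted, and as a stand-in for the local threshold of steps T31-T33 each window simply uses its own mean gray value; both simplifications are this sketch's assumptions, not the embodiment's method.

```python
def sliding_window_binarize(gray, win_w):
    """Binarize `gray` (a list of rows) window by window: each full-height
    window of width `win_w` gets its own local threshold (its mean)."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for x0 in range(0, w, win_w):                  # T1/T4/T5: slide to the end
        x1 = min(x0 + win_w, w)
        window = [px for row in gray for px in row[x0:x1]]
        k = sum(window) / len(window)              # stand-in for T31/T32
        for y in range(h):                         # T3/T33: binarize the window
            for x in range(x0, x1):
                out[y][x] = 255 if gray[y][x] > k else 0
    return out
```

Because each window has its own threshold, dark detail in a bright region and bright detail in a dark region can both be separated, which is exactly what a single global threshold cannot do.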
  • FIG. 6 shows a flowchart of a binarization method based on color value statistics, including:
  • Step P1 Obtain a color distribution statistical result of the pixels of the to-be-processed picture.
  • Step P2 Based on the color distribution statistical result, two target colors are obtained using a preset color clustering algorithm.
  • Clustering aggregates similar data into one class. It is an unsupervised classification method whose advantage is that no training process is required in advance. In general, a color clustering algorithm reduces the range of the color space and increases the distance between colors, yielding the color clustering result (the target colors).
  • Commonly used color clustering methods include K-means, Gaussian Mixture Models (GMM), Mean shift, and other methods.
  • Step P3. Set the foreground color and the background color according to the two target colors.
  • Step P4. The first distance and the second distance of each pixel of the to-be-processed picture are calculated in turn, and the attribution of the pixel is determined according to the calculation result.
  • The first distance is the Euclidean distance between the color of the pixel and the foreground color, and the second distance is the Euclidean distance between the color of the pixel and the background color. If the first distance is smaller than the second distance, it is determined that the pixel belongs to the foreground; if the first distance is greater than the second distance, it is determined that the pixel belongs to the background.
  • Step P5. Perform binarization on the pixels in the to-be-processed picture according to the determination result.
  • Each pixel is determined to be a foreground pixel or a background pixel by calculating its first distance and second distance.
  • The foreground pixels can be set to 255 (white) and the background pixels to 0 (black); alternatively, the foreground pixels can be set to 0 (black) and the background pixels to 255 (white).
  • The binarization method based on color value statistics provided by this embodiment of the present application belongs to the global methods. Since the target colors are calculated by a clustering method, it can be applied to complex scenes and has a wide application range.
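Steps P1-P5 can be sketched with a minimal two-cluster K-means over RGB tuples; `two_means` and its seeding from the most distant color pair are this sketch's own simplifications, not the embodiment's clustering algorithm (which could equally be GMM or Mean shift).

```python
import math

def two_means(colors, iters=10):
    """Step P2: a tiny 2-cluster K-means. Seed the two centers with the
    most distant pair of colors, then alternate assignment / averaging."""
    c0, c1 = max(((a, b) for a in colors for b in colors),
                 key=lambda p: math.dist(*p))
    for _ in range(iters):
        g0 = [c for c in colors if math.dist(c, c0) <= math.dist(c, c1)]
        g1 = [c for c in colors if math.dist(c, c0) > math.dist(c, c1)]
        if not g0 or not g1:
            break                               # degenerate split: stop early
        mean = lambda g: tuple(sum(ch) / len(g) for ch in zip(*g))
        c0, c1 = mean(g0), mean(g1)
    return c0, c1

def cluster_binarize(pixels, foreground, background):
    """Steps P4-P5: a pixel whose Euclidean distance to the foreground
    color is smaller becomes 255 (white), otherwise 0 (black)."""
    return [255 if math.dist(p, foreground) < math.dist(p, background) else 0
            for p in pixels]

pixels = [(0, 0, 0), (10, 10, 10), (240, 240, 240), (250, 250, 250)]
bg, fg = two_means(pixels)     # step P3: the darker target taken as background
binary = cluster_binarize(pixels, fg, bg)
```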
  • In the present application, the processing results of binarizing the picture to be processed by the sliding-window-based binarization method and the color-value-statistics-based binarization method are evaluated by the deep learning engine implemented with a CNN.
  • The text confidence of the processing result of the sliding-window-based binarization method and the text confidence of the processing result of the color-value-statistics-based binarization method are thus obtained respectively.
  • CNN is one of the most representative network structures in deep learning and has achieved great success in the field of image processing; on the international standard ImageNet dataset, many successful models are based on CNNs.
  • The deep learning engine used in this embodiment is also based on a convolutional neural network.
  • One of the advantages of CNN over traditional image processing algorithms is that it avoids complex pre-processing of images (extracting artificial features, etc.), can directly input the original image, and output confidence for a single text.
  • An image is usually regarded as one or more two-dimensional vectors (matrices).
  • A grayscale image can be regarded as a single two-dimensional vector whose entries are the gray values of the pixels.
  • A color picture represented in the RGB color mode (RGB is a color standard in the industry) can be regarded as a superposition of two-dimensional vectors, one per color channel.
  • A traditional neural network adopts a fully connected mode, that is, every neuron in the input layer is connected to every neuron in the hidden layer, which produces a huge number of parameters and makes network training time-consuming or even infeasible. The convolutional neural network avoids this difficulty through local connections, weight sharing, and similar techniques.
  • Therefore, compared with a traditional learning engine, the time complexity of the CNN-based deep learning engine used in this embodiment is greatly reduced, giving it superior performance.
  • In a CNN, there are mainly two types of network layers: convolutional layers and pooling/sampling layers.
  • the function of the convolutional layer is to extract various features of the image; the role of the pooling layer is to abstract the original feature signal, thereby greatly reducing the training parameters, and also reducing the degree of overfitting of the model.
  • The convolutional layer is obtained by sliding the convolution kernel window by window over the output of the previous layer.
  • Each parameter in the convolution kernel is equivalent to a weight parameter in a traditional neural network and is connected to the corresponding local pixel.
  • The products of the convolution kernel parameters and the corresponding local pixel values are summed (usually with a bias parameter added) to obtain the result on the convolutional layer.
  • After the convolutional layer, pooling/sampling is performed. There are usually two ways to pool/sample:
  • Max-Pooling: select the maximum value in the pooling window as the sampled value;
  • Mean-Pooling: add all the values in the pooling window and take their average as the sampled value.
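The two pooling modes can be illustrated on a small list-of-rows feature map (a sketch, assuming non-overlapping 2*2 windows with stride 2):

```python
def pool2x2(feature_map, mode="max"):
    """Non-overlapping 2*2 pooling: Max-Pooling keeps the largest value
    in each window; Mean-Pooling averages the four values."""
    out = []
    for y in range(0, len(feature_map), 2):
        row = []
        for x in range(0, len(feature_map[0]), 2):
            window = [feature_map[y + dy][x + dx]
                      for dy in (0, 1) for dx in (0, 1)]
            row.append(max(window) if mode == "max" else sum(window) / 4)
        out.append(row)
    return out
```

Either mode halves both dimensions of the feature map, which is how the 28*28 maps of C1 become the 14*14 maps of S2 below.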
  • FIG. 7 shows a structural diagram of a convolutional neural network in this embodiment.
  • a classical convolutional neural network structure is used.
  • the C1 layer is a convolutional layer.
  • Six feature maps are obtained; each neuron in each feature map is connected to a 5*5 neighborhood in the input. The feature map size is 28*28. Each convolutional neuron has 25 weight parameters and one bias parameter; there are 122,304 connections in total.
  • The S2 layer is a downsampling layer with six 14*14 feature maps. Each unit in each map is connected to a 2*2 neighborhood in the corresponding C1 feature map, and the neighborhoods do not overlap; therefore, each feature map in S2 is 1/4 the size of the corresponding feature map in C1. The four inputs of each S2 unit are added, multiplied by a trainable parameter W, and a trainable bias b is added; the result is computed by the sigmoid function. The number of connections in the S2 layer is 5,880.
  • The C3 layer is a convolutional layer with 16 convolution kernels, yielding 16 feature maps. The feature map size is 10*10; each neuron in each feature map is connected to 5*5 neighborhoods in several of the S2 feature maps.
  • S4 is a downsampling layer composed of 16 5*5 size feature maps. Each unit in the feature map is connected to the 2*2 neighborhood of the corresponding feature map in C3; the number of connections is 2000.
  • The C5 layer is a convolutional layer consisting of 120 neurons and 120 feature maps, each of size 1*1. Each unit is connected to the 5*5 neighborhoods of all 16 feature maps of the S4 layer; there are 48,120 connections.
  • The F6 layer has 84 units, fully connected to the C5 layer, with 10,164 connections.
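The connection counts quoted above follow the classic LeNet-5 accounting (a 5*5 kernel plus one bias per convolutional neuron, and one weight W plus one bias b per subsampling unit); the arithmetic can be checked directly:

```python
# C1: (5*5 weights + 1 bias) per neuron, over six 28*28 feature maps.
c1_connections = (5 * 5 + 1) * 6 * 28 * 28

# S2: each unit sums a 2*2 neighborhood (4 connections) plus the
# trainable bias b (counted as a 5th connection), over six 14*14 maps.
s2_connections = (2 * 2 + 1) * 6 * 14 * 14

# C5: 120 neurons, each connected to 5*5 neighborhoods of all 16 S4
# maps, plus one bias each.
c5_connections = 120 * (16 * 5 * 5 + 1)

# F6: 84 units fully connected to C5's 120 outputs, plus one bias each.
f6_connections = 84 * (120 + 1)
```

These reproduce the 122,304, 5,880, 48,120, and 10,164 connections stated for C1, S2, C5, and F6 respectively.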
  • After the picture to be processed has been processed with a binarization method, the processing result needs to be analyzed.
  • the prior art generally has difficulty in obtaining high-accuracy results for binarization analysis in a scene with low contrast or complex texture.
  • The deep learning engine based on the convolutional neural network provided in this embodiment is a deep learning neural network trained on big data; its confidence output is accurate and fast, making up for the prior art's poor accuracy in demanding scenes. Therefore, evaluating the processing results of step 103 with this engine offers higher robustness and accuracy than traditional evaluation of binarization results.
  • This embodiment of the present application can adaptively calculate the text confidence of the processing result of the sliding-window-based binarization method and that of the color-value-statistics-based method, and the processing result is then selected accordingly in step 105.
  • FIG. 8 shows a picture to be processed.
  • FIG. 9 shows the processing result of the to-be-processed picture in FIG. 8 in the binarization method based on the sliding window in the embodiment of the present application.
  • FIG. 10 shows the processing result of the binarization method based on the color value statistics in the embodiment of the present application on the to-be-processed image in FIG. 8 .
  • FIG. 9 and FIG. 10 are input into the CNN-based deep learning engine to obtain the confidence of each character, from which the text confidences of FIG. 9 and FIG. 10 are calculated. In this embodiment, the text confidence of FIG. 9 is 0.88 and that of FIG. 10 is 0.97; therefore, the processing result of FIG. 10 is selected as the binarization result for the picture to be processed in FIG. 8.
  • FIG. 11 shows a picture to be processed.
  • FIG. 12 shows the processing result of the to-be-processed picture in FIG. 11 in the binarization method based on the sliding window in the embodiment of the present application.
  • FIG. 13 shows the processing result of the binarization method based on color value statistics in the embodiment of the present application on the to-be-processed image in FIG. 11 .
  • FIG. 12 and FIG. 13 are input into the CNN-based deep learning engine to obtain the confidence of each character, from which the text confidences of FIG. 12 and FIG. 13 are calculated. In this embodiment, the text confidence of FIG. 12 is 0.99 and that of FIG. 13 is 0.94; therefore, the processing result of FIG. 12 is selected as the binarization result for the picture to be processed in FIG. 11.
  • In this application, the picture to be processed is separately processed by multiple complementary binarization methods; the confidence of each single character is then obtained using the learning engine based on optical character recognition, and from these the text confidence of each result is calculated.
  • On that basis, the optimal processing result can be dynamically selected, achieving seamless switching among the processing results of the various binarization methods without concern for global information or local textures.
  • FIG. 14 shows a block diagram of a binarization device for a picture.
  • The device has the function of implementing the above method; the function may be implemented by hardware, or by hardware executing corresponding software.
  • the device can include:
  • the to-be-processed picture acquisition module 201 is configured to acquire a picture to be processed. It can be used to perform step 101.
  • the processing result obtaining module 202 is configured to perform independent binarization processing on the to-be-processed image by using a plurality of preset binarization processing methods, and each binarization method obtains a processing result. It can be used to perform step 102.
  • the processing result set obtaining module 203 is configured to obtain a processing result set according to the processing result. It can be used to perform step 103.
  • the text confidence calculation module 204 is configured to calculate a text confidence of each processing result in the processing result set. It can be used to perform step 104.
  • the binarization result obtaining module 205 is configured to select the processing result with the highest text confidence as the binarization result of the to-be-processed picture. It can be used to perform step 105.
  • the text confidence calculation module 204 includes:
  • the confidence acquiring unit 2041 is configured to obtain a confidence level of each character in the processing result. It can be used to perform step 1041.
  • the text confidence calculation unit 2042 is configured to calculate the text confidence of the processing result according to a preset text confidence algorithm and a confidence level of each character. It can be used to perform step 1042.
  • the text confidence calculation unit 2042 may include:
  • the weight setting module 20421 is configured to set a weight corresponding to each character. It can be used to perform step S1.
  • the average confidence calculation module 20422 is configured to calculate a weighted average confidence of the processing result. It can be used to perform steps S2 and S3.
  • the text confidence obtaining module 20423 is configured to use the weighted average confidence as a text confidence. It can be used to perform step S4.
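The weighted-average computation of modules 20421-20423 (steps S1-S4) can be sketched as follows; the uniform default weights are an assumption, since the patent leaves the weighting scheme to a preset algorithm:

```python
def text_confidence(char_confidences, weights=None):
    """Weighted average of per-character confidences (steps S1-S4)."""
    if weights is None:
        weights = [1.0] * len(char_confidences)   # S1: a weight for each character
    weighted_sum = sum(w * c for w, c in zip(weights, char_confidences))  # S2
    total_weight = sum(weights)                   # S3: normalize by the weight sum
    return weighted_sum / total_weight            # S4: weighted average = text confidence

# e.g. two characters recognized with confidences 0.99 and 0.94:
conf = text_confidence([0.99, 0.94])
```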
  • the processing result obtaining module 202 includes:
  • the sliding window binarization unit 2021 is configured to binarize the to-be-processed picture using the sliding-window-based binarization method.
  • the color value statistical binarization unit 2022 is configured to binarize the to-be-processed picture using the binarization method based on color value statistics.
  • FIG. 17 shows a block diagram of a sliding window binarization unit, which includes:
  • the window setting module 20211 is configured to set a window to a preset position of the to-be-processed picture. It can be used to perform step T1.
  • the first determining module 20212 is configured to determine whether the pixels in the window and the related pixels belong to a continuous pattern; a related pixel is a pixel outside the window that is adjacent to the window. It can be used to perform step T2.
  • the local binarization module 20213 is configured to perform local binarization on pixels in the window. It can be used to perform step T3.
  • the second determining module 20214 is configured to determine whether the sliding of the window reaches an end point of the preset trajectory. It can be used to perform step T4.
  • the moving module 20215 is configured to move the window according to a preset trajectory. It can be used to perform step T5.
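Modules 20211-20215 (steps T1 and T3-T5) can be sketched as below. The continuity check of step T2 is omitted for brevity, and the mean-based local threshold is an assumption; the patent does not fix a particular local statistic:

```python
def sliding_window_binarize(gray, win=8):
    """Slide a win x win window over a grayscale image (a list of rows) and
    binarize each window with a threshold computed from its own pixels."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for top in range(0, h, win):            # T5: move along a row-major trajectory
        for left in range(0, w, win):       # T1: window at the next position
            ys = range(top, min(top + win, h))
            xs = range(left, min(left + win, w))
            block = [gray[y][x] for y in ys for x in xs]
            threshold = sum(block) / len(block)   # local statistic of this window
            for y in ys:                    # T3: local binarization of the window
                for x in xs:
                    out[y][x] = 255 if gray[y][x] >= threshold else 0
    return out                              # T4: trajectory end reached
```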
  • FIG. 18 shows a block diagram of a color value statistical binarization unit.
  • the color value statistical binarization unit 2022 includes:
  • the statistical result obtaining module 20221 is configured to obtain a color distribution statistical result of the pixel of the to-be-processed picture. It can be used to perform step P1.
  • the target color obtaining module 20222 is configured to obtain two target colors using a preset color clustering algorithm based on the color distribution statistical result. It can be used to perform step P2.
  • the setting module 20223 is configured to set a foreground color and a background color according to the two target colors. It can be used to perform step P3.
  • the determining module 20224 is configured to calculate, for each pixel of the to-be-processed picture, a first distance and a second distance, and determine the attribution of the pixel according to the calculation result; the first distance is the Euclidean distance between the pixel's color and the foreground color, and the second distance is the Euclidean distance between the pixel's color and the background color. It can be used to perform step P4.
  • the binarization module 20225 is configured to binarize pixels in the to-be-processed image according to the determination result. It can be used to perform step P5.
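Modules 20221-20225 (steps P1-P5) can be sketched as below; picking the two most frequent colors is a stand-in for the preset color clustering algorithm, and treating the majority color as background is an assumption:

```python
from collections import Counter
import math

def color_stat_binarize(pixels):
    """Binarize a flat list of RGB pixels via color statistics (steps P1-P5)."""
    counts = Counter(pixels)                           # P1: color distribution statistics
    (bg, _), (fg, _) = counts.most_common(2)           # P2/P3: two target colors;
                                                       # majority color -> background
    return [0 if math.dist(p, fg) <= math.dist(p, bg)  # P4: Euclidean distance to each
            else 255                                   # P5: foreground -> black (0),
            for p in pixels]                           #     background -> white (255)
```

For example, on a mostly white image with dark text, the dark cluster becomes the foreground (0) and the white cluster the background (255).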
  • FIG. 19 is a schematic structural diagram of a terminal provided by an embodiment of the present application.
  • the terminal is used to implement the binarization method of a picture provided in the foregoing embodiment.
  • The terminal may include a radio frequency (RF) circuit 110, a memory 120 including one or more computer readable storage media, an input unit 130, a display unit 140, a sensor 150, an audio circuit 160, a wireless fidelity (WiFi) module 170, a processor 180 including one or more processing cores, a power supply 190, and the like.
  • The RF circuit 110 can be used for receiving and transmitting signals in the course of sending and receiving information or during a call. Specifically, after downlink information of a base station is received, it is handed over to one or more processors 180 for processing; in addition, uplink data is sent to the base station.
  • The RF circuit 110 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a Subscriber Identity Module (SIM) card, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like.
  • RF circuitry 110 can also communicate with the network and other devices via wireless communication.
  • The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, Short Messaging Service (SMS), and the like.
  • the memory 120 can be used to store software programs and modules, and the processor 180 executes various functional applications and data processing by running software programs and modules stored in the memory 120.
  • The memory 120 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, application programs required by at least one function, and the like; the data storage area may store data created according to the use of the terminal, and the like.
  • The memory 120 can include high-speed random access memory, and can also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Accordingly, the memory 120 may also include a memory controller to provide the processor 180 and the input unit 130 with access to the memory 120.
  • the input unit 130 can be configured to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function controls.
  • input unit 130 can include touch-sensitive surface 131 as well as other input devices 132.
  • The touch-sensitive surface 131, also referred to as a touch screen or a touch panel, can collect touch operations by the user on or near it (such as operations performed by the user on or near the touch-sensitive surface 131 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connecting devices according to a preset program.
  • Optionally, the touch-sensitive surface 131 can include two parts: a touch detection device and a touch controller.
  • The touch detection device detects the touch orientation of the user, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends them to the processor 180, and can receive commands from the processor 180 and execute them.
  • The touch-sensitive surface 131 can be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave.
  • the input unit 130 can also include other input devices 132.
  • other input devices 132 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control buttons, switch buttons, etc.), trackballs, mice, joysticks, and the like.
  • Display unit 140 can be used to display information entered by the user or information provided to the user as well as various graphical user interfaces of the terminal, which can be composed of graphics, text, icons, video, and any combination thereof.
  • the display unit 140 may include a display panel 141.
  • the display panel 141 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like.
  • Further, the touch-sensitive surface 131 may cover the display panel 141; when the touch-sensitive surface 131 detects a touch operation on or near it, the operation is transmitted to the processor 180 to determine the type of the touch event, after which the processor 180 provides a corresponding visual output on the display panel 141 according to the type of the touch event.
  • Although the touch-sensitive surface 131 and the display panel 141 are implemented here as two separate components to realize the input and output functions, in some embodiments the touch-sensitive surface 131 can be integrated with the display panel 141 to realize the input and output functions.
  • the terminal may also include at least one type of sensor 150, such as a light sensor, a motion sensor, and other sensors.
  • The light sensor may include an ambient light sensor and a proximity sensor; the ambient light sensor may adjust the brightness of the display panel 141 according to the brightness of the ambient light, and the proximity sensor may turn off the display panel 141 and/or the backlight when the terminal moves to the ear.
  • As one kind of motion sensor, the gravity acceleration sensor can detect the magnitude of acceleration in each direction (usually on three axes), and can detect the magnitude and direction of gravity when stationary; it can be used in applications that recognize the attitude of the terminal (such as switching between landscape and portrait modes, related games, and magnetometer attitude calibration), vibration-recognition-related functions (such as a pedometer or tapping), and the like. As for the other sensors with which the terminal can also be configured, such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, details are not described herein again.
  • An audio circuit 160, a speaker 161, and a microphone 162 can provide an audio interface between the user and the terminal.
  • On the one hand, the audio circuit 160 can convert received audio data into an electrical signal and transmit it to the speaker 161, which converts it into a sound signal for output; on the other hand, the microphone 162 converts a collected sound signal into an electrical signal, which is received by the audio circuit 160 and converted into audio data; after being processed by the audio data output processor 180, the audio data is transmitted, for example via the RF circuit 110, to another terminal, or output to the memory 120 for further processing.
  • the audio circuit 160 may also include an earbud jack to provide communication of the peripheral earphones with the terminal.
  • WiFi is a short-range wireless transmission technology.
  • Through the WiFi module 170, the terminal can help the user send and receive e-mails, browse web pages, access streaming media, and so on; it provides the user with wireless broadband Internet access.
  • Although FIG. 19 shows the WiFi module 170, it can be understood that it is not a necessary part of the terminal and may be omitted as needed without changing the essence of the application.
  • The processor 180 is the control center of the terminal; it connects the various parts of the entire terminal using various interfaces and lines, and performs the various functions of the terminal and processes data by running or executing the software programs and/or modules stored in the memory 120 and calling the data stored in the memory 120, thereby monitoring the terminal as a whole.
  • Optionally, the processor 180 may include one or more processing cores; preferably, the processor 180 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, applications, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 180.
  • the terminal further includes a power source 190 (such as a battery) for supplying power to each component.
  • Preferably, the power source can be logically connected to the processor 180 through a power management system, so that functions such as charging, discharging, and power consumption management are handled through the power management system.
  • Power supply 190 may also include any one or more of a DC or AC power source, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
  • the terminal may further include a camera, a Bluetooth module, and the like, and details are not described herein again.
  • Specifically in this embodiment, the display unit of the terminal is a touch screen display, and the terminal further includes a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors. The one or more programs include instructions for performing the binarization method of a picture described above.
  • A non-transitory computer readable storage medium comprising instructions is also provided, such as a memory comprising instructions executable by a processor of a terminal to perform the steps of the above method embodiments.
  • For example, the non-transitory computer readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
  • "A plurality" as referred to herein means two or more.
  • "And/or" describes the association relationship of associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate three cases: A exists alone, A and B exist at the same time, or B exists alone.
  • The character "/" generally indicates an "or" relationship between the objects before and after it.
  • A person skilled in the art may understand that all or part of the steps of the above embodiments may be completed by hardware, or by a program instructing the related hardware; the program may be stored in a computer readable storage medium.
  • the storage medium mentioned may be a read only memory, a magnetic disk or an optical disk or the like.
  • the disclosed system, apparatus, and method may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • The division into units is only a logical functional division; in actual implementation there may be other ways of division. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium.
  • Such a computer readable storage medium includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the various embodiments of the present application.
  • The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.


Abstract

Disclosed are an image thresholding method and device, and a terminal. The present invention only requires processing an image to be processed independently with several complementary thresholding methods, then using an optical-character-recognition-based learning engine to obtain the confidence of each character and from it the text confidence, thereby dynamically selecting the optimal processing result. The invention does not need to take global information or local texture into consideration and realizes seamless switching among the processing results produced by the various thresholding methods. The present invention enables dynamic selection of the optimal thresholding result under different scenarios, thereby meeting the diverse requirements of different scenarios and enabling thresholding of images in all scenarios.

Description

Image binarization method, device and terminal

This application claims priority to Chinese Patent Application No. 201710031170.X, entitled "Image binarization method, device and terminal", filed with the Chinese Patent Office on January 17, 2017, the entire contents of which are incorporated herein by reference.

Technical field

This application relates to the field of image processing.

Background

Binarization of an image sets the gray value of each pixel of the image to 0 or 255, so that the entire image presents a distinct, purely black-and-white visual effect. Binarization is a basic operation in image processing and is applied very widely. Accordingly, there are many binarization methods, such as the bimodal method, the P-parameter method, the iterative method, and the maximum inter-class variance method.

However, the diversity of binarization methods and the limitations of each of them make it difficult to quickly find a suitable binarization method when pictures of many different scenes must be binarized, which degrades the binarization effect.
Summary

In order to solve the above technical problem, the present application proposes a picture binarization method, device, and terminal.

The technical solutions of the embodiments of the present application are as follows:

In one aspect, a picture binarization method is provided, the method comprising:

a picture binarization device acquiring a to-be-processed picture, the to-be-processed picture containing text;

the picture binarization device performing independent binarization on the to-be-processed picture using each of a plurality of preset binarization methods, each binarization method yielding one processing result;

the picture binarization device obtaining a set of processing results from the processing results;

the picture binarization device calculating the text confidence of each processing result in the set;

the picture binarization device selecting the processing result with the highest text confidence as the binarization result of the to-be-processed picture.

In another aspect, a picture binarization device is provided.

In one possible implementation, the device includes:

a to-be-processed picture acquisition module, configured to acquire a picture to be processed;

a processing result obtaining module, configured to perform independent binarization on the to-be-processed picture using each of a plurality of preset binarization methods, each binarization method yielding one processing result;

a processing result set obtaining module, configured to obtain a set of processing results from the processing results;

a text confidence calculation module, configured to calculate the text confidence of each processing result in the set;

a binarization result obtaining module, configured to select the processing result with the highest text confidence as the binarization result of the to-be-processed picture.

In another possible implementation, the device includes:

a transceiver, a processor, and a bus;

the transceiver and the processor are connected by the bus;

the processor performs the following steps:

acquiring a picture to be processed;

performing independent binarization on the to-be-processed picture using each of a plurality of preset binarization methods, each binarization method yielding one processing result;

obtaining a set of processing results from the processing results;

calculating the text confidence of each processing result in the set;

selecting the processing result with the highest text confidence as the binarization result of the to-be-processed picture.

In another aspect, a picture binarization terminal is provided, the terminal comprising the picture binarization device described above.

In another aspect, an embodiment of the present application provides a computer readable storage medium comprising instructions that, when run on a computer, cause the computer to perform the method of the first aspect above.

In another aspect, an embodiment of the present application provides a computer program product comprising instructions that, when run on a computer, cause the computer to perform the method of the first aspect above.

The present application provides a picture binarization method, device, and terminal with the following beneficial effects:

The present application calculates, based on optical character recognition, the text confidence of each binarization result of the picture to be processed, and dynamically selects the optimal binarization method according to the text confidence, thereby obtaining the optimal binarization result for the picture. The present application can dynamically select the optimal binarization result in different scenarios, meeting the diverse requirements of different scenarios and achieving full-scene adaptation of picture binarization.
Brief description of the drawings

FIG. 1 is a flowchart of a picture binarization method in an embodiment of the present application;

FIG. 2 is a flowchart of a method for obtaining text confidence in an embodiment of the present application;

FIG. 3 is a flowchart of the weighted average algorithm in an embodiment of the present application;

FIG. 4 is a flowchart of the sliding-window-based binarization method in an embodiment of the present application;

FIG. 5 is a flowchart of the local binarization method in an embodiment of the present application;

FIG. 6 is a flowchart of the binarization method based on color value statistics in an embodiment of the present application;

FIG. 7 is a structural diagram of the convolutional neural network in an embodiment of the present application;

FIG. 8 is a to-be-processed picture in an embodiment of the present application;

FIG. 9 is the result of processing the to-be-processed picture in FIG. 8 with the sliding-window-based binarization method in an embodiment of the present application;

FIG. 10 is the result of processing the to-be-processed picture in FIG. 8 with the binarization method based on color value statistics in an embodiment of the present application;

FIG. 11 is another to-be-processed picture in an embodiment of the present application;

FIG. 12 is the result of processing the to-be-processed picture in FIG. 11 with the sliding-window-based binarization method in an embodiment of the present application;

FIG. 13 is the result of processing the to-be-processed picture in FIG. 11 with the binarization method based on color value statistics in an embodiment of the present application;

FIG. 14 is a block diagram of a picture binarization device in an embodiment of the present application;

FIG. 15 is a block diagram of the text confidence calculation unit in an embodiment of the present application;

FIG. 16 is a block diagram of the processing result obtaining module in an embodiment of the present application;

FIG. 17 is a block diagram of the sliding window binarization unit in an embodiment of the present application;

FIG. 18 is a block diagram of the color value statistical binarization unit in an embodiment of the present application;

FIG. 19 is a structural block diagram of the terminal in an embodiment of the present application.
Detailed description

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them.

Please refer to FIG. 1, which shows a flowchart of a picture binarization method provided by an embodiment of the present application. The method can include the following steps.
Step 101: Acquire a picture to be processed, where the picture to be processed contains text.

Step 102: Perform independent binarization on the to-be-processed picture using each of a plurality of preset binarization methods, each binarization method yielding one processing result.

The preset binarization methods can be chosen from existing binarization methods, and their number can be two or more.

Existing binarization methods fall mainly into two categories: global methods, which determine a unified segmentation threshold from a global perspective and binarize with that threshold; and locally adaptive methods, which determine different thresholds for different regions of the image and binarize according to those thresholds.

Global methods mostly compute, from the global color statistics of the image, a segmentation threshold that achieves the best binarization effect, and then perform a simple binarization according to that threshold. Such methods work well only on images with a simple background and a single color, and perform poorly on images with complex texture or low contrast.

Locally adaptive methods mostly compute the binarization threshold from local texture information, which avoids the misjudgments of a global threshold to some extent; however, because they focus too heavily on local information and ignore global information, adjacent regions are often binarized very differently, producing incoherent results across neighboring areas.

It can be seen that existing binarization methods each handle only fixed scenes, and their adaptability is weak. To optimize the binarization effect and improve the accuracy of text extraction from the picture to be processed in step 101, step 102 can enumerate multiple binarization methods and select the best one through the subsequently calculated text confidence, thereby combining the strengths of the various methods, extending the range of scenes that can be binarized, and obtaining the best binarization effect.
Step 103: Obtain a processing result set from the processing results.
In step 102, multiple binarization methods are enumerated and each method yields one processing result; together these results constitute the processing result set.
Step 104: Calculate the text confidence of each processing result in the processing result set.
The text confidence characterizes the probability that the text in a processing result can be accurately recognized, and can serve as an evaluation index for the effect of a binarization method. A high text confidence indicates that the binarization worked well; a low text confidence indicates that it did not.
Step 105: Select the processing result with the highest text confidence as the binarization result of the to-be-processed picture.
If several processing results share the highest text confidence, one of them is selected as the binarization result of the to-be-processed picture according to a preset selection method, which may be random selection or another selection method.
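The selection logic of steps 102–105 can be sketched as follows. This is a minimal illustration only: `select_binarization`, the stand-in methods, and the stand-in scorer are hypothetical names, not part of this application, and random choice is used as one possible preset tie-breaking method.

```python
import random

def select_binarization(picture, methods, text_confidence):
    """Apply every binarization method to the picture and keep the
    result whose text confidence is highest; ties are broken by a
    random choice (one possible preset selection method)."""
    results = [method(picture) for method in methods]   # steps 102/103
    scores = [text_confidence(r) for r in results]      # step 104
    best = max(scores)
    candidates = [r for r, s in zip(results, scores) if s == best]
    return random.choice(candidates)                    # step 105

# Toy demonstration with hypothetical stand-in methods and scorer.
picture = "picture"
methods = [lambda p: p + "-A", lambda p: p + "-B"]
stand_in_scores = {"picture-A": 0.88, "picture-B": 0.97}
chosen = select_binarization(picture, methods, stand_in_scores.get)
```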
By enumerating multiple binarization methods and dynamically selecting the best processing result, the embodiments of this application can binarize pictures from all kinds of scenes with good results, improving the compatibility of picture processing. Using text confidence as the criterion for evaluating the binarization effect ensures that the selected processing result yields the best text recognition result, which benefits any subsequent word processing performed on that result.
Further, refer to FIG. 2, which shows a flowchart of the method for obtaining the text confidence in step 104, including:
Step 1041: Obtain the confidence of each character in a processing result.
Specifically, the processing result is input into a preset learning engine based on optical character recognition (OCR), and the confidence output by the learning engine is obtained. The learning engine may be a deep learning engine based on a convolutional neural network (CNN). A CNN-based deep learning engine recognizes single-character pictures better than ordinary traditional recognition engines, with high accuracy and reliable confidence values. In addition, one advantage of the CNN-based deep learning engine over traditional image processing algorithms is that it avoids complex image preprocessing (extraction of handcrafted features, etc.): the raw image can be input directly. For example, a single-character picture at 28*28 resolution can be input directly and the confidence output directly. Compared with traditional methods, the confidence output is more reliable.
In addition, the traditional Tesseract learning engine and the Nhocr engine also support confidence output and can likewise be used in this embodiment.
In the CNN-based deep learning engine, the traditional Tesseract learning engine, and the Nhocr engine, the confidence output for a single character is a decimal between 0 and 1.
Step 1042: Calculate the text confidence of each processing result according to a preset text confidence algorithm and the confidence of each character.
Specifically, the preset text confidence algorithm includes, but is not limited to: a weighted-average algorithm that uses the weighted average of the character confidences as the text confidence, a geometric-mean algorithm that uses their geometric mean, a quadratic-mean (root-mean-square) algorithm that uses their quadratic mean, and a harmonic-mean algorithm that uses their harmonic mean. Taking the weighted-average algorithm as an example, refer to FIG. 3, which shows its flowchart, including:
S1. Set a weight for each character.
Taking n characters as an example, in the order in which the characters appear in the picture, their weights may be Q_0, ..., Q_(n-1). The weight of each character may be set randomly by the program or set purposefully according to actual needs.
S2. Compute a weighted sum of the confidences according to the confidence of each character and the weight corresponding to that character.
In the order in which the characters appear in the picture, the confidences of the characters may be Z_0, ..., Z_(n-1). The weighted summation can then be expressed as
Q_0·Z_0 + Q_1·Z_1 + ... + Q_(n-1)·Z_(n-1) = Σ_{i=0}^{n-1} Q_i·Z_i
S3. Divide the weighted sum by the number of characters in the processing result to obtain the weighted average confidence.
The weighted average confidence is
(Q_0·Z_0 + Q_1·Z_1 + ... + Q_(n-1)·Z_(n-1)) / n = (1/n) Σ_{i=0}^{n-1} Q_i·Z_i
S4. Use the weighted average confidence as the text confidence.
The weighted average confidence (1/n) Σ_{i=0}^{n-1} Q_i·Z_i is used as the text confidence in step 105.
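The weighted-average algorithm of steps S1–S4 can be sketched as follows; `weighted_average_confidence` is a hypothetical name for illustration only.

```python
def weighted_average_confidence(confidences, weights):
    """Steps S2-S3: weighted sum of the per-character confidences
    Q_i * Z_i, divided by the number of characters n."""
    n = len(confidences)
    weighted_sum = sum(q * z for q, z in zip(weights, confidences))  # S2
    return weighted_sum / n                                          # S3

# Three characters with equal weights reduce to the plain average.
text_confidence = weighted_average_confidence([0.9, 0.8, 1.0], [1, 1, 1])
```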
In the embodiments of this application, choosing among different text confidence algorithms and setting different parameters within the chosen algorithm can improve the reliability and discriminative power of the text confidence, supporting the differentiation of multiple processing results and the identification of the best processing result among them.
Further, in step 102, either existing binarization methods or custom binarization methods may be used. The following possible binarization methods are given as examples:
One possible implementation: direct binarization.
After the image is converted to grayscale, every pixel value of the image is scanned: pixels with a gray value below 127 are set to 0 (black), and pixels with a gray value of 127 or above are set to 255 (white). The advantage of this method is that it requires little computation and is fast; the disadvantage is that it ignores the pixel distribution and the pixel value characteristics of the image.
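A minimal sketch of direct binarization, assuming NumPy for the per-pixel scan:

```python
import numpy as np

def binarize_direct(gray):
    """Direct binarization: gray values below 127 become 0 (black),
    values of 127 and above become 255 (white)."""
    return np.where(gray < 127, 0, 255).astype(np.uint8)

gray = np.array([[0, 100, 126],
                 [127, 200, 255]], dtype=np.uint8)
result = binarize_direct(gray)
```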
Another possible implementation: binarization based on the mean K.
After the image is converted to grayscale, the mean K of the pixels in the image is computed. The gray value of every pixel of the image is then scanned: if the gray value is greater than K, the pixel is set to 255 (white); if it is less than or equal to K, the pixel is set to 0 (black). Using the mean as the binarization threshold is simple, but it may cause some object pixels or background pixels to be lost, and the binarization result may fail to faithfully reflect the source image.
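A minimal sketch of binarization based on the mean K, again assuming NumPy:

```python
import numpy as np

def binarize_mean(gray):
    """Binarization based on the mean K: pixels above K become 255
    (white), pixels at or below K become 0 (black)."""
    k = gray.mean()
    return np.where(gray > k, 255, 0).astype(np.uint8)

gray = np.array([[10, 10],
                 [200, 200]], dtype=np.uint8)  # mean K = 105
result = binarize_mean(gray)
```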
Another possible implementation: the maximum between-class variance method (Otsu's method).
The image is assumed to consist of a foreground region and a background region. The candidate thresholds (usually in the range [0, 255]) are traversed; for each one, the gray histograms of the foreground and background regions in the resulting segmentation are computed and the variance between the two classes is evaluated. The gray threshold that maximizes this variance is the desired binarization threshold.
Every pixel value of the image is scanned: if the gray value is greater than the binarization threshold, the pixel is set to 255 (white); if it is less than or equal to the threshold, the pixel is set to 0 (black).
The maximum between-class variance method is a classic binarization method that strikes a good balance between computation speed and binarization quality; however, as a global binarization method, it performs poorly on images with complex texture information or low contrast.
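A brute-force sketch of the maximum between-class variance method; the between-class variance is computed here in its standard form w0·w1·(μ0 − μ1)², where w0, w1 are the pixel fractions and μ0, μ1 the mean gray values of the two classes:

```python
import numpy as np

def otsu_threshold(gray):
    """Traverse every candidate threshold t in [0, 255] and keep the
    one maximizing the between-class variance w0*w1*(mu0 - mu1)**2."""
    pixels = gray.ravel().astype(np.float64)
    best_t, best_var = 0, -1.0
    for t in range(256):
        background = pixels[pixels <= t]
        foreground = pixels[pixels > t]
        if background.size == 0 or foreground.size == 0:
            continue
        w0 = background.size / pixels.size
        w1 = foreground.size / pixels.size
        var = w0 * w1 * (background.mean() - foreground.mean()) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t

# Two well-separated gray populations: the threshold lands between them.
gray = np.array([20, 22, 24, 200, 202, 204], dtype=np.uint8)
threshold = otsu_threshold(gray)
```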
Any of the common binarization methods above, or the other binarization methods listed in the embodiments of this application, can be applied in step 102. To achieve a better binarization effect, the embodiments of this application binarize the to-be-processed picture in step 102 using both a binarization method based on a sliding window and a binarization method based on color value statistics.
Refer to FIG. 4, which shows a flowchart of the sliding-window-based binarization method, including:
Step T1. Place the window at a preset position in the to-be-processed picture.
The size and shape of the window can be set according to actual needs. Taking a window of M*N pixels as an example, the window is placed at a preset position. The preset position can likewise be set according to actual needs; specifically, it may be the upper-left or lower-right corner of the to-be-processed picture, and for a to-be-processed picture M pixels wide, it may be the leftmost or rightmost end of the picture.
Step T2. Determine whether the pixels inside the window and the related pixels belong to a continuous pattern.
The related pixels are the pixels outside the window that are adjacent to it. The purpose of step T2 is to determine whether the M*N pixels of the to-be-processed picture that fall inside the window and the related pixels belong to a continuous pattern; if they do not, it is determined that the window contains text.
Step T3. If not, perform local binarization on the pixels inside the window.
If the window contains text, the pixels inside the window are binarized. This embodiment is concerned only with the binarization of the text-bearing parts of the to-be-processed picture; therefore, if the window contains no text, it is either left unprocessed or simply set to a uniform gray value.
Step T4. Determine whether the window has reached the end of the preset track.
The window slides along a preset track, and sliding ends when the end of the track is reached.
Step T5. If not, slide the window along the preset track.
The preset track can be set according to actual needs. For a to-be-processed picture M pixels wide, the window may be moved along the length direction of the picture.
Step T6. Return to step T2.
Specifically, the local binarization in step T3 can be implemented with an existing binarization method. In this embodiment, refer to FIG. 5, which shows a flowchart of the local binarization method used in step T3, including:
Step T31. Obtain color distribution statistics for the pixels inside the window.
Step T32. Set a threshold according to the statistics; the threshold is used to distinguish the foreground of the to-be-processed picture from its background.
The picture is assumed to consist of a foreground region and a background region, and the chosen threshold separates the foreground pixels of the image from the background pixels. For example, pixels whose color value is greater than the threshold may be classified as foreground and the rest as background, or pixels whose color value is less than the threshold may be classified as foreground and the rest as background. The threshold is chosen so that, after segmentation by the threshold, the difference between the mean color of the foreground pixels and the mean color of the background pixels is maximized.
Step T33. Binarize the pixels inside the window according to the threshold.
Specifically, the foreground pixels may be set to 255 (white) and the background pixels to 0 (black), or the foreground pixels may be set to 0 (black) and the background pixels to 255 (white).
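Steps T31–T33 can be sketched as follows, choosing the threshold whose split maximizes the gap between the mean gray values of the two resulting pixel groups; `binarize_window` is a hypothetical name and the brighter-group-to-white convention is one of the two options described above.

```python
import numpy as np

def binarize_window(window):
    """Steps T31-T33: pick the threshold whose split maximizes the gap
    between the mean values of the two resulting pixel groups, then
    binarize the window (brighter group -> 255, darker group -> 0)."""
    values = np.sort(np.unique(window.ravel()))   # T31: color statistics
    best_t, best_gap = values[0], -1.0
    for t in values[:-1]:                         # T32: set the threshold
        low = window[window <= t].astype(np.float64)
        high = window[window > t].astype(np.float64)
        gap = abs(high.mean() - low.mean())
        if gap > best_gap:
            best_t, best_gap = t, gap
    return np.where(window > best_t, 255, 0).astype(np.uint8)  # T33

window = np.array([[5, 6],
                   [240, 241]], dtype=np.uint8)
result = binarize_window(window)
```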
The sliding-window-based binarization method provided in the embodiments of this application is a local adaptive method. It is well suited to segmenting text lines within a picture, and achieves a good binarization effect in scenes where the color information is fairly uniform and the background texture is not complex.
Refer to FIG. 6, which shows a flowchart of the binarization method based on color value statistics, including:
Step P1. Obtain color distribution statistics for the pixels of the to-be-processed picture.
Step P2. Based on the color distribution statistics, obtain two target colors using a preset color clustering algorithm.
Clustering is an aggregation of data that groups similar data into one class. Clustering is an unsupervised classification method whose advantage is that no prior training process is required. In general, a color clustering algorithm narrows the range of the color space and enlarges the distance between colors, yielding the color clustering result (the target colors). Commonly used color clustering methods include K-means, Gaussian mixture models (GMM), and mean shift.
Step P3. Set the foreground color and the background color according to the two target colors.
Step P4. For each pixel of the to-be-processed picture in turn, compute the first distance and the second distance, and determine which class the pixel belongs to according to the results.
Specifically, the first distance is the Euclidean distance between the pixel's color and the foreground color, and the second distance is the Euclidean distance between the pixel's color and the background color. If the first distance is smaller than the second distance, the pixel is determined to belong to the foreground; if the first distance is greater than the second distance, the pixel is determined to belong to the background.
Step P5. Binarize the pixels of the to-be-processed picture according to the determination results.
The picture is assumed to consist of a foreground region and a background region, and each pixel of the picture is determined to be a foreground pixel or a background pixel by computing its first and second distances. Specifically, the foreground pixels may be set to 255 (white) and the background pixels to 0 (black), or the foreground pixels may be set to 0 (black) and the background pixels to 255 (white).
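Steps P1–P5 can be sketched as follows, assuming K-means with k=2 as the preset color clustering algorithm and treating the darker target color as the foreground; both choices are illustrative assumptions, not requirements of this application.

```python
import numpy as np

def binarize_by_color_clustering(image, iters=10):
    """Steps P1-P5: cluster the pixel colors into two target colors with
    a small k=2 K-means, then assign every pixel to the nearer target
    color by Euclidean distance; the darker target color is treated as
    the foreground here (foreground -> 0, background -> 255)."""
    pixels = image.reshape(-1, 3).astype(np.float64)              # P1
    centers = np.array([pixels.min(axis=0), pixels.max(axis=0)])  # initial guesses
    for _ in range(iters):                                        # P2: clustering
        dist = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)                              # P4: nearer color
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = pixels[labels == k].mean(axis=0)
    foreground = centers.sum(axis=1).argmin()                     # P3: darker center
    binary = np.where(labels == foreground, 0, 255).astype(np.uint8)  # P5
    return binary.reshape(image.shape[:2])

image = np.array([[[10, 10, 10], [250, 250, 250]],
                  [[12, 12, 12], [248, 248, 248]]], dtype=np.uint8)
result = binarize_by_color_clustering(image)
```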
The binarization method based on color value statistics provided in the embodiments of this application is a global method. Because the target colors are computed by a clustering method, it is applicable to complex scenes and has a wide range of application.
Further, in step 104 this application uses the CNN-based deep learning engine to evaluate the processing results obtained by binarizing the to-be-processed picture with the sliding-window-based binarization method and with the binarization method based on color value statistics, obtaining the text confidence of the processing result of each of the two methods.
The CNN is one of the most representative network structures in deep learning and has achieved great success in the field of image processing; on the international-standard ImageNet dataset, many successful models are CNN-based. The deep learning engine used in this embodiment is likewise based on a convolutional neural network. One advantage of CNNs over traditional image processing algorithms is that they avoid complex image preprocessing (extraction of handcrafted features, etc.): the raw image can be input directly, and a confidence is output for each single character.
In image processing, an image is usually regarded as one or more two-dimensional vectors. A picture that has been converted to grayscale can be regarded as a single two-dimensional vector whose elements are the gray values of the pixels, while a color picture represented in RGB (the RGB color model is an industry color standard) has three color channels and can be represented as three two-dimensional vectors. Traditional neural networks are fully connected, i.e., every neuron from the input layer to the hidden layer is connected, which produces an enormous number of parameters and makes network training time-consuming or even infeasible. The convolutional neural network used in this embodiment avoids this difficulty through local connectivity, weight sharing, and similar techniques. Therefore, compared with a traditional learning engine, the time complexity of the CNN-based deep learning engine in this embodiment is greatly reduced during computation, giving it superior computational performance.
In this embodiment the CNN has two main types of network layers: convolutional layers and pooling/sampling layers. The convolutional layers extract the various features of the image; the pooling layers abstract the original feature signals, greatly reducing the number of training parameters and also mitigating overfitting of the model.
A convolutional layer is computed by sliding the convolution kernel over the previous input layer window by window. Each parameter of the convolution kernel corresponds to a weight parameter in a traditional neural network and is connected to a corresponding local pixel; the parameters of the kernel are multiplied by the corresponding local pixel values and summed (usually a bias parameter is also added) to obtain the result at that position of the convolutional layer.
After the features of the image have been obtained through a convolutional layer, the convolutional layer is pooled/sampled in order to further reduce the network training parameters and the overfitting of the deep learning engine based on the convolutional neural network in this embodiment. There are usually two pooling/sampling approaches:
Max-pooling: the maximum value within the pooling window is taken as the sampled value;
Mean-pooling: all values within the pooling window are summed and averaged, and the average is taken as the sampled value.
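The two pooling/sampling approaches can be sketched for a non-overlapping 2*2 pooling window as follows:

```python
import numpy as np

def pool_2x2(feature_map, mode="max"):
    """Non-overlapping 2*2 pooling/sampling: 'max' keeps the maximum of
    each pooling window, 'mean' keeps the average of each window."""
    h, w = feature_map.shape
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))

feature_map = np.array([[1.0, 2.0, 3.0, 4.0],
                        [5.0, 6.0, 7.0, 8.0],
                        [9.0, 10.0, 11.0, 12.0],
                        [13.0, 14.0, 15.0, 16.0]])
max_pooled = pool_2x2(feature_map, "max")    # maximum of each 2*2 block
mean_pooled = pool_2x2(feature_map, "mean")  # average of each 2*2 block
```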
Refer to FIG. 7, which shows the structure of the convolutional neural network in this embodiment; a classic convolutional neural network structure is used.
Layer C1 is a convolutional layer producing 6 feature maps; each neuron of each feature map is connected to a 5*5 neighborhood of the input, and the feature map size is 28*28. Each convolutional neuron has 25 unit parameters and one bias parameter, and the layer has 122,304 connections.
Layer S2 is a downsampling layer with 6 feature maps of size 14*14. Each unit of each map is connected to a 2*2 neighborhood of the corresponding C1 feature map, without overlap; therefore each feature map in S2 is 1/4 the size of the feature maps in C1. The 4 inputs of each S2 unit are summed, multiplied by a trainable parameter W, a trainable bias b is added, and the result is passed through a sigmoid function. Layer S2 has 5,880 connections.
Layer C3 is a convolutional layer with 16 convolution kernels, producing 16 feature maps of size 10*10; each neuron of each feature map is connected to several 5*5 neighborhoods in a subset of the S2 feature maps.
Layer S4 is a downsampling layer consisting of 16 feature maps of size 5*5; each unit of a feature map is connected to the 2*2 neighborhood of the corresponding feature map in C3, and the layer has 2,000 connections.
Layer C5 is a convolutional layer comprising 120 neurons and 120 feature maps, each of size 1*1; each unit is connected to the 5*5 neighborhoods of all 16 S4 feature maps, giving 48,120 connections in total.
Layer F6 has 84 units, is fully connected to layer C5, and has 10,164 connections.
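The connection counts quoted for this classic structure can be checked arithmetically from the per-layer figures given above (for C1, for example: 5*5 weights plus 1 bias per neuron, times 6 feature maps of 28*28 neurons):

```python
# Connection counts of the classic CNN described above, recomputed from
# the per-layer figures given in the text.
c1 = (5 * 5 + 1) * 6 * 28 * 28   # 5*5 kernel + bias, 6 maps of 28*28
s2 = (2 * 2 + 1) * 6 * 14 * 14   # 2*2 window + bias, 6 maps of 14*14
s4 = (2 * 2 + 1) * 16 * 5 * 5    # 2*2 window + bias, 16 maps of 5*5
c5 = (5 * 5 * 16 + 1) * 120      # each of 120 units sees all 16 S4 maps
f6 = (120 + 1) * 84              # 84 units fully connected to C5, + bias
```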
After the to-be-processed picture has been processed with the binarization methods, the processing results need to be analyzed. When analyzing such results, the prior art generally has difficulty obtaining highly accurate binarization analysis for scenes with low contrast or complex texture. The deep learning engine based on a convolutional neural network provided in this embodiment is a deep learning neural network built on big data; its confidence outputs are highly accurate and produced quickly, remedying the prior art's demanding scene requirements and poor accuracy. Therefore, evaluating the processing results of step 103 on the basis of this engine is more robust and accurate than traditional methods of evaluating binarization results.
Relying on this engine, the embodiments of this application can adaptively compute the text confidence of the processing result of the sliding-window-based binarization method and the text confidence of the processing result of the binarization method based on color value statistics, and accordingly select a processing result in step 105.
In one scenario of an embodiment of this application, refer to FIG. 8, which shows a to-be-processed picture. Refer to FIG. 9, which shows the result of processing the picture in FIG. 8 with the sliding-window-based binarization method of this embodiment. Refer to FIG. 10, which shows the result of processing the picture in FIG. 8 with the binarization method based on color value statistics of this embodiment. FIG. 9 and FIG. 10 are input into the CNN-based deep learning engine to obtain the confidence of each character in FIG. 9 and FIG. 10, from which the text confidences of FIG. 9 and FIG. 10 are computed. In this embodiment, the text confidence of FIG. 9 is 0.88 and that of FIG. 10 is 0.97; therefore, the processing result of FIG. 10 is selected as the binarization result of the to-be-processed picture in FIG. 8.
In another scenario of an embodiment of this application, refer to FIG. 11, which shows a to-be-processed picture. Refer to FIG. 12, which shows the result of processing the picture in FIG. 11 with the sliding-window-based binarization method of this embodiment. Refer to FIG. 13, which shows the result of processing the picture in FIG. 11 with the binarization method based on color value statistics of this embodiment. FIG. 12 and FIG. 13 are input into the CNN-based deep learning engine to obtain the confidence of each character in FIG. 12 and FIG. 13, from which the text confidences of FIG. 12 and FIG. 13 are computed. In this embodiment, the text confidence of FIG. 12 is 0.99 and that of FIG. 13 is 0.94; therefore, the processing result of FIG. 12 is selected as the binarization result of the to-be-processed picture in FIG. 11.
In the embodiments of this application, the to-be-processed image only needs to be processed independently by several highly complementary binarization methods; the confidence of each single character is then obtained with the learning engine based on optical character recognition, the text confidence is computed from it, and the best processing result can be dynamically selected. Seamless switching among the processing results of the various binarization methods is achieved without any need to consider global information or local texture.
The following are apparatus embodiments of this application, which may be used to carry out the method embodiments of this application. For details not disclosed in the apparatus embodiments of this application, please refer to the method embodiments of this application.
Refer to FIG. 14, which shows a block diagram of a picture binarization apparatus. The apparatus has the functionality to implement the above method; the functionality may be implemented by hardware, or by hardware executing corresponding software. The apparatus may include:
待处理图片获取模块201,用于获取待处理图片。其可以用于执行步骤101。The to-be-processed picture acquisition module 201 is configured to acquire a picture to be processed. It can be used to perform step 101.
处理结果得到模块202,用于分别使用多个预设的二值化处理方法对所述待处理图片进行独立的二值化处理,每个二值化方法得到一个处理结果。其可以用于执行步骤102。The processing result obtaining module 202 is configured to perform independent binarization processing on the to-be-processed image by using a plurality of preset binarization processing methods, and each binarization method obtains a processing result. It can be used to perform step 102.
处理结果集合得到模块203,用于根据所述处理结果,得到处理结果集合。其可以用于执行步骤103。The processing result set obtaining module 203 is configured to obtain a processing result set according to the processing result. It can be used to perform step 103.
文字置信度计算模块204,用于计算所述处理结果集合中的每一个处理结果的文字置信度。其可以用于执行步骤104。The text confidence calculation module 204 is configured to calculate a text confidence of each processing result in the processing result set. It can be used to perform step 104.
二值化结果得到模块205,用于选取文字置信度最高的处理结果作为对所述待处理图片的二值化结果。其可以用于执行步骤105。The binarization result obtaining module 205 is configured to select a processing result with the highest degree of confidence in the text as a binarization result of the to-be-processed picture. It can be used to perform step 105.
Further, the text confidence calculation module 204 includes:
The confidence acquisition unit 2041 is configured to obtain the confidence of each character in a processing result. It may be used to perform step 1041.
The text confidence calculation unit 2042 is configured to calculate the text confidence of the processing result according to a preset text confidence algorithm and the confidence of each character. It may be used to perform step 1042.
Referring to FIG. 15, which shows a block diagram of the text confidence calculation unit, the text confidence calculation unit 2042 may include:
The weight setting module 20421 is configured to set a weight corresponding to each character. It may be used to perform step S1.
The average confidence calculation module 20422 is configured to calculate the weighted average confidence of the processing result. It may be used to perform steps S2 and S3.
The text confidence obtaining module 20423 is configured to use the weighted average confidence as the text confidence. It may be used to perform step S4.
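A minimal sketch of steps S1–S4, following the claim language literally: the weighted sum of per-character confidences is divided by the number of characters (not by the sum of the weights):

```python
def weighted_average_confidence(confidences, weights):
    """Weighted sum of the per-character confidences divided by the
    number of characters; the result serves as the text confidence."""
    if len(confidences) != len(weights):
        raise ValueError("one weight per character is required")
    weighted_sum = sum(c * w for c, w in zip(confidences, weights))
    return weighted_sum / len(confidences)
```

With unit weights this reduces to the plain average of the per-character confidences.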
Referring to FIG. 16, which shows a block diagram of the processing result obtaining module, the processing result obtaining module 202 includes:
The sliding window binarization unit 2021 is configured to binarize the to-be-processed picture using a sliding-window-based binarization method.
The color value statistics binarization unit 2022 is configured to binarize the to-be-processed picture using a binarization method based on color value statistics.
Specifically, referring to FIG. 17, which shows a block diagram of the sliding window binarization unit, the sliding window binarization unit 2021 includes:
The window setting module 20211 is configured to place a window at a preset position of the to-be-processed picture. It may be used to perform step T1.
The first judging module 20212 is configured to judge whether the pixels inside the window and the related pixels belong to a continuous pattern, the related pixels being the pixels outside the window that are adjacent to it. It may be used to perform step T2.
The local binarization module 20213 is configured to locally binarize the pixels inside the window. It may be used to perform step T3.
The second judging module 20214 is configured to judge whether the sliding window has reached the end of the preset trajectory. It may be used to perform step T4.
The moving module 20215 is configured to move the window along the preset trajectory. It may be used to perform step T5.
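A simplified sketch of the sliding-window flow (steps T1–T5) on a grayscale image: the window moves along a raster trajectory and each window is binarized with its own local threshold. The continuity test of step T2 and the exact local thresholding rule are not specified at this level, so the continuity check is omitted and a plain window-mean threshold stands in:

```python
import numpy as np

def sliding_window_binarize(gray, win=32, stride=32):
    """Move a window over the image (T1, T4, T5) and locally
    binarize each window with its own threshold (T3)."""
    out = np.zeros_like(gray)
    height, width = gray.shape
    for y in range(0, height, stride):        # preset raster trajectory
        for x in range(0, width, stride):
            patch = gray[y:y + win, x:x + win]
            threshold = patch.mean()          # local threshold from this window
            out[y:y + win, x:x + win] = np.where(patch >= threshold, 255, 0)
    return out
```

Choosing the threshold per window, rather than once for the whole picture, is what lets the method cope with uneven illumination across the picture.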
Specifically, referring to FIG. 18, which shows a block diagram of the color value statistics binarization unit, the color value statistics binarization unit 2022 includes:
The statistics result obtaining module 20221 is configured to obtain color distribution statistics of the pixels of the to-be-processed picture. It may be used to perform step P1.
The target color obtaining module 20222 is configured to obtain two target colors from the color distribution statistics using a preset color clustering algorithm. It may be used to perform step P2.
The setting module 20223 is configured to set the foreground color and the background color according to the two target colors. It may be used to perform step P3.
The judging module 20224 is configured to calculate, for each pixel of the to-be-processed picture in turn, a first distance and a second distance, and to judge the attribution of the pixel according to the calculation results; the first distance is the Euclidean distance between the color of the pixel and the foreground color, and the second distance is the Euclidean distance between the color of the pixel and the background color. It may be used to perform step P4.
The binarization module 20225 is configured to binarize the pixels in the to-be-processed picture according to the judgment results. It may be used to perform step P5.
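A sketch of steps P1–P5 for RGB pixels. The application does not fix the clustering algorithm, so a minimal 2-means loop stands in for it, and taking the darker target color as the foreground is an assumption made here for illustration:

```python
import math

def color_stat_binarize(pixels, iters=10):
    """P1-P2: derive two target colors from the pixel color distribution;
    P3: pick foreground/background; P4-P5: label every pixel by
    Euclidean distance to the two target colors."""
    c0, c1 = min(pixels), max(pixels)            # seed with the extreme colors
    for _ in range(iters):
        near0 = [p for p in pixels if math.dist(p, c0) <= math.dist(p, c1)]
        near1 = [p for p in pixels if math.dist(p, c0) > math.dist(p, c1)]
        c0 = _centroid(near0) or c0
        c1 = _centroid(near1) or c1
    fg, bg = sorted((c0, c1))                    # darker color as foreground (assumption)
    # first distance |p - fg| vs. second distance |p - bg| decides attribution
    return [0 if math.dist(p, fg) < math.dist(p, bg) else 255 for p in pixels]

def _centroid(group):
    if not group:
        return None
    return tuple(sum(channel) / len(group) for channel in zip(*group))
```

Unlike the sliding-window unit, this method uses one global pair of target colors, so it suits pictures whose text and background colors are each roughly uniform.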
It should be noted that, when the apparatus provided by the foregoing embodiments implements its functions, the division into the above functional modules is merely illustrative. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus embodiments and the method embodiments provided above belong to the same concept; for the specific implementation process, refer to the method embodiments, and details are not described here again.
Referring to FIG. 19, it shows a schematic structural diagram of a terminal provided by an embodiment of the present application. The terminal is used to implement the picture binarization method provided in the foregoing embodiments.
The terminal may include components such as a radio frequency (RF) circuit 110, a memory 120 including one or more computer-readable storage media, an input unit 130, a display unit 140, a sensor 150, an audio circuit 160, a wireless fidelity (WiFi) module 170, a processor 180 including one or more processing cores, and a power supply 190. A person skilled in the art will understand that the terminal structure shown in FIG. 19 does not limit the terminal, which may include more or fewer components than those shown, combine certain components, or arrange the components differently. Specifically:
The RF circuit 110 may be used for receiving and sending signals during information transmission and reception or during a call. In particular, after receiving downlink information from a base station, the RF circuit 110 hands it over to one or more processors 180 for processing; in addition, it sends uplink data to the base station. Generally, the RF circuit 110 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a subscriber identity module (SIM) card, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like. The RF circuit 110 may also communicate with a network and other devices by wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, Short Messaging Service (SMS), and the like.
The memory 120 may be used to store software programs and modules, and the processor 180 executes various functional applications and performs data processing by running the software programs and modules stored in the memory 120. The memory 120 may mainly include a program storage area and a data storage area, where the program storage area may store the operating system, the applications required by functions, and the like, and the data storage area may store data created according to the use of the terminal, and the like. In addition, the memory 120 may include a high-speed random access memory and may also include a non-volatile memory, for example at least one magnetic disk storage device, a flash memory device, or another solid-state storage device. Accordingly, the memory 120 may further include a memory controller to provide the processor 180 and the input unit 130 with access to the memory 120.
The input unit 130 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal input related to user settings and function control. Specifically, the input unit 130 may include a touch-sensitive surface 131 and other input devices 132. The touch-sensitive surface 131, also referred to as a touch display screen or a touch pad, may collect touch operations performed by a user on or near it (such as operations performed by the user on or near the touch-sensitive surface 131 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a preset program. Optionally, the touch-sensitive surface 131 may include two parts: a touch detection device and a touch controller. The touch detection device detects the user's touch orientation, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends them to the processor 180, and can receive and execute commands sent by the processor 180. In addition, the touch-sensitive surface 131 may be implemented in multiple types, such as resistive, capacitive, infrared, and surface acoustic wave types.
In addition to the touch-sensitive surface 131, the input unit 130 may also include other input devices 132. Specifically, the other input devices 132 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, a joystick, and the like.
The display unit 140 may be used to display information entered by the user or information provided to the user, as well as the various graphical user interfaces of the terminal; these graphical user interfaces may be composed of graphics, text, icons, video, and any combination thereof. The display unit 140 may include a display panel 141, which may optionally be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like. Further, the touch-sensitive surface 131 may cover the display panel 141; when the touch-sensitive surface 131 detects a touch operation on or near it, it transmits the operation to the processor 180 to determine the type of the touch event, and the processor 180 then provides a corresponding visual output on the display panel 141 according to the type of the touch event. Although in FIG. 19 the touch-sensitive surface 131 and the display panel 141 implement the input and output functions as two separate components, in some embodiments the touch-sensitive surface 131 may be integrated with the display panel 141 to implement the input and output functions.
The terminal may further include at least one sensor 150, such as a light sensor, a motion sensor, or another sensor. Specifically, the light sensor may include an ambient light sensor and a proximity sensor: the ambient light sensor may adjust the brightness of the display panel 141 according to the brightness of the ambient light, and the proximity sensor may turn off the display panel 141 and/or the backlight when the terminal is moved to the ear. As one kind of motion sensor, a gravity acceleration sensor may detect the magnitude of acceleration in each direction (generally on three axes), may detect the magnitude and direction of gravity when at rest, and may be used in applications that recognize the terminal's posture (such as switching between landscape and portrait, related games, and magnetometer posture calibration) and in vibration-recognition functions (such as a pedometer or tapping). As for the gyroscope, barometer, hygrometer, thermometer, infrared sensor, and other sensors that may also be configured on the terminal, details are not described here again.
The audio circuit 160, a loudspeaker 161, and a microphone 162 may provide an audio interface between the user and the terminal. The audio circuit 160 may transmit the electrical signal converted from received audio data to the loudspeaker 161, which converts it into a sound signal for output; conversely, the microphone 162 converts a collected sound signal into an electrical signal, which the audio circuit 160 receives and converts into audio data; after the audio data is processed by the processor 180, it is sent via the RF circuit 110 to, for example, another terminal, or output to the memory 120 for further processing. The audio circuit 160 may also include an earphone jack to provide communication between a peripheral earphone and the terminal.
WiFi is a short-range wireless transmission technology. Through the WiFi module 170, the terminal can help the user send and receive e-mail, browse web pages, access streaming media, and the like; it provides the user with wireless broadband Internet access. Although FIG. 19 shows the WiFi module 170, it will be understood that it is not a required part of the terminal and may be omitted as needed without changing the essence of the application.
The processor 180 is the control center of the terminal. It connects the various parts of the entire terminal through various interfaces and lines, and performs the various functions of the terminal and processes data by running or executing the software programs and/or modules stored in the memory 120 and calling the data stored in the memory 120, thereby monitoring the terminal as a whole. Optionally, the processor 180 may include one or more processing cores. Preferably, the processor 180 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, applications, and the like, and the modem processor mainly handles wireless communication. It will be understood that the modem processor may also not be integrated into the processor 180.
The terminal further includes a power supply 190 (such as a battery) that supplies power to the components. Preferably, the power supply may be logically connected to the processor 180 through a power management system, so that functions such as charging, discharging, and power consumption management are implemented through the power management system. The power supply 190 may also include any component such as one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.
Although not shown, the terminal may further include a camera, a Bluetooth module, and the like, and details are not described here again. Specifically, in this embodiment, the display unit of the terminal is a touch screen display, and the terminal further includes a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors; the one or more programs contain instructions for performing the picture binarization method described above.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example a memory including instructions, where the instructions are executable by the processor of the terminal to perform the steps of the above method embodiments. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
It should be understood that "a plurality of" as mentioned herein means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate three cases: A exists alone, both A and B exist, and B exists alone. The character "/" generally indicates an "or" relationship between the associated objects.
The serial numbers of the above embodiments of the present application are merely for description and do not represent the superiority or inferiority of the embodiments.
A person of ordinary skill in the art will understand that all or part of the steps implementing the above embodiments may be completed by hardware, or by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above descriptions are merely preferred embodiments of the present application and are not intended to limit the present application. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present application shall fall within the protection scope of the present application.
A person skilled in the art will clearly understand that, for convenience and brevity of description, reference may be made, for the specific working processes of the systems, apparatuses, and units described above, to the corresponding processes in the foregoing method embodiments, and details are not described here again.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is merely a division by logical function, and in actual implementation there may be other ways of dividing; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Finally, the above embodiments are merely intended to describe the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (19)

1. A method for binarizing a picture, wherein the method comprises:
the binarization device of the picture acquires a picture to be processed, the picture to be processed containing text;
the binarization device of the picture performs independent binarization on the picture to be processed using a plurality of preset binarization methods, each binarization method producing one processing result;
the binarization device of the picture obtains a processing result set from the processing results;
the binarization device of the picture calculates the text confidence of each processing result in the processing result set;
the binarization device of the picture selects the processing result with the highest text confidence as the binarization result of the picture to be processed.
2. The method according to claim 1, wherein the binarization device of the picture calculating the text confidence of each processing result in the processing result set comprises:
the binarization device of the picture obtains the confidence of each character in the processing result;
the binarization device of the picture calculates the text confidence of the processing result according to a preset text confidence algorithm and the confidence of each character.
3. The method according to claim 2, wherein the binarization device of the picture obtaining the confidence of each character in the processing result comprises:
the binarization device of the picture inputs the processing result into a preset learning engine based on optical character recognition;
the binarization device of the picture obtains the confidence output by the learning engine.
4. The method according to claim 2, wherein the binarization device of the picture calculating the text confidence of the processing result according to a preset text confidence algorithm and the confidence of each character comprises:
the binarization device of the picture sets a weight corresponding to each character in the processing result;
the binarization device of the picture calculates the weighted average confidence of the processing result;
the binarization device of the picture performs a weighted summation of the confidences according to the confidence of each character and the weight corresponding to the character;
the binarization device of the picture divides the result of the weighted summation by the number of characters in the processing result to obtain the weighted average confidence;
the binarization device of the picture uses the weighted average confidence as the text confidence.
5. The method according to claim 1, wherein the preset binarization methods comprise a sliding-window-based binarization method and a binarization method based on color value statistics.
6. The method according to claim 5, wherein the sliding-window-based binarization method comprises:
the binarization device of the picture places a window at a preset position of the picture to be processed;
the binarization device of the picture judges whether the pixels inside the window and the related pixels belong to a continuous pattern, the related pixels being the pixels outside the window that are adjacent to the window;
if not, the binarization device of the picture locally binarizes the pixels inside the window;
the binarization device of the picture judges whether the window has reached the end of the preset trajectory;
if not, the binarization device of the picture slides the window along the preset trajectory;
the binarization device of the picture returns to the step of judging whether the pixels inside the window and the adjacent pixels outside the window belong to a continuous pattern.
7. The method according to claim 6, wherein the binarization device of the picture locally binarizing the pixels inside the window comprises:
the binarization device of the picture obtains color distribution statistics of the pixels inside the window;
the binarization device of the picture sets a threshold according to the statistics, the threshold being used to distinguish the foreground of the picture to be processed from the background;
the binarization device of the picture binarizes the pixels inside the window according to the threshold.
8. The method according to claim 5, wherein the binarization method based on color value statistics comprises:
the binarization device of the picture obtains color distribution statistics of the pixels of the picture to be processed;
the binarization device of the picture obtains two target colors from the color distribution statistics using a preset color clustering algorithm;
the binarization device of the picture sets a foreground color and a background color according to the two target colors;
the binarization device of the picture calculates, for each pixel of the picture to be processed in turn, a first distance and a second distance, and judges the attribution of the pixel according to the calculation results, the first distance being the Euclidean distance between the color of the pixel and the foreground color, and the second distance being the Euclidean distance between the color of the pixel and the background color;
the binarization device of the picture binarizes the pixels in the picture to be processed according to the judgment results.
9. The method according to claim 8, wherein the binarization device of the picture calculating, for each pixel of the picture to be processed in turn, the first distance and the second distance and judging the attribution of the pixel according to the calculation results comprises:
if the first distance is smaller than the second distance, the binarization device of the picture judges that the pixel belongs to the foreground;
if the first distance is greater than the second distance, the binarization device of the picture judges that the pixel belongs to the background.
10. A picture binarization device, wherein the device comprises:
    a to-be-processed picture acquisition module, configured to acquire a picture to be processed;
    a processing result obtaining module, configured to perform independent binarization processing on the picture to be processed using a plurality of preset binarization methods, each binarization method yielding one processing result;
    a processing result set obtaining module, configured to obtain a processing result set from the processing results;
    a text confidence calculation module, configured to calculate a text confidence for each processing result in the processing result set;
    a binarization result obtaining module, configured to select the processing result with the highest text confidence as the binarization result of the picture to be processed.
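The device of claim 10 can be sketched in a few lines of Python. The binarization methods and the confidence function (in practice an OCR engine's output) are placeholders supplied by the caller; the names are illustrative, not from the patent.

```python
def select_best_binarization(picture, methods, confidence_fn):
    """Run each preset binarization method independently on the picture,
    score each result with the text-confidence function, and return the
    result with the highest text confidence."""
    results = [method(picture) for method in methods]  # one result per method
    return max(results, key=confidence_fn)             # highest confidence wins
```

For example, with two stub methods whose outputs score 0.4 and 0.9, the higher-scoring result is returned as the binarization result.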
11. The device according to claim 10, wherein the text confidence calculation module comprises:
    a confidence acquisition unit, configured to acquire the confidence of each character in a processing result;
    a text confidence calculation unit, configured to calculate the text confidence of the processing result according to a preset text confidence algorithm and the confidence of each character.
12. The device according to claim 10, wherein the text confidence calculation unit comprises:
    a weight setting module, configured to set a weight corresponding to each character;
    an average confidence calculation module, configured to calculate a weighted average confidence of the processing result;
    a text confidence obtaining module, configured to use the weighted average confidence as the text confidence.
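The weighted-average text confidence of claims 11 and 12 reduces to a short formula. The uniform default weight and the empty-result fallback below are assumptions; the patent leaves both the weights and the confidence algorithm preset.

```python
def text_confidence(char_confidences, weights=None):
    """Weighted average of per-character confidences: sum(c_i * w_i) / sum(w_i).
    With no weights given, all characters are weighted equally (an assumption)."""
    if not char_confidences:
        return 0.0  # fallback for a result containing no characters
    if weights is None:
        weights = [1.0] * len(char_confidences)
    return sum(c * w for c, w in zip(char_confidences, weights)) / sum(weights)
```

For instance, confidences [0.8, 0.6] with weights [1, 3] give (0.8 + 1.8) / 4 = 0.65.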
13. The device according to claim 10, wherein the processing result obtaining module comprises:
    a sliding window binarization unit, configured to binarize the picture to be processed using a sliding-window-based binarization method;
    a color value statistics binarization unit, configured to binarize the picture to be processed using a binarization method based on color value statistics.
14. The device according to claim 13, wherein the sliding window binarization unit comprises:
    a window setting module, configured to place a window at a preset position of the picture to be processed;
    a first judging module, configured to judge whether the pixels inside the window and the related pixels belong to a continuous pattern, the related pixels being pixels outside the window that are adjacent to the window;
    a local binarization module, configured to perform local binarization on the pixels inside the window;
    a second judging module, configured to judge whether sliding the window has reached the end of the preset trajectory;
    a moving module, configured to move the window along the preset trajectory.
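The modules of claim 14 can be sketched as one loop: place the window, binarize locally, and move it along a preset row-major trajectory until the end. The local mean threshold stands in for the unspecified local binarization, and the first judging module's continuity check against adjacent outside pixels is omitted for brevity; all names are assumptions, not from the patent.

```python
def sliding_window_binarize(gray, window=4):
    """gray: 2D list of 0-255 intensities. Returns 0/1 labels (1 = dark)."""
    height, width = len(gray), len(gray[0])
    out = [[0] * width for _ in range(height)]
    y = x = 0  # window setting module: preset starting position
    while True:
        ys = range(y, min(y + window, height))
        xs = range(x, min(x + window, width))
        values = [gray[i][j] for i in ys for j in xs]
        threshold = sum(values) / len(values)  # local binarization module
        for i in ys:
            for j in xs:
                out[i][j] = 1 if gray[i][j] < threshold else 0
        # moving module / second judging module: advance along the row-major
        # trajectory and stop when the end of the trajectory is reached.
        x += window
        if x >= width:
            x, y = 0, y + window
            if y >= height:
                break
    return out
```

On a 2x2 checkerboard with a 2x2 window, the two dark pixels fall below the local mean and are labeled 1.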
15. The device according to claim 13, wherein the color value statistics binarization unit comprises:
    a statistical result obtaining module, configured to obtain a color distribution statistic of the pixels of the picture to be processed;
    a target color obtaining module, configured to obtain two target colors by applying a preset color clustering algorithm to the color distribution statistic;
    a setting module, configured to set a foreground color and a background color according to the two target colors;
    a determination module, configured to sequentially calculate a first distance and a second distance for each pixel of the picture to be processed and determine the attribution of the pixel according to the calculation result, wherein the first distance is the Euclidean distance between the color of the pixel and the foreground color, and the second distance is the Euclidean distance between the color of the pixel and the background color;
    a binarization module, configured to binarize the pixels in the picture to be processed according to the determination result.
16. A picture binarization terminal, wherein the terminal comprises the picture binarization device according to any one of claims 10 to 15.
17. A computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 9.
18. A computer program product comprising instructions which, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 9.
19. A picture binarization device, wherein the device comprises:
    a transceiver, a processor, and a bus;
    the transceiver being connected to the processor via the bus;
    the processor performing the following steps:
    acquiring a picture to be processed;
    performing independent binarization processing on the picture to be processed using a plurality of preset binarization methods, each binarization method yielding one processing result;
    obtaining a processing result set from the processing results;
    calculating a text confidence for each processing result in the processing result set;
    selecting the processing result with the highest text confidence as the binarization result of the picture to be processed.
PCT/CN2018/072047 2017-01-17 2018-01-10 Image thresholding method and device, and terminal WO2018133717A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710031170.XA CN106874906B (en) 2017-01-17 2017-01-17 Image binarization method and device and terminal
CN201710031170.X 2017-01-17

Publications (1)

Publication Number Publication Date
WO2018133717A1 true WO2018133717A1 (en) 2018-07-26

Family

ID=59157628

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/072047 WO2018133717A1 (en) 2017-01-17 2018-01-10 Image thresholding method and device, and terminal

Country Status (2)

Country Link
CN (1) CN106874906B (en)
WO (1) WO2018133717A1 (en)


Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106874906B (en) * 2017-01-17 2023-02-28 腾讯科技(上海)有限公司 Image binarization method and device and terminal
CN108255298B (en) * 2017-12-29 2021-02-19 安徽慧视金瞳科技有限公司 Infrared gesture recognition method and device in projection interaction system
CN110361625B (en) * 2019-07-23 2022-01-28 中南大学 Method for diagnosing open-circuit fault of inverter and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0780782A2 (en) * 1995-12-22 1997-06-25 Canon Kabushiki Kaisha Separation of touching characters in optical character recognition
CN102193918A (en) * 2010-03-01 2011-09-21 汉王科技股份有限公司 Video retrieval method and device
CN102779276A (en) * 2011-05-09 2012-11-14 汉王科技股份有限公司 Text image recognition method and device
CN104008384A (en) * 2013-02-26 2014-08-27 山东新北洋信息技术股份有限公司 Character identification method and character identification apparatus
CN204537126U (en) * 2015-04-18 2015-08-05 王学庆 A kind of image text identification translation glasses
US20160142405A1 (en) * 2014-11-17 2016-05-19 International Business Machines Corporation Authenticating a device based on availability of other authentication methods
CN106874906A (en) * 2017-01-17 2017-06-20 腾讯科技(上海)有限公司 A kind of binarization method of picture, device and terminal

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0799533B2 (en) * 1986-05-16 1995-10-25 富士電機株式会社 Binarization device
JP2756308B2 (en) * 1989-06-30 1998-05-25 キヤノン株式会社 Image processing device
US7734092B2 (en) * 2006-03-07 2010-06-08 Ancestry.Com Operations Inc. Multiple image input for optical character recognition processing systems and methods
US9355293B2 (en) * 2008-12-22 2016-05-31 Canon Kabushiki Kaisha Code detection and decoding system
US9025897B1 (en) * 2013-04-05 2015-05-05 Accusoft Corporation Methods and apparatus for adaptive auto image binarization
CN104200211A (en) * 2014-09-03 2014-12-10 腾讯科技(深圳)有限公司 Image binaryzation method and device
CN104268512B (en) * 2014-09-17 2018-04-27 清华大学 Character identifying method and device in image based on optical character identification
CN105374015A (en) * 2015-10-27 2016-03-02 湖北工业大学 Binary method for low-quality document image based on local contract and estimation of stroke width
CN106096491B (en) * 2016-02-04 2022-04-12 上海市第一人民医院 An automated method for identifying microaneurysms in fundus color photographic images


Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109978890A (en) * 2019-02-25 2019-07-05 平安科技(深圳)有限公司 Target extraction method, device and terminal device based on image procossing
CN109978890B (en) * 2019-02-25 2023-07-07 平安科技(深圳)有限公司 Target extraction method and device based on image processing and terminal equipment
CN109934812A (en) * 2019-03-08 2019-06-25 腾讯科技(深圳)有限公司 Image processing method, device, server and storage medium
CN109934812B (en) * 2019-03-08 2022-12-09 腾讯科技(深圳)有限公司 Image processing method, image processing apparatus, server, and storage medium
US11715203B2 (en) 2019-03-08 2023-08-01 Tencent Technology (Shenzhen) Company Limited Image processing method and apparatus, server, and storage medium
CN110390260A (en) * 2019-06-12 2019-10-29 平安科技(深圳)有限公司 Picture scanning part processing method, device, computer equipment and storage medium
CN110390260B (en) * 2019-06-12 2024-03-22 平安科技(深圳)有限公司 Picture scanning piece processing method and device, computer equipment and storage medium
CN110363785A (en) * 2019-07-15 2019-10-22 腾讯科技(深圳)有限公司 A text super frame detection method and device
CN110827308A (en) * 2019-11-05 2020-02-21 中国医学科学院肿瘤医院 Image processing method, device, electronic device and storage medium
CN112364740A (en) * 2020-10-30 2021-02-12 交控科技股份有限公司 Unmanned machine room monitoring method and system based on computer vision
CN112364740B (en) * 2020-10-30 2024-04-19 交控科技股份有限公司 Unmanned aerial vehicle room monitoring method and system based on computer vision

Also Published As

Publication number Publication date
CN106874906A (en) 2017-06-20
CN106874906B (en) 2023-02-28


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 18741665; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 18741665; Country of ref document: EP; Kind code of ref document: A1

点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载