
CN110135248A - A deep learning-based text detection method in natural scenes - Google Patents

A deep learning-based text detection method in natural scenes

Info

Publication number
CN110135248A
Authority
CN
China
Prior art keywords
text
natural scene
detection
text detection
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910270269.4A
Other languages
Chinese (zh)
Inventor
刘发贵
陈成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT
Priority to CN201910270269.4A
Publication of CN110135248A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a deep learning-based method for detecting text in natural scenes. The method uses a CNN to extract multi-scale text features and then encodes these features with an RNN to make full use of the contextual nature of text. The feature map is then fed into an ROI pooling layer, which outputs a series of text proposals. After non-maximum suppression, a text connector links the resulting proposals, achieving flexible and efficient multi-scale, multi-oriented text detection. The invention improves the precision and recall of natural scene text detection under multi-oriented, variable-scale conditions.

Description

A deep learning-based text detection method in natural scenes

Technical Field

The invention belongs to the technical field of image processing and relates in particular to a deep learning-based method for detecting text in natural scenes.

Background

Scene text detection is an important prerequisite for text recognition and is widely used in image retrieval, machine translation, autonomous driving, and other fields. However, text detection remains difficult under complex backgrounds, multiple scales, multiple languages, uneven illumination, blur, and similar conditions.

Diversity and variability of natural scene text: Compared with text in documents, text in natural scenes may be multi-scale and multilingual, and may vary in shape, orientation, proportion, and color. All of these variations pose challenges for text detection.

Complex backgrounds: Scene text may appear against arbitrary backgrounds, including signal signs, bricks, grass, and fences. Such backgrounds may have features very similar to those of text and can act as noise that interferes with deciding what is text. In addition, occlusion by foreign objects can leave text incomplete, leading to potential detection errors.

Uneven imaging quality: Because image acquisition is uncontrolled, imaging quality cannot be guaranteed. Images used for detection may be distorted or out of focus because of the shooting angle or distance, or may contain noise and shadows because of the lighting at capture time.

Detection methods for natural scene text fall into two categories: traditional methods and deep learning-based methods. Traditional methods include texture-based methods, which use local intensity, filter responses, wavelet coefficients, and so on, and region-based methods, such as the Stroke Width Transform (SWT), Maximally Stable Extremal Regions (MSER), and the Stroke Feature Transform (SFT). In recent years, with the development of deep neural networks, deep learning has shown growing advantages in computer vision. Currently the most popular deep learning methods are those based on Convolutional Neural Networks (CNN). Deep learning has greatly improved the accuracy of text detection and freed practitioners from complex hand-crafted feature design. Deep learning-based natural scene text detection models are usually built on common object detection models such as RCNN, YOLO, and SSD. These models typically extract features with several convolutional and pooling layers and then use fully connected layers to classify and regress the detection boxes.

Summary of the Invention

To detect text in natural scenes more accurately and efficiently, and to address the detection of multi-oriented, variable-scale text in natural scenes, the invention proposes a deep learning-based natural scene text detection method.

The object of the invention is achieved by at least one of the following technical solutions.

A deep learning-based natural scene text detection method comprises the following steps:

(1) Build and train a neural network-based natural scene text detection model, with the following sub-steps:

(1.1) Build a feature extractor based on a Feature Pyramid Network (FPN);

(1.2) Encode the features extracted by the feature extractor with a Recurrent Neural Network (RNN);

(1.3) Use an ROI pooling layer to further improve detection precision;

(1.4) Finally, use fully connected layers to classify and regress the detection boxes, forming the text detection model;

(1.5) Input annotated training images into the model, and compute the loss with a multi-task loss function comprising a classification loss and a regression loss to train the model;

(2) Use the trained natural scene text detection model to detect natural scene text in a given image, with the following sub-steps:

(2.1) Input the image to be detected, run text detection on it with the trained model, and output the scores and coordinates of a series of text proposal detection boxes.

(2.2) Apply non-maximum suppression to the resulting text proposals to remove redundant detection boxes.

(2.3) Use a text connector to link the series of text proposals and generate the final detection result.

Compared with the prior art, the invention has the following advantages and technical effects:

(1) For variable-scale text detection, the invention uses a Feature Pyramid Network (FPN), which efficiently exploits information from convolutional layers of different sizes at the same time. Compared with methods that use only the last feature map, it combines the strong semantic information of the high layers with the high-resolution information of the low layers, achieving higher recall and precision. Compared with image pyramid-based methods, it greatly reduces the amount of computation.

(2) For multi-oriented text detection, the method outputs a series of text proposals and then links them with a text connector. Compared with methods that use arbitrary quadrilaterals or rotated rectangles, it uses fewer parameters, making the detection of multi-oriented text more flexible and efficient.

Description of Drawings

FIG. 1 is a flowchart of natural scene text detection in the embodiment.

FIG. 2 is an architecture diagram of the natural scene text detection model used in the embodiment.

FIG. 3 shows actual detection results obtained in different scenes with the text detection method of the invention in the embodiment.

Detailed Description

To make the technical solutions and advantages of the invention clearer, a further detailed description is given below with reference to the accompanying drawings; the implementation and protection of the invention are not limited thereto.

First, the terms used in the invention are explained:

Feature Pyramid Network (FPN): FPN modifies the original backbone network directly. The feature map at each resolution is summed element-wise with the feature map of the next resolution after that map has been scaled by a factor of two. Through these connections, the feature map used for prediction at each level fuses features of different resolutions and different semantic strengths, and each fused feature map is used to detect objects of the corresponding size. This ensures that every level has both an appropriate resolution and strong semantic features.
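The element-wise fusion described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: all levels are assumed to already share the same channel count, nearest-neighbour interpolation is assumed for the 2x upsampling, and the names `upsample2x` and `top_down_merge` are hypothetical. A real FPN also applies convolutions before and after the sum, which are omitted here.

```python
import numpy as np

def upsample2x(feature_map: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map (assumed interpolation)."""
    return feature_map.repeat(2, axis=1).repeat(2, axis=2)

def top_down_merge(levels):
    """Fuse feature maps ordered from finest to coarsest resolution.

    Each level is summed element-wise with the already-fused coarser level
    after that coarser level has been upsampled by a factor of two.
    """
    merged = [None] * len(levels)
    merged[-1] = levels[-1]  # the coarsest level has nothing coarser to fuse with
    for idx in range(len(levels) - 2, -1, -1):
        merged[idx] = levels[idx] + upsample2x(merged[idx + 1])
    return merged

# Toy input: four levels (standing in for P2..P5) with 4 channels each,
# spatial sizes 32, 16, 8 and 4, filled with ones.
levels = [np.ones((4, 32 // 2**i, 32 // 2**i)) for i in range(4)]
pyramid = top_down_merge(levels)
```

In this toy input every fused level keeps its own resolution, while the finest level accumulates contributions from all coarser levels.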

Residual Network (ResNet): a deep convolutional network model proposed by Kaiming He in 2015. Variants are named according to the number of layers they use: ResNet-34, ResNet-50, ResNet-101, ResNet-152, and so on.

Non-Maximum Suppression (NMS): suppresses elements that are not maxima and can be understood as a local maximum search. Each output detection box has a score, and boxes may contain or overlap one another. NMS selects the highest-scoring box in a neighborhood and suppresses the lower-scoring ones.

As shown in FIG. 1, the deep learning-based natural scene text detection method of the invention includes the following steps:

(1) Build and train a neural network-based natural scene text detection model, as shown in FIG. 2, with the following sub-steps:

(1.1) Build a feature extractor based on a Feature Pyramid Network (FPN). ResNet-101 is used as the backbone network to generate the feature pyramid, and the features of levels P2 through P5 are used.

(1.2) Encode the extracted features with a Recurrent Neural Network (RNN). A bidirectional long short-term memory network (Bi-directional Long Short-Term Memory, Bi-LSTM) with 512 hidden layers is used as the RNN to encode the extracted features.

(1.3) Use an ROI pooling layer to further improve detection precision. ROI pooling works as follows:

(1.3.1) Based on the input image, map the ROI to the corresponding position on the feature map;

(1.3.2) Divide the mapped region into parts of equal size, with the number of parts equal to the output dimension;

(1.3.3) Apply max pooling to each part.
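Sub-steps (1.3.1) to (1.3.3) can be sketched as follows. This is a minimal sketch, not the patent's implementation: the feature-map stride of 16, the 2x2 output size, and the function name `roi_max_pool` are assumptions chosen for illustration, and the ROI is assumed to lie inside the image.

```python
import numpy as np

def roi_max_pool(feature_map, roi, output_size=(2, 2), stride=16):
    """ROI max pooling on a (C, H, W) feature map.

    roi is (x1, y1, x2, y2) in input-image pixels; stride is the assumed
    ratio between image size and feature-map size.
    """
    channels, height, width = feature_map.shape
    out_h, out_w = output_size

    # (1.3.1) Map the ROI from image coordinates to feature-map cells.
    x1 = min(int(roi[0] // stride), width - 1)
    y1 = min(int(roi[1] // stride), height - 1)
    x2 = min(max(int(np.ceil(roi[2] / stride)), x1 + 1), width)
    y2 = min(max(int(np.ceil(roi[3] / stride)), y1 + 1), height)
    region = feature_map[:, y1:y2, x1:x2]

    # (1.3.2) Split the mapped region into out_h x out_w roughly equal parts.
    row_edges = np.linspace(0, region.shape[1], out_h + 1).astype(int)
    col_edges = np.linspace(0, region.shape[2], out_w + 1).astype(int)

    # (1.3.3) Take the maximum of each part, per channel.
    pooled = np.zeros((channels, out_h, out_w), dtype=feature_map.dtype)
    for r in range(out_h):
        for c in range(out_w):
            r0, r1 = row_edges[r], max(row_edges[r + 1], row_edges[r] + 1)
            c0, c1 = col_edges[c], max(col_edges[c + 1], col_edges[c] + 1)
            pooled[:, r, c] = region[:, r0:r1, c0:c1].max(axis=(1, 2))
    return pooled

# Toy example: one channel, 8x8 feature map, ROI covering the top-left 64x64 pixels.
feature_map = np.arange(64, dtype=float).reshape(1, 8, 8)
pooled = roi_max_pool(feature_map, roi=(0, 0, 64, 64))
```

Whatever the size of the ROI, the output has a fixed shape, which is what lets the fully connected layers that follow accept it.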

(1.4) Finally, use fully connected layers to classify and regress the detection boxes. The ROI-pooled features pass through two fully connected layers, one for classification and one for regression. If the number of output detection boxes is k, the classification layer outputs 2k values, corresponding to text and background, and the regression layer outputs 4k values, corresponding to the top-left and bottom-right corner coordinates of each detection box.
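The output dimensions 2k and 4k can be made concrete with a small shape-only sketch. The weights below are random and untrained, the softmax over the two class scores is an assumption, and `detection_heads` is a hypothetical name. The sketch shows only how a flat feature vector becomes k class-score pairs and k coordinate quadruples.

```python
import numpy as np

def detection_heads(pooled_features, k, rng):
    """Apply two (untrained, illustrative) fully connected layers.

    Returns class probabilities of shape (k, 2) and box coordinates of
    shape (k, 4), i.e. 2k and 4k output values respectively.
    """
    flat = pooled_features.reshape(-1)
    w_cls = rng.normal(scale=0.01, size=(flat.size, 2 * k))  # classification layer
    w_reg = rng.normal(scale=0.01, size=(flat.size, 4 * k))  # regression layer

    logits = (flat @ w_cls).reshape(k, 2)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    scores = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

    boxes = (flat @ w_reg).reshape(k, 4)  # (x1, y1, x2, y2) per detection box
    return scores, boxes

rng = np.random.default_rng(0)
scores, boxes = detection_heads(np.ones((1, 2, 2)), k=3, rng=rng)
```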

(1.5) Input the annotated training images to train the model. Training images may be annotated with quadrilaterals or with rectangles. Before being fed to the model, each annotation must be split into slices of a given width. If the annotation is a quadrilateral, the minimum bounding rectangle of each slice is taken; if it is a rectangle, it is split directly.
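The splitting of annotations into fixed-width slices might look like the sketch below. Several points are assumptions: a slice width of 16 pixels, the "minimum bounding rectangle" read as the axis-aligned bounding box of the part of the quadrilateral inside each vertical strip, and the hypothetical names `split_rectangle`, `split_quadrilateral`, and `edge_y_at`.

```python
def split_rectangle(box, slice_width=16):
    """Split a rectangle (x1, y1, x2, y2) into vertical slices of a given width."""
    x1, y1, x2, y2 = box
    slices = []
    x = x1
    while x < x2:
        slices.append((x, y1, min(x + slice_width, x2), y2))
        x += slice_width
    return slices

def edge_y_at(p, q, x):
    """y where the segment p-q crosses the vertical line at x, or None if it does not."""
    (px, py), (qx, qy) = p, q
    if px == qx or not (min(px, qx) <= x <= max(px, qx)):
        return None
    return py + (qy - py) * (x - px) / (qx - px)

def split_quadrilateral(quad, slice_width=16):
    """Split a quadrilateral (four (x, y) vertices in order) into vertical strips
    and return the axis-aligned bounding rectangle of each strip's content."""
    xs = [p[0] for p in quad]
    left, right = min(xs), max(xs)
    slices = []
    x = left
    while x < right:
        x_end = min(x + slice_width, right)
        # Vertices inside the strip plus edge crossings at both strip borders
        # bound the clipped shape from above and below.
        ys = [py for px, py in quad if x <= px <= x_end]
        for i in range(len(quad)):
            p, q = quad[i], quad[(i + 1) % len(quad)]
            for border in (x, x_end):
                y = edge_y_at(p, q, border)
                if y is not None:
                    ys.append(y)
        slices.append((x, min(ys), x_end, max(ys)))
        x += slice_width
    return slices

rect_slices = split_rectangle((0, 0, 40, 10))
quad_slices = split_quadrilateral([(0, 0), (32, 16), (32, 26), (0, 10)])
```

For the slanted quadrilateral in the example, each slice is a small upright rectangle, and consecutive slices step downward to follow the slope of the text.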

Design a multi-task loss function that includes a classification loss and a regression loss, and use it to compute the loss:

where L, L_cls, and L_reg are the total loss, the classification loss, and the regression loss, respectively, and λ is the weight coefficient that balances the classification loss against the regression loss. p_i is the predicted class of the i-th detection box and is compared with that box's ground-truth class. t_i is the predicted coordinates of the i-th detection box and is compared with that box's ground-truth coordinates.
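As an illustration of how such a multi-task loss combines its two terms, the sketch below computes L = L_cls + λ·L_reg. The concrete choices are assumptions borrowed from Faster R-CNN-style detectors rather than taken from this text: softmax cross-entropy for L_cls, smooth L1 for L_reg, regression counted only for boxes whose ground-truth class is text, and averaging as the normalization. All function names are hypothetical.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Per-box cross-entropy between class logits (N, 2) and integer labels (N,)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]

def smooth_l1(pred, target):
    """Per-box smooth L1 distance between predicted and true coordinates (N, 4)."""
    diff = np.abs(pred - target)
    return np.where(diff < 1.0, 0.5 * diff**2, diff - 0.5).sum(axis=1)

def multitask_loss(cls_logits, true_labels, pred_boxes, true_boxes, lam=1.0):
    """Return (L, L_cls, L_reg) with L = L_cls + lam * L_reg."""
    l_cls = softmax_cross_entropy(cls_logits, true_labels).mean()
    positives = true_labels == 1                  # assumed: label 1 means text
    n_pos = max(int(positives.sum()), 1)
    l_reg = (smooth_l1(pred_boxes, true_boxes) * positives).sum() / n_pos
    return l_cls + lam * l_reg, l_cls, l_reg

# Two boxes: one text (label 1), one background (label 0), uninformative logits.
cls_logits = np.zeros((2, 2))
true_labels = np.array([1, 0])
true_boxes = np.array([[0.0, 0.0, 16.0, 10.0], [0.0, 0.0, 0.0, 0.0]])
pred_boxes = true_boxes + np.array([[2.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]])
total, l_cls, l_reg = multitask_loss(cls_logits, true_labels, pred_boxes, true_boxes, lam=2.0)
```

Raising `lam` makes coordinate errors weigh more heavily relative to classification errors, which is the balancing role the text assigns to λ.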

(2) Use the trained natural scene text detection model to detect natural scene text in a given image, with the following sub-steps:

(2.1) Input the image to be detected, run text detection on it with the trained model, and output the scores and coordinates of a series of text proposal detection boxes.

(2.2) Apply non-maximum suppression to the resulting text proposals to remove redundant detection boxes. The procedure is as follows:

Given the list B of text proposal detection boxes and their corresponding scores S, proceed as follows. Select the detection box M with the highest score, remove it from B, and add it to the final detection result D. Then remove from B every remaining box whose IoU with M is greater than the threshold. Repeat this process until B is empty.
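The procedure above translates almost line for line into code. The following sketch keeps the names B, M, and D in the comments. The IoU threshold of 0.5 and the (x1, y1, x2, y2) box format are assumptions, since the text does not fix them.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes kept by non-maximum suppression."""
    # B: indices of candidate boxes, highest score first.
    remaining = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []  # D: final detection result
    while remaining:
        best = remaining.pop(0)  # M: highest-scoring box still in B
        kept.append(best)
        # Drop every remaining box that overlaps M by more than the threshold.
        remaining = [i for i in remaining if iou(boxes[best], boxes[i]) <= iou_threshold]
    return kept

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

In the example the second box overlaps the first with an IoU of 81/119, about 0.68, so it is suppressed, while the distant third box survives.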

(2.3) Use the text connector to link the series of text proposals and generate the final detection result. Proposals are linked as follows:

If proposal P_j and proposal P_i (where i and j denote different proposals) satisfy the following two conditions, P_j is defined as a neighbor of P_i:

(1) P_j is the proposal closest to P_i, and the distance between them is less than w_j + w_i;

(2) P_j and P_i have a vertical overlap greater than 0.5,

where w_i and w_j are the widths of proposals P_i and P_j, respectively. If P_i is a neighbor of P_j and P_j is a neighbor of P_i, the two proposals are linked into the same detection box. These steps are repeated until all proposals have been linked, and the resulting detection boxes are the final output. FIG. 2 and FIG. 3 show the detection results of the invention in natural scenes; the invention detects variable-scale, multi-oriented text in natural scenes well.
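One way to read the linking rule is sketched below. The text leaves several details open, so the following are assumptions: distance is measured between horizontal box centers, vertical overlap is the intersection over union of the two vertical extents, the "closest" proposal is chosen among those that already meet both conditions, and two linked proposals are replaced by their axis-aligned bounding box before the steps are repeated. The function names are hypothetical.

```python
def vertical_overlap(a, b):
    """Overlap ratio of the vertical extents of two boxes (x1, y1, x2, y2)."""
    inter = min(a[3], b[3]) - max(a[1], b[1])
    union = max(a[3], b[3]) - min(a[1], b[1])
    return max(inter, 0.0) / union if union > 0 else 0.0

def center_distance(a, b):
    """Horizontal distance between box centers (assumed distance measure)."""
    return abs((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2)

def neighbor_of(i, boxes):
    """Index of the neighbor of proposal i, or None if no proposal qualifies."""
    best, best_dist = None, float("inf")
    for j, other in enumerate(boxes):
        if j == i:
            continue
        dist = center_distance(boxes[i], other)
        width_sum = (boxes[i][2] - boxes[i][0]) + (other[2] - other[0])
        if dist < width_sum and vertical_overlap(boxes[i], other) > 0.5 and dist < best_dist:
            best, best_dist = j, dist
    return best

def connect_proposals(proposals):
    """Repeatedly link mutual neighbors until no more links can be made."""
    boxes = [tuple(p) for p in proposals]
    linked = True
    while linked:
        linked = False
        for i in range(len(boxes)):
            j = neighbor_of(i, boxes)
            if j is not None and neighbor_of(j, boxes) == i:
                a, b = boxes[i], boxes[j]
                joined = (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))
                boxes = [box for k, box in enumerate(boxes) if k not in (i, j)] + [joined]
                linked = True
                break
    return boxes

proposals = [(0, 0, 16, 10), (16, 1, 32, 11), (32, 0, 48, 10), (200, 100, 216, 110)]
lines = connect_proposals(proposals)
```

The three adjacent proposals end up in a single box, while the distant one stays on its own. A connector aimed at multi-oriented text would presumably output an oriented outline rather than the upright bounding box used in this sketch.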

Claims (8)

1. A deep learning-based natural scene text detection method, characterized by comprising the following steps:
(1) building and training a neural network-based natural scene text detection model, comprising:
(1.1) building a feature extractor based on a Feature Pyramid Network (FPN);
(1.2) encoding the features extracted by the feature extractor with a Recurrent Neural Network (RNN);
(1.3) using an ROI pooling layer to further improve detection precision;
(1.4) finally using fully connected layers to classify and regress the detection boxes, forming the text detection model;
(1.5) inputting annotated training images into the model;
(1.6) computing the loss value with a multi-task loss function comprising a classification loss and a regression loss to train the model;
(2) detecting natural scene text in a given image with the trained natural scene text detection model, comprising the following sub-steps:
(2.1) inputting the image to be detected, performing text detection on the given image with the trained natural scene text detection model, and outputting the scores and coordinates of a series of text proposal detection boxes;
(2.2) applying non-maximum suppression to the resulting text proposals to remove redundant detection boxes;
(2.3) linking the series of text proposals with a text connector to generate the final detection result.
2. The deep learning-based natural scene text detection method according to claim 1, characterized in that, in building the neural network-based natural scene text detection model, only levels P2 through P5 of the Feature Pyramid Network (FPN) are used.
3. The deep learning-based natural scene text detection method according to claim 1, characterized in that, in building the neural network-based natural scene text detection model, the Feature Pyramid Network (FPN) uses ResNet-101 as its backbone network.
4. The deep learning-based natural scene text detection method according to claim 1, characterized in that, in building the neural network-based natural scene text detection model, a bidirectional long short-term memory network (Bi-directional Long Short-Term Memory, Bi-LSTM) with 512 hidden layers is used as the Recurrent Neural Network (RNN) to encode the extracted features.
5. The deep learning-based natural scene text detection method according to claim 1, characterized in that, in building the neural network-based natural scene text detection model, the loss is computed with the following loss function:
where L, L_cls, and L_reg are the total loss, the classification loss, and the regression loss, respectively, λ is the weight coefficient balancing the classification loss and the regression loss, and the loss also depends on the ground-truth class of the i-th detection box.
6. The deep learning-based natural scene text detection method according to claim 5, characterized in that the classification loss is defined as follows:
where p_i is the predicted class of the i-th detection box, which is compared with the ground-truth class of the i-th detection box.
7. The deep learning-based natural scene text detection method according to claim 5, characterized in that the regression loss is defined as follows:
where t_i is the predicted coordinates of the i-th detection box, which are compared with the ground-truth coordinates of the i-th detection box.
8. The deep learning-based natural scene text detection method according to claim 1, characterized in that, when detecting natural scene text in the given image, the text proposals are linked by the following steps:
if proposal P_j and proposal P_i satisfy the following two conditions, P_j is defined as a neighbor of P_i:
(1) P_j is the proposal closest to P_i, and the distance between them is less than w_j + w_i;
(2) P_j and P_i have a vertical overlap greater than 0.5;
where w_i and w_j are the widths of proposals P_i and P_j, respectively; if P_i is a neighbor of P_j and P_j is a neighbor of P_i, the two proposals are linked into the same detection box.
CN201910270269.4A 2019-04-03 2019-04-03 A deep learning-based text detection method in natural scenes Pending CN110135248A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910270269.4A CN110135248A (en) 2019-04-03 2019-04-03 A deep learning-based text detection method in natural scenes

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910270269.4A CN110135248A (en) 2019-04-03 2019-04-03 A deep learning-based text detection method in natural scenes

Publications (1)

Publication Number Publication Date
CN110135248A true CN110135248A (en) 2019-08-16

Family

ID=67569376

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910270269.4A Pending CN110135248A (en) 2019-04-03 2019-04-03 A deep learning-based text detection method in natural scenes

Country Status (1)

Country Link
CN (1) CN110135248A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110766020A (en) * 2019-10-30 2020-02-07 哈尔滨工业大学 System and method for detecting and identifying multi-language natural scene text
CN110807422A (en) * 2019-10-31 2020-02-18 华南理工大学 A deep learning-based text detection method in natural scenes
CN111126389A (en) * 2019-12-20 2020-05-08 腾讯科技(深圳)有限公司 Text detection method, device, electronic device and storage medium
CN111753714A (en) * 2020-06-23 2020-10-09 中南大学 A multi-directional natural scene text detection method based on character segmentation
WO2020221298A1 (en) * 2019-04-30 2020-11-05 北京金山云网络技术有限公司 Text detection model training method and apparatus, text region determination method and apparatus, and text content determination method and apparatus
CN113591829A (en) * 2021-05-25 2021-11-02 上海一谈网络科技有限公司 Character recognition method, device, equipment and storage medium

Citations (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101436299A (en) * 2008-11-19 2009-05-20 哈尔滨工业大学 Method for detecting natural scene image words
CN103942550A (en) * 2014-05-04 2014-07-23 厦门大学 Scene text recognition method based on sparse coding characteristics
CN104537362A (en) * 2015-01-16 2015-04-22 中国科学院自动化研究所 Domain-based self-adaptive English scene character recognition method
CN105740909A (en) * 2016-02-02 2016-07-06 华中科技大学 Text recognition method under natural scene on the basis of spatial transformation
CN106650725A (en) * 2016-11-29 2017-05-10 华南理工大学 Full convolutional neural network-based candidate text box generation and text detection method
CN107122342A (en) * 2017-04-21 2017-09-01 东莞中国科学院云计算产业技术创新与育成中心 Text encoding recognition method and device
CN107203606A (en) * 2017-05-17 2017-09-26 西北工业大学 Text detection and recognition methods under natural scene based on convolutional neural networks
CN107402947A (en) * 2017-03-29 2017-11-28 北京粉笔未来科技有限公司 Picture retrieval method for establishing model and device, picture retrieval method and device
CN107622267A (en) * 2017-10-16 2018-01-23 天津师范大学 A Scene Text Recognition Method Based on Embedded Bilateral Convolutional Activation
CN108288088A (en) * 2018-01-17 2018-07-17 浙江大学 A kind of scene text detection method based on end-to-end full convolutional neural networks
CN108304835A (en) * 2018-01-30 2018-07-20 百度在线网络技术(北京)有限公司 character detecting method and device
CN108399419A (en) * 2018-01-25 2018-08-14 华南理工大学 Chinese text recognition methods in natural scene image based on two-dimentional Recursive Networks
CN108427665A (en) * 2018-03-15 2018-08-21 广州大学 A kind of text automatic generation method based on LSTM type RNN models
CN108549893A (en) * 2018-04-04 2018-09-18 华中科技大学 A kind of end-to-end recognition methods of the scene text of arbitrary shape
CN108573257A (en) * 2017-03-14 2018-09-25 奥多比公司 Automatic Image Segmentation Based on Natural Language Phrases
CN108764133A (en) * 2018-05-25 2018-11-06 北京旷视科技有限公司 Image-recognizing method, apparatus and system
US20180365560A1 (en) * 2017-06-19 2018-12-20 International Business Machines Corporation Context aware sensitive information detection
CN109299274A (en) * 2018-11-07 2019-02-01 南京大学 A natural scene text detection method based on fully convolutional neural network
CN109344824A (en) * 2018-09-21 2019-02-15 泰康保险集团股份有限公司 A kind of line of text method for detecting area, device, medium and electronic equipment
CN109492638A (en) * 2018-11-07 2019-03-19 北京旷视科技有限公司 Method for text detection, device and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YU SONG ETC.: "Scene Text Detection via Deep Semantic Feature Fusion and Attention-based Refinement", 《2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION(ICPR)》 *
蔡华杰等: "基于 WT-BTC 特征和 SVM 组合分类的场景文本检测", 《科学技术与工程》 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020221298A1 (en) * 2019-04-30 2020-11-05 北京金山云网络技术有限公司 Text detection model training method and apparatus, text region determination method and apparatus, and text content determination method and apparatus
CN110766020A (en) * 2019-10-30 2020-02-07 哈尔滨工业大学 System and method for detecting and identifying multi-language natural scene text
CN110807422A (en) * 2019-10-31 2020-02-18 华南理工大学 A deep learning-based text detection method in natural scenes
CN110807422B (en) * 2019-10-31 2023-05-23 华南理工大学 A method of natural scene text detection based on deep learning
CN111126389A (en) * 2019-12-20 2020-05-08 腾讯科技(深圳)有限公司 Text detection method, device, electronic device and storage medium
CN111753714A (en) * 2020-06-23 2020-10-09 中南大学 A multi-directional natural scene text detection method based on character segmentation
CN111753714B (en) * 2020-06-23 2023-09-01 中南大学 Multidirectional natural scene text detection method based on character segmentation
CN113591829A (en) * 2021-05-25 2021-11-02 上海一谈网络科技有限公司 Character recognition method, device, equipment and storage medium
CN113591829B (en) * 2021-05-25 2024-02-13 上海一谈网络科技有限公司 Character recognition method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109299274B (en) Natural scene text detection method based on full convolution neural network
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN113065558B (en) Lightweight small target detection method combined with attention mechanism
Song et al. Region-based quality estimation network for large-scale person re-identification
Tang et al. Pixel convolutional neural network for multi-focus image fusion
CN109615611B (en) Inspection image-based insulator self-explosion defect detection method
CN110428428B (en) An image semantic segmentation method, electronic device and readable storage medium
Li et al. High-resolution concrete damage image synthesis using conditional generative adversarial network
CN110135248A (en) A deep learning-based text detection method in natural scenes
CN110689599B (en) 3D visual saliency prediction method based on a non-local enhanced generative adversarial network
CN111768388B (en) A product surface defect detection method and system based on positive sample reference
CN110807422A (en) A deep learning-based text detection method in natural scenes
CN109002807A (en) A driving-scene vehicle detection method based on an SSD neural network
CN108509978A (en) CNN-based multi-class object detection method and model with multi-level feature fusion
CN108428229A (en) A lung texture recognition method based on deep neural network to extract appearance and geometric features
CN106504233A (en) Method and system for recognizing small electric power components in UAV inspection images based on Faster R-CNN
CN112365462B (en) An Image-Based Change Detection Method
CN108564012B (en) Pedestrian analysis method based on human body feature distribution
CN114494164A (en) A steel surface defect detection method, device and computer storage medium
CN111723687A (en) Human action recognition method and device based on neural network
CN115830004A (en) Surface defect detection method, device, computer equipment and storage medium
CN111462140B (en) Real-time image instance segmentation method based on block stitching
CN118379589A (en) Photovoltaic panel abnormal state detection method based on multi-mode fusion and related equipment
CN116433596A (en) Slope vegetation coverage measuring method and device and related components
CN112200789B (en) Image recognition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190816