CN109977970A - Person recognition method for complex hydraulic-engineering scenes based on image saliency detection - Google Patents
Person recognition method for complex hydraulic-engineering scenes based on image saliency detection
- Publication number
- CN109977970A (application CN201910240747.7A)
- Authority
- CN
- China
- Prior art keywords
- hydraulic engineering
- person recognition
- saliency
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a person recognition method for complex hydraulic-engineering scenes based on image saliency detection. Addressing the loss of regional semantic information that occurs in some saliency (person) detection models, a new strongly supervised saliency detection method is proposed. The model has two layers. The first layer mainly uses a multi-level fully convolutional neural network to capture, at the pixel level, the global semantic information and local feature information of salient persons, and marks out a coarse salient person. The second layer uses short connections to bring in the shallow-level saliency features generated during the first layer's operation and fuses them with the coarse saliency map, so as to recover the lost feature information and reinforce the boundary features of salient objects. For data input, a highlighted version and a de-highlighted version of the original image are chosen as simultaneous inputs. The designed model performs excellently on a dataset of person images from complex hydraulic-engineering scenes collected at random from the web.
Description
Technical field
The present invention relates to the field of computer vision, and in particular to saliency detection of persons in complex hydraulic-engineering scenes.
Background technique
Against the historical background of increasingly important flood-control informatization, during weather events such as typhoons and spring tides, preventing people from illegally entering hydraulic-engineering facilities (seawalls, reservoirs, dykes, etc.) and discovering such intrusions in time is an especially urgent supervision task. Manual inspection of video monitoring is widely deployed, but its efficiency and cost leave room for improvement. If an automatic detection approach lets machines assist the staff, person-oriented object detection in complex hydraulic-engineering scenes becomes very important, and this demand can be met with person saliency detection techniques. Although some techniques on the market can detect persons in certain scenes, they have the following shortcomings: (1) the scene cannot be too complex; once a scene contains many complex elements, large non-salient objects with high contrast easily cause detection to fail; (2) the contours of detected persons are unclear and sometimes very blurred, losing part of the global semantic information; (3) existing saliency detection models cannot effectively exclude elements of the complex background such as glints on the water surface or high-contrast hills; (4) existing saliency detection models cannot effectively remove or identify low-contrast impurity elements close to the person in images shot in hydraulic-engineering scenes; (5) when existing saliency detection models face real conditions, their results deviate from reality.
Observation of a large number of person images from complex hydraulic-engineering scenes shows that these images fall into three classes: (1) small objects, i.e. the salient person occupies less than 10% of the full image area; such images account for 80% of all detection images; (2) large objects, i.e. the person occupies more than 50% of the full image; such images are few; (3) complex backgrounds, i.e. the shot contains not only the person subject but also salient objects such as riverside dykes, distant hills and riverbank junctions. How to detect persons in images, and in particular the difficult problem of small-object detection, is a problem in urgent need of a solution.
Summary of the invention
In order to overcome the above deficiencies of the prior art, the present invention provides a person recognition method for complex hydraulic-engineering scenes based on image saliency detection. Addressing the inability of most current person detection methods to effectively fuse global semantic information with local feature information, and their inability to effectively detect noise present in images, a deep semantic-information fusion model based on multi-level short connections is proposed. It can effectively use local feature information to reduce salient-object detection failures caused by the loss of global semantic information during detection, while enhancing the marking of salient objects and effectively removing noise belonging to non-salient objects and large non-salient objects.
The technical solution adopted by the invention is: acquire the global semantic information and local feature information of the image, and use a multi-level short-connection model to deeply fuse the two kinds of information so that they complement each other, reducing information loss during the detection process.
Compared with the prior art, the beneficial effect of the invention is that combining global semantic information with local feature information reduces the loss of global semantic information during saliency detection, which would otherwise degrade the final saliency result map.
Detailed description of the invention
Fig. 1 is the neural network architecture used by the invention;
Fig. 2 is a schematic flow diagram of the pixel-level network;
Fig. 3 is an auxiliary figure giving the pixel-level network layer specifications;
Fig. 4 is a schematic comparison of the original image, the de-highlighted image and the highlighted image.
Specific embodiment
The present invention will be further explained below with reference to the attached drawings.
The model of the invention is based on the Caffe deep learning framework.
The first step is to design an end-to-end multi-level neural network that maps the input image directly to the required pixel-level saliency detection map. With that in mind: (1) the model first produces multi-level saliency maps to capture the global semantics or local features at different levels; (2) the model needs enough depth to reach the detailed information of the image and the hidden contextual contrast information. As shown in Fig. 1, a pixel-level model is designed. The initial image (256 × 256 pixels) undergoes the highlight and de-highlight operations and is then fed into a deep fully convolutional neural network, which generates a result map at each of several convolutional layers. The sizes of these convolutions and the depth of the network follow the pixel-level stream of DCL (Deep Contrast Learning), and VGG-16 is selected as the backbone of the model, as shown in Fig. 2.
Before input, the image also needs the highlight and highlight-removal operations. In order to capture the complementary semantic information in the image, the original RGB image X is converted into a highlighted image X_O and a de-highlighted image X̄_O, where M is the pixel mean of the image dataset and K is a hyper-parameter, commonly set to 1. The converted images X_O and X̄_O are mutually complementary, and both serve as the two input sources of the model. The concrete effect is shown in Fig. 4.
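A minimal Python sketch of this preprocessing step follows. The exact conversion formula is not reproduced in this text, so the symmetric reflection about the dataset mean used below is only one plausible reading of the complementary pair and should be treated as an assumption.

```python
import numpy as np

def highlight_pair(x: np.ndarray, m: float, k: float = 1.0):
    """Produce the highlighted image X_O and the complementary de-highlighted
    image X_O_bar from an original image x. ASSUMPTION: the text names x,
    m (dataset pixel mean) and k (hyper-parameter, commonly 1) but does not
    reproduce the formula; the symmetric reflection about the dataset mean
    used here is purely illustrative (X_O + X_O_bar = 2*m before clipping).
    """
    x = x.astype(np.float32)
    x_hi = np.clip(m + k * (x - m), 0.0, 255.0)  # highlighted image X_O
    x_lo = np.clip(m - k * (x - m), 0.0, 255.0)  # de-highlighted image X_O_bar
    return x_hi, x_lo

# Both converted images serve as the model's two input sources.
img = np.random.randint(0, 256, (256, 256, 3)).astype(np.uint8)  # stand-in picture
x_o, x_o_bar = highlight_pair(img, m=117.0)  # 117 is an assumed dataset mean
```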
The pixel-level model mainly has 5 max-pooling layers and 10 whole convolutional blocks (each whole block contains 2 to 4 independent convolutional layers of unequal size). Data are passed downward from the first convolutional block to the fifth, and during this convolutional transmission each block extracts the features recognizable at its level. The first convolutional block focuses on local feature information; at this stage it can eliminate locally insignificant noise and impurities while retaining the boundaries of salient objects well, so the image remains at a high pixel fidelity. After the first block, the pooling layer shrinks the transmitted data while also extracting the relatively salient information around each pixel. Every subsequent block then extracts new information from the data passed down from the block above, and because of the pooling layers the cross-sectional dimensions of the transmitted data shrink level by level. The fifth block extracts the global information of the whole image, with emphasis on capturing the positions of salient objects while ignoring more of the local feature values. After all down-sampling operations are complete, the images generated by the individual convolutional blocks must be up-sampled (i.e. deconvolved) to the same size so that they can be fused later; in this example the saliency maps generated at all levels are up-sampled to 225 × 225 pixels. The specification of each convolutional layer is shown in Fig. 3. The series of processes described above can be summarized with the following function:
f_s(X; W, b) = Pooling(σ(W *_s X + b))
In the above formula, X is the original input image; W and b represent the convolution kernel and the convolution bias respectively; *_s denotes the convolution operation with stride s; σ is the rectified linear unit (Rectified Linear Unit, ReLU); Pooling denotes the pooling operation, here specifically max pooling. The result f_s(X; W, b) is therefore the down-sampling of the original data carried out according to the parameter s.
In the corresponding up-sampling formula, X still represents the original input image; f_s(X; θ) represents the feature map generated under the stride s and the parameters θ; and f̂_s(·; θ̂) represents the feature map generated by deconvolution with up-sampling stride s and parameters θ̂, guaranteed to have the same size as X. Unlike a conventional bilinear interpolation, however, this up-sampling operation participates in the supervised learning process and is continuously refined over the iterations.
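To make the down-sampling/up-sampling pair concrete, here is a minimal PyTorch sketch of one such stage. The patent itself implements the network in Caffe on a VGG-16 backbone; the channel counts and kernel sizes below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Stage(nn.Module):
    """One down-sampling stage, f_s(X; W, b) = Pooling(sigma(W *_s X + b))."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)  # W, b
        self.relu = nn.ReLU(inplace=True)                               # sigma
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)               # max pooling

    def forward(self, x):
        return self.pool(self.relu(self.conv(x)))

# Learned up-sampling back toward the input size: unlike a fixed bilinear
# interpolation, these deconvolution weights take part in supervised training.
upsample = nn.ConvTranspose2d(64, 1, kernel_size=4, stride=2, padding=1)

x = torch.randn(1, 3, 256, 256)         # highlighted / de-highlighted input
feat = Stage(3, 64)(x)                  # -> 1 x 64 x 128 x 128 feature map
coarse = torch.sigmoid(upsample(feat))  # -> 1 x 1 x 256 x 256 coarse saliency map
```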
As described above, the saliency map generated at the pool1 level has clear salient-object contours but at the same time carries more impurities and noise, while the maps generated by the convolutional network after pool5 and beyond capture the global saliency information yet may lose part of it in some images. Therefore, in order to better integrate the different features of the multiple levels, and also to compensate for the losses in the pixel-level saliency maps, the result maps generated at all levels are summed and averaged into a single saliency detection result map, i.e. the FUSE operation in Fig. 2. This operation is described by the formula

S_fuse = (1/N) Σ_{i=1}^{N} S_i

where N is the number of saliency maps obtained from the different convolution-pooling levels of the first layer, S_i is the saliency map obtained from the i-th convolution-pooling level, and S_fuse is the fused saliency map.
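The FUSE step is a plain average of the up-sampled level maps; a minimal Python sketch, assuming the maps are already aligned to a common size:

```python
import torch

def fuse(maps):
    """Average fusion (FUSE): S_fuse = (1/N) * sum_i S_i, where `maps` is a
    list of N level-wise saliency maps up-sampled to a common size."""
    return torch.stack(maps, dim=0).mean(dim=0)
```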
The fused saliency map has clearer boundaries than the saliency map S_5 obtained by up-sampling after the fifth pooling level, but it also contains more noise. It therefore enters the second layer, which removes noise and reinforces the salient objects. Observation shows that the saliency map S_1 up-sampled after the first pooling level has clear boundaries close to the pixel level of the original image. The natural idea is then to let S_fuse extract the clear boundaries of S_1 to remedy the blurred-boundary problem of S_fuse. Therefore, using an end connection, S_1 and S_fuse are stacked longitudinally, and the stacked image then undergoes three max-pooling operations and three convolution operations. The purpose of this processing is to remove the unnecessary impurities in S_1 through pooling and to reinforce the marking of the salient objects; because the degree of the three poolings is small, the boundary clarity of S_1 is not affected. Each of the three poolings has stride 2 and a pooling window of 2 × 2 pixels. The three operations respectively generate the saliency maps S_fuse11, S_fuse12 and S_fuse13. The same additive averaging as above is then applied to the three maps, giving the saliency map S_fuse2. The iterative operation continues on S_fuse2, as shown in Fig. 1, and finally the saliency result map is obtained.
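A sketch of this second-layer refinement under the same assumptions: PyTorch, illustrative layer widths, and bilinear resizing of the intermediate maps back to a common 225 × 225 size before averaging, a detail the text does not specify.

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)   # mild 2x2, stride-2 pooling
convs = nn.ModuleList(
    nn.Conv2d(2 if i == 0 else 1, 1, kernel_size=3, padding=1) for i in range(3)
)
up = nn.Upsample(size=(225, 225), mode='bilinear', align_corners=False)

def refine(s1: torch.Tensor, s_fuse: torch.Tensor) -> torch.Tensor:
    """Stack S_1 and S_fuse along the channel axis, apply three pool+conv
    steps, and average the three intermediate maps into S_fuse2."""
    x = torch.cat([s1, s_fuse], dim=1)       # longitudinal stack: 2 channels
    maps = []
    for conv in convs:
        x = conv(pool(x))                    # one pooling and one convolution
        maps.append(up(x))                   # S_fuse11 / S_fuse12 / S_fuse13
    return torch.stack(maps).mean(dim=0)     # additive average: S_fuse2

s_fuse2 = refine(torch.rand(1, 1, 225, 225), torch.rand(1, 1, 225, 225))
```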
Finally, the saliency map is supervised in the forward pass with a cross-entropy loss function (Cross Entropy Function), whose weights involve the original image and the saliency map. Expressed mathematically:

L(W) = − β_i Σ_{i ∈ |I|+} log Pr(y_i = 1 | X; W) − (1 − β_i) Σ_{i ∈ |I|−} log Pr(y_i = 0 | X; W)

where G is the ground-truth map (Ground Truth, GT); W represents the set of network parameters; β_i is the weight-balancing parameter; |I| represents the set of all pixels in the image; |I|− is the set of non-salient pixels; |I|+ is the set of salient pixels; and β_i = |I|− / |I|.
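A minimal Python sketch of this balanced loss, under the assumption that β_i = |I|− / |I| as in the cited deeply supervised salient object detection work:

```python
import torch

def balanced_bce(pred: torch.Tensor, gt: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Class-balanced cross-entropy following the definitions above:
    beta = |I-| / |I| (share of non-salient pixels), so the rarer salient
    pixels receive the larger weight. pred holds probabilities in (0, 1),
    gt is the binary ground-truth map G."""
    beta = (gt < 0.5).float().mean()                                  # beta_i = |I-| / |I|
    pos = -(beta * gt * torch.log(pred + eps)).sum()                  # salient pixels |I|+
    neg = -((1 - beta) * (1 - gt) * torch.log(1 - pred + eps)).sum()  # non-salient |I|-
    return pos + neg
```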
The model is trained on the collected person image dataset, with images uniformly resized to 225 × 225 pixels, a batch size of 1, and stochastic gradient descent with a learning rate of 1e-8. The 200,000 training iterations take more than 24 hours. The method is implemented in Python on top of the Caffe framework. The GPU used is a Tesla M40 (12 GB).
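A self-contained sketch of this training configuration, shown with PyTorch, a one-layer placeholder model and random data rather than the authors' Caffe setup; the standard binary cross-entropy stands in for the balanced loss above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder one-layer model; the real model is the two-layer fusion
# network described above, implemented by the authors in Caffe.
model = nn.Sequential(nn.Conv2d(3, 1, kernel_size=3, padding=1), nn.Sigmoid())
optimizer = torch.optim.SGD(model.parameters(), lr=1e-8)   # reported learning rate

for step in range(200_000):                                # reported iteration count
    img = torch.rand(1, 3, 225, 225)                       # batch size 1, 225x225 input
    gt = (torch.rand(1, 1, 225, 225) > 0.9).float()        # stand-in ground-truth map
    loss = F.binary_cross_entropy(model(img), gt)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```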
Through the repeated use of the fusion function realized above with pooling and convolution operations, the local feature information of the input person image is gradually merged on the basis of the global semantic information. In this process, the fusion of local feature information and global semantic information is not accomplished in one step; instead, it gradually supplements the semantic information lost during pooling and convolution, such as the global semantic body-frame information of the person or other defects of the person subject. The model performs well on six existing SOD datasets (DUT-OMRON, ECSSD, HKU-IS, PASCAL-S, SED1, SED2).
Claims (5)
1. A person recognition method for complex hydraulic-engineering scenes based on image saliency detection, characterized in that a multi-level fully convolutional neural network performs salient-object detection on the image, and that during detection, specifically for the small-object case, global semantic features are acquired and local information is acquired as a supplement.
2. The person recognition method for complex hydraulic-engineering scenes based on image saliency detection according to claim 1, characterized in that: the acquired global semantic feature, i.e. the main position of the small-object person, serves as the basic information for judging the position of the salient object, and short connections are used so that the local information output by the shallow convolutional layers, i.e. the action details of the person, supplements the global semantics as supplemental information.
3. The person recognition method for complex hydraulic-engineering scenes based on image saliency detection according to claim 1, characterized in that: the model fuses the local information and the global semantic features using short connections.
4. The person recognition method for complex hydraulic-engineering scenes based on image saliency detection according to claim 2 or 3, characterized in that: the method retains highly sensitive detectability when the salient object in the image is a small-object person with a low area share.
5. The person recognition method for complex hydraulic-engineering scenes based on image saliency detection according to claim 2 or 3, characterized in that: the method has a high-intensity filtering capacity for the complex backgrounds present in the scene.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910240747.7A CN109977970A (en) | 2019-03-27 | 2019-03-27 | Person recognition method for complex hydraulic-engineering scenes based on image saliency detection |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910240747.7A CN109977970A (en) | 2019-03-27 | 2019-03-27 | Person recognition method for complex hydraulic-engineering scenes based on image saliency detection |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN109977970A true CN109977970A (en) | 2019-07-05 |
Family
ID=67081032
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201910240747.7A Pending CN109977970A (en) | 2019-03-27 | 2019-03-27 | Person recognition method for complex hydraulic-engineering scenes based on image saliency detection |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN109977970A (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112223288A (en) * | 2020-10-09 | 2021-01-15 | 南开大学 | Visual fusion service robot control method |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106447658A (en) * | 2016-09-26 | 2017-02-22 | 西北工业大学 | Significant target detection method based on FCN (fully convolutional network) and CNN (convolutional neural network) |
| CN108509880A (en) * | 2018-03-21 | 2018-09-07 | 南京邮电大学 | A kind of video personage behavior method for recognizing semantics |
- 2019-03-27: Application CN201910240747.7A filed in China; publication CN109977970A (en), status Pending.
Patent Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106447658A (en) * | 2016-09-26 | 2017-02-22 | 西北工业大学 | Significant target detection method based on FCN (fully convolutional network) and CNN (convolutional neural network) |
| CN108509880A (en) * | 2018-03-21 | 2018-09-07 | 南京邮电大学 | A kind of video personage behavior method for recognizing semantics |
Non-Patent Citations (1)
| Title |
|---|
| QIBIN HOU et al.: "Deeply Supervised Salient Object Detection", IEEE Transactions on Pattern Analysis and Machine Intelligence * |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112223288A (en) * | 2020-10-09 | 2021-01-15 | 南开大学 | Visual fusion service robot control method |
| CN112223288B (en) * | 2020-10-09 | 2021-09-14 | 南开大学 | Visual fusion service robot control method |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN116052016B (en) | Fine segmentation and detection method of clouds and cloud shadows in remote sensing images based on deep learning | |
| CN108171698B (en) | Method for automatically detecting human heart coronary calcified plaque | |
| CN111275696B (en) | Medical image processing method, image processing method and device | |
| CN111862143B (en) | Automatic monitoring method for river dike collapse | |
| CN109086824A (en) | A kind of sediment sonar image classification method based on convolutional neural networks | |
| CN109816012A (en) | A multi-scale object detection method fused with context information | |
| CN110232394A (en) | A kind of multi-scale image semantic segmentation method | |
| CN110349087B (en) | RGB-D image high-quality grid generation method based on adaptive convolution | |
| CN114663439A (en) | Remote sensing image land and sea segmentation method | |
| CN109886221A (en) | Sand dredger recognition methods based on saliency detection | |
| Pham et al. | A new deep learning approach based on bilateral semantic segmentation models for sustainable estuarine wetland ecosystem management | |
| CN112288776B (en) | A Target Tracking Method Based on Multi-Time Step Pyramid Codec | |
| CN113312993B (en) | A PSPNet-based Land Cover Classification Method for Remote Sensing Data | |
| CN111476723B (en) | Remote sensing image lost pixel recovery method for failure of Landsat-7 scanning line corrector | |
| CN112785629A (en) | Aurora motion characterization method based on unsupervised deep optical flow network | |
| CN114943893B (en) | Feature enhancement method for land coverage classification | |
| CN118072001B (en) | Camouflaged target detection method based on scale feature perception and wide-range perception convolution | |
| Ji et al. | Domain adaptive and interactive differential attention network for remote sensing image change detection | |
| CN109741340A (en) | Refinement method of ice layer segmentation in ice sheet radar image based on FCN-ASPP network | |
| CN111696033A (en) | Real image super-resolution model and method for learning cascaded hourglass network structure based on angular point guide | |
| CN117726954B (en) | A method and system for segmenting land and sea in remote sensing images | |
| CN114943894A (en) | ConvCRF-based high-resolution remote sensing image building extraction optimization method | |
| CN112906645B (en) | Sea ice target extraction method with SAR data and multispectral data fused | |
| Yuan et al. | Capturing small objects and edges information for cross-sensor and cross-region land cover semantic segmentation in arid areas | |
| CN117011713A (en) | Method for extracting field information based on convolutional neural network |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20190705 |