CN119904894A - Multi-scale pedestrian detection method and device based on joint head and overall information - Google Patents

Info

Publication number: CN119904894A (granted as CN119904894B)
Application number: CN202510409873.6A
Authority: CN (China)
Legal status: Granted; active
Inventors: 马晞茗, 李宁, 吴迪
Applicant/Assignee: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences
Original language: Chinese (zh)

Classifications

    • G06V40/103 — static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06N3/045 — combinations of neural networks
    • G06N3/0464 — convolutional networks [CNN, ConvNet]
    • G06N3/084 — backpropagation, e.g. using gradient descent
    • G06V10/454 — biologically inspired filters integrated into a hierarchical structure, e.g. convolutional neural networks
    • G06V10/806 — fusion of extracted features at the classification level
    • G06V10/82 — image or video recognition using neural networks
    • Y02T10/40 — engine management systems


Abstract


The present invention relates to the field of computer vision and specifically provides a multi-scale pedestrian detection method and device based on the joint use of head and overall (full-body) information. First, features at different levels are densely connected, improving the network's sensitivity to multi-scale pedestrian targets. Second, the sampling scheme of the region proposal network is optimized: by computing the occlusion overlap rate of each sample in the sample set, the model's adaptability to occluded pedestrian targets is improved. A joint detection framework combining pedestrian head and overall information is then constructed to reduce the adverse effect of body occlusion on detection. Finally, the post-processing stage and the loss function module are optimized, which weakens the interference that adjacent pedestrian targets cause to detection, makes the removal of redundant boxes more intelligent and principled, and further lowers both the missed-detection and false-detection rates. The method yields stronger detection of multi-scale and occluded pedestrian targets in complex crowd-dense scenes and reduces the pedestrian missed-detection rate.

Description

Multi-scale pedestrian detection method and device based on the combination of head and overall information
Technical Field
The invention relates to the technical field of computer vision, and in particular provides a multi-scale pedestrian detection method and device based on the combination of head and overall information.
Background
In the field of computer vision, pedestrian detection is one of the popular research directions and is widely applied. The pedestrian detection task refers to accurately detecting and localizing pedestrian instances in image, video and video-stream data through computer vision techniques, and is essentially a classification-and-regression process. In practice, pedestrian detection plays an important role in autonomous driving, intelligent surveillance, intelligent robotics, human-computer interaction and other fields, and has considerable application value. In the fields of autonomous driving and assisted safe driving, pedestrian detection can determine in real time whether a pedestrian has suddenly appeared in front of the vehicle, so that adjustments can be made promptly according to the actual situation, ensuring both pedestrian safety and driving safety. In crowded scenes such as congested road sections in particular, the assistance that pedestrian detection provides to autonomous-driving safety is even more pronounced. In the field of intelligent surveillance, most public places today use cameras to monitor the whole scene and count crowd flow in real time. During an epidemic especially, pedestrian density must be strictly controlled in most large public places; pedestrian detection can accurately count foot traffic in a venue in real time, and analyzing and forecasting these data further helps managers take corresponding adjustment measures.
In the field of intelligent robots, sensors such as cameras transmit signals about the surrounding environment to the robot, and the pedestrian detection algorithm serves as an important perception network in the robot's "brain", helping it quickly and accurately perceive pedestrian targets and make timely decisions in response. In the field of human-computer interaction, devices such as intelligent meal-delivery and express-delivery vehicles on campuses and in restaurants integrate pedestrian detection among various other functions; through interaction with pedestrians, artificial intelligence technology is put to better use in serving people's daily lives.
Deep-learning-based pedestrian detection algorithms can be divided, by detection paradigm, into single-stage and two-stage algorithms. Single-stage pedestrian detection is mainly represented by the YOLO family of network models, and two-stage detection by the R-CNN family. However, both kinds of algorithm lack research into and design for specific detection scenes: their detection capability is weak for small-scale pedestrians at far viewing angles and for severely occluded, overlapping pedestrians in dense scenes, which manifests mainly as lower detection precision and a higher missed-detection rate. In addition, both kinds of algorithm ignore an important point: although a pedestrian's body is easily occluded and overlapped in complex dense scenes, the head region is usually only slightly occluded or not occluded at all. Even when the body is severely occluded, the head region can still provide important feature information, which is crucial for detecting pedestrian targets in dense scenes. However, the head region is small in scale and can easily be confused with hands, elbows and small surrounding objects, causing false detections. Accurate pedestrian detection is therefore difficult to achieve by relying on overall detection or head detection alone; the two must be effectively combined, and richer, finer multi-scale feature information must be obtained in the feature extraction stage, so that the advantages of head detection and overall detection are better exploited and the accuracy of pedestrian detection is improved.
Designing a multi-scale pedestrian detection algorithm based on the combination of head and overall information is therefore highly necessary, and is of great significance for improving the detection accuracy of occluded and multi-scale pedestrian targets in dense scenes.
In view of the above requirements, many related solutions exist domestically and internationally.
Chinese patent publication CN111767882A, published October 13, 2020 and entitled "Multi-mode pedestrian detection method based on an improved YOLO model", improves the pedestrian detection effect by fusing a CBAM attention mechanism and optimizing the loss function. However, some problems remain: (1) insufficient sensitivity to multi-scale pedestrian targets in the feature extraction stage, with particularly weak ability to learn feature information for small-scale pedestrians at far viewing angles; and (2) remaining room for improvement regarding missed detections of severely overlapping pedestrian targets. Chinese patent publication CN113989939A, published January 28, 2022 and entitled "Small-target pedestrian detection system based on an improved YOLO algorithm", does not account for the unavoidable dense crowding of pedestrians in real detection scenes, so its ability to detect occluded pedestrians is weak and its practical application is limited. Chinese patent publication CN114882527A, published August 9, 2022 and entitled "Pedestrian detection method and system based on dynamic grouped convolution", realizes pedestrian detection mainly through grouped convolution, but lacks consideration of the specific scenes of pedestrian detection, and its effectiveness in complex crowd-dense scenes is limited.
Most existing pedestrian detection schemes rely on classical object detection algorithms to detect pedestrian targets in a scene; by detection paradigm these can be divided into single-stage algorithms represented by the YOLO family of network models and two-stage algorithms represented by the Faster R-CNN network model. However, most classical object detection algorithms lack specific consideration of the detection scene. Especially in complex crowd-dense scenes, the robustness of these mainstream algorithms suffers, mainly for the following two reasons:
(1) Variable pedestrian target scales affect the overall performance of the detector. Current pedestrian detection datasets are mostly captured by cameras and then annotated, and cameras follow the rule of "near objects appear large, far objects small": pedestrian targets at near viewing angles have larger overall dimensions, while those at far viewing angles have smaller ones. Because small-scale pedestrian targets have relatively low resolution, the feature information an algorithm can learn during feature extraction is easily limited or weakly expressive, making it difficult to maintain high sensitivity to pedestrians of different scales, which in turn easily causes missed or false detections.
(2) Heavy occlusion of pedestrian targets affects the overall performance of the detector. In pedestrian detection under complex crowd-dense scenes, pedestrian targets are often occluded to some degree. Analysis of images in pedestrian detection datasets shows that the occlusion problem mainly comprises two cases: intra-class occlusion and inter-class occlusion. Intra-class occlusion refers to pedestrian targets occluding one another. Inter-class occlusion refers to interference from background information, mainly buildings, trees, vehicles, articles carried by the pedestrian, and articles carried by other nearby pedestrians. Both kinds of occlusion reduce the visible proportion of the pedestrian's body, complicate the algorithm's feature extraction stage, reduce the information available to the detection module's inference, and degrade the accuracy of pedestrian localization, thereby harming the comprehensive performance of the pedestrian detection algorithm.
Disclosure of Invention
The invention aims to solve the above problems by providing a multi-scale pedestrian detection method and device based on the combination of head and overall information, which improve the detector's precision for multi-scale and occluded pedestrian targets in complex crowd-dense scenes and reduce its missed-detection rate.
In a first aspect, the present invention provides a multi-scale pedestrian detection method based on a combination of head and overall information, comprising:
Constructing a Faster R-CNN network model;
fusing an improved feature extraction network into the backbone network of the Faster R-CNN network model, and inputting the image to be detected into the resulting model for feature extraction, to obtain an extracted feature map;
improving the sampling scheme of the region proposal network by constructing a non-uniform hard-sample mining strategy based on occlusion-overlap-rate discrimination: the occlusion overlap rate of each sample in the sample set is computed, samples with higher occlusion overlap rates are given higher weights, and the region proposal network simultaneously generates head candidate boxes and overall candidate boxes for all pedestrian instances in the scene;
constructing a pedestrian head detection branch module and a pedestrian overall detection branch module, and obtaining through them a preliminary detection result comprising pedestrian head detection boxes and pedestrian overall detection boxes;
post-processing the obtained pedestrian head detection boxes and pedestrian overall detection boxes, and screening out the redundant detection boxes generated during joint detection, to obtain the final pedestrian detection result;
to further suppress false and missed detections occurring in the post-processing stage, constructing a joint loss function in the loss function part, the joint loss function penalizing false detections which correct non-maximum suppression fails to remove and missed detections caused by erroneous non-maximum suppression, making the final detection result more accurate.
As a preferred solution, fusing the improved feature extraction network into the backbone network of the Faster R-CNN network model and inputting the image to be detected into the resulting model for feature extraction, to obtain an extracted feature map, comprises:
taking a ResNet network as the backbone network of the Faster R-CNN network model;
learning the feature information of the image to be detected with the backbone network;
performing feature fusion on the feature information acquired by the backbone network: combining the dense-connection idea, the feature splicing strategy of the feature pyramid network (FPN) is improved so that features of all scales participate in the computation and output of each level's feature information, absorbing both high-level semantic information and low-level detail such as texture expressed by feature maps at different scales, and strengthening the network's fusion of multi-scale features; the predicted values P2 ~ P6 corresponding to each layer's scale features are obtained as shown in formulas 1 and 2:

F_l = Σ_{k=2}^{6} w_{l,k} · C_k    (1)

P_l = f(F_l), l = 2, …, 6    (2)

where the w_{l,k} are all weight parameters, f(·) is the feature-map function, and C2 ~ C6 are the per-layer scale features obtained by the feature extraction network; during training, the parameters w_{l,k} participate in the backpropagation of the gradient and are updated through model training to the most appropriate values.
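The dense weighted fusion described above can be sketched numerically. In this minimal NumPy illustration, the function names, the nearest-neighbour resizing, and the use of ReLU for f(·) are assumptions for demonstration only, not the patent's implementation:

```python
import numpy as np

def dense_fpn_fuse(features, weights):
    """Densely fuse multi-scale feature maps C2..C6 into outputs P2..P6.

    Each output level is a weighted sum over ALL input levels (the dense-
    connection idea behind formulas 1 and 2); weights[l][k] plays the role
    of the learnable parameter w. ReLU stands in for the feature-map
    function f, and nearest-neighbour resizing aligns differing scales.
    """
    def resize(x, shape):
        # nearest-neighbour resample so maps of different scales can be summed
        ry = np.linspace(0, x.shape[0] - 1, shape[0]).round().astype(int)
        rx = np.linspace(0, x.shape[1] - 1, shape[1]).round().astype(int)
        return x[np.ix_(ry, rx)]

    outputs = []
    for l in range(len(features)):
        target = features[l].shape
        fused = sum(weights[l][k] * resize(c, target)
                    for k, c in enumerate(features))
        outputs.append(np.maximum(fused, 0.0))  # f(.): ReLU as a stand-in
    return outputs

# toy pyramid: three levels of constant-valued maps at 8x8, 4x4 and 2x2
C = [np.ones((8, 8)), 2.0 * np.ones((4, 4)), 3.0 * np.ones((2, 2))]
W = [[0.5, 0.3, 0.2]] * 3  # uniform illustrative fusion weights per level
P = dense_fpn_fuse(C, W)   # every fused value is 0.5*1 + 0.3*2 + 0.2*3 = 1.7
```

In a trained network the fusion weights would be learnable tensors updated by backpropagation, as the description states.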
As a preferred solution, improving the sampling scheme of the region proposal network, constructing a non-uniform hard-sample mining strategy based on occlusion-overlap-rate discrimination, computing the occlusion overlap rate of each sample in the sample set, giving higher weight to samples with higher occlusion overlap rates, and generating head candidate boxes and overall candidate boxes simultaneously for all pedestrian instances in the scene with the region proposal network, comprises:
the non-uniform hard-sample mining strategy based on occlusion-overlap-rate discrimination introduces a decision threshold γ that divides the sample set, relative to the average occlusion overlap rate ō, into a difficult set (D) and a common set (N); the probability of each sample being drawn is defined as P_i, computed as shown in formula 3:

P_i = w_i / Σ_{j=1}^{N_t} w_j    (3)

where w_i denotes the occlusion coefficient of the i-th sample, reflecting the degree to which it is occluded;
the occlusion coefficient w_i of the i-th sample is further expressed as:

w_i = { o_i / ō,   if o_i > γ · ō (sample in the difficult set D)
      { N_s / N_t, if o_i ≤ γ · ō (sample in the common set N)    (4)

where N_s denotes the number of samples required, N_t the total number of candidate samples, and o_i and ō respectively the occlusion overlap rate of the i-th sample and the average occlusion overlap rate of the whole sample set; the decision threshold γ may be set to different values depending on the overall occlusion degree of the dataset;
the occlusion overlap rate o_i of the i-th sample and the average occlusion overlap rate ō of the whole sample set are further expressed as:

o_i = max_{j≠i} [ area(b_i ∩ b_j) / area(b_i ∪ b_j) ]    (5)

ō = (1 / N_t) Σ_{i=1}^{N_t} o_i    (6)

where b_i and b_j denote the candidate boxes of the i-th and j-th samples.
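The occlusion-aware sampling of formulas 3-6 can be sketched as follows. Treating a sample's occlusion overlap rate as its maximum IoU with any other sample, and fixing the common-set weight to 1 for simplicity, are assumptions for illustration, as are all names:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) form."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def sampling_probs(boxes, gamma=1.0):
    """Sketch of formulas 3-6: per-sample occlusion overlap rate, its
    average, the difficult/common split at gamma * average, and the
    normalised probability of each sample being drawn."""
    n = len(boxes)
    # formula 5 (assumed form): worst overlap with any other sample
    o = np.array([max(iou(boxes[i], boxes[j]) for j in range(n) if j != i)
                  for i in range(n)])
    o_bar = o.mean()                  # formula 6: average occlusion rate
    difficult = o > gamma * o_bar     # split into difficult / common sets
    # formula 4 (assumed form): favour difficult samples by o_i / o_bar;
    # common samples keep a fixed unit weight here for simplicity
    w = np.where(difficult, o / o_bar, 1.0)
    return o, difficult, w / w.sum()  # formula 3: P_i = w_i / sum_j w_j

# two heavily overlapping pedestrians plus one isolated pedestrian
boxes = [(0, 0, 10, 10), (5, 0, 15, 10), (40, 40, 50, 50)]
o, difficult, p = sampling_probs(boxes)
```

The two overlapping boxes land in the difficult set and receive a larger share of the sampling probability than the isolated one, which is the intended bias toward occluded samples.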
As a preferred solution, post-processing the obtained pedestrian head detection boxes and pedestrian overall detection boxes and screening out the redundant detection boxes generated during joint detection, to obtain the final pedestrian detection result, comprises:
introducing, for the head detection boxes and pedestrian overall detection boxes obtained by joint detection, penalty factors that decrease confidence, gradually lowering the confidence scores of overlapping detection boxes so that competition among overlapping boxes is reduced without over-suppressing them; the score S_i^o of the i-th pedestrian overall detection box and the score S_j^h of the j-th head detection box are expressed as:

S_i^o = S_i^o,                               if IoU(B^o, b_i^o) < N_o    (7)
S_i^o = S_i^o · exp(−IoU(B^o, b_i^o)² / ε),  if IoU(B^o, b_i^o) ≥ N_o    (8)
S_j^h = S_j^h,                               if IoU(B^h, b_j^h) < N_h    (9)
S_j^h = S_j^h · exp(−IoU(B^h, b_j^h)² / ε),  if IoU(B^h, b_j^h) ≥ N_h    (10)

where B^o and B^h are respectively the overall box and the head box with the highest scores in the previous iteration, N_o and N_h are the NMS thresholds of overall detection and head detection respectively, and ε is a small fixed value set initially, controlling the strength of the decay;
the confidence scores of the resulting head box and overall box are weighted and summed to obtain the joint confidence score S_joint, specifically expressed as:

S_joint = α · S^h + (1 − α) · S^o    (11)

where α represents the weight of the head detection score; if the head overlap or the overall overlap exceeds a preset threshold, the detection box with the lower confidence is suppressed.
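The decaying-confidence post-processing and joint scoring can be sketched as below; the Gaussian decay with ε as its strength and the value α = 0.4 are assumptions chosen for illustration, not values from the patent:

```python
import math

def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) form."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def soft_decay(score, box, best_box, nms_thr, eps=0.5):
    """Formulas 7-10 (assumed Gaussian form): leave a score untouched while
    its overlap with the current highest-scoring box stays below the NMS
    threshold, otherwise decay it smoothly instead of hard-suppressing."""
    ov = iou(box, best_box)
    return score if ov < nms_thr else score * math.exp(-ov * ov / eps)

def joint_score(s_head, s_body, alpha=0.4):
    """Formula 11: joint confidence as a weighted sum of head and overall
    scores, with alpha weighting the head branch."""
    return alpha * s_head + (1.0 - alpha) * s_body

best = (0, 0, 10, 10)                                        # top box this round
kept = soft_decay(0.9, (20, 0, 30, 10), best, nms_thr=0.5)   # disjoint box
decayed = soft_decay(0.9, (1, 0, 11, 10), best, nms_thr=0.5) # heavy overlap
s = joint_score(0.8, 0.6)  # 0.4 * 0.8 + 0.6 * 0.6 = 0.68
```

Unlike hard NMS, the heavily overlapping box keeps a reduced, non-zero score, so a genuinely distinct neighbouring pedestrian is less likely to be discarded outright.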
As a preferred solution, constructing in the loss function part a joint loss function to further suppress the false and missed detections occurring in the post-processing stage, the joint loss function penalizing false detections which correct non-maximum suppression fails to remove and missed detections caused by erroneous non-maximum suppression, making the final detection result more accurate, comprises:
the joint loss function L_joint is specifically:

L_joint = λ1 · (L_pull^h + L_pull^o) + λ2 · (L_push^h + L_push^o)    (12)

where the coefficients λ1 and λ2 are weights balancing the losses; L_pull^h and L_pull^o pull the falsely detected head boxes and overall boxes closer so that they are removed, while L_push^h and L_push^o push apart the missed head boxes and overall boxes suppressed by erroneous non-maximum suppression, so that head boxes and overall boxes correspond correctly to their target boxes;
for the missed head boxes and overall boxes suppressed by erroneous non-maximum suppression, the loss functions L_push^h and L_push^o are further expressed as:

L_push^h = Σ_{(i,j): different targets} max(0, IoU(b_i^h, b_j^h) − N_h)    (13)
L_push^o = Σ_{(i,j): different targets} max(0, IoU(b_i^o, b_j^o) − N_o)    (14)

for the falsely detected head boxes and overall boxes to be pulled closer for removal, the loss functions L_pull^h and L_pull^o are further expressed as:

L_pull^h = Σ_{(i,j): same target} max(0, N_h − IoU(b_i^h, b_j^h))    (15)
L_pull^o = Σ_{(i,j): same target} max(0, N_o − IoU(b_i^o, b_j^o))    (16)

where N_h and N_o are respectively the NMS thresholds of the head detection branch and the overall detection branch; after the redundant boxes generated by the head and overall detection branches are removed, the final pedestrian detection result is obtained.
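A minimal sketch of the pull/push structure of the joint loss follows; the margin forms around the NMS thresholds are assumed for illustration and are not taken from the patent text:

```python
def pull_loss(iou_same_target, nms_thr):
    """Assumed form of formulas 15-16: redundant boxes on the SAME target
    should overlap at least as much as the NMS threshold so that the
    duplicate is filtered out; penalise any shortfall."""
    return max(0.0, nms_thr - iou_same_target)

def push_loss(iou_diff_target, nms_thr):
    """Assumed form of formulas 13-14: boxes on DIFFERENT targets should
    overlap less than the NMS threshold so that neither is wrongly
    suppressed; penalise any excess."""
    return max(0.0, iou_diff_target - nms_thr)

def joint_loss(pull_h, pull_o, push_h, push_o, lam1=1.0, lam2=1.0):
    """Formula 12: weighted combination of head/overall pull and push terms."""
    return lam1 * (pull_h + pull_o) + lam2 * (push_h + push_o)

# a duplicate head box drifting off its target, and an overall box crowding
# a neighbouring pedestrian past the NMS threshold
l = joint_loss(pull_loss(0.3, 0.5), pull_loss(0.6, 0.5),
               push_loss(0.7, 0.5), push_loss(0.2, 0.5))  # 0.2 + 0.2 = 0.4
```

Only the violating pairs contribute: the duplicate whose overlap has dropped below the threshold, and the distinct pair whose overlap has risen above it.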
In a second aspect, the present invention provides a multi-scale pedestrian detection device based on a combination of head and overall information, comprising:
a construction unit, used for constructing a Faster R-CNN network model;
an extraction unit, used for fusing an improved feature extraction network into the backbone network of the Faster R-CNN network model and inputting the image to be detected into the resulting model for feature extraction, to obtain an extracted feature map;
a generation unit, used for improving the sampling scheme of the region proposal network by constructing a non-uniform hard-sample mining strategy based on occlusion-overlap-rate discrimination: the occlusion overlap rate of each sample in the sample set is computed, samples with higher occlusion overlap rates are given higher weights, and the region proposal network simultaneously generates head candidate boxes and overall candidate boxes for all pedestrian instances in the scene;
a preliminary detection unit, used for constructing a pedestrian head detection branch module and a pedestrian overall detection branch module and obtaining through them a preliminary detection result comprising pedestrian head detection boxes and pedestrian overall detection boxes;
a post-processing unit, used for post-processing the obtained pedestrian head detection boxes and pedestrian overall detection boxes and screening out the redundant detection boxes generated during joint detection, to obtain the final pedestrian detection result;
an output unit, used for constructing a joint loss function in the loss function part to further suppress the false and missed detections occurring in the post-processing stage, the joint loss function penalizing false detections which correct non-maximum suppression fails to remove and missed detections caused by erroneous non-maximum suppression, making the final detection result more accurate.
As a preferred embodiment, the extraction unit is specifically configured to:
taking a ResNet network as the backbone network of the Faster R-CNN network model;
learning the feature information of the image to be detected with the backbone network;
performing feature fusion on the feature information acquired by the backbone network: combining the dense-connection idea, the feature splicing strategy of the feature pyramid network (FPN) is improved so that features of all scales participate in the computation and output of each level's feature information, absorbing both high-level semantic information and low-level detail such as texture expressed by feature maps at different scales, and strengthening the network's fusion of multi-scale features; the predicted values P2 ~ P6 corresponding to each layer's scale features are obtained as shown in formulas 1 and 2:

F_l = Σ_{k=2}^{6} w_{l,k} · C_k    (1)

P_l = f(F_l), l = 2, …, 6    (2)

where the w_{l,k} are all weight parameters, f(·) is the feature-map function, and C2 ~ C6 are the per-layer scale features obtained by the feature extraction network; during training, the parameters w_{l,k} participate in the backpropagation of the gradient and are updated through model training to the most appropriate values.
As a preferred solution, the generating unit is specifically configured to:
the non-uniform hard-sample mining strategy based on occlusion-overlap-rate discrimination introduces a decision threshold γ that divides the sample set, relative to the average occlusion overlap rate ō, into a difficult set (D) and a common set (N); the probability of each sample being drawn is defined as P_i, computed as shown in formula 3:

P_i = w_i / Σ_{j=1}^{N_t} w_j    (3)

where w_i denotes the occlusion coefficient of the i-th sample, reflecting the degree to which it is occluded;
the occlusion coefficient w_i of the i-th sample is further expressed as:

w_i = { o_i / ō,   if o_i > γ · ō (sample in the difficult set D)
      { N_s / N_t, if o_i ≤ γ · ō (sample in the common set N)    (4)

where N_s denotes the number of samples required, N_t the total number of candidate samples, and o_i and ō respectively the occlusion overlap rate of the i-th sample and the average occlusion overlap rate of the whole sample set; the decision threshold γ may be set to different values depending on the overall occlusion degree of the dataset;
the occlusion overlap rate o_i of the i-th sample and the average occlusion overlap rate ō of the whole sample set are further expressed as:

o_i = max_{j≠i} [ area(b_i ∩ b_j) / area(b_i ∪ b_j) ]    (5)

ō = (1 / N_t) Σ_{i=1}^{N_t} o_i    (6)

where b_i and b_j denote the candidate boxes of the i-th and j-th samples.
As a preferred solution, the post-processing unit is specifically configured to:
Penalty factors that decay the confidence are introduced for the head detection frames and the pedestrian overall detection frames obtained by the joint detection, gradually reducing the confidence scores of overlapping detection frames so that competition among overlapping frames is reduced without excessively suppressing them; the score of each pedestrian overall detection frame and the score of each head detection frame are expressed as:
(7)
(8)
(9)
(10)
Wherein, the first two terms are the overall frame and the head frame with the highest scores in the previous iteration, the next two are the NMS thresholds corresponding to the overall detection and the head detection respectively, and the last is a small fixed value set initially;
The confidence scores of the obtained head frame and overall frame are weighted and summed to obtain a joint confidence score, specifically expressed as:
(11)
Wherein, the coefficient represents the weight occupied by the head detection score; if the head overlap degree or the overall overlap degree is larger than a preset threshold, the detection frame with the lower confidence is suppressed.
As a preferred solution, the output unit is specifically configured to:
The joint loss function is specifically expressed as:
(12)
Wherein, the coefficients are weights that balance the losses; one pair of terms pulls the falsely detected head frame and overall frame closer so that they are removed, and the other pair pushes away the missed head frame and overall frame suppressed by erroneous non-maximum suppression, so that head frames and overall frames correctly correspond to their target frames;
The loss terms that push away the missed head frame and overall frame suppressed by erroneous non-maximum suppression are further expressed as:
(13)
(14)
The loss terms that pull in the falsely detected head frame and overall frame for rejection are further expressed as:
(15)
(16)
Wherein, the two thresholds are the NMS thresholds corresponding to the head detection branch and the overall detection branch respectively; after the redundant frames generated by the two branches are removed, the final pedestrian detection result is obtained.
Compared with the prior art, the invention has the following beneficial effects:
The embodiment of the invention provides a multi-scale pedestrian detection method and device based on the combination of head and overall information. First, features of different layers are densely connected, i.e., all scale features participate in computing the final output features, so that each layer of features covers both semantic information and detail texture information, improving the network's sensitivity to multi-scale pedestrian targets. Second, the sampling mode of the region proposal network is optimized: the occlusion overlap rate of each sample in the sample set is calculated, and samples with higher occlusion overlap rates are given higher weights, strengthening the learning of severely occluded samples in the difficult set during training and further improving the model's ability to detect occluded pedestrian targets. A joint detection framework of pedestrian head and overall information is then constructed, with the aim of using head detection to assist pedestrian detection, thereby reducing the adverse effect that occlusion of pedestrian bodies has on detection. Finally, the post-processing link and the loss function module are optimized, weakening the interference of adjacent pedestrian targets on detection, improving the intelligence and rationality of screening redundant frames, and removing false detection frames generated by the two detection branches without excessively suppressing detection frames in dense locations, further reducing the missed detection rate and the false detection rate of pedestrian detection. Compared with previous algorithms, the pedestrian detection algorithm provided by the invention therefore has a stronger detection capability for multi-scale and occluded pedestrian targets in complex, crowd-dense scenes, and can reduce the missed detection rate of pedestrian targets.
Drawings
Fig. 1 is a flow chart of a multi-scale pedestrian detection method based on head and overall information association according to an embodiment of the present invention.
Fig. 2 is an overall logic schematic diagram of a multi-scale pedestrian detection method based on head and overall information association according to an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of ResNet network in a multi-scale pedestrian detection method based on head and overall information association according to an embodiment of the present invention.
Fig. 4 is a network structure diagram of feature extraction in a multi-scale pedestrian detection method based on head and overall information association according to an embodiment of the present invention.
Fig. 5 is a pseudo code flow chart of a joint post-processing algorithm in a multi-scale pedestrian detection method based on joint of head and overall information, according to an embodiment of the invention.
Fig. 6 is a graph comparing the overall performance of a multi-scale pedestrian detection method based on the combination of head and overall information with some mainstream detection algorithms on CrowdHuman datasets according to an embodiment of the present invention.
Fig. 7 is a graph comparing the overall performance of a multi-scale pedestrian detection method based on the combination of head and overall information with some mainstream detection algorithms on CityPersons datasets according to an embodiment of the present invention.
Fig. 8a is a comparison result of the performance of the multi-scale pedestrian detection method based on the combination of the head and the overall information and some main stream detection algorithms in TJU-Ped-campus subsets according to an embodiment of the present invention.
Fig. 8b is a comparison result of the performance of the multi-scale pedestrian detection method based on the combination of the head and the overall information and some main stream detection algorithms in TJU-Ped-traffic subsets according to the embodiment of the present invention.
Fig. 9 is a block diagram of a multi-scale pedestrian detection device based on the combination of head and overall information according to an embodiment of the present invention.
Detailed Description
Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. In the following description, like modules are denoted by like reference numerals. In the case of the same reference numerals, their names and functions are also the same. Therefore, a detailed description thereof will not be repeated.
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not to be construed as limiting the invention.
Referring to fig. 1, an embodiment of the present invention provides a multi-scale pedestrian detection method based on head and overall information association, including:
S101, constructing a Faster R-CNN network model;
S102, integrating the backbone network of the Faster R-CNN network model with an improved feature extraction network, and inputting an image to be detected into the Faster R-CNN model of the integrated and improved feature extraction network for feature extraction to obtain an extracted feature map;
S103, improving the sampling mode of the region proposal network by constructing a non-uniform difficult-sample mining strategy based on occlusion overlap rate discrimination: the occlusion overlap rate of each sample in the sample set is calculated, samples with higher occlusion overlap rates are given higher weights, and the region proposal network is used to simultaneously generate head candidate frames and overall candidate frame sets for all pedestrian instances in the scene;
S104, constructing a pedestrian head detection branch module and a pedestrian overall detection branch module, and obtaining a preliminary target detection result through the pedestrian head detection branch module and the pedestrian overall detection branch module, wherein the preliminary target detection result comprises a pedestrian head detection frame and a pedestrian overall detection frame;
S105, performing post-processing on the obtained pedestrian head detection frame and the pedestrian overall detection frame, and screening out redundant detection frames generated in the combined detection process to obtain a final pedestrian detection result;
S106, for further suppressing false detection and missing detection conditions in the post-processing link, constructing a joint loss function in the loss function part, wherein the joint loss function is used for punishing false detection conditions which are not suppressed by the correct non-maximum value and missing detection conditions which are suppressed by the incorrect non-maximum value, so that a final detection result is more accurate.
In S102, the fusing of the backbone network of the Faster R-CNN network model with the improved feature extraction network, and inputting the image to be detected into the Faster R-CNN model fused with the improved feature extraction network for feature extraction to obtain an extracted feature map, includes:
Taking ResNet network as backbone network of the fast R-CNN network model;
Learning characteristic information of an image to be detected by using the backbone network;
Feature fusion is performed on the feature information extracted by the backbone network. Drawing on the idea of dense connection, the feature-splicing strategy of the feature pyramid network FPN is improved so that the features of all scales participate in computing the output feature information. In this way the network absorbs both the high-level semantic information expressed by large-scale feature maps and the detail information, such as low-level texture, expressed by small-scale feature maps, strengthening its fusion of multi-scale features and yielding the predicted value corresponding to each layer of scale features, as shown in formula 1 and formula 2;
(1)
(2)
Wherein, the coefficients are all weight parameters, the function denotes the feature-map operation, and the inputs are the scale features of each layer obtained by the feature extraction network; during training, these parameters participate in the back propagation of the gradient and are updated through model training to the most appropriate values.
Further, in S103, the improving of the sampling mode of the region proposal network, constructing a non-uniform difficult-sample mining strategy based on occlusion overlap rate discrimination by calculating the occlusion overlap rate of each sample in the sample set and giving higher weights to samples with higher occlusion overlap rates, and simultaneously generating head candidate frames and overall candidate frame sets for all pedestrian instances in the scene using the region proposal network, includes:
the non-uniform difficult-sample mining strategy based on occlusion overlap rate discrimination introduces a decision threshold and, according to the average occlusion overlap rate of the sample set, divides the samples into a difficult set and a common set; the probability of each sample being sampled is then defined as shown in formula 3;
(3)
Wherein, the occlusion coefficient of each sample reflects the degree to which that sample is occluded;
The occlusion coefficient of each sample is further expressed as:
(4)
Wherein, the first quantity denotes the number of samples required for detection and the second the total number of candidate samples; the remaining terms denote, respectively, the occlusion overlap rate of each sample in the sample set and the average occlusion overlap rate of the entire set; the decision threshold may be set to different values according to the overall occlusion degree of the dataset;
The occlusion overlap rate of each sample in the sample set and the average occlusion overlap rate of the entire sample set are further expressed as:
(5)
(6)
Further, in S105, the post-processing is performed on the obtained pedestrian head detection frame and the pedestrian overall detection frame, and the redundant detection frames generated in the combined detection process are screened out, so as to obtain a final pedestrian detection result, which includes:
Penalty factors that decay the confidence are introduced for the head detection frames and the pedestrian overall detection frames obtained by the joint detection, gradually reducing the confidence scores of overlapping detection frames so that competition among overlapping frames is reduced without excessively suppressing them; the score of each pedestrian overall detection frame and the score of each head detection frame are expressed as:
(7)
(8)
(9)
(10)
Wherein, the first two terms are the overall frame and the head frame with the highest scores in the previous iteration, the next two are the NMS thresholds corresponding to the overall detection and the head detection respectively, and the last is a small fixed value set initially;
The confidence scores of the obtained head frame and overall frame are weighted and summed to obtain a joint confidence score, specifically expressed as:
(11)
Wherein, the coefficient represents the weight occupied by the head detection score; if the head overlap degree or the overall overlap degree is larger than a preset threshold, the detection frame with the lower confidence is suppressed.
Further, in S106, to further suppress the false detection and missing detection situations occurring in the post-processing link, a joint loss function is constructed in the loss function portion, where the joint loss function is used to penalize the false detection situations that are not suppressed by the correct non-maximum value and the missing detection situations that are suppressed by the incorrect non-maximum value, so that the final detection result is more accurate, and includes:
The joint loss function is specifically expressed as:
(12)
Wherein, the coefficients are weights that balance the losses; one pair of terms pulls the falsely detected head frame and overall frame closer so that they are removed, and the other pair pushes away the missed head frame and overall frame suppressed by erroneous non-maximum suppression, so that head frames and overall frames correctly correspond to their target frames;
The loss terms that push away the missed head frame and overall frame suppressed by erroneous non-maximum suppression are further expressed as:
(13)
(14)
The loss terms that pull in the falsely detected head frame and overall frame for rejection are further expressed as:
(15)
(16)
Wherein, the two thresholds are the NMS thresholds corresponding to the head detection branch and the overall detection branch respectively; after the redundant frames generated by the two branches are removed, the final pedestrian detection result is obtained.
The embodiment of the invention provides a multi-scale pedestrian detection method based on the combination of head and overall information. First, features of different layers are densely connected, i.e., all scale features participate in computing the final output features, so that each layer of features covers both semantic information and detail texture information, improving the network's sensitivity to multi-scale pedestrian targets. Second, the sampling mode of the region proposal network is optimized: the occlusion overlap rate of each sample in the sample set is calculated, and samples with higher occlusion overlap rates are given higher weights, strengthening the learning of severely occluded samples in the difficult set during training and further improving the model's ability to detect occluded pedestrian targets. A joint detection framework of pedestrian head and overall information is then constructed, with the aim of using head detection to assist pedestrian detection, thereby reducing the adverse effect that occlusion of pedestrian bodies has on detection. Finally, the post-processing link and the loss function module are optimized, weakening the interference of adjacent pedestrian targets on detection, improving the intelligence and rationality of screening redundant frames, and removing false detection frames generated by the two detection branches without excessively suppressing detection frames in dense locations, further reducing the missed detection rate and the false detection rate of pedestrian detection. Compared with previous algorithms, the pedestrian detection algorithm provided by the invention therefore has a stronger detection capability for multi-scale and occluded pedestrian targets in complex, crowd-dense scenes, and can reduce the missed detection rate of pedestrian targets.
Referring to Figs. 2, 3, 4 and 5, the invention aims to solve the problems that existing target detection algorithms suffer reduced detection precision and a high missed detection rate for small-scale and occluded pedestrian targets in complex, dense scenes. To further facilitate understanding of the scheme of the invention, a multi-scale pedestrian detection method based on the combination of head and overall information under one embodiment of the invention is described below; the overall method is shown in Fig. 2, and the specific steps include:
step 1, constructing a Faster R-CNN network model;
Step 2, fusing a backbone network of the Faster R-CNN network model with the improved feature extraction network, and inputting an image to be detected into the Faster R-CNN network model fused with the improved feature extraction network to perform feature extraction to obtain an extracted feature map;
Step 2.1, taking ResNet network as backbone network of the Faster R-CNN network model, wherein the structure diagram of ResNet network is shown in figure 3;
And 2.2, fusing the improved feature extraction network with a Faster R-CNN network model, wherein the overall structure of the improved feature extraction network is shown in figure 4. The specific implementation steps are as follows:
Step 2.2.1, learning characteristic information of an image to be detected by using a backbone network ResNet;
Step 2.2.2, feature fusion is performed on the feature information acquired by the backbone network: combining the dense connection idea, the feature-splicing strategy of the Feature Pyramid Network (FPN) is improved so that features of all scales participate in computing the output feature information, simultaneously absorbing the high-level semantic information expressed by large-scale feature maps and the detail information, such as low-level texture, expressed by small-scale feature maps, enhancing the network's fusion of multi-scale features, and obtaining the predicted value corresponding to each layer of scale features, as shown in formula (1) and formula (2).
(1)
(2)
Wherein, the coefficients are all weight parameters, the function denotes the feature-map operation, and the inputs are the scale features of each layer acquired by the feature extraction network. During training, these parameters participate in the back propagation of the gradient and are updated through model training to the most appropriate values. This allows the model to learn which connections benefit prediction by the detection module and which are invalid computation, thereby improving the performance of the feature extraction network.
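The dense fusion strategy of step 2.2.2 can be sketched as follows. Since formulas (1) and (2) appear only as images in the original filing, this NumPy sketch illustrates the idea under stated assumptions rather than reproducing the filed formulas: every backbone scale is resized to each output resolution and combined through learnable scalar weights (the `weights[i][j]` matrix here is hypothetical; in a real model it would hold trainable parameters updated by back-propagation).

```python
import numpy as np

def resize_to(feat, h, w):
    """Nearest-neighbour resize of an (H, W, C) feature map; a stand-in
    for the interpolation/pooling the real network would use."""
    H, W, _ = feat.shape
    rows = np.arange(h) * H // h
    cols = np.arange(w) * W // w
    return feat[rows][:, cols]

def dense_fuse(feats, weights):
    """Densely fuse pyramid features: every input scale contributes to
    every output scale through a scalar weight (hypothetical form).

    feats   : list of (H_i, W_i, C) arrays from the backbone
    weights : (n, n) array; weights[i, j] = contribution of input j
              to output i
    returns : list of fused maps, one per output scale
    """
    outputs = []
    for i, target in enumerate(feats):
        h, w, _ = target.shape
        fused = np.zeros_like(target, dtype=np.float64)
        for j, src in enumerate(feats):
            fused += weights[i, j] * resize_to(src.astype(np.float64), h, w)
        outputs.append(fused)
    return outputs
```

With an identity weight matrix the fusion degenerates to a plain FPN-style pass-through, which is a useful sanity check; training would instead learn which cross-scale connections help prediction.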
Step 3, the sampling mode of the region proposal network (Region Proposal Network, RPN) is improved to construct a non-uniform difficult-sample mining strategy based on occlusion overlap rate discrimination. The occlusion overlap rate of each sample in the sample set is calculated, and samples with higher occlusion overlap rates are given higher weights, strengthening the learning of severely occluded samples in the difficult set during training and further improving the model's ability to detect occluded pedestrian targets. The region proposal network is then used to simultaneously generate head candidate frames and overall candidate frame sets for all pedestrian instances in the scene.
Step 3.1, the non-uniform difficult-sample mining strategy based on occlusion overlap rate discrimination constructed in the invention introduces a decision threshold and, based on the average occlusion overlap rate of the sample set, divides the samples into a difficult set and a common set. The probability that each sample is sampled is then defined as shown in formula (3).
(3)
Wherein, the occlusion coefficient of each sample reflects the degree to which that sample is occluded.
Step 3.2, the occlusion coefficient of each sample in step 3.1 is further expressed as:
(4)
Wherein, the first quantity denotes the number of samples required for detection and the second the total number of candidate samples. The remaining terms denote, respectively, the occlusion overlap rate of each sample in the sample set and the average occlusion overlap rate of the entire set; the threshold may be set to different values according to the overall occlusion degree of the dataset.
Step 3.3, the occlusion overlap rate of each sample in the sample set and the average occlusion overlap rate of the entire sample set in step 3.2 are further expressed as:
(5)
(6)
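A minimal sketch of the non-uniform sampling idea of step 3. The filed formulas (3) to (6) are not reproduced in the text, so both the occlusion overlap rate used here (maximum IoU with any other candidate box) and the coefficient values (a hypothetical boost factor for the difficult set) are assumptions chosen only to illustrate the mechanism of weighting heavily occluded samples more.

```python
import numpy as np

def occlusion_overlap_rate(boxes, i):
    """One plausible definition of a sample's occlusion overlap rate:
    the maximum IoU between box i and any other candidate box."""
    x1, y1, x2, y2 = boxes[i]
    best = 0.0
    for j, (a1, b1, a2, b2) in enumerate(boxes):
        if j == i:
            continue
        iw = max(0.0, min(x2, a2) - max(x1, a1))
        ih = max(0.0, min(y2, b2) - max(y1, b1))
        inter = iw * ih
        union = (x2 - x1) * (y2 - y1) + (a2 - a1) * (b2 - b1) - inter
        if union > 0:
            best = max(best, inter / union)
    return best

def sampling_probs(boxes, boost=2.0):
    """Non-uniform sampling: samples whose overlap rate exceeds the set
    average (the 'difficult' set) receive `boost` times the base
    coefficient (hypothetical value); coefficients are then normalized
    into sampling probabilities."""
    rates = np.array([occlusion_overlap_rate(boxes, i) for i in range(len(boxes))])
    avg = rates.mean()
    coeff = np.where(rates > avg, boost, 1.0)
    return coeff / coeff.sum()
```

During RPN mini-batch assembly, candidates would then be drawn with these probabilities instead of uniformly, so heavily occluded proposals appear in training more often.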
and 4, constructing a pedestrian head detection branch and a pedestrian overall detection branch, and obtaining a preliminary target detection result through the pedestrian head detection branch module and the pedestrian overall detection branch module.
And 5, carrying out post-processing on the obtained pedestrian head detection frame and the pedestrian overall detection frame, and screening out redundant detection results generated in the combined detection process, wherein the overall flow pseudo code is shown in figure 5. The specific implementation steps are as follows:
Step 5.1, penalty factors that decay the confidence are introduced for the head detection frames and the pedestrian overall detection frames obtained by the joint detection, gradually reducing the confidence scores of overlapping detection frames so as to reduce competition among overlapping frames without excessively suppressing them. The score of each pedestrian overall detection frame and the score of each head detection frame can be expressed as:
(7)
(8)
(9)
(10)
Wherein, the first two terms are the overall frame and the head frame with the highest scores in the previous iteration, the next two are the NMS thresholds corresponding to the overall detection and the head detection respectively, and the last is a small fixed value set initially.
Step 5.2, the confidence scores of the head frame and the overall frame obtained in step 5.1 are weighted and summed to obtain a joint confidence score, which can be expressed as:
(11)
Wherein, the coefficient represents the weight occupied by the head detection score; if the head overlap degree or the overall overlap degree is larger than a preset threshold, the detection frame with the lower confidence is suppressed.
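The decaying-confidence post-processing of steps 5.1 and 5.2 can be illustrated with a Soft-NMS-style sketch. The linear (1 - IoU) decay and the weight `alpha` below are assumptions; the filed formulas (7) to (11) are given only as images, so this shows the mechanism, not the exact penalty.

```python
def decay_score(score, iou, nms_thresh):
    """Decaying-confidence penalty: a box overlapping the current
    top-scoring box beyond the NMS threshold has its score scaled by
    (1 - IoU) instead of being hard-deleted (Soft-NMS-style linear
    decay, assumed here)."""
    return score * (1.0 - iou) if iou > nms_thresh else score

def joint_confidence(s_head, s_full, alpha):
    """Joint score of step 5.2: weighted sum of the head and full-body
    confidences, with alpha the weight of the head detection score."""
    return alpha * s_head + (1.0 - alpha) * s_full
```

Iterating this decay over both branches, then suppressing whichever box has the lower joint confidence when head or overall overlap exceeds the preset threshold, reproduces the behaviour described in the text.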
Step 6, for the false detections and missed detections occurring in the non-maximum suppression link, the invention constructs a joint loss function in the loss function part, whose purpose is to penalize false detections not suppressed by correct non-maximum suppression and missed detections suppressed by erroneous non-maximum suppression. It is specifically expressed as:
(12)
Wherein, the coefficients are weights for balancing the losses. One pair of terms "pulls in" the falsely detected head frame and overall frame so that they are eliminated, and the other pair "pushes away" the missed head frame and overall frame suppressed by erroneous non-maximum suppression, so that they correctly correspond to their target frames.
Step 6.1, the loss terms that "push away" the missed head frame and overall frame suppressed by erroneous non-maximum suppression may be further expressed as:
(13)
(14)
Step 6.2, the loss terms that "pull in" the falsely detected head frame and overall frame to facilitate their rejection may be further expressed as:
(15)
(16)
Wherein, the two thresholds are the NMS thresholds corresponding to the head detection branch and the overall detection branch respectively. After the redundant frames generated by the two branches are removed, the final pedestrian detection result is obtained.
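A heavily hedged sketch of the push/pull idea in step 6. Formulas (13) to (16) are not reproduced in the text, so the IoU-based forms below are illustrative assumptions only: a pull term that shrinks as a falsely detected box overlaps the kept box (so NMS later removes the duplicate), and a push term that penalizes overlap above the NMS threshold between boxes of different targets (so neither is wrongly suppressed).

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def pull_loss(false_box, kept_box):
    """'Pull in' a falsely detected box: the loss falls as its overlap
    with the kept box rises, so NMS will remove the duplicate
    (hypothetical IoU-based form)."""
    return 1.0 - iou(false_box, kept_box)

def push_loss(missed_box, neighbor_box, nms_thresh):
    """'Push away' a box belonging to a different target: penalize
    overlap above the NMS threshold so the box is not wrongly
    suppressed (hypothetical form)."""
    return max(0.0, iou(missed_box, neighbor_box) - nms_thresh)
```

The joint loss of formula (12) would then be a weighted sum of such pull and push terms over the head and overall branches, with the balancing coefficients learned or tuned.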
The technical scheme designed by the invention can be well applied to the fields of intelligent monitoring, auxiliary safe driving, large-scale place video monitoring, intelligent robots and the like.
The effectiveness of the invention can be demonstrated by the following experimental verification:
(I) Experimental conditions and contents
1. The ImageNet dataset is used to pre-train the pedestrian detection model, which is then trained on the CrowdHuman, CityPersons and TJU-DHD-pedestrian datasets respectively; during training, data-enhancement strategies including random cropping, random horizontal flipping and random image-brightness perturbation are applied to the datasets. In the actual training process, a stochastic gradient descent optimizer is used to train for 35 epochs, with the momentum factor set to 0.9. The learning rate follows a warm-up strategy: the initial learning rate is set to a low value, increases linearly over iterations 1 to 800, and then remains unchanged. By the 24th epoch the learning rate is reduced to 10% of the original value, and by the 28th epoch to 1%.
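The learning-rate schedule described above (linear warm-up over the first 800 iterations, then step decay at epochs 24 and 28) can be sketched as follows; `base_lr` is a placeholder, since the filed starting value is not reproduced in the text, and epochs are assumed to be counted from 1.

```python
def learning_rate(iteration, epoch, base_lr, warmup_iters=800):
    """Warm-up plus step-decay schedule from the training setup:
    the rate climbs linearly over the first 800 iterations, stays at
    base_lr, then drops to 10% of base_lr from epoch 24 and to 1%
    from epoch 28 (base_lr stands in for the filed value)."""
    if epoch == 1 and iteration < warmup_iters:
        return base_lr * (iteration + 1) / warmup_iters
    if epoch >= 28:
        return base_lr * 0.01
    if epoch >= 24:
        return base_lr * 0.1
    return base_lr
```

In a PyTorch training loop this would typically be wired up through an optimizer's parameter groups or a `LambdaLR`-style scheduler rather than called by hand.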
2. The overall performance of the method involved in the invention is evaluated using the average precision (Average Precision, AP) and the log-average miss rate (MR⁻²) as evaluation indices. Their calculation can be expressed as follows:
Wherein, precision is the proportion of predictions judged to be pedestrian targets that correspond to pedestrian targets actually annotated in the image; recall is the proportion of annotated pedestrian targets that are correctly predicted; the miss rate is the proportion of pedestrian targets missed by the detector, and FPPI is the average number of false positives per image. The higher the AP value and the lower the MR⁻² value, the better the detection performance of the algorithm.
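MR⁻² is conventionally computed as the geometric mean of the miss rate sampled at nine FPPI points spaced evenly in log space over [10⁻², 10⁰] (the Caltech pedestrian-benchmark protocol). Since the patent's own formula images are not reproduced in the text, the sketch below follows that convention rather than the filed formulas.

```python
import math

def log_average_miss_rate(fppi, miss_rate):
    """MR^-2: geometric mean of the miss rate sampled at 9 FPPI values
    evenly spaced in log space over [1e-2, 1e0]. `fppi` must be
    increasing; `miss_rate` is the matching miss-rate curve."""
    refs = [10 ** (-2 + 0.25 * k) for k in range(9)]
    samples = []
    for r in refs:
        # miss rate at the largest FPPI not exceeding the reference point
        mrs = [m for f, m in zip(fppi, miss_rate) if f <= r]
        samples.append(mrs[-1] if mrs else miss_rate[0])
    return math.exp(sum(math.log(max(m, 1e-10)) for m in samples) / len(samples))
```

A flat miss-rate curve returns its own value, which is a convenient sanity check when wiring this into an evaluation script.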
(II) results of experiments
As shown in Figs. 6, 7, 8a and 8b, on the CrowdHuman, CityPersons and TJU-DHD-pedestrian pedestrian detection datasets, the method achieves higher average-precision values and lower log-average miss-rate values than several popular pedestrian detection algorithms and the baseline Faster R-CNN algorithm before improvement. The experimental results therefore fully show that the method outperforms these popular pedestrian detection algorithms and better detects multi-scale pedestrian targets and severely occluded pedestrian targets.
Compared with the prior art, the multi-scale pedestrian detection method provided by the invention has the following characteristics:
1. Compared with the existing YOLO-series and R-CNN-series network models, in the feature extraction link the method densely connects features of different layers and lets all scale features participate in computing the final output features, so that each layer of the resulting features covers both semantic information and detail texture information, improving the network's detection of multi-scale pedestrian targets. From the perspective of information flow, the feature extraction network constructed by the invention can learn rich and detailed multi-scale features: deep semantic information is fused into shallow features, shallow detail texture information is fused into deep features, and the network's sensitivity to multi-scale pedestrian targets is improved. From the perspective of parameter optimization, when the dense connection strategy is adopted to fuse multi-scale features, gradients can be back-propagated more efficiently in the actual training stage, accelerating the convergence of the network model without excessively increasing network complexity, which is more conducive to obtaining a high-quality globally optimal solution.
2. Compared with the existing R-CNN-series network models, the invention optimizes the sampling mode of the region proposal network: the occlusion overlap rate of each sample in the sample set is calculated, and samples with higher occlusion overlap rates are given higher weights, strengthening the learning of severely occluded samples in the difficult set during training. As training proceeds, the ability to detect occluded pedestrian targets gradually improves, showing higher performance in complex crowd-dense scenes.
3. Compared with the existing YOLO series network model and the R-CNN series network model, the method provided by the invention designs the head and the whole double detection branches at the core part of the model to carry out joint detection, so that the head detection is fully utilized to assist the pedestrian detection, and the influence of the shielding of the pedestrian body on the performance of the detector is reduced.
4. Compared with the existing YOLO series network model and the R-CNN series network model, the method disclosed by the invention optimizes the post-processing link and the loss function module, weakens the interference caused by adjacent pedestrian targets on detection, improves the intelligence and rationality of screening redundant frames, and enables the densely-located detection frames not to be excessively restrained and simultaneously eliminates false detection frames generated by two detection branches, thereby further reducing the omission ratio and the false detection ratio of pedestrian detection.
Accordingly, as shown in fig. 9, an embodiment of the present invention provides a multi-scale pedestrian detection device based on joint head and overall information, including:
a building unit 901, configured to build a Faster R-CNN network model;
an extracting unit 902, configured to fuse the backbone network of the Faster R-CNN network model with an improved feature extraction network, and to input the image to be detected into the Faster R-CNN model fused with the improved feature extraction network for feature extraction, obtaining an extracted feature map;
a generating unit 903, configured to improve the sampling scheme of the region proposal network by constructing a non-uniform hard-sample mining strategy based on occlusion overlap rate discrimination: the occlusion overlap rate of each sample in the sample set is computed, samples with higher occlusion overlap rates are given higher weights, and the region proposal network then simultaneously generates head candidate boxes and overall candidate box sets for all pedestrian instances in the scene;
a primary detection unit 904, configured to construct a pedestrian head detection branch module and a pedestrian overall detection branch module, and to obtain a preliminary target detection result through the two modules, the preliminary target detection result including pedestrian head detection frames and pedestrian overall detection frames;
a post-processing unit 905, configured to post-process the obtained pedestrian head detection frames and pedestrian overall detection frames, and to filter out the redundant detection frames generated in the joint detection process, obtaining the final pedestrian detection result;
an output unit 906, configured to construct a joint loss function in the loss-function part in order to further suppress the false detections and missed detections arising in the post-processing stage, the joint loss function penalizing false detections not suppressed by correct non-maximum suppression and missed detections suppressed by erroneous non-maximum suppression, so that the final detection result is more accurate.
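As an informal sketch of how these units cooperate, the pipeline can be written as a thin orchestration class. All stage callables below are hypothetical placeholders (the patent does not publish code); only the data flow mirrors the units described above.

```python
class PedestrianDetector:
    """Minimal orchestration sketch of the device's units 901-906.  The stage
    callables are hypothetical stand-ins for the patent's modules, not its
    actual implementation."""

    def __init__(self, backbone, rpn, head_branch, body_branch, postproc):
        self.backbone = backbone          # extracting unit 902
        self.rpn = rpn                    # generating unit 903
        self.head_branch = head_branch    # primary detection unit 904
        self.body_branch = body_branch
        self.postproc = postproc          # post-processing unit 905

    def detect(self, image):
        feats = self.backbone(image)                  # extract feature maps
        head_props, body_props = self.rpn(feats)      # joint candidate boxes
        heads = self.head_branch(feats, head_props)   # head detection frames
        bodies = self.body_branch(feats, body_props)  # overall detection frames
        return self.postproc(heads, bodies)           # redundant boxes filtered
```

Any concrete networks (a Faster R-CNN backbone, the improved RPN, the two branches) would simply be passed in as the five callables.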
As a preferred solution, the extracting unit 902 is specifically configured to:
using the ResNet50 network as the backbone network of the Faster R-CNN network model;
learning the feature information of the image to be detected with the backbone network;
performing feature fusion on the feature information obtained by the backbone network, and improving the feature splicing strategy of the feature pyramid network FPN with the dense-connection idea, i.e., letting features of all scales participate in computing the output feature information; this absorbs the advantages of large-scale feature maps in expressing high-level semantic information and of small-scale feature maps in expressing detail information such as low-level texture, and strengthens the network's fusion of multi-scale features; the predicted values corresponding to the scale features of each layer are given by Equation 1 and Equation 2;
(1)
(2)
where the weight parameters and the feature mapping function act on the scale features of each layer obtained by the feature extraction network; in the training process, the weight parameters participate in gradient back-propagation and are learned and updated to the most appropriate values through model training.
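The dense fusion idea (every scale level contributes to every output level) can be sketched as follows. Since the formulas of Equations 1 and 2 are published only as images, this minimal NumPy illustration is an assumption: each output level is approximated as a learned weighted sum of all backbone levels resized to the target resolution, and the `weights` matrix stands in for the learnable parameters that the text says are updated by back-propagation. The nearest-neighbor resize and all names are illustrative, not the patent's implementation.

```python
import numpy as np

def resize_to(feat, size):
    """Nearest-neighbor resize of a (C, H, W) feature map to (C, size, size)."""
    c, h, w = feat.shape
    ys = np.arange(size) * h // size
    xs = np.arange(size) * w // size
    return feat[:, ys][:, :, xs]

def dense_fuse(feats, weights):
    """Densely fuse scale levels: each output level i is a weighted sum of
    *all* input levels resized to level i's resolution, so every output layer
    mixes deep semantic and shallow texture information.
    feats: list of (C, H_i, W_i) maps (coarse to fine); weights: (L, L)."""
    outputs = []
    for i, f in enumerate(feats):
        size = f.shape[1]
        fused = sum(weights[i, j] * resize_to(g, size)
                    for j, g in enumerate(feats))
        outputs.append(fused)
    return outputs

# toy pyramid: 4 levels, 8 channels, square maps
levels = [np.random.rand(8, s, s).astype(np.float32) for s in (8, 16, 32, 64)]
w = np.full((4, 4), 0.25, dtype=np.float32)   # stand-in for learned weights
fused = dense_fuse(levels, w)
print([f.shape for f in fused])
```

In a real model the weighted sum would be replaced by learnable convolutions and the weights trained end to end, as the description indicates.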
As a preferable solution, the generating unit 903 is specifically configured to:
the non-uniform hard-sample mining strategy based on occlusion overlap rate discrimination introduces a decision threshold and divides the sample set, according to the average occlusion overlap rate, into a hard set and an ordinary set; the probability of each sample being drawn is defined as shown in Equation 3;
(3)
where the occlusion coefficient of a given sample reflects the degree to which that sample is occluded;
the occlusion coefficient of a sample is further expressed as:
(4)
where the symbols denote, respectively, the number of samples required for detection, the total number of candidate samples, the occlusion overlap rate of a given sample in the sample set, and the average occlusion overlap rate of the whole sample set; the decision threshold may be set to different values depending on the overall occlusion level of the dataset;
the occlusion overlap rate of a given sample and the average occlusion overlap rate of the whole sample set are further expressed as:
(5)
(6)
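The sampling strategy can be illustrated with a small sketch. Since Equations 3-6 are published only as images, the concrete rule below is an assumption: samples whose occlusion overlap rate exceeds the decision threshold form the hard set and receive a larger, hypothetical occlusion coefficient `hard_weight`; normalizing the coefficients yields each sample's drawing probability, so heavily occluded samples are drawn more often during training.

```python
import numpy as np

def sampling_probs(occ_rates, tau=0.35, hard_weight=3.0):
    """Non-uniform hard-sample mining sketch.  Samples with occlusion overlap
    rate above the decision threshold `tau` form the hard set and get a larger
    occlusion coefficient; probabilities are the normalized coefficients.
    `hard_weight` and the coefficient rule are assumptions (the patent's exact
    formulas are not reproduced in the text)."""
    occ = np.asarray(occ_rates, dtype=np.float64)
    coeff = np.where(occ > tau, hard_weight, 1.0)  # occlusion coefficient
    return coeff / coeff.sum()                     # p_i = s_i / sum_j s_j

occ = [0.1, 0.5, 0.7, 0.2]
p = sampling_probs(occ)
# draw a training mini-batch biased toward occluded samples
idx = np.random.default_rng(0).choice(len(occ), size=2, replace=False, p=p)
```

With these toy rates, the two occluded samples each receive three times the drawing probability of the unoccluded ones.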
As a preferred solution, the post-processing unit 905 is specifically configured to:
A decaying-confidence penalty factor is introduced separately for the head detection frames and the pedestrian overall detection frames obtained by the joint detection framework; by gradually lowering the confidence scores of overlapping detection frames, competition among overlapping frames is reduced without over-suppressing them. The score of a pedestrian overall detection frame and the score of a head detection frame are expressed as:
(7)
(8)
(9)
(10)
where the symbols denote the overall frame and the head frame with the highest scores in the previous iteration, the NMS thresholds corresponding to overall detection and head detection, and a small fixed value set initially;
the confidence scores of the obtained head frame and overall frame are weighted and summed to obtain a joint confidence score, expressed as:
(11)
where the weight represents the proportion of the head detection score; if the head overlap or the overall overlap exceeds a preset threshold, the detection frame with lower confidence is suppressed.
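The decaying-confidence idea is in the spirit of soft-NMS. The sketch below is hedged: Equations 7-10 are published only as images, so a Gaussian decay above the NMS threshold is assumed, and the joint confidence of Equation 11 is written with a hypothetical head weight `w_head`; none of these names or choices come from the patent itself.

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def soft_penalty(score, overlap, nms_thr, sigma=0.5):
    """Decaying-confidence penalty (soft-NMS style): below the NMS threshold
    the score is kept; above it, the score decays with the overlap instead of
    being zeroed, so dense true positives are not over-suppressed."""
    if overlap <= nms_thr:
        return score
    return score * np.exp(-(overlap ** 2) / sigma)

def joint_score(head_score, body_score, w_head=0.4):
    """Weighted joint confidence of a head/overall frame pair."""
    return w_head * head_score + (1.0 - w_head) * body_score

best = (0, 0, 10, 10)            # highest-scoring frame of previous iteration
cand = (1, 1, 11, 11)            # heavily overlapping candidate
s = soft_penalty(0.9, iou(best, cand), nms_thr=0.5)
```

A final step would then suppress a frame only when both its decayed overall score and the joint score fall below the preset threshold.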
As a preferred solution, the output unit 906 is specifically configured to:
The joint loss function is specifically expressed as:
(12)
where the coefficients are weights balancing the losses; one pair of loss terms pulls falsely detected head frames and overall frames closer so that they can be removed, and the other pair pushes apart missed head frames and overall frames suppressed by erroneous non-maximum suppression, so that head frames and overall frames correctly correspond to the target frames;
the loss terms that push apart the missed head frames and overall frames suppressed by erroneous non-maximum suppression are further expressed as:
(13)
(14)
the loss terms that pull the falsely detected head frames and overall frames closer for removal are further expressed as:
(15)
(16)
where the symbols are the NMS thresholds corresponding to the head detection branch and the overall detection branch; after the redundant frames produced by the two branches are removed, the final pedestrian detection result is obtained.
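The pull/push structure of the joint loss can be sketched with IoU-based terms. Equations 12-16 are published only as images, so the term forms below are assumptions in the spirit of the description: an attraction term (1 - IoU) for falsely detected head/overall pairs so they can be matched and removed, and a repulsion term penalizing overlap beyond each branch's NMS threshold for frames that were wrongly suppressed; function names and default weights are illustrative.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def pull_loss(pred, mate):
    """Attraction term: drives a falsely detected frame toward its mate so the
    pair can be identified and rejected (vanishes at full overlap)."""
    return 1.0 - iou(pred, mate)

def push_loss(pred, rival, nms_thr=0.5):
    """Repulsion term: penalizes overlap above the branch NMS threshold so a
    missed frame wrongly suppressed by NMS is pushed away from its rival."""
    return max(0.0, iou(pred, rival) - nms_thr)

def joint_loss(head, body, head_mate, body_mate, rival_h, rival_b,
               w=(1.0, 1.0, 1.0, 1.0), thr_h=0.5, thr_b=0.5):
    """Weighted sum of the four terms; the weights balance the losses as the
    description states, but their values here are placeholders."""
    a, b, c, d = w
    return (a * pull_loss(head, head_mate) + b * pull_loss(body, body_mate)
            + c * push_loss(head, rival_h, thr_h)
            + d * push_loss(body, rival_b, thr_b))
```

In training, `joint_loss` would be added to the standard Faster R-CNN classification and regression losses rather than replacing them.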
The embodiment of the invention provides a multi-scale pedestrian detection device based on joint head and overall information. First, features from different layers are densely connected, that is, features of all scales participate in computing the final output features, so that every feature layer covers both semantic information and detail texture information, improving the network's sensitivity to multi-scale pedestrian targets. Second, the sampling scheme of the region proposal network is optimized: the occlusion overlap rate of each sample in the sample set is computed and samples with higher occlusion overlap rates are given higher weights, strengthening the learning of heavily occluded samples in the hard-sample set during training and improving the model's ability to detect occluded pedestrians. Next, a joint detection framework combining pedestrian head and overall information is constructed, in which head detection assists pedestrian detection and reduces the adverse effect of body occlusion on detection. Finally, the post-processing stage and the loss-function module are optimized, weakening the interference of adjacent pedestrian targets on detection, making the screening of redundant boxes more intelligent and reasonable, and removing the false boxes produced by the two detection branches while not over-suppressing detection boxes in dense regions, thereby further reducing the miss rate and false-detection rate. Compared with previous algorithms, the pedestrian detection algorithm of the invention therefore has stronger detection capability for multi-scale and occluded pedestrian targets in complex, crowd-dense scenes and reduces the miss rate for pedestrian targets.
While embodiments of the present invention have been illustrated and described above, it will be appreciated that the above-described embodiments are illustrative and should not be construed as limiting the invention. Variations, modifications, substitutions and equivalents of the above-described embodiments may be made by those of ordinary skill in the art within the scope of the present invention.
The above embodiments do not limit the scope of the present invention. Any other corresponding changes and modifications made in accordance with the technical idea of the present invention shall fall within the scope of the claims of the present invention.

Claims (10)

1. A multi-scale pedestrian detection method based on joint head and overall information, characterized by comprising:
building a Faster R-CNN network model;
fusing the backbone network of the Faster R-CNN network model with an improved feature extraction network, and inputting the image to be detected into the Faster R-CNN model fused with the improved feature extraction network for feature extraction, to obtain an extracted feature map;
improving the sampling scheme of the region proposal network by constructing a non-uniform hard-sample mining strategy based on occlusion overlap rate discrimination: computing the occlusion overlap rate of each sample in the sample set, assigning higher weights to samples with higher occlusion overlap rates, and using the region proposal network to simultaneously generate head candidate boxes and overall candidate box sets for all pedestrian instances in the scene;
constructing a pedestrian head detection branch module and a pedestrian overall detection branch module, and obtaining a preliminary target detection result through the two modules, the preliminary target detection result including pedestrian head detection frames and pedestrian overall detection frames;
post-processing the obtained pedestrian head detection frames and pedestrian overall detection frames, and filtering out the redundant detection frames generated in the joint detection process, to obtain the final pedestrian detection result;
in order to further suppress the false detections and missed detections arising in the post-processing stage, constructing a joint loss function in the loss-function part, the joint loss function penalizing false detections not suppressed by correct non-maximum suppression and missed detections suppressed by erroneous non-maximum suppression, so that the final detection result is more accurate.
2. The multi-scale pedestrian detection method based on joint head and overall information according to claim 1, characterized in that fusing the backbone network of the Faster R-CNN network model with the improved feature extraction network and inputting the image to be detected into the Faster R-CNN model fused with the improved feature extraction network for feature extraction, to obtain the extracted feature map, comprises:
using the ResNet50 network as the backbone network of the Faster R-CNN network model;
learning the feature information of the image to be detected with the backbone network;
performing feature fusion on the feature information obtained by the backbone network, and improving the feature splicing strategy of the feature pyramid network FPN with the dense-connection idea, i.e., letting features of all scales participate in computing the output feature information, absorbing the advantages of large-scale feature maps in expressing high-level semantic information and of small-scale feature maps in expressing detail information such as low-level texture, and enhancing the network's fusion of multi-scale features; the predicted values corresponding to the scale features of each layer are given by Equation 1 and Equation 2;
(1)
(2)
where the weight parameters and the feature mapping function act on the scale features of each layer obtained by the feature extraction network; in the training process, the parameters participate in gradient back-propagation and are learned and updated to the most appropriate values through model training.
3. The multi-scale pedestrian detection method based on joint head and overall information according to claim 2, characterized in that improving the sampling scheme of the region proposal network, constructing the non-uniform hard-sample mining strategy based on occlusion overlap rate discrimination, computing the occlusion overlap rate of each sample in the sample set, assigning higher weights to samples with higher occlusion overlap rates, and using the region proposal network to simultaneously generate head candidate boxes and overall candidate box sets for all pedestrian instances in the scene, comprises:
introducing a decision threshold and dividing the sample set, according to the average occlusion overlap rate, into a hard set and an ordinary set; the probability of each sample being drawn is defined as shown in Equation 3;
(3)
where the occlusion coefficient of a given sample reflects the degree to which that sample is occluded;
the occlusion coefficient of a sample is further expressed as:
(4)
where the symbols denote, respectively, the number of samples required for detection, the total number of candidate samples, the occlusion overlap rate of a given sample in the sample set, and the average occlusion overlap rate of the whole sample set; the decision threshold may be set to different values depending on the overall occlusion level of the dataset;
the occlusion overlap rate of a given sample and the average occlusion overlap rate of the whole sample set are further expressed as:
(5)
(6).
4. The multi-scale pedestrian detection method based on joint head and overall information according to claim 3, characterized in that post-processing the obtained pedestrian head detection frames and pedestrian overall detection frames and filtering out the redundant detection frames generated in the joint detection process, to obtain the final pedestrian detection result, comprises:
introducing a decaying-confidence penalty factor separately for the head detection frames and the pedestrian overall detection frames obtained by the joint detection framework, gradually lowering the confidence scores of overlapping detection frames so as to reduce competition among overlapping frames without over-suppressing them; the score of a pedestrian overall detection frame and the score of a head detection frame are expressed as:
(7)
(8)
(9)
(10)
where the symbols denote the overall frame and the head frame with the highest scores in the previous iteration, the NMS thresholds corresponding to overall detection and head detection, and a small fixed value set initially;
weighting and summing the confidence scores of the obtained head frame and overall frame to obtain a joint confidence score, expressed as:
(11)
where the weight represents the proportion of the head detection score; if the head overlap or the overall overlap exceeds a preset threshold, the detection frame with lower confidence is suppressed.
5. The multi-scale pedestrian detection method based on joint head and overall information according to claim 4, characterized in that constructing the joint loss function in the loss-function part in order to further suppress the false detections and missed detections arising in the post-processing stage comprises:
the joint loss function is specifically expressed as:
(12)
where the coefficients are weights balancing the losses; one pair of loss terms pulls falsely detected head frames and overall frames closer so that they can be removed, and the other pair pushes apart missed head frames and overall frames suppressed by erroneous non-maximum suppression, so that head frames and overall frames correctly correspond to the target frames;
the loss terms that push apart the missed head frames and overall frames suppressed by erroneous non-maximum suppression are further expressed as:
(13)
(14)
the loss terms that pull the falsely detected head frames and overall frames closer for removal are further expressed as:
(15)
(16)
where the symbols are the NMS thresholds corresponding to the head detection branch and the overall detection branch; after the redundant frames produced by the two branches are removed, the final pedestrian detection result is obtained.
6. A multi-scale pedestrian detection device based on joint head and overall information, characterized by comprising:
a building unit, configured to build a Faster R-CNN network model;
an extracting unit, configured to fuse the backbone network of the Faster R-CNN network model with an improved feature extraction network, and to input the image to be detected into the Faster R-CNN model fused with the improved feature extraction network for feature extraction, obtaining an extracted feature map;
a generating unit, configured to improve the sampling scheme of the region proposal network by constructing a non-uniform hard-sample mining strategy based on occlusion overlap rate discrimination: the occlusion overlap rate of each sample in the sample set is computed, samples with higher occlusion overlap rates are given higher weights, and the region proposal network simultaneously generates head candidate boxes and overall candidate box sets for all pedestrian instances in the scene;
a preliminary detection unit, configured to construct a pedestrian head detection branch module and a pedestrian overall detection branch module, and to obtain a preliminary target detection result through the two modules, the preliminary target detection result including pedestrian head detection frames and pedestrian overall detection frames;
a post-processing unit, configured to post-process the obtained pedestrian head detection frames and pedestrian overall detection frames, and to filter out the redundant detection frames generated in the joint detection process, obtaining the final pedestrian detection result;
an output unit, configured to construct a joint loss function in the loss-function part in order to further suppress the false detections and missed detections arising in the post-processing stage, the joint loss function penalizing false detections not suppressed by correct non-maximum suppression and missed detections suppressed by erroneous non-maximum suppression, so that the final detection result is more accurate.
7. The multi-scale pedestrian detection device based on joint head and overall information according to claim 6, characterized in that the extracting unit is specifically configured to:
use the ResNet50 network as the backbone network of the Faster R-CNN network model;
learn the feature information of the image to be detected with the backbone network;
perform feature fusion on the feature information obtained by the backbone network, and improve the feature splicing strategy of the feature pyramid network FPN with the dense-connection idea, i.e., let features of all scales participate in computing the output feature information, absorbing the advantages of large-scale feature maps in expressing high-level semantic information and of small-scale feature maps in expressing detail information such as low-level texture, and enhancing the network's fusion of multi-scale features; the predicted values corresponding to the scale features of each layer are given by Equation 1 and Equation 2;
(1)
(2)
where the weight parameters and the feature mapping function act on the scale features of each layer obtained by the feature extraction network; in the training process, the parameters participate in gradient back-propagation and are learned and updated to the most appropriate values through model training.
8. The multi-scale pedestrian detection device according to claim 7, characterized in that the generating unit is specifically configured to:
introduce a decision threshold and divide the sample set, according to the average occlusion overlap rate, into a hard set and an ordinary set, the probability of each sample being drawn being defined as shown in Equation 3;
(3)
where the occlusion coefficient of a given sample reflects the degree to which that sample is occluded;
the occlusion coefficient of a sample is further expressed as:
(4)
where the symbols denote, respectively, the number of samples required for detection, the total number of candidate samples, the occlusion overlap rate of a given sample in the sample set, and the average occlusion overlap rate of the whole sample set; the decision threshold may be set to different values depending on the overall occlusion level of the dataset;
the occlusion overlap rate of a given sample and the average occlusion overlap rate of the whole sample set are further expressed as:
(5)
(6).
9. The multi-scale pedestrian detection device based on joint head and overall information according to claim 8, characterized in that the post-processing unit is specifically configured to:
introduce a decaying-confidence penalty factor separately for the head detection frames and the pedestrian overall detection frames obtained by the joint detection framework, gradually lowering the confidence scores of overlapping detection frames so as to reduce competition among overlapping frames without over-suppressing them, the score of a pedestrian overall detection frame and the score of a head detection frame being expressed as:
(7)
(8)
(9)
(10)
where the symbols denote the overall frame and the head frame with the highest scores in the previous iteration, the NMS thresholds corresponding to overall detection and head detection, and a small fixed value set initially;
weight and sum the confidence scores of the obtained head frame and overall frame to obtain a joint confidence score, expressed as:
(11)
where the weight represents the proportion of the head detection score; if the head overlap or the overall overlap exceeds a preset threshold, the detection frame with lower confidence is suppressed.
10. The multi-scale pedestrian detection device based on joint head and overall information according to claim 9, characterized in that the output unit is specifically configured to:
express the joint loss function as:
(12)
where the coefficients are weights balancing the losses; one pair of loss terms pulls falsely detected head frames and overall frames closer so that they can be removed, and the other pair pushes apart missed head frames and overall frames suppressed by erroneous non-maximum suppression, so that head frames and overall frames correctly correspond to the target frames;
the loss terms that push apart the missed head frames and overall frames suppressed by erroneous non-maximum suppression are further expressed as:
(13)
(14)
the loss terms that pull the falsely detected head frames and overall frames closer for removal are further expressed as:
(15)
(16)
where the symbols are the NMS thresholds corresponding to the head detection branch and the overall detection branch; after the redundant frames produced by the two branches are removed, the final pedestrian detection result is obtained.
CN202510409873.6A 2025-04-02 2025-04-02 Multi-scale pedestrian detection method and device based on joint head and overall information Active CN119904894B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510409873.6A CN119904894B (en) 2025-04-02 2025-04-02 Multi-scale pedestrian detection method and device based on joint head and overall information


Publications (2)

Publication Number Publication Date
CN119904894A true CN119904894A (en) 2025-04-29
CN119904894B CN119904894B (en) 2025-07-25

Family

ID=95466775


Country Status (1)

Country Link
CN (1) CN119904894B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108596958A (en) * 2018-05-10 2018-09-28 安徽大学 Target tracking method based on difficult positive sample generation
CN110348437A (en) * 2019-06-27 2019-10-18 电子科技大学 It is a kind of based on Weakly supervised study with block the object detection method of perception
CN111898406A (en) * 2020-06-05 2020-11-06 东南大学 Face detection method based on focal loss and multi-task cascade
CN117765534A (en) * 2023-09-11 2024-03-26 之江实验室 Automatic traffic image labeling method and device based on difficult sample mining
CN119027986A (en) * 2024-10-30 2024-11-26 中国科学院长春光学精密机械与物理研究所 A pedestrian detection method and device that resists occlusion overlap and scale change
CN119559681A (en) * 2024-11-18 2025-03-04 杭州电子科技大学 An occluded face recognition method based on FaceNet and attention mechanism


Non-Patent Citations (1)

Title
Li Xiaoyan et al.: "Multimodal Pedestrian Detection Algorithm Based on Deep Learning", Journal of Xi'an Jiaotong University, 31 October 2022 (2022-10-31), pages 61-70 *

Also Published As

Publication number Publication date
CN119904894B (en) 2025-07-25

Similar Documents

Publication Publication Date Title
CN113420607A (en) Multi-scale target detection and identification method for unmanned aerial vehicle
CN115346177A (en) Novel system and method for detecting target under road side view angle
CN109977793A (en) Trackside image pedestrian's dividing method based on mutative scale multiple features fusion convolutional network
CN112766087A (en) Optical remote sensing image ship detection method based on knowledge distillation
CN112949508A (en) Model training method, pedestrian detection method, electronic device and readable storage medium
CN111814621A (en) A multi-scale vehicle pedestrian detection method and device based on attention mechanism
WO2020181685A1 (en) Vehicle-mounted video target detection method based on deep learning
CN110084292A (en) Object detection method based on DenseNet and multi-scale feature fusion
CN114120127B (en) Target detection method, device and related equipment
CN109559302A (en) Pipe video defect inspection method based on convolutional neural networks
CN107563372A (en) A kind of license plate locating method based on deep learning SSD frameworks
CN115861772A (en) Multi-scale single-stage target detection method based on RetinaNet
CN118015490B (en) A method, system and electronic device for detecting small targets in unmanned aerial image
CN115131640A (en) Target detection method and system utilizing illumination guide and attention mechanism
CN115223017A (en) A multi-scale feature fusion bridge detection method based on depthwise separable convolution
CN115731517B (en) A crowd detection method based on Crowd-RetinaNet network
Wu et al. Traffic sign detection based on SSD combined with receptive field module and path aggregation network
CN111191531A (en) Rapid pedestrian detection method and system
CN116778277A (en) Cross-domain model training method based on progressive information decoupling
CN119027986B (en) A pedestrian detection method and device that resists occlusion overlap and scale change
CN113642388A (en) Improved mask wearing intelligent detection method based on YOLOv3
CN117037264A (en) Prison personnel abnormal behavior identification method based on target and key point detection
CN114565892A (en) A method of electric vehicle identification and elevator protection based on machine vision
CN116665016A (en) A Single Frame Infrared Weak Small Target Detection Method Based on Improved YOLOv5
CN116863271A (en) Lightweight infrared flame detection method based on improved YOLO V5

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant