
CN118865336A - A lightweight YOLOv8-pose based method for fatigue driving detection - Google Patents

A lightweight YOLOv8-pose based method for fatigue driving detection

Info

Publication number
CN118865336A
Authority
CN
China
Prior art keywords
model
pose
fatigue
lightweight
yolov
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410896825.XA
Other languages
Chinese (zh)
Inventor
林志贤
蔡忠祺
林珊玲
林坚普
吕珊红
师欣雨
刘珂
张建豪
赖芳伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN202410896825.XA priority Critical patent/CN118865336A/en
Publication of CN118865336A publication Critical patent/CN118865336A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/59 Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G06V20/597 Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18 Eye characteristics, e.g. of the iris

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Ophthalmology & Optometry (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Image Analysis (AREA)

Abstract

The present invention proposes a lightweight YOLOv8-pose method for fatigue driving detection. The method builds a lightweight YOLOv8-pose model from a collected multi-pose, multi-view face dataset. Ghost convolution is introduced into the model backbone to cut the parameter count and unnecessary convolution computation; a Slim-neck fuses the multi-scale features extracted by the backbone to speed up network prediction; an occlusion-aware attention module SEAM is added to the Neck of the model to emphasize the face region in the image and suppress the background, improving keypoint localization; and the detection head adopts a GNSC-Head structure, which uses shared convolutions and optimizes the convolutions' BN layers into more stable GN layers to save the model's parameter space and computing resources. The method then builds a fatigue decision model and evaluates the model's outputs to judge whether the driver is in a fatigue state. The method helps reduce computational complexity while improving fatigue detection accuracy.

Description

A lightweight YOLOv8-pose based method for fatigue driving detection

Technical Field

The present invention relates to the technical field of fatigue driving detection, and in particular to a lightweight YOLOv8-pose method for fatigue driving detection.

Background Art

In recent years, with socioeconomic development, living standards have risen and per-capita car ownership has grown. With the rapid growth in vehicle numbers and road mileage, however, traffic accidents have also increased to a certain extent, and about 18% of them are caused by driver fatigue. Surveys have found that 90% of such accidents could be avoided if drivers were given a reminder or warning at the onset of fatigue. An efficient fatigue driving detection method is therefore of great significance to traffic safety.

Current research on fatigue driving detection falls into contact and non-contact detection. Contact detection obtains the driver's physiological signals through worn sensors, typically assessing fatigue from EEG and ECG signals; although its recognition accuracy is high, it requires complex wearable hardware and is hard to popularize. Non-contact detection mainly takes two approaches. The first infers fatigue from the vehicle trajectory; it requires no contact with the driver but is easily affected by road and weather conditions and has low accuracy. The second uses an image acquisition device to capture the driver's face and lets a computer judge the driver's fatigue state. Compared with physiological-signal and vehicle-trajectory acquisition equipment, an in-cabin camera is cheap, convenient, and does not interfere with normal driving, so this line of research has greater application potential.

However, existing deep-learning-based fatigue detection algorithms all have shortcomings. Some rely on a single feature and ignore the driver's various fatigue states, giving poor robustness. Others achieve high detection accuracy but use models whose parameter counts and computation make them hard to deploy on in-vehicle edge devices. Still others use lightweight models but chain together too many separate detectors; the separate models must be trained individually, cannot be optimized end to end, and incur transmission delays between models, degrading the real-time performance of the final fatigue detection.

Summary of the Invention

The present invention proposes a lightweight YOLOv8-pose method for fatigue driving detection, which helps reduce computational complexity while improving fatigue detection accuracy.

The present invention adopts the following technical solutions.

A lightweight YOLOv8-pose method for fatigue driving detection: the method builds a lightweight YOLOv8-pose model from a collected multi-pose, multi-view face dataset; introduces Ghost convolution into the model backbone to cut the parameter count and unnecessary convolution computation; introduces a Slim-neck that fuses the multi-scale features extracted by the backbone to speed up network prediction; adds an occlusion-aware attention module SEAM to the Neck of the model to emphasize the face region in the image and suppress the background, improving keypoint localization; and adopts a GNSC-Head structure in the detection head, which uses shared convolutions and optimizes the convolutions' BN layers into more stable GN layers to save the model's parameter space and computing resources. The method builds a fatigue decision model and evaluates the model's outputs to judge whether the driver is in a fatigue state.

The method comprises the following steps:

Step S1: collect a multi-pose, multi-view face dataset and convert its annotation files to generate the corresponding YOLO annotation files;

Step S2: build and train a lightweight YOLOv8-pose model; use the trained model to detect the face in an input image, locate the facial keypoints once a face is identified, and finally classify and regress in the detection head to obtain the keypoint coordinates of the driver's eyes and mouth;

Step S3: build a fatigue decision model that judges fatigue from the eye and mouth features; apply the PERCLOS evaluation criterion to the eye information to decide whether the eyes are open or closed, and judge whether the driver is fatigued from the number of eye closures over a period of time; apply the same principle to the mouth features, i.e., decide from the mouth coordinates whether the mouth is open or closed, and judge fatigue from the number of yawns over a period of time;

Step S4: issue a visual warning based on the driver-fatigue judgments obtained through the identification and analysis in steps S2 and S3.

In step S1, the AFLW dataset is used as a large-scale face dataset covering multiple poses and viewpoints. If it contains too few closed-eye images to meet the requirement, the CEW closed-eye dataset is added as a supplement. Where the annotation files of these datasets are not in the YOLO annotation format, they are converted by processing the data with Python to generate the corresponding YOLO annotation files. Also in step S1, the combined dataset is randomly sampled at a 6:2:2 ratio into training, validation, and test sets.

In step S2, the lightweight YOLOv8-pose introduces the lightweight GhostConv convolution into the backbone, which uses cheap linear transformations to generate, at low cost, many Ghost feature maps that mine the required information from the original features, reducing the model's parameter count and computation;

Step S2 also introduces the fast C3 module: the C3 module is made lightweight by combining and connecting two identical GhostConv layers into a Ghost-Bottleneck structure, and the rebuilt C3 module serves as the C3Ghost module, reducing the model's complexity and the computation of the facial keypoint model.

In step S2, the lightweight YOLOv8-pose uses the Slim-neck structure as an enhanced feature-fusion network and introduces the lightweight GSConv convolution, whose dense convolution computation preserves as many of the implicit connections between channels as possible to accelerate the model's prediction; residual connections form a GS-bottleneck structure that further strengthens the network's feature processing; finally, a one-shot aggregation builds the VoV-GSCSP module, letting gradients from different positions cross-mix to enhance the network's gradient behavior and learning capacity.

In step S2, the lightweight YOLOv8-pose adds the occlusion-aware attention SEAM, which emphasizes the face region in the image and suppresses the background while supporting multi-scale face detection, improving facial keypoint localization.

In step S2, the lightweight YOLOv8-pose keeps the decoupled structure in its GNSC-Head detection head and lightens the network with group-normalized shared convolutions.

In step S2, the lightweight YOLOv8-pose keypoint detection model is built with the PyTorch deep-learning framework. Training runs for 300 epochs with a 640×640 input size and a batch size of 64; weights are saved every 10 epochs, and mAP is computed on the validation set to evaluate model performance. The model with the highest mAP is selected as the final model.

In step S3, the eye aspect ratio EAR is computed from the eye keypoint information output by the lightweight YOLOv8-pose. The EAR threshold follows the P80 criterion of the PERCLOS standard, under which the eye counts as closed when the eyelid covers more than 80% of the pupil. The formulas (in the standard six-point form) are as follows:

$$\mathrm{EAR}=\frac{\sqrt{(x_2-x_6)^2+(y_2-y_6)^2}+\sqrt{(x_3-x_5)^2+(y_3-y_5)^2}}{2\sqrt{(x_1-x_4)^2+(y_1-y_4)^2}},\qquad f_e=\frac{t_e}{T_e}$$

where x_i is the abscissa of the i-th eye keypoint and y_i its ordinate; f_e is defined to judge the driver's eye-fatigue state, t_e is the number of closed-eye frames within the detection time, and T_e is the total number of frames within the detection time;

In step S3, the mouth aspect ratio MAR is computed from the mouth keypoint information output by the lightweight YOLOv8-pose, in the same six-point form:

$$\mathrm{MAR}=\frac{\sqrt{(x_2-x_6)^2+(y_2-y_6)^2}+\sqrt{(x_3-x_5)^2+(y_3-y_5)^2}}{2\sqrt{(x_1-x_4)^2+(y_1-y_4)^2}},\qquad f_m=\frac{t_m}{T_m}$$

where x_i is the abscissa of the i-th mouth keypoint and y_i its ordinate; f_m is defined to judge the driver's mouth-fatigue state, t_m is the number of open-mouth frames within the detection time, and T_m is the total number of frames within the detection time.

In step S3, an algorithm evaluates the model's outputs, combines the indicators to judge whether the driver is in a fatigue state, and issues a visual warning. The warning is triggered as follows:

in a normal state, a single eye closure lasts 0.1 to 0.15 seconds, whereas in a fatigued state it exceeds 0.5 seconds; accordingly, when the closed-eye ratio f_e per unit time exceeds 0.5, the driver is judged to be in a fatigue state;

a human yawn typically lasts 3 to 5 seconds; in the algorithm of step S3, the detection window is 30 seconds and no more than two yawns are allowed within it, i.e., when the yawning ratio f_m per unit time exceeds 0.4, the driver is judged to be in a fatigue state.

Compared with the prior art, the present invention has the following beneficial effects. The present invention provides a lightweight YOLOv8-pose method for fatigue driving detection that uses a lightweight YOLOv8-pose model for face detection and facial keypoint localization and judges the driver's mental state through a fatigue decision module. The lightweight YOLOv8-pose model uses Ghost convolution and GS convolution to greatly reduce the network's parameter count and raise its detection speed, while the occlusion-aware attention mechanism and the GNSC-Head structure keep detection accuracy high despite the lightweight design. The fatigue decision module judges fatigue from multiple features, identifying the driver's state more effectively and providing solid support for deployment on in-vehicle edge devices.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments:

FIG. 1 is a flowchart of the method according to an embodiment of the present invention;

FIG. 2 is a structural diagram of the keypoint detection model in an embodiment of the present invention;

FIG. 3 is a structural diagram of the SEAM module in an embodiment of the present invention;

FIG. 4 is a structural diagram of the GNSC-Head module in an embodiment of the present invention;

FIG. 5 is a distribution diagram of the facial keypoint positions in an embodiment of the present invention;

FIG. 6 is a visualization of the fatigue detection system in an embodiment of the present invention.

DETAILED DESCRIPTION

The present invention is further described below in conjunction with the accompanying drawings and embodiments.

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the present application. Unless otherwise specified, all technical and scientific terms used herein have the same meanings as commonly understood by those of ordinary skill in the art to which the present application belongs.

It should also be noted that the terminology used here serves only to describe specific embodiments and is not intended to limit the exemplary embodiments of the present application. As used herein, the singular forms are intended to include the plural forms as well unless the context clearly indicates otherwise. Furthermore, it should be understood that the terms "comprise" and/or "include", when used in this specification, indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.

As shown in the figures, a lightweight YOLOv8-pose method for fatigue driving detection builds a lightweight YOLOv8-pose model from a collected multi-pose, multi-view face dataset; introduces Ghost convolution into the model backbone to cut the parameter count and unnecessary convolution computation; introduces a Slim-neck that fuses the multi-scale features extracted by the backbone to speed up network prediction; adds an occlusion-aware attention module SEAM to the Neck of the model to emphasize the face region in the image and suppress the background, improving keypoint localization; and adopts a GNSC-Head structure in the detection head, which uses shared convolutions and optimizes the convolutions' BN layers into more stable GN layers to save the model's parameter space and computing resources. The method builds a fatigue decision model and evaluates the model's outputs to judge whether the driver is in a fatigue state.

The method comprises the following steps:

Step S1: collect a multi-pose, multi-view face dataset and convert its annotation files to generate the corresponding YOLO annotation files;

Step S2: build and train a lightweight YOLOv8-pose model; use the trained model to detect the face in an input image, locate the facial keypoints once a face is identified, and finally classify and regress in the detection head to obtain the keypoint coordinates of the driver's eyes and mouth;

Step S3: build a fatigue decision model that judges fatigue from the eye and mouth features; apply the PERCLOS evaluation criterion to the eye information to decide whether the eyes are open or closed, and judge whether the driver is fatigued from the number of eye closures over a period of time; apply the same principle to the mouth features, i.e., decide from the mouth coordinates whether the mouth is open or closed, and judge fatigue from the number of yawns over a period of time;

Step S4: issue a visual warning based on the driver-fatigue judgments obtained through the identification and analysis in steps S2 and S3.

In step S1, the AFLW dataset is used as a large-scale face dataset covering multiple poses and viewpoints. If it contains too few closed-eye images to meet the requirement, the CEW closed-eye dataset is added as a supplement. Where the annotation files of these datasets are not in the YOLO annotation format, they are converted by processing the data with Python to generate the corresponding YOLO annotation files. Also in step S1, the combined dataset is randomly sampled at a 6:2:2 ratio into training, validation, and test sets, as sketched below.
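A minimal Python sketch of the conversion and split described in step S1; the annotation field layout and function names here are illustrative assumptions, not the patent's actual tooling:

```python
import random

def to_yolo_pose_line(box, keypoints, img_w, img_h, cls_id=0):
    """Convert one face annotation into a YOLOv8-pose label line.

    box is (x, y, w, h) in pixels; keypoints is a list of (x, y) pixel
    coordinates. YOLO labels are normalized to [0, 1], with the box given
    as (center_x, center_y, width, height) and each keypoint as x y v.
    """
    x, y, w, h = box
    cx, cy = (x + w / 2) / img_w, (y + h / 2) / img_h
    fields = [cx, cy, w / img_w, h / img_h]
    for kx, ky in keypoints:
        fields += [kx / img_w, ky / img_h, 2.0]  # v=2: labeled and visible
    return str(cls_id) + " " + " ".join(f"{v:.6f}" for v in fields)

def split_dataset(image_paths, seed=0):
    """Randomly split image paths at the 6:2:2 ratio used in step S1."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    n = len(paths)
    n_train, n_val = int(0.6 * n), int(0.2 * n)
    return paths[:n_train], paths[n_train:n_train + n_val], paths[n_train + n_val:]
```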

In step S2, the lightweight YOLOv8-pose introduces the lightweight GhostConv convolution into the backbone, which uses cheap linear transformations to generate, at low cost, many Ghost feature maps that mine the required information from the original features, reducing the model's parameter count and computation;

Step S2 also introduces the fast C3 module: the C3 module is made lightweight by combining and connecting two identical GhostConv layers into a Ghost-Bottleneck structure, and the rebuilt C3 module serves as the C3Ghost module, reducing the model's complexity and the computation of the facial keypoint model.

In step S2, the lightweight YOLOv8-pose uses the Slim-neck structure as an enhanced feature-fusion network and introduces the lightweight GSConv convolution, whose dense convolution computation preserves as many of the implicit connections between channels as possible to accelerate the model's prediction; residual connections form a GS-bottleneck structure that further strengthens the network's feature processing; finally, a one-shot aggregation builds the VoV-GSCSP module, letting gradients from different positions cross-mix to enhance the network's gradient behavior and learning capacity.

In step S2, the lightweight YOLOv8-pose adds the occlusion-aware attention SEAM, which emphasizes the face region in the image and suppresses the background while supporting multi-scale face detection, improving facial keypoint localization.

In step S2, the lightweight YOLOv8-pose keeps the decoupled structure in its GNSC-Head detection head and lightens the network with group-normalized shared convolutions.

In step S2, the lightweight YOLOv8-pose keypoint detection model is built with the PyTorch deep-learning framework. Training runs for 300 epochs with a 640×640 input size and a batch size of 64; weights are saved every 10 epochs, and mAP is computed on the validation set to evaluate model performance. The model with the highest mAP is selected as the final model.
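For illustration, a hedged training sketch using the Ultralytics API, which exposes the settings listed above; the two YAML file names are hypothetical placeholders for the modified model definition and the dataset configuration:

```python
from ultralytics import YOLO

# Hypothetical YAML describing the modified network of step S2
# (GhostConv/C3Ghost backbone, Slim-neck, SEAM, GNSC-Head).
model = YOLO("yolov8n-pose-light.yaml")

model.train(
    data="face-keypoints.yaml",  # hypothetical dataset config (paths, 18 keypoints)
    epochs=300,                  # 300 training rounds
    imgsz=640,                   # 640x640 input images
    batch=64,                    # batch size 64
    save_period=10,              # save a checkpoint every 10 epochs
)
# Ultralytics evaluates mAP on the validation set during training; the
# checkpoint with the highest mAP would then be kept as the final model.
```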

In step S3, the eye aspect ratio EAR is computed from the eye keypoint information output by the lightweight YOLOv8-pose. The EAR threshold follows the P80 criterion of the PERCLOS standard, under which the eye counts as closed when the eyelid covers more than 80% of the pupil. The formulas (in the standard six-point form) are as follows:

$$\mathrm{EAR}=\frac{\sqrt{(x_2-x_6)^2+(y_2-y_6)^2}+\sqrt{(x_3-x_5)^2+(y_3-y_5)^2}}{2\sqrt{(x_1-x_4)^2+(y_1-y_4)^2}},\qquad f_e=\frac{t_e}{T_e}$$

where x_i is the abscissa of the i-th eye keypoint and y_i its ordinate; f_e is defined to judge the driver's eye-fatigue state, t_e is the number of closed-eye frames within the detection time, and T_e is the total number of frames within the detection time;

In step S3, the mouth aspect ratio MAR is computed from the mouth keypoint information output by the lightweight YOLOv8-pose, in the same six-point form:

$$\mathrm{MAR}=\frac{\sqrt{(x_2-x_6)^2+(y_2-y_6)^2}+\sqrt{(x_3-x_5)^2+(y_3-y_5)^2}}{2\sqrt{(x_1-x_4)^2+(y_1-y_4)^2}},\qquad f_m=\frac{t_m}{T_m}$$

where x_i is the abscissa of the i-th mouth keypoint and y_i its ordinate; f_m is defined to judge the driver's mouth-fatigue state, t_m is the number of open-mouth frames within the detection time, and T_m is the total number of frames within the detection time.

In step S3, an algorithm evaluates the model's outputs, combines the indicators to judge whether the driver is in a fatigue state, and issues a visual warning. The warning is triggered as follows:

in a normal state, a single eye closure lasts 0.1 to 0.15 seconds, whereas in a fatigued state it exceeds 0.5 seconds; accordingly, when the closed-eye ratio f_e per unit time exceeds 0.5, the driver is judged to be in a fatigue state;

a human yawn typically lasts 3 to 5 seconds; in the algorithm of step S3, the detection window is 30 seconds and no more than two yawns are allowed within it, i.e., when the yawning ratio f_m per unit time exceeds 0.4, the driver is judged to be in a fatigue state.

Example:

In this example, AFLW is a large-scale face database covering multiple poses and viewpoints, with photos varying in pose, expression, illumination, ethnicity, and other factors; it contains about 25,000 hand-annotated face images and suits face recognition, facial keypoint detection, and related fields. Because the dataset contains too few closed-eye images, the CEW closed-eye dataset is added as a supplement; it contains photos of 2,423 subjects with eyes open and closed, differing across individuals and environments (illumination, blur, occlusion, and so on), which effectively improves the robustness of model training.

In this example, the keypoint detection model built for fatigue detection is the lightweight YOLOv8-pose model, whose structure is shown in FIG. 2.

Specifically, the backbone extracts image features and consists of Ghost convolutions and C3Ghost modules. In deep convolutional networks, the intermediate feature maps are often rich to the point of redundancy, with many maps carrying similar information, which costs a great deal of memory and FLOPs. Ghost convolution instead uses cheap linear transformations to generate, at very low cost, many Ghost feature maps that mine the required information from the original features, effectively reducing the model's parameters and computation. In the original YOLOv8n-pose backbone, the C2f module trades some speed for richer gradient-flow information; but in a fatigue driving detection system the surrounding scene is relatively uniform and no small targets need to be recognized, so the C2f module is optimized into the C3 module, which is then made lightweight by introducing the Ghost-Bottleneck structure from GhostNet. Its first Ghost convolution compresses the channel count, cutting parameters and suppressing high-frequency noise; the second restores the channel count; a residual connection then adds back the original features to compensate for the information lost in channel compression. This bottleneck design reduces both parameters and information loss. To preserve the backbone's feature extraction ability, the remaining convolutions in the C3 module are left unchanged. The improved C3 module is named the C3Ghost module; as a lightweight network it keeps the original output feature map's size and channel count while effectively lowering the network's parameters and computational cost, further reducing the model's complexity and the computation of the facial keypoint model.
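A minimal PyTorch sketch of a Ghost convolution consistent with this description; it is an illustration under the assumption of an even output channel count, not the patent's exact implementation:

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """A small primary convolution produces half the output channels; a
    cheap depthwise convolution derives the other half from them, and the
    two halves are concatenated."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        c_hidden = c_out // 2  # assumes c_out is even
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_hidden, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_hidden), nn.SiLU())
        self.cheap = nn.Sequential(  # the "cheap linear transformation"
            nn.Conv2d(c_hidden, c_hidden, 5, 1, 2, groups=c_hidden, bias=False),
            nn.BatchNorm2d(c_hidden), nn.SiLU())

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)
```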

The feature-fusion network strengthens the extraction of the backbone's output features and laterally connects the features from different stages, fusing high-level semantic features with low-level detail features to produce a richer representation. To make the model lighter without weakening its feature extraction, the present invention introduces GSConv from Slim-neck and uses the VoV-GSCSP module to improve the Neck of YOLOv8-pose, further optimizing the model's parameter count and computational complexity. GSConv's dense convolution computation preserves as many of the implicit connections between channels as possible, avoiding the semantic loss caused by each spatial compression and channel expansion of the feature map and accelerating the model's prediction. Residual connections form a GS-bottleneck structure that further strengthens feature processing; finally, a one-shot aggregation builds the VoV-GSCSP module so that gradients from different positions cross-mix, enhancing the network's gradient behavior and learning capacity while keeping the model's accuracy despite the lighter design.
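A matching sketch of GSConv, again illustrative; the kernel sizes follow the common Slim-neck formulation rather than a verified copy of the patent's code:

```python
import torch
import torch.nn as nn

class GSConv(nn.Module):
    """A standard convolution and a depthwise convolution each supply half
    the output channels; a channel shuffle then mixes the two halves."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        c_half = c_out // 2  # assumes c_out is even
        self.conv = nn.Sequential(
            nn.Conv2d(c_in, c_half, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())
        self.dw = nn.Sequential(
            nn.Conv2d(c_half, c_half, 5, 1, 2, groups=c_half, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())

    def forward(self, x):
        y = self.conv(x)
        y = torch.cat([y, self.dw(y)], dim=1)
        # channel shuffle: interleave the dense and depthwise halves
        b, c, h, w = y.shape
        return y.view(b, 2, c // 2, h, w).transpose(1, 2).reshape(b, c, h, w)
```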

The feature-fusion network also adds the occlusion-aware Separated and Enhancement Attention Module (SEAM), which counters the low face-recognition rate and inaccurate localization caused by drivers wearing sunglasses or masks. The overall architecture of SEAM is shown in FIG. 3: the left side shows SEAM itself, the right side the structure of DcovN (the channel and spatial mixing block). The first part of DcovN is a depthwise convolution with a residual connection; depthwise convolution learns the importance of individual channels and reduces parameters but ignores the information relationships between channels. To compensate, the second part adds pointwise convolution, strengthening DcovN's representation and generalization. After DcovN, an average-pooling layer shrinks the spatial size of the feature map while keeping the important feature information, and two fully connected layers then fuse the per-channel information so the network can reinforce the connections among all channels. The fully connected output is passed through an exponential function, expanding its range from [0, 1] to [1, e]; because the exponential provides a monotonic mapping, the output is more tolerant of position error. Finally the SEAM output is multiplied with the original features, letting the model handle occluded parts of the face effectively.
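A sketch that follows this description of SEAM step by step; the reduction ratio and layer sizes are assumptions, not taken from the patent:

```python
import torch
import torch.nn as nn

class SEAM(nn.Module):
    """Occlusion-aware attention: depthwise mixing with a residual,
    pointwise mixing, global average pooling, two fully connected layers,
    and an exponential re-weighting of the original features."""
    def __init__(self, c, reduction=16):  # reduction ratio is assumed
        super().__init__()
        self.dw = nn.Sequential(
            nn.Conv2d(c, c, 3, 1, 1, groups=c, bias=False),
            nn.BatchNorm2d(c), nn.SiLU())
        self.pw = nn.Sequential(
            nn.Conv2d(c, c, 1, bias=False),
            nn.BatchNorm2d(c), nn.SiLU())
        self.fc = nn.Sequential(
            nn.Linear(c, c // reduction), nn.ReLU(),
            nn.Linear(c // reduction, c), nn.Sigmoid())

    def forward(self, x):
        y = self.pw(self.dw(x) + x)       # DcovN: depthwise + residual, then pointwise
        w = self.fc(y.mean(dim=(2, 3)))   # average pool and two FC layers -> [0, 1]
        w = torch.exp(w)                  # expand the range from [0, 1] to [1, e]
        return x * w.view(x.size(0), -1, 1, 1)  # re-weight the original features
```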

The YOLOv8-pose detection head uses a decoupled structure: target position, class information, and keypoint coordinates are extracted separately, learned by different network branches, and finally fused. Compared with a traditional coupled head, a decoupled head handles semantic information of different scales and granularities better, improving the model's generalization and robustness; but its parameter count grows substantially, with the detection head alone accounting for a third of the whole model. To raise detection speed while keeping accuracy stable, the present invention therefore adopts the GNSC-Head, which keeps the original head's decoupled structure and improves the network with group-normalized shared convolutions; the improved head structure is shown in FIG. 4.

By introducing shared convolutions reused at multiple positions, GNSC-Head saves parameter space and computing resources, allowing deployment in space-constrained in-vehicle systems. To offset the drop in feature extraction caused by weight sharing, the BN layers of the convolutions are replaced with GN layers. Unlike BN, GN splits the feature map into several groups and normalizes each group; it does not depend on batch size and therefore keeps good performance even with small batches of high-resolution images. Moreover, in detection tasks the head's inputs come from ROI regions sampled from the same image; they violate the independent-and-identically-distributed assumption, and non-i.i.d. inputs weaken the mean and variance statistics of the BN layer, so GN works better than BN in the detection head.

In the classification task, the detection layers for large, medium, and small targets all detect the same classes, so shared convolutions adapt well across the stages of classification and improve generalization across target scales. In bounding-box regression, the features extracted by the shared convolutions are identical for all scales and cannot separate targets of different sizes, so a Scale layer is appended to the network: a learnable scaling factor helps the model extract features for targets of different scales and effectively adjusts the scale of the output. In keypoint regression, the shared convolution's channel count must be divisible by the number of groups so that each group holds the same number of channels; the total number of facial keypoints in the present invention is not divisible by the group count, and keypoint regression, which must localize the keypoints of the target image precisely, is a fine-grained task, so the original head structure is kept for this branch to preserve the model's overall accuracy.
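An illustrative sketch of the two ingredients described here, the group-normalized shared convolution block and the learnable Scale layer; the group count and channel sizes are assumptions:

```python
import torch
import torch.nn as nn

class ConvGN(nn.Module):
    """Conv + GroupNorm block whose weights can be shared across the
    detection scales (GN does not depend on batch size)."""
    def __init__(self, c, k=3, groups=16):  # c must be divisible by groups
        super().__init__()
        self.conv = nn.Conv2d(c, c, k, 1, k // 2, bias=False)
        self.gn = nn.GroupNorm(groups, c)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.gn(self.conv(x)))

class Scale(nn.Module):
    """Learnable per-scale factor that lets a shared regression branch
    re-adapt its outputs to each detection scale."""
    def __init__(self, init=1.0):
        super().__init__()
        self.scale = nn.Parameter(torch.tensor(init))

    def forward(self, x):
        return x * self.scale

# Sketch of use: one shared block, one Scale per detection level.
shared = ConvGN(64)
scales = nn.ModuleList(Scale() for _ in range(3))
outs = [scales[i](shared(torch.randn(1, 64, s, s)))
        for i, s in enumerate((80, 40, 20))]
```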

Step S3 in this example is specifically: build a fatigue decision model that judges fatigue from the eye and mouth features; apply the PERCLOS evaluation criterion to the eye information to decide whether the eyes are open or closed, and judge whether the driver is fatigued from the number of eye closures over a period of time. The mouth criterion works the same way: decide from the mouth coordinates whether the mouth is open or closed, and judge fatigue from the number of yawns over a period of time.

Specifically, an image is fed to the lightweight YOLOv8n-pose network model, the face region is detected, the facial keypoints are extracted, and their positions are drawn on the image. The traditional facial annotation consists of 68 keypoints, as shown in FIG. 5, but in actual detection the driver's eyes and mouth alone suffice to judge fatigue. The invention therefore selects only 18 facial keypoints for fatigue assessment: 12 for the left and right eyes (points 37-48) and 6 for the mouth (points 49, 51, 53, 55, 57, 59).
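For reference, the selected indices in 0-based Python form (the 68-point scheme in FIG. 5 is numbered from 1):

```python
# Eyes: points 37-48 (six per eye); mouth: points 49, 51, 53, 55, 57, 59.
LEFT_EYE  = [i - 1 for i in range(37, 43)]            # 68-point indices 37-42
RIGHT_EYE = [i - 1 for i in range(43, 49)]            # 68-point indices 43-48
MOUTH     = [i - 1 for i in (49, 51, 53, 55, 57, 59)]
```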

The Euclidean distances between the eye keypoints are obtained from their x, y coordinates, and the eye aspect ratio (EAR) is computed. The EAR threshold refers to PERCLOS, the internationally recognized fatigue criterion defined as the proportion of time the eyes are closed within a given period. The PERCLOS criteria include P70, P80, and EM, which count the eye as closed when the eyelid covers more than 70%, 80%, and 50% of the pupil respectively; P80 is considered the most fatigue-sensitive criterion. The eye-state metric here follows P80, setting the EAR threshold at 0.2: when EAR falls below 0.2, the eye is judged closed. The formulas (in the standard six-point form) are as follows:

$$\mathrm{EAR}=\frac{\sqrt{(x_2-x_6)^2+(y_2-y_6)^2}+\sqrt{(x_3-x_5)^2+(y_3-y_5)^2}}{2\sqrt{(x_1-x_4)^2+(y_1-y_4)^2}},\qquad f_e=\frac{t_e}{T_e}$$

f_e is defined to judge the driver's eye-fatigue state, where t_e is the number of closed-eye frames within the detection time and T_e the total number of frames. In a normal state a single eye closure usually lasts 0.1 to 0.15 seconds, whereas under fatigue it usually exceeds 0.5 seconds, so the threshold of f_e is set to 0.5: when the closed-eye ratio per unit time exceeds 0.5, the driver is judged to be in a fatigue state.

Likewise, the mouth-state metric parallels the eye-state metric: the Euclidean distances between the mouth keypoints are obtained from their x, y coordinates, the mouth aspect ratio (MAR) is computed, and f_m is defined to judge the driver's mouth-fatigue state:

$$\mathrm{MAR}=\frac{\sqrt{(x_2-x_6)^2+(y_2-y_6)^2}+\sqrt{(x_3-x_5)^2+(y_3-y_5)^2}}{2\sqrt{(x_1-x_4)^2+(y_1-y_4)^2}},\qquad f_m=\frac{t_m}{T_m}$$

where t_m is the number of open-mouth frames within the detection time and T_m the total number of frames. A human yawn typically lasts 3 to 5 seconds; in this algorithm the detection window is 30 seconds and no more than two yawns are allowed within it, so the threshold of f_m is set to 0.4: when the yawning ratio per unit time exceeds 0.4, the driver is judged to be in a fatigue state.
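A sketch of the complete decision logic under the thresholds above; the six-point aspect ratio is the standard form, and the MAR open-mouth threshold is an assumed value since the text does not state one:

```python
import numpy as np

def aspect_ratio(pts):
    """Six-point aspect ratio: mean of the two vertical keypoint distances
    over the horizontal distance (points ordered p1..p6)."""
    p = np.asarray(pts, dtype=float)      # shape (6, 2)
    v1 = np.linalg.norm(p[1] - p[5])      # ||p2 - p6||
    v2 = np.linalg.norm(p[2] - p[4])      # ||p3 - p5||
    h = np.linalg.norm(p[0] - p[3])       # ||p1 - p4||
    return (v1 + v2) / (2 * h)

def is_fatigued(ear_series, mar_series,
                ear_thr=0.2,   # P80-based closed-eye threshold from the text
                mar_thr=0.6,   # assumed open-mouth threshold (not given in the text)
                fe_thr=0.5, fm_thr=0.4):
    """Window-level decision over the frames of a 30-second detection
    window: f_e = t_e / T_e and f_m = t_m / T_m."""
    f_e = np.mean(np.asarray(ear_series) < ear_thr)  # closed-eye frame ratio
    f_m = np.mean(np.asarray(mar_series) > mar_thr)  # open-mouth frame ratio
    return f_e > fe_thr or f_m > fm_thr
```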

S4: issue a visual warning based on the driver-fatigue judgments obtained through the identification and analysis of S2 and S3.

Specifically, to test the generalization ability and effectiveness of the lightweight YOLOv8-pose model in a driving environment, the recognition rate on the YawDD (Yawning Detection Dataset) driving video dataset is used as the benchmark. The videos were recorded from the driver's seat of a car, from both frontal and oblique side views, and cover normal driving, talking, and fatigued yawning. Following the method above, EAR and MAR are computed to detect the driver's eye closures and yawning behavior. As shown in FIG. 6, the lightweight YOLOv8-pose model accurately detects and tracks blinking and yawning during driving and performs effective recognition statistics and driving-state discrimination. Even when a female driver wears glasses or a male driver drives in dim light, the model still recognizes the drivers' fatigue states.

In summary, the lightweight YOLOv8-pose fatigue driving detection method of the present invention effectively improves the model's keypoint feature extraction and detection accuracy, improves its lightweight performance, and greatly improves the real-time performance of the system.

Many specific details are set forth in the description above to aid a full understanding of the present invention. The above description, however, covers only preferred embodiments; the present invention can be implemented in many ways other than those described here and is therefore not limited to the specific implementations disclosed above. Without departing from the scope of the technical solution of the present invention, anyone skilled in the art may use the methods and technical content disclosed above to make many possible changes and modifications to the technical solution, or to modify it into equivalent embodiments of equivalent change. Any simple modification, equivalent change, or refinement of the above embodiments made according to the technical essence of the present invention, without departing from the content of its technical solution, still falls within the protection scope of the technical solution of the present invention.

Claims (10)

1. A lightweight YOLOv8-pose fatigue driving detection method, characterized in that: the method builds a lightweight YOLOv8-pose model from a collected multi-pose, multi-view face dataset; introduces Ghost convolution into the model backbone network to reduce the number of model parameters and unnecessary convolution calculation; accelerates network prediction calculation by introducing a Slim-neck that fuses the multi-scale features extracted by the model backbone network; adds an occlusion-aware attention module SEAM to the Neck part of the model to emphasize the face region in the image and weaken the background so as to improve keypoint localization; and adopts a GNSC-Head structure in the detection head part of the model, using shared convolution and optimizing the convolutions' BN layers into more stable GN layers so as to save the parameter space and calculation resources of the model; the method comprises constructing a fatigue decision model and evaluating its output to judge whether a driver is in a fatigue state.
2. The lightweight YOLOv8-pose fatigue driving detection method according to claim 1, characterized in that the method comprises the following steps:
Step S1: collecting a multi-pose, multi-view face dataset, and converting the annotation files to generate corresponding YOLO annotation files;
Step S2: constructing and training a lightweight YOLOv8-pose model, using the trained model to detect the face in an input image, locating the facial keypoints once the face is identified, and finally classifying and regressing in the detection head to obtain the keypoint coordinates of the driver's eyes and mouth;
Step S3: constructing a fatigue decision model that gives a fatigue judgment from the eye and mouth features, adopting the PERCLOS evaluation criterion to judge eye opening and closing from the eye information, and judging whether the driver is in a fatigue state by detecting the number of eye closures within a period of time; the evaluation criterion for the mouth features uses the same principle, namely judging mouth opening and closing from the mouth coordinate information, and judging whether the driver is in a fatigue state by detecting the number of yawns within a period of time;
Step S4: issuing a visual early warning based on the driver-fatigue judgments obtained through the identification and analysis in steps S2 and S3.
3. The lightweight YOLOv8-pose fatigue driving detection method according to claim 2, characterized in that: in step S1, the AFLW dataset is introduced as a large-scale face dataset covering multiple poses and multiple views; if the number of closed-eye pictures in the dataset is too low to meet the requirement, the CEW closed-eye dataset is introduced as a supplement; when the format of the annotation files in these datasets is not the YOLO annotation format, the annotation files are converted by processing the data with Python to generate corresponding YOLO annotation files; and in step S1 the dataset is randomly sampled at a 6:2:2 ratio into a training set, a validation set, and a test set.
4. The lightweight YOLOv8-pose fatigue driving detection method according to claim 2, characterized in that: in step S2, the lightweight YOLOv8-pose introduces the lightweight convolution GhostConv into the backbone network; it uses inexpensive linear transformations to generate, at low cost, a large number of Ghost feature maps that extract the required information from the original features, reducing the model's parameter count and computation;
in step S2, the fast C3 module is also introduced: two identical GhostConv modules are combined and connected through lightweight operations within the C3 module to form a Ghost-Bottleneck structure, and the recombined C3 module serves as the C3Ghost module, reducing model complexity and the computation of the facial-keypoint model.
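As a sketch of the GhostConv idea in claim 4 (a common reading, not the patent's exact implementation), half of the output channels come from an ordinary convolution and the other half from a cheap depthwise "linear transform" of that result:

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Ghost convolution sketch: a primary conv makes half the channels,
    a cheap depthwise conv generates the 'ghost' half from them.
    Assumes an even number of output channels."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        c_half = c_out // 2
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_half, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())
        self.cheap = nn.Sequential(  # depthwise 5x5: the low-cost transform
            nn.Conv2d(c_half, c_half, 5, 1, 2, groups=c_half, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)
```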
5. The lightweight YOLOv8-pose fatigue driving detection method according to claim 4, characterized in that: in step S2, the lightweight YOLOv8-pose uses a Slim-neck structure as the enhanced feature-fusion network and introduces the lightweight convolution GSConv, which preserves as far as possible the implicit connections between channels through dense convolution computation to accelerate the model's prediction, and builds a GS-Bottleneck structure with residual connections to further strengthen the network's feature-processing capability; finally, a VoV-GSCSP module is formed by a one-shot aggregation method, so that gradients at different positions are cross-mixed, enhancing the network's gradient expression and learning capability.
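A matching sketch of GSConv under the usual reading of claim 5: a standard convolution and a depthwise convolution each supply half the channels, and a two-group channel shuffle mixes the dense and depthwise information (the GS-Bottleneck and VoV-GSCSP wiring around it is omitted):

```python
import torch
import torch.nn as nn

class GSConv(nn.Module):
    """GSConv sketch: standard conv -> depthwise conv -> concat -> shuffle.
    Assumes an even number of output channels."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        c_half = c_out // 2
        self.conv = nn.Sequential(
            nn.Conv2d(c_in, c_half, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())
        self.dw = nn.Sequential(
            nn.Conv2d(c_half, c_half, 5, 1, 2, groups=c_half, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())

    def forward(self, x):
        y = self.conv(x)
        y = torch.cat([y, self.dw(y)], dim=1)
        b, c, h, w = y.shape  # interleave the two halves (shuffle, groups=2)
        return y.view(b, 2, c // 2, h, w).transpose(1, 2).reshape(b, c, h, w)
```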
6. The lightweight YOLOv8-pose fatigue driving detection method according to claim 4, characterized in that: in step S2, the lightweight YOLOv8-pose adds the occlusion-aware attention module SEAM to emphasize the face region in the image and suppress the background while achieving multi-scale face detection, improving the localization of facial keypoints.
7. The lightweight YOLOv8-pose fatigue driving detection method according to claim 4, characterized in that: in step S2, the lightweight YOLOv8-pose preserves the decoupled structure with a GNSC-Head detection head, and the network is lightened using group-normalized shared convolutions.
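One way the GNSC-Head idea of claim 7 might look, sketched under assumptions: a single GroupNorm convolution stack whose weights are shared across all pyramid levels, feeding decoupled 1x1 prediction branches. The channel counts and group number here are illustrative, not taken from the patent:

```python
import torch.nn as nn

class GNSharedConvHead(nn.Module):
    """Shared-convolution head sketch with GroupNorm instead of BatchNorm.
    `c` must be divisible by `groups`; n_box/n_kpt are placeholder sizes."""
    def __init__(self, c=128, n_box=4, n_kpt=18, groups=16):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Conv2d(c, c, 3, 1, 1, bias=False),
            nn.GroupNorm(groups, c),
            nn.SiLU())
        self.box = nn.Conv2d(c, n_box, 1)  # decoupled box-regression branch
        self.kpt = nn.Conv2d(c, n_kpt, 1)  # decoupled keypoint branch

    def forward(self, feats):
        # the same shared weights process every feature level,
        # saving parameters compared with one stack per level
        outs = [self.shared(f) for f in feats]
        return [(self.box(o), self.kpt(o)) for o in outs]
```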
8. The lightweight YOLOv8-pose fatigue driving detection method according to claim 5, characterized in that: in step S2, the lightweight YOLOv8-pose keypoint detection model is built with the PyTorch deep learning framework; the number of training epochs is set to 300, the input image size to 640×640, and the batch size to 64; the training weights are saved every 10 epochs, the mAP is computed on the validation set to evaluate model performance, and the model with the highest mAP is selected as the final model.
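For illustration, this training schedule could be scripted with the Ultralytics YOLOv8 API roughly as follows; the model and dataset configuration file names are hypothetical, and the patent does not state that this particular API was used:

```python
from ultralytics import YOLO

# hypothetical lightweight pose config and dataset description
model = YOLO("yolov8n-pose.yaml")
model.train(
    data="face_pose.yaml",  # hypothetical dataset config
    epochs=300,             # 300 training rounds, per claim 8
    imgsz=640,              # 640x640 input images
    batch=64,               # batch size 64
    save_period=10,         # save weights every 10 epochs
)
metrics = model.val()       # validation mAP used to pick the best checkpoint
```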
9. The lightweight YOLOv8-pose fatigue driving detection method according to claim 2, characterized in that: in step S3, the eye aspect ratio EAR is calculated from the eye keypoint information output by the lightweight YOLOv8-pose; the EAR threshold follows the P80 standard within the PERCLOS criterion, where P80 counts the eye as closed once the eyelid covers more than 80% of the pupil area; the formulas, in the standard six-keypoint form, are:

EAR = ( ‖p_2 − p_6‖ + ‖p_3 − p_5‖ ) / ( 2 ‖p_1 − p_4‖ )

f_e = t_e / T_e

where p_i = (x_i, y_i), with x_i the abscissa and y_i the ordinate of the eye keypoints and ‖·‖ the Euclidean distance; f_e is defined to judge the driver's eye fatigue state, t_e is the number of eye-closure frames within the detection time, and T_e is the total number of frames within the detection time;
in step S3, the mouth aspect ratio MAR is likewise calculated from the mouth keypoint information output by the lightweight YOLOv8-pose, using the analogous formulas:

MAR = ( ‖p_2 − p_6‖ + ‖p_3 − p_5‖ ) / ( 2 ‖p_1 − p_4‖ )

f_m = t_m / T_m

where p_i = (x_i, y_i), with x_i the abscissa and y_i the ordinate of the mouth keypoints; f_m is defined to judge the driver's mouth fatigue state, t_m is the number of open-mouth frames within the detection time, and T_m is the total number of frames within the detection time.
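A short sketch of how the aspect-ratio and f = t/T computations above might be coded, assuming the six-point ordering used in the reconstructed formulas (one function serves both EAR and MAR):

```python
import numpy as np

def aspect_ratio(pts):
    """Aspect ratio over six ordered (x, y) keypoints: the mean of the two
    vertical distances p2-p6 and p3-p5 over the horizontal distance p1-p4."""
    p = np.asarray(pts, dtype=float)
    v1 = np.linalg.norm(p[1] - p[5])
    v2 = np.linalg.norm(p[2] - p[4])
    h = np.linalg.norm(p[0] - p[3])
    return (v1 + v2) / (2.0 * h)

def closure_ratio(event_frames, total_frames):
    """f = t / T: fraction of frames in the detection window where the eye
    is closed (f_e) or the mouth is open (f_m)."""
    return event_frames / max(total_frames, 1)
```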
10. The lightweight YOLOv8-pose fatigue driving detection method according to claim 9, characterized in that: in step S3, the model's output is evaluated algorithmically, the driver's fatigue state is judged by combining multiple indicators, and a visual early warning is issued; the triggering criteria of the visual early warning are as follows:
in a normal state a single eye closure lasts 0.1–0.15 seconds, whereas in a fatigued state an eye closure lasts more than 0.5 seconds; thus, if the eye-closure frequency within the unit time exceeds 0.5, the driver is judged to be fatigued; a human yawn typically lasts 3–5 seconds; the algorithm of step S3 uses a unit detection time of 30 seconds, within which no more than two yawns should occur, i.e., when the yawning frequency within the unit time exceeds 0.4, the driver is judged to be fatigued.
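Read literally, claim 10's triggers reduce to two threshold checks; the sketch below is a hedged rendering of that decision rule, with the frequency quantities computed upstream over each 30-second window:

```python
def fatigue_decision(eye_closure_freq, yawn_freq):
    """Claim-10 style rule: flag fatigue when the eye-closure frequency in
    the unit time exceeds 0.5, or the yawning frequency exceeds 0.4."""
    return eye_closure_freq > 0.5 or yawn_freq > 0.4
```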
CN202410896825.XA 2024-07-05 2024-07-05 A lightweight YOLOv8-pose based method for fatigue driving detection Pending CN118865336A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410896825.XA CN118865336A (en) 2024-07-05 2024-07-05 A lightweight YOLOv8-pose based method for fatigue driving detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410896825.XA CN118865336A (en) 2024-07-05 2024-07-05 A lightweight YOLOv8-pose based method for fatigue driving detection

Publications (1)

Publication Number Publication Date
CN118865336A true CN118865336A (en) 2024-10-29

Family

ID=93157566

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410896825.XA Pending CN118865336A (en) 2024-07-05 2024-07-05 A lightweight YOLOv8-pose based method for fatigue driving detection

Country Status (1)

Country Link
CN (1) CN118865336A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119556128A * 2025-01-24 2025-03-04 Shandong University A fault diagnosis method for on-load tap changers based on lightweight YOLO11
CN119904846A * 2024-12-31 2025-04-29 Zhejiang Sci-Tech University Lightweight fatigue driving detection method based on improved yolov8


Similar Documents

Publication Publication Date Title
Zhao et al. Driver fatigue detection based on convolutional neural networks using em‐CNN
Ji et al. Fatigue state detection based on multi-index fusion and state recognition network
CN118865336A (en) A lightweight YOLOv8-pose based method for fatigue driving detection
CN108053615A (en) Driver tired driving condition detection method based on micro- expression
CN112131981A (en) Driver fatigue detection method based on skeleton data behavior recognition
CN115393830A (en) A fatigue driving detection method based on deep learning and facial features
CN102263937A (en) Driver's driving behavior monitoring device and monitoring method based on video detection
CN112016429A (en) Fatigue driving detection method based on train cab scene
CN109740477A (en) Study in Driver Fatigue State Surveillance System and its fatigue detection method
CN117333852A (en) Deep learning-based driver safety belt detection method
CN118230296B (en) A lightweight method for detecting and tracking fatigue driving
CN110532925A (en) Driver Fatigue Detection based on space-time diagram convolutional network
CN116824558B (en) Fatigue driving behavior identification method for 3D point cloud image data
CN112052829B (en) Pilot behavior monitoring method based on deep learning
CN116012819A (en) Fatigue driving detection method integrating space-time characteristics
CN114973214A (en) A method for recognizing unsafe driving behavior based on facial feature points
CN118314556A (en) Fatigue driving detection method, system, computer equipment and storage medium
CN111563468B (en) Driver abnormal behavior detection method based on attention of neural network
CN113343770B (en) A face anti-counterfeiting method based on feature screening
CN119007166A (en) Multi-feature weighted fusion driver fatigue detection method
CN116434029B (en) A kind of drinking detection method
CN117612142B (en) Head posture and fatigue state detection method based on multi-task joint model
Ma et al. Driver identification and fatigue detection algorithm based on deep learning
CN118279964A (en) Passenger cabin comfort level recognition system and method based on face video non-contact measurement
CN117351468A (en) Driver drowsiness judgment method combining perspective correction and improved ViViT

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination