
CN118799949B - High-precision line of sight estimation method in low-light environment - Google Patents

High-precision line of sight estimation method in low-light environment

Info

Publication number
CN118799949B
CN118799949B CN202410833622.6A
Authority
CN
China
Prior art keywords
image
feature
low
enhanced
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410833622.6A
Other languages
Chinese (zh)
Other versions
CN118799949A (en)
Inventor
王进
王可
曹硕裕
徐嘉玲
赵颖钏
吕泽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Cactus Technology Co.,Ltd.
Original Assignee
Nantong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nantong University filed Critical Nantong University
Priority to CN202410833622.6A priority Critical patent/CN118799949B/en
Publication of CN118799949A publication Critical patent/CN118799949A/en
Application granted granted Critical
Publication of CN118799949B publication Critical patent/CN118799949B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18Eye characteristics, e.g. of the iris
    • G06V40/193Preprocessing; Feature extraction
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/90Dynamic range modification of images or parts thereof
    • G06T5/92Dynamic range modification of images or parts thereof based on global image properties
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Ophthalmology & Optometry (AREA)
  • Human Computer Interaction (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a high-precision gaze estimation method in a low-light environment, which comprises the following steps: S1, preprocessing a data set and simulating a low-light environment; S2, enhancing the low-light image to obtain an enhanced image; S3, calibrating the enhanced image to obtain a calibrated image; S4, extracting features of the image and outputting a feature vector, using an improved residual network model ResNet to extract features from the calibrated image; S5, mapping the feature vector into a three-dimensional output vector through a fully connected layer; S6, applying a hyperbolic tangent transformation to the first two elements of the three-dimensional output vector to obtain an accurate predicted gaze direction; S7, transforming the third element of the three-dimensional output vector through a sigmoid function to obtain the uncertainty of the gaze prediction; and S8, adopting an MSELoss loss function to measure the error between the prediction result and the true value. The method can effectively solve the problem that gaze estimation precision drops significantly in a low-light environment, thereby improving the practicability and accuracy of the system.

Description

High-precision sight line estimation method in low-light environment
Technical Field
The invention relates to the field of deep learning and computer vision, in particular to a high-precision sight estimation method in a low-light environment.
Background
Line-of-sight estimation techniques aim at determining the gaze direction and focus of a person in an image or video. As a core element of human social interaction, gaze behavior can reveal a large amount of latent information. Traditional line-of-sight estimation methods rely mainly on model-driven techniques and require expensive, high-resolution and complex equipment such as infrared cameras. However, with the rapid development of appearance-based gaze estimation, particularly methods that estimate the gaze location directly from a face or eye image using a convolutional neural network (CNN), significant technological breakthroughs have been made. These methods mainly adopt an end-to-end learning strategy: by learning a mapping from the image to the gaze direction, accurate gaze estimation can be realized with only a face image as input. This not only simplifies the data processing flow, but also, by comprehensively analyzing the global characteristics of the whole face, remarkably improves the accuracy and practicability of gaze estimation, and has become the mainstream technology in the gaze estimation field.
While these innovative approaches have achieved significant results in terms of cross-domain adaptation, fine-grained processing, multi-person recognition, real-time application, and robustness, achieving high accuracy gaze estimation at low light scenes remains a challenge. Therefore, the invention aims to improve the performance and accuracy of the sight line estimation in the low-light environment so as to meet the increasingly strict practical application requirements.
Aiming at the problem of high-precision gaze estimation, the prior art proposes a method that exchanges and fuses features using dual-view interactive convolution blocks and a dual-view Transformer. The gaze estimation network of this method receives face images from two different viewing angles as input and exchanges and fuses features by means of the Transformer to complete the gaze estimation task. However, this approach shows its limitations when faced with a variety of complex low-light scenes.
Therefore, a new method for estimating the line of sight with high accuracy in low light environments is needed.
Disclosure of Invention
The invention aims to provide a high-precision line-of-sight estimation method in a low-light environment, and designs and introduces an innovative ALGCF (adaptive local-global fusion) module into the low-light image enhancement network to specifically address the importance of eye features and their dynamic changes in the gaze estimation task. The ALGCF module combines a local feature extractor and a global context information extractor through an adaptive fusion gating mechanism, providing an effective multi-scale fusion strategy and greatly improving accuracy. In addition, on the basis of the low-light image enhancement technology, the invention designs a gaze estimation model integrating a feature purification module and an attention mechanism. This design ensures that ocular features are extracted effectively even in low-light conditions, allowing accurate gaze estimation. The method can effectively solve the problem that gaze estimation precision drops significantly in a low-light environment, thereby improving the practicability and accuracy of the system. In order to achieve the above purpose, the present invention adopts the following technical scheme:
In order to solve at least one of the above problems, according to an aspect of the present invention, a high-precision line-of-sight estimation method in a low-light environment specifically includes the following steps:
s1, preprocessing a data set, and simulating a low-light environment;
S2, enhancing the low-light image to obtain an enhanced image I_enhanced;
S3, calibrating the enhanced image I_enhanced to obtain a calibrated image I_calibrated;
S4, extracting features of the image and outputting feature vectors, wherein features are extracted from the calibrated image using an improved residual network model ResNet18;
s5, mapping the feature vector into a three-dimensional output vector O through the full connection layer;
S6, applying hyperbolic tangent transformation to the first two elements of the three-dimensional output vector O to obtain an accurate predicted line-of-sight direction;
S7, transforming a third element of the three-dimensional output vector O through a sigmoid function to obtain uncertainty of sight prediction;
S8, adopting MSELoss loss functions to measure errors between the predicted result and the true value, and updating network parameters through back propagation.
Further, in S1, to simulate visual effects in different low light environments, preprocessing is performed on the Gaze360 data set, and a new Gaze360 data set reflecting various low light conditions is constructed;
categorizing low light environments, including darker scenes, extremely dark scenes, low light environments simulated using gamma correction, and dark scenes with unknown light source locations;
S1 comprises the following specific steps:
S101, for darker scenes and extremely dark scenes, the brightness and contrast of the images are adjusted to increase the realism of night vision; specifically, brightness adjustment is realized by shifting the dark and bright intervals of the image, and contrast adjustment is realized by narrowing the distribution range of colors in the image, so that the color display is more concentrated and the distinction between dark and bright parts is enhanced;
S102, for the low-illumination environment image set simulated using gamma correction, a gamma-corrected output image O is obtained through O = I^(1/G) × 255, wherein I is the input image and G is the gamma value;
s103, for a dark scene image set with an unknown light source position, darkening through an image and adding a local light source effect, firstly, obviously reducing the overall brightness of the image by adjusting the brightness, enhancing the appearance of a dark part in a night environment, and then, introducing a gradual change light source effect at a random position of the image and applying Gaussian blur, simulating the illumination of a specific light source, and ensuring the natural fusion of an illumination effect in a scene.
In step S2, the low-light image is enhanced by an enhancement network module, wherein the enhancement network module aims at extracting multi-scale features from the input image and finally realizing detail enhancement and illumination balance by combining global context information;
s2, the specific steps are as follows:
s201, extracting initial features, wherein basic features are extracted from an input image I through an initial convolution layer:
F0=ReLU(Conv0(I))
The ReLU activation function is used for nonlinear processing, so that the feature map F 0 keeps basic illumination and detail effects, and reliable initial features are provided for subsequent fusion and enhancement;
s202, self-adaptive local-global fusion, namely ALGCF;
Extracting detail information of the eye region by a local feature extractor:
Flocal=Conv1(F0)
the Conv1 is a convolution layer of the local feature extractor and is used for extracting eye details from the initial feature map;
Acquiring whole illumination and structure information of a face through a global context information extractor:
Fglobal=In(Conv2(AAP(F0)),size=Flocal)
The AAP is an adaptive average pooling layer that can extract illumination and structure information from the global context; Conv2 uses a 1×1 convolution kernel to generate a global feature map, which is then adjusted to the same size as the local features through the interpolation In operation so that the two can be fused;
Combining local and global features, generating fusion weights, and carrying out self-adaptive feature fusion through the fusion weights:
G = σ(Conv3(Concat(F_local, F_global)))
F_ALGCF = G ⊙ F_local + (1 − G) ⊙ F_global
Wherein Concat combines the local and global features into one fused feature map, Conv3 is the convolution layer of the gating mechanism, and σ denotes the Sigmoid function through which the fusion weight G is generated; the fused feature F_ALGCF adaptively fuses the local and global features according to the fusion weights G and 1−G, ensuring that the facial feature map has global illumination consistency while maintaining local eye detail;
S203, further extracting features of the fused feature map F ALGCF through a plurality of convolution blocks:
Fi+1=Fi+ReLU(BN(Convi(Fi)))
Conv_i is the convolution layer of each convolution block, F_i is the i-th layer feature map, BN (BatchNorm) is used for feature normalization, and the ReLU activation function ensures nonlinear processing of the features;
The output convolution layer converts the fused feature map into an enhanced image:
Foutput=σ(Conv4(Fi+1))
conv4 is an output convolution layer, converts the feature map into a final enhanced image through convolution, and normalizes the pixel value to be within a range of 0-1 through a Sigmoid activation function;
s204, finally, adding the enhancement feature map to the input image to obtain a final enhancement image:
Ienhanced=Clamp(Foutput+I,0,1)
The Clamp function ensures that the pixel values of the enhanced image are within a reasonable range. By enhancing and fusing the details of the original image, the finally generated enhanced image has higher brightness and contrast.
Further, in step S3, the enhanced image I enhanced passes through the initial convolution layer to obtain a calibration feature map:
Fcalib(0)=ReLU(BN(Conv5(Ienhanced)))
Conv5 is the input convolution layer of the calibration network, which extracts the calibration feature map and normalizes it through BatchNorm; the ReLU activation function is used for nonlinear processing; F_calib(0) represents the feature map obtained by processing the enhanced image through the initial convolution layer, namely the calibration feature map;
The calibration feature map F calib(0) is subjected to illumination and detail adjustment through a plurality of convolution blocks:
Fcalib(i)=Fcalib(i-1)+ReLU(BN(Convi(Fcalib(i-1))))
Wherein F_calib(i-1) denotes the output feature map of the (i-1)-th layer and is also the input of the i-th layer convolution block, providing input data for the current layer and containing the accumulated feature information of previous layers; F_calib(i) is the output feature map of the i-th layer convolution block, which combines the original input feature F_calib(i-1) with the new features produced by convolution, batch normalization and ReLU; the residual connection (+) helps prevent the vanishing-gradient problem in deep networks and ensures that enough original information is retained even in deep layers;
The final calibration feature map passes through the output convolution layer to generate a difference image:
ΔI=σ(Convout(Fcalib_i))
Conv out is an output convolution layer, generates a final difference image through convolution, and ensures that an output value is in a reasonable range through a Sigmoid function;
Finally, subtracting the difference image from the enhanced input image to obtain a calibrated final image:
Icalibrated=Ienhanced-ΔI
the calibration module eliminates illumination non-uniformity and artifacts in the enhanced image through subtraction operation, so that the final calibration image is more approximate to an ideal state.
Further, in step S4, feature extraction is performed on the calibrated image using the modified residual network model ResNet, the attention mechanism module is introduced in the second and third phases, and the feature purification module is introduced in the fourth phase.
According to an aspect of the present invention, there is provided a storage medium having instructions stored therein, which when read by a computer, cause the computer to execute the high-precision line-of-sight estimation method in any one of the above-described low-light environments.
According to another aspect of the present invention, there is provided an electronic device comprising a processor and the storage medium described above, the processor executing instructions in the storage medium.
Compared with the prior art, the invention has the following beneficial effects:
1. the invention introduces a weak light image enhancement technology, which focuses on improving the definition of eye details in a low light environment through multi-scale feature fusion and global information extraction. The synergistic effect of the enhancement and calibration modules significantly improves the visibility and quality of the image, and provides high quality input data for line-of-sight estimation, thereby enhancing the accuracy and reliability of the model.
2. The present invention proposes a line-of-sight estimation model that combines feature purification and attention mechanisms. The feature purification module effectively extracts eye details through adaptive weights and a region attention mechanism, and the attention mechanism realizes more accurate gaze estimation by integrating spatial and channel weights with feature correlation analysis. In extensive testing, the method achieves angle errors of 12.75°, 13.27°, 12.43° and 11.57° on the Gaze360 dataset under the four low-light conditions Dark_comp, Dark_super, Dark_gamma and Dark_light respectively, outperforming existing advanced network models.
3. The invention not only opens up a new research direction for improving the sight line estimation in the low-light environment by using the low-light image enhancement technology, but also provides a new thought and solution for the sight line estimation technology. In addition, the technology provides important technical support for application fields such as man-machine interaction, driver fatigue driving monitoring and the like in a low-light environment, and widens the application range of the technology.
Drawings
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more clear, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. It will be apparent that the described embodiments are some, but not all, embodiments of the invention.
Unless defined otherwise, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this invention belongs.
FIG. 1 is a view line estimation model diagram in embodiment 1 of the present invention;
FIG. 2 is a diagram of a FWF-Gaze network framework in accordance with example 1 of the present invention;
FIG. 3 is a diagram of an enhanced network model of low-light images in embodiment 1 of the present invention;
FIG. 4 is a calibration network model diagram of low-light images in embodiment 1 of the present invention;
FIG. 5 is a diagram of the attention model in example 1 of the present invention;
FIG. 6 is a diagram of a characteristic purifying network model in embodiment 1 of the present invention;
FIG. 7 is a flow chart of a method according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
As shown in fig. 1, a high-precision gaze estimation method in low-light environments uses an improved residual network model ResNet, which significantly optimizes the performance of the model in various low-light environments by introducing an attention mechanism module in its second and third stages, and a feature purification module in its fourth stage. In addition, the introduced weak light image enhancement technology combines an enhancement module and a calibration module, so that the detail visibility and quality of the image are effectively improved. In particular, the enhancement module specifically enhances details of the eye region through multi-scale feature fusion and global context information extraction, and the calibration module further refines the image to eliminate artifacts possibly introduced in the enhancement process.
Fig. 2 shows the overall network framework of the FWF-Gaze model proposed by the present invention, which is specifically designed for line-of-sight estimation in low-light environments. The image is first processed by the image enhancement network and then passed to the gaze estimation network; this end-to-end integration enables the FWF-Gaze model to show excellent performance on gaze estimation under complex illumination.
In order to meet the requirements, the technical scheme adopted by the invention is as follows:
A high-precision sight line estimation method in a low-light environment specifically comprises the following steps:
s1, preprocessing the Gaze360 data set to simulate four low light environments, and preparing data for subsequent image processing and analysis.
S2, inputting the low-light image into an enhancement network, and processing to obtain an enhanced image I enhanced so as to improve the visibility and detail definition of the image under the low-light condition.
And S3, the enhanced image I enhanced is further processed by a calibration network, and a final calibration image I calibrated is output so as to eliminate noise and artifacts possibly introduced in the enhancement process.
And S4, performing feature extraction on the calibrated image by using the improved residual network model ResNet, and outputting a feature vector.
And S5, mapping the characteristic vector into a three-dimensional output vector O through a full connection layer, wherein the vector comprises the predicted sight direction (horizontal and vertical angles) and the predicted uncertainty (angle error) thereof.
And S6, applying hyperbolic tangent transformation to the first two elements of the three-dimensional output vector O to obtain accurate predicted sight line directions.
And S7, transforming a third element of the three-dimensional output vector O through a sigmoid function to obtain uncertainty of sight prediction.
And S8, measuring errors between the predicted result and the true value by adopting MSELoss loss functions, and updating network parameters through back propagation to optimize the performance and accuracy of the model.
Preferably, in step S1, in order to simulate the visual effect in different low-light environments, the present invention applies four specific preprocessing techniques to the Gaze360 dataset: darker scenes (Dark_comp), extremely dark scenes (Dark_super), low-illumination environments simulated using gamma correction (Dark_gamma), and dark scenes with unknown light source positions (Dark_light). By these methods, the present invention successfully constructs new Gaze360 datasets reflecting a variety of low-light conditions.
S101, for darker scenes (dark_comp) and extremely Dark scenes (dark_super), the invention increases the sense of realism of night vision by adjusting the brightness and contrast of the image. Specifically, brightness adjustment is realized by changing the dark part and the bright part of the image, and contrast adjustment is realized by adjusting the distribution range of colors in the image, so that the color display is more concentrated, and the distinction of the dark part and the bright part is enhanced.
S102, for the low-illumination environment (Dark_gamma) image set simulated using gamma correction, the present invention obtains the gamma-corrected output image O by O = I^(1/G) × 255, where I is the input image and G is the gamma value. When the gamma value is greater than 1, the brightness of the image increases accordingly; similarly, when the gamma value is less than 1, the image becomes darker. The closer the gamma value is to 0, the harder it is for the human eye to recognize the image content. Therefore, the invention finally selects a gamma value of 0.7 to simulate the low-illumination environment by gamma-correcting the image.
S103, in order to further simulate low light conditions in reality, such as museums or night driving scenes, the invention develops specific image processing steps, including image darkening and adding a local light source effect (dark_light). First, the overall brightness of the image is remarkably reduced by adjusting the brightness, and the appearance of dark parts in the night environment is enhanced. Then, by introducing gradual light source effects at random positions of the image and applying Gaussian blur, the illumination of a specific light source, such as a streetlamp, is simulated, and natural fusion of the illumination effects in the scene is ensured, so that the realism and visual focus of the scene are enhanced.
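For reference, the three simulation strategies above can be condensed into a short preprocessing sketch. This is a minimal illustration in Python/NumPy, not the exact pipeline of the invention: the brightness and contrast factors, the light-source radius and the blur kernel size are assumptions chosen only for demonstration.

import numpy as np
import cv2  # used only for the blurred synthetic light source

def simulate_dark(img, brightness=0.35, contrast=0.6):
    # Dark_comp / Dark_super style: lower brightness, then compress contrast around the new mean.
    out = img.astype(np.float32) / 255.0 * brightness
    out = (out - out.mean()) * contrast + out.mean()
    return np.clip(out * 255.0, 0, 255).astype(np.uint8)

def simulate_gamma(img, gamma=0.7):
    # Dark_gamma: O = I^(1/G) * 255 with the gamma value G = 0.7 used in the text.
    norm = img.astype(np.float32) / 255.0
    return np.clip((norm ** (1.0 / gamma)) * 255.0, 0, 255).astype(np.uint8)

def simulate_dark_light(img, dim=0.25, ksize=41):
    # Dark_light: strong global darkening plus one blurred local light source at a random position.
    # img is assumed to be an H x W x 3 uint8 array.
    h, w = img.shape[:2]
    dark = img.astype(np.float32) * dim
    mask = np.zeros((h, w), np.float32)
    cy, cx = np.random.randint(0, h), np.random.randint(0, w)
    cv2.circle(mask, (cx, cy), radius=min(h, w) // 6, color=1.0, thickness=-1)
    mask = cv2.GaussianBlur(mask, (ksize, ksize), 0)       # gradual falloff of the light source
    lit = dark + mask[..., None] * 120.0                   # add the local illumination
    return np.clip(lit, 0, 255).astype(np.uint8)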
Preferably, in the step S2, the enhancement network module is configured to extract multi-scale features from the input image, and combine global context information to finally implement detail enhancement and illumination balancing. It consists of an input convolution layer, ALGCF modules, a plurality of convolution blocks and an output convolution layer, and the image enhancement network model is shown in fig. 3.
The specific method comprises the following steps:
s201, firstly, extracting initial characteristics, wherein basic characteristics are extracted from an input image I through an initial convolution layer:
F0=ReLU(Conv0(I))
Conv0 is the initial convolution layer through which features of the input image are extracted. The ReLU activation function is used for nonlinear processing, so that the feature map F 0 keeps basic illumination and detail effects, and reliable initial features are provided for subsequent fusion and enhancement.
S202, an adaptive local-global fusion (ALGCF) module is then performed, which is responsible for fusing local and global information.
The local feature extractor is used for extracting detail information of the eye region:
Flocal=Conv1(F0)
Where Conv1 is the convolution layer of the local feature extractor focusing on extracting ocular details from the initial feature map. The local feature extractor captures important detail features through convolution kernels.
The global context information extractor is used to obtain the whole illumination and structure information of the face:
Fglobal=In(Conv2(AAP(F0)),size=Flocal)
Wherein AAP is an adaptive average pooling layer capable of extracting illumination and structural information from the global context. Conv2 uses a 1×1 convolution kernel to generate the global feature map. The global features are then adjusted to the same size as the local features by an interpolation In operation for fusion.
Combining local and global features, generating fusion weights, and carrying out self-adaptive feature fusion through the fusion weights:
G = σ(Conv3(Concat(F_local, F_global)))
F_ALGCF = G ⊙ F_local + (1 − G) ⊙ F_global
Wherein Concat combines the local and global features into one fused feature map; Conv3 is the convolution layer of the gating mechanism and σ denotes the Sigmoid function (the same below), which generates the fusion weight G and thus ensures a reasonable combination of information from different sources. The fused feature F_ALGCF adaptively fuses the local and global features according to the weights G and 1−G, ensuring that the facial feature map maintains global illumination consistency while preserving local eye detail.
S203, the fused feature map F_ALGCF is further processed through a plurality of convolution blocks to extract features:
Fi+1=Fi+ReLU(BN(Convi(Fi)))
Conv_i is the convolution layer of each convolution block, F_i is the i-th layer feature map, BatchNorm normalizes the features, and the ReLU activation function ensures nonlinear processing. Through the residual connection (+), each convolution block preserves the integrity of the input features while extracting new features.
The output convolution layer converts the fused feature map into an enhanced image:
Foutput=σ(Conv4(Fi+1))
Conv4 is an output convolution layer, converts the feature map into a final enhanced image through convolution, and normalizes the pixel value to be within a range of 0-1 through a Sigmoid activation function.
And S204, finally, adding the enhancement feature map to the input image to obtain a final enhancement image:
Ienhanced=Clamp(Foutput+I,0,1)
The Clamp function ensures that the pixel values of the enhanced image are within a reasonable range. By enhancing and fusing the details of the original image, the finally generated enhanced image has higher brightness and contrast.
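As a concrete reference for S201-S204, the following PyTorch sketch assembles the enhancement network from the formulas above. It is one possible reading of the text, not the patented implementation: the channel count, kernel sizes and number of residual convolution blocks are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ALGCF(nn.Module):
    # Adaptive local-global fusion: F_ALGCF = G * F_local + (1 - G) * F_global.
    def __init__(self, ch=32):
        super().__init__()
        self.local = nn.Conv2d(ch, ch, 3, padding=1)       # Conv1: eye-region detail
        self.global_ctx = nn.Conv2d(ch, ch, 1)             # Conv2: 1x1 conv on pooled features
        self.gate = nn.Conv2d(2 * ch, ch, 3, padding=1)    # Conv3: gating convolution

    def forward(self, f0):
        f_local = self.local(f0)
        f_global = self.global_ctx(F.adaptive_avg_pool2d(f0, 1))            # AAP
        f_global = F.interpolate(f_global, size=f_local.shape[-2:])         # In(..., size=F_local)
        g = torch.sigmoid(self.gate(torch.cat([f_local, f_global], dim=1)))
        return g * f_local + (1 - g) * f_global

class EnhanceNet(nn.Module):
    def __init__(self, ch=32, n_blocks=4):
        super().__init__()
        self.conv0 = nn.Conv2d(3, ch, 3, padding=1)        # initial convolution
        self.algcf = ALGCF(ch)
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU())
            for _ in range(n_blocks)])
        self.conv4 = nn.Conv2d(ch, 3, 3, padding=1)        # output convolution

    def forward(self, img):                                # img: (B, 3, H, W) in [0, 1]
        f = torch.relu(self.conv0(img))                    # F_0
        f = self.algcf(f)                                  # F_ALGCF
        for blk in self.blocks:                            # F_{i+1} = F_i + ReLU(BN(Conv_i(F_i)))
            f = f + blk(f)
        f_out = torch.sigmoid(self.conv4(f))               # F_output
        return torch.clamp(f_out + img, 0.0, 1.0)          # I_enhanced

A quick shape check: EnhanceNet()(torch.rand(1, 3, 224, 224)) returns a tensor of the same size with values in [0, 1].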
Preferably, in step S3, the enhanced image I_enhanced is processed by the calibration network (the calibration network model is shown in fig. 4); it first passes through an initial convolution layer to obtain a calibration feature map:
Fcalib(0)=ReLU(BN(Conv5(Ienhanced)))
Conv5 is an input convolution layer of the calibration network, extracts a calibration feature map, and performs standardization through BatchNorm to ensure stability and consistency of the feature map. The ReLU activation function is used for nonlinear processing, so that the features are more recognizable. F calib(0) represents the feature map obtained after the enhanced image is processed through the initial convolution layer, which is the first feature map in the calibration module, as the basis for the subsequent convolution block processing.
The calibration feature map F_calib(0) is subjected to illumination and detail adjustment through a plurality of convolution blocks:
Fcalib(i)=Fcalib(i-1)+ReLU(BN(Convi(Fcalib(i-1))))
Wherein F calib(i-1) represents the output feature map of the i-1 layer, which is also the input of the convolution block of the i layer, which provides input data for the current layer, including the accumulated feature information of the previous layer processing. F calib(i) is the output signature of the i-th layer convolution block, which combines the original input signature F calib(i-1) with the new signature that has been convolved, batch normalized, and ReLU processed. Residual connection (+) helps to prevent the problem of gradient extinction in deep networks while ensuring that sufficient original information is retained even in deep networks.
The final calibration feature map passes through the output convolution layer to generate a difference image:
ΔI=σ(Convout(Fcalib_i))
conv out is an output convolution layer, generates a final difference image through convolution, and ensures that an output value is within a reasonable range through a Sigmoid function.
Finally, subtracting the difference image from the enhanced input image to obtain a calibrated final image:
Icalibrated=Ienhanced-ΔI
the calibration module eliminates illumination non-uniformity and artifacts in the enhanced image through subtraction operation, so that the final calibration image is more approximate to an ideal state.
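The calibration network of step S3 can be sketched in the same style; the channel count and number of convolution blocks are again illustrative assumptions.

import torch
import torch.nn as nn

class CalibNet(nn.Module):
    # Predicts a residual image ΔI and subtracts it from I_enhanced.
    def __init__(self, ch=32, n_blocks=3):
        super().__init__()
        self.conv5 = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU())
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU())
            for _ in range(n_blocks)])
        self.conv_out = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, i_enhanced):
        f = self.conv5(i_enhanced)                 # F_calib(0)
        for blk in self.blocks:                    # F_calib(i) = F_calib(i-1) + ReLU(BN(Conv_i(...)))
            f = f + blk(f)
        delta = torch.sigmoid(self.conv_out(f))    # ΔI
        return i_enhanced - delta                  # I_calibrated = I_enhanced - ΔI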
Preferably, in the step S4, the feature extraction is performed on the calibrated image using the modified residual network model ResNet, and the performance of the model in various low-light environments is significantly optimized by introducing the attention mechanism module in the second and third stages thereof and the feature purification module in the fourth stage thereof.
S401, an attention mechanism module:
Spatial attention is focused on specific areas of the enhanced image, enabling the model to focus more on critical areas such as the eyes. The attention mechanism model diagram is shown in fig. 5. The present invention captures a wider range of contextual information by using a larger convolution kernel to emphasize the importance of the eye region:
S(x) = σ(Conv_spatial(x))
Where S(x) represents the spatial weight map obtained after passing through a convolution layer Conv_spatial (using a 7 × 7 convolution kernel and appropriate padding to keep the feature map size unchanged) followed by the Sigmoid function σ.
Channel attention evaluates the extent to which each channel contributes to line of sight estimation, and suppresses unimportant channels by enhancing the characteristics of important channels, thereby optimizing the quality of the overall characteristics.
C(x)=σ(Conv(ReLU(Conv(GAP(x)))))
Where GAP(x) represents global average pooling of the input x, compressing the spatial information of each channel into a single value. The channel weights are then generated through a two-layer convolution operation (first reducing and then restoring the channel dimension) with a ReLU activation function in between, and finally normalized by the Sigmoid function σ.
Then, feature correlation analysis is introduced, which not only fuses the attention of the space and the channel, but also optimizes the interaction between the features and enhances the integration effect of the features. The spatial and channel attention weighted features are connected in the channel dimension to yield feature F concat.
Fconcat=[x⊙S(x),x⊙C(x)]
Wherein x ⊙ S(x) and x ⊙ C(x) are the feature maps to which the spatial and channel weights are applied, respectively.
And then carrying out feature correlation analysis on the combined features.
R(x)=σ(Conv(ReLU(Conv(Fconcat))))
The first Conv is a dimension-reducing convolution layer that helps the module focus on capturing the most critical feature-correlation information by reducing the number of channels. The ReLU activation function is used here to add nonlinearity, enabling the model to capture more complex feature relationships. The second Conv is an expanding convolution layer that restores the channel number to its original value and generates the final feature-adjustment weights. These weights are constrained to a reasonable range by the Sigmoid function, ensuring that they can act on the original input as an effective scaling factor.
Through feature correlation analysis, the model can dynamically evaluate and adjust the relationships between features so that important features are highlighted while uncorrelated or noisy features are suppressed. The fine feature processing strategy remarkably improves the performance of the model under complex illumination and low light conditions, and ensures the accuracy and the robustness of sight estimation.
In combination with the above components, the enhanced feature F R(x) can be obtained by:
FR(x)=x·R(x)
By combining spatial and channel attention, and feature correlation analysis, the proposed attention mechanism is able to effectively identify and enhance key ocular features in gaze estimation, especially in varying illumination and complex visual scenes.
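One possible PyTorch reading of this attention mechanism is sketched below. The 7 × 7 spatial kernel and the reduce-then-expand channel and correlation branches follow the formulas above; the single-channel spatial map and the reduction ratio r are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class EyeFocusedSCAttention(nn.Module):
    def __init__(self, ch, r=8):
        super().__init__()
        self.spatial = nn.Conv2d(ch, 1, kernel_size=7, padding=3)   # Conv_spatial (size-preserving)
        self.ch_reduce = nn.Conv2d(ch, ch // r, 1)
        self.ch_expand = nn.Conv2d(ch // r, ch, 1)
        self.corr_reduce = nn.Conv2d(2 * ch, ch // r, 1)
        self.corr_expand = nn.Conv2d(ch // r, ch, 1)

    def forward(self, x):
        s = torch.sigmoid(self.spatial(x))                                   # S(x): spatial weight map
        c = torch.sigmoid(self.ch_expand(F.relu(
            self.ch_reduce(F.adaptive_avg_pool2d(x, 1)))))                   # C(x): channel weights
        f_concat = torch.cat([x * s, x * c], dim=1)                          # [x ⊙ S(x), x ⊙ C(x)]
        rel = torch.sigmoid(self.corr_expand(F.relu(self.corr_reduce(f_concat))))  # R(x)
        return x * rel                                                       # F_R(x) = x ⊙ R(x)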
S402, a characteristic purifying module:
The depth separable convolution is used for basic feature extraction, and the method effectively reduces the computational complexity and simultaneously maintains the extraction efficiency:
Fbasic=ReLU(BN(Conv(x)))
Wherein Conv(x) denotes a grouped convolution for extracting grouped features; the grouped convolution reduces model complexity, helps keep the parameters of different channels independent, and reduces the number of parameters; F_basic denotes the feature map after this preliminary processing.
The adaptive weight layer adjusts its contribution to the final output by learning the importance of each feature, thereby optimizing the representation of the key features:
Wadaptive=σ(Convadaptive(Fbasic))
Where W adaptive represents the adaptive weights generated for the feature map, conv adaptive comprises a two-step convolution operation to adjust the feature weights by reducing and expanding the number of channels.
The region attention module focuses on the eye region, emphasizing important features by generating an attention mask for a particular region:
Mfocus=σ(Convfocus(GAP(Fbasic)))
Wherein GAP focuses on global information, generates context for local features, M focus is a focus mask, conv focus includes a convolution step to convert global information into a focus mask.
Finally, combining the adaptive weight W adaptive and the attention mask M focus, performing detail enhancement and feature optimization:
Fenhanced=(Fbasic×Mfocus)×Wadaptive
And the focusing and the weight adjustment are combined, so that the expression of eye details and the quality of integral features are effectively improved. This module ensures that high quality and high accuracy ocular features can be obtained in gaze estimation by precisely adjusting the contribution and focal region of each feature. The feature purification model is shown in fig. 6.
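A corresponding sketch of the feature purification module is given below, following the F_basic, W_adaptive, M_focus and F_enhanced formulas above; the group count and reduction ratio are assumptions (the channel count must be divisible by the group count).

import torch
import torch.nn as nn
import torch.nn.functional as F

class EnhancedFeaturePurification(nn.Module):
    def __init__(self, ch, groups=8, r=4):
        super().__init__()
        self.basic = nn.Sequential(                         # grouped convolution + BN + ReLU
            nn.Conv2d(ch, ch, 3, padding=1, groups=groups),
            nn.BatchNorm2d(ch), nn.ReLU())
        self.adaptive = nn.Sequential(                      # Conv_adaptive: reduce then expand channels
            nn.Conv2d(ch, ch // r, 1), nn.ReLU(),
            nn.Conv2d(ch // r, ch, 1))
        self.focus = nn.Conv2d(ch, ch, 1)                   # Conv_focus on globally pooled features

    def forward(self, x):
        f_basic = self.basic(x)                                                  # F_basic
        w_adaptive = torch.sigmoid(self.adaptive(f_basic))                       # W_adaptive
        m_focus = torch.sigmoid(self.focus(F.adaptive_avg_pool2d(f_basic, 1)))   # M_focus
        return (f_basic * m_focus) * w_adaptive                                  # F_enhanced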
S403, residual connection module:
In the ResNet18 reference model, the input image is first processed through a 7 × 7 convolution layer, a BN layer and a ReLU activation function, followed by a 3 × 3 max pooling layer. The network then passes through four stages, each consisting of residual blocks; on the basis of the ResNet18 reference model, we add a feature purification module and an attention mechanism module to increase the accuracy of line-of-sight estimation in low-light scenes. The structure of the feature extraction network after low-light processing is shown in fig. 7, where Res1, Res2, Res3 and Res4 represent the four residual stages, each with a similar structure.
After the Res2 and Res3 phases, the attention mechanism module is integrated. These modules optimize feature recognition of key visual areas through spatial and channel attention and feature correlation analysis. The method is beneficial to the model to process and utilize the key information of eyes more accurately, so that more accurate and more stable sight tracking can be realized under various low-light environment conditions.
Fattention=EyeFocusedSCAttentionModule(Finput)
The feature purification module is embedded after the Res4 stage. The module particularly strengthens the perception of the model to the eye details through the optimization processing of the self-adaptive weight and the regional attention mechanism. This strategy is based on the importance of ocular features in gaze estimation, especially in low light or complex lighting conditions.
Fpurify=EnhancedFeaturePurificationModule(Finput)
By alternately using feature purification and attention mechanisms at different stages, the depth network advantage can be maintained while optimizing for the line-of-sight estimation needs at various low light scenes.
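The placement described above can be made concrete with the following backbone sketch, which reuses the two module sketches given earlier. Only the insertion points (attention after Res2 and Res3, feature purification after Res4) follow the text; the torchvision backbone and the plain 3-unit head are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18   # torchvision >= 0.13 API assumed

class LowLightGazeBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        net = resnet18(weights=None)
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
        self.res1, self.res2 = net.layer1, net.layer2
        self.res3, self.res4 = net.layer3, net.layer4
        self.att2 = EyeFocusedSCAttention(128)          # after Res2 (128 channels in ResNet18)
        self.att3 = EyeFocusedSCAttention(256)          # after Res3
        self.purify = EnhancedFeaturePurification(512)  # after Res4
        self.fc = nn.Linear(512, 3)                     # maps pooled features to the 3-D output vector O

    def forward(self, x):
        x = self.stem(x)
        x = self.res1(x)
        x = self.att2(self.res2(x))                     # F_attention after the second stage
        x = self.att3(self.res3(x))                     # F_attention after the third stage
        x = self.purify(self.res4(x))                   # F_purify after the fourth stage
        x = torch.flatten(F.adaptive_avg_pool2d(x, 1), 1)
        return self.fc(x)                               # O, before the tanh/sigmoid transforms of S6-S7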
Preferably, in steps S5-S8, in order to evaluate the predictive power of the model, the present invention measures the difference between the predicted result and the true gaze using the mean square error MSELoss as the loss function. The loss function MSELoss of this module is expressed as:
L_mse = (1/N) Σ_{i=1}^{N} (P_i − Y_i)²
where L_mse represents the mean square error loss, N is the number of samples, and P_i and Y_i represent the predicted and actual values of the i-th sample, respectively. The model is optimized using L_mse; by back-propagating the error, the parameters of the network can be iteratively updated, thereby gradually improving the performance of the model on the gaze estimation task.
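Steps S5-S8 can be summarized in a short sketch of the output head and one training step. Only the tanh/sigmoid transforms and the MSELoss follow the text; the π/2 scaling of the angles and the optimizer handling are assumptions.

import math
import torch
import torch.nn as nn

def decode_output(o):
    # S6: tanh on the first two elements for the gaze angles (scaled to ±π/2 here by assumption);
    # S7: sigmoid on the third element for the prediction uncertainty.
    yaw_pitch = torch.tanh(o[:, :2]) * (math.pi / 2)
    uncertainty = torch.sigmoid(o[:, 2:3])
    return yaw_pitch, uncertainty

criterion = nn.MSELoss()   # S8: L_mse between prediction P and ground truth Y

def training_step(model, images, gaze_gt, optimizer):
    optimizer.zero_grad()
    o = model(images)                       # three-dimensional output vector O
    pred, _ = decode_output(o)
    loss = criterion(pred, gaze_gt)         # L_mse = mean((P_i - Y_i)^2)
    loss.backward()                         # back propagation
    optimizer.step()                        # update network parameters
    return loss.item()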
Example 1:
The Gaze360 dataset is video data collected from 238 subjects in the real world; it is large-scale, combining 3D gaze annotations, a wide range of gaze and head poses, various indoor and outdoor capture environments, and subject diversity. On the Gaze360 dataset, the present invention follows the predefined training-test split, training with only the 84902 frontal face images. The test set contains 16031 images in total to fully evaluate the performance of the model. The four preprocessed Gaze360 datasets (Dark_comp, Dark_super, Dark_gamma, Dark_light) are used respectively to train the entire FWF-Gaze network.
Environment: experiments were performed on an NVIDIA GeForce RTX 3090 GPU under Windows using the PyTorch framework. The model hyper-parameter settings are the same for all datasets. The model was trained for 100 epochs with a batch size of 40, a learning rate of 0.0001, a decay factor of 1, and a decay step size of 5000.
The evaluation metric is the angular error, the mainstream metric for gaze estimation, i.e., the deviation angle between the predicted and true gaze directions; it is used to compare against other gaze estimation models, and the smaller the value, the better the effect. Assuming the actual gaze direction is g and the estimated gaze direction is ĝ, the angular error can be calculated as:
angular error = arccos( (g · ĝ) / (‖g‖ ‖ĝ‖) )
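A small helper for this metric, assuming both gaze directions are given as 3-D vectors (pitch/yaw predictions would first be converted to unit vectors under the dataset's convention):

import numpy as np

def angular_error_deg(gaze_true, gaze_pred):
    # Mean angular error in degrees between row-wise true and predicted 3-D gaze vectors.
    g = gaze_true / np.linalg.norm(gaze_true, axis=1, keepdims=True)
    p = gaze_pred / np.linalg.norm(gaze_pred, axis=1, keepdims=True)
    cos = np.clip(np.sum(g * p, axis=1), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)).mean())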
The comparison models adopt the advanced gaze estimation methods Gaze360, FullFace, RT-Gene and Dilated-Net. The experimental setup of each method follows the settings in the corresponding papers, including the model framework and hyper-parameters, to reproduce their network performance.
The experimental results are shown in table 1:
Table 1 Experimental results of the network proposed by the present invention and other advanced networks
As shown in the experimental data of the table 1, the method of the invention can effectively solve the problem of reduced accuracy of the sight line estimation in the low-light environment, and has a strong practical value.
Example 2:
the embodiment will introduce an applicable scenario of the present invention:
The sight estimation has wide application scenes, one application scene is an intelligent interaction system, and the FWF-Gaze network can be applied to detecting fatigue driving of a driver. Such an application scenario contributes to improvement of safety and convenience.
An important indicator of driver fatigue is the driver's gaze state, and the FWF-Gaze network model of the present invention is used to predict the gaze.
First, when a driver drives a vehicle, a face image of the driver can be captured in real time by a camera.
Then, the captured face image is input into the FWF-Gaze network model of the present invention to predict the driver's gaze state.
Finally, when the model predicts that the driver is likely to be in a tired state, the system may alert the driver to rest by sound or other means, or automatically switch to an automatic driving mode (if the vehicle supports it).
The examples of the present invention are merely for describing the preferred embodiments thereof and are not intended to limit the spirit and scope of the present invention; those skilled in the art may make various changes and modifications to the technical solution of the present invention without departing from the spirit of the present invention.

Claims (6)

1. A high-precision line-of-sight estimation method in a low-light environment, characterized by comprising the following steps:
S1. Dataset preprocessing, simulating a low-light environment;
S2. Low-light image enhancement, obtaining an enhanced image I_enhanced;
S3. Calibrating the enhanced image I_enhanced, obtaining a calibrated image I_calibrated;
S4. Performing feature extraction on the image and outputting feature vectors; the improved residual network model ResNet18 is used to extract features from the calibrated image;
S5. Mapping the feature vector into a three-dimensional output vector O through a fully connected layer;
S6. Applying a hyperbolic tangent transform to the first two elements of the three-dimensional output vector O to obtain the predicted gaze direction;
S7. Transforming the third element of the three-dimensional output vector O through a sigmoid function to obtain the uncertainty of the gaze prediction;
S8. Using the MSELoss loss function to measure the error between the prediction and the ground truth, and updating the network parameters through back propagation;
in step S2, the low-light image is enhanced by an enhancement network module; the specific steps of S2 are as follows:
S201. Initial feature extraction: the input image I passes through an initial convolution layer to extract basic features:
F_0 = ReLU(Conv0(I))
where Conv0 is the initial convolution layer;
S202. Adaptive local-global fusion, i.e., ALGCF:
detail information of the eye region is extracted by a local feature extractor:
F_local = Conv1(F_0)
where Conv1 is the convolution layer of the local feature extractor;
the overall facial illumination and structure information is obtained by a global context information extractor:
F_global = In(Conv2(AAP(F_0)), size = F_local)
where AAP is an adaptive average pooling layer; Conv2 uses a 1×1 convolution kernel to generate a global feature map; the global features are then adjusted to the same size as the local features through an interpolation In operation;
the local and global features are combined and fusion weights are generated, and adaptive feature fusion is performed through the fusion weights:
G = σ(Conv3(Concat(F_local, F_global)))
F_ALGCF = G ⊙ F_local + (1 − G) ⊙ F_global
where Concat combines the local and global features into one fused feature map; Conv3 is the convolution layer of the gating mechanism, σ denotes the Sigmoid function, and the fusion weight G is generated by the Sigmoid function; the fused feature F_ALGCF adaptively fuses the local and global features according to the fusion weights G and 1−G;
S203. The fused feature map F_ALGCF is further processed by multiple convolution blocks to extract features:
F_{i+1} = F_i + ReLU(BN(Conv_i(F_i)))
where Conv_i is the convolution layer of each convolution block and F_i is the i-th layer feature map; the output convolution layer converts the fused feature map into an enhanced image:
F_output = σ(Conv4(F_{i+1}))
where Conv4 is the output convolution layer, which converts the feature map into an enhanced image by convolution and normalizes the pixel values to the range 0-1 through a Sigmoid activation function;
S204. The enhanced feature map is added to the input image to obtain the final enhanced image:
I_enhanced = Clamp(F_output + I, 0, 1)
where the Clamp function ensures that the pixel values of the enhanced image are within range.

2. The method according to claim 1, characterized in that in S1, preprocessing is performed on the Gaze360 dataset to construct a new Gaze360 dataset reflecting various low-light conditions; low-light environments are classified into darker scenes, extremely dark scenes, low-illumination environments simulated using gamma correction, and simulated dark scenes with unknown light source positions; the specific steps of S1 are as follows:
S101. For darker scenes and extremely dark scenes, the brightness and contrast of the image are adjusted;
S102. For the low-illumination environment image set simulated using gamma correction, the gamma-corrected output image O is obtained through O = I^(1/G) × 255, where I is the input image and G is the gamma value;
S103. For the simulated dark scene image set with unknown light source positions, the image is darkened and a local light source effect is added; first, the overall brightness of the image is significantly reduced by adjusting the brightness to enhance the appearance of dark areas in a night environment; then, a gradient light source effect is introduced at random positions of the image and Gaussian blur is applied to simulate the illumination of a specific light source, ensuring that the lighting effect blends naturally into the scene.

3. The method according to claim 1, characterized in that in step S3, the enhanced image I_enhanced passes through an initial convolution layer to obtain a calibration feature map:
F_calib(0) = ReLU(BN(Conv5(I_enhanced)))
where Conv5 is the input convolution layer of the calibration network; F_calib(0) denotes the feature map obtained after the enhanced image is processed through the initial convolution layer, i.e., the calibration feature map;
the calibration feature map F_calib(0) undergoes illumination and detail adjustment through multiple convolution blocks:
F_calib(i) = F_calib(i-1) + ReLU(BN(Conv_i(F_calib(i-1))))
where F_calib(i-1) denotes the output feature map of the (i-1)-th layer and F_calib(i) is the output feature map of the i-th convolution block;
the final calibration feature map passes through the output convolution layer to generate a difference image:
ΔI = σ(Conv_out(F_calib_i))
where Conv_out is the output convolution layer; I_calibrated = I_enhanced − ΔI.

4. The method according to claim 3, characterized in that in step S4, the improved residual network model ResNet18 is used to extract features from the calibrated image, an attention mechanism module is introduced in the second and third stages, and a feature purification module is introduced in the fourth stage.

5. A computer-readable storage medium on which a computer program is stored, characterized in that when the program is executed by a processor, the steps of the high-precision line-of-sight estimation method in a low-light environment according to any one of claims 1 to 4 are implemented.

6. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that when the processor executes the program, the steps of the high-precision line-of-sight estimation method in a low-light environment according to any one of claims 1 to 4 are implemented.
CN202410833622.6A 2024-06-26 2024-06-26 High-precision line of sight estimation method in low-light environment Active CN118799949B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410833622.6A CN118799949B (en) 2024-06-26 2024-06-26 High-precision line of sight estimation method in low-light environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410833622.6A CN118799949B (en) 2024-06-26 2024-06-26 High-precision line of sight estimation method in low-light environment

Publications (2)

Publication Number Publication Date
CN118799949A CN118799949A (en) 2024-10-18
CN118799949B true CN118799949B (en) 2025-04-11

Family

ID=93024859

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410833622.6A Active CN118799949B (en) 2024-06-26 2024-06-26 High-precision line of sight estimation method in low-light environment

Country Status (1)

Country Link
CN (1) CN118799949B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117292421A (en) * 2023-09-12 2023-12-26 南通大学 GRU-based continuous vision estimation deep learning method
CN117830783A (en) * 2024-01-03 2024-04-05 南通大学 A gaze estimation method based on local super-resolution fusion attention mechanism

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110058694B (en) * 2019-04-24 2022-03-25 腾讯科技(深圳)有限公司 Sight tracking model training method, sight tracking method and sight tracking device
CN115049619B (en) * 2022-06-16 2024-04-09 浙江理工大学 Efficient flaw detection method for complex scene
CN115760640A (en) * 2022-12-06 2023-03-07 太原理工大学 Coal mine low-illumination image enhancement method based on noise-containing Retinex model
CN115951775B (en) * 2022-12-16 2025-07-18 中国地质大学(武汉) MLP-based three-dimensional sight estimation method, device, equipment and storage medium
CN118116063B (en) * 2023-12-06 2025-04-11 南通大学 High-precision gaze estimation method based on multimodal and Transformer attention mechanism

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117292421A (en) * 2023-09-12 2023-12-26 南通大学 GRU-based continuous vision estimation deep learning method
CN117830783A (en) * 2024-01-03 2024-04-05 南通大学 A gaze estimation method based on local super-resolution fusion attention mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A Survey of Gaze Estimation Methods Based on Deep Learning; 温铭淇 et al.; Computer Engineering and Applications (计算机工程与应用); 2024-06-15; Vol. 60, No. 12; Section 2.1 *

Also Published As

Publication number Publication date
CN118799949A (en) 2024-10-18

Similar Documents

Publication Publication Date Title
US20230214976A1 (en) Image fusion method and apparatus and training method and apparatus for image fusion model
CN111709902B (en) Infrared and visible light image fusion method based on self-attention mechanism
CN109671023B (en) A Super-resolution Reconstruction Method of Face Image
CN111915526A (en) Photographing method based on brightness attention mechanism low-illumination image enhancement algorithm
CN115311186B (en) Cross-scale attention confrontation fusion method and terminal for infrared and visible light images
WO2021063341A1 (en) Image enhancement method and apparatus
CN115442515A (en) Image processing method and apparatus
WO2021164234A1 (en) Image processing method and image processing device
CN116757986A (en) Infrared and visible light image fusion method and device
Fan et al. Multiscale cross-connected dehazing network with scene depth fusion
CN110675462A (en) A Colorization Method of Grayscale Image Based on Convolutional Neural Network
CN103237168A (en) Method for processing high-dynamic-range image videos on basis of comprehensive gains
CN114092774B (en) RGB-T image saliency detection system and detection method based on information flow fusion
Yuan et al. Single image dehazing via NIN-DehazeNet
Zheng et al. Overwater image dehazing via cycle-consistent generative adversarial network
CN114663951B (en) Low-illumination face detection method and device, computer equipment and storage medium
CN111738964A (en) Image data enhancement method based on modeling
CN111914938A (en) Image attribute classification and identification method based on full convolution two-branch network
CN118822921B (en) Low-illumination image enhancement method based on HSV and attention mechanism
CN120088839A (en) A cross-view line of sight estimation method based on feature decoupling and attention mechanism
CN119379575A (en) HDR image reconstruction method, device, equipment and storage medium
CN119540071A (en) De-noising diffusion model driven texture enhanced infrared and visible light image fusion method and system
CN118799949B (en) High-precision line of sight estimation method in low-light environment
CN118870207A (en) A method, system and device for processing automatic white balance of an image
Xue et al. A Study of Lightweight Classroom Abnormal Behavior Recognition by Incorporating ODConv

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20250924

Address after: 100000 Beijing City Chaoyang District East Fourth Ring North Road No. 2 Shangdong Park Building 3rd Floor

Patentee after: Beijing Cactus Technology Co.,Ltd.

Country or region after: China

Address before: Nantong University Technology Transfer Research Institute, Building 1, No. 79 Yongfu Road, Chongchuan District, Nantong City, Jiangsu Province, 226000

Patentee before: NANTONG University

Country or region before: China
