
CN118505540A - Low-illumination image enhancement method based on random region hiding reconstruction - Google Patents

Low-illumination image enhancement method based on random region hiding reconstruction

Info

Publication number
CN118505540A
Authority
CN
China
Prior art keywords
feature
low
image
light
reconstruction
Prior art date
Legal status
Pending
Application number
CN202410903362.5A
Other languages
Chinese (zh)
Inventor
杨一帆
王琪树
张弘
袁丁
冯亚春
Current Assignee
Beihang University
Original Assignee
Beihang University
Priority date
Filing date
Publication date
Application filed by Beihang University
Priority to CN202410903362.5A
Publication of CN118505540A

Classifications

    • G06T 5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06N 3/0455: Auto-encoder networks; encoder-decoder networks
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06N 3/0895: Weakly supervised learning, e.g. semi-supervised or self-supervised learning
    • G06T 3/4038: Image mosaicing, e.g. composing plane images from plane sub-images
    • G06T 5/70: Denoising; smoothing
    • G06T 5/77: Retouching; inpainting; scratch removal
    • G06V 10/764: Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V 10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/806: Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
    • G06T 2207/20081: Training; learning
    • G06T 2207/20084: Artificial neural networks [ANN]


Abstract

The present invention discloses a low-light image enhancement method based on random region hiding reconstruction. The network is trained in two stages. In the first stage, the network is trained on a well-lit image dataset to reconstruct noisy images whose random regions have been hidden, yielding a denoising feature encoder and a reconstruction feature encoder suited to low-light noisy images. In the second stage, the network inherits the encoder parameters obtained in the first stage and feeds the extracted features into a multi-scale feature fusion module guided by the reconstruction features, so that the network preserves more image detail while denoising. By exploiting the far more abundant datasets of well-lit images, the method saves the time and labor of capturing large numbers of real paired low-light images for network training; and by fusing the features extracted by the different encoders, the network adapts quickly to real low-light images under complex illumination and achieves good low-light enhancement and denoising.

Description

A low-light image enhancement method based on random region hiding reconstruction

Technical Field

The invention belongs to the field of low-illumination image enhancement, and in particular relates to a low-illumination image enhancement method based on random region hiding reconstruction.

Background Art

With the advance of science and technology, images are used ever more widely in production and daily life, playing an important role in medicine, education, entertainment, scientific research, industry, agriculture, and other fields. For example, medical images help doctors make diagnoses, and satellite images are used for climate forecasting and geographical research. High-quality images not only provide a better visual experience but also convey information more accurately and support decision-making more effectively. In practice, however, environmental and equipment limitations often yield low-quality images that are too dark, noisy, or color-shifted, so image enhancement is needed as post-processing to improve image quality and usability.

The main purpose of low-light enhancement is to improve the brightness and clarity of images captured in low-light environments. Among traditional image processing methods, spatial-domain algorithms such as histogram equalization and gamma correction raise brightness and contrast by operating directly on pixels, but they often amplify noise and introduce artifacts and color distortion. Frequency-domain algorithms smooth the image with low-pass filtering, sharpen edge detail with high-pass filtering, and recover dark-region detail with homomorphic filtering, but they handle low-light images poorly as a whole, usually require parameter tuning for different illumination, and perform badly under the complex lighting of urban scenes. Retinex theory is also widely used for low-light enhancement: based on trichromatic theory and color constancy, it holds that an object's color is determined by its reflectance of long, medium, and short wavelengths rather than by the intensity of the reflected light, so the original low-light image is decoupled into an illumination component and a reflectance component, and the enhanced image is obtained by separating out the reflectance. However, Retinex methods handle extremely dark and complex lighting conditions poorly, often require parameter tuning, and generalize weakly.

With the development of neural networks and the growth of computing power, neural-network-based image processing has been widely adopted in vision. Convolutional neural networks, however, are limited by the kernel size and have a small receptive field, so they capture global long-range relationships in images poorly; in low-light enhancement this easily yields unevenly lit results and makes it hard to exploit useful information from non-adjacent blocks. Transformer networks capture global dependencies in sequence data well; ViT was the first network to apply the Transformer architecture to computer vision tasks, matching or even surpassing convolutional performance. But the Transformer is a sequence-to-sequence model: taking pixels as the input sequence makes the sequence far longer than typical sequences in natural language processing (e.g., 224×224 = 50,176), which is very computationally expensive. In short, existing low-light enhancement methods need large numbers of real paired low-light images for supervised training, yet capturing many image pairs of the same scene under low-light and well-lit conditions is labor- and time-intensive, and such datasets are currently very scarce, which hinders network training. Meanwhile, a Transformer applied to high-resolution images with pixels as the input sequence demands enormous computation and high computing power. Existing methods also suffer from limited low-light denoising capability, leaving the edge details of the enhanced image unclear. A method that can enhance low-light images quickly and effectively is therefore urgently needed.

Summary of the Invention

To overcome the shortcomings of the prior art, the present invention provides a low-light image enhancement method based on random region hiding reconstruction. First, the far more plentiful images captured under good lighting are exploited: the input well-lit images are darkened and subjected to random masking and noise injection, and the network is trained on the self-supervised task of reconstructing the lost pixels hidden by the mask. This greatly reduces the computation, saving compute and time, while improving the network's ability to learn the global information of low-light noisy images, so that edge details are better preserved while the low-light noisy images are denoised.

In the second stage, a multi-scale feature fusion module guided by the reconstruction features allows the features extracted by the different encoders to be fused more effectively. Compared with a U-net convolutional denoising network, which is constrained by a regular rectangular sliding window and the limited receptive field of convolution, the fusion module selects the sampling points it interacts with more flexibly and over a wider range, extracting feature information better suited to low-light noisy images. Meanwhile, the relatively scarce real low-light image pairs are used to fine-tune the network parameters, so the network adapts more quickly to diverse real, complex low-light scenes and achieves better low-light enhancement and denoising than existing methods, producing clear images. By performing random masking on the original image and then reconstructing it as a self-supervised learning task, and by fusing the different features in the reconstruction-feature-guided multi-scale fusion module, the present invention reduces the data volume and saves compute while improving the network's flexibility in processing the information of low-light noisy images.

To achieve the above object, the present invention adopts the following technical scheme:

In the low-light image enhancement method based on random region hiding reconstruction of the present invention, the network is trained in two stages that share the same feature extraction module (comprising the denoising feature encoder and the reconstruction feature encoder). The purpose of the first-stage training is to train on the far more plentiful datasets captured under good lighting conditions to obtain a better feature extractor for low-light noisy images. The second stage inherits the feature extractor obtained in the first-stage task and fuses the image features through the reconstruction-feature-guided multi-scale feature fusion module; the network weights are then adjusted with a small number of real low-light noisy images to obtain a low-light enhancement network with stronger generalization. The method specifically comprises the following steps:

The specific steps of the first stage are:

Step (1): Preprocess an image dataset captured under good lighting by randomly adding noise and reducing brightness, producing synthetic low-light noisy images.

Step (2): Divide the low-light noisy image obtained in step (1) into image blocks of size 16×16, generate a random sequence from a uniform distribution, sort the random values and map them onto the original image blocks, and mask 75% of the blocks.

Step (3): Feed the images before and after masking into a convolutional neural network with a 16×16 kernel and a fully connected layer, respectively, to obtain a 196×768 encoding; apply sine-cosine positional encoding to each image block and add the two, yielding the initial encoding of each image block.

Step (4): Feed the image encodings into the denoising feature encoder and the reconstruction feature encoder to obtain high-dimensional feature maps;

Step (5): Feed the feature maps output by the two branch encoders into the multi-layer noise decoder and the multi-layer restoration decoder, respectively;

Step (6): Output the decoded noise distribution map and reconstruction map, compare them with the input image, and compute the reconstruction loss and perceptual loss.

The specific steps of the second stage are:

Step (7): Inherit the denoising feature encoder and reconstruction feature encoder obtained in the first-stage task.

Step (8): Feed the real-scene low-light noisy image dataset into the two branch encoders obtained in the first stage for feature extraction.

Step (9): Feed the final feature maps of the two branches together into the reconstruction-feature-guided multi-scale feature fusion module for feature fusion.

Step (10): Feed the feature map from step (9) into the low-light denoising decoder to obtain the enhanced, denoised result; compute the loss against the ground truth and fine-tune the network weights so that the network quickly adapts to diverse, complex real noisy images.

Compared with the prior art, the beneficial effects of the present invention are:

1. The first-stage task trains the network on the far richer datasets captured under good lighting, yielding a feature extractor capable of handling low-light noisy images. Whereas existing enhancement methods amplify noise after low-light enhancement, the present invention reduces noise after enhancement while avoiding the overfitting caused by the scarcity of real low-light image datasets.

2. A 75% masking ratio is applied to the original image, so only 25% of the original image data enters the network for feature extraction; the computation is greatly reduced, saving compute and accelerating network training.

3. Real low-light images, of which only a relatively small amount is needed, are used to fine-tune the network weights, so the network further adapts to real, complex low-light scenes with better robustness and generalization.

4. Compared with the regular rectangular sliding window of convolutional neural networks, the present invention uses the reconstruction features in the fusion module to generate the offsets of sampling points relative to each feature point, obtaining a more flexible interaction range.

5. Compared with a U-net denoising network, which loses texture detail through repeated downsampling, the present invention uses sampling points at every scale during feature fusion, preserving the details of the noisy image; it is therefore better suited to noisy images and mitigates the blurring caused by denoising.

Brief Description of the Drawings

FIG. 1 is a flow chart of the two-stage tasks of the present invention;

FIG. 2 is a schematic diagram of the reconstruction-feature-guided feature fusion module in the second-stage task of the present invention;

FIG. 3 is a schematic diagram of the structure of the fusion block inside the reconstruction-feature-guided feature fusion module in the second-stage task of the present invention.

Detailed Description

To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it. In addition, the technical features of the embodiments described below may be combined with one another as long as they do not conflict.

The present invention first trains a neural network to reconstruct images whose random regions have been masked out, so that the network learns detailed deep semantic information about low-light noisy images. It then uses real low-light image pairs to fine-tune the feature extraction network inherited from the first stage, and fuses the different feature types through reconstruction-feature-guided multi-scale feature fusion. The result is low-light enhancement and denoising that generalizes across diverse real low-light conditions, suitable for low-visibility scenarios and able to provide clear images for systems such as autonomous driving, security monitoring, and target capture.

As shown in FIG. 1, the network of the low-light image enhancement method based on random region hiding reconstruction of the present invention obtains the low-light enhancement model in two stages, which share the denoising feature encoder and the reconstruction feature encoder.

The first stage comprises the following steps:

Step (1): Apply Gaussian noise and Poisson noise to images captured under good lighting and randomly reduce image brightness to generate 3-channel low-light noisy images of height 224 and width 224.
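The degradation in step (1) can be sketched as follows. This is a minimal illustration, not the patent's exact pipeline: the noise strengths and the brightness-reduction range are assumptions, since the patent specifies only that Gaussian and Poisson noise are applied and brightness is randomly reduced.

```python
# Sketch of the step (1) synthetic degradation; sigma, peak, and the gain
# range are assumed values, not taken from the patent.
import torch

def degrade(img: torch.Tensor,
            gauss_sigma: float = 0.03,     # assumed read-noise std
            poisson_peak: float = 255.0,   # assumed photon count at full scale
            min_gain: float = 0.05,
            max_gain: float = 0.3) -> torch.Tensor:
    """img: (3, 224, 224) float tensor in [0, 1] -> low-light noisy image."""
    gain = torch.empty(1).uniform_(min_gain, max_gain)        # random darkening
    dark = img * gain
    shot = torch.poisson(dark * poisson_peak) / poisson_peak  # Poisson noise
    read = torch.randn_like(dark) * gauss_sigma               # Gaussian noise
    return (shot + read).clamp(0.0, 1.0)
```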

Step (2): Divide each 224×224 image into 14×14 = 196 image blocks of size (16, 16). After dividing the low-light noisy image into blocks, randomly select 75% of the blocks and mask them to obtain the masked low-light noisy image.
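A compact way to realize this uniform-random selection is the argsort scheme the patent describes: draw one uniform value per patch, sort, and keep the first 25%. The sketch below assumes the patches have already been flattened into tokens.

```python
# Sketch of the step (2) masking: rank uniform random values and hide 75%
# of the 196 patch tokens of a 224x224 image.
import torch

def random_mask(patches: torch.Tensor, mask_ratio: float = 0.75):
    """patches: (N, L, D) patch tokens; returns visible tokens and a 0/1 mask."""
    N, L, D = patches.shape
    len_keep = int(L * (1 - mask_ratio))       # 49 visible patches for L = 196
    noise = torch.rand(N, L)                   # random sequence, uniform on [0, 1)
    ids_shuffle = torch.argsort(noise, dim=1)  # sort the random values
    ids_keep = ids_shuffle[:, :len_keep]       # map back to patch indices
    visible = torch.gather(patches, 1,
                           ids_keep.unsqueeze(-1).expand(-1, -1, D))
    mask = torch.ones(N, L)                    # 1 = hidden, 0 = visible
    mask.scatter_(1, ids_keep, 0)
    return visible, mask
```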

Step (3): Feed the images before and after masking into a convolutional neural network with a 16×16 kernel and a fully connected layer, respectively, to obtain a 196×768 encoding; apply sine-cosine positional encoding to each image block and add the two, yielding the initial encoding of each image block.
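One way to read this step in code: a stride-16 convolution plays the role of the 16×16 patch projector (a convolution whose stride equals its kernel size is equivalent to splitting into patches and applying a linear layer), and fixed sine-cosine position codes are added. Folding the patent's separate fully connected layer into the projection is a simplification.

```python
# Sketch of the step (3) embedding: 16x16 patch projection to 768 channels
# plus fixed sine-cosine positional encoding, giving 196x768 tokens.
import math
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    def __init__(self, dim: int = 768, patch: int = 16, img: int = 224):
        super().__init__()
        self.proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        n = (img // patch) ** 2                       # 196 patches
        pos = torch.zeros(n, dim)
        position = torch.arange(n, dtype=torch.float32).unsqueeze(1)
        div = torch.exp(torch.arange(0, dim, 2, dtype=torch.float32)
                        * (-math.log(10000.0) / dim))
        pos[:, 0::2] = torch.sin(position * div)      # sine on even channels
        pos[:, 1::2] = torch.cos(position * div)      # cosine on odd channels
        self.register_buffer("pos", pos.unsqueeze(0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = self.proj(x).flatten(2).transpose(1, 2)  # (N, 196, 768)
        return tokens + self.pos
```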

Step (4): Feed the image encodings into the denoising feature encoder and the reconstruction feature encoder to obtain high-dimensional feature maps. The feature encoders use the encoder structure of the classic Transformer; each encoder consists in turn of the following modules and processing layers: (1) a normalization layer; (2) a multi-head self-attention layer; (3) a residual connection; (4) a normalization layer; (5) a multi-layer perceptron.
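A single encoder block with exactly this layer order might look as follows; the head count and MLP width are assumptions, since the patent does not state them.

```python
# Sketch of one step (4) encoder block: norm -> multi-head self-attention
# -> residual -> norm -> MLP -> residual (pre-norm Transformer layout).
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, dim: int = 768, heads: int = 12, mlp_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio), nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # MHSA + residual
        return x + self.mlp(self.norm2(x))                 # MLP + residual
```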

Step (5): Feed the feature maps output by the two branch encoders into the multi-layer noise decoder and the multi-layer restoration decoder, respectively. Both decoders consist of 6 feature-decoder layers, each composed in turn of a multi-head self-attention layer, a residual connection, and a normalization layer.

Step (6): Output the decoded noise distribution map and reconstruction map, compare them with the input image, and compute the reconstruction loss and perceptual loss.
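A possible reading of this objective is sketched below. The patent states only that reconstruction and perceptual losses based on the L1 and L2 norms are used; the VGG-16 feature extractor, the residual noise target, and the 0.1 weighting are assumptions for illustration.

```python
# Sketch of the step (6) losses: L1 reconstruction + L2 noise-map term,
# plus an assumed VGG-feature perceptual term (ImageNet normalization omitted).
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

_vgg = vgg16(weights="IMAGENET1K_V1").features[:16].eval()
for p in _vgg.parameters():
    p.requires_grad_(False)

def stage1_loss(recon, noise_map, clean, noisy):
    noise_gt = noisy - clean                       # assumed target for the noise map
    rec = F.l1_loss(recon, clean) + F.mse_loss(noise_map, noise_gt)
    perc = F.mse_loss(_vgg(recon), _vgg(clean))    # feature-space distance
    return rec + 0.1 * perc                        # 0.1 weight is an assumption
```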

In the second stage, the corresponding modules of the second-stage network are initialized with the feature encoder weights inherited from the first stage, and the low-light enhancement network is trained with the following steps:

Step (7): Feed real low-light images into the second-stage network: first repeat step (2) for encoding, then perform feature extraction with the denoising feature encoder and reconstruction feature encoder pre-trained in the first stage, and feed the features finally extracted by the upper and lower branches into the reconstruction-feature-guided multi-scale feature fusion module.

Step (8): As shown in FIG. 2, the features of the two branches are input into the reconstruction-feature-guided multi-scale feature fusion module, where they first pass through the fusion block. As shown in FIG. 3, inside the fusion block the denoising features and the reconstruction features are each downsampled by convolutional neural networks with 3×3 kernels and strides of 1, 2, and 2, yielding denoising features and reconstruction features at three scales. The multi-scale reconstruction features are then each fed into a convolutional neural network with a 3×3 kernel and stride 1, generating, for every feature point in the corresponding-scale denoising feature map, the offsets of 9 irregular sampling points relative to that point. Guided by the resulting multi-scale offset maps, every feature point in every scale of the denoising feature maps is combined with its 27 irregular sampling points (3 scales, 9 points each) to produce updated multi-scale denoising feature maps, which are fed into the next fusion layer. After the last fusion layer, the small-scale denoising feature maps are upsampled and concatenated with the full-size feature map, yielding an updated denoising feature map of the original size. A single-scale sketch of one such fusion layer follows.
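The sketch below compresses one fusion layer to a single scale: a 3×3 convolution over the reconstruction features predicts the (dy, dx) offsets of 9 sampling points per feature point, and `grid_sample` stands in for gathering the denoising features at those irregular positions. The offset normalization and the plain averaging of the 9 samples are assumptions; the patent's module operates at three scales with 27 points.

```python
# Single-scale sketch of a step (8) fusion layer: reconstruction features
# steer where the denoising features are sampled (deformable-style sampling).
import torch
import torch.nn as nn
import torch.nn.functional as F

class OffsetFusion(nn.Module):
    def __init__(self, c: int):
        super().__init__()
        self.offset = nn.Conv2d(c, 2 * 9, 3, padding=1)  # 9 (dy, dx) pairs per point

    def forward(self, den: torch.Tensor, rec: torch.Tensor) -> torch.Tensor:
        n, c, h, w = den.shape
        off = self.offset(rec).view(n, 9, 2, h, w)       # offsets from rec features
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                                torch.linspace(-1, 1, w), indexing="ij")
        base = torch.stack((xs, ys), dim=-1).to(den)     # (h, w, 2) base grid, (x, y)
        out = torch.zeros_like(den)
        for k in range(9):                               # gather 9 irregular samples
            d = off[:, k].permute(0, 2, 3, 1).flip(-1)   # (n, h, w, 2) as (dx, dy)
            grid = base + d * (2.0 / max(h, w))          # assumed offset scaling
            out = out + F.grid_sample(den, grid, align_corners=True)
        return out / 9                                   # plain average (assumption)
```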

Step (9): Multiply the output of the fusion block in step (8) by the parameter matrices Wq and Wk to obtain feature maps Q and K, multiply the input denoising features by the parameter matrix Wv to obtain feature map V, perform cross-attention over Q, K, and V, and output the resulting feature map as the new denoising features.
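In code, this step can be read as a standard cross-attention with the three projection matrices the patent names; a single-head version is sketched below (the head count is not specified for this step).

```python
# Sketch of the step (9) cross-attention: Q, K from the fusion output via
# Wq, Wk; V from the input denoising features via Wv.
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.wq = nn.Linear(dim, dim, bias=False)   # Wq
        self.wk = nn.Linear(dim, dim, bias=False)   # Wk
        self.wv = nn.Linear(dim, dim, bias=False)   # Wv

    def forward(self, fused: torch.Tensor, den: torch.Tensor) -> torch.Tensor:
        q, k = self.wq(fused), self.wk(fused)       # from the fusion output
        v = self.wv(den)                            # from the denoising features
        attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        return attn @ v                             # new denoising feature map
```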

Step (10): After repeating steps (8) and (9) for N layers, feed the denoising features output by the last layer into a multi-layer decoder consisting of 6 low-light denoising decoder layers, each composed in turn of a multi-head self-attention layer, a residual connection, and a normalization layer. The final enhanced result is compared with the ground truth, the L2 loss and perceptual loss are computed, and the whole network is fine-tuned so that it adapts well to real low-light scenes and produces the enhanced output.

Preferably, the reconstruction loss and perceptual loss in steps (6) and (10) are the L1 and L2 norms. The first stage uses the Adam optimizer with exponential decay rates β1 = 0.9 for the first-moment estimate and β2 = 0.999 for the second-moment estimate. The first-stage low-light denoising-reconstruction model is trained for 100 epochs with an initial learning rate of 0.0006, decayed linearly to 0.0003 after 50 epochs. After the first stage, the second-stage low-light enhancement-denoising model is fine-tuned for 30 epochs on a real low-light dataset with a learning rate of 0.0002. To increase data availability, the 500 pairs of real low-light images are augmented during second-stage training by random cropping and horizontal flipping, improving the low-light enhancement and denoising performance of the second-stage task.
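The stated schedule can be reproduced with a plain Adam optimizer and a hand-written learning-rate ramp; the placeholder model below stands in for the stage-1 network, and the per-epoch shape of the decay between epochs 50 and 100 is an assumption.

```python
# Sketch of the stage-1 schedule: Adam (beta1=0.9, beta2=0.999), 100 epochs,
# lr 6e-4 held for 50 epochs then decayed linearly toward 3e-4.
import torch

model = torch.nn.Linear(8, 8)   # placeholder for the stage-1 network
opt = torch.optim.Adam(model.parameters(), lr=6e-4, betas=(0.9, 0.999))

def lr_at(epoch: int) -> float:
    if epoch < 50:
        return 6e-4
    t = (epoch - 50) / 50        # 0 -> 1 over epochs 50..99
    return 6e-4 - t * 3e-4       # linear decay toward 3e-4

for epoch in range(100):
    for g in opt.param_groups:
        g["lr"] = lr_at(epoch)
    # ... one epoch over the synthetic low-light training batches ...
```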

Those skilled in the art will readily understand that the above are merely preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (6)

1. A low-light image enhancement method based on random region hiding reconstruction, characterized in that it comprises the following steps:

Step 1: obtain a dataset of 10,000 color visible-light images captured under good lighting conditions and divide it into a training set and a test set; obtain a dataset of 500 pairs of color visible-light images, each pair comprising a low-light noisy image and a well-lit image of the same real scene, and divide it into a training set and a test set;

Step 2: feed the 10,000 well-lit images to the first-stage network: preprocess the dataset by adding noise and reducing brightness, hide random regions of each image, input the result to the first-stage network, and train the network to restore and reconstruct the randomly hidden low-light noisy images, obtaining a denoising feature encoder and a reconstruction feature encoder with good information-extraction ability for low-light noisy images;

Step 3: feed the 500 pairs of real-scene low-light noisy images to the second-stage network: extract features with the encoders obtained in Step 2, fuse the resulting denoising features and reconstruction features in the reconstruction-feature-guided multi-scale feature fusion module, and input the fused features to the low-light denoising decoder; finally, train the network with the paired well-lit image of the same scene as the reference ground truth, so that the network parameters quickly adapt to real low-light scenes and the network outputs a brighter, higher-quality low-light enhanced image.

2. The method according to claim 1, characterized in that the preprocessing in Step 2 comprises applying Gaussian noise and Poisson noise and uniformly reducing the image brightness; the random masking divides the low-light noisy image into image blocks of size 16×16, generates a random sequence from a uniform distribution, sorts the random values and maps them onto the original image blocks, and masks 75% of the blocks.

3. The method according to claim 1, characterized in that the feature encoder of the image restoration-reconstruction branch in Step 2 processes only the visible pixels not covered by the mask, while the decoder performs the image-reconstruction subtask using the image features extracted by the reconstruction feature encoder together with the masked pixel information.

4. The method according to claim 1, characterized in that the feature encoders in Step 2 use the encoder structure of the classic Transformer; the feature maps of the two branches are finally input to feature decoders for the noise-localization and image restoration-reconstruction tasks, the decoders using the Transformer decoder structure.

5. The method according to claim 4, characterized in that the inputs of the reconstruction-feature-guided multi-scale feature fusion module in Step 3 are the denoising features and reconstruction features produced by the two feature encoders; the two features are first input to the fusion block, where each is downsampled by convolutional neural networks with 3×3 kernels and strides of 1, 2, and 2, yielding denoising features and reconstruction features at three scales; the multi-scale reconstruction features are then each input to a convolutional neural network with a 3×3 kernel and stride 1 to generate, for every feature point of the corresponding-scale denoising feature map, the offsets of 9 irregular sampling points relative to that point at each scale; guided by the resulting multi-scale offset maps, every feature point of every scale of the denoising feature maps is convolved with its 27 sampling points (3 scales, 9 points each) to generate new multi-scale denoising feature maps, which are input to the next fusion layer; after the last fusion layer, the small-scale denoising feature maps are successively upsampled and concatenated to obtain a denoising feature map of the original size, which is input to the low-light denoising decoder (using the Transformer decoder structure) for feature decoding to obtain the enhanced image.

6. The method according to claim 1, characterized in that the reconstruction loss and perceptual loss functions in Steps 2 and 3 are the L1 and L2 norms; the first stage uses the Adam optimizer with exponential decay rates β1 = 0.9 for the first-moment estimate and β2 = 0.999 for the second-moment estimate; the first-stage low-light denoising-reconstruction model is trained for 100 epochs with an initial learning rate of 0.0006, decayed linearly to 0.0003 after 50 epochs; after the first stage, the second-stage low-light enhancement-denoising model is trained for 30 epochs on a real low-light dataset with a learning rate of 0.0002; to increase data availability, the 500 pairs of real low-light images are augmented during training by random cropping and horizontal flipping to improve the low-light enhancement performance of the second-stage task.
CN202410903362.5A 2024-07-08 2024-07-08 Low-illumination image enhancement method based on random region hiding reconstruction Pending CN118505540A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410903362.5A CN118505540A (en) 2024-07-08 2024-07-08 Low-illumination image enhancement method based on random region hiding reconstruction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410903362.5A CN118505540A (en) 2024-07-08 2024-07-08 Low-illumination image enhancement method based on random region hiding reconstruction

Publications (1)

Publication Number Publication Date
CN118505540A 2024-08-16

Family

ID=92247267

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410903362.5A Pending CN118505540A (en) 2024-07-08 2024-07-08 Low-illumination image enhancement method based on random region hiding reconstruction

Country Status (1)

Country Link
CN (1) CN118505540A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119027328A (en) * 2024-10-28 2024-11-26 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) A low-light image enhancement method and system based on space-frequency fusion



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination