CN114359289A

CN114359289A - An image processing method and related device

Info

Publication number: CN114359289A
Application number: CN202011043640.2A
Authority: CN
Inventors: 汪涛; 宋风龙; 任文琦; 操晓春
Original assignee: Huawei Technologies Co Ltd; Institute of Information Engineering of CAS
Current assignee: Huawei Technologies Co Ltd; Institute of Information Engineering of CAS
Priority date: 2020-09-28
Filing date: 2020-09-28
Publication date: 2022-04-15
Anticipated expiration: 2040-09-28
Also published as: CN114359289B

Abstract

Embodiments of this application disclose an image processing method applied in the field of artificial intelligence. The method comprises: acquiring an image to be processed; processing the image to be processed through a first network to obtain a first feature, wherein the first network is configured to extract at least features for image enhancement; processing the image to be processed through a second network to obtain a second feature, wherein the second network is configured to extract at least semantic segmentation features; generating a third feature according to the first feature and the second feature; obtaining a semantic segmentation result of the image to be processed; generating a fourth feature according to the third feature and the semantic segmentation result of the image to be processed; and performing image reconstruction on the fourth feature to obtain a target image. By introducing the semantic features and the semantic segmentation result of the image into the image enhancement process, different image enhancement strengths can be applied to different semantic regions, texture details can be preserved accurately, and the realism of texture details after image enhancement is improved.

Description

An image processing method and related device

Technical Field

The present application relates to the technical field of artificial intelligence, and in particular to an image processing method and a related device.

Background

Artificial intelligence (AI) is the theory, methods, technology and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that responds in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.

Deep learning has been a key driving force in the development of artificial intelligence in recent years and has achieved remarkable results in a variety of computer vision tasks. In the field of image enhancement (also known as image quality enhancement), deep learning-based methods have surpassed traditional methods.

However, current deep learning-based image enhancement networks produce unnatural enhancement effects, and the texture details of the images they output are unrealistic.

Summary of the Invention

Embodiments of the present application provide an image processing method and a related device for improving the image enhancement effect.

A first aspect of the present application provides an image processing method, including: acquiring an image to be processed, where the image to be processed may be, for example, an image that requires image enhancement; processing the image to be processed through a first network to obtain a first feature, where the first network is configured to extract at least features for image enhancement, and the features for image enhancement may be, for example, low-level image features; processing the image to be processed through a second network to obtain a second feature, where the second network is configured to extract at least semantic segmentation features, and the semantic segmentation features may be, for example, high-level image features; generating a third feature according to the first feature and the second feature; acquiring a semantic segmentation result of the image to be processed; generating a fourth feature according to the third feature and the semantic segmentation result of the image to be processed; and performing image reconstruction on the fourth feature to obtain a target image.

In this solution, the semantic features and the semantic segmentation result of the image are introduced into the image enhancement process and fused with the features used for image enhancement. As a result, different image enhancement strengths can be applied to different semantic regions, texture details are preserved accurately, and the realism of texture details after image enhancement is improved.
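The data flow of the first aspect can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: the function name `enhance_image`, the flat-list `Feature` type and the toy stand-in networks are all hypothetical, and the segmentation result is produced here from the third feature by a third network, which the application describes only as one optional implementation.

```python
from typing import Callable, List

Feature = List[float]  # hypothetical: a flattened feature map; real features would be tensors


def enhance_image(
    image: Feature,
    first_network: Callable[[Feature], Feature],
    second_network: Callable[[Feature], Feature],
    third_network: Callable[[Feature], Feature],
    fuse: Callable[[Feature, Feature], Feature],
    reconstruct: Callable[[Feature], Feature],
) -> Feature:
    """Wire the steps of the first aspect together in order."""
    first = first_network(image)          # first feature: features for image enhancement
    second = second_network(image)        # second feature: semantic segmentation features
    third = fuse(first, second)           # third feature
    segmentation = third_network(third)   # semantic segmentation result (optional variant)
    fourth = fuse(third, segmentation)    # fourth feature
    return reconstruct(fourth)            # target image


# Toy stand-ins (assumptions, chosen only so the flow can be executed).
result = enhance_image(
    image=[1.0, 2.0],
    first_network=lambda x: [2 * v for v in x],
    second_network=lambda x: [v + 1 for v in x],
    third_network=lambda f: [1.0 if v > 5 else 0.0 for v in f],
    fuse=lambda a, b: [x + y for x, y in zip(a, b)],
    reconstruct=lambda f: list(f),
)
print(result)
```

Passing the networks in as callables keeps the sketch independent of any particular deep learning framework; in practice each callable would be a trained sub-network.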

Optionally, in a possible implementation, acquiring the semantic segmentation result of the image to be processed includes: processing the third feature through a third network to obtain the semantic segmentation result of the image to be processed.

The third feature is obtained by fusing the low-level image features related to image enhancement with the high-level image features related to semantic segmentation. Therefore, by processing the third feature, low-level image features are introduced on top of the high-level image features related to semantic segmentation. In other words, the semantic segmentation result of the image to be processed is obtained from features at different levels, which improves the accuracy of the semantic segmentation result.

Optionally, in a possible implementation, generating the third feature according to the first feature and the second feature includes: performing feature fusion processing on the first feature and the second feature to obtain the third feature; and generating the fourth feature according to the third feature and the semantic segmentation result of the image to be processed includes: performing feature fusion processing on the third feature and the semantic segmentation result of the image to be processed to obtain the fourth feature.

Optionally, in a possible implementation, the feature fusion processing includes summation processing, multiplication processing, concatenation processing, or concatenation processing followed by convolution processing.
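The four fusion options listed above can be illustrated with a small sketch. The representation (a list of channels, each a flat list of pixel values), the function names, and the choice of a 1x1 kernel for the convolution are assumptions made for illustration; the application does not specify a kernel size.

```python
from typing import List

Channels = List[List[float]]  # hypothetical layout: [channel][pixel]


def fuse_sum(a: Channels, b: Channels) -> Channels:
    """Element-wise summation of two features with identical shapes."""
    return [[x + y for x, y in zip(ca, cb)] for ca, cb in zip(a, b)]


def fuse_multiply(a: Channels, b: Channels) -> Channels:
    """Element-wise multiplication of two features with identical shapes."""
    return [[x * y for x, y in zip(ca, cb)] for ca, cb in zip(a, b)]


def fuse_concat(a: Channels, b: Channels) -> Channels:
    """Concatenation along the channel axis."""
    return a + b


def fuse_concat_conv(a: Channels, b: Channels, weights: List[List[float]]) -> Channels:
    """Concatenation followed by a 1x1 convolution (assumed kernel size).

    weights[o][i] is the weight from concatenated input channel i to output channel o.
    """
    stacked = fuse_concat(a, b)
    n_pixels = len(stacked[0])
    return [
        [sum(w * ch[p] for w, ch in zip(row, stacked)) for p in range(n_pixels)]
        for row in weights
    ]


a = [[1.0, 2.0]]  # one channel, two pixels
b = [[3.0, 4.0]]
summed = fuse_sum(a, b)
multiplied = fuse_multiply(a, b)
concatenated = fuse_concat(a, b)
mixed = fuse_concat_conv(a, b, weights=[[0.5, 0.5]])
```

Summation and multiplication keep the channel count unchanged, concatenation doubles it, and the trailing convolution lets the model learn how to mix the concatenated channels back down to a chosen width.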

Optionally, in a possible implementation, before the fourth feature is generated according to the third feature and the semantic segmentation result of the image to be processed, the method further includes: processing the third feature to obtain a fifth feature; and generating the fourth feature according to the third feature and the semantic segmentation result of the image to be processed includes: generating the fourth feature according to the fifth feature and the semantic segmentation result of the image to be processed.

That is, after performing feature extraction on the third feature, the image processing apparatus performs feature fusion based on the fifth feature obtained by this further feature extraction and the semantic segmentation result of the image to be processed. Performing further feature extraction on the third feature obtained after feature fusion makes it possible to extract finer-grained features on the basis of the third feature, which improves the accuracy of the fourth feature obtained by the subsequent feature fusion.

Optionally, in a possible implementation, processing the image to be processed through the second network to obtain the second feature includes: preprocessing the image to be processed to obtain a preprocessed feature; downsampling the preprocessed feature to obtain a downsampled feature; processing the downsampled feature through the second network to obtain a sixth feature; and upsampling the sixth feature to obtain the second feature of the image to be processed. Downsampling lowers the resolution of the preprocessed feature, which reduces the amount of computation needed to extract the semantic segmentation features and lowers the computing power required of the image processing apparatus.
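The downsample, process, upsample sequence can be sketched as below. The application does not name the sampling operators, so 2x2 average pooling and nearest-neighbour upsampling are assumptions, as are the function names and the toy second network that simply doubles each value.

```python
from typing import Callable, List

Grid = List[List[float]]  # hypothetical single-channel feature map: [row][column]


def downsample_2x(feature: Grid) -> Grid:
    """Average-pool non-overlapping 2x2 blocks (assumes even height and width)."""
    return [
        [
            (feature[r][c] + feature[r][c + 1] + feature[r + 1][c] + feature[r + 1][c + 1]) / 4.0
            for c in range(0, len(feature[0]), 2)
        ]
        for r in range(0, len(feature), 2)
    ]


def upsample_2x(feature: Grid) -> Grid:
    """Nearest-neighbour upsampling: repeat every value into a 2x2 block."""
    out: Grid = []
    for row in feature:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def extract_second_feature(preprocessed: Grid, second_network: Callable[[Grid], Grid]) -> Grid:
    """Run the second network at reduced resolution, then restore the original size."""
    down = downsample_2x(preprocessed)   # downsampled feature
    sixth = second_network(down)         # sixth feature
    return upsample_2x(sixth)            # second feature


preprocessed = [
    [1.0, 1.0, 2.0, 2.0],
    [1.0, 1.0, 2.0, 2.0],
    [3.0, 3.0, 4.0, 4.0],
    [3.0, 3.0, 4.0, 4.0],
]
second = extract_second_feature(preprocessed, lambda g: [[2 * v for v in row] for row in g])
```

With a factor of 2 in each dimension the second network sees one quarter of the pixels, which is where the computational saving described above comes from.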

Optionally, in a possible implementation, the method is used to implement at least one of the following image enhancement tasks: image super-resolution reconstruction, image denoising, image dehazing, image deblurring, image contrast enhancement, image demosaicing, image deraining, image color enhancement, image brightness enhancement, image detail enhancement, and image dynamic range enhancement.

A second aspect of the present application provides a model training method, including: acquiring a training sample pair, where the training sample pair includes a first image and a second image, and the quality of the first image is lower than that of the second image; processing the first image through an image processing model to be trained to obtain a predicted image, where the image processing model to be trained is configured to: acquire the image to be processed; process the first image through a first network to obtain a first feature, the first network being configured to extract at least features for image enhancement; process the first image through a second network to obtain a second feature, the second network being configured to extract at least semantic segmentation features; generate a third feature according to the first feature and the second feature; acquire a semantic segmentation result of the first image; generate a fourth feature according to the third feature and the semantic segmentation result of the first image; and perform image reconstruction on the fourth feature to obtain the predicted image; obtaining a first loss according to the second image in the training sample pair and the predicted image, where the first loss describes the difference between the second image and the predicted image; and updating the model parameters of the image processing model to be trained according to at least the first loss until a model training condition is met, to obtain an image processing model.

Optionally, in a possible implementation, the image processing model to be trained is further configured to process the third feature through a third network to obtain a predicted semantic segmentation result of the first image.

Optionally, in a possible implementation, the image processing model to be trained is further configured to: acquire a ground-truth semantic segmentation result of the first image; obtain a second loss according to the predicted semantic segmentation result and the ground-truth semantic segmentation result, where the second loss describes the difference between the predicted semantic segmentation result and the ground-truth semantic segmentation result; and update the model parameters of the image processing model to be trained according to at least the first loss and the second loss until the model training condition is met, to obtain the image processing model.
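One way the two losses might be computed and combined is sketched below. The application only states that the first loss measures the difference between the predicted image and the second image and that the second loss measures the difference between the predicted and ground-truth segmentation; the specific choices here (mean absolute error, per-pixel cross-entropy, a weighted sum, and the hypothetical parameter `seg_weight`) are assumptions for illustration.

```python
import math
from typing import List


def l1_loss(predicted: List[float], target: List[float]) -> float:
    """First loss (assumed form): mean absolute difference between two images."""
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(target)


def cross_entropy_loss(probabilities: List[List[float]], labels: List[int]) -> float:
    """Second loss (assumed form): mean negative log-probability of the true class per pixel."""
    return -sum(math.log(probs[label]) for probs, label in zip(probabilities, labels)) / len(labels)


def total_loss(first_loss: float, second_loss: float, seg_weight: float = 0.1) -> float:
    """Weighted sum used to update the model parameters; seg_weight is a hypothetical knob."""
    return first_loss + seg_weight * second_loss


# Toy values: a two-pixel predicted image and a two-pixel, two-class segmentation.
first = l1_loss(predicted=[0.5, 1.0], target=[1.0, 1.0])
second = cross_entropy_loss(probabilities=[[0.5, 0.5], [1.0, 0.0]], labels=[0, 0])
combined = total_loss(first, second)
```

Training on the combined value lets gradients from the segmentation branch shape the shared features, while the weight controls how strongly segmentation accuracy is traded against reconstruction fidelity.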

Optionally, in a possible implementation, the image processing model to be trained is further configured to: perform feature fusion processing on the first feature and the second feature to obtain the third feature; and generating the fourth feature according to the third feature and the semantic segmentation result of the first image includes: performing feature fusion processing on the third feature and the semantic segmentation result of the first image to obtain the fourth feature.

Optionally, in a possible implementation, the feature fusion processing includes summation processing, concatenation processing, or concatenation processing followed by convolution processing.

Optionally, in a possible implementation, the image processing model to be trained is further configured to process the third feature to obtain a fifth feature, and to generate the fourth feature according to the third feature and the semantic segmentation result of the first image.

Optionally, in a possible implementation, the image processing model to be trained is further configured to: preprocess the first image to obtain a preprocessed feature; downsample the preprocessed feature to obtain a downsampled feature; process the downsampled feature through the second network to obtain a sixth feature; and upsample the sixth feature to obtain the second feature of the first image.

Optionally, in a possible implementation, the image processing model is used to implement at least one of the following image enhancement tasks: image super-resolution reconstruction, image denoising, image dehazing, image deblurring, image contrast enhancement, image demosaicing, image deraining, image color enhancement, image brightness enhancement, image detail enhancement, and image dynamic range enhancement.

A third aspect of the present application provides an image processing apparatus, including an acquisition unit and a processing unit. The acquisition unit is configured to acquire an image to be processed. The processing unit is configured to: process the image to be processed through a first network to obtain a first feature, where the first network is configured to extract at least features for image enhancement; process the image to be processed through a second network to obtain a second feature, where the second network is configured to extract at least semantic segmentation features; and generate a third feature according to the first feature and the second feature. The acquisition unit is further configured to acquire a semantic segmentation result of the image to be processed. The processing unit is further configured to generate a fourth feature according to the third feature and the semantic segmentation result of the image to be processed, and to perform image reconstruction on the fourth feature to obtain a target image.

Optionally, in a possible implementation, the processing unit is further configured to process the third feature through a third network to obtain the semantic segmentation result of the image to be processed.

Optionally, in a possible implementation, the processing unit is further configured to: perform feature fusion processing on the first feature and the second feature to obtain the third feature; and perform feature fusion processing on the third feature and the semantic segmentation result of the image to be processed to obtain the fourth feature.

Optionally, in a possible implementation, the feature fusion processing includes at least one of summation processing, multiplication processing, concatenation processing, and concatenation-convolution processing.

Optionally, in a possible implementation, the processing unit is further configured to: process the third feature to obtain a fifth feature; and generate the fourth feature according to the fifth feature and the semantic segmentation result of the image to be processed.

Optionally, in a possible implementation, the processing unit is further configured to: preprocess the image to be processed to obtain a preprocessed feature; downsample the preprocessed feature to obtain a downsampled feature; process the downsampled feature through the second network to obtain a sixth feature; and upsample the sixth feature to obtain the second feature of the image to be processed.

Optionally, in a possible implementation, the image processing apparatus is used to implement at least one of the following image enhancement tasks: image super-resolution reconstruction, image denoising, image dehazing, image deblurring, image contrast enhancement, image demosaicing, image deraining, image color enhancement, image brightness enhancement, image detail enhancement, and image dynamic range enhancement.

A fourth aspect of the present application provides a model training apparatus, including an acquisition unit and a training unit. The acquisition unit is configured to acquire a training sample pair, where the training sample pair includes a first image and a second image, and the quality of the first image is lower than that of the second image. The training unit is configured to process the first image through an image processing model to be trained to obtain a predicted image, where the image processing model to be trained is configured to: acquire the image to be processed; process the first image through a first network to obtain a first feature, the first network being configured to extract at least features for image enhancement; process the first image through a second network to obtain a second feature, the second network being configured to extract at least semantic segmentation features; generate a third feature according to the first feature and the second feature; acquire a semantic segmentation result of the first image; generate a fourth feature according to the third feature and the semantic segmentation result of the first image; and perform image reconstruction on the fourth feature to obtain the predicted image. The training unit is further configured to: obtain a first loss according to the second image in the training sample pair and the predicted image, where the first loss describes the difference between the second image and the predicted image; and update the model parameters of the image processing model to be trained according to at least the first loss until a model training condition is met, to obtain an image processing model.

Optionally, in a possible implementation, the training unit is further configured to process the third feature through a third network to obtain a predicted semantic segmentation result of the first image.

Optionally, in a possible implementation, the training unit is further configured to: acquire a ground-truth semantic segmentation result of the first image; obtain a second loss according to the predicted semantic segmentation result and the ground-truth semantic segmentation result, where the second loss describes the difference between the predicted semantic segmentation result and the ground-truth semantic segmentation result; and update the model parameters of the image processing model to be trained according to at least the first loss and the second loss until the model training condition is met, to obtain the image processing model.

Optionally, in a possible implementation, the training unit is further configured to: perform feature fusion processing on the first feature and the second feature to obtain the third feature; and perform feature fusion processing on the third feature and the semantic segmentation result of the first image to obtain the fourth feature.

Optionally, in a possible implementation, the training unit is further configured to: process the third feature to obtain a fifth feature; and generate the fourth feature according to the third feature and the semantic segmentation result of the first image.

Optionally, in a possible implementation, the training unit is further configured to: preprocess the first image to obtain a preprocessed feature; downsample the preprocessed feature to obtain a downsampled feature; process the downsampled feature through the second network to obtain a sixth feature; and upsample the sixth feature to obtain the second feature of the first image.

A fifth aspect of the present application provides an image processing apparatus, which may include a processor coupled to a memory. The memory stores program instructions, and the method described in the first aspect is implemented when the program instructions stored in the memory are executed by the processor. For the steps performed by the processor in each possible implementation of the first aspect, refer to the first aspect; details are not repeated here.

A sixth aspect of the present application provides a model training apparatus, which may include a processor coupled to a memory. The memory stores program instructions, and the method described in the second aspect is implemented when the program instructions stored in the memory are executed by the processor. For the steps performed by the processor in each possible implementation of the second aspect, refer to the second aspect; details are not repeated here.

A seventh aspect of the present application provides a computer-readable storage medium storing a computer program which, when run on a computer, causes the computer to execute the method described in the first aspect.

An eighth aspect of the present application provides a computer-readable storage medium storing a computer program which, when run on a computer, causes the computer to execute the method described in the second aspect.

A ninth aspect of the present application provides a circuit system including a processing circuit configured to perform the method described in the first aspect.

A tenth aspect of the present application provides a circuit system including a processing circuit configured to perform the method described in the second aspect.

An eleventh aspect of the present application provides a computer program which, when run on a computer, causes the computer to execute the method described in the first aspect.

A twelfth aspect of the present application provides a computer program which, when run on a computer, causes the computer to execute the method described in the second aspect.

A thirteenth aspect of the present application provides a chip system. The chip system includes a processor configured to support a server or a threshold value acquisition apparatus in implementing the functions involved in the above aspects, for example, sending or processing the data and/or information involved in the above methods. In a possible design, the chip system further includes a memory configured to store the program instructions and data necessary for the server or the communication device. The chip system may consist of a chip, or may include a chip and other discrete devices.

Description of Drawings

FIG. 1 is a schematic structural diagram of an artificial intelligence main framework provided by an embodiment of the present application;

FIG. 2a shows an image processing system provided by an embodiment of the present application;

FIG. 2b shows another image processing system provided by an embodiment of the present application;

FIG. 2c is a schematic diagram of a device related to image processing provided by an embodiment of the present application;

FIG. 3a is a schematic diagram of the architecture of a system 100 provided by an embodiment of the present application;

FIG. 3b is a schematic diagram of image semantic segmentation provided by an embodiment of the present application;

FIG. 4 is a schematic flowchart of an image processing method provided by an embodiment of the present application;

FIG. 5 is a schematic structural diagram of a densely connected atrous convolutional network provided by an embodiment of the present application;

FIG. 6a is a schematic diagram of an architecture for image processing provided by an embodiment of the present application;

FIG. 6b is a schematic diagram of a network structure for image processing provided by an embodiment of the present application;

FIG. 7 is a schematic diagram of an objective metric comparison provided by an embodiment of the present application;

FIG. 8 is a schematic diagram of another objective metric comparison provided by an embodiment of the present application;

FIG. 9 is a schematic diagram of an image comparison provided by an embodiment of the present application;

FIG. 10 is a schematic flowchart of a model training method provided by an embodiment of the present application;

FIG. 11 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application;

FIG. 12 is a schematic structural diagram of a model training apparatus provided by an embodiment of the present application;

FIG. 13 is a schematic structural diagram of an execution device provided by an embodiment of the present application;

FIG. 14 is a schematic structural diagram of a training device provided by an embodiment of the present application;

FIG. 15 is a schematic structural diagram of a chip provided by an embodiment of the present application.

Detailed Description

The embodiments of the present invention are described below with reference to the accompanying drawings in the embodiments of the present invention. The terms used in the description of the embodiments are intended only to explain specific embodiments of the present invention and are not intended to limit the present invention.

The embodiments of the present application are described below with reference to the accompanying drawings. A person of ordinary skill in the art will appreciate that, as technology develops and new scenarios emerge, the technical solutions provided in the embodiments of the present application are equally applicable to similar technical problems.

The terms "first", "second" and the like in the description and claims of the present application and in the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that terms used in this way are interchangeable where appropriate; they are merely the means adopted in the embodiments of the present application to distinguish objects that have the same attributes. Furthermore, the terms "comprising" and "having", and any variations thereof, are intended to cover a non-exclusive inclusion, so that a process, method, system, product or device comprising a series of units is not necessarily limited to those units, but may include other units that are not expressly listed or that are inherent to such a process, method, product or device.

First, the overall workflow of an artificial intelligence system is described. Refer to FIG. 1, which is a schematic structural diagram of an artificial intelligence main framework. The framework is explained below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects a series of processes from data acquisition to data processing. For example, it may be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data undergoes a refinement of "data-information-knowledge-wisdom". The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (provision and processing technology implementations) to the industrial ecology of the system.

(1) Infrastructure

The infrastructure provides computing power for the artificial intelligence system, enables communication with the outside world, and is supported by a basic platform. Communication with the outside world takes place through sensors. Computing power is provided by smart chips (hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs, and FPGAs). The basic platform includes a distributed computing framework, networks, and related platform guarantees and support, and may include cloud storage and computing, interconnection networks, and the like. For example, sensors communicate with the outside world to obtain data, and the data is provided to smart chips in a distributed computing system provided by the basic platform for computation.

(2) Data

The data at the layer above the infrastructure represents the data sources in the field of artificial intelligence. The data involves graphics, images, speech, and text, as well as Internet of Things data from traditional devices, including business data from existing systems and sensed data such as force, displacement, liquid level, temperature, and humidity.

(3) Data processing

Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making, and the like.

Machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, and the like on data.

Reasoning refers to the process in which a computer or intelligent system simulates human intelligent reasoning and, according to a reasoning control strategy, uses formalized information to carry out machine thinking and solve problems. Typical functions are search and matching.

Decision-making refers to the process of making decisions after intelligent information has been reasoned over, and usually provides functions such as classification, sorting, and prediction.

(4) General capabilities

After the data has undergone the data processing described above, some general capabilities can be formed based on the results of the data processing. These may be an algorithm or a general system, for example, translation, text analysis, computer vision processing, speech recognition, and image recognition.

(5) Smart products and industry applications

Smart products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They encapsulate the overall artificial intelligence solution, turn intelligent information decision-making into products, and put it into practical use. The application fields mainly include smart terminals, smart transportation, smart healthcare, autonomous driving, safe cities, and the like.

Next, several application scenarios of the present application are introduced.

FIG. 2a shows an image processing system provided by an embodiment of the present application. The image processing system includes a user equipment and a data processing device. The user equipment includes a smart terminal such as a mobile phone, a personal computer, or an information processing center. The user equipment is the initiating end of image processing: as the initiator of an image enhancement request, the user usually initiates the request through the user equipment.

The data processing device may be a device or server with data processing functions, such as a cloud server, a network server, an application server, or a management server. The data processing device receives the image enhancement request from the smart terminal through an interactive interface, and then performs image processing by means of machine learning, deep learning, search, reasoning, decision-making, and the like, using a memory for storing data and a processor for processing data. The memory in the data processing device is a general term that covers local storage and a database storing historical data. The database may reside on the data processing device or on another network server.

In the image processing system shown in FIG. 2a, the user equipment can receive instructions from the user. For example, the user equipment may acquire an image input or selected by the user and then initiate a request to the data processing device, so that the data processing device performs an image enhancement application (such as image super-resolution reconstruction, image denoising, image dehazing, image deblurring, or image contrast enhancement) on the image obtained by the user equipment, thereby obtaining a corresponding processing result for the image. Exemplarily, the user equipment may acquire an image input by the user and then initiate an image denoising request to the data processing device, so that the data processing device denoises the image and obtains a denoised image.

In FIG. 2a, the data processing device may execute the image processing method of the embodiments of the present application.

FIG. 2b shows another image processing system provided by an embodiment of the present application. In FIG. 2b, the user equipment itself serves as the data processing device: it obtains the input from the user directly and processes it with its own hardware. The specific process is similar to that of FIG. 2a; refer to the description above, which is not repeated here.

In the image processing system shown in FIG. 2b, the user equipment can receive instructions from the user. For example, the user equipment may acquire an image selected by the user on the user equipment, and the user equipment itself then performs an image processing application (such as image super-resolution reconstruction, image denoising, image dehazing, image deblurring, or image contrast enhancement) on the image, thereby obtaining a corresponding processing result for the image.

In FIG. 2b, the user equipment itself may execute the image processing method of the embodiments of the present application.

FIG. 2c is a schematic diagram of devices related to image processing provided by an embodiment of the present application.

The user equipment in FIG. 2a and FIG. 2b may specifically be the local device 301 or the local device 302 in FIG. 2c, and the data processing device in FIG. 2a may specifically be the execution device 210 in FIG. 2c. The data storage system 250 may store the data to be processed by the execution device 210. The data storage system 250 may be integrated on the execution device 210, or may be deployed on the cloud or on another network server.

The processors in FIG. 2a and FIG. 2b may perform data training, machine learning, or deep learning through a neural network model or another model (for example, a model based on a support vector machine), and use the model finally obtained by training or learning to perform an image processing application on the image, thereby obtaining a corresponding processing result.

FIG. 3a is a schematic diagram of the architecture of a system 100 provided by an embodiment of the present application. In FIG. 3a, the execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with external devices. A user may input data to the I/O interface 112 through the client device 140. In the embodiments of the present application, the input data may include tasks to be scheduled, callable resources, and other parameters.

When the execution device 110 preprocesses the input data, or when the calculation module 111 of the execution device 110 performs computation or other related processing (for example, implementing the functions of the neural network in the present application), the execution device 110 may call data, code, and the like in the data storage system 150 for the corresponding processing, and may also store the data, instructions, and the like obtained from that processing in the data storage system 150.

Finally, the I/O interface 112 returns the processing result to the client device 140, which provides it to the user.

It is worth noting that the training device 120 can generate corresponding target models/rules based on different training data for different goals, also called different tasks. The corresponding target models/rules can then be used to achieve the goals or complete the tasks, thereby providing the user with the desired result. The training data may be stored in the database 130 and come from training samples collected by the data collection device 160.

In the case shown in FIG. 3a, the user can specify the input data manually, and this can be done through an interface provided by the I/O interface 112. In another case, the client device 140 can automatically send the input data to the I/O interface 112. If automatic sending by the client device 140 requires the user's authorization, the user can set the corresponding permission in the client device 140. The user can view the result output by the execution device 110 on the client device 140, and the result may be presented as a display, a sound, an action, or the like. The client device 140 can also serve as a data collection terminal that collects the input data entering the I/O interface 112 and the output result leaving the I/O interface 112, as shown in the figure, as new sample data and stores them in the database 130. Alternatively, collection may bypass the client device 140: the I/O interface 112 directly stores the input data entering the I/O interface 112 and the output result leaving the I/O interface 112, as shown in the figure, in the database 130 as new sample data.

It is worth noting that FIG. 3a is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship between the devices, components, modules, and the like shown in the figure does not constitute any limitation. For example, in FIG. 3a the data storage system 150 is an external memory relative to the execution device 110; in other cases, the data storage system 150 may be placed inside the execution device 110. As shown in FIG. 3a, the neural network can be obtained through training by the training device 120.

An embodiment of the present application further provides a chip that includes a neural network processor (NPU). The chip may be deployed in the execution device 110 shown in FIG. 3a to carry out the computation of the calculation module 111. The chip may also be deployed in the training device 120 shown in FIG. 3a to carry out the training work of the training device 120 and output the target model/rule.

The NPU is mounted as a coprocessor on a host central processing unit (CPU), and the host CPU assigns tasks to it. The core part of the NPU is an arithmetic circuit. A controller controls the arithmetic circuit to fetch data from a memory (the weight memory or the input memory) and perform operations.

In some implementations, the arithmetic circuit internally includes a plurality of processing units (process engines, PEs). In some implementations, the arithmetic circuit is a two-dimensional systolic array. The arithmetic circuit may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit is a general-purpose matrix processor.

For example, suppose there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data of matrix B from the weight memory and buffers it on each PE in the arithmetic circuit. The arithmetic circuit then fetches the data of matrix A from the input memory, performs the matrix operation with matrix B, and stores the partial or final result of the matrix in an accumulator.
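The accumulation described above can be illustrated in software. The following is a minimal sketch in plain Python, not a model of the NPU hardware: the function name `matmul_accumulate` and the sample matrices are assumptions made for illustration. It computes C = A x B by adding one partial product per step into an accumulator, so that the accumulator holds partial results until the last step and the final result afterwards.

```python
def matmul_accumulate(a, b):
    """Multiply a (m x k) by b (k x n), accumulating partial sums per output cell.

    Hypothetical helper for illustration; it mimics only the arithmetic,
    not the data movement of a systolic array.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of A and B must match")
    acc = [[0] * n for _ in range(m)]  # plays the role of the accumulator
    for step in range(k):  # each step adds one partial product to every cell
        for i in range(m):
            for j in range(n):
                acc[i][j] += a[i][step] * b[step][j]
    return acc


A = [[1, 2], [3, 4]]  # input matrix (illustrative values)
B = [[5, 6], [7, 8]]  # weight matrix (illustrative values)
C = matmul_accumulate(A, B)
print(C)
```

After the first step the accumulator holds only the contribution of the first column of A and the first row of B; the output matrix C is complete once all k steps have run.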

A vector calculation unit can further process the output of the arithmetic circuit, for example through vector multiplication, vector addition, exponential operations, logarithmic operations, and magnitude comparison. For example, the vector calculation unit can be used for the network computation of non-convolutional/non-FC layers in a neural network, such as pooling, batch normalization, and local response normalization.

In some implementations, the vector calculation unit can store the processed output vector in a unified buffer. For example, the vector calculation unit may apply a nonlinear function to the output of the arithmetic circuit, such as a vector of accumulated values, to generate activation values. In some implementations, the vector calculation unit generates normalized values, merged values, or both. In some implementations, the processed output vector can be used as an activation input to the arithmetic circuit, for example for use in a subsequent layer of the neural network.

The unified memory is used to store input data and output data.

A direct memory access controller (DMAC) transfers input data from the external memory to the input memory and/or the unified memory, stores weight data from the external memory into the weight memory, and stores data from the unified memory into the external memory.

A bus interface unit (BIU) is used for interaction between the host CPU, the DMAC, and the instruction fetch buffer over a bus.

An instruction fetch buffer connected to the controller is used to store the instructions used by the controller.

The controller is used to invoke the instructions cached in the instruction fetch buffer to control the working process of the operation accelerator.

Generally, the unified memory, the input memory, the weight memory, and the instruction fetch buffer are all on-chip memories. The external memory is a memory outside the NPU, and may be a double data rate synchronous dynamic random access memory (DDR SDRAM), a high bandwidth memory (HBM), or another readable and writable memory.

Since the embodiments of the present application involve extensive use of neural networks, related terms and concepts such as neural networks are introduced first for ease of understanding.

(1) Neural network

A neural network may be composed of neural units. A neural unit may refer to an operation unit that takes $x_s$ and an intercept of 1 as inputs, and the output of the operation unit may be:

$h_{W,b}(x) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$

where $s = 1, 2, \ldots, n$, $n$ is a natural number greater than 1, $W_s$ is the weight of $x_s$, and $b$ is the bias of the neural unit. $f$ is the activation function of the neural unit, which introduces nonlinearity into the neural network and converts the input signal of the neural unit into an output signal. The output signal of the activation function can be used as the input of the next convolutional layer. The activation function may be a sigmoid function. A neural network is a network formed by connecting many such single neural units together, that is, the output of one neural unit may be the input of another neural unit. The input of each neural unit may be connected to the local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field may be an area composed of several neural units.
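As an illustration of a single neural unit, the sketch below computes the weighted sum of the inputs plus the bias and passes it through a sigmoid activation, as the paragraph above describes. The function names `sigmoid` and `neural_unit` and the sample values are assumptions chosen for illustration, not part of the application.

```python
import math


def sigmoid(z):
    """Sigmoid activation: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neural_unit(xs, ws, b, f=sigmoid):
    """Output of one neural unit: f(sum over s of W_s * x_s, plus bias b)."""
    if len(xs) != len(ws):
        raise ValueError("each input x_s needs exactly one weight W_s")
    return f(sum(w * x for w, x in zip(ws, xs)) + b)


# Illustrative values: the weighted sum is 0.5*1.0 + (-0.25)*2.0 + 0.0 = 0.0
out = neural_unit(xs=[1.0, 2.0], ws=[0.5, -0.25], b=0.0)
print(out)
```

Because the weighted sum in this example is zero, the sigmoid returns its midpoint value; changing the bias shifts the point at which the unit switches between low and high output.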

The work of each layer in a neural network can be described by the mathematical expression

$\vec{y} = a(W \cdot \vec{x} + b)$

At a physical level, the work of each layer in the neural network can be understood as completing the transformation from the input space to the output space (that is, from the row space of the matrix to its column space) through five operations on the input space (the set of input vectors). The five operations are: 1. raising/lowering the dimension; 2. enlarging/reducing; 3. rotation; 4. translation; 5. "bending". Operations 1, 2, and 3 are performed by $W \cdot \vec{x}$, operation 4 is performed by $+b$, and operation 5 is performed by $a()$. The word "space" is used here because the object being classified is not a single thing but a class of things, and space refers to the collection of all individuals of that class. $W$ is a weight vector, and each value in the vector represents the weight of one neuron in that layer of the neural network. The vector $W$ determines the spatial transformation from the input space to the output space described above, that is, the weight $W$ of each layer controls how the space is transformed. The purpose of training the neural network is to finally obtain the weight matrices of all layers of the trained neural network (the weight matrix formed by the vectors $W$ of many layers). Therefore, the training process of a neural network is essentially learning how to control the spatial transformation, and more specifically, learning the weight matrix.
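The per-layer computation, an activation applied to a weighted combination of the inputs plus a bias, can be sketched as follows. The function name `dense_layer`, the choice of `tanh` as the "bending" function a(), and all numeric values are assumptions for illustration only.

```python
import math


def dense_layer(W, x, b, a=math.tanh):
    """Compute y = a(W . x + b) for one layer.

    W is a list of rows (one row per output neuron), x is the input vector,
    b holds one bias per output neuron, and a is the element-wise activation.
    """
    if any(len(row) != len(x) for row in W) or len(W) != len(b):
        raise ValueError("shapes of W, x and b are inconsistent")
    return [
        a(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
        for row, b_i in zip(W, b)
    ]


# W maps a 3-dimensional input to a 2-dimensional output (dimension lowering).
W = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
x = [1.0, 2.0, 1.0]
b = [1.0, -2.0]  # translation
y = dense_layer(W, x, b)
print(y)
```

Here the product with W covers the dimension change, scaling, and rotation, adding b covers the translation, and the activation supplies the nonlinear "bending".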

Because the output of the neural network should be as close as possible to the value that is actually to be predicted, the predicted value of the current network can be compared with the desired target value, and the weight vector of each layer of the neural network can then be updated according to the difference between the two. (There is usually an initialization process before the first update, in which parameters are preconfigured for each layer of the neural network.) For example, if the predicted value of the network is too high, the weight vector is adjusted to make the prediction lower, and the adjustment continues until the neural network can predict the desired target value. It is therefore necessary to define in advance how to compare the difference between the predicted value and the target value. This is the role of the loss function or objective function, which are important equations for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) indicates a larger difference, so training the neural network becomes the process of reducing this loss as much as possible.
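A loss function can be sketched with mean squared error, which is one common choice and is used here only as an assumption; the application does not state which loss function is used. The function name `mse_loss` and the sample values are likewise illustrative.

```python
def mse_loss(predicted, target):
    """Mean squared error: average of squared differences between two vectors."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("predicted and target must be non-empty and equal in length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


target = [1.0, 0.0, 2.0]
close = mse_loss([1.1, 0.0, 1.9], target)  # prediction near the target
far = mse_loss([3.0, 2.0, 0.0], target)    # prediction far from the target
print(close, far)
```

The prediction that lies farther from the target yields the larger loss, which is the property that training relies on when it adjusts the weights to reduce the loss.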

(2) Back propagation algorithm

A neural network can use the error back propagation (BP) algorithm to correct the values of the parameters in the initial neural network model during training, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, passing the input signal forward until the output produces an error loss, and the parameters in the initial neural network model are updated by propagating the error loss information backward, so that the error loss converges. The back propagation algorithm is a backward pass driven by the error loss, and aims to obtain the parameters of the optimal neural network model, such as the weight matrix.
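The forward pass followed by a backward update can be sketched for the smallest possible case: one neural unit with a sigmoid activation, trained by gradient descent on a squared-error loss. The function name `train_unit`, the learning rate, the number of epochs, and the toy samples are all assumptions for illustration; a real network repeats the same chain-rule step layer by layer.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_unit(samples, lr=0.5, epochs=200):
    """Train y = sigmoid(w*x + b) on (x, target) pairs; return w, b, loss history."""
    w, b = 0.0, 0.0  # initialization before the first update
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, t in samples:
            y = sigmoid(w * x + b)         # forward pass
            total += 0.5 * (y - t) ** 2    # error loss for this sample
            delta = (y - t) * y * (1 - y)  # d(loss)/d(w*x + b) by the chain rule
            w -= lr * delta * x            # propagate the error back to the weight
            b -= lr * delta                # and to the bias
        losses.append(total)
    return w, b, losses


# Toy data: negative inputs should map to 0, positive inputs to 1.
samples = [(-2.0, 0.0), (-1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
w, b, losses = train_unit(samples)
print(losses[0], losses[-1])
```

The recorded loss at the end of training is lower than at the start, which is the convergence behaviour the paragraph above describes.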

(3) Image enhancement

Image enhancement refers to processing the brightness, color, contrast, saturation, dynamic range, and the like of an image so that it meets a specific criterion. Simply put, by purposefully emphasizing the overall or local characteristics of an image during image processing, an originally unclear image is made clear or certain features of interest are emphasized, the differences between the features of different objects in the image are enlarged, and features of no interest are suppressed. This improves image quality, enriches the amount of information in the image, strengthens image interpretation and recognition, and meets the needs of certain special analyses. Exemplarily, image enhancement may include, but is not limited to, image super-resolution reconstruction, image denoising, image dehazing, image deblurring, and image contrast enhancement.
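To make the idea of enlarging differences concrete, the sketch below applies linear contrast stretching to a tiny grayscale image. This is a simple classical technique chosen for illustration and is not the neural-network method of the present application; the function name `stretch_contrast` and the pixel values are assumptions.

```python
def stretch_contrast(image, lo=0, hi=255):
    """Linearly rescale pixel values so the darkest maps to lo and the brightest to hi."""
    flat = [p for row in image for p in row]
    p_min, p_max = min(flat), max(flat)
    if p_max == p_min:  # a flat image has no contrast to stretch
        return [row[:] for row in image]
    scale = (hi - lo) / (p_max - p_min)
    return [[round(lo + (p - p_min) * scale) for p in row] for row in image]


dull = [[100, 110], [120, 130]]  # low-contrast 2x2 grayscale image
enhanced = stretch_contrast(dull)
print(enhanced)
```

Neighbouring pixels that differed by 10 gray levels before stretching differ by 85 levels afterwards, so the differences between regions become easier to see.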

(4) Image semantic segmentation

Image semantic segmentation refers to subdividing the pixels of an image into different categories according to a certain rule (such as illumination or category). Simply put, the goal of image semantic segmentation is to assign a label to every pixel in the image, that is, to mark the object category to which each pixel belongs. These labels may include people, animals, cars, flowers, furniture, and so on. Refer to FIG. 3b, which is a schematic diagram of image semantic segmentation provided by an embodiment of the present application. As shown in FIG. 3b, image semantic segmentation divides the image at the pixel level into different sub-regions by category, such as buildings, sky, and plants.
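Per-pixel labelling can be sketched as taking, for every pixel, the category with the highest score. The sketch assumes that some segmentation model has already produced one score per category for each pixel; the label list `LABELS`, the function name `segment`, and the scores are hypothetical.

```python
LABELS = ["sky", "building", "plant"]  # hypothetical category list


def segment(scores):
    """Turn per-pixel class scores into a label map.

    scores[y][x] is a list with one score per entry in LABELS; the result has
    the same height and width and holds the best-scoring label for each pixel.
    """
    return [
        [LABELS[max(range(len(px)), key=px.__getitem__)] for px in row]
        for row in scores
    ]


# Scores for a 2x2 image (illustrative values).
scores = [
    [[0.9, 0.05, 0.05], [0.6, 0.3, 0.1]],
    [[0.1, 0.7, 0.2], [0.2, 0.2, 0.6]],
]
label_map = segment(scores)
print(label_map)
```

Pixels that share a label form the sub-regions referred to above, such as a sky region or a building region.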

下面从神经网络的训练侧和神经网络的应用侧对本申请提供的方法进行描述。The method provided by the present application will be described below from the training side of the neural network and the application side of the neural network.

本申请实施例提供的神经网络的训练方法，涉及图像的处理，具体可以应用于数据训练、机器学习、深度学习等数据处理方法，对训练数据(如本申请中的图像)进行符号化和形式化的智能信息建模、抽取、预处理、训练等，最终得到训练好的图像处理模型；并且，本申请实施例提供的图像处理方法可以运用上述训练好的图像处理模型，将输入数据(如本申请中的待处理图像)输入到所述训练好的图像处理模型中，得到输出数据(如本申请中目标图像)。需要说明的是，本申请实施例提供的图像处理模型的训练方法和图像处理方法是基于同一个构思产生的发明，也可以理解为一个系统中的两个部分，或一个整体流程的两个阶段：如模型训练阶段和模型应用阶段。The neural network training method provided in the embodiment of the present application involves the processing of images, and can be specifically applied to data processing methods such as data training, machine learning, deep learning, etc., to symbolize and form the training data (such as the images in the present application). In addition, the image processing method provided in the embodiment of the present application can use the above-mentioned trained image processing model to convert input data (such as The image to be processed in this application) is input into the trained image processing model to obtain output data (such as the target image in this application). It should be noted that the training method and the image processing method of the image processing model provided in the embodiments of this application are inventions based on the same concept, and can also be understood as two parts in a system, or two stages of an overall process : such as model training phase and model application phase.

可以参阅图4，图4为本申请实施例提供的一种图像处理方法的流程示意图。如图4所示，本申请实施例提供的一种图像处理方法包括以下步骤：Referring to FIG. 4 , FIG. 4 is a schematic flowchart of an image processing method provided by an embodiment of the present application. As shown in FIG. 4 , an image processing method provided by an embodiment of the present application includes the following steps:

步骤401、获取待处理图像。Step 401: Acquire an image to be processed.

本实施例中，图像处理装置可以获取一个待处理图像，该待处理图像例如可以为需要进行图像增强的图像。In this embodiment, the image processing apparatus may acquire an image to be processed, and the image to be processed may be, for example, an image that needs to be enhanced.

可以理解的是，当图像处理装置部署在无人车中时，图像处理装置可以通过摄像头获取无人车在行驶过程中采集到的街景图。当图像处理装置部署在机器人中时，图像处理装置可以实时获取机器人所在环境下的实景图。当图像处理装置部署在安防设备(例如监控摄像头)中时，图像处理装置可以实时获取监控摄像头所采集到的实景图。当图像处理装置部署在手机或者平板电脑等手持设备上时，图像处理装置可以获取用户拍摄的照片，或者从网站上下载的图片，这些图像均可以作为待处理图像。It can be understood that when the image processing device is deployed in the unmanned vehicle, the image processing device can obtain the street view image collected by the unmanned vehicle during the driving process through the camera. When the image processing device is deployed in the robot, the image processing device can acquire a real-world image of the environment where the robot is located in real time. When the image processing apparatus is deployed in a security device (such as a surveillance camera), the image processing apparatus can acquire the real-world image captured by the surveillance camera in real time. When the image processing apparatus is deployed on a handheld device such as a mobile phone or a tablet computer, the image processing apparatus can acquire photos taken by the user or pictures downloaded from a website, and these images can be used as images to be processed.

步骤402、通过第一网络对所述待处理图像处理，得到第一特征，所述第一网络被配置为至少提取用于图像增强的特征。Step 402: Process the to-be-processed image through a first network to obtain a first feature, where the first network is configured to extract at least features for image enhancement.

本实施例中，所述第一网络可以是与图像增强相关的主干网络(backbone)，例如卷积神经网络，所述第一网络被配置为至少提取用于图像增强的特征，例如图像低层特征(low level feature)。示例性地，图像低层特征可以是指图像中的一些小的细节信息，例如可以包括边缘(edge)、角(corner)、颜色(color)、像素(pixeles)、梯度(gradients)以及纹理等高频细节信息。In this embodiment, the first network may be a backbone network related to image enhancement, such as a convolutional neural network, and the first network is configured to extract at least features for image enhancement, such as low-level image features (low level feature). Exemplarily, low-level image features may refer to some small details in the image, such as edges, corners, colors, pixels, gradients, and textures. frequency details.

可以理解的是，针对于不同的图像增强任务，可以采用不同的网络，以适应图像增强任务的需求。例如，在图像增强任务为图像超分辨率重构时，所述第一网络可以采用残差网络(Residual Network,ResNet)；在图像增强任务为图像对比度增强时，所述第一网络可以采用Unet网络。It can be understood that for different image enhancement tasks, different networks can be used to suit the needs of the image enhancement task. For example, when the image enhancement task is image super-resolution reconstruction, the first network may use Residual Network (ResNet); when the image enhancement task is image contrast enhancement, the first network may use Unet network.

Specifically, the image processing method provided in this embodiment can be applied to different image enhancement tasks, including but not limited to image super-resolution reconstruction, image denoising, image dehazing, image deblurring, image contrast enhancement, image demosaicing, image deraining, image color enhancement, image brightness enhancement, image detail enhancement, and image dynamic range enhancement.

在一个可能的实施例中，在通过第一网络对所述待处理图像处理之前，还可以通过预处理网络对所述待处理图像进行预处理，得到预处理特征。然后通过所述第一网络对得到的预处理特征进行处理，得到所述第一特征。该预处理网络例如可以为卷积神经网络。通过对所述待处理图像进行预处理，可以消除该待处理图像中无关的信息，恢复有用的真实信息，增强有关信息的可检测性且最大限度地简化数据，从而提高特征提取的可靠性。In a possible embodiment, before the image to be processed is processed through the first network, the image to be processed may also be preprocessed through a preprocessing network to obtain preprocessing features. Then, the obtained preprocessing features are processed through the first network to obtain the first features. The preprocessing network can be, for example, a convolutional neural network. By preprocessing the to-be-processed image, irrelevant information in the to-be-processed image can be eliminated, useful real information can be recovered, the detectability of the relevant information can be enhanced, and the data can be simplified to the greatest extent, thereby improving the reliability of feature extraction.

It can be understood that the preprocessing network may also be included in the first network. That is, the first network includes the preprocessing network, and the first feature can be obtained by processing the to-be-processed image through the first network.

步骤403、通过第二网络对所述待处理图像处理，得到第二特征，所述第二网络被配置为至少提取语义分割特征。Step 403: Process the to-be-processed image through a second network to obtain a second feature, where the second network is configured to extract at least semantic segmentation features.

In this embodiment, the second network may be a backbone network related to image semantic segmentation, for example a convolutional neural network, and is configured to extract at least features for image semantic segmentation, such as high-level image features (high level feature). Exemplarily, high-level image features may refer to features that are built on low-level image features and reflect the semantic information of the image. Generally, high-level image features can be used to recognize and detect targets or object shapes in an image, and carry richer semantic information.

In a possible embodiment, the second network may be, for example, a densely connected dilated (atrous) convolutional network. Dilated convolution enlarges the receptive field, and dense connections allow the network to obtain multi-scale information. Together, the two can generate accurate high-level image feature information related to semantic segmentation.

A dilated convolutional network introduces a dilation rate (also called the number of holes) into a standard convolutional network. This parameter defines the spacing between the values that the convolution kernel samples when processing data, and thereby enlarges the receptive field. Generally, the receptive field indicates the size of the region of the original image that a neuron inside the network responds to; in other words, it is the size of the region of the original image onto which a pixel of the feature map output by a layer of the convolutional network is mapped. Enlarging the receptive field lets pixels on the feature map respond to a sufficiently large region of the image to capture information about large objects, so that accurate semantic information can be obtained.
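
The effect of the dilation rate on the receptive field can be illustrated with a minimal one-dimensional sketch. The function names `dilated_conv1d` and `receptive_field` are hypothetical and are not taken from the application; the sketch assumes a single layer, no padding, and a stride of 1.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1-D dilated convolution (cross-correlation, as used in CNNs)."""
    span = (len(kernel) - 1) * dilation + 1  # input samples covered by one output
    out = []
    for start in range(len(signal) - span + 1):
        # Kernel taps are spaced `dilation` samples apart.
        out.append(sum(w * signal[start + j * dilation] for j, w in enumerate(kernel)))
    return out


def receptive_field(kernel_size, dilation):
    """Receptive field of a single dilated layer, in input samples."""
    return (kernel_size - 1) * dilation + 1


signal = list(range(10))
kernel = [1, 1, 1]
print(dilated_conv1d(signal, kernel, dilation=1))  # taps at offsets 0, 1, 2
print(dilated_conv1d(signal, kernel, dilation=2))  # taps at offsets 0, 2, 4
print(receptive_field(3, 1), receptive_field(3, 2))
```

With the same three kernel weights, a dilation rate of 2 covers five input samples instead of three, which is how the receptive field grows without adding parameters.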

Refer to FIG. 5, which is a schematic structural diagram of a densely connected dilated convolutional network provided by an embodiment of the present application. As shown in FIG. 5, the densely connected dilated convolutional network includes multiple layers, and each layer includes a dilated convolution (dilated conv) and a leaky rectified linear unit activation function (leaky relu). The output of each layer serves as an input to every subsequent layer, which achieves feature reuse. Passing the features of lower layers directly to every subsequent higher layer for aggregation reduces the features lost in transit through intermediate layers, so that the features of lower layers are better utilized.
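
The dense connection pattern can be sketched as follows. This is only an illustration of the wiring: `dense_block` and `make_toy_layer` are hypothetical names, and the toy layer is a stand-in for the dilated convolution plus leaky relu of each layer, not an implementation of it.

```python
def dense_block(x, layers):
    """Run `layers` with dense connectivity.

    `x` is a list of channels. Each layer receives the concatenation of the
    block input and the outputs of all earlier layers.
    """
    features = [x]
    for layer in layers:
        concatenated = [ch for feat in features for ch in feat]
        features.append(layer(concatenated))
    return features[-1], features


def make_toy_layer(growth):
    # Hypothetical stand-in for "dilated conv + leaky relu": emits `growth`
    # channels, each holding the number of input channels the layer saw.
    def layer(channels):
        return [len(channels)] * growth
    return layer


out, feats = dense_block([0.0, 0.0], [make_toy_layer(2) for _ in range(3)])
print(out)                      # the last layer saw 2 + 2 + 2 = 6 channels
print([len(f) for f in feats])  # channels contributed by the input and each layer
```

Because every layer sees all earlier outputs, the input width of layer n grows linearly with n, which is the cost paid for feature reuse.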

It can be understood that the embodiments of this application are described using a densely connected dilated convolutional network as an example of the second network. In practice, the second network may also be another neural network, which is not specifically limited here.

In a possible embodiment, processing the to-be-processed image through the second network to obtain the second feature may specifically include: the image processing apparatus preprocesses the to-be-processed image to obtain a preprocessing feature, for example through the preprocessing network described in step 402; performs downsampling on the preprocessing feature to obtain a downsampled feature; processes the downsampled feature through the second network to obtain a sixth feature; and performs upsampling on the sixth feature to obtain the second feature of the to-be-processed image.

In this embodiment, downsampling the preprocessing feature outputs a downsampled feature with reduced resolution. After the downsampled feature is processed through the second network to obtain the sixth feature, the sixth feature is upsampled to generate a second feature with the same resolution as the preprocessing feature, that is, the feature resolution is restored.

In practical applications, the downsampling factor can be determined according to the desired processing accuracy and the computing power of the target hardware platform, and is not specifically limited here. Generally, a larger downsampling factor gives lower processing accuracy and a smaller amount of computation, that is, a lower computing power requirement; a smaller downsampling factor gives higher processing accuracy but a larger amount of computation, that is, a higher computing power requirement. The upsampling factor needs to equal the downsampling factor so that the feature resolution is restored. Methods that can be used for upsampling include but are not limited to deconvolution, bilinear interpolation upsampling, and nearest-neighbor interpolation upsampling; the upsampling method is not specifically limited here.
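
A minimal sketch of matching the downsampling and upsampling factors is given below. It assumes average pooling for downsampling and nearest-neighbor interpolation for upsampling (one of the listed options), on a single-channel feature stored as nested lists; the function names are hypothetical.

```python
def avg_pool2d(feature, k):
    """k*k average pooling with stride k; height and width must be divisible by k."""
    h, w = len(feature), len(feature[0])
    assert h % k == 0 and w % k == 0
    return [
        [
            sum(feature[r * k + dr][c * k + dc] for dr in range(k) for dc in range(k))
            / (k * k)
            for c in range(w // k)
        ]
        for r in range(h // k)
    ]


def upsample_nearest(feature, k):
    """Nearest-neighbor upsampling by an integer factor k."""
    h, w = len(feature), len(feature[0])
    return [[feature[r // k][c // k] for c in range(w * k)] for r in range(h * k)]


feature = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
down = avg_pool2d(feature, 2)
restored = upsample_nearest(down, 2)
print(down)                             # 2x2 feature of block averages
print(len(restored), len(restored[0]))  # back to the 4x4 input resolution
```

Using the same factor in both directions restores the spatial size, although the detail averaged away by pooling is not recovered by interpolation alone.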

步骤404、根据所述第一特征和所述第二特征，生成第三特征。Step 404: Generate a third feature according to the first feature and the second feature.

In a possible embodiment, the image processing apparatus may perform feature fusion on the first feature and the second feature to obtain the third feature. Exemplarily, the feature fusion may include at least one of the following fusion operations: summation, multiplication, concatenation, and concatenation-convolution, where concatenation-convolution means performing both concatenation and convolution. In practice, a suitable feature fusion manner may be adopted according to actual needs, which is not specifically limited here.
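
The four fusion operations can be sketched on toy features as follows. Each feature is represented as a list of channels and each channel as a flat list of values. The function names are hypothetical, and the use of a 1x1 convolution after concatenation is an assumption; the text does not fix the kernel size.

```python
def fuse_sum(a, b):
    """Element-wise sum; both features must have identical shapes."""
    return [[x + y for x, y in zip(ca, cb)] for ca, cb in zip(a, b)]


def fuse_mul(a, b):
    """Element-wise product; both features must have identical shapes."""
    return [[x * y for x, y in zip(ca, cb)] for ca, cb in zip(a, b)]


def fuse_concat(a, b):
    """Concatenation along the channel axis."""
    return a + b


def fuse_concat_conv(a, b, weights):
    """Concatenate, then mix channels with a 1x1 convolution (assumed kernel size).

    weights[o][i] is the weight from input channel i to output channel o.
    """
    stacked = fuse_concat(a, b)
    n = len(stacked[0])
    return [[sum(w * ch[p] for w, ch in zip(row, stacked)) for p in range(n)]
            for row in weights]


first = [[1.0, 2.0], [3.0, 4.0]]       # 2 channels, 2 spatial positions
second = [[10.0, 20.0], [30.0, 40.0]]
print(fuse_sum(first, second))
print(fuse_mul(first, second))
print(len(fuse_concat(first, second)))  # channel count doubles
print(fuse_concat_conv(first, second, [[0.25, 0.25, 0.25, 0.25]]))
```

Summation and multiplication keep the channel count, whereas concatenation doubles it; the convolution after concatenation lets the network learn how to weight the two sources.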

In this embodiment, fusing the first feature and the second feature effectively combines the low-level image features related to image enhancement with the high-level image features related to semantic segmentation, so that features at different levels complement each other and the robustness of the network is improved.

步骤405、获取所述待处理图像的语义分割结果。Step 405: Obtain the semantic segmentation result of the image to be processed.

In a possible embodiment, the image processing apparatus may process the third feature to obtain the semantic segmentation result of the to-be-processed image. The third feature is obtained by fusing the low-level image features related to image enhancement with the high-level image features related to semantic segmentation. Therefore, processing the third feature introduces low-level image features on top of the high-level image features related to semantic segmentation; that is, the semantic segmentation result of the to-be-processed image is obtained from features at different levels, which improves its accuracy. Exemplarily, the image processing apparatus may apply a convolution operation to the third feature through a convolutional network to obtain the semantic segmentation result of the to-be-processed image.

In another possible embodiment, the image processing apparatus may process the second feature output by the second network to obtain the semantic segmentation result of the to-be-processed image, that is, the semantic segmentation result is obtained directly from the features related to semantic segmentation. Exemplarily, the image processing apparatus may likewise apply a convolution operation to the second feature through a convolutional network to obtain the semantic segmentation result of the to-be-processed image.

步骤406、根据所述第三特征和所述待处理图像的语义分割结果，生成第四特征。Step 406: Generate a fourth feature according to the third feature and the semantic segmentation result of the image to be processed.

在一个可能的实施例中，图像处理装置可以对所述第三特征和所述待处理图像的语义分割结果进行特征融合处理，得到所述第四特征。其中，特征融合处理的方式具体可以参考步骤404中所述，在此不再赘述。In a possible embodiment, the image processing apparatus may perform feature fusion processing on the third feature and the semantic segmentation result of the image to be processed to obtain the fourth feature. The specific method of the feature fusion processing may refer to the description in step 404, which will not be repeated here.

在一个可能的实施例中，在获得所述第三特征之后，所述图像处理装置可以对所述第三特征进行处理，例如对所述第三特征进行进一步的特征提取，得到第五特征。图像处理装置根据所述第五特征和所述待处理图像的语义分割结果，生成第四特征。也就是说，图像处理装置在对所述第三特征进行特征提取之后，再基于进一步特征提取得到的第五特征和待处理图像的语义分割结果进行特征融合。通过对特征融合之后所得到的第三特征做进一步的特征提取处理，可以在第三特征的基础上提取粒度更细的特征，以提高后续特征融合所得到的第四特征的精度。In a possible embodiment, after obtaining the third feature, the image processing apparatus may process the third feature, for example, perform further feature extraction on the third feature to obtain the fifth feature. The image processing apparatus generates a fourth feature according to the fifth feature and the semantic segmentation result of the image to be processed. That is to say, after the image processing apparatus performs feature extraction on the third feature, feature fusion is performed based on the fifth feature obtained by further feature extraction and the semantic segmentation result of the image to be processed. By performing further feature extraction processing on the third feature obtained after feature fusion, finer-grained features can be extracted on the basis of the third feature, so as to improve the accuracy of the fourth feature obtained by subsequent feature fusion.

步骤407、对所述第四特征进行图像重构，得到目标图像。Step 407: Perform image reconstruction on the fourth feature to obtain a target image.

In this embodiment, after obtaining the fourth feature, which has undergone two rounds of feature fusion, the image processing apparatus may perform image reconstruction on the fourth feature, for example by applying a convolutional post-processing operation to it, to obtain the target image. The target image is the image obtained after image enhancement.

本实施例中，在图像增强处理过程中，通过两次特征融合处理，将语义分割特征以及语义分割结果与相关的图像增强特征进行了融合处理，实现了特征信息互补，从而能够针对不同语义区域采用不同的图像增强强度，精确地保持纹理细节，提升图像增强后的纹理细节真实性。In this embodiment, in the process of image enhancement processing, the semantic segmentation features and the semantic segmentation results are fused with the related image enhancement features through two feature fusion processing, so as to realize the complementarity of feature information, so that different semantic regions can be targeted. Different image enhancement strengths are used to accurately maintain texture details and improve the authenticity of texture details after image enhancement.

应理解，步骤401至407的执行主体(即图像处理装置)可以为终端设备，也可以为云侧的服务器，步骤401至407也可以通过终端设备和服务器进行数据处理以及之间的交互得到。It should be understood that the execution subject (ie, the image processing device) of steps 401 to 407 can be a terminal device or a server on the cloud side, and steps 401 to 407 can also be obtained through data processing and interaction between the terminal device and the server.

为了便于理解，以下将结合具体例子详细描述本实施例所提供的图像处理方法如何实现图像去雾。For ease of understanding, the following will describe in detail how the image processing method provided in this embodiment implements image dehazing with reference to specific examples.

Refer to FIG. 6a and FIG. 6b. FIG. 6a is a schematic diagram of an architecture for image processing provided by an embodiment of the present application, and FIG. 6b is a schematic diagram of a network structure for image processing provided by an embodiment of the present application. As shown in FIG. 6a and FIG. 6b, the architecture may include:

The preprocessing unit 100 is configured to receive a hazy, low-contrast image and preprocess it to generate a preprocessing feature F. The preprocessing unit 100 may be, for example, a convolutional network that generates the preprocessing feature F by performing a convolution operation on the received image (for example, a 12-megapixel image with a resolution of 3000*4000). The resolution of the preprocessing feature F is the same as that of the image, that is, 3000*4000.

The first feature extraction unit 101 is configured to perform feature extraction on the preprocessing feature F, for example extracting low-level image features from the preprocessing feature F, to obtain the first feature F_L. The first feature extraction unit 101 may use a backbone network related to the dehazing task, for example a multi-level cascade of convolutional network + instance normalization (Instance Normalization, IN) network. The IN network can learn a highly nonlinear contrast normalization effect, so that deviations in image appearance such as brightness, color, and style do not affect the final prediction result; it can also improve the compatibility between the extracted low-level image features and the high-level image features extracted subsequently. The number of cascaded layers N can be determined according to the desired processing accuracy and the computing power of the target hardware platform. Generally, the larger N is, the higher the feature extraction accuracy and the larger the amount of computation.

The downsampling unit 200 is configured to perform downsampling on the preprocessing feature F to obtain a downsampled feature F_down with reduced resolution. Exemplarily, the downsampling unit 200 may be a pooling layer network that performs a k*k average pooling (average pooling) downsampling operation on the preprocessing feature F to obtain the downsampled feature F_down. The downsampling factor k can be determined according to the desired processing accuracy and the computing power of the target hardware platform. Generally, the smaller k is, the higher the processing accuracy and the larger the amount of computation. Exemplarily, k may be 4, that is, the width and the height of the feature are each downsampled by a factor of 4.
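
As a worked example of the trade-off, the following sketch computes the size of F_down for the 3000*4000 feature mentioned above with k = 4. The helper name `downsampled_shape` is hypothetical, and exact divisibility by k is assumed.

```python
def downsampled_shape(height, width, k):
    """Shape after k*k pooling with stride k (assumes exact divisibility)."""
    if height % k or width % k:
        raise ValueError("height and width must be divisible by k")
    return height // k, width // k


h, w = 3000, 4000
dh, dw = downsampled_shape(h, w, 4)
print(dh, dw)                # spatial size of F_down
print((h * w) // (dh * dw))  # reduction in the number of spatial positions
```

With k = 4 the second feature extraction unit processes a 750*1000 feature, that is, 16 times fewer spatial positions than the full-resolution branch.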

The second feature extraction unit 201 is configured to perform feature extraction on the downsampled feature F_down, for example extracting high-level image features from the downsampled feature F_down, to obtain the sixth feature F_down-seg. Exemplarily, the second feature extraction unit 201 may be a densely connected dilated convolutional network.

The upsampling unit 202 is configured to perform upsampling on the sixth feature F_down-seg to obtain the second feature F_H-seg, which has the same resolution as the original input image. The upsampling method adopted by the upsampling unit 202 may be, for example, deconvolution, bilinear interpolation upsampling, or nearest-neighbor interpolation upsampling, and the upsampling factor equals the downsampling factor.

The first feature fusion unit 102 is configured to perform feature fusion on the first feature F_L and the second feature F_H-seg, for example by concatenating the first feature F_L and the second feature F_H-seg, to obtain the fused third feature F_fusion1.

The third feature extraction unit 103 is configured to perform further feature extraction on the third feature F_fusion1, for example by applying convolution to the third feature F_fusion1 through a convolutional network, to extract a fine-grained feature, that is, the fifth feature F_fine.

The semantic result prediction unit 203 is configured to predict a semantic segmentation result from the third feature F_fusion1, for example by performing a post-processing operation on the third feature F_fusion1 through a convolutional network, to obtain the semantic segmentation result corresponding to the input image.

The second feature fusion unit 104 is configured to perform feature fusion on the fifth feature F_fine and the semantic segmentation result corresponding to the input image, for example by concatenating them, to obtain the fused fourth feature F_fusion2.

The image reconstruction unit 105 is configured to perform image reconstruction on the fused fourth feature F_fusion2, for example by performing a post-processing operation on the fourth feature F_fusion2 through a convolutional network, to obtain the dehazed image.
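
The data flow between the units listed above can be summarized in a short sketch. The function `enhance`, the dictionary keys, and the `tag` helper are hypothetical; each toy unit only records what it was applied to, so the printed strings show the wiring rather than real features. The sketch assumes that the first feature fusion unit receives the upsampled feature F_H-seg, which has the same resolution as F_L.

```python
def enhance(image, units):
    """Wire the units of FIG. 6a together; `units` maps names to callables."""
    f = units["pre"](image)               # unit 100: preprocessing feature F
    f_l = units["low"](f)                 # unit 101: first feature F_L
    f_down = units["down"](f)             # unit 200: downsampled feature F_down
    f_down_seg = units["high"](f_down)    # unit 201: sixth feature F_down-seg
    f_h_seg = units["up"](f_down_seg)     # unit 202: second feature F_H-seg
    f_fusion1 = units["fuse1"](f_l, f_h_seg)  # unit 102: third feature F_fusion1
    f_fine = units["fine"](f_fusion1)     # unit 103: fifth feature F_fine
    seg = units["seg"](f_fusion1)         # unit 203: semantic segmentation result
    f_fusion2 = units["fuse2"](f_fine, seg)   # unit 104: fourth feature F_fusion2
    return units["rec"](f_fusion2), seg   # unit 105: target image


def tag(name):
    # Hypothetical stand-in: returns a string describing the call.
    return lambda *args: f"{name}({','.join(args)})"


toy_units = {key: tag(key) for key in (
    "pre", "low", "down", "high", "up", "fuse1", "fine", "seg", "fuse2", "rec")}

target, segmentation = enhance("img", toy_units)
print(segmentation)
print(target)
```

The printed expressions make the two fusion points visible: the segmentation branch joins the enhancement branch once as features (fuse1) and once as a predicted result (fuse2).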

以本实施例方法用于进行图像去雾为例，本实施例在开源仿真数据集上进行了测试，以比较本实施方法与现有的去雾算法。Taking the method of this embodiment for image dehazing as an example, this embodiment is tested on an open source simulation data set to compare the method of this embodiment with the existing dehazing algorithm.

可以参阅图7，图7为本申请实施例提供的一个客观指标对比示意图。由图7可知，相对于现有的多种去雾算法，本申请实施例提供的图像处理方法具有更高的峰值信噪比(Peak Signal to Noise Ratio，PSNR)以及结构相似性(Structural SIMilarity，SSIM)。Referring to FIG. 7 , FIG. 7 is a schematic diagram of objective index comparison provided in this embodiment of the present application. It can be seen from FIG. 7 that, compared with various existing dehazing algorithms, the image processing method provided by the embodiment of the present application has higher peak signal-to-noise ratio (Peak Signal to Noise Ratio, PSNR) and structural similarity (Structural SIMilarity, SSIM).

其中，PSNR是一个表示信号最大可能功率和影响它的表示精度的破坏性噪声功率的比值的工程术语。PSNR通常用于图像处理等领域中信号重建质量的测量方法，一般通过均方误差来进行定义。一般而言，PSNR越高，表示与真实值的差距越小。where PSNR is an engineering term that expresses the ratio of the maximum possible power of a signal to the power of destructive noise that affects the accuracy of its representation. PSNR is usually used to measure the quality of signal reconstruction in fields such as image processing, and is generally defined by mean square error. In general, the higher the PSNR, the smaller the gap with the true value.
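
As an illustration, PSNR is commonly computed as 10*log10(MAX^2 / MSE), where MSE is the mean squared error between the two images and MAX is the largest possible pixel value. The sketch below assumes 8-bit images (MAX = 255) stored as nested lists; the function name `psnr` is hypothetical.

```python
import math


def psnr(reference, test, max_value=255.0):
    """PSNR in dB between two equally sized images given as nested lists."""
    ref = [v for row in reference for v in row]
    tst = [v for row in test for v in row]
    if len(ref) != len(tst):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(ref, tst)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


clean = [[100, 100], [100, 100]]
noisy = [[101, 99], [100, 100]]
print(psnr(clean, noisy))  # MSE is 0.5 for this pair
print(psnr(clean, clean))  # inf: no error at all
```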

SSIM是一种衡量两张图像相似度的指标，主要基于亮度(luminance)、对比度(contrast)和结构(structure)来评价图像的相似度。SSIM is an index to measure the similarity of two images, which is mainly based on brightness, contrast and structure to evaluate the similarity of images.
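
The three components can be combined in a simplified sketch that evaluates the commonly used SSIM expression over a single window covering the whole image. Practical implementations usually average the index over local windows, so this is a simplification; the function name `ssim_global` and the constants 0.01 and 0.03 are conventional choices assumed here, not values taken from the application.

```python
def ssim_global(x, y, max_value=255.0):
    """Single-window SSIM over two flat lists of pixel values."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (0.01 * max_value) ** 2  # stabilizes the luminance term
    c2 = (0.03 * max_value) ** 2  # stabilizes the contrast/structure term
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


a = [10.0, 20.0, 30.0, 40.0]
print(ssim_global(a, a))        # identical images give the maximum similarity
print(ssim_global(a, a[::-1]))  # reversed structure scores lower
```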

此外，在开源仿真数据集上增加了不同水平的噪声的情况下，本实施例方法与已有的去雾方法相比，具有更高的PSNR和SSIM，即本实施例方法具有更强的鲁棒性与稳定性。具体地，可以参阅图8，图8为本申请实施例提供的另一个客观指标对比示意图。In addition, when different levels of noise are added to the open source simulation data set, the method of this embodiment has higher PSNR and SSIM than the existing dehazing methods, that is, the method of this embodiment has stronger robustness Stability and stability. Specifically, reference may be made to FIG. 8 , which is a schematic diagram of another objective index comparison provided by this embodiment of the present application.

本实施例在真实含雾数据集上进行测试，可以获知本实施例方法能够得到更加清晰和通透的结果，且没有伪影失真。现有去雾算法存在去雾水平不够，去雾后图片对比度偏低；或者过于去雾，在一些局部区域中丢失了纹理细节的问题。具体地，可以参阅图9，图9为本申请实施例提供的图像对比示意图。由图9可知，右下角的本实施例方法，在去雾的基础上，很好地保持了绿植、楼面、地面等区域的细节，图片通透自然、视觉效果最佳。This embodiment is tested on a real foggy data set, and it can be known that the method of this embodiment can obtain clearer and more transparent results without artifact distortion. The existing dehazing algorithms have the problem that the dehazing level is not enough, and the contrast of the image after dehazing is low; or the dehazing is too much, and the texture details are lost in some local areas. Specifically, reference may be made to FIG. 9 , which is a schematic diagram of image comparison provided by this embodiment of the present application. It can be seen from FIG. 9 that the method of this embodiment in the lower right corner, on the basis of defogging, well maintains the details of green plants, floors, ground and other areas, the picture is transparent and natural, and the visual effect is the best.

可以参阅图10，图10为本申请实施例提供的一种模型训练方法的流程示意图。如图10所示，本申请实施例提供的一种模型训练方法，包括以下步骤：Referring to FIG. 10 , FIG. 10 is a schematic flowchart of a model training method provided by an embodiment of the present application. As shown in FIG. 10 , a model training method provided by an embodiment of the present application includes the following steps:

步骤1001，获取训练样本对，所述训练样本对包括第一图像以及第二图像，所述第一图像的质量低于所述第二图像。Step 1001: Acquire a training sample pair, where the training sample pair includes a first image and a second image, and the quality of the first image is lower than that of the second image.

In this embodiment, before the image training apparatus performs model training, training sample pairs may be obtained. The first image and the second image are two images of the same scene, and the image quality of the first image is lower than that of the second image. Image quality refers to one or more of color, brightness, saturation, contrast, dynamic range, resolution, texture detail, sharpness, and the like. For example, the first image is a hazy image and the second image is a haze-free image, and the brightness, contrast, and sharpness of the first image are all lower than those of the second image.

Step 1002: Process the first image through the image processing model to be trained to obtain a predicted image, where the image processing model to be trained is configured to: obtain the first image as the image to be processed; process the first image through a first network to obtain a first feature, the first network being configured to extract at least features for image enhancement; process the first image through a second network to obtain a second feature, the second network being configured to extract at least semantic segmentation features; generate a third feature according to the first feature and the second feature; obtain a semantic segmentation result of the first image; generate a fourth feature according to the third feature and the semantic segmentation result of the first image; and perform image reconstruction on the fourth feature to obtain the predicted image.

步骤1003，根据所述训练样本对中的第二图像以及所述预测图像，获取第一损失，所述第一损失用于描述所述第二图像和所述预测图像之间的差异。Step 1003: Obtain a first loss according to the second image in the training sample pair and the predicted image, where the first loss is used to describe the difference between the second image and the predicted image.

本实施例中，在得到预测图像之后，可以基于预设的损失函数，求取第二图像与预测图像对应的第一损失，以确定第二图像和预测图像之间的差异。In this embodiment, after the predicted image is obtained, a first loss corresponding to the second image and the predicted image may be obtained based on a preset loss function to determine the difference between the second image and the predicted image.

在一种可能的实现方式中，可以是基于重构损失函数(reconstruction loss)和梯度损失函数(gradient loss)来获取所述第二图像和所述预测图像对应的第一损失，以保证增强后的图像能够满足客观指标以及主观指标要求。In a possible implementation manner, the first loss corresponding to the second image and the predicted image may be obtained based on a reconstruction loss function (reconstruction loss) and a gradient loss function (gradient loss), so as to ensure that after the enhancement The images can meet the requirements of objective indicators and subjective indicators.

Exemplarily, the reconstruction loss function may use the L1 norm to obtain a pixel-level loss between the predicted image and the second image. The reconstruction loss function may be as shown in Formula 1:

where L_rec denotes the reconstruction loss, | | denotes the L1 norm, GT denotes the ground truth, that is, the pixel values of the second image, output denotes the pixel values of the predicted image, and P is the number of pixels. Here, the L1 norm means taking the difference between the pixel values of the second image and those of the predicted image and summing the absolute values of the per-pixel differences.
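
Formula 1 itself does not appear in this text. One form that is consistent with the symbol definitions above, offered only as an assumption and not as the original formula, is the sum of absolute per-pixel differences normalized by the number of pixels P:

```latex
L_{rec} = \frac{1}{P} \sum_{i=1}^{P} \left| GT_i - output_i \right|
```

Here the index i, which runs over the P pixels, is introduced for illustration; the normalization by P is inferred from the fact that P is listed among the symbols.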

示例性地，梯度损失函数可以是表示预测图像和第二图像在x/y方向上的平均梯度的损失。梯度损失函数可以是如公式2所示：Illustratively, the gradient loss function may be a loss representing the average gradient of the predicted image and the second image in the x/y direction. The gradient loss function can be as shown in Equation 2:

L_grad = |grad(GT) - grad(output)|    Formula 2

where L_grad denotes the gradient loss, | | denotes the L1 norm, GT denotes the ground truth, that is, the pixel values of the second image, output denotes the pixel values of the predicted image, and grad( ) denotes the average gradient of an image in the x/y directions.

基于重构损失函数和梯度损失函数，可以得到所述第二图像和所述预测图像对应的第一损失。示例性地，求取第一损失的函数可以是如公式3所示：Based on the reconstruction loss function and the gradient loss function, the first loss corresponding to the second image and the predicted image can be obtained. Exemplarily, the function for obtaining the first loss may be as shown in Equation 3:

L_total = L_rec + α*L_grad    Formula 3

其中，L_total表示的是第一损失，α为超参数，用于调整梯度损失的权重。Among them, L _total represents the first loss, and α is a hyperparameter used to adjust the weight of the gradient loss.
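
A minimal sketch of the first loss is given below under several assumptions that are not fixed by the text: the reconstruction loss is the mean absolute pixel difference, grad( ) is the mean absolute forward difference taken separately along x and y, the gradient loss sums the x and y terms, and α = 0.5 is an arbitrary example value. All function names are hypothetical.

```python
def l1_mean(gt, output):
    """Mean absolute difference over all pixels (assumed form of L_rec)."""
    a = [v for row in gt for v in row]
    b = [v for row in output for v in row]
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def mean_gradient(img):
    """Mean absolute forward difference along x and along y (assumed grad())."""
    h, w = len(img), len(img[0])
    gx = [abs(img[r][c + 1] - img[r][c]) for r in range(h) for c in range(w - 1)]
    gy = [abs(img[r + 1][c] - img[r][c]) for r in range(h - 1) for c in range(w)]
    return sum(gx) / len(gx), sum(gy) / len(gy)


def first_loss(gt, output, alpha=0.5):
    """L_total = L_rec + alpha * L_grad, returned together with both terms."""
    l_rec = l1_mean(gt, output)
    gx_gt, gy_gt = mean_gradient(gt)
    gx_out, gy_out = mean_gradient(output)
    l_grad = abs(gx_gt - gx_out) + abs(gy_gt - gy_out)
    return l_rec + alpha * l_grad, l_rec, l_grad


gt = [[0.0, 2.0], [0.0, 2.0]]
output = [[0.0, 1.0], [0.0, 1.0]]  # edges are weaker than in the ground truth
print(first_loss(gt, output))
```

The gradient term penalizes the predicted image for having weaker or stronger edges than the ground truth even when the pixel-wise error is small, which is why it is weighted separately by α.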

步骤1004，至少根据所述第一损失对所述待训练图像处理模型的模型参数进行更新，直至满足模型训练条件，得到图像处理模型。Step 1004: Update the model parameters of the image processing model to be trained according to at least the first loss, until the model training conditions are met, and an image processing model is obtained.

步骤1004中训练后得到的图像处理模型可以参照图4对应的实施例中的描述，这里不再赘述。For the image processing model obtained after training in step 1004, reference may be made to the description in the embodiment corresponding to FIG. 4, and details are not repeated here.

That is, during model training, a semantic segmentation loss function may be used to constrain the semantic segmentation prediction result of the first image, so that the semantic segmentation result generated by the model is more accurate.

Exemplarily, the semantic segmentation loss function used to obtain the second loss between the semantic segmentation prediction result and the semantic segmentation ground truth may be a cross-entropy loss function. The semantic segmentation loss function may be, for example, as shown in Equation 4:

Here, L_seg is the second loss; p is the number of pixels in the image; s_i denotes the probability that the semantic segmentation prediction result assigns to semantic category z at pixel i; the corresponding ground-truth term denotes the probability that the semantic segmentation ground truth assigns to semantic category z at pixel i; and log() denotes the logarithm.
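The following sketch shows one way the second loss could be computed. It assumes the standard pixel-averaged cross-entropy form, L_seg = -(1/p) Σ_i Σ_z t_i^z · log(s_i^z), which is consistent with the symbol definitions above but is not confirmed by the text. The symbol t for the ground-truth probability, the function name, and the eps clamp are illustrative assumptions.

```python
import math


def segmentation_loss(pred, truth, eps=1e-12):
    """Pixel-averaged cross entropy (assumed form of the second loss).

    pred[i][z]  : predicted probability of semantic category z at pixel i
    truth[i][z] : ground-truth probability (typically one-hot) of category z at pixel i
    eps         : illustrative clamp that avoids log(0); not taken from the source
    """
    p = len(pred)
    total = 0.0
    for pred_i, truth_i in zip(pred, truth):
        for s, t in zip(pred_i, truth_i):
            if t > 0.0:
                total -= t * math.log(max(s, eps))
    return total / p
```

With one-hot ground truth, only the predicted probability of the correct category contributes at each pixel, so the loss falls to zero as those probabilities approach 1.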

Based on the first loss and the second loss described above, a third loss can be obtained, and the model parameters of the image processing model to be trained are updated according to the third loss until the model training condition is met, to obtain the image processing model.

Exemplarily, the third loss may be computed as shown in Equation 5:

L_total = L_rec + α * L_seg + β * L_grad    (Equation 5)

Here, L_total denotes the third loss, α is a first hyperparameter, and β is a second hyperparameter; α and β adjust the weights of the semantic segmentation loss and the gradient loss, respectively.
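A small worked example may help with Equation 5. The function name and the default weights below are hypothetical. Note that in Equation 5 α weights the segmentation loss and β weights the gradient loss, whereas in Equation 3 α weights the gradient loss.

```python
def third_loss(l_rec, l_seg, l_grad, alpha=0.1, beta=0.5):
    """Third loss: L_rec + alpha * L_seg + beta * L_grad (Equation 5).

    alpha and beta defaults are placeholders, not values from the source.
    """
    return l_rec + alpha * l_seg + beta * l_grad


# With L_rec = 1.5, L_seg = 0.35, L_grad = 3.0:
# 1.5 + 0.1 * 0.35 + 0.5 * 3.0 = 3.035 (up to floating-point rounding)
example = third_loss(1.5, 0.35, 3.0)
```

Setting alpha to 0 removes the segmentation term, leaving the same form as Equation 3 with beta in the role of that equation's α.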

Referring to FIG. 11, FIG. 11 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application. As shown in FIG. 11, the image processing apparatus includes an acquisition unit 1101 and a processing unit 1102. The acquisition unit 1101 is configured to acquire an image to be processed. The processing unit 1102 is configured to: process the image to be processed through a first network to obtain a first feature, the first network being configured to extract at least a feature for image enhancement; process the image to be processed through a second network to obtain a second feature, the second network being configured to extract at least a semantic segmentation feature; and generate a third feature according to the first feature and the second feature. The acquisition unit 1101 is further configured to acquire a semantic segmentation result of the image to be processed. The processing unit 1102 is further configured to: generate a fourth feature according to the third feature and the semantic segmentation result of the image to be processed; and perform image reconstruction on the fourth feature to obtain a target image.

Optionally, in a possible implementation, the processing unit 1102 is further configured to process the third feature through a third network to obtain the semantic segmentation result of the image to be processed.

Optionally, in a possible implementation, the processing unit 1102 is further configured to: perform feature fusion processing on the first feature and the second feature to obtain the third feature; and perform feature fusion processing on the third feature and the semantic segmentation result of the image to be processed to obtain the fourth feature.
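The claims name summation, multiplication, and concatenation among the possible feature fusion operations. The sketch below illustrates these three on toy feature maps, each represented as a list of channels holding 2-D lists. The representation and the function names are assumptions made for illustration; the concatenation-plus-convolution variant is omitted.

```python
def _elementwise(op, a, b):
    """Apply op to matching elements of two features with identical shapes."""
    return [[[op(x, y) for x, y in zip(row_a, row_b)]
             for row_a, row_b in zip(ch_a, ch_b)]
            for ch_a, ch_b in zip(a, b)]


def fuse_sum(a, b):
    """Fusion by element-wise summation."""
    return _elementwise(lambda x, y: x + y, a, b)


def fuse_mul(a, b):
    """Fusion by element-wise multiplication."""
    return _elementwise(lambda x, y: x * y, a, b)


def fuse_concat(a, b):
    """Fusion by concatenation along the channel dimension."""
    return [[row[:] for row in ch] for ch in a + b]
```

Summation and multiplication keep the channel count unchanged, while concatenation adds the channel counts of the two inputs, which is why a convolution is often applied afterwards to bring the count back down.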

Optionally, in a possible implementation, the processing unit 1102 is further configured to: process the third feature to obtain a fifth feature; and generate the fourth feature according to the fifth feature and the semantic segmentation result of the image to be processed.

Optionally, in a possible implementation, the processing unit 1102 is further configured to: preprocess the image to be processed to obtain a preprocessed feature; perform downsampling on the preprocessed feature to obtain a downsampled feature; process the downsampled feature through the second network to obtain a sixth feature; and perform upsampling on the sixth feature to obtain the second feature of the image to be processed.
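The data flow of the second branch (preprocess, downsample, second network, upsample) can be sketched as follows. The 2x average pooling, the nearest-neighbour upsampling, and the callables `preprocess` and `second_network` are hypothetical stand-ins; the text does not specify how any of these stages is implemented.

```python
def downsample2x(feat):
    """Average-pool a 2-D feature map by a factor of 2 (assumed downsampling)."""
    return [[(feat[r][c] + feat[r][c + 1] + feat[r + 1][c] + feat[r + 1][c + 1]) / 4.0
             for c in range(0, len(feat[0]) - 1, 2)]
            for r in range(0, len(feat) - 1, 2)]


def upsample2x(feat):
    """Nearest-neighbour upsampling by a factor of 2 (assumed upsampling)."""
    out = []
    for row in feat:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def second_branch(image, preprocess, second_network):
    """Return the second feature at the resolution of the preprocessed input."""
    pre = preprocess(image)        # preprocessed feature
    down = downsample2x(pre)       # downsampled feature
    sixth = second_network(down)   # sixth feature from the second network
    return upsample2x(sixth)       # second feature
```

Running the second network on the downsampled feature reduces the number of positions it has to process, and the final upsampling restores a resolution that can be fused with the first feature.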

Referring to FIG. 12, FIG. 12 is a schematic structural diagram of a model training apparatus provided by an embodiment of the present application. As shown in FIG. 12, the model training apparatus includes an acquisition unit 1201 and a training unit 1202. The acquisition unit 1201 is configured to acquire a training sample pair, where the training sample pair includes a first image and a second image, and the quality of the first image is lower than that of the second image. The training unit 1202 is configured to process the first image through an image processing model to be trained to obtain a predicted image, where the image processing model to be trained is configured to: acquire an image to be processed; process the first image through a first network to obtain a first feature, the first network being configured to extract at least a feature for image enhancement; process the first image through a second network to obtain a second feature, the second network being configured to extract at least a semantic segmentation feature; generate a third feature according to the first feature and the second feature; acquire a semantic segmentation result of the first image; generate a fourth feature according to the third feature and the semantic segmentation result of the first image; and perform image reconstruction on the fourth feature to obtain the predicted image. The training unit 1202 is further configured to: obtain a first loss according to the second image in the training sample pair and the predicted image, where the first loss describes the difference between the second image and the predicted image; and update the model parameters of the image processing model to be trained at least according to the first loss, until a model training condition is met, to obtain the image processing model.

Optionally, in a possible implementation, the training unit 1202 is further configured to process the third feature through a third network to obtain a semantic segmentation prediction result of the first image.

Optionally, in a possible implementation, the training unit 1202 is further configured to: acquire a semantic segmentation ground truth of the first image; obtain a second loss according to the semantic segmentation prediction result and the semantic segmentation ground truth, where the second loss describes the difference between the semantic segmentation prediction result and the semantic segmentation ground truth; and update the model parameters of the image processing model to be trained at least according to the first loss and the second loss, until the model training condition is met, to obtain the image processing model.

Optionally, in a possible implementation, the training unit 1202 is further configured to: perform feature fusion processing on the first feature and the second feature to obtain the third feature; and perform feature fusion processing on the third feature and the semantic segmentation result of the first image to obtain the fourth feature.

Optionally, in a possible implementation, the training unit 1202 is further configured to: process the third feature to obtain a fifth feature; and generate the fourth feature according to the third feature and the semantic segmentation result of the first image.

Optionally, in a possible implementation, the training unit 1202 is further configured to: preprocess the first image to obtain a preprocessed feature; perform downsampling on the preprocessed feature to obtain a downsampled feature; process the downsampled feature through the second network to obtain a sixth feature; and perform upsampling on the sixth feature to obtain the second feature of the first image.

Next, an execution device provided by an embodiment of the present application is introduced. Referring to FIG. 13, FIG. 13 is a schematic structural diagram of the execution device provided by an embodiment of the present application. The execution device 1300 may specifically be embodied as a mobile phone, a tablet, a notebook computer, a smart wearable device, a server, or the like, which is not limited here. The data processing apparatus described in the embodiment corresponding to FIG. 13 may be deployed on the execution device 1300 to implement the data processing functions in the embodiment corresponding to FIG. 13. Specifically, the execution device 1300 includes a receiver 1301, a transmitter 1302, a processor 1303, and a memory 1304 (the execution device 1300 may contain one or more processors 1303; FIG. 13 takes one processor as an example), where the processor 1303 may include an application processor 13031 and a communication processor 13032. In some embodiments of the present application, the receiver 1301, the transmitter 1302, the processor 1303, and the memory 1304 may be connected by a bus or in another manner.

The memory 1304 may include read-only memory and random access memory, and provides instructions and data to the processor 1303. A portion of the memory 1304 may also include non-volatile random access memory (NVRAM). The memory 1304 stores processor and operating instructions, executable modules or data structures, or a subset or an extended set thereof, where the operating instructions may include various operating instructions for implementing various operations.

The processor 1303 controls the operation of the execution device. In a specific application, the components of the execution device are coupled together through a bus system, where the bus system may include, in addition to a data bus, a power bus, a control bus, a status signal bus, and the like. For clarity of description, however, the various buses are all referred to as the bus system in the figure.

The methods disclosed in the foregoing embodiments of the present application may be applied to the processor 1303 or implemented by the processor 1303. The processor 1303 may be an integrated circuit chip with signal processing capability. During implementation, each step of the foregoing methods may be completed by an integrated logic circuit of hardware in the processor 1303 or by instructions in the form of software. The processor 1303 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor 1303 may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in the embodiments of the present application may be directly executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium mature in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory 1304, and the processor 1303 reads the information in the memory 1304 and completes the steps of the foregoing methods in combination with its hardware.

The receiver 1301 may be used to receive input numeric or character information, and to generate signal inputs related to the settings and function control of the execution device. The transmitter 1302 may be used to output numeric or character information through a first interface; the transmitter 1302 may also be used to send instructions to a disk group through the first interface to modify the data in the disk group; and the transmitter 1302 may also include a display device such as a display screen.

In the embodiments of the present application, in one case, the processor 1303 is configured to execute the image processing method performed by the execution device in the embodiment corresponding to FIG. 4.

An embodiment of the present application also provides a training device. Referring to FIG. 14, FIG. 14 is a schematic structural diagram of the training device provided by an embodiment of the present application. Specifically, the training device 1400 is implemented by one or more servers. The training device 1400 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 1414 (for example, one or more processors), a memory 1432, and one or more storage media 1430 (for example, one or more mass storage devices) that store application programs 1442 or data 1444. The memory 1432 and the storage medium 1430 may provide transient storage or persistent storage. The program stored in the storage medium 1430 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the training device. Further, the central processing unit 1414 may be configured to communicate with the storage medium 1430 and execute, on the training device 1400, the series of instruction operations in the storage medium 1430.

The training device 1400 may also include one or more power supplies 1426, one or more wired or wireless network interfaces 1450, and one or more input/output interfaces 1458; or one or more operating systems 1441, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.

Specifically, the training device may perform the steps in the embodiment corresponding to FIG. 10.

An embodiment of the present application also provides a computer program product that, when run on a computer, causes the computer to perform the steps performed by the foregoing execution device, or causes the computer to perform the steps performed by the foregoing training device.

An embodiment of the present application also provides a computer-readable storage medium that stores a program for signal processing. When the program is run on a computer, it causes the computer to perform the steps performed by the foregoing execution device, or causes the computer to perform the steps performed by the foregoing training device.

The execution device, training device, or terminal device provided in the embodiments of the present application may specifically be a chip. The chip includes a processing unit and a communication unit; the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit can execute computer-executable instructions stored in a storage unit, so that the chip in the execution device performs the data processing method described in the foregoing embodiments, or so that the chip in the training device performs the data processing method described in the foregoing embodiments. Optionally, the storage unit is a storage unit inside the chip, such as a register or a cache. The storage unit may also be a storage unit located outside the chip within the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).

Specifically, referring to FIG. 15, FIG. 15 is a schematic structural diagram of a chip provided by an embodiment of the present application. The chip may be embodied as a neural network processing unit NPU 1500. The NPU 1500 is mounted on a host CPU as a coprocessor, and the host CPU assigns tasks to it. The core part of the NPU is the arithmetic circuit 1503; the controller 1504 controls the arithmetic circuit 1503 to fetch matrix data from memory and perform multiplication.

In some implementations, the arithmetic circuit 1503 internally includes multiple processing engines (PEs). In some implementations, the arithmetic circuit 1503 is a two-dimensional systolic array. The arithmetic circuit 1503 may also be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 1503 is a general-purpose matrix processor.

For example, assume there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to matrix B from the weight memory 1502 and buffers it on each PE in the arithmetic circuit. The arithmetic circuit then fetches the data of matrix A from the input memory 1501, performs the matrix operation with matrix B, and stores the partial or final results of the resulting matrix in the accumulator 1508.
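The role of the accumulator in this example can be illustrated in software. The sketch below computes C = A × B by adding partial products into an accumulator, one tile of the shared dimension at a time. It is a purely functional analogy under assumed names and an assumed tile size; it does not model the PE array, the memories, or the timing of the NPU.

```python
def matmul_accumulate(a, b, tile=2):
    """Compute C = A x B by accumulating partial results tile by tile.

    a    : input matrix A as a list of rows (n x k)
    b    : weight matrix B as a list of rows (k x m)
    tile : number of inner-dimension elements handled per pass (hypothetical)
    """
    n, k, m = len(a), len(b), len(b[0])
    acc = [[0.0] * m for _ in range(n)]      # plays the role of the accumulator
    for k0 in range(0, k, tile):             # one pass per tile of the shared dimension
        k1 = min(k0 + tile, k)
        for i in range(n):
            for j in range(m):
                # Partial result for this tile is added to what is already stored.
                acc[i][j] += sum(a[i][kk] * b[kk][j] for kk in range(k0, k1))
    return acc
```

After the first pass the accumulator holds only partial results; it holds the final matrix C once every tile has been processed.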

The unified memory 1506 is used to store input data and output data. Weight data is transferred directly to the weight memory 1502 through the direct memory access controller (DMAC) 1505. Input data is also transferred to the unified memory 1506 through the DMAC.

The BIU, that is, the bus interface unit 1510, is used for interaction between the AXI bus and both the DMAC and the instruction fetch buffer (IFB) 1509.

The bus interface unit (BIU) 1510 is used by the instruction fetch buffer 1509 to obtain instructions from the external memory, and is also used by the direct memory access controller 1505 to obtain the original data of the input matrix A or the weight matrix B from the external memory.

The DMAC is mainly used to transfer input data from the external memory (DDR) to the unified memory 1506, to transfer weight data to the weight memory 1502, or to transfer input data to the input memory 1501.

The vector calculation unit 1507 includes multiple arithmetic processing units and, where necessary, further processes the output of the arithmetic circuit 1503, for example by vector multiplication, vector addition, exponential operations, logarithmic operations, and magnitude comparison. It is mainly used for computation in the non-convolutional/non-fully-connected layers of a neural network, such as batch normalization, pixel-level summation, and upsampling of feature planes.

In some implementations, the vector calculation unit 1507 can store the processed output vector in the unified memory 1506. For example, the vector calculation unit 1507 may apply a linear function or a nonlinear function to the output of the arithmetic circuit 1503, for example performing linear interpolation on the feature planes extracted by a convolutional layer, or applying the function to a vector of accumulated values to generate activation values. In some implementations, the vector calculation unit 1507 generates normalized values, pixel-level summed values, or both. In some implementations, the processed output vector can be used as an activation input to the arithmetic circuit 1503, for example for use in a subsequent layer of the neural network.
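The kinds of post-processing attributed to the vector calculation unit can be pictured with a few element-wise helpers. The following is only a software analogy with hypothetical names: ReLU stands in for a nonlinear function, and the normalization omits the learned scale and shift of batch normalization.

```python
import math


def relu(values):
    """Nonlinear function applied to accumulated values to produce activations."""
    return [max(0.0, v) for v in values]


def normalize(values, eps=1e-5):
    """Zero-mean, unit-variance normalization (no learned scale or shift)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def pixelwise_sum(a, b):
    """Pixel-level summation of two equally sized vectors."""
    return [x + y for x, y in zip(a, b)]
```

Each helper takes a vector and returns a vector of the same length, so its output could be fed to a subsequent layer as an activation input.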

The instruction fetch buffer 1509 connected to the controller 1504 is used to store the instructions used by the controller 1504.

The unified memory 1506, the input memory 1501, the weight memory 1502, and the instruction fetch buffer 1509 are all on-chip memories. The external memory is private to the NPU hardware architecture.

The processor mentioned in any of the foregoing may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the foregoing programs.

It should also be noted that the apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the apparatus embodiments provided in the present application, the connection relationship between modules indicates that they have a communication connection, which may be specifically implemented as one or more communication buses or signal lines.

From the description of the foregoing implementations, those skilled in the art can clearly understand that the present application may be implemented by software plus the necessary general-purpose hardware, and of course may also be implemented by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. In general, any function performed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structures used to implement the same function may also vary, for example analog circuits, digital circuits, or dedicated circuits. For the present application, however, a software implementation is the better implementation in most cases. Based on this understanding, the technical solutions of the present application, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a training device, a network device, or the like) to execute the methods described in the embodiments of the present application.

The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product.

The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center by wired means (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that can be stored by a computer, or a data storage device, such as a training device or a data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state disk (SSD)), or the like.

Claims

1. An image processing method, comprising:

acquiring an image to be processed;

processing the image to be processed through a first network to obtain a first feature, wherein the first network is configured to extract at least a feature for image enhancement;

processing the image to be processed through a second network to obtain a second feature, wherein the second network is configured to extract at least a semantic segmentation feature;

generating a third feature according to the first feature and the second feature;

obtaining a semantic segmentation result of the image to be processed;

generating a fourth feature according to the third feature and the semantic segmentation result of the image to be processed;

and performing image reconstruction on the fourth feature to obtain a target image.

2. The image processing method according to claim 1, wherein the obtaining of the semantic segmentation result of the image to be processed comprises:

processing the third feature through a third network to obtain the semantic segmentation result of the image to be processed.

3. The image processing method according to claim 1 or 2, wherein the generating a third feature from the first feature and the second feature comprises:

performing feature fusion processing on the first feature and the second feature to obtain a third feature;

and the generating a fourth feature according to the third feature and the semantic segmentation result of the image to be processed comprises:

performing feature fusion processing on the third feature and the semantic segmentation result of the image to be processed to obtain the fourth feature.

4. The image processing method according to claim 3, wherein the feature fusion process includes at least one of a summation process, a multiplication process, a concatenation process, and a concatenation convolution process.
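
As a non-limiting illustration, the four fusion options named in claim 4 can be sketched as follows. The sketch assumes each feature is a list of channels, each channel a flat list of values, and it interprets the concatenation convolution process as channel concatenation followed by a 1x1 convolution whose `weights` are hypothetical; the claim itself does not fix a kernel size.

```python
from typing import List

Channels = List[List[float]]  # C channels, each a flat list of spatial values


def fuse_sum(a: Channels, b: Channels) -> Channels:
    """Summation: element-wise add of matching channels."""
    return [[x + y for x, y in zip(ca, cb)] for ca, cb in zip(a, b)]


def fuse_mul(a: Channels, b: Channels) -> Channels:
    """Multiplication: element-wise product of matching channels."""
    return [[x * y for x, y in zip(ca, cb)] for ca, cb in zip(a, b)]


def fuse_concat(a: Channels, b: Channels) -> Channels:
    """Concatenation: stack the channels of b after the channels of a."""
    return a + b


def fuse_concat_conv(a: Channels, b: Channels,
                     weights: List[List[float]]) -> Channels:
    """Concatenate, then mix channels with a 1x1 convolution (assumed kernel).

    weights has one row per output channel and one column per stacked channel.
    """
    stacked = fuse_concat(a, b)
    n = len(stacked[0])
    return [[sum(w * ch[i] for w, ch in zip(row, stacked)) for i in range(n)]
            for row in weights]


a = [[1.0, 2.0]]  # one channel, two spatial positions
b = [[3.0, 4.0]]
print(fuse_sum(a, b))                        # [[4.0, 6.0]]
print(fuse_mul(a, b))                        # [[3.0, 8.0]]
print(fuse_concat(a, b))                     # [[1.0, 2.0], [3.0, 4.0]]
print(fuse_concat_conv(a, b, [[0.5, 0.5]]))  # [[2.0, 3.0]]
```

Summation and multiplication keep the channel count unchanged, plain concatenation doubles it, and the convolution after concatenation can bring it back to any desired width.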

5. The image processing method according to any one of claims 1 to 4, wherein before generating a fourth feature according to the third feature and a semantic segmentation result of the image to be processed, the method further comprises:

processing the third feature to obtain a fifth feature;

and the generating a fourth feature according to the third feature and the semantic segmentation result of the image to be processed comprises: generating the fourth feature according to the fifth feature and the semantic segmentation result of the image to be processed.

6. The image processing method according to any one of claims 1 to 5, wherein the processing the image to be processed through the second network to obtain a second feature comprises:

preprocessing the image to be processed to obtain a preprocessed feature;

performing downsampling on the preprocessed feature to obtain a downsampled feature;

processing the downsampled feature through the second network to obtain a sixth feature;

and performing upsampling on the sixth feature to obtain the second feature of the image to be processed.
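
As a non-limiting illustration, the semantic branch of claim 6 can be sketched as follows. The sketch assumes 2x2 average pooling for downsampling and nearest-neighbour repetition for upsampling, neither of which is fixed by the claim, and it takes the preprocessing step and the second network as hypothetical callables supplied by the caller.

```python
from typing import Callable, List

Feature = List[List[float]]  # H x W grid with even H and W


def downsample2x(feat: Feature) -> Feature:
    """Halve each dimension by averaging non-overlapping 2x2 blocks (assumed)."""
    h, w = len(feat), len(feat[0])
    return [[(feat[i][j] + feat[i][j + 1]
              + feat[i + 1][j] + feat[i + 1][j + 1]) / 4.0
             for j in range(0, w - 1, 2)]
            for i in range(0, h - 1, 2)]


def upsample2x(feat: Feature) -> Feature:
    """Double each dimension by repeating every value (nearest neighbour, assumed)."""
    out: Feature = []
    for row in feat:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def second_feature(image: Feature,
                   preprocess: Callable[[Feature], Feature],
                   second_net: Callable[[Feature], Feature]) -> Feature:
    pre = preprocess(image)    # preprocessed feature
    down = downsample2x(pre)   # downsampled feature
    sixth = second_net(down)   # sixth feature, computed at low resolution
    return upsample2x(sixth)   # second feature, back at input resolution


image = [[1.0, 3.0, 5.0, 7.0],
         [1.0, 3.0, 5.0, 7.0]]
second = second_feature(
    image,
    preprocess=lambda f: f,                                         # identity stand-in
    second_net=lambda f: [[10.0 * v for v in row] for row in f],    # toy stand-in
)
print(second)  # [[20.0, 20.0, 60.0, 60.0], [20.0, 20.0, 60.0, 60.0]]
```

Running the second network on the downsampled feature reduces the number of positions it has to process by a factor of four in this sketch, while the upsampling step restores a size that can be fused with the first feature.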

7. The image processing method according to any one of claims 1 to 6, wherein the method is used to implement at least one of the following image enhancement tasks: image super-resolution reconstruction, image denoising, image dehazing, image deblurring, image contrast enhancement, image demosaicing, image deraining, image color enhancement, image brightness enhancement, image detail enhancement, and image dynamic range enhancement.

8. A method of model training, comprising:

acquiring a training sample pair, wherein the training sample pair comprises a first image and a second image, and the quality of the first image is lower than that of the second image;

processing the first image through an image processing model to be trained to obtain a predicted image, wherein the image processing model to be trained is used for: acquiring an image to be processed; processing the first image through a first network to obtain a first feature, wherein the first network is configured to extract at least a feature for image enhancement; processing the first image through a second network to obtain a second feature, wherein the second network is configured to extract at least a semantic segmentation feature; generating a third feature according to the first feature and the second feature; obtaining a semantic segmentation result of the first image; generating a fourth feature according to the third feature and the semantic segmentation result of the first image; and performing image reconstruction on the fourth feature to obtain the predicted image;

obtaining a first loss according to the second image in the training sample pair and the predicted image, wherein the first loss describes the difference between the second image and the predicted image;

and updating the model parameters of the image processing model to be trained at least according to the first loss until a model training condition is met, so as to obtain the image processing model.
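
As a non-limiting illustration, the update loop of claim 8 can be sketched as follows. The sketch makes strong simplifying assumptions: the model to be trained is reduced to a single hypothetical `gain` parameter, the first loss is taken to be the mean squared error, the update is plain gradient descent, and the training condition is the loss falling below a tolerance or a step limit being reached. None of these choices is prescribed by the claim.

```python
from typing import List, Tuple

Pair = Tuple[List[float], List[float]]  # (first image, second image), flattened


def train_gain(pairs: List[Pair], lr: float = 0.1, tol: float = 1e-8,
               max_steps: int = 1000) -> Tuple[float, float]:
    """Fit predicted = gain * first_image to the higher-quality second image."""
    gain = 0.0
    loss = float("inf")
    n = sum(len(low) for low, _ in pairs)
    for _ in range(max_steps):
        # First loss: mean squared difference between prediction and second image.
        loss = sum((gain * x - y) ** 2
                   for low, high in pairs for x, y in zip(low, high)) / n
        if loss < tol:  # model training condition met
            break
        # Analytic gradient of the loss with respect to the single parameter.
        grad = sum(2.0 * (gain * x - y) * x
                   for low, high in pairs for x, y in zip(low, high)) / n
        gain -= lr * grad  # update the model parameter
    return gain, loss


# One training sample pair: the second image is twice as bright as the first.
pairs = [([1.0, 2.0], [2.0, 4.0])]
gain, loss = train_gain(pairs)
print(round(gain, 3))
```

A practical embodiment would replace the single parameter with the weights of the first, second, and reconstruction networks and would rely on automatic differentiation rather than a hand-derived gradient.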

9. The model training method of claim 8, wherein the image processing model to be trained is further configured to process the third feature through a third network to obtain a semantic segmentation prediction result of the first image.

10. The model training method of claim 9, wherein the image processing model to be trained is further configured to:

obtaining a semantic segmentation real result of the first image;

obtaining a second loss according to the semantic segmentation prediction result and the semantic segmentation real result, wherein the second loss is used for describing the difference between the semantic segmentation prediction result and the semantic segmentation real result;

and updating the model parameters of the image processing model to be trained at least according to the first loss and the second loss until model training conditions are met, so as to obtain the image processing model.
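
As a non-limiting illustration, the combination of the first loss and the second loss in claim 10 can be sketched as follows. The sketch assumes an L1 (mean absolute error) image loss, a per-pixel cross-entropy segmentation loss, and a hypothetical weighting factor `seg_weight`; the claim only requires that each loss describe the corresponding difference.

```python
import math
from typing import List


def l1_loss(pred: List[List[float]], target: List[List[float]]) -> float:
    """First loss: mean absolute difference between predicted and second image."""
    n = sum(len(row) for row in pred)
    return sum(abs(p - t)
               for rp, rt in zip(pred, target)
               for p, t in zip(rp, rt)) / n


def cross_entropy(probs: List[List[float]], labels: List[int]) -> float:
    """Second loss: mean negative log-probability of the true class per pixel.

    probs[i] holds the predicted class probabilities for pixel i,
    labels[i] holds the ground-truth class index for pixel i.
    """
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def total_loss(pred: List[List[float]], target: List[List[float]],
               seg_probs: List[List[float]], seg_labels: List[int],
               seg_weight: float = 1.0) -> float:
    """Weighted sum of both losses; seg_weight is a hypothetical hyperparameter."""
    return l1_loss(pred, target) + seg_weight * cross_entropy(seg_probs, seg_labels)


pred = [[0.5, 1.0]]                    # predicted image
target = [[1.0, 1.0]]                  # second (higher-quality) image
seg_probs = [[0.5, 0.5], [1.0, 0.0]]   # segmentation prediction, two classes
seg_labels = [0, 0]                    # segmentation ground truth
print(l1_loss(pred, target))           # 0.25
print(total_loss(pred, target, seg_probs, seg_labels))
```

Because both losses are computed from outputs that share the third feature, minimising their sum pushes the shared features to serve enhancement and segmentation at the same time.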

11. The model training method according to any one of claims 8 to 10, wherein the image processing model to be trained is further configured to:

12. The model training method of claim 11, wherein the feature fusion process comprises at least one of a summation process, a multiplication process, a concatenation process, and a concatenation convolution process.

13. The model training method according to any one of claims 8 to 12, wherein the image processing model to be trained is further configured to process the third feature to obtain a fifth feature, and to generate the fourth feature according to the fifth feature and the semantic segmentation result of the first image.

14. The model training method according to any one of claims 8 to 13, wherein the image processing model to be trained is further configured to: preprocess the first image to obtain a preprocessed feature; perform downsampling on the preprocessed feature to obtain a downsampled feature; process the downsampled feature through the second network to obtain a sixth feature; and perform upsampling on the sixth feature to obtain the second feature of the first image.

15. The model training method of any one of claims 8 to 14, wherein the image processing model is configured to perform at least one of the following image enhancement tasks: image super-resolution reconstruction, image denoising, image dehazing, image deblurring, image contrast enhancement, image demosaicing, image deraining, image color enhancement, image brightness enhancement, image detail enhancement, and image dynamic range enhancement.

16. An image processing apparatus, characterized in that the apparatus comprises a memory and a processor; the memory stores code, the processor is configured to execute the code, and when the code is executed, the image processing apparatus performs the method of any one of claims 1 to 15.

17. A computer storage medium, characterized in that the computer storage medium stores one or more instructions that, when executed by one or more computers, cause the one or more computers to implement the method of any of claims 1 to 15.