Disclosure of Invention
The application provides an image color mapping method, an image color mapping apparatus, a terminal device, and a storage medium, which can solve the problem of poor quality of optimized images in current image color modification tasks.
In a first aspect, an embodiment of the present application provides an image color mapping method, where the method includes: acquiring an image to be processed, inputting the image to be processed into a trained color mapping model for processing, and outputting an optimized image, wherein the color mapping model comprises a main network and a color condition network;
the color condition network comprises at least one color condition module and a feature conversion module which are sequentially connected, wherein the at least one color condition module is used for extracting global color feature information from a low-resolution image of an image to be processed, the feature conversion module is used for converting the global color feature information into N groups of adjusting parameters, the N groups of adjusting parameters are respectively used for adjusting N intermediate features extracted by the main network in the process of converting the image to be processed into an optimized image, and N is an integer greater than or equal to 1.
According to the image color mapping method, at least one color condition module in the color condition network performs global color feature extraction and compression on the low-resolution image of the input image to be processed; compared with methods based on local feature extraction, this prevents artificial artifacts from being introduced into the optimized image. The feature conversion module converts the global feature information into adjusting parameters that represent the color prior information of the image to be processed, and the adjusting parameters are used to adjust the extracted intermediate features in the main network, so that the corresponding optimized image is generated adaptively according to the color prior information of each image to be processed and the quality of the optimized image is improved.
Optionally, the color condition module comprises a convolutional layer, a pooling layer, a first activation function, and an IN layer connected in sequence.
Optionally, the feature conversion module comprises a Dropout layer, a convolutional layer, a pooling layer, and N fully-connected layers; the Dropout layer, the convolutional layer, and the pooling layer are connected in sequence and are used for processing the global color feature information to obtain a condition vector; and the N fully-connected layers are respectively used for performing feature conversion on the condition vector to obtain the N groups of adjusting parameters.
Optionally, the main network includes N GFM layers, to which the N groups of adjusting parameters are respectively input, and each GFM layer is configured to adjust the intermediate feature input to it according to its adjusting parameters.
Optionally, the main network further includes N convolutional layers and N-1 second activation functions, the N GFM layers are respectively connected to the outputs of the N convolutional layers, and the convolution kernels of the convolutional layers have a size of 1 × 1.
Based on this optional mode, setting the convolution kernel size in the network to 1 × 1 effectively reduces the number of network parameters and thus the computational complexity of the network.
Optionally, the image to be processed is a video frame acquired from an SDR video; after each frame of the SDR video is optimized by the color mapping model, a corresponding optimized image is output, and the optimized frames are combined to obtain an HDR video corresponding to the SDR video.
In a second aspect, an embodiment of the present application provides an image color mapping apparatus, including:
the acquisition unit is used for acquiring an image to be processed;
the system comprises a processing unit, a color condition network and a characteristic conversion module, wherein the processing unit is used for inputting an image to be processed into a trained color mapping model for optimization processing and outputting an optimized image, the color mapping model comprises a main network and a color condition network, the color condition network comprises a plurality of color condition modules and a characteristic conversion module which are sequentially connected, the plurality of color condition modules are used for extracting global color characteristic information from a low-resolution image of the image to be processed, and the characteristic conversion module is used for converting the global color characteristic information into N groups of adjusting parameters; the N groups of adjusting parameters are respectively used for adjusting N intermediate features extracted by the main network in the process of converting the image to be processed into the optimized image, and N is an integer greater than or equal to 1.
Optionally, the color condition module comprises a convolutional layer, a pooling layer, a first activation function, and an IN layer connected in sequence.
In a third aspect, an embodiment of the present application provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the method according to any one of the first aspect is implemented.
In a fourth aspect, the present application provides a computer-readable storage medium, in which a computer program is stored, and when executed by a processor, the computer program implements the method according to any one of the above first aspects.
In a fifth aspect, the present application provides a computer program product, which when run on a terminal device, causes the terminal device to execute the method of any one of the above first aspects.
It is to be understood that, for the beneficial effects of the second to fifth aspects, reference may be made to the relevant description of the beneficial effects of the first aspect and its possible embodiments, and details are not repeated here.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
In order to improve the quality of optimized images in image color modification tasks, the embodiments of the application provide an image color mapping method, an image color mapping apparatus, a terminal device, and a storage medium. The color mapping model provided by the application optimizes the image to be processed and outputs an optimized image with higher contrast and richer colors. The color mapping model comprises a main network and a color condition network, where the color condition network is used for extracting adjusting parameters from a low-resolution image of the image to be processed; these adjusting parameters adjust the intermediate features generated in the process of converting the image to be processed into the optimized image, so that the color mapping between the image to be processed and the optimized image is adaptively adjusted according to the characteristics of different images to be processed, artifacts in the optimized image are avoided, and the quality of the optimized image is improved.
The technical solution of the present application is described in detail below with reference to the accompanying drawings. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
An exemplary description of the image color mapping method provided by the present application is provided with reference to fig. 1. The color mapping model may be deployed in an image processing device. The image processing device may be a mobile terminal such as a smart phone, a tablet computer, a camera, or the like, or may be a device capable of processing image data such as a desktop computer, a robot, a server, or the like.
In one possible implementation, the color mapping model provided herein includes a main network and a color condition network. The color condition network comprises at least one Color Condition Block (CCB) and a feature conversion module which are connected in sequence. The at least one color condition module is used for extracting global color feature information from a low-resolution image of the image to be processed. The feature conversion module is used for converting the global color feature information into N groups of adjusting parameters. The N groups of adjusting parameters are respectively used for adjusting N intermediate features extracted by the main network in the process of converting the image to be processed into the optimized image, and N is an integer greater than or equal to 1.
For example, the image to be processed may be downsampled by a certain factor (for example, by a factor of 4) to obtain a corresponding low-resolution image. Assuming the image to be processed is downsampled by a factor of 4, the low-resolution image covers the same scene as the image to be processed, but the number of pixels per unit area in the image to be processed is 4 times that in the low-resolution image.
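As an illustration, the low-resolution copy can be produced with a standard resize operation. The sketch below is one possible reading, assuming the downsampling factor applies to each spatial dimension and that bilinear interpolation is used; neither choice is fixed by the application, and the helper name make_low_res is ours.

```python
import torch
import torch.nn.functional as F

def make_low_res(image: torch.Tensor, factor: int = 4) -> torch.Tensor:
    """Downsample an image tensor of shape (B, C, H, W) by `factor`
    along each spatial dimension using bilinear interpolation."""
    return F.interpolate(image, scale_factor=1.0 / factor,
                         mode="bilinear", align_corners=False)

# Example: a 1920x1080 frame shrinks to 480x270 with factor 4.
frame = torch.rand(1, 3, 1080, 1920)
low_res = make_low_res(frame)  # shape (1, 3, 270, 480)
```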
In the color mapping model, at least one color condition module performs global color feature extraction and compression on the low-resolution image of the input image to be processed; compared with methods based on local feature extraction, this prevents artificial artifacts from being introduced into the optimized image. The feature conversion module converts the global feature information into adjusting parameters that represent the color prior information of the image to be processed, and these adjusting parameters adjust the intermediate features of the image to be processed extracted in the main network, so that the corresponding optimized image is generated adaptively according to the color prior information of each image to be processed and the quality of the optimized image is improved.
In one embodiment, as shown in fig. 1, the color condition module includes a convolutional layer, a pooling layer, a first activation function, and an IN (Instance Normalization) layer, which are connected in sequence. The color condition module extracts global features of the input low-resolution image; compared with methods based on local feature extraction, it effectively represents the global feature information of the image to be processed, thereby preventing artificial artifacts from being introduced into the optimized image.
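A minimal PyTorch sketch of one such color condition block follows; the application only fixes the layer order, so the channel widths, pool stride, and LeakyReLU slope here are illustrative assumptions.

```python
import torch.nn as nn

class ColorConditionBlock(nn.Module):
    """One CCB: Conv -> AvgPool -> LeakyReLU -> InstanceNorm, in the order
    described above. Hyperparameters are illustrative, not from the source."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=1),  # 1x1 convolutional layer
            nn.AvgPool2d(kernel_size=2),              # pooling layer (average)
            nn.LeakyReLU(0.1, inplace=True),          # first activation function
            nn.InstanceNorm2d(out_ch),                # IN layer
        )

    def forward(self, x):
        return self.block(x)
```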
The feature conversion module comprises a Dropout layer, a convolutional layer, a pooling layer, and N fully-connected layers. The Dropout layer, the convolutional layer, and the pooling layer are connected in sequence and process the global color feature information extracted by the at least one color condition module to obtain a condition vector. The N fully-connected layers each perform feature conversion on the condition vector to obtain the N groups of adjusting parameters. It should be noted that each fully-connected layer processes the condition vector to obtain one group of adjusting parameters, so the number of fully-connected layers matches the number of groups of adjusting parameters.
Illustratively, the color mapping model shown in fig. 1 includes 4 color condition modules connected in sequence. In the color condition modules and the feature conversion module, the convolution kernels of the convolutional layers are all 1 × 1, and the pooling layers all use average pooling. The first activation function is the nonlinear activation function LeakyReLU.
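Putting these pieces together, below is a hedged sketch of the whole color condition network, reusing the ColorConditionBlock sketch above; the hidden width, dropout rate, condition vector size, and per-head channel counts are our assumptions, chosen to match the main-network sketch given later.

```python
class ColorConditionNetwork(nn.Module):
    """4 CCBs followed by the feature conversion module:
    Dropout -> 1x1 Conv -> average pooling -> N fully-connected heads."""
    def __init__(self, out_chs=(32, 32, 3), ch: int = 32, vec: int = 128):
        super().__init__()
        self.ccbs = nn.Sequential(
            ColorConditionBlock(3, ch),
            ColorConditionBlock(ch, ch),
            ColorConditionBlock(ch, ch),
            ColorConditionBlock(ch, ch),
        )
        self.to_vector = nn.Sequential(
            nn.Dropout(p=0.5),                  # dropout rate is an assumption
            nn.Conv2d(ch, vec, kernel_size=1),
            nn.AdaptiveAvgPool2d(1),            # average pooling
            nn.Flatten(),                       # condition vector, shape (B, vec)
        )
        # One fully-connected head per GFM layer; head i emits the pair
        # (gamma, beta) sized to the i-th intermediate feature's channels.
        self.heads = nn.ModuleList(nn.Linear(vec, 2 * c) for c in out_chs)

    def forward(self, low_res):
        v = self.to_vector(self.ccbs(low_res))
        # N groups of adjusting parameters, each a (gamma, beta) pair
        return [head(v).chunk(2, dim=1) for head in self.heads]
```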
In the embodiment of the present application, the main network includes N Global Feature Modulation (GFM) layers, and the N groups of adjusting parameters are respectively input to the N GFM layers. Each GFM layer adjusts the intermediate feature input to it according to its adjusting parameters.
The main network may be any neural network model that can perform a color optimization or color conversion task. By inserting N GFM layers into the main network, the color condition network provided by the present application can be connected to the main network, resulting in the color mapping model provided by the present application.
In one example, the main network may be a fully convolutional network; that is, the main network comprises N convolutional layers and N-1 second activation functions, and the N GFM layers are respectively connected to the outputs of the N convolutional layers. The main network is used to convert the image to be processed into the optimized image, and during this conversion the N convolutional layers extract N intermediate features. The convolution kernel size in each convolutional layer is 1 × 1. The second activation function may be the nonlinear activation function ReLU.
In the color mapping model provided by the embodiment of the application, the convolution kernels of the convolutional layers are all 1 × 1, so the network model has few parameters; this effectively reduces computational complexity, improves computational efficiency, and thus improves the real-time performance of the algorithm.
It should be noted that the number of fully-connected layers in the color condition network, and hence the number of groups of adjusting parameters generated, should be designed based on the number of convolutional layers in the main network. For example, if the main network includes N convolutional layers, the N intermediate features generated by the N convolutional layers need to be adjusted; therefore, the color condition network needs to output N groups of adjusting parameters corresponding to the N intermediate features, and the N GFM layers in the main network adjust the N intermediate features according to the N groups of adjusting parameters.
Illustratively, as shown in fig. 1, assuming that N is 3, the main network includes 3 convolution (Conv) layers, 3 GFM layers, and 2 second activation function (ReLU) layers. Specifically, from input to output, the main network comprises in order: a convolutional layer, a GFM layer, a ReLU layer, a convolutional layer, a GFM layer, a ReLU layer, a convolutional layer, and a GFM layer. Correspondingly, the color condition network comprises 4 sequentially connected CCBs, and the feature conversion module comprises a Dropout layer, a convolution (Conv) layer, and an average pooling (AvgPool) layer connected in sequence, followed by 3 fully-connected (FC) layers that are each connected to the condition vector output by the average pooling layer. Each fully-connected layer converts the condition vector into a corresponding group of adjusting parameters (γ, β), so the color condition network outputs 3 groups of adjusting parameters in total (i.e., adjusting parameter 1, adjusting parameter 2, and adjusting parameter 3). Each GFM layer in the main network adjusts the intermediate feature input to it according to the corresponding adjusting parameters, which can be expressed as formula (1):
GFM(x_i) = γ * x_i + β    (1)

In formula (1), x_i denotes the i-th intermediate feature input to the GFM layer, and GFM(x_i) denotes the result of adjusting the intermediate feature x_i according to the adjusting parameters (γ, β).
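Below is a sketch of formula (1) and of the N = 3 main network just described; the hidden width of 32 is an assumption, and the (gamma, beta) shapes match the ColorConditionNetwork sketch given earlier.

```python
import torch.nn as nn

class GFM(nn.Module):
    """Global Feature Modulation, formula (1): GFM(x_i) = gamma * x_i + beta,
    with gamma and beta broadcast over the spatial dimensions."""
    def forward(self, x, gamma, beta):
        return gamma[:, :, None, None] * x + beta[:, :, None, None]

class MainNetwork(nn.Module):
    """Conv -> GFM -> ReLU -> Conv -> GFM -> ReLU -> Conv -> GFM,
    all convolutions 1x1, matching fig. 1 with N = 3."""
    def __init__(self, ch: int = 32):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv2d(3, ch, kernel_size=1),
            nn.Conv2d(ch, ch, kernel_size=1),
            nn.Conv2d(ch, 3, kernel_size=1),   # last layer maps back to RGB
        ])
        self.gfm = GFM()
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x, params):
        # params: N (gamma, beta) pairs from the color condition network
        for i, conv in enumerate(self.convs):
            gamma, beta = params[i]
            x = self.gfm(conv(x), gamma, beta)  # adjust the i-th intermediate feature
            if i < len(self.convs) - 1:         # N - 1 second activation functions
                x = self.relu(x)
        return x
```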
It will be appreciated that different color mapping relationships exist between images to be processed that contain different scenes and their optimized images. The color mapping model extracts the color feature information of the image to be processed as prior information through the color condition network and uses it to adjust the intermediate features in the main network, so the color mapping model can adaptively output the optimized image corresponding to each image to be processed based on its color prior feature information, avoiding artificial artifacts in the optimized image and improving its quality.
The color mapping model provided by the application is general-purpose and can be applied to any task that requires color optimization or color conversion of an image to be processed, such as image editing, image retouching and toning, image colorization, and SDR (Standard Dynamic Range) video to HDR (High Dynamic Range) video conversion.
Taking the conversion of SDR video into HDR video as an example: because of the limitations of shooting equipment, existing HDR video resources are scarce, and a large amount of existing SDR video needs to be converted into HDR video to meet users' needs. Fig. 2 is a schematic diagram of the HDR and SDR color gamut ranges, where BT.709 and BT.2020 are television parameter standards published by the ITU (International Telecommunication Union), and DCI-P3 is a color gamut standard established by Digital Cinema Initiatives (DCI), an American film industry body. As can be seen from fig. 2, among BT.709, DCI-P3, and BT.2020, the BT.2020 color gamut is the largest, DCI-P3 is the second largest, and BT.709 is the smallest. Currently, SDR video uses the BT.709 color gamut, while HDR video uses the broader BT.2020 or DCI-P3 color gamut. For the same video content, HDR video can exhibit higher contrast and richer colors than SDR video, whether the HDR video employs the BT.2020 color gamut or the DCI-P3 color gamut.
In the prior art, most common methods for converting SDR video into HDR video use image coding techniques to convert SDR data into HDR data so that it can be played on HDR terminal devices; others convert low-resolution SDR video content into high-resolution HDR video content conforming to the HDR video standard through super-resolution conversion methods. These existing conversion methods have high computational cost, and the converted HDR video exhibits artificial artifacts and color deviations that degrade video quality. Compared with the prior art, the color mapping model provided by the application can adaptively convert different SDR videos into corresponding HDR videos according to the color mapping relationship between SDR and HDR in different scenes, and can reduce the artificial artifacts and color deviations present in the converted HDR video.
It can be understood that for different tasks, the initial color mapping model can be trained by designing the corresponding training set and the loss function, so as to obtain the color mapping models suitable for different tasks. Taking the task of converting SDR video into HDR video as an example, the following describes an exemplary training process and application of the color mapping model provided by the present application.
Step one, a training set is obtained.
For the SDR video to HDR video task, the training set may include a plurality of SDR video frame samples and HDR video frame samples that correspond one-to-one with the plurality of SDR video frame samples.
Specifically, an SDR video sample and its corresponding HDR video sample are first obtained. Illustratively, the SDR video sample and the corresponding HDR video sample may be obtained from a public video website; or videos in the same RAW data format may be processed into SDR and HDR versions respectively to obtain the SDR video sample and its corresponding HDR video sample; or the SDR video sample and the HDR video sample may be shot in the same scene with an SDR camera and an HDR camera respectively. After the SDR video sample and the corresponding HDR video sample are obtained, frame extraction is performed on each of them to obtain a plurality of SDR video frame samples and the HDR video frame samples corresponding to them one by one.
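A rough sketch of the frame-extraction step using OpenCV follows; the sampling step and file names are our assumptions (the application does not state how frames are sampled), and decoding 10-bit HDR sources may require a different toolchain than shown here.

```python
import cv2

def extract_frames(video_path: str, step: int = 30):
    """Yield every `step`-th frame of a video as an RGB array."""
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame_bgr = cap.read()
        if not ok:
            break
        if idx % step == 0:
            yield cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
        idx += 1
    cap.release()

# Paired samples: frames taken at the same indices from both videos.
pairs = list(zip(extract_frames("sample_sdr.mp4"),
                 extract_frames("sample_hdr.mp4")))
```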
And step two, training the initial color mapping model by utilizing the training set and a preset loss function to obtain a trained color mapping model.
After the initial color mapping model is built, the SDR video frame samples are input into the main network of the initial color mapping model. The SDR video frame samples are also downsampled to obtain low-resolution images, which are input into the color condition network of the initial color mapping model to obtain the adjusting parameters used to adjust the intermediate features from which the initial color mapping model predicts HDR video frames.
The preset loss function f describes the L2 loss between the HDR video frame Ĥ predicted by the initial color mapping model and the HDR video frame sample H, which can be expressed as formula (2):

f = ||Ĥ - H||₂²    (2)
based on the training set and the preset loss function, iterative training can be carried out on the initial color mapping model through a gradient descent method until the model converges, and the trained color mapping model can be obtained.
Fig. 3 is a schematic flowchart of a method for converting an SDR video into an HDR video according to an embodiment of the present disclosure. As can be seen from fig. 3, an SDR video can be converted into an HDR video with higher contrast and richer colors based on the trained color mapping model. First, frame extraction is performed on the obtained SDR video to be processed; each video frame obtained from the SDR video is the image to be processed that is input into the color mapping model shown in fig. 1.
Each video frame of the SDR video is input into the main network of the trained color mapping model, while a low-resolution image obtained by downsampling the video frame by a factor of 4 is input into the color condition network of the trained color mapping model to obtain the adjusting parameters. The GFM layers in the main network adjust the intermediate features input to them according to the corresponding adjusting parameters, and the main network finally outputs the optimized image corresponding to the video frame. The optimized images corresponding to all video frames of the SDR video are then combined to obtain the HDR video corresponding to the SDR video.
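As a sketch of this per-frame conversion, reusing the MainNetwork, ColorConditionNetwork, and make_low_res helpers assumed above (re-encoding the optimized frames into a video container is left to an external tool and not shown):

```python
import torch

@torch.no_grad()
def convert_sdr_video(main_net, cond_net, frames):
    """Map an iterable of SDR frame tensors of shape (1, 3, H, W)
    to optimized (HDR) frames with the trained color mapping model."""
    main_net.eval()
    cond_net.eval()
    hdr_frames = []
    for frame in frames:
        low_res = make_low_res(frame, factor=4)  # 4x downsampling
        params = cond_net(low_res)               # N groups of adjusting parameters
        hdr_frames.append(main_net(frame, params))
    return hdr_frames
```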
It should be noted that the color mapping model provided by the present application may be directly added to the post-processing process of the terminal device such as the camera, so as to improve the quality of the image or video shot by the terminal device such as the camera from the perspective of software. The color mapping model provided by the application can also be used as an image/video later-stage color enhancement means to perform color optimization on the existing SDR or other image data.
An embodiment of the present application further provides an image color mapping apparatus, where the embodiment of the apparatus corresponds to the embodiment of the image color mapping method, and for convenience of reading, details in the foregoing method embodiment are not repeated one by one in the embodiment of the apparatus, but it should be clear that the apparatus in the embodiment can correspondingly implement all the contents in the foregoing method embodiment.
Fig. 4 is a schematic structural diagram of an image color mapping apparatus provided in an embodiment of the present application, and as shown in fig. 4, an image color mapping apparatus 100 provided in this embodiment includes an obtaining unit 101 and a processing unit 102.
Specifically, the acquiring unit 101 is configured to acquire an image to be processed. The processing unit 102 is configured to input the image to be processed into the trained color mapping model for optimization processing, and output an optimized image. The color mapping model comprises a main network and a color condition network, the color condition network comprises a plurality of color condition modules and a feature conversion module which are sequentially connected, the plurality of color condition modules are used for extracting global color feature information from a low-resolution image of an image to be processed, and the feature conversion module is used for converting the global color feature information into N groups of adjusting parameters. The N groups of adjusting parameters are respectively used for adjusting N intermediate features extracted by the main network in the process of converting the image to be processed into the optimized image, and N is an integer greater than or equal to 1.
Optionally, the color condition module comprises a convolutional layer, a pooling layer, a first activation function, and an IN layer connected in sequence.
Optionally, the feature conversion module comprises a Dropout layer, a convolutional layer, a pooling layer, and N fully-connected layers. The Dropout layer, the convolutional layer, and the pooling layer are connected in sequence and are used for processing the global color feature information to obtain a condition vector. The N fully-connected layers are respectively used for performing feature conversion on the condition vector to obtain the N groups of adjusting parameters.
Optionally, the main network includes N GFM layers, to which the N groups of adjusting parameters are respectively input, and each GFM layer is configured to adjust the intermediate feature input to it according to its adjusting parameters.
Optionally, the main network further includes N convolutional layers and N-1 second activation functions, the N GFM layers are respectively connected to the outputs of the N convolutional layers, and the convolution kernels of the convolutional layers have a size of 1 × 1.
Optionally, the image to be processed is a video frame acquired from an SDR video; after each frame of the SDR video is optimized by the color mapping model, a corresponding optimized image is output, and the optimized frames are combined to obtain an HDR video corresponding to the SDR video.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Based on the same inventive concept, the embodiment of the application also provides the terminal equipment. As shown in fig. 5, the terminal device 200 of this embodiment includes: a processor 201, a memory 202, and a computer program 204 stored in the memory 202 and executable on the processor 201. The computer program 204 may be executed by the processor 201 to generate the instructions 203, and the processor 201 may implement the steps in the above-described embodiments of the image color mapping method according to the instructions 203. Alternatively, the processor 201, when executing the computer program 204, implements the functions of the modules/units in the above-described device embodiments, such as the functions of the unit 101 and the unit 102 shown in fig. 4.
Illustratively, the computer program 204 may be partitioned into one or more modules/units, which are stored in the memory 202 and executed by the processor 201 to accomplish the present application. One or more of the modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 204 in the terminal device 200.
Those skilled in the art will appreciate that fig. 5 is merely an example of the terminal device 200 and does not constitute a limitation of the terminal device 200, which may include more or fewer components than those shown, combine certain components, or use different components; for example, the terminal device 200 may also include input/output devices, network access devices, buses, and the like.
The processor 201 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 202 may be an internal storage unit of the terminal device 200, such as a hard disk or memory of the terminal device 200. The memory 202 may also be an external storage device of the terminal device 200, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the terminal device 200. Further, the memory 202 may include both an internal storage unit and an external storage device of the terminal device 200. The memory 202 is used to store the computer program and other programs and data required by the terminal device 200, and may also be used to temporarily store data that has been output or is to be output.
The terminal device provided in this embodiment may execute the method embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
Embodiments of the present application further provide a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the computer program implements the method described in the above method embodiments.
The embodiment of the present application further provides a computer program product, which when running on a terminal device, enables the terminal device to implement the method described in the above method embodiment when executed.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the processes in the methods of the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable storage medium may include at least: any entity or device capable of carrying the computer program code to a photographing apparatus/terminal apparatus, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium, for example a USB flash disk, a removable hard disk, a magnetic disk, or an optical disk. In certain jurisdictions, in accordance with legislation and patent practice, computer-readable media may not include electrical carrier signals or telecommunication signals.
Reference throughout this application to "one embodiment" or "some embodiments," etc., means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
In the description of the present application, it is to be understood that the terms "first", "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature.
In addition, in the present application, unless otherwise explicitly specified or limited, the term "connected" and similar terms are to be construed broadly: a connection may be mechanical or electrical, may be direct or indirect through an intermediate medium, and may be internal communication between two elements or interaction between two elements. Unless otherwise specifically defined, the specific meanings of these terms in the present application can be understood by those skilled in the art according to the specific situation.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.