CN119624782B - Satellite video super-resolution reconstruction method and system for multiple degradation process - Google Patents
Satellite video super-resolution reconstruction method and system for multiple degradation process
- Publication number
- CN119624782B CN119624782B CN202510161817.5A CN202510161817A CN119624782B CN 119624782 B CN119624782 B CN 119624782B CN 202510161817 A CN202510161817 A CN 202510161817A CN 119624782 B CN119624782 B CN 119624782B
- Authority
- CN
- China
- Prior art keywords
- module
- feature
- resolution
- frame
- satellite video
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4053—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4046—Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/62—Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/593—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Signal Processing (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Biodiversity & Conservation Biology (AREA)
- Image Processing (AREA)
Abstract
The invention provides a satellite video super-resolution reconstruction method and system for multiple degradation processes. The overall flow of the method is as follows: a satellite video random degradation module generates corresponding low-resolution video data; a feature extraction module extracts features from the low-resolution video data; the extracted features of every frame in the video sequence are propagated with a high-order grid bidirectional propagation scheme; a feature alignment module aligns the features of the video data; finally, the aligned features are pixel-shuffled and added to the up-sampled low-resolution features to obtain the reconstructed high-resolution image. Experiments on three satellite video datasets show that the method effectively restores satellite video with high fidelity, significantly improves the reconstruction quality of remote sensing imagery, and also provides a certain denoising capability.
Description
Technical Field
The invention belongs to the field of high-resolution optical satellite video image processing, and particularly relates to a satellite video super-resolution reconstruction method and system for a multiple degradation process.
Background
With the rapid development of remote sensing technology, users can acquire earth observation data with high spatial and temporal resolution more easily, and traditional static earth observation data no longer meets their needs. Video satellites are a new type of earth observation satellite developed in recent years; compared with conventional earth observation satellites, they can "stare" at a specific target for a long period and capture time-series images of that target at a fixed frame rate. Compared with traditional static remote sensing images, satellite video data not only has sub-meter spatial resolution but also video-level temporal resolution; it can provide continuous information about specific targets, enables highly dynamic spatio-temporal earth observation, and is widely used in dynamic scene applications such as change detection, target tracking and traffic monitoring. Research on processing video satellite data is therefore of great significance for land monitoring, disaster relief, national defense security, smart cities and so on. However, the spatial resolution of satellite video is affected by complex environmental factors during imaging and data transmission, such as atmospheric scattering, sensor tremor and data compression; high-frequency information in the satellite video may be lost and blurring may occur, which greatly degrades the performance of downstream applications. Improving the spatial resolution of satellite video is therefore important both for human perception and for downstream tasks. Compared with improving resolution through hardware, super-resolution reconstruction can greatly reduce the maintenance, transmission and storage costs of satellites.
Super-resolution reconstruction is a classical low-level task in computer vision that reconstructs high-resolution images from low-resolution images to improve image quality. Video super-resolution reconstruction must raise the resolution of every frame in the video while also keeping the frames consistent with one another. In a broad sense, video super-resolution reconstruction can be regarded as an extension of image super-resolution reconstruction, so a video could be processed frame by frame with a single-image super-resolution algorithm. In practice, however, the results of processing video with image super-resolution algorithms are unsatisfactory, because temporal information is ignored, which causes artifacts and lag. Video carries more information than a single image, since it has an additional temporal dimension, so designing a super-resolution reconstruction algorithm for video is more challenging. To make better use of this temporal information, researchers usually introduce frame alignment to eliminate the effects of object or background motion in the video. Compared with natural-scene video, satellite video has three particular characteristics: first, its spatial resolution is lower and texture information is scarce; second, its field of view is larger, its scenes are more varied and its information density is higher; third, moving objects in satellite video vary in scale and exhibit more complex motion. Therefore, although deep-learning-based video super-resolution reconstruction has advanced greatly in recent years, these general methods are not directly applicable to satellite video, and more effort is needed to develop methods suited to satellite video super-resolution reconstruction.
Existing methods for obtaining low-resolution data mainly fall into three categories. 1) Synthesize the low-resolution image from the corresponding high-resolution image based on a degradation assumption such as bicubic downsampling. However, when the test low-resolution image does not satisfy that assumption, the performance of the trained super-resolution model may drop dramatically. 2) Build low-resolution/high-resolution image pairs from real remote sensing images, where the low- and high-resolution images are collected over the same location. However, this approach suffers from problems such as land-cover change and spectral gaps between the two images. To solve the above problems, a third category of methods trains super-resolution models with unpaired low-resolution and high-resolution images. These methods usually employ a degrader and a generator: the low-resolution image is first processed by the generator, and the generator can then be trained in a self-supervised manner. Unpaired low- and high-resolution images can thus be used, which makes training data easier to collect. However, training of these methods is often unstable because of the domain gap between low- and high-resolution images and the lack of intermediate supervision. For applying remote sensing super-resolution reconstruction to the real remote sensing world, recent studies have explored blind super-resolution that considers the various degradations present in actual scenes, synthesizing training low-resolution/high-resolution image pairs by introducing anisotropic Gaussian blur kernels and additive white Gaussian noise (AWGN). However, a degradation model based on the anisotropic Gaussian blur assumption is too simple for real scenes, which limits its application.
Furthermore, aligning the images or features of adjacent frames is a key and difficult problem in video super-resolution algorithms. Because of the motion of ground objects such as vehicles, airplanes and ships, and the viewpoint changes caused by satellite motion, direct fusion introduces errors and degrades the result. Most current methods therefore introduce an alignment operation, which helps to accurately locate the missing information in adjacent frames. There are two types of alignment: image alignment and feature alignment. Image alignment usually relies on optical flow, which works well for small-scale motion (such as the movement of ground objects) but not for large-scale changes (such as background motion). Feature alignment usually uses deformable convolution networks and achieves good performance, but training deformable convolution networks is unstable. Neither optical flow nor deformable convolution can be applied to satellite video directly, because satellite video contains both moving ground objects and a moving background, and the motion characteristics are less pronounced. Even when satellite video super-resolution reconstruction introduces an alignment operation, alignment errors cannot be completely eliminated; moreover, because moving objects occlude different regions at different times, information in the relevant regions is lost, so directly fusing multi-frame information easily produces poor results and may even fuse wrong information, degrading the final effect.
Disclosure of Invention
In view of this, the invention provides a satellite video super-resolution reconstruction method and system for multiple degradation processes. First, a degradation model of satellite video data is constructed; the degradation model considers both blur kernels estimated from real remote sensing images and blur kernels generated from predefined distributions, which ensures that the low-resolution data is close to real scenes without its diversity depending entirely on the diversity of an external dataset. Second, features extracted from all video frames are propagated with a bidirectional recurrent high-order grid feature propagation network, so that the reconstruction of each frame can make use of information from all frames. Finally, a feature alignment module based on spatio-temporal fusion is proposed: it uses a spatio-temporal attention mechanism to fuse the spatio-temporal feature information between frames, which improves the utilization of key information and reduces the influence of erroneous information, and it uses deformable convolution to align the spatio-temporally fused features, which reduces the error accumulation that long image sequences produce over time.
In order to achieve the above aim and technical effects, the technical scheme adopted by the invention is a satellite video super-resolution reconstruction method for multiple degradation processes, comprising the following steps:
Step S1: acquire a satellite video data set;
Step S2: construct a satellite video super-resolution reconstruction model for multiple degradation processes, comprising a satellite video random degradation module, a feature extraction module, a feature alignment module and a pixel reorganization module;
Step S3: pass the satellite video data in the satellite video data set through the satellite video random degradation module to obtain low-resolution frame images, and perform data enhancement to obtain low-resolution frame data;
Step S4: pass the low-resolution frame data obtained in step S3 through the feature extraction module to obtain frame image feature information;
Step S5: input the frame image feature information obtained in step S4 into the feature alignment module, and propagate, aggregate and align the input features to obtain aligned features;
Step S6: pass the aligned features obtained in step S5 through the pixel reorganization module to obtain residual features of the high-resolution image;
Step S7: up-sample the low-resolution frame images obtained in step S3 and add them to the residual features of the high-resolution image obtained in step S6 to obtain the final high-resolution reconstructed image.
Further, the satellite video random degradation module in step S3 comprises a downsampling module, four random degradation modules and a video compression module. The downsampling module downsamples the image with one of nearest-neighbor interpolation, bilinear interpolation or bicubic interpolation; each of the four random degradation modules degrades the image with one of random blur, motion blur, random noise or sensor tremor; and the video compression module performs intra-frame compression or inter-frame compression, wherein intra-frame compression includes JPEG compression, JPEG 2000 compression, PNG compression, WEBP compression, BMP compression, TIFF compression, BPG compression and FLIF compression, and inter-frame compression includes H.263, H.264, H.265, MJPEG, MPEG-2, MPEG-4, VP8, VP9 and AV1.
Further, the feature extraction module in step S4 includes K residual blocks, where each residual block includes two convolution layers and an activation layer, and the residual blocks are connected by skip connections.
Further, the feature alignment module in step S5 performs information propagation and aggregation on the frame image feature information obtained in step S4, propagating it with a high-order grid propagation scheme. The process is divided into three stages: in the first stage, features are propagated forward in order of increasing time; in the second stage, features are propagated backward in order of decreasing time; in the third stage, the target-frame information is connected with the feature information of the two preceding and two following frames. The feature alignment module comprises a residual module, a deformable feature alignment module based on optical flow guidance and a spatio-temporal feature fusion module. The residual module comprises K residual blocks, each containing two convolution layers and an activation layer; the deformable feature alignment module based on optical flow guidance introduces optical flow information to assist the deformable convolution in the feature alignment operation; and the spatio-temporal feature fusion module comprises three convolution layers, a temporal attention mechanism and a spatial attention mechanism.
Further, the calculation process of the feature alignment module is as follows:
For backward propagation, the extracted feature information is further processed by the residual module:
$\hat{f}_i = R\left(C\left(g_i, f_i^{f}\right)\right)$ (1)
wherein $R(\cdot)$ denotes the residual module, $C(\cdot)$ denotes the concatenation operation along the channel dimension, $g_i$ denotes the feature information extracted by the feature extraction module, and $f_i^{f}$ denotes the feature computed at the $i$-th moment of forward propagation. In addition, the two frames before and the two frames after the target frame are aligned:
$\bar{f}_i = A\left(\hat{f}_i, f_{i-2}^{b}, f_{i-1}^{b}, f_{i+1}^{b}, f_{i+2}^{b}\right)$ (2)
wherein $\bar{f}_i$ denotes the aligned feature, $f_{i-2}^{b}$, $f_{i-1}^{b}$, $f_{i+1}^{b}$ and $f_{i+2}^{b}$ denote the backward-propagation features at moments $i-2$, $i-1$, $i+1$ and $i+2$, and $A(\cdot)$ denotes the deformable feature alignment operation based on optical flow guidance. Finally, the final feature map is obtained through the spatio-temporal feature fusion module:
$f_i^{b} = \mathrm{STF}\left(\hat{f}_i, \bar{f}_i\right)$ (3)
wherein $f_i^{b}$ denotes the final feature map, $\hat{f}_i$ denotes the deeper feature information extracted by the residual module, and $\mathrm{STF}(\cdot)$ denotes the spatio-temporal feature fusion module.
Further, in the deformable feature alignment module based on optical flow guidance, an optical flow estimation network is first used to compute the optical flow maps, where $s_{i-2 \to i}$, $s_{i-1 \to i}$, $s_{i+1 \to i}$ and $s_{i+2 \to i}$ denote the optical flow mappings from the $(i-2)$-th, $(i-1)$-th, $(i+1)$-th and $(i+2)$-th frames to the $i$-th frame, respectively; the frames before and after the $i$-th (target) frame are then warped:
$\tilde{f}_{i-2} = W\left(f_{i-2}^{b}, s_{i-2 \to i}\right)$ (4)
$\tilde{f}_{i-1} = W\left(f_{i-1}^{b}, s_{i-1 \to i}\right)$ (5)
$\tilde{f}_{i+1} = W\left(f_{i+1}^{b}, s_{i+1 \to i}\right)$ (6)
$\tilde{f}_{i+2} = W\left(f_{i+2}^{b}, s_{i+2 \to i}\right)$ (7)
wherein $\tilde{f}_{i-2}$, $\tilde{f}_{i-1}$, $\tilde{f}_{i+1}$ and $\tilde{f}_{i+2}$ denote the features at moments $i-2$, $i-1$, $i+1$ and $i+2$ after spatial warping with the optical flow information, and $W(\cdot)$ denotes the spatial warping operation. The optical flow residual $o_i$ and the deformable convolution mask $m_i$ are then computed with the pre-aligned features:
$o_i = \mathrm{Conv}_{o}\left(C\left(\hat{f}_i, \tilde{f}_{i-2}, \tilde{f}_{i-1}, \tilde{f}_{i+1}, \tilde{f}_{i+2}\right)\right)$ (8)
$m_i = \sigma\left(\mathrm{Conv}_{m}\left(C\left(\hat{f}_i, \tilde{f}_{i-2}, \tilde{f}_{i-1}, \tilde{f}_{i+1}, \tilde{f}_{i+2}\right)\right)\right)$ (9)
wherein $C(\cdot)$ denotes the channel concatenation, $\mathrm{Conv}_{o}$ and $\mathrm{Conv}_{m}$ denote convolution computations, and $\sigma$ denotes the activation function. Finally, the aligned feature $\bar{f}_i$ is obtained by deformable convolution:
$\bar{f}_i = \mathrm{DCN}\left(C\left(f_{i-2}^{b}, f_{i-1}^{b}, f_{i+1}^{b}, f_{i+2}^{b}\right), s + o_i, m_i\right)$ (10)
wherein DCN denotes the deformable convolution operation and $s$ denotes the corresponding optical flow maps used as base offsets.
Further, in the spatio-temporal feature fusion module, the deep feature information $\hat{f}_i$ extracted for the current frame and the aligned feature map $\bar{f}_j$ of a neighboring frame are first embedded by convolution, and the similarity distance of the embeddings is computed:
$d_j = \sigma\left(\mathrm{Conv}\left(\hat{f}_i\right) \cdot \mathrm{Conv}\left(\bar{f}_j\right)\right)$ (11)
wherein $\sigma$ denotes the activation function, Conv denotes the convolution computation, and $\cdot$ denotes the dot product; the similarity distances are then processed by the temporal attention mechanism, the temporal attention being spatially specific for each spatial position:
$M_j = \mathrm{TA}\left(d_j\right)$ (12)
wherein $M_j$ denotes the temporal attention map and $\mathrm{TA}(\cdot)$ denotes the temporal attention processing; the aligned feature map is multiplied pixel by pixel with the temporal attention map, and the temporally fused feature map $f_i^{t}$ is obtained through a convolution layer:
$f_i^{t} = \mathrm{Conv}_{fuse}\left(\bar{f}_j \odot M_j\right)$ (13)
wherein $\odot$ denotes the pixel-by-pixel multiplication and $\mathrm{Conv}_{fuse}$ denotes the fusion convolution; finally, the fused feature is processed by the spatial attention mechanism to strengthen texture information, and the final feature map is obtained:
$f_i^{b} = \mathrm{SA}\left(f_i^{t}\right) + f_i^{t}$ (14)
wherein SA denotes the spatial attention operation and $+$ denotes element-by-element addition.
Further, the pixel reorganization module in step S6 includes a reconstruction layer, four convolution layers, two pixel-shuffle layers and three activation layers; the reconstruction layer consists of K residual blocks of identical structure, each containing two convolution layers and one activation layer; the two pixel-shuffle layers adopt a pixel-shuffle strategy; the first and second activation layers both use the Leaky ReLU activation function, and the third activation layer uses the ReLU activation function.
Further, the upsampling operation in step S7 employs one of nearest neighbor interpolation, bilinear interpolation, or bicubic interpolation.
The invention also provides a satellite video super-resolution reconstruction system for multiple degradation processes, which comprises:
The system comprises a processor and a memory, wherein the memory is used for storing program instructions, and the processor is used for calling the stored instructions in the memory to execute the satellite video super-resolution reconstruction method facing the multiple degradation process according to the technical scheme.
According to the above technical scheme, the invention provides a satellite video super-resolution reconstruction method and system for multiple degradation processes: the satellite video random degradation model generates corresponding low-resolution data; the acquired low-resolution data passes through the feature extraction layer; the extracted features are propagated with the high-order grid bidirectional feature propagation scheme; and finally the spatio-temporally fused feature alignment module aligns the features of the image sequence. This reduces the error accumulation of long image sequences, effectively restores satellite video with high fidelity, significantly improves the reconstruction quality of satellite video, and provides a certain denoising capability.
Drawings
Fig. 1 is a schematic diagram of an overall framework of a satellite video super-resolution reconstruction method facing multiple degradation processes.
Fig. 2 is a schematic structural diagram of a satellite video random degradation model constructed in the invention.
Fig. 3 is a schematic structural diagram of a feature extraction module in the present invention.
Fig. 4 is a schematic diagram of a feature alignment module constructed in the present invention.
FIG. 5 is a schematic diagram of a deformable feature alignment module based on optical flow guidance constructed in the present invention.
FIG. 6 is a schematic diagram of a space-time feature fusion module constructed in the present invention.
Fig. 7 is a schematic diagram of a pixel reorganization module constructed in the present invention.
Fig. 8 is the visual result of 2× super-resolution reconstruction on three satellite video datasets according to the present invention.
Fig. 9 is the visual result of 3× super-resolution reconstruction on three satellite video datasets according to the present invention.
Fig. 10 is the visual result of 4× super-resolution reconstruction on three satellite video datasets according to the present invention.
Detailed Description
In order to make the objects and technical solutions of the present invention clearer, the present invention is described in detail below with reference to the accompanying drawings and specific embodiments, so that researchers in the related field can more easily understand the features and performance of the present invention and the protection scope of the present invention is defined more clearly.
The embodiment of the invention discloses a satellite video super-resolution reconstruction method facing a multiple degradation process, which comprises the following steps of:
Step 1: construct a satellite video data set. The data set is cropped from 100 videos captured by a video satellite and covers a wide range of ground features, including cities, wharfs, airports, suburbs and deserts, and the videos contain dynamic scenes such as moving cars, airplanes and ships. The videos are clipped into 284 short clips, each containing 180 consecutive frames, with 240 clips used as the training set and 44 clips as the validation set; the frame size of each video is 1280×720.
Step 2: construct a high-resolution optical satellite video super-resolution reconstruction model for the multiple degradation process as shown in fig. 1, which mainly comprises a satellite video random degradation module as shown in fig. 2, a feature extraction module as shown in fig. 3, a feature alignment module as shown in fig. 4 and a pixel reorganization module as shown in fig. 7.
Step 3: pass the data set obtained in step 1 through the satellite video random degradation module to obtain the corresponding low-resolution video frame data set, and perform data enhancement operations such as rotation, translation and scaling to expand the sample library.
Specifically, the satellite video random degradation model includes a downsampling module, four random degradation modules and a video compression module, as shown in fig. 2. The original frame image $I^{HR}$ first passes through the downsampling module; the downsampling module contains three downsampling modes, namely nearest-neighbor interpolation, bilinear interpolation and bicubic interpolation, and one of them is selected at random each time an image passes through the module. The downsampled image then passes through the four random degradation modules; each random degradation module contains four degradation modes, namely random blur, motion blur, random noise and sensor tremor, and one of them is selected at random each time the downsampled image passes through a module. The image output by the random degradation modules finally passes through the video compression module, which consists of two parts, intra-frame compression and inter-frame compression; the intra-frame compression is JPEG compression, and the inter-frame compression contains three modes, namely H.264, H.265 and MPEG-4, one of which is selected at random each time the video is compressed. This yields the final low-resolution frame image $I^{LR}$, on which data enhancement operations such as rotation, translation and scaling are performed to expand the sample library.
In a specific embodiment, the downsampling modes are nearest-neighbor interpolation, bilinear interpolation and bicubic interpolation, and the probability of each downsampling mode being selected is 1/3. The random blur employs isotropic blur kernels, anisotropic blur kernels, generalized isotropic blur kernels, generalized anisotropic blur kernels, plateau isotropic blur kernels, plateau anisotropic blur kernels and sinc-function blur kernels, and the probabilities of the blur kernel types being selected are set to 0.405, 0.225, 0.108, 0.027 and 0.1, respectively. The standard deviation of the Gaussian blur in the x and y directions ranges over [0.2, 3]; a value can be drawn at random from 0.2 to 3 to control the blur strength. The rotation angle of the blur kernel is in [-π, π], so the kernel can be rotated randomly between -π and π to simulate blur in different directions. The shape parameter of the generalized Gaussian blur kernel ranges over [0.5, 4]; larger values mean the distribution is closer to uniform, while smaller values give sharper distributions. Motion blur mainly describes the motion direction and intensity of a moving object; the motion direction includes horizontal, vertical and diagonal motion blur, whose selection probabilities are 0.4, 0.4 and 0.2, respectively, and the step size of the direction change is 1 degree, so the motion-blur direction can be controlled finely; the motion intensity is drawn at random from [2, 10] with a step size of 1, making the motion blur change smoother and more precise. The random noise is mainly Gaussian noise and Poisson noise, each selected with probability 50%; the standard deviation of the Gaussian noise is drawn at random from [1, 30] with a step size of 0.1, and the scaling factor of the Poisson noise is drawn at random from [0.05, 3] with a step size of 0.005. Sensor tremor models a high-frequency vibration and can generally be expressed as a function of time; the probability of sensor tremor is drawn at random from [0.05, 0.6]; the tremor includes horizontal tremor, vertical tremor and random-direction tremor with probability ranges [0, 0.3], [0, 0.3] and [0, 0.4], respectively, and the tremor intensity is drawn at random from [0.1, 2]. The intra-frame JPEG compression quality ranges over [30, 95], and the quality value is selected at random within this interval. There are three alternative encoders for inter-frame compression, namely H.264, H.265 and MPEG-4, each selected with equal probability 1/3.
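To illustrate how one random realization of this degradation chain could be assembled, the following sketch composes the stages described above with NumPy and OpenCV; the function names, and the use of a single Gaussian blur and JPEG compression as stand-ins for the full set of blur types and codecs, are illustrative assumptions rather than the patented implementation.

```python
import random
import numpy as np
import cv2

def random_downsample(img, scale):
    # Randomly pick one of the three interpolation kernels (each with probability 1/3).
    interp = random.choice([cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC])
    h, w = img.shape[:2]
    return cv2.resize(img, (w // scale, h // scale), interpolation=interp)

def random_gaussian_blur(img):
    # Isotropic/anisotropic Gaussian blur with standard deviations drawn from [0.2, 3].
    sx, sy = np.random.uniform(0.2, 3.0, size=2)
    return cv2.GaussianBlur(img, (0, 0), sigmaX=sx, sigmaY=sy)

def random_noise(img):
    # Gaussian or Poisson noise, chosen with equal probability.
    if random.random() < 0.5:
        sigma = np.random.uniform(1, 30)
        noisy = img + np.random.normal(0, sigma, img.shape)
    else:
        scale = np.random.uniform(0.05, 3.0)
        noisy = np.random.poisson(np.clip(img, 0, 255) * scale) / scale
    return np.clip(noisy, 0, 255).astype(np.float32)

def jpeg_compress(img):
    # Intra-frame compression with a quality factor drawn from [30, 95].
    q = random.randint(30, 95)
    ok, buf = cv2.imencode(".jpg", img.astype(np.uint8), [int(cv2.IMWRITE_JPEG_QUALITY), q])
    return cv2.imdecode(buf, cv2.IMREAD_UNCHANGED).astype(np.float32)

def degrade_frame(hr_frame, scale=4):
    """Apply one random realization of the degradation chain to a single HR frame."""
    lr = random_downsample(hr_frame.astype(np.float32), scale)
    lr = random_gaussian_blur(lr)   # stands in for the four random degradation modules
    lr = random_noise(lr)
    lr = jpeg_compress(lr)          # inter-frame codecs (H.264/H.265/MPEG-4) would act per clip
    return lr
```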
Step 4: pass the low-resolution frame image $I^{LR}$ obtained in step 3 through the feature extraction module to obtain features. The feature extraction module consists of five residual blocks of identical structure, as shown in fig. 3; each residual block contains two convolution layers and an activation layer, and the residual blocks are connected by skip connections.
In a specific embodiment, the convolution kernels of the two convolution layers are 3×3, the step size is 2, the padding is 1, the number of input channels is 3, the number of intermediate features is 64, the number of output channels is 64, the active layers adopt a ReLU activation function, and the slope is set to 0.1.
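For reference, a minimal PyTorch sketch of such a feature extraction stack is given below; the class and variable names are assumptions, and a stride of 1 is used inside the residual blocks (instead of the stride of 2 quoted above) so that the skip-connection shapes match, which is the usual convention for residual blocks of this kind.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with a ReLU in between and a skip connection."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, stride=1, padding=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return x + self.conv2(self.act(self.conv1(x)))

class FeatureExtractor(nn.Module):
    """3-channel input lifted to 64 features, then K residual blocks (K = 5 in the embodiment)."""
    def __init__(self, in_channels: int = 3, channels: int = 64, num_blocks: int = 5):
        super().__init__()
        self.head = nn.Conv2d(in_channels, channels, 3, padding=1)
        self.body = nn.Sequential(*[ResidualBlock(channels) for _ in range(num_blocks)])

    def forward(self, lr_frame):
        return self.body(self.head(lr_frame))
```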
Step 5: propagate and aggregate the information of the features obtained in step 4, mainly through a high-order grid propagation scheme. The process is divided into three stages: in the first stage the features are propagated forward in order of increasing time; in the second stage the features are propagated backward in order of decreasing time; and in the third stage the target-frame information is connected with the feature information of the preceding and following frames, which gathers feature information from different positions and improves the effectiveness of the model in occluded regions, as shown by the dashed part of fig. 1. During feature transfer, the input feature information needs to be aligned; as shown in fig. 4, feature alignment mainly comprises a residual module, a deformable feature alignment module based on optical flow guidance and a spatio-temporal feature fusion module.
Specifically, taking backward propagation as an example, the extracted features are further refined by the residual module:
$\hat{f}_i = R\left(C\left(g_i, f_i^{f}\right)\right)$ (1)
wherein $R(\cdot)$ denotes the residual module, $C(\cdot)$ denotes the concatenation operation along the channel dimension, $g_i$ denotes the feature information extracted by the feature extraction module, and $f_i^{f}$ denotes the feature computed at the $i$-th moment of forward propagation. In addition, the two frames before and the two frames after the target frame are aligned:
$\bar{f}_i = A\left(\hat{f}_i, f_{i-2}^{b}, f_{i-1}^{b}, f_{i+1}^{b}, f_{i+2}^{b}\right)$ (2)
wherein $\bar{f}_i$ denotes the aligned feature, $f_{i-2}^{b}$, $f_{i-1}^{b}$, $f_{i+1}^{b}$ and $f_{i+2}^{b}$ denote the backward-propagation features at moments $i-2$, $i-1$, $i+1$ and $i+2$, and $A(\cdot)$ denotes the deformable feature alignment operation based on optical flow guidance. Finally, the final feature map is obtained through the spatio-temporal feature fusion module:
$f_i^{b} = \mathrm{STF}\left(\hat{f}_i, \bar{f}_i\right)$ (3)
wherein $f_i^{b}$ denotes the final feature map, $\hat{f}_i$ denotes the deeper feature information extracted by the residual module, and $\mathrm{STF}(\cdot)$ denotes the spatio-temporal feature fusion operation.
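A schematic of how one backward branch of the grid propagation could aggregate these terms, following Eqs. (1)-(3), is sketched below; the helper callables (residual_blocks, flow_guided_align, st_fusion) stand for the modules described in this and the following paragraphs and are illustrative placeholders, not identifiers from the patent.

```python
import torch

def backward_branch(extracted, forward_feats, residual_blocks, flow_guided_align, st_fusion):
    """One backward pass over the sequence (decreasing time), following Eqs. (1)-(3).

    extracted:     list of per-frame features g_i from the feature extractor
    forward_feats: list of features f_i^f produced by the preceding forward branch
    """
    num_frames = len(extracted)
    backward_feats = [None] * num_frames
    for i in reversed(range(num_frames)):
        # Eq. (1): concatenate the extracted feature with the forward-branch feature
        # of the same frame and refine the result with residual blocks.
        f_hat = residual_blocks(torch.cat([extracted[i], forward_feats[i]], dim=1))

        # Eq. (2): align the features of frames i-2, i-1, i+1 and i+2 to the target
        # frame; frames not yet propagated in this branch fall back to their
        # extracted features (an assumption for boundary handling).
        neighbors = []
        for j in (i - 2, i - 1, i + 1, i + 2):
            j = min(max(j, 0), num_frames - 1)
            neighbors.append(backward_feats[j] if backward_feats[j] is not None else extracted[j])
        f_bar = flow_guided_align(f_hat, neighbors)

        # Eq. (3): fuse the refined and aligned features with spatio-temporal attention.
        backward_feats[i] = st_fusion(f_hat, f_bar)
    return backward_feats
```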
In the deformable feature alignment module based on optical flow guidance, as shown in fig. 5, an optical flow estimation network is first used to compute the optical flow maps, where $s_{i-2 \to i}$, $s_{i-1 \to i}$, $s_{i+1 \to i}$ and $s_{i+2 \to i}$ denote the optical flow mappings from the $(i-2)$-th, $(i-1)$-th, $(i+1)$-th and $(i+2)$-th frames to the $i$-th frame, respectively; the frames before and after the $i$-th (target) frame are then warped:
$\tilde{f}_{i-2} = W\left(f_{i-2}^{b}, s_{i-2 \to i}\right)$ (4)
$\tilde{f}_{i-1} = W\left(f_{i-1}^{b}, s_{i-1 \to i}\right)$ (5)
$\tilde{f}_{i+1} = W\left(f_{i+1}^{b}, s_{i+1 \to i}\right)$ (6)
$\tilde{f}_{i+2} = W\left(f_{i+2}^{b}, s_{i+2 \to i}\right)$ (7)
wherein $\tilde{f}_{i-2}$, $\tilde{f}_{i-1}$, $\tilde{f}_{i+1}$ and $\tilde{f}_{i+2}$ denote the features at moments $i-2$, $i-1$, $i+1$ and $i+2$ after spatial warping with the optical flow information, and $W(\cdot)$ denotes the spatial warping operation. The optical flow residual $o_i$ and the deformable convolution mask $m_i$ are then computed with the pre-aligned features:
$o_i = \mathrm{Conv}_{o}\left(C\left(\hat{f}_i, \tilde{f}_{i-2}, \tilde{f}_{i-1}, \tilde{f}_{i+1}, \tilde{f}_{i+2}\right)\right)$ (8)
$m_i = \sigma\left(\mathrm{Conv}_{m}\left(C\left(\hat{f}_i, \tilde{f}_{i-2}, \tilde{f}_{i-1}, \tilde{f}_{i+1}, \tilde{f}_{i+2}\right)\right)\right)$ (9)
wherein $C(\cdot)$ denotes the channel concatenation, $\mathrm{Conv}_{o}$ and $\mathrm{Conv}_{m}$ denote convolution computations, and $\sigma$ denotes the activation function. Finally, the aligned feature $\bar{f}_i$ is obtained by deformable convolution:
$\bar{f}_i = \mathrm{DCN}\left(C\left(f_{i-2}^{b}, f_{i-1}^{b}, f_{i+1}^{b}, f_{i+2}^{b}\right), s + o_i, m_i\right)$ (10)
wherein DCN denotes the deformable convolution operation and $s$ denotes the corresponding optical flow maps used as base offsets.
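A condensed PyTorch-style sketch of this flow-guided deformable alignment is given below for a single neighbor; it uses torchvision.ops.deform_conv2d for the DCN step of Eq. (10) and a generic flow_warp helper for Eqs. (4)-(7). The optical flow estimator is taken as given, and all module and variable names, as well as the channel and group sizes, are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import deform_conv2d

def flow_warp(feat, flow):
    """Backward-warp a feature map with an optical flow field (Eqs. (4)-(7))."""
    n, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(feat.device)      # (2, H, W) base coordinates
    coords = grid.unsqueeze(0) + flow                                 # add per-pixel displacement
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0                     # normalize to [-1, 1]
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    return F.grid_sample(feat, torch.stack((coords_x, coords_y), dim=-1), align_corners=True)

class FlowGuidedAlign(nn.Module):
    def __init__(self, channels=64, groups=8):
        super().__init__()
        self.offset_conv = nn.Conv2d(channels * 2, 2 * groups * 9, 3, padding=1)   # Eq. (8)
        self.mask_conv = nn.Conv2d(channels * 2, groups * 9, 3, padding=1)         # Eq. (9)
        self.dcn_weight = nn.Parameter(torch.randn(channels, channels // groups, 3, 3) * 0.01)

    def forward(self, target_feat, neighbor_feat, flow):
        warped = flow_warp(neighbor_feat, flow)                 # pre-alignment with optical flow
        x = torch.cat([target_feat, warped], dim=1)
        offset_res = self.offset_conv(x)                        # residual offsets on top of the flow
        mask = torch.sigmoid(self.mask_conv(x))                 # modulation mask
        offsets = offset_res + flow.repeat(1, offset_res.size(1) // 2, 1, 1)
        return deform_conv2d(neighbor_feat, offsets, self.dcn_weight,
                             padding=1, mask=mask)              # Eq. (10)
```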
In the spatio-temporal feature fusion module, a temporal attention mechanism and a spatial attention mechanism are adopted; the attention mechanisms redistribute weights, which helps the model select important information from adjacent frames, reduces erroneous information and makes more effective use of longer sequences, as shown in fig. 6. The deep feature information $\hat{f}_i$ extracted for the current frame and the aligned feature map $\bar{f}_j$ of a neighboring frame are first embedded by convolution, and the similarity distance of the embeddings is computed:
$d_j = \sigma\left(\mathrm{Conv}\left(\hat{f}_i\right) \cdot \mathrm{Conv}\left(\bar{f}_j\right)\right)$ (11)
wherein $\sigma$ denotes the activation function, Conv denotes the convolution computation, and $\cdot$ denotes the dot product. The similarity distances are then processed by the temporal attention mechanism; the temporal attention is spatially specific for each spatial position:
$M_j = \mathrm{TA}\left(d_j\right)$ (12)
wherein $M_j$ denotes the temporal attention map and $\mathrm{TA}(\cdot)$ denotes the temporal attention processing. The aligned feature map is then multiplied pixel by pixel with the temporal attention map, and the temporally fused feature map $f_i^{t}$ is obtained through a convolution layer:
$f_i^{t} = \mathrm{Conv}_{fuse}\left(\bar{f}_j \odot M_j\right)$ (13)
wherein $\odot$ denotes the pixel-by-pixel multiplication and $\mathrm{Conv}_{fuse}$ denotes the fusion convolution. Finally, the fused feature is processed by the spatial attention mechanism to strengthen texture information and the like, yielding the final feature map:
$f_i^{b} = \mathrm{SA}\left(f_i^{t}\right) + f_i^{t}$ (14)
wherein SA denotes the spatial attention operation and $+$ denotes element-by-element addition.
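The temporal and spatial attention described by Eqs. (11)-(14) could be expressed as in the following sketch; the class name, layer names and the particular form of the spatial attention gate are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SpatioTemporalFusion(nn.Module):
    """Temporal attention over an aligned neighbor feature, then spatial attention (Eqs. (11)-(14))."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.embed_ref = nn.Conv2d(channels, channels, 3, padding=1)
        self.embed_nbr = nn.Conv2d(channels, channels, 3, padding=1)
        self.fuse = nn.Conv2d(channels, channels, 1)
        # Simple spatial attention: a per-pixel gate predicted from the fused feature.
        self.spatial_att = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                                         nn.Sigmoid())

    def forward(self, ref_feat, aligned_feat):
        # Eq. (11): embed both features and measure per-pixel similarity by dot product.
        sim = torch.sum(self.embed_ref(ref_feat) * self.embed_nbr(aligned_feat),
                        dim=1, keepdim=True)
        # Eq. (12): turn the similarity into a spatially specific temporal attention map.
        t_att = torch.sigmoid(sim)
        # Eq. (13): re-weight the aligned feature pixel by pixel and fuse by convolution.
        fused = self.fuse(aligned_feat * t_att)
        # Eq. (14): spatial attention to emphasize texture, with a residual connection.
        return fused * self.spatial_att(fused) + fused
```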
In a specific embodiment, the residual modules in feature alignment are composed of three residual blocks with the same structure, each residual block comprises two convolution layers and an activation layer, the residual blocks are connected in a jump connection mode, the convolution kernels of the two convolution layers are 3×3, the step size is 2, the filling is 1, the number of input channels is 64, the number of middle features is 64, the number of output channels is 128, the activation layer adopts a ReLU activation function, and the slope is set to 0.1.
Step 6: pass the aligned features obtained in step 5 through the pixel reorganization module to obtain the residual features of the high-resolution image.
Specifically, the pixel reorganization module includes one reconstruction layer, four convolution layers, two pixel-shuffle layers and three activation layers, as shown in fig. 7. The feature map obtained in step 5 first passes through the reconstruction layer for further feature extraction; the reconstruction layer consists of five residual blocks of identical structure, the same as in the feature extraction module of step 4 except for the numbers of input and output channels. The number of channels and the image size are then increased by two convolutions and two pixel-shuffle operations, and finally two more convolutions yield the residual features of the high-resolution frame image with 3 channels.
In a specific embodiment, the number of input channels of the reconstruction layer is 320 and the number of output channels is 64.
The first convolution layer has 64 input channels, 256 output channels, a 3×3 kernel, stride 1 and padding 1; it expands the number of channels by a factor of 4 for the subsequent pixel-shuffle operation. The second convolution layer has 64 input channels, 256 output channels, a 3×3 kernel, stride 1 and padding 1, and further expands the number of channels. The third convolution layer has 64 input channels, 64 output channels, a 3×3 kernel, stride 1 and padding 1. The fourth convolution layer has 64 input channels, 3 output channels, a 3×3 kernel, stride 1 and padding 1. The first and second activation layers each use a Leaky ReLU activation function, the third activation layer uses a ReLU activation function, and the slope of the activation function is set to 0.1. The pixel-shuffle strategy adopts a pixel-shuffle pack, an enhanced pixel-shuffle module: standard pixel shuffle simply redistributes channel data into the spatial dimension, whereas the pixel-shuffle pack also applies a convolution before the shuffle to improve the expressive power of the features and the upsampling quality. The first pixel shuffle has 256 input channels, 64 output channels, a scale factor of 2 and an upsampling convolution kernel size of 3; the second pixel shuffle has 256 input channels, 64 output channels, a scale factor of 2 and an upsampling convolution kernel size of 3.
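A compact sketch of this reconstruction head, using nn.PixelShuffle for the two shuffling layers, is shown below; the pre-shuffle convolutions stand in for the pixel-shuffle pack described above, the class and layer names are assumptions, and the channel counts follow the embodiment.

```python
import torch
import torch.nn as nn

class _ResBlock(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                                  nn.ReLU(inplace=True),
                                  nn.Conv2d(channels, channels, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class PixelReorganization(nn.Module):
    """Reconstruction layer, two x2 pixel-shuffle stages, and the output convolutions."""
    def __init__(self, in_channels: int = 320, mid_channels: int = 64, out_channels: int = 3):
        super().__init__()
        self.reconstruct = nn.Sequential(
            nn.Conv2d(in_channels, mid_channels, 3, padding=1),
            *[_ResBlock(mid_channels) for _ in range(5)])
        self.up1 = nn.Sequential(nn.Conv2d(mid_channels, 256, 3, padding=1),
                                 nn.PixelShuffle(2), nn.LeakyReLU(0.1, inplace=True))
        self.up2 = nn.Sequential(nn.Conv2d(64, 256, 3, padding=1),
                                 nn.PixelShuffle(2), nn.LeakyReLU(0.1, inplace=True))
        self.conv_hr = nn.Conv2d(64, 64, 3, padding=1)
        self.conv_out = nn.Conv2d(64, out_channels, 3, padding=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, aligned_feat):
        x = self.reconstruct(aligned_feat)   # five residual blocks on 64 channels
        x = self.up2(self.up1(x))            # 4x spatial enlargement via two PixelShuffle(2) stages
        return self.conv_out(self.act(self.conv_hr(x)))
```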
Step 7: interpolate the input low-resolution frame image and add it to the residual features of the high-resolution frame image obtained in step 6 to obtain the final reconstructed image.
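Putting step 7 together, the final frame could be formed as in the following sketch, i.e. bicubic upsampling of the low-resolution input plus the learned residual; the function name and the clamping to [0, 1] are assumptions.

```python
import torch
import torch.nn.functional as F

def reconstruct_frame(lr_frame, hr_residual, scale: int = 4):
    """Step 7: upsample the low-resolution frame and add the predicted residual."""
    upsampled = F.interpolate(lr_frame, scale_factor=scale,
                              mode="bicubic", align_corners=False)
    return torch.clamp(upsampled + hr_residual, 0.0, 1.0)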
In a specific embodiment, bicubic interpolation is used to interpolate the low-resolution frame image, and PSNR and SSIM are used as objective evaluation indexes; each index is obtained by averaging over all frames of each tested satellite video. Testing is performed on three satellite video data sets; the objective evaluation results are shown in Table 1, and the visual results are shown in figs. 8, 9 and 10, respectively.
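As an illustration of how these indexes could be averaged over the frames of one test video, a small evaluation helper based on scikit-image is sketched below; the function name and the assumption of 8-bit frames are illustrative.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_video(sr_frames, gt_frames):
    """Average PSNR/SSIM over all frames of one test video (frames as HxWx3 uint8 arrays)."""
    psnr = np.mean([peak_signal_noise_ratio(gt, sr, data_range=255)
                    for sr, gt in zip(sr_frames, gt_frames)])
    ssim = np.mean([structural_similarity(gt, sr, channel_axis=-1, data_range=255)
                    for sr, gt in zip(sr_frames, gt_frames)])
    return psnr, ssim
```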
Table 1 results of objective evaluation of different superdivision magnifications on three satellite video datasets
Fig. 8 shows the visual results of 2× super-resolution reconstruction on the three satellite video datasets. In fig. 8, A is the original frame image of scene 033 of dataset 1, A1 is an enlarged detail of the original frame image, and A2 is the 2× super-resolution reconstruction result at the corresponding position; B is the original frame image of scene 000 of dataset 2, B1 is an enlarged detail of the original frame image, and B2 is the 2× super-resolution reconstruction result at the corresponding position; C is the original frame image of scene 001 of dataset 3, C1 is an enlarged detail of the original frame image, and C2 is the 2× super-resolution reconstruction result at the corresponding position.
Fig. 9 shows the visual results of 3× super-resolution reconstruction on the three satellite video datasets. In fig. 9, A is the original frame image of scene 026 of dataset 1, A1 is an enlarged detail of the original frame image, and A2 is the 3× super-resolution reconstruction result at the corresponding position; B is the original frame image of scene 016 of dataset 2, B1 is an enlarged detail of the original frame image, and B2 is the 3× super-resolution reconstruction result at the corresponding position; C is the original frame image of scene 001 of dataset 3, C1 is an enlarged detail of the original frame image, and C2 is the 3× super-resolution reconstruction result at the corresponding position.
Fig. 10 shows the visual results of 4× super-resolution reconstruction on the three satellite video datasets. In fig. 10, A is the original frame image of scene 030 of dataset 1, A1 is an enlarged detail of the original frame image, and A2 is the 4× super-resolution reconstruction result at the corresponding position; B is the original frame image of scene 001 of dataset 2, B1 is an enlarged detail of the original frame image, and B2 is the 4× super-resolution reconstruction result at the corresponding position; C is the original image of scene 005 of dataset 3, C1 is an enlarged detail of the original frame image, and C2 is the 4× super-resolution reconstruction result at the corresponding position.
In summary, by considering various kinds of noise in satellite video data within the satellite video random degradation module, the invention addresses the poor reconstruction caused by the multi-factor degradation of satellite video data in real scenes. The high-order grid bidirectional feature propagation scheme propagates feature information alternately forward and backward in time, so information from different frames can be revisited and the features refined, which improves the expressive power of the features. Temporal features are fused during feature alignment, which helps the model select important information from adjacent frames and reduces erroneous information, so longer image sequences are used more effectively. Finally, experiments on three satellite video datasets demonstrate, both qualitatively and quantitatively, the effectiveness of the method.
On the other hand, the embodiment of the invention also provides a satellite video super-resolution reconstruction system for multiple degradation processes, which comprises:
The system comprises a processor and a memory, wherein the memory is used for storing program instructions, and the processor is used for calling the stored instructions in the memory to execute the satellite video super-resolution reconstruction method facing the multiple degradation process according to the technical scheme.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (9)
1. The satellite video super-resolution reconstruction method for the multiple degradation process is characterized by comprising the following steps of:
step S1, acquiring a satellite video data set;
s2, constructing a satellite video super-resolution reconstruction model oriented to a multiple degradation process, wherein the satellite video super-resolution reconstruction model comprises a satellite video random degradation module, a feature extraction module, a feature alignment module and a pixel recombination module;
Step S3, satellite video data in the satellite video data set passes through a satellite video random degradation module to obtain a low-resolution frame image, and data enhancement operation is carried out to obtain low-resolution frame data;
Step S4, the low-resolution frame data obtained in the step S3 passes through a feature extraction module to obtain frame image feature information;
S5, inputting the frame image characteristic information obtained in the step S4 into a characteristic alignment module, and carrying out propagation, aggregation and alignment operation on the input characteristics to obtain the aligned characteristics;
the feature alignment module in step S5 is configured to propagate and aggregate the frame image feature information obtained in step S4 through a high-order grid propagation scheme, the process being divided into three stages: in the first stage the features are propagated forward in order of increasing time, in the second stage the features are propagated backward in order of decreasing time, and in the third stage the target-frame information is connected with the feature information of the preceding and following frames; the feature alignment module comprises a residual module, a feature alignment module based on optical flow guidance and a spatio-temporal feature fusion module, the residual module comprises K residual blocks, each residual block comprising two convolution layers and an activation layer, and the feature alignment module based on optical flow guidance introduces optical flow information to assist the deformable convolution in performing the feature alignment operation;
Step S6, the aligned features obtained in the step S5 are passed through a pixel reorganization module to obtain residual features of the high-resolution image;
and S7, performing up-sampling operation on the low-resolution frame image obtained in the step S3, and adding the up-sampling operation with residual characteristics of the high-resolution image obtained in the step S6 to obtain a final high-resolution reconstructed image.
2. The satellite video super-resolution reconstruction method for multiple degradation processes according to claim 1, wherein the satellite video random degradation module in step S3 comprises a downsampling module, four random degradation modules and a video compression module; the downsampling module downsamples the image with one of nearest-neighbor interpolation, bilinear interpolation or bicubic interpolation; each of the four random degradation modules degrades the image with one of random blur, motion blur, random noise or sensor tremor; and the video compression module performs intra-frame compression or inter-frame compression, wherein intra-frame compression comprises JPEG compression, JPEG 2000 compression, PNG compression, WEBP compression, BMP compression, TIFF compression, BPG compression and FLIF compression, and inter-frame compression comprises H.263, H.264, H.265, MJPEG, MPEG-2, MPEG-4, VP8, VP9 and AV1.
3. The satellite video super-resolution reconstruction method for multiple degradation processes according to claim 1, wherein the feature extraction module in step S4 comprises K residual blocks, each residual block comprising two convolution layers and an activation layer, and the residual blocks are connected by skip connections.
4. The satellite video super-resolution reconstruction method for multiple degradation processes according to claim 1, wherein the calculation process of the feature alignment module is as follows:
for backward propagation, the extracted feature information is further processed by the residual module:
$\hat{f}_i = R\left(C\left(g_i, f_i^{f}\right)\right)$ (1)
wherein $R(\cdot)$ denotes the residual module, $C(\cdot)$ denotes the concatenation operation along the channel dimension, $g_i$ denotes the feature information extracted by the feature extraction module, and $f_i^{f}$ denotes the feature computed at the $i$-th moment of forward propagation; in addition, the two frames before and the two frames after the target frame are aligned:
$\bar{f}_i = A\left(\hat{f}_i, f_{i-2}^{b}, f_{i-1}^{b}, f_{i+1}^{b}, f_{i+2}^{b}\right)$ (2)
wherein $\bar{f}_i$ denotes the aligned feature, $f_{i-2}^{b}$, $f_{i-1}^{b}$, $f_{i+1}^{b}$ and $f_{i+2}^{b}$ denote the backward-propagation features at moments $i-2$, $i-1$, $i+1$ and $i+2$, and $A(\cdot)$ denotes the deformable feature alignment operation based on optical flow guidance; finally, the final feature map is obtained through the spatio-temporal feature fusion module:
$f_i^{b} = \mathrm{STF}\left(\hat{f}_i, \bar{f}_i\right)$ (3)
wherein $f_i^{b}$ denotes the final feature map, $\hat{f}_i$ denotes the deeper feature information extracted by the residual module, and $\mathrm{STF}(\cdot)$ denotes the spatio-temporal feature fusion module.
5. The satellite video super-resolution reconstruction method for multiple degradation processes according to claim 4, wherein in the deformable feature alignment module based on optical flow guidance, an optical flow estimation network is first used to compute the optical flow maps, where $s_{i-2 \to i}$, $s_{i-1 \to i}$, $s_{i+1 \to i}$ and $s_{i+2 \to i}$ denote the optical flow mappings from the $(i-2)$-th, $(i-1)$-th, $(i+1)$-th and $(i+2)$-th frames to the $i$-th frame, respectively; the frames before and after the $i$-th (target) frame are then warped:
$\tilde{f}_{i-2} = W\left(f_{i-2}^{b}, s_{i-2 \to i}\right)$ (4)
$\tilde{f}_{i-1} = W\left(f_{i-1}^{b}, s_{i-1 \to i}\right)$ (5)
$\tilde{f}_{i+1} = W\left(f_{i+1}^{b}, s_{i+1 \to i}\right)$ (6)
$\tilde{f}_{i+2} = W\left(f_{i+2}^{b}, s_{i+2 \to i}\right)$ (7)
wherein $\tilde{f}_{i-2}$, $\tilde{f}_{i-1}$, $\tilde{f}_{i+1}$ and $\tilde{f}_{i+2}$ denote the features at moments $i-2$, $i-1$, $i+1$ and $i+2$ after spatial warping with the optical flow information, and $W(\cdot)$ denotes the spatial warping operation; the optical flow residual $o_i$ and the deformable convolution mask $m_i$ are then computed with the pre-aligned features:
$o_i = \mathrm{Conv}_{o}\left(C\left(\hat{f}_i, \tilde{f}_{i-2}, \tilde{f}_{i-1}, \tilde{f}_{i+1}, \tilde{f}_{i+2}\right)\right)$ (8)
$m_i = \sigma\left(\mathrm{Conv}_{m}\left(C\left(\hat{f}_i, \tilde{f}_{i-2}, \tilde{f}_{i-1}, \tilde{f}_{i+1}, \tilde{f}_{i+2}\right)\right)\right)$ (9)
wherein $C(\cdot)$ denotes the channel concatenation, $\mathrm{Conv}_{o}$ and $\mathrm{Conv}_{m}$ denote convolution computations, and $\sigma$ denotes the activation function; finally, the aligned feature $\bar{f}_i$ is obtained by deformable convolution:
$\bar{f}_i = \mathrm{DCN}\left(C\left(f_{i-2}^{b}, f_{i-1}^{b}, f_{i+1}^{b}, f_{i+2}^{b}\right), s + o_i, m_i\right)$ (10)
wherein DCN denotes the deformable convolution operation and $s$ denotes the corresponding optical flow maps used as base offsets.
6. The satellite video super-resolution reconstruction method for multiple degradation processes according to claim 5, wherein in the spatio-temporal feature fusion module, the deep feature information $\hat{f}_i$ extracted for the current frame and the aligned feature map $\bar{f}_j$ of a neighboring frame are first embedded by convolution, and the similarity distance of the embeddings is computed:
$d_j = \sigma\left(\mathrm{Conv}\left(\hat{f}_i\right) \cdot \mathrm{Conv}\left(\bar{f}_j\right)\right)$ (11)
wherein $\sigma$ denotes the activation function, Conv denotes the convolution computation, and $\cdot$ denotes the dot product; the similarity distances are then processed by the temporal attention mechanism, the temporal attention being spatially specific for each spatial position:
$M_j = \mathrm{TA}\left(d_j\right)$ (12)
wherein $M_j$ denotes the temporal attention map and $\mathrm{TA}(\cdot)$ denotes the temporal attention processing; the aligned feature map is multiplied pixel by pixel with the temporal attention map, and the temporally fused feature map $f_i^{t}$ is obtained through a convolution layer:
$f_i^{t} = \mathrm{Conv}_{fuse}\left(\bar{f}_j \odot M_j\right)$ (13)
wherein $\odot$ denotes the pixel-by-pixel multiplication and $\mathrm{Conv}_{fuse}$ denotes the fusion convolution; finally, the fused feature is processed by the spatial attention mechanism to strengthen texture information, and the final feature map is obtained:
$f_i^{b} = \mathrm{SA}\left(f_i^{t}\right) + f_i^{t}$ (14)
wherein SA denotes the spatial attention operation and $+$ denotes element-by-element addition.
7. The satellite video super-resolution reconstruction method for multiple degradation processes according to claim 1, wherein the pixel reorganization module in step S6 comprises a reconstruction layer, four convolution layers, two pixel-shuffle layers and three activation layers; the reconstruction layer consists of K residual blocks of identical structure, each residual block comprising two convolution layers and one activation layer; the two pixel-shuffle layers adopt a pixel-shuffle strategy; the first activation layer and the second activation layer both adopt the Leaky ReLU activation function, and the third activation layer adopts the ReLU activation function.
8. The method for super-resolution reconstruction of satellite video as in claim 1, wherein said upsampling in step S7 is one of nearest neighbor interpolation, bilinear interpolation or bicubic interpolation.
9. A satellite video super-resolution reconstruction system for multiple degradation processes, comprising:
A processor and a memory for storing program instructions, the processor for invoking the stored instructions in the memory to perform a multiple degradation process oriented satellite video super resolution reconstruction method according to any one of claims 1-8.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202510161817.5A CN119624782B (en) | 2025-02-14 | 2025-02-14 | Satellite video super-resolution reconstruction method and system for multiple degradation process |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202510161817.5A CN119624782B (en) | 2025-02-14 | 2025-02-14 | Satellite video super-resolution reconstruction method and system for multiple degradation process |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN119624782A CN119624782A (en) | 2025-03-14 |
| CN119624782B true CN119624782B (en) | 2025-05-02 |
Family
ID=94908877
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202510161817.5A Active CN119624782B (en) | 2025-02-14 | 2025-02-14 | Satellite video super-resolution reconstruction method and system for multiple degradation process |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN119624782B (en) |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116468605A (en) * | 2023-04-12 | 2023-07-21 | 西安电子科技大学 | Video super-resolution reconstruction method based on time-space layered mask attention fusion |
| CN116527833A (en) * | 2023-07-03 | 2023-08-01 | 清华大学 | High-definition video generation method and system based on superdivision model |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN117689541A (en) * | 2023-12-07 | 2024-03-12 | 重庆邮电大学 | Multi-region classification video super-resolution reconstruction method with temporal redundancy optimization |
-
2025
- 2025-02-14 CN CN202510161817.5A patent/CN119624782B/en active Active
Patent Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116468605A (en) * | 2023-04-12 | 2023-07-21 | 西安电子科技大学 | Video super-resolution reconstruction method based on time-space layered mask attention fusion |
| CN116527833A (en) * | 2023-07-03 | 2023-08-01 | 清华大学 | High-definition video generation method and system based on superdivision model |
Also Published As
| Publication number | Publication date |
|---|---|
| CN119624782A (en) | 2025-03-14 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| GR01 | Patent grant |