
CN119624782A - A satellite video super-resolution reconstruction method and system for multiple degradation processes - Google Patents

A satellite video super-resolution reconstruction method and system for multiple degradation processes

Info

Publication number
CN119624782A
Authority
CN
China
Prior art keywords
module
feature
resolution
satellite video
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202510161817.5A
Other languages
Chinese (zh)
Other versions
CN119624782B (en)
Inventor
李路
王密
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202510161817.5A
Publication of CN119624782A
Application granted
Publication of CN119624782B
Legal status: Active


Classifications

    • G06T 3/4053 Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06N 3/045 Combinations of networks
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G06T 3/4046 Scaling of whole images or parts thereof using neural networks
    • G06V 10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V 10/62 Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; pattern tracking
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • H04N 19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N 19/593 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Processing (AREA)

Abstract


The present invention provides a satellite video super-resolution reconstruction method and system for multiple degradation processes. The overall flow of the method is as follows: a satellite video random degradation module generates corresponding low-resolution video data; a feature extraction module then extracts features from the low-resolution data; the extracted features are propagated through every frame of the video sequence with a high-order grid bidirectional propagation scheme, and a feature alignment module aligns the features of the video data; finally, the aligned features undergo pixel reorganization and are added to the up-sampled low-resolution features to obtain the reconstructed high-resolution image. Experiments on three satellite video datasets show that the present invention can effectively restore the high fidelity of satellite videos, significantly improve the reconstruction quality of remote sensing images, and also provides a certain denoising capability.

Description

Satellite video super-resolution reconstruction method and system for multiple degradation processes
Technical Field
The invention belongs to the field of high-resolution optical satellite video image processing, and particularly relates to a satellite video super-resolution reconstruction method and system for multiple degradation processes.
Background
With the rapid development of remote sensing technology, users can acquire Earth observation data with high spatial and temporal resolution more easily, and traditional static Earth observation data no longer meet their needs. Video satellites are a new type of Earth observation satellite developed in recent years; compared with conventional Earth observation satellites, they can "stare" at a specific target for a long period of time and capture time-series images of that target at a certain frame rate. Compared with traditional static remote sensing images, satellite video data not only have sub-meter spatial resolution but also video-level temporal resolution; they provide continuous information about specific targets, enable highly dynamic spatio-temporal Earth observation, and are widely used in dynamic applications such as change detection, target tracking and traffic monitoring. Research on processing video satellite data is therefore of great significance for land monitoring, disaster relief, national defense security, smart cities and the like. However, the spatial resolution of satellite video data is affected by complex environmental factors during imaging and data transmission, such as atmospheric scattering, sensor jitter and data compression; high-frequency information may be lost and blurring may occur, which greatly degrades the performance of subsequent applications. Improving the spatial resolution of satellite video is therefore very important for human perception and for downstream tasks. Compared with improving resolution through hardware, super-resolution reconstruction technology can greatly reduce the maintenance, transmission and storage costs of satellites.
Super-resolution reconstruction is a classical low-level task in computer vision that reconstructs a high-resolution image from a low-resolution image to improve image quality. Video super-resolution reconstruction requires super-resolving each frame of the video and also imposes requirements on the consistency between frames. Broadly speaking, video super-resolution reconstruction can be regarded as an extension of image super-resolution reconstruction, and a video can be processed frame by frame with a single-image super-resolution algorithm. In practice, however, the results of processing video with image super-resolution algorithms are often unsatisfactory, because temporal information is ignored, which causes artifacts and temporal inconsistency. Video carries more information than a single image and has an additional temporal dimension, so designing a super-resolution reconstruction algorithm for video is more challenging. To better exploit this temporal information, researchers usually introduce frame alignment to eliminate the effects of object or background motion in the video. Compared with natural-scene video, satellite video has the following characteristics: first, its spatial resolution is lower and texture information is scarce; second, its field of view is larger than that of natural-scene video, with more varied scenes and higher information density; third, moving objects in satellite video vary in scale and exhibit more complex motion. Therefore, although deep-learning-based video super-resolution reconstruction has developed greatly in recent years, these general methods are not suitable for direct application to satellite video, and more effort is needed to develop methods suited to satellite video super-resolution reconstruction.
Existing methods for acquiring low-resolution data mainly follow three approaches. 1) Synthesize the low-resolution image from the corresponding high-resolution image based on a degradation assumption such as bicubic downsampling; however, when the test low-resolution image does not satisfy this assumption, the performance of the trained super-resolution model may drop sharply. 2) Build low-resolution/high-resolution image pairs from real remote sensing images collected at the same location; however, this approach suffers from problems such as land-cover changes and spectral gaps between the two images. To address these issues, a third approach trains a super-resolution reconstruction model with unpaired low-resolution and high-resolution images. These methods typically combine a degradation network and a generator, where the low-resolution image is first processed and the generator can then be trained in a self-supervised manner; this makes it possible to use unpaired low- and high-resolution images, so training data are easier to collect. However, training of these methods is often unstable because of the domain gap between low-resolution and high-resolution images and the lack of intermediate supervision. For applying remote sensing super-resolution reconstruction to the real remote sensing world, recent studies have explored blind super-resolution that considers the various degradations present in actual scenes, synthesizing training low-resolution/high-resolution image pairs by introducing an anisotropic Gaussian blur kernel and additive white Gaussian noise (AWGN); however, a degradation model based on the anisotropic Gaussian blur assumption is too simple for actual scenes, which limits its application.
Furthermore, image or feature alignment of adjacent frames is a key and difficult problem in video super-resolution reconstruction algorithms. Because of the movement of ground objects such as vehicles, airplanes and ships, and the viewpoint changes caused by satellite motion, direct fusion introduces errors and reduces performance. Most current methods therefore introduce an alignment operation, which helps to accurately locate the missing information in adjacent frames. There are two types of alignment: image alignment and feature alignment. Image alignment typically uses optical flow, which works well for small-scale motion (such as moving ground objects) but not for large-scale changes (such as background motion). Feature alignment typically uses deformable convolution networks and achieves good performance, but training of deformable convolution networks is unstable. Neither optical flow nor deformable convolution can be applied directly to satellite video, because satellite video contains both moving ground objects and a moving background, and the motion characteristics are less pronounced. Even when satellite video super-resolution reconstruction introduces an alignment operation, alignment errors cannot be eliminated; moreover, because moving objects occlude different regions at different times, information about the relevant regions is missing, so directly fusing multi-frame information easily produces poor results and may even fuse erroneous information, reducing performance.
Disclosure of Invention
Based on the above, the invention provides a satellite video super-resolution reconstruction method and system for multiple degradation processes. First, a degradation model of satellite video data is constructed; it considers both blur kernels estimated from real remote sensing images and blur kernels generated from predefined distributions, so that the low-resolution data stay close to real scenes and their diversity does not depend entirely on the diversity of an external data set. Second, the features extracted from the whole video sequence are propagated with a bidirectional recurrent high-order grid feature propagation network, so that the reconstruction of each frame can use the information of all frames. Finally, a feature alignment module based on spatio-temporal fusion is provided: a spatio-temporal attention mechanism fuses the spatio-temporal feature information between frames, which improves the utilization of key information and reduces the influence of erroneous information, and a deformable convolution aligns the spatio-temporally fused features, reducing the error accumulation of long image sequences over time.
In order to achieve the above aim and technical effects, the technical scheme adopted by the invention is a satellite video super-resolution reconstruction method for multiple degradation processes, comprising the following steps:
Step S1, acquiring a satellite video data set;
Step S2, constructing a satellite video super-resolution reconstruction model for multiple degradation processes, which comprises a satellite video random degradation module, a feature extraction module, a feature alignment module and a pixel reorganization module;
Step S3, passing the satellite video data in the satellite video data set through the satellite video random degradation module to obtain low-resolution frame images, and performing data enhancement operations to obtain low-resolution frame data;
Step S4, passing the low-resolution frame data obtained in step S3 through the feature extraction module to obtain frame image feature information;
Step S5, inputting the frame image feature information obtained in step S4 into the feature alignment module, and performing propagation, aggregation and alignment operations on the input features to obtain aligned features;
Step S6, passing the aligned features obtained in step S5 through the pixel reorganization module to obtain residual features of the high-resolution image;
Step S7, performing an up-sampling operation on the low-resolution frame images obtained in step S3 and adding the result to the residual features of the high-resolution image obtained in step S6 to obtain the final high-resolution reconstructed image.
Further, the satellite video random degradation module in step S3 includes a downsampling module, four random degradation modules and a video compression module. The downsampling module downsamples the image using one of nearest-neighbor interpolation, bilinear interpolation or bicubic interpolation; each of the four random degradation modules degrades the image using one of random blur, motion blur, random noise or sensor tremor; and the video compression module performs intra-frame or inter-frame compression, where intra-frame compression includes JPEG compression, JPEG 2000 compression, PNG compression, WEBP compression, BMP compression, TIFF compression, BPG compression and FLIF compression, and inter-frame compression includes H.263, H.264, H.265, MJPEG, MPEG-2, MPEG-4, VP8, VP9 and AV1.
Further, the feature extraction module in step S4 includes K residual blocks, where each residual block includes two convolution layers and one activation layer, and the residual blocks are connected via skip connections.
Further, the feature alignment module in step S5 is used to propagate and aggregate the frame image feature information obtained in step S4, with propagation performed through a high-order grid propagation scheme. The process is divided into three stages: in the first stage, the features are propagated forward in increasing time order; in the second stage, the features are propagated backward in decreasing time order; in the third stage, the target frame information is connected with the feature information of the preceding and following frames. The feature alignment module includes a residual module, an optical-flow-guided feature alignment module and a spatio-temporal feature fusion module. The residual module includes K residual blocks, each comprising two convolution layers and one activation layer; the optical-flow-guided feature alignment module introduces optical flow information to assist deformable convolution in the feature alignment operation; and the spatio-temporal feature fusion module includes three convolution layers, a temporal attention mechanism and a spatial attention mechanism.
Further, the calculation process of the feature alignment module is as follows.
For backward propagation, the extracted feature information is first refined by a residual module:
$\hat{g}_i = R\left(C(f_i,\, g_i^{f})\right)$ (1)
where $R(\cdot)$ denotes the residual module, $C(\cdot)$ denotes concatenation along the channel dimension, $f_i$ denotes the feature information extracted by the feature extraction module, and $g_i^{f}$ denotes the features computed at time $i$ during forward propagation. In addition, the two frames before and the two frames after the target frame are aligned:
$\bar{g}_i = A\left(\hat{g}_i,\, g_{i-2}^{b},\, g_{i-1}^{b},\, g_{i+1}^{b},\, g_{i+2}^{b}\right)$ (2)
where $\bar{g}_i$ denotes the aligned features, $g_{i-2}^{b}$, $g_{i-1}^{b}$, $g_{i+1}^{b}$ and $g_{i+2}^{b}$ denote the backward-propagation features at times $i-2$, $i-1$, $i+1$ and $i+2$, and $A(\cdot)$ denotes the optical-flow-guided deformable feature alignment operation. Finally, the final feature map is obtained through the spatio-temporal feature fusion module:
$g_i^{b} = \mathrm{STF}\left(\hat{g}_i,\, \bar{g}_i\right)$ (3)
where $g_i^{b}$ denotes the final feature map, $\hat{g}_i$ denotes the deeper feature information extracted by the residual module, and $\mathrm{STF}(\cdot)$ denotes the spatio-temporal feature fusion module.
Further, in the optical-flow-guided deformable feature alignment module, an optical flow estimation network is first used to compute optical flow maps, where $s_{i-2\to i}$, $s_{i-1\to i}$, $s_{i+1\to i}$ and $s_{i+2\to i}$ denote the optical flow mappings from frames $i-2$, $i-1$, $i+1$ and $i+2$ to frame $i$; the two frames before and the two frames after the target frame $i$ are then warped:
$\tilde{g}_{i-2} = W\left(g_{i-2}^{b},\, s_{i-2\to i}\right)$ (4)
$\tilde{g}_{i-1} = W\left(g_{i-1}^{b},\, s_{i-1\to i}\right)$ (5)
$\tilde{g}_{i+1} = W\left(g_{i+1}^{b},\, s_{i+1\to i}\right)$ (6)
$\tilde{g}_{i+2} = W\left(g_{i+2}^{b},\, s_{i+2\to i}\right)$ (7)
where $\tilde{g}_{i-2}$, $\tilde{g}_{i-1}$, $\tilde{g}_{i+1}$ and $\tilde{g}_{i+2}$ denote the spatially warped features at times $i-2$, $i-1$, $i+1$ and $i+2$, and $W(\cdot)$ denotes the spatial warping operation. The pre-aligned features are then used to compute the optical flow residual $o_i$ and the deformable convolution mask $m_i$:
$o_i = \mathrm{Conv}_o\left(C(\hat{g}_i,\, \tilde{g}_{i-2},\, \tilde{g}_{i-1},\, \tilde{g}_{i+1},\, \tilde{g}_{i+2})\right)$ (8)
$m_i = \sigma\left(\mathrm{Conv}_m\left(C(\hat{g}_i,\, \tilde{g}_{i-2},\, \tilde{g}_{i-1},\, \tilde{g}_{i+1},\, \tilde{g}_{i+2})\right)\right)$ (9)
where $C(\cdot)$ denotes channel concatenation, $\mathrm{Conv}_o$ and $\mathrm{Conv}_m$ denote convolution computations, and $\sigma(\cdot)$ denotes the activation function. Finally, the aligned features are obtained by deformable convolution:
$\bar{g}_i = \mathrm{DCN}\left(C(g_{i-2}^{b},\, g_{i-1}^{b},\, g_{i+1}^{b},\, g_{i+2}^{b});\; s + o_i,\; m_i\right)$ (10)
where $\mathrm{DCN}(\cdot)$ denotes the deformable convolution operation, whose sampling offsets are the optical flows plus the residual $o_i$ and whose modulation mask is $m_i$.
Further, in the spatio-temporal feature fusion module, the deep feature information $\hat{g}_i$ extracted from the current frame and the aligned feature map $\bar{g}_i$ obtained from the neighbouring frames are first embedded by convolution, and the embedded similarity distance is computed:
$d_i = \sigma\left(\mathrm{Conv}(\hat{g}_i)\cdot \mathrm{Conv}(\bar{g}_i)\right)$ (11)
where $\sigma(\cdot)$ denotes the activation function, $\mathrm{Conv}$ denotes a convolution computation, and $\cdot$ denotes the dot product. The similarity distance is then processed by the temporal attention mechanism; the temporal attention is spatially varying, i.e. specific to each spatial position:
$M_i^{t} = \mathrm{TA}(d_i)$ (12)
where $M_i^{t}$ denotes the temporal attention map and $\mathrm{TA}(\cdot)$ denotes the temporal attention processing. The aligned feature map is multiplied pixel by pixel with the temporal attention map, and the temporally fused feature map is obtained through a convolution layer:
$F_i^{t} = \mathrm{Conv}_f\left(\bar{g}_i \odot M_i^{t}\right)$ (13)
where $\odot$ denotes the pixel-wise product and $\mathrm{Conv}_f$ denotes the fusion convolution. Finally, the fused features are processed by the spatial attention mechanism to strengthen texture information, giving the final feature map:
$g_i^{b} = \mathrm{SA}(F_i^{t}) \oplus F_i^{t}$ (14)
where $\mathrm{SA}(\cdot)$ denotes the spatial attention operation and $\oplus$ denotes element-wise addition.
Further, the pixel reorganization module in step S6 includes one reconstruction layer, four convolution layers, two pixel shuffle layers and three activation layers. The reconstruction layer is composed of K residual blocks with the same structure, each residual block including two convolution layers and one activation layer; the two pixel shuffle layers adopt a pixel shuffle strategy; the first and second activation layers both adopt the Leaky ReLU activation function, and the third activation layer adopts the ReLU activation function.
Further, the upsampling operation in step S7 employs one of nearest neighbor interpolation, bilinear interpolation, or bicubic interpolation.
The invention also provides a satellite video super-resolution reconstruction system for multiple degradation processes, comprising:
a processor and a memory, wherein the memory is used for storing program instructions, and the processor is used for calling the instructions stored in the memory to execute the satellite video super-resolution reconstruction method for multiple degradation processes according to the above technical scheme.
According to the above technical scheme, the invention provides a satellite video super-resolution reconstruction method and system for multiple degradation processes. A satellite video random degradation model is used to generate corresponding low-resolution data; the low-resolution data are passed through the feature extraction layer; the extracted features are propagated with the high-order grid bidirectional feature propagation scheme; and finally the spatio-temporal fusion feature alignment module aligns the features of the sequence images. This reduces the error accumulation of long image sequences, effectively recovers the high fidelity of the satellite video, significantly improves the reconstruction quality of the satellite video, and provides a certain denoising capability.
Drawings
Fig. 1 is a schematic diagram of the overall framework of the satellite video super-resolution reconstruction method for multiple degradation processes.
Fig. 2 is a schematic structural diagram of a satellite video random degradation model constructed in the invention.
Fig. 3 is a schematic structural diagram of a feature extraction module in the present invention.
Fig. 4 is a schematic diagram of a feature alignment module constructed in the present invention.
FIG. 5 is a schematic diagram of a deformable feature alignment module based on optical flow guidance constructed in the present invention.
FIG. 6 is a schematic diagram of a space-time feature fusion module constructed in the present invention.
Fig. 7 is a schematic diagram of a pixel reorganization module constructed in the present invention.
Fig. 8 shows the visual results of 2x super-resolution reconstruction on the three satellite video datasets according to the present invention.
Fig. 9 shows the visual results of 3x super-resolution reconstruction on the three satellite video datasets according to the present invention.
Fig. 10 shows the visual results of 4x super-resolution reconstruction on the three satellite video datasets according to the present invention.
Detailed Description
In order to make the objects and technical solutions of the present invention clearer, the invention is described in detail below with reference to the accompanying drawings and specific embodiments, so that researchers in the related field can more easily understand its features and performance and the protection scope of the invention is defined more clearly.
The embodiment of the invention discloses a satellite video super-resolution reconstruction method for multiple degradation processes, comprising the following steps:
Step 1, a satellite video data set is constructed. The data set is cropped from 100 videos captured by a video satellite and covers a wide range of ground features, including cities, wharves, airports, suburbs, deserts and the like, and the videos contain dynamic scenes such as moving cars, airplanes and ships. The videos are cropped into 284 short clips, each containing 180 consecutive frames, with 240 clips used as the training set and 44 clips as the validation set; the frame size of each video is 1280x720.
Step 2, a high-resolution optical satellite video super-resolution reconstruction model for multiple degradation processes, as shown in FIG. 1, is constructed; it mainly comprises the satellite video random degradation module shown in FIG. 2, the feature extraction module shown in FIG. 3, the feature alignment module shown in FIG. 4, and the pixel reorganization module shown in FIG. 7.
Step 3, the data set obtained in step 1 is passed through the satellite video random degradation module to obtain the corresponding low-resolution video frame data set, and data enhancement operations such as rotation, translation and scaling are performed on the data to expand the sample library.
Specifically, the satellite video random degradation model includes a downsampling module, four random degradation modules, and a video compression module, as shown in FIG. 2. The original high-resolution frame image is first passed through the downsampling module; each time the image passes through this module, one of three downsampling modes (nearest-neighbor interpolation, bilinear interpolation, or bicubic interpolation) is randomly selected to downsample the image. The downsampled image is then passed through the four random degradation modules, each of which contains four degradation modes (random blur, motion blur, random noise, and sensor tremor); one mode is randomly selected each time the image passes through a random degradation module. The degraded image is finally passed through the video compression module, which comprises two parts, intra-frame compression and inter-frame compression: the intra-frame compression is JPEG compression, and the inter-frame compression comprises three modes, namely H.264, H.265 and MPEG-4, one of which is randomly selected each time the image undergoes video compression. This yields the final low-resolution frame image, on which data enhancement operations such as rotation, translation and scaling are performed to expand the sample library.
In a specific embodiment, the downsampling modes are nearest-neighbor interpolation, bilinear interpolation and bicubic interpolation, and the probability of each downsampling mode being selected is 1/3. The random blur employs an isotropic blur kernel, an anisotropic blur kernel, a generalized isotropic blur kernel, a generalized anisotropic blur kernel, a plateau isotropic blur kernel, a plateau anisotropic blur kernel and a sinc-function blur kernel, and the probabilities of the blur kernel types being selected are set to 0.405, 0.225, 0.108, 0.027 and 0.1, respectively. The standard deviation of the Gaussian blur in the x and y directions ranges over [0.2, 3]; a value is randomly selected between 0.2 and 3 to control the strength of the blur. The rotation angle of the blur kernel ranges over [-pi, pi]; the kernel can rotate randomly within this range to simulate blur in different directions. The generalized Gaussian blur kernel parameter ranges over [0.5, 4], where larger values mean the distribution is closer to uniform and smaller values give sharper distributions. Motion blur mainly describes the motion direction and intensity of a moving object: the motion directions include horizontal, vertical and diagonal motion blur, with selection probabilities of 0.4, 0.4 and 0.2, respectively; the step size of the motion-direction change is 1 degree, so that the direction of the motion blur can be controlled finely; and the motion intensity is randomly drawn from [2, 10] with a step size of 1, making the motion-blur variation smoother and more accurate. The random noise is mainly Gaussian noise and Poisson noise, each selected with probability 50%; the standard deviation of the Gaussian noise is randomly selected from [1, 30] with a step size of 0.1, and the scaling factor of the Poisson noise is randomly selected from [0.05, 3] with a step size of 0.005. Sensor tremor models a high-frequency vibration and can generally be expressed as a function of time; the probability of sensor tremor is randomly selected in the range [0.05, 0.6]; the tremor includes horizontal, vertical and random-direction tremor, with random probability ranges of [0, 0.3], [0, 0.3] and [0, 0.4], respectively, and the tremor intensity is randomly selected in the range [0.1, 2]. The intra-frame JPEG compression quality ranges over [30, 95], and the quality value is randomly selected within this interval. There are three alternative encoders for inter-frame compression, H.264, H.265 and MPEG-4, each selected with equal probability 1/3.
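For illustration, a minimal sketch of how such a random degradation pipeline can be composed is given below. It assumes NumPy and OpenCV, keeps only a subset of the degradations (isotropic Gaussian blur, Gaussian or Poisson noise, JPEG intra-frame compression) and simplified probabilities, and the function names are hypothetical; it is not the patented implementation.

import cv2
import numpy as np

def random_downsample(img, scale=4):
    # Randomly pick one of three interpolation modes with equal probability (1/3 each).
    mode = np.random.choice([cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC])
    h, w = img.shape[:2]
    return cv2.resize(img, (w // scale, h // scale), interpolation=mode)

def random_blur(img):
    # Isotropic Gaussian blur with a standard deviation drawn from [0.2, 3].
    sigma = np.random.uniform(0.2, 3.0)
    return cv2.GaussianBlur(img, ksize=(0, 0), sigmaX=sigma, sigmaY=sigma)

def random_noise(img):
    # Gaussian or Poisson noise, each chosen with probability 0.5.
    if np.random.rand() < 0.5:
        std = np.random.uniform(1, 30)
        noisy = img.astype(np.float32) + np.random.normal(0, std, img.shape)
    else:
        scale = np.random.uniform(0.05, 3.0)
        noisy = np.random.poisson(img.astype(np.float32) * scale) / scale
    return np.clip(noisy, 0, 255).astype(np.uint8)

def jpeg_compress(img):
    # Intra-frame compression: JPEG with a quality factor drawn from [30, 95].
    quality = int(np.random.uniform(30, 95))
    ok, buf = cv2.imencode('.jpg', img, [int(cv2.IMWRITE_JPEG_QUALITY), quality])
    return cv2.imdecode(buf, cv2.IMREAD_COLOR)

def degrade_frame(hr_frame, scale=4):
    # Downsample -> random blur -> random noise -> compression, mirroring the module
    # order in FIG. 2 (motion blur, sensor tremor and inter-frame compression omitted).
    lr = random_downsample(hr_frame, scale)
    lr = random_blur(lr)
    lr = random_noise(lr)
    return jpeg_compress(lr)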
Step 4, the low-resolution frame image obtained in step 3 is passed through the feature extraction module to obtain its features. The feature extraction module is composed of five residual blocks with the same structure, as shown in FIG. 3; each residual block comprises two convolution layers and one activation layer, and the residual blocks are connected via skip connections.
In a specific embodiment, the convolution kernels of the two convolution layers are 3×3, the step size is 2, the padding is 1, the number of input channels is 3, the number of intermediate features is 64, the number of output channels is 64, the active layers adopt a ReLU activation function, and the slope is set to 0.1.
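A minimal PyTorch sketch of such a residual-block feature extractor is shown below. It is illustrative only: stride 1 is used inside the residual blocks so that the skip connection can be added directly, and the class and parameter names are assumptions rather than the exact implementation.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    # Two 3x3 convolutions and one activation, with a skip connection around them.
    def __init__(self, channels=64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, stride=1, padding=1)
        self.act = nn.LeakyReLU(0.1, inplace=True)

    def forward(self, x):
        return x + self.conv2(self.act(self.conv1(x)))

class FeatureExtractor(nn.Module):
    # Maps a 3-channel low-resolution frame to 64-channel features through K residual blocks.
    def __init__(self, in_channels=3, channels=64, num_blocks=5):
        super().__init__()
        self.head = nn.Conv2d(in_channels, channels, 3, padding=1)
        self.body = nn.Sequential(*[ResidualBlock(channels) for _ in range(num_blocks)])

    def forward(self, frame):
        return self.body(self.head(frame))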
Step 5, information propagation and aggregation are performed on the features obtained in step 4, mainly through a high-order grid propagation scheme. The process is divided into three stages: in the first stage, the features are propagated forward in increasing time order; in the second stage, the features are propagated backward in decreasing time order; in the third stage, the target frame information is connected with the feature information of the preceding and following frames, which gathers feature information from different positions and improves the effectiveness of the model in occluded areas, as shown in the dotted part of FIG. 1. During feature transfer, the input feature information needs to be aligned; as shown in FIG. 4, the feature alignment mainly comprises a residual module, an optical-flow-guided deformable feature alignment module and a spatio-temporal feature fusion module.
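The propagation schedule can be sketched as follows. This is a simplified second-order bidirectional loop, with a generic align_and_fuse callable standing in for the flow-guided alignment and spatio-temporal fusion described later; the exact grid schedule of the invention may differ.

import torch

def bidirectional_propagate(frame_feats, align_and_fuse):
    # frame_feats: list of per-frame feature maps [f_0, ..., f_{T-1}], each of shape (C, H, W).
    T = len(frame_feats)
    zeros = torch.zeros_like(frame_feats[0])

    # Stage 1: forward pass, each frame receives features propagated from earlier frames.
    forward = []
    for i in range(T):
        prev1 = forward[i - 1] if i >= 1 else zeros
        prev2 = forward[i - 2] if i >= 2 else zeros
        forward.append(align_and_fuse(frame_feats[i], prev1, prev2))

    # Stage 2: backward pass over the forward features, in decreasing time order.
    backward = [None] * T
    for i in reversed(range(T)):
        nxt1 = backward[i + 1] if i + 1 < T else zeros
        nxt2 = backward[i + 2] if i + 2 < T else zeros
        backward[i] = align_and_fuse(forward[i], nxt1, nxt2)

    # Stage 3: connect the target frame with the features gathered in both passes.
    return [torch.cat([frame_feats[i], forward[i], backward[i]], dim=0) for i in range(T)]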
Specifically, taking backward propagation as an example, the extracted features are first refined by a residual module:
$\hat{g}_i = R\left(C(f_i,\, g_i^{f})\right)$ (1)
where $R(\cdot)$ denotes the residual module, $C(\cdot)$ denotes concatenation along the channel dimension, $f_i$ denotes the feature information extracted by the feature extraction module, and $g_i^{f}$ denotes the features computed at time $i$ during forward propagation. In addition, the two frames before and the two frames after the target frame are aligned:
$\bar{g}_i = A\left(\hat{g}_i,\, g_{i-2}^{b},\, g_{i-1}^{b},\, g_{i+1}^{b},\, g_{i+2}^{b}\right)$ (2)
where $\bar{g}_i$ denotes the aligned features, $g_{i-2}^{b}$, $g_{i-1}^{b}$, $g_{i+1}^{b}$ and $g_{i+2}^{b}$ denote the backward-propagation features at times $i-2$, $i-1$, $i+1$ and $i+2$, and $A(\cdot)$ denotes the optical-flow-guided deformable feature alignment operation. Finally, the final feature map is obtained through the spatio-temporal feature fusion module:
$g_i^{b} = \mathrm{STF}\left(\hat{g}_i,\, \bar{g}_i\right)$ (3)
where $g_i^{b}$ denotes the final feature map, $\hat{g}_i$ denotes the deeper feature information extracted by the residual module, and $\mathrm{STF}(\cdot)$ denotes the spatio-temporal feature fusion operation.
In the optical-flow-guided deformable feature alignment module, as shown in FIG. 5, an optical flow estimation network is first employed to compute optical flow maps, where $s_{i-2\to i}$, $s_{i-1\to i}$, $s_{i+1\to i}$ and $s_{i+2\to i}$ denote the optical flow mappings from frames $i-2$, $i-1$, $i+1$ and $i+2$ to frame $i$; the two frames before and the two frames after the target frame $i$ are then warped:
$\tilde{g}_{i-2} = W\left(g_{i-2}^{b},\, s_{i-2\to i}\right)$ (4)
$\tilde{g}_{i-1} = W\left(g_{i-1}^{b},\, s_{i-1\to i}\right)$ (5)
$\tilde{g}_{i+1} = W\left(g_{i+1}^{b},\, s_{i+1\to i}\right)$ (6)
$\tilde{g}_{i+2} = W\left(g_{i+2}^{b},\, s_{i+2\to i}\right)$ (7)
where $\tilde{g}_{i-2}$, $\tilde{g}_{i-1}$, $\tilde{g}_{i+1}$ and $\tilde{g}_{i+2}$ denote the spatially warped features at times $i-2$, $i-1$, $i+1$ and $i+2$, and $W(\cdot)$ denotes the spatial warping operation. The pre-aligned features are then used to compute the optical flow residual $o_i$ and the deformable convolution mask $m_i$:
$o_i = \mathrm{Conv}_o\left(C(\hat{g}_i,\, \tilde{g}_{i-2},\, \tilde{g}_{i-1},\, \tilde{g}_{i+1},\, \tilde{g}_{i+2})\right)$ (8)
$m_i = \sigma\left(\mathrm{Conv}_m\left(C(\hat{g}_i,\, \tilde{g}_{i-2},\, \tilde{g}_{i-1},\, \tilde{g}_{i+1},\, \tilde{g}_{i+2})\right)\right)$ (9)
where $C(\cdot)$ denotes channel concatenation, $\mathrm{Conv}_o$ and $\mathrm{Conv}_m$ denote convolution computations, and $\sigma(\cdot)$ denotes the activation function. Finally, the aligned features are obtained by deformable convolution:
$\bar{g}_i = \mathrm{DCN}\left(C(g_{i-2}^{b},\, g_{i-1}^{b},\, g_{i+1}^{b},\, g_{i+2}^{b});\; s + o_i,\; m_i\right)$ (10)
where $\mathrm{DCN}(\cdot)$ denotes the deformable convolution operation, whose sampling offsets are the optical flows plus the residual $o_i$ and whose modulation mask is $m_i$.
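A sketch of such flow-guided deformable alignment for a single neighbouring frame, using torchvision's deform_conv2d, is given below. The offset and mask heads, channel sizes and the (dy, dx) offset ordering are illustrative assumptions, not the exact network of the embodiment.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.ops as ops

def flow_warp(feat, flow):
    # Bilinear spatial warping of features according to the optical flow.
    n, _, h, w = feat.shape
    gy, gx = torch.meshgrid(torch.arange(h, device=feat.device, dtype=feat.dtype),
                            torch.arange(w, device=feat.device, dtype=feat.dtype), indexing='ij')
    grid_x = (gx.unsqueeze(0) + flow[:, 0]) / max(w - 1, 1) * 2 - 1    # normalise to [-1, 1]
    grid_y = (gy.unsqueeze(0) + flow[:, 1]) / max(h - 1, 1) * 2 - 1
    grid = torch.stack((grid_x, grid_y), dim=-1)                       # (N, H, W, 2)
    return F.grid_sample(feat, grid, align_corners=True)

class FlowGuidedDeformAlign(nn.Module):
    # Sampling offsets = optical flow + learned residual, plus a modulation mask,
    # fed into a modulated deformable convolution (deformable convolution v2).
    def __init__(self, channels=64, kernel_size=3):
        super().__init__()
        self.kernel_size = kernel_size
        n = kernel_size * kernel_size
        self.offset_head = nn.Conv2d(channels * 2 + 2, 2 * n, 3, padding=1)
        self.mask_head = nn.Conv2d(channels * 2 + 2, n, 3, padding=1)
        self.weight = nn.Parameter(torch.randn(channels, channels, kernel_size, kernel_size) * 0.01)

    def forward(self, target_feat, neighbour_feat, flow):
        # flow: (N, 2, H, W) optical flow from the neighbouring frame to the target frame.
        warped = flow_warp(neighbour_feat, flow)                       # pre-alignment by warping
        x = torch.cat([target_feat, warped, flow], dim=1)
        residual = self.offset_head(x)                                 # learned flow residual
        offset = flow.flip(1).repeat(1, self.kernel_size ** 2, 1, 1) + residual  # (dy, dx) per kernel point
        mask = torch.sigmoid(self.mask_head(x))                        # deformable convolution mask
        return ops.deform_conv2d(neighbour_feat, offset, self.weight, padding=1, mask=mask)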
In the spatio-temporal feature fusion module, a temporal attention mechanism and a spatial attention mechanism are adopted; by redistributing weights, the attention mechanisms help the model select important information from adjacent frames, reduce erroneous information, and make more effective use of longer sequences, as shown in FIG. 6. The deep feature information $\hat{g}_i$ extracted from the current frame and the aligned feature map $\bar{g}_i$ obtained from the neighbouring frames are first embedded by convolution, and the embedded similarity distance is computed:
$d_i = \sigma\left(\mathrm{Conv}(\hat{g}_i)\cdot \mathrm{Conv}(\bar{g}_i)\right)$ (11)
where $\sigma(\cdot)$ denotes the activation function, $\mathrm{Conv}$ denotes a convolution computation, and $\cdot$ denotes the dot product. The similarity distance is then processed by the temporal attention mechanism; the temporal attention is spatially varying, i.e. specific to each spatial position:
$M_i^{t} = \mathrm{TA}(d_i)$ (12)
where $M_i^{t}$ denotes the temporal attention map and $\mathrm{TA}(\cdot)$ denotes the temporal attention processing. The aligned feature map is then multiplied pixel by pixel with the temporal attention map, and the temporally fused feature map is obtained through a convolution layer:
$F_i^{t} = \mathrm{Conv}_f\left(\bar{g}_i \odot M_i^{t}\right)$ (13)
where $\odot$ denotes the pixel-wise product and $\mathrm{Conv}_f$ denotes the fusion convolution. Finally, the fused features are processed by the spatial attention mechanism to strengthen texture information and the like, giving the final feature map:
$g_i^{b} = \mathrm{SA}(F_i^{t}) \oplus F_i^{t}$ (14)
where $\mathrm{SA}(\cdot)$ denotes the spatial attention operation and $\oplus$ denotes element-wise addition.
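A compact sketch of the spatio-temporal fusion described by equations (11) to (14) is given below; the module and parameter names are assumptions, and the spatial attention is implemented here as a simple sigmoid gate followed by a residual addition.

import torch
import torch.nn as nn

class SpatioTemporalFusion(nn.Module):
    # Temporal attention: per-pixel similarity between the target features and each aligned
    # neighbour, used to re-weight the neighbours before a fusion convolution.
    # Spatial attention: a gate that emphasises textured regions, added back residually.
    def __init__(self, channels=64, num_neighbours=4):
        super().__init__()
        self.embed_target = nn.Conv2d(channels, channels, 3, padding=1)
        self.embed_neigh = nn.Conv2d(channels, channels, 3, padding=1)
        self.fuse = nn.Conv2d(channels * num_neighbours, channels, 1)
        self.spatial_att = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.Sigmoid())

    def forward(self, target_feat, aligned_neighbours):
        # aligned_neighbours: list of (N, C, H, W) feature maps aligned to the target frame.
        q = self.embed_target(target_feat)
        weighted = []
        for neigh in aligned_neighbours:
            k = self.embed_neigh(neigh)
            att = torch.sigmoid(torch.sum(q * k, dim=1, keepdim=True))  # temporal attention map, eq. (11)-(12)
            weighted.append(neigh * att)                                # pixel-wise re-weighting
        fused = self.fuse(torch.cat(weighted, dim=1))                   # fusion convolution, eq. (13)
        return fused + fused * self.spatial_att(fused)                  # spatial attention and residual add, eq. (14)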
In a specific embodiment, the residual module used in feature alignment is composed of three residual blocks with the same structure; each residual block comprises two convolution layers and one activation layer, and the residual blocks are connected via skip connections. The convolution kernels of the two convolution layers are 3x3 with a step size of 2 and padding of 1; the number of input channels is 64, the number of intermediate features is 64, and the number of output channels is 128. The activation layer adopts a ReLU activation function with the slope set to 0.1.
Step 6, the aligned features obtained in step 5 are passed through the pixel reorganization module to obtain the residual features of the high-resolution image.
Specifically, the pixel reorganization module includes one reconstruction layer, four convolution layers, two pixel shuffle layers and three activation layers, as shown in FIG. 7. The feature map obtained in step 5 first undergoes further feature extraction in the reconstruction layer, which is composed of five residual blocks with the same structure; its structure is the same as that of the feature extraction module in step 4, only the numbers of input and output channels differ. The number of channels and the image size are then increased through two convolutions and two pixel shuffles, and finally two more convolutions produce the residual features of the high-resolution frame image with 3 channels.
In a specific embodiment, the number of input channels of the reconstruction layer is 320 and the number of output channels is 64.
The first convolution layer has 64 input channels, 256 output channels, a 3x3 convolution kernel, a step size of 1 and padding of 1; this layer amplifies the number of channels by a factor of 4 for use with the subsequent pixel shuffling operation. The second convolution layer has 64 input channels, 256 output channels, a 3x3 convolution kernel, a step size of 1 and padding of 1, and is used to further amplify the number of channels. The third convolution layer has 64 input channels, 64 output channels, a 3x3 convolution kernel, a step size of 1 and padding of 1. The fourth convolution layer has 64 input channels, 3 output channels, a 3x3 convolution kernel, a step size of 1 and padding of 1. The first and second activation layers both use the Leaky ReLU activation function, the third activation layer uses the ReLU activation function, and the slope of the activation function is set to 0.1. The pixel shuffling layers adopt a pixel shuffle pack strategy: the pixel shuffle pack is an enhanced pixel shuffle module; whereas standard pixel shuffling simply redistributes channel data to the spatial dimension, the pixel shuffle pack also applies a convolution before shuffling to improve the expressive power of the features and the upsampling quality. The first pixel shuffle layer has 256 input channels, 64 output channels, a scale factor of 2 and an upsampling convolution kernel size of 3; the second pixel shuffle layer has 256 input channels, 64 output channels, a scale factor of 2 and an upsampling convolution kernel size of 3.
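A PyTorch sketch of this pixel reorganization head is shown below. It reuses the ResidualBlock class from the feature-extraction sketch above, and the layer layout follows the description only approximately (a separate channel-reduction convolution is placed in front of the reconstruction blocks); the names and defaults are assumptions.

import torch
import torch.nn as nn

class PixelReorganization(nn.Module):
    # Reconstruction residual blocks, two conv + 2x pixel-shuffle stages (4x in total),
    # and a final pair of convolutions producing a 3-channel residual image.
    def __init__(self, in_channels=320, channels=64, num_blocks=5):
        super().__init__()
        self.reduce = nn.Conv2d(in_channels, channels, 3, padding=1)
        self.recon = nn.Sequential(*[ResidualBlock(channels) for _ in range(num_blocks)])  # from earlier sketch
        self.up1 = nn.Conv2d(channels, channels * 4, 3, padding=1)   # 64 -> 256, feeds the first shuffle
        self.up2 = nn.Conv2d(channels, channels * 4, 3, padding=1)   # 64 -> 256, feeds the second shuffle
        self.shuffle = nn.PixelShuffle(2)
        self.conv_hr = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv_out = nn.Conv2d(channels, 3, 3, padding=1)
        self.lrelu = nn.LeakyReLU(0.1, inplace=True)

    def forward(self, aligned_feat):
        x = self.recon(self.reduce(aligned_feat))
        x = self.lrelu(self.shuffle(self.up1(x)))   # first 2x upsampling
        x = self.lrelu(self.shuffle(self.up2(x)))   # second 2x upsampling (4x total)
        x = torch.relu(self.conv_hr(x))
        return self.conv_out(x)                     # 3-channel residual features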
Step 7, the input low-resolution frame image is interpolated and added to the residual features of the high-resolution frame image obtained in step 6 to obtain the final reconstructed image.
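This final step amounts to a bicubic upsampling plus a residual addition, for example (a sketch assuming an (N, C, H, W) tensor layout and a 4x overall scale):

import torch.nn.functional as F

def reconstruct(lr_frame, residual, scale=4):
    # Step 7: bicubic upsampling of the low-resolution frame plus the learned residual.
    upsampled = F.interpolate(lr_frame, scale_factor=scale, mode='bicubic', align_corners=False)
    return upsampled + residual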
In a specific embodiment, bicubic interpolation is used to interpolate the low resolution frame image, and PSNR and SSIM are used as objective evaluation indexes, where the evaluation indexes are obtained by performing average calculation on all frames of each tested satellite video, and testing is performed in three satellite video data sets, the objective evaluation results are shown in table 1, and the visualization results are shown in fig. 8, 9 and 10, respectively.
Table 1. Objective evaluation results for different super-resolution magnifications on the three satellite video datasets
Fig. 8 shows the visual results of 2x super-resolution reconstruction on the three satellite video datasets. In FIG. 8, A is the original frame image of scene 033 of dataset 1, A1 shows an enlarged detail of the original frame image, and A2 shows the 2x super-resolution reconstruction result of the corresponding position; B is the original frame image of scene 000 of dataset 2, B1 shows an enlarged detail of the original frame image, and B2 shows the 2x super-resolution reconstruction result of the corresponding position; C is the original frame image of scene 001 of dataset 3, C1 shows an enlarged detail of the original frame image, and C2 shows the 2x super-resolution reconstruction result of the corresponding position.
Fig. 9 shows the visual results of 3x super-resolution reconstruction on the three satellite video datasets. In FIG. 9, A is the original frame image of scene 026 of dataset 1, A1 shows an enlarged detail of the original frame image, and A2 shows the 3x super-resolution reconstruction result of the corresponding position; B is the original frame image of scene 016 of dataset 2, B1 shows an enlarged detail of the original frame image, and B2 shows the 3x super-resolution reconstruction result of the corresponding position; C is the original frame image of scene 001 of dataset 3, C1 shows an enlarged detail of the original frame image, and C2 shows the 3x super-resolution reconstruction result of the corresponding position.
Fig. 10 shows the visual results of 4x super-resolution reconstruction on the three satellite video datasets. In FIG. 10, A is the original frame image of scene 030 of dataset 1, A1 shows an enlarged detail of the original frame image, and A2 shows the 4x super-resolution reconstruction result of the corresponding position; B is the original frame image of scene 001 of dataset 2, B1 shows an enlarged detail of the original frame image, and B2 shows the 4x super-resolution reconstruction result of the corresponding position; C is the original frame image of scene 005 of dataset 3, C1 shows an enlarged detail of the original frame image, and C2 shows the 4x super-resolution reconstruction result of the corresponding position.
In summary, the invention addresses the poor reconstruction quality caused by multi-factor degradation of satellite video data in real scenes by modelling various noises in the satellite video random degradation module. The high-order grid bidirectional feature propagation scheme propagates feature information alternately forward and backward in time, so that information from different frames can be revisited and the features refined, improving their expressive power. Temporal features are fused during feature alignment, which helps the model select important information from adjacent frames and reduces erroneous information, thereby using longer image sequences more effectively. Finally, experiments on three satellite video datasets demonstrate the effectiveness of the method both qualitatively and quantitatively.
On the other hand, the embodiment of the invention also provides a satellite video super-resolution reconstruction system for multiple degradation processes, comprising:
a processor and a memory, wherein the memory is used for storing program instructions, and the processor is used for calling the instructions stored in the memory to execute the satellite video super-resolution reconstruction method for multiple degradation processes according to the above technical scheme.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1.一种面向多重退化过程的卫星视频超分重建方法,其特征在于,包括如下步骤:1. A satellite video super-resolution reconstruction method for multiple degradation processes, characterized in that it comprises the following steps: 步骤S1,获取卫星视频数据集;Step S1, obtaining a satellite video data set; 步骤S2,构建面向多重退化过程的卫星视频超分重建模型,包括卫星视频随机退化模块、特征提取模块、特征对齐模块和像素重组模块;Step S2, constructing a satellite video super-resolution reconstruction model for multiple degradation processes, including a satellite video random degradation module, a feature extraction module, a feature alignment module and a pixel reconstruction module; 步骤S3,卫星视频数据集中的卫星视频数据通过卫星视频随机退化模块,得到低分辨率帧图像,并进行数据增强操作,得到低分辨率帧数据;Step S3, the satellite video data in the satellite video data set is subjected to a satellite video random degradation module to obtain a low-resolution frame image, and a data enhancement operation is performed to obtain low-resolution frame data; 步骤S4,将步骤S3得到的低分辨率帧数据经过特征提取模块,得到帧图像特征信息;Step S4, passing the low-resolution frame data obtained in step S3 through a feature extraction module to obtain frame image feature information; 步骤S5,将步骤S4得到的帧图像特征信息输入特征对齐模块,对输入的特征进行传播、聚合以及对齐操作,得到对齐后的特征;Step S5, inputting the frame image feature information obtained in step S4 into a feature alignment module, performing propagation, aggregation and alignment operations on the input features to obtain aligned features; 步骤S6,将步骤S5得到的对齐后的特征通过像素重组模块,得到高分辨率图像的残差特征;Step S6, passing the aligned features obtained in step S5 through a pixel reorganization module to obtain residual features of the high-resolution image; 步骤S7,将步骤S3得到的低分辨率帧图像进行上采样操作,再与步骤S6得到的高分辨率图像的残差特征进行相加,得到最终的高分辨率重建图像。Step S7, upsampling the low-resolution frame image obtained in step S3, and then adding the residual features of the high-resolution image obtained in step S6 to obtain a final high-resolution reconstructed image. 2.根据权利要求1所述的一种面向多重退化过程的卫星视频超分重建方法,其特征在于:步骤S3中所述的卫星视频随机退化模块包括一个下采样模块、四个随机退化模块和一个视频压缩模块;下采样模块采用最邻近插值、双线性插值或双三次插值中的一种下采样方式对图像进行下采样处理;四个随机退化模块中,每个随机退化模块采用随机模糊、运动模糊、随机噪声或传感器震颤中的一种退化模式对图像进行退化处理;视频压缩模块采用帧内压缩或帧间压缩实现,其中帧内压缩包括JPEG压缩、JPEG 2000压缩、PNG压缩、WEBP压缩、BMP压缩、TIFF压缩、BPG压缩和FLIF压缩,帧间压缩包括H.263、H.264、H.265、MJPEG、MJPEG-2、MPEG-4、VP8、VP9和AV1。2. A satellite video super-resolution reconstruction method for multiple degradation processes according to claim 1, characterized in that: the satellite video random degradation module described in step S3 includes a downsampling module, four random degradation modules and a video compression module; the downsampling module uses a downsampling method selected from nearest neighbor interpolation, bilinear interpolation or bicubic interpolation to downsample the image; each of the four random degradation modules uses a degradation mode selected from random blur, motion blur, random noise or sensor tremor to degrade the image; the video compression module is implemented by intra-frame compression or inter-frame compression, wherein the intra-frame compression includes JPEG compression, JPEG 2000 compression, PNG compression, WEBP compression, BMP compression, TIFF compression, BPG compression and FLIF compression, and the inter-frame compression includes H.263, H.264, H.265, MJPEG, MJPEG-2, MPEG-4, VP8, VP9 and AV1. 3.根据权利要求1所述的一种面向多重退化过程的卫星视频超分重建方法,其特征在于:步骤S4中所述的特征提取模块包括K个残差块,每个残差块包括两个卷积层和一个激活层,残差块之间通过跳跃连接的方式进行连接。3. 
The satellite video super-resolution reconstruction method for multiple degradation processes according to claim 1 is characterized in that: the feature extraction module described in step S4 includes K residual blocks, each residual block includes two convolutional layers and one activation layer, and the residual blocks are connected by skip connections. 4.根据权利要求1所述的一种面向多重退化过程的卫星视频超分重建方法,其特征在于:步骤S5中所述的特征对齐模块用于对步骤S4得到的帧图像特征信息进行信息的传播和聚合,通过高阶格网传播方案进行传播,这个过程分为三个阶段:第一个阶段将特征沿时间递增进行前向传播,第二个阶段是将特征沿时间递减进行后向传播,第三个阶段是将目标帧信息与前后两帧的特征信息进行连接;所述特征对齐模块包括一个残差模块,一个基于光流引导的特征对齐模块和一个时空特征融合模块;残差模块包含K个残差块,每个残差块包括两个卷积层和一个激活层;基于光流引导的特征对齐模块是引入光流信息来辅助可变形卷积进行特征对齐操作;时空特征融合模块包括三个卷积层,一个时间注意力机制和一个空间注意力机制。4. According to claim 1, a satellite video super-resolution reconstruction method for multiple degradation processes is characterized in that: the feature alignment module described in step S5 is used to propagate and aggregate the frame image feature information obtained in step S4, and propagate it through a high-order grid propagation scheme. This process is divided into three stages: the first stage propagates the features forward along the time increment, the second stage propagates the features backward along the time decrement, and the third stage connects the target frame information with the feature information of the previous and next two frames; the feature alignment module includes a residual module, a feature alignment module based on optical flow guidance and a spatiotemporal feature fusion module; the residual module includes K residual blocks, each residual block includes two convolutional layers and an activation layer; the feature alignment module based on optical flow guidance introduces optical flow information to assist deformable convolution in feature alignment operation; the spatiotemporal feature fusion module includes three convolutional layers, a temporal attention mechanism and a spatial attention mechanism. 5.根据权利要求4所述的一种面向多重退化过程的卫星视频超分重建方法,其特征在于:特征对齐模块的计算过程如下:5. The satellite video super-resolution reconstruction method for multiple degradation processes according to claim 4 is characterized in that the calculation process of the feature alignment module is as follows: 针对后向传播,首先用一个残差模块对提取的特征信息进一步进行提取:For backward propagation, a residual module is first used to further extract the extracted feature information: (1) (1) 其中,表示残差模块,表示通道维度的级联操作,表示经过特征提取模块所提取的特征信息,表示前向传播在第i个时间时计算的特征;此外,将目标帧的前后两帧进行对齐:in, represents the residual module, represents the cascade operation of the channel dimension, Represents the feature information extracted by the feature extraction module. Represents the features calculated by the forward propagation at the i-th time; in addition, the two frames before and after the target frame are aligned: (2) (2) 其中,表示对齐后的特征,分别表示后向传播的第i-2,i-1,i+1和i+2时刻的特征,表示基于光流引导的可变形特征对齐模块;最后通过一个时空特征融合模块,得到最终的特征图:in, represents the aligned features, , , and Respectively represent the features of the i-2th, i-1th, i+1th and i+2th moments of the backward propagation, It represents the deformable feature alignment module guided by optical flow; finally, a spatiotemporal feature fusion module is used to obtain the final feature map: (3) (3) 其中,表示最终的特征图,表示经过残差模块提取到的更深层的特征信息,表示时空特征融合模块。in, represents the final feature map, Represents the deeper feature information extracted by the residual module. Represents the spatiotemporal feature fusion module. 6.根据权利要求5所述的一种面向多重退化过程的卫星视频超分辨率重建方法,其特征在于:在基于光流引导的可变形特征对齐模块中,首先采用光流估计网络来计算光流图,用分别表示光流从第帧到第帧的映射,对目标帧第i帧的前后两帧图像均进行扭曲:6. 
The satellite video super-resolution reconstruction method for multiple degradation processes according to claim 5 is characterized in that: in the deformable feature alignment module based on optical flow guidance, an optical flow estimation network is first used to calculate the optical flow map, and then , , , Respectively represent the optical flow from , , , Frame to Frame mapping, distorting the two frames before and after the target frame i: (4) (4) (5) (5) (6) (6) (7) (7) 其中,分别表示第时刻的光流信息进行空间扭曲后的特征,表示空间扭曲操作,然后使用预对齐的特征计算光流残差和可变形卷积掩膜in, , , , Respectively represent , , , The characteristics of the optical flow information at each moment after spatial distortion, Represents the spatial warping operation, and then uses the pre-aligned features to calculate the optical flow residual and deformable convolutional masks : (8) (8) (9) (9) 其中,表示通道连接,表示卷积计算,表示激活函数,最后通过可变形卷积得到对齐后的特征in, Indicates channel connection, and represents the convolution calculation, Represents the activation function, and finally the aligned features are obtained through deformable convolution : (10) (10) 其中,DCN表示可变形卷积操作。Among them, DCN represents deformable convolution operation. 7.根据权利要求6所述的一种面向多重退化过程的卫星视频超分辨率重建方法,其特征在于:在时空特征融合模块中,首先对当前帧提取的深层特征信息和后续帧对齐后得到的特征图进行卷积运算,计算嵌入后的相似距离7. The satellite video super-resolution reconstruction method for multiple degradation processes according to claim 6 is characterized in that: in the spatiotemporal feature fusion module, the deep feature information extracted from the current frame is firstly Feature map obtained after alignment with subsequent frames Perform convolution operation to calculate the similarity distance after embedding : (11) (11) 其中,表示激活函数,conv表示卷积计算,表示点积;然后对相似距离进行时间注意力机制处理,对于每个空间位置,时间注意力都是具有空间异性的;in, represents the activation function, conv represents the convolution calculation, represents the dot product; then the similar distance is processed by the temporal attention mechanism, and for each spatial position, the temporal attention is spatially heterogeneous; (12) (12) 其中,表示时间注意力图,表示时间注意力处理;再将对齐后的特征图与时间注意力图逐像素相乘,通过卷积层得到时间注意力融合处理的特征图in, represents the temporal attention map, Represents the temporal attention processing; then the aligned feature map is multiplied pixel by pixel with the temporal attention map, and the feature map of the temporal attention fusion processing is obtained through the convolution layer : (13) (13) 其中,表示逐像素点积运算,表示融合卷积运算;最后再对融合后的特征进行空间注意力机制处理,以增强纹理信息,得到最终的特征图in, represents pixel-by-pixel dot product operation, Represents the fused convolution operation; finally, the fused features are processed by the spatial attention mechanism to enhance the texture information and obtain the final feature map : (14) (14) 其中,SA表示空间注意力操作,表示逐元素相加。Among them, SA represents the spatial attention operation, Represents element-by-element addition. 8.根据权利要求1所述的一种面向多重退化过程的卫星视频超分辨率重建方法,其特征在于:步骤S6中所述的像素重组模块包括一个重建层、四个卷积层、两个像素洗牌层和三个激活层;重建层由K个具有相同结构的残差块组成,每个残差块包括两个卷积层和一个激活层;两个像素洗牌层采用像素洗牌策略;第一个和第二个激活层均采用Leaky ReLU激活函数,第三个激活层采用ReLU激活函数。8. 
8. The satellite video super-resolution reconstruction method for multiple degradation processes according to claim 1, characterized in that: the pixel reorganization module in step S6 comprises one reconstruction layer, four convolutional layers, two pixel shuffle layers and three activation layers; the reconstruction layer consists of K residual blocks of identical structure, each residual block comprising two convolutional layers and one activation layer; the two pixel shuffle layers adopt the pixel shuffle strategy; the first and second activation layers both use the Leaky ReLU activation function, and the third activation layer uses the ReLU activation function.

9. The satellite video super-resolution reconstruction method for multiple degradation processes according to claim 1, characterized in that: the upsampling operation in step S7 uses one of nearest-neighbor interpolation, bilinear interpolation and bicubic interpolation.

10. A satellite video super-resolution reconstruction system for multiple degradation processes, characterized in that it comprises: a processor and a memory, wherein the memory is used to store program instructions and the processor is used to call the instructions stored in the memory to execute the satellite video super-resolution reconstruction method for multiple degradation processes according to any one of claims 1 to 9.
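As an illustration of claims 8 and 9, the following sketch shows a 4x pixel-shuffle reconstruction head with one reconstruction layer, four convolutional layers, two pixel shuffle layers and three activations (LeakyReLU, LeakyReLU, ReLU), followed by a bicubic-upsampled skip over the input frame. PyTorch is assumed; the exact layer ordering, the channel widths and the use of the upsampled frame as a global skip connection are assumptions of the sketch, and all names are illustrative.

```python
# Hypothetical sketch of the pixel reorganization module of claim 8 and the
# upsampling of claim 9. PyTorch is assumed; layer ordering, channel widths and
# the global bicubic skip are illustrative assumptions, not the patent's design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResBlock(nn.Module):
    """Residual block: two convolutions, one activation, skip connection."""
    def __init__(self, num_feat=64):
        super().__init__()
        self.conv1 = nn.Conv2d(num_feat, num_feat, 3, 1, 1)
        self.conv2 = nn.Conv2d(num_feat, num_feat, 3, 1, 1)
        self.act = nn.LeakyReLU(0.1, inplace=True)

    def forward(self, x):
        return x + self.conv2(self.act(self.conv1(x)))

class PixelReorganization(nn.Module):
    """Reconstruction layer (K residual blocks), then 4x upscaling by two
    pixel-shuffle stages, matching the layer counts recited in claim 8."""
    def __init__(self, num_feat=64, num_blocks=5, out_channels=3):  # K = num_blocks
        super().__init__()
        self.recon = nn.Sequential(*[ResBlock(num_feat) for _ in range(num_blocks)])
        self.conv_up1 = nn.Conv2d(num_feat, num_feat * 4, 3, 1, 1)   # convolution 1
        self.conv_up2 = nn.Conv2d(num_feat, num_feat * 4, 3, 1, 1)   # convolution 2
        self.conv_hr = nn.Conv2d(num_feat, num_feat, 3, 1, 1)        # convolution 3
        self.conv_last = nn.Conv2d(num_feat, out_channels, 3, 1, 1)  # convolution 4
        self.shuffle = nn.PixelShuffle(2)                            # used twice -> 4x
        self.lrelu = nn.LeakyReLU(0.1, inplace=True)                 # activations 1 and 2
        self.relu = nn.ReLU(inplace=True)                            # activation 3

    def forward(self, feat, lr_frame):
        x = self.recon(feat)
        x = self.lrelu(self.shuffle(self.conv_up1(x)))
        x = self.lrelu(self.shuffle(self.conv_up2(x)))
        x = self.relu(self.conv_hr(x))
        x = self.conv_last(x)
        # Claim 9 allows nearest, bilinear or bicubic upsampling in step S7;
        # adding it as a global skip connection is an assumption of this sketch.
        up = F.interpolate(lr_frame, scale_factor=4, mode="bicubic", align_corners=False)
        return x + up
```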
CN202510161817.5A 2025-02-14 2025-02-14 Satellite video super-resolution reconstruction method and system for multiple degradation process Active CN119624782B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510161817.5A CN119624782B (en) 2025-02-14 2025-02-14 Satellite video super-resolution reconstruction method and system for multiple degradation process

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202510161817.5A CN119624782B (en) 2025-02-14 2025-02-14 Satellite video super-resolution reconstruction method and system for multiple degradation process

Publications (2)

Publication Number Publication Date
CN119624782A true CN119624782A (en) 2025-03-14
CN119624782B CN119624782B (en) 2025-05-02

Family

ID=94908877

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510161817.5A Active CN119624782B (en) 2025-02-14 2025-02-14 Satellite video super-resolution reconstruction method and system for multiple degradation process

Country Status (1)

Country Link
CN (1) CN119624782B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116468605A (en) * 2023-04-12 2023-07-21 西安电子科技大学 Video super-resolution reconstruction method based on time-space layered mask attention fusion
CN116527833A (en) * 2023-07-03 2023-08-01 清华大学 High-definition video generation method and system based on superdivision model
CN117689541A (en) * 2023-12-07 2024-03-12 重庆邮电大学 Multi-region classification video super-resolution reconstruction method with temporal redundancy optimization

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
卜丽静; 郑新杰; 肖一鸣; 张正鹏: "Super-resolution reconstruction of satellite video images with an improved SA method" (改进SA方法的卫星视频图像超分辨率重建), 测绘科学技术学报 (Journal of Geomatics Science and Technology), no. 01, 3 May 2017 (2017-05-03) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120510485A (en) * 2025-07-22 2025-08-19 清华大学 Multimodal video fusion method, device, equipment and medium for wide-area visual network

Also Published As

Publication number Publication date
CN119624782B (en) 2025-05-02

Similar Documents

Publication Publication Date Title
Yin et al. AFBNet: A Lightweight Adaptive Feature Fusion Module for Super-Resolution Algorithms.
CN111539879A (en) Video blind denoising method and device based on deep learning
CN111028150A (en) A fast spatiotemporal residual attention video super-resolution reconstruction method
CN106127688B (en) A super-resolution image reconstruction method and system thereof
Tang et al. Deep inception-residual Laplacian pyramid networks for accurate single-image super-resolution
Dai et al. FreqFormer: Frequency-aware transformer for lightweight image super-resolution
KR102221225B1 (en) Method and Apparatus for Improving Image Quality
CN115496663B (en) Video super-resolution reconstruction method based on D3D convolutional intra-group fusion network
CN112419150A (en) A Super-resolution Reconstruction Method for Images with Arbitrary Multiples Based on Bilateral Upsampling Network
CN113469884A (en) Video super-resolution method, system, equipment and storage medium based on data simulation
CN110246084A (en) A kind of super-resolution image reconstruction method and its system, device, storage medium
CN111767679B (en) Method and device for processing time-varying vector field data
Chen et al. Single-image super-resolution using multihypothesis prediction
CN118333860B (en) Residual enhancement type frequency space mutual learning face super-resolution method
CN117575915A (en) An image super-resolution reconstruction method, terminal equipment and storage medium
CN110689509A (en) Video super-resolution reconstruction method based on cyclic multi-column 3D convolutional network
CN112435165B (en) Two-stage video super-resolution reconstruction method based on generation countermeasure network
Zhang et al. Non‐local feature back‐projection for image super‐resolution
CN117253126A (en) Mixed architecture image reconstruction method for global fusion cross self-attention network
Sun et al. A rapid and accurate infrared image super-resolution method based on zoom mechanism
CN119151787B (en) Transformer single image super-resolution reconstruction method based on trans-scale token interaction
Zhang et al. Iterative multi‐scale residual network for deblurring
CN118710542A (en) A video deblurring method based on spectral attention and feature shifting
Dong et al. Dynamic scene reconstruction for color spike camera via zero-shot learning
CN119624782B (en) Satellite video super-resolution reconstruction method and system for multiple degradation process

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant