Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
It is to be noted that, unless otherwise specified, technical or scientific terms used herein shall have the ordinary meaning as understood by those skilled in the art to which this application belongs.
In addition, the terms "first" and "second", etc. are used to distinguish different objects, rather than to describe a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
To further illustrate aspects of embodiments of the present application, reference is made to the following description taken in conjunction with the accompanying drawings. It is to be understood that, in the following embodiments, the same or corresponding contents may be mutually referred to, and for simplicity and convenience of description, the subsequent descriptions are not repeated.
Embodiments of the present application provide a deblocking filtering method and apparatus, an electronic device, and a computer-readable storage medium, which are described below with reference to the accompanying drawings.
Referring to fig. 1, which shows a flowchart of a deblocking filtering method according to some embodiments of the present application, the method can be applied to the deblocking (DBK) module of a hardware encoder, for example an AVS3 (third-generation Audio Video coding Standard) hardware encoder.
As shown in fig. 1, the deblocking filtering method may include the following steps:
step S101: dividing a current video frame according to a Largest Coding Unit (LCU) to obtain a plurality of LCUs;
step S102: dividing the current LCU into a plurality of sub-blocks according to the raster scanning sequence and the filtering dependency relationship between adjacent LCUs;
step S103: performing deblocking filtering on those sub-blocks, among the plurality of sub-blocks, that satisfy the filtering dependency relationship, and storing the remaining sub-blocks in corresponding line caches so that each is processed together with a later LCU with which it has a filtering dependency relationship when that LCU is filtered; the sub-blocks satisfying the filtering dependency relationship refer to sub-blocks having a filtering dependency relationship with a previous neighboring LCU.
The above steps will be described in detail.
In order to implement deblocking filtering in a hardware encoder, the frame-level buffer needs to be replaced with an LCU-level buffer.
With a frame-level buffer, the original image of the whole frame is stored in one buffer space, so the pixel values and other coding information at the corresponding positions of each coding block in each area can be obtained at any time. In a hardware environment, a buffer of that size cannot be provided; moreover, to improve throughput, the LCUs are traversed one by one. The buffer space is therefore changed to an LCU-level buffer, which is why the video frame is divided according to LCUs in step S101. As shown in fig. 2, for example, a video frame of size 128x128 with an LCU size of 64x64 is divided into 4 LCUs. Because the original order of vertical filtering followed by horizontal filtering cannot be changed (otherwise a standard decoder cannot decode the stream), this change introduces strong data dependencies.
In step S102, the raster scanning order runs from left to right and from top to bottom: one row is scanned first, and scanning then continues from the starting position of the next row. In the present application, the rows of LCUs in a video frame are deblocking-filtered in raster scan order.
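As an illustration only, the division of step S101 and the raster traversal order can be sketched in Python as follows. The function name `divide_into_lcus`, the `(x, y, width, height)` tuple layout, and the clipping of LCUs at the right and bottom frame edges are assumptions made for this sketch and are not taken from the embodiments.

```python
def divide_into_lcus(frame_width, frame_height, lcu_size):
    """Return one (x, y, width, height) tuple per LCU, in raster-scan order.

    Hypothetical helper: edge LCUs are clipped to the frame, which is an
    assumption of this sketch rather than something the description states.
    """
    lcus = []
    for y in range(0, frame_height, lcu_size):      # top to bottom
        for x in range(0, frame_width, lcu_size):   # left to right within a row
            width = min(lcu_size, frame_width - x)
            height = min(lcu_size, frame_height - y)
            lcus.append((x, y, width, height))
    return lcus


# The 128x128 frame with 64x64 LCUs from fig. 2 yields 4 LCUs.
for index, lcu in enumerate(divide_into_lcus(128, 128, 64)):
    print(index, lcu)
```

The list order is the order in which the LCUs would be buffered and filtered, one per cycle.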
As shown in fig. 2, after the change to LCU-level buffering, one LCU is buffered at a time for deblocking filtering. In the present application, the LCU is further divided into a plurality of sub-blocks according to the filtering dependency relationship; that is, the buffer space of the LCU is divided into four parts, 1, 2, 3 and 4, and deblocking filtering is performed on the pixel regions of the sub-blocks. First, vertical and horizontal filtering of sub-block 1 is performed. Vertical filtering of sub-block 2 depends on sub-block 5, whose pixel values become available only in the next cycle, when the second LCU (5, 6, 7, 8) is processed; sub-block 2 must therefore wait for the next cycle and be filtered together with the second LCU. Similarly, sub-block 6 depends on the next LCU and is filtered together with it. Vertical filtering of sub-block 3 has no dependency, but its horizontal filtering depends on the pixel values of sub-block 9, so its filtering can be completed only when the loop reaches the LCU directly below it in the next row of LCUs.
In particular, sub-block 4 has a dual dependency, on both sub-block 7 and sub-block 10: its vertical filtering depends on sub-block 7 and its horizontal filtering depends on sub-block 10. In turn, the horizontal filtering of sub-block 7 and the vertical filtering of sub-block 10 both depend on sub-block 13. It is therefore necessary to wait until the LCU (13, 14, 15, 16) in the next row is filtered, at which point the filtering of sub-blocks 4, 7 and 10 is completed together with it.
As shown in fig. 2, the sizes of the sub-blocks 1, 2, 3, 4 are determined according to the encoding standard.
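The four-way division can be sketched as follows. This is a sketch under stated assumptions: it uses the 64x64 LCU of fig. 2 and a 4-sample border strip, which matches the sub-block sizes quoted in this description for the 128x128 example. The function name `split_lcu` and the `(x, y, width, height)` layout are hypothetical; the description lists sizes such as 60x4 without stating whether width or height comes first, so the orientation of the strips follows only the stated positions (sub-block 2 to the right, sub-block 3 below).

```python
def split_lcu(lcu_size=64, margin=4):
    """Split one LCU into sub-blocks 1..4 as (x, y, width, height) tuples.

    margin is the assumed thickness of the right and bottom strips whose
    filtering has to wait for neighboring LCUs.
    """
    inner = lcu_size - margin
    return {
        1: (0, 0, inner, inner),           # interior: can be filtered at once
        2: (inner, 0, margin, inner),      # right strip: waits for the LCU to the right
        3: (0, inner, inner, margin),      # bottom strip: waits for the LCU below
        4: (inner, inner, margin, margin), # corner: waits for both directions
    }


sub_blocks = split_lcu()
covered = sum(w * h for (_, _, w, h) in sub_blocks.values())
print(sub_blocks, covered == 64 * 64)
```

The four regions tile the LCU without overlap, so every sample belongs to exactly one sub-block.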
In step S103, as shown in fig. 2, for a video frame with a resolution of 128 × 128, the filtering order of the present application is as follows:
1. first, vertical and horizontal filtering is performed on sub-block 1 (60x60), and sub-blocks 2, 3 and 4 are cached separately;
2. in the next cycle, the second LCU is entered, sub-block 2 is fetched, the vertical and horizontal filtering of sub-blocks 2, 5 and 6 is completed, and sub-blocks 7 and 8 are stored;
3. the first LCU in the next row is entered, sub-block 3 is fetched, the vertical and horizontal filtering of sub-blocks 3, 9 and 11 is completed, and sub-blocks 10 and 12 are stored;
4. the second LCU (13, 14, 15, 16) in the second row is entered, sub-blocks 4, 7, 8, 10 and 12 are fetched, and the sub-blocks (4, 7, 8, 10, 13, 14, 12, 15, 16) are jointly filtered vertically and horizontally.
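The four steps above can be reproduced with a small scheduling sketch. It assumes, as the step list implies, that a right strip completes when the LCU to its right is processed, a bottom strip when the LCU below is processed, a corner when the diagonal LCU is processed, and that a strip lying on the frame edge has nothing to wait for. The function name `filtering_schedule` and the numbering rule (sub-blocks 4i+1 to 4i+4 for the i-th LCU in raster order) are assumptions chosen to match fig. 2.

```python
def filtering_schedule(rows, cols):
    """Map each LCU index (raster order) to the sub-blocks completed there."""
    schedule = {i: [] for i in range(rows * cols)}
    for r in range(rows):
        for c in range(cols):
            base = 4 * (r * cols + c)
            right = min(c + 1, cols - 1)   # no right neighbor at the frame edge
            below = min(r + 1, rows - 1)   # no lower neighbor at the frame edge
            finish_at = {
                1: (r, c),          # interior sub-block
                2: (r, right),      # right strip
                3: (below, c),      # bottom strip
                4: (below, right),  # bottom-right corner
            }
            for kind, (fr, fc) in finish_at.items():
                schedule[fr * cols + fc].append(base + kind)
    return {i: sorted(blocks) for i, blocks in schedule.items()}


# 128x128 frame with 64x64 LCUs: a 2x2 grid of LCUs.
for lcu_index, blocks in filtering_schedule(2, 2).items():
    print(f"LCU {lcu_index + 1}: filter sub-blocks {blocks}")
# LCU 1: filter sub-blocks [1]
# LCU 2: filter sub-blocks [2, 5, 6]
# LCU 3: filter sub-blocks [3, 9, 11]
# LCU 4: filter sub-blocks [4, 7, 8, 10, 12, 13, 14, 15, 16]
```

Any sub-block not listed for the LCU that contains it is the one that would be held in a line cache until its completing LCU arrives.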
In some embodiments of the present application, the plurality of sub-blocks includes a first sub-block, a second sub-block, a third sub-block, and a fourth sub-block; the second sub-block is located at the right side of the first sub-block, the third sub-block is located at the lower side of the first sub-block, and the fourth sub-block is located at the lower right side of the first sub-block, as shown in fig. 2 as 1, 2, 3 and 4.
Before step S103, the method further includes:
setting a corresponding number of line caches for the second sub-block, the third sub-block and the fourth sub-block according to the size of the video frame and the size of the LCU, which specifically includes:
setting the number of line caches corresponding to the second sub-block to be 1;
setting the number of line caches corresponding to the third sub-block to be A/B;
setting the number of line caches corresponding to the fourth sub-block to be A/B + 1;
where A represents the width of the video frame and B represents the width of the LCU.
The width of the video frame in fig. 2 is 128 and the width of the LCU is 64.
For example, the AVS3 standard specifies that the order of vertical and horizontal filtering cannot be changed; otherwise the stream cannot be decoded by a standard decoder. Changing the frame-level cache to an LCU-level cache therefore introduces many data dependencies that need to be resolved. They can be resolved by setting line caches (line buffers): by caching the filtering boundaries and reusing and updating the caches to a certain extent, the data dependency of filtering across the boundaries of adjacent blocks can be resolved at as low a storage cost as possible. Dedicated line caches are therefore required to store sub-blocks 2, 3 and 4. Sub-block 2 (size 60x4) requires only one line cache, whereas sub-block 3 (size 4x60) has to wait for one row of LCUs and therefore requires 128/64 (i.e. 2) line caches; sub-block 4 (size 4x4) requires 128/64+1 (i.e. 3) line caches. With these line caches, the multiple data dependency problem can be resolved.
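The line cache counts can be computed directly from the formulas above. The following is a minimal sketch; the function name `line_cache_counts` and the dictionary keys are hypothetical, and the sketch assumes the frame width A is an exact multiple of the LCU width B, since the description does not say how A/B is rounded otherwise.

```python
def line_cache_counts(frame_width, lcu_width):
    """Return the number of line caches per sub-block type (A = frame width, B = LCU width)."""
    if frame_width % lcu_width != 0:
        # Assumption of this sketch: non-multiple widths are not covered.
        raise ValueError("frame width must be a multiple of the LCU width")
    lcus_per_row = frame_width // lcu_width  # A/B
    return {
        "second_sub_block": 1,                # consumed by the very next LCU
        "third_sub_block": lcus_per_row,      # held for one full row of LCUs
        "fourth_sub_block": lcus_per_row + 1, # held one LCU longer than the third
    }


print(line_cache_counts(128, 64))
# {'second_sub_block': 1, 'third_sub_block': 2, 'fourth_sub_block': 3}
```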
Existing schemes are based on H.264 or HEVC (High Efficiency Video Coding) and differ from the DBK scheme in the AVS3 standard. In AVS3, the partitioning modes and filtering scales of coding blocks are more complex, and the maximum complexity of the optimal partition tree of a coding block is higher, so the existing methods cannot meet the real-time performance requirement of a DBK hardware module under AVS3.
In some embodiments of the present application, step S103 may be implemented as: performing deblocking filtering on the current sub-block and the sub-blocks having a filtering dependency relationship with the current sub-block by using multiple filtering kernels.
Further, performing deblocking filtering on the current sub-block and the sub-blocks having a filtering dependency relationship with the current sub-block by using multiple filtering kernels may be implemented as:
dividing the current sub-block and the sub-blocks having a filtering dependency relationship with the current sub-block into a luminance component and a chrominance component;
and performing deblocking filtering on the luminance component and the chrominance component simultaneously, each by using multiple filtering kernels.
In the present application, in order to improve the efficiency of deblocking filtering, an efficient multi-level parallel pipeline scheme is adopted, which includes parallelism across multiple filtering kernels and parallelism between luminance and chrominance filtering.
Parallelism of multiple filtering kernels: for example, 4 filter kernels are provided, each of which processes one quarter of an LCU; the filter function is divided into four parts that are processed in parallel, so that the filtering of the complete LCU is performed synchronously. The number of filter kernels can be chosen according to the actual situation.
Parallelism of luminance and chrominance filtering: in a conventional implementation, the luminance component is filtered first and the chrominance component is filtered afterwards. In fact, the luminance component pixels and the chrominance component pixels are held in different buffer spaces, and there is no data dependency between them during filtering, so the luminance and chrominance filtering operations can be performed directly in parallel. For example, 4 filter kernels are provided for luminance filtering and 4 for chrominance filtering.
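The two levels of parallelism can be illustrated in software, with the caveat that threads are only an analogy for hardware filter kernels. In the sketch below, `placeholder_filter` is a stand-in that merely copies samples and is not the AVS3 deblocking filter; the names `split_rows` and `filter_lcu`, the row-wise split into quarters, and the 32x32 chroma plane are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_KERNELS = 4  # the example value from the text; it may be chosen freely


def placeholder_filter(rows):
    """Stand-in for one filter kernel: returns a copy of its share of the plane."""
    return [list(row) for row in rows]


def split_rows(plane, parts):
    """Split a plane (list of rows) into `parts` contiguous groups of rows."""
    step = (len(plane) + parts - 1) // parts
    return [plane[i:i + step] for i in range(0, len(plane), step)]


def filter_lcu(luma, chroma):
    """Filter luma and chroma concurrently, each with NUM_KERNELS workers."""
    with ThreadPoolExecutor(max_workers=2 * NUM_KERNELS) as pool:
        # Submit every luma and chroma job before waiting on any result,
        # so the two components are in flight at the same time.
        luma_jobs = [pool.submit(placeholder_filter, part)
                     for part in split_rows(luma, NUM_KERNELS)]
        chroma_jobs = [pool.submit(placeholder_filter, part)
                       for part in split_rows(chroma, NUM_KERNELS)]
        luma_out = [row for job in luma_jobs for row in job.result()]
        chroma_out = [row for job in chroma_jobs for row in job.result()]
    return luma_out, chroma_out


luma = [[y * 64 + x for x in range(64)] for y in range(64)]
chroma = [[128] * 32 for _ in range(32)]
luma_out, chroma_out = filter_lcu(luma, chroma)
print(len(luma_out), len(chroma_out))
```

Because the placeholder ignores samples outside its own share, the sketch shows only how the work is distributed, not how a real kernel would handle edges that cross the boundary between two shares.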
The deblocking filtering method provided by the present application replaces the frame-level cache with an LCU-level cache, resolves the dependency between vertical and horizontal filtering, and realizes an efficient parallel pipeline scheme. The complex data dependencies are resolved by setting a plurality of line caches, and a multi-level parallel pipeline scheme with parallel luminance and chrominance filtering and parallel multiple filtering kernels is provided, so that a pipelined parallel architecture for deblocking filtering can be realized with high concurrency and low resource usage, meeting the performance requirement of real-time processing of high-definition video.
In the foregoing embodiments, a deblocking filtering method is provided, and accordingly, a deblocking filtering apparatus is also provided. Please refer to fig. 3, which illustrates a schematic diagram of a deblocking filter apparatus according to some embodiments of the present application. Since the apparatus embodiments are substantially similar to the method embodiments, they are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for relevant points. The device embodiments described below are merely illustrative.
As shown in fig. 3, the deblocking filter apparatus 10 includes:
an LCU partitioning module 101, configured to partition a current video frame according to a largest coding unit LCU to obtain multiple LCUs;
a sub-block dividing module 102, configured to divide a current LCU into a plurality of sub-blocks according to a raster scanning order and a filtering dependency relationship between adjacent LCUs;
the filtering module 103 is configured to perform deblocking filtering on sub-blocks that satisfy a filtering dependency relationship among the plurality of sub-blocks, and store remaining sub-blocks in corresponding line caches, so as to wait for processing together when an LCU having a filtering dependency relationship is filtered later;
the sub-blocks satisfying the filtering dependency relationship refer to sub-blocks having a filtering dependency relationship with a previous neighboring LCU.
According to some embodiments of the present application, the plurality of sub-blocks includes a first sub-block, a second sub-block, a third sub-block, and a fourth sub-block; the second sub-block is positioned at the right side of the first sub-block, the third sub-block is positioned at the lower side of the first sub-block, and the fourth sub-block is positioned at the lower right side of the first sub-block;
the apparatus further includes:
a setting module, configured to set a corresponding number of line caches for the second sub-block, the third sub-block and the fourth sub-block according to the size of the video frame and the size of the LCU, before the filtering module performs deblocking filtering on the sub-blocks that satisfy the filtering dependency relationship among the plurality of sub-blocks.
According to some embodiments of the present application, the setting module is specifically configured to:
setting the number of line caches corresponding to the second sub-block to be 1;
setting the number of line caches corresponding to the third sub-block to be A/B;
setting the number of line caches corresponding to the fourth sub-block to be A/B + 1;
where A represents the width of the video frame and B represents the width of the LCU.
According to some embodiments of the present application, the filtering module 103 is specifically configured to:
perform deblocking filtering on the current sub-block and the sub-blocks having a filtering dependency relationship with the current sub-block by using multiple filtering kernels.
According to some embodiments of the present application, the filtering module 103 is further configured to:
divide the current sub-block and the sub-blocks having a filtering dependency relationship with the current sub-block into a luminance component and a chrominance component;
and perform deblocking filtering on the luminance component and the chrominance component simultaneously, each by using multiple filtering kernels.
The deblocking filter device provided by the embodiment of the application and the deblocking filter method provided by the previous embodiment of the application have the same beneficial effects based on the same inventive concept.
The present disclosure further provides an electronic device, such as a mobile phone, a notebook computer, a tablet computer, a desktop computer, etc., corresponding to the deblocking filtering method provided in the foregoing embodiments, so as to execute the deblocking filtering method.
Referring to fig. 4, a schematic diagram of an electronic device provided in some embodiments of the present application is shown. As shown in fig. 4, the electronic device 20 includes a processor 200, a memory 201, a bus 202 and a communication interface 203, wherein the processor 200, the communication interface 203 and the memory 201 are connected through the bus 202; the memory 201 stores a computer program that can be executed on the processor 200, and the processor 200 executes the computer program to perform the deblocking filtering method provided in any of the foregoing embodiments.
The Memory 201 may include a high-speed Random Access Memory (RAM) and may further include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. The communication connection between the network element of the system and at least one other network element is realized through at least one communication interface 203 (which may be wired or wireless), and the internet, a wide area network, a local network, a metropolitan area network, and the like can be used.
Bus 202 can be an ISA bus, PCI bus, EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The memory 201 is used for storing a program, and the processor 200 executes the program after receiving an execution instruction, and the deblocking filtering method disclosed in any of the foregoing embodiments of the present application may be applied to the processor 200, or implemented by the processor 200.
The processor 200 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or by instructions in the form of software in the processor 200. The processor 200 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. The processor may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in the memory 201, and the processor 200 reads the information in the memory 201 and completes the steps of the method in combination with its hardware.
The electronic device provided by the embodiment of the present application and the deblocking filtering method provided by the embodiment of the present application have the same inventive concept, and have the same beneficial effects as the method adopted, operated or implemented by the electronic device.
Referring to fig. 5, a computer-readable storage medium according to some embodiments of the present application is shown as an optical disc 30, on which a computer program (i.e., a program product) is stored; when executed by a processor, the computer program performs the deblocking filtering method according to any of the foregoing embodiments.
It should be noted that examples of the computer-readable storage medium may also include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory, or other optical and magnetic storage media, which are not described in detail herein.
The computer-readable storage medium provided by the above-mentioned embodiments of the present application and the deblocking filtering method provided by the embodiments of the present application have the same beneficial effects as the method adopted, executed or implemented by the application program stored in the computer-readable storage medium.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present disclosure, and the present disclosure should be construed as being covered by the claims and the specification.