HK1185482A - Sample Adaptive Offset (SAO) in Accordance with Video Coding

Publication number: HK1185482A
Application number: HK13112702.9A
Authority: HK (Hong Kong)
Other languages: Chinese (zh)
Inventors: 陈培松, 温伟杰
Original assignee: Koninklijke Philips N.V.; application filed by Koninklijke Philips N.V.
Prior art keywords: band, video signal, video, signal, indices

Abstract

Sample adaptive offset (SAO) in accordance with video coding. SAO filtering may be performed before deblocking processing (e.g., in accordance with video signal decoding and/or encoding). For example, a receiver and/or decoder communication device may receive signaling from a transmitter and/or encoder communication device that includes various band offsets. The corresponding band indices may be determined inferentially via analysis of the received video signal (e.g., as received from the transmitter and/or encoder communication device), without requiring signaling of those band indices from the transmitter and/or encoder communication device. One or more largest coding units (LCUs) generated from the video signal are analyzed to determine a pixel value distribution (e.g., using a histogram in one instance); based on that pixel value distribution, the band indices are identified and the band offsets applied thereto.
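By way of non-limiting illustration only, the following sketch (in Python, using hypothetical names such as infer_band_indices and apply_band_offsets) shows one way the decoder-side processing summarized above might be realized, assuming 32 uniform bands and 8-bit samples; these assumptions follow a common SAO convention and are not requirements of the embodiments described herein.

```python
def infer_band_indices(lcu_pixels, band_offsets, num_bands=32, bit_depth=8):
    """Histogram the LCU's pixel values into uniform bands and select the
    most-populated bands, one per signaled offset; no band indices are signaled."""
    band_size = (1 << bit_depth) // num_bands
    hist = [0] * num_bands
    for p in lcu_pixels:
        hist[p // band_size] += 1
    # Bands holding the most pixels receive the signaled offsets.
    ranked = sorted(range(num_bands), key=lambda b: hist[b], reverse=True)
    return sorted(ranked[:len(band_offsets)])

def apply_band_offsets(lcu_pixels, band_indices, band_offsets, num_bands=32, bit_depth=8):
    """Add the signaled offset to each pixel whose band was selected; clip to range."""
    band_size = (1 << bit_depth) // num_bands
    offset_of = dict(zip(sorted(band_indices), band_offsets))
    max_val = (1 << bit_depth) - 1
    return [min(max_val, max(0, p + offset_of.get(p // band_size, 0)))
            for p in lcu_pixels]
```

Note that pairing each signaled offset with a derived band index by ascending index order is itself an illustrative assumption; the point of the sketch is that the band indices are derived from the pixel value distribution rather than signaled.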

Description

Sample adaptive offset (SAO) for video coding
CROSS-REFERENCE TO RELATED PATENT/PATENT APPLICATIONS
Priority is claimed to U.S. provisional patent applications No. 61/597,683, filed February 10, 2012, No. 61/598,326, filed February 13, 2012, and No. 61/603,190, filed February 24, 2012, and to U.S. patent applications No. 13/623,765, filed September 20, 2012, and No. 13/758,169, filed February 4, 2013, all of which are incorporated herein by reference.
Technical Field
The present invention relates generally to digital video processing; and more particularly to processing and operation in accordance with such digital video processing.
Background
Communication systems that operate to transmit digital media (e.g., images, video, data, etc.) have been under development for many years. For communication systems that employ some form of video data, a plurality of digital images is output or displayed at some frame rate (e.g., frames per second) to produce a video signal suitable for output and consumption. In many such communication systems operating with video data, there is a trade-off between throughput (e.g., the number of image frames that can be transferred from a first location to a second location) and the video and/or image quality of the signal that is ultimately output or displayed. Current technology does not adequately or acceptably transfer video data from a first location to a second location while providing sufficient or acceptable video and/or image quality, keeping the overhead associated with the communication relatively small, keeping the complexity of the communication devices at the respective ends of the communication link relatively low, etc.
Disclosure of Invention
(1) An apparatus, comprising:
at least one input for:
receiving a video signal from at least one additional device; and
receiving a plurality of band offsets from the at least one additional apparatus via signaling; and
a processor to:
analyzing a plurality of pixels associated with at least one Largest Coding Unit (LCU) to identify a pixel value distribution, wherein the largest coding unit is associated with the video signal;
identifying a plurality of band indices inferentially based on the pixel value distribution;
applying the plurality of band offsets to the plurality of band indices in accordance with a filtering process of the video signal or a signal based on the video signal;
performing a Sample Adaptive Offset (SAO) filtering process on the video signal or a signal based on the video signal to generate a first filtered signal, wherein the sample adaptive offset filtering process applies the plurality of band offsets to the plurality of band indices; and
performing a deblocking filtering process on the first filtered signal to generate a second filtered signal.
(2) The apparatus of (1), wherein the processor is configured to:
analyzing a plurality of pixels associated with at least one largest coding unit to generate a pixel value histogram representing the pixel value distribution, wherein the largest coding unit is associated with the video signal; and
identifying the plurality of band indices to which the plurality of band offsets are to be applied based on the pixel value histogram.
(3) The apparatus of (1), wherein:
the plurality of band indices have a discontinuous distribution such that at least two consecutive band indices of the plurality of band indices are separated from each other by at least one band index value.
(4) The apparatus of (1), wherein:
the pixel value distribution indicates a plurality of subsets of the plurality of pixels respectively associated with at least a portion of the plurality of band indices; and
the plurality of band indices to which the plurality of band offsets are to be applied correspond to at least one of a plurality of subsets of the plurality of pixels, wherein the at least one subset has a relatively larger or maximum number of pixels compared to other subsets of the plurality of pixels.
(5) The apparatus of (1), wherein:
the apparatus is a communication device operating in at least one of a satellite communication system, a wireless communication system, a wired communication system, a fiber optic communication system, and a mobile communication system.
(6) An apparatus, comprising:
an input for receiving a video signal and a plurality of band offsets from at least one additional device; and
a processor to:
analyzing a plurality of pixels associated with at least one Largest Coding Unit (LCU) associated with the video signal to identify a pixel value distribution for identifying a plurality of band indices; and
applying the plurality of band offsets to the plurality of band indices in accordance with a filtering process of the video signal or a signal based on the video signal.
(7) The apparatus of (6), wherein the processor is configured to:
performing a Sample Adaptive Offset (SAO) filtering process on the video signal or a signal based on the video signal to generate a first filtered signal, wherein the sample adaptive offset filtering process applies the plurality of band offsets to the plurality of band indices; and
performing a deblocking filtering process on the first filtered signal to generate a second filtered signal.
(8) The apparatus of (6), wherein the processor is configured to:
analyzing a plurality of pixels associated with at least one largest coding unit to generate a pixel value histogram representing the pixel value distribution, wherein the largest coding unit is associated with the video signal; and
identifying a plurality of band indices to which the plurality of band offsets are to be applied based on the pixel value histogram.
(9) The apparatus of (6), wherein:
the plurality of band indices have a discontinuous distribution such that at least two consecutive band indices of the plurality of band indices are separated from each other by at least one band index value.
(10) The apparatus of (6), wherein:
the pixel value distribution indicates a plurality of subsets of the plurality of pixels respectively associated with at least a portion of the plurality of band indices; and
the plurality of band indices to which the plurality of band offsets are to be applied correspond to at least one of a plurality of subsets of the plurality of pixels, wherein the at least one subset has a relatively larger or maximum number of pixels compared to other subsets of the plurality of pixels.
(11) The apparatus of (6), wherein:
the plurality of band offsets are received by the apparatus from the at least one additional apparatus via signaling; and
the processor is configured to identify the plurality of band indices inferentially based on the pixel value distribution.
(12) The apparatus of (6), wherein:
the apparatus is a receiver communication device comprising a video decoder;
the at least one additional apparatus is a transmitter communication device comprising a video encoder; and
the receiver communication device and the transmitter communication device are connected or communicatively coupled via at least one communication channel.
(13) The apparatus of (6), wherein:
the apparatus is a communication device operating in at least one of a satellite communication system, a wireless communication system, a wired communication system, a fiber optic communication system, and a mobile communication system.
(14) A method of operation of a communication device, the method comprising:
receiving, via an input of the communication device, a video signal and a plurality of band offsets from at least one additional communication device;
analyzing a plurality of pixels associated with at least one Largest Coding Unit (LCU) associated with the video signal to identify a pixel value distribution for identifying a plurality of band indices; and
applying the plurality of band offsets to the plurality of band indices in accordance with a filtering process of the video signal or a signal based on the video signal.
(15) The method of (14), further comprising:
performing a Sample Adaptive Offset (SAO) filtering process on the video signal or a signal based on the video signal to generate a first filtered signal, wherein the sample adaptive offset filtering process includes applying the plurality of band offsets to the plurality of band indices; and
performing a deblocking filtering process on the first filtered signal to generate a second filtered signal.
(16) The method of (14), further comprising:
analyzing a plurality of pixels associated with the at least one largest coding unit associated with the video signal to generate a pixel value histogram representing the pixel value distribution; and
identifying a plurality of band indices to which the plurality of band offsets are to be applied based on the pixel value histogram.
(17) The method of (14), wherein:
the plurality of band indices have a discontinuous distribution such that at least two consecutive band indices of the plurality of band indices are separated from each other by at least one band index value.
(18) The method of (14), wherein:
the pixel value distribution indicates a plurality of subsets of the plurality of pixels respectively associated with at least a portion of the plurality of band indices; and
the plurality of band indices to which the plurality of band offsets are to be applied correspond to at least one of a plurality of subsets of the plurality of pixels, wherein the at least one subset has a relatively larger or maximum number of pixels compared to other subsets of the plurality of pixels.
(19) The method of (14), further comprising:
receiving the plurality of band offsets from the at least one additional communication device via signaling; and
inferentially identifying the plurality of band indices based on the pixel value distribution.
(20) The method of (14), wherein:
the communication device operates in at least one of a satellite communication system, a wireless communication system, a wired communication system, a fiber optic communication system, and a mobile communication system.
Drawings
Fig. 1 and 2 illustrate various embodiments of a communication system.
Fig. 3A illustrates an embodiment of a computer.
Fig. 3B illustrates an embodiment of a notebook computer.
Fig. 3C shows an embodiment of a High Definition (HD) television.
Fig. 3D shows an embodiment of a Standard Definition (SD) television.
Fig. 3E illustrates an embodiment of a handheld media unit.
Fig. 3F shows an embodiment of a set-top box (STB).
Fig. 3G shows an embodiment of a Digital Video Disc (DVD) player.
Fig. 3H illustrates an embodiment of a general digital image and/or video processing device.
Fig. 4, 5 and 6 are diagrams illustrating various embodiments of video encoding architectures.
Fig. 7 is a diagram illustrating an embodiment of an intra prediction process.
Fig. 8 is a diagram illustrating an embodiment of an inter prediction process.
Fig. 9 and 10 are diagrams illustrating various embodiments of video decoding architectures.
Fig. 11 shows an embodiment of a band offset Sample Adaptive Offset (SAO) filtering process.
Fig. 12 shows an alternative embodiment of a video coding architecture.
Fig. 12 shows an embodiment of a slice header syntax.
Fig. 13 illustrates various embodiments of indication (adaptive and/or explicit signaling) of transmitted band offsets in a sample adaptive offset (SAO) band offset mode.
Fig. 14 illustrates various embodiments of indication of band granularity (adaptive and/or explicit signaling) in SAO band offset mode.
Fig. 15 shows an embodiment of implicit band index signaling.
Fig. 16 shows an alternative embodiment of implicit band index signaling.
Fig. 17 shows an embodiment of band offset coding.
Fig. 18 and 19 illustrate various embodiments of methods for operating one or more devices (e.g., a communication device, a receiver and/or decoder device, a transmitter and/or encoder device, etc.).
Detailed Description
In many devices that use digital media, such as digital video, individual images (which are digital in nature) are represented by pixels. In some communication systems, digital media may be transferred from a first location to a second location where such media may be output or displayed. Digital communication systems, including those operating to communicate digital video, aim to transmit digital data from one location or subsystem to another without error or with an acceptably low error rate. As shown in fig. 1, data may be transmitted over various communication channels in various communication systems: magnetic media, wired, wireless, fiber optic, copper, and/or other types of media.
Fig. 1 and 2 illustrate various embodiments of communication systems 100 and 200, respectively.
Referring to fig. 1, this embodiment of a communication system 100 includes a communication channel 199 that communicatively couples a communication device 110 (including a transmitter 112 with an encoder 114 and including a receiver 116 with a decoder 118) at one end of the communication channel 199 to another communication device 120 (including a transmitter 126 with an encoder 128 and including a receiver 122 with a decoder 124) at the other end of the communication channel 199. In some embodiments, one of the communication devices 110 and 120 may include only a transmitter or a receiver. There are several different types of media through which the communication channel 199 may be implemented (e.g., a satellite communication channel 130 using satellite dishes 132 and 134, a wireless communication channel 140 using towers 142 and 144 and/or local antennas 152 and 154, a wired communication channel 150, and/or a fiber optic communication channel 160 using an electrical-to-optical (E/O) interface 162 and an optical-to-electrical (O/E) interface 164). In addition, more than one type of media may be implemented and interfaced together to form the communication channel 199.
It is noted that such communication devices 110 and/or 120 may be stationary devices or mobile devices without departing from the scope and spirit of the present invention. For example, one or both of communication devices 110 and 120 may be implemented in a fixed location or may be a mobile communication device having the capability to associate with and/or communicate with more than one network access point (e.g., a different respective Access Point (AP) in a mobile communication system environment including one or more Wireless Local Area Networks (WLANs), a different respective satellite in a mobile communication system environment including one or more satellites, or a different respective network access point generally in a mobile communication system environment including one or more network access points through which communications may be implemented with communication device 110 and/or 120).
Error correction and channel coding schemes are commonly employed to reduce transmission errors that are inevitably incurred in communication systems. In general, these error correction and channel coding schemes involve the use of an encoder at the transmitter end of communication channel 199 and a decoder at the receiver end of communication channel 199.
Any of the various types of ECC codes described may be employed in any such desired communication system (e.g., including those variations described with respect to fig. 1), any information storage device (e.g., Hard Disk Drive (HDD), network information storage device, and/or server, etc.), or any application that requires encoding and/or decoding of information.
Generally, when considering a communication system in which video data is transmitted from one location or subsystem to another, video data encoding may generally be considered to occur at the transmitting end of communication channel 199, and video data decoding may generally be considered to occur at the receiving end of communication channel 199.
Likewise, although the illustrated embodiment shows bi-directional communication between communication devices 110 and 120, it is noted of course that in some embodiments, communication device 110 may include only video data encoding functionality and communication device 120 may include only video data decoding functionality, and vice versa (e.g., a unidirectional embodiment according to a video broadcast embodiment).
Referring to the communication system 200 of fig. 2, at the transmitting end of a communication channel 299, information bits 201 (e.g., corresponding particularly to video data in one embodiment) are provided to a transmitter 297 that is operable to perform encoding of these information bits 201 using an encoder and symbol mapper 220 (which may be viewed as distinct functional blocks 222 and 224, respectively), thereby generating a sequence of discrete-valued modulation symbols 203 that is provided to a transmit driver 230, which uses a digital-to-analog converter (DAC) 232 to generate a continuous-time transmit signal 204 and uses a transmit filter 234 to generate a filtered, continuous-time transmit signal 205 that substantially comports with the communication channel 299. At the receiving end of the communication channel 299, a continuous-time receive signal 206 is provided to an analog front end (AFE) 260 that includes a receive filter 262 (which generates a filtered, continuous-time receive signal 207) and an analog-to-digital converter (ADC) 264 (which generates a discrete-time receive signal 208). A metric generator 270 computes metric values 209 (e.g., on a symbol and/or bit basis) that are used by a decoder 280 to make best estimates 210 of the discrete-valued modulation symbols and the information bits encoded therein.
In each of the transmitter 297 and the receiver 298, any desired integration of the respective components, blocks, functional blocks, circuits, and the like may be implemented. For example, the figure shows a processing module 280a that includes the encoder and symbol mapper 220 and all of the associated corresponding components therein, and a processing module 280b that includes the metric generator 270 and the decoder 280 and all of the associated corresponding components therein. Such processing modules 280a and 280b may be respective integrated circuits. Of course, other boundaries and groupings may alternatively be made without departing from the scope and spirit of the present invention. For example, all of the components in the transmitter 297 may be included in a first processing module or integrated circuit, and all of the components in the receiver 298 may be included in a second processing module or integrated circuit. Alternatively, the components in each of the transmitter 297 and the receiver 298 may be grouped in any other combination in other embodiments.
As with the previous embodiments, such a communication system 200 may be used for communication of video data transmitted from one location or subsystem to another location (e.g., from a transmitter 297 to a receiver 298 via a communication channel 299).
Video processing of digital images and/or media, including individual images in a digital video signal, may be implemented by any of the various devices shown in figures 3A through 3H below, allowing a user to view such digital images and/or video. These various devices do not comprise an exhaustive list of devices in which the image and/or video processing described herein may be implemented, and it is noted that any general purpose digital image and/or video processing device may be implemented for performing the processing described herein without departing from the scope and spirit of the present invention.
Fig. 3A illustrates an embodiment of a computer 301. The computer 301 may be a desktop computer, or an enterprise storage device (such as a server of a host) attached to a storage array (such as a redundant array of independent disks (RAID) array, a storage router, an edge router, a storage switch, and/or a storage director). A user can utilize the computer 301 to view still digital images and/or video (e.g., a sequence of digital images). A variety of image and/or video viewing programs and/or media player programs are oftentimes included on the computer 301 to allow a user to view such images (including video).
Fig. 3B illustrates an embodiment of a notebook computer 302. Such a notebook computer 302 may be found and used in any of a variety of settings. In recent years, with the ever-increasing processing power and functionality of notebook computers, they have been employed in many situations that previously called for higher-end, more capable desktop computers. Like the computer 301, the notebook computer 302 may include various image viewing programs and/or media player programs to allow a user to view such images (including video).
Fig. 3C shows an embodiment of a High Definition (HD) television 303. Many HD televisions 303 include an integrated tuner that allows media content (e.g., television broadcast signals) to be received, processed, and decoded thereon. Alternatively, at times, the HD television 303 receives media content from another source (such as a Digital Video Disc (DVD) player, a Set Top Box (STB)) that receives, processes, and decodes cable and/or satellite television broadcast signals. Regardless of the particular implementation, the HD television 303 may be implemented to perform image and/or video processing as described herein. In general, the HD television 303 has the capability to display HD media content and is often implemented with an aspect ratio of 16:9 widescreen.
Fig. 3D illustrates an embodiment of a Standard Definition (SD) television 304. Of course, the SD television 304 is somewhat similar to the HD television 303, at least one difference being that the SD television 304 does not include the capability to display HD media content, and the SD television 304 is oftentimes implemented with a 4:3 full-screen aspect ratio. Nonetheless, even the SD television 304 may be implemented to perform image and/or video processing as described herein.
Fig. 3E illustrates an embodiment of a handheld media unit 305. The handheld media unit 305 may generally operate to store image/video content information, such as Joint Photographic Experts Group (JPEG) files, Tagged Image File Format (TIFF) files, bitmaps, Moving Picture Experts Group (MPEG) files, Windows Media (WMA/WMV) files, other types of video files such as MPEG4 files, etc., and/or other types of information that may be stored in a digital format, for playback by a user. Historically, such handheld media units were used primarily for the storage and playback of audio media; however, such a handheld media unit 305 may be used to store and play back virtually any media (e.g., audio media, video media, image media, etc.). Moreover, such a handheld media unit 305 may also include other functionality, such as integrated communication circuitry for wired and wireless communication. Such a handheld media unit 305 may be implemented to perform image and/or video processing as described herein.
Fig. 3F illustrates an embodiment of a set-top box (STB) 306. As described above, the STB 306 may sometimes be implemented to receive, process, and decode cable and/or satellite television broadcast signals to be provided to any suitable display-capable device, such as the SD television 304 and/or the HD television 303. Such an STB 306 may operate independently of, or cooperatively with, a display-capable device to perform image and/or video processing as described herein.
Fig. 3G shows an embodiment of a Digital Video Disc (DVD) player 307. Such a DVD player may be a Blu-ray DVD player, an HD-capable DVD player, an SD-capable DVD player, or an up-sampling-capable DVD player (e.g., from SD to HD, etc.) without departing from the scope and spirit of the present invention. The DVD player may provide a signal to any suitable display-capable device, such as the SD television 304 and/or the HD television 303. The DVD player 307 may be implemented to perform image and/or video processing as described herein.
Fig. 3H illustrates an embodiment of a general-purpose digital image and/or video processing device 308. As noted above, the various devices described above do not constitute an exhaustive list of devices that can implement the image and/or video processing described herein; it is noted that any general-purpose digital image and/or video processing device 308 may be implemented to perform the image and/or video processing described herein without departing from the scope and spirit of the present invention.
Fig. 4, 5 and 6 are diagrams illustrating various embodiments 400, 500 and 600 of a video coding architecture, respectively.
Referring to the embodiment 400 of fig. 4, it can be seen that an input video signal is received by a video encoder. In some embodiments, the input video signal is composed of Coding Units (CUs) or Macroblocks (MBs). Such coding units or macroblocks are variable in size and may include a plurality of pixels arranged generally in a square. In one embodiment, the size of such a coding unit or macroblock is 16 × 16 pixels. However, it is generally noted that a macroblock may have any desired size, such as N × N pixels, where N is an integer (e.g., 16 × 16, 8 × 8, or 4 × 4). Of course, while square coding units or macroblocks are employed in the preferred embodiment, some implementations may include non-square coding units or macroblocks.
The input video signal may be generally referred to as corresponding to original frame (or picture) image data. For example, raw frame (or picture) image data may undergo processing to generate luma and chroma samples. In some embodiments, the set of luma samples in a macroblock is one particular arrangement (e.g., 16 × 16) and the set of chroma samples is a different particular arrangement (e.g., 8 × 8). According to embodiments described herein, a video encoder processes these samples on a block-by-block basis.
The input video signal then undergoes mode selection, in accordance with which the input video signal is selectively subjected to intra- and/or inter-prediction processing. Generally, the input video signal undergoes compression along a compression path. When operating without feedback (e.g., in accordance with neither inter-prediction nor intra-prediction), the input video signal is provided via the compression path to undergo a transform operation (e.g., in accordance with a discrete cosine transform (DCT)). Of course, other transforms may be employed in alternative embodiments. In this mode of operation, the input video signal itself is what is compressed. The compression path may exploit the human eye's lack of sensitivity to high frequencies in performing compression.
However, by selectively using intra-or inter-prediction video coding, feedback can be made along the compression path. The compression path operates on (relatively low energy) redundancy (e.g., difference) resulting from subtracting the current macroblock prediction from the current macroblock, depending on the feedback or prediction mode of operation. Depending on which form of prediction is employed in a given instance, a redundancy or difference between the current macroblock and a macroblock prediction value based at least on a portion of the same frame (or picture) or based at least on a portion of at least one other frame (or picture) is generated.
The resulting modified video signal then undergoes a transform operation along the compression path. In one embodiment, a discrete cosine transform (DCT) operates on a set of video samples (e.g., luminance, chrominance, redundancy, etc.) to compute coefficient values for each of a predetermined number of basis functions. For example, one embodiment includes 64 basis functions (e.g., for an 8 × 8 sample block). In general, different implementations may employ different numbers of basis functions (e.g., different transforms). Any combination of these individual basis functions (including suitably and selectively weighted basis functions) may be used to represent a given set of video samples. Additional details regarding the various ways in which transform operations are performed are described in the technical literature associated with video coding, including the standards/draft standards incorporated by reference as described above. The output of the transform process includes respective coefficient values, which are provided to a quantizer.
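As a non-limiting illustration of the transform described above, the following sketch (hypothetical name dct2d_8x8) computes a naive 8 × 8 two-dimensional DCT-II, producing one coefficient per basis function (64 in total); practical encoders typically use fast integer approximations rather than this direct form.

```python
import math

def dct2d_8x8(block):
    """Naive 2-D DCT-II of an 8x8 sample block: one output coefficient per
    basis function (64 total); out[0][0] is the DC (lowest-frequency) term."""
    N = 8
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            cu = math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)
            cv = math.sqrt(1.0 / N) if v == 0 else math.sqrt(2.0 / N)
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = cu * cv * s
    return out
```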
Typically, most image blocks produce coefficients (e.g., DCT coefficients in embodiments operating in accordance with a discrete cosine transform (DCT)) in which most of the signal energy is concentrated in the low-frequency coefficients. Because of this, and because of the relatively poor sensitivity of the human eye to high-frequency visual effects, the quantizer may operate to convert most of the less relevant coefficients to zero values. That is, those coefficients whose relative contribution is below some predetermined value (e.g., some threshold) may be eliminated in accordance with the quantization process. The quantizer may also operate to convert the significant coefficients into values that can be encoded more efficiently than the values produced by the transform process. For example, the quantization process may operate by dividing each respective coefficient by an integer value and discarding any remainder. When operating on a typical coding unit or macroblock, this process typically results in a relatively small number of non-zero coefficients, which are then delivered to an entropy encoder for lossless encoding and for use in accordance with a feedback path by which intra- and/or inter-prediction may be selected in accordance with video encoding.
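The divide-and-discard quantization described above may be sketched as follows (hypothetical name quantize; a uniform scalar quantizer with a single integer step size, which is a simplification of the rate-distortion-driven quantization an actual encoder would use).

```python
def quantize(coeffs, qstep):
    """Uniform scalar quantization: divide each coefficient by an integer step
    and discard the remainder (truncate toward zero), driving most small
    high-frequency coefficients to zero."""
    return [[int(c / qstep) for c in row] for row in coeffs]
```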
The entropy encoder operates according to a lossless compression encoding process. In contrast, quantization operations are typically lossy. The entropy coding process operates on the coefficients provided by the quantization process. Those coefficients may represent various features (e.g., luminance, chrominance, redundancy, etc.). The entropy encoder may employ various types of encoding. For example, the entropy encoder may perform Context Adaptive Binary Arithmetic Coding (CABAC) and/or Context Adaptive Variable Length Coding (CAVLC). For example, according to at least a portion of the entropy encoding scheme, the data is converted into (run-level) pairs (e.g., data 14, 3, 0, 4, 0, 0, -3 is converted into respective (run, level) pairs (0, 14), (0, 3), (1, 4), (2, -3)). In advance, a table is compiled in which variable length codes are assigned to the value pairs, so that relatively shorter length codes are assigned to relatively common value pairs and relatively longer length codes are assigned to relatively rare value pairs.
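The (run, level) conversion described above may be sketched as follows (hypothetical name to_run_level_pairs); the assertion reproduces the worked example from the preceding paragraph.

```python
def to_run_level_pairs(coeffs):
    """Convert a scanned coefficient sequence into (run, level) pairs, where
    run is the number of zeros preceding each non-zero level."""
    pairs, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    return pairs

# The paragraph's example: 14, 3, 0, 4, 0, 0, -3
assert to_run_level_pairs([14, 3, 0, 4, 0, 0, -3]) == [(0, 14), (0, 3), (1, 4), (2, -3)]
```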
As the reader will understand, the operations of inverse quantization and inverse transformation correspond to the operations of quantization and transformation, respectively. For example, in embodiments where a DCT is used for the transform operation, the inverse DCT (IDCT) is the transform employed in the inverse transform operation.
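Correspondingly, the inverse operations may be sketched as follows (hypothetical names dequantize and idct2d_8x8, the counterparts of the quantize and dct2d_8x8 sketches above); note that dequantization only approximately restores the coefficients, since the remainders discarded by the quantizer are lost.

```python
import math

def dequantize(levels, qstep):
    """Inverse quantization: rescale the quantized levels by the step size
    (the remainders discarded by the quantizer are not recoverable)."""
    return [[lv * qstep for lv in row] for row in levels]

def idct2d_8x8(coeffs):
    """Naive inverse 2-D DCT-II of an 8x8 coefficient block (inverse of dct2d_8x8)."""
    N = 8
    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            s = 0.0
            for u in range(N):
                for v in range(N):
                    cu = math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)
                    cv = math.sqrt(1.0 / N) if v == 0 else math.sqrt(2.0 / N)
                    s += (cu * cv * coeffs[u][v]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[x][y] = s
    return out
```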
A picture buffer (alternatively referred to as a digital picture buffer or DPB) receives the signal from the IDCT module; the picture buffer operates to store a current frame (or picture) and/or one or more other frames (or pictures), such as frames (or pictures) used in accordance with intra-prediction and/or inter-prediction operations pursuant to video encoding. It is noted that, in accordance with intra-prediction, a relatively small amount of storage may be sufficient, since it may not be necessary to store the current frame (or picture) or any other frame (or picture) in the frame (or picture) sequence. When inter-prediction is performed in accordance with video coding, the stored information can be used for motion compensation and/or motion estimation.
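A picture buffer of the kind described above may be sketched as follows (hypothetical name PictureBuffer; the capacity and eviction policy are illustrative assumptions, since actual DPB management is governed by the applicable coding standard).

```python
from collections import deque

class PictureBuffer:
    """Sketch of a decoded picture buffer (DPB): retains the most recently
    reconstructed frames as references for motion estimation/compensation."""
    def __init__(self, capacity=4):
        self.frames = deque(maxlen=capacity)  # oldest frame evicted first

    def store(self, frame):
        self.frames.append(frame)

    def reference(self, index=-1):
        # Default: the most recently stored frame.
        return self.frames[index]
```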
In one possible implementation, for motion estimation, a corresponding set of luma samples (e.g., 16 x 16) from a current frame (or picture) is compared to respective buffer counterparts in other frames (or pictures) in a sequence of frames (or pictures) (e.g., according to inter-prediction). In one possible implementation, the best matching region is located (e.g., prediction reference) and a vector offset (e.g., motion vector) is generated. In a single frame (or picture) multiple motion vectors can be found, but not all motion vectors have to point in the same direction. One or more operations performed in accordance with motion estimation are operatively used to generate one or more motion vectors.
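A minimal sketch of such block-matching motion estimation follows (hypothetical names sad and full_search, assuming a 16 × 16 block, a search window of plus or minus 8 samples, and a sum-of-absolute-differences cost; real encoders use far faster search strategies).

```python
def sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between the n x n block at (cx, cy) in the
    current frame and the n x n block at (rx, ry) in the reference frame."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))

def full_search(cur, ref, cx, cy, n=16, radius=8):
    """Exhaustive search over a +/- radius window; returns the motion vector
    (dx, dy) of the best-matching reference block and its SAD cost."""
    h, w = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), float("inf")
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            rx, ry = cx + dx, cy + dy
            if 0 <= rx <= w - n and 0 <= ry <= h - n:  # stay inside the frame
                cost = sad(cur, ref, cx, cy, rx, ry, n)
                if cost < best_cost:
                    best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost
```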
Motion compensation operatively employs one or more motion vectors that may be generated in accordance with motion estimation. A prediction reference set of samples is identified and delivered for subtraction from the original input video signal in an attempt to produce a relatively (e.g., ideally, much) lower energy redundancy. If such an operation does not result in a lower energy redundancy, motion compensation need not be performed; the transform operation may operate only on the original input video signal and not on a redundancy (e.g., in accordance with a mode of operation in which the input video signal is provided directly to the transform operation so that neither intra-prediction nor inter-prediction is performed), or intra-prediction may be used and the redundancy resulting from intra-prediction may be transformed. Likewise, if the motion estimation and/or motion compensation operations succeed, the motion vectors may also be sent to the entropy encoder, along with the corresponding redundancy coefficients, for lossless entropy encoding.
The output from the entire video encoding operation is an output bitstream. It is noted that the output bit stream may of course be subjected to certain processing in accordance with the generation of a continuous-time signal, which may be transmitted via a communication channel. For example, certain embodiments operate in a wireless communication system. In such a case, the output bit stream may be subjected to appropriate digital-to-analog conversion, frequency conversion, scaling, filtering, modulation, symbol mapping, and/or any other operations in a wireless communication device for generating a continuous-time signal or the like capable of transmission via a communication channel.
Referring to the embodiment 500 of fig. 5, it can be seen with respect to this figure that the input video signal is received by a video encoder. In some implementations, the input video signal is made up of coding units or macroblocks (and/or may be divided into Coding Units (CUs)). The size of the coding unit or macroblock may vary and may include a plurality of pixels arranged generally in a square. In one embodiment, the size of a coding unit or macroblock is 16 × 16 pixels. However, it is generally noted that a macroblock may have any desired size, such as N × N pixels, where N is an integer. Of course, while square coding units or macroblocks are employed in the preferred embodiment, some implementations may include non-square coding units or macroblocks.
The input video signal may be generally referred to as corresponding to original frame (or picture) image data. For example, raw frame (or picture) image data may be processed to generate luma and chroma samples. In some embodiments, the set of luma samples in a macroblock is one particular arrangement (e.g., 16 × 16) and the set of chroma samples is a different particular arrangement (e.g., 8 × 8). According to embodiments described herein, a video encoder processes these samples on a block-by-block basis.
The input video signal is then subjected to mode selection, in accordance with which the input video signal may be selectively subjected to intra and/or inter prediction processing. Generally, an input video signal is compressed along a compression path. When operating without feedback (e.g., not according to inter-frame prediction, nor intra-frame prediction), the input video signal is provided via a compression path to perform a transform operation (e.g., according to Discrete Cosine Transform (DCT)). Of course, other transformations may be employed in alternative embodiments. In this mode of operation, the input video signal itself is the compressed signal. The compression path may take advantage of the lack of high frequency sensitivity of the human eye to compress.
However, by selectively using intra-or inter-prediction video coding, feedback can be made along the compression path. The compression path operates on (relatively low energy) redundancy (e.g., difference) resulting from subtracting the current macroblock predictor from the current macroblock, depending on the feedback or prediction mode of operation. Depending on which form of prediction is employed in a given instance, a redundancy or difference between the current macroblock and a macroblock prediction value based at least on a portion of the same frame (or picture) or based at least on a portion of at least one other frame (or picture) is generated.
The resulting modified video signal is then subjected to a transform operation along the compression path. In one embodiment, a Discrete Cosine Transform (DCT) operates on a set of video samples (e.g., luminance, chrominance, redundancy, etc.) to compute coefficient values for each of a predetermined number of base modes. For example, one embodiment includes 64 basis functions (e.g., for an 8 × 8 sample). In general, different implementations may employ different numbers of basis functions (e.g., different transforms). Any combination of these basis functions, including suitable selective weighting, may be used to represent a given set of video samples. Additional details regarding the various ways in which transform operations are performed are described in the technical literature associated with video coding including those standard/draft standards incorporated by reference as described above. The output of the transform process includes respective coefficient values. The output is provided to a quantizer.
Typically, most image blocks produce coefficients (e.g., DCT coefficients in embodiments operating in accordance with a discrete cosine transform (DCT)) in which most of the signal energy is concentrated in the low-frequency coefficients. Because of this, and because of the relatively poor sensitivity of the human eye to high-frequency visual effects, the quantizer may operate to convert most of the less relevant coefficients to zero values. That is, those coefficients whose relative contribution is below some predetermined value (e.g., some threshold) may be eliminated in accordance with the quantization process. The quantizer may also operate to convert the significant coefficients into values that can be encoded more efficiently than the values produced by the transform process. For example, the quantization process may operate by dividing each respective coefficient by an integer value and discarding any remainder. When operating on a typical coding unit or macroblock, this process typically results in a relatively small number of non-zero coefficients, which are then delivered to the entropy encoder for lossless encoding and for use in accordance with a feedback path by which intra- and/or inter-prediction processing may be selected in accordance with video encoding.
The entropy encoder operates according to a lossless compression encoding process. In contrast, quantization operations are generally lossy. The entropy coding process operates on the coefficients provided by the quantization process. Those coefficients may represent various features (e.g., luminance, chrominance, redundancy, etc.). The entropy encoder may employ various types of encoding. For example, the entropy encoder may perform Context Adaptive Binary Arithmetic Coding (CABAC) and/or Context Adaptive Variable Length Coding (CAVLC). For example, according to at least a portion of the entropy encoding scheme, the data is converted into (run-level) pairs (e.g., data 14, 3, 0, 4, 0, 0, -3 is converted into respective (run, level) pairs (0, 14), (0, 3), (1, 4), (2, -3)). In advance, a table is compiled that assigns variable length codes to the value pairs such that relatively shorter length codes are assigned to relatively common value pairs and relatively longer length codes are assigned to relatively infrequent value pairs.
As the reader will understand, the operations of inverse quantization and inverse transformation correspond to the operations of quantization and transformation, respectively. For example, in embodiments where a DCT is used for the transform operation, the inverse DCT (IDCT) is the transform employed in the inverse transform operation.
An adaptive loop filter (ALF) may be implemented to process the output from the inverse transform block. The adaptive loop filter (ALF) is applied to the decoded picture before the decoded picture is stored in a picture buffer (sometimes referred to as a DPB, digital picture buffer). The adaptive loop filter (ALF) is implemented to reduce the coding noise of the decoded picture, and luminance and chrominance may each be selectively filtered on a slice-by-slice basis, whether the adaptive loop filter (ALF) is applied at the slice level or at the block level. Two-dimensional (2-D) finite impulse response (FIR) filtering may be used in applying the adaptive loop filter (ALF). The coefficients of the filter may be designed slice-by-slice in the encoder, and this information is then signaled to the decoder (e.g., from a transmitter communication device including a video encoder (alternatively referred to as an encoder) to a receiver communication device including a video decoder (alternatively referred to as a decoder)).
One embodiment operates by generating the coefficients in accordance with a Wiener filter design. In addition, the decision of whether or not to perform the filtering may be made block-by-block in the encoder and passed to the decoder in accordance with a quadtree structure (e.g., from a transmitter communication device including a video encoder (alternatively referred to as an encoder) to a receiver communication device including a video decoder (alternatively referred to as a decoder)), where the block size is decided in accordance with rate-distortion optimization. It is noted that implementations utilizing 2-D filtering may introduce complexity in accordance with encoding and decoding. For example, by using 2-D filtering in accordance with adaptive loop filter (ALF) implementation, there may be some increase in complexity in the encoder (implemented in the transmitter communication device) and in the decoder (implemented in the receiver communication device).
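The 2-D FIR filtering underlying such an ALF may be sketched as follows (hypothetical name fir_filter_2d; in practice the kernel would hold the Wiener-designed, slice-by-slice coefficients signaled to the decoder, and border handling is simplified here by leaving edge samples unfiltered).

```python
def fir_filter_2d(img, kernel):
    """2-D FIR filtering: each output sample is a weighted sum of its
    neighborhood; borders are left unfiltered for simplicity."""
    kh, kw = len(kernel), len(kernel[0])
    oy, ox = kh // 2, kw // 2
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # copy; border samples keep original values
    for y in range(oy, h - oy):
        for x in range(ox, w - ox):
            acc = 0.0
            for j in range(kh):
                for i in range(kw):
                    acc += kernel[j][i] * img[y + j - oy][x + i - ox]
            out[y][x] = acc
    return out
```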
In some alternative embodiments, the output from the deblocking filter is provided to one or more other in-loop filters implemented to process the output from the inverse transform block (e.g., implemented in accordance with an Adaptive Loop Filter (ALF), a Sample Adaptive Offset (SAO) filter, and/or any other filter type). For example, ALF is applied to decoded pictures before they are stored in a picture buffer (sometimes referred to as DPB, digital picture buffer). ALF is implemented to reduce coding noise of decoded images, and luminance and chrominance may be selectively filtered on a slice-by-slice basis, respectively, regardless of whether ALF is applied at a slice level or at a block level. Two-dimensional 2-D Finite Impulse Response (FIR) filtering may be used in ALF applications. The coefficients of the filter may be designed slice-by-slice in an encoder and then pass this information to a decoder (e.g., from a transmitter communication device comprising a video encoder (alternatively referred to as an encoder) to a receiver communication device comprising a video decoder (alternatively referred to as a decoder)).
One embodiment operates in accordance with a Wiener filter design to generate the coefficients. In addition, the decision of whether or not to perform the filtering may be made block-by-block in the encoder and passed to the decoder in accordance with a quadtree structure (e.g., from a transmitter communication device including a video encoder (alternatively referred to as an encoder) to a receiver communication device including a video decoder (alternatively referred to as a decoder)), where the block size is decided in accordance with rate-distortion optimization. It is noted that implementations utilizing 2-D filtering may introduce complexity in accordance with encoding and decoding. For example, by using 2-D filtering in accordance with ALF implementation, there may be some increase in complexity in the encoder (implemented in the transmitter communication device) and in the decoder (implemented in the receiver communication device).
As described with respect to other embodiments, the use of ALF can provide any of a range of improvements in accordance with such video processing, including an improvement in the objective quality measure of peak signal-to-noise ratio (PSNR) achieved by denoising random quantization noise. In addition, the subjective quality of a subsequently encoded video signal may be improved by illumination compensation, which may be introduced in accordance with performing offset processing and scaling processing (e.g., in accordance with applying a gain) in accordance with the ALF processing.
For one type of in-loop filter, the use of an adaptive loop filter (ALF) can provide any of a number of improvements in accordance with such video processing, including an improvement in the objective quality measure of peak signal-to-noise ratio (PSNR) achieved by denoising random quantization noise. In addition, the subjective quality of a subsequently encoded video signal may be improved by illumination compensation, which may be introduced in accordance with performing offset processing and scaling processing (e.g., in accordance with applying a gain) in accordance with the adaptive loop filter (ALF) processing.
A picture buffer (alternatively referred to as a digital picture buffer or DPB) receives the signal output from the ALF; the picture buffer operates to store a current frame (or picture) and/or one or more other frames (or pictures), such as frames (or pictures) used in accordance with intra-prediction and/or inter-prediction operations pursuant to video encoding. It is noted that, in accordance with intra-prediction, a relatively small amount of storage may be sufficient, since it may not be necessary to store the current frame (or picture) or any other frame (or picture) in the frame (or picture) sequence. When inter-prediction is performed in accordance with video coding, the stored information can be used for motion compensation and/or motion estimation.
In one possible implementation, for motion estimation, a corresponding set of luma samples (e.g., 16 x 16) from a current frame (or picture) is compared to respective buffer counterparts in other frames (or pictures) in a sequence of frames (or pictures) (e.g., according to inter-prediction). In one possible implementation, the best matching region is located (e.g., prediction reference) and a vector offset (e.g., motion vector) is generated. In a single frame (or picture) multiple motion vectors can be found, but not all motion vectors have to point in the same direction. One or more motion vectors are operatively generated from one or more operations performed by the motion estimation.
Motion compensation operatively employs one or more motion vectors that may be generated in accordance with motion estimation. A prediction reference set of samples is identified and delivered for subtraction from the original input video signal in an attempt to produce a relatively (e.g., ideally, much) lower energy redundancy. If such an operation does not result in a lower energy redundancy, motion compensation need not be performed; the transform operation may operate only on the original input video signal and not on a redundancy (e.g., in accordance with a mode of operation in which the input video signal is provided directly to the transform operation so that neither intra-prediction nor inter-prediction is performed), or intra-prediction may be used and the redundancy resulting from intra-prediction may be transformed. Likewise, if the motion estimation and/or motion compensation operations succeed, the motion vectors may also be sent to the entropy encoder, along with the corresponding redundancy coefficients, for lossless entropy encoding.
The output from the entire video encoding operation is an output bitstream. It is noted that the output bit stream may of course be processed in accordance with generating a continuous-time signal, which may be transmitted via a communication channel. For example, certain embodiments operate in a wireless communication system. In this case, the output bit stream may be subjected to appropriate digital-to-analog conversion, frequency transformation, scaling, filtering, modulation, symbol mapping, and/or any other operations in a wireless communication device (which is used to generate a continuous-time signal capable of transmission via a communication channel, etc.).
Referring to the embodiment 600 of fig. 6, an alternative embodiment of a video encoder that performs prediction, transform, and encoding processes to produce a compressed output bitstream is described. Such a video encoder may operate in accordance with, and be compatible with, one or more video encoding protocols, standards, and/or recommended practices, such as ISO/IEC 14496-10 - MPEG-4 Part 10, AVC (Advanced Video Coding) (alternatively referred to as H.264/MPEG-4 Part 10 or AVC (Advanced Video Coding), ITU H.264/MPEG4-AVC).
It is noted that a corresponding video decoder, such as a video decoder within a device at the other end of the communication channel, operates to perform the complementary processing of decoding, inverse transformation and reconstruction in order to produce a corresponding decoded video sequence, which sequence (ideally) represents the input video signal.
When this figure is compared with the previous figure, the output from the inverse quantization and inverse transform (e.g., IDCT) block that is provided to the intra-prediction block is also provided to a deblocking filter. The output from the deblocking filter is provided to one or more other in-loop filters (e.g., implemented in accordance with an adaptive loop filter (ALF), a sample adaptive offset (SAO) filter, and/or any other filter type) implemented to process the output from the inverse transform block. For example, in one possible implementation, the SAO filter is applied to the decoded picture before the decoded picture is stored in a picture buffer (sometimes referred to as a DPB, digital picture buffer).
With respect to any video encoder architecture implemented to generate an output bitstream, it is noted that such architectures may be implemented in any of a variety of communication devices. The output bitstream may undergo additional processing, including error correction code (ECC), forward error correction (FEC), etc., to generate a modified output bitstream having additional redundancy therein. Also, as will be understood in connection with such digital signals, any appropriate processing may be performed in accordance with generating a continuous-time signal suitable or appropriate for transmission via a communication channel. That is, such a video encoder architecture may be implemented in a communication device operative to transmit one or more signals via one or more communication channels. The output bitstream generated by such a video encoder architecture may undergo additional processing to generate a continuous-time signal that may be launched into a communication channel.
Fig. 7 is a diagram illustrating an embodiment 700 of an intra-prediction process. As can be seen in this figure, a current block of video data (e.g., generally square, and often including N × N pixels) is processed to estimate the individual pixels therein. Previously encoded pixels located above and to the left of the current block are employed in accordance with intra-prediction. From some perspective, the intra-prediction direction may be viewed as corresponding to a vector extending from the current pixel to a reference pixel located above or to the left of the current pixel. The details of intra-prediction as applied to encoding in accordance with H.264/AVC are specified in the corresponding standards incorporated by reference above (e.g., International Telecommunication Union, ITU-T, Telecommunication Standardization Sector of ITU, H.264 (03/2010), Series H: Audiovisual and Multimedia Systems, Infrastructure of audiovisual services - Coding of moving video, Advanced video coding for generic audiovisual services, Recommendation ITU-T H.264 (alternatively referred to as ISO/IEC 14496-10 - MPEG-4 Part 10, AVC (Advanced Video Coding), H.264/MPEG-4 Part 10 or AVC (Advanced Video Coding), H.264/MPEG4-AVC, or equivalent)).
The redundancy, which is the difference between the current pixel and the reference or predicted pixel, is what is encoded. As can be seen with respect to this figure, intra-prediction operates using pixels within a common frame (or picture). It is of course noted that a given pixel may have different respective components associated with it, and there may be a different respective set of samples for each component.
Fig. 8 is a diagram illustrating an embodiment 800 of inter prediction processing. Unlike intra-prediction, inter-prediction is used to identify a motion vector (e.g., an inter-prediction direction) based on a current set of pixels within a current frame (or picture) and one or more sets of reference or predicted pixels located within one or more other frames (or pictures) within a sequence of frames (or pictures). It can be seen that the motion vectors extend from a current frame (or picture) to another frame (or picture) within a sequence of frames (or pictures). Inter prediction may use sub-pixel interpolation such that the predicted pixel values correspond to a function of multiple pixels in a reference frame or picture.
The redundancy may be calculated in accordance with the inter-prediction process, although such redundancy is different from the redundancy calculated in accordance with the intra-prediction process. In accordance with the inter-prediction process, the redundancy of each pixel again corresponds to the difference between a current pixel and a predicted pixel value. However, in accordance with the inter-prediction process, the current pixel and the reference or predicted pixel are not located within the same frame (or picture). Although this figure illustrates inter-prediction employed with respect to one or more previous frames or pictures, it is also noted that alternative embodiments may operate using frames preceding and/or following the current frame. For example, multiple frames may be stored in accordance with appropriate buffering and/or memory management. When operating on a given frame, references may be generated from other frames that precede and/or follow the given frame.
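For either prediction mode, the per-pixel redundancy described above reduces to a sample-wise difference, as in the following sketch (hypothetical name residual_block).

```python
def residual_block(cur_block, pred_block):
    """Per-pixel redundancy: difference between the current samples and the
    intra- or inter-predicted samples (this is what is transformed and coded)."""
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(cur_block, pred_block)]
```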
In conjunction with a CU, a basic unit may be employed for the prediction partition mode, namely, the prediction unit (PU). It is also noted that the PU is defined only for the last-depth CU, and its respective sizes are limited to the size of that CU.
Fig. 9 and 10 are diagrams illustrating various embodiments 900 and 1000, respectively, of a video decoding architecture.
Generally, such a video decoding architecture operates on an input bitstream. It is of course noted that such an input bitstream may be generated from a signal that is received by a communication device from a communication channel. Various operations may be performed on the continuous-time signal received from the communication channel, including digital sampling, demodulation, scaling, filtering, etc., as may be appropriate in accordance with generating the input bitstream. Moreover, certain embodiments, in which one or more types of error correction code (ECC), forward error correction (FEC), etc. may be implemented, may perform appropriate decoding in accordance with such ECC, FEC, etc. to generate the input bitstream. That is, in certain embodiments in which additional redundancy has been applied in accordance with generating a corresponding output bitstream (e.g., such as may be launched from a transmitter portion of a transmitter communication device or of a transceiver communication device), appropriate processing may be performed in accordance with generating the input bitstream. Overall, such a video decoding architecture processes the input bitstream to generate an output video signal corresponding to the original input video signal, as closely and ideally as possible, for output to one or more video display capable devices.
Referring to the embodiment 900 of fig. 9, generally speaking, a decoder such as an entropy decoder (e.g., which may be implemented in accordance with CABAC, CAVLC, etc.) processes the input bitstream in accordance with performing the complementary process of the encoding as performed within a video encoder architecture. The input bitstream may be viewed as being, as closely and ideally as possible, the compressed output bitstream generated by a video encoder architecture. Of course, in a real-life application, it is possible that some errors may have been incurred in a signal transmitted via one or more communication links. The entropy decoder processes the input bitstream and extracts the appropriate coefficients, such as the DCT coefficients (e.g., representing chrominance, luminance, etc. information), and provides those coefficients to an inverse quantization and inverse transform block. In the event that a DCT was employed, the inverse quantization and inverse transform block may be implemented to perform an inverse DCT (IDCT) operation. Subsequently, a deblocking filter is implemented to generate the respective frames and/or pictures corresponding to the output video signal. These frames and/or pictures may be provided into a picture buffer, or decoded picture buffer (DPB), for use in other operations including motion compensation. Generally speaking, such motion compensation operations may be viewed as corresponding to inter-prediction associated with video encoding. Also, intra-prediction may be performed on the signal output from the inverse quantization and inverse transform block. Analogous to video encoding, such a video decoder architecture may be implemented to perform mode selection between inter-prediction and intra-prediction in accordance with decoding the input bitstream to generate the output video signal.
Referring to the embodiment 1000 of fig. 10, in certain alternative embodiments, one or more in-loop filters corresponding to those implemented in accordance with the video encoding used to generate the output bitstream (e.g., in accordance with an adaptive loop filter (ALF), a sample adaptive offset (SAO) filter, and/or any other filter type) may be implemented within a video decoder architecture. In one embodiment, the one or more in-loop filters are appropriately implemented after the deblocking filter.
In accordance with certain possible embodiments, a sample adaptive offset (SAO) process may be performed after the deblocking filtering process of the decoded picture has completed (e.g., in accordance with SAO filtering implemented among the other in-loop filters of fig. 6). The processing is performed on a region defined as one or more complete largest coding units (LCUs).
Fig. 11 illustrates an embodiment 1100 of a band offset sample adaptive offset (SAO) filtering process. This figure shows the concept of band offset SAO. After each offset is applied, the resulting pixel values are clipped to the valid 8-bit pixel range [0, 255]. In this figure, the offsets are applied to four consecutive, active bands; the remaining bands are not modified. Of course, in other embodiments, such offsets may be applied to non-consecutive bands.
Fig. 12 shows an alternative embodiment 1200 of a video coding architecture. In this embodiment 1200, any one or more other in-loop filters (e.g., implemented in accordance with an adaptive loop filter (ALF), a sample adaptive offset (SAO) filter, and/or any other filter type) may be implemented to process the output from the inverse quantization and inverse transform block (e.g., before the deblocking filter). In other words, in such an embodiment, one or more other in-loop filters (e.g., an SAO filter in one embodiment) may be applied before the deblocking processing. That is, various aspects, embodiments, and/or their equivalents operate by applying such in-loop filters before the deblocking processing, as shown in fig. 12.
In accordance with certain embodiments, some undesirable blocking artifacts may appear when SAO is operational and switched on (e.g., in embodiments that apply SAO to the output of the deblocking processing). In such embodiments, this occurs primarily because two neighboring LCUs may use different band offset values. To alleviate this problem, the SAO may be applied before the deblocking processing, and the deblocking processing may then be used to reduce any undesirable blocking artifacts that appear in such instances. In this case, the boundary strength and the variables β and tC used during the deblocking processing may also be determined by the SAO parameters.
From certain perspectives, band offset SAO may be viewed essentially as a correction filter (e.g., a histogram correction filter in certain embodiments). The pixels are classified based on their intensity values to produce a pixel distribution. For example, in a histogram implementation, the pixels (e.g., the pixels of one or more largest coding units (LCUs)) are classified into histogram bins, or "bands," based on intensity value. The entire pixel range [0, 255] is divided into 32 uniform bands, and a specific offset is added to all pixels in a given band. The encoder selects each offset to be applied from the range [-7, 7].
Although offsets could be applied to all 32 bands, to simplify the band offset processing and reduce overhead, only a reduced set (e.g., only 4 consecutive bands) is actually modified by band offset SAO in any LCU. The encoder selects the four consecutive bands whose offsets are transmitted; the remaining 28 bands are not modified (zero offset). Since there may be 32 bands, the first band with a non-zero offset is indicated in the bitstream; the band_position parameter carries this information. The remaining three active bands may then be determined as (band_position + i) % 32 for i ∈ [1, 3]. Note that, under this modulo operation, if the first band is 29, 30, or 31, the remaining bands wrap around to 0.
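By way of illustration only, a minimal Python sketch of this band classification, offset application, wrap-around, and clipping might read as follows (the function name and array conventions are assumptions of this sketch, not part of any standard):

    import numpy as np

    def apply_band_offset_sao(pixels, band_position, offsets, num_bands=32):
        # Classify each 8-bit pixel into one of 32 uniform bands of width 8.
        band_width = 256 // num_bands
        band_of = pixels // band_width
        out = pixels.astype(np.int32)
        for i, offset in enumerate(offsets):
            # Active bands follow (band_position + i) % 32, wrapping past 31.
            band = (band_position + i) % num_bands
            out[band_of == band] += offset
        # Clip back to the valid 8-bit pixel range after applying the offsets.
        return np.clip(out, 0, 255).astype(pixels.dtype)

    # Wrap-around case: band_position = 30 activates bands 30, 31, 0, 1.
    lcu = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)
    filtered = apply_band_offset_sao(lcu, band_position=30, offsets=[3, -2, 1, 4])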
Fig. 13 illustrates various embodiments 1300 of the indication (adaptive and/or explicit signaling) of the number of transmitted band offsets in the sample adaptive offset (SAO) band offset mode. Such operation may be effectuated in accordance with adaptively indicating the number of transmitted band offsets in the SAO band offset mode. For example, the number of transmitted band offsets in the SAO band offset mode may be related to the LCU size (e.g., such that the number of transmitted band offsets may be a function of the LCU size). For example, as the LCU size decreases, the number of transmitted band offsets also decreases. As another example, 4 transmitted band offsets may be used for a 64 × 64 LCU, 3 transmitted band offsets for a 32 × 32 LCU, and 2 transmitted band offsets for a 16 × 16 LCU. Generally, different respective numbers of transmitted band offsets in the SAO band offset mode may be indicated for different respective LCU sizes.
The number of transmitted band offsets for each LCU size may also be explicitly signaled in the SPS (sequence parameter set), PPS (picture parameter set), APS (adaptation parameter set), slice header, LCU data, and/or other portions of the bitstream.
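By way of illustration only, the adaptive mapping from the example above may be sketched as follows (the function name is illustrative, and only the three listed LCU sizes are covered):

    def num_transmitted_band_offsets(lcu_size):
        # Fewer transmitted band offsets for smaller LCUs, per the example above.
        return {64: 4, 32: 3, 16: 2}[lcu_size]

    assert num_transmitted_band_offsets(32) == 3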
Fig. 14 illustrates various embodiments 1400 of the indication (adaptive and/or explicit signaling) of band granularity in the SAO band offset mode. Such operation may be effectuated in accordance with adaptively specifying the band granularity in the SAO band offset mode.
In certain embodiments, the entire pixel range [0, 255] is divided into 32 uniform bands, of which only 4 can actually be modified by band offset SAO in any LCU. An encoder (e.g., a transmitter communication device) selects the four consecutive bands whose offsets are transmitted; the remaining 28 bands are not modified (e.g., zero offset). Within each band, a specific offset is added to all pixels.
Since the LCU size may vary (e.g., 64 × 64, 32 × 32, or 16 × 16), the band granularity may be adaptive. For example, the smaller the LCU size, the coarser the granularity. As another example, if the LCU size is 32 × 32, then [0, 255] may be evenly divided into 16 bands, each band covering 16 consecutive intensity values. Generally, different respective band granularities in the SAO band offset mode may be specified for different respective LCU sizes.
The band granularity for each LCU size may also be explicitly signaled in the SPS (sequence parameter set), PPS (picture parameter set), APS (adaptation parameter set), slice header, LCU data, and/or other portions of the bitstream.
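By way of illustration only, such adaptive granularity may be sketched as follows (the 64 × 64 and 32 × 32 entries follow the text above; the 16 × 16 entry is an assumed extrapolation of the stated pattern, and the function name is illustrative):

    def band_granularity(lcu_size, pixel_range=256):
        # Smaller LCUs use fewer, wider bands (coarser granularity).
        num_bands = {64: 32, 32: 16, 16: 8}[lcu_size]  # 16x16 entry is assumed
        return num_bands, pixel_range // num_bands     # (band count, band width)

    assert band_granularity(32) == (16, 16)  # 16 bands of 16 intensity values each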
Fig. 15 shows an embodiment 1500 of implicit band index signaling. Rather than explicitly signaling the band indices, this information may be inferred (e.g., determined based on analysis of the LCU, determined inferentially, etc.) from the pixel values of the current LCU. For example, by generating a histogram of the LCU's pixel values, the band offsets may be applied to those bands in which the greatest numbers of pixels fall. Such band indices need not be consecutive (e.g., the band indices may have a discontinuous distribution such that at least two successive band indices are separated from each other by at least one band index value; in other words, the band indices need not be contiguous with one another).
Fig. 16 shows an alternative embodiment 1600 of implicit band index signaling. In this greatly simplified diagram, which shows a very simplified embodiment, the LCU has only two gray levels. A histogram (e.g., one possible way of describing the pixel distribution) shows that 50% of the pixels have a gray level of 25 and 50% of the pixels have a gray level of 205. Thus, two band offsets suffice instead of four.
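By way of illustration only, the inference of band indices from the LCU's own pixel distribution may be sketched as follows (the function name is illustrative; because the decoder can reproduce this selection from the reconstructed pixels, no band indices need be signaled):

    import numpy as np

    def infer_band_indices(pixels, num_offsets, num_bands=32):
        # Histogram the pixels into uniform bands and pick the bands in which
        # the greatest numbers of pixels fall; these need not be consecutive.
        band_width = 256 // num_bands
        hist = np.bincount(pixels.ravel() // band_width, minlength=num_bands)
        return np.argsort(hist)[::-1][:num_offsets]

    # Two-gray-level LCU as in the simplified example: half 25s, half 205s.
    lcu = np.where(np.indices((64, 64)).sum(0) % 2 == 0, 25, 205).astype(np.uint8)
    print(infer_band_indices(lcu, num_offsets=2))  # the bands containing 25 and 205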
Fig. 17 shows an embodiment 1700 of band offset coding. In band offset mode, since sao_band_position indicates the beginning of the bands with non-zero offsets, the first offset value, sao_offset[cIdx][saoDepth][x0][y0][0], must be non-zero (e.g., in certain instances, its smallest possible magnitude is 1). Therefore, the sign bit of sao_offset[cIdx][saoDepth][x0][y0][0] and the value abs(sao_offset[cIdx][saoDepth][x0][y0][0]) - 1 may be encoded separately, where abs is a function that computes the absolute value.
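By way of illustration only, the separate coding of the sign bit and abs(offset) - 1 for this first, necessarily non-zero offset may be sketched as follows (function names are illustrative, and the entropy coding of the two values is omitted):

    def encode_first_band_offset(offset):
        # The first offset is never zero, so abs(offset) - 1 >= 0 can be coded,
        # saving one codeword relative to coding the magnitude directly.
        assert offset != 0, "first band offset must be non-zero"
        sign_bit = 0 if offset > 0 else 1
        return sign_bit, abs(offset) - 1

    def decode_first_band_offset(sign_bit, magnitude_minus_one):
        magnitude = magnitude_minus_one + 1
        return magnitude if sign_bit == 0 else -magnitude

    assert decode_first_band_offset(*encode_first_band_offset(-5)) == -5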
Figs. 18 and 19 illustrate various embodiments of methods of operation of one or more devices (e.g., a communication device, a receiver and/or decoder device, a transmitter and/or encoder device, etc.).
Referring to the method 1800 of fig. 18, the method 1800 begins by receiving a video signal and a plurality of band offsets from at least one additional communication device via an input of the communication device, as shown in block 1810.
The method 1800 continues with analyzing a plurality of pixels associated with at least one Largest Coding Unit (LCU) associated with the video signal to identify a distribution of pixel values for identifying a plurality of band indices, as shown in block 1820.
The method 1800 then applies the plurality of band offsets to the plurality of band indices in accordance with a filtering process of the video signal or a signal based thereon, as shown in block 1830.
Referring to method 1900 of fig. 19, the method 1900 first receives a video signal and a plurality of band offsets from at least one additional communication device via an input of the communication device, as shown in block 1910.
The method 1900 continues with analyzing a plurality of pixels associated with at least one Largest Coding Unit (LCU) associated with the video signal to identify a pixel value distribution for identifying a plurality of band indices, as shown in block 1920.
The method 1900 continues with performing a Sample Adaptive Offset (SAO) filtering process on the video signal or a signal based thereon to generate a first filtered signal, such that the SAO filtering process includes applying a plurality of band offsets to a plurality of band indices, as shown in block 1930.
Method 1900 continues with performing a deblocking filtering process on the first filtered signal to generate a second filtered signal, as indicated at block 1940.
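By way of illustration only, the ordering that method 1900 prescribes may be sketched as follows, reusing infer_band_indices from the sketch above (deblock_fn stands in for whatever deblocking filter the codec applies and is an assumption of this sketch):

    import numpy as np

    def sao_then_deblock(recon, band_offsets, deblock_fn, num_bands=32):
        # Infer the band indices from the reconstructed pixels themselves, so
        # that only the offsets need to have been signaled (block 1920).
        band_width = 256 // num_bands
        indices = infer_band_indices(recon, len(band_offsets), num_bands)
        first = recon.astype(np.int32)
        for band, offset in zip(indices, band_offsets):
            first[recon // band_width == band] += offset
        first = np.clip(first, 0, 255).astype(recon.dtype)  # first filtered signal
        return deblock_fn(first)                            # second filtered signal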
It is also noted that the various operations and functions described with reference to the various methods herein may be performed within any of a variety of communication devices, such as by using a baseband processing module and/or a processing module implemented therein and/or other components therein. For example, such a baseband processing module and/or processing module may generate the signals and perform the operations, processing, etc. described herein, and may also perform the various operations and analyses described herein, or any other operations and functions described herein, etc., or their respective equivalents.
In certain embodiments, such a baseband processing module and/or processing module (which may be implemented within the same device or within different devices) may perform such processing, operations, etc. in accordance with various aspects of the invention, and/or any other operations and functions described herein, etc., or their respective equivalents. In certain embodiments, such processing is performed cooperatively by a first processing module within a first device and a second processing module within a second device. In other embodiments, such processing, operations, etc. are performed wholly by a baseband processing module and/or processing module within one given device. In even other embodiments, such processing, operations, etc. are performed using at least a first processing module and a second processing module within a single device.
Also, the terms "substantially" and "approximately," as may be used herein, provide an industry-accepted tolerance for their corresponding terms and/or relativity between items. Such an industry-accepted tolerance ranges from less than one percent to fifty percent and corresponds to, but is not limited to, component values, integrated circuit process variations, temperature variations, rise and fall times, and/or thermal noise. Such relativity between items ranges from a difference of a few percent to magnitude differences. As may also be used herein, the terms "operably coupled to," "coupled to," and/or "coupling" include direct coupling between items and/or indirect coupling between items via an intervening item (e.g., an item includes, but is not limited to, a component, an element, a circuit, and/or a module), where, for indirect coupling, the intervening item does not modify the information of a signal but may adjust its current level, voltage level, and/or power level. As may further be used herein, inferred coupling (i.e., where one element is coupled to another element by inference) includes direct and indirect coupling between two items in the same manner as "coupled to." As may even further be used herein, the term "operable to" or "operably coupled to" indicates that an item includes one or more of power connections, inputs, outputs, etc., to perform, when activated, one or more of its corresponding functions, and may further include inferred coupling to one or more other items. As may still further be used herein, the term "associated with" includes direct and/or indirect coupling of separate items and/or one item being embedded within another item. As may be used herein, the term "compares favorably" indicates that a comparison between two or more items, signals, etc. provides a desired relationship. For example, when the desired relationship is that signal 1 has a greater magnitude than signal 2, a favorable comparison may be achieved when the magnitude of signal 1 is greater than that of signal 2 or when the magnitude of signal 2 is less than that of signal 1.
As may also be used herein, the terms "processing module," "processing circuit," and/or "processing unit" (e.g., including various modules and/or circuitry that may be operative, implemented, and/or used for encoding, for decoding, for baseband processing, etc.) may be a single processing device or a plurality of processing devices. Such a processing device may be a microprocessor, micro-controller, digital signal processor, microcomputer, central processing unit, field programmable gate array, programmable logic device, state machine, logic circuitry, analog circuitry, digital circuitry, and/or any device that manipulates signals (analog and/or digital) based on hard coding of the circuitry and/or operational instructions. The processing module, processing circuit, and/or processing unit may have an associated memory and/or an integrated memory element, which may be a single memory device, a plurality of memory devices, and/or embedded circuitry of the processing module, processing circuit, and/or processing unit. Such a memory device may be a read-only memory (ROM), random access memory (RAM), volatile memory, non-volatile memory, static memory, dynamic memory, flash memory, cache memory, and/or any device that stores digital information. Note that if the processing module, processing circuit, and/or processing unit includes more than one processing device, the processing devices may be centrally located (e.g., directly coupled together via a wired and/or wireless bus structure) or may be distributedly located (e.g., cloud computing via indirect coupling via a local area network and/or a wide area network). Further note that if the processing module, processing circuit, and/or processing unit implements one or more of its functions via a state machine, analog circuitry, digital circuitry, and/or logic circuitry, the memory and/or memory element storing the corresponding operational instructions may be embedded within, or external to, the circuitry comprising the state machine, analog circuitry, digital circuitry, and/or logic circuitry. Still further note that the memory element may store, and the processing module, processing circuit, and/or processing unit executes, hard coded and/or operational instructions corresponding to at least some of the steps and/or functions illustrated in one or more of the figures. Such a memory device or memory element can be included in an article of manufacture.
The present invention has been described above with the aid of method steps illustrating the performance of specified functions and relationships thereof. The boundaries and sequence of these functional building blocks and method steps have been arbitrarily defined herein for convenience of description. Alternate boundaries and sequences can be defined so long as the specified functions and relationships are appropriately performed; any such alternate boundaries or sequences are thus within the scope and spirit of the claimed invention. Further, the boundaries of these functional building blocks have been arbitrarily defined for convenience of description. Alternate boundaries could be defined as long as the certain significant functions are appropriately performed. Similarly, flow diagram blocks may also have been arbitrarily defined herein to illustrate certain significant functionality. To the extent used, the flow diagram block boundaries and sequence could have been defined otherwise and still perform the certain significant functionality. Such alternate definitions of both functional building blocks and flow diagram blocks and sequences are thus within the scope and spirit of the claimed invention. One of average skill in the art will also recognize that the functional building blocks, and other illustrative blocks, modules, and components herein, can be implemented as illustrated or by discrete components, application specific integrated circuits, processors executing appropriate software, and the like, or any combination thereof.
The present invention may have been described, at least in part, in terms of one or more embodiments. An embodiment of the present invention is used herein to illustrate the present invention, an aspect thereof, a feature thereof, a concept thereof, and/or an example thereof. A physical embodiment of an apparatus, an article of manufacture, a machine, and/or a process that embodies the present invention may include one or more of the aspects, features, concepts, examples, etc. described with reference to one or more of the embodiments discussed herein. Further, from figure to figure, the embodiments may incorporate the same or similarly named functions, steps, modules, etc. that may use the same or different reference numbers, and, as such, those functions, steps, modules, etc. may be the same or similar functions, steps, modules, etc., or different ones.
Unless specifically stated to the contrary, signals to, from, and/or between elements in any of the figures presented herein may be analog or digital, continuous time or discrete time, and single-ended or differential. For instance, if a signal path is shown as a single-ended path, it also represents a differential signal path. Similarly, if a signal path is shown as a differential path, it also represents a single-ended signal path. While one or more particular architectures are described herein, other architectures can likewise be implemented that use one or more data buses not expressly shown, direct connectivity between elements, and/or indirect coupling between other elements, as recognized by one of average skill in the art.
The term "module" is used to describe various embodiments of the present invention. A module includes a functional block that is implemented via hardware to perform one or more module functions, such as processing one or more input signals to generate one or more output signals. The hardware implementing the modules may itself operate in conjunction with software and/or firmware. A module as used herein may contain one or more sub-modules, each sub-module itself being a module.
While particular combinations of various functions and features of the present invention have been expressly described herein, other combinations of these features and functions are likewise possible. The present invention is not limited by the particular examples disclosed herein, and expressly incorporates these other combinations.

Claims (10)

1. An apparatus, comprising:
at least one input for:
receiving a video signal from at least one additional device; and
receiving a plurality of band offsets from the at least one additional apparatus via signaling; and
a processor to:
analyzing a plurality of pixels associated with at least one Largest Coding Unit (LCU) to identify a pixel value distribution, wherein the largest coding unit is associated with the video signal;
identifying a plurality of band indices inferentially based on the pixel value distribution;
applying the plurality of band offsets to the plurality of band indices in accordance with a filtering process of the video signal or a signal based on the video signal;
performing a Sample Adaptive Offset (SAO) filtering process on the video signal or a signal based on the video signal to generate a first filtered signal, wherein the sample adaptive offset filtering process applies the plurality of band offsets to the plurality of band indices; and
performing a deblocking filtering process on the first filtered signal to generate a second filtered signal.
2. The apparatus of claim 1, wherein the processor is to:
analyzing a plurality of pixels associated with at least one largest coding unit to generate a pixel value histogram representing the pixel value distribution, wherein the largest coding unit is associated with the video signal; and
identifying the plurality of band indices to which the plurality of band offsets are to be applied based on the histogram of pixel values.
3. The apparatus of claim 1, wherein:
the plurality of band indices have a discontinuous distribution such that at least two consecutive band indices of the plurality of band indices are separated from each other by at least one band index value.
4. The apparatus of claim 1, wherein:
the pixel value distribution indicates a plurality of subsets of the plurality of pixels respectively associated with at least a portion of the plurality of band indices; and
the plurality of band indices to which the plurality of band offsets are to be applied correspond to at least one of a plurality of subsets of the plurality of pixels, wherein the at least one subset has a relatively larger or maximum number of pixels compared to other subsets of the plurality of pixels.
5. The apparatus of claim 1, wherein:
the apparatus is a communication device operating in at least one of a satellite communication system, a wireless communication system, a wired communication system, a fiber optic communication system, and a mobile communication system.
6. An apparatus, comprising:
an input for receiving a video signal and a plurality of band offsets from at least one additional device; and
a processor to:
analyzing a plurality of pixels associated with at least one Largest Coding Unit (LCU) associated with the video signal to identify a pixel value distribution for identifying a plurality of band indices; and
applying the plurality of band offsets to the plurality of band indices in accordance with a filtering process of the video signal or a signal based on the video signal.
7. The apparatus of claim 6, wherein the processor is to:
performing a Sample Adaptive Offset (SAO) filtering process on the video signal or a signal based on the video signal to generate a first filtered signal, wherein the sample adaptive offset filtering process applies the plurality of band offsets to the plurality of band indices; and
performing a deblocking filtering process on the first filtered signal to generate a second filtered signal.
8. The apparatus of claim 6, wherein the processor is to:
analyzing a plurality of pixels associated with at least one largest coding unit to generate a pixel value histogram representing the pixel value distribution, wherein the largest coding unit is associated with the video signal; and
identifying a plurality of band indices to which the plurality of band offsets are to be applied based on the histogram of pixel values.
9. The apparatus of claim 6, wherein:
the plurality of band indices have a discontinuous distribution such that at least two consecutive band indices of the plurality of band indices are separated from each other by at least one band index value.
10. A method of operation of a communication device, the method comprising:
receiving, via an input of the communication device, a video signal and a plurality of band offsets from at least one additional communication device;
analyzing a plurality of pixels associated with at least one Largest Coding Unit (LCU) associated with the video signal to identify a pixel value distribution for identifying a plurality of band indices; and
applying the plurality of band offsets to the plurality of band indices in accordance with a filtering process of the video signal or a signal based on the video signal.