HK1161464B - Video coding with large macroblocks - Google Patents
- Publication number: HK1161464B
- Authority: HK (Hong Kong)
- Prior art keywords: block, partition, video, partitions, encoded
Abstract
Techniques are described for encoding and decoding digital video data using macroblocks that are larger than the macroblocks prescribed by conventional video encoding and decoding standards. For example, the techniques include encoding and decoding a video stream using macroblocks comprising greater than 16x16 pixels, for example, 64x64 pixels. In one example, an apparatus includes a video encoder configured to encode a video block having a size of more than 16x16 pixels, generate block-type syntax information that indicates the size of the block, and generate a coded block pattern value for the encoded block, wherein the coded block pattern value indicates whether the encoded block includes at least one non-zero coefficient. The encoder may set the coded block pattern value to zero when the encoded block does not include at least one non-zero coefficient or set the coded block pattern value to one when the encoded block includes a non-zero coefficient.
Description
The present application claims the benefit of U.S. Provisional Application No. 61/102,787, filed October 3, 2008, U.S. Provisional Application No. 61/144,357, filed January 13, 2009, and U.S. Provisional Application No. 61/166,631, filed April 3, 2009, each of which is incorporated herein by reference in its entirety.
This application is related to U.S. patent applications (temporarily referenced by attorney docket numbers 090033U2, 090033U3, and 090033U4), all filed on even date herewith, all having the same title "VIDEO CODING WITH LARGE MACROBLOCKS," all assigned to the present assignee, and all expressly incorporated herein by reference in their entirety for all purposes.
Technical Field
This disclosure relates to digital video coding, and more particularly, to block-based video coding.
Background
Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, Personal Digital Assistants (PDAs), laptop or desktop computers, digital cameras, digital recording devices, video gaming consoles, cellular or satellite radio telephones, and the like. Digital video devices implement video compression techniques, such as those described in the standards and extensions of the standards defined by MPEG-2, MPEG-4, ITU-T H.263, or ITU-T H.264/MPEG-4 part 10 Advanced Video Coding (AVC), to more efficiently transmit and receive digital video information.
Video compression techniques perform spatial prediction and/or temporal prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video frame or slice may be partitioned into macroblocks. Each macroblock may be further partitioned. Macroblocks in an intra-coded (I) frame or slice are encoded using spatial prediction with respect to neighboring macroblocks. Macroblocks in an inter-coded (P or B) frame or slice may use spatial prediction with respect to neighboring macroblocks in the same frame or slice or temporal prediction with respect to other reference frames.
Disclosure of Invention
In general, techniques are described for encoding digital video data using large macroblocks. Large macroblocks are larger than those typically specified by existing video coding standards. Most video coding standards specify the use of macroblocks in the form of 16x16 pixel arrays. According to the present invention, the encoder and decoder may utilize large macroblocks larger than 16 × 16 pixels in size. As an example, a large macroblock may have a 32 × 32, 64 × 64, or larger array of pixels.
Video coding relies on spatial and/or temporal redundancy to support compression of video data. Video frames generated at higher spatial resolutions and/or higher frame rates may support more redundancy. As described in this disclosure, using large macroblocks may permit video coding techniques to take advantage of the greater redundancy that occurs as spatial resolution and/or frame rate increases. In accordance with this disclosure, video coding techniques may utilize a variety of features to support coding of large macroblocks.
As described in this disclosure, large macroblock coding techniques may partition a large macroblock into partitions, and use different partition sizes and different coding modes (e.g., different spatial (I) modes or temporal (P or B) modes) for selected partitions. As another example, a coding technique may utilize Coded Block Pattern (CBP) values to efficiently identify coded macroblocks and partitions with non-zero coefficients within large macroblocks. As another example, a coding technique may compare rate-distortion metrics generated by coding using large macroblocks and small macroblocks to select a macroblock size that produces more favorable results.
In one example, the disclosure provides a method comprising: encoding a video block having a size greater than 16x16 pixels with a video encoder; generating block type syntax information indicating the size of the block; and generating a coded block pattern value for the encoded block, wherein the coded block pattern value indicates whether the encoded block includes at least one non-zero coefficient.
In another example, this disclosure provides an apparatus comprising a video encoder configured to: encoding a video block having a size greater than 16x16 pixels; generating block type syntax information indicating the size of the block; and generating a coded block pattern value for the encoded block, wherein the coded block pattern value indicates whether the encoded block includes at least one non-zero coefficient.
In another example, this disclosure provides a computer-readable medium encoded with instructions for causing a video encoding apparatus to: encoding a video block having a size greater than 16x16 pixels with a video encoder; generating block type syntax information indicating the size of the block; and generating a coded block pattern value for the encoded block, wherein the coded block pattern value indicates whether the encoded block includes at least one non-zero coefficient.
In an additional example, the disclosure provides a method comprising: receiving, with a video decoder, an encoded video block having a size greater than 16x16 pixels; receiving block type syntax information indicating the size of the encoded block; receiving a coded block pattern value for the encoded block, wherein the coded block pattern value indicates whether the encoded block includes at least one non-zero coefficient; and decoding the encoded block based on the block type syntax information and the coded block pattern value for the encoded block.
In another example, this disclosure provides an apparatus comprising a video decoder configured to: receiving an encoded video block having a size greater than 16x16 pixels; receiving block type syntax information indicating the size of the encoded block; receiving a coded block pattern value for the encoded block, wherein the coded block pattern value indicates whether the encoded block includes at least one non-zero coefficient; and decoding the encoded block based on the block type syntax information and the coded block pattern value for the encoded block.
In another example, this disclosure provides a computer-readable medium comprising instructions to cause a video decoder to: receiving an encoded video block having a size greater than 16x16 pixels; receiving block type syntax information indicating the size of the encoded block; receiving a coded block pattern value for the encoded block, wherein the coded block pattern value indicates whether the encoded block includes at least one non-zero coefficient; and decoding the encoded block based on the block type syntax information and the coded block pattern value for the encoded block.
In another example, the disclosure provides a method comprising: receiving, with a video encoder, a video block having a size greater than 16x16 pixels; partitioning the block into partitions; encoding one of the partitions using a first encoding mode; encoding another one of the partitions using a second encoding mode different from the first encoding mode; and generating block type syntax information indicating the size of the block and identifying the partition and the coding mode used to code the partition.
In an additional example, this disclosure provides an apparatus comprising a video encoder configured to: receiving a video block having a size greater than 16x16 pixels; partitioning the block into partitions; encoding one of the partitions using a first encoding mode; encoding another one of the partitions using a second encoding mode different from the first encoding mode; generating block type syntax information indicating the size of the block and identifying the partition and the coding mode used to encode the partition.
In another example, this disclosure provides a computer-readable medium encoded with instructions to cause a video encoder to: receiving a video block having a size greater than 16x16 pixels; partitioning the block into partitions; encoding one of the partitions using a first encoding mode; encoding another one of the partitions using a second encoding mode different from the first encoding mode; and generating block type syntax information indicating the size of the block and identifying the partition and the coding mode used to code the partition.
In another example, the disclosure provides a method comprising: receiving, with a video decoder, video blocks having a size greater than 16x16 pixels, wherein the blocks are partitioned into partitions, one of the partitions is encoded using a first encoding mode and another of the partitions is encoded using a second encoding mode different from the first encoding mode; receiving block type syntax information indicating the size of the block and identifying the partition and the coding mode used to code the partition; and decoding the video block based on the block type syntax information.
In another example, this disclosure provides an apparatus comprising a video decoder configured to: receiving video blocks having a size greater than 16x16 pixels, wherein the blocks are partitioned into partitions, one of the partitions being encoded using a first encoding mode and another of the partitions being encoded using a second encoding mode different from the first encoding mode; receiving block type syntax information indicating the size of the block and identifying the partition and the coding mode used to code the partition; and decoding the video block based on the block type syntax information.
In an additional example, this disclosure provides a computer-readable medium encoded with instructions to cause a video decoder to: receiving, with a video decoder, video blocks having a size greater than 16x16 pixels, wherein the blocks are partitioned into partitions, one of the partitions being encoded using a first encoding mode and another of the partitions being encoded using a second encoding mode different from the first encoding mode; receiving block type syntax information indicating the size of the block and identifying the partition and the coding mode used to code the partition; and decoding the video block based on the block type syntax information.
In another example, the disclosure provides a method comprising: receiving a video coding unit with a digital video encoder; determining a first rate-distortion metric for encoding the video coding unit using a first video block having a size of 16x16 pixels; determining a second rate-distortion metric for encoding the video coding unit using a second video block having a size greater than 16x16 pixels; encoding the video coding unit using the first video block when the first rate-distortion metric is less than the second rate-distortion metric; and encoding the video coding unit using the second video block when the second rate-distortion metric is less than the first rate-distortion metric.
In an additional example, this disclosure provides an apparatus comprising a video encoder configured to: receiving a video coding unit; determining a first rate-distortion metric for encoding the video coding unit using a first video block having a size of 16x16 pixels; determining a second rate-distortion metric for encoding the video coding unit using a second video block having a size greater than 16x16 pixels; encoding the video coding unit using the first video block when the first rate-distortion metric is less than the second rate-distortion metric; and encoding the video coding unit using the second video block when the second rate-distortion metric is less than the first rate-distortion metric.
In another example, this disclosure provides a computer-readable medium encoded with instructions to cause a video encoder to: receiving a video coding unit; determining a first rate-distortion metric for encoding the video coding unit using a first video block having a size of 16x16 pixels; determining a second rate-distortion metric for encoding the video coding unit using a second video block having a size greater than 16x16 pixels; encoding the video coding unit using the first video block when the first rate-distortion metric is less than the second rate-distortion metric; and encoding the video coding unit using the second video block when the second rate-distortion metric is less than the first rate-distortion metric.
In another example, the disclosure provides a method comprising: encoding, with a video encoder, a coded unit comprising a plurality of video blocks, wherein at least one of the plurality of video blocks comprises a size greater than 16x16 pixels; and generating syntax information for the coded unit that includes a maximum size value, wherein the maximum size value indicates a size of a largest of the plurality of video blocks in the coded unit.
In another example, this disclosure provides an apparatus comprising a video encoder configured to: encoding a coded unit comprising a plurality of video blocks, wherein at least one of the plurality of video blocks comprises a size greater than 16x16 pixels; and generating syntax information for the coded unit that includes a maximum size value, wherein the maximum size value indicates a size of a largest of the plurality of video blocks in the coded unit.
In another example, this disclosure provides an apparatus comprising: means for encoding a coded unit comprising a plurality of video blocks, wherein at least one of the plurality of video blocks comprises a size greater than 16x16 pixels; and means for generating syntax information for the coded unit that includes a maximum size value, wherein the maximum size value indicates a size of a largest of the plurality of video blocks in the coded unit.
In another example, this disclosure provides a computer-readable storage medium encoded with instructions to cause a programmable processor to: encoding a coded unit comprising a plurality of video blocks, wherein at least one of the plurality of video blocks comprises a size greater than 16x16 pixels; and generating syntax information for the coded unit that includes a maximum size value, wherein the maximum size value indicates a size of a largest of the plurality of video blocks in the coded unit.
In another example, the disclosure provides a method comprising: receiving, with a video decoder, a coded unit comprising a plurality of video blocks, wherein at least one of the plurality of video blocks comprises a size greater than 16x16 pixels; receiving syntax information for the coded unit that includes a maximum size value, wherein the maximum size value indicates a size of a largest of the plurality of video blocks in the coded unit; selecting a block type syntax decoder according to the maximum size value; and decoding each of the plurality of video blocks in the coded unit using the selected block type syntax decoder.
In another example, this disclosure provides an apparatus comprising a video decoder configured to: receiving a coded unit comprising a plurality of video blocks, wherein at least one of the plurality of video blocks comprises a size greater than 16x16 pixels; receiving syntax information for the coded unit that includes a maximum size value, wherein the maximum size value indicates a size of a largest of the plurality of video blocks in the coded unit; selecting a block type syntax decoder according to the maximum size value; and decoding each of the plurality of video blocks in the coded unit using the selected block type syntax decoder.
In another example, this disclosure provides an apparatus comprising: means for receiving a coded unit comprising a plurality of video blocks, wherein at least one of the plurality of video blocks comprises a size greater than 16x16 pixels; means for receiving syntax information for the coded unit that comprises a maximum size value, wherein the maximum size value indicates a size of a largest of the plurality of video blocks in the coded unit; means for selecting a block type syntax decoder according to the maximum size value; and means for decoding each of the plurality of video blocks in the coded unit using the selected block type syntax decoder.
In another example, this disclosure provides a computer-readable storage medium encoded with instructions for causing a programmable processor to: receiving a coded unit comprising a plurality of video blocks, wherein at least one of the plurality of video blocks comprises a size greater than 16x16 pixels; receiving syntax information for the coded unit that includes a maximum size value, wherein the maximum size value indicates a size of a largest of the plurality of video blocks in the coded unit; selecting a block type syntax decoder according to the maximum size value; and decoding each of the plurality of video blocks in the coded unit using the selected block type syntax decoder.
The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
Drawings
Fig. 1 is a block diagram illustrating an example video encoding and decoding system that encodes and decodes digital video data using large macroblocks.
Fig. 2 is a block diagram illustrating an example of a video encoder implementing techniques for coding large macroblocks.
Fig. 3 is a block diagram illustrating an example of a video decoder implementing techniques for coding large macroblocks.
Fig. 4A is a conceptual diagram illustrating partitioning in various levels of a large macroblock.
Fig. 4B is a conceptual diagram illustrating the assignment of different coding modes to different partitions of a large macroblock.
Fig. 5 is a conceptual diagram illustrating a hierarchical view of various levels of a large macroblock.
Fig. 6 is a flow diagram illustrating an example method for setting Coded Block Pattern (CBP) values for a large macroblock of 64 x 64 pixels.
Fig. 7 is a flow diagram illustrating an example method for setting CBP values for a 32 x 32 pixel partition of a large macroblock of 64 x 64 pixels.
Fig. 8 is a flow diagram illustrating an example method for setting CBP values for a 16x16 pixel partition of a 32 x 32 pixel partition of a large macroblock of 64 x 64 pixels.
FIG. 9 is a flow diagram illustrating an example method for determining a two-bit luma16x8_CBP value.
Fig. 10 is a block diagram illustrating an example arrangement of a large macroblock of 64 x 64 pixels.
FIG. 11 is a flow diagram illustrating an example method for computing an optimal partitioning and encoding method for a large video block of N pixels.
Fig. 12 is a block diagram illustrating an example 64 x 64 pixel macroblock with various partitions and a selected encoding method for each partition.
Fig. 13 is a flow diagram illustrating an example method for determining an optimal size for a macroblock of a frame of an encoded video sequence.
Fig. 14 is a block diagram illustrating an example wireless communication device that includes a video encoder/decoder (CODEC) that encodes digital video data using large macroblocks.
Fig. 15 is a block diagram illustrating an example array representation of a hierarchical CBP representation of a large macroblock.
FIG. 16 is a block diagram illustrating an example tree structure corresponding to the hierarchical CBP representation of FIG. 15.
Fig. 17 is a flow diagram illustrating an example method for using syntax information of a coded unit to indicate and select a block-based syntax encoder and decoder for a video block of the coded unit.
Detailed Description
This disclosure describes techniques for encoding and decoding digital video data using large macroblocks. Large macroblocks are larger than those typically specified by existing video coding standards. Most video coding standards specify the use of macroblocks in the form of 16x16 pixel arrays. According to the present invention, an encoder and/or decoder may utilize large macroblocks larger than 16 × 16 pixels in size. As an example, a large macroblock may have a 32 × 32, 64 × 64, or possibly larger array of pixels.
In general, the term "macroblock" as used herein may refer to a data structure that includes a pixel array of defined size expressed as N × N pixels, where N is a positive integer value. A macroblock may define: four luminance blocks, each comprising an array of (N/2) × (N/2) pixels; two chrominance blocks, each comprising an array of (N/2) × (N/2) pixels; and a header including macroblock type information and Coded Block Pattern (CBP) information, as discussed in more detail below.
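The macroblock structure described above can be illustrated with a short sketch. This is a hypothetical container only (the field names and the `sub_block_size` helper are illustrative, not part of any standard's normative syntax), assuming 4:2:0 sampling so that the luma and chroma sub-blocks are (N/2) × (N/2) arrays:

```python
from dataclasses import dataclass, field

@dataclass
class Macroblock:
    """Illustrative N x N macroblock container (hypothetical names).

    Assumes 4:2:0 sampling: the four luminance blocks and the two
    chrominance blocks are each (N/2) x (N/2) arrays of samples.
    """
    n: int                      # macroblock dimension N (e.g., 16, 32, 64)
    mb_type: str = "I"          # macroblock type information (header)
    cbp: int = 0                # coded block pattern bits (header)
    luma: list = field(default_factory=list)    # four (N/2)x(N/2) blocks
    chroma: list = field(default_factory=list)  # two (N/2)x(N/2) blocks

    def sub_block_size(self) -> int:
        """Side length of each luma/chroma sub-block."""
        return self.n // 2

mb = Macroblock(n=64)
print(mb.sub_block_size())  # 32
```

For a conventional macroblock, `Macroblock(n=16)` yields 8 × 8 sub-blocks; the large macroblocks of this disclosure simply use N > 16.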
Conventional video coding standards typically specify a defined macroblock size as a 16x16 array of pixels. According to the various techniques described in this disclosure, a macroblock may include an N × N array of pixels, where N may be greater than 16. Likewise, conventional video coding standards typically assign a single motion vector to each inter-coded macroblock. According to the various techniques described in this disclosure, a plurality of motion vectors may be assigned to inter-coded partitions of an N × N macroblock, as described in more detail below. References to "large macroblocks" or similar phrases generally refer to macroblocks having pixel arrays greater than 16x16.
In some cases, large macroblocks may support improvements in coding efficiency and/or reductions in data transmission overhead while maintaining or possibly improving image quality. For example, the use of large macroblocks may permit video encoders and/or decoders to utilize the increased redundancy provided by generating video data at increased spatial resolution (e.g., 1280 × 720 or 1920 × 1080 pixels per frame) and/or increased frame rate (e.g., 30 or 60 frames per second).
As an illustration, a digital video sequence with a spatial resolution of 1280 × 720 pixels per frame and a frame rate of 60 frames per second carries roughly 36 times as many pixels per frame, and 4 times as many frames per second, as a sequence with a spatial resolution of 176 × 144 pixels per frame and a frame rate of 15 frames per second. With increased macroblock sizes, video encoders and/or decoders may better utilize this increased spatial and/or temporal redundancy to support compression of video data.
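The ratios in the illustration above can be checked with simple arithmetic (frame dimensions and frame rates taken from the text):

```python
# Worked check of the spatial and temporal ratios cited above.
hd_pixels = 1280 * 720      # 921,600 samples per frame
qcif_pixels = 176 * 144     # 25,344 samples per frame

spatial_ratio = hd_pixels / qcif_pixels   # pixels-per-frame ratio
temporal_ratio = 60 / 15                  # frames-per-second ratio

print(round(spatial_ratio, 2))  # 36.36 -- roughly the "36 times" figure
print(temporal_ratio)           # 4.0  -- the "4 times" figure
```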
Also, by using larger macroblocks, a smaller number of blocks can be encoded for a given frame or slice, thereby reducing the amount of overhead information that needs to be transmitted. In other words, larger macroblocks may permit a reduction in the total number of macroblocks coded per frame or slice. If, for example, the spatial resolution of a frame is increased by a factor of 4, then four times as many 16x16 macroblocks would be needed to cover the pixels in the frame. In this example, with 64 × 64 macroblocks, the number of macroblocks needed to handle the increased spatial resolution is reduced. With a reduced number of macroblocks per frame or slice, the cumulative amount of coding information, such as syntax information, motion vector data, and the like, may be reduced.
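The reduction in per-frame macroblock count can be made concrete with a small sketch (a back-of-the-envelope calculation, not part of the disclosed techniques; partial blocks at frame edges are rounded up):

```python
import math

def macroblock_count(width: int, height: int, mb_size: int) -> int:
    """Macroblocks needed to cover a width x height frame,
    rounding partial blocks up at the frame edges."""
    return math.ceil(width / mb_size) * math.ceil(height / mb_size)

# 1280x720 frame: 16x16 vs 64x64 macroblocks
print(macroblock_count(1280, 720, 16))  # 3600 macroblocks
print(macroblock_count(1280, 720, 64))  # 240 macroblocks, a 15x reduction
```

Each macroblock carries header overhead (type, CBP, motion data), so a 15x reduction in block count directly shrinks the cumulative overhead per frame.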
In this disclosure, the size of a macroblock generally refers to the number of pixels contained in the macroblock, e.g., 64 × 64, 32 × 32, 16 × 16, etc. Thus, a large macroblock (e.g., 64 × 64 or 32 × 32) may be large in the sense that it contains a number of pixels larger than that of a 16 × 16 macroblock. However, the spatial region defined by the vertical and horizontal dimensions of a large macroblock (i.e., as a fraction of the area defined by the vertical and horizontal dimensions of a video frame) may or may not be larger than the region of a conventional 16x16 macroblock. In some examples, the area of a large macroblock may be the same as or similar to that of a conventional 16 × 16 macroblock. In that case, however, the large macroblock has a higher spatial resolution, characterized by a higher number and higher spatial density of pixels within the macroblock.
The size of the macroblock may be configured based at least in part on the number of pixels in the frame (i.e., the spatial resolution in the frame). A large macroblock can be configured to have a higher number of pixels if the frame has a higher number of pixels. As an illustration, a video encoder may be configured to utilize 32 × 32 pixel macroblocks for a 1280 × 720 pixel frame displayed at 30 frames per second. As another illustration, a video encoder may be configured to utilize 64 × 64 pixel macroblocks for a 1280 × 720 pixel frame displayed at 60 frames per second.
Each macroblock encoded by an encoder may require data describing one or more characteristics of the macroblock. The data may indicate, for example, macroblock type data used to represent the size of the macroblock, the manner in which the macroblock is partitioned, and the coding mode (spatial or temporal) applied to the macroblock and/or its partitions. In addition, the data may include motion vector difference (mvd) data as well as other syntax elements representing motion vector information for a macroblock and/or its partitions. Also, the data may include Coded Block Pattern (CBP) values as well as other syntax elements representing predicted residual information. The macroblock type data may be provided in a single macroblock header for large macroblocks.
As mentioned above, by utilizing large macroblocks, the encoder may reduce the number of macroblocks per frame or slice, and thereby reduce the amount of net overhead that needs to be transmitted for each frame or slice. Also, by utilizing large macroblocks, the total number of macroblocks can be reduced for a particular frame or slice, which can reduce blocking artifacts in the video displayed to the user.
The video encoding techniques described in this disclosure may utilize one or more features to support coding of large macroblocks. For example, a large macroblock may be partitioned into several smaller partitions. Different coding modes, e.g., different spatial (I) or temporal (P or B) coding modes, may be applied to selected partitions within a large macroblock. Also, hierarchical Coded Block Pattern (CBP) values may be utilized to efficiently identify coded macroblocks and partitions having non-zero transform coefficients representing residual data. In addition, the rate-distortion metrics for coding using large and small macroblock sizes may be compared to select a macroblock size that yields favorable results. Moreover, a coded unit (e.g., a frame, slice, sequence, or group of pictures) that includes macroblocks of varying sizes may include a syntax element that indicates the size of the largest macroblock in the coded unit. As described in more detail below, large macroblocks use block-level syntax different from that of standard 16x16 pixel blocks. Thus, by indicating the size of the largest macroblock in a coded unit, the encoder may signal to the decoder the block-level syntax decoder to be applied to the macroblocks of the coded unit.
Using different coding modes for different partitions in a large macroblock may be referred to as mixed mode coding of the large macroblock. Instead of uniformly coding a large macroblock such that all partitions have the same intra or inter coding mode, the large macroblock may be coded such that some partitions have different coding modes, such as different intra coding modes (e.g., I_16x16, I_8x8, I_4x4) or intra and inter coding modes.
If a large macroblock is divided into two or more partitions, at least one partition may be coded in a first mode and another partition may be coded in a second mode different from the first mode, for example. In some cases, the first mode may be a first I-mode and the second mode may be a second I-mode different from the first I-mode. In other cases, the first mode may be an I-mode and the second mode may be a P-or B-mode. Thus, in some examples, a large macroblock may include one or more temporally (P or B) coded partitions and one or more spatially (I) coded partitions, or one or more spatially coded partitions with different I modes.
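Mixed mode coding as described above can be illustrated with a short sketch. The partition labels and mode strings are hypothetical stand-ins for the mode information a real encoder would signal:

```python
# Hypothetical mode assignment for a 64x64 macroblock split into four
# 32x32 partitions: spatial (I) and temporal (P/B) modes may be mixed.
partitions = {
    "top_left": "I_16x16",    # spatial prediction mode
    "top_right": "I_8x8",     # a different spatial mode
    "bottom_left": "P",       # temporal prediction from a reference frame
    "bottom_right": "B",      # bidirectional temporal prediction
}

def is_mixed_mode(modes: dict) -> bool:
    """A large macroblock is 'mixed mode' when its partitions do not
    all share the same coding mode."""
    return len(set(modes.values())) > 1

print(is_mixed_mode(partitions))  # True
```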
One or more hierarchical Coded Block Pattern (CBP) values may be used to efficiently describe whether any partition in a large macroblock has at least one non-zero transform coefficient and, if so, which partitions do. The transform coefficients encode the residual data for the large macroblock. A macroblock-level CBP bit indicates whether any partition in the large macroblock includes non-zero, quantized coefficients. If no partition in the large macroblock includes non-zero, quantized coefficients, it is not necessary to consider whether any individual partition has non-zero coefficients, since the entire large macroblock is known to have no non-zero coefficients. In this case, a macroblock without residual data may be decoded using a predicted macroblock.
Alternatively, if the macroblock-level CBP value indicates that at least one partition in a large macroblock has non-zero coefficients, the partition-level CBP value may be analyzed to identify which of the partitions includes at least one non-zero coefficient. The decoder may then retrieve the appropriate residual data for the partition having at least one non-zero coefficient and decode the partition using the residual data and prediction block data. In some cases, one or more partitions may have non-zero coefficients, and thus include partition-level CBP values with appropriate indications. Both the large macroblock and at least some of the partitions may be larger than 16x16 pixels.
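The two-level CBP walk described above can be sketched as follows. This is a simplified model (the function, its arguments, and the list-of-bits representation are illustrative; the actual bitstream syntax is described later in the disclosure):

```python
def decode_with_hierarchical_cbp(mb_cbp, partition_cbps, fetch_residual, predict):
    """Simplified hierarchical CBP walk for one large macroblock.

    mb_cbp: 1 if any partition has a non-zero coefficient, else 0.
    partition_cbps: per-partition CBP bits, consulted only when mb_cbp == 1.
    fetch_residual/predict: callables yielding residual and prediction
    values for partition index i (scalars here, sample blocks in reality).
    """
    if mb_cbp == 0:
        # No residual anywhere in the macroblock: prediction alone suffices,
        # and the per-partition CBP bits need not be examined at all.
        return [predict(i) for i in range(len(partition_cbps))]
    out = []
    for i, bit in enumerate(partition_cbps):
        if bit:
            out.append(predict(i) + fetch_residual(i))  # prediction + residual
        else:
            out.append(predict(i))                      # prediction only
    return out

# Toy usage: only partition 2 carries residual data.
recon = decode_with_hierarchical_cbp(
    1, [0, 0, 1, 0],
    fetch_residual=lambda i: 5,
    predict=lambda i: 10)
print(recon)  # [10, 10, 15, 10]
```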
To select the macroblock size that yields a favorable rate-distortion metric, the rate-distortion metric may be analyzed for both large macroblocks (e.g., 32 x 32 or 64 x 64) and small macroblocks (e.g., 16x16). For example, the encoder may compare rate-distortion metrics between 16 × 16 macroblocks, 32 × 32 macroblocks, and 64 × 64 macroblocks of a coded unit (e.g., a frame or slice). The encoder may then select the macroblock size that yields the best rate-distortion and encode the coded unit using the selected macroblock size (i.e., the macroblock size with the best rate-distortion).
The selection may be based on encoding the frame or slice in three or more passes (e.g., a first pass using a 16x16 pixel macroblock, a second pass using a 32 x 32 pixel macroblock, and a third pass using a 64 x 64 pixel macroblock), and comparing the rate-distortion metrics for each pass. In this way, the encoder may optimize rate-distortion by varying the macroblock size and selecting the size that produces the best rate-distortion for a given coding unit (e.g., slice or frame). The encoder may further transmit syntax information for the coded unit that identifies the size of the macroblocks used in the coded unit, e.g., as part of a frame header or a slice header. As discussed in more detail below, syntax information for a coded unit may include a maximum size indicator that indicates the maximum size of a macroblock used in the coded unit. In this way, the encoder may inform the decoder which syntax to expect for macroblocks of the coded unit. When the maximum size of a macroblock is 16x16 pixels, the decoder may expect standard H.264 syntax and parse the macroblocks according to the H.264 syntax. However, when the maximum size of a macroblock is greater than 16 × 16 (e.g., 64 × 64 pixels), the decoder may expect modified and/or additional syntax elements related to the processing of the larger macroblocks, as described in this disclosure, and parse the macroblocks according to the modified or additional syntax.
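The multi-pass selection described above reduces to picking the candidate size with the lowest rate-distortion cost. A minimal sketch (the function name and the cost numbers are hypothetical; a real encoder would measure the costs in trial encoding passes):

```python
def select_macroblock_size(rd_costs: dict) -> int:
    """Pick the macroblock size with the lowest rate-distortion cost.

    rd_costs maps candidate sizes (e.g., 16, 32, 64) to the
    rate-distortion metric measured in a trial encoding pass.
    """
    return min(rd_costs, key=rd_costs.get)

# Hypothetical per-pass costs for one frame: here the 64x64 pass wins.
costs = {16: 1450.0, 32: 1320.5, 64: 1290.2}
print(select_macroblock_size(costs))  # 64
```

The selected size would then be signaled to the decoder, e.g., via the maximum size indicator in the frame or slice header.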
For some video frames or slices, large macroblocks may offer substantial bit rate savings at relatively low distortion and thereby produce the best rate-distortion results. For other video frames or slices, however, smaller macroblocks may exhibit less distortion, which outweighs their higher bit rate in the rate-distortion cost analysis. Thus, 64x64, 32x32, or 16x16 may be suitable for different video frames or slices in different cases, e.g., depending on video content and complexity.
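The multi-pass size selection described above can be sketched as a Lagrangian rate-distortion comparison. This is a minimal illustration, not the patent's implementation: the cost function J = D + lambda * R, the lambda value, and the per-pass distortion/rate numbers below are all hypothetical.

```python
def rd_cost(distortion, rate_bits, lam):
    # Lagrangian rate-distortion cost: J = D + lambda * R.
    return distortion + lam * rate_bits

def select_macroblock_size(pass_results, lam):
    # pass_results maps a candidate macroblock size (16, 32, 64) to the
    # (distortion, rate_bits) pair measured for a full encoding pass of
    # the coded unit (frame or slice) at that size.
    return min(pass_results,
               key=lambda size: rd_cost(*pass_results[size], lam))

# Hypothetical measurements from three encoding passes of one frame:
passes = {16: (1000.0, 48000), 32: (1100.0, 36000), 64: (1400.0, 30000)}
best = select_macroblock_size(passes, lam=0.04)
```

The encoder would then keep (or re-encode) the coded unit at the winning size and signal that size in the frame or slice header.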
Fig. 1 is a block diagram illustrating an example video encoding and decoding system 10 that may utilize techniques for encoding/decoding digital video data using large macroblocks (i.e., macroblocks containing more pixels than a 16x16 macroblock). As shown in fig. 1, system 10 includes a source device 12 that transmits encoded video to a destination device 14 via a communication channel 16. Source device 12 and destination device 14 may comprise any of a wide range of devices. In some cases, source device 12 and destination device 14 may comprise wireless communication devices, such as wireless handsets, so-called cellular or satellite radiotelephones, or any wireless devices that can communicate video information over communication channel 16, in which case communication channel 16 is wireless. However, the techniques of this disclosure, which involve the use of large macroblocks containing more pixels than those specified by conventional video coding standards, are not necessarily limited to wireless applications or settings. For example, these techniques may be applicable to over-the-air television broadcasts, cable television transmissions, satellite television transmissions, internet video transmissions, digital video encoded onto a storage medium, or other scenarios. Accordingly, communication channel 16 may include any combination of wireless or wired media suitable for transmission of encoded video data.
In the example of fig. 1, source device 12 may include a video source 18, a video encoder 20, a modulator/demodulator (modem) 22, and a transmitter 24. Destination device 14 may include a receiver 26, a modem 28, a video decoder 30, and a display device 32. In accordance with this disclosure, video encoder 20 of source device 12 may be configured to apply one or more techniques for using, in a video encoding process, macroblocks having a size greater than the macroblock size specified by conventional video coding standards. Similarly, video decoder 30 of destination device 14 may be configured to apply one or more techniques for using, in a video decoding process, macroblocks having a size greater than the macroblock size specified by conventional video coding standards.
The illustrated system 10 of fig. 1 is merely one example. The techniques for using large macroblocks as described in this disclosure may be performed by any digital video encoding and/or decoding device. Source device 12 and destination device 14 are merely examples of such coding devices, in which source device 12 generates coded video data for transmission to destination device 14. In some examples, devices 12, 14 may operate in a substantially symmetric manner such that each of devices 12, 14 includes video encoding and decoding components. Thus, system 10 may support one-way or two-way video transmission between video devices 12, 14, for example, for video streaming, video playback, video broadcasting, or video telephony.
Video source 18 of source device 12 may comprise a video capture device, such as a video camera, a video archive containing previously captured video, and/or a video feed (video feed) from a video content provider. As another alternative, video source 18 may generate computer graphics-based data as the source video, or generate a combination of live video, archived video, and computer-generated video. In some cases, if video source 18 is a video camera, source device 12 and destination device 14 may form so-called camera phones or video phones. However, as mentioned above, in general, the techniques described in this disclosure may be applicable to video coding, and may be applicable to wireless or wired applications. In each case, the captured, pre-captured, or computer-generated video may be encoded by video encoder 20. The encoded video information may then be modulated by modem 22 according to a communication standard and transmitted to destination device 14 via transmitter 24. Modem 22 may include various mixers, filters, amplifiers, or other components designed for signal modulation. Transmitter 24 may include circuitry designed for transmitting data, including amplifiers, filters, and one or more antennas.
Receiver 26 of destination device 14 receives information on channel 16, and modem 28 demodulates the information. Again, the video encoding process may implement one or more of the techniques described herein to use large macroblocks (e.g., greater than 16x16) for inter (i.e., temporal) encoding and/or intra (i.e., spatial) encoding of video data. The video decoding process performed by video decoder 30 may also use such techniques during the decoding process. The information communicated over channel 16 may include syntax information defined by video encoder 20 that is also used by video decoder 30, including syntax elements that describe characteristics and/or processing of large macroblocks, as discussed in more detail below. Syntax information may be included in any or all of a frame header, a slice header, a sequence header (e.g., with respect to H.264, a header indicating the profile and level to which the coded video sequence conforms), or a macroblock header. Display device 32 displays the decoded video data to a user, and may comprise any of a variety of display devices, such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), a plasma display, an Organic Light Emitting Diode (OLED) display, or another type of display device.
In the example of fig. 1, communication channel 16 may include any wireless or wired communication medium, such as a Radio Frequency (RF) spectrum or one or more physical transmission lines, or any combination of wireless and wired media. The communication channel 16 may form part of a packet-based network (e.g., a local area network, a wide area network, or a global network such as the internet). Communication channel 16 generally represents any suitable communication medium or collection of different communication media, including any suitable combination of wired or wireless media, for transmitting video data from source device 12 to destination device 14. The communication channel 16 may include a router, switch, base station, or any other equipment that may be used to facilitate communication from the source device 12 to the destination device 14.
Video encoder 20 and video decoder 30 may operate in accordance with a video compression standard such as the ITU-T H.264 standard (alternatively described as MPEG-4, Part 10, Advanced Video Coding (AVC)). However, the techniques of this disclosure are not limited to any particular coding standard. Other examples include MPEG-2 and ITU-T H.263. Although not shown in fig. 1, in some aspects, video encoder 20 and video decoder 30 may each be integrated with an audio encoder and decoder, and may include appropriate MUX-DEMUX units, or other hardware and software, to handle encoding of both audio and video in a common data stream or separate data streams. If applicable, the MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol, or other protocols such as the User Datagram Protocol (UDP).
The ITU-T H.264/MPEG-4 (AVC) standard was formulated by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC Moving Picture Experts Group (MPEG) as the product of a collective partnership known as the Joint Video Team (JVT). In some aspects, the techniques described in this disclosure may be applicable to devices that generally comply with the H.264 standard. The H.264 standard is described in ITU-T Recommendation H.264, Advanced Video Coding for generic audiovisual services, published in March 2005 by the ITU-T Study Group, and may be referred to herein as the H.264 standard or H.264 specification, or the H.264/AVC standard or specification. The Joint Video Team (JVT) continues to work on extensions to H.264/MPEG-4 AVC.
Video encoder 20 and video decoder 30 may each be implemented as any of a variety of suitable encoder circuits, such as one or more microprocessors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), discrete logic, software, hardware, firmware, or any combination thereof. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective camera, computer, mobile device, subscriber device, broadcast device, set-top box, server, or the like.
A video sequence typically comprises a series of video frames. Video encoder 20 operates on video blocks within individual video frames in order to encode the video data. A video block may correspond to a macroblock or a partition of a macroblock. A video block may further correspond to a partition of a partition. Video blocks may have fixed or different sizes, and the sizes may differ according to a specified coding standard or according to the techniques of this disclosure. Each video frame may include a plurality of slices. Each slice may include a plurality of macroblocks, which may be arranged into partitions, also referred to as sub-blocks.
As an example, the ITU-T H.264 standard supports intra prediction for various block sizes (e.g., 16 by 16, 8 by 8, or 4 by 4 for luma components, and 8 x 8 for chroma components), and inter prediction for various block sizes (e.g., 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, and 4x4 for luma components, and corresponding scaled sizes for chroma components). In this disclosure, "x" and "by" are used interchangeably to refer to the pixel size of a block in terms of vertical and horizontal dimensions, e.g., 16x16 pixels or 16 by 16 pixels. In general, a 16x16 block will have 16 pixels in the vertical direction and 16 pixels in the horizontal direction. Likewise, an N x N block generally has N pixels in the vertical direction and N pixels in the horizontal direction, where N represents a positive integer value that may be greater than 16. The pixels in a block may be arranged in rows and columns.
A block size smaller than 16 by 16 may be referred to as a partition of a 16 by 16 macroblock. Likewise, for an N x N block, a block size smaller than N x N may be referred to as a partition of the N x N block. The techniques of this disclosure describe intra-coding and inter-coding of macroblocks (e.g., 32x32 pixel macroblocks, 64x64 pixel macroblocks, or larger macroblocks) that are larger than conventional 16x16 pixel macroblocks. The video blocks may comprise blocks of pixel data in the pixel domain, or blocks of transform coefficients in the transform domain, e.g., after applying a transform such as a Discrete Cosine Transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform to residual video block data representing pixel differences between coded video blocks and predicted video blocks. In some cases, a video block may include a block of quantized transform coefficients in the transform domain.
Smaller video blocks may provide better resolution and may be used for locations of a video frame that include high levels of detail. In general, macroblocks and various partitions, sometimes referred to as sub-blocks, may be considered video blocks. In addition, a slice may be considered to be a plurality of video blocks, such as macroblocks and/or subblocks. Each slice may be an independently decodable unit of the video frame. Alternatively, the frame itself may be a decodable unit, or other portions of the frame may be defined as decodable units. The term "coded unit" or "coding unit" may refer to any independently decodable unit of a video frame, such as an entire frame, a slice of a frame, a group of pictures (GOP), also referred to as a sequence, or another independently decodable unit defined according to applicable coding techniques.
Following intra-prediction or inter-prediction coding, which is used to generate prediction data and residual data, and following any transform used to generate the transform coefficients, such as the 4x4 or 8x8 integer transform used in H.264/AVC or a discrete cosine transform (DCT), quantization of the transform coefficients may be performed. Quantization generally refers to a process of quantizing transform coefficients to possibly reduce the amount of data used to represent the coefficients. The quantization process may reduce the bit depth associated with some or all of the coefficients. For example, during quantization, an n-bit value may be rounded down to an m-bit value, where n is greater than m.
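The bit-depth reduction described above (an n-bit value rounded down to an m-bit value) can be sketched as a right shift of the coefficient magnitude. This is an illustrative simplification only; H.264 actually quantizes by dividing by a QP-dependent step size.

```python
def quantize_coefficient(value, n_bits, m_bits):
    # Round an n-bit magnitude down to m bits by discarding the
    # (n - m) least significant bits; the sign is preserved.
    shift = n_bits - m_bits
    sign = -1 if value < 0 else 1
    return sign * (abs(value) >> shift)

def dequantize_coefficient(level, n_bits, m_bits):
    # Approximate reconstruction: shift the m-bit level back up.
    sign = -1 if level < 0 else 1
    return sign * (abs(level) << (n_bits - m_bits))
```

Note how dequantizing 200 quantized from 8 bits to 4 bits reconstructs 192, not 200: quantization is lossy, which is the source of the distortion weighed in the rate-distortion analysis above.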
After quantization, entropy coding of the quantized data may be performed, for example, according to Content Adaptive Variable Length Coding (CAVLC), Context Adaptive Binary Arithmetic Coding (CABAC), or another entropy coding method. A processing unit configured for entropy coding, or another processing unit, may perform other processing functions, such as zero run length coding of quantized coefficients and/or generation of syntax information such as CBP values, macroblock types, coding modes, maximum macroblock sizes of coded units (e.g., frames, slices, macroblocks, or sequences), and so forth.
In accordance with the various techniques of this disclosure, video encoder 20 may encode digital video data using macroblocks that are larger than the macroblocks specified by conventional video encoding standards. In one example, video encoder 20 may encode a video block having a size greater than 16x16 pixels, generate block type syntax information indicating the size of the block, and generate a CBP value for the encoded block, wherein the coded block pattern value indicates whether the encoded block includes at least one non-zero coefficient. The block type syntax information may be provided in a macroblock header of the large macroblock. The block type syntax information may indicate an address or position of the macroblock in a frame or slice, or a macroblock number identifying the position of the macroblock, a type of coding mode applied to the macroblock, a quantization value for the macroblock, any motion vector information for the macroblock, and a CBP value for the macroblock.
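Per the scheme above (and the abstract), the encoder sets the coded block pattern value to zero when the encoded block includes no non-zero coefficient and to one when it does. A one-bit version of that rule is sketched below; the full hierarchical CBP scheme, with per-partition values, is discussed later in the disclosure.

```python
def coded_block_pattern(coefficients):
    # 1 if the encoded block includes at least one non-zero transform
    # coefficient, else 0.
    return 1 if any(c != 0 for c in coefficients) else 0
```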
In another example, video encoder 20 may receive video blocks having a size greater than 16x16 pixels, partition the blocks into partitions, encode one of the partitions using a first encoding mode, encode another of the partitions using a second encoding mode different from the first encoding mode, and generate block type syntax information that indicates the size of the blocks and identifies the partitions and the encoding modes used to encode the partitions.
In an additional example, video encoder 20 may receive a video coding unit (e.g., a frame or slice), determine a first rate-distortion metric for encoding the video coding unit using a first video block having a size of 16x16 pixels, determine a second rate-distortion metric for encoding the video coding unit using a second video block having a size greater than 16x16 pixels, encode the video coding unit using the first video block when the first rate-distortion metric is less than the second rate-distortion metric, and encode the video coding unit using the second video block when the second rate-distortion metric is less than the first rate-distortion metric.
In one example, video decoder 30 may receive an encoded video block having a size greater than 16x16 pixels, receive block type syntax information indicating the size of the encoded block, receive a coded block pattern value for the encoded block, wherein the coded block pattern value indicates whether the encoded block includes at least one non-zero coefficient, and decode the encoded block based on the block type syntax information and the coded block pattern value for the encoded block.
In another example, video decoder 30 may receive a video block having a size greater than 16x16 pixels, where the block is partitioned into partitions, one of the partitions is encoded using a first encoding mode, and another of the partitions is encoded using a second encoding mode different from the first encoding mode, receive block type syntax information indicating the size of the block and identifying the partitions and the encoding modes used to encode the partitions, and decode the video block based on the block type syntax information.
Fig. 2 is a block diagram illustrating an example of video encoder 50, which may implement techniques for using large macroblocks consistent with this disclosure. Video encoder 50 may correspond to video encoder 20 of source device 12 or a video encoder of a different device. Video encoder 50 may perform intra-coding and inter-coding of blocks within a video frame, including large macroblocks, or partitions or sub-partitions of large macroblocks. Intra-coding relies on spatial prediction to reduce or remove spatial redundancy of video within a given video frame. Inter-coding relies on temporal prediction to reduce or remove temporal redundancy of video within adjacent frames of a video sequence.
An intra-mode (I-mode) may refer to any of several spatially-based compression modes, and an inter-mode, such as predictive (P-mode) or bi-directional (B-mode), may refer to any of several temporally-based compression modes. The techniques of this disclosure may be applied during both inter-coding and intra-coding. In some cases, the techniques of this disclosure may also be applicable to encoding non-video digital pictures. That is, a digital still picture encoder may utilize the techniques of this disclosure to intra-code a digital still picture using large macroblocks in a manner similar to encoding intra-coded macroblocks in video frames in a video sequence.
As shown in fig. 2, video encoder 50 receives a current video block within a video frame to be encoded. In the example of fig. 2, video encoder 50 includes motion compensation unit 35, motion estimation unit 36, intra prediction unit 37, mode selection unit 39, reference frame store 34, summer 48, transform unit 38, quantization unit 40, and entropy coding unit 46. For video block reconstruction, video encoder 50 also includes inverse quantization unit 42, inverse transform unit 44, and summer 51. A deblocking filter (not shown in fig. 2) may also be included to filter block boundaries to remove blockiness artifacts from the reconstructed video. The deblocking filter typically filters the output of summer 51, if desired.
During the encoding process, video encoder 50 receives a video frame or slice to be coded. The frame or slice may be divided into a plurality of video blocks, including large macroblocks. Motion estimation unit 36 and motion compensation unit 35 perform inter-prediction coding of the received video block relative to one or more blocks in one or more reference frames to provide temporal compression. Intra-prediction unit 37 performs intra-prediction coding of the received video block relative to one or more neighboring blocks in the same frame or slice as the block to be coded to provide spatial compression.
Mode select unit 39 may select one of the coding modes (intra or inter), e.g., based on the error results, and provide the resulting intra-coded or inter-coded block to summer 48 to generate residual block data and to summer 51 to reconstruct the encoded block for use as a reference frame. In accordance with the techniques of this disclosure, a video block to be encoded may include macroblocks that are larger than the macroblocks specified by conventional coding standards (i.e., larger than 16x16 pixel macroblocks). For example, a large video block may comprise a 64 × 64 pixel macroblock or a 32 × 32 pixel macroblock.
Motion estimation unit 36 and motion compensation unit 35 may be highly integrated, but are illustrated separately for conceptual purposes. Motion estimation is the process of generating motion vectors that estimate the motion of video blocks. A motion vector, for example, may indicate the displacement of a prediction block within a prediction reference frame (or other coded unit) relative to a current block being coded within a current frame (or other coded unit). A prediction block is a block that is found to closely match the block to be coded in terms of pixel differences, which may be determined by Sum of Absolute Differences (SAD), Sum of Squared Differences (SSD), or other difference metrics.
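The SAD metric named above, together with an exhaustive search for the closest-matching prediction block, can be sketched as follows. This is a brute-force illustration with hypothetical helper names; practical encoders restrict the search window and use fast search patterns rather than testing every position.

```python
def sad(cur, ref, ry, rx, n):
    # Sum of absolute differences between the n x n block `cur` and the
    # n x n block of `ref` whose top-left corner is at (ry, rx).
    return sum(abs(cur[y][x] - ref[ry + y][rx + x])
               for y in range(n) for x in range(n))

def full_search(cur, ref, n):
    # Try every n x n position in the reference frame and return the
    # (y, x) offset of the candidate prediction block with minimum SAD.
    h, w = len(ref), len(ref[0])
    candidates = [(ry, rx) for ry in range(h - n + 1)
                  for rx in range(w - n + 1)]
    return min(candidates, key=lambda p: sad(cur, ref, p[0], p[1], n))

# Tiny hypothetical frames: the 2x2 current block appears at offset (1, 1).
ref = [[9, 9, 9, 9],
       [9, 1, 2, 9],
       [9, 3, 4, 9],
       [9, 9, 9, 9]]
cur = [[1, 2],
       [3, 4]]
```

The returned offset, relative to the current block's position, is the motion vector that motion estimation unit 36 would pass along.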
The motion vector may also indicate the displacement of the partition of the large macroblock. In one example with respect to a 64 x 64 pixel macroblock having one 32 x 64 partition and two 32 x 32 partitions, a first motion vector may indicate displacement of the 32 x 64 partitions, a second motion vector may indicate displacement of a first of the 32 x 32 partitions, and a third motion vector may indicate displacement of a second of the 32 x 32 partitions, all relative to corresponding partitions in the reference frame. The partitions may also be considered video blocks (as those terms are used in this disclosure). Motion compensation may involve fetching or generating a prediction block based on a motion vector determined by motion estimation. Also, the motion estimation unit 36 and the motion compensation unit 35 may be functionally integrated.
Motion estimation unit 36 computes motion vectors for video blocks of inter-coded frames by comparing video blocks of inter-coded frames to video blocks of reference frames in reference frame store 34. Motion compensation unit 35 may also interpolate sub-integer pixels of a reference frame (e.g., an I frame or a P frame). The ITU H.264 standard refers to reference frames as "lists". Thus, the data stored in reference frame store 34 may also be considered lists. Motion estimation unit 36 compares blocks of one or more reference frames (or lists) from reference frame store 34 with blocks to be encoded of a current frame, such as a P-frame or B-frame. When a reference frame in reference frame store 34 includes values for sub-integer pixels, the motion vectors calculated by motion estimation unit 36 may refer to sub-integer pixel positions of the reference frame. Motion estimation unit 36 sends the calculated motion vectors to entropy coding unit 46 and motion compensation unit 35. The reference frame block identified by a motion vector may be referred to as a prediction block. Motion compensation unit 35 calculates error values for the prediction block of the reference frame.
Motion compensation unit 35 may calculate prediction data based on the prediction block. Video encoder 50 forms a residual video block by subtracting the prediction data from motion compensation unit 35 from the original video block being coded. Summer 48 represents the component that performs this subtraction operation. Transform unit 38 applies a transform, such as a Discrete Cosine Transform (DCT) or a conceptually similar transform, to the residual block, producing a video block that includes residual transform coefficient values. Transform unit 38 may perform other transforms conceptually similar to DCT, such as the transform defined by the h.264 standard. Wavelet transforms, integer transforms, subband transforms, or other types of transforms may also be used. In any case, transform unit 38 applies a transform to the residual block, producing a block of residual transform coefficients. The transform may convert the residual information from a pixel value domain to a transform domain (e.g., frequency domain).
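As one concrete instance of the transforms named above, the 4x4 integer core transform defined by the H.264 standard can be sketched as Y = Cf * X * Cf^T. The post-scaling that the standard folds into quantization is omitted here, so this is only the unscaled core, not a complete H.264 transform stage.

```python
# H.264 4x4 forward core transform matrix (post-scaling omitted).
CF = [[1,  1,  1,  1],
      [2,  1, -1, -2],
      [1, -1, -1,  1],
      [1, -2,  2, -1]]

def matmul(a, b):
    # 4x4 integer matrix product.
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def core_transform(residual):
    # Y = Cf * X * Cf^T for a 4x4 block of residual pixel values.
    return matmul(matmul(CF, residual), transpose(CF))
```

A constant residual block maps to a single DC coefficient (16 times the constant) with all other coefficients zero, which is why smooth residual regions compress well after quantization.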
Quantization unit 40 quantizes the residual transform coefficients to further reduce the bit rate. The quantization process may reduce the bit depth associated with some or all of the coefficients. In one example, quantization unit 40 may establish a different degree of quantization for each 64x64 pixel macroblock according to a luma quantization parameter, referred to in this disclosure as QP_Y. Quantization unit 40 may further modify the luma quantization parameter used during quantization of a 64x64 macroblock based on a quantization parameter modifier, referred to herein as "MB64_delta_QP", and a previously encoded 64x64 pixel macroblock.
Each large macroblock of 64x64 pixels may include an individual MB64_delta_QP value in the range between -26 and +25, inclusive. In general, video encoder 50 may establish the MB64_delta_QP value for a particular block based on a desired bit rate for transmitting an encoded version of that block. The MB64_delta_QP value of the first 64x64 pixel macroblock may be equal to the QP value of the frame or slice that includes the first 64x64 pixel macroblock (e.g., in a frame/slice header). QP_Y for the current 64x64 pixel macroblock may be calculated according to:
QP_Y = (QP_Y,PREV + MB64_delta_QP + 52) % 52
where QP_Y,PREV refers to the QP_Y value of the previous 64x64 pixel macroblock in the decoding order of the current slice/frame, and where "%" refers to the modulus operator, such that N % 52 returns a result between 0 and 51, inclusive, corresponding to the remainder of N divided by 52. For the first macroblock in a frame/slice, QP_Y,PREV may be set equal to the frame/slice QP sent in the frame/slice header.
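The QP_Y update above can be expressed directly; the +52 term keeps the modulus argument non-negative even for the most negative delta. A minimal sketch:

```python
def update_qp_y(qp_y_prev, mb64_delta_qp):
    # QP_Y = (QP_Y,PREV + MB64_delta_QP + 52) % 52. Adding 52 before the
    # modulus keeps the intermediate sum non-negative for deltas as low
    # as -26, so the result always lands in [0, 51].
    assert -26 <= mb64_delta_qp <= 25
    return (qp_y_prev + mb64_delta_qp + 52) % 52
```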
In one example, quantization unit 40 assumes that the MB64_delta_QP value is equal to zero when no MB64_delta_QP value is defined for a particular 64x64 pixel macroblock (including "skip"-type macroblocks, e.g., the P_Skip and B_Skip macroblock types). In some examples, additional delta_QP values (generally referred to as quantization parameter modification values) may be defined for finer-granularity quantization control of partitions within a 64x64 pixel macroblock, such as MB32_delta_QP values for each 32x32 pixel partition of a 64x64 pixel macroblock. In some examples, each partition of a 64x64 macroblock may be assigned an individual quantization parameter. Using an individualized quantization parameter for each partition, instead of a single QP for the whole 64x64 macroblock, may enable more efficient quantization of the macroblock, e.g., to better adjust quantization for non-homogeneous regions. Each quantization parameter modification value may be included with the corresponding encoded block as syntax information, and a decoder may decode the encoded block by dequantizing (i.e., inverse quantizing) the encoded block according to the quantization parameter modification value.
After quantization, entropy coding unit 46 entropy codes the quantized transform coefficients. For example, entropy coding unit 46 may perform Content Adaptive Variable Length Coding (CAVLC), Context Adaptive Binary Arithmetic Coding (CABAC), or another entropy coding technique. After entropy coding by entropy coding unit 46, the encoded video may be transmitted to another device or archived for later transmission or retrieval. The coded bitstream may include entropy coded blocks of residual transform coefficients, motion vectors for the blocks, MB64_delta_QP values for each 64x64 pixel macroblock, and coded unit headers including, for example, a macroblock type identifier value, an indication of the maximum size of macroblocks in the coded unit, a QP_Y value, a Coded Block Pattern (CBP) value, a value that identifies a partitioning method of a macroblock or sub-block, a transform size flag value, and other syntax elements, as discussed in more detail below. In the case of context adaptive binary arithmetic coding, the context may be based on neighboring macroblocks.
In some cases, in addition to entropy coding, entropy coding unit 46 or another unit of video encoder 50 may be configured to perform other coding functions. For example, entropy coding unit 46 may be configured to determine CBP values for large macroblocks and partitions. Entropy coding unit 46 may apply a layered CBP scheme to provide CBP values for a large macroblock that indicate whether any partitions in the macroblock include non-zero transform coefficient values, and if so, to provide other CBP values for indicating whether particular partitions within the large macroblock have non-zero transform coefficient values. Also, in some cases, entropy coding unit 46 may perform run-length coding of coefficients in large macroblocks or sub-partitions. In particular, entropy coding unit 46 may apply zig-zag scanning or other scanning modes to scan transform coefficients in a macroblock or partition and encode runs of zeros for further compression. Entropy coding unit 46 may also construct header information with the appropriate syntax elements for transmission in the encoded video bitstream.
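The zig-zag scanning and zero run-length coding that entropy coding unit 46 may apply can be sketched as below. The diagonal ordering matches the common 4x4 zig-zag pattern; the (run, level) pairing is a simplified stand-in for CAVLC's actual run/level syntax, not the standard's exact coding.

```python
def zigzag_order(n):
    # Visit coefficients by anti-diagonal, alternating direction, so that
    # low-frequency coefficients come first: (0,0), (0,1), (1,0), ...
    order = []
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else reversed(diag))
    return order

def run_length(block):
    # Emit (zero_run, level) pairs for the non-zero coefficients of a
    # square coefficient block, in zig-zag scan order.
    pairs, run = [], 0
    for i, j in zigzag_order(len(block)):
        if block[i][j] == 0:
            run += 1
        else:
            pairs.append((run, block[i][j]))
            run = 0
    return pairs
```

After quantization most high-frequency coefficients are zero, so the runs of zeros between the surviving levels are long and compress well.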
Inverse quantization unit 42 and inverse transform unit 44 apply inverse quantization and inverse transform, respectively, to reconstruct the residual block in the pixel domain, e.g., for later use as a reference block. Motion compensation unit 35 may calculate the reference block by adding the residual block to a prediction block of one of the frames of reference frame store 34. Motion compensation unit 35 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values. Summer 51 adds the reconstructed residual block to the motion compensated prediction block produced by motion compensation unit 35 to produce a reconstructed video block for storage in reference frame store 34. The reconstructed video block may be used by motion estimation unit 36 and motion compensation unit 35 as a reference block for inter-coding a block in a subsequent video frame. Large macroblocks may include 64 × 64 pixel macroblocks, 32 × 32 pixel macroblocks, or other macroblocks larger than the size specified by conventional video coding standards.
Fig. 3 is a block diagram illustrating an example of video decoder 60, which decodes a video sequence encoded in the manner described in this disclosure. The encoded video sequence may include encoded macroblocks that are larger than the size specified by conventional video coding standards. For example, the encoded macroblocks may be 32x32 pixel or 64x64 pixel macroblocks. In the example of fig. 3, video decoder 60 includes entropy decoding unit 52, motion compensation unit 54, intra-prediction unit 55, inverse quantization unit 56, inverse transform unit 58, reference frame store 62, and summer 64. Video decoder 60 may, in some examples, perform a decoding pass generally reciprocal to the encoding pass described with respect to video encoder 50 (fig. 2). Motion compensation unit 54 may generate prediction data based on the motion vectors received from entropy decoding unit 52.
Entropy decoding unit 52 entropy decodes the received bitstream to generate quantized coefficients and syntax elements (e.g., motion vectors, CBP values, QP_Y values, transform size flag values, and MB64_delta_QP values). Entropy decoding unit 52 may parse the bitstream to identify syntax information in coded units (e.g., frame, slice, and/or macroblock headers). Syntax information for a coded unit that includes a plurality of macroblocks may indicate the maximum size of the macroblocks (e.g., 16x16 pixel, 32x32 pixel, 64x64 pixel, or other larger-sized macroblocks) in the coded unit. Syntax information for a block is forwarded from entropy decoding unit 52 to either motion compensation unit 54 or intra-prediction unit 55, e.g., depending on the coding mode of the block. A decoder may use the maximum size indicator in the syntax of a coded unit to select a syntax decoder for the coded unit. Using the syntax decoder specified for the maximum size, the decoder can then properly interpret and process the large macroblocks included in the coded unit.
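The decoder's use of the maximum size indicator to pick a syntax interpretation, as described above, can be sketched as a simple dispatch. The parser labels here are illustrative only, not part of any standard.

```python
def select_syntax_parser(max_mb_size):
    # Standard H.264 macroblock syntax when the signaled maximum
    # macroblock size is 16x16 or less; extended large-macroblock
    # syntax (with the modified/additional elements) otherwise.
    if max_mb_size <= 16:
        return "h264_syntax"
    return "large_mb_syntax"
```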
Motion compensation unit 54 may use the motion vectors received in the bitstream to identify a prediction block in a reference frame in reference frame store 62. Intra-prediction unit 55 may use intra-prediction modes received in the bitstream to form prediction blocks from spatially neighboring blocks. Inverse quantization unit 56 inverse quantizes (i.e., dequantizes) the quantized block coefficients provided in the bitstream and decoded by entropy decoding unit 52. The inverse quantization process may comprise, for example, a conventional process as defined by the h.264 decoding standard. The inverse quantization process may also include using a quantization parameter QPY, calculated by encoder 50 for each 64 × 64 macroblock, to determine the degree of quantization and, likewise, the degree of inverse quantization that should be applied.
Inverse transform unit 58 applies an inverse transform (e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process) to the transform coefficients in order to generate a residual block in the pixel domain. Motion compensation unit 54 generates motion compensated blocks, possibly performing interpolation filter-based interpolation. An identifier of an interpolation filter to be used for motion estimation with sub-pixel precision may be included in the syntax element. Motion compensation unit 54 may calculate interpolated values for sub-integer pixels of the reference block using interpolation filters as used by video encoder 50 during encoding of the video block. Motion compensation unit 54 may determine the interpolation filters used by video encoder 50 according to the received syntax information and use the interpolation filters to generate prediction blocks.
Motion compensation unit 54 uses some of the syntax information to determine the sizes of the macroblocks used to encode the frame(s) of the encoded video sequence, partition information that describes how each macroblock of a frame of the encoded video sequence is partitioned, modes indicating how each partition is encoded, one or more reference frames (or reference lists) for each inter-encoded macroblock or partition, and other information for decoding the encoded video sequence.
Summer 64 sums the residual block with the corresponding prediction block produced by motion compensation unit 54 or the intra-prediction unit to form a decoded block. When needed, deblocking filters may also be applied to filter the decoded blocks in order to remove blockiness artifacts. The decoded video blocks are then stored in reference frame store 62, reference frame store 62 providing reference blocks for subsequent motion compensation and also generating decoded video for presentation on a display device (e.g., device 32 of fig. 1). The decoded video blocks may each comprise a 64 x 64 pixel macroblock, a 32 x 32 pixel macroblock, or other larger-than-standard macroblock. Some macroblocks may include partitions having a variety of different partition sizes.
Fig. 4A is a conceptual diagram illustrating example partitioning in various partition levels of a large macroblock. The blocks of each partition level comprise a number of pixels corresponding to a particular level. Four partitioning patterns are also shown for each level, with a first partitioning pattern comprising an entire block, a second partitioning pattern comprising two horizontal partitions of equal size, a third partitioning pattern comprising two vertical partitions of equal size, and a fourth partitioning pattern comprising four partitions of equal size. One of the partitioning modes may be selected for each partition of each partition level.
In the example of fig. 4A, level 0 corresponds to a 64 × 64 pixel macroblock of luma samples and associated chroma samples. Level 1 corresponds to a 32 × 32 pixel block of luma samples and associated chroma samples. Level 2 corresponds to a 16 × 16 pixel block of luma samples and associated chroma samples, and level 3 corresponds to an 8 × 8 pixel block of luma samples and associated chroma samples.
In other examples, additional levels may be introduced to utilize a greater or lesser number of pixels. For example, level 0 may start with a 128 × 128 pixel macroblock, a 256 × 256 pixel macroblock, or another larger-sized macroblock. The highest-numbered level may be as fine as a single pixel (i.e., a 1 × 1 block) in some examples. Thus, from the lowest level to the highest level, the partitions may be sub-partitioned step by step, such that the macroblock is partitioned, the partitions are further partitioned, the resulting partitions are further partitioned, and so on. In some cases, partitions below level 0 (i.e., partitions of partitions) may be referred to as child partitions.
When a block at one level is partitioned into four equally sized sub-blocks, any or all of the sub-blocks may be partitioned according to the partitioning modes of the next level. That is, for an N × N block partitioned at level x into four equally sized (N/2) × (N/2) sub-blocks, any of the (N/2) × (N/2) sub-blocks may be further partitioned according to any of the partitioning modes of level x + 1. Thus, a 32 × 32 pixel sub-block of a 64 × 64 pixel macroblock at level 0 may be further partitioned according to any of the modes shown at level 1 of fig. 4A (e.g., 32 × 32; 32 × 16 and 32 × 16; 16 × 32 and 16 × 32; or 16 × 16, 16 × 16, 16 × 16, and 16 × 16). Likewise, where four 16 × 16 pixel sub-blocks result from a partitioned 32 × 32 pixel sub-block, each of the 16 × 16 pixel sub-blocks may be further partitioned according to any of the modes shown at level 2 of fig. 4A. Where four 8 × 8 pixel sub-blocks result from a partitioned 16 × 16 pixel sub-block, each of the 8 × 8 pixel sub-blocks may be further partitioned according to any of the modes shown at level 3 of fig. 4A.
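The four-partition-modes-per-level scheme above can be sketched as follows. This is an illustrative sketch only: the mode names and the (width, height) tuple representation are assumptions, not syntax from the disclosure.

```python
def partition_modes(n):
    """The four fig. 4A partitioning modes for an n x n block, as lists of
    (width, height) partitions. Mode names here are illustrative."""
    h = n // 2
    return {
        "whole": [(n, n)],                   # one n x n partition
        "two_horizontal": [(n, h), (n, h)],  # two n x n/2 partitions
        "two_vertical": [(h, n), (h, n)],    # two n/2 x n partitions
        "four_quarters": [(h, h)] * 4,       # four n/2 x n/2 partitions
    }

# A level-0 64x64 macroblock split into quarters yields 32x32 sub-blocks,
# each of which may in turn be split by any level-1 mode, and so on down
# the hierarchy.
level1_blocks = partition_modes(64)["four_quarters"]
```

For example, each 32 × 32 quarter returned above may itself be passed back through `partition_modes` to enumerate the level-1 choices.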
By using the example four levels of partitioning shown in fig. 4A, large homogeneous regions and small sporadic changes can be adaptively represented by an encoder implementing the architecture and techniques of this disclosure. For example, video encoder 50 may determine different partition levels for different macroblocks, e.g., based on rate-distortion analysis, as well as the encoding mode applied to the partitions. Also, as described in more detail below, video encoder 50 may encode at least some of the final partitions differently, e.g., using spatial (I-coding) or temporal (P-coding or B-coding) prediction based on rate-distortion metric results or other considerations.
Instead of uniformly encoding a large macroblock such that all partitions share the same intra-coding mode or inter-coding mode, a large macroblock may be coded such that some partitions have different coding modes. For example, at least one partition may be coded with a different intra coding mode (e.g., I_16 × 16, I_8 × 8, I_4 × 4) relative to at least one other partition in the same macroblock. Also, at least one partition may be intra coded while at least one other partition in the same macroblock is inter coded.
For example, for a 32 × 32 block having four 16 × 16 partitions, video encoder 50 may encode some of the 16 × 16 partitions using spatial prediction and other 16 × 16 partitions using temporal prediction. As another example, for a 32 × 32 block having four 16 × 16 partitions, video encoder 50 may encode one or more of the 16 × 16 partitions using a first spatial prediction mode (e.g., one of I_16 × 16, I_8 × 8, I_4 × 4) and one or more other 16 × 16 partitions using a different spatial prediction mode (e.g., another of I_16 × 16, I_8 × 8, I_4 × 4).
Fig. 4B is a conceptual diagram illustrating the assignment of different coding modes to different partitions of a large macroblock. Specifically, fig. 4B illustrates assigning an I_16 × 16 intra coding mode to the upper left 16 × 16 block of a large 32 × 32 macroblock, assigning an I_8 × 8 intra coding mode to the upper right and lower left 16 × 16 blocks, and assigning an I_4 × 4 intra coding mode to the lower right 16 × 16 block. In some cases, the coding modes illustrated in fig. 4B may be h.264 intra coding modes used for luma coding.
In the described manner, each partition may be selectively further partitioned, and each final partition may be selectively coded using temporal prediction or spatial prediction and using a selected temporal coding mode or spatial coding mode. Thus, it is possible to code large macroblocks in a mixed mode such that some partitions in a macroblock are intra coded and other partitions in the same macroblock are inter coded, or to code some partitions in the same macroblock with different intra coding modes or different inter coding modes.
Video encoder 50 may further define each partition according to a macroblock type. The macroblock type may be included in the encoded bitstream as a syntax element, e.g., as a syntax element in a macroblock header. In general, the macroblock type may be used to identify the manner in which the macroblock is partitioned, and the corresponding method or mode used to encode each of the partitions of the macroblock, as discussed above. The methods for encoding partitions may include not only intra coding and inter coding generally, but also particular modes of intra coding (e.g., I_16 × 16, I_8 × 8, I_4 × 4) or inter coding (e.g., P_ or B_16 × 16, 16 × 8, 8 × 16, 8 × 8, 8 × 4, 4 × 8, and 4 × 4).
As discussed in more detail below with respect to the example of table 1 for P blocks and table 2 for B blocks, partition level 0 blocks may be defined according to an MB64_type syntax element, representing a macroblock having 64 × 64 pixels. Similar type definitions may be formed for any MB[N]_type, where [N] refers to a block having N × N pixels and N is a positive integer that may be greater than 16. When an N × N block has four partitions of size (N/2) × (N/2), as shown in the last column of fig. 4A, each of the four partitions may receive its own type definition (e.g., MB[N/2]_type). For example, for a 64 × 64 pixel block (having the type MB64_type) with four 32 × 32 pixel partitions, video encoder 50 may introduce an MB32_type for each of the four 32 × 32 pixel partitions. These macroblock type syntax elements may assist decoder 60 in decoding large macroblocks and various partitions of large macroblocks, as described in this disclosure. Each N × N pixel macroblock (where N is greater than 16) generally corresponds to a unique type definition. Thus, the encoder may generate syntax appropriate for a particular macroblock and indicate to the decoder the maximum size of the macroblocks in a coded unit (e.g., a frame, slice, or sequence of macroblocks). In this way, the decoder may receive an indication of the syntax decoder to be applied to the macroblocks of the coded unit. This also ensures that the decoder can remain backward compatible with existing coding standards (e.g., h.264), because the encoder can indicate the type of syntax decoder to be applied to the macroblocks (e.g., the standard h.264 syntax decoder, or the type specified for processing larger macroblocks in accordance with the techniques of this disclosure).
In general, each MB[N]_type definition may represent, for the corresponding type: the number of pixels in a block of that type (e.g., 64 × 64); the reference frame (or reference list) for the block; the number of partitions for the block; the size of each partition of the block; how each partition is encoded (e.g., intra or inter, and the particular mode); and, for each inter-coded partition, the reference frame (or reference list) for that partition. For 16 × 16 and smaller blocks, video encoder 50 may, in some examples, use conventional type definitions as the types of the blocks (e.g., the types specified by the h.264 standard). In other examples, video encoder 50 may apply newly defined block types to 16 × 16 and smaller blocks.
Video encoder 50 may evaluate both conventional inter- or intra-coding methods that use normal macroblock sizes and partitions (e.g., the methods specified by ITU h.264) and the inter- or intra-coding methods described in this disclosure that use larger macroblocks and partitions, comparing the rate-distortion characteristics of each method to determine which yields the best rate-distortion performance. Video encoder 50 may then select, and apply to the block to be coded, the best coding method, including the inter or intra mode, macroblock size (large or normal), and partitioning, based on the best or acceptable rate-distortion results for the coding method. As an illustration, video encoder 50 may select use of 64 × 64 macroblocks, 32 × 32 macroblocks, or 16 × 16 macroblocks to encode a particular frame or slice based on the rate-distortion results produced when the video encoder uses each of those macroblock sizes.
In general, two different methods can be used to design intra modes using large macroblocks. As one example, during intra coding, spatial prediction may be performed on a block directly based on neighboring blocks. In accordance with the techniques of this disclosure, video encoder 50 may generate a spatially predicted 32 × 32 block directly based on the neighboring pixels of the block, and a spatially predicted 64 × 64 block directly based on the neighboring pixels of the block. In this way, spatial prediction can be performed at a larger scale compared to 16 × 16 intra blocks. Thus, in some examples, these techniques may yield bit rate savings, e.g., by using a smaller number of blocks or partitions per frame or slice.
As another example, video encoder 50 may group four N × N blocks together to produce an (N × 2) × (N × 2) block, and then encode the (N × 2) × (N × 2) block. Using existing h.264 intra coding modes, video encoder 50 may group four intra-coded blocks together, thereby forming a large intra-coded macroblock. For example, four intra-coded blocks, each having a size of 16 × 16, may be grouped together to form a large 32 × 32 intra-coded block. Video encoder 50 may encode each of the four corresponding N × N blocks using a different encoding mode, e.g., I_16 × 16, I_8 × 8, or I_4 × 4 according to h.264. In this manner, each 16 × 16 block may be assigned its own spatial prediction mode by video encoder 50, e.g., to facilitate favorable encoding results.
Video encoder 50 may design the intra-mode according to either of the two different methods discussed above, and analyze the different methods to determine which method provides better encoding results. For example, video encoder 50 may apply different intra-mode methods and place them in a single candidate pool to allow them to compete with each other for best rate-distortion performance. By using a rate-distortion comparison between different methods, video encoder 50 may determine how to encode each partition and/or macroblock. In particular, video encoder 50 may select the coding modes that yield the best rate-distortion performance for a given macroblock and apply those coding modes to encode the macroblock.
Fig. 5 is a conceptual diagram illustrating a hierarchical view of various partition levels of a large macroblock. Fig. 5 also represents the relationship between various partition levels of a large macroblock as described with respect to fig. 4A. As illustrated in the example of fig. 5, each block of a partition level may have a corresponding Coded Block Pattern (CBP) value. The CBP value forms part of syntax information describing a block or macroblock. In one example, the CBP values are each a one-bit syntax value that indicates whether there are any non-zero transform coefficient values in a given block after the transform and quantization operations.
In some cases, a prediction block may be very close in pixel content to the block to be coded such that all residual transform coefficients are quantized to zero, in which case it may not be necessary to transmit transform coefficients for the coded block. Rather, the CBP value for a block may be set to 0 to indicate that the coded block does not include non-zero coefficients. Alternatively, if the block includes at least one non-zero coefficient, the CBP value may be set to 1. Decoder 60 may use the CBP values to identify coded residual blocks (i.e., having one or more non-zero transform coefficients) and uncoded blocks (i.e., not including non-zero transform coefficients).
According to some of the techniques described in this disclosure, an encoder may assign CBP values to large macroblocks (including their partitions) hierarchically based on whether those macroblocks have at least one non-zero coefficient, and assign CBP values to the partitions to indicate which partitions have non-zero coefficients. Layered CBPs of large macroblocks can facilitate processing of large macroblocks to quickly identify coded large macroblocks and uncoded large macroblocks, and permit identification of coded partitions for each partition level of a large macroblock to determine whether it is necessary to decode the block using residual data.
In one example, a 64 x 64 pixel macroblock at level zero may include syntax information including a CBP64 value (e.g., a one-bit value) to indicate whether the entire 64 x 64 pixel macroblock (including any partitions) has non-zero coefficients. In one example, video encoder 50 "sets" (e.g., to a value of "1") the CBP64 bit to indicate that a 64 x 64 pixel macroblock includes at least one non-zero coefficient. Thus, when the CBP64 value is set, for example, to the value "1," a 64 x 64 pixel macroblock includes at least one non-zero coefficient somewhere therein. In another example, video encoder 50 "clears" (e.g., to a value of "0") the CBP64 value to represent that a 64 x 64 pixel macroblock has all-zero coefficients. Thus, when the CBP64 value is cleared, e.g., to a value of "0," a 64 x 64 pixel macroblock is indicated as having all-zero coefficients. A macroblock having a CBP64 value of "0" generally does not require residual data to be transmitted in the bitstream, while a macroblock having a CBP64 value of "1" generally requires residual data to be transmitted in the bitstream for decoding the macroblock.
A 64 x 64 pixel macroblock with all-zero coefficients need not include CBP values for its partitions or sub-blocks. That is, because a 64 × 64 pixel macroblock has all-zero coefficients, each of the partitions must also have all-zero coefficients. Conversely, a 64 x 64 pixel macroblock that includes at least one non-zero coefficient may further include a CBP value for partitions at the next partition level. For example, the CBP64 with a value of 1 may include additional syntax information in the form of a one-bit value CBP32 for each 32 × 32 partition of a 64 × 64 block. That is, in one example, each 32 × 32 pixel partition (e.g., four partitioned blocks of level 1 in fig. 5) of a 64 × 64 pixel macroblock is assigned a CBP32 value as part of the syntax information for the 64 × 64 pixel macroblock. As with the CBP64 values, each CBP32 value may include a bit that is set to a value of 1 when the corresponding 32 x 32 pixel block has at least one non-zero coefficient and is cleared to a value of 0 when the corresponding 32 x 32 pixel block has all-zero coefficients. The encoder may further indicate a maximum size of the macroblocks in the coded unit (e.g., a frame, slice, or sequence) in the syntax of the coded unit that includes the plurality of macroblocks to indicate to the decoder how to interpret the syntax information for each macroblock (e.g., which syntax decoder is used to process the macroblocks in the coded unit).
In this way, a 64 × 64 pixel macroblock with all-zero coefficients may use a single bit to represent the fact that the macroblock has all-zero coefficients, whereas a 64 × 64 pixel macroblock with at least one non-zero coefficient may include CBP syntax information comprising at least five bits: a first bit representing that the 64 × 64 pixel macroblock has a non-zero coefficient, and four additional bits each representing whether a corresponding one of the four 32 × 32 pixel partitions of the macroblock includes at least one non-zero coefficient. In some examples, when the first three of the four additional bits are zero, the fourth additional bit may be omitted, because the decoder may infer that the final partition's bit is one. That is, the decoder may determine that the last bit has a value of 1 when the first three bits are zero and the bit representing the higher level of the hierarchy has a value of 1. For example, the CBP64 prefix value "10001" may be shortened to "1000," because the first bit indicates that at least one of the four partitions has a non-zero coefficient, and the next three zeros indicate that the first three partitions have all-zero coefficients. The decoder may therefore infer from the bit string "1000" that the last partition includes non-zero coefficients, without an explicit bit informing the decoder of that fact. That is, the decoder may interpret the CBP64 prefix "1000" as "10001."
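The implicit-final-bit rule above can be sketched as a small decoder-side helper. This is a hedged illustration restricted to the five-bit parent-plus-children prefix described in the text; the function name and string representation are assumptions.

```python
def expand_cbp_prefix(bits: str) -> str:
    """Restore the implicit final partition bit: when the parent bit is 1
    and the first three child bits are all 0, the fourth child bit is
    omitted from the bitstream and must be inferred as 1.
    Operates only on the parent+children CBP prefix, as a sketch."""
    if len(bits) == 4 and bits[0] == "1" and bits[1:] == "000":
        return bits + "1"   # "1000" -> "10001": last partition is non-zero
    return bits             # fully explicit prefixes pass through unchanged

decoded = expand_cbp_prefix("1000")
```

A prefix that already carries all four child bits (e.g., "10110") is returned as-is, since no bit was elided by the encoder.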
Likewise, the one-bit CBP32 may be set to a value of "1" when the 32 x 32 pixel partition includes at least one non-zero coefficient, and the one-bit CBP32 may be set to a value of "0" when all of the coefficients have zero values. If a 32 x 32 pixel partition has a CBP value of "1," then the partition at the next partition level for that 32 x 32 partition may be assigned a CBP value to indicate whether the respective partition includes any non-zero coefficients. Thus, CBP values may be assigned hierarchically at each partition level until there are no other partition levels or no partitions including non-zero coefficients.
In the above manner, the encoder and/or decoder may utilize layered CBP values to represent a large macroblock (e.g., 64 × 64 or 32 × 32) and whether its partitions include at least one non-zero or all-zero coefficient. Thus, the encoder can: encoding a large macroblock of coded units of a digital video stream such that the large macroblock comprises greater than 16x16 pixels; generating block type syntax information identifying a size of the block; generating a CBP value for the block such that the CBP value identifies whether the block includes at least one non-zero coefficient; and generating additional CBP values for various partition levels of the block, as applicable.
In one example, the layered CBP values may comprise an array of bits (e.g., a bit vector) whose length depends on the values of the prefix. The array may further represent a hierarchy (e.g., a tree structure) of CBP values, as shown in fig. 5. The array may represent the nodes of the tree in a breadth-first manner, where each node corresponds to a bit in the array. In one example, when a node of the tree has a bit set to "1," the node has four branches (corresponding to four partitions), and when the bit is cleared to "0," the node has no branches.
In this example, to identify the values of the nodes branching from a particular node X, the encoder and/or decoder may determine the four consecutive bits, starting at node Y, that represent the nodes branching from node X by calculating:

tree[i], for y ≤ i ≤ y + 3, where y = 4x + 1

where tree[ ] corresponds to the bit array with a start index of 0, i is an integer index into the array tree[ ], x corresponds to the index of node X in tree[ ], and y corresponds to the index of node Y, which is the first branch node of node X. The three subsequent array positions (i.e., y + 1, y + 2, and y + 3) correspond to the other branch nodes of node X.
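The index relation above can be sketched as follows. Note the caveat: y = 4x + 1 gives the child positions of a full quadtree laid out breadth-first; in the actual syntax, nodes whose bit is cleared to "0" have no branches, so a decoder would walk the pruned tree rather than index it directly. This helper is therefore an illustrative sketch of the full-tree case only.

```python
def child_indices(x: int) -> list[int]:
    """Indices of the four bits branching from node x in a breadth-first
    bit array of a full quadtree: y = 4x + 1 is the first branch node,
    and y+1, y+2, y+3 are the remaining three."""
    y = 4 * x + 1
    return [y, y + 1, y + 2, y + 3]

root_children = child_indices(0)   # the four level-1 partition bits
```

For the root (x = 0), the four level-1 partition bits occupy positions 1 through 4 of the array; the first level-1 node's children then start at position 5.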
An encoder, such as video encoder 50 (fig. 2), may assign CBP values for a 16 × 16 pixel partition of a 32 × 32 pixel partition having at least one non-zero coefficient, as part of the syntax of the 64 × 64 pixel macroblock, using existing methods, such as the method specified by ITU h.264 for setting CBP values for a 16 × 16 block. The encoder may also select the CBP values for partitions of a 32 × 32 pixel partition having at least one non-zero coefficient based on the size of the partition, the type of block corresponding to the partition (e.g., chroma block or luma block), or other characteristics of the partition. Example methods for setting the CBP values for a partition of a 32 × 32 pixel partition are discussed in further detail with reference to fig. 8 and 9.
Fig. 6-9 are flow diagrams illustrating example methods for setting various Coded Block Pattern (CBP) values, in accordance with the techniques of this disclosure. Although the example methods of fig. 6-9 are discussed with respect to 64 x 64 pixel macroblocks, it should be understood that similar techniques may be applied to assign hierarchical CBP values for other sized macroblocks. Although the examples of fig. 6-9 are discussed with respect to video encoder 50 (fig. 2), it should be understood that other encoders may use a similar approach to assign CBP values to macroblocks that are larger than the standard. Likewise, the decoder may utilize a similar, but reciprocal approach to interpreting the meaning of a particular CBP value for a macroblock. For example, if an inter-coded macroblock received in the bitstream has a CBP value of "0," the decoder may not receive residual data for the macroblock, and may only generate a prediction block identified by the motion vector as a decoded macroblock, or a group of prediction blocks identified by the motion vector relative to a partition of the macroblock.
FIG. 6 is a flow diagram illustrating an example method for setting CBP64 values for an example 64 x 64 pixel macroblock. A similar approach may be applied to macroblocks larger than 64 x 64. Initially, video encoder 50 receives a 64 × 64 pixel macroblock (100). Motion estimation unit 36 and motion compensation unit 35 may then generate one or more motion vectors and one or more residual blocks, respectively, for encoding the macroblock. The output of transform unit 38 generally includes an array of residual transform coefficient values for a residual block of an intra-coded block or an inter-coded block, which is quantized by quantization unit 40 to generate a series of quantized transform coefficients.
Entropy coding unit 46 may provide entropy coding and other coding functions separate from entropy coding. For example, in addition to CAVLC, CABAC, or other entropy coding functions, entropy coding unit 46 or another unit of video encoder 50 may determine CBP values for large macroblocks and partitions. In particular, entropy coding unit 46 may determine the CBP64 value for a 64 x 64 pixel macroblock by first determining whether the macroblock has at least one non-zero, quantized transform coefficient (102). When entropy coding unit 46 determines that all transform coefficients have zero values (the "no" branch of 102), entropy coding unit 46 clears the CBP64 value for the 64 × 64 macroblock (e.g., resets the bits for the CBP64 value to "0") (104). When entropy coding unit 46 identifies at least one non-zero coefficient of a 64 × 64 macroblock (the "yes" branch of 102), entropy coding unit 46 sets the CBP64 value (e.g., sets the bit for the CBP64 value to "1") (106).
When a macroblock has all-zero coefficients, entropy coding unit 46 does not need to establish any additional CBP values for the partitions of the macroblock, which may reduce overhead. However, in one example, when the macroblock has at least one non-zero coefficient, entropy coding unit 46 proceeds to determine CBP values for each of the four 32 x 32 pixel partitions of the 64 x 64 pixel macroblock (108). Entropy coding unit 46 may utilize the method described with respect to fig. 7 four times, one for each of the four partitions, to establish four CBP32 values, each CBP32 value corresponding to a different one of the four 32 x 32 pixel partitions of a 64 x 64 macroblock. In this way, when a macroblock has all-zero coefficients, entropy coding unit 46 may transmit a single bit having a value of "0" to indicate that the macroblock has all-zero coefficients, while when the macroblock has at least one non-zero coefficient, entropy coding unit 46 may transmit five bits: one bit for the macroblock, and four bits each corresponding to one of the four partitions of the macroblock. In addition, when the partition includes at least one non-zero coefficient, residual data for the partition may be sent in the encoded bitstream. As with the example of CBP64 discussed above, when the first three of the four additional bits are zero, the fourth additional bit may not be necessary because the decoder may determine that it has a value of 1. Thus, in some examples, the encoder may only send three 0's (i.e., "000") instead of three 0's and one 1 (i.e., "0001").
Fig. 7 is a flow diagram illustrating an example method for setting CBP32 values for a 32 x 32 pixel partition of a 64 x 64 pixel macroblock. Initially, for the next partition level, entropy coding unit 46 receives a 32 × 32 pixel partition of a macroblock (e.g., one of the four partitions referred to with respect to fig. 6) (110). Entropy coding unit 46 then determines the CBP32 value for the 32 x 32 pixel partition by first determining whether the partition includes at least one non-zero coefficient (112). When entropy coding unit 46 determines that all coefficients for a partition have zero values ("no" branch of 112), entropy coding unit 46 clears the CBP32 value (e.g., resets the bits for the CBP32 value to "0") (114). When entropy coding unit 46 identifies at least one non-zero coefficient of the partition (the "yes" branch of 112), entropy coding unit 46 sets the CBP32 value (e.g., sets the bit for the CBP32 value to the value of "1") (116).
In one example, when a partition has all-zero coefficients, entropy coding unit 46 does not establish any additional CBP values for the partition. However, when the partition includes at least one non-zero coefficient, entropy coding unit 46 determines a CBP value for each of the four 16 × 16 pixel partitions of the 32 × 32 pixel partition of the macroblock. Entropy coding unit 46 may utilize the method described with respect to fig. 8 to establish four CBP16 values, each corresponding to a different one of the four 16 × 16 pixel partitions.
In this way, when a partition has all-zero coefficients, entropy coding unit 46 may set bits having a value of "0" to indicate that the partition has all-zero coefficients, while when a partition has at least one non-zero coefficient, entropy coding unit 46 may include five bits: one bit for a partition, and four bits each corresponding to a different one of four sub-partitions of the partition of the macroblock. Thus, when a partition in a previous partition level has at least one non-zero transform coefficient value, each additional partition level may present four additional CBP bits. As one example, if a 64 × 64 macroblock has a CBP value of 1, and four 32 × 32 partitions have CBP values of 1, 0, 1, and 1, respectively, then the total CBP value up to that point is 11011. Additional CBP bits for additional partitions of the 32 x 32 partition (e.g., 16x16 partitions) may be added.
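The flow of figs. 6 and 7 can be sketched as a breadth-first pass that emits one bit per block and, for a non-zero block larger than 16 × 16, enqueues its four equal quarters. This is a hedged sketch under stated assumptions: blocks are modeled as nested lists of quantized coefficients, the implicit-final-bit shortening is omitted, and the h.264-based rules at the 16 × 16 level are collapsed into a single bit.

```python
from collections import deque

def hierarchical_cbp(block, size, min_size=16):
    """Build a breadth-first layered CBP bit string (cf. the tree of fig. 5).
    block: size x size nested list of quantized transform coefficients."""
    bits = []
    queue = deque([(block, size)])
    while queue:
        blk, n = queue.popleft()
        if all(c == 0 for row in blk for c in row):
            bits.append("0")          # all-zero coefficients: no branches
            continue
        bits.append("1")              # at least one non-zero coefficient
        if n > min_size:              # enqueue the four (n/2) x (n/2) quarters
            h = n // 2
            for r0 in (0, h):
                for c0 in (0, h):
                    quarter = [row[c0:c0 + h] for row in blk[r0:r0 + h]]
                    queue.append((quarter, h))
    return "".join(bits)

# A 64x64 macroblock whose only non-zero coefficient sits in the upper-left
# 16x16 block: CBP64 = 1, CBP32 bits = 1000, CBP16 bits = 1000.
mb = [[0] * 64 for _ in range(64)]
mb[0][0] = 7
cbp = hierarchical_cbp(mb, 64)
```

Matching the running example in the text, a macroblock whose four 32 × 32 partitions have CBP values 1, 0, 1, 1 yields a bit string beginning "11011", with any 16 × 16-level bits appended after.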
Fig. 8 is a flow diagram illustrating an example method for setting a CBP16 value for a 16 × 16 pixel partition of a 32 × 32 pixel partition of a 64 × 64 pixel macroblock. For certain 16 × 16 pixel partitions, video encoder 50 may utilize CBP values as specified by a video coding standard (e.g., ITU h.264), as discussed below. For other 16 × 16 partitions, video encoder 50 may utilize CBP values in accordance with other techniques of this disclosure. Initially, as shown in fig. 8, entropy coding unit 46 receives a 16 × 16 partition (e.g., one of the 16 × 16 partitions of the 32 × 32 partition described with respect to fig. 7) (120).
Entropy coding unit 46 may then determine whether any motion partition of the 16 × 16 pixel partition is larger than an 8 × 8 pixel block (122). In general, a motion partition is a partition associated with its own motion vector. For example, a 16 × 16 pixel partition having only one motion vector may be considered a 16 × 16 motion partition. Similarly, for a 16 × 16 pixel partition having two 8 × 16 partitions, each having one motion vector, each of the two 8 × 16 partitions may be considered an 8 × 16 motion partition. In any case, in the example of fig. 8, when no motion partition is larger than an 8 × 8 pixel block (the "no" branch of 122), entropy coding unit 46 assigns a CBP value to the 16 × 16 pixel block in the same manner as specified by ITU h.264 (124).
When a motion partition of the 16 × 16 pixel partition is larger than an 8 × 8 block of pixels ("yes" branch of 122), entropy coding unit 46 constructs and sends a lumacbp16 value using the steps that follow (125). In the example of fig. 8, to construct the lumacbp16 value, entropy coding unit 46 determines whether the 16 × 16 pixel luma component of the partition has at least one non-zero coefficient (126). In the example of fig. 8, when the 16 × 16 pixel luma component has all-zero coefficients ("no" branch of 126), entropy coding unit 46 assigns a CBP16 value according to the coded block pattern chroma portion of ITU H.264 (128).
When entropy coding unit 46 determines that the 16 × 16 pixel luma component has at least one non-zero coefficient ("yes" branch of 126), entropy coding unit 46 determines a transform size flag for the 16 × 16 pixel partition (130). The transform size flag generally indicates the transform being used for the partition. The transform represented by the transform size flag may comprise one of a 4 × 4 transform, an 8 × 8 transform, a 16 × 16 transform, a 16 × 8 transform, or an 8 × 16 transform. The transform size flag may comprise an integer value corresponding to an enumerated value identifying one of the possible transforms. Entropy coding unit 46 may then determine whether the transform size flag represents a transform size greater than or equal to 16 × 8 (or 8 × 16) (132).
In the example of fig. 8, when the transform size flag does not indicate that the transform size is greater than or equal to 16 × 8 (or 8 × 16) (the "no" branch of 132), entropy coding unit 46 assigns a value to CBP16 according to ITU h.264 (134). When the transform size flag indicates that the transform size is greater than or equal to 16 × 8 (or 8 × 16) (the "yes" branch of 132), entropy coding unit 46 then determines whether the type of 16 × 16 pixel partition is two 16 × 8 pixel partitions or two 8 × 16 pixel partitions (136).
In the example of fig. 8, when the type of the 16 × 16 pixel partition is not two 16 × 8 pixel partitions and is not two 8 × 16 pixel partitions ("no" branch of 138), entropy coding unit 46 assigns a CBP16 value using the chroma coded block pattern as specified by ITU H.264 (140). When the type of the 16 × 16 pixel partition is two 16 × 8 or two 8 × 16 pixel partitions ("yes" branch of 138), entropy coding unit 46 also uses the chroma coded block pattern specified by ITU H.264, but additionally assigns a two-bit luma16x8_CBP value to the CBP16 value (e.g., according to the method described with respect to fig. 9) (142).
FIG. 9 is a flow diagram illustrating an example method for determining a two-bit luma16x8_CBP value. Entropy coding unit 46 receives a 16 × 16 pixel partition that is further partitioned into two 16 × 8 or two 8 × 16 pixel partitions (150). Entropy coding unit 46 generally assigns each bit of luma16x8_CBP according to whether the corresponding sub-block of the 16 × 16 pixel partition includes at least one non-zero coefficient.
Entropy coding unit 46 determines whether a first sub-block of the 16 × 16 pixel partition has at least one non-zero coefficient (152). When the first sub-block has all-zero coefficients ("no" branch of 152), entropy coding unit 46 clears the first bit of luma16x8_CBP (e.g., assigns luma16x8_CBP[0] a value of "0") (154). When the first sub-block has at least one non-zero coefficient ("yes" branch of 152), entropy coding unit 46 sets the first bit of luma16x8_CBP (e.g., assigns luma16x8_CBP[0] a value of "1") (156).
Entropy coding unit 46 also determines whether a second sub-block of the 16 × 16 pixel partition has at least one non-zero coefficient (158). When the second sub-block has all-zero coefficients ("no" branch of 158), entropy coding unit 46 clears the second bit of luma16x8_CBP (e.g., assigns luma16x8_CBP[1] a value of "0") (160). When the second sub-block has at least one non-zero coefficient ("yes" branch of 158), entropy coding unit 46 sets the second bit of luma16x8_CBP (e.g., assigns luma16x8_CBP[1] a value of "1") (162).
The following pseudo-code provides one example implementation of the method described with respect to fig. 8 and 9:
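The pseudo-code listing itself is not reproduced in this text. The following Python sketch reconstructs its logic from the description of its functions below; the argument layout, the return convention, and the omission of the chroma bits are simplifying assumptions, not the original listing:

```python
TRANSFORM_SIZE_GREATER_THAN_16x8 = 2  # enumerated value noted in the text

def luma16x8_cbp(sub_blocks_nonzero):
    """Two bits, one per sub-partition: 1 if it has a non-zero coefficient."""
    return [1 if nz else 0 for nz in sub_blocks_nonzero]

def cbp16_luma_bits(luma_nonzero, transform_size, sub_blocks_nonzero=None):
    """Luma-related CBP16 bits per Figs. 8-9 (chroma portion omitted)."""
    bits = [1 if luma_nonzero else 0]  # lumacbp16 flag
    if not luma_nonzero:
        return bits  # only the chroma CBP per ITU H.264 would follow
    if (transform_size >= TRANSFORM_SIZE_GREATER_THAN_16x8
            and sub_blocks_nonzero is not None):
        bits += luma16x8_cbp(sub_blocks_nonzero)  # two extra luma bits
    return bits

# A 16x16 partition split into two 16x8 sub-partitions where only the
# second sub-partition has a non-zero coefficient:
print(cbp16_luma_bits(True, TRANSFORM_SIZE_GREATER_THAN_16x8, [False, True]))
# prints [1, 0, 1]
```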
In the pseudo code, "lumacbp16" corresponds to an operation that appends a one-bit flag indicating whether the entire 16 × 16 luma block has non-zero coefficients. When "lumacbp16" is equal to 1, there is at least one non-zero coefficient. The function "transform_size_flag" refers to a calculation whose result indicates the transform used, e.g., one of a 4 × 4 transform, an 8 × 8 transform, a 16 × 16 transform (for motion partitions equal to or greater than 16 × 16), a 16 × 8 transform (for P_16x8), or an 8 × 16 transform (for P_8x16). TRANSFORM_SIZE_GREATER_THAN_16x8 is the enumerated value (e.g., "2") used to indicate that the transform size is greater than or equal to 16 × 8 or 8 × 16. The result of transform_size_flag is incorporated into the syntax information of a 64 × 64 pixel macroblock.
"luma16x8_cbp" refers to a calculation that produces two bits, where each bit indicates whether one of the two partitions, P_16x8 or P_8x16, has non-zero coefficients. The two-bit number produced by luma16x8_cbp is incorporated into the syntax of a 64 × 64 pixel macroblock. "chroma_cbp" may be calculated in the same manner as CodedBlockPatternChroma as specified by ITU H.264. The calculated chroma_cbp value is incorporated into the syntax information for a 64 × 64 pixel macroblock. The function "h264_cbp" may be calculated in the same way as the CBP defined in ITU H.264. The calculated h264_cbp value is incorporated into the syntax information for a 64 × 64 pixel macroblock.
In general, the method according to fig. 6-9 may comprise: encoding a video block having a size greater than 16x16 pixels with a video encoder; generating block type syntax information indicating a size of the block; and generating a coded block pattern value for the encoded block, wherein the coded block pattern value indicates whether the encoded block includes at least one non-zero coefficient.
Fig. 10 is a block diagram illustrating an example arrangement of a 64 x 64 pixel macroblock. The macroblock of fig. 10 contains four 32 x 32 partitions labeled A, B, C and D in fig. 10. As discussed with respect to fig. 4A, in one example, a block may be partitioned in any of four ways: an entire block without sub-partitions (64 × 64), two equal-sized horizontal partitions (32 × 64 and 32 × 64), two equal-sized vertical partitions (64 × 32 and 64 × 32), or four equal-sized square partitions (32 × 32, and 32 × 32).
In the example of FIG. 10, the whole-block partition includes each of blocks A, B, C and D; a first of the two equal-sized horizontal partitions includes A and B, while a second of the two equal-sized horizontal partitions includes C and D; a first of the two equal-sized vertical partitions includes A and C, while a second of the two equal-sized vertical partitions includes B and D; and the four equal-sized square partitions correspond to A, B, C, and D, respectively. Similar partitioning schemes may be used for blocks of any size, e.g., greater than 64 × 64 pixels, 32 × 32 pixels, 16 × 16 pixels, 8 × 8 pixels, or other sizes of video blocks.
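The four partitioning options of fig. 10 can be restated programmatically (a toy illustration using the A-D labels from the figure; the mode names are hypothetical):

```python
def partition_groups(mode):
    """Groups of 32x32 blocks (labeled as in fig. 10) forming each
    partition of a 64x64 macroblock under the given partitioning mode."""
    groups = {
        "whole":      [["A", "B", "C", "D"]],        # one 64x64 partition
        "horizontal": [["A", "B"], ["C", "D"]],      # two horizontal halves
        "vertical":   [["A", "C"], ["B", "D"]],      # two vertical halves
        "square":     [["A"], ["B"], ["C"], ["D"]],  # four 32x32 squares
    }
    return groups[mode]

print(partition_groups("vertical"))  # prints [['A', 'C'], ['B', 'D']]
```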
When intra coding a video block, various methods may be used to partition the video block. Further, each of the partitions may be intra coded differently (i.e., with different modes, such as different intra modes). For example, a 32 × 32 partition (e.g., partition A of fig. 10) may be further partitioned into four equal-sized blocks of 16 × 16 pixels. As one example, ITU H.264 describes three different methods for intra-coding a 16 × 16 macroblock: intra-coding at the 16 × 16 level, intra-coding at the 8 × 8 level, and intra-coding at the 4 × 4 level. However, ITU H.264 specifies that each partition of a 16 × 16 macroblock be encoded using the same intra coding mode. Thus, according to ITU H.264, if one sub-block of a 16 × 16 macroblock is to be intra coded at the 4 × 4 level, every sub-block of the 16 × 16 macroblock must be intra coded at the 4 × 4 level.
On the other hand, an encoder configured in accordance with the techniques of this disclosure may apply a mixed mode approach. For intra coding, for example, a large macroblock may have various partitions encoded with different coding modes. As an illustration, within a 32 × 32 partition, one 16 × 16 partition may be intra coded at the 4 × 4 pixel level, other 16 × 16 partitions may be intra coded at the 8 × 8 pixel level, and one 16 × 16 partition may be intra coded at the 16 × 16 level (e.g., as shown in fig. 4B).
When a video block is to be partitioned into four equal-sized sub-blocks for intra coding, the first block to be intra coded may be the upper-left block, followed by the block immediately to the right of the first block, followed by the block immediately below the first block, and finally the block below and to the right of the first block. Referring to the example block of fig. 10, the order of intra coding would proceed from A to B to C and finally to D. Although fig. 10 depicts a 64 × 64 pixel macroblock, intra-coding of partitioned blocks of different sizes may follow this same ordering.
When a video block is to be inter-coded as part of a P frame or P slice, the block may be partitioned into any of the four partitions described above, each of which may be separately encoded. That is, each partition of the block may be encoded according to a different encoding mode, either intra-coding (I-coding) or inter-coding (P-coding) with reference to a single reference frame/slice/list. Table 1 below summarizes the inter-coding information for each potential partition of a block of size N × N. Where "M" appears in table 1, M is equal to N/2. In table 1 below, L0 refers to "list 0" (i.e., the reference frame/slice/list). When deciding how best to partition an N × N block, an encoder (e.g., video encoder 50) may analyze rate-distortion cost information for each MB_N_type (i.e., each partition type) using a Lagrangian multiplier (as discussed in more detail with respect to fig. 11), selecting the lowest-cost option as the best partitioning method.
TABLE 1
In table 1 above, the element of the "MB _ N _ type" column is the key of each partition type of the N × N block. The element of the "name of MB _ N _ type" column is the name of the different partition types of the N × N block. "P" in the name refers to inter-coding a block using P coding (i.e., referring to a single frame/slice/list). "L0" in the name refers to a reference frame/slice/list (e.g., "list 0") that is used as a reference frame or slice for P coding. "N × N" refers to a partition being an entire block, "N × M" refers to a partition being two partitions of width N and height M, "M × N" refers to a partition being two partitions of width M and height N, "M × M" refers to a partition being four equally sized partitions each having width M and height M.
In table 1, PN_Skip implies that the block is "skipped," e.g., because the block resulting from the coding has all-zero coefficients. The elements of the "prediction mode part 1" column refer to the reference frame/slice/list for sub-partition 1 of a partition, while the elements of the "prediction mode part 2" column refer to the reference frame/slice/list for sub-partition 2 of a partition. Because P_L0_NxN has only a single partition, the corresponding element of "prediction mode part 2" is "N/A," as there is no second sub-partition. For PN_MxM, there are four partition blocks that may be separately encoded; thus, both prediction mode columns for PN_MxM include "N/A." Like P_L0_NxN, PN_Skip has only a single partition, so the corresponding element of the "prediction mode part 2" column is "N/A."
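The body of Table 1 does not survive in this text. The following mapping restates its content as the surrounding description gives it; the exact spellings of the type names, and the "L0" entry for PN_Skip, are assumptions:

```python
# P-coded partition types for an NxN block (M = N/2), per the
# description of Table 1.  "L0" is the single reference
# frame/slice/list; "N/A" marks a column that does not apply.
P_MB_N_TYPES = {
    "P_L0_NxN":    {"partitions": 1, "pred_part1": "L0",  "pred_part2": "N/A"},
    "P_L0_L0_NxM": {"partitions": 2, "pred_part1": "L0",  "pred_part2": "L0"},
    "P_L0_L0_MxN": {"partitions": 2, "pred_part1": "L0",  "pred_part2": "L0"},
    "PN_MxM":      {"partitions": 4, "pred_part1": "N/A", "pred_part2": "N/A"},
    # PN_Skip's part-1 entry is assumed to be L0 here.
    "PN_Skip":     {"partitions": 1, "pred_part1": "L0",  "pred_part2": "N/A"},
}

# PN_MxM's four sub-blocks are encoded separately, so neither
# prediction-mode column applies:
print(P_MB_N_TYPES["PN_MxM"]["pred_part1"])  # prints N/A
```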
Table 2 below includes columns and elements similar to those of table 1. However, table 2 corresponds to various encoding modes for blocks inter-coded using bi-directional prediction (B-coding). Thus, each partition may be predicted from either or both of a first frame/slice/list (L0) and a second frame/slice/list (L1). "Bi-pred" refers to a corresponding partition being predicted from both L0 and L1. In table 2, the meaning of the column labels and values is similar to that used in table 1.
TABLE 2
FIG. 11 is a flow diagram illustrating an example method for computing the best partitioning and encoding method for an N × N pixel video block. In general, the method of fig. 11 includes calculating the cost of each different encoding method (e.g., various spatial or temporal modes) as applied to each different partitioning method, for example, as shown in fig. 4A, and selecting the combination of encoding mode and partitioning method with the best rate-distortion cost for the N × N pixel video block. Cost may generally be calculated using a Lagrangian multiplier with rate and distortion values, such that rate-distortion cost = distortion + λ*rate, where the distortion represents the error between the original block and the coded block and the rate represents the bit rate necessary to support the coding mode. In some cases, rate and distortion may be determined at the macroblock, partition, slice, or frame level.
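The Lagrangian cost computation can be sketched as follows (a minimal illustration using SAD as the distortion metric; the pixel lists and λ value are arbitrary):

```python
def sad(original, coded):
    """Sum of absolute differences between two equal-length pixel lists."""
    return sum(abs(o - c) for o, c in zip(original, coded))

def rd_cost(original, coded, bit_rate, lam):
    """Lagrangian rate-distortion cost: distortion + lambda * rate."""
    return sad(original, coded) + lam * bit_rate

orig_px = [100, 102, 98, 101]   # original block pixels
reco_px = [100, 100, 99, 103]   # reconstructed (coded) block pixels
print(rd_cost(orig_px, reco_px, bit_rate=24, lam=0.5))  # prints 17.0
```

An SSD metric would simply replace `abs(o - c)` with `(o - c) ** 2` in the distortion term.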
Initially, video encoder 50 receives an N × N video block to be encoded (170). For example, video encoder 50 may receive a 64 × 64 large macroblock or a partition thereof (e.g., a 32 × 32 or 16 × 16 partition), for which video encoder 50 is to select an encoding and partitioning method. Video encoder 50 then calculates the cost of encoding the N × N block using a plurality of different coding modes (e.g., different intra coding modes and inter coding modes) (172). To calculate the cost of spatially encoding the N × N block, video encoder 50 may calculate the distortion and the bit rate required to encode the N × N block with a given coding mode, and then calculate the cost as Cost(Mode, N×N) = Distortion(Mode, N×N) + λ*Rate(Mode, N×N). Video encoder 50 may encode the macroblock using the specified coding technique and determine the resulting bit rate cost and distortion. The distortion may be determined based on pixel differences between pixels in the coded macroblock and pixels in the original macroblock, e.g., based on a sum of absolute differences (SAD) metric, a sum of squared differences (SSD) metric, or another pixel difference metric.
Video encoder 50 may then partition the N × N block into two equal-sized, non-overlapping horizontal N × (N/2) partitions. Video encoder 50 may calculate a cost for encoding each of the partitions using the various coding modes (176). For example, to calculate the cost of encoding the first of the N × (N/2) partitions, video encoder 50 may calculate the distortion and bit rate for encoding the first N × (N/2) partition, and then calculate the cost as Cost(Mode, FIRST PARTITION, N×(N/2)) = Distortion(Mode, FIRST PARTITION, N×(N/2)) + λ*Rate(Mode, FIRST PARTITION, N×(N/2)). Video encoder 50 may perform a similar calculation for the cost of encoding the second of the N × (N/2) partitions.
Video encoder 50 may then partition the N × N block into two equal-sized, non-overlapping vertical (N/2) × N partitions. Video encoder 50 may calculate a cost for encoding each of the partitions using the various coding modes (178). For example, to calculate the cost of encoding the first of the (N/2) × N partitions, video encoder 50 may calculate the distortion and bit rate for encoding the first (N/2) × N partition, and then calculate the cost as Cost(Mode, FIRST PARTITION, (N/2)×N) = Distortion(Mode, FIRST PARTITION, (N/2)×N) + λ*Rate(Mode, FIRST PARTITION, (N/2)×N). Video encoder 50 may perform a similar calculation for the cost of encoding the second of the (N/2) × N partitions.
Video encoder 50 may then partition the N × N block into four equal-sized, non-overlapping (N/2) × (N/2) partitions. Video encoder 50 may calculate a cost of encoding the partitions using the various coding modes (180). To calculate the cost of encoding an (N/2) × (N/2) partition, video encoder 50 may first calculate the distortion and bit rate for encoding the upper-left (N/2) × (N/2) partition, and obtain its cost as Cost(Mode, UPPER-LEFT, (N/2)×(N/2)) = Distortion(Mode, UPPER-LEFT, (N/2)×(N/2)) + λ*Rate(Mode, UPPER-LEFT, (N/2)×(N/2)). Video encoder 50 may similarly calculate the cost of each (N/2) × (N/2) block in the following order: (1) the upper-left partition, (2) the upper-right partition, (3) the lower-left partition, and (4) the lower-right partition. In some examples, video encoder 50 may recursively invoke this method on one or more of the (N/2) × (N/2) partitions to calculate the cost of further partitioning and separately encoding each of the (N/2) × (N/2) partitions (e.g., as (N/2) × (N/4) partitions, (N/4) × (N/2) partitions, and (N/4) × (N/4) partitions).
Next, video encoder 50 may determine which combination of partitioning and encoding modes yields the best (i.e., lowest) cost in terms of rate and distortion (182). For example, video encoder 50 may compare the best cost of encoding two adjacent (N/2) × (N/2) partitions with the best cost of encoding the N × (N/2) partition that contains them. When the cumulative cost of encoding the two adjacent (N/2) × (N/2) partitions exceeds the cost of encoding the N × (N/2) partition that contains them, video encoder 50 may select the lower-cost option of encoding the N × (N/2) partition. In general, video encoder 50 may apply each combination of partitioning method and encoding mode for each partition to identify the lowest-cost partitioning and encoding method. In some cases, video encoder 50 may be configured to evaluate a more limited set of partitioning and encoding mode combinations.
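The recursive search over partitionings can be sketched as follows (a schematic illustration: cost_fn stands in for the full per-mode rate-distortion evaluation, every same-size block is assumed to cost the same, and the halving stops at an assumed minimum size):

```python
def best_partition_cost(w, h, cost_fn, min_size=8):
    """Lowest cost of coding a w x h block either unsplit or recursively
    partitioned, mirroring the comparison described for fig. 11.

    cost_fn(w, h) gives the cost of coding a w x h block with its best
    single mode; a simplification of the real per-block evaluation.
    """
    best = cost_fn(w, h)  # cost with no further partitioning
    if h > min_size:      # two horizontal N x (N/2)-style partitions
        best = min(best, 2 * best_partition_cost(w, h // 2, cost_fn, min_size))
    if w > min_size:      # two vertical (N/2) x N-style partitions
        best = min(best, 2 * best_partition_cost(w // 2, h, cost_fn, min_size))
    if w > min_size and h > min_size:  # four equal-sized square partitions
        best = min(best,
                   4 * best_partition_cost(w // 2, h // 2, cost_fn, min_size))
    return best

# Toy cost model: area/16 plus a fixed 10-unit overhead per partition.
# Splitting never pays here, so the unsplit 64x64 cost wins.
print(best_partition_cost(64, 64, lambda w, h: w * h / 16 + 10))  # prints 266.0
```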
Upon determining the best (e.g., lowest-cost) partitioning and encoding method, video encoder 50 may encode the N × N macroblock using the method determined to have the best cost (184). In some cases, the result may be a large macroblock having partitions coded with different coding modes. The ability to apply mixed mode coding to a large macroblock, such that different coding modes are applied to different partitions within the large macroblock, may permit the macroblock to be coded at reduced cost.
In some examples, a method of coding with a hybrid mode may comprise: receiving, with video encoder 50, a video block having a size greater than 16x16 pixels; partitioning the block into partitions; encoding one of the partitions with a first encoding mode; encoding another one of the partitions with a second encoding mode different from the first encoding mode; and generating block type syntax information indicating a size of the block and identifying the partition and the coding mode used to code the partition.
Fig. 12 is a block diagram illustrating an example large macroblock of 64 x 64 pixels with various partitions and different selected encoding methods for each partition. In the example of fig. 12, each partition is labeled with one of "I", "P", or "B". The partitions labeled "I" are the partitions that the encoder has selected to utilize intra coding, e.g., based on rate-distortion assessment. The partition labeled "P" is a partition that the encoder has selected to utilize single reference inter coding, e.g., based on rate-distortion evaluation. The partition labeled "B" is a partition that the encoder has selected to utilize bi-predictive inter coding, e.g., based on rate-distortion estimation. In the example of fig. 12, different partitions within the same large macroblock have different coding modes, including different partition or sub-partition sizes and different intra or inter coding modes.
A large macroblock is a macroblock identified by a macroblock syntax element that identifies the macroblock type (e.g., mb64_ type or mb32_ type) for a given coding standard, such as an extension of the h.264 coding standard. The macroblock type syntax element may be provided as a macroblock header syntax element in the encoded video bitstream. The partitions of I-coding, P-coding, and B-coding illustrated in fig. 12 may be coded according to different coding modes (e.g., intra-prediction modes or inter-prediction modes with various block sizes, including large block size modes for large partitions of size greater than 16x16 or h.264 modes for partitions of size less than or equal to 16x 16).
In one example, an encoder (e.g., video encoder 50) may use the example method described with respect to fig. 11 to select the various encoding modes and partition sizes for the different partitions and sub-partitions of the large macroblock of the example of fig. 12. For example, video encoder 50 may receive a 64 × 64 macroblock, perform the method of fig. 11, and thereby generate the example macroblock of fig. 12 with various partition sizes and coding modes. It should be understood, however, that the partitioning and coding modes selected by applying the method of fig. 11 depend on the type of frame from which the macroblock is selected and on the input macroblock on which the method is performed. For example, when the frame comprises an I-frame, each partition will be intra-coded. As another example, when the frame comprises a P frame, each partition may be intra-coded or inter-coded based on a single reference frame (i.e., without bi-directional prediction).
For purposes of illustration, assume that the example macroblock of fig. 12 is selected from a bi-directional predicted frame (B-frame). In other examples, where the macroblock is selected from a P frame, video encoder 50 will not encode the partition using bi-prediction. Likewise, in the case where the macroblocks are selected from I-frames, video encoder 50 will not encode the partitions using inter-coding (P-coding or B-coding). In any case, however, video encoder 50 may select various partition sizes for different portions of a macroblock, and select to encode each partition using any available encoding modes.
In the example of fig. 12, it is assumed that the combination of rate-distortion-based partitioning and mode selection produces one 32 × 32 B-coded partition, one 32 × 32 P-coded partition, one 16 × 32 I-coded partition, one 32 × 16 B-coded partition, one 16 × 16 P-coded partition, one 16 × 8 P-coded partition, one 8 × 16 P-coded partition, one 8 × 8 B-coded partition, one 8 × 8 I-coded partition, and numerous smaller sub-partitions with various coding modes. The example of fig. 12 is provided for the purpose of conceptually illustrating mixed mode coding of partitions within a large macroblock, and should not necessarily be considered representative of actual coding results for a particular large 64 × 64 macroblock.
Fig. 13 is a flow diagram illustrating an example method for determining an optimal size for encoding the macroblocks of a frame or slice of a video sequence. Although described with respect to selecting an optimal macroblock size for a frame, a method similar to that described with respect to fig. 13 may be used to select an optimal macroblock size for a slice. Likewise, although the method of fig. 13 is described with respect to video encoder 50, it should be understood that any encoder may utilize the example method of fig. 13 to determine an optimal (e.g., lowest-cost) size for encoding the macroblocks of a frame of a video sequence. In general, the method of fig. 13 includes performing three encoding passes, one for each of 16 × 16, 32 × 32, and 64 × 64 macroblocks, and the video encoder may calculate a rate-distortion metric for each pass to determine which macroblock size provides the best rate-distortion trade-off.
Video encoder 50 may first encode a frame using 16 × 16 pixel macroblocks (e.g., using a function encode(frame, MB16_type)) during a first encoding pass to generate an encoded frame F16 (190). After the first encoding pass, video encoder 50 may calculate the bit rate and distortion resulting from the use of 16 × 16 pixel macroblocks as R16 and D16, respectively (192). Video encoder 50 may then calculate the cost C16 of using 16 × 16 pixel macroblocks as a rate-distortion metric of the form C16 = D16 + λ*R16, using the Lagrangian multiplier λ (194). The coding modes and partition sizes for the 16 × 16 pixel macroblocks may be selected, for example, according to the H.264 standard.
Video encoder 50 may then encode the frame using 32 × 32 pixel macroblocks (e.g., using a function encode(frame, MB32_type)) during a second encoding pass to generate an encoded frame F32 (196). After the second encoding pass, video encoder 50 may calculate the bit rate and distortion resulting from the use of 32 × 32 pixel macroblocks as R32 and D32, respectively (198). Video encoder 50 may then calculate the cost C32 of using 32 × 32 pixel macroblocks as a rate-distortion metric of the form C32 = D32 + λ*R32, using the Lagrangian multiplier λ (200). The coding modes and partition sizes for the 32 × 32 pixel macroblocks may be selected, for example, using the rate and distortion evaluation techniques described with reference to fig. 11 and 12.
Video encoder 50 may then encode the frame using 64 × 64 pixel macroblocks (e.g., using a function encode(frame, MB64_type)) during a third encoding pass to generate an encoded frame F64 (202). After the third encoding pass, video encoder 50 may calculate the bit rate and distortion resulting from the use of 64 × 64 pixel macroblocks as R64 and D64, respectively (204). Video encoder 50 may then calculate the cost C64 of using 64 × 64 pixel macroblocks as a rate-distortion metric of the form C64 = D64 + λ*R64, using the Lagrangian multiplier λ (206). The coding modes and partition sizes for the 64 × 64 pixel macroblocks may be selected, for example, using the rate and distortion evaluation techniques described with reference to fig. 11 and 12.
Next, video encoder 50 may determine which of the metrics C16, C32, and C64 is lowest for the frame (208). Video encoder 50 may select for use the frame encoded with the macroblock size that yields the lowest cost (210). Thus, for example, when C16 is lowest, video encoder 50 may forward frame F16, encoded with 16 × 16 macroblocks, in the bitstream for storage or transmission to a decoder. When C32 is lowest, video encoder 50 may forward F32, encoded with 32 × 32 macroblocks. When C64 is lowest, video encoder 50 may forward F64, encoded with 64 × 64 macroblocks.
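The three-pass selection can be sketched as follows (a schematic illustration; encode_pass is a stand-in for the per-size encoding passes described above, and the toy rate/distortion numbers are invented):

```python
def select_macroblock_size(frame, encode_pass, lam):
    """Run 16x16, 32x32, and 64x64 encoding passes and pick the size
    whose Lagrangian cost C = D + lambda * R is lowest.

    encode_pass(frame, size) must return (encoded_frame, rate, distortion).
    """
    best = None
    for size in (16, 32, 64):
        encoded, rate, dist = encode_pass(frame, size)
        cost = dist + lam * rate
        if best is None or cost < best[0]:
            best = (cost, size, encoded)
    return best[1], best[2]  # chosen macroblock size, encoded frame

# Toy passes: larger macroblocks spend fewer bits but distort more.
stats = {16: (1000, 40), 32: (800, 50), 64: (700, 80)}  # size -> (R, D)
size, encoded = select_macroblock_size(
    "frame", lambda f, s: (f"F{s}", *stats[s]), lam=0.1)
print(size, encoded)  # prints 32 F32
```

With λ = 0.1 the 32 × 32 pass wins (cost 130 versus 140 and 150); a smaller λ weights distortion more heavily and shifts the choice toward 16 × 16.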
In other examples, video encoder 50 may perform the encoding passes in any order. For example, video encoder 50 may begin with a 64 × 64 macroblock encoding pass, then perform a 32 × 32 macroblock encoding pass, and end with a16 × 16 macroblock encoding pass. Also, similar approaches may be used to encode other coded units that include multiple macroblocks (e.g., slices of macroblocks having different sizes). For example, video encoder 50 may apply a method similar to the method of fig. 13 to select the best macroblock size for encoding a slice of a frame (rather than an entire frame).
Video encoder 50 may also transmit an identifier of the size of a macroblock of a particular coded unit (e.g., a frame or slice) in the header of the coded unit for use by a decoder. According to the method of FIG. 13, a method may include: receiving, with a digital video encoder, coded units of a digital video stream; calculating a first rate-distortion metric corresponding to a rate-distortion for encoding the coded unit using a first plurality of blocks each including 16x16 pixels; calculating a second rate-distortion metric corresponding to a rate-distortion for encoding the coded unit using a second plurality of blocks each including greater than 16x16 pixels; and determining which of the first rate-distortion metric and the second rate-distortion metric is lowest for the coded unit. The method may further comprise: the coded unit is encoded using a first plurality of blocks when the first rate-distortion metric is determined to be lowest, and the coded unit is encoded using a second plurality of blocks when the second rate-distortion metric is determined to be lowest.
Fig. 14 is a block diagram illustrating an example wireless communication device 230 that includes a video encoder/decoder CODEC234 that may encode and/or decode digital video data with macroblocks larger than standard, in accordance with any of the various techniques described in this disclosure. In the example of fig. 14, wireless communication device 230 includes a camera 232, a video encoder-decoder (CODEC)234, a modulator/demodulator (modem) 236, a transceiver 238, a processor 240, a user interface 242, a memory 244, a data storage device 246, an antenna 248, and a bus 250.
The components illustrated in fig. 14 included in wireless communication device 230 may be implemented by any suitable combination of hardware, software, and/or firmware. In the illustrated example, the components are depicted as separate units. However, in other examples, the various components may be integrated in a combined unit within common hardware and/or software. As one example, the memory 244 may store instructions executable by the processor 240 corresponding to various functions of the video CODEC 234. As another example, the camera 232 may include a video CODEC that performs the functions of the video CODEC234 (e.g., encoding and/or decoding video data).
In one example, camera 232 may correspond to video source 18 (FIG. 1). In general, camera 232 may record video data captured by the sensor array to generate digital video data. The camera 232 may send the raw, recorded digital video data to the video CODEC234 for encoding and then to the data storage device 246 for data storage via the bus 250. Processor 240 may send signals to camera 232 via bus 250 regarding the mode in which video is recorded, the frame rate at which video is recorded, the time at which recording or changing of the frame rate mode is completed, the time at which video data is sent to video CODEC234, or signals indicating other modes or parameters.
The user interface 242 may include one or more interfaces (e.g., input and output interfaces). For example, the user interface 242 may include a touch screen, keypad, buttons, a screen that may act as a viewfinder, a microphone, a speaker, or other interface. When camera 232 receives video data, processor 240 may signal camera 232 to send the video data to user interface 242 for display on the viewfinder.
The video CODEC234 may encode video data from the camera 232 and decode video data received via the antenna 248, the transceiver 238, and the modem 236. The video CODEC234 can additionally or alternatively decode previously encoded data received from the data storage 246 for playback. The video CODEC234 may encode and/or decode digital video data using macroblocks that are larger than the size of the macroblocks specified by conventional video coding standards. For example, the video CODEC234 may encode and/or decode digital video data using large macroblocks comprising 64 x 64 pixels or 32 x 32 pixels. Large macroblocks may be identified with macroblock type syntax elements according to a video standard, such as an extension of the h.264 standard.
Video CODEC234 may perform the functions of either or both of video encoder 50 (fig. 2) and/or video decoder 60 (fig. 3), as well as any other encoding/decoding functions or techniques as described in this disclosure. For example, CODEC234 may partition a large macroblock into a variety of different sized smaller partitions, and use different coding modes (e.g., spatial (I) or temporal (P or B)) for selected partitions. The selection of the partition size and the coding mode may be based on a rate-distortion result of the partition size and the coding mode. CODEC234 may also utilize layered Coded Block Pattern (CBP) values to identify coded macroblocks and partitions within large macroblocks that have non-zero coefficients. Additionally, in some examples, CODEC234 may compare the rate-distortion metrics for large macroblocks to small macroblocks to select macroblock sizes that produce more favorable results for frames, slices, or other coding units.
A user may interact with user interface 242 to transmit a recorded video sequence in data storage device 246 to another device (e.g., another wireless communication device) via modem 236, transceiver 238, and antenna 248. The video sequence may be encoded according to an encoding standard such as MPEG-2, MPEG-3, MPEG-4, H.263, H.264, or another video encoding standard, subject to the extensions or modifications described in this disclosure. For example, the video sequence may also be encoded using macroblocks larger than those specified by the standard, as described in this disclosure. The wireless communication device 230 may also receive an encoded video segment and store the received video sequence in the data store 246.
The macroblocks of the received encoded video sequence may be larger than the macroblocks specified by conventional video coding standards. To display an encoded video segment (e.g., a recorded video sequence or a received video segment) in the data storage 246, the video CODEC 234 can decode the video sequence and send decoded frames of the video segment to the user interface 242. When the video sequence includes audio data, the video CODEC 234 can decode the audio, or the wireless communication device 230 can further include an audio CODEC (not shown) to decode the audio. In this manner, the video CODEC 234 may perform both the functions of an encoder and the functions of a decoder.
The memory 244 of the wireless communication device 230 of FIG. 14 can be encoded with computer-readable instructions that cause the processor 240 and/or the video CODEC 234 to perform various tasks, in addition to storing encoded video data. The instructions may be loaded into memory 244 from a data storage device, such as data storage device 246. For example, the instructions may cause the processor 240 to perform the functions described with respect to the video CODEC 234.
FIG. 15 is a block diagram illustrating an example layered Coded Block Pattern (CBP) 260. The example CBP 260 generally corresponds to a portion of the syntax information for a 64 x 64 pixel macroblock. In the example of FIG. 15, CBP 260 includes a CBP64 value 262, four CBP32 values 264, 266, 268, 270, and four CBP16 values 272, 274, 276, 278. Each block of CBP 260 may include one or more bits. In one example, when the CBP64 value 262 is a bit having a value of "1" (which indicates the presence of at least one non-zero coefficient in the large macroblock), the CBP 260 includes the four CBP32 values 264, 266, 268, 270 for the four 32 x 32 partitions of the large 64 x 64 macroblock, as shown in the example of FIG. 15.
In another example, when the CBP64 value 262 is a bit having a value of "0," the CBP 260 may consist of only the CBP64 value, as a value of "0" may indicate that the block corresponding to CBP 260 has all zero-valued coefficients. Thus, all partitions of that block will also contain all zero-valued coefficients. In one example, when the CBP64 value is a bit having a value of "1" and the CBP32 value for a particular 32 x 32 partition is a bit having a value of "1", the CBP32 value for that 32 x 32 partition has four branches representing CBP16 values, for example, as shown with respect to CBP32 value 266. In one example, when the CBP32 value is a bit having a value of "0," the CBP32 value does not have any branches. In the example of FIG. 15, CBP 260 may have a five-bit prefix "10100", which indicates that the CBP64 value is "1" and that one of the 32 x 32 partitions has a CBP32 value of "1", with the subsequent bits corresponding to the four CBP16 values 272, 274, 276, 278 (corresponding to the 16x16 partitions of the 32 x 32 partition having the CBP32 value of "1"). Although only a single CBP32 value is shown as having a value of "1" in the example of FIG. 15, in other examples, two, three, or all four 32 x 32 partitions may have CBP32 values of "1", in which case multiple instances of four CBP16 values, one set for each such 32 x 32 partition, would be required.
In the example of FIG. 15, the four CBP16 values 272, 274, 276, 278 for the four 16x16 partitions may be calculated according to various methods (e.g., according to the methods of FIGS. 8 and 9). Any or all of the CBP16 values 272, 274, 276, 278 may include a "lumacbp16" value, a transform_size_flag, and/or a luma16x8_cbp. The CBP16 values 272, 274, 276, 278 may also be calculated from a CBP value as defined in ITU-T H.264, or from CodedBlockPatternChroma as defined in ITU-T H.264, as discussed with respect to FIGS. 8 and 9. In the example of FIG. 15, assuming that CBP16 value 278 is "1" and the other CBP16 values 272, 274, 276 are "0", the nine-bit CBP value for the 64 x 64 macroblock would be "101000001", with each bit corresponding to one of the partitions at the respective level of the CBP/partition hierarchy.
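The prefix-and-suffix layout described above, a prefix of CBP64 and CBP32 bits followed by four CBP16 bits for each 32 x 32 partition whose CBP32 value is "1", can be sketched as follows. The function name and the input representation are illustrative assumptions; only the bit layout follows the example of FIG. 15.

```python
# Sketch of layered CBP construction for a 64x64 macroblock.
# partitions32 holds four entries, one per 32x32 partition: 0 if that
# partition has all zero-valued coefficients, otherwise a list of four
# CBP16 bits for its 16x16 partitions. Names are illustrative.

def encode_cbp64(partitions32):
    if all(p == 0 for p in partitions32):
        return "0"               # CBP64 = 0: the whole block is all zero
    prefix = ["1"]               # CBP64 = 1
    suffix = []
    for p in partitions32:
        if p == 0:
            prefix.append("0")   # CBP32 = 0: no branch
        else:
            prefix.append("1")   # CBP32 = 1: its CBP16 bits follow the prefix
            suffix.extend(str(b) for b in p)
    return "".join(prefix + suffix)

# Reproduces the nine-bit value from FIG. 15: only the second 32x32
# partition has non-zero coefficients, and within it only the last
# 16x16 partition (CBP16 value 278) is non-zero.
cbp = encode_cbp64([0, [0, 0, 0, 1], 0, 0])
```

Running the final call yields the five-bit prefix "10100" followed by the CBP16 bits "0001", i.e., "101000001" as in the text above.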
FIG. 16 is a block diagram illustrating an example tree structure 280 corresponding to CBP 260 (FIG. 15). The CBP64 node 282 corresponds to the CBP64 value 262, the CBP32 nodes 284, 286, 288, 290 each correspond to a respective one of the CBP32 values 264, 266, 268, 270, and the CBP16 nodes 292, 294, 296, 298 each correspond to a respective one of the CBP16 values 272, 274, 276, 278. In this way, a coded block pattern value as defined in this disclosure may correspond to a hierarchical CBP. Each node in the tree that produces another branch corresponds to a respective CBP value of "1". In the examples of FIGS. 15 and 16, CBP64 node 282 and CBP32 node 286 both have a value of "1" and produce further branches, i.e., at least one partition at the next partition level includes at least one non-zero transform coefficient value.
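Conversely, a decoder walking the tree of FIG. 16 could recover the per-partition values from the bit string. The following is a sketch under the same layout assumptions as the example above; the function name and the returned structure are illustrative.

```python
# Sketch of parsing a layered CBP bit string back into its tree levels.
# Returns (cbp64, cbp32_list, cbp16_map), where cbp16_map maps the index
# of each 32x32 partition having CBP32 = 1 to its four CBP16 bits.

def parse_cbp64(bits):
    if bits[0] == "0":
        return 0, [], {}                 # CBP64 = 0: no branches at all
    cbp32 = [int(b) for b in bits[1:5]]  # four CBP32 bits follow CBP64
    pos = 5
    cbp16 = {}
    for i, flag in enumerate(cbp32):
        if flag:                          # CBP32 = 1: read its four CBP16 bits
            cbp16[i] = [int(b) for b in bits[pos:pos + 4]]
            pos += 4
    return 1, cbp32, cbp16

# Parsing the nine-bit value from FIG. 15 recovers the tree of FIG. 16:
# CBP64 = 1, the second CBP32 node branches, and only its last CBP16
# node carries a "1".
tree = parse_cbp64("101000001")
```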
FIG. 17 is a flow diagram illustrating an example method for using syntax information of a coded unit to indicate and select block-based syntax encoders and decoders for the video blocks of the coded unit. In general, a video encoder, such as video encoder 20 (FIG. 1), may perform steps 300-310 of FIG. 17, in addition to and in conjunction with encoding a plurality of video blocks of a coded unit. A coded unit may comprise a video frame, a slice, or a group of pictures (also referred to as a "sequence"). A video decoder, such as video decoder 30 (FIG. 1), may perform steps 312-316 of FIG. 17, in addition to and in conjunction with decoding the plurality of video blocks of the coded unit.
Initially, video encoder 20 may receive a set of blocks of various sizes for a coded unit (e.g., a frame, slice, or group of pictures) (300). In accordance with the techniques of this disclosure, one or more of the blocks may include greater than 16 × 16 pixels (e.g., 32 × 32 pixels, 64 × 64 pixels, etc.). However, the blocks need not each include the same number of pixels. In general, video encoder 20 may encode each of the blocks using the same block-based syntax. For example, video encoder 20 may encode each of the blocks using layered coded block patterns, as described above.
Video encoder 20 may select the block-based syntax to be used based on the largest block (i.e., the largest block size) in the set of blocks of the coded unit. The maximum block size may correspond to the size of the largest macroblock included in the coded unit. Accordingly, video encoder 20 may determine the largest sized block in the set (302). In the example of FIG. 17, video encoder 20 may also determine the smallest sized block in the set (304). As discussed above, the layered coded block pattern of a block has a length that depends on which partitions of the block include non-zero quantized coefficients. In some examples, video encoder 20 may include a minimum size value in the syntax information for the coded unit. In some examples, the minimum size value indicates the minimum partition size in the coded unit. In this way, the minimum partition size (e.g., the size of the smallest block in the coded unit) may be used to determine the maximum length of a layered coded block pattern.
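The relationship between the minimum partition size and the maximum length of a layered coded block pattern can be sketched as follows, assuming the four-way partitioning described in this disclosure. The function name is an illustrative assumption.

```python
# Sketch of the maximum layered-CBP length implied by the minimum
# partition size signaled in the coded-unit syntax information, assuming
# each block may split into four equal-sized partitions per level.

def max_cbp_bits(max_block, min_partition):
    # One bit for the block itself, plus, while further partitioning is
    # allowed, four times as many partitions (one CBP bit each) per level.
    bits, blocks, size = 1, 1, max_block
    while size > min_partition:
        blocks *= 4          # each block splits into four partitions
        bits += blocks       # one CBP bit per partition at this level
        size //= 2           # partition edge length halves per level
    return bits
```

For a 64 x 64 block with a 16 x 16 minimum partition this gives 1 (CBP64) + 4 (CBP32) + 16 (CBP16) = 21 bits at most; a 64 x 64 minimum means the single CBP64 bit suffices.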
Video encoder 20 may then encode each block of the set of coded units according to the syntax corresponding to the largest block (306). For example, assuming that the largest block includes a 64 × 64 block of pixels, video encoder 20 may use syntax such as the syntax defined above for MB64_ type. As another example, assuming that the largest block comprises a 32 × 32 block of pixels, video encoder 20 may use a syntax such as the syntax defined above for MB32_ type.
Video encoder 20 also generates coded unit syntax information that includes values corresponding to a largest block in the coded unit and a smallest block in the coded unit (308). Video encoder 20 may then transmit the coded unit (including syntax information for each of the coded unit and the blocks of the coded unit) to video decoder 30.
Video decoder 30 may receive the coded unit and the syntax information for the coded unit from video encoder 20 (312). Video decoder 30 may select a block-based syntax decoder based on the indication, in the coded unit syntax information, of the largest block in the coded unit (314). For example, assuming the coded unit syntax information indicates that the largest block in the coded unit includes 64 x 64 pixels, video decoder 30 may select a syntax decoder for MB64_type blocks. Video decoder 30 may then apply the selected syntax decoder to the blocks of the coded unit to decode the blocks of the coded unit (316). Video decoder 30 may also determine when a block has no further separately encoded sub-partitions based on the indication, in the coded unit syntax information, of the smallest encoded partition. For example, if the largest block is 64 x 64 pixels and the smallest block is also 64 x 64 pixels, video decoder 30 may determine that the 64 x 64 blocks are not divided into sub-partitions smaller than 64 x 64. As another example, if the largest block is 64 x 64 pixels and the smallest block is 32 x 32 pixels, video decoder 30 may determine that the 64 x 64 blocks are divided into sub-partitions no smaller than 32 x 32.
In this way, video decoder 30 may remain backward compatible with existing coding standards (e.g., h.264). For example, when the largest block in the coded unit includes 16 × 16 pixels, video encoder 20 may indicate this information in the coded unit syntax information, and video decoder 30 may apply a standard h.264 block-based syntax decoder. However, when the largest block in a coded unit includes greater than 16x16 pixels, video encoder 20 may indicate this information in the coded unit syntax information, and video decoder 30 may selectively apply a block-based syntax decoder to decode the blocks of the coded unit in accordance with the techniques of this disclosure.
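The selection of a block-based syntax decoder from the coded unit syntax information (steps 312-316 of FIG. 17), including the backward-compatible H.264 case, might be sketched as follows. The decoder identifiers and the layout of the syntax information dictionary are illustrative assumptions, not syntax defined by this disclosure.

```python
# Sketch of selecting a block-based syntax decoder based on the largest
# block size signaled for a coded unit. Identifiers are illustrative.

def select_syntax_decoder(largest_block):
    if largest_block <= 16:
        return "h264_standard"   # backward compatible with standard H.264
    if largest_block == 32:
        return "mb32_type"       # syntax for 32x32 large macroblocks
    if largest_block == 64:
        return "mb64_type"       # syntax for 64x64 large macroblocks
    raise ValueError("unsupported largest block size: %d" % largest_block)

# Hypothetical coded-unit syntax information.
syntax_info = {"largest_block": 64, "smallest_block": 32}
decoder = select_syntax_decoder(syntax_info["largest_block"])
```

With this dispatch, a coded unit whose largest block is 16 x 16 falls through to the unmodified H.264 path, mirroring the backward-compatibility behavior described above.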
In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Various examples have been described. These and other examples are within the scope of the following claims.
Claims (22)
1. A method for video coding, comprising:
encoding a video block having a size greater than 16x16 pixels with a video encoder;
generating block type syntax information indicating the size of the block; and
generating coded block indicator data for the encoded block, wherein the coded block indicator data indicates whether the encoded block includes at least one non-zero coefficient, wherein generating the coded block indicator data comprises:
generating a single bit for the coded block indicator data and setting the single bit to 0 when the encoded block does not include at least one non-zero coefficient;
when the encoded block includes at least one non-zero coefficient, generating a first bit of the coded block indicator data, generating four partition bits for the coded block indicator data, each of the four partition bits corresponding to a different one of four equal-sized partitions of the encoded block, setting a value of the first bit to 1, setting the four partition bits to respective values representing whether the corresponding one of the four equal-sized partitions includes at least one non-zero coefficient;
when the encoded block comprises at least one non-zero coefficient and a 16x16 pixel partition, generating partition coded block indicator data for the 16x16 pixel partition as part of the generated coded block indicator data, wherein generating the partition coded block indicator data further comprises generating the coded block indicator data to comprise data representative of whether sub-partitions of the 16x16 pixel partition each comprise at least one non-zero coefficient if the 16x16 pixel partition comprises sub-partitions.
2. The method of claim 1, further comprising:
when the first bit of the coded block indicator data is 1, generating an encoded video bitstream that includes encoded video data for the encoded block, the block type syntax information, and the coded block indicator data.
3. The method of claim 1, further comprising: when the encoded block includes at least one non-zero coefficient, identifying a 16x16 pixel partition of the encoded block, generating partition coded block indicator data for the 16x16 pixel partition as part of the generated coded block indicator data, and appending the partition coded block indicator data to the coded block indicator data after the four partition bits.
4. The method of claim 3, wherein, when the encoded block includes at least one non-zero coefficient, generating the coded block indicator data comprises generating a first luma16x8 bit and a second luma16x8 bit, the first luma16x8 bit representing whether a first 16x8 partition of the 16x16 partition of the block includes at least one non-zero coefficient and the second luma16x8 bit representing whether a second 16x8 partition of the 16x16 partition of the block includes at least one non-zero coefficient.
5. The method of claim 3, wherein the video block has a size of at least 64 x 64 pixels.
6. The method of claim 1, wherein the coded block indicator data indicates whether the encoded block includes at least one non-zero coefficient and, when the encoded block includes at least one non-zero coefficient, indicates whether any first-level partitions of the encoded block include at least one non-zero coefficient and, when at least one of the first-level partitions of the encoded block includes at least one non-zero coefficient, indicates whether any second-level partitions of the first-level partitions include at least one non-zero coefficient, and wherein the coded block indicator data includes a pattern of bits corresponding to the encoded block, the first-level partitions of the encoded block, and the second-level partitions of the first-level partitions.
7. The method of claim 1, further comprising:
generating a quantization parameter modification value for the encoded block, wherein encoding the video block comprises quantizing the video block according to the quantization parameter modification value; and
including the quantization parameter modification value as syntax information for the encoded block.
8. An apparatus for video coding, comprising:
means for encoding a video block having a size greater than 16x16 pixels;
means for generating block type syntax information indicating the size of the block; and
means for generating coded block indicator data for the encoded block, wherein the coded block indicator data indicates whether the encoded block includes at least one non-zero coefficient; wherein the means for generating the coded block indicator data comprises:
means for generating a single bit for the coded block indicator data and setting the single bit to 0 when the encoded block does not include at least one non-zero coefficient;
means for generating, when the encoded block includes at least one non-zero coefficient, a first bit of the coded block indicator data and setting a value of the first bit to 1, means for generating four partition bits for the coded block indicator data, each of the four partition bits corresponding to a different one of four equal-sized partitions of the encoded block, and means for setting the four partition bits to respective values representing whether the corresponding one of the four equal-sized partitions includes at least one non-zero coefficient;
when the encoded block comprises at least one non-zero coefficient and a 16x16 pixel partition, means for generating partition coded block indicator data for the 16x16 pixel partition as part of the generated coded block indicator data, wherein the means for generating the partition coded block indicator data further comprises means for generating the coded block indicator data to comprise data representative of whether sub-partitions of the 16x16 pixel partition each comprise at least one non-zero coefficient if the 16x16 pixel partition comprises sub-partitions.
9. The apparatus of claim 8, further comprising:
means for generating an encoded video bitstream that includes encoded video data for the encoded block, the block type syntax information, and the coded block indicator data when the first bit of the coded block indicator data is 1.
10. The apparatus of claim 8, further comprising:
means for identifying a 16x16 pixel partition of the encoded block when the encoded block comprises at least one non-zero coefficient;
means for generating partition coded block indicator data for the 16x16 pixel partition as part of the generated coded block indicator data when the encoded block comprises at least one non-zero coefficient; and
means for appending the partition coded block indicator data to the coded block indicator data after the four partition bits when the encoded block includes at least one non-zero coefficient.
11. The apparatus of claim 10, wherein the means for generating comprises:
means for generating, when the encoded block includes at least one non-zero coefficient, a first luma16x8 bit and a second luma16x8 bit, the first luma16x8 bit representing whether a first 16x8 partition of the 16x16 partition of the block includes at least one non-zero coefficient and the second luma16x8 bit representing whether a second 16x8 partition of the 16x16 partition of the block includes at least one non-zero coefficient.
12. The apparatus of claim 8, wherein the video block has a size of at least 64 x 64 pixels.
13. The apparatus of claim 8, wherein the coded block indicator data indicates whether the encoded block includes at least one non-zero coefficient and, when the encoded block includes at least one non-zero coefficient, indicates whether any first-level partitions of the encoded block include at least one non-zero coefficient and, when at least one of the first-level partitions of the encoded block includes at least one non-zero coefficient, indicates whether any second-level partitions of the first-level partitions include at least one non-zero coefficient, and wherein the coded block indicator data includes a pattern of bits corresponding to the encoded block, the first-level partitions of the encoded block, and the second-level partitions of the first-level partitions.
14. A method for video coding, comprising:
receiving, with a video decoder, an encoded video block having a size greater than 16x16 pixels;
receiving block type syntax information indicating the size of the encoded block;
receiving coded block indicator data for the encoded block, wherein the coded block indicator data indicates whether the encoded block includes at least one non-zero coefficient, and wherein the coded block indicator data is a single bit having a value of 0 when the encoded block does not include at least one non-zero coefficient, and the coded block indicator data includes a first bit having a value of 1 and four partition bits when the encoded block includes at least one non-zero coefficient, each of the partition bits corresponding to a different one of four equal-sized partitions of the encoded block, wherein the four partition bits are set to respective values representing whether the corresponding one of the four equal-sized partitions includes at least one non-zero coefficient, wherein when the encoded block includes at least one non-zero coefficient and a 16x16 pixel partition, the coded block indicator data comprises partition coded block indicator data for the 16x16 pixel partition, wherein the coded block indicator data comprises data representative of whether sub-partitions of the 16x16 pixel partition each comprise at least one non-zero coefficient if the 16x16 pixel partition comprises sub-partitions; and
decoding the encoded block based on the block type syntax information and the coded block indicator data for the encoded block.
15. The method of claim 14, the method further comprising receiving encoded video data for the encoded block when the first bit of the coded block indicator data is a 1.
16. The method of claim 14, wherein the coded block indicator data indicates whether the encoded block includes at least one non-zero coefficient and, when the encoded block includes at least one non-zero coefficient, indicates whether any first-level partitions of the encoded block include at least one non-zero coefficient and, when at least one of the first-level partitions of the encoded block includes at least one non-zero coefficient, indicates whether any second-level partitions of the first-level partitions include at least one non-zero coefficient, and wherein the coded block indicator data includes a pattern of bits corresponding to the encoded block, the first-level partitions of the encoded block, and the second-level partitions of the first-level partitions.
17. The method of claim 14, wherein the video block has a size of at least 64 x 64 pixels.
18. The method of claim 14, further comprising:
a quantization parameter modification value is received,
wherein decoding the encoded block includes dequantizing the encoded block according to the quantization parameter modification value.
19. An apparatus for video coding, comprising:
means for receiving an encoded video block having a size greater than 16x16 pixels;
means for receiving block type syntax information indicating the size of the encoded block;
means for receiving coded block indicator data for the encoded block, wherein the coded block indicator data indicates whether the encoded block includes at least one non-zero coefficient, and wherein the coded block indicator data is a single bit having a value of 0 when the encoded block does not include at least one non-zero coefficient, and the coded block indicator data includes a first bit having a value of 1 and four partition bits when the encoded block includes at least one non-zero coefficient, each of the partition bits corresponding to a different one of four equal-sized partitions of the encoded block, wherein the four partition bits are set to respective values representing whether the corresponding one of the four equal-sized partitions includes at least one non-zero coefficient, wherein when the encoded block includes at least one non-zero coefficient and a 16x16 pixel partition, the coded block indicator data comprises partition coded block indicator data for the 16x16 pixel partition, wherein the coded block indicator data comprises data representative of whether sub-partitions of the 16x16 pixel partition each comprise at least one non-zero coefficient if the 16x16 pixel partition comprises sub-partitions; and
means for decoding the encoded block based on the block type syntax information and the coded block indicator data for the encoded block.
20. The apparatus of claim 19, further comprising means for receiving encoded video data for the encoded block when the coded block indicator is 1.
21. The apparatus of claim 19, wherein the coded block indicator data indicates whether the encoded block includes at least one non-zero coefficient and, when the encoded block includes at least one non-zero coefficient, indicates whether any first-level partitions of the encoded block include at least one non-zero coefficient and, when at least one of the first-level partitions of the encoded block includes at least one non-zero coefficient, indicates whether any second-level partitions of the first-level partitions include at least one non-zero coefficient, and wherein the coded block indicator data includes a pattern of bits corresponding to the encoded block, the first-level partitions of the encoded block, and the second-level partitions of the first-level partitions.
22. The apparatus of claim 19, wherein the video block has a size of at least 64 x 64 pixels.
Applications Claiming Priority (9)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US10278708P | 2008-10-03 | 2008-10-03 | |
| US61/102,787 | 2008-10-03 | ||
| US14435709P | 2009-01-13 | 2009-01-13 | |
| US61/144,357 | 2009-01-13 | ||
| US16663109P | 2009-04-03 | 2009-04-03 | |
| US61/166,631 | 2009-04-03 | ||
| US12/562,412 | 2009-09-18 | ||
| US12/562,412 US8634456B2 (en) | 2008-10-03 | 2009-09-18 | Video coding with large macroblocks |
| PCT/US2009/058844 WO2010039733A2 (en) | 2008-10-03 | 2009-09-29 | Video coding with large macroblocks |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1161464A1 HK1161464A1 (en) | 2012-08-24 |
| HK1161464B true HK1161464B (en) | 2015-07-31 |