US20140153640A1 - Adaptive single-field/dual-field video encoding - Google Patents
- Publication number: US20140153640A1 (application US 13/705,422)
- Authority: US (United States)
- Prior art keywords: encoding mode, encoding, frame, field, video stream
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- H04N19/112 — Selection of coding mode or of prediction mode according to a given display mode, e.g. for interlaced or progressive display mode
- H04N19/149 — Data rate or code amount at the encoder output, estimated by means of a model, e.g. mathematical model or statistical model
- H04N19/172 — Adaptive coding characterised by the coding unit, the unit being a picture, frame or field
- H04N19/124 — Quantisation
Definitions
- the present disclosure generally relates to video processing and more particularly relates to video encoding and transcoding.
- Block-based video encoding techniques are inherently lossy as they rely on quality compromises in ways that are intended to be minimally perceptible.
- One such compromise comes in the form of the quantization parameter (QP), which controls the degree of quantization during encoding and thus controls the degree of spatial detail retained from the original video source.
- as the QP increases, spatial detail is increasingly aggregated, which has the effect of lowering the bit rate at the expense of an increase in distortion and loss of quality.
- Rate control is frequently employed in video encoding or transcoding applications in an attempt to ensure that picture data being encoded meets various constraints, such as network bandwidth limitations, storage limitations, or processing bandwidth limitations, which may dynamically change.
- the goal of rate control is to maintain the bit rate of the encoded stream within a certain range of the target bit rate, which may remain relatively constant, as found in constant bit rate (CBR) applications, or may vary, as found in variable bit rate (VBR) applications. Rate control achieves this target bit rate through manipulation of QP.
- the high QP typically required to achieve relatively low bit rates in conventional encoding systems often introduces quantization artifacts that are readily perceivable by a viewer.
- quantization artifacts are addressed by lowering the temporal resolution or spatial resolution of the encoded stream, which often renders the artifacts unperceivable by the viewer.
- the downstream device receiving the encoded video stream which often is not under the control of the same entity controlling the encoder system, may not have the ability to, or be configured to, handle the resolution change.
- an encoding system that facilitates low bit rates and allows the original resolution to be maintained while reducing the impact of quantization artifacts would be advantageous.
- FIG. 1 is a block diagram illustrating a multimedia system employing adaptive single-field/dual-field encoding in accordance with at least one embodiment of the present disclosure.
- FIG. 2 is a block diagram illustrating an example configuration of a rate control module and an encoding module of the multimedia system of FIG. 1 in accordance with at least one embodiment of the present disclosure.
- FIG. 3 is a flow diagram illustrating a method for adaptive single-field/dual-field encoding in accordance with at least one embodiment of the present disclosure.
- FIGS. 1-3 illustrate techniques for encoding or transcoding an input video stream by dynamically switching between a single-field encoding mode and a dual-field encoding mode based on one or more varying encoding parameters.
- a rate control module controls the encoding process so as to maintain the output bit rate of the resulting encoded video stream to within a certain range of a target bit rate, which may vary based on changing system constraints.
- the rate control module dynamically modifies the quantization parameter (QP) used in the quantization process based on modifications to the target bit rate.
- the encoding process switches to the single-field encoding mode when the QP exceeds a specified threshold.
- the picture data of only one of the two fields of each frame is encoded for inclusion in the resulting encoded video stream, and the picture data of the other field is processed in an all-skip mode that references the first field.
- when the QP subsequently falls below a specified threshold, the encoding process switches to the dual-field encoding mode, in which the picture data of both fields of each frame is encoded for inclusion in the resulting encoded video stream.
- the dual-field encoding mode can employ Picture Adaptive Frame Field (PAFF) encoding to encode both fields in a frame-based mode or in a field-based mode. Under this approach, very low bit rates for the encoded video stream can be achieved with reduced quantization artifacts while maintaining the original horizontal resolution of the encoded video stream.
- the techniques of the present disclosure are described in the example context of the ITU-T H.264 encoding standards, which are also commonly referred to as the MPEG-4 Part 10 standards or the Advanced Video Coding (AVC) standards.
- the techniques of the present disclosure are not limited to this context, but instead may be implemented in any of a variety of block-based video compression techniques that employ field-based frames, examples of which include the MPEG-2 standards and the ITU-T H.263 standards.
- FIG. 1 illustrates, in block diagram form, a multimedia system 100 in accordance with at least one embodiment of the present disclosure.
- the multimedia system 100 includes a video source 102 , a video processing device 104 , and a video destination 106 .
- the multimedia system 100 can represent any of a variety of multimedia systems in which encoding or transcoding can be advantageously used.
- the multimedia system 100 is a distributed television system whereby the video source 102 comprises a terrestrial, cable, or satellite television broadcaster, an over-the-top (OTT) multimedia source or other Internet-based multimedia source, and the like.
- the video processing device 104 and the video destination 106 together are implemented as user equipment, such as a set-top box, a tablet computer or personal computer, a computing-enabled cellular phone, and the like.
- the video processing device 104 encodes or transcodes an input video stream and the resulting encoded video stream is buffered or otherwise stored in a cache, memory, hard drive or other storage device (not shown) until it is accessed for decoding and display by the video destination 106 .
- the multimedia system 100 can comprise a video content server system, whereby the video source 102 comprises one or more hard drives or other mass-storage devices storing original video content, the video destination 106 is a remote computer system connected to the video content server via a network, and the video processing device 104 is used to transcode the video content responsive to current network conditions before the transcoded video content is transmitted to the remote computer system via the network.
- the video source 102 transmits or otherwise provides an input video stream 108 to the video processing device 104 in either an analog format, such as a National Television System Committee (NTSC) or Phase Alternating Line (PAL) format, or a digital format, such as an H.263 format, an H.264 format, a Moving Picture Experts Group (MPEG) format (such as MPEG1, MPEG-2 or MPEG4), Quicktime format, Real Media format, Windows Media Video (WMV) or Audio Video Interleave (AVI), or other digital video format, either standard or proprietary.
- in instances whereby the input video stream 108 has an analog format, the video processing device 104 operates to encode the input video stream 108 to generate an encoded video stream 110 , and in instances whereby the input video stream 108 has a digital format, the video processing device 104 operates to transcode the input video stream 108 to generate the encoded video stream 110 .
- the resulting encoded video stream 110 is transmitted to the video destination 106 for storage, decoding, display, and the like.
- the video processing device 104 includes interfaces 112 and 114 , an encoder 116 , a rate control module 118 , and, in instances whereby the video processing device 104 provides transcoding, a decoder 120 .
- the interfaces 112 and 114 include interfaces used to communicate signaling with the video source 102 and the video destination 106 , respectively.
- Examples of the interfaces 112 and 114 include input/output (I/O) interfaces, such as Peripheral Component Interconnect Express (PCIE), Universal Serial Bus (USB), or Serial Advanced Technology Attachment (SATA); wired network interfaces, such as Ethernet; or wireless network interfaces, such as IEEE 802.11x or Bluetooth™, or a wireless cellular interface, such as a 3GPP, 4G, or LTE cellular data standard.
- the decoder 120 , the encoder 116 , and the rate control module 118 each may be implemented entirely in hard-coded logic (that is, hardware), as a combination of software stored in the memory 122 and one or more processors that access and execute the software, or as a combination of both.
- the video processing device 104 is implemented as a SOC whereby portions of the decoder 120 , the encoder 116 , and the rate control module 118 are implemented as hardware logic, and other portions are implemented via firmware stored at the SOC and executed by a processor of the SOC.
- the hardware of the video processing device 104 can be implemented using a single processing device or a plurality of processing devices.
- processing devices can include a central processing unit (CPU), a graphics processing unit (GPU), a microcontroller, a digital signal processor, a field programmable gate array, programmable logic device, state machine, logic circuitry, analog circuitry, digital circuitry, or any device that manipulates signals (analog and/or digital) based on operational instructions that are stored in a memory, such as the memory 122 .
- the memory 122 may be a single memory device or a plurality of memory devices.
- Such memory devices can include a hard disk drive or other disk drive, read-only memory, random access memory, volatile memory, non-volatile memory, static memory, dynamic memory, flash memory, cache memory, and/or any device that stores digital information.
- when the processing module implements one or more of its functions via a state machine, analog circuitry, digital circuitry, and/or logic circuitry, the memory storing the corresponding operational instructions may be embedded within, or external to, the circuitry comprising the state machine, analog circuitry, digital circuitry, and/or logic circuitry.
- the decoder 120 operates to receive the input video stream 108 via the interface 112 and partially or fully decode the input video stream 108 to create a decoded data stream 126 , which can include pixel information, motion estimation/detection information, timing information, and other video parameters.
- the encoder 116 receives the decoded data stream 126 and uses the video parameters represented by the decoded data stream to generate the encoded video stream 110 , which comprises a transcoded representation of the video content of the original input video stream 108 .
- the transcoding process implemented by the encoder 116 can include, for example, a stream format change (e.g., conversion from an MPEG-2 format to an AVC format), a resolution change, a frame rate change, a bit rate change, and the like.
- in instances whereby the input video stream 108 has an analog format, the decoder 120 is bypassed and the input video stream 108 is digitized and then encoded by the encoder 116 to generate the encoded video stream 110 .
- Video encoding schemes generally process frames as rows, or horizontal scan lines, of picture elements (“pixels”).
- Each frame comprises an “even” field composed of the even numbered rows of the frame, such as rows 0 , 2 , 4 , etc., and an “odd” field composed of the odd numbered rows of the frame, such as rows 1 , 3 , 5 , etc.
- the even field and odd field are also known as the “top” field and “bottom” field, respectively.
- a progressive display is configured to concurrently display both the even field and the odd field of a frame, whereas an interlaced display is configured to display the two fields of a frame in sequence.
- progressive encoding is configured to encode a frame using both fields, whereas interlaced encoding separately encodes the fields.
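To make the field decomposition concrete, the following sketch splits a frame, represented here simply as a list of pixel rows, into its even (top) and odd (bottom) fields and reassembles it; the representation and function names are illustrative, not part of the disclosure:

```python
def split_fields(frame):
    """Split a frame (list of pixel rows) into its even (top) and odd (bottom) fields."""
    even_field = frame[0::2]  # rows 0, 2, 4, ...
    odd_field = frame[1::2]   # rows 1, 3, 5, ...
    return even_field, odd_field

def interleave_fields(even_field, odd_field):
    """Reassemble a frame from its two fields (inverse of split_fields)."""
    frame = []
    for top_row, bottom_row in zip(even_field, odd_field):
        frame.append(top_row)
        frame.append(bottom_row)
    return frame
```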
- the encoder 116 leverages this dual-field representation of frames by employing at least two encoding modes, including a dual-field encoding mode and a single-field encoding mode for encoding the sequence of frames represented by the input video stream 108 .
- the encoder 116 encodes both the even field and the odd field of a frame so that the picture content of both the even field and the odd field is represented, in compressed form, in the encoded video stream 110 .
- the encoder 116 can employ Picture Adaptive Frame Field (PAFF) encoding as found in, for example, the MPEG-4 AVC standard.
- PAFF encoding enables a frame to be encoded as either interlaced (e.g., each field separately encoded) or progressive (the two fields encoded together) based on a motion analysis between the two fields of the frame.
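The PAFF frame/field decision can be sketched as follows, using a mean inter-field pixel difference as a stand-in for the motion analysis; the measure and threshold are illustrative assumptions, as the standard leaves the decision criterion to the encoder:

```python
def paff_mode(even_field, odd_field, motion_threshold=10.0):
    """Choose frame- or field-based coding for one picture (PAFF-style).

    A large mean difference between co-located field pixels suggests motion
    between the fields, which favors coding each field separately.
    """
    diffs = [abs(a - b)
             for row_e, row_o in zip(even_field, odd_field)
             for a, b in zip(row_e, row_o)]
    mean_diff = sum(diffs) / len(diffs)
    return "field" if mean_diff > motion_threshold else "frame"
```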
- the encoder 116 encodes only a single field of each frame so that the picture content of only one field of each frame is represented in the encoded video stream.
- the picture content of the other field of the frame is disregarded by the encoder 116 , which may instead insert skip information into the encoded video stream 110 in place of the other field, whereby the skip information references the field that was encoded.
- This approach halves the vertical resolution of the frame while maintaining the same horizontal resolution.
- the single-field encoding mode can achieve a substantially lower bit rate compared to the dual-field encoding mode while maintaining the same horizontal resolution.
- the rate control module 118 dynamically determines and adjusts various encoding parameters used by the encoder 116 to achieve a target bit rate.
- these encoding parameters include a control signal 128 (denoted "QP" in FIG. 1 ) to set the QP used by the quantization process of the encoder 116 , as well as a control signal 130 (denoted "MODE" in FIG. 1 ) to select which of the two encoding modes is to be employed by the encoder 116 .
- the rate control module 118 continuously monitors the complexity of the pictures to be encoded, the bits allocated to encode the pictures, and a specified target bit rate (which may be constant or may vary with, for example, changing network bandwidth parameters) to determine the next value for QP and signals this new QP value via control signal 128 .
- the rate control module 118 uses the QP to adaptively select the mode to be employed by the encoder 116 . While the QP is within a range deemed acceptable, the rate control module 118 configures the control signal 130 to place the encoder 116 in the dual-field encoding mode so that both fields of each frame being processed are encoded for inclusion in the encoded video stream 110 .
- when the QP rises above the acceptable range, the rate control module 118 configures the control signal 130 to switch the encoder 116 into the single-field encoding mode so that only a single field of each frame being processed is encoded for inclusion in the encoded video stream 110 .
- the rate control module 118 can also decrease the QP to take advantage of the additional bit rate headroom made available.
- the single-field encoding mode allows the encoder 116 to achieve the target bit rate with a lower QP and the same horizontal resolution, which typically facilitates a higher quality decoded video compared to a conventional process whereby a higher QP is employed while including both fields of the frame in the encoded video stream.
- FIG. 1 also depicts a timeline 140 for an example portion 150 of the encoded video stream 110 to illustrate the dynamic switching between the dual-field encoding mode and the single-field encoding mode.
- the rate control module 118 configures the encoder 116 to switch to the dual-field encoding mode.
- the encoder 116 encodes frames of a subsequence of the frames of the input video stream 108 to include the picture content of both the even and odd fields of each frame.
- frame 0 is encoded to include the picture content of the even field 151 and the odd field 152 of frame 0 in the encoded video stream 110 , and this process continues for each frame up through frame J at time T 1 .
- the rate control module 118 increases QP based on any of a variety of factors, such as increasing image complexity, decreasing target bit rate, etc.
- QP has been increased to the point that it exceeds a specified threshold, and thus at time T 1 the rate control module 118 configures the encoder 116 to switch to the single-field encoding mode.
- the rate control module 118 decreases QP. While in the single-field encoding mode, the encoder 116 encodes frames of a subsequence of frames to include the picture content of only the even field of each frame.
- the frame J+1 processed after time T 1 is encoded to include the picture content of the even field 153 in the encoded video stream 110 , while the picture content of the bottom field 154 of frame J+1 is disregarded.
- skip information 155 is included in the encoded video stream 110 in place of what would have been the encoded picture content of the bottom field 154 .
- this skip information 155 specifies an all-skip mode (that is, sets each macroblock (MB) in the bottom field 154 to skip mode) and references the even field 153 of the same frame.
- the odd field can be encoded and the even field disregarded during the single-field encoding mode.
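The skip information for the disregarded field can be sketched as a per-macroblock structure in which every macroblock is flagged as skipped and the encoded field of the same frame is referenced; the data layout below is a hypothetical illustration, not the actual bitstream syntax:

```python
def all_skip_field(mb_rows, mb_cols, reference_field="even"):
    """Build skip information for a discarded field: every macroblock is
    flagged as skip mode and references the encoded field of the same frame."""
    return {
        "reference": reference_field,
        "macroblocks": [{"mb": (r, c), "mode": "skip"}
                        for r in range(mb_rows) for c in range(mb_cols)],
    }
```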
- the single-field encoding process continues for each frame up through frame K at time T 2 .
- the rate control module 118 decreases QP based on various factors, such as decreasing image complexity, increasing target bit rate, etc.
- QP has been decreased to the point that it falls below a specified threshold, and thus at time T 2 the rate control module 118 configures the encoder 116 to switch back to the dual-field encoding mode.
- the encoder 116 encodes frames of a subsequence of frames to include the picture content of both the even and odd fields of each frame, such as by encoding the picture content of both the even field 156 and the odd field 157 in the encoded video stream 110 of a frame K+1 processed after time T 2 .
- the threshold used to trigger the switch from the dual-field encoding mode to the single-field encoding mode can be the same threshold as that used to trigger the switch from the single-field encoding mode to the dual-field encoding mode.
- this can lead to frequent switching between the two encoding modes.
- a directional switch with two thresholds may be employed.
- the QP threshold used to initiate a switch from the dual-field encoding mode to the single-field encoding mode may be higher than the QP threshold used to initiate a switch from the single-field encoding mode to the dual-field encoding mode.
- the rate control module 118 can control the toggling frequency between modes to a degree by delaying a switch between modes until a specified switch point occurs or by implementing a minimum distance condition between mode switch points.
- switch points can include, for example, scene, group of picture (GOP), or mini-GOP boundaries.
- the minimum distance condition can be specified as, for example, a minimum number of GOPs, mini-GOPs, or scene changes since the previous switch, a minimum lapse of time since the previous switch, and the like.
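The directional thresholds and minimum distance condition together can be sketched as a small mode controller; the threshold values and frame-count gap below are illustrative assumptions:

```python
class ModeController:
    """Switch between dual-field and single-field encoding with hysteresis.

    qp_high > qp_low gives a directional switch that avoids rapid toggling,
    and min_gap enforces a minimum number of frames between mode switches.
    """

    def __init__(self, qp_high=40, qp_low=32, min_gap=30):
        self.qp_high = qp_high      # dual -> single when QP exceeds this
        self.qp_low = qp_low        # single -> dual when QP falls below this
        self.min_gap = min_gap      # minimum frames between switches
        self.mode = "dual"
        self.frames_since_switch = min_gap  # allow an immediate first switch

    def update(self, qp):
        """Feed the current QP for one frame; return the mode to use."""
        self.frames_since_switch += 1
        if self.frames_since_switch < self.min_gap:
            return self.mode
        if self.mode == "dual" and qp > self.qp_high:
            self.mode = "single"
            self.frames_since_switch = 0
        elif self.mode == "single" and qp < self.qp_low:
            self.mode = "dual"
            self.frames_since_switch = 0
        return self.mode
```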
- the video destination 106 can operate to decode and display the encoded video stream 110 .
- the video destination 106 includes a decoder 160 and a display device 162 .
- the decoder 160 operates to decode the encoded video stream 110 to generate a decoded video stream and then provide this decoded video stream to the display device 162 .
- the decoder 160 decodes the picture content of both the even field and the odd field of the frame from the encoded video stream 110 and displays the resulting decoded representation of the picture content of both fields either concurrently for a progressive display or in sequence for an interlaced display.
- the same decoding process may be used for portions of the encoded video stream 110 that were encoded in the single-field encoding mode, although the picture content of the omitted field will not be present in the decoded result.
- FIG. 2 illustrates an example implementation of the rate control module 118 in greater detail in accordance with at least one embodiment of the present disclosure.
- the rate control module 118 includes a complexity estimation module 202 , a bit allocation module 204 , a virtual buffer model (VBM) 206 , and a rate-quantization module 208 .
- the encoder 116 employs a subtraction process and motion estimation process for data representing macroblocks of pixel values for a picture to be encoded.
- the motion estimation process compares each of these new macroblocks with macroblocks in a previously stored reference picture or pictures to find the macroblock in a reference picture that most closely matches the new macroblock.
- the motion estimation process then calculates a motion vector, which represents the horizontal and vertical displacement from the macroblock being encoded to the matching macroblock-sized area in the reference picture.
- the motion estimation process also provides this matching macroblock (known as a predicted macroblock) out of the reference picture memory to the subtraction process, whereby it is subtracted, on a pixel-by-pixel basis, from the new macroblock entering the encoder.
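A toy version of the motion estimation and subtraction processes, using a brute-force sum-of-absolute-differences (SAD) search over a small window; real encoders use larger blocks and fast search strategies, so this is only a sketch:

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b) for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def block(img, y, x, n):
    """Extract the n x n block of `img` whose top-left corner is (y, x)."""
    return [row[x:x + n] for row in img[y:y + n]]

def motion_search(ref, cur, y, x, n=2, radius=1):
    """Find the motion vector (dy, dx) minimizing SAD for the n x n block
    of `cur` at (y, x), searching `ref` within +/- radius pixels, and
    return the vector together with the prediction residual."""
    target = block(cur, y, x, n)
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ry, rx = y + dy, x + dx
            if ry < 0 or rx < 0 or ry + n > len(ref) or rx + n > len(ref[0]):
                continue
            cost = sad(block(ref, ry, rx, n), target)
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    dy, dx = best[1]
    pred = block(ref, y + dy, x + dx, n)
    residual = [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(target, pred)]
    return (dy, dx), residual
```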
- the encoder 116 employs a two-dimensional (2D) discrete cosine transform (DCT) to transform the residual from the spatial domain.
- the resulting DCT coefficients of the residual are then quantized using the QP so as to reduce the number of bits needed to represent each coefficient.
- the quantized DCT coefficients then may be Huffman run/level coded to further reduce the average number of bits per coefficient. This is combined with motion vector data and other side information (including an indication of I, P or B pictures) for insertion into the encoded video stream 110 .
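A simplified sketch of the quantization and run/level coding steps; the direct division by QP is an illustrative stand-in for the QP-to-step-size mapping actually defined by the coding standard:

```python
def quantize(coeffs, qp):
    """Uniform quantization of DCT coefficients: a larger QP means a coarser
    step size, fewer distinct levels, and thus fewer bits (more distortion)."""
    step = qp  # simplified; AVC maps QP to a step size via a standard table
    return [round(c / step) for c in coeffs]

def run_level(quantized):
    """Run/level code a coefficient list as (zero_run, level) pairs."""
    pairs, run = [], 0
    for q in quantized:
        if q == 0:
            run += 1
        else:
            pairs.append((run, q))
            run = 0
    return pairs
```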
- the quantized DCT coefficients also go to an internal loop that represents the operation of the decoder (a decoder within the encoder).
- the residual is inverse quantized and inverse DCT transformed.
- the predicted macroblock is read out of the reference picture memory and added back to the residual on a pixel-by-pixel basis, and the result is stored back into a memory to serve as a reference for predicting subsequent pictures.
- the encoding of I pictures uses the same process, except that no motion estimation occurs and the negative (−) input to the subtraction process is forced to 0.
- the quantized DCT coefficients represent transformed pixel values rather than residual values as was the case for P and B pictures.
- decoded I pictures are stored as reference pictures.
- the rate-quantization module 208 uses the image complexity and bit allocations as parameters for determining the QP, which in turn determines the degree of quantization performed by the encoder 116 and thus influences the bit rate of the resulting encoded video data.
- the image complexity is estimated by the complexity estimation module 202 , which calculates a mean absolute difference (MAD) of the residuals as an estimate of image complexity for picture data to be encoded.
- the MAD may be calculated using any of a variety of well-known algorithms.
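Taking MAD as the mean absolute difference of the prediction residuals, one direct implementation is:

```python
def mad(residuals):
    """Mean absolute difference of prediction residuals (rows of values),
    used as an image-complexity estimate for rate control."""
    values = [abs(v) for row in residuals for v in row]
    return sum(values) / len(values)
```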
- the bit allocations are represented by target numbers of bits that may be allocated at different granularities, such as per frame, picture, GOP, slice, or block.
- the VBM 206 maintains a model of the buffer fullness of a modeled decoder receiving the encoded video stream 110 and the bit allocation module 204 determines the number of target bits to allocate based on the buffer fullness and a specified target bit rate, which can include a specific bit rate or a bit rate range, using any of a variety of well-known bit allocation algorithms.
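A minimal virtual buffer model along these lines: encoded bits fill the modeled buffer, the channel drains it at the target rate, and the per-picture allocation is nudged to steer fullness toward a midpoint setpoint (the gain constant is an illustrative assumption):

```python
class VirtualBufferModel:
    """Track modeled decoder buffer fullness and allocate per-picture bits."""

    def __init__(self, target_bitrate, frame_rate, buffer_size):
        self.bits_per_frame = target_bitrate / frame_rate
        self.buffer_size = buffer_size
        self.fullness = buffer_size / 2  # start at the midpoint setpoint

    def add_picture(self, coded_bits):
        """Account for one encoded picture: bits in, channel drain out."""
        self.fullness += coded_bits - self.bits_per_frame
        self.fullness = max(0.0, min(self.fullness, self.buffer_size))

    def allocate(self):
        """Target bits for the next picture, nudged by buffer deviation."""
        deviation = self.fullness - self.buffer_size / 2
        return max(1.0, self.bits_per_frame - 0.1 * deviation)
```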
- the rate-quantization module 208 uses the calculated MAD and the target bit allocation to calculate a value for QP that is expected to achieve the target bit rate when used to encode the picture data having the calculated MAD and target bit allocation. Any of a variety of well-known QP calculation techniques may be used to determine the value for QP. Moreover, the rate-quantization module 208 may employ a QP limiter to dampen any rapid changes in the QP value so as to provide stability and minimize perceptible variations in quality. The revised QP value is signaled to the encoder 116 via the control signal 128 .
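The QP calculation and limiter can be sketched with a simple first-order rate model, bits ≈ k·MAD/QP, inverted to obtain QP; the model, the constant k, and the step limit are illustrative assumptions (practical rate control often uses a quadratic model):

```python
def compute_qp(mad, target_bits, prev_qp, k=1000.0,
               qp_min=1, qp_max=51, max_step=2):
    """Pick QP from complexity (MAD) and bit budget, with a change limiter
    that dampens rapid QP swings for stable perceived quality."""
    raw_qp = k * mad / target_bits  # invert the model bits ~ k * MAD / QP
    limited = min(max(raw_qp, prev_qp - max_step), prev_qp + max_step)
    return int(round(min(max(limited, qp_min), qp_max)))
```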
- the rate-quantization module 208 uses the relationship between QP and one or more specified thresholds to switch the encoder 116 between a single-field encoding mode (denoted as mode 210 in FIG. 2 ) and a dual-field encoding mode (denoted as mode 212 in FIG. 2 ) via the control signal 130 .
- These specified thresholds may be programmed via software-observable registers or pre-configured via a read-only memory (ROM), fuses, or one-time-programmable (OTP) registers, and the like.
- the rate-quantization module 208 can use the control signal 130 to specify whether the encoder 116 is to employ PAFF in a PAFF sub-mode (denoted as sub-mode 214 in FIG. 2 ) or bypass PAFF in a non-PAFF sub-mode (denoted as sub-mode 216 in FIG. 2 ) while the encoder 116 is in the dual-field encoding mode.
- the rate-quantization module 208 determines to switch from the dual-field encoding mode to the single-field encoding mode responsive to the QP exceeding an upper threshold, the rate-quantization module 208 is configured to reduce the QP to take advantage of the bit rate headroom created by application of the single-field encoding mode to the encoded video stream 110 .
- the rate-quantization module 208 implements a fixed reduction to the QP upon switching to the single-field encoding mode.
- the one or both of the MAD and the target bit allocation are updated to reflect that only one of the two fields of each frame is to be encoded, and the rate-quantization module 208 updates QP based on these updated input parameters.
- the rate-quantization module 208 can control the encoder 116 through the QP value and the encoding mode so as to provide more optimal video quality at the original horizontal resolution while also meeting very low target bit rates.
- FIG. 3 illustrates an example method 300 for setting the encoding parameters of the encoder 116 of the video processing device 104 of FIGS. 1 and 2 .
- Upon receiving the input video stream 108 from the video source 102, the video processing device 104 initiates an encoding process using the encoder 116 to either encode or transcode the input video stream 108.
- At block 302, the rate-quantization module 208 determines encoding parameters for the encoder 116, including the QP to be implemented for the quantization process employed by the encoder 116, based on various parameters, including image complexity and target bit allocations. This process is repeated continuously to update the QP based on dynamically changing parameters.
- The rate-quantization module 208 then turns to determining which encoding mode is to be employed by the encoder 116 based on the QP. At block 304, the rate-quantization module 208 compares the QP with a specified upper threshold to determine whether the QP is excessively high, and thus likely to significantly impact video quality. If the QP exceeds the upper threshold, the rate-quantization module 208 prepares to switch to the single-field encoding mode; to this end, at block 306 the rate-quantization module 208 delays the switch until the next encountered switch point.
- The switch points can include, for example, scene, GOP, or mini-GOP boundaries, and a minimum distance condition may be instituted between mode switches.
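The switch-point gating just described can be sketched in a few lines. This is a minimal illustration rather than the disclosed implementation; the GOP size, the frame-based minimum distance, and the function name are assumptions:

```python
# Sketch of switch-point gating: a requested mode change is held until the
# next GOP boundary, and a minimum distance (in frames) is enforced between
# mode switches. GOP_SIZE and MIN_DISTANCE are illustrative values only.
GOP_SIZE = 15       # frames per GOP (assumed)
MIN_DISTANCE = 30   # minimum frames between mode switches (assumed)

def can_switch(frame_index, last_switch_index):
    """Return True when a pending mode switch may be applied at this frame."""
    at_gop_boundary = frame_index % GOP_SIZE == 0
    far_enough = frame_index - last_switch_index >= MIN_DISTANCE
    return at_gop_boundary and far_enough
```

A pending switch at frame 44 would be held (not a GOP boundary), and one at frame 15 would also be held when the last switch was at frame 0 (minimum distance not yet met).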
- At block 308, the rate-quantization module 208 decreases the QP either by a fixed step or based on a recalculation of the QP in view of the reduced image complexity and increased target bit rate allocation that will occur in the single-field encoding mode.
- The rate-quantization module 208 then uses the control signal 130 to configure the encoder 116 to the single-field encoding mode at block 310.
- The encoder 116 encodes the picture data of a selected one of the two fields using the QP and includes the encoded picture data in the encoded video stream 110 at block 312, and discards or otherwise disregards the picture data of the non-selected one of the two fields at block 314.
- If the QP does not exceed the upper threshold, the rate-quantization module 208 determines whether the QP falls below a specified lower threshold. If the QP is between the lower threshold and the upper threshold, at block 318 the rate-quantization module 208 maintains the encoder 116 in its current encoding mode. Otherwise, if the QP is below the lower threshold, the rate-quantization module 208 prepares to switch the encoder 116 to the dual-field encoding mode. Accordingly, at block 320 the rate-quantization module 208 delays the switch until the next switch point is encountered and any minimum distance condition is met.
- The rate-quantization module 208 then reconfigures the encoder 116 to switch to the dual-field encoding mode and increases the QP either by a fixed step or based on a recalculation of the QP in a similar manner as described above with reference to block 308.
- The encoder 116 encodes both fields of each frame of the frame subsequence being processed while in the dual-field encoding mode.
- The dual-field encoding mode may employ PAFF such that the motion between fields of a frame, or another measure of complexity, is used to determine whether to encode both fields of a frame together in a frame-based encoding mode at block 324, or to encode each field separately in a field-based encoding mode at block 326.
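The frame/field decision of the PAFF sub-mode can be illustrated with a crude inter-field motion measure. The mean-absolute-difference measure and the threshold value below are assumptions chosen for a concrete sketch, not values from the disclosure:

```python
def choose_paff_coding(top_field, bottom_field, motion_threshold=8.0):
    """Pick frame- or field-based coding for one frame from a crude
    inter-field motion estimate: the mean absolute difference between
    co-located pixels of the two fields (threshold is illustrative)."""
    diffs = [abs(t - b)
             for row_t, row_b in zip(top_field, bottom_field)
             for t, b in zip(row_t, row_b)]
    mad = sum(diffs) / len(diffs)
    # Little inter-field motion: the fields match well, so code the two
    # fields together as one frame; otherwise code each field separately.
    return "frame" if mad < motion_threshold else "field"
```

A static frame (identical fields) selects frame-based coding, while strongly differing fields select field-based coding.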
- The QP is continuously updated at block 302 based on dynamic changes in the parameters used to calculate the QP, such as the complexity of the particular pictures to be encoded, fluctuations in the target bit rate (due to, for example, fluctuations in the bandwidth of a network link), and the like.
- The video processing device 104 can repeat the process represented by blocks 304-326 based on the updated QP so as to dynamically adapt the encoder 116 to varying encoding limitations while attempting to maintain the original horizontal resolution in a manner that provides high video quality at lower bit rates.
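Taken together, the decisions of blocks 304 through 320 reduce to a small control step. The sketch below is an illustration only; the threshold values and the fixed QP step are assumptions, as the disclosure does not specify numeric values:

```python
QP_UPPER = 40  # upper threshold triggering single-field mode (assumed value)
QP_LOWER = 30  # lower threshold triggering dual-field mode (assumed value)
QP_STEP = 4    # fixed QP adjustment applied on a mode switch (assumed value)

def rate_control_step(qp, mode):
    """One pass of the mode decision: switch to single-field encoding when
    the QP is excessively high (block 304), switch back to dual-field
    encoding when the QP falls below the lower threshold, and otherwise
    maintain the current mode (block 318)."""
    if mode == "dual" and qp > QP_UPPER:
        return qp - QP_STEP, "single"  # lower QP to use the freed headroom
    if mode == "single" and qp < QP_LOWER:
        return qp + QP_STEP, "dual"    # raise QP to stay within the budget
    return qp, mode
```

Starting in the dual-field mode with a QP of 45, one step switches to single-field encoding and drops the QP to 41; a switch back occurs only once the QP falls below the lower threshold.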
Description
- The present disclosure generally relates to video processing and more particularly relates to video encoding and transcoding.
- Block-based video encoding techniques are inherently lossy as they rely on quality compromises in ways that are intended to be minimally perceptible. One such compromise comes in the form of the quantization parameter (QP), which controls the degree of quantization during encoding and thus controls the degree of spatial detail retained from the original video source. As QP increases, spatial detail is increasingly aggregated, which has the effect of lowering the bit rate at the expense of an increase in distortion and loss of quality. Rate control is frequently employed in video encoding or transcoding applications in an attempt to ensure that picture data being encoded meets various constraints, such as network bandwidth limitations, storage limitations, or processing bandwidth limitations, which may dynamically change. These constraints are reflected in the target bit rate for the resulting encoded video stream, and thus the goal of rate control is to maintain the bit rate of the encoded stream within a certain range of the target bit rate, which may remain relatively constant, as found in constant bit rate (CBR) applications, or may vary as found in variable bit rate (VBR) applications. Rate control achieves this target bit rate through manipulation of QP.
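The QP/bit-rate/detail tradeoff can be made concrete with a toy quantizer. The step-size formula below follows the H.264 convention that the quantizer step roughly doubles for every increase of 6 in QP; the coefficient values in the usage note are invented for illustration:

```python
def qstep(qp):
    # H.264-style step size: roughly doubles every 6 QP (0.625 at QP 0).
    return 0.625 * 2 ** (qp / 6)

def quantize(coeffs, qp):
    """Map coefficients to integer levels; larger QP -> coarser levels."""
    step = qstep(qp)
    return [round(c / step) for c in coeffs]

def dequantize(levels, qp):
    """Reconstruct approximate coefficient values from the levels."""
    step = qstep(qp)
    return [l * step for l in levels]
```

With the assumed coefficients [10.0, 3.0, 1.0], quantizing at QP 4 preserves all three values almost exactly, while quantizing at QP 40 (step of about 63.5) zeroes all of them — far cheaper to code, but the spatial detail they carried is lost.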
- The high QP typically required to achieve relatively low bit rates in conventional encoding systems often introduces quantization artifacts that are readily perceivable by a viewer. In some systems, such quantization artifacts are addressed by lowering the temporal resolution or spatial resolution of the encoded stream, which often renders the artifacts imperceptible to the viewer. However, in certain video encoding standards, such as the Advanced Video Coding (AVC) standards, a change in resolution is treated as the start of a new video stream, which prevents the use of reference video content from before the resolution change and which requires new stream information. Moreover, the downstream device receiving the encoded video stream, which often is not under the control of the same entity controlling the encoder system, may not have the ability to, or be configured to, handle the resolution change. As such, an encoding system that facilitates low bit rates and allows the original resolution to be maintained while reducing the impact of quantization artifacts would be advantageous.
- The present disclosure may be better understood, and its numerous features and advantages made apparent to those skilled in the art by referencing the accompanying drawings.
- FIG. 1 is a block diagram illustrating a multimedia system employing adaptive single-field/dual-field encoding in accordance with at least one embodiment of the present disclosure.
- FIG. 2 is a block diagram illustrating an example configuration of a rate control module and an encoding module of the multimedia system of FIG. 1 in accordance with at least one embodiment of the present disclosure.
- FIG. 3 is a flow diagram illustrating a method for adaptive single-field/dual-field encoding in accordance with at least one embodiment of the present disclosure.
- FIGS. 1-3 illustrate techniques for encoding or transcoding an input video stream by dynamically switching between a single-field encoding mode and a dual-field encoding mode based on one or more varying encoding parameters. A rate control module controls the encoding process so as to maintain the output bit rate of the resulting encoded video stream to within a certain range of a target bit rate, which may vary based on changing system constraints. As part of this process, the rate control module dynamically modifies the quantization parameter (QP) used in the quantization process based on modifications to the target bit rate. However, as a high QP can result in an unacceptable loss of spatial detail, the encoding process switches to the single-field encoding mode when the QP exceeds a specified threshold. In the single-field encoding mode, the picture data of only one of the two fields of each frame is encoded for inclusion in the resulting encoded video stream, and the picture data of the other field is processed in an all-skip mode that references the first field. When the QP falls below a specified threshold, the encoding process switches to the dual-field encoding mode, in which the picture data of both fields of each frame is encoded for inclusion in the resulting encoded video stream. Further, in some embodiments, the dual-field encoding mode can employ Picture Adaptive Frame Field (PAFF) encoding to encode both fields in a frame-based mode or in a field-based mode. Under this approach, very low bit rates for the encoded video stream can be achieved with reduced quantization artifacts while maintaining the original horizontal resolution of the encoded video stream.
- For ease of illustration, the techniques of the present disclosure are described in the example context of the ITU-T H.264 encoding standards, which are also commonly referred to as the MPEG-4 Part 10 standards or the Advanced Video Coding (AVC) standards. However, the techniques of the present disclosure are not limited to this context, but instead may be implemented in any of a variety of block-based video compression techniques that employ field-based frames, examples of which include the MPEG-2 standards and the ITU-T H.263 standards.
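The single-field behavior summarized above — encode one field and replace the other with all-skip information referencing it — can be sketched on raw frame rows. The list-of-rows frame and the tuple-based stream entries are invented illustrations for clarity, not AVC syntax:

```python
def split_fields(frame_rows):
    """Split a frame (a list of rows) into its even (top) and odd
    (bottom) fields, which interleave row-by-row in the frame."""
    return frame_rows[0::2], frame_rows[1::2]

def encode_single_field(frame_rows):
    """Single-field mode sketch: keep only the even field and stand in
    for the odd field with an all-skip entry that references the coded
    even field. Halves the vertical resolution, keeps the horizontal."""
    even, _odd = split_fields(frame_rows)
    return [("field", even), ("skip", "ref:even")]
```

For a four-row frame, the even field carries rows 0 and 2 while rows 1 and 3 are represented only by the skip entry.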
- FIG. 1 illustrates, in block diagram form, a multimedia system 100 in accordance with at least one embodiment of the present disclosure. The multimedia system 100 includes a video source 102, a video processing device 104, and a video destination 106. The multimedia system 100 can represent any of a variety of multimedia systems in which encoding or transcoding can be advantageously used. In one embodiment, the multimedia system 100 is a distributed television system whereby the video source 102 comprises a terrestrial, cable, or satellite television broadcaster, an over-the-top (OTT) multimedia source or other Internet-based multimedia source, and the like. In this implementation, the video processing device 104 and the video destination 106 together are implemented as user equipment, such as a set-top box, a tablet computer or personal computer, a computing-enabled cellular phone, and the like. Thus, the video processing device 104 encodes or transcodes an input video stream, and the resulting encoded video stream is buffered or otherwise stored in a cache, memory, hard drive, or other storage device (not shown) until it is accessed for decoding and display by the video destination 106. As another example, the multimedia system 100 can comprise a video content server system, whereby the video source 102 comprises one or more hard drives or other mass-storage devices storing original video content, the video destination 106 is a remote computer system connected to the video content server via a network, and the video processing device 104 is used to transcode the video content responsive to current network conditions before the transcoded video content is transmitted to the remote computer system via the network.
- In operation, the video source 102 transmits or otherwise provides an input video stream 108 to the video processing device 104 in either an analog format, such as a National Television System Committee (NTSC) or Phase Alternating Line (PAL) format, or a digital format, such as an H.263 format, an H.264 format, a Moving Picture Experts Group (MPEG) format (such as MPEG-1, MPEG-2, or MPEG-4), a QuickTime format, a Real Media format, Windows Media Video (WMV), Audio Video Interleave (AVI), or another digital video format, either standard or proprietary. In instances whereby the input video stream 108 has an analog format, the video processing device 104 operates to encode the input video stream 108 to generate an encoded video stream 110, and in instances whereby the input video stream 108 has a digital format, the video processing device 104 operates to transcode the input video stream 108 to generate the encoded video stream 110. The resulting encoded video stream 110 is transmitted to the video destination 106 for storage, decoding, display, and the like.
- In the illustrated embodiment, the video processing device 104 includes interfaces to the video source 102 and the video destination 106 (such as the interface 112, which receives the input video stream 108), an encoder 116, a rate control module 118, and, in instances whereby the video processing device 104 provides transcoding, a decoder 120. The decoder 120, the encoder 116, and the rate control module 118 each may be implemented entirely in hard-coded logic (that is, hardware), as the combination of software stored in a memory 122 and a processor 124 to access and execute the software, or as a combination of hard-coded logic and software-executed functionality. To illustrate, in one embodiment, the video processing device 104 is implemented as a SOC whereby portions of the decoder 120, the encoder 116, and the rate control module 118 are implemented as hardware logic, and other portions are implemented via firmware stored at the SOC and executed by a processor of the SOC.
- The hardware of the video processing device 104 can be implemented using a single processing device or a plurality of processing devices. Such processing devices can include a central processing unit (CPU), a graphics processing unit (GPU), a microcontroller, a digital signal processor, a field-programmable gate array, a programmable logic device, a state machine, logic circuitry, analog circuitry, digital circuitry, or any device that manipulates signals (analog and/or digital) based on operational instructions that are stored in a memory, such as the memory 122. The memory 122 may be a single memory device or a plurality of memory devices. Such memory devices can include a hard disk drive or other disk drive, read-only memory, random access memory, volatile memory, non-volatile memory, static memory, dynamic memory, flash memory, cache memory, and/or any device that stores digital information. Note that when a processing device implements one or more of its functions via a state machine, analog circuitry, digital circuitry, and/or logic circuitry, the memory storing the corresponding operational instructions may be embedded within, or external to, the circuitry comprising the state machine, analog circuitry, digital circuitry, and/or logic circuitry.
- In a transcoding mode, the decoder 120 operates to receive the input video stream 108 via the interface 112 and partially or fully decode the input video stream 108 to create a decoded data stream 126, which can include pixel information, motion estimation/detection information, timing information, and other video parameters. The encoder 116 receives the decoded data stream 126 and uses the video parameters represented by the decoded data stream to generate the encoded video stream 110, which comprises a transcoded representation of the video content of the original input video stream 108. The transcoding process implemented by the encoder 116 can include, for example, a stream format change (e.g., conversion from an MPEG-2 format to an AVC format), a resolution change, a frame rate change, a bit rate change, and the like. In an encoding mode, the decoder 120 is bypassed and the input video stream 108 is digitized and then encoded by the encoder 116 to generate the encoded video stream 110.
- Video encoding schemes generally process frames as rows, or horizontal scan lines, of picture elements ("pixels"). Each frame comprises an "even" field composed of the even-numbered rows of the frame, such as rows 0, 2, 4, etc., and an "odd" field composed of the odd-numbered rows of the frame, such as rows 1, 3, 5, etc. Thus, the odd and even fields are interleaved with respect to a given frame. The even field and odd field are also known as the "top" field and "bottom" field, respectively. A progressive display is configured to concurrently display both the even field and the odd field of a frame, whereas an interlaced display is configured to display the two fields of a frame in sequence. Similarly, progressive encoding is configured to encode a frame using both fields, whereas interlaced encoding separately encodes the fields. - In at least one embodiment, the
encoder 116 leverages this dual-field representation of frames by employing at least two encoding modes, including a dual-field encoding mode and a single-field encoding mode, for encoding the sequence of frames represented by the input video stream 108. In the dual-field encoding mode, the encoder 116 encodes both the even field and the odd field of a frame so that the picture content of both the even field and the odd field is represented, in compressed form, in the encoded video stream 110. As part of this dual-field encoding mode, the encoder 116 can employ Picture Adaptive Frame Field (PAFF) encoding as found in, for example, the MPEG-4 AVC standard. PAFF encoding enables a frame to be encoded as either interlaced (e.g., each field separately encoded) or progressive (the two fields encoded together) based on a motion analysis between the two fields of the frame. In the single-field encoding mode, the encoder 116 encodes only a single field of each frame so that the picture content of only one field of each frame is represented in the encoded video stream. The picture content of the other field of the frame is disregarded by the encoder 116, which may instead insert skip information into the encoded video stream 110 in place of the other field, whereby the skip information references the field that was encoded. This approach halves the vertical resolution of the frame while maintaining the same horizontal resolution. As such, the single-field encoding mode can achieve a substantially lower bit rate compared to the dual-field encoding mode while maintaining the same horizontal resolution.
- The rate control module 118 dynamically determines and adjusts various encoding parameters used by the encoder 116 to achieve a target bit rate. In one embodiment, these encoding parameters include a control signal 128 (denoted "QP" in FIG. 1) to set the QP used by the quantization process of the encoder 116, as well as a control signal 130 (denoted "MODE" in FIG. 1) to select which of the two encoding modes is to be employed by the encoder 116. As described in greater detail below with reference to FIG. 2, the rate control module 118 continuously monitors the complexity of the pictures to be encoded, the bits allocated to encode the pictures, and a specified target bit rate (which may be constant or may vary with, for example, changing network bandwidth parameters) to determine the next value for QP, and signals this new QP value via the control signal 128.
- Generally, as QP increases, the degree of quantization implemented by the encoder 116 increases, which results in a lower degree of spatial resolution retained in the resulting encoded picture content, and vice versa. However, at very low target bit rates, the very high QP that might otherwise result can significantly degrade the quality of the resulting encoded picture content. Accordingly, in at least one embodiment, the rate control module 118 uses the QP to adaptively select the mode to be employed by the encoder 116. While the QP is within a range deemed acceptable, the rate control module 118 configures the control signal 130 to place the encoder 116 in the dual-field encoding mode so that both fields of each frame being processed are encoded for inclusion in the encoded video stream 110. However, when the QP is deemed to be excessively high (that is, at a level deemed to result in an unacceptable reduction in quality), the rate control module 118 configures the control signal 130 to switch the encoder 116 into the single-field encoding mode so that only a single field of each frame being processed is encoded for inclusion in the encoded video stream 110. As the single-field encoding mode reduces the vertical picture information that needs to be encoded for each frame, and thus reduces the amount of data needed to represent the frame in the encoded video stream 110, the rate control module 118 can also decrease the QP to take advantage of the additional bit rate headroom made available. Thus, the single-field encoding mode allows the encoder 116 to achieve the target bit rate with a lower QP and the same horizontal resolution, which typically facilitates a higher-quality decoded video compared to a conventional process whereby a higher QP is employed while including both fields of the frame in the encoded video stream. - To illustrate,
FIG. 1 also depicts a timeline 140 for an example portion 150 of the encoded video stream 110 to illustrate the dynamic switching between the dual-field encoding mode and the single-field encoding mode. At time T0, encoding conditions permit a higher target bit rate and lower QP, and thus the rate control module 118 configures the encoder 116 to switch to the dual-field encoding mode. While in this dual-field encoding mode, the encoder 116 encodes frames of a subsequence of the frames of the input video stream 108 to include the picture content of both the even and odd fields of each frame. For example, frame 0 is encoded to include the picture content of the even field 151 and the bottom field 152 of frame 0 in the encoded video stream 110, and this process continues for each frame up through frame J at time T1.
- Between time T0 and time T1, the rate control module 118 increases QP based on any of a variety of factors, such as increasing image complexity, decreasing target bit rate, etc. By time T1, QP has been increased to the point that it exceeds a specified threshold, and thus at time T1 the rate control module 118 configures the encoder 116 to switch to the single-field encoding mode. Moreover, with the additional bit rate headroom created by switching to the single-field encoding mode, the rate control module 118 decreases QP. While in the single-field encoding mode, the encoder 116 encodes frames of a subsequence of frames to include the picture content of only the even field of each frame. For example, the frame J+1 processed after time T1 is encoded to include the picture content of the even field 153 in the encoded video stream 110, while the picture content of the bottom field 154 of frame J+1 is disregarded. Thus, skip information 155 is included in the encoded video stream 110 in place of what would have been the encoded picture content of the bottom field 154. Rather than including picture content, this skip information 155 specifies an all-skip mode (that is, sets each macroblock (MB) in the bottom field 154 to skip mode) and references the even field 153 of the same frame. Alternatively, the odd field can be encoded and the even field disregarded during the single-field encoding mode. The single-field encoding process continues for each frame up through frame K at time T2.
- Between time T1 and time T2, the rate control module 118 decreases QP based on various factors, such as decreasing image complexity, increasing target bit rate, etc. By time T2, QP has been decreased to the point that it falls below a specified threshold, and thus at time T2 the rate control module 118 configures the encoder 116 to switch back to the dual-field encoding mode. As before, while in the dual-field encoding mode the encoder 116 encodes frames of a subsequence of frames to include the picture content of both the even and odd fields of each frame, such as by encoding the picture content of both the even field 156 and the odd field 157 in the encoded video stream 110 of a frame K+1 processed after time T2.
- In some implementations, the threshold used to trigger the switch from the dual-field encoding mode to the single-field encoding mode can be the same threshold as that used to trigger the switch from the single-field encoding mode to the dual-field encoding mode. However, this can lead to frequent switching between the two encoding modes. To reduce or eliminate perceptible quality deviations resulting from frequent switching between the dual-field encoding mode and the single-field encoding mode, a directional switch with two thresholds may be employed. For example, the QP threshold used to initiate a switch from the dual-field encoding mode to the single-field encoding mode may be higher than the QP threshold used to initiate a switch from the single-field encoding mode to the dual-field encoding mode. Moreover, the rate control module 118 can control the toggling frequency between modes to a degree by delaying a switch between modes until a specified switch point occurs or by implementing a minimum distance condition between mode switch points. Such switch points can include, for example, scene, group of pictures (GOP), or mini-GOP boundaries. The minimum distance condition can be specified as, for example, a minimum number of GOPs, mini-GOPs, or scene changes since the previous switch, a minimum lapse of time since the previous switch, and the like.
- As noted above, the video destination 106 can operate to decode and display the encoded video stream 110. To this end, the video destination 106 includes a decoder 160 and a display device 162. The decoder 160 operates to decode the encoded video stream 110 to generate a decoded video stream and then provide this decoded video stream to the display device 162. For those frames of a subsequence encoded under the dual-field encoding mode, such as frames 0, J, and K+1, the decoder 160 decodes the picture content of both the even field and the odd field of the frame from the encoded video stream 110 and displays the resulting decoded representation of the picture content of both fields, either concurrently for a progressive display or in sequence for an interlaced display. For those frames of a subsequence encoded under the single-field encoding mode, such as frames J+1, J+2, and K, the same decoding process may be used under conventional conditions, although the video content of the omitted field will not be present in the decoded result.
-
FIG. 2 illustrates an example implementation of the rate control module 118 in greater detail in accordance with at least one embodiment of the present disclosure. In the depicted example, the rate control module 118 includes a complexity estimation module 202, a bit allocation module 204, a virtual buffer model (VBM) 206, and a rate-quantization module 208.
- In operation, the encoder 116 employs a subtraction process and a motion estimation process for data representing macroblocks of pixel values for a picture to be encoded. The motion estimation process compares each of these new macroblocks with macroblocks in a previously stored reference picture or pictures to find the macroblock in a reference picture that most closely matches the new macroblock. The motion estimation process then calculates a motion vector, which represents the horizontal and vertical displacement from the macroblock being encoded to the matching macroblock-sized area in the reference picture. The motion estimation process also provides this matching macroblock (known as a predicted macroblock) out of the reference picture memory to the subtraction process, whereby it is subtracted, on a pixel-by-pixel basis, from the new macroblock entering the encoder. This forms an error prediction, or "residual", that represents the difference between the predicted macroblock and the actual macroblock being encoded. The encoder 116 employs a two-dimensional (2D) discrete cosine transform (DCT) to transform the residual from the spatial domain to the frequency domain. The resulting DCT coefficients of the residual are then quantized using the QP so as to reduce the number of bits needed to represent each coefficient. The quantized DCT coefficients then may be Huffman run/level coded to further reduce the average number of bits per coefficient. This is combined with motion vector data and other side information (including an indication of I, P, or B pictures) for insertion into the encoded video stream 110.
- For the case of P pictures, the quantized DCT coefficients also go to an internal loop that represents the operation of the decoder (a decoder within the encoder). The residual is inverse quantized and inverse DCT transformed. The predicted macroblock is read out of the reference picture memory, added back to the residual on a pixel-by-pixel basis, and stored back into a memory to serve as a reference for predicting subsequent pictures. The encoding of I pictures uses the same process, except that no motion estimation occurs and the negative (−) input to the subtraction process is forced to 0. In this case the quantized DCT coefficients represent transformed pixel values rather than residual values, as was the case for P and B pictures. As is the case for P pictures, decoded I pictures are stored as reference pictures.
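The prediction/quantization/reconstruction loop just described can be condensed into a short sketch. The transform stage is deliberately omitted (an identity transform stands in for the 2D DCT), blocks are flat lists rather than 2D macroblocks, and the quantizer step is a made-up value, so this shows only the shape of the loop:

```python
def encode_block(block, prediction, step=4):
    """P-picture style coding of one block: residual = block - prediction,
    quantize (transform stage omitted for brevity), then reconstruct the
    block the way the in-encoder decoder loop does, so it can be stored
    as a reference for predicting subsequent pictures."""
    residual = [b - p for b, p in zip(block, prediction)]
    levels = [round(r / step) for r in residual]     # quantization
    recon_res = [l * step for l in levels]           # inverse quantization
    reference = [p + r for p, r in zip(prediction, recon_res)]
    return levels, reference
```

With a step of 2, the example residual below survives quantization exactly, so the reconstructed reference equals the input block; larger steps trade that fidelity for fewer bits.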
- The rate-
quantization module 208 uses the image complexity and bit allocations as parameters for determining the QP, which in turn determines the degree of quantization performed by theencoder 116 and thus influences the bit rate of the resulting encoded video data. In one embodiment, the image complexity is estimated by thecomplexity estimation module 202, which calculates a mean average difference (MAD) of the residuals as an estimate of image complexity for picture data to be encoded. The MAD may be calculated using any of a variety of well-known algorithms. The bit allocations are represented by target numbers of bits that may be allocated at different granularities, such as per frame, picture, GOP, slice, or block. In one embodiment, theVBM 206 maintains a model of the buffer fullness of a modeled decoder receiving the encodedvideo stream 110 and the bit allocation module 204 determines the number of target bits to allocate based on the buffer fullness and a specified target bit rate, which can include a specific bit rate or a bit rate range, using any of a variety of well-known bit allocation algorithms. - The rate-
quantization module 208 uses the calculated MAD and the target bit allocation to calculate a value for QP that is expected to achieve the target bit rate when used to encode the picture data having the calculated MAD and target bit allocation. Any of a variety of well-known QP calculation techniques may be used to determine the value for QP. Moreover, the rate-quantization module 208 may employ a QP limiter to dampen any rapid changes in the QP value so as to provide stability and minimize perceptible variations in quality. The revised QP value is signaled to the encoder 116 via the control signal 128. - Moreover, as noted above, the rate-
quantization module 208 uses the relationship between QP and one or more specified thresholds to switch the encoder 116 between a single-field encoding mode (denoted as mode 210 in FIG. 2) and a dual-field encoding mode (denoted as mode 212 in FIG. 2) via the control signal 130. These specified thresholds may be programmed via software-observable registers or pre-configured via a read-only memory (ROM), fuses, one-time-programmable (OTP) registers, and the like. Moreover, the rate-quantization module 208 can use the control signal 130 to specify whether the encoder 116 is to employ PAFF in a PAFF sub-mode (denoted as sub-mode 214 in FIG. 2) or bypass PAFF in a non-PAFF sub-mode (denoted as sub-mode 216 in FIG. 2) while the encoder 116 is in the dual-field encoding mode. - In the event that the rate-
quantization module 208 determines to switch from the dual-field encoding mode to the single-field encoding mode responsive to the QP exceeding an upper threshold, the rate-quantization module 208 is configured to reduce the QP to take advantage of the bit rate headroom created by application of the single-field encoding mode to the encoded video stream 110. In one embodiment, the rate-quantization module 208 implements a fixed reduction to the QP upon switching to the single-field encoding mode. In other embodiments, one or both of the MAD and the target bit allocation are updated to reflect that only one of the two fields of each frame is to be encoded, and the rate-quantization module 208 updates the QP based on these updated input parameters. Thus, the rate-quantization module 208 can control the encoder 116 through the QP value and the encoding mode so as to provide better video quality at the original horizontal resolution while also meeting very low target bit rates. -
FIG. 3 illustrates an example method 300 for setting the encoding parameters of the encoder 116 of the video processing device 104 of FIGS. 1 and 2. Upon receiving the input video stream 108 at the video processing device 104 from the video source 102, the video processing device 104 initiates an encoding process using the encoder 116 to either encode or transcode the input video stream 108. Accordingly, at block 302 the rate-quantization module 208 determines encoding parameters for the encoder 116, including the QP to be implemented for the quantization process employed by the encoder 116, based on various parameters, including image complexity and target bit allocations. This process is repeated continuously to update the QP based on dynamically changing parameters. - With the QP determined, the rate-
quantization module 208 then turns to determining which encoding mode is to be employed by the encoder 116 based on the QP. Accordingly, at block 304 the rate-quantization module 208 compares the QP with a specified upper threshold to determine whether the QP is excessively high, and thus likely to significantly impact video quality. If the QP exceeds the upper threshold, the rate-quantization module 208 prepares to switch to the single-field encoding mode. To this end, at block 306 the rate-quantization module 208 delays the switch until the next encountered switch point. As noted above, the switch points can include, for example, scene, GOP, or mini-GOP boundaries, and a minimum distance condition may be instituted between mode switches. Once the next switch point is encountered and the minimum distance condition from the previous switch is satisfied, at block 308 the rate-quantization module 208 decreases the QP either by a fixed step or based on a recalculation of the QP in view of the reduced image complexity and increased target bit rate allocation that will occur in the single-field encoding mode. The rate-quantization module 208 then uses the control signal 130 to configure the encoder 116 to the single-field encoding mode at block 310. As described above, while in the single-field encoding mode, the encoder 116 encodes the picture data of a selected one of the two fields using the QP and includes the encoded picture data in the encoded video stream 110 at block 312, and discards or otherwise disregards the picture data of the non-selected one of the two fields at block 314. - Returning to block 304, if the QP does not exceed the upper threshold, at
block 316 the rate-quantization module 208 determines whether the QP falls below a specified lower threshold. If the QP is between the lower threshold and the upper threshold, at block 318 the rate-quantization module 208 maintains the encoder 116 in its current encoding mode. Otherwise, if the QP is below the lower threshold, the rate-quantization module 208 prepares to switch the encoder 116 to the dual-field encoding mode. Accordingly, at block 320 the rate-quantization module 208 delays the switch until the next switch point is encountered and any minimum distance condition is met. Thereafter, at block 322 the rate-quantization module 208 reconfigures the encoder 116 to switch to the dual-field encoding mode and increases the QP either by a fixed step or based on a recalculation of the QP in a similar manner as described above with reference to block 308. As noted above, while in the dual-field encoding mode, the encoder 116 encodes both fields of each frame of the frame subsequence being processed. The dual-field encoding mode may employ PAFF such that the motion between fields of a frame, or another measure of complexity, is used to determine whether to encode both fields of a frame together in a frame-based encoding mode at block 324, or to encode each field separately in a field-based encoding mode at block 326. - The QP is continuously updated at
block 302 based on dynamic changes in the parameters used to calculate the QP, such as the complexity of the particular pictures to be encoded, fluctuations in the target bit rate (due to, for example, fluctuations in the bandwidth of a network link), and the like. The video processing device 104 can repeat the process represented by blocks 304-326 based on the updated QP so as to dynamically adapt the encoder 116 to varying encoding limitations while attempting to maintain the original horizontal resolution in a manner that provides high video quality at lower bit rates. - In this document, relational terms such as “first” and “second”, and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual relationship or order between such entities or actions or any actual relationship or order between such entities and claimed elements. The term “another”, as used herein, is defined as at least a second or more. The terms “including”, “having”, or any variation thereof, as used herein, are defined as comprising.
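Returning to the rate-control method of FIG. 3, the QP damping and the threshold-driven mode selection of blocks 304-326 can be condensed into a brief sketch. All constants (the thresholds, the step limit, the 0-51 QP range, the minimum-distance value) and the mode names are illustrative assumptions for the example, not values from the disclosure; a simple frame counter stands in for the switch-point and minimum-distance gating.

```python
# Illustrative sketch of the adaptive mode-selection logic described
# above; every constant here is an assumed example value.

def limit_qp(prev_qp, new_qp, max_step=2):
    """Dampen rapid QP changes to avoid visible quality swings;
    0..51 is the H.264 QP range."""
    step = max(-max_step, min(max_step, new_qp - prev_qp))
    return max(0, min(51, prev_qp + step))

def next_mode(qp, mode, frames_since_switch,
              upper=40, lower=28, min_distance=30):
    """Pick single-field vs dual-field encoding; returns (mode, switched).

    QP above `upper` pushes toward single-field mode (dropping one
    field creates bit-rate headroom); QP below `lower` restores
    dual-field mode; QP between the thresholds holds the current mode."""
    if frames_since_switch < min_distance:
        return mode, False              # too soon after the last switch
    if qp > upper and mode != "single":
        return "single", True
    if qp < lower and mode != "dual":
        return "dual", True
    return mode, False
```

The band between the two thresholds provides hysteresis which, together with the switch-point and minimum-distance gating, prevents rapid oscillation between encoding modes.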
- Other embodiments, uses, and advantages of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure. The specification and drawings should be considered as examples only, and the scope of the disclosure is accordingly intended to be limited only by the following claims and equivalents thereof.
- Note that not all of the activities or elements described above in the general description are required, that a portion of a specific activity or device may not be required, and that one or more further activities may be performed, or elements included, in addition to those described. Still further, the order in which activities are listed is not necessarily the order in which they are performed.
- Also, the concepts have been described with reference to specific embodiments. However, one of ordinary skill in the art appreciates that various modifications and changes can be made without departing from the scope of the present disclosure as set forth in the claims below. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of the present disclosure.
- Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments. However, the benefits, advantages, solutions to problems, and any feature(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature of any or all the claims.
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/705,422 US9560361B2 (en) | 2012-12-05 | 2012-12-05 | Adaptive single-field/dual-field video encoding |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/705,422 US9560361B2 (en) | 2012-12-05 | 2012-12-05 | Adaptive single-field/dual-field video encoding |
Publications (2)
Publication Number | Publication Date |
---|---|
US20140153640A1 true US20140153640A1 (en) | 2014-06-05 |
US9560361B2 US9560361B2 (en) | 2017-01-31 |
Family
ID=50825436
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/705,422 Active 2034-03-06 US9560361B2 (en) | 2012-12-05 | 2012-12-05 | Adaptive single-field/dual-field video encoding |
Country Status (1)
Country | Link |
---|---|
US (1) | US9560361B2 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11363266B2 (en) * | 2017-11-09 | 2022-06-14 | Amimon Ltd. | Method and system of performing inter-frame prediction in video compression |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020009148A1 (en) * | 1997-07-15 | 2002-01-24 | Toshiroh Nishio | Progressive image signal transmitter, progressive image signal receiver and, medium |
US20050053302A1 (en) * | 2003-09-07 | 2005-03-10 | Microsoft Corporation | Interlace frame lapped transform |
US20050053300A1 (en) * | 2003-09-07 | 2005-03-10 | Microsoft Corporation | Bitplane coding of prediction mode information in bi-directionally predicted interlaced pictures |
US20050084006A1 (en) * | 2003-10-16 | 2005-04-21 | Shawmin Lei | System and method for three-dimensional video coding |
US20050111740A1 (en) * | 2003-09-05 | 2005-05-26 | Hiroyuki Sakuyama | Coding apparatus, coding method, program and information recording medium that can suppress unnatural degradation of image quality |
US20050152448A1 (en) * | 2003-09-07 | 2005-07-14 | Microsoft Corporation | Signaling for entry point frames with predicted first field |
US20110090956A1 (en) * | 2009-10-15 | 2011-04-21 | Sony Corporation | Compression method using adaptive field data selection |
US20110255594A1 (en) * | 2010-04-15 | 2011-10-20 | Soyeb Nagori | Rate Control in Video Coding |
US20130107961A1 (en) * | 2011-10-28 | 2013-05-02 | Fujitsu Limited | Video transcoder and video transcoding method |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5486863A (en) | 1994-04-29 | 1996-01-23 | Motorola, Inc. | Method for determining whether to intra code a video block |
US5610659A (en) | 1995-05-08 | 1997-03-11 | Futuretel, Inc. | MPEG encoder that concurrently determines video data encoding format and rate control |
US5978029A (en) | 1997-10-10 | 1999-11-02 | International Business Machines Corporation | Real-time encoding of video sequence employing two encoders and statistical analysis |
EP0917362A1 (en) | 1997-11-12 | 1999-05-19 | STMicroelectronics S.r.l. | Macroblock variance estimator for MPEG-2 video encoder |
US7299370B2 (en) | 2003-06-10 | 2007-11-20 | Intel Corporation | Method and apparatus for improved reliability and reduced power in a processor by automatic voltage control during processor idle states |
US8787447B2 (en) | 2008-10-30 | 2014-07-22 | Vixs Systems, Inc | Video transcoding system with drastic scene change detection and method for use therewith |
US9407925B2 (en) | 2008-10-30 | 2016-08-02 | Vixs Systems, Inc. | Video transcoding system with quality readjustment based on high scene cost detection and method for use therewith |
- 2012-12-05: US application US13/705,422 filed; granted as US9560361B2 (status: Active)
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150222905A1 (en) * | 2012-08-27 | 2015-08-06 | Thomson Licensing | Method and apparatus for estimating content complexity for video quality assessment |
US10536703B2 (en) * | 2012-08-27 | 2020-01-14 | Interdigital Ce Patent Holdings | Method and apparatus for video quality assessment based on content complexity |
US9978156B2 (en) * | 2012-10-03 | 2018-05-22 | Avago Technologies General Ip (Singapore) Pte. Ltd. | High-throughput image and video compression |
US20140247983A1 (en) * | 2012-10-03 | 2014-09-04 | Broadcom Corporation | High-Throughput Image and Video Compression |
US20140278718A1 (en) * | 2013-03-15 | 2014-09-18 | International Business Machines Corporation | Enhanced time-management and recommendation system |
US20140278678A1 (en) * | 2013-03-15 | 2014-09-18 | International Business Machines Corporation | Enhanced time-management and recommendation system |
US10271050B2 (en) * | 2015-01-05 | 2019-04-23 | Samsung Electronics Co., Ltd. | Methods, systems and devices including an encoder for image processing |
KR20160084072A (en) * | 2015-01-05 | 2016-07-13 | 삼성전자주식회사 | Method for operating of encoder, and devices having the encoder |
CN105763879A (en) * | 2015-01-05 | 2016-07-13 | 三星电子株式会社 | Methods, Systems And Devices Including Encoder For Image Processing |
US20160198156A1 (en) * | 2015-01-05 | 2016-07-07 | Young Beom Jung | Methods, systems and devices including an encoder for image processing |
TWI688260B (en) * | 2015-01-05 | 2020-03-11 | 南韓商三星電子股份有限公司 | Methods, systems, and devices including an encoder for image processing |
KR102365685B1 (en) * | 2015-01-05 | 2022-02-21 | 삼성전자주식회사 | Method for operating of encoder, and devices having the encoder |
US10332534B2 (en) | 2016-01-07 | 2019-06-25 | Microsoft Technology Licensing, Llc | Encoding an audio stream |
US11159796B2 (en) | 2017-01-18 | 2021-10-26 | SZ DJI Technology Co., Ltd. | Data transmission |
US11172010B1 (en) * | 2017-12-13 | 2021-11-09 | Amazon Technologies, Inc. | Managing encoder updates |
US12021911B2 (en) | 2017-12-13 | 2024-06-25 | Amazon Technologies, Inc. | Managing encoder updates |
US12395539B2 (en) | 2017-12-13 | 2025-08-19 | Amazon Technologies, Inc. | Managing encoder updates |
US20220191508A1 (en) * | 2020-12-14 | 2022-06-16 | Comcast Cable Communications, Llc | Methods and systems for improved content encoding |
US12015783B2 (en) * | 2020-12-14 | 2024-06-18 | Comcast Cable Communications, Llc | Methods and systems for improved content encoding |
Also Published As
Publication number | Publication date |
---|---|
US9560361B2 (en) | 2017-01-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9560361B2 (en) | Adaptive single-field/dual-field video encoding | |
US9426475B2 (en) | Scene change detection using sum of variance and estimated picture encoding cost | |
US9565440B2 (en) | Quantization parameter adjustment based on sum of variance and estimated picture encoding cost | |
US10171824B2 (en) | System and method for adaptive frame re-compression in video processing system | |
KR100850705B1 (en) | Method for adaptive encoding motion image based on the temperal and spatial complexity and apparatus thereof | |
JP6272321B2 (en) | Use of chroma quantization parameter offset in deblocking | |
TWI399097B (en) | System and method for encoding video, and computer readable medium | |
KR101012600B1 (en) | Rate control using image-based lookahead window | |
US8077775B2 (en) | System and method of adaptive rate control for a video encoder | |
KR100850706B1 (en) | Method for adaptive encoding and decoding motion image and apparatus thereof | |
CN110740318A (en) | Automatic adaptive long-term reference frame selection for video processing and video coding | |
US20100027663A1 (en) | Intellegent frame skipping in video coding based on similarity metric in compressed domain | |
JP2019526195A (en) | Digital frame encoding / decoding by downsampling / upsampling with improved information | |
EP2965518B1 (en) | Resource for encoding a video signal | |
KR20140110221A (en) | Video encoder, method of detecting scene change and method of controlling video encoder | |
WO2007012928A1 (en) | Method, module, device and system for rate control provision for video encoders capable of variable bit rate encoding | |
US8948242B2 (en) | Encoding device and method and multimedia apparatus including the encoding device | |
US7899121B2 (en) | Video encoding method, video encoder, and personal video recorder | |
US9955168B2 (en) | Constraining number of bits generated relative to VBV buffer | |
US20150163486A1 (en) | Variable bitrate encoding | |
JP2005260935A (en) | Method and apparatus for increasing average image refresh rate in a compressed video bitstream | |
KR20050024732A (en) | H.263/MPEG Video Encoder for Effective Bits Rate Control and Its Control Method | |
US20150163484A1 (en) | Variable bitrate encoding for multiple video streams | |
US20130077674A1 (en) | Method and apparatus for encoding moving picture | |
KR20120008314A (en) | An apparatus and method for encoding / decoding video using adaptive macroblock size control and subblock depth control based on image characteristics and context |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: VIXS SYSTEMS INC., CANADA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ZHAO, XU GANG;LI, YING;REEL/FRAME:029408/0755 Effective date: 20121205 |
|
FEPP | Fee payment procedure |
Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
FEPP | Fee payment procedure |
Free format text: PAT HOLDER NO LONGER CLAIMS SMALL ENTITY STATUS, ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: STOL); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 4 |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 8 |