US20160360205A1 - Video encoding methods and systems using adaptive color transform - Google Patents
- Publication number: US20160360205A1 (application US 14/757,556)
- Authority: US (United States)
- Prior art keywords
- coding
- unit
- threshold
- transform
- color space
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals; assigned subgroups:
- H04N19/46—Embedding additional information in the video signal during the compression process
- H04N19/102—Adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/124—Quantisation
- H04N19/129—Scanning of coding units, e.g. zig-zag scan of transform coefficients or flexible macroblock ordering [FMO]
- H04N19/134—Adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/136—Incoming video signal characteristics or properties
- H04N19/14—Coding unit complexity, e.g. amount of activity or edge presence estimation
- H04N19/15—Data rate or code amount at the encoder output by monitoring actual compressed data size at the memory before deciding storage at the transmission buffer
- H04N19/154—Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
- H04N19/16—Assigned coding mode predefined or preselected for a given display mode, e.g. for interlaced or progressive display mode
- H04N19/176—Adaptive coding where the coding unit is an image region, the region being a block, e.g. a macroblock
- H04N19/182—Adaptive coding where the coding unit is a pixel
- H04N19/186—Adaptive coding where the coding unit is a colour or a chrominance component
- H04N19/593—Predictive coding involving spatial prediction techniques
- H04N19/61—Transform coding in combination with predictive coding
- H04N19/96—Tree coding, e.g. quad-tree coding
Definitions
- This disclosure generally relates to methods and systems for video encoding and decoding.
- HEVC: High Efficiency Video Coding
- MPEG: Moving Picture Experts Group
- VCEG: Video Coding Experts Group
- AVC: Advanced Video Coding
- HEVC utilizes various coding tools, including inter prediction and intra prediction techniques to compress video during coding.
- Inter prediction techniques utilize temporal redundancies between different video frames in a video stream to compress video data.
- A video frame currently being encoded may utilize portions of previously encoded and decoded video frames containing similar content. These portions may be used to predict the encoding of areas of the current video frame containing similar content.
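As an illustrative sketch (not part of the patent disclosure), the search for similar content in a previously decoded frame can be modeled as block matching: the encoder scans a window of the reference frame for the candidate block minimizing a distortion metric such as the sum of absolute differences (SAD). The function names and the full-search strategy below are our own simplifications; a real HEVC encoder uses faster search patterns and sub-pel refinement.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def best_match(cur_block, ref_frame, top, left, search_range=2):
    """Full search around (top, left) in ref_frame for the block that
    minimizes SAD against cur_block. Returns ((dy, dx), cost)."""
    n = len(cur_block)
    h, w = len(ref_frame), len(ref_frame[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= h - n and 0 <= x <= w - n:
                cand = [row[x:x + n] for row in ref_frame[y:y + n]]
                cost = sad(cur_block, cand)
                if best_cost is None or cost < best_cost:
                    best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost
```

The winning (dy, dx) is the motion vector; only it and the (small) residual need to be coded.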
- Intra prediction, by contrast, utilizes only video data within the currently encoded video frame to compress video data. No temporal redundancies between different video frames are employed in intra prediction techniques.
- Encoding of a current video frame may thus utilize other portions of the same frame.
- Intra prediction features 35 intra modes: a Planar mode, a DC mode, and 33 directional modes.
- HEVC also partitions and divides each input video frame more extensively than AVC.
- AVC relies only on macroblock division of an input video frame for its encoding and decoding.
- HEVC may divide an input video frame into various data units and blocks that are sized differently, as will be described in more detail below. This aspect of HEVC provides improved flexibility in the encoding and decoding of video frames featuring large amounts of motion, detail, and edges, for example, and allows for efficiency gains over AVC.
- SCC: Screen Content Coding
- Example applications implicating SCC may include screen mirroring, cloud gaming, wireless display of content, displays generated during remote computer desktop access, and screen sharing, such as real-time screen sharing during video conferencing.
- The ACT is a color space transform applied to residue pixel samples of a coding unit (CU). For certain color spaces, correlations between the color components of a pixel within a CU are present. When the correlation between color components of a pixel is high, performing the ACT on the pixel may help concentrate the energy of the correlated color components by de-correlating them. Such concentrated energy allows for more efficient coding and decreased coding cost. Thus, the ACT may improve coding performance during HEVC coding.
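For illustration (the patent text does not restate the transform here), the reversible color transform adopted for ACT in HEVC SCC is the YCgCo-R lifting transform, which de-correlates (G, B, R) residual samples using integer adds and shifts and is exactly invertible. The sketch below applies it to a single residual sample; the function names are ours.

```python
def act_forward(g, b, r):
    """YCgCo-R forward lifting transform on one (G, B, R) residual sample."""
    co = r - b
    t = b + (co >> 1)   # >> is an arithmetic shift, matching the spec
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co

def act_inverse(y, cg, co):
    """Exact inverse of act_forward: recovers (G, B, R) losslessly."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = co + b
    return g, b, r
```

Because the lifting steps are integer and mirror each other, forward followed by inverse reproduces the input exactly, which is what makes the transform usable for lossless coding.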
- Evaluating whether to enable the ACT requires an additional rate distortion optimization (RDO) check during encoding, where the RDO check evaluates the rate distortion (RD) cost of the coding mode with ACT enabled.
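The RDO check can be summarized by the standard Lagrangian cost J = D + λ·R: distortion plus λ times the bit cost, with λ derived from the quantization parameter. A minimal sketch (the function names and the SSE distortion choice are ours):

```python
def sse(src, rec):
    """Sum of squared errors between source and reconstructed samples."""
    return sum((s - r) ** 2 for s, r in zip(src, rec))

def rd_cost(distortion, bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * bits
```

The mode with the lowest J wins; enabling ACT is worthwhile only when its de-correlation saves enough bits to offset any distortion change.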
- One aspect of the present disclosure is directed to a video encoding method.
- The method includes receiving a source video frame; dividing the source video frame into a coding tree unit; determining a coding unit from the coding tree unit; determining a correlation between components of the coding unit; enabling or disabling a coding mode of the coding unit; determining whether to evaluate a size of a transform unit for an enabled coding mode; and determining a transform unit of the coding unit for the enabled coding mode, wherein the size of the coding unit is defined by a number (N) of samples.
- Another aspect of the present disclosure is directed to a video encoding system. The system includes a memory storing instructions and a processor.
- The instructions, when executed by the processor, cause the processor to: receive a source video frame; divide the source video frame into a coding tree unit; determine a coding unit from the coding tree unit; determine a correlation between components of the coding unit; enable or disable a coding mode of the coding unit; determine whether to evaluate a size of a transform unit for an enabled coding mode; and determine a transform unit of the coding unit for the enabled coding mode, wherein the size of the coding unit is defined by a number (N) of samples.
- Yet another aspect of the present disclosure is directed to a non-transitory computer-readable storage medium storing a set of instructions.
- The instructions, when executed by one or more processors, cause the one or more processors to perform a method of video encoding.
- The method of video encoding includes: receiving a source video frame; dividing the source video frame into a coding tree unit; determining a coding unit from the coding tree unit; determining a correlation between components of the coding unit; enabling or disabling a coding mode of the coding unit; determining whether to evaluate a size of a transform unit for an enabled coding mode; and determining a transform unit of the coding unit for the enabled coding mode, wherein the size of the coding unit is defined by a number (N) of samples.
- FIGS. 1A-1J illustrate a video frame and related partitions of the video frame according to embodiments of the present disclosure.
- FIG. 2 shows an exemplary video encoder consistent with the present disclosure.
- FIG. 3 shows a flow chart of an encoding method according to an exemplary embodiment of the present disclosure.
- FIG. 4 shows a flow chart of an encoding method according to another exemplary embodiment of the present disclosure.
- FIG. 5 shows a flow chart of an encoding method according to another exemplary embodiment of the present disclosure.
- FIG. 6 shows a flow chart of an encoding method according to another exemplary embodiment of the present disclosure.
- FIG. 7 shows a system for performing encoding and decoding methods and processes consistent with the present disclosure.
- FIGS. 1A-1J illustrate a video frame and related partitions of the video frame according to embodiments of the present disclosure.
- FIG. 1A shows a video frame 101 that includes a number of pixels located at locations within the video frame.
- Video frame 101 is partitioned into coding tree units (CTUs) 102.
- Each CTU 102 is sized according to L vertical samples by L horizontal samples (L × L), where each sample corresponds to a pixel value located at a different pixel location in the CTU.
- L may equal 16, 32, or 64 samples.
- Pixel locations may be locations where pixels are present in the CTU, or locations between where pixels are present in the CTU. When a pixel location is between where pixels are present, the pixel value is an interpolated value determined from pixels located at one or more spatial locations around the pixel location.
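As a sketch of how an interpolated value can be determined from surrounding pixels, the example below uses simple bilinear weighting of the four nearest integer-position pixels. HEVC's actual interpolation filters are longer separable FIR filters (8-tap for luma), so this is illustrative only; the function name is ours.

```python
def bilinear(frame, y, x):
    """Interpolate the pixel value at fractional location (y, x)
    from the four surrounding integer-position pixels."""
    y0, x0 = int(y), int(x)
    fy, fx = y - y0, x - x0          # fractional offsets in [0, 1)
    p00 = frame[y0][x0]
    p01 = frame[y0][x0 + 1]
    p10 = frame[y0 + 1][x0]
    p11 = frame[y0 + 1][x0 + 1]
    return ((1 - fy) * (1 - fx) * p00 + (1 - fy) * fx * p01
            + fy * (1 - fx) * p10 + fy * fx * p11)
```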
- Each CTU 102 includes a luma coding tree block (CTB), chroma CTBs, and associated syntax.
- FIG. 1B shows CTBs that may be contained by a CTU 102 of FIG. 1A.
- CTU 102 may include a luma CTB 103 and chroma CTBs 104 (Cb CTB) and 105 (Cr CTB).
- CTU 102 also may include associated syntax 106.
- The Cb CTB 104 is the blue-difference chroma component CTB and represents changes in blue colorfulness for the CTB.
- The Cr CTB 105 is the red-difference chroma component CTB and represents changes in red colorfulness for the CTB.
- Associated syntax 106 contains information as to how CTBs 103, 104, and 105 are to be coded, and any further subdivision of CTBs 103, 104, and 105.
- CTBs 103, 104, and 105 may have the same size as CTU 102.
- Alternatively, luma CTB 103 may have the same size as CTU 102, while chroma CTBs 104 and 105 may have sizes smaller than CTU 102.
- Coding tools such as intra prediction, inter prediction, and others, operate on coding blocks (CBs).
- CTBs may be partitioned into one or multiple CBs. Partitioning of CTBs into CBs is based on quad-tree splitting. Thus, a CTB may be partitioned into four CBs, where each CB may be further partitioned into four CBs. This partitioning may be continued based on the size of the CTB being partitioned.
- FIG. 1C shows various partitionings of the luma CTB 103 of FIG. 1B into one or multiple luma CBs 107-1, 107-2, 107-3, or 107-4.
- A corresponding luma CB 107 may be sized as N vertical by N horizontal (N × N) samples, such as 64 × 64, 32 × 32, 16 × 16, or 8 × 8.
- In this example, luma CTB 103 is sized as 64 × 64; it may alternatively be sized as 32 × 32 or 16 × 16.
- FIG. 1D shows an example of quadtree partitioning of luma CTB 103 of FIG. 1B, wherein luma CTB 103 is partitioned into the CBs 107-1, 107-2, 107-3, or 107-4 shown in FIG. 1C.
- Here, luma CTB 103 is partitioned into four 32 × 32 CBs, labeled 107-2.
- Each 32 × 32 CB may further be partitioned into four 16 × 16 CBs, labeled 107-3.
- Each 16 × 16 CB may then be partitioned into four 8 × 8 CBs, labeled 107-4.
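The quad-tree splitting described above can be sketched as a recursion that either keeps a block whole or splits it into four equal quadrants. The split-decision predicate below stands in for the encoder's RD-based choice and is our own abstraction, not patent text.

```python
def quadtree_partition(top, left, size, should_split, min_size=8):
    """Recursively split a block into four quadrants while should_split
    says to and the block is above min_size. Returns leaf blocks as a
    list of (top, left, size) tuples."""
    if size > min_size and should_split(top, left, size):
        half = size // 2
        leaves = []
        for dy in (0, half):
            for dx in (0, half):
                leaves += quadtree_partition(top + dy, left + dx, half,
                                             should_split, min_size)
        return leaves
    return [(top, left, size)]
```

Splitting a 64 × 64 CTB all the way down yields 4, then 16, then 64 leaves, matching the 32 × 32, 16 × 16, and 8 × 8 CB levels above.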
- Coding units (CUs) are utilized to code CBs.
- A CTB contains only one CU or is divided to contain multiple CUs.
- A CU may also be sized as N vertical by N horizontal (N × N) samples, such as 64 × 64, 32 × 32, 16 × 16, or 8 × 8.
- Each CU contains a luma CB, two chroma CBs, and associated syntax.
- A residual CU formed during encoding and decoding may be sized the same as the CU to which it corresponds.
- FIG. 1E shows CBs, including, for example, luma CB 107-1 of FIG. 1C, that may be contained by a CU 108.
- CU 108 may include luma CB 107-1 and chroma CBs 109 (Cb CB) and 110 (Cr CB).
- CU 108 may also include associated syntax 111.
- Associated syntax 111 contains information as to how CBs 107-1, 109, and 110 are to be encoded, such as quadtree syntax that specifies the size and positions of luma and chroma CBs, and any further subdivision.
- Each CU 108 may have an associated partition of its CBs 107-1, 109, and 110 into prediction blocks (PBs). PBs are aggregated into prediction units (PUs).
- FIG. 1F shows alternative partitionings of CB 107-1 of FIG. 1D into luma PBs 112.
- CB 107-1 may, for example, be partitioned into PBs 112 depending on the predictability of the different areas of CB 107-1.
- CB 107-1 may contain a single PB 112 sized the same as CB 107-1.
- CB 107-1 may be partitioned vertically or horizontally into two even PBs 112, or vertically or horizontally into four PBs 112. It is noted that the partitions shown in FIG. 1F are exemplary, and any other kinds of partitions into PBs allowable under the HEVC standard are contemplated by the present disclosure. Furthermore, the different partitions of CB 107-1 into PBs 112 shown in FIG. 1F are mutually exclusive. As an example, in an intra prediction mode in HEVC, 64 × 64, 32 × 32, and 16 × 16 CBs may be partitioned only into a single PB sized the same as the CB, while 8 × 8 CBs may be partitioned into one 8 × 8 PB or four 4 × 4 PBs.
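The intra-prediction PB rule just stated can be captured in a small lookup; the function name and the list-of-sizes return encoding are ours, not HEVC syntax.

```python
def allowed_intra_pb_partitions(cb_size):
    """Allowed intra PB partitions for a square CB of cb_size samples:
    64/32/16 CBs take a single PB of the same size (PART_2Nx2N);
    an 8x8 CB may use one 8x8 PB or four 4x4 PBs (PART_NxN)."""
    if cb_size in (64, 32, 16):
        return [[cb_size]]
    if cb_size == 8:
        return [[8], [4, 4, 4, 4]]
    raise ValueError("unsupported intra CB size: %d" % cb_size)
```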
- A residual signal generated from the difference between the prediction block and the source video image block is transformed to another domain for further coding, using transforms such as the discrete cosine transform (DCT) or discrete sine transform (DST).
- One or more transform blocks (TBs) are utilized for each CU or each CB.
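To illustrate the residual transform step, a textbook floating-point 1-D DCT-II is sketched below; HEVC itself specifies integer approximations of the DCT (and a DST for 4 × 4 intra luma blocks), so this is a conceptual stand-in, and the function name is ours. A constant residual compacts entirely into the DC coefficient, which is the energy-compaction property that motivates transform coding.

```python
import math

def dct_ii(samples):
    """Orthonormal 1-D DCT-II of a list of residual samples."""
    n = len(samples)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(
            x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, x in enumerate(samples)))
    return out
```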
- FIG. 1G shows how luma CB 107-1 of FIG. 1E or 1F may be partitioned into different TBs 113-1, 113-2, 113-3, and 113-4.
- In this example, CB 107-1 is a 64 × 64 CB, TB 113-1 is a 32 × 32 TB, TB 113-2 is a 16 × 16 TB, TB 113-3 is an 8 × 8 TB, and TB 113-4 is a 4 × 4 TB.
- If fully partitioned at a single TB size, CB 107-1 would be partitioned into four TBs 113-1, sixteen TBs 113-2, sixty-four TBs 113-3, or two hundred fifty-six TBs 113-4.
- A CB 107-1 may be partitioned into TBs 113 all of the same size, or of different sizes.
- Partitioning of CBs into TBs is based on quad-tree splitting.
- A CB may be partitioned into one or multiple TBs, where each TB may be further partitioned into four TBs. This partitioning may be continued based on the size of the CB being partitioned.
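The TB counts given above (four, sixteen, sixty-four, two hundred fifty-six) follow from area alone: a 64 × 64 CB tiled by TBs of size s contains (64 / s)² of them. A one-line check (the helper name is ours):

```python
def tb_count(cb_size, tb_size):
    """Number of tb_size x tb_size TBs that tile a cb_size x cb_size CB."""
    return (cb_size // tb_size) ** 2
```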
- FIG. 1H shows an example of quadtree partitioning of luma CB 107-1 of FIG. 1E or 1F, utilizing the various partitionings into TBs 113-1, 113-2, 113-3, or 113-4 shown in FIG. 1G.
- In this example, luma CB 107-1 is sized as 64 × 64; it may alternatively be sized as 32 × 32 or 16 × 16.
- Luma CB 107-1 is partitioned into four 32 × 32 TBs, labeled 113-1.
- Each 32 × 32 TB may further be partitioned into four 16 × 16 TBs, labeled 113-2.
- Each 16 × 16 TB may then be partitioned into four 8 × 8 TBs, labeled 113-3.
- Each 8 × 8 TB may then be partitioned into four 4 × 4 TBs, labeled 113-4.
- TBs 113 are then transformed via, for example, a DCT, or any other transform contemplated by the HEVC standard.
- Transform units (TUs) aggregate TBs 113 .
- One or more TBs are utilized for each CB.
- CBs form each CU.
- The transform unit (TU) structure is different for different CUs 108 and is determined from each CU 108.
- FIG. 1I shows alternative partitionings 113-1, 113-2, 113-3, and 113-4 of a TU 114, where each TU aggregates partitioned TBs of FIG. 1G or 1H.
- A 32 × 32 TU 114 can hold a single TB 113-1 sized 32 × 32, or one or more TBs 113 sized 16 × 16 (113-2), 8 × 8 (113-3), or 4 × 4 (113-4).
- The TU may be larger than the PU, such that the TU may contain PU boundaries. However, the TU may not cross PU boundaries for a CU enabling intra prediction in HEVC.
- FIG. 1J shows an example of quadtree partitioning of TU 114 of FIG. 1I, utilizing the various partitionings into TBs 113-1, 113-2, 113-3, or 113-4 shown in FIG. 1I.
- In this example, TU 114 is sized as 32 × 32; it may alternatively be sized as 16 × 16, 8 × 8, or 4 × 4.
- TU 114 is partitioned into one TB 113-1 sized 32 × 32, and four 16 × 16 TBs labeled 113-2. Each 16 × 16 TB may further be partitioned into four 8 × 8 TBs, labeled 113-3. Each 8 × 8 TB may then be partitioned into four 4 × 4 TBs, labeled 113-4.
- Each of the units and blocks described above may include any features, sizes, and properties in accordance with the HEVC standard.
- The partitioning shown in FIGS. 1C, 1E, and 1F also applies to the chroma CTBs 104 (Cb CTB) and 105 (Cr CTB), and chroma CBs 109 (Cb CB) and 110 (Cr CB).
- FIG. 2 shows an exemplary video encoder 200 for performing encoding methods consistent with the present disclosure.
- Video encoder 200 may include one or more additional components that provide additional encoding functions contemplated by HEVC-SCC, such as palette mode, sample adaptive offset, and de-blocking filtering. Additionally, the present disclosure contemplates intra prediction mode enabling ACT, as well as other coding modes, such as inter prediction enabling ACT.
- An input source video frame is received by encoder 200 .
- The input source frame is first input into a Frame Dividing Module 202, in which the frame is divided into at least one source CTU.
- A source CU is then determined from the source CTU.
- Source CTU sizes and source CU sizes are determined by Frame Dividing Module 202.
- Encoding then takes place on a CU-by-CU basis, with source CUs output by Frame Dividing Module 202 input into the Inter Prediction enabling adaptive color transform (ACT) Module 204, Inter Prediction disabling ACT Module 206, Intra Prediction enabling ACT Module 212, and Intra Prediction disabling ACT Module 214.
- Source CUs of the input frame are encoded by Inter Prediction enabling ACT Module 204 , in which a prediction of a source CU from the input frame is determined using inter prediction techniques with adaptive color transformation enabled.
- Source CUs of the input frame are also encoded by Inter Prediction disabling ACT Module 206 , in which a prediction of a source CU from the input frame is determined using inter prediction techniques without ACT enabled, i.e., ACT is disabled.
- Reference CUs from frames in a Frame Buffer 208 are utilized during the inter frame prediction.
- Source PUs and PBs are also determined from source CUs and utilized during the inter frame prediction by Modules 204 and 206.
- Inter frame prediction utilizes motion estimation from regions of different temporally located video frames.
- The encoded inter prediction CUs from Modules 204 and 206 that result in the highest picture quality are determined.
- The encoded inter prediction CUs are then input into a Mode Decision Module 210.
- Source CUs of the input frame are also encoded by Intra Prediction enabling ACT Module 212 , in which a prediction of a source CU from the input frame is determined using intra prediction techniques with adaptive color transform.
- Source CUs of the input frame are also encoded by Intra Prediction disabling ACT Module 214 , in which a prediction of a source CU from the input frame is determined using intra prediction techniques without adaptive color transform, i.e., ACT is disabled.
- Source CUs from the same frame located in Frame Buffer 208 are utilized during the intra frame prediction by Modules 212 and 214 .
- Source PUs and PBs are also determined from source CUs and utilized during the intra frame prediction by Modules 212 and 214 .
- Encoded intra prediction CUs that result in the highest picture quality are determined.
- The encoded intra prediction CUs from Modules 212 and 214 are input into Mode Decision Module 210 .
- In Mode Decision Module 210 , the costs of encoding the source CUs using inter prediction enabling ACT, inter prediction disabling ACT, intra prediction disabling ACT, and intra prediction enabling ACT are compared, along with the quality of each of the predicted CUs. A determination is then made as to which encoding mode prediction CU, such as an intra prediction CU or an inter prediction CU, should be selected based on the comparison. The selected prediction CU is then sent to Summing Modules 216 and 218 .
- At Summing Module 216 , the selected prediction CU is subtracted from the source CU version of itself, providing a residual CU. If the selected prediction CU is from one of Inter Prediction enabling ACT Module 204 or Intra Prediction enabling ACT Module 212 , switch 220 is moved to position A. In position A, the residual CU is input into ACT Module 222 , and thereafter input into CCP, Transform, and Quantization Module 224 . However, if the selected prediction CU is from one of Inter Prediction disabling ACT Module 206 or Intra Prediction disabling ACT Module 214 , switch 220 is moved to position B. In position B, ACT Module 222 is skipped and not utilized during encoding, and the residual CU is instead directly input into CCP, Transform, and Quantization Module 224 from Summing Module 216 .
- In ACT Module 222 , adaptive color transform is performed on the residual CU.
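The adaptive color transform in HEVC SCC is commonly realized as the reversible YCgCo-R transform, built from lifting steps so that the inverse recovers the input exactly. The sketch below shows that lifting form for one residual sample triple; it is a simplified illustration, not the normative behavior of ACT Module 222 (clipping and bit-depth handling are omitted).

```python
def forward_ycgco_r(r, g, b):
    """Reversible RGB -> YCgCo-R via lifting steps (integer, lossless)."""
    co = r - b
    t = b + (co >> 1)   # >> is an arithmetic (floor) shift in Python
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co

def inverse_ycgco_r(y, cg, co):
    """Exact inverse: undo the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

Because each lifting step is individually invertible, the round trip is exact even for negative residual values, which is what makes the transform usable in lossless coding paths.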
- The output from ACT Module 222 is input into CCP, Transform, and Quantization Module 224 , in which cross component prediction (CCP), a transform such as a Discrete Cosine Transform (DCT) or a Discrete Sine Transform (DST), and quantization are performed.
- In Entropy Coding Module 226 , entropy encoding of the residual CU is performed. For example, Context Adaptive Binary Arithmetic Coding (CABAC) may be performed to encode the residual CU. Any other entropy encoding process provided under HEVC may be performed in Entropy Coding Module 226 .
- the encoded bitstream for the CU of the input video frame is output from the video encoder 200 .
- the output encoded bitstream may be stored in a memory, broadcast over a transmission line or communication network, provided to a display, or the like.
- An inverse of the cross component prediction (CCP), transform, and quantization performed at Module 224 is applied to the CU residual to provide a reconstructed residual of the CU.
- If the selected prediction CU is from one of Inter Prediction enabling ACT Module 204 or Intra Prediction enabling ACT Module 212 , switch 230 is moved to position C. In position C, the reconstructed residual CU is input into Inverse ACT Module 232 , and thereafter input into Summing Module 218 . However, if the selected prediction CU is from one of Inter Prediction disabling ACT Module 206 or Intra Prediction disabling ACT Module 214 , switch 230 is moved to position D. In position D, Inverse ACT Module 232 is skipped and not utilized, and the reconstructed residual CU is instead directly input into Summing Module 218 .
- In Inverse ACT Module 232 , an inverse of the adaptive color transform performed at ACT Module 222 is applied to the reconstructed residual CU.
- the output of Inverse ACT Module 232 is input into Summing Module 218 .
- the reconstructed residual of the CU is added to the selected prediction CU from Mode Decision Module 210 to provide a reconstructed source CU.
- the reconstructed source CU is then stored in Frame Buffer 208 for use in Inter and Intra Prediction of other CUs.
- Encoding methods 300 , 400 , and 500 are performed within Intra Prediction enabling ACT Module 212 . Through the use of encoding methods 300 , 400 , and 500 , encoding efficiency and encoding time are improved.
- FIG. 3 shows a flow chart of encoding method 300 for determining whether TU size evaluation should be performed in an ACT enabled intra prediction encoding process, according to an exemplary embodiment of the present disclosure. More particularly, encoding method 300 utilizes a threshold calculation regarding CU size and, based on the threshold calculation, determines whether a TU size evaluation should be performed.
- At step 304 , component correlation analysis is performed on a source CU to determine whether a coding mode with ACT should be enabled or disabled for the coding unit.
- a correlation of color components for each pixel contained in the CU is determined.
- The correlation between color components is compared to a pixel correlation threshold. Based on the comparison, it is determined for each pixel whether the correlation is above, equal to, or below the pixel correlation threshold.
- The total number of pixels above the pixel correlation threshold is determined for the CU, with pixels equal to the pixel correlation threshold counted as being above the threshold. This total number of pixels is then compared to a CU correlation threshold.
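The counting logic of step 304 can be sketched as below. The per-pixel correlation measure is not detailed in the text, so `pixel_correlations` is assumed to be a precomputed score per pixel; the tie rule at the pixel threshold (equal counts as above) follows the description, while the behavior exactly at the CU threshold is an assumption.

```python
def act_enabled_by_correlation(pixel_correlations, pixel_threshold, cu_threshold):
    """Enable ACT when enough pixels show high cross-component correlation.

    pixel_correlations: hypothetical precomputed per-pixel correlation scores.
    Pixels equal to pixel_threshold count as above it, per the described method;
    treating a count equal to cu_threshold as sufficient is an assumption.
    """
    high = sum(1 for c in pixel_correlations if c >= pixel_threshold)
    return high >= cu_threshold  # True -> enable ACT, False -> step 308 (disable)
```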
- If the total number of pixels is below the CU correlation threshold, the process proceeds to step 308 , disabling ACT.
- If the total number of pixels is equal to or above the CU correlation threshold, ACT is necessary to de-correlate the components of each pixel in the CU.
- In that case, ACT is enabled, and the process proceeds to step 306 , where a rough mode decision as to the intra prediction mode with ACT enabled is determined.
- the correlation analysis of step 304 may in addition or alternatively be based on the color space of a CU.
- color components of pixels in the CU may be analyzed and a color space of the CU determined.
- a color space may be determined as red, green, and blue (RGB), or as a luminance and chrominance (YUV) color space.
- If an RGB color space is determined, the process proceeds to step 306 , and the rough mode decision as to the intra prediction mode with ACT enabled is determined. Because RGB pixel components are more likely to have high correlation, ACT is necessary to de-correlate the components of each pixel in the CU in order to isolate pixel energy into a single component.
- If a YUV color space is determined, the process proceeds to step 308 , disabling ACT.
- ACT is not necessary for YUV pixel components because further de-correlation of the CU pixel components will likely not yield additional encoding benefits.
- When ACT is disabled during encoding method 300 , the coding mode of Intra Prediction enabling ACT is disabled and Intra Prediction enabling ACT Module 212 does not output a prediction to Mode Decision Module 210 .
- In a variant of the described process for intra prediction, when ACT is disabled during inter prediction encoding, the coding mode of Inter Prediction enabling ACT is disabled and Inter Prediction enabling ACT Module 204 does not output a prediction to Mode Decision Module 210 .
- the rough mode decision as to the intra prediction mode with ACT enabled is determined.
- The rough mode decision may be a cost-based mode decision. For example, in the rough mode decision, a low complexity cost associated with encoding utilizing the selected coding mode is determined in order to quickly select the coding modes that are most likely to have the highest quality and lowest encoding cost.
- a rate distortion optimization (RDO) mode decision is determined for the encoding mode with ACT enabled.
- a deviation from the original video, as well as a bit cost for encoding modes are calculated when ACT, CCP, Transform, Quantization, and entropy coding are performed.
- the deviation may be measured by an error calculation, such as mean squared error (MSE), for example.
- In Intra Prediction enabling ACT Module 212 , 35 intra prediction modes (IPMs) are available for encoding.
- IPMs with the lowest encoding cost and highest encoding quality are selected out of the 35 IPMs using a simplified, low complexity encoding cost determination.
- a sum of absolute transform distortion (SATD) cost may be utilized to determine a low complexity encoding cost of each IPM.
- the selection of IPMs with the lowest encoding cost and highest encoding quality may be a selection of 3 IPMs, or a selection of 8 IPMs, for example.
- At RDO mode decision step 310 , for Intra Prediction enabling ACT Module 212 , an RDO mode decision is determined for each of the selected IPMs.
- a deviation from the original video, as well as a bit cost for encoding is calculated for each of the selected IPMs when ACT, CCP, Transform, Quantization, and entropy coding is performed. The deviation may be measured by an error calculation, such as MSE, for example.
- the IPM with the lowest encoding cost and highest encoding quality determined by the RDO analysis is then chosen from the selected IPMs.
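The two-stage selection described above (a low-complexity SATD screen over the 35 IPMs, then a full RDO decision over the survivors) can be sketched generically. The cost functions here are stand-in callables supplied by the caller; a real encoder would compute SATD on prediction residuals and RDO cost through actual transform, quantization, and entropy coding.

```python
def select_intra_mode(satd_cost, rdo_cost, num_modes=35, num_candidates=3):
    """Two-stage intra mode decision: cheap screen, then full evaluation.

    satd_cost / rdo_cost: hypothetical callables mapping an IPM index to a
    cost. The cheap cost ranks all modes; only the best num_candidates
    (e.g. 3 or 8, per the description) pay the expensive RDO evaluation.
    """
    modes = range(num_modes)
    candidates = sorted(modes, key=satd_cost)[:num_candidates]
    return min(candidates, key=rdo_cost)
```

The design point is that the expensive RDO cost is evaluated only `num_candidates` times instead of 35 times, which is the source of the encoding-time savings.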
- a variant of the process described above in relation to Intra Prediction enabling ACT Module 212 may also be performed by Inter Prediction enabling ACT Module 204 .
- a rough mode decision of the best inter prediction from temporally adjacent video frames is determined that provides the lowest encoding cost and highest encoding quality.
- an RDO mode decision is determined for the inter prediction.
- a deviation from the original video, as well as a bit cost for encoding is calculated for the inter prediction when ACT, CCP, Transform, Quantization, and entropy coding is performed. The deviation may be measured by an error calculation, such as MSE, for example.
- the inter prediction with the lowest encoding cost and highest encoding quality determined by the RDO analysis is then chosen.
- At step 312 , the CU size of the current CU being processed is calculated.
- a CU may be sized as N vertical samples by N horizontal samples (N×N), where N may equal 4, 8, 16, 32, or 64.
- the N value for the CU is compared to a threshold T1.
- T1 may equal 4, 8, 16, 32, or 64. Based on the comparison, it is determined whether the CU size is smaller than T1, and thereby whether to evaluate a size of a transform unit for the enabled coding mode. If the CU size is smaller than T1, the process proceeds to step 314 for a TU size decision. However, if the CU size is equal to or greater than T1, the process proceeds to step 316 , bypassing the TU size decision step 314 .
- The TU for a CU sized equal to or greater than T1 is determined. If the CU size is equal to or greater than T1, the TU quadtree structure may be determined as the largest possible TU size. For example, when the CU size is equal to or greater than T1, for a PU sized 64×64, four TUs sized 32×32 may be determined. In another example, when the CU size is equal to or greater than T1, for PUs sized 32×32, 16×16, 8×8, or 4×4, a TU may be sized the same as the PU. For example, if a PU is sized 32×32, a corresponding TU may be sized 32×32.
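The largest-TU shortcut just described can be illustrated with a small helper, assuming HEVC's maximum transform size of 32×32; `default_tu_size` is a hypothetical function, not part of the described encoder.

```python
def default_tu_size(pu_size, max_tu=32):
    """Largest-TU default used when the TU size search is skipped.

    A 64x64 PU is covered by four 32x32 TUs (32x32 being HEVC's maximum
    transform size, an assumption of this sketch); smaller PUs take a
    single TU of their own size.
    """
    tu = min(pu_size, max_tu)
    num_tus = (pu_size // tu) ** 2
    return tu, num_tus
```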
- Step 312 improves coding time and efficiency because the TU size decision may be time consuming and increase encoding cost. Thus, encoding cost and time are saved if the TU size decision can be skipped.
- A CU size equal to or greater than T1 implies that content of the CU is not complex. For example, a CU size greater than T1 may mean that large areas of a video image are free of edges, motion, or complex patterns. Therefore, determining a TU size may not be needed for efficiently encoding the CU with high video quality.
- a TU size decision for the CU is performed.
- a TU of the CU is determined.
- One or more TU sizes are analyzed by evaluating the RDO cost determined in step 310 for prediction modes to find the TU size resulting in the most efficient and high video quality ACT transform of the CU.
- TU sizes 4 ⁇ 4, 8 ⁇ 8, 16 ⁇ 16, and 32 ⁇ 32, for example, are analyzed.
- The TU size so determined is selected for the ACT transform of the CU, and the process proceeds to step 316 .
- the selected TU size may be determined as the best TU quad-tree structure size.
- a chroma mode decision is determined.
- The chroma mode decision takes the prediction mode selected in step 310 and uses it for chroma prediction to generate a chroma PU and a corresponding chroma TU.
- the determined TU from step 312 or step 314 is also utilized to generate the chroma TU.
- The chroma TU is also subsampled according to the chroma format. For example, when the chroma format is 4:2:0 and the luma TU size is 32×32, the determined chroma TU is sized 16×16.
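The subsampling arithmetic behind the 32×32 luma to 16×16 chroma example can be sketched as follows; only the size derivation is modeled, and the helper name is hypothetical.

```python
def chroma_tu_size(luma_tu, chroma_format="4:2:0"):
    """Derive chroma TU dimensions (w, h) from a square luma TU.

    4:2:0 halves both dimensions, 4:2:2 halves width only, and 4:4:4
    keeps full resolution. Non-square luma TUs and minimum-size limits
    are ignored in this sketch.
    """
    w = h = luma_tu
    if chroma_format == "4:2:0":
        return w // 2, h // 2
    if chroma_format == "4:2:2":
        return w // 2, h
    return w, h  # 4:4:4
```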
- At step 308 , the process of selecting the best intra prediction mode and selecting the best TU quad-tree structure size is completed for Module 212 .
- the prediction and the RDO cost are generated by Module 212 , and input into Mode Decision Module 210 for comparison with the RDO cost input into Mode Decision Module 210 from the other prediction modules.
- Inter Prediction enabling ACT module 204 may generate a prediction of a CU with ACT applied to it and an RDO cost, and input the prediction CU and RDO cost into Mode Decision Module 210 .
- Inter Prediction disabling ACT Module 206 and Intra Prediction disabling ACT Module 214 also each generate a prediction CU and RDO cost, and input their respective prediction CUs and RDO costs into Mode Decision Module 210 .
- Mode Decision Module 210 compares the prediction CUs and RDO costs input from Modules 204 , 206 , 212 , and 214 , and determines a prediction CU that will be input into Summing Modules 216 and 218 .
- FIG. 4 shows a flow chart of an encoding method 400 that determines whether ACT should be enabled according to another exemplary embodiment of the present disclosure. More particularly, the encoding method 400 utilizes a threshold calculation regarding CU size in combination with a determination about correlations between color components of CU pixels. Based on the threshold calculation, ACT may be either enabled or disabled. Elements labeled the same as previously referenced refer to previously described elements.
- step 304 component correlation analysis is performed on the source CU to determine whether ACT should be enabled or disabled.
- the process that takes place at step 304 is as described for step 304 of encoding method 300 . If it is determined that correlation between color components of the CU is high, or the color space is determined to be an RGB color space, ACT is enabled and the process proceeds through steps 306 , 310 , 314 , 316 , and 308 as described above for encoding method 300 . However, if the correlation is determined to be low, or the color space is determined to be a YUV color space, the process moves to step 402 .
- At step 402 , the CU size of the current CU being processed is determined. As discussed above, the CU is sized as N vertical by N horizontal (N×N) samples, where N may equal 4, 8, 16, 32, or 64. The N value for the CU is compared to a threshold T2. T2 may equal 4, 8, 16, 32, or 64. Based on the comparison, it is determined whether the CU size is smaller than T2. If the CU size is smaller than T2, ACT is enabled and the process proceeds to step 310 , where an RDO based mode decision is made as described in step 310 of encoding method 300 . However, if the CU size is equal to or greater than T2, the process proceeds to step 308 , disabling ACT.
- When ACT is disabled during a variant of encoding method 400 , the coding mode of Inter Prediction enabling ACT is disabled and Module 204 does not output a prediction to Mode Decision Module 210 .
- When ACT is disabled during encoding method 400 , the coding mode of Intra Prediction enabling ACT is disabled and Module 212 does not output a prediction to Mode Decision Module 210 .
- Step 402 improves coding time and efficiency because a CU size equal to or greater than T2 implies that content of the CU, and thus the CU, is not complex.
- A CU size greater than T2 may mean that large areas of a video image are free of edges, motion, or complex patterns. In combination with already adequately de-correlated color components, there may not be a need for ACT in order to efficiently encode the CU.
- FIG. 5 shows a flow chart of an encoding method 500 that determines whether ACT should be enabled and whether TU size evaluation should be performed via two threshold calculations, according to another exemplary embodiment of the present disclosure. More particularly, encoding method 500 utilizes a first threshold calculation regarding CU size in combination with a determination about correlations between color components of CU pixels that determines whether ACT should be either enabled or disabled. Method 500 also utilizes a second threshold calculation regarding CU size, by which a determination is made as to whether a TU size evaluation should be performed. Elements labeled the same as previously referenced refer to previously described elements.
- step 304 component correlation analysis is performed on the source CU to determine whether ACT should be enabled or disabled.
- the process that takes place at step 304 is as described for step 304 of encoding method 300 . If it is determined that correlation between color components of the CU is high, or the color space is determined to be an RGB color space, ACT is enabled and the process proceeds to step 306 for rough mode decision and thereafter, step 310 for RDO based mode decision.
- the processes that take place at steps 306 and 310 are as described previously for encoding method 300 . However, if the correlation is determined to be low, or the color space is determined to be a YUV color space, the process moves to step 402 .
- the CU size of the current CU being processed is calculated, as discussed previously for encoding method 400 ( FIG. 4 ). If the CU size is smaller than T2, ACT is enabled and the process proceeds to step 310 for RDO based mode decision. However, if the CU size is equal to or greater than T2, the process proceeds to step 308 , disabling ACT.
- When ACT is disabled during a variant of encoding method 500 , the coding mode of Inter Prediction enabling ACT is disabled and Module 204 does not output a prediction to Mode Decision Module 210 .
- When ACT is disabled during encoding method 500 , the coding mode of Intra Prediction enabling ACT is disabled and Module 212 does not output a prediction to Mode Decision Module 210 .
- RDO based mode decision is calculated as previously described for encoding method 300 .
- the CU size of the current CU being processed is calculated as previously described for encoding method 300 . It is determined whether the CU size of the CU is smaller than T1. If the CU size is smaller than T1, the process proceeds to step 314 for TU size decision. However, if the CU size is equal to or greater than T1, the process proceeds to step 316 , bypassing the TU size decision step 314 .
- the decision processes at steps 314 and 316 are the same as previously described for encoding method 300 .
- the thresholds T1 and T2 may be set as the same or as different values.
- Encoding method 500 of FIG. 5 combines threshold calculations to improve both encoding efficiency and time.
- A CU size equal to or greater than T2 implies that content of the CU, and thus the CU, is not complex, and may feature large areas free of edges, motion, or complex patterns. In combination with already adequately de-correlated color components, there may not be a need for ACT in order to efficiently encode the CU. Furthermore, encoding cost is saved if the TU size decision at step 314 can be skipped.
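The combined decision flow of method 500 can be summarized in a small sketch. This is an illustrative reading of the flowchart as described, not patent code: `high_correlation` stands in for the step-304 correlation / RGB-versus-YUV test, and the cost computations themselves are omitted.

```python
def method_500_decisions(high_correlation, cu_size, t1, t2):
    """Return (act_enabled, rough_mode, tu_search) per method 500.

    - Low correlation and CU size >= T2: step 308, ACT disabled.
    - High correlation (or RGB): ACT enabled, rough mode decision (step 306)
      then RDO (step 310).
    - Low correlation but CU size < T2: ACT enabled, straight to RDO.
    - TU size search (step 314) runs only when CU size < T1 (step 312 gate).
    """
    if not high_correlation and cu_size >= t2:
        return False, False, False          # step 308: ACT disabled
    rough_mode = high_correlation           # step 306 only on the high-correlation path
    tu_search = cu_size < t1                # step 312 gate before step 314
    return True, rough_mode, tu_search
```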
- FIG. 6 shows a flow chart of an encoding method 600 , similar to encoding method 300 , that determines whether TU size evaluation should be performed in an ACT enabled intra prediction encoding process, according to an exemplary embodiment of the present disclosure. More particularly, encoding method 600 utilizes a threshold calculation regarding CU size and, based on the threshold calculation, determines whether a TU size evaluation should be performed.
- step 304 component correlation analysis is performed on the source CU to determine whether ACT should be enabled or disabled.
- the process that takes place at step 304 is as described for step 304 of encoding method 300 . If it is determined that correlation between color components of the CU is high, or the color space is determined to be an RGB color space, ACT is enabled and the process proceeds to step 306 for rough mode decision and thereafter, step 310 for RDO based mode decision.
- the processes that take place at steps 306 and 310 are as described previously for encoding method 300 .
- If, at step 304 , the correlation is determined to be low, or the color space is determined to be a YUV color space, the coding mode with ACT is enabled and the process proceeds directly to step 310 , but the rough mode decision in step 306 is disabled.
- ACT is still enabled to check if further de-correlation of the pixel components will yield additional encoding benefits.
- RDO based mode decision is calculated as previously described for encoding method 300 .
- the CU size of the current CU being processed is calculated as previously described for encoding method 300 . It is determined whether the CU size of the CU is smaller than T1. If the CU size is smaller than T1, the process proceeds to step 314 for TU size decision. However, if the CU size is equal to or greater than T1, the process proceeds to step 316 , bypassing the TU size decision step 314 .
- the decision processes at steps 314 and 316 are the same as previously described for encoding method 300 .
- the thresholds T1 and T2 may be set as the same or as different values.
- Decoding processes that perform the reverse steps of encoding methods 300 , 400 , 500 , and 600 may be effective to decode the video encoded by encoding methods 300 , 400 , 500 , and 600 .
- decoding methods that perform the reverse steps of the processes recited in encoding methods 300 , 400 , 500 , and 600 are contemplated by the present disclosure.
- Other decoding processes that include steps necessary to decode video encoded by encoding methods 300 , 400 , 500 , and 600 are also contemplated by the present disclosure.
- FIG. 7 shows a system 700 for performing the encoding and decoding methods consistent with the present disclosure.
- System 700 includes a non-transitory computer-readable storage medium 702 that may be a memory storing instructions capable of being performed by a processor 704 . It is noted that one or more non-transitory computer-readable storage mediums 702 and/or one or more processors 704 may alternatively be utilized to perform encoding and decoding methods consistent with the present disclosure.
- Non-transitory computer-readable storage medium 702 may be any sort of non-transitory computer-readable storage medium (CRM).
- a non-transitory computer-readable storage medium may include, for example, a floppy disk, a flexible disk, a hard disk, a hard drive, a solid state drive, magnetic tape or any other magnetic data storage medium, a CD-ROM or any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM or any other flash memory, NVRAM, a cache, a register, any other memory chip or cartridge, and networked versions of the same.
- a computer-readable storage medium may store instructions for execution by at least one processor, including instructions for causing the processor to perform steps or stages consistent with the encoding and decoding methods described herein. Additionally, one or more computer-readable storage mediums may be used to implement the encoding and decoding methods described herein.
- the term “computer-readable storage medium” should be understood to include tangible items and exclude carrier waves and transient signals.
- Processor 704 may be one or more of any sort of digital signal processor (DSP), application specific integrated circuit (ASIC), digital signal processing device (DSPD), programmable logic device (PLD), field programmable gate arrays (FPGA), controller, micro-controller, micro-processor, computer, or any other electronic component for performing the encoding and decoding methods described herein.
- Tests were conducted using the HEVC SCC reference model, SCM 4.0, under common test conditions (CTC). Coding performance of the encoding methods described herein was compared to the reference models for HEVC. Encoding was first performed using the HEVC reference model, with the encoding time recorded as encoding time A. Encoding using a test encoding method according to the encoding methods described herein was also performed, with encoding time recorded as encoding time B. Encoding time percent was calculated by dividing encoding time B by encoding time A.
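As a worked reading of this metric: if the reference run (time A) takes 100 seconds and the test run (time B) takes 97 seconds, the reported encoding time percent is 97%, i.e., a 3% time saving. The times below are illustrative, not figures from the tests.

```python
def encoding_time_percent(time_b, time_a):
    """Encoding time of the test method (B) as a percentage of the reference (A)."""
    return 100.0 * time_b / time_a
```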
- HEVC common test sequences were utilized as video under examination. Video featured mixed video frames with text, graphics, and motion; mixed content; animation; and camera captured content.
- Video with RGB and YUV color spaces were tested, with source video quality equaling 720p, 1080p, or 1440p. All intra prediction under lossy conditions, random access, and low-B prediction was utilized. All intra prediction compresses a video frame using information contained within the frame being currently compressed, while random access and low-B prediction compress a video frame by utilizing information within previously coded frames as well as the frame currently being compressed. Low-B prediction is also referred to as low delay B prediction in the following description. In each test, encoding time, as well as decoding time, was recorded, with percentages indicating the percent of time taken to encode or decode compared to exemplary encoding and decoding methods of the reference models.
- Positive percentages for each G/Y, B/U, and R/V component represent bit rate coding loss, while negative percentages represent bit rate coding gain, in relation to the original video source.
- A 0.1% value for a G/Y component represents a coding loss of 0.1% for the G/Y component in the encoded video compared to the G/Y component in the original video source.
- A −0.1% value for a G/Y component represents a coding gain of 0.1% for the G/Y component in the encoded video compared to the G/Y component in the original video source.
- For encoding method 500 of FIG. 5 and Table 1 below, testing was performed under three settings. In setting 1, T2 and T1 were each set to 64. In setting 2, T2 was set to 64, while T1 was set to 32. In setting 3, T2 was set to 64, while T1 was set to 16. Intra prediction was the determined encoding mode.
- CUs with sizes greater than or equal to 64×64 were encoded without ACT.
- CUs sized smaller than 64×64 were encoded with ACT enabled.
- TU size decision 314 was skipped.
- TU size decision 314 was performed.
- T2 and T1 were both set to 32.
- T2 and T1 were both set to 16.
- In Test 1, TU evaluation was disabled for CUs with sizes greater than or equal to 32×32, and such CUs were encoded without ACT. CUs sized smaller than 32×32 were encoded with ACT enabled.
- In Test 2, TU evaluation was disabled for CUs with sizes greater than or equal to 16×16, and such CUs were encoded without ACT.
- CUs sized smaller than 16×16 were encoded with ACT enabled. Testing was conducted in lossy conditions, with full frame intra block copy utilized.
- [Table, flattened during extraction: per-component (G/Y, B/U, R/V) bit-rate changes for the tested RGB and YUV sequence classes (text & graphics with motion at 1080p & 720p; mixed content at 1440p & 1080p; animation at 720p; camera captured at 1080p); reported values range from −0.1% to 0.2%.]
- [Table, flattened during extraction: per-component bit-rate changes for the RGB sequence classes (text & graphics with motion, mixed content, animation, camera captured); all legible values are 0.0%.]
- Each mode featured zero bit-rate change in total or as an average. All intra mode featured the best reduction in encoding complexity, showing a 1% reduction in each test.
- Intra block copy utilizes a motion vector to copy a block from a previously coded CU in the currently coded video frame.
- 4-CTU indicates the allowable searching area for the motion vector.
- T2 and T1 were both set to 32.
- T2 and T1 were both set to 16.
- TU evaluation was disabled for CUs with sizes greater than or equal to 32×32 in Test 1.
- TU evaluation was disabled for CUs with sizes greater than or equal to 16×16 in Test 2.
- ACT was enabled for CU sizes less than 32×32 in Test 1, with ACT disabled when CU sizes were greater than or equal to 32×32.
- In Test 2, ACT was enabled for CU sizes smaller than 16×16, with ACT disabled when CU sizes were greater than or equal to 16×16.
- [Table, flattened during extraction: per-component bit-rate changes for the RGB and YUV sequence classes; reported values range from −0.1% to 0.2%.]
- Each mode featured minimal bit-rate change in all intra, random access, or low-delay B modes. All intra featured the best reduction in encoding complexity in both tests, showing a 5% reduction in Test 1, and an 8% reduction in Test 2.
- testing was performed with T2 set to 64.
- The determination at step 402 was performed to determine whether the CU size of the CU was smaller than 64×64. If the CU size was smaller than 64×64, ACT was enabled and RDO based mode decision was performed at step 310 . If the CU size was greater than or equal to 64×64, ACT was disabled and the process proceeded to step 308 .
- Testing conditions were based on lossy all intra encoding mode with full frame intra block copy in Test 1, and lossy all intra encoding mode with 4 CTU IBC in Test 2. Chroma mode was selected as 4:4:4 in each test.
- Test 1 (AI, Lossy, FF-IBC search) results, per component and encoding time:
- YUV, text & graphics with motion, 1080p & 720p: G/Y 0.0%, B/U 0.0%, R/V 0.0%, encoding time 97%
- YUV, mixed content, 1440p & 1080p: G/Y 0.0%, B/U 0.0%, R/V 0.0%, encoding time 97%
- YUV, Animation, 720p: G/Y 0.0%, B/U 0.1%, R/V 0.1%, encoding time 99%
- YUV, camera captured, 1080p: G/Y 0.0%, B/U 0.0%, R/V 0.0%, encoding time 98%
- T2 was set to 64.
- Lossless intra encoding was performed, with chroma mode selected as 4:4:4.
- the encoding method 400 resulted in a 0% to about 2% saving of encoding time.
- T1 was set to 32 in Test 1, and to 16 in Test 2. Consistent with method 300 , in Test 1, for CUs with sizes greater than or equal to 32×32, the TU size decision 314 was skipped. For CUs with sizes less than 32×32, TU size decision 314 was performed. In Test 2, for CUs with sizes greater than or equal to 16×16, the TU size decision 314 was skipped. For CUs with sizes less than 16×16, TU size decision 314 was performed. Lossy all intra encoding with ACT enabled was performed.
- AI, Lossy, G/Y, B/U, R/V bit-rate change and encoding time [%], when CU < 32×32 (Test 1) and when CU < 16×16 (Test 2):
RGB, TGM, 1080p & 720p: −0.1% 0.0% −0.1%, 96% | 0.0% 0.0% 0.0% 0.0%, 94%
RGB, mixed, 1440p & 1080p: 0.0% 0.0% 0.0%, 96% | 0.1% 0.1% 0.1% 0.1%, 94%
RGB, Animation, 720p: 0.0% 0.0% 0.0% 0.0%, 97% | 0.1% 0.1% 0.1%, 94%
RGB, camera captured, 1080p: 0.0% 0.0% 0.0%, 96% | 0.1% 0.0% 0.1%, 93%
YUV, TGM, 1080p & 720p: 0.0% 0.0% 0.0% 0.1%, 95% | 0.0% 0.0% 0.0%, 92%
YUV, mixed, 1440p & 1080p: 0.0% 0.0% 0.0%, 95% | 0.0% 0.0% 0.0% 0.0%, 92%
YUV, Animation, 720p: 0.0% 0.0% 0.0% 0.0% 0.0% 9
- Encoding time in Test 1 was reduced by 3% to 6%.
- Encoding time was reduced by 6% to 10%.
- Allowing TU size decisions only for CUs sized smaller than 32×32 or 16×16 aided encoding efficiency.
- Preparing computer programs based on the written description and methods of this specification is within the skill of a software developer.
- The various programs or program modules can be created using a variety of programming techniques.
- Program sections or program modules can be designed in or by means of Java, C, C++, assembly language, or any other such programming language.
- One or more of such software sections or modules can be integrated into a computer system, non-transitory computer-readable media, or existing communications software.
Abstract
A video encoding method includes receiving a source video frame, dividing the source video frame into a coding tree unit, determining a coding unit from the coding tree unit, determining a correlation between components of the coding unit, enabling or disabling a coding mode of the coding unit, determining whether to evaluate a size of a transform unit for an enabled coding mode, and determining a transform unit of the coding unit for the enabled coding mode, wherein the size of the coding unit is defined by a number (N) of samples.
Description
- This application is based upon and claims the benefit of U.S. Provisional Patent Application No. 62/172,256, filed Jun. 8, 2015, the entire contents of which are incorporated herein by reference.
- This disclosure generally relates to methods and systems for video encoding and decoding.
- The demand for high quality video continually increases. With the advent of 4K and 8K video formats that require the processing of large amounts of video data, improvements to video encoding and decoding efficiency in the compression of such video data are needed. Furthermore, consumers expect the transmission and reception of high quality video across various transmission mediums. For example, consumers expect high quality video obtained over a network for viewing on portable devices, such as smartphones, tablets, and laptops, as well as on home televisions and computers. Consumers also expect high quality video for display during teleconferencing and screen sharing, for example.
- The High Efficiency Video Coding (HEVC) standard, H.265, is a standard aimed at improving the performance of video encoding and decoding during video compression. Developed by the ISO/IEC JTC 1/SC 29/WG 11 Moving Picture Experts Group (MPEG) and the ITU-T SG16 Video Coding Experts Group (VCEG), HEVC reduces the data rate needed to compress high quality video in comparison to the previous standard, Advanced Video Coding (AVC), which is also known as H.264.
- HEVC utilizes various coding tools, including inter prediction and intra prediction techniques to compress video during coding. Inter prediction techniques utilize temporal redundancies between different video frames in a video stream to compress video data. For example, a video frame being currently encoded may utilize portions of previously encoded and decoded video frames containing similar content. These portions of previously encoded and decoded video frames may be used to predict encoding of areas of the current video frame containing similar content. In contrast, intra prediction utilizes only video data within the currently encoded video frame to compress video data. No temporal redundancies between different video frames are employed in intra prediction techniques. For example, encoding of a current video frame may utilize other portions of the same frame. Intra prediction features 35 intra modes, with the modes including a Planar mode, a DC mode, and 33 directional modes.
- HEVC also uses more expansive partitioning and dividing of each input video frame than AVC. AVC relies only on macroblock division of an input video frame for its encoding and decoding. In contrast, HEVC may divide an input video frame into various data units and blocks that are sized differently, as will be described in more detail below. This aspect of HEVC provides improved flexibility in the encoding and decoding of video frames featuring large amounts of motion, detail, and edges, for example, and allows for efficiency gains over AVC.
- Additional coding tools that further improve video coding under HEVC have been proposed for inclusion in the standard. These coding tools are named coding extensions. The Screen Content Coding (SCC) extension is a proposed extension that focuses on improving processing performance related to video screen content under the HEVC standard. Screen content is video containing a significant portion of rendered graphics, text, or animation, rather than camera captured video scenes. The rendered graphics, text, or animation may be moving or static, and may also be provided in a video feed in addition to camera captured video scenes. Example applications implicating SCC may include screen mirroring, cloud gaming, wireless display of content, displays generated during remote computer desktop access, and screen sharing, such as real-time screen sharing during video conferencing.
- One coding tool included in SCC is the adaptive color transform (ACT). The ACT is a color space transform applied to residue pixel samples of a coding unit (CU). For certain color spaces, correlations between color components of a pixel within a CU are present. When a correlation between color components of a pixel is high, performing the ACT on the pixel may help concentrate the energy of correlated color components by de-correlating the color components. Such concentrated energy allows for more efficient coding and decreased coding cost. Thus, the ACT may improve coding performance during HEVC coding.
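- For reference, the ACT in the SCC design is commonly realized as the lossless YCgCo-R lifting transform applied to residual samples. The patent text does not give the exact matrix, so the following is a sketch based on that published design, in integer lifting form:

```python
def act_forward(r, g, b):
    """Forward YCgCo-R lifting transform (assumed ACT core), applied to
    integer residual samples; exactly reversible."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co

def act_inverse(y, cg, co):
    """Exact integer inverse of act_forward."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b

# The round trip is lossless for any integer residuals, including negatives.
assert act_inverse(*act_forward(-5, 7, -3)) == (-5, 7, -3)
```

When the three input components are highly correlated, most of the signal energy lands in the Y output, which is what makes the transformed residual cheaper to code.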
- However, evaluating whether to enable ACT requires an additional rate distortion optimization (RDO) check during encoding, in which the RDO check evaluates the rate distortion (RD) cost of the coding mode with ACT enabled. Such evaluations may increase both coding complexity and coding time. Furthermore, the ACT may not be necessary when the color components of a pixel are already de-correlated. In such a case, further de-correlation of color components may not provide any benefit because the cost of performing the ACT is higher than the coding performance gains.
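- An RDO check of this kind typically weighs distortion against bits with a Lagrangian cost, J = D + λ·R. A minimal sketch follows; the candidate names and cost numbers are illustrative, not values from the disclosure:

```python
def rd_cost(distortion, bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * bits

def pick_mode(candidates, lam):
    """candidates: dict mapping mode name -> (distortion, bits).
    Returns the mode name with the lowest RD cost."""
    return min(candidates, key=lambda m: rd_cost(*candidates[m], lam))

# Hypothetical costs for the four prediction paths compared by an encoder.
candidates = {
    "inter_act":    (1200.0, 300),
    "inter_no_act": (1150.0, 340),
    "intra_act":    (1000.0, 420),
    "intra_no_act": (1500.0, 260),
}
best = pick_mode(candidates, lam=1.0)
```

Each ACT-enabled candidate adds one more entry to this comparison, which is why skipping the ACT evaluation when it cannot help saves encoding time.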
- One aspect of the present disclosure is directed to a video encoding method. The method includes receiving a source video frame, dividing the source video frame into a coding tree unit, determining a coding unit from the coding tree unit, determining a correlation between components of the coding unit, enabling or disabling a coding mode of the coding unit, determining whether to evaluate a size of a transform unit for an enabled coding mode; and determining a transform unit of the coding unit for the enabled coding mode, wherein the size of the coding unit is defined by a number (N) of samples.
- Another aspect of the present disclosure is directed to a video encoding system. The system includes a memory storing instructions and a processor. The instructions, when executed by the processor, cause the processor to: receive a source video frame, divide the source video frame into a coding tree unit, determine a coding unit from the coding tree unit, determine a correlation between components of the coding unit, enable or disable a coding mode of the coding unit, determine whether to evaluate a size of a transform unit for an enabled coding mode, and determine a transform unit of the coding unit for the enabled coding mode, wherein the size of the coding unit is defined by a number (N) of samples.
- Another aspect of the present disclosure is directed to a non-transitory computer-readable storage medium storing a set of instructions. The instructions, when executed by one or more processors, cause the one or more processors to perform a method of video encoding. The method of video encoding includes: receiving a source video frame, dividing the source video frame into a coding tree unit, determining a coding unit from the coding tree unit, determining a correlation between components of the coding unit, enabling or disabling a coding mode of the coding unit, determining whether to evaluate a size of a transform unit for an enabled coding mode; and determining a transform unit of the coding unit for the enabled coding mode, wherein the size of the coding unit is defined by a number (N) of samples.
- It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure, as claimed.
- The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and, together with the description, serve to explain the principles of the disclosure.
-
FIGS. 1A-1J illustrate a video frame and related partitions of the video frame according to embodiments of the present disclosure. -
FIG. 2 shows an exemplary video encoder consistent with the present disclosure. -
FIG. 3 shows a flow chart of an encoding method according to an exemplary embodiment of the present disclosure. -
FIG. 4 shows a flow chart of an encoding method according to another exemplary embodiment of the present disclosure. -
FIG. 5 shows a flow chart of an encoding method according to another exemplary embodiment of the present disclosure. -
FIG. 6 shows a flow chart of an encoding method according to another exemplary embodiment of the present disclosure. -
FIG. 7 shows a system for performing encoding and decoding methods and processes consistent with the present disclosure. - Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. The following description refers to the accompanying drawings in which the same numbers in different drawings represent the same or similar elements unless otherwise represented. The implementations set forth in the following description of exemplary embodiments do not represent all implementations consistent with the disclosure. Instead, they are merely examples of systems and methods consistent with aspects related to the disclosure as recited in the appended claims.
-
FIGS. 1A-1J illustrate a video frame and related partitions of the video frame according to embodiments of the present disclosure. -
FIG. 1A shows a video frame 101 that includes a number of pixels located at locations within the video frame. Video frame 101 is partitioned into coding tree units (CTUs) 102. Each CTU 102 is sized according to L vertical samples by L horizontal samples (L×L), where each sample corresponds to a pixel value located at a different pixel location in the CTU. For example, L may equal 16, 32, or 64 samples. Pixel locations may be locations where pixels are present in the CTU, or locations between where pixels are present in the CTU. When a pixel location is between where pixels are present, the pixel value is an interpolated value determined from pixels located at one or more spatial locations around the pixel location. Each CTU 102 includes a luma coding tree block (CTB), chroma CTBs, and associated syntax. -
FIG. 1B shows CTBs that may be contained by a CTU 102 of FIG. 1A. For example, CTU 102 may include a luma CTB 103, and chroma CTBs 104 (Cb CTB) and 105 (Cr CTB). CTU 102 also may include associated syntax 106. The Cb CTB 104 is the blue difference chroma component CTB, and represents changes in blue colorfulness for the CTB. The Cr CTB 105 is the red difference chroma component CTB, and represents changes in red colorfulness for the CTB. Associated syntax 106 contains information as to how CTBs 103, 104, and 105 are to be encoded. CTBs 103, 104, and 105 may be the same size as CTU 102. Alternatively, luma CTB 103 may have the same size as CTU 102, but chroma CTBs 104 and 105 may be smaller than CTU 102. - Coding tools such as intra prediction, inter prediction, and others, operate on coding blocks (CBs). In order to enable a determination of whether to encode via intra prediction or inter prediction, CTBs may be partitioned into one or multiple CBs. Partitioning of CTBs into CBs is based on quad-tree splitting. Thus, a CTB may be partitioned into four CBs, where each CB may be further partitioned into four CBs. This partitioning may be continued based on the size of the CTB being partitioned.
-
FIG. 1C shows various partitionings of the luma CTB 103 of FIG. 1B into one or multiple luma CBs 107-1, 107-2, 107-3, or 107-4. For a 64×64 luma CTB, a corresponding luma CB 107 may be sized as N vertical by N horizontal (N×N) samples, such as 64×64, 32×32, 16×16, or 8×8. In FIG. 1C, luma CTB 103 is sized as 64×64. However, luma CTB 103 may alternatively be sized as 32×32 or 16×16. -
FIG. 1D shows an example of quadtree partitioning of luma CTB 103 of FIG. 1B, wherein luma CTB 103 is partitioned into CBs 107-1, 107-2, 107-3, or 107-4 shown in FIG. 1C. In FIG. 1D, luma CTB 103 is sized as 64×64. However, luma CTB 103 may alternatively be sized as 32×32 or 16×16. - In
FIG. 1D, luma CTB 103 is partitioned into four 32×32 CBs, labeled 107-2. Each 32×32 CB may further be partitioned into four 16×16 CBs, labeled 107-3. Each 16×16 CB may then be partitioned into four 8×8 CBs, labeled 107-4. - Coding units (CUs) are utilized to code CBs. A CTB contains only one CU or is divided to contain multiple CUs. Thus, a CU may also be sized as N vertical by N horizontal (N×N) samples, such as 64×64, 32×32, 16×16, or 8×8. Each CU contains a luma CB, two chroma CBs, and associated syntax. A residual CU formed during encoding and decoding may be sized the same as the CU corresponding to the residual CU.
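- The quad-tree splitting described above can be sketched as a simple recursion. The split predicate here is a placeholder; a real encoder decides splits by rate-distortion cost rather than a fixed rule:

```python
def split_quadtree(x, y, size, min_size, should_split):
    """Recursively partition a square block at (x, y) into quadrants.

    should_split is a hypothetical predicate (e.g., based on content
    complexity); HEVC encoders make this decision via RD cost.
    Returns the list of leaf blocks as (x, y, size) tuples.
    """
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    blocks = []
    for dy in (0, half):
        for dx in (0, half):
            blocks.extend(
                split_quadtree(x + dx, y + dy, half, min_size, should_split))
    return blocks

# Example: split a 64x64 CTB once, then split only its top-left 32x32 CB
# again, yielding four 16x16 CBs plus three 32x32 CBs (7 leaves total).
leaves = split_quadtree(
    0, 0, 64, 8,
    lambda x, y, s: s == 64 or (s == 32 and x == 0 and y == 0))
```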
-
FIG. 1E shows CBs including, for example, luma CB 107-1 of FIG. 1C, that may be contained by a CU 108. For example, CU 108 may include luma CB 107-1, and chroma CBs 109 (Cb CB) and 110 (Cr CB). CU 108 may also include associated syntax 111. Associated syntax 111 contains information as to how CBs 107-1, 109, and 110 are to be encoded, such as quadtree syntax that specifies the size and positions of luma and chroma CBs, and further subdivision. Each CU 108 may have an associated partition of its CBs 107-1, 109, and 110 into prediction blocks (PBs). PBs are aggregated into prediction units (PUs). -
FIG. 1F shows alternative partitionings of CB 107-1 of FIG. 1D into luma PBs 112. CB 107-1 may, for example, be partitioned into PBs 112 depending on the predictability of the different areas of the CB 107-1. For example, CB 107-1 may contain a single PB 112 sized the same as CB 107-1. Alternatively, CB 107-1 may be partitioned vertically or horizontally into two even PBs 112, or CB 107-1 may be partitioned vertically or horizontally into four PBs 112. It is noted that the partitions shown in FIG. 1F are exemplary, and any other kinds of partitions into PBs allowable under the HEVC standard are contemplated by the present disclosure. Furthermore, the different partitions of CB 107-1 into PBs 112 as shown in FIG. 1F are mutually exclusive. As an example, in an intra prediction mode in HEVC, 64×64, 32×32, and 16×16 CBs may be partitioned only into a single PB sized the same as the CB, while 8×8 CBs may be partitioned into one 8×8 PB or four 4×4 PBs. - Once an intra or inter prediction for a block is made, a residual signal generated from a difference between the prediction block and the source video image block is transformed to another domain for further coding using transforms such as the discrete cosine transform (DCT) or discrete sine transform (DST). To provide this transform, one or more transform blocks (TBs) are utilized for each CU or each CB.
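- The transform step can be illustrated with a textbook floating-point 2D DCT-II; HEVC itself uses fixed-point integer approximations of this transform, so the code below is a conceptual sketch rather than the standard's transform:

```python
import math

def dct2(block):
    """Naive 2D DCT-II of an NxN block, in floating point for clarity."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for i in range(n):
                for j in range(n):
                    s += (block[i][j]
                          * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * j + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out

# A flat residual block concentrates all energy into the DC coefficient.
flat = [[4.0] * 4 for _ in range(4)]
coeffs = dct2(flat)  # coeffs[0][0] == 16.0, all other coefficients ~ 0
```

This energy-compaction property is why a transform followed by quantization codes smooth residuals so cheaply.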
-
FIG. 1G shows how luma CB 107-1 of FIG. 1E or 1F is partitioned into different TBs 113-1, 113-2, 113-3, and 113-4. If CB 107-1 is a 64×64 CB, TB 113-1 is a 32×32 TB, TB 113-2 is a 16×16 TB, TB 113-3 is an 8×8 TB, and TB 113-4 is a 4×4 TB. CB 107-1 would be partitioned into four TBs 113-1, sixteen TBs 113-2, sixty-four TBs 113-3, and two-hundred and fifty-six TBs 113-4. A CB 107-1 may be partitioned into TBs 113 all of the same size, or of different sizes. - Partitioning of CBs into TBs is based on quad-tree splitting. Thus, a CB may be partitioned into one or multiple TBs, where each TB may be further partitioned into four TBs. This partitioning may be continued based on the size of the CB being partitioned.
-
FIG. 1H shows an example of quadtree partitioning of luma CB 107-1 of FIG. 1E or 1F, utilizing the various partitionings into TBs 113-1, 113-2, 113-3, or 113-4 shown in FIG. 1G. In FIG. 1H, luma CB 107-1 is sized as 64×64. However, luma CB 107-1 may alternatively be sized as 32×32 or 16×16. - In
FIG. 1H, luma CB 107-1 is partitioned into four 32×32 TBs, labeled 113-1. Each 32×32 TB may further be partitioned into four 16×16 TBs, labeled 113-2. Each 16×16 TB may then be partitioned into four 8×8 TBs, labeled 113-3. Each 8×8 TB may then be partitioned into four 4×4 TBs, labeled 113-4. - TBs 113 are then transformed via, for example, a DCT, or any other transform contemplated by the HEVC standard. Transform units (TUs) aggregate TBs 113. One or more TBs are utilized for each CB. CBs form each CU. Thus, the transform unit (TU) structure is different for
different CUs 108, and is determined from CUs 108. -
FIG. 1I shows alternative partitionings 113-1, 113-2, 113-3, and 113-4 of a TU 114, where each TU aggregates partitioned TBs of FIG. 1G or 1H. A 32×32 sized TU 114 can hold a single TB 113-1 sized 32×32, or one or more TBs 113 sized 16×16 (113-2), 8×8 (113-3), or 4×4 (113-4). For a CU enabling inter prediction in HEVC, the TU may be larger than the PU, such that the TU may contain PU boundaries. However, the TU may not cross PU boundaries for a CU enabling intra prediction in HEVC. -
FIG. 1J shows an example of quadtree partitioning of TU 114 of FIG. 1I, utilizing the various partitionings into TBs 113-1, 113-2, 113-3, or 113-4 shown in FIG. 1I. In FIG. 1J, TU 114 is sized as 32×32. However, the TU may alternatively be sized as 16×16, 8×8, or 4×4. - In
FIG. 1J, TU 114 is partitioned into one TB 113-1 sized 32×32, and four 16×16 TBs labeled 113-2. Each 16×16 TB may further be partitioned into four 8×8 TBs, labeled 113-3. Each 8×8 TB may then be partitioned into four 4×4 TBs, labeled 113-4. - For any CTU, CTB, CB, CU, PU, PB, TU, or TB mentioned in the present disclosure, each may include any features, sizes, and properties in accordance with the HEVC standard. The partitioning shown in
FIGS. 1C, 1E, and 1F also applies to the chroma CTBs 104 (Cb CTB) and 105 (Cr CTB), and chroma CBs 109 (Cb CB) and 110 (Cr CB). -
FIG. 2 shows an exemplary video encoder 200 for performing encoding methods consistent with the present disclosure. Video encoder 200 may include one or more additional components that provide additional encoding functions contemplated by HEVC-SCC, such as palette mode, sample adaptive offset, and de-blocking filtering. Additionally, the present disclosure contemplates intra prediction mode enabling ACT, as well as other coding modes, such as inter prediction enabling ACT. - An input source video frame is received by
encoder 200. The input source frame is first input into a Frame Dividing Module 202, in which the frame is divided into at least one source CTU. A source CU is then determined from the source CTU. Source CTU sizes and source CU sizes are determined by Frame Dividing Module 202. Encoding then takes place on a CU-by-CU basis, with source CUs output by Frame Dividing Module 202 input into Inter Prediction enabling adaptive color transformation (ACT) Module 204, Inter Prediction disabling ACT Module 206, Intra Prediction enabling ACT Module 212, and Intra Prediction disabling ACT Module 214. - Source CUs of the input frame are encoded by Inter Prediction enabling
ACT Module 204, in which a prediction of a source CU from the input frame is determined using inter prediction techniques with adaptive color transformation enabled. Source CUs of the input frame are also encoded by Inter Prediction disabling ACT Module 206, in which a prediction of a source CU from the input frame is determined using inter prediction techniques without ACT enabled, i.e., ACT is disabled. - Reference CUs from frames in a
Frame Buffer 208 are utilized during the inter frame prediction. Source PUs and PBs are also determined from the source CU and utilized during the inter frame prediction by Modules 204 and 206. The predictions determined by Modules 204 and 206 are input into Mode Decision Module 210. - Source CUs of the input frame are also encoded by Intra Prediction enabling
ACT Module 212, in which a prediction of a source CU from the input frame is determined using intra prediction techniques with adaptive color transform. - Source CUs of the input frame are also encoded by Intra Prediction disabling
ACT Module 214, in which a prediction of a source CU from the input frame is determined using intra prediction techniques without adaptive color transform, i.e., ACT is disabled. - Source CUs from the same frame located in
Frame Buffer 208 are utilized during the intra frame prediction by Modules 212 and 214. The predictions determined by Modules 212 and 214 are input into Mode Decision Module 210. - In
Mode Decision Module 210, the costs of encoding the source CUs using inter prediction enabling ACT, inter prediction disabling ACT, intra prediction disabling ACT, and intra prediction enabling ACT are compared, along with the quality of each of the predicted CUs. A determination is then made as to which encoding mode prediction CU, such as an intra prediction CU or an inter prediction CU, should be selected based on the comparison. The selected prediction CU is then sent to Summing Modules 216 and 218. - At Summing
Module 216, the selected prediction CU is subtracted from the source CU version of itself, providing a residual CU. If the selected prediction CU is from one of Inter Prediction enabling ACT Module 204 or Intra Prediction enabling ACT Module 212, switch 220 is moved to position A. In position A, the residual CU is input into ACT Module 222, and thereafter input into CCP, Transform, and Quantization Module 224. However, if the selected prediction CU is from one of Inter Prediction disabling ACT Module 206 or Intra Prediction disabling ACT Module 214, switch 220 is moved to position B. In position B, ACT Module 222 is skipped and not utilized during encoding, and the residual CU is instead directly input into CCP, Transform, and Quantization Module 224 from Summing Module 216. - At
ACT Module 222, adaptive color transform is performed on the residual CU. The output from ACT Module 222 is input into CCP, Transform, and Quantization Module 224. - At CCP, Transform, and
Quantization Module 224, a cross component prediction (CCP), a transform such as a Discrete Cosine Transform (DCT) or Discrete Sine Transform (DST), and quantization of the CU residual are performed. The output of CCP, Transform, and Quantization Module 224 is input into Entropy Coding Module 226 and Inverse CCP, Transform, and Quantization Module 228. - At
Entropy Coding Module 226, entropy encoding of the residual CU is performed. For example, Context Adaptive Binary Arithmetic Coding (CABAC) may be performed to encode the residual CU. Any other entropy encoding process provided under HEVC may be performed in Entropy Coding Module 226. - After entropy encoding, the encoded bitstream for the CU of the input video frame is output from the
video encoder 200. The output encoded bitstream may be stored in a memory, broadcast over a transmission line or communication network, provided to a display, or the like. - At Inverse CCP, Transform, and
Quantization Module 228, an inverse determination of the cross component prediction (CCP), transform, and quantization performed at Module 224 on the CU residual is performed to provide a reconstructed residual of the CU. - If the selected prediction CU is from one of Inter Prediction enabling
ACT Module 204 or Intra Prediction enabling ACT Module 212, switch 230 is moved to position C. In position C, the reconstructed residual CU is input into Inverse ACT Module 232, and thereafter input into Summing Module 218. However, if the selected prediction CU is from one of Inter Prediction disabling ACT Module 206 or Intra Prediction disabling ACT Module 214, switch 230 is moved to position D. In position D, Inverse ACT Module 232 is skipped and not utilized, and the reconstructed residual CU is instead directly input into Summing Module 218. - At
Inverse ACT Module 232, an inverse adaptive color transform to that performed at ACT Module 222 is applied to the reconstructed residual CU. The output of Inverse ACT Module 232 is input into Summing Module 218. - At Summing
Module 218, the reconstructed residual of the CU is added to the selected prediction CU from Mode Decision Module 210 to provide a reconstructed source CU. The reconstructed source CU is then stored in Frame Buffer 208 for use in Inter and Intra Prediction of other CUs. - Encoding
methods consistent with the present disclosure, such as encoding methods 300 and 400, may be performed by, for example, Intra Prediction enabling ACT Module 212. Through the use of such encoding methods, encoding complexity and encoding time may be reduced. -
FIG. 3 shows a flow chart of encoding method 300 for determining whether TU size evaluation should be performed in an ACT enabled intra prediction encoding process, according to an exemplary embodiment of the present disclosure. More particularly, encoding method 300 utilizes a threshold calculation regarding CU size and, based on the threshold calculation, determines whether a TU size evaluation should be performed. - At
step 304, component correlation analysis is performed on a source CU to determine whether a coding mode with ACT of a coding unit should be enabled or disabled. A correlation of color components for each pixel contained in the CU is determined. For each pixel, the correlation between color components is compared to a pixel correlation threshold. Based on the comparison, it is determined for each pixel whether the correlation is above, equal to, or below the pixel correlation threshold.
- If the total number of pixels is below the CU correlation threshold, then it is determined that color components of the CU have low correlation. It is therefore decided that ACT is not necessary for the CU, and the process proceeds to step 308, disabling ACT.
- However, if the total number of pixels is above the CU correlation threshold, it is determined that color components of the CU have high correlation. In this case, it is determined that ACT is necessary to de-correlate the components of each pixel in the CU. When high correlation is calculated, ACT is enabled, and the process proceeds to step 306, and a rough mode decision as to the intra prediction mode with ACT enabled is determined.
- The correlation analysis of
step 304 may in addition or alternatively be based on the color space of a CU. For example, atstep 304, color components of pixels in the CU may be analyzed and a color space of the CU determined. A color space may be determined as red, green, and blue (RGB), or as a luminance and chrominance (YUV) color space. - When a determination is made that the color space is RGB, the process proceeds to step 306, and the rough mode decision as to the intra prediction mode with ACT enabled is determined. Because RGB pixel components are more likely to have high correlation, ACT is necessary to de-correlate the components of each pixel in the CU in order to isolate pixel energy into a single component.
- In contrast, when a determination is made that the color space is YUV, the process proceeds to step 308, disabling ACT. This is because YUV pixel components are more likely to have low correlation, with most pixel energy stored in a single pixel component. Thus, ACT is not necessary for YUV pixel components because further de-correlation of the CU pixel components will likely not yield additional encoding benefits.
- In Intra Prediction enabling
ACT Module 212, when ACT is disabled duringencoding method 300, the coding mode of Infra Prediction enabling ACT is disabled andModule 212 does not output a prediction toMode Decision Module 210. - In Inter Prediction enabling
ACT Module 204, in a variant of the described process for intra prediction, when ACT is disabled during inter prediction encoding, the coding mode of Inter Prediction enabling ACT is disabled and Module 204 does not output a prediction to Mode Decision Module 210. - At
step 306, the rough mode decision as to the intra prediction mode with ACT enabled is determined. The rough mode decision may be a cost-based mode decision. For example, in the rough mode decision, a low complexity cost associated with encoding utilizing the selected coding mode is determined in order to quickly select the coding modes that most likely have the highest quality and lowest encoding cost. - At
step 310, a rate distortion optimization (RDO) mode decision is determined for the encoding mode with ACT enabled. Here, a deviation from the original video, as well as a bit cost, is calculated for the encoding modes when ACT, CCP, Transform, Quantization, and entropy coding are performed. The deviation may be measured by an error calculation, such as mean squared error (MSE), for example. The encoding mode with the lowest encoding cost and highest encoding quality determined by the RDO analysis is then chosen. - For example, in Intra Prediction enabling
ACT Module 212, 35 intra prediction modes (IPMs) are available for encoding. In the rough mode decision step 306 for Intra Prediction enabling ACT Module 212, a selection of IPMs with the lowest encoding cost and highest encoding quality is made out of the 35 IPMs using a simplified, low complexity encoding cost determination. For example, a sum of absolute transform distortion (SATD) cost may be utilized to determine a low complexity encoding cost of each IPM. The selection of IPMs with the lowest encoding cost and highest encoding quality may be a selection of 3 IPMs, or a selection of 8 IPMs, for example. In RDO mode decision step 310 for Intra Prediction enabling ACT Module 212, an RDO mode decision is determined for each of the selected IPMs. A deviation from the original video, as well as a bit cost for encoding, is calculated for each of the selected IPMs when ACT, CCP, Transform, Quantization, and entropy coding are performed. The deviation may be measured by an error calculation, such as MSE, for example. The IPM with the lowest encoding cost and highest encoding quality determined by the RDO analysis is then chosen from the selected IPMs. - A variant of the process described above in relation to Intra Prediction enabling
ACT Module 212 may also be performed by Inter Prediction enabling ACT Module 204. For example, when Module 204 performs encoding method 300, at step 306, a rough mode decision as to the best inter prediction from temporally adjacent video frames is determined that provides the lowest encoding cost and highest encoding quality. At step 310, an RDO mode decision is determined for the inter prediction. Here, a deviation from the original video, as well as a bit cost for encoding, is calculated for the inter prediction when ACT, CCP, Transform, Quantization, and entropy coding are performed. The deviation may be measured by an error calculation, such as MSE, for example. The inter prediction with the lowest encoding cost and highest encoding quality determined by the RDO analysis is then chosen. - At
step 312, the CU size of the current CU being processed is calculated. A CU may be sized as N vertical samples by N horizontal samples (N×N), where N may equal 4, 8, 16, 32, or 64. The N value for the CU is compared to a threshold T1. T1 may equal 4, 8, 16, 32, or 64. Based on the comparison, it is determined whether the CU size is smaller than T1, and thereby whether to evaluate a size of a transform unit for the enabled coding mode. If the CU size is smaller than T1, the process proceeds to step 314 for a TU size decision. However, if the CU size is equal to or greater than T1, the process proceeds to step 316, bypassing the TU size decision step 314. At step 312, when the CU size is equal to or greater than T1, the TU for that CU is determined directly: the TU quad-tree structure may be determined as the largest possible TU size. For example, when the CU size is equal to or greater than T1, for a PU sized 64×64, four TUs sized 32×32 may be determined. In another example, when the CU size is equal to or greater than T1, for PUs sized 32×32, 16×16, 8×8, or 4×4, a TU may be sized the same as the PU. For example, if a PU is sized 32×32, a corresponding TU may be sized 32×32.
- The process of
step 312 improves coding time and efficiency because the TU size decision may be time consuming and increase encoding cost. Thus, encoding cost and time are saved if the TU size decision can be skipped. Furthermore, a CU size equal to or greater than T1 implies that the content of the CU is not complex. For example, a CU size greater than T1 may mean that large areas of a video image are free of edges, motion, or complex patterns. Therefore, determining a TU size may not be needed for efficiently encoding the CU with high video quality.
- At
step 314, if the CU size is smaller than T1, a TU size decision for the CU is performed. Here, a TU of the CU is determined. One or more TU sizes are analyzed by evaluating the RDO cost determined in step 310 for the prediction modes, to find the TU size resulting in the most efficient, high video quality ACT transform of the CU. TU sizes of 4×4, 8×8, 16×16, and 32×32, for example, are analyzed. When the TU size that results in the most efficient ACT transform is determined, this TU size is selected for the ACT transform of the CU and the process proceeds to step 316. The selected TU size may be determined as the best TU quad-tree structure size.
- At
step 316, a chroma mode decision is determined. The chroma mode decision uses the prediction mode determined in step 310 for chroma prediction, generating a chroma PU and a corresponding chroma TU. The TU determined in step 312 or step 314 is also utilized to generate the chroma TU. The chroma TU is subsampled according to the chroma format. Thus, in one example, when the chroma format is 4:2:0 and the luma TU size is 32×32, the determined chroma TU is sized 16×16.
- At
step 308, the process of selecting the best intra prediction mode and selecting the best TU quad-tree structure size is completed for Module 212. The prediction and the RDO cost are generated by Module 212 and input into Mode Decision Module 210 for comparison with the RDO costs input into Mode Decision Module 210 from the other prediction modules. For example, Inter Prediction enabling ACT Module 204 may generate a prediction of a CU with ACT applied to it and an RDO cost, and input the prediction CU and RDO cost into Mode Decision Module 210. Inter Prediction disabling ACT Module 206 and Intra Prediction disabling ACT Module 214 also each generate a prediction CU and RDO cost, and input their respective prediction CUs and RDO costs into Mode Decision Module 210. Mode Decision Module 210 compares the prediction CUs and RDO costs input from Modules 204, 206, 212, and 214.
-
FIG. 4 shows a flow chart of an encoding method 400 that determines whether ACT should be enabled according to another exemplary embodiment of the present disclosure. More particularly, the encoding method 400 utilizes a threshold calculation regarding CU size in combination with a determination about correlations between color components of CU pixels. Based on the threshold calculation, ACT may be either enabled or disabled. Elements labeled the same as previously referenced refer to previously described elements.
- At
step 304, component correlation analysis is performed on the source CU to determine whether ACT should be enabled or disabled. The process that takes place at step 304 is as described for step 304 of encoding method 300. If it is determined that correlation between color components of the CU is high, or the color space is determined to be an RGB color space, ACT is enabled and the process proceeds through the subsequent steps of method 300. However, if the correlation is determined to be low, or the color space is determined to be a YUV color space, the process moves to step 402.
- At
step 402, the CU size of the current CU being processed is determined. As discussed above, the CU is sized as N vertical by N horizontal (N×N) samples, where N may equal 4, 8, 16, 32, or 64. The N value for the CU is compared to a threshold T2. T2 may equal 4, 8, 16, 32, or 64. Based on the comparison, it is determined whether the CU size is smaller than T2. If the CU size is smaller than T2, ACT is enabled and the process proceeds to step 310, where an RDO based mode decision is made as described in step 310 of encoding method 300. However, if the CU size is equal to or greater than T2, the process proceeds to step 308, disabling ACT.
- In Inter Prediction enabling
ACT Module 204, when ACT is disabled during a variant of encoding method 400, the coding mode of Inter Prediction enabling ACT is disabled and Module 204 does not output a prediction to Mode Decision Module 210. In Intra Prediction enabling ACT Module 212, when ACT is disabled during encoding method 400, the coding mode of Intra Prediction enabling ACT is disabled and Module 212 does not output a prediction to Mode Decision Module 210.
- The process of
step 402 improves coding time and efficiency because a CU size equal to or greater than T2 implies that the content of the CU, and thus the CU, is not complex. A CU size greater than T2 may mean that large areas of a video image are free of edges, motion, or complex patterns. In combination with already adequately de-correlated color components, there may not be a need for ACT in order to efficiently encode the CU.
-
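The two checks that feed this decision — the component correlation analysis of step 304 and the CU size comparison of step 402 — can be sketched as follows. This is an illustrative sketch only: the use of a mean pairwise Pearson correlation, the 0.7 cutoff, and the function names are assumptions for illustration, not details taken from the disclosure.

```python
import numpy as np

def components_highly_correlated(cu_pixels, cutoff=0.7):
    """Step 304 sketch: cu_pixels is shaped (3, N, N), one plane per color
    component. A high mean pairwise correlation suggests that ACT can
    usefully de-correlate the components (cutoff is an assumed value)."""
    planes = [p.ravel() for p in cu_pixels]
    pairs = [(0, 1), (0, 2), (1, 2)]
    corrs = [abs(np.corrcoef(planes[i], planes[j])[0, 1]) for i, j in pairs]
    return float(np.mean(corrs)) >= cutoff

def act_enabled_method_400(cu_pixels, cu_size_n, is_rgb, t2=64):
    """Method 400 sketch: enable ACT for highly correlated or RGB sources;
    otherwise enable it only for CUs smaller than threshold T2."""
    if is_rgb or components_highly_correlated(cu_pixels):
        return True
    return cu_size_n < t2  # step 402: size gate for weakly correlated content
```

With T2 = 64, as in the tests reported below, a weakly correlated 64×64 YUV CU would skip ACT entirely, while smaller CUs would still be tried with ACT enabled.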
FIG. 5 shows a flow chart of an encoding method 500 that determines whether ACT should be enabled and whether TU size evaluation should be performed via two threshold calculations, according to another exemplary embodiment of the present disclosure. More particularly, encoding method 500 utilizes a first threshold calculation regarding CU size in combination with a determination about correlations between color components of CU pixels, which determines whether ACT should be either enabled or disabled. Method 500 also utilizes a second threshold calculation regarding CU size, by which a determination is made as to whether a TU size evaluation should be performed. Elements labeled the same as previously referenced refer to previously described elements.
- At
step 304, component correlation analysis is performed on the source CU to determine whether ACT should be enabled or disabled. The process that takes place at step 304 is as described for step 304 of encoding method 300. If it is determined that correlation between color components of the CU is high, or the color space is determined to be an RGB color space, ACT is enabled and the process proceeds to step 306 for rough mode decision and, thereafter, step 310 for RDO based mode decision. The processes that take place at steps 306 and 310 are as described for encoding method 300. However, if the correlation is determined to be low, or the color space is determined to be a YUV color space, the process moves to step 402.
- At
step 402, the CU size of the current CU being processed is calculated, as discussed previously for encoding method 400 (FIG. 4). If the CU size is smaller than T2, ACT is enabled and the process proceeds to step 310 for RDO based mode decision. However, if the CU size is equal to or greater than T2, the process proceeds to step 308, disabling ACT.
- In Inter Prediction enabling
ACT Module 204, when ACT is disabled during a variant of encoding method 500, the coding mode of Inter Prediction enabling ACT is disabled and Module 204 does not output a prediction to Mode Decision Module 210. In Intra Prediction enabling ACT Module 212, when ACT is disabled during encoding method 500, the coding mode of Intra Prediction enabling ACT is disabled and Module 212 does not output a prediction to Mode Decision Module 210.
- At
step 310, RDO based mode decision is calculated as previously described for encoding method 300.
- At
step 312, the CU size of the current CU being processed is calculated as previously described for encoding method 300. It is determined whether the CU size of the CU is smaller than T1. If the CU size is smaller than T1, the process proceeds to step 314 for TU size decision. However, if the CU size is equal to or greater than T1, the process proceeds to step 316, bypassing the TU size decision step 314. The decision processes at steps 312 and 314 are as described for encoding method 300.
- The thresholds T1 and T2 may be set as the same or as different values.
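Taken together, method 500 applies two independent gates: step 402 decides whether ACT is tried at all, and step 312 decides whether TU sizes are evaluated. A minimal sketch follows; the function and flag names are illustrative assumptions, and the default T1 = T2 = 32 matches Test 1 of the experiments reported below.

```python
def method_500_gates(cu_size_n, high_correlation, is_rgb, t1=32, t2=32):
    """Return (act_enabled, evaluate_tu_size) for one N x N coding unit.
    T2 gates ACT for weakly correlated, non-RGB content (step 402);
    T1 gates the TU size evaluation (step 312)."""
    if high_correlation or is_rgb:
        act_enabled = True             # step 304: ACT worth trying
    else:
        act_enabled = cu_size_n < t2   # step 402: only small CUs keep ACT
    # Step 312: the TU size decision runs only for CUs smaller than T1,
    # and only along the ACT-enabled path of this module.
    evaluate_tu_size = act_enabled and cu_size_n < t1
    return act_enabled, evaluate_tu_size
```

Setting the two thresholds independently reproduces the other reported configurations, e.g. T2 = 64 with T1 = 16.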
-
Encoding method 500 of FIG. 5 combines threshold calculations to improve both encoding efficiency and time. As described above, a CU size equal to or greater than T2 implies that the content of the CU, and thus the CU, is not complex, and may feature large areas free of edges, motion, or complex patterns. In combination with already adequately de-correlated color components, there may not be a need for ACT in order to efficiently encode the CU. Furthermore, encoding cost is saved if the TU size decision at step 314 can be skipped.
-
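When the TU size decision is skipped, the fallback described for method 300 applies: the largest possible TU size is used. A sketch of that fallback (the helper name is an assumption; the size mapping is the one stated in the description of step 312):

```python
def default_tu_partition(pu_size_n):
    """Fallback TU partition when the TU size decision is bypassed:
    a 64x64 PU is covered by four 32x32 TUs (the largest TU size),
    while smaller PUs take a single TU of their own size."""
    if pu_size_n == 64:
        return [(32, 32)] * 4
    return [(pu_size_n, pu_size_n)]
```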
FIG. 6 shows a flow chart of an encoding method 600, similar to encoding method 300, that determines whether TU size evaluation should be performed in an ACT enabled intra prediction encoding process, according to an exemplary embodiment of the present disclosure. More particularly, encoding method 600 utilizes a threshold calculation regarding CU size and, based on the threshold calculation, determines whether a TU size evaluation should be performed.
- At
step 304, component correlation analysis is performed on the source CU to determine whether ACT should be enabled or disabled. The process that takes place at step 304 is as described for step 304 of encoding method 300. If it is determined that correlation between color components of the CU is high, or the color space is determined to be an RGB color space, ACT is enabled and the process proceeds to step 306 for rough mode decision and, thereafter, step 310 for RDO based mode decision. The processes that take place at steps 306 and 310 are as described for encoding method 300. However, if at step 304 the correlation is determined to be low, or the color space is determined to be a YUV color space, the coding mode with ACT is still enabled, but the process proceeds directly to step 310, with the rough mode decision in step 306 disabled. Here, for low correlation pixel components or a YUV color space, ACT is still enabled to check if further de-correlation of the pixel components will yield additional encoding benefits.
- At
step 310, RDO based mode decision is calculated as previously described for encoding method 300.
- At
step 312, the CU size of the current CU being processed is calculated as previously described for encoding method 300. It is determined whether the CU size of the CU is smaller than T1. If the CU size is smaller than T1, the process proceeds to step 314 for TU size decision. However, if the CU size is equal to or greater than T1, the process proceeds to step 316, bypassing the TU size decision step 314. The decision processes at steps 312 and 314 are as described for encoding method 300.
- The thresholds T1 and T2 may be set as the same or as different values.
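For context, the adaptive color transform used in HEVC screen content coding is conventionally the lifting-based YCoCg-R transform; the sketch below illustrates the de-correlation it performs. Naming this particular transform is background knowledge about HEVC SCC, not a limitation of the methods described here, and the integer `>>` shifts follow the usual lifting formulation.

```python
def rgb_to_ycocg_r(r, g, b):
    """Forward YCoCg-R: integer lifting steps, perfectly reversible."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg

def ycocg_r_to_rgb(y, co, cg):
    """Inverse YCoCg-R: undoes the lifting steps exactly."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

The exact integer invertibility of this lifting structure is what allows such a transform to be used even under lossless coding conditions.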
- Decoding processes that perform the reverse steps of encoding methods 300, 400, 500, and 600 may also be performed, to decode video encoded by encoding methods 300, 400, 500, and 600.
-
FIG. 7 shows a system 700 for performing the encoding and decoding methods consistent with the present disclosure. System 700 includes a non-transitory computer-readable storage medium 702 that may be a memory storing instructions capable of being performed by a processor 704. It is noted that one or more non-transitory computer-readable storage media 702 and/or one or more processors 704 may alternatively be utilized to perform encoding and decoding methods consistent with the present disclosure.
- Non-transitory computer-readable storage medium 702 may be any sort of non-transitory computer-readable storage medium (CRM). A non-transitory computer-readable storage medium may include, for example, a floppy disk, a flexible disk, a hard disk, a hard drive, a solid state drive, magnetic tape or any other magnetic data storage medium, a CD-ROM or any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM or any other flash memory, NVRAM, a cache, a register, any other memory chip or cartridge, and networked versions of the same. A computer-readable storage medium may store instructions for execution by at least one processor, including instructions for causing the processor to perform steps or stages consistent with the encoding and decoding methods described herein. Additionally, one or more computer-readable storage media may be used to implement the encoding and decoding methods described herein. The term “computer-readable storage medium” should be understood to include tangible items and exclude carrier waves and transient signals.
-
Processor 704 may be one or more of any sort of digital signal processor (DSP), application specific integrated circuit (ASIC), digital signal processing device (DSPD), programmable logic device (PLD), field programmable gate arrays (FPGA), controller, micro-controller, micro-processor, computer, or any other electronic component for performing the encoding and decoding methods described herein. - The following is a description of experimental results obtained by testing the encoding methods described herein.
Tests were conducted using the HEVC SCC reference model, SCM 4.0, under common test conditions (CTC). Coding performance of the encoding methods described herein was compared to the reference models for HEVC. Encoding was first performed using the HEVC reference model, with the encoding time recorded as encoding time A. Encoding using a test encoding method according to the encoding methods described herein was also performed, with the encoding time recorded as encoding time B. Encoding time percent was calculated by dividing encoding time B by encoding time A. HEVC common test sequences were utilized as the video under examination. The video featured mixed video frames with text, graphics, and motion; mixed content; animation; and camera captured content. Videos with RGB and YUV color spaces were tested, with source video quality equaling 720p, 1080p, or 1440p. All intra prediction under lossy conditions, random access, and low-B prediction were utilized. All intra prediction compresses a video frame using information contained within the frame currently being compressed, while random access and low-B prediction compress a video frame by utilizing information within previously coded frames as well as the frame currently being compressed. Low-B prediction is also referred to as low delay B prediction in the following description. In each test, encoding time, as well as decoding time, was recorded, with percentages indicating the percent of time taken to encode or decode compared to the exemplary encoding and decoding methods of the reference models. Positive percentages for each G/Y, B/U, and R/V component represent bit rate coding loss, while negative percentages represent bit rate coding gain, in relation to the original video source. For example, a 0.1% for a G/Y component represents a coding loss of 0.1% for the G/Y component in the encoded video compared to the G/Y component in the original video source.
In another example, a −0.1% for a G/Y component represents a coding gain of 0.1% for the G/Y component in the encoded video compared to the G/Y component in the original video source.
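The two figures of merit used throughout the tables below can be computed directly. A small sketch of both conventions just described (the helper names are ours, not terms from the disclosure):

```python
def encoding_time_percent(time_b, time_a):
    """Encoding time of the tested method (B) divided by that of the
    reference model (A), in percent; below 100% means faster encoding."""
    return 100.0 * time_b / time_a

def bit_rate_delta_percent(rate_test, rate_reference):
    """Signed per-component bit-rate change: positive values are coding
    loss, negative values are coding gain, per the convention above."""
    return 100.0 * (rate_test - rate_reference) / rate_reference
```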
- Reference is made to
encoding method 500 of FIG. 5 and Table 1 below. For encoding method 500, testing was performed under three settings. In setting 1, T2 and T1 were each set to 64. In setting 2, T2 was set to 64, while T1 was set to 32. In setting 3, T2 was set to 64, while T1 was set to 16. Intra prediction was the determined encoding mode.
- In setting 1, when pixel components had low correlation, CUs with CU sizes greater than or equal to 64×64 were encoded without ACT. CUs sized smaller than 64×64 were encoded with ACT enabled. Furthermore, for CU sizes greater than or equal to 64×64,
TU size decision 314 was skipped. For CU sizes less than 64×64, TU size decision 314 was performed.
- In setting 2, when pixel components had low correlation, CUs with CU sizes greater than or equal to 64×64 were encoded without ACT. CUs sized smaller than 64×64 were encoded with ACT enabled. Furthermore, for CU sizes greater than or equal to 32×32,
TU size decision 314 was skipped. For CU sizes less than 32×32, TU size decision 314 was performed.
- In setting 3, when pixel components had low correlation, CUs with CU sizes greater than or equal to 64×64 were encoded without ACT. CUs sized smaller than 64×64 were encoded with ACT enabled. Furthermore, for CU sizes greater than or equal to 16×16,
TU size decision 314 was skipped. For CU sizes less than 16×16, TU size decision 314 was performed.
-
TABLE 1 Setting 1 Setting 2 Setting 3 All Intra G/Y B/U R/V G/Y B/U R/V G/Y B/U R/V RGB, text & graphics with 0.0% 0.0% 0.0% 0.1% 0.0% 0.0% 0.1% 0.0% −0.1% motion, 1080p & 720p RGB, mixed content, 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.1% 0.2% 0.1% 1440p & 1080p RGB, Animation, 720p 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.1% 0.1% 0.1% RGB, camera captured, 1080p 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.1% YUV, text & graphics with 0.0% 0.0% 0.0% 0.0% 0.0% 0.1% 0.0% 0.1% 0.1% motion, 1080p & 720p YUV, mixed content, 0.1% 0.1% 0.1% 0.0% 0.0% 0.0% 0.0% −0.1% −0.1% 1440p & 1080p YUV, Animation, 720p 0.0% −0.1% 0.0% 0.0% 0.1% 0.0% 0.0% 0.0% 0.1% YUV, camera captured, 1080p 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% Enc Time[%] 97% 94% 91% Dec Time[%] 100% 100% 100% - As shown in Table 1, encoding performance in each of settings 1, 2, and 3 improved. Setting 1 showed a 3% reduction in encoding complexity, while setting 2 showed a 6% reduction in encoding complexity. Setting 3 showed the greatest reduction in encoding complexity, with a reduction of 9%. Thus, all settings exhibited an improvement in coding efficiency. While each setting featured minimal loss of bit rate, encoding time and efficiency was improved.
- Reference is made to
encoding method 500 and Tables 2 and 3 below. Here, testing was performed under all intra, random access, and low delay B. In Test 1, T2 and T1 were both set to 32. In Test 2, T2 and T1 were both set to 16. Consistent with method 500, in Test 1, TU evaluation was disabled for CUs with CU sizes greater than or equal to 32×32, and CUs with CU sizes greater than or equal to 32×32 were encoded without ACT. CUs sized smaller than 32×32 were encoded with ACT enabled. In Test 2, TU evaluation was disabled for CUs with CU sizes greater than or equal to 16×16, and CUs with CU sizes greater than or equal to 16×16 were encoded without ACT. CUs sized smaller than 16×16 were encoded with ACT enabled. Testing was conducted in lossy conditions, with full frame intra block copy utilized.
-
TABLE 2 All Intra Random Access Low delay B G/Y B/U R/V G/Y B/U R/V G/Y B/U R/V Test 1: N = 32 × 32 RGB, text & graphics with −0.1% 0.0% −0.1% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% motion, 1080p & 720p RGB, mixed content, 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.1% 0.1% 0.1% 1440p & 1080p RGB, Animation, 720p 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.1% 0.1% RGB, camera captured, 1080p 0.0% 0.0% 0.0% 0.0% 0.0% 0.1% 0.0% −0.1% −0.1% YUV, text & graphics with 0.0% 0.0% 0.1% 0.0% 0.0% 0.0% 0.0% 0.0% 0.1% motion, 1080p & 720p YUV, mixed content, 0.0% 0.0% 0.0% 0.0% 0.0% −0.1% 0.0% 0.2% −0.1% 1440p & 1080p YUV, Animation, 720p 0.0% 0.1% 0.1% 0.0% 0.0% 0.2% −0.1% 0.3% 0.1% YUV, camera captured, 1080p 0.0% 0.0% 0.0% 0.1% 0.0% −0.1% 0.0% 0.0% 0.0% Enc Time[%] 95% 99% 99% Dec Time[%] 100% 100% 100% Test 2: N = 16 × 16 RGB, text & graphics with 0.0% 0.0% 0.0% 0.0% 0.0% 0.1% 0.0% 0.0% 0.0% motion, 1080p & 720p RGB, mixed content, 0.1% 0.1% 0.1% 0.0% 0.0% 0.0% 0.2% 0.1% 0.1% 1440p & 1080p RGB, Animation, 720p 0.1% 0.1% 0.1% 0.0% 0.0% 0.0% 0.0% 0.1% 0.1% RGB, camera captured, 1080p 0.1% 0.0% 0.1% 0.0% 0.1% 0.1% 0.0% 0.0% 0.0% YUV, text & graphics with 0.0% 0.0% 0.0% 0.0% 0.1% 0.0% −0.1% −0.1% −0.2% motion, 1080p & 720p YUV, mixed content, 0.0% 0.0% 0.0% −0.1% −0.2% −0.2% 0.1% 0.2% 0.0% 1440p & 1080p YUV, Animation, 720p 0.0% 0.2% 0.2% 0.0% 0.0% 0.4% 0.0% 0.2% 0.1% YUV, camera captured, 1080p 0.0% 0.0% 0.0% 0.0% 0.0% −0.2% −0.1% 0.1% 0.0% Enc Time[%] 92% 99% 100% Dec Time[%] 100% 100% 100% - As shown in Table 2, in Test 1, all intra mode resulted in a 5% reduction in encoding complexity. Random access and low delay B each produced a 1 percent encoding complexity reduction. Each setting showed very minimal bit-rate loss, with all intra and random access modes showing almost no change in bit-rate.
In Test 2, all intra mode resulted in an 8% reduction in encoding complexity. Random access produced a 1% encoding complexity reduction, while low delay B produced no change in encoding complexity. Each mode featured more bit-rate loss compared to Test 1, but bit-rate loss was still minimal because it only registered in the decimal percentage range. A decimal percentage bit-rate loss means that, compared to the original video, the encoded video incurred only a small bit-rate penalty, and therefore only a small loss of coding efficiency. Such a small loss is acceptable in most applications due to the improved encoding time achieved by encoding
method 500. -
TABLE 3 All Intra Random Access Low Delay B Bit-rate Bit-rate Bit-rate Bit-rate Bit-rate Bit-rate Bit-rate Bit-rate Bit-rate Bit-rate Bit-rate Bit-rate change change change change change change change change change change change change (Total) (Average) (Min) (Max) (Total) (Average) (Min) (Max) (Total) (Average) (Min) (Max) Test 1: N = 32 × 32 RGB, text & graphics 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% with motion, 1080p & 720p RGB, mixed content, 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 1440p & 1080p RGB, Animation, 720p 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% RGB, camera 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% captured, 1080p YUV, text & graphics 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% with motion, 1080p & 720p YUV, mixed content, 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 1440p & 1080p YUV, Animation, 720p 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% YUV, camera 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% captured, 1080p Enc Time[%] 99% 100% 100% Dec Time[%] 100% 100% 100% Test 2: N = 16 × 16 RGB, text & graphics 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% with motion, 1080p & 720p RGB, mixed content, 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 1440p & 1080p RGB, Animation, 720p 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% RGB, camera 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% captured, 1080p YUV, text & graphics 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% with motion, 1080p & 720p YUV, mixed content, 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 1440p & 1080p YUV, Animation, 720p 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% YUV, camera 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% captured, 1080p Enc Time[%] 99% 100% 100% Dec Time[%] 100% 100% 100% - As shown in Table 3, in Test 1 and Test 2, 
each mode featured zero bit-rate change, both in total and on average. All intra mode featured the best reduction in encoding complexity, showing a 1% reduction in each test.
- Reference is made to
encoding method 500 described in FIG. 5 and Table 4. Here, testing was conducted in lossy conditions, with 4-CTU intra block copy utilized and a chroma mode of 4:4:4. Intra block copy utilizes a motion vector to copy a block from a previously coded CU in the currently coded video frame. 4-CTU indicates the allowable searching area for the motion vector.
- In Test 1, T2 and T1 were both set to 32. In Test 2, T2 and T1 were both set to 16. Consistent with
method 500, TU evaluation was disabled for CUs with CU sizes greater than or equal to 32×32 in Test 1, and TU evaluation was disabled for CUs with CU sizes greater than or equal to 16×16 in Test 2. ACT was enabled for CU sizes less than 32×32 in Test 1, with ACT disabled when CU sizes were greater than or equal to 32×32. In Test 2, ACT was enabled for CU sizes smaller than 16×16, with ACT disabled when CU sizes were greater than or equal to 16×16.
-
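Intra block copy, as described above, predicts a block from an already-reconstructed region of the same frame via a displacement vector. A minimal sketch follows; the array layout and names are illustrative assumptions, and a real encoder would additionally restrict the vector to the allowed search area (e.g., 4 CTUs) and to already-coded samples.

```python
import numpy as np

def intra_block_copy(frame, dst_x, dst_y, size, bv_x, bv_y):
    """Return the size x size prediction for the block at (dst_x, dst_y),
    copied from the same frame displaced by block vector (bv_x, bv_y).
    The source region must lie within the already-coded part of the frame."""
    sx, sy = dst_x + bv_x, dst_y + bv_y
    return frame[sy:sy + size, sx:sx + size].copy()
```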
TABLE 4 All Intra Random Access Low delay B G/Y B/U R/V G/Y B/U R/V G/Y B/U R/V Test 1: N = 32 × 32 RGB, text & graphics with 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.1% motion, 1080p & 720p RGB, mixed content, 0.0% 0.0% 0.0% −0.1% 0.0% 0.0% 0.1% 0.0% 0.2% 1440p & 1080p RGB, Animation, 720p 0.0% 0.0% 0.0% 0.0% 0.0% 0.1% 0.0% −0.1% 0.0% RGB, camera captured, 1080p 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% YUV, text & graphics with 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.2% 0.1% motion, 1080p & 720p YUV, mixed content, 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.2% 0.2% 1440p & 1080p YUV, Animation, 720p 0.0% 0.1% 0.1% 0.0% 0.2% 0.2% −0.1% −0.2% 0.1% YUV, camera captured, 1080p 0.0% 0.0% 0.0% 0.0% 0.1% 0.0% 0.0% −0.1% 0.1% Enc Time[%] 95% 99% 99% Dec Time[%] 100% 100% 100% Test 2: N = 16 × 16 RGB, text & graphics with 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.1% motion, 1080p & 720p RGB, mixed content, 0.1% 0.1% 0.1% 0.1% 0.1% 0.1% −0.1% −0.1% 0.1% 1440p & 1080p RGB, Animation, 720p 0.1% 0.1% 0.1% 0.0% 0.0% 0.0% 0.0% −0.1% −0.1% RGB, camera captured, 1080p 0.1% 0.0% 0.1% 0.0% 0.0% 0.1% 0.0% 0.0% −0.1% YUV, text & graphics with 0.0% 0.0% 0.0% 0.0% 0.0% 0.1% 0.0% 0.1% 0.0% motion, 1080p & 720p YUV, mixed content, 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.2% 0.2% 1440p & 1080p YUV, Animation, 720p 0.0% 0.2% 0.2% 0.0% 0.3% 0.3% 0.0% −0.1% 0.2% YUV, camera captured, 1080p 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.1% 0.0% 0.1% Enc Time[%] 92% 99% 100% Dec Time[%] 100% 100% 100% - As shown in Table 4, in Test 1 and Test 2, each mode featured minimal bit-rate change in all intra, random access, or low-delay B modes. All intra featured the best reduction in encoding complexity in both tests, showing a 5% reduction Test 1, and an 8% reduction in Test 2.
- Reference is made to
encoding method 400 of FIG. 4, and Tables 5.1 and 5.2 below. Here, testing was performed with T2 set to 64. Thus, when component correlation analysis at step 304 determined low correlation among the color components of a CU, a determination at step 402 was performed to establish whether the CU size of the CU was smaller than 64×64. If the CU size of the CU was smaller than 64×64, ACT was enabled and RDO based mode decision was performed at step 310. If the CU size of the CU was greater than or equal to 64×64, ACT was disabled and the process proceeded to step 308. Testing conditions were based on lossy all intra encoding mode with full frame intra block copy in Test 1, and lossy all intra encoding mode with 4-CTU IBC in Test 2. Chroma mode was selected as 4:4:4 in each test.
-
TABLE 5.1 Test 1 Encoding AI, Lossy, FF-IBC search G/Y B/U R/V Time YUV, text & graphics with motion, 0.0% 0.0% 0.0% 97% 1080p & 720p YUV, mixed content, 1440p & 1080p 0.0% 0.0% 0.0% 97% YUV, Animation, 720p 0.0% 0.1% 0.1% 99% YUV, camera captured, 1080p 0.0% 0.0% 0.0% 98% -
TABLE 5.2 Test 2 Encoding AI, Lossy, 4-CTU IBC search G/Y B/U R/V Time YUV, text & graphics with motion, 0.0% 0.0% 0.0% 97% 1080p & 720p YUV, mixed content, 1440p & 1080p 0.0% 0.0% 0.0% 97% YUV, Animation, 720p 0.0% 0.1% 0.1% 98% YUV, camera captured, 1080p 0.0% 0.0% 0.0% 98% - As shown by Table 5.1, for YUV color spaces and lossy all intra (Al) encoding utilizing full frame IBC, encoding
method 400 resulted in a 1% to 3% reduction in encoding time, with minimal bit-rate loss. Table 5.2 shows that in lossy all intra encoding utilizing 4-CTU IBC search, theencoding method 400 resulted in similar reduction in encoding time, with minimal bit-rate loss, as in Table 5.1: Test 1. - Reference is made to
encoding method 400 and Table 6 below. Here, T2 was set to 64. Lossless intra encoding was performed, with chroma mode selected as 4:4:4. -
TABLE 6 All Intra Bit-rate Bit-rate Bit-rate Bit-rate change change change change Encoding (Total) (Average) (Min) (Max) Time YUV, text 0.0% 0.0% 0.0% 0.0% 98.7% & graphics with motion, 1080p & 720p YUV, mixed 0.0% 0.0% 0.0% 0.0% 98.4% content, 1440p & 1080p YUV, Animation, 0.0% 0.0% 0.0% 0.0% 100.0% 720p YUV, camera 0.0% 0.0% 0.0% 0.0% 98.2% captured, 1080p - For YUV color spaces, the
encoding method 400 resulted in a 0% to about 2% saving of encoding time. - Reference is made to encoding method 300 (
FIG. 3) and Table 7 below. Here, T1 was set to 32 in Test 1, and to 16 in Test 2. Consistent with method 300, in Test 1, for CUs with CU sizes greater than or equal to 32×32, the TU size decision 314 was skipped. For CUs with CU sizes less than 32×32, TU size decision 314 was performed. In Test 2, for CUs with CU sizes greater than or equal to 16×16, the TU size decision 314 was skipped. For CUs with CU sizes less than 16×16, TU size decision 314 was performed. Lossy all intra encoding with ACT enabled was performed.
-
TABLE 7 Test 1: Test 2: when CU ≧ 32 × 32 Encoding when CU ≧ 16 × 16 Encoding AI, Lossy G/Y B/U R/V time [%] G/Y B/U R/V time [%] RGB, TGM, 1080p & 720p −0.1% 0.0% −0.1% 96% 0.0% 0.0% 0.0% 94% RGB, mixed, 1440p & 1080p 0.0% 0.0% 0.0% 96% 0.1% 0.1% 0.1% 94% RGB, Animation, 720p 0.0% 0.0% 0.0% 97% 0.1% 0.1% 0.1% 94% RGB, camera captured, 1080p 0.0% 0.0% 0.0% 96% 0.1% 0.0% 0.1% 93% YUV, TGM, 1080p & 720p 0.0% 0.0% 0.1% 95% 0.0% 0.0% 0.0% 92% YUV, mixed, 1440p & 1080p 0.0% 0.0% 0.0% 95% 0.0% 0.0% 0.0% 92% YUV, Animation, 720p 0.0% 0.0% 0.0% 96% 0.0% 0.2% 0.1% 92% YUV, camera captured, 1080p 0.0% 0.0% 0.0% 94% 0.0% 0.0% 0.0% 90% Enc Time[%] 96% 93% Dec Time[%] 100% 100% - Encoding time in Test 1 was reduced by between 3% to 6%. In Test 2, encoding time was reduced by between 6% to 10%. Thus, allowing TU size decisions only for CU sized less than 32×32 or 16×16 aided encoding efficiency.
- The foregoing description has been presented for purposes of illustration. It is not exhaustive and is not limited to the precise forms or embodiments disclosed. Modifications and adaptations of the embodiments will be apparent from consideration of the specification and practice of the disclosed embodiments. For example, the described implementations include hardware and software, but systems and methods consistent with the present disclosure can be implemented as hardware alone.
- Preparing computer programs based on the written description and methods of this specification is within the skill of a software developer. The various programs or program Modules can be created using a variety of programming techniques. For example, program sections or program Modules can be designed in or by means of Java, C, C++, assembly language, or any such programming languages. One or more of such software sections or Modules can be integrated into a computer system, non-transitory computer-readable media, or existing communications software.
- Moreover, while illustrative embodiments have been described herein, the scope includes any and all embodiments having equivalent elements, modifications, omissions, combinations (e.g., of aspects across various embodiments), adaptations or alterations based on the present disclosure. The elements in the claims are to be interpreted broadly based on the language employed in the claims and not limited to examples described in the present specification or during the prosecution of the application, which examples are to be construed as non-exclusive. Further, the steps of the disclosed methods can be modified in any manner, including by reordering steps or inserting or deleting steps. It is intended, therefore, that the specification and examples be considered as exemplary only, with a true scope and spirit being indicated by the following claims and their full scope of equivalents.
- Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. The scope of the disclosure is intended to cover any variations, uses, or adaptations of the disclosure following the general principles thereof and including such departures from the present disclosure as come within known or customary practice in the art. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
- It will be appreciated that the present disclosure is not limited to the exact construction that has been described above and illustrated in the accompanying drawings, and that various modifications and changes can be made without departing from the scope thereof. It is intended that the scope of the disclosure only be limited by the appended claims.
Claims (20)
1. A video encoding method comprising:
receiving a source video frame;
dividing the source video frame into a coding tree unit;
determining a coding unit from the coding tree unit;
determining a correlation between components of the coding unit;
enabling or disabling a coding mode of the coding unit;
determining whether to evaluate a size of a transform unit for an enabled coding mode; and
determining a transform unit of the coding unit for the enabled coding mode;
wherein the size of the coding unit is defined by a number (N) of samples.
2. The method of claim 1, wherein:
determining a correlation between components of the coding unit comprises determining a color space of the coding unit.
3. The method of claim 2, wherein:
determining the color space comprises determining whether the color space is a red, green, and blue (RGB) color space or a luminance and chrominance (YUV) color space.
4. The method of claim 3, further comprising:
enabling, if the enabled coding mode is an intra prediction mode which enables an adaptive color transform, a cost-based mode decision when the color space is determined to be an RGB color space.
5. The method of claim 3, further comprising:
disabling, if the enabled coding mode is an intra prediction mode which enables an adaptive color transform, a cost-based decision when the color space is determined to be a YUV color space.
6. The method of claim 3, further comprising:
disabling a coding mode of the coding unit when the color space is determined to be a YUV color space and N is greater than or equal to a threshold.
7. The method of claim 3, further comprising:
determining whether N is smaller than a threshold; and
enabling, when the color space is determined to be a YUV color space and N is smaller than the threshold, a coding mode which enables an adaptive color transform.
8. The method of claim 1, further comprising:
determining whether N is greater than or equal to a threshold; and
determining, when N is greater than or equal to the threshold and the enabled coding mode is a coding mode which enables an adaptive color transform, not to evaluate the size of the transform unit.
9. The method of claim 1, further comprising:
determining whether N is smaller than a threshold; and
evaluating, when N is smaller than the threshold and the enabled coding mode is a coding mode which enables an adaptive color transform, a size of the transform unit.
10. The method of claim 9, further comprising:
selecting the size of the transform unit.
11. The method of claim 1, further comprising:
determining whether N is smaller than a first threshold;
determining whether N is smaller than a second threshold;
enabling, when the color space is determined to be a YUV color space and when N is smaller than the first threshold, a coding mode which enables an adaptive color transform; and
evaluating a size of the transform unit for the coding mode which enables the adaptive color transform when N is smaller than the second threshold.
12. The method of claim 1, further comprising:
determining whether N is smaller than a first threshold;
determining whether N is greater than or equal to a second threshold;
enabling, when the color space is determined to be a YUV color space and when N is smaller than the first threshold, a coding mode which enables an adaptive color transform; and
determining not to evaluate the size of the transform unit for the coding mode which enables the adaptive color transform when N is greater than or equal to the second threshold.
13. The method of claim 1, wherein:
N equals a number of horizontal samples or a number of vertical samples.
14. The method of claim 1, wherein:
determining a correlation between components of the coding unit comprises comparing a correlation of pixel components to a threshold.
15. A video encoding system comprising:
a memory storing instructions; and
a processor configured to execute the instructions to:
receive a source video frame;
divide the source video frame into a coding tree unit;
determine a coding unit from the coding tree unit;
determine a correlation between components of the coding unit;
enable or disable a coding mode of the coding unit;
determine whether to evaluate a size of a transform unit for an enabled coding mode; and
determine a transform unit of the coding unit for the enabled coding mode;
wherein the size of the coding unit is defined by a number (N) of samples.
16. The system of claim 15, wherein the processor is further configured to execute instructions to:
determine the correlation between components of the coding unit by determining a color space of the coding unit.
17. The system of claim 16, wherein the processor is further configured to execute instructions to:
determine the color space by determining whether the color space is a red, green, and blue (RGB) color space or a luminance and chrominance (YUV) color space.
18. The system of claim 15, wherein the processor is further configured to execute instructions to:
determine whether N is smaller than a first threshold;
determine whether N is smaller than a second threshold;
enable, when the color space is determined to be a YUV color space and when N is smaller than the first threshold, a coding mode which enables an adaptive color transform; and
evaluate a size of the transform unit for the coding mode which enables the adaptive color transform when N is smaller than the second threshold.
19. The system of claim 15, wherein the processor is further configured to execute instructions to:
determine whether N is smaller than a first threshold;
determine whether N is greater than or equal to a second threshold;
enable, when the color space is determined to be a YUV color space and when N is smaller than the first threshold, a coding mode which enables an adaptive color transform; and
determine not to evaluate the size of the transform unit for the coding mode which enables the adaptive color transform when N is greater than or equal to the second threshold.
20. A non-transitory computer-readable storage medium storing a set of instructions that, when executed by one or more processors, cause the one or more processors to perform a method of video encoding, the method comprising:
receiving a source video frame;
dividing the source video frame into a coding tree unit;
determining a coding unit from the coding tree unit;
determining a correlation between components of the coding unit;
enabling or disabling a coding mode of the coding unit;
determining whether to evaluate a size of a transform unit for an enabled coding mode; and
determining a transform unit of the coding unit for the enabled coding mode;
wherein the size of the coding unit is defined by a number (N) of samples.
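For illustration only (this sketch is not part of the claims), the threshold-based decision flow recited in claims 6 through 12 can be expressed in code; the function name, parameter names, and return convention below are hypothetical:

```python
def plan_coding_unit(color_space: str, n: int,
                     first_threshold: int, second_threshold: int):
    """Hypothetical sketch of the claimed decision flow.

    color_space: "RGB" or "YUV" (claims 2-3).
    n: the CU size N, in samples (claim 1).
    first_threshold: gate on the ACT-enabled coding mode (claims 6-7, 11-12).
    second_threshold: gate on the TU size evaluation (claims 8-9, 11-12).
    Returns (act_mode_enabled, evaluate_tu_size).
    """
    # Claims 6-7: for a YUV color space, the coding mode which enables an
    # adaptive color transform is allowed only when N is below the first
    # threshold; it is disabled when N meets or exceeds it.
    act_mode_enabled = not (color_space == "YUV" and n >= first_threshold)

    # Claims 8-9, 11-12: the TU size is evaluated for the ACT-enabled mode
    # only when N is below the second threshold; otherwise it is skipped.
    evaluate_tu_size = act_mode_enabled and n < second_threshold
    return act_mode_enabled, evaluate_tu_size

# Example: with thresholds (32, 16), a 32x32 YUV CU disables the ACT mode,
# while an 8x8 YUV CU enables it and also evaluates the TU size.
assert plan_coding_unit("YUV", 32, 32, 16) == (False, False)
assert plan_coding_unit("YUV", 8, 32, 16) == (True, True)
assert plan_coding_unit("RGB", 64, 32, 16) == (True, False)
```

Consistent with Table 7, raising either threshold makes the encoder do more evaluation work; lowering them trades a small rate-distortion change for encoding-time savings.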
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/757,556 US20160360205A1 (en) | 2015-06-08 | 2015-12-24 | Video encoding methods and systems using adaptive color transform |
EP16166891.8A EP3104606B1 (en) | 2015-06-08 | 2016-04-25 | Video encoding methods and systems using adaptive color transform |
JP2016090958A JP6670670B2 (en) | 2015-06-08 | 2016-04-28 | Video encoding method and system using adaptive color conversion |
TW105114323A TWI597977B (en) | 2015-06-08 | 2016-05-09 | Video encoding methods and systems using adaptive color transform |
CN201610357374.8A CN106254870B (en) | 2015-06-08 | 2016-05-26 | Video encoding method, system and computer-readable recording medium using adaptive color conversion |
US15/196,108 US10390020B2 (en) | 2015-06-08 | 2016-06-29 | Video encoding methods and systems using adaptive color transform |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201562172256P | 2015-06-08 | 2015-06-08 | |
US14/757,556 US20160360205A1 (en) | 2015-06-08 | 2015-12-24 | Video encoding methods and systems using adaptive color transform |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/196,108 Continuation-In-Part US10390020B2 (en) | 2015-06-08 | 2016-06-29 | Video encoding methods and systems using adaptive color transform |
Publications (1)
Publication Number | Publication Date |
---|---|
US20160360205A1 true US20160360205A1 (en) | 2016-12-08 |
Family
ID=56117547
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/757,556 Abandoned US20160360205A1 (en) | 2015-06-08 | 2015-12-24 | Video encoding methods and systems using adaptive color transform |
US15/177,203 Active 2037-02-21 US10225556B2 (en) | 2015-06-08 | 2016-06-08 | Method and apparatus of encoding or decoding coding units of a video content in a palette coding mode using an adaptive palette predictor |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/177,203 Active 2037-02-21 US10225556B2 (en) | 2015-06-08 | 2016-06-08 | Method and apparatus of encoding or decoding coding units of a video content in a palette coding mode using an adaptive palette predictor |
Country Status (5)
Country | Link |
---|---|
US (2) | US20160360205A1 (en) |
EP (1) | EP3104607A1 (en) |
JP (2) | JP2017022696A (en) |
CN (1) | CN106254871B (en) |
TW (1) | TWI574551B (en) |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160360198A1 (en) * | 2015-06-08 | 2016-12-08 | Industrial Technology Research Institute | Video encoding methods and systems using adaptive color transform |
US9936199B2 (en) * | 2014-09-26 | 2018-04-03 | Dolby Laboratories Licensing Corporation | Encoding and decoding perceptually-quantized video content |
US20190246143A1 (en) * | 2018-02-08 | 2019-08-08 | Qualcomm Incorporated | Intra block copy for video coding |
CN111327950A (en) * | 2020-03-05 | 2020-06-23 | 腾讯科技(深圳)有限公司 | Video transcoding method and device |
US20210021811A1 (en) * | 2018-11-28 | 2021-01-21 | Beijing Bytedance Network Technology Co., Ltd. | Independent construction method for block vector list in intra block copy mode |
US11146805B2 (en) * | 2018-11-30 | 2021-10-12 | Tencent America LLC | Method and apparatus for video coding |
US11240507B2 (en) * | 2019-09-24 | 2022-02-01 | Qualcomm Incorporated | Simplified palette predictor update for video coding |
CN114009050A (en) * | 2019-06-21 | 2022-02-01 | 北京字节跳动网络技术有限公司 | Adaptive In-loop Color Space Transform for Video Codec |
US11277624B2 (en) | 2018-11-12 | 2022-03-15 | Beijing Bytedance Network Technology Co., Ltd. | Bandwidth control methods for inter prediction |
RU2781437C1 * | 2019-09-24 | 2022-10-12 | Tencent America LLC | Transmitting information about the size of the coding tree unit |
US11509923B1 (en) | 2019-03-06 | 2022-11-22 | Beijing Bytedance Network Technology Co., Ltd. | Usage of converted uni-prediction candidate |
CN116437086A (en) * | 2019-12-30 | 2023-07-14 | 阿里巴巴(中国)有限公司 | Method and apparatus for encoding video data in palette mode |
US11825030B2 (en) | 2018-12-02 | 2023-11-21 | Beijing Bytedance Network Technology Co., Ltd | Intra block copy mode with dual tree partition |
US11838539B2 (en) | 2018-10-22 | 2023-12-05 | Beijing Bytedance Network Technology Co., Ltd | Utilization of refined motion vector |
US20240073429A1 (en) * | 2019-02-17 | 2024-02-29 | Beijing Bytedance Network Technology Co., Ltd. | Restriction Of Applicability For Intra Block Copy Mode |
US11956465B2 (en) | 2018-11-20 | 2024-04-09 | Beijing Bytedance Network Technology Co., Ltd | Difference calculation based on partial position |
US12278949B2 (en) | 2019-11-07 | 2025-04-15 | Beijing Bytedance Technology Co., Ltd. | Quantization properties of adaptive in-loop color-space transform for video coding |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9807402B2 (en) * | 2014-10-06 | 2017-10-31 | Industrial Technology Research Institute | Method of color palette coding applicable to electronic device and electronic device using the same |
JP6148785B1 (en) * | 2016-12-26 | 2017-06-14 | 株式会社Cygames | Information processing system, information processing apparatus, and program |
CN113784129B (en) * | 2020-06-10 | 2024-11-22 | Oppo广东移动通信有限公司 | Point cloud quality assessment method, encoder, decoder and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140027051A1 (en) * | 2011-09-29 | 2014-01-30 | Viking Tech Corporation | Method of Fabricating a Light Emitting Diode Packaging Structure |
US20150373327A1 (en) * | 2014-06-20 | 2015-12-24 | Qualcomm Incorporated | Block adaptive color-space conversion coding |
US20160080751A1 (en) * | 2014-09-12 | 2016-03-17 | Vid Scale, Inc. | Inter-component de-correlation for video coding |
US20160261870A1 (en) * | 2015-03-06 | 2016-09-08 | Qualcomm Incorporated | Fast video encoding method with block partitioning |
US20170180740A1 (en) * | 2013-04-16 | 2017-06-22 | Fastvdo Llc | Adaptive coding, transmission and efficient display of multimedia (acted) |
Family Cites Families (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7634405B2 (en) | 2005-01-24 | 2009-12-15 | Microsoft Corporation | Palette-based classifying and synthesizing of auditory information |
AR064274A1 (en) * | 2006-12-14 | 2009-03-25 | Panasonic Corp | MOVEMENT IMAGE CODING METHOD, MOVING IMAGE CODING DEVICE, MOVING IMAGE RECORDING METHOD, RECORDING MEDIA, MOVING IMAGE PLAYBACK METHOD, IMPROVEMENT AND IMPROVEMENT SYSTEM |
US9654777B2 (en) * | 2013-04-05 | 2017-05-16 | Qualcomm Incorporated | Determining palette indices in palette-based video coding |
US9558567B2 (en) | 2013-07-12 | 2017-01-31 | Qualcomm Incorporated | Palette prediction in palette-based video coding |
CN104301737B (en) * | 2013-07-15 | 2017-11-17 | 华为技术有限公司 | The coding/decoding method of target image block and coding method and decoder and encoder |
US9607574B2 (en) * | 2013-08-09 | 2017-03-28 | Apple Inc. | Video data compression format |
US20150110181A1 (en) | 2013-10-18 | 2015-04-23 | Samsung Electronics Co., Ltd. | Methods for palette prediction and intra block copy padding |
PL3117617T3 (en) | 2014-03-14 | 2022-08-22 | Vid Scale, Inc. | Palette coding for screen content coding |
US10362336B2 (en) | 2014-03-25 | 2019-07-23 | Qualcomm Incorporated | Palette predictor signaling with run length code for video coding |
US10750198B2 (en) * | 2014-05-22 | 2020-08-18 | Qualcomm Incorporated | Maximum palette parameters in palette-based video coding |
CN105323583B (en) | 2014-06-13 | 2019-11-15 | 财团法人工业技术研究院 | Encoding method, decoding method, encoding/decoding system, encoder and decoder |
US10687064B2 (en) | 2014-08-04 | 2020-06-16 | Qualcomm Incorporated | Palette mode encoding and decoding with inferred pixel scan order |
CN105491379A (en) | 2014-10-01 | 2016-04-13 | 财团法人工业技术研究院 | Decoder, encoder, decoding method, encoding method and encoding/decoding system |
US9877029B2 (en) | 2014-10-07 | 2018-01-23 | Qualcomm Incorporated | Palette index binarization for palette-based video coding |
2015
- 2015-12-24 US US14/757,556 patent/US20160360205A1/en not_active Abandoned

2016
- 2016-06-08 US US15/177,203 patent/US10225556B2/en active Active
- 2016-06-08 EP EP16173459.5A patent/EP3104607A1/en not_active Ceased
- 2016-06-08 JP JP2016114199A patent/JP2017022696A/en active Pending
- 2016-06-08 CN CN201610405825.0A patent/CN106254871B/en active Active
- 2016-06-08 TW TW105118188A patent/TWI574551B/en active

2018
- 2018-04-10 JP JP2018075717A patent/JP2018137796A/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140027051A1 (en) * | 2011-09-29 | 2014-01-30 | Viking Tech Corporation | Method of Fabricating a Light Emitting Diode Packaging Structure |
US20170180740A1 (en) * | 2013-04-16 | 2017-06-22 | Fastvdo Llc | Adaptive coding, transmission and efficient display of multimedia (acted) |
US20150373327A1 (en) * | 2014-06-20 | 2015-12-24 | Qualcomm Incorporated | Block adaptive color-space conversion coding |
US20160080751A1 (en) * | 2014-09-12 | 2016-03-17 | Vid Scale, Inc. | Inter-component de-correlation for video coding |
US20160261870A1 (en) * | 2015-03-06 | 2016-09-08 | Qualcomm Incorporated | Fast video encoding method with block partitioning |
Cited By (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9936199B2 (en) * | 2014-09-26 | 2018-04-03 | Dolby Laboratories Licensing Corporation | Encoding and decoding perceptually-quantized video content |
US10390020B2 (en) * | 2015-06-08 | 2019-08-20 | Industrial Technology Research Institute | Video encoding methods and systems using adaptive color transform |
US20160360198A1 (en) * | 2015-06-08 | 2016-12-08 | Industrial Technology Research Institute | Video encoding methods and systems using adaptive color transform |
US20190246143A1 (en) * | 2018-02-08 | 2019-08-08 | Qualcomm Incorporated | Intra block copy for video coding |
US11012715B2 (en) * | 2018-02-08 | 2021-05-18 | Qualcomm Incorporated | Intra block copy for video coding |
US11889108B2 (en) | 2018-10-22 | 2024-01-30 | Beijing Bytedance Network Technology Co., Ltd | Gradient computation in bi-directional optical flow |
US12041267B2 (en) | 2018-10-22 | 2024-07-16 | Beijing Bytedance Network Technology Co., Ltd. | Multi-iteration motion vector refinement |
US11838539B2 (en) | 2018-10-22 | 2023-12-05 | Beijing Bytedance Network Technology Co., Ltd | Utilization of refined motion vector |
US11956449B2 (en) | 2018-11-12 | 2024-04-09 | Beijing Bytedance Network Technology Co., Ltd. | Simplification of combined inter-intra prediction |
US11277624B2 (en) | 2018-11-12 | 2022-03-15 | Beijing Bytedance Network Technology Co., Ltd. | Bandwidth control methods for inter prediction |
US11284088B2 (en) | 2018-11-12 | 2022-03-22 | Beijing Bytedance Network Technology Co., Ltd. | Using combined inter intra prediction in video processing |
US11843725B2 (en) | 2018-11-12 | 2023-12-12 | Beijing Bytedance Network Technology Co., Ltd | Using combined inter intra prediction in video processing |
US11516480B2 (en) | 2018-11-12 | 2022-11-29 | Beijing Bytedance Network Technology Co., Ltd. | Simplification of combined inter-intra prediction |
US11956465B2 (en) | 2018-11-20 | 2024-04-09 | Beijing Bytedance Network Technology Co., Ltd | Difference calculation based on partial position |
US20230095275A1 (en) * | 2018-11-28 | 2023-03-30 | Beijing Bytedance Network Technology Co., Ltd. | Independent construction method for block vector list in intra block copy mode |
US20210021811A1 (en) * | 2018-11-28 | 2021-01-21 | Beijing Bytedance Network Technology Co., Ltd. | Independent construction method for block vector list in intra block copy mode |
US11146805B2 (en) * | 2018-11-30 | 2021-10-12 | Tencent America LLC | Method and apparatus for video coding |
US12137238B2 (en) | 2018-11-30 | 2024-11-05 | Tencent America LLC | Reusing reference sample memory based on flexible partition type |
US20210385476A1 (en) * | 2018-11-30 | 2021-12-09 | Tencent America LLC | Method and apparatus for video coding |
US11575924B2 (en) * | 2018-11-30 | 2023-02-07 | Tencent America LLC | Method and apparatus for video coding |
US11825030B2 (en) | 2018-12-02 | 2023-11-21 | Beijing Bytedance Network Technology Co., Ltd | Intra block copy mode with dual tree partition |
US20240073429A1 (en) * | 2019-02-17 | 2024-02-29 | Beijing Bytedance Network Technology Co., Ltd. | Restriction Of Applicability For Intra Block Copy Mode |
US11509923B1 (en) | 2019-03-06 | 2022-11-22 | Beijing Bytedance Network Technology Co., Ltd. | Usage of converted uni-prediction candidate |
US11930165B2 (en) | 2019-03-06 | 2024-03-12 | Beijing Bytedance Network Technology Co., Ltd | Size dependent inter coding |
CN114009050A (en) * | 2019-06-21 | 2022-02-01 | 北京字节跳动网络技术有限公司 | Adaptive In-loop Color Space Transform for Video Codec |
US11778233B2 (en) | 2019-06-21 | 2023-10-03 | Beijing Bytedance Network Technology Co., Ltd | Selective use of adaptive in-loop color-space transform and other video coding tools |
RU2781437C1 * | 2019-09-24 | 2022-10-12 | Tencent America LLC | Transmitting information about the size of the coding tree unit |
US11240507B2 (en) * | 2019-09-24 | 2022-02-01 | Qualcomm Incorporated | Simplified palette predictor update for video coding |
US12278949B2 (en) | 2019-11-07 | 2025-04-15 | Beijing Bytedance Technology Co., Ltd. | Quantization properties of adaptive in-loop color-space transform for video coding |
CN116437086A (en) * | 2019-12-30 | 2023-07-14 | 阿里巴巴(中国)有限公司 | Method and apparatus for encoding video data in palette mode |
CN111327950A (en) * | 2020-03-05 | 2020-06-23 | 腾讯科技(深圳)有限公司 | Video transcoding method and device |
Also Published As
Publication number | Publication date |
---|---|
CN106254871A (en) | 2016-12-21 |
JP2018137796A (en) | 2018-08-30 |
EP3104607A1 (en) | 2016-12-14 |
CN106254871B (en) | 2020-08-18 |
TWI574551B (en) | 2017-03-11 |
US10225556B2 (en) | 2019-03-05 |
JP2017022696A (en) | 2017-01-26 |
TW201709730A (en) | 2017-03-01 |
US20160360207A1 (en) | 2016-12-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20160360205A1 (en) | Video encoding methods and systems using adaptive color transform | |
US10390020B2 (en) | Video encoding methods and systems using adaptive color transform | |
US11979601B2 (en) | Encoder-side search ranges having horizontal bias or vertical bias | |
US11438618B2 (en) | Method and apparatus for residual sign prediction in transform domain | |
CN110024398B (en) | Local hash-based motion estimation for screen remoting scenarios | |
EP3158751B1 (en) | Encoder decisions based on results of hash-based block matching | |
US20190246122A1 (en) | Palette coding for video coding | |
CN106170092B (en) | Fast coding method for lossless coding | |
US9877044B2 (en) | Video encoder and operation method thereof | |
US10542274B2 (en) | Dictionary encoding and decoding of screen content | |
CN106254870B (en) | Video encoding method, system and computer-readable recording medium using adaptive color conversion | |
US10038917B2 (en) | Search strategies for intra-picture prediction modes | |
US10735725B2 (en) | Boundary-intersection-based deblock filtering | |
US20160269732A1 (en) | Encoder-side decisions for screen content encoding | |
US20220417511A1 (en) | Methods and systems for performing combined inter and intra prediction | |
KR20210099008A (en) | Method and apparatus for deblocking an image | |
KR20180029277A (en) | Encoding method and device, decoding method and device, and computer-readable storage medium | |
US20170006283A1 (en) | Computationally efficient sample adaptive offset filtering during video encoding | |
US20160373739A1 (en) | Intra/inter decisions using stillness criteria and information from previous pictures | |
EP3104606B1 (en) | Video encoding methods and systems using adaptive color transform | |
WO2015131304A1 (en) | Dictionary encoding and decoding of screen content | |
TWI597977B (en) | Video encoding methods and systems using adaptive color transform |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: INDUSTRIAL TECHNOLOGY RESEARCH INSTITUTE, TAIWAN Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CHANG, YAO-JEN;LIN, CHUN-LUNG;TU, JIH-SHENG;AND OTHERS;REEL/FRAME:037922/0075 Effective date: 20160307 |
STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |