The present application claims priority to and the benefit of International Patent Application No. PCT/CN2019/102289, filed on August 23, 2019, pursuant to the applicable patent law and/or rules under the Paris Convention. The entire disclosure of the foregoing application is hereby incorporated by reference as part of the disclosure of the present application for all purposes in accordance with the law.
Detailed Description
Embodiments of the disclosed techniques may be applied to existing video codec standards (e.g., HEVC, H.265) and future standards to improve compression performance. Section headings are used herein to enhance readability of the description, and the discussion of embodiments (and/or implementations) is not limited in any way to only the respective sections.
1. Introduction
This document relates to video coding and decoding techniques. In particular, the present document relates to adaptive resolution conversion in video encoding or decoding. It can be applied to existing video/image codec standards, such as HEVC, or to the standard to be finalized (Versatile Video Coding, VVC). It may also be applicable to future video codec standards or video codecs.
2. Preliminary discussion
Video codec standards have evolved primarily through the development of the well-known ITU-T and ISO/IEC standards. ITU-T developed H.261 and H.263, ISO/IEC developed MPEG-1 and MPEG-4 Visual, and the two organizations jointly developed the H.262/MPEG-2 Video, H.264/MPEG-4 Advanced Video Coding (AVC), and H.265/HEVC standards. Since H.262, the video codec standards have been based on a hybrid video codec structure, in which temporal prediction plus transform coding is employed. To explore future video codec technologies beyond HEVC, VCEG and MPEG jointly established the Joint Video Exploration Team (JVET) in 2015. Since then, JVET has adopted many new methods and put them into a reference software named Joint Exploration Model (JEM). In April 2018, the Joint Video Experts Team (JVET) between VCEG (Q6/16) and ISO/IEC JTC1 SC29/WG11 (MPEG) was created to work on the VVC standard targeting a 50% bit rate reduction compared to HEVC.
AVC and HEVC do not have the ability to change resolution without having to introduce an IDR or Intra Random Access Point (IRAP) picture; such a capability may be referred to as Adaptive Resolution Change (ARC). There are use cases or application scenarios that would benefit from an ARC feature, including the following:
Rate adaptation in video telephony and conferencing: to adapt the codec video to changing network conditions, when the network conditions get worse so that the available bandwidth becomes lower, the encoder can adapt to this bandwidth by encoding lower-resolution pictures. Currently, changing the picture resolution can only be done after an IRAP picture; this has several problems. An IRAP picture at reasonable quality will be much larger than an inter-coded picture and is correspondingly more complex to decode: this takes time and resources. This is also a problem if the resolution change is requested by the decoder for load reasons. It may also break the low-delay buffer condition, forcing audio resynchronization, and the end-to-end delay of the stream will increase, at least temporarily. This can lead to a poor user experience.
Active speaker change in multi-party video conferencing: for multi-party video conferencing, it is common to show the active speaker in a larger video size than the videos of the remaining conference participants. It may also be necessary to adjust the picture resolution of each participant as the active speaker changes. The need for an ARC feature becomes more important when such changes in the active speaker occur frequently.
Fast start in streaming: for streaming applications, it is common that the application buffers decoded pictures up to a certain length before starting display. Starting the bitstream with a lower resolution would allow the application to have enough pictures in the buffer to start displaying faster.
Adaptive stream switching in streaming: the Dynamic Adaptive Streaming over HTTP (DASH) specification includes a feature named @mediaStreamStructureId. This enables switching between different representations at open-GOP random access points with non-decodable leading pictures (e.g., CRA pictures associated with RASL pictures in HEVC). When two different representations of the same video have different bitrates but the same spatial resolution while they have the same value of @mediaStreamStructureId, switching between the two representations can be performed at a CRA picture associated with RASL pictures, and the RASL pictures associated with the switching-at CRA picture can be decoded with acceptable quality, thus enabling seamless switching. With ARC, the @mediaStreamStructureId feature would also be usable for switching between DASH representations with different spatial resolutions.
ARC is also known as dynamic resolution conversion.
ARC can be seen as a special case of Reference Picture Resampling (RPR) (e.g., H.263 Annex P).
2.1. Reference picture resampling in H.263 Annex P
This mode describes an algorithm that warps a reference picture before it is used for prediction. It may be used to resample a reference picture that has a different source format than the source format of the picture being predicted. It can also be used for global motion estimation, or estimation of rotating motion, by warping the shape, size, and position of the reference picture. The syntax includes the warping parameters to be used as well as the resampling algorithm. The simplest level of operation of the reference picture resampling mode is the implicit factor-of-4 resampling, since only an FIR filter has to be applied for the upsampling and downsampling processes. In this case, when the size of a new picture (indicated in the picture header) is different from that of the previous picture, no additional signaling overhead is needed, since its use can be inferred.
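As a rough, non-normative illustration of this idea, the following Python sketch resamples a stored reference picture to the size of the picture being predicted using simple bilinear interpolation; the actual Annex P warping parameters and FIR filters are not reproduced here.

```python
# Informal sketch (illustration only): a reference picture stored at one size
# is resampled to the size of the picture being predicted before it is used
# for prediction. Bilinear resampling is used for brevity.

def resample_reference(ref, out_w, out_h):
    in_h, in_w = len(ref), len(ref[0])
    out = [[0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            # Map the output position back into the input picture.
            fx = x * (in_w - 1) / max(out_w - 1, 1)
            fy = y * (in_h - 1) / max(out_h - 1, 1)
            x0, y0 = int(fx), int(fy)
            x1, y1 = min(x0 + 1, in_w - 1), min(y0 + 1, in_h - 1)
            ax, ay = fx - x0, fy - y0
            top = (1 - ax) * ref[y0][x0] + ax * ref[y0][x1]
            bot = (1 - ax) * ref[y1][x0] + ax * ref[y1][x1]
            out[y][x] = int(round((1 - ay) * top + ay * bot))
    return out

small_ref = [[0, 64], [128, 255]]
print(resample_reference(small_ref, 4, 4))  # upsampled 2x2 -> 4x4 reference
```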
2.2. VVC-oriented literature on ARC
Several documents have been proposed to address ARC, as listed below:
JVET-M0135, JVET-M0259, JVET-N0048, JVET-N0052, JVET-N0118, JVET-N0279.
2.3. ARC in JVET-O2001-v14
ARC, also known as Reference Picture Resampling (RPR), is incorporated in JVET-O2001-v14.
With RPR in JVET-O2001-v14, TMVP is disabled if the collocated picture has a different resolution than the current picture. Furthermore, BDOF and DMVR are disabled when the reference picture has a different resolution than the current picture.
To handle normal MC when the reference picture has a different resolution than the current picture, the interpolation process is specified as follows (the section numbers refer to the current version of the VVC specification, and italic text indicates the changes proposed to the specification).
8.5.6.3.1 General
The inputs to this process are:
- a luminance position (xSb, ySb) specifying the upper left luma sample of the current codec sub-block relative to the upper left luma sample of the current picture,
- a variable sbWidth specifying the width of the current codec sub-block,
- a variable sbHeight specifying the height of the current codec sub-block,
- a motion vector offset mvOffset,
- a refined motion vector refMvLX,
- the selected reference picture sample array refPicLX,
- the half-sample interpolation filter index hpelIfIdx,
- the bi-directional optical flow flag bdofFlag,
- a variable cIdx specifying the color component index of the current block.
The outputs of this process are:
- an (sbWidth + brdExtSize) x (sbHeight + brdExtSize) array predSamplesLX of predicted sample values.
The prediction block boundary (border) extension size brdExtSize is derived as follows:
brdExtSize=(bdofFlag||(inter_affine_flag[xSb][ySb]&&sps_affine_prof_enabled_flag))?2:0 (8-752)
The variable fRefWidth is set equal to PicOutputWidthL of the reference picture in luma samples.
The variable fRefHeight is set equal to PicOutputHeightL of the reference picture in luma samples.
The motion vector mvLX is set equal to (refMvLX-mvOffset).
-if cIdx is equal to 0, then the following applies:
- The scaling factors and their fixed-point representations are defined as:
hori_scale_fp = ((fRefWidth << 14) + (PicOutputWidthL >> 1)) / PicOutputWidthL (8-753)
vert_scale_fp = ((fRefHeight << 14) + (PicOutputHeightL >> 1)) / PicOutputHeightL (8-754)
Let (xIntL, yIntL) be a luma location given in full-sample units and (xFracL, yFracL) be an offset given in 1/16-sample units. These variables are used only in this clause for specifying fractional-sample locations inside the reference sample array refPicLX.
- The top-left coordinate of the bounding block for reference sample padding (xSbIntL, ySbIntL) is set equal to (xSb + (mvLX[0] >> 4), ySb + (mvLX[1] >> 4)).
- For each luma sample location (xL = 0..sbWidth - 1 + brdExtSize, yL = 0..sbHeight - 1 + brdExtSize) inside the predicted luma sample array predSamplesLX, the corresponding predicted luma sample value predSamplesLX[xL][yL] is derived as follows:
- Let (refxSbL, refySbL) and (refxL, refyL) be luma locations pointed to by a motion vector (refMvLX[0], refMvLX[1]) given in 1/16-sample units. The variables refxSbL, refxL, refySbL and refyL are derived as follows:
refxSbL = ((xSb << 4) + refMvLX[0]) * hori_scale_fp (8-755)
refxL = ((Sign(refxSbL) * ((Abs(refxSbL) + 128) >> 8) + xL * ((hori_scale_fp + 8) >> 4)) + 32) >> 6 (8-756)
refySbL = ((ySb << 4) + refMvLX[1]) * vert_scale_fp (8-757)
refyL = ((Sign(refySbL) * ((Abs(refySbL) + 128) >> 8) + yL * ((vert_scale_fp + 8) >> 4)) + 32) >> 6 (8-758)
- The variables xIntL, yIntL, xFracL and yFracL are derived as follows:
xIntL=refxL>>4 (8-759)
yIntL=refyL>>4 (8-760)
xFracL=refxL&15 (8-761)
yFracL=refyL&15 (8-762)
- If bdofFlag is equal to TRUE or (sps_affine_prof_enabled_flag is equal to TRUE and inter_affine_flag[xSb][ySb] is equal to TRUE), and one or more of the following conditions are true, the predicted luma sample value predSamplesLX[xL][yL] is derived by invoking the luma integer sample fetching process as specified in clause 8.5.6.3.3 with (xIntL + (xFracL >> 3) - 1, yIntL + (yFracL >> 3) - 1) and refPicLX as inputs.
- xL is equal to 0.
- xL is equal to sbWidth + 1.
- yL is equal to 0.
- yL is equal to sbHeight + 1.
- Otherwise, the predicted luma sample value predSamplesLX[xL][yL] is derived by invoking the luma sample 8-tap interpolation filtering process as specified in clause 8.5.6.3.2 with (xIntL - (brdExtSize > 0 ? 1 : 0), yIntL - (brdExtSize > 0 ? 1 : 0)), (xFracL, yFracL), (xSbIntL, ySbIntL), refPicLX, hpelIfIdx, sbWidth, sbHeight and (xSb, ySb) as inputs.
- Otherwise (cIdx is not equal to 0), the following applies:
Let (xIntC, yIntC) be a chroma location given in full-sample units and (xFracC, yFracC) be an offset given in 1/32-sample units. These variables are used only in this clause for specifying general fractional-sample locations inside the reference sample array refPicLX.
- The top-left coordinate of the bounding block for reference sample padding (xSbIntC, ySbIntC) is set equal to ((xSb / SubWidthC) + (mvLX[0] >> 5), (ySb / SubHeightC) + (mvLX[1] >> 5)).
- For each chroma sample location (xC = 0..sbWidth - 1, yC = 0..sbHeight - 1) inside the predicted chroma sample array predSamplesLX, the corresponding predicted chroma sample value predSamplesLX[xC][yC] is derived as follows:
order (refxSb)C,refySbC) And (refx)C,refyC) Is given as a motion vector (mvLX [0 ]) in units of 1/32 samples],mvLX[1]) The pointed chromaticity position. The variable refxSb is derived as followsC、refySbC、refxCAnd refyC:
refxSbC=((xSb/SubWidthC<<5)+mvLX[0])*hori_scale_fp(8-763)
refxC=((Sign(refxSbC)*((Abs(refxSbC)+256)>>9)+xC*((hori_scale_fp+8)>>4))+16)>>5(8-764)
refySbC=((ySb/SubHeightC<<5)+mvLX[1])*vert_scale_fp(8-765)
refyC=((Sign(refySbC)*((Abs(refySbC)+256)>>9)+yC*((vert_scale_fp+8)>>4))+16)>>5(8-766)
- The variables xIntC, yIntC, xFracC and yFracC are derived as follows:
xIntC=refxC>>5 (8-767)
yIntC=refyC>>5 (8-768)
xFracC=refxC&31 (8-769)
yFracC=refyC&31 (8-770)
- The predicted sample value predSamplesLX[xC][yC] is derived by invoking the process specified in clause 8.5.6.3.4 with (xIntC, yIntC), (xFracC, yFracC), (xSbIntC, ySbIntC), sbWidth, sbHeight and refPicLX as inputs.
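The following Python sketch is an informal, non-normative illustration of the derivation above: it computes the fixed-point scaling factors of equations (8-753) and (8-754) and the luma reference position of one predicted sample following equations (8-755) to (8-762). Variable names mirror the excerpt; the picture dimensions in the example call are arbitrary.

```python
# Informal sketch (not part of the specification text): RPR scaling factors
# and luma reference position for one sample, per (8-753)-(8-762) above.

def rpr_luma_ref_position(xSb, ySb, xL, yL, refMvLX,
                          fRefWidth, fRefHeight,
                          PicOutputWidthL, PicOutputHeightL):
    def sign(v):
        return (v > 0) - (v < 0)

    # Fixed-point (1/2^14) scaling factors between reference and current picture.
    hori_scale_fp = ((fRefWidth << 14) + (PicOutputWidthL >> 1)) // PicOutputWidthL
    vert_scale_fp = ((fRefHeight << 14) + (PicOutputHeightL >> 1)) // PicOutputHeightL

    # Sub-block anchor position in the reference picture (8-755), (8-757).
    refxSb = ((xSb << 4) + refMvLX[0]) * hori_scale_fp
    refySb = ((ySb << 4) + refMvLX[1]) * vert_scale_fp

    # Per-sample reference position in 1/16-sample units (8-756), (8-758).
    refx = ((sign(refxSb) * ((abs(refxSb) + 128) >> 8)
             + xL * ((hori_scale_fp + 8) >> 4)) + 32) >> 6
    refy = ((sign(refySb) * ((abs(refySb) + 128) >> 8)
             + yL * ((vert_scale_fp + 8) >> 4)) + 32) >> 6

    # Split into integer-sample position and 1/16 fractional phase (8-759)-(8-762).
    xInt, yInt = refx >> 4, refy >> 4
    xFrac, yFrac = refx & 15, refy & 15
    return xInt, yInt, xFrac, yFrac


# Example: 2:1 downscaled current picture (3840x2160 reference, 1920x1080 current).
print(rpr_luma_ref_position(xSb=64, ySb=32, xL=0, yL=0, refMvLX=(18, -7),
                            fRefWidth=3840, fRefHeight=2160,
                            PicOutputWidthL=1920, PicOutputHeightL=1080))
```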
8.5.6.3.2 Luma sample interpolation filtering process
The inputs to this process are:
- a luma location in full-sample units (xIntL, yIntL),
- a luma location in fractional-sample units (xFracL, yFracL),
- a luma location in full-sample units (xSbIntL, ySbIntL) specifying the upper left sample of the bounding block for reference sample padding relative to the upper left luma sample of the reference picture,
- the luma reference sample array refPicLXL,
- the half-sample interpolation filter index hpelIfIdx,
- a variable sbWidth specifying the width of the current subblock,
- a variable sbHeight specifying the height of the current subblock,
- a luma location (xSb, ySb) specifying the upper left luma sample of the current subblock relative to the upper left luma sample of the current picture.
the output of this process is the predicted luminance sample value predsamplelXL
The variables shift1, shift2 and shift3 are derived as follows:
- The variable shift1 is set equal to Min(4, BitDepthY - 8), the variable shift2 is set equal to 6, and the variable shift3 is set equal to Max(2, 14 - BitDepthY).
- The variable picW is set equal to pic_width_in_luma_samples and the variable picH is set equal to pic_height_in_luma_samples.
The luma interpolation filter coefficients fL[p] for each 1/16 fractional sample position p equal to xFracL or yFracL are derived as follows:
- If MotionModelIdc[xSb][ySb] is greater than 0, and sbWidth and sbHeight are both equal to 4, the luma interpolation filter coefficients fL[p] are specified in Table 8-12.
- Otherwise, the luma interpolation filter coefficients fL[p] are specified in Table 8-11 depending on hpelIfIdx.
The luma locations in full-sample units (xInti, yInti) are derived as follows for i = 0..7:
- If subpic_treated_as_pic_flag[SubPicIdx] is equal to 1, the following applies:
xInti=Clip3(SubPicLeftBoundaryPos,SubPicRightBoundaryPos,xIntL+i-3)(8-771)
yInti=Clip3(SubPicTopBoundaryPos,SubPicBotBoundaryPos,yIntL+i-3)(8-772)
- Otherwise (subpic_treated_as_pic_flag[SubPicIdx] is equal to 0), the following applies:
xInti = Clip3(0, picW - 1, sps_ref_wraparound_enabled_flag ? ClipH((sps_ref_wraparound_offset_minus1 + 1) * MinCbSizeY, picW, xIntL + i - 3) : xIntL + i - 3) (8-773)
yInti = Clip3(0, picH - 1, yIntL + i - 3) (8-774)
For i = 0..7, the luma locations in full-sample units are further modified as follows:
xInti=Clip3(xSbIntL-3,xSbIntL+sbWidth+4,xInti) (8-775)
yInti=Clip3(ySbIntL-3,ySbIntL+sbHeight+4,yInti) (8-776)
The predicted luma sample value predSampleLXL is derived as follows:
- If both xFracL and yFracL are equal to 0, the value of predSampleLXL is derived as follows:
predSampleLXL=refPicLXL[xInt3][yInt3]<<shift3 (8-777)
- Otherwise, if xFracL is not equal to 0 and yFracL is equal to 0, the value of predSampleLXL is derived as follows:
- Otherwise, if xFracL is equal to 0 and yFracL is not equal to 0, the value of predSampleLXL is derived as follows:
- Otherwise, if xFracL is not equal to 0 and yFracL is not equal to 0, the value of predSampleLXL is derived as follows:
- The sample array temp[n] with n = 0..7 is derived as follows:
- The predicted luma sample value predSampleLXL is derived as follows:
Table 8-11 - Specification of the luma interpolation filter coefficients fL[p] for each 1/16 fractional sample position p
Table 8-12 - Specification of the luma interpolation filter coefficients fL[p] for each 1/16 fractional sample position p for the affine motion mode
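The following Python sketch illustrates, informally, the separable structure of the luma interpolation filtering process above: eight rows of reference samples are filtered horizontally into a temporary array, which is then filtered vertically. Only a single illustrative 8-tap half-sample filter is used as a stand-in; the full phase-dependent coefficient sets of Tables 8-11 and 8-12 are not reproduced here.

```python
# Informal sketch of the separable luma interpolation structure (illustrative
# coefficients, not the normative Table 8-11 / 8-12 values).

F_HALF = [-1, 4, -11, 40, 40, -11, 4, -1]  # placeholder 8-tap half-sample filter

def luma_interp_half_half(ref, x_int, y_int, bit_depth=10):
    shift1 = min(4, bit_depth - 8)
    shift2 = 6
    temp = []
    for n in range(8):
        row = y_int + n - 3
        # Horizontal filtering of one reference row into temp[n].
        acc = sum(F_HALF[i] * ref[row][x_int + i - 3] for i in range(8))
        temp.append(acc >> shift1)
    # Vertical filtering of the temporary array.
    return sum(F_HALF[i] * temp[i] for i in range(8)) >> shift2

ref = [[(3 * x + 5 * y) % 512 for x in range(32)] for y in range(32)]
print(luma_interp_half_half(ref, x_int=16, y_int=16))
```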
8.5.6.3.3 Luma integer sample fetching process
The inputs to this process are:
- a luma location in full-sample units (xIntL, yIntL),
- the luma reference sample array refPicLXL.
The output of this process is the predicted luma sample value predSampleLXL.
The variable shift is set equal to Max(2, 14 - BitDepthY).
The variable picW is set equal to pic_width_in_luma_samples and the variable picH is set equal to pic_height_in_luma_samples.
The luminance position (xInt, yInt) in units of full-pel is derived as follows:
xInt = Clip3(0, picW - 1, sps_ref_wraparound_enabled_flag ? ClipH((sps_ref_wraparound_offset_minus1 + 1) * MinCbSizeY, picW, xIntL) : xIntL) (8-782)
yInt=Clip3(0,picH-1,yIntL) (8-783)
The predicted luma sample value predSampleLXL is derived as follows:
predSampleLXL = refPicLXL[xInt][yInt] << shift3 (8-784)
8.5.6.3.4 Chroma sample interpolation filtering process
The inputs to this process are:
- a chroma location in full-sample units (xIntC, yIntC),
- a chroma location in 1/32 fractional-sample units (xFracC, yFracC),
- a chroma location in full-sample units (xSbIntC, ySbIntC) specifying the upper left sample of the bounding block for reference sample padding relative to the upper left chroma sample of the reference picture,
- a variable sbWidth specifying the width of the current subblock,
- a variable sbHeight specifying the height of the current subblock,
- the chroma reference sample array refPicLXC.
The output of this process is the predicted chroma sample value predSampleLXC.
The variables shift1, shift2 and shift3 are derived as follows:
- The variable shift1 is set equal to Min(4, BitDepthC - 8), the variable shift2 is set equal to 6, and the variable shift3 is set equal to Max(2, 14 - BitDepthC).
The variable picWC is set equal to pic_width_in_luma_samples / SubWidthC and the variable picHC is set equal to pic_height_in_luma_samples / SubHeightC.
The chroma interpolation filter coefficients fC[p] for each 1/32 fractional sample position p equal to xFracC or yFracC are specified in Table 8-13.
The variable xOffset is set equal to ((sps_ref_wraparound_offset_minus1 + 1) * MinCbSizeY) / SubWidthC.
For i = 0..3, the chroma locations in full-sample units (xInti, yInti) are derived as follows:
- If subpic_treated_as_pic_flag[SubPicIdx] is equal to 1, the following applies:
xInti=Clip3(SubPicLeftBoundaryPos/SubWidthC,SubPicRightBoundaryPos/SubWidthC,xIntL+i)(8-785)
yInti=Clip3(SubPicTopBoundaryPos/SubHeightC,SubPicBotBoundaryPos/SubHeightC,yIntL+i) (8-786)
- Otherwise (subpic_treated_as_pic_flag[SubPicIdx] is equal to 0), the following applies:
xInti = Clip3(0, picWC - 1, sps_ref_wraparound_enabled_flag ? ClipH(xOffset, picWC, xIntC + i - 1) : xIntC + i - 1) (8-787)
yInti = Clip3(0, picHC - 1, yIntC + i - 1) (8-788)
For i = 0..3, the chroma locations in full-sample units (xInti, yInti) are further modified as follows:
xInti=Clip3(xSbIntC-1,xSbIntC+sbWidth+2,xInti) (8-789)
yInti=Clip3(ySbIntC-1,ySbIntC+sbHeight+2,yInti) (8-790)
The predicted chroma sample value predSampleLXC is derived as follows:
- If both xFracC and yFracC are equal to 0, the value of predSampleLXC is derived as follows:
predSampleLXC=refPicLXC[xInt1][yInt1]<<shift3 (8-791)
- Otherwise, if xFracC is not equal to 0 and yFracC is equal to 0, the value of predSampleLXC is derived as follows:
- Otherwise, if xFracC is equal to 0 and yFracC is not equal to 0, the value of predSampleLXC is derived as follows:
- Otherwise, if xFracC is not equal to 0 and yFracC is not equal to 0, the value of predSampleLXC is derived as follows:
- The sample array temp[n] with n = 0..3 is derived as follows:
- The predicted chroma sample value predSampleLXC is derived as follows:
predSampleLXC = (fC[yFracC][0] * temp[0] + fC[yFracC][1] * temp[1] + fC[yFracC][2] * temp[2] + fC[yFracC][3] * temp[3]) >> shift2 (8-795)
Table 8-13 - Specification of the chroma interpolation filter coefficients fC[p] for each 1/32 fractional sample position p
3. Technical problems associated with current video coding and decoding techniques
When RPR is applied in VVC, RPR (ARC) may have the following problems:
1. With RPR, the interpolation filters may be different for adjacent samples within a block, which is undesirable in SIMD (Single Instruction Multiple Data) implementations.
2. The bounding region does not consider RPR.
4. Enumeration of embodiments and technical solutions
The following list should be considered as examples to explain the general concepts. These items should not be interpreted in a narrow way. Furthermore, these items can be combined in any manner.
A motion vector is represented by (mv _ x, mv _ y), where mv _ x is a horizontal component and mv _ y is a vertical component.
1. When the resolution of the reference picture differs from the resolution of the current picture, the prediction values of the group of samples (at least two samples) of the current block can be generated by means of the same horizontal and/or vertical interpolation filter.
a. In one example, the group may include all of the samples within the region of the block.
i. For example, a block may be divided into S non-overlapping MxN rectangles. Each MxN rectangle is a group. As shown in the example of FIG. 1, a 16x16 block may be divided into 16 4x4 rectangles, each of which is a group.
For example, a row with N samples is a group. N is an integer no greater than the block width. In one example, N is 4 or 8 or the block width.
For example, a column with N samples is a group. N is an integer no greater than the block height. In one example, N is 4 or 8 or block height.
M and/or N may be predefined or derived on the fly (e.g., based on block dimension/codec information) or signaled.
b. In one example, samples within a group may have the same MV (represented as a shared MV).
c. In one example, the samples within a group may have MVs with the same horizontal component (denoted as the shared horizontal component).
d. In one example, samples within a group may have MVs with the same vertical component (represented as sharing a vertical component).
e. In one example, the samples within a group may have MVs with the same horizontal component fractional portion (represented as sharing a fractional horizontal component).
i. For example, suppose the MV of a first sample is (MV1x, MV1y) and the MV of a second sample is (MV2x, MV2y); then MV1x & (2^M - 1) should be equal to MV2x & (2^M - 1), where M denotes the MV precision. For example, M = 4.
f. In one example, samples within a group may have MVs with the same vertical component fractional part (represented as a shared fractional vertical component).
i. For example, suppose the MV of a first sample is (MV1x, MV1y) and the MV of a second sample is (MV2x, MV2y); then MV1y & (2^M - 1) should be equal to MV2y & (2^M - 1), where M denotes the MV precision. For example, M = 4.
g. In one example, for a sample in the group to be predicted, the motion vector used to derive (refxL, refyL) according to the resolutions of the current picture and the reference picture (e.g., as in 8.5.6.3.1 of JVET-O2001-v14), denoted MVb, may first be derived. Then, MVb may be further modified (e.g., rounded/truncated/clipped) to MV' to satisfy requirements such as those in the bullets above, and MV' will be used to derive the predicted sample for that sample.
i. In one example, MV' has the same integer part as MVb, and the fractional part of MV' is set to the shared fractional horizontal and/or vertical component.
ii. In one example, MV' is set to the MV closest to MVb that has the shared fractional horizontal and/or vertical component.
h. The shared motion vector (and/or the shared horizontal component and/or the shared vertical component and/or the shared fractional horizontal component and/or the shared fractional vertical component) may be set to the motion vector (and/or the horizontal component and/or the vertical component and/or the fractional horizontal component and/or the fractional vertical component) of a particular sample point within the group.
i. For example, the particular sampling points may be at corners of a group of rectangles, such as "a", "B", "C", and "D" shown in fig. 2A.
For example, the particular sampling point may be at the center of a group of rectangles, such as "E", "F", "G", and "H" shown in fig. 2A.
For example, the particular sampling point may be at the end of a row-like or column-like group, such as "a" and "D" shown in fig. 2B and 2C.
For example, the particular sampling point may be in the middle of a row-like or column-like group, such as "B" and "C" shown in fig. 2B and 2C.
v. In one example, the motion vector of the particular sample may be MVb mentioned in bullet g.
i. The shared motion vector (and/or the shared horizontal component and/or the shared vertical component and/or the shared fractional horizontal component and/or the shared fractional vertical component) may be set to the motion vector (and/or the horizontal component and/or the vertical component and/or the fractional horizontal component and/or the fractional vertical component) of a virtual sample point located at a different position than all sample points within this group.
i. In one example, the virtual sampling points are not within the group, but rather are located within an area that covers all of the sampling points within the group.
1) Alternatively, the virtual sample is located outside of an area covering all samples within the group, e.g., next to the bottom right of the area.
in one example, MVs for virtual samples are derived in the same way as real samples, but at different locations.
"V" in FIGS. 2A-2C show three examples of virtual sampling points.
j. The shared MV (and/or the shared horizontal component and/or the shared vertical component and/or the shared fractional horizontal component and/or the shared fractional vertical component) may be set as a function of the MVs (and/or the horizontal component and/or the vertical component and/or the fractional horizontal component and/or the fractional vertical component) of the plurality of samples and/or the virtual samples.
i. For example, the shared MV (and/or the shared horizontal component and/or the shared vertical component and/or the shared fractional horizontal component and/or the shared fractional vertical component) may be set as an average of all or part of the samples within the group, or of samples "E", "F", "G", "H" in fig. 2A, or of samples "E", "H" in fig. 2A, or of samples "a", "B", "C", "D" in fig. 2A, or of samples "a", "D" in fig. 2A, or of samples "B", "C" in fig. 2B, or of samples "a", "D" in fig. 2B, or of samples "B", "C" in fig. 2C, or of samples "a", "D" in fig. 2C (and/or the horizontal component and/or the vertical component and/or the fractional horizontal component and/or the fractional vertical component),
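The following Python sketch informally illustrates items 1.e, 1.g and 1.h above: every sample of a group keeps the integer part of its own derived motion vector MVb, but adopts the fractional part of a designated sample of the group (here the top-left one), so that a single interpolation filter phase can serve the whole group. MV precision M = 4 (1/16-sample units) is assumed; this is an illustration, not the normative derivation.

```python
# Informal sketch: share one fractional MV component across a 4x4 group.
FRAC_BITS = 4
FRAC_MASK = (1 << FRAC_BITS) - 1

def share_fractional_mv(group_mvs):
    """group_mvs: dict mapping (x, y) inside the group to (mv_x, mv_y) in 1/16 units."""
    shared_fx = group_mvs[(0, 0)][0] & FRAC_MASK  # fractional part of the top-left sample
    shared_fy = group_mvs[(0, 0)][1] & FRAC_MASK
    adjusted = {}
    for pos, (mvx, mvy) in group_mvs.items():
        # Keep the per-sample integer part, replace the fractional part (item 1.g.i).
        adjusted[pos] = ((mvx & ~FRAC_MASK) | shared_fx,
                         (mvy & ~FRAC_MASK) | shared_fy)
    return adjusted

mvs = {(x, y): (37 + x, -22 + 2 * y) for x in range(4) for y in range(4)}
print(share_fractional_mv(mvs)[(3, 3)])  # -> (37, -6): fractional parts taken from (0, 0)
```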
2. it is proposed to allow only integer MVs to perform a motion compensation process when the resolution of a reference picture is different from the resolution of a current picture to derive a prediction block for a current block.
a. In one example, the decoded motion vector of the sample to be predicted is rounded to an integer MV before being used.
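A minimal Python sketch of item 2.a follows: the decoded motion vector, in 1/16-sample units, is rounded to the nearest integer-sample motion vector before motion compensation. The rounding convention (round half away from zero) is an assumption made for illustration only.

```python
# Informal sketch: round a 1/16-precision MV to integer-sample precision.
def round_mv_to_integer(mv, frac_bits=4):
    half = 1 << (frac_bits - 1)
    def round_comp(c):
        if c >= 0:
            return ((c + half) >> frac_bits) << frac_bits
        return -(((-c + half) >> frac_bits) << frac_bits)
    return tuple(round_comp(c) for c in mv)

print(round_mv_to_integer((37, -22)))  # -> (32, -16): 2.3125 -> 2 samples, -1.375 -> -1 sample
```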
3. The motion vectors used in the motion compensation process of samples in the current block (e.g., shared MV/shared horizontal or vertical or fractional component/MV' as mentioned in bullet above) may be stored into the decoded picture buffer and used for motion vector prediction of subsequent blocks in the current/different picture.
a. Alternatively, the motion vectors used in the motion compensation process of samples in the current block (e.g. shared MV/shared horizontal or vertical or fractional component/MV' as mentioned in bullets above) may not be allowed to be used for motion vector prediction of subsequent blocks in the current/different picture.
i. In one example, the decoded motion vector (e.g., MVb in the bullets above) may be used for motion vector prediction of subsequent blocks in the current/different picture.
b. In one example, the motion vectors used in the motion compensation process for samples in the current block may be used in a filtering process (e.g., deblocking filter/SAO/ALF).
i. Alternatively, the decoded motion vector (e.g., MVb in the bullets above) may be used in the filtering process.
4. It is proposed that the interpolation filter used in the motion compensation process to derive the prediction block of the current block may be selected depending on whether the resolution of the reference picture is different from the resolution of the current picture.
a. In one example, the interpolation filter has fewer taps when the resolution of the reference picture is different from the resolution of the current picture.
i. In one example, a bilinear filter is applied when the resolution of the reference picture is different from the resolution of the current picture.
in one example, a 4-tap filter or a 6-tap filter is applied when the resolution of the reference picture is different from the resolution of the current picture.
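An informal Python sketch of item 4 follows: a shorter interpolation filter (a bilinear one, per item 4.a.i) is used when the reference and current picture resolutions differ, instead of the regular 8-tap luma filter. The 1/16-phase bilinear weights shown are illustrative, not normative coefficients.

```python
# Informal sketch: pick a shorter filter when the resolutions differ.
def bilinear_interp_1d(samples, x_int, x_frac):
    # x_frac in [0, 15]: 1/16-sample phase between samples[x_int] and samples[x_int + 1].
    w1, w0 = x_frac, 16 - x_frac
    return (w0 * samples[x_int] + w1 * samples[x_int + 1] + 8) >> 4

def interpolate(samples, x_int, x_frac, resolutions_differ):
    if resolutions_differ:
        # Item 4.a.i: fewer taps (bilinear) when RPR is in effect.
        return bilinear_interp_1d(samples, x_int, x_frac)
    # Otherwise the regular 8-tap luma filter (Table 8-11) would be applied; omitted here.
    raise NotImplementedError

row = [100, 120, 140, 160, 180]
print(interpolate(row, x_int=1, x_frac=8, resolutions_differ=True))  # halfway value: 130
```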
5. It is proposed to apply a two-stage process for prediction block generation when the resolution of the reference picture is different from the resolution of the current picture.
a. In the first stage, a virtual reference block is generated by upsampling or downsampling an area in a reference picture according to the width and/or height of the current picture and the reference picture.
b. In a second stage, prediction samples are generated from the virtual reference block by applying interpolation filtering independent of the width and/or height of the current and reference pictures.
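The following Python sketch is an informal illustration of the two-stage process of item 5: stage one builds a virtual reference block by resampling a reference-picture region according to the width/height ratio (nearest-neighbour resampling here, for brevity); stage two generates prediction samples from that block with a fractional interpolation that does not depend on the resolution ratio (bilinear here). Both kernels are placeholders, not normative filters.

```python
# Informal sketch of the two-stage prediction block generation.
def build_virtual_ref_block(ref_pic, top_left, blk_w, blk_h, scale_x, scale_y):
    """Stage 1: up/down-sample a reference region according to the width/height ratio."""
    x0, y0 = top_left
    return [[ref_pic[y0 + int(y * scale_y)][x0 + int(x * scale_x)]
             for x in range(blk_w + 1)]            # +1 margin for stage-2 interpolation
            for y in range(blk_h + 1)]

def predict_from_virtual_block(virt, blk_w, blk_h, frac_x, frac_y):
    """Stage 2: fractional interpolation independent of the resolution ratio."""
    wx1, wx0 = frac_x, 16 - frac_x
    wy1, wy0 = frac_y, 16 - frac_y
    return [[(wy0 * (wx0 * virt[y][x] + wx1 * virt[y][x + 1]) +
              wy1 * (wx0 * virt[y + 1][x] + wx1 * virt[y + 1][x + 1]) + 128) >> 8
             for x in range(blk_w)] for y in range(blk_h)]

ref = [[(x + y) % 256 for x in range(64)] for y in range(64)]
virt = build_virtual_ref_block(ref, top_left=(8, 4), blk_w=4, blk_h=4, scale_x=2.0, scale_y=2.0)
print(predict_from_virtual_block(virt, 4, 4, frac_x=8, frac_y=8))
```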
6. It is proposed that the top-left coordinate (xSbIntL, ySbIntL) of the bounding block for reference sample padding, as defined in 8.5.6.3.1 of JVET-O2001-v14, may be calculated depending on the width and/or height of the current picture and the reference picture.
a. In one example, the luminance position in units of full-pel is modified as:
xInti=Clip3(xSbIntL-Dx,xSbIntL+sbWidth+Ux,xInti),
yInti=Clip3(ySbIntL-Dy,ySbIntL+sbHeight+Uy,yInti),
wherein Dx and/or Dy and/or Ux and/or Uy may depend on the width and/or height of the current picture and the reference picture.
b. In one example, the chroma position in units of full-pel is modified as:
xInti=Clip3(xSbIntC-Dx,xSbIntC+sbWidth+Ux,xInti)
yInti=Clip3(ySbIntC-Dy,ySbIntC+sbHeight+Uy,yInti)
wherein Dx and/or Dy and/or Ux and/or Uy may depend on the width and/or height of the current picture and the reference picture.
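The Python sketch below informally illustrates item 6: the integer reference positions are clipped to a bounding block whose margins Dx, Dy, Ux and Uy depend on the dimensions of the current picture and the reference picture. The specific margin rule (scaling the usual luma margins by the resolution ratio) is only an assumption for illustration.

```python
# Informal sketch: resolution-dependent bounding-block clipping (item 6).
def clip3(lo, hi, v):
    return max(lo, min(hi, v))

def clip_to_bounding_block(x_int, y_int, xSbIntL, ySbIntL, sb_w, sb_h,
                           cur_size, ref_size):
    scale_x = ref_size[0] / cur_size[0]
    scale_y = ref_size[1] / cur_size[1]
    # Assumed rule: scale the usual luma margins (3 below, sbWidth/Height + 4 above)
    # by the reference-to-current resolution ratio.
    Dx, Ux = int(3 * scale_x), int((sb_w + 4) * scale_x)
    Dy, Uy = int(3 * scale_y), int((sb_h + 4) * scale_y)
    return (clip3(xSbIntL - Dx, xSbIntL + sb_w + Ux, x_int),
            clip3(ySbIntL - Dy, ySbIntL + sb_h + Uy, y_int))

# 2x downscaled current picture relative to the reference picture.
print(clip_to_bounding_block(160, 40, xSbIntL=96, ySbIntL=48, sb_w=16, sb_h=16,
                             cur_size=(1920, 1080), ref_size=(3840, 2160)))
# -> (152, 42): x clipped to xSbIntL + sbWidth + Ux, y clipped to ySbIntL - Dy.
```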
7. It is proposed that whether and/or how to clip MVs according to the bounding block for reference sample padding (e.g., (xSbIntL, ySbIntL) as defined in 8.5.6.3.1 of JVET-O2001-v14) may depend on the usage of DMVR.
a. In one example, MVs are clipped according to the bounding block for reference sample padding (e.g., (xSbIntL, ySbIntL) as defined in 8.5.6.3.1) only when DMVR is applied.
i. For example, operations 8-775 and 8-776 in the luma sample interpolation filtering process as defined in JVET-O2001-v14 are applied only if DMVR is used for the current block.
ii. For example, operations 8-789 and 8-790 in the chroma sample interpolation filtering process as defined in JVET-O2001-v14 are applied only if DMVR is used for the current block.
b. Alternatively, the above method may also be applied to the clipping of chroma samples.
8. It is proposed that whether and/or how to clip MVs according to the bounding block for reference sample padding (e.g., (xSbIntL, ySbIntL) as defined in 8.5.6.3.1 of JVET-O2001-v14) may depend on whether picture wrapping is used (e.g., whether sps_ref_wraparound_enabled_flag is equal to 0 or 1).
a. In one example, MVs are clipped according to the bounding block for reference sample padding (e.g., (xSbIntL, ySbIntL) as defined in 8.5.6.3.1) only if picture wrapping is not used.
i. For example, operations 8-775 and 8-776 in the luma sample interpolation filtering process as defined in JVET-O2001-v14 are applied only if picture wrapping is not used.
ii. For example, operations 8-789 and 8-790 in the chroma sample interpolation filtering process as defined in JVET-O2001-v14 are applied only if picture wrapping is not used.
b. Alternatively, the above method may also be applied to the clipping of chroma samples.
c. In one example, the luminance position in units of full-pel is modified as:
xInti=Clip3(xSbIntL-Dx,xSbIntL+sbWidth+Ux,xInti),
yInti=Clip3(ySbIntL-Dy,ySbIntL+sbHeight+Uy,yInti),
wherein Dx and/or Dy and/or Ux and/or Uy may depend on whether picture wrapping is used.
d. In one example, the chroma position in units of full-pel is modified as:
xInti=Clip3(xSbIntC-Dx,xSbIntC+sbWidth+Ux,xInti)
yInti=Clip3(ySbIntC-Dy,ySbIntC+sbHeight+Uy,yInti)
wherein Dx and/or Dy and/or Ux and/or Uy may depend on whether picture wrapping is used.
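The following Python sketch informally combines items 7 and 8: the clipping of integer reference positions to the reference sample padding bounding block (operations 8-775/8-776 and 8-789/8-790 above) is applied only under certain conditions, e.g., only when DMVR is used (item 7) and only when picture wrapping is not used (item 8). The particular combination of conditions is illustrative.

```python
# Informal sketch: conditional bounding-block clipping (items 7 and 8).
def clip3(lo, hi, v):
    return max(lo, min(hi, v))

def maybe_clip_luma_position(x_int, y_int, xSbIntL, ySbIntL, sb_w, sb_h,
                             dmvr_used, wraparound_enabled):
    apply_clipping = dmvr_used and not wraparound_enabled
    if not apply_clipping:
        return x_int, y_int
    # Same margins as operations 8-775 / 8-776.
    return (clip3(xSbIntL - 3, xSbIntL + sb_w + 4, x_int),
            clip3(ySbIntL - 3, ySbIntL + sb_h + 4, y_int))

print(maybe_clip_luma_position(200, 10, 96, 48, 16, 16,
                               dmvr_used=True, wraparound_enabled=False))   # -> (116, 45)
print(maybe_clip_luma_position(200, 10, 96, 48, 16, 16,
                               dmvr_used=False, wraparound_enabled=False))  # unchanged
```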
9. Whether/how the filtering process (e.g., deblocking filter) is applied may depend on whether the reference picture has a different resolution.
a. In one example, the boundary strength setting in the deblocking filter may take into account resolution differences in addition to motion vector differences.
b. In one example, the boundary strength setting in the deblocking filter may take into account the scaled motion vector difference based on the resolution difference.
c. In one example, the strength of the deblocking filter is increased if the resolution of at least one reference picture of block A is different from (less than or greater than) the resolution of at least one reference picture of block B.
d. In one example, the strength of the deblocking filter is decreased if the resolution of at least one reference picture of block A is different from (less than or greater than) the resolution of at least one reference picture of block B.
e. In one example, the strength of the deblocking filter is increased if the resolution of at least one reference picture of block A and/or block B is different from (less than or greater than) the resolution of the current picture.
f. In one example, the strength of the deblocking filter is decreased if the resolution of at least one reference picture of block A and/or block B is different from (less than or greater than) the resolution of the current picture.
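An informal Python sketch of item 9 follows: the deblocking boundary strength across an edge between blocks A (P side) and B (Q side) takes resolution differences of their reference pictures into account in addition to the motion vector difference. The threshold and the increase-by-one adjustment are assumptions for illustration, not the normative derivation.

```python
# Informal sketch: resolution-aware boundary strength (item 9).
def boundary_strength(mv_p, mv_q, ref_res_p, ref_res_q, cur_res, mv_thr=4):
    bs = 0
    # Conventional MV-difference criterion (MVs in 1/16-sample units).
    if abs(mv_p[0] - mv_q[0]) >= mv_thr or abs(mv_p[1] - mv_q[1]) >= mv_thr:
        bs = 1
    # Items 9.c / 9.e: increase strength when the reference resolutions differ
    # between the two sides, or differ from the current picture resolution.
    if ref_res_p != ref_res_q or ref_res_p != cur_res or ref_res_q != cur_res:
        bs = min(bs + 1, 2)
    return bs

print(boundary_strength((32, 0), (33, 0), (1920, 1080), (3840, 2160), (1920, 1080)))  # -> 1
print(boundary_strength((32, 0), (48, 0), (1920, 1080), (1920, 1080), (1920, 1080)))  # -> 1
```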
10. Instead of storing/using the motion vectors of a block based on the same reference picture resolution as the current picture, it is proposed to use the real motion vectors with the resolution difference taken into account.
a. Alternatively, furthermore, when the motion vector is used to generate the prediction block, there is no need to further change the motion vector according to the resolutions of the current picture and the reference picture (e.g., to derive (refxL, refyL) as in 8.5.6.3.1 of JVET-O2001-v14).
11. In one example, when there is a sub-picture, the reference picture must have the same resolution as the current picture.
a. Alternatively, when the reference picture has a different resolution than the current picture, there must not be a sub-picture in the current picture.
12. In one example, the sub-pictures may be separately defined for pictures having different resolutions.
13. In one example, if the reference picture has a different resolution than the current picture, a corresponding sub-picture of the reference picture may be derived by scaling and/or shifting a sub-picture of the current picture.
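A minimal Python sketch of item 13 follows: when the reference picture has a different resolution than the current picture, the sub-picture of the reference picture corresponding to a sub-picture of the current picture is derived by scaling (and, where needed, shifting) its position and size. The rounding is an assumption for illustration.

```python
# Informal sketch: derive the corresponding sub-picture in the reference picture (item 13).
def corresponding_subpicture(subpic, cur_size, ref_size):
    """subpic: (x, y, width, height) in the current picture."""
    sx = ref_size[0] / cur_size[0]
    sy = ref_size[1] / cur_size[1]
    x, y, w, h = subpic
    return (int(round(x * sx)), int(round(y * sy)),
            int(round(w * sx)), int(round(h * sy)))

# A 960x540 sub-picture at (960, 0) in a 1920x1080 current picture, mapped into
# a 3840x2160 reference picture.
print(corresponding_subpicture((960, 0, 960, 540), (1920, 1080), (3840, 2160)))
# -> (1920, 0, 1920, 1080)
```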
Fig. 3 is a block diagram of a video processing apparatus 300. The apparatus 300 may be used to implement one or more of the methods described herein. The apparatus 300 may be embodied in a smartphone, tablet, computer, internet of things (IoT) receiver, and the like. The apparatus 300 may include one or more processors 302, one or more memories 304, and video processing hardware 306. The processor(s) 302 may be configured to implement one or more methods described in this document. Memory(s) 304 may be used to store data and code for implementing the methods and techniques described herein. The video processing hardware 306 may be used to implement some of the techniques described in this document in hardware circuits. In some embodiments, the hardware 306 may be at least partially within the processor 302 (e.g., a graphics coprocessor).
The following solution may be implemented as a preferred solution in some embodiments.
The following solutions may be implemented together with the additional techniques described among the items listed in the previous section (e.g., item 1).
1. A video processing method (e.g., method 400 shown in fig. 4), comprising: determining (402), for a conversion between a current block of a video and a codec representation of the video, that a resolution of a current picture including the current block and a resolution of a reference picture used for the conversion are different, and performing (404) the conversion based on the determination, thereby generating a prediction value for a group of samples of the current block using a horizontal or vertical interpolation filter.
2. The method of solution 1, wherein the group of samples corresponds to all samples of the current block.
3. The method of solution 1, wherein the group of samples corresponds to a number of samples of the current block.
4. The method of solution 3, wherein the group of samples corresponds to all samples of the region in the current block.
5. The method according to any of solutions 1-4, wherein the group of samples is selected to have the same Motion Vector (MV) used during the conversion.
6. The method according to any of solutions 1-4, wherein the group of samples have the same horizontal motion vector component.
7. The method according to any of solutions 1-4, wherein the group of samples have the same vertical motion vector component.
8. The method of any of solutions 1-4, wherein the group of samples have the same fractional horizontal motion vector component part.
9. The method of any of solutions 1-4, wherein the group of samples have the same fractional vertical motion vector component part.
10. The method according to any of solutions 1-9, wherein during the conversion the motion vector for a particular sample point is derived by modifying the value of the motion vector derived based on the resolution of the current picture and the resolution of the reference picture by means of a modification step comprising truncation, cropping or rounding.
11. The method according to any of solutions 1-7, wherein during the conversion the motion vector of a particular sample is set to the value of a shared motion vector shared by all samples within the group of samples.
12. The method according to any of solutions 1-9, wherein the group of samples share a shared motion vector during the conversion, and wherein the shared motion vector is derived from motion vectors of one or more samples in the group of samples.
13. The method of solution 11, wherein the shared motion vector is further derived from virtual samples.
The following solution may be implemented with additional techniques described among the items listed in the previous section (e.g., item 5).
14. A video processing method, comprising: determining, for a conversion between a current block of video and a codec representation of the video, that a resolution of a current picture including the current block and a resolution of a reference picture used for the conversion are different; and performing the conversion based on the determination such that the prediction value of the group of samples of the current block is generated as an interpolated version of the virtual reference block generated by changing a sampling rate of a region of the reference picture, wherein the sampling rate change depends on a height or width of the current picture or the reference picture.
15. The method of solution 14, wherein the interpolated version is generated using an interpolation filter whose coefficients do not depend on the height or width of the current picture or the reference picture.
The following solution may be implemented with additional techniques described among the items listed in the previous section (e.g., item 6).
16. A video processing method, comprising: determining, for a conversion between a current block of video and a codec representation of the video, that a resolution of a current picture including the current block is different from a resolution of a reference picture used for the conversion; and deriving, based on the determination, an upper-left coordinate of a bounding block for reference sample padding on the basis of a scheme that depends on a height or width of the current picture or the reference picture; and performing the transformation using the derived upper-left coordinate of the bounding box.
17. The method of solution 16, comprising calculating luminance samples at integer sample positions as:
xInti=Clip3(xSbIntL-Dx,xSbIntL+sbWidth+Ux,xInti),
yInti=Clip3(ySbIntL-Dy,ySbIntL+sbHeight+Uy,yInti),
wherein Dx and/or Dy and/or Ux and/or Uy depend on the width and/or height of the current picture or the reference picture, and wherein (xSbIntL, ySbIntL) is the upper-left coordinate.
18. The method of solution 16, comprising calculating chroma samples at integer sample positions as:
xInti=Clip3(xSbIntC-Dx,xSbIntC+sbWidth+Ux,xInti)
yInti=Clip3(ySbIntC-Dy,ySbIntC+sbHeight+Uy,yInti)
wherein Dx and/or Dy and/or Ux and/or Uy depend on the width and/or height of the current picture or the reference picture, and wherein (xSbIntL, ySbIntL) is the upper-left coordinate.
The following solution may be implemented with additional techniques described among the items listed in the previous section (e.g., item 7).
19. A video processing method, comprising: determining, for a conversion between a current block in a current picture of a video and a codec representation of the video, a clipping operation to apply to motion vector calculations from bounding blocks for reference sample point padding based on use of decoder-side motion vector refinement (DMVR) during the conversion of the current block; and performing the conversion based on the clipping operation.
20. The method of solution 19, wherein the determining enables a legacy clipping operation due to using DMVR for the current block.
21. The method of any of solutions 19-20, wherein the current block is a chroma block.
The following solution may be implemented with additional techniques described among the items listed in the previous section (e.g., item 8).
22. A video processing method, comprising: determining, for a conversion between a current block in a current picture of a video and a codec representation of the video, a clipping operation to apply to motion vector calculations from bounding blocks for reference sample padding based on use of picture wrapping in the conversion; and performing the conversion based on the clipping operation.
23. The method of solution 22, wherein the determination enables a legacy clipping operation if picture wrapping is disabled for the current block.
24. The method of any of solutions 22-23, wherein the current block is a chroma block.
25. The method of any of solutions 22-23, wherein the clipping operation is employed to calculate the luminance samples as:
xInti=Clip3(xSbIntL-Dx,xSbIntL+sbWidth+Ux,xInti),
yInti=Clip3(ySbIntL-Dy,ySbIntL+sbHeight+Uy,yInti),
wherein Dx and/or Dy and/or Ux and/or Uy depend on the use of picture wrapping, and wherein (xSbIntL, ySbIntL) represents the bounding block.
26. The method according to any of solutions 1 to 25, wherein the converting comprises encoding the video into a codec representation.
27. The method of any of solutions 1 to 25, wherein the converting comprises decoding the codec representation to generate pixel values of the video.
28. A video decoding apparatus comprising a processor configured to implement the method according to one or more of solutions 1 to 27.
29. A video coding device comprising a processor configured to implement the method according to one or more of solutions 1 to 27.
30. A computer program product having computer code stored thereon, which code, when executed by a processor, causes the processor to carry out the method according to any of the solutions 1 to 27.
31. A method, apparatus, or system described in this document.
The following examples illustrate features implemented by some preferred embodiments based on the disclosed technology.
The following examples may preferably be implemented together with the additional techniques described among the items listed in the preceding section (e.g., item 4). In these examples, sample values are filtered using an interpolation filter to produce sample values at (fractional) resampled positions.
1. A video processing method (e.g., method 500 shown in fig. 5A), comprising: selecting (502) an interpolation filter for determining a prediction block for a current block of a current picture of video by motion compensation from a reference picture based on a rule, and performing (504) a conversion between the current block of video and a codec representation of the video based on the prediction block, wherein the rule specifies: the interpolation filter is a first interpolation filter in a case where the resolution of the current picture and the resolution of the reference picture are different, and is a second interpolation filter in a case where the resolution of the current picture and the resolution of the reference picture are the same, wherein the first interpolation filter is different from the second interpolation filter.
2. The method of example 1, wherein the first interpolation filter has fewer taps than the second interpolation filter.
3. The method of any of examples 1-2, wherein the first interpolation filter is a bilinear filter.
4. The method of any of examples 1-3, wherein the first interpolation filter is a 4-tap filter.
5. The method of any of examples 1-3, wherein the first interpolation filter is a 6-tap filter.
The following examples may be preferably implemented with additional techniques described among the items listed in the previous section (e.g., items 11, 12, 13).
6. A video processing method (e.g., method 570 shown in fig. 5H) comprising: performing (572) a conversion between video comprising a current video picture and a codec representation of the video according to a rule, wherein a reference picture is included in at least one of a reference picture list of the current video picture, wherein the current video picture has a first resolution; wherein the reference picture has a second resolution; wherein the rule specifies whether and/or how the current video picture and/or the reference picture is allowed to have the sub-picture depends on at least one of the first resolution or the second resolution.
7. The method of example 6, wherein the rule specifies that the first resolution is equal to the second resolution in order for the reference picture to comprise a sub-picture. For example, the rule may specify that having the same resolution is both a necessary and a sufficient condition.
8. The method of example 6, wherein the rule specifies that the first resolution is equal to the second resolution in order for the current picture to include a sub-picture.
9. The method of example 6, wherein the rule specifies that the reference picture cannot be split into sub-pictures if the first resolution is different from the second resolution.
10. The method of example 6, wherein the rule specifies that the current picture cannot be split into sub-pictures if the first resolution is different from the second resolution.
11. The method of any of examples 6-10, wherein the rule specifies that the picture is partitioned into sub-pictures according to picture resolution.
12. The method of example 6, wherein the rule specifies that scaling or offsetting is used to derive a sub-picture in the reference picture from a sub-picture in the current picture if the first resolution is different from the second resolution.
The following examples may be preferably implemented with the additional techniques described in the preceding section listing the items (e.g., item 1).
13. A video processing method (e.g., method 510 shown in fig. 5B), comprising: determining (512), for a conversion between a current block of a video and a codec representation of the video, that a resolution of a current picture including the current block and a resolution of a reference picture used for the conversion are different, and performing (514) the conversion based on the determination such that prediction values for samples in a group of samples of the current block are generated using the same horizontal interpolation filter or vertical interpolation filter.
14. The method of example 13, wherein the group of samples corresponds to all samples of the current block.
15. The method of example 13, wherein the group of samples corresponds to fewer than all samples of the current block.
16. The method of example 15, wherein the group of samples corresponds to all samples of a region in the current block.
17. The method of any of examples 13-16, wherein the current block is divided into MxN non-overlapping rectangles of samples, wherein M and N are integers, and wherein the group of samples corresponds to the MxN rectangle.
18. The method of any of examples 13-16, wherein the group of samples corresponds to N samples of a line of samples for the current block, where N is an integer.
19. The method of example 18, wherein N is 4 or 8 or equal to the width of the current block.
20. The method of any of examples 13-16, wherein the group of samples corresponds to N samples in a column of samples of the current block, where N is an integer.
21. The method of example 20, wherein N is 4 or 8 or equal to the height of the current block.
22. The method of any of examples 17-21, wherein the values M and N are constant during the conversion.
23. The method of any of examples 13-22, wherein the values M and N depend on a dimension of the current block or coding information of the current block or correspond to syntax elements included in the coded representation.
24. The method of example 13, wherein the group of samples are selected from samples having the same motion vector.
25. The method of example 13, wherein the group of samples includes samples having a same horizontal component of the motion vector.
26. The method of example 13, wherein the group of samples includes samples having a same motion vector vertical component.
27. The method of example 13, wherein the group of samples includes samples having the same motion vector horizontal component fraction.
28. The method of example 27, wherein the horizontal component fractional portion is represented using M least significant bits, where M is an integer.
29. The method of example 13, wherein the group of samples includes samples having a same motion vector vertical component fractional portion.
30. The method of example 29, wherein the vertical component fractional portion is represented using M least significant bits, where M is an integer.
31. The method of any of examples 25-30, wherein the motion vector used to generate the prediction values for the samples in the group of samples corresponds to a final motion vector that is first derived according to the resolution of the current and reference pictures and denoted as MVb, and then modified according to the motion vector characteristics of the group of samples and denoted as MV'.
32. The method of example 31, wherein the integer part of MV 'is the same as the integer part of MVb, and wherein the fractional part of MV' is the same for all samples in the group of pictures.
33. The method of example 31, wherein MV' is selected as the motion vector having the closest match to MVb and having a fractional portion equal to a shared horizontal or shared vertical component between samples in the group of samples.
34. The method of example 13, wherein the prediction values for the samples are generated by using shared motion vector information for all samples in the group of samples, and wherein the shared motion vector information corresponds to a motion vector value for a particular sample in the group of samples.
35. The method of example 34, wherein the group of samples has a rectangular shape, and wherein the particular sample is a corner sample of the rectangular shape.
36. The method of example 34, wherein the group of samples has a rectangular shape, and wherein the particular sample is a center sample of the rectangular shape.
37. The method of example 34, wherein the group of samples has a line shape, and wherein the particular sample is an end sample of the line shape.
38. The method of example 34, wherein the group of samples has a line shape, and wherein the particular sample is a center sample of the line shape.
39. The method of example 34, wherein the prediction values for the samples are generated by using shared motion vector information for all samples in the group of samples, and wherein the shared motion vector information corresponds to a motion vector determined according to a resolution of the current picture and a resolution of the reference picture.
40. The method of example 13, wherein the prediction values for the samples are generated by using shared motion vector information for all samples in the group of samples, and wherein the shared motion vector information corresponds to motion vector values for particular samples that are not in the group of samples.
41. The method of example 40, wherein the particular sample point is within an area that encompasses all sample points in the group of sample points.
42. The method of example 40, wherein the particular sample point is in a region that does not overlap with all sample points within the group of sample points.
43. The method of example 42, wherein the particular sample point is near the bottom right of the group of sample points.
44. The method of any of examples 40-43, wherein a motion vector for the particular sample point is derived for the converting.
45. The method of any of examples 40-44, wherein the particular sample is at a fractional position between a row of samples in the group of samples and/or a column of samples in the group of samples.
46. The method of example 35, wherein the predicted values for the samples are generated using shared motion vector information for all samples in the group of samples, and wherein the shared motion vector information is a function of motion vector information for one or more samples in the group of samples or one or more virtual samples related to the group of samples.
47. The method of example 46, wherein the shared motion vector information is an average of motion vector information of one or more samples in the group of samples or one or more virtual samples related to the group of samples.
48. The method of any of examples 46-47, wherein the one or more virtual samples are at fractional positions relative to the group of samples.
The following examples may preferably be implemented with the additional techniques described in the preceding section listing the items (e.g., item 2).
49. A video processing method (e.g., method 520 shown in fig. 5C), comprising: making (522) a determination of a constraint on use of a motion vector for deriving a prediction block for the current block, since a resolution of a current picture of video including the current block is different from a resolution of a reference picture for the current block; and performing (524) a conversion between the video and a codec representation of the video based on the determination, wherein the constraint specifies that the motion vector is an integer motion vector.
50. The method of example 49, wherein the constraint is enforced during motion compensation by rounding decoded motion vectors from the codec representation to integer values.
The following examples may preferably be implemented with the additional techniques described in the preceding section listing the items (e.g., item 3).
51. A video processing method (e.g., method 530 shown in fig. 5D), comprising: performing (532) a conversion between a video comprising a current block and a codec representation of the video according to a rule, wherein the rule specifies whether motion vector information used for determining a prediction block of the current block by motion compensation is made available for motion vector prediction of a subsequent block in a current picture including the current block or in another picture.
52. The method of example 51, wherein the motion vector information comprises shared motion vector information described in any of the above examples.
53. The method of any of examples 51-52, wherein the motion vector information comprises motion vector information that was coded into the coded representation.
54. The method of any of examples 51-53, wherein the motion vector information is used for motion vector prediction of a subsequent block in the current picture or another picture.
55. The method of any of examples 51-54, wherein the motion vector information is used in a filtering process in the converting.
56. The method of example 55, wherein the filtering process comprises deblocking filtering or sample adaptive offset filtering or adaptive loop filtering.
57. The method of example 51, wherein the rule does not allow the motion vector information to be used for motion compensation or filtering of a subsequent video block in the current picture or another picture.
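By way of illustration only, the following sketch shows one way a rule as in examples 51-57 could gate whether motion vector information used for motion compensation is exposed to subsequent blocks and to the in-loop filtering stages; the motion-field data structure and names are assumptions made for the illustration.

```python
# Illustrative sketch for examples 51-57: a rule decides whether the motion
# vector information of the current block is written back to the motion field
# and thereby made available for motion vector prediction of later blocks and
# for deblocking / SAO / ALF. The data structure is hypothetical.

def store_motion_info(motion_field, block_pos, mv_info, rule_allows_reuse):
    if rule_allows_reuse:
        # later blocks may use it for motion vector prediction and filtering
        motion_field[block_pos] = mv_info
    else:
        # cf. example 57: the information is not exposed to subsequent blocks
        motion_field[block_pos] = None
```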
The following examples may preferably be implemented together with the additional techniques described in the items listed in the preceding section (e.g., item 5).
58. A video processing method (e.g., method 540 shown in fig. 5E), comprising: making (542) a determination to generate a predicted block for the current block using a two-step process, since a resolution of a current picture of video including the current block is different from a resolution of a reference picture for the current block; and performing (544) a conversion between the video and a codec representation of the video based on the determination, wherein the two-step process includes a first step of resampling a region of the reference picture to generate a virtual reference block and a second step of generating the prediction block by using an interpolation filter on the virtual reference block.
59. The method of example 58, wherein the resampling comprises upsampling or downsampling depending on the width and/or height of the current picture and the reference picture.
60. The method of any of examples 58-59, wherein the interpolation filter is independent of a width and/or a height of the current picture and the reference picture.
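By way of illustration only, the following sketch outlines the two-step prediction of examples 58-60, in which a region of the reference picture is first resampled to a virtual reference block and the interpolation filter is then applied; the resampling and interpolation helpers are placeholders for codec-specific filters and are assumptions made for the illustration.

```python
# Illustrative sketch for examples 58-60: two-step inter prediction when the
# reference picture resolution differs from the current picture resolution.
# `resample_region` and `interpolate` are assumed callbacks, not normative.

def predict_block_two_step(ref_picture, region, cur_size, ref_size,
                           mv_frac, resample_region, interpolate):
    # Step 1: up- or down-sample the referenced region of the reference
    # picture to the current picture's sampling grid (cf. example 59).
    scale_x = cur_size[0] / ref_size[0]
    scale_y = cur_size[1] / ref_size[1]
    virtual_ref = resample_region(ref_picture, region, scale_x, scale_y)

    # Step 2: apply the ordinary fractional-sample interpolation filter on
    # the virtual reference block; the filter itself does not depend on the
    # picture sizes (cf. example 60).
    return interpolate(virtual_ref, mv_frac)
```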
The following examples may preferably be implemented together with the additional techniques described in the items listed in the preceding section (e.g., item 9).
61. A video processing method (e.g., method 550 shown in fig. 5F), comprising: making (552) a first determination that there is a difference between a resolution of one or more reference pictures used to generate a prediction block for a current block of a current picture of video and a resolution of the current picture; making (554), using a rule, a second determination, based on the difference, as to whether or how to apply a filtering process to a conversion between the video and a codec representation of the video; and performing (556) the conversion in accordance with the second determination.
62. The method of example 61, wherein the rule specifies that a boundary strength of a filter used in the filtering process is a function of the difference.
63. The method of example 62, wherein the rule specifies scaling the boundary strength based on the difference.
64. The method of example 62, wherein the rule specifies that the boundary strength is to be increased from its usual value if a first reference picture of a first neighboring block of the current block and a second reference picture of a second neighboring block of the current block have different resolutions.
65. The method of example 62, wherein the rule specifies that the boundary strength is to be reduced from its usual value if a first reference picture of a first neighboring block of the current block and a second reference picture of a second neighboring block of the current block have different resolutions.
66. The method of example 62, wherein the rule specifies that the boundary strength is to be increased from its usual value if a resolution of a first reference picture of a first neighboring block and/or a resolution of a second reference picture of a second neighboring block of the current block are different.
67. The method of example 62, wherein the rule specifies that the boundary strength is to be reduced from its usual value if a resolution of a first reference picture of a first neighboring block and/or a resolution of a second reference picture of a second neighboring block of the current block are different.
68. The method of any of examples 62-67, wherein a resolution of the first reference picture or a resolution of the second reference picture is greater than a resolution of a reference picture of the current block.
69. The method of any of examples 62-67, wherein a resolution of the first reference picture or a resolution of the second reference picture is less than a resolution of a reference picture of the current block.
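By way of illustration only, the following sketch shows one possible adaptation of the deblocking boundary strength based on a resolution difference, in the spirit of examples 61-69; the baseline boundary strength derivation and the +/-1 adjustment are assumptions made for the illustration.

```python
# Illustrative sketch for examples 61-69: adapt the deblocking boundary
# strength (BS) when the reference pictures of the two neighboring blocks
# and the current picture have different resolutions. The adjustment of
# +/-1 and the clipping to the range 0..2 are assumptions.

def boundary_strength(bs_base, cur_res, ref_res_p, ref_res_q,
                      increase_on_mismatch=True):
    """bs_base: BS derived by the usual rules (0, 1, or 2).
    ref_res_p / ref_res_q: resolutions of the reference pictures of the two
    neighboring blocks P and Q; cur_res: resolution of the current picture."""
    mismatch = (ref_res_p != ref_res_q or
                ref_res_p != cur_res or ref_res_q != cur_res)
    if not mismatch:
        return bs_base
    # cf. examples 64/66: strengthen filtering across a resolution change;
    # examples 65/67 would instead decrease it.
    delta = 1 if increase_on_mismatch else -1
    return max(0, min(2, bs_base + delta))
```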
The following examples may preferably be implemented together with the additional techniques described in the items listed in the preceding section (e.g., item 10).
70. A video processing method (e.g., method 560 shown in fig. 5G), comprising: performing (562) a conversion between video comprising a plurality of video blocks of a video picture and a codec representation of the video according to a rule, wherein the plurality of video blocks are processed in sequence, wherein the rule specifies that motion vector information for determining a prediction block for a first video block is stored according to a resolution of a reference picture used by the motion vector information and is used during processing of a subsequent video block of the plurality of video blocks.
71. The method of example 70, wherein the rule specifies that the motion vector information is to be used for processing a subsequent video block after being adjusted according to a resolution difference between the resolution of the reference picture and a resolution of the current picture.
72. The method of example 70, wherein the rule specifies that the motion vector information is to be used for processing a subsequent video block without being adjusted according to a resolution difference between the resolution of the reference picture and a resolution of the current picture.
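By way of illustration only, the following sketch shows one possible treatment of stored motion vector information as discussed in examples 70-72, either rescaled to the current picture's sampling grid or reused as stored; the scaling formula is an assumption made for the illustration.

```python
# Illustrative sketch for examples 70-72: a stored motion vector is either
# rescaled according to the resolution ratio between the current picture and
# the reference picture before reuse (cf. example 71) or reused as stored
# (cf. example 72). The linear scaling is hypothetical.

def fetch_stored_mv(stored_mv, ref_size, cur_size, adjust=True):
    if not adjust:
        return stored_mv                                # example 72
    sx = cur_size[0] / ref_size[0]
    sy = cur_size[1] / ref_size[1]
    return (stored_mv[0] * sx, stored_mv[1] * sy)       # example 71
```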
73. The method of any of examples 1 to 72, wherein the converting comprises decoding the codec representation to generate pixel values of the video.
74. The method of any of examples 1 to 72, wherein the converting comprises encoding pixel values of the video into the codec representation.
75. A video decoding apparatus comprising a processor configured to implement the method of one or more of examples 1 to 74.
76. A video encoding apparatus comprising a processor configured to implement the method of one or more of examples 1 to 74.
77. A computer program product having computer code stored thereon, which when executed by a processor causes the processor to implement the method described in any of examples 1 to 74.
78. A non-transitory computer-readable storage medium storing instructions that cause a processor to implement the method of any of examples 1 to 74.
79. A non-transitory computer-readable recording medium storing a bitstream corresponding to a codec representation generated by a method described in any of examples 1 to 74.
80. A method, apparatus, or system described in this document.
In the above solutions, performing the conversion comprises using the results of the previous decision steps during the encoding or decoding operation to arrive at the conversion results.
Herein, the term "video processing" may refer to video encoding, video decoding, video compression, or video decompression. For example, a video compression algorithm may be applied during the conversion from a pixel representation of a video to a corresponding bitstream representation, or vice versa. The bitstream representation or codec representation of the current video block may, for example, correspond to bits that are co-located or spread at different places within the bitstream, as defined by the syntax. For example, a video block may be encoded in terms of transformed and coded error residual values and also using bits in headers and other fields in the bitstream. Furthermore, during the conversion, the decoder may parse the bitstream knowing that certain fields may be present or absent, based on the determinations described in the above solutions. Similarly, the encoder may determine that certain syntax fields are or are not to be included and generate the codec representation accordingly by including or excluding those syntax fields from the codec representation.
Some embodiments of the disclosed technology (e.g., the solutions and examples described above) include making a decision or determination to enable a video processing tool or mode. In one example, when a video processing tool or mode is enabled, the encoder will use or implement the tool or mode in the processing of the video blocks, but does not necessarily modify the resulting bitstream based on the use of the tool or mode. That is, when a video processing tool or mode is enabled based on the decision or determination, the conversion from a video block to a bitstream representation of the video will use that video processing tool or mode. In another example, when a video processing tool or mode is enabled, the decoder will process the bitstream knowing that the bitstream has been modified based on the video processing tool or mode. That is, the conversion from a bitstream representation of the video to video blocks will be performed using a video processing tool or mode that is enabled based on the decision or determination.
Some embodiments of the disclosed technology include making a decision or determination to disable a video processing tool or mode. In one example, when a video processing tool or mode is disabled, the encoder will not use that tool or mode in converting video blocks into a bitstream representation of the video. In another example, when a video processing tool or mode is disabled, the decoder will process the bitstream knowing that no modifications have been made to the bitstream using the video processing tool or mode that was disabled based on the decision or determination.
The disclosed and other solutions, examples, embodiments, modules, and functional operations described in this document can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this document and their structural equivalents, or in combinations of one or more of them. The disclosed embodiments and other embodiments may be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a computer-readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine readable storage device, a machine readable storage substrate, a memory device, a composition of matter effecting a machine readable propagated signal, or a combination of one or more of them. The term "data processing apparatus" encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. A propagated signal is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus.
A computer program (also known as a program, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification can be performed by one or more computer programs executed by one or more programmable processors to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer does not necessarily have such a device. Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD ROM and DVD ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
While this patent document contains many specifics, these should not be construed as limitations on the scope of any subject matter or claims, but rather as descriptions of features specific to particular embodiments of particular technologies. Certain features that are described in this patent document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Furthermore, although certain features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Moreover, the separation of various system components among the embodiments described in this patent document should not be understood as requiring such separation in all embodiments.
Only a few embodiments and examples have been described and other embodiments, enhancements and variations can be made based on what is described and illustrated in this patent document.