WO2001077871A1 - Enhanced temporal and resolution layering in advanced television
- Publication number
- WO2001077871A1 (PCT/US2001/011204)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- frame
- resolution
- digital video
- Prior art date
Classifications
- H04N5/14—Picture signal circuitry for video frequency region
- H04N5/21—Circuitry for suppressing or minimising disturbance, e.g. moiré or halo
- H04N5/144—Movement detection
- H04N5/145—Movement estimation
- H04N19/13—Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
- H04N19/17—Adaptive coding characterised by the coding unit being an image region, e.g. an object
- H04N19/39—Hierarchical techniques involving multiple description coding [MDC], i.e. with separate layers being structured as independently decodable descriptions of input picture data
- H04N19/51—Motion estimation or motion compensation
- H04N19/523—Motion estimation or motion compensation with sub-pixel accuracy
- H04N19/55—Motion estimation with spatial constraints, e.g. at image or region borders
- H04N19/56—Motion estimation with initialisation of the vector search, e.g. estimating a good candidate to initiate a search
- H04N19/567—Motion estimation based on rate distortion criteria
- H04N19/587—Predictive coding involving temporal sub-sampling or interpolation, e.g. decimation or subsequent interpolation of pictures in a video sequence
- H04N19/59—Predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
- H04N19/61—Transform coding in combination with predictive coding
- H04N19/80—Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
- H04N7/0112—Standards conversion with one of the standards corresponding to a cinematograph film standard
- H04N7/012—Conversion between an interlaced and a progressive signal
- H04N7/0132—Standards conversion by changing the field or frame frequency of the incoming video signal, the frequency being multiplied by a positive integer, e.g. for flicker reduction
- H04N9/64—Circuits for processing colour signals
Definitions
- This invention relates to electronic communication systems, and more particularly to an advanced electronic television system having temporal and resolution layering of compressed image frames having enhanced compression, filtering, and display characteristics.
- interlace is claimed to be required, due to a purported need for about 1000 lines of resolution at high frame rates, based upon the notion that such images cannot be compressed within the available 18-19 mbits/second of a conventional 6 MHz broadcast television channel.
- the present invention provides such enhancements.
- the invention provides a method and apparatus for image compression which demonstrably achieves better than 1000-line resolution image compression at high frame rates with high quality. It also achieves both temporal and resolution scalability at this resolution at high frame rates within the available bandwidth of a conventional television broadcast channel.
- the inventive technique efficiently achieves over twice the compression ratio being proposed for advanced television.
- layered compression allows a form of modularized decomposition of an image that supports flexible application of a variety of image enhancement techniques.
- Image material is preferably captured at an initial or primary framing rate of 72 fps.
- An MPEG-like (e.g., MPEG-2, MPEG-4, etc.) data stream is then generated, comprising:
- a base layer preferably encoded using only MPEG-type P frames, comprising a low resolution (e.g., 1024x512 pixels), low frame rate (24 or 36 Hz) bitstream;
- an optional base resolution temporal enhancement layer encoded using only MPEG-type B frames, comprising a low resolution (e.g., 1024x512 pixels), high frame rate (72 Hz) bitstream;
- an optional base temporal high resolution enhancement layer preferably encoded using only MPEG-type P frames, comprising a high resolution (e.g., 2k×1k pixels), low frame rate (24 or 36 Hz) bitstream;
- an optional high resolution temporal enhancement layer encoded using only MPEG-type B frames, comprising a high resolution (e.g., 2k×1k pixels), high frame rate (72 Hz) bitstream.
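The four-layer structure above lends itself to a simple decoder capability check. The following is an illustrative sketch only, not from the patent text: the layer names, the 36 Hz base-rate choice, and the tuple representation are our assumptions based on the layers listed above.

```python
# Illustrative sketch: which of the four described layers a decoder would
# use, given its display's maximum width and refresh rate. Layer names,
# resolutions, and the capability test are assumptions for illustration.

LAYERS = {
    # name: ((width, height), frame_rate_hz, MPEG frame types used)
    "base":              ((1024, 512), 36, "P"),
    "base_temporal":     ((1024, 512), 72, "B"),
    "high_res":          ((2048, 1024), 36, "P"),
    "high_res_temporal": ((2048, 1024), 72, "B"),
}

def select_layers(max_width: int, max_rate_hz: int) -> list[str]:
    """Pick every layer the display can make use of. Because every
    enhancement layer's requirements imply the base layer's, the base
    layer is always included first."""
    return [name for name, ((w, _h), rate, _types) in LAYERS.items()
            if w <= max_width and rate <= max_rate_hz]

print(select_layers(1280, 72))  # ['base', 'base_temporal']
print(select_layers(2048, 72))  # all four layers
```

A low-resolution 72 Hz decoder thus reads only the two 1024×512 bitstreams, while a full-capability decoder combines all four.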
- the invention provides a number of key technical attributes, allowing substantial improvement over current proposals, including: replacement of numerous resolutions and frame rates with a single layered resolution and frame rate; no need for interlace in order to achieve better than 1000 lines of resolution for 2 megapixel images at high frame rates (72 Hz) within a 6 MHz television channel; compatibility with computer displays through use of a primary framing rate of 72 fps; and greater robustness than the current unlayered format proposal for advanced television, since all available bits may be allocated to a lower resolution base layer when "stressful" image material is encountered.
- the invention provides a number of enhancements to handle a variety of video quality and compression problems.
- the following describes a number of such enhancements, most of which are preferably embodied as a set of tools which can be applied to the tasks of enhancing images and compressing such images.
- the tools can be combined by a content developer in various ways, as desired, to optimize the visual quality and compression efficiency of a compressed data stream, particularly a layered compressed data stream.
- Such tools include improved image filtering techniques, motion vector representation and determination, de-interlacing and noise reduction enhancements, motion analysis, imaging device characterization and correction, an enhanced 3-2 pulldown system, frame rate methods for production, a modular bit rate technique, a multi-layer DCT structure, variable length coding optimization, an augmentation system for MPEG-2 and MPEG-4, and guide vectors for the spatial enhancement layer.
- FIG. 1 is a timing diagram showing the pulldown rates for 24 fps and 36 fps material to be displayed at 60 Hz.
- FIG. 2 is a first preferred MPEG-2 coding pattern.
- FIG. 3 is a second preferred MPEG-2 coding pattern.
- FIG. 4 is a block diagram showing temporal layer decoding in accordance with the preferred embodiment of the invention.
- FIG. 5 is a block diagram showing 60 Hz interlaced input to a converter that can output both 36 Hz and 72 Hz frames.
- FIG. 6 is a diagram showing a "master template" for a base MPEG-2 layer at 24 or 36 Hz.
- FIG. 7 is a diagram showing enhancement of a base resolution template using hierarchical resolution scalability utilizing MPEG-2.
- FIG. 8 is a diagram showing the preferred layered resolution encoding process.
- FIG. 9 is a diagram showing the preferred layered resolution decoding process.
- FIG. 10 is a block diagram showing a combination of resolution and temporal scalable options for a decoder in accordance with the invention.
- FIG. 11 is a diagram of a base layer expanded by using a gray area and enhancement to provide picture detail.
- FIG. 12 is a diagram of the relative shape, amplitudes, and lobe polarity of a preferred downsizing filter.
- FIGS. 13A and 13B are diagrams of the relative shape, amplitudes, and lobe polarity of a pair of preferred upsizing filters for upsizing by a factor of 2.
- FIG. 14A is a block diagram of an odd-field de-interlacer.
- FIG. 14B is a block diagram of an even-field de-interlacer.
- FIG. 15 is a block diagram of a frame de-interlacer using three de-interlaced fields.
- FIG. 16 is a block diagram of an additional layered mode based upon a 2/3 base layer.
- FIG. 17 is a diagram of one example of applying higher bit rates to modular portions of a compressed data stream.
- FIG. 18 graphically illustrates the relationships of DCT harmonics between two resolution layers.
- FIG. 19 graphically illustrates the similar relationships of DCT harmonics between three resolution layers.
- FIG. 20 is a diagram showing a set of matched DCT block sizes for multiple resolution layers.
- FIG. 21 is a diagram showing examples of splitting of motion compensation macroblocks for determining independent motion vectors.
- FIG. 22 is a block diagram showing an augmentation system for MPEG-2 type systems.
- FIG. 23 is a diagram showing use of motion vectors from a base layer as guide vectors for a resolution enhancement layer.
- FIGS. 24A-24E are data flow diagrams showing one example professional-level enhancement mode.
- Beat Problems. Optimal presentation on a 72 or 75 Hz display will occur if a camera or simulated image is created having a motion rate equal to the display rate (72 or 75 Hz, respectively). Similarly, optimal motion fidelity on a 60 Hz display will result from a 60 Hz camera or simulated image.
- Use of 72 Hz or 75 Hz generation rates with 60 Hz displays results in a 12 Hz or 15 Hz beat frequency, respectively. This beat can be removed through motion analysis, but motion analysis is expensive and inexact, often leading to visible artifacts and temporal aliasing. In the absence of motion analysis, the beat frequency dominates the perceived display rate, making the 12 or 15 Hz beat appear to provide less accurate motion than even 24 Hz.
- 24 Hz forms a natural temporal common denominator between 60 and 72 Hz.
- although 75 Hz has a slightly higher 15 Hz beat with 60 Hz, its motion is still not as smooth as 24 Hz, and there is no integral relationship between 75 Hz and 24 Hz unless the 24 Hz rate is increased to 25 Hz.
- movies are often played 4% fast at 25 Hz; this can be done to make film presentable on 75 Hz displays.
- the 3-2 pulldown pattern repeats a first frame (or field) 3 times, then the next frame 2 times, then the next frame 3 times, then the next frame 2 times, etc. This is how 24 fps film is presented on television at 60 Hz (actually, 59.94 Hz for NTSC color). That is, each of 12 pairs of 2 frames in one second of film is displayed 5 times, giving 60 images per second.
- the 3-2 pulldown pattern is shown in FIG. 1.
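The 3-2 pulldown expansion described above can be expressed in a few lines. In this sketch (the function name and the use of integer frame indices are illustrative), one second of 24 fps film becomes exactly 60 displayed images:

```python
from itertools import cycle

def pulldown(frames, pattern=(3, 2)):
    """Repeat each source frame per the repeating pulldown pattern:
    the first frame 3 times, the next 2 times, and so on."""
    out = []
    for frame, repeats in zip(frames, cycle(pattern)):
        out.extend([frame] * repeats)
    return out

displayed = pulldown(list(range(24)))  # one second of 24 fps film
print(len(displayed))   # 60 images per second at 60 Hz
print(displayed[:8])    # [0, 0, 0, 1, 1, 2, 2, 2]
```

Each pair of film frames yields 3 + 2 = 5 displayed images, and 12 pairs per second give the 60 images noted above.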
- Motion Blur. In order to further explore the issue of finding a common temporal rate higher than 24 Hz, it is useful to consider motion blur in the capture of moving images.
- Camera sensors and motion picture film are open to sensing a moving image for a portion of the duration of each frame.
- the duration of this exposure is adjustable.
- Film cameras require a period of time to advance the film, and are usually limited to being open only about 210 out of 360 degrees, or a 58% duty cycle.
- some portion of the frame time is often required to "read" the image from the sensor. This can vary from 10% to 50% of the frame time.
- an electronic shutter must be used to blank the light during this readout time.
- the "duty cycle" of CCD sensors usually varies from 50 to 90%, and is adjustable in some cameras. The light shutter can sometimes be adjusted to further reduce the duty cycle, if desired.
- the most common sensor duty cycle duration is 50%.
- Preferred Rate. With this issue in mind, one can consider the use of only some of the frames from an image sequence captured at 60, 72, or 75 Hz. Utilizing one frame in two, three, four, etc., the subrates shown in TABLE 1 can be derived.
- the rate of 15 Hz is a unifying rate between 60 and 75 Hz.
- the rate of 12 Hz is a unifying rate between 60 and 72 Hz.
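The subrate derivation behind TABLE 1 can be reproduced numerically. A sketch (the helper name is ours, and TABLE 1 itself is not reproduced here): taking one frame in two, three, four, etc. gives the integral subrates of each display rate, and the highest rate that unifies two displays is simply their greatest common divisor.

```python
from math import gcd

def subrates(rate, max_divisor=6):
    """Integral subrates obtained by using one frame in two, three,
    four, etc. of material captured at the given rate."""
    return [rate // d for d in range(2, max_divisor + 1) if rate % d == 0]

print(subrates(60))   # [30, 20, 15, 12, 10]
print(subrates(72))   # [36, 24, 18, 12]
print(gcd(60, 75))    # 15: the unifying rate between 60 and 75 Hz
print(gcd(60, 72))    # 12: the unifying rate between 60 and 72 Hz
```

Both unifying rates fall below 24 Hz, which is why the text goes on to eliminate them and search among 30, 36, and 37.5 Hz.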
- the desire for a rate above 24 Hz eliminates these rates. 24 Hz is not a subrate common to both display rates, but the use of 3-2 pulldown has come to be accepted by the industry for presentation on 60 Hz displays.
- the only candidate rates are therefore 30, 36, and 37.5 Hz. Since 30 Hz has a 7.5 Hz beat with 75 Hz, and a 6 Hz beat with 72 Hz, it is not useful as a candidate.
- the motion rates of 36 and 37.5 Hz become prime candidates for smoother motion than 24 Hz material when presented on 60 and 72/75 Hz displays. Both of these rates are about 50% faster and smoother than 24 Hz.
- the rate of 37.5 Hz is not suitable for use with either 60 or 72 Hz, so it must be eliminated, leaving only 36 Hz as having the desired temporal rate characteristics.
- the motion rate of 37.5 Hz could be used if the 60 Hz display rate for television can be moved 4% to 62.5 Hz. Given the interests behind 60 Hz, 62.5 Hz appears unlikely - there are even those who propose the very obsolete 59.94 Hz rate for new television systems. However, if such a change were to be made, the other aspects of the invention could be applied to the 37.5 Hz rate.
- the rates of 24, 36, 60, and 72 Hz are left as candidates for a temporal rate family.
- the rates of 72 and 60 Hz cannot be used for a distribution rate, since motion is less smooth when converting between these two rates than if 24 Hz is used as the distribution rate, as described above.
- 36 Hz is the prime candidate for a master, unifying motion capture and image distribution rate for use with 60 and 72/75 Hz displays.
- the 3-2 pulldown pattern for 24 Hz material repeats a first frame (or field) 3 times, then the next frame 2 times, then the next frame 3 times, then the next frame 2 times, etc.
- for 36 Hz material presented on a 60 Hz display, each frame optimally should be repeated in a 2-1-2 pattern. This can be seen in TABLE 2 and graphically in FIG. 1.
- 36 Hz is the optimum rate for a master, unifying motion capture and image distribution rate for use with 60 and 72 Hz displays, yielding smoother motion than 24 Hz material presented on such displays.
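The arithmetic behind the 2-1-2 claim is easy to check: repeating 36 frames per second in a 2-1-2 cycle yields exactly 60 displayed images, just as 3-2 pulldown does for 24 fps film. A minimal sketch (the function name is illustrative):

```python
from itertools import cycle, islice

def displays_per_second(content_rate, pattern):
    """Displayed images per second when content frames are repeated
    according to the given cyclic pulldown pattern."""
    return sum(islice(cycle(pattern), content_rate))

print(displays_per_second(24, (3, 2)))     # 60: 3-2 pulldown of 24 fps film
print(displays_per_second(36, (2, 1, 2)))  # 60: 2-1-2 pulldown of 36 Hz material
```

In the 2-1-2 case, every three content frames produce 2 + 1 + 2 = 5 displayed images, and twelve such cycles fill one second at 60 Hz.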
- although 36 Hz meets the goals set forth above, it is not the only suitable capture rate. Since 36 Hz cannot be simply extracted from 60 Hz, 60 Hz does not provide a suitable rate for capture. However, 72 Hz can be used for capture, with every other frame then used as the basis for 36 Hz distribution. The motion blur from using every other frame of 72 Hz material will be half of the motion blur at 36 Hz capture. Tests of the motion blur appearance of every third frame from 72 Hz show that staccato strobing at 24 Hz is objectionable. However, utilizing every other frame from 72 Hz for 36 Hz display is not objectionable to the eye compared to 36 Hz native capture.
- a 72 Hz camera used to achieve a 36 Hz distribution rate can profit from an increased motion blur duty cycle.
- the normal 50% duty cycle at 72 Hz, yielding a 25% duty cycle at 36 Hz, has been demonstrated to be acceptable, and represents a significant improvement over 24 Hz on 60 Hz and 72 Hz displays.
- if the duty cycle is increased to the 75-90% range, then the 36 Hz samples would begin to approach the more common 50% duty cycle.
- increasing the duty cycle may be accomplished, for example, by using "backing store" CCD designs which have a short blanking time, yielding a high duty cycle. Other methods may be used, including dual-CCD multiplexed designs.
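The duty-cycle arithmetic above can be made explicit: when every other frame of a 72 Hz capture is retained, each kept frame's open-shutter time is unchanged while its frame period doubles. A minimal sketch (the function name is hypothetical):

```python
def duty_cycle_at_half_rate(capture_duty_72hz):
    """Effective shutter duty cycle at 36 Hz when every other frame of a
    72 Hz capture is kept: the open-shutter interval stays fixed while
    the frame period doubles, halving the duty cycle."""
    return capture_duty_72hz / 2.0

print(duty_cycle_at_half_rate(0.50))  # 0.25 -> the 25% duty cycle noted above
print(duty_cycle_at_half_rate(0.80))  # 0.40 -> an 80% shutter approaches 50%
```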
- Modified MPEG-2 Compression. For efficient storage and distribution, digital source material having the preferred temporal rate of 36 Hz should be compressed.
- the preferred form of compression for the invention is accomplished by using a novel variation of the MPEG-2 standard, but may be used with any other compression system having similar characteristics (e.g., MPEG-4).
- MPEG-2 Basics.
- MPEG-2 is an international video compression standard defining a video syntax that provides an efficient way to represent image sequences in the form of more compact coded data.
- the language of the coded bits is the "syntax." For example, a few tokens can represent an entire block of 64 samples.
- MPEG also describes a decoding (reconstruction) process where the coded bits are mapped from the compact representation into the original, "raw" format of the image sequence. For example, a flag in the coded bitstream signals whether the following bits are to be decoded with a discrete cosine transform (DCT) algorithm or with a prediction algorithm.
- the algorithms comprising the decoding process are regulated by the semantics defined by MPEG.
- MPEG-2 defines a programming language as well as a data format.
- An MPEG-2 decoder must be able to parse and decode an incoming data stream, but so long as the data stream complies with the MPEG-2 syntax, a wide variety of possible data structures and compression techniques can be used.
- the invention takes advantage of this flexibility by devising a novel means and method for temporal and resolution scaling using the MPEG-2 standard.
- MPEG-2 uses an intraframe and an interframe method of compression. In most video scenes, the background remains relatively stable while action takes place in the foreground. The background may move, but a great deal of the scene is redundant. MPEG-2 starts its compression by creating a reference frame called an I (for Intra) frame. I frames are compressed without reference to other frames and thus contain an entire frame of video information. I frames provide entry points into a data bitstream for random access, but can only be moderately compressed. Typically, the data representing I frames is placed in the bitstream every 10 to 15 frames. Thereafter, since only a small portion of the frames that fall between the reference I frames are different from the bracketing I frames, only the differences are captured, compressed and stored.
- P frames generally are encoded with reference to a past frame (either an I frame or a previous P frame), and, in general, will be used as a reference for future P frames.
- P frames receive a fairly high amount of compression.
- B frames provide the highest amount of compression but generally require both a past and a future reference frame in order to be encoded.
- bi-directional (B) frames are never used as reference frames.
- Macroblocks within P frames may also be individually encoded using intra-frame coding.
- Macroblocks within B frames may also be individually encoded using intra-frame coding, forward predicted coding, backward predicted coding, or both forward and backward, or bi-directionally interpolated, predicted coding.
- a macroblock is a 16x16 pixel grouping of four 8x8 DCT blocks, together with one motion vector for P frames, and one or two motion vectors for B frames.
- an MPEG data bitstream comprises a sequence of I, P, and B frames.
- a sequence may consist of almost any pattern of I, P, and B frames (there are a few minor semantic restrictions on their placement). However, it is common in industrial practice to have a fixed pattern (e.g., IBBPBBPBBPBBPBB).
- an MPEG-2 data stream is created comprising a base layer, at least one optional temporal enhancement layer, and an optional resolution enhancement layer. Each of these layers will be described in detail.
- the base layer is used to carry 36 Hz source material.
- one of two MPEG-2 frame sequences can be used for the base layer: IBPBPBP or IPPPPPP.
- 72 Hz Temporal Enhancement Layer. When using MPEG-2 compression, it is possible to embed a 36 Hz temporal enhancement layer as B frames within the MPEG-2 sequence for the 36 Hz base layer if the P frame distance is even. This allows the single data stream to support both 36 Hz display and 72 Hz display. For example, both layers could be decoded to generate a 72 Hz signal for computer monitors, while only the base layer might be decoded and converted to generate a 60 Hz signal for television.
- IPBBBPBBBPBBBP or PBPBPBPB both allow placing alternate frames in a separate stream containing only temporal enhancement B frames to take 36 Hz to 72 Hz.
- these coding patterns are shown in FIGS. 2 and 3, respectively.
- the 2-Frame P spacing coding pattern of FIG. 3 has the added advantage that the 36 Hz decoder would only need to decode P frames, reducing the required memory bandwidth if 24 Hz movies were also decoded without B frames.
- FIG. 3 is a block diagram showing that a 36 Hz base layer MPEG-2 decoder 50 simply decodes the P frames to generate 36 Hz output, which may then be readily converted to either 60 Hz or 72 Hz display.
- an optional second decoder 52 simply decodes the B frames to generate a second 36 Hz output, which when combined with the 36 Hz output of the base layer decoder 50 results in a 72 Hz output (a method for combining is discussed below).
- one fast MPEG-2 decoder 50 could decode both the P frames for the base layer and the B frames for the enhancement layer.
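The partitioning described for FIG. 3 can be sketched as follows; frame-type letters stand in for coded frames, and `split_temporal_layers` is a hypothetical helper, not part of the patent or of MPEG-2 itself.

```python
def split_temporal_layers(frame_types):
    """Partition a 72 Hz coded sequence (2-frame P spacing, as in
    FIG. 3) into a 36 Hz base layer of I/P frames and a temporal
    enhancement layer holding only the B frames."""
    base = [f for f in frame_types if f in ("I", "P")]
    enhancement = [f for f in frame_types if f == "B"]
    return base, enhancement

base, enh = split_temporal_layers(list("IBPBPBPB"))
print(base)  # ['I', 'P', 'P', 'P'] -- decodable alone for 36 Hz output
print(enh)   # ['B', 'B', 'B', 'B'] -- interleaved back in to reach 72 Hz
```

Because the base stream contains no B frames, a 36 Hz decoder never needs B-frame logic, which is the memory-bandwidth advantage noted above.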
- Optimal Master Format. A number of companies are building MPEG-2 decoding chips which operate at around 11 MPixels/second. The MPEG-2 standard has defined some "profiles" for resolutions and frame rates. Although these profiles are strongly biased toward computer-incompatible format parameters such as 60 Hz, non-square pixels, and interlace, many chip manufacturers appear to be developing decoder chips which operate at the "main profile, main level".
- this profile is defined to be any horizontal resolution up to 720 pixels, with any vertical resolution up to 576 lines at frame rates up to 25 Hz, or up to 480 lines at frame rates up to 30 Hz.
- a wide range of data rates from approximately 1.5 Mbits/second to about 10 Mbits/second is also specified.
- the main-level, main-profile pixel rate is about 10.5 MPixels/second.
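The main-level constraints and the resulting peak pixel rate can be sketched directly. This is an illustrative check of the figures quoted above (`fits_main_level` is a hypothetical helper and deliberately ignores other main-level limits such as bit rate).

```python
def fits_main_level(width, height, hz):
    """Rough check against the main-profile/main-level limits described
    above: up to 720 pixels wide; up to 576 lines at up to 25 Hz, or up
    to 480 lines at up to 30 Hz."""
    if width > 720:
        return False
    if height <= 480 and hz <= 30:
        return True
    if height <= 576 and hz <= 25:
        return True
    return False

peak = 720 * 576 * 25  # equals 720 * 480 * 30 = 10,368,000
print(fits_main_level(720, 576, 25), fits_main_level(720, 576, 30))  # True False
print(peak / 1e6)  # 10.368 -- the "about 10.5 MPixels/second" figure above
```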
- MPEG-2 Decoder Chips. Although there is variation among chip manufacturers, most MPEG-2 decoder chips will in fact operate at up to 13 MPixels/second, given fast support memory. Some decoder chips will go as fast as 20 MPixels/second or more. Given that CPU chips tend to gain 50% improvement or more each year at a given cost, one can expect some near-term flexibility in the pixel rate of MPEG-2 decoder chips.
- TABLE 4 illustrates some desirable resolutions and frame rates, and their corresponding pixel rates.
- All of these formats can be utilized with MPEG-2 decoder chips that can generate at least 12.6 MPixels/second.
- the very desirable 640x480 at 36 Hz format can be achieved by nearly all current chips, since its rate is 11.1 MPixels/second.
- a widescreen 1024x512 image can be squeezed into 680x512 using a 1.5:1 squeeze, and can be supported at 36 Hz if 12.5 MPixels/second can be handled.
- the highly desirable square pixel widescreen template of 1024x512 can achieve 36 Hz when MPEG-2 decoder chips can process about 18.9 MPixels/second. This becomes more feasible if 24 Hz and 36 Hz material is coded only with P frames, such that B frames are only required in the 72 Hz temporal enhancement layer decoders. Decoders which use only P frames require less memory and memory bandwidth, making the goal of 19 MPixels/second more accessible.
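The pixel rates quoted for the candidate formats in TABLE 4 follow from a one-line calculation (width x height x frame rate); the sketch below verifies the figures cited above. `mpixels_per_second` is a hypothetical helper name.

```python
def mpixels_per_second(width, height, hz):
    """Raw decoded pixel rate for a format, in MPixels/second."""
    return width * height * hz / 1e6

print(round(mpixels_per_second(640, 480, 36), 2))   # 11.06 -- the "11.1" figure
print(round(mpixels_per_second(680, 512, 36), 2))   # 12.53 -- squeezed widescreen, "12.5"
print(round(mpixels_per_second(1024, 512, 36), 2))  # 18.87 -- the "about 18.9" figure
```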
- the 1024x512 resolution template would most often be used with 2.35:1 and 1.85:1 widescreen material.
- the invention provides a unique way of accommodating a wide variety of aspect ratios and temporal resolution compared to the prior art. (Further discussion of a master template is set forth below).
- the temporal enhancement layer of B frames to generate 72 Hz can be decoded using a chip with double the pixel rates specified above, or by using a second chip in parallel with additional access to the decoder memory.
- merging can be done invisibly to the decoder chip using the MPEG-2 transport layer.
- the MPEG-2 transport packets for two PIDs can be recognized as containing the base layer and enhancement layer, and their stream contents can both be simply passed on to a double-rate capable decoder chip, or to an appropriately configured pair of normal rate decoders.
- the data partitioning feature allows the B frames to be marked as belonging to a different class within the MPEG-2 compressed data stream, and can therefore be flagged to be ignored by 36-Hz decoders which only support the temporal base layer rate.
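The PID-based routing described above can be sketched as a simple filter over transport packets. This is a hedged sketch: the PID values and packet representation are hypothetical, and real transport demultiplexing involves considerably more state.

```python
def route_by_pid(packets, base_pid, enh_pid, wants_72hz):
    """Select packet payloads for a decoder: base-layer packets are
    always passed; enhancement-layer (B frame) packets are passed only
    when the decoder supports the 72 Hz temporal layer."""
    selected = []
    for pid, payload in packets:
        if pid == base_pid or (pid == enh_pid and wants_72hz):
            selected.append(payload)
    return selected

packets = [(0x30, "P0"), (0x31, "B0"), (0x30, "P1"), (0x31, "B1")]
print(route_by_pid(packets, 0x30, 0x31, wants_72hz=False))  # ['P0', 'P1']
print(route_by_pid(packets, 0x30, 0x31, wants_72hz=True))   # ['P0', 'B0', 'P1', 'B1']
```

A 36 Hz-only decoder thus never even parses the enhancement stream, which is what makes the two-PID arrangement invisible to the decoder chip.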
- temporal scalability, as defined by MPEG-2 video compression, is not as optimal as the simple B frame partitioning of the invention.
- the MPEG-2 temporal scalability is only forward referenced from a previous P or B frame, and thus lacks the efficiency available in the B frame encoding proposed here, which is both forward and backward referenced.
- the simple use of B frames as a temporal enhancement layer provides a simpler and more efficient temporal scalability than does the temporal scalability defined within MPEG-2.
- this use of B frames as the mechanism for temporal scalability is fully compliant with MPEG-2.
- the two methods of identifying these B frames as an enhancement layer, via data partitioning or alternate PIDs for the B frames, are also fully compliant.
- 50/60 Hz Temporal Enhancement Layer.
- a 60 Hz temporal enhancement layer (which encodes a 24 Hz signal) can be added in similar fashion to the 36 Hz base layer.
- a 60 Hz temporal enhancement layer is particularly useful for encoding existing 60 Hz interlaced video material.
- FIG. 5 is a block diagram showing 60 Hz interlaced input from cameras 60 or other sources (such as non-film video tape) 62 to a converter 64 that includes a de-interlacer function and a frame rate conversion function that can output a 36 Hz signal (36 Hz base layer only) and a 72 Hz signal (36 Hz base layer plus 36 Hz from the temporal enhancement layer).
- most NTSC signals are viewed with substantial impairment on most home televisions. Further, viewers have come to accept the temporal impairments inherent in the use of 3-2 pulldown to present film on television. Nearly all prime-time television is made on film at 24 frames per second. Thus, only sports, news, and other video-original shows need be processed in this fashion. The artifacts and losses associated with converting these shows to a 36/72 Hz format are likely to be offset by the improvements associated with high-quality de-interlacing of the signal.
- the base layer plus enhancement layer should appear similar to 72 Hz origination in terms of motion blur. Accordingly, few viewers will notice the difference, except possibly as a slight improvement, when interlaced 60 Hz NTSC material is processed into a 36 Hz base layer, plus 24 Hz from the temporal enhancement layer, and displayed at 60 Hz. However, those who buy new 72 Hz digital non-interlaced televisions will notice a small improvement when viewing NTSC, and a major improvement when viewing new material captured or originated at 72 Hz. Even the decoded 36 Hz base layer presented on 72 Hz displays will look as good as high quality digital NTSC, replacing interlace artifacts with a slower frame rate.
- PAL video tapes are best slowed to 48 Hz prior to such conversion.
- Live PAL requires conversion using the relatively unrelated rates of 50, 36, and 72 Hz.
- Such converter units presently are only affordable at the source of broadcast signals, and are not presently practical at each receiving device in the home and office.
- the process of resolution enhancement can be achieved by generating a resolution enhancement layer as an independent MPEG-2 stream and applying MPEG-2 compression to the enhancement layer.
- this technique differs from the "spatial scalability" defined within MPEG-2, which has proven to be highly inefficient.
- MPEG-2 contains all of the tools needed to construct an effective layered resolution encoding to provide spatial scalability.
- the preferred layered resolution encoding process of the invention is shown in FIG. 8.
- the preferred decoding process of the invention is shown in FIG. 9.
- an original 2k x 1k image 80 is down-filtered, preferably using an optimized filter having negative lobes (see discussion of FIG. 12 below), to 1/2 resolution in each dimension to create a 1024x512 base layer 81.
- the base layer 81 is then compressed according to conventional MPEG-2 algorithms, generating an MPEG-2 base layer 82 suitable for transmission.
- full MPEG-2 motion compensation can be used during this compression step.
- That same signal is then decompressed using conventional MPEG-2 algorithms back to a 1024x512 image 83.
- the 1024x512 image 83 is expanded (for example, by pixel replication, or preferably by better up-filters such as spline interpolation or filters having negative lobes; see discussion of FIGS. 13A and 13B below) to a first 2k x 1k enlargement 84.
- the filtered 1024x512 base layer 81 is expanded to a second 2k x 1k enlargement 85.
- this second 2k x 1k enlargement 85 is subtracted from the original 2k x 1k image 80 to generate an image that represents the top octave of resolution between the original high resolution image 80 and the original base layer image 81.
- the resulting image is optionally multiplied by a sharpness factor or weight, and then added to the difference between the original 2k x 1k image 80 and the first 2k x 1k enlargement 84 to generate a center-weighted 2k x 1k enhancement layer source image 86.
- This enhancement layer source image 86 is then compressed according to conventional MPEG-2 algorithms, generating a separate MPEG-2 resolution enhancement layer 87 suitable for transmission.
- full MPEG-2 motion compensation can be used during this compression step.
- the base layer 82 is decompressed using conventional MPEG-2 algorithms back to a 1024x512 image 90.
- the 1024x512 image 90 is expanded to a first 2k x 1k image 91.
- the resolution enhancement layer 87 is decompressed using conventional MPEG-2 algorithms back to a second 2k x 1k image 92.
- the first 2k x 1k image 91 and the second 2k x 1k image 92 are then added to generate a high-resolution 2k x 1k image 93.
- Improvements Over MPEG-2.
- the enhancement layer is created by expanding the decoded base layer, taking the difference between the original image and the decoded base layer, and compressing that difference.
- a compressed resolution enhancement layer may be optionally added to the base layer after decoding to create a higher resolution image in the decoder.
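The encode/decode arithmetic of FIGS. 8 and 9 can be sketched on a 1-D "image". This is a minimal sketch under stated assumptions: `down2`/`up2` stand in for the optimized negative-lobe down/up filters, MPEG-2 compression is modeled as lossless (so reconstruction is exact), and all function names are hypothetical.

```python
def down2(img):
    # 2:1 averaging stands in for the preferred negative-lobe down-filter
    return [(img[i] + img[i + 1]) / 2 for i in range(0, len(img), 2)]

def up2(img):
    # pixel replication stands in for the preferred up-filter
    return [v for v in img for _ in range(2)]

def encode_layers(original, sharpness=0.0):
    base = down2(original)                        # base layer 81
    first_enl = up2(base)                         # 84: decoded base, expanded (lossless here)
    second_enl = up2(base)                        # 85: original base, expanded
    diff = [o - f for o, f in zip(original, first_enl)]
    top_octave = [o - s for o, s in zip(original, second_enl)]
    # enhancement layer source 86: difference plus weighted top octave
    enh = [d + sharpness * t for d, t in zip(diff, top_octave)]
    return base, enh

def decode_layers(base, enh):
    expanded = up2(base)                          # 91: expanded decoded base
    return [e + d for e, d in zip(expanded, enh)]  # 93: high-resolution result

orig = [10.0, 12.0, 30.0, 34.0, 90.0, 96.0, 40.0, 44.0]
base, enh = encode_layers(orig)
print(decode_layers(base, enh) == orig)  # True: base + enhancement restores the original
```

With real (lossy) compression the reconstruction is approximate rather than exact, and the sharpness weight trades detail restoration against amplified noise, as discussed below.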
- the inventive layered resolution encoding process differs from MPEG-2 spatial scalability in several ways:
- the enhancement layer difference picture is compressed as its own MPEG-2 data stream, with I, B, and P frames. This difference represents the major reason that resolution scalability, as proposed here, is effective, where MPEG-2 spatial scalability is ineffective.
- the spatial scalability defined within MPEG-2 allows an upper layer to be coded as the difference between the upper layer picture and the expanded base layer, or as a motion compensated MPEG-2 data stream of the actual picture, or a combination of both. However, neither of these encodings is efficient.
- the difference from the base layer could be considered as an I frame of the difference, which is inefficient compared to a motion-compensated difference picture, as in the invention.
- the upper-layer encoding defined within MPEG-2 is also inefficient, since it is identical to a complete encoding of the upper layer.
- the motion compensated encoding of the difference picture, as in the invention, is therefore substantially more efficient.
- the MPEG-2 systems transport layer (or another similar mechanism) must be used to multiplex the base layer and enhancement layer.
- the expansion and resolution reduction (down) filtering can be a gaussian or spline function, or a filter with negative lobes (see FIG. 12), any of which is more optimal than the bilinear interpolation specified in MPEG-2 spatial scalability.
- the image aspect ratio must match between the lower and higher layers in the preferred embodiment.
- within MPEG-2 spatial scalability, extensions to width and/or height are allowed; such extensions are not allowed in the preferred embodiment due to efficiency requirements.
- the entire area ofthe enhancement layer is not coded.
- the area excluded from enhancement will be the border area.
- the 2k x 1k enhancement layer source image 86 in the preferred embodiment is center-weighted.
- a fading function (such as linear weighting) is used to "feather" the enhancement layer toward the center of the image and away from the border edge to avoid abrupt transitions in the image.
- any manual or automatic method of determining regions having detail which the eye will follow can be utilized to select regions which need detail, and to exclude regions where extra detail is not required. All of the image has detail to the level of the base layer, so all of the image is present. Only the areas of special interest benefit from the enhancement layer.
- edges or borders of the frame can be excluded from enhancement, as in the center-weighted embodiment described above.
- the MPEG-2 "lower_layer_prediction_horizontal&vertical offset" parameters, used as signed negative integers, combined with the "horizontal&vertical_subsampling_factor_m&n" values, can be used to specify the enhancement layer rectangle's overall size and placement within the expanded base layer.
- a sharpness factor is added to the enhancement layer to offset the loss of sharpness which occurs during quantization. Care must be taken to utilize this parameter only to restore the clarity and sharpness of the original picture, and not to enhance the image.
- the sharpness factor is the "high octave" of resolution between the original high resolution image 80 and the original base layer image 81 (after expansion). This high octave image will be quite noisy, in addition to containing the sharpness and detail of the high octave of resolution.
- the amount that should be added depends upon the level of the noise in the original image. A typical weighting value is 0.25. For noisy images, no sharpness should be added, and it even may be advisable to suppress the noise in the original for the enhancement layer before compressing, using conventional noise suppression techniques which preserve detail.
- Temporal and resolution scalability are intermixed by utilizing B frames for temporal enhancement from 36 to 72 Hz in both the base and resolution enhancement layers. In this way, four possible levels of decoding performance are possible with two layers of resolution scalability, due to the options available with two levels of temporal scalability. These differences represent substantial improvements over MPEG-2 spatial and temporal scalability. However, these differences are still consistent with MPEG-2 decoder chips, although additional logic may be required in the decoder to perform the expansion and addition in the resolution enhancement decoding process shown in FIG. 9.
- Optional Non-MPEG-2 Coding of the Resolution Enhancement Layer. It is possible to utilize a different compression technique for the resolution enhancement layer than MPEG-2. Further, it is not necessary to utilize the same compression technology for the resolution enhancement layer as for the base layer. For example, motion-compensated block wavelets can be utilized to match and track details with great efficiency when the difference layer is coded. Even if the most efficient position for placement of wavelets jumps around on the screen due to changing amounts of differences, it would not be noticed in the low-amplitude enhancement layer. Further, it is not necessary to cover the entire image - it is only necessary to place the wavelets on details. The wavelets can have their placement guided by detail regions in the image. The placement can also be biased away from the edge.
- a 2k x 1k template can efficiently support the common widescreen aspect ratios of 1.85:1 and 2.35:1.
- a 2k x 1k template can also accommodate 1.33:1 and other aspect ratios.
- while integers, especially the factor of 2, and simple fractions (3/2 & 4/3) are the most efficient step sizes in resolution layering, it is also possible to use arbitrary ratios to achieve any required resolution layering.
- using a 2048x1024 template, or something near it, provides not only a high quality digital master format, but also can provide many other convenient resolutions from a factor of two base layer (1k x 512), including NTSC, the U.S. television standard.
- digital mastering formats should be created in the frame rate of the film if made from existing movies (i.e., at 24 frames per second).
- the common use of both 3-2 pulldown and interlace would be inappropriate for digital film masters.
- For new digital electronic material it is hoped that the use of 60 Hz interlace will cease in the near future, and be replaced by frame rates which are more compatible with computers, such as 72 Hz, as proposed here.
- the digital image masters should be made at whatever frame rate the images are captured, whether at 72 Hz, 60 Hz, 36 Hz, 37.5 Hz, 75 Hz, 50 Hz, or other rates.
- a mastering format as a single digital source picture format for all electronic release formats differs from existing practices, where PAL, NTSC, letterbox, pan-and-scan, HDTV, and other masters are all generally independently made from a film original.
- the use of a mastering format allows both film and digital/electronic shows to be mastered once, for release on a variety of resolutions and formats.
- Temporal enhancement is provided by decoding B frames.
- the resolution enhancement layer also has two temporal layers, and thus also contains B frames. For 24 fps film, the most efficient and lowest cost decoders might use only P frames.
- decoding movies at 24 fps and decoding advanced television at 36 fps could utilize a decoder without B frame capability.
- B frames can then be utilized between P frames to yield the higher temporal layer at 72 Hz, as shown in FIG. 3, which could be decoded by a second decoder.
- This second decoder could also be simplified, since it would only have to decode B frames.
- the resolution enhancement layer can add the full temporal rate of 72 Hz at high resolution by adding B frame decoding within the resolution enhancement layer.
- the combined resolution and temporal scalable options for a decoder are illustrated in FIG. 10. This example also shows an allocation of the proportions of an approximately 18 Mbits/second data stream to achieve the spatio-temporal layered Advanced Television of the invention.
- a base layer MPEG-2 1024x512 pixel data stream (comprising only P frames in the preferred embodiment) is applied to a base resolution decoder 100. Approximately 5 Mbits/second of bandwidth is required for the P frames.
- the base resolution decoder 100 can decode at 24 or 36 fps.
- the output of the base resolution decoder 100 comprises low resolution, low frame rate images (1024x512 pixels at 24 or 36 Hz).
- the B frames from the same data stream are parsed out and applied to a base resolution temporal enhancement layer decoder 102. Approximately 3 Mbits/second of bandwidth is required for such B frames.
- the output of the base resolution decoder 100 is also coupled to the temporal enhancement layer decoder 102.
- the temporal enhancement layer decoder 102 can decode at 36 fps.
- the combined output of the temporal enhancement layer decoder 102 comprises low resolution, high frame rate images (1024x512 pixels at 72 Hz). Also in FIG. 10, a resolution enhancement layer MPEG-2 2k x 1k pixel data stream is applied to a high resolution enhancement layer decoder 104; approximately 6 Mbits/second of bandwidth is required for its P frames.
- the output of the base resolution decoder 100 is also coupled to the high resolution enhancement layer decoder 104.
- the high resolution enhancement layer decoder 104 can decode at 24 or 36 fps.
- the output of the high resolution enhancement layer decoder 104 comprises high resolution, low frame rate images (2k x 1k pixels at 24 or 36 Hz).
- the B frames from the same data stream are parsed out and applied to a high resolution temporal enhancement layer decoder 106. Approximately 4 Mbits/second of bandwidth is required for such B frames.
- the output of the high resolution enhancement layer decoder 104 is coupled to the high resolution temporal enhancement layer decoder 106.
- the output of the temporal enhancement layer decoder 102 is also coupled to the high resolution temporal enhancement layer decoder 106.
- the high resolution temporal enhancement layer decoder 106 can decode at 36 fps.
- the combined output of the high resolution temporal enhancement layer decoder 106 comprises high resolution, high frame rate images (2k x 1k pixels at 72 Hz).
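The FIG. 10 allocation and the four decoding-performance levels it enables can be summarized as follows. This is a hedged sketch of the approximate figures given above (the dictionary keys are descriptive labels, not patent terminology).

```python
# Approximate ~18 Mbits/second allocation from FIG. 10
budget = {
    "base P frames (1024x512 @ 24/36 Hz)":       5,
    "base temporal B frames (-> 72 Hz)":         3,
    "enhancement P frames (2k x 1k @ 24/36 Hz)": 6,
    "enhancement temporal B frames (-> 72 Hz)":  4,
}

# The four decoder performance levels, as cumulative bandwidth (Mbits/s)
levels = {
    "low res, low rate (decoder 100)":        5,
    "low res, 72 Hz (100 + 102)":             5 + 3,
    "high res, low rate (100 + 104)":         5 + 6,
    "high res, 72 Hz (100 + 102 + 104 + 106)": 5 + 3 + 6 + 4,
}

print(sum(budget.values()))                            # 18
print(levels["high res, 72 Hz (100 + 102 + 104 + 106)"])  # 18
```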
- MPEG-2 encoding syntax also provides efficient motion representation through the use of motion-vectors in both the base and enhancement layers. Up to some threshold of high noise and rapid image change, MPEG-2 is also efficient at coding details instead of noise within an enhancement layer through motion compensation in conjunction with DCT quantization. Above this threshold, the data bandwidth is best allocated to the base layer.
- as noted above, a macroblock is a 16x16 pixel grouping of four 8x8 DCT blocks, together with one motion vector for P frames, and one or two motion vectors for B frames.
- the bits available per macroblock for each layer are shown in TABLE 6.
- High Temporal: 18 Mbits/second (5+3+6+4); 123 bits per base-layer macroblock; 37 bits per macroblock overall, 30 per enhancement-layer macroblock with a border around the hi-res center (for comparison).
- the available number of bits to code each macroblock is smaller in the enhancement layer than in the base layer. This is appropriate, since it is desirable for the base layer to have as much quality as possible.
- the motion vector requires 8 bits or so, leaving 10 to 25 bits for the macroblock type codes and for the DC and AC coefficients for all four 8x8 DCT blocks. This leaves room for only a few "strategic" AC coefficients. Thus, statistically, most of the information available for each macroblock must come from the previous frame of an enhancement layer.
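The macroblock bit-budget arithmetic above follows from layer data rate / frame rate / macroblocks per frame. The sketch below uses the 6 Mbits/second enhancement-layer P-frame figure from the allocation example; the helper name is hypothetical, and exact table values depend on further assumptions about frame types and rate allocation.

```python
def bits_per_macroblock(mbits_per_sec, width, height, fps):
    """Average bits available per 16x16 macroblock for a layer."""
    macroblocks = (width // 16) * (height // 16)
    return mbits_per_sec * 1e6 / fps / macroblocks

enh = bits_per_macroblock(6, 2048, 1024, 36)
print(round(enh, 1))  # 20.3 -- minus ~8 bits for a motion vector, only a
                      # handful of bits remain for the four DCT blocks
```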
- the system described here gains its efficiency by utilizing motion compensated prediction from the previous enhancement difference frame. This is demonstrably effective in providing excellent results in temporal and resolution (spatial) layered encoding.
- the temporal scaling and resolution scaling techniques described here work well for normal-running material at 72 frames per second using a 2k x 1k original source. These techniques also work well on film-based material which runs at 24 fps. At high frame rates, however, when a very noise-like image is coded, or when there are numerous shot cuts within an image stream, the enhancement layers may lose the coherence between frames which is necessary for effective coding. Such loss is easily detected, since the buffer-fullness/rate-control mechanism of a typical MPEG-2 encoder/decoder will attempt to set the quantizer to very coarse settings.
- all of the bits normally used to encode the resolution enhancement layers can be allocated to the base layer, since the base layer will need as many bits as possible in order to code the stressful material. For example, at between about 0.5 and 0.33 MPixels per frame for the base layer, at 72 frames per second, the resultant pixel rate will be 24 to 36 MPixels/second. Applying all of the available bits to the base layer provides about 0.5 to 0.67 million additional bits per frame at 18.5 Mbits/second, which should be sufficient to code very well, even on stressful material.
- if the base-layer quantizer is still operating at a coarse level under about 18.5 Mbits/second at 36 fps, then the base layer frame rate can be dynamically reduced to 24, 18, or even 12 frames per second (which would make available approximately 0.75, 1.0, or 1.5 Mbits for every frame, respectively), which should be able to handle even the most pathological moving image types. Methods for changing frame rate in such circumstances are known in the art.
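The frame-rate fallback arithmetic is simple: bits available per frame is the channel rate divided by the frame rate, assuming (as the passage does) that essentially the full ~18.5 Mbits/second is applied to the base layer. The helper name below is hypothetical.

```python
def mbits_per_frame(mbits_per_sec, fps):
    """Bits (in Mbits) available per frame at a given channel rate."""
    return mbits_per_sec / fps

for fps in (36, 24, 18, 12):
    print(fps, round(mbits_per_frame(18.5, fps), 2))
# 36 -> 0.51, 24 -> 0.77, 18 -> 1.03, 12 -> 1.54
```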
- the adaptive quantization level is controlled by the output buffer fullness. At the high compression ratios involved in the resolution enhancement layer of the invention, this mechanism may not function optimally.
- Various techniques can be used to optimize the allocation of data to the most appropriate image regions. The conceptually simplest technique is to perform a pre-pass of encoding over the resolution enhancement layer to gather statistics and to search out details which should be preserved. The results from the pre-pass can be used to set the adaptive quantization to optimize the preservation of detail in the resolution enhancement layer.
- the settings can also be artificially biased to be non-uniform over the image, such that image detail is biased to allocation in the main screen regions, and away from the macroblocks at the extreme edges of the frame. Except for leaving an enhancement-layer border at high frame rates, none of these adjustments are required, since existing decoders function well without such improvements. However, these further improvements are available with a small extra effort in the enhancement layer encoder.
- the invention described here achieves many highly desirable features. It has been claimed by some involved in the U.S. advanced television process that neither resolution nor temporal scalability can be achieved at high definition resolutions within the approximately 18.5 Mbits/second available in terrestrial broadcast. However, the invention achieves both temporal and spatial-resolution scalability within this available data rate. It has also been claimed that 2 MPixels at high frame rates cannot be achieved without the use of interlace within the available 18.5 Mbits/second data rate. However, the invention not only achieves resolution (spatial) and temporal scalability, it can provide 2 MPixels at 72 frames per second.
- the invention is also very robust, particularly compared to the current proposal for advanced television. This is made possible by the allocation of most or all of the bits to the base layer when very stressful image material is encountered. Such stressful material is by its nature both noise-like and very rapidly changing. In these circumstances, the eye cannot see detail associated with the enhancement layer of resolution. Since the bits are applied to the base layer, the reproduced frames are substantially more accurate than the currently proposed advanced television system, which uses a single constant higher resolution.
- this aspect of the inventive system optimizes both perceptual and coding efficiency, while providing maximum visual impact.
- this system provides a very clean image at a resolution and frame rate performance that had been considered by many to be impossible. It is believed that this aspect of the inventive system is likely to outperform the advanced television formats being proposed at this time. In addition to this anticipated superior performance, the invention also provides the highly valuable features of temporal and resolution layering.
- while the discussion above has used MPEG-2 in its examples, these and other aspects of the invention may be carried out using other compression systems.
- the invention will work with any comparable standard that provides I, B, and P frames or equivalents, such as MPEG-1, MPEG-2, MPEG-4, H.263, and other compression systems (including wavelets and other non-DCT systems).
- a number of enhancements to the embodiments described above may be made to handle a variety of video quality and compression problems.
- the following describes a number of such enhancements, most of which are preferably embodied as a set of tools which can be applied to the tasks of enhancing images and compressing such images.
- the tools can be combined by a content developer in various ways, as desired, to optimize the visual quality and compression efficiency of a compressed data stream, particularly a layered compressed data stream.
- one useful technique when a resolution enhancement layer is coded using MPEG-type (e.g., MPEG-2, MPEG-4, or comparable systems) compression is to bias the difference picture with a gray bias.
- the difference picture is found by subtracting an expanded and decompressed base layer from an original high resolution image. Sequences of these difference pictures are then encoded as an MPEG-type difference picture stream of frames, which operates as a normal MPEG-type picture stream. The gray bias value is removed when each difference picture is added to another image (for example, the expanded decoded base layer) to create an improved resolution result.
- motion vectors are found on this difference picture stream, which is often quite noisy.
- Each difference picture represents a delta adjustment which would be needed to make a perfect encoding of the high resolution original.
- due to the gray bias, the pixel difference delta values can only extend over half of the range, which is nearly always sufficient, since differences are usually quite small.
- black (typically 0) regions can be extended at most to half gray (at 127, typically), and white (typically 255) can be extended down at most to half gray (at 128, typically).
- the difference picture could be used to create an entire black-to-white range (of 0 to 255, typically).
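the gray-bias arithmetic described above can be sketched as follows (a minimal illustration, assuming 8-bit pixel values and a bias of 128; the function names are hypothetical and not part of this specification):

```python
import numpy as np

def encode_difference(original, expanded_base, bias=128):
    # Difference picture: delta plus gray bias, clamped to the 8-bit range.
    # Deltas span only about half the range (-128..+127), which is nearly
    # always sufficient since differences are usually small.
    delta = original.astype(np.int16) - expanded_base.astype(np.int16)
    return np.clip(delta + bias, 0, 255).astype(np.uint8)

def decode_difference(diff_picture, expanded_base, bias=128):
    # Remove the gray bias and add the delta back onto the expanded base.
    delta = diff_picture.astype(np.int16) - bias
    return np.clip(expanded_base.astype(np.int16) + delta, 0, 255).astype(np.uint8)

expanded_base = np.array([[100, 200], [50, 130]], dtype=np.uint8)
original = np.array([[110, 190], [50, 140]], dtype=np.uint8)
diff = encode_difference(original, expanded_base)
reconstructed = decode_difference(diff, expanded_base)
```

a pixel with no difference encodes to exactly the gray value (128), and the bias removal on decode recovers the original pixel values.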
- This simple relationship can be utilized to widen the aspect ratio of a final image in addition to enhancing the resolution of the base layer.
- the system can provide the gray value (128, typically) for use with a difference picture in order to add full picture detail outside of the extent of the expanded, decompressed base layer.
- FIG. 11 is a diagram of a base layer expanded by using a gray area and an enhancement layer to provide picture detail.
- a base layer 1100 having a narrower aspect ratio is upfiltered and then expanded in area as an expanded base region 1102.
- the expanded base region 1102 is then "padded" with a uniform mid-gray pixel value (e.g., 128) to widen its aspect ratio or otherwise increase its size (an "additional area region" 1104).
- An enhancement layer can then be added having a small range of possible pixel values (i.e., a difference picture) for the area that coincides with the expanded base region 1102, but a full range of possible pixel values (e.g., ±127 around the gray bias) over the additional area region 1104, thus providing additional actual picture information.
- the base layer can represent a narrower or shorter (or both) image extent than the enhanced image at higher resolution.
- the enhancement layer then contains both gray-biased image difference pictures, corresponding to the extent of the expanded decompressed base layer (i.e., the expanded base region 1102), as well as actual picture information in the border. Since the compressed enhancement layer is encoded as a standard MPEG-type picture stream, the fact that the edge region is actual picture, and the inner region is a difference picture, is not distinguished, and both are coded and carried along together in the same picture stream of frames.
- the edge region outside the expanded decompressed base layer extent, is a normal high resolution MPEG-type encoded stream. It has an efficiency which corresponds to normal MPEG-type encoding of a high resolution picture.
- motion vectors within the difference picture region should be constrained not to point into the border region (which contains actual picture information).
- motion vectors for the border actual-picture region should be constrained not to point into the inner difference picture region. In this way the border actual-picture region coding and difference picture region coding will be naturally separated.
- the quantizer and rate buffer control during encoding of these hybrid difference-plus-actual-picture image-expanding enhancement layer pictures may need special adjustment to differentiate the larger extent of signals in the border actual-picture region over the inner difference picture region.
- one issue with this technique concerns the extent of the border actual-picture region.
- when the border region is small, the number of bits in proportion to the overall stream is small, but the relative efficiency for the small area is reduced because of the number of motion vectors which cannot find matches, since such matches would be off the edge of the border region.
- Another way of looking at this is that the border region has a high proportion of edge to area, unlike a usual image rectangle, which has a much lower proportion of edge to area.
- the inner rectangular picture region typical of normal digital video, as usually coded with compression such as MPEG-2 or MPEG-4, has a high degree of matches when finding motion vectors, since most of the area within the frame, except at the very frame edges, is usually present in the previous frame.
- during a pan, the direction of picture coming on-screen will cause one edge to have to create picture from nothing, since the image is coming from off-screen for each frame.
- most of a normal picture rectangle is on-screen in the previous frame, allowing the motion vectors to most often find matches.
- the border area has a much higher percentage of off-screen mismatches in previous frames for motion compensation, since the screen outer edge, as well as the difference picture inner edge, are both "out-of-bounds" for motion vectors.
- some loss of efficiency is inherent in this approach when considered as bits per image area (or per pixel or per macroblock, which are equivalent bits-per-area measures).
- if the border is relatively small, this relative inefficiency is a sufficiently small proportion of the overall bit rate to be acceptable. If the border is relatively large, the coding efficiency itself becomes higher, and the proportion may again be acceptable. Moderately sized borders may suffer some inefficiency during pans, but this inefficiency may be acceptable.
- the lower resolution image may also be most naturally used on narrower screens, while the higher resolution image may be more naturally viewed on larger and wider, and/or taller screens.
- FIG. 12 is a diagram of the relative shape, amplitudes, and lobe polarity of a preferred downsizing filter.
- the down filter is essentially a center-weighted function which has been truncated to a center positive lobe 1200, a symmetric pair of adjacent (bracketing) small negative lobes 1202, and a symmetric pair of adjacent (bracketing) very small outer positive lobes 1204.
- the absolute amplitude of the lobes 1200, 1202, 1204 may be adjusted as desired, so long as the relative polarity and amplitude inequality relationships shown in FIG. 12 are maintained.
- When creating a base layer original (as input to the base layer compression) from a low-noise high resolution original input, the preferred downsizing filter has first negative lobes which are of a normal sinc function amplitude. For clean and high resolution input images, this normal truncated sinc function works well. For lower resolutions (e.g., 1280x720, 1024x768, or 1536x768), and for noisier input pictures, a reduced first negative lobe amplitude in the filters is more optimal. A suitable amplitude in such cases is about half the truncated sinc function negative lobe amplitude.
- the small first positive lobes outside of the first negative lobes are also reduced to lower amplitude, typically to 1/2 to 2/3 of the normal sinc function amplitude.
- the effect of reducing the first negative lobes is the main issue, since the small outside positive lobes do not contribute to picture noise. Samples further outside the first positive lobes preferably are truncated to minimize ringing and other potential artifacts.
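the truncated-sinc structure described above can be sketched as follows (illustrative only; the specification gives no exact coefficients, so the tap values here are simply derived from an ideal half-band sinc, and the scale factors for the lobes are the approximate reductions discussed above):

```python
import numpy as np

def make_downfilter(neg_scale=1.0, outer_pos_scale=1.0):
    # Illustrative 2:1 downsizing filter: a half-band sinc truncated to a
    # center positive lobe, one pair of small negative lobes, and one pair
    # of very small outer positive lobes.  neg_scale < 1 softens the first
    # negative lobes for noisier or lower-resolution sources (about 0.5);
    # outer_pos_scale reduces the outer positive lobes (about 0.5 to 0.67).
    x = np.arange(-5, 6)
    taps = np.sinc(x / 2.0)                  # truncated sinc samples
    taps[np.abs(x) == 3] *= neg_scale        # first negative lobes
    taps[np.abs(x) == 5] *= outer_pos_scale  # small outer positive lobes
    return taps / taps.sum()                 # normalize to unity gain

def downsize_by_two(line, taps):
    # Filter a scanline, then keep every other sample.
    padded = np.pad(line, len(taps) // 2, mode='edge')
    return np.convolve(padded, taps, mode='valid')[::2]

taps_clean = make_downfilter()           # clean, high-resolution input
taps_noisy = make_downfilter(0.5, 0.6)   # noisier or lower-resolution input
flat = np.full(32, 100.0)
out = downsize_by_two(flat, taps_clean)
```

the unity-gain normalization preserves flat fields exactly, and the reduced-lobe variant keeps the same lobe polarity pattern with smaller negative excursions.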
- FIGS. 13A and 13B are diagrams of the relative shape, amplitudes, and lobe polarity of a pair of preferred upsizing filters for upsizing by a factor of 2.
- a central positive lobe 1300, 1300' is bracketed by a pair of small negative lobes 1302, 1302'.
- An asymmetrically placed positive lobe 1304, 1304' is also required.
- These paired upfilters could also be considered to be truncated sinc filters centered on the newly created samples. For example, for a factor of two upfilter, two new samples will be created for each original sample.
- the small adjacent negative lobes 1302, 1302' have less negative amplitude than is used in the corresponding downsizing filter (FIG. 12), or than would be used in an optimal (sinc-based) upsizing filter for normal images. This is because the images being upsized are decompressed, and the compression process changes the spectral distribution. Thus, more modest negative lobes, and no additional positive lobes beyond the middle ones 1300, 1300', work better for upsizing a decompressed base layer.
- this upsizing filter preferably is used for the base layer in both the encoder and the decoder.
- the signal path which expands the original uncompressed base layer input image uses a gaussian upfilter rather than the upfilter described above.
- a gaussian upfilter is used for the "high octave" of picture detail, which is determined by subtracting the expanded original base-resolution input image (without using compression) from the original picture.
- no negative lobes are used for this particular upfiltered expansion.
- this high octave difference signal path is typically weighted with 0.25 (or 25%) and added to the expanded decompressed base layer (using the other upfilter described above) as input to the enhancement layer compression process.
- weights 10%, 15%, 20%, 30%, and 35% are useful for particular images when using MPEG-2.
- Other weights may also prove useful.
- lower filter weights of 4-8% may be optimal when used in conjunction with other improvements described below. Accordingly, this weighting should be regarded as an adjustable parameter, depending upon the encoding system, the scenes being encoded/compressed, the particular camera (or film) being used, and the image resolution.
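the enhancement-layer input signal described above can be sketched as the following combination (a simplified illustration; the function and variable names are ours, and the arrays stand in for full images after the upfiltering steps already described):

```python
import numpy as np

def enhancement_layer_input(original, expanded_decoded_base,
                            expanded_original_base, high_octave_weight=0.25):
    # Base-layer coding error: what the enhancement layer must correct.
    coding_error = original - expanded_decoded_base
    # "High octave" of detail lost by downsizing, from the uncompressed
    # path (gaussian-upfiltered original base-resolution image).
    high_octave = original - expanded_original_base
    # Weighted sum; ~0.25 is typical for MPEG-2, but 0.10-0.35 (or 0.04-0.08
    # with other improvements) may be optimal depending on source and codec.
    return coding_error + high_octave_weight * high_octave

original = np.array([[10.0, 20.0]])
expanded_decoded_base = np.array([[8.0, 21.0]])
expanded_original_base = np.array([[9.0, 20.0]])
enh_in = enhancement_layer_input(original, expanded_decoded_base,
                                 expanded_original_base)
```

the weight is deliberately exposed as a parameter, matching the adjustable nature of this value noted above.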
- the de-interlacing techniques described below are useful as input to single-layer noninterlaced MPEG-like, as well as to the layered MPEG-like compression described above.
- noise reduction must similarly match the needs of being an input to compression algorithms, rather than just reducing noise appearance.
- the goal is generally to reproduce, upon decompression, no more noise than the original camera or film-grain noise. Equal noise is generally considered acceptable, after compression/decompression. Reduced noise, with sharpness and clarity equivalent to the original, is a bonus.
- the noise reduction described below achieves these goals.
- noise reduction can be the difference between a good looking compressed/decompressed image vs. one which is unwatchably noisy.
- the compression process greatly amplifies noise which is above some threshold of acceptability to the compressor.
- the use of noise-reduction pre-processing to keep noise below this threshold may be required for acceptably good quality results.
- de-graining and/or noise-reducing filtering before layered or non-layered encoding improves the ability of the compression system to perform. While de-graining or noise-reduction is most effective on grainy or noisy images prior to compression, either process may be helpful when used in moderation even on relatively low noise or low grain pictures. Any of several known de-graining or noise-reduction algorithms may be applied. Examples are "coring", simple neighbor median filters, and softening filters.
- the amount of noise reduction needed is determined by how noisy the original images are.
- the interlace itself is a form of noise, which usually will require additional noise reduction filtering, in addition to the complex de-interlacing process described below.
- noise processing is useful in layered and non-layered compression when noise is present above a certain level.
- video transfers from film include film grain noise.
- Film grain noise is caused by silver grains which couple to yellow, cyan, and magenta film dyes. Yellow affects both red and green, cyan affects both blue and green, and magenta affects both red and blue. Red is formed where yellow and magenta dye crystals overlap.
- the red, green, and blue noise is uncorrelated. In this case, it is best to process the red, green, and blue records independently. Thus, red noise is reduced with self-red processing independently of green noise and blue noise; the same approach applies to green and blue noise. Thus, noise processing is best matched to the characteristics of the noise source itself. In the case of a composite image (from multiple sources), the noise may differ in characteristics over different portions of the image. In this situation, generic noise processing may be the only option, if noise processing is needed.
- Re-graining and/or re-noising are relatively easy effects to add in the decoder using any of several known algorithms. For example, this can be accomplished by the addition of low pass filtered random noise of suitable amplitude.
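a minimal sketch of such re-graining follows (the amplitude and the [0.25, 0.5, 0.25] smoothing kernel are illustrative choices of ours, not values from this specification):

```python
import numpy as np

def regrain(image, amplitude=2.0, seed=0):
    # Add low-pass filtered random noise of suitable amplitude to
    # simulate film grain at the decoder.
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, amplitude, size=image.shape)
    k = np.array([0.25, 0.5, 0.25])
    # Separable low-pass filtering of the noise field (rows, then columns).
    noise = np.apply_along_axis(lambda r: np.convolve(r, k, mode='same'), 1, noise)
    noise = np.apply_along_axis(lambda c: np.convolve(c, k, mode='same'), 0, noise)
    return np.clip(image + noise, 0, 255)

frame = np.full((16, 16), 128.0)
grained = regrain(frame)
```

low-pass filtering the noise before adding it keeps the synthetic grain soft rather than harshly pixel-sized.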
- the preferred compression method for interlaced source which is ultimately intended for non-interlaced display includes a step to de-interlace the interlaced source before the compression steps.
- De-interlacing a signal after decoding in the receiver, where the signal has been compressed in the interlaced mode, is both more costly and less efficient than de-interlacing prior to compression and then sending a non-interlaced compressed signal.
- the non-interlaced compressed signal can be either layered or non-layered (i.e., a conventional single layer compression). Experimentation has shown that filtering a single field of an interlaced source, and using that field as if it were a non-interlaced full frame, gives poor and noisy compression results.
- a field-de-interlacer is used as the first step in the overall process to create field-frames.
- each field is de-interlaced, creating a synthesized frame whose full complement of lines is derived from the half number of lines in a single field.
- an interlaced 1080 line image will have 540 lines per even and odd field, each field representing 1/60th of a second.
- the even and odd fields of 540 lines will be interlaced to create 1080 lines for each frame, which represents 1/30th of a second.
- the de-interlacer copies each scanline without modification from a specified field (e.g., the odd fields) to a buffer that will hold some of the de-interlaced result.
- the remaining intermediate scanlines (in this example, the even scanlines) for the frame are synthesized by adding half of the field line above and half of the field line below each newly stored line.
- the pixel values of line 2 for a frame would each comprise 1/2 of the summed corresponding pixel values from each of line 1 and line 3.
- the generation of intermediate synthesized scanlines may be done on the fly, or may be computed after all of the scanlines from a field are stored in a buffer. The same process is repeated for the next field, although the field types (i.e., even, odd) will be reversed.
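the field-to-field-frame synthesis above can be sketched as follows (a simplified illustration for one field parity; the opposite parity merely changes which rows receive the copied field lines):

```python
import numpy as np

def deinterlace_field(field):
    # Synthesize a full field-frame from a single field: copy the field's
    # scanlines into alternating rows, and fill each missing row with half
    # the field line above plus half the field line below.  The last
    # missing row (no line below) simply repeats its neighbor.
    n = field.shape[0]
    frame = np.empty((2 * n,) + field.shape[1:], dtype=np.float64)
    frame[0::2] = field                               # copy actual field lines
    frame[1:-1:2] = 0.5 * (field[:-1] + field[1:])    # average neighbors
    frame[-1] = field[-1]                             # replicate bottom line
    return frame

field = np.array([[10.0, 40.0],
                  [30.0, 20.0]])
field_frame = deinterlace_field(field)
```

a 540-line field processed this way yields a 1080-line field-frame, matching the example above.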
- FIG. 14A is a block diagram of an odd-field de-interlacer, showing that the odd lines from an odd field 1400 are simply copied to a de-interlaced odd field 1402, while the even lines are created by averaging adjacent odd lines from the original odd field together to form the even lines of the de-interlaced odd field 1402.
- FIG. 14B is a block diagram of an even-field de-interlacer, showing that the even lines from an even field 1404 are simply copied to a de-interlaced even field 1406, while the odd lines are created by averaging adjacent even lines from the original even field together to form the odd lines of the de-interlaced even field 1406. Note that this case corresponds to "top field first"; "bottom field first" could also be considered the "even" field.
- FIG. 15 is a block diagram showing how the pixels of each output frame are composed of 25% of the corresponding pixels from a previous de-interlaced field (field-frame) 1502, 50% of the corresponding pixels from a current field-frame 1504, and 25% of the corresponding pixels from the next field-frame 1506.
- the new de-interlaced frame then contains far fewer interlace difference artifacts between frames than do the three field-frames of which it is composed.
- there is some temporal smearing from adding the previous field-frame and next field-frame into the current field-frame. This temporal smearing is usually not objectionable, especially in light of the de-interlacing improvements which result.
- This de-interlacing process is very beneficial as input to compression, either single layer (unlayered) or layered. It is also beneficial just as a treatment for interlaced video for presentation, viewing, or making still frames, independent of use with compression.
- the picture from the de-interlacing process appears "clearer" than the presentation of the interlace directly, or of the de-interlaced fields.
- a threshold test may be applied which compares the result of the [0.25, 0.5, 0.25] temporal filter against the corresponding pixel values of only the middle field-frame. If a middle field-frame pixel value differs by more than a specified threshold amount from the value of the corresponding pixel from the three-field-frame temporal filter, then only the middle field-frame pixel value is used.
- in other words, a pixel from the three-field-frame temporal filter is selected where it differs by less than the threshold amount from the corresponding pixel of the single de-interlaced middle field-frame, while the middle field-frame pixel value is used when the difference exceeds the threshold.
- This allows fast motion to be tracked at the field rate, and smoother parts of the image to be filtered and smoothed by the three-field-frame temporal filter.
- This combination has proven an effective, if not optimal, input to compression. It is also very effective for processing for direct viewing to de-interlace image material (also called line doubling in conjunction with display).
- Rdiff = R_single_field_deinterlaced - R_three_field_deinterlaced
- Gdiff = G_single_field_deinterlaced - G_three_field_deinterlaced
- Bdiff = B_single_field_deinterlaced - B_three_field_deinterlaced
- ThresholdingValue = abs(Rdiff + Gdiff + Bdiff) + abs(Rdiff) + abs(Gdiff) + abs(Bdiff)
- ThresholdingValue is then compared to a threshold setting.
- Typical threshold settings are in the range of 0.1 to 0.3, with 0.2 being most common.
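the three-field-frame filter and threshold switch can be sketched together as follows (a simplified illustration assuming RGB field-frames with values normalized to 0..1; the function name is ours):

```python
import numpy as np

def threshold_deinterlace(prev_ff, cur_ff, next_ff, threshold=0.2):
    # Three-field-frame [0.25, 0.5, 0.25] temporal filter with per-pixel
    # threshold switching.  Inputs are de-interlaced field-frames as
    # (H, W, 3) RGB arrays.
    three_ff = 0.25 * prev_ff + 0.5 * cur_ff + 0.25 * next_ff
    diff = cur_ff - three_ff
    # ThresholdingValue = abs(R+G+B diffs) + abs(Rdiff) + abs(Gdiff) + abs(Bdiff)
    metric = np.abs(diff.sum(axis=-1)) + np.abs(diff).sum(axis=-1)
    # Fast motion (metric above threshold): keep the single field-frame
    # pixel; otherwise use the smoother three-field-frame result.
    return np.where((metric > threshold)[..., None], cur_ff, three_ff)

prev_ff = np.array([[[0.5, 0.5, 0.5], [0.1, 0.1, 0.1]]])
cur_ff  = np.array([[[0.52, 0.5, 0.5], [0.9, 0.9, 0.9]]])
next_ff = np.array([[[0.5, 0.5, 0.5], [0.1, 0.1, 0.1]]])
out = threshold_deinterlace(prev_ff, cur_ff, next_ff)
```

the first pixel (a small flicker) is smoothed toward the temporal filter, while the second (a large change suggesting motion) stays with the current field-frame.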
- smooth-filtering the three-field-frame and single-field-frame de-interlaced pictures can be used before comparing and thresholding them.
- This smooth filtering can be accomplished simply by down filtering (e.g., down filtering by two using the preferred down filter described above), and then up filtering (e.g., using a gaussian up-filter by two).
- This "down-up" smoothed filter can be applied to both the single-field-frame de-interlaced picture and the three-field-frame de-interlaced picture.
- the smoothed single-field-frame and three-field-frame pictures can then be compared to compute a ThresholdingValue, which is then thresholded to determine which picture will source each final output pixel.
- the threshold test is used as a switch to select between the single-field-frame de-interlaced picture and the three-field-frame temporal filter combination of single-field-frame de-interlaced pictures.
- This selection results in an image where the pixels are from the three-field-frame de-interlacer in those areas where that image differs only by small amounts (i.e., below the threshold) from the single field-frame image, and from the single field-frame image in those areas where the three-field-frame image differed by more than the threshold amount from the single-field-frame de-interlaced pixels (after smoothing).
- This technique has proven effective in preserving single-field fast motion details (by switching to the single-field-frame de-interlaced pixels), while smoothing large portions of the image (by switching to the three-field-frame de-interlaced temporal filter combination).
- a typical blending is to create a new frame by adding 33.33% (1/3) of the single middle field-frame to 66.67% (2/3) of the corresponding three-field-frame smoothed image. This can be done before or after threshold switching, since the result is the same either way, only affecting the smoothed three-field-frame picture.
- variations adjust the threshold (0.018051) slightly, the factor (4.5) slightly (e.g., 4.0), and the exponent (0.45) slightly (e.g., 0.4).
- the fundamental formula remains the same.
- a matrix operation, such as an RGB to/from YUV conversion, implies linear values.
- the various types of MPEG encoding are neutral to the non-linear aspects of the signal, although their efficiency is affected due to the use of the matrix conversion RGB to/from YUV.
- the brightness variation will be represented completely in the Luminance parameter, where full detail is provided.
- the linear vs. logarithmic vs. video representation issue also impacts filtering.
- for small signal excursions (e.g., 10% or less), filtering in the non-linear video or logarithmic representation is a reasonable approximation of linear filtering.
- for large excursions, however, a linear filter is much more effective, and produces much better image quality. Accordingly, if large excursions are to be optimally coded, transformed, or otherwise processed, it is desirable to first convert the non-linear signal to a linear one in order to be able to apply a linear filter.
- De-interlacing is therefore much better when each filter and summation step utilizes conversions to linear values prior to filtering or summing. This is due to the large signal excursions inherent in interlaced signals at small details of the image.
- the image signals are converted back to the non-linear video digital representation.
- the three-field-frame weighting (e.g., [0.25, 0.5, 0.25] or [0.1667, 0.6666, 0.1667]) is one such computation that benefits from being performed on linearized values.
- Other filtering and weighted sums of partial terms in noise and de-interlace filtering should also be converted to linear form for computation. Which operations warrant linear processing is determined by signal excursion, and the type of filtering.
- Image sharpening can be appropriately computed in video or logarithmic non-linear representations, since it is self-proportional.
- matrix processing, spatial filtering, weighted sums, and de-interlace processing should be computed using linearized digital values.
- the single field-frame de-interlacer described above computes missing alternate lines by averaging the line above and below each actual line. This average is much more correct numerically and visually if it is done linearly.
- the digital values are linearized first, then averaged, and then reconverted back into the non-linear video representation.
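the linearize-average-reconvert sequence can be sketched as follows (illustrative only: a simple gamma-2.2 power curve stands in for the actual video transfer function, which uses the threshold, factor, and exponent constants discussed above):

```python
import numpy as np

GAMMA = 2.2  # stand-in; real video transfer functions are more complex

def to_linear(video):
    # Convert normalized (0..1) video-code values to linear light.
    return np.power(video, GAMMA)

def from_linear(linear):
    # Convert linear light back to the non-linear video representation.
    return np.power(linear, 1.0 / GAMMA)

def average_scanlines_linear(line_above, line_below):
    # Linearize, average, then reconvert -- the recommended order.
    return from_linear(0.5 * (to_linear(line_above) + to_linear(line_below)))

above = np.array([0.2, 0.9])
below = np.array([0.8, 0.9])
linear_avg = average_scanlines_linear(above, below)
naive_avg = 0.5 * (above + below)  # averaging the non-linear codes directly
```

for large excursions (the 0.2/0.8 pair) the two results differ noticeably, which is exactly the case where linear processing matters; for equal inputs they agree.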
- a 1280x720 enhancement layer can utilize an 864x480 base layer (i.e., a 2/3 relationship between the enhancement and base layer).
- FIG. 16 is a block diagram of such a mode.
- An original image 1600 at 1280x720 is padded to 1296x720 (to be a multiple of 16) and then downsized by 2/3 to an 864x480 image 1602 (also a multiple of 16).
- the downsizing preferably uses a normal filter or a filter having mild negative lobes.
- this downsized image 1602 may be input to a first encoder 1604 (e.g., an MPEG-2 or MPEG-4 encoder) for direct encoding as a base layer.
- the base layer is decompressed and upsized (expanded and up-filtered) by 3/2 to a 1296x720 intermediate frame 1606.
- the upfilter preferably has mild negative lobes.
- This intermediate frame 1606 is subtracted from the original image 1600.
- the 864x480 image 1602 is up-filtered by 3/2 (preferably using a gaussian filter) to 1296x720 and subtracted from the original image 1600.
- the result is weighted (e.g., typically by 25% for MPEG-2) and added to the result of the subtraction of the intermediate frame 1606 from the original image 1600.
- This resulting sum is cropped to a reduced size (e.g., 1152x688) and the edges feathered, resulting in a pre-compression enhancement layer frame 1608.
- This pre-compression enhancement layer frame 1608 is applied as an input to a second encoder 1610 (e.g., an MPEG-2 or MPEG-4 encoder) for encoding as an enhancement layer.
- the efficiency and quality at 18.5 mbits/sec is approximately equivalent between "single" layered (i.e., non-layered) and a layered system using this configuration.
- the efficiency of a 2/3 relationship between the enhancement and base layer is not as good as when using a factor of two, since the DCT coefficients are less orthogonal between the base and enhancement layers.
- this construction is workable, and has the advantage of providing a high quality base layer (which is cheaper to decode). This is an improvement over the single layered configuration where the entire high resolution picture must be decoded (at a higher cost), when lower resolution is all that can be provided by a particular display.
- the layered configuration also has the advantage that the enhancement sub-region is adjustable.
- efficiency can be controlled by adjusting the size of the enhancement layer and the proportion of the total bit rate that is allocated to the base layer vs. the enhancement layer.
- Adjustment of the enhancement layer size and bit proportion can be used to optimize compression performance, especially under high stress (rapid motion or many scene changes). For example, as noted above, all of the bits may be allocated to the base layer under extreme stress.
- a source picture of 2048x1024 could have a base layer of 1536x512, which has a horizontal relationship of 3/4 and a vertical relationship of 1/2 with respect to the source image.
- while this is not optimal (a factor of two both horizontally and vertically is optimal), it is illustrative of the principle.
- the use of 2/3 both horizontally and vertically might be improved upon for some resolutions by using a factor of 2 vertically and a factor of 2/3 horizontally.
- for noise processing, the most useful filter is the median filter.
- a three element median filter just ranks the three entries, via a simple sort, and picks the middle one. For example, an X (horizontal) median filter looks at the red value (or green or blue) of three adjacent horizontal pixels, and picks the one with the middle-most value. If two are the same, that value is selected. Similarly, a Y filter looks in the scanlines above and below the current pixel, and again picks the middle value.
- each new pixel is the 50% equal average of the X and Y medians for the corresponding pixel from a source image.
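the X-Y median combination above can be sketched as follows (a minimal illustration for a single color record; edge pixels are handled here by replication, which is one reasonable choice the specification does not prescribe):

```python
import numpy as np

def xy_median(img):
    # For each pixel, take the median of the three horizontally adjacent
    # values (X median) and of the three vertically adjacent values
    # (Y median), then average the two medians equally.
    h, w = img.shape
    p = np.pad(img, 1, mode='edge')
    center = p[1:1 + h, 1:1 + w]
    x_med = np.median(np.stack([p[1:1 + h, 0:w], center, p[1:1 + h, 2:2 + w]]),
                      axis=0)
    y_med = np.median(np.stack([p[0:h, 1:1 + w], center, p[2:2 + h, 1:1 + w]]),
                      axis=0)
    return 0.5 * (x_med + y_med)

noisy = np.zeros((3, 3))
noisy[1, 1] = 100.0       # a single grain-like spike
filtered = xy_median(noisy)
```

a single-pixel spike is fully rejected by both medians, while flat regions pass through unchanged, which is the behavior that makes medians attractive for grain noise.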
- Another beneficial source of noise reduction is information from the previous and subsequent frames (i.e., a temporal median).
- motion analysis provides the best match for moving regions.
- however, motion analysis is compute intensive. If a region of the image is not moving, or is moving slowly, the red values (and green and blue) from a current pixel can be median filtered with the red value at that same pixel location in the previous and subsequent frames.
- odd artifacts may occur if significant motion is present and such a temporal filter is used.
- it is preferred that a threshold test be applied first, to determine whether such a median would differ by more than a selected amount from the value of a current pixel.
- the threshold can be computed essentially the same as for the de-interlacing threshold above:
- Rdiff = R_current_pixel - R_temporal_median
- Gdiff = G_current_pixel - G_temporal_median
- Bdiff = B_current_pixel - B_temporal_median
- ThresholdingValue = abs(Rdiff + Gdiff + Bdiff) + abs(Rdiff) + abs(Gdiff) + abs(Bdiff)
- the ThresholdingValue is then compared to a threshold setting. Typical threshold settings are in the range 0.1 to 0.3, with 0.2 being typical. Above the threshold, the current value is kept. Below the threshold, the temporal median is used.
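the thresholded temporal median can be sketched as follows (a simplified illustration assuming RGB frames with values normalized to 0..1; the function name is ours):

```python
import numpy as np

def thresholded_temporal_median(prev_f, cur_f, next_f, threshold=0.2):
    # Per-pixel temporal median across three RGB frames, used only where
    # it stays within the threshold metric of the current pixel; above
    # the threshold the current value is kept (likely motion).
    median = np.median(np.stack([prev_f, cur_f, next_f]), axis=0)
    diff = cur_f - median
    metric = np.abs(diff.sum(axis=-1)) + np.abs(diff).sum(axis=-1)
    return np.where((metric < threshold)[..., None], median, cur_f)

prev_f = np.array([[[0.5, 0.5, 0.5], [0.1, 0.1, 0.1]]])
cur_f  = np.array([[[0.54, 0.5, 0.5], [0.9, 0.9, 0.9]]])  # noise; motion
next_f = np.array([[[0.5, 0.5, 0.5], [0.1, 0.1, 0.1]]])
result = thresholded_temporal_median(prev_f, cur_f, next_f)
```

the first pixel (small temporal flicker) is replaced by the median; the second pixel (large change, suggesting real motion) keeps its current value, avoiding the motion artifacts noted above.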
- An additional median type is a median taken between the X, Y, and temporal medians.
- Another median type can take the temporal median, and then take the equal average of the X and Y medians from it.
- a preferred combination of medians is a linear weighted sum (see the discussion above on linear video processing) of five terms to determine the value for each pixel of a current image: 50% of the original image (thus, the most noise reduction is 3db, or half);
- de-interlacing and noise reduction can also be improved by use of motion analysis. Adding the pixels at the same location in three fields or three frames is valid for stationary objects. However, for moving objects, if temporal averaging/smoothing is desired, it is often more optimal to attempt to analyze prevailing motion over a small group of pixels. For example, an nxn block of pixels (e.g., 2x2, 3x3, 4x4, 6x6, or 8x8) can be used to search in previous and subsequent fields or frames to attempt to find a match (in the same way MPEG-2 motion vectors are found by matching 16x16 macroblocks).
- a "trajectory" and "moving mini-picture” can be determined.
- the motion analysis preferably is performed by comparison of an nxn block in the current thresholded de-interlaced image with all nearby blocks in the previous and subsequent one or more frames.
- the comparison may be the absolute value of differences in luminance or RGB over the nxn block.
- One frame is sufficient forward and backward if the motion vectors are nearly equal and opposite. However, if the motion vectors are not nearly equal and opposite, then an additional one or two frames forward and backward can help determine the actual trajectory.
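the block-matching search described above (matching an nxn block against nearby positions in an adjacent frame, in the same way MPEG-2 macroblock motion vectors are found) can be sketched as follows; the exhaustive-search strategy and function name are illustrative choices:

```python
import numpy as np

def find_motion_vector(cur, ref, top, left, n=4, search=4):
    # Compare the n x n block at (top, left) in the current frame against
    # all blocks within +/- search pixels in the reference frame, and
    # minimize the sum of absolute differences (SAD).  Returns the (dy, dx)
    # offset and the SAD as a match-quality measure.
    block = cur[top:top + n, left:left + n]
    best, best_sad = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + n > ref.shape[0] or x + n > ref.shape[1]:
                continue  # candidate block would fall off the frame
            sad = np.abs(block - ref[y:y + n, x:x + n]).sum()
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best, best_sad

pattern = np.arange(1, 17, dtype=np.float64).reshape(4, 4)
ref = np.zeros((16, 16)); ref[4:8, 4:8] = pattern    # object in prior frame
cur = np.zeros((16, 16)); cur[4:8, 6:10] = pattern   # moved 2 pixels right
mv, sad = find_motion_vector(cur, ref, top=4, left=6)
```

the returned SAD serves as the "absolute difference measure of accuracy" used later when deciding how much to trust a trajectory.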
- different de-interlacing treatments may be useful in helping determine the "best guess" motion vectors going forward and back.
- One de-interlacing treatment can be to use only individual de-interlaced fields, although this is heavily prone to aliasing and artifacts on small moving details.
- Another de-interlacing technique is to use only the three-field-frame smooth de-interlacing, without thresholding, having weightings [0.25, 0.5, 0.25], as described above. Although details are smoothed and sometimes lost, the trajectory may often be more correct.
- a "smoothed nxn block" can be created by temporally filtering using the motion-vector-offset pixels from the one (or more) previous and subsequent frames.
- a typical filter might again be [0.25, 0.5, 0.25] or [0.1667, 0.6666, 0.1667] for three frames, and possibly [0.1, 0.2, 0.4, 0.2, 0.1] for two frames back and forward.
- filter weights can be applied to: individual de-interlaced motion-compensated field-frames; thresholded three-field-frame de-interlaced pictures, described above; and non-thresholded three-field-frame de-interlaced images, with a [0.25, 0.5, 0.25] weighting, also as described above.
- the best filter weights usually come from applying the motion-compensated block linear filtering to the thresholded three-field-frame result described above. This is because the thresholded three-field-frame image is both the smoothest (in terms of removing aliasing in smooth areas), as well as the most motion-responsive (in terms of defaulting to a single de-interlaced field-frame above the threshold).
- the motion vectors from motion analysis can be used as the inputs to multi-frame or multi-de-interlaced-field-frame or single-de-interlaced-field-frame filters, or combinations thereof.
- the thresholded multi-field-frame de-interlaced images form the best filter input in most cases.
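The motion-compensated temporal filtering described above can be sketched as follows (a minimal illustration, assuming NumPy arrays for the motion-vector-offset blocks; the function name `temporal_filter_block` is hypothetical, not from the source):

```python
import numpy as np

def temporal_filter_block(prev_block, curr_block, next_block,
                          weights=(0.25, 0.5, 0.25)):
    """Blend the current n x n block with its motion-vector-offset
    counterparts from the previous and subsequent frames, using the
    temporal weights described in the text."""
    w_prev, w_curr, w_next = weights
    return w_prev * prev_block + w_curr * curr_block + w_next * next_block

# A flat block whose value jitters frame to frame is smoothed:
prev_b = np.full((4, 4), 98.0)
curr_b = np.full((4, 4), 102.0)
next_b = np.full((4, 4), 100.0)
smoothed = temporal_filter_block(prev_b, curr_b, next_b)
# 0.25*98 + 0.5*102 + 0.25*100 = 100.5 at every pixel
```

The same weighting function accepts the five-tap [0.1, 0.2, 0.4, 0.2, 0.1] case by passing five blocks and weights in an analogous routine.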
- the use of motion analysis is computationally expensive for a large search region, when fast motion might be found (such as ±32 pixels). Accordingly, it may be best to augment the speed by using special-purpose hardware or a digital-signal-processor-assisted computer.
- Once motion vectors are found, together with their absolute difference measure of accuracy, they can be utilized for the complex process of attempting frame rate conversion.
- occlusion issues arise when objects obscure or reveal others.
- Occlusion can also involve temporal aliasing, as can normal image temporal undersampling and its beat with natural image frequencies (such as the "backward wagon wheel" effect in movies).
- De-interlacing is a simple form of the same problem. Just as with frame-rate conversion, the task of de-interlacing is theoretically impossible to perform perfectly. This is especially due to the temporal undersampling (closed shutter), and an inappropriate temporal sample filter (i.e., a box filter). However, even with correct samples, issues such as occlusion and interlace aliasing further ensure the theoretical impossibility of correct results. The cases where this is visible are mitigated by the depth of the tools, as described here, which are applied to the problem. Pathological cases will always exist in real image sequences. The goal can only be to reduce the frequency and level of impairment when these sequences are encountered. However, in many cases, the de-interlacing process can be acceptably fully automated, and can run unassisted in real-time. Even so, there are many parameters which can often benefit from manual adjustment.
- the filter parameters for the median filtering described above for an original image should be matched to the noise characteristics of the film grain or image sensor that captured the image. After this median-filtered image is down-filtered to generate an input to the base layer compression process, it still contains a small amount of noise. This noise may be further reduced by a combination of additional X-Y median filters (equally averaging the X and Y medians), plus a very small amount of the high-frequency smoothing filter.
- a preferred filter weighting of these three terms, applied to each pixel of the base layer, is: 70% of the original base layer (down-filtered from the median-filtered original above);
- This small amount of additional filtering in the base layer provides a small additional amount of noise reduction and improved stability, resulting in better MPEG encoding and limiting the amount of noise added by such encoding.
- MPEG-4 reference filters have been implemented for shifting macroblocks when finding the best motion vector match, and then using the matched region for motion compensation.
- MPEG-4 video coding, like MPEG-2, supports 1/2 pixel resolution of motion vectors for macroblocks. Unlike MPEG-2, MPEG-4 also supports 1/4 pixel accuracy.
- the filters used are sub-optimal. In MPEG-2, the half-way point between pixels is just the average of the two neighbors, which is a sub-optimal box filter. In MPEG-4, this filter is used for 1/2 pixel resolution.
- effects of filtering are significantly improved by using a negative-lobe truncated sinc function for filtering the 1/8-pixel points for U and V chrominance when using 1/4 pixel luminance resolution, and by using 1/4 pixel resolution filters with similar negative lobes when using 1/2 pixel luminance resolution.
- These filters may be applied to video images under MPEG-1, MPEG-2, MPEG-4 or any other appropriate motion-compensated block-based image coding system.
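As an illustration of such a negative-lobe displacement filter, the sketch below derives truncated-sinc taps for a sub-pixel offset (tap count and windowing are illustrative assumptions; real codec filter designs differ):

```python
import numpy as np

def truncated_sinc_taps(frac_offset, num_taps=6):
    """Sample a truncated sinc at integer tap positions shifted by the
    sub-pixel offset, then normalize to unity DC gain. Unlike the box
    (two-neighbor average) filter, some taps come out negative."""
    half = num_taps // 2
    positions = np.arange(-half + 1, half + 1)   # [-2, -1, 0, 1, 2, 3]
    taps = np.sinc(positions - frac_offset)
    return taps / taps.sum()

half_pel = truncated_sinc_taps(0.5)
# The taps flanking the central pair are negative lobes; the two
# center taps carry most of the weight.
```

Applying these taps in place of the two-tap average sharpens the interpolated half-pixel and quarter-pixel positions, which is the improvement the text describes.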
- each individual electronic camera, each camera type, each film type, and each individual film scanner and scanner type used in creating input to a compression/decompression system should be individually characterized in terms of color alignment and noise (electronic noise for video cameras and scanners, and grain for film).
- the information about where the image was created, a table of the specific properties, and specific settings of each piece of equipment, should be carried with the original image, and subsequently used in pre-processing prior to compression.
- a specific camera may require a color realignment. It may also be set on a medium noise setting (substantially affecting the amount of noise processing needed). These camera settings and inherent camera properties should be carried as side information along with each shot from that camera. This information then can be used to control the type of pre-processing, and the settings of parameters for the pre-processes. For images which are edited from multiple cameras, or even composited from multiple cameras and/or film sources, the pre-processing should probably be performed prior to such editing and combining. Such pre-processing should not degrade image quality, and may even be invisible to the eye, but does have a major impact on the quality of the compression.
- GB preferably expressed in pixel units.
- the 3-2 pulldown method is used because 24 frames per second do not divide evenly into 59.94 or 60 fields per second for existing NTSC (and some proposed HDTV) systems.
- the odd frames (or even) are placed on two of the interlaced fields, and the even frames (or odd) are placed on three of the interlaced fields.
- one field is a duplicate in every five fields.
- Two frames of film map to five fields of video.
- this process leads to numerous unpleasant problems.
- Most video processing equipment only applies its process to an immediate signal. With this being the case, a time-changing effect will operate differently on one field than the next, even though some of the input fields were duplicates.
- Telecine (i.e., convert from film to video)
- all original images are transferred with a deterministic cadence (i.e., either always 3 then 2, or 2 then 3) if the telecine does not provide direct 24 fps output.
- any particular processing device requiring 3-2 pulldown as input and output will get its input(s) made on the fly in real time from a 24 fps source.
- the cadence will always begin in a standard way for each input.
- the cadence of the device's output is then known, and must be identical to the cadence created on the fly as the device's input.
- the cadence is then un-done using this a priori knowledge, and the frames are saved in 24 fps format on the storage medium.
- This methodology requires real-time 3-2 pulldown undo and 3-2 pulldown synthesis. Unless the cadence comes from tape in an unknown format, the 24 fps nature of the frames will automatically be preserved by such a film-based telecine post-production system. The system will then automatically form an optimal input to compression systems (including the layered compression process described above).
- This process should be broadly useful in video and HDTV telecine facilities. Someday, when all devices accept a 24 fps (and other rate progressive scan) native signal input, output, processing, and storage modes, such a methodology will no longer be needed. However, in the interim, many devices require 3-2 pulldown for the interface in and out, even though the devices have a targeted function to operate on film input. During this interim, the above methodology eliminates 3-2 pulldown problems and can be an essential element of the efficiency of post-production and telecine of film.
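The on-the-fly cadence synthesis and undo can be sketched as below (a simplified model tracking only frame identity and field parity, and assuming the cadence always begins 3-then-2 as the text specifies; the function names are illustrative):

```python
def pulldown_32(frames):
    """Synthesize a 3-2 pulldown field sequence from 24 fps frames:
    alternate frames land on three fields, then two, so two film
    frames map onto five video fields."""
    fields = []
    for i, frame in enumerate(frames):
        copies = 3 if i % 2 == 0 else 2
        for _ in range(copies):
            parity = "top" if len(fields) % 2 == 0 else "bottom"
            fields.append((frame, parity))
    return fields

def undo_pulldown_32(fields):
    """Recover the 24 fps frames given a priori knowledge that the
    cadence began 3-then-2 (as when the input was synthesized on the
    fly), discarding the duplicated fields."""
    frames, i, copies = [], 0, 3
    while i < len(fields):
        frames.append(fields[i][0])
        i += copies
        copies = 2 if copies == 3 else 3
    return frames

fields = pulldown_32(["A", "B", "C", "D"])   # 4 film frames -> 10 fields
```

Because the synthesized cadence is deterministic, the undo step needs no cadence detection, which is the key to preserving the 24 fps frames exactly.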
- the extra weight given to the center frame helps strike a balance between the clarity of a single frame (due to the short motion blur) and the blur needed from the adjacent frames to help smooth the stutter of 24 fps motion (by simulating 24 fps motion blur).
- This weighting technique works well in about 95% of all cases, allowing this simple weighting function to provide the majority of the 24 fps conversions. For the remaining 5% or so of the cases, motion compensation can be used, as taught in U.S. patent application Serial No. 09/435,277. By having reduced the workload on the conversion process by a factor of 20 by this simple weighting technique, the remaining motion-compensated conversions become more practical when needed. It should also be noted that a 120 fps source can be used with five weightings to achieve similar results at 24 fps. For example, weightings of [0.1, 0.2, 0.4, 0.2, 0.1] may be used.
- 60 fps can be derived from 120 fps by taking every other frame, although the shorter open shutter duration will be noticeable on fast motion.
- an overlapping filter can also be used (e.g., preferably about [0.1667, 0.6666, 0.1667], but may be in the range [0.1, 0.8, 0.1] to [0.25, 0.5, 0.25]), repeating the low-amplitude weighted frames.
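A sketch of this weighted conversion (assuming a NumPy array per frame; the simple non-overlapping grouping of source frames per output frame is shown, and the function name is hypothetical):

```python
import numpy as np

def weighted_downconvert(frames, weights):
    """Derive a lower frame rate by a weighted average over each group
    of len(weights) source frames, e.g. 72 fps -> 24 fps with three
    weights, or 120 fps -> 24 fps with five."""
    w = np.asarray(weights, dtype=float)
    n = len(w)
    out = []
    for i in range(0, len(frames) - n + 1, n):
        group = np.stack(frames[i:i + n]).astype(float)
        out.append(np.tensordot(w, group, axes=1))  # weighted sum of frames
    return out

frames_72 = [np.full((2, 2), v) for v in (10.0, 20.0, 30.0, 40.0, 50.0, 60.0)]
frames_24 = weighted_downconvert(frames_72, [0.25, 0.5, 0.25])
# first output frame: 0.25*10 + 0.5*20 + 0.25*30 = 20.0
```

The overlapping-filter variant mentioned in the text would instead re-use the low-amplitude weighted frames between adjacent output groups.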
- higher frame rates allow even more careful shaping of the temporal sample for deriving 24 fps and other frame rates. As the frame rates become very high, the techniques of U.S. Patent No.
- the "blocking" (setting up) of the shots in a scene can be checked to ensure that the 24 fps results will look good (in addition to the 72 fps or other higher rate full-rate versions).
- the benefits of high frame rate capture are fully integrated with the capability to provide 24 fps international film and video release.
- it is useful in many video compression applications to "modularize" the bit rate.
- Variable bit rate systems have used continuously varying bit rates to attempt to apply more bits to faster changing shots. This can be done in a coarse way by giving each useful unit a different bit rate.
- suitable units include a range of frames (a "Group of Pictures," or GOP) or each P frame.
- the bit rate might be constant within a GOP.
- a higher constant bit rate can be utilized.
- This is similar to the above-described layering technique of applying all of the bits in an enhancement layer to the base layer during periods of high stress (typically resetting at the next I frame).
- more bits can be applied to single layer compressions, or to the base and enhancement layer (in the case of layered compression), so as to yield high quality during periods of high stress.
- FIG. 17 is a diagram of one example of applying higher bit rates to modular portions of a compressed data stream.
- Groups of pictures containing normal scenes 1800, 1802 are allocated bits at a constant rate.
- a GOP 1804 occurs that contains a scene exhibiting a high level of stress (i.e., changes that are difficult for the compression process to compress as much as "normal" scenes)
- a higher number of bits (e.g., 50-100% additional)
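The modular allocation in FIG. 17 can be sketched as follows (a toy model; the 75% boost is an assumed midpoint of the 50-100% range given above, and the function name is hypothetical):

```python
def allocate_gop_bits(base_rate_bits, stress_flags, boost=0.75):
    """Assign a constant bit budget per GOP, raising it by a fixed
    fraction for GOPs flagged as containing high-stress scenes."""
    return [int(base_rate_bits * (1 + boost)) if stressed else base_rate_bits
            for stressed in stress_flags]

# Four GOPs, the third flagged as high-stress:
budgets = allocate_gop_bits(1_000_000, [False, False, True, False])
# -> [1000000, 1000000, 1750000, 1000000]
```

Stress could be detected automatically (e.g., from high rate-control quantization values) or flagged manually, as discussed later in the text.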
- Many MPEG-2 implementations use a constant bit rate. Constant bit rate provides a good match with constant bit rate transport and storage media. Transport systems such as broadcast channels, satellite channels, cables, and fibers, all have a fixed constant total capacity. Also, digital compressed video tape storage systems have a constant tape playback rate, thereby yielding a constant recording or playback bit rate.
- Other MPEG-2 implementations such as DirecTV/DSS, and DVD, use some form of variable bit rate allocation. In the case of DirecTV/DSS, the variability is a combination of scene stress in the current program vs. scene stress in adjacent TV programs which share a common multiplex. The multiplex corresponds to a tuned satellite channel and transponder, which has a fixed total bit rate.
- the digital optical disk capacity is 2.5 Gbytes, requiring that the MPEG-2 bit rate average 4.5 mbits/s for a two-hour movie.
- the optical disk has a peak reading rate capability of 100% higher, at 9 mbits/s.
- the average rate can be higher, up to the full 9 mbits/s.
- the way that the bit rate achieves an average of 4.5 mbits/s is that a rate above this is used for scenes having high scene stress (high change due to rapid scene motion), while a rate below this average is used during low scene stress (low change due to little motion).
- bit rate in MPEG-2 and MPEG-4 is held constant by a combination of modeling of a virtual decoder buffer's capacity, and by varying the quantization parameter to throttle the bit rate emitted from the encoder.
- a constant quantization parameter will yield a variable number of bits, in proportion to scene change and detail, also known as scene "entropy".
- a constant quantization parameter yields relatively constant quality, but variable bit rate.
- a varying quantization parameter can be used in conjunction with a size bounded decoder buffer to smooth out any variability and provide a constant bit rate.
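The buffer-model throttling can be sketched as a toy rate-control loop (an illustration only: thresholds, step sizes, and the function name are assumptions, not the MPEG VBV algorithm itself):

```python
def throttle_quantizer(frame_bits, channel_rate, buffer_size, q_init=8):
    """Model a virtual decoder buffer: the channel refills it at a
    constant rate per frame, each coded frame drains its bits, and the
    quantization parameter is raised when the buffer risks underflow
    (too many bits emitted) and lowered when it risks overflow."""
    fullness = buffer_size // 2
    q = q_init
    qs = []
    for bits in frame_bits:
        fullness += channel_rate - bits   # refill minus drain per frame
        if fullness < buffer_size * 0.25:
            q += 1                        # coarser quantization, fewer bits
        elif fullness > buffer_size * 0.75:
            q = max(1, q - 1)             # finer quantization, more bits
        qs.append(q)
    return qs

# A 250-bit frame against a 100-bit/frame channel drains the buffer
# and pushes the quantizer up:
qs = throttle_quantizer([250], channel_rate=100, buffer_size=400)
```

Holding q constant instead would yield the variable-bit-rate, roughly constant-quality behavior the surrounding text describes.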
- the sharing of many channels in a multiplex is one method that can support variable bit rate, as with DirecTV, or with standard definition signals in the ACATS/ATSC 19.3 mbits/s 6 MHz multiplex.
- variable bit rate systems have a peak bit rate, usually somewhere near 100% above the average. Thus, these systems become constant bit rate systems at the highest bit rate, limiting the peak bit rate available for periods of continued high scene stress. There is also a limit to the input bit rate in some MPEG-2 decoder systems, also limiting the peak bit rate in such variable bit rate systems. However, this limit on peak input bit rate is gradually rising well above these other limits, as decoders improve.
- common to each of these prior bit rate control systems is a small memory buffer in the decoder, holding somewhere between a fraction of a frame and a few frames of moving image.
- when this decoder bit rate buffer was conceived, around 1990, there was concern that the memory cost of this buffer in decoders would have a significant effect on the decoder's price.
- the cost of this buffer has proven insignificant. In fact, many seconds' worth of buffer is now an insignificant cost. It may be extrapolated that, in the near future, the bit receiving memory buffer may hold many minutes of video information at insignificant cost. Further, the cost of disk and other storage media has also fallen rapidly, while capacity has increased rapidly. It is therefore also reasonable to spool the compressed bitstream to disk or other storage memory systems, thereby yielding many hours' or days' worth of storage capacity. This is currently being done by commercially available hard-drive-based home video recorders.
- FIG. 18 graphically illustrates the relationships of DCT harmonics between two resolution layers.
- the base layer utilizes DCT coefficients using an arithmetic harmonic series having frequencies of 1, 2, 3, 4, 5, 6, and 7 times the 8x8 pixel DCT block size 1900.
- these base layer harmonics then map to frequencies of 1/2, 1, 3/2, 2, 5/2, 3, and 7/2 of the corresponding enhancement layer DCT block 1902.
- frequencies of 2, 4, and 6 times the macroblock size from the base layer are aligned with frequencies of 1, 2, and 3 times the macroblock size from the enhancement layer.
- SNR (signal-to-noise ratio)
- the 3, 5, and 7 terms from the base layer are non-harmonic with the enhancement layer, and therefore represent orthogonality to the base layer only, providing no synergy with the enhancement layer.
- the remaining terms in the enhancement layer, 4, 5, 6, and 7, represent additional detail which the enhancement layer can provide to the image, without overlap with the base layer.
- FIG. 19 graphically illustrates the similar relationships of DCT harmonics between three resolution layers, showing a highest enhancement layer 1904.
- a solution to providing cross-layer orthogonality is to utilize different DCT block sizes for each resolution layer. For example, if a given layer doubles the resolution, then the DCT block size will be twice as large. This results in a harmonically aligned resolution layering structure, providing optimal coding efficiency due to optimal inter-layer coefficient orthogonality.
- FIG. 20 is a diagram showing various DCT block sizes for different resolution layers.
- a 4x4 pixel DCT block 2000 could be used at the base layer
- an 8x8 pixel DCT block 2002 could be utilized at the next layer up
- a 16x16 pixel DCT block 2004 could be utilized at the third layer
- a 32x32 pixel DCT block 2006 could be utilized at the fourth layer.
- each layer adds additional harmonic terms in full orthogonality to the layer(s) below.
- additional precision in the SNR sense
- the 16x16 pixel subset 2008 in the 32x32 pixel block 2006 can be used to augment (in an SNR improvement sense) the precision of the 16x16 pixel DCT block 2004.
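The harmonic alignment between layers with doubled DCT block sizes can be checked numerically (a small sketch; the function name is illustrative):

```python
def aligned_harmonics(base_block=8, enh_block=16):
    """With an arithmetic harmonic series 1..base_block-1 in the base
    layer, each base frequency k maps to k * base_block / enh_block
    cycles of the enhancement-layer block; only integer results are
    harmonics shared between the layers."""
    shared, base_only = [], []
    for k in range(1, base_block):
        mapped = k * base_block / enh_block
        (shared if mapped.is_integer() else base_only).append(k)
    return shared, base_only

shared, base_only = aligned_harmonics()
# shared    -> [2, 4, 6] (align with enhancement harmonics 1, 2, 3)
# base_only -> [1, 3, 5, 7] (orthogonal to the enhancement layer)
```

Doubling the DCT block size at each layer, as FIG. 20 shows, makes every base-layer harmonic map to an integer enhancement-layer harmonic, which is the orthogonality property claimed above.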
- macroblocks corresponding to motion vectors consist of 16x16 pixels, organized as four 8x8 DCT blocks.
- each macroblock can optionally be further subdivided into 8x8 regions, corresponding to the DCT blocks, each having their own motion vector.
- the motion compensation macroblocks need not be constrained by this structure.
- the simplest structure is where the single motion vector for each base layer motion compensation macroblock applies to all higher layers as well, eliminating motion vectors from all enhancement layers entirely, since the motion is specified by the base layer's motion vector for all layers together.
- a more efficient structure is to allow each layer to independently select (1) no motion vector (i.e., use the base layer motion vector), (2) additional sub-pixel precision for the base layer's motion vector, or (3) split each motion compensation macroblock into two, four, or other numbers of blocks each having independent motion vectors.
- OBMC (overlapped block motion compensation)
- each DCT block at each layer may be split into as many motion vector blocks for motion compensation as are optimal for that layer.
- FIG. 21 is a diagram showing examples of splitting of motion compensation macroblocks for determining independent motion vectors.
- the base layer, if constructed using 4x4 pixel DCT blocks 2100, could utilize from one (shown) to as many as 16 motion vectors (one for each pixel), or even utilize sub-pixel motion vectors.
- each higher level can split its larger corresponding DCT block 2102, 2104, 2106 as appropriate, yielding an optimal balance between coding prediction quality (thus saving DCT coefficient bits) vs. the bits required to specify the motion vectors.
- the block split for motion compensation is a tradeoff between the bits used to code the motion vectors and the improvement in picture prediction.
- variable length codes (such as Huffman or arithmetic codes)
- the variable length codes in MPEG-1, MPEG-2, MPEG-4, H.263, and other compression systems are selected based upon demonstrated efficiency on a small group of test sequences. These test sequences are limited in the types of images, and only represent a relatively narrow range of bit rate, resolution, and frame rate. Further, the variable length codes are selected based upon average performance over each test sequence, and over the test sequences as a group.
- an improvement over such a fixed variable length coding system can be obtained by (1) applying specific variable length coding tables to each frame and (2) selecting the most optimal codes for that particular frame.
- Such a selection of optimal variable length codes can be applied in units smaller than a frame (a part or region of a frame), or in groups of several frames.
- the variable length codes used for motion vectors, DCT coefficients, macroblock types, etc. can then each be independently optimized for the instantaneous conditions of a given unit (i.e., frame, sub-frame, or group of frames) at that unit's current resolution and bit rate.
- This technique is also applicable to the spatial resolution enhancement layers described in other parts of this description.
- The selection of which group of variable length codes is to be used can be conveyed with each frame (or subpart or group) using a small number of bits. Further, custom coding tables can be downloaded where reliable data transmission or playback is available (such as with optical data disk or optical fiber networks).
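Per-unit code table selection can be sketched as follows (a minimal illustration; the two code tables and all names are hypothetical examples, not codes from any MPEG standard):

```python
from collections import Counter

def pick_vlc_table(symbols, tables):
    """Choose, for one frame (or sub-frame or group), the variable-
    length-code table that yields the fewest total bits for that
    unit's symbol statistics; the winner's index can then be signalled
    with a small number of bits alongside the unit."""
    counts = Counter(symbols)
    costs = [sum(counts[s] * len(table[s]) for s in counts)
             for table in tables]
    best = min(range(len(tables)), key=costs.__getitem__)
    return best, costs[best]

# Two hypothetical tables mapping symbols to bit strings:
table_a = {"run0": "0", "run1": "10", "eob": "11"}
table_b = {"run0": "11", "run1": "0", "eob": "10"}
symbols = ["run1", "run1", "run1", "run0", "eob"]
best, bits = pick_vlc_table(symbols, [table_a, table_b])
# table_b wins: 3*1 + 1*2 + 1*2 = 7 bits vs. table_a's 3*2 + 1 + 2 = 9
```

The same selection can be run independently for motion vectors, DCT coefficients, and macroblock types, as the text suggests.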
- There is a large installed base of MPEG-2 capable decoders.
- DVD players and DirecTV satellite receivers are now in millions of homes.
- the improvement which MPEG-4 video compression coding could offer beyond MPEG-2 is not yet available, since MPEG-4 is incompatible with MPEG-2.
- MPEG-4 and MPEG-2 are both motion-compensated DCT compression systems, sharing a common basic structure.
- the composition system in MPEG-4's video coding system is fundamentally different from MPEG-2, as are some other expanded features. In this discussion, only the full frame video coding aspects of MPEG-4 are being considered.
- MPEG-4 can optionally split a 16x16 macroblock into four 8x8 blocks, one for each DCT, each having an independent motion vector.
- MPEG-4 B-frames have a "direct” mode, which is a type of prediction.
- MPEG-4 B-frames do not support "I" macroblocks, unlike MPEG-2, which does support "I" macroblocks in B-frames.
- the DCT coefficients in MPEG-4 can be coded by more elaborate patterns than with MPEG-2, although the well-known zigzag pattern is common to both MPEG-2 and MPEG-4.
- MPEG-4 supports 10-bit and 12-bit pixel depths, whereas MPEG-2 is limited to 8 bits.
- MPEG-4 supports quarter-pixel motion vector precision, whereas MPEG-2 is limited to half-pixel precision.
- FIG. 22 is a block diagram showing an augmentation system for MPEG-2 type systems.
- a main compressed data stream 2200 (shown as including motion vectors, DCT coefficients, macroblock mode bits, and I, B, and P frames) is conveyed to a conventional MPEG-2 type decoder 2202 and to a parallel enhanced decoder 2204.
- an enhanced data stream 2206 (shown as including quarter-pixel motion vector precision, 8x8 four-way block split motion vectors, and 10-bit and 12-bit pixel depths) is conveyed to the enhanced decoder 2204.
- the enhanced decoder 2204 would combine the two data streams 2200, 2206 and decode them to provide an enhanced video output.
- any coding enhancements can be added to any motion-compensated DCT compression system.
- the use of this structure can be biased by an encoder toward more optimal MPEG-2 decoding or toward more optimal enhanced decoding.
- the expectation is that such enhanced decoding, by adding MPEG-4 video coding improvements, would be favored, to achieve the optimal enhanced picture quality, with a small compromise in quality to the MPEG-2 decoded picture.
- the MPEG-2 motion vectors can be used as "predictors" for the four-way split motion vectors (in those cases where MPEG-4 chooses to split four ways), or may be used directly for non-split 16x16 macroblocks.
- the quarter pixel motion vector resolution can be coded as one additional bit of precision (vertically and horizontally) in the enhanced data stream 2206.
- the extra pixel depth can be coded as extra precision to the DCT coefficients prior to applying the inverse DCT function.
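The quarter-pixel refinement in the enhanced stream can be sketched as follows (a toy model of the precision-extension idea only; the representation in vector units and the function name are assumptions):

```python
def refine_motion_vector(halfpel_mv, extra_bits):
    """Combine an MPEG-2 half-pel motion vector with one extra bit of
    precision per axis from the enhancement stream, yielding a
    quarter-pel vector (expressed in quarter-pel units)."""
    hx, hy = halfpel_mv          # in half-pel units, from the main stream
    bx, by = extra_bits          # 0 or 1 per axis, from the enhanced stream
    return (hx * 2 + bx, hy * 2 + by)

qmv = refine_motion_vector((3, -2), (1, 0))
# (3 half-pels + 1 quarter, -2 half-pels) -> (7, -4) in quarter-pel units
```

A legacy MPEG-2 decoder simply ignores the extra bits and uses the half-pel vector, which is how the two data streams stay compatible.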
- the spatial resolution layering which is a principal subject of this invention performs most optimally when the base layer is as perfectly coded as possible.
- MPEG-2 is an imperfect coding, yielding degraded performance for resolution enhancement layers.
- the base layer can be improved, for example, by using the MPEG-4 improvements described above (as well as other improvements set forth in this description) to augment the MPEG-2 data stream that encodes the base layer.
- the resulting base layer, with accompanying enhancement data stream, will then have most of the quality and efficiency that would have been obtained using an improved base layer resulting from better coding (such as with MPEG-4 and the other improvements of this invention).
- the resulting improved base layer can then have one or more resolution enhancement layers applied, using other aspects of this invention.
- Motion vectors comprise a large portion of the allocated bits within each resolution enhancement layer created in accordance with the invention. It has been determined that it is possible to substantially reduce the number of bits required for enhancement layer motion vectors by using the corresponding motion vectors at the same position in the base layer as "guide vectors".
- the enhancement layer motion vectors are therefore coded by only searching for a small search range about the corresponding guide vector center from the base layer. This is especially important with MPEG-4 enhancement layers, since each macroblock can optionally have 4 motion vectors, and since quarter-pixel resolution of motion vectors is available.
- FIG. 23 is a diagram showing use of motion vectors from a base layer 2300 as guide vectors for a resolution enhancement layer 2302.
- the process is the same for all of the motion vectors from the base layer. For example, in MPEG-4 a 16x16 pixel base layer macroblock may optionally be split into four 8x8 pixel motion vector blocks. A corresponding factor-of-two enhancement layer would then utilize the co-located motion vectors from the base layer as guide vectors.
- a motion vector from one of the 8x8 motion vector blocks in the base layer would guide the search for a motion vector in a corresponding 16x16 pixel macroblock in the enhancement layer.
- This 16x16 block could optionally be further split into four 8x8 motion vector blocks, all using the same corresponding base layer motion vector as a guide vector.
- This guide-vector technique is applicable to MPEG-2, MPEG-4, or other appropriate motion-compensated spatial resolution enhancement layer(s).
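The guided search can be sketched as follows (a minimal illustration: the scoring function is a stand-in for a real block-matching error such as sum of absolute differences, and all names are hypothetical):

```python
def guided_search(base_mv, score_fn, scale=2, radius=1):
    """Search for an enhancement-layer motion vector only in a small
    window around the scaled base-layer guide vector, instead of a
    full-range search. score_fn rates a candidate; lower is better."""
    gx, gy = base_mv[0] * scale, base_mv[1] * scale   # guide vector center
    candidates = [(gx + dx, gy + dy)
                  for dx in range(-radius, radius + 1)
                  for dy in range(-radius, radius + 1)]
    return min(candidates, key=score_fn)

# Hypothetical score: distance to a "true" displacement of (5, -3).
best_mv = guided_search((2, -2), lambda v: abs(v[0] - 5) + abs(v[1] + 3))
# guide center is (4, -4); the best candidate in the +/-1 window is (5, -3)
```

Because only a few candidates around the guide are scored (and only small residual offsets need coding), both the search cost and the motion vector bits drop sharply relative to an unguided full search.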
- FIGS. 24A-24E are data flow diagrams showing one example professional-level enhancement mode. These figures show picture data (including intermediate stages) in the left column, processing steps in the middle column, and output in the right column. It should be noted that this is just one example of how to combine a number of the processing steps described herein. Different combinations, simpler as well as more complex, can be configured to achieve different levels of compression, aspect ratios, and image quality.
- FIG. 24A shows an initial picture 2400 at 2kx1k pixels. Down filter 2402 this image to 1kx512 pixels 2404. Create motion vectors 2406 from the initial picture and output as a file 2407. Compress/decompress 2408 the 1kx512 pixel image 2404 to a 1kx512 decompressed image 2410 and output the compressed version as the base layer 2412, along with the associated motion vector files 2416. Expand 2418 the 1kx512 decompressed image 2410 as a 2kx1k image 2420. Expand 2422 the 1kx512 image 2404 as a 2kx1k image 2424. Subtract 2426 the 2kx1k image 2420 from the original image 2400 to create a 2kx1k difference picture 2428.
- Encode/decode 2442 the combined difference picture 2440 using the original motion vectors and output an encoded enhancement layer 2444 (MPEG-2, in this example), and a 2kx1k decoded enhanced layer 2246.
- Add 2448 the 2kx1k decoded enhanced layer 2246 to the 2kx1k image 2420 to create a 2kx1k reconstructed full base plus enhancement image 2450.
- Subtract 2452 the original image 2400 from the 2kx1k reconstructed full base plus enhancement image 2450 to create a 2kx1k second layer difference picture 2454.
- Increase 2456 the amplitude of the 2kx1k second layer difference picture 2454 to create a 2kx1k difference picture 2458.
- Separate the red channel information 2458, the green channel information 2460, and the blue channel information 2462 to create respective red difference 2464, green difference 2466, and blue difference 2468 images.
- encode/decode 2470 a second red layer from the red difference picture 2464 as a red second enhancement layer 2472, and a decoded red difference image 2474
- encode/decode 2476 a second green layer from the green difference picture 2466 as a green second enhancement layer 2478, and a decoded green difference image 2480
- encode/decode 2482 a second blue layer from the blue difference picture 2468 as a blue second enhancement layer 2484, and a decoded blue difference image 2486.
- the invention may be implemented in hardware or software, or a combination of both. However, preferably, the invention is implemented in computer programs executing on one or more programmable computers each comprising at least a processor, a data storage system (including volatile and non-volatile memory and/or storage elements), an input device, and an output device. Program code is applied to input data to perform the functions described herein and generate output information. The output information is applied to one or more output devices, in known fashion.
- Each such program may be implemented in any desired computer language (including machine, assembly, or high level procedural, logical, or object oriented programming languages) to communicate with a computer system.
- the language may be a compiled or interpreted language.
- Each such computer program is preferably stored on a storage medium or device
- inventive system may also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer system to operate in a specific and predefined manner to perform the functions described herein.
- Allowing higher bit rates for compression units (such as the GOP) during periods of high compression stress (either automatically, by detecting high values of rate control quantization parameter, or manually controlled).
- Using variable bit rate systems with one or more layers of the present layered compression system.
- Using combined fixed and variable bit rate systems with various layers of the present layered compression system.
- Using correspondingly larger DCT block sizes and additional DCT coefficients for use in resolution layering (also called "spatial scalability"). For example, if a given layer doubles the resolution, then the DCT block size will be twice as large. This results in a harmonically aligned resolution layering structure, providing optimal coding efficiency due to optimal inter-layer coefficient orthogonality.
- Using multiple motion vectors per DCT block so that both large and small DCT blocks can optimize the tradeoff between motion vector bits and improved motion compensated prediction.
- Using negative-lobed upsizing and downsizing filters, particularly truncated sinc filters.
- Using negative-lobed motion compensation displacement filters.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computer Graphics (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
- Television Systems (AREA)
- Studio Devices (AREA)
Abstract
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2001251386A AU2001251386A1 (en) | 2000-04-07 | 2001-04-06 | Enhanced temporal and resolution layering in advanced television |
JP2001574651A JP4352105B2 (ja) | 2000-04-07 | 2001-04-06 | アドバンスドテレビジョンの強化された時相及び解像度の階層化 |
EP01924762A EP1279111A4 (fr) | 2000-04-07 | 2001-04-06 | Organisation renforcee en couches temporelles et par resolution dans la television avancee |
CA002406459A CA2406459C (fr) | 2000-04-07 | 2001-04-06 | Organisation renforcee en couches temporelles et par resolution dans la television avancee |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/545,233 US6728317B1 (en) | 1996-01-30 | 2000-04-07 | Moving image compression quality enhancement using displacement filters with negative lobes |
US09/545,233 | 2000-04-07 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2001077871A1 true WO2001077871A1 (fr) | 2001-10-18 |
Family
ID=24175400
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2001/011204 WO2001077871A1 (fr) | 2000-04-07 | 2001-04-06 | Organisation renforcee en couches temporelles et par resolution dans la television avancee |
Country Status (5)
Country | Link |
---|---|
EP (1) | EP1279111A4 (fr) |
JP (1) | JP4352105B2 (fr) |
AU (1) | AU2001251386A1 (fr) |
CA (1) | CA2406459C (fr) |
WO (1) | WO2001077871A1 (fr) |
Cited By (49)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1460846A1 (fr) * | 2001-12-13 | 2004-09-22 | Sony Corporation | Appareil de traitement de signal d'image et procede de traitement |
WO2004095829A1 (fr) * | 2003-04-10 | 2004-11-04 | Thomson Licensing S.A. | Technique de simulation de grain sur une video codee |
WO2004102963A1 (fr) | 2003-05-16 | 2004-11-25 | Sony Corporation | Dispositif et procede de correction de mouvement |
EP1501312A2 (fr) * | 2003-07-18 | 2005-01-26 | Samsung Electronics Co., Ltd. | Appareil et procédé pour le codage et le décodage d'image |
EP1511320A1 (fr) * | 2003-09-01 | 2005-03-02 | Matsushita Electric Industrial Co., Ltd. | Codage du grain de film |
EP1526730A1 (fr) * | 2002-04-17 | 2005-04-27 | Matsushita Electric Industrial Co., Ltd. | Dispositif convertisseur d'image et procede correspondant |
EP1536643A1 (fr) * | 2002-04-17 | 2005-06-01 | Matsushita Electric Industrial Co., Ltd. | Dispositif de conversion d'image et procede de conversion d'image |
WO2005076599A1 (fr) * | 2004-02-03 | 2005-08-18 | Koninklijke Philips Electronics N.V. | Modification du rapport hauteur/largeur d'images a afficher sur un ecran |
FR2872664A1 (fr) * | 2004-07-01 | 2006-01-06 | Nextream France Sa | Dispositif et procede de pre-traitement avant codage d'une sequence d'images video |
WO2006008609A1 (fr) * | 2004-07-12 | 2006-01-26 | Nokia Corporation | Systeme et procede de prediction des mouvements en codage video a geometrie variable |
JP2006509421A (ja) * | 2002-12-03 | 2006-03-16 | トムソン ライセンシング | 単一のディスク上で標準画質および高画質のビデオ・フォーマットを実現するディジタル・ビデオ・ディスク |
WO2006020532A3 (fr) * | 2004-08-09 | 2006-04-06 | Pinnacle Systems Inc | Filtrage rapide par zone selectionnee, destine a la reduction d'artefacts de type pixel-bruit et analogue |
FR2876860A1 (fr) * | 2004-10-20 | 2006-04-21 | Thomson Licensing Sa | Procede de codage hierarchique d'images video |
FR2876861A1 (fr) * | 2004-10-20 | 2006-04-21 | Thomson Licensing Sa | Procede de codage d'images video de differents formats non proportionnels |
WO2006058921A1 (fr) * | 2004-12-03 | 2006-06-08 | Thomson Licensing | Procede de codage video a geometrie variable |
FR2879066A1 (fr) * | 2004-12-03 | 2006-06-09 | Thomson Licensing Sa | Procede et dispositif de codage hierarchique inter couches |
WO2006064422A1 (fr) * | 2004-12-13 | 2006-06-22 | Koninklijke Philips Electronics N.V. | Codage d'images echelonnables |
EP1737240A2 (fr) * | 2005-06-21 | 2006-12-27 | Thomson Licensing | Procédé pour le codage ou le décodage ä l'echelle d'une image |
JP2007514336A (ja) * | 2003-05-15 | 2007-05-31 | トムソン ライセンシング | 1つ又は複数のパラメータによって画像粒状性を表す方法及び装置 |
WO2007063912A1 (fr) | 2005-11-29 | 2007-06-07 | Matsushita Electric Industrial Co., Ltd. | Dispositif de reproduction |
WO2007080477A3 (fr) * | 2006-01-10 | 2007-10-25 | Nokia Corp | Mécanisme de sur-échantillonnage de filtre commuté pour codage vidéo hiérarchique |
US7366242B2 (en) | 1996-01-30 | 2008-04-29 | Dolby Laboratories Licensing Corporation | Median filter combinations for video noise reduction |
JP2008530839A (ja) * | 2005-02-07 | 2008-08-07 | トムソン ライセンシング | 24Hzフレーム周波数のビデオ信号に基づくオーディオ/ビデオ・データに関係するビデオ信号および一つまたは複数のオーディオ信号を再生するための方法および装置 |
EP2129108A1 (fr) * | 2006-12-18 | 2009-12-02 | Sony Corporation | Dispositif et procédé d'imagerie, dispositif et procédé d'enregistrement, et dispositif et procédé de reproduction |
US7680356B2 (en) | 2003-10-14 | 2010-03-16 | Thomson Licensing | Technique for bit-accurate comfort noise addition |
US7738721B2 (en) | 2003-08-29 | 2010-06-15 | Thomson Licensing | Method and apparatus for modeling film grain patterns in the frequency domain |
US7738722B2 (en) | 2004-10-21 | 2010-06-15 | Thomson Licensing | Technique for adaptive de-blocking of block-based film grain patterns |
EP2254339A2 (fr) | 2002-06-28 | 2010-11-24 | Dolby Laboratories Licensing Corporation | Interpolation améliorée de cadres vidéo compressés |
US7843508B2 (en) | 2002-07-23 | 2010-11-30 | Mediostream, Inc. | Method and system for direct recording of video information onto a disk medium |
US7852409B2 (en) | 2004-11-16 | 2010-12-14 | Thomson Licensing | Bit-accurate seed initialization for pseudo-random number generators used in a video system |
US7889939B2 (en) | 2003-09-23 | 2011-02-15 | Thomson Licensing | Technique for simulating film grain using frequency filtering |
US7945106B2 (en) | 2003-09-23 | 2011-05-17 | Thomson Licensing | Method for simulating film grain by mosaicing pre-computer samples |
CN102075755A (zh) * | 2005-03-18 | 2011-05-25 | 夏普株式会社 | 用于图像上采样的方法和系统 |
US8014558B2 (en) | 2004-10-18 | 2011-09-06 | Thomson Licensing | Methods, apparatus and system for film grain simulation |
US8150206B2 (en) | 2004-03-30 | 2012-04-03 | Thomson Licensing | Method and apparatus for representing image granularity by one or more parameters |
US8238613B2 (en) | 2003-10-14 | 2012-08-07 | Thomson Licensing | Technique for bit-accurate film grain simulation |
US8345762B2 (en) | 2005-02-18 | 2013-01-01 | Thomson Licensing | Method for deriving coding information for high resolution pictures from low resolution pictures and coding and decoding devices implementing said method |
US8446956B2 (en) | 2006-01-05 | 2013-05-21 | Thomson Licensing | Inter-layer motion prediction method using resampling |
WO2014137920A1 (fr) * | 2013-03-05 | 2014-09-12 | Qualcomm Incorporated | Construction d'image de référence entre couches pour aptitude à une mise à l'échelle spatiale avec différents rapports géométriques |
US9098916B2 (en) | 2004-11-17 | 2015-08-04 | Thomson Licensing | Bit-accurate film grain simulation method based on pre-computed transformed coefficients |
US9117261B2 (en) | 2004-11-16 | 2015-08-25 | Thomson Licensing | Film grain SEI message insertion for bit-accurate simulation in a video system |
US9167266B2 (en) | 2006-07-12 | 2015-10-20 | Thomson Licensing | Method for deriving motion for high resolution pictures from motion data of low resolution pictures and coding and decoding devices implementing said method |
US9177364B2 (en) | 2004-11-16 | 2015-11-03 | Thomson Licensing | Film grain simulation method based on pre-computed transform coefficients |
EP3340625A4 (fr) * | 2015-08-19 | 2019-01-23 | Sony Corporation | Dispositif d'émission, procédé d'émission, dispositif de réception et procédé de réception |
US10715834B2 (en) | 2007-05-10 | 2020-07-14 | Interdigital Vc Holdings, Inc. | Film grain simulation based on pre-computed transform coefficients |
CN113316001A (zh) * | 2021-05-25 | 2021-08-27 | 上海哔哩哔哩科技有限公司 | 视频对齐方法及装置 |
CN114697677A (zh) * | 2022-03-31 | 2022-07-01 | 展讯通信(上海)有限公司 | 数据压缩方法及装置、计算机可读存储介质、终端 |
WO2022251383A1 (fr) * | 2021-05-26 | 2022-12-01 | Qualcomm Incorporated | Limites d'éléments d'iu de haute qualité utilisant des masques dans des trames interpolées dans le temps |
CN119052399A (zh) * | 2024-10-30 | 2024-11-29 | 渭南大东印刷包装机械有限公司 | 一种用于凹版印刷机的工作图像传输方法 |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2584215A1 (fr) * | 2004-10-18 | 2006-04-27 | Samsung Electronics Co., Ltd. | Procedes de codage et de decodage video par filtrage intercouche et codeur et decodeur associes |
JP2009530946A (ja) * | 2006-03-23 | 2009-08-27 | コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ | 映画データを符号化する符号化装置及び方法 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5742343A (en) * | 1993-07-13 | 1998-04-21 | Lucent Technologies Inc. | Scalable encoding and decoding of high-resolution progressive video |
US5828788A (en) * | 1995-06-29 | 1998-10-27 | Thomson Multimedia, S.A. | System for processing data in variable segments and with variable data resolution |
US5852565A (en) * | 1996-01-30 | 1998-12-22 | Demografx | Temporal and resolution layering in advanced television |
US6028634A (en) * | 1995-10-27 | 2000-02-22 | Kabushiki Kaisha Toshiba | Video encoding and decoding apparatus |
- 2001
- 2001-04-06 CA CA002406459A patent/CA2406459C/fr not_active Expired - Lifetime
- 2001-04-06 AU AU2001251386A patent/AU2001251386A1/en not_active Abandoned
- 2001-04-06 WO PCT/US2001/011204 patent/WO2001077871A1/fr not_active Application Discontinuation
- 2001-04-06 EP EP01924762A patent/EP1279111A4/fr not_active Withdrawn
- 2001-04-06 JP JP2001574651A patent/JP4352105B2/ja not_active Expired - Fee Related
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5742343A (en) * | 1993-07-13 | 1998-04-21 | Lucent Technologies Inc. | Scalable encoding and decoding of high-resolution progressive video |
US5828788A (en) * | 1995-06-29 | 1998-10-27 | Thomson Multimedia, S.A. | System for processing data in variable segments and with variable data resolution |
US6028634A (en) * | 1995-10-27 | 2000-02-22 | Kabushiki Kaisha Toshiba | Video encoding and decoding apparatus |
US5852565A (en) * | 1996-01-30 | 1998-12-22 | Demografx | Temporal and resolution layering in advanced television |
US5988863A (en) * | 1996-01-30 | 1999-11-23 | Demografx | Temporal and resolution layering in advanced television |
Non-Patent Citations (1)
Title |
---|
See also references of EP1279111A4 * |
Cited By (110)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
USRE45082E1 (en) | 1996-01-30 | 2014-08-19 | Dolby Laboratories Licensing Corporation | Enhancing image quality in an image system |
US7366242B2 (en) | 1996-01-30 | 2008-04-29 | Dolby Laboratories Licensing Corporation | Median filter combinations for video noise reduction |
USRE44235E1 (en) | 1996-01-30 | 2013-05-21 | Dolby Laboratories Licensing Corporation | Enhancing image quality in an image system |
USRE43360E1 (en) | 1996-01-30 | 2012-05-08 | Dolby Laboratories Licensing Corporation | Enhancing image quality in an image system |
US8995528B2 (en) | 2001-07-11 | 2015-03-31 | Dolby Laboratories Licensing Corporation | Switch-select single frame reference |
US8942285B2 (en) | 2001-07-11 | 2015-01-27 | Dolby Laboratories Licensing Corporation | Motion compensation filtering in an image system |
US8767829B2 (en) | 2001-07-11 | 2014-07-01 | Dolby Laboratories Licensing Corporation | Switch-select single frame reference |
EP1460846A4 (fr) * | 2001-12-13 | 2009-12-09 | Sony Corp | Appareil de traitement de signal d'image et procede de traitement |
EP1460846A1 (fr) * | 2001-12-13 | 2004-09-22 | Sony Corporation | Appareil de traitement de signal d'image et procede de traitement |
EP1526730A1 (fr) * | 2002-04-17 | 2005-04-27 | Matsushita Electric Industrial Co., Ltd. | Dispositif convertisseur d'image et procede correspondant |
EP1526730A4 (fr) * | 2002-04-17 | 2005-09-07 | Matsushita Electric Ind Co Ltd | Dispositif convertisseur d'image et procede correspondant |
EP1536643A4 (fr) * | 2002-04-17 | 2005-09-07 | Matsushita Electric Ind Co Ltd | Dispositif de conversion d'image et procede de conversion d'image |
US7446815B2 (en) | 2002-04-17 | 2008-11-04 | Matsushita Electric Industrial Co., Ltd. | Image conversion device and image conversion method |
EP1536643A1 (fr) * | 2002-04-17 | 2005-06-01 | Matsushita Electric Industrial Co., Ltd. | Dispositif de conversion d'image et procede de conversion d'image |
US7388617B2 (en) | 2002-04-17 | 2008-06-17 | Matsushita Electric Industrial Co., Ltd. | Image conversion device and image conversion method |
EP2458863A2 (fr) | 2002-06-28 | 2012-05-30 | Dolby Laboratories Licensing Corporation | Interpolation améliorée de cadres vidéo compressés |
EP2254339A2 (fr) | 2002-06-28 | 2010-11-24 | Dolby Laboratories Licensing Corporation | Interpolation améliorée de cadres vidéo compressés |
EP2458864A2 (fr) | 2002-06-28 | 2012-05-30 | Dolby Laboratories Licensing Corporation | Interpolation améliorée de cadres vidéo compressés |
US8619188B2 (en) | 2002-07-23 | 2013-12-31 | Mediostream, Inc. | Method and system for direct recording of video information onto a disk medium |
US7843508B2 (en) | 2002-07-23 | 2010-11-30 | Mediostream, Inc. | Method and system for direct recording of video information onto a disk medium |
JP2006509421A (ja) * | 2002-12-03 | 2006-03-16 | トムソン ライセンシング | 単一のディスク上で標準画質および高画質のビデオ・フォーマットを実現するディジタル・ビデオ・ディスク |
JP4751614B2 (ja) * | 2002-12-03 | 2011-08-17 | トムソン ライセンシング | 単一のディスク上で標準画質および高画質のビデオ・フォーマットを実現するディジタル・ビデオ・ディスク |
US7912125B2 (en) | 2002-12-03 | 2011-03-22 | Thomson Licensing | Hybrid scalable encoder, method and media for standard definition and high-definition video formats on a single-disc |
US7899113B2 (en) | 2003-04-10 | 2011-03-01 | Thomson Licensing | Technique for simulating film grain on encoded video |
KR100987666B1 (ko) * | 2003-04-10 | 2010-10-13 | 톰슨 라이센싱 | 인코딩된 비디오 상에서 필름 그레인을 시뮬레이팅하기 위한 방법 |
WO2004095829A1 (fr) * | 2003-04-10 | 2004-11-04 | Thomson Licensing S.A. | Technique de simulation de grain sur une video codee |
US7742655B2 (en) | 2003-05-15 | 2010-06-22 | Thomson Licensing | Method and apparatus for representing image granularity by one or more parameters |
JP2007514336A (ja) * | 2003-05-15 | 2007-05-31 | トムソン ライセンシング | 1つ又は複数のパラメータによって画像粒状性を表す方法及び装置 |
EP1627360B1 (fr) * | 2003-05-15 | 2018-10-03 | Dolby International AB | Procede et appareil permettant de representer la granularite d'une image par un ou plusieurs parametres |
JP2007515846A (ja) * | 2003-05-15 | 2007-06-14 | トムソン ライセンシング | 1つ又は複数のパラメータによって画像粒状性を表す方法及び装置 |
JP2011071998A (ja) * | 2003-05-15 | 2011-04-07 | Thomson Licensing | 1つ又は複数のパラメータによって画像粒状性を表す方法及び装置 |
EP1513344A4 (fr) * | 2003-05-16 | 2009-10-28 | Sony Corp | Dispositif et procede de correction de mouvement |
EP1513344A1 (fr) * | 2003-05-16 | 2005-03-09 | Sony Corporation | Dispositif et procede de correction de mouvement |
WO2004102963A1 (fr) | 2003-05-16 | 2004-11-25 | Sony Corporation | Dispositif et procede de correction de mouvement |
US9961356B2 (en) | 2003-07-18 | 2018-05-01 | Samsung Electronics Co., Ltd. | Image encoding and decoding apparatus and method |
US9706216B2 (en) | 2003-07-18 | 2017-07-11 | Samsung Electronics Co., Ltd. | Image encoding and decoding apparatus and method |
US8345748B2 (en) | 2003-07-18 | 2013-01-01 | Samsung Electronics Co., Ltd. | Image encoding and decoding apparatus and method |
US9042443B2 (en) | 2003-07-18 | 2015-05-26 | Samsung Electronics Co., Ltd. | Image encoding and decoding apparatus and method |
US9042442B2 (en) | 2003-07-18 | 2015-05-26 | Samsung Electronics Co., Ltd. | Image encoding and decoding apparatus and method |
EP2323397A3 (fr) * | 2003-07-18 | 2011-08-10 | Samsung Electronics Co., Ltd. | Appareil et procédé de codage et de décodage d'images |
EP1959689A3 (fr) * | 2003-07-18 | 2010-05-12 | Samsung Electronics Co., Ltd | Appareil et procédé de codage et de décodage d'image |
EP1501312A2 (fr) * | 2003-07-18 | 2005-01-26 | Samsung Electronics Co., Ltd. | Appareil et procédé pour le codage et le décodage d'image |
EP2323396A3 (fr) * | 2003-07-18 | 2011-08-10 | Samsung Electronics Co., Ltd. | Appareil et procédé de codage et de décodage d'images |
US8270474B2 (en) | 2003-07-18 | 2012-09-18 | Samsung Electronics Co., Ltd. | Image encoding and decoding apparatus and method |
EP2323400A3 (fr) * | 2003-07-18 | 2011-08-10 | Samsung Electronics Co., Ltd. | Appareil et procédé de codage et de décodage d'images |
US9706215B2 (en) | 2003-07-18 | 2017-07-11 | Samsung Electronics Co., Ltd. | Image encoding and decoding apparatus and method |
US9729892B2 (en) | 2003-07-18 | 2017-08-08 | Samsung Electronics Co., Ltd. | Image encoding and decoding apparatus and method |
EP1501312A3 (fr) * | 2003-07-18 | 2005-10-26 | Samsung Electronics Co., Ltd. | Appareil et procédé pour le codage et le décodage d'image |
EP2323395A3 (fr) * | 2003-07-18 | 2011-08-10 | Samsung Electronics Co., Ltd. | Appareil et procédé de codage et de décodage d'images |
EP2323401A3 (fr) * | 2003-07-18 | 2011-08-10 | Samsung Electronics Co., Ltd. | Appareil et procédé de codage et de décodage d'images |
US10602172B2 (en) | 2003-07-18 | 2020-03-24 | Samsung Electronics Co., Ltd. | Image encoding and decoding apparatus and method |
US7738721B2 (en) | 2003-08-29 | 2010-06-15 | Thomson Licensing | Method and apparatus for modeling film grain patterns in the frequency domain |
EP1511320A1 (fr) * | 2003-09-01 | 2005-03-02 | Matsushita Electric Industrial Co., Ltd. | Codage du grain de film |
US7945106B2 (en) | 2003-09-23 | 2011-05-17 | Thomson Licensing | Method for simulating film grain by mosaicing pre-computer samples |
US7889939B2 (en) | 2003-09-23 | 2011-02-15 | Thomson Licensing | Technique for simulating film grain using frequency filtering |
US7680356B2 (en) | 2003-10-14 | 2010-03-16 | Thomson Licensing | Technique for bit-accurate comfort noise addition |
US8238613B2 (en) | 2003-10-14 | 2012-08-07 | Thomson Licensing | Technique for bit-accurate film grain simulation |
WO2005076599A1 (fr) * | 2004-02-03 | 2005-08-18 | Koninklijke Philips Electronics N.V. | Modification du rapport hauteur/largeur d'images a afficher sur un ecran |
US8150206B2 (en) | 2004-03-30 | 2012-04-03 | Thomson Licensing | Method and apparatus for representing image granularity by one or more parameters |
FR2872664A1 (fr) * | 2004-07-01 | 2006-01-06 | Nextream France Sa | Dispositif et procede de pre-traitement avant codage d'une sequence d'images video |
WO2006003102A1 (fr) * | 2004-07-01 | 2006-01-12 | Thomson Licensing S.A. | Dispositif et procede de pretraitement avant le codage d'une sequence d'images video |
US8537904B2 (en) | 2004-07-01 | 2013-09-17 | Thomson Licensing | Pre-processing device and method before encoding of a video image sequence |
KR101160718B1 (ko) * | 2004-07-01 | 2012-06-28 | 톰슨 라이센싱 | 영상 이미지 시퀀스의 인코딩 이전의 전처리 장치 및 방법 |
WO2006008609A1 (fr) * | 2004-07-12 | 2006-01-26 | Nokia Corporation | Systeme et procede de prediction des mouvements en codage video a geometrie variable |
WO2006020532A3 (fr) * | 2004-08-09 | 2006-04-06 | Pinnacle Systems Inc | Filtrage rapide par zone selectionnee, destine a la reduction d'artefacts de type pixel-bruit et analogue |
US9953401B2 (en) | 2004-10-18 | 2018-04-24 | Thomson Licensing | Apparatus and system for determining block averages for film grain simulation |
US9117260B2 (en) | 2004-10-18 | 2015-08-25 | Thomson Licensing | Methods for determining block averages for film grain simulation |
US8014558B2 (en) | 2004-10-18 | 2011-09-06 | Thomson Licensing | Methods, apparatus and system for film grain simulation |
WO2006059021A1 (fr) * | 2004-10-20 | 2006-06-08 | Thomson Licensing | Procede de codage d'images video de differents formats non proportionnels |
FR2876860A1 (fr) * | 2004-10-20 | 2006-04-21 | Thomson Licensing Sa | Procede de codage hierarchique d'images video |
FR2876861A1 (fr) * | 2004-10-20 | 2006-04-21 | Thomson Licensing Sa | Procede de codage d'images video de differents formats non proportionnels |
WO2006042990A1 (fr) * | 2004-10-20 | 2006-04-27 | Thomson Licensing | Procede de codage hierarchique d'images video |
US8306119B2 (en) | 2004-10-20 | 2012-11-06 | Thomson Licensing | Method for hierarchically coding video images |
US7738722B2 (en) | 2004-10-21 | 2010-06-15 | Thomson Licensing | Technique for adaptive de-blocking of block-based film grain patterns |
US7852409B2 (en) | 2004-11-16 | 2010-12-14 | Thomson Licensing | Bit-accurate seed initialization for pseudo-random number generators used in a video system |
US9177364B2 (en) | 2004-11-16 | 2015-11-03 | Thomson Licensing | Film grain simulation method based on pre-computed transform coefficients |
US9117261B2 (en) | 2004-11-16 | 2015-08-25 | Thomson Licensing | Film grain SEI message insertion for bit-accurate simulation in a video system |
US9098916B2 (en) | 2004-11-17 | 2015-08-04 | Thomson Licensing | Bit-accurate film grain simulation method based on pre-computed transformed coefficients |
WO2006058921A1 (fr) * | 2004-12-03 | 2006-06-08 | Thomson Licensing | Procede de codage video a geometrie variable |
FR2879066A1 (fr) * | 2004-12-03 | 2006-06-09 | Thomson Licensing Sa | Procede et dispositif de codage hierarchique inter couches |
WO2006064422A1 (fr) * | 2004-12-13 | 2006-06-22 | Koninklijke Philips Electronics N.V. | Codage d'images echelonnables |
JP2008530839A (ja) * | 2005-02-07 | 2008-08-07 | トムソン ライセンシング | 24Hzフレーム周波数のビデオ信号に基づくオーディオ/ビデオ・データに関係するビデオ信号および一つまたは複数のオーディオ信号を再生するための方法および装置 |
US8244094B2 (en) | 2005-02-07 | 2012-08-14 | Thomson Licensing | Method and apparatus for replaying a video signal and one or more audio signals related to audio/video data that are based on a 24Hz frame frequency video signal |
US8345762B2 (en) | 2005-02-18 | 2013-01-01 | Thomson Licensing | Method for deriving coding information for high resolution pictures from low resolution pictures and coding and decoding devices implementing said method |
CN102075755A (zh) * | 2005-03-18 | 2011-05-25 | 夏普株式会社 | 用于图像上采样的方法和系统 |
EP1737240A3 (fr) * | 2005-06-21 | 2007-03-14 | Thomson Licensing | Procédé pour le codage ou le décodage ä l'echelle d'une image |
WO2006136568A3 (fr) * | 2005-06-21 | 2007-04-26 | Thomson Licensing | Procede pour codage ou decodage extensibles d'images sources numeriques |
EP1737240A2 (fr) * | 2005-06-21 | 2006-12-27 | Thomson Licensing | Procédé pour le codage ou le décodage ä l'echelle d'une image |
WO2007063912A1 (fr) | 2005-11-29 | 2007-06-07 | Matsushita Electric Industrial Co., Ltd. | Dispositif de reproduction |
EP1956830A1 (fr) * | 2005-11-29 | 2008-08-13 | Matsushita Electric Industrial Co., Ltd. | Dispositif de reproduction |
EP1956830A4 (fr) * | 2005-11-29 | 2010-09-29 | Panasonic Corp | Dispositif de reproduction |
US8351756B2 (en) | 2005-11-29 | 2013-01-08 | Panasonic Corporation | Reproduction device |
US8446956B2 (en) | 2006-01-05 | 2013-05-21 | Thomson Licensing | Inter-layer motion prediction method using resampling |
WO2007080477A3 (fr) * | 2006-01-10 | 2007-10-25 | Nokia Corp | Mécanisme de sur-échantillonnage de filtre commuté pour codage vidéo hiérarchique |
US9167266B2 (en) | 2006-07-12 | 2015-10-20 | Thomson Licensing | Method for deriving motion for high resolution pictures from motion data of low resolution pictures and coding and decoding devices implementing said method |
EP2129108A1 (fr) * | 2006-12-18 | 2009-12-02 | Sony Corporation | Dispositif et procédé d'imagerie, dispositif et procédé d'enregistrement, et dispositif et procédé de reproduction |
EP2129108A4 (fr) * | 2006-12-18 | 2011-10-26 | Sony Corp | Dispositif et procédé d'imagerie, dispositif et procédé d'enregistrement, et dispositif et procédé de reproduction |
US8102436B2 (en) | 2006-12-18 | 2012-01-24 | Sony Corporation | Image-capturing apparatus and method, recording apparatus and method, and reproducing apparatus and method |
US10715834B2 (en) | 2007-05-10 | 2020-07-14 | Interdigital Vc Holdings, Inc. | Film grain simulation based on pre-computed transform coefficients |
US10284842B2 (en) | 2013-03-05 | 2019-05-07 | Qualcomm Incorporated | Inter-layer reference picture construction for spatial scalability with different aspect ratios |
WO2014137920A1 (fr) * | 2013-03-05 | 2014-09-12 | Qualcomm Incorporated | Construction d'image de référence entre couches pour aptitude à une mise à l'échelle spatiale avec différents rapports géométriques |
JP2016513913A (ja) * | 2013-03-05 | 2016-05-16 | クゥアルコム・インコーポレイテッドQualcomm Incorporated | 異なるアスペクト比を伴う空間スケーラビリティのためのレイヤ間の参照ピクチャの構築 |
EP3340625A4 (fr) * | 2015-08-19 | 2019-01-23 | Sony Corporation | Dispositif d'émission, procédé d'émission, dispositif de réception et procédé de réception |
US12120344B2 (en) | 2015-08-19 | 2024-10-15 | Saturn Licensing Llc | Transmission device, transmission method, reception device and reception method |
CN113316001A (zh) * | 2021-05-25 | 2021-08-27 | 上海哔哩哔哩科技有限公司 | 视频对齐方法及装置 |
CN113316001B (zh) * | 2021-05-25 | 2023-04-11 | 上海哔哩哔哩科技有限公司 | 视频对齐方法及装置 |
WO2022251383A1 (fr) * | 2021-05-26 | 2022-12-01 | Qualcomm Incorporated | Limites d'éléments d'iu de haute qualité utilisant des masques dans des trames interpolées dans le temps |
US11587208B2 (en) | 2021-05-26 | 2023-02-21 | Qualcomm Incorporated | High quality UI elements with frame extrapolation |
CN114697677A (zh) * | 2022-03-31 | 2022-07-01 | 展讯通信(上海)有限公司 | 数据压缩方法及装置、计算机可读存储介质、终端 |
CN119052399A (zh) * | 2024-10-30 | 2024-11-29 | 渭南大东印刷包装机械有限公司 | 一种用于凹版印刷机的工作图像传输方法 |
Also Published As
Publication number | Publication date |
---|---|
EP1279111A1 (fr) | 2003-01-29 |
EP1279111A4 (fr) | 2005-03-23 |
JP4352105B2 (ja) | 2009-10-28 |
JP2003531514A (ja) | 2003-10-21 |
AU2001251386A1 (en) | 2001-10-23 |
CA2406459A1 (fr) | 2001-10-18 |
CA2406459C (fr) | 2006-06-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6728317B1 (en) | Moving image compression quality enhancement using displacement filters with negative lobes | |
CA2406459C (fr) | Organisation renforcee en couches temporelles et par resolution dans la television avancee | |
KR100481572B1 (ko) | Atv에서의시간및해상도계층화 | |
JP4266389B2 (ja) | フィルムでないソースの高品質の再生のための動き信号を用いる汎用ビデオディスク記録および再生 | |
US7280155B2 (en) | Method and system for converting interlaced formatted video to progressive scan video | |
US6862372B2 (en) | System for and method of sharpness enhancement using coding information and local spatial features | |
US5237413A (en) | Motion filter for digital television system | |
US6873657B2 (en) | Method of and system for improving temporal consistency in sharpness enhancement for a video signal | |
EP1506525B1 (fr) | Systeme et procede d'amelioration de la nettete d'une video numerique codee | |
US20070086666A1 (en) | Compatible interlaced sdtv and progressive hdtv | |
Demos | Temporal and resolution layering in advanced television | |
US6816553B1 (en) | Coding method for an image signal | |
Reimers et al. | JPEG and MPEG Source Coding of Video Signals | |
JP2004515133A (ja) | 圧縮符号化されたビデオの伸長 | |
Bayrakeri | Scalable video coding using spatiotemporal interpolation | |
Drury | Video preprocessing in MPEG-2 compression systems |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
WWE | Wipo information: entry into national phase |
Ref document number: 2406459 Country of ref document: CA |
ENP | Entry into the national phase |
Ref country code: JP Ref document number: 2001 574651 Kind code of ref document: A Format of ref document f/p: F |
WWE | Wipo information: entry into national phase |
Ref document number: 2001924762 Country of ref document: EP |
WWP | Wipo information: published in national office |
Ref document number: 2001924762 Country of ref document: EP |
WWW | Wipo information: withdrawn in national office |
Ref document number: 2001924762 Country of ref document: EP |