KR101144752B1

KR101144752B1 - video encoding/decoding method and apparatus thereof

Info

Publication number: KR101144752B1
Application number: KR1020100019893A
Authority: KR
Inventors: 박광훈; 박민우
Original assignee: 경희대학교 산학협력단
Priority date: 2009-08-05
Filing date: 2010-03-05
Publication date: 2012-05-09
Anticipated expiration: 2030-03-05
Also published as: KR20110014507A

Abstract

A video encoding/decoding method and an apparatus therefor are disclosed. A method according to one embodiment includes: encoding a plurality of per-view video images, wherein multi-view video encoding is applied to at least one of the per-view video images and scalable video encoding is applied to at least one of the per-view video images, thereby generating per-view, per-layer bitstreams; and generating an output bitstream by combining the generated per-view, per-layer bitstreams in a predetermined order based on view and layer.

Description

Video encoding / decoding method and apparatus thereof

The disclosed technology relates to video encoding/decoding and, more particularly but not exclusively, to video encoding/decoding capable of efficiently processing image information in consideration of various kinds of network environments and terminals of various formats, including terminals that support various kinds of realistic displays (e.g., stereoscopic displays, multi-view displays, and view-selectable displays) and terminals that support various kinds of existing two-dimensional displays.

Recently, rapid advances in video coding technology have allowed users to enjoy high-resolution and/or high-quality video services in various application fields (e.g., communication, broadcasting, and storage media).

Meanwhile, the video coding technologies in use so far mainly support two-dimensional flat display devices, or support only a specific format (e.g., a video format dedicated to a particular application, transmission environment, or terminal). Going forward, however, Multi-view Video Coding (MVC) and Scalable Video Coding (SVC) technologies are expected to spread rapidly.

The technical problem addressed by the disclosed technology is to provide a technique capable of efficiently processing image information in consideration of various kinds of network environments and various kinds of terminals.

To achieve the above technical object, one aspect of the disclosed technology provides a video encoding method including: (a) encoding a plurality of per-view video images, wherein multi-view video encoding is applied to at least one of the per-view video images and scalable video encoding is applied to at least one of the per-view video images, thereby generating per-view, per-layer bitstreams; and (b) generating an output bitstream by combining the generated per-view, per-layer bitstreams in a predetermined order based on view and layer.

In one embodiment, step (a) includes applying both multi-view video encoding and scalable video encoding to at least one of the plurality of per-view video images.

In one embodiment, applying the multi-view video coding in step (a) includes performing inter-view prediction based on video image information of another view.

In one embodiment, applying the scalable video coding in step (a) includes encoding the video image of a current layer based on the video image of the current layer and on video image information obtained by video encoding of a lower layer.

In one embodiment, step (a) includes setting a prediction structure for each layer and performing scalable video encoding according to the per-layer prediction structures thus set.

In one embodiment, step (a) includes: determining whether the prediction structure of a current layer of a current-view image is identical to the prediction structure of a lower layer of the current-view image; and encoding the prediction structure of the current layer by deciding, according to the result of the determination, whether to use the prediction structure of the lower layer.
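The decision described in this embodiment can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how the decision is signaled, so the `reuse_lower_layer` flag, the dictionary used as a stand-in for syntax elements, and the representation of a prediction structure as a map from view id to reference view ids are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical representation: a prediction structure maps each view id to
# the list of view ids it references for inter-view prediction.
PredictionStructure = Dict[int, List[int]]


def encode_prediction_structure(
    current: PredictionStructure, lower: Optional[PredictionStructure]
) -> dict:
    """Code the current layer's structure, reusing the lower layer's if identical.

    The returned dict stands in for coded syntax elements (an assumption).
    """
    if lower is not None and current == lower:
        # Identical structures: signal reuse only, send no structure.
        return {"reuse_lower_layer": True}
    # Different (or no lower layer): send the structure explicitly.
    return {"reuse_lower_layer": False, "structure": current}


def decode_prediction_structure(
    coded: dict, lower: Optional[PredictionStructure]
) -> PredictionStructure:
    """Recover the current layer's structure from the coded elements."""
    if coded["reuse_lower_layer"]:
        if lower is None:
            raise ValueError("reuse signaled but no lower layer is available")
        return lower
    return coded["structure"]
```

When the two layers share a structure, only the flag is carried, which is the saving this embodiment appears to aim at.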

In one embodiment, the scalable video encoding includes at least one of spatially scalable video encoding, temporally scalable video encoding, and quality-scalable video encoding for the plurality of per-view images.

In one embodiment, step (b) includes generating per-time-instant access units (Access Units), each of which arranges together the per-view, per-layer bitstreams of the same time instant, and outputting a multi-view scalable video bitstream in which the access units are arranged in temporal order.
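A minimal sketch of this combining step is given here. The `SubStream` container and the function names are hypothetical, and ordering each access unit by view first and layer second is an assumption; the text only requires a predetermined order based on view and layer.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple


class SubStream(NamedTuple):
    """One coded unit for a given time instant, view, and layer (hypothetical)."""
    time: int
    view: int
    layer: int
    payload: bytes


def build_access_units(substreams: Iterable[SubStream]) -> List[List[SubStream]]:
    """Group units by time instant; order each group by (view, layer)."""
    groups = defaultdict(list)
    for s in substreams:
        groups[s.time].append(s)
    # Access units come out in temporal order. Inside each one the order is
    # view first, then layer (an assumed choice of "predetermined order").
    return [
        sorted(groups[t], key=lambda s: (s.view, s.layer))
        for t in sorted(groups)
    ]


def multiplex(substreams: Iterable[SubStream]) -> List[SubStream]:
    """Flatten the time-ordered access units into one output stream."""
    return [s for au in build_access_units(substreams) for s in au]
```

For example, five units spread over two time instants are emitted with all units of time 0 first, ordered by view and layer, followed by the units of time 1.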

To achieve the above technical object, another aspect of the disclosed technology provides a video decoding method including: (a) receiving a bitstream; (b) splitting the received bitstream into per-view, per-layer bitstreams; and (c) performing multi-view scalable video decoding on the per-view, per-layer bitstreams using inter-view information and inter-layer information.

In one embodiment, step (b) includes selecting, from the video bitstream, the bitstreams that need to be decoded according to layer and view.
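One way to picture this selection is sketched here, under assumptions the text does not state: sub-bitstreams are identified by (view, layer) pairs, a layer depends on all lower layers of the same view, and inter-view references are given as a hypothetical `view_refs` map so that reference views are pulled in as well.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple


def select_substreams(
    available: Iterable[Tuple[int, int]],
    target_views: Iterable[int],
    max_layer: int,
    view_refs: Optional[Dict[int, List[int]]] = None,
) -> List[Tuple[int, int]]:
    """Return the (view, layer) pairs that must be decoded.

    available: (view, layer) pairs present in the received bitstream.
    view_refs: view id -> views it references (hypothetical side information).
    """
    view_refs = view_refs or {}
    needed_views: Set[int] = set()
    stack = list(target_views)
    while stack:
        v = stack.pop()
        if v not in needed_views:
            needed_views.add(v)
            # Views used for inter-view prediction must be decoded too.
            stack.extend(view_refs.get(v, []))
    # Keep every layer up to the requested one for each needed view.
    return sorted(
        (v, l) for (v, l) in available if v in needed_views and l <= max_layer
    )
```

A terminal that only shows view 1 at the base layer would, with view 1 predicted from view 0, keep just the base layers of views 0 and 1 and discard everything else.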

In one embodiment, step (c) includes: determining whether to perform inter-layer scalable video decoding according to whether a lower-layer image or lower-layer information exists for a current-view image; determining whether to perform multi-view video decoding according to whether view-direction predictive coding was performed; and performing, on the current-view image, one of multi-view scalable video decoding, single-view scalable video decoding, multi-view video decoding, and single-view video decoding, according to the results of the two determinations.
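The two determinations yield four cases, which can be sketched as a simple dispatch. The enum and function names are hypothetical; only the four decoding modes and the two conditions come from the text.

```python
from enum import Enum


class DecodeMode(Enum):
    MULTIVIEW_SCALABLE = "multi-view scalable video decoding"
    SINGLE_VIEW_SCALABLE = "single-view scalable video decoding"
    MULTIVIEW = "multi-view video decoding"
    SINGLE_VIEW = "single-view video decoding"


def select_decode_mode(
    has_lower_layer: bool, uses_inter_view_prediction: bool
) -> DecodeMode:
    """Pick the decoding mode for the current-view image.

    has_lower_layer: a lower-layer image or lower-layer information exists.
    uses_inter_view_prediction: view-direction predictive coding was used.
    """
    if has_lower_layer and uses_inter_view_prediction:
        return DecodeMode.MULTIVIEW_SCALABLE
    if has_lower_layer:
        return DecodeMode.SINGLE_VIEW_SCALABLE
    if uses_inter_view_prediction:
        return DecodeMode.MULTIVIEW
    return DecodeMode.SINGLE_VIEW
```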

In one embodiment, step (c) includes decoding according to a prediction structure set for each layer.

In one embodiment, step (c) includes performing multi-view video decoding on the image of a current view with reference to image information of a view, among the plurality of views, other than the current view.

In one embodiment, step (b) includes splitting the received multi-view scalable video bitstream into access units arranged in temporal order, and splitting the access units, in temporal order, into the per-view, per-layer bitstreams of each time instant.
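The two-stage split can be sketched as below. The `Unit` container is hypothetical, and the sketch assumes that the received stream already carries all units of one time instant contiguously, as the access-unit arrangement implies.

```python
from itertools import groupby
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Unit(NamedTuple):
    """One coded unit tagged with time instant, view, and layer (hypothetical)."""
    time: int
    view: int
    layer: int
    payload: bytes


def split_bitstream(
    units: Iterable[Unit],
) -> Tuple[List[List[Unit]], Dict[Tuple[int, int], List[Unit]]]:
    """Split a multiplexed stream into access units, then per-(view, layer) streams."""
    access_units: List[List[Unit]] = []
    per_stream: Dict[Tuple[int, int], List[Unit]] = {}
    # Stage 1: consecutive units with the same time instant form one access unit.
    for _, group in groupby(units, key=lambda u: u.time):
        au = list(group)
        access_units.append(au)
        # Stage 2: distribute the units of this access unit by (view, layer),
        # preserving temporal order within each sub-bitstream.
        for u in au:
            per_stream.setdefault((u.view, u.layer), []).append(u)
    return access_units, per_stream
```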

To achieve the above technical object, yet another aspect of the disclosed technology provides a video encoding apparatus including: an encoding unit that performs multi-view scalable video encoding on a plurality of per-view images using inter-view information and inter-layer information; and an output unit that outputs a multi-view scalable video bitstream by combining the per-view, per-layer bitstreams generated by the scalable video encoding of the per-view images, in an order that takes view and layer into account.

To achieve the above technical object, yet another aspect of the disclosed technology provides a video decoding apparatus including: a receiving unit that receives a multi-view scalable video bitstream; a splitting unit that splits the multi-view scalable video bitstream into per-view, per-layer bitstreams based on the plurality of views and the plurality of layers of the multi-view scalable video bitstream; and a decoding unit that performs multi-view scalable video decoding on the per-view, per-layer bitstreams using inter-view information and inter-layer information.

To achieve the above technical object, yet another aspect of the disclosed technology provides a computer-readable recording medium on which a program implementing the disclosed video encoding method is recorded.

To achieve the above technical object, yet another aspect of the disclosed technology provides a computer-readable recording medium on which a program implementing the disclosed video decoding method is recorded.

Embodiments of the disclosed technology may have effects including the following advantages. This does not mean, however, that every embodiment of the disclosed technology must include all of them, and the scope of the disclosed technology should not be understood as being limited thereby.

Through the disclosed technology (e.g., multi-view scalable video encoding/decoding and per-layer adaptive prediction), video information can be processed in an integrated manner and readily converted and transmitted to realistic terminals and existing two-dimensional terminals in a ubiquitous environment.

FIG. 1 illustrates an example of a prediction structure used in multi-view video coding.
FIG. 2 is a diagram comparing a scheme that does not use scalable video coding with a scheme that uses scalable video coding.
FIG. 3 is a block diagram of a multi-view scalable video encoding apparatus according to an embodiment.
FIG. 4 is a block diagram of a multi-view scalable video decoding apparatus according to an embodiment.
FIG. 5 is an overview of a realistic multi-view scalable video service that can be implemented according to an embodiment.
FIG. 6 illustrates the structure of an encoder implemented for multi-view scalable video encoding according to an embodiment.
FIG. 7 illustrates the detailed structure of an encoder implemented for multi-view scalable video encoding according to an embodiment.
FIG. 8 illustrates a structure for combining output bitstreams according to an embodiment.
FIG. 9 illustrates the configuration of a bitstream generated by multi-view video encoding according to an embodiment.
FIG. 10 illustrates the configuration of a bitstream generated by multi-view scalable video encoding according to an embodiment.
FIG. 11 illustrates per-view, per-layer bitstreams according to an embodiment.
FIG. 12 illustrates the detailed configuration of a multi-view scalable video bitstream according to an embodiment.
FIG. 13 illustrates the structure of a decoder implemented for multi-view video decoding according to an embodiment.
FIG. 14 illustrates the structure of a decoder implemented for multi-view scalable video decoding according to an embodiment.
FIG. 15 illustrates the structure of a decoder implemented for multi-view scalable video decoding according to another embodiment.
FIG. 16 is a flowchart of a method by which a sub-decoder performs decoding in units of pictures or slices, according to an embodiment.
FIG. 17 illustrates the information required for single-image and inter-image scalable video decoding according to an embodiment.
FIG. 18 is a flowchart of one method of setting the prediction structure of a current layer in accordance with a per-layer prediction structure according to an embodiment.
FIG. 19 is a flowchart of another method of setting the prediction structure of a current layer in accordance with a per-layer prediction structure according to an embodiment.
FIG. 20 is a flowchart of one method of coding the prediction structure of a current layer by selectively using the prediction structure of a lower layer, in accordance with a per-layer prediction structure according to another embodiment.
FIG. 21 is a flowchart of another method of coding the prediction structure of a current layer by selectively using the prediction structure of a lower layer, in accordance with a per-layer prediction structure according to another embodiment.
FIG. 22 illustrates an example of the prediction structures of a spatial current layer and a lower layer in multi-view scalable video coding with four views and two spatial layers, according to an embodiment.
FIG. 23 illustrates another example of the prediction structures of a spatial current layer and a lower layer in multi-view scalable video coding with four views and two spatial layers, according to an embodiment.
FIG. 24 illustrates yet another example of the prediction structures of a spatial current layer and a lower layer in multi-view scalable video coding with four views and two spatial layers, according to an embodiment.
FIG. 25 is a block diagram of a multi-view scalable video encoder according to another embodiment.
FIG. 26 is a block diagram of a multi-view scalable video decoder according to another embodiment.
FIG. 27 is a flowchart of a multi-view scalable video encoding method according to an embodiment.
FIG. 28 is a flowchart of a multi-view scalable video decoding method according to an embodiment.

The descriptions of the embodiments of the present invention are merely illustrative, given for the purpose of structural or functional explanation, and the scope of the present invention should therefore not be construed as limited by the embodiments described herein. That is, because the embodiments may be modified in various ways and may take various forms, the scope of the present invention should be understood to include equivalents capable of realizing its technical idea.

Meanwhile, the terms used herein should be understood as follows.

Terms such as "first" and "second" are used to distinguish one component from another, and the scope of the present invention should not be limited by these terms. For example, a first component may be named a second component, and similarly a second component may be named a first component.

The term "and/or" should be understood to include every combination that can be presented from one or more of the associated items. For example, "a first item, a second item, and/or a third item" means "at least one of the first item, the second item, and the third item", that is, not only the first, second, or third item alone but also every combination that can be presented from two or more of the first, second, and third items.

When a component is said to be "connected" to another component, it may be directly connected to that other component, but it should be understood that intervening components may also be present. In contrast, when a component is said to be "directly connected" to another component, it should be understood that no intervening component is present. Other expressions describing relationships between components, such as "between" versus "directly between" and "adjacent to" versus "directly adjacent to", should be interpreted in the same way.

Singular expressions used herein should be understood to include plural expressions unless the context clearly indicates otherwise. Terms such as "include" or "have" specify the presence of the stated features, numbers, steps, operations, components, parts, or combinations thereof, and should be understood not to preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

The steps described herein may occur in an order different from the stated order unless the context clearly specifies a particular order. That is, the steps may occur in the stated order, may be performed substantially simultaneously, or may be performed in the reverse order.

Unless otherwise defined, all terms used herein have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and should not be interpreted in an idealized or overly formal sense unless expressly so defined in this application.

First, multi-view video coding is based on the existing international video standard MPEG-4 Part 10 Advanced Video Coding (AVC; H.264) and efficiently encodes video images of a plurality of views captured by multiple cameras spaced at regular intervals in various arrangements. It supports realistic display devices such as 3D TV (3DTV) and FTV (Free Viewpoint TV). In the temporal direction, multi-view video coding uses hierarchical B-picture coding, the method used in the JSVC (Joint Scalable Video Coding) scheme to support temporal scalability; in the view direction, it uses inter-view prediction.

FIG. 1 illustrates an example of a prediction structure used in multi-view video coding. More specifically, it shows the prediction structure for the case in which there are eight view video images and the size of the GOP (Group of Pictures) in the temporal direction is 8 (= N).

In FIG. 1, S0, S1, S2, S3, S4, S5, S6, and S7 each denote one view, and T0, T1, T2, T3, ..., T16 denote the passage of time. In multi-view video coding, encoding that refers to the video image of another view, that is, prediction in the view direction (inter-view prediction), is performed along the arrow directions of FIG. 1.
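To make the hierarchical B-picture arrangement in the temporal direction more concrete, the sketch below assigns a temporal level to each time index of a GOP. It assumes a dyadic hierarchy and a power-of-two GOP size, which is the usual arrangement but is not spelled out in the text; `temporal_level` is a hypothetical helper, not part of the disclosed method.

```python
def temporal_level(t: int, gop_size: int = 8) -> int:
    """Return the temporal level of picture index t in a dyadic hierarchy.

    Level 0 holds the key pictures at GOP boundaries (T0, T8, T16, ...);
    each higher level halves the spacing between pictures.
    """
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("this sketch assumes a power-of-two GOP size")
    pos = t % gop_size
    if pos == 0:
        return 0
    depth = gop_size.bit_length() - 1              # log2(gop_size)
    trailing_zeros = (pos & -pos).bit_length() - 1
    return depth - trailing_zeros


def frames_up_to_level(num_frames: int, max_level: int, gop_size: int = 8):
    """Time indices that remain when higher temporal levels are dropped."""
    return [t for t in range(num_frames) if temporal_level(t, gop_size) <= max_level]
```

Under these assumptions, dropping the highest level of a GOP of size 8 keeps every second picture, which is how the hierarchy yields temporal scalability.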

Next, standardization of scalable video coding began in March 2004 at the MPEG meeting of ISO (International Organization for Standardization)/IEC (International Electrotechnical Commission). After it was decided to base the standard on H.264 coding technology, standardization proceeded in the JVT (Joint Video Team) from January 2005 and was completed in July 2007 as H.264 Amendment 3. Scalable video coding is a technology for handling video information in an integrated manner across various kinds of terminals and transmission environments. It generates a single integrated data stream that can support various spatial resolutions, frame rates, and quality levels, so that data can be transmitted efficiently over various transmission environments to various terminals.

FIG. 2 is a diagram comparing a scheme that does not use scalable video coding (hereinafter, the first scheme) with a scheme that uses scalable video coding (hereinafter, the second scheme). In FIG. 2, it is assumed that video content 200 of 4CIF (4×Common Intermediate Format, 704×576) resolution is to be delivered to a CIF (352×288) low-quality two-dimensional display device 210, a CIF high-quality two-dimensional display device 220, and a 4CIF high-quality two-dimensional display device 230.

According to the first scheme 240, the video encoders 242, 244, and 246 must each perform a separate encoding process to generate a separate bitstream 252, 254, or 256 suited to the format of the respective device 210, 220, or 230. In addition, a complicated process of storing and delivering all of the generated bitstreams 252, 254, and 256 is required.

In contrast, according to the second scheme 260, the SVC encoder 265 performs the encoding process of the scalable video coding scheme only once to generate the SVC bitstream 275, after which the bitstream suited to each device 210, 220, or 230 is extracted from the SVC bitstream 275 through the extraction process 285 and transmitted.
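The extraction step of the second scheme can be illustrated with a small sketch. The layer layout below (a CIF base layer, a CIF quality enhancement layer, and a 4CIF spatial enhancement layer) is an assumption chosen to match the three devices of FIG. 2, and `Layer` and `extract_for_device` are hypothetical names rather than part of any SVC API.

```python
from typing import List, NamedTuple


class Layer(NamedTuple):
    """Description of one layer inside a single SVC bitstream (hypothetical)."""
    layer_id: int
    width: int
    height: int
    quality: int  # 0 = low quality, 1 = high quality


# Assumed layout of the one SVC bitstream produced by a single encoding pass.
SVC_LAYERS = [
    Layer(0, 352, 288, 0),  # base layer: CIF, low quality
    Layer(1, 352, 288, 1),  # quality enhancement: CIF, high quality
    Layer(2, 704, 576, 1),  # spatial enhancement: 4CIF, high quality
]


def extract_for_device(
    layers: List[Layer], max_width: int, max_height: int, max_quality: int
) -> List[Layer]:
    """Keep only the layers a device can use; no re-encoding is involved."""
    return [
        l
        for l in layers
        if l.width <= max_width
        and l.height <= max_height
        and l.quality <= max_quality
    ]
```

With this layout, the CIF low-quality device receives only the base layer, the CIF high-quality device receives the first two layers, and the 4CIF device receives all three, each obtained from the same bitstream.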

A multi-view video coding standard, which efficiently codes multi-view images to support realistic display devices, and a scalable video coding standard, which can code and transmit video efficiently and in an integrated manner across various transmission environments and various kinds of terminals, both exist. However, there is currently no method for processing video information in an integrated manner for both realistic terminals and existing two-dimensional terminals in a ubiquitous environment.

Advances in video coding technology have given users access to high-quality, high-resolution video information, but these technologies are merely methods for supporting two-dimensional flat display devices and do not take into account realistic services that give the user a sense of depth or allow the user to freely select a viewpoint.

In addition, as broadcasting converges with communication and wired networks converge with wireless ones, it will become necessary to deliver video information efficiently in a ubiquitous environment in which various kinds of terminals coexist across various transmission environments. Because existing video coding methods are designed to perform coding restricted to a specific transmission environment and a specific terminal in each application, however, they cannot handle this efficiently.

If a single piece of video content were to be served using existing video coding methods, encoding would have to be performed repeatedly to support the various transmission environments and terminals, so the task can be expected to be quite difficult and complicated.

Interest in and user demand for realistic content are currently very high, and various kinds of realistic display devices are expected to be commercialized. User interest in realistic content is growing rapidly, led by the film industry, and realistic display devices such as personal stereoscopic display devices and multi-view image display devices are being developed on various platforms.

Accordingly, so that realistic video content can be delivered efficiently over various transmission environments to various terminals, a technology is needed that supports realistic services and processes, in an integrated manner, video information supporting the various transmission environments and the various formats of the various terminals.

Hereinafter, the multi-view scalable video encoding/decoding technique according to each embodiment is described in detail with reference to FIGS. 3 to 29.

FIG. 3 is a block diagram illustrating the configuration of a multi-view scalable video encoding apparatus according to an embodiment.

In one embodiment, the multi-view scalable video encoding apparatus 300 includes an encoding unit 320 and an output unit 330. In another embodiment, as shown in FIG. 3, the multi-view scalable video encoding apparatus 300 may further include an acquisition unit 310.

The acquirer 310 acquires a plurality of video images having different viewpoints (hereinafter, view video images) and outputs the view video images in a preset image processing unit. Examples of how the video images may be acquired include: the acquirer 310 having a capture module (e.g., a multi-view camera or a plurality of cameras) and capturing the view video images itself; the acquirer 310 receiving the view video images from an external capture module; and reading view video images recorded on a storage medium (e.g., a hard disk or CD-ROM). The acquisition method is not limited to these. Examples of the preset image processing unit include a picture, a slice, a frame, and a field, but the unit is not limited to these. For convenience, this specification mainly describes the image processing of the disclosed technology in units of pictures; those working in this field will readily understand that image processing using other units (e.g., slices) also falls within the scope of the invention.

The encoder 320 encodes the plurality of view video images provided by the acquirer 310, applying multi-view video coding to at least one of the view video images and scalable video coding to at least one of the view video images, and thereby generates per-view, per-layer bitstreams. Here, the N view video images (first to N-th view video images) may or may not have the same resolution. In one embodiment, the encoder 320 may apply both multi-view video coding and scalable video coding to at least one of the plurality of view video images.

One way to apply multi-view video coding to the video image of a given view is to encode that image through inter-view prediction based on video image information of another view, although the method is not limited to this. In this specification, the term video image information covers both reconstructed video images (e.g., a reconstructed video image of another view, a reconstructed video image of a lower layer, or, when scalable video coding has been applied to the video image of another view, the per-layer reconstructed video images of that view) and coding information (e.g., coding information of another view, coding information of a lower layer, or, when scalable video coding has been applied to the video image of another view, the per-layer coding information of that view). The term coding information covers motion vectors, the block types used in encoding (e.g., the block type used for motion estimation or the block type used for the DCT), transform coefficients, and the like.

Likewise, one way to apply scalable video coding to the video image of a given view is to encode the video image of the current layer based on that image and on the video image information obtained by encoding a lower layer of the same view, although the method is not limited to this. Examples of scalable video coding include spatial, temporal, and quality scalable video coding. In one example of spatial scalable video coding, a current-layer image and a lower-layer image are generated according to spatial resolution, and the current-layer image can be predicted with reference to the lower-layer image. In one example of temporal scalable video coding, upper- and lower-layer images are generated according to frame rate, and in one example of quality scalable video coding, upper and lower layers are distinguished according to image quality.

In one example of the multi-view scalable video coding performed by the encoder 320, for a given view, the input image is downsampled to generate a lower-layer image of reduced resolution, and temporal scalable video coding and quality scalable video coding may be performed on the lower-layer image. To perform scalable video coding on the current-layer image of the input image, the encoder 320 according to an embodiment may then perform spatial scalable video coding with reference to the result of the temporal scalable video coding of the lower-layer image, and may selectively perform temporal scalable video coding and quality scalable video coding. The temporal scalable video coding of the lower layer and the spatial scalable video coding of the current layer refer to coding information of the images of views other than the given view, which is how this example of multi-view scalable video coding by the encoder 320 is carried out.

In one embodiment, the encoder 320 may adaptively set a prediction structure for each layer and encode according to the prediction structure set for that layer. For example, a temporal-direction prediction structure or a view-direction prediction structure may be adaptively determined for each spatial layer or quality layer. Hereinafter, a prediction structure set individually for each layer in this way is referred to as a 'per-layer prediction structure'.

In one embodiment, the encoder 320 using the per-layer prediction structure may first set the prediction structures for all layers and then encode each layer according to its prediction structure. In another embodiment, the encoder 320 may, layer by layer, set the prediction structure of a layer and then encode that layer according to the structure just set, repeating this for each layer.

In one embodiment, the encoder 320 using the per-layer prediction structure determines whether the prediction structure of the current layer of the current-view video image is the same as the prediction structure of a lower layer of that image, decides according to the result whether to use the lower layer's prediction structure, and encodes the prediction structure of the current layer accordingly. The lower layer referred to here may be the layer immediately below the current layer or a layer selected from among a plurality of lower layers.
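The comparison described above can be pictured with a minimal Python sketch. It assumes, purely for illustration, that a prediction structure is modeled as two booleans and that reuse is signaled with a single flag; the names `PredictionStructure` and `signal_prediction_structure` are hypothetical, and the specification does not define this signaling.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PredictionStructure:
    """Hypothetical model: which prediction directions a layer uses."""
    temporal: bool  # temporal-direction prediction enabled
    view: bool      # view-direction prediction enabled


def signal_prediction_structure(
    current: PredictionStructure,
    lower: Optional[PredictionStructure],
) -> Tuple[bool, Optional[PredictionStructure]]:
    """Return (reuse_flag, explicit_structure).

    If the current layer's structure equals the referenced lower layer's,
    only a reuse flag is needed; otherwise the structure is sent explicitly.
    """
    if lower is not None and current == lower:
        return True, None
    return False, current
```

With a base layer that predicts in both directions and an enhancement layer that predicts only temporally, the sketch falls back to explicit signaling for the enhancement layer.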

In one embodiment, the encoder 320 using the per-layer prediction structure may set the view-direction prediction structure of a spatial enhancement layer independently of that of the spatial base layer. For example, the encoder 320 may set a prediction structure in which the spatial base layer performs view-direction prediction for all pictures, whereas the spatial enhancement layer performs no view-direction prediction for non-anchor pictures, or for any picture.

In one embodiment, when the encoder 320 is configured, under the per-layer prediction structure, to predict the prediction structure of the current layer using coding information of a lower layer, but that coding information has a structure that cannot be used to encode the current layer's prediction structure (e.g., the lower layer's coding information does not conform to the prediction structure of the current-layer picture), the encoder 320 may refrain from using the lower layer's coding information.

In another embodiment, the encoder 320 may use only the portion of the lower layer's coding information that conforms to the prediction structure of the current layer. For example, if the prediction structure of the current-layer picture permits only temporal-direction prediction while the lower layer's coding information results from both temporal-direction and view-direction prediction, the encoder 320 may encode according to the current layer's prediction structure by referring only to the temporal-direction prediction information in the lower layer's coding information. As another example, if the current layer's prediction structure permits only temporal-direction prediction while the lower layer's coding information results from view-direction prediction, the encoder 320 may refer only to the block type and partition information of the lower layer.
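The two examples above amount to a per-block filter on the lower layer's coding information. The following Python sketch illustrates that reading; the `LowerLayerBlock` container, its field names, and the string labels `"temporal"` and `"view"` are assumptions introduced here and are not part of the disclosure.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class LowerLayerBlock:
    """Hypothetical coding information for one lower-layer block."""
    block_type: str                           # e.g. the motion-estimation block type
    partition: str                            # partition information
    direction: Optional[str]                  # "temporal" or "view"
    motion_vector: Optional[Tuple[int, int]]  # vector found in that direction


def reusable_info(block: LowerLayerBlock,
                  allowed_directions: FrozenSet[str]) -> LowerLayerBlock:
    """Keep only what conforms to the current layer's prediction structure."""
    if block.direction in allowed_directions:
        # The prediction direction is permitted: reuse everything.
        return block
    # Direction not permitted (e.g. view-direction data for a temporal-only
    # layer): keep only the block type and partition information.
    return replace(block, direction=None, motion_vector=None)
```

A current layer that permits only temporal prediction would call `reusable_info(block, frozenset({"temporal"}))` for each lower-layer block.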

The output unit 330 combines the per-view, per-layer bitstreams generated by the encoder 320 in a preset order based on view and layer to generate an output bitstream (i.e., a multi-view scalable video bitstream). In one embodiment, the output unit 330 may generate per-time access units, each of which lists the per-view, per-layer bitstreams belonging to the same time instant. An access unit is a kind of NAL unit. The per-time access units may be arranged in temporal order to form the multi-view scalable video bitstream. The method of constructing the multi-view scalable video bitstream is described in more detail below with reference to FIGS. 9 to 13.
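As a rough illustration of the grouping performed by the output unit 330, the sketch below collects sub-bitstreams by time instant and orders each group by view and then by layer. That particular ordering is an assumption made for the example; the specification only states that a preset order based on view and layer is used. `SubBitstream` and `build_access_units` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class SubBitstream(NamedTuple):
    """Hypothetical per-view, per-layer bitstream fragment for one time instant."""
    time: int
    view: int
    layer: int
    payload: bytes


def build_access_units(
    substreams: Iterable[SubBitstream],
) -> List[List[SubBitstream]]:
    """Group fragments into per-time access units, listed in temporal order."""
    by_time: Dict[int, List[SubBitstream]] = defaultdict(list)
    for s in substreams:
        by_time[s.time].append(s)
    # Assumed preset order inside an access unit: view first, then layer.
    return [
        sorted(by_time[t], key=lambda s: (s.view, s.layer))
        for t in sorted(by_time)
    ]
```

Concatenating the payloads of the returned access units in order would then give one output bitstream.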

In one embodiment, to identify the per-view, per-layer bitstreams of an access unit, the multi-view scalable video encoding apparatus 300 may transmit syntax that includes identification information for spatial scalability information, identification information for temporal scalability information, identification information for quality scalability information, and identification information for the per-view, per-layer bitstreams. This identification information is described in more detail later.

FIG. 4 is a block diagram illustrating the configuration of a multi-view scalable video decoding apparatus according to an embodiment.

In one embodiment, the multi-view scalable video decoding apparatus 400 includes a divider 420 and a decoder 430. In another embodiment, as shown in FIG. 4, the multi-view scalable video decoding apparatus 400 may further include a receiver 410.

The receiver 410 receives a bitstream encoded by the multi-view scalable video coding scheme (i.e., a multi-view scalable video bitstream). Examples of how the bitstream may be received include receiving it over wired or wireless communication or broadcasting and reading a bitstream recorded on a storage medium; those working in this field will readily understand that various reception methods may exist depending on the environment in which the multi-view scalable video decoding apparatus 400 is installed.

In one embodiment, the multi-view scalable video decoding apparatus 400 may select, from the received bitstream, the bitstreams of the portion it wishes to decode. For example, according to a decoder request or the playback capability of the display device, the bitstream to be decoded may be identified by a specific format such as spatial resolution, temporal resolution, or image quality. Because the multi-view scalable video bitstream according to an embodiment supports spatial, temporal, and quality scalability, the bitstream corresponding to video suited to such a specific format can be selected from the multi-view scalable video bitstream.

The divider 420 divides the bitstream received by the receiver 410 into per-view, per-layer bitstreams based on the plurality of views and the plurality of layers used in the multi-view scalable video coding.

The divider 420 according to an embodiment may divide the bitstream into per-time access units arranged in temporal order, and divide each per-time access unit into the per-view, per-layer bitstreams of the same time instant. The divider 420 according to an embodiment may also separate, from each per-time access unit, the per-layer bitstream for each of one or more kinds of layers.

The divider 420 according to an embodiment may identify and separate the per-view, per-layer bitstreams of an access unit according to the following items in the syntax of the bitstream: the identification information for spatial scalability information, for temporal scalability information, and for quality scalability information, and the identification information for the per-view, per-layer bitstreams. Using the same identification information, the divider 420 can selectively extract the desired bitstream fragments.
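Selective extraction by identification information can be sketched as a simple filter. The field names `view_id`, `spatial_id`, `temporal_id`, and `quality_id` below are hypothetical stand-ins for the identification information mentioned above, and the rule "keep every fragment whose identifiers do not exceed the target" is an assumption that higher layers depend on all lower ones.

```python
from typing import Iterable, List, NamedTuple, Set


class TaggedFragment(NamedTuple):
    """Hypothetical bitstream fragment tagged with its identification info."""
    view_id: int
    spatial_id: int
    temporal_id: int
    quality_id: int
    payload: bytes


def extract_fragments(
    fragments: Iterable[TaggedFragment],
    views: Set[int],
    max_spatial: int,
    max_temporal: int,
    max_quality: int,
) -> List[TaggedFragment]:
    """Keep the fragments needed for the requested views and layer limits."""
    return [
        f for f in fragments
        if f.view_id in views
        and f.spatial_id <= max_spatial
        and f.temporal_id <= max_temporal
        and f.quality_id <= max_quality
    ]
```

A single-view, low-resolution terminal might, for instance, request `views={0}` with all three limits set to 0, while a stereoscopic terminal would request two views with higher limits.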

The decoder 430 performs multi-view scalable video decoding on the per-view, per-layer bitstreams separated by the divider 420, using inter-view information and inter-layer information.

The decoder 430 according to an embodiment may perform multi-view video decoding on the video image of the current view with reference to video image information of a view other than the current view among the plurality of views. The decoder 430 may also perform scalable video decoding using at least one of the video image of the current layer of the current view, video image information of a lower layer, and inter-layer prediction information between the current layer and the lower layer. Scalable video decoding according to an embodiment may include at least one of spatial, temporal, and quality scalable video decoding of the plurality of per-view, per-layer bitstreams.

The decoder 430 according to an embodiment may determine whether to perform scalable video decoding according to whether a lower-layer video image, or lower-layer video image information, exists for the video image of the current view, and may determine whether to perform multi-view video decoding according to whether view-direction predictive coding was performed. Depending on the result, the decoder 430 may perform one of multi-view scalable video decoding, single-view scalable video decoding, multi-view video decoding, and single-view video decoding on the current-view image.
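The two determinations above yield four decoding modes. A minimal Python sketch of that decision follows; the function name, parameter names, and mode labels are illustrative and do not appear in the specification.

```python
def select_decoding_mode(has_lower_layer_info: bool,
                         uses_view_prediction: bool) -> str:
    """Pick the decoding mode for the current-view image.

    has_lower_layer_info: a lower-layer image or its video image
        information exists for the current view (scalable decoding applies).
    uses_view_prediction: view-direction predictive coding was performed
        (multi-view decoding applies).
    """
    if has_lower_layer_info and uses_view_prediction:
        return "multi-view scalable"
    if has_lower_layer_info:
        return "single-view scalable"
    if uses_view_prediction:
        return "multi-view"
    return "single-view"
```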

The decoder 430 according to an embodiment may perform decoding according to the per-layer prediction structure, in which a prediction structure is set individually for each layer. The prediction structure of the current layer can therefore be predicted by selectively referring to the prediction structure of a lower layer.

Accordingly, the decoder 430 using the per-layer prediction structure may set the prediction structures for all layers and then decode each layer according to its prediction structure. Alternatively, the decoder 430 may, layer by layer, set the prediction structure of a layer and then decode that layer, repeating this for each layer.

The decoder 430 using the per-layer prediction structure may also determine whether the prediction structure of the current layer of the current-view image is the same as that of a lower layer of the current-view image, decide according to the result whether to refer to the lower layer's prediction structure, and decode the prediction structure of the current-view image accordingly. The lower layer referred to here may be the layer immediately below the current layer or a layer selected from among a plurality of lower layers.

When, under the per-layer prediction structure, the coding information of a lower layer does not conform to the prediction structure of the current layer, the decoder 430 may either not use the lower layer's coding information or predict the current layer's prediction structure by referring only to the portion of that coding information that conforms to the current layer's prediction structure.

Under the per-layer prediction structure, the decoder 430 may determine the view-direction prediction structure of a spatial enhancement layer independently of that of the spatial base layer.

Video decoded by the decoder 430 according to an embodiment can be reconstructed as multi-view video in various formats with various resolutions, views, image qualities, and frame rates. Multi-view video in these various formats can be played back by display devices that support them.

The multi-view scalable video encoding apparatus 300 according to an embodiment can encode content in a variety of formats and transmit it as a bitstream: various view types, including conventional two-dimensional display, stereoscopic display, multi-view video display, and free view-selectable display; various screen sizes, including QVGA, SD, HD, and Full HD; various image qualities, including VCD, DVD, and HDTV; and various temporal resolutions, including 5 Hz, 15 Hz, 30 Hz, and 60 Hz.

In addition, the multi-view scalable video decoding apparatus 400 according to an embodiment can extract content by selecting, from the received bitstreams, the bitstream corresponding to content of the desired format. Content suited to each environment can thus be provided to display devices that support various views, screen sizes, image qualities, and temporal resolutions.

Thus, with the multi-view scalable video coding scheme implemented by the multi-view scalable video encoding apparatus 300 and the multi-view scalable video decoding apparatus 400 according to an embodiment, content suited to each format can be provided to display devices that support various views, screen sizes, image qualities, and temporal resolutions, and video information in various formats can be processed in an integrated manner through a single bitstream and transmitted and received efficiently.

FIG. 5 is a schematic diagram of an immersive multi-view scalable video service that can be implemented according to an embodiment.

The multi-view scalable video encoding apparatus 300 and the multi-view scalable video decoding apparatus 400 according to an embodiment can support various views, resolutions, image qualities, and frame rates, and can therefore deliver immersive video content efficiently over various transmission environments and to various terminals.

The immersive multi-view scalable video service 500 can provide video content in various formats from HD-resolution multi-view video content 510. Within the service 500, the encoder operation stage 520 may be implemented by the multi-view scalable video encoding apparatus 300 according to an embodiment, and the bitstream extraction stage 540 may be implemented by the multi-view scalable video decoding apparatus 400 according to an embodiment.

The encoder operation stage 520 encodes the HD-resolution multi-view video content 510 once to generate a bitstream 530. The bitstream is passed to the bitstream extraction stage 540, where a bitstream suited to each environment is extracted for display devices supporting various resolutions, image qualities, temporal resolutions, and views.

Of the extracted video content, low-quality, fixed-view video content may be provided to a small two-dimensional display device 550, and high-quality, fixed-view video content may be provided to a small two-dimensional display device 551, an SD-class two-dimensional display device 552, and an HD-class two-dimensional display device 553 for playback.

Also, of the extracted video content, high-quality, view-selectable video content may be provided to an HD-class two-dimensional display device 554; high-quality video content may be provided to a small stereo display device 555, an HD-class stereo display device 556, and a multi-view display device 558; and low-quality video content may be provided to a multi-view display device 557 for playback. The display devices 554, 555, 556, 557, and 558 are immersive display devices capable of playing back multi-view video content.

The immersive multi-view scalable video service 500 according to an embodiment makes it easy to support many kinds of applications. View types include conventional two-dimensional display, stereoscopic display, multi-view video display, and free view-selectable display; screen sizes include QVGA, SD, HD, and Full HD; image qualities include VCD, DVD, and HDTV; and frame rates include 5 Hz, 15 Hz, 30 Hz, and 60 Hz.

With the immersive multi-view scalable video service 500 according to an embodiment, various views, image qualities, resolutions, and frame rates can be supported in an integrated manner within a ubiquitous computing environment, and video content suited to each environment can be delivered efficiently.

Hereinafter, the encoding method of the multi-view scalable video encoding apparatus 300 according to an embodiment is described in detail with reference to FIGS. 6 to 8.

FIG. 6 illustrates the structure of an encoder implemented for multi-view video encoding according to an embodiment.

FIG. 6 illustrates a specific configuration of the encoder according to an embodiment.

Referring to FIG. 6, the encoder 700 includes first to Nth sub-encoders 710, 720, 730, and 740, which encode the first to Nth view video images respectively, and a multiplexer (MUX) 750, which multiplexes the per-view, per-layer bitstreams generated by the sub-encoders into an output bitstream. The first to Nth sub-encoders 710, 720, 730, and 740 of FIG. 6 correspond to the encoding unit 320 of FIG. 3, and the multiplexer 750 of FIG. 6 may correspond to the output unit 330 of FIG. 3.

Each of the sub-encoders 710, 720, 730, and 740 is capable of single-view video coding and of scalable video coding that supports various scalability functions, including spatial scalability, which provides various picture resolutions; temporal scalability, which provides various frame rates; and quality scalability, which provides various image qualities. In addition, at least one of the sub-encoders (the second to Nth sub-encoders in FIG. 6) is capable of predictive encoding that references one or more other images encoded and reconstructed by another sub-encoder (the first sub-encoder in FIG. 6). Each of the sub-encoders 710, 720, 730, and 740 may perform encoding that combines these functions in various ways.

In one embodiment, as shown in FIG. 6, the sub-encoders 710, 720, 730, and 740 may exist as physically or logically separate units, according to the number of input images. In this case, the sub-encoders 710, 720, 730, and 740 may carry out their encoding processes in parallel.

In another embodiment, as described later with reference to FIG. 25, only one sub-encoder may exist physically or logically, and it may sequentially perform the form of encoding that each image requires, following the coding order of the images.

The output data produced by each of the sub-encoders 710, 720, 730, and 740 after encoding (that is, the per-view, per-layer bitstreams) are output through the multiplexer 750 as a single integrated bitstream (that is, a multi-view scalable bitstream).

FIG. 7 illustrates the detailed structure of the encoder 700 of FIG. 6 for multi-view scalable video coding.

The encoder 800 receives video images from three views (first to third video images) and, as shown in FIG. 7, can perform multi-view scalable video coding with a scalability function that supports two spatial scalable layers per view. The encoder 800 also supports quality scalability and temporal scalability for each spatial scalable layer, and can perform inter-view predictive coding by referencing image information from other views.

To support two spatial scalability layers, each of the first to third sub-encoders 810, 820, and 830 may down-sample its input image to generate a lower-layer image at the desired low resolution.
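As an illustration only, the down-sampling step could be sketched as follows. The 2:1 ratio, the averaging (box) filter, and the name `downsample_2x` are assumptions made for this sketch; the description does not fix a ratio or a filter.

```python
from typing import List

Frame = List[List[int]]  # rows of luma samples; a simplified stand-in for a picture


def downsample_2x(frame: Frame) -> Frame:
    """Halve width and height by averaging each 2x2 block (assumed box filter)."""
    height, width = len(frame), len(frame[0])
    if height % 2 or width % 2:
        raise ValueError("frame dimensions must be even for 2:1 down-sampling")
    return [
        [
            # +2 rounds to nearest before the integer division by 4
            (frame[y][x] + frame[y][x + 1] + frame[y + 1][x] + frame[y + 1][x + 1] + 2) // 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]
```

In this sketch the result would feed the base-layer coder (the VC or MVC unit), while the full-resolution input would feed the spatial enhancement-layer coder.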

In the first sub-encoder 810, the down-sampling unit 811 down-samples the first-view video image, and the VC unit 813 encodes the down-sampled first-view video image. In one embodiment, the VC unit 813 may perform temporally scalable video coding. In this case, the output of the VC unit 813 contains encoded information for the spatial base layer (that is, the spatial lower layer) with temporal scalability supported. One example of a temporally scalable video coding method is the use of a hierarchical B-picture structure, as in H.264, although the method is not limited to this.
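To make the hierarchical B-picture idea concrete, the sketch below assigns a temporal layer to each frame of a dyadic group of pictures. The dyadic layout, the GOP size of 8, and the function names are assumptions for illustration; the description only names hierarchical B-pictures as one possible method.

```python
from typing import List


def temporal_id(frame_index: int, gop_size: int = 8) -> int:
    """Temporal layer of a frame in a dyadic hierarchical-B GOP (assumed layout)."""
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    levels = gop_size.bit_length() - 1          # log2(gop_size)
    pos = frame_index % gop_size
    if pos == 0:
        return 0                                # key picture: lowest temporal layer
    trailing_zeros = (pos & -pos).bit_length() - 1
    return levels - trailing_zeros


def frames_up_to_layer(num_frames: int, max_tid: int, gop_size: int = 8) -> List[int]:
    """Frames that remain when every layer above max_tid is discarded."""
    return [i for i in range(num_frames) if temporal_id(i, gop_size) <= max_tid]
```

Dropping the highest temporal layer in this layout halves the frame rate, which is how a single coded stream can serve, for example, both 30 Hz and 15 Hz terminals.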

The QS-VC unit 815 performs quality-scalable video coding to support quality scalability, based on at least one of the down-sampled first-view video image and the encoding result data (that is, encoded information) of the VC unit 813. The output of the QS-VC unit 815 is encoded information in which quality scalability is supported for the spatial base layer, or for the spatial base layer with temporal scalability supported.

The SS-VC unit 817 applies spatially scalable video coding to the first-view video image. The SS-VC unit 817 may perform predictive coding by referencing information on the already-encoded, down-sampled first-view video image (that is, video image information including the reconstructed image and encoded information) to generate encoded information for the spatial enhancement layer (that is, the spatial upper layer). The SS-VC unit 817 may also optionally perform temporal scalability coding together with the spatially scalable video coding. In this case, the output of the SS-VC unit 817 is encoded information in which temporal scalability is supported for the spatial enhancement layer.

The QS-VC unit 819 performs quality-scalable video coding on the first-view video image, based on the first-view video image and the video image information (for example, the reconstructed image and encoded information) provided by the SS-VC unit 817. The output of the QS-VC unit 819 is encoded information in which quality scalability is supported for the spatial enhancement layer, or for the spatial enhancement layer with temporal scalability supported.

In the second sub-encoder 820, the down-sampling unit 821 down-samples the second-view video image, and the MVC unit 823 encodes the down-sampled second-view video image by referencing the information on the first-view video image (that is, video image information) provided by the first sub-encoder 810. The MVC (Multi-view Video Coding, H.264 Amendment 4) scheme extends the H.264 (MPEG-4 Part 10 Advanced Video Coding (AVC)) scheme by additionally referencing image information of another view at the same time instant that has already been encoded and reconstructed by another encoder. When video coding is performed according to the MVC scheme, the already-encoded image information of another view at the same time instant may or may not be referenced, and this choice can be made per picture, per picture type, or per image sequence. In one embodiment, like the VC unit 813, the MVC unit 823 may also perform temporally scalable video coding.

The M&SS-VC unit 827 performs spatially scalable video coding together with multi-view video coding on the second-view video image, based not only on the second-view video image but also on the information on the first-view video image (that is, video image information) provided by the first sub-encoder 810 and the video image information provided by the MVC unit 823. Thus, unlike the encoded information of the first sub-encoder 810, the encoded information of the second sub-encoder 820 reflects the result data of inter-view predictive coding. The QS-VC units 825 and 829 operate on the same principle as the QS-VC units 815 and 819, so their description is omitted.

The third sub-encoder 830 operates on the same principle as the second sub-encoder 820, so its description is omitted.

The data output by each of the sub-encoders 810, 820, and 830 is output by the multiplexer (MUX) 850 as a single integrated bitstream. That is, the multiplexer 850 of FIG. 7 may correspond to the multiplexer 750 of FIG. 6.

Accordingly, the encoder 800 takes as input three view video images captured by different cameras at different viewpoints and encodes them to support spatial, temporal, and quality scalability. The resulting integrated bitstream can be provided as bitstreams corresponding to various formats that differ in number of views, spatial resolution, frame rate, and image quality.

For example, an H.264 bitstream at the resolution of the down-sampled first-view video image may be provided. An SVC bitstream offering spatial, temporal, and quality scalability for the first-view video image may also be provided. An MVC bitstream covering two to three views at the down-sampled resolution may be provided. A bitstream that supports two to three views at the down-sampled resolution and includes various scalabilities for each image may be provided. Finally, a bitstream that supports two to three views at the same resolution as the input view video images and includes various scalabilities for each image may be provided.

Extending the embodiment of the encoder 800 further, the multi-view scalable video encoding apparatus 300 according to an embodiment may receive any number of view video images, and may generate a multi-view scalable video bitstream in which the bitstreams with scalable layers provided by the sub-encoders are combined in still more varied ways.

The multi-view scalable video bitstream output in the embodiment of the encoder 800 can be backward compatible with various video coding methods. As one example, the output bitstream can be compatible with single-view display terminals that use H.264 (MPEG-4 Part 10 Advanced Video Coding (AVC)), an existing video coding method. As another example, the output bitstream can be compatible with single-view display terminals that use SVC (Scalable Video Coding, H.264 Amendment 3), an existing video coding method. As a further example, the output bitstream can be compatible with stereoscopic display terminals or multi-view display terminals that use MVC (Multi-view Video Coding, H.264 Amendment 4), an existing video coding method.

Hereinafter, the structure of the bitstream generated by the multi-view scalable video encoding apparatus 300 according to an embodiment is described in detail with reference to FIGS. 8 to 12.

FIG. 8 illustrates a structure for combining the output bitstreams according to an embodiment.

The multiplexer 900 of this embodiment may correspond to the multiplexer 750 of FIG. 6. In one embodiment, the multiplexer 900 receives N (N > 1) bitstreams generated by a plurality of sub-encoders, splits each bitstream into pieces of a fixed unit, and combines the resulting pieces to generate a single multi-view scalable video bitstream.

The bitstreams input to the multiplexer 900 are encoded in a form that can support various views and various layers (layers corresponding to the supported scalability functions). For example, an input may be a single-view bitstream with one image quality, or a bitstream supporting various scalabilities such as spatial, temporal, and quality scalability. It may also be a bitstream supporting multiple views.

FIG. 9 illustrates a bitstream generated by multi-view scalable video encoding.

The bitstream 1000 is the single multi-view scalable video bitstream into which the multiplexer 900 combines the N bitstreams it receives. Each of the N bitstreams, bitstream 0, bitstream 1, bitstream 2 through bitstream N, is divided into bitstream fragments of a fixed unit, and the fragments representing pictures of the same time instant are combined with one another to form the overall bitstream.

For example, bitstream 0, bitstream 1, bitstream 2 through bitstream N are divided into bitstream fragments. The fragments 1010, 1020, 1030 through 1040, which belong to one time instant, are combined in order; then, proceeding in time order, the fragments 1050, 1060, 1070 through 1080, which belong to a later time instant, are combined in order.

A bitstream fragment may be, for example, the bitstream corresponding to a picture, or the bitstream corresponding to a slice that forms part of one picture. Each bitstream fragment may be a per-view, per-layer bitstream identified by one view and one layer.

FIG. 10 illustrates the composition of a bitstream generated by multi-view scalable video encoding according to an embodiment.

Each of the N bitstreams 1110, 1120, 1130, and 1140 consists of X sub-access units (Sub-AUs), each of which can be randomly accessed in time. The bitstream output unit 330 according to an embodiment splits the bitstreams 1110, 1120, 1130, and 1140 into sub-access units, combines the sub-access units of the same time instant into one access unit 1152, 1154, or 1156, and then combines the access units 1152, 1154, and 1156 sequentially in time to form the multi-view scalable video bitstream 1150.
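The grouping described above can be sketched as follows. This is a minimal illustration, not the apparatus itself: NAL units are represented by plain string labels, and the name `build_access_units` and the dictionary layout are assumptions introduced here.

```python
from typing import Dict, List

SubAU = List[str]  # NAL-unit labels of one bitstream at one time instant


def build_access_units(streams: Dict[int, List[SubAU]]) -> List[List[str]]:
    """Combine the same-time sub-access units of every bitstream into access units."""
    lengths = {len(sub_aus) for sub_aus in streams.values()}
    if len(lengths) != 1:
        raise ValueError("every bitstream must hold the same number of sub-access units")
    access_units = []
    for t in range(lengths.pop()):
        au: List[str] = []
        for stream_id in sorted(streams):      # fixed, predetermined stream order
            au.extend(streams[stream_id][t])
        access_units.append(au)
    return access_units


streams = {
    0: [["s0-T0"], ["s0-T1"]],
    1: [["s1-T0"], ["s1-T1"]],
    2: [["s2-T0"], ["s2-T1"]],
}
# Access units placed one after another in time order form the output bitstream.
multiview_bitstream = [nal for au in build_access_units(streams) for nal in au]
```

Because each access unit holds everything for one time instant, a receiver can start decoding at an access-unit boundary without scanning the separate per-view streams.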

As one example, a sub-access unit of each bitstream may consist of one picture or slice of a single view. As another example, a sub-access unit may consist of the spatial or quality scalability layers for one picture. As yet another example, it may consist of a picture or slice of one view among several views.

FIG. 11 illustrates per-view, per-layer bitstreams according to an embodiment.

In one embodiment, the encoding unit 320 codes three view video images with different viewpoints in the order view 0, view 2, view 1. Each view consists of two spatial scalability layers (DId 0, DId 1), and each spatial scalability layer in turn consists of two quality scalability layers (QId 0, QId 1). The encoding unit 320 outputs the three resulting bitstreams in the order view 0 bitstream 1200, view 2 bitstream 1260, view 1 bitstream 1230. How the output unit 330 constructs a multi-view scalable video bitstream when it receives the bitstreams 1200, 1260, and 1230 is described in more detail below with reference to FIG. 12.

Each block in FIG. 11 represents a NAL unit. A NAL unit consists of a picture or slice representing one picture, and may contain one complete picture, a residual signal for improving the quality of a lower layer, or residual signals for raising the spatial resolution. The bitstream of each view video image is labeled VId 0, VId 1, or VId 2 to distinguish the views, and the horizontal direction represents time order, labeled T0, T1, T2, and so on. The vertical direction represents the spatial and quality layers at the same time instant: the spatial base layer is labeled DId 0 and the spatial enhancement layer DId 1, while the quality base layer is labeled QId 0 and the quality enhancement layer QId 1.

Each bitstream consists of sub-access units, each accessible at an arbitrary time, and a sub-access unit consists of the NAL units of the spatial and quality layers at the same time instant. For example, the NAL unit 1216 (DId 0, QId 0 at T0), the NAL unit 1211 (DId 0, QId 1 at T0), the NAL unit 1206 (DId 1, QId 0 at T0), and the NAL unit 1201 (DId 1, QId 1 at T0) together form the sub-access unit for time T0. Likewise, the NAL units 1217 (DId 0, QId 0 at T1), 1212 (DId 0, QId 1 at T1), 1207 (DId 1, QId 0 at T1), and 1202 (DId 1, QId 1 at T1) form the sub-access unit for time T1.

FIG. 12 illustrates the detailed composition of a multi-view scalable video bitstream according to an embodiment.

In one embodiment, the output unit 330 combines the sub-access unit obtained from the bitstream of each view with the sub-access units of the other views at the same time T, forming an access unit of the multi-view scalable video bitstream.

For example, the access unit 1310 for time T0 may consist of the T0 sub-access units of VId 0 (1216, 1211, 1206, 1201), the T0 sub-access units of VId 1 (1246, 1241, 1236, 1231), and the T0 sub-access units of VId 2 (1276, 1271, 1266, 1261). One example of the order in which sub-access units are combined into an access unit is view-number order: the T0 sub-access units of VId 0 (1216, 1211, 1206, 1201), then those of VId 1 (1246, 1241, 1236, 1231), then those of VId 2 (1276, 1271, 1266, 1261). Another example is the order in which the bitstreams were coded: the T0 sub-access units of VId 0 (1216, 1211, 1206, 1201), then those of VId 2 (1276, 1271, 1266, 1261), then those of VId 1 (1246, 1241, 1236, 1231).
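The two orderings can be compared with a small sketch that uses the reference numerals of FIG. 11 as stand-ins for the actual NAL units. The function name `assemble_access_unit` and the dictionary layout are assumptions for illustration.

```python
from typing import Dict, List, Sequence

# T0 sub-access units keyed by view id (reference numerals from FIG. 11).
T0_SUB_AUS: Dict[int, List[int]] = {
    0: [1216, 1211, 1206, 1201],
    1: [1246, 1241, 1236, 1231],
    2: [1276, 1271, 1266, 1261],
}


def assemble_access_unit(sub_aus: Dict[int, List[int]], view_order: Sequence[int]) -> List[int]:
    """Concatenate per-view sub-access units following the given view order."""
    if sorted(view_order) != sorted(sub_aus):
        raise ValueError("view_order must list every view exactly once")
    return [nal for view_id in view_order for nal in sub_aus[view_id]]


view_number_order = assemble_access_unit(T0_SUB_AUS, [0, 1, 2])  # VId 0, 1, 2
coding_order = assemble_access_unit(T0_SUB_AUS, [0, 2, 1])       # VId 0, 2, 1
```

Coding order places a referenced view ahead of the views that predict from it, so a decoder reading the access unit front to back would meet each reference before it is needed; view-number order is simpler to index. Which order is used is a design choice that the encoder and decoder must share.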

The access unit 1320 for time T1 may consist of the T1 sub-access units of VId 0 (1217, 1212, 1207, 1202), the T1 sub-access units of VId 1 (1247, 1242, 1237, 1232), and the T1 sub-access units of VId 2 (1277, 1272, 1267, 1262).

The access unit 1330 for time T2 may consist of the T2 sub-access units of VId 0 (1218, 1213, 1208, 1203), the T2 sub-access units of VId 1 (1248, 1243, 1238, 1233), and the T2 sub-access units of VId 2 (1278, 1273, 1268, 1263).

The access unit 1340 for time T3 may consist of the T3 sub-access units of VId 0 (1219, 1214, 1209, 1204), the T3 sub-access units of VId 1 (1249, 1244, 1239, 1234), and the T3 sub-access units of VId 2 (1279, 1274, 1269, 1264).

The access unit 1350 for time TX may consist of the TX sub-access units of VId 0 (1220, 1215, 1210, 1205), the TX sub-access units of VId 1 (1250, 1245, 1240, 1235), and the TX sub-access units of VId 2 (1280, 1275, 1270, 1265).

When each per-view, per-layer bitstream consists of one or more kinds of per-layer bitstreams, the access unit for each time instant may contain each of those per-layer bitstreams.

In one embodiment, the output unit 330 arranges the constructed access units 1310, 1320, 1330, 1340, and 1350 in time order to form the multi-view scalable video bitstream 1330.

Information needed to distinguish the sub-access units of each view video image within the multi-view scalable video bitstream, and to arrange the sub-access units in order within an access unit, may be signaled in the bitstream. As one example implementation, the syntax shown in Table 1 may be added to the NAL unit header.

nal_unit_header_smvc( ) {
    ..........
    dependency_id
    quality_id
    temporal_id
    bitstream distinction_id
    ..........
}

To identify the sub-access units within each bitstream, the multi-view scalable video encoding apparatus 300 according to an embodiment may add to the NAL unit header the syntax element 'dependency_id', which indicates spatial scalability information; the syntax element 'quality_id', which indicates quality scalability information; and the syntax element 'temporal_id', which indicates temporal scalability information.

In addition, to determine the order in which the view video images are combined within an access unit, the multi-view scalable video encoding apparatus 300 according to an embodiment may add to the NAL unit header a 'bitstream distinction_id' that indicates which view video image's bitstream each NAL unit belongs to. An example of the 'bitstream distinction_id' is 'view_id', which identifies the current view among the video images of several views.
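As a rough sketch of how these header fields could drive the arrangement, the example below models the four fields of Table 1 as a record and sorts the NAL units of one access unit. The class name `NalUnitHeaderSmvc`, the use of `view_id` as the distinction identifier, and the sort key are assumptions for illustration; field widths and bit positions are not specified in the description and are not modeled.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence


@dataclass(frozen=True)
class NalUnitHeaderSmvc:
    dependency_id: int  # spatial scalability layer (DId)
    quality_id: int     # quality scalability layer (QId)
    temporal_id: int    # temporal scalability layer
    view_id: int        # stands in for the 'bitstream distinction_id'


def order_within_access_unit(
    headers: Iterable[NalUnitHeaderSmvc], view_order: Sequence[int]
) -> List[NalUnitHeaderSmvc]:
    """Sort NAL units by view position, then spatial layer, then quality layer."""
    rank = {view_id: position for position, view_id in enumerate(view_order)}
    return sorted(headers, key=lambda h: (rank[h.view_id], h.dependency_id, h.quality_id))
```

With `view_order=[0, 2, 1]` this reproduces the coding-order arrangement, and within each view the units run from (DId 0, QId 0) up to (DId 1, QId 1), matching the sequence 1216, 1211, 1206, 1201 given for VId 0.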

Hereinafter, the decoding method of the multi-view scalable video decoding apparatus 400 according to an embodiment is described in more detail with reference to FIGS. 13 to 17.

FIG. 13 illustrates the structure of a decoder implemented for multi-view video decoding according to an embodiment.

The decoder 1400 receives the multi-view scalable video bitstream generated by the encoder, or a bitstream extracted from it for a particular function. The decoder 1400 extracts from the bitstream the data for images at the level to be provided (among various views, image qualities, spatial resolutions, and temporal resolutions), decodes that data, and can output decoded images for M (N ≥ M ≥ 1) views. The decoder 1400 may implement the decoding unit 430 of the multi-view video decoding apparatus 400 according to an embodiment.

FIG. 14 illustrates the structure of a decoder implemented for multi-view scalable video decoding according to an embodiment.

The decoder 1500 is one embodiment that implements the decoding unit 430 of FIG. 4. It receives a multi-view scalable video bitstream, or a bitstream selected or extracted from that bitstream for various applications. The demultiplexer (DEMUX) 1510 separates the input bitstream into data corresponding to M per-view images. Each piece of separated data is input to the corresponding one of the first to Mth sub-decoders 1520, 1530, 1540, and 1550. The sub-decoders 1520, 1530, 1540, and 1550 decode their respective data and output the decoded per-view video images 1 to M.
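The demultiplexing step could be sketched as follows. Here a NAL unit is reduced to a `(view_id, payload)` pair, and the name `demultiplex` is hypothetical; both are simplifications introduced for this example rather than the structure of the DEMUX 1510 itself.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

NalUnit = Tuple[int, bytes]  # (view_id, payload); simplified stand-in for a parsed NAL unit


def demultiplex(nal_units: Iterable[NalUnit], output_views: Sequence[int]) -> Dict[int, List[bytes]]:
    """Split a multiplexed NAL-unit sequence into one payload list per requested view."""
    per_view: Dict[int, List[bytes]] = {view_id: [] for view_id in output_views}
    for view_id, payload in nal_units:
        if view_id in per_view:              # units of views not requested are skipped
            per_view[view_id].append(payload)
    return per_view
```

Each resulting list would be handed to one sub-decoder. A practical demultiplexer would also have to retain any view that a requested view references for inter-view prediction, even when that view is not itself output.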

The sub-decoders 1520, 1530, 1540, and 1550 can decode a bitstream that supports a single view, and can decode bitstreams that support various scalability functions such as spatial, temporal, and quality scalability. The sub-decoders 1520, 1530, 1540, and 1550 can also perform multi-view video decoding on a bitstream through predictive decoding that references image information of one or more other views decoded by another sub-decoder. In short, the sub-decoders 1520, 1530, 1540, and 1550 are decoders capable of decoding bitstreams in which the functions described above are combined in various ways.

For example, the sub-decoders 1520, 1530, 1540, and 1550 may decode a single-view bitstream coded according to H.264. They may also decode, according to SVC, a bitstream supporting spatial scalability, temporal scalability, and quality scalability. According to MVC, they may decode a bitstream by referring to image information of another view decoded by another sub-decoder. In addition, the sub-decoders 1520, 1530, 1540, and 1550 may be configured to decode bitstreams that combine the functions of H.264, SVC, and MVC in various ways.

Only one sub-decoder may exist, physically or logically, in which case it decodes each bitstream sequentially, according to that bitstream's format, in the order in which the view-specific images are decoded. Alternatively, the sub-decoders 1520, 1530, 1540, and 1550 may exist as physically or logically separate units, as many as the number of view-specific images to be output for the input bitstream. In this case, the decoding operations of the sub-decoders 1520, 1530, 1540, and 1550 may be performed in parallel.
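
The demultiplex-then-decode arrangement described above can be sketched as follows. This is only an illustrative sketch: the packet layout `(view_id, payload)`, the function names, and the placeholder `sub_decode` are assumptions made for the example and do not appear in the disclosure, and inter-view dependencies between sub-decoders are ignored for simplicity.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def demux(packets):
    """Group (view_id, payload) packets by view, preserving arrival order."""
    per_view = defaultdict(list)
    for view_id, payload in packets:
        per_view[view_id].append(payload)
    return dict(per_view)


def sub_decode(view_id, payloads):
    """Placeholder sub-decoder (assumption): a real one would run
    H.264, SVC, or MVC decoding on the payloads of one view."""
    return f"view{view_id}:" + "+".join(payloads)


def decode_all(packets, parallel=False):
    """Decode every view, either with one reused sub-decoder or in parallel."""
    per_view = demux(packets)
    views = sorted(per_view)
    if parallel:
        # One logical sub-decoder per view, run concurrently.
        with ThreadPoolExecutor(max_workers=max(1, len(views))) as pool:
            return list(pool.map(lambda v: sub_decode(v, per_view[v]), views))
    # A single sub-decoder handles the views one after another.
    return [sub_decode(v, per_view[v]) for v in views]
```

Both modes return the decoded views in the same order; only the scheduling differs, which mirrors the choice between a single shared sub-decoder and separate sub-decoders per view.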

FIG. 15 illustrates the structure of a decoder implemented for multi-view scalable video decoding according to another embodiment.

The decoder 1600 is another embodiment implementing the decoding unit 430, and corresponds to the decoder 1500 with a bitstream selection module 1610 added. The bitstream selection module 1610 may select only the needed portion of the input bitstream, either according to the decoding capability of the current terminal (for example, screen resolution, frame rate, computing performance, power consumption, or number of views) or according to a request from the terminal.

For example, suppose the bitstream received from a broadcasting network is a multi-view scalable video bitstream that supports spatial resolutions up to 1920×1080, while the current display terminal supports a resolution of 1280×720. The bitstream selection module 1610 may then select only the portion of the received bitstream corresponding to the 1280×720 resolution, and the remaining decoding operations are performed on that portion.
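
The selection step can be illustrated with a minimal sketch. The `LayerUnit` record, its fields, and the selection rule (keep every unit whose resolution does not exceed the terminal's, optionally limited to the first few views) are hypothetical and chosen only to make the 1920×1080 versus 1280×720 example concrete; the disclosure does not specify a data layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerUnit:
    """One unit of the bitstream, tagged with its view and resolution (assumed layout)."""
    view_id: int
    width: int
    height: int
    payload: bytes


def select_bitstream(units, max_width, max_height, max_views=None):
    """Keep only the units the terminal can use.

    Units at or below the terminal's resolution are kept, so lower layers
    that a selected layer may depend on are retained as well.
    """
    selected = []
    for unit in units:
        if unit.width > max_width or unit.height > max_height:
            continue  # exceeds the display resolution
        if max_views is not None and unit.view_id >= max_views:
            continue  # terminal cannot show this many views
        selected.append(unit)
    return selected
```

A terminal limited to 1280×720 would call `select_bitstream(units, 1280, 720)` and pass the result to the sub-decoders.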

FIG. 16 is a flowchart of a method by which each sub-decoder performs decoding in units of pictures or slices, according to an embodiment.

When the decoding unit 430 according to an embodiment includes a plurality of view-specific sub-decoders, each sub-decoder may decode the image of its current view in processing units such as pictures or slices, following the flowchart 1700. For convenience of description, decoding of the current-view image is described as being performed in units of pictures. However, the processing unit of multi-view scalable video coding according to an embodiment is not limited to the picture, and may be any of the processing units used in the technical field, such as a picture, slice, frame, field, or block.

In step 1710, it is determined whether a lower layer exists for the picture currently to be decoded. If a lower layer exists, the decoding operation proceeds to step 1720; otherwise, it proceeds to step 1750.

In step 1720, it is determined whether the picture currently to be decoded uses view-direction prediction from video images of other views decoded by other sub-decoders. If view-direction prediction is used, the decoding operation proceeds to step 1730; otherwise, it proceeds to step 1740.

In step 1730, single-view and multi-view scalable video decoding is performed on the current picture. That is, the current picture is decoded with prediction that refers to video image information of the lower layer and/or video image information of another view obtained through decoding by another sub-decoder, or it is decoded by referring, as prediction data, to video image information of the current layer and current view already decoded by the current sub-decoder. For example, single-view and multi-view scalable video decoding may be implemented as Multi-view Scalable Video Coding (MSVC), which combines the functions of SVC and MVC.

In step 1740, single-view scalable video decoding is performed on the current picture. That is, the current picture is decoded by referring, as prediction data, to video image information of the lower layer, or by referring, as prediction data, to video image information of the current layer and current view already decoded by the current sub-decoder. For example, single-view scalable video decoding may be implemented with SVC.

In step 1750, it is determined whether the picture currently to be decoded uses view-direction prediction from an image of another view decoded by another sub-decoder. If view-direction prediction is used, the decoding operation proceeds to step 1760; otherwise, it proceeds to step 1770.

In step 1760, single-view and multi-view video decoding is performed on the current picture. That is, the current picture is decoded by referring, as prediction data, to information of another-view image decoded by another sub-decoder, or by referring, as prediction data, to information of the current-view video image decoded by the current sub-decoder. For example, single-view and multi-view video decoding may be implemented with MVC.

In step 1770, single-view video decoding is performed on the current picture. That is, the current picture may be decoded by referring, as prediction data, to information of the current-view image decoded by the current sub-decoder. For example, single-view video decoding may be implemented with H.264.
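
The branching of steps 1710 to 1770 reduces to two yes/no questions per picture. The sketch below restates that decision in code; the function name and the string labels are illustrative assumptions, and the labels simply reuse the example schemes named in the text (MSVC, SVC, MVC, H.264).

```python
def select_decoding_mode(has_lower_layer: bool, uses_view_prediction: bool) -> str:
    """Return the example decoding scheme chosen by flowchart 1700."""
    if has_lower_layer:                  # step 1710: lower layer exists
        if uses_view_prediction:         # step 1720
            return "MSVC"                # step 1730: single-view and multi-view scalable
        return "SVC"                     # step 1740: single-view scalable
    if uses_view_prediction:             # step 1750
        return "MVC"                     # step 1760: single-view and multi-view
    return "H.264"                       # step 1770: single-view
```

A sub-decoder would evaluate this once per processing unit (picture, slice, and so on) before decoding it.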

FIG. 17 illustrates the information required for single-view and inter-view scalable video decoding, according to an embodiment.

The decoder 1800, which follows the single-view and inter-view scalable video decoding scheme according to an embodiment, first receives a bitstream in a processing unit such as a picture or slice. Relative to the current layer and current view of the video image corresponding to the bitstream, the decoder 1800 performs decoding using lower-layer information, image information of other views, and image information of the current view as prediction data, and may output decoded image data such as a picture or slice.

The lower-layer information according to an embodiment may include various coding information about the lower layer, such as texture information, motion vectors, reference picture numbers, and residual data information.

The image information of another view according to an embodiment is coding information about the video image of the other view decoded by the decoder for that view, and may be the decoded video image of the other view itself or a part of it. The image information of another view may also be motion information, such as a motion vector or reference picture number of the other-view image, and may include various other coding information about the other-view image.

The image information of the current view according to an embodiment is coding information about the video image of the current view decoded by the current decoder for the current view, and may be an image already decoded by the current decoder or a part of it. The image information of the current view may also include various coding information about the current-view video image, such as motion information (a motion vector, a reference picture number, and the like) of an already decoded current-view image, and information about already decoded regions or pixels within the current image.

With multi-view scalable video encoding according to an embodiment, a bitstream can be generated that integrates the coding information of multi-view images and supports spatial scalable layers, temporal scalable layers, and quality scalable layers. With multi-view scalable video decoding according to an embodiment, the portion corresponding to the video image of a desired spatial layer, temporal layer, and quality layer can be selectively extracted from a bitstream that supports spatial, temporal, and quality scalable layers, and then decoded.

Hereinafter, the layer-by-layer prediction structures of the encoding unit 320 according to an embodiment and the decoding unit 430 according to an embodiment are described in detail with reference to FIGS. 18 to 24.

FIG. 18 illustrates a flowchart 1900 of one method of setting the prediction structure of the current layer, under a layer-by-layer prediction structure according to an embodiment.

In an embodiment, the encoding unit 320 may, layer by layer, set the temporal-direction and/or view-direction prediction structure of the current layer and then code the current layer according to the prediction structure thus set, repeating this operation for every layer. On the same principle as the encoding unit 320 of the embodiment described above, the decoding unit 430 may, layer by layer, set the temporal-direction and/or view-direction prediction structure of the current layer and then decode the current layer according to the prediction structure thus set.

In step 1910, the prediction structure of the current layer is set.

In step 1920, the current layer is coded according to the prediction structure of the current layer set in step 1910.

Steps 1910 and 1920 are repeated as many times as there are layers. Here, the layers may be layers of every kind or layers of one specific kind.

FIG. 19 illustrates a flowchart 2000 of another method of setting the prediction structure of the current layer, under a layer-by-layer prediction structure according to an embodiment.

In an embodiment, the encoding unit 320 may set the temporal-direction and/or view-direction prediction structure of every layer before performing any coding, and then code the layers one after another according to the previously set prediction structures. On the same principle as the encoding unit 320 of the embodiment described above, the decoding unit 430 may set the temporal-direction and/or view-direction prediction structures of all layers, and then decode the layers one after another according to the previously set prediction structures.

In step 2010, a layer-specific prediction structure is set for every layer.

In step 2020, the current layer is coded according to the prediction structure set in step 2010. Coding according to the prediction structure set in step 2010 is carried out layer by layer in sequence, and this operation is repeated as many times as there are layers. Here, the layers may be layers of every kind or layers of one specific kind.
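
The two orderings of flowcharts 1900 and 2000 differ only in when the prediction structures are fixed. The following sketch contrasts them under stated assumptions: `choose_structure` and `code_layer` are hypothetical callbacks standing in for the structure-setting and coding operations, which the text does not specify in detail.

```python
def code_per_layer(layers, choose_structure, code_layer):
    """Flowchart 1900: for each layer, set its structure, then code it."""
    results = []
    for layer in layers:
        structure = choose_structure(layer)            # step 1910
        results.append(code_layer(layer, structure))   # step 1920
    return results


def code_after_setting_all(layers, choose_structure, code_layer):
    """Flowchart 2000: set every structure first, then code layer by layer."""
    structures = {layer: choose_structure(layer) for layer in layers}  # step 2010
    return [code_layer(layer, structures[layer]) for layer in layers]  # step 2020
```

When `choose_structure` depends only on the layer itself, both functions produce the same output; the first form allows the choice for a layer to be made after earlier layers have been coded.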

Under a layer-by-layer prediction structure according to an embodiment, predictive coding is performed according to a temporal-direction or view-direction prediction structure that is set adaptively for each scalable layer, such as a spatial layer, temporal layer, or quality layer. This can improve the random access performance of the coding.

Various schemes may be used to set the prediction structure of each layer. To set the prediction structure of each layer, the encoding unit 320 according to an embodiment may receive a user command and set the temporal-direction and/or view-direction prediction structure of the current layer accordingly. The encoding unit 320 according to an embodiment may also follow a predetermined layer-specific prediction structure. For example, when spatial scalable video coding is performed on two spatial layers, both a temporal-direction prediction structure and a view-direction prediction structure may be adopted in the spatial base layer, while only a temporal-direction prediction structure is adopted in the spatial enhancement layer.

The decoding unit 430 according to an embodiment may set the prediction structure of the current layer based on information transmitted from the encoder. The decoding unit 430 according to an embodiment may also set the layer-specific prediction structure to a prediction structure agreed on in advance by the encoder and the decoder.

FIG. 20 illustrates a flowchart 2100 of one method of coding the prediction structure of the current layer by selectively using the prediction structure of a lower layer, under a layer-by-layer prediction structure according to another embodiment.

Under the layer-specific prediction structure according to another embodiment, the encoding unit 320 according to an embodiment and the decoding unit 430 according to an embodiment may selectively use the prediction structure of a lower layer to code the prediction structure of the current layer.

In step 2110, it is determined whether the prediction structure of the layer immediately below is to be used for coding the prediction structure of the current layer. If the prediction structure of the lower layer is used, the process proceeds to step 2120; otherwise, it proceeds to step 2130.

In step 2120, the prediction structure of the current layer is set to be the same as that of the lower layer. In step 2130, the prediction structure of the current layer is set separately.

According to the method, described above with reference to FIG. 20, of setting the prediction structure of the current layer by selectively referring to the prediction structure of a lower layer, whether the immediately lower layer's prediction structure is referred to can be determined by coding information indicating whether the prediction structure of the current layer uses that of the immediately lower layer. The multi-view scalable video encoding apparatus 300 according to an embodiment may code, as a flag, the information on whether the prediction structure of the lower layer is used for coding the prediction structure of the current layer.

The multi-view scalable video decoding apparatus 400 according to an embodiment extracts the flag information from the received bitstream. If the flag is 1, it sets the prediction structure of the current layer of the bitstream to be the same as that of the lower layer; if the flag is 0, it may set the prediction structure of the current layer separately.

Assuming that prediction structure information is coded for each layer, the multi-view scalable video encoding apparatus 300 according to an embodiment and the multi-view scalable video decoding apparatus 400 according to an embodiment may use the syntax shown in Table 2.

layer_prediction_structure( ) {
    use_lower_layer_pred_struct_flag
    if( use_lower_layer_pred_struct_flag = = 0 ) {
        (set the prediction information of the current layer) ...
    }
}

The syntax 'layer_prediction_structure', which codes the layer-specific prediction structure information, may include 'use_lower_layer_pred_struct_flag', a flag indicating whether the prediction structure of the lower layer is used for coding the prediction structure of the current layer.

If the prediction structure of the current layer uses that of the immediately lower layer, 'use_lower_layer_pred_struct_flag' is set to 1, and the prediction structure of the current layer may be set to be the same as the information of the lower layer.

If the prediction structure of the lower layer is not referred to in predicting the prediction structure of the current layer, 'use_lower_layer_pred_struct_flag' is set to 0, and the prediction information of the current layer is set separately.

For example, the prediction structure of the current layer may be both a view-direction prediction structure and a temporal-direction prediction structure. As another example, it may be a view-direction prediction structure only, and as yet another example, a temporal-direction prediction structure only.
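
A decoder-side reading of the Table 2 syntax might look like the sketch below. It is illustrative only: the bit source is modeled as a plain Python iterator of integers, and `read_own_structure` is a hypothetical callback for the part of the syntax that the table leaves unspecified ("set the prediction information of the current layer").

```python
def parse_layer_prediction_structure(bits, lower_structure, read_own_structure):
    """Parse one layer_prediction_structure( ) element.

    bits               -- iterator yielding the coded values in order (assumed model)
    lower_structure    -- prediction structure of the immediately lower layer
    read_own_structure -- callback that reads the current layer's own structure
    """
    use_lower_layer_pred_struct_flag = next(bits)
    if use_lower_layer_pred_struct_flag == 0:
        # Flag 0: the current layer's prediction information is set separately.
        return read_own_structure(bits)
    # Flag 1: reuse the immediately lower layer's prediction structure.
    return lower_structure
```

With a flag of 1 the call returns `lower_structure` unchanged and consumes no further values, which is where the bit saving of this scheme comes from.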

FIG. 21 illustrates a flowchart 2200 of another method of coding the prediction structure of the current layer by selectively using the prediction structure of a lower layer, under a layer-by-layer prediction structure according to another embodiment. For convenience, the description focuses on the operation of the encoding unit 320; the corresponding operation of the decoding unit 430 follows the same principle.

As another example of a method that allows the prediction structure of a lower layer to be used selectively in setting the prediction structure of the current layer, the encoding unit 320 according to an embodiment may, when the prediction structure of the N-th lower layer counted from the current layer is the same as that of the current layer, code the prediction structure of the current layer using the prediction structure of that N-th lower layer.

In step 2210, it is determined whether the prediction structure of a specific layer among the lower layers is to be used in coding the prediction structure of the current layer. If the prediction structure of a specific layer is used, the process proceeds to step 2220; otherwise, it proceeds to step 2230.

In step 2220, the prediction structure of the current layer is set to be the same as that of the specific N-th lower layer. In step 2230, the prediction structure of the current layer is set separately.

According to the method of the flowchart 2200, the multi-view scalable video decoding apparatus 400 according to an embodiment may determine whether to use the prediction structure of the N-th lower layer by decoding information on whether the coding of the prediction structure of the current layer uses the prediction structure of a lower layer. The multi-view scalable video encoding apparatus 300 according to an embodiment may code, as a flag, information on whether the prediction structure of the N-th lower layer is used as the prediction structure of the current layer. If the flag is 1, the prediction structure of the current layer is coded to be the same as that of the N-th lower layer. If the flag is 0, the prediction structure of the current layer is not taken from a lower layer and is set separately.

When the prediction structure of the current layer is coded to be the same as that of a lower layer, information identifying the lower layer that serves as the prediction target may also be coded. For example, a value coding the difference between the current layer and the lower layer may be coded as the information identifying that lower layer.

Assuming that prediction structure information is coded for each layer, the multi-view scalable video encoding apparatus 300 according to an embodiment and the multi-view scalable video decoding apparatus 400 according to an embodiment may use the syntax shown in Table 3.

layer_prediction_structure( ) {
    use_lower_layer_pred_struct_flag
    if( use_lower_layer_pred_struct_flag = = 0 ) {
        (set the prediction information of the current layer) ...
    }
    else {
        delta_lower_layer_minus1
    }
}

The syntax 'layer_prediction_structure', which codes the layer-specific prediction structure information, may include 'use_lower_layer_pred_struct_flag', a flag indicating whether the prediction structure of a lower layer is used in coding the prediction structure of the current layer. If the coding of the prediction structure of the current layer uses the prediction structure of a lower layer, 'use_lower_layer_pred_struct_flag' is set to 1, and the prediction structure of the current layer may be set to be the same as the information of the lower layer. If the prediction structure of the current layer does not refer to that of a lower layer, 'use_lower_layer_pred_struct_flag' is set to 0, and the prediction information of the current layer is set separately.

In addition, when 'use_lower_layer_pred_struct_flag' is 1, 'delta_lower_layer_minus1' may be coded. It represents the difference between the current layer and the lower layer serving as the prediction target, minus 1. For example, when the prediction structure of the current layer is coded using the prediction structure of the immediately lower layer, the value of 'delta_lower_layer_minus1' is 0.
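
The relationship between the layer distance and 'delta_lower_layer_minus1' can be made concrete with a small sketch. Layers are assumed to be indexed from 0 (lowest) upward, and the encoder-side rule of picking the nearest lower layer with an identical structure is an assumption of this example; the text only requires that some N-th lower layer have the same structure.

```python
def encode_layer_structure(current_layer, structures):
    """Return (use_lower_layer_pred_struct_flag, delta_lower_layer_minus1).

    structures -- list of prediction structures indexed by layer (assumed model)
    """
    current = structures[current_layer]
    for n in range(1, current_layer + 1):            # n-th lower layer
        if structures[current_layer - n] == current:
            return 1, n - 1                          # difference n, coded as n - 1
    return 0, None                                   # no match: set separately


def decode_layer_structure(current_layer, flag, delta_minus1, decoded, own_structure=None):
    """Recover the current layer's structure from the flag and delta."""
    if flag == 1:
        return decoded[current_layer - (delta_minus1 + 1)]
    return own_structure
```

For three layers with structures `["TV", "T", "TV"]`, layer 2 matches layer 0, two layers below, so the encoder emits a flag of 1 and a delta of 1; a match with the immediately lower layer would give a delta of 0, as stated in the text.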

Under a layer-by-layer prediction structure according to an embodiment, when the prediction structure information of the current layer is coded using the prediction structure of a lower layer, cases may arise in which the prediction structure of the current layer and that of the lower layer do not match.

As one example, when predictive coding is performed on each block of the current layer in the current picture, if the predictive coding result of the lower-layer block does not fit the prediction structure of the current-layer picture, the lower-layer information may not be used for predictive coding of the current-layer block. For example, if the prediction structure of the current-layer picture allows only temporal-direction prediction but the lower-layer block was coded with view-direction prediction, the lower-layer information may not be used in the block of the upper (current) layer.

As another example, when predictive coding is performed on each block of the current layer in the current picture, if the predictive coding result of the lower-layer block does not fit the prediction structure of the current-layer picture, the current layer may be predicted using only the part of the lower-layer information that fits the prediction structure of the current-layer picture. For example, if the prediction structure of the upper-layer picture allows only temporal-direction prediction but the lower-layer block was coded with both temporal-direction and view-direction prediction, only the temporal-direction prediction information among the lower-layer information may be used in the current-layer block.

As yet another example, when predictive coding is performed on each current-layer block in the current picture, if the predictive coding result of the lower-layer block does not match the prediction structure of the current-layer picture, only the block type or block partition information of the lower layer may be referenced and used for predictive coding of the current layer. For instance, if the prediction structure of the upper-layer picture allows only temporal prediction but the lower-layer block was coded with view-direction prediction, the current layer may reference and use only the block type or block partition information of the lower-layer block.
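
The three alternatives above can be sketched as a small selection routine. This is only an illustrative sketch: the names `Direction`, `Policy`, `LowerLayerInfo`, and `usable_lower_info` are hypothetical and do not appear in the disclosure, and the lower-layer information is reduced to a set of prediction directions plus a block type and a partition label.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional


class Direction(Enum):
    TEMPORAL = "temporal"
    VIEW = "view"


class Policy(Enum):
    SKIP = 1             # first example: do not use lower-layer information
    MATCHING_ONLY = 2    # second example: keep only the directions that fit
    BLOCK_INFO_ONLY = 3  # third example: keep only block type / partition


@dataclass(frozen=True)
class LowerLayerInfo:
    directions: FrozenSet[Direction]  # directions the lower-layer block used
    block_type: str                   # hypothetical label, e.g. "P16x16"
    partition: str                    # hypothetical label, e.g. "16x16"


def usable_lower_info(lower: LowerLayerInfo,
                      allowed: FrozenSet[Direction],
                      policy: Policy) -> Optional[LowerLayerInfo]:
    """Return the part of the lower-layer information the current layer may use."""
    if lower.directions <= allowed:
        # The lower-layer result fits the current layer's prediction structure.
        return lower
    if policy is Policy.SKIP:
        return None
    if policy is Policy.MATCHING_ONLY:
        kept = lower.directions & allowed
        return LowerLayerInfo(kept, lower.block_type, lower.partition)
    # BLOCK_INFO_ONLY: drop all prediction directions, keep type and partition.
    return LowerLayerInfo(frozenset(), lower.block_type, lower.partition)
```

In this sketch, a current layer that allows only temporal prediction would pass `frozenset({Direction.TEMPORAL})` as `allowed`; which policy applies is a design choice of the encoder.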

FIG. 22 illustrates an example of the prediction structures of a spatial current layer and a lower layer in multi-view scalable video coding with four views and two spatial layers, according to an embodiment.

As another example of a per-layer prediction structure, in multi-view scalable video coding that supports two spatial layers, the view-direction prediction structure may be implemented adaptively for each layer. The base layer and the enhancement layer correspond to the lower layer and the upper layer, respectively, of the two spatial layers. The temporal GOP size is 8, and pictures are encoded for each of four views (S0, S1, S2, S3).

In multi-view scalable video coding that does not use per-layer prediction structures, the prediction structure 2300 of the spatial base layer and the prediction structure 2350 of the spatial enhancement layer are identical. The spatial enhancement layer does not predict texture information by fully decoding the lower layer. Instead, it is coded mainly by predicting the motion information that follows the prediction structure of the lower layer and the residual data predicted with that motion information. For this reason, the spatial enhancement layer generally borrows the prediction structure of the spatial base layer as is.

However, if all spatial layers 2300 and 2350 are coded according to the same prediction structure, random access performance degrades because of view-direction predictive coding, just as in multi-view video coding.

FIG. 23 illustrates another example of the prediction structures of a spatial current layer and a lower layer in multi-view scalable video coding with four views and two spatial layers, according to an embodiment.

In multi-view scalable video coding that follows per-layer prediction structures, the prediction structure 2450 of the spatial enhancement layer may be set independently of the prediction structure 2400 of the spatial base layer. The prediction structure 2400 of the spatial base layer is the same as the prediction structure 2300 of the spatial base layer described above, whereas the prediction structure 2450 of the spatial enhancement layer performs no view-direction prediction for non-anchor pictures. Because view-direction prediction is not performed for non-anchor pictures, the random access performance of the spatial enhancement layer can be expected to improve.

FIG. 24 illustrates yet another example of the prediction structures of a spatial current layer and a lower layer in multi-view scalable video coding with four views and two spatial layers, according to an embodiment.

Under the per-layer prediction structure, the prediction structure 2550 of the spatial enhancement layer may likewise be set independently of the prediction structure 2500 of the spatial base layer. The prediction structure 2500 of the spatial base layer is the same as the prediction structure 2300 of the spatial base layer described above, whereas the prediction structure 2550 of the spatial enhancement layer performs no view-direction prediction at all, for anchor pictures and non-anchor pictures alike. Because the spatial enhancement layer performs no view-direction prediction, this structure can be regarded as giving the best random access performance.

Because a user does not always need the images of every view and may instead need the images of one or more arbitrary desired views, a random access function is useful in real video transmission environments. For example, the user of a terminal that supports a single view needs to select any desired view, and the user of a terminal that supports stereo needs to select any two views.

Equation 1 represents the complexity of view random access.

C is the number of pictures of the view currently being accessed, and M is the number of pictures that must be decoded, including those of the spatial base layer. h_base and w_base are the height and width of each picture of the spatial base layer, and h_enh and w_enh are the height and width of each picture of the spatial enhancement layer. r_base is the frame rate of the spatial base layer, and r_enh is the frame rate of the spatial enhancement layer. GOP_size is the temporal GOP size of each view, α is the number of views successively referenced by the anchor pictures of the current view, and β is the number of views successively referenced by the non-anchor pictures of the current view.

When Equation 1 is applied to the per-layer prediction structures 2350, 2450, and 2550 of the spatial enhancement layers of FIGS. 22, 23, and 24, all variables are identical except α and β. Therefore, the smaller the number α of views successively referenced by the anchor pictures of the current view and the number β of views successively referenced by the non-anchor pictures of the current view, the smaller the number M of pictures that must be decoded, including those of the spatial base layer, and hence the lower the complexity of view random access.

For example, in the prediction structure 2350 of the spatial enhancement layer of FIG. 22, accessing a picture of view S1 in the spatial enhancement layer first requires decoding, excluding motion compensation, of the pictures of the same view S1 in the spatial base layer. In addition, because view S1 references views S0 and S2 within the spatial enhancement layer, all pictures of views S0, S1, and S2 must be decoded, which makes the process complex. Accordingly, for view S1, the number α of views successively referenced by the anchor pictures and the number β of views successively referenced by the non-anchor pictures are both 2.

By contrast, in the prediction structure 2450 of the spatial enhancement layer of FIG. 23, α and β for view S1 are 2 and 0, respectively, and in the prediction structure 2550 of the spatial enhancement layer of FIG. 24, α and β for view S1 are both 0.

Therefore, random access complexity is highest for the prediction structure 2350 of the spatial enhancement layer of FIG. 22 and lowest for the prediction structure 2550 of the spatial enhancement layer of FIG. 24. That is, the per-layer prediction structure according to an embodiment is effective in improving view random access performance.
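
The values of α and β quoted for view S1 can be reproduced with a short sketch. It assumes that "successively referenced views" means every view reachable by following view-direction references, and it uses a hypothetical reference graph in which only the S1 references (S0 and S2) are taken from the description; the S2 and S3 entries are illustrative assumptions. Equation 1 itself is not reproduced here, so the sketch only counts views.

```python
def referenced_views(view, refs):
    """Return every view reachable from `view` by following reference links."""
    seen = set()
    stack = list(refs.get(view, ()))
    while stack:
        v = stack.pop()
        if v != view and v not in seen:
            seen.add(v)
            stack.extend(refs.get(v, ()))
    return seen


# Hypothetical view-direction reference graphs for the enhancement layer.
# Only "S1 references S0 and S2" is stated in the description.
FULL = {"S0": [], "S1": ["S0", "S2"], "S2": ["S0"], "S3": ["S2"]}
NONE = {"S0": [], "S1": [], "S2": [], "S3": []}

STRUCTURES = {
    "FIG. 22 (2350)": {"anchor": FULL, "non_anchor": FULL},
    "FIG. 23 (2450)": {"anchor": FULL, "non_anchor": NONE},
    "FIG. 24 (2550)": {"anchor": NONE, "non_anchor": NONE},
}


def alpha_beta(structure, view):
    """Count views referenced by anchor (alpha) and non-anchor (beta) pictures."""
    alpha = len(referenced_views(view, structure["anchor"]))
    beta = len(referenced_views(view, structure["non_anchor"]))
    return alpha, beta


for name, structure in STRUCTURES.items():
    print(name, alpha_beta(structure, "S1"))
```

Under these assumptions the three structures give (2, 2), (2, 0), and (0, 0) for view S1, matching the ordering of random access complexity described above.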

FIG. 25 is a block diagram of a multi-view scalable video encoder according to another embodiment.

A multi-view scalable video encoding apparatus 2600 according to another embodiment includes a downsampling unit 2610, an encoding unit 2620, a quality scalable video coding unit 2630, and a bitstream combining unit 2640. Comparing the multi-view scalable video encoding apparatus 2600 according to another embodiment with the multi-view scalable video encoding apparatus 300 according to an embodiment, the downsampling unit 2610, the encoding unit 2620, and the quality scalable video coding unit 2630 may correspond to the encoding performing unit 320, and the bitstream combining unit 2640 may correspond to the bitstream output unit 330.

The multi-view scalable video encoding apparatus 2600 may exist as a single independent apparatus that receives N images in sequence. Alternatively, several multi-view scalable video encoding apparatuses 2600 may exist in parallel so that the N images are input to them respectively.

An input view video image may be fed to the downsampling unit 2610 to create a lower layer that supports spatial scalability. The downsampling unit 2610 may downsample the input image to generate the image of the spatial lower layer. If spatial scalability is not supported, the input view video image may be fed to the encoding unit 2620 without passing through the downsampling unit 2610. The encoding unit 2620 may receive the input view video image together with video image information of other views, lower-layer information, and inter-layer prediction information.
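
As a rough illustration of how a spatial lower layer could be derived, the sketch below halves the resolution of a grayscale picture by averaging each 2x2 block. The disclosure does not specify a downsampling filter or ratio, so both the 2:1 ratio and the averaging filter are assumptions, and `downsample_2x` is a hypothetical name.

```python
def downsample_2x(picture):
    """Halve width and height by averaging each 2x2 block (rounded).

    `picture` is a list of rows of integer sample values. An odd trailing
    row or column is dropped in this simplified sketch.
    """
    height, width = len(picture), len(picture[0])
    out = []
    for y in range(0, height - height % 2, 2):
        row = []
        for x in range(0, width - width % 2, 2):
            total = (picture[y][x] + picture[y][x + 1]
                     + picture[y + 1][x] + picture[y + 1][x + 1])
            row.append((total + 2) // 4)  # integer average with rounding
        out.append(row)
    return out
```

A practical encoder would normally use a longer low-pass filter before decimation to limit aliasing; the block average is used here only to keep the example short.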

The other-view image information may be an image reconstructed after encoding an image of another view, or coded information of the other-view image. When other-view image information is input, the encoding unit 2620 may use it to perform multi-view video coding on the image of the current view. The lower-layer information may be a reconstructed image of the lower layer, or encoding information such as a motion vector or macroblock type of the lower layer. When lower-layer information is input, the encoding unit 2620 may use it to perform scalable video coding on the image of the current layer.

When inter-layer prediction information is input, the encoding unit 2620 may perform encoding according to the 'per-layer prediction structure'.

The information encoded by the encoding unit 2620 may be used to encode an image of another layer or another view. If the multi-view scalable video encoding apparatus 2600 is configured as a single independent encoding apparatus, the encoding information may be fed back into the encoding unit 2620. If the multi-view scalable video encoding apparatus 2600 is one of several encoders, the encoding information may be input to the encoder for the image of another layer or another view.

After the encoding unit 2620 performs encoding, processing branches to the quality scalable video coding unit 2630 if quality scalability is to be supported. The quality scalable video coding unit 2630 performs quality scalable video coding using the information encoded by the encoding unit 2620 and the image input to the encoding apparatus. The information encoded by the encoding unit 2620 may include a reconstructed image, coded information, and the like. Like the encoding unit 2620, the quality scalable video coding unit 2630 may output its encoded information as encoding information for images of another layer or another view.

The bitstream output as a result of encoding by the encoding unit 2620 or the quality scalable video coding unit 2630 is input to the bitstream combining unit 2640. The bitstream combining unit 2640 may receive the bitstream encoded by the encoding unit 2620, the bitstream encoded by the quality scalable video coding unit 2630, the bitstreams of upper or lower layers, and the bitstreams of other-view images.

Using the received bitstreams, the bitstream combining unit 2640 may construct a bitstream according to the bitstream structure described above with reference to FIGS. 10 to 13 and output a single realistic multi-view scalable video bitstream. In doing so, the bitstream combining unit 2640 may additionally combine into that bitstream the 'bitstream summary information' needed at the decoding stage to select or extract the required information from the multi-view scalable video bitstream.

FIG. 26 is a block diagram of a multi-view scalable video decoder according to another embodiment.

A multi-view scalable video decoding apparatus 2700 according to another embodiment includes a bitstream selection unit 2710, a bitstream decomposition unit 2720, and a decoding unit 2730. Comparing the multi-view scalable video decoding apparatus 2700 according to another embodiment with the multi-view scalable video decoding apparatus 400 according to an embodiment, the bitstream decomposition unit 2720 may correspond to the splitting unit 420, and the decoding unit 2730 may correspond to the decoding unit 430.

A bitstream input to the multi-view scalable video decoding apparatus 2700 may be input to the bitstream selection unit 2710 or may branch to the bitstream decomposition unit 2720. The input bitstream may be a multi-view scalable video bitstream.

The bitstream selection unit 2710 may receive bitstream selection information and a bitstream, and may use the bitstream selection information to select and output part of the bitstream. The bitstream selected by the bitstream selection unit 2710, or the bitstream input to the multi-view scalable video decoding apparatus 2700, is input to the bitstream decomposition unit 2720 or branches to the decoding unit 2730.

The bitstream decomposition unit 2720 receives a bitstream and bitstream summary information, decomposes the bitstream using the bitstream summary information, and selects or extracts the required information from the realistic multi-view scalable video bitstream. The extracted information is input to the decoding unit 2730.

The decoding unit 2730 may be 'a single independent decoder' that receives an undecomposed bitstream, decomposes it, and performs decoding. Alternatively, the decoding unit 2730 may consist of 'several decoders', each of which receives a bitstream decomposed by the bitstream decomposition unit 2720 and performs decoding. When the decoding unit consists of 'several decoders', each bitstream that has passed through the bitstream decomposition unit 2720 is input to the corresponding one of the 'several decoders'.

The decoding unit 2730 may receive as input a bitstream together with other-view image information, lower-layer information, and inter-layer prediction information. Various decoding methods may be performed depending on the bitstream input to the decoding unit 2730. When the H.264 scheme is performed, the other-view image information, lower-layer information, and inter-layer prediction information may go unused, leaving only the bitstream. When the SVC scheme is performed, decoding may use the lower-layer information, and when the MVC scheme is performed, decoding may use the image information of other views. When the MSVC scheme is performed, decoding may use the other-view image information, the lower-layer information, and the inter-layer prediction information.

So that it can be used to decode images of another layer or another view, the information decoded by the decoding unit 2730 may be output as current decoding information for use by other decoders. The current decoding information may include the decoding information of the current scalable layer, the decoded image, and the like.

If the decoding unit 2730 is configured as a single independent unit, the decoding information it outputs may be fed back into the same decoding unit 2730. If it consists of several decoders, the decoding information may be input to another decoder.

FIG. 27 is a flowchart of a multi-view scalable video encoding method according to an embodiment.

In step 2810, per-view images for a plurality of views, that is, view video images, are input.

In step 2820, multi-view scalable video coding using inter-view information and inter-layer information is performed on the plurality of per-view images. Multi-view video coding and scalable video coding may be performed in combination on the image of the current view. Under multi-view scalable video coding, the image of the current view may be encoded using at least one of the current-layer image of the current view, lower-layer information, and inter-layer information between the current layer and the lower layer. Multi-view scalable video coding may be performed according to a per-layer prediction structure set adaptively for each layer.

In step 2830, the per-view, per-layer bitstreams generated by scalable video coding of each view video image are combined in an order that takes view and layer into account, and the resulting bitstream is output. Per-time-slot access units are generated by grouping the per-view, per-layer bitstreams that belong to the same time slot, and a single integrated bitstream in which the access units are arranged in time order may be output. Each per-time-slot access unit consists of per-layer sub-access units.
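
The grouping in step 2830 can be sketched as follows. The `Chunk` record, the function names, and the specific ordering (layer first, then view, inside each access unit) are assumptions made for illustration; the disclosure states only that the order is predetermined and based on view and layer.

```python
from collections import defaultdict
from typing import NamedTuple


class Chunk(NamedTuple):
    """One per-view, per-layer piece of bitstream for one time slot."""
    time: int
    layer: int
    view: int
    payload: bytes


def build_access_units(chunks):
    """Group chunks into time-ordered access units of per-layer sub-access units."""
    by_time = defaultdict(lambda: defaultdict(list))
    for chunk in chunks:
        by_time[chunk.time][chunk.layer].append(chunk)

    access_units = []
    for time in sorted(by_time):
        sub_units = []
        for layer in sorted(by_time[time]):
            # Assumed order inside a sub-access unit: ascending view index.
            sub_units.append(sorted(by_time[time][layer], key=lambda c: c.view))
        access_units.append(sub_units)
    return access_units


def serialize(access_units):
    """Concatenate payloads in access-unit order to form the output bitstream."""
    return b"".join(chunk.payload
                    for unit in access_units
                    for sub_unit in unit
                    for chunk in sub_unit)
```

A real multiplexer would also write headers identifying the time slot, layer, and view of each piece; they are omitted here to keep the ordering logic visible.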

FIG. 28 is a flowchart of a multi-view scalable video decoding method according to an embodiment.

In step 2910, a bitstream encoded by a multi-view scalable video coding scheme is received.

In step 2920, the bitstream is split into per-view, per-layer bitstreams based on the plurality of views and the plurality of layers used in multi-view scalable video coding. From the received bitstream, the bitstreams that need to be decoded may be selected according to layer and view. The received bitstream may be split into per-time-slot access units arranged in time order, and each access unit may in turn be split into the per-view, per-layer bitstreams of that time slot. The per-layer sub-access units of a time slot may be separated from the corresponding access unit.
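
The selection part of step 2920 can be sketched as a filter over per-view, per-layer pieces. The `Chunk` record and `select_chunks` are hypothetical names, and the sketch assumes the caller has already added any reference views to `wanted_views`, since the dependency information is not modeled here.

```python
from typing import Iterable, List, NamedTuple, Set


class Chunk(NamedTuple):
    """One per-view, per-layer piece of bitstream for one time slot."""
    time: int
    layer: int
    view: int
    payload: bytes


def select_chunks(chunks: Iterable[Chunk],
                  wanted_views: Set[int],
                  max_layer: int) -> List[Chunk]:
    """Keep the pieces needed to decode the wanted views up to `max_layer`.

    Layers below `max_layer` are kept because an upper layer may depend on
    lower-layer information. The original order of the pieces is preserved.
    """
    return [c for c in chunks
            if c.view in wanted_views and c.layer <= max_layer]
```

For example, a single-view terminal that only displays the base resolution might call `select_chunks(chunks, {0}, 0)`, while a stereo terminal at full resolution might pass two views and the highest layer index.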

In step 2930, multi-view scalable video decoding is performed on the per-view, per-layer bitstreams using inter-view information and inter-layer information. Multi-view scalable video decoding restores the current layer of the current view by decoding with at least one of the current-layer image of the current view, lower-layer information, and inter-layer information between the current layer and the lower layer.

The multi-view scalable video decoding method may determine whether to perform inter-layer scalable video decoding according to whether a lower layer exists, and whether to perform multi-view video decoding according to whether view-direction predictive coding was performed. According to the result, one of multi-view scalable video decoding, single-view scalable video decoding, multi-view video decoding, and single-view video decoding may be performed on the current-view image. Multi-view scalable video decoding may perform predictive decoding according to the prediction structure set for each layer.
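
The two determinations described above yield four cases, which can be sketched as a simple decision function. The enum and function names are hypothetical; the two boolean inputs stand for "a lower layer exists" and "view-direction predictive coding was performed".

```python
from enum import Enum


class DecodeMode(Enum):
    SINGLE_VIEW = "single-view video decoding"
    MULTI_VIEW = "multi-view video decoding"
    SINGLE_VIEW_SCALABLE = "single-view scalable video decoding"
    MULTI_VIEW_SCALABLE = "multi-view scalable video decoding"


def choose_decode_mode(has_lower_layer: bool,
                       uses_view_prediction: bool) -> DecodeMode:
    """Pick the decoding mode from the two determinations in the method."""
    if has_lower_layer and uses_view_prediction:
        return DecodeMode.MULTI_VIEW_SCALABLE
    if has_lower_layer:
        return DecodeMode.SINGLE_VIEW_SCALABLE
    if uses_view_prediction:
        return DecodeMode.MULTI_VIEW
    return DecodeMode.SINGLE_VIEW
```

These four modes line up with the MSVC, SVC, MVC, and H.264 schemes mentioned for the decoding unit 2730, although that mapping is an interpretation rather than something the flowchart states explicitly.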

The multi-view scalable video encoding method and apparatus according to an embodiment of the present invention can generate a bitstream by encoding HD-resolution multi-view video content only once. Content in a variety of formats can be encoded and transmitted as a bitstream: various views, covering conventional 2D displays, stereoscopic displays, multi-view image display devices, and free view-selectable displays; various screen sizes, including QVGA, SD, HD, and Full HD; various image qualities, including VCD, DVD, and HDTV; and various temporal resolutions, including 5 Hz, 15 Hz, 30 Hz, and 60 Hz.

In addition, the multi-view scalable video decoding method and apparatus according to an embodiment of the present invention can select, from the received bitstreams, the bitstream corresponding to content in the desired format, extract the content, and transmit it to display devices. Accordingly, display devices that support various views, screen sizes, image qualities, and temporal resolutions can each be provided with content suited to their environment.

Thus, with multi-view scalable video coding according to an embodiment, realistic video content can be delivered efficiently across various transmission environments and to various terminals.

The present invention can also be embodied as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes every kind of recording device that stores packets readable by a computer system. Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices, and also include media implemented in the form of carrier waves (for example, transmission over the Internet). The computer-readable recording medium may also be distributed over network-connected computer systems so that the computer-readable code is stored and executed in a distributed manner. Functional programs, code, and code segments for implementing the present invention can be readily inferred by programmers skilled in the art to which the present invention belongs.

The apparatus of the present invention has been described with reference to the embodiments shown in the drawings to aid understanding, but these are merely exemplary, and those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible. Therefore, the true scope of technical protection of the present invention should be determined by the appended claims.

Claims

A video encoding method comprising:
(a) encoding a plurality of per-view video images, wherein multi-view video encoding is applied to at least one of the plurality of per-view video images and scalable video encoding is applied to at least one of the plurality of per-view video images, to generate per-view, per-layer bitstreams; and
(b) combining the generated per-view, per-layer bitstreams in a predetermined order based on view and layer to generate an output bitstream,
wherein step (a) comprises
setting a prediction structure for each layer and performing scalable video encoding according to the set per-layer prediction structure, wherein the view-direction prediction structure of the spatial enhancement layer is set independently of the view-direction prediction structure of the spatial base layer, view-direction prediction is performed on all pictures of the spatial base layer, and view-direction prediction is not performed on non-anchor pictures, or on any pictures, of the spatial enhancement layer.
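
As a non-limiting illustration, the layer-dependent view-direction prediction rule recited above may be sketched as follows. The names `SpatialLayer`, `EnhancementPolicy`, and `uses_view_prediction` are hypothetical and do not appear in the disclosure, and the sketch assumes that a reference view is available for the picture in question.

```python
from enum import Enum


class SpatialLayer(Enum):
    BASE = 0
    ENHANCEMENT = 1


class EnhancementPolicy(Enum):
    """Hypothetical switch for the two alternatives recited for the enhancement layer."""
    SKIP_NON_ANCHOR = "non_anchor"  # only anchor pictures keep view-direction prediction
    SKIP_ALL = "all"                # no enhancement-layer picture uses it


def uses_view_prediction(layer: SpatialLayer,
                         is_anchor: bool,
                         policy: EnhancementPolicy) -> bool:
    """Return True if view-direction prediction applies to this picture."""
    if layer is SpatialLayer.BASE:
        # Base layer: view-direction prediction on all pictures.
        return True
    if policy is EnhancementPolicy.SKIP_ALL:
        return False
    # SKIP_NON_ANCHOR: only anchor pictures of the enhancement layer use it.
    return is_anchor
```

Because the enhancement-layer decision is taken without consulting the base-layer decision, the two view-direction prediction structures remain independent in this sketch.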

The method of claim 1, wherein step (a) comprises
applying both multi-view video encoding and scalable video encoding to at least one of the plurality of per-view video images.

The method of claim 1,
wherein applying the multi-view video encoding in step (a) comprises performing inter-view prediction based on video image information of another view.

The method of claim 1,
wherein applying the scalable video encoding in step (a) comprises encoding a video image of a current layer based on the video image of the current layer and on video image information obtained by video encoding of a lower layer.

(deleted)

The method of claim 1, wherein step (a) comprises
setting per-layer prediction structures for all layers and then encoding each layer according to the set per-layer prediction structure.

The method of claim 1, wherein step (a) comprises
repeatedly, for each layer, setting the per-layer prediction structure and encoding according to it.

The method of claim 1, wherein step (a) comprises:
determining whether a prediction structure of a current layer of a current-view image is identical to a prediction structure of a lower layer of the current-view image; and
determining, according to the determination result, whether to use the prediction structure of the lower layer, and encoding the prediction structure of the current layer.

The method of claim 8,
wherein the lower layer is the layer immediately below the current layer or a predetermined layer selected from among a plurality of lower layers.

(deleted)

The method of claim 1, wherein step (a) comprises,
when the prediction structure of the lower layer does not match the prediction structure of the current layer, predicting the prediction structure of the current layer either without referring to the prediction structure of the lower layer or by referring only to the portion of the prediction structure of the lower layer that matches the prediction structure of the current layer.

The method of claim 1,
wherein the scalable video encoding performs at least one of spatial scalable video encoding, temporal scalable video encoding, and quality scalable video encoding on the plurality of per-view images.

The method of claim 13, wherein step (a) comprises,
according to the spatial scalable video encoding, generating an image of a current layer and an image of a lower layer according to resolution, and predicting the image of the current layer with reference to the image of the lower layer.

The method of claim 13, wherein step (a) comprises,
according to the temporal scalable video encoding, generating an image of a current layer and an image of a lower layer according to frame rate, and predicting the image of the current layer with reference to the image of the lower layer.

The method of claim 1, wherein step (b) comprises
generating per-time-instant access units, in each of which the per-view, per-layer bitstreams of the same time instant are arranged in view order, and outputting a multi-view scalable video bitstream in which the per-time-instant access units are arranged in time order.

The method of claim 16, wherein, in step (b),
when each per-view, per-layer bitstream consists of bitstreams according to quality layer and spatial layer, each per-time-instant access unit includes the per-layer bitstreams according to quality layer and spatial layer.

The method of claim 16, further comprising
setting, as syntax information for the multi-view scalable video bitstream, identification information of spatial scalability, identification information of temporal scalability, identification information of quality scalability, and identification information of the per-view, per-layer bitstream, in order to identify each per-view, per-layer bitstream of the access unit.
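
As a non-limiting illustration, the arrangement of per-view, per-layer bitstreams into access units may be sketched as follows. The record type `LayerUnit`, its field names (`view_id`, `spatial_id`, `temporal_id`, `quality_id`), and the function `build_access_units` are hypothetical. The ordering of layers inside one view (spatial layer first, then quality layer) is likewise an assumption, since the claims only require a predetermined order based on view and layer.

```python
from collections import defaultdict
from typing import NamedTuple


class LayerUnit(NamedTuple):
    """One per-view, per-layer bitstream fragment with its identifiers."""
    time: int         # time instant the fragment belongs to
    view_id: int      # identifies the view
    spatial_id: int   # identifies the spatial layer
    temporal_id: int  # identifies the temporal layer
    quality_id: int   # identifies the quality layer
    payload: bytes    # encoded data


def build_access_units(units):
    """Group fragments by time instant and order them inside each group.

    Returns a list of access units in time order. Inside an access unit,
    fragments are sorted by view, then spatial layer, then quality layer.
    """
    by_time = defaultdict(list)
    for unit in units:
        by_time[unit.time].append(unit)

    access_units = []
    for time in sorted(by_time):  # access units follow time order
        ordered = sorted(
            by_time[time],
            key=lambda u: (u.view_id, u.spatial_id, u.quality_id),
        )
        access_units.append(ordered)
    return access_units
```

A receiving side can use the same identifiers to separate an access unit back into its per-view, per-layer bitstreams, for example by grouping fragments on `(view_id, spatial_id, quality_id)`.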

A video decoding method comprising:
(a) receiving a bitstream;
(b) dividing the received bitstream into per-view, per-layer bitstreams; and
(c) performing multi-view scalable video decoding on the per-view, per-layer bitstreams using inter-view information and inter-layer information,
wherein step (c) comprises
decoding according to a prediction structure set for each layer, wherein the view-direction prediction structure of the spatial enhancement layer is set independently of the view-direction prediction structure of the spatial base layer, view-direction prediction is performed on all pictures of the spatial base layer, and view-direction prediction is not performed on non-anchor pictures, or on any pictures, of the spatial enhancement layer.

The method of claim 19, wherein step (b) comprises
selecting a bitstream to be decoded according to the layer and the view of the video bitstream.

The method of claim 19, wherein step (c) comprises:
determining whether to perform inter-layer scalable video decoding according to whether an image of a lower layer or information of a lower layer exists for a current-view image;
determining whether to perform multi-view video decoding according to whether view-direction predictive coding is performed; and
performing, on the current-view image, one of multi-view scalable video decoding, single-view scalable video decoding, multi-view video decoding, and single-view video decoding according to the results of determining whether to perform the scalable video decoding and whether to perform the multi-view video decoding.
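
As a non-limiting illustration, the two determinations above may be combined into a selection among the four decoding modes as follows. The function name `select_decoding_mode` and its parameters are hypothetical, and the mapping of each combination of results to a mode is an assumption drawn from the names of the modes.

```python
def select_decoding_mode(has_lower_layer: bool, uses_view_prediction: bool) -> str:
    """Pick a decoding mode for the current-view image.

    has_lower_layer: a lower-layer image or lower-layer information exists,
        so inter-layer scalable video decoding is performed.
    uses_view_prediction: view-direction predictive coding is used,
        so multi-view video decoding is performed.
    """
    if has_lower_layer and uses_view_prediction:
        return "multi-view scalable video decoding"
    if has_lower_layer:
        return "single-view scalable video decoding"
    if uses_view_prediction:
        return "multi-view video decoding"
    return "single-view video decoding"
```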

(deleted)

The method of claim 19, wherein step (c) comprises
setting per-layer prediction structures for all layers and then decoding each layer according to the set per-layer prediction structure.

The method of claim 19, wherein step (c) comprises
repeatedly, for each layer, setting the per-layer prediction structure and decoding according to it.

The method of claim 19, wherein step (c) comprises:
determining whether a prediction structure of a current layer of a current-view image is identical to a prediction structure of a lower layer of the current-view image; and
determining, according to the determination result, whether to use the prediction structure of the lower layer, and decoding the prediction structure of the current layer.

The method of claim 25,
wherein the lower layer is the layer immediately below the current layer or a predetermined layer selected from among a plurality of lower layers.

The method of claim 25, wherein step (c) comprises,
when the prediction structure of the lower layer does not match the prediction structure of the current layer, predicting the prediction structure of the current layer either without referring to the prediction structure of the lower layer or by referring only to the portion of the prediction structure of the lower layer that matches the prediction structure of the current layer.

The method of claim 19, wherein step (c) comprises
performing multi-view video decoding on an image of a current view with reference to image information of a view other than the current view among a plurality of views.

The method of claim 28, wherein step (c) comprises
performing scalable video decoding using at least one of an image of a current layer of the current view, information of a lower layer of the current view, and inter-layer prediction information between the current layer and the lower layer.

The method of claim 29,
wherein the scalable video decoding performs at least one of spatial scalable video decoding, temporal scalable video decoding, and quality scalable video decoding on the plurality of per-view, per-layer bitstreams.

(deleted)

The method of claim 19, wherein step (b) comprises
dividing the received multi-view scalable video bitstream into per-time-instant access units arranged in time order, and dividing each per-time-instant access unit into the per-view, per-layer bitstreams of the same time instant arranged in view order.

The method of claim 33, wherein step (b) comprises
separating, from each per-time-instant access unit, the per-layer bitstreams according to quality layer and spatial layer.

The method of claim 33, wherein step (b) comprises
checking, among the syntax information of the multi-view scalable video bitstream, identification information of spatial scalability, identification information of temporal scalability, identification information of quality scalability, and identification information of the per-view, per-layer bitstream, and thereby identifying and separating each per-view, per-layer bitstream of the access unit.

A video encoding apparatus comprising:
an encoding unit configured to perform multi-view scalable video encoding on a plurality of per-view images using inter-view information and inter-layer information; and
an output unit configured to output a multi-view scalable video bitstream by combining the per-view, per-layer bitstreams generated by the scalable video encoding of the per-view images, in an order that takes view and layer into account,
wherein the encoding unit
sets a prediction structure for each layer and performs scalable video encoding according to the set per-layer prediction structure, wherein the view-direction prediction structure of the spatial enhancement layer is set independently of the view-direction prediction structure of the spatial base layer, view-direction prediction is performed on all pictures of the spatial base layer, and view-direction prediction is not performed on non-anchor pictures, or on any pictures, of the spatial enhancement layer.

A video decoding apparatus comprising:
a receiving unit configured to receive a multi-view scalable video bitstream;
a dividing unit configured to divide the multi-view scalable video bitstream into per-view, per-layer bitstreams based on a plurality of views and a plurality of layers of the multi-view scalable video bitstream; and
a decoding unit configured to perform multi-view scalable video decoding on the per-view, per-layer bitstreams using inter-view information and inter-layer information,
wherein the decoding unit
performs decoding according to a prediction structure set for each layer, wherein the view-direction prediction structure of the spatial enhancement layer is set independently of the view-direction prediction structure of the spatial base layer, view-direction prediction is performed on all pictures of the spatial base layer, and view-direction prediction is not performed on non-anchor pictures, or on any pictures, of the spatial enhancement layer.

A computer-readable recording medium having recorded thereon a program for implementing the method according to any one of claims 1 to 4, 6 to 9, or 12 to 18.

A computer-readable recording medium having recorded thereon a program for implementing the method according to any one of claims 19 to 21, 23 to 30, or 33 to 35.