KR20250096744A

KR20250096744A - Context-aware voxel-based upsampling for point cloud processing

Info

Publication number: KR20250096744A
Application number: KR1020257016192A
Authority: KR
Inventors: Jiahao Pang; Kevin Bui; Dong Tian
Original assignee: InterDigital VC Holdings, Inc.
Priority date: 2022-10-18
Filing date: 2023-10-17
Publication date: 2025-06-27
Also published as: EP4588242A1; CN120077647A; WO2024086165A1

Abstract

Some embodiments of a method may include: upsampling a first point cloud using initial upsampling to obtain a second point cloud; associating features of the second point cloud with context information to obtain a third point cloud; predicting an occupancy status of at least one voxel of the third point cloud; and removing voxels of the third point cloud that are classified as empty according to the predicted occupancy status, to generate a pruned point cloud.

Description

Context-aware voxel-based upsampling for point cloud processing

Cross-reference to related applications

This application is an international application that claims the benefit, under 35 U.S.C. § 119(e), of U.S. Provisional Patent Application Serial No. 63/438,212, filed January 10, 2023, entitled “CONTEXT-AWARE VOXEL-BASED UPSAMPLING FOR POINT CLOUD PROCESSING” (the “'212 application”), and of U.S. Provisional Patent Application Serial No. 63/417,284, filed October 18, 2022, entitled “CONTEXT-AWARE VOXEL-BASED UPSAMPLING FOR POINT CLOUD PROCESSING” (the “'284 application”), each of which is hereby incorporated by reference in its entirety.

Incorporation by reference

This application incorporates by reference, in their entirety, the following applications: U.S. Provisional Patent Application Serial No. 63/291,015, filed December 17, 2021, entitled “Hybrid Framework for Point Cloud Compression” (the “'015 application”); U.S. Provisional Patent Application Serial No. 63/297,869, filed January 10, 2022, entitled “A Scalable Framework for Point Cloud Compression” (the “'869 application”); U.S. Provisional Patent Application Serial No. 63/388,087, filed July 11, 2022, entitled “A Scalable Framework for Point Cloud Compression” (the “'087 application”); U.S. Provisional Patent Application Serial No. 63/252,482, filed October 5, 2021, entitled “Method and Apparatus for Point Cloud Compression Using Hybrid Deep Entropy Coding” (the “'482 application”); U.S. Provisional Patent Application Serial No. 63/297,894, filed January 10, 2022, entitled “Coordinate Refinement and Upsampling from Quantized Point Cloud Reconstruction” (the “'894 application”); and U.S. Provisional Patent Application Serial No. 63/388,600, filed July 12, 2022, entitled “Deep Distribution-Aware Point Feature Extractor for AI-Based Point Cloud Compression” (the “'600 application”).

The point cloud (PC) data format is used widely across many business areas, for example autonomous driving, robotics, augmented reality/virtual reality (AR/VR), civil engineering, computer graphics, and the animation/film industry. 3D LiDAR (Light Detection and Ranging) sensors have been deployed in self-driving cars, and inexpensive LiDAR sensors are available. With advances in sensing technology, 3D point cloud data has become more practical than ever.

Embodiments described herein include methods used in video encoding and decoding (collectively, “coding”).

A first exemplary method/apparatus according to some embodiments may include: upsampling a first point cloud using initial upsampling to obtain a second point cloud; associating features of the second point cloud with context information to obtain a third point cloud; predicting an occupancy status of at least one voxel of the third point cloud; and removing voxels of the third point cloud that are classified as empty according to the predicted occupancy status, to generate a pruned point cloud.

In some embodiments of the first exemplary method, the initial upsampling includes nearest-neighbor upsampling.
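As an illustration only, nearest-neighbor upsampling of a voxelized point cloud can be sketched as follows. The sketch assumes that upsampling doubles the resolution along each axis, so that every occupied parent voxel yields eight child voxels that inherit the parent's feature vector; the function name `nn_upsample` and the array layout are hypothetical and are not taken from the application.

```python
import numpy as np

def nn_upsample(coords, feats):
    """Split every occupied voxel into its 8 children at twice the resolution.

    coords: (N, 3) integer voxel coordinates of the first point cloud.
    feats:  (N, C) per-voxel features.
    Returns (8N, 3) child coordinates and (8N, C) features; each child simply
    copies the feature vector of its parent (nearest-neighbor copy).
    """
    coords = np.asarray(coords, dtype=np.int64)
    feats = np.asarray(feats)
    # The 8 child offsets inside a parent voxel: (0,0,0), (0,0,1), ..., (1,1,1).
    offsets = np.array([[i, j, k] for i in (0, 1) for j in (0, 1) for k in (0, 1)])
    # Broadcast (N, 1, 3) + (1, 8, 3) -> (N, 8, 3), then flatten parent by parent.
    child_coords = (2 * coords[:, None, :] + offsets[None, :, :]).reshape(-1, 3)
    # Repeat each parent feature 8 times, matching the order of child_coords.
    child_feats = np.repeat(feats, 8, axis=0)
    return child_coords, child_feats

# Two occupied voxels with 2-channel features (made-up values).
coords = np.array([[0, 0, 0], [1, 2, 3]])
feats = np.array([[1.0, 2.0], [3.0, 4.0]])
up_coords, up_feats = nn_upsample(coords, feats)
```

Under this assumption the second point cloud contains every candidate child voxel, including children that are actually empty, which is why a later occupancy prediction and pruning stage is needed.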

In some embodiments of the first exemplary method, associating the features includes concatenating the features of the second point cloud with the context information to obtain the third point cloud.

In some embodiments of the first exemplary method, the context information is voxel-wise context information.

In some embodiments of the first exemplary method, the context information includes a context point cloud.

In some embodiments of the first exemplary method, the context information includes information about the second point cloud.

In some embodiments of the first exemplary method, the context information includes information about the voxel occupancy status of the second point cloud.

In some embodiments of the first exemplary method, the context information includes information about the position of a child voxel relative to the position of its parent voxel in the first point cloud.

In some embodiments of the first exemplary method, the context information includes coordinate information on the position of an occupied voxel of at least one of the first and second point clouds.

In some embodiments of the first exemplary method, the context information includes coordinate information, and the coordinate information is expressed in one of Euclidean, spherical, and cylindrical coordinates.
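For reference, the three coordinate forms named above are related by standard conversions. The sketch below uses the common convention in which the polar angle is measured from the z-axis; the application does not state which convention it uses, so the angle definitions and the function names are assumptions.

```python
import numpy as np

def to_spherical(xyz):
    """Euclidean (x, y, z) -> spherical (r, theta, phi).

    r is the distance from the origin, theta the polar angle measured from
    the +z axis, and phi the azimuth in the x-y plane.
    """
    xyz = np.asarray(xyz, dtype=float)
    x, y, z = xyz[:, 0], xyz[:, 1], xyz[:, 2]
    r = np.sqrt(x**2 + y**2 + z**2)
    theta = np.arctan2(np.hypot(x, y), z)  # well defined even at the origin
    phi = np.arctan2(y, x)
    return np.stack([r, theta, phi], axis=1)

def to_cylindrical(xyz):
    """Euclidean (x, y, z) -> cylindrical (rho, phi, z)."""
    xyz = np.asarray(xyz, dtype=float)
    x, y, z = xyz[:, 0], xyz[:, 1], xyz[:, 2]
    rho = np.hypot(x, y)    # distance from the z axis
    phi = np.arctan2(y, x)  # azimuth in the x-y plane
    return np.stack([rho, phi, z], axis=1)
```

Spherical or cylindrical coordinates may be a natural fit for data captured by a spinning LiDAR sensor, where points are sampled roughly uniformly in angle rather than in x, y, z.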

In some embodiments of the first exemplary method, the context information provides known information about the first point cloud in addition to the information available to the initial upsampling of the first point cloud.

In some embodiments of the first exemplary method, the context information includes the bit depth of the second point cloud.
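A minimal sketch of how several of the context items listed above could be assembled into voxel-wise context and concatenated with the features of the second point cloud. The particular choice of context channels (child offset within the parent, normalized coordinates, bit depth), the normalization by `2**bit_depth - 1`, and the name `attach_context` are illustrative assumptions, not details from the application.

```python
import numpy as np

def attach_context(child_coords, child_feats, bit_depth):
    """Concatenate per-voxel context channels onto the upsampled features.

    child_coords: (N, 3) integer coordinates of the second point cloud.
    child_feats:  (N, C) features of the second point cloud.
    bit_depth:    bit depth of the second point cloud's coordinate grid.
    Returns an (N, C + 7) feature array for the third point cloud.
    """
    child_coords = np.asarray(child_coords, dtype=np.int64)
    child_feats = np.asarray(child_feats, dtype=float)
    # Position of each child inside its parent voxel: each component is 0 or 1.
    child_offset = (child_coords % 2).astype(float)
    # Coordinates scaled to [0, 1] (assumed normalization).
    norm_xyz = child_coords / float(2**bit_depth - 1)
    # The same bit depth repeated for every voxel.
    depth = np.full((len(child_coords), 1), float(bit_depth))
    context = np.concatenate([child_offset, norm_xyz, depth], axis=1)  # (N, 7)
    return np.concatenate([child_feats, context], axis=1)

third_feats = attach_context(
    child_coords=[[2, 4, 6], [3, 5, 7]],
    child_feats=np.ones((2, 4)),
    bit_depth=3,
)
```

The coordinates of the voxels are unchanged by this step; only the per-voxel feature vectors grow by the number of context channels.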

Some embodiments of the first exemplary method may further include performing feature decoding on an input point cloud and a first bitstream to generate the first point cloud.

Some embodiments of the first exemplary method may further include: performing feature aggregation on the pruned point cloud to generate an aggregated feature; and performing a context-aware upsampling process on the aggregated feature to generate a decoded point cloud.

Some embodiments of the first exemplary method may further include: performing a feature-to-residual conversion on the pruned point cloud to generate a residual output; and adding the pruned point cloud to the residual output to generate a decoded point cloud.

Some embodiments of the first exemplary method may further include performing feature aggregation on the pruned point cloud to generate an aggregated feature, wherein the feature-to-residual conversion is performed on the aggregated feature.

In some embodiments of the first exemplary method, predicting the occupancy status is performed using a first neural network.

In some embodiments of the first exemplary method, predicting the occupancy status predicts a ground-truth occupancy status of the at least one voxel.

In some embodiments of the first exemplary method, predicting the occupancy status predicts a likelihood that the at least one voxel is occupied.

In some embodiments of the first exemplary method, removing the voxels of the third point cloud removes the voxels using a voxel pruning process.

Some embodiments of the first exemplary method may further include aggregating at least one feature of the second point cloud.

In some embodiments of the first exemplary method, predicting the occupancy status of the at least one voxel comprises: aggregating at least one feature of the third point cloud; processing the aggregated feature with multilayer perceptron (MLP) layers to generate an MLP layer output; performing a softmax process on the MLP layer output to generate softmax output values; and thresholding the softmax output values to generate the predicted occupancy status of the at least one voxel of the third point cloud.

In some embodiments of the first exemplary method, the thresholding converts softmax output values greater than 0.5 to an output value of 1, and converts softmax output values less than or equal to 0.5 to an output value of 0.
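The MLP, softmax, thresholding, and pruning steps described above can be sketched as follows. The sketch assumes a single hidden layer and a two-class softmax whose second column is read as the probability that a voxel is occupied; the layer sizes, the weights, and the function names are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def predict_occupancy(agg_feats, w1, b1, w2, b2):
    """Aggregated features (N, C) -> binary occupancy (N,) and probabilities."""
    hidden = np.maximum(agg_feats @ w1 + b1, 0.0)  # MLP hidden layer with ReLU
    logits = hidden @ w2 + b2                      # (N, 2): [empty, occupied]
    p_occupied = softmax(logits)[:, 1]
    # Values greater than 0.5 become 1; values less than or equal to 0.5 become 0.
    occupancy = (p_occupied > 0.5).astype(np.int64)
    return occupancy, p_occupied

def prune(coords, feats, occupancy):
    """Drop every voxel whose predicted occupancy is 0."""
    keep = occupancy == 1
    return coords[keep], feats[keep]

# Three voxels with one feature channel and hand-picked weights (illustrative).
coords = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0]])
agg_feats = np.array([[1.0], [-1.0], [0.0]])
w1, b1 = np.array([[1.0, -1.0]]), np.zeros(2)
w2, b2 = np.array([[-1.0, 1.0], [1.0, -1.0]]), np.zeros(2)

occupancy, p_occupied = predict_occupancy(agg_feats, w1, b1, w2, b2)
pruned_coords, pruned_feats = prune(coords, agg_feats, occupancy)
```

The third voxel receives a probability of exactly 0.5 and is therefore classified as empty, which matches the "less than or equal to 0.5" rule stated above.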

In some embodiments of the first exemplary method, predicting the occupancy status of the at least one voxel comprises: aggregating at least one feature of the third point cloud; and generating the predicted occupancy status of the at least one voxel of the third point cloud based on the aggregated feature.

In some embodiments of the first exemplary method, aggregating the at least one feature comprises repeating a cascading process one or more times, wherein the cascading process comprises: performing a sparse 3D convolution of an input point cloud to generate a convolution output point cloud; performing a nonlinear activation process on the convolution output point cloud to generate a nonlinear output point cloud; and, if there is a next cycle of the cascading process, preparing the nonlinear output point cloud as the input point cloud for that cycle, wherein the third point cloud is the input point cloud for the first cycle of the cascading process, and wherein the last cycle of the cascading process generates the aggregated feature.

Some embodiments of the first exemplary method may further include adding the third point cloud to the ReLU output point cloud of the last cycle of the cascading process.
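The cascade of sparse 3D convolutions and nonlinear activations, followed by adding the input back to the last ReLU output, can be sketched in plain NumPy as below. This is a slow reference sketch under stated assumptions: the convolution uses a 3×3×3 kernel, produces outputs only at the occupied input voxels, and treats empty neighbors as contributing nothing. The function names and the dictionary-based kernel layout are hypothetical; a real system would typically rely on a dedicated sparse-tensor library.

```python
import numpy as np
from itertools import product

def sparse_conv3d(coords, feats, weights):
    """Sparse 3D convolution evaluated only at occupied voxels.

    coords:  (N, 3) integer voxel coordinates.
    feats:   (N, C_in) features.
    weights: dict mapping a kernel offset (dx, dy, dz) to a (C_in, C_out) matrix.
    """
    coords = np.asarray(coords, dtype=np.int64)
    index = {tuple(c): i for i, c in enumerate(coords.tolist())}
    c_out = next(iter(weights.values())).shape[1]
    out = np.zeros((len(coords), c_out))
    for i, c in enumerate(coords.tolist()):
        for off, w in weights.items():
            j = index.get((c[0] + off[0], c[1] + off[1], c[2] + off[2]))
            if j is not None:  # skip neighbors that are not occupied
                out[i] += feats[j] @ w
    return out

def aggregate(coords, feats, layers):
    """Run the conv + ReLU cascade, then add the input back (skip connection)."""
    x = feats
    for weights in layers:  # one cycle of the cascading process per layer
        x = np.maximum(sparse_conv3d(coords, x, weights), 0.0)
    return feats + x        # requires C_out of the last layer to equal C_in

# Toy kernel: weight 1 at every offset, one input and one output channel.
kernel = {off: np.eye(1) for off in product((-1, 0, 1), repeat=3)}
coords = np.array([[0, 0, 0], [1, 0, 0], [5, 5, 5]])
feats = np.array([[1.0], [2.0], [-4.0]])
agg = aggregate(coords, feats, layers=[kernel, kernel])
```

In this toy case the two adjacent voxels exchange information in each cycle, while the isolated voxel at (5, 5, 5) only ever sees its own feature.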

In some embodiments of the first exemplary method, aggregating the at least one feature comprises: performing a sparse 3D convolution of an input point cloud to generate a convolution output point cloud; and performing a nonlinear activation process on the convolution output point cloud to generate the aggregated feature.

In some embodiments of the first exemplary method, the nonlinear activation process comprises a rectified linear unit (ReLU) activation process, and the nonlinear output point cloud comprises a ReLU output point cloud.

In some embodiments of the first exemplary method, aggregating the at least one feature comprises: repeating a first cascading process one or more times, wherein the first cascading process comprises: performing a first sparse 3D convolution of a first input point cloud to generate a first convolution output point cloud; performing a first nonlinear activation process on the first convolution output point cloud to generate a first nonlinear output point cloud; and, if there is a next cycle of the first cascading process, preparing the first nonlinear output point cloud as the first input point cloud, wherein the third point cloud is the first input point cloud for the first cycle of the first cascading process, and wherein the last cycle of the first cascading process generates a first cascading process output; repeating a second cascading process one or more times, wherein the second cascading process comprises: performing a second sparse 3D convolution of a second input point cloud to generate a second convolution output point cloud; performing a second nonlinear activation process on the second convolution output point cloud to generate a second nonlinear output point cloud; and, if there is a next cycle of the second cascading process, preparing the second nonlinear output point cloud as the second input point cloud, wherein the third point cloud is the second input point cloud for the first cycle of the second cascading process, and wherein the last cycle of the second cascading process generates a second cascading process output; concatenating the first cascading process output and the second cascading process output to generate a concatenation output; and adding the third point cloud to the concatenation output to generate the aggregated feature.

In some embodiments of the first exemplary method, aggregating the at least one feature comprises: repeating a first cascading process one or more times, wherein the first cascading process comprises: performing a first sparse 3D convolution of a first input point cloud to generate a first convolution output point cloud; performing a first rectified linear unit (ReLU) activation process on the first convolution output point cloud to generate a first ReLU output point cloud; and, if there is a next cycle of the first cascading process, preparing the first ReLU output point cloud as the first input point cloud, wherein the third point cloud is the first input point cloud for the first cycle of the first cascading process, and wherein the last cycle of the first cascading process generates a first cascading process output; repeating a second cascading process one or more times, wherein the second cascading process comprises: performing a second sparse 3D convolution of a second input point cloud to generate a second convolution output point cloud; performing a second ReLU activation process on the second convolution output point cloud to generate a second ReLU output point cloud; and, if there is a next cycle of the second cascading process, preparing the second ReLU output point cloud as the second input point cloud, wherein the third point cloud is the second input point cloud for the first cycle of the second cascading process, and wherein the last cycle of the second cascading process generates a second cascading process output; concatenating the first cascading process output and the second cascading process output to generate a concatenation output; and adding the third point cloud to the concatenation output to generate the aggregated feature.
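The two-branch arrangement described above (two independent cascades fed by the same input, concatenated, then added to the input) can be sketched as follows. To keep the topology visible, each convolution is passed in as a callable, and plain per-voxel linear maps stand in for sparse 3D convolutions; that substitution, the branch depths, and the function names are assumptions made for illustration. For the final addition to be well defined, the channel widths of the two branch outputs must sum to the input width.

```python
import numpy as np

def cascade(x, convs):
    """Apply conv + ReLU once per entry in convs (one cycle per entry)."""
    for conv in convs:
        x = np.maximum(conv(x), 0.0)
    return x

def two_branch_aggregate(x, branch1, branch2):
    """Run both cascades on the same input, concatenate, and add the input."""
    out1 = cascade(x, branch1)  # first cascading process output
    out2 = cascade(x, branch2)  # second cascading process output
    joined = np.concatenate([out1, out2], axis=1)  # concatenation output
    if joined.shape != x.shape:
        raise ValueError("branch output widths must sum to the input width")
    return x + joined           # residual addition of the input features

# Stand-in "convolutions": per-voxel linear maps with hand-picked weights.
w_a = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]])  # 4 -> 2
w_b = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # 4 -> 2
w_c = 2.0 * np.eye(2)                                             # 2 -> 2

x = np.array([[1.0, -2.0, 3.0, 4.0]])
agg = two_branch_aggregate(
    x,
    branch1=[lambda t: t @ w_a],                    # one cycle
    branch2=[lambda t: t @ w_b, lambda t: t @ w_c], # two cycles
)
```

Giving the branches different depths, as in this example, lets one branch see a larger neighborhood than the other when real sparse convolutions are used.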

In some embodiments of the first exemplary method, aggregating the at least one feature comprises: performing a self-attention process on the third point cloud; adding the third point cloud to the output of the self-attention process to generate an MLP process input; performing an MLP process on the MLP process input; and adding the MLP process input to the MLP process output to generate the aggregated feature.

In some embodiments of the first exemplary method, the self-attention process generates an output feature based on the k nearest neighbors of a voxel of the third point cloud.
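A minimal NumPy sketch of such a block: self-attention restricted to each voxel's k nearest neighbors, a residual addition, an MLP, and a second residual addition. The scaled dot-product form of attention, the brute-force neighbor search, the two-layer MLP, and all names are assumptions; the application does not specify these details here.

```python
import numpy as np

def knn_indices(coords, k):
    """Indices of the k nearest voxels for every voxel (the voxel itself included)."""
    coords = np.asarray(coords, dtype=float)
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    return np.argsort(d2, axis=1, kind="stable")[:, :k]

def knn_self_attention(coords, x, wq, wk, wv, k):
    """Each voxel attends only to its k nearest neighbors."""
    idx = knn_indices(coords, k)   # (N, k)
    q = x @ wq                     # (N, D) queries
    keys = (x @ wk)[idx]           # (N, k, D) neighbor keys
    vals = (x @ wv)[idx]           # (N, k, C) neighbor values
    scores = np.einsum("nd,nkd->nk", q, keys) / np.sqrt(q.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)  # softmax over neighbors
    return np.einsum("nk,nkc->nc", attn, vals)

def attention_block(coords, x, wq, wk, wv, w1, w2, k):
    """Self-attention + residual, then MLP + residual."""
    mlp_in = x + knn_self_attention(coords, x, wq, wk, wv, k)
    mlp_out = np.maximum(mlp_in @ w1, 0.0) @ w2
    return mlp_in + mlp_out

rng = np.random.default_rng(0)
coords = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [4, 4, 4], [4, 4, 5]])
x = rng.normal(size=(5, 4))
wq, wk, wv = (rng.normal(size=(4, 4)) for _ in range(3))
w1, w2 = rng.normal(size=(4, 8)), rng.normal(size=(8, 4))
agg = attention_block(coords, x, wq, wk, wv, w1, w2, k=3)
```

Restricting attention to k neighbors keeps the cost linear in the number of voxels for a fixed k, although the brute-force distance matrix used in this sketch is quadratic and would be replaced by a spatial index in practice.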

In some embodiments of the first exemplary method, aggregating the at least one feature of the third point cloud comprises performing a feature aggregation process two or more times.

A first exemplary method/apparatus according to some embodiments may include: a processor; and a non-transitory computer-readable medium storing instructions that, when executed by the processor, cause the apparatus to: upsample a first point cloud using initial upsampling to obtain a second point cloud; associate features of the second point cloud with context information to obtain a third point cloud; predict an occupancy status of at least one voxel of the third point cloud; and remove voxels of the third point cloud that are classified as empty according to the predicted occupancy status, to generate a pruned point cloud.

In some embodiments of the first exemplary apparatus, the initial upsampling includes nearest-neighbor upsampling.

In some embodiments of the first exemplary apparatus, associating the features includes concatenating the features of the second point cloud with the context information to obtain the third point cloud.

An exemplary device according to some embodiments may include: an apparatus according to any of the apparatuses listed above; and at least one of (i) an antenna configured to receive a signal, the signal including data representing an image, (ii) a band limiter configured to limit the received signal to a frequency band that includes the data representing the image, or (iii) a display configured to display the image.

Some embodiments of the exemplary device may further include at least one of a TV, a cell phone, a tablet, and a set-top box (STB).

An exemplary computer-readable medium according to some embodiments may include instructions that cause one or more processors to: upsample a first point cloud using initial upsampling to obtain a second point cloud; associate features of the second point cloud with context information to obtain a third point cloud; predict an occupancy status of at least one voxel of the third point cloud; and remove voxels of the third point cloud that are classified as empty according to the predicted occupancy status, to generate a pruned point cloud.

An exemplary computer program product according to some embodiments may include instructions that, when the program is executed by one or more processors, cause the one or more processors to: upsample a first point cloud using initial upsampling to obtain a second point cloud; associate features of the second point cloud with context information to obtain a third point cloud; predict an occupancy status of at least one voxel of the third point cloud; and remove voxels of the third point cloud that are classified as empty according to the predicted occupancy status, to generate a pruned point cloud.

A second exemplary method according to some embodiments may include performing context-aware upsampling of a first point cloud to determine an upsampled second point cloud, wherein the context-aware upsampling comprises: associating features of a third point cloud with context information, the third point cloud being based at least in part on an initially upsampled version of the first point cloud; and removing, from the third point cloud, voxels of a fourth point cloud that are predicted to be empty based at least in part on the context information, to generate the upscaled second point cloud.

A third exemplary method according to some embodiments may include: upsampling a first point cloud using initial upsampling to obtain a second point cloud; associating features of the second point cloud with context information to obtain a third point cloud; predicting an occupancy status of at least one voxel of the third point cloud, wherein predicting the occupancy status of the at least one voxel comprises aggregating at least one feature of the third point cloud, wherein aggregating the at least one feature of the third point cloud comprises using a first neural network, and wherein aggregating the at least one feature of the third point cloud using the first neural network comprises using a first neural network parameter set with the first neural network; removing voxels of the third point cloud that are classified as empty according to the predicted occupancy status, to generate a pruned point cloud; and performing feature aggregation on the pruned point cloud to generate an aggregated feature, wherein performing the feature aggregation on the pruned point cloud comprises using a second neural network, wherein generating the aggregated feature using the second neural network comprises using a second neural network parameter set with the second neural network, and wherein the first neural network parameter set is identical to the second neural network parameter set.
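The use of identical parameter sets by the first and second neural networks can be illustrated with a small sketch in which two aggregator objects hold a reference to one shared parameter dictionary. The class `FeatureAggregator`, its single per-voxel linear layer, and the pruning mask are hypothetical stand-ins and do not describe the architecture used in the application.

```python
import numpy as np

class FeatureAggregator:
    """Hypothetical stand-in for a feature aggregation network."""

    def __init__(self, params):
        self.params = params  # held by reference, not copied

    def __call__(self, feats):
        # One per-voxel linear layer followed by ReLU (illustrative only).
        return np.maximum(feats @ self.params["w"] + self.params["b"], 0.0)

rng = np.random.default_rng(0)
shared_params = {"w": rng.normal(size=(4, 4)), "b": np.zeros(4)}

first_network = FeatureAggregator(shared_params)   # used before pruning
second_network = FeatureAggregator(shared_params)  # used after pruning

third_pc_feats = rng.normal(size=(6, 4))
pre = first_network(third_pc_feats)                # feeds occupancy prediction

keep = np.array([True, False, True, True, False, True])  # assumed pruning mask
post = second_network(third_pc_feats[keep])        # aggregation on pruned cloud
```

Sharing one parameter set means only one set of weights has to be trained and stored for both aggregation stages.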

Some embodiments of the third exemplary method may further include aggregating at least one feature of the second point cloud.

In some embodiments of the third exemplary method, aggregating the at least one feature of the second point cloud comprises using a third neural network, wherein aggregating the at least one feature of the second point cloud using the third neural network comprises using a third neural network parameter set with the third neural network, and wherein the third neural network parameter set is identical to the first neural network parameter set.

In some embodiments of the third exemplary method, the initial upsampling includes nearest-neighbor upsampling.

In some embodiments of the third exemplary method, associating the features includes concatenating the features of the second point cloud with the context information to obtain the third point cloud.

In some embodiments of the third exemplary method, the context information is voxel-wise context information.

Some embodiments of the third exemplary method may further include performing feature decoding on an input point cloud and a first bitstream to generate the first point cloud.

Some embodiments of the third exemplary method may further include performing a context-aware upsampling process on the aggregated feature to generate a decoded point cloud.

Some embodiments of the third exemplary method may further include: performing a feature-to-residual conversion on the pruned point cloud to generate a residual output; and adding the pruned point cloud to the residual output to generate a decoded point cloud.

In some embodiments of the third exemplary method, the feature-to-residual conversion is performed on the aggregated feature.

제3 예시적인 방법의 일부 실시예들에서, 점유 상태를 예측하는 단계는 적어도 하나의 복셀의 실측 점유 상태를 예측한다.In some embodiments of the third exemplary method, the step of predicting an occupancy state predicts a ground truth occupancy state of at least one voxel.

제3 예시적인 방법의 일부 실시예들에서, 점유 상태를 예측하는 단계는 적어도 하나의 복셀이 점유될 가능성을 예측한다.In some embodiments of the third exemplary method, the step of predicting occupancy predicts the likelihood that at least one voxel will be occupied.

제3 예시적인 방법의 일부 실시예들에서, 제3 포인트 클라우드의 복셀들을 제거하는 단계는 복셀 프루닝 프로세스를 사용하여 복셀들을 제거한다.In some embodiments of the third exemplary method, the step of removing voxels of the third point cloud removes voxels using a voxel pruning process.

제3 예시적인 방법의 일부 실시예들에서, 적어도 하나의 복셀의 점유 상태를 예측하는 단계는: 집계된 특징을 다층 퍼셉트론(MLP) 계층들로 처리하여 MLP 계층 출력을 생성하는 단계; MLP 계층 출력에 대해 소프트맥스 프로세스를 수행하여 소프트맥스 출력 값들을 생성하는 단계; 및 소프트맥스 출력 값들의 임계값 처리를 수행하여 제3 포인트 클라우드의 적어도 하나의 복셀의 예측된 점유 상태를 생성하는 단계를 더 포함한다.In some embodiments of the third exemplary method, the step of predicting the occupancy state of at least one voxel further comprises: processing the aggregated features with multilayer perceptron (MLP) layers to generate MLP layer outputs; performing a softmax process on the MLP layer outputs to generate softmax output values; and performing thresholding on the softmax output values to generate a predicted occupancy state of at least one voxel of the third point cloud.

제3 예시적인 방법의 일부 실시예들에서, 소프트맥스 출력 값들의 임계값 처리는 0.5 초과의 소프트맥스 출력 값들을 1의 출력 값으로 변환하고, 0.5 이하의 소프트맥스 출력 값들을 0의 출력 값으로 변환한다.In some embodiments of the third exemplary method, thresholding of the softmax output values converts softmax output values greater than 0.5 to an output value of 1, and converts softmax output values less than or equal to 0.5 to an output value of 0.
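The occupancy prediction and pruning steps described above (MLP layers, softmax, thresholding at 0.5, removal of voxels classified as empty) can be sketched as follows. This is a minimal sketch under assumptions: the MLP weights are supplied by the caller, the softmax has two classes, and index 1 is taken to be the "occupied" class. The function names are hypothetical.

```python
import math


def mlp(x, layers):
    """Apply fully connected layers (W, b) in order, with ReLU between layers."""
    for i, (W, b) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_occupancy(feature, layers, threshold=0.5):
    """Return 1 if the softmax value for 'occupied' exceeds the threshold, else 0."""
    p_occupied = softmax(mlp(feature, layers))[1]  # index 1 = occupied (assumption)
    return 1 if p_occupied > threshold else 0


def prune(coords, feats, layers):
    """Keep only the voxels whose predicted occupancy state is 1."""
    kept = [(c, f) for c, f in zip(coords, feats) if predict_occupancy(f, layers) == 1]
    return [c for c, _ in kept], [f for _, f in kept]
```

A softmax value of exactly 0.5 maps to 0 here, consistent with the "less than or equal to 0.5" rule stated above.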

제3 예시적인 방법의 일부 실시예들에서, 적어도 하나의 복셀의 점유 상태를 예측하는 단계는: 제3 포인트 클라우드의 적어도 하나의 특징을 집계하는 단계; 및 집계된 특징에 기초하여 제3 포인트 클라우드의 적어도 하나의 복셀의 예측된 점유 상태를 생성하는 단계를 포함한다.In some embodiments of the third exemplary method, the step of predicting the occupancy state of at least one voxel comprises: the step of aggregating at least one feature of the third point cloud; and the step of generating a predicted occupancy state of at least one voxel of the third point cloud based on the aggregated feature.

제3 예시적인 방법의 일부 실시예들에서, 제3 포인트 클라우드의 적어도 하나의 특징을 집계하는 단계는: 캐스케이딩 프로세스를 한 번 이상 반복하는 단계를 포함하며, 캐스케이딩 프로세스는: 입력 포인트 클라우드의 희소 3D 콘볼루션을 수행하여 콘볼루션 출력 포인트 클라우드를 생성하는 단계; 콘볼루션 출력 포인트 클라우드에 대해 비선형 활성화 프로세스를 수행하여 비선형 출력 포인트 클라우드를 생성하는 단계; 및 캐스케이딩 프로세스의 다음 사이클이 있을 경우 비선형 출력 포인트 클라우드를 입력 포인트 클라우드로 준비하는 단계를 포함하며, 여기서 제3 포인트 클라우드는 캐스케이딩 프로세스의 첫 번째 사이클에 대한 입력 포인트 클라우드이고, 여기서 캐스케이딩 프로세스의 마지막 사이클은 집계된 특징을 생성한다.In some embodiments of the third exemplary method, the step of aggregating at least one feature of the third point cloud comprises: repeating a cascading process one or more times, wherein the cascading process comprises: performing a sparse 3D convolution of an input point cloud to generate a convolution output point cloud; performing a nonlinear activation process on the convolution output point cloud to generate a nonlinear output point cloud; and preparing the nonlinear output point cloud as an input point cloud for a next cycle of the cascading process, wherein the third point cloud is an input point cloud for a first cycle of the cascading process, and wherein a last cycle of the cascading process generates aggregated features.

제3 예시적인 방법의 일부 실시예들은 제3 포인트 클라우드를 캐스케이딩 프로세스의 마지막 사이클의 ReLU 출력 포인트 클라우드에 추가하는 단계를 더 포함할 수 있다.Some embodiments of the third exemplary method may further include a step of adding the third point cloud to the ReLU output point cloud of the last cycle of the cascading process.

제3 예시적인 방법의 일부 실시예들에서, 적어도 하나의 특징을 집계하는 단계는: 입력 포인트 클라우드의 희소 3D 콘볼루션을 수행하여 콘볼루션 출력 포인트 클라우드를 생성하는 단계; 및 콘볼루션 출력 포인트 클라우드에 대해 비선형 활성화 프로세스를 수행하여 집계된 특징을 생성하는 단계를 포함한다.In some embodiments of the third exemplary method, the step of aggregating at least one feature comprises: performing a sparse 3D convolution of an input point cloud to generate a convolution output point cloud; and performing a nonlinear activation process on the convolution output point cloud to generate aggregated features.

제3 예시적인 방법의 일부 실시예들에서, 비선형 활성화 프로세스는 ReLU(rectifier linear unit) 활성화 프로세스를 포함하고, 비선형 출력 포인트 클라우드는 ReLU 출력 포인트 클라우드를 포함한다.In some embodiments of the third exemplary method, the nonlinear activation process comprises a rectifier linear unit (ReLU) activation process, and the nonlinear output point cloud comprises a ReLU output point cloud.
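The cascading process described above (sparse 3D convolution followed by ReLU, repeated one or more times, with the third point cloud optionally added to the last ReLU output) can be sketched as follows. This is an illustrative sketch under assumptions: voxels are stored in a dictionary keyed by integer coordinates, the convolution produces outputs only at occupied input coordinates, and the residual addition requires equal input and output channel counts. The function names are hypothetical and do not correspond to any particular sparse-tensor library.

```python
def sparse_conv3d(voxels, kernel):
    """Sparse 3D convolution evaluated only at occupied coordinates.

    voxels: dict {(x, y, z): [c_in floats]}
    kernel: dict {(dx, dy, dz): W}, where W is a c_out x c_in weight matrix
    """
    c_out = len(next(iter(kernel.values())))
    out = {}
    for (x, y, z) in voxels:
        acc = [0.0] * c_out
        for (dx, dy, dz), W in kernel.items():
            nb = voxels.get((x + dx, y + dy, z + dz))
            if nb is None:
                continue  # empty neighbors contribute nothing
            for o, row in enumerate(W):
                acc[o] += sum(w * v for w, v in zip(row, nb))
        out[(x, y, z)] = acc
    return out


def relu(voxels):
    """Apply ReLU to every feature channel of every voxel."""
    return {c: [max(0.0, v) for v in f] for c, f in voxels.items()}


def cascade(voxels, kernels):
    """Run one conv + ReLU cycle per kernel; each output feeds the next cycle."""
    x = voxels
    for kernel in kernels:
        x = relu(sparse_conv3d(x, kernel))
    return x


def resnet_block(voxels, kernels):
    """Add the block input to the ReLU output of the last cycle."""
    y = cascade(voxels, kernels)
    return {c: [a + b for a, b in zip(voxels[c], y[c])] for c in voxels}
```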

제3 예시적인 방법의 일부 실시예들에서, 제3 포인트 클라우드의 적어도 하나의 특징을 집계하는 단계는: 제1 캐스케이딩 프로세스를 한 번 이상 반복하는 단계 - 제1 캐스케이딩 프로세스는: 제1 입력 포인트 클라우드의 제1 희소 3D 콘볼루션을 수행하여 제1 콘볼루션 출력 포인트 클라우드를 생성하는 단계; 제1 콘볼루션 출력 포인트 클라우드에 대해 제1 비선형 활성화 프로세스를 수행하여 제1 비선형 출력 포인트 클라우드를 생성하는 단계; 및 제1 캐스케이딩 프로세스의 다음 사이클이 있을 경우 제1 비선형 출력 포인트 클라우드를 제1 입력 포인트 클라우드로 준비하는 단계를 포함하고, 제3 포인트 클라우드는 제1 캐스케이딩 프로세스의 첫 번째 사이클에 대한 제1 입력 포인트 클라우드이며, 제1 캐스케이딩 프로세스의 마지막 사이클은 제1 캐스케이딩 프로세스 출력을 생성함 -; 제2 캐스케이딩 프로세스를 한 번 이상 반복하는 단계 - 제2 캐스케이딩 프로세스는: 제2 입력 포인트 클라우드의 제2 희소 3D 콘볼루션을 수행하여 제2 콘볼루션 출력 포인트 클라우드를 생성하는 단계; 제2 콘볼루션 출력 포인트 클라우드에 대해 제2 비선형 활성화 프로세스를 수행하여 제2 비선형 출력 포인트 클라우드를 생성하는 단계; 및 제2 캐스케이딩 프로세스의 다음 사이클이 있을 경우 제2 비선형 출력 포인트 클라우드를 제2 입력 포인트 클라우드로 준비하는 단계를 포함하고, 제3 포인트 클라우드는 제2 캐스케이딩 프로세스의 첫 번째 사이클에 대한 제2 입력 포인트 클라우드이며, 제2 캐스케이딩 프로세스의 마지막 사이클은 제2 캐스케이딩 프로세스 출력을 생성함 -; 제1 캐스케이딩 프로세스 출력과 제2 캐스케이딩 프로세스 출력을 연결하여 연결 출력을 생성하는 단계; 및 제3 포인트 클라우드를 연결 출력에 추가하여 집계된 특징을 생성하는 단계를 포함한다.In some embodiments of the third exemplary method, the step of aggregating at least one feature of the third point cloud comprises: repeating a first cascading process one or more times, wherein the first cascading process comprises: performing a first sparse 3D convolution of a first input point cloud to generate a first convolution output point cloud; performing a first nonlinear activation process on the first convolution output point cloud to generate a first nonlinear output point cloud; and preparing the first nonlinear output point cloud as the first input point cloud if there is a next cycle of the first cascading process, wherein the third point cloud is the first input point cloud for the first cycle of the first cascading process, and wherein a last cycle of the first cascading process generates a first cascading process output; repeating a second cascading process one or more times, wherein the second cascading process comprises: performing a second sparse 3D convolution of a second input point cloud to generate a second convolution output point cloud; performing a second nonlinear activation process on the second convolution output point cloud to generate a second nonlinear output point cloud; and preparing the second nonlinear output point cloud as the second input point cloud if there is a next cycle of the second cascading process, wherein the third point cloud is the second input point cloud for the first cycle of the second cascading process, and wherein a last cycle of the second cascading process generates a second cascading process output; concatenating the first cascading process output and the second cascading process output to generate a concatenated output; and adding the third point cloud to the concatenated output to generate the aggregated feature.

제3 예시적인 방법의 일부 실시예들에서, 적어도 하나의 특징을 집계하는 단계는: 제1 캐스케이딩 프로세스를 한 번 이상 반복하는 단계 - 제1 캐스케이딩 프로세스는: 제1 입력 포인트 클라우드의 제1 희소 3D 콘볼루션을 수행하여 제1 콘볼루션 출력 포인트 클라우드를 생성하는 단계; 제1 콘볼루션 출력 포인트 클라우드에 대해 제1 ReLU(rectifier linear unit) 활성화 프로세스를 수행하여 제1 ReLU 출력 포인트 클라우드를 생성하는 단계; 및 제1 캐스케이딩 프로세스의 다음 사이클이 있을 경우 제1 ReLU 출력 포인트 클라우드를 제1 입력 포인트 클라우드로 준비하는 단계를 포함하고, 제3 포인트 클라우드는 제1 캐스케이딩 프로세스의 첫 번째 사이클에 대한 제1 입력 포인트 클라우드이며, 제1 캐스케이딩 프로세스의 마지막 사이클은 제1 캐스케이딩 프로세스 출력을 생성함 -; 제2 캐스케이딩 프로세스를 한 번 이상 반복하는 단계 - 제2 캐스케이딩 프로세스는: 제2 입력 포인트 클라우드의 제2 희소 3D 콘볼루션을 수행하여 제2 콘볼루션 출력 포인트 클라우드를 생성하는 단계; 제2 콘볼루션 출력 포인트 클라우드에 대해 제2 ReLU(rectifier linear unit) 활성화 프로세스를 수행하여 제2 ReLU 출력 포인트 클라우드를 생성하는 단계; 및 제2 캐스케이딩 프로세스의 다음 사이클이 있을 경우 제2 ReLU 출력 포인트 클라우드를 제2 입력 포인트 클라우드로 준비하는 단계를 포함하고, 제3 포인트 클라우드는 제2 캐스케이딩 프로세스의 첫 번째 사이클에 대한 제2 입력 포인트 클라우드이며, 제2 캐스케이딩 프로세스의 마지막 사이클은 제2 캐스케이딩 프로세스 출력을 생성함 -; 제1 캐스케이딩 프로세스 출력과 제2 캐스케이딩 프로세스 출력을 연결하여 연결 출력을 생성하는 단계; 및 제3 포인트 클라우드를 연결 출력에 추가하여 집계된 특징을 생성하는 단계를 포함한다.In some embodiments of the third exemplary method, the step of aggregating at least one feature comprises: repeating a first cascading process one or more times, wherein the first cascading process comprises: performing a first sparse 3D convolution of a first input point cloud to generate a first convolution output point cloud; performing a first ReLU (rectifier linear unit) activation process on the first convolution output point cloud to generate a first ReLU output point cloud; and preparing the first ReLU output point cloud as the first input point cloud if there is a next cycle of the first cascading process, wherein the third point cloud is the first input point cloud for the first cycle of the first cascading process, and wherein a last cycle of the first cascading process generates a first cascading process output; repeating a second cascading process one or more times, wherein the second cascading process comprises: performing a second sparse 3D convolution of a second input point cloud to generate a second convolution output point cloud; performing a second ReLU (rectifier linear unit) activation process on the second convolution output point cloud to generate a second ReLU output point cloud; and preparing the second ReLU output point cloud as the second input point cloud if there is a next cycle of the second cascading process, wherein the third point cloud is the second input point cloud for the first cycle of the second cascading process, and wherein a last cycle of the second cascading process generates a second cascading process output; concatenating the first cascading process output and the second cascading process output to generate a concatenated output; and adding the third point cloud to the concatenated output to generate the aggregated feature.
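The two-branch aggregation described above (two parallel cascading processes, concatenation of their outputs, and addition of the third point cloud) can be sketched as follows. The sketch makes several assumptions: each branch is passed in as a callable, the example branch is a single 1x1x1 convolution (a per-voxel linear map) followed by ReLU as a stand-in for a longer cascade, and the channel counts of the two branches sum to the input channel count so that the residual addition is defined. All names are hypothetical.

```python
def pointwise_conv_relu(W):
    """Return a branch applying a 1x1x1 convolution (per-voxel linear map W) then ReLU.

    W is a c_out x c_in weight matrix; this is a simplified stand-in for a
    cascade of sparse 3D convolutions with larger kernels.
    """
    def branch(voxels):
        return {
            c: [max(0.0, sum(w * v for w, v in zip(row, f))) for row in W]
            for c, f in voxels.items()
        }
    return branch


def inception_resnet_block(voxels, branch1, branch2):
    """Concatenate two branch outputs per voxel and add the block input.

    voxels: dict {(x, y, z): [floats]} (the third point cloud)
    branch1, branch2: callables mapping such a dict to a dict with the same keys
    """
    out1 = branch1(voxels)
    out2 = branch2(voxels)
    aggregated = {}
    for coord, feat in voxels.items():
        concat = list(out1[coord]) + list(out2[coord])
        if len(concat) != len(feat):
            raise ValueError("concatenated channels must match input channels")
        aggregated[coord] = [a + b for a, b in zip(feat, concat)]
    return aggregated
```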

제3 예시적인 방법의 일부 실시예들에서, 적어도 하나의 특징을 집계하는 단계는: 제3 포인트 클라우드에 대해 셀프 어텐션 프로세스를 수행하는 단계; 제3 포인트 클라우드를 셀프 어텐션 프로세스 출력에 추가하여 MLP 프로세스 입력을 생성하는 단계; MLP 프로세스 입력에 대해 MLP 프로세스를 수행하는 단계; 및 MLP 프로세스 입력을 MLP 프로세스 출력에 추가하여 집계된 특징을 생성하는 단계를 포함한다.In some embodiments of the third exemplary method, the step of aggregating at least one feature comprises: performing a self-attention process on the third point cloud; adding the third point cloud to an output of the self-attention process to generate an MLP process input; performing an MLP process on the MLP process input; and adding the MLP process input to the MLP process output to generate an aggregated feature.

제3 예시적인 방법의 일부 실시예들에서, 셀프 어텐션 프로세스는 제3 포인트 클라우드의 복셀의 k개의 최근접 이웃에 기초하여 출력 특징을 생성한다.In some embodiments of the third exemplary method, the self-attention process generates output features based on the k nearest neighbors of a voxel of the third point cloud.
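The transformer-style aggregation described above (self-attention over the k nearest neighbors, a residual addition, an MLP process, and a second residual addition) can be sketched as follows. This is a simplified illustration under assumptions: queries, keys, and values are the raw features (identity projections), the neighbor set of a voxel includes the voxel itself, attention uses scaled dot-product scores, and the MLP is supplied by the caller as a per-voxel callable. The function names are hypothetical.

```python
import math


def knn(coords, i, k):
    """Indices of the k voxels nearest to voxel i (including i), by squared distance."""
    def dist2(j):
        return sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
    return sorted(range(len(coords)), key=lambda j: (dist2(j), j))[:k]


def self_attention(coords, feats, k):
    """Scaled dot-product attention restricted to each voxel's k nearest neighbors."""
    scale = 1.0 / math.sqrt(len(feats[0]))
    out = []
    for i, q in enumerate(feats):
        nbrs = knn(coords, i, k)
        scores = [scale * sum(a * b for a, b in zip(q, feats[j])) for j in nbrs]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * feats[j][c] for w, j in zip(weights, nbrs)) / total
            for c in range(len(q))
        ])
    return out


def transformer_block(coords, feats, k, mlp):
    """Self-attention + residual, then MLP + residual, yielding aggregated features."""
    attn = self_attention(coords, feats, k)
    mlp_in = [[a + b for a, b in zip(f, a_out)] for f, a_out in zip(feats, attn)]
    mlp_out = [mlp(f) for f in mlp_in]
    return [[a + b for a, b in zip(x, y)] for x, y in zip(mlp_in, mlp_out)]
```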

제3 예시적인 방법의 일부 실시예들에서, 제3 포인트 클라우드의 적어도 하나의 특징을 집계하는 단계는 특징 집계 프로세스를 두 번 이상 수행하는 단계를 포함한다.In some embodiments of the third exemplary method, the step of aggregating at least one feature of the third point cloud comprises performing the feature aggregation process two or more times.

제3 예시적인 방법의 일부 실시예들에서, 제1 신경 네트워크 파라미터 세트와 제2 신경 네트워크 파라미터 세트는 동일한 신경 네트워크 파라미터 세트이며, 동일한 신경 네트워크 파라미터 세트가 적어도 제1 신경 네트워크와 제2 신경 네트워크에 의해 사용된다.In some embodiments of the third exemplary method, the first neural network parameter set and the second neural network parameter set are the same neural network parameter set, and the same neural network parameter set is used by at least the first neural network and the second neural network.

제3 예시적인 방법의 일부 실시예들에서, 제1 신경 네트워크 파라미터 세트와 제2 신경 네트워크 파라미터 세트는 구별되지만 동일한 신경 네트워크 파라미터 세트이다.In some embodiments of the third exemplary method, the first neural network parameter set and the second neural network parameter set are distinct but identical neural network parameter sets.

일부 실시예들에 따른 제4 예시적인 방법은: 제1 포인트 클라우드를 획득하는 단계; 제1 포인트 클라우드의 적어도 하나의 복셀의 점유 상태를 결정하는 단계; 결정된 점유 상태에 따라, 비어 있는 것으로 분류된 제1 포인트 클라우드의 복셀들을 제거하여 제2 포인트 클라우드를 생성하는 단계; 제2 포인트 클라우드의 특징들을 콘텍스트 정보와 연관시켜 제3 포인트 클라우드를 획득하는 단계; 초기 다운샘플링을 사용하여 제3 포인트 클라우드를 다운샘플링하여 제4 포인트 클라우드를 획득하는 단계; 및 제4 포인트 클라우드를 인코딩된 포인트 클라우드로서 출력하는 단계를 포함할 수 있다.A fourth exemplary method according to some embodiments may include: obtaining a first point cloud; determining an occupancy state of at least one voxel of the first point cloud; generating a second point cloud by removing voxels of the first point cloud classified as empty according to the determined occupancy state; obtaining a third point cloud by associating features of the second point cloud with context information; downsampling the third point cloud using initial downsampling to obtain a fourth point cloud; and outputting the fourth point cloud as an encoded point cloud.

일부 실시예들에 따른 제4 예시적인 장치는: 프로세서; 및 프로세서에 의해 실행될 때, 장치로 하여금 제1 포인트 클라우드를 획득하게 하고; 제1 포인트 클라우드의 적어도 하나의 복셀의 점유 상태를 결정하게 하며; 결정된 점유 상태에 따라, 비어 있는 것으로 분류된 제1 포인트 클라우드의 복셀들을 제거하여 제2 포인트 클라우드를 생성하게 하고; 제2 포인트 클라우드의 특징들을 콘텍스트 정보와 연관시켜 제3 포인트 클라우드를 획득하게 하며; 초기 다운샘플링을 사용하여 제3 포인트 클라우드를 다운샘플링하여 제4 포인트 클라우드를 획득하게 하고; 제4 포인트 클라우드를 인코딩된 포인트 클라우드로서 출력하게 하도록 작동하는 명령어들을 저장한 비일시적 컴퓨터 판독 가능 매체를 포함할 수 있다.A fourth exemplary device according to some embodiments may include: a processor; and a non-transitory computer-readable medium storing instructions that, when executed by the processor, cause the device to obtain a first point cloud; determine an occupancy state of at least one voxel of the first point cloud; generate a second point cloud by removing voxels of the first point cloud classified as empty according to the determined occupancy state; obtain a third point cloud by associating features of the second point cloud with context information; obtain a fourth point cloud by downsampling the third point cloud using initial downsampling; and output the fourth point cloud as an encoded point cloud.

일부 실시예들에 따른 제5 예시적인 방법/장치는: 제1 포인트 클라우드를 포함하는 데이터에 액세스하는 단계; 및 제1 포인트 클라우드를 포함하는 데이터를 송신하는 단계를 포함할 수 있다.A fifth exemplary method/device according to some embodiments may include: accessing data including a first point cloud; and transmitting data including the first point cloud.

일부 실시예들에 따른 제5 예시적인 방법/장치는: 제1 포인트 클라우드를 포함하는 데이터에 액세스하도록 구성된 액세스 유닛; 및 제1 포인트 클라우드를 포함하는 데이터를 송신하도록 구성된 송신기를 포함할 수 있다.A fifth exemplary method/device according to some embodiments may include: an access unit configured to access data including a first point cloud; and a transmitter configured to transmit data including the first point cloud.

일부 실시예들에 따른 제6 예시적인 방법/장치는: 프로세서; 및 프로세서에 의해 실행될 때, 장치로 하여금 위에 나열된 방법들 중 어느 하나를 수행하게 하도록 작동하는 명령어들을 저장한 비일시적 컴퓨터 판독 가능 매체를 포함할 수 있다.A sixth exemplary method/device according to some embodiments may include: a processor; and a non-transitory computer-readable medium storing instructions that, when executed by the processor, operate to cause the device to perform any one of the methods listed above.

일부 실시예들에 따른 제7 예시적인 방법/장치는 위에 나열된 방법들 중 어느 하나를 수행하도록 구성된 적어도 하나의 프로세서를 포함할 수 있다.A seventh exemplary method/device according to some embodiments may include at least one processor configured to perform any one of the methods listed above.

일부 실시예들에 따른 제8 예시적인 방법/장치는 하나 이상의 프로세서로 하여금 위에 나열된 방법들 중 어느 하나를 수행하게 하기 위한 명령어들을 저장한 컴퓨터 판독 가능 매체를 포함할 수 있다.An eighth exemplary method/device according to some embodiments may include a computer-readable medium storing instructions for causing one or more processors to perform any one of the methods listed above.

일부 실시예들에 따른 제9 예시적인 방법/장치는 적어도 하나의 프로세서 및 적어도 하나의 프로세서로 하여금 위에 나열된 방법들 중 어느 하나를 수행하게 하기 위한 명령어들을 저장한 적어도 하나의 비일시적 컴퓨터 판독 가능 매체를 포함할 수 있다.A ninth exemplary method/device according to some embodiments may include at least one processor and at least one non-transitory computer-readable medium storing instructions for causing the at least one processor to perform any one of the methods listed above.

일부 실시예들에 따른 예시적인 신호는 위에 나열된 방법들 중 어느 하나에 따라 생성된 비트스트림을 포함할 수 있다.An exemplary signal according to some embodiments may include a bitstream generated according to any of the methods listed above.

추가적인 실시예들에서, 본 명세서에 설명된 방법들을 수행하기 위해 인코더 및 디코더 장치가 제공된다. 인코더 또는 디코더 장치는 본 명세서에 설명된 방법들을 수행하도록 구성된 프로세서를 포함할 수 있다. 이 장치는 본 명세서에 설명된 방법들을 수행하기 위한 명령어들을 저장한 컴퓨터 판독 가능 매체(예를 들면, 비일시적 매체)를 포함할 수 있다. 일부 실시예들에서, 컴퓨터 판독 가능 매체(예를 들면, 비일시적 매체)는 본 명세서에 설명된 방법들 중 어느 하나를 사용하여 인코딩된 비디오를 저장한다.In additional embodiments, an encoder and decoder device are provided for performing the methods described herein. The encoder or decoder device may include a processor configured to perform the methods described herein. The device may include a computer-readable medium (e.g., a non-transitory medium) storing instructions for performing the methods described herein. In some embodiments, the computer-readable medium (e.g., a non-transitory medium) stores video encoded using any one of the methods described herein.

본 실시예들 중 하나 이상은 또한 위에서 설명된 방법들 중 어느 하나에 따라 양방향 광학 흐름을 수행하거나, 비디오 데이터를 인코딩 또는 디코딩하기 위한 명령어들이 저장된 컴퓨터 판독 가능 저장 매체를 제공한다. 본 실시예들은 또한 위에서 설명된 방법들에 따라 생성된 비트스트림이 저장된 컴퓨터 판독 가능 저장 매체를 제공한다. 본 실시예들은 또한 위에서 설명된 방법들에 따라 생성된 비트스트림을 송신하기 위한 방법 및 장치를 제공한다. 본 실시예들은 또한 설명된 방법들 중 어느 하나를 수행하기 위한 명령어들을 포함하는 컴퓨터 프로그램 제품을 제공한다.One or more of the present embodiments also provide a computer-readable storage medium having stored thereon instructions for performing bidirectional optical flow, or encoding or decoding video data, according to any one of the methods described above. The present embodiments also provide a computer-readable storage medium having stored thereon a bitstream generated according to the methods described above. The present embodiments also provide a method and apparatus for transmitting a bitstream generated according to the methods described above. The present embodiments also provide a computer program product comprising instructions for performing any one of the methods described above.

도 1a는 일부 실시예들에 따른 예시적인 통신 시스템을 예시하는 시스템 다이어그램이다.
도 1b는 일부 실시예들에 따른 도 1a에 예시된 통신 시스템 내에서 사용될 수 있는 예시적인 WTRU(wireless transmit/receive unit)를 예시하는 시스템 다이어그램이다.
도 1c는 일부 실시예들에 따른 시스템에 대한 예시적인 인터페이스 세트를 예시하는 시스템 다이어그램이다.
도 2a는 포인트 클라우드의 예시적인 복셀 기반 표현을 보여주는 개략적인 예시이다.
도 2b는 포인트 클라우드의 예시적인 희소 복셀 기반 표현을 보여주는 개략적인 예시이다.
도 3은 포인트 클라우드에 대한 예시적인 최근접 이웃(NN) 업샘플링을 보여주는 개략적인 프로세스 다이어그램이다.
도 4는 프루닝을 사용하는 예시적인 복셀 기반 업샘플링을 보여주는 개략적인 프로세스 다이어그램이다.
도 5는 일부 실시예들에 따른 프루닝을 사용하는 예시적인 콘텍스트 인식 복셀 기반 업샘플링을 보여주는 개략적인 프로세스 다이어그램이다.
도 6a는 일부 실시예들에 따른 예시적인 위치 값들을 예시하는 표이다.
도 6b는 일부 실시예들에 따른 콘텍스트 정보로서 예시적인 자식 복셀 위치들을 예시하는 개략적인 사시도이다.
도 7은 일부 실시예들에 따른 여러 콘텍스트 인식 업샘플링을 캐스케이딩하기 위한 예시적인 프로세스를 예시하는 흐름도이다.
도 8은 일부 실시예들에 따른 초기 특징 집계를 사용하는 예시적인 콘텍스트 인식 복셀 기반 업샘플링을 보여주는 개략적인 프로세스 다이어그램이다.
도 9는 일부 실시예들에 따른 이진 분류를 위한 예시적인 프로세스를 예시하는 흐름도이다.
도 10은 일부 실시예들에 따른 특징 집계를 위한 캐스케이딩된(cascaded) 희소 콘볼루션 계층들을 갖는 예시적인 프로세스를 예시하는 블록 다이어그램이다.
도 11은 일부 실시예들에 따른 특징 집계를 위한 예시적인 ResNet 블록을 예시하는 블록 다이어그램이다.
도 12는 일부 실시예들에 따른 특징 집계를 위한 예시적인 Inception-ResNet 블록을 예시하는 블록 다이어그램이다.
도 13은 일부 실시예들에 따른 특징 집계를 위한 예시적인 트랜스포머 블록을 예시하는 블록 다이어그램이다.
도 14는 일부 실시예들에 따른 셀프 어텐션 블록의 예시적인 아키텍처를 예시하는 블록 다이어그램이다.
도 15는 일부 실시예들에 따른 여러 특징 집계를 캐스케이딩하기 위한 예시적인 프로세스를 예시하는 흐름도이다.
도 16은 일부 실시예들에 따른 예시적인 원래 디코더 아키텍처를 예시하는 블록 다이어그램이다.
도 17은 일부 실시예들에 따른 복셀 기반 업샘플링이 있는 예시적인 디코더 아키텍처를 예시하는 블록 다이어그램이다.
도 18은 일부 실시예들에 따른 복셀 기반 업샘플링 및 특징 집계가 있는 예시적인 디코더 아키텍처를 예시하는 블록 다이어그램이다.
도 19는 일부 실시예들에 따른 특징-잔차 변환기가 없는 예시적인 디코더 아키텍처를 예시하는 블록 다이어그램이다.
도 20은 일부 실시예들에 따른 복셀 기반 업샘플링 및 특징 집계를 통한 단일 진행을 갖는 예시적인 디코더 아키텍처를 예시하는 블록 다이어그램이다.
도 21은 일부 실시예들에 따른 예시적인 희소 텐서 연산들을 예시하는 블록 다이어그램이다.
도 22는 일부 실시예들에 따른 예시적인 디코더 아키텍처를 예시하는 블록 다이어그램이다.
도 23은 일부 실시예들에 따른 예시적인 디코더 아키텍처를 예시하는 블록 다이어그램이다.
도 24는 일부 실시예들에 따른 프루닝을 사용하는 콘텍스트 인식 복셀 기반 업샘플링의 예시적인 프로세스를 예시하는 흐름도이다.
도 25는 일부 실시예들에 따른 콘텍스트 인식 복셀 기반 업샘플링 및 특징 집계의 예시적인 프로세스를 예시하는 흐름도이다.
도 26은 일부 실시예들에 따른 비트스트림을 인코딩하는 예시적인 프로세스를 예시하는 흐름도이다.
다양한 도면들에 묘사된 - 그리고 그와 관련하여 설명된 - 엔티티, 연결, 배열 등은 예로서 제시된 것이고 제한으로서 제시된 것이 아니다. 따라서, 특정 도면이 무엇을 "묘사하는지", 특정 도면 내의 특정 요소 또는 엔티티가 무엇"인지" 또는 무엇을 "가지고 있는지"에 대한 임의의 및 모든 진술들 또는 기타 지시들, 그리고 - 단독으로 맥락을 벗어나 절대적인 것으로 따라서 제한적인 것으로 읽힐 수 있는 - 임의의 및 모든 유사한 진술들은 "적어도 하나의 실시예에서, ..."와 같은 구절이 구조상 선행하는 것으로만 적절하게 읽힐 수 있다. 제시의 간결성과 명확성을 위해, 이러한 암시된 선행 구절은 상세한 설명에서 지루할 정도로 반복되지 않는다.
FIG. 1A is a system diagram illustrating an exemplary communications system according to some embodiments.
FIG. 1B is a system diagram illustrating an exemplary wireless transmit/receive unit (WTRU) that may be used within the communications system illustrated in FIG. 1A according to some embodiments.
FIG. 1C is a system diagram illustrating an exemplary set of interfaces for a system according to some embodiments.
FIG. 2A is a schematic illustration showing an exemplary voxel-based representation of a point cloud.
FIG. 2B is a schematic illustration showing an exemplary sparse voxel-based representation of a point cloud.
FIG. 3 is a schematic process diagram showing an exemplary nearest neighbor (NN) upsampling for a point cloud.
FIG. 4 is a schematic process diagram showing an exemplary voxel-based upsampling using pruning.
FIG. 5 is a schematic process diagram showing an exemplary context-aware voxel-based upsampling using pruning according to some embodiments.
FIG. 6A is a table illustrating exemplary position values according to some embodiments.
FIG. 6B is a schematic perspective view illustrating exemplary child voxel locations as context information according to some embodiments.
FIG. 7 is a flowchart illustrating an exemplary process for cascading multiple context-aware upsampling according to some embodiments.
FIG. 8 is a schematic process diagram showing an exemplary context-aware voxel-based upsampling using initial feature aggregation according to some embodiments.
FIG. 9 is a flowchart illustrating an exemplary process for binary classification according to some embodiments.
FIG. 10 is a block diagram illustrating an exemplary process having cascaded sparse convolutional layers for feature aggregation according to some embodiments.
FIG. 11 is a block diagram illustrating an exemplary ResNet block for feature aggregation according to some embodiments.
FIG. 12 is a block diagram illustrating an exemplary Inception-ResNet block for feature aggregation according to some embodiments.
FIG. 13 is a block diagram illustrating an exemplary transformer block for feature aggregation according to some embodiments.
FIG. 14 is a block diagram illustrating an exemplary architecture of a self-attention block according to some embodiments.
FIG. 15 is a flowchart illustrating an exemplary process for cascading multiple feature aggregations according to some embodiments.
FIG. 16 is a block diagram illustrating an exemplary original decoder architecture according to some embodiments.
FIG. 17 is a block diagram illustrating an exemplary decoder architecture with voxel-based upsampling according to some embodiments.
FIG. 18 is a block diagram illustrating an exemplary decoder architecture with voxel-based upsampling and feature aggregation according to some embodiments.
FIG. 19 is a block diagram illustrating an exemplary decoder architecture without a feature-residual converter according to some embodiments.
FIG. 20 is a block diagram illustrating an exemplary decoder architecture with a single pass through voxel-based upsampling and feature aggregation according to some embodiments.
FIG. 21 is a block diagram illustrating exemplary sparse tensor operations according to some embodiments.
FIG. 22 is a block diagram illustrating an exemplary decoder architecture according to some embodiments.
FIG. 23 is a block diagram illustrating an exemplary decoder architecture according to some embodiments.
FIG. 24 is a flowchart illustrating an exemplary process of context-aware voxel-based upsampling using pruning according to some embodiments.
FIG. 25 is a flowchart illustrating an exemplary process of context-aware voxel-based upsampling and feature aggregation according to some embodiments.
FIG. 26 is a flowchart illustrating an exemplary process for encoding a bitstream according to some embodiments.
The entities, connections, arrangements, and the like that are depicted in, and described in connection with, the various figures are presented by way of example and not by way of limitation. Accordingly, any and all statements or other indications as to what a particular figure "depicts," what a particular element or entity in a particular figure "is" or "has," and any and all similar statements that could, in isolation and out of context, be read as absolute and therefore limiting, are properly read only as being constructively preceded by a clause such as "In at least one embodiment, ...". For brevity and clarity of presentation, this implied leading clause is not repeated throughout the detailed description.

도 1a는 하나 이상의 개시된 실시예가 구현될 수 있는 예시적인 통신 시스템(100)을 예시하는 시스템 다이어그램이다. 통신 시스템(100)은, 음성, 데이터, 비디오, 메시징, 방송 등과 같은, 콘텐츠를 다수의 무선 사용자에게 제공하는 다중 액세스 시스템(multiple access system)일 수 있다. 통신 시스템(100)은 다수의 무선 사용자가, 무선 대역폭을 포함한, 시스템 자원의 공유를 통해 그러한 콘텐츠에 액세스하는 것을 가능하게 할 수 있다. 예를 들어, 통신 시스템(100)은, CDMA(code division multiple access), TDMA(time division multiple access), FDMA(frequency division multiple access), OFDMA(orthogonal FDMA), SC-FDMA(single-carrier FDMA), ZT UW DTS-s OFDM(zero-tail unique-word DFT-Spread OFDM), UW-OFDM(unique word OFDM), 자원 블록 필터링된 OFDM(resource block-filtered OFDM), FBMC(filter bank multicarrier) 등과 같은, 하나 이상의 채널 액세스 방법을 이용할 수 있다.FIG. 1A is a system diagram illustrating an exemplary communications system (100) in which one or more of the disclosed embodiments may be implemented. The communications system (100) may be a multiple access system that provides content, such as voice, data, video, messaging, broadcast, and the like, to multiple wireless users. The communications system (100) may enable multiple wireless users to access such content through sharing of system resources, including wireless bandwidth. For example, the communication system (100) may utilize one or more channel access methods, such as code division multiple access (CDMA), time division multiple access (TDMA), frequency division multiple access (FDMA), orthogonal FDMA (OFDMA), single-carrier FDMA (SC-FDMA), zero-tail unique-word DFT-Spread OFDM (ZT UW DTS-s OFDM), unique word OFDM (UW-OFDM), resource block-filtered OFDM, filter bank multicarrier (FBMC), etc.

도 1a에 도시된 바와 같이, 통신 시스템(100)은 무선 송신/수신 유닛들(WTRU들)(102a, 102b, 102c, 102d), RAN(104/113), CN(106), PSTN(public switched telephone network)(108), 인터넷(110), 및 다른 네트워크들(112)을 포함할 수 있지만, 개시된 실시예들이 임의의 수의 WTRU, 기지국, 네트워크, 및/또는 네트워크 요소를 고려하고 있음이 이해될 것이다. WTRU들(102a, 102b, 102c, 102d) 각각은 무선 환경에서 작동 및/또는 통신하도록 구성된 임의의 유형의 디바이스일 수 있다. 예로서, WTRU들(102a, 102b, 102c, 102d) - 이들 중 임의의 것은 "스테이션" 및/또는 "STA"라고 지칭될 수 있음 - 은 무선 신호들을 송신 및/또는 수신하도록 구성될 수 있고, UE(user equipment), 이동국, 고정 또는 모바일 가입자 유닛, 가입 기반 유닛, 호출기, 셀룰러 전화, PDA(personal digital assistant), 스마트폰, 랩톱, 넷북, 개인용 컴퓨터, 무선 센서, 핫스폿 또는 Mi-Fi 디바이스, IoT(Internet of Things) 디바이스, 시계 또는 다른 웨어러블, HMD(head-mounted display), 차량, 드론, 의료 디바이스 및 응용 분야(예를 들어, 원격 수술), 산업 디바이스 및 응용 분야(예를 들어, 산업 및/또는 자동화된 처리 체인 콘텍스트에서 작동하는 로봇 및/또는 다른 무선 디바이스), 소비자 전자 디바이스, 상업 및/또는 산업 무선 네트워크 상에서 작동하는 디바이스 등을 포함할 수 있다. WTRU들(102a, 102b, 102c 및 102d) 중 임의의 것은 UE라고 상호 교환적으로 지칭될 수 있다.As illustrated in FIG. 1a, the communications system (100) may include wireless transmit/receive units (WTRUs) (102a, 102b, 102c, 102d), a RAN (104/113), a CN (106), a public switched telephone network (PSTN) (108), the Internet (110), and other networks (112), although it will be appreciated that the disclosed embodiments contemplate any number of WTRUs, base stations, networks, and/or network elements. Each of the WTRUs (102a, 102b, 102c, 102d) may be any type of device configured to operate and/or communicate in a wireless environment. 
For example, the WTRUs (102a, 102b, 102c, 102d)—any of which may be referred to as a “station” and/or a “STA”—may be configured to transmit and/or receive wireless signals, and may include user equipment (UE), a mobile station, a fixed or mobile subscriber unit, a subscription-based unit, a pager, a cellular telephone, a personal digital assistant (PDA), a smartphone, a laptop, a netbook, a personal computer, a wireless sensor, a hotspot or Mi-Fi device, an Internet of Things (IoT) device, a watch or other wearable, a head-mounted display (HMD), a vehicle, a drone, medical devices and applications (e.g., remote surgery), industrial devices and applications (e.g., robots and/or other wireless devices operating in an industrial and/or automated process chain context), consumer electronics devices, devices operating on commercial and/or industrial wireless networks, and the like. Any of the WTRUs (102a, 102b, 102c, and 102d) may be interchangeably referred to as UEs.

통신 시스템(100)은 기지국(114a) 및/또는 기지국(114b)을 또한 포함할 수 있다. 기지국들(114a, 114b) 각각은, CN(106), 인터넷(110), 및/또는 다른 네트워크들(112)과 같은, 하나 이상의 통신 네트워크에 대한 액세스를 용이하게 하기 위해 WTRU들(102a, 102b, 102c, 102d) 중 적어도 하나와 무선으로 인터페이싱하도록 구성된 임의의 유형의 디바이스일 수 있다. 예로서, 기지국들(114a, 114b)은 BTS(base transceiver station), Node-B, eNode B, Home Node B, Home eNode B, gNB, NR NodeB, 사이트 제어기(site controller), 액세스 포인트(access point, AP), 무선 라우터(wireless router) 등일 수 있다. 기지국들(114a, 114b)이 각각 단일 요소로서 묘사되어 있지만, 기지국들(114a, 114b)이 임의의 수의 상호 연결된 기지국 및/또는 네트워크 요소를 포함할 수 있다는 것이 이해될 것이다.The communication system (100) may also include a base station (114a) and/or a base station (114b). Each of the base stations (114a, 114b) may be any type of device configured to wirelessly interface with at least one of the WTRUs (102a, 102b, 102c, 102d) to facilitate access to one or more communication networks, such as the CN (106), the Internet (110), and/or other networks (112). By way of example, the base stations (114a, 114b) may be a base transceiver station (BTS), a Node-B, an eNode B, a Home Node B, a Home eNode B, a gNB, a NR NodeB, a site controller, an access point (AP), a wireless router, and/or the like. Although the base stations (114a, 114b) are each depicted as a single element, it will be appreciated that the base stations (114a, 114b) may include any number of interconnected base stations and/or network elements.

기지국(114a)은, 기지국 제어기(base station controller, BSC), 라디오 네트워크 제어기(radio network controller, RNC), 릴레이 노드(relay node) 등과 같은, 다른 기지국들 및/또는 네트워크 요소들(도시되지 않음)을 또한 포함할 수 있는 RAN(104/113)의 일부일 수 있다. 기지국(114a) 및/또는 기지국(114b)은, 셀(cell)(도시되지 않음)이라고 지칭될 수 있는, 하나 이상의 캐리어 주파수에서 무선 신호들을 송신 및/또는 수신하도록 구성될 수 있다. 이러한 주파수들은 면허 스펙트럼(licensed spectrum), 비면허 스펙트럼(unlicensed spectrum), 또는 면허 스펙트럼과 비면허 스펙트럼의 조합에 있을 수 있다. 셀은 상대적으로 고정될 수 있거나 시간에 따라 변할 수 있는 특정 지리적 영역에 대한 무선 서비스를 위한 커버리지를 제공할 수 있다. 셀은 셀 섹터들로 추가로 분할될 수 있다. 예를 들어, 기지국(114a)과 연관된 셀이 세 개의 섹터로 분할될 수 있다. 따라서, 일 실시예에서, 기지국(114a)은, 즉, 셀의 각각의 섹터에 대해 하나씩, 세 개의 트랜시버를 포함할 수 있다. 실시예에서, 기지국(114a)은 MIMO(multiple-input multiple-output) 기술을 이용할 수 있고, 셀의 각각의 섹터에 대해 다수의 트랜시버를 활용할 수 있다. 예를 들어, 빔포밍은 신호들을 원하는 공간 방향들로 송신 및/또는 수신하는 데 사용될 수 있다.The base station (114a) may be part of a RAN (104/113) that may also include other base stations and/or network elements (not shown), such as a base station controller (BSC), a radio network controller (RNC), a relay node, etc. The base station (114a) and/or the base station (114b) may be configured to transmit and/or receive wireless signals on one or more carrier frequencies, which may be referred to as a cell (not shown). These frequencies may be in a licensed spectrum, an unlicensed spectrum, or a combination of licensed and unlicensed spectrum. A cell may provide coverage for wireless services for a particular geographic area, which may be relatively fixed or may vary over time. A cell may be further divided into cell sectors. For example, a cell associated with the base station (114a) may be divided into three sectors. Thus, in one embodiment, the base station (114a) may include three transceivers, one for each sector of the cell. In an embodiment, the base station (114a) may utilize multiple-input multiple-output (MIMO) technology and utilize multiple transceivers for each sector of the cell. For example, beamforming may be used to transmit and/or receive signals in desired spatial directions.

기지국들(114a, 114b)은 에어 인터페이스(air interface)(116)를 통해 WTRU들(102a, 102b, 102c, 102d) 중 하나 이상과 통신할 수 있으며, 에어 인터페이스(116)는 임의의 적합한 무선 통신 링크(예를 들면, 라디오 주파수(radio frequency, RF), 마이크로파, 센티미터 파, 마이크로미터 파, 적외선(IR), 자외선(UV), 가시 광 등)일 수 있다. 에어 인터페이스(116)는 임의의 적합한 RAT(radio access technology)를 사용하여 설정될 수 있다.The base stations (114a, 114b) may communicate with one or more of the WTRUs (102a, 102b, 102c, 102d) via an air interface (116), which may be any suitable wireless communication link (e.g., radio frequency (RF), microwave, centimeter wave, micrometer wave, infrared (IR), ultraviolet (UV), visible light, etc.). The air interface (116) may be established using any suitable radio access technology (RAT).

More specifically, as noted above, the communication system (100) may be a multiple-access system and may employ one or more channel access schemes, such as CDMA, TDMA, FDMA, OFDMA, SC-FDMA, and the like. For example, the base station (114a) in the RAN (104/113) and the WTRUs (102a, 102b, 102c) may implement a radio technology such as Universal Mobile Telecommunications System (UMTS) Terrestrial Radio Access (UTRA), which may establish the air interface (116) using wideband CDMA (WCDMA). WCDMA may include communication protocols such as High-Speed Packet Access (HSPA) and/or Evolved HSPA (HSPA+). HSPA may include High-Speed Downlink (DL) Packet Access (HSDPA) and/or High-Speed UL Packet Access (HSUPA).

In an embodiment, the base station (114a) and the WTRUs (102a, 102b, 102c) may implement a radio technology such as Evolved UMTS Terrestrial Radio Access (E-UTRA), which may establish the air interface (116) using Long Term Evolution (LTE) and/or LTE-Advanced (LTE-A) and/or LTE-Advanced Pro (LTE-A Pro).

In an embodiment, the base station (114a) and the WTRUs (102a, 102b, 102c) may implement a radio technology such as NR radio access, which may establish the air interface (116) using New Radio (NR).

In an embodiment, the base station (114a) and the WTRUs (102a, 102b, 102c) may implement multiple radio access technologies. For example, the base station (114a) and the WTRUs (102a, 102b, 102c) may implement LTE radio access and NR radio access together, for instance using dual connectivity (DC) principles. Thus, the air interface utilized by the WTRUs (102a, 102b, 102c) may be characterized by multiple types of radio access technologies and/or transmissions sent to/from multiple types of base stations (e.g., an eNB and a gNB).

In other embodiments, the base station (114a) and the WTRUs (102a, 102b, 102c) may implement radio technologies such as IEEE 802.11 (i.e., Wireless Fidelity (WiFi)), IEEE 802.16 (i.e., Worldwide Interoperability for Microwave Access (WiMAX)), CDMA2000, CDMA2000 1X, CDMA2000 EV-DO, Interim Standard 2000 (IS-2000), Interim Standard 95 (IS-95), Interim Standard 856 (IS-856), Global System for Mobile communications (GSM), Enhanced Data rates for GSM Evolution (EDGE), GSM EDGE (GERAN), and the like.

The base station (114b) in FIG. 1A may be, for example, a wireless router, a Home Node B, a Home eNode B, or an access point, and may utilize any suitable RAT to facilitate wireless connectivity in a localized area, such as a place of business, a home, a vehicle, a campus, an industrial facility, an air corridor (e.g., for use by drones), a roadway, and the like. In one embodiment, the base station (114b) and the WTRUs (102c, 102d) may implement a radio technology such as IEEE 802.11 to establish a wireless local area network (WLAN). In an embodiment, the base station (114b) and the WTRUs (102c, 102d) may implement a radio technology such as IEEE 802.15 to establish a wireless personal area network (WPAN). In yet another embodiment, the base station (114b) and the WTRUs (102c, 102d) may utilize a cellular-based RAT (e.g., WCDMA, CDMA2000, GSM, LTE, LTE-A, LTE-A Pro, NR, etc.) to establish a picocell or femtocell. As shown in FIG. 1A, the base station (114b) may have a direct connection to the Internet (110). Thus, the base station (114b) may not be required to access the Internet (110) via the CN (106).

The RAN (104/113) may be in communication with the CN (106), which may be any type of network configured to provide voice, data, applications, and/or voice over internet protocol (VoIP) services to one or more of the WTRUs (102a, 102b, 102c, 102d). The data may have varying quality of service (QoS) requirements, such as differing throughput requirements, latency requirements, error tolerance requirements, reliability requirements, data throughput requirements, mobility requirements, and the like. The CN (106) may provide call control, billing services, mobile location-based services, pre-paid calling, Internet connectivity, video distribution, etc., and/or may perform high-level security functions, such as user authentication. Although not shown in FIG. 1A, it will be appreciated that the RAN (104/113) and/or the CN (106) may be in direct or indirect communication with other RANs that employ the same RAT as the RAN (104/113) or a different RAT. For example, in addition to being connected to the RAN (104/113), which may be utilizing an NR radio technology, the CN (106) may also be in communication with another RAN (not shown) employing a GSM, UMTS, CDMA 2000, WiMAX, E-UTRA, or WiFi radio technology.

The CN (106) may also serve as a gateway for the WTRUs (102a, 102b, 102c, 102d) to access the PSTN (108), the Internet (110), and/or the other networks (112). The PSTN (108) may include circuit-switched telephone networks that provide plain old telephone service (POTS). The Internet (110) may include a global system of interconnected computer networks and devices that use common communication protocols, such as the transmission control protocol (TCP), the user datagram protocol (UDP), and/or the internet protocol (IP) in the TCP/IP internet protocol suite. The networks (112) may include wired and/or wireless communications networks owned and/or operated by other service providers. For example, the networks (112) may include another CN connected to one or more RANs, which may employ the same RAT as the RAN (104/113) or a different RAT.

Some or all of the WTRUs (102a, 102b, 102c, 102d) in the communication system (100) may include multi-mode capabilities (e.g., the WTRUs (102a, 102b, 102c, 102d) may include multiple transceivers for communicating with different wireless networks over different wireless links). For example, the WTRU (102c) shown in FIG. 1A may be configured to communicate with the base station (114a), which may employ a cellular-based radio technology, and with the base station (114b), which may employ an IEEE 802 radio technology.

FIG. 1B is a system diagram illustrating an example WTRU (102). As shown in FIG. 1B, the WTRU (102) may include a processor (118), a transceiver (120), a transmit/receive element (122), a speaker/microphone (124), a keypad (126), a display/touchpad (128), non-removable memory (130), removable memory (132), a power source (134), a global positioning system (GPS) chipset (136), and/or other peripherals (138), among others. It will be appreciated that the WTRU (102) may include any sub-combination of the foregoing elements while remaining consistent with an embodiment.

The processor (118) may be a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) circuit, any other type of integrated circuit (IC), a state machine, and the like. The processor (118) may perform signal coding, data processing, power control, input/output processing, and/or any other functionality that enables the WTRU (102) to operate in a wireless environment. The processor (118) may be coupled to the transceiver (120), which may be coupled to the transmit/receive element (122). Although FIG. 1B depicts the processor (118) and the transceiver (120) as separate components, it will be appreciated that the processor (118) and the transceiver (120) may be integrated together in an electronic package or chip.

The transmit/receive element (122) may be configured to transmit signals to, or receive signals from, a base station (e.g., the base station (114a)) over the air interface (116). For example, in one embodiment, the transmit/receive element (122) may be an antenna configured to transmit and/or receive RF signals. In an embodiment, the transmit/receive element (122) may be an emitter/detector configured to transmit and/or receive IR, UV, or visible light signals, for example. In yet another embodiment, the transmit/receive element (122) may be configured to transmit and/or receive both RF and light signals. It will be appreciated that the transmit/receive element (122) may be configured to transmit and/or receive any combination of wireless signals.

Although the transmit/receive element (122) is depicted in FIG. 1B as a single element, the WTRU (102) may include any number of transmit/receive elements (122). More specifically, the WTRU (102) may employ MIMO technology. Thus, in one embodiment, the WTRU (102) may include two or more transmit/receive elements (122) (e.g., multiple antennas) for transmitting and receiving wireless signals over the air interface (116).

The transceiver (120) may be configured to modulate the signals that are to be transmitted by the transmit/receive element (122) and to demodulate the signals that are received by the transmit/receive element (122). As noted above, the WTRU (102) may have multi-mode capabilities. Thus, the transceiver (120) may include multiple transceivers for enabling the WTRU (102) to communicate via multiple RATs, such as NR and IEEE 802.11, for example.
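By way of non-limiting illustration only, the modulate/demodulate roles described above may be sketched with a simple QPSK mapping. The choice of QPSK, the Gray-coded constellation, the fixed perturbation standing in for channel noise, and the names `QPSK_MAP`, `modulate`, and `demodulate` are assumptions made for this example; the description does not specify any particular modulation scheme.

```python
import math

# Gray-coded QPSK constellation (an assumed mapping for illustration).
QPSK_MAP = {
    (0, 0): complex(1, 1),
    (0, 1): complex(-1, 1),
    (1, 1): complex(-1, -1),
    (1, 0): complex(1, -1),
}
SCALE = 1 / math.sqrt(2)  # normalizes every symbol to unit energy


def modulate(bits):
    """Map consecutive bit pairs to unit-energy QPSK symbols."""
    if len(bits) % 2:
        raise ValueError("QPSK needs an even number of bits")
    return [QPSK_MAP[(bits[i], bits[i + 1])] * SCALE
            for i in range(0, len(bits), 2)]


def demodulate(symbols):
    """Recover bits by choosing the nearest constellation point."""
    bits = []
    for s in symbols:
        pair = min(QPSK_MAP, key=lambda b: abs(s - QPSK_MAP[b] * SCALE))
        bits.extend(pair)
    return bits


tx_bits = [0, 0, 0, 1, 1, 1, 1, 0]
symbols = modulate(tx_bits)
# A small fixed offset stands in for channel noise in this sketch.
noisy = [s + complex(0.1, -0.1) for s in symbols]
rx_bits = demodulate(noisy)
```

In this sketch the perturbation is well inside the decision regions, so the demodulated bits match the transmitted bits.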

The processor (118) of the WTRU (102) may be coupled to, and may receive user input data from, the speaker/microphone (124), the keypad (126), and/or the display/touchpad (128) (e.g., a liquid crystal display (LCD) display unit or an organic light-emitting diode (OLED) display unit). The processor (118) may also output user data to the speaker/microphone (124), the keypad (126), and/or the display/touchpad (128). In addition, the processor (118) may access information from, and store data in, any type of suitable memory, such as the non-removable memory (130) and/or the removable memory (132). The non-removable memory (130) may include random-access memory (RAM), read-only memory (ROM), a hard disk, or any other type of memory storage device. The removable memory (132) may include a subscriber identity module (SIM) card, a memory stick, a secure digital (SD) memory card, and the like. In other embodiments, the processor (118) may access information from, and store data in, memory that is not physically located on the WTRU (102), such as on a server or a home computer (not shown).

The processor (118) may receive power from the power source (134) and may be configured to distribute and/or control the power to the other components in the WTRU (102). The power source (134) may be any suitable device for powering the WTRU (102). For example, the power source (134) may include one or more dry cell batteries (e.g., nickel-cadmium (NiCd), nickel-zinc (NiZn), nickel metal hydride (NiMH), lithium-ion (Li-ion), etc.), solar cells, fuel cells, and the like.

The processor (118) may also be coupled to the GPS chipset (136), which may be configured to provide location information (e.g., longitude and latitude) regarding the current location of the WTRU (102). In addition to, or in lieu of, the information from the GPS chipset (136), the WTRU (102) may receive location information over the air interface (116) from a base station (e.g., the base stations (114a, 114b)) and/or may determine its location based on the timing of the signals being received from two or more nearby base stations. It will be appreciated that the WTRU (102) may acquire location information by way of any suitable location-determination method while remaining consistent with an embodiment.
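By way of non-limiting illustration only, one possible timing-based location method is sketched below as a coarse grid search over time differences of arrival. Every particular here is an assumption made for the example: the use of three base stations at known two-dimensional coordinates, simultaneous transmission by all stations, the 10 m grid, and the names `arrival_times` and `locate_by_tdoa`. The description does not prescribe this or any other specific method.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # meters per second


def arrival_times(position, stations):
    """Propagation delay from each station to a position (same-instant send)."""
    return [math.dist(position, s) / SPEED_OF_LIGHT for s in stations]


def locate_by_tdoa(times, stations, extent=1000, step=10):
    """Grid search for the point whose arrival-time differences fit best."""
    # Differences relative to the first station remove the unknown send time.
    measured = [t - times[0] for t in times[1:]]
    best, best_err = None, float("inf")
    for x in range(0, extent + 1, step):
        for y in range(0, extent + 1, step):
            cand = arrival_times((x, y), stations)
            predicted = [t - cand[0] for t in cand[1:]]
            err = sum((m - p) ** 2 for m, p in zip(measured, predicted))
            if err < best_err:
                best, best_err = (x, y), err
    return best


# Hypothetical station layout and device position, in meters.
stations = [(0.0, 0.0), (1000.0, 0.0), (0.0, 1000.0)]
true_position = (300.0, 400.0)
estimate = locate_by_tdoa(arrival_times(true_position, stations), stations)
```

A practical implementation would more likely use a closed-form or iterative least-squares solver and would account for measurement noise; the exhaustive search is used here only because it keeps the idea visible.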

The processor (118) may further be coupled to other peripherals (138), which may include one or more software and/or hardware modules that provide additional features, functionality, and/or wired or wireless connectivity. For example, the peripherals (138) may include an accelerometer, an e-compass, a satellite transceiver, a digital camera (for photographs and/or video), a universal serial bus (USB) port, a vibration device, a television transceiver, a hands-free headset, a Bluetooth® module, a frequency modulated (FM) radio unit, a digital music player, a media player, a video game player module, an Internet browser, a virtual reality and/or augmented reality (VR/AR) device, an activity tracker, and the like. The peripherals (138) may include one or more sensors, and the sensors may be one or more of a gyroscope, an accelerometer, a Hall effect sensor, a magnetometer, an orientation sensor, a proximity sensor, a temperature sensor, a time sensor, a geolocation sensor, an altimeter, a light sensor, a touch sensor, a barometer, a gesture sensor, a biometric sensor, and/or a humidity sensor.

The WTRU (102) may include a full duplex radio for which transmission and reception of some or all of the signals (e.g., associated with particular subframes for both the UL (e.g., for transmission) and the downlink (e.g., for reception)) may be concurrent and/or simultaneous. The full duplex radio may include an interference management unit to reduce and/or substantially eliminate self-interference via either hardware (e.g., a choke) or signal processing via a processor (e.g., a separate processor (not shown) or the processor (118)). In an embodiment, the WTRU (102) may include a half-duplex radio for which transmission and reception of some or all of the signals are associated with particular subframes for either the UL (e.g., for transmission) or the downlink (e.g., for reception).
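By way of non-limiting illustration only, the signal-processing route to reducing self-interference may be sketched as estimating how strongly the device's own transmission leaks into its receiver and then subtracting that estimate. The single complex leakage coefficient, the least-squares estimator, the sample values, and the names `estimate_leakage` and `cancel_self_interference` are assumptions made for this example; the description does not detail how the interference management unit operates.

```python
def estimate_leakage(tx, rx):
    """Least-squares estimate of one complex leakage coefficient."""
    num = sum(r * t.conjugate() for r, t in zip(rx, tx))
    den = sum(abs(t) ** 2 for t in tx)
    return num / den


def cancel_self_interference(tx, rx):
    """Subtract the estimated copy of the device's own transmission."""
    h = estimate_leakage(tx, rx)
    return [r - h * t for r, t in zip(rx, tx)]


# Known transmitted samples and a weak desired signal. The desired signal
# is deliberately chosen orthogonal to tx so the estimate is unbiased.
tx = [complex(1, 0), complex(-1, 0), complex(1, 0), complex(-1, 0)]
desired = [complex(0.01, 0)] * 4
leakage = complex(0.8, 0.3)  # assumed self-interference coupling
rx = [leakage * t + d for t, d in zip(tx, desired)]

residual = cancel_self_interference(tx, rx)
```

After cancellation, the residual in this sketch is essentially the weak desired signal, even though the self-interference was far stronger before subtraction.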

Although the WTRU is described in FIGS. 1A and 1B as a wireless terminal, it is contemplated that in certain representative embodiments such a terminal may use (e.g., temporarily or permanently) wired communication interfaces with the communication network.

In representative embodiments, the other network (112) may be a WLAN.

In view of FIGS. 1A and 1B and the corresponding description, one or more, or all, of the functions described herein may be performed by one or more emulation devices (not shown). The emulation devices may be one or more devices configured to emulate one or more, or all, of the functions described herein. For example, the emulation devices may be used to test other devices and/or to simulate network and/or WTRU functions.

The emulation devices may be designed to implement one or more tests of other devices in a lab environment and/or in an operator network environment. For example, the one or more emulation devices may perform the one or more, or all, functions while being fully or partially implemented and/or deployed as part of a wired and/or wireless communication network in order to test other devices within the communication network. The one or more emulation devices may perform the one or more, or all, functions while being temporarily implemented/deployed as part of a wired and/or wireless communication network. The emulation device may be directly coupled to another device for purposes of testing and/or may perform testing using over-the-air wireless communications.

The one or more emulation devices may perform the one or more functions, including all functions, while not being implemented/deployed as part of a wired and/or wireless communication network. For example, the emulation devices may be utilized in a testing scenario in a testing laboratory and/or a non-deployed (e.g., testing) wired and/or wireless communication network in order to implement testing of one or more components. The one or more emulation devices may be test equipment. Direct RF coupling and/or wireless communications via RF circuitry (which may include, for example, one or more antennas) may be used by the emulation devices to transmit and/or receive data.

FIG. 1C is a system diagram illustrating an example set of interfaces for a system according to some embodiments. An extended reality display device, together with its control electronics, may be implemented. The system (150) may be embodied as a device including the various components described below and is configured to perform one or more of the aspects described in this document. Examples of such devices include, but are not limited to, various electronic devices such as personal computers, laptop computers, smartphones, tablet computers, digital multimedia set-top boxes, digital television receivers, personal video recording systems, connected home appliances, and servers. Elements of the system (150), singly or in combination, may be embodied in a single integrated circuit (IC), multiple ICs, and/or discrete components. For example, in at least one embodiment, the processing and encoder/decoder elements of the system (150) are distributed across multiple ICs and/or discrete components. In various embodiments, the system (150) is communicatively coupled to one or more other systems, or other electronic devices, via, for example, a communications bus or through dedicated input and/or output ports. In various embodiments, the system (150) is configured to implement one or more of the aspects described in this document.

The system (150) includes at least one processor (152) configured to execute instructions loaded therein for implementing, for example, the various aspects described in this document. The processor (152) may include embedded memory, an input/output interface, and various other circuitries as known in the art. The system (150) includes at least one memory (154) (e.g., a volatile memory device and/or a non-volatile memory device). The system (150) may include a storage device (158), which may include non-volatile memory and/or volatile memory, including, but not limited to, Electrically Erasable Programmable Read-Only Memory (EEPROM), Read-Only Memory (ROM), Programmable Read-Only Memory (PROM), Random Access Memory (RAM), Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), flash, a magnetic disk drive, and/or an optical disk drive. The storage device (158) may include, as non-limiting examples, an internal storage device, an attached storage device (including detachable and non-detachable storage devices), and/or a network-accessible storage device.

The system (150) includes an encoder/decoder module (156) configured, for example, to process data to provide encoded video or decoded video, and the encoder/decoder module (156) may include its own processor and memory. The encoder/decoder module (156) represents the module(s) that may be included in a device to perform the encoding and/or decoding functions. As is known, a device may include one or both of the encoding and decoding modules. Additionally, the encoder/decoder module (156) may be implemented as a separate element of the system (150) or may be incorporated within the processor (152) as a combination of hardware and software, as known to those skilled in the art.

Program code to be loaded onto the processor (152) or the encoder/decoder module (156) to perform the various aspects described in this document may be stored in the storage device (158) and subsequently loaded onto the memory (154) for execution by the processor (152). In accordance with various embodiments, one or more of the processor (152), the memory (154), the storage device (158), and the encoder/decoder module (156) may store one or more of various items during the performance of the processes described in this document. Such stored items may include, but are not limited to, the input video, the decoded video or portions of the decoded video, the bitstream, matrices, variables, and intermediate or final results from the processing of equations, formulas, operations, and operational logic.

In some embodiments, memory inside the processor (152) and/or the encoder/decoder module (156) is used to store instructions and to provide working memory for processing that is needed during encoding or decoding. In other embodiments, however, a memory external to the processing device (for example, the processing device may be either the processor (152) or the encoder/decoder module (156)) is used for one or more of these functions. The external memory may be the memory (154) and/or the storage device (158), for example, a dynamic volatile memory and/or a non-volatile flash memory. In several embodiments, an external non-volatile flash memory is used to store the operating system of, for example, a television. In at least one embodiment, a fast external dynamic volatile memory such as a RAM is used as working memory for video coding and decoding operations, such as for MPEG-2 (MPEG refers to the Moving Picture Experts Group; MPEG-2 is also referred to as ISO/IEC 13818, 13818-1 is also known as H.222, and 13818-2 is also known as H.262), HEVC (HEVC refers to High Efficiency Video Coding, also known as H.265 and MPEG-H Part 2), or VVC (Versatile Video Coding, a new standard being developed by JVET, the Joint Video Experts Team).

Inputs to the elements of the system (150) may be provided through various input devices, as indicated in block (172). Such input devices include, but are not limited to, (i) a radio frequency (RF) portion that receives an RF signal transmitted, for example, over the air by a broadcaster, (ii) a component (COMP) input terminal (or a set of COMP input terminals), (iii) a Universal Serial Bus (USB) input terminal, and/or (iv) a High Definition Multimedia Interface (HDMI) input terminal. Other examples, not shown in FIG. 1C, include composite video.

In various embodiments, the input devices of block (172) have associated respective input processing elements, as known in the art. For example, the RF portion may be associated with elements suitable for (i) selecting a desired frequency (also referred to as selecting a signal, or band-limiting a signal to a band of frequencies), (ii) down-converting the selected signal, (iii) band-limiting again to a narrower band of frequencies to select (for example) a signal frequency band which may be referred to as a channel in certain embodiments, (iv) demodulating the down-converted and band-limited signal, (v) performing error correction, and (vi) demultiplexing to select the desired stream of data packets. The RF portion of various embodiments includes one or more elements to perform these functions, for example, frequency selectors, signal selectors, band-limiters, channel selectors, filters, down-converters, demodulators, error correctors, and demultiplexers. The RF portion may include a tuner that performs several of these functions, including, for example, down-converting the received signal to a lower frequency (for example, an intermediate frequency or a near-baseband frequency) or to baseband. In one set-top box embodiment, the RF portion and its associated input processing element receive an RF signal transmitted over a wired (for example, cable) medium, and perform frequency selection by filtering, down-converting, and filtering again to a desired frequency band. Various embodiments rearrange the order of the above-described (and other) elements, remove some of these elements, and/or add other elements performing similar or different functions. Adding elements may include inserting elements in between existing elements, for example, inserting amplifiers and an analog-to-digital converter. In various embodiments, the RF portion includes an antenna.

Additionally, the USB and/or HDMI terminals may include respective interface processors for connecting the system (150) to other electronic devices across USB and/or HDMI connections. It is to be understood that various aspects of input processing, for example, Reed-Solomon error correction, may be implemented, for example, within a separate input processing IC or within the processor (152) as necessary. Similarly, aspects of USB or HDMI interface processing may be implemented within separate interface ICs or within the processor (152) as necessary. The demodulated, error-corrected, and demultiplexed stream is provided to various processing elements, including, for example, the processor (152) and the encoder/decoder (156), which operate in combination with the memory and storage elements to process the data stream as necessary for presentation on an output device.

Various elements of the system (150) may be provided within an integrated housing. Within the integrated housing, the various elements may be interconnected and may transmit data between them using a suitable connection arrangement (174), for example, an internal bus as known in the art, including the Inter-IC (I2C) bus, wiring, and printed circuit boards.

The system (150) includes a communication interface (160) that enables communication with other devices via a communication channel (162). The communication interface (160) may include, but is not limited to, a transceiver configured to transmit and to receive data over the communication channel (162). The communication interface (160) may include, but is not limited to, a modem or a network card, and the communication channel (162) may be implemented, for example, within a wired and/or a wireless medium.

Data is streamed, or otherwise provided, to the system (150), in various embodiments, using a wireless network such as a Wi-Fi network, for example, IEEE 802.11 (IEEE refers to the Institute of Electrical and Electronics Engineers). The Wi-Fi signal of these embodiments is received over the communication channel (162) and the communication interface (160), which are adapted for Wi-Fi communications. The communication channel (162) of these embodiments is typically connected to an access point or router that provides access to external networks, including the Internet, to allow streaming applications and other over-the-top communications. Other embodiments provide streamed data to the system (150) using a set-top box that delivers the data over the HDMI connection of the input block (172). Still other embodiments provide streamed data to the system (150) using the RF connection of the input block (172). As indicated above, various embodiments provide data in a non-streaming manner. Additionally, various embodiments use wireless networks other than Wi-Fi, for example, a cellular network or a Bluetooth network.

The system (150) may provide an output signal to various output devices, including a display (176), speakers (178), and other peripheral devices (180). The display (176) of various embodiments includes one or more of, for example, a touchscreen display, an organic light-emitting diode (OLED) display, a curved display, and/or a foldable display. The display (176) may be for a television, a tablet, a laptop, a cell phone (mobile phone), or another device. The display (176) may also be integrated with other components (for example, as in a smartphone), or separate (for example, an external monitor for a laptop). The other peripheral devices (180) include, in various examples of embodiments, one or more of a stand-alone digital video disc (or digital versatile disc) (DVD, for both terms), a disc player, a stereo system, and/or a lighting system. Various embodiments use one or more peripheral devices (180) that provide a function based on the output of the system (150). For example, a disc player performs the function of playing the output of the system (150).

In various embodiments, control signals are communicated between the system (150) and the display (176), the speakers (178), or the other peripheral devices (180) using signaling such as AV.Link, Consumer Electronics Control (CEC), or other communication protocols that enable device-to-device control with or without user intervention. The output devices may be communicatively coupled to the system (150) via dedicated connections through respective interfaces (164, 166, and 168). Alternatively, the output devices may be connected to the system (150) using the communication channel (162) via the communication interface (160). The display (176) and the speakers (178) may be integrated in a single unit with the other components of the system (150) in an electronic device such as, for example, a television. In various embodiments, the display interface (164) includes a display driver, such as, for example, a timing controller (T Con) chip.

The display (176) and the speakers (178) may alternatively be separate from one or more of the other components, for example, if the RF portion of the input (172) is part of a separate set-top box. In various embodiments in which the display (176) and the speakers (178) are external components, the output signal may be provided via dedicated output connections, including, for example, HDMI ports, USB ports, or COMP outputs.

The system (150) may include one or more sensor devices (168). Examples of sensor devices that may be used include one or more GPS sensors, gyroscopic sensors, accelerometers, light sensors, cameras, depth cameras, microphones, and/or magnetometers. Such sensors may be used to determine information such as the user's position and orientation. Where the system (150) is used as the control module for an extended reality display (such as the control modules (124, 132)), the user's position and orientation may be used in determining how to render image data such that the user perceives the correct portion of a virtual object or virtual scene from the correct point of view. In the case of head-mounted display devices, the position and orientation of the device itself may be used to determine the position and orientation of the user for the purpose of rendering virtual content. In the case of other display devices, such as a phone, a tablet, a computer monitor, or a television, other inputs may be used to determine the position and orientation of the user for the purpose of rendering content. For example, a user may select and/or adjust a desired viewpoint and/or viewing direction with the use of a touchscreen, keypad or keyboard, trackball, joystick, or other input. Where the display device has sensors such as accelerometers and/or gyroscopes, the viewpoint and orientation used for the purpose of rendering content may be selected and/or adjusted based on motion of the display device.

The embodiments may be carried out by computer software implemented by the processor (152), by hardware, or by a combination of hardware and software. As a non-limiting example, the embodiments may be implemented by one or more integrated circuits. The memory (154) may be of any type appropriate to the technical environment and may be implemented using any appropriate data storage technology, such as optical memory devices, magnetic memory devices, semiconductor-based memory devices, fixed memory, and removable memory, as non-limiting examples. The processor (152) may be of any type appropriate to the technical environment and may encompass one or more of microprocessors, general purpose computers, special purpose computers, and processors based on a multi-core architecture, as non-limiting examples.

This application discusses point cloud processing and compression, which includes the processing, compression, representation, analysis, and understanding of point cloud signals. It also discusses an adaptive voxel-based point cloud upsampling method based on deep neural networks, including some embodiments applied to point cloud processing and compression, and it discusses performing the upsampling in the voxel domain.

Point cloud data may consume a large portion of network traffic, for example, among connected cars over a 5G network and in immersive (for example, AR/VR/MR) communications. Efficient representation formats may be used for point clouds and their communication. In particular, raw point cloud data may be organized and processed for modeling and sensing of, for example, the world, an environment, or a scene. Compression of raw point clouds may be used for storing and transmitting the data.

Furthermore, point clouds may represent sequential scans of the same scene, which may contain multiple moving objects. Dynamic point clouds capture moving objects, while static point clouds capture a static scene and/or static objects. Dynamic point clouds are typically organized into frames, with different frames being captured at different times. The processing and compression of dynamic point clouds may be performed in real time or with a low amount of delay.

The automotive industry, including autonomous vehicles, for example, may use point clouds. Autonomous cars "probe" their environment to make driving decisions based on their immediate surroundings. Typically, LiDAR sensors produce (dynamic) point clouds that are used by a perception engine. Furthermore, such point clouds are typically dynamic with a high capture frequency, sparse, not necessarily colored, and not intended to be viewed by the human eye. Such point clouds may include other attributes, such as the reflectance provided by the LiDAR, which may be indicative of the material of the sensed object and may be used in making decisions.

The automotive industry and autonomous cars are among the domains in which point clouds may be used. Autonomous cars “probe” and sense their environment to make good driving decisions based on the reality of their immediate surroundings. Sensors such as LiDARs produce (dynamic) point clouds that are used by a perception engine. These point clouds are typically not intended to be viewed by the human eye; they may or may not be colored, and they are typically sparse and dynamic with a high capture frequency. They may have other attributes, such as the reflectance provided by the LiDAR, because this attribute is indicative of the material of the sensed object and may help in making a decision.

Virtual reality (VR) and immersive worlds have become a hot topic and are foreseen by many as the future of 2D flat video. The viewer may be immersed in an all-around environment, as opposed to standard TV, where the viewer can only look at the virtual world in front of them. There are several gradations of immersion, depending on the freedom of the viewer in the environment. A point cloud format may be used to distribute VR worlds and environment data. Such point clouds may be static or dynamic and are typically of moderate size, for example, no more than a few million points at a time.

Point clouds may also be used for various other purposes, such as scanning cultural heritage objects and/or buildings, in which objects such as statues or buildings are scanned in 3D. The spatial configuration data of the object may be shared without sending or visiting the actual object or building. This data may also be used to preserve knowledge of the object in case the object or building is destroyed, for example, a temple destroyed by an earthquake. Such point clouds are typically static, colored, and huge in size.

Another use case is in topography and cartography using 3D representations, in which maps are not limited to the plane and may include the relief. For example, Google Maps may use meshes instead of point clouds for its 3D maps. Nevertheless, point clouds may be a suitable data format for 3D maps, and such point clouds are typically also static, colored, and huge in size.

World modeling and sensing via point clouds allows machines to record and use spatial configuration data about the 3D world around them, which may be used in the applications discussed above.

3D point cloud data includes discrete samples of the surfaces of objects or scenes. A huge number of points may be needed to fully represent the real world with point samples. For example, a typical VR immersive scene includes millions of points, while point clouds may typically include hundreds of millions of points. Processing such large-scale point clouds is therefore computationally expensive, especially on consumer devices that may have limited computational power, for example, smartphones, tablets, and automotive navigation systems.

To store and process an input point cloud at an affordable computational cost, the input point cloud may be downsampled, where the downsampled point cloud summarizes the geometry of the input point cloud while having far fewer points. The downsampled point cloud is input to a subsequent machine task for further processing. The downsampled point cloud may then be processed by progressively upsampling it. In particular, in some embodiments, a learning-based autoencoder architecture may use downsampling for feature extraction and upsampling for reconstruction. For example, such upsampling may be used in point cloud compression (for example, in the decoder) and in point cloud super-resolution. Among other things, this application discusses an example of an adaptive point cloud upsampling method according to some embodiments.
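As a purely illustrative sketch of how a downsampled cloud can summarize geometry with fewer points, the following Python fragment merges voxels on a coarser grid by halving integer coordinates. The function name `downsample_voxels`, the factor of two, and the sample data are assumptions made for illustration; the application does not prescribe this particular downsampling.

```python
from typing import Iterable, Set, Tuple

Voxel = Tuple[int, int, int]


def downsample_voxels(occupied: Iterable[Voxel], factor: int = 2) -> Set[Voxel]:
    """Merge each factor x factor x factor block of voxels into one parent voxel."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    # Floor division maps every child voxel to its parent on the coarser grid;
    # the set removes parents that are produced more than once.
    return {(x // factor, y // factor, z // factor) for x, y, z in occupied}


# Hypothetical fine-resolution occupied voxels.
fine = {(0, 0, 0), (1, 0, 0), (1, 1, 1), (2, 3, 0), (3, 3, 1), (6, 6, 6)}
coarse = downsample_voxels(fine)
print(len(fine), "->", len(coarse), sorted(coarse))
```

Each coarse voxel records only that at least one of its children was occupied, which is the information an upsampling stage later has to recover.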

FIG. 2A is a schematic illustration showing an example voxel-based representation of a point cloud. In a voxel-based representation of point cloud data, the 3D point coordinates are uniformly quantized by a quantization step. As shown in FIG. 2A, each point in the representation (200) corresponds to an occupied voxel whose size equals the quantization step. A “naive” voxel representation may not be efficient in memory usage, because most voxels may typically be empty. A sparse voxel representation is therefore introduced, in which the occupied voxels are arranged in a sparse tensor format for efficient storage and processing.

FIG. 2B is a schematic illustration showing an example sparse voxel-based representation of a point cloud. In the example sparse voxel representation (250) depicted in FIG. 2B, the empty voxels (252) (shown with dotted lines) do not necessarily consume as much memory or storage as the occupied voxels (254) (shown with solid diagonal lines).
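The difference between a dense and a sparse voxel representation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the helper names `voxelize` and `dense_cell_count`, the quantization step of 0.1, and the sample points are all hypothetical, and a plain Python set stands in for a sparse tensor.

```python
import math
from typing import Iterable, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxelize(points: Iterable[Point], step: float) -> Set[Voxel]:
    """Uniformly quantize 3D coordinates and keep only the occupied voxels."""
    if step <= 0:
        raise ValueError("step must be positive")
    # Points that fall into the same cell collapse into a single occupied voxel.
    return {
        (math.floor(x / step), math.floor(y / step), math.floor(z / step))
        for x, y, z in points
    }


def dense_cell_count(occupied: Set[Voxel]) -> int:
    """Number of cells a dense grid over the bounding box would have to store."""
    count = 1
    for axis in range(3):
        values = [v[axis] for v in occupied]
        count *= max(values) - min(values) + 1
    return count


points = [(0.12, 0.43, 0.05), (0.14, 0.41, 0.07), (0.93, 0.95, 0.88), (0.55, 0.12, 0.33)]
occupied = voxelize(points, step=0.1)
print("sparse entries:", len(occupied), "dense cells:", dense_cell_count(occupied))
```

In this toy case the sparse form stores three entries where a dense grid over the same bounding box would hold several hundred mostly empty cells.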

Note that FIGS. 2A and 2B and the remaining figures, including FIGS. 3, 4, 5, 8, and 9, illustrate point clouds in 2D only for purposes of explanation and simplicity; the concepts generally apply, and extend, to 3D.

By representing point clouds as 3D voxels, the point clouds can be processed (digested) by 3D convolutional neural networks. Applying 2D convolutional neural networks to 2D images has been successful. For a regular 3D convolution, a 3D kernel is overlaid on every location specified by the stride step, regardless of whether the voxels are occupied or empty. The stride is the amount of movement, or step size, over the 3D voxel grid when the convolution is applied. If the stride step is set to 1, the 3D kernel typically slides over every voxel in the 3D space and computes an output. In this case, the dimensions (height, width, depth) of the output voxel space are the same as those of the input voxel space. If the stride step is set to 2, the 3D kernel slides over every other voxel and computes an output. In this case, every dimension of the output voxel space is half that of the input voxel space. An empty voxel is a voxel that has no 3D point at its location. To avoid the computation and memory consumed by empty voxels, sparse 3D convolution layers may be applied when the point cloud voxels are represented by sparse tensors.
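The effect of the stride on the output size, and the saving obtained by visiting only occupied voxels, can be sketched as follows. This is an illustrative Python sketch, not the layer used by the embodiments: it assumes padding that preserves the size at stride 1, scalar features, and a stride-1 sparse convolution whose outputs are produced only at occupied input voxels (one common convention). The names `output_size` and `sparse_conv3d` are hypothetical.

```python
from itertools import product
from typing import Dict, Tuple

Voxel = Tuple[int, int, int]


def output_size(input_size: int, stride: int) -> int:
    """Output length along one axis, assuming size-preserving padding at stride 1."""
    return -(-input_size // stride)  # ceiling division


def sparse_conv3d(features: Dict[Voxel, float],
                  kernel: Dict[Voxel, float]) -> Dict[Voxel, float]:
    """Stride-1 convolution evaluated only at occupied voxels.

    Empty voxels are never used as kernel centres, and an empty neighbour
    simply contributes zero to the weighted sum.
    """
    out: Dict[Voxel, float] = {}
    for (x, y, z) in features:
        total = 0.0
        for (dx, dy, dz), weight in kernel.items():
            total += weight * features.get((x + dx, y + dy, z + dz), 0.0)
        out[(x, y, z)] = total
    return out


# A 3x3x3 kernel of ones sums each occupied voxel with its occupied neighbours.
kernel = {offset: 1.0 for offset in product((-1, 0, 1), repeat=3)}
features = {(0, 0, 0): 1.0, (1, 0, 0): 2.0, (5, 5, 5): 4.0}
result = sparse_conv3d(features, kernel)

print(output_size(64, 1), output_size(64, 2))
print(result)
```

A dense convolution over the 6 x 6 x 6 bounding box of this example would evaluate the kernel at 216 centres, whereas the sparse version evaluates it at three.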

FIG. 3 is a schematic process diagram showing an example nearest-neighbor (NN) upsampling for a point cloud. For example, to perform a two-times (2x) upsampling in the voxel domain, an occupied voxel is split into two voxels along each of the x, y, and z directions. Thus, one (occupied) parent voxel becomes eight (= 2³) occupied child voxels after upsampling, and the resolution of the point cloud is upsampled by a factor of two in each dimension (x, y, and z). According to this example approach, if a feature vector is associated with the parent voxel, that feature vector is directly inherited by its eight child voxels. The mechanism of this upsampling approach is depicted in FIG. 3 and is referred to as nearest-neighbor (NN) upsampling. In the example of FIG. 3, after the input point cloud (302) is NN upsampled (304), the output point cloud (306) and its features are input to a subsequent pipeline for further processing.
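The NN upsampling rule described above can be sketched as follows, assuming integer voxel coordinates and a dictionary that maps each occupied voxel to its feature vector. The function name `nn_upsample` and the sample features are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

Voxel = Tuple[int, int, int]
Features = Dict[Voxel, List[float]]


def nn_upsample(cloud: Features) -> Features:
    """2x nearest-neighbor upsampling of a sparse voxel cloud."""
    upsampled: Features = {}
    for (x, y, z), feature in cloud.items():
        # Split the parent into two voxels along each of x, y, and z: 2^3 = 8 children.
        for dx, dy, dz in product((0, 1), repeat=3):
            # Every child inherits a copy of the parent's feature vector.
            upsampled[(2 * x + dx, 2 * y + dy, 2 * z + dz)] = list(feature)
    return upsampled


pc0 = {(0, 0, 0): [0.5, 1.0], (3, 1, 2): [2.0, -1.0]}
pc1 = nn_upsample(pc0)
print(len(pc0), "parents ->", len(pc1), "children")
```

The child count is always eight times the parent count, and the children simply repeat the parent's shape at a finer grid, which is the behavior the following paragraph identifies as a drawback.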

This NN upsampling approach has several drawbacks. First, after upsampling, the geometry of the point cloud is merely an enlargement of the original geometry, with no refinement of the shape of the point cloud. Second, the number of occupied voxels after upsampling is always eight times the original number, which may lead to a high computational cost in subsequent processing.

FIG. 4 is a schematic process diagram showing an example voxel-based upsampling with pruning. In the paper Wang, Jianqiang, et al., Multiscale Point Cloud Geometry Compression, 2021 Data Compression Conference (DCC) 73-82, IEEE (2021) (“Wang”), the authors proposed to further refine the upsampled geometry by binary classification and voxel pruning. FIG. 4 shows the approach taken in Wang.

According to the example approach (400), an input point cloud PC₀ (402) is first upsampled by an NN upsampling block (404) (shown in FIG. 4), resulting in an initially upsampled point cloud PC₁ (406). PC₁ is input to a neural-network-based binary classifier (408), which determines an occupancy status (410) for each of the occupied voxels in PC₁. The initially upsampled point cloud PC₁ is then pruned (412) by removing all voxels classified as unoccupied ("0" in FIG. 4). The refined and upsampled point cloud PC₂ (414) is the output. See FIG. 21 for an example of pruning.
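The upsample, classify, and prune sequence can be sketched as below. In the approach described, the classifier is a neural network; here a toy rule (`toy_classifier`, which keeps voxels with an even z coordinate) is a hypothetical stand-in used only so that the sketch runs, and the helper names are likewise illustrative.

```python
from itertools import product
from typing import Callable, Set, Tuple

Voxel = Tuple[int, int, int]


def nn_upsample(occupied: Set[Voxel]) -> Set[Voxel]:
    """2x NN upsampling: each occupied parent yields eight occupied children."""
    return {
        (2 * x + dx, 2 * y + dy, 2 * z + dz)
        for x, y, z in occupied
        for dx, dy, dz in product((0, 1), repeat=3)
    }


def prune(pc1: Set[Voxel], classify: Callable[[Voxel], int]) -> Set[Voxel]:
    """Remove every voxel whose predicted occupancy status is 0."""
    return {voxel for voxel in pc1 if classify(voxel) == 1}


def toy_classifier(voxel: Voxel) -> int:
    """Hypothetical stand-in for the neural-network-based binary classifier."""
    return 1 if voxel[2] % 2 == 0 else 0


pc0 = {(0, 0, 0), (1, 0, 0)}        # input point cloud PC0
pc1 = nn_upsample(pc0)              # initially upsampled PC1
pc2 = prune(pc1, toy_classifier)    # refined and upsampled PC2
print(len(pc0), len(pc1), len(pc2))
```

Because pruning only removes voxels, PC₂ is always a subset of PC₁, and its size is no longer fixed at eight times the input.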

This approach resolves the two aforementioned drawbacks of the NN upsampling method of FIG. 3. In many applications, however, the success of the binary classifier is critical for the method to achieve accurate geometry refinement. According to some embodiments, the present application improves the performance of the binary classification by introducing additional voxel context information.

The feature information under the approach of FIG. 4 is a high-level, abstract descriptor of the geometry generated by a deep neural network. In some cases, this feature information may be too abstract to support accurate classification. By contrast, the context information introduced in the present application may include local, per-voxel knowledge that can serve as a helpful cue for binary classification.

FIG. 5 is a schematic process diagram showing an example of context-aware voxel-based upsampling using pruning according to some embodiments. The present application introduces a context point cloud that conveys additional known information about the initially ("naively") upsampled point cloud PC₁. By concatenating (associating) (512) the context point cloud (510) with the upsampled point cloud PC₁ (506), the subsequent binary classification (516) and voxel pruning (520) processes can better refine the initially upsampled point cloud PC₁ (506), which can yield a more accurate upsampled point cloud PC₂ (522). In some embodiments, the voxel pruning process (520) takes the upsampled point cloud PC₁ (506) and the binary-classified point cloud PC''₁ (518) as inputs and generates the output point cloud PC₂ (522).

The block diagram of FIG. 5 shows a voxel-based upsampling method (500) for some embodiments. Starting from the "naively" upsampled point cloud PC₁ (e.g., the result of NN upsampling (504) of the input point cloud PC₀ (502)), the method performs classification (516) and pruning (520) to refine the upsampled geometry. In FIG. 5, a context point cloud PC_CTX (510) that conveys context information is introduced. The context construction block (508) takes the initially upsampled point cloud PC₁ (506) as input and outputs the context point cloud PC_CTX (510). The context point cloud PC_CTX (510) is concatenated with the "naively" upsampled point cloud PC₁ (506) to generate an augmented point cloud (514) as the input to the binary classification stage (516). In this example, the context point cloud PC_CTX (510) shares the same geometry as PC₁ (506), while PC_CTX (510) is intended to carry context information (e.g., voxel-wise discriminative information) for predicting the ground-truth occupancy state (here, for a given voxel, whether a child voxel is "empty" or "occupied"). The augmented point cloud PC'₁ (514) is generated by concatenating (512) the features of PC_CTX (510) and PC₁ (506).

Concatenation is a commonly used operator in deep neural networks. The concatenation operator concatenates (all) features in PC_CTX with their corresponding features in PC₁ to generate the augmented point cloud PC'₁. If an occupied voxel (x, y, z) in PC_CTX has an associated context information vector c of length a, while the same location (x, y, z) in PC₁ has an associated feature vector f₁ of length b, the concatenation operator joins c and f₁ to produce another feature vector [c, f₁] of length (a + b). The resulting feature vector [c, f₁] is assigned to the voxel location (x, y, z) in the augmented point cloud PC'₁. This step can be performed for all occupied voxels in PC_CTX and PC₁ to generate the augmented point cloud PC'₁. The augmented point cloud PC'₁ replaces PC₁ as the input to the binary classifier.
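The per-voxel concatenation can be sketched as follows. This is a minimal sketch under stated assumptions: point clouds are represented as Python dictionaries keyed by (x, y, z), the function name `concatenate_features` is hypothetical, and the numeric values are made up for illustration.

```python
def concatenate_features(pc_ctx, pc1):
    """Build the augmented point cloud by joining [c, f1] at each voxel.

    pc_ctx -- dict mapping (x, y, z) to a context vector c of length a
    pc1    -- dict mapping (x, y, z) to a feature vector f1 of length b
    Both clouds are assumed to share the same set of occupied voxels.
    """
    if pc_ctx.keys() != pc1.keys():
        raise ValueError("PC_CTX and PC1 must share the same geometry")
    return {xyz: list(pc_ctx[xyz]) + list(pc1[xyz]) for xyz in pc1}


# Illustrative values: context is the voxel position (a = 3), features have b = 2.
pc_ctx = {(2, 3, 1): [2, 3, 1], (2, 3, 0): [2, 3, 0]}
pc1 = {(2, 3, 1): [0.5, -1.0], (2, 3, 0): [0.25, 0.75]}
pc1_augmented = concatenate_features(pc_ctx, pc1)
print(pc1_augmented[(2, 3, 1)])
```

Each output vector has length a + b, and the geometry (the set of keys) is unchanged.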

According to some embodiments, the context information can be any known knowledge or known context about a voxel. For example, the context information can be the position of the voxel, such as its [x, y, z] coordinates. The context information can include, for example, the bit depth of the input point cloud. The context information can also be other information, such as the position of the voxel relative to its parent voxel. The context is not limited to position information, however, and in some embodiments can include other types of information in addition to or instead of position information. In some embodiments, the context information (e.g., voxel-wise discriminative information) can be used to predict the ground-truth occupancy state of a voxel (e.g., whether the voxel is "empty" or "occupied"), because the context information provides something that is already known about the voxel. By incorporating such known context information, the deep neural network can better infer the occupancy state.

In some embodiments, the context information is obtained from processing of the input point cloud. The context information can be any information about a voxel and can be determined even before the voxel is encoded. For example, assume that the position [x, y, z] of the voxel and the voxel's current bit depth for the [x, y, z] information are known. The context information can then be expressed in spherical coordinates. See Equations 1, 2, and 3.

The context information PC_CTX is obtained from the input point cloud PC₀. According to some embodiments, there is no other, external source of context information. For example, the context information may be the (x, y, z) position coordinates. The context information vector c can be assigned directly to the voxel location (x, y, z) in PC_CTX. In some embodiments, the (x, y, z) coordinate position is preprocessed, converted to another coordinate system (for example, via Equations 1 to 3), and then assigned to the voxel location (x, y, z) in PC_CTX.

In some embodiments, the context information includes the x, y, and z coordinates. For example, for an occupied voxel (x, y, z) in PC_CTX, the context information can be the vector c = [x, y, z]. In some embodiments, normalized coordinates are used as the context information. Assume that PC₁ has a bit depth of N, which means that PC₁ has dimensions 2^N x 2^N x 2^N. The context information vector associated with a voxel (x, y, z) in PC_CTX can then be the coordinate vector normalized with respect to the grid size 2^N.
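One plausible normalization is sketched below. The exact normalized vector is not legible in this text, so dividing each coordinate by the grid size 2^N is an assumption of the sketch, as is the function name `normalized_context`.

```python
def normalized_context(x, y, z, bit_depth):
    """Return a context vector with each coordinate scaled into [0, 1).

    Assumption: normalization divides by the grid size 2**bit_depth.
    """
    size = 2 ** bit_depth
    return [x / size, y / size, z / size]


# Illustrative values: a voxel in a grid with bit depth N = 10 (1024^3 voxels).
c = normalized_context(256, 512, 768, bit_depth=10)
print(c)
```

Scaling by the grid size keeps the context values in a fixed range regardless of the bit depth, which is a common reason to normalize network inputs.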

Moreover, instead of working in Euclidean coordinates, spherical coordinates can be used, which is particularly useful for processing LiDAR sweeps. To do so, Equations 1, 2, and 3 are applied to obtain, respectively, the radial distance r, the elevation angle φ, and the azimuth angle θ, where N is the bit depth. The vector c then becomes c = [r, φ, θ], or the corresponding vector with a normalized distance if the distance is normalized. In some embodiments, the context information can also be the bit depth of PC₁, i.e., N. In that case, the feature is the constant scalar c = N.

Furthermore, instead of working in Euclidean or spherical coordinates, cylindrical coordinates can be used, because cylindrical coordinates are also suited to processing LiDAR sweeps. To do so, Equation 1 and Equation 3 are applied to compute the radial distance (r) and the azimuth angle (θ), respectively. Converting the Euclidean coordinates (x, y, z) of a voxel to cylindrical coordinates yields (r, θ, z). The vector c carrying the context information therefore becomes c = [r, θ, z], or the corresponding vector with a normalized distance if the distance is normalized.

Providing the context information in different forms can make the task of the binary classification process easier. For example, in cases where the height z is particularly relevant to the occupancy state of a voxel, including the height z can help the classification. One such case is a LiDAR point cloud, where the height lies in a reasonable range; for example, the height is greater than 0 because LiDAR cannot detect anything underground. Likewise, in some embodiments, where the azimuth angle θ (Equation 3) is highly relevant to occupancy, including the azimuth angle can help the classification.
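For orientation, the textbook conversions from Euclidean to spherical and cylindrical coordinates can be sketched as follows. This is not a reproduction of Equations 1 to 3, which are not legible in this text. The sketch assumes the elevation angle is measured from the x-y plane, the azimuth is measured from the x-axis, and the cylindrical radius is taken in the x-y plane. Any normalization by the bit depth N is left out, and the function names are hypothetical.

```python
import math


def to_spherical(x, y, z):
    """Return (r, elevation, azimuth) for a Euclidean point (assumed convention)."""
    r = math.sqrt(x * x + y * y + z * z)
    elevation = math.atan2(z, math.hypot(x, y))  # angle above the x-y plane
    azimuth = math.atan2(y, x)                   # angle around the z-axis
    return r, elevation, azimuth


def to_cylindrical(x, y, z):
    """Return (rho, azimuth, z), where rho is the distance from the z-axis."""
    rho = math.hypot(x, y)
    azimuth = math.atan2(y, x)
    return rho, azimuth, z


print(to_spherical(1.0, 0.0, 0.0))
print(to_cylindrical(3.0, 4.0, 5.0))
```

Either tuple could serve as a context vector c for a voxel. The cylindrical form keeps the height z explicit, which matches the remark that height can be informative for LiDAR data.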

In some embodiments, an encoder can operate in the direction opposite to that shown in FIG. 5. For example, the encoder can obtain a first point cloud, such as the pruned point cloud (522). The voxel occupancy state of the point cloud can be determined. A second point cloud can be generated by removing, from the first point cloud, the voxels determined to be empty. Features of the second point cloud can be determined and associated with context information to generate a third point cloud. In some embodiments, the features of the second point cloud can be concatenated with the context information. The third point cloud can be downsampled to obtain a fourth point cloud. The fourth point cloud can be output as the encoder output.

FIG. 6A is a table illustrating exemplary position values according to some embodiments. In some embodiments, the context information can be the position of a child voxel relative to its parent voxel. For example, "front"/"back", "left"/"right", and "top"/"bottom" can each be represented as "0" and "1", as shown in Tables 600, 602, and 604 of FIG. 6A, respectively. In other words, the leftmost value of the feature array indicates the front/back state, the middle value indicates the left/right state, and the rightmost value indicates the top/bottom state. For the front/back state, 0 indicates front and 1 indicates back. For the left/right state, 0 indicates left and 1 indicates right. For the top/bottom state, 0 indicates top and 1 indicates bottom.

FIG. 6B is a schematic perspective view illustrating exemplary child voxel positions as context information according to some embodiments. As shown in FIG. 6B, a child voxel of PC₁ located at the front, right, and top of its parent voxel (650) has the 3-bit context information c = [0, 1, 0], whereas a child voxel of PC₁ located at the back, right, and top of its parent voxel has the 3-bit context feature (652) c = [1, 1, 0]. In some embodiments, the 3-bit context feature vector can be interpreted as a binary number and converted to decimal; for example, [0, 1, 0] becomes the scalar c = 2 and [1, 1, 0] becomes the scalar c = 6. In some embodiments, the context information can be any portion, combination, and/or permutation of the exemplary context features mentioned above.
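The 3-bit position code and its decimal interpretation can be sketched as follows, using the 0/1 assignments of FIG. 6A (front, left, and top map to 0; back, right, and bottom map to 1). The dictionary `CODES` and the function names are hypothetical helpers introduced only for this illustration.

```python
# Bit assignments taken from the FIG. 6A description; names are illustrative.
CODES = {"front": 0, "back": 1, "left": 0, "right": 1, "top": 0, "bottom": 1}


def context_bits(front_back, left_right, top_bottom):
    """Return [front/back, left/right, top/bottom] bits for a child voxel."""
    return [CODES[front_back], CODES[left_right], CODES[top_bottom]]


def bits_to_scalar(bits):
    """Read the three bits as a binary number, leftmost bit most significant."""
    return bits[0] * 4 + bits[1] * 2 + bits[2]


front_right_top = context_bits("front", "right", "top")
back_right_top = context_bits("back", "right", "top")
print(front_right_top, bits_to_scalar(front_right_top))
print(back_right_top, bits_to_scalar(back_right_top))
```

The two sample positions give the scalars 2 and 6, matching the decimal values stated in the description of FIG. 6B.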

Returning to FIG. 5, consider the following example concerning the feature in the upper-left corner of PC₀. The example follows the path from the input point cloud PC₀ to the augmented point cloud PC'₁, ahead of the binary classification block.

The upper-left feature and each of the other features of PC₀ is a one-dimensional vector. In this non-limiting example, the vector length is 5, so each feature vector is a 1x5 vector of five numbers. These feature vectors contain information about the ground-truth occupancy state and are therefore called geometric features. For example, the feature in the upper-left corner contains information about the ground-truth occupancy state in the upper-left region, whereas the feature in the upper-right corner contains information about the ground-truth occupancy state in the upper-right region.

In this example, the features are abstract, high-level features (descriptors) generated by a deep neural network. Their values have no specific physical meaning, which is why they are abstract and "high-level"; they nevertheless carry meaning for the neural network itself when it performs inference. For this example, the feature vectors are assumed to hold random numbers, chosen only to show how a feature moves through the process.

As shown in FIG. 5, the upper-left feature of PC₀ is passed through the NN upsampling block, which creates four copies of it in the upper-left corner of PC₁. PC₁ is passed to the context construction block, which generates four context information vectors corresponding to the four copies in the upper-left corner of PC₁. Assuming that the context construction block builds each context information vector from the coordinates (x, y, z) of the voxel (or only (x, y) in the 2D example of FIG. 5), each of the four context information vectors holds the coordinates of its own voxel.

PC₁ and PC_CTX are input to the concatenation block, which generates the point cloud PC'₁. In some embodiments, the concatenation block may perform the following exemplary concatenations. The first copy of the feature is concatenated with the first context information vector to produce a new vector, which is the voxel in the upper-left corner (first row, first column) of PC'₁. The second copy is concatenated with the second context information vector to produce the voxel in the first row, second column of PC'₁. The third copy is concatenated with the third context information vector to produce the voxel in the second row, first column of PC'₁. The fourth copy is concatenated with the fourth context information vector to produce the voxel in the second row, second column of PC'₁. In the 2D example, each new vector therefore has length 2 + 5 = 7.

The resulting point cloud PC'₁ is input to the binary classification block to determine whether each voxel presumed to be occupied is actually occupied according to the ground-truth point cloud.
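The 2D walk-through can be mirrored in a short sketch. Everything numeric here is made up for illustration: the feature values, the (row, column) indexing, and the rule that a parent at (row, col) produces children at (2*row + dr, 2*col + dc) are assumptions of the sketch, and the function names are hypothetical.

```python
def nn_upsample_2d(pc0):
    """Copy each parent feature to its four children in a 2x finer grid."""
    pc1 = {}
    for (row, col), feature in pc0.items():
        for dr in (0, 1):
            for dc in (0, 1):
                pc1[(2 * row + dr, 2 * col + dc)] = list(feature)
    return pc1


def build_context(pc1):
    """Use each voxel's own (row, col) coordinates as its context vector."""
    return {pos: [pos[0], pos[1]] for pos in pc1}


def concatenate(pc_ctx, pc1):
    """Join [context, feature] at every voxel to form the augmented cloud."""
    return {pos: pc_ctx[pos] + pc1[pos] for pos in pc1}


# One parent voxel in the upper-left corner with a length-5 feature vector.
pc0 = {(0, 0): [0.1, 0.2, 0.3, 0.4, 0.5]}
pc1 = nn_upsample_2d(pc0)
pc_ctx = build_context(pc1)
pc1_augmented = concatenate(pc_ctx, pc1)

for pos in sorted(pc1_augmented):
    print(pos, pc1_augmented[pos])
```

The four children carry identical feature copies, so only the context part of each length-7 vector tells them apart. That difference is the extra signal the binary classifier receives.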

FIG. 7 is a flowchart illustrating an exemplary process for cascading multiple context-aware upsampling stages according to some embodiments. In some embodiments of the exemplary process (700), as shown in FIG. 7, the context-aware voxel-based upsampling (702, 706, 710) can be cascaded multiple times to achieve a higher upsampling ratio. In this case, a feature aggregation block (704, 708, 712) can be inserted between two consecutive upsampling stages (702, 706, 710) for refinement and feature aggregation. For example, the feature aggregation block can take as input a sparse tensor whose features have N channels. The feature aggregation block modifies the features so that they are better suited to the compression task. In particular, to obtain a high-quality reconstruction for point cloud decompression, the feature aggregation block generates descriptive or distinctive geometric features that can represent local geometric detail. The output features still have N channels, which means that the feature aggregation block does not change the shape of the sparse tensor. In some embodiments, the feature aggregation can be, for example, a cascaded sparse convolutional layer architecture, a residual network (ResNet) architecture, an Inception-ResNet (IRN) architecture, or a transformer block. In some embodiments, the context-aware upsampling can be performed by the process shown in FIG. 5.

In some embodiments, the feature aggregation blocks shown in FIG. 7 can include a weight-sharing mechanism. In FIG. 7, multiple context-aware blocks are cascaded (in series). In some embodiments, the feature aggregation blocks and the context-aware upsampling blocks of FIG. 7 can share the same set of neural network parameters.

FIG. 8 is a schematic process diagram showing an example of context-aware voxel-based upsampling using initial feature aggregation according to some embodiments. In some embodiments, as shown in FIG. 8, a feature aggregation block (806) can be inserted immediately after the NN upsampling block (804) for initial feature refinement. The features of PC₀ (802) are carried over to the features of PC₁ (808) according to the NN upsampling (804) assignment. In addition, similar to what is shown in FIG. 5, a concatenation block (814) is inserted before the binary classification block (818). In some embodiments, the blocks of FIG. 8 operate in the same way as the blocks described for FIG. 5, with the feature aggregation block added.

As in FIG. 5, the context construction block (810) takes the initially upsampled point cloud PC₁ (808) as input and outputs the context point cloud PC_CTX (812). By concatenating (814) the context point cloud (812) with the upsampled point cloud PC₁ (808), the subsequent binary classification (818) and voxel pruning (822) processes can refine the initially upsampled point cloud PC₁ (808), which can yield a more accurate upsampled point cloud PC₂ (824). The augmented point cloud PC'₁ (816) is generated by concatenating (814) the features of PC_CTX (812) and PC₁ (808). In some embodiments of the exemplary process (800), the voxel pruning process (822) takes the upsampled point cloud PC₁ (808) and the binary-classified point cloud PC''₁ (820) as inputs and generates the output point cloud PC₂ (824).

FIG. 9 is a flowchart illustrating an exemplary process for binary classification according to some embodiments. FIG. 9 can be regarded as one way of performing binary classification with a neural network. A binary classifier can be used to predict the ground-truth occupancy state of each occupied voxel in an input point cloud (PC₁). The binary classifier classifies each occupied voxel in PC₁ as 1 (occupied) or 0 (empty) so that the geometry of PC₁ can be refined. In some embodiments, the binary classification process (900) of FIG. 9 can be used in the binary classification blocks of FIG. 5 and FIG. 8.

In FIG. 9, the concatenated point cloud PC'₁ is input to a feature aggregation block (902) for feature refinement and extraction, with an output channel size of D₁. The aggregated features are then input to multilayer perceptron (MLP) layers (904) with channel dimensions (D₁, D₂, …, 1) for classification. An MLP layer is a neural network layer that applies a linear mapping to an input feature vector. For example, to map an input feature of length D₁ to an output feature of length D₂, the MLP layer multiplies the input feature by a matrix of size D₁×D₂, producing an output feature of length D₂. When two MLP layers are cascaded, a nonlinear activation function (e.g., a ReLU function) is inserted between them. The output is input to a softmax function (906), which converts the MLP output value to the range 0 to 1. For a value greater than 0.5 and at most 1, the thresholding block (908) converts the value to 1 to indicate an occupied status. For a value in the range 0 to 0.5, the thresholding block converts the value to 0 to indicate an empty status, as reflected in the binary-classified output (910).
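The MLP, squashing, and thresholding stages can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions: the weights are random rather than trained, the feature aggregation block is omitted, and a logistic sigmoid is swapped in for the softmax block because the final layer here has a single output channel. The function names are hypothetical.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def classify(features, weights):
    """Map per-voxel features (num_voxels x D1) to 0/1 occupancy labels.

    weights -- list of matrices with shapes (D1, D2), ..., (Dk, 1)
    """
    hidden = features
    for index, matrix in enumerate(weights):
        hidden = hidden @ matrix          # linear mapping of one MLP layer
        if index < len(weights) - 1:
            hidden = relu(hidden)         # nonlinearity between MLP layers
    probability = sigmoid(hidden[:, 0])   # squash to the range 0..1
    return (probability > 0.5).astype(int)  # threshold: >0.5 is occupied


# Illustrative run: 6 voxels, channel dimensions (8, 4, 1), untrained weights.
rng = np.random.default_rng(0)
dims = [8, 4, 1]
weights = [rng.standard_normal((dims[i], dims[i + 1])) for i in range(len(dims) - 1)]
features = rng.standard_normal((6, dims[0]))
labels = classify(features, weights)
print(labels)
```

A probability of exactly 0.5 falls in the "0 to 0.5" range and is therefore labeled empty, matching the thresholding rule in the text.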

FIGS. 10 to 13 show four different design options for feature aggregation for some embodiments. For example, in some embodiments, the feature aggregation block can be a cascaded sparse convolutional layer architecture (1000) (e.g., FIG. 10), a residual network (ResNet) architecture (1100) (e.g., FIG. 11), an Inception-ResNet (IRN) architecture (1200) (e.g., FIG. 12), or a transformer block architecture (1300) (e.g., FIG. 13).

FIG. 10 is a block diagram illustrating an exemplary process with cascaded sparse convolutional layers for feature aggregation according to some embodiments. In some embodiments, such as the example shown in FIG. 10, two blocks are repeated several times to form a series. The two blocks are a sparse 3D convolutional layer (1002, 1006, 1010) ("CONV D") followed by a ReLU activation (1004, 1008, 1012) ("ReLU"). In the example shown in FIG. 10, "CONV D" denotes a sparse 3D convolutional layer with D output channels. "ReLU" activation refers to the rectified linear unit activation function. For example, a ReLU activation block can output 0 for a negative input value and, for a positive input value, the input multiplied by a scalar value (a scalar of 1 for the standard ReLU). In other embodiments, the ReLU activation function can be replaced with other activation functions, such as a tanh() activation function and/or a sigmoid() activation function. In some embodiments, the nonlinear activation process can include a rectified linear unit (ReLU) activation process.

FIG. 11 is a block diagram illustrating an exemplary ResNet block for feature aggregation according to some embodiments. In some embodiments, as shown in FIG. 11, the feature aggregation process can use a ResNet architecture. The paper He, Kaiming, et al., Deep Residual Learning for Image Recognition, Proceedings of the IEEE Conf. on Computer Vision and Pattern Recognition 770-778, IEEE (2016) ("He") describes an exemplary ResNet architecture. See, for example, the rightmost process line of FIG. 3 on page 4 of He.

The example of FIG. 11 shows a ResNet block architecture that aggregates features using D channels. In some embodiments, such as the example shown in FIG. 11, two blocks are repeated several times to form a series. The two blocks are a sparse 3D convolutional layer (1102, 1106, 1110) ("CONV D") followed by a ReLU activation (1104, 1108, 1112) ("ReLU"). Compared with FIG. 10, FIG. 11 introduces a residual connection (1114) from the input, which adds the input to the output of the series of convolutional layers.
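The difference between the plain cascade of FIG. 10 and the residual block of FIG. 11 can be sketched with NumPy. To keep the sketch self-contained, a per-voxel linear map stands in for the sparse 3D convolution. That substitution is an assumption of the example, since a real sparse convolution also mixes information from neighboring voxels. The function names are hypothetical.

```python
import numpy as np


def conv_stand_in(x, weight):
    """Per-voxel linear map (D -> D); a stand-in for a sparse 3D convolution."""
    return x @ weight


def cascaded_block(x, weights):
    """FIG. 10 style: repeat (CONV D, ReLU) in series."""
    for weight in weights:
        x = np.maximum(conv_stand_in(x, weight), 0.0)
    return x


def residual_block(x, weights):
    """FIG. 11 style: add the block input to the output of the series."""
    return x + cascaded_block(x, weights)


# Illustrative run: 5 voxels with D = 4 channels and three CONV/ReLU pairs.
rng = np.random.default_rng(0)
d = 4
x = rng.standard_normal((5, d))
weights = [rng.standard_normal((d, d)) for _ in range(3)]
print(cascaded_block(x, weights).shape, residual_block(x, weights).shape)
```

Because every layer keeps D channels, the residual addition is well defined and the output has the same shape as the input.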

FIG. 12 is a block diagram illustrating an exemplary Inception-ResNet block for feature aggregation according to some embodiments. In some embodiments, as shown in FIG. 12, the feature aggregation can be built with an Inception-ResNet (IRN) architecture. FIG. 1(b) of Wang also shows an IRN architecture. The example of FIG. 12 shows the architecture of an IRN block that aggregates features using D channels. The IRN block splits the feature aggregation process into three parallel paths.

The path with more convolutional layers (the left path in FIG. 12) aggregates (more) global information with a larger receptive field. In some embodiments, the left path can include a convolutional layer (1202) followed by two sets of a ReLU activation (1204, 1208) and a convolutional layer (1206, 1210).

The path with fewer convolutional layers (the middle path in FIG. 12) aggregates local details with a smaller receptive field. In some embodiments, the middle path may include a convolutional layer (1212), followed by a ReLU activation (1214) and a convolutional layer (1216).

The last path (1220) (the right path in FIG. 12) is a residual connection that passes the input directly to the output, similar to the residual connection in FIG. 11. In some embodiments, a ReLU block may be inserted after the CONV D/2 block and before the concatenation (1218) in each of the left path and the middle path in FIG. 12.
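By way of a non-limiting illustration, the three-path structure described above may be sketched in Python with NumPy. This is a simplified sketch under stated assumptions: each sparse 3D convolution is replaced by a per-voxel linear map, so the spatial receptive field is not modeled; the intermediate width `h` is hypothetical; and the merge (1218) of the left and middle paths is assumed to be a channel-wise concatenation of two D/2-channel outputs.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def irn_block(x, left, middle):
    """Apply one IRN-style block to per-voxel features x of shape (n, D).

    left   : three weight matrices standing in for CONV layers 1202, 1206, 1210
    middle : two weight matrices standing in for CONV layers 1212, 1216
    Each matrix is a per-voxel linear map (an assumption made for brevity);
    it does not model the neighborhood of a sparse 3D convolution.
    """
    a = x @ left[0]                  # CONV (1202)
    a = relu(a) @ left[1]            # ReLU (1204), then CONV (1206)
    a = relu(a) @ left[2]            # ReLU (1208), then CONV (1210): D/2 channels
    b = x @ middle[0]                # CONV (1212)
    b = relu(b) @ middle[1]          # ReLU (1214), then CONV (1216): D/2 channels
    merged = np.concatenate([a, b], axis=1)  # merge (1218), assumed concatenation
    return x + merged                # residual path (1220)

rng = np.random.default_rng(0)
D, h = 8, 4                          # h is a hypothetical intermediate width
x = rng.standard_normal((5, D))
left = [rng.standard_normal(s) for s in [(D, h), (h, h), (h, D // 2)]]
middle = [rng.standard_normal(s) for s in [(D, h), (h, D // 2)]]
y = irn_block(x, left, middle)       # same shape as x: (5, D)
```

Because the residual path adds the input back, a block whose weights are all zero reduces to the identity, which is one reason such blocks can be stacked without degrading the input features.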

FIG. 13 is a block diagram illustrating an exemplary transformer block for feature aggregation according to some embodiments. Section 3.2 of the paper Mao, Jiageng, et al., Voxel Transformer for 3D Object Detection, Proceedings of the IEEE/CVF International Conference on Computer Vision 3164-3173, IEEE (2021) (“Mao”) discusses a voxel transformer. In some embodiments, the transformer architecture of the present application may be similar to the voxel transformer of Mao.

A diagram of the transformer block is illustrated in FIG. 13. The output of the self-attention block (1302) is added to the input of the self-attention block via a residual connection. An MLP block (1304) is connected in series with the result of that addition, and the output of the MLP block (1304) is added to the input of the MLP block (1304) via another residual connection. The MLP block includes a series of multilayer perceptron (MLP) layers.
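By way of a non-limiting illustration, the two residual connections of the transformer block may be sketched as follows. The callables `self_attention` and `mlp` are hypothetical stand-ins for blocks 1302 and 1304; the toy lambdas used below only make the data flow easy to follow and are not part of the described embodiments.

```python
import numpy as np

def transformer_block(x, self_attention, mlp):
    """x: (n, d) features; returns features of the same shape."""
    y = x + self_attention(x)    # first residual connection, around block 1302
    return y + mlp(y)            # second residual connection, around block 1304

# Toy stand-ins (hypothetical): attention doubles the input, MLP adds one.
x = np.array([[1.0, 2.0]])
out = transformer_block(x, lambda v: 2.0 * v, lambda v: v + 1.0)
# y = x + 2x = [[3, 6]]; out = y + (y + 1) = [[7, 13]]
```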

FIG. 14 is a block diagram illustrating an exemplary architecture of a self-attention block according to some embodiments. In some embodiments, the self-attention block (1302) of FIG. 13 may be implemented using the architecture (1400) illustrated in FIG. 14. Given the current feature vector $f_A$ associated with voxel location $A$ and its $k$ neighboring features $f_{A_i}$ associated with voxel locations $A_i$, where the $A_i$ (for $i = 0, \ldots, k-1$) are the $k$ nearest neighbors of $A$ in the input sparse tensor, the self-attention block (1400) attempts to update the feature $f_A$ based on all neighboring features $f_{A_i}$. The points $A_i$ are obtained by a k nearest neighbor (kNN) search (1402) based on the coordinates of $A$. The query embedding $Q_A$ (1404) for $A$ is calculated by Equation 4:

$$Q_A = \mathrm{MLP}_Q(f_A) \qquad (4)$$

where $\mathrm{MLP}_Q(\cdot)$ (1404) represents the MLP layers for obtaining the query. $E_{A_i}$ is the position encoding (1410) between voxels $A$ and $A_i$, which is calculated by Equation 5:

$$E_{A_i} = \mathrm{MLP}_P(P_A - P_{A_i}) \qquad (5)$$

where $\mathrm{MLP}_P(\cdot)$ (1410) represents the MLP layers used to obtain the position encoding. $P_A$ and $P_{A_i}$ are the 3D coordinates of the centers of voxels $A$ and $A_i$, respectively. The key embedding $K_{A_i}$ and the value embedding $V_{A_i}$ of all nearest neighbors of $A$ are calculated using Equations 6 and 7:

$$K_{A_i} = \mathrm{MLP}_K(f_{A_i}) + E_{A_i} \qquad (6)$$

$$V_{A_i} = \mathrm{MLP}_V(f_{A_i}) + E_{A_i} \qquad (7)$$

where $\mathrm{MLP}_K(\cdot)$ (1406) and $\mathrm{MLP}_V(\cdot)$ (1408) are the MLP layers for obtaining the key and the value, respectively. The self-attention block outputs the output feature $f'_A$ for location $A$ as given by Equation 8:

$$f'_A = \sum_{i=0}^{k-1} \sigma\!\left(\frac{Q_A^{\top} K_{A_i}}{c\sqrt{d}}\right) V_{A_i} \qquad (8)$$

where $\sigma(\cdot)$ is the softmax normalization function (1412), $d$ is the length of the feature vector $f_A$, and $c$ is a predefined constant.

Equations 4 to 8 are illustrated in FIG. 14. Equation 4 is shown at the upper left of FIG. 14, where block (1404) takes the current feature vector $f_A$ as input and outputs $Q_A$. At the top of FIG. 14, the kNN block (1402) takes the current feature vector $f_A$ as input and performs a k nearest neighbor (kNN) search based on the coordinates of $A$. The output of the kNN block (1402) is the coordinates of $A_i$ for $i = 0, \ldots, k-1$. Equation 5 is shown in the upper right corner of FIG. 14, where block (1410) takes the difference between the voxel coordinates $P_A$ and $P_{A_i}$ as input and outputs $E_{A_i}$. Equation 6 is shown in the left center part of FIG. 14, where the feature vector $f_{A_i}$ is the input to block (1406), and the output of block (1406) is added to $E_{A_i}$ to produce $K_{A_i}$ for $A_i$. Equation 7 is shown in the right center part of FIG. 14, where the feature vector $f_{A_i}$ is the input to block (1408), and the output of block (1408) is added to $E_{A_i}$ to produce $V_{A_i}$ for $A_i$. Applying Equation 8 to FIG. 14, the input to the softmax normalization function (1412) is the dot product of $Q_A$ and $K_{A_i}$. This dot product is divided by $c\sqrt{d}$ to normalize for the length of the feature vector $f_A$. The output feature $f'_A$ for location $A$, shown at the bottom of FIG. 14, is the sum over the $k$ nearest neighbors of the product of the softmax output and $V_{A_i}$.
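By way of a non-limiting illustration, the query, position-encoding, key, value, and softmax-weighted-sum steps described above may be sketched in Python with NumPy. The sketch rests on several assumptions that are not taken from the specification: each MLP is a single random linear projection created by the hypothetical helper `linear`, the kNN search is brute force and counts a voxel as its own nearest neighbor, and the constant `c` defaults to 1.

```python
import numpy as np

def linear(rng, d_in, d_out):
    """Hypothetical one-layer MLP: a random linear projection."""
    W = rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)
    return lambda x: x @ W

def self_attention(coords, feats, k, mlp_q, mlp_k, mlp_v, mlp_p, c=1.0):
    """Update every voxel feature from the features of its k nearest neighbors.

    coords: (n, 3) voxel-center coordinates; feats: (n, d) voxel features.
    """
    n, d = feats.shape
    # Brute-force kNN on coordinates (assumed: a voxel is its own neighbor).
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    nbr = np.argsort(dist, axis=1)[:, :k]              # (n, k) neighbor indices
    q = mlp_q(feats)                                   # queries, (n, d)
    e = mlp_p(coords[:, None, :] - coords[nbr])        # position encodings, (n, k, d)
    key = mlp_k(feats[nbr]) + e                        # keys, (n, k, d)
    val = mlp_v(feats[nbr]) + e                        # values, (n, k, d)
    logits = np.einsum("nd,nkd->nk", q, key) / (c * np.sqrt(d))
    logits -= logits.max(axis=1, keepdims=True)        # for numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)                  # softmax over the k neighbors
    return np.einsum("nk,nkd->nd", w, val)             # weighted sum of values

rng = np.random.default_rng(0)
coords = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0],
                   [1, 1, 0], [4, 4, 4], [5, 4, 4]], dtype=float)
feats = rng.standard_normal((6, 4))
mq, mk, mv = (linear(rng, 4, 4) for _ in range(3))
mp = linear(rng, 3, 4)
updated = self_attention(coords, feats, 3, mq, mk, mv, mp)   # shape (6, 4)
```

With `k = 1` the only neighbor is the voxel itself, the softmax weight is 1, and the position encoding of a zero offset vanishes for a bias-free projection, so the output reduces to the value projection of the input feature. This gives a simple sanity check on the data flow.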

In some embodiments, instead of using a kNN search to find the k points in the point cloud closest to a point A, all points in the point cloud within a distance r from point A may be used. This operation is called a ball query. The value (or radius) r for the ball query may be determined by the quantization step size s of the quantizer. For example, given a larger s, which corresponds to a coarser point cloud, the value of r becomes larger in order to cover more points from the original point cloud.

In some embodiments, a kNN search may be used to find the k points closest to a query point, e.g., A. Thereafter, however, only the points within a distance r from A are retained. The value of r may be determined in the same manner as for the ball query. The distance metric used by the kNN search may be any distance metric.
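By way of a non-limiting illustration, the ball query and the kNN search followed by a radius filter may be sketched as follows. The Euclidean metric and the linear rule in `radius_from_step` (radius proportional to the quantization step size s, with a hypothetical `scale` factor) are assumptions chosen only for the example; the description states merely that r may be determined by s.

```python
import numpy as np

def ball_query(points, a, r):
    """Indices of all points within distance r of the query point a."""
    d = np.linalg.norm(points - a, axis=1)
    return np.flatnonzero(d <= r)

def knn_within_radius(points, a, k, r):
    """Indices of the k nearest points to a, keeping only those within r."""
    d = np.linalg.norm(points - a, axis=1)
    nearest = np.argsort(d, kind="stable")[:k]
    return nearest[d[nearest] <= r]

def radius_from_step(s, scale=1.5):
    """Hypothetical rule: a larger quantization step s gives a larger radius."""
    return scale * s

points = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0], [5, 0, 0]], dtype=float)
a = np.zeros(3)
r = radius_from_step(2.0)                       # 3.0
in_ball = ball_query(points, a, r)              # points at distance 0, 1, 2
near = knn_within_radius(points, a, 2, r)       # two nearest, both within r
capped = knn_within_radius(points, a, 4, r)     # the point at distance 5 is dropped
```

The ball query returns a variable number of neighbors, whereas the filtered kNN search returns at most k, which bounds the cost of the subsequent attention step.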

In some embodiments, a transformer block (e.g., the example illustrated in FIG. 13) updates the features for all occupied positions in the sparse tensor in the same manner and outputs the updated sparse tensor. In some embodiments, the MLP layers for the query (1404), the key (1406), the value (1408), and the position encoding (1410) may each contain only one fully-connected layer, which corresponds to a linear projection.

FIG. 15 is a flowchart illustrating an exemplary process for cascading multiple feature aggregations according to some embodiments. In some embodiments, as illustrated in the exemplary process (1500) of FIG. 15, multiple feature aggregation blocks (1502, 1504, 1506, 1508) (such as the feature aggregation block examples illustrated in FIGS. 10-13) are cascaded together to further improve performance. The feature aggregation blocks may be of the same type; for example, they may all be transformer blocks.

In some embodiments where the feature aggregation blocks are of the same type, the parameters of their neural network layers are shared. By sharing neural network parameters between the feature aggregation blocks, the total number of parameters of the neural network model may be reduced, which has, for example, the following two advantages. First, the model size of the neural network may be reduced, which makes the neural network model easier to store or transmit. Second, reducing the total number of parameters to be learned during the training stage may make training converge faster. However, sharing neural network parameters between the feature aggregation blocks may reduce the capacity of the neural network model, which may impair the ability of the neural network to extract high-level features and may degrade the performance of the neural network. This potential consequence may prove detrimental to some applications.
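By way of a non-limiting illustration, the effect of parameter sharing on model size may be sketched as follows. The block used here (one linear layer with a ReLU and a residual connection) and the helper names are hypothetical; the point of the sketch is only that a cascade reusing one parameter set stores a single copy, while a cascade with separate parameter sets stores one copy per block.

```python
import numpy as np

def make_block(rng, d):
    """Hypothetical feature aggregation block parameters: one d-by-d layer."""
    return {"W": rng.standard_normal((d, d)) / np.sqrt(d), "b": np.zeros(d)}

def apply_block(params, x):
    return x + np.maximum(x @ params["W"] + params["b"], 0.0)

def cascade(blocks, x):
    """Apply the blocks one after another."""
    for params in blocks:
        x = apply_block(params, x)
    return x

def count_params(blocks):
    """Count stored parameters, counting a shared parameter set only once."""
    unique = {id(p): p for p in blocks}
    return sum(v.size for p in unique.values() for v in p.values())

rng = np.random.default_rng(0)
d = 8
shared = [make_block(rng, d)] * 4                    # one parameter set, used 4 times
separate = [make_block(rng, d) for _ in range(4)]    # four distinct parameter sets
x = rng.standard_normal((3, d))
out_shared = cascade(shared, x)
n_shared = count_params(shared)                      # 8*8 + 8 = 72
n_separate = count_params(separate)                  # 4 * 72 = 288
```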

In some embodiments, each aggregation block may use the same neural network with a distinct set of neural network parameters. In some embodiments, each aggregation block may use a separate neural network with a distinct set of neural network parameters. In some embodiments, the distinct sets of neural network parameters may be identical. In some embodiments, a first set of neural network parameters and a second set of neural network parameters are one and the same set, and that same set of neural network parameters may be used by at least a first neural network and a second neural network. In some embodiments, the first set of neural network parameters and the second set of neural network parameters are distinct sets that hold identical neural network parameters.

Of course, in some embodiments, not all sets of neural network parameters are identical across all neural networks, and not all neural networks are modeled identically across all functional blocks (e.g., feature aggregation blocks). In some embodiments using two or more feature aggregation blocks, the two or more feature aggregation blocks, which include two or more respective neural networks, may utilize two or more respective identical sets of neural network parameters.

In some embodiments, the feature aggregation blocks may be a mixture of different types of feature aggregation blocks, for example, a mixture of IRN blocks and transformer blocks. In some embodiments, a single feature aggregation block may be replaced by two or more cascaded feature aggregation blocks to achieve better compression performance.

Context-aware voxel-based upsampling may be applied to point cloud decompression. In some embodiments, context-aware voxel-based upsampling may be applied to the decoder of the '087 application to generate a point cloud that is closer to the ground truth.

FIG. 16 is a block diagram illustrating an exemplary original decoder architecture according to some embodiments. The architecture (1600) of the decoder of the '087 application, which includes a base layer and an enhancement layer, is depicted in FIG. 16. The base layer receives a bitstream BS₀, on which a base decode (1602) and an inverse quantization (1604) are performed to generate a coarse/simplified point cloud PC₀. PC₀ is a simplified or low-resolution version of the original input point cloud in a voxel-based representation. From the bitstream BS₁, a feature decoder block (1606) decodes the voxel-wise features of PC₀, which equips every occupied voxel in PC₀ with a vector feature that is an abstraction of the local geometric structure. If BS₁ is not available (referred to as "skip mode" in the '087 application), the feature decoder still synthesizes, for each voxel in PC₀, a vector feature based on the geometry of PC₀. The point cloud with the resulting features attached is denoted PC'₀.

In the '087 application, all features in PC'₀ are input to the feature-to-residual transformer (1608) to decode local 3D point sets. Specifically, for a voxel $A$ in PC'₀ located at $(x, y, z)$, its feature $f_A$ is input to the feature-to-residual transformer, which outputs a set of $k$ 3D points $\{(x_i, y_i, z_i)\}_{i=0}^{k-1}$. The feature-to-residual transformer (1608) may be a series of MLP layers. The geometric summation in FIG. 16 translates the decoded point set by $(x, y, z)$, as shown in Equations 9 to 11:

$$x'_i = x_i + x \qquad (9)$$

$$y'_i = y_i + y \qquad (10)$$

$$z'_i = z_i + z \qquad (11)$$

The translated point sets associated with all voxels in the feature set PC'₀ together form the decoded point cloud PC_DEC. If there are M voxels in PC'₀, the decoded point cloud PC_DEC contains Mk points.
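By way of a non-limiting illustration, the geometric summation may be sketched as a broadcasted addition: each voxel's decoded point set is translated by that voxel's own position, and the M translated sets of k points are stacked into one cloud of Mk points. The array layout and the function name `geometric_sum` are assumptions made for the example.

```python
import numpy as np

def geometric_sum(voxel_xyz, residuals):
    """Translate each voxel's decoded point set by the voxel position.

    voxel_xyz: (M, 3) voxel positions; residuals: (M, k, 3) decoded point sets.
    Returns the decoded point cloud as an (M * k, 3) array.
    """
    translated = residuals + voxel_xyz[:, None, :]   # broadcast over the k points
    return translated.reshape(-1, 3)

voxel_xyz = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])        # M = 2
residuals = np.array([[[0.5, 0.0, 0.0], [-0.5, 0.0, 0.0]],
                      [[0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]])      # k = 2
pc_dec = geometric_sum(voxel_xyz, residuals)                     # M * k = 4 points
```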

In the base layer of FIG. 16, a point cloud is decoded from the bitstream BS₀ using a base decoder. A dequantizer is applied to that point cloud to obtain the coarser point cloud PC₀. In some embodiments, the dequantizer may use a step size of s.

In the enhancement layer, a feature decoder is applied to decode BS₁ together with the already decoded coarser point cloud PC₀ to output a pointwise feature set PC'₀. The feature set PC'₀ contains a pointwise feature for each point in PC₀. For example, a point A in PC₀ has its own feature vector f'_A. The decoded feature vector f'_A may have a different size from its corresponding feature vector f_A on the encoder side. However, both f_A and f'_A are generated to describe the fine local geometric details of PC₀ close to point A. The decoded feature set PC'₀ is input to the feature-to-residual transformer, which generates the residual component of PC_DEC. The coarser point cloud PC₀ and the residual are input to a geometric summation block. The summation block adds the residual component to the coarser point cloud PC₀ to produce the final decoded point cloud PC_DEC.

In some embodiments, the base decoder may be any PCC codec. In some embodiments, the base decoder is selected to use a lossy PCC codec, such as the codec of Wang. In some embodiments, the base codec may be a lossless PCC codec, such as the MPEG G-PCC standard, or a deep entropy model using an octree representation.

The number of decoded points m may be a fixed constant, such as m = 5, or the number of decoded points may be chosen adaptively, for example, based on prior knowledge of the density level of the original point cloud. For example, if the original point cloud is very sparse, m may be set to a small number, such as m = 2.

The feature-to-residual transformer converts the decoded feature set PC'₀ back into the residual component of PC_DEC. In particular, in some embodiments, the feature-to-residual transformer applies a deep neural network to convert every feature vector f'_A in PC'₀ (associated with a point A in PC₀) back into the corresponding residual point set S'_A.

In some embodiments, the feature-to-residual transformer may be a series of MLP layers. In this case, a feature vector in PC'₀, e.g., f'_A, is input to the series of MLP layers. The MLP layers directly output a set of m 3D points C₀, C₁, ..., C_{m-1}, which provides the decoded residual set S'_A. Thus, for PC₀ with n points A₀, A₁, ..., A_{n-1}, the feature-to-residual transformer produces the respective decoded residual sets, denoted S'₀, S'₁, ..., S'_{n-1}. Together, these residual sets constitute the decoded residual component.

In some embodiments, 3D points in the residual component that are too far from the origin may be removed. Specifically, for a point C_i in the residual component, if its distance from the origin is greater than a threshold t, the point C_i is considered an outlier and is removed from the residual component. The threshold t may be a predefined constant. The threshold may also be chosen according to the quantization step size s of the quantizer on the encoder. For example, a larger s means that PC₀ is coarser, and the threshold may be set to a larger value to retain more points in the residual component.
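By way of a non-limiting illustration, the outlier removal may be sketched as a norm test on the residual points. The Euclidean norm and the linear rule in `threshold_from_step` (threshold proportional to the quantization step size s, with a hypothetical `scale` factor) are assumptions; the description states only that the threshold may be a constant or may be chosen according to s.

```python
import numpy as np

def remove_outliers(residual_points, t):
    """Drop residual points whose distance from the origin is greater than t."""
    keep = np.linalg.norm(residual_points, axis=1) <= t
    return residual_points[keep]

def threshold_from_step(s, scale=1.0):
    """Hypothetical rule: a coarser PC0 (larger s) allows larger residuals."""
    return scale * s

residual = np.array([[0.5, 0.0, 0.0],    # distance 0.5: kept
                     [3.0, 4.0, 0.0],    # distance 5.0: removed as an outlier
                     [0.0, 0.0, 2.0]])   # distance 2.0: kept (not greater than t)
kept = remove_outliers(residual, threshold_from_step(2.0))
```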

The value of k may be chosen based on the density level of the input point cloud. For a dense point cloud, the value of k may be larger (e.g., k = 10). For a sparse point cloud, such as a LiDAR sweep, the value of k may be very small, such as k = 1, which may mean that every point in PC₀ of FIG. 16 is associated with only one point in the original point cloud.

FIG. 17 is a block diagram illustrating an exemplary decoder architecture with voxel-based upsampling according to some embodiments. In some embodiments, as depicted in FIGS. 16 and 17, the base layer receives a bitstream BS₀, on which a base decode (1702) and a dequantization (1704) are performed. In some embodiments, as depicted in FIG. 17, a context-aware upsampling block (1708) is inserted between the feature decoder block (1706) and the feature-to-residual transformer (1710). Furthermore, the geometric summation module now takes PC₁ as input instead of the PC₀ depicted in FIG. 16. In some embodiments, instead of encoding/decoding the original PC₀ of FIG. 16, the PC₀ that is encoded/decoded in FIG. 17 is two times smaller and thus requires fewer bits. In some embodiments, the context-aware upsampling block (1708) of FIG. 17 may be performed by context-aware voxel-based upsampling using the exemplary pruning process illustrated in FIG. 5.

FIG. 18 is a block diagram illustrating an exemplary decoder architecture with voxel-based upsampling and feature aggregation according to some embodiments. In some embodiments of the exemplary process (1800), as illustrated in FIG. 18, a feature aggregation block (1810) is inserted between the context-aware upsampling block (1808) and the feature-to-residual transformer (1812). In this case, the context-aware upsampling block (1808) may also be cascaded multiple times, for example, in the manner presented in FIG. 7, to achieve a higher upsampling ratio and additional bit savings for PC₀. In some embodiments, the base layer receives a bitstream BS₀, on which a base decode (1802), a dequantization (1804), and a feature decode (1806) are performed.

FIG. 19 is a block diagram illustrating an exemplary decoder architecture without a feature-to-residual transformer according to some embodiments. In some embodiments, as illustrated in FIG. 19 and in contrast to FIG. 18, only the context-aware upsampling blocks (1908, 1912) are present, and the feature-to-residual transformer is removed. In this scenario, PC₀ is progressively upsampled and refined to obtain the decoded point cloud PC_DEC. In the example of FIG. 19, two context-aware upsampling blocks (1908, 1912) are illustrated on either side of the feature aggregation block (1910). However, in some embodiments, a different number of context-aware upsampling blocks may be used to obtain PC_DEC. In some embodiments, the base layer receives a bitstream BS₀, on which a base decode (1902), a dequantization (1904), and a feature decode (1906) are performed.

In the '015 application, a hybrid coding framework for PCC is used to implement octree-based PCC, voxel-based PCC, and point-based PCC. In some embodiments, one or more of these types of PCC may be used in the methods illustrated in FIGS. 17, 18, and 19. The '015 application proposes combining two of these types of PCC. In particular, in one case, only (i) the octree-based PCC method and (ii) the voxel-based PCC method are used. This configuration corresponds to FIG. 19.

The processes described in the present application may be applied to point cloud super-resolution. In some embodiments, a context-aware upsampling process may be applied to an input point cloud PC₀ and the feature set associated with each of its occupied voxels to achieve 2x super-resolution. In some embodiments, multiple context-aware upsampling processes may be applied, as illustrated in FIG. 7, to the input point cloud PC₀ and the feature set associated with each of its occupied voxels to achieve super-resolution greater than 2x.

In some embodiments, the features associated with the voxels may be attributes such as color and intensity. In some embodiments, the features associated with the voxels may be local geometric features of PC₀ extracted by neural network layers, such as by the feature aggregation processes illustrated in FIGS. 10-12. In some embodiments, the features associated with the voxels may be a concatenation of both attributes and geometric features.

FIG. 20 is a block diagram illustrating an exemplary decoder architecture having a single pass through voxel-based upsampling and feature aggregation according to some embodiments. In some embodiments of the exemplary feature decoder process (2000), as illustrated in FIG. 20, when context-aware voxel-based upsampling (2002) is applied only once, a feature aggregation (2004) may be appended after the context-aware voxel-based upsampling for feature aggregation and refinement.

When feature aggregation is applied following context-aware voxel-based upsampling (as illustrated in FIG. 20), that feature aggregation and all other feature aggregations within the context-aware upsampling block have the same neural network architecture and share the same neural network parameters. In some embodiments, by having all feature aggregation blocks share the same set of weights, the total number of neural network parameters may be reduced. By sharing neural network parameters between the feature aggregation blocks, the total number of parameters of the neural network model may be reduced, which has, for example, the following two advantages. First, the model size of the neural network may be reduced, which makes the neural network model easier to store or transmit. Second, reducing the total number of parameters to be learned during the training stage may make training converge faster. However, sharing neural network parameters between the feature aggregation blocks may reduce the capacity of the neural network model, which may impair the ability of the neural network to extract high-level features and may degrade the performance of the neural network. This potential consequence may prove detrimental to some applications.

In some embodiments, when context-aware voxel-based upsampling with pruning is used as illustrated in FIG. 5, the following feature aggregation blocks may share the same set of neural network parameters: (i) the feature aggregation block within the binary classification block (illustrated in FIG. 9), and (ii) the feature aggregation block after the context-aware voxel-based upsampling block (illustrated in FIG. 20).

Similarly, in some embodiments, when context-aware voxel-based upsampling with initial feature aggregation is used as illustrated in FIG. 8, the following feature aggregation blocks may share the same set of neural network parameters: (i) the feature aggregation block after the nearest neighbor (NN) upsampling block (illustrated in FIG. 8), (ii) the feature aggregation block within the binary classification block (illustrated in FIG. 9), and (iii) the feature aggregation block after the context-aware voxel-based upsampling block (illustrated in FIG. 20).

FIG. 21 is a block diagram illustrating exemplary sparse tensor operations according to some embodiments. FIG. 21 illustrates an exemplary process (2100) comprising downsampling (2104), upsampling (2108), coordinate reading/splitting (2116), and coordinate pruning (2112). For simplicity, the operations in FIG. 21 are described in 2D space, but the same rationale applies in 3D space. In this example, the input point cloud A0 (2102) occupies voxels at locations (0, 2), (0, 3), (0, 4), (0, 5), (1, 1), (1, 6), (2, 6), (3, 5), (4, 4), (5, 4), (6, 4), and (7, 4) (2118), where coordinates are zero-based and the origin is at the upper left corner. Accordingly, the coordinate reader/splitter (2116) outputs these same twelve locations as the occupied coordinates (2118). Downsampling A0 (2102) by a ratio of 2 halves the number of voxels along each dimension in the downsampled point cloud A1 (2106), and a voxel of A1 is considered occupied if any of its four corresponding voxels in A0 (2102) is occupied. Upsampling A1 (2106) restores the original number of voxels in A2 (2110), and a voxel in A2 (2110) is considered occupied if its corresponding voxel in A1 (2106) is occupied. A2 (2110) is therefore denser than the input point cloud A0 (2102). To remove (prune) from A2 (2110) the voxels that are not occupied in A0 (2102), the coordinate pruning block uses the occupied coordinate information. In the resulting point cloud A3 (2114), only the voxels that are occupied in the original point cloud A0 are treated as occupied.
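The 2D example of FIG. 21 can be reproduced with plain coordinate sets. This is a minimal sketch that models only occupancy (no features); the function names are illustrative and are not taken from the source or from any sparse tensor library.

```python
# Occupied voxels of the input point cloud A0, as listed for FIG. 21.
A0 = {(0, 2), (0, 3), (0, 4), (0, 5), (1, 1), (1, 6),
      (2, 6), (3, 5), (4, 4), (5, 4), (6, 4), (7, 4)}


def downsample(coords, ratio=2):
    """A coarse voxel is occupied if any of its ratio*ratio children is occupied."""
    return {(a // ratio, b // ratio) for a, b in coords}


def upsample(coords, ratio=2):
    """Every occupied coarse voxel marks all ratio*ratio of its children as occupied."""
    return {(a * ratio + da, b * ratio + db)
            for a, b in coords
            for da in range(ratio)
            for db in range(ratio)}


def prune(coords, keep):
    """Coordinate pruning: keep only voxels present in the reference coordinate set."""
    return coords & keep


A1 = downsample(A0)   # coarser geometry
A2 = upsample(A1)     # denser than A0
A3 = prune(A2, A0)    # back to the geometry of A0

print(len(A0), len(A1), len(A2), len(A3))
```

Because the two coordinates are handled identically, the sketch does not depend on which of them is the row and which is the column. In 3D the same functions would operate on triples and each coarse voxel would have eight children.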

FIG. 22 is a block diagram illustrating an exemplary decoder architecture according to some embodiments. FIG. 22 shows an exemplary feature decoder (2200) that, in some embodiments, is based on sparse 3D convolution, downsampling, and upsampling. A bitstream BS₁ (2202) is decoded by an entropy decoder (2204) and then by a feature dequantizer (2206) to generate a downsampled feature set F'_down. The features are then progressively enlarged and refined through sequential upsampling that uses the geometry of PC₀.

As illustrated in the upper right corner of FIG. 22, a 3D sparse tensor is constructed (2244) based (solely) on the geometry (coordinates) of PC₀. The tensor is sequentially downsampled (2240, 2236) to generate a tensor PC'_down. PC'_down in FIG. 22 and PC_down of the feature encoder (not illustrated) may have the same geometry, but in some embodiments their features may differ. To upsample F'_down, F'_down is first converted to the geometry of PC'_down: a feature replacement block (2208) replaces the original features of PC'_down with F'_down, generating another sparse tensor PC"_down.

PC"_down is upsampled by two upsampling processing blocks, each of which includes one upsampling operator (2210, 2222) and two sparse 3D convolutional layers. In FIG. 22, "upsampling 2" denotes a sparse tensor upsampling operator (2210, 2222) with a ratio of 2. In some embodiments, the upsampling 2 block may include the sparse tensor upsampling operator (2210, 2222) and two sets of convolutional decoders (2212, 2216, 2224, 2228) and rectified linear units (ReLUs) (2214, 2218, 2226, 2230). The upsampling 2 block doubles the size of the sparse tensor along each dimension, similar to the upsampling operator for a "regular" 2D image. See FIG. 21 for an illustrative example. After each upsampling processing block, the resulting tensor is refined using a respective coordinate reader (2238, 2242) and coordinate pruning block (2220, 2232). FIG. 21 shows an illustrative example of coordinate pruning, which removes some of the occupied voxels of the input tensor and keeps the rest, based on an input coordinate set that can be obtained from a coordinate reader. The coordinate pruning block (2220) removes some voxels (and their associated features) from the upsampled version of PC"_down and keeps only those voxels that also appear in the downsampled version of PC₀. The output of the second coordinate pruning block (2232) is a tensor having the same geometry as PC₀. This tensor is input to a feature reader (2234) to obtain a decoded feature set PC'₀.
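The geometry-guided data flow of the decoder in FIG. 22 can be sketched as follows. This is a simplified illustration under several assumptions: the sparse tensor is modeled as a dictionary from voxel coordinates to a feature, F'_down is assumed to be already keyed by the coordinates of PC'_down, features are copied unchanged to child voxels, and the sparse 3D convolution and ReLU layers are omitted. All function names are hypothetical.

```python
def downsample(coords, ratio=2):
    """Coarser geometry: a voxel is occupied if any of its children is occupied."""
    return {tuple(v // ratio for v in c) for c in coords}


def upsample_features(feats, ratio=2):
    """Copy each voxel's feature to its ratio**3 child voxels."""
    out = {}
    for (x, y, z), f in feats.items():
        for dx in range(ratio):
            for dy in range(ratio):
                for dz in range(ratio):
                    out[(x * ratio + dx, y * ratio + dy, z * ratio + dz)] = f
    return out


def prune(feats, keep):
    """Coordinate pruning: keep only voxels listed by the coordinate reader."""
    return {c: f for c, f in feats.items() if c in keep}


def decode(pc0_coords, f_down):
    level1 = downsample(pc0_coords)  # geometry after the first downsampling
    level2 = downsample(level1)      # geometry of PC'_down
    # Feature replacement: attach the decoded features to the geometry of PC'_down.
    tensor = {c: f_down[c] for c in level2}
    for target in (level1, pc0_coords):
        tensor = upsample_features(tensor)
        # The convolution + ReLU layers of the upsampling processing block
        # would be applied here; they are omitted in this sketch.
        tensor = prune(tensor, target)
    return tensor


pc0 = {(0, 0, 0), (1, 2, 3), (5, 5, 5), (7, 0, 4)}
f_down = {(0, 0, 0): 0.1, (1, 1, 1): 0.2, (1, 0, 1): 0.3}
decoded = decode(pc0, f_down)
print(sorted(decoded.items()))
```

After the second pruning step the tensor has exactly the geometry of PC₀, which is the property that the feature reader relies on.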

FIG. 23 is a block diagram illustrating an exemplary decoder architecture according to some embodiments. Compared to FIG. 22, in some embodiments of FIG. 23 the coordinate pruning blocks (2318, 2332) (and, in some embodiments, the feature aggregation blocks (2320, 2334)) are absorbed into the preceding upsampling processing block.

In some embodiments of the exemplary decoder (2300), the tensor is sequentially downsampled (2342, 2338) to generate a tensor PC'_down. In some embodiments, the bitstream is entropy decoded (2302) and feature dequantized (2304) to generate a downsampled feature set F'_down. To convert F'_down to the geometry of PC'_down, the feature replacement block (2306) replaces the original features of PC'_down with F'_down, generating another sparse tensor PC"_down. In some embodiments, the upsampling 2 block may include a sparse tensor upsampling operator (2308, 2322) and two sets of convolutional decoders (2310, 2314, 2324, 2328) and rectified linear units (ReLUs) (2312, 2316, 2326, 2330). In some embodiments, after each upsampling processing block, the resulting tensor is refined using a respective coordinate reader (2340, 2344) and coordinate pruning block (2318, 2332). The output of the second feature aggregation block (2334) is a tensor having the same geometry as PC₀. This tensor is input to a feature reader (2336) to obtain a decoded feature set PC'₀.

FIG. 24 is a flowchart illustrating an exemplary process of context-aware voxel-based upsampling using pruning according to some embodiments. In some embodiments, the exemplary process (2400) may include a step (2402) of upsampling a first point cloud using an initial upsampling to obtain a second point cloud. In some embodiments, the exemplary process (2400) may further include a step (2404) of associating features of the second point cloud with voxel-wise context information to obtain a third point cloud. In some embodiments, the exemplary process (2400) may further include a step (2406) of predicting an occupancy state of at least one voxel of the third point cloud. In some embodiments, the exemplary process (2400) may further include a step (2408) of removing, based on the predicted occupancy state, voxels of the third point cloud classified as empty to generate a pruned point cloud. In some embodiments, the initial upsampling may include nearest neighbor upsampling. In some embodiments, associating features may include concatenating features.
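The four steps of process 2400 can be sketched on a toy point cloud stored as a dictionary from voxel coordinates to a feature list. This passage does not define the voxel-wise context information or the classifier, so both are stand-ins here: the context is assumed to be the child voxel's offset inside its parent, and `toy_predictor` is a hypothetical rule in place of a trained network.

```python
def nn_upsample(cloud, ratio=2):
    """Step 2402: nearest neighbor upsampling copies each feature to ratio**3 children."""
    out = {}
    for (x, y, z), feat in cloud.items():
        for dx in range(ratio):
            for dy in range(ratio):
                for dz in range(ratio):
                    out[(x * ratio + dx, y * ratio + dy, z * ratio + dz)] = list(feat)
    return out


def attach_context(cloud, ratio=2):
    """Step 2404: concatenate voxel-wise context to each feature vector.

    Assumption: the context is the child's offset within its parent voxel.
    """
    return {c: feat + [float(v % ratio) for v in c] for c, feat in cloud.items()}


def prune_by_occupancy(cloud, predict):
    """Steps 2406 and 2408: predict occupancy per voxel and drop voxels classified as empty."""
    return {c: feat for c, feat in cloud.items() if predict(feat)}


def toy_predictor(feat):
    """Hypothetical stand-in for the classifier: keep only the child at offset (0, 0, 0)."""
    return feat[-3:] == [0.0, 0.0, 0.0]


first = {(0, 0, 0): [1.0], (1, 0, 0): [2.0]}
second = nn_upsample(first)
third = attach_context(second)
pruned = prune_by_occupancy(third, toy_predictor)
print(len(second), len(third), sorted(pruned))
```

In this sketch the classifier sees the concatenated feature vector, so the context information is what allows it to treat the eight children of one parent differently even though nearest neighbor upsampling gave them identical features.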

FIG. 25 is a flowchart illustrating an exemplary process of context-aware voxel-based upsampling and feature aggregation according to some embodiments. In some embodiments, the exemplary process (2500) may include a step (2502) of upsampling a first point cloud using an initial upsampling to obtain a second point cloud. In some embodiments, the exemplary process (2500) may further include a step (2504) of associating features of the second point cloud with context information to obtain a third point cloud. In some embodiments, the exemplary process (2500) may further include a step (2506) of predicting an occupancy state of at least one voxel of the third point cloud, wherein predicting the occupancy state of the at least one voxel comprises aggregating at least one feature of the third point cloud, wherein aggregating the at least one feature of the third point cloud comprises using a first neural network, and wherein using the first neural network to aggregate the at least one feature comprises using a first neural network parameter set with the first neural network. In some embodiments, the exemplary process (2500) may further include a step (2508) of removing, based on the predicted occupancy state, voxels of the third point cloud classified as empty to generate a pruned point cloud. In some embodiments, the exemplary process (2500) may further include a step (2510) of performing feature aggregation on the pruned point cloud to generate an aggregated feature, wherein performing feature aggregation on the pruned point cloud comprises using a second neural network, wherein using the second neural network to generate the aggregated feature comprises using a second neural network parameter set with the second neural network, and wherein the first neural network parameter set is identical to the second neural network parameter set.

FIG. 26 is a flowchart illustrating an exemplary process for encoding a bitstream according to some embodiments. In some embodiments, the exemplary process (2600) may include a step (2602) of obtaining a first point cloud. In some embodiments, the exemplary process (2600) may further include a step (2604) of determining an occupancy state of at least one voxel of the first point cloud. In some embodiments, the exemplary process (2600) may further include a step (2606) of removing, based on the determined occupancy state, voxels of the first point cloud classified as empty to generate a second point cloud. In some embodiments, the exemplary process (2600) may further include a step (2608) of associating features of the second point cloud with context information to obtain a third point cloud. In some embodiments, the exemplary process (2600) may further include a step (2610) of downsampling the third point cloud using initial downsampling to obtain a fourth point cloud. In some embodiments, the exemplary process (2600) may further include a step (2612) of outputting the fourth point cloud as an encoded point cloud.

In some embodiments, an apparatus may include one or more processors configured to: upsample a first point cloud using nearest neighbor upsampling to obtain a second point cloud; concatenate features of the second point cloud with voxel-wise context information to obtain a third point cloud; predict an occupancy state of at least one voxel of the third point cloud; and remove, based on the predicted occupancy state, voxels of the third point cloud classified as empty.

While the methods and systems according to some embodiments are discussed in a virtual reality (VR) context, some embodiments may also be applied in a mixed reality (MR) or augmented reality (AR) context. Additionally, while the term "head-mounted display (HMD)" is used herein according to some embodiments, some embodiments may be applied to, for example, a wearable device (which may or may not be attached to the head) that is capable of VR, AR, and/or MR.

An exemplary method according to some embodiments may include: upsampling a first point cloud using initial upsampling to obtain a second point cloud; associating features of the second point cloud with context information to obtain a third point cloud; predicting an occupancy state of at least one voxel of the third point cloud; and removing, based on the predicted occupancy state, voxels of the third point cloud classified as empty to generate a pruned point cloud.

In some embodiments of the exemplary method, the initial upsampling may include nearest neighbor upsampling.

In some embodiments of the exemplary method, the step of associating features may include concatenating features of the second point cloud with context information to obtain the third point cloud.

In some embodiments of the exemplary method, the step of predicting the occupancy state may be performed using a first neural network.

In some embodiments of the exemplary method, the step of predicting the occupancy state may predict a ground truth occupancy state of the at least one voxel.

In some embodiments of the exemplary method, the step of predicting the occupancy state may predict a likelihood that the at least one voxel is occupied.

In some embodiments of the exemplary method, the step of removing voxels of the third point cloud may remove the voxels using a voxel pruning process.

Some embodiments of the exemplary method may further include a step of aggregating at least one feature of the second point cloud.

In some embodiments of the exemplary method, the context information may be voxel-wise context information.

In some embodiments of the exemplary method, the step of predicting the occupancy state of at least one voxel may include: aggregating at least one feature of the third point cloud; processing the aggregated feature with multilayer perceptron (MLP) layers to generate an MLP layer output; performing a softmax process on the MLP layer output to generate softmax output values; and performing thresholding on the softmax output values to generate a predicted occupancy state of the at least one voxel of the third point cloud.

In some embodiments of the exemplary method, the thresholding converts softmax output values greater than 0.5 to an output value of 1 and converts softmax output values less than or equal to 0.5 to an output value of 0.
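The softmax and thresholding steps can be sketched for a single voxel as follows. The sketch assumes that the MLP layers produce two scores per voxel, ordered as [empty, occupied]; that ordering and the function names are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_occupancy(logits, threshold=0.5):
    """Return 1 if the 'occupied' probability exceeds the threshold, else 0.

    Assumption: logits = [empty_score, occupied_score].
    """
    p_occupied = softmax(logits)[1]
    return 1 if p_occupied > threshold else 0


print(predict_occupancy([0.0, 2.0]))  # occupied score dominates
print(predict_occupancy([2.0, 0.0]))  # empty score dominates
print(predict_occupancy([1.0, 1.0]))  # tie: probability is exactly 0.5
```

A probability of exactly 0.5 maps to 0, consistent with the rule that values less than or equal to 0.5 are converted to 0.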

In some embodiments of the exemplary method, the step of predicting the occupancy state of at least one voxel may include: aggregating at least one feature of the third point cloud; and generating a predicted occupancy state of the at least one voxel of the third point cloud based on the aggregated feature.

In some embodiments of the exemplary method, the step of aggregating at least one feature may include repeating a cascading process one or more times, wherein the cascading process may include: performing a sparse 3D convolution of an input point cloud to generate a convolution output point cloud; performing a nonlinear activation process on the convolution output point cloud to generate a nonlinear output point cloud; and, if there is a next cycle of the cascading process, preparing the nonlinear output point cloud as the input point cloud for that cycle, wherein the third point cloud is the input point cloud for the first cycle of the cascading process, and wherein the last cycle of the cascading process generates the aggregated feature.

Some embodiments of the exemplary method may further include a step of adding the third point cloud to the ReLU output point cloud of the last cycle of the cascading process.

In some embodiments of the exemplary method, the step of aggregating at least one feature may include: performing a sparse 3D convolution of an input point cloud to generate a convolution output point cloud; and performing a nonlinear activation process on the convolution output point cloud to generate the aggregated feature.

In some embodiments of the exemplary method, the nonlinear activation process may be a rectified linear unit (ReLU) activation process.
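The cascade of sparse 3D convolution and ReLU, with the optional addition of the input point cloud at the end, can be sketched in plain Python. The sketch is deliberately reduced: each voxel carries a single scalar feature, the kernel is 3x3x3, and the output is evaluated only at occupied voxels. These choices and all names are assumptions for illustration and do not represent a particular sparse convolution library.

```python
from itertools import product

# The 27 neighbor offsets of a 3x3x3 kernel.
OFFSETS = list(product((-1, 0, 1), repeat=3))


def sparse_conv3d(feats, weights, bias=0.0):
    """Convolve only at occupied voxels; unoccupied neighbors contribute nothing."""
    out = {}
    for (x, y, z) in feats:
        acc = bias
        for (dx, dy, dz) in OFFSETS:
            neighbor = (x + dx, y + dy, z + dz)
            if neighbor in feats:
                acc += weights[(dx, dy, dz)] * feats[neighbor]
        out[(x, y, z)] = acc
    return out


def relu(feats):
    """Nonlinear activation applied per voxel."""
    return {c: max(0.0, v) for c, v in feats.items()}


def aggregate(feats, layer_weights, residual=True):
    """Run one conv + ReLU cycle per entry of layer_weights, then add the input."""
    current = feats
    for weights in layer_weights:
        current = relu(sparse_conv3d(current, weights))
    if residual:
        current = {c: current[c] + feats[c] for c in feats}
    return current


identity = {o: (1.0 if o == (0, 0, 0) else 0.0) for o in OFFSETS}
box = {o: 1.0 for o in OFFSETS}
cloud = {(0, 0, 0): 1.0, (1, 0, 0): -2.0}

print(aggregate(cloud, [identity]))
print(aggregate(cloud, [box], residual=False))
```

The geometry is unchanged by the cascade: the output has a feature at every input voxel and nowhere else, which is what makes the final element-wise addition well defined.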

In some embodiments of the exemplary method, the step of aggregating at least one feature may include: repeating a first cascading process one or more times, wherein the first cascading process comprises: performing a first sparse 3D convolution of a first input point cloud to generate a first convolution output point cloud; performing a first nonlinear activation process on the first convolution output point cloud to generate a first nonlinear output point cloud; and, if there is a next cycle of the first cascading process, preparing the first nonlinear output point cloud as the first input point cloud for that cycle, wherein the third point cloud is the first input point cloud for the first cycle of the first cascading process, and wherein the last cycle of the first cascading process generates a first cascading process output; repeating a second cascading process one or more times, wherein the second cascading process comprises: performing a second sparse 3D convolution of a second input point cloud to generate a second convolution output point cloud; performing a second nonlinear activation process on the second convolution output point cloud to generate a second nonlinear output point cloud; and, if there is a next cycle of the second cascading process, preparing the second nonlinear output point cloud as the second input point cloud for that cycle, wherein the third point cloud is the second input point cloud for the first cycle of the second cascading process, and wherein the last cycle of the second cascading process generates a second cascading process output; concatenating the first cascading process output and the second cascading process output to generate a concatenated output; and adding the third point cloud to the concatenated output to generate the aggregated feature.

In some embodiments of the exemplary method, the step of aggregating at least one feature may include: repeating a first cascading process one or more times, wherein the first cascading process comprises: performing a first sparse 3D convolution of a first input point cloud to generate a first convolution output point cloud; performing a first rectified linear unit (ReLU) activation process on the first convolution output point cloud to generate a first ReLU output point cloud; and, if there is a next cycle of the first cascading process, preparing the first ReLU output point cloud as the first input point cloud for that cycle, wherein the third point cloud is the first input point cloud for the first cycle of the first cascading process, and wherein the last cycle of the first cascading process generates a first cascading process output; repeating a second cascading process one or more times, wherein the second cascading process comprises: performing a second sparse 3D convolution of a second input point cloud to generate a second convolution output point cloud; performing a second ReLU activation process on the second convolution output point cloud to generate a second ReLU output point cloud; and, if there is a next cycle of the second cascading process, preparing the second ReLU output point cloud as the second input point cloud for that cycle, wherein the third point cloud is the second input point cloud for the first cycle of the second cascading process, and wherein the last cycle of the second cascading process generates a second cascading process output; concatenating the first cascading process output and the second cascading process output to generate a concatenated output; and adding the third point cloud to the concatenated output to generate the aggregated feature.
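The two-branch variant can be sketched with per-voxel feature vectors. To keep the example short, each sparse 3D convolution is replaced by a 1x1x1 (per-voxel) linear layer followed by ReLU, which is a stand-in and not the operation described above. The sketch also assumes that the two branch outputs together have as many channels as the input, so that the final element-wise addition is defined; the source does not state the channel counts.

```python
def pointwise_layer(feats, matrix):
    """Stand-in for one sparse 3D convolution + ReLU: a per-voxel linear map."""
    return {
        c: [max(0.0, sum(w * x for w, x in zip(row, f))) for row in matrix]
        for c, f in feats.items()
    }


def cascade(feats, matrices):
    """Run one layer per matrix; each output is the next cycle's input."""
    for matrix in matrices:
        feats = pointwise_layer(feats, matrix)
    return feats


def two_branch_aggregate(feats, branch1, branch2):
    """Concatenate the two branch outputs per voxel, then add the input features."""
    out1 = cascade(feats, branch1)
    out2 = cascade(feats, branch2)
    result = {}
    for c, f in feats.items():
        concatenated = out1[c] + out2[c]  # list concatenation along channels
        if len(concatenated) != len(f):
            raise ValueError("concatenated width must match the input width")
        result[c] = [a + b for a, b in zip(concatenated, f)]
    return result


third = {(0, 0, 0): [1.0, -1.0]}
branch1 = [[[1.0, 0.0]]]  # one layer that selects channel 0
branch2 = [[[0.0, 1.0]]]  # one layer that selects channel 1
print(two_branch_aggregate(third, branch1, branch2))
```

The two branches may use different numbers of cycles or different kernels, so that the concatenated output combines features gathered at different receptive fields before the input is added back.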

In some embodiments of the exemplary method, the step of aggregating at least one feature may include: performing a self-attention process on the third point cloud; adding the third point cloud to the self-attention process output to generate an MLP process input; performing an MLP process on the MLP process input; and adding the MLP process input to the MLP process output to generate the aggregated feature.

In some embodiments of the exemplary method, the self-attention process generates the output feature for a voxel of the third point cloud based on the k nearest neighbors of that voxel.
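The attention-based aggregation can be sketched as a small transformer-style block over the k nearest neighbors of each voxel. The sketch makes several simplifying assumptions that are not taken from the source: queries, keys, and values are the raw features (no learned projections), attention uses scaled dot products, a voxel counts as its own nearest neighbor, and the MLP is reduced to a ReLU.

```python
import math


def knn(coord, coords, k):
    """The k voxels closest to coord (squared Euclidean distance, ties by coordinate)."""
    return sorted(coords, key=lambda c: (sum((a - b) ** 2 for a, b in zip(c, coord)), c))[:k]


def self_attention(feats, k):
    """Each voxel attends to the features of its k nearest neighbors."""
    dim = len(next(iter(feats.values())))
    scale = math.sqrt(dim)
    out = {}
    for c, query in feats.items():
        neighbors = knn(c, feats, k)
        scores = [sum(a * b for a, b in zip(query, feats[n])) / scale for n in neighbors]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        out[c] = [sum(e / total * feats[n][i] for e, n in zip(exps, neighbors))
                  for i in range(dim)]
    return out


def mlp(vec):
    """Hypothetical stand-in for the MLP process: ReLU only."""
    return [max(0.0, x) for x in vec]


def attention_block(feats, k=2):
    attended = self_attention(feats, k)
    # First residual connection: add the input to the self-attention output.
    mlp_in = {c: [a + b for a, b in zip(attended[c], feats[c])] for c in feats}
    # Second residual connection: add the MLP input to the MLP output.
    return {c: [a + b for a, b in zip(mlp(v), v)] for c, v in mlp_in.items()}


third = {(0, 0, 0): [1.0, 0.0], (1, 0, 0): [1.0, 0.0], (9, 9, 9): [0.0, 5.0]}
result = attention_block(third, k=2)
print(result[(0, 0, 0)])
```

Restricting attention to k neighbors keeps the cost per voxel bounded, in contrast with attention over every voxel of the point cloud.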

In some embodiments of the exemplary method, the step of aggregating at least one feature of the third point cloud may include performing the feature aggregation process two or more times.

Some embodiments of the exemplary method may further include a step of performing feature decoding on an input point cloud and a first bitstream to generate the first point cloud.

Some embodiments of the exemplary method may further include: performing a feature-to-residual transform on the pruned point cloud to generate a residual output; and adding the pruned point cloud to the residual output to generate a decoded point cloud.

Some embodiments of the exemplary method may further include a step of performing feature aggregation on the pruned point cloud to generate an aggregated feature, wherein the feature-to-residual transform is performed on the aggregated feature.

Some embodiments of the exemplary method may further include: performing feature aggregation on the pruned point cloud to generate an aggregated feature; and performing a context-aware upsampling process on the aggregated feature to generate a decoded point cloud.

일부 실시예들에 따른 예시적인 장치는 프로세서; 및 프로세서에 의해 실행될 때, 장치로 하여금 최근접 이웃 업샘플링을 사용하여 제1 포인트 클라우드를 업샘플링하여 제2 포인트 클라우드를 획득하게 하고; 제2 포인트 클라우드의 특징들을 콘텍스트 정보와 연결하여 제3 포인트 클라우드를 획득하게 하며; 제3 포인트 클라우드의 적어도 하나의 복셀의 점유 상태를 예측하게 하고; 예측된 점유 상태에 따라, 비어 있는 것으로 분류된 제3 포인트 클라우드의 복셀들을 제거하여 프루닝된 포인트 클라우드를 생성하게 하도록 작동하는 명령어들을 저장한 비일시적 컴퓨터 판독 가능 매체를 포함할 수 있다.An exemplary device according to some embodiments may include a processor; and a non-transitory computer-readable medium storing instructions that, when executed by the processor, cause the device to: upsample a first point cloud using nearest neighbor upsampling to obtain a second point cloud; associate features of the second point cloud with context information to obtain a third point cloud; predict an occupancy state of at least one voxel of the third point cloud; and remove voxels of the third point cloud classified as empty based on the predicted occupancy state to generate a pruned point cloud.

An exemplary device according to some embodiments may include an apparatus according to the exemplary apparatus; and at least one of (i) an antenna configured to receive a signal, the signal including data representing an image, (ii) a band limiter configured to limit the received signal to a frequency band that includes the data representing the image, or (iii) a display configured to display the image.

예시적인 방법의 일부 실시예들은 TV, 셀 폰, 태블릿, 및 셋톱 박스(STB) 중 적어도 하나를 더 포함할 수 있다.Some embodiments of the exemplary method may further include at least one of a TV, a cell phone, a tablet, and a set-top box (STB).

An exemplary apparatus according to some embodiments may include an access unit configured to access data including a first point cloud; and a transmitter configured to transmit the data including the first point cloud.

일부 실시예들에 따른 예시적인 방법은 제1 포인트 클라우드를 포함하는 데이터에 액세스하는 단계; 및 제1 포인트 클라우드를 포함하는 데이터를 송신하는 단계를 포함할 수 있다.An exemplary method according to some embodiments may include: accessing data comprising a first point cloud; and transmitting data comprising the first point cloud.

일부 실시예들에 따른 예시적인 컴퓨터 판독 가능 매체는 하나 이상의 프로세서로 하여금: 최근접 이웃 업샘플링을 사용하여 제1 포인트 클라우드를 업샘플링하여 제2 포인트 클라우드를 획득하게 하고; 제2 포인트 클라우드의 특징들을 콘텍스트 정보와 연결하여 제3 포인트 클라우드를 획득하게 하며; 제3 포인트 클라우드의 적어도 하나의 복셀의 점유 상태를 예측하게 하고; 예측된 점유 상태에 따라, 비어 있는 것으로 분류된 제3 포인트 클라우드의 복셀들을 제거하여 프루닝된 포인트 클라우드를 생성하게 하는 명령어들을 포함할 수 있다.An exemplary computer-readable medium according to some embodiments may include instructions that cause one or more processors to: obtain a second point cloud by upsampling a first point cloud using nearest neighbor upsampling; obtain a third point cloud by concatenating features of the second point cloud with context information; predict an occupancy state of at least one voxel of the third point cloud; and remove voxels of the third point cloud classified as empty based on the predicted occupancy state to generate a pruned point cloud.

일부 실시예들에 따른 예시적인 컴퓨터 프로그램 제품은 프로그램이 하나 이상의 프로세서에 의해 실행될 때, 하나 이상의 프로세서로 하여금: 최근접 이웃 업샘플링을 사용하여 제1 포인트 클라우드를 업샘플링하여 제2 포인트 클라우드를 획득하게 하고; 제2 포인트 클라우드의 특징들을 콘텍스트 정보와 연결하여 제3 포인트 클라우드를 획득하게 하며; 제3 포인트 클라우드의 적어도 하나의 복셀의 점유 상태를 예측하게 하고; 예측된 점유 상태에 따라, 비어 있는 것으로 분류된 제3 포인트 클라우드의 복셀들을 제거하여 프루닝된 포인트 클라우드를 생성하게 하는 명령어들을 포함할 수 있다.An exemplary computer program product according to some embodiments may include instructions that, when the program is executed by one or more processors, cause the one or more processors to: obtain a second point cloud by upsampling a first point cloud using nearest neighbor upsampling; obtain a third point cloud by concatenating features of the second point cloud with context information; predict an occupancy state of at least one voxel of the third point cloud; and remove voxels of the third point cloud classified as empty based on the predicted occupancy state to generate a pruned point cloud.

일부 실시예들에 따른 예시적인 방법은 제1 포인트 클라우드의 콘텍스트 인식 업샘플링을 수행하여 업샘플링된 제2 포인트 클라우드를 결정하는 단계를 포함할 수 있으며, 여기서 콘텍스트 인식 업샘플링은: 제3 포인트 클라우드의 특징들을 콘텍스트 정보와 연관시키는 것 - 제3 포인트 클라우드는 제1 포인트 클라우드의 초기 업샘플링된 버전에 적어도 부분적으로 기초함 -; 및 제3 포인트 클라우드로부터 콘텍스트 정보에 적어도 부분적으로 기초하여 비어 있는 것으로 예측되는 제4 포인트 클라우드의 복셀들을 제거하여 업스케일링된 제2 포인트 클라우드를 생성하는 것을 포함할 수 있다.An exemplary method according to some embodiments may include performing context-aware upsampling of a first point cloud to determine an upsampled second point cloud, wherein the context-aware upsampling may include: associating features of a third point cloud with context information, the third point cloud being at least partially based on an initial upsampled version of the first point cloud; and removing voxels of a fourth point cloud that are predicted to be empty based at least partially on the context information from the third point cloud to generate the upscaled second point cloud.

An additional exemplary method according to some embodiments may include: upsampling a first point cloud using initial upsampling to obtain a second point cloud; associating features of the second point cloud with context information to obtain a third point cloud; predicting an occupancy state of at least one voxel of the third point cloud, wherein predicting the occupancy state of the at least one voxel includes aggregating at least one feature of the third point cloud, wherein aggregating the at least one feature of the third point cloud includes using a first neural network, and wherein aggregating the at least one feature of the third point cloud using the first neural network includes using a first neural network parameter set with the first neural network; removing voxels of the third point cloud classified as empty according to the predicted occupancy state to generate a pruned point cloud; and performing feature aggregation on the pruned point cloud to generate aggregated features, wherein performing feature aggregation on the pruned point cloud includes using a second neural network, wherein generating the aggregated features using the second neural network includes using a second neural network parameter set with the second neural network, and wherein the first neural network parameter set is identical to the second neural network parameter set.

추가적인 예시적인 방법의 일부 실시예들은 제2 포인트 클라우드의 적어도 하나의 특징을 집계하는 단계를 더 포함할 수 있다.Some embodiments of the additional exemplary method may further include a step of aggregating at least one feature of the second point cloud.

추가적인 예시적인 방법의 일부 실시예들에서, 제2 포인트 클라우드의 적어도 하나의 특징을 집계하는 단계는 제3 신경 네트워크를 사용하는 단계를 포함할 수 있고, 제3 신경 네트워크를 사용하여 제2 포인트 클라우드의 적어도 하나의 특징을 집계하는 단계는 제3 신경 네트워크와 함께 제3 신경 네트워크 파라미터 세트를 사용하는 단계를 포함할 수 있으며, 제3 신경 네트워크 파라미터 세트는 제1 신경 네트워크 파라미터 세트와 동일할 수 있다.In some embodiments of the additional exemplary method, the step of aggregating at least one feature of the second point cloud may comprise using a third neural network, and the step of aggregating at least one feature of the second point cloud using the third neural network may comprise using a third neural network parameter set together with the third neural network, wherein the third neural network parameter set may be identical to the first neural network parameter set.

추가적인 예시적인 방법의 일부 실시예들은 입력 포인트 클라우드 및 제1 비트스트림에 대해 특징 디코드를 수행하여 제1 포인트 클라우드를 생성하는 단계를 더 포함할 수 있다.Some embodiments of the additional exemplary method may further include a step of performing feature decoding on the input point cloud and the first bitstream to generate a first point cloud.

추가적인 예시적인 방법의 일부 실시예들은: 프루닝된 포인트 클라우드에 대해 특징-잔차 변환을 수행하여 잔차 출력을 생성하는 단계; 및 프루닝된 포인트 클라우드를 잔차 출력에 추가하여 디코딩된 포인트 클라우드를 생성하는 단계를 더 포함할 수 있다.Some embodiments of the additional exemplary method may further include: performing a feature-residual transformation on the pruned point cloud to generate a residual output; and adding the pruned point cloud to the residual output to generate a decoded point cloud.

추가적인 예시적인 방법의 일부 실시예들에서, 특징-잔차 변환이 집계된 특징에 대해 수행될 수 있다.In some embodiments of the additional exemplary method, feature-residual transformation can be performed on aggregated features.

추가적인 예시적인 방법의 일부 실시예들은 집계된 특징에 대해 콘텍스트 인식 업샘플링 프로세스를 수행하여 디코딩된 포인트 클라우드를 생성하는 단계를 더 포함할 수 있다.Some embodiments of the additional exemplary method may further include a step of performing a context-aware upsampling process on the aggregated features to generate a decoded point cloud.

추가적인 예시적인 방법의 일부 실시예들에서, 초기 업샘플링은 최근접 이웃 업샘플링을 포함할 수 있다.In some embodiments of the additional exemplary method, the initial upsampling may include nearest neighbor upsampling.
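A minimal sketch of nearest neighbor upsampling on a voxel grid may help here. It assumes, as an illustration only, that each occupied voxel is split into its 2x2x2 children and that every child inherits a copy of the parent feature; the factor of two and the function name `nearest_neighbor_upsample` are assumptions, not details stated in this section.

```python
from itertools import product


def nearest_neighbor_upsample(coords, feats):
    """Split each occupied voxel into 8 children and copy the parent feature.

    coords: list of integer (x, y, z) tuples at the coarse resolution.
    feats:  list of feature vectors, one per coarse voxel.
    The 2x upsampling factor is an assumption made for this sketch.
    """
    up_coords, up_feats = [], []
    for (x, y, z), f in zip(coords, feats):
        for dx, dy, dz in product((0, 1), repeat=3):
            up_coords.append((2 * x + dx, 2 * y + dy, 2 * z + dz))
            up_feats.append(list(f))  # child inherits the parent feature
    return up_coords, up_feats


coords = [(0, 0, 0), (1, 0, 2)]
feats = [[0.5, 1.0], [2.0, -1.0]]
up_coords, up_feats = nearest_neighbor_upsample(coords, feats)
```

Every candidate child voxel is created at this stage, whether or not it is truly occupied; the later occupancy prediction and pruning steps are what remove the empty ones.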

In some embodiments of the additional exemplary method, associating the features may include concatenating the features of the second point cloud with the context information to obtain the third point cloud.
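As an illustration, concatenation can be read as appending a context vector to each voxel's feature vector along the channel dimension. The sketch below assumes one context vector per voxel; the function name `concat_context` and the example context values are hypothetical.

```python
def concat_context(feats, context):
    """Append a per-voxel context vector to each per-voxel feature vector."""
    if len(feats) != len(context):
        raise ValueError("need exactly one context vector per voxel")
    return [list(f) + list(c) for f, c in zip(feats, context)]


# Two voxels with 2-channel features and a hypothetical 3-channel context.
second_cloud_feats = [[1.0, 2.0], [3.0, 4.0]]
context_info = [[0, 1, 1], [1, 0, 0]]
third_cloud_feats = concat_context(second_cloud_feats, context_info)
```

The voxel coordinates are unchanged by this step; only the feature width grows, here from 2 to 5 channels.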

추가적인 예시적인 방법의 일부 실시예들에서, 점유 상태를 예측하는 단계는 적어도 하나의 복셀의 실측 점유 상태를 예측할 수 있다.In some embodiments of the additional exemplary method, the step of predicting an occupancy state may predict a ground truth occupancy state of at least one voxel.

추가적인 예시적인 방법의 일부 실시예들에서, 점유 상태를 예측하는 단계는 적어도 하나의 복셀이 점유될 가능성을 예측할 수 있다.In some embodiments of the additional exemplary method, the step of predicting occupancy may predict a likelihood that at least one voxel is occupied.

추가적인 예시적인 방법의 일부 실시예들에서, 제3 포인트 클라우드의 복셀들을 제거하는 단계는 복셀 프루닝 프로세스를 사용하여 복셀들을 제거할 수 있다.In some embodiments of the additional exemplary method, the step of removing voxels of the third point cloud may remove voxels using a voxel pruning process.

추가적인 예시적인 방법의 일부 실시예들에서, 콘텍스트 정보는 복셀 단위 콘텍스트 정보일 수 있다.In some embodiments of the additional exemplary method, the context information may be voxel-wise context information.

추가적인 예시적인 방법의 일부 실시예들에서, 적어도 하나의 복셀의 점유 상태를 예측하는 단계는: 집계된 특징을 다층 퍼셉트론(MLP) 계층들로 처리하여 MLP 계층 출력을 생성하는 단계; MLP 계층 출력에 대해 소프트맥스 프로세스를 수행하여 소프트맥스 출력 값들을 생성하는 단계; 및 소프트맥스 출력 값들의 임계값 처리를 수행하여 제3 포인트 클라우드의 적어도 하나의 복셀의 예측된 점유 상태를 생성하는 단계를 포함할 수 있다.In some embodiments of the additional exemplary method, the step of predicting the occupancy state of at least one voxel may include: processing the aggregated features with multilayer perceptron (MLP) layers to generate MLP layer outputs; performing a softmax process on the MLP layer outputs to generate softmax output values; and performing thresholding on the softmax output values to generate a predicted occupancy state of at least one voxel of the third point cloud.

추가적인 예시적인 방법의 일부 실시예들에서, 소프트맥스 출력 값들의 임계값 처리는 0.5 초과의 소프트맥스 출력 값들을 1의 출력 값으로 변환하고, 0.5 이하의 소프트맥스 출력 값들을 0의 출력 값으로 변환할 수 있다.In some embodiments of the additional exemplary method, thresholding of the softmax output values can convert softmax output values greater than 0.5 to an output value of 1, and convert softmax output values less than or equal to 0.5 to an output value of 0.
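The softmax, thresholding, and pruning steps can be sketched as follows. The sketch assumes the MLP layers emit two logits per voxel in the order `[empty, occupied]`; that ordering, the two-class layout, and the helper names are assumptions made for illustration. The 0.5 threshold follows the text: values greater than 0.5 map to 1, and values less than or equal to 0.5 map to 0.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one voxel's logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_occupancy(logits_per_voxel, threshold=0.5):
    """Return 1 where P(occupied) > threshold, else 0 (assumed [empty, occupied] order)."""
    return [1 if softmax(l)[1] > threshold else 0 for l in logits_per_voxel]


def prune(coords, feats, occupancy):
    """Keep only the voxels whose predicted occupancy state is 1."""
    kept = [(c, f) for c, f, o in zip(coords, feats, occupancy) if o == 1]
    return [c for c, _ in kept], [f for _, f in kept]


coords = [(0, 0, 0), (0, 0, 1), (0, 1, 0)]
feats = [[1.0], [2.0], [3.0]]
mlp_logits = [[0.0, 2.0], [1.0, 1.0], [3.0, -1.0]]  # stand-in for MLP layer outputs
occupancy = predict_occupancy(mlp_logits)
pruned_coords, pruned_feats = prune(coords, feats, occupancy)
```

The second voxel has a softmax value of exactly 0.5, so under the "less than or equal to 0.5" rule it is classified as empty and removed.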

추가적인 예시적인 방법의 일부 실시예들에서, 적어도 하나의 복셀의 점유 상태를 예측하는 단계는: 제3 포인트 클라우드의 적어도 하나의 특징을 집계하는 단계; 및 집계된 특징에 기초하여 제3 포인트 클라우드의 적어도 하나의 복셀의 예측된 점유 상태를 생성하는 단계를 포함할 수 있다.In some embodiments of the additional exemplary method, the step of predicting the occupancy state of at least one voxel may include: aggregating at least one feature of the third point cloud; and generating a predicted occupancy state of the at least one voxel of the third point cloud based on the aggregated feature.

추가적인 예시적인 방법의 일부 실시예들에서, 제3 포인트 클라우드의 적어도 하나의 특징을 집계하는 단계는: 캐스케이딩 프로세스를 한 번 이상 반복하는 단계를 포함할 수 있으며, 캐스케이딩 프로세스는: 입력 포인트 클라우드의 희소 3D 콘볼루션을 수행하여 콘볼루션 출력 포인트 클라우드를 생성하는 단계; 콘볼루션 출력 포인트 클라우드에 대해 비선형 활성화 프로세스를 수행하여 비선형 출력 포인트 클라우드를 생성하는 단계; 및 캐스케이딩 프로세스의 다음 사이클이 있을 경우 비선형 출력 포인트 클라우드를 입력 포인트 클라우드로 준비하는 단계를 포함할 수 있으며, 제3 포인트 클라우드는 캐스케이딩 프로세스의 첫 번째 사이클에 대한 입력 포인트 클라우드일 수 있고, 캐스케이딩 프로세스의 마지막 사이클은 집계된 특징을 생성할 수 있다.In some embodiments of the additional exemplary method, the step of aggregating at least one feature of the third point cloud may comprise: repeating a cascading process one or more times, wherein the cascading process may comprise: performing a sparse 3D convolution of an input point cloud to generate a convolution output point cloud; performing a nonlinear activation process on the convolution output point cloud to generate a nonlinear output point cloud; and preparing the nonlinear output point cloud as an input point cloud for a next cycle of the cascading process, wherein the third point cloud may be an input point cloud for a first cycle of the cascading process, and wherein a last cycle of the cascading process may generate the aggregated features.

Some embodiments of the additional exemplary method may further include adding the third point cloud to the ReLU output point cloud of the last cycle of the cascading process.
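The cascade of sparse 3D convolution and ReLU, followed by the residual addition of the third point cloud, might be sketched as below. This is a simplified pure-Python stand-in, not the disclosed network: it assumes outputs are computed only at already-occupied coordinates, that empty neighbors contribute nothing, and that each layer preserves the channel count so the residual addition is well defined. All names are hypothetical.

```python
def sparse_conv3d(voxels, weights, bias):
    """Sparse 3D convolution evaluated only at occupied coordinates.

    voxels:  {(x, y, z): [c_in floats]}
    weights: {(dx, dy, dz): C_out x C_in matrix as nested lists}
    bias:    [c_out floats]
    """
    out = {}
    for (x, y, z) in voxels:
        acc = list(bias)
        for (dx, dy, dz), w in weights.items():
            nb = voxels.get((x + dx, y + dy, z + dz))
            if nb is None:
                continue  # empty neighbors are skipped
            for o, row in enumerate(w):
                acc[o] += sum(wi * v for wi, v in zip(row, nb))
        out[(x, y, z)] = acc
    return out


def relu(voxels):
    return {c: [max(0.0, v) for v in f] for c, f in voxels.items()}


def aggregate(voxels, layers):
    """Run the conv + ReLU cascade, then add the input back as a residual."""
    x = voxels
    for weights, bias in layers:
        x = relu(sparse_conv3d(x, weights, bias))  # output feeds the next cycle
    return {c: [a + b for a, b in zip(x[c], voxels[c])] for c in voxels}


third_cloud = {(0, 0, 0): [1.0], (1, 0, 0): [2.0]}
layer = ({(0, 0, 0): [[1.0]], (1, 0, 0): [[0.5]]}, [0.0])
aggregated = aggregate(third_cloud, [layer, layer])  # two cycles of the cascade
```

With two cycles, the voxel at (0, 0, 0) picks up information from its neighbor at (1, 0, 0) in each cycle, which is the sense in which the cascade aggregates features over a growing neighborhood.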

추가적인 예시적인 방법의 일부 실시예들에서, 적어도 하나의 특징을 집계하는 단계는: 입력 포인트 클라우드의 희소 3D 콘볼루션을 수행하여 콘볼루션 출력 포인트 클라우드를 생성하는 단계; 및 콘볼루션 출력 포인트 클라우드에 대해 비선형 활성화 프로세스를 수행하여 집계된 특징을 생성하는 단계를 포함할 수 있다.In some embodiments of the additional exemplary method, the step of aggregating at least one feature may include: performing a sparse 3D convolution of an input point cloud to generate a convolution output point cloud; and performing a nonlinear activation process on the convolution output point cloud to generate aggregated features.

In some embodiments of the additional exemplary method, the nonlinear activation process may include a rectified linear unit (ReLU) activation process, and the nonlinear output point cloud includes a ReLU output point cloud.

In some embodiments of the additional exemplary method, aggregating at least one feature of the third point cloud may include: repeating a first cascading process one or more times, wherein the first cascading process may include: performing a first sparse 3D convolution of a first input point cloud to generate a first convolution output point cloud; performing a first nonlinear activation process on the first convolution output point cloud to generate a first nonlinear output point cloud; and preparing the first nonlinear output point cloud as the first input point cloud if there is a next cycle of the first cascading process, wherein the third point cloud may be the first input point cloud for a first cycle of the first cascading process, and wherein a last cycle of the first cascading process may generate a first cascading process output; repeating a second cascading process one or more times, wherein the second cascading process may include: performing a second sparse 3D convolution of a second input point cloud to generate a second convolution output point cloud; performing a second nonlinear activation process on the second convolution output point cloud to generate a second nonlinear output point cloud; and preparing the second nonlinear output point cloud as the second input point cloud if there is a next cycle of the second cascading process, wherein the third point cloud may be the second input point cloud for a first cycle of the second cascading process, and wherein a last cycle of the second cascading process may generate a second cascading process output; concatenating the first cascading process output and the second cascading process output to generate a concatenated output; and adding the third point cloud to the concatenated output to generate the aggregated features.
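The two-branch structure can be sketched with the branches passed in as callables. The sketch makes one constraint explicit that the text leaves implicit: for the final addition to be well defined, the concatenated branch outputs must have the same channel width as the third point cloud. The per-voxel `relu_scale` branches are toy stand-ins for the sparse convolution cascades, and all names are hypothetical.

```python
def two_branch_aggregate(feats, branch_a, branch_b):
    """Concatenate two branch outputs per voxel, then add the input as a residual."""
    out_a = branch_a(feats)
    out_b = branch_b(feats)
    aggregated = []
    for x, a, b in zip(feats, out_a, out_b):
        cat = list(a) + list(b)  # channel-wise concatenation
        if len(cat) != len(x):
            raise ValueError("concatenated width must match the input width")
        aggregated.append([c + v for c, v in zip(cat, x)])
    return aggregated


def relu_scale(scale, width):
    """Toy branch: scale the first `width` channels, then apply ReLU."""
    def branch(feats):
        return [[max(0.0, scale * v) for v in f[:width]] for f in feats]
    return branch


feats = [[1.0, -2.0, 3.0, 4.0]]
result = two_branch_aggregate(feats, relu_scale(2.0, 2), relu_scale(-1.0, 2))
```

In this toy setup each branch emits half of the input width, which is one simple way to satisfy the width constraint; other splits would work equally well.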

In some embodiments of the additional exemplary method, aggregating at least one feature may include: repeating a first cascading process one or more times, wherein the first cascading process may include: performing a first sparse 3D convolution of a first input point cloud to generate a first convolution output point cloud; performing a first rectified linear unit (ReLU) activation process on the first convolution output point cloud to generate a first ReLU output point cloud; and preparing the first ReLU output point cloud as the first input point cloud if there is a next cycle of the first cascading process, wherein the third point cloud may be the first input point cloud for a first cycle of the first cascading process, and wherein a last cycle of the first cascading process may generate a first cascading process output; repeating a second cascading process one or more times, wherein the second cascading process may include: performing a second sparse 3D convolution of a second input point cloud to generate a second convolution output point cloud; performing a second ReLU activation process on the second convolution output point cloud to generate a second ReLU output point cloud; and preparing the second ReLU output point cloud as the second input point cloud if there is a next cycle of the second cascading process, wherein the third point cloud may be the second input point cloud for a first cycle of the second cascading process, and wherein a last cycle of the second cascading process may generate a second cascading process output; concatenating the first cascading process output and the second cascading process output to generate a concatenated output; and adding the third point cloud to the concatenated output to generate the aggregated features.

추가적인 예시적인 방법의 일부 실시예들에서, 적어도 하나의 특징을 집계하는 단계는: 제3 포인트 클라우드에 대해 셀프 어텐션 프로세스를 수행하는 단계; 제3 포인트 클라우드를 셀프 어텐션 프로세스 출력에 추가하여 MLP 프로세스 입력을 생성하는 단계; MLP 프로세스 입력에 대해 MLP 프로세스를 수행하는 단계; 및 MLP 프로세스 입력을 MLP 프로세스 출력에 추가하여 집계된 특징을 생성하는 단계를 포함할 수 있다.In some embodiments of the additional exemplary method, the step of aggregating at least one feature may include: performing a self-attention process on the third point cloud; adding the third point cloud to the self-attention process output to generate an MLP process input; performing an MLP process on the MLP process input; and adding the MLP process input to the MLP process output to generate an aggregated feature.

추가적인 예시적인 방법의 일부 실시예들에서, 셀프 어텐션 프로세스는 제3 포인트 클라우드의 복셀의 k개의 최근접 이웃에 기초하여 출력 특징을 생성할 수 있다.In some embodiments of the additional exemplary method, the self-attention process can generate output features based on the k nearest neighbors of a voxel of the third point cloud.
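A minimal sketch of the self-attention block with its two residual additions is given below. It assumes scaled dot-product attention restricted to each voxel's k nearest neighbors (the voxel itself included), with identity query, key, and value projections; a trained model would normally use learned projections, so this is an illustrative simplification, and every name here is hypothetical.

```python
import math


def knn(coords, i, k):
    """Indices of the k voxels closest to voxel i (voxel i itself comes first)."""
    order = sorted(range(len(coords)),
                   key=lambda j: (math.dist(coords[i], coords[j]), j))
    return order[:k]


def self_attention(coords, feats, k):
    """Scaled dot-product attention over each voxel's k nearest neighbors."""
    scale = math.sqrt(len(feats[0]))
    out = []
    for i, q in enumerate(feats):
        nbrs = knn(coords, i, k)
        scores = [sum(a * b for a, b in zip(q, feats[j])) / scale for j in nbrs]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        total = sum(w)
        out.append([sum(w[n] * feats[j][c] for n, j in enumerate(nbrs)) / total
                    for c in range(len(q))])
    return out


def attention_block(coords, feats, k, mlp):
    """Attention + residual, then per-voxel MLP + residual."""
    attn = self_attention(coords, feats, k)
    mlp_in = [[a + x for a, x in zip(ar, xr)] for ar, xr in zip(attn, feats)]
    mlp_out = [mlp(row) for row in mlp_in]
    return [[m + x for m, x in zip(mr, xr)] for mr, xr in zip(mlp_out, mlp_in)]


coords = [(0, 0, 0), (5, 5, 5)]
feats = [[1.0, 2.0], [3.0, 4.0]]
zero_mlp = lambda row: [0.0] * len(row)  # placeholder for a trained MLP
out = attention_block(coords, feats, k=1, mlp=zero_mlp)
```

With k = 1 each voxel attends only to itself, so the attention output equals the input and the first residual simply doubles it; larger k values mix in features from nearby voxels.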

추가적인 예시적인 방법의 일부 실시예들에서, 제3 포인트 클라우드의 적어도 하나의 특징을 집계하는 단계는 특징 집계 프로세스를 두 번 이상 수행하는 단계를 포함할 수 있다.In some embodiments of the additional exemplary method, the step of aggregating at least one feature of the third point cloud may include performing the feature aggregation process more than once.

추가적인 예시적인 방법의 일부 실시예들에서, 제1 신경 네트워크 파라미터 세트와 제2 신경 네트워크 파라미터 세트는 동일한 신경 네트워크 파라미터 세트일 수 있으며, 동일한 신경 네트워크 파라미터 세트가 적어도 제1 신경 네트워크와 제2 신경 네트워크에 의해 사용된다.In some embodiments of the additional exemplary method, the first neural network parameter set and the second neural network parameter set can be the same neural network parameter set, and the same neural network parameter set is used by at least the first neural network and the second neural network.

추가적인 예시적인 방법의 일부 실시예들에서, 제1 신경 네트워크 파라미터 세트와 제2 신경 네트워크 파라미터 세트는 구별되지만 동일한 신경 네트워크 파라미터 세트일 수 있다.In some embodiments of the additional exemplary method, the first neural network parameter set and the second neural network parameter set may be distinct but identical neural network parameter sets.
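The two variants of parameter sharing described above can be contrasted in a short sketch: in one, both networks reference a single parameter set; in the other, each network holds its own copy with equal values. The `Aggregator` class and its single weight and bias are toy placeholders, not the disclosed networks.

```python
import copy


class Aggregator:
    """Toy per-channel layer: ReLU(w * v + b), parameterized by a dict."""

    def __init__(self, params):
        self.params = params

    def __call__(self, feat):
        w, b = self.params["w"], self.params["b"]
        return [max(0.0, w * v + b) for v in feat]


shared = {"w": 2.0, "b": -1.0}

# Variant 1: the same parameter set is used by both networks.
first_net = Aggregator(shared)
second_net = Aggregator(shared)

# Variant 2: a distinct parameter set holding identical values.
third_net = Aggregator(copy.deepcopy(shared))

same_object = first_net.params is second_net.params
distinct_but_equal = (third_net.params is not first_net.params
                      and third_net.params == first_net.params)
outputs_match = first_net([1.0, 0.0]) == third_net([1.0, 0.0])
```

Both variants produce the same outputs; the practical difference is storage and update behavior, since only the first variant guarantees that a change to one network's parameters is seen by the other.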

An additional exemplary apparatus according to some embodiments may include: a processor; and a non-transitory computer-readable medium storing instructions that, when executed by the processor, cause the apparatus to perform any one of the methods listed above.

In some embodiments of the first exemplary method, associating the features includes concatenating the features of the second point cloud with the context information to obtain the third point cloud.

제1 예시적인 방법의 일부 실시예들은 입력 포인트 클라우드 및 제1 비트스트림에 대해 특징 디코드를 수행하여 제1 포인트 클라우드를 생성하는 단계를 더 포함할 수 있다.Some embodiments of the first exemplary method may further include a step of performing feature decoding on the input point cloud and the first bitstream to generate a first point cloud.

제1 예시적인 방법의 일부 실시예들은: 프루닝된 포인트 클라우드에 대해 특징-잔차 변환을 수행하여 잔차 출력을 생성하는 단계; 및 프루닝된 포인트 클라우드를 잔차 출력에 추가하여 디코딩된 포인트 클라우드를 생성하는 단계를 더 포함할 수 있다.Some embodiments of the first exemplary method may further include: performing a feature-residual transformation on the pruned point cloud to generate a residual output; and adding the pruned point cloud to the residual output to generate a decoded point cloud.

제1 예시적인 방법의 일부 실시예들에서, 점유 상태를 예측하는 단계는 적어도 하나의 복셀의 실측 점유 상태를 예측한다.In some embodiments of the first exemplary method, the step of predicting an occupancy state predicts a ground truth occupancy state of at least one voxel.

제1 예시적인 방법의 일부 실시예들에서, 적어도 하나의 복셀의 점유 상태를 예측하는 단계는: 제3 포인트 클라우드의 적어도 하나의 특징을 집계하는 단계; 집계된 특징을 다층 퍼셉트론(MLP) 계층들로 처리하여 MLP 계층 출력을 생성하는 단계; MLP 계층 출력에 대해 소프트맥스 프로세스를 수행하여 소프트맥스 출력 값들을 생성하는 단계; 및 소프트맥스 출력 값들의 임계값 처리를 수행하여 제3 포인트 클라우드의 적어도 하나의 복셀의 예측된 점유 상태를 생성하는 단계를 포함한다.In some embodiments of the first exemplary method, the step of predicting the occupancy state of at least one voxel comprises: aggregating at least one feature of the third point cloud; processing the aggregated feature with multilayer perceptron (MLP) layers to generate an MLP layer output; performing a softmax process on the MLP layer output to generate softmax output values; and performing thresholding on the softmax output values to generate a predicted occupancy state of the at least one voxel of the third point cloud.

제1 예시적인 방법의 일부 실시예들에서, 적어도 하나의 복셀의 점유 상태를 예측하는 단계는: 제3 포인트 클라우드의 적어도 하나의 특징을 집계하는 단계; 및 집계된 특징에 기초하여 제3 포인트 클라우드의 적어도 하나의 복셀의 예측된 점유 상태를 생성하는 단계를 포함한다.In some embodiments of the first exemplary method, the step of predicting an occupancy state of at least one voxel comprises: aggregating at least one feature of a third point cloud; and generating a predicted occupancy state of at least one voxel of the third point cloud based on the aggregated feature.

In some embodiments of the first exemplary method, aggregating at least one feature includes: repeating a first cascading process one or more times, wherein the first cascading process includes: performing a first sparse 3D convolution of a first input point cloud to generate a first convolution output point cloud; performing a first nonlinear activation process on the first convolution output point cloud to generate a first nonlinear output point cloud; and preparing the first nonlinear output point cloud as the first input point cloud if there is a next cycle of the first cascading process, wherein the third point cloud is the first input point cloud for a first cycle of the first cascading process, and wherein a last cycle of the first cascading process generates a first cascading process output; repeating a second cascading process one or more times, wherein the second cascading process includes: performing a second sparse 3D convolution of a second input point cloud to generate a second convolution output point cloud; performing a second nonlinear activation process on the second convolution output point cloud to generate a second nonlinear output point cloud; and preparing the second nonlinear output point cloud as the second input point cloud if there is a next cycle of the second cascading process, wherein the third point cloud is the second input point cloud for a first cycle of the second cascading process, and wherein a last cycle of the second cascading process generates a second cascading process output; concatenating the first cascading process output and the second cascading process output to generate a concatenated output; and adding the third point cloud to the concatenated output to generate the aggregated features.

제1 예시적인 방법의 일부 실시예들에서, 적어도 하나의 특징을 집계하는 단계는: 제3 포인트 클라우드에 대해 셀프 어텐션 프로세스를 수행하는 단계; 제3 포인트 클라우드를 셀프 어텐션 프로세스 출력에 추가하여 MLP 프로세스 입력을 생성하는 단계; MLP 프로세스 입력에 대해 MLP 프로세스를 수행하는 단계; 및 MLP 프로세스 입력을 MLP 프로세스 출력에 추가하여 집계된 특징을 생성하는 단계를 포함한다.In some embodiments of the first exemplary method, the step of aggregating at least one feature comprises: performing a self-attention process on the third point cloud; adding the third point cloud to the self-attention process output to generate an MLP process input; performing an MLP process on the MLP process input; and adding the MLP process input to the MLP process output to generate an aggregated feature.

제3 예시적인 방법의 일부 실시예들에서, 적어도 하나의 복셀의 점유 상태를 예측하는 단계는: 제3 포인트 클라우드의 적어도 하나의 특징을 집계하는 단계; 및 집계된 특징에 기초하여 제3 포인트 클라우드의 적어도 하나의 복셀의 예측된 점유 상태를 생성하는 단계를 포함한다.In some embodiments of the third exemplary method, the step of predicting the occupancy state of at least one voxel comprises: aggregating at least one feature of the third point cloud; and generating a predicted occupancy state of at least one voxel of the third point cloud based on the aggregated feature.

본 개시는 툴, 특징, 실시예, 모델, 접근 방식 등을 포함한 다양한 양태들을 설명한다. 이러한 양태들의 대부분은 구체적으로 설명되고, 적어도 개별 특성들을 보여주기 위해, 종종 제한적인 것처럼 들릴 수 있는 방식으로 설명된다. 그렇지만, 이는 설명의 명확성을 위한 것이며, 해당 양태들의 개시 또는 범위를 제한하지 않는다. 실제로, 상이한 양태들 모두는 추가의 양태들을 제공하기 위해 결합되고 상호 교환될 수 있다. 더욱이, 이 양태들이 또한 이전 출원들에도 설명된 양태들과 결합되고 상호 교환될 수 있다.The present disclosure describes various aspects, including tools, features, embodiments, models, approaches, and the like. Many of these aspects are described in detail, and sometimes in a way that may sound limiting, at least to show individual characteristics. However, this is for clarity of description and does not limit the disclosure or scope of the aspects. In fact, all of the different aspects can be combined and interchanged to provide additional aspects. Moreover, these aspects can also be combined and interchanged with aspects described in the previous applications.

본 개시에서 설명되고 고려되는 양태들은 많은 상이한 형태들로 구현될 수 있다. 일부 실시예들이 구체적으로 예시되어 있지만, 다른 실시예들도 생각되며, 특정 실시예들에 대한 논의가 구현들의 범위를 제한하지 않다. 양태들 중 적어도 하나는 일반적으로 비디오 인코딩 및 디코딩에 관한 것이고, 적어도 하나의 다른 양태는 일반적으로 생성되거나 인코딩되는 비트스트림을 송신하는 것에 관한 것이다. 이들 및 다른 양태들은 방법, 장치, 설명된 방법들 중 임의의 방법에 따라 비디오 데이터를 인코딩 또는 디코딩하기 위한 명령어들을 저장한 컴퓨터 판독 가능 저장 매체, 및/또는 설명된 방법들 중 임의의 방법에 따라 생성되는 비트스트림을 저장한 컴퓨터 판독 가능 저장 매체로서 구현될 수 있다.The aspects described and contemplated in this disclosure may be implemented in many different forms. While certain embodiments are specifically illustrated, other embodiments are contemplated, and the discussion of particular embodiments does not limit the scope of implementations. At least one of the aspects relates generally to video encoding and decoding, and at least one other aspect relates generally to transmitting a bitstream generated or encoded. These and other aspects may be implemented as a method, an apparatus, a computer-readable storage medium storing instructions for encoding or decoding video data according to any of the described methods, and/or a computer-readable storage medium storing a bitstream generated according to any of the described methods.

본 개시에서, "재구성된" 및 "디코딩된"이라는 용어들은 상호 교환적으로 사용될 수 있고, "픽셀" 및 "샘플"이라는 용어들은 상호 교환적으로 사용될 수 있으며, "이미지", "픽처" 및 "프레임"이라는 용어들은 상호 교환적으로 사용될 수 있다. 일반적으로, 그러나 반드시 그런 것은 아니지만, "재구성된"이라는 용어는 인코더 측에서 사용되는 반면 "디코딩된"이라는 용어는 디코더 측에서 사용된다.In this disclosure, the terms "reconstructed" and "decoded" may be used interchangeably, the terms "pixel" and "sample" may be used interchangeably, and the terms "image", "picture" and "frame" may be used interchangeably. Typically, but not necessarily, the term "reconstructed" is used on the encoder side, while the term "decoded" is used on the decoder side.

HDR(high dynamic range) 및 SDR(standard dynamic range)이라는 용어들은 종종 본 기술 분야의 통상의 기술자에게 특정 다이내믹 레인지 값을 전달한다. 그러나, HDR에 대한 언급이 "더 높은 다이내믹 레인지"를 의미하는 것으로 이해되고 SDR에 대한 언급이 "더 낮은 다이내믹 레인지"를 의미하는 것으로 이해되는 추가적인 실시예들도 의도된다. 이러한 추가적인 실시예들은 종종 "높은 다이내믹 레인지" 및 "표준 다이내믹 레인지"라는 용어들과 연관될 수 있는 임의의 특정 다이내믹 레인지 값에 의해 제약되지 않는다.The terms HDR (high dynamic range) and SDR (standard dynamic range) often convey particular dynamic range values to one of ordinary skill in the art. However, additional embodiments are also intended in which reference to HDR is understood to mean "higher dynamic range" and reference to SDR is understood to mean "lower dynamic range." These additional embodiments are not limited by any particular dynamic range values that may often be associated with the terms "high dynamic range" and "standard dynamic range."

다양한 방법들이 본 명세서에서 설명되고, 방법들 각각은 설명된 방법을 달성하기 위한 하나 이상의 단계 또는 행동을 포함한다. 방법의 적절한 작동을 위해 특정 순서의 단계들 또는 행동들이 필요하지 않은 한, 특정 단계들 및/또는 행동들의 순서 및/또는 사용이 수정되거나 결합될 수 있다. 추가적으로, "제1", "제2" 등과 같은 용어들은 다양한 실시예들에서, 예를 들어, "제1 디코딩" 및 "제2 디코딩"과 같은, 요소, 컴포넌트, 단계, 동작 등을 수식하기 위해 사용될 수 있다. 그러한 용어들의 사용은, 특별히 요구되지 않는 한, 수식된 동작들에 대한 순서를 의미하지 않는다. 따라서, 이 예에서, 제1 디코딩은 제2 디코딩 이전에 수행될 필요가 없고, 예를 들어, 제2 디코딩 이전에, 제2 디코딩 동안에, 또는 제2 디코딩과 중첩하는 시간 기간에 발생할 수 있다.Various methods are described herein, each of which comprises one or more steps or acts for achieving the described method. The order and/or use of particular steps and/or acts may be modified or combined, unless a particular order of steps or acts is required for the proper operation of the method. Additionally, terms such as “first,” “second,” etc. may be used in various embodiments to describe elements, components, steps, operations, etc., such as, for example, “first decoding” and “second decoding.” The use of such terms does not imply an order for the described operations, unless specifically required. Thus, in this example, the first decoding need not be performed before the second decoding, and may occur, for example, before the second decoding, during the second decoding, or in a time period overlapping with the second decoding.

Various numeric values may be used in the present disclosure, for example. The specific values are for illustrative purposes, and the aspects described are not limited to these specific values.

The embodiments described herein may be carried out by computer software implemented by a processor or other hardware, or by a combination of hardware and software. As a non-limiting example, the embodiments may be implemented by one or more integrated circuits. The processor may be of any type appropriate to the technical environment and may encompass, as non-limiting examples, one or more of microprocessors, general purpose computers, special purpose computers, and processors based on a multi-core architecture.

Various implementations involve decoding. "Decoding," as used in this disclosure, may encompass all or part of the processes performed, for example, on a received encoded sequence in order to produce a final output suitable for display. In various embodiments, such processes include one or more of the processes typically performed by a decoder, for example, entropy decoding, inverse quantization, inverse transformation, and differential decoding. In various embodiments, such processes also, or alternatively, include processes performed by a decoder of the various implementations described in this disclosure, for example, extracting a picture from a tiled (packed) picture, determining an upsampling filter to use and then upsampling the picture, and flipping the picture back to its intended orientation.

As further examples, in one embodiment "decoding" refers only to entropy decoding, in another embodiment "decoding" refers only to differential decoding, and in another embodiment "decoding" refers to a combination of entropy decoding and differential decoding. Whether the phrase "decoding process" is intended to refer specifically to a subset of operations or generally to the broader decoding process will be clear from the context of the specific description.

Various implementations involve encoding. In a manner analogous to the above discussion of "decoding," "encoding" as used in this disclosure may encompass all or part of the processes performed, for example, on an input video sequence in order to produce an encoded bitstream. In various embodiments, such processes include one or more of the processes typically performed by an encoder, for example, partitioning, differential encoding, transformation, quantization, and entropy encoding. In various embodiments, such processes also, or alternatively, include processes performed by an encoder of the various implementations described in this disclosure.

As further examples, in one embodiment "encoding" refers only to entropy encoding, in another embodiment "encoding" refers only to differential encoding, and in another embodiment "encoding" refers to a combination of differential encoding and entropy encoding. Whether the phrase "encoding process" is intended to refer specifically to a subset of operations or generally to the broader encoding process will be clear from the context of the specific description.

When a figure is presented as a flow diagram, it should be understood that it also provides a block diagram of a corresponding apparatus. Similarly, when a figure is presented as a block diagram, it should be understood that it also provides a flow diagram of a corresponding method/process.

The implementations and aspects described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of the features discussed may also be implemented in other forms (for example, an apparatus or a program). An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants ("PDAs"), and other devices that facilitate communication of information between end users.

Reference to "one embodiment" or "an embodiment" or "one implementation" or "an implementation," as well as other variations thereof, means that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment. Thus, the appearances of the phrase "in one embodiment" or "in an embodiment" or "in one implementation" or "in an implementation," as well as any other variations, appearing in various places throughout this disclosure are not necessarily all referring to the same embodiment.

Additionally, this disclosure may refer to "determining" various pieces of information. Determining the information may include one or more of, for example, estimating the information, calculating the information, predicting the information, or retrieving the information from memory.

Further, this disclosure may refer to "accessing" various pieces of information. Accessing the information may include one or more of, for example, receiving the information, retrieving the information (for example, from memory), storing the information, moving the information, copying the information, calculating the information, determining the information, predicting the information, or estimating the information.

Additionally, this disclosure may refer to "receiving" various pieces of information. Receiving is, as with "accessing," intended to be a broad term. Receiving the information may include one or more of, for example, accessing the information or retrieving the information (for example, from memory). Further, "receiving" is typically involved, in one way or another, during operations such as, for example, storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.

It is to be appreciated that the use of any of the following "/", "and/or", and "at least one of", for example, in the cases of "A/B", "A and/or B", and "at least one of A and B", is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of both options (A and B). As a further example, in the cases of "A, B, and/or C" and "at least one of A, B, and C", such phrasing is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C). This may be extended for as many items as are listed.

Also, as used herein, the word "signal" refers to, among other things, indicating something to a corresponding decoder. For example, in certain embodiments the encoder signals a particular one of a plurality of parameters for region-based filter parameter selection for de-artifact filtering. In this way, in an embodiment the same parameter is used at both the encoder side and the decoder side. Thus, for example, an encoder may transmit (explicit signaling) a particular parameter to the decoder so that the decoder may use the same particular parameter. Conversely, if the decoder already has the particular parameter as well as others, then signaling may be used without transmitting (implicit signaling) to simply allow the decoder to know and select the particular parameter. By avoiding transmission of any actual functions, a bit savings is realized in various embodiments. It is to be appreciated that signaling may be accomplished in a variety of ways. For example, one or more syntax elements, flags, and so forth are used to signal information to a corresponding decoder in various embodiments. While the preceding relates to the verb form of the word "signal," the word "signal" may also be used herein as a noun.

Implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry the bitstream of a described embodiment. Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of the spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries may be, for example, analog or digital information. The signal may be transmitted over a variety of different wired or wireless links, as is known. The signal may be stored on a processor-readable medium.

Note that various hardware elements of one or more of the described embodiments are referred to as "modules" that carry out (i.e., perform, execute, and the like) various functions that are described herein in connection with the respective modules. As used herein, a module includes hardware (e.g., one or more processors, one or more microprocessors, one or more microcontrollers, one or more microchips, one or more application-specific integrated circuits (ASICs), one or more field programmable gate arrays (FPGAs), one or more memory devices) deemed suitable by those of skill in the relevant art for a given implementation. Each described module may also include instructions executable for carrying out the one or more functions described as being carried out by the respective module, and it is noted that those instructions may take the form of or include hardware (i.e., hardwired) instructions, firmware instructions, software instructions, and/or the like, and may be stored in any suitable non-transitory computer-readable medium or media, such as those commonly referred to as RAM, ROM, etc.

Although features and elements are described above in particular combinations, one of ordinary skill in the art will appreciate that each feature or element can be used alone or in any combination with the other features and elements. In addition, the methods described herein may be implemented in a computer program, software, or firmware incorporated in a computer-readable medium for execution by a computer or processor. Examples of computer-readable storage media include, but are not limited to, a read only memory (ROM), a random access memory (RAM), a register, cache memory, semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks and digital versatile disks (DVDs). A processor in association with software may be used to implement a radio frequency transceiver for use in a WTRU, UE, terminal, base station, RNC, or any host computer.

Claims

A method comprising:
upsampling a first point cloud using initial upsampling to obtain a second point cloud;
associating features of the second point cloud with context information to obtain a third point cloud;
predicting an occupancy state of at least one voxel of the third point cloud; and
removing, according to the predicted occupancy state, voxels of the third point cloud classified as empty to generate a pruned point cloud.
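
As an illustration only, the four claimed steps can be sketched in Python with NumPy. Several things here are assumptions that do not come from the claims: the upsampling factor of 2 per axis, the use of the child-voxel offset as the context information, and the function names. `toy_occupancy_predictor` is a hypothetical stand-in for a learned predictor and has no meaning beyond making the sketch runnable.

```python
import numpy as np

# The 8 child offsets of a parent voxel when resolution doubles on each axis (assumed factor).
CHILD_OFFSETS = np.array(
    [[dx, dy, dz] for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)]
)

def nearest_neighbor_upsample(coords, feats):
    """Split each occupied parent voxel into 8 children that copy the parent feature."""
    child_coords = (2 * coords[:, None, :] + CHILD_OFFSETS[None, :, :]).reshape(-1, 3)
    child_feats = np.repeat(feats, 8, axis=0)  # parent-major order, matches the reshape
    return child_coords, child_feats

def add_context(child_coords, child_feats):
    """Concatenate per-voxel context (here: child offset inside its parent) to the features."""
    context = (child_coords % 2).astype(child_feats.dtype)
    return np.concatenate([child_feats, context], axis=1)

def toy_occupancy_predictor(feats):
    """Hypothetical placeholder: marks a child as occupied when its x-offset is 0."""
    return feats[:, -3] == 0

def prune(coords, feats, occupied):
    """Drop voxels classified as empty."""
    return coords[occupied], feats[occupied]

def upsample_and_prune(first_coords, first_feats):
    second_coords, second_feats = nearest_neighbor_upsample(first_coords, first_feats)
    third_feats = add_context(second_coords, second_feats)
    occupied = toy_occupancy_predictor(third_feats)
    return prune(second_coords, third_feats, occupied)
```

In this sketch a point cloud is a pair of arrays, integer voxel coordinates of shape (N, 3) and features of shape (N, C), so "associating" features with context reduces to a channel-wise concatenation.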

A method according to claim 1, wherein the initial upsampling includes nearest neighbor upsampling.

A method according to claim 1 or 2, wherein associating features comprises obtaining the third point cloud by concatenating features of the second point cloud with the context information.

A method according to any one of claims 1 to 3, wherein the context information is voxel-unit context information.

A method according to any one of claims 1 to 4, wherein the context information includes a context point cloud.

A method according to any one of claims 1 to 5, wherein the context information includes information about the second point cloud.

A method according to any one of claims 1 to 6, wherein the context information includes information on voxel occupancy status of the second point cloud.

A method according to any one of claims 1 to 7, wherein the context information includes information about the location of a child voxel with respect to the location of a parent voxel of the first point cloud.

A method according to any one of claims 1 to 8, wherein the context information includes coordinate information regarding the location of an occupied voxel of at least one of the first and second point clouds.

A method according to any one of claims 1 to 9, wherein the context information includes coordinate information, and the coordinate information is in the form of one of Euclidean coordinates, spherical coordinates, and cylindrical coordinates.
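
For reference, the three coordinate forms named in this claim are related by standard conversions. The Python sketch below shows one common convention (polar angle measured from the +z axis, azimuth measured in the x-y plane). The choice of convention and the function names are assumptions; the claims do not specify them.

```python
import math

def to_spherical(x, y, z):
    """Return (r, theta, phi): radius, polar angle from +z, azimuth in the x-y plane."""
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(z / r) if r > 0 else 0.0  # the angle is undefined at the origin; use 0
    phi = math.atan2(y, x)
    return r, theta, phi

def to_cylindrical(x, y, z):
    """Return (rho, phi, z): distance from the z-axis, azimuth, height."""
    rho = math.hypot(x, y)
    phi = math.atan2(y, x)
    return rho, phi, z
```

Spherical coordinates may be a natural fit for point clouds captured by a rotating sensor, since range and angles then align with the acquisition geometry, but that motivation is offered here only as a plausible reading.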

A method according to any one of claims 1 to 10, wherein the context information provides known information about the first point cloud in addition to information available for the initial upsampling of the first point cloud.

A method according to any one of claims 1 to 11, wherein the context information includes a bit depth of the second point cloud.

A method according to any one of claims 1 to 21, further comprising the step of generating the first point cloud by performing feature decoding on an input point cloud and a first bitstream.

A method according to claim 13, further comprising:
performing feature aggregation on the pruned point cloud to generate aggregated features; and
performing a context-aware upsampling process on the aggregated features to generate a decoded point cloud.

A method according to claim 13, further comprising:
performing a feature-to-residual conversion on the pruned point cloud to generate a residual output; and
adding the pruned point cloud to the residual output to generate a decoded point cloud.

A method according to claim 15, further comprising performing feature aggregation on the pruned point cloud to generate aggregated features, wherein the feature-to-residual conversion is performed on the aggregated features.

A method according to any one of claims 1 to 16, wherein the step of predicting the occupancy state is performed using a first neural network.

A method according to any one of claims 1 to 17, wherein the step of predicting the occupancy status predicts the ground-truth occupancy status of at least one voxel.

A method according to any one of claims 1 to 18, wherein the step of predicting the occupancy state predicts a likelihood that the at least one voxel is occupied.

A method according to any one of claims 1 to 19, wherein removing voxels of the third point cloud removes voxels using a voxel pruning process.

A method according to any one of claims 1 to 20, further comprising a step of aggregating at least one feature of the second point cloud.

A method according to any one of claims 1 to 21, wherein predicting the occupancy state of at least one voxel comprises:
aggregating at least one feature of the third point cloud;
processing the aggregated features with multilayer perceptron (MLP) layers to generate an MLP layer output;
performing a softmax process on the MLP layer output to generate softmax output values; and
thresholding the softmax output values to generate the predicted occupancy state of the at least one voxel of the third point cloud.

A method according to claim 22, wherein the thresholding of the softmax output values converts softmax output values exceeding 0.5 into an output value of 1 and softmax output values less than or equal to 0.5 into an output value of 0.
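
The softmax and thresholding steps can be illustrated with a short Python sketch. It assumes, purely for illustration, that the MLP output has two columns per voxel ordered as [empty, occupied]; neither that layout nor the function names come from the claims.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def predict_occupancy(mlp_output, threshold=0.5):
    """mlp_output: (N, 2) array with assumed column layout [empty, occupied]."""
    p_occupied = softmax(mlp_output)[:, 1]
    # Values strictly above the threshold map to 1; values at or below it map to 0.
    return (p_occupied > threshold).astype(np.int64)
```

The strict comparison matters at the boundary: a voxel whose softmax output is exactly 0.5 is classified as empty, consistent with the "less than or equal to 0.5" wording.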

A method according to any one of claims 1 to 21, wherein predicting the occupancy state of at least one voxel comprises:
aggregating at least one feature of the third point cloud; and
generating the predicted occupancy state of the at least one voxel of the third point cloud based on the aggregated features.

A method according to any one of claims 22 to 24, wherein aggregating at least one feature comprises repeating a cascading process one or more times, the cascading process comprising:
performing a sparse 3D convolution on an input point cloud to generate a convolution output point cloud;
performing a nonlinear activation process on the convolution output point cloud to generate a nonlinear output point cloud; and
providing the nonlinear output point cloud as the input point cloud for the next cycle of the cascading process, if there is one,
wherein the third point cloud is the input point cloud for the first cycle of the cascading process, and
wherein the last cycle of the cascading process generates the aggregated features.

A method according to claim 25, further comprising the step of adding the third point cloud to the ReLU output point cloud of the last cycle of the cascading process.
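
A minimal Python sketch of such a cascade with the residual addition follows. It is not an implementation from the disclosure: `sparse_conv3d` is a naive, loop-based stand-in with a 3x3x3 kernel that produces outputs only at already-occupied voxels (an assumption that keeps the coordinate set fixed so the residual sum is well defined), and the final layer is assumed to preserve the channel count.

```python
import numpy as np
from itertools import product

OFFSETS = list(product((-1, 0, 1), repeat=3))  # footprint of a 3x3x3 kernel

def sparse_conv3d(coords, feats, weights):
    """Naive sparse convolution. weights has shape (27, C_in, C_out).

    Only occupied voxels contribute, and outputs exist only at occupied voxels.
    """
    index = {tuple(c): i for i, c in enumerate(coords.tolist())}
    out = np.zeros((len(coords), weights.shape[2]))
    for i, (x, y, z) in enumerate(coords.tolist()):
        for k, (dx, dy, dz) in enumerate(OFFSETS):
            j = index.get((x + dx, y + dy, z + dz))
            if j is not None:
                out[i] += feats[j] @ weights[k]
    return out

def aggregate(coords, feats, layer_weights):
    """Repeat (sparse conv -> ReLU) once per weight tensor, then add the input back."""
    x = feats
    for w in layer_weights:
        x = np.maximum(sparse_conv3d(coords, x, w), 0.0)
    return x + feats  # residual connection; requires C_out of the last layer == C_in
```

Practical systems would typically rely on a dedicated sparse tensor library rather than a Python loop; the dictionary lookup here only makes the neighbor gathering explicit.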

A method according to any one of claims 22 to 24, wherein aggregating at least one feature comprises:
performing a sparse 3D convolution on an input point cloud to generate a convolution output point cloud; and
performing a nonlinear activation process on the convolution output point cloud to generate the aggregated features.

A method according to any one of claims 25 to 27, wherein the nonlinear activation process comprises a rectified linear unit (ReLU) activation process, and the nonlinear output point cloud comprises a ReLU output point cloud.

A method according to any one of claims 22 to 24, wherein aggregating at least one feature comprises:
repeating a first cascading process one or more times, the first cascading process comprising:
performing a first sparse 3D convolution on a first input point cloud to generate a first convolution output point cloud;
performing a first nonlinear activation process on the first convolution output point cloud to generate a first nonlinear output point cloud; and
providing the first nonlinear output point cloud as the first input point cloud for the next cycle of the first cascading process, if there is one,
wherein the third point cloud is the first input point cloud for the first cycle of the first cascading process, and
wherein the last cycle of the first cascading process generates a first cascading process output;
repeating a second cascading process one or more times, the second cascading process comprising:
performing a second sparse 3D convolution on a second input point cloud to generate a second convolution output point cloud;
performing a second nonlinear activation process on the second convolution output point cloud to generate a second nonlinear output point cloud; and
providing the second nonlinear output point cloud as the second input point cloud for the next cycle of the second cascading process, if there is one,
wherein the third point cloud is the second input point cloud for the first cycle of the second cascading process, and
wherein the last cycle of the second cascading process generates a second cascading process output;
concatenating the first cascading process output and the second cascading process output to generate a concatenated output; and
adding the third point cloud to the concatenated output to generate the aggregated features.

A method according to any one of claims 22 to 24, wherein aggregating at least one feature comprises:
repeating a first cascading process one or more times, the first cascading process comprising:
performing a first sparse 3D convolution on a first input point cloud to generate a first convolution output point cloud;
performing a first rectified linear unit (ReLU) activation process on the first convolution output point cloud to generate a first ReLU output point cloud; and
providing the first ReLU output point cloud as the first input point cloud for the next cycle of the first cascading process, if there is one,
wherein the third point cloud is the first input point cloud for the first cycle of the first cascading process, and
wherein the last cycle of the first cascading process generates a first cascading process output;
repeating a second cascading process one or more times, the second cascading process comprising:
performing a second sparse 3D convolution on a second input point cloud to generate a second convolution output point cloud;
performing a second ReLU activation process on the second convolution output point cloud to generate a second ReLU output point cloud; and
providing the second ReLU output point cloud as the second input point cloud for the next cycle of the second cascading process, if there is one,
wherein the third point cloud is the second input point cloud for the first cycle of the second cascading process, and
wherein the last cycle of the second cascading process generates a second cascading process output;
concatenating the first cascading process output and the second cascading process output to generate a concatenated output; and
adding the third point cloud to the concatenated output to generate the aggregated features.

A method according to any one of claims 22 to 24, wherein aggregating at least one feature comprises:
performing a self-attention process on the third point cloud to generate a self-attention process output;
adding the third point cloud to the self-attention process output to generate an MLP process input;
performing an MLP process on the MLP process input to generate an MLP process output; and
adding the MLP process input to the MLP process output to generate the aggregated features.

A method according to claim 31, wherein the self-attention process generates output features based on the k nearest neighbors of voxels of the third point cloud.
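
The structure recited in these two claims (self-attention restricted to k nearest neighbors, a residual sum, an MLP, and a second residual sum) can be sketched in Python as below. The single-head scaled dot-product form, the brute-force neighbor search, the two-layer ReLU MLP, and all parameter names (`wq`, `wk`, `wv`, `w1`, `w2`) are assumptions chosen for brevity, not details taken from the disclosure.

```python
import numpy as np

def knn_indices(coords, k):
    """Brute-force k nearest neighbors per voxel (each voxel is its own nearest neighbor)."""
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    return np.argsort(d2, axis=1)[:, :k]

def knn_self_attention(coords, feats, wq, wk, wv, k):
    """Single-head dot-product attention where each voxel attends to its k neighbors."""
    idx = knn_indices(coords, k)              # (N, k)
    q = feats @ wq                            # (N, D)
    keys = (feats @ wk)[idx]                  # (N, k, D)
    vals = (feats @ wv)[idx]                  # (N, k, C)
    scores = np.einsum("nd,nkd->nk", q, keys) / np.sqrt(q.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)
    return np.einsum("nk,nkc->nc", attn, vals)

def attention_block(coords, feats, wq, wk, wv, w1, w2, k=4):
    mlp_in = feats + knn_self_attention(coords, feats, wq, wk, wv, k)  # first residual
    mlp_out = np.maximum(mlp_in @ w1, 0.0) @ w2                        # two-layer MLP
    return mlp_in + mlp_out                                            # second residual
```

Restricting attention to k neighbors keeps the cost per voxel fixed at k score computations, whereas full self-attention would grow with the total number of voxels.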

A method according to any one of claims 22 to 24, wherein the step of aggregating at least one feature of the third point cloud comprises performing the feature aggregation process two or more times.

A device comprising:
a processor; and
a non-transitory computer-readable medium storing instructions that, when executed by the processor, cause the device to:
upsample a first point cloud using initial upsampling to obtain a second point cloud;
associate features of the second point cloud with context information to obtain a third point cloud;
predict an occupancy state of at least one voxel of the third point cloud; and
remove, according to the predicted occupancy state, voxels of the third point cloud classified as empty to generate a pruned point cloud.

A device according to claim 34, wherein the initial upsampling comprises nearest neighbor upsampling.

A device according to claim 34 or 35, wherein associating features comprises obtaining the third point cloud by concatenating features of the second point cloud with the context information.

A device comprising:
a device according to claim 34; and
at least one of (i) an antenna configured to receive a signal, the signal comprising data representing an image, (ii) a band limiter configured to limit the received signal to a frequency band comprising the data representing the image, or (iii) a display configured to display the image.

A device according to claim 37, further comprising at least one of a TV, a cell phone, a tablet, and a set-top box (STB).

A computer-readable medium containing instructions that cause one or more processors to:
upsample a first point cloud using initial upsampling to obtain a second point cloud;
associate features of the second point cloud with context information to obtain a third point cloud;
predict an occupancy state of at least one voxel of the third point cloud; and
remove, according to the predicted occupancy state, voxels of the third point cloud classified as empty to generate a pruned point cloud.

A computer program product comprising instructions that, when the program is executed by one or more processors, cause the one or more processors to:
upsample a first point cloud using initial upsampling to obtain a second point cloud;
associate features of the second point cloud with context information to obtain a third point cloud;
predict an occupancy state of at least one voxel of the third point cloud; and
remove, according to the predicted occupancy state, voxels of the third point cloud classified as empty to generate a pruned point cloud.

A method comprising determining an upsampled second point cloud by performing context-aware upsampling of a first point cloud, wherein the context-aware upsampling comprises:
associating features of a third point cloud with context information, wherein the third point cloud is based at least in part on an initial upsampled version of the first point cloud; and
generating the upsampled second point cloud by removing voxels of the fourth point cloud that are predicted to be empty at least partially based on the context information from the third point cloud.

A method comprising:
upsampling a first point cloud using initial upsampling to obtain a second point cloud;
associating features of the second point cloud with context information to obtain a third point cloud;
predicting an occupancy state of at least one voxel of the third point cloud, wherein predicting the occupancy state of the at least one voxel comprises aggregating at least one feature of the third point cloud, aggregating the at least one feature of the third point cloud comprises using a first neural network, and aggregating the at least one feature of the third point cloud using the first neural network comprises using a first neural network parameter set with the first neural network;
removing, according to the predicted occupancy state, voxels of the third point cloud classified as empty to generate a pruned point cloud; and
performing feature aggregation on the pruned point cloud to generate aggregated features, wherein performing the feature aggregation on the pruned point cloud comprises using a second neural network, generating the aggregated features using the second neural network comprises using a second neural network parameter set with the second neural network, and the first neural network parameter set is identical to the second neural network parameter set.

The method of claim 42, further comprising aggregating at least one feature of the second point cloud.

The method of claim 43,
wherein aggregating the at least one feature of the second point cloud comprises using a third neural network,
wherein aggregating the at least one feature of the second point cloud using the third neural network comprises using a third neural network parameter set with the third neural network, and
wherein the third neural network parameter set is identical to the first neural network parameter set.

A method according to any one of claims 42 to 44, wherein the initial upsampling comprises nearest neighbor upsampling.

A method according to any one of claims 42 to 45, wherein associating features comprises obtaining the third point cloud by concatenating features of the second point cloud with the context information.

A method according to any one of claims 42 to 46, wherein the step of associating features comprises the step of obtaining the third point cloud by associating features of the second point cloud with the context information.

A method according to any one of claims 42 to 47, wherein the context information is voxel-wise context information.

A method according to any one of claims 42 to 48, further comprising the step of performing feature decoding on an input point cloud and a first bitstream to generate the first point cloud.

The method of claim 49, further comprising performing a context-aware upsampling process on the aggregated features to generate a decoded point cloud.

The method of claim 49, further comprising:
performing a feature-residual transformation on the pruned point cloud to generate a residual output; and
adding the pruned point cloud to the residual output to generate a decoded point cloud.

The method of claim 51, wherein the feature-residual transformation is performed on the aggregated features.

A method according to any one of claims 42 to 52, wherein the step of predicting the occupancy state predicts the actual occupancy state of at least one voxel.

A method according to any one of claims 42 to 53, wherein the step of predicting the occupancy state predicts a probability that the at least one voxel is occupied.

A method according to any one of claims 42 to 54, wherein removing voxels of the third point cloud removes voxels using a voxel pruning process.

The method of any one of claims 42 to 55, wherein predicting the occupancy state of the at least one voxel comprises:
processing the aggregated features with multilayer perceptron (MLP) layers to generate an MLP layer output;
performing a softmax process on the MLP layer output to generate softmax output values; and
thresholding the softmax output values to generate the predicted occupancy state of the at least one voxel of the third point cloud.

The method of claim 56, wherein the thresholding converts softmax output values exceeding 0.5 into an output value of 1 and converts softmax output values less than or equal to 0.5 into an output value of 0.
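
The MLP, softmax, and threshold steps of the two claims above can be sketched as follows. The two-layer MLP, its hand-picked weights, and the convention that the second softmax output is the "occupied" class are assumptions made for illustration; the claims do not specify layer sizes or class ordering.

```python
import math

def linear(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_occupied(feature, params, threshold=0.5):
    """Return 1 if the softmax 'occupied' value exceeds the threshold, else 0."""
    (w1, b1), (w2, b2) = params
    hidden = relu(linear(feature, w1, b1))
    logits = linear(hidden, w2, b2)  # assumed order: [empty, occupied]
    return 1 if softmax(logits)[1] > threshold else 0

# Hypothetical parameters: identity first layer, channel-swapping second layer.
params = (
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    ([[0.0, 1.0], [1.0, 0.0]], [0.0, 0.0]),
)
print(predict_occupied([2.0, 0.0], params), predict_occupied([1.0, 1.0], params))
```

A feature that produces equal logits yields a softmax value of exactly 0.5, which maps to 0 under the "less than or equal to 0.5" rule.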

The method of any one of claims 42 to 55, wherein predicting the occupancy state of the at least one voxel comprises:
aggregating at least one feature of the third point cloud; and
generating the predicted occupancy state of the at least one voxel of the third point cloud based on the aggregated features.

The method of any one of claims 56 to 58, wherein aggregating the at least one feature of the third point cloud comprises repeating a cascading process one or more times, the cascading process comprising:
performing sparse 3D convolution on an input point cloud to generate a convolution output point cloud;
performing a nonlinear activation process on the convolution output point cloud to generate a nonlinear output point cloud; and
when there is a next cycle of the cascading process, providing the nonlinear output point cloud as the input point cloud of the next cycle,
wherein the third point cloud is the input point cloud for the first cycle of the cascading process, and
wherein the last cycle of the cascading process generates the aggregated features.
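
The cascading process of the claim above can be sketched as a loop that feeds each cycle's output into the next. This sketch assumes that the sparse convolution produces outputs only at coordinates already occupied in the input and that the activation is ReLU; the dictionary representation and the kernel format (offset mapped to a `[C_out][C_in]` weight matrix) are hypothetical.

```python
def sparse_conv3d(cloud, kernel):
    """cloud: {(x, y, z): [C_in features]}; kernel: {(dx, dy, dz): [C_out][C_in] weights}."""
    c_out = len(next(iter(kernel.values())))
    out = {}
    for (x, y, z) in cloud:
        acc = [0.0] * c_out
        for (dx, dy, dz), weights in kernel.items():
            neighbor = cloud.get((x + dx, y + dy, z + dz))
            if neighbor is None:
                continue  # empty voxels contribute nothing
            for o, row in enumerate(weights):
                acc[o] += sum(w * v for w, v in zip(row, neighbor))
        out[(x, y, z)] = acc
    return out

def relu_cloud(cloud):
    return {c: [max(0.0, v) for v in feat] for c, feat in cloud.items()}

def cascade(third, kernels):
    """One cycle per kernel: convolution, then activation, then reuse as the next input."""
    current = third
    for kernel in kernels:
        current = relu_cloud(sparse_conv3d(current, kernel))
    return current  # output of the last cycle: the aggregated features

third = {(0, 0, 0): [1.0], (1, 0, 0): [2.0]}
k1 = {(0, 0, 0): [[1.0]], (1, 0, 0): [[1.0]]}   # self plus +x neighbor
k2 = {(0, 0, 0): [[1.0]], (1, 0, 0): [[-2.0]]}  # self minus twice the +x neighbor
aggregated = cascade(third, [k1, k2])
```

The number of cycles equals the number of kernels passed in, so a single-kernel call reduces to one convolution followed by one activation.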

The method of claim 59, further comprising adding the third point cloud to the ReLU output point cloud of the last cycle of the cascading process.

The method of any one of claims 56 to 58, wherein aggregating the at least one feature comprises:
performing sparse 3D convolution on an input point cloud to generate a convolution output point cloud; and
performing a nonlinear activation process on the convolution output point cloud to generate the aggregated features.

A method according to any one of claims 59 to 61, wherein the nonlinear activation process comprises a rectified linear unit (ReLU) activation process, and the nonlinear output point cloud comprises a ReLU output point cloud.

The method of any one of claims 56 to 58, wherein aggregating the at least one feature of the third point cloud comprises:
repeating a first cascading process one or more times, the first cascading process comprising:
performing a first sparse 3D convolution on a first input point cloud to generate a first convolution output point cloud;
performing a first nonlinear activation process on the first convolution output point cloud to generate a first nonlinear output point cloud; and
when there is a next cycle of the first cascading process, providing the first nonlinear output point cloud as the first input point cloud of the next cycle,
wherein the third point cloud is the first input point cloud for the first cycle of the first cascading process, and
wherein the last cycle of the first cascading process generates a first cascading process output;
repeating a second cascading process one or more times, the second cascading process comprising:
performing a second sparse 3D convolution on a second input point cloud to generate a second convolution output point cloud;
performing a second nonlinear activation process on the second convolution output point cloud to generate a second nonlinear output point cloud; and
when there is a next cycle of the second cascading process, providing the second nonlinear output point cloud as the second input point cloud of the next cycle,
wherein the third point cloud is the second input point cloud for the first cycle of the second cascading process, and
wherein the last cycle of the second cascading process generates a second cascading process output;
concatenating the first cascading process output and the second cascading process output to generate a concatenated output; and
adding the third point cloud to the concatenated output to generate the aggregated features.
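
The final two steps of the claim above, combining two branch outputs and adding the input back, can be sketched as follows. This assumes channel-wise joining of the branch outputs and element-wise feature addition, which in turn requires the joined channel count to equal the input channel count. `branch_a` and `branch_b` are hypothetical per-voxel stand-ins for the two cascading processes, not sparse convolutions.

```python
def relu(values):
    return [max(0.0, v) for v in values]

def branch_a(cloud):
    # Hypothetical stand-in for the first cascade: ReLU of the first half of the channels.
    return {c: relu(f[: len(f) // 2]) for c, f in cloud.items()}

def branch_b(cloud):
    # Hypothetical stand-in for the second cascade: ReLU of the negated second half.
    return {c: relu([-v for v in f[len(f) // 2:]]) for c, f in cloud.items()}

def concat_and_add(third, first_branch, second_branch):
    """Join both branch outputs channel-wise, then add the input as a residual."""
    out_a, out_b = first_branch(third), second_branch(third)
    aggregated = {}
    for coord, feat in third.items():
        joined = out_a[coord] + out_b[coord]  # list concatenation joins the channels
        if len(joined) != len(feat):
            raise ValueError("joined channels must match the input channels")
        aggregated[coord] = [j + x for j, x in zip(joined, feat)]
    return aggregated

third = {(0, 0, 0): [1.0, -2.0, 3.0, -4.0]}
aggregated = concat_and_add(third, branch_a, branch_b)
```

Each branch here emits half of the input channels, so the joined output lines up with the input for the residual addition.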

The method of any one of claims 56 to 58, wherein aggregating the at least one feature comprises:
repeating a first cascading process one or more times, the first cascading process comprising:
performing a first sparse 3D convolution on a first input point cloud to generate a first convolution output point cloud;
performing a first rectified linear unit (ReLU) activation process on the first convolution output point cloud to generate a first ReLU output point cloud; and
when there is a next cycle of the first cascading process, providing the first ReLU output point cloud as the first input point cloud of the next cycle,
wherein the third point cloud is the first input point cloud for the first cycle of the first cascading process, and
wherein the last cycle of the first cascading process generates a first cascading process output;
repeating a second cascading process one or more times, the second cascading process comprising:
performing a second sparse 3D convolution on a second input point cloud to generate a second convolution output point cloud;
performing a second rectified linear unit (ReLU) activation process on the second convolution output point cloud to generate a second ReLU output point cloud; and
when there is a next cycle of the second cascading process, providing the second ReLU output point cloud as the second input point cloud of the next cycle,
wherein the third point cloud is the second input point cloud for the first cycle of the second cascading process, and
wherein the last cycle of the second cascading process generates a second cascading process output;
concatenating the first cascading process output and the second cascading process output to generate a concatenated output; and
adding the third point cloud to the concatenated output to generate the aggregated features.

The method of any one of claims 56 to 58, wherein aggregating the at least one feature comprises:
performing a self-attention process on the third point cloud to generate a self-attention process output;
adding the third point cloud to the self-attention process output to generate an MLP process input;
performing an MLP process on the MLP process input to generate an MLP process output; and
adding the MLP process input to the MLP process output to generate the aggregated features.

The method of claim 65, wherein the self-attention process generates output features based on k nearest neighbors of voxels of the third point cloud.
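
The attention block of the two claims above (self-attention, residual addition, MLP, second residual addition) can be sketched as follows. The sketch makes several assumptions not stated in the claims: queries, keys, and values are the raw features (no learned projections), neighbors are chosen by Euclidean distance and include the voxel itself, and attention weights are a softmax over scaled dot products. The `mlp` argument is any per-voxel function supplied by the caller.

```python
import math

def knn(coords, center, k):
    """k nearest coordinates to center (ties broken by coordinate order)."""
    return sorted(coords, key=lambda c: (math.dist(c, center), c))[:k]

def self_attention(cloud, k):
    coords = list(cloud)
    out = {}
    for c, query in cloud.items():
        nbrs = knn(coords, c, k)
        scale = math.sqrt(len(query))
        scores = [sum(q * v for q, v in zip(query, cloud[n])) / scale for n in nbrs]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out[c] = [
            sum(w * cloud[n][j] for w, n in zip(weights, nbrs)) / total
            for j in range(len(query))
        ]
    return out

def add_clouds(a, b):
    return {c: [x + y for x, y in zip(a[c], b[c])] for c in a}

def attention_block(third, mlp, k=3):
    mlp_input = add_clouds(third, self_attention(third, k))  # first residual
    mlp_output = {c: mlp(f) for c, f in mlp_input.items()}
    return add_clouds(mlp_input, mlp_output)                 # second residual

third = {(0, 0, 0): [1.0, 2.0], (5, 5, 5): [3.0, 4.0]}
result = attention_block(third, mlp=lambda f: [0.0] * len(f), k=1)
```

With `k=1` every voxel attends only to itself, so the attention output equals the input; this makes the two residual additions easy to check by hand.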

A method according to any one of claims 56 to 58, wherein the step of aggregating at least one feature of the third point cloud comprises performing the feature aggregation process two or more times.

A method according to any one of claims 42 to 67, wherein the first neural network parameter set and the second neural network parameter set are the same neural network parameter set, and the same neural network parameter set is used by at least the first neural network and the second neural network.

A method according to any one of claims 42 to 68, wherein the first neural network parameter set and the second neural network parameter set are distinct but identical neural network parameter sets.

A method comprising:
obtaining a first point cloud;
determining an occupancy state of at least one voxel of the first point cloud;
generating a second point cloud by removing voxels of the first point cloud classified as empty according to the determined occupancy state;
obtaining a third point cloud by associating features of the second point cloud with context information;
obtaining a fourth point cloud by downsampling the third point cloud using initial downsampling; and
outputting the fourth point cloud as an encoded point cloud.
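
The encoder-side steps of the claim above can be sketched in the same dictionary representation. The occupancy test, the context function, and the downsampling rule (merge voxels that share a parent at `coords // 2` and average their features) are all assumptions chosen for illustration; the claim does not fix any of them.

```python
def prune_empty(cloud, is_occupied):
    """Remove voxels that the occupancy test classifies as empty."""
    return {c: f for c, f in cloud.items() if is_occupied(f)}

def attach_context(cloud, context_fn):
    """Append context values to every voxel feature."""
    return {c: f + context_fn(c) for c, f in cloud.items()}

def downsample(cloud):
    """Assumed initial downsampling: average the features of voxels sharing a parent."""
    groups = {}
    for (x, y, z), f in cloud.items():
        groups.setdefault((x // 2, y // 2, z // 2), []).append(f)
    return {p: [sum(col) / len(fs) for col in zip(*fs)] for p, fs in groups.items()}

def encode(first, is_occupied, context_fn):
    second = prune_empty(first, is_occupied)
    third = attach_context(second, context_fn)
    fourth = downsample(third)
    return fourth  # output as the encoded point cloud

first = {(0, 0, 0): [2.0], (1, 0, 0): [4.0], (2, 0, 0): [0.0], (3, 1, 0): [6.0]}
encoded = encode(
    first,
    is_occupied=lambda f: f[0] > 0,      # hypothetical occupancy test
    context_fn=lambda c: [float(c[2])],  # hypothetical context: the z coordinate
)
```

The voxel at `(2, 0, 0)` is pruned before downsampling, so it does not dilute the average of its parent.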

A device comprising:
a processor; and
a non-transitory computer-readable medium storing instructions that, when executed by the processor, cause the device to:
obtain a first point cloud;
determine an occupancy state of at least one voxel of the first point cloud;
generate a second point cloud by removing voxels of the first point cloud classified as empty according to the determined occupancy state;
obtain a third point cloud by associating features of the second point cloud with context information;
obtain a fourth point cloud by downsampling the third point cloud using initial downsampling; and
output the fourth point cloud as an encoded point cloud.

A method comprising:
accessing data including a first point cloud; and
transmitting the data including the first point cloud.

A device comprising:
an access unit configured to access data including a first point cloud; and
a transmitter configured to transmit the data including the first point cloud.

A device comprising:
a processor; and
a non-transitory computer-readable medium storing instructions that, when executed by the processor, cause the device to perform the method of any one of claims 1 to 33 and claims 41 to 70.

A device comprising at least one processor configured to perform the method of any one of claims 1 to 33 and claims 41 to 70.

A device comprising a computer-readable medium storing instructions for causing one or more processors to perform the method of any one of claims 1 to 33 and claims 41 to 70.

A device comprising at least one processor and at least one non-transitory computer-readable medium storing instructions for causing the at least one processor to perform the method of any one of claims 1 to 33 and claims 41 to 70.

A signal comprising a bitstream generated according to any one of claims 1 to 33 and claims 41 to 70.