HK40075483A - Method and apparatus for point cloud coding
- Publication number: HK40075483A
- Application number: HK62022064557.1A
- Authority: HK (Hong Kong)
- Prior art keywords: lcu, coding, geometric, point cloud, coding mode
Description
Incorporation by Reference
This application claims the benefit of priority from U.S. patent application No. 17/466,729, "METHOD AND APPARATUS FOR POINT CLOUD CODING", filed on September 3, 2021, which claims the benefit of priority from U.S. provisional application No. 63/121,835, "UPDATE ON NODE-BASED GEOMETRY AND ATTRIBUTE CODING FOR POINT CLOUD", filed on December 4, 2020. The entire disclosures of the prior applications are incorporated herein by reference.
Technical Field
This disclosure describes embodiments generally related to point cloud coding, which includes node-based geometric and attribute coding of point clouds.
Background
The background description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventors, to the extent it is described in this background section, as well as aspects of the description that may otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure.
Various techniques have been developed to capture the world and represent the world in a 3-dimensional (3D) space, such as objects in the world, environments in the world, and so forth. The 3D representation of the world may enable a more immersive form of interaction and communication. The point cloud may be used as a 3D representation of the world. A point cloud is a set of points in 3D space, each point having associated attributes, such as color, material properties, texture information, intensity attributes, reflectivity attributes, motion-related attributes, morphological attributes, and/or various other attributes. Such point clouds may include a large amount of data, and storage and transmission may be both expensive and time consuming.
Disclosure of Invention
Aspects of the present disclosure provide methods and apparatus for point cloud compression and decompression. According to an aspect of the present disclosure, a method of point cloud geometric encoding in a point cloud encoder is provided. In the method, geometric coding may be performed on the point cloud at a first partition depth. Further, a plurality of Largest Coding Units (LCUs) of the point cloud may be determined at a second partition depth. A coding state of an LCU of the plurality of LCUs of the point cloud may be set at the second partition depth. Geometric coding may then be performed on the plurality of LCUs of the point cloud at the second partition depth based on the coding state of the LCU at the second partition depth.
In some embodiments, the geometric coding may include one of octree-based geometric coding and prediction tree-based coding.
In an embodiment, the coding state of the LCU may be set to an initial state of the point cloud, which may be obtained prior to coding the point cloud based on the geometric coding.
In another embodiment, when the LCU is a first LCU of the plurality of LCUs of the point cloud at the second partition depth, the coding state may be obtained and stored after the point cloud is coded based on the geometric coding at the first partition depth.
In another embodiment, when the LCU is not the first LCU of the plurality of LCUs of the point cloud at the second partition depth, the coding state of the LCU may be set with a stored coding state. The stored coding state may be obtained after the point cloud is coded based on the geometric coding at the first partition depth, and stored prior to coding the first LCU of the plurality of LCUs of the point cloud based on the geometric coding at the second partition depth.
In some implementations, the coding state may include at least one of context of entropy coding associated with the LCU or geometric occupancy history information associated with the LCU.
In some implementations, each of the plurality of LCUs may include a respective node at the second partition depth.
According to another aspect of the present disclosure, a method of point cloud geometric encoding in a point cloud encoder is provided. In the method, a density of a Largest Coding Unit (LCU) of the point cloud may be determined. The density of the LCU may be a ratio of the number of points in the LCU to the volume of the LCU. A geometric coding mode of the LCU may be determined based on the density of the LCU and a first threshold. Geometric coding mode information may further be signaled in the bitstream, where the geometric coding mode information may indicate the geometric coding mode of the LCU determined based on the density of the LCU and the first threshold.
In an example, the geometric coding mode of the LCU may be determined to be prediction tree-based geometric coding based on the density of the LCU being equal to or less than the first threshold. In another example, the geometric coding mode of the LCU may be determined to be octree-based geometric coding based on the density of the LCU being greater than the first threshold.
In an example, the geometric coding mode of the LCU may be determined to be prediction tree-based geometric coding based on the density of the LCU being equal to or greater than the first threshold and equal to or less than a second threshold, where the second threshold is greater than the first threshold. In another example, the geometric coding mode of the LCU may be determined to be octree-based geometric coding based on the density of the LCU being less than the first threshold or greater than the second threshold.
In an example, the geometric coding mode of the LCU may be determined to be prediction tree-based geometric coding based on (i) the density of the LCU being equal to or greater than the first threshold and equal to or less than the second threshold, and (ii) the number of points in the LCU being equal to or greater than a point number threshold. In another example, the geometric coding mode of the LCU may be determined to be octree-based geometric coding based on (i) the density of the LCU being less than the first threshold or greater than the second threshold, or (ii) the number of points in the LCU being less than the point number threshold.
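As a non-normative illustration of the threshold tests above, the following Python sketch combines the density check of equation (2) (introduced later in this disclosure) with the two-sided threshold and point-count variants. The function name, threshold values, and returned mode labels are illustrative assumptions, not values fixed by any standard.

```python
# A minimal sketch of the density-based mode decision described above.
def decide_geometry_coding_mode(num_points, volume,
                                density_low, density_high, min_points):
    """Pick a geometry coding mode for one LCU from its point density."""
    density = num_points / volume  # Eq. 2: LCU_density = points / volume
    # Prediction tree coding targets sparse-but-not-degenerate LCUs:
    # density within [density_low, density_high] and enough points.
    if density_low <= density <= density_high and num_points >= min_points:
        return "prediction_tree"
    # Otherwise fall back to octree-based geometric coding.
    return "octree"

# Example: a LiDAR-like sparse LCU (assumed threshold values).
print(decide_geometry_coding_mode(num_points=5000, volume=2**18,
                                  density_low=0.001, density_high=0.1,
                                  min_points=64))  # -> "prediction_tree"
```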
In some embodiments, the geometry coding mode information may be signaled with a first value based on the geometry coding mode being a first geometry coding mode. The geometry coding mode information may be signaled with a second value based on the geometry coding mode being a second geometry coding mode.
In this method, the geometric coding mode information may be entropy coded using a context or may be coded using bypass coding.
In an embodiment, the geometry coding mode information may be signaled with a first value based on the geometry coding mode being a first geometry coding mode. In another embodiment, the geometry coding mode information may be signaled with a second value based on the geometry coding mode being a second geometry coding mode. In another embodiment, the geometry coding mode information may be signaled with a third value based on the geometry coding mode being a third geometry coding mode.
In some embodiments, binarization information may be signaled with a first value in only a first bin, where the binarization information having the first value may indicate a first geometric coding mode. In some embodiments, the binarization information may be signaled with a second value in the first bin and the first value in a subsequent second bin, where the binarization information having the second value in the first bin and the first value in the second bin may indicate a second geometric coding mode. In some implementations, the binarization information may be signaled with the second value in both the first bin and the second bin, where the binarization information having the second value in both bins may indicate a third geometric coding mode.
In some implementations, the binarization information in the first bin may be entropy coded with a first context and the binarization information in the second bin may be entropy coded with a second context.
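The two-bin binarization described above can be sketched as follows. The assignment of the "first value" to bit 1 and the "second value" to bit 0 is an assumed convention for illustration, as the text does not fix the actual bin values.

```python
# A sketch of the two-bin binarization of the geometry coding mode,
# assuming "first value" = 1 and "second value" = 0 (a design choice).

def binarize_mode(mode):
    """Map geometry coding mode {0, 1, 2} to at most two bins."""
    if mode == 0:
        return [1]        # first value in the first bin only
    if mode == 1:
        return [0, 1]     # second value, then first value
    return [0, 0]         # second value in both bins

def debinarize(bins):
    if bins[0] == 1:
        return 0
    return 1 if bins[1] == 1 else 2

for m in range(3):
    assert debinarize(binarize_mode(m)) == m
# Each bin position would be entropy coded with its own context,
# e.g., one context for the first bin and another for the second.
```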
In some examples, an apparatus for processing point cloud data includes processing circuitry configured to perform one or more of the methods described above. For example, an apparatus may include processing circuitry configured to perform geometric coding on a point cloud at a first partition depth. The processing circuitry may be further configured to determine a plurality of Largest Coding Units (LCUs) of the point cloud at a second partition depth. The processing circuitry may be configured to set a coding state of an LCU of the plurality of LCUs of the point cloud at the second partition depth. The processing circuitry may be configured to perform geometric coding on the plurality of LCUs of the point cloud at the second partition depth based on the coding state of the LCU at the second partition depth.
In another example, the processing circuitry may be configured to determine a density of a Largest Coding Unit (LCU) of the point cloud. The density of the LCU may be a ratio of the number of points in the LCU to the volume of the LCU. The processing circuitry may be configured to determine a geometric coding mode of the LCU based on the density of the LCU and a first threshold. The processing circuitry may be further configured to signal geometric coding mode information in the bitstream, where the geometric coding mode information may indicate the geometric coding mode of the LCU determined based on the density of the LCU and the first threshold.
According to another aspect of the disclosure, a non-transitory computer-readable storage medium is provided. The non-transitory computer-readable storage medium stores instructions that, when executed by at least one processor, cause the at least one processor to perform one or more of the methods described above. For example, in a method, geometric coding may be performed on a point cloud at a first partition depth. Further, a plurality of Largest Coding Units (LCUs) of the point cloud may be determined at a second partition depth. A coding state of an LCU of the plurality of LCUs of the point cloud may be set at the second partition depth. Geometric coding may be performed on the plurality of LCUs of the point cloud at the second partition depth based on the coding state of the LCU at the second partition depth.
In another example, in a method, a density of a Largest Coding Unit (LCU) of a point cloud may be determined. The density of the LCU may be a ratio of the number of points in the LCU to the volume of the LCU. A geometric coding mode of the LCU may be determined based on the density of the LCU and a first threshold. Geometric coding mode information may also be signaled in the bitstream, where the geometric coding mode information may indicate the geometric coding mode of the LCU determined based on the density of the LCU and the first threshold.
Drawings
Further features, properties and various advantages of the disclosed subject matter will become more apparent from the following detailed description and the accompanying drawings, in which:
FIG. 1 is a schematic illustration of a simplified block diagram of a communication system according to an embodiment;
FIG. 2 is a schematic illustration of a simplified block diagram of a streaming system according to an embodiment;
FIG. 3 illustrates a block diagram of an encoder for encoding point cloud frames, in accordance with some embodiments;
FIG. 4 illustrates a block diagram of a decoder for decoding a compressed bitstream corresponding to point cloud frames, in accordance with some embodiments;
FIG. 5 illustrates a block diagram of an encoder for encoding point cloud frames, in accordance with some embodiments;
FIG. 6 illustrates a block diagram of a decoder for decoding a compressed bitstream corresponding to point cloud frames, in accordance with some embodiments;
FIG. 7 shows a diagram illustrating the partitioning of a cube based on an octree partitioning technique, according to some embodiments of the present disclosure;
FIG. 8 shows a diagram illustrating the partitioning of a cube along the x-y, x-z, and y-z axes based on a quadtree partitioning technique, according to some embodiments of the present disclosure;
FIG. 9 shows a diagram illustrating the partitioning of a cube along the x-axis, y-axis, and z-axis based on a binary tree partitioning technique, according to some embodiments of the present disclosure;
FIG. 10A shows a diagram illustrating a breadth-first traversal order in an octree partitioning technique, according to some embodiments of the present disclosure;
FIG. 10B shows a diagram illustrating a depth-first traversal order in an octree partitioning technique, according to some embodiments of the present disclosure;
FIG. 11 is a schematic illustration of prediction tree based geometry coding, in accordance with some embodiments of the present disclosure;
FIG. 12 illustrates a block diagram of a forward transform in lifting-based attribute coding, in accordance with some embodiments;
FIG. 13 illustrates a block diagram of an inverse transform in lifting-based attribute coding, in accordance with some embodiments;
FIG. 14A illustrates a diagram of a forward transform in Region Adaptive Hierarchical Transform (RAHT)-based attribute coding, according to some embodiments of the present disclosure;
FIG. 14B illustrates a diagram of an inverse transform in Region Adaptive Hierarchical Transform (RAHT)-based attribute coding, according to some embodiments of the present disclosure;
FIG. 15 illustrates an example of an octree partition and an octree structure corresponding to the octree partition, according to some embodiments of the present disclosure;
FIG. 16 illustrates a diagram of node-based (LCU-based) geometry and attribute coding, according to some embodiments of the present disclosure;
FIG. 17 shows a flowchart outlining node-based (LCU-based) parallel coding, according to some embodiments of the present disclosure;
FIG. 18 shows a flowchart outlining a first exemplary coding process, according to some embodiments;
FIG. 19 shows a flowchart outlining a second exemplary coding process, according to some embodiments;
FIG. 20 is a schematic illustration of a computer system, according to an embodiment.
Detailed Description
In recent years, point clouds have become more widely used. For example, point clouds may be used in autonomous driving vehicles for object detection and localization. Point clouds may also be used for mapping in geographic information systems (GIS), and in cultural heritage for visualizing and archiving cultural heritage objects and collections, and the like.
A point cloud may contain a set of high-dimensional points, typically in three dimensions (3D). Each of the high-dimensional points may include 3D position information and additional attributes, such as color, reflectivity, and the like. The high-dimensional points may be captured using multiple cameras and depth sensors, or LiDAR, in various settings, and a point cloud may be formed of thousands up to billions of points in order to realistically represent the original scene.
Therefore, compression techniques are needed to reduce the amount of data required to represent the point cloud, for faster transmission or to reduce storage. ISO/IEC MPEG (JTC 1/SC 29/WG 11) has created the ad-hoc group (MPEG-PCC) to standardize compression techniques for static or dynamic point clouds. In addition, the audio video coding standards working group of china has also created an ad-hoc group (AVS-PCC) to standardize the compression of point clouds.
Fig. 1 shows a simplified block diagram of a communication system (100) according to an embodiment of the present disclosure. The communication system (100) comprises a plurality of terminal devices which can communicate with each other via, for example, a network (150). For example, a communication system (100) includes a pair of terminal devices (110) and (120) interconnected via a network (150). In the example of fig. 1, a first pair of terminal devices (110) and (120) may perform a one-way transmission of point cloud data. For example, the terminal device (110) may compress a point cloud (e.g., points representing a structure) captured by a sensor (105) connected to the terminal device (110). The compressed point cloud can be transmitted to another terminal device (120), for example in the form of a bit stream, via a network (150). The terminal device (120) may receive the compressed point cloud from the network (150), decompress the bitstream to reconstruct the point cloud, and display the reconstructed point cloud as appropriate. Unidirectional data transmission may be common in media service applications and the like.
In the example of fig. 1, the terminal devices (110) and (120) may be illustrated as a server and a personal computer, although the principles of the disclosure may not be so limited. Embodiments of the present disclosure may be used in laptop computers, tablet computers, smart phones, gaming terminals, media players, and/or dedicated three-dimensional (3D) devices. The network (150) represents any number of networks that transmit compressed point clouds between terminal devices (110) and (120). The network (150) may include, for example, a telephone line (wired) and/or a wireless communication network. The network (150) may exchange data in circuit-switched and/or packet-switched channels. Representative networks include telecommunications networks, local area networks, wide area networks, and/or the internet. For purposes of this discussion, the architecture and topology of the network (150) may be unimportant to the operation of the present disclosure, unless described herein below.
Fig. 2 shows a simplified block diagram of a streaming system (200) according to an embodiment. Fig. 2 illustrates an application of the disclosed subject matter to point clouds. The disclosed subject matter may be equally applicable to other point cloud enabled applications, such as 3D telepresence applications, virtual reality applications, and the like.
The streaming system (200) may include a capture subsystem (213). The capture subsystem (213) may include a point cloud source (201), for example a light detection and ranging (LIDAR) system, a 3D camera, a 3D scanner, a graphics generation component that generates an uncompressed point cloud in software, and the like, that generates, for example, an uncompressed point cloud (202). In an example, the point cloud (202) includes points captured by the 3D camera. The point cloud (202) is depicted as a bold line to emphasize its high data volume when compared to the compressed point cloud (204) (a bitstream of the compressed point cloud). The compressed point cloud (204) may be generated by an electronic device (220) that includes an encoder (203) coupled to the point cloud source (201). The encoder (203) may include hardware, software, or a combination thereof to implement or perform aspects of the disclosed subject matter as described in more detail below. The compressed point cloud (204) (or the bitstream of the compressed point cloud (204)) is depicted as a thin line to emphasize its lower data volume when compared to the stream of point clouds (202), and may be stored on a streaming server (205) for future use. One or more streaming client subsystems in fig. 2, such as client subsystems (206) and (208), may access the streaming server (205) to retrieve copies (207) and (209) of the compressed point cloud (204). The client subsystem (206) may include a decoder (210), for example, in the electronic device (230). The decoder (210) decodes an incoming copy (207) of the compressed point cloud and creates an outgoing stream of reconstructed point clouds (211) that can be rendered on a rendering device (212).
Note that the electronic devices (220) and (230) may include other components (not shown). For example, the electronic device (220) may include a decoder (not shown), and the electronic device (230) may also include an encoder (not shown).
In some streaming systems, the compressed point clouds (204), (207), and (209) (e.g., bitstreams of compressed point clouds) may be compressed according to certain standards. In some examples, video coding standards are used in the compression of point clouds. Examples of these standards include High Efficiency Video Coding (HEVC), Versatile Video Coding (VVC), and the like.
Fig. 3 illustrates a block diagram of a V-PCC encoder (300) for encoding a point cloud frame, in accordance with some embodiments. In some embodiments, the V-PCC encoder (300) may be used in a communication system (100) and a streaming system (200). For example, the encoder (203) may be configured and operate in a similar manner as the V-PCC encoder (300).
A V-PCC encoder (300) receives the point cloud frame as an uncompressed input and generates a bitstream corresponding to the compressed point cloud frame. In some implementations, the V-PCC encoder (300) may receive a point cloud frame from a point cloud source, such as point cloud source (201).
In the fig. 3 example, the V-PCC encoder (300) includes a patch generation module (306), a patch packing module (308), a geometric image generation module (310), a texture image generation module (312), a patch information module (304), an occupancy map module (314), a smoothing module (336), image filling modules (316) and (318), a group expansion module (320), video compression modules (322), (323) and (332), an auxiliary patch information compression module (338), an entropy compression module (334), and a multiplexer (324).
According to an aspect of the disclosure, the V-PCC encoder (300) converts a 3D point cloud frame and some metadata (e.g., an occupancy map and patch information) into an image-based representation, which is used to convert the compressed point cloud back into a decompressed point cloud. In some examples, the V-PCC encoder (300) may convert a 3D point cloud frame into a geometric image, a texture image, and an occupancy map, and then encode the geometric image, the texture image, and the occupancy map into a bitstream using video coding techniques. Generally, a geometric image is a 2D image with pixels filled with geometric values associated with the points projected to the pixels, and a pixel filled with a geometric value may be referred to as a geometric sample. A texture image is a 2D image with pixels filled with texture values associated with the points projected to the pixels, and a pixel filled with a texture value may be referred to as a texture sample. An occupancy map is a 2D image with pixels filled with values that indicate whether each pixel is occupied or unoccupied by a patch.
A patch may generally refer to a contiguous subset of a surface described by a point cloud. In an example, the patch includes points where surface normal vectors deviate from each other by less than a threshold amount. A patch generation module (306) divides the point cloud into a set of patches that may or may not overlap, such that each patch may be described by a depth field with respect to a plane in 2D space. In some implementations, the patch generation module (306) is directed to decompose the point cloud into a minimum number of patches with smooth boundaries while also minimizing reconstruction errors.
The patch information module (304) may collect patch information indicating the size and shape of the patch. In some examples, the patch information may be packed into an image frame and then encoded by a secondary patch information compression module (338) to generate compressed secondary patch information.
The patch packing module (308) is configured to map the extracted patches onto a 2-dimensional (2D) grid while minimizing the unused space and ensuring that every M×M (e.g., 16×16) block of the grid is associated with a unique patch. Efficient patch packing can directly impact compression efficiency by minimizing unused space or ensuring temporal consistency.
A geometric image generation module (310) may generate a 2D geometric image associated with the geometry of the point cloud at a given patch location. A texture image generation module (312) may generate a 2D texture image associated with the texture of the point cloud at a given patch location. A geometric image generation module (310) and a texture image generation module (312) store the geometry and texture of the point cloud as an image using a 3D-to-2D mapping computed during the packing process. To better handle the case where multiple points are projected onto the same sample, each patch is projected onto two images (called layers). In an example, the geometric image is represented by a monochrome frame of WxH in YUV420-8 bit format. To generate the texture image, the texture generation process utilizes the reconstructed/smoothed geometry to compute the color to be associated with the resampled point.
The occupancy map module (314) may generate an occupancy map that describes the fill information at each cell. For example, the occupancy map includes a binary map that indicates, for each cell of the grid, whether the cell belongs to empty space or to the point cloud. In an example, the occupancy map uses binary information to describe, for each pixel, whether the pixel is filled. In another example, the occupancy map uses binary information to describe, for each block of pixels, whether the block of pixels is filled.
The occupancy map generated by the occupancy map module (314) may be compressed using lossless coding or lossy coding. When lossless coding is used, the occupancy map is compressed using an entropy compression module (334). When lossy coding is used, the occupancy map is compressed using a video compression module (332).
Note that the patch packing module (308) may leave some empty space between 2D patches packed in an image frame. The image fill modules (316) and (318) may fill empty spaces (referred to as fills) to generate image frames that may be suitable for 2D video and image codecs. Image fill is also referred to as background fill, where unused space can be filled with redundant information. In some examples, good background filling minimally increases the bit rate and does not introduce significant coding distortion around patch boundaries.
The video compression modules (322), (323), and (332) may encode 2D images such as padded geometric images, padded texture images, and occupancy maps based on suitable video coding standards such as HEVC, VVC, and the like. In an example, the video compression modules (322), (323), and (332) are individual components that operate separately. Note that in another example, the video compression modules (322), (323), and (332) may be implemented as a single component.
In some examples, the smoothing module (336) is configured to generate a smoothed image of the reconstructed geometric image. The smoothed image may be provided to texture image generation (312). The texture image generation (312) may then adjust the generation of the texture image based on the reconstructed geometric image. For example, when the patch shape (e.g., geometry) is slightly distorted during encoding and decoding, the distortion may be considered in generating the texture image to correct the distortion of the patch shape.
In some embodiments, the group expansion (320) is configured to fill in pixels around object boundaries with redundant low frequency content to improve coding gain and visual quality of the reconstructed point cloud.
A multiplexer (324) may multiplex the compressed geometry image, the compressed texture image, the compressed occupancy map, and/or the compressed auxiliary patch information into a compressed bitstream.
Fig. 4 illustrates a block diagram of a V-PCC decoder (400) for decoding a compressed bitstream corresponding to a point cloud frame, in accordance with some embodiments. In some embodiments, the V-PCC decoder (400) may be used in a communication system (100) and a streaming system (200). For example, the decoder (210) may be configured to operate in a similar manner as the V-PCC decoder (400). A V-PCC decoder (400) receives the compressed bitstream and generates a reconstructed point cloud based on the compressed bitstream.
In the fig. 4 example, the V-PCC decoder (400) includes a demultiplexer (432), video decompression modules (434) and (436), an occupancy map decompression module (438), an auxiliary patch information decompression module (442), a geometry reconstruction module (444), a smoothing module (446), a texture reconstruction module (448), and a color smoothing module (452).
The demultiplexer (432) may receive the compressed bitstream and separate the compressed bitstream into a compressed texture image, a compressed geometric image, a compressed occupancy map, and compressed auxiliary patch information.
The video decompression modules (434) and (436) may decode the compressed image according to a suitable standard (e.g., HEVC, VVC, etc.) and output a decompressed image. For example, the video decompression module (434) decodes the compressed texture image and outputs a decompressed texture image; and the video decompression module (436) decodes the compressed geometric image and outputs a decompressed geometric image.
The occupancy map decompression module (438) may decode the compressed occupancy map according to a suitable standard (e.g., HEVC, VVC, etc.) and output a decompressed occupancy map.
The auxiliary patch information decompression module (442) may decode the compressed auxiliary patch information according to a suitable standard (e.g., HEVC, VVC, etc.) and output decompressed auxiliary patch information.
The geometry reconstruction module (444) may receive the decompressed geometric image and generate the geometry of the reconstructed point cloud based on the decompressed geometric image, the decompressed occupancy map, and the decompressed auxiliary patch information.
The smoothing module (446) may smooth out inconsistencies at the edges of the patch. The smoothing process is intended to mitigate potential discontinuities that may occur at patch boundaries due to compression artifacts. In some embodiments, a smoothing filter may be applied to pixels located on patch boundaries to mitigate distortion that may be caused by compression/decompression.
The texture reconstruction module (448) can determine texture information about the points in the point cloud based on the decompressed texture image and the smooth geometry.
The color smoothing module (452) may smooth out the shading inconsistencies. Non-adjacent patches in 3D space are typically packed next to each other in 2D video. In some examples, pixel values from non-neighboring patches may be mixed by a block-based video codec. The purpose of color smoothing is to reduce visible artifacts that occur at patch boundaries.
Fig. 5 illustrates a block diagram of a G-PCC encoder (500) in accordance with some embodiments. The encoder (500) may be configured to receive point cloud data and compress the point cloud data to generate a bitstream carrying the compressed point cloud data. In an embodiment, the encoder (500) may include a position quantization module (510), a duplicate point removal module (512), an octree encoding module (530), an attribute transfer module (520), a level of detail (LOD) generation module (540), an attribute prediction module (550), a residual quantization module (560), an arithmetic coding module (570), an inverse residual quantization module (580), an addition module (581), and a memory (590) to store reconstructed attribute values.
As shown, an input point cloud (501) may be received at the encoder (500). The positions (e.g., 3D coordinates) of the point cloud (501) are provided to the quantization module (510). The quantization module (510) is configured to quantize the coordinates to generate quantized positions. The duplicate point removal module (512) is configured to receive the quantized positions and perform a filtering process to identify and remove duplicate points. The octree encoding module (530) is configured to receive the filtered positions from the duplicate point removal module (512) and perform an octree-based encoding process to generate a sequence of occupancy codes that describe a 3D grid of voxels. The occupancy codes are provided to the arithmetic coding module (570).
The attribute transfer module (520) is configured to receive the attributes of the input point cloud and perform an attribute transfer process to determine an attribute value for each voxel when multiple attribute values are associated with the respective voxel. The attribute transfer process may be performed on the re-ordered points output from the octree encoding module (530). The attributes after the transfer operation are provided to the attribute prediction module (550). The LOD generation module (540) is configured to operate on the re-ordered points output from the octree encoding module (530) and re-organize the points into different LODs. The LOD information is provided to the attribute prediction module (550).
The attribute prediction module (550) processes the points according to the LOD-based order indicated by the LOD information from the LOD generation module (540). The attribute prediction module (550) generates an attribute prediction for the current point based on reconstructed attributes of a set of neighboring points to the current point stored in the memory (590). Prediction residuals may then be obtained based on the original attribute values received from the attribute transfer module (520) and the locally generated attribute predictions. When the candidate index is used in the corresponding attribute prediction process, the index corresponding to the selected prediction candidate may be provided to the arithmetic coding module (570).
The residual quantization module (560) is configured to receive the prediction residual from the attribute prediction module (550) and perform quantization to generate a quantized residual. The quantized residue is provided to an arithmetic coding module (570).
The inverse residual quantization module (580) is configured to receive a quantized residual from the residual quantization module (560) and to generate a reconstructed prediction residual by performing an inverse of the quantization operation performed at the residual quantization module (560). The addition module (581) is configured to receive the reconstructed prediction residual from the inverse residual quantization module (580) and the corresponding property prediction from the property prediction module (550). By combining the reconstructed prediction residual and the attribute prediction, a reconstructed attribute value is generated and stored to memory (590).
The arithmetic coding module (570) is configured to receive the occupancy code, the candidate index (if used), the quantized residual (if generated), and other information, and perform entropy coding to further compress the received values or information. Thus, a compressed bitstream (502) carrying compressed information may be generated. The bitstream (502) may be transmitted to or otherwise provided to a decoder that decodes the compressed bitstream, or the bitstream (502) may be stored in a storage device.
Fig. 6 shows a block diagram of a G-PCC decoder (600) according to an embodiment. The decoder (600) may be configured to receive a compressed bitstream and perform point cloud data decompression to decompress the bitstream to generate decoded point cloud data. In an embodiment, the decoder (600) may include an arithmetic decoding module (610), an inverse residual quantization module (620), an octree decoding module (630), a LOD generation module (640), an attribute prediction module (650), and a memory (660) that stores reconstruction attribute values.
As shown, a compressed bitstream (601) may be received at an arithmetic decoding module (610). The arithmetic decoding module (610) is configured to decode the compressed bitstream (601) to obtain the occupancy code and the quantized residual of the point cloud (if generated). An octree decoding module (630) is configured to determine reconstructed locations of points in the point cloud from the occupancy codes. A LOD generation module (640) is configured to reorganize the points into different LODs based on the reconstruction locations and determine an order based on the LODs. The inverse residual quantization module (620) is configured to generate a reconstructed residual based on the quantized residual received from the arithmetic decoding module (610).
An attribute prediction module (650) is configured to perform an attribute prediction process to determine an attribute prediction for a point according to the LOD-based order. For example, a prediction of properties of the current point may be determined based on reconstructed property values of neighbors of the current point stored in the memory (660). In some examples, the property prediction may be combined with the corresponding reconstructed residual to generate reconstructed properties for the current point.
In one example, the sequence of reconstruction attributes generated from the attribute prediction module (650) along with the reconstruction locations generated from the octree decoding module (630) correspond to the decoded point cloud (602) output from the decoder (600). In addition, the reconstructed attributes are also stored in memory (660) and can be subsequently used to derive attribute predictions for subsequent points.
In various embodiments, the encoder (300), decoder (400), encoder (500), and/or decoder (600) may be implemented in hardware, software, or a combination thereof. For example, the encoder (300), decoder (400), encoder (500), and/or decoder (600) may be implemented with processing circuitry, e.g., one or more Integrated Circuits (ICs) operating with or without software, e.g., an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), etc. In another example, the encoder (300), decoder (400), encoder (500), and/or decoder (600) may be implemented as software or firmware including instructions stored in a non-volatile (or non-transitory) computer-readable storage medium. The instructions, when executed by processing circuitry, e.g., one or more processors, cause the processing circuitry to perform the functions of the encoder (300), decoder (400), encoder (500), and/or decoder (600).
Note that attribute prediction modules (550) and (650) configured to implement the attribute prediction techniques disclosed herein may be included in other decoders or encoders, which may have a similar or different structure than that shown in fig. 5 and 6. Additionally, the encoder (500) and decoder (600) may be included in the same device, or in various examples, may be included in separate devices.
In the MPEG geometry-based point cloud compression (G-PCC) software test model, TMC13, the geometry information and the associated attributes of a point cloud, such as color or reflectance, may be compressed separately. The geometry information, which consists of the 3D coordinates of the point cloud, may be coded by octree partitioning, quadtree partitioning, and binary partitioning using the occupancy information of the point cloud. After the geometry information is coded, the attributes of the point cloud may be compressed based on the reconstructed geometry using prediction, lifting, and region adaptive hierarchical transform techniques. For geometry coding, two methods can be applied. The first method is an octree-based method (or octree-based geometric coding), and the second method is a prediction tree-based method (or prediction tree-based geometric coding).
In octree-based geometric coding, the point cloud may be partitioned by octree, quadtree, or binary partitioning, which may be described as follows.
For a point cloud, the bounding box B of the point cloud may not be limited to having the same size in each direction. Instead, the bounding box B may be a rectangular cuboid of arbitrary size to better fit the shape of the 3D scene or object. In an example, the size of the bounding box B may be represented as powers of two, e.g., (2^{d_x}, 2^{d_y}, 2^{d_z}). Note that d_x, d_y, and d_z may not be equal.
To partition bounding box B, octree, quadtree, or binary partitioning may be utilized. Fig. 7 shows an octree partitioning of a bounding box 700, where the x, y, and z dimensions of the bounding box 700 can be divided in half, which results in 8 sub-boxes with the same size. Fig. 8 shows a quadtree partitioning of a bounding box, where two of the three dimensions of the bounding box, e.g., the x, y, and z dimensions, may be split in half, which results in 4 sub-boxes with the same size. For example, as shown in FIG. 8, bounding box 801 may be partitioned into 4 sub-boxes along the x-y axis, bounding box 802 may be partitioned into 4 sub-boxes along the x-z axis, and bounding box 803 may be partitioned into 4 sub-boxes along the y-z axis.
Fig. 9 shows a binary tree partitioning of a bounding box, where only one of the three dimensions (e.g., x, y, and z dimensions) can be split in half, resulting in 2 sub-boxes of the same size. For example, as shown in FIG. 9, bounding box 901 may be partitioned into 2 sub-boxes along the x-axis, bounding box 902 may be partitioned into 2 sub-boxes along the y-axis, and bounding box 903 may be partitioned into 2 sub-boxes along the z-axis.
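A minimal sketch of one level of octree, quadtree, and binary tree subdivision of an axis-aligned box, in the spirit of FIGS. 7-9, is given below. The (origin, size) box representation and the helper name are illustrative assumptions.

```python
# One level of box subdivision: halve the box along a chosen set of axes.
from itertools import product

def split_box(origin, size, axes):
    """Split a box in half along the given subset of axes ('x','y','z')."""
    idx = {"x": 0, "y": 1, "z": 2}
    halves = []
    # Each split axis contributes a low/high choice; others stay fixed.
    for choice in product(*[(0, 1) if a in axes else (0,)
                            for a in ("x", "y", "z")]):
        o = list(origin)
        s = list(size)
        for a in axes:
            i = idx[a]
            s[i] = size[i] / 2
            o[i] = origin[i] + choice[i] * s[i]
        halves.append((tuple(o), tuple(s)))
    return halves

box = ((0, 0, 0), (8, 8, 8))
print(len(split_box(*box, axes="xyz")))  # 8 sub-boxes (octree)
print(len(split_box(*box, axes="xy")))   # 4 sub-boxes (quadtree, x-y)
print(len(split_box(*box, axes="z")))    # 2 sub-boxes (binary, z)
```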
Thus, the point cloud may be represented by a general tree structure that combines octree, quadtree, and binary tree partitions. To traverse such a tree, a breadth-first method may be employed, as in the MPEG TMC13 model. On the other hand, a depth-first method may also be utilized, as illustrated in fig. 10A and fig. 10B.
In fig. 10A and 10B, the shaded circles represent occupied nodes in the tree, and the blank circles represent unoccupied nodes. The numbers in the circles indicate the traversal order. FIG. 10A shows a breadth first traversal order, where nodes are accessed/processed starting at depth 0, then depths 1, 2, etc. Fig. 10B illustrates a depth-first traversal order, where a node is accessed/processed starting from the root node (e.g., node 0), then the first occupied child node of the root node (e.g., node 1), then the occupied child nodes of the first occupied child node of the root node (e.g., nodes 3, 4, and 5), until a leaf node is reached. Then, the access/processing starts from the second occupied child node of the root node (e.g., node 2), and then to the occupied child nodes of the second occupied child node of the root node (e.g., nodes 6,7, and 8), until the leaf node is reached.
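The two traversal orders can be contrasted with a small sketch over a generic tree of occupied nodes. The tree below mirrors the node numbering described for FIGS. 10A-10B but is otherwise an illustrative assumption.

```python
# Breadth-first vs. depth-first traversal of a tree of occupied nodes.
from collections import deque

tree = {0: [1, 2], 1: [3, 4, 5], 2: [6, 7, 8]}  # parent -> occupied children

def breadth_first(root):
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree.get(node, []))  # finish depth d before depth d+1
    return order

def depth_first(root):
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(reversed(tree.get(node, [])))  # descend before siblings
    return order

print(breadth_first(0))  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(depth_first(0))    # [0, 1, 3, 4, 5, 2, 6, 7, 8]
```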
In prediction tree based geometry coding, a prediction tree, such as a spanning tree over all the points of the point cloud, may be constructed. For the prediction of a point, the point's ancestors may be used. For example, the location of a point may be predicted from the location of the point's parent, or from the locations of the point's parent and grandparent. FIG. 11 illustrates a prediction tree (1100) that spans all the points of a point cloud representing the surface of a rabbit, where an enlarged block (1102) illustrates a portion of the prediction tree.
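A minimal sketch of this ancestor-based position prediction follows: a point is predicted from its parent alone, or extrapolated linearly from its parent and grandparent. The specific predictors shown are common illustrative choices, not an exhaustive list of the modes in any codec.

```python
# Ancestor-based position prediction; only the residual would be coded.
def predict_position(parent, grandparent=None):
    """Predict a point's position from its ancestors in the prediction tree."""
    if grandparent is None:
        return tuple(parent)                          # "same as parent"
    # Linear extrapolation: parent + (parent - grandparent).
    return tuple(2 * p - g for p, g in zip(parent, grandparent))

actual = (10, 4, 1)
pred = predict_position(parent=(9, 4, 1), grandparent=(8, 4, 1))
residual = tuple(a - p for a, p in zip(actual, pred))
print(pred, residual)  # (10, 4, 1) (0, 0, 0)
```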
Trisoup-based geometric coding is another geometric coding method that can represent an object surface as a series of triangle meshes. Trisoup-based geometric coding may be applicable to dense surface point clouds. A trisoup decoder may generate a point cloud from the mesh surfaces at a specified voxel granularity, so that the density of the reconstructed point cloud can be ensured. In general, trisoup-based geometric coding can introduce distortion into the original point cloud, with the benefit of a reduced bitstream size.
Prediction-based attribute coding of a point cloud may be described as follows. For simplicity, one level of detail (LoD) may be assumed in prediction-based attribute coding.

Let (P_i)_{i=1...N} be the set of positions associated with the points of the point cloud, and let (M_i)_{i=1...N} be the Morton codes associated with (P_i)_{i=1...N}. First, the points may be sorted according to their associated Morton codes in ascending order. Let I be the array of point indices arranged according to this ascending order. The encoder/decoder may compress/decompress the points, respectively, according to the order defined by I. At each iteration i, a point P_i may be selected. The distances of P_i to the s (e.g., s = 64) previous points may be analyzed, and the k (e.g., k = 3) nearest neighbors of P_i may be selected for prediction. More precisely, the attribute values (a_i)_{i=1...N} may be predicted by using a linear interpolation process based on the distances of the nearest neighbors of point i. Let N_i be the set of the k nearest neighbors of the current point i, let (ã_j)_{j∈N_i} be the set of their decoded/reconstructed attribute values, and let (δ_j)_{j∈N_i} be the set of their distances to the current point i. The predicted attribute value â_i can then be given by equation (1):

â_i = Round( ( Σ_{j∈N_i} ã_j / δ_j² ) / ( Σ_{j∈N_i} 1 / δ_j² ) )   (Eq. 1)
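Equation (1) can be illustrated with a short sketch; the function and variable names are illustrative, mirroring ã_j and δ_j from the text.

```python
# Eq. 1: inverse-squared-distance weighted average of the k reconstructed
# neighbor attribute values, rounded to the nearest integer.
def predict_attribute(neighbor_attrs, neighbor_dists):
    """Predict an attribute value from k nearest-neighbor reconstructions."""
    weights = [1.0 / (d * d) for d in neighbor_dists]
    value = sum(w * a for w, a in zip(weights, neighbor_attrs)) / sum(weights)
    return round(value)

# k = 3 neighbors with reconstructed attributes and distances to point i:
print(predict_attribute([100, 120, 90], [1.0, 2.0, 4.0]))  # -> 103
```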
lifting-based attribute coding may be based on prediction-based attribute coding. Compared to prediction-based attribute coding, two additional steps are introduced in lifting-based attribute coding: (a) introducing an update operator; and (b) using an adaptive quantization strategy.
For illustration, the operations of lifting-based attribute coding are shown in figs. 12 and 13. Fig. 12 shows a block diagram of a forward transform (1200) in lifting-based attribute coding, and fig. 13 shows a block diagram of an inverse transform (1300) in lifting-based attribute coding.
As shown in fig. 12, the attribute signal at level N may be split into a high-pass signal H(N) and a low-pass signal L(N). L(N) may generate a prediction signal P(N) based on the prediction process (1202). A difference signal D(N) may be generated based on the difference between H(N) and P(N). The difference signal D(N) may further be updated to generate an update signal U(N). The sum of U(N) and L(N) may generate an updated low-pass signal L'(N). L'(N) may further be split into a high-pass signal H(N-1) and a low-pass signal L(N-1) at a subsequent level (N-1). L(N-1) may generate a prediction signal P(N-1) at level N-1. A difference signal D(N-1) at level N-1 may be generated based on the difference between H(N-1) and P(N-1). The difference signal D(N-1) may further be updated to generate an update signal U(N-1) at level N-1. The sum of U(N-1) and L(N-1) may generate an updated low-pass signal L'(N-1) at level N-1.
The updated low-pass signal L '(N-1) may also be decomposed into D (N-2) and L' (N-2). The splitting step may be repeatedly applied until an updated low-pass signal L' (0) of the base layer is obtained.
In fig. 13, the inverse transform (1300) in lifting-based attribute coding is provided. As shown in fig. 13, the low-pass signal L(0) at level zero may be generated based on the difference between the updated low-pass signal L'(0) and the update signal U(0). The update signal U(0) is obtained by updating the difference signal D(0). L(0) may further generate a prediction signal P(0) based on the prediction process (1302). P(0) and D(0) are then added to generate a high-pass signal H(0). H(0) and L(0) may be combined to generate an updated low-pass signal L'(1) at level one. The combining step may be applied repeatedly until the high-pass signal H(N) and the low-pass signal L(N) at level N are generated. H(N) and L(N) may then be combined to form the reconstructed attribute signal.
Fig. 14A shows a forward transform (1400A) in RAHT-based attribute coding, and fig. 14B shows an inverse transform (1400B) in RAHT-based attribute coding. In fig. 14A and fig. 14B, the transform weights are a = √w₀ / √(w₀ + w₁) and b = √w₁ / √(w₀ + w₁), where w₀ is the notation of the weight of the input coefficient F_{l+1,2n}, and w₁ is the notation of the weight of the input coefficient F_{l+1,2n+1}.
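A single RAHT butterfly can be sketched as follows: the two input coefficients are rotated by the orthonormal pair (a, b) above, so the inverse is simply the transposed rotation. The sign convention on the AC coefficient is an illustrative assumption.

```python
# One RAHT butterfly: merge two child coefficients (weights w0, w1) into
# a DC (low-pass) and an AC (high-pass) coefficient.
import math

def raht_forward(f0, f1, w0, w1):
    a = math.sqrt(w0 / (w0 + w1))
    b = math.sqrt(w1 / (w0 + w1))
    dc = a * f0 + b * f1          # passed up to level l with weight w0 + w1
    ac = -b * f0 + a * f1         # quantized and entropy coded
    return dc, ac

def raht_inverse(dc, ac, w0, w1):
    a = math.sqrt(w0 / (w0 + w1))
    b = math.sqrt(w1 / (w0 + w1))
    # The rotation is orthonormal (a^2 + b^2 = 1), so invert by transpose.
    return a * dc - b * ac, b * dc + a * ac

dc, ac = raht_forward(100.0, 120.0, w0=3, w1=1)
print(raht_inverse(dc, ac, w0=3, w1=1))  # ~(100.0, 120.0)
```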
Node-based geometry and attribute coding of a point cloud is tree-based geometry and attribute coding, where the point cloud is represented as a general tree structure that includes not only octree partitions but also quadtree and binary tree partitions. The root of the tree contains the whole volume of the point cloud, while the intermediate nodes of the tree contain sub-volumes (or sub-trees) of the point cloud.
For simplicity and clarity, the following notation may be applied in node-based geometry and attribute coding: (a) the root node is at depth 0 of the tree; (b) after one level of partitioning, the resulting nodes are at depth 1 of the tree; (c) after k levels of partitioning, the resulting nodes are at depth k of the tree, until all nodes are unit nodes, i.e., the size of a node in all three dimensions is one.
Fig. 15 illustrates an example of an octree partition (1510) and an octree structure (1520) corresponding to the octree partition (1510), according to some embodiments of the present disclosure. Fig. 15 shows two levels of partitioning in the octree partition (1510). The octree structure (1520) includes a node (N0) corresponding to the cube box of the octree partition (1510). At the first level, the cube box is partitioned into 8 sub-cube boxes numbered 0 through 7 according to the numbering technique shown in FIG. 7. The occupancy code for the partition of node N0 is "10000001" in binary, indicating that the first sub-cube box represented by node N0-0 and the eighth sub-cube box represented by node N0-7 include points of the point cloud and the other sub-cube boxes are empty.
Then, at the second level of partitioning, the first sub-cube box (represented by node N0-0) and the eighth sub-cube box (represented by node N0-7) are each further partitioned into eight octants. For example, the first sub-cube box (represented by node N0-0) is partitioned into 8 smaller sub-cube boxes numbered 0 through 7 according to the numbering technique shown in FIG. 7. The occupancy code for the partition of node N0-0 is "00011000" in binary, indicating that the fourth smaller sub-cube box (represented by node N0-0-3) and the fifth smaller sub-cube box (represented by node N0-0-4) include points of the point cloud and the other smaller sub-cube boxes are empty. At the second level, the eighth sub-cube box (represented by node N0-7) is similarly partitioned into 8 smaller sub-cube boxes, as shown in fig. 15.
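The 8-bit occupancy codes quoted above can be reproduced with a short sketch, assuming child index 0 maps to the most significant bit (a convention consistent with "10000001" for node N0 and "00011000" for node N0-0).

```python
# Derive an 8-bit occupancy code from the set of occupied child indices.
def occupancy_code(occupied_children):
    """occupied_children: set of child indices 0..7 that contain points."""
    code = 0
    for child in occupied_children:
        code |= 1 << (7 - child)   # child 0 -> MSB, child 7 -> LSB
    return format(code, "08b")

print(occupancy_code({0, 7}))      # "10000001"  (node N0)
print(occupancy_code({3, 4}))      # "00011000"  (node N0-0)
```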
In this disclosure, instead of coding the attributes after the geometry coding is completed, the geometry of the point cloud may first be encoded until a depth k is reached, where k may be specified by the encoder and transmitted in the bitstream. For each occupied node at depth k, which can be viewed as a sub-volume (or sub-tree) of the point cloud, the geometry information of all points in the node (sub-tree) may be encoded first, followed by the attribute coding of all points in the node. In another embodiment, the geometry coding and the attribute coding of all points in a node (sub-tree) may be performed in an interleaved manner. In either approach, a node (sub-tree) at depth k may be treated as a top-level coding unit. Such a concept is similar to the LCU used in the HEVC video coding standard. In point cloud coding, each node at depth k forms a separate tree and may be considered an LCU, as shown in fig. 16.
As shown in fig. 16, the root node at depth k =0 may be divided into four nodes at depth k =1 by quadtree division, where two nodes (e.g., node "1" and node "6") of the four nodes at depth k =1 may be occupied nodes. The two occupied nodes at depth k =1 may also be split at subsequent depths, e.g. at depth k =2 and depth k =3, respectively, and form separate trees, respectively. Thus, each of the occupied nodes at depth k =1 may be considered a single LCU. For example, node "1" at depth k =1 may be considered a first LCU 1602, and node "6" at depth k =1 may be considered a second LCU 1604. For simplicity and clarity, the node at depth k may be named an LCU. Thus, a node and an LCU may be interchangeable terms as applied in this disclosure.
The generated bitstream of both the geometry and the attributes of each node can be transmitted without waiting for the completion of the geometry coding of the entire point cloud. On the other hand, a decoder can decode and display all the points of a node without waiting for the completion of the geometry decoding of the entire point cloud. In this way, low-latency encoding and decoding can be achieved.
In one embodiment, the occupied nodes (or LCUs) at depth k may be coded in Morton order. In another embodiment, the occupied nodes at depth k may be coded in other space-filling orders besides the Morton code order (or Morton order).
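A minimal sketch of Morton (z-order) ordering by bit interleaving follows; the interleaving order (x above y above z) and the coordinate bit width are illustrative conventions.

```python
# Morton code: interleave the bits of the x, y, z node coordinates so that
# sorting by the code yields a z-order (space-filling) traversal.
def morton_code(x, y, z, bits=10):
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i + 2)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i)
    return code

nodes = [(3, 1, 0), (0, 2, 2), (1, 1, 1)]
print(sorted(nodes, key=lambda p: morton_code(*p)))  # Morton coding order
```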
The coding of the geometry information and attribute information of an LCU may depend on information of the neighboring LCUs of the LCU. In one embodiment, the coding of the geometry information and attribute information of an LCU may not depend on information of the neighboring LCUs of the LCU. In that case, prediction/referencing across LCU boundaries is disabled, and context and history information also need to be reinitialized for each LCU. Thus, maximum parallelism, i.e., LCU-level parallel encoding and decoding, can be enabled at depth k.
In another embodiment, the coding of the geometry information and attribute information of an LCU may depend on information of the already coded neighboring nodes of the LCU and information of the already coded child nodes of those neighboring nodes. Thus, better compression efficiency can be achieved.
In the present disclosure, updates to node-based geometry coding are provided, including a method of node-based parallel coding and a method of deciding the geometric coding mode at each node (LCU) level.
As mentioned above, to enable node-based (or LCU-based) parallel coding, the coding of the geometry and attribute information of an LCU may not depend on information of the neighboring LCUs of the LCU. Thus, prediction/referencing across LCU boundaries may be disabled, and context and history information may also need to be reinitialized for each LCU.
In one embodiment of the present disclosure, at each LCU, the coding state, e.g., the context of entropy coding, the geometric occupancy history information, and/or other necessary state information for LCU-based coding (or node-based coding), may be set to an initial state, which may be the state at the beginning of the coding of the point cloud.
In another embodiment, instead of using the initial state, the coding state, such as the context of entropy coding and the geometric occupancy history information, may be stored just before the first LCU at octree depth k is reached, e.g., when the encoding of the point cloud at octree depth k-1 is completed, where the nodes at octree depth k are considered LCUs. When encoding each of the LCUs at octree depth k, the coding state may first be set with the above-mentioned stored coding state. In this way, node-based (or LCU-based) parallel coding can be achieved. In addition, the stored coding state can help improve coding performance compared to the initial coding state obtained before the coding process begins.
Fig. 17 shows a flowchart illustrating exemplary node-based (LCU-based) parallel coding using the stored coding state. In fig. 17, N LCUs (nodes) may be set at octree depth k, where N is a positive integer. In contrast to the related example, the coding state may be stored prior to encoding any of the N LCUs at octree depth k. At the start of the coding of each LCU at octree depth k, the stored state may be used to restore or set the coding state.
As shown in fig. 17, the node-based coding process (1700) (or process (1700)) may start at (S1710), where the point cloud may be encoded at octree depth k-1. The process (1700) may then proceed to (S1720), where a plurality of LCUs may be determined at octree depth k. In some embodiments, the plurality of LCUs may further be ordered, for example based on the Morton order or another space-filling order. The number of the plurality of LCUs at octree depth k may be equal to a positive integer N. In addition, an index i indicating the coding order of the plurality of LCUs at octree depth k may be applied. The index i may range from 0 to N-1. At (S1720), the index i may be set to zero, indicating the first LCU of the plurality of LCUs at octree depth k.
At (S1730), a first determination process may be performed to determine whether the index i is less than N. In response to the index i being determined to be equal to or greater than N, the process (1700) may proceed to (S1790), which indicates that all LCUs at octree depth k have been coded and the process (1700) is complete. In response to the index i being determined to be less than N, the process (1700) may proceed to (S1740), where a second determination process may be performed to determine whether the index i is equal to 0. An index i equal to 0 indicates that the first LCU of the plurality of LCUs is to be coded. An index i not equal to 0 indicates that an LCU of the plurality of LCUs other than the first LCU is to be coded.
When the index i is determined to be equal to 0 at (S1740), the process (1700) may continue to (S1750), where the coding state may be stored. As mentioned above, the coding state may be obtained after coding the point cloud at octree depth k-1, and the coding state may be stored before the first LCU of the plurality of LCUs of the point cloud is coded. The process (1700) may then continue to (S1770), where the first LCU may be coded. The process (1700) may further continue to (S1780), where the index i may be incremented by one. Accordingly, the LCU immediately following the LCU just coded at (S1770) (e.g., the first LCU) may be selected for coding. The process (1700) may then return to (S1730) to perform the first determination process again.
Still referring to (S1740), when the index i is determined not to be equal to 0, the process (1700) may proceed to (S1760). At (S1760), the coding state may be set, or otherwise determined, with the coding state stored at (S1750). The process (1700) may then continue to (S1770), where the LCU with index i may be coded based on the coding state set at (S1760). Thus, whenever one of the plurality of LCUs is to be coded, the coding state may first be set with the stored coding state, and node-based (LCU-based) parallel coding may thereby be achieved.
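As an illustration only, the control flow of the process (1700) may be sketched in Python as below. The encoder object and its helpers (encode_octree_levels, collect_nodes, encode_lcu) and the coding_state attribute are hypothetical placeholders, not part of TMC13 or any standard API; the sketch merely mirrors the store/restore pattern of steps (S1710) through (S1780).

```python
import copy

def code_lcus_at_depth_k(encoder, point_cloud, k):
    """Sketch of process (1700): LCU-based coding with a stored coding state.

    Assumes a hypothetical `encoder` exposing the helpers named below.
    """
    # (S1710): code the point cloud down to octree depth k-1.
    encoder.encode_octree_levels(point_cloud, max_depth=k - 1)

    # (S1720): determine the N LCUs at depth k, e.g., in Morton order.
    lcus = encoder.collect_nodes(point_cloud, depth=k)

    stored_state = None
    for i, lcu in enumerate(lcus):          # (S1730)/(S1780): loop while i < N
        if i == 0:
            # (S1750): snapshot the coding state (entropy contexts, geometric
            # occupancy history, ...) obtained after coding depth k-1.
            stored_state = copy.deepcopy(encoder.coding_state)
        else:
            # (S1760): restore the snapshot so that no LCU depends on a
            # previously coded LCU; this is what permits parallel coding.
            encoder.coding_state = copy.deepcopy(stored_state)
        # (S1770): code the geometric information of the current LCU.
        encoder.encode_lcu(lcu)
```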
In the discussion above, the plurality of LCUs are coded according to octree-based geometric coding. However, other geometric coding methods may also be applied in the process (1700). For example, prediction tree based geometric coding may also be applied to code the plurality of LCUs.
In a related example, for an LCU, a geometric coding mode may be determined based on heuristics. For example, octree-based geometric coding may be applied to code relatively dense point clouds, while prediction tree based geometric coding may be applied to code sparse point clouds, such as those generated by Lidar on an autonomous driving vehicle.
In one embodiment, the density of an LCU may be used to determine the geometric coding mode. Octree-based geometric coding and prediction tree based geometric coding are used here as illustrative examples without loss of generality; other types of geometric coding modes may be applied as well.
To determine the geometric coding mode, the density of the LCU may first be calculated as in equation (2):

LCU_density = (number of points in the LCU) / (volume of the LCU) (equation 2)
To calculate the volume of the LCU, a nominal bounding box of the LCU may be applied. The nominal bounding box of the LCU may be determined based on the octree partitioning depth, the octree partitioning type, and the bounding box of the point cloud. For example, assume that the bounding box of the point cloud is (2^d_x, 2^d_y, 2^d_z). At octree partitioning depth k, the nominal bounding box of each node (LCU) may be reduced to (2^n_x, 2^n_y, 2^n_z) based on the octree partitioning type (octree, quadtree, or binary tree partitioning), where n_x ≤ d_x, n_y ≤ d_y, and n_z ≤ d_z. Thus, the volume of the LCU may be calculated as in equation (3):

volume of LCU = 2^(n_x + n_y + n_z) (equation 3)
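Expressed as code, equation (3) is a one-liner; this is a sketch under the assumption that the nominal LCU dimensions are the powers of two implied by the exponents n_x, n_y, and n_z above:

```python
def lcu_volume_nominal(n_x: int, n_y: int, n_z: int) -> int:
    """Volume of the nominal 2^n_x x 2^n_y x 2^n_z bounding box, equation (3)."""
    return 2 ** (n_x + n_y + n_z)
```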
in another embodiment, the actual bounding box for the LCU may be calculated based on points inside the LCU. The 3D coordinates of all points in the LCU may be expressed as (x) i ,y i ,z i ) (for i =0,1,.., N-1), where N is the number of points in the LCU. The minimum and maximum values along the x, y and z dimensions can be calculated in equations (4) to (9):
x min =min(x 0 ,x 1 ,...,x N-1 ) Equation (4)
x max =max(x 0 ,x 1 ,...,x N-1 ) Equation (5)
y min =min(y 0 ,Y 1 ,...,y N-1 ) Equation (6)
y max =max(y 0 ,y 1 ,...,y N-1 ) Equation (7)
Z min =min(z 0 ,z 1 ,...,z N-1 ) Equation (8)
z max =max(z 0 ,z 1 ,...,z N-1 ) Equation (9)
The volume of the LCU may then be calculated as in equation (10):

volume of LCU = (x_max + 1 - x_min)(y_max + 1 - y_min)(z_max + 1 - z_min) (equation 10)
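A minimal sketch combining equations (2) and (4)-(10), assuming the points of the LCU are given as a non-empty list of integer (x, y, z) tuples:

```python
def lcu_density(points):
    """LCU density per equation (2), using the actual bounding box of the
    points (equations (4)-(10)); `points` is a non-empty list of integer
    (x, y, z) coordinates."""
    xs, ys, zs = zip(*points)
    volume = ((max(xs) + 1 - min(xs)) *   # equations (4), (5)
              (max(ys) + 1 - min(ys)) *   # equations (6), (7)
              (max(zs) + 1 - min(zs)))    # equations (8), (9) -> equation (10)
    return len(points) / volume           # equation (2)
```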
Given the density of the LCU and a threshold D_th, the geometric coding mode may be determined as follows:

If LCU_density ≤ D_th, prediction tree based geometric coding may be used for the LCU.

Otherwise (LCU_density > D_th), octree-based geometric coding may be used for the LCU.
In another embodiment, two thresholds D_th_low and D_th_high may be defined, where D_th_low < D_th_high. The geometric coding mode may then be determined as follows:

If D_th_low ≤ LCU_density ≤ D_th_high, prediction tree based geometric coding may be used for the LCU;

Otherwise, if LCU_density < D_th_low or LCU_density > D_th_high, octree-based geometric coding may be used for the LCU.
In another embodiment, two density thresholds D_th_low and D_th_high and a point number threshold N_th may be defined. The geometric coding mode may be determined as follows, where N is the number of points in the LCU:

If D_th_low ≤ LCU_density ≤ D_th_high and N ≥ N_th, prediction tree based geometric coding may be used for the LCU.

Otherwise, if LCU_density < D_th_low, LCU_density > D_th_high, or N < N_th, octree-based geometric coding may be used for the LCU.
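The mode decision of this embodiment may be sketched as follows, reusing the lcu_density sketch above; the threshold values are encoder configuration choices, not values fixed by the text:

```python
def choose_geometry_coding_mode(points, d_th_low, d_th_high, n_th):
    """Two density thresholds plus a point-count threshold: prediction tree
    coding for sparse-but-populated LCUs, octree coding otherwise."""
    density = lcu_density(points)
    if d_th_low <= density <= d_th_high and len(points) >= n_th:
        return "predictive_tree"
    return "octree"
```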
Similarly, a plurality of density thresholds and point number thresholds may be defined and used to determine a geometric coding mode among two or more candidates. A similar approach may be used to determine an attribute coding mode among two or more candidates.
Since the geometric coding mode may differ from LCU to LCU, signaling information needs to be sent in the bitstream to indicate to the decoder which geometric coding mode is used for an LCU. The corresponding syntax may be specified as in Table 1:
table 1: syntax table specifying geometric coding modes
As shown in Table 1, a geometric coding mode flag (e.g., geometry_coding_mode) may specify the geometric coding mode used to code an LCU. When the flag is set to 0, octree-based geometric coding may be applied. When the flag is set to 1, prediction tree based geometric coding may be used. Accordingly, when the flag is equal to 0, first signaling information (e.g., octree_lcu_coding()) may be signaled based on Table 1 to specify the use of octree-based geometric coding. When the flag is equal to 1, second signaling information (e.g., predictive_tree_lcu_coding()) may be signaled based on Table 1 to specify the use of prediction tree based geometric coding. Note that the geometric coding mode flag may be entropy coded with context. In another embodiment, the flag may be coded with bypass coding.
In another embodiment, three modes may be used. Without loss of generality, the three geometric coding modes may be denoted first_mode, second_mode, and third_mode. The corresponding syntax may be specified as in Table 2:
table 2: syntax table specifying three geometric coding modes
As shown in Table 2, a geometric coding mode flag (e.g., geometry_coding_mode) may specify the geometric coding mode for the LCU. When the flag is set to 0, first_mode geometric coding may be used. When the flag is set to 1, second_mode geometric coding may be used. Otherwise, when the flag is set to neither 0 nor 1, third_mode geometric coding may be used. Accordingly, when the flag is equal to 0, first signaling information (e.g., first_mode_lcu_coding()) may be signaled based on Table 2 to specify the use of first_mode geometric coding. When the flag is equal to 1, second signaling information (e.g., second_mode_lcu_coding()) may be signaled based on Table 2 to specify the use of second_mode geometric coding. When the flag is set to a value other than 0 or 1, third signaling information (e.g., third_mode_lcu_coding()) may be signaled based on Table 2 to specify the use of third_mode geometric coding.
Without loss of generality, assume the first mode is applied most frequently. The geometric coding mode flag (e.g., geometry_coding_mode) may then be binarized as follows: (a) Bin0 = 1 may represent first_mode; (b) Bin0 = 0 and Bin1 = 1 may represent second_mode; and (c) Bin0 = 0 and Bin1 = 0 may represent third_mode, where Bin0 and Bin1 may be entropy coded with separate contexts.
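This truncated-unary-style mapping can be sketched as below, using the three mode names from Table 2 (each list entry is Bin0 and, where present, Bin1):

```python
def binarize_geometry_coding_mode(mode: str) -> list[int]:
    """first_mode -> [1]; second_mode -> [0, 1]; third_mode -> [0, 0].
    Bin0 and Bin1 would each be entropy coded with its own context."""
    bins = {"first_mode": [1], "second_mode": [0, 1], "third_mode": [0, 0]}
    return bins[mode]
```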
The proposed methods may be used separately or combined in any order. Further, each of the methods (or embodiments), the encoder, and the decoder may be implemented by processing circuitry (e.g., one or more processors or one or more integrated circuits). In one example, the one or more processors may execute a program stored in a non-transitory computer-readable medium.
It should be noted that the present disclosure is not limited to TMC13 software, MPEG-PCC, or AVS-PCC standards. The present disclosure provides a general solution that may be applied to other systems, such as other PCC systems.
Fig. 18 and 19 show flowcharts outlining the process (1800) and the process (1900) according to embodiments of the present disclosure. Processes (1800) and (1900) may be used during an encoding and/or decoding process of a point cloud. In various embodiments, processes (1800) and (1900) may be performed by processing circuitry, e.g., processing circuitry in terminal device (110), processing circuitry that performs the functions of encoder (203) and/or decoder (201), processing circuitry that performs the functions of encoder (300), decoder (400), encoder (500), and/or decoder (600), etc. In some embodiments, processes (1800) and (1900) may be implemented in software instructions such that when the software instructions are executed by processing circuitry, the processing circuitry performs processes (1800) and (1900), respectively.
As shown in fig. 18, the process (1800) starts (S1801) and proceeds to (S1810).
At (S1810), geometric coding may be performed on the point cloud at a first segmentation depth.
At (S1820), a plurality of LCUs of the point cloud may be determined at a second segmentation depth.
At (S1830), a coding state of an LCU of the plurality of LCUs of the point cloud may be set at a second segmentation depth.
At (S1840), geometric coding may be performed on the plurality of LCUs of the point cloud at the second segmentation depth based on the coding state of the LCUs at the second segmentation depth.
In some embodiments, the geometric coding may include one of octree-based geometric coding and prediction tree-based coding.
In an embodiment, the coding state of the LCU may be set with an initial state of the point cloud, where the initial state of the point cloud may be obtained before the point cloud is coded, starting from its root node, based on the geometric coding.
In another embodiment, when the LCU is a first LCU of a plurality of LCUs of the point cloud at the second segmentation depth, the coding state may be obtained and stored after coding the point cloud based on geometric coding at the first segmentation depth.
In another embodiment, when the LCU is not the first LCU of the plurality of LCUs of the point cloud at the second partition depth, the coding state of the LCU may be set with a stored coding state. The stored coding state may be either (i) obtained after the point cloud is coded based on the geometric coding at the first partition depth, or (ii) stored prior to the first LCU of the plurality of LCUs of the point cloud being coded based on the geometric coding at the second partition depth.
In some implementations, the coding state may include at least one of an entropy coding context associated with the LCU or geometric occupancy history information associated with the LCU.
In some implementations, each of the plurality of LCUs may include a respective node at the second segmentation depth.
As shown in fig. 19, the process (1900) starts (S1901) and proceeds to (S1910).
At (S1910), the density of an LCU of the point cloud may be determined. The density of the LCU may be the ratio of the number of points in the LCU to the volume of the LCU.
At (S1920), a geometric coding mode of the LCU may be determined based on the density of the LCU and a first threshold.
At (S1930), geometric coding mode information may be signaled in the bitstream. The geometric coding mode information may indicate the geometric coding mode of the LCU that was determined based on the density of the LCU and the first threshold.
In an example, a geometric coding mode of an LCU may be determined to be predictive tree geometric coding based on a density of the LCU being equal to or less than a first threshold. In another example, a geometric coding mode of an LCU may be determined to be octree-based geometric coding based on a density of the LCU being greater than a first threshold.
In an example, the geometric coding mode of the LCU may be determined to be predictive tree geometric coding based on the density of the LCU being equal to or greater than the first threshold and equal to or less than a second threshold, where the second threshold may be greater than the first threshold. In another example, the geometric coding mode of the LCU may be determined to be octree-based geometric coding based on the density of the LCU being less than the first threshold or greater than the second threshold.
In an example, the geometric coding mode of the LCU may be determined to be predictive tree geometric coding based on (i) the density of the LCU being equal to or greater than the first threshold and equal to or less than the second threshold and (ii) the number of points in the LCU being equal to or greater than a point number threshold. In another example, the geometric coding mode of the LCU may be determined to be octree-based geometric coding based on one of (i) the density of the LCU being less than the first threshold or greater than the second threshold and (ii) the number of points in the LCU being less than the point number threshold.
In some embodiments, the geometric coding mode information may be signaled with a first value based on the geometric coding mode being a first geometric coding mode. The geometric coding mode information may be signaled with a second value based on the geometric coding mode being a second geometric coding mode.
In process (1900), the geometric coding mode information may be entropy coded with context or may be coded with bypass coding.
In an embodiment, the geometric coding mode information may be signaled with a first value based on the geometric coding mode being a first geometric coding mode. In another embodiment, the geometric coding mode information may be signaled with a second value based on the geometric coding mode being a second geometric coding mode. In yet another embodiment, the geometric coding mode information may be signaled with a third value based on the geometric coding mode being a third geometric coding mode.
In some embodiments, the binarization information may be signaled with a first value only in the first bin, wherein the binarization information having the first value may indicate the first geometric coding mode. In some embodiments, the binarization information may be signaled with a second value in a first bin and a first value in a subsequent second bin, wherein the binarization information having the second value in the first bin and having the first value in the second bin may indicate the second geometric coding mode. In some embodiments, the binarization information may be signaled with a second value in the first bin and with a second value in the second bin, wherein the binarization information having the second value in the first bin and the second bin may indicate the third geometric coding mode.
In some implementations, the binarization information in a first bin may be entropy coded with a first context and the binarization information in a second bin may be entropy coded with a second context.
The techniques described above may be implemented as computer software using computer-readable instructions and physically stored in one or more computer-readable media. For example, fig. 20 illustrates a computer system (2000) suitable for implementing certain embodiments of the disclosed subject matter.
The computer software may be coded using any suitable machine code or computer language that may be subject to assembly, compilation, linking, or similar mechanisms to create code comprising instructions that may be executed directly by one or more computer central processing units (CPUs), graphics processing units (GPUs), and the like, or through interpretation, micro-code execution, and the like.
The instructions may be executed on various types of computers or components thereof, including, for example, personal computers, tablets, servers, smart phones, gaming devices, internet of things devices, and so forth.
The components shown in fig. 20 of computer system (2000) are exemplary in nature and are not intended to suggest any limitation as to the scope of use or functionality of the computer software implementing embodiments of the present disclosure. Neither should the configuration of the components be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary embodiments of the computer system (2000).
The computer system (2000) may include some human interface input devices. Such human interface input devices may be responsive to input by one or more human users through, for example, tactile input (e.g., keystrokes, sliding, data glove movement), audio input (e.g., speech, clapping hands), visual input (e.g., gestures), olfactory input (not depicted). Human interface devices may also be used to capture certain media that are not necessarily directly related to human conscious input, such as audio (e.g., voice, music, ambient sounds), images (e.g., scanned images, photographic images obtained from still-image cameras), video (e.g., two-dimensional video, three-dimensional video including stereoscopic video).
The input human interface devices may include one or more of the following (only one of each depicted): keyboard (2001), mouse (2002), trackpad (2003), touch screen (2010), data glove (not shown), joystick (2005), microphone (2006), scanner (2007), and camera (2008).
The computer system (2000) may also include certain human interface output devices. Such human interface output devices may stimulate the senses of one or more human users through, for example, tactile output, sound, light, and smell/taste. Such human interface output devices may include tactile output devices (e.g., tactile feedback through the touch screen (2010), data glove (not shown), or joystick (2005), although there may also be tactile feedback devices that do not serve as input devices), audio output devices (e.g., speakers (2009), headphones (not depicted)), visual output devices (e.g., screens (2010), including CRT screens, LCD screens, plasma screens, and OLED screens, each with or without touch screen input capability and each with or without tactile feedback capability, some of which may be capable of outputting two-dimensional visual output or more than three-dimensional output through means such as stereographic output, virtual reality glasses (not depicted), holographic displays, and smoke tanks (not depicted)), and printers (not depicted).
The computer system (2000) may also include human-accessible storage devices and their associated media, such as optical media including CD/DVD ROM/RW (2020) with CD/DVD or similar media (2021), thumb drive (2022), removable hard drive or solid state drive (2023), legacy magnetic media such as magnetic tape and floppy disk (not depicted), specialized ROM/ASIC/PLD-based devices such as security dongles (not depicted), and the like.
Those skilled in the art will also appreciate that the term "computer-readable medium" as used in relation to the presently disclosed subject matter does not include transmission media, carrier waves, or other transitory signals.
The computer system (2000) may also include an interface to one or more communication networks. The networks may, for example, be wireless, wired, or optical. The networks may further be local, wide area, metropolitan, vehicular and industrial, real-time, delay tolerant, and so on. Examples of networks include local area networks such as Ethernet and wireless LANs, cellular networks including GSM, 3G, 4G, 5G, and LTE, TV wired or wireless wide area digital networks including cable TV, satellite TV, and terrestrial broadcast TV, and vehicular and industrial networks including CANBus. Certain networks typically require external network interface adapters attached to certain general-purpose data ports or peripheral buses (2049) (such as, for example, USB ports of the computer system (2000)); other networks are typically integrated into the core of the computer system (2000) by attachment to a system bus as described below (e.g., an Ethernet interface into a PC computer system or a cellular network interface into a smartphone computer system). Using any of these networks, the computer system (2000) may communicate with other entities. Such communication may be unidirectional receive-only (e.g., broadcast TV), unidirectional send-only (e.g., CANBus to certain CANBus devices), or bidirectional, e.g., to other computer systems using a local or wide area digital network. Certain protocols and protocol stacks may be used on each of those networks and network interfaces as described above.
The human interface device, human accessible storage device and network interface described above may be attached to the core (2040) of the computer system (2000).
The core (2040) may include one or more central processing units (CPUs) (2041), graphics processing units (GPUs) (2042), specialized programmable processing units in the form of field programmable gate arrays (FPGAs) (2043), hardware accelerators (2044) for certain tasks, and so forth. These devices, along with read-only memory (ROM) (2045), random access memory (RAM) (2046), and internal mass storage (2047) such as internal non-user-accessible hard drives and SSDs, may be connected through a system bus (2048). In some computer systems, the system bus (2048) may be accessible in the form of one or more physical plugs to enable extension by additional CPUs, GPUs, and the like. Peripheral devices may be attached either directly to the core's system bus (2048) or through a peripheral bus (2049). Architectures for the peripheral bus include PCI, USB, and the like.
The CPU (2041), GPU (2042), FPGA (2043), and accelerators (2044) may execute certain instructions that, in combination, may constitute the aforementioned computer code. The computer code may be stored in the ROM (2045) or RAM (2046). Transitional data may also be stored in the RAM (2046), whereas permanent data may be stored, for example, in the internal mass storage (2047). Fast storage and retrieval to any of the memory devices may be enabled through the use of cache memory, which may be closely associated with one or more of the CPU (2041), GPU (2042), mass storage (2047), ROM (2045), RAM (2046), and the like.
The computer-readable medium may have computer code thereon for performing various computer-implemented operations. The media and computer code may be those specially designed and constructed for the purposes of the present disclosure, or they may be of the kind well known and available to those having skill in the computer software arts.
By way of example, and not by way of limitation, a computer system having the architecture (2000), and specifically the core (2040), may provide functionality as a result of a processor (including a CPU, GPU, FPGA, accelerator, and the like) executing software embodied in one or more tangible computer-readable media. Such computer-readable media may be media associated with the user-accessible mass storage introduced above, as well as certain storage of the core (2040) that is of a non-transitory nature, such as the core-internal mass storage (2047) or ROM (2045). Software implementing various embodiments of the present disclosure may be stored in such devices and executed by the core (2040). The computer-readable medium may include one or more memory devices or chips, according to particular needs. The software may cause the core (2040), and specifically the processors therein (including CPUs, GPUs, FPGAs, and the like), to perform particular processes or particular portions of particular processes described herein, including defining data structures stored in the RAM (2046) and modifying such data structures according to the processes defined by the software. In addition or as an alternative, the computer system may provide functionality as a result of logic hardwired or otherwise embodied in a circuit (e.g., the accelerator (2044)), which may operate in place of or together with software to execute particular processes or particular portions of particular processes described herein. Reference to software may encompass logic, and vice versa, where appropriate. Reference to a computer-readable medium may encompass a circuit (such as an integrated circuit (IC)) storing software for execution, a circuit embodying logic for execution, or both, where appropriate. The present disclosure encompasses any suitable combination of hardware and software.
While this disclosure has described several exemplary embodiments, there are alterations, permutations, and various substitute equivalents, which fall within the scope of this disclosure. It will thus be appreciated that those skilled in the art will be able to devise numerous systems and methods which, although not explicitly shown or described herein, embody the principles of the disclosure and are thus within its spirit and scope.
Claims (20)
1. A method of point cloud geometric encoding in a point cloud encoder, comprising:
performing geometric coding on the point cloud at a first partition depth;
determining a plurality of Largest Coding Units (LCUs) of the point cloud at a second partition depth;
setting a coding state of an LCU of the plurality of LCUs of the point cloud at the second partition depth; and
performing the geometric coding on a plurality of LCUs of the point cloud at the second partition depth based on a coding state of the LCU at the second partition depth.
2. The method of claim 1, wherein the geometric coding comprises one of octree-based geometric coding and prediction tree-based coding.
3. The method of claim 1, wherein setting a coding state of the LCU comprises:
setting the coding state of the LCU with an initial state of the point cloud obtained prior to coding the point cloud based on the geometric coding.
4. The method of claim 1, wherein setting a coding state of the LCU comprises:
based on the LCU being a first LCU of the plurality of LCUs of the point cloud at the second partition depth, storing the coding state obtained after coding the point cloud based on the geometric coding at the first partition depth.
5. The method of claim 1, wherein setting a coding state of the LCU comprises:
based on the LCU not being the first LCU of the plurality of LCUs of the point cloud at the second partition depth, setting the coding state of the LCU with a stored coding state that is either (i) obtained after coding the point cloud based on the geometric coding at the first partition depth, or (ii) stored before coding the first LCU of the plurality of LCUs of the point cloud based on the geometric coding at the second partition depth.
6. The method of claim 1, wherein the coding state comprises at least one of context of entropy coding associated with the LCU or geometric occupancy history information associated with the LCU.
7. The method of claim 1, wherein each of the plurality of LCUs comprises a respective node at the second partition depth.
8. A method of point cloud geometric encoding in a point cloud encoder, comprising:
determining a density of a Largest Coding Unit (LCU) of a point cloud, the density of the LCU being a ratio of a number of points in the LCU to a volume of the LCU;
determining a geometric coding mode of the LCU based on a density of the LCU and a first threshold; and
signaling geometric coding mode information in a bitstream, the geometric coding mode information indicating the determined geometric coding mode of the LCU based on the density of the LCU and the first threshold.
9. The method of claim 8, wherein determining a geometric coding mode of the LCU further comprises:
determining, based on a density of the LCU being equal to or less than the first threshold, that a geometric coding mode of the LCU is predictive tree geometric coding; and
determining that a geometric coding mode of the LCU is octree-based geometric coding based on the density of the LCU being greater than the first threshold.
10. The method of claim 8, wherein determining a geometric coding mode of the LCU further comprises:
determining that a geometric coding mode of the LCU is predictive tree geometric coding based on a density of the LCU being equal to or greater than the first threshold and equal to or less than a second threshold, the second threshold being greater than the first threshold; and
determining that a geometric coding mode of the LCU is octree-based geometric coding based on the density of the LCU being less than the first threshold or greater than the second threshold.
11. The method of claim 8, wherein determining a geometric coding mode of the LCU further comprises:
determining that a geometric coding mode of the LCU is predictive tree geometric coding based on (i) a density of the LCU being equal to or greater than the first threshold and equal to or less than a second threshold and (ii) a number of points in the LCU being equal to or greater than a number of points threshold; and
determining that the geometric coding mode of the LCU is octree-based geometric coding based on one of (i) the density of the LCU being less than the first threshold or greater than the second threshold and (ii) the number of points in the LCU being less than the point number threshold.
12. The method of claim 8, wherein signaling the geometric coding mode information further comprises:
signaling the geometric coding mode information with a first value based on the geometric coding mode being a first geometric coding mode; and
signaling the geometric coding mode information with a second value based on the geometric coding mode being a second geometric coding mode.
13. The method of claim 8, wherein the geometric coding mode information is either entropy coded with context or coded with bypass coding.
14. The method of claim 8, wherein signaling the geometric coding mode information further comprises:
signaling the geometric coding mode information with a first value based on the geometric coding mode being a first geometric coding mode;
signaling the geometric coding mode information with a second value based on the geometric coding mode being a second geometric coding mode; and
signaling the geometric coding mode information with a third value based on the geometric coding mode being a third geometric coding mode.
15. The method of claim 14, wherein signaling the geometric coding mode information further comprises:
signaling binarization information with a first value only in a first bin, the binarization information having the first value indicating a first geometric coding mode;

signaling the binarization information with a second value in the first bin and with the first value in a subsequent second bin, the binarization information having the second value in the first bin and the first value in the second bin indicating a second geometric coding mode; and

signaling the binarization information with the second value in the first bin and with the second value in the second bin, the binarization information having the second value in both the first bin and the second bin indicating a third geometric coding mode.
16. The method of claim 15, wherein the binarization information in the first bin is entropy coded with a first context and the binarization information in the second bin is entropy coded with a second context.
17. An apparatus for processing point cloud data, comprising:
processing circuitry configured to:
performing geometric coding on the point cloud at a first partition depth;
determining a plurality of Largest Coding Units (LCUs) of the point cloud at a second partition depth;
setting a coding state of an LCU of the plurality of LCUs of the point cloud at the second partition depth; and
performing the geometric coding on a plurality of LCUs of the point cloud at the second partition depth based on a coding state of the LCU at the second partition depth.
18. The apparatus of claim 17, wherein the processing circuitry is further configured to:
setting the coding state of the LCU with an initial state of the point cloud obtained prior to coding the point cloud based on the geometric coding.
19. The apparatus of claim 17, wherein the processing circuitry is further configured to:
based on the LCU being a first LCU of the plurality of LCUs of the point cloud at the second partition depth, storing the coding state obtained after coding the point cloud based on the geometric coding at the first partition depth.
20. The apparatus of claim 17, wherein the processing circuitry is further configured to:
based on the LCU not being the first LCU of the plurality of LCUs of the point cloud at the second partition depth, setting the coding state of the LCU with a stored coding state that is either (i) obtained after coding the point cloud based on the geometric coding at the first partition depth, or (ii) stored before coding the first LCU of the plurality of LCUs of the point cloud based on the geometric coding at the second partition depth.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US63/121,835 | 2020-12-04 | | |
| US17/466,729 | 2021-09-03 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| HK40075483A true HK40075483A (en) | 2023-01-20 |