Disclosure of Invention
The technical problem to be solved by the invention is to provide a point cloud geometric compression method based on block division and deep learning, which can achieve a better point cloud compression effect.
In order to solve the technical problem, the invention provides a point cloud geometric compression method based on block division and deep learning, which comprises the following steps:
(1) partitioning the point cloud by using farthest point sampling (FPS) and a K-nearest neighbor (KNN) operation: sampling S structure points from the original point cloud by using farthest point sampling, and then, taking each structure point as a center, sequentially acquiring its K nearest neighbor points in the original point cloud by using the K-nearest neighbor method, namely dividing the original point cloud into S blocks, wherein each block has K points;
(2) sequentially taking the S point cloud blocks obtained in the step (1) as the input of an encoder in a self-encoder, compressing the S blocks by the encoder to obtain S low-dimensional feature vectors, quantizing the low-dimensional feature vectors, and splicing the quantized low-dimensional feature vectors with the coordinates of the S structure points sampled in the step (1) to form the final hidden layer representation of the point cloud;
(3) entropy coding the final hidden layer representation of the point cloud to form a final bit stream, and transmitting the final bit stream to a decoding end;
(4) the decoding end receives the transmitted bit stream and entropy-decodes it into S low-dimensional feature vectors and S structure point coordinates; the decoder of the self-encoder decodes the S low-dimensional feature vectors into S point cloud blocks, and finally the S point cloud blocks are added to the corresponding S structure point coordinates and merged to obtain the final point cloud reconstruction result.
Preferably, in step (1), the partitioning of the point cloud by using the farthest point sampling FPS and the K-nearest neighbor KNN operation specifically includes: for an original point cloud with N points, sampling S structure points from the original point cloud by using farthest point sampling; for each structure point, finding its K nearest neighbor points in the original point cloud by using KNN, and subtracting the coordinates of the structure point from the coordinates of the K neighbor points, namely moving the coordinates of the K points into a coordinate system with the structure point as the origin, so that one block is obtained; for the S structure points, S blocks may be generated, each block having K points, and each block may be considered a "mini" point cloud.
Preferably, in step (2) and step (4), the self-encoder is an end-to-end trained deep neural network, and the input and the output of the network are the original point cloud block and the reconstruction result of the point cloud block, respectively; the self-encoder is divided into three parts: an analysis transform (encoder), quantization, and a synthesis transform (decoder); the chamfer distance is used to constrain the error between the reconstructed block and the input block, and the loss function of the self-encoder is defined as:
Loss = D_CD + λR
wherein D_CD denotes the chamfer distance between the reconstructed block and the input block, R denotes the bit rate estimated from the quantized hidden layer features, and λ denotes the Lagrange multiplier.
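For reference, a commonly used symmetric form of the chamfer distance between an input block P and its reconstruction Q is given below; the exact normalization adopted by the invention may differ from this form.

D_CD(P, Q) = (1/|P|) Σ_{x∈P} min_{y∈Q} ‖x − y‖² + (1/|Q|) Σ_{y∈Q} min_{x∈P} ‖x − y‖²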
The beneficial effects of the invention are as follows: the invention provides a method for dividing a point cloud into small blocks; since the points in a point cloud are independent of one another and lack the regular spatial correlation of voxels, the farthest point sampling and K-nearest neighbor algorithms are used to divide a complex point cloud into a number of small blocks with simple shapes, so that the local structure information of the point cloud is better captured in the subsequent reconstruction process; the block-based self-encoder structure compresses the point cloud, can obtain better results in the reconstruction process, allows the number of points in the reconstructed point cloud to be adjusted freely, and has great reference significance for solving other problems related to point cloud reconstruction.
Detailed Description
A point cloud geometric compression method based on block division and deep learning comprises the following steps:
(1) partitioning the point cloud by using farthest point sampling (FPS) and a K-nearest neighbor (KNN) operation: sampling S structure points from the original point cloud by using farthest point sampling, and then, taking each structure point as a center, sequentially acquiring its K nearest neighbor points in the original point cloud by using the K-nearest neighbor method, namely dividing the original point cloud into S blocks, wherein each block has K points;
(2) sequentially taking the S point cloud blocks obtained in the step (1) as the input of an encoder in a self-encoder, compressing the S blocks by the encoder to obtain S low-dimensional feature vectors, quantizing the low-dimensional feature vectors, and splicing the quantized low-dimensional feature vectors with the coordinates of the S structure points sampled in the step (1) to form the final hidden layer representation of the point cloud;
(3) entropy coding the final hidden layer representation of the point cloud to form a final bit stream, and transmitting the final bit stream to a decoding end;
(4) the decoding end receives the transmitted bit stream and entropy-decodes it into S low-dimensional feature vectors and S structure point coordinates; the decoder of the self-encoder decodes the S low-dimensional feature vectors into S point cloud blocks, and finally the S point cloud blocks are added to the corresponding S structure point coordinates and merged to obtain the final point cloud reconstruction result.
As shown in fig. 1, the point cloud is first divided into S blocks, each block having K points. In the encoding process, the encoding result of the self-encoder and the S sampling points are concatenated to form a point cloud hidden layer representation with a final matrix size of (S, 3+d). During decoding, the hidden representation of the point cloud is split back into the same two parts as the encoder output. Finally, the decoded blocks are assembled to generate a point cloud reconstruction result with a matrix size of (S × k, 3), where k represents the number of points in each reconstructed block point cloud. It should be noted that the number of points of the input block point cloud and the number of points of the reconstructed block point cloud may be set to be unequal, that is, k may not be equal to K.
Different from two-dimensional image blocking methods, the method provided by the invention uses Farthest Point Sampling (FPS) and K-Nearest Neighbors (KNN) to segment the point cloud into a number of blocks with the same resolution; the specific blocking process is described as follows.
First, for an original point cloud having N points, S structure points are sampled from the original point cloud using farthest point sampling.
Next, for each structure point, the K nearest neighbor points of the structure point are found in the original point cloud using KNN. The coordinates of the structure point are subtracted from the coordinates of the K neighbor points, namely the coordinates of the K points are moved into a coordinate system taking the structure point as the origin, to obtain one block. For the S structure points, S blocks can be generated, each block having K points. Each block can be viewed as a "mini" point cloud.
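As a non-limiting illustration, the block partitioning described above can be sketched in Python with NumPy as follows; the function and variable names (for example, partition_point_cloud) are chosen only for this example and are not part of the invention.

import numpy as np

def farthest_point_sampling(points, s):
    """Greedily pick s structure points that are mutually far apart."""
    n = points.shape[0]
    selected = np.zeros(s, dtype=np.int64)
    dist = np.full(n, np.inf)          # distance from every point to the current sample set
    selected[0] = np.random.randint(n)
    for i in range(1, s):
        d = np.linalg.norm(points - points[selected[i - 1]], axis=1)
        dist = np.minimum(dist, d)
        selected[i] = int(np.argmax(dist))
    return selected

def partition_point_cloud(points, s, k):
    """Split an (N, 3) point cloud into s blocks of k points, each block expressed
    in a local coordinate system whose origin is the corresponding structure point."""
    idx = farthest_point_sampling(points, s)
    structure_points = points[idx]                      # (s, 3)
    blocks = []
    for c in structure_points:
        d = np.linalg.norm(points - c, axis=1)
        knn_idx = np.argsort(d)[:k]                     # the k nearest neighbors
        blocks.append(points[knn_idx] - c)              # move to the local frame
    return np.stack(blocks), structure_points           # (s, k, 3), (s, 3)

# Example with the sizes used later in this description: N = 8192, S = 32, K = 512
cloud = np.random.rand(8192, 3).astype(np.float32)
blocks, centers = partition_point_cloud(cloud, s=32, k=512)
print(blocks.shape, centers.shape)                      # (32, 512, 3) (32, 3)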
Fig. 3(a)-(d) are illustrations of the S and K sizes in block segmentation. Fig. 3(a) shows a point cloud of 8192 points, fig. 3(b) shows the FPS sampling result when S is 32, fig. 3(c) shows the blocking result when S is 32 and K is 256, and fig. 3(d) shows the blocking result when S is 32 and K is 512. It can be seen that the point cloud structure information cannot be completely captured when the product of S and K equals exactly the number of points in the input point cloud. It is necessary to use S × K = αN (α > 1) to avoid the case where some points in the original point cloud are missed. In a specific implementation, we use α = 2.
When the number of blocks S is large and the block resolution K is small, the self-encoder is more suitable for high-quality reconstruction of the point cloud at a high bit rate; when the number of blocks S is small and the block resolution K is large, it is more suitable for compressing the point cloud to a smaller volume at a low bit rate.
The invention designs a self-encoder based on PointNet, which realizes the transformation and compression of point cloud blocks. The self-encoder comprises an analysis transform (f_a), a quantization function (Q) and a synthesis transform (f_s). The analysis transform extracts hidden layer features from a simple block; the quantization function quantizes the hidden layer features to facilitate further compression; and the synthesis transform reconstructs the quantized features into one block.
The analysis transform first uses a set abstraction (SA) layer to extract a local feature for each point in its local region, and then uses PointNet to extract higher-level features on the global scale. For the quantization process, during training we superimpose noise drawn from a uniform distribution between -0.5 and 0.5 on each element of the hidden layer features extracted by the analysis transform; during testing, a rounding operation is applied to the hidden layer features to enable subsequent entropy coding. The synthesis transform is a multi-layer perceptron composed of several fully connected layers, and the last step is to transform the output of the multi-layer perceptron into a two-dimensional matrix of block geometric information.
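A minimal PyTorch-style sketch of such a block self-encoder is given below; it replaces the set abstraction layer with a single shared point-wise multi-layer perceptron followed by max pooling (a simplification of the SA + PointNet structure described above), and the class and parameter names are illustrative only, not a definitive implementation of the invention.

import torch
import torch.nn as nn

class BlockAutoEncoder(nn.Module):
    """Simplified block self-encoder: analysis transform, quantization, synthesis transform."""

    def __init__(self, k_out=256, d=16):
        super().__init__()
        # Analysis transform: shared point-wise MLP plus max pooling
        # (a simplified stand-in for the SA + PointNet layers described above).
        self.analysis = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, d, 1),
        )
        # Synthesis transform: fully connected layers that regenerate k_out points
        # from the d-dimensional quantized code.
        self.synthesis = nn.Sequential(
            nn.Linear(d, 128), nn.ReLU(),
            nn.Linear(128, 256), nn.ReLU(),
            nn.Linear(256, k_out * 3),
        )
        self.k_out = k_out

    def quantize(self, z):
        if self.training:
            # Training: additive uniform noise in [-0.5, 0.5] as a differentiable proxy for rounding.
            return z + torch.empty_like(z).uniform_(-0.5, 0.5)
        # Testing: hard rounding so the code can be entropy coded.
        return torch.round(z)

    def forward(self, block):
        # block: (B, K, 3) coordinates of one batch of blocks in their local frames
        z = self.analysis(block.transpose(1, 2))    # (B, d, K) point-wise features
        z = torch.max(z, dim=2).values              # (B, d) global block feature
        z_hat = self.quantize(z)
        out = self.synthesis(z_hat)                 # (B, k_out * 3)
        return out.view(-1, self.k_out, 3), z_hat   # reconstructed block and quantized code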
The self-encoder uses the chamfer distance (D_CD) to constrain the error between the reconstruction result and the input block. As shown in fig. 2, the final loss function of the self-encoder is L = D_CD + λR, where R represents the bit rate estimated from the probability distribution of the hidden layer features. It should be noted that the influence of λ on the compression effect is not significant, and we mainly adjust the compression ratio by changing the size of the bottleneck dimension of the self-encoder.
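This rate-distortion loss can be sketched in Python as follows; the symmetric chamfer distance below and the placeholder bit-rate term are illustrative and may differ in detail from the exact training objective of the invention.

import torch

def chamfer_distance(p, q):
    """Symmetric chamfer distance between point sets p: (B, N, 3) and q: (B, M, 3)."""
    diff = p.unsqueeze(2) - q.unsqueeze(1)     # (B, N, M, 3) pairwise differences
    dist = (diff ** 2).sum(dim=-1)             # (B, N, M) squared distances
    # Mean nearest-neighbor distance in both directions
    return dist.min(dim=2).values.mean(dim=1) + dist.min(dim=1).values.mean(dim=1)

def rd_loss(reconstruction, block, rate, lam=1e-4):
    """L = D_CD + lambda * R; the value of lambda here is only a placeholder."""
    return chamfer_distance(reconstruction, block).mean() + lam * rate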
Examples
The present invention will be described in further detail with reference to a specific embodiment. For ease of explanation, and without loss of generality, the following assumptions are made:
The method proposed by the invention is trained and tested on the ModelNet40 dataset. The ModelNet40 dataset contains 9835 training point clouds and 2467 test point clouds, created by uniformly sampling each 3D model in ModelNet40; each point cloud contains 8192 points.
The present embodiment controls the compression ratio of the input point cloud by S, K and the size of the bottleneck dimension d of the self-encoder. Experiments show that for the point clouds of 8192 points used in this embodiment, the compression performance is good when the bottleneck dimension is between 8 and 16.
Taking point cloud reconstruction at a low bit rate as an example, S = 32, K = 512 and d = 16 are set, that is, α = 2 and S × K = 2N, so that the overall structure of the point cloud is captured more completely, where N is the number of points in the input point cloud, namely 8192. The present embodiment first divides each point cloud in the training set into 32 blocks, each block containing 512 points, i.e. the 9835 training point clouds are converted into 9835 × 32 blocks. We use these 9835 × 32 blocks as training data for the self-encoder; such a large amount of data significantly reduces the overfitting problem often encountered in model training. Meanwhile, since the shape of each block is simple, the self-encoder can obtain a good reconstruction effect without a large number of parameters. The parameters of the self-encoder in this embodiment are as follows:
SA(K=512,8,[32,64,128])→PN(64,32,d=16)→Quantization→FC(d=16,128)
→ReLU→FC(128,256)→ReLU→FC(256,256×3)
the parameters of each operation SA (representing the set feature extraction layer), PN (representing the PointNet layer), and FC (representing the fully connected layer) are defined as follows:
Set feature extraction layer: SA (number of blocks, number of groups, shared multi-layer perceptron size)
PointNet layer: PN (shared multi-layer perceptron size)
Fully connected layer: FC (input feature dimension, output feature dimension)
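In this configuration, a (512, 3) input block is mapped by the SA and PN layers to a 16-dimensional bottleneck vector; after quantization, the vector passes through FC(16, 128) → ReLU → FC(128, 256) → ReLU → FC(256, 768), and the 768-dimensional output (768 = 256 × 3) is reshaped into a (256, 3) matrix, i.e. each block is reconstructed with k = 256 points.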
After the self-encoder is trained in the above-mentioned manner, it is applied to the proposed compression process. The reconstruction results of the self-encoder over different training iterations are shown in fig. 4.
During compression, we divide the input point cloud into 32 blocks using FPS and KNN and input them into the trained self-encoder. The self-encoder converts the 32 blocks into 32 hidden layer representations, and the 32 hidden layer representations and the 32 sampled structure point coordinates are spliced into a (32, 16+3) two-dimensional matrix, which serves as the final representation of the input point cloud. The two-dimensional matrix is transmitted to the decoding end after entropy coding.
In the decoding process, the (32, 16+3) two-dimensional matrix is first split into 32 16-dimensional block hidden representations and the structure point coordinate information with a matrix size of (32, 3). The 32 16-dimensional block hidden representations are reconstructed into 32 blocks by the decoder of the self-encoder, each block having a matrix size of (256, 3). The 32 blocks are then added to the corresponding structure point coordinates, that is, the 32 blocks are shifted back to their original positions in the input point cloud, and the union is taken to obtain a coordinate set with a two-dimensional matrix size of (32 × 256, 3), thereby completing the overall reconstruction of the original point cloud. Examples of three-dimensional point cloud reconstruction results are shown in fig. 5(a)-(d).
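The splitting and reassembly in this decoding step can be sketched in Python as follows; decode_block stands for the synthesis transform of the trained self-encoder and is only a placeholder in this example.

import numpy as np

def reconstruct(hidden, decode_block):
    """hidden: (32, 19) matrix of 16-dimensional block codes followed by structure point coordinates.
    decode_block: function mapping a (16,) code to a (256, 3) block (the trained decoder)."""
    codes = hidden[:, :16]                    # (32, 16) block hidden representations
    centers = hidden[:, 16:]                  # (32, 3) structure point coordinates
    pieces = []
    for code, c in zip(codes, centers):
        block = decode_block(code)            # (256, 3) block in its local frame
        pieces.append(block + c)              # shift the block back to its original position
    return np.concatenate(pieces, axis=0)     # (32 x 256, 3) reconstructed point cloud

# Illustration with a dummy decoder in place of the trained synthesis transform
dummy_decoder = lambda code: np.zeros((256, 3))
cloud = reconstruct(np.random.rand(32, 19), dummy_decoder)
print(cloud.shape)                            # (8192, 3)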