
EP4627533A1 - Holographic communication pipeline - Google Patents

Holographic communication pipeline

Info

Publication number
EP4627533A1
Authority
EP
European Patent Office
Prior art keywords
vertices
digital image
computing node
mesh
point cloud
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
EP22822667.6A
Other languages
German (de)
French (fr)
Inventor
Ali El Essaili
Natalya TYUDINA
Joerg Christian Ewert
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Telefonaktiebolaget LM Ericsson AB
Original Assignee
Telefonaktiebolaget LM Ericsson AB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Telefonaktiebolaget LM Ericsson AB filed Critical Telefonaktiebolaget LM Ericsson AB
Publication of EP4627533A1 publication Critical patent/EP4627533A1/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 - Image coding
    • G06T9/001 - Model-based coding, e.g. wire frame
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20 - Finite element generation, e.g. wire-frame surface description, tesselation
    • G06T17/205 - Re-meshing
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2210/00 - Indexing scheme for image generation or computer graphics
    • G06T2210/36 - Level of detail
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2210/00 - Indexing scheme for image generation or computer graphics
    • G06T2210/56 - Particle system, point based geometry or rendering

Definitions

  • the present disclosure relates to methods and systems for compressing a digital image utilizing enhanced mesh compression.
  • Depth data can be acquired, for example, from Lidar sensors (e.g., Intel RealSense, Microsoft Kinect, or the back cameras of mobile devices such as the iPad/iPhone).
  • Android and iPhone Operating System (iOS) devices are equipped with depth sensors; the iPhone has a front TrueDepth camera that provides depth data in real time, allowing the distance of a pixel from the front-facing camera to be determined.
  • the Samsung Fold has a front RGB depth camera. Both provide the following depth formats: 640x480 depth maps and 640x480x3 RGB images.
  • Fig. 5 depicts a workflow for depth compression.
  • the depth map distance data 502 is serialized into an image format at 508 and then stored as a separate item in the file container 510.
  • the encoding pipeline contains two steps:
  • a point cloud can be generated from RGB and depth images. Camera parameters are used to map the RGB and depth information into point clouds. Intrinsic camera parameters (such as focal length, aperture, field of view, resolution, etc.) are denoted as (K) in the matrix below.
  • fx and fy are the focal lengths, and cx and cy are the central points on the x and y coordinates.
  • $K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$
  • Emerging point cloud compression / mesh codecs from Moving Picture Experts Group are being standardized now.
  • Existing 2D codecs, e.g., High Efficiency Video Coding (HEVC), do not fully support immersive media capabilities, e.g., 16-bit compression of depth images.
  • HEVC High Efficiency Video Coding
  • emerging point cloud compression codecs from the Moving Picture Experts Group are not optimized for real-time communications (processing and compression times are too long for real-time conversational services).
  • hardware-based encoders are not sufficiently developed yet.
  • the present disclosure provides methods and systems for compressing a digital image using an enhanced technique for real-time processing and compression of 3D meshes that utilizes not just vertices and faces information, but also color information to encode all the information required to render a three-dimensional (3D) digital image as a hologram for display on a wireless communication device, other computing device, or as part of an augmented reality or virtual reality display in a single bitstream which avoids synchronization problems.
  • 3D three-dimensional
  • all three elements are encoded: vertices, colors, and faces.
  • the hologram can look like a complete surface, with triangulation done faster on a sending device or in the cloud with no need to synchronize a mesh and color stream.
  • a method of compressing a digital image that can be implemented by one or more computing nodes.
  • the method can include determining a representation of a digital image corresponding to a set of vertices, color attributes corresponding to the set of vertices, and faces comprising subsets of vertices of the set of vertices. This set of vertices may be a subset of a larger set of vertices.
  • the method can also include generating a mesh that comprises for each vertex of each of the faces, a three-dimensional vertex coordinate and a corresponding color attribute associated with each vertex coordinate.
  • the method can also include encoding the mesh to provide a compressed digital image and transmitting the compressed digital image over a network.
  • a computing node can be provided that compresses a digital image.
  • the computing node can comprise processing circuitry that is configured to determine a representation of a digital image corresponding to a set of vertices, color attributes corresponding to the set of vertices, and faces comprising subsets of vertices of the set of vertices.
  • the processing circuitry can also be configured to generate a mesh that comprises for each vertex of each of the faces a three-dimensional vertex coordinate and a corresponding color attribute associated with each vertex coordinate.
  • the processing circuitry can also be configured to encode the mesh to provide a compressed digital image and transmit the compressed digital image over a network.
  • a non-transitory computer readable medium can be provided that comprises instructions, that when executed by a processor, perform operations.
  • the operations can include determining a representation of a digital image corresponding to a set of vertices, color attributes corresponding to the set of vertices, and faces comprising subsets of vertices of the set of vertices.
  • the operations can also include generating a mesh that comprises for each vertex of each of the faces, a three-dimensional vertex coordinate and a corresponding color attribute associated with each vertex coordinate.
  • the operations can also include encoding the mesh to provide a compressed digital image and transmitting the compressed digital image over a network.
  • Figure 1 is a diagram illustrating a mapping of two dimensional pixels to a three dimensional representation according to some embodiments of the present disclosure
  • Figure 2 is a diagram depicting a video-based point cloud compression workflow according to some embodiments of the present disclosure
  • Figure 3 is a diagram depicting geometry-based point cloud compression workflow according to some embodiments of the present disclosure.
  • Figure 4 is a diagram depicting a mesh compression according to some embodiments of the present disclosure
  • Figure 5 is a diagram depicting a depth compression workflow according to some embodiments of the present disclosure
  • Figure 6 is a diagram depicting an enhanced mesh compression for holographic communications according to some embodiments of the present disclosure
  • Figure 7 is a flow chart of enhanced mesh compression for holographic communications according to some embodiments of the present disclosure.
  • Figure 8 is a diagram of a mesh structure according to some embodiments of the present disclosure.
  • Figures 10A and 10B are diagrams of computing nodes that perform enhanced mesh compression according to some embodiments of the present disclosure.
  • Computing Node: As used herein, a “computing node” is any node with cloud computing server capabilities, or a communication device.
  • the present disclosure provides methods and systems for compressing a digital image using an enhanced technique for real-time processing and compression of 3D meshes that utilizes not just vertices and faces information, but also color information to encode all the information required to render a three-dimensional (3D) digital image as a hologram for display on a wireless communication device, other computing device, or as part of an augmented reality or virtual reality display in a single bitstream which avoids synchronization problems.
  • 3D three-dimensional
  • all three elements are encoded: vertices, colors, and faces.
  • the hologram can look like a complete surface, with triangulation done faster on a sending device or in the cloud with no need to synchronize a mesh and color stream.
  • FIG. 6 is a diagram depicting an enhanced mesh compression for holographic communications according to some embodiments of the present disclosure.
  • To perform the enhanced mesh compression there can be two different workflows, a point cloud generation workflow 602, and a compression workflow 604.
  • the point cloud generation workflow 602 and the compression workflow 604 can be performed and/or executed by one or more computing nodes, as discussed later with regard to Figures 10A and 10B.
  • the point cloud generation workflow 602 starts with inputs of images from one or more cameras.
  • One of the input images can be an RGB image from a digital camera or other imaging device.
  • Another input image can be a depth image or depth map from a depth camera, such as a light detection and ranging (LIDAR) camera, a depth sensor, or any other device capable of determining distances of objects or points from the origin point of the depth camera.
  • the depth image can be an image or image channel that contains information relating to the distance of the surfaces of scene objects from a viewpoint.
  • the aligned RGB and depth images can then be down-sampled at operation 608, where the down-sampling can include, for example, down-sampling the resolution of the RGB and depth images from 32 bits to 16 bits.
  • the down-sampling at operation 608 can also have different beginning and ending bit rates depending on the initial resolutions of the RGB and depth images as well as the network conditions. For example, one or more decimation filters can be selected or adjusted based on the network conditions, profile settings, and/or quality of service requirements associated with the communication.
  • the background vertices can be removed from the set of vertices.
  • the background vertices can be associated with objects in the RGB image and depth image that are not in the foreground.
  • the background removal can be performed by removing vertices that exceed a predefined distance from an origin point associated with the depth camera. In practice, this can be accomplished by removing all vertices that exceed a z-coordinate threshold, where the z-axis extends outwards from the origin point of the camera.
  • the resulting face formed by the triangulation is the polygon formed by three or more vertices of the set of remaining vertices. It is to be appreciated that a vertice of the remaining vertices may be associated with one or more faces. Likewise, it is possible that not all vertices of the set of remaining vertices are associated with any face of the faces formed during the triangulation. In an embodiment, the number of faces generated, or subsets of vertices of the remaining vertices that are selected to form faces can be based on a size and/or shape of the foreground object.
  • smoothing can be performed on the set of remaining vertices which is essentially a noise reduction process. Smoothing captures important patterns in the data, while leaving out noise from a depth camera.
  • all the vertices can be iterated in a loop while considering neighboring points.
  • each vertex's x, y, and z coordinates can be set to the average coordinate values of a predefined number of neighboring vertices. For example, if the predefined number of neighboring vertices is four, the x, y, and z coordinates of each vertex are set to the average of the coordinates of its four neighboring vertices.
  • the smoothing can reduce the computation time of the mesh compression, and the number of neighboring vertices to average can be based on network conditions or quality of service requirements.
  • the smoothing at operation 618 can result in a set of smoothed vertices which can then be utilized by the mesh compression at operation 620.
  • the mesh compression at operation 620 utilizes as input: 1) the color information that has been mapped to the vertex frame at operation 614, 2) the faces that were generated by the triangulation at operation 616, and 3) the smoothed vertices from the smoothing at operation 618.
  • the mesh compression at operation 620 generates a mesh structure comprising the faces, vertices, and color information associated with each vertex, which can be transmitted in a single bitstream to another device, thus avoiding the synchronization issues that result from transmitting the color information separately from the traditional mesh structure that only incorporated the faces and vertices.
  • An example of the mesh structure is depicted in Figure 8, where a face 804 is depicted with three vertices (e.g., vertices 802). Each vertex 802 has associated vertex coordinates 806 along with color information 808 for each vertex.
  • a first operation can include defining a data type for exporting RGB values, where the following line can be added to the src/draco/core/vector_d.h file: typedef VectorD<uint8_t, 3> Vector3ui8;
  • This data type is used by the encode_color_mesh_to_buffer function to export RGB values to Draco and is not included in the original Draco project.
  • a mesh encoding option can be created.
  • inside encode_mesh (mesh encoding option) introduces a new method for encoding color in meshes.
  • the new method takes vertices, colors and faces as attributes.
  • the bolded portions of the code are the modified portions.
  • each vertex can be associated with a color attribute:
  • EncodedObject badObj; badObj.encode_status = failed_during_encoding; return badObj;
  • const int color_att_id = mb.AddAttribute(draco::GeometryAttribute::COLOR, 3, draco::DataType::DT_UINT8, true);
  • each vertex can be set for the current face.
  • for each vertex of a face, it can be associated with a vertex (3D coordinate) and the corresponding color attribute of each vertex:
  • mesh value can be assigned: mesh->DeduplicateAttributeValues(); mesh->DeduplicatePointIds();
  • Figure 7 illustrates a flow chart of enhanced mesh compression for holographic communications according to some embodiments of the present disclosure.
  • the flow chart contains the same operations as the workflow in Figure 6, but organized in a different way to more easily see the flow from beginning to end.
  • the RGB and depth frames are down-sampled from a first bitrate to a second and lower bitrate to reduce the number of points for depth and RGB resolution.
  • the point cloud is created from the depth and RGB images and results in frames in the form of vertices and attributes.
  • the background is removed from the vertices of the vertices frame, and the output from the background removal, the set of remaining vertices is then processed concurrently at operations 616, 614, and 618.
  • the triangulation is performed to generate faces from the remaining vertices.
  • the color frame is mapped to the frame of remaining vertices, which generates an aligned color frame, and at operation 618, the remaining vertices are smoothed by averaging their coordinate values with neighboring vertices.
  • the outputs of each of operation 616 (faces), operation 614 (color frame), and operation 618 (vertices) are then used as input to the enhanced mesh compression at operation 620.
  • Figure 9 illustrates a flow chart of a method for enhanced mesh compression for holographic communications according to some embodiments of the present disclosure.
  • the method can begin at operation 902 which includes determining a representation of a digital image corresponding to a set of vertices, color attributes corresponding to the set of vertices, and faces comprising subsets of vertices of the set of vertices.
  • the representation of the digital image includes the set of vertices, faces, and color attributes, which collectively are the data by which the digital image can be reconstructed after being compressed/encoded into the mesh.
  • the determining the representation of the digital image includes operations 606-618 from Figure 6, which result in the color attributes, the set of vertices, and the faces.
  • the method includes encoding the mesh to provide a compressed digital image.
  • the method includes transmitting the compressed digital image over a network.
  • the compressed digital image is represented by the encoded mesh, and is transmitted as a single bitstream which avoids synchronization problems at the receiving device.
  • the transmission can be performed over the internet between computing devices, or via a wireless communications network from one wireless communications device to another.
  • Figures 10A and 10B are diagrams of computing nodes that perform enhanced mesh compression according to some embodiments of the present disclosure.
  • a single computing node 1002 can perform both the point cloud generation workflow 602 and compression workflow 604.
  • the computing node 1002 can be the device or associated with the device which captures the RGB and depth images.
  • the computing node 1002 can be an intermediate device, for example, a network node of a wireless communications network, or a server in the cloud.
  • RGB/depth operations such as RGB and depth alignment, reducing number of points for depth and RGB (fidelity) can be performed at a first node.
  • Point cloud creation, background removal, mapping of color points to reduced frame, smoothing, and triangulation can be performed at a second node, and mesh compression based on vertices, attribute (color) and faces can be done at a third node. Any combination of operations and computing nodes is possible.
  • Parts of the compression pipeline can be optional such as the down-sampling, background removal, smoothing. These optional steps can be used for the purpose of bitrate reduction for some use-cases e.g., real-time conversational services or for quality enhancement such as smoothing of vertices.
  • Figure 11 is a diagram depicting another embodiment of a multi-processing pipeline for enhanced mesh compression according to some embodiments of the present disclosure.
  • a main device 1102 can capture the RGB and depth images, and different operations can be executed using a multiprocessing framework where point cloud processing, mesh generation, and compression are handled using different workers or computing nodes 1104-1, 1104-2, and 1104-3, and then their output can be transmitted to a consumer or receiving device 1106.
  • FIG. 12 is a schematic block diagram of a computing node 1200 according to some embodiments of the present disclosure.
  • the computing node 1200 may be, for example, a base station or a network node that implements all or part of the functionality of the base station or gNB described herein.
  • the computing node 1200 can be a server or other device in the cloud.
  • the computing node 1200 includes a control system 1202 that includes one or more processors 1204 (e.g., Central Processing Units (CPUs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), and/or the like), memory 1206, and a network interface 1208.
  • the one or more processors 1204 are also referred to herein as processing circuitry.
  • the computing node 1200 may optionally include one or more radio units 1210 that each includes one or more transmitters 1212 and one or more receivers 1214 coupled to one or more antennas 1216.
  • the radio units 1210 may be referred to or be part of radio interface circuitry.
  • the radio unit(s) 1210 is external to the control system 1202 and connected to the control system 1202 via, e.g., a wired connection (e.g., an optical cable).
  • the radio unit(s) 1210 and potentially the antenna(s) 1216 are integrated together with the control system 1202.
  • Figure 13 is a schematic block diagram that illustrates a virtualized embodiment of the computing node 1200 according to some embodiments of the present disclosure. This discussion is equally applicable to other types of network nodes. Further, other types of network nodes may have similar virtualized architectures. Again, optional features are represented by dashed boxes.
  • a “virtualized” computing node is an implementation of the computing node 1200 in which at least a portion of the functionality of the computing node 1200 is implemented as a virtual component(s) (e.g., via a virtual machine(s) executing on a physical processing node(s) in a network(s)).
  • the computing node 1200 may include the control system 1202 and/or the one or more radio units 1210, as described above.
  • the control system 1202 may be connected to the radio unit(s) 1210 via, for example, an optical cable or the like.
  • the computing node 1200 includes one or more processing nodes 1300 coupled to or included as part of a network(s) 1302.
  • functions 1310 of the computing node 1200 described herein are implemented at the one or more processing nodes 1300 or distributed across the one or more processing nodes 1300 and the control system 1202 and/or the radio unit(s) 1210 in any desired manner.
  • some or all of the functions 1310 of the computing node 1200 described herein are implemented as virtual components executed by one or more virtual machines implemented in a virtual environment(s) hosted by the processing node(s) 1300.
  • additional signaling or communication between the processing node(s) 1300 and the control system 1202 is used in order to carry out at least some of the desired functions 1310.
  • the control system 1202 may not be included, in which case the radio unit(s) 1210 communicate directly with the processing node(s) 1300 via an appropriate network interface(s).
  • a computer program including instructions which, when executed by at least one processor, causes the at least one processor to carry out the functionality of computing node 1200 or a node (e.g., a processing node 1300) implementing one or more of the functions 1310 of the computing node 1200 in a virtual environment according to any of the embodiments described herein is provided.
  • a carrier comprising the aforementioned computer program product is provided. The carrier is one of an electronic signal, an optical signal, a radio signal, or a computer readable storage medium (e.g., a non-transitory computer readable medium such as memory).
  • any appropriate operations, methods, features, functions, or benefits disclosed herein may be performed through one or more functional units or modules of one or more virtual apparatuses.
  • Each virtual apparatus may comprise a number of these functional units.
  • These functional units may be implemented via processing circuitry, which may include one or more microprocessors or microcontrollers, as well as other digital hardware, which may include Digital Signal Processors (DSPs), special-purpose digital logic, and the like.
  • the processing circuitry may be configured to execute program code stored in memory, which may include one or several types of memory such as Read Only Memory (ROM), Random Access Memory (RAM), cache memory, flash memory devices, optical storage devices, etc.
  • Program code stored in memory includes program instructions for executing one or more telecommunications and/or data communications protocols as well as instructions for carrying out one or more of the techniques described herein.
  • the processing circuitry may be used to cause the respective functional unit to perform corresponding functions according to one or more embodiments of the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The present disclosure provides methods and systems for compressing a digital image using an enhanced technique for real-time processing and compression of 3D meshes. The technique utilizes not just vertices and faces information but also color information, so that all the information required to render a three-dimensional (3D) digital image as a hologram, for display on a wireless communication device, another computing device, or as part of an augmented reality or virtual reality display, is encoded in a single bitstream, which avoids synchronization problems. In this approach all three elements are encoded: vertices, colors, and faces. In the end, the hologram can look like a complete surface, with triangulation done faster on a sending device or in the cloud and with no need to synchronize a mesh and color stream.

Description

HOLOGRAPHIC COMMUNICATION PIPELINE
Technical Field
[0001] The present disclosure relates to methods and systems for compressing a digital image utilizing enhanced mesh compression.
[0002] Immersive media will be key for Extended Reality (XR) and Metaverse applications. Compression of immersive media such as red, green, and blue (RGB) plus depth, meshes, and point clouds is not as well established as legacy two dimensional (2D) video codecs. There are different approaches for compression, which are reviewed in the following sections.
3D-2D Compression
[0003] To facilitate early adoption of three dimensional (3D) visual representations on existing hardware, one prominent approach is to project 3D to 2D and use standard video compression algorithms such as the MPEG H.264/H.265/H.266 codecs. MPEG Visual Volumetric Video-based Coding (V3C) and Video-based point cloud compression (V-PCC) are two codecs which are based on 2D video compression. The approach is based on projecting a 3D coordinate, including color attributes, onto a 2D plane. The color comes from texture mapping, and each 3D point attribute can be projected onto a set of 2D pixel attributes. The basic process is shown in Fig. 1, where a 2D representation 102 can have a 3D point attribute projected onto the 2D representation, as shown in 3D representation 104.
[0004] V-PCC supports 3D to 2D compression. The V-PCC workflow is shown in Fig. 2. V-PCC takes an input point cloud frame 202 and applies 2D encoding to geometry and texture, which results in a number of bitstreams 204-1, 204-2, 204-3, 204-4, 204-5, and 204-6 which are then multiplexed together at multiplexer 206.
Native 3D Compression
[0005] An alternative approach is to apply 3D compression for point cloud geometry and texture. Geometry-based point cloud compression (G-PCC) supports native 3D compression. The workflow is shown in Fig. 3. G-PCC uses point cloud geometry (positions) and texture (color attributes). An encoder 302 can receive positions 306 and attributes 308 and apply the 3D compression to create a geometry bitstream 310 and an attribute bitstream 312. A decoder 304 can perform the reverse and create positions 306 and attributes 308 from a geometry bitstream 310 and an attribute bitstream 312.
Mesh Compression
[0006] A triangle mesh may be represented by its vertex data and by its connectivity. Vertex data comprises the coordinates of all the vertices and optionally the coordinates of the associated normal vectors and textures. In its simplest form, connectivity captures the incidence relation between the triangles of the mesh and their bounding vertices. It may be represented by a triangle-vertex incidence table, which associates with each triangle the references to its three bounding vertices. Edgebreaker is an exemplary simple scheme for compressing the triangle/vertex incidence graphs (sometimes called connectivity or topology) of three-dimensional triangle meshes. Edgebreaker is used in current mesh codec algorithms, e.g., Draco.
[0007] Texture and vertices can be encoded separately: texture can be encoded by standard video encoders such as H.264/H.265, and vertices can be encoded as well (connectivity information in the mesh is compressed by a mesh encoder such as Draco).
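As a minimal sketch of the vertex-data-plus-connectivity representation described above (the struct and field names are illustrative and not taken from the patent or from Draco):

#include <array>
#include <vector>

// Vertex data: one 3D coordinate per vertex (normals and texture
// coordinates could be added as optional fields).
struct Vertex {
  float x, y, z;
};

// Connectivity: a triangle-vertex incidence table, one entry per triangle
// holding the indices of its three bounding vertices.
struct TriangleMesh {
  std::vector<Vertex> vertices;               // vertex data
  std::vector<std::array<int, 3>> triangles;  // incidence table
};

int main() {
  // A quad split into two triangles that share the edge between vertices 1 and 2.
  TriangleMesh mesh;
  mesh.vertices = {{0, 0, 0}, {1, 0, 0}, {0, 1, 0}, {1, 1, 0}};
  mesh.triangles = {{0, 1, 2}, {1, 3, 2}};
  return 0;
}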
[0008] An example of traditional mesh compression is depicted in Figure 4, where both the texture 404 and the vertices 406 of an object 402 are captured. The texture 404 is then encoded by a video encoder 408 to a video bitstream, and the vertices 406 are mesh encoded by a mesh encoder 410 to generate a mesh bitstream, which can be multiplexed with the video bitstream by multiplexer 412 to create the output bitstream 414.
Depth Compression
[0009] Depth data can be acquired, for example, from Lidar sensors (e.g., Intel RealSense, Microsoft Kinect, or the back cameras of mobile devices such as the iPad/iPhone). Android and iPhone Operating System (iOS) devices are equipped with depth sensors; the iPhone has a front TrueDepth camera that provides depth data in real time, allowing the distance of a pixel from the front-facing camera to be determined. The Samsung Fold has a front RGB depth camera. Both provide the following depth formats: 640x480 depth maps and 640x480x3 RGB images. Fig. 5 depicts a workflow for depth compression. The depth map distance data 502 is serialized into an image format at 508 and then stored as a separate item in the file container 510. The encoding pipeline contains two steps:
[0010] 1) Convert (504) from the input format (e.g., float or int32 values) to an integer grayscale image format such as 16-bit words.
[0011] 2) Compress (506) using an image / video codec supported by the file container type.
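A minimal sketch of step 1, assuming the input depth map is in metres and using a millimetre fixed-point scale; the scale factor and the clamping are assumptions for illustration, not requirements from the disclosure:

#include <algorithm>
#include <cstdint>
#include <vector>

// Convert a float depth map (here assumed to be in metres) into a 16-bit
// grayscale image by scaling to millimetres and clamping to the 16-bit range.
std::vector<uint16_t> DepthToGray16(const std::vector<float>& depth_m) {
  std::vector<uint16_t> gray(depth_m.size());
  for (size_t i = 0; i < depth_m.size(); ++i) {
    const float mm = depth_m[i] * 1000.0f;  // metres -> millimetres
    gray[i] = static_cast<uint16_t>(std::clamp(mm, 0.0f, 65535.0f));
  }
  return gray;
}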
Point Cloud Generation
[0012] A point cloud can be generated from RGB and depth images. Camera parameters are used to map the RGB and depth information into point clouds. Intrinsic camera parameters (such as focal length, aperture, field of view, resolution, etc.) are denoted as (K) in the matrix below. fx and fy are the focal lengths, and cx and cy are the central points on the x and y coordinates.

$$K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$$
[0013] The coordinates of a pixel can be represented by u and v. R is the rotation matrix and T is the translation matrix. With these two formulas, the point cloud can be obtained.
$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K\,R\,T \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}$$
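A sketch of the back-projection implied by these formulas, written in the same C++ register as the later Draco snippets. The extrinsics R and T are taken as identity here (points stay in the depth camera's own frame), and the interleaved RGB layout and the function name are assumptions for illustration:

#include <cstdint>
#include <vector>

struct Point3D {
  float x, y, z;    // 3D coordinate
  uint8_t r, g, b;  // colour attribute
};

// Back-project a depth map into camera-space points using the intrinsic
// parameters fx, fy, cx, cy of K and attach the co-located RGB value.
std::vector<Point3D> CreatePointCloud(const std::vector<float>& depth,
                                      const std::vector<uint8_t>& rgb,  // 3 bytes per pixel
                                      int width, int height,
                                      float fx, float fy, float cx, float cy) {
  std::vector<Point3D> cloud;
  cloud.reserve(depth.size());
  for (int v = 0; v < height; ++v) {
    for (int u = 0; u < width; ++u) {
      const int i = v * width + u;
      const float z = depth[i];
      if (z <= 0.0f) continue;  // no depth measurement for this pixel
      Point3D p;
      p.x = (u - cx) * z / fx;  // inverts u = fx * x / z + cx
      p.y = (v - cy) * z / fy;  // inverts v = fy * y / z + cy
      p.z = z;
      p.r = rgb[3 * i];
      p.g = rgb[3 * i + 1];
      p.b = rgb[3 * i + 2];
      cloud.push_back(p);
    }
  }
  return cloud;
}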
[0014] Some problems with the current standards for media compression are as follows.
Emerging point cloud compression / mesh codecs from the Moving Picture Experts Group (MPEG) are being standardized now. Existing 2D codecs, e.g., High Efficiency Video Coding (HEVC), do not fully support immersive media capabilities, e.g., 16-bit compression of depth images. Additionally, the emerging point cloud compression codecs from MPEG are not optimized for real-time communications (processing and compression times are too long for real-time conversational services). Furthermore, hardware-based encoders are not sufficiently developed yet. Finally, there is no standard high quality compression algorithm for the compression of mesh and color data that works in real-time.
Summary
[0015] The present disclosure provides methods and systems for compressing a digital image using an enhanced technique for real-time processing and compression of 3D meshes. The technique utilizes not just vertices and faces information but also color information, so that all the information required to render a three-dimensional (3D) digital image as a hologram, for display on a wireless communication device, another computing device, or as part of an augmented reality or virtual reality display, is encoded in a single bitstream, which avoids synchronization problems. In this approach all three elements are encoded: vertices, colors, and faces. In the end, the hologram can look like a complete surface, with triangulation done faster on a sending device or in the cloud and with no need to synchronize a mesh and color stream.
[0016] In an embodiment, a method of compressing a digital image that can be implemented by one or more computing nodes is provided. The method can include determining a representation of a digital image corresponding to a set of vertices, color attributes corresponding to the set of vertices, and faces comprising subsets of vertices of the set of vertices. This set of vertices may be a subset of a larger set of vertices. The method can also include generating a mesh that comprises for each vertex of each of the faces, a three-dimensional vertex coordinate and a corresponding color attribute associated with each vertex coordinate. The method can also include encoding the mesh to provide a compressed digital image and transmitting the compressed digital image over a network.
[0017] In another embodiment, a computing node can be provided that compresses a digital image. The computing node can comprise processing circuitry that is configured to determine a representation of a digital image corresponding to a set of vertices, color attributes corresponding to the set of vertices, and faces comprising subsets of vertices of the set of vertices. The processing circuitry can also be configured to generate a mesh that comprises for each vertex of each of the faces a three-dimensional vertex coordinate and a corresponding color attribute associated with each vertex coordinate. The processing circuitry can also be configured to encode the mesh to provide a compressed digital image and transmit the compressed digital image over a network.
[0018] In another embodiment a non-transitory computer readable medium can be provided that comprises instructions, that when executed by a processor, perform operations. The operations can include determining a representation of a digital image corresponding to a set of vertices, color attributes corresponding to the set of vertices, and faces comprising subsets of vertices of the set of vertices. The operations can also include generating a mesh that comprises for each vertex of each of the faces, a three-dimensional vertex coordinate and a corresponding color attribute associated with each vertex coordinate. The operations can also include encoding the mesh to provide a compressed digital image and transmitting the compressed digital image over a network.
Brief Description of the Drawings
[0019] The accompanying drawing figures incorporated in and forming a part of this specification illustrate several aspects of the disclosure, and together with the description serve to explain the principles of the disclosure.
[0020] Figure 1 is a diagram illustrating a mapping of two dimensional pixels to a three dimensional representation according to some embodiments of the present disclosure;
[0021] Figure 2 is a diagram depicting a video-based point cloud compression workflow according to some embodiments of the present disclosure;
[0022] Figure 3 is a diagram depicting geometry-based point cloud compression workflow according to some embodiments of the present disclosure;
[0023] Figure 4 is a diagram depicting a mesh compression according to some embodiments of the present disclosure;
[0024] Figure 5 is a diagram depicting a depth compression workflow according to some embodiments of the present disclosure;
[0025] Figure 6 is a diagram depicting an enhanced mesh compression for holographic communications according to some embodiments of the present disclosure;
[0026] Figure 7 is a flow chart of enhanced mesh compression for holographic communications according to some embodiments of the present disclosure;
[0027] Figure 8 is a diagram of a mesh structure according to some embodiments of the present disclosure;
[0028] Figure 9 is a flow chart of a method for enhanced mesh compression for holographic communications according to some embodiments of the present disclosure;
[0029] Figures 10A and 10B are diagrams of computing nodes that perform enhanced mesh compression according to some embodiments of the present disclosure;
[0030] Figure 11 is a diagram depicting a multi-processing pipeline for enhanced mesh compression according to some embodiments of the present disclosure;
[0031] Figure 12 is a schematic block diagram of a computing node according to some embodiments of the present disclosure;
[0032] Figure 13 is a schematic block diagram that illustrates a virtualized embodiment of the computing node of Figure 12 according to some embodiments of the present disclosure; and
[0033] Figure 14 is a schematic block diagram of the computing node of Figure 12 according to some other embodiments of the present disclosure.
Detailed Description
[0034] The embodiments set forth below represent information to enable those skilled in the art to practice the embodiments and illustrate the best mode of practicing the embodiments.
Upon reading the following description in light of the accompanying drawing figures, those skilled in the art will understand the concepts of the disclosure and will recognize applications of these concepts not particularly addressed herein. It should be understood that these concepts and applications fall within the scope of the disclosure.
[0035] Computing Node: As used herein, a “computing node” is any node with cloud computing server capabilities, or a communication device.
[0036] Communication Device: As used herein, a “communication device” is any type of device that has access to a network. Some examples of a communication device include, but are not limited to: mobile phone, smart phone, sensor device, meter, vehicle, household appliance, medical appliance, media player, camera, or any type of consumer electronic, for instance, but not limited to, a television, radio, lighting arrangement, tablet computer, laptop, Extended Reality device or Personal Computer (PC). The communication device may be a portable, hand-held, computer-comprised, or vehicle-mounted mobile device, enabled to communicate voice and/or data via a wireless or wireline connection.
[0037] The present disclosure provides methods and systems for compressing a digital image using an enhanced technique for real-time processing and compression of 3D meshes. The technique utilizes not just vertices and faces information but also color information, so that all the information required to render a three-dimensional (3D) digital image as a hologram, for display on a wireless communication device, another computing device, or as part of an augmented reality or virtual reality display, is encoded in a single bitstream, which avoids synchronization problems. In this approach all three elements are encoded: vertices, colors, and faces. In the end, the hologram can look like a complete surface, with triangulation done faster on a sending device or in the cloud and with no need to synchronize a mesh and color stream.
[0038] A pipeline for real-time compression and communication of holographic, volumetric, immersive, or 3D media is enabled by the systems and methods disclosed here. Media can be captured by cameras (e.g., red/green/blue (RGB) and depth sensors). The enhanced mesh compression system disclosed herein can reduce the latency to enable real-time end-to-end (camera to a mobile device (e.g., phone), Augmented Reality (AR) glasses, or Virtual Reality (VR) device) holographic communications for conversational services. Augmented reality devices can be standalone glasses, glasses tethered to a mobile phone via wired or wireless technology, or glasses tethered to a cloud and/or edge server.
[0039] Traditional point cloud compression uses vertices and color compression. In this approach only vertices and colors are compressed. Visually, it will look like a set of individual points with colors, or a point cloud. There is no surface or a complete hologram. In order to create a surface, triangulation on the end device (phone) is needed. Triangulation is a computationally intensive operation that can require powerful hardware. Furthermore, traditional mesh compression uses just vertices and faces. In this approach only vertices and faces are compressed, and the hologram will be without any colors. The main disadvantage is the need to encode and transport the colors separately, which can result in synchronization problems when synchronizing the mesh and color streams.
[0040] The enhanced mesh compression approach disclosed herein uses vertices, color, and faces. In this approach all three elements are encoded during the mesh compression: vertices, colors, and faces. In the end, the advantages of the enhanced mesh compression approach disclosed herein are that the resulting hologram rendered at the target device can appear like a complete surface. Furthermore, the enhanced mesh compression enables triangulation to be done faster on a sending device or cloud without any need to synchronize the mesh and color stream.
[0041] Figure 6 is a diagram depicting an enhanced mesh compression for holographic communications according to some embodiments of the present disclosure. To perform the enhanced mesh compression there can be two different workflows, a point cloud generation workflow 602, and a compression workflow 604. The point cloud generation workflow 602 and the compression workflow 604 can be performed and/or executed by one or more computing nodes, as discussed later with regard to Figures 10A and 10B.
[0042] The point cloud generation workflow 602 starts with inputs of images from one or more cameras. One of the input images can be an RGB image from a digital camera or other imaging device. Another input image can be a depth image or depth map from a depth camera, such as a light detection and ranging (LIDAR) camera, a depth sensor, or any other device capable of determining distances of objects or points from the origin point of the depth camera. The depth image can be an image or image channel that contains information relating to the distance of the surfaces of scene objects from a viewpoint.
[0043] At operation 606, the RGB image and the depth image can be aligned such that the depth image can be overlaid over the RGB image and objects captured in the RGB image are aligned with the corresponding objects in the depth image. One or both of the RGB image or depth image can be translated, rotated, resized, or have another transformation performed based on the different fields of view, focal lengths, etc., of the respective RGB and depth cameras.
[0044] The aligned RGB and depth images can then be down-sampled at operation 608, where the down-sampling can include, for example, down-sampling the resolution of the RGB and depth images from 32 bits to 16 bits. The down-sampling at operation 608 can also have different beginning and ending bit rates depending on the initial resolutions of the RGB and depth images as well as the network conditions. For example, one or more decimation filters can be selected or adjusted based on the network conditions, profile settings, and/or quality of service requirements associated with the communication.
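One possible down-sampling step, sketched here as plain integer decimation; the decimation factor stands in for whatever filter is selected from network conditions, profile settings, or quality of service requirements, and the function and its parameters are illustrative assumptions:

#include <vector>

// Keep every `factor`-th pixel in each dimension. Applying the same factor to
// the aligned RGB (channels = 3) and depth (channels = 1) frames keeps them
// pixel-aligned after down-sampling.
template <typename T>
std::vector<T> Decimate(const std::vector<T>& img, int width, int height,
                        int channels, int factor) {
  std::vector<T> out;
  for (int y = 0; y < height; y += factor) {
    for (int x = 0; x < width; x += factor) {
      for (int c = 0; c < channels; ++c) {
        out.push_back(img[(static_cast<size_t>(y) * width + x) * channels + c]);
      }
    }
  }
  return out;
}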
[0045] At operation 610 in the point cloud generation workflow 602, the point cloud can be created based on the down-sampled RGB and depth images. The point cloud can also be based on the camera parameters associated with the respective RGB and depth cameras. Camera parameters include focal length, field of view, aperture, resolution, and other parameters intrinsic to the camera that can be used to facilitate generation of a camera model. These intrinsic camera parameters and the resulting model can allow the conversion of points from one coordinate system to another.
[0046] The point cloud information can include a set of vertices that are a 3D representation of the depth image and further include attributes associated with the vertices. The attributes can include the 3D coordinates of each vertex of the set of vertices, and also include color information associated with the vertex.
[0047] At operation 612 in the compression workflow 604, the background vertices can be removed from the set of vertices. The background vertices can be associated with objects in the RGB image and depth image that are not in the foreground. In an embodiment, the background removal can be performed by removing vertices that exceed a predefined distance from an origin point associated with the depth camera. In practice, this can be accomplished by removing all vertices that exceed a z-coordinate threshold, where the z-axis extends outwards from the origin point of the camera.
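A sketch of the z-threshold background removal described above; the Point3D type matches the earlier point cloud sketch, and the 1.5 m default is only an illustrative, scene-dependent value:

#include <cstdint>
#include <vector>

struct Point3D {
  float x, y, z;
  uint8_t r, g, b;
};

// Keep only vertices whose z coordinate (distance along the camera's viewing
// axis) is within the threshold; everything farther away is treated as
// background and removed.
std::vector<Point3D> RemoveBackground(const std::vector<Point3D>& cloud,
                                      float max_z = 1.5f) {
  std::vector<Point3D> foreground;
  foreground.reserve(cloud.size());
  for (const Point3D& p : cloud) {
    if (p.z <= max_z) {
      foreground.push_back(p);
    }
  }
  return foreground;
}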
[0048] The background removal at operation 612 results in a set of remaining vertices in a vertices frame. At operation 614, a color frame of the attributes can then be aligned and/or mapped to the vertices frame of the remaining vertices to ensure that the color attributes correspond to the remaining vertices. After the mapping performed at operation 614, the output is the color attributes that are utilized by the mesh compression at operation 620.
[0049] Concurrently, the set of remaining vertices that correspond to the foreground objects in the RGB and depth images can also undergo triangulation at operation 616 to generate the faces which are used by the mesh compression. Triangulation is the process whereby individual vertices of the set of remaining vertices are used to form polygons that collectively form a surface (e.g., a face) corresponding to the surface of the object in the images. Triangulation is the process of determining the location of a point by forming triangles to the point from known points. This mechanism creates a surface out of a set of points. In order to create a triangle, the triangulation process goes through all vertices and creates edges between them. The resulting face formed by the triangulation is the polygon formed by three or more vertices of the set of remaining vertices. It is to be appreciated that a vertex of the remaining vertices may be associated with one or more faces. Likewise, it is possible that not all vertices of the set of remaining vertices are associated with any face of the faces formed during the triangulation. In an embodiment, the number of faces generated, or the subsets of vertices of the remaining vertices that are selected to form faces, can be based on a size and/or shape of the foreground object.
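The disclosure does not mandate a particular triangulation algorithm; one simple option for vertices that originate from a regular depth-image grid is sketched below, where each 2x2 block of surviving pixels is connected into two triangular faces (the index map and function name are assumptions for illustration):

#include <array>
#include <vector>

// `index` maps each pixel of the (down-sampled) depth grid to its position in
// the remaining-vertex list, or -1 if the pixel was removed (e.g. background).
// Each 2x2 block of surviving pixels yields up to two triangular faces.
std::vector<std::array<int, 3>> TriangulateGrid(const std::vector<int>& index,
                                                int width, int height) {
  std::vector<std::array<int, 3>> faces;
  for (int y = 0; y + 1 < height; ++y) {
    for (int x = 0; x + 1 < width; ++x) {
      const int a = index[y * width + x];            // top-left
      const int b = index[y * width + x + 1];        // top-right
      const int c = index[(y + 1) * width + x];      // bottom-left
      const int d = index[(y + 1) * width + x + 1];  // bottom-right
      if (a >= 0 && b >= 0 && c >= 0) faces.push_back({a, b, c});
      if (b >= 0 && d >= 0 && c >= 0) faces.push_back({b, d, c});
    }
  }
  return faces;
}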
[0050] At operation 618, smoothing can be performed on the set of remaining vertices, which is essentially a noise reduction process. Smoothing captures important patterns in the data, while leaving out noise from a depth camera. In an exemplary algorithm, all the vertices can be iterated in a loop while considering neighboring points. In an exemplary implementation, each vertex's x, y, and z coordinates can be set to the average coordinate values of a predefined number of neighboring vertices. For example, if the predefined number of neighboring vertices is four, the x, y, and z coordinates of each vertex are set to the average of the coordinates of its four neighboring vertices. The smoothing can reduce the computation time of the mesh compression, and the number of neighboring vertices to average can be based on network conditions or quality of service requirements. The smoothing at operation 618 can result in a set of smoothed vertices which can then be utilized by the mesh compression at operation 620.
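A deliberately simple sketch of the neighbor-averaging smoothing described above, using a brute-force nearest-neighbor search and the four-neighbor example from the text; in practice a spatial index or the depth-grid neighborhood would be used, and the function name is an assumption:

#include <algorithm>
#include <utility>
#include <vector>

struct Vec3 {
  float x, y, z;
};

// Replace every vertex by the average of its k nearest neighbours.
// Brute-force O(n^2) search, kept simple for illustration only.
std::vector<Vec3> SmoothVertices(const std::vector<Vec3>& v, size_t k = 4) {
  std::vector<Vec3> out(v.size());
  for (size_t i = 0; i < v.size(); ++i) {
    std::vector<std::pair<float, size_t>> dist;  // (squared distance, index)
    dist.reserve(v.size());
    for (size_t j = 0; j < v.size(); ++j) {
      if (j == i) continue;
      const float dx = v[j].x - v[i].x;
      const float dy = v[j].y - v[i].y;
      const float dz = v[j].z - v[i].z;
      dist.emplace_back(dx * dx + dy * dy + dz * dz, j);
    }
    const size_t n = std::min(k, dist.size());
    std::partial_sort(dist.begin(), dist.begin() + n, dist.end());
    if (n == 0) { out[i] = v[i]; continue; }
    Vec3 avg{0.0f, 0.0f, 0.0f};
    for (size_t m = 0; m < n; ++m) {
      avg.x += v[dist[m].second].x;
      avg.y += v[dist[m].second].y;
      avg.z += v[dist[m].second].z;
    }
    out[i] = {avg.x / n, avg.y / n, avg.z / n};
  }
  return out;
}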
[0051] The mesh compression at operation 620 utilizes as input: 1) the color information that has been mapped to the vertex frame at operation 614, 2) the faces that were generated by the triangulation at operation 616, and 3) the smoothed vertices from the smoothing at operation 618. The mesh compression at operation 620 generates a mesh structure comprising the faces, vertices, and color information associated with each vertex, which can be transmitted in a single bitstream to another device, thus avoiding the synchronization issues that result from transmitting the color information separately from the traditional mesh structure that only incorporated the faces and vertices.
[0052] An example of the mesh structure is depicted in Figure 8, where a face 804 is depicted with three vertices (e.g., vertices 802). Each vertex 802 has associated vertex coordinates 806 along with color information 808 for each vertex.
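A sketch of that per-face structure in the same C++ style as the Draco modifications below (the type names are illustrative, not taken from Figure 8):

#include <array>
#include <cstdint>
#include <vector>

// Every vertex of a face carries both its 3D coordinate and its own RGB
// colour, so geometry and colour travel together in one structure and,
// after encoding, in one bitstream.
struct ColoredVertex {
  std::array<float, 3> xyz;    // vertex coordinates 806
  std::array<uint8_t, 3> rgb;  // color information 808
};

struct Face {
  std::array<ColoredVertex, 3> corners;  // the three vertices 802 of a face 804
};

using ColoredMesh = std::vector<Face>;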
[0053] An exemplary mesh compression can employ a modified version of an open source mesh compression such as Draco to extend Draco to incorporate color information (e.g., RGB values). Other mesh compression algorithms can be modified or extended, but the following discussion is an example of how Draco could be modified to support the enhanced mesh compression process.
[0054] A first operation can include defining a data type for exporting RGB values, where the following line can be added to the src/draco/core/vector_d.h file:
typedef VectorD<uint8_t, 3> Vector3ui8;
This data type is used by the encode_color_mesh_to_buffer function to export RGB values to Draco and is not included in the original Draco project.
[0055] A second operation can include modifying the mesh encoding algorithm (e.g., dracopy.h) to account for colors. The modified mesh encoding should take vertices, faces and color as input.
[0056] Next, a mesh encoding option can be created. E.g., inside encode_mesh, the mesh encoding option introduces a new method for encoding color in meshes. The new method takes vertices, colors and faces as attributes. The bolded portions of the code are the modified portions.
EncodedObject encode_color_mesh(
    const std::vector<float> &points,
    const std::vector<unsigned int> &faces,
    const std::vector<uint8_t> &colors,
    int quantization_bits,
    int compression_level,
    float quantization_range,
    const float *quantization_origin,
    bool create_metadata) {
[0057] To ensure the colors and points correspond, each vertex can be associated with a color attribute:
The number of points (vertices) is equivalent to the number of color values:
if (points.size() != colors.size()) {
    std::cout << "Unequal number of colors and vertices passed\n";
    EncodedObject badObj;
    badObj.encode_status = failed_during_encoding;
    return badObj;
}
[0058] Then, a color attribute can be added to the mesh builder:
const int color_att_id = mb.AddAttribute(draco::GeometryAttribute::COLOR, 3, draco::DataType::DT_UINT8, true);
[0059] Then, the location and color for each vertex can be set for the current face. For each vertex of a face, it can be associated with a vertex (3D coordinate) and the corresponding color attribute of each vertex:
// Set location and color for each vertex of the current face
mb.SetAttributeValuesForFace(
    color_att_id, draco::FaceIndex(i),
    draco::Vector3ui8(colors[point1Index], colors[point1Index + 1], colors[point1Index + 2]).data(),
    draco::Vector3ui8(colors[point2Index], colors[point2Index + 1], colors[point2Index + 2]).data(),
    draco::Vector3ui8(colors[point3Index], colors[point3Index + 1], colors[point3Index + 2]).data());
[0060] Then the mesh values can be assigned:
mesh->DeduplicateAttributeValues();
mesh->DeduplicatePointIds();
[0061] Then each mesh can be passed to an underlying mesh encoder function (e.g., Edgebreaker) and the resulting mesh can be the mesh structure depicted in Figure 8.
[0062] More generally the steps for modifying an existing mesh compression algorithm can include:
1) define a new data type for exporting RGB values to a mesh encoder function;
2) modify a mesh encoding algorithm to account for colors;
3) introduce a method for encoding color in meshes that takes vertices, colors, and faces as input;
4) ensure colors and vertices sizes match;
5) add color attributes to the mesh builder;
6) set location and color for each vertex of the face;
7) assign attribute values to the mesh; and
8) pass the new mesh to the encoder function.
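Assuming the modified encoder shown above is built into the project, a call for a single red triangle might look as follows; the flattened layouts (three floats per vertex, three indices per face, three uint8 color components per vertex) and the parameter values are assumptions for illustration:

#include <cstdint>
#include <vector>

// encode_color_mesh and EncodedObject come from the modified Draco code shown above.
void ExampleCall() {
  const std::vector<float> points = {
      0.0f, 0.0f, 1.0f,   // vertex 0
      1.0f, 0.0f, 1.0f,   // vertex 1
      0.0f, 1.0f, 1.0f};  // vertex 2
  const std::vector<unsigned int> faces = {0, 1, 2};
  const std::vector<uint8_t> colors = {
      255, 0, 0,   // color of vertex 0
      255, 0, 0,   // color of vertex 1
      255, 0, 0};  // color of vertex 2 (points.size() == colors.size() holds)

  EncodedObject encoded = encode_color_mesh(
      points, faces, colors,
      /*quantization_bits=*/11, /*compression_level=*/7,
      /*quantization_range=*/-1.0f, /*quantization_origin=*/nullptr,
      /*create_metadata=*/false);
  (void)encoded;  // the encoded buffer would then be handed to the transport layer
}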
[0063] Figure 7 illustrates a flow chart of enhanced mesh compression for holographic communications according to some embodiments of the present disclosure. The flow chart contains the same operations as the workflow in Figure 6, but organized in a different way to more easily see the flow from beginning to end.
[0064] At operation 606, RGB and depth frames are captured from camera sensors. For example, the RGB frames or images can be received from a digital camera while the depth frame is taken from a depth sensor such as a LIDAR camera, or any other device capable of determining distances of objects or points from the origin point of the depth camera.
[0065] At operation 608, the RGB and depth frames are down-sampled from a first bitrate to a second and lower bitrate to reduce the number of points for depth and RGB resolution.
[0066] At operation 610, the point cloud is created from the depth and RGB images and results in frames in the form of vertices and attributes.
[0067] At operation 612, the background is removed from the vertices of the vertices frame, and the output from the background removal (the set of remaining vertices) is then processed concurrently at operations 616, 614, and 618. At operation 616, the triangulation is performed to generate faces from the remaining vertices. At operation 614, the color frame is mapped to the frame of remaining vertices, which generates an aligned color frame, and at operation 618, the remaining vertices are smoothed by averaging their coordinate values with neighboring vertices. The outputs of each of operation 616 (faces), operation 614 (color frame), and operation 618 (vertices) are then used as input to the enhanced mesh compression at operation 620.
[0068] Figure 9 illustrates a flow chart of a method for enhanced mesh compression for holographic communications according to some embodiments of the present disclosure.
[0069] The method can begin at operation 902 which includes determining a representation of a digital image corresponding to a set of vertices, color attributes corresponding to the set of vertices, and faces comprising subsets of vertices of the set of vertices. In an embodiment, the representation of the digital image includes the set of vertices, faces, and color attributes, which collectively are the data by which the digital image can be reconstructed after being compressed/encoded into the mesh. Thus, the determining of the representation of the digital image includes operations 606-618 from Figure 6, which result in the color attributes, the set of vertices, and the faces.
[0070] At operation 904, the method includes generating a mesh that comprises for each vertex of each of the faces a three-dimensional vertex coordinate and a corresponding color attribute associated with each vertex coordinate. The mesh created can include a plurality of the mesh structures depicted in Figure 8, one for each face.
[0071] At operation 906, the method includes encoding the mesh to provide a compressed digital image.
[0072] At operation 908, the method includes transmitting the compressed digital image over a network. The compressed digital image is represented by the encoded mesh, and is transmitted as a single bitstream which avoids synchronization problems at the receiving device. The transmission can be performed over the internet between computing devices, or via a wireless communications network from one wireless communications device to another.
[0073] Figures 10A and 10B are computing nodes that perform enhanced mesh compression according to some embodiments of the present disclosure. In the example shown in Figure 10A, a single computing node 1002 can perform both the point cloud generation workflow 602 and the compression workflow 604. The computing node 1002 can be, or be associated with, the device that captures the RGB and depth images. Alternatively, the computing node 1002 can be an intermediate device, for example, a network node of a wireless communications network or a server in the cloud.
[0074] In the example shown in Figure 10B, different parts of the preprocessing operations 1010 and 1012 of the point cloud generation workflow 602 can be performed at different computing nodes 1004 and 1006 and the output of each can be provided to computing node 1002 to perform the compression workflow 604.
[0075] In an embodiment, compression and decompression of the RGB and depth images can additionally be applied when the RGB and depth image capture and the point cloud creation take place at different nodes. As an example, RGB and depth images captured on a mobile device can be compressed and transmitted to another computing node in the network for point cloud creation and mesh compression. The RGB and depth images can be encoded using standard legacy codecs (e.g., H.265).
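For example, a sequence of captured RGB frames written to disk could be compressed with a legacy H.265 codec by invoking ffmpeg's libx265 encoder from Python; the file names, frame rate, and quality setting below are illustrative assumptions.

```python
import subprocess

# Assumed: captured frames saved as frame_0001.png, frame_0002.png, ... at 30 fps.
subprocess.run([
    "ffmpeg", "-y",
    "-framerate", "30",
    "-i", "frame_%04d.png",
    "-c:v", "libx265",   # H.265/HEVC encoding
    "-crf", "28",        # illustrative quality/bitrate trade-off
    "rgb_stream.mp4",
], check=True)
```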
[0076] In other embodiments, the various operations can be distributed across one or more computing nodes. For example, RGB/depth operations such as RGB and depth alignment and reducing the number of points for depth and RGB (fidelity) can be performed at a first node; point cloud creation, background removal, mapping of color points to the reduced frame, smoothing, and triangulation can be performed at a second node; and mesh compression based on vertices, attributes (color), and faces can be performed at a third node. Any combination of operations and computing nodes is possible.
[0077] Parts of the compression pipeline, such as down-sampling, background removal, and smoothing, can be optional. These optional steps can be used for bitrate reduction in some use cases, e.g., real-time conversational services, or for quality enhancement, such as smoothing of vertices.
[0078] Figure 11 is a diagram depicting another embodiment of a multi-processing pipeline for enhanced mesh compression according to some embodiments of the present disclosure.
[0079] In an example, a main device 1102 can capture the RGB and depth images, and different operations can be executed using a multiprocessing framework where point cloud processing, mesh generation, and compression are handled using different workers or computing nodes 1104-1, 1104-2, and 1104-3, and then their output can be transmitted to a consumer or receiving device 1106.
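A hedged sketch of one such multi-processing arrangement using Python's multiprocessing module, with one worker process per stage connected by queues; the queue-based wiring and the stage functions (e.g., the sketches above) are assumptions for illustration, not the disclosed architecture.

```python
import multiprocessing as mp

def worker(stage_fn, in_q: mp.Queue, out_q: mp.Queue):
    """Apply one pipeline stage to every item until a None sentinel arrives."""
    while (item := in_q.get()) is not None:
        out_q.put(stage_fn(item))
    out_q.put(None)  # propagate shutdown to the next stage

def run_pipeline(frames, stages):
    """Chain picklable, top-level stage functions across separate worker processes.
    On spawn-based platforms this must be called under an `if __name__ == "__main__"` guard."""
    queues = [mp.Queue() for _ in range(len(stages) + 1)]
    procs = [mp.Process(target=worker, args=(fn, queues[i], queues[i + 1]))
             for i, fn in enumerate(stages)]
    for p in procs:
        p.start()
    for frame in frames:
        queues[0].put(frame)
    queues[0].put(None)
    results = []
    while (out := queues[-1].get()) is not None:
        results.append(out)
    for p in procs:
        p.join()
    return results
```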
[0080] Figure 12 is a schematic block diagram of a computing node 1200 according to some embodiments of the present disclosure. Optional features are represented by dashed boxes. The computing node 1200 may be, for example, a base station or a network node that implements all or part of the functionality of the base station or gNB described herein. Alternatively, the computing node 1200 can be a server or other device in the cloud. As illustrated, the computing node 1200 includes a control system 1202 that includes one or more processors 1204 (e.g., Central Processing Units (CPUs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), and/or the like), memory 1206, and a network interface 1208. The one or more processors 1204 are also referred to herein as processing circuitry. In addition, the computing node 1200 may optionally include one or more radio units 1210 that each includes one or more transmitters 1212 and one or more receivers 1214 coupled to one or more antennas 1216. The radio units 1210 may be referred to or be part of radio interface circuitry. In some embodiments, the radio unit(s) 1210 is external to the control system 1202 and connected to the control system 1202 via, e.g., a wired connection (e.g., an optical cable). However, in some other embodiments, the radio unit(s) 1210 and potentially the antenna(s) 1216 are integrated together with the control system 1202. The one or more processors 1204 operate to provide one or more functions of a computing node 1200 as described herein. In some embodiments, the function(s) are implemented in software that is stored, e.g., in the memory 1206 and executed by the one or more processors 1204.
[0081] Figure 13 is a schematic block diagram that illustrates a virtualized embodiment of the computing node 1200 according to some embodiments of the present disclosure. This discussion is equally applicable to other types of network nodes. Further, other types of network nodes may have similar virtualized architectures. Again, optional features are represented by dashed boxes.
[0082] As used herein, a “virtualized” computing node is an implementation of the computing node 1200 in which at least a portion of the functionality of the computing node 1200 is implemented as a virtual component(s) (e.g., via a virtual machine(s) executing on a physical processing node(s) in a network(s)). As illustrated, in this example, the computing node 1200 may include the control system 1202 and/or the one or more radio units 1210, as described above. The control system 1202 may be connected to the radio unit(s) 1210 via, for example, an optical cable or the like. The computing node 1200 includes one or more processing nodes 1300 coupled to or included as part of a network(s) 1302. If present, the control system 1202 or the radio unit(s) are connected to the processing node(s) 1300 via the network 1302. Each processing node 1300 includes one or more processors 1304 (e.g., CPUs, ASICs, FPGAs, and/or the like), memory 1306, and a network interface 1308.
[0083] In this example, functions 1310 of the computing node 1200 described herein are implemented at the one or more processing nodes 1300 or distributed across the one or more processing nodes 1300 and the control system 1202 and/or the radio unit(s) 1210 in any desired manner. In some particular embodiments, some or all of the functions 1310 of the computing node 1200 described herein are implemented as virtual components executed by one or more virtual machines implemented in a virtual environment(s) hosted by the processing node(s) 1300. As will be appreciated by one of ordinary skill in the art, additional signaling or communication between the processing node(s) 1300 and the control system 1202 is used in order to carry out at least some of the desired functions 1310. Notably, in some embodiments, the control system 1202 may not be included, in which case the radio unit(s) 1210 communicate directly with the processing node(s) 1300 via an appropriate network interface(s).
[0084] In some embodiments, a computer program is provided including instructions which, when executed by at least one processor, cause the at least one processor to carry out the functionality of the computing node 1200, or of a node (e.g., a processing node 1300) implementing one or more of the functions 1310 of the computing node 1200 in a virtual environment, according to any of the embodiments described herein. In some embodiments, a carrier comprising the aforementioned computer program product is provided. The carrier is one of an electronic signal, an optical signal, a radio signal, or a computer readable storage medium (e.g., a non-transitory computer readable medium such as memory).
[0085] Figure 14 is a schematic block diagram of the computing node 1200 according to some other embodiments of the present disclosure. The computing node 1200 includes one or more modules 1400, each of which is implemented in software. The module(s) 1400 provide the functionality of the computing node 1200 described herein. This discussion is equally applicable to the processing node 1300 of Figure 13 where the modules 1400 may be implemented at one of the processing nodes 1300 or distributed across multiple processing nodes 1300 and/or distributed across the processing node(s) 1300 and the control system 1202.
[0086] Any appropriate operations, methods, features, functions, or benefits disclosed herein may be performed through one or more functional units or modules of one or more virtual apparatuses. Each virtual apparatus may comprise a number of these functional units. These functional units may be implemented via processing circuitry, which may include one or more microprocessors or microcontrollers, as well as other digital hardware, which may include Digital Signal Processors (DSPs), special-purpose digital logic, and the like. The processing circuitry may be configured to execute program code stored in memory, which may include one or several types of memory such as Read Only Memory (ROM), Random Access Memory (RAM), cache memory, flash memory devices, optical storage devices, etc. Program code stored in memory includes program instructions for executing one or more telecommunications and/or data communications protocols as well as instructions for carrying out one or more of the techniques described herein. In some implementations, the processing circuitry may be used to cause the respective functional unit to perform corresponding functions according to one or more embodiments of the present disclosure.
[0087] While processes in the figures may show a particular order of operations performed by certain embodiments of the present disclosure, it should be understood that such order is exemplary (e.g., alternative embodiments may perform the operations in a different order, combine certain operations, overlap certain operations, etc.).
[0088] At least some of the following abbreviations may be used in this disclosure. If there is an inconsistency between abbreviations, preference should be given to how it is used above. If listed multiple times below, the first listing should be preferred over any subsequent listing(s).
2D Two Dimensional
3D Three Dimensional
3GPP Third Generation Partnership Project
5G Fifth Generation
AMF Access and Mobility Function
AN Access Network
AR Augmented Reality
ASIC Application Specific Integrated Circuit
AUSF Authentication Server Function
CPU Central Processing Unit
DSP Digital Signal Processor
eNB Enhanced or Evolved Node B
FPGA Field Programmable Gate Array
gNB New Radio Base Station
gNB-DU New Radio Base Station Distributed Unit
G-PCC Geometry-based Point Cloud Compression
HEVC High Efficiency Video Coding
HSS Home Subscriber Server
iOS iPhone Operating System
LIDAR Light Detection and Ranging
LTE Long Term Evolution
MME Mobility Management Entity
MPEG Moving Picture Experts Group
NEF Network Exposure Function
NF Network Function
NR New Radio
NRF Network Function Repository Function
NSSF Network Slice Selection Function
PC Personal Computer
PCF Policy Control Function
P-GW Packet Data Network Gateway
QoS Quality of Service
RAM Random Access Memory
RAN Radio Access Network
RGB Red, Green, and Blue
ROM Read Only Memory
SCEF Service Capability Exposure Function
SMF Session Management Function
UDM Unified Data Management
UPF User Plane Function
V-PCC Video-based Point Cloud Compression
XR Extended Reality
[0089] Those skilled in the art will recognize improvements and modifications to the embodiments of the present disclosure. All such improvements and modifications are considered within the scope of the concepts disclosed herein.

Claims
1. A method of compressing a digital image, the method implemented by one or more computing nodes (1002, 1004, 1006), comprising: determining (902) a representation of a digital image corresponding to a set of vertices, color attributes corresponding to the set of vertices, and faces comprising subsets of vertices of the set of vertices; generating (904) a mesh that comprises for each vertex of each of the faces: a three-dimensional vertex coordinate; and a corresponding color attribute associated with each vertex coordinate; encoding (906) the mesh to provide a compressed digital image; and transmitting (908) the compressed digital image over a network.
2. The method of claim 1, further comprising: prior to determining the representation of a digital image, removing (612) vertices that exceed a predefined distance from an origin point associated with a camera, resulting in a set of remaining vertices.
3. The method of claim 2, wherein determining the representation of the digital image, further comprises: mapping (614) a color frame of the color attributes to a vertices frame of the set of remaining vertices associated with a rendered object.
4. The method of claim 3, wherein determining the representation of the digital image further comprises: associating (616) the faces to subsets of vertices of the set of vertices based on a size of a rendered object.
5. The method of any of claims 1-4, wherein the set of vertices and color attributes are point cloud information.
6. The method of claim 5, wherein determining the representation of a digital image further comprises: generating (610) the point cloud information based on first image data comprising Red Green Blue, RGB, information of a region and second image data comprising depth information of the region.
7. The method of any of claims 5-6, wherein the point cloud information is compressed and is received from another computing node (1004, 1006).
8. The method of claim 6, further comprising: prior to generating (610) the point cloud information, down-sampling (608) resolutions of the first image data and the second image data.
9. The method of any of claims 1-8, further comprising: adjusting the set of vertices to match the size of the rendered object.
10. The method of any of claims 1-9, further comprising: smoothing (618) the set of vertices to remove noise from neighboring vertices.
11. The method of any of claims 1-10, wherein the compressed digital image facilitates rendering of the digital image in a virtual reality display or augmented reality display.
12. The method of claim 3, wherein the rendered object is a person in a foreground of an image.
13. A computing node (1002) that compresses a digital image, the computing node comprising processing circuitry configured to: determine (902) a representation of a digital image corresponding to a set of vertices, color attributes corresponding to the set of vertices, and faces comprising subsets of vertices of the set of vertices; generate (904) a mesh that comprises for each vertex of each of the faces: a three-dimensional vertex coordinate; and a corresponding color attribute associated with each vertex coordinate; encode (906) the mesh to provide a compressed digital image; and transmit (908) the compressed digital image over a network.
14. The computing node (1002) of claim 13, wherein the processing circuitry is further configured to: prior to determining the representation of a digital image, remove (612) vertices that exceed a predefined distance from an origin point associated with a camera.
15. The computing node (1002) of claim 14, wherein the processing circuitry is further configured to: map (614) a color frame of the color attributes to a vertices frame of the set of remaining vertices associated with a rendered object.
16. The computing node (1002) of claim 15, wherein the processing circuitry is further configured to: associate (616) the faces to subsets of vertices of the set of vertices based on a size of the rendered object.
17. The computing node (1002) of any of claims 13-16, wherein the set of vertices and color attributes associated with the set of vertices are point cloud information.
18. The computing node (1002) of claim 17, wherein the processing circuitry is further configured to: generate (610) the point cloud information based on first image data comprising Red Green Blue, RGB, information of a region and second image data comprising depth information of the region.
19. The computing node (1002) of claim 17, wherein the point cloud information is compressed and is received from another computing node (1004, 1006).
20. The computing node (1002) of claim 18, wherein the processing circuitry is further configured to: prior to generating (610) the point cloud information, down-sample (608) resolutions of the first image data and the second image data.
21. The computing node (1002) of any of claims 13-20, wherein the processing circuitry is further configured to: adjust the set of vertices to match the size of the rendered object.
22. The computing node (1002) of any of claims 13-21, wherein the processing circuitry is further configured to: smooth (618) the set of vertices to remove noise from neighboring vertices.
23. The computing node (1002) of any of claims 13-22, wherein the compressed digital image facilitates rendering of the digital image in a virtual reality display or augmented reality display.
24. The computing node (1002) of claim 15, wherein the rendered object is a person in a foreground of an image.
25. A non-transitory computer readable medium comprising instructions, that when executed by a processor, perform operations comprising: determining (902) a representation of a digital image corresponding to a set of vertices, color attributes corresponding to the set of vertices, and faces comprising subsets of vertices of the set of vertices; generating (904) a mesh that comprises for each vertex of each of the faces: a three-dimensional vertex coordinate; and a corresponding color attribute associated with each vertex coordinate; encoding (906) the mesh to provide a compressed digital image; and transmitting (908) the compressed digital image over a network.
26. The non-transitory computer readable medium of claim 25, wherein the operations further comprise: prior to determining the representation of a digital image, removing (612) vertices that exceed a predefined distance from an origin point associated with a camera, resulting in a set of remaining vertices.
27. The non-transitory computer readable medium of claim 26, wherein determining the representation of the digital image, further comprises: mapping (614) a color frame of the color attributes to a vertices frame of the set of remaining vertices associated with a rendered object.
28. The non-transitory computer readable medium of claim 27, wherein determining the representation of the digital image further comprises: associating (616) the faces to subsets of vertices of the set of vertices based on a size of a rendered object.
29. The non-transitory computer readable medium of any of claims 25-28, wherein the set of vertices and color attributes are point cloud information.
30. The non-transitory computer readable medium of claim 29, wherein determining the representation of a digital image further comprises: generating (610) the point cloud information based on first image data comprising Red Green Blue, RGB, information of a region and second image data comprising depth information of the region.
31. The non-transitory computer readable medium of any of claims 29-30, wherein the point cloud information is compressed and is received from another computing node (1004, 1006).
32. The non-transitory computer readable medium of claim 30, wherein the operations further comprise: prior to generating (610) the point cloud information, down-sampling (608) resolutions of the first image data and the second image data.
33. The non-transitory computer readable medium of any of claims 25-32, wherein the operations further comprise: adjusting the set of vertices to match the size of the rendered object.
34. The non-transitory computer readable medium of any of claims 25-33, wherein the operations further comprise: smoothing (618) the set of vertices to remove noise from neighboring vertices.
35. The non-transitory computer readable medium of any of claims 25-34, wherein the compressed digital image facilitates rendering of the digital image in a virtual reality display or augmented reality display.
36. The non-transitory computer readable medium of claim 27, wherein the rendered object is a person in a foreground of an image.
37. The method of any of claims 1-36, wherein the set of vertices is a subset of a larger set of vertices.