CN118474327A - Video editing method, device, equipment and storage medium

- Publication number: CN118474327A
- Application number: CN202410577428.6A
- Authority: CN (China)
- Legal status: Granted
Classifications

- G06T 17/00 — Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T 15/005 — General purpose rendering architectures
- H04N 13/275 — Image signal generators from 3D object models, e.g. computer-generated stereoscopic image signals
Description
Technical Field

The present application relates to the field of computer vision, and in particular to a video editing method, apparatus, device, and storage medium.
Background Art

With the development of video generation technology, video editing that inserts three-dimensional (3D) models into existing videos has found applications in film and television production, advertising, virtual reality, and other fields: a user can perform 3D reconstruction of an insertion target and then insert the resulting 3D model into a video so that it blends in realistically.

However, traditional 3D object reconstruction typically depends on professional scanning equipment and special shooting environments, making reconstruction costly and time-consuming, and placing a 3D model into an existing video requires several post-production tools used in combination. The overall workflow of generating a 3D model of a real-world object and inserting it into an existing video is therefore cumbersome and inefficient, and difficult for ordinary users.

The above content is provided only to assist in understanding the technical solution of the present application and does not constitute an admission that it is prior art.
Summary of the Invention

The main purpose of the present application is to provide a video editing method that addresses the technical problem that existing approaches to inserting a 3D model into an existing video are cumbersome, inefficient, and difficult for ordinary users.

To achieve the above object, the present application proposes a video editing method, which includes:

acquiring a surround video of an insertion target, the surround video containing video frames of the insertion target from different viewing angles;

inputting the surround video into a preset 3D reconstruction model to obtain a target 3D model, the preset 3D reconstruction model being trained on the video frames obtained by segmenting the surround video;

obtaining video reconstruction information from the optical flow information of the video to be inserted combined with a preset optimization algorithm, the video reconstruction information including camera intrinsic and extrinsic parameters and a dense point cloud; and

inserting the target 3D model into a user-selected region of the video to be inserted according to the camera intrinsic and extrinsic parameters and the dense point cloud, obtaining an edited video.
In one embodiment, the preset 3D reconstruction model includes a geometry encoding perceptron and a texture encoding perceptron, and the training of the preset 3D reconstruction model includes:

initializing a tetrahedral grid, inputting the 3D coordinates of the tetrahedral grid into the geometry encoding perceptron to obtain corresponding geometric information, and converting the geometric information into a target mesh;

inputting the 3D coordinates of the target mesh into the texture encoding perceptron to obtain texture information;

inputting an environment lighting map, the camera pose information of the surround video, the target mesh, and the texture information into a preset differentiable renderer to obtain a current rendering;

computing a loss between the current rendering and foreground images of the insertion target from different viewing angles to obtain a loss gradient, and optimizing the geometry encoding perceptron, the texture encoding perceptron, and the environment lighting map with the loss gradient through gradient back-propagation; and

obtaining the preset 3D reconstruction model under the optimization objective of minimizing the empirical risk.
In one embodiment, before the step of initializing a tetrahedral grid, inputting the 3D coordinates of the tetrahedral grid into the geometry encoding perceptron to obtain corresponding geometric information, and converting the geometric information into a target mesh, the method includes:

upon receiving a click instruction from the user, generating a mask region corresponding to the insertion target in the current video frame of the surround video according to the click instruction; and

segmenting the other video frames of the surround video according to the mask region to obtain foreground images of the insertion target from different viewing angles.
In one embodiment, the step of obtaining video reconstruction information from the optical flow information of the video to be inserted combined with a preset optimization algorithm includes:

inputting a number of video frames of the video to be inserted into an optical flow prediction model to determine the optical flow displacement value of each video frame;

when the optical flow displacement value is greater than a preset threshold, adding the corresponding video frame to a keyframe set; and

determining the video reconstruction information within the keyframe set using the preset optimization algorithm, obtaining the camera intrinsic and extrinsic parameters and the dense point cloud.
In one embodiment, the step of determining the video reconstruction information for the video frames in the keyframe set using the preset optimization algorithm, obtaining the camera intrinsic and extrinsic parameters and the dense point cloud, includes:

dividing the keyframe set into an online optimization set and an offline optimization set;

determining, within the online optimization set and using the preset optimization algorithm, the relative camera pose, depth map, and camera intrinsics corresponding to each video frame; and

updating, within the offline optimization set and using the preset optimization algorithm, the relative camera poses, the depth maps, and the camera intrinsics, obtaining the camera intrinsic and extrinsic parameters and the dense point cloud.
In one embodiment, the step of inserting the target 3D model into a user-selected region of the video to be inserted according to the camera intrinsic and extrinsic parameters and the dense point cloud, obtaining an edited video, includes:

upon receiving a selection instruction from the user, generating a user-selected region in the video to be inserted;

back-projecting the pixel coordinates of the user-selected region into the camera coordinate system of the video to be inserted according to the camera intrinsic and extrinsic parameters and the dense point cloud, obtaining the 3D point cloud coordinates of the user-selected region;

determining the 3D plane normal vector corresponding to the video to be inserted, and preprocessing the target 3D model in combination with the 3D point cloud coordinates of the user-selected region; and

rendering images of the preprocessed target 3D model from different viewing angles according to the camera intrinsic and extrinsic parameters, and compositing the viewing-angle images with the corresponding video frames of the video to be inserted to synthesize the edited video.
In one embodiment, the step of determining the 3D plane normal vector corresponding to the video to be inserted and preprocessing the target 3D model in combination with the 3D point cloud coordinates of the user-selected region includes:

determining the 3D plane normal vector in the camera coordinate system of the video to be inserted based on a random sample consensus (RANSAC) algorithm; and

preprocessing the target 3D model based on the 3D plane normal vector in combination with the 3D point cloud coordinates of the user-selected region.
In addition, to achieve the above object, the present application also proposes a video editing apparatus, which includes a model generation module, a video reconstruction module, and a model insertion module;

the model generation module is configured to acquire a surround video of an insertion target, the surround video containing video frames of the insertion target from different viewing angles;

the model generation module is further configured to input the surround video into a preset 3D reconstruction model to obtain a target 3D model, the preset 3D reconstruction model being trained on the video frames obtained by segmenting the surround video;

the video reconstruction module is configured to obtain video reconstruction information from the optical flow information of the video to be inserted combined with a preset optimization algorithm, the video reconstruction information including camera intrinsic and extrinsic parameters and a dense point cloud; and

the model insertion module is configured to insert the target 3D model into a user-selected region of the video to be inserted according to the camera intrinsic and extrinsic parameters and the dense point cloud, obtaining an edited video.
In addition, to achieve the above object, the present application also proposes a video editing device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, the computer program being configured to implement the steps of the video editing method described above.

In addition, to achieve the above object, the present application also proposes a storage medium, the storage medium being a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of the video editing method described above.
The present application provides a video editing method: acquiring a surround video of an insertion target, the surround video containing video frames of the insertion target from different viewing angles; inputting the surround video into a preset 3D reconstruction model to obtain a target 3D model, the preset 3D reconstruction model being trained on the video frames obtained by segmenting the surround video; obtaining video reconstruction information from the optical flow information of the video to be inserted combined with a preset optimization algorithm, the video reconstruction information including camera intrinsic and extrinsic parameters and a dense point cloud; and inserting the target 3D model into a user-selected region of the video to be inserted according to the camera intrinsic and extrinsic parameters and the dense point cloud, obtaining an edited video. Because the user only needs to provide a surround video of the insertion target for the preset 3D reconstruction model to produce the target 3D model, the traditional 3D reconstruction workflow is simplified and its cost reduced; the target 3D model is then inserted into the user-selected region of the video to be inserted according to the video reconstruction information, realizing an end-to-end video editing process from 3D model production to video insertion and greatly increasing editing speed compared with existing approaches.
Brief Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the present application.

To explain the technical solutions in the embodiments of the present application or the prior art more clearly, the drawings needed for the description of the embodiments or the prior art are briefly introduced below. Obviously, a person of ordinary skill in the art can derive other drawings from these drawings without creative effort.

FIG. 1 is a flow chart of Embodiment 1 of the video editing method of the present application;

FIG. 2 is a flow chart of Embodiment 2 of the video editing method of the present application;

FIG. 3 is a flow chart of Embodiment 3 of the video editing method of the present application;

FIG. 4 is a schematic diagram of the module structure of Embodiment 1 of the video editing apparatus of the present application;

FIG. 5 is a schematic diagram of the device structure of the hardware operating environment involved in the video editing method in an embodiment of the present application.

The realization of the purpose, functional features, and advantages of the present application will be further described with reference to the embodiments and the accompanying drawings.
Detailed Description

It should be understood that the specific embodiments described herein are only intended to explain the technical solution of the present application and are not intended to limit it.

For a better understanding of the technical solution of the present application, a detailed description is given below with reference to the accompanying drawings and specific implementations.

The main solution of the embodiments of the present application is: acquiring a surround video of an insertion target, the surround video containing video frames of the insertion target from different viewing angles; inputting the surround video into a preset 3D reconstruction model to obtain a target 3D model, the preset 3D reconstruction model being trained on the video frames obtained by segmenting the surround video; obtaining video reconstruction information from the optical flow information of the video to be inserted combined with a preset optimization algorithm, the video reconstruction information including camera intrinsic and extrinsic parameters and a dense point cloud; and rendering the target 3D model into a user-selected region of the video to be inserted according to the camera intrinsic and extrinsic parameters and the dense point cloud, obtaining an edited video.
Generating a 3D model of a real-world object and inserting it into an existing video is difficult and costly for ordinary people. Reconstructing the precise geometry and surface material of a real object often requires professional 3D scanning equipment and a special shooting environment. Inserting a 3D model into an existing video likewise usually takes a professional post-production engineer considerable time: for example, first using Boujou to recover the video's sparse point cloud and camera intrinsics and extrinsics, then manually aligning the object placement plane with the target plane in the video in 3D rendering software such as 3ds Max or Blender, and rendering images of the 3D object from different viewpoints frame by frame using the camera parameters recovered by Boujou. Beyond post-production software, current augmented reality (AR) SDKs such as Google's ARCore and Apple's ARKit must first recover the camera pose and reconstruct the scene geometry using the phone's precise camera intrinsics and auxiliary sensors such as an IMU or lidar before a 3D object can be rendered into the live video stream. The overall process is cumbersome and inefficient.

In the present application, the user only needs to provide a surround video of the insertion target for the preset 3D reconstruction model to produce the target 3D model, which simplifies the traditional 3D reconstruction workflow and reduces its cost; the target 3D model is then inserted into the user-selected region of the video to be inserted according to the video reconstruction information, realizing an end-to-end video editing process from 3D model production to video insertion and greatly increasing editing speed compared with existing approaches.

It should be noted that the execution subject of this embodiment may be a computing device with interactive display, data processing, network communication, and program execution capabilities, such as a tablet computer, personal computer, or mobile phone, or any other electronic device capable of realizing the same or similar functions; this embodiment imposes no limitation in this respect. This embodiment and the following embodiments are described below taking a video editing device (hereinafter "editing device") as an example.
Based on this, an embodiment of the present application provides a video editing method. Referring to FIG. 1, FIG. 1 is a flow chart of Embodiment 1 of the video editing method of the present application. The video editing method includes steps S10 to S40:

Step S10: acquire a surround video of an insertion target, the surround video containing video frames of the insertion target from different viewing angles.

It should be noted that the insertion target is the physical object whose 3D model is to be reconstructed. The user can first shoot a video circling the insertion target so that the captured surround video covers the insertion target from different viewing angles, and then upload the surround video to the editing device.

Step S20: input the surround video into a preset 3D reconstruction model to obtain a target 3D model, the preset 3D reconstruction model being trained on the video frames obtained by segmenting the surround video.

It should be understood that the preset 3D reconstruction model can reconstruct the 3D mesh model and surface material of the insertion target from the surround video shot by the user. The preset 3D reconstruction model may be an inverse-rendering model implemented with neural networks, and may include a geometry encoding perceptron for extracting the 3D mesh model of the insertion target and a texture encoding perceptron for obtaining the surface material of the insertion target.
It should also be noted that the above video frames containing the insertion target from different viewing angles can serve as supervision data for training the preset 3D reconstruction model. The model training process is described in detail below and includes steps A01 to A05:

Step A01: initialize a tetrahedral grid, input the 3D coordinates of the tetrahedral grid into the geometry encoding perceptron to obtain corresponding geometric information, and convert the geometric information into a target mesh.

It should be noted that a signed distance function (SDF) is a mathematical representation of the shape of a geometric object: it defines a function that, for each point in space, returns a signed distance to the object's surface. This value can be positive, negative, or zero, indicating the point's position relative to the surface. The geometry encoding perceptron may be a multi-layer perceptron (MLP) that learns to approximate the SDF, thereby representing and generating the geometry of a 3D object.

It should be understood that the geometric information may be the SDF values at the vertices of the tetrahedral grid, and the target mesh is the untextured mesh converted from the SDF.

In a specific implementation, a tetrahedral grid of fixed resolution is first initialized; the vertex coordinates of the tetrahedral grid are then input into the geometry encoding perceptron and converted into corresponding SDF values; and finally Marching Tetrahedra (an isosurface extraction method that extracts a mesh model from the SDF values at spatial coordinates) is applied to the SDF values of the tetrahedral grid to obtain the untextured mesh.
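As a rough illustration of step A01, a minimal PyTorch sketch of such a geometry encoding perceptron follows; the class name `GeometrySDF`, the layer sizes, and the grid-sampling code are illustrative assumptions rather than details given above:

```python
import torch
import torch.nn as nn

class GeometrySDF(nn.Module):
    """MLP that maps a 3D point to a signed distance value."""
    def __init__(self, hidden: int = 128, depth: int = 3):
        super().__init__()
        layers, in_dim = [], 3
        for _ in range(depth):
            layers += [nn.Linear(in_dim, hidden), nn.ReLU()]
            in_dim = hidden
        layers.append(nn.Linear(in_dim, 1))
        self.net = nn.Sequential(*layers)

    def forward(self, xyz: torch.Tensor) -> torch.Tensor:
        return self.net(xyz)                  # (V, 1) signed distances

# Evaluate the SDF on the vertices of the initialized tetrahedral grid; a
# marching-tetrahedra step (not shown) then extracts the zero isosurface mesh.
grid_verts = torch.rand(100_000, 3) * 2 - 1   # grid vertices in [-1, 1]^3
sdf = GeometrySDF()(grid_verts)
```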
Step A02: input the 3D coordinates of the target mesh into the texture encoding perceptron to obtain texture information.

It should be noted that the texture encoding perceptron may use an MLP to encode all material parameters and learn an automatic texture parameterization of the mesh surface, yielding the pixel values corresponding to each mesh vertex.

In a specific implementation, the texture encoding perceptron first takes the mesh vertices as input and outputs the base color, reflectance coefficients, and normal coefficients, determining the pixel values corresponding to each vertex of the target mesh; the color map, reflectance map, and normal map of the mesh are then obtained through UV mapping, yielding the texture information.
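A matching sketch for the texture encoding perceptron, assuming the base-color/reflectance/normal output split described above (all names and layer sizes are hypothetical):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextureMLP(nn.Module):
    """MLP mapping a surface point to material parameters:
    base color k_d, a reflectance coefficient, and a shading normal."""
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3 + 1 + 3),     # RGB base color + reflectance + normal
        )

    def forward(self, xyz: torch.Tensor):
        out = self.net(xyz)
        kd = torch.sigmoid(out[:, 0:3])       # base color in [0, 1]
        ks = torch.sigmoid(out[:, 3:4])       # reflectance coefficient
        nrm = F.normalize(out[:, 4:7], dim=-1)  # unit shading normal
        return kd, ks, nrm

# kd, ks, nrm = TextureMLP()(mesh_vertices)  # per-vertex values, then UV-baked
```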
Step A03: input an environment lighting map, the camera pose information of the surround video, the target mesh, and the texture information into a preset differentiable renderer to obtain a current rendering.

It should be noted that, along with the tetrahedral grid, a fixed-size high dynamic range (HDR) environment lighting map can be initialized, to be learned later from the image observations.

It should be understood that a common camera pose estimation algorithm, such as COLMAP, may be used to perform structure from motion (SFM) on the frames of the surround video to obtain its camera pose information.

It should also be noted that the preset differentiable renderer may be the diffrast differentiable renderer, which renders a four-channel RGBA image for a given camera pose from the input textured mesh model, environment HDR map, and camera pose information.
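If the "diffrast" renderer refers to NVIDIA's nvdiffrast, a minimal rendering call might look like the sketch below; this is simplified to unlit per-vertex color, whereas the actual pipeline would shade with the learned materials and HDR environment map, and the helper names here are assumptions:

```python
import torch
import nvdiffrast.torch as dr

def render_rgba(glctx, verts, faces, vert_color, mvp, res=512):
    """Rasterize a mesh under one camera pose and return an RGBA image.
    verts: (V, 3) float32, faces: (F, 3) int32, vert_color: (V, 3),
    mvp: (4, 4) model-view-projection matrix; all tensors on the GPU."""
    v_hom = torch.cat([verts, torch.ones_like(verts[:, :1])], dim=-1)
    v_clip = (v_hom @ mvp.T)[None]            # (1, V, 4) clip-space positions
    rast, _ = dr.rasterize(glctx, v_clip, faces, resolution=[res, res])
    rgb, _ = dr.interpolate(vert_color[None], rast, faces)
    alpha = (rast[..., 3:4] > 0).float()      # triangle-id channel > 0 = covered
    rgba = torch.cat([rgb, alpha], dim=-1)
    return dr.antialias(rgba, rast, v_clip, faces)  # differentiable silhouettes

glctx = dr.RasterizeCudaContext()             # one context reused across frames
```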
Step A04: compute a loss between the current rendering and the foreground images of the insertion target from different viewing angles to obtain a loss gradient, and optimize the geometry encoding perceptron, the texture encoding perceptron, and the environment lighting map with the loss gradient through gradient back-propagation.

It should be noted that, in order to obtain the foreground images of the insertion target from different viewing angles, step A04 may be preceded by a background removal process for the surround video, which includes steps A001 to A002:

Step A001: upon receiving a click instruction from the user, generate a mask region corresponding to the insertion target in the current video frame of the surround video according to the click instruction.

Step A002: segment the other video frames of the surround video according to the mask region, obtaining foreground images of the insertion target from different viewing angles.

It should be understood that a video object segmentation model such as Track Anything can be used to remove the background of the surround video. Specifically, the user clicks on the uploaded surround video in the editing device's interactive interface to select, in the current video frame, the insertion target to be 3D-modeled; the video object segmentation model segments out the insertion target and generates the corresponding mask region; frame-by-frame target tracking then automatically follows the insertion target through the other video frames, yielding the four-channel foreground image segmented from each frame of the surround video, i.e., the foreground images from different viewing angles.

It should be noted that these foreground images from different viewing angles serve as the ground truth for the loss computation against the four-channel RGBA images rendered by the differentiable renderer for the corresponding camera poses; the resulting loss gradient is back-propagated to the geometry encoding perceptron, the texture encoding perceptron, and the initialized environment HDR map for parameter updates and optimization.
Step A05: obtain the preset 3D reconstruction model under the optimization objective of minimizing the empirical risk.

It should be noted that the model training process can be viewed as an optimization task over a target shape representation: a triangular mesh with geometry, texture material, and lighting. The loss can be computed as:

$$L_{total} = l_1(S, R) + L_{light} + L_{mat}$$

where $l_1(S, R)$ is the $l_1$ loss between the current rendering $S$ and the foreground image $R$. The regularization term $L_{light}$ constrains the per-channel brightness values $c_i$ of the environment HDR map not to differ too much from one another, matching the predominantly white light of real-world environments. The regularization term $L_{mat}$ constrains the base colors at neighboring mesh surface points ($k_d(x_{surf})$ and $k_d(x_{surf} + \epsilon)$) not to differ too much, making the texture smoother.
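A minimal PyTorch sketch of this loss; the exact forms of $L_{light}$ and $L_{mat}$ and the weights are not specified above, so the regularizers and defaults below are plausible stand-ins for the stated intent:

```python
import torch

def total_loss(render, gt, env_hdr, kd_surf, kd_surf_eps,
               w_light=0.005, w_mat=0.03):
    """L_total = l1(S, R) + L_light + L_mat.
    render/gt: (H, W, 4) RGBA images; env_hdr: (He, We, 3) HDR environment map;
    kd_surf, kd_surf_eps: base colors at surface points x_surf and x_surf + eps."""
    l1 = (render - gt).abs().mean()
    # L_light: keep the per-channel brightness c_i of the HDR map close together
    # (approximately white light).
    c = env_hdr.mean(dim=(0, 1))
    l_light = (c - c.mean()).abs().mean()
    # L_mat: neighboring surface points should have similar base color.
    l_mat = (kd_surf - kd_surf_eps).abs().mean()
    return l1 + w_light * l_light + w_mat * l_mat
```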
In a specific implementation, under the optimization objective of minimizing the empirical risk, the preset differentiable renderer diffrast executes the above optimization task, and back-propagation of the loss gradients updates the parameters of the geometry, texture material, and lighting, thereby optimizing the geometry encoding perceptron, the texture encoding perceptron, and the initialized environment HDR map. This eventually yields a well-performing preset 3D reconstruction model, through which the target 3D model corresponding to the insertion target is constructed.

Step S30: obtain video reconstruction information from the optical flow information of the video to be inserted combined with a preset optimization algorithm, the video reconstruction information including camera intrinsic and extrinsic parameters and a dense point cloud.
It should be noted that the optical flow information describes the pixel motion in the video to be inserted; from it, the depth information of each video frame can be obtained, from which a dense point cloud of the scene of the video to be inserted can be reconstructed.

It should be understood that the preset optimization algorithm may be self-calibrating bundle adjustment, whose optimization task is to minimize the reprojection error; it can be solved with, for example, gradient descent, Newton's method, or the Gauss-Newton method, thereby recovering the camera intrinsic and extrinsic parameters of the video to be inserted.

Step S40: insert the target 3D model into a user-selected region of the video to be inserted according to the camera intrinsic and extrinsic parameters and the dense point cloud, obtaining an edited video.

It should be understood that the user can click in the editing device's interactive interface to determine the insertion position of the target 3D model in the video to be inserted.

In a specific implementation, after the camera intrinsic and extrinsic parameters and the dense point cloud are obtained, the coordinate points selected by the user in the current frame of the video to be inserted are received and the user-selected region is generated; based on the user-selected region, the viewing-angle image of the target 3D model corresponding to each video frame of the video to be inserted is rendered; and the multi-frame renders are composited into the final result video, i.e., the edited video.

In this embodiment, the user only needs to provide a surround video of the insertion target and click in the surround video to select the target to be reconstructed for the preset 3D reconstruction model to produce the target 3D model. This simplifies the traditional 3D reconstruction workflow and reduces its cost; the target 3D model is then inserted into the user-selected region of the video to be inserted according to the video reconstruction information, realizing an end-to-end video editing process from 3D model production to video insertion and greatly increasing editing speed compared with existing approaches.
Building on the first embodiment of the present application, in the second embodiment, content identical or similar to the first embodiment can be found in the description above and is not repeated. On this basis, referring to FIG. 2, FIG. 2 is a flow chart of Embodiment 2 of the video editing method of the present application; step S30 specifically includes steps S301 to S303:

Step S301: input a number of video frames of the video to be inserted into an optical flow prediction model to determine the optical flow displacement value of each video frame.

It should be noted that the optical flow prediction model may be RAFT (Recurrent All-Pairs Field Transforms), a deep neural network for optical flow estimation. RAFT uses deep learning to learn pixel-level motion information from image sequences and handles complex motion better than traditional optical flow algorithms: it refines the flow field with a recurrent update operator over all-pairs correlation volumes, which improves the accuracy and stability of optical flow estimation.

In a specific implementation, the video to be inserted is input into RAFT, yielding each video frame's optical flow displacement value relative to the previous frame, i.e., computing the optical flow information of each frame of the video to be inserted.

Step S302: when the optical flow displacement value is greater than a preset threshold, add the corresponding video frame to a keyframe set.

In a specific implementation, an optical flow threshold can be set: video frames whose optical flow displacement exceeds the threshold are added to the keyframe set, frames whose displacement does not exceed it are skipped, and after all frames have been traversed the keyframe set of the video to be inserted is obtained.
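A hedged sketch of steps S301 and S302 using torchvision's pretrained RAFT; the threshold value and the helper name `select_keyframes` are illustrative assumptions:

```python
import torch
from torchvision.models.optical_flow import raft_large, Raft_Large_Weights

weights = Raft_Large_Weights.DEFAULT
model = raft_large(weights=weights).eval()
transforms = weights.transforms()

@torch.no_grad()
def select_keyframes(frames: list[torch.Tensor], threshold: float = 2.4) -> list[int]:
    """frames: list of (3, H, W) uint8 tensors (H and W divisible by 8).
    Returns indices of frames whose mean flow magnitude relative to the
    previous frame exceeds the threshold (in pixels)."""
    keyframes = [0]                                  # keep the first frame
    for i in range(1, len(frames)):
        img1, img2 = transforms(frames[i - 1][None], frames[i][None])
        flow = model(img1, img2)[-1]                 # finest prediction, (1, 2, H, W)
        displacement = flow.norm(dim=1).mean().item()
        if displacement > threshold:
            keyframes.append(i)
    return keyframes
```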
Step S303: determine the video reconstruction information within the keyframe set using the preset optimization algorithm, obtaining the camera intrinsic and extrinsic parameters and the dense point cloud.

It should be noted that, to improve optimization efficiency, the optimization task of obtaining the above video reconstruction information can be divided into an online part and an offline part. Step S303 therefore further includes steps S3031 to S3033:

Step S3031: divide the keyframe set into an online optimization set and an offline optimization set.

It should be noted that the online optimization set may take only the most recent 25 frames of the keyframe set as the set to be optimized for the online local estimation task; the offline optimization set corresponds to the offline global optimization task and may include all keyframes in the keyframe set.

Step S3032: determine, within the online optimization set and using the preset optimization algorithm, the relative camera pose, depth map, and camera intrinsics corresponding to each video frame.

Step S3033: update, within the offline optimization set and using the preset optimization algorithm, the relative camera poses, the depth maps, and the camera intrinsics, obtaining the camera intrinsic and extrinsic parameters and the dense point cloud.

It should be noted that the online local estimation task can run concurrently as the frames of the video to be inserted are read in; once the whole frame sequence has been read and online local estimation is complete, the relative camera pose, depth map, and camera intrinsics of each video frame are available, and the offline global optimization task can then be executed to fine-tune and update the parameters.

It should be understood that the set to be optimized comprises the online optimization set and the offline optimization set, corresponding respectively to the online local estimation task and the offline global optimization task. When executing the optimization task, for any two frames i and j in the set to be optimized whose optical flow displacement is within a specified range, the camera intrinsics and extrinsics and the dense point cloud can be estimated by minimizing the reprojection error via self-calibrating bundle adjustment.
The objective function for minimizing the reprojection error is:

$$\underset{G,\,z,\,\theta}{\arg\min} \sum_{(i,j)\in P} \left\| u_{ij}^{*} - \pi\!\left(G_{ij} \circ \pi^{-1}\!\left(u_i, z_i, \theta\right),\, \theta\right) \right\|_{\Sigma_{ij}}^{2}$$

where $P$ is the set to be optimized; $G_{ij}$ is the relative camera pose between frame $i$ and frame $j$; $u_{ij}^{*}$ is the coordinates in frame $j$, estimated by optical flow, of the pixel coordinates $u_i$ in frame $i$; $z_i$ is the depth map of frame $i$; $\theta$ is the camera intrinsics; and $\Sigma_{ij}$ is the estimated confidence weight (the larger $\Sigma_{ij}$ is, the more accurate the estimate of $u_{ij}^{*}$). The projection $\pi(\cdot, \theta)$ maps 3D coordinates in the camera frame to pixel coordinates according to the intrinsics, and $\pi^{-1}(\cdot, \theta)$ back-projects pixel coordinates into the camera frame using the intrinsics and the corresponding depth values. Since minimizing the reprojection error is a nonlinear optimization problem, the Gauss-Newton method can be used to iteratively optimize the relative poses $G_{ij}$, the keyframe depths $z_i$, and the camera intrinsics $\theta$.
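To make the objective concrete, the sketch below evaluates one confidence-weighted reprojection residual for a pinhole camera; it illustrates the formula rather than the full Gauss-Newton solver, and all names are hypothetical:

```python
import torch

def reproject(u_i, z_i, K, R_ij, t_ij):
    """pi(G_ij o pi^{-1}(u_i, z_i, theta), theta) for a pinhole camera.
    u_i: (N, 2) pixel coords in frame i; z_i: (N,) depths; K: (3, 3) intrinsics;
    R_ij, t_ij: relative pose of frame j with respect to frame i."""
    ones = torch.ones(u_i.shape[0], 1)
    # pi^{-1}: back-project pixels to 3D points in frame i's camera coordinates.
    X_i = (torch.linalg.inv(K) @ torch.cat([u_i, ones], dim=1).T).T * z_i[:, None]
    X_j = (R_ij @ X_i.T).T + t_ij             # G_ij: transform into frame j
    uvw = (K @ X_j.T).T                       # pi: project with the intrinsics
    return uvw[:, :2] / uvw[:, 2:3]

def reprojection_residual(u_star_ij, u_i, z_i, K, R_ij, t_ij, conf_ij):
    """Confidence-weighted squared residual u*_ij - pi(G_ij o pi^{-1}(u_i, z_i))."""
    r = u_star_ij - reproject(u_i, z_i, K, R_ij, t_ij)
    return (conf_ij * r.pow(2).sum(dim=1)).sum()
```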
In a specific implementation, once the above optimization task is complete, the relative camera pose of each video frame (the camera extrinsics), the depth values of each video frame (the dense point cloud), and the camera intrinsics of the frame sequence can all be recovered, completing the video reconstruction of the video to be inserted.

In this embodiment, the video reconstruction of the video to be inserted uses self-calibrating bundle adjustment combined with the RAFT optical flow algorithm, which can recover the camera intrinsics and extrinsics of any continuous video, making it applicable to any video to be inserted whose camera parameters are unknown, and reconstructs a dense point cloud of the corresponding scene. This strengthens localization in weakly textured and repetitively textured regions of the scene, solving the problem that the sparse scene point clouds recovered by traditional camera tracking software cannot localize such regions in the video.
Building on the first and/or second embodiments of the present application, in the third embodiment, content identical or similar to the first and second embodiments can be found in the description above and is not repeated. On this basis, referring to FIG. 3, FIG. 3 is a flow chart of Embodiment 3 of the video editing method of the present application; step S40 includes steps S401 to S404:

Step S401: upon receiving a selection instruction from the user, generate a user-selected region in the video to be inserted.

It should be noted that the user can click in the editing device's interactive interface to determine the insertion position of the target 3D model in the video to be inserted; specifically, the user can click four coordinate points on the first frame of the video to be inserted to generate the user-selected region.

Step S402: back-project the pixel coordinates of the user-selected region into the camera coordinate system of the video to be inserted according to the camera intrinsic and extrinsic parameters and the dense point cloud, obtaining the 3D point cloud coordinates of the user-selected region.

It should be noted that the pixel coordinates $u_r$ of the user-selected region can be back-projected to 3D point cloud coordinates in the camera frame according to the back-projection formula $\pi^{-1}(u_r, z_r, \theta)$, where $z_r$ is the dense point cloud depth value of the video to be inserted, thereby determining the 3D point cloud coordinates of the user-selected region in the camera coordinate system of the video to be inserted.
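Under the same pinhole-camera assumption as the sketch in Embodiment 2 (hypothetical names; $z_r$ taken from the reconstructed dense depth), lifting the clicked region pixels to 3D might look like:

```python
import torch

def backproject_region(u_r: torch.Tensor, z_r: torch.Tensor, K: torch.Tensor):
    """pi^{-1}(u_r, z_r, theta): lift the user-selected pixels to camera coordinates.
    u_r: (4, 2) clicked pixel coords; z_r: (4,) depths from the dense point cloud."""
    ones = torch.ones(u_r.shape[0], 1)
    rays = (torch.linalg.inv(K) @ torch.cat([u_r, ones], dim=1).T).T
    return rays * z_r[:, None]                # (4, 3) points in the camera frame
```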
Step S403: determine the 3D plane normal vector corresponding to the video to be inserted, and preprocess the target 3D model in combination with the 3D point cloud coordinates of the user-selected region.

It should be understood that the 3D plane normal vector can be determined in the camera coordinate system of the video to be inserted based on a random sample consensus algorithm, and the target 3D model can then be preprocessed based on the 3D plane normal vector in combination with the 3D point cloud coordinates of the user-selected region.

It should be noted that the random sample consensus (RANSAC) algorithm is an iterative optimization algorithm based on random sampling and consistency checking; the normal vector can be estimated by fitting a plane model, and RANSAC also filters out pixels mistakenly selected by the user and noisy depth estimates.

It should also be noted that, since the plane normal estimated by RANSAC may point either up or down, an additional constraint can be added to prevent the target 3D model from being inserted upside down. Specifically, for the estimated plane normal $\vec{n}$, the world coordinates $c$ of the camera position, and the coordinates $p$ of any point on the plane, the condition $\vec{n} \cdot (c - p) > 0$ (i.e., the normal points toward the camera) must be satisfied; if it is not, the normal $\vec{n}$ is reversed.
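A compact sketch of this plane estimation and orientation check; the iteration count and inlier tolerance are arbitrary assumptions:

```python
import torch

def fit_plane_ransac(pts: torch.Tensor, cam_center: torch.Tensor,
                     iters: int = 256, tol: float = 0.01):
    """Fit a plane to 3D points by RANSAC and orient its normal toward the camera.
    pts: (N, 3) region points; cam_center: (3,) camera position (same frame)."""
    best_n, best_p, best_inliers = None, None, -1
    for _ in range(iters):
        idx = torch.randperm(pts.shape[0])[:3]          # minimal sample: 3 points
        p0, p1, p2 = pts[idx]
        n = torch.linalg.cross(p1 - p0, p2 - p0)
        if n.norm() < 1e-8:
            continue                                    # degenerate (collinear) sample
        n = n / n.norm()
        inliers = ((pts - p0) @ n).abs() < tol          # point-to-plane distance test
        if inliers.sum() > best_inliers:
            best_n, best_p, best_inliers = n, p0, inliers.sum()
    if best_n @ (cam_center - best_p) < 0:              # enforce n . (c - p) > 0
        best_n = -best_n                                # otherwise flip the normal
    return best_n, best_p
```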
It should be understood that, after the 3D plane normal vector is obtained, the editing device can preprocess the target 3D model by rotating its y-axis into alignment with the plane normal so that the model appears to rest on the plane. Further, to fit the bottom face of the target 3D model to the user-selected region as closely as possible, the center of the model's bottom face can be moved to the plane center of the user-selected region based on the region's 3D point cloud coordinates, and the model can be scaled according to the ratio between the extent of the user-selected region along the x and z axes and the width of the target 3D model along the x and z axes.
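The placement itself could be sketched as follows, assuming a model whose "up" direction is the y-axis; the rotation uses Rodrigues' formula and the scaling rule follows the x/z-extent ratio described above (all names hypothetical):

```python
import torch

def rotation_aligning(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Rotation matrix taking unit vector a to unit vector b (Rodrigues' formula)."""
    v = torch.linalg.cross(a, b)
    c = torch.dot(a, b)
    vx = torch.tensor([[0., -v[2], v[1]],
                       [v[2], 0., -v[0]],
                       [-v[1], v[0], 0.]])
    return torch.eye(3) + vx + vx @ vx / (1 + c)        # undefined only when a = -b

def place_model(verts: torch.Tensor, normal: torch.Tensor,
                region_pts: torch.Tensor) -> torch.Tensor:
    """Rotate the model's y-axis onto the plane normal, scale it to the region's
    x/z extent, and move its bottom-face center to the region center."""
    R = rotation_aligning(torch.tensor([0.0, 1.0, 0.0]), normal)
    extent_model = verts.max(0).values - verts.min(0).values
    extent_region = region_pts.max(0).values - region_pts.min(0).values
    s = (extent_region[[0, 2]] / extent_model[[0, 2]]).min()  # uniform x/z fit
    v = verts * s
    bottom_center = torch.tensor([v[:, 0].mean(), v[:, 1].min(), v[:, 2].mean()])
    return (v - bottom_center) @ R.T + region_pts.mean(0)
```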
Step S404: render images of the preprocessed target 3D model from different viewing angles according to the camera intrinsic and extrinsic parameters, and composite the viewing-angle images with the corresponding video frames of the video to be inserted to synthesize the edited video.

It should be noted that, after the preprocessing of the target 3D model is complete, the viewing-angle image corresponding to each video frame of the video to be inserted can be rendered using the camera intrinsics and extrinsics obtained in the video reconstruction process above; these viewing-angle images are RGBA images with the same pixel dimensions as the video frames.

It should also be noted that, because the preprocessing places the target 3D model in the user-selected region at the initialization of the first frame of the video to be inserted, the 2D result of rendering the target 3D model lies in the user-selected region in the RGBA images of all viewing angles.

Further, considering that the above video reconstruction recovers camera extrinsics only for keyframes, the approximate extrinsics of the non-keyframes of the video to be inserted can be obtained by linear interpolation between adjacent keyframes.

It should also be noted that, since rendering is relatively independent across the video frames, the PyTorch-tensor-based renderer PyTorch3D can render multiple frames in parallel and synthesize the final result video; the compositing of each rendered RGBA image with its corresponding video frame can likewise be parallelized.

In a specific implementation, the RGBA image corresponding to each video frame is rendered from the camera intrinsics of the video to be inserted and the camera extrinsics of each of its frames, and each RGBA image is composited with its video frame to produce the resulting frames, yielding the edited video.
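The per-frame compositing is the standard alpha "over" operator, which might be sketched as:

```python
import torch

def composite_over(rgba: torch.Tensor, frame: torch.Tensor) -> torch.Tensor:
    """Alpha-composite a rendered RGBA image over a video frame.
    rgba: (H, W, 4) render in [0, 1]; frame: (H, W, 3) video frame in [0, 1]."""
    rgb, alpha = rgba[..., :3], rgba[..., 3:4]
    return alpha * rgb + (1.0 - alpha) * frame

# The per-frame composites are independent, so they batch directly:
# result = alpha * renders_rgb + (1 - alpha) * frames  with (T, H, W, .) tensors.
```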
Further, the editing device's interactive interface can also provide the user with an interface for fine-tuning the size and orientation of the target 3D model, so that the user can further adjust the insertion result.

In this embodiment, the 2D pixel coordinates of the user-selected region can be back-projected into the world coordinate system, the 3D plane parameters estimated, and the initial size of the target 3D model adjusted to the size of the user-selected region. The user only needs to click on the first frame of the video to be inserted to generate the selected region, without setting camera and plane parameters in a 3D rendering space as in traditional post-production software, and without manually aligning the video plane with the 3D plane. This provides a user-friendly video editing method that can meet ordinary users' needs for inserting 3D models into existing videos.
本实施例还提供一种视频编辑装置,参考图4,图4为本申请视频编辑装置实施例一提供的模块结构示意图;所述视频编辑装置包括:模型生成模块401、视频重建模块402以及模型植入模块403;This embodiment also provides a video editing device. Referring to FIG. 4 , FIG. 4 is a schematic diagram of a module structure provided in Embodiment 1 of the video editing device of this application. The video editing device includes: a model generation module 401, a video reconstruction module 402, and a model implantation module 403.
所述模型生成模块401,用于获取插入目标的环绕视频,所述环绕视频中包含所述插入目标在不同视角下的视频帧;The model generation module 401 is used to obtain a surround video of the inserted target, wherein the surround video includes video frames of the inserted target at different viewing angles;
所述模型生成模块401,还用于将所述环绕视频输入至预设三维重建模型,获得目标三维模型,所述预设三维重建模型由所述环绕视频分割得到的各视频帧训练所得,所述预设三维重建模型包括几何编码感知机以及纹理编码感知机;The model generation module 401 is further used to input the surround video into a preset 3D reconstruction model to obtain a target 3D model, wherein the preset 3D reconstruction model is trained by each video frame obtained by segmenting the surround video, and the preset 3D reconstruction model includes a geometric coding perceptron and a texture coding perceptron;
所述视频重建模块402,用于根据待插入视频的光流信息结合预设优化算法获得视频重建信息,所述视频重建信息包括相机内外参信息以及稠密点云信息;The video reconstruction module 402 is used to obtain video reconstruction information according to the optical flow information of the video to be inserted in combination with a preset optimization algorithm, wherein the video reconstruction information includes camera internal and external parameter information and dense point cloud information;
The model implantation module 403 is configured to implant the target three-dimensional model into the user-selected area of the video to be inserted according to the camera intrinsic/extrinsic parameter information and the dense point cloud information, to obtain the edited video.
Furthermore, the model generation module 401 is further configured to: initialize a tetrahedral mesh, input the three-dimensional coordinates of the tetrahedral mesh into the geometry-encoding perceptron to obtain the corresponding geometry information, and convert the geometry information into a target mesh; input the three-dimensional coordinates of the target mesh into the texture-encoding perceptron to obtain texture information; input the environment lighting map, the camera pose information of the surround video, the target mesh, and the texture information into a preset differentiable renderer to obtain a current rendering; compute a loss between the current rendering and the foreground images of the insertion target at different viewing angles to obtain loss gradients, and optimize the geometry-encoding perceptron, the texture-encoding perceptron, and the environment lighting map through the loss gradients via gradient back-propagation; and obtain the preset three-dimensional reconstruction model under the optimization objective of minimizing the empirical risk.
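This training procedure can be condensed into the following skeleton. It is only a sketch: `tet_verts`, `tets`, `loader`, `marching_tetrahedra`, and `differentiable_render` are placeholders for the tetrahedral grid, the data loader over surround-video frames, the mesh-extraction step, and the preset differentiable renderer; none of them are APIs defined by this application, and the network widths and learning rate are illustrative.

```python
import torch
import torch.nn as nn

class CoordMLP(nn.Sequential):
    """Small coordinate MLP used for both geometry and texture encoding."""
    def __init__(self, out_dim: int):
        super().__init__(nn.Linear(3, 256), nn.ReLU(),
                         nn.Linear(256, 256), nn.ReLU(),
                         nn.Linear(256, out_dim))

geometry_mlp = CoordMLP(out_dim=4)   # assumed per-vertex SDF value + offset
texture_mlp = CoordMLP(out_dim=3)    # per-point RGB texture
env_map = nn.Parameter(torch.rand(6, 16, 16, 3))  # learnable environment lighting

opt = torch.optim.Adam([*geometry_mlp.parameters(),
                        *texture_mlp.parameters(), env_map], lr=1e-3)

for pose, foreground in loader:                     # surround-video frames
    geo = geometry_mlp(tet_verts)                   # geometry info on the tet grid
    mesh_v, mesh_f = marching_tetrahedra(tet_verts, tets, geo)  # target mesh
    color = texture_mlp(mesh_v)                     # texture info on the mesh
    render = differentiable_render(mesh_v, mesh_f, color, env_map, pose)
    loss = torch.nn.functional.mse_loss(render, foreground)  # empirical risk
    opt.zero_grad()
    loss.backward()                                 # gradient back-propagation
    opt.step()
```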
Furthermore, the model generation module 401 is further configured to: upon receiving a click instruction from the user, generate a mask region corresponding to the insertion target in the current video frame of the surround video according to the click instruction; and segment the other video frames obtained from the surround video according to the mask region, to obtain foreground images of the insertion target at different viewing angles.
Furthermore, the video reconstruction module 402 is further configured to: input several video frames of the video to be inserted into an optical flow prediction model to determine the optical flow displacement value of each video frame; when the optical flow displacement value is greater than a preset threshold, add the corresponding video frame to a keyframe set; and determine the video reconstruction information within the keyframe set using the preset optimization algorithm, to obtain the camera intrinsic/extrinsic parameter information and the dense point cloud information.
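A sketch of this keyframe filtering is given below, assuming a `flow_model` callable that returns a dense (H, W, 2) optical flow field between two frames; the threshold value is an illustrative assumption:

```python
import torch

def select_keyframes(frames, flow_model, threshold: float = 2.0):
    """Keep frames whose mean optical-flow displacement from the most recent
    keyframe exceeds the preset threshold (in pixels)."""
    keyframes = [frames[0]]
    for frame in frames[1:]:
        flow = flow_model(keyframes[-1], frame)              # (H, W, 2)
        displacement = float(flow.norm(dim=-1).mean())       # mean flow magnitude
        if displacement > threshold:
            keyframes.append(frame)
    return keyframes
```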
Furthermore, the video reconstruction module 402 is further configured to: divide the keyframe set into an online optimization set and an offline optimization set; determine, in the online optimization set with the preset optimization algorithm, the relative camera pose, depth map, and camera intrinsics of each video frame; and update, in the offline optimization set with the preset optimization algorithm, the relative camera pose, the depth map, and the camera intrinsics, to obtain the camera intrinsic/extrinsic parameter information and the dense point cloud information.
Furthermore, the model implantation module 403 is further configured to: back-project the pixel coordinates of the user-selected area into the camera coordinate system of the video to be inserted according to the camera intrinsic/extrinsic parameter information and the dense point cloud information, to obtain the three-dimensional point cloud coordinates of the user-selected area; determine the three-dimensional plane normal vector of the video to be inserted, and preprocess the target three-dimensional model in combination with the three-dimensional point cloud coordinates of the user-selected area; and render, according to the camera intrinsic/extrinsic parameter information, images of the preprocessed target three-dimensional model at different viewing angles, and synthesize the edited video from these view images and the corresponding video frames of the video to be inserted.
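The preprocessing step, that is, scaling the model and standing it on the estimated plane at the selected area, might look as follows; this is a sketch assuming the model's canonical up axis is +z, with `anchor` being, for example, the centroid of the selected area's three-dimensional point cloud:

```python
import numpy as np

def align_model_to_plane(vertices: np.ndarray, normal: np.ndarray,
                         anchor: np.ndarray, scale: float) -> np.ndarray:
    """Rotate the model's up axis onto the plane normal, scale it to match
    the selected area, and translate it onto the plane."""
    up = np.array([0.0, 0.0, 1.0])            # assumed canonical up axis
    v, c = np.cross(up, normal), float(up @ normal)
    if np.linalg.norm(v) < 1e-8:              # already aligned or anti-aligned
        R = np.eye(3) if c > 0 else np.diag([1.0, -1.0, -1.0])
    else:
        vx = np.array([[0, -v[2], v[1]],
                       [v[2], 0, -v[0]],
                       [-v[1], v[0], 0]])
        R = np.eye(3) + vx + vx @ vx / (1.0 + c)   # Rodrigues rotation formula
    return (vertices * scale) @ R.T + anchor
```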
Furthermore, the model implantation module 403 is further configured to: determine the three-dimensional plane normal vector in the camera coordinate system of the video to be inserted based on a random sample consensus (RANSAC) algorithm; and preprocess the target three-dimensional model based on the three-dimensional plane normal vector in combination with the three-dimensional point cloud coordinates of the user-selected area.
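A minimal RANSAC sketch for estimating the plane normal from the back-projected points follows; the iteration count and inlier tolerance are illustrative assumptions:

```python
import numpy as np

def ransac_plane_normal(points: np.ndarray, iters: int = 500,
                        tol: float = 0.01) -> np.ndarray:
    """Estimate the unit normal of the dominant plane in an (M, 3) point set."""
    best_normal, best_inliers = None, -1
    for _ in range(iters):
        p0, p1, p2 = points[np.random.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        if np.linalg.norm(n) < 1e-8:        # degenerate (collinear) sample
            continue
        n = n / np.linalg.norm(n)
        dist = np.abs((points - p0) @ n)    # point-to-plane distances
        inliers = int((dist < tol).sum())
        if inliers > best_inliers:          # keep the plane with the most support
            best_normal, best_inliers = n, inliers
    return best_normal
```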
In this embodiment, the user only needs to provide a surround video of the insertion target: the model generation module produces the target three-dimensional model of the insertion target; the video reconstruction module recovers the video reconstruction information, including the camera intrinsics/extrinsics and the dense point cloud of the video to be inserted; and finally the model implantation module inserts the target three-dimensional model into the user-selected area according to the video reconstruction information, then renders and composites the edited video. This realizes an integrated video editing pipeline from three-dimensional model generation to video insertion, meets the needs of general users who wish to insert three-dimensional models into existing videos, and, compared with existing approaches, lowers the barrier to video editing and increases editing speed.
This application provides a video editing device. The video editing device includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the video editing method of Embodiment 1 above.
Referring now to FIG. 5, FIG. 5 is a schematic diagram of the device structure of the hardware operating environment involved in the video editing method of the embodiments of this application. The video editing device in the embodiments of this application may include, but is not limited to, mobile terminals such as mobile phones, laptop computers, digital broadcast receivers, PDAs (Personal Digital Assistants), tablet computers (PADs), PMPs (Portable Media Players), and vehicle-mounted terminals (e.g., vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The video editing device shown in FIG. 5 is merely an example and should not impose any limitation on the functions and scope of use of the embodiments of this application.
As shown in FIG. 5, the video editing device may include a processing apparatus 1001 (e.g., a central processing unit, a graphics processor, etc.), which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1002 or a program loaded from a storage apparatus 1003 into a random access memory (RAM) 1004. The RAM 1004 also stores various programs and data required for the operation of the video editing device. The processing apparatus 1001, the ROM 1002, and the RAM 1004 are connected to one another via a bus 1005. An input/output (I/O) interface 1006 is also connected to the bus 1005. In general, the following may be connected to the I/O interface 1006: an input apparatus 1007 including, for example, a touch screen, a touch pad, a keyboard, a mouse, an image sensor, a microphone, an accelerometer, a gyroscope, etc.; an output apparatus 1008 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; the storage apparatus 1003 including, for example, a magnetic tape, a hard disk, etc.; and a communication apparatus 1009. The communication apparatus 1009 allows the video editing device to communicate wirelessly or by wire with other devices to exchange data. Although the figure shows a video editing device with various apparatuses, it should be understood that not all of the illustrated apparatuses are required to be implemented or provided; more or fewer apparatuses may alternatively be implemented or provided.
In particular, according to the embodiments disclosed in this application, the process described above with reference to the flowchart may be implemented as a computer software program. For example, the embodiments disclosed in this application include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication apparatus, or installed from the storage apparatus 1003, or installed from the ROM 1002. When the computer program is executed by the processing apparatus 1001, the above-described functions defined in the methods of the embodiments disclosed in this application are performed.
The video editing device provided by this application adopts the video editing method of the above embodiment and can solve the technical problems of video editing. Compared with the prior art, the beneficial effects of the video editing device provided by this application are the same as those of the video editing method provided by the above embodiment, and the other technical features of the video editing device are the same as those disclosed in the method of the previous embodiment, which will not be repeated here.
It should be understood that the various parts disclosed in this application may be implemented in hardware, software, firmware, or a combination thereof. In the description of the above embodiments, specific features, structures, materials, or characteristics may be combined in a suitable manner in any one or more embodiments or examples.
The above are only specific embodiments of this application, but the protection scope of this application is not limited thereto. Any changes or substitutions that a person skilled in the art could readily conceive within the technical scope disclosed in this application shall be covered by the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.
This application provides a computer-readable storage medium having computer-readable program instructions (i.e., a computer program) stored thereon, the computer-readable program instructions being used to execute the video editing method of the above embodiment.
The above computer-readable storage medium may be included in the video editing device, or it may exist separately without being assembled into the video editing device.
The above computer-readable storage medium carries one or more programs that, when executed by the video editing device, cause the video editing device to: obtain a surround video of an insertion target, the surround video containing video frames of the insertion target at different viewing angles; input the surround video into a preset three-dimensional reconstruction model to obtain a target three-dimensional model, the preset three-dimensional reconstruction model being trained on the video frames obtained by segmenting the surround video; obtain video reconstruction information from the optical flow information of the video to be inserted in combination with a preset optimization algorithm, the video reconstruction information including camera intrinsic/extrinsic parameter information and dense point cloud information; and implant the target three-dimensional model into the user-selected area of the video to be inserted according to the camera intrinsic/extrinsic parameter information and the dense point cloud information, to obtain the edited video.
The readable storage medium provided by this application is a computer-readable storage medium storing computer-readable program instructions (i.e., a computer program) for executing the above video editing method, and can solve the technical problems of video editing. Compared with the prior art, the beneficial effects of the computer-readable storage medium provided by this application are the same as those of the video editing method provided by the above embodiment and will not be repeated here.
The above are only some embodiments of this application and are not intended to limit the patent scope of this application. Any equivalent structural transformation made using the contents of the specification and drawings of this application under the technical concept of this application, or any direct or indirect application in other related technical fields, is likewise included within the patent protection scope of this application.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410577428.6A CN118474327B (en) | 2024-05-10 | 2024-05-10 | Video editing method, device, equipment and storage medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN118474327A true CN118474327A (en) | 2024-08-09 |
| CN118474327B CN118474327B (en) | 2024-12-10 |
Family
ID=92153427
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202410577428.6A Active CN118474327B (en) | 2024-05-10 | 2024-05-10 | Video editing method, device, equipment and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN118474327B (en) |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20180336724A1 (en) * | 2017-05-17 | 2018-11-22 | DotProduct LLC | Augmentation of captured 3d scenes with contextual information |
| WO2020038872A1 (en) * | 2018-08-21 | 2020-02-27 | Tejado GmbH | Method for superimposing a virtual three-dimensional independent object on a real-time video stream |
| CN111599005A (en) * | 2020-05-19 | 2020-08-28 | 上海万面智能科技有限公司 | Three-dimensional model implantation method and device, electronic equipment and storage medium |
| CN116051722A (en) * | 2022-07-15 | 2023-05-02 | 咪咕文化科技有限公司 | Three-dimensional head model reconstruction method, device and terminal |
| CN117611685A (en) * | 2023-12-04 | 2024-02-27 | 西安中科光电精密工程有限公司 | Point cloud optimization method and related device based on beam adjustment and optical flow estimation |
| CN117893713A (en) * | 2023-09-25 | 2024-04-16 | 武汉联影智融医疗科技有限公司 | Image display method, device, system, mobile terminal and storage medium |
| CN117974897A (en) * | 2024-01-31 | 2024-05-03 | 浪潮云信息技术股份公司 | AI simulation method, system, device and medium based on digital twin |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |