
CN113570730A - Video data collection method, video creation method and related products

Info

Publication number
CN113570730A
CN113570730A (application CN202110865827.9A)
Authority
CN
China
Prior art keywords: data, video, image, dimensional, target
Prior art date
Legal status
Granted
Application number
CN202110865827.9A
Other languages
Chinese (zh)
Other versions
CN113570730B (en)
Inventor
周礼
Current Assignee
Shenzhen TetrasAI Technology Co Ltd
Original Assignee
Shenzhen TetrasAI Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen TetrasAI Technology Co Ltd filed Critical Shenzhen TetrasAI Technology Co Ltd
Priority to CN202110865827.9A
Publication of CN113570730A
Application granted
Publication of CN113570730B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/006 Mixed reality
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Embodiments of the present application provide a video data collection method, a video creation method, and related products. The video data collection method includes: collecting image data and pose data of at least one frame of image of an original video; and calculating target augmented reality (AR) data according to the image data and the pose data, where the target AR data includes at least one of plane data, anchor point data, and grid data, and the target AR data is used for video augmented reality creation. The embodiments of the present application can improve the processing effect of video augmented reality.

Figure 202110865827

Description

Video data acquisition method, video creation method and related products
Technical Field
The application relates to the technical field of video processing, in particular to a video data acquisition method, a video creation method and a related product.
Background
In recent years, with the widespread adoption of the mobile internet, people's demand for entertainment has kept growing, and new elements have gradually been blended into various two-dimensional forms, from the early simple photographing and beauty-camera photos to the later short videos and live streaming. At present, the video editing and video creation tools on the market still operate at a two-dimensional processing level, and the sense of realism of their augmented reality effects is poor.
Disclosure of Invention
The embodiment of the application provides a video data acquisition method, a video creation method and a related product, and improves the processing effect of video augmented reality.
A first aspect of an embodiment of the present application provides a video data acquisition method, including:
acquiring image data and pose data of at least one frame of image of an original video;
calculating target Augmented Reality (AR) data according to the image data and the pose data; the target AR data comprises at least one of plane data, anchor data and grid data; the target AR data is used for video augmented reality authoring.
When video data are collected, target AR data used for video augmented reality creation are calculated from the image data and pose data of at least one frame of image of the collected original video. Because the target AR data include plane data, anchor point data, and grid data, the planes of each frame of image on which material can be added can be determined from the plane data, the three-dimensional anchor point coordinates and the orientation of the material that can be added in each frame of image can be determined from the anchor point data, and the three-dimensional grid of each frame of image can be determined from the grid data. The added material therefore combines with each frame of image in the video more realistically, which improves the processing effect of video augmented reality.
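As a rough illustration of how the per-frame target AR data described above could be organized, the following Python sketch defines minimal containers for plane, anchor point, and grid data; all class and field names are assumptions for illustration and do not come from the application itself.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class PlaneData:
    center: Vec3                      # plane center in 3D space
    normal: Vec3                      # plane normal vector
    extent: Tuple[float, float]       # width and height of the detected plane

@dataclass
class AnchorData:
    position: Vec3                    # 3D coordinates of the anchor point
    normal: Vec3                      # 3D normal vector giving the material orientation

@dataclass
class GridData:
    vertices: List[Vec3]                       # 3D points on the object surface
    triangles: List[Tuple[int, int, int]]      # indices into `vertices`

@dataclass
class TargetARData:
    planes: List[PlaneData] = field(default_factory=list)
    anchors: List[AnchorData] = field(default_factory=list)
    grids: List[GridData] = field(default_factory=list)
```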
Optionally, the calculating target augmented reality AR data according to the image data and the pose data includes:
calculating initial AR data according to the image data and the pose data;
and filtering the initial AR data according to the quality score of the initial AR data, and reserving the target AR data with the quality score larger than a set threshold value.
The embodiment of the application can filter the initial AR data through the quality score of the initial AR data, so that the reliability of the filtered target AR data is greatly enhanced, video creation is performed on the reliable target AR data, and the probability of the effect distortion of the video creation is reduced.
Optionally, after calculating augmented reality target AR data according to the image data and the pose data, the method further includes:
and adding a three-dimensional material in the original video by using the target AR data to obtain an augmented reality video.
After image data and pose data are collected, a three-dimensional material can be added into an original video by utilizing target AR data to obtain an augmented reality video, so that the combination of the added three-dimensional material and each frame of image in the video is more vivid, and the processing effect of video augmented reality is improved.
Optionally, the method further includes:
determining video classification data, wherein the video classification data comprises any one of video voice keywords, video character data and video scene data;
after the calculating target Augmented Reality (AR) data from the image data and the pose data, the method further comprises:
encoding the image data, the target AR data and the depth data of the at least one frame of image, or encoding the image data, the target AR data, the depth data of the at least one frame of image and the video classification data to obtain an encoded video file or video stream data;
and uploading the encoded video file or video stream data to a server.
The video classification data can be used as a reference when three-dimensional materials are added later, which avoids a mismatch between the theme of the added three-dimensional material and the video content and improves the video fusion effect. The encoded video stream data is uploaded to a server, and the server can push the video stream data to a video playing client for video editing and video playing, which facilitates subsequent video editing and playback.
Optionally, the depth data is calculated according to the image data and the pose data; or the depth data is obtained by calculating initial depth information acquired by a depth camera and the image data, or the depth data is obtained by calculating binocular images acquired by a binocular camera.
A second aspect of an embodiment of the present application provides a video authoring method, including:
acquiring a video file or video stream data;
decoding the video file or video stream data to obtain image data, target Augmented Reality (AR) data and depth data of each frame of image;
and adding a three-dimensional material in each frame of image by using the target AR data and the depth data, and fusing the three-dimensional material and the image data to obtain an augmented reality video.
When the video is created, because the target AR data include the plane data, the anchor point data, and the grid data, the plane of each frame of image on which material can be added can be determined from the plane data, the three-dimensional anchor point coordinates and the material orientation of the material that can be added in each frame of image can be determined from the anchor point data, and the three-dimensional grid of each frame of image can be determined from the grid data. The added material therefore combines with each frame of image in the video more realistically, which improves the processing effect of video augmented reality.
Optionally, the target AR data includes plane data and anchor point data, and adding a three-dimensional material in each frame image by using the target AR data and the depth data includes:
determining the three-dimensional space coordinate of each frame of image according to the depth data, determining the plane of each frame of image, to which materials can be added, according to the plane data, and determining the three-dimensional anchor point coordinate and the material orientation of each frame of image, to which materials can be added, according to the anchor point data and the three-dimensional space coordinate;
determining the size of the addable material according to the size of the plane of the addable material and the three-dimensional anchor point coordinates, and selecting a three-dimensional material according to the size of the addable material;
and adding the three-dimensional material on the three-dimensional anchor point coordinates according to the material orientation.
Optionally, the target AR data further includes mesh data, and before the fusion processing of the three-dimensional material and the image data, the method further includes:
determining the three-dimensional grid of each frame of image according to the grid data and the three-dimensional space coordinates, and performing collision detection according to the three-dimensional materials and the three-dimensional grid to obtain a collision detection result;
the fusing the three-dimensional material and the image data comprises the following steps:
and fusing the three-dimensional material and the image data according to the collision detection result.
The electronic equipment can perform collision detection between the three-dimensional material and the three-dimensional grid to obtain a collision detection result, and the collision detection result can be used in the subsequent fusion of the three-dimensional material and the image data. This prevents the three-dimensional material from overlapping the three-dimensional grid and avoids the visual impression that the three-dimensional material is embedded in the three-dimensional grid, thereby improving the display effect of the augmented reality video. When the three-dimensional material is in motion and its three-dimensional coordinates coincide with the coordinates of the three-dimensional grid, contact between the three-dimensional material and the three-dimensional grid can be judged, a collision between the two is detected, and the collision trajectory of the three-dimensional material can be simulated according to its direction of motion.
Optionally, after the video file or the video stream data is decoded to obtain the image data, the target AR data, and the depth data of each frame of image, and before a three-dimensional material is added to each frame of image using the target AR data and the depth data and fused with the image data to obtain the augmented reality video, the method further includes:
acquiring video classification data;
selecting the three-dimensional material corresponding to the video classification data from a material library.
The video classification data can be used as a reference when the three-dimensional material is added, which avoids a mismatch between the theme of the added three-dimensional material and the video content and improves the video fusion effect.
Optionally, after a three-dimensional material is added to each frame of image by using the target AR data and the depth data, and the three-dimensional material and the image data are fused to obtain an augmented reality video, the method further includes:
determining a virtual scene rendering effect of the augmented reality video according to a user portrait of a player;
and rendering the augmented reality video by using the virtual scene rendering effect.
Different user portraits can be matched with corresponding rendering effects, providing a personalized video experience for each viewer.
Optionally, the three-dimensional material includes a three-dimensional video or a three-dimensional picture.
A third aspect of an embodiment of the present application provides a video data acquisition apparatus, including:
the acquisition unit is used for acquiring image data and pose data of at least one frame of image of an original video;
the computing unit is used for computing target Augmented Reality (AR) data according to the image data and the pose data; the target AR data comprises at least one of plane data, anchor data and grid data; the target AR data is used for video augmented reality authoring.
A fourth aspect of an embodiment of the present application provides a video authoring apparatus, including:
an acquisition unit configured to acquire a video file or video stream data;
the decoding unit is used for decoding the video file or the video stream data to obtain image data, target Augmented Reality (AR) data and depth data of each frame of image;
and the video processing unit is used for adding a three-dimensional material in each frame of image by using the target AR data and the depth data, and fusing the three-dimensional material and the image data to obtain an augmented reality video.
A fifth aspect of embodiments of the present application provides an electronic device, including a processor and a memory, where the memory is used to store a computer program, and the computer program includes program instructions, and the processor is configured to call the program instructions to execute the step instructions in the first aspect or the second aspect of embodiments of the present application.
A sixth aspect of embodiments of the present application provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program for electronic data exchange, and wherein the computer program causes a computer to perform some or all of the steps as described in the first aspect or the second aspect of embodiments of the present application.
A seventh aspect of embodiments of the present application provides a computer program product, wherein the computer program product comprises a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps as described in the first or second aspect of embodiments of the present application. The computer program product may be a software installation package.
In the embodiment of the application, image data and pose data of at least one frame of image of an original video are collected; target augmented reality (AR) data are calculated according to the image data and the pose data; the target AR data include at least one of plane data, anchor point data, and grid data; and the target AR data are used for video augmented reality creation. When video data are collected, target AR data used for video augmented reality creation are calculated from the image data and pose data of at least one frame of image of the collected original video. Because the target AR data include plane data, anchor point data, and grid data, the plane of each frame of image on which material can be added can be determined from the plane data, the three-dimensional anchor point coordinates and the orientation of the material that can be added in each frame of image can be determined from the anchor point data, and the three-dimensional grid of each frame of image can be determined from the grid data. The added material therefore combines with each frame of image in the video more realistically, which improves the processing effect of video augmented reality.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flowchart of a video data acquisition method according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a face mesh provided in an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a grid provided in an embodiment of the present application;
fig. 4 is a schematic flowchart of another video data acquisition method provided in the embodiment of the present application;
FIG. 5 is a schematic flow chart of another video authoring method provided in the embodiments of the present application;
fig. 6 is a schematic structural diagram of a video data acquisition device according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a video creation apparatus according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The electronic device according to the embodiment of the present application may include a device having a camera and having image and video processing capabilities, such as a mobile phone, a tablet computer, and the like.
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating a video data acquisition method according to an embodiment of the present disclosure. As shown in fig. 1, the method includes the following steps.
101, the electronic device collects image data and pose data of at least one frame of image of an original video.
In the embodiment of the application, the electronic device may acquire image data of each frame of video image through the camera, and acquire pose data corresponding to each frame of video image through an Inertial Measurement Unit (IMU). Each frame of video image may be referred to simply as each frame of image. Each frame of image may include image data and pose data. The image data is the image content of each frame image, for example, the RGB values of the pixels of the image. Pose data may be displacement and rotation information in three dimensions acquired by the IMU. The IMU may measure three-axis attitude angles, three-axis accelerations, and three-axis displacements of the electronic device. Within each frame of image, the IMU may measure pose data corresponding to that frame of video image.
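A minimal sketch of pairing one camera frame with its IMU pose sample might look like the following; the camera and IMU interfaces are hypothetical placeholders, not an actual device API.

```python
from dataclasses import dataclass
from typing import Tuple
import numpy as np

@dataclass
class FrameCapture:
    image: np.ndarray                          # H x W x 3 RGB image data for this frame
    translation: Tuple[float, float, float]    # IMU-derived displacement (x, y, z)
    rotation: Tuple[float, float, float]       # IMU-derived attitude angles (roll, pitch, yaw)
    timestamp: float                           # used to align the image with IMU samples

def collect_frame(camera, imu) -> FrameCapture:
    """Pair one camera frame with the IMU pose sample closest to it in time.

    `camera` and `imu` are assumed wrappers exposing read() -> (image, timestamp)
    and sample_at(timestamp) -> pose; real device APIs will differ.
    """
    image, t = camera.read()
    pose = imu.sample_at(t)
    return FrameCapture(image, pose.translation, pose.rotation, t)
```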
And 102, the electronic equipment calculates target augmented reality AR data according to the image data and the pose data, and the target AR data is used for video augmented reality creation.
Wherein the target AR data includes at least one of planar data, anchor data, and mesh data.
The electronic device can also determine depth data for each frame of image from the image data and the pose data; that is, the depth data may be calculated from the image data and the pose data. The depth data may also be obtained with a rotatable camera: as the camera rotates it is equivalent to taking two pictures of the scene, and the parallax between the two pictures, combined with the rotation of the camera, is used to calculate an effective depth.
The depth data in the implementation of the present application may be: depth information of pixel points in each frame of image of the video.
Optionally, after the step 102 is executed, the following steps may also be executed:
and the electronic equipment adds a three-dimensional material in the original video by using the target AR data to obtain an augmented reality video.
In the embodiment of the application, after the original video is shot by a video shooting person, the three-dimensional material can be automatically added into the original video according to the target AR data, so that the augmented reality video is obtained. Because the target AR data comprises the plane data, the anchor point data and the grid data, the plane of the addable material of each frame of image can be determined according to the plane data, the three-dimensional anchor point coordinates and the material orientation of the addable material in each frame of image are determined according to the anchor point data, and the three-dimensional grid of each frame of image is determined according to the grid data, so that the combination of the added material and each frame of image in the video is more vivid, and the processing effect of video augmented reality is improved.
After the electronic equipment adds the three-dimensional material to the original video by using the target AR data to obtain the augmented reality video, the electronic equipment can also play the augmented reality video.
Optionally, the depth data may also be obtained by calculating initial depth information acquired by a depth camera and the image data. The depth camera can acquire depth data of each pixel point on each frame of image. The depth data can accurately describe the depth of the pixel and convert the depth of each pixel point into an accurate coordinate in a three-dimensional space. The three-dimensional object can be placed on any pixel point in the three-dimensional space.
Optionally, the depth data may be calculated from a binocular image acquired by a binocular camera. Specifically, the two cameras capture a frame at the same time; the parallax (disparity) between the pictures collected by the two cameras is calculated and, combined with the distance between the two cameras, an effective depth is finally calculated.
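For the binocular case, the depth of a pixel can be recovered with the standard pinhole-stereo relation, as in the sketch below; this is a generic formulation rather than the exact computation used in the application.

```python
import numpy as np

def depth_from_disparity(disparity: np.ndarray,
                         focal_length_px: float,
                         baseline_m: float) -> np.ndarray:
    """Classic pinhole-stereo relation: depth Z = f * B / d.

    `disparity` is the per-pixel horizontal offset (in pixels) between the
    left and right images; `baseline_m` is the distance between the two
    cameras. Pixels with non-positive disparity get an invalid depth (inf).
    """
    disparity = disparity.astype(float)
    with np.errstate(divide="ignore"):
        depth = focal_length_px * baseline_m / disparity
    depth[disparity <= 0] = np.inf
    return depth
```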
Optionally, when the depth data of each pixel point on each frame of image is calculated, only the depth information of some of the points may be calculated, for example a depth value every three or five pixels, and the depth information of the intermediate pixel points may be calculated by combining the RGB image data of each frame of image.
After the depth data of each frame of image is determined, the depth data of each pixel point of each frame of image can be converted into an accurate coordinate in a three-dimensional space, and then the three-dimensional coordinate of each pixel point of each frame of image in the three-dimensional space can be obtained. Each frame of image may be mapped to three-dimensional space by depth data for each pixel point of each frame of image. With the three-dimensional coordinates of each pixel point of each frame of image in the three-dimensional space, subsequent calculation of plane data, anchor point data, grid data and the like can be facilitated.
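Converting a pixel with known depth into a three-dimensional coordinate can be done by back-projecting through the camera intrinsics, for example as follows; the intrinsic matrix K is assumed to be available from calibration.

```python
import numpy as np

def backproject(u: int, v: int, depth: float, K: np.ndarray) -> np.ndarray:
    """Convert a pixel (u, v) with depth Z into a 3D point in camera coordinates.

    K is the 3x3 camera intrinsic matrix [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])
```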
The plane data is data of a planar object existing in each frame image. For example, the planar object may include a table, a wall, a floor, a ceiling, etc. in the image. The data of the planar object may include a size of the planar object, coordinate information in an image, three-dimensional coordinate information in a three-dimensional space, and the like.
Anchor point data may describe the locations where three-dimensional material is placed on each frame of image. An anchor point may be disposed on a plane, above a plane, or below a plane. For example, after a plane is detected, an anchor point can be marked at a certain position on the plane; the anchor point is a three-dimensional point, and when a three-dimensional material is placed later it can be placed on the anchor point. When the anchor point lies on the plane, the three-dimensional material fits the plane naturally, so the inserted material does not look out of place, the material and the plane fuse better, and the three-dimensional effect is more realistic. Placing the anchor point off the plane, by contrast, creates a floating feeling.
The anchor point data includes three-dimensional position data and a three-dimensional normal vector of the anchor point. The three-dimensional position data of the anchor point is used to represent the coordinates of the anchor point in three-dimensional space. The three-dimensional normal vector of the anchor point is used for representing the direction of the anchor point. The normal vector can be understood as the orientation of the three-dimensional material placed at the anchor point, for example, a wall plane, and in the two-dimensional space, a point is marked on the wall plane, and the point has no three-dimensional information. In a three-dimensional space, a point is marked on a wall plane, for example, on a wall on the west side, a normal vector faces east, and the anchor point has direction information, so that three-dimensional stereoscopic impression is embodied, and subsequently added three-dimensional materials have richer three-dimensional effect.
The normal vector relates to the visual effect of the three-dimensional material that needs to be added. For example, if the video creator wants to completely stick the three-dimensional object to the wall, the set normal vector should be perpendicular to the wall, and if the video creator wants to crack the wall, the set normal vector should be shifted to the vertical direction. The three-dimensional position data of the anchor point determines where the three-dimensional object is added, and the normal vector determines how the three-dimensional object rotates.
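One common way to realize the normal-vector-driven orientation described above is to rotate the material's local up axis onto the anchor normal, for example with Rodrigues' formula as sketched below; this is a generic construction, not necessarily the application's own implementation.

```python
import numpy as np

def rotation_aligning(up: np.ndarray, normal: np.ndarray) -> np.ndarray:
    """Rotation matrix that turns the material's local 'up' axis onto the
    anchor's normal vector (Rodrigues' rotation formula)."""
    a = up / np.linalg.norm(up)
    b = normal / np.linalg.norm(normal)
    v = np.cross(a, b)
    c = float(np.dot(a, b))
    if np.isclose(c, -1.0):
        # Opposite directions: rotate 180 degrees about any axis perpendicular to `a`.
        axis = np.cross(a, np.array([1.0, 0.0, 0.0]))
        if np.linalg.norm(axis) < 1e-8:
            axis = np.cross(a, np.array([0.0, 1.0, 0.0]))
        axis /= np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    vx = np.array([[0.0, -v[2], v[1]],
                   [v[2], 0.0, -v[0]],
                   [-v[1], v[0], 0.0]])
    return np.eye(3) + vx + vx @ vx / (1.0 + c)

# Placing a material at an anchor: translate its vertices to the anchor position
# and rotate them by rotation_aligning(material_up_axis, anchor_normal).
```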
Grid data, which may also be referred to as mesh data, is similar to mesh information. Referring to fig. 2, fig. 2 is a schematic structural diagram of a face mesh according to an embodiment of the present application. As shown in fig. 2, there are many face feature points; connecting the face feature points into triangles according to a certain rule turns the face into a mesh, which is equivalent to a mask attached to the face. In three-dimensional space, a whole person or object can be described by a mesh without considering texture information (that is, without considering the RGB information of the image). For example, when a video creator wants a small ball to roll off an irregular object, collision detection is performed between the ball and the mesh of the object in the image, so that the ball appears to roll on the irregular object, which is equivalent to moving forward along the surface mesh of the irregular object.
The plane in fig. 3 may correspond to a mesh with many points on it. For example, consider a ball rolling on a flat surface: what actually happens is that collision detection is performed between the ball and the underlying mesh. A three-dimensional space in a computer does not exhibit the physical properties of real space (in the real world a ball resting on a table stays on the table because the table exerts a supporting force, so the ball does not fall through it). In the computer, a detected plane is a grid; it is a plane, but it can also be treated as a three-dimensional object. When a three-dimensional ball is added to the video, collision detection is performed between the grid and the ball, and once the detection shows that the grid and the ball are in contact, the ball cannot move any further, so the ball is seen moving along the grid. Without the mesh, the video creator or the video editing software would not know which object the ball should be collision-tested against. In the three-dimensional space the mesh plays the role of the table in fig. 3: the ball is placed on the table, stays on the table, and can move along it. In other words, information in the three-dimensional space is used to describe something that corresponds to real physics. Because there is no supporting force inside the computer, the video creator or the video editing software must know whether the inserted three-dimensional material coincides with elements in the image in three-dimensional space, otherwise realism is lost. Whether the inserted three-dimensional material coincides with an element in the image can be determined by whether it overlaps a grid in the image; if it overlaps, they coincide.
The mesh can be computed in several ways. For the face mesh in fig. 2, for example, key points are first detected as two-dimensional key-point information and then mapped into three-dimensional space according to the key-point information and the face information to obtain the mesh data of the face. As another example, parallax can be used: when the device moves (the camera of the electronic device moves or the photographed object moves), the parallax formed by the movement, together with the pose data measured by the IMU, yields depth, and the depth tells how far the photographed object (such as a wall) is from the camera, so the mesh data of the photographed object can be calculated from the depth data.
The mesh data may be divided into: object grid data, human body grid data and human face grid data.
As shown in fig. 3, an upper plane and a lower plane are detected, and the small robot in fig. 3 is then placed into the scene. The original video is two-dimensional, but because the positions of the upper and lower planes were captured at shooting time, the small robot can be placed in the three-dimensional space so that it appears to stand on the table or on the wall.
Optionally, the target AR data may further include at least one of object segmentation data, human expression data, and human skeleton data.
The object segmentation data is data of objects that can be segmented in an image. A segmented object may be a human body, a human face, the head of a human body, and the like. The segmentation data is mainly used for occlusion handling: for example, when a three-dimensional advertisement material is placed in front of a wall and a person walks past in front of the wall, the person should not be occluded by the three-dimensional advertisement material.
The human face expression data and the human body skeleton data can be obtained through calculation of an artificial intelligence algorithm. When a face exists in a certain frame of image of the video, face expression data in the frame of image can be calculated; when a human body exists in a certain frame of image of the video, human body skeleton data in the frame of image can be calculated.
The video editing software or the video creator can determine the plane of each frame of image, to which the material can be added, according to the plane data, determine the three-dimensional anchor point coordinates and the material orientation of each frame of image, to which the material can be added, according to the anchor point data, and determine the three-dimensional grid of each frame of image according to the grid data, so that the combination of the added material and each frame of image in the video is more vivid, and the processing effect of video augmented reality is improved.
When the video is created, the target AR data can be displayed in the video to assist the video creator in creating. For example, when a video creator wants to put a three-dimensional object into a video, the video creator may put the three-dimensional object on a plane. For example, if the video is taken with a table, the video creator or the video editing software recognizes the table and puts a three-dimensional toy on the table. When the toy is placed in a three-dimensional space by a video creator or video editing software, the position of the plane in the three-dimensional space can be known according to plane data, so that when the toy is placed in the video creator or the video editing software, the toy and the video are fused together, the effect is completely different from that of two-dimensional, the effect is obviously lifelike compared with that of two-dimensional, and the fusion is better.
Optionally, step 102 may include the following steps:
(11) the electronic equipment calculates initial AR data according to the image data and the pose data;
(12) and the electronic equipment filters the initial AR data according to the quality score of the initial AR data, and retains the target AR data with the quality score larger than a set threshold.
In the embodiment of the application, the electronic device can calculate the initial AR data from the image data and the pose data through an algorithm framework. The algorithm framework may be a common framework such as ARCore, ARKit, or a SLAM system, and may be used to detect the target AR data in the video. The target AR data may include at least one of plane data, anchor point data, and grid data.
The initial AR data calculated by a general algorithm framework may contain errors, and the calculated plane data, anchor point data, and grid data may not meet the requirements of subsequent video creation. For example, if a surface the algorithm framework reports as a plane is not actually a plane, errors may occur in subsequent video creation, resulting in a distorted creation effect.
The embodiment of the application can filter the initial AR data through the quality score of the initial AR data, so that the reliability of the filtered target AR data is greatly enhanced, video creation is performed on the reliable target AR data, and the probability of the effect distortion of the video creation is reduced.
For example, for plane data, after a general algorithm framework has calculated N planes, the degree of dispersion between the pixel points assigned to each of the N planes and the calculated plane can be computed, and a quality score for the plane derived from that dispersion. Taking the first plane as an example, the dispersion can be measured as the mean perpendicular distance from the three-dimensional coordinates of the pixel points on the first plane to the calculated first plane: the larger the mean, the larger the dispersion and the lower the quality score; the smaller the mean, the smaller the dispersion and the higher the quality score. Planes with a quality score below 0.5, for example, can then be filtered out according to their quality scores.
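A minimal sketch of this plane filtering could look like the following; the mapping from mean distance to a score in (0, 1] is an assumption chosen only so that the 0.5 threshold in the example is meaningful, and the plane record layout is illustrative.

```python
import numpy as np

def plane_quality(points: np.ndarray, plane_point: np.ndarray,
                  plane_normal: np.ndarray) -> float:
    """Quality score in (0, 1]: higher when the 3D points assigned to the
    plane lie close to the fitted plane (low mean perpendicular distance)."""
    n = plane_normal / np.linalg.norm(plane_normal)
    distances = np.abs((points - plane_point) @ n)   # perpendicular distances
    mean_dist = float(distances.mean())
    return 1.0 / (1.0 + mean_dist)                   # assumed score mapping

def filter_planes(planes, threshold=0.5):
    """Keep only planes whose quality score exceeds the set threshold.

    Each plane is assumed to be a dict with 'points' (N x 3 array), 'point'
    (a point on the plane), and 'normal' (the plane normal)."""
    return [p for p in planes
            if plane_quality(p["points"], p["point"], p["normal"]) > threshold]
```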
Referring to fig. 4, fig. 4 is a schematic flowchart illustrating another video data acquisition method according to an embodiment of the present disclosure. Fig. 4 is a further optimization based on fig. 1, and as shown in fig. 4, the method comprises the following steps.
401, the electronic device collects image data and pose data of at least one frame of image of the original video.
And 402, the electronic equipment calculates target augmented reality AR data according to the image data and the pose data, and the target AR data is used for video augmented reality creation.
The specific implementation of steps 401 to 402 may refer to steps 101 to 102 shown in fig. 1, which is not described herein again.
The electronic device determines video classification data, the video classification data including any one of video speech keywords, video text data, and video scene data 403.
In the embodiment of the application, video voice can be collected while the original video is collected, and video voice keywords can be extracted from the video voice. Video text data in the original video can be extracted with an OCR algorithm. The video scene data may be a summary of the video content and may take the form of a tag; for example, the video content may be tagged with scene tags such as wedding, pet, or landscape.
The video classification data can be used as a reference when the three-dimensional material is added later, which avoids a mismatch between the theme of the added three-dimensional material and the video content and improves the video fusion effect.
Specifically, the electronic device can collect video voice in an original video through a microphone of the electronic device, convert the video voice into voice characters through a voice recognition algorithm, and extract video voice keywords from the voice characters. The electronic device may extract the video text data in the original video through an OCR algorithm. The electronic device may determine the video scene data by identifying an object in the video content. The video content may be tagged with a scene of "landscape" if it is identified that the object in the video content includes a landscape, a scene of "pet" if it is identified that the object in the video content includes a pet, and a scene of "wedding" if it is identified that the object in the video content includes a wedding stage.
The video scene data may be used for subsequent three-dimensional ad insertion. After the scene label is identified, the corresponding three-dimensional advertisement can be automatically inserted into the video according to the scene label. The relevance of the inserted three-dimensional advertisement and the video content is improved, the advertisement putting effect is improved, and the order conversion rate of the advertisement flow is improved. For example, if the scene label is "landscape", a three-dimensional advertisement of sports goods may be inserted into the video. If the scene label is 'pet', dog food and three-dimensional advertisements of dog clothes can be inserted into the video. If the scene label is 'wedding', three-dimensional advertisements related to diamond ring, wedding dress and wedding dress arrangement can be inserted into the video. The three-dimensional advertisement may exist in the form of a video or picture.
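A simple way to realize this scene-tag-to-advertisement mapping is a lookup table, as in the hypothetical sketch below; the tag names and categories merely mirror the examples in the text.

```python
# Hypothetical mapping from scene tags to advertisement material categories.
SCENE_TO_AD_CATEGORY = {
    "landscape": "sporting goods",
    "pet": "dog food and dog clothes",
    "wedding": "diamond rings, wedding dresses, wedding arrangement",
}

def pick_ad_categories(scene_tags):
    """Return the advertisement categories matching the recognized scene tags."""
    return [SCENE_TO_AD_CATEGORY[t] for t in scene_tags if t in SCENE_TO_AD_CATEGORY]
```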
Optionally, after step 403 is executed, the following steps may also be executed:
404, the electronic device encodes the image data, the target AR data and the depth data of the at least one frame of image, or encodes the image data, the target AR data, the depth data of the at least one frame of image and the video classification data to obtain a coded video file or video stream data;
the electronic device uploads 405 the encoded video file or video stream data to a server.
In the embodiment of the present application, the image data, the target AR data, and the depth data of the at least one frame of image, or those data together with the video classification data, may be encoded using a digital video compression format such as H.264, H.265, or H.266.
The target AR data, the depth data of the at least one frame of image, and the video classification data may be stored in a reserved area of the video, or stored in an additional file, as long as each record can be matched to the corresponding frame.
The video file may be stored locally or uploaded to a server. The video stream data can be uploaded to a server to support streaming playback on the consumer side.
The encoded video stream data is uploaded to a server, and the server can push the video stream data to a video playing client to carry out video editing and video playing, so that subsequent video editing and video playing are facilitated.
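One possible realization of storing the per-frame AR data, depth data, and classification data in an additional file is a sidecar file keyed by frame index, sketched below; the file layout is an assumption for illustration, not the encoding actually specified by the application.

```python
import json

def write_ar_sidecar(path: str, per_frame_ar: dict, classification: dict) -> None:
    """Write target AR data, depth data, and video classification data to a
    sidecar file keyed by frame index, so each record can be matched to its frame.

    `per_frame_ar` maps frame_index -> {"planes": ..., "anchors": ...,
    "grids": ..., "depth": ...} in an already JSON-serializable form;
    `classification` holds e.g. voice keywords, text data, and scene tags.
    """
    payload = {
        "classification": classification,
        "frames": {str(i): record for i, record in per_frame_ar.items()},
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f)
```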
At the video playing client, the three-dimensional elements can be added into the video before the video is played. Such as three-dimensional advertisements, three-dimensional characters, three-dimensional videos, three-dimensional pictures, etc.
Referring to fig. 5, fig. 5 is a flowchart illustrating a video creation method according to an embodiment of the present disclosure. As shown in fig. 5, the method includes the following steps.
501, the electronic device obtains a video file or video stream data.
In the embodiment of the application, the electronic device can download the video file from the server or download the video stream data online.
502, the electronic device decodes the video file or video stream data to obtain image data, target Augmented Reality (AR) data and depth data of each frame of image.
Wherein the target AR data may include at least one of plane data, anchor data, and mesh data. The target AR data can be referred to the related descriptions in fig. 1 to fig. 4, and the details are not repeated here.
The decoding step of the embodiment of the present application corresponds to the encoding step of step 404, and is not described herein again.
503, the electronic device adds a three-dimensional material to each frame of image by using the target AR data and the depth data, and performs fusion processing on the three-dimensional material and the image data to obtain an augmented reality video.
After the depth data of each frame of image is determined, the depth data of each pixel point of each frame of image can be converted into an accurate coordinate in a three-dimensional space, and then the three-dimensional coordinate of each pixel point of each frame of image in the three-dimensional space can be obtained. Each frame of image may be mapped to three-dimensional space by depth data for each pixel point of each frame of image. With the three-dimensional coordinates of each pixel point of each frame of image in the three-dimensional space, plane data, anchor point data, grid data and the like can be calculated conveniently.
When video creation is performed, because the target AR data include the plane data, the anchor point data, and the grid data, the plane of each frame of image on which material can be added can be determined from the plane data, the three-dimensional anchor point coordinates and the material orientation of the material that can be added in each frame of image can be determined from the anchor point data, and the three-dimensional grid of each frame of image can be determined from the grid data, so the added material combines with each frame of image in the video more realistically, which improves the processing effect of video augmented reality. Fusing the three-dimensional material with the image data makes the material blend better with the image and prevents the material added to each frame from looking incompatible with the image.
In the embodiment of the present application, for the relevant description of the plane data, the anchor data, and the mesh data, reference may be made to the relevant description in fig. 1 to fig. 4, which is not described herein again.
The electronic device in fig. 5 may be different from or the same as the electronic devices in fig. 1 to 4.
Optionally, the target AR data includes plane data and anchor point data, and in step 503, the electronic device adding a three-dimensional material in each frame of image by using the target AR data and the depth data may include the following steps:
(21) the electronic equipment determines the three-dimensional space coordinates of each frame of image according to the depth data, determines the plane of each frame of image, to which the material can be added, according to the plane data, and determines the three-dimensional anchor point coordinates and the material orientation of each frame of image, to which the material can be added, according to the anchor point data and the three-dimensional space coordinates.
(22) The electronic equipment determines the size of the addable material according to the size of the plane of the addable material and the three-dimensional anchor point coordinates, and selects the three-dimensional material according to the size of the addable material.
(23) And the electronic equipment adds the three-dimensional material on the three-dimensional anchor point coordinates according to the material orientation.
In an embodiment of the application, the electronic device may determine, from the plane data, at least one plane of each frame of image to which material may be added. A plane in a frame image is generally suitable for adding material, but in some cases certain materials are not suitable for a given plane. For example, the plane of a ceiling is not suitable for adding a three-dimensional figure standing on the ceiling, and the plane of a wall is not suitable for adding a cup filled with water.
The anchor point data may include three-dimensional position data and a three-dimensional normal vector of the anchor point. The three-dimensional position data of the anchor point is used to represent the coordinates of the anchor point in three-dimensional space. The three-dimensional normal vector of the anchor point is used for representing the direction of the anchor point.
The electronic equipment can determine the three-dimensional anchor point coordinates of the material which can be added in each frame of image according to the three-dimensional position data of the anchor point and the three-dimensional space coordinates of each frame of image, and can determine the material orientation of the material which can be added in each frame of image according to the three-dimensional normal vector of the anchor point.
The embodiment of the application can determine the size of the addable material from the size of the plane on which material can be added and the three-dimensional anchor point coordinates. Specifically, the size of the added material can be judged from its projected size on a two-dimensional plane (for example, the plane of the screen): if the projected size of the addable material falls within a suitable range, its size is appropriate; if the projected size is smaller than the minimum of the suitable range, the material is too small; and if the projected size is larger than the maximum of the suitable range, the material is too large. Generally, the larger the plane, the larger the material that can be added. The closer the three-dimensional anchor point is to the screen side, the smaller the addable material should be, and the farther the three-dimensional anchor point is from the screen side, the larger the addable material should be.
When a three-dimensional object is inserted into a large space, it needs to look natural. If the three-dimensional object is large and close to the screen side, the whole picture looks discordant; if it is small and far from the screen side, it cannot be seen clearly. Therefore, when inserting an object, a three-dimensional object of an appropriate size and orientation should be inserted: a large three-dimensional object is chosen if the anchor is far from the screen side, and a small one if it is near. Similarly, a large cup can be placed far from the screen side and a small cup near the screen side, so that the projected two-dimensional size of the cup remains moderate in both cases. This avoids a picture that looks uncoordinated or jarring.
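The size selection described above can be approximated with the pinhole projection relation, as in the sketch below; the pixel range used as the "suitable range" is an assumed parameter, not a value taken from the application.

```python
def projected_size(material_size_m: float, anchor_depth_m: float,
                   focal_length_px: float) -> float:
    """Approximate on-screen size (in pixels) of a material of physical size
    `material_size_m` placed at an anchor `anchor_depth_m` from the camera."""
    return focal_length_px * material_size_m / anchor_depth_m

def acceptable_material_sizes(anchor_depth_m: float, focal_length_px: float,
                              min_px: float = 80.0, max_px: float = 400.0):
    """Return the physical-size interval whose projection stays in [min_px, max_px].

    A nearby anchor therefore gets a smaller material and a distant anchor a
    larger one, keeping the projected two-dimensional size moderate.
    """
    lo = min_px * anchor_depth_m / focal_length_px
    hi = max_px * anchor_depth_m / focal_length_px
    return lo, hi
```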
Optionally, the target AR data further includes mesh data, and before the electronic device fuses the three-dimensional material and the image data, the following steps may be further performed:
(31) the electronic equipment determines the three-dimensional grid of each frame of image according to the grid data and the three-dimensional space coordinates;
(32) the electronic equipment executes collision detection according to the three-dimensional material and the three-dimensional grid to obtain a collision detection result;
the electronic device performs fusion processing on the three-dimensional material and the image data, and specifically includes the following steps:
and the electronic equipment fuses the three-dimensional material and the image data according to the collision detection result.
Any solid object in the image may be represented by mesh data, which is three-dimensional data. There may be multiple three-dimensional meshes in the image. The electronic device can determine a three-dimensional grid for each frame of image from the grid data. Any three-dimensional object in the image may be represented by the mesh data. Such as walls, tables, cups, faces, etc. in the image. A three-dimensional mesh may be composed of a plurality of point connections of the surface of a three-dimensional object in an image. If the added three-dimensional material is overlapped with the three-dimensional grid, the visual perception that the three-dimensional material is embedded into the three-dimensional object can occur, and the effect of the augmented reality video is greatly reduced. For example, the inserted three-dimensional material is a table tennis ball, the grid data is a three-dimensional grid of a table, and when the inserted three-dimensional material is overlapped with the three-dimensional grid of the table, the visual feeling that the table tennis ball is embedded into the table occurs, and the visual effect that the table tennis ball rolls or bounces on the table cannot be presented, so that the visual effect is greatly reduced.
The electronic equipment can perform collision detection between the three-dimensional material and the three-dimensional grid to obtain a collision detection result. The collision detection result can include the possible motion trajectory of the three-dimensional material in the three-dimensional space of consecutive frames of images. The electronic equipment can determine the motion trajectory of the three-dimensional material in the video according to the collision detection result, which prevents the three-dimensional material from overlapping the three-dimensional grid in the fused augmented reality video and avoids the visual impression that the three-dimensional material is embedded in the three-dimensional grid, thereby improving the display effect of the augmented reality video. When the three-dimensional material is in motion and its three-dimensional coordinates coincide with the coordinates of the three-dimensional grid, contact between the material and the grid can be judged, a collision is detected, and the collision trajectory of the three-dimensional material can be simulated according to its direction of motion.
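A coarse sketch of such collision detection between a moving three-dimensional material and a three-dimensional grid is shown below; the bounding-sphere approximation and the vertex-only test are simplifications for illustration rather than the application's actual algorithm.

```python
import numpy as np

def sphere_hits_grid(center: np.ndarray, radius: float,
                     grid_vertices: np.ndarray) -> bool:
    """Coarse collision test: the moving material (approximated by a bounding
    sphere) is considered to touch the grid when any grid vertex lies inside
    the sphere. A production system would test against triangles instead."""
    dists = np.linalg.norm(grid_vertices - center, axis=1)
    return bool(np.any(dists <= radius))

def advance_until_contact(start: np.ndarray, velocity: np.ndarray, radius: float,
                          grid_vertices: np.ndarray, steps: int = 100,
                          dt: float = 1.0 / 30.0):
    """Simulate the material's motion and stop it at the step where it first
    contacts the grid, so the material never appears embedded in the object."""
    pos = start.astype(float).copy()
    trajectory = [pos.copy()]
    for _ in range(steps):
        nxt = pos + velocity * dt
        if sphere_hits_grid(nxt, radius, grid_vertices):
            break                      # contact detected: stop on the surface
        pos = nxt
        trajectory.append(pos.copy())
    return trajectory
```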
Optionally, after step 502 is executed and before step 503 is executed, the following steps may also be executed:
(41) the electronic equipment acquires video classification data;
(42) the electronic device selects a three-dimensional material corresponding to the video classification data from a material library.
In the embodiment of the application, the video classification data includes any one of video voice keywords, video text data, and video scene data. The video classification data may be one or more tags of the video content. The video classification data can be used as a reference when the three-dimensional material is added, which avoids a mismatch between the theme of the added three-dimensional material and the video content and improves the video fusion effect.
For example, if the video classification data includes "landscape," three-dimensional advertising material for sporting goods may be inserted in the video. If the video classification data comprises 'pets', dog food and three-dimensional advertisement materials of dog clothes can be inserted into the video. If the video classification data includes "wedding", three-dimensional advertising material related to diamond ring, wedding dress, wedding arrangement may be inserted in the video. The three-dimensional advertising material may be in the form of a video or picture.
Optionally, after step 503 is executed, the following steps may also be executed:
(51) the electronic equipment determines a virtual scene rendering effect of the augmented reality video according to the user portrait of the player;
(52) the electronic equipment renders the augmented reality video by using the virtual scene rendering effect.
In the embodiment of the application, the user portrait may include the user's age, sex, skin color, interests, hobbies, and the like. Different user portraits may be matched to corresponding virtual scene rendering effects. Rendering of the virtual scene inside the video can be triggered in combination with the player's preferences, browsing data and the like, so that each viewer receives a personalized video experience.
After determining the virtual scene rendering effect of the augmented reality video according to the user portrait of the player, the electronic equipment can render the augmented reality video with that effect and play the rendered augmented reality video once it is obtained.
Optionally, the electronic device determines the virtual scene rendering effect of the augmented reality video according to Location Based Services (LBS) data of the player. The LBS data may include GPS data.
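The following sketch shows one possible way to combine a user portrait with optional LBS data when choosing a virtual scene rendering effect; the portrait fields, effect names and city refinement are hypothetical and only illustrate the matching idea.

```python
# Hypothetical mapping from user-portrait attributes (and optional LBS data)
# to a virtual scene rendering effect; all effect names are illustrative.
def choose_rendering_effect(user_portrait, lbs_data=None):
    if user_portrait.get("age", 30) < 18:
        effect = "cartoon_particles"
    elif "sports" in user_portrait.get("interests", []):
        effect = "stadium_lighting"
    else:
        effect = "soft_ambient"
    # Location-based refinement, e.g. a city-specific scene variant.
    if lbs_data and lbs_data.get("city"):
        effect = f"{effect}_{lbs_data['city'].lower()}"
    return effect

print(choose_rendering_effect({"age": 25, "interests": ["sports"]},
                              {"city": "Shenzhen"}))  # stadium_lighting_shenzhen
```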
The three-dimensional material comprises a three-dimensional video or a three-dimensional picture. When the three-dimensional material is a three-dimensional video, the effect of picture-in-picture can be realized. The three-dimensional video can be presented in the form of a television or a movie screen.
The above description has introduced the solution of the embodiments of the present application mainly from the perspective of the method-side implementation process. It is understood that, in order to realize the above functions, the electronic device comprises corresponding hardware structures and/or software modules for performing the respective functions. Those skilled in the art will readily appreciate that the various illustrative units and algorithm steps described in connection with the embodiments provided herein can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiment of the present application, the electronic device may be divided into functional units according to the above method examples; for example, each functional unit may correspond to one function, or two or more functions may be integrated into one processing unit. The integrated unit can be realized in the form of hardware or in the form of a software functional unit. It should be noted that the division of units in the embodiments of the present application is schematic and is only a logical function division; other division manners are possible in actual implementation.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a video data acquisition apparatus according to an embodiment of the present application, the video data acquisition apparatus 600 is applied to an electronic device, and the video data acquisition apparatus 600 may include an acquisition unit 601 and a calculation unit 602, where:
the acquisition unit 601 is used for acquiring image data and pose data of at least one frame of image of an original video;
a calculating unit 602, configured to calculate target augmented reality AR data according to the image data and the pose data; the target AR data comprises at least one of plane data, anchor data and grid data; the target AR data is used for video augmented reality authoring.
Optionally, the calculating unit 602 calculates target augmented reality AR data according to the image data and the pose data by: calculating initial AR data according to the image data and the pose data; and filtering the initial AR data according to the quality score of the initial AR data, retaining as the target AR data the data whose quality score is greater than a set threshold.
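A minimal sketch of this filtering step, assuming the initial AR data is a list of plane/anchor/mesh entries that each carry a quality score (the data layout and threshold value are assumptions, not the patent's format):

```python
# Keep only plane/anchor/mesh entries whose quality score exceeds the set threshold.
def filter_ar_data(initial_ar_data, threshold=0.8):
    return [item for item in initial_ar_data if item["score"] > threshold]

initial = [{"type": "plane", "score": 0.92}, {"type": "anchor", "score": 0.55}]
target_ar_data = filter_ar_data(initial)   # only the high-quality plane survives
```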
Optionally, the video data acquisition apparatus 600 may further include an adding unit 603;
the adding unit 603 is configured to, after the calculating unit 602 calculates the target augmented reality AR data according to the image data and the pose data, add a three-dimensional material to the original video by using the target AR data to obtain an augmented reality video.
Optionally, the video data acquisition apparatus 600 may further include a first determining unit 604, an encoding unit 605, and an uploading unit 606;
a first determining unit 604, configured to determine video classification data, where the video classification data includes any one of video voice keywords, video text data, and video scene data.
The encoding unit 605 is configured to encode the image data, the target AR data, and the depth data of the at least one frame of image, or encode the image data, the target AR data, the depth data of the at least one frame of image, and the video classification data to obtain an encoded video file or video stream data;
the uploading unit 606 is configured to upload the encoded video file or video stream data to a server.
Optionally, the depth data is calculated according to the image data and the pose data; or the depth data is obtained by calculating initial depth information acquired by a depth camera and the image data, or the depth data is obtained by calculating binocular images acquired by a binocular camera.
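As an illustration of how the encoding unit 605 might package the per-frame data described above, the sketch below serializes image data, target AR data, depth data and optional video classification data into one record. This is only a schematic packaging format; a real implementation would use a video codec with metadata or side-channel tracks, which this sketch does not attempt.

```python
import base64
import json

# Illustrative packaging of per-frame image, AR and depth data into one
# serializable record before upload; field names are assumptions.
def encode_frame(image_bytes, target_ar_data, depth_bytes, classification=None):
    record = {
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "ar": target_ar_data,                      # planes / anchors / meshes
        "depth": base64.b64encode(depth_bytes).decode("ascii"),
    }
    if classification is not None:
        record["classification"] = classification  # optional video content tags
    return json.dumps(record)

packed = encode_frame(b"\x00" * 16, {"planes": [], "anchors": []}, b"\x01" * 16,
                      classification=["pets"])
```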
The acquisition unit 601 in the embodiment of the present application includes a camera module and an IMU in the electronic device. The calculating unit 602, the adding unit 603, the first determining unit 604 and the encoding unit 605 may be a processor of the electronic device, and the uploading unit 606 may be a communication module in the electronic device.
In the embodiment of the application, when video data are collected, target AR data used for video augmented reality creation are calculated according to the image data and pose data of at least one frame of image of the collected original video. Because the target AR data comprise plane data, anchor point data and mesh data, the plane of each frame of image to which material can be added can be determined according to the plane data, the three-dimensional anchor point coordinates and material orientation of the addable material in each frame of image can be determined according to the anchor point data, and the three-dimensional mesh of each frame of image can be determined according to the mesh data, so that the added material combines more realistically with each frame of image in the video, thereby improving the processing effect of video augmented reality.
Referring to fig. 7, fig. 7 is a schematic structural diagram of a video creation apparatus 700 according to an embodiment of the present disclosure, where the video creation apparatus 700 is applied to an electronic device, and the video creation apparatus 700 may include an obtaining unit 701, a decoding unit 702, and a video processing unit 703, where:
an acquiring unit 701 configured to acquire a video file or video stream data;
a decoding unit 702, configured to decode the video file or the video stream data to obtain image data, target Augmented Reality (AR) data, and depth data of each frame of image;
and the video processing unit 703 is configured to add a three-dimensional material to each frame of image by using the target AR data and the depth data, and perform fusion processing on the three-dimensional material and the image data to obtain an augmented reality video.
Optionally, the target AR data includes plane data and anchor point data, and the video processing unit 703 adds a three-dimensional material to each frame of image by using the target AR data and the depth data as follows: determining the three-dimensional space coordinates of each frame of image according to the depth data, determining the plane of each frame of image to which material can be added according to the plane data, and determining the three-dimensional anchor point coordinates and material orientation of the addable material in each frame of image according to the anchor point data and the three-dimensional space coordinates; determining the size of the addable material according to the size of the plane of the addable material and the three-dimensional anchor point coordinates, and selecting the three-dimensional material according to that size; and adding the three-dimensional material at the three-dimensional anchor point coordinates according to the material orientation.
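The placement logic described above can be sketched as follows. The plane extent, anchor coordinates and library entries are hypothetical, and the "largest material that fits" rule is just one possible reading of selecting the material according to the addable size.

```python
import numpy as np

# Rough sketch: the addable size is bounded by the plane extent, and the chosen
# material is positioned at the anchor coordinates with the given orientation.
def place_material(plane_extent, anchor_xyz, orientation_deg, library):
    """Pick the largest material that fits the plane and return its placement."""
    max_size = min(plane_extent)                      # conservative fit within the plane
    candidates = [m for m in library if m["size"] <= max_size]
    if not candidates:
        return None
    material = max(candidates, key=lambda m: m["size"])
    return {
        "asset": material["name"],
        "position": np.asarray(anchor_xyz, dtype=float),  # three-dimensional anchor
        "yaw_degrees": orientation_deg,                   # material orientation
    }

library = [{"name": "ping_pong_ball", "size": 0.04},
           {"name": "billboard", "size": 1.2}]
print(place_material(plane_extent=(0.8, 0.6), anchor_xyz=(0.1, 0.2, 0.75),
                     orientation_deg=90.0, library=library))
```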
Optionally, the target AR data further includes mesh data, and the video processing unit 703 is further configured to determine a three-dimensional mesh of each frame of image according to the mesh data and the three-dimensional space coordinates before performing fusion processing on the three-dimensional material and the image data, and perform collision detection according to the three-dimensional material and the three-dimensional mesh to obtain a collision detection result;
the video processing unit 703 performs fusion processing on the three-dimensional material and the image data, including:
the video processing unit 703 performs fusion processing on the three-dimensional material and the image data according to the collision detection result.
Optionally, the video creation apparatus 700 may further include a selecting unit 704.
The obtaining unit 701 is further configured to obtain video classification data after the decoding unit 702 decodes the video file or the video stream data to obtain image data, target AR data, and depth data of each frame of image;
the selecting unit 704 is configured to select the three-dimensional material corresponding to the video classification data from a material library.
Optionally, the video authoring apparatus 700 may further include a second determining unit 705.
The second determining unit 705 is configured to, after the video processing unit 703 adds a three-dimensional material to each frame of image by using the target AR data and the depth data and fuses the three-dimensional material with the image data to obtain an augmented reality video, determine a virtual scene rendering effect of the augmented reality video according to a user portrait of the player, and render the augmented reality video by using the virtual scene rendering effect.
Optionally, the three-dimensional material includes a three-dimensional video or a three-dimensional picture.
The decoding unit 702, the video processing unit 703, the selecting unit 704, and the second determining unit 705 in this embodiment may be processors of electronic devices, and the obtaining unit 701 may also be a communication module in the electronic devices.
When video creation is performed, because the target AR data comprise the plane data, the anchor point data and the mesh data, the plane of each frame of image to which material can be added can be determined according to the plane data, the three-dimensional anchor point coordinates and material orientation of the addable material in each frame of image can be determined according to the anchor point data, and the three-dimensional mesh of each frame of image can be determined according to the mesh data, so that the added material combines more realistically with each frame of image in the video and the processing effect of video augmented reality is improved.
Referring to fig. 8, fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure, as shown in fig. 8, the electronic device 800 includes a processor 801 and a memory 802, and the processor 801 and the memory 802 may be connected to each other through a communication bus 803. The communication bus 803 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 803 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 8, but this is not intended to represent only one bus or type of bus. The memory 802 is used for storing a computer program comprising program instructions, the processor 801 being configured for invoking the program instructions, the program comprising instructions for performing the method shown in fig. 1, 4 or 5.
The processor 801 may be a general purpose Central Processing Unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of programs according to the above schemes.
The memory 802 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory may be self-contained and coupled to the processor via a bus, or may be integrated with the processor.
The electronic device 800 may further include a camera module 804, and the camera module 804 may include at least one camera, a camera sensor, an image processing module, and the like. The camera sensor may comprise an IMU. The electronic device 800 may also include a communication module.
The electronic device 800 may further include a display screen, a speaker, and the like, and may further include a radio frequency circuit, an antenna, and the like.
In the embodiment of the application, when video data are collected, target AR data for video augmented reality creation are calculated according to the image data and pose data of at least one frame of image of the collected original video. When the video is created, because the target AR data comprise the plane data, the anchor point data and the mesh data, the plane of each frame of image to which material can be added can be determined according to the plane data, the three-dimensional anchor point coordinates and material orientation of the addable material in each frame of image can be determined according to the anchor point data, and the three-dimensional mesh of each frame of image can be determined according to the mesh data, so that the added material combines more realistically with each frame of image in the video and the processing effect of video augmented reality is improved.
Embodiments of the present application also provide a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program for electronic data exchange, and the computer program enables a computer to execute part or all of the steps of any one of the methods as described in the above method embodiments.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative; the division of the units is only one type of logical function division, and other divisions are possible in actual implementation; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the shown or discussed mutual coupling, direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software program module.
If the integrated unit is implemented in the form of a software program module and sold or used as a stand-alone product, it may be stored in a computer-readable memory. Based on such an understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory and including several instructions for causing a computer device (which may be a personal computer, a server or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk or an optical disc.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable memory, which may include: flash memory disks, read-only memory, random access memory, magnetic or optical disks, and the like.
The foregoing detailed description of the embodiments of the present application has been presented to illustrate the principles and implementations of the present application, and the above description of the embodiments is only provided to help understand the method and the core concept of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (15)

1. A video data collection method, comprising: collecting image data and pose data of at least one frame of image of an original video; and calculating target augmented reality (AR) data according to the image data and the pose data; wherein the target AR data includes at least one of plane data, anchor point data and mesh data, and the target AR data is used for video augmented reality creation.

2. The method according to claim 1, wherein calculating target augmented reality AR data according to the image data and the pose data comprises: calculating initial AR data according to the image data and the pose data; and filtering the initial AR data according to a quality score of the initial AR data, and retaining, as the target AR data, the data whose quality score is greater than a set threshold.

3. The method according to claim 1, wherein after calculating the target AR data according to the image data and the pose data, the method further comprises: adding a three-dimensional material to the original video by using the target AR data to obtain an augmented reality video.

4. The method according to claim 1, further comprising: determining video classification data, the video classification data comprising any one of video voice keywords, video text data and video scene data; and, after calculating the target AR data according to the image data and the pose data: encoding the image data, the target AR data and the depth data of the at least one frame of image, or encoding the image data, the target AR data, the depth data of the at least one frame of image and the video classification data, to obtain an encoded video file or video stream data; and uploading the encoded video file or video stream data to a server.

5. The method according to claim 4, wherein the depth data is calculated according to the image data and the pose data; or the depth data is calculated from initial depth information collected by a depth camera and the image data; or the depth data is calculated from binocular images collected by a binocular camera.

6. A video creation method, comprising: acquiring a video file or video stream data; decoding the video file or the video stream data to obtain image data, target augmented reality (AR) data and depth data of each frame of image; and adding a three-dimensional material to each frame of image by using the target AR data and the depth data, and fusing the three-dimensional material with the image data to obtain an augmented reality video.

7. The method according to claim 6, wherein the target AR data includes plane data and anchor point data, and adding a three-dimensional material to each frame of image by using the target AR data and the depth data comprises: determining the three-dimensional space coordinates of each frame of image according to the depth data, determining the plane of each frame of image to which material can be added according to the plane data, and determining the three-dimensional anchor point coordinates and material orientation of the addable material in each frame of image according to the anchor point data and the three-dimensional space coordinates; determining the size of the addable material according to the size of the plane of the addable material and the three-dimensional anchor point coordinates, and selecting the three-dimensional material according to the size of the addable material; and adding the three-dimensional material at the three-dimensional anchor point coordinates according to the material orientation.

8. The method according to claim 7, wherein the target AR data further includes mesh data, and before fusing the three-dimensional material with the image data, the method further comprises: determining the three-dimensional mesh of each frame of image according to the mesh data and the three-dimensional space coordinates, and performing collision detection according to the three-dimensional material and the three-dimensional mesh to obtain a collision detection result; and fusing the three-dimensional material with the image data comprises: fusing the three-dimensional material with the image data according to the collision detection result.

9. The method according to any one of claims 6 to 8, wherein after decoding the video file or the video stream data to obtain the image data, the target AR data and the depth data of each frame of image, and before adding the three-dimensional material to each frame of image by using the target AR data and the depth data and fusing the three-dimensional material with the image data to obtain the augmented reality video, the method further comprises: acquiring video classification data; and selecting the three-dimensional material corresponding to the video classification data from a material library.

10. The method according to any one of claims 6 to 9, wherein after adding the three-dimensional material to each frame of image by using the target AR data and the depth data and fusing the three-dimensional material with the image data to obtain the augmented reality video, the method further comprises: determining a virtual scene rendering effect of the augmented reality video according to a user portrait of a player; and rendering the augmented reality video by using the virtual scene rendering effect.

11. The method according to any one of claims 6 to 10, wherein the three-dimensional material comprises a three-dimensional video or a three-dimensional picture.

12. A video data collection apparatus, comprising: an acquisition unit, configured to collect image data and pose data of at least one frame of image of an original video; and a calculation unit, configured to calculate target augmented reality (AR) data according to the image data and the pose data; wherein the target AR data includes at least one of plane data, anchor point data and mesh data, and the target AR data is used for video augmented reality creation.

13. A video creation apparatus, comprising: an acquisition unit, configured to acquire a video file or video stream data; a decoding unit, configured to decode the video file or the video stream data to obtain image data, target augmented reality (AR) data and depth data of each frame of image; and a video processing unit, configured to add a three-dimensional material to each frame of image by using the target AR data and the depth data, and fuse the three-dimensional material with the image data to obtain an augmented reality video.

14. An electronic device, comprising a processor and a memory, wherein the memory is configured to store a computer program comprising program instructions, and the processor is configured to invoke the program instructions to execute the method according to any one of claims 1 to 11.

15. A computer-readable storage medium storing a computer program, the computer program comprising program instructions which, when executed by a processor, cause the processor to execute the method according to any one of claims 1 to 11.
CN202110865827.9A 2021-07-29 2021-07-29 Video data acquisition method, video creation method and related products Active CN113570730B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110865827.9A CN113570730B (en) 2021-07-29 2021-07-29 Video data acquisition method, video creation method and related products

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110865827.9A CN113570730B (en) 2021-07-29 2021-07-29 Video data acquisition method, video creation method and related products

Publications (2)

Publication Number Publication Date
CN113570730A true CN113570730A (en) 2021-10-29
CN113570730B CN113570730B (en) 2025-03-18

Family

ID=78169012

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110865827.9A Active CN113570730B (en) 2021-07-29 2021-07-29 Video data acquisition method, video creation method and related products

Country Status (1)

Country Link
CN (1) CN113570730B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101179739A (en) * 2007-01-11 2008-05-14 腾讯科技(深圳)有限公司 Method and apparatus for inserting advertisement
CN107665507A (en) * 2016-07-29 2018-02-06 成都理想境界科技有限公司 The method and device of augmented reality is realized based on plane monitoring-network
US20190311544A1 (en) * 2018-04-10 2019-10-10 Arm Ip Limited Image processing for augmented reality
CN112929384A (en) * 2021-03-05 2021-06-08 瑞丰宝丽(北京)科技有限公司 AR intelligent point inspection system based on space anchor point

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
RIBIN CHALUMATTU等: "Simplifying the Process of Creating Augmented Outdoor Scenes", 2020 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO WORKSHOPS, 9 June 2020 (2020-06-09), pages 1 - 6 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023179341A1 (en) * 2022-03-25 2023-09-28 北京字跳网络技术有限公司 Method for placing virtual object in video and related device

Also Published As

Publication number Publication date
CN113570730B (en) 2025-03-18

Similar Documents

Publication Publication Date Title
KR102467262B1 (en) Transmitting apparatus, transmitting method and program
CN114930399B (en) Image generation using surface-based neural synthesis
CN105279795B (en) Augmented reality system based on 3D marker
CN111080759B (en) Method and device for realizing split mirror effect and related product
US20130321575A1 (en) High definition bubbles for rendering free viewpoint video
CN107484036B (en) A kind of barrage display methods and device
KR20150011008A (en) Augmented reality interaction implementation method and system
CN105278826A (en) Augmented reality system
CN106464773B (en) Augmented reality device and method
CN111833457A (en) Image processing method, apparatus and storage medium
CN108416832B (en) Media information display method, device and storage medium
CN108134945B (en) AR service processing method, AR service processing device and terminal
CN105894571B (en) Method and device for processing multimedia information
JP7703838B2 (en) IMAGE PROCESSING METHOD, PROGRAM, AND IMAGE PROCESSING SYSTEM
Nguyen et al. Real-time 3D human capture system for mixed-reality art and entertainment
CN115496863A (en) Short video generation method and system for scene interaction of movie and television intelligent creation
Langlotz et al. AR record&replay: situated compositing of video content in mobile augmented reality
CN111625100A (en) Method and device for presenting picture content, computer equipment and storage medium
KR20190074911A (en) Method for providing realistic type image contents and server using the same
CN113570730B (en) Video data acquisition method, video creation method and related products
JP6799468B2 (en) Image processing equipment, image processing methods and computer programs
CN119110097A (en) Video display method, device, storage medium and program product
US11468653B2 (en) Image processing device, image processing method, program, and display device
Comino Trinidad et al. Easy authoring of image-supported short stories for 3d scanned cultural heritage
CN105893452B (en) Method and device for presenting multimedia information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant