
CN115529378B - A video processing method and related device


Info

Publication number
CN115529378B
Authority
CN
China
Prior art keywords
video
electronic device
highlight
segment
scene
Prior art date
Legal status
Active
Application number
CN202210193721.3A
Other languages
Chinese (zh)
Other versions
CN115529378A (en)
Inventor
董振
朱世宇
侯伟龙
杜远超
Current Assignee
Honor Device Co Ltd
Original Assignee
Honor Device Co Ltd
Priority date
Filing date
Publication date
Application filed by Honor Device Co Ltd filed Critical Honor Device Co Ltd
Priority to CN202210193721.3A priority Critical patent/CN115529378B/en
Publication of CN115529378A publication Critical patent/CN115529378A/en
Priority to EP22908855.4A priority patent/EP4258632A4/en
Priority to PCT/CN2022/143814 priority patent/WO2023160241A1/en
Priority to US18/268,799 priority patent/US12342038B2/en
Application granted granted Critical
Publication of CN115529378B publication Critical patent/CN115529378B/en


Classifications

    • H04M1/72439: User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality, with interactive means for internal management of messages, for image or video messaging
    • H04M1/72403: User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04N21/4312: Generation of visual interfaces for content selection or interaction; content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H04N21/44008: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N21/44016: Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H04N21/47205: End-user interface for requesting content, additional data or services; end-user interface for interacting with content, e.g. for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
    • H04N21/8456: Structuring of content by decomposing the content in the time domain, e.g. in time segments
    • H04N21/8549: Creating video summaries, e.g. movie trailer
    • H04N5/265: Mixing

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Security & Cryptography (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Television Signal Processing For Recording (AREA)
  • Studio Devices (AREA)

Abstract

The application provides a video processing method and a related device. By performing scene analysis and transition analysis on a video recorded by a user, invalid segments in the recorded video can be deleted, a plurality of highlight video segments in the recorded video can be clipped, and the highlight video segments can be fused into one highlight video. Thus, the viewing value of the video recorded by the user can be improved.

Description

Video processing method and related device
Technical Field
The present application relates to the field of computer vision, and in particular, to a video processing method and related apparatus.
Background
Smartphones have developed to the point where photographing and video recording have become two of their most important features. As the photographing and video recording capabilities of electronic devices such as smartphones keep improving, more and more people use electronic devices such as smartphones instead of professional cameras to shoot.
When a user uses an electronic device such as a smartphone to record a video, the electronic device needs to synthesize a video stream from the image stream and the audio stream that are continuously acquired over a period of time. Because the recorded video usually contains a large amount of content, when the user looks back at it, the user easily becomes tired of the excessive uninteresting content it contains, and the viewing experience is poor.
Disclosure of Invention
The application provides a video processing method and a related device. By performing scene analysis and transition analysis on the video recorded by a user, meaningless segments in the recorded video are deleted, and a plurality of highlight video segments in the recorded video are clipped and fused into one highlight video, which improves the viewing value of the video recorded by the user.
In a first aspect, the application provides a video processing method, comprising: an electronic device displays a shooting interface, wherein the shooting interface comprises a preview frame and a recording start control, and pictures acquired by a camera of the electronic device in real time are displayed in the preview frame; the electronic device detects a first input on the recording start control; in response to the first input, the electronic device starts recording a first video; the electronic device displays a recording interface, wherein the recording interface comprises a recording end control and video pictures of the first video recorded by the electronic device in real time; the electronic device detects a second input on the recording end control; in response to the second input, the electronic device ends recording the first video; the electronic device stores the first video and a second video, wherein the first video comprises a first video segment, a second video segment and a third video segment, the ending time of the first video segment is earlier than or equal to the starting time of the second video segment, the ending time of the second video segment is earlier than or equal to the starting time of the third video segment, and the second video comprises the first video segment and the third video segment and does not comprise the second video segment.
By the video processing method, through analyzing scenes in the video recorded by a user, invalid segments in the recorded video (such as scene switches, picture zooming, fast camera movement, severe picture shake and the like) can be deleted, a plurality of highlight video segments of specified shooting scenes (such as characters, the Spring Festival, Christmas, ancient architecture, beaches, fireworks, plants or snow scenes and the like) can be clipped from the recorded video, and the highlight video segments can be fused into one highlight video. Thus, the viewing value of the video recorded by the user can be improved.
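For illustration only, the relationship between the first video and the second video can be sketched with a simple segment model; the Segment class, the labels and the time values below are assumptions made for this sketch and are not part of the claimed method.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Segment:
        start: float   # seconds from the start of the first video
        end: float     # seconds from the start of the first video
        label: str     # "highlight" or "invalid" (scene switch, zoom, shake, ...)

    def select_highlight_segments(segments: List[Segment]) -> List[Segment]:
        """Drop invalid segments; what remains is spliced into the second video."""
        return [s for s in segments if s.label == "highlight"]

    # Example matching the description above: the second (invalid) segment is dropped.
    first_video = [
        Segment(0.0, 3.0, "highlight"),   # first video segment
        Segment(3.0, 5.0, "invalid"),     # second video segment
        Segment(5.0, 7.0, "highlight"),   # third video segment
    ]
    print(select_highlight_segments(first_video))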
In one possible implementation, the duration of the first video is longer than the duration of the second video, or the duration of the first video is shorter than the duration of the second video, or the duration of the first video is equal to the duration of the second video.
In one possible implementation, before the electronic device saves the second video, the method further includes the electronic device stitching together the first video segment and the third video segment in the first video to obtain the second video.
In one possible implementation manner, the electronic device splices the first video segment and the third video segment in the first video to obtain the second video, which specifically includes: the electronic device splices the ending position of the first video segment and the starting position of the third video segment to obtain the second video; or the electronic device splices the ending position of the first video segment and the starting position of a first special effect segment, and splices the ending position of the first special effect segment and the starting position of the third video segment, to obtain the second video.
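A minimal sketch of this splicing, assuming clips are represented simply as lists of frames; the effect_clip argument stands in for the first special effect segment and is purely illustrative.

    def splice(first_clip, third_clip, effect_clip=None):
        """Join the end of the first clip to the start of the third clip,
        optionally inserting a special-effect (transition) clip between them."""
        if effect_clip is None:
            return first_clip + third_clip
        return first_clip + effect_clip + third_clip

    # Usage: the frame values are stand-ins for real decoded frames.
    second_video = splice(["f1", "f2"], ["f7", "f8"], effect_clip=["fade1", "fade2"])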
In one possible implementation, the first video segment and the third video segment are highlight video segments, and the second video segment is an invalid video segment.
In one possible implementation, the first video further includes a fourth video segment, the second video includes the fourth video segment if the fourth video segment is a highlight video segment, and the second video does not include the fourth video segment if the fourth video segment is an invalid video segment.
In one possible implementation, the highlight video clip includes a video clip in which the captured scene in the first video is a highlight scene and does not include a transition clip.
In one possible implementation, the highlight video clip includes a video clip in which the captured scene in the first video is a designated highlight scene and does not include a noisy or soundless transition clip.
The highlight scene comprises one or more of a character, a landscape, food, the Spring Festival, Christmas, a building, a beach, fireworks, a plant, a snow scene, travel, and the like.
In one possible implementation, the recording interface further includes a snapshot control, and when the electronic device displays the recording interface, the method further includes the electronic device receiving a third input by a user for the snapshot control, and in response to the third input, the electronic device saving a first video picture of the first video as a first picture when the third input is received.
In one possible implementation manner, after the electronic device finishes recording the first video, the method further comprises the step of storing a third video by the electronic device, wherein the first video comprises a fifth video segment and a sixth video segment, the ending time of the fifth video segment is earlier than or equal to the starting time of the sixth video segment, the third video comprises the fifth video segment and the sixth video segment, and the fifth video segment and the sixth video segment both comprise the same shooting subject.
In one possible implementation, after the electronic device stores the first video and the second video, the method further includes displaying a video album interface by the electronic device, where the video album interface includes a first option corresponding to the first video, and displaying a first video display interface of the first video after the electronic device detects a fourth input for the first option, where the first video display interface of the first video includes a first display area of the first video and a second display area of the second video, where the first display area is used to display a video frame of the first video, and the second display area is used to display a video frame of the second video. In this way, the first video and the second video are classified in one video display interface, so that a user can conveniently find the first video and the second video.
In one possible implementation, after the electronic device stores the first video and the second video, the method further includes displaying a video album interface by the electronic device, where the video album interface includes a first option corresponding to the first video and a second option corresponding to the second video, displaying a first video display interface of the first video after the electronic device detects a fourth input for the first option, where the first display interface of the first video includes a first display area of the first video, where the first display area is used to display a video frame of the first video, and displaying a second video display interface of the second video after the electronic device detects a fifth input for the second option, where the second display interface of the second video includes a second display area of the second video, where the second display area is used to display a video frame of the second video. Therefore, the options of the first video and the options of the second video are displayed in parallel in one video album, and a user can conveniently and quickly open the display interface of the first video or the display interface of the second video.
In one possible implementation, after the electronic device saves the first video and the second video, the method further includes displaying the capture interface by the electronic device and displaying a first prompt on the capture interface to prompt a user that the electronic device has generated and saved the second video from the recorded first video. In this way, the user can see the generated second video in time.
In one possible implementation, after the first input to the recording start control is detected, the method further includes: the electronic device collects an image stream of the first video in real time through a camera and collects an audio stream of the first video in real time through a microphone; the electronic device performs scene detection on the image stream of the first video to determine a scene category of each picture frame in the image stream of the first video; the electronic device performs transition detection on the image stream of the first video to determine a transition position and a transition category of scene transitions in the image stream of the first video; the electronic device determines a plurality of highlight picture segments in the image stream of the first video based on the scene category of each picture frame in the image stream of the first video and the transition position and transition category of scene transitions in the image stream of the first video; the electronic device divides the image stream of the first video into a plurality of picture segments and determines a segment theme of each of the plurality of picture segments; the electronic device determines the plurality of highlight picture segments in the image stream based on the segment theme of each picture segment; after the second input for the recording end control is detected, the electronic device mixes the image stream of the first video and the audio stream of the first video into the first video; the electronic device clips the first video segment and the third video segment from the first video based on the positions of the plurality of highlight picture segments in the image stream of the first video; and the electronic device generates the second video based on the first video segment and the third video segment.
In this way, by performing scene analysis and transition analysis on the recorded video while the user is recording it, invalid segments (e.g., scene switches, picture zooming, fast camera movement, severe picture shake, etc.) in the recorded video can be deleted, a plurality of highlight video segments in the recorded video can be clipped, and the highlight video segments can be fused into one highlight video. Thus, the viewing value of the video recorded by the user can be improved.
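The sketch below illustrates, under assumed labels and thresholds, how per-frame scene categories and detected transition frames could be merged into highlight picture segments; the actual detectors and merging rules of the method are not disclosed here.

    HIGHLIGHT_SCENES = {"character", "landscape", "food", "building", "beach",
                        "fireworks", "plant", "snow", "travel"}

    def highlight_picture_segments(frame_scenes, transition_frames, fps=30, min_len_s=1.0):
        """frame_scenes: per-frame scene labels; transition_frames: set of frame
        indices lying inside a transition. Returns (start_s, end_s) highlight spans."""
        segments, start = [], None
        for i, scene in enumerate(frame_scenes + [None]):   # sentinel flushes the last span
            keep = scene in HIGHLIGHT_SCENES and i not in transition_frames
            if keep and start is None:
                start = i
            elif not keep and start is not None:
                if (i - start) / fps >= min_len_s:
                    segments.append((start / fps, i / fps))
                start = None
        return segments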
In one possible implementation, after the first input to the recording start control is detected, the method further includes: the electronic device collects an image stream of the first video in real time through a camera and collects an audio stream of the first video in real time through a microphone; the electronic device performs scene detection on the image stream of the first video to determine a scene category of each picture frame in the image stream of the first video; the electronic device performs transition detection on the image stream of the first video to determine a transition position and a transition category of scene transitions in the image stream of the first video; the electronic device performs sound activation detection (voice activity detection) on the audio stream of the first video to identify the start and stop time points of voice signals in the audio stream of the first video, and divides the audio stream of the first video into a plurality of audio segments based on the start and stop time points of the voice signals; the electronic device performs audio event classification on the plurality of audio segments to determine an audio event category of each audio segment; based on the scene category of each picture frame in the image stream of the first video, the transition position and transition category of scene transitions in the image stream of the first video, and the audio event category of each of the plurality of audio segments, the electronic device divides the image stream of the first video into a plurality of picture segments and determines a segment theme of each of the plurality of picture segments; the electronic device determines a plurality of highlight picture segments in the image stream based on the segment theme of each picture segment; after the second input for the recording end control is detected, the electronic device mixes the image stream of the first video and the audio stream of the first video into the first video; the electronic device clips the first video segment and the third video segment from the first video based on the positions of the plurality of highlight picture segments in the image stream of the first video; and the electronic device generates the second video based on the first video segment and the third video segment.
In this way, scene analysis, transition analysis and audio event analysis can be performed on the recorded video while the user is recording it, meaningless segments in the recorded video can be deleted, a plurality of highlight video segments in the recorded video can be clipped, and the highlight video segments can be fused into one highlight video. This improves the viewing value of the video recorded by the user.
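As an illustration of the audio path only, the sketch below uses a naive short-time-energy voice activity detector to split the audio stream at speech start and stop points; the energy threshold and the placeholder classifier are assumptions, since the method itself would rely on trained audio models.

    import numpy as np

    def vad_segments(samples: np.ndarray, sr: int, frame_ms: int = 30, thresh: float = 0.01):
        """Return (start_s, end_s) spans where short-time energy exceeds thresh."""
        frame = int(sr * frame_ms / 1000)
        spans, start = [], None
        for i in range(0, len(samples) - frame, frame):
            active = float(np.mean(samples[i:i + frame] ** 2)) > thresh
            if active and start is None:
                start = i
            elif not active and start is not None:
                spans.append((start / sr, i / sr))
                start = None
        if start is not None:
            spans.append((start / sr, len(samples) / sr))
        return spans

    def classify_audio_event(segment_samples: np.ndarray) -> str:
        # Placeholder: a real implementation would run an audio event classifier
        # (e.g. speech / laughter / cheering / noise) on each audio segment.
        return "speech"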
In one possible implementation, after the electronic device generates the second video, the method further includes: the electronic device adds background music to the second video; and the storing of the second video by the electronic device specifically includes storing the second video to which the background music has been added.
In one possible implementation, the first input includes one or more of a gesture input, a click input, a double click input, and the like.
In a second aspect, the present application provides an electronic device comprising a display screen, a camera, one or more processors, and one or more memories. The one or more memories are coupled to the one or more processors, the one or more memories being operable to store computer program code comprising computer instructions that, when executed by the one or more processors, cause the electronic device to perform the video processing method in any of the possible implementations of the above.
In a third aspect, the present application provides a chip system for application to an electronic device, the chip system comprising one or more processors configured to invoke computer instructions to cause the electronic device to perform the video processing method in any of the possible implementations of the above aspect.
In a fourth aspect, the present application provides a computer storage medium comprising computer instructions which, when run on an electronic device, cause the electronic device to perform the video processing method of any one of the possible implementations of the above aspect.
In a fifth aspect, the application provides a computer program product for, when run on a computer, causing the computer to perform the video processing method of any one of the possible implementations of the above aspect.
Drawings
FIG. 1 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a software architecture of an electronic device according to an embodiment of the present application;
FIGS. 3A-3I are schematic diagrams illustrating a video recording interface according to an embodiment of the present application;
FIG. 3J is a schematic diagram of highlight video stitching according to an embodiment of the present application;
FIGS. 4A-4G are schematic diagrams of a set of highlight video display interfaces according to an embodiment of the present application;
FIGS. 5A-5E are schematic diagrams of a highlight setting interface according to an embodiment of the present application;
FIGS. 6A-6F are schematic diagrams of a set of interfaces for generating a highlight video according to an embodiment of the present application;
FIGS. 7A-7H are schematic diagrams of another set of interfaces for generating a highlight video according to an embodiment of the present application;
FIGS. 8A-8C are schematic diagrams of a set of interfaces for generating a highlight video in a video call scene according to an embodiment of the present application;
FIG. 9 is a schematic flowchart of a video processing method according to an embodiment of the present application;
FIG. 10 is a timing diagram of generating a highlight video according to an embodiment of the present application;
FIG. 11 is a schematic diagram of highlight video stitching according to an embodiment of the present application;
FIG. 12 is a schematic block diagram of a video processing system according to an embodiment of the present application;
FIG. 13 is a flowchart of a video processing method according to another embodiment of the present application;
FIG. 14 is a timing diagram of generating a highlight video according to another embodiment of the present application;
FIG. 15 is a schematic block diagram of a video processing system according to another embodiment of the present application.
Detailed Description
The technical solutions of the embodiments of the present application will be clearly and thoroughly described below with reference to the accompanying drawings. In the description of the embodiments of the present application, unless otherwise indicated, "/" means "or"; for example, A/B may represent A or B. "And/or" in the text merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" covers the three cases of A alone, both A and B, and B alone. Further, in the description of the embodiments of the present application, "a plurality" means two or more.
The terms "first," "second," and the like, are used below for descriptive purposes only and are not to be construed as implying or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature, and in the description of embodiments of the application, unless otherwise indicated, the meaning of "a plurality" is two or more.
The term "User Interface (UI)" in the following embodiments of the present application is a media interface for interaction and information exchange between an application program or an operating system and a user, which enables conversion between an internal form of information and a form acceptable to the user. The user interface is a source code written in a specific computer language such as java, extensible markup language (extensible markup language, XML) and the like, and the interface source code is analyzed and rendered on the electronic equipment to finally be presented as content which can be identified by a user. A commonly used presentation form of a user interface is a graphical user interface (GRAPHICAL USER INTERFACE, GUI), which refers to a graphically displayed user interface that is related to computer operations. It may be a visual interface element of text, icons, buttons, menus, tabs, text boxes, dialog boxes, status bars, navigation bars, widgets, etc., displayed in a display of the electronic device.
Fig. 1 shows a schematic configuration of an electronic device 100.
The embodiment will be specifically described below taking the electronic device 100 as an example. It should be understood that the electronic device 100 shown in fig. 1 is only one example, and that the electronic device 100 may have more or fewer components than shown in fig. 1, may combine two or more components, or may have a different configuration of components. The various components shown in the figures may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, keys 190, a motor 191, an indicator 192, a camera 193, a display 194, and a subscriber identity module (subscriber identification module, SIM) card interface 195, etc. The sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It should be understood that the illustrated structure of the embodiment of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the application, electronic device 100 may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processing unit (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a memory, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural-network processing unit (neural-network processing unit, NPU), etc. The different processing units may be separate devices or may be integrated in one or more processors.
The controller may be a neural hub and a command center of the electronic device 100, among others. The controller can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or recycled. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby improving the efficiency of the system.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (inter-integrated circuit, I2C) interface, an inter-integrated circuit sound (inter-integrated circuit sound, I2S) interface, a pulse code modulation (pulse code modulation, PCM) interface, and/or a universal serial bus (universal serial bus, USB) interface, etc.
The electronic device 100 implements display functions through a GPU, a display screen 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like. The display 194 includes a display panel. The display panel may employ a liquid crystal display (liquid crystal display, LCD). The display panel may also be manufactured using organic light-emitting diodes (organic light-emitting diode, OLED), active-matrix organic light-emitting diodes (active-matrix organic light emitting diode, AMOLED), flexible light-emitting diodes (flexible light-emitting diode, FLED), Mini-LED, Micro-LED, Micro-OLED, quantum dot light-emitting diodes (quantum dot light emitting diode, QLED), or the like. In some embodiments, the electronic device 100 may include 1 or N display screens 194, N being a positive integer greater than 1.
The electronic device 100 may implement photographing functions through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
The ISP is used to process data fed back by the camera 193. For example, when photographing, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electric signal, and the camera photosensitive element transmits the electric signal to the ISP for processing and is converted into an image visible to naked eyes. ISP can also optimize the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in the camera 193.
The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image onto the photosensitive element. The photosensitive element may be a charge coupled device (charge coupled device, CCD) or a Complementary Metal Oxide Semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then transferred to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard RGB, YUV, or the like format. In some embodiments, electronic device 100 may include 1 or N cameras 193, N being a positive integer greater than 1.
The digital signal processor is used for processing digital signals, and can process other digital signals besides digital image signals. For example, when the electronic device 100 selects a frequency bin, the digital signal processor is used to fourier transform the frequency bin energy, or the like.
Video codecs are used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. Thus, the electronic device 100 may play or record video in a variety of encoding formats, such as moving picture experts group (moving picture experts group, MPEG) 1, MPEG2, MPEG3, MPEG4, and the like.
The NPU is a neural-network (NN) computing processor, and can rapidly process input information by referencing a biological neural network structure, for example, referencing a transmission mode between human brain neurons, and can also continuously perform self-learning. Applications such as intelligent recognition of the electronic device 100, for example, image recognition, face recognition, voice recognition, text understanding, etc., can be realized through the NPU.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to enable expansion of the memory capabilities of the electronic device 100. The external memory card communicates with the processor 110 through an external memory interface 120 to implement data storage functions. For example, files such as music, video, etc. are stored in an external memory card.
The internal memory 121 may be used to store computer executable program code including instructions. The processor 110 executes various functional applications of the electronic device 100 and data processing by executing instructions stored in the internal memory 121. The internal memory 121 may include a storage program area and a storage data area.
The electronic device 100 may implement audio functions through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like. Such as music playing, recording, etc.
The audio module 170 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or a portion of the functional modules of the audio module 170 may be disposed in the processor 110.
The speaker 170A, also referred to as a "horn," is used to convert audio electrical signals into sound signals. The electronic device 100 may listen to music, or to hands-free conversations, through the speaker 170A.
A receiver 170B, also referred to as a "earpiece", is used to convert the audio electrical signal into a sound signal. When electronic device 100 is answering a telephone call or voice message, voice may be received by placing receiver 170B in close proximity to the human ear.
Microphone 170C, also referred to as a "mic" or "mike", is used to convert sound signals into electrical signals. When making a call or transmitting voice information, the user can speak close to the microphone 170C, inputting a sound signal into the microphone 170C. The electronic device 100 may be provided with at least one microphone 170C.
The earphone interface 170D is used to connect a wired earphone. The earphone interface 170D may be the USB interface 130, a 3.5 mm open mobile terminal platform (open mobile terminal platform, OMTP) standard interface, or a Cellular Telecommunications Industry Association of the USA (cellular telecommunications industry association of the USA, CTIA) standard interface.
The pressure sensor 180A is used to sense a pressure signal, and may convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be disposed on the display screen 194. The pressure sensor 180A is of various types, such as a resistive pressure sensor, an inductive pressure sensor, a capacitive pressure sensor, and the like.
The gyro sensor 180B may be used to determine a motion gesture of the electronic device 100. In some embodiments, the angular velocity of electronic device 100 about three axes (i.e., x, y, and z axes) may be determined by gyro sensor 180B. The gyro sensor 180B may be used for photographing anti-shake. For example, when the shutter is pressed, the gyro sensor 180B detects the shake angle of the electronic device 100, calculates the distance to be compensated by the lens module according to the angle, and makes the lens counteract the shake of the electronic device 100 through the reverse motion, so as to realize anti-shake. The gyro sensor 180B may also be used for navigating, somatosensory game scenes.
The acceleration sensor 180E may detect the magnitude of acceleration of the electronic device 100 in various directions (typically three axes). The magnitude and direction of gravity may be detected when the electronic device 100 is stationary. It may also be used to recognize the posture of the electronic device, and is applied to landscape/portrait screen switching, pedometers and other applications.
A distance sensor 180F for measuring a distance. The electronic device 100 may measure the distance by infrared or laser. In some embodiments, the electronic device 100 may range using the distance sensor 180F to achieve quick focus.
The fingerprint sensor 180H is used to collect a fingerprint. The electronic device 100 may utilize the collected fingerprint feature to unlock the fingerprint, access the application lock, photograph the fingerprint, answer the incoming call, etc.
The temperature sensor 180J is for detecting temperature. In some embodiments, the electronic device 100 performs a temperature processing strategy using the temperature detected by the temperature sensor 180J.
The touch sensor 180K, also referred to as a "touch panel". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, which is also called a "touch screen". The touch sensor 180K is for detecting a touch operation acting thereon or thereabout. The touch sensor may communicate the detected touch operation to the application processor to determine the touch event type. Visual output related to touch operations may be provided through the display 194. In other embodiments, the touch sensor 180K may also be disposed on the surface of the electronic device 100 at a different location than the display 194.
Fig. 2 schematically illustrates a software architecture of an electronic device according to an embodiment of the present application.
As shown in fig. 2, the hierarchical architecture divides the system into several layers, each with distinct roles and branches. The layers communicate with each other through a software interface. In some embodiments, the system is divided into five layers, from top to bottom, an application layer, an application framework layer, a hardware abstraction layer, a kernel layer, and a hardware layer, respectively.
The application layer may include a series of application packages.
The application package may include a camera, gallery, etc.
The application framework layer provides an application programming interface (application programming interface, API) and programming framework for the application of the application layer. The application framework layer includes a number of predefined functions.
In some embodiments, the application framework layer may include a camera access interface, wherein the camera access interface may include camera management and camera devices. The camera access interface is used to provide an application programming interface and programming framework for camera applications.
The hardware abstraction layer is an interface layer between the application framework layer and the kernel layer, and provides a virtual hardware platform for the operating system.
In the embodiment of the application, the hardware abstraction layer can comprise a camera hardware abstraction layer and a camera algorithm library.
The camera hardware abstraction layer may provide, among other things, virtual hardware of the camera device 1 (first camera) and the camera device 2 (second camera). It may also acquire pose data and transmit it to the camera algorithm library. The camera hardware abstraction layer may also be used to calculate the number N of images to be stitched and to obtain information from the camera algorithm library.
The camera algorithm library may include an algorithm module and a motion detection module.
The algorithm module comprises a plurality of algorithms for processing the images, and the algorithms can be used for realizing the splicing and other processing of the N frames of images to be spliced.
The motion detection module may be used to calculate whether a current shooting scene of the electronic device is moving.
The kernel layer is a layer between hardware and software. The kernel layer includes drivers for various hardware.
In some embodiments, the kernel layer may include camera device drivers, digital signal processor drivers, and image processor drivers, among others.
The camera device driver is used to drive the sensor of the camera to acquire images and to drive the image signal processor to preprocess the images.
The digital signal processor driver is used for driving the digital signal processor to process the image.
The image processor driver is used for driving the image processor to process the image.
The method in the embodiment of the present application is specifically described below with reference to the above hardware structure and system structure:
1. The electronic device 100 starts a video recording function, and acquires an image stream and an audio stream.
This step 1 is carried out continuously. In response to an operation (for example, a click) of a user on the video recording start control of the shooting interface, the camera application calls the camera access interface of the application framework layer and sends an instruction for starting video recording to the camera device 1 (the first camera) in the camera hardware abstraction layer; the camera hardware abstraction layer sends the instruction to the camera device driver of the kernel layer, and the camera device driver starts the sensor (sensor 1) of the first camera and collects image light signals through sensor 1. The image light signals are transmitted to the image signal processor for preprocessing to obtain an image stream (an image sequence formed by at least 2 original image frames), and the original stream is then transmitted to the camera hardware abstraction layer through the camera device driver. The camera application also sends an instruction for starting recording through the audio input unit in the audio hardware abstraction layer; the audio hardware abstraction layer sends the instruction to the audio driver of the kernel layer, and the audio driver starts the microphone to collect audio signals to obtain an audio stream.
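Purely as an illustration of the layered flow described above, the class and method names below are invented for this sketch and do not correspond to a real camera HAL API.

    class CameraDeviceDriver:
        def start_sensor(self):
            print("kernel layer: sensor 1 powered on, collecting image light signals")

    class CameraHardwareAbstractionLayer:
        def __init__(self, driver):
            self.driver = driver
        def start_recording(self):
            # The HAL forwards the start-recording instruction to the kernel-layer driver.
            self.driver.start_sensor()

    class CameraApplication:
        def __init__(self, hal):
            self.hal = hal
        def on_record_start_clicked(self):
            # Application layer -> camera access interface -> HAL -> driver.
            self.hal.start_recording()

    CameraApplication(CameraHardwareAbstractionLayer(CameraDeviceDriver())).on_record_start_clicked()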
2. The electronic device 100 obtains a processing stream from the image stream.
This step 2 is also carried out continuously. The camera hardware abstraction layer may send the original stream to the camera algorithm library. Based on the support of the digital signal processor and the image processor, the camera algorithm library may first downsample the original stream to obtain a low-resolution processing stream.
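A minimal sketch of such downsampling using OpenCV; the 320-pixel target width is an assumption, since the processing-stream resolution is not specified here.

    import cv2

    def downsample(frame, target_w: int = 320):
        """Return a low-resolution copy of one original frame for the processing stream."""
        h, w = frame.shape[:2]
        scale = target_w / w
        return cv2.resize(frame, (target_w, int(h * scale)), interpolation=cv2.INTER_AREA)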
3. The electronic device 100 performs scene detection and transition detection on the image frames in the processing stream, determining highlight clips.
This step 3 is also carried out continuously. Based on the support of the digital signal processor and the image processor, the camera algorithm library can call a scene detection algorithm, a transition detection algorithm and the like to detect the scene category of each picture frame in the image stream and the transition position and transition category where scene transitions occur, so as to determine the highlight picture segments.
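The detectors used here are not disclosed; as one simple illustrative heuristic only, a transition can be flagged when the colour histograms of consecutive frames differ sharply.

    import cv2

    def is_transition(prev_frame, cur_frame, threshold: float = 0.5) -> bool:
        def hue_hist(f):
            h = cv2.calcHist([cv2.cvtColor(f, cv2.COLOR_BGR2HSV)], [0], None, [32], [0, 180])
            return cv2.normalize(h, h).flatten()
        # Correlation near 1.0 means similar frames; a low value suggests a cut or fast pan.
        similarity = cv2.compareHist(hue_hist(prev_frame), hue_hist(cur_frame), cv2.HISTCMP_CORREL)
        return similarity < threshold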
4. The electronic device 100 mixes the image stream and the audio stream into an original video.
Based on the support of the digital signal processor and the image processor, the image stream and the audio stream can be mixed into the original video based on the same time track.
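The mixing is performed on the device itself; purely as an off-device illustration, an encoded image (video) stream and an audio stream can be muxed onto one shared time track with ffmpeg (the file names are placeholders).

    import subprocess

    subprocess.run([
        "ffmpeg", "-y",
        "-i", "image_stream.mp4",   # encoded image stream
        "-i", "audio_stream.m4a",   # recorded audio stream
        "-map", "0:v:0", "-map", "1:a:0",
        "-c", "copy",               # no re-encoding; both streams share one time track
        "original_video.mp4",
    ], check=True)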
5. The electronic device 100 may extract a plurality of highlight video segments from the original video based on the location of the highlight frame segments and fuse the plurality of highlight video segments into one highlight video.
The camera algorithm library may invoke a clipping algorithm and a fusion algorithm to extract a plurality of highlight video segments from the original video based on the locations of the highlight picture segments and fuse the plurality of highlight video segments into one highlight video. The highlight video segments may include video segments in the original video in which the shooting scene is a highlight scene and which do not include a transition segment. Or the highlight video segments may include video segments in the original video in which the shooting scene is a highlight scene and which do not include a noisy or soundless transition segment. The highlight scene comprises one or more of a character, a landscape, food, the Spring Festival, Christmas, a building, a beach, fireworks, a plant, a snow scene, travel, and the like.
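An illustrative sketch (not the on-device algorithm) of clipping the highlight spans out of the original video and fusing them with ffmpeg's concat demuxer; the paths and span values are placeholders.

    import subprocess

    def fuse_highlights(src: str, spans, out: str = "highlight.mp4"):
        """spans: list of (start_s, end_s) tuples for the highlight video segments."""
        parts = []
        for i, (start, end) in enumerate(spans):
            part = f"part_{i}.mp4"
            subprocess.run(["ffmpeg", "-y", "-i", src, "-ss", str(start), "-to", str(end),
                            "-c", "copy", part], check=True)
            parts.append(part)
        with open("parts.txt", "w") as f:
            f.writelines(f"file '{p}'\n" for p in parts)
        subprocess.run(["ffmpeg", "-y", "-f", "concat", "-safe", "0",
                        "-i", "parts.txt", "-c", "copy", out], check=True)

    fuse_highlights("original_video.mp4", [(0.0, 3.0), (5.0, 7.0), (9.0, 11.0)])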
6. The electronic device 100 may save the highlight video and the original video.
The camera algorithm library may send the highlight video to the camera hardware abstraction layer. The camera hardware abstraction layer may then save it.
The embodiment of the application provides a video processing method. By analyzing scenes in the video recorded by a user, the method deletes invalid segments (such as scene switches, picture zooming, fast camera movement, severe picture shake and the like) in the recorded video, clips a plurality of highlight video segments of specified shooting scenes (such as characters, the Spring Festival, Christmas, ancient architecture, beaches, fireworks, plants or snow scenes and the like) from the recorded video, and fuses the highlight video segments into one highlight video. Thus, the viewing value of the video recorded by the user can be improved.
The video processing method provided by the embodiment of the application is described below with reference to application scenes.
In some application scenarios, a user may record video in the normal recording mode of the camera application using the electronic device 100. During the recording, the electronic device 100 may identify and clip a plurality of highlight video segments of highlight scenes in the recorded original video, and fuse the plurality of highlight video segments into one highlight video. After the recording is completed, the electronic device 100 may save both the original video and the highlight video. Thus, the viewing value of the video recorded by the user can be improved.
For example, as shown in FIG. 3A, the electronic device 100 may display a desktop 310 in which a page of application icons is displayed, the page including a plurality of application icons (e.g., a weather application icon, a stock application icon, a calculator application icon, a settings application icon, a mail application icon, a gallery application icon 312, a music application icon, a video application icon, a browser application icon, etc.). A page indicator is also displayed below the application icons to indicate the positional relationship between the currently displayed page and other pages. Below the page indicator are a plurality of tray icons (e.g., a dialing application icon, an information application icon, a contacts application icon, and a camera application icon 311) that remain displayed when the page is switched. In some embodiments, the page indicator may not be part of the page and may exist alone; the tray icons are also optional; embodiments of the application are not limited in this respect.
The electronic device 100 may receive an input operation (e.g., a click) by a user on the camera application icon 311, and in response to the input operation, the electronic device 100 may display a photographing interface 320 as shown in fig. 3B.
As shown in fig. 3B, the capture interface 320 may include a playback control 321, a capture control 322, a camera conversion control 323, a preview box, a settings control 325, a zoom magnification control 326, and one or more shooting mode controls (e.g., a "night scene mode" control 327A, a "portrait mode" control 327B, a "large aperture mode" control 327C, a "normal photographing mode" control 327D, a "record mode" control 327E, a "professional mode" control 327F, and a more-modes control 327G). A preview screen 324 is displayed in the preview box. The playback control 321 can be used to display a captured image. The capture control 322 is used to trigger saving of an image captured by the camera. The camera conversion control 323 can be used to switch the camera used for shooting. The settings control 325 may be used to set the photographing function. The zoom magnification control 326 may be used to set the zoom magnification of the camera. A shooting mode control can be used to trigger the image processing flow corresponding to that shooting mode. For example, the "night scene mode" control 327A may be used to trigger an increase in brightness and color richness in the captured image, and the "portrait mode" control 327B may be used to trigger beautification processing of a portrait in the captured image. As shown in fig. 3B, the shooting mode currently selected by the user is the "normal photographing mode".
The electronic device 100 may receive input (e.g., a single click) from the user selecting the "record mode" control 327E, and in response to the input, the electronic device 100 may switch from the "normal photographing mode" to the "record mode" and replace the photographing control 322 with the recording start control 331, as shown in fig. 3C. The electronic device 100 may also display video time information 332.
As shown in fig. 3C, when the electronic device 100 records a video, there are a person A, a person B, and a Ferris wheel in the direction in which the camera of the electronic device 100 is aimed. The electronic device 100 may receive a first input (e.g., a single click) by a user on the recording start control 331, and in response to this input, the electronic device 100 may start recording video. For example, after recording starts, the user may shoot person A during the 0-3 s period, pan from person A to person B during the 3 s-5 s period, shoot person B during the 5 s-7 s period, pan from person B to the Ferris wheel during the 7 s-9 s period, shoot the Ferris wheel during the 9 s-11 s period, zoom in on Ferris wheel details during the 11 s-13 s period (during which the picture is blurred), continue shooting the Ferris wheel during the 13 s-15 s period, pan from the Ferris wheel to a panorama including person A, person B and the Ferris wheel during the 15 s-17 s period, and shoot the panorama including person A, person B and the Ferris wheel during the 17 s-20 s period.
As shown in fig. 3D, the electronic device 100 may display a video recording interface 330 after starting to record video. The recording interface 330 includes a recording end control 333, a shooting control 334, recording time information 332, and a recording preview box. The recording end control 333 may be used to trigger the electronic device 100 to end recording the video. The shooting control 334 may be used to trigger the electronic device 100, upon receiving a third input from the user, to save a first video frame of the first video captured by the camera of the electronic device 100 as a first picture.
The original video recorded by the electronic device 100 may include a plurality of highlight video segments having highlight scenes, where the highlight scenes may include one or more of people, landscapes, food, Spring Festival, Christmas, buildings, beaches, fireworks, plants, snow scenes, travel, and the like.
For example, as shown in fig. 3D, person A appears in the video frame 341 at about the 4th second of the original video recorded by the electronic device 100, and the electronic device 100 may determine that the scene category of the video segment around the 4th second of the original video is "person". As shown in fig. 3E, person B appears in the video frame 342 at about the 9th second of the original video, and the electronic device 100 may determine that the scene category of the video segment around the 9th second is "person". As shown in fig. 3F, a building (for example, the Ferris wheel) appears in the video frame 343 at about the 12th second of the original video, and the electronic device 100 may determine that the scene category of the video segment around the 12th second is "building". As shown in fig. 3G, the video frame 344 is captured while zooming to increase the zoom magnification, so the building (for example, the Ferris wheel) in the video frame 344 is blurred, and the electronic device 100 may determine that the video segment around the 16th second of the original video is an invalid segment. As shown in fig. 3H, a panorama of the building, person A, and person B appears in the video frame 345 at about the 20th second of the original video, and the electronic device 100 may determine that the scene category of the video segment around the 20th second is "travel". Alternatively, the transition portions between shooting scenes may be regarded as invalid; for example, the transition from person A to person B in the 3s-5s period after recording starts, the transition from person B to the Ferris wheel in the 7s-9s period, the zooming in on the Ferris wheel (for example, increasing the zoom magnification) in the 11s-13s period, and the transition from the Ferris wheel to the panorama including person A, person B, and the Ferris wheel in the 15s-17s period may all be regarded as invalid.
As shown in fig. 3H, the electronic device 100 may receive a second input from the user on the recording end control 333 (e.g., clicking the recording end control 333 at the 20th second after recording starts). In response to the second input, the electronic device 100 may end recording and save the recorded original video and the highlight video cut out of the original video.
While recording the original video, the electronic device 100 may continuously identify and crop out a plurality of highlight video segments of the original video that are in specified scenes. After the electronic device 100 finishes recording the original video, the electronic device 100 may fuse the plurality of highlight video segments into one highlight video. The electronic device 100 may save the original video and the highlight video.
Alternatively, as shown in fig. 3I, the electronic device 100 may display the shooting interface 340 after finishing the video recording. For the text description of the shooting interface 340, reference may be made to the description of fig. 3C, which is not repeated here. After the electronic device 100 generates and saves the highlight video, the electronic device 100 may display a prompt 335 (which may be referred to as a first prompt in an embodiment of the present application) on the shooting interface 340, where the prompt 335 is used to prompt the user that the electronic device 100 has generated and saved a highlight video from the recorded original video. The prompt 335 may be a text prompt (e.g., "A highlight video has been generated from the video you shot; please view it in the gallery"), a pattern prompt, an animation prompt, or the like.
In one possible implementation, the electronic device 100 may save the original video after finishing recording the original video, and then identify and crop out a plurality of highlight video segments in the original video that are in the specified scene. After cropping out the plurality of highlight video segments, the electronic device 100 may fuse the plurality of highlight video segments into one highlight video. After generating the highlight video, the electronic device 100 may save the highlight video.
For example, as shown in fig. 3J, the 0-3s segment of the original video shoots person A, the 3s-5s segment transitions from person A to person B, the 5s-7s segment shoots person B, the 7s-9s segment transitions from person B to the Ferris wheel, the 9s-11s segment shoots the Ferris wheel, the 11s-13s segment zooms in on details of the Ferris wheel but the picture is blurred, the 13s-15s segment shoots the Ferris wheel, the 15s-17s segment transitions from the Ferris wheel to a panorama including person A, person B, and the Ferris wheel, and the 17s-20s segment shoots the panorama including person A, person B, and the Ferris wheel. The 3s-5s, 7s-9s, 11s-13s, and 15s-17s segments are all transitions or zooming, so they may be determined as invalid segments. The remaining 0-3s, 5s-7s, 9s-11s, 13s-15s, and 17s-20s segments may be determined as highlight video segments: the 0-3s segment is highlight video segment 1, the 5s-7s segment is highlight video segment 2, the 9s-11s segment is highlight video segment 3, the 13s-15s segment is highlight video segment 4, and the 17s-20s segment is highlight video segment 5. The electronic device 100 may splice highlight video segment 1, highlight video segment 2, highlight video segment 3, highlight video segment 4, and highlight video segment 5 together end to end in chronological order to obtain a highlight video. For example, the tail of highlight video segment 1 may be spliced to the head of highlight video segment 2, the tail of highlight video segment 2 to the head of highlight video segment 3, the tail of highlight video segment 3 to the head of highlight video segment 4, and the tail of highlight video segment 4 to the head of highlight video segment 5.
Alternatively, if highlight video segment 4 in fig. 3J is an invalid segment (for example, because the picture is blurred, there is no shooting object, the user's hand trembles, or a passer-by or other obstacle blocks the picture), then the highlight video is obtained by splicing together highlight video segment 1, highlight video segment 2, highlight video segment 3, and highlight video segment 5. For the specific process by which the electronic device 100 identifies and crops the plurality of highlight video segments in the original video and fuses them into one highlight video, reference may be made to the following embodiments of the present application, which are not described here.
The first input, the second input, and other inputs in the embodiment of the present application include, but are not limited to, gesture input, click operation input, voice input, and the like.
In some embodiments, after saving the original video and the highlight video generated from the original video, the electronic device 100 may display a presentation area for the highlight video in the presentation interface of the original video. When the electronic device 100 receives a user input (e.g., a single click) on the presentation area of the highlight video, the electronic device 100 may play the highlight video.
For example, as shown in FIG. 4A, the electronic device 100 may display a desktop 310. The text description of the desktop 310 may refer to the embodiment shown in fig. 3A, and will not be repeated here.
The electronic device 100 may receive input (e.g., a single click) from a user on the gallery application icon 312, in response to which the electronic device 100 may display a gallery application interface 410 as shown in fig. 4B.
As shown in FIG. 4B, the gallery application interface 410 may display one or more albums (e.g., an all-photos album, a video album 416, a camera album, a portrait album, a WeChat album, a Weibo album, etc.). The electronic device 100 may display a gallery menu 411 at the bottom of the gallery application interface 410. The gallery menu 411 includes a photo control 412, an album control 413, a time control 414, and a discovery control 415. The photo control 412 is used to trigger the electronic device 100 to display all local pictures in the form of thumbnails. The album control 413 is used to trigger the electronic device 100 to display the albums to which local pictures belong. As shown in fig. 4B, the album control 413 is currently in the selected state, and the electronic device 100 displays the gallery application interface 410. The time control 414 may be used to trigger the electronic device 100 to display locally stored featured pictures. The discovery control 415 may be used to trigger the electronic device 100 to display categorized albums of pictures.
The electronic device 100 may receive user input (e.g., a click) on the video album 416, in response to which the electronic device 100 may display a video album interface 420 as shown in fig. 4C.
As shown in fig. 4C, the video album interface 420 may include options for one or more video files, such as an option 421 (which may be referred to as a first option in the embodiment of the present application) corresponding to the original video recorded by the user. A video file option may display a thumbnail of a designated frame and the video duration; for example, the option 421 may display a thumbnail of the first frame of the original video recorded by the user and its duration (e.g., 20 seconds).
The electronic device 100 may receive a fourth input (e.g., a single click) by the user on the option 421 described above, in response to which the electronic device 100 may display a video presentation interface 430 as shown in fig. 4D.
In one possible implementation, the electronic device 100 may also display the video presentation interface 430 shown in fig. 4D (which may be referred to as a first video presentation interface in embodiments of the present application) in response to a user input (e.g., a single click) on the playback control 321 shown in fig. 3I.
As shown in fig. 4D, the video presentation interface 430 may include a presentation area 431 for the original video (which may be referred to as a first presentation area in this embodiment), a presentation area 433 for the highlight video generated from the original video (which may be referred to as a second presentation area in this embodiment), a menu 436, and the like. The presentation area 431 of the original video may display a frame of the original video and time information 432 (for example, a duration of 20 seconds). When the presentation area 431 of the original video receives a user input (e.g., a click), the electronic device 100 may play or pause the original video. The presentation area 433 of the highlight video may display a frame of the highlight video and time information 435 of the highlight video (e.g., a duration of 12 seconds). Optionally, a highlight mark 434 may be displayed on the presentation area 433; the highlight mark 434 may be used to prompt the user that the presentation area 433 displays a highlight video generated from the original video. The menu 436 may include a share button, a favorites button, an edit button, a delete button, and a more button. The share button may be used to trigger sharing of the original video and/or the highlight video. The favorites button may be used to trigger adding the original video and/or the highlight video to a favorites folder. The edit button may be used to trigger editing functions on the original video and/or the highlight video, such as rotation, cropping, adding filters, and blurring. The delete button may be used to trigger deletion of the original video and/or the highlight video. The more button may be used to trigger opening of more functions related to the original video and/or the highlight video.
The electronic device 100 may receive a user input (e.g., a single click) on the presentation area 433 of the highlight video, and in response to the input, as shown in fig. 4E, the electronic device 100 may zoom out the presentation area 431 of the original video and zoom in the presentation area 433 of the highlight video in the video presentation interface 430. After the presentation area 433 of the highlight video is displayed enlarged, the electronic device 100 may receive a user input (e.g., a click) on the presentation area 433 of the highlight video, and in response to the input, the electronic device 100 may play the highlight video.
In some embodiments, after saving the original video and the highlight video generated from the original video, the electronic device 100 may display the option of the original video and the option of the highlight video side by side in the video album. When the electronic device 100 receives a user input on the option of the original video, the electronic device 100 may display a presentation interface of the original video. When the electronic device 100 receives a user input on the option of the highlight video, the electronic device 100 may display a presentation interface of the highlight video.
Illustratively, after the electronic device 100 receives a user input (e.g., a click) on the video album 416 shown in fig. 4B, the electronic device 100 may display the video album interface 440 shown in fig. 4F.
As shown in fig. 4F, the video album interface 440 may include a plurality of video file options, including an option 421 of the original video (which may be referred to as a first option in the embodiment of the present application) and an option 423 of the highlight video generated based on the original video (which may be referred to as a second option in the embodiment of the present application). The option 421 may display a thumbnail of a designated frame in the original video recorded by the user and the video duration (for example, 20 seconds). The option 423 may display a thumbnail of a designated frame in the highlight video, the video duration (e.g., 12 seconds), and a highlight mark 425. The highlight mark 425 may be used to prompt the user that the video file corresponding to the option 423 is a highlight video generated from the original video.
The electronic device 100 may receive a fifth input (e.g., a single click) of the user's option 423 for the highlight video, in response to which the electronic device 100 may display a video presentation interface 450 (which may be referred to as a second video presentation interface in embodiments of the application) as shown in fig. 4G.
As shown in fig. 4G, the video presentation interface 450 may include a presentation area 451 for the highlight video and a menu 454. A frame of the highlight video and time information 452 (for example, a duration of 12 seconds) may be displayed in the presentation area 451. Optionally, a highlight mark 453 may be displayed on the presentation area 451; the highlight mark 453 may be used to prompt the user that the presentation area 451 displays a highlight video generated from the original video. The menu 454 may include a share button, a favorites button, an edit button, a delete button, and a more button. The share button may be used to trigger sharing of the highlight video. The favorites button may be used to trigger adding the highlight video to a favorites folder. The edit button may be used to trigger editing functions on the highlight video, such as rotation, cropping, adding filters, and blurring. The delete button may be used to trigger deletion of the highlight video. The more button may be used to trigger opening of more functions related to the highlight video. When the presentation area 451 of the highlight video receives a user input (e.g., a click), the electronic device 100 may play or pause the highlight video.
In some application scenarios, a user may use the electronic device 100 to record video in a specific recording mode (e.g., a highlight recording mode) of the camera application. While recording video, the electronic device 100 may identify and crop out a plurality of highlight video segments of specified shooting scenes in the recorded original video, and fuse the plurality of highlight video segments into one highlight video. After the recording is completed, the electronic device 100 may save the highlight video. Optionally, the electronic device 100 may also save the original video. In this way, the viewing experience of the video recorded by the user can be improved.
For example, as shown in fig. 5A, the electronic device 100 may display a shooting interface 510. The shooting interface 510 may include a playback control 511, a capture control 512, a camera switching control 513, a preview box, a settings control 515, a zoom magnification control 516, and one or more shooting mode controls (e.g., a "night scene mode" control 517A, a "portrait mode" control 517B, a "large aperture mode" control 517C, a "normal photographing mode" control 517D, a "video recording mode" control 517E, a "highlight recording mode" control 517H, a "professional mode" control 517F, a "more modes" control, etc.). The electronic device 100 may receive an input (e.g., a single click) from the user selecting the "highlight recording mode" control 517H, and in response to the input, the electronic device 100 may switch from the "normal photographing mode" to the "highlight recording mode". For the text description of the controls in the shooting interface 510, reference may be made to the shooting interface 320 shown in fig. 3B, which is not repeated here.
As shown in fig. 5B, after switching to the "highlight recording mode", the electronic device 100 may replace the capture control 512 with a recording start control 521. The electronic device 100 may also display recording time information 522.
The electronic device 100 may receive a user input (e.g., a single click) on the recording start control 521, and in response to the input, the electronic device 100 may begin recording video. In the highlight recording mode, while recording the original video, the electronic device 100 may continuously identify and crop out a plurality of highlight video segments of the original video that are in specified scenes. After the electronic device 100 finishes recording the original video, the electronic device 100 may fuse the plurality of highlight video segments into one highlight video. The electronic device 100 may save the highlight video. Optionally, the electronic device 100 may also save the original video.
In one possible implementation, as shown in fig. 5C, the electronic device 100 may display a prompt 523 on the shooting interface when switching to the highlight recording mode. The prompt 523 may be used to present the user with a description of the highlight recording mode (e.g., "highlight segments in your recording will be identified and a highlight video generated").
In one possible implementation, the electronic device 100 may allow the user to preset the highlight scenes needed during recording. After the user sets the highlight scenes, during recording the electronic device 100 may identify, from the original video, a plurality of highlight video segments corresponding to the highlight scenes set by the user, and fuse the plurality of highlight video segments into one highlight video.
For example, as shown in fig. 5C, the electronic device 100 may receive user input (e.g., a single click) for the settings control 515, in response to which the electronic device 100 may display a settings window 530 as shown in fig. 5D on the capture interface 510.
As shown in fig. 5D, the settings window 530 may include a window closing control 531 and one or more setting items, such as a resolution setting bar 532 and a highlight scene setting bar. The highlight scene setting bar may include one or more highlight scene setting items, such as a "person scene" setting 533, a "landscape scene" setting 534, a "building scene" setting 535, a "food scene" setting 536, and a "travel scene" setting 537.
As shown in fig. 5E, the electronic device 100 may receive user input on the highlight scene setting bar, selecting, for example, the person scene, landscape scene, food scene, and travel scene as the highlight scenes used when generating a highlight video. After the user sets the highlight scenes, during recording the electronic device 100 may identify, from the original video, a plurality of highlight video segments corresponding to the highlight scenes set by the user, and fuse the plurality of highlight video segments into one highlight video.
In some application scenarios, after recording the original video and saving it to the video album, the electronic device 100 may trigger generation of a highlight video from the original video in the presentation interface of the original video in the video album. After the user triggers generation of the highlight video, the electronic device 100 may identify and crop out a plurality of highlight video segments having highlight scenes in the original video, and fuse the plurality of highlight video segments into one highlight video. After the highlight video is generated, the electronic device 100 may save it. In this way, the viewing experience of the video recorded by the user can be improved.
For example, as shown in fig. 6A, the electronic device 100 may display the gallery application interface 410. For the text description of the gallery application interface 410, reference may be made to the embodiment shown in fig. 4B, which is not repeated here.
The electronic device 100 may receive user input (e.g., a click) on the video album 416, in response to which the electronic device 100 may display a video album interface 420 as shown in fig. 6B.
As shown in fig. 6B, the video album interface 420 may include one or more options for video files, such as the option 421 corresponding to the original video recorded by the user in the above embodiment. The detailed text descriptions of the video album interface 420 may refer to the text parts of the embodiment shown in fig. 4C, and will not be repeated herein.
The electronic device 100 may receive a user input (e.g., a single click) on the option 421 of the original video, and in response to the input, the electronic device 100 may display a video presentation interface 610 as shown in fig. 6C.
As shown in fig. 6C, the video presentation interface 610 may include a presentation area 611 of the original video, a menu 613, and a highlight video generation control 614, among others. The display area 611 of the original video may display the frame and time information 612 (for example, the time length is 20 seconds) in the original video. When the presentation area 611 of the original video receives user input (e.g., a click), the electronic device 100 may play or pause the original video. The highlight generation control 614 may be used to trigger the electronic device 100 to generate a highlight from the original video presented in the presentation area 611. The menu 613 may include a share button, a favorites button, an edit button, a delete button, and a more button. The share button may be used to trigger sharing of the original video. The collection button may be used to trigger collection of the original video to a collection folder. The edit button can be used to trigger the edit functions of rotation, cropping, adding filters, blurring, etc. to the original video. A delete button may be used to trigger deletion of the original video. More buttons may be used to trigger the opening of more functions related to the original video.
The electronic device 100 can receive input (e.g., a single click) by a user for the highlight video generation control 614, in response to which the electronic device 100 can identify and crop out multiple highlight video segments in the original video that are in the highlight scene and fuse the multiple highlight video segments into one highlight video.
Alternatively, as shown in fig. 6D, in the process of generating a highlight video by the electronic device 100, the electronic device 100 may display the generation progress 615 of the highlight video on the video presentation interface 610 of the original video. The specific process of the electronic device 100 identifying and cropping the multiple highlight video segments in the original video and merging the multiple highlight video segments into one highlight video may refer to the following embodiments of the present application, which are not described herein.
As shown in fig. 6E, after the electronic device 100 has generated the highlight video, the electronic device 100 may display, on the video presentation interface 610 of the original video, a presentation area 616 corresponding to the highlight video generated from the original video. The presentation area 616 of the highlight video may display a frame of the highlight video and time information 618 of the highlight video (e.g., a duration of 12 seconds). Optionally, a highlight mark 617 may be displayed on the presentation area 616; the highlight mark 617 may be used to prompt the user that the highlight video generated from the original video is displayed in the presentation area 616.
The electronic device 100 may receive a user input (e.g., a single click) on the presentation area 616 of the highlight video, and in response to the input, as shown in fig. 6F, the electronic device 100 may zoom out the presentation area 611 of the original video in the video presentation interface 610 and zoom in the presentation area 616 of the highlight video. After the presentation area 616 of the highlight video is enlarged, the electronic device 100 may receive a user input (e.g., a single click) on the presentation area 616 of the highlight video, and in response to the input, the electronic device 100 may play the highlight video.
In one possible implementation, when the user confirms, in the presentation interface of the original video displayed on the electronic device 100, that a highlight video is to be generated from the original video, the electronic device 100 may receive the highlight scenes set by the user. Based on the highlight scenes set by the user, the electronic device 100 may identify and crop out a plurality of highlight video segments in the original video that are in those highlight scenes, and fuse the plurality of highlight video segments into one highlight video. When the user selects different highlight scenes for the same original video, the electronic device 100 may generate different highlight videos.
For example, as shown in fig. 7A, the electronic device 100 may display a video presentation interface 610. For a text description of the video presentation interface 610, reference may be made to the embodiment shown in fig. 6C, which is not repeated here.
The electronic device 100 can receive input (e.g., a single click) by a user for the highlight video generation control 614, in response to which the electronic device 100 can display a scene setting window 710 as shown in fig. 7B.
As shown in fig. 7B, the scene setting window 710 may include a confirm control 716, a cancel control 717, and setting items for one or more highlight scenes, such as a "person scene" setting 711, a "landscape scene" setting 712, a "building scene" setting 713, a "food scene" setting 714, and a "travel scene" setting 715.
As shown in fig. 7C, the electronic device 100 may receive user input on the highlight scene setting items, selecting, for example, the person scene, landscape scene, food scene, and travel scene as the highlight scenes used when generating a highlight video. After the user has set the highlight scenes, the electronic device 100 may receive a user input (e.g., a single click) on the confirm control 716, and in response to the input, the electronic device 100 may identify, from the original video, a plurality of highlight video segments corresponding to the highlight scene type set a set by the user and fuse the plurality of highlight video segments into one highlight video (e.g., highlight video a).
Alternatively, as shown in fig. 7D, in the process of generating a highlight video by the electronic device 100, the electronic device 100 may display the generation progress 615 of the highlight video a and the scene type of the highlight video a on the video presentation interface 610 of the original video. The specific process of the electronic device 100 identifying and cropping the multiple highlight video segments in the original video and merging the multiple highlight video segments into one highlight video may refer to the following embodiments of the present application, which are not described herein.
As shown in fig. 7E, after the electronic device 100 generates highlight video a, the electronic device 100 may display, on the video presentation interface 610 of the original video, a presentation area 616 corresponding to highlight video a generated from the original video. The presentation area 616 of highlight video a may display a frame of highlight video a, time information 618 of highlight video a (e.g., a duration of 12 seconds), and scene information 619 of highlight video a (e.g., people, landscapes, food, and travel). Optionally, a highlight mark 617 may be displayed on the presentation area 616; the highlight mark 617 may be used to prompt the user that highlight video a generated from the original video is displayed in the presentation area 616.
Wherein, when the user selects different highlight scenes for the same original video, the electronic device 100 may generate different highlight videos. Thus, while and after the electronic device 100 generates the highlight video a from the original video, the electronic device 100 may also continue to display the highlight video generation control 614 on the video presentation interface 610 described above.
The electronic device 100 may continue to receive user input (e.g., a single click) for the highlight video generation control 614 after generating the highlight video a from the original video, in response to which the electronic device 100 may display the scene setting window 710 as shown in fig. 7F. For the text description of the scene setting window 710, reference may be made to the text portion of the embodiment shown in fig. 7B, which is not described herein.
As shown in fig. 7F, when the highlight scene type set b selected by the user in the scene setting window 710 is the same as the highlight scene type set a for which highlight video a has already been generated, the electronic device 100 may output a prompt 718 and disable the confirm control 716. After the confirm control 716 is disabled, it cannot perform the corresponding highlight video generation function in response to user input. The prompt 718 may be used to prompt the user that the selected highlight scene types are the same as those of the already generated highlight video a. For example, the prompt 718 may be the text prompt "You have already generated a highlight video for the same highlight scenes; please reselect".
As shown in fig. 7G, when the highlight scene type set b selected by the user in the scene setting window 710 is different from the highlight scene type set a for which highlight video a has already been generated, the electronic device 100 may enable the confirm control 716.
The electronic device 100 may receive a user input (e.g., a single click) on the confirm control 716, and in response to the input, the electronic device 100 may identify, from the original video, a plurality of highlight video segments corresponding to the highlight scene type set b set by the user and fuse the plurality of highlight video segments into one highlight video (e.g., highlight video b).
As shown in fig. 7H, after the electronic device 100 generates highlight video b, the electronic device 100 may display, on the video presentation interface 610 of the original video, a presentation area 721 corresponding to highlight video b generated from the original video. The presentation area 721 of highlight video b may display a frame of highlight video b, time information 723 of highlight video b (for example, a duration of 8 seconds), and scene information 724 of highlight video b (for example, people, landscapes, food, and travel). Optionally, a highlight mark 722 may be displayed on the presentation area 721; the highlight mark 722 may be used to prompt the user that the presentation area 721 displays highlight video b generated from the original video.
In some application scenarios, the electronic device 100 may identify and clip a plurality of highlight video clips in a video stream having a highlight scene during a video call, and fuse the plurality of highlight video clips into a highlight video. After the video call is completed, the electronic device 100 may save the highlight video. Optionally, the electronic device 100 may also share the generated highlight video to the counterpart of the video call. Therefore, in the video call process, a plurality of highlight video clips in the video stream can be fused into a highlight video, so that a user can conveniently review the content of the video call.
For example, as shown in fig. 8A, the electronic device 100 may display a video call answering interface 810. The video call answering interface 810 can include a reject control 811, a video to voice control 812, and an answer control 813.
The electronic device 100 may receive a user input (e.g., a click) on the answer control 813, and in response to the input, the electronic device 100 may display a video call interface 820 as shown in fig. 8B, receive the video stream sent by the call partner, and capture an audio and video stream in real time through the camera and microphone.
As shown in fig. 8B, the video call interface 820 may include a picture 821 of the video stream captured by the electronic device 100 in real time through the camera and microphone, a picture 822 of the video stream sent by the call partner, a hang-up control 823, a video-to-voice control 824, a camera switching control 825, a highlight recording control 826, and a picture switching control 827. The hang-up control 823 may be used to trigger the electronic device 100 to hang up the video call with the other party. The video-to-voice control 824 may be used to trigger the electronic device 100 to convert the video call into a voice call. The camera switching control 825 may be used to trigger the electronic device 100 to switch the camera that captures video pictures in real time (e.g., from the front camera to the rear camera, or from the rear camera to the front camera). The highlight recording control 826 may be used to trigger the electronic device 100 to generate a highlight video based on the call video stream. The picture switching control 827 may be used to trigger the electronic device 100 to swap the display positions of picture 821 and picture 822.
The electronic device 100 can receive user input (e.g., a single click) on the highlight control 826, in response to which the electronic device 100 can identify a plurality of highlight video clips having a highlight scene in a video stream captured by the electronic device 100 via a camera and microphone in real-time and/or a video stream from a conversation partner, and fuse the plurality of highlight video clips into a highlight video. The electronic device 100 may save the highlight video after the recording is completed or the video call is completed.
As shown in fig. 8C, when electronic device 100 begins a highlight recording, electronic device 100 can replace highlight recording control 826 with end recording control 828. The end recording control 828 may be used to trigger the electronic device 100 to end recording of the highlight video.
In some application scenarios, the electronic device 100 may identify and crop out a plurality of highlight video segments having highlight scenes in the video stream during a video live broadcast, and fuse the plurality of highlight video segments into one highlight video. The electronic device 100 may save the highlight video after the live broadcast ends. Optionally, the electronic device 100 may synchronize the generated highlight video to a server of the live application, bind it to the live account, and share the highlight video to a public viewing area for viewing by other accounts following the live account. In this way, during a video live broadcast, a plurality of highlight video segments in the live video can be fused into one highlight video, so that the user and other users following the live account can conveniently review the content of the live broadcast.
In one possible implementation, during the live broadcast the live video server may acquire the live video stream of the electronic device 100, identify a plurality of highlight video segments in the live video stream, fuse the plurality of highlight video segments into one highlight video, and store the highlight video in a storage space associated with the live account logged in on the electronic device 100. Using the electronic device 100, the user may also share the highlight video through the live server for viewing by other users. In this way, the user and other users following the live account can conveniently review the content of the live broadcast.
In the embodiment of the present application, the original video may be referred to as a first video, and the highlight video may be referred to as a second video. The second video may include part of the video segments in the first video. For example, the first video includes a first video segment (a highlight video segment), a second video segment (an invalid video segment), and a third video segment (a highlight video segment). The end time of the first video segment is earlier than or equal to the start time of the second video segment, and the end time of the second video segment is earlier than or equal to the start time of the third video segment. Since the second video segment is an invalid segment, the second video includes the first video segment and the third video segment but not the second video segment.
The first video may further include a fourth video segment. If the fourth video segment is a highlight video segment, the second video includes the fourth video segment; if the fourth video segment is an invalid video segment, the second video does not include the fourth video segment.
The duration of the first video may be longer than, shorter than, or equal to the duration of the second video.
A highlight video segment includes a video segment of the first video whose shooting scene is a highlight scene, and does not include transition segments. Alternatively, a highlight video segment includes a video segment of the first video whose shooting scene is a designated highlight scene, and does not include noisy or silent transition segments. The highlight scene includes one or more of people, landscapes, food, Spring Festival, Christmas, buildings, beaches, fireworks, plants, snow scenes, travel, and the like.
The following describes a video processing method provided in the embodiment of the present application with reference to a flowchart and a functional block diagram.
Fig. 9 is a schematic flow chart of a video processing method according to an embodiment of the present application.
As shown in fig. 9, the method may include the steps of:
s901, the electronic device 100 acquires an audio stream and an image stream acquired in real time in the video recording process.
During the recording process, the electronic device 100 may collect the image stream in real time through the camera and collect the audio stream in real time through the microphone and the audio circuit. The time stamp of the audio stream acquired in real time is the same as the time stamp of the image stream.
The interface for the video recording process may refer to the embodiments shown in fig. 3A-3I or the embodiments shown in fig. 5A-5E, which are not described herein.
S902, the electronic device 100 performs scene detection on the image stream to determine a scene category of each picture frame in the image stream.
Scene categories may include people, Spring Festival, Christmas, ancient architecture, beaches, fireworks, plants, snow scenes, food, travel, and the like.
The electronic device 100 may identify the scene category of each picture frame in the image stream using a trained scene classification model. The scene classification model may be trained in advance: a large amount of image data is annotated with scene categories to establish a data set, and the data set is then used to train the neural network classification model. The type of neural network used in the scene classification model is not limited; it may be, for example, a convolutional neural network, a fully convolutional neural network, a deep neural network, a BP neural network, and the like.
In one possible implementation, to increase the speed of recognizing the scene categories of the picture frames in the image stream, before inputting the image stream into the scene classification model, the electronic device 100 may sample the image stream acquired in real time at intervals (for example, 1 frame out of every 3 frames) to obtain a sampled image stream, record the frame numbers of the sampled image frames in the real-time image stream, and input the sampled image stream into the neural network classification model to recognize the scene category of each sampled image frame. After identifying the scene category of each sampled image frame, the electronic device 100 may, based on the scene category and the frame number of a sampled image frame, label the picture frames in the image stream adjacent to that sampled image frame with the same scene category. For example, the electronic device 100 may take 1 picture frame out of every 3 picture frames of the image stream as a sampled image frame. If the 77th picture frame in the image stream is a sampled image frame and its scene category is recognized as "person", the electronic device 100 may label the scene categories of the 76th, 77th, and 78th picture frames in the image stream as "person".
In one possible implementation, to increase the recognition speed of the scene category of the frame in the image stream, the resolution of the image stream may be reduced (for example, from 4K to 640×480 resolution) and then input into the scene classification model.
In one possible implementation, to increase the recognition speed of the scene category of the picture frame in the image stream, the resolution of the image stream may be reduced (for example, from 4K to 640×480 resolution) and input into the scene classification model after sampling at intervals.
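By way of illustration only, the interval sampling, resolution reduction, and label propagation described in the above possible implementations might be sketched as follows in Python; the scene_classifier function, the sampling interval, and the reduced resolution are assumptions made for this sketch rather than part of the recorded embodiment.

```python
import cv2

SAMPLE_INTERVAL = 3          # take 1 frame out of every 3 frames (assumed value)
LOW_RES = (640, 480)         # reduced resolution fed to the classifier (assumed value)

def classify_image_stream(frames, scene_classifier):
    """Assign a scene category to every picture frame of the image stream.

    `frames` is a list of full-resolution picture frames; `scene_classifier`
    is assumed to map one low-resolution frame to a scene label such as
    "person", "food", or "landscape".
    """
    labels = [None] * len(frames)
    for idx in range(0, len(frames), SAMPLE_INTERVAL):
        small = cv2.resize(frames[idx], LOW_RES)   # lower the resolution to speed up inference
        label = scene_classifier(small)            # e.g. "person"
        # Propagate the label of the sampled frame to its neighbouring frames
        # (e.g. sampled frame 77 labels frames 76, 77 and 78).
        for j in range(idx - SAMPLE_INTERVAL // 2, idx + SAMPLE_INTERVAL // 2 + 1):
            if 0 <= j < len(frames):
                labels[j] = label
    return labels
```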
S903, the electronic device 100 performs transition detection on the image stream, and determines a transition position and a transition category where scene transition occurs in the image stream.
The conversion categories of scene transitions may include video subject conversion (which may be further divided into, for example, landscape to person, person to food, food to person, person to ancient architecture, ancient architecture to landscape, etc.), picture zooming, quick panning, and the like.
The electronic device 100 may use a trained transition recognition model to recognize the transition positions and transition categories at which scene transitions occur in the image stream. A data set may be established in advance from a large number of image streams annotated with transition positions and transition categories, and the data set is then used to train the transition recognition model. The type of neural network used in the transition recognition model is not limited; it may be, for example, a 3D convolutional neural network, and the like.
In one possible implementation, to increase the speed of recognizing the transition positions and transition categories at which scene transitions occur in the image stream, the electronic device 100 may perform resolution reduction processing (e.g., from 4K to 640 x 480 resolution) on the image stream acquired in real time to obtain a low-definition image stream before inputting it into the transition recognition model. The low-definition image stream is then input into the transition recognition model for transition detection, and the transition positions and transition categories in the low-definition image stream are recognized. The electronic device 100 may then determine the corresponding transition positions and transition categories in the image stream acquired in real time.
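Similarly, transition detection on the low-definition image stream (step S903) might be sketched as follows; the transition_model interface, window length, and stride are assumptions used only for illustration.

```python
import cv2
import numpy as np

LOW_RES = (640, 480)   # reduced resolution used for transition detection (assumed value)

def detect_transitions(frames, transition_model, window=16, stride=8):
    """Slide a fixed-length window over a low-definition copy of the image
    stream and record the positions and categories of scene transitions.

    `transition_model` is assumed to take a stacked clip of frames and return
    a transition category such as "person to food", "picture zooming",
    "quick panning", or None when no transition is present.
    """
    low_res = [cv2.resize(f, LOW_RES) for f in frames]
    transitions = []                                  # list of (start_frame, end_frame, category)
    for start in range(0, len(low_res) - window + 1, stride):
        clip = np.stack(low_res[start:start + window])
        category = transition_model(clip)
        if category is not None:
            # Only the resolution was reduced, so frame positions in the
            # low-definition stream map directly back to the real-time stream.
            transitions.append((start, start + window, category))
    return transitions
```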
In the embodiment of the present application, the execution order of the step S902 and the step S903 is not limited, and the step S902 may be executed first, the step S903 may be executed first, or the step S902 and the step S903 may be executed in parallel.
S904, the electronic device 100 divides the image stream into a plurality of picture segments based on the scene category of each picture frame in the image stream and the transition position and transition category at which the scene transition occurs in the image stream, and determines the segment subject of each picture segment.
S905, the electronic device 100 determines a plurality of highlight clips under the highlight subject from the plurality of clips based on the clip subjects of the plurality of clips, and records the positions of the plurality of highlight clips in the image stream.
For example, as shown in fig. 10, the time length of the image stream may be 0 to t14. The recognition result of the scene category in the image stream may be that the scene category of the 0-t 2 segment in the image stream is "person (person A)", the scene category of the t 2-t 5 segment in the image stream is "person (person B)", the scene category of the t 5-t 10 segment in the image stream is "food", and the scene category of the t 10-t 14 segment in the image stream is "landscape".
The transition recognition result for the image stream may be that the transition category of the t1-t3 segment is "person to person", the transition category of the t4-t6 segment is "person to food", the transition category of the t7-t8 segment is "quick panning", and the transition category of the t9-t11 segment is "picture zooming".
The image stream may accordingly be divided into the picture segments t0-t1, t1-t3, t3-t4, t4-t6, t6-t7, t7-t8, t8-t9, t9-t11, t11-t12, t12-t13, and t13-t14. The segment subject of the t0-t1 segment is "person", that of t1-t3 is "invalid", that of t3-t4 is "person", that of t4-t6 is "invalid", that of t6-t7 is "food", that of t7-t8 is "invalid", that of t8-t9 is "food", that of t9-t11 is "invalid", that of t11-t12 is "landscape", that of t12-t13 is "invalid", and that of t13-t14 is "landscape".
The electronic device 100 may remove the picture segments whose subject is "invalid" and keep the remaining highlight picture segments. For example, as shown in fig. 10, the remaining highlight picture segments may include t0-t1, t3-t4, t6-t7, t8-t9, t11-t12, and t13-t14.
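As an illustrative sketch of steps S904 and S905, the picture segments and their subjects might be derived from the per-frame scene categories and the detected transitions as follows; the frame rate and the handling of the "invalid" label are assumptions of this sketch.

```python
def split_into_segments(frame_labels, transitions, fps=30):
    """Divide the image stream into picture segments and keep the highlight ones.

    `frame_labels[i]` is the scene category of frame i (from scene detection);
    `transitions` is a list of (start_frame, end_frame, category) tuples
    (from transition detection). Frames inside a transition are treated as
    "invalid". Returns a list of (start_time, end_time, subject) tuples for
    the remaining highlight picture segments.
    """
    subjects = list(frame_labels)
    for start, end, _ in transitions:
        for i in range(start, min(end, len(subjects))):
            subjects[i] = "invalid"

    # Merge consecutive frames with the same subject into picture segments.
    segments = []
    seg_start = 0
    for i in range(1, len(subjects) + 1):
        if i == len(subjects) or subjects[i] != subjects[seg_start]:
            segments.append((seg_start / fps, i / fps, subjects[seg_start]))
            seg_start = i

    # Remove the segments whose subject is "invalid".
    return [s for s in segments if s[2] != "invalid"]
```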
At S906, at the end of recording, the electronic device 100 mixes the image stream and the audio stream into the original video.
At the end of recording, the electronic device 100 may mix the image stream and the audio stream into the original video based on the time axis of the image stream and the time axis of the audio stream. The electronic device 100 may end the recording in response to a user input, or may end it automatically after recording for a specified duration.
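A minimal sketch of step S906, assuming the two streams have been written to files with a shared time base and that the ffmpeg tool is available, might be:

```python
import subprocess

def mux_original_video(image_stream_path, audio_stream_path, output_path):
    """Mux the recorded image stream and audio stream into the original video.

    The file paths are hypothetical; both streams are assumed to share the
    same time base, so they can be copied without re-encoding.
    """
    subprocess.run(
        ["ffmpeg", "-i", image_stream_path, "-i", audio_stream_path,
         "-c", "copy", output_path],
        check=True,
    )
```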
S907, the electronic device 100 intercepts a plurality of highlight clips from the original video based on the positions of the plurality of highlight clips in the image stream.
For example, the highlight clips may include t 0-t 1, t 3-t 4, t 6-t 7, t 8-t 9, t 11-t 12, and t 13-t 14. The electronic device 100 may intercept a video segment with a time line of t0 to t1 in an original video as a highlight video segment 1, intercept a video segment with a time line of t3 to t4 as a highlight video segment 2, intercept a video segment with a time line of t6 to t7 as a highlight video segment 3, intercept a video segment with a time line of t8 to t9 as a highlight video segment 4, intercept a video segment with a time line of t11 to t12 as a highlight video segment 5, and intercept a video segment with a time line of t13 to t14 as a highlight video segment 6.
S908, the electronic device 100 fuses the plurality of highlight video clips into one highlight video.
The electronic device 100 may directly splice a plurality of highlight video segments together according to a time sequence as one highlight video. For example, when the original video includes a first video segment, a second video segment, and a third video segment, and the highlight video segment includes the first video segment and the third video segment, the electronic device may splice the end position of the first video segment and the start position of the third video segment together, so as to obtain the highlight video.
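For steps S907 and S908, cutting the highlight video segments out of the original video by their timeline positions and splicing them end to end could, for example, be sketched with the moviepy library; the file paths and the omission of transition special effects are assumptions of this sketch.

```python
from moviepy.editor import VideoFileClip, concatenate_videoclips

def build_highlight_video(original_path, highlight_ranges, output_path):
    """Cut the highlight video segments out of the original video and splice
    them together end to end in chronological order.

    `highlight_ranges` is a list of (start_seconds, end_seconds) pairs, e.g.
    [(0, t1), (t3, t4), ...], recorded in steps S905/S907.
    """
    original = VideoFileClip(original_path)
    clips = [original.subclip(start, end) for start, end in sorted(highlight_ranges)]
    highlight = concatenate_videoclips(clips)   # tail-to-head splicing
    highlight.write_videofile(output_path)
    original.close()
```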
In one possible implementation, the electronic device 100 may add video special effects in the splicing area of the highlight video clips during the splicing process, for transitioning the video. Wherein the video effect may include a picture effect. Optionally, the video effects may also include audio effects. For example, when the original video includes a first video segment, a second video segment, and a third video segment, and the highlight video segment includes the first video segment and the third video segment, the electronic device may splice together an end position of the first video segment and a start position of the first special effect segment, and splice together an end position of the first special effect segment and a start position of the third video segment, so as to obtain the second video.
The splicing area may be a time zone added between the end position of the previous highlight video segment and the start position of the next highlight video segment. For example, as shown in fig. 10, there may be a splicing area 1 between the end position of highlight video segment 1 and the start position of highlight video segment 2, a splicing area 2 between the end position of highlight video segment 2 and the start position of highlight video segment 3, a splicing area 3 between the end position of highlight video segment 3 and the start position of highlight video segment 4, a splicing area 4 between the end position of highlight video segment 4 and the start position of highlight video segment 5, and a splicing area 5 between the end position of highlight video segment 5 and the start position of highlight video segment 6.
In one possible implementation, the splicing area may consist of the end portion (e.g., the last 500 ms) of the previous highlight video segment and the beginning portion (e.g., the first 500 ms) of the next highlight video segment. For example, as shown in fig. 11, the end portion of highlight video segment 1 and the beginning portion of highlight video segment 2 may form splicing area 1, the end portion of highlight video segment 2 and the beginning portion of highlight video segment 3 may form splicing area 2, the end portion of highlight video segment 3 and the beginning portion of highlight video segment 4 may form splicing area 3, the end portion of highlight video segment 4 and the beginning portion of highlight video segment 5 may form splicing area 4, and the end portion of highlight video segment 5 and the beginning portion of highlight video segment 6 may form splicing area 5.
The picture special effect of the splicing area may include fusing the pictures of the two highlight video segments, such as flying the picture of the previous segment out and flying the picture of the next segment in. For example, in the splicing area of two highlight video segments, the picture of the previous highlight video segment may fly out of the video display window to the left while the picture of the following highlight video segment flies into the video display window from the right.
The audio effects of the splicing area may include pure music, songs, and the like. In one possible implementation, when the splicing area consists of the end portion of the previous highlight video segment and the beginning portion (e.g., the first 500 ms) of the next highlight video segment, the electronic device 100 may, within the splicing area, gradually decrease the audio volume of the previous highlight video segment while gradually increasing the audio volume of the next highlight video segment.
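The gradual volume decrease and increase within the splicing area could be realized, for example, by a simple linear crossfade over the overlapping audio samples; the following sketch assumes equal-length mono sample arrays taken from the two segments.

```python
import numpy as np

def crossfade_audio(prev_tail, next_head):
    """Crossfade the audio of two adjacent highlight video segments.

    `prev_tail` and `next_head` are equal-length mono sample arrays taken from
    the end of the previous segment and the start of the next segment (e.g.
    the last and first 500 ms). The previous segment fades out while the next
    segment fades in.
    """
    n = len(prev_tail)
    fade_out = np.linspace(1.0, 0.0, n)   # volume of the previous segment: high to low
    fade_in = np.linspace(0.0, 1.0, n)    # volume of the next segment: low to high
    return prev_tail * fade_out + next_head * fade_in
```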
In one possible implementation, the electronic device 100 may select the video special effect used in a splicing area based on the segment topics of the two highlight video segments before and after that splicing area. For example, the segment topic of highlight video segment 1 before splicing area 1 is "person" and the segment topic of highlight video segment 2 after splicing area 1 is "person", so video special effect 1 may be used in splicing area 1. The segment topic of highlight video segment 2 before splicing area 2 is "person" and that of highlight video segment 3 after it is "food", so video special effect 2 may be used in splicing area 2. The segment topic of highlight video segment 3 before splicing area 3 is "food" and that of highlight video segment 4 after it is "food", so video special effect 3 may be used in splicing area 3. The segment topic of highlight video segment 4 before splicing area 4 is "food" and that of highlight video segment 5 after it is "landscape", so video special effect 4 may be used in splicing area 4. The segment topic of highlight video segment 5 before splicing area 5 is "landscape" and that of highlight video segment 6 after it is "landscape", so video special effect 5 may be used in splicing area 5.
In one possible implementation, after splicing the plurality of highlight video segments together in chronological order into one highlight video, the electronic device 100 may add background music to the highlight video. Optionally, the electronic device 100 may select the background music based on the segment topics of the plurality of highlight video segments. For example, the electronic device 100 may select the segment topic that appears for the longest total duration among the plurality of highlight video segments as the theme of the highlight video, select music corresponding to that theme as the background music, and add the music to the highlight video.
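A minimal Python sketch of this theme-based selection is shown below, assuming each highlight video segment is represented by its segment topic and duration; the theme-to-music table is a hypothetical placeholder.

from collections import defaultdict

def pick_highlight_theme(segments):
    # segments: list of (topic, duration_in_seconds) for each highlight video segment.
    # Returns the topic with the longest total duration, used as the theme of the highlight video.
    total = defaultdict(float)
    for topic, duration in segments:
        total[topic] += duration
    return max(total, key=total.get)

# Hypothetical theme-to-music table; the names are placeholders.
MUSIC_BY_THEME = {"character": "music_1", "food": "music_2", "landscape": "music_3"}

theme = pick_highlight_theme([("character", 10.0), ("food", 4.0), ("landscape", 3.0)])
background_music = MUSIC_BY_THEME.get(theme)    # "music_1" in this example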
In one possible implementation, the electronic device 100 may add a soundtrack to each of the plurality of highlight video segments based on its segment topic, and then splice the dubbed highlight video segments together in chronological order into one highlight video. For example, the segment topic of highlight video segment 1 is "character", so the soundtrack of highlight video segment 1 may be music 1. The segment topic of highlight video segment 2 is "character", so the soundtrack of highlight video segment 2 may be music 1. The segment topic of highlight video segment 3 is "food", so the soundtrack of highlight video segment 3 may be music 2. The segment topic of highlight video segment 4 is "food", so the soundtrack of highlight video segment 4 may be music 2. The segment topic of highlight video segment 5 is "landscape", so the soundtrack of highlight video segment 5 may be music 3. The segment topic of highlight video segment 6 is "landscape", so the soundtrack of highlight video segment 6 may be music 3.
S909, the electronic device 100 saves the original video and the highlight video.
After the original video and the highlight video are saved in the electronic device 100, the interface schematic diagram for displaying the saved original video and highlight video may refer to the embodiments shown in fig. 4A to 4G, which are not described herein again.
In some embodiments, the electronic device 100 may generate a highlight video in a gallery application for the captured raw video. At this time, the electronic device 100 may split the image stream and the audio stream from the original video first. Then, the above steps S902 to S905, and steps S907 to S908 are performed based on the image stream, and a highlight video is generated.
In one possible implementation, the electronic device 100 may save a third video, where the original video may include a fifth video segment and a sixth video segment. The ending time of the fifth video segment is earlier than or equal to the starting time of the sixth video segment, the third video includes the fifth video segment and the sixth video segment, and the fifth video segment and the sixth video segment both include the same shooting subject. For example, a person shooting subject is included in both the fifth video segment and the sixth video segment. In this way, the segments containing the same type of shooting subject in the original video can be extracted to generate the highlight video, improving the viewing experience of the video recorded by the user.
According to the video processing method provided in this embodiment of the application, by performing scene analysis and transition analysis on the video recorded by the user, invalid segments in the recorded video (such as scene switching, picture zooming, fast moving mirror, severe picture jitter, and the like) can be deleted, a plurality of highlight video segments in the recorded video can be clipped out, and the highlight video segments can be fused into one highlight video. In this way, the viewing value of the video recorded by the user can be improved.
Fig. 12 is a functional block diagram of a video processing system according to an embodiment of the present application.
As shown in fig. 12, the video processing system 1200 may include a data module 1201, a perception module 1202, a fusion module 1203, and a video processing module 1204. Wherein,
The data module 1201 is used for acquiring an image stream and an audio stream when recording video. The data module 1201 may pass the image stream to the perception module 1202 and the image stream and the audio stream to the video processing module 1204.
The perception module 1202 may perform video understanding on the image stream, wherein the video understanding includes transition detection and scene detection. Specifically, the perception module 1202 may perform scene detection on the image stream, and identify a scene category of each frame in the image stream. The perception module 1202 may perform transition detection on the image stream, identifying transition locations and transition categories in the image stream where scene transitions occur. For details of the transition detection and the scene detection of the image stream, reference may be made to step S902 and step S903 in the foregoing embodiment shown in fig. 9, which are not described herein.
The perception module 1202 may pass the scene category of each picture frame and the transition location and transition category in the image stream where the scene transition occurred to the fusion module 1203.
The fusion module 1203 may divide the image stream into a plurality of picture segments based on the transition locations in the image stream where scene transitions occur. The fusion module 1203 may determine a segment topic for each of the plurality of picture segments based on the transition location and transition category at which the scene transition occurred and the scene category for each picture frame. For details, reference may be made to step S905 in the embodiment shown in fig. 9, which is not described herein.
The fusion module 1203 may pass the positions of the plurality of picture segments and their segment topics to the video processing module 1204.
The video processing module 1204 may mix the audio stream and the image stream into an original video. The video processing module 1204 may remove the picture segments with invalid topics from the original video based on the positions and segment topics of the plurality of picture segments, thereby obtaining a plurality of highlight video segments. For details, reference may be made to step S906 to step S907 in the foregoing embodiment shown in fig. 9, which are not described here again.
The video processing module 1204 may fuse the plurality of highlight video segments into one highlight video, wherein the fusing process includes stitching the highlight video segments, adding special effects, adding a soundtrack, and the like. For details, reference may be made to step S908 in the embodiment shown in fig. 9, which is not described herein.
The video processing module 1204 may output the raw video and the highlight video.
Fig. 13 is a schematic flow chart of a video processing method according to another embodiment of the present application.
As shown in fig. 13, the video processing method includes:
S1301, the electronic device 100 acquires the audio stream and the image stream collected in real time during the video recording process.
For details, reference may be made to step S901 in the embodiment shown in fig. 9, which is not described herein.
S1302, the electronic device 100 performs scene detection on the image stream to determine a scene category of each frame in the image stream.
For details, reference may be made to step S902 in the embodiment shown in fig. 9, which is not described herein.
S1303, the electronic device 100 performs transition detection on the image stream, and determines a transition position and a transition category where scene transition occurs in the image stream.
For details, reference may be made to step S903 in the embodiment shown in fig. 9, which is not described herein.
In the embodiment of the present application, the execution order of the step S1302 and the step S1303 is not limited, and the step S1302 may be executed first, the step S1303 may be executed first, or the step S1302 and the step S1303 may be executed in parallel.
S1304, the electronic device 100 performs voice activation detection on the audio stream, identifies the start-stop time points of the voice signals in the audio stream, and divides the audio stream into a plurality of audio segments.
The electronic device 100 may slide a window over the audio stream and detect audio features of the voice signal within the sliding window. The electronic device 100 may identify the start-stop time points of the voice signals in the audio stream based on these audio features, and divide the audio stream into a plurality of audio segments based on the start-stop time points. The audio features may include, among others, spectral slope, correlation coefficients, log likelihood ratio, cepstral coefficients, weighted cepstral coefficients, and the like.
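As an illustration only, the Python sketch below uses a simple short-time energy measure in a sliding window instead of the richer feature set listed above; the window sizes and the threshold are assumed values.

import numpy as np

def detect_speech_spans(audio, sample_rate, win_ms=30, hop_ms=10, energy_threshold=1e-3):
    # Returns (start_s, end_s) spans where the short-time energy exceeds the threshold.
    win = int(sample_rate * win_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    spans, start = [], None
    for i in range(0, len(audio) - win + 1, hop):
        frame = audio[i:i + win]
        active = float(np.mean(frame ** 2)) > energy_threshold
        t = i / sample_rate
        if active and start is None:
            start = t                              # a voice span begins
        elif not active and start is not None:
            spans.append((start, t))               # the voice span ends
            start = None
    if start is not None:
        spans.append((start, len(audio) / sample_rate))
    return spans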
S1305, the electronic device 100 classifies audio events for the plurality of audio segments in the audio stream.
The electronic device 100 may identify the audio event category of each audio segment using a trained audio event classification model. The audio event classification model may be trained by labeling a large amount of audio signal data with audio event categories in advance to establish a data set. The data set is then input into the audio event classification model to train it. The neural network used in the audio event classification model is not limited here, and may be, for example, a recurrent neural network (RNN) classification model, a long short-term memory (LSTM) artificial neural network classification model, or the like.
The audio event categories may include speech, laughter, music, noise, and the like. Optionally, the noise may be further subdivided into vehicle driving sounds, animal sounds, bird sounds, dog barking, wind sounds, and the like.
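For illustration, the following sketch shows what an LSTM-based audio event classification model might look like in PyTorch, assuming per-frame audio feature vectors (for example, 40-dimensional) as input; the layer sizes, feature dimension, and category list are assumptions, and the model would have to be trained on a labeled data set as described above.

import torch
import torch.nn as nn

AUDIO_EVENTS = ["speech", "laughter", "music", "noise"]   # illustrative category list

class AudioEventClassifier(nn.Module):
    # Toy LSTM classifier over a sequence of per-frame audio features.
    def __init__(self, feat_dim=40, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, len(AUDIO_EVENTS))

    def forward(self, x):                  # x: (batch, frames, feat_dim)
        _, (h, _) = self.lstm(x)           # use the final hidden state
        return self.head(h[-1])            # (batch, num_categories) logits

# Example: classify one clip represented by 300 feature frames (untrained model, random input).
model = AudioEventClassifier()
logits = model(torch.randn(1, 300, 40))
predicted = AUDIO_EVENTS[int(logits.argmax(dim=-1))]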
S1306, the electronic device 100 determines a plurality of audio event image segments corresponding to the plurality of audio segments in the image stream and an audio event category corresponding to each audio event image segment based on start-stop time points of the plurality of audio segments.
S1307, the electronic device 100 divides the image stream into a plurality of picture segments based on the scene category of each picture frame in the image stream, the transition position and transition category at which a scene transition occurs in the image stream, and the positions and audio event categories of the plurality of audio event image segments, and determines the segment topic of each picture segment.
Specifically, the electronic device 100 may divide the image stream into a plurality of picture segments based on the positions of the audio event image segments and the transition positions at which scene transitions occur in the image stream. That is, the positions of the audio event image segments and the transition positions are combined to divide the image stream into the plurality of picture segments.
The electronic device 100 may then determine the segment topic of each picture segment based on the scene category, the transition category, and the audio event category corresponding to that picture segment.
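One possible reading of this step is sketched below in Python: every transition boundary and audio-segment boundary becomes a picture-segment boundary, and each resulting picture segment is labeled from its transition category (if any), its scene category, and its audio event category; the helper names and the topic-combination rule are simplifications for illustration only.

def split_and_label(duration, transition_spans, audio_spans, scene_of, audio_event_of):
    # duration:          total length of the image stream in seconds
    # transition_spans:  list of (start, end, transition_category)
    # audio_spans:       list of (start, end, audio_event_category)
    # scene_of(t):       scene category of the picture frame at time t
    # audio_event_of(t): audio event category at time t
    cuts = {0.0, duration}
    for s, e, _ in transition_spans:
        cuts.update((s, e))
    for s, e, _ in audio_spans:
        cuts.update((s, e))
    edges = sorted(cuts)

    segments = []
    for start, end in zip(edges, edges[1:]):
        mid = (start + end) / 2
        transition = next((c for s, e, c in transition_spans if s <= mid < e), None)
        scene = scene_of(mid)
        audio = audio_event_of(mid)
        if transition is not None:
            topic = transition
            if audio not in (None, "no sound"):
                topic = topic + " + " + audio      # e.g. "character to food + music"
        elif scene in (None, "no scene"):
            topic = audio if audio not in (None, "no sound") else "no scene"
        else:
            topic = scene
        segments.append((start, end, topic))
    return segments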
For example, as shown in fig. 14, the time length of the original video may be 0 to t20. The scene category recognition result for the image stream may be as follows: the scene category of the 0-t3 segment is "character (person A)", the scene category of the t3-t7 segment is "character (person B)", the scene category of the t7-t13 segment is "food", the scene category of the t13-t16 segment is "no scene", and the scene category of the t16-t20 segment is "landscape".
The transition recognition result for the image stream may be as follows: the transition category of the t2-t4 segment is "character to character", the transition category of the t6-t8 segment is "character to food", the transition category of the t10-t11 segment is "fast moving mirror", the transition category of the t12-t14 segment is "food to no scene", and the transition category of the t17-t19 segment is "picture zoom".
The positions and audio event categories of the audio event image segments in the image stream may be as follows: the audio event category of the t0-t1 segment is "speech", the audio event category of the t1-t5 segment is "laughter", the audio event category of the t5-t9 segment is "music", the audio event category of the t9-t11 segment is "no sound", the audio event category of the t11-t18 segment is "noise", and the audio event category of the t18-t20 segment is "no sound".
The image stream may be divided into the following picture segments: t0-t1, t1-t2, t2-t4, t4-t5, t5-t6, t6-t8, t8-t9, t9-t10, t10-t11, t11-t12, t12-t14, t14-t16, t16-t17, t17-t18, t18-t19, and t19-t20. The segment topic of the t0-t1 picture segment is "character", the segment topic of the t1-t2 picture segment is "character", the segment topic of the t2-t4 picture segment is "character + laughter", the segment topic of the t4-t5 picture segment is "character", the segment topic of the t5-t6 picture segment is "character", the segment topic of the t6-t8 picture segment is "character to food + music", the segment topic of the t8-t9 picture segment is "food", the segment topic of the t9-t10 picture segment is "food", the segment topic of the t10-t11 picture segment is "fast moving mirror", the segment topic of the t11-t12 picture segment is "food", the segment topic of the t12-t14 picture segment is "food to no scene + noise", the segment topic of the t14-t16 picture segment is "noise", the segment topic of the t16-t17 picture segment is "landscape", the segment topic of the t17-t18 picture segment is "picture zoom + noise", the segment topic of the t18-t19 picture segment is "picture zoom", and the segment topic of the t19-t20 picture segment is "landscape".
S1308, the electronic device 100 determines a plurality of highlight picture segments under a highlight theme from the plurality of picture segments based on the segment topics of the plurality of picture segments, and records the positions of the plurality of highlight picture segments in the image stream.
The electronic device 100 may determine the picture segments under a preset highlight theme among the plurality of picture segments as highlight picture segments.
The electronic device 100 may determine, as invalid segments, the picture segments that contain only a transition without valid sound (e.g., speech, laughter, music, etc.) and the picture segments that have no scene category and no valid sound. The picture segments other than the invalid segments among the plurality of picture segments are determined as highlight picture segments.
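A minimal Python sketch of this filtering rule is shown below, assuming each picture segment is described by its scene category, its transition category (None when there is no transition), and its audio event category; the label strings are placeholders.

VALID_SOUNDS = {"speech", "laughter", "music"}

def is_invalid(scene, transition, audio_event):
    # Invalid: a transition segment without valid sound, or a segment with no scene
    # category and no valid sound.
    no_valid_sound = audio_event not in VALID_SOUNDS
    if transition is not None and no_valid_sound:
        return True
    if transition is None and scene in (None, "no scene") and no_valid_sound:
        return True
    return False

def highlight_picture_segments(segments):
    # segments: list of (start, end, scene, transition, audio_event)
    return [(start, end) for start, end, scene, transition, audio in segments
            if not is_invalid(scene, transition, audio)]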
For example, as shown in fig. 14, the electronic device 100 may determine the t0-t1, t1-t2, t2-t4, t4-t5, t5-t6, t6-t8, t8-t9, and t9-t10 picture segments as highlight picture segments, determine the t10-t11 picture segment as an invalid segment, determine the t11-t12 picture segment as a highlight picture segment, determine the t12-t14 and t14-t16 picture segments as invalid segments, determine the t16-t17 picture segment as a highlight picture segment, determine the t17-t18 and t18-t19 picture segments as invalid segments, and determine the t19-t20 picture segment as a highlight picture segment.
S1309, at the end of recording, the electronic device 100 mixes the image stream and the audio stream into the original video.
S1310, the electronic device 100 intercepts a plurality of highlight video segments from the original video based on the positions of the plurality of highlight picture segments in the image stream.
As shown in fig. 14, since the t0-t1, t1-t2, t2-t4, t4-t5, t5-t6, t6-t8, t8-t9, and t9-t10 picture segments are continuous and are all highlight picture segments, the electronic device 100 may determine the t0-t10 video segment in the original video as highlight video segment 1. Since the t11-t12 picture segment is a highlight picture segment, the electronic device 100 determines the t11-t12 video segment in the original video as highlight video segment 2. Since the t16-t17 picture segment is a highlight picture segment, the electronic device 100 determines the t16-t17 video segment in the original video as highlight video segment 3. Since the t19-t20 picture segment is a highlight picture segment, the electronic device 100 determines the t19-t20 video segment in the original video as highlight video segment 4.
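The merging of consecutive highlight picture segments into highlight video segments can be illustrated with the Python sketch below; the time values in the comment reuse the fig. 14 example with arbitrary units and are only for demonstration.

def merge_consecutive(highlight_picture_segments):
    # highlight_picture_segments: time-ordered list of (start, end) picture segments
    # kept as highlights; back-to-back segments are merged into one highlight video segment.
    merged = []
    for start, end in highlight_picture_segments:
        if merged and abs(merged[-1][1] - start) < 1e-6:
            merged[-1] = (merged[-1][0], end)      # extend the previous video segment
        else:
            merged.append((start, end))            # start a new video segment
    return merged

# merge_consecutive([(0, 1), (1, 2), (2, 4), (4, 5), (5, 6), (6, 8), (8, 9), (9, 10),
#                    (11, 12), (16, 17), (19, 20)])
# -> [(0, 10), (11, 12), (16, 17), (19, 20)]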
S1311, the electronic device 100 fuses the plurality of highlight video clips into one highlight video.
S1312, the electronic device 100 saves the original video and the highlight video.
The electronic device 100 may directly splice a plurality of highlight video segments together according to a time sequence as one highlight video.
In one possible implementation, during the splicing process, the electronic device 100 may add a video special effect in the splicing area of two highlight video segments to transition between them. The video special effect may include a picture special effect. Optionally, the video special effect may also include an audio special effect.
The splicing area may be a time region added between the end position of the previous highlight video segment and the start position of the next highlight video segment of two highlight video segments. For example, as shown in fig. 14, there may be a splicing area 1 between the end position of highlight video segment 1 and the start position of highlight video segment 2, a splicing area 2 between the end position of highlight video segment 2 and the start position of highlight video segment 3, and a splicing area 3 between the end position of highlight video segment 3 and the start position of highlight video segment 4.
In one possible implementation, the splicing area may be a region consisting of the end portion (e.g., the last 500 ms) of the previous highlight video segment and the beginning portion (e.g., the first 500 ms) of the next highlight video segment of the two highlight video segments. Reference may be made to the foregoing embodiment shown in fig. 11, and details are not described here again.
The picture special effect of the splicing area may include fusing the pictures of the two highlight video segments, for example, by flying the picture of the preceding segment out while flying the picture of the following segment in. For example, in the splicing area of two highlight video segments, the picture of the preceding highlight video segment may fly out of the video display window from the left side while the picture of the following highlight video segment flies into the video display window from the right side.
The audio special effect of the splicing area may include pure music, songs, and the like. In one possible implementation, when the splicing area is formed by the end portion of the previous highlight video segment and the beginning portion (e.g., the first 500 ms) of the next highlight video segment, the electronic device 100 may, within the splicing area, gradually decrease the audio volume of the previous highlight video segment while gradually increasing the audio volume of the next highlight video segment.
In one possible implementation, the electronic device 100 may select the video special effects used in the splicing region based on the segment topics corresponding to the two highlight video segments before and after the splicing region.
In one possible implementation, after splicing the plurality of highlight video segments together in chronological order into one highlight video, the electronic device 100 may add background music to the highlight video. Optionally, the electronic device 100 may select the background music based on the segment topics of the plurality of highlight video segments. For example, the electronic device 100 may select the segment topic that appears for the longest total duration among the plurality of highlight video segments as the theme of the highlight video, select music corresponding to that theme as the background music, and add the music to the highlight video.
In one possible implementation, the electronic device 100 may add a soundtrack to each of the plurality of highlight video segments based on its segment topic, and then splice the dubbed highlight video segments together in chronological order into one highlight video.
According to the video processing method provided in this embodiment of the application, by performing scene analysis, transition analysis, and audio event analysis on the video recorded by the user, invalid segments in the recorded video can be deleted, a plurality of highlight video segments in the recorded video can be clipped out, and the highlight video segments can be fused into one highlight video. In this way, the viewing value of the video recorded by the user can be improved.
Fig. 15 is a functional block diagram of a video processing system according to an embodiment of the present application.
As shown in fig. 15, video processing system 1500 may include a data module 1501, a perception module 1502, a fusion module 1503, and a video processing module 1504. Wherein,
The data module 1501 is used to acquire an image stream and an audio stream when video is recorded. The data module 1501 may pass the image stream and the audio stream to the perception module 1502, and pass the image stream and the audio stream to the video processing module 1504.
The perception module 1502 may perform video understanding on the image stream, where the video understanding includes transition detection and scene detection. Specifically, the perception module 1502 may perform scene detection on the image stream to identify the scene category of each picture frame in the image stream. The perception module 1502 may perform transition detection on the image stream to identify the transition positions and transition categories at which scene transitions occur in the image stream. For details of the transition detection and the scene detection of the image stream, reference may be made to step S1302 and step S1303 in the foregoing embodiment shown in fig. 13, which are not described here again.
The perception module 1502 may also perform audio understanding on the audio stream, where the audio understanding includes voice activation detection and audio event classification. Specifically, the perception module 1502 may perform voice activation detection on the audio stream, identify the start-stop time points of the voice signals in the audio stream, and divide the audio stream into a plurality of audio segments. The perception module 1502 may classify audio events for the plurality of audio segments in the audio stream. For specific details of the voice activation detection and the audio event classification of the audio stream, reference may be made to step S1304 and step S1305 in the foregoing embodiment shown in fig. 13, which are not described here again.
The perception module 1502 may pass the scene category of each picture frame, the transition positions and transition categories at which scene transitions occur in the image stream, the positions of the audio segments, and the audio event categories to the fusion module 1503.
The fusion module 1503 may divide the image stream into a plurality of picture segments based on the locations of the audio event image segments corresponding to the audio segments and the transition locations in the image stream where scene transitions occur. The fusion module 1503 may determine the topic of each picture segment based on the scene category, the transition category, and the audio event category to which each picture segment corresponds. For details, reference may be made to step S1307 in the embodiment shown in fig. 13, which is not described herein.
The fusion module 1503 may pass the positions of the plurality of picture segments and their segment topics to the video processing module 1504.
The video processing module 1504 may mix the audio stream and the image stream into an original video. The video processing module 1504 may remove the picture segments with invalid topics from the original video based on the positions and segment topics of the plurality of picture segments, thereby obtaining a plurality of highlight video segments. For details, reference may be made to steps S1308 to S1310 in the embodiment shown in fig. 13, which are not described here again.
The video processing module 1504 may fuse multiple highlight video segments into one highlight video, wherein the fusing process includes stitching the highlight video segments, adding special effects, adding soundtrack, and so forth. For details, reference may be made to step S1311 in the embodiment shown in fig. 13, which is not described herein.
The video processing module 1504 may output the raw video and the highlight video.
While the application has been described in detail with reference to the foregoing embodiments, it will be understood by those skilled in the art that the foregoing embodiments may be modified or equivalents may be substituted for some of the features thereof, and that the modifications or substitutions do not depart from the spirit of the embodiments.

Claims (19)

1. A video processing method, comprising:
The electronic device determines the scene category of each picture frame in the image stream of a first video, and the transition position and the transition category at which a scene transition occurs in the image stream of the first video, and divides the audio stream of the first video into a plurality of audio segments, wherein the first video comprises a first video segment, a second video segment and a third video segment, the ending time of the first video segment is earlier than or equal to the starting time of the second video segment, the ending time of the second video segment is earlier than or equal to the starting time of the third video segment, and the time length of the first video segment is longer than the time length of the third video segment;
the electronic device determines a plurality of audio event image segments corresponding to the plurality of audio segments in the image stream of the first video and an audio event category corresponding to each audio event image segment;
The electronic device divides the image stream of the first video into a plurality of picture segments based on a scene category of each picture frame in the image stream of the first video, a transition position and a transition category at which a scene transition occurs in the image stream of the first video, and an audio event category of the plurality of audio event image segments, and determines a segment subject of each picture segment in the plurality of picture segments;
The electronic device determines a plurality of highlight picture segments under a highlight theme from the plurality of picture segments based on the segment subjects of the plurality of picture segments, and records the positions of the plurality of highlight picture segments in the image stream of the first video, wherein the highlight picture segments are picture segments except for invalid segments in the plurality of picture segments, and the invalid segments are picture segments with transition only and no valid sound and picture segments with no transition and no scene category;
The electronic device cuts out the first video segment and the third video segment from the first video based on the positions of the plurality of highlight segments in the image stream of the first video;
the electronic device generates the second video based on the first video segment and the third video segment;
In the second video generation process, the electronic equipment determines background music of the second video based on the segment subject of the first video segment, wherein the second video comprises the first video segment, the third video segment and the background music, and does not comprise the second video segment.
2. The method of claim 1, wherein before the electronic device determines the scene category of each picture frame in the image stream of the first video and the transition position and transition category at which a scene transition occurs in the image stream of the first video, and divides the audio stream of the first video into a plurality of audio segments, the method comprises:
the electronic device displays a video album interface, the video album interface including one or more video files, the one or more video files including the first video;
the video album interface displays a thumbnail of the first video;
the electronic device detecting a first input to the first video;
In response to the first input, the electronic device displays a first interface including a presentation area of the first video and a first control for generating the second video from the first video;
The electronic device detecting a second input to the first control;
The electronic device determines a scene category of each picture frame in an image stream of a first video, a transition position and a transition category at which a scene transition occurs in the image stream of the first video, and divides an audio stream of the first video into a plurality of audio segments, which specifically comprises:
In response to the second input, the electronic device determines a scene category of each picture frame in an image stream of a first video, a transition position at which a scene transition occurs in the image stream of the first video, and a transition category, and divides an audio stream of the first video into a plurality of audio segments.
3. The method according to claim 1 or 2, characterized in that the method further comprises:
the electronic device displays a second interface, wherein the second interface comprises a display area of the first video and a first window, and the first window is used for displaying the generation progress of the second video;
After the second video is generated, the electronic device displays a third interface, wherein the third interface comprises a display area of the second video.
4. The method of claim 3, wherein:
The third interface also comprises a sharing control and an editing control;
the sharing control is used for sharing the second video, and the editing control is used for editing the second video.
5. The method according to claim 3 or 4, wherein,
The third interface further comprises a display area of the first video;
In response to a third input of a user to the display area of the second video, the display area of the first video is zoomed out and displayed, and the display area of the second video is zoomed in and displayed.
6. The method of claim 1, wherein the duration of the first video is greater than the duration of the second video, or wherein the duration of the first video is equal to the duration of the second video.
7. The method of claim 3, wherein prior to the electronic device displaying the third interface, the method further comprises:
the electronic device splices the first video segment and the third video segment in the first video together to obtain the second video.
8. The method according to claim 7, wherein the electronic device splices the first video segment and the third video segment in the first video together to obtain the second video, and specifically comprises:
the electronic device splices the ending position of the first video segment and the starting position of the third video segment together to obtain the second video, or,
the electronic device splices the ending position of the first video segment and the starting position of the first special effect segment together, and splices the ending position of the first special effect segment and the starting position of the third video segment together, to obtain the second video.
9. The method of claim 1, wherein the first video segment and the third video segment are highlight video segments, and the second video segment is an invalid video segment.
10. The method of claim 1, wherein the first video further comprises a fourth video segment;
if the fourth video segment is a highlight video segment, the second video comprises the fourth video segment;
And if the fourth video segment is an invalid video segment, the second video does not comprise the fourth video segment.
11. The method of claim 9, wherein a highlight video segment comprises a video segment in the first video in which the captured scene is a highlight scene and which does not include a transition segment.
12. The method of claim 9, wherein a highlight video segment comprises a video segment in the first video in which the captured scene is a highlight scene and which does not include a transition segment with noise or without sound.
13. The method of claim 11 or 12, wherein the highlight scene comprises one or more of a character, a landscape, a delicacy, a Spring Festival, a Christmas festival, a building, a beach, a firework, a plant, a snow scene, or a trip.
14. The method of claim 2, wherein prior to the electronic device displaying a video album interface, the method further comprises:
The electronic device displays a shooting interface, wherein the shooting interface comprises a preview frame and a recording start control, and pictures acquired by a camera of the electronic device in real time are displayed in the preview frame;
The electronic device detecting a fourth input to the recording start control;
Responsive to the fourth input, the electronic device displays a recording interface and begins recording the first video;
the electronic device displays the recording interface, wherein the recording interface comprises a recording end control and a video picture of the first video recorded by the electronic device in real time;
the electronic device detecting a fifth input to the recording end control;
responsive to the fifth input, the electronic device ends recording the first video;
the electronic device saves the first video.
15. The method of claim 14, wherein the recording interface further comprises a snapshot control, and when the electronic device displays the recording interface, the method further comprises:
the electronic device receives a sixth input of a user for the snapshot control;
In response to the sixth input, the electronic device saves a first video frame of the first video acquired by a camera of the electronic device when receiving the sixth input as a first picture.
16. The method of claim 1, wherein after the electronic device generates the second video, the method further comprises:
The electronic device saves the second video.
17. An electronic device comprising a camera, a display screen, one or more processors, and one or more memories, wherein the one or more memories are coupled to the one or more processors, the one or more memories to store computer program code comprising computer instructions that, when executed by the one or more processors, cause the electronic device to perform the method of any of claims 1-16.
18. A chip system for application to an electronic device, the chip system comprising one or more processors to invoke computer instructions to cause the electronic device to perform the method of any of claims 1-16.
19. A computer readable storage medium comprising instructions which, when run on an electronic device, cause the electronic device to perform the method of any one of claims 1-16.
CN202210193721.3A 2022-02-28 2022-02-28 A video processing method and related device Active CN115529378B (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN202210193721.3A CN115529378B (en) 2022-02-28 2022-02-28 A video processing method and related device
EP22908855.4A EP4258632A4 (en) 2022-02-28 2022-12-30 Video processing method and related device
PCT/CN2022/143814 WO2023160241A1 (en) 2022-02-28 2022-12-30 Video processing method and related device
US18/268,799 US12342038B2 (en) 2022-02-28 2022-12-30 Video processing method and related apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210193721.3A CN115529378B (en) 2022-02-28 2022-02-28 A video processing method and related device

Publications (2)

Publication Number Publication Date
CN115529378A CN115529378A (en) 2022-12-27
CN115529378B true CN115529378B (en) 2025-06-13

Family

ID=84693559

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210193721.3A Active CN115529378B (en) 2022-02-28 2022-02-28 A video processing method and related device

Country Status (4)

Country Link
US (1) US12342038B2 (en)
EP (1) EP4258632A4 (en)
CN (1) CN115529378B (en)
WO (1) WO2023160241A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115529378B (en) * 2022-02-28 2025-06-13 荣耀终端股份有限公司 A video processing method and related device
CN118573948A (en) * 2023-05-26 2024-08-30 武汉星巡智能科技有限公司 Intelligent identification method, device, equipment and storage medium for dining behaviors of infants
WO2025102369A1 (en) * 2023-11-17 2025-05-22 影石创新科技股份有限公司 Imaging system and control method therefor
CN118590714B (en) * 2024-08-02 2025-02-28 荣耀终端股份有限公司 Visual media data processing method, program product, storage medium and electronic device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105245810A (en) * 2015-10-08 2016-01-13 广东欧珀移动通信有限公司 Method and device for processing video transitions
CN111061912A (en) * 2018-10-16 2020-04-24 华为技术有限公司 A method and electronic device for processing video files
CN113766314A (en) * 2021-11-09 2021-12-07 北京中科闻歌科技股份有限公司 Video segmentation method, device, equipment, system and storage medium
CN113973216A (en) * 2020-07-22 2022-01-25 聚好看科技股份有限公司 A method for generating a video collection and a display device

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030237091A1 (en) * 2002-06-19 2003-12-25 Kentaro Toyama Computer user interface for viewing video compositions generated from a video composition authoring system using video cliplets
US20140328570A1 (en) * 2013-01-09 2014-11-06 Sri International Identifying, describing, and sharing salient events in images and videos
CN106803987B (en) * 2015-11-26 2021-09-07 腾讯科技(深圳)有限公司 Video data acquisition method, device and system
CN105979188A (en) * 2016-05-31 2016-09-28 北京疯景科技有限公司 Video recording method and video recording device
CN108965599A (en) * 2018-07-23 2018-12-07 Oppo广东移动通信有限公司 Memory video processing method and related product
CN110602546A (en) 2019-09-06 2019-12-20 Oppo广东移动通信有限公司 Video generation method, terminal and computer-readable storage medium
CN112822563A (en) * 2019-11-15 2021-05-18 北京字节跳动网络技术有限公司 Method, device, electronic equipment and computer readable medium for generating video
CN113163272B (en) * 2020-01-07 2022-11-25 海信集团有限公司 Video editing method, computer device and storage medium
CN111447489A (en) 2020-04-02 2020-07-24 北京字节跳动网络技术有限公司 Video processing method and device, readable medium and electronic equipment
CN111866585B (en) 2020-06-22 2023-03-24 北京美摄网络科技有限公司 Video processing method and device
WO2022007545A1 (en) * 2020-07-06 2022-01-13 聚好看科技股份有限公司 Video collection generation method and display device
CN112738557A (en) * 2020-12-22 2021-04-30 上海哔哩哔哩科技有限公司 Video processing method and device
CN113709561B (en) * 2021-04-14 2024-04-19 腾讯科技(深圳)有限公司 Video editing method, device, equipment and storage medium
CN115529378B (en) * 2022-02-28 2025-06-13 荣耀终端股份有限公司 A video processing method and related device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105245810A (en) * 2015-10-08 2016-01-13 广东欧珀移动通信有限公司 Method and device for processing video transitions
CN111061912A (en) * 2018-10-16 2020-04-24 华为技术有限公司 A method and electronic device for processing video files
CN113973216A (en) * 2020-07-22 2022-01-25 聚好看科技股份有限公司 A method for generating a video collection and a display device
CN113766314A (en) * 2021-11-09 2021-12-07 北京中科闻歌科技股份有限公司 Video segmentation method, device, equipment, system and storage medium

Also Published As

Publication number Publication date
EP4258632A1 (en) 2023-10-11
CN115529378A (en) 2022-12-27
US20250024097A1 (en) 2025-01-16
EP4258632A4 (en) 2024-08-07
WO2023160241A1 (en) 2023-08-31
US12342038B2 (en) 2025-06-24

Similar Documents

Publication Publication Date Title
CN115529378B (en) A video processing method and related device
CN113194242B (en) A shooting method and mobile terminal in a telephoto scene
CN113727017A (en) Shooting method, graphical interface and related device
CN113727015B (en) Video shooting method and electronic equipment
US20240373119A1 (en) Shooting Method and Electronic Device
CN112887583A (en) Shooting method and electronic equipment
CN112199016B (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN109922252A (en) The generation method and device of short-sighted frequency, electronic equipment
CN115225756B (en) Method for determining target object, shooting method and device
WO2023134583A1 (en) Video recording method and apparatus, and electronic device
CN109257649B (en) A kind of multimedia file generation method and terminal device
CN117119285B (en) A method of shooting
WO2023231696A1 (en) Photographing method and related device
CN115484423B (en) A method for adding transition special effects and electronic equipment
CN118474447A (en) Video processing method, electronic device, chip system and storage medium
WO2023231616A9 (en) Photographing method and electronic device
CN115883958A (en) Portrait shooting method
CN115484425A (en) Method and electronic device for determining transition effects
CN116708649A (en) Video processing method, electronic device and readable medium
KR20200079209A (en) Mobile terminal and method for controlling the same
CN115484400B (en) Video data processing method and electronic device
CN115484392B (en) Video shooting method and electronic equipment
CN120186287A (en) Video processing method, electronic device, chip system and storage medium
CN118474448A (en) Video processing method, electronic device, chip system and storage medium
WO2025092012A1 (en) Generation method and generation apparatus for animation, and electronic device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Country or region after: China

Address after: Unit 3401, unit a, building 6, Shenye Zhongcheng, No. 8089, Hongli West Road, Donghai community, Xiangmihu street, Futian District, Shenzhen, Guangdong 518040

Applicant after: Honor Terminal Co.,Ltd.

Address before: 3401, unit a, building 6, Shenye Zhongcheng, No. 8089, Hongli West Road, Donghai community, Xiangmihu street, Futian District, Shenzhen, Guangdong

Applicant before: Honor Device Co.,Ltd.

Country or region before: China

CB02 Change of applicant information
GR01 Patent grant
GR01 Patent grant