
CN112291614A - Video generation method and device

Info

Publication number
CN112291614A
Authority
CN
China
Prior art keywords
characters
video
event
group
keywords
Prior art date
Legal status
Pending
Application number
CN201910677074.1A
Other languages
Chinese (zh)
Inventor
詹振
李丽
Current Assignee
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd
Priority to CN201910677074.1A
Publication of CN112291614A

Classifications

    • H04N 21/4398: Processing of audio elementary streams involving reformatting operations of audio signals
    • G10L 13/02: Methods for producing synthetic speech; speech synthesisers
    • H04N 21/440218: Reformatting of video signals for household redistribution, storage or real-time display by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4
    • H04N 21/440236: Reformatting of video signals by media transcoding, e.g. video transformed into a slideshow of still pictures, audio converted into text
    • H04N 21/4884: Data services, e.g. news ticker, for displaying subtitles

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of this application provide a video generation method and apparatus. Text corresponding to an event keyword and pictures corresponding to that text are acquired; the text is converted into the speech of a video and the pictures are converted into video frames of the video, thereby generating a video corresponding to the event keyword. In other words, a video related to an event keyword, for example a trending-event keyword, need not be produced by recording footage; it can instead be generated from the text corresponding to the keyword and the pictures corresponding to that text. Acquiring the text and pictures and generating the video from them takes little time, so video generation is more efficient than recording, and a video about a trending event can be produced soon after the event occurs.

Description

Video generation method and device
Technical Field
The present application relates to the Internet field, and in particular to a video generation method and apparatus.
Background
With the development of science and technology, many video websites and video-playing applications have appeared. To attract more users, these websites and applications play videos about trending events.
Before a video of a trending event can be played, it must first be generated. At present, such videos are mostly produced by recording footage directly.
Recording takes time, and the footage must then be edited, so a video of a trending event cannot be generated quickly after the event occurs, and therefore cannot be broadcast as soon as possible after the event occurs.
Disclosure of Invention
The technical problem addressed by this application is that the traditional way of producing trending-event videos cannot generate such a video quickly after the event occurs, so the video cannot be broadcast as soon as possible. The application therefore provides a video generation method and apparatus.
In a first aspect, an embodiment of the present application provides a video generation method, the method including:
acquiring text corresponding to an event keyword and pictures corresponding to the text;
and converting the text corresponding to the event keyword into speech of a video and converting the pictures corresponding to the text into video frames of the video, so as to generate a video corresponding to the event keyword.
Optionally, the method further includes:
obtaining material according to a preset rule;
extracting candidate keywords from the material;
and if the search volume and/or click volume corresponding to a candidate keyword meets a preset condition, determining that candidate keyword as the event keyword.
Optionally, the acquiring of the text corresponding to the event keyword and the pictures corresponding to the text includes:
acquiring at least one group of text corresponding to the event keyword and the pictures corresponding to each group of text;
the converting of the text corresponding to the event keyword into speech of the video includes:
converting each group of text in the at least one group into corresponding speech of the video;
and the converting of the pictures corresponding to the text into video frames of the video includes:
determining the playing duration of the speech corresponding to each group of text in the at least one group;
and determining, according to the playing duration, the video frames corresponding to the pictures of each group of text in the at least one group.
Optionally, the converting of the text corresponding to the event keyword into speech of the video and of the pictures corresponding to the text into video frames of the video so as to generate the video corresponding to the event keyword includes:
acquiring time information corresponding to each group of text in the at least one group;
and, according to the time information corresponding to each group of text, converting the text corresponding to the event keyword into speech of the video and converting the pictures of the text into video frames of the video, so as to generate the video corresponding to the event keyword.
Optionally, the at least one group of text includes multiple groups of text;
and the converting of the text corresponding to the event keyword into speech of the video includes:
converting the multiple groups of text corresponding to the event keyword into speech of the video according to the logical relationship between the multiple groups of text.
Optionally, the method further includes:
acquiring pictures corresponding to the event keyword;
performing image recognition on the pictures corresponding to the event keyword to obtain their picture content;
determining the degree of association between the picture content of a picture corresponding to the event keyword and the text;
and if the degree of association is greater than or equal to a first threshold, determining that picture as a picture corresponding to the text.
Optionally, the text corresponding to the event keyword includes multiple groups;
after the multiple groups of text corresponding to the event keyword are obtained, the method further includes:
determining the degree of association between the content of each group of text and the event keyword;
and the converting of the text corresponding to the event keyword into speech of the video includes:
converting, into speech of the video, the groups of text whose degree of association is greater than or equal to a second threshold.
Optionally, the method further includes:
determining words that indicate time in the text;
and if a word indicating time is not in a preset format, obtaining the publication time of the text, determining a word conforming to the preset format according to the publication time, and replacing the time word with the word conforming to the preset format to obtain replaced text;
the converting of the text corresponding to the event keyword into speech of the video then includes:
converting the replaced text into speech of the video.
Optionally, the method further includes:
converting the text corresponding to the event keyword into subtitles of the video.
In a second aspect, an embodiment of the present application provides a video generation apparatus, the apparatus including:
a first acquisition unit, configured to acquire text corresponding to an event keyword and pictures corresponding to the text;
and a generation unit, configured to convert the text corresponding to the event keyword into speech of a video and convert the pictures corresponding to the text into video frames of the video, so as to generate a video corresponding to the event keyword.
Optionally, the apparatus further includes:
a second acquisition unit, configured to obtain material according to a preset rule;
an extraction unit, configured to extract candidate keywords from the material;
and a first determination unit, configured to determine a candidate keyword as the event keyword if the search volume and/or click volume corresponding to that candidate keyword meets a preset condition.
Optionally, the first acquisition unit is specifically configured to:
acquire at least one group of text corresponding to the event keyword and the pictures corresponding to each group of text;
and the generation unit is specifically configured to:
convert each group of text in the at least one group into corresponding speech of the video; determine the playing duration of the speech corresponding to each group of text; and determine, according to the playing duration, the video frames corresponding to the pictures of each group of text, so as to generate the video corresponding to the event keyword.
Optionally, the generation unit is specifically configured to:
acquire time information corresponding to each group of text in the at least one group;
and, according to the time information corresponding to each group of text, convert the text corresponding to the event keyword into speech of the video and convert the pictures of the text into video frames of the video, so as to generate the video corresponding to the event keyword.
Optionally, the at least one group of text includes multiple groups of text;
and the generation unit is configured to convert the multiple groups of text corresponding to the event keyword into speech of the video according to the logical relationship between the multiple groups of text.
Optionally, the apparatus further includes:
a third acquisition unit, configured to acquire pictures corresponding to the event keyword;
a recognition unit, configured to perform image recognition on the pictures corresponding to the event keyword to obtain their picture content;
a second determination unit, configured to determine the degree of association between the picture content of a picture corresponding to the event keyword and the text;
and a third determination unit, configured to determine that picture as a picture corresponding to the text if the degree of association is greater than or equal to a first threshold.
Optionally, the text corresponding to the event keyword includes multiple groups;
after the multiple groups of text are obtained, the degree of association between the content of each group of text and the event keyword is determined;
and the generation unit converts, into speech of the video, the groups of text whose degree of association is greater than or equal to a second threshold.
Optionally, the apparatus further includes:
a fourth determination unit, configured to determine words that indicate time in the text;
and a replacement unit, configured to, if a word indicating time is not in a preset format, obtain the publication time of the text, determine a word conforming to the preset format according to the publication time, and replace the time word with it to obtain replaced text;
the generation unit then converts the replaced text into speech of the video.
Optionally, the apparatus further includes:
a conversion unit, configured to convert the text corresponding to the event keyword into subtitles of the video.
In a third aspect, an embodiment of the present application provides a video generation device including a memory, one or more processors, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and include instructions for:
acquiring text corresponding to an event keyword and pictures corresponding to the text;
and converting the text corresponding to the event keyword into speech of a video and converting the pictures corresponding to the text into video frames of the video, so as to generate a video corresponding to the event keyword.
In a fourth aspect, an embodiment of the present application provides a non-transitory computer-readable storage medium whose instructions, when executed by a processor of an electronic device, enable the electronic device to perform the video generation method of any one of the implementations of the first aspect.
Compared with the prior art, the embodiments of the present application have the following advantages:
The embodiments of the present application provide a video generation method. In practice, after an event such as a trending event occurs, pictures and text related to it usually appear on the network. Therefore, the text corresponding to an event keyword and the pictures corresponding to that text can be acquired, the text can be converted into speech of a video, and the pictures can be converted into video frames of the video, thereby generating a video corresponding to the event keyword. In other words, with the solution provided here, a video related to an event keyword, for example a trending-event keyword, need not be produced by recording footage; it is generated from the text corresponding to the keyword and the pictures corresponding to that text. Acquiring the text and pictures and generating the video from them takes little time, so video generation is more efficient than recording, and compared with the prior art a video related to a trending event can be produced soon after the event occurs.
Drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and a person skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a video generation method according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a method for determining an event keyword according to an embodiment of the present application;
fig. 3 is a schematic flowchart of a method for determining pictures corresponding to the text corresponding to an event keyword according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a video generating apparatus according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a video generating apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a video generating device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
Most current methods for producing a video of a trending event record the video directly. However, recording takes time, and the footage must then be edited, so a video of a trending event cannot be generated soon after the event occurs. In view of this, embodiments of the present application provide a video generation method and apparatus that can generate a video related to a trending event soon after the event occurs.
Various non-limiting embodiments of the present application are described in detail below with reference to the accompanying drawings.
Exemplary method
Referring to fig. 1, the figure is a schematic flowchart of a video generation method according to an embodiment of the present application.
The video generation method provided in the embodiments of the present application may be executed by a server. The server may be dedicated to generating videos corresponding to event keywords, or it may also provide other data processing functions; the embodiments of the present application impose no particular limitation here.
The video generation method provided by the embodiment of the application may include the following steps S101 to S102, for example.
S101: acquire text corresponding to an event keyword and pictures corresponding to the text.
The event keyword mentioned in the embodiments of the present application may be a keyword related to a trending event, or a keyword related to another kind of event, for example an event to be studied; the embodiments impose no particular limitation.
In this embodiment, an event keyword may consist of several characters, which may be, for example, Chinese characters, English words or Korean characters; that is, the event keyword may contain several Chinese characters, several English words or several Korean characters.
The number of characters in an event keyword is not limited here and may be determined by the event the keyword describes. The text corresponding to the event keyword and the pictures corresponding to the text may be obtained from the network. As an example, a web crawler may use the event keyword as a search term to crawl the text corresponding to the keyword and the pictures corresponding to that text. The crawling scope is not limited: it may include the World Wide Web as well as content published on social applications, for example microblogs, forums and communities.
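As a rough illustration of this acquisition step, the Python sketch below assumes a hypothetical news search endpoint (`news.example.com/search`) and a simple result-page layout; the patent does not prescribe any particular crawler or site structure.

```python
import requests
from bs4 import BeautifulSoup

SEARCH_URL = "https://news.example.com/search"   # hypothetical search endpoint


def fetch_text_and_pictures(event_keyword, max_results=3):
    """Collect article text and picture URLs related to an event keyword.

    Each source page is treated as one "channel", yielding one group of
    text plus the pictures published alongside it.
    """
    groups = []
    resp = requests.get(SEARCH_URL, params={"q": event_keyword}, timeout=10)
    soup = BeautifulSoup(resp.text, "html.parser")
    for link in soup.select("a.result")[:max_results]:   # assumed result markup
        page = requests.get(link["href"], timeout=10)
        article = BeautifulSoup(page.text, "html.parser")
        text = " ".join(p.get_text(strip=True) for p in article.find_all("p"))
        pictures = [img["src"] for img in article.find_all("img") if img.get("src")]
        groups.append({"text": text, "pictures": pictures})
    return groups
```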
S102: convert the text corresponding to the event keyword into speech of a video and convert the pictures corresponding to the text into video frames of the video, so as to generate a video corresponding to the event keyword.
After the text corresponding to the event keyword and the pictures corresponding to the text are obtained, they can be used to generate the video. A video consists of speech and video frames, and the pictures are contained in the video frames. Therefore, the text can be converted into the speech of the video, the pictures can be converted into the video frames, and the frames and the speech can be synthesized to obtain the video corresponding to the event keyword.
The way the text corresponding to the event keyword is converted into speech is not limited; as one example, a text-to-speech tool may be used.
The way the pictures corresponding to the text are converted into video frames is likewise not limited. As one example, the duration for which a picture appears in the video frames may be determined from the playing duration of the converted speech. For instance, if the speech plays for 5 minutes, the picture corresponding to the text may be shown in the video frames for 5 minutes.
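The following sketch shows one possible way to implement S102 for a single piece of text and its picture, using off-the-shelf tools (gTTS for text-to-speech and moviepy for frame assembly) that the patent itself does not name; the choice of tools is an assumption.

```python
from gtts import gTTS                               # any TTS tool would do
from moviepy.editor import AudioFileClip, ImageClip


def text_and_picture_to_clip(text, picture_path, lang="zh-CN"):
    """Convert text to speech and show its picture for the speech duration."""
    gTTS(text, lang=lang).save("speech.mp3")        # text -> speech of the video
    speech = AudioFileClip("speech.mp3")
    # The picture stays in the video frames exactly as long as the speech plays.
    return ImageClip(picture_path).set_duration(speech.duration).set_audio(speech)


# clip = text_and_picture_to_clip(text, "event_picture.jpg")
# clip.write_videofile("event_video.mp4", fps=24)
```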
As can be seen from the above, with the video generation method provided here, a video related to an event keyword, for example a trending-event keyword, need not be recorded; it is generated from the text corresponding to the keyword and the pictures corresponding to that text. Acquiring the text and pictures and generating the video from them takes little time, so video generation is more efficient than recording, and compared with the prior art a video related to a trending event can be produced soon after the event occurs.
Moreover, the text and the pictures acquired in S101 are generally related in content, so in a video generated from them each video frame is played together with speech about the picture it shows. This strong correlation between frames and speech gives a better viewing experience.
As noted above, the event keyword may be related to a trending event. In practice, to be able to broadcast a video of a trending event as soon as possible, the method provided here may also determine event keywords automatically, so that the corresponding video can be generated before the event ferments into a trending event and played as soon as it does.
Referring to fig. 2, which is a schematic flowchart of a method for determining an event keyword according to an embodiment of the present application. The method may be implemented, for example, through the following steps S201 to S203.
S201: obtain material according to a preset rule.
S202: extract candidate keywords from the material.
In practice, many websites and applications have trending-event sections, and the events mentioned there are likely to ferment into trending events. Therefore, in one implementation of S201, data mining techniques may be used to obtain material from the trending sections of preset websites and/or preset applications.
After the material is obtained, keywords related to the events in the material can be determined as candidate keywords. How the candidates are extracted is not limited; as one example, since the title of a piece of material is usually closely tied to the event it reports, the candidate keywords may be extracted from the title of the material.
S203: if the search volume and/or click volume corresponding to a candidate keyword meets a preset condition, determine that candidate keyword as the event keyword.
Not every event mentioned in a trending section becomes a trending event. After the candidate keywords are determined, the candidates whose events are more likely to become trending, that is, the event keywords, are therefore further selected from them.
In practice, users pay comparatively high attention to trending events, and that attention is reflected in the search volume and click volume of the corresponding keywords. The event keywords can therefore be selected from the candidates according to their search volume and/or click volume. The search volume of a candidate keyword may be the number of times users search for it in a search engine; its click volume may be the number of times users click web pages corresponding to it, post messages about it on social networking sites, and so on.
Accordingly, if the search volume and/or click volume of a candidate keyword meets a preset condition, that candidate is determined as an event keyword; meeting the condition indicates that users pay high attention to the corresponding event. As an example, the preset condition may be that the search volume and/or click volume is greater than or equal to a preset threshold, whose specific value can be chosen according to the actual situation and is not limited here.
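A minimal sketch of the S203 selection, assuming the search and click counts have already been collected; the threshold values are purely illustrative.

```python
def select_event_keywords(candidates, search_counts, click_counts,
                          search_threshold=10_000, click_threshold=50_000):
    """Keep candidates whose search volume and/or click volume meets the condition."""
    return [
        kw for kw in candidates
        if search_counts.get(kw, 0) >= search_threshold
        or click_counts.get(kw, 0) >= click_threshold
    ]
```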
As described above, the text corresponding to the event keyword and the pictures corresponding to the text may be obtained from the network, and they may come from various channels: text and pictures from news websites, or text and pictures published on social applications, for example. Accordingly, S101, acquiring the text corresponding to the event keyword and the pictures corresponding to the text, may be implemented by acquiring at least one group of text corresponding to the event keyword and the pictures corresponding to each group of text.
The text and pictures obtained from one channel may be defined as one group of text corresponding to the event keyword and the pictures corresponding to that group. What counts as a channel is not limited: a web page, a website or a social application may each be treated as a channel, for example.
When at least one group of text and the pictures corresponding to each group have been acquired, converting the text corresponding to the event keyword into speech of the video in S102 may be implemented by converting each group of text into its own speech. In practice, a group of text and its pictures usually come from the same channel, for example the same web page, so their content is highly correlated. Therefore, when the video corresponding to the event keyword is generated, the speech converted from a group of text may be played together with the video frames converted from that group's pictures, which keeps the video frames and the speech closely related in content.
Specifically, when converting the pictures corresponding to the text into video frames of the video in S102, the playing duration of the speech corresponding to each group of text may be determined first, and the video frames corresponding to each group's pictures are then determined according to that playing duration. For example, suppose the text and pictures corresponding to the event keyword include three groups of text and their pictures, and the speech converted from the first, second and third groups plays for a first, second and third duration respectively. Then the video frames showing the first group's pictures are given the first duration and paired with the first group's speech, the frames showing the second group's pictures are given the second duration and paired with the second group's speech, and the frames showing the third group's pictures are given the third duration and paired with the third group's speech.
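The per-group pairing described above might be assembled as follows; the sketch again assumes gTTS and moviepy, and it simply splits each group's speech duration evenly across that group's pictures.

```python
from gtts import gTTS
from moviepy.editor import AudioFileClip, ImageClip, concatenate_videoclips


def groups_to_video(groups, out_path="event_video.mp4"):
    """groups: list of {"text": str, "pictures": [paths]}, one entry per channel."""
    segments = []
    for i, group in enumerate(groups):
        if not group["pictures"]:
            continue                                  # nothing to show for this group
        gTTS(group["text"], lang="zh-CN").save(f"speech_{i}.mp3")
        speech = AudioFileClip(f"speech_{i}.mp3")
        # Every picture of this group is shown while this group's speech plays.
        per_picture = speech.duration / len(group["pictures"])
        frames = concatenate_videoclips(
            [ImageClip(p).set_duration(per_picture) for p in group["pictures"]]
        )
        segments.append(frames.set_audio(speech))
    concatenate_videoclips(segments).write_videofile(out_path, fps=24)
```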
In practice, when the text corresponding to the event keyword includes at least one group of text and pictures, the speech of the generated video may need to narrate the event in a certain chronological order. To this end, time information corresponding to each group of text may be obtained, and the text may be converted into speech of the video and the pictures into video frames according to that time information, so as to generate the video corresponding to the event keyword.
The time information corresponding to each group of text is obtained so that the event can be narrated chronologically. Since the publication information of each group reflects, to some extent, the order in which the event corresponding to the event keyword developed, the time information may include the publication time of each group. Alternatively, since each group may describe the event within a certain period, the time information may include times mentioned in the text of each group.
After the time information corresponding to each group of text is obtained, the speech converted from the groups may be ordered chronologically, for example from the earliest time described by the time information to the latest, and the playing order of each group's speech in the video is determined from that ordering, so that the generated video narrates the event in the order in which it developed.
Of course, other orderings are possible: the speech may, for example, be ordered from the latest described time to the earliest, so that the generated video narrates the event in reverse.
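A small sketch of the chronological ordering, assuming each group carries a `publish_time` field in ISO 8601 form (the patent only says that publication time or an in-text time may be used).

```python
from datetime import datetime


def order_groups_by_time(groups, reverse=False):
    """Sort text groups by publication time so the narration follows the
    event's development; reverse=True yields a reverse-narrative video."""
    return sorted(groups,
                  key=lambda g: datetime.fromisoformat(g["publish_time"]),
                  reverse=reverse)
```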
When the text corresponding to the event keyword includes multiple groups, the multiple groups may be converted into speech of the video according to the logical relationship between them, so that the narration of the finally generated video is logically coherent. The logical relationship between groups of text may include, for example, any one or more of: from cause to effect, from primary to secondary, from whole to part, from general to specific, from phenomenon to essence, and from specific to general.
In one implementation, each group of text may be analyzed and the connectives expressing logical relations extracted from it. The logical relationship between the groups is determined from those connectives, and the groups are then converted into speech of the video so that the playing order of their speech follows the logical relationship between the groups.
For example, suppose the text corresponding to the event keyword includes two groups, where the relationship between the first and second group is cause to effect. According to this relationship, the first group is converted into a first speech segment and the second into a second speech segment, and in the video the first segment is played before the second.
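A toy version of the connective-based ordering, with a hand-picked list of cause/effect markers; real text would need a far richer set of connectives and relations.

```python
CAUSE_MARKERS = ("因为", "由于", "because", "due to")        # illustrative only
EFFECT_MARKERS = ("所以", "因此", "therefore", "as a result")


def order_by_cause_effect(groups):
    """Narrate groups containing cause connectives before groups containing
    effect connectives; unmarked groups keep their relative order at the end."""
    def rank(group):
        text = group["text"]
        if any(marker in text for marker in CAUSE_MARKERS):
            return 0
        if any(marker in text for marker in EFFECT_MARKERS):
            return 1
        return 2
    return sorted(groups, key=rank)                  # sorted() is stable
```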
In another implementation, a pre-trained logical relationship determination model may be used to determine the logical relationship between the multiple groups of text: the groups are input into the model, and its output is the logical relationship between them.
The logical relationship determination model may be trained on training text carrying labels, where each training sample contains multiple groups of text and its label indicates the logical relationship between those groups. The model itself is not limited; it may be a deep learning model, for example a Convolutional Neural Network (CNN) model, a Recurrent Neural Network (RNN) model or a Deep Neural Network (DNN) model.
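The patent envisages a deep model (CNN, RNN or DNN); the sketch below swaps in a TF-IDF plus logistic-regression classifier purely to show the input/output interface of such a relationship model, and the two training pairs are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: each sample joins two text groups; the label is
# the logical relationship between them.
train_pairs = [
    "heavy snow closed the airport [SEP] hundreds of flights were cancelled",
    "a new policy was announced [SEP] the detailed rules were released afterwards",
]
train_labels = ["cause_to_effect", "general_to_specific"]

relation_model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
relation_model.fit(train_pairs, train_labels)


def predict_relation(group_a, group_b):
    """Predict the logical relationship between two text groups."""
    return relation_model.predict([group_a + " [SEP] " + group_b])[0]
```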
As described above, the pictures corresponding to the text may be converted into video frames of the video according to the playing duration of the speech converted from the text corresponding to the event keyword. To keep the viewing experience good, one picture should not stay on screen across too many consecutively played frames. If the playing duration of the speech converted from a piece of text is long relative to the number of its pictures, each picture will be shown for a long stretch: for example, if the speech plays for 120 seconds and the text has only 2 pictures, the frames of the first 60 seconds may all show the first picture and the frames of the last 60 seconds may all show the second picture. For this situation, additional pictures for the text corresponding to the event keyword may be obtained through steps S301 to S304 shown in fig. 3, which is a schematic flowchart of a method for determining pictures corresponding to the text corresponding to an event keyword according to an embodiment of the present application.
S301: acquire pictures corresponding to the event keyword.
The pictures corresponding to the event keyword may be obtained from the network, for example by searching with the event keyword as the search term.
S302: perform image recognition on the pictures corresponding to the event keyword to obtain their picture content.
How the image recognition is performed is not limited; as one example, image features of a picture corresponding to the event keyword may be extracted and its picture content determined from the extracted features.
S303: determine the degree of association between the picture content of a picture corresponding to the event keyword and the text corresponding to the event keyword.
How the degree of association is determined is not limited. As one example, a model that determines the association between picture content and the text corresponding to the event keyword may be trained in advance, for example a Convolutional Neural Network (CNN) model trained on picture content labelled with its degree of association to the corresponding text. To further improve the trained model's accuracy, the training input describing the picture content may also include, for example, the position of the picture in the web page it came from, the size of the picture, and the positional relationship between the picture and the text in that page.
S304: if the degree of association is greater than or equal to a first threshold, determine the picture corresponding to the event keyword as a picture corresponding to the text corresponding to the event keyword.
A degree of association greater than or equal to the first threshold indicates that the picture corresponding to the event keyword is closely related to the text, so such pictures are determined as pictures corresponding to the text. The specific value of the first threshold is not limited and can be chosen according to the actual situation.
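The patent determines this association with a trained CNN; as a stand-in that makes the S303 to S304 thresholding concrete, the sketch below scores a picture by how many of its recognised content labels also occur in the text.

```python
FIRST_THRESHOLD = 0.5        # illustrative value for the first threshold


def picture_text_association(picture_labels, text):
    """Fraction of recognised picture-content labels that also appear in the text."""
    if not picture_labels:
        return 0.0
    return sum(label in text for label in picture_labels) / len(picture_labels)


def keep_picture(picture_labels, text):
    """S304: keep the picture only if its association degree reaches the threshold."""
    return picture_text_association(picture_labels, text) >= FIRST_THRESHOLD
```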
In practice, besides steps S301 to S304 above, pictures for the text corresponding to the event keyword may also be added in other ways. For example, in one possible implementation, entities in the text may be recognized, pictures related to the recognized entities obtained, and those pictures determined as pictures corresponding to the text. The entities are not limited and may include one or more of names of people, names of objects, and so on; the pictures related to a recognized entity may be obtained, for example, by searching with that entity as the search term.
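A sketch of this entity-based alternative, assuming spaCy's Chinese pipeline is available; the chosen entity labels, and the downstream image search they feed, are illustrative.

```python
import spacy

nlp = spacy.load("zh_core_web_sm")    # assumes the Chinese spaCy model is installed


def entity_image_queries(text, labels=("PERSON", "ORG", "GPE")):
    """Extract entities from the text; each entity then serves as a search
    keyword for fetching additional pictures."""
    return [ent.text for ent in nlp(text).ents if ent.label_ in labels]
```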
As described above, the text corresponding to the event keyword may be obtained from the network. When multiple groups of text are obtained, their degrees of association with the event keyword may differ: some groups may be closely related to the keyword while others are not. To keep the speech of the generated video closely related to the event keyword, the obtained groups may be further screened, and only the groups sufficiently related to the keyword are converted into speech of the video. Specifically, the degree of association between the content of each group and the event keyword may be determined, and when the text corresponding to the event keyword is converted into speech of the video, only the groups whose degree of association is greater than or equal to a second threshold are converted.
The distance between each group of text and the event keyword may be computed, and the degree of association between them determined from that distance: in general, the larger the distance, the smaller the association. How the distance is computed is not limited; as one example, a word embedding vector may be determined for the event keyword and for the group of text, and the distance between the two vectors taken as the distance between the group and the keyword.
A degree of association greater than or equal to the second threshold indicates that the group of text is closely related to the event keyword. The specific value of the second threshold is not limited and can be chosen according to the actual situation.
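A sketch of this screening step, assuming embedding vectors are already available for the keyword and each group, and using cosine similarity as the association degree (the patent only requires some distance-based measure); the threshold value is illustrative.

```python
import numpy as np

SECOND_THRESHOLD = 0.6        # illustrative value for the second threshold


def association_degree(keyword_vec, text_vec):
    """Cosine similarity between the keyword embedding and a text-group embedding."""
    keyword_vec, text_vec = np.asarray(keyword_vec), np.asarray(text_vec)
    denom = np.linalg.norm(keyword_vec) * np.linalg.norm(text_vec) + 1e-12
    return float(keyword_vec @ text_vec / denom)


def filter_groups(groups, keyword_vec):
    """Keep only the groups whose association with the keyword reaches the threshold."""
    return [g for g in groups
            if association_degree(keyword_vec, g["embedding"]) >= SECOND_THRESHOLD]
```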
As described above, the picture corresponding to the text corresponding to the event keyword may be obtained from a network. In practical applications, the obtained pictures may include some pictures unrelated to the text, for example, irrelevant advertisement pictures. In view of this, in the embodiment of the present application, after the picture corresponding to the text is acquired in step S101, the degree of association between the picture content of the acquired picture and the text may be further determined, and the picture is determined as the picture corresponding to the text only if this degree of association is relatively high. For the specific implementation of determining the degree of association between the picture content and the text, reference may be made to the description of S303, and details are not repeated here.
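One possible realization of this picture filtering, not necessarily the method of S303, is sketched below: the textual output of an image-recognition step (labels or a caption) is compared with the event text using a simple token-overlap score, and the threshold value is illustrative only.

```python
# Sketch: filter fetched pictures by the association between their recognized
# content and the event text. picture_labels is assumed to be produced by a
# separate image-recognition step.
FIRST_THRESHOLD = 0.2  # illustrative value

def picture_matches_text(picture_labels: str, text: str) -> bool:
    content_tokens = set(picture_labels.lower().split())
    text_tokens = set(text.lower().split())
    if not content_tokens:
        return False
    overlap = len(content_tokens & text_tokens) / len(content_tokens)
    return overlap >= FIRST_THRESHOLD

# Example: an advertisement picture whose recognized content shares nothing
# with the event text is filtered out.
print(picture_matches_text("discount shop advertisement",
                           "city marathon results announced"))  # False
```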
It can be understood that, in practical applications, the text corresponding to the event keyword may include words representing time, and some of these words may be expressed in a format other than a preset format.
The embodiment of the present application does not specifically limit the preset format. The preset format may be, for example, an absolute time format, that is, a format expressed in terms of the day, morning or afternoon, hour, minute, and the like. The embodiment of the present application also does not specifically limit the other formats, which may be, for example, relative time formats such as "this morning", "3 o'clock yesterday afternoon" or "15:30 today".
It can be understood that, if the text corresponding to the event keyword includes words that represent time but are not in the preset format, errors may arise when the video is played. Because the publishing time of the video generated by the scheme provided in the embodiment of the present application may differ from the publishing time of the text corresponding to the event keyword, converting the text into the voice of the video without handling time expressions that are not in a preset format, such as the absolute time format, may introduce errors. Therefore, in the embodiment of the present application, if a word representing time is not in the preset format, the publishing time of the text corresponding to the event keyword may be further obtained, a word that conforms to the preset format and corresponds to the word representing time may be determined according to that publishing time, and the non-conforming word may be replaced with the conforming word to obtain the replaced text. Correspondingly, when the text corresponding to the event keyword is converted into the voice of the video, the replaced text is converted, so that errors caused by time expressions not in the preset format are avoided.
For example, if the text corresponding to the event keyword includes "11 am today", which is not a time expression in the preset format, the publishing time of the text may be obtained. If the publishing time is June 3, 2019, the time expression "11 am today" may be converted into "11 am on June 3, 2019", which then replaces "11 am today" in the text corresponding to the event keyword; the replaced text may then be converted into the voice of the video.
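A minimal sketch of such time normalization follows; the English patterns handled and the output wording are assumptions for demonstration only, and a production system would need a much richer set of relative-time rules.

```python
# Sketch: replace relative time expressions with absolute ones derived from the
# publishing time of the text, so the spoken text stays correct when the video
# is released later than the text.
import re
from datetime import datetime, timedelta

def normalize_relative_times(text: str, publish_time: datetime) -> str:
    def replace_today(match: re.Match) -> str:
        hour = int(match.group(1))
        return publish_time.strftime(f"%B %d, %Y at {hour} am")

    def replace_yesterday(match: re.Match) -> str:
        day = publish_time - timedelta(days=1)
        hour = int(match.group(1))
        return day.strftime(f"%B %d, %Y at {hour} pm")

    text = re.sub(r"(\d{1,2}) am today", replace_today, text)
    text = re.sub(r"(\d{1,2}) pm yesterday", replace_yesterday, text)
    return text

# Example: with a publishing time of June 3, 2019, "11 am today" becomes
# "June 03, 2019 at 11 am", matching the replacement described above.
print(normalize_relative_times("The event starts at 11 am today.",
                               datetime(2019, 6, 3)))
```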
It can be understood that, in practical applications, it may be inconvenient for a user to listen to the voice while watching the video. For example, when the user is in a vehicle, the ambient noise may be too loud for the voice of the video to be heard clearly. In order to enable the user to watch the video normally in scenarios where listening to the voice is inconvenient, in an implementation of the embodiment of the present application, the text corresponding to the event keyword may also be converted into subtitles of the video, and the subtitles may be displayed synchronously when the video is played, so that the user can learn the specific content shown by the video frames through the subtitles.
It should be noted that, in the embodiment of the present application, the content of the subtitles may correspond exactly to the voice of the video; that is, when the video is played, the subtitles corresponding to the voice are played synchronously with the voice. Of course, the subtitles need not correspond exactly to the voice, as long as they are determined according to the text corresponding to the event keyword. If the subtitles do not correspond exactly to the voice, a user who cannot conveniently listen to the voice can still learn the specific content shown by the video frames from the subtitles, while a user who can listen to the voice can learn that content from both the voice and the subtitles; in this case, subtitles that do not correspond exactly to the voice may even allow the user to learn more of the content shown by the video frames.
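A minimal sketch of keeping such subtitles in sync with the generated speech is shown below; the per-group speech durations are assumed to come from the text-to-speech step, and SRT is used only as an example subtitle container, not a format required by the present application.

```python
# Sketch: build an SRT subtitle track whose cue timings follow the playing
# duration of the speech synthesized for each group of text.
from datetime import timedelta

def format_timestamp(seconds: float) -> str:
    total_ms = int(timedelta(seconds=seconds).total_seconds() * 1000)
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def build_srt(text_groups: list[str], durations: list[float]) -> str:
    lines, start = [], 0.0
    for index, (text, duration) in enumerate(zip(text_groups, durations), start=1):
        end = start + duration
        lines.append(f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n")
        start = end
    return "\n".join(lines)

# Example: two groups of text whose speech lasts 3.2 s and 4.5 s.
print(build_srt(["First group of text.", "Second group of text."], [3.2, 4.5]))
```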
Exemplary device
Based on the methods provided by the above embodiments, the embodiments of the present application further provide a video generating apparatus, which is described below with reference to the accompanying drawings.
Referring to fig. 4, this figure is a schematic structural diagram of a video generating apparatus according to an embodiment of the present application.
The video generating apparatus 400 illustrated in fig. 4 may specifically include: a first acquisition unit 401 and a generation unit 402.
A first obtaining unit 401, configured to obtain a text corresponding to an event keyword and a picture corresponding to the text;
a generating unit 402, configured to convert the text corresponding to the event keyword into a voice of a video, and convert the picture corresponding to the text into a video frame of the video, so as to generate a video corresponding to the event keyword.
Optionally, the apparatus further comprises:
the second acquisition unit is used for acquiring the material according to a preset rule;
an extracting unit for extracting candidate keywords from the material;
and the first determining unit is used for determining the candidate keyword as the event keyword if the search quantity and/or click quantity corresponding to the candidate keyword meets a preset condition.
Optionally, the first obtaining unit 401 is specifically configured to:
acquiring at least one group of characters corresponding to the event keywords and pictures corresponding to each group of characters;
the generating unit 402 is specifically configured to:
converting each group of characters in the at least one group of characters into corresponding video voices respectively; determining the playing time of the voice of the video corresponding to each group of characters in the at least one group of characters; and determining video frames respectively corresponding to the pictures corresponding to each group of characters in the at least one group of characters according to the playing duration so as to generate a video corresponding to the event keyword.
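A minimal sketch of the duration-driven pairing just described is given below: the playing duration of the speech for each group of text (assumed to be measured after text-to-speech synthesis) determines how long that group's pictures are shown, and the even split of each duration across the group's pictures is an illustrative policy rather than the one required by the present application.

```python
# Sketch: assign each picture a display interval according to the playing
# duration of the speech synthesized for its group of text.
def schedule_pictures(pictures_per_group: list[list[str]],
                      group_durations: list[float]):
    timeline, start = [], 0.0
    for pictures, duration in zip(pictures_per_group, group_durations):
        slot = duration / max(len(pictures), 1)  # equal share per picture
        for i, picture in enumerate(pictures):
            timeline.append((picture,
                             round(start + i * slot, 3),
                             round(start + (i + 1) * slot, 3)))
        start += duration
    return timeline  # (picture, start_seconds, end_seconds) for the video frames

# Example: two groups whose speech lasts 6 s and 4 s, with 3 and 2 pictures.
print(schedule_pictures([["a.jpg", "b.jpg", "c.jpg"], ["d.jpg", "e.jpg"]],
                        [6.0, 4.0]))
```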
Optionally, the generating unit 402 is specifically configured to:
acquiring time information corresponding to each group of characters in the at least one group of characters;
and converting the characters corresponding to the event keywords into video voices and converting the pictures of the characters into video frames of the videos according to the time information corresponding to each group of characters in the at least one group of characters so as to generate the videos corresponding to the event keywords.
Optionally, the at least one group of words includes a plurality of groups of words;
the converting the characters corresponding to the event keywords into the voice of the video comprises:
and converting the multiple groups of characters corresponding to the event keywords into video voices according to the logic relation among the multiple groups of characters.
Optionally, the apparatus further comprises:
a third acquiring unit, configured to acquire a picture corresponding to the event keyword;
the identification unit is used for identifying the picture corresponding to the event keyword to obtain the picture content of the picture corresponding to the event keyword;
the second determining unit is used for determining the association degree between the picture content of the picture corresponding to the event keyword and the characters;
and the third determining unit is used for determining the picture corresponding to the event keyword as the picture corresponding to the character if the association degree is greater than or equal to a first threshold value.
Optionally, the text corresponding to the event keyword includes multiple groups;
after obtaining a plurality of groups of texts corresponding to the event keywords, the converting the texts corresponding to the event keywords into the voice of the video includes:
determining the association degree of each group of characters in the plurality of groups of characters and the event keywords on the content;
the converting the characters corresponding to the event keywords into the voice of the video comprises the following steps:
and converting the characters of which the corresponding association degree is greater than or equal to a second threshold value in the plurality of groups of characters into the voice of the video.
Optionally, the apparatus further comprises:
a fourth determining unit, configured to determine a word indicating time in the text;
the replacing unit is used for acquiring the publication time of the characters if the words representing the time are not in a preset format, determining the words conforming to the preset format according to the publication time of the characters, and replacing the words conforming to the preset format with the words representing the time to obtain the replaced characters;
the converting the characters corresponding to the event keywords into the voice of the video comprises:
and converting the replaced characters into voice of the video.
Optionally, the apparatus further comprises:
and the conversion unit is used for converting the characters corresponding to the event keywords into the subtitles of the video.
Since the apparatus 400 corresponds to the method provided in the foregoing method embodiments, and the specific implementation of each unit of the apparatus 400 is the same as that in the method embodiments, reference may be made to the description of the method embodiments for the specific implementation of each unit of the apparatus 400, and details are not repeated here.
As can be seen from the above description, with the video generation apparatus provided in the embodiment of the present application, a video related to an event keyword, such as a hot event keyword, does not need to be generated by recording; instead, the text corresponding to the event keyword and the picture corresponding to the text are used to generate the video corresponding to the event keyword. The steps of acquiring the text corresponding to the event keyword and the picture corresponding to the text, and of generating the video from them, take little time, so the video generation efficiency is higher than that of recording. Compared with the prior art, the scheme provided in the embodiment of the present application can therefore generate a video related to a hot event soon after the hot event occurs.
Fig. 5 is a block diagram illustrating a video generation apparatus 500 according to an example embodiment. For example, the apparatus 500 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 5, the apparatus 500 may include one or more of the following components: processing component 502, memory 504, power component 506, multimedia component 508, audio component 510, input/output (I/O) interface 512, sensor component 514, and communication component 516.
The processing component 502 generally controls overall operation of the device 500, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 502 may include one or more processors 520 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 502 can include one or more modules that facilitate interaction between the processing component 502 and other components. For example, the processing component 502 can include a multimedia module to facilitate interaction between the multimedia component 508 and the processing component 502.
The memory 504 is configured to store various types of data to support operation at the device 500. Examples of such data include instructions for any application or method operating on device 500, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 504 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 506 provides power to the various components of the device 500. The power components 506 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 500.
The multimedia component 508 includes a screen that provides an output interface between the device 500 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 508 includes a front-facing camera and/or a rear-facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 500 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have focusing and optical zoom capabilities.
The audio component 510 is configured to output and/or input audio signals. For example, audio component 510 includes a Microphone (MIC) configured to receive external audio signals when apparatus 500 is in an operating mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 504 or transmitted via the communication component 516. In some embodiments, audio component 510 further includes a speaker for outputting audio signals.
The I/O interface 512 provides an interface between the processing component 502 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 514 includes one or more sensors for providing status assessments of various aspects of the apparatus 500. For example, the sensor assembly 514 may detect an open/closed state of the apparatus 500 and the relative positioning of components, such as the display and keypad of the apparatus 500; the sensor assembly 514 may also detect a change in the position of the apparatus 500 or of a component of the apparatus 500, the presence or absence of user contact with the apparatus 500, the orientation or acceleration/deceleration of the apparatus 500, and a change in the temperature of the apparatus 500. The sensor assembly 514 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 514 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 514 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 516 is configured to facilitate communication between the apparatus 500 and other devices in a wired or wireless manner. The apparatus 500 may access a wireless network based on a communication standard, such as WiFi, 2G or 5G, or a combination thereof. In an exemplary embodiment, the communication component 516 receives a broadcast signal or broadcast associated information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communications component 516 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 500 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 504 comprising instructions, executable by the processor 520 of the apparatus 500 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Fig. 6 is a schematic structural diagram of a video generation device according to an embodiment of the present application. The video generation device 600 may vary considerably in configuration or performance, and may include one or more Central Processing Units (CPUs) 622 (e.g., one or more processors), a memory 632, and one or more storage media 630 (e.g., one or more mass storage devices) storing applications 642 or data 644. The memory 632 and the storage medium 630 may be transient or persistent storage. The program stored in the storage medium 630 may include one or more modules (not shown), and each module may include a series of instruction operations for the video generation device. Furthermore, the central processing unit 622 may be configured to communicate with the storage medium 630 and execute, on the video generation device 600, the series of instruction operations stored in the storage medium 630.
The video generation device 600 may also include one or more power supplies 626, one or more wired or wireless network interfaces 650, one or more input/output interfaces 658, one or more keyboards 656, and/or one or more operating systems 661, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
Embodiments of the present application also provide a non-transitory computer-readable storage medium, where instructions, when executed by a processor of a video generation device, enable the video generation device to perform a video generation method, the method including:
acquiring characters corresponding to the event keywords and pictures corresponding to the characters;
and converting the characters corresponding to the event keywords into voice of a video, and converting the pictures corresponding to the characters into video frames of the video so as to generate the video corresponding to the event keywords.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice in the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the attached claims.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (10)

1. A method of video generation, the method comprising:
acquiring characters corresponding to the event keywords and pictures corresponding to the characters;
and converting the characters corresponding to the event keywords into voice of a video, and converting the pictures corresponding to the characters into video frames of the video so as to generate the video corresponding to the event keywords.
2. The method of claim 1, further comprising:
obtaining a material according to a preset rule;
extracting candidate keywords from the materials;
and if the search quantity and/or the click quantity corresponding to the candidate keyword meet preset conditions, determining the candidate keyword as the event keyword.
3. The method according to claim 1, wherein the obtaining of the text corresponding to the event keyword and the picture corresponding to the text comprises:
acquiring at least one group of characters corresponding to the event keywords and pictures corresponding to each group of characters;
the converting the characters corresponding to the event keywords into the voice of the video comprises:
converting each group of characters in the at least one group of characters into corresponding video voices respectively;
the converting the picture corresponding to the text into the video frame of the video comprises:
determining the playing time of the voice of the video corresponding to each group of characters in the at least one group of characters;
and determining video frames respectively corresponding to the pictures corresponding to each group of characters in the at least one group of characters according to the playing duration.
4. The method of claim 3, wherein converting the text corresponding to the event keyword into a voice of a video, and converting the picture corresponding to the text into a video frame of the video, so as to generate the video corresponding to the event keyword comprises:
acquiring time information corresponding to each group of characters in the at least one group of characters;
and converting the characters corresponding to the event keywords into video voices and converting the pictures of the characters into video frames of the videos according to the time information corresponding to each group of characters in the at least one group of characters so as to generate the videos corresponding to the event keywords.
5. The method of claim 3, wherein the at least one set of words comprises a plurality of sets of words;
the converting the characters corresponding to the event keywords into the voice of the video comprises:
and converting the multiple groups of characters corresponding to the event keywords into video voices according to the logic relation among the multiple groups of characters.
6. The method of claim 1, further comprising:
acquiring a picture corresponding to the event keyword;
identifying the picture corresponding to the event keyword to obtain the picture content of the picture corresponding to the event keyword;
determining the association degree between the picture content of the picture corresponding to the event keyword and the characters;
and if the association degree is greater than or equal to a first threshold value, determining the picture corresponding to the event keyword as the picture corresponding to the character.
7. The method according to claim 1, wherein the words corresponding to the event keywords comprise a plurality of groups;
after obtaining a plurality of groups of texts corresponding to the event keywords, the converting the texts corresponding to the event keywords into the voice of the video includes:
determining the association degree of each group of characters in the plurality of groups of characters and the event keywords on the content;
the converting the characters corresponding to the event keywords into the voice of the video comprises the following steps:
and converting the characters of which the corresponding association degree is greater than or equal to a second threshold value in the plurality of groups of characters into the voice of the video.
8. A video generation apparatus, characterized in that the apparatus comprises:
the system comprises a first acquisition unit, a second acquisition unit and a display unit, wherein the first acquisition unit is used for acquiring characters corresponding to event keywords and pictures corresponding to the characters;
and the generating unit is used for converting the characters corresponding to the event keywords into voice of a video and converting the pictures corresponding to the characters into video frames of the video so as to generate the video corresponding to the event keywords.
9. A video generation apparatus, the apparatus comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory, and wherein execution of the one or more programs by one or more processors comprises instructions for:
acquiring characters corresponding to the event keywords and pictures corresponding to the characters;
and converting the characters corresponding to the event keywords into voice of a video, and converting the pictures corresponding to the characters into video frames of the video so as to generate the video corresponding to the event keywords.
10. A non-transitory computer readable storage medium, instructions in which, when executed by a processor of an electronic device, enable the electronic device to perform the video generation method of any of claims 1 to 7.
CN201910677074.1A 2019-07-25 2019-07-25 Video generation method and device Pending CN112291614A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910677074.1A CN112291614A (en) 2019-07-25 2019-07-25 Video generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910677074.1A CN112291614A (en) 2019-07-25 2019-07-25 Video generation method and device

Publications (1)

Publication Number Publication Date
CN112291614A true CN112291614A (en) 2021-01-29

Family

ID=74418846

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910677074.1A Pending CN112291614A (en) 2019-07-25 2019-07-25 Video generation method and device

Country Status (1)

Country Link
CN (1) CN112291614A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113301389A (en) * 2021-05-19 2021-08-24 北京沃东天骏信息技术有限公司 Comment processing method and device for generating video
CN113423010A (en) * 2021-06-22 2021-09-21 深圳市大头兄弟科技有限公司 Video conversion method, device and equipment based on document and storage medium
CN113873290A (en) * 2021-09-14 2021-12-31 联想(北京)有限公司 Video processing method and device and electronic equipment
CN115277650A (en) * 2022-07-13 2022-11-01 深圳乐播科技有限公司 Screen projection display control method, electronic equipment and related device
CN118828105A (en) * 2023-04-19 2024-10-22 北京字跳网络技术有限公司 Video generation method, device, equipment, storage medium and program product
US12148451B2 (en) 2023-04-19 2024-11-19 Beijing Zitiao Network Technology Co., Ltd. Method, apparatus, device, storage medium and program product for video generating

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101834731A (en) * 2009-03-10 2010-09-15 华硕电脑股份有限公司 Method for correcting relative time of information context
US20120177345A1 (en) * 2011-01-09 2012-07-12 Matthew Joe Trainer Automated Video Creation Techniques
US20170169853A1 (en) * 2015-12-09 2017-06-15 Verizon Patent And Licensing Inc. Automatic Media Summary Creation Systems and Methods
CN107832382A (en) * 2017-10-30 2018-03-23 百度在线网络技术(北京)有限公司 Method, apparatus, equipment and storage medium based on word generation video
CN108228612A (en) * 2016-12-14 2018-06-29 北京国双科技有限公司 A kind of method and device for extracting network event keyword and mood tendency
CN108965737A (en) * 2017-05-22 2018-12-07 腾讯科技(深圳)有限公司 media data processing method, device and storage medium
CN109344291A (en) * 2018-09-03 2019-02-15 腾讯科技(武汉)有限公司 A kind of video generation method and device
CN109584648A (en) * 2018-11-08 2019-04-05 北京葡萄智学科技有限公司 Data creation method and device

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101834731A (en) * 2009-03-10 2010-09-15 华硕电脑股份有限公司 Method for correcting relative time of information context
US20120177345A1 (en) * 2011-01-09 2012-07-12 Matthew Joe Trainer Automated Video Creation Techniques
US20170169853A1 (en) * 2015-12-09 2017-06-15 Verizon Patent And Licensing Inc. Automatic Media Summary Creation Systems and Methods
CN108228612A (en) * 2016-12-14 2018-06-29 北京国双科技有限公司 A kind of method and device for extracting network event keyword and mood tendency
CN108965737A (en) * 2017-05-22 2018-12-07 腾讯科技(深圳)有限公司 media data processing method, device and storage medium
CN107832382A (en) * 2017-10-30 2018-03-23 百度在线网络技术(北京)有限公司 Method, apparatus, equipment and storage medium based on word generation video
CN109344291A (en) * 2018-09-03 2019-02-15 腾讯科技(武汉)有限公司 A kind of video generation method and device
CN109584648A (en) * 2018-11-08 2019-04-05 北京葡萄智学科技有限公司 Data creation method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG WENYU, LI DONG: "Intelligent Technology of the Internet of Things", 30 April 2012 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113301389A (en) * 2021-05-19 2021-08-24 北京沃东天骏信息技术有限公司 Comment processing method and device for generating video
CN113301389B (en) * 2021-05-19 2023-04-07 北京沃东天骏信息技术有限公司 Comment processing method and device for generating video
CN113423010A (en) * 2021-06-22 2021-09-21 深圳市大头兄弟科技有限公司 Video conversion method, device and equipment based on document and storage medium
CN113873290A (en) * 2021-09-14 2021-12-31 联想(北京)有限公司 Video processing method and device and electronic equipment
CN115277650A (en) * 2022-07-13 2022-11-01 深圳乐播科技有限公司 Screen projection display control method, electronic equipment and related device
CN115277650B (en) * 2022-07-13 2024-01-09 深圳乐播科技有限公司 Screen-throwing display control method, electronic equipment and related device
CN118828105A (en) * 2023-04-19 2024-10-22 北京字跳网络技术有限公司 Video generation method, device, equipment, storage medium and program product
WO2024217011A1 (en) * 2023-04-19 2024-10-24 北京字跳网络技术有限公司 Video generation method and apparatus, device, storage medium, and program product
US12148451B2 (en) 2023-04-19 2024-11-19 Beijing Zitiao Network Technology Co., Ltd. Method, apparatus, device, storage medium and program product for video generating
CN118828105B (en) * 2023-04-19 2025-09-30 北京字跳网络技术有限公司 Video generation method, device, equipment, storage medium and program product

Similar Documents

Publication Publication Date Title
CN111970577B (en) Subtitle editing method and device and electronic equipment
CN112291614A (en) Video generation method and device
CN108847214B (en) Voice processing method, client, device, terminal, server and storage medium
RU2640632C2 (en) Method and device for delivery of information
CN108227950B (en) Input method and device
CN107527619B (en) Method and device for positioning voice control service
CN110147467A (en) A kind of generation method, device, mobile terminal and the storage medium of text description
CN113705210B (en) A method and device for generating article outline and a device for generating article outline
CN110874145A (en) Input method and device and electronic equipment
CN107515870B (en) Searching method and device and searching device
CN113901241B (en) Page display method and device, electronic equipment and storage medium
CN107967271A (en) A kind of information search method and device
CN110929176A (en) Information recommendation method and device and electronic equipment
CN112784142A (en) Information recommendation method and device
CN110020106B (en) Recommendation method, recommendation device and device for recommendation
CN113343028B (en) Method and device for training intention determination model
CN107515869B (en) Searching method and device and searching device
CN109753205B (en) Display method and device
CN113239183A (en) Training method and device of ranking model, electronic equipment and storage medium
CN110929122A (en) Data processing method and device and data processing device
CN110633391A (en) Information searching method and device
CN111629270A (en) Candidate item determination method and device and machine-readable medium
CN111984767A (en) Information recommendation method and device and electronic equipment
CN112004033B (en) Video cover determining method and device and storage medium
CN113259754B (en) Video generation method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20210129
