CN113221801B - Version number information identification method and device, electronic equipment and readable storage medium - Google Patents

Info

Publication number
CN113221801B
CN113221801B
Authority
CN
China
Prior art keywords
video frame
image
number information
version number
target
Prior art date
Legal status
Active
Application number
CN202110568018.1A
Other languages
Chinese (zh)
Other versions
CN113221801A (en)
Inventor
李冠楠
Current Assignee
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202110568018.1A
Publication of CN113221801A
Application granted
Publication of CN113221801B

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/41: Scenes; scene-specific elements in video content; higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V 10/22: Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
    • G06V 20/46: Scenes; scene-specific elements in video content; extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V 20/635: Text, e.g. of license plates, overlay texts or captions on TV images; overlay text, e.g. embedded captions in a TV program
    • Y02D 10/00: Climate change mitigation technologies in information and communication technologies; energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Character Discrimination (AREA)

Abstract

An embodiment of the invention provides a version number information identification method and device, an electronic device, and a readable storage medium. The method comprises: performing character recognition on M target video frame images of a target video to obtain the character recognition result of each target video frame image; classifying the character recognition results of the M target video frame images according to the keywords of N preset version number information types to obtain K character recognition results corresponding to the J-th preset version number information type; voting on the K character recognition results corresponding to the J-th preset version number information type to determine the character recognition result to be output for the J-th preset version number information type; and obtaining the target version number information of the target video according to the character recognition results to be output for the N preset version number information types. The record number picture therefore does not need to be located manually to acquire the version number information of a video, which improves the efficiency of acquiring that information.

Description

Version number information identification method and device, electronic equipment and readable storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and apparatus for identifying version number information, an electronic device, and a readable storage medium.
Background
To manage videos carrying version number information effectively, the version number information of such videos generally needs to be recorded. For a movie video, for example, the version number information includes information such as the issuing unit, the issuing time, and the approval number of the movie.
At present, the record number picture of a video has to be located manually during playback, and the version number information is acquired from that picture. Acquiring the version number information of a video by manually locating the record number picture is therefore inefficient.
Disclosure of Invention
The embodiments of the invention aim to provide a version number information identification method and device, an electronic device, and a readable storage medium that solve the inefficiency of manually locating a record number picture to acquire the version number information of a video. The specific technical scheme is as follows:
in a first aspect of the present invention, there is provided a method for identifying version number information, including:
performing character recognition on M frames of target video frame images of a target video to obtain a character recognition result of each frame of target video frame image;
classifying the character recognition results of the M target video frame images according to the keywords of N preset version number information types to obtain K character recognition results corresponding to the J-th preset version number information type;
voting on the K character recognition results corresponding to the J-th preset version number information type to determine the character recognition result to be output corresponding to the J-th preset version number information type;
obtaining target version number information of the target video according to character recognition results to be output corresponding to N preset version number information types;
wherein M, N, and K are integers greater than or equal to 1, and J is greater than or equal to 1 and less than or equal to N.
In a second aspect of the present invention, there is also provided a version number information identifying apparatus, including:
the first obtaining module is used for carrying out character recognition on M frames of target video frame images of the target video to obtain a character recognition result of each frame of target video frame images;
the second obtaining module is used for classifying the character recognition results of the M target video frame images according to the keywords of the N preset version number information types to obtain K character recognition results corresponding to the J-th preset version number information type;
the first determining module is used for voting K character recognition results corresponding to the J-th preset version number information type so as to determine character recognition results to be output corresponding to the J-th preset version number information type;
the third obtaining module is used for obtaining target version number information of the target video according to character recognition results to be output corresponding to N preset version number information types;
wherein M, N, and K are integers greater than or equal to 1, and J is greater than or equal to 1 and less than or equal to N.
In yet another aspect of the present invention, there is also provided an electronic device comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other via the communication bus;
a memory for storing a computer program;
and the processor is used for realizing the steps of the method when executing the program stored in the memory.
In yet another aspect of the present invention, there is also provided a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the method described above.
According to the version number information identification method, character recognition is performed on M target video frame images of a target video to obtain the character recognition result of each target video frame image; the character recognition results of the M target video frame images are classified according to the keywords of N preset version number information types to obtain K character recognition results corresponding to the J-th preset version number information type; the K character recognition results corresponding to the J-th preset version number information type are voted on to determine the character recognition result to be output for that type; and the target version number information of the target video is obtained according to the character recognition results to be output corresponding to the N preset version number information types. The record number picture therefore does not need to be located manually to acquire the version number information of a video, which improves the efficiency of acquiring that information.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a flow chart of steps of a method for identifying version number information provided in an embodiment of the present invention;
fig. 2 is a schematic diagram of version number information of a movie video according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating steps of another method for identifying version number information according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of version number information of another movie video according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a version number information identifying apparatus according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the accompanying drawings in the embodiments of the present invention.
Referring to fig. 1, fig. 1 is a flowchart illustrating steps of a method for identifying version number information according to an embodiment of the present invention. The method may be performed by a computer, server, or the like. The method may comprise the steps of:
and 101, performing character recognition on M frames of target video frame images of the target video to obtain a character recognition result of each frame of target video frame image.
The target video may be a video carrying version number information, for example a movie, cartoon, documentary, or variety show; here, version number information refers to the officially issued approval number information. The M target video frame images may be images within a period of time that contains the head images of the target video. The characters on the target video frame images can be recognized using optical character recognition (OCR) technology to obtain the character recognition result of each target video frame image.
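As a minimal sketch of this step, the per-frame character recognition can be wrapped as a pluggable function applied to each of the M frames. Here `ocr_fn` is a stand-in for any OCR engine (for example a wrapper around an OCR library such as Tesseract); the function names are illustrative assumptions, not part of the patent.

```python
def recognize_frames(frames, ocr_fn):
    """Step 101 sketch: run character recognition on each of the M target
    video frame images, returning one list of recognized strings per frame.
    ocr_fn stands in for a real OCR engine (hypothetical)."""
    return [ocr_fn(frame) for frame in frames]

# Usage with a stub OCR engine that "recognizes" fixed text per frame:
stub_ocr = lambda frame: [f"text of {frame}"]
results = recognize_frames(["frame0", "frame1"], stub_ocr)
```

In a real pipeline, `ocr_fn` would decode the frame image and return the detected text lines; keeping it pluggable makes the recognition step testable independently of any OCR library.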
Step 102, classifying the character recognition results of the M target video frame images according to the keywords of the N preset version number information types to obtain the K character recognition results corresponding to the J-th preset version number information type.
As shown in fig. 2, fig. 2 is a schematic diagram of version number information of a movie video according to an embodiment of the present invention. The version number information of the movie video includes, for example, issuing unit information, certificate type information, and approval number information. The issuing unit is, for example, the XXXXXXXXXXX television part XXXXXX administration; the certificate type information is, for example, an XXX public license; and the approval number information is, for example, XXXXX word 018.
In the embodiment of the invention, version number information types can be preset. For example, the preset version number information types comprise type 1, type 2, type 3, and type 4, where type 1 and type 2 belong to the issuing-unit types: type 1 is the main issuing-unit type with keyword "part", and type 2 is a subsidiary issuing-unit type with keyword "bureau". Type 3 is the certificate type with keyword "certificate", and type 4 is the number type with keywords "word" and "number". Suppose the character recognition result of a certain target video frame image comprises character recognition results 1 to 5. Character recognition result 1 contains "part", so the character recognition results corresponding to type 1 include result 1; result 2 contains "bureau", so the results corresponding to type 2 include result 2; result 3 contains "certificate", so the results corresponding to type 3 include result 3; result 4 contains "word" and "number", so the results corresponding to type 4 include result 4; and result 5 contains "space" and "number", so the results corresponding to type 4 also include result 5.
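The keyword classification of step 102 can be sketched as below. The keyword table uses English stand-ins for the Chinese keywords of the example types ("part", "bureau", "certificate", "word", "number") and is purely illustrative; one recognition result may land in several type buckets, as result 5 does in the example above.

```python
from collections import defaultdict

# Illustrative keyword table for the four preset version number information
# types in the example (English stand-ins; the real keywords are Chinese).
TYPE_KEYWORDS = {
    1: ["part"],              # main issuing-unit type
    2: ["bureau"],            # subsidiary issuing-unit type
    3: ["certificate"],       # certificate type
    4: ["word", "number"],    # number type
}

def classify_results(recognition_results):
    """Step 102 sketch: bucket each character recognition result under every
    preset type whose keyword it contains; a result may match several types."""
    buckets = defaultdict(list)
    for text in recognition_results:
        for type_id, keywords in TYPE_KEYWORDS.items():
            if any(kw in text for kw in keywords):
                buckets[type_id].append(text)
    return dict(buckets)
```

The bucket for each type then holds the K character recognition results that the later voting step reduces to a single output.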
Step 103, voting on the K character recognition results corresponding to the J-th preset version number information type to determine the character recognition result to be output corresponding to the J-th preset version number information type.
Step 103 may be implemented by the following steps:
voting on the K character recognition results corresponding to the J-th preset version number information type to obtain a voting result for each character recognition result;
and determining the character recognition result to be output corresponding to the J-th preset version number information type according to the voting result of each character recognition result.
Since characters are recognized by a character recognition technology, misrecognition may occur. The K character recognition results corresponding to a preset version number information type obtained in the above steps therefore need to be voted on, to improve the correctness of the character recognition result to be output for that type.
For example, suppose the character recognition results corresponding to type 4 include character recognition result 4 and character recognition result 5, and there are 100 target video frame images. Voting is carried out over character recognition result 4 and character recognition result 5: if result 4 appears 95 times in total and result 5 appears 5 times in total, the character recognition result to be output corresponding to type 4 is determined to be character recognition result 4.
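The voting in step 103 reduces, for each type, the K candidate results to the single most frequent one. A minimal majority-vote sketch:

```python
from collections import Counter

def vote(candidates):
    """Step 103 sketch: among the K character recognition results collected
    for one preset version number information type, output the one that was
    recognized most often across the target video frame images."""
    winner, count = Counter(candidates).most_common(1)[0]
    return winner

# Mirroring the example: result 4 seen 95 times, result 5 seen 5 times.
chosen = vote(["result 4"] * 95 + ["result 5"] * 5)
```

Because OCR misreads tend to vary from frame to frame while the correct reading repeats, a simple frequency vote is usually enough to suppress them.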
Step 104, obtaining the target version number information of the target video according to the character recognition results to be output corresponding to the N preset version number information types.
Here M, N, and K are integers greater than or equal to 1, and J is greater than or equal to 1 and less than or equal to N.
Continuing the example above, the character recognition result to be output corresponding to type 1 is character recognition result 1, that for type 2 is result 2, that for type 3 is result 3, and that for type 4 is result 4; the obtained target version number information thus comprises character recognition results 1, 2, 3, and 4.
According to the version number information identification method provided by the embodiment of the invention, character recognition is performed on M target video frame images of the target video to obtain the character recognition result of each target video frame image; the character recognition results of the M target video frame images are classified according to the keywords of N preset version number information types to obtain K character recognition results corresponding to the J-th preset version number information type; the K character recognition results corresponding to the J-th preset version number information type are voted on to determine the character recognition result to be output for that type; and the target version number information of the target video is obtained according to the character recognition results to be output corresponding to the N preset version number information types. The record number picture therefore does not need to be located manually to acquire the version number information of a video, which improves the efficiency of acquiring that information.
Referring to fig. 3, fig. 3 is a flowchart illustrating steps of another version number information identifying method according to an embodiment of the present invention, where the method includes the following steps:
step 301, extracting a first video frame image of a P frame from video frame images of a first time period every a first preset time period within the first time period of the target video.
The first time period is, for example, [ t1, t2], for example t1=0, t2=30, and the time unit is seconds. The first preset duration is 1 second, and P is equal to 1, then 1 frame of first video frame image can be extracted from video images of 0 to 30 seconds every 1 second, and then 30 frames of first video frame images can be extracted in total. The first video frame image may be a key frame image, i.e. 30 frames of key frame images are extracted.
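The sampling of step 301 can be sketched as computing the timestamps at which frames are taken; actually decoding frames at those timestamps (for example with OpenCV or ffmpeg) is left out, and spreading the P frames evenly inside each interval is an assumption of this sketch, not something the text fixes.

```python
def sample_timestamps(t1, t2, interval, p):
    """Step 301 sketch: timestamps (in seconds) of the P first video frame
    images extracted per first preset duration within [t1, t2)."""
    stamps = []
    t = t1
    while t < t2:
        for i in range(p):
            # spread the P frames evenly inside [t, t + interval)
            stamps.append(t + i * interval / p)
        t += interval
    return stamps

# The example in the text: seconds 0 to 30, 1-second interval, P = 1
stamps = sample_timestamps(0, 30, 1, 1)   # 30 timestamps: 0, 1, ..., 29
```

The same routine with a larger per-interval count serves the denser sampling of the second time period described later (S frames per second instead of P).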
Step 302, extracting image features of each frame of first video frame image in all first video frame images.
Wherein the image features of the first video frame image may comprise global features and local region features.
Step 303, determining a second time period according to the image features of the q-th first video frame image among all the first video frame images and the image features of the T reference video frame images in a feature database.
Step 303 may be implemented in the following two ways:
The first mode comprises the following steps:
determining the global feature difference between the first global feature of the q-th first video frame image and the second global feature of each of the T reference video frame images, wherein the first global feature belongs to the image features of the q-th first video frame image and the second global features belong to the image features of the T reference video frame images;
determining, from the resulting T global feature differences, the minimum global feature difference corresponding to the q-th first video frame image;
determining the q-th first video frame image to be a head image of the target video under the condition that the minimum global feature difference is less than or equal to a first preset threshold;
and determining the second time period according to the time points corresponding to all the determined head images.
For example, suppose there are 30 first video frame images in total, so that q may take the values 1, 2, 3, …, 30. For q = 1: if the global feature difference between the first global feature of the 1st first video frame image and the second global feature of the 1st of the T reference video frame images is the smallest, that difference is the minimum global feature difference corresponding to the 1st first video frame image, and the 1st reference video frame image is the nearest-neighbour image of the 1st first video frame image. If instead the smallest global feature difference is with the 2nd reference video frame image, that difference is the minimum global feature difference corresponding to the 1st first video frame image, and the 2nd reference video frame image is its nearest-neighbour image.
When the minimum global feature difference of the 1st first video frame image is less than or equal to the first preset threshold, the 1st first video frame image is determined to be a head image of the target video.
In the same way it can be judged in turn whether the 2nd first video frame image is a head image of the target video, then the 3rd, and so on up to the 30th.
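A minimal sketch of the first mode's global-feature test follows. Treating features as plain float vectors compared by L2 distance is an assumption of the sketch: the patent does not fix the feature type or the difference metric.

```python
def min_global_difference(frame_feature, reference_features):
    """Find the nearest neighbour of one first video frame image among the
    T reference video frame images by global feature difference (L2 here)."""
    def l2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    diffs = [l2(frame_feature, ref) for ref in reference_features]
    min_diff = min(diffs)
    return min_diff, diffs.index(min_diff)

def is_head_image(frame_feature, reference_features, first_threshold):
    """The q-th first video frame image is a head image when its minimum
    global feature difference is <= the first preset threshold."""
    min_diff, _ = min_global_difference(frame_feature, reference_features)
    return min_diff <= first_threshold
```

The returned index identifies the nearest-neighbour reference image, which the optional local-region check described next uses as the target reference video image.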
Optionally, before determining that the q-th first video frame image is a head image of the target video, the method may further include the following steps:
determining the local region feature difference between the t-th first local region feature of the q-th first video frame image and the t-th second local region feature of the target reference video image, wherein the target reference video image is determined according to the global feature differences between the first global feature of the q-th first video frame image and the second global features of the T reference video frame images;
determining the weighted value of the s local region feature differences according to the s local region feature differences and the weight corresponding to each of the s local region feature differences;
Correspondingly, the step of determining the q-th first video frame image to be a head image of the target video when the minimum global feature difference is less than or equal to the first preset threshold may be implemented as follows:
determining the q-th first video frame image to be a head image of the target video under the condition that the minimum global feature difference is less than or equal to the first preset threshold and the weighted value is less than or equal to a second preset threshold;
Here s is an integer greater than 1, and t is greater than or equal to 1 and less than or equal to s. The q-th first video frame image comprises s first local regions, and the target reference video image, which is the reference video image corresponding to the minimum global feature difference, comprises s second local regions. The t-th first local region feature is the feature of the t-th first local region among the s first local regions, the t-th second local region feature is the feature of the t-th second local region among the s second local regions, and the t-th first local region corresponds to the t-th second local region.
In this embodiment, the second preset threshold may be greater than the first preset threshold, for example, the second preset threshold is equal to 1.5 times the first preset threshold. The second preset threshold may also be equal to the first preset threshold.
It should be noted that the target reference video image is determined according to the global feature differences between the first global feature of the q-th first video frame image and the second global features of the T reference video frame images. For example, if the global feature difference between the first global feature of the 1st first video frame image and the second global feature of the 1st of the T reference video frame images is the smallest, the 1st reference video frame image is the target reference video image corresponding to the 1st first video frame image; that is, the nearest-neighbour image of the 1st first video frame image is its target reference video image. Likewise, the target reference video image corresponding to the 2nd first video frame image can be determined: if the global feature difference between the first global feature of the 2nd first video frame image and the second global feature of the 3rd of the T reference video frame images is the smallest, the 3rd reference video frame image is the target reference video image corresponding to the 2nd first video frame image.
For example, if s equals 9 and the target reference image comprises 3 × 3 = 9 second local regions, the q-th first video frame image is likewise divided into 3 × 3 = 9 first local regions. The 1st first local region then corresponds to the 1st second local region, i.e. the two regions have the same size and position, for example both lie in row 1, column 1 of the image.
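The weighted value over the s local region feature differences can be sketched as below. Per-region L2 differences and externally supplied weights are assumptions consistent with, but not fixed by, the text.

```python
def weighted_local_difference(first_regions, second_regions, weights):
    """Weighted value of the s local region feature differences between the
    q-th first video frame image (s first local regions) and its target
    reference video image (s corresponding second local regions).
    Each region feature is a plain float vector; L2 per region is assumed."""
    total = 0.0
    for fa, fb, w in zip(first_regions, second_regions, weights):
        region_diff = sum((x - y) ** 2 for x, y in zip(fa, fb)) ** 0.5
        total += w * region_diff
    return total
```

Comparing corresponding regions lets some image regions (for example the centre, where the head text usually sits) count more than others via their weights.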
In this embodiment, determining the second time period according to the time points corresponding to the determined head images may be implemented by:
determining the minimum time point and the maximum time point among the time points corresponding to the determined head images;
and determining the second time period according to the minimum time point, the maximum time point, and P.
For example, the minimum time point may be denoted d and the maximum time point f, with the start time point of the second time period denoted t3 and its end time point t4.
The second mode comprises the following steps:
determining that the q-th first video frame image is not a head image of the target video when the minimum global feature difference is greater than the first preset threshold, or when the minimum global feature difference is less than or equal to the first preset threshold but the weighted value is greater than the second preset threshold;
and determining the first time period to be the second time period in the case that none of the first video frame images is a head image of the target video.
It should be noted that, in the case that none of the first video frame images is a head image of the target video, the first time period is determined to be the second time period; in the subsequent step, the M target video frame images can then be determined according to the video frame images in the second time period.
In this embodiment, if the minimum global feature difference is greater than the first preset threshold in the process of executing the first mode, it is determined that the first video frame image of the q-th frame is not the head image of the target video. Or if the minimum global feature difference is smaller than or equal to a first preset threshold value and the weighted value is larger than a second preset threshold value, determining that the first video frame image of the q-th frame is not the film head image of the target video.
The q-th first video frame image is also determined not to be a head image of the target video when the minimum global feature difference is less than or equal to the first preset threshold but the weighted value is greater than the second preset threshold. That is, when the minimum global feature difference is less than or equal to the first preset threshold, the first local region features of the q-th first video frame image can be further compared with the second local region features of the target reference video image to determine the local region feature differences, from which the weighted value is computed. Judging whether the q-th first video frame image is a head image of the target video according to both the global feature difference and the weighted value of the local region feature differences improves the identification accuracy of head images of the target video.
It should be noted that the second preset threshold may be 1.5 times the first preset threshold, or may be less than or equal to the first preset threshold; the relative magnitudes of the first preset threshold and the second preset threshold are not specifically limited in this embodiment.
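The two-stage decision described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the threshold values, feature-difference lists and weights are all illustrative placeholders.

```python
# Hypothetical sketch of the two-stage film-head-image decision: global
# features first, then a weighted sum of local region feature differences.
# All thresholds and inputs are illustrative, not taken from the patent.

def is_title_image(global_diffs, local_diffs, local_weights,
                   first_threshold=0.3, second_threshold=0.45):
    """Return True if the frame is judged to be a film head (title) image.

    global_diffs  -- global feature differences against the T reference images
    local_diffs   -- s local region feature differences against the best match
    local_weights -- weight of each local region (same length as local_diffs)
    """
    min_global = min(global_diffs)
    if min_global > first_threshold:      # stage 1: global features reject
        return False
    weighted = sum(d * w for d, w in zip(local_diffs, local_weights))
    return weighted <= second_threshold   # stage 2: local features confirm

# A frame whose best global match is close and whose local regions agree:
print(is_title_image([0.8, 0.2, 0.5], [0.1, 0.2], [0.6, 0.4]))  # → True
```

A frame is rejected either at stage 1 (no close global match) or at stage 2 (local regions disagree despite a close global match), mirroring the two rejection cases described above.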
Step 304, determining M frames of target video frame images according to the video frame images within the second time period.
Wherein P and T are integers greater than or equal to 1, q is greater than or equal to 1 and less than or equal to n, n is the total number of frames of all the first video frame images, and the T reference video frame images include a film head image of at least one video.
Step 304, determining M frames of target video frame images according to the video frame images within the second time period, may be implemented as follows:
extracting S frames of second video frame images from the video frame images of the second time period every second preset duration within the second time period;
and taking all the second video frame images as the M frames of target video frame images.
In this embodiment, the value of S may be greater than the value of P. For example, when P equals 1 and S equals 3, 3 frames of second video frame images are extracted per second; the second video frame images may be key frame images, i.e., 3 key frame images are extracted per second.
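The sampling scheme can be sketched as below. This is an assumption-laden illustration: it only computes sampling timestamps with evenly spaced frames per interval, whereas a real implementation would pull key frames from the video decoder.

```python
# Sketch of interval-based frame sampling: every `interval` seconds within
# [start, end), take `frames_per_step` frames. Evenly spaced timestamps
# stand in for actual key-frame extraction.

def sample_frames(start, end, interval, frames_per_step):
    """Return sampling timestamps for the given time period (seconds)."""
    points = []
    t = start
    while t < end:
        step = min(interval, end - t)
        for i in range(frames_per_step):
            points.append(round(t + i * step / frames_per_step, 3))
        t += interval
    return points

# S = 3 frames per second over a 2-second window:
print(sample_frames(0.0, 2.0, 1.0, 3))
# → [0.0, 0.333, 0.667, 1.0, 1.333, 1.667]
```

Choosing S larger than P (the coarser rate used for film-head detection) matches the text: detection needs only a sparse scan, while OCR benefits from denser sampling inside the confirmed second time period.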
Step 305, performing character recognition on the M frames of target video frame images of the target video to obtain a character recognition result for each frame of target video frame image.
Step 306, filtering invalid character recognition results out of the character recognition results of the M frames of target video frame images according to j preset version number information identifiers.
An invalid character recognition result is a character recognition result that does not include any version number information identifier.
For example, words such as "certificate", "number", "word", "office" and "unit" may be used as version number information identifiers; if a character recognition result contains none of these words, it may be filtered out. Alternatively, several constraints may be combined: for example, if one recognition result contains only text while another contains both text and digits, the text-only result may be filtered out and the result containing both text and digits retained. A similar-glyph filter may also be combined: for example, the glyph of "宇" resembles that of "字", so a recognition result containing "宇" may still be retained.
Filtering out invalid character recognition results in this step reduces the amount of computation needed for classifying and voting on the recognition results, further improving the efficiency of version number information identification.
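The identifier-based filter can be sketched as follows. The identifier list uses the translated example words above and is illustrative only; a real system would use the original-language identifiers.

```python
# Sketch of the invalid-result filter: keep only OCR results that contain
# at least one version number information identifier. The identifier list
# is illustrative (translated from the patent's examples).
VERSION_IDS = ["certificate", "number", "word", "office", "unit"]

def filter_invalid(results, identifiers=VERSION_IDS):
    """Keep only recognition results containing at least one identifier."""
    return [r for r in results if any(tok in r.lower() for tok in identifiers)]

texts = ["Film Release Certificate No. 123",
         "starring credits",
         "Issuing unit: XXX office"]
print(filter_invalid(texts))
```

Only the first and third strings survive: each contains an identifier word, while "starring credits" contains none and is dropped before the costlier classification and voting stages.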
Step 307, classifying the effective character recognition results according to the keywords of the N preset version number information types to obtain K character recognition results corresponding to the J-th preset version number information type.
The effective character recognition results are the character recognition results of the M frames of target video frame images other than the invalid character recognition results.
Step 308, voting on the K character recognition results corresponding to the J-th preset version number information type to determine the character recognition result to be output corresponding to the J-th preset version number information type.
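The voting step can be sketched as a simple majority vote over the K per-frame results of one type. The patent does not specify the voting rule, so majority counting here is an assumption; in this sketch ties resolve to the earliest-seen candidate.

```python
# Sketch of per-type voting: across the K frames, the most frequently
# recognized string for a given version number information type wins.
from collections import Counter

def vote(recognitions):
    """Return the majority text among the K recognition results of one type."""
    counts = Counter(recognitions)
    return counts.most_common(1)[0][0]

# Two frames read the number correctly, one confused "0" with "O":
print(vote(["License No. 001", "License No. 001", "License No. 0O1"]))
```

Voting across frames suppresses per-frame OCR noise: a misread in a single frame cannot outvote consistent readings from the other frames.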
Step 309, determining a target character recognition result from the N character recognition results to be output according to the character recognition results to be output corresponding to the N preset version number information types and the position information of each character recognition result to be output in the target video frame image.
For example, referring to fig. 4, fig. 4 is a schematic diagram of version number information of another movie video according to an embodiment of the present invention. Suppose the character recognition results to be output corresponding to 4 preset version number information types are results A, B, C and D, located at position 1, position 2, position 3 and position 4 in the target video frame image, respectively. If the longitudinal distance between position 1 and position 2 is greater than threshold 1 and the transverse distance is greater than threshold 4, the longitudinal distance between position 1 and position 3 is greater than threshold 2 and the transverse distance is greater than threshold 5, and the longitudinal distance between position 1 and position 4 is greater than threshold 3 and the transverse distance is greater than threshold 6, then the character recognition result to be output at position 1 can be considered abnormal; it is determined to be the target character recognition result and is filtered out. Thresholds 1 through 6 can be set according to actual requirements.
It should be noted that under normal conditions the positions of the character recognition results to be output are relatively concentrated: the abscissas of their center positions are equal or differ only slightly, and the ordinates of the center positions of two adjacent results do not differ greatly. That is, the positions of adjacent results exhibit spatial coherence. Based on the position information it can therefore be determined whether any character recognition result to be output lies at an abnormal position, and if so, that result is filtered out.
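The spatial-coherence check can be sketched as below. A result is flagged when its center is far from every other result's center in both axes, as in the fig. 4 example; the per-axis distance thresholds are illustrative values, not from the patent.

```python
# Sketch of the abnormal-position filter: flag any result whose center is
# isolated from all other results' centers in both axes. Thresholds are
# illustrative placeholders.

def outlier_indices(centers, max_dx=50, max_dy=120):
    """Return indices of results at abnormal (isolated) positions."""
    flagged = []
    for i, (xi, yi) in enumerate(centers):
        isolated = all(abs(xi - xj) > max_dx and abs(yi - yj) > max_dy
                       for j, (xj, yj) in enumerate(centers) if j != i)
        if isolated:
            flagged.append(i)
    return flagged

# Three vertically stacked license lines plus one stray detection:
print(outlier_indices([(400, 100), (405, 140), (398, 180), (900, 600)]))
# → [3]
```

The three clustered lines protect one another (each has a close neighbor in at least one axis), while the stray detection at (900, 600) is isolated in both axes and becomes the target result to filter out.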
Step 310, filtering out the target character recognition result from the N character recognition results to be output.
Step 311, obtaining the target version number information according to the other character recognition results to be output among the N character recognition results to be output.
The other character recognition results to be output are the character recognition results to be output other than the target character recognition result among the N character recognition results to be output.
Determining the target character recognition result, filtering it out, and then obtaining the target version number information from the remaining character recognition results to be output further improves the accuracy of the obtained target version number information.
It should be noted that when, according to the position information of at least two of the other character recognition results to be output, it is determined that those results all belong to the same version number information type, the at least two character recognition results to be output are merged. For example, if character recognition result A to be output is "XXX part" and character recognition result B to be output is "XXX office", and both belong to the same version number information type, namely the issuing-unit type of the version number information, then result A and result B are merged.
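The merge step can be sketched as grouping results by their classified type. The pairing of text with a type label and the space-joined merge are assumptions for illustration; the patent merges based on position and type without specifying the join rule.

```python
# Sketch of merging recognition results that belong to the same version
# number information type. Each input pairs a recognized text with its
# classified type label (labels are illustrative).

def merge_same_type(results):
    """Merge texts sharing a type into one string per type."""
    grouped = {}
    for text, info_type in results:
        grouped.setdefault(info_type, []).append(text)
    return {t: " ".join(parts) for t, parts in grouped.items()}

pairs = [("XXX part", "issuing-unit"),
         ("XXX office", "issuing-unit"),
         ("No. 123", "license-number")]
print(merge_same_type(pairs))
```

The two issuing-unit fragments ("XXX part", "XXX office") collapse into a single issuing-unit string, matching the example of merging results A and B above.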
In this embodiment, after the target version number information is obtained, it may be further determined whether the target version number information contains a character recognition error; if so, the erroneous character in the target version number information is corrected. For example, if the target version number information contains "宇", the "宇" is corrected to "字"; other characters misrecognized as visually similar glyphs are corrected to the intended characters in the same way. This reduces the workload of manually checking whether the target version number information is correct.
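The correction step can be sketched as a look-alike substitution table. The table contents are illustrative assumptions (the "宇" → "字" pair from the text plus a Latin/digit confusion); a real table would be built from the system's observed OCR confusions.

```python
# Sketch of look-alike character correction over the final version number
# string. The mapping is illustrative, not the patent's table.
CORRECTIONS = {"宇": "字", "O": "0"}

def correct(text, table=CORRECTIONS):
    """Replace known look-alike misrecognitions with the intended character."""
    for wrong, right in table.items():
        text = text.replace(wrong, right)
    return text

print(correct("Certificate No. 1O1"))  # → Certificate No. 101
```

Applying the table after voting and merging means each confusion pair needs to be handled only once, on the final string, rather than per frame.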
Optionally, after step 311, the following steps may be further included:
determining a starting time point and an ending time point of the film head images of the target video according to the time point in the target video of each of the N character recognition results to be output;
displaying the target version number information when none of the first video frame images is a film head image of the target video;
and, when operation information for operating the target control is acquired, determining a target film head image from the film head images in a third time period between the starting time point and the ending time point, and storing the target film head image, the global features of the target film head image, and the local region features of the target film head image in the feature database.
After the starting time point and the ending time point of the film head images of the target video are determined, they can be recorded in the film information management system, so that a user can quickly locate the film head images of the target video according to the recorded starting and ending time points.
In this embodiment, when none of the first video frame images is a film head image of the target video, the first time period is taken as the second time period, the M frames of target video frame images are then determined from the video frame images within the second time period, and the target version number information of the target video is obtained subsequently. This means the reference video frame image samples recorded in the feature database are not sufficiently comprehensive. In that case the target version number information may be displayed, and once confirmation of the target version number information is obtained, the target film head image may be determined from the film head images of the third time period, and the target film head image, its global features, and its local region features stored in the feature database. In this way reference video frame images, together with their global features and local region features, are added to the feature database automatically, further completing the database's data, so that the feature database can be updated dynamically based on detected film head images. Compared with purely manually collecting various types of film head images to construct the feature database, this reduces the operation and maintenance cost of the feature database.
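The dynamic database update can be sketched minimally as below. All field names and the in-memory list are illustrative assumptions; the patent only requires that the confirmed title image and its global and local region features be stored for future matching.

```python
# Minimal sketch of the dynamic feature-database update: after the user
# confirms the displayed version number information, the chosen film head
# image and its features are appended so later videos can match against it.
# Field names and storage are illustrative.

feature_db = []  # stand-in for the persistent reference-image store

def add_reference(image_id, global_feature, local_features):
    """Append one confirmed film head image's features to the database."""
    feature_db.append({
        "image_id": image_id,
        "global": global_feature,
        "local": local_features,
    })

add_reference("title_0042", [0.12, 0.88], [[0.3], [0.7]])
print(len(feature_db))  # → 1
```

Each confirmed detection grows the reference set, which is the claimed advantage over a purely hand-built database.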
It should be noted that, after the step 311 obtains the target version number information according to the other text recognition results to be output in the N text recognition results to be output, the method may further include the following steps:
and storing the target video and the target version number information of the target video.
In this embodiment, the target version number information may be input into the video information management system for digital management of video information, so that a user can conveniently query the version number information of a video through the video information management system.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a version number information identifying apparatus according to an embodiment of the present invention, including:
the first obtaining module 510 is configured to perform text recognition on M frame target video frame images of a target video, and obtain a text recognition result of each frame of the target video frame images;
the second obtaining module 520 is configured to classify the text recognition results of the M frames of the target video frame image according to the keywords of the N preset version number information types, and obtain K text recognition results corresponding to the J-th preset version number information type;
a first determining module 530, configured to vote on K text recognition results corresponding to the J-th preset version number information type, so as to determine a text recognition result to be output corresponding to the J-th preset version number information type;
A third obtaining module 540, configured to obtain target version number information of the target video according to the text recognition results to be output corresponding to the N preset version number information types;
wherein M, N, K is an integer greater than or equal to 1, and J is greater than or equal to 1 and less than or equal to N.
Optionally, the method further comprises:
the first filtering module is used for filtering invalid text recognition results out of the text recognition results of the M frames of the target video frame images according to j preset version number information identifiers, wherein an invalid text recognition result is a text recognition result which does not comprise any version number information identifier;
the second obtaining module 520 is specifically configured to classify the valid text recognition results according to the keywords of the N preset version number information types, and obtain K text recognition results corresponding to the J preset version number information types;
the effective character recognition result is a character recognition result except the ineffective character recognition result in the character recognition results of the M frames of the target video frame images.
Optionally, the method further comprises:
the second determining module is used for determining a target character recognition result from the N character recognition results to be output according to the character recognition results to be output corresponding to the N preset version number information types and the position information of each character recognition result to be output in the target video frame image;
The second filtering module is used for filtering the target character recognition results in the N character recognition results to be output;
the third obtaining module 540 is specifically configured to obtain the target version number information according to other text recognition results to be output in the N text recognition results to be output;
and the other character recognition results to be output are the character recognition results to be output except the target character recognition result in the N character recognition results to be output.
Optionally, the method further comprises:
the first extraction module is used for extracting P frames of first video frame images from the video frame images of a first time period every first preset duration within the first time period of the target video;
the second extraction module is used for extracting the image characteristics of each frame of first video frame image in all the first video frame images;
the third determining module is used for determining a second time period according to the image characteristics of the first video frame image of the q-th frame in all the first video frame images and the image characteristics of the reference video frame image of the T frame in the characteristic database;
a fourth determining module, configured to determine M frames of the target video frame images according to the video frame images in the second time period;
Wherein P and T are integers greater than or equal to 1, q is greater than or equal to 1 and less than or equal to n, n is the total frame number of all the first video frame images, and the T reference video frame images comprise a film head image of at least one video.
Optionally, the third determining module includes:
a first determining unit, configured to determine the global feature difference between the first global feature of the q-th first video frame image and the second global feature of each reference video frame image according to the first global feature in the image features of the q-th first video frame image and the second global feature in the image features of each of the T reference video frame images;
a second determining unit, configured to determine, from the T global feature differences, the minimum global feature difference corresponding to the q-th first video frame image according to the global feature differences between the first global feature of the q-th first video frame image and the second global features of the T reference video frame images;
a third determining unit, configured to determine, when the minimum global feature difference is less than or equal to a first preset threshold, that the q-th first video frame image is a film head image of the target video;
and a fourth determining unit, configured to determine the second time period according to the time point corresponding to each of the determined film head images.
Optionally, the fourth determining unit is specifically configured to determine a minimum time point and a maximum time point from the time points corresponding to each of the determined film head images, and to determine the second time period according to the minimum time point, the maximum time point and P.
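One plausible realization of the fourth determining unit is sketched below. The patent only says the second time period is determined from the minimum time point, the maximum time point and P; padding the range by the sampling stride (interval / P) is an assumption, chosen so that frames falling between sparse samples are not missed.

```python
# Sketch of deriving the second time period from detected film-head time
# points. The stride-based padding is an assumption, not the patent's rule.

def second_time_period(title_times, P, interval=1.0):
    """Return (start, end) padded by the sampling stride interval / P."""
    pad = interval / P
    return (min(title_times) - pad, max(title_times) + pad)

# Film head images detected at 10 s, 12 s and 15 s, with P = 2 per second:
print(second_time_period([10.0, 12.0, 15.0], P=2, interval=1.0))
# → (9.5, 15.5)
```

The padded bounds guarantee the period covers any film head frames lying between the P-per-interval sample points at its edges.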
Optionally, the third determining module further includes:
a fifth determining unit, configured to determine the local area feature difference between the t-th first local area feature of the q-th first video frame image and the t-th second local area feature of the target reference video image corresponding to the q-th first video frame image, where the target reference video image is determined according to the global feature differences between the first global feature of the q-th first video frame image and the second global features of the T reference video frame images;
a sixth determining unit, configured to determine the weighted value of the s local area feature differences according to the s local area feature differences and the weight corresponding to each of the s local area feature differences;
the third determining unit is specifically configured to determine that the q-th first video frame image is a film head image of the target video when the minimum global feature difference is less than or equal to the first preset threshold and the weighted value is less than or equal to the second preset threshold;
wherein t is greater than or equal to 1 and less than or equal to s, the t-th first local area feature is the feature of the t-th local area among the s first local area features, the t-th second local area feature is the feature of the t-th local area among the s second local area features, and the t-th first local area corresponds to the t-th second local area.
Optionally, the third determining module is specifically configured to determine that the q-th first video frame image is not a film head image of the target video when the minimum global feature difference is greater than the first preset threshold, or when the minimum global feature difference is less than or equal to the first preset threshold and the weighted value is greater than the second preset threshold;
and to determine the first time period to be the second time period when none of the first video frame images is a film head image of the target video.
Optionally, the fourth determining module is specifically configured to extract S frames of second video frame images from the video frame images of the second time period every second preset duration within the second time period;
and to take all the second video frame images as the M frames of target video frame images.
Optionally, the method further comprises:
a fifth determining module, configured to determine a start time point and an end time point of a film head image of the target video according to a time point of each word recognition result to be output in the target video in the N word recognition results to be output;
the display module is used for displaying the target version number information under the condition that all the first video frame images are not the film head images of the target video;
and a sixth determining module, configured to determine, when operation information for operating the target control is acquired, a target film head image from the film head images in a third time period between the starting time point and the ending time point, and store the target film head image, the global features of the target film head image, and the local region features of the target film head image in the feature database.
An embodiment of the present invention further provides an electronic device. As shown in fig. 6, fig. 6 is a schematic structural diagram of the electronic device provided in an embodiment of the present invention. The electronic device comprises a processor 601, a communication interface 602, a memory 603 and a communication bus 604, wherein the processor 601, the communication interface 602 and the memory 603 communicate with one another via the communication bus 604.
A memory 603 for storing a computer program;
the processor 601 is configured to execute the program stored in the memory 603, and implement the following steps:
performing character recognition on M frames of target video frame images of a target video to obtain a character recognition result of each frame of target video frame image;
classifying the character recognition results of the M frames of target video frame images according to the keywords of the N preset version number information types to obtain K character recognition results corresponding to the J th preset version number information type;
voting K character recognition results corresponding to the J-th preset version number information type to determine character recognition results to be output corresponding to the J-th preset version number information type;
obtaining target version number information of the target video according to character recognition results to be output corresponding to N preset version number information types;
wherein M, N, K is an integer greater than or equal to 1, and J is greater than or equal to 1 and less than or equal to N.
The communication bus mentioned for the above terminal may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be classified as an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the terminal and other devices.
The memory may include random access memory (Random Access Memory, RAM) or non-volatile memory (non-volatile memory), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU for short), a network processor (Network Processor, NP for short), etc.; it may also be a digital signal processor (Digital Signal Processing, DSP for short), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC for short), a field programmable gate array (Field-Programmable Gate Array, FPGA for short) or another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
In yet another embodiment of the present invention, a computer readable storage medium is provided, in which instructions are stored, which when run on a computer, cause the computer to perform the version number information identifying method according to any one of the above embodiments.
In yet another embodiment of the present invention, a computer program product containing instructions that, when run on a computer, cause the computer to perform the method of identifying version number information as described in any of the above embodiments is also provided.
In the above embodiments, the methods may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, the computer instructions produce, in whole or in part, a flow or function in accordance with embodiments of the present invention. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example by wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center containing an integration of one or more available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), etc.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In this specification, each embodiment is described in a related manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, as relevant to see a section of the description of method embodiments.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims (12)

1. A version number information identification method, characterized by comprising:
extracting P frames of first video frame images from the video frame images of a first time period every first preset duration within the first time period of a target video;
extracting image characteristics of each frame of first video frame image in all the first video frame images;
determining a second time period according to the image characteristics of the first video frame image of the q-th frame in all the first video frame images and the image characteristics of the reference video frame image of the T frame in the characteristic database;
determining M frames of target video frame images according to the video frame images in the second time period;
performing character recognition on M frames of target video frame images of a target video to obtain a character recognition result of each frame of target video frame image;
classifying the character recognition results of the M frames of target video frame images according to the keywords of the N preset version number information types to obtain K character recognition results corresponding to the J th preset version number information type;
Voting K character recognition results corresponding to the J-th preset version number information type to determine character recognition results to be output corresponding to the J-th preset version number information type;
obtaining target version number information of the target video according to character recognition results to be output corresponding to N preset version number information types;
wherein P and T are integers greater than or equal to 1, q is greater than or equal to 1 and less than or equal to n, n is the total frame number of all the first video frame images, the T reference video frame images comprise a film head image of at least one video, M, N and K are integers greater than or equal to 1, and J is greater than or equal to 1 and less than or equal to N.
2. The method according to claim 1, wherein before classifying the text recognition results of the M frames of the target video frame image according to the keywords of the N preset version number information types to obtain K text recognition results corresponding to the J-th preset version number information type, further comprises:
filtering invalid text recognition results out of the text recognition results of the M frames of the target video frame images according to j preset version number information identifiers, wherein an invalid text recognition result is a text recognition result which does not comprise any version number information identifier;
Classifying the character recognition results of the M frames of the target video frame images according to the keywords of the N preset version number information types to obtain K character recognition results corresponding to the J th preset version number information type, wherein the method comprises the following steps:
classifying the effective character recognition results according to the keywords of the N preset version number information types to obtain K character recognition results corresponding to the J-th preset version number information type;
the effective character recognition result is a character recognition result except the ineffective character recognition result in the character recognition results of the M frames of the target video frame images.
3. The method according to claim 1, further comprising, before the obtaining the target version number information of the target video according to the text recognition results to be output corresponding to the N preset version number information types:
determining a target text recognition result from the N text recognition results to be output according to the text recognition results to be output corresponding to the N preset version number information types and the position information of each text recognition result to be output in the target video frame images;
filtering out the target text recognition result from the N text recognition results to be output;
wherein the obtaining the target version number information according to the text recognition results to be output corresponding to the N preset version number information types comprises:
obtaining the target version number information according to the other text recognition results to be output among the N text recognition results to be output;
wherein the other text recognition results to be output are the text recognition results to be output among the N text recognition results to be output other than the target text recognition result.
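Claim 3's position-based filtering might look like the following sketch, assuming normalized bounding-box centers and an invented "title region"; the region bounds, data shapes and function name are assumptions, not part of the claim:

```python
# Hypothetical sketch of position-based filtering: a result whose box
# center falls outside an assumed title region is the one to filter out.

def filter_by_position(results, title_region=(0.0, 0.6, 1.0, 1.0)):
    """results: list of (text, (cx, cy)) with normalized centers.
    Returns (kept, dropped) text lists."""
    x0, y0, x1, y1 = title_region
    kept, dropped = [], []
    for text, (cx, cy) in results:
        (kept if x0 <= cx <= x1 and y0 <= cy <= y1 else dropped).append(text)
    return kept, dropped
```

Any other position rule (e.g. overlap with a watermark area) would slot into the same structure.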
4. The method of claim 1, wherein the determining the second time period according to the image features of the q-th first video frame image and the image features of the T frames of reference video frame images in the feature database comprises:
determining, for each of the T frames of reference video frame images, the global feature difference between the first global feature in the image features of the q-th first video frame image and the second global feature in the image features of that reference video frame image, to obtain T global feature differences;
determining, from the T global feature differences, the minimum global feature difference corresponding to the q-th first video frame image;
determining that the q-th first video frame image is an opening title image of the target video when the minimum global feature difference is less than or equal to a first preset threshold;
and determining the second time period according to the time points corresponding to all the determined opening title images.
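A minimal sketch of claim 4's nearest-reference test, assuming Euclidean (L2) distance over global feature vectors; the distance metric, threshold value and function name are assumptions, since the patent does not fix the feature form:

```python
import math

def match_opening_title(frame_feat, ref_feats, threshold=0.3):
    """Compare one frame's global feature against T reference features.
    Returns (is_title, index_of_nearest_reference)."""
    # One difference per reference frame, T in total.
    diffs = [math.dist(frame_feat, r) for r in ref_feats]
    m = min(diffs)
    # The frame is treated as an opening title image only if its
    # minimum difference is within the first preset threshold.
    return m <= threshold, diffs.index(m)
```

The time points of all frames that pass this test would then bound the second time period.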
5. The method of claim 4, wherein the determining the second time period according to the time points corresponding to all the determined opening title images comprises:
determining a minimum time point and a maximum time point from the time points corresponding to all the determined opening title images;
and determining the second time period according to the minimum time point, the maximum time point and P.
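Claim 5 leaves the exact role of P implicit; one plausible reading, sketched below, pads the [min, max] span of matched time points by one sampling interval on each side (the padding rule and function name are my own assumptions):

```python
def second_time_period(time_points, sampling_interval):
    """Assumed reading of claim 5: widen the span of matched
    opening-title time points by one sampling interval per side."""
    lo, hi = min(time_points), max(time_points)
    return max(0.0, lo - sampling_interval), hi + sampling_interval
```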
6. The method of claim 4 or 5, further comprising, before the determining that the q-th first video frame image is an opening title image of the target video:
determining, for each t from 1 to s, the local area feature difference between the t-th first local area feature of the q-th first video frame image and the t-th second local area feature of the target reference video frame image corresponding to the q-th first video frame image, wherein the target reference video frame image is determined according to the global feature differences between the first global feature of the q-th first video frame image and the second global features of the T frames of reference video frame images;
determining a weighted value of the s local area feature differences according to the s local area feature differences and the weight corresponding to each of the s local area feature differences;
wherein the determining that the q-th first video frame image is an opening title image of the target video when the minimum global feature difference is less than or equal to a first preset threshold comprises:
determining that the q-th first video frame image is an opening title image of the target video when the minimum global feature difference is less than or equal to the first preset threshold and the weighted value is less than or equal to a second preset threshold;
wherein the t-th second local area feature is the feature of the t-th local area among the s local areas of the target reference video frame image, and the t-th local area of the target reference video frame image corresponds to the t-th local area of the q-th first video frame image.
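Claim 6's two-threshold decision can be sketched as follows, assuming scalar per-region features and a weighted absolute difference; both are assumptions, as the patent does not fix the feature form or the difference measure:

```python
def is_opening_title(global_diff, local_a, local_b, weights,
                     t1=0.3, t2=0.5):
    """Claim 6 sketch: a frame is an opening title image only if the
    minimum global feature difference passes the first threshold AND
    the weighted sum of s local-area differences passes the second."""
    weighted = sum(w * abs(a - b)
                   for a, b, w in zip(local_a, local_b, weights))
    return global_diff <= t1 and weighted <= t2
```

With vector-valued region features, `abs(a - b)` would become a per-region distance; the two-threshold structure is unchanged.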
7. The method of claim 6, wherein the determining the second time period according to the image features of the q-th first video frame image and the image features of the T frames of reference video frame images in the feature database further comprises:
determining that the q-th first video frame image is not an opening title image of the target video when the minimum global feature difference is greater than the first preset threshold, or when the minimum global feature difference is less than or equal to the first preset threshold and the weighted value is greater than the second preset threshold;
and determining that the first time period is the second time period when none of the first video frame images is an opening title image of the target video.
8. The method of any of claims 1-5, wherein the determining the M frames of target video frame images from the video frame images within the second time period comprises:
extracting S frames of second video frame images from the video frame images of the second time period, once every second preset time period;
and taking all the second video frame images as the M frames of target video frame images.
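Claim 8's periodic sampling step, sketched with timestamps in seconds; representing frames by timestamps rather than frame indices is an assumption for illustration:

```python
def sample_frames(start, end, interval):
    """Pick one sampling point every `interval` seconds in [start, end).
    Each point would select the nearest video frame in practice."""
    t, points = start, []
    while t < end:
        points.append(round(t, 3))
        t += interval
    return points
```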
9. The method according to claim 6, further comprising, after the obtaining the target version number information of the target video according to the text recognition results to be output corresponding to the N preset version number information types:
determining a starting time point and an ending time point of the opening title images of the target video according to the time point, in the target video, of each of the N text recognition results to be output;
displaying the target version number information when none of the first video frame images is an opening title image of the target video;
and, when operation information for operating a target control is acquired, determining a target opening title image from the opening title images within a third time period from the starting time point to the ending time point, and storing the target opening title image, the global feature of the target opening title image and the local area features of the target opening title image in the feature database.
10. A version number information identifying apparatus, comprising:
a first extraction module, configured to extract P frames of first video frame images from the video frame images of a first time period of a target video, one every first preset time period within the first time period;
a second extraction module, configured to extract the image features of each of the first video frame images;
a third determining module, configured to determine a second time period according to the image features of the q-th first video frame image among all the first video frame images and the image features of T frames of reference video frame images in a feature database;
a fourth determining module, configured to determine M frames of target video frame images from the video frame images within the second time period;
a first obtaining module, configured to perform text recognition on the M frames of target video frame images of the target video to obtain a text recognition result of each target video frame image;
a second obtaining module, configured to classify the text recognition results of the M frames of target video frame images according to the keywords of N preset version number information types to obtain K text recognition results corresponding to the J-th preset version number information type;
a first determining module, configured to vote on the K text recognition results corresponding to the J-th preset version number information type to determine a text recognition result to be output corresponding to the J-th preset version number information type;
and a third obtaining module, configured to obtain target version number information of the target video according to the text recognition results to be output corresponding to the N preset version number information types;
wherein P and T are integers greater than or equal to 1, q is greater than or equal to 1 and less than or equal to n, n being the total number of frames of all the first video frame images, the T frames of reference video frame images comprise the opening title image of at least one video, M, N and K are integers greater than or equal to 1, and J is greater than or equal to 1 and less than or equal to N.
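The voting step performed by the first determining module can be sketched as a simple majority vote over the K recognized strings of one type; the patent does not specify a tie-breaking rule, so `Counter`'s insertion-order behavior stands in here:

```python
from collections import Counter

def vote(results):
    """Pick the most frequent recognized string among K candidates;
    returns None when there is nothing to vote on."""
    if not results:
        return None
    return Counter(results).most_common(1)[0][0]
```

Voting across frames suppresses per-frame OCR noise: a string misread in one frame is outvoted by consistent reads in the others.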
11. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
and the processor is configured to carry out the method steps of any one of claims 1-9 when executing the program stored on the memory.
12. A computer readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1-9.
CN202110568018.1A 2021-05-24 2021-05-24 Version number information identification method and device, electronic equipment and readable storage medium Active CN113221801B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110568018.1A CN113221801B (en) 2021-05-24 2021-05-24 Version number information identification method and device, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110568018.1A CN113221801B (en) 2021-05-24 2021-05-24 Version number information identification method and device, electronic equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN113221801A CN113221801A (en) 2021-08-06
CN113221801B true CN113221801B (en) 2023-08-18

Family

ID=77098374

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110568018.1A Active CN113221801B (en) 2021-05-24 2021-05-24 Version number information identification method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN113221801B (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09190504A (en) * 1996-01-12 1997-07-22 Canon Inc Character recognition method and device
JPH10198762A (en) * 1997-01-08 1998-07-31 Glory Ltd Pattern recognition and limited character string recognition system using neuro, and character pattern sorting device
WO2013170362A1 (en) * 2012-05-14 2013-11-21 In Situ Media Corporation Method and system of identifying non-distinctive images/objects in a digital video and tracking such images/objects using temporal and spatial queues
CN104871180A (en) * 2012-12-13 2015-08-26 高通股份有限公司 Text image quality based feedback for OCR
JP2016018421A (en) * 2014-07-09 2016-02-01 住友電気工業株式会社 Image classification device, number plate reading device, number plate reading system, and image classification method
WO2017016240A1 (en) * 2015-07-24 2017-02-02 广州广电运通金融电子股份有限公司 Banknote serial number identification method
CA3039239A1 (en) * 2018-04-06 2019-10-06 Deluxe Entertainment Services Group Inc. Conformance of media content to original camera source using optical character recognition
CN110390397A (en) * 2019-06-13 2019-10-29 成都信息工程大学 A text entailment recognition method and device
CN110532983A (en) * 2019-09-03 2019-12-03 北京字节跳动网络技术有限公司 Method for processing video frequency, device, medium and equipment
CN110751146A (en) * 2019-10-23 2020-02-04 北京印刷学院 Text area detection method, device, electronic terminal and computer-readable storage medium
CN111147891A (en) * 2019-12-31 2020-05-12 杭州威佩网络科技有限公司 Method, device and equipment for acquiring information of object in video picture
CN112153374A (en) * 2020-09-25 2020-12-29 腾讯科技(深圳)有限公司 Method, device and equipment for testing video frame image and computer storage medium
CN112381091A (en) * 2020-11-23 2021-02-19 北京达佳互联信息技术有限公司 Video content identification method and device, electronic equipment and storage medium
WO2021057138A1 (en) * 2019-09-27 2021-04-01 支付宝(杭州)信息技术有限公司 Certificate recognition method and apparatus
KR20210040323A (en) * 2020-06-28 2021-04-13 베이징 바이두 넷컴 사이언스 앤 테크놀로지 코., 엘티디. Method and apparatus for recognizing key identifier in video, device and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110139872A1 (en) * 2009-12-11 2011-06-16 International Lottery and Totalizator Systems, Inc System and Method for Capturing Write-In Selections on a Paper Ballot


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Identification of vinyl record catalogue numbers; 《实用影音技术》 (Practical Audio-Visual Technology), Issue 07; full text *

Also Published As

Publication number Publication date
CN113221801A (en) 2021-08-06

Similar Documents

Publication Publication Date Title
US20210124997A1 (en) Label assigning device, label assigning method, and computer program product
CN110941738B (en) Recommendation method and device, electronic equipment and computer-readable storage medium
JP6168996B2 (en) Content control method, content control apparatus, and program
US20150341771A1 (en) Hotspot aggregation method and device
US20100268604A1 (en) Method and system for providing information based on logo included in digital contents
CN112818984B (en) Title generation method, device, electronic equipment and storage medium
CN113158773B (en) Training method and training device for living body detection model
WO2019071966A1 (en) Crawler data-based user behavior analysis method, application server and readable storage medium
US11609897B2 (en) Methods and systems for improved search for data loss prevention
CN113283351A (en) Video plagiarism detection method using CNN to optimize similarity matrix
CN111768346A (en) Method, device and equipment for correcting back image of identity card and storage medium
CN113221801B (en) Version number information identification method and device, electronic equipment and readable storage medium
CN110427862B (en) Face picture collecting method and device and electronic equipment
CN111708988A (en) Infringement video identification method and device, electronic equipment and storage medium
CN118264866A (en) Subtitle erasing method and device and electronic equipment
CN111031397B (en) Method, device, equipment and storage medium for collecting clip comments
CN115348479A (en) Video playing problem identification method, electronic equipment and storage medium
JP5923744B2 (en) Image search system, image search method, and search apparatus
CN110019942B (en) Video identification method and system
CN112333182A (en) File processing method, device, server and storage medium
CN109885771B (en) Application software screening method and service equipment
CN119255021B (en) Point location resource recommendation method and device and electronic equipment
CN114897660B (en) A watermark recognition method and device, electronic device and storage medium
CN114710685B (en) Video stream processing method and device, terminal equipment and storage medium
US20080008443A1 (en) Data management system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant