CN109871463A

CN109871463A - Audio processing method, apparatus, electronic device and storage medium

Info

Publication number: CN109871463A
Application number: CN201910168211.9A
Authority: CN
Inventors: 孔令城
Original assignee: Tencent Music Entertainment Technology Shenzhen Co Ltd
Current assignee: Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority date: 2019-03-06
Filing date: 2019-03-06
Publication date: 2019-06-11
Anticipated expiration: 2039-03-06
Also published as: CN109871463B

Abstract

Embodiments of the invention disclose an audio processing method, an apparatus, an electronic device and a storage medium. The method includes: extracting an audio fingerprint of a target audio; obtaining an inverted index table that includes the target audio and the fingerprint information of the target audio; obtaining the representativeness of that fingerprint information according to the fingerprint information of the target audio; and, if the representativeness of the fingerprint information of the target audio is lower than expected, deleting the fingerprint information of the target audio from the inverted index table. Screening the data in the inverted index table in this way reduces memory consumption and improves retrieval efficiency.

Description

Audio processing method, apparatus, electronic device and storage medium

Technical field

The present invention relates to the technical field of multimedia data, and in particular to an audio processing method, an audio processing apparatus, an electronic device and a storage medium.

Background

With the development of the Internet and of audio fingerprinting technology, an audio retrieval approach based on audio fingerprints has emerged. It only requires extracting an audio fingerprint from an audio segment supplied by the user and comparing it with the audio fingerprints in an inverted index table, which records the mapping between audio fingerprints and audio; the relevant audio is then retrieved according to the comparison result. Because the user does not have to enter any text manually, audio can be retrieved more conveniently and quickly, and the approach is favored by more and more people. In practice, however, if the inverted index table contains too many audio fingerprints, more audio can be retrieved for the user, but the workload of secondary filtering increases and more memory is needed to store the inverted index table. Secondary filtering means that, after multiple related audio items have been retrieved from the fingerprint of the audio segment supplied by the user, the audio the user actually wants must still be filtered out of the retrieved items. Conversely, if the inverted index table contains too few audio fingerprints, the workload of secondary filtering and the memory consumption are both reduced, but less audio can be retrieved for the user. The inverted index table is therefore a key factor in retrieval performance.

Summary of the invention

The technical problem to be solved by the embodiments of the present invention is to provide an audio processing method, an apparatus, an electronic device and a storage medium that reduce memory consumption and improve retrieval efficiency by screening the data in the inverted index table.

In one aspect, an embodiment of the present invention provides an audio processing method, the method comprising:

extracting an audio fingerprint of a target audio;

obtaining an inverted index table, where the inverted index table includes the target audio and the fingerprint information of the target audio, and the fingerprint information of the target audio is the hash value of the audio fingerprint of the target audio;

obtaining the fingerprint-information representativeness of the target audio according to the fingerprint information of the target audio, where the fingerprint-information representativeness of the target audio is the inverse document frequency of the fingerprint information of the target audio, and the inverse document frequency is inversely proportional to the matching audio quantity;

if the fingerprint-information representativeness of the target audio is lower than expected, deleting the fingerprint information of the target audio from the inverted index table.

In another aspect, an embodiment of the present invention provides an audio processing apparatus, the apparatus comprising:

an extraction unit, configured to extract an audio fingerprint of a target audio;

an acquiring unit, configured to obtain an inverted index table, where the inverted index table includes the target audio and the fingerprint information of the target audio, and the fingerprint information of the target audio is the hash value of the audio fingerprint of the target audio; and to obtain the fingerprint-information representativeness of the target audio according to the fingerprint information of the target audio, where the fingerprint-information representativeness of the target audio is the inverse document frequency of the fingerprint information of the target audio, and the inverse document frequency is inversely proportional to the matching audio quantity;

a deleting unit, configured to delete the fingerprint information of the target audio from the inverted index table if the fingerprint-information representativeness of the target audio is lower than expected.

In another aspect, an embodiment of the present invention provides an electronic device, comprising a processor and a storage device;

the storage device stores computer program instructions, and the processor calls the computer program instructions to perform the following steps:

extracting an audio fingerprint of a target audio;

In another aspect, an embodiment of the present invention provides a computer-readable storage medium storing computer program instructions which, when executed, implement a method comprising:

extracting an audio fingerprint of a target audio;

In the embodiments of the present invention, the audio fingerprint of the target audio is extracted and the representativeness of the fingerprint information of the target audio is obtained; when that representativeness is lower than expected, the fingerprint information of the target audio is deleted from the inverted index table. The fingerprint information in the inverted index table can thus be screened by its representativeness, retaining the more representative fingerprint information and deleting the less representative fingerprint information. This saves storage space, reduces the resource consumption of the electronic device, and makes the inverted index table more compact. Moreover, because less representative fingerprint information performs poorly in retrieval, deleting it does not harm retrieval performance; on the contrary, retrieval based on the more representative fingerprint information performs better and reduces the workload of secondary filtering.

Brief description of the drawings

To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of an audio processing method provided by the present invention;

Fig. 2 is a schematic flowchart of another audio processing method provided by the present invention;

Fig. 3 is a schematic structural diagram of an audio processing apparatus provided by the present invention;

Fig. 4 is a schematic structural diagram of an electronic device provided by the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.

In the prior art, in order to retrieve more audio, all the fingerprint information of every audio item is usually added to the inverted index table, which puts considerable storage pressure on the storage device. For example, 10,000 songs typically consume about 1 GB of storage space, so a library on the scale of tens of millions of songs consumes up to several TB. In view of this, an embodiment of the present invention provides an audio processing method. Referring to Fig. 1, the method may be applied to an electronic device such as a smartphone, smartwatch, tablet computer or server, and may include steps S101 to S104.

S101, extracting the audio fingerprint of the target audio.

The electronic device may apply processing such as time-frequency transformation to the target audio to obtain the audio fingerprint of the target audio. The audio fingerprint is the characteristic information of the target audio.

S102, obtaining an inverted index table, where the inverted index table includes the target audio and the fingerprint information of the target audio, and the fingerprint information of the target audio is the hash value of the audio fingerprint of the target audio.

To retrieve the audio a user wants efficiently, the electronic device may hold an inverted index table that corresponds to the audio database in the electronic device. The inverted index table includes multiple audio items from the audio database together with the fingerprint information of each audio item; the target audio may be any one of these audio items. A target audio may have multiple pieces of fingerprint information, and "the fingerprint information of the target audio" in the embodiments of the present invention may refer to any one of them. The fingerprint information of the target audio is the hash value of the audio fingerprint of the target audio; that is, the electronic device may apply a hash operation to the audio fingerprint of the target audio to obtain the fingerprint information of the target audio.
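The table described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes each audio fingerprint is available as a byte string and uses SHA-1 from Python's `hashlib` as a stand-in, since the description does not name a hash algorithm. The names `fingerprint_info`, `build_inverted_index` and the sample database are hypothetical.

```python
import hashlib
from collections import defaultdict


def fingerprint_info(audio_fingerprint: bytes) -> str:
    """Hash one audio fingerprint into its fingerprint information.

    SHA-1 is an assumption; the description only says "hash value".
    """
    return hashlib.sha1(audio_fingerprint).hexdigest()


def build_inverted_index(database: dict[str, list[bytes]]) -> dict[str, set[str]]:
    """Map each piece of fingerprint information to the audio IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for audio_id, fingerprints in database.items():
        for fp in fingerprints:
            index[fingerprint_info(fp)].add(audio_id)
    return dict(index)


# Hypothetical audio database: audio ID -> raw audio fingerprints.
database = {
    "audio1": [b"peak-A", b"peak-B"],
    "audio2": [b"peak-A"],
    "audio3": [b"peak-A"],
}
index = build_inverted_index(database)
```

Keying the table by fingerprint information rather than by audio makes the number of matching audio items for a fingerprint a single lookup, which is the quantity the later steps rely on.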

S103, obtaining the fingerprint-information representativeness of the target audio according to the fingerprint information of the target audio, where the fingerprint-information representativeness of the target audio is the inverse document frequency of the fingerprint information of the target audio, and the inverse document frequency is inversely proportional to the matching audio quantity.

The fingerprint-information representativeness of the target audio describes how unique the fingerprint information of the target audio is. It may refer to the inverse document frequency of the fingerprint information of the target audio and/or the number of times the fingerprint information of the target audio has been matched. The inverse document frequency of the fingerprint information of the target audio is inversely proportional to the first audio quantity, that is, the number of audio items in the inverted index table whose fingerprint information matches the fingerprint information of the target audio. If few audio items in the inverted index table have fingerprint information matching that of the target audio, the inverse document frequency of the fingerprint information of the target audio is high, which shows that the fingerprint information is highly unique and therefore highly representative. Retrieving the target audio by this fingerprint information then returns few audio items (namely all audio whose fingerprint information matches that of the target audio), which reduces the workload of secondary filtering. Conversely, if many audio items in the inverted index table have fingerprint information matching that of the target audio, the inverse document frequency is low, which shows that the fingerprint information is less unique and therefore less representative. Retrieving the target audio by this fingerprint information then returns many audio items, which increases the workload of secondary filtering.

For example, suppose the fingerprint information of the target audio includes fingerprint information A and fingerprint information B, and the inverted index table holds 1000 audio items, of which 100 have fingerprint information matching fingerprint information A (the first audio quantity is 100) and 10 have fingerprint information matching fingerprint information B (the first audio quantity is 10). If fingerprint information A is used to retrieve the target audio, 100 audio items are returned, and secondary filtering must be performed over these 100 items to find the target audio. If fingerprint information B is used, 10 audio items are returned, and secondary filtering is performed over only these 10 items. Fingerprint information A is therefore less representative than fingerprint information B, that is, the inverse document frequency of fingerprint information A is lower than that of fingerprint information B, and retrieving the target audio with fingerprint information A takes more work than retrieving it with fingerprint information B. In other words, retrieval with fingerprint information A performs worse than retrieval with fingerprint information B.

The number of matches of the fingerprint information of the target audio is the number of times it has matched the fingerprint information of an audio segment carried in a query instruction, that is, the number of times users have queried with an audio segment corresponding to the fingerprint information of the target audio. The more matches, the more representative the fingerprint information of the target audio; the fewer matches, the less representative it is.
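The comparison can be made concrete under the assumption that the inverse document frequency is taken simply as the ratio of the total audio quantity to the first audio quantity. The symbols $f_A$ and $f_B$ are introduced here only for illustration.

$$f_A = \frac{1000}{100} = 10, \qquad f_B = \frac{1000}{10} = 100$$

On this measure fingerprint information B scores ten times higher than fingerprint information A, and it leaves 10 rather than 100 candidates for secondary filtering.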

S104, if the fingerprint-information representativeness of the target audio is lower than expected, deleting the fingerprint information of the target audio from the inverted index table.

When the target audio is retrieved by its fingerprint information, highly representative fingerprint information helps to find the target audio quickly, so its utility value is relatively high. Less representative fingerprint information increases the work needed to retrieve the target audio and lowers retrieval performance, so its utility value is relatively low. To make the inverted index table more compact and reduce the memory consumed in storing it, the fingerprint information with low utility value can be deleted from the inverted index table. Specifically, if the fingerprint-information representativeness of the target audio is lower than expected, the utility value of the fingerprint information of the target audio is low, and the electronic device may delete the target audio and the fingerprint information of the target audio from the inverted index table. This releases the storage space that held them, so that more fingerprint information of high utility value can be stored and retrieval performance improves.

In the embodiments of the present invention, the representativeness of the fingerprint information of the target audio is obtained; when that representativeness is lower than expected, the fingerprint information of the target audio is deleted from the inverted index table. The fingerprint information in the inverted index table can thus be screened by its representativeness, retaining the more representative fingerprint information and deleting the less representative fingerprint information. This saves storage space, reduces the resource consumption of the electronic device, and makes the inverted index table more compact. Moreover, because less representative fingerprint information performs poorly in retrieval, deleting it does not harm retrieval performance; on the contrary, retrieval based on the more representative fingerprint information performs better and reduces the workload of secondary filtering.

Referring to Fig. 2, an embodiment of the present invention provides another audio processing method, which may be applied to an electronic device such as a smartphone, smartwatch, tablet computer or server. This embodiment differs from that of Fig. 1 in that the fingerprint-information representativeness of the target audio is the inverse document frequency of the fingerprint information of the target audio, which is inversely proportional to the first audio quantity. The first audio quantity is the number of audio items in the inverted index table whose fingerprint information matches the fingerprint information of the target audio; the inverted index table includes multiple audio items, and the target audio is any one of them. The method may include steps S201 to S206.

S201, extracting the audio fingerprint of the target audio.

In one embodiment, step S201 includes the following steps s21 to s23.

s21, performing time-frequency transformation on the target audio to obtain the frequency-domain information of the target audio.

s22, obtaining the energy matrix of the target audio according to the frequency-domain information of the target audio.

s23, determining the audio fingerprint of the target audio according to the energy matrix of the target audio.

In steps s21 to s23, the electronic device may use the fast Fourier transform (FFT) algorithm to perform time-frequency transformation on the target audio and obtain its frequency-domain information, which describes the relationship between frequency and pitch. The energy matrix of the target audio can be computed from the frequency-domain information. The energy matrix is then scanned, and the local maximum energy values that are detected are taken as the audio fingerprint of the target audio.
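Steps s21 to s23 can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the patented procedure: a naive discrete Fourier transform stands in for the FFT (same spectrum, slower), the frame size, the 3x3 neighbourhood and the `min_energy` floor are arbitrary choices, and the function names are hypothetical.

```python
import cmath
import math


def energy_matrix(samples: list[float], frame_size: int = 64) -> list[list[float]]:
    """s21 + s22: split the signal into frames and return |DFT|^2 per frame and bin."""
    matrix = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        row = []
        for k in range(frame_size // 2):  # keep the non-negative frequencies only
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(frame)
            )
            row.append(abs(coeff) ** 2)
        matrix.append(row)
    return matrix


def local_peaks(matrix: list[list[float]], min_energy: float = 1e-6) -> list[tuple[int, int]]:
    """s23: return (frame, bin) cells whose energy exceeds every adjacent cell."""
    peaks = []
    for t, row in enumerate(matrix):
        for k, value in enumerate(row):
            if value < min_energy:  # ignore numerical noise (assumed floor)
                continue
            neighbours = [
                matrix[i][j]
                for i in range(max(0, t - 1), min(len(matrix), t + 2))
                for j in range(max(0, k - 1), min(len(row), k + 2))
                if (i, j) != (t, k)
            ]
            if all(value > n for n in neighbours):
                peaks.append((t, k))
    return peaks


# Hypothetical input: a pure tone that falls exactly on bin 5 of one 64-sample frame.
tone = [math.sin(2 * math.pi * 5 * n / 64) for n in range(64)]
peaks = local_peaks(energy_matrix(tone))
```

For this single-tone input the only peak is in frame 0 at bin 5. Real audio would yield many peaks per frame, each of which would then be hashed into fingerprint information.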

S202, obtaining an inverted index table, where the inverted index table includes the target audio and the fingerprint information of the target audio, and the fingerprint information of the target audio is the hash value of the audio fingerprint of the target audio.

The electronic device may obtain parameter information of the target audio, which includes at least one of the energy, chroma, loudness, pitch and similar parameters of the target audio. The parameter information is analyzed to obtain the audio fingerprint of the target audio. For example, if the parameter information is pitch, the pitches in the target audio that exceed a preset pitch are taken as the audio fingerprint of the target audio. The hash value of the audio fingerprint of the target audio is computed with a hash algorithm, and the target audio and the hash value of its audio fingerprint are added to the inverted index table.

In one embodiment, the inverted index table further includes location information of the fingerprint information of the target audio within the target audio and/or the frequency with which the fingerprint information of the target audio occurs in the target audio; the location information indicates the position in the target audio at which the fingerprint occurs. For example, the inverted index table is as shown in Table 1 and includes audio 1, audio 2 and audio 3. Audio 1 includes fingerprint information A and B: fingerprint information A is located at 2 s in audio 1 and occurs once, and fingerprint information B is located at 16 s in audio 1 and occurs once. Audio 2 includes fingerprint information A, which is located at 5 s in audio 2 and occurs once. Audio 3 includes fingerprint information A, which is located at 5 s in audio 3 and occurs once. It can be seen that fingerprint information A of audio 1 is identical to fingerprint information A of audio 1, audio 2 and audio 3 in the inverted index table, whereas fingerprint information B of audio 1 is identical only to fingerprint information B of audio 1 in the inverted index table. Therefore fingerprint information B of audio 1 is more unique and more representative, while fingerprint information A of audio 1 is less unique and less representative.

Table 1

S203, counting the total quantity of audio included in the inverted index table and the quantity of audio in the inverted index table whose fingerprint information matches the fingerprint information of the target audio (the matching audio quantity).

S204, calculating the ratio of the total audio quantity to the matching audio quantity.

S205, determining the inverse document frequency of the fingerprint information of the target audio according to the ratio.

S206, if the inverse document frequency of the fingerprint information of the target audio is lower than expected, deleting the fingerprint information of the target audio from the inverted index table. As shown in Table 1, take audio 1 as the target audio and fingerprint information A as its fingerprint information: if the inverse document frequency of fingerprint information A is lower than expected, fingerprint information A of audio 1 is deleted. Because identical fingerprint information has the same inverse document frequency in every audio item, if the computed inverse document frequency of the fingerprint information of the target audio is lower than expected, all fingerprint information in the inverted index table that is identical to the fingerprint information of the target audio is deleted.

In steps S203 to S206, the electronic device may screen the fingerprint information in the inverted index table by its inverse document frequency. Specifically, the electronic device may count the total quantity of audio in the inverted index table (which may also be the total quantity of audio in the audio database) and the first audio quantity, that is, the quantity of audio in the inverted index table whose fingerprint information matches the fingerprint information of the target audio. It then calculates the ratio of the total audio quantity to the first audio quantity and determines the inverse document frequency of the fingerprint information of the target audio according to that ratio. The larger the ratio, the fewer audio items have fingerprint information matching that of the target audio, the larger the inverse document frequency of the fingerprint information of the target audio, and the higher its representativeness. The smaller the ratio, the more audio items have matching fingerprint information, the smaller the inverse document frequency, and the lower the representativeness. Therefore, if the inverse document frequency of the fingerprint information of the target audio is lower than expected, its representativeness is low, and the fingerprint information of the target audio is deleted from the inverted index table.

For example, as shown in Table 1, the target audio is audio 1, whose fingerprint information includes fingerprint information A and fingerprint information B, and the total audio quantity in the inverted index table is 3. Three audio items in the inverted index table have fingerprint information identical to fingerprint information A of audio 1 (the first audio quantity is 3). Taking the inverse document frequency as the ratio of the total audio quantity to the first audio quantity, the inverse document frequency of fingerprint information A is 3/3 = 1. One audio item in the inverted index table has fingerprint information identical to fingerprint information B (the first audio quantity is 1), so the inverse document frequency of fingerprint information B is 3/1 = 3. Assuming the expected value is 2, the inverse document frequency of fingerprint information A is lower than expected and fingerprint information A is deleted from the inverted index table, while the inverse document frequency of fingerprint information B is higher than expected and fingerprint information B is retained. The representativeness of every piece of fingerprint information in the inverted index table can be calculated in this way, and all fingerprint information whose representativeness is lower than expected can be deleted, which makes the inverted index table more compact and saves storage space.
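The screening in steps S203 to S206 can be sketched as follows, using the Table 1 contents as the text describes them. This is a minimal sketch that assumes the plain ratio is used as the inverse document frequency and that the index is a dictionary from fingerprint information to the set of audio containing it; `prune_by_ratio` and the dictionary layout are hypothetical.

```python
def prune_by_ratio(
    index: dict[str, set[str]], total_audio: int, expected: float
) -> dict[str, set[str]]:
    """Keep only fingerprint information whose ratio M / V is not below `expected`."""
    kept = {}
    for info, audio_ids in index.items():
        if not audio_ids:  # nothing matches this entry, so there is nothing to keep
            continue
        ratio = total_audio / len(audio_ids)  # S204: total quantity / matching quantity
        if ratio >= expected:  # S206: delete only when lower than expected
            kept[info] = audio_ids
    return kept


# Table 1 as described in the text: A occurs in audio 1 to 3, B only in audio 1.
table_1 = {"A": {"audio1", "audio2", "audio3"}, "B": {"audio1"}}
pruned = prune_by_ratio(table_1, total_audio=3, expected=2)
```

With an expected value of 2, fingerprint information A (ratio 1) is dropped for every audio item at once and fingerprint information B (ratio 3) is retained, matching the worked example.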

In one embodiment, if the inverse document frequency of the fingerprint information of the target audio is less than a preset threshold, it is determined that the fingerprint-information representativeness of the target audio is lower than expected, where the preset threshold is determined according to the amount of information included in the inverted index table and/or the quantity of fingerprint information included in the inverted index table.

If the inverse document frequency of the fingerprint information of the target audio is less than the preset threshold, the fingerprint information of the target audio is not very representative, and it is determined that the fingerprint-information representativeness of the target audio is lower than expected. The preset threshold is determined according to the amount of information included in the inverted index table and/or the quantity of fingerprint information included in the inverted index table. For example, the more information the inverted index table includes and/or the more fingerprint information it includes, the smaller the preset threshold may be set, so as to delete a large amount of less representative fingerprint information. The less information the inverted index table includes and/or the less fingerprint information it includes, the larger the preset threshold may be set, so as to delete a small amount of less representative fingerprint information.

In one embodiment, suppose the inverted index table includes M audio items, the quantity of audio in the inverted index table whose fingerprint information matches fingerprint information A of the target audio is V, and the inverse document frequency of fingerprint information A of the target audio is f. Then f can be expressed by the following formula (1).

$$f = \log_{10}\left(\frac{M}{V}\right) \qquad (1)$$
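Formula (1) translates directly into code. The sketch below is illustrative only: the function name is hypothetical, and rejecting non-positive inputs is an added assumption, since the description does not say how V = 0 should be handled.

```python
import math


def inverse_document_frequency(total_audio: int, matching_audio: int) -> float:
    """Formula (1): f = log10(M / V)."""
    if total_audio <= 0 or matching_audio <= 0:
        raise ValueError("M and V must both be positive")
    return math.log10(total_audio / matching_audio)


# The earlier example: 1000 audio items, A matched by 100 of them, B by 10.
f_a = inverse_document_frequency(1000, 100)
f_b = inverse_document_frequency(1000, 10)
```

The logarithm compresses the range of the raw ratio M / V but preserves its ordering, so fingerprint information B still scores higher than fingerprint information A.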

In one embodiment, the inverted index table is loaded into a target function in memory; an audio query instruction that includes an audio segment is received; the fingerprint information of the audio segment is obtained; and the target function is executed to retrieve, according to the inverted index table, the audio associated with the fingerprint information of the audio segment.

To ensure real-time retrieval, the electronic device may load the inverted index table into memory. Specifically, the electronic device may load the inverted index table into a target function in memory; the target function is a function used for retrieving audio and may be a remote procedure call function. When a query instruction is received, the fingerprint information of the audio segment carried in the query instruction is extracted and the target function is executed, so as to retrieve from the audio database, according to the inverted index table, the audio associated with the fingerprint information of the audio segment. The audio the user wants can thus be retrieved through the inverted index table, which improves retrieval efficiency.
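The lookup performed by such a target function can be sketched as follows. This is a hedged illustration: `retrieve` is a hypothetical name, the index is assumed to be an in-memory dictionary from fingerprint information to audio IDs, and ranking candidates by the number of shared fingerprints is an assumption not stated in the description.

```python
from collections import Counter


def retrieve(index: dict[str, set[str]], query_infos: list[str]) -> list[tuple[str, int]]:
    """Return candidate audio IDs ranked by how many query fingerprints they share."""
    hits: Counter = Counter()
    for info in query_infos:
        # Fingerprint information absent from the index contributes no candidates.
        for audio_id in index.get(info, ()):
            hits[audio_id] += 1
    return hits.most_common()


# Hypothetical in-memory index that has already been screened.
index = {"B": {"audio1"}, "C": {"audio1", "audio2"}}
candidates = retrieve(index, ["B", "C", "Z"])
```

The returned list is what secondary filtering would then work through, so a smaller and more selective index directly shortens it.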

In one embodiment, the fingerprint-information representativeness of the target audio may be the number of matches of the fingerprint information of the target audio, which the electronic device may obtain from historical query records. The historical query records include the fingerprint information of multiple audio items and the number of matches of the fingerprint information of each audio item, where the number of matches of the fingerprint information of an audio item is the number of times it has matched the fingerprint information of an audio segment carried in a query instruction. If the fingerprint information of the target audio has been matched many times, users tend to retrieve audio with it, so its representativeness is high and its utility value is relatively high. If it has been matched only a few times, users seldom retrieve audio with it, so its representativeness is low and its utility value is relatively low. Therefore, when the number of matches of the fingerprint information of the target audio is less than a preset number, the fingerprint information of the target audio is deleted from the inverted index table.
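Screening by match count can be sketched as follows. The sketch assumes the historical query records are available as a list of queries, each a list of the fingerprint information carried by that query; this layout and the name `prune_by_match_count` are hypothetical.

```python
from collections import Counter


def prune_by_match_count(
    index: dict[str, set[str]], query_history: list[list[str]], preset_times: int
) -> dict[str, set[str]]:
    """Delete fingerprint information matched fewer than `preset_times` times."""
    # Count how often each indexed piece of fingerprint information was hit by a query.
    counts = Counter(
        info for query in query_history for info in query if info in index
    )
    return {info: ids for info, ids in index.items() if counts[info] >= preset_times}


# Hypothetical index and query history.
index = {"A": {"audio1", "audio2"}, "B": {"audio1"}}
history = [["A", "B"], ["B"], ["B", "X"]]
pruned = prune_by_match_count(index, history, preset_times=2)
```

Here fingerprint information A was matched once and is deleted, while fingerprint information B was matched three times and is kept.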

In another embodiment, the fingerprint information representative degree of the target audio may include the number of matches of the fingerprint information of the target audio and the inverse text frequency of that fingerprint information. The electronic device may compute a weighted sum of the number of matches and the inverse text frequency to obtain a representative degree sum; if the representative degree sum is less than a preset value, the representative degree of the fingerprint information of the target audio is low and the fingerprint information of the target audio may be deleted. For example, assume that the representative degree sum of the target audio is D, the inverse text frequency of the fingerprint information is f with weight k1, and the number of matches of the fingerprint information is S with weight k2; the representative degree sum of the target audio can then be expressed by formula (2) below. Because the inverse text frequency of the fingerprint information is related to retrieval efficiency, and the number of matches of the fingerprint information is related to the user's retrieval habits and preferences, the electronic device may set the two weights according to the user's needs. The larger the weight of the inverse text frequency, the more efficiently the screened inverted index table can be searched; the larger the weight of the number of matches, the better the screened inverted index table fits the user's retrieval preferences.

D = f·k1 + S·k2 (2)
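Formula (2) and the associated deletion rule can be sketched as follows. The symbols f, S, k1 and k2 follow the text; the function names, the sample values and the preset values are hypothetical.

```python
def representative_sum(f, s, k1, k2):
    """Formula (2): D = f*k1 + S*k2, a weighted sum of inverse text frequency and match count."""
    return f * k1 + s * k2

def should_delete(f, s, k1, k2, preset):
    """A fingerprint is deleted when its representative degree sum is below the preset value."""
    return representative_sum(f, s, k1, k2) < preset

# Illustrative values only: f = 2.0, S = 4 matches, k1 = 0.5, k2 = 0.25.
d = representative_sum(2.0, 4, 0.5, 0.25)
delete_strict = should_delete(2.0, 4, 0.5, 0.25, preset=2.5)
delete_loose = should_delete(2.0, 4, 0.5, 0.25, preset=1.5)
```

Raising k1 relative to k2 biases the screening toward retrieval efficiency, while raising k2 biases it toward the fingerprints users actually match, as described above.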

In this embodiment of the present invention, the representative degree of the fingerprint information of the target audio is obtained, and when that representative degree is lower than expected, the fingerprint information of the target audio is deleted from the inverted index table. The fingerprint information in the inverted index table can thus be screened by representative degree, retaining fingerprint information with a higher representative degree and deleting fingerprint information with a lower one; this saves storage space, reduces the resource consumption of the electronic device, and makes the inverted index table more compact. Moreover, because fingerprint information with a lower representative degree performs poorly in retrieval, deleting it does not harm retrieval performance; on the contrary, retrieval based on fingerprint information with a higher representative degree performs better, so screening the inverted index table can improve retrieval performance to some extent.

Based on the foregoing description, an embodiment of the present invention provides an audio processing apparatus, whose structure is shown schematically in Fig. 3. The audio processing apparatus may run on an electronic device, which may include a smartphone, a smartwatch, a computer, or the like. As shown in Fig. 3, the audio processing apparatus includes:

An extraction unit 301, configured to extract the audio fingerprint of the target audio.

An acquiring unit 302, configured to obtain an inverted index table, where the inverted index table includes the target audio and the fingerprint information of the target audio, and the fingerprint information of the target audio is a hash value of the audio fingerprint of the target audio; and to obtain the fingerprint information representative degree of the target audio according to the fingerprint information of the target audio, where the fingerprint information representative degree of the target audio is the inverse text frequency of the fingerprint information of the target audio, and the inverse text frequency is inversely proportional to the matching audio quantity.

A deletion unit 303, configured to delete the fingerprint information of the target audio from the inverted index table if the fingerprint information representative degree of the target audio is lower than expected.

Optionally, the extraction unit 301 is configured to perform a time-frequency transform on the target audio to obtain frequency domain information of the target audio; obtain an energy matrix of the target audio according to the frequency domain information of the target audio; and determine the audio fingerprint of the target audio according to the energy matrix of the target audio.

Optionally, the matching audio quantity is the number of audios whose fingerprint information in the inverted index table matches the fingerprint information of the target audio; the inverted index table includes multiple audios, and the target audio is any one of the multiple audios.

Optionally, the acquiring unit 302 is configured to count the total number of audios included in the inverted index table and the number of audios whose fingerprint information in the inverted index table matches the fingerprint information of the target audio; calculate the ratio between the total number of audios and the number of matching audios; and determine the inverse text frequency of the fingerprint information of the target audio according to the ratio.

Optionally, the determination unit 304 is configured to determine, if the inverse text frequency of the fingerprint information of the target audio is less than a preset threshold, that the fingerprint information representative degree of the target audio is lower than expected, where the preset threshold is determined according to the amount of information included in the inverted index table and/or the quantity of fingerprint information included in the inverted index table.

Optionally, the inverted index table further includes location information indicating where the fingerprint information of the target audio is located in the target audio and/or the frequency with which the fingerprint information of the target audio occurs in the target audio.

Optionally, the query unit 305 is configured to load the inverted index table into a target function in memory; receive an audio query instruction that includes an audio segment; obtain the fingerprint information of the audio segment; and execute the target function, so as to retrieve, according to the inverted index table, audio associated with the fingerprint information of the audio segment.
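The extraction path handled by the extraction unit 301 (time-frequency transform, energy matrix, audio fingerprint, hash value) can be sketched in pure Python. The source does not state how the fingerprint is derived from the energy matrix, so this sketch assumes a common scheme, not confirmed by the source, in which each bit is the sign of an energy difference across neighbouring bands and frames. The frame length, hop size, band count, the naive DFT and the use of MD5 as the hash are likewise assumptions, and every function name is hypothetical.

```python
import cmath
import hashlib
import math

def energy_matrix(samples, frame_len=64, hop=32, bands=4):
    """Rows are frames, columns are band energies obtained from a naive DFT."""
    half = frame_len // 2
    width = half // bands
    matrix = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # Time-frequency transform: power of DFT bins 0 .. half-1 for this frame.
        power = []
        for k in range(half):
            acc = sum(x * cmath.exp(-2j * cmath.pi * k * n / frame_len)
                      for n, x in enumerate(frame))
            power.append(abs(acc) ** 2)
        # Sum neighbouring bins into coarse energy bands.
        matrix.append([sum(power[b * width:(b + 1) * width]) for b in range(bands)])
    return matrix

def fingerprints(matrix):
    """One integer per frame pair; each bit is the sign of a band/frame energy difference."""
    prints = []
    for prev, cur in zip(matrix, matrix[1:]):
        bits = 0
        for b in range(len(cur) - 1):
            grew = (cur[b] - cur[b + 1]) - (prev[b] - prev[b + 1]) > 0
            bits = (bits << 1) | int(grew)
        prints.append(bits)
    return prints

def fingerprint_hash(fp):
    """Hash value of a fingerprint, used as the key of the inverted index table."""
    return hashlib.md5(fp.to_bytes(4, "big")).hexdigest()[:8]

# Synthetic test signal whose frequency drifts upward over time.
signal = [math.sin(2 * math.pi * (3 + n / 64) * n / 64) for n in range(256)]
matrix = energy_matrix(signal)
fps = fingerprints(matrix)
keys = [fingerprint_hash(fp) for fp in fps]
```

With 256 samples, a frame length of 64 and a hop of 32, the sketch yields 7 frames and therefore 6 fingerprints of 3 bits each; a production system would use an FFT library and much longer frames.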

Refer to Fig. 4, which is a schematic structural diagram of an electronic device provided in an embodiment of the present invention. The electronic device 1000 includes a processor 1001, a user interface 1003, a network interface 1004 and a storage device 1005, which are connected to one another through a bus 1002.

The user interface 1003 is used for human-computer interaction and may include a display screen, a keyboard, or the like. The network interface 1004 is used for communicating with external devices. The storage device 1005 is coupled to the processor 1001 and is used for storing various software programs and/or multiple sets of instructions. In a specific implementation, the storage device 1005 may include high-speed random access memory and may also include non-volatile memory, such as one or more disk storage devices, flash memory devices or other non-volatile solid-state storage devices. The storage device 1005 may store an operating system (hereinafter referred to as the system), such as ANDROID, IOS or LINUX. The storage device 1005 may also store a network communication program, which can be used to communicate with one or more additional devices, one or more terminal devices and one or more network devices. The storage device 1005 may also store a user interface program, which can vividly display the content of an application program through a graphical operating interface and receive the user's control operations on the application program through input controls such as menus, dialog boxes and keys. The storage device 1005 may also store one or more application programs, such as an audio processing application program used for screening the inverted index table.

In one embodiment, the storage device 1005 may also be used to store one or more program instructions; the processor 1001 may call the one or more program instructions to implement the audio processing method. Specifically, the processor 1001 calls the program instructions to execute the following steps:

Extract the audio fingerprint of the target audio;

Optionally, the processor 1001 may call the program instructions to execute the following steps:

Perform a time-frequency transform on the target audio to obtain frequency domain information of the target audio;

Obtain an energy matrix of the target audio according to the frequency domain information of the target audio;

Determine the audio fingerprint of the target audio according to the energy matrix of the target audio.

Count the total number of audios included in the inverted index table and the number of audios whose fingerprint information in the inverted index table matches the fingerprint information of the target audio;

Calculate the ratio between the total number of audios and the number of matching audios;

Determine the inverse text frequency of the fingerprint information of the target audio according to the ratio.

If the inverse text frequency of the fingerprint information of the target audio is less than a preset threshold, determine that the fingerprint information representative degree of the target audio is lower than expected, where the preset threshold is determined according to the amount of information included in the inverted index table and/or the quantity of fingerprint information included in the inverted index table.

Load the inverted index table into a target function in memory;

Receive an audio query instruction, where the audio query instruction includes an audio segment;

Obtain the fingerprint information of the audio segment;

Execute the target function, so as to retrieve, according to the inverted index table, audio associated with the fingerprint information of the audio segment.
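The counting, ratio and threshold steps listed above can be sketched as follows. The source says only that the inverse text frequency is determined from the ratio of the total number of audios to the number of matching audios; taking the natural logarithm of that ratio, as in conventional inverse document frequency, is an assumption here, as are the function names, the sample index and the threshold value.

```python
import math

def inverse_text_frequency(total_audio, matching_audio):
    """Assumed form: log of (total audios / audios matching this fingerprint)."""
    return math.log(total_audio / matching_audio)

def prune_by_idf(index, threshold):
    """Delete fingerprints whose inverse text frequency is below the threshold."""
    # Total number of distinct audios referenced anywhere in the index.
    total = len({audio_id for ids in index.values() for audio_id in ids})
    kept = {}
    for fp, ids in index.items():
        idf = inverse_text_frequency(total, len(set(ids)))
        if idf >= threshold:
            kept[fp] = ids
    return kept

# Hypothetical inverted index: fingerprint hash -> audios containing it.
index = {
    "fp_common": ["a", "b", "c", "d"],  # matches every audio, so it carries no information
    "fp_mid": ["a", "b"],
    "fp_rare": ["d"],
}
pruned = prune_by_idf(index, threshold=0.5)
```

A fingerprint that matches every audio has an inverse text frequency of log(4/4) = 0 and is removed, while the rarer fingerprints are retained.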

In this embodiment of the present invention, the representative degree of the fingerprint information of the target audio is obtained, and when that representative degree is lower than expected, the fingerprint information of the target audio is deleted from the inverted index table. The fingerprint information in the inverted index table can thus be screened by representative degree, retaining fingerprint information with a higher representative degree and deleting fingerprint information with a lower one; this saves storage space, reduces the resource consumption of the electronic device, and makes the inverted index table more compact. Moreover, because fingerprint information with a lower representative degree performs poorly in retrieval, deleting it does not harm retrieval performance; on the contrary, retrieval based on fingerprint information with a higher representative degree performs better.

In one embodiment, the processor 1001 may be used to read and execute computer instructions to implement the audio processing method described in Fig. 1 or Fig. 2 of this application. The principle by which the electronic device provided in this embodiment of the present invention solves the problem is similar to that of the method embodiments described in Fig. 1 and Fig. 2; for the implementation and beneficial effects of the electronic device, refer to the implementation and beneficial effects of the method embodiments, which are not repeated here.

An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored. For the implementation by which the program solves the problem and its beneficial effects, refer to the implementation and beneficial effects of the audio processing method described in Fig. 1 and Fig. 2 above, which are not repeated here.

What is disclosed above is only some embodiments of the present invention and certainly cannot be used to limit the scope of the claims of the present invention; equivalent changes made in accordance with the claims of the present invention therefore still fall within the scope of the present invention.

Claims

1. An audio processing method, characterized by comprising:

extracting an audio fingerprint of a target audio;

obtaining an inverted index table, the inverted index table comprising the target audio and fingerprint information of the target audio, the fingerprint information of the target audio being a hash value of the audio fingerprint of the target audio;

obtaining a fingerprint information representative degree of the target audio according to the fingerprint information of the target audio, the fingerprint information representative degree of the target audio being an inverse text frequency of the fingerprint information of the target audio, the inverse text frequency being inversely proportional to a matching audio quantity;

if the fingerprint information representative degree of the target audio is lower than expected, deleting the fingerprint information of the target audio from the inverted index table.

2. The method according to claim 1, characterized in that extracting the audio fingerprint of the target audio comprises:

3. The method according to claim 1, characterized in that the matching audio quantity is the number of audios whose fingerprint information in the inverted index table matches the fingerprint information of the target audio, the inverted index table comprises multiple audios, and the target audio is any one of the multiple audios.

4. The method according to claim 3, characterized in that obtaining the fingerprint information representative degree of the target audio according to the fingerprint information of the target audio comprises:

counting the total number of audios included in the inverted index table and the number of audios whose fingerprint information in the inverted index table matches the fingerprint information of the target audio;

5. The method according to claim 4, characterized in that the method further comprises:

if the inverse text frequency of the fingerprint information of the target audio is less than a preset threshold, determining that the fingerprint information representative degree of the target audio is lower than expected, wherein the preset threshold is determined according to the amount of information included in the inverted index table and/or the quantity of fingerprint information included in the inverted index table.

6. The method according to claim 1, characterized in that the inverted index table further comprises location information indicating where the fingerprint information of the target audio is located in the target audio and/or the frequency with which the fingerprint information of the target audio occurs in the target audio.

7. The method according to claim 1, characterized in that the method further comprises:

loading the inverted index table into a target function in memory;

obtaining the fingerprint information of the audio segment;

executing the target function, so as to retrieve, according to the inverted index table, audio associated with the fingerprint information of the audio segment.

8. An audio processing apparatus, characterized by comprising:

an acquiring unit, configured to obtain an inverted index table, the inverted index table comprising the target audio and fingerprint information of the target audio, the fingerprint information of the target audio being a hash value of the audio fingerprint of the target audio; and to obtain a fingerprint information representative degree of the target audio according to the fingerprint information of the target audio, the fingerprint information representative degree of the target audio being an inverse text frequency of the fingerprint information of the target audio, the inverse text frequency being inversely proportional to a matching audio quantity;

a deletion unit, configured to delete the fingerprint information of the target audio from the inverted index table if the fingerprint information representative degree of the target audio is lower than expected.

9. An electronic device, characterized in that the electronic device comprises:

a processor, adapted to execute one or more instructions; and

a computer-readable storage medium, the computer-readable storage medium storing one or more instructions, the one or more instructions being adapted to be loaded by the processor to execute the audio processing method according to any one of claims 1-7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores one or more instructions, the one or more instructions being adapted to be loaded by a processor to execute the audio processing method according to any one of claims 1-7.