CN109871463A - Audio processing method, apparatus, electronic device, and storage medium
- Publication number: CN109871463A
- Application number: CN201910168211.9A
- Authority: CN (China)
- Prior art keywords: audio, fingerprint, fingerprint information, target audio, frequency
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Classification: Information Retrieval, Db Structures And Fs Structures Therefor
Abstract
Embodiments of the present invention disclose an audio processing method, apparatus, electronic device, and storage medium. The method includes: extracting an audio fingerprint of a target audio; obtaining an inverted index table that includes the target audio and fingerprint information of the target audio; obtaining a fingerprint information representativeness of the target audio according to the fingerprint information of the target audio; and, if the fingerprint information representativeness of the target audio is lower than expected, deleting the fingerprint information of the target audio from the inverted index table. Screening the data in the inverted index table in this way reduces memory consumption and improves retrieval efficiency.
Description
Technical field
The present invention relates to the technical field of multimedia data, and in particular to an audio processing method, an audio processing apparatus, an electronic device, and a storage medium.
Background
With the development of the Internet and of audio fingerprinting technology, audio retrieval based on audio fingerprints has emerged. This retrieval mode only needs to extract an audio fingerprint from the audio segment input by the user and compare it with the audio fingerprints in an inverted index table, which records the mapping between audio fingerprints and audios; the relevant audios are then retrieved according to the comparison result. Because the user does not need to enter text manually, audio can be retrieved more conveniently and quickly, and the approach is favored by more and more people.
In practice, it has been found that if the inverted index table contains too many audio fingerprints, more audios can be retrieved for the user, but the workload of secondary filtering increases and more memory is needed to store the inverted index table. Secondary filtering means that, after multiple relevant audios have been retrieved according to the fingerprint of the audio segment input by the user, the audio the user needs must still be filtered out from the retrieved audios. If the inverted index table contains too few audio fingerprints, the workload of secondary filtering and the memory consumption are reduced, but fewer audios can be retrieved for the user. The inverted index table is therefore a key factor affecting retrieval performance.
Summary of the invention
The technical problem to be solved by the embodiments of the present invention is to provide an audio processing method, apparatus, electronic device, and storage medium that reduce memory consumption and improve retrieval efficiency by screening the data in the inverted index table.
In one aspect, an embodiment of the present invention provides an audio processing method, the method including:
extracting an audio fingerprint of a target audio;
obtaining an inverted index table, the inverted index table including the target audio and fingerprint information of the target audio, the fingerprint information of the target audio being a hash value of the audio fingerprint of the target audio;
obtaining a fingerprint information representativeness of the target audio according to the fingerprint information of the target audio, the fingerprint information representativeness of the target audio being the inverse document frequency of the fingerprint information of the target audio, the inverse document frequency being inversely proportional to the matching audio quantity; and
if the fingerprint information representativeness of the target audio is lower than expected, deleting the fingerprint information of the target audio from the inverted index table.
In another aspect, an embodiment of the present invention provides an audio processing apparatus, the apparatus including:
an extraction unit, configured to extract an audio fingerprint of a target audio;
an acquiring unit, configured to obtain an inverted index table, the inverted index table including the target audio and fingerprint information of the target audio, the fingerprint information of the target audio being a hash value of the audio fingerprint of the target audio; and to obtain a fingerprint information representativeness of the target audio according to the fingerprint information of the target audio, the fingerprint information representativeness of the target audio being the inverse document frequency of the fingerprint information of the target audio, the inverse document frequency being inversely proportional to the matching audio quantity; and
a deleting unit, configured to delete the fingerprint information of the target audio from the inverted index table if the fingerprint information representativeness of the target audio is lower than expected.
In another aspect, an embodiment of the present invention provides an electronic device, including a processor and a storage device;
the storage device stores computer program instructions, and the processor calls the computer program instructions to perform the following steps:
extracting an audio fingerprint of a target audio;
obtaining an inverted index table, the inverted index table including the target audio and fingerprint information of the target audio, the fingerprint information of the target audio being a hash value of the audio fingerprint of the target audio;
obtaining a fingerprint information representativeness of the target audio according to the fingerprint information of the target audio, the fingerprint information representativeness of the target audio being the inverse document frequency of the fingerprint information of the target audio, the inverse document frequency being inversely proportional to the matching audio quantity; and
if the fingerprint information representativeness of the target audio is lower than expected, deleting the fingerprint information of the target audio from the inverted index table.
In another aspect, an embodiment of the present invention provides a computer-readable storage medium storing computer program instructions which, when executed, perform a method including:
extracting an audio fingerprint of a target audio;
obtaining an inverted index table, the inverted index table including the target audio and fingerprint information of the target audio, the fingerprint information of the target audio being a hash value of the audio fingerprint of the target audio;
obtaining a fingerprint information representativeness of the target audio according to the fingerprint information of the target audio, the fingerprint information representativeness of the target audio being the inverse document frequency of the fingerprint information of the target audio, the inverse document frequency being inversely proportional to the matching audio quantity; and
if the fingerprint information representativeness of the target audio is lower than expected, deleting the fingerprint information of the target audio from the inverted index table.
In the embodiments of the present invention, the audio fingerprint of the target audio is extracted and the fingerprint information representativeness of the target audio is obtained; when the representativeness is lower than expected, the fingerprint information of the target audio is deleted from the inverted index table. The fingerprint information in the inverted index table can thus be screened by representativeness, retaining fingerprint information with higher representativeness and deleting fingerprint information with lower representativeness. This saves storage space, reduces the resource consumption of the electronic device, and makes the inverted index table more compact. Moreover, because fingerprint information with lower representativeness retrieves poorly, deleting it does not affect retrieval performance; on the contrary, retrieval with the more representative fingerprint information performs better and reduces the workload of secondary filtering.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of an audio processing method provided by the present invention;
Fig. 2 is a schematic flowchart of another audio processing method provided by the present invention;
Fig. 3 is a schematic structural diagram of an audio processing apparatus provided by the present invention;
Fig. 4 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
In the prior art, in order to retrieve more audios, all fingerprint information of every audio is usually added to the inverted index table, which puts considerable storage pressure on the storage device. For example, 10,000 songs typically consume about 1 GB of storage space, so a library on the order of tens of millions of songs consumes up to several TB. In view of this, an embodiment of the present invention provides an audio processing method. Referring to Fig. 1, the method can be applied to an electronic device, which may be a smartphone, smartwatch, tablet computer, server, or similar device. The method may include steps S101 to S104.
S101: Extract the audio fingerprint of the target audio.
The electronic device may apply processing such as time-frequency transformation to the target audio to obtain its audio fingerprint. The audio fingerprint is the characteristic information of the target audio.
S102: Obtain an inverted index table that includes the target audio and the fingerprint information of the target audio, where the fingerprint information of the target audio is a hash value of the audio fingerprint of the target audio.
To retrieve the audio a user needs efficiently, the electronic device may hold an inverted index table that corresponds to the audio database in the electronic device. The inverted index table includes multiple audios in the audio database and the fingerprint information of each audio; the target audio may be any one of these audios. The target audio may have multiple pieces of fingerprint information, and "the fingerprint information of the target audio" in the embodiments of the present invention may refer to any one of them. The fingerprint information of the target audio is the hash value of the audio fingerprint of the target audio; that is, the electronic device may perform a hash operation on the audio fingerprint of the target audio to obtain the fingerprint information of the target audio.
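The construction of the inverted index table in step S102 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the source does not name a hash algorithm, so SHA-1 from Python's `hashlib` is an arbitrary choice, and the names `fingerprint_info`, `build_inverted_index`, and the sample library are hypothetical.

```python
import hashlib
from collections import defaultdict


def fingerprint_info(audio_fingerprint: bytes) -> str:
    """Hash a raw audio fingerprint into its fingerprint information."""
    # SHA-1 is an assumption; the source only says "hash value".
    return hashlib.sha1(audio_fingerprint).hexdigest()


def build_inverted_index(fingerprints_by_audio):
    """Map each fingerprint information to the set of audios containing it."""
    index = defaultdict(set)
    for audio_id, fingerprints in fingerprints_by_audio.items():
        for fingerprint in fingerprints:
            index[fingerprint_info(fingerprint)].add(audio_id)
    return dict(index)


# Hypothetical library: the byte strings stand in for extracted fingerprints.
library = {
    "audio_1": [b"peak-a", b"peak-b"],
    "audio_2": [b"peak-a"],
    "audio_3": [b"peak-a"],
}
index = build_inverted_index(library)
```

Keying the table by fingerprint information rather than by audio is what makes it "inverted": a query fingerprint leads directly to the audios that contain it.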
S103: Obtain the fingerprint information representativeness of the target audio according to the fingerprint information of the target audio, where the fingerprint information representativeness of the target audio is the inverse document frequency of the fingerprint information of the target audio, and the inverse document frequency is inversely proportional to the matching audio quantity.
The fingerprint information representativeness of the target audio describes the uniqueness of the fingerprint information of the target audio. It may refer to the inverse document frequency of the fingerprint information of the target audio and/or the number of matches of the fingerprint information of the target audio. The inverse document frequency is inversely proportional to a first audio quantity, that is, the number of audios in the inverted index table whose fingerprint information matches the fingerprint information of the target audio. If the fingerprint information of only a few audios in the inverted index table matches the fingerprint information of the target audio, the inverse document frequency is high, which indicates that the fingerprint information is highly unique and its representativeness is high; retrieving the target audio with this fingerprint information then returns few audios (all audios whose fingerprint information matches it), which reduces the workload of secondary filtering. Conversely, if the fingerprint information of many audios in the inverted index table matches the fingerprint information of the target audio, the inverse document frequency is low, which indicates that the fingerprint information is less unique and its representativeness is low; retrieving the target audio with this fingerprint information then returns many audios, which increases the workload of secondary filtering.
For example, suppose the fingerprint information of the target audio includes fingerprint information A and fingerprint information B, and the inverted index table contains 1000 audios. The fingerprint information of 100 audios (first audio quantity 100) matches fingerprint information A, and the fingerprint information of 10 audios (first audio quantity 10) matches fingerprint information B. If fingerprint information A is used to retrieve the target audio, 100 audios are retrieved, and secondary filtering must be performed on these 100 audios to find the target audio. If fingerprint information B is used, 10 audios are retrieved, and secondary filtering is performed on only these 10 audios. The representativeness of fingerprint information A is therefore lower than that of fingerprint information B; that is, the inverse document frequency of A is lower than that of B, and retrieving the target audio with A takes more work than retrieving it with B. In other words, retrieval with fingerprint information A performs worse than retrieval with fingerprint information B.
The number of matches of the fingerprint information of the target audio is the number of times it has matched the fingerprint information of audio segments carried in query instructions, that is, the number of times users have queried with the audio segment corresponding to that fingerprint information. The more matches, the higher the representativeness of the fingerprint information of the target audio; the fewer matches, the lower its representativeness.
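The example with fingerprint information A and B can be reproduced with a small sketch that counts how many candidates each fingerprint returns, since that count is the size of the secondary filtering job. The index layout, the audio identifiers, and the helper name `candidates` are illustrative assumptions.

```python
def candidates(index, fingerprint):
    """Return every audio whose fingerprint information matches the query."""
    return index.get(fingerprint, set())


# Hypothetical index over 1000 audios: A occurs in 100 of them, B in 10.
audio_ids = [f"audio_{i}" for i in range(1000)]
index = {
    "A": set(audio_ids[:100]),
    "B": set(audio_ids[:10]),
}

target = "audio_0"  # the target audio contains both A and B
# Number of audios that secondary filtering must examine per fingerprint.
workload = {fp: len(candidates(index, fp)) for fp in ("A", "B")}
```

Both fingerprints find the target audio, but B narrows the search to a tenth of the candidates that A returns.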
S104: If the fingerprint information representativeness of the target audio is lower than expected, delete the fingerprint information of the target audio from the inverted index table.
When the target audio is retrieved with its fingerprint information, a higher representativeness helps retrieve the target audio quickly, so the fingerprint information has a relatively high utility value; a lower representativeness increases the work needed to retrieve the target audio and degrades retrieval performance, so the fingerprint information has a relatively low utility value. To make the inverted index table more compact and reduce the memory consumed by storing it, fingerprint information with low utility value can be deleted from the inverted index table. Specifically, if the fingerprint information representativeness of the target audio is lower than expected, the utility value of that fingerprint information is low, and the electronic device can delete the target audio and the fingerprint information of the target audio from the inverted index table. This frees the storage space used for the target audio and its fingerprint information, so that more fingerprint information with high utility value can be stored and retrieval performance improves.
In this embodiment of the present invention, the fingerprint information representativeness of the target audio is obtained, and when it is lower than expected, the fingerprint information of the target audio is deleted from the inverted index table. The fingerprint information in the inverted index table can thus be screened by representativeness, retaining fingerprint information with higher representativeness and deleting fingerprint information with lower representativeness. This saves storage space, reduces the resource consumption of the electronic device, and makes the inverted index table more compact. Moreover, because fingerprint information with lower representativeness retrieves poorly, deleting it does not affect retrieval performance; on the contrary, retrieval with the more representative fingerprint information performs better and reduces the workload of secondary filtering.
Referring to Fig. 2, Fig. 2 shows another audio processing method provided by an embodiment of the present invention. The method can be applied to an electronic device, which may be a smartphone, smartwatch, tablet computer, server, or similar device. This embodiment differs from that of Fig. 1 in that the fingerprint information representativeness of the target audio is the inverse document frequency of the fingerprint information of the target audio, which is inversely proportional to the first audio quantity. The first audio quantity is the number of audios in the inverted index table whose fingerprint information matches the fingerprint information of the target audio; the inverted index table includes multiple audios, and the target audio is any one of them. The method may include steps S201 to S206.
S201: Extract the audio fingerprint of the target audio.
In one embodiment, step S201 includes the following steps s21 to s23.
s21: Perform time-frequency transformation on the target audio to obtain the frequency-domain information of the target audio.
s22: Obtain the energy matrix of the target audio according to the frequency-domain information of the target audio.
s23: Determine the audio fingerprint of the target audio according to the energy matrix of the target audio.
In steps s21 to s23, the electronic device may use an FFT (fast Fourier transform) algorithm to perform time-frequency transformation on the target audio and obtain its frequency-domain information, which describes the relationship between frequency and pitch. The energy matrix of the target audio can be calculated from the frequency-domain information; the energy matrix is then examined, and the detected local maximum energy values are used as the audio fingerprint of the target audio.
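Steps s21 to s23 can be sketched with the standard library alone. This is a simplified illustration, not the patented extractor: a naive discrete Fourier transform stands in for the FFT named in the text (same result, slower), frames do not overlap, and the frame size, the `min_energy` noise floor, and the function names are all assumptions.

```python
import cmath
import math


def energy_matrix(samples, frame_size=8):
    """s21-s22: per-frame spectral energy, one row per non-overlapping frame."""
    rows = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        row = []
        for k in range(frame_size // 2 + 1):
            # Naive DFT coefficient for frequency bin k (an FFT gives the same value).
            coeff = sum(x * cmath.exp(-2j * cmath.pi * k * n / frame_size)
                        for n, x in enumerate(frame))
            row.append(abs(coeff) ** 2)
        rows.append(row)
    return rows


def local_peaks(matrix, min_energy=1e-6):
    """s23: (frame, bin) cells that exceed the noise floor and all neighbours."""
    peaks = []
    for t, row in enumerate(matrix):
        for k, value in enumerate(row):
            neighbours = [matrix[i][j]
                          for i in range(max(0, t - 1), min(len(matrix), t + 2))
                          for j in range(max(0, k - 1), min(len(row), k + 2))
                          if (i, j) != (t, k)]
            if value > min_energy and all(value > n for n in neighbours):
                peaks.append((t, k))
    return peaks


# Hypothetical signal: silence, one frame of a pure tone in bin 2, silence.
frame_size = 8
tone = [math.cos(2 * math.pi * 2 * n / frame_size) for n in range(frame_size)]
samples = [0.0] * frame_size + tone + [0.0] * frame_size
matrix = energy_matrix(samples, frame_size)
peaks = local_peaks(matrix)
```

Each peak is a (frame index, frequency bin) pair; in practice such peaks, or combinations of them, would be hashed to form the fingerprint information stored in the inverted index table.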
S202: Obtain the inverted index table, which includes the target audio and the fingerprint information of the target audio, where the fingerprint information of the target audio is the hash value of the audio fingerprint of the target audio.
The electronic device may obtain parameter information of the target audio, which includes at least one of the energy, chroma, loudness, pitch, and similar parameters of the target audio. The parameter information is analyzed to obtain the audio fingerprint of the target audio; for example, if the parameter information is pitch, the pitches in the target audio that are greater than a preset pitch are used as the audio fingerprint of the target audio. The hash value of the audio fingerprint of the target audio is calculated with a hash algorithm, and the target audio and the hash value of its audio fingerprint are added to the inverted index table.
In one embodiment, the inverted index table further includes position information indicating where the fingerprint information of the target audio is located in the target audio, and/or the frequency with which the fingerprint information of the target audio occurs in the target audio. For example, the inverted index table is as shown in Table 1 and includes audio 1, audio 2, and audio 3. Audio 1 includes fingerprint information A and B; fingerprint information A is located at 2 s in audio 1 and occurs once in audio 1, and fingerprint information B is located at 16 s in audio 1 and occurs once in audio 1. Audio 2 includes fingerprint information A, which is located at 5 s in audio 2 and occurs once in audio 2. Audio 3 includes fingerprint information A, which is located at 5 s in audio 3 and occurs once in audio 3. As can be seen, fingerprint information A of audio 1 is identical to fingerprint information A of audio 1, audio 2, and audio 3 in the inverted index table, whereas fingerprint information B of audio 1 is identical only to fingerprint information B of audio 1. Fingerprint information B of audio 1 is therefore more unique and more representative, and fingerprint information A of audio 1 is less unique and less representative.
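Table 1
Audio      Fingerprint information    Position in audio    Occurrences in audio
Audio 1    A                          2 s                  1
Audio 1    B                          16 s                 1
Audio 2    A                          5 s                  1
Audio 3    A                          5 s                  1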
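One possible in-memory layout for an inverted index table that also carries position and frequency is sketched below, using the values described for Table 1. The `Posting` record, its field names, and the helper `matching_audios` are hypothetical; the source does not specify a storage format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    """One occurrence record of a fingerprint information in an audio."""
    audio_id: str
    position_s: float   # where the fingerprint is located in the audio, in seconds
    occurrences: int    # how many times it occurs in that audio


# Values taken from the description of Table 1; the layout is an assumption.
inverted_index = {
    "A": [Posting("audio_1", 2.0, 1),
          Posting("audio_2", 5.0, 1),
          Posting("audio_3", 5.0, 1)],
    "B": [Posting("audio_1", 16.0, 1)],
}


def matching_audios(index, fingerprint):
    """Return the audios whose fingerprint information matches the query."""
    return {posting.audio_id for posting in index.get(fingerprint, [])}
```

Here `matching_audios(inverted_index, "A")` yields three audios and `matching_audios(inverted_index, "B")` yields one, which is the difference in uniqueness the text describes.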
S203: Count the total number of audios included in the inverted index table and the number of audios in the inverted index table whose fingerprint information matches the fingerprint information of the target audio (the matching audio quantity).
S204: Calculate the ratio of the total audio quantity to the matching audio quantity.
S205: Determine the inverse document frequency of the fingerprint information of the target audio according to the ratio.
S206: If the inverse document frequency of the fingerprint information of the target audio is lower than expected, delete the fingerprint information of the target audio from the inverted index table. As shown in Table 1, suppose the target audio is audio 1 and the fingerprint information is fingerprint information A; if the inverse document frequency of fingerprint information A of the target audio is lower than expected, fingerprint information A of audio 1 is deleted. Because identical fingerprint information has the same inverse document frequency regardless of which audio it belongs to, if the inverse document frequency of the fingerprint information of the target audio is found to be lower than expected, all fingerprint information in the inverted index table that is identical to it is deleted.
In steps S203 to S206, the electronic device may screen the fingerprint information in the inverted index table by inverse document frequency. Specifically, the electronic device counts the total number of audios in the inverted index table (which may also be the total number of audios in the audio database) and the first audio quantity, that is, the number of audios in the inverted index table whose fingerprint information matches the fingerprint information of the target audio. It then calculates the ratio of the total audio quantity to the first audio quantity and determines the inverse document frequency of the fingerprint information of the target audio from this ratio. The larger the ratio, the fewer audios have matching fingerprint information, the larger the inverse document frequency, and the higher the fingerprint information representativeness of the target audio; the smaller the ratio, the more audios have matching fingerprint information, the smaller the inverse document frequency, and the lower the representativeness. Therefore, if the inverse document frequency of the fingerprint information of the target audio is lower than expected, its representativeness is low, and the fingerprint information of the target audio is deleted from the inverted index table.
For example, as shown in Table 1, the target audio is audio 1, whose fingerprint information includes fingerprint information A and fingerprint information B, and the total audio quantity in the inverted index table is 3. Three audios in the inverted index table have fingerprint information identical to fingerprint information A of audio 1 (first audio quantity 3); taking the inverse document frequency as the ratio of the total audio quantity to the first audio quantity, the inverse document frequency of fingerprint information A is 3/3 = 1. One audio in the inverted index table has fingerprint information B (first audio quantity 1), so the inverse document frequency of fingerprint information B is 3/1 = 3. Assuming the expected value is 2, the inverse document frequency of fingerprint information A is lower than expected and A is deleted from the inverted index table, while the inverse document frequency of fingerprint information B is higher than expected and B is retained. The representativeness of every piece of fingerprint information in the inverted index table can be calculated in this way, and all fingerprint information whose representativeness is lower than expected is deleted, which makes the inverted index table more compact and saves storage space.
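The screening in steps S203 to S206 can be sketched for the three-audio example as follows. The function name `prune_index` and the index layout are illustrative assumptions; the inverse document frequency is taken as the plain ratio used in the example, and keeping fingerprints whose value equals the expected value is also an assumption, since the text only covers the "lower" and "higher" cases.

```python
def prune_index(index, total_audios, expected):
    """Keep only fingerprint information whose ratio-based IDF is not below expected."""
    kept = {}
    for fingerprint, audios in index.items():
        if not audios:
            continue  # nothing matches; skip to avoid dividing by zero
        idf = total_audios / len(audios)  # S204-S205: total / matching audios
        if idf >= expected:               # S206: delete when lower than expected
            kept[fingerprint] = audios
    return kept


# The three-audio example: A matches all three audios, B matches only audio 1.
index = {
    "A": {"audio_1", "audio_2", "audio_3"},
    "B": {"audio_1"},
}
pruned = prune_index(index, total_audios=3, expected=2)
```

With an expected value of 2, fingerprint information A (ratio 1) is dropped and fingerprint information B (ratio 3) is retained, matching the outcome described in the text.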
In one embodiment, if the inverse document frequency of the fingerprint information of the target audio is less than a preset threshold, it is determined that the fingerprint information representativeness of the target audio is lower than expected, where the preset threshold is determined according to the amount of information included in the inverted index table and/or the quantity of fingerprint information included in the inverted index table.
An inverse document frequency below the preset threshold indicates that the representativeness of the fingerprint information of the target audio is low. For example, the more information the inverted index table includes and/or the more fingerprint information it includes, the smaller the preset threshold may be set, so as to delete a large amount of fingerprint information with low representativeness; the less information the inverted index table includes and/or the less fingerprint information it includes, the larger the preset threshold may be set, so as to delete a small amount of fingerprint information with low representativeness.
In one embodiment, assume that the inverted index table includes M audios, that the number of audios in the inverted index table whose fingerprint information matches fingerprint information A of the target audio is V, and that the inverse text frequency of fingerprint information A of the target audio is f. The inverse text frequency of fingerprint information A can then be expressed by the following formula (1).
$$f = \log_{10}(M / V) \tag{1}$$
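Formula (1) and the threshold test described above can be sketched as follows. This is only an illustrative sketch: the index layout (a mapping from fingerprint hash to the set of audio IDs containing it), the function names, and the threshold value 0.2 are assumptions made for the example and are not specified by the patent.

```python
import math
from typing import Dict, Set

# Hypothetical layout: fingerprint hash -> IDs of the audios that contain it.
InvertedIndex = Dict[int, Set[str]]


def inverse_text_frequency(total_audios: int, matching_audios: int) -> float:
    """Formula (1): f = log10(M / V)."""
    if total_audios <= 0 or matching_audios <= 0:
        raise ValueError("M and V must both be positive")
    return math.log10(total_audios / matching_audios)


def prune_by_inverse_text_frequency(index: InvertedIndex, threshold: float) -> InvertedIndex:
    """Return a copy of the index without fingerprints whose f is below the threshold."""
    all_audios = set().union(*index.values()) if index else set()
    m = len(all_audios)  # M: total number of audios in the index
    return {
        fp: audios
        for fp, audios in index.items()
        if inverse_text_frequency(m, len(audios)) >= threshold  # V = len(audios)
    }


index = {
    0xA1: {"song1", "song2", "song3", "song4"},  # matches every audio, so f = 0
    0xB2: {"song1"},                              # matches one audio of four
    0xC3: {"song2", "song3"},                     # matches two audios of four
}
pruned = prune_by_inverse_text_frequency(index, threshold=0.2)
```

In this example the fingerprint that matches all four audios has f = 0 and is dropped, while the two rarer fingerprints are kept.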
In one embodiment, the inverted index table is loaded into a target function in memory; an audio query instruction including an audio segment is received; the fingerprint information of the audio segment is obtained; and the target function is executed to retrieve, according to the inverted index table, the audio associated with the fingerprint information of the audio segment.
To keep retrieval real-time, the electronic device may load the inverted index table into memory. Specifically, the electronic device may load the inverted index table into a target function in memory, where the target function is a function used to retrieve audio and may be a remote procedure call function. When a query instruction is received, the fingerprint information of the audio segment carried in the query instruction is extracted and the target function is executed, so as to retrieve from the audio database, according to the inverted index table, the audio associated with the fingerprint information of the audio segment. The audio that the user wants can thus be retrieved through the inverted index table, which improves retrieval efficiency.
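The query flow just described can be illustrated with a small in-memory lookup. The class name `AudioSearcher`, the vote-counting ranking, and the sample data are assumptions for this sketch; the patent only states that the associated audio is retrieved according to the inverted index table and does not fix a ranking rule.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


class AudioSearcher:
    """Keeps the inverted index in memory and answers fingerprint queries."""

    def __init__(self, index: Dict[int, Set[str]]) -> None:
        self._index = index  # loaded once, reused for every query

    def search(self, query_fingerprints: Iterable[int], top_k: int = 3) -> List[Tuple[str, int]]:
        """Rank audios by how many query fingerprints they share (assumed rule)."""
        votes: Counter = Counter()
        for fp in query_fingerprints:
            for audio_id in self._index.get(fp, ()):  # unknown fingerprints add no votes
                votes[audio_id] += 1
        # Sort by vote count (descending), then by ID so the order is deterministic.
        return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


searcher = AudioSearcher({
    0xB2: {"song1"},
    0xC3: {"song2", "song3"},
    0xD4: {"song2"},
})
# Fingerprints of the audio segment carried in a query; 0xEE is not in the index.
results = searcher.search([0xC3, 0xD4, 0xEE])
```

A real deployment would expose `search` behind whatever remote procedure call mechanism the system uses; that layer is omitted here.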
In one embodiment, the fingerprint information representative degree of the target audio may be the number of matches of the fingerprint information of the target audio, which the electronic device can obtain from historical query records. The historical query records include the fingerprint information of multiple audios and the number of matches of the fingerprint information of each audio, where the number of matches of an audio's fingerprint information is the number of times that fingerprint information has matched the fingerprint information of an audio segment carried in a query instruction. A larger number of matches indicates that users tend to retrieve audio using the fingerprint information of the target audio, so the representative degree of that fingerprint information is higher and its utility is relatively high. A smaller number of matches indicates that users seldom retrieve audio using the fingerprint information of the target audio, so the representative degree is lower and the utility is relatively low. Therefore, when the number of matches of the fingerprint information of the target audio is less than a preset number, the fingerprint information of the target audio is deleted from the inverted index table.
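A minimal sketch of this match-count screening is shown below. The shape of the historical query records (one list of matched fingerprint hashes per past query), the function names, and the preset number 2 are assumptions for illustration only.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def count_matches(history: Iterable[Iterable[int]]) -> Counter:
    """Count how often each fingerprint matched a query in the historical records."""
    counts: Counter = Counter()
    for matched_fingerprints in history:
        counts.update(matched_fingerprints)
    return counts


def prune_by_match_count(
    index: Dict[int, Set[str]], counts: Counter, preset_number: int
) -> Dict[int, Set[str]]:
    """Drop fingerprints whose number of matches is less than the preset number."""
    return {
        fp: audios
        for fp, audios in index.items()
        if counts.get(fp, 0) >= preset_number
    }


history = [[1, 2], [2], [2, 3]]            # three past queries (hypothetical data)
counts = count_matches(history)            # fingerprint 2 matched three times
index = {1: {"a"}, 2: {"b"}, 3: {"c"}, 4: {"d"}}
kept = prune_by_match_count(index, counts, preset_number=2)
```

Fingerprint 4 never appears in the history, so its count defaults to zero and it is removed along with fingerprints 1 and 3.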
In another embodiment, the fingerprint information representative degree of the target audio may include both the number of matches of the fingerprint information of the target audio and the inverse text frequency of that fingerprint information. The electronic device may compute a weighted sum of the number of matches and the inverse text frequency to obtain a representative degree sum. If the representative degree sum is less than a preset value, the fingerprint information of the target audio is poorly representative and can be deleted. For example, assume that the representative degree sum of the target audio is D, the inverse text frequency of the fingerprint information is f with weight k1, and the number of matches of the fingerprint information is S with weight k2; the representative degree sum of the target audio can then be expressed by the following formula (2). Because the inverse text frequency of the fingerprint information of the target audio is related to retrieval efficiency, while the number of matches is related to the retrieval habits and preferences of users, the electronic device can set the weights of the inverse text frequency and of the number of matches according to user requirements. If the weight of the inverse text frequency is larger, the screened inverted index table enables more efficient retrieval; if the weight of the number of matches is larger, the screened inverted index table better fits the retrieval preferences of users.
$$D = f \cdot k_1 + S \cdot k_2 \tag{2}$$
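As a small worked instance of formula (2), the sketch below computes the weighted sum and applies the preset-value test. The function names, the weights, and the preset value 3.0 are hypothetical; the patent leaves the weights to be chosen according to user requirements.

```python
def representative_degree_sum(f: float, s: int, k1: float, k2: float) -> float:
    """Formula (2): D = f * k1 + S * k2."""
    return f * k1 + s * k2


def should_delete(d: float, preset_value: float) -> bool:
    """A fingerprint is deleted when its representative degree sum is below the preset value."""
    return d < preset_value


# Hypothetical numbers: f = 0.5, S = 4 matches, weights k1 = 2.0 and k2 = 0.25.
d = representative_degree_sum(f=0.5, s=4, k1=2.0, k2=0.25)  # 0.5*2.0 + 4*0.25
delete = should_delete(d, preset_value=3.0)
```

Note that f is a logarithm while S is a raw count, so in practice the two weights also have to absorb the difference in scale between the two terms.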
In the embodiments of the present invention, the representative degree of the fingerprint information of the target audio is obtained, and when that representative degree is lower than expected, the fingerprint information of the target audio is deleted from the inverted index table. The fingerprint information in the inverted index table can thus be screened by representative degree, retaining fingerprint information with a higher representative degree and deleting fingerprint information with a lower representative degree, which saves storage space, reduces the resource consumption of the electronic device, and makes the inverted index table more compact. Moreover, because fingerprint information with a lower representative degree performs poorly in retrieval, deleting it does not degrade retrieval performance; on the contrary, retrieval through fingerprint information with a higher representative degree performs better, so screening the inverted index table can improve retrieval performance to a certain extent.
Based on the foregoing description, an embodiment of the present invention provides a schematic structural diagram of an audio processing apparatus. The audio processing apparatus can run on an electronic device, which may include a smartphone, a smartwatch, a computer, or the like. As shown in Fig. 3, the apparatus includes:
Extraction unit 301, configured to extract the audio fingerprint of the target audio.
Acquiring unit 302, configured to obtain an inverted index table, where the inverted index table includes the target audio and the fingerprint information of the target audio, and the fingerprint information of the target audio is the hash value of the audio fingerprint of the target audio; and to obtain the fingerprint information representative degree of the target audio according to the fingerprint information of the target audio, where the fingerprint information representative degree of the target audio is the inverse text frequency of the fingerprint information of the target audio, and the inverse text frequency is inversely proportional to the matching audio quantity.
Deletion unit 303, configured to delete the fingerprint information of the target audio from the inverted index table if the fingerprint information representative degree of the target audio is lower than expected.
Optionally, the extraction unit 301 is configured to perform a time-frequency transform on the target audio to obtain the frequency domain information of the target audio; obtain the energy matrix of the target audio according to the frequency domain information of the target audio; and determine the audio fingerprint of the target audio according to the energy matrix of the target audio.
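The three extraction steps (time-frequency transform, energy matrix, fingerprint) can be illustrated with the rough sketch below. The concrete choices here are assumptions and are not taken from the patent: non-overlapping frames of 64 samples, a plain discrete Fourier transform, four equal-width bands, and a fingerprint bit derived from the sign of band-energy differences between consecutive frames. The final hashing of the fingerprint into fingerprint information is omitted.

```python
import cmath
import math
from typing import List


def energy_matrix(samples: List[float], frame_len: int = 64, bands: int = 4) -> List[List[float]]:
    """Split the signal into frames, transform each frame, and sum power per band."""
    half = frame_len // 2              # keep the non-redundant half of the spectrum
    band_width = half // bands
    matrix = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = []
        for k in range(half):          # naive DFT; fine for a short illustration
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n, x in enumerate(frame)
            )
            power.append(abs(coeff) ** 2)
        matrix.append(
            [sum(power[b * band_width:(b + 1) * band_width]) for b in range(bands)]
        )
    return matrix


def sub_fingerprints(matrix: List[List[float]]) -> List[int]:
    """One integer per pair of consecutive frames, one bit per adjacent band pair."""
    prints = []
    for prev, cur in zip(matrix, matrix[1:]):
        value = 0
        for b in range(len(cur) - 1):
            diff = (cur[b] - cur[b + 1]) - (prev[b] - prev[b + 1])
            value = (value << 1) | (1 if diff > 0 else 0)
        prints.append(value)
    return prints


# A pure tone that completes 4 cycles per 64-sample frame (hypothetical input).
tone = [math.sin(2 * math.pi * 4 * n / 64) for n in range(256)]
matrix = energy_matrix(tone)
prints = sub_fingerprints(matrix)
```

With four bands each sub-fingerprint is a 3-bit integer; production systems typically use more bands and overlapping, windowed frames.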
Optionally, the matching audio quantity is the number of audios in the inverted index table whose fingerprint information matches the fingerprint information of the target audio; the inverted index table includes multiple audios, and the target audio is any one of the multiple audios.
Optionally, the acquiring unit 302 is configured to count the total number of audios included in the inverted index table and the number of audios in the inverted index table whose fingerprint information matches the fingerprint information of the target audio; calculate the ratio between the total number of audios and the number of matching audios; and determine the inverse text frequency of the fingerprint information of the target audio according to the ratio.
Optionally, determination unit 304 is configured to determine that the fingerprint information representative degree of the target audio is lower than expected if the inverse text frequency of the fingerprint information of the target audio is less than a preset threshold, wherein the preset threshold is determined according to the amount of information included in the inverted index table and/or the quantity of fingerprint information included in the inverted index table.
Optionally, the inverted index table further includes position information indicating where the fingerprint information of the target audio is located in the target audio and/or the frequency with which the fingerprint information of the target audio occurs in the target audio.
Optionally, query unit 305 is configured to load the inverted index table into a target function in memory; receive an audio query instruction, where the audio query instruction includes an audio segment; obtain the fingerprint information of the audio segment; and execute the target function to retrieve, according to the inverted index table, the audio associated with the fingerprint information of the audio segment.
In the embodiments of the present invention, the representative degree of the fingerprint information of the target audio is obtained, and when that representative degree is lower than expected, the fingerprint information of the target audio is deleted from the inverted index table. The fingerprint information in the inverted index table can thus be screened by representative degree, retaining fingerprint information with a higher representative degree and deleting fingerprint information with a lower representative degree, which saves storage space, reduces the resource consumption of the electronic device, and makes the inverted index table more compact. Moreover, because fingerprint information with a lower representative degree performs poorly in retrieval, deleting it does not degrade retrieval performance; on the contrary, retrieval through fingerprint information with a higher representative degree performs better, so screening the inverted index table can improve retrieval performance to a certain extent.
Refer to Fig. 4, which is a schematic structural diagram of an electronic device provided in an embodiment of the present invention. The electronic device 1000 includes a processor 1001, a user interface 1003, a network interface 1004, and a storage device 1005, which are connected to one another through a bus 1002.
The user interface 1003 is used for human-computer interaction and may include a display screen, a keyboard, or the like. The network interface 1004 is used for communication connections with external devices. The storage device 1005 is coupled to the processor 1001 and stores various software programs and/or multiple sets of instructions. In a specific implementation, the storage device 1005 may include high-speed random access memory and may also include non-volatile memory, such as one or more disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. The storage device 1005 can store an operating system (hereinafter referred to as the system), such as ANDROID, IOS, or LINUX. The storage device 1005 can also store a network communication program, which can be used to communicate with one or more additional devices, one or more terminal devices, and one or more network devices. The storage device 1005 can also store a user interface program, which can vividly display the content of an application through a graphical operation interface and receive the user's control operations on the application through input controls such as menus, dialog boxes, and buttons. The storage device 1005 can also store one or more applications, such as an audio processing application used to screen the inverted index table.
In one embodiment, the storage device 1005 can also be used to store one or more program instructions. The processor 1001 can call the one or more program instructions to carry out the audio processing method; specifically, the processor 1001 calls the program instructions to execute the following steps:
Extract the audio fingerprint of the target audio;
Obtain an inverted index table, where the inverted index table includes the target audio and the fingerprint information of the target audio, and the fingerprint information of the target audio is the hash value of the audio fingerprint of the target audio;
Obtain the fingerprint information representative degree of the target audio according to the fingerprint information of the target audio, where the fingerprint information representative degree of the target audio is the inverse text frequency of the fingerprint information of the target audio, and the inverse text frequency is inversely proportional to the matching audio quantity;
If the fingerprint information representative degree of the target audio is lower than expected, delete the fingerprint information of the target audio from the inverted index table.
Optionally, the processor 1001 can call the program instructions to execute the following steps:
Perform a time-frequency transform on the target audio to obtain the frequency domain information of the target audio;
Obtain the energy matrix of the target audio according to the frequency domain information of the target audio;
Determine the audio fingerprint of the target audio according to the energy matrix of the target audio.
Optionally, the matching audio quantity is the number of audios in the inverted index table whose fingerprint information matches the fingerprint information of the target audio; the inverted index table includes multiple audios, and the target audio is any one of the multiple audios.
Optionally, the processor 1001 can call the program instructions to execute the following steps:
Count the total number of audios included in the inverted index table and the number of audios in the inverted index table whose fingerprint information matches the fingerprint information of the target audio;
Calculate the ratio between the total number of audios and the number of matching audios;
Determine the inverse text frequency of the fingerprint information of the target audio according to the ratio.
Optionally, the processor 1001 can call the program instructions to execute the following step:
If the inverse text frequency of the fingerprint information of the target audio is less than a preset threshold, determine that the fingerprint information representative degree of the target audio is lower than expected, wherein the preset threshold is determined according to the amount of information included in the inverted index table and/or the quantity of fingerprint information included in the inverted index table.
Optionally, the inverted index table further includes position information indicating where the fingerprint information of the target audio is located in the target audio and/or the frequency with which the fingerprint information of the target audio occurs in the target audio.
Optionally, the processor 1001 can call the program instructions to execute the following steps:
Load the inverted index table into a target function in memory;
Receive an audio query instruction, where the audio query instruction includes an audio segment;
Obtain the fingerprint information of the audio segment;
Execute the target function to retrieve, according to the inverted index table, the audio associated with the fingerprint information of the audio segment.
In the embodiments of the present invention, the representative degree of the fingerprint information of the target audio is obtained, and when that representative degree is lower than expected, the fingerprint information of the target audio is deleted from the inverted index table. The fingerprint information in the inverted index table can thus be screened by representative degree, retaining fingerprint information with a higher representative degree and deleting fingerprint information with a lower representative degree, which saves storage space, reduces the resource consumption of the electronic device, and makes the inverted index table more compact. Moreover, because fingerprint information with a lower representative degree performs poorly in retrieval, deleting it does not degrade retrieval performance; on the contrary, retrieval through fingerprint information with a higher representative degree performs better.
In one embodiment, the processor 1001 can be used to read and execute computer instructions to implement the audio processing method described in Fig. 1 or Fig. 2 of the present application. The principle by which the electronic device provided in the embodiment of the present invention solves the problem is similar to that of the method embodiments described in Fig. 1 and Fig. 2, so for the implementation and beneficial effects of the electronic device, reference may be made to those of the method embodiments; repeated details are not described again.
An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored. For the implementation and beneficial effects of the program, reference may be made to those of the audio processing method described in Fig. 1 and Fig. 2 above; repeated details are not described again.
The above discloses only some embodiments of the present invention, which of course cannot be used to limit the scope of the rights of the present invention; equivalent changes made in accordance with the claims of the present invention therefore still fall within the scope covered by the present invention.
Claims (10)
1. An audio processing method, characterized by comprising:
extracting the audio fingerprint of a target audio;
obtaining an inverted index table, wherein the inverted index table includes the target audio and the fingerprint information of the target audio, and the fingerprint information of the target audio is the hash value of the audio fingerprint of the target audio;
obtaining the fingerprint information representative degree of the target audio according to the fingerprint information of the target audio, wherein the fingerprint information representative degree of the target audio is the inverse text frequency of the fingerprint information of the target audio, and the inverse text frequency is inversely proportional to a matching audio quantity;
if the fingerprint information representative degree of the target audio is lower than expected, deleting the fingerprint information of the target audio from the inverted index table.
2. The method according to claim 1, characterized in that the extracting the audio fingerprint of the target audio comprises:
performing a time-frequency transform on the target audio to obtain the frequency domain information of the target audio;
obtaining the energy matrix of the target audio according to the frequency domain information of the target audio;
determining the audio fingerprint of the target audio according to the energy matrix of the target audio.
3. The method according to claim 1, characterized in that the matching audio quantity is the number of audios in the inverted index table whose fingerprint information matches the fingerprint information of the target audio, the inverted index table includes multiple audios, and the target audio is any one of the multiple audios.
4. The method according to claim 3, characterized in that the obtaining the fingerprint information representative degree of the target audio according to the fingerprint information of the target audio comprises:
counting the total number of audios included in the inverted index table and the number of audios in the inverted index table whose fingerprint information matches the fingerprint information of the target audio;
calculating the ratio between the total number of audios and the number of matching audios;
determining the inverse text frequency of the fingerprint information of the target audio according to the ratio.
5. The method according to claim 4, characterized in that the method further comprises:
if the inverse text frequency of the fingerprint information of the target audio is less than a preset threshold, determining that the fingerprint information representative degree of the target audio is lower than expected, wherein the preset threshold is determined according to the amount of information included in the inverted index table and/or the quantity of fingerprint information included in the inverted index table.
6. The method according to claim 1, characterized in that the inverted index table further includes position information indicating where the fingerprint information of the target audio is located in the target audio and/or the frequency with which the fingerprint information of the target audio occurs in the target audio.
7. The method according to claim 1, characterized in that the method further comprises:
loading the inverted index table into a target function in memory;
receiving an audio query instruction, wherein the audio query instruction includes an audio segment;
obtaining the fingerprint information of the audio segment;
executing the target function to retrieve, according to the inverted index table, the audio associated with the fingerprint information of the audio segment.
8. An audio processing apparatus, characterized by comprising:
an extraction unit, configured to extract the audio fingerprint of a target audio;
an acquiring unit, configured to obtain an inverted index table, wherein the inverted index table includes the target audio and the fingerprint information of the target audio, and the fingerprint information of the target audio is the hash value of the audio fingerprint of the target audio; and to obtain the fingerprint information representative degree of the target audio according to the fingerprint information of the target audio, wherein the fingerprint information representative degree of the target audio is the inverse text frequency of the fingerprint information of the target audio, and the inverse text frequency is inversely proportional to a matching audio quantity;
a deletion unit, configured to delete the fingerprint information of the target audio from the inverted index table if the fingerprint information representative degree of the target audio is lower than expected.
9. An electronic device, characterized in that the electronic device comprises:
a processor, adapted to implement one or more instructions; and
a computer-readable storage medium storing one or more instructions, the one or more instructions being adapted to be loaded by the processor to execute the audio processing method according to any one of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores one or more instructions, the one or more instructions being adapted to be loaded by a processor to execute the audio processing method according to any one of claims 1-7.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910168211.9A CN109871463B (en) | 2019-03-06 | 2019-03-06 | Audio processing method, device, electronic equipment and storage medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN109871463A true CN109871463A (en) | 2019-06-11 |
| CN109871463B CN109871463B (en) | 2024-04-09 |
Family
ID=66919924
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910168211.9A Active CN109871463B (en) | 2019-03-06 | 2019-03-06 | Audio processing method, device, electronic equipment and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN109871463B (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114417024A (en) * | 2021-12-27 | 2022-04-29 | 北京达佳互联信息技术有限公司 | Multimedia search method, apparatus, electronic device, and computer-readable storage medium |
Citations (15)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6345253B1 (en) * | 1999-04-09 | 2002-02-05 | International Business Machines Corporation | Method and apparatus for retrieving audio information using primary and supplemental indexes |
| KR20070050631A (en) * | 2005-11-11 | 2007-05-16 | 삼성전자주식회사 | Audio fingerprint generation and audio data retrieval apparatus and method |
| CN102081666A (en) * | 2011-01-21 | 2011-06-01 | 北京大学 | Index construction method for distributed picture search and server |
| CN103093761A (en) * | 2011-11-01 | 2013-05-08 | 腾讯科技(深圳)有限公司 | Audio fingerprint retrieval method and retrieval device |
| CN103440313A (en) * | 2013-08-27 | 2013-12-11 | 复旦大学 | Music retrieval system based on audio fingerprint features |
| CN104915403A (en) * | 2015-06-01 | 2015-09-16 | 腾讯科技(北京)有限公司 | Information processing method and server |
| CN105447030A (en) * | 2014-08-29 | 2016-03-30 | 阿里巴巴集团控股有限公司 | Index processing method and equipment |
| KR20160100216A (en) * | 2015-02-13 | 2016-08-23 | 레이 왕 | Method and device for constructing on-line real-time updating of massive audio fingerprint database |
| US20160267178A1 (en) * | 2015-03-13 | 2016-09-15 | TCL Research America Inc. | Video retrieval based on optimized selected fingerprints |
| CN106407268A (en) * | 2015-08-25 | 2017-02-15 | Tcl集团股份有限公司 | Method and system for content retrieval based on rate-coverage optimization |
| CN106547805A (en) * | 2015-09-23 | 2017-03-29 | 北京奇虎科技有限公司 | The method and apparatus of optimization database index |
| CN107402965A (en) * | 2017-06-22 | 2017-11-28 | 中国农业大学 | A kind of audio search method |
| CN107562762A (en) * | 2016-07-01 | 2018-01-09 | 中国联合网络通信集团有限公司 | Data directory construction method and device |
| CN107577773A (en) * | 2017-09-08 | 2018-01-12 | 科大讯飞股份有限公司 | Audio matching method and device and electronic equipment |
| CN107731220A (en) * | 2017-10-18 | 2018-02-23 | 北京达佳互联信息技术有限公司 | Audio identification methods, device and server |
Non-Patent Citations (2)
| Title |
|---|
| 张兴忠等: "一种高效过滤提纯音频大数据检索方法", 《计算机研究与发展》, no. 09, 15 September 2015 (2015-09-15) * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN109871463B (en) | 2024-04-09 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||