
CN1266633C - Sound distinguishing method in speech sound inquiry - Google Patents

Sound distinguishing method in speech sound inquiry

Info

Publication number
CN1266633C
CN1266633C CNB021602727A CN02160272A
Authority
CN
China
Prior art keywords
knowledge
query
speech
template
sound
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
CNB021602727A
Other languages
Chinese (zh)
Other versions
CN1514387A (en)
Inventor
丰强泽
曹存根
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CNB021602727A priority Critical patent/CN1266633C/en
Publication of CN1514387A publication Critical patent/CN1514387A/en
Application granted granted Critical
Publication of CN1266633C publication Critical patent/CN1266633C/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current


Landscapes

  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A sound discrimination method in voice query, comprising the steps of: recognizing the speech with an existing speech recognition interface and, after recognition, further: determining a user-customizable knowledge query language; forming a knowledge-based sound discrimination model; and running a fast sound discrimination algorithm based on the knowledge base, the query language and the discrimination model. The present invention places few demands on the external environment and conditions: with just a telephone or a mobile phone, a user can query and learn knowledge by voice in real time at the airport, in a car, at home, in a restaurant or on an outing, which greatly facilitates use.

Description

Sound discrimination method in speech query
Technical field
The present invention relates to sound discrimination technology in speech query, and in particular to a knowledge-based sound discrimination method for speech queries from untrained, unspecified speakers.
Background technology
Knowledge service is an emerging product of the knowledge society. In a knowledge-based society, people's demand for information and knowledge keeps growing, and they wish to obtain the information and knowledge they need anytime and anywhere. An information and knowledge service is a process that satisfies the knowledge requirements raised by a user through knowledge feedback in some form. It is characterized by richness, layering, intelligence and efficiency.
Human-computer interaction studies the techniques by which people and computers interact. The human-machine interface is the interface for the dialogue between a computer and its user, and is an important component of a computer system. With the steady improvement of hardware performance and the emergence of various auxiliary input devices, research on human-computer interfaces is developing in an increasingly multi-channel and intelligent direction. Such interfaces allow the user to use different input channels, such as voice, gesture and handwriting input.
Providing services through voice interaction frees the user's hands and eyes completely: there is no need to type in a query or to watch a screen; the mouth and ears alone are enough to obtain the needed knowledge quickly and easily. Speech interfaces can be widely applied in desktop computers, telephones, mobile phones, PDAs and other high-tech products, offering users great convenience. They place few demands on the external environment and conditions: with just a telephone or mobile phone, a user can query and learn knowledge by voice in real time at the airport, in a car, at home, in a restaurant or on an outing.
Because a voice query system is built on a huge comprehensive knowledge base, the knowledge it can query far exceeds that of an ordinary database system. The knowledge of the various subjects is interlinked, so the system can reason over the connections between subjects, derive knowledge not originally present in the knowledge base, and provide rich and varied knowledge services.
Speech recognition technology has achieved remarkable results and has been applied in many fields, but its recognition rate is still some distance from real application: it is too sensitive to the user and the environment, and it errs especially often when recognizing professional texts. Common speech recognition software such as IBM ViaVoice has built-in error correction and achieves a high recognition rate on general text, but many problems remain, and it is too sensitive to the user's pronunciation and the surrounding environment. If the user has a heavy accent or the ambient noise is loud while speaking, the recognition rate drops sharply. Moreover, to obtain a good recognition rate a user must train on a large number of samples, and even then various errors still occur, which is very unfavorable to the practical application of speech recognition technology. A voice consulting service faces users of all levels, who generally input speech through a telephone or mobile phone. Since telephones shield noise poorly, and users cannot spend much time and effort on advance training, the recognition rate is very low, and the inherent "sound discrimination" ability of these speech recognition packages is not powerful enough, so real-time voice query is difficult to apply widely.
For example, when "which symptoms does diabetes have" was read aloud to IBM ViaVoice 2000 slightly quickly, the recognition result unexpectedly became "which bestir-oneself does diabetes have" (症状 "symptom" was mis-heard as the similar-sounding 振作 "bestir oneself"). The reason is that current voice systems generally analyze text on the basis of a corpus, and thereby miss much important knowledge-level information. Analyzed from the angle of knowledge, "diabetes" is a traditional Chinese medicine concept, and among its associated attributes "symptom" is phonetically closest to "bestir oneself". So after speech analysis and knowledge analysis, replacing "bestir oneself" with "symptom" makes "which bestir-oneself does diabetes have" meaningful.
Of course, such an analysis may yield multiple hypotheses (i.e. candidate replacements); we need to filter the different hypotheses one by one against background knowledge to find the hypothesis that best matches the user's intent.
In recent years, large-scale knowledge acquisition, formalization and analysis have attracted more and more attention. Well-known efforts abroad include the CYC project, BKB, CommonKADS, KIF and WordNet. The American Cyc project manually organizes human common-sense knowledge from the Encyclopaedia Britannica and other knowledge sources to build a huge human common-sense base; the American BKB research is devoted to building a university-level botany knowledge base; the European CommonKADS methodology provides an engineering methodology for developing knowledge systems and designs a knowledge modeling language; KIF, developed by scholars at Stanford University, is an interchange format between different knowledge representations; the WordNet knowledge base, developed by Princeton University, is a huge language knowledge base system. Domestically, the concept of the National Knowledge Infrastructure (NKI) was proposed by the scholar Cao Cungen in 1995. NKI is a huge, sharable, operable knowledge community whose main purpose is to build a massive domain knowledge base. It contains not only the common knowledge of each subject (including medicine, military affairs, physics, chemistry, mathematics, chemical engineering, biology, meteorology, psychology, management, finance, history, archaeology, geography, geology, literature, architecture, music, fine arts, law, philosophy, information science, religion, folklore, and so on), but also incorporates the personal knowledge of experts in each subject, and builds a human common-sense base on the foundation of the domain knowledge.
The voice query system is a multi-user intelligent application system that is built on the massive knowledge in the interdisciplinary NKI knowledge base and queries the knowledge of each subject by voice.
Summary of the invention
The purpose of the present invention is to provide a multi-level, domain-customizable world-knowledge query language as the basis for sound discrimination analysis, together with a sound discrimination model and algorithm that evaluate, quantitatively analyze and correct the voice errors in the speech query text, so as to maximize the error correction rate and bring "computer sound discrimination" to a practical level.
To achieve the above object, a sound discrimination method in speech query comprises the step of recognizing the speech with a speech recognition interface and, after recognition, further comprises the steps of:
determining a user-customizable knowledge query language with inheritance;
forming a knowledge-based sound discrimination model, including:
classifying the errors and analyzing their causes,
establishing a similarity computation model,
determining similarity rules,
defining the sound discrimination trigger conditions;
executing a sound discrimination algorithm based on the knowledge base, the query language and the discrimination model, including:
performing similar-word intelligent segmentation,
matching query templates,
performing knowledge verification.
The present invention places few demands on the external environment and conditions: with just a telephone or mobile phone, a user can query and learn knowledge by voice in real time at the airport, in a car, at home, in a restaurant or on an outing, which greatly facilitates use.
Description of drawings
Fig. 1 is the flowchart of a user voice query: it describes how a user voice query is accepted and how the knowledge answer of the query is returned to the user;
Fig. 2 is the syntax graph of the multi-level user knowledge query language;
Fig. 3 is the flowchart of the sound discrimination system, describing the process by which the NKI knowledge server discriminates the sounds of the user's voice query text;
Fig. 4 is an example of similar-word intelligent segmentation, depicting the steps of performing similar segmentation on the user's voice query text;
Fig. 5 shows experimental result data, listing IBM ViaVoice's recognition results for user voice queries alongside the sound discrimination results of the present invention.
Embodiment
In Fig. 1, the user issues a voice query with a mobile phone, telephone, PDA or similar device. First, an existing speech recognition interface (such as IBM ViaVoice) recognizes the speech and produces the voice query text, which may contain various errors. The sound discrimination system then performs analysis and reasoning on the basis of the knowledge query language and the large-scale knowledge base to obtain the correct user query. Finally, our natural language query module finds the knowledge that meets the user's request and feeds it back to the user. If our knowledge base has no answer, a user-customized knowledge base can be queried through an index, to achieve generality.
In Fig. 3, according to the sound discrimination model, the query template base and the knowledge base, similar-word intelligent segmentation is first performed on the user's voice query text; the query template base is then searched for matching templates, and each candidate template is verified against the knowledge base. If relevant knowledge is found, sound discrimination succeeds: the sentence corresponding to that segmentation result is the discrimination result for the voice query text, and the query answer is fed back to the user.
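As a minimal sketch of the flow in Fig. 1 and Fig. 3, the glue code below wires the three stages together. recognize(), bianyin() and query_knowledge() are stand-in stubs, and all names and canned strings are ours rather than the patent's: the real system calls an external ASR interface (e.g. IBM ViaVoice) and the NKI knowledge base.

```c
#include <assert.h>
#include <string.h>

/* Stub for the external ASR interface: its output may contain voice errors. */
static const char *recognize(const char *audio) {
    (void)audio;
    return "which bestir-oneself does diabetes have";
}

/* Stub for knowledge-based sound discrimination: corrects the mis-heard word. */
static const char *bianyin(const char *text) {
    if (strstr(text, "bestir-oneself") != NULL)
        return "which symptoms does diabetes have";
    return text;
}

/* Stub for the NKI lookup: returns an illustrative answer, or NULL if none. */
static const char *query_knowledge(const char *question) {
    if (strcmp(question, "which symptoms does diabetes have") == 0)
        return "polyuria, polydipsia, weight loss";
    return NULL;
}

/* Top-level flow: recognize, discriminate, then query the knowledge base. */
const char *voice_query(const char *audio) {
    const char *text  = recognize(audio);
    const char *fixed = bianyin(text);
    return query_knowledge(fixed);
}
```

With these stubs the mis-recognized question from the Background example still reaches the knowledge base, because the discrimination stage repairs it first.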
1. We introduce the multi-level, domain-customizable knowledge query language and its storage module in the present invention.
First, we cluster all the attributes in the knowledge base, grouping attributes with similar query patterns and extracting their common query pattern, to form a knowledge query language with inheritance; next we define the question forms of specific domains; finally we use a compiler to generate the query template set automatically.
Basic symbols:
■ defquery: the leading keyword of a query language.
■ inherits: inheritance between query languages. A language inherits all of its parent layers, so its expressive power is stronger than theirs.
■ <explanation of this layer>: a character string explaining this layer of the language.
■ question trigger: the trigger condition of a user question. As soon as a user question triggers this condition, the query action getc(A, C') or getv(C, A) is executed immediately.
■ <?C>: the identifier variable of the concept to be queried.
■ <?C'>: the identifier variable of the related concept to be queried.
■ <?C>={getc(A, C')}: extract from the knowledge base all concepts C whose slot A has the value C'.
■ <?C'>={getv(C, A)}: extract from the knowledge base the value of concept C on slot A.
■ <domain-customizable term>: either a general keyword that may appear in a user question, or a customizable term variable of the domain.
■ <X|Y|...|Z>: a notation we invented, with two meanings. First, X, Y, ..., Z are query language keywords. Second, using X, Y, ..., or Z in a user query means the same thing and obtains the same answer. In Backus-Naur form, <X|Y|...|Z> ::= X|Y|...|Z. We call X, Y, ..., Z mandatory words: exactly one of them must appear at the current position.
■ [<X|Y|...|Z>]: the words X, Y, ..., Z may be omitted at this position; we call them omissible words, and [] the omission symbol.
■ <!question descriptor>: a cluster of words with the same or similar meaning, e.g. <!what-interrogative> = <what | which | ...>.
■ <question pattern of ?C>: the possible question forms when querying <?C>. Its grammar is: ?C <domain-customizable interrogative>.
■ <question pattern of ?C'>: the possible question forms when querying <?C'>. Its grammar is: ?C' <domain-customizable interrogative>.
The Backus-Naur form of the general query language is as follows:
defquery <this layer language> [inherits <parent layer>]
{
explanation: <explanation of this layer>
question trigger: <domain-customizable term>, <?C>={getc(A, C')}, <domain-customizable term>, <?C'>={getv(C, A)}, <domain-customizable term>
question pattern of <?C>
question pattern of <?C'>
}
As a concrete application of the general query language, take "event location" as an example. The question topic for "event location" is described as follows:
defquery event-location ()
{
explanation: used to ask about the place of an event.
question trigger 1: <?C>={getc(A, C')}; <?adverb>; [<is|for>] [<at|in>]; <?C'>={getv(C, A)}; <?event>
question pattern of ?C: ?C <!what-interrogative> <?noun>
question pattern of ?C': ?C' <!place-interrogative>
}
The language "defquery event-location" has one question trigger; depending on the situation, a designer may define arbitrarily many. Using this language, the designer can define more specific event-location query languages. For a specific domain, for example, to define the query languages "birthplace" and "scene of occurrence", the designer can simply use inheritance, as follows:
defquery birthplace ({?event=<born|birth>, ?noun=<person>}) inherits event-location
defquery scene-of-occurrence ({?event=<happen|occur>, ?noun=<person>}) inherits event-location
To facilitate template matching, we compile the defined knowledge query languages into knowledge query templates with a compiler and write them into the query template base.
For example, the query templates compiled from the query language corresponding to the attribute "birthplace" are:
#birthplace
<C>; [<is|for>] [<at|in>]; <!place-interrogative>; <born|birth> @C'
<!what-interrogative> <person>; [<is|for>] [<at|in>]; <C'>; <born|birth> @C
Here "@C'" indicates that the template asks for an attribute value, i.e. the value of the attribute "birthplace" of some concept C; "@C" indicates that the template asks for a concept, i.e. which concept in the knowledge base has C' as the value of its attribute "birthplace".
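To make the compiled template form concrete, here is a small hedged sketch in C of how a template line of alternatives and omissible slots might be matched against a segmented query. The Slot type and matchTemplate are our illustration, not the patent's actual data layout (which appears later in the embodiment); concept slots such as <C>, which bind variables, are left out for brevity, so this sketch handles keyword slots only.

```c
#include <assert.h>
#include <string.h>

/* One template slot: a set of alternative keywords, possibly omissible. */
typedef struct {
    const char *alt[4];  /* alternative keywords, as in <X|Y|...|Z> */
    int nalt;            /* how many alternatives are filled in     */
    int optional;        /* 1 if the slot is wrapped in [ ]         */
} Slot;

/* Greedy left-to-right match of segmented query words against the
   template; succeeds when every mandatory slot is consumed in order
   and no query word is left over. */
int matchTemplate(const char **words, int nwords,
                  const Slot *tmpl, int nslots) {
    int w = 0;
    for (int s = 0; s < nslots; s++) {
        int hit = 0;
        if (w < nwords)
            for (int a = 0; a < tmpl[s].nalt; a++)
                if (strcmp(words[w], tmpl[s].alt[a]) == 0) { hit = 1; break; }
        if (hit) { w++; continue; }
        if (!tmpl[s].optional)
            return 0;   /* a mandatory slot found no word */
    }
    return w == nwords;
}
```

For instance, the template { [<in>]; <which|what>; <city> } matches both "which city" and "in which city" but rejects "which town".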
2. We introduce the sound discrimination model in the present invention. When an unspecified speaker without voice training issues a voice query in a nonspecific setting, current speech recognition technology can hardly achieve a satisfactory result, owing to noise, the telephone line and the speaker's pronunciation; the recognized text can contain all kinds of errors, some so unconventional that a human reader cannot tell what was meant. Therefore, to make the computer truly "discriminate sounds", we first need to design a sound discrimination model that classifies, quantitatively analyzes and accurately evaluates the voice errors a user may produce.
The sound discrimination model comprises: the causes of errors, the similarity computation, the trigger conditions of sound discrimination, the selection rules for the optimal solution among multiple discrimination results, the knowledge inference mechanism, and so on. We aim at an optimal balance: correcting as many wrong characters as possible (even quite unconventional ones) while ensuring that correct characters are not "corrected" by mistake; such an optimum is hard to reach in practice. In the example above, reading "which symptoms does diabetes have" to IBM ViaVoice yielded "which bestir-oneself does diabetes have". The reason is that ViaVoice does not analyze from the angle of knowledge: it considers "bestir oneself" a word in its own right, and its distance from "symptom" is not very close, i.e. the similarity is not high enough, so it does not correct the error. This is of course a safe strategy that guarantees correct characters and words are not wrongly changed, but it lowers the error correction rate and hurts recognition accuracy. We need to combine ontology and knowledge to study a sound discrimination model that reaches the optimal balance, so as to maximize the error correction rate.
1) Causes of errors. Because the user queries by voice, the errors in the voice query text are all voice errors. Their characteristic is that a wrong character need not be similar in shape to the intended one, but is identical or similar in pronunciation. In the example above, 振 ("bestir") and 症 ("disease") sound nearly the same, while 作 ("work") and 状 ("shape") sound different but similar.
2) Error classification. From the angle of knowledge, the errors users produce fall into the following three classes:
● Concept errors
Example 1: how many people does "proposal" have
Correct: how many people does the Yi nationality (distributed over Yunnan, Sichuan and Guizhou) have
Example 2: which raw materials does "yellow flag" have
Correct: which raw materials does the Radix Astragali have
"Proposal" in Example 1 and "yellow flag" in Example 2 are both concept errors: a knowledge base concept is mis-recognized, while the question pattern is perfectly correct.
● Pattern errors
Example 3: which "those" cities does China have
Correct: which cities does China have
The corresponding knowledge query template is:
<C>; <govern|comprise|have>; <!what-interrogative> [<!place-noun>] @C'
(the cluster "!what-interrogative" contains "which" but not "those")
Example 4: the United States "and ten" independent
Correct: when did the United States become independent
The corresponding knowledge query template is:
<C>; <!time-interrogative>; <independent|free> @C'
(the cluster "!time-interrogative" contains "when" but not "and ten")
The characteristic of such errors is that the knowledge base concept C is perfectly correct but the question pattern is wrong; we call these pattern errors.
● Mixed errors
Example 5: "bearing" is "to examine" independent
Correct: when did Cambodia become independent
The characteristic of such errors is that a concept error and a template error occur at the same time.
3) Similarity computation. The wrong characters we correct share a common trait: the wrong character is phonetically similar to the correct one. We therefore need similarity computation to determine which characters to correct and how to correct them. To evaluate voice errors accurately, the present invention proposes a similarity computation model ("similar" in the present invention always means phonetically similar).
Similarity expresses the degree of resemblance between two characters or between two words; its range is [0,1]. From the angle of pinyin, a Chinese character C consists of an initial and a final, so we can represent a character as (ic, v), where ic and v are the character's initial and final respectively (for a character without an initial, ic is empty). Thus the characters 是 "shi" and 四 "si" are represented (sh, i) and (s, i); this representation coincides with toneless pinyin. Although GB-2312 contains more than 6700 characters, all of them reduce to about 400 such classes. We then analyze these 400 classes from a phonetic angle and summarize the pronunciation similarity between classes; Table 1 lists part of the inter-class similarity data.
Given any two characters C1=(ic1, v1) and C2=(ic2, v2), we define their pronunciation similarity PSIM(C1, C2) as:
● 1, if ic1=ic2 and v1=v2;
● CSIM([(ic1, v1)], [(ic2, v2)]), if ic1≠ic2 or v1≠v2.
The pronunciation similarity between two Chinese words W1=C1C2...Cn and W2=D1D2...Dn is:
PSIM(W1, W2) = Σ PSIM(Ci, Di) / n
Class 1        Class 2      CSIM(Class 1, Class 2)
[(b,ai)]       [(b,ei)]     0.8
[(ch,i)]       [(c,i)]      0.92
[(ch,i)]       [(q,i)]      0.8
[(k,e)]        [(g,e)]      0.75
[(zh,eng)]     [(zh,en)]    0.95
[(zh,uang)]    [(z,uo)]     0.7
[(sh,i)]       [(s,i)]      0.92
[(sh,i)]       [(s,e)]      0.65
[(y,un)]       [(y,uan)]    0.7
...            ...          ...
Table 1. Pronunciation similarity between classes
We further introduce several definitions:
Definition 1 (homophone). If the similarity between a character C and the source character C' is 1, then C is a homophone of C'.
Definition 2 (similar character). If the similarity between a character C and the source character C' is greater than a threshold μ1, then C is a similar character, and C is similar to C'.
Definition 3 (similar word). If the similarity between a word W and the source word W' is greater than a threshold μ2, and the characters in the word are pairwise similar, then W is a similar word, and W is similar to W'.
Definition 4 (exact word). If a word W appears at the corresponding position of the original text, then W is an exact word.
By experimental testing, μ1=0.6 and μ2=0.7.
For example, for "symptom" (症状) and "bestir oneself" (振作):
PSIM(症, 振) = CSIM([(zh,eng)], [(zh,en)]) = 0.95 > μ1
PSIM(状, 作) = CSIM([(zh,uang)], [(z,uo)]) = 0.7 > μ1
Since 症/振 and 状/作 are pairwise similar, and PSIM(症状, 振作) = [PSIM(症,振)+PSIM(状,作)]/2 = [0.95+0.7]/2 = 0.825 > μ2, "bestir oneself" is similar to "symptom", with similarity 0.825.
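The similarity computation can be sketched directly in C. The CSIM_TABLE below hard-codes a few rows of Table 1 (a real system would carry all ~400 classes), and all type and function names are ours, not the patent's.

```c
#include <assert.h>
#include <math.h>
#include <string.h>

/* A character class is its toneless pinyin: initial + final. */
typedef struct { const char *ic; const char *v; } Syllable;
typedef struct { Syllable a, b; double sim; } ClassSim;

/* Illustrative subset of Table 1. */
static const ClassSim CSIM_TABLE[] = {
    {{"zh","eng"},  {"zh","en"}, 0.95},
    {{"zh","uang"}, {"z","uo"},  0.70},
    {{"sh","i"},    {"s","i"},   0.92},
};

static int same(Syllable x, Syllable y) {
    return strcmp(x.ic, y.ic) == 0 && strcmp(x.v, y.v) == 0;
}

/* Inter-class similarity CSIM: symmetric table lookup, 0 if unlisted. */
static double csim(Syllable x, Syllable y) {
    for (size_t i = 0; i < sizeof CSIM_TABLE / sizeof *CSIM_TABLE; i++) {
        const ClassSim *e = &CSIM_TABLE[i];
        if ((same(x, e->a) && same(y, e->b)) ||
            (same(x, e->b) && same(y, e->a)))
            return e->sim;
    }
    return 0.0;
}

/* PSIM for single characters: 1 on an identical class, else CSIM. */
double psim_char(Syllable c1, Syllable c2) {
    return same(c1, c2) ? 1.0 : csim(c1, c2);
}

/* PSIM for equal-length words: average of the per-character PSIM. */
double psim_word(const Syllable *w1, const Syllable *w2, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += psim_char(w1[i], w2[i]);
    return s / n;
}
```

With these rows, psim_word for 症状 (zheng, zhuang) against 振作 (zhen, zuo) reproduces the 0.825 of the worked example above.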
4) Similarity rules. When analyzing a user query for sound discrimination, the errors are often far off the mark and the similarity to the correct sentence is not high, so we set the thresholds for similar characters and similar words very low. A sentence then produces thousands of candidate analyses, giving the discrimination core a huge workload. To discriminate quickly, we generate these candidates according to definite rules, so that the correct result appears as early as possible.
For example, in the similarity analysis of the voice query "the United States and ten independent", the similar words headed by "U.S." include "the United States", "Mekong", "foreign country", "attractive", "Weber", "the U.S.", etc.; the similar words headed by "and" include "what food", "when", "what", "suitable", etc. Combining these yields several thousand candidates for this one voice query, each of which would need to be analyzed, so we compare the priority between similar words and process the most similar words first.
Priority comparison between words splits into three cases: exact word vs. exact word, similar word vs. similar word, and exact word vs. similar word. We have summarized a priority rule for each case:
● If both words are exact words, the longer one has priority. In the example above, "the United States" takes precedence over "U.S.".
● If both words are similar words, the one with more homophone characters has priority; if the homophone counts are equal, the one with the higher similarity has priority. In the example above, "when" takes precedence over "how".
● If one word is an exact word and the other a similar word, the similar word takes precedence over the exact word only if the similar word's character count is at least twice the exact word's character count, and the number of homophone characters in the similar word is at least the exact word's character count. In the example above, the similar word "when" takes precedence over the exact word "and".
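The three priority rules can be condensed into one comparison function. The Cand structure is our hypothetical encoding of the quantities the rules compare (exactness, character count, homophone count, similarity); it is an illustration, not the patent's data layout.

```c
#include <assert.h>

typedef struct {
    int exact;   /* 1 = exact word, 0 = similar word */
    int len;     /* number of characters             */
    int homo;    /* number of homophone characters   */
    double sim;  /* similarity to the source word    */
} Cand;

/* Returns 1 if a should be processed before b under the three rules. */
int higher_priority(Cand a, Cand b) {
    if (a.exact && b.exact)            /* rule 1: longer exact word first */
        return a.len > b.len;
    if (!a.exact && !b.exact) {        /* rule 2: more homophones first,  */
        if (a.homo != b.homo)          /*         then higher similarity  */
            return a.homo > b.homo;
        return a.sim > b.sim;
    }
    /* rule 3: the similar word beats the exact word only when its length
       is at least twice the exact word's length and it has at least as
       many homophone characters as the exact word has characters */
    const Cand *s = a.exact ? &b : &a;   /* the similar word */
    const Cand *e = a.exact ? &a : &b;   /* the exact word   */
    int similar_wins = s->len >= 2 * e->len && s->homo >= e->len;
    return a.exact ? !similar_wins : similar_wins;
}
```

For instance, a four-character similar word with two homophone characters outranks a one-character exact word, matching the "when" vs. "and" example above.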
5) Trigger conditions of sound discrimination, i.e. when to discriminate the sounds of a user voice query. Sound discrimination takes a certain amount of time, and the voice query text produced by the recognition software may be wrong or may be perfectly correct. We cannot run sound discrimination every time, so trigger conditions must be defined.
First, the original query text is segmented and matched against the knowledge query templates. Sound discrimination is triggered when any of the following occurs:
● segmentation fails;
● no knowledge query template matches the original query text at all;
● a matching knowledge query template is found, but it differs too much from the original query text (template character count / original text character count < 0.7);
● a fully matching knowledge query template is found, but no relevant knowledge is found in the knowledge base.
If relevant knowledge is found for the original query text, the text is error-free and the knowledge is fed back to the user.
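The four trigger conditions reduce to a simple predicate. This is a sketch, assuming the segmentation/matching stage summarizes its outcome in the flags below; the parameter names are ours, not the patent's.

```c
#include <assert.h>

/* 1 = run sound discrimination, 0 = text is error-free, answer directly. */
int should_discriminate(int seg_ok, int template_found,
                        int template_chars, int query_chars,
                        int knowledge_found) {
    if (!seg_ok)          return 1;  /* segmentation failed                */
    if (!template_found)  return 1;  /* no matching query template         */
    if ((double)template_chars / query_chars < 0.7)
                          return 1;  /* match differs too much from text   */
    if (!knowledge_found) return 1;  /* matched, but no knowledge found    */
    return 0;
}
```

Only when all four checks pass is the original text taken as error-free and answered directly.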
3. We introduce the sound discrimination algorithm in the present invention.
The essence of the sound discrimination algorithm is to find the linguistic form most similar to the user's voice query text, under the guidance of the multi-level, domain-customizable knowledge query language and the NKI knowledge base.
Basic symbols:
Knowledge base dictionary: char *knodic[knodic_num];
Query template dictionary: char *keydic[keydic_num];
Similar character structure:
typedef struct class_simzidata {
    char zi[2];              /* similar character */
    int simdegree;           /* similarity of this character to the source character */
    int dic_flag;            /* whether the query language or knowledge base has a word headed by this character */
} class_simzidata;
Similar character table structure:
typedef struct class_simzitable {
    char zi[2];              /* character */
    long keydic_lb;          /* start position of this character in the query template dictionary */
    long keydic_hb;          /* end position of this character in the query template dictionary */
    long knodic_lb;          /* start position of this character in the knowledge base dictionary */
    long knodic_hb;          /* end position of this character in the knowledge base dictionary */
    int simzi_num;           /* number of similar characters of this character */
    class_simzidata *simzi;  /* information of each similar character */
} class_simzitable;
Word structure in segmentation:
typedef struct phrase {
    char *phrase_str;        /* content of this word */
    long lexi_no;            /* position index of this word in the query template base */
    int var_flag;            /* whether this word is a knowledge base concept or a query template word */
} phrase;
Sentence segmentation information table:
typedef struct decompose_info {
    int phrase_count;            /* number of words contained */
    int var_phrase_count;        /* number of concepts */
    struct phrase *phrase_head;  /* information of each word in this segmentation result */
} decompose_info;
Feedback information table for a user question:
typedef struct info_table {
    char *access_time;           /* access time */
    char *action;                /* action: query or add */
    char *question;              /* corresponding complete question */
    char match_type[6];          /* exact or fuzzy matching */
    char *query_type;            /* query type of the user question */
    char *concept;               /* concept */
    char *attr_name;             /* attribute name */
    char *attr_value;            /* attribute value */
    int var_num;                 /* number of concepts */
    char *var_list[VAR_COUNT];   /* variable list */
    char *answer;                /* answer fed back */
} info_table;
Variable descriptions:
question: the user voice query text
IdentifyInfoTable: the knowledge feedback information obtained by sound discrimination
IdentifyResult: the sound discrimination result
wordsegment: one similar segmentation result of the user voice query text
sen_set: the candidate template set
sen: one candidate template
SimziList: the set of characters similar to a given character
SimciList: the set of similar words, sorted by similarity in descending order
Success: the flag of sound discrimination success
Function descriptions:
AddSegTail(wordsegment, Wi): append word Wi to segmentation result wordsegment
CompWordSim(W1, W2): compute the similarity value of words W1 and W2
GetText(wordsegment): obtain the sentence corresponding to segmentation result wordsegment
InsertSimci(SimciList, W, simdata): insert similar word W and its similarity value simdata into SimciList, keeping SimciList in descending order of similarity
Main sound-discrimination routine:
Input: user speech query text question
Output: sound-discrimination result IdentifyResult, knowledge feedback information IdentifyInfoTable
void IdentifyProun(char *question, decompose_info wordsegment)
{
    // if sound discrimination has already succeeded, return
    if (Success == 1)
        return;
    if (question is empty)
    {
        // segmentation of this sentence is finished, so a complete
        // segmentation result has been obtained; verify the match
        IdentifyInfoTable = ProcessSegment(wordsegment);
        // if this segmentation found relevant knowledge, discrimination succeeds
        if (IdentifyInfoTable is non-NULL)
        {
            Success = 1;
            // the sentence corresponding to this segmentation is the
            // sound-discrimination result
            IdentifyResult = GetText(wordsegment);
        }
    }
    else
    {
        // continue segmentation
        Char = question[0];
        // find the similar-character set SimziList of Char
        for every Si in SimziList
        {
            // search the knowledge-base dictionary for words headed by Si
            if (Si.knodic_lb > 0)
            {
                for (i = Si.knodic_lb; i <= Si.knodic_hb; i++)
                {
                    // obtain the string corresponding to this word in the original user query
                    Initword = SubString(question, 0, len(knodic[i]));
                    // compute the similarity of this word and the original string
                    simdata = CompWordSim(knodic[i], Initword);
                    if (simdata > similar-word threshold)
                    {
                        // if similar, insert this word into the similar-word
                        // list in descending order of priority
                        InsertSimci(SimciList, knodic[i], simdata);
                    }
                }
            }
            // search the query template dictionary for words headed by Si
            if (Si.keydic_lb > 0)
            {
                for (i = Si.keydic_lb; i <= Si.keydic_hb; i++)
                {
                    // obtain the string corresponding to this word in the original user query
                    Initword = SubString(question, 0, len(keydic[i]));
                    // compute the similarity of this word and the original string
                    simdata = CompWordSim(keydic[i], Initword);
                    if (simdata > similar-word threshold)
                    {
                        // if similar, insert this word into the similar-word
                        // list in descending order of priority
                        InsertSimci(SimciList, keydic[i], simdata);
                    }
                }
            }
        }
        // generate segmentations in descending order of similarity priority
        for every Wi in SimciList
        {
            // append this similar word to the current segmentation result
            AddSegTail(wordsegment, Wi);
            // obtain the still-unprocessed remainder of the string
            RemainStr = SubString(question, len(Wi), len(question) - len(Wi));
            // recursively process the remaining string
            IdentifyProun(RemainStr, wordsegment);
        }
    }
}
Matching verification program:
Input: one segmentation result wordsegment of the user's query sentence
Output: the feedback information table of this segmentation result
info_table ProcessSegment(decompose_info wordsegment)
{
    // take the intersection of the location-index sets, in the query template
    // library, of each word in wordsegment, obtaining the occurrence space of
    // this segmentation result in the query template library
    sen_set = GetIntersection(wordsegment);
    // judge and screen each candidate template: does it match wordsegment?
    for every sen in sen_set
    {
        if (wordsegment.variable count != sen.variable count)
            continue;  // no match
        if (wordsegment.word count < sen.required-word count
            || wordsegment.word count > sen.word count)
            continue;
        if ({sen's required-word position sequence} - {positions in the template
            of wordsegment's non-variable words} != {wordsegment's variables})
            continue;
        // if this template satisfies the above conditions and also passes
        // knowledge verification, the template match succeeds
        query_info_table = VerifyKnowledge(sen);
        if (query_info_table.answer != NULL)
            return query_info_table;
    }
    return empty;
}
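To make the control flow of IdentifyProun and ProcessSegment concrete, the following is a minimal runnable sketch in Python. The toy dictionaries (KNODIC, KEYDIC), the toy template table, the character-overlap similarity, and the 0.5 threshold are all illustrative stand-ins, not data or formulas from the patent.

```python
# Toy stand-ins for the knowledge-base and query-template dictionaries.
KNODIC = ["beijing", "bei", "jing"]                       # knowledge-base concepts
KEYDIC = ["population"]                                   # query-template keywords
TEMPLATES = {("beijing", "population"): "Beijing: 21M"}   # toy template library

def word_sim(a: str, b: str) -> float:
    """Crude stand-in for CompWordSim: fraction of matching characters."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)

def identify(question: str, seg: list, state: dict) -> None:
    """Recursive similar-word segmentation with early termination (IdentifyProun)."""
    if state.get("result") is not None:        # Success flag: prune remaining branches
        return
    if not question:                           # a complete segmentation: verify it
        answer = TEMPLATES.get(tuple(seg))     # stands in for ProcessSegment
        if answer is not None:
            state["result"] = (seg, answer)
        return
    # Collect dictionary words whose spelling is similar to a prefix of the
    # remaining question text (threshold 0.5 is arbitrary for this demo).
    cands = []
    for w in KNODIC + KEYDIC:
        s = word_sim(w, question[:len(w)])
        if s > 0.5:
            cands.append((s, w))
    cands.sort(reverse=True)                   # descending similarity, like InsertSimci
    for s, w in cands:
        identify(question[len(w):], seg + [w], state)   # recurse on the remainder

state = {}
identify("beijingpopulasion", [], state)       # a mis-recognized query text
print(state["result"])
```

Because candidates are explored in descending similarity order and the first verified segmentation sets the success flag, lower-similarity branches are never expanded, mirroring the early-exit behavior of the patent's routine.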
As shown in Figure 3, the processing steps of sound discrimination are as follows:
1) According to the sound-discrimination model, the query template library, and the knowledge base, perform similar intelligent word segmentation on the user's speech query text; whenever a segmentation result is obtained, go to step 2).
2) Retrieve the query template library according to the segmentation result to find the matching templates, then judge whether each template formally matches the current segmentation result, thereby obtaining the candidate template set.
3) Perform knowledge verification on each candidate template: retrieve the knowledge base according to the question type of the template and the implemented KAPI functions.
If relevant knowledge is found, sound discrimination succeeds: the sentence corresponding to this segmentation result is the sound-discrimination result of the user's query text, and the query answer is fed back to the user.
If no relevant knowledge is found, go back to step 1) and continue similar word segmentation.
Below, each part is elaborated in detail.
I. Similar intelligent word segmentation
The dictionaries used for segmentation are the knowledge-base dictionary and the query template dictionary. The knowledge-base dictionary contains all concepts that appear in the knowledge base; the query template dictionary contains all keywords that appear in the query template library, together with their positions in the template library. A word appearing in the user's query text may be a similar word of a knowledge-base concept or a similar word of a query-template word.
The segmentation performed here is similar-word segmentation: it generates all segmentation results phonetically similar to the original query sentence. Experimental analysis shows that, for speaker-independent speech queries in unconstrained settings, the errors in the recognition result often differ greatly from the correct result, so we set the similarity threshold very low to improve discrimination accuracy. As a consequence, the number of similar segmentation results becomes very large, reaching several thousand or even tens of thousands. We therefore sort the similar words and produce the segmentation results in descending order of similarity; whenever a segmentation result is obtained, it is immediately verified against the template library and the knowledge base. As soon as relevant knowledge is found, discrimination succeeds and the procedure returns at once, so the lower-similarity segmentation results that would have come later are never generated. This greatly reduces the time complexity of sound discrimination.
An example is shown in Fig. 4; the segmentation results in the dotted portion are never computed.
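The descending-order bookkeeping behind this early exit is the InsertSimci operation; a small Python sketch follows (the word strings and similarity scores are invented for illustration):

```python
import bisect

def insert_simci(simci_list, word, simdata):
    """Insert (word, simdata) into simci_list, keeping it sorted by
    similarity in descending order. bisect assumes ascending keys, so
    we search on the negated similarities."""
    keys = [-s for _, s in simci_list]
    pos = bisect.bisect_right(keys, -simdata)
    simci_list.insert(pos, (word, simdata))

simci = []
for word, score in [("shiyan", 0.6), ("shian", 0.9), ("shyan", 0.75)]:
    insert_simci(simci, word, score)
print(simci)   # highest-similarity candidate comes first
```

Keeping the list sorted on insertion means the segmentation generator can simply iterate front to back and stop at the first verified result.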
II. Template matching
Template matching is essentially the problem of deciding which class a sample belongs to: the user's question sentence is the sample to be analyzed, and each template in the query template library is a class of question forms.
The steps of template matching are as follows.
For a given similar segmentation result of the user's query sentence:
1) First, according to the location index of each keyword in the template library, find the keywords' occurrence spaces; then take their intersection to obtain the occurrence space of this segmentation result in the template library.
2) Screen the candidate templates in the occurrence space. The screening conditions are:
● the number of variables in the segmentation result = the number of variables in the template;
● the number of required words in the template <= the total number of words in the segmentation result <= the total number of words in the template;
● the segmentation result must contain every required word of the template, i.e. {required-word position sequence in the template} − {position sequence in the template of each non-variable word in the segmentation result} = {all variables appearing in the segmentation result};
● the order in which the words occur in the segmentation result is consistent with the order in which they occur in the template. This condition decides whether the matching is ordered; considering the freedom with which users phrase questions, it can be dropped to realize unordered matching.
Screening by these conditions yields the set of candidate templates that formally match this segmentation result.
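The formal screening can be sketched as a filter function in Python. The template representation (a dict with var_count, required, word_count) and all example words are invented for illustration; the order condition is omitted, i.e. this is the unordered-matching variant.

```python
def formally_matches(seg_words, seg_var_count, template):
    """Screening conditions for one candidate template (unordered variant)."""
    # equal numbers of variables
    if seg_var_count != template["var_count"]:
        return False
    # required-word count <= segmentation word count <= template word count
    n = len(seg_words)
    if n < len(template["required"]) or n > template["word_count"]:
        return False
    # every required word of the template must appear in the segmentation
    if not set(template["required"]) <= set(seg_words):
        return False
    return True

tmpl = {"var_count": 1, "required": ["population", "of"], "word_count": 4}
print(formally_matches(["population", "of", "<city>"], 1, tmpl))   # matches
print(formally_matches(["population", "<city>"], 1, tmpl))         # "of" missing
```

Templates passing this filter become the candidate set handed to knowledge verification.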
III. Knowledge verification
The candidate templates obtained at this point still need a knowledge check: according to the attribute and question type corresponding to each template, we call the corresponding knowledge-base API function and see whether a correct answer can be found.
KAPI is the set of knowledge-base interface functions we developed; it provides services for upper-level applications. Common KAPI functions include:
// obtain an attribute value from a concept and an attribute
get_attribute_value(concept, attribute), abbreviated getv(C, A)
// obtain concepts from an attribute and an attribute value
get_concepts(attribute, attribute_value), abbreviated getc(A, C')
// obtain all attributes of a concept
get_all_attributes(concept)
// isa reasoning: judge whether one concept is (a kind of) another concept
isa_reasoning(concept1, concept2)
// partof reasoning: judge whether one concept is a part of another concept
partof_reasoning(concept1, concept2)
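The KAPI lookups can be sketched as Python functions over a toy triple store; the knowledge-base contents and the isa hierarchy below are invented for illustration and do not come from the NKI knowledge base.

```python
# Toy triple store standing in for the knowledge base (illustrative data).
KB = [
    ("Beijing", "capital-of", "China"),
    ("Beijing", "population", "21M"),
    ("Shanghai", "population", "24M"),
]
ISA = {("Beijing", "city"), ("city", "place")}

def getv(concept, attribute):
    """get_attribute_value: attribute values of a concept."""
    return [v for c, a, v in KB if c == concept and a == attribute]

def getc(attribute, value):
    """get_concepts: concepts having a given attribute value."""
    return [c for c, a, v in KB if a == attribute and v == value]

def get_all_attributes(concept):
    """All attributes recorded for a concept."""
    return [a for c, a, _ in KB if c == concept]

def isa_reasoning(c1, c2):
    """Transitive isa reasoning over the toy hierarchy."""
    if (c1, c2) in ISA:
        return True
    return any(isa_reasoning(mid, c2) for a, mid in ISA if a == c1)

print(getv("Beijing", "population"))       # ['21M']
print(getc("population", "24M"))           # ['Shanghai']
print(isa_reasoning("Beijing", "place"))   # True via Beijing -> city -> place
```

Knowledge verification then amounts to choosing the right lookup for the template's question type and checking whether it returns a non-empty answer.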
IV. Experimental data
Using IBM ViaVoice 2000 as the speech recognition interface, several speakers who had received no voice training read 100 questions aloud in a noisy environment; Fig. 5 lists part of the data. The experimental data show that with sound discrimination the error rate drops from the original 65% to 12%, a satisfactory result.

Claims (9)

1. A sound-discrimination method in speech queries, comprising the step of recognizing speech using a speech recognition interface, and, after the recognition, further comprising the steps of:
determining a user-customizable knowledge query language having inheritance relationships;
forming a knowledge-based sound-discrimination model, including:
analyzing error causes and classifying errors,
building a similarity calculation model,
determining similarity rules,
defining trigger conditions for sound discrimination;
executing a sound-discrimination algorithm based on the knowledge, the query language, and the sound-discrimination model, including:
performing similar intelligent word segmentation,
matching query templates,
performing knowledge verification.
2. The method of claim 1, wherein determining the user-customizable knowledge query language comprises the steps of:
clustering all attributes in the knowledge base;
defining the question forms of specific attributes;
generating the query template set.
3. The method of claim 2, wherein the attribute clustering comprises the steps of:
grouping together attributes that are queried in similar ways;
abstracting their common query patterns.
4. The method of claim 1, wherein the speech recognition interface is IBM ViaVoice.
5. The method of claim 1, wherein the error classification includes: concept errors, sentence-pattern errors, and mixed errors.
6. The method of claim 1, wherein the similarity calculation uses the following formula:
1, if ic1 = ic2 and v1 = v2
CSIM([(ic1, v1)], [(ic2, v2)]), if ic1 ≠ ic2 or v1 ≠ v2.
7. The method of claim 1, wherein the similarity rules include:
if both words are exact words, the longer takes precedence;
if both words are similar words, the one with more homophonic characters takes precedence; if the two words have the same number of homophonic characters, the higher similarity takes precedence;
if one of the two words is an exact word and the other a similar word, the similar word takes precedence over the exact word.
8. The method of claim 1, wherein the defined trigger conditions for sound discrimination comprise triggering the sound-discrimination operation when any one of the following occurs:
word segmentation fails;
no knowledge query template matching the original query text can be found;
a knowledge query template matching the original query text is found, but the matching degree is below 70%;
a knowledge query template exactly matching the original query text is found, but no relevant knowledge is found in the knowledge base.
9. The method of claim 1, wherein the sound-discrimination algorithm based on the knowledge, the query language, and the sound-discrimination model comprises the steps of:
performing similar intelligent word segmentation on the user's speech query text according to the sound-discrimination model, the query template library, and the knowledge base;
retrieving the query template library according to the segmentation result, finding matching templates, and judging whether each template formally matches the current segmentation result;
performing knowledge verification on each candidate template, and retrieving the knowledge base according to the question type of the template and the implemented knowledge application programming interface (KAPI) functions.
CNB021602727A 2002-12-31 2002-12-31 Sound distinguishing method in speech sound inquiry Expired - Lifetime CN1266633C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB021602727A CN1266633C (en) 2002-12-31 2002-12-31 Sound distinguishing method in speech sound inquiry


Publications (2)

Publication Number Publication Date
CN1514387A CN1514387A (en) 2004-07-21
CN1266633C true CN1266633C (en) 2006-07-26

Family

ID=34237825

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB021602727A Expired - Lifetime CN1266633C (en) 2002-12-31 2002-12-31 Sound distinguishing method in speech sound inquiry

Country Status (1)

Country Link
CN (1) CN1266633C (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100375006C (en) * 2006-01-19 2008-03-12 吉林大学 Vehicle navigation device voice control system
CN101499277B (en) * 2008-07-25 2011-05-04 中国科学院计算技术研究所 Service intelligent navigation method and system
US9978365B2 (en) 2008-10-31 2018-05-22 Nokia Technologies Oy Method and system for providing a voice interface
US8401856B2 (en) * 2010-05-17 2013-03-19 Avaya Inc. Automatic normalization of spoken syllable duration
CN104021786B (en) * 2014-05-15 2017-05-24 北京中科汇联信息技术有限公司 Speech recognition method and speech recognition device
CN104199825A (en) * 2014-07-23 2014-12-10 清华大学 Information inquiry method and system
CN104484370B (en) * 2014-12-04 2018-05-01 广东小天才科技有限公司 Answer information sending method, receiving method, device and system based on question and answer
CN104991889B (en) * 2015-06-26 2018-02-02 江苏科技大学 A kind of non-multi-character word error auto-collation based on fuzzy participle
CN106128457A (en) * 2016-08-29 2016-11-16 昆山邦泰汽车零部件制造有限公司 A kind of control method talking with robot
CN107301865B (en) * 2017-06-22 2020-11-03 海信集团有限公司 Method and device for determining interactive text in voice input
CN108777142A (en) * 2018-06-05 2018-11-09 上海木木机器人技术有限公司 A kind of interactive voice recognition methods and interactive voice robot based on airport environment
CN110364165A (en) * 2019-07-18 2019-10-22 青岛民航凯亚系统集成有限公司 Flight dynamic information voice inquiry method
CN112767923B (en) * 2021-01-05 2022-12-23 上海微盟企业发展有限公司 Voice recognition method and device

Also Published As

Publication number Publication date
CN1514387A (en) 2004-07-21

Similar Documents

Publication Publication Date Title
CN1215433C (en) Online character identifying device, method and program and computer readable recording media
CN1266633C (en) Sound distinguishing method in speech sound inquiry
CN1110757C (en) Method and device for processing two-language database
CN1297935C (en) System and method for performing unstructured information management and automatic text analysis
CN101079026A (en) Text similarity, acceptation similarity calculating method and system and application system
HK1049053A1 (en) Method for assembling and using a knowledge base
CN1447261A (en) Apparatus and method for specific element and string vector generation and similarity calculation
CN1608259A (en) Machine translation
CN1628298A (en) Method for synthesizing self-learning systems that extract knowledge from documents used in search systems
CN1219266C (en) Method for realizing multi-path dialogue for man-machine Chinese colloguial conversational system
CN101042868A (en) Clustering system, clustering method, clustering program and attribute estimation system using clustering system
CN1578954A (en) Machine translation
CN1331449A (en) Method and relative system for dividing or separating text or decument into sectional word by process of adherence
CN1535433A (en) Category based, extensible and interactive system for document retrieval
CN1133460A (en) Information taking method, equipment, weighted method and receiving equipment for graphic and character television transmission
CN1319836A (en) Method and device for converting expressing mode
CN1821956A (en) Using existing content to generate active content wizard executables for execution of tasks
CN1315020A (en) Method and apparatus for free-form data processing
CN1245577A (en) Question-Based Learning Method and System
CN1975858A (en) Conversation control apparatus
CN1125990A (en) Method and device for software automatic analysis
CN1452156A (en) Voice identifying apparatus and method, and recording medium with recorded voice identifying program
CN1255213A (en) Language analysis system and method
CN1577229A (en) Method for inputting note string into computer and diction production, and computer and medium thereof
CN1813252A (en) Information processing method, information processing program, information processing device, and remote controller

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20040721

Assignee: Beijing Zhongke force Intelligent Technology Co.,Ltd.

Assignor: Institute of Computing Technology, Chinese Academy of Sciences

Contract record no.: 2014110000024

Denomination of invention: Sound distinguishing method in speech sound inquiry

Granted publication date: 20060726

License type: Exclusive License

Record date: 20140610

LICC Enforcement, change and cancellation of record of contracts on the licence for exploitation of a patent or utility model
CX01 Expiry of patent term

Granted publication date: 20060726

CX01 Expiry of patent term
点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载