
CN1266633C - Sound distinguishing method in speech sound inquiry - Google Patents

Sound distinguishing method in speech sound inquiry

Info

Publication number
CN1266633C
CN1266633C CNB021602727A CN02160272A
Authority
CN
China
Prior art keywords
knowledge
query
speech
template
sound
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
CNB021602727A
Other languages
Chinese (zh)
Other versions
CN1514387A (en)
Inventor
丰强泽
曹存根
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CNB021602727A priority Critical patent/CN1266633C/en
Publication of CN1514387A publication Critical patent/CN1514387A/en
Application granted granted Critical
Publication of CN1266633C publication Critical patent/CN1266633C/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current


Landscapes

  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A sound discrimination method in voice query, comprising the steps of: recognizing the speech with an existing speech recognition interface and, after recognition, further: determining a user-customizable knowledge query language; forming a knowledge-based sound discrimination model; and running a fast sound discrimination algorithm based on the knowledge base, the query language and the discrimination model. The present invention places few demands on the external environment and conditions: with just a telephone or a mobile phone, a user can query and learn knowledge by voice in real time at the airport, in a car, at home, in a restaurant or on an outing, which greatly facilitates use.

Description

Sound discrimination method in speech query
Technical field
The present invention relates to sound discrimination technology in speech query, and in particular to a knowledge-based sound discrimination method for speech queries from untrained, unspecified speakers.
Background technology
Knowledge service is an emerging product of the knowledge society. In a knowledge-based society, people's demand for information and knowledge keeps growing, and they wish to obtain the information and knowledge they need anytime and anywhere. An information and knowledge service is a process that satisfies the knowledge requirements raised by a user through knowledge feedback in some form. It is characterized by richness, layering, intelligence and efficiency.
Human-computer interaction studies the techniques by which people and computers interact. The human-machine interface is the interface for the dialogue between a computer and its user, and is an important component of a computer system. With the steady improvement of hardware performance and the emergence of various auxiliary input devices, research on human-computer interfaces is developing in an increasingly multi-channel and intelligent direction. Such interfaces allow the user to use different input channels, such as voice, gesture and handwriting input.
Providing services through voice interaction frees the user's hands and eyes completely: there is no need to type in a query or to watch a screen; the mouth and ears alone are enough to obtain the needed knowledge quickly and easily. Speech interfaces can be widely applied in desktop computers, telephones, mobile phones, PDAs and other high-tech products, offering users great convenience. They place few demands on the external environment and conditions: with just a telephone or mobile phone, a user can query and learn knowledge by voice in real time at the airport, in a car, at home, in a restaurant or on an outing.
Because a voice query system is built on a huge comprehensive knowledge base, the knowledge it can query far exceeds that of an ordinary database system. The knowledge of the various subjects is interlinked, so the system can reason over the connections between subjects, derive knowledge not originally present in the knowledge base, and provide rich and varied knowledge services.
Speech recognition technology has achieved remarkable results and has been applied in many fields, but its recognition rate is still some distance from real application: it is too sensitive to the user and the environment, and it errs especially often when recognizing professional texts. Common speech recognition software such as IBM ViaVoice has built-in error correction and achieves a high recognition rate on general text, but many problems remain, and it is too sensitive to the user's pronunciation and the surrounding environment. If the user has a heavy accent or the ambient noise is loud while speaking, the recognition rate drops sharply. Moreover, to obtain a good recognition rate a user must train on a large number of samples, and even then various errors still occur, which is very unfavorable to the practical application of speech recognition technology. A voice consulting service faces users of all levels, who generally input speech through a telephone or mobile phone. Since telephones shield noise poorly, and users cannot spend much time and effort on advance training, the recognition rate is very low, and the inherent "sound discrimination" ability of these speech recognition packages is not powerful enough, so real-time voice query is difficult to apply widely.
For example, when "which symptoms does diabetes have" was read aloud to IBM ViaVoice 2000 slightly quickly, the recognition result unexpectedly became "which bestir-oneself does diabetes have" (症状 "symptom" was mis-heard as the similar-sounding 振作 "bestir oneself"). The reason is that current voice systems generally analyze text on the basis of a corpus, and thereby miss much important knowledge-level information. Analyzed from the angle of knowledge, "diabetes" is a traditional Chinese medicine concept, and among its associated attributes "symptom" is phonetically closest to "bestir oneself". So after speech analysis and knowledge analysis, replacing "bestir oneself" with "symptom" makes "which bestir-oneself does diabetes have" meaningful.
Of course, such an analysis may yield multiple hypotheses (i.e. candidate replacements); we need to filter the different hypotheses one by one against background knowledge to find the hypothesis that best matches the user's intent.
In recent years, large-scale knowledge acquisition, formalization and analysis have attracted more and more attention. Well-known efforts abroad include the CYC project, BKB, CommonKADS, KIF and WordNet. The American Cyc project manually organizes human common-sense knowledge from the Encyclopaedia Britannica and other knowledge sources to build a huge human common-sense base; the American BKB research is devoted to building a university-level botany knowledge base; the European CommonKADS methodology provides an engineering methodology for developing knowledge systems and designs a knowledge modeling language; KIF, developed by scholars at Stanford University, is an interchange format between different knowledge representations; the WordNet knowledge base, developed by Princeton University, is a huge language knowledge base system. Domestically, the concept of the National Knowledge Infrastructure (NKI) was proposed by the scholar Cao Cungen in 1995. NKI is a huge, sharable, operable knowledge community whose main purpose is to build a massive domain knowledge base. It contains not only the common knowledge of each subject (including medicine, military affairs, physics, chemistry, mathematics, chemical engineering, biology, meteorology, psychology, management, finance, history, archaeology, geography, geology, literature, architecture, music, fine arts, law, philosophy, information science, religion, folklore, and so on), but also incorporates the personal knowledge of experts in each subject, and builds a human common-sense base on the foundation of the domain knowledge.
The voice query system is a multi-user intelligent application system that is built on the massive knowledge in the interdisciplinary NKI knowledge base and queries the knowledge of each subject by voice.
Summary of the invention
The purpose of the present invention is to provide a multi-level, domain-customizable world-knowledge query language as the basis for sound discrimination analysis, together with a sound discrimination model and algorithm that evaluate, quantitatively analyze and correct the voice errors in the speech query text, so as to maximize the error correction rate and bring "computer sound discrimination" to a practical level.
To achieve the above object, a sound discrimination method in speech query comprises the step of recognizing the speech with a speech recognition interface and, after recognition, further comprises the steps of:
determining a user-customizable knowledge query language with inheritance;
forming a knowledge-based sound discrimination model, including:
classifying the errors and analyzing their causes,
establishing a similarity computation model,
determining similarity rules,
defining the sound discrimination trigger conditions;
executing a sound discrimination algorithm based on the knowledge base, the query language and the discrimination model, including:
performing similar-word intelligent segmentation,
matching query templates,
performing knowledge verification.
The present invention places few demands on the external environment and conditions: with just a telephone or mobile phone, a user can query and learn knowledge by voice in real time at the airport, in a car, at home, in a restaurant or on an outing, which greatly facilitates use.
Description of drawings
Fig. 1 is the flowchart of a user voice query: it describes how a user voice query is accepted and how the knowledge answer of the query is returned to the user;
Fig. 2 is the syntax graph of the multi-level user knowledge query language;
Fig. 3 is the flowchart of the sound discrimination system, describing the process by which the NKI knowledge server discriminates the sounds of the user's voice query text;
Fig. 4 is an example of similar-word intelligent segmentation, depicting the steps of performing similar segmentation on the user's voice query text;
Fig. 5 shows experimental result data, listing IBM ViaVoice's recognition results for user voice queries alongside the sound discrimination results of the present invention.
Embodiment
In Fig. 1, the user issues a voice query with a mobile phone, telephone, PDA or similar device. First, an existing speech recognition interface (such as IBM ViaVoice) recognizes the speech and produces the voice query text, which may contain various errors. The sound discrimination system then performs analysis and reasoning on the basis of the knowledge query language and the large-scale knowledge base to obtain the correct user query. Finally, our natural language query module finds the knowledge that meets the user's request and feeds it back to the user. If our knowledge base has no answer, a user-customized knowledge base can be queried through an index, to achieve generality.
In Fig. 3, according to the sound discrimination model, the query template base and the knowledge base, similar-word intelligent segmentation is first performed on the user's voice query text; the query template base is then searched for matching templates, and each candidate template is verified against the knowledge base. If relevant knowledge is found, sound discrimination succeeds: the sentence corresponding to that segmentation result is the discrimination result for the voice query text, and the query answer is fed back to the user.
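As a minimal sketch of the flow in Fig. 1 and Fig. 3, the glue code below wires the three stages together. recognize(), bianyin() and query_knowledge() are stand-in stubs, and all names and canned strings are ours rather than the patent's: the real system calls an external ASR interface (e.g. IBM ViaVoice) and the NKI knowledge base.

```c
#include <assert.h>
#include <string.h>

/* Stub for the external ASR interface: its output may contain voice errors. */
static const char *recognize(const char *audio) {
    (void)audio;
    return "which bestir-oneself does diabetes have";
}

/* Stub for knowledge-based sound discrimination: corrects the mis-heard word. */
static const char *bianyin(const char *text) {
    if (strstr(text, "bestir-oneself") != NULL)
        return "which symptoms does diabetes have";
    return text;
}

/* Stub for the NKI lookup: returns an illustrative answer, or NULL if none. */
static const char *query_knowledge(const char *question) {
    if (strcmp(question, "which symptoms does diabetes have") == 0)
        return "polyuria, polydipsia, weight loss";
    return NULL;
}

/* Top-level flow: recognize, discriminate, then query the knowledge base. */
const char *voice_query(const char *audio) {
    const char *text  = recognize(audio);
    const char *fixed = bianyin(text);
    return query_knowledge(fixed);
}
```

With these stubs the mis-recognized question from the Background example still reaches the knowledge base, because the discrimination stage repairs it first.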
1. We introduce the multi-level, domain-customizable knowledge query language and its storage module in the present invention.
First, we cluster all the attributes in the knowledge base, grouping attributes with similar query patterns and extracting their common query pattern, to form a knowledge query language with inheritance; next we define the question forms of specific domains; finally we use a compiler to generate the query template set automatically.
Basic symbols:
■ defquery: the leading keyword of a query language.
■ inherits: inheritance between query languages. A language inherits all of its parent layers, so its expressive power is stronger than theirs.
■ <explanation of this layer>: a character string explaining this layer of the language.
■ question trigger: the trigger condition of a user question. As soon as a user question triggers this condition, the query action getc(A, C') or getv(C, A) is executed immediately.
■ <?C>: the identifier variable of the concept to be queried.
■ <?C'>: the identifier variable of the related concept to be queried.
■ <?C>={getc(A, C')}: extract from the knowledge base all concepts C whose slot A has the value C'.
■ <?C'>={getv(C, A)}: extract from the knowledge base the value of concept C on slot A.
■ <domain-customizable term>: either a general keyword that may appear in a user question, or a customizable term variable of the domain.
■ <X|Y|...|Z>: a notation we invented, with two meanings. First, X, Y, ..., Z are query language keywords. Second, using X, Y, ..., or Z in a user query means the same thing and obtains the same answer. In Backus-Naur form, <X|Y|...|Z> ::= X|Y|...|Z. We call X, Y, ..., Z mandatory words: exactly one of them must appear at the current position.
■ [<X|Y|...|Z>]: the words X, Y, ..., Z may be omitted at this position; we call them omissible words, and [] the omission symbol.
■ <!question descriptor>: a cluster of words with the same or similar meaning, e.g. <!what-interrogative> = <what | which | ...>.
■ <question pattern of ?C>: the possible question forms when querying <?C>. Its grammar is: ?C <domain-customizable interrogative>.
■ <question pattern of ?C'>: the possible question forms when querying <?C'>. Its grammar is: ?C' <domain-customizable interrogative>.
The Backus-Naur form of the general query language is as follows:
defquery <this layer language> [inherits <parent layer>]
{
explanation: <explanation of this layer>
question trigger: <domain-customizable term>, <?C>={getc(A, C')}, <domain-customizable term>, <?C'>={getv(C, A)}, <domain-customizable term>
question pattern of <?C>
question pattern of <?C'>
}
As a concrete application of the general query language, take "event location" as an example. The question topic for "event location" is described as follows:
defquery event-location ()
{
explanation: used to ask about the place of an event.
question trigger 1: <?C>={getc(A, C')}; <?adverb>; [<is|for>] [<at|in>]; <?C'>={getv(C, A)}; <?event>
question pattern of ?C: ?C <!what-interrogative> <?noun>
question pattern of ?C': ?C' <!place-interrogative>
}
The language "defquery event-location" has one question trigger; depending on the situation, a designer may define arbitrarily many. Using this language, the designer can define more specific event-location query languages. For a specific domain, for example, to define the query languages "birthplace" and "scene of occurrence", the designer can simply use inheritance, as follows:
defquery birthplace ({?event=<born|birth>, ?noun=<person>}) inherits event-location
defquery scene-of-occurrence ({?event=<happen|occur>, ?noun=<person>}) inherits event-location
To facilitate template matching, we compile the defined knowledge query languages into knowledge query templates with a compiler and write them into the query template base.
For example, the query templates compiled from the query language corresponding to the attribute "birthplace" are:
#birthplace
<C>; [<is|for>] [<at|in>]; <!place-interrogative>; <born|birth> @C'
<!what-interrogative> <person>; [<is|for>] [<at|in>]; <C'>; <born|birth> @C
Here "@C'" indicates that the template asks for an attribute value, i.e. the value of the attribute "birthplace" of some concept C; "@C" indicates that the template asks for a concept, i.e. which concept in the knowledge base has C' as the value of its attribute "birthplace".
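To make the compiled template form concrete, here is a small hedged sketch in C of how a template line of alternatives and omissible slots might be matched against a segmented query. The Slot type and matchTemplate are our illustration, not the patent's actual data layout (which appears later in the embodiment); concept slots such as <C>, which bind variables, are left out for brevity, so this sketch handles keyword slots only.

```c
#include <assert.h>
#include <string.h>

/* One template slot: a set of alternative keywords, possibly omissible. */
typedef struct {
    const char *alt[4];  /* alternative keywords, as in <X|Y|...|Z> */
    int nalt;            /* how many alternatives are filled in     */
    int optional;        /* 1 if the slot is wrapped in [ ]         */
} Slot;

/* Greedy left-to-right match of segmented query words against the
   template; succeeds when every mandatory slot is consumed in order
   and no query word is left over. */
int matchTemplate(const char **words, int nwords,
                  const Slot *tmpl, int nslots) {
    int w = 0;
    for (int s = 0; s < nslots; s++) {
        int hit = 0;
        if (w < nwords)
            for (int a = 0; a < tmpl[s].nalt; a++)
                if (strcmp(words[w], tmpl[s].alt[a]) == 0) { hit = 1; break; }
        if (hit) { w++; continue; }
        if (!tmpl[s].optional)
            return 0;   /* a mandatory slot found no word */
    }
    return w == nwords;
}
```

For instance, the template { [<in>]; <which|what>; <city> } matches both "which city" and "in which city" but rejects "which town".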
2. We introduce the sound discrimination model in the present invention. When an unspecified speaker without voice training issues a voice query in a nonspecific setting, current speech recognition technology can hardly achieve a satisfactory result, owing to noise, the telephone line and the speaker's pronunciation; the recognized text can contain all kinds of errors, some so unconventional that a human reader cannot tell what was meant. Therefore, to make the computer truly "discriminate sounds", we first need to design a sound discrimination model that classifies, quantitatively analyzes and accurately evaluates the voice errors a user may produce.
The sound discrimination model comprises: the causes of errors, the similarity computation, the trigger conditions of sound discrimination, the selection rules for the optimal solution among multiple discrimination results, the knowledge inference mechanism, and so on. We aim at an optimal balance: correcting as many wrong characters as possible (even quite unconventional ones) while ensuring that correct characters are not "corrected" by mistake; such an optimum is hard to reach in practice. In the example above, reading "which symptoms does diabetes have" to IBM ViaVoice yielded "which bestir-oneself does diabetes have". The reason is that ViaVoice does not analyze from the angle of knowledge: it considers "bestir oneself" a word in its own right, and its distance from "symptom" is not very close, i.e. the similarity is not high enough, so it does not correct the error. This is of course a safe strategy that guarantees correct characters and words are not wrongly changed, but it lowers the error correction rate and hurts recognition accuracy. We need to combine ontology and knowledge to study a sound discrimination model that reaches the optimal balance, so as to maximize the error correction rate.
1) Causes of errors. Because the user queries by voice, the errors in the voice query text are all voice errors. Their characteristic is that a wrong character need not be similar in shape to the intended one, but is identical or similar in pronunciation. In the example above, 振 ("bestir") and 症 ("disease") sound nearly the same, while 作 ("work") and 状 ("shape") sound different but similar.
2) Error classification. From the angle of knowledge, the errors users produce fall into the following three classes:
● Concept errors
Example 1: how many people does "proposal" have
Correct: how many people does the Yi nationality (distributed over Yunnan, Sichuan and Guizhou) have
Example 2: which raw materials does "yellow flag" have
Correct: which raw materials does the Radix Astragali have
"Proposal" in Example 1 and "yellow flag" in Example 2 are both concept errors: a knowledge base concept is mis-recognized, while the question pattern is perfectly correct.
● Pattern errors
Example 3: which "those" cities does China have
Correct: which cities does China have
The corresponding knowledge query template is:
<C>; <govern|comprise|have>; <!what-interrogative> [<!place-noun>] @C'
(the cluster "!what-interrogative" contains "which" but not "those")
Example 4: the United States "and ten" independent
Correct: when did the United States become independent
The corresponding knowledge query template is:
<C>; <!time-interrogative>; <independent|free> @C'
(the cluster "!time-interrogative" contains "when" but not "and ten")
The characteristic of such errors is that the knowledge base concept C is perfectly correct but the question pattern is wrong; we call these pattern errors.
● Mixed errors
Example 5: "bearing" is "to examine" independent
Correct: when did Cambodia become independent
The characteristic of such errors is that a concept error and a template error occur at the same time.
3) Similarity computation. The wrong characters we correct share a common trait: the wrong character is phonetically similar to the correct one. We therefore need similarity computation to determine which characters to correct and how to correct them. To evaluate voice errors accurately, the present invention proposes a similarity computation model ("similar" in the present invention always means phonetically similar).
Similarity expresses the degree of resemblance between two characters or between two words; its range is [0,1]. From the angle of pinyin, a Chinese character C consists of an initial and a final, so we can represent a character as (ic, v), where ic and v are the character's initial and final respectively (for a character without an initial, ic is empty). Thus the characters 是 "shi" and 四 "si" are represented (sh, i) and (s, i); this representation coincides with toneless pinyin. Although GB-2312 contains more than 6700 characters, all of them reduce to about 400 such classes. We then analyze these 400 classes from a phonetic angle and summarize the pronunciation similarity between classes; Table 1 lists part of the inter-class similarity data.
Given any two characters C1=(ic1, v1) and C2=(ic2, v2), we define their pronunciation similarity PSIM(C1, C2) as:
● 1, if ic1=ic2 and v1=v2;
● CSIM([(ic1, v1)], [(ic2, v2)]), if ic1≠ic2 or v1≠v2.
The pronunciation similarity between two Chinese words W1=C1C2...Cn and W2=D1D2...Dn is:
PSIM(W1, W2) = Σ PSIM(Ci, Di) / n
Class 1        Class 2      CSIM(Class 1, Class 2)
[(b,ai)]       [(b,ei)]     0.8
[(ch,i)]       [(c,i)]      0.92
[(ch,i)]       [(q,i)]      0.8
[(k,e)]        [(g,e)]      0.75
[(zh,eng)]     [(zh,en)]    0.95
[(zh,uang)]    [(z,uo)]     0.7
[(sh,i)]       [(s,i)]      0.92
[(sh,i)]       [(s,e)]      0.65
[(y,un)]       [(y,uan)]    0.7
...            ...          ...
Table 1. Pronunciation similarity between classes
We further introduce several definitions:
Definition 1 (homophone). If the similarity between a character C and the source character C' is 1, then C is a homophone of C'.
Definition 2 (similar character). If the similarity between a character C and the source character C' is greater than a threshold μ1, then C is a similar character, and C is similar to C'.
Definition 3 (similar word). If the similarity between a word W and the source word W' is greater than a threshold μ2, and the characters in the word are pairwise similar, then W is a similar word, and W is similar to W'.
Definition 4 (exact word). If a word W appears at the corresponding position of the original text, then W is an exact word.
By experimental testing, μ1=0.6 and μ2=0.7.
For example, for "symptom" (症状) and "bestir oneself" (振作):
PSIM(症, 振) = CSIM([(zh,eng)], [(zh,en)]) = 0.95 > μ1
PSIM(状, 作) = CSIM([(zh,uang)], [(z,uo)]) = 0.7 > μ1
Since 症/振 and 状/作 are pairwise similar, and PSIM(症状, 振作) = [PSIM(症,振)+PSIM(状,作)]/2 = [0.95+0.7]/2 = 0.825 > μ2, "bestir oneself" is similar to "symptom", with similarity 0.825.
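The similarity computation can be sketched directly in C. The CSIM_TABLE below hard-codes a few rows of Table 1 (a real system would carry all ~400 classes), and all type and function names are ours, not the patent's.

```c
#include <assert.h>
#include <math.h>
#include <string.h>

/* A character class is its toneless pinyin: initial + final. */
typedef struct { const char *ic; const char *v; } Syllable;
typedef struct { Syllable a, b; double sim; } ClassSim;

/* Illustrative subset of Table 1. */
static const ClassSim CSIM_TABLE[] = {
    {{"zh","eng"},  {"zh","en"}, 0.95},
    {{"zh","uang"}, {"z","uo"},  0.70},
    {{"sh","i"},    {"s","i"},   0.92},
};

static int same(Syllable x, Syllable y) {
    return strcmp(x.ic, y.ic) == 0 && strcmp(x.v, y.v) == 0;
}

/* Inter-class similarity CSIM: symmetric table lookup, 0 if unlisted. */
static double csim(Syllable x, Syllable y) {
    for (size_t i = 0; i < sizeof CSIM_TABLE / sizeof *CSIM_TABLE; i++) {
        const ClassSim *e = &CSIM_TABLE[i];
        if ((same(x, e->a) && same(y, e->b)) ||
            (same(x, e->b) && same(y, e->a)))
            return e->sim;
    }
    return 0.0;
}

/* PSIM for single characters: 1 on an identical class, else CSIM. */
double psim_char(Syllable c1, Syllable c2) {
    return same(c1, c2) ? 1.0 : csim(c1, c2);
}

/* PSIM for equal-length words: average of the per-character PSIM. */
double psim_word(const Syllable *w1, const Syllable *w2, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += psim_char(w1[i], w2[i]);
    return s / n;
}
```

With these rows, psim_word for 症状 (zheng, zhuang) against 振作 (zhen, zuo) reproduces the 0.825 of the worked example above.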
4) Similarity rules. When analyzing a user query for sound discrimination, the errors are often far off the mark and the similarity to the correct sentence is not high, so we set the thresholds for similar characters and similar words very low. A sentence then produces thousands of candidate analyses, giving the discrimination core a huge workload. To discriminate quickly, we generate these candidates according to definite rules, so that the correct result appears as early as possible.
For example, in the similarity analysis of the voice query "the United States and ten independent", the similar words headed by "U.S." include "the United States", "Mekong", "foreign country", "attractive", "Weber", "the U.S.", etc.; the similar words headed by "and" include "what food", "when", "what", "suitable", etc. Combining these yields several thousand candidates for this one voice query, each of which would need to be analyzed, so we compare the priority between similar words and process the most similar words first.
Priority comparison between words splits into three cases: exact word vs. exact word, similar word vs. similar word, and exact word vs. similar word. We have summarized a priority rule for each case:
● If both words are exact words, the longer one has priority. In the example above, "the United States" takes precedence over "U.S.".
● If both words are similar words, the one with more homophone characters has priority; if the homophone counts are equal, the one with the higher similarity has priority. In the example above, "when" takes precedence over "how".
● If one word is an exact word and the other a similar word, the similar word takes precedence over the exact word only if the similar word's character count is at least twice the exact word's character count, and the number of homophone characters in the similar word is at least the exact word's character count. In the example above, the similar word "when" takes precedence over the exact word "and".
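The three priority rules can be condensed into one comparison function. The Cand structure is our hypothetical encoding of the quantities the rules compare (exactness, character count, homophone count, similarity); it is an illustration, not the patent's data layout.

```c
#include <assert.h>

typedef struct {
    int exact;   /* 1 = exact word, 0 = similar word */
    int len;     /* number of characters             */
    int homo;    /* number of homophone characters   */
    double sim;  /* similarity to the source word    */
} Cand;

/* Returns 1 if a should be processed before b under the three rules. */
int higher_priority(Cand a, Cand b) {
    if (a.exact && b.exact)            /* rule 1: longer exact word first */
        return a.len > b.len;
    if (!a.exact && !b.exact) {        /* rule 2: more homophones first,  */
        if (a.homo != b.homo)          /*         then higher similarity  */
            return a.homo > b.homo;
        return a.sim > b.sim;
    }
    /* rule 3: the similar word beats the exact word only when its length
       is at least twice the exact word's length and it has at least as
       many homophone characters as the exact word has characters */
    const Cand *s = a.exact ? &b : &a;   /* the similar word */
    const Cand *e = a.exact ? &a : &b;   /* the exact word   */
    int similar_wins = s->len >= 2 * e->len && s->homo >= e->len;
    return a.exact ? !similar_wins : similar_wins;
}
```

For instance, a four-character similar word with two homophone characters outranks a one-character exact word, matching the "when" vs. "and" example above.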
5) Trigger conditions of sound discrimination, i.e. when to discriminate the sounds of a user voice query. Sound discrimination takes a certain amount of time, and the voice query text produced by the recognition software may be wrong or may be perfectly correct. We cannot run sound discrimination every time, so trigger conditions must be defined.
First, the original query text is segmented and matched against the knowledge query templates. Sound discrimination is triggered when any of the following occurs:
● segmentation fails;
● no knowledge query template matches the original query text at all;
● a matching knowledge query template is found, but it differs too much from the original query text (template character count / original text character count < 0.7);
● a fully matching knowledge query template is found, but no relevant knowledge is found in the knowledge base.
If relevant knowledge is found for the original query text, the text is error-free and the knowledge is fed back to the user.
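The four trigger conditions reduce to a simple predicate. This is a sketch, assuming the segmentation/matching stage summarizes its outcome in the flags below; the parameter names are ours, not the patent's.

```c
#include <assert.h>

/* 1 = run sound discrimination, 0 = text is error-free, answer directly. */
int should_discriminate(int seg_ok, int template_found,
                        int template_chars, int query_chars,
                        int knowledge_found) {
    if (!seg_ok)          return 1;  /* segmentation failed                */
    if (!template_found)  return 1;  /* no matching query template         */
    if ((double)template_chars / query_chars < 0.7)
                          return 1;  /* match differs too much from text   */
    if (!knowledge_found) return 1;  /* matched, but no knowledge found    */
    return 0;
}
```

Only when all four checks pass is the original text taken as error-free and answered directly.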
3. We introduce the sound discrimination algorithm in the present invention.
The essence of the sound discrimination algorithm is to find the linguistic form most similar to the user's voice query text, under the guidance of the multi-level, domain-customizable knowledge query language and the NKI knowledge base.
Basic symbols:
Knowledge base dictionary: char *knodic[knodic_num];
Query template dictionary: char *keydic[keydic_num];
Similar character structure:
typedef struct class_simzidata {
    char zi[2];              /* similar character */
    int simdegree;           /* similarity of this character to the source character */
    int dic_flag;            /* whether the query language or knowledge base has a word headed by this character */
} class_simzidata;
Similar character table structure:
typedef struct class_simzitable {
    char zi[2];              /* character */
    long keydic_lb;          /* start position of this character in the query template dictionary */
    long keydic_hb;          /* end position of this character in the query template dictionary */
    long knodic_lb;          /* start position of this character in the knowledge base dictionary */
    long knodic_hb;          /* end position of this character in the knowledge base dictionary */
    int simzi_num;           /* number of similar characters of this character */
    class_simzidata *simzi;  /* information of each similar character */
} class_simzitable;
Word structure in segmentation:
typedef struct phrase {
    char *phrase_str;        /* content of this word */
    long lexi_no;            /* position index of this word in the query template base */
    int var_flag;            /* whether this word is a knowledge base concept or a query template word */
} phrase;
Sentence segmentation information table:
typedef struct decompose_info {
    int phrase_count;            /* number of words contained */
    int var_phrase_count;        /* number of concepts */
    struct phrase *phrase_head;  /* information of each word in this segmentation result */
} decompose_info;
Feedback information table for a user question:
typedef struct info_table {
    char *access_time;           /* access time */
    char *action;                /* action: query or add */
    char *question;              /* corresponding complete question */
    char match_type[6];          /* exact or fuzzy matching */
    char *query_type;            /* query type of the user question */
    char *concept;               /* concept */
    char *attr_name;             /* attribute name */
    char *attr_value;            /* attribute value */
    int var_num;                 /* number of concepts */
    char *var_list[VAR_COUNT];   /* variable list */
    char *answer;                /* answer fed back */
} info_table;
Variable descriptions:
question: the user voice query text
IdentifyInfoTable: the knowledge feedback information obtained by sound discrimination
IdentifyResult: the sound discrimination result
wordsegment: one similar segmentation result of the user voice query text
sen_set: the candidate template set
sen: one candidate template
SimziList: the set of characters similar to a given character
SimciList: the set of similar words, sorted by similarity in descending order
Success: the flag of sound discrimination success
Function descriptions:
AddSegTail(wordsegment, Wi): append word Wi to segmentation result wordsegment
CompWordSim(W1, W2): compute the similarity value of words W1 and W2
GetText(wordsegment): obtain the sentence corresponding to segmentation result wordsegment
InsertSimci(SimciList, W, simdata): insert similar word W and its similarity value simdata into SimciList, keeping SimciList in descending order of similarity
Main sound-discrimination routine:
Input: user speech query text question
Output: sound-discrimination result IdentifyResult, knowledge feedback information IdentifyInfoTable
void IdentifyProun(char *question, decompose_info wordsegment)
{
    // if sound discrimination has already succeeded, return
    if (Success == 1)
        return;
    if (question is empty)
    {
        // segmentation of this sentence is finished, so a complete
        // segmentation result has been obtained; verify the match
        IdentifyInfoTable = ProcessSegment(wordsegment);
        // if this segmentation found relevant knowledge, discrimination succeeds
        if (IdentifyInfoTable is non-NULL)
        {
            Success = 1;
            // the sentence corresponding to this segmentation is the
            // sound-discrimination result
            IdentifyResult = GetText(wordsegment);
        }
    }
    else
    {
        // continue segmentation
        Char = question[0];
        // find the similar-character set SimziList of Char
        for every Si in SimziList
        {
            // search the knowledge-base dictionary for words headed by Si
            if (Si.knodic_lb > 0)
            {
                for (i = Si.knodic_lb; i <= Si.knodic_hb; i++)
                {
                    // obtain the string corresponding to this word in the original user query
                    Initword = SubString(question, 0, len(knodic[i]));
                    // compute the similarity of this word and the original string
                    simdata = CompWordSim(knodic[i], Initword);
                    if (simdata > similar-word threshold)
                    {
                        // if similar, insert this word into the similar-word
                        // list in descending order of priority
                        InsertSimci(SimciList, knodic[i], simdata);
                    }
                }
            }
            // search the query template dictionary for words headed by Si
            if (Si.keydic_lb > 0)
            {
                for (i = Si.keydic_lb; i <= Si.keydic_hb; i++)
                {
                    // obtain the string corresponding to this word in the original user query
                    Initword = SubString(question, 0, len(keydic[i]));
                    // compute the similarity of this word and the original string
                    simdata = CompWordSim(keydic[i], Initword);
                    if (simdata > similar-word threshold)
                    {
                        // if similar, insert this word into the similar-word
                        // list in descending order of priority
                        InsertSimci(SimciList, keydic[i], simdata);
                    }
                }
            }
        }
        // generate segmentations in descending order of similarity priority
        for every Wi in SimciList
        {
            // append this similar word to the current segmentation result
            AddSegTail(wordsegment, Wi);
            // obtain the still-unprocessed remainder of the string
            RemainStr = SubString(question, len(Wi), len(question) - len(Wi));
            // recursively process the remaining string
            IdentifyProun(RemainStr, wordsegment);
        }
    }
}
Matching verification program:
Input: one segmentation result wordsegment of the user's query sentence
Output: the feedback information table of this segmentation result
info_table ProcessSegment(decompose_info wordsegment)
{
    // take the intersection of the location-index sets, in the query template
    // library, of each word in wordsegment, obtaining the occurrence space of
    // this segmentation result in the query template library
    sen_set = GetIntersection(wordsegment);
    // judge and screen each candidate template: does it match wordsegment?
    for every sen in sen_set
    {
        if (wordsegment.variable count != sen.variable count)
            continue;  // no match
        if (wordsegment.word count < sen.required-word count
            || wordsegment.word count > sen.word count)
            continue;
        if ({sen's required-word position sequence} - {positions in the template
            of wordsegment's non-variable words} != {wordsegment's variables})
            continue;
        // if this template satisfies the above conditions and also passes
        // knowledge verification, the template match succeeds
        query_info_table = VerifyKnowledge(sen);
        if (query_info_table.answer != NULL)
            return query_info_table;
    }
    return empty;
}
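To make the control flow of IdentifyProun and ProcessSegment concrete, the following is a minimal runnable sketch in Python. The toy dictionaries (KNODIC, KEYDIC), the toy template table, the character-overlap similarity, and the 0.5 threshold are all illustrative stand-ins, not data or formulas from the patent.

```python
# Toy stand-ins for the knowledge-base and query-template dictionaries.
KNODIC = ["beijing", "bei", "jing"]                       # knowledge-base concepts
KEYDIC = ["population"]                                   # query-template keywords
TEMPLATES = {("beijing", "population"): "Beijing: 21M"}   # toy template library

def word_sim(a: str, b: str) -> float:
    """Crude stand-in for CompWordSim: fraction of matching characters."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)

def identify(question: str, seg: list, state: dict) -> None:
    """Recursive similar-word segmentation with early termination (IdentifyProun)."""
    if state.get("result") is not None:        # Success flag: prune remaining branches
        return
    if not question:                           # a complete segmentation: verify it
        answer = TEMPLATES.get(tuple(seg))     # stands in for ProcessSegment
        if answer is not None:
            state["result"] = (seg, answer)
        return
    # Collect dictionary words whose spelling is similar to a prefix of the
    # remaining question text (threshold 0.5 is arbitrary for this demo).
    cands = []
    for w in KNODIC + KEYDIC:
        s = word_sim(w, question[:len(w)])
        if s > 0.5:
            cands.append((s, w))
    cands.sort(reverse=True)                   # descending similarity, like InsertSimci
    for s, w in cands:
        identify(question[len(w):], seg + [w], state)   # recurse on the remainder

state = {}
identify("beijingpopulasion", [], state)       # a mis-recognized query text
print(state["result"])
```

Because candidates are explored in descending similarity order and the first verified segmentation sets the success flag, lower-similarity branches are never expanded, mirroring the early-exit behavior of the patent's routine.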
As shown in Figure 3, the processing steps of sound discrimination are as follows:
1) According to the sound-discrimination model, the query template library, and the knowledge base, perform similar intelligent word segmentation on the user's speech query text; whenever a segmentation result is obtained, go to step 2).
2) Retrieve the query template library according to the segmentation result to find the matching templates, then judge whether each template formally matches the current segmentation result, thereby obtaining the candidate template set.
3) Perform knowledge verification on each candidate template: retrieve the knowledge base according to the question type of the template and the implemented KAPI functions.
If relevant knowledge is found, sound discrimination succeeds: the sentence corresponding to this segmentation result is the sound-discrimination result of the user's query text, and the query answer is fed back to the user.
If no relevant knowledge is found, go back to step 1) and continue similar word segmentation.
Below, each part is elaborated in detail.
I. Similar intelligent word segmentation
The dictionaries used for segmentation are the knowledge-base dictionary and the query template dictionary. The knowledge-base dictionary contains all concepts that appear in the knowledge base; the query template dictionary contains all keywords that appear in the query template library, together with their positions in the template library. A word appearing in the user's query text may be a similar word of a knowledge-base concept or a similar word of a query-template word.
The segmentation performed here is similar-word segmentation: it generates all segmentation results phonetically similar to the original query sentence. Experimental analysis shows that, for speaker-independent speech queries in unconstrained settings, the errors in the recognition result often differ greatly from the correct result, so we set the similarity threshold very low to improve discrimination accuracy. As a consequence, the number of similar segmentation results becomes very large, reaching several thousand or even tens of thousands. We therefore sort the similar words and produce the segmentation results in descending order of similarity; whenever a segmentation result is obtained, it is immediately verified against the template library and the knowledge base. As soon as relevant knowledge is found, discrimination succeeds and the procedure returns at once, so the lower-similarity segmentation results that would have come later are never generated. This greatly reduces the time complexity of sound discrimination.
An example is shown in Fig. 4; the segmentation results in the dotted portion are never computed.
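The descending-order bookkeeping behind this early exit is the InsertSimci operation; a small Python sketch follows (the word strings and similarity scores are invented for illustration):

```python
import bisect

def insert_simci(simci_list, word, simdata):
    """Insert (word, simdata) into simci_list, keeping it sorted by
    similarity in descending order. bisect assumes ascending keys, so
    we search on the negated similarities."""
    keys = [-s for _, s in simci_list]
    pos = bisect.bisect_right(keys, -simdata)
    simci_list.insert(pos, (word, simdata))

simci = []
for word, score in [("shiyan", 0.6), ("shian", 0.9), ("shyan", 0.75)]:
    insert_simci(simci, word, score)
print(simci)   # highest-similarity candidate comes first
```

Keeping the list sorted on insertion means the segmentation generator can simply iterate front to back and stop at the first verified result.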
II. Template matching
Template matching is essentially the problem of deciding which class a sample belongs to: the user's question sentence is the sample to be analyzed, and each template in the query template library is a class of question forms.
The steps of template matching are as follows.
For a given similar segmentation result of the user's query sentence:
1) First, according to the location index of each keyword in the template library, find the keywords' occurrence spaces; then take their intersection to obtain the occurrence space of this segmentation result in the template library.
2) Screen the candidate templates in the occurrence space. The screening conditions are:
● the number of variables in the segmentation result = the number of variables in the template;
● the number of required words in the template <= the total number of words in the segmentation result <= the total number of words in the template;
● the segmentation result must contain every required word of the template, i.e. {required-word position sequence in the template} − {position sequence in the template of each non-variable word in the segmentation result} = {all variables appearing in the segmentation result};
● the order in which the words occur in the segmentation result is consistent with the order in which they occur in the template. This condition decides whether the matching is ordered; considering the freedom with which users phrase questions, it can be dropped to realize unordered matching.
Screening by these conditions yields the set of candidate templates that formally match this segmentation result.
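The formal screening can be sketched as a filter function in Python. The template representation (a dict with var_count, required, word_count) and all example words are invented for illustration; the order condition is omitted, i.e. this is the unordered-matching variant.

```python
def formally_matches(seg_words, seg_var_count, template):
    """Screening conditions for one candidate template (unordered variant)."""
    # equal numbers of variables
    if seg_var_count != template["var_count"]:
        return False
    # required-word count <= segmentation word count <= template word count
    n = len(seg_words)
    if n < len(template["required"]) or n > template["word_count"]:
        return False
    # every required word of the template must appear in the segmentation
    if not set(template["required"]) <= set(seg_words):
        return False
    return True

tmpl = {"var_count": 1, "required": ["population", "of"], "word_count": 4}
print(formally_matches(["population", "of", "<city>"], 1, tmpl))   # matches
print(formally_matches(["population", "<city>"], 1, tmpl))         # "of" missing
```

Templates passing this filter become the candidate set handed to knowledge verification.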
III. Knowledge verification
The candidate templates obtained at this point still need a knowledge check: according to the attribute and question type corresponding to each template, we call the corresponding knowledge-base API function and see whether a correct answer can be found.
KAPI is the set of knowledge-base interface functions we developed; it provides services for upper-level applications. Common KAPI functions include:
// obtain an attribute value from a concept and an attribute
get_attribute_value(concept, attribute), abbreviated getv(C, A)
// obtain concepts from an attribute and an attribute value
get_concepts(attribute, attribute_value), abbreviated getc(A, C')
// obtain all attributes of a concept
get_all_attributes(concept)
// isa reasoning: judge whether one concept is (a kind of) another concept
isa_reasoning(concept1, concept2)
// partof reasoning: judge whether one concept is a part of another concept
partof_reasoning(concept1, concept2)
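The KAPI lookups can be sketched as Python functions over a toy triple store; the knowledge-base contents and the isa hierarchy below are invented for illustration and do not come from the NKI knowledge base.

```python
# Toy triple store standing in for the knowledge base (illustrative data).
KB = [
    ("Beijing", "capital-of", "China"),
    ("Beijing", "population", "21M"),
    ("Shanghai", "population", "24M"),
]
ISA = {("Beijing", "city"), ("city", "place")}

def getv(concept, attribute):
    """get_attribute_value: attribute values of a concept."""
    return [v for c, a, v in KB if c == concept and a == attribute]

def getc(attribute, value):
    """get_concepts: concepts having a given attribute value."""
    return [c for c, a, v in KB if a == attribute and v == value]

def get_all_attributes(concept):
    """All attributes recorded for a concept."""
    return [a for c, a, _ in KB if c == concept]

def isa_reasoning(c1, c2):
    """Transitive isa reasoning over the toy hierarchy."""
    if (c1, c2) in ISA:
        return True
    return any(isa_reasoning(mid, c2) for a, mid in ISA if a == c1)

print(getv("Beijing", "population"))       # ['21M']
print(getc("population", "24M"))           # ['Shanghai']
print(isa_reasoning("Beijing", "place"))   # True via Beijing -> city -> place
```

Knowledge verification then amounts to choosing the right lookup for the template's question type and checking whether it returns a non-empty answer.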
IV. Experimental data
Using IBM ViaVoice 2000 as the speech recognition interface, several speakers who had received no voice training read 100 questions aloud in a noisy environment; Fig. 5 lists part of the data. The experimental data show that with sound discrimination the error rate drops from the original 65% to 12%, a satisfactory result.

Claims (9)

1. A sound-discrimination method in speech queries, comprising the step of recognizing speech using a speech recognition interface, and, after the recognition, further comprising the steps of:
determining a user-customizable knowledge query language having inheritance relationships;
forming a knowledge-based sound-discrimination model, including:
analyzing error causes and classifying errors,
building a similarity calculation model,
determining similarity rules,
defining trigger conditions for sound discrimination;
executing a sound-discrimination algorithm based on the knowledge, the query language, and the sound-discrimination model, including:
performing similar intelligent word segmentation,
matching query templates,
performing knowledge verification.
2. The method of claim 1, wherein determining the user-customizable knowledge query language comprises the steps of:
clustering all attributes in the knowledge base;
defining the question forms of specific attributes;
generating the query template set.
3. The method of claim 2, wherein the attribute clustering comprises the steps of:
grouping together attributes that are queried in similar ways;
abstracting their common query patterns.
4. The method of claim 1, wherein the speech recognition interface is IBM ViaVoice.
5. The method of claim 1, wherein the error classification includes: concept errors, sentence-pattern errors, and mixed errors.
6. The method of claim 1, wherein the similarity calculation uses the following formula:
1, if ic1 = ic2 and v1 = v2
CSIM([(ic1, v1)], [(ic2, v2)]), if ic1 ≠ ic2 or v1 ≠ v2.
7. The method of claim 1, wherein the similarity rules include:
if both words are exact words, the longer takes precedence;
if both words are similar words, the one with more homophonic characters takes precedence; if the two words have the same number of homophonic characters, the higher similarity takes precedence;
if one of the two words is an exact word and the other a similar word, the similar word takes precedence over the exact word.
8. The method of claim 1, wherein the defined trigger conditions for sound discrimination comprise triggering the sound-discrimination operation when any one of the following occurs:
word segmentation fails;
no knowledge query template matching the original query text can be found;
a knowledge query template matching the original query text is found, but the matching degree is below 70%;
a knowledge query template exactly matching the original query text is found, but no relevant knowledge is found in the knowledge base.
9. The method of claim 1, wherein the sound-discrimination algorithm based on the knowledge, the query language, and the sound-discrimination model comprises the steps of:
performing similar intelligent word segmentation on the user's speech query text according to the sound-discrimination model, the query template library, and the knowledge base;
retrieving the query template library according to the segmentation result, finding matching templates, and judging whether each template formally matches the current segmentation result;
performing knowledge verification on each candidate template, and retrieving the knowledge base according to the question type of the template and the implemented knowledge application programming interface (KAPI) functions.
CNB021602727A 2002-12-31 2002-12-31 Sound distinguishing method in speech sound inquiry Expired - Lifetime CN1266633C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB021602727A CN1266633C (en) 2002-12-31 2002-12-31 Sound distinguishing method in speech sound inquiry


Publications (2)

Publication Number Publication Date
CN1514387A CN1514387A (en) 2004-07-21
CN1266633C true CN1266633C (en) 2006-07-26

Family

ID=34237825

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB021602727A Expired - Lifetime CN1266633C (en) 2002-12-31 2002-12-31 Sound distinguishing method in speech sound inquiry

Country Status (1)

Country Link
CN (1) CN1266633C (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100375006C (en) * 2006-01-19 2008-03-12 吉林大学 Vehicle navigation device voice control system
CN101499277B (en) * 2008-07-25 2011-05-04 中国科学院计算技术研究所 Service intelligent navigation method and system
US9978365B2 (en) 2008-10-31 2018-05-22 Nokia Technologies Oy Method and system for providing a voice interface
US8401856B2 (en) * 2010-05-17 2013-03-19 Avaya Inc. Automatic normalization of spoken syllable duration
CN104021786B (en) * 2014-05-15 2017-05-24 北京中科汇联信息技术有限公司 Speech recognition method and speech recognition device
CN104199825A (en) * 2014-07-23 2014-12-10 清华大学 Information inquiry method and system
CN104484370B (en) * 2014-12-04 2018-05-01 广东小天才科技有限公司 Answer information sending method, receiving method, device and system based on question and answer
CN104991889B (en) * 2015-06-26 2018-02-02 江苏科技大学 A kind of non-multi-character word error auto-collation based on fuzzy participle
CN106128457A (en) * 2016-08-29 2016-11-16 昆山邦泰汽车零部件制造有限公司 A kind of control method talking with robot
CN107301865B (en) * 2017-06-22 2020-11-03 海信集团有限公司 Method and device for determining interactive text in voice input
CN108777142A (en) * 2018-06-05 2018-11-09 上海木木机器人技术有限公司 A kind of interactive voice recognition methods and interactive voice robot based on airport environment
CN110364165A (en) * 2019-07-18 2019-10-22 青岛民航凯亚系统集成有限公司 Flight dynamic information voice inquiry method
CN112767923B (en) * 2021-01-05 2022-12-23 上海微盟企业发展有限公司 Voice recognition method and device

Also Published As

Publication number Publication date
CN1514387A (en) 2004-07-21

Similar Documents

Publication Publication Date Title
CN1215433C (en) Online character identifying device, method and program and computer readable recording media
CN1266633C (en) Sound distinguishing method in speech sound inquiry
CN1110757C (en) Method and device for processing two-language database
CN1297935C (en) System and method for performing unstructured information management and automatic text analysis
CN101079026A (en) Text similarity, acceptation similarity calculating method and system and application system
HK1049053A1 (en) Method for assembling and using a knowledge base
CN1447261A (en) Apparatus and method for specific element and string vector generation and similarity calculation
CN1608259A (en) Machine translation
CN1628298A (en) Method for synthesizing self-learning systems that extract knowledge from documents used in search systems
CN1219266C (en) Method for realizing multi-path dialogue for man-machine Chinese colloguial conversational system
CN101042868A (en) Clustering system, clustering method, clustering program and attribute estimation system using clustering system
CN1578954A (en) Machine translation
CN1331449A (en) Method and relative system for dividing or separating text or decument into sectional word by process of adherence
CN1535433A (en) Category based, extensible and interactive system for document retrieval
CN1133460A (en) Information taking method, equipment, weighted method and receiving equipment for graphic and character television transmission
CN1319836A (en) Method and device for converting expressing mode
CN1821956A (en) Using existing content to generate active content wizard executables for execution of tasks
CN1315020A (en) Method and apparatus for free-form data processing
CN1245577A (en) Question-Based Learning Method and System
CN1975858A (en) Conversation control apparatus
CN1125990A (en) Method and device for software automatic analysis
CN1452156A (en) Voice identifying apparatus and method, and recording medium with recorded voice identifying program
CN1255213A (en) Language analysis system and method
CN1577229A (en) Method for inputting note string into computer and diction production, and computer and medium thereof
CN1813252A (en) Information processing method, information processing program, information processing device, and remote controller

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20040721

Assignee: Beijing Zhongke force Intelligent Technology Co.,Ltd.

Assignor: Institute of Computing Technology, Chinese Academy of Sciences

Contract record no.: 2014110000024

Denomination of invention: Sound distinguishing method in speech sound inquiry

Granted publication date: 20060726

License type: Exclusive License

Record date: 20140610

LICC Enforcement, change and cancellation of record of contracts on the licence for exploitation of a patent or utility model
CX01 Expiry of patent term

Granted publication date: 20060726

CX01 Expiry of patent term
点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载