US20080154595A1 - System for classification of voice signals - Google Patents
- Publication number: US20080154595A1
- Authority: US (United States)
- Prior art keywords: voice signal, classifier, probability, integer, predefined
- Legal status: Abandoned (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1815—Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
Description
- This invention relates generally to electronic voice processing systems, and relates more particularly to a system and method for voice signal classification based on statistical regularities in voice signals.
- FIG. 1 is a block diagram of a speech recognition system of the prior art.
- the speech recognition system includes a microphone 110 , an analog-to-digital (A/D) converter 115 , a feature extractor 120 , a speech recognizer 125 , and a text string 130 .
- Microphone 110 receives sound energy via pressure waves (not shown).
- Microphone 110 converts the sound energy to an electronic analog voice signal and sends the analog voice signal to A/D converter 115 .
- A/D converter 115 samples and quantizes the analog signal, converting the analog voice signal to a digital voice signal.
- Typical sampling frequencies are 8 kHz and 16 kHz.
- A/D converter 115 then sends the digital voice signal to feature extractor 120 .
- feature extractor 120 segments the digital voice signal into consecutive data units called frames, and then extracts features that are characteristic to the voice signal of each frame. Typical frame lengths are ten, fifteen, or twenty milliseconds.
- Feature extractor 120 performs various operations on the voice signal of each frame. Operations may include transformation into a spectral representation by mapping the voice signal from time to frequency domain via a Fourier transform, suppressing noise in the spectral representation, converting the spectral representation to a spectral energy or power signal, and performing a second Fourier transform on the spectral energy or power signal to obtain cepstral coefficients.
- the cepstral coefficients represent characteristic spectral features of the voice signal.
- feature extractor 120 generates a set of feature vectors whose components are the cepstral coefficients.
- Feature extractor 120 sends the feature vectors to speech recognizer 125 .
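The frame-by-frame operations described above can be sketched as follows. This is an illustrative reconstruction, not the patent's implementation: it uses a naive DFT in place of an optimized FFT, omits the noise-suppression step, and the frame length and coefficient count are arbitrary choices.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (a stand-in for an optimized FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def cepstral_coefficients(frame, num_coeffs=4):
    """One frame -> cepstral feature vector: DFT, log spectral power, second DFT."""
    spectrum = dft(frame)
    # Spectral power signal; the small floor avoids log(0) in silent bins.
    log_power = [math.log(abs(c) ** 2 + 1e-12) for c in spectrum]
    cepstrum = dft(log_power)
    # Keep the real parts of the lowest-order coefficients as the feature vector.
    return [c.real / len(frame) for c in cepstrum[:num_coeffs]]

# A synthetic 64-sample frame (a pure tone) yields one 4-component feature vector.
frame = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
print(len(cepstral_coefficients(frame)))  # → 4
```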
- Speech recognizer 125 includes speech models and performs a speech recognition procedure on the received feature vectors to generate the text string 130 .
- speech recognizer 125 may be implemented as a Hidden Markov Model (HMM) recognizer.
- Speech recognition systems translate voice signals into text; however, speaker-independent speech recognition systems are generally rigid, inaccurate, computationally-intensive, and are not able to recognize true natural language. For example, typical speech recognition systems have a voice-to-text translation accuracy rate of 40%-50% when processing true natural language voice signals. It is difficult to design a highly accurate natural language speech recognition system that generates unconstrained voice-to-text translation in real-time, due to the complexity of natural language, the complexity of the language models used in speech recognition, and the limits on computational power.
- a system and method for classifying a voice signal to a category from a set of predefined categories, based upon a statistical analysis of features extracted from the voice signal.
- the system includes an acoustic processor that generates a feature vector and an associated integer label for each frame of the voice signal, a memory for storing statistical characterizations of a set of predefined categories and agents associated with each predefined category, and a classifier for classifying the voice signal to a predefined category based upon a statistical analysis of the received output of the acoustic processor.
- the acoustic processor includes an FFT for generating a spectral representation from the voice signal, a feature extractor for generating feature vectors characterizing the voice signal, a vector quantizer for quantizing the feature vectors and generating an integer label for each feature vector, and a register for storing the integer labels.
- the classifier computes a probability of occurrence for the output of the acoustic processor based on each of the statistical characterizations of the predefined categories, and classifies the voice signal to the predefined category with the highest probability or to a set of predefined categories with the highest probabilities. Furthermore, the classifier accesses memory to determine an agent associated with the predefined category or categories and routes a caller associated with the voice signal to the agent.
- the agent may be a human agent or a software agent.
- FIG. 1 is a block diagram of a speech recognition system of the prior art.
- FIG. 2 is a block diagram of one embodiment of a voice signal classification system, according to the present invention.
- FIG. 3 is a block diagram of one embodiment of the acoustic processor of FIG. 2 , according to the invention.
- FIG. 4A is a block diagram of one embodiment of the classifier of FIG. 2 , according to the invention.
- FIG. 4B is a block diagram of one embodiment of probabilistic suffix tree PST 11 of FIG. 4A , according to the invention.
- FIG. 4C is a block diagram of one embodiment of probabilistic suffix tree PST 21 of FIG. 4A , according to the invention.
- FIG. 5 is a block diagram of another embodiment of the classifier of FIG. 2 , according to the invention.
- FIG. 6 is a block diagram of one embodiment of a hierarchical structure of classes, according to the invention.
- FIG. 7 is a flowchart of method steps for classifying speech, according to one embodiment of the invention.
- the present invention classifies a voice signal based on statistical regularities in the signal.
- the invention analyzes the statistical regularities in the voice signal to determine a classification category.
- the voice signal classification system of the invention applies digital signal processing techniques to a voice signal.
- the system receives the voice signal and computes a set of quantized feature vectors that represents the statistical characteristics of the voice signal.
- the system analyzes the feature vectors and classifies the voice signal to a predefined category from a plurality of predefined categories.
- the system contacts an agent associated with the predefined category.
- the agent may be a person or an automated process that provides additional services to a caller.
- FIG. 2 is a block diagram of one embodiment of a voice signal classification system 200 , according to the invention.
- Voice classification system 200 includes a sound sensor 205 , an amplifier 210 , an A/D converter 215 , a framer 220 , an acoustic processor 221 , a classifier 245 , a memory 250 , and an agent 255 .
- System 200 may also include noise-reduction filters incorporated in A/D converter 215 , acoustic processor 221 , or as separate functional units.
- Sound sensor 205 detects sound energy and converts the detected sound energy into an electronic analog voice signal. In one embodiment, sound energy is input to system 200 by a speaker via a telephone call. Sound sensor 205 sends the analog voice signal to amplifier 210 .
- Amplifier 210 amplifies the analog voice signal and sends the amplified analog voice signal to A/D converter 215 .
- A/D converter 215 converts the amplified analog voice signal into a digital voice signal by sampling and quantizing the amplified analog voice signal.
- A/D converter 215 then sends the digital voice signal to framer 220 .
- Framer 220 segments the digital voice signal into successive data units called frames, where each frame occupies a time window of duration time T.
- a frame generally includes several hundred digital voice signal samples with a typical duration time T of ten, fifteen, or twenty milliseconds. However, the scope of the invention includes frames of any duration time T and any number of signal samples.
- Framer 220 sends the frames to acoustic processor 221 . Sound sensor 205 , amplifier 210 , A/D converter 215 , and framer 220 are collectively referred to as an acoustic front end to acoustic processor 221 .
- the scope of the invention covers other acoustic front ends configured to receive a voice signal, and generate a digital discrete-time representation of the voice signal.
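A minimal sketch of the framing step performed by framer 220, assuming non-overlapping frames, an 8 kHz sampling rate, and a 10 ms window; the patent does not fix these choices, and practical framers often use overlapping frames.

```python
def segment_into_frames(samples, sample_rate_hz=8000, frame_ms=10):
    """Segment a digital voice signal into consecutive frames of duration T."""
    frame_len = sample_rate_hz * frame_ms // 1000  # samples per frame
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples), frame_len)]
    # Drop a trailing partial frame, if any, so every frame has equal length.
    if frames and len(frames[-1]) < frame_len:
        frames.pop()
    return frames

signal = [0.0] * 1650               # about 206 ms of samples at 8 kHz
frames = segment_into_frames(signal)
print(len(frames), len(frames[0]))  # → 20 80
```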
- Acoustic processor 221 generates a feature vector and an associated integer label for each frame of the voice signal based upon statistical features of the voice signal. Acoustic processor 221 is described below in conjunction with FIG. 3 .
- classifier 245 classifies the voice signal to one of a set of predefined categories by performing a statistical analysis on the integer labels received from acoustic processor 221 . In another embodiment of the invention, classifier 245 classifies the voice signal to one of the set of predefined categories by performing a statistical analysis on the feature vectors received from acoustic processor 221 . Classifier 245 is not a speech recognition system that outputs a sequence of words. Classifier 245 classifies the voice signal to one of the set of predefined categories based upon the most likely content of the voice signal. Classifier 245 computes the probabilities that the voice signal belongs to each of a set of predefined categories based upon a statistical analysis of the integer labels generated by acoustic processor 221 .
- Classifier 245 assigns the voice signal to the predefined category that produces the highest probability. Classifier 245 , upon assigning the voice signal to one of the set of predefined categories, accesses memory 250 to determine which agent is associated with the predefined category. Classifier 245 then routes a caller associated with the voice signal to the appropriate agent 255 .
- Agent 255 may be a human agent or a software agent.
- FIG. 3 is a block diagram of one embodiment of acoustic processor 221 of FIG. 2 , according to the invention.
- acoustic processor 221 includes an FFT 325 , a feature extractor 330 , a vector quantizer 335 , and a register 340 .
- FFT 325 generates a spectral representation for each frame received from framer 220 by using a computationally efficient algorithm to compute the discrete Fourier transform of the voice signal.
- FFT 325 transforms the time-domain voice signal to the frequency-domain spectral representation to facilitate analysis of the voice signal by signal classification system 200 .
- FFT 325 sends the spectral representation of each frame to feature extractor 330 .
- Feature extractor 330 extracts statistical features of the voice signal and represents those statistical features by a feature vector, generating one feature vector for each frame. For example, feature extractor 330 may generate a smoothed version of the spectral representation called a Mel spectrum. The statistical features are identified by the relative energy in the Mel spectrum coefficients. Feature extractor 330 then computes the feature vector whose components are the Mel spectrum coefficients. Typically the components of the feature vector are cepstral coefficients, which feature extractor 330 computes from the Mel spectrum. All other techniques for extracting statistical features from the voice signal and processing the statistical features to generate feature vectors are within the scope of the invention. Feature extractor 330 sends the feature vectors to vector quantizer 335 . Vector quantizer 335 quantizes the feature vectors and assigns each quantized vector one integer label from a set of predefined integer labels.
- vector quantizer 335 snaps components of an n-dimensional feature vector to the nearest quantized components of an n-dimensional quantized feature vector.
- vector quantizer 335 generates a single scalar value for each quantized feature vector corresponding to a unique integer label of this vector among all different quantized feature vectors. For example, given a quantized n-dimensional feature vector v with quantized components (a1, a2, a3, . . .
- Vector quantizer 335 then assigns an integer label from the set of predefined integer labels to each computed SV.
- Vector quantizer 335 sends the integer labels to register 340 , which stores the labels for all frames in the voice signal.
- Register 340 may alternatively comprise a memory of various storage-device configurations, for example Random-Access Memory (RAM) and non-volatile storage devices such as floppy disks or hard disk drives.
- register 340 sends the entire sequence of integer labels to classifier 245 .
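The quantize-and-label step can be sketched with a fixed codebook, using the codeword index as the integer label. The codebook values below are hypothetical, and the patent leaves the scalar-value (SV) computation unspecified, so index-based labeling is only one plausible scheme.

```python
def nearest_codeword(vector, codebook):
    """Snap a feature vector to its nearest codebook entry (squared Euclidean distance)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist2(vector, codebook[i]))

def quantize_frames(feature_vectors, codebook):
    """Return one integer label per frame: the index of the nearest codeword."""
    return [nearest_codeword(v, codebook) for v in feature_vectors]

codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # hypothetical codewords
vectors = [(0.1, -0.2), (0.9, 0.8), (0.2, 0.7)]              # one vector per frame
print(quantize_frames(vectors, codebook))  # → [0, 3, 2]
```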
- acoustic processor 221 may functionally combine FFT 325 with feature extractor 330 , or may not include FFT 325 . If acoustic processor 221 does not perform an explicit FFT on the voice signal at any stage, acoustic processor 221 may use indirect methods known in the art for extracting statistical features from the voice signal. For example, in the absence of FFT 325 , feature extractor 330 may generate an LPC spectrum directly from the time domain representation of the signal. The statistical features are identified by spectral peaks in the LPC spectrum and are represented by a set of LPC coefficients. Then, in one embodiment, feature extractor 330 computes the feature vector whose components are the LPC coefficients. In another embodiment, feature extractor 330 computes the feature vector whose components are cepstral coefficients, which feature extractor 330 computes from the LPC coefficients by taking a fast Fourier transform of the LPC spectrum.
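As one concrete realization of the FFT-free path, LPC coefficients can be computed directly from the time-domain frame with the Levinson-Durbin recursion. This is a standard-textbook sketch, not the patent's code; the prediction order of 4 and the random stand-in frame are arbitrary.

```python
import random

def autocorrelation(x, order):
    """Autocorrelations r[0..order] of one time-domain frame."""
    return [sum(x[t] * x[t - lag] for t in range(lag, len(x)))
            for lag in range(order + 1)]

def lpc_coefficients(frame, order=4):
    """Levinson-Durbin recursion: LPC predictor coefficients a1..a_order."""
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a, err = new_a, err * (1 - k * k)
    return a[1:]

random.seed(0)  # a reproducible stand-in for one frame of voice samples
frame = [random.uniform(-1, 1) for _ in range(160)]
print(len(lpc_coefficients(frame)))  # → 4
```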
- FIG. 4A is a block diagram of one embodiment of classifier 245 of FIG. 2 , according to the invention.
- Classifier 245 includes one or more probabilistic suffix trees (PSTs) grouped together by voice classification category 410 .
- category 1 410 a may be “pets” and includes PST 11 , PST 12 , and PST 13 .
- Category 2 410 b may be “automobile parts” and includes PST 21 , PST 22 , PST 23 , and PST 24 . Any number and type of voice classification categories 410 and any number of PSTs per category are within the scope of the invention.
- FIG. 4B is a block diagram of one embodiment of PST 11 from category 1 410 a and FIG. 4C is a block diagram of one embodiment of PST 21 from category 2 410 b .
- the message information stored in register 340 can be considered as a string of integer labels.
- a suffix is a contiguous set of integer labels that terminates at that position.
- Suffix trees are data structures comprising a plurality of suffixes for a given string, allowing problems on strings, such as substring matching, to be solved efficiently and quickly.
- a PST is a suffix tree in which each vertex is assigned a probability.
- Each PST has a root vertex and a plurality of branches.
- a path along each branch comprises one or more substrings, and the substrings in combination along a specific branch define a particular suffix.
- PST 11 of FIG. 4B includes 9 suffixes represented by 9 branches, where a substring of each branch is defined by an integer label.
- For example, a 7-1-2 sequence of integer labels along a first branch defines a first suffix, a 7-1-4 sequence of integer labels along a second branch defines a second suffix, a 7-8-2 sequence of integer labels along a third branch defines a third suffix, and a 7-8-4 sequence of integer labels along a fourth branch defines a fourth suffix.
- a probability is assigned to each vertex of each PST in each category 410 , based upon suffix usage statistics in each category 410 . For example, the probabilities assigned to the vertices of the PSTs of category 1 410 a ( FIG. 4B ) reflect how often the corresponding suffixes occur in voice signals belonging to category 1 410 a .
- the PSTs associated with each voice classification category 410 are built from training sets.
- the training sets for each category include voice data from a variety of users such that the PSTs are built using a variety of pronunciations, inflections, and other such criteria.
- classifier 245 receives a sequence of integer labels from acoustic processor 221 associated with a voice message. Classifier 245 computes the probability of occurrence of the sequence of integer labels in each category using the PSTs. In one embodiment, classifier 245 determines a total probability for the sequence of integer labels for each PST in each category. Classifier 245 determines the total probability for a sequence of integer labels applied to a PST by determining a probability at each position in the sequence based on the longest suffix present in that PST, then calculating the product of the probabilities at each position. Classifier 245 then determines which category includes the PST that produced the highest total probability, and assigns the message to that category.
- classifier 245 determines the probability of a longest suffix at each of the seven locations in the integer label sequence.
- Classifier 245 reads the first location in the sequence of integer labels as the integer label 4. Since the integer label 4 is not associated with a branch labeled 4 that originates from a root vertex 420 of PST 11 , classifier 245 assigns a probability of root vertex 420 (e.g., 1) to the first location.
- the second location in the sequence of integer labels is the integer label 1.
- the longest suffix associated with the second location that is also represented by a branch originating from root vertex 420 is the suffix corresponding to the integer label 1, since the longest suffix corresponding to the integer label sequence 1-4 does not correspond to any branches similarly labeled originating from root vertex 420 . That is, PST 11 does not have a branch labeled 1-4 that originates from root vertex 420 . Therefore, classifier 245 assigns the probability defined at a vertex 422 (P( 1 )) to the second location.
- The third location in the sequence of integer labels is the integer label 7.
- classifier 245 assigns a probability associated with a vertex 424 (P(7-1-4)) to the third location.
- the next two locations in the sequence of integer labels correspond to the integers 2 and 3, respectively, and are not associated with any similarly labeled branches that originate from root vertex 420 , and therefore classifier 245 assigns the probability of root vertex 420 to these next two locations.
- the sixth location in the sequence corresponds to the integer label 1, and the longest suffix ending at the sixth location that is represented by a branch in PST 11 is the suffix 1-3-2.
- classifier 245 assigns a probability associated with a vertex 426 (P(1-3-2)) to the sixth location along the sequence.
- Although the sequence of integer labels for this example includes only seven integer labels, any number of integer labels is within the scope of the invention. The number of integer labels in the sequence depends on the number of frames of the message, which in turn depends on the duration of the voice signal input to system 200 .
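The longest-suffix scoring described above can be sketched with a PST stored as a mapping from contexts (most recent label first, matching the branch labels of FIG. 4B ) to vertex probabilities. The probability values below are hypothetical, only the suffixes 1, 7-1-4, and 1-3-2 from the walkthrough are included, and the seventh label is chosen arbitrarily.

```python
import math

# A toy PST: each key is a suffix read from the most recent label backwards;
# the empty tuple is the root vertex. Probabilities are hypothetical.
PST11 = {
    (): 1.0,
    (1,): 0.30,       # P(1)
    (7, 1, 4): 0.20,  # P(7-1-4)
    (1, 3, 2): 0.25,  # P(1-3-2)
}

def position_probability(labels, i, pst):
    """Probability of the longest suffix ending at position i that the PST contains."""
    for depth in range(i + 1, 0, -1):              # deepest context first
        context = tuple(reversed(labels[i - depth + 1:i + 1]))
        if context in pst:
            return pst[context]
    return pst[()]                                  # fall back to the root vertex

def total_probability(labels, pst):
    """Product over every position of its longest-suffix probability."""
    return math.prod(position_probability(labels, i, pst)
                     for i in range(len(labels)))

labels = [4, 1, 7, 2, 3, 1, 5]  # the walkthrough's sequence; last label arbitrary
print(round(total_probability(labels, PST11), 6))  # → 0.015
```

The product here is 1 × 0.3 × 0.2 × 1 × 1 × 0.25 × 1, mirroring the root-vertex fallbacks and the P(1), P(7-1-4), and P(1-3-2) lookups in the walkthrough.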
- FIG. 5 is a block diagram of another embodiment of classifier 245 , according to the invention.
- the FIG. 5 embodiment of classifier 245 includes three states and nine arcs, but the scope of the invention includes classifiers with any number of states and associated arcs. Since each state is associated with one of the predefined integer labels, the number of states is equal to the number of predefined integer labels.
- the FIG. 5 embodiment of classifier 245 comprises three predefined integer labels, where state 1 ( 505 ) is identified with integer label 1, state 2 ( 510 ) is identified with integer label 2, and state 3 ( 515 ) is identified with integer label 3.
- the arcs represent the probability of a transition from one state to another state or the same state.
- a12 is the probability of a transition from state 1 ( 505 ) to state 2 ( 510 )
- a21 is the probability of a transition from state 2 ( 510 ) to state 1 ( 505 )
- a11 is the probability of a transition from state 1 ( 505 ) to state 1 ( 505 ).
- the transition probabilities aij(L) depend on the integer labels L of the quantized speech.
- classifier 245 computes all permutations of the integer labels received from acoustic processor 221 and computes a probability of occurrence for each permutation. Classifier 245 associates each permutation of the received integer labels to a unique sequence of states.
- the sequences of states include, for example, 1 ⁇ 1 ⁇ 1, 1 ⁇ 1 ⁇ 2, 1 ⁇ 2 ⁇ 1, 1 ⁇ 1 ⁇ 3, 1 ⁇ 3 ⁇ 1, 1 ⁇ 2 ⁇ 1, 1 ⁇ 2 ⁇ 2, 1 ⁇ 3 ⁇ 3, and 1 ⁇ 2 ⁇ 3.
- the transition probabilities are a11(L), a22(L), a33(L), a12(L), a21(L), a13(L), a31(L), a23(L), and a32(L).
- classifier 245 assigns an initial starting probability to each state. For example, classifier 245 assigns to state 1 ( 505 ) a probability a11, which represents the probability of starting in state 1, to state 2 ( 510 ) a probability a12, which represents the probability of starting in state 2, and to state 3 ( 515 ) a probability a13, which represents the probability of starting in state 3.
- if classifier 245 receives the integer labels (1,2,3), then classifier 245 computes the six sequences of states 1 → 2 → 3, 1 → 3 → 2, 2 → 1 → 3, 2 → 3 → 1, 3 → 1 → 2, and 3 → 2 → 1, and an associated probability of occurrence for each sequence.
- the six sequences of states are a subset of the 27 possible sequences of states. For example, classifier 245 computes the total probability of the 1 → 2 → 3 sequence of states by multiplying the probability of starting in state 1, a11, by the probability a12(L1) of a transition from state 1 to state 2 when the first integer label of a sequence of integer labels appears, by the probability a23(L2) of a transition from state 2 to state 3 when the second integer label of the sequence appears.
- Classifier 245 calculates the total probabilities for the remaining five sequences of states in a similar manner. Classifier 245 then classifies the voice signal to one of a set of predefined categories associated with the sequence of states with the highest probability of occurrence. Some of the sequences of states may not have associated categories, and some of the sequences of states may have the same associated category. If there is no predefined category associated with the sequence of states with the highest probability of occurrence, then classifier 245 classifies the voice signal to a predefined category associated with the sequence of states with the next highest probability of occurrence.
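The FIG. 5 scheme can be sketched for three states as follows. The starting and transition probabilities are hypothetical, and for brevity the transition probabilities here do not vary with the integer label L, although the patent's aij(L) do.

```python
from itertools import permutations

START = {1: 0.5, 2: 0.3, 3: 0.2}   # hypothetical starting probabilities
TRANS = {                          # hypothetical transition probabilities
    1: {1: 0.6, 2: 0.3, 3: 0.1},
    2: {1: 0.2, 2: 0.5, 3: 0.3},
    3: {1: 0.3, 2: 0.3, 3: 0.4},
}

def sequence_probability(states):
    """Starting probability of the first state times each transition probability."""
    p = START[states[0]]
    for prev, nxt in zip(states, states[1:]):
        p *= TRANS[prev][nxt]
    return p

def best_state_sequence(labels):
    """Score every permutation of the received integer labels; keep the likeliest."""
    return max(permutations(labels), key=sequence_probability)

print(best_state_sequence((1, 2, 3)))  # → (1, 2, 3)
```

With these numbers the 1 → 2 → 3 sequence wins with probability 0.5 × 0.3 × 0.3 = 0.045; the winning sequence would then select the associated predefined category.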
- Voice classification system 200 may be implemented in a voice message routing system, a quality-control call center, an interface to a Web-based voice portal, or in conjunction with a speech-to-text recognition engine, for example.
- a retail store may use voice signal classification system 200 to route telephone calls to an appropriate department (agent) based upon a category to which a voice signal is classified. For example, a person may call the retail store to inquire whether the store sells a particular brand of cat food. More specifically, a person may say the following: "I was wondering if you carry, . . . uh, . . . well, if you stock or have in store cat food X, well actually cat food for my kitten, and if so, could you tell me the price of a bag."
- voice signal classification system 200 classifies the received natural language voice signal into a category based upon the content of the voice signal. For example, system 200 may classify the voice signal to a pet department category, and therefore route the person's call to the pet department (agent). However, in addition, system 200 may classify the speech into other categories, such as billing, accounting, employment opportunities, deliveries, or others. For example, system 200 may classify the speech to a pricing category that routes the call to an associated agent that can immediately answer the caller's questions concerning inventory pricing.
- System 200 may classify voice signals to categories associated with predefined items on a menu. For example, a voice signal may be classified to a category associated with a software agent that activates a playback of a predefined pet department menu. The caller can respond to the pet department menu with additional voice messages or a touch-tone keypad response. Or the voice signal may be classified to another category whose associated software agent activates a playback of a predefined pricing menu.
- system 200 may be implemented in a quality control call center that classifies calls into complaint categories, order categories, or personal call categories, for example. An agent then selects calls from the various categories based upon the agent's priorities at the time. Thus, system 200 provides an effective and efficient manner of customer-service quality control.
- system 200 may be configured as an interface to voice portals, classifying calls to various categories such as weather, stock, or traffic, and then routing and connecting the call to an appropriate voice portal.
- system 200 is used in conjunction with a speech-to-text recognition engine.
- a voice signal is assigned to a particular category that is associated with a predefined speech model including a defined vocabulary set for use in the recognition engine. For instance, a caller inquiring about current weather conditions in Oklahoma City would access the recognition engine with a speech model/vocabulary set including voice-to-text translations for words such as “storm”, “rain”, “hail”, and “tornado.”
- the association of speech models/vocabulary sets with each voice signal category reduces the complexity of the speech-to-text recognition engine and consequently reduces speech-to-text processing times.
- the combination of system 200 with the speech-to-text recognition engine may classify voice signals into language categories, thus making the combination of system 200 and the speech-to-text recognition engine language independent. For example, if voice classification system 200 classifies a voice signal to a German language category, then the recognition engine uses a speech model/vocabulary set associated with the German language category to translate the voice signal.
- system 200 may be implemented to classify voice signals into categories that are independent of the specific spoken words or text of the call. For example, system 200 may be configured to categorize a caller as male or female as the content of a male voice signal typically is distinguishable from the content of a female voice signal. Similarly, system 200 may be configured to identify a caller as being one member of a predetermined group of persons as the content of the voice signal of each person in the group would be distinguishable from that of the other members of the group. System 200 therefore may be used, for example, in a caller identification capacity or a password protection or other security capacity.
- system 200 may be used to categorize voice signals as either male or female, system 200 may be used to distinguish between any voice signal sources where the voice signals at issue are known to have different content. Such voice signals are not required to be expressed in a known language.
- system 200 may be used to distinguish between various types of animals, such as cats and dogs or sheep and cows.
- system 200 may be used to distinguish among different animals of the same type, such as dogs, where a predetermined group of such animals exists and the voice signal content of each animal in the group is known. In this case, system 200 may be used to identify any one of the animals in the group in much the same way that system 200 may be used to identify a caller as described above.
- FIG. 6 is a block diagram of one embodiment of a hierarchical structure of classes 600 , according to the invention.
- the hierarchical structure includes a first level class 605 , a second level class 610 , and a third level class 615 .
- the first level class 605 includes language categories, such as an English language category 620 , a German language category 625 , and a Spanish language category 630 .
- the second level class 610 includes a pricing category 635 , a complaint category 640 , and an order category 645 .
- the third level class 615 includes a hardware category 650 , a sporting goods category 655 , and a kitchen supplies category 660 .
- voice classification system 200 receives a call and classifies the caller's voice signal 601 into English category 620 , then classifies voice signal 601 into order subcategory 645 , and then classifies voice signal 601 into sporting goods sub-subcategory 655 . Finally, system 200 routes the call to an agent 665 associated with ordering sporting goods supplies in English.
- the configuration of system 200 with the hierarchical structure of classes 600 permits more flexibility and refinement in classifying voice signals to categories.
- the scope of the present invention includes any number of class levels and any number of categories in each class level.
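The routing through such a hierarchy can be sketched as a nested lookup. This is only an illustrative sketch: the dictionary layout, the agent names, and the stand-in classify function are assumptions, not details from the patent; only the category names follow FIG. 6.

```python
# The three-level hierarchy of FIG. 6, with an agent name at each leaf.
# Agent names and the dict layout are illustrative assumptions.
hierarchy = {
    "English": {
        "pricing": {},
        "complaint": {},
        "order": {
            "hardware": "agent_hardware_en",
            "sporting goods": "agent_sporting_goods_en",
            "kitchen supplies": "agent_kitchen_en",
        },
    },
    "German": {},
    "Spanish": {},
}

def route(voice_signal, node, classify):
    """Classify the signal once per level, descending until an agent is reached."""
    while isinstance(node, dict) and node:
        category = classify(voice_signal, list(node))  # pick one category per level
        node = node[category]
    return node  # the agent at the leaf (or {} if that branch is undefined)

# Stand-in classifier that picks English -> order -> sporting goods.
choices = iter(["English", "order", "sporting goods"])
agent = route("voice signal 601", hierarchy, lambda sig, cats: next(choices))
```

Each level narrows the search, so the per-level classifiers can stay small: the English-language order classifier never has to distinguish German complaints.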
- FIG. 7 is a flowchart of method steps for classifying speech, according to one embodiment of the invention. Although the steps of the FIG. 7 method are described in the context of system 200 of FIG. 2 , any other system configured to implement the method steps is within the scope of the invention.
- sound sensor 205 detects sound energy and converts the sound energy into an analog voice signal.
- amplifier 210 amplifies the analog voice signal.
- A/D converter 215 converts the amplified analog voice signal into a digital voice signal.
- framer 220 segments the digital voice signal into successive data units called frames.
- acoustic processor 221 processes the frames and generates a feature vector and an associated integer label for each frame.
- acoustic processor 221 extracts features (such as statistical features) from each frame, processes the extracted features to generate feature vectors, and assigns an integer label to each feature vector.
- Acoustic processor 221 may include one or more of the following: an FFT 325 , a feature extractor 330 , a vector quantizer 335 , and a register 340 .
- classifier 245 performs a statistical analysis on the integer labels and, in a step 735, classifies the voice signal to a predefined category based upon the results of the statistical analysis.
- classifier 245 accesses memory 250 to determine which agent 255 is associated with the predefined category assigned to the voice signal.
- the agent may either be a human agent or a software agent.
- a caller associated with the voice signal is routed to the agent corresponding to the predefined category.
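The steps above can be sketched end to end as follows. Everything here is a toy stand-in under stated assumptions: the energy-based labeling replaces the real acoustic processor, the per-label probability models replace the trained classifier, and all names are invented for illustration.

```python
from math import log

def classify_call(samples, categories, frame_len=160):
    """Illustrative sketch of the FIG. 7 flow; names are assumptions, not patent identifiers."""
    # Framer: segment the digitized signal into successive frames.
    n_frames = len(samples) // frame_len
    frames = [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]

    # Acoustic processor (toy stand-in): one integer label per frame,
    # here a 2-level quantization of frame energy.
    energies = [sum(x * x for x in f) for f in frames]
    mean_e = sum(energies) / len(energies)
    labels = [int(e > mean_e) for e in energies]

    # Classifier: score the label sequence against each category's statistical
    # model (per-label probabilities) and pick the most probable category.
    def log_prob(model, seq):
        return sum(log(model.get(lbl, 1e-6)) for lbl in seq)
    best = max(categories, key=lambda c: log_prob(categories[c]["model"], labels))

    # Route the caller to the agent associated with the winning category.
    return best, categories[best]["agent"]

# Hypothetical categories with per-label probabilities and associated agents.
categories = {
    "order": {"model": {0: 0.2, 1: 0.8}, "agent": "order_desk"},
    "complaint": {"model": {0: 0.8, 1: 0.2}, "agent": "complaints"},
}
signal = [0.1] * 160 + [5.0] * 480          # one quiet frame, three loud frames
result = classify_call(signal, categories)
```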
Abstract
A system and method for classifying a voice signal to one of a set of predefined categories, based upon a statistical analysis of features extracted from the voice signal. The system includes an acoustic processor and a classifier. The acoustic processor extracts spectral features that are characteristic of the voice signal and generates feature vectors from the extracted spectral features. The classifier uses the feature vectors to compute the probability that the voice signal belongs to each of the predefined categories and classifies the voice signal to the predefined category that is associated with the highest probability.
Description
- 1. Field of the Invention
- This invention relates generally to electronic voice processing systems, and relates more particularly to a system and method for voice signal classification based on statistical regularities in voice signals.
- 2. Description of the Background Art
- Speech recognition systems may be used for interaction with a computer or other device. Speech recognition systems usually translate a voice signal into a text string that corresponds to instructions for the device.
FIG. 1 is a block diagram of a speech recognition system of the prior art. The speech recognition system includes a microphone 110, an analog-to-digital (A/D) converter 115, a feature extractor 120, a speech recognizer 125, and a text string 130. Microphone 110 receives sound energy via pressure waves (not shown). Microphone 110 converts the sound energy to an electronic analog voice signal and sends the analog voice signal to A/D converter 115. A/D converter 115 samples and quantizes the analog signal, converting the analog voice signal to a digital voice signal. Typical sampling frequencies are 8 kHz and 16 kHz. A/D converter 115 then sends the digital voice signal to feature extractor 120. Typically, feature extractor 120 segments the digital voice signal into consecutive data units called frames, and then extracts features that are characteristic of the voice signal of each frame. Typical frame lengths are ten, fifteen, or twenty milliseconds. Feature extractor 120 performs various operations on the voice signal of each frame. Operations may include transformation into a spectral representation by mapping the voice signal from the time domain to the frequency domain via a Fourier transform, suppressing noise in the spectral representation, converting the spectral representation to a spectral energy or power signal, and performing a second Fourier transform on the spectral energy or power signal to obtain cepstral coefficients. The cepstral coefficients represent characteristic spectral features of the voice signal. Typically, feature extractor 120 generates a set of feature vectors whose components are the cepstral coefficients. Feature extractor 120 sends the feature vectors to speech recognizer 125. Speech recognizer 125 includes speech models and performs a speech recognition procedure on the received feature vectors to generate the text string 130. For example, speech recognizer 125 may be implemented as a Hidden Markov Model (HMM) recognizer.
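The feature-extraction chain just described (Fourier transform, spectral power, second transform yielding cepstral coefficients) can be sketched as follows. The log compression follows the standard real-cepstrum definition (the text above speaks only of a "second Fourier transform"), and the naive DFT and coefficient count are illustrative simplifications.

```python
import math
from cmath import exp, pi

def dft(x):
    """Naive discrete Fourier transform (a real FFT computes the same result faster)."""
    n = len(x)
    return [sum(x[t] * exp(-2j * pi * k * t / n) for t in range(n)) for k in range(n)]

def cepstral_coefficients(frame, n_coeffs=12):
    """Frame -> spectral representation -> log power -> second transform."""
    spectrum = dft(frame)                                          # time -> frequency domain
    log_power = [math.log(abs(c) ** 2 + 1e-12) for c in spectrum]  # log spectral power
    cepstrum = dft(log_power)                                      # second Fourier transform
    return [c.real / len(frame) for c in cepstrum[:n_coeffs]]      # low-order coefficients

# One 20 ms frame at 8 kHz (160 samples) of a 440 Hz tone.
frame = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(160)]
features = cepstral_coefficients(frame)   # one feature vector for this frame
```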
- Speech recognition systems translate voice signals into text; however, speaker-independent speech recognition systems are generally rigid, inaccurate, and computationally intensive, and they are not able to recognize true natural language. For example, typical speech recognition systems have a voice-to-text translation accuracy rate of 40%-50% when processing true natural language voice signals. It is difficult to design a highly accurate natural language speech recognition system that generates unconstrained voice-to-text translation in real time, due to the complexity of natural language, the complexity of the language models used in speech recognition, and the limits on computational power.
- In many applications, the exact text of a speech message is unimportant, and only the topic of the speech message needs to be recognized. It would be desirable to have a flexible, efficient, and accurate speech classification system that categorizes natural language speech based upon the topics of a speech message. In other words, it would be advantageous to implement a speech classification system that categorizes speech based upon what is talked about, without generating an exact transcript of what is said.
- In accordance with the present invention, a system and method are disclosed for classifying a voice signal to a category from a set of predefined categories, based upon a statistical analysis of features extracted from the voice signal.
- The system includes an acoustic processor that generates a feature vector and an associated integer label for each frame of the voice signal, a memory for storing statistical characterizations of a set of predefined categories and agents associated with each predefined category, and a classifier for classifying the voice signal to a predefined category based upon a statistical analysis of the received output of the acoustic processor.
- In one embodiment the acoustic processor includes an FFT for generating a spectral representation from the voice signal, a feature extractor for generating feature vectors characterizing the voice signal, a vector quantizer for quantizing the feature vectors and generating an integer label for each feature vector, and a register for storing the integer labels.
- The classifier computes a probability of occurrence for the output of the acoustic processor based on each of the statistical characterizations of the predefined categories, and classifies the voice signal to the predefined category with the highest probability or to a set of predefined categories with the highest probabilities. Furthermore, the classifier accesses memory to determine an agent associated with the predefined category or categories and routes a caller associated with the voice signal to the agent. The agent may be a human agent or a software agent.
-
FIG. 1 is a block diagram of a speech recognition system of the prior art; -
FIG. 2 is a block diagram of one embodiment of a voice signal classification system, according to the present invention; -
FIG. 3 is a block diagram of one embodiment of the acoustic processor of FIG. 2, according to the invention; -
FIG. 4A is a block diagram of one embodiment of the classifier of FIG. 2, according to the invention; -
FIG. 4B is a block diagram of one embodiment of probabilistic suffix tree PST11 of FIG. 4A, according to the invention; -
FIG. 4C is a block diagram of one embodiment of probabilistic suffix tree PST21 of FIG. 4A, according to the invention; -
FIG. 5 is a block diagram of another embodiment of the classifier of FIG. 2, according to the invention; -
FIG. 6 is a block diagram of one embodiment of a hierarchical structure of classes, according to the invention; and -
FIG. 7 is a flowchart of method steps for classifying speech, according to one embodiment of the invention. - The present invention classifies a voice signal based on statistical regularities in the signal. The invention analyzes the statistical regularities in the voice signal to determine a classification category. In one embodiment, the voice signal classification system of the invention applies digital signal processing techniques to a voice signal. The system receives the voice signal and computes a set of quantized feature vectors that represents the statistical characteristics of the voice signal. The system then analyzes the feature vectors and classifies the voice signal to a predefined category from a plurality of predefined categories. Finally, the system contacts an agent associated with the predefined category. The agent may be a person or an automated process that provides additional services to a caller.
-
FIG. 2 is a block diagram of one embodiment of a voice signal classification system 200, according to the invention. Voice classification system 200 includes a sound sensor 205, an amplifier 210, an A/D converter 215, a framer 220, an acoustic processor 221, a classifier 245, a memory 250, and an agent 255. System 200 may also include noise-reduction filters incorporated in A/D converter 215, in acoustic processor 221, or as separate functional units. Sound sensor 205 detects sound energy and converts the detected sound energy into an electronic analog voice signal. In one embodiment, sound energy is input to system 200 by a speaker via a telephone call. Sound sensor 205 sends the analog voice signal to amplifier 210. Amplifier 210 amplifies the analog voice signal and sends the amplified analog voice signal to A/D converter 215. A/D converter 215 converts the amplified analog voice signal into a digital voice signal by sampling and quantizing the amplified analog voice signal. A/D converter 215 then sends the digital voice signal to framer 220. -
Framer 220 segments the digital voice signal into successive data units called frames, where each frame occupies a time window of duration T. A frame generally includes several hundred digital voice signal samples, with a typical duration T of ten, fifteen, or twenty milliseconds. However, the scope of the invention includes frames of any duration T and any number of signal samples. Framer 220 sends the frames to acoustic processor 221. Sound sensor 205, amplifier 210, A/D converter 215, and framer 220 are collectively referred to as an acoustic front end to acoustic processor 221. The scope of the invention covers other acoustic front ends configured to receive a voice signal and generate a digital discrete-time representation of the voice signal. -
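A minimal sketch of this framing step, assuming the 8 kHz sample rate and 20 ms frame duration named as typical values in the text; dropping trailing samples that do not fill a whole frame is a simplification of this sketch, not something the patent specifies.

```python
def frame_signal(samples, sample_rate=8000, frame_ms=20):
    """Segment a digital voice signal into successive frames of duration T.
    20 ms at 8 kHz gives 160 samples per frame; trailing samples that do
    not fill a whole frame are dropped in this sketch."""
    frame_len = sample_rate * frame_ms // 1000
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]

frames = frame_signal([0.0] * 1000)   # 1000 samples -> 6 full frames of 160 samples
```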
Acoustic processor 221 generates a feature vector and an associated integer label for each frame of the voice signal based upon statistical features of the voice signal. Acoustic processor 221 is described below in conjunction with FIG. 3. - In one embodiment,
classifier 245 classifies the voice signal to one of a set of predefined categories by performing a statistical analysis on the integer labels received from acoustic processor 221. In another embodiment of the invention, classifier 245 classifies the voice signal to one of the set of predefined categories by performing a statistical analysis on the feature vectors received from acoustic processor 221. Classifier 245 is not a speech recognition system that outputs a sequence of words. Classifier 245 classifies the voice signal to one of the set of predefined categories based upon the most likely content of the voice signal. Classifier 245 computes the probabilities that the voice signal belongs to each of a set of predefined categories based upon a statistical analysis of the integer labels generated by acoustic processor 221. Classifier 245 assigns the voice signal to the predefined category that produces the highest probability. Classifier 245, upon assigning the voice signal to one of the set of predefined categories, accesses memory 250 to determine which agent is associated with the predefined category. Classifier 245 then routes a caller associated with the voice signal to the appropriate agent 255. Agent 255 may be a human agent or a software agent. -
FIG. 3 is a block diagram of one embodiment of acoustic processor 221 of FIG. 2, according to the invention. However, the scope of the invention covers any acoustic processor that characterizes voice signals by extracting statistical features from the voice signals. In the FIG. 3 embodiment, acoustic processor 221 includes an FFT 325, a feature extractor 330, a vector quantizer 335, and a register 340. FFT 325 generates a spectral representation for each frame received from framer 220 by using a computationally efficient algorithm to compute the discrete Fourier transform of the voice signal. FFT 325 transforms the time-domain voice signal to the frequency-domain spectral representation to facilitate analysis of the voice signal by signal classification system 200. FFT 325 sends the spectral representation of each frame to feature extractor 330. Feature extractor 330 extracts statistical features of the voice signal and represents those statistical features by a feature vector, generating one feature vector for each frame. For example, feature extractor 330 may generate a smoothed version of the spectral representation called a Mel spectrum. The statistical features are identified by the relative energy in the Mel spectrum coefficients. Feature extractor 330 then computes the feature vector whose components are the Mel spectrum coefficients. Typically the components of the feature vector are cepstral coefficients, which feature extractor 330 computes from the Mel spectrum. All other techniques for extracting statistical features from the voice signal and processing the statistical features to generate feature vectors are within the scope of the invention. Feature extractor 330 sends the feature vectors to vector quantizer 335. Vector quantizer 335 quantizes the feature vectors and assigns each quantized vector one integer label from a set of predefined integer labels. - In an exemplary embodiment,
vector quantizer 335 snaps components of an n-dimensional feature vector to the nearest quantized components of an n-dimensional quantized feature vector. Typically there are a finite number of different quantized feature vectors that can be enumerated by integers. Once the components of the feature vectors are quantized, vector quantizer 335 generates a single scalar value for each quantized feature vector, corresponding to a unique integer label of this vector among all different quantized feature vectors. For example, given a quantized n-dimensional feature vector v with quantized components (a1, a2, a3, . . . , an), a scalar value (SV) may be generated by a function SV=f(a1, a2, a3, . . . , an), where SV is equal to a function f of the quantized components (a1, a2, a3, . . . , an). Vector quantizer 335 then assigns an integer label from the set of predefined integer labels to each computed SV. -
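One possible realization of this quantize-then-label scheme is sketched below. The step size, the number of levels, and the particular choice of f (reading the quantized components as digits of a base-`levels` number) are assumptions for illustration; the patent leaves f unspecified.

```python
def quantize(vector, step=0.5, levels=8):
    """Snap each component to the nearest of `levels` quantized values."""
    return [min(levels - 1, max(0, round(a / step))) for a in vector]

def integer_label(quantized, levels=8):
    """One possible SV = f(a1, ..., an): read the quantized components as the
    digits of a base-`levels` number, so every distinct quantized vector
    receives a distinct integer label."""
    label = 0
    for a in quantized:
        label = label * levels + a
    return label

q = quantize([0.4, 1.1, 2.6])   # components snap to 1, 2, and 5
label = integer_label(q)        # 1*8**2 + 2*8 + 5 = 85
```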
Vector quantizer 335 sends the integer labels to register 340, which stores the labels for all frames in the voice signal. Register 340 may alternatively comprise a memory of various storage-device configurations, for example Random-Access Memory (RAM) and non-volatile storage devices such as floppy disks or hard disk drives. Once the entire sequence of integer labels that represents the voice signal is stored in register 340, register 340 sends the entire sequence of integer labels to classifier 245. - In alternate embodiments,
acoustic processor 221 may functionally combine FFT 325 with feature extractor 330, or may not include FFT 325. If acoustic processor 221 does not perform an explicit FFT on the voice signal at any stage, acoustic processor 221 may use indirect methods known in the art for extracting statistical features from the voice signal. For example, in the absence of FFT 325, feature extractor 330 may generate an LPC spectrum directly from the time-domain representation of the signal. The statistical features are identified by spectral peaks in the LPC spectrum and are represented by a set of LPC coefficients. Then, in one embodiment, feature extractor 330 computes the feature vector whose components are the LPC coefficients. In another embodiment, feature extractor 330 computes the feature vector whose components are cepstral coefficients, which feature extractor 330 computes from the LPC coefficients by taking a fast Fourier transform of the LPC spectrum. -
FIG. 4A is a block diagram of one embodiment of classifier 245 of FIG. 2, according to the invention. Classifier 245 includes one or more probabilistic suffix trees (PSTs) grouped together by voice classification category 410. For example, category 1 410 a may be “pets” and includes PST11, PST12, and PST13. Category 2 410 b may be “automobile parts” and includes PST21, PST22, PST23, and PST24. Any number and type of voice classification categories 410 and any number of PSTs per category are within the scope of the invention. -
FIG. 4B is a block diagram of one embodiment of PST11 from category 1 410 a, and FIG. 4C is a block diagram of one embodiment of PST21 from category 2 410 b. The message information stored in register 340 (FIG. 3) can be considered as a string of integer labels. For each position in this string, a suffix is a contiguous set of integer labels that terminates at that position. Suffix trees are data structures comprising a plurality of suffixes for a given string, allowing problems on strings, such as substring matching, to be solved efficiently and quickly. A PST is a suffix tree in which each vertex is assigned a probability. Each PST has a root vertex and a plurality of branches. A path along each branch comprises one or more substrings, and the substrings in combination along a specific branch define a particular suffix. - For example, PST11 of
FIG. 4B includes 9 suffixes represented by 9 branches, where a substring of each branch is defined by an integer label. For example, a 7-1-2 sequence of integer labels along a first branch defines a first suffix, a 7-1-4 sequence of integer labels along a second branch defines a second suffix, a 7-8-2 sequence of integer labels along a third branch defines a third suffix, and a 7-8-4 sequence of integer labels along a fourth branch defines a fourth suffix. In one embodiment, a probability is assigned to each vertex of each PST in each category 410, based upon suffix usage statistics in each category 410. For example, suffixes specified by the PSTs of category 1 410 a (FIG. 4A) common to words typically used to describe “pets” are assigned higher probabilities than suffixes used less frequently. In addition, a probability assigned to a given suffix from category 1 410 a is typically different than a probability assigned to the given suffix from category 2 410 b (FIG. 4A). - In one embodiment, the PSTs associated with each voice classification category 410 are built from training sets. The training sets for each category include voice data from a variety of users such that the PSTs are built using a variety of pronunciations, inflections, and other such criteria.
- In operation,
classifier 245 receives a sequence of integer labels from acoustic processor 221 associated with a voice message. Classifier 245 computes the probability of occurrence of the sequence of integer labels in each category using the PSTs. In one embodiment, classifier 245 determines a total probability for the sequence of integer labels for each PST in each category. Classifier 245 determines the total probability for a sequence of integer labels applied to a PST by determining a probability at each position in the sequence based on the longest suffix present in that PST, then calculating the product of the probabilities at each position. Classifier 245 then determines which category includes the PST that produced the highest total probability, and assigns the message to that category. - Using PST11 of
FIG. 4B and a sequence of integer labels 4-1-7-2-3-1-10 as an example, classifier 245 determines the probability of a longest suffix at each of the seven locations in the integer label sequence. Classifier 245 reads the first location in the sequence of integer labels as the integer label 4. Since the integer label 4 is not associated with a branch labeled 4 that originates from a root vertex 420 of PST11, classifier 245 assigns a probability of root vertex 420 (e.g., 1) to the first location. The second location in the sequence of integer labels is the integer label 1. The longest suffix associated with the second location that is also represented by a branch originating from root vertex 420 is the suffix corresponding to the integer label 1, since the longest suffix corresponding to the integer label sequence 1-4 does not correspond to any similarly labeled branch originating from root vertex 420. That is, PST11 does not have a branch labeled 1-4 that originates from root vertex 420. Therefore, classifier 245 assigns the probability defined at a vertex 422 (P(1)) to the second location. The third location in the sequence of integer labels is the integer label 7. Since the longest suffix ending at the integer label 7 (i.e., suffix 7-1-4) exists in PST11 as the branch labeled 7-1-4 originating from root vertex 420, classifier 245 assigns a probability associated with a vertex 424 (P(7-1-4)) to the third location. The next two locations in the sequence of integer labels correspond to the integer labels 2 and 3. Since no suffix ending at either of these locations is represented by a branch originating from root vertex 420, classifier 245 assigns the probability of root vertex 420 to these next two locations. The sixth location in the sequence corresponds to the integer label 1, and the longest suffix ending at the sixth location that is represented by a branch in PST11 is the suffix 1-3-2. Therefore, classifier 245 assigns a probability associated with a vertex 426 (P(1-3-2)) to the sixth location along the sequence.
Next, since the seventh location, corresponding to the integer label 10, is not represented by a branch in PST11 originating from root vertex 420, classifier 245 assigns the probability of root vertex 420 to the seventh location in the sequence. - Next,
classifier 245 calculates the total probability for the sequence of integer labels 4-1-7-2-3-1-10 applied to PST11, where the total probability is the product of the location probabilities: PT(PST11)=1×P(1)×P(7-1-4)×1×1×P(1-3-2)×1. In another embodiment of the invention, classifier 245 calculates the total probability by summing the logarithm of each location probability. Although the sequence of integer labels for this example includes only seven integer labels, any number of integer labels is within the scope of the invention. The number of integer labels in the sequence depends on the number of frames of the message, which in turn depends on the duration of the voice signal input to system 200. -
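The longest-suffix scoring walked through above can be sketched as follows, using the log-sum variant the text mentions. The suffixes are keyed most-recent label first, matching how the branch labels of FIG. 4B are read from the root outward, and the numeric vertex probabilities are illustrative assumptions.

```python
from math import log

# PST11 of FIG. 4B, modeled as a map from a suffix (most-recent label first,
# as the branch labels are read) to the probability at its vertex. The
# numeric probabilities are illustrative assumptions, not figures from the patent.
pst11 = {
    (1,): 0.30,        # P(1) at vertex 422
    (7, 1, 4): 0.05,   # P(7-1-4) at vertex 424
    (1, 3, 2): 0.08,   # P(1-3-2) at vertex 426
}

def total_log_probability(pst, labels, root_prob=1.0):
    """Sum of log probabilities of the longest suffix ending at each location
    (the log of the product form used in the text)."""
    total = 0.0
    for i in range(len(labels)):
        context = tuple(labels[i::-1])   # current label first, then its history
        p = root_prob                    # default: probability of the root vertex
        for length in range(len(context), 0, -1):   # try the longest suffix first
            if context[:length] in pst:
                p = pst[context[:length]]
                break
        total += log(p)
    return total

score = total_log_probability(pst11, [4, 1, 7, 2, 3, 1, 10])
# score is log(1 * P(1) * P(7-1-4) * 1 * 1 * P(1-3-2) * 1)
```

Running this for every PST in every category and taking the category containing the best-scoring PST reproduces the assignment rule described above.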
FIG. 5 is a block diagram of another embodiment of classifier 245, according to the invention. The FIG. 5 embodiment of classifier 245 includes three states and nine arcs, but the scope of the invention includes classifiers with any number of states and associated arcs. Since each state is associated with one of the predefined integer labels, the number of states is equal to the number of predefined integer labels. The FIG. 5 embodiment of classifier 245 comprises three predefined integer labels, where state 1 (505) is identified with integer label 1, state 2 (510) is identified with integer label 2, and state 3 (515) is identified with integer label 3. The arcs represent the probability of a transition from one state to another state or to the same state. For example, a12 is the probability of a transition from state 1 (505) to state 2 (510), a21 is the probability of a transition from state 2 (510) to state 1 (505), and a11 is the probability of a transition from state 1 (505) to state 1 (505). The transition probabilities aij(L) depend on the integer labels L of the quantized speech. - In the
FIG. 5 embodiment, classifier 245 computes all permutations of the integer labels received from acoustic processor 221 and computes a probability of occurrence for each permutation. Classifier 245 associates each permutation of the received integer labels with a unique sequence of states. The total number of sequences that classifier 245 can compute is the total number of predefined integer labels raised to an integer power, where the integer power is the total number of integer labels sent to classifier 245. If m is the total number of predefined integer labels, n is the integer power, and ns is the total number of sequences of states, then ns=m^n. Classifier 245 comprises three predefined integer labels (m=3). Thus, if register 340 sends classifier 245 three integer labels (n=3), then classifier 245 can compute 3^3=27 possible sequences of states. The sequences of states include, for example, 1→1→1, 1→1→2, 1→2→1, 1→1→3, 1→3→1, 1→2→2, 1→3→3, and 1→2→3. The total number of transition probabilities is the total number of predefined integer labels squared. If np is the total number of transition probabilities, then np=m^2. Thus there are 3^2=9 transition probabilities. For each integer label L that can be assigned by quantizer 335 (FIG. 3), there is possibly a different set of transition probabilities. The transition probabilities are a11(L), a22(L), a33(L), a12(L), a21(L), a13(L), a31(L), a23(L), and a32(L). - When a user or system administrator initializes voice
signal classification system 200, classifier 245 assigns an initial starting probability to each state. For example, classifier 245 assigns to state 1 (505) a probability a01, which represents the probability of starting in state 1, to state 2 (510) a probability a02, which represents the probability of starting in state 2, and to state 3 (515) a probability a03, which represents the probability of starting in state 3. - If
classifier 245 receives integer labels (1,2,3), then classifier 245 computes six sequences of states, 1→2→3, 1→3→2, 2→1→3, 2→3→1, 3→1→2, and 3→2→1, and an associated probability of occurrence for each sequence. The six sequences of states are a subset of the 27 possible sequences of states. For example, classifier 245 computes the total probability of the 1→2→3 sequence of states by multiplying the probability of starting in state 1, a01, by the probability a12(L1) of a transition from state 1 to state 2 when the first integer label of a sequence of integer labels appears, and by the probability a23(L2) of a transition from state 2 to state 3 when the second integer label of the sequence appears. The total probability is P(1→2→3)=a01×a12(L1)×a23(L2). Similarly, the total probability of the 2→3→1 sequence of states is P(2→3→1)=a02×a23(L1)×a31(L2). Classifier 245 calculates the total probabilities for the remaining four sequences of states in a similar manner. Classifier 245 then classifies the voice signal to one of a set of predefined categories associated with the sequence of states with the highest probability of occurrence. Some of the sequences of states may not have associated categories, and some of the sequences of states may have the same associated category. If there is no predefined category associated with the sequence of states with the highest probability of occurrence, then classifier 245 classifies the voice signal to a predefined category associated with the sequence of states with the next highest probability of occurrence. -
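The state-sequence products can be sketched directly. The starting probabilities are written a0j here to keep them typographically distinct from the transition probabilities aij(L), and all numeric values are illustrative assumptions.

```python
# Starting probabilities a0j and label-dependent transition probabilities
# aij(L) for the three-state classifier of FIG. 5. All numbers are
# illustrative assumptions, not values from the patent.
start = {1: 0.5, 2: 0.3, 3: 0.2}
trans = {
    "L1": {(1, 2): 0.6, (2, 3): 0.1, (3, 1): 0.3},   # aij(L1)
    "L2": {(1, 2): 0.2, (2, 3): 0.7, (3, 1): 0.4},   # aij(L2)
}

def sequence_probability(states, labels):
    """P(s1 -> s2 -> ...) = a0,s1 * a_{s1 s2}(L1) * a_{s2 s3}(L2) * ..."""
    p = start[states[0]]                 # probability of starting in the first state
    for k in range(len(states) - 1):
        p *= trans[labels[k]][(states[k], states[k + 1])]
    return p

p_123 = sequence_probability([1, 2, 3], ["L1", "L2"])   # a01 * a12(L1) * a23(L2)
p_231 = sequence_probability([2, 3, 1], ["L1", "L2"])   # a02 * a23(L1) * a31(L2)
```

With the full transition tables filled in, computing this product for each candidate sequence and taking the maximum implements the selection rule described above.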
Voice classification system 200 may be implemented in a voice message routing system, a quality-control call center, an interface to a Web-based voice portal, or in conjunction with a speech-to-text recognition engine, for example. A retail store may use voice signal classification system 200 to route telephone calls to an appropriate department (agent) based upon a category to which a voice signal is classified. For example, a person may call the retail store to inquire whether the store sells a particular brand of cat food. More specifically, a person may say the following: “I was wondering if you carry, . . . uh, . . . well, if you stock or have in store cat food X, well actually cat food for my kitten, and if so, could you tell me the price of a bag. Also, how large of bag can I buy? (Pause). Oh wait, I almost forgot, do you have monkey chow?” Although this is a complex, natural language speech pattern, voice signal classification system 200 classifies the received natural language voice signal into a category based upon the content of the voice signal. For example, system 200 may classify the voice signal to a pet department category, and therefore route the person's call to the pet department (agent). In addition, system 200 may classify the speech into other categories, such as billing, accounting, employment opportunities, deliveries, or others. For example, system 200 may classify the speech to a pricing category that routes the call to an associated agent who can immediately answer the caller's questions concerning inventory pricing. -
System 200 may classify voice signals to categories associated with predefined items on a menu. For example, a voice signal may be classified to a category associated with a software agent that activates a playback of a predefined pet department menu. The caller can respond to the pet department menu with additional voice messages or a touch-tone keypad response. Or the voice signal may be classified to another category whose associated software agent activates a playback of a predefined pricing menu. - In another embodiment,
system 200 may be implemented in a quality-control call center that classifies calls into complaint categories, order categories, or personal call categories, for example. An agent then selects calls from the various categories based upon the agent's priorities at the time. Thus, system 200 provides an effective and efficient manner of customer-service quality control. - In yet another embodiment of
speech classification system 200, system 200 may be configured as an interface to voice portals, classifying calls to various categories such as weather, stock, or traffic, and then routing and connecting the call to an appropriate voice portal. - In yet another embodiment of the present invention,
system 200 is used in conjunction with a speech-to-text recognition engine. For example, a voice signal is assigned to a particular category that is associated with a predefined speech model including a defined vocabulary set for use in the recognition engine. For instance, a caller inquiring about current weather conditions in Oklahoma City would access the recognition engine with a speech model/vocabulary set including voice-to-text translations for words such as “storm”, “rain”, “hail”, and “tornado.” The association of speech models/vocabulary sets with each voice signal category reduces the complexity of the speech-to-text recognition engine and consequently reduces speech-to-text processing times. - The combination of
system 200 with the speech-to-text recognition engine may classify voice signals into language categories, thus making the combination of system 200 and the speech-to-text recognition engine language independent. For example, if voice classification system 200 classifies a voice signal to a German language category, then the recognition engine uses a speech model/vocabulary set associated with the German language category to translate the voice signal. - In other embodiments,
system 200 may be implemented to classify voice signals into categories that are independent of the specific spoken words or text of the call. For example, system 200 may be configured to categorize a caller as male or female, as the content of a male voice signal typically is distinguishable from the content of a female voice signal. Similarly, system 200 may be configured to identify a caller as one member of a predetermined group of persons, as the content of the voice signal of each person in the group would be distinguishable from that of the other members of the group. System 200 therefore may be used, for example, in a caller-identification capacity, or in a password-protection or other security capacity. - In addition, just as
system 200 may be used to categorize voice signals as either male or female, system 200 may be used to distinguish between any voice signal sources whose voice signals are known to have different content. Such voice signals are not required to be expressed in a known language. For example, system 200 may be used to distinguish between various types of animals, such as cats and dogs or sheep and cows. Further, system 200 may be used to distinguish among different animals of the same type, such as dogs, where a predetermined group of such animals exists and the voice signal content of each animal in the group is known. In this case, system 200 may be used to identify any one of the animals in the group in much the same way that system 200 may be used to identify a caller as described above. -
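The category-to-agent routing and per-category vocabulary selection described in the embodiments above can be sketched as follows. This is a minimal illustration only: the names (`CATEGORY_AGENTS`, `CATEGORY_VOCAB`, `route_call`) and all table contents are assumptions for this sketch, not identifiers from the patent.

```python
# Illustrative sketch of category-based call routing: a classified
# category keys both the agent lookup and the reduced vocabulary set
# a speech-to-text engine could load for that category.
# All names and contents here are hypothetical.

CATEGORY_AGENTS = {
    "pet_department": "pet-department agent",
    "pricing": "pricing agent",
    "complaint": "complaint-queue agent",
}

# Per-category vocabulary sets, analogous to the weather example:
# restricting the vocabulary shrinks the recognition engine's search
# space and so reduces speech-to-text processing time.
CATEGORY_VOCAB = {
    "pet_department": {"cat", "kitten", "food", "bag"},
    "pricing": {"price", "cost", "discount"},
}

def route_call(category: str) -> str:
    """Look up the agent associated with a classified category."""
    return CATEGORY_AGENTS.get(category, "general operator")

print(route_call("pricing"))        # pricing agent
print(route_call("unknown topic"))  # general operator
```

A real deployment would key these tables by whatever category identifiers the classifier emits; the fallback to a general operator is likewise an assumption of this sketch.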
Voice classification system 200 may be implemented in a hierarchical classification system. FIG. 6 is a block diagram of one embodiment of a hierarchical structure of classes 600, according to the invention. The hierarchical structure includes a first level class 605, a second level class 610, and a third level class 615. In the FIG. 6 exemplary embodiment of the hierarchical structure of classes 600, the first level class 605 includes language categories, such as an English language category 620, a German language category 625, and a Spanish language category 630. The second level class 610 includes a pricing category 635, a complaint category 640, and an order category 645. The third level class 615 includes a hardware category 650, a sporting goods category 655, and a kitchen supplies category 660. - For example,
voice classification system 200 receives a call and classifies the caller's voice signal 601 into English category 620, then classifies voice signal 601 into order subcategory 645, and then classifies voice signal 601 into sporting goods sub-subcategory 655. Finally, system 200 routes the call to an agent 665 associated with ordering sporting goods supplies in English. The configuration of system 200 with the hierarchical structure of classes 600 permits more flexibility and refinement in classifying voice signals to categories. The scope of the present invention includes any number of class levels and any number of categories in each class level. -
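The level-by-level refinement of FIG. 6 can be sketched as below. The keyword tables and the `classify_level` stand-in are assumptions for illustration; the patent's classifier works on statistical models of integer labels, not keyword spotting.

```python
# Illustrative sketch of hierarchical classification (FIG. 6): each
# class level narrows the category before the call is routed.
# classify_level is a hypothetical stand-in for the per-level
# classifier; a real system would score the signal's integer labels
# against each category's statistical model.

LEVELS = ["language", "call_type", "department"]

KEYWORDS = {
    "language":   {"hello": "English", "hallo": "German", "hola": "Spanish"},
    "call_type":  {"buy": "order", "broken": "complaint", "cost": "pricing"},
    "department": {"tennis": "sporting goods", "drill": "hardware"},
}

def classify_level(voice_signal: str, level: str) -> str:
    for word, category in KEYWORDS[level].items():
        if word in voice_signal:
            return category
    return "unclassified"

def route(voice_signal: str) -> list:
    """Walk the class levels, refining the category at each step."""
    return [classify_level(voice_signal, level) for level in LEVELS]

print(route("hello, I want to buy a tennis racket"))
# ['English', 'order', 'sporting goods']
```

The resulting path (language, call type, department) would then select the agent, mirroring the English/order/sporting-goods example above.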
FIG. 7 is a flowchart of method steps for classifying speech, according to one embodiment of the invention. Although the steps of the FIG. 7 method are described in the context of system 200 of FIG. 2, any other system configured to implement the method steps is within the scope of the invention. In a step 705, sound sensor 205 detects sound energy and converts the sound energy into an analog voice signal. In a step 710, amplifier 210 amplifies the analog voice signal. In a step 715, A/D converter 215 converts the amplified analog voice signal into a digital voice signal. In a step 720, framer 220 segments the digital voice signal into successive data units called frames. In a step 725, acoustic processor 221 processes the frames and generates a feature vector and an associated integer label for each frame. Typically, acoustic processor 221 extracts features (such as statistical features) from each frame, processes the extracted features to generate feature vectors, and assigns an integer label to each feature vector. Acoustic processor 221 may include one or more of the following: an FFT 325, a feature extractor 330, a vector quantizer 335, and a register 340. In a step 730, classifier 245 performs a statistical analysis on the integer labels, and in a step 735, classifier 245 classifies the voice signal to a predefined category based upon the results of the statistical analysis. In a step 740, classifier 245 accesses memory 250 to determine which agent 255 is associated with the predefined category assigned to the voice signal. The agent may be either a human agent or a software agent. In a step 745, a caller associated with the voice signal is routed to the agent corresponding to the predefined category. - The invention has been explained above with reference to specific embodiments. Other embodiments will be apparent to those skilled in the art in light of this disclosure.
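Steps 720 through 735 (quantize frames to integer labels, then score the label sequence against per-category probability suffix trees, as recited in claim 1) can be sketched as follows. The codebook values, category names, and PST contents are all assumptions for this toy example; here each PST is flattened to a dict from suffix contexts to next-label distributions, with the characteristic back-off to shorter suffixes.

```python
import math

# Toy codebook: one scalar "feature" per entry.  Vector quantization
# (step 725) assigns each frame the integer label of the nearest entry.
CODEBOOK = [0.1, 0.5, 0.9]

def quantize(frame_features):
    """Map each frame's feature to the nearest codebook index."""
    return [min(range(len(CODEBOOK)), key=lambda i: abs(CODEBOOK[i] - f))
            for f in frame_features]

# Per-category probability suffix trees, flattened to dicts keyed by
# suffix context.  Contents are illustrative assumptions only.
PSTS = {
    "pet_department": {(): {0: 0.2, 1: 0.6, 2: 0.2},
                       (1,): {0: 0.1, 1: 0.7, 2: 0.2}},
    "pricing":        {(): {0: 0.5, 1: 0.2, 2: 0.3},
                       (0,): {0: 0.6, 1: 0.1, 2: 0.3}},
}

def log_likelihood(labels, pst, max_depth=1):
    """Score a label sequence: for each label, use the longest stored
    suffix context, backing off to shorter suffixes (the PST property)."""
    total = 0.0
    for t, label in enumerate(labels):
        context = tuple(labels[max(0, t - max_depth):t])
        while context not in pst:       # back off toward the empty context
            context = context[1:]
        total += math.log(pst[context][label])
    return total

def classify(frame_features):
    """Steps 730-735: pick the category whose PST assigns the
    integer-label sequence the highest probability."""
    labels = quantize(frame_features)
    return max(PSTS, key=lambda c: log_likelihood(labels, PSTS[c]))

print(classify([0.45, 0.55, 0.95]))  # pet_department (labels [1, 1, 2])
```

This mirrors claims 5 and 6: a probability is computed per category from the integer labels, and the voice signal is classified to the highest-probability category; the back-off depth and codebook size would be design parameters in a real system.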
The present invention may readily be implemented using configurations other than those described in the embodiments above. Therefore, these and other variations upon the specific embodiments are intended to be covered by the present invention, which is limited only by the appended claims.
Claims (9)
1. A system for classifying a voice signal, comprising:
an acoustic processor configured to receive the voice signal, to generate feature vectors that characterize the voice signal, and to assign an integer label to each generated feature vector; and
a classifier coupled to the acoustic processor to classify the voice signal to one of a set of predefined categories based upon a statistical analysis of the integer labels associated with the feature vectors, wherein the classifier uses one or more probability suffix trees (PSTs) to compute a probability of occurrence of the integer labels being classified in the set of predefined categories.
2. The system of claim 1, wherein the system further comprises a framer configured to segment the voice signal into frames.
3. The system of claim 1, wherein the acoustic processor comprises a feature extractor configured to extract statistical features characteristic of the voice signal.
4. The system of claim 1, further comprising a memory for storing identities of agents, each agent being associated with one of the set of predefined categories.
5. The system of claim 1, wherein the classifier computes a probability that the voice signal belongs to each of the set of predefined categories using the integer labels assigned to the feature vectors.
6. The system of claim 5, wherein the classifier classifies the voice signal to the predefined category in the set of predefined categories that is associated with the highest probability.
7. The system of claim 1, wherein the classifier routes a caller associated with the voice signal to an agent associated with the predefined category.
8-14. (canceled)
15. A system for classifying a voice signal, comprising:
means for generating a digital discrete-time representation of the voice signal;
means for segmenting the digital discrete-time representation of the voice signal into frames;
means for extracting statistical features from each frame that characterize the voice signal;
means for generating a feature vector from each frame using the extracted statistical features;
means for associating an integer label to each feature vector; and
means for classifying the voice signal to one of a set of predefined categories based upon a statistical analysis of the integer labels, wherein the means for classifying uses one or more probability suffix trees (PSTs) to compute a probability of occurrence of the integer labels being classified in the set of predefined categories.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/042,111 US20080154595A1 (en) | 2003-04-22 | 2008-03-04 | System for classification of voice signals |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/421,356 US7389230B1 (en) | 2003-04-22 | 2003-04-22 | System and method for classification of voice signals |
US12/042,111 US20080154595A1 (en) | 2003-04-22 | 2008-03-04 | System for classification of voice signals |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/421,356 Continuation US7389230B1 (en) | 2003-04-22 | 2003-04-22 | System and method for classification of voice signals |
Publications (1)
Publication Number | Publication Date |
---|---|
US20080154595A1 (en) | 2008-06-26 |
Family
ID=39510490
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/421,356 Active 2025-08-29 US7389230B1 (en) | 2003-04-22 | 2003-04-22 | System and method for classification of voice signals |
US12/042,111 Abandoned US20080154595A1 (en) | 2003-04-22 | 2008-03-04 | System for classification of voice signals |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/421,356 Active 2025-08-29 US7389230B1 (en) | 2003-04-22 | 2003-04-22 | System and method for classification of voice signals |
Country Status (1)
Country | Link |
---|---|
US (2) | US7389230B1 (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080015846A1 (en) * | 2006-07-12 | 2008-01-17 | Microsoft Corporation | Detecting an answering machine using speech recognition |
US20090157400A1 (en) * | 2007-12-14 | 2009-06-18 | Industrial Technology Research Institute | Speech recognition system and method with cepstral noise subtraction |
US20100094633A1 (en) * | 2007-03-16 | 2010-04-15 | Takashi Kawamura | Voice analysis device, voice analysis method, voice analysis program, and system integration circuit |
US20110046951A1 (en) * | 2009-08-21 | 2011-02-24 | David Suendermann | System and method for building optimal state-dependent statistical utterance classifiers in spoken dialog systems |
US20110172954A1 (en) * | 2009-04-20 | 2011-07-14 | University Of Southern California | Fence intrusion detection |
US20110231261A1 (en) * | 2010-03-17 | 2011-09-22 | Microsoft Corporation | Voice customization for voice-enabled text advertisements |
US20130090926A1 (en) * | 2011-09-16 | 2013-04-11 | Qualcomm Incorporated | Mobile device context information using speech detection |
US8484025B1 (en) * | 2012-10-04 | 2013-07-09 | Google Inc. | Mapping an audio utterance to an action using a classifier |
US20160027444A1 (en) * | 2014-07-22 | 2016-01-28 | Nuance Communications, Inc. | Method and apparatus for detecting splicing attacks on a speaker verification system |
CN108924483A (en) * | 2018-06-27 | 2018-11-30 | 南京朴厚生态科技有限公司 | A kind of automatic monitoring system and method for the field animal based on depth learning technology |
US20190066675A1 (en) * | 2017-08-23 | 2019-02-28 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Artificial intelligence based method and apparatus for classifying voice-recognized text |
US10334103B2 (en) | 2017-01-25 | 2019-06-25 | International Business Machines Corporation | Message translation for cognitive assistance |
CN109935230A (en) * | 2019-04-01 | 2019-06-25 | 北京宇航系统工程研究所 | A system and method for detecting and issuing passwords based on voice drive |
US11087747B2 (en) | 2019-05-29 | 2021-08-10 | Honeywell International Inc. | Aircraft systems and methods for retrospective audio analysis |
Families Citing this family (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080300856A1 (en) * | 2001-09-21 | 2008-12-04 | Talkflow Systems, Llc | System and method for structuring information |
DE102004008225B4 (en) * | 2004-02-19 | 2006-02-16 | Infineon Technologies Ag | Method and device for determining feature vectors from a signal for pattern recognition, method and device for pattern recognition and computer-readable storage media |
WO2005122141A1 (en) * | 2004-06-09 | 2005-12-22 | Canon Kabushiki Kaisha | Effective audio segmentation and classification |
US9123350B2 (en) * | 2005-12-14 | 2015-09-01 | Panasonic Intellectual Property Management Co., Ltd. | Method and system for extracting audio features from an encoded bitstream for audio classification |
US7778831B2 (en) * | 2006-02-21 | 2010-08-17 | Sony Computer Entertainment Inc. | Voice recognition with dynamic filter bank adjustment based on speaker categorization determined from runtime pitch |
US9620117B1 (en) * | 2006-06-27 | 2017-04-11 | At&T Intellectual Property Ii, L.P. | Learning from interactions for a spoken dialog system |
US9015194B2 (en) * | 2007-07-02 | 2015-04-21 | Verint Systems Inc. | Root cause analysis using interactive data categorization |
JP5088050B2 (en) * | 2007-08-29 | 2012-12-05 | ヤマハ株式会社 | Voice processing apparatus and program |
WO2010019831A1 (en) * | 2008-08-14 | 2010-02-18 | 21Ct, Inc. | Hidden markov model for speech processing with training method |
CN101546555B (en) * | 2009-04-14 | 2011-05-11 | 清华大学 | Constraint heteroscedasticity linear discriminant analysis method for language identification |
US9053182B2 (en) | 2011-01-27 | 2015-06-09 | International Business Machines Corporation | System and method for making user generated audio content on the spoken web navigable by community tagging |
US8849663B2 (en) | 2011-03-21 | 2014-09-30 | The Intellisis Corporation | Systems and methods for segmenting and/or classifying an audio signal from transformed audio information |
US9142220B2 (en) | 2011-03-25 | 2015-09-22 | The Intellisis Corporation | Systems and methods for reconstructing an audio signal from transformed audio information |
US8548803B2 (en) | 2011-08-08 | 2013-10-01 | The Intellisis Corporation | System and method of processing a sound signal including transforming the sound signal into a frequency-chirp domain |
US8620646B2 (en) | 2011-08-08 | 2013-12-31 | The Intellisis Corporation | System and method for tracking sound pitch across an audio signal using harmonic envelope |
US9183850B2 (en) | 2011-08-08 | 2015-11-10 | The Intellisis Corporation | System and method for tracking sound pitch across an audio signal |
US9520141B2 (en) * | 2013-02-28 | 2016-12-13 | Google Inc. | Keyboard typing detection and suppression |
US9058820B1 (en) | 2013-05-21 | 2015-06-16 | The Intellisis Corporation | Identifying speech portions of a sound model using various statistics thereof |
US9484044B1 (en) | 2013-07-17 | 2016-11-01 | Knuedge Incorporated | Voice enhancement and/or speech features extraction on noisy audio signals using successively refined transforms |
US9530434B1 (en) | 2013-07-18 | 2016-12-27 | Knuedge Incorporated | Reducing octave errors during pitch determination for noisy audio signals |
US9208794B1 (en) | 2013-08-07 | 2015-12-08 | The Intellisis Corporation | Providing sound models of an input signal using continuous and/or linear fitting |
US20150199727A1 (en) * | 2014-01-10 | 2015-07-16 | Facebook, Inc. | Sponsoring Brands Detected in User-Generated Social Networking Content |
US9870785B2 (en) | 2015-02-06 | 2018-01-16 | Knuedge Incorporated | Determining features of harmonic signals |
US9842611B2 (en) | 2015-02-06 | 2017-12-12 | Knuedge Incorporated | Estimating pitch using peak-to-peak distances |
US9922668B2 (en) | 2015-02-06 | 2018-03-20 | Knuedge Incorporated | Estimating fractional chirp rate with multiple frequency representations |
CN109284374B (en) * | 2018-09-07 | 2024-07-05 | 百度在线网络技术(北京)有限公司 | Method, apparatus, device and computer readable storage medium for determining entity class |
CN117198300B (en) * | 2023-09-15 | 2024-12-10 | 航天新气象科技有限公司 | A bird sound recognition method and device based on attention mechanism |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5502790A (en) * | 1991-12-24 | 1996-03-26 | Oki Electric Industry Co., Ltd. | Speech recognition method and system using triphones, diphones, and phonemes |
US5615299A (en) * | 1994-06-20 | 1997-03-25 | International Business Machines Corporation | Speech recognition using dynamic features |
US5983180A (en) * | 1997-10-23 | 1999-11-09 | Softsound Limited | Recognition of sequential data using finite state sequence models organized in a tree structure |
Family Cites Families (208)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3648253A (en) | 1969-12-10 | 1972-03-07 | Ibm | Program scheduler for processing systems |
US4286322A (en) | 1979-07-03 | 1981-08-25 | International Business Machines Corporation | Task handling apparatus |
US4814974A (en) | 1982-07-02 | 1989-03-21 | American Telephone And Telegraph Company, At&T Bell Laboratories | Programmable memory-based arbitration system for implementing fixed and flexible priority arrangements |
US4658370A (en) | 1984-06-07 | 1987-04-14 | Teknowledge, Inc. | Knowledge engineering tool |
US4908865A (en) | 1984-12-27 | 1990-03-13 | Texas Instruments Incorporated | Speaker independent speech recognition method and system |
US4642756A (en) | 1985-03-15 | 1987-02-10 | S & H Computer Systems, Inc. | Method and apparatus for scheduling the execution of multiple processing tasks in a computer system |
US4852181A (en) | 1985-09-26 | 1989-07-25 | Oki Electric Industry Co., Ltd. | Speech recognition for recognizing the catagory of an input speech pattern |
JPH063934B2 (en) | 1986-11-25 | 1994-01-12 | 株式会社日立製作所 | Automatic reminder system |
US4817027A (en) | 1987-03-23 | 1989-03-28 | Cook Imaging | Method and apparatus for evaluating partial derivatives |
US4816989A (en) | 1987-04-15 | 1989-03-28 | Allied-Signal Inc. | Synchronizer for a fault tolerant multiple node processing system |
US4823306A (en) | 1987-08-14 | 1989-04-18 | International Business Machines Corporation | Text search system |
US4942527A (en) | 1987-12-11 | 1990-07-17 | Schumacher Billy G | Computerized management system |
US5051924A (en) | 1988-03-31 | 1991-09-24 | Bergeron Larry E | Method and apparatus for the generation of reports |
US5101354A (en) | 1988-04-18 | 1992-03-31 | Brunswick Bowling & Billards Corporation | Multi-lane bowling system with remote operator control |
US5228116A (en) | 1988-07-15 | 1993-07-13 | Aicorp., Inc. | Knowledge base management system |
EP0361570B1 (en) | 1988-09-15 | 1997-08-06 | Océ-Nederland B.V. | A system for grammatically processing a sentence composed in natural language |
US5067099A (en) | 1988-11-03 | 1991-11-19 | Allied-Signal Inc. | Methods and apparatus for monitoring system performance |
JPH02159674A (en) | 1988-12-13 | 1990-06-19 | Matsushita Electric Ind Co Ltd | Method for analyzing meaning and method for analyzing syntax |
JP2635750B2 (en) | 1989-01-25 | 1997-07-30 | 株式会社東芝 | Priority determination device |
JPH02240769A (en) | 1989-03-14 | 1990-09-25 | Canon Inc | Device for preparing natural language sentence |
GB8918553D0 (en) | 1989-08-15 | 1989-09-27 | Digital Equipment Int | Message control system |
US5794194A (en) | 1989-11-28 | 1998-08-11 | Kabushiki Kaisha Toshiba | Word spotting in a variable noise level environment |
US5018215A (en) | 1990-03-23 | 1991-05-21 | Honeywell Inc. | Knowledge and model based adaptive signal processor |
US5125024A (en) | 1990-03-28 | 1992-06-23 | At&T Bell Laboratories | Voice response unit |
JP3009215B2 (en) | 1990-11-30 | 2000-02-14 | 株式会社日立製作所 | Natural language processing method and natural language processing system |
GB9106082D0 (en) | 1991-03-22 | 1991-05-08 | Secr Defence | Dynamical system analyser |
JPH04315254A (en) | 1991-04-15 | 1992-11-06 | Mitsubishi Electric Corp | Signal identification device |
US5210872A (en) | 1991-06-28 | 1993-05-11 | Texas Instruments Inc. | Critical task scheduling for real-time systems |
US5345501A (en) | 1991-07-15 | 1994-09-06 | Bell Atlantic Network Services, Inc. | Telephone central office based method of and system for processing customer orders |
US5404550A (en) | 1991-07-25 | 1995-04-04 | Tandem Computers Incorporated | Method and apparatus for executing tasks by following a linked list of memory packets |
US5251131A (en) | 1991-07-31 | 1993-10-05 | Thinking Machines Corporation | Classification of data records by comparison of records to a training database using probability weights |
US5630128A (en) | 1991-08-09 | 1997-05-13 | International Business Machines Corporation | Controlled scheduling of program threads in a multitasking operating system |
DE4131387A1 (en) | 1991-09-20 | 1993-03-25 | Siemens Ag | METHOD FOR RECOGNIZING PATTERNS IN TIME VARIANTS OF MEASURING SIGNALS |
US5265033A (en) | 1991-09-23 | 1993-11-23 | Atm Communications International, Inc. | ATM/POS based electronic mail system |
US5369570A (en) | 1991-11-14 | 1994-11-29 | Parad; Harvey A. | Method and system for continuous integrated resource management |
US6507872B1 (en) | 1992-09-25 | 2003-01-14 | David Michael Geshwind | Class of methods for improving perceived efficiency of end-user interactive access of a large database such as the world-wide web via a communication network such as “The Internet” |
US5278942A (en) | 1991-12-05 | 1994-01-11 | International Business Machines Corporation | Speech coding apparatus having speaker dependent prototypes generated from nonuser reference data |
US5371807A (en) | 1992-03-20 | 1994-12-06 | Digital Equipment Corporation | Method and apparatus for text classification |
US5325526A (en) | 1992-05-12 | 1994-06-28 | Intel Corporation | Task scheduling in a multicomputer system |
US5247677A (en) | 1992-05-22 | 1993-09-21 | Apple Computer, Inc. | Stochastic priority-based task scheduler |
US5311583A (en) | 1992-08-05 | 1994-05-10 | At&T Bell Laboratories | International priority calling system with callback features |
ES2198407T3 (en) | 1992-09-30 | 2004-02-01 | Motorola, Inc. | SYSTEM OF DISTRIBUTION OF EMAIL MESSAGES. |
GB9222884D0 (en) | 1992-10-30 | 1992-12-16 | Massachusetts Inst Technology | System for administration of privatization in newly democratic nations |
IL107482A (en) | 1992-11-04 | 1998-10-30 | Conquest Software Inc | Method for resolution of natural-language queries against full-text databases |
JP3553987B2 (en) | 1992-11-13 | 2004-08-11 | 株式会社日立製作所 | Client server system |
US5559710A (en) | 1993-02-05 | 1996-09-24 | Siemens Corporate Research, Inc. | Apparatus for control and evaluation of pending jobs in a factory |
JP2561801B2 (en) | 1993-02-24 | 1996-12-11 | インターナショナル・ビジネス・マシーンズ・コーポレイション | Method and system for managing process scheduling |
CA2115210C (en) | 1993-04-21 | 1997-09-23 | Joseph C. Andreshak | Interactive computer system recognizing spoken commands |
US5832220A (en) | 1993-04-30 | 1998-11-03 | International Business Machines Corp. | Automatic settting of an acknowledgement option based upon distribution content in a data processing system |
DE4331710A1 (en) | 1993-09-17 | 1995-03-23 | Sel Alcatel Ag | Method and device for creating and editing text documents |
US5704012A (en) | 1993-10-08 | 1997-12-30 | International Business Machines Corporation | Adaptive resource allocation using neural networks |
US5437032A (en) | 1993-11-04 | 1995-07-25 | International Business Machines Corporation | Task scheduler for a miltiprocessor system |
DE59310052D1 (en) | 1993-11-26 | 2000-07-06 | Siemens Ag | Computing unit with several executable tasks |
US5493692A (en) | 1993-12-03 | 1996-02-20 | Xerox Corporation | Selective delivery of electronic messages in a multiple computer system based on context and environment of a user |
US5444820A (en) | 1993-12-09 | 1995-08-22 | Long Island Lighting Company | Adaptive system and method for predicting response times in a service environment |
JP3476237B2 (en) | 1993-12-28 | 2003-12-10 | 富士通株式会社 | Parser |
US5806040A (en) | 1994-01-04 | 1998-09-08 | Itt Corporation | Speed controlled telephone credit card verification system |
US5522026A (en) | 1994-03-18 | 1996-05-28 | The Boeing Company | System for creating a single electronic checklist in response to multiple faults |
EP0750553B1 (en) | 1994-03-18 | 2003-10-01 | VCS INDUSTRIES, Inc. d.b.a. VOICE CONTROL SYSTEMS | Speech controlled vehicle alarm system |
US5644686A (en) | 1994-04-29 | 1997-07-01 | International Business Machines Corporation | Expert system and method employing hierarchical knowledge base, and interactive multimedia/hypermedia applications |
US5542088A (en) | 1994-04-29 | 1996-07-30 | Intergraph Corporation | Method and apparatus for enabling control of task execution |
US5493677A (en) | 1994-06-08 | 1996-02-20 | Systems Research & Applications Corporation | Generation, archiving, and retrieval of digital images with evoked suggestion-set captions and natural language interface |
DE19500957A1 (en) | 1994-07-19 | 1996-01-25 | Bosch Gmbh Robert | Procedures for the control of technical processes or processes |
FR2724243B1 (en) | 1994-09-06 | 1997-08-14 | Sgs Thomson Microelectronics | MULTI-TASK PROCESSING SYSTEM |
DE4434255A1 (en) | 1994-09-24 | 1996-03-28 | Sel Alcatel Ag | Device for voice recording with subsequent text creation |
US5596502A (en) | 1994-11-14 | 1997-01-21 | Sunoptech, Ltd. | Computer system including means for decision support scheduling |
JP2855409B2 (en) | 1994-11-17 | 1999-02-10 | 日本アイ・ビー・エム株式会社 | Natural language processing method and system |
GB2296349A (en) | 1994-12-19 | 1996-06-26 | Secr Defence | Maintaining order of input data events |
US5694616A (en) | 1994-12-30 | 1997-12-02 | International Business Machines Corporation | Method and system for prioritization of email items by selectively associating priority attribute with at least one and fewer than all of the recipients |
DE19604803A1 (en) | 1995-02-10 | 1996-10-10 | Meidensha Electric Mfg Co Ltd | System condition monitor using chaos theory |
JP2923552B2 (en) | 1995-02-13 | 1999-07-26 | 富士通株式会社 | Method of constructing organization activity database, input method of analysis sheet used for it, and organization activity management system |
US5845246A (en) | 1995-02-28 | 1998-12-01 | Voice Control Systems, Inc. | Method for reducing database requirements for speech recognition systems |
US5636124A (en) | 1995-03-08 | 1997-06-03 | Allen-Bradley Company, Inc. | Multitasking industrial controller |
US5701400A (en) | 1995-03-08 | 1997-12-23 | Amado; Carlos Armando | Method and apparatus for applying if-then-else rules to data sets in a relational data base and generating from the results of application of said rules a database of diagnostics linked to said data sets to aid executive analysis of financial data |
EP0732651B1 (en) | 1995-03-15 | 2001-09-26 | Koninklijke Philips Electronics N.V. | Data processing system for executing tasks having diverse priorities and modem incorporating same |
US5566171A (en) | 1995-03-15 | 1996-10-15 | Finisar Corporation | Multi-mode high speed network switch for node-to-node communication |
US5724481A (en) | 1995-03-30 | 1998-03-03 | Lucent Technologies Inc. | Method for automatic speech recognition of arbitrary spoken words |
US5754671A (en) | 1995-04-12 | 1998-05-19 | Lockheed Martin Corporation | Method for improving cursive address recognition in mail pieces using adaptive data base management |
US5749066A (en) | 1995-04-24 | 1998-05-05 | Ericsson Messaging Systems Inc. | Method and apparatus for developing a neural network for phoneme recognition |
CA2180392C (en) | 1995-07-31 | 2001-02-13 | Paul Wesley Cohrs | User selectable multiple threshold criteria for voice recognition |
AU6849196A (en) | 1995-08-16 | 1997-03-19 | Syracuse University | Multilingual document retrieval system and method using semantic vector matching |
JP3298379B2 (en) | 1995-09-20 | 2002-07-02 | 株式会社日立製作所 | Electronic approval method and system |
US5940612A (en) | 1995-09-27 | 1999-08-17 | International Business Machines Corporation | System and method for queuing of tasks in a multiprocessing system |
US6879586B2 (en) | 1996-07-09 | 2005-04-12 | Genesys Telecommunications Laboratories, Inc. | Internet protocol call-in centers and establishing remote agents |
US5765033A (en) | 1997-02-06 | 1998-06-09 | Genesys Telecommunications Laboratories, Inc. | System for routing electronic mails |
US5948058A (en) | 1995-10-30 | 1999-09-07 | Nec Corporation | Method and apparatus for cataloging and displaying e-mail using a classification rule preparing means and providing cataloging a piece of e-mail into multiple categories or classification types based on e-mail object information |
IL116708A (en) | 1996-01-08 | 2000-12-06 | Smart Link Ltd | Real-time task manager for a personal computer |
US5895447A (en) | 1996-02-02 | 1999-04-20 | International Business Machines Corporation | Speech recognition using thresholded speaker class model selection or model adaptation |
US6073101A (en) * | 1996-02-02 | 2000-06-06 | International Business Machines Corporation | Text independent speaker recognition for transparent command ambiguity resolution and continuous access control |
JPH11506239A (en) | 1996-03-05 | 1999-06-02 | フィリップス エレクトロニクス ネムローゼ フェンノートシャップ | Transaction system |
US6301602B1 (en) | 1996-03-08 | 2001-10-09 | Kabushiki Kaisha Toshiba | Priority information display system |
DE19610848A1 (en) | 1996-03-19 | 1997-09-25 | Siemens Ag | Computer unit for speech recognition and method for computer-aided mapping of a digitized speech signal onto phonemes |
DE69738832D1 (en) | 1996-03-28 | 2008-08-28 | Hitachi Ltd | Method for planning periodical processes |
US5715371A (en) | 1996-05-31 | 1998-02-03 | Lucent Technologies Inc. | Personal computer-based intelligent networks |
US5878386A (en) | 1996-06-28 | 1999-03-02 | Microsoft Corporation | Natural language parser with dictionary-based part-of-speech probabilities |
US6035104A (en) | 1996-06-28 | 2000-03-07 | Data Link Systems Corp. | Method and apparatus for managing electronic documents by alerting a subscriber at a destination other than the primary destination |
US5721770A (en) | 1996-07-02 | 1998-02-24 | Lucent Technologies Inc. | Agent vectoring programmably conditionally assigning agents to various tasks including tasks other than handling of waiting calls |
JPH10105368A (en) | 1996-07-12 | 1998-04-24 | Senshu Ginkou:Kk | Device and method for vocal process request acceptance |
US6021403A (en) | 1996-07-19 | 2000-02-01 | Microsoft Corporation | Intelligent user assistance facility |
US6223201B1 (en) | 1996-08-27 | 2001-04-24 | International Business Machines Corporation | Data processing system and method of task management within a self-managing application |
US7764231B1 (en) | 1996-09-09 | 2010-07-27 | Tracbeam Llc | Wireless location using multiple mobile station location techniques |
US5878385A (en) | 1996-09-16 | 1999-03-02 | Ergo Linguistic Technologies | Method and apparatus for universal parsing of language |
US6253188B1 (en) | 1996-09-20 | 2001-06-26 | Thomson Newspapers, Inc. | Automated interactive classified ad system for the internet |
US6424995B1 (en) | 1996-10-16 | 2002-07-23 | Microsoft Corporation | Method for displaying information contained in an electronic message |
US5867495A (en) | 1996-11-18 | 1999-02-02 | Mci Communications Corporations | System, method and article of manufacture for communications utilizing calling, plans in a hybrid network |
US5836771A (en) | 1996-12-02 | 1998-11-17 | Ho; Chi Fai | Learning method and system based on questioning |
US5864848A (en) | 1997-01-31 | 1999-01-26 | Microsoft Corporation | Goal-driven information interpretation and extraction system |
US5946388A (en) | 1997-02-06 | 1999-08-31 | Walker Asset Management Limited Partnership | Method and apparatus for priority queuing of telephone calls |
US6161094A (en) | 1997-02-14 | 2000-12-12 | Ann Adcock Corporation | Method of comparing utterances for security control |
US6434435B1 (en) | 1997-02-21 | 2002-08-13 | Baker Hughes Incorporated | Application of adaptive object-oriented optimization software to an automatic optimization oilfield hydrocarbon production management system |
US6112126A (en) | 1997-02-21 | 2000-08-29 | Baker Hughes Incorporated | Adaptive object-oriented optimization software system |
US6185603B1 (en) | 1997-03-13 | 2001-02-06 | At&T Corp. | Method and system for delivery of e-mail and alerting messages |
JPH10254719A (en) | 1997-03-14 | 1998-09-25 | Canon Inc | Information processor and information processing method |
US6314446B1 (en) | 1997-03-31 | 2001-11-06 | Stiles Inventions | Method and system for monitoring tasks in a computer system |
JP3033514B2 (en) | 1997-03-31 | 2000-04-17 | 日本電気株式会社 | Large vocabulary speech recognition method and apparatus |
US6182059B1 (en) | 1997-04-03 | 2001-01-30 | Brightware, Inc. | Automatic electronic message interpretation and routing system |
FR2762917B1 (en) | 1997-05-02 | 1999-06-11 | Alsthom Cge Alcatel | METHOD FOR DYNAMICALLY ASSIGNING TASKS TO EVENTS ARRIVING ON A SET OF HOLDING LINES |
GB9710522D0 (en) | 1997-05-23 | 1997-07-16 | Rolls Royce Plc | Control system |
US5811706A (en) | 1997-05-27 | 1998-09-22 | Rockwell Semiconductor Systems, Inc. | Synthesizer system utilizing mass storage devices for real time, low latency access of musical instrument digital samples |
JPH1115756A (en) | 1997-06-24 | 1999-01-22 | Omron Corp | Electronic mail discrimination method, device, therefor and storage medium |
US6658447B2 (en) | 1997-07-08 | 2003-12-02 | Intel Corporation | Priority based simultaneous multi-threading |
EP1005521B1 (en) | 1997-07-14 | 2004-09-22 | The Procter & Gamble Company | Process for making a low density detergent composition by controlling agglomeration via particle size |
US6061667A (en) | 1997-08-04 | 2000-05-09 | Schneider National, Inc. | Modular rating engine, rating system and method for processing rating requests in a computerized rating system |
US5963447A (en) | 1997-08-22 | 1999-10-05 | Hynomics Corporation | Multiple-agent hybrid control architecture for intelligent real-time control of distributed nonlinear processes |
US6243735B1 (en) | 1997-09-01 | 2001-06-05 | Matsushita Electric Industrial Co., Ltd. | Microcontroller, data processing system and task switching control method |
US6374219B1 (en) * | 1997-09-19 | 2002-04-16 | Microsoft Corporation | System for using silence in speech recognition |
US6182120B1 (en) | 1997-09-30 | 2001-01-30 | International Business Machines Corporation | Method and system for scheduling queued messages based on queue delay and queue priority |
US6212544B1 (en) | 1997-10-23 | 2001-04-03 | International Business Machines Corporation | Altering thread priorities in a multithreaded processor |
US6058389A (en) | 1997-10-31 | 2000-05-02 | Oracle Corporation | Apparatus and method for message queuing in a database system |
US6493447B1 (en) | 1997-11-21 | 2002-12-10 | MCI Communications Corporation | Contact server for call center for synchronizing simultaneous telephone calls and TCP/IP communications |
US5999932A (en) | 1998-01-13 | 1999-12-07 | Bright Light Technologies, Inc. | System and method for filtering unsolicited electronic mail messages using data matching and heuristic processing |
US6067565A (en) | 1998-01-15 | 2000-05-23 | Microsoft Corporation | Technique for prefetching a web page of potential future interest in lieu of continuing a current information download |
US5974465A (en) | 1998-01-21 | 1999-10-26 | 3Com Corporation | Method and apparatus for prioritizing the enqueueing of outbound data packets in a network device |
US6138139A (en) | 1998-10-29 | 2000-10-24 | Genesys Telecommunications Laboratories, Inc. | Method and apparatus for supporting diverse interaction paths within a multimedia communication center |
US6360243B1 (en) | 1998-03-10 | 2002-03-19 | Motorola, Inc. | Method, device and article of manufacture for implementing a real-time task scheduling accelerator |
US6430615B1 (en) | 1998-03-13 | 2002-08-06 | International Business Machines Corporation | Predictive model-based measurement acquisition employing a predictive model operating on a manager system and a managed system |
US6038535A (en) | 1998-03-23 | 2000-03-14 | Motorola, Inc. | Speech classifier and method using delay elements |
US6085159A (en) | 1998-03-26 | 2000-07-04 | International Business Machines Corporation | Displaying voice commands with multiple variables |
US6327581B1 (en) | 1998-04-06 | 2001-12-04 | Microsoft Corporation | Methods and apparatus for building a support vector machine classifier |
US6308197B1 (en) | 1998-04-29 | 2001-10-23 | Xerox Corporation | Machine control using register construct |
US6490572B2 (en) | 1998-05-15 | 2002-12-03 | International Business Machines Corporation | Optimization prediction for industrial processes |
US5999990A (en) | 1998-05-18 | 1999-12-07 | Motorola, Inc. | Communicator having reconfigurable resources |
US6411982B2 (en) | 1998-05-28 | 2002-06-25 | Hewlett-Packard Company | Thread based governor for time scheduled process execution |
US6161130A (en) | 1998-06-23 | 2000-12-12 | Microsoft Corporation | Technique which utilizes a probabilistic classifier to detect "junk" e-mail by automatically updating a training set and re-training the classifier based on the updated training set |
US6321226B1 (en) | 1998-06-30 | 2001-11-20 | Microsoft Corporation | Flexible keyboard searching |
US6070149A (en) | 1998-07-02 | 2000-05-30 | Activepoint Ltd. | Virtual sales personnel |
US6226630B1 (en) | 1998-07-22 | 2001-05-01 | Compaq Computer Corporation | Method and apparatus for filtering incoming information using a search engine and stored queries defining user folders |
US6061709A (en) | 1998-07-31 | 2000-05-09 | Integrated Systems Design Center, Inc. | Integrated hardware and software task control executive |
US6446061B1 (en) | 1998-07-31 | 2002-09-03 | International Business Machines Corporation | Taxonomy generation for document collections |
US6353667B1 (en) | 1998-08-27 | 2002-03-05 | Avaya Technology Corp. | Minimum interruption cycle time threshold for reserve call center agents |
US6377949B1 (en) | 1998-09-18 | 2002-04-23 | Tacit Knowledge Systems, Inc. | Method and apparatus for assigning a confidence level to a term within a user knowledge profile |
US6418458B1 (en) | 1998-10-02 | 2002-07-09 | Ncr Corporation | Object-oriented prioritized work thread pool |
US6449646B1 (en) | 1998-10-13 | 2002-09-10 | Aspect Communications Corporation | Method and apparatus for allocating mixed transaction type messages to resources via an integrated queuing mechanism |
US6282565B1 (en) | 1998-11-17 | 2001-08-28 | Kana Communications, Inc. | Method and apparatus for performing enterprise email management |
US6502072B2 (en) * | 1998-11-20 | 2002-12-31 | Microsoft Corporation | Two-tier noise rejection in speech recognition |
US6477562B2 (en) | 1998-12-16 | 2002-11-05 | Clearwater Networks, Inc. | Prioritized instruction scheduling for multi-streaming processors |
US6643260B1 (en) | 1998-12-18 | 2003-11-04 | Cisco Technology, Inc. | Method and apparatus for implementing a quality of service policy in a data communications network |
US6442589B1 (en) | 1999-01-14 | 2002-08-27 | Fujitsu Limited | Method and system for sorting and forwarding electronic messages and other data |
US6424997B1 (en) | 1999-01-27 | 2002-07-23 | International Business Machines Corporation | Machine learning based electronic messaging system |
US6434230B1 (en) | 1999-02-02 | 2002-08-13 | Avaya Technology Corp. | Rules-based queuing of calls to call-handling resources |
US6182036B1 (en) | 1999-02-23 | 2001-01-30 | Motorola, Inc. | Method of extracting features in a voice recognition system |
US6744878B1 (en) | 1999-03-02 | 2004-06-01 | Aspect Communications Corporation | Real-time transaction routing augmented with forecast data and agent schedules |
US6421066B1 (en) | 1999-03-23 | 2002-07-16 | Klab.Com - The Knowledge Infrastructure Laboratory Ltd. | Method for creating a knowledge map |
US6493694B1 (en) | 1999-04-01 | 2002-12-10 | Qwest Communications International Inc. | Method and system for correcting customer service orders |
US6747970B1 (en) | 1999-04-29 | 2004-06-08 | Christopher H. Lamb | Methods and apparatus for providing communications services between connectionless and connection-oriented networks |
US6370526B1 (en) | 1999-05-18 | 2002-04-09 | International Business Machines Corporation | Self-adaptive method and system for providing a user-preferred ranking order of object sets |
US6594697B1 (en) | 1999-05-20 | 2003-07-15 | Microsoft Corporation | Client system having error page analysis and replacement capabilities |
US6312378B1 (en) | 1999-06-03 | 2001-11-06 | Cardiac Intelligence Corporation | System and method for automated collection and analysis of patient information retrieved from an implantable medical device for remote patient care |
US6374221B1 (en) * | 1999-06-22 | 2002-04-16 | Lucent Technologies Inc. | Automatic retraining of a speech recognizer while using reliable transcripts |
SE516871C2 (en) | 1999-06-23 | 2002-03-12 | Teracom Ab | Method for flow control in a data communication network |
WO2001004791A1 (en) | 1999-07-09 | 2001-01-18 | Streamline Systems Pty Ltd | Methods of organising information |
US6496853B1 (en) | 1999-07-12 | 2002-12-17 | Micron Technology, Inc. | Method and system for managing related electronic messages |
US6535795B1 (en) | 1999-08-09 | 2003-03-18 | Baker Hughes Incorporated | Method for chemical addition utilizing adaptive optimization |
US6571282B1 (en) | 1999-08-31 | 2003-05-27 | Accenture Llp | Block-based communication in a communication services patterns environment |
US6477580B1 (en) | 1999-08-31 | 2002-11-05 | Accenture Llp | Self-described stream in a communication services patterns environment |
US6742015B1 (en) | 1999-08-31 | 2004-05-25 | Accenture Llp | Base services patterns in a netcentric environment |
US6618727B1 (en) | 1999-09-22 | 2003-09-09 | Infoglide Corporation | System and method for performing similarity searching |
US6442542B1 (en) | 1999-10-08 | 2002-08-27 | General Electric Company | Diagnostic system with learning capabilities |
US6654726B1 (en) | 1999-11-05 | 2003-11-25 | Ford Motor Company | Communication schema of online system and method of status inquiry and tracking related to orders for consumer product having specific configurations |
US6615172B1 (en) | 1999-11-12 | 2003-09-02 | Phoenix Solutions, Inc. | Intelligent query engine for processing voice based queries |
US6915344B1 (en) | 1999-11-30 | 2005-07-05 | Microsoft Corporation | Server stress-testing response verification |
US6496836B1 (en) | 1999-12-20 | 2002-12-17 | Belron Systems, Inc. | Symbol-based memory language system and method |
US6850513B1 (en) | 1999-12-30 | 2005-02-01 | Intel Corporation | Table-based packet classification |
US6757291B1 (en) | 2000-02-10 | 2004-06-29 | Simpletech, Inc. | System for bypassing a server to achieve higher throughput between data network and data storage system |
US6460074B1 (en) | 2000-02-10 | 2002-10-01 | Martin E. Fishkin | Electronic mail system |
US6714643B1 (en) | 2000-02-24 | 2004-03-30 | Siemens Information & Communication Networks, Inc. | System and method for implementing wait time estimation in automatic call distribution queues |
US6513026B1 (en) | 2000-06-17 | 2003-01-28 | Microsoft Corporation | Decision theoretic principles and policies for notification |
US6957432B2 (en) | 2000-03-21 | 2005-10-18 | Microsoft Corporation | Real-time scheduler |
US20010027463A1 (en) | 2000-03-22 | 2001-10-04 | Fujitsu Limited | Task priority decision apparatus and method, workflow system, work processing method, and recording medium |
US6704728B1 (en) | 2000-05-02 | 2004-03-09 | Iphase.Com, Inc. | Accessing information from a collection of data |
US6745181B1 (en) | 2000-05-02 | 2004-06-01 | Iphrase.Com, Inc. | Information access method |
US6904595B2 (en) | 2000-05-08 | 2005-06-07 | Microtune (San Diego), Inc. | Priority in a portable thread environment |
US6408277B1 (en) | 2000-06-21 | 2002-06-18 | Banter Limited | System and method for automatic task prioritization |
US6738759B1 (en) | 2000-07-07 | 2004-05-18 | Infoglide Corporation, Inc. | System and method for performing similarity searching using pointer optimization |
EP1182550A3 (en) | 2000-08-21 | 2006-08-30 | Texas Instruments France | Task based priority arbitration |
JP2002082815A (en) | 2000-09-07 | 2002-03-22 | Oki Electric Ind Co Ltd | Task program control system |
US20020103871A1 (en) | 2000-09-11 | 2002-08-01 | Lingomotors, Inc. | Method and apparatus for natural language processing of electronic mail |
KR20010016276A (en) | 2000-11-29 | 2001-03-05 | 안병엽 | Method and system for processing e-mail with an anonymous receiver |
US6910212B2 (en) | 2000-12-04 | 2005-06-21 | International Business Machines Corporation | System and method for improved complex storage locks |
US20020073129A1 (en) | 2000-12-04 | 2002-06-13 | Yu-Chung Wang | Integrated multi-component scheduler for operating systems |
JP3578082B2 (en) | 2000-12-20 | 2004-10-20 | 株式会社デンソー | Processing execution device and recording medium |
US20020087623A1 (en) | 2000-12-30 | 2002-07-04 | Eatough David A. | Method and apparatus for determining network topology and/or managing network related tasks |
US6834385B2 (en) | 2001-01-04 | 2004-12-21 | International Business Machines Corporation | System and method for utilizing dispatch queues in a multiprocessor data processing system |
US6957433B2 (en) | 2001-01-08 | 2005-10-18 | Hewlett-Packard Development Company, L.P. | System and method for adaptive performance optimization of data processing systems |
US20020150966A1 (en) | 2001-02-09 | 2002-10-17 | Muraca Patrick J. | Specimen-linked database |
US8219620B2 (en) | 2001-02-20 | 2012-07-10 | Mcafee, Inc. | Unwanted e-mail filtering system including voting feedback |
US6925154B2 (en) * | 2001-05-04 | 2005-08-02 | International Business Machines Corporation | Methods and apparatus for conversational name dialing systems |
US20030046297A1 (en) | 2001-08-30 | 2003-03-06 | Kana Software, Inc. | System and method for a partially self-training learning system |
- 2003
  - 2003-04-22 US US10/421,356 patent/US7389230B1/en active Active
- 2008
  - 2008-03-04 US US12/042,111 patent/US20080154595A1/en not_active Abandoned
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5502790A (en) * | 1991-12-24 | 1996-03-26 | Oki Electric Industry Co., Ltd. | Speech recognition method and system using triphones, diphones, and phonemes |
US5615299A (en) * | 1994-06-20 | 1997-03-25 | International Business Machines Corporation | Speech recognition using dynamic features |
US5983180A (en) * | 1997-10-23 | 1999-11-09 | Softsound Limited | Recognition of sequential data using finite state sequence models organized in a tree structure |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8065146B2 (en) * | 2006-07-12 | 2011-11-22 | Microsoft Corporation | Detecting an answering machine using speech recognition |
US20080015846A1 (en) * | 2006-07-12 | 2008-01-17 | Microsoft Corporation | Detecting an answering machine using speech recognition |
US20100094633A1 (en) * | 2007-03-16 | 2010-04-15 | Takashi Kawamura | Voice analysis device, voice analysis method, voice analysis program, and system integration circuit |
US8478587B2 (en) * | 2007-03-16 | 2013-07-02 | Panasonic Corporation | Voice analysis device, voice analysis method, voice analysis program, and system integration circuit |
US20090157400A1 (en) * | 2007-12-14 | 2009-06-18 | Industrial Technology Research Institute | Speech recognition system and method with cepstral noise subtraction |
US8150690B2 (en) * | 2007-12-14 | 2012-04-03 | Industrial Technology Research Institute | Speech recognition system and method with cepstral noise subtraction |
US20110172954A1 (en) * | 2009-04-20 | 2011-07-14 | University Of Southern California | Fence intrusion detection |
US8682669B2 (en) * | 2009-08-21 | 2014-03-25 | Synchronoss Technologies, Inc. | System and method for building optimal state-dependent statistical utterance classifiers in spoken dialog systems |
US20110046951A1 (en) * | 2009-08-21 | 2011-02-24 | David Suendermann | System and method for building optimal state-dependent statistical utterance classifiers in spoken dialog systems |
US20110231261A1 (en) * | 2010-03-17 | 2011-09-22 | Microsoft Corporation | Voice customization for voice-enabled text advertisements |
US20130090926A1 (en) * | 2011-09-16 | 2013-04-11 | Qualcomm Incorporated | Mobile device context information using speech detection |
US8484025B1 (en) * | 2012-10-04 | 2013-07-09 | Google Inc. | Mapping an audio utterance to an action using a classifier |
US20160027444A1 (en) * | 2014-07-22 | 2016-01-28 | Nuance Communications, Inc. | Method and apparatus for detecting splicing attacks on a speaker verification system |
US10276166B2 (en) * | 2014-07-22 | 2019-04-30 | Nuance Communications, Inc. | Method and apparatus for detecting splicing attacks on a speaker verification system |
US10334103B2 (en) | 2017-01-25 | 2019-06-25 | International Business Machines Corporation | Message translation for cognitive assistance |
US20190066675A1 (en) * | 2017-08-23 | 2019-02-28 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Artificial intelligence based method and apparatus for classifying voice-recognized text |
US10762901B2 (en) * | 2017-08-23 | 2020-09-01 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Artificial intelligence based method and apparatus for classifying voice-recognized text |
CN108924483A (en) * | 2018-06-27 | 2018-11-30 | 南京朴厚生态科技有限公司 | Automatic monitoring system and method for field animals based on deep learning |
CN109935230A (en) * | 2019-04-01 | 2019-06-25 | 北京宇航系统工程研究所 | A voice-driven system and method for password detection and issuance |
US11087747B2 (en) | 2019-05-29 | 2021-08-10 | Honeywell International Inc. | Aircraft systems and methods for retrospective audio analysis |
Also Published As
Publication number | Publication date |
---|---|
US7389230B1 (en) | 2008-06-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7389230B1 (en) | System and method for classification of voice signals | |
JP4195428B2 (en) | Speech recognition using multiple speech features | |
US6542866B1 (en) | Speech recognition method and apparatus utilizing multiple feature streams | |
US5822728A (en) | Multistage word recognizer based on reliably detected phoneme similarity regions | |
JP2597791B2 (en) | Speech recognition device and method | |
US7219055B2 (en) | Speech recognition apparatus and method adapting best transformation function to transform one of the input speech and acoustic model | |
EP1171871B1 (en) | Recognition engines with complementary language models | |
JP4351385B2 (en) | Speech recognition system for recognizing continuous and separated speech | |
JP4221379B2 (en) | Automatic caller identification based on voice characteristics | |
US6868380B2 (en) | Speech recognition system and method for generating phonetic estimates |
US7451083B2 (en) | Removing noise from feature vectors | |
US8831947B2 (en) | Method and apparatus for large vocabulary continuous speech recognition using a hybrid phoneme-word lattice | |
EP1886303A1 (en) | Method of adapting a neural network of an automatic speech recognition device | |
US11763801B2 (en) | Method and system for outputting target audio, readable storage medium, and electronic device | |
KR20060022156A (en) | Distributed speech recognition system and method | |
US20030093269A1 (en) | Method and apparatus for denoising and deverberation using variational inference and strong speech models | |
Hain et al. | The AMI meeting transcription system: Progress and performance | |
Erell et al. | Filterbank-energy estimation using mixture and Markov models for recognition of noisy speech | |
US8423354B2 (en) | Speech recognition dictionary creating support device, computer readable medium storing processing program, and processing method | |
Rose et al. | Integration of utterance verification with statistical language modeling and spoken language understanding | |
KR101041035B1 (en) | High speed speaker recognition method and apparatus, Registration method and apparatus for high speed speaker recognition | |
CN116386633A (en) | Intelligent terminal equipment control method and system suitable for noise condition | |
Rose et al. | An efficient framework for robust mobile speech recognition services | |
Cook et al. | Real-time recognition of broadcast radio speech | |
CN119811389A (en) | Speech recognition intelligent rescue vehicle and speech recognition method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |