WO2000077773A1 - Method and device for automatic speech recognition, speaker identification and speech synthesis - Google Patents
Method and device for automatic speech recognition, speaker identification and speech synthesis
- Publication number
- WO2000077773A1 (PCT/DE2000/001999)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- speech
- classification result
- speech recognition
- voice
- basis
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
Definitions
- The invention relates to a method for automatic speech recognition, speaker identification and speech generation based on genetic programming (GP), and to an apparatus for performing the method.
- GP genetic programming
- Automatic speech recognition can be used both for recognizing spoken, transmitted speech and for verifying the identity of a user.
- HMM: Hidden Markov Model
- Cepstral noise reduction is based on a multilayer network operating on LPC (linear predictive coding) cepstral coefficients.
- LPC linear predictive coding
- A noise reduction network uses a nonlinear auto-associative mapping to map a set of noisy cepstral coefficients to a set of noise-free cepstral coefficients in the cepstral domain. With this method a recognition rate of 65% is achieved ("A Cepstral Noise Reduction Multi-Layer Neural Network", Helge B. D. Sorensen, ICASSP 91, Toronto, Canada, May 14-17, 1991).
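As an illustration of the auto-associative idea, the sketch below trains a tiny one-hidden-layer numpy network to map noisy coefficient vectors back to clean ones. It is a toy on synthetic random vectors, not the cited paper's network; the dimensions, noise level and learning rate are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for LPC cepstral coefficient frames (12-dimensional).
DIM, HIDDEN, N = 12, 24, 500
clean = rng.normal(size=(N, DIM))
noisy = clean + 0.3 * rng.normal(size=(N, DIM))

# One-hidden-layer auto-associative network: noisy frame in, clean frame out.
W1 = rng.normal(scale=0.1, size=(DIM, HIDDEN)); b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.1, size=(HIDDEN, DIM)); b2 = np.zeros(DIM)

def forward(x):
    h = np.tanh(x @ W1 + b1)
    return h, h @ W2 + b2

losses = []
lr = 0.1
for _ in range(500):
    h, out = forward(noisy)
    err = out - clean                       # gradient of the squared error
    losses.append(float((err ** 2).mean()))
    dh = (err @ W2.T) * (1.0 - h ** 2)      # backprop through tanh
    W2 -= lr * (h.T @ err) / N;    b2 -= lr * err.mean(axis=0)
    W1 -= lr * (noisy.T @ dh) / N; b1 -= lr * dh.mean(axis=0)

_, denoised = forward(noisy)
final_mse = float(((denoised - clean) ** 2).mean())
print(losses[0], final_mse)
```

The training loss should fall well below its starting value as the network learns the denoising mapping.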
- Another method of noise reduction is based on a structured universal network. Such a network enables noise reduction through the following three steps.
- The first step is a spectral analysis of the spoken language.
- The second is a self-structuring neural noise reduction method, SNNR (Self-Structuring Neural Noise Reduction).
- SNNR Self-Structuring Neural Noise Reduction
- The result of the SNNR network, which is already noise-reduced, is then refined in the third step by the so-called Hidden Control Neural Network (HCNN) (Helge B. D. Sorensen and Uwe Hartmann, "A Self-Structuring Neural Noise Reduction Model", University of Aalborg, Denmark, May 1991).
- HCNN hidden control Neural Network
- Another known method for noise reduction is the connectionist model.
- A four-stage network is trained by an algorithm to convert noisy signals into noise-free signals. In this way, the network is able to learn noise reduction. Furthermore, it can separate noise from noisy signals that were not part of the training signals (Shin'ichi Tamura and Alex Waibel, "Noise Reduction Using Connectionist Models", Osaka, Japan, ICASSP 88, April 1988).
- Speaker verification methods use person-specific properties of the human voice as features. These make it possible to verify a person's identity using a short speech sample of the respective person. Usually, such a method extracts speaker-specific features from at least one digital speech sample. These methods for speaker verification use two different phases: a training phase and a test phase.
- In the training phase, utterances which can be specified by a user are spoken into an arrangement. From these, reference feature vectors are formed which contain speaker-specific characteristics. For this purpose, the speech signal is divided into small, pseudo-stationary sections. For the duration of these sections the speech signal is assumed to be stationary. These sections usually have a duration of approximately 10 to 20 ms.
- In the test phase, at least one, usually a plurality, of feature vectors is formed for the speech signal and compared with the reference features. If the distance is sufficiently small, the speaker is accepted as the speaker to be verified.
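The frame-based verification scheme described above can be sketched as follows. The 20 ms sections, the coarse FFT-magnitude features, the Euclidean distance and the threshold value are all illustrative assumptions, and the "speakers" are synthetic tones rather than real voices.

```python
import numpy as np

RATE = 8000                       # assumed sampling rate in Hz
FRAME = int(0.020 * RATE)         # 20 ms pseudo-stationary sections

def features(signal):
    """Average per-section feature vector: log energy + coarse spectrum."""
    n = len(signal) // FRAME
    frames = signal[:n * FRAME].reshape(n, FRAME)
    spectrum = np.abs(np.fft.rfft(frames, axis=1))[:, 1:9]   # 8 low bands
    energy = np.log(1e-9 + (frames ** 2).sum(axis=1, keepdims=True))
    return np.hstack([energy, spectrum]).mean(axis=0)

def verify(reference, sample, threshold=10.0):
    """Accept if the distance between feature vectors is below the threshold."""
    return float(np.linalg.norm(features(reference) - features(sample))) < threshold

rng = np.random.default_rng(1)
t = np.arange(RATE) / RATE        # one second of signal
speaker_a  = np.sin(2 * np.pi * 120 * t) + 0.05 * rng.normal(size=RATE)
speaker_a2 = np.sin(2 * np.pi * 120 * t + 0.3) + 0.05 * rng.normal(size=RATE)
speaker_b  = np.sin(2 * np.pi * 260 * t) + 0.05 * rng.normal(size=RATE)

print(verify(speaker_a, speaker_a2))   # same simulated voice
print(verify(speaker_a, speaker_b))    # different simulated voice
```

A second sample of the same simulated voice stays within the threshold, while a voice with different spectral content is rejected.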
- A considerable disadvantage of the previously described known method is its high degree of uncertainty in verifying the speaker. The result of this is that a decision threshold for acceptance or rejection of the speaker must be determined. This determination is made only from the very short pseudo-stationary sections of the speech sample of the user to be verified. Even the method described in DE 196 30 109 A1, which describes the consideration of a "counterexample" (a speech sample from a speaker who is not to be verified), does not change the fundamental problem of the method.
- The general purpose of automatic speech generation is to provide various forms of information to a person using a computer or device.
- Lexical pronunciations are under-specified, generalized pronunciations that can, but do not have to, result in changed post-lexical pronunciations in natural language. For example, the English word "foot" can be listed as /fuht/ in a pronunciation dictionary.
- A learning method is used to learn how spectral speech information is generated from phonetic information; in this way the acoustic parameters of the neural network are trained. Here, speech waveforms are marked with phonetic information, and then, for example, a neural network or another data-driven system is trained to learn the spectral characteristics of the sounds associated with these time periods. When the neural network system is actually used, the neural network must generate appropriate spectral information from given phonetic information.
- The phonetic information for a text is generated using a spelling-to-phonetics lexicon, or derived with an automatic method that has been trained with such a lexicon.
- Computer units are known for controlling devices, machines, computers and production systems. These are essentially used for control purposes.
- Circuits are installed which have only limited functionality with regard to voice input. These chips can only recognize a small number of voice commands and are very sensitive to changes in vocal pitch and to interference noise.
- Such computer units with such built-in chips are currently manufactured and offered by companies such as Sensory, Inc. (Sunnyvale, CA, USA) or Fonix, Inc. (Salt Lake City, UT, USA).
- The computer units currently available on the market are not designed for human-machine dialog on a voice basis, but are operated by input using various control elements (e.g. switches or buttons) and/or a keyboard.
- The response of the computer units is generally provided via an alphanumeric and/or graphic display.
- The aforementioned known computer units are only conditionally suitable for human-machine dialog due to their limited functionality with regard to voice input and voice output. Furthermore, these computer units, which are operated by switches, buttons or keyboards, are, in contrast to automatic voice input and voice output (according to the present invention), considerably more error-prone, more susceptible to faults and more complex to handle with regard to data input and output. In addition, such systems always require corresponding skills and knowledge of their functioning and operation (for example with the keyboard).
- The object of the present invention is therefore to provide a method and/or a device which enables reliable automatic speech recognition, which works efficiently and robustly even under interference from background noise, which can be easily incorporated into embedded systems (integrated microcomputer controls), which in an integrated device allows the speaker to be identified reliably, and which provides an output option by means of speech synthesis.
- a) A speech signal is digitized with a predeterminable clock rate.
- b) The digitized values of the speech signal are fed to a GP in such a way that a classification result is formed by repeated calling of the GP with values of the digitized speech signal.
- c) The classification is made on the basis of the classification result, taking into account the value and/or the change in the value at predeterminable and/or fixed intervals.
- d) The classification result is processed in such a way that phonemes and/or words are identified on the basis of neural networks (NN) and/or genetic programs (GP) and/or fuzzy logic (FL).
- e) A computer unit for speech recognition, speaker identification and speech generation comprises a clock generator, a CPU (Central Processing Unit), a command memory and/or data memory, and an analog input and/or analog output circuit.
- NN neural networks
- GP genetic programs
- FL fuzzy logic
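The patent does not disclose a concrete GP representation, but the idea of repeatedly calling a genetic program with digitized sample values (steps a and b) can be illustrated with a toy accumulator-style program and a crude random evolutionary search. The primitive set, the fitness function and the two signal classes are invented for this sketch.

```python
import random
import math

# Toy stand-in for an evolved classifier: each "program" is a chain of
# primitive ops applied to one digitized sample and an accumulator.
PRIMS = {
    "abs":  lambda acc, x: acc + abs(x),
    "sq":   lambda acc, x: acc + x * x,
    "diff": lambda acc, x: abs(acc - x),
    "damp": lambda acc, x: 0.99 * acc + x * x,
}

def run_program(program, samples):
    """Repeatedly call the program, one digitized sample at a time."""
    acc = 0.0
    for x in samples:
        for op in program:
            acc = PRIMS[op](acc, x)
    return acc          # final accumulator value = classification result

def fitness(program, quiet, loud):
    # A useful program should separate the two signal classes widely.
    return run_program(program, loud) - run_program(program, quiet)

random.seed(3)
quiet = [0.1 * math.sin(0.2 * i) for i in range(200)]
loud = [0.9 * math.sin(0.2 * i) for i in range(200)]

# Crude evolutionary search: random programs, keep the fittest.
population = [[random.choice(list(PRIMS)) for _ in range(3)] for _ in range(30)]
best = max(population, key=lambda p: fitness(p, quiet, loud))
threshold = (run_program(best, quiet) + run_program(best, loud)) / 2
classify = lambda s: "loud" if run_program(best, s) > threshold else "quiet"
print(classify(loud), classify(quiet))
```

A real GP system would evolve tree-structured programs with crossover and mutation; the random search here only shows how a program called repeatedly over the sample stream yields a classification value.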
- FIG. 3 shows a computer unit in top view, consisting of a clock generator, CPU, command memory and/or data memory and an analog input and/or output circuit,
- FIG. 4 is a side view of a computer mouse with a built-in computer unit and a microphone for voice input,
- FIG. 5 shows a computer unit or speech recognition unit in top view,
- FIG. 7 shows a computer unit or a speech recognition and speech generation unit in top view; a microphone; a loudspeaker and a connection socket for the connection to the control unit of the wheelchair,
- a connection socket for a GPS antenna; a connection socket for an FM antenna,
- FIG. 9 shows a computer unit in top view with a speech recognition and speech generation unit; a microphone; a loudspeaker; a control panel and a display,
- FIG. 10 is a top view of a circuit with a clock generator, a CPU core, an NN network, a command memory and/or data memory as well as an analog input circuit and an analog output circuit.
- FIG. 1 shows a flow diagram according to the method of the invention, which shows the data flow and the processing of the speech signal (1) up to the classification result (6).
- The speech signal (1) is digitized by means of signal digitization or signal conditioning (2) and optionally processed (in the form of digital filters).
- The GP commands (3) of the genetic program (5) are called up repeatedly during one of the predefinable time intervals with digitized values of the speech signal (1).
- A classification result (6) is then set, which represents the recognized content of the speech signal (1).
- FIG. 2 shows the method for the further processing of the classification result(s) (5) of one or more genetic programs (4).
- Together with phonetic rules (1) or the predefinable recognizable words (3), the values are supplied to one or more function blocks (GP (8) and/or fuzzy logic (7) and/or NN (6)).
- From this, a word and/or phoneme identifier (9) is calculated, which contains a list of words/phonemes or an individual word/phoneme and its/their recognition probability.
- FIG. 3 describes a device which represents a computer unit (1) which is used for speech recognition, speaker identification and speech generation.
- The computer unit (1) consists of a clock generator (2), which specifies the clock for the CPU (Central Processing Unit) (4); a command and/or data memory (5), in which GP programs as well as conventionally created programs and data are stored; and an analog input and/or analog output circuit (3), which converts speech signals into digital values and/or digital values into speech signals.
- CPU: Central Processing Unit
- FIG. 4 shows a computer mouse (1) which can be operated by voice input.
- The user interface is controlled by voice via the microphone (2) on the basis of GP (genetic programs) and/or NN algorithms and/or fuzzy logic.
- This control takes place by means of a computer unit (3) which contains a voice chip (4).
- This voice chip implements the operating commands. FIG. 5 shows a computer unit or speech recognition unit (1) in which the input of SMS (short message system) messages is carried out by voice input. SMS messages are generated by voice input via the microphone (2) using the voice recognition unit (1) on the basis of GP (genetic programs) and/or NN algorithms and/or fuzzy logic, and are output via the GSM connection socket (4) to a GSM phone.
- The unit reports back via the loudspeaker (3).
- FIG. 6 shows a computer unit or speech generation unit (1) which automatically establishes a GSM connection and/or radio connection and makes an emergency call.
- Previously stored data are converted into speech on the basis of GP (genetic programs) and/or NN algorithms and/or fuzzy logic and output via the GSM connection socket (2).
- FIG. 7 shows a computer unit or speech recognition and speech generation unit (1); a microphone (2); a loudspeaker (3) and a connection socket (4) for the connection to the wheelchair control unit.
- This enables voice-controlled operation based on GP (genetic programs) and / or NN algorithms and / or fuzzy logic to control the wheelchair.
- FIG. 8 shows a computer unit (1) with a speech recognition and speech generation unit (4).
- The microphone (2) is used for voice input based on GP (genetic programs) and/or NN algorithms and/or fuzzy logic; the input is output again by the computer unit (1) via the loudspeaker (3) for checking purposes.
- Further information or commands can be entered using the function keys (5).
- GPS: Global Positioning System
- D-GPS: Differential Global Positioning System
- FIG. 9 shows a computer unit (1) with a speech recognition and speech generation unit (4).
- Voice input via the microphone (2) by means of the speech recognition and speech generation unit (4), based on GP (genetic programs) and/or NN algorithms and/or fuzzy logic, enables voice-controlled input and output of production and warehouse data. The input is output by the computer unit (1) via the loudspeaker (3) for control purposes, or shown on the display (6). Additional information or commands can also be entered using the function keys (5).
- This circuit (1) contains a clock generator (2), a GP-µP core (5), an NN network (6), a command memory and data memory (7), an analog input circuit and an analog output circuit.
- A speech signal is digitized at a predeterminable clock rate, e.g. every 100 µs (a sampling rate of 10 kHz).
- The speech signal is changed and/or transformed, and/or algorithms for feature extraction (such as digital filters) are used.
- The GPs are additionally, and/or exclusively, supplied with this signal.
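A minimal example of a digital filter used for feature extraction, assuming a common first-order pre-emphasis filter (the patent does not name a specific filter):

```python
def pre_emphasis(samples, alpha=0.95):
    """First-order high-pass FIR filter: y[n] = x[n] - alpha * x[n-1]."""
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(samples[n] - alpha * samples[n - 1])
    return out

x = [1.0, 1.0, 1.0, 1.0]   # a constant (DC) signal...
y = pre_emphasis(x)
print(y)                   # ...is almost entirely suppressed after the first sample
```

Such a filter boosts the high-frequency content that carries much of the phonetic information before the samples are handed to the classifier.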
- The digital signal can be changed and/or transformed for the phoneme and/or word identification based on neural networks (NN); the classification result is supplied to an NN in the form of digital values.
- The phoneme or word identification can also be based on fuzzy logic (FL).
- The classification result is then fed to an FL function in the form of digital values.
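A sketch of how a scalar classification result could be fed to a fuzzy-logic (FL) function. The membership functions and the two phoneme classes are hypothetical, chosen only to show the mechanism:

```python
def triangular(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# Hypothetical memberships of a scalar classification result in two phonemes.
PHONEME_SETS = {
    "s": lambda v: triangular(v, 0.0, 0.25, 0.5),   # fricative-like scores
    "a": lambda v: triangular(v, 0.4, 0.75, 1.0),   # vowel-like scores
}

def fuzzy_phoneme(value):
    """Return the phoneme with the highest membership degree, plus all degrees."""
    degrees = {p: f(value) for p, f in PHONEME_SETS.items()}
    return max(degrees, key=degrees.get), degrees

print(fuzzy_phoneme(0.2))
print(fuzzy_phoneme(0.8))
```

The overlapping membership functions let one classification value carry graded evidence for several phonemes at once, which is the point of using FL here.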
- In the phoneme and/or word identification, the classification result is supplied to one or more GPs (genetic programs) in the form of digital values.
- The NN, FL functions and GP functions can additionally be supplied, in the form of digital values, with linguistic and/or phonetic rules and/or the possible recognizable phoneme sequences that represent the recognizable utterances.
- The NN is trained in that classification results are applied on the input side in the form of digital values and the desired signal is fed in on the output side.
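Training a network with classification values on the input side and a desired signal on the output side can be sketched with a single logistic unit trained by gradient descent; the data and labels here are synthetic stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training set: 2-dimensional classification results on the input side,
# desired word labels (0/1) fed in on the output side.
inputs = rng.uniform(size=(200, 2))
desired = (inputs[:, 0] + inputs[:, 1] > 1.0).astype(float)

w = np.zeros(2)
b = 0.0
lr = 0.5
for _ in range(300):
    pred = 1.0 / (1.0 + np.exp(-(inputs @ w + b)))   # logistic unit
    grad = pred - desired                             # cross-entropy gradient
    w -= lr * (inputs.T @ grad) / len(inputs)
    b -= lr * grad.mean()

labels = (1.0 / (1.0 + np.exp(-(inputs @ w + b))) > 0.5).astype(float)
accuracy = float((labels == desired).mean())
print(accuracy)
```

A real system would use a multilayer network, but the supervised setup (inputs on one side, desired signal on the other) is the same.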
- The classification result of the GPs (genetic programs) from the speech signal is used to identify the speaker.
- Speech synthesis and speech generation on the basis of GPs is realized in that the GPs are supplied with phoneme sounds in the form of digital values and/or the phoneme sounds are generated by GPs.
- The phoneme sounds are combined and/or modulated by GPs and/or NN (neural networks) and/or fuzzy logic.
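Combining and modulating stored phoneme sounds can be sketched as concatenation with a linear crossfade. The "phoneme sounds" here are plain tones, and the sampling rate and overlap length are assumptions:

```python
import math

RATE = 8000  # assumed sampling rate in Hz

def tone(freq, dur):
    """Crude stand-in for a stored phoneme sound: a short tone."""
    n = int(dur * RATE)
    return [math.sin(2 * math.pi * freq * i / RATE) for i in range(n)]

def crossfade(a, b, overlap):
    """Combine two phoneme sounds with a linear crossfade (amplitude modulation)."""
    out = a[:-overlap]
    for i in range(overlap):
        w = i / overlap
        out.append((1 - w) * a[len(a) - overlap + i] + w * b[i])
    out.extend(b[overlap:])
    return out

phoneme_a = tone(220, 0.05)   # hypothetical phoneme sounds
phoneme_b = tone(440, 0.05)
word = crossfade(phoneme_a, phoneme_b, overlap=80)
print(len(phoneme_a), len(phoneme_b), len(word))
```

The crossfade avoids the audible click that hard concatenation of two waveforms would produce at the joint.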
- The voice-controlled input of a destination, based on GP (genetic programs) and/or NN algorithms and/or fuzzy logic, can be carried out by naming the location; for smaller locations this is supplemented by naming the next largest city.
- The recognition process is run through twice, with the second run loading a differentiated vocabulary depending on the result of the first run.
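The two-pass scheme, in which the second run loads a differentiated vocabulary depending on the first result, can be sketched with a trivial substring "recognizer"; the vocabularies and matching rule are invented for illustration:

```python
# Hypothetical two-pass vocabulary scheme: the first pass recognizes a
# coarse category, the second pass loads a differentiated vocabulary.
COARSE_VOCAB = ["city", "street", "number"]
FINE_VOCAB = {
    "city": ["berlin", "munich", "hamburg"],
    "street": ["main street", "park road"],
    "number": ["one", "two", "three"],
}

def recognize(utterance, vocab):
    """Stand-in recognizer: return the vocabulary entry contained in the input."""
    for word in vocab:
        if word in utterance:
            return word
    return None

def two_pass(utterance):
    category = recognize(utterance, COARSE_VOCAB)      # first run
    if category is None:
        return None
    return recognize(utterance, FINE_VOCAB[category])  # second run, new vocabulary

print(two_pass("city munich please"))
```

Loading only the vocabulary relevant to the first result keeps the second pass small, which matters on the low-resource embedded hardware the patent targets.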
- The control of a computer mouse and navigation on the surface of a computer operating system can be carried out by voice control based on GP (genetic programs) and/or NN algorithms and/or fuzzy logic. This makes it possible to create a computer mouse with which users alternatively enter operating system commands directly by voice, open menus, start programs, or issue control commands without first moving the mouse pointer to the corresponding position and clicking.
- It is possible for the computer to convert speech input into SMS (short message system) messages. This is achieved in that the speech commands and the speech text of the user are detected, interpreted by the speech recognition and converted into the SMS data format.
- An emergency call can be made automatically by means of a GSM connection and/or radio connection. This is done by means of speech synthesis and speech generation based on GP.
- The computer unit can control a wheelchair by voice by capturing the user's voice commands, interpreting them by voice recognition and converting them into suitable driving commands.
- Orientation aids for the blind and visually impaired can be implemented by means of a voice-controlled computer unit which, for example, gives instructions relating to the direction of walking.
- The method can also be used for data entry in warehousing (e.g. quality control, production process control). Due to the low performance requirement, it is possible to accommodate speech recognition and speech generation in a portable device with an operating time of up to 8 hours. This is not possible with normal standard PC technology, which is too large and error-prone. The robust speech recognition enables data input with high accuracy even in environments with high noise levels.
- The speech generation then gives the user instructions or repeats the entries for verification; speech generation can also be used as an interactive aid.
- The method can be integrated into microcomputer controls in different circuits, especially when all the necessary hardware and software for speech recognition, speaker identification and speech generation is housed in one circuit.
- The advantage of this invention is to offer a method which enables reliable automatic speech recognition, which works efficiently and robustly even under interference from background noise, and which can be easily integrated into embedded systems and devices.
- Another advantage is that no preprocessing of the time signal (the digital samples) is required. The method is speaker-independent: there are no complex training procedures to go through, and no extensive reference sets to create and store.
- A further advantage is the possibility of building systems based on this method or on such devices that are small, inexpensive, easy to handle, light and portable, and that are suitable for new fields of application due to their real-time response.
Abstract
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU62605/00A AU6260500A (en) | 1999-06-15 | 2000-06-15 | Method and device for automatic speech recognition, speaker identification and voice output |
DE10081648T DE10081648D2 (de) | 1999-06-15 | 2000-06-15 | Verfahren und Vorrichtung zur automatischen Spracherkennung, Sprechridentifizierung und Sprachausgabe |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE19927317.0 | 1999-06-15 | ||
DE29910274.2 | 1999-06-15 | ||
DE29910274U DE29910274U1 (de) | 1999-06-15 | 1999-06-15 | Vorrichtung zur automatischen Spracherkennung, Sprecheridentifizierung und Sprachausgabe |
DE1999127317 DE19927317A1 (de) | 1999-06-15 | 1999-06-15 | Verfahren und Vorrichtung zur automatischen Spracherkennung, Sprecheridentifizierung und Spracherzeugung |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2000077773A1 true WO2000077773A1 (fr) | 2000-12-21 |
Family
ID=26053783
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/DE2000/001999 WO2000077773A1 (fr) | Method and device for automatic speech recognition, speaker identification and speech synthesis
Country Status (3)
Country | Link |
---|---|
AU (1) | AU6260500A (fr) |
DE (1) | DE10081648D2 (fr) |
WO (1) | WO2000077773A1 (fr) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7343288B2 (en) | 2002-05-08 | 2008-03-11 | Sap Ag | Method and system for the processing and storing of voice information and corresponding timeline information |
US7406413B2 (en) | 2002-05-08 | 2008-07-29 | Sap Aktiengesellschaft | Method and system for the processing of voice data and for the recognition of a language |
US8478005B2 (en) | 2011-04-11 | 2013-07-02 | King Fahd University Of Petroleum And Minerals | Method of performing facial recognition using genetically modified fuzzy linear discriminant analysis |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5881135A (en) * | 1992-06-15 | 1999-03-09 | British Telecommunications Public Limited Company | Service platform |
WO1999024968A1 (fr) * | 1997-11-07 | 1999-05-20 | Motorola Inc. | Method, device and system for grammatical disambiguation |
-
2000
- 2000-06-15 DE DE10081648T patent/DE10081648D2/de not_active Ceased
- 2000-06-15 WO PCT/DE2000/001999 patent/WO2000077773A1/fr active Application Filing
- 2000-06-15 AU AU62605/00A patent/AU6260500A/en not_active Abandoned
Non-Patent Citations (3)
Title |
---|
CONRADS M ET AL: "Speech sound discrimination with genetic programming", GENETIC PROGRAMMING. FIRST EUROPEAN WORKSHOP, EUROGP'98, PARIS, FRANCE, 14 April 1998 (1998-04-14) - 15 April 1998 (1998-04-15), Springer-Verlag, Berlin, Germany, 1998, pages 113 - 129, XP002153019, ISBN: 3-540-64360-5 * |
DEMIREKLER M ET AL: "FEATURE SELECTION USING GENETICS-BASED ALGORITHM AND ITS APPLICATION TO SPEAKER IDENTIFICATION", PHOENIX, AZ, MARCH 15 - 19, 1999,NEW YORK, NY: IEEE,US, 15 March 1999 (1999-03-15), pages 329 - 332, XP000900125, ISBN: 0-7803-5042-1 * |
SPALANZANI A ET AL: "Improving robustness of connectionist speech recognition systems by genetic algorithms", PROCEEDINGS 1999 INTERNATIONAL CONFERENCE ON INFORMATION INTELLIGENCE AND SYSTEMS, BETHESDA, MD, USA, 31 October 1999 (1999-10-31) - 3 November 1999 (1999-11-03), IEEE, Los Alamitos, CA, USA, pages 415 - 422, XP002153020, ISBN: 0-7695-0446-9 * |
Also Published As
Publication number | Publication date |
---|---|
AU6260500A (en) | 2001-01-02 |
DE10081648D2 (de) | 2001-09-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
DE69427083T2 (de) | Speech recognition system for multiple languages | |
DE102020205786B4 (de) | Speech recognition using NLU (natural language understanding)-related knowledge via deep feed-forward neural networks | |
DE69923379T2 (de) | Non-interactive enrollment for speech recognition | |
DE69829235T2 (de) | Enrollment for speech recognition | |
DE69832393T2 (de) | Speech recognition system for recognizing continuous and isolated speech | |
DE69827988T2 (de) | Language models for speech recognition | |
DE69908047T2 (de) | Method and system for the automatic determination of phonetic transcriptions associated with spelled words | |
DE60009583T2 (de) | Speaker adaptation based on voice eigenvectors | |
DE69834553T2 (de) | Extensible speech recognition system with audio feedback | |
DE112018002857T5 (de) | Speaker identification with ultra-short speech segments for far-field and near-field voice assistance applications | |
DE19847419A1 (de) | Method for the automatic recognition of a spelled spoken utterance | |
DE202017106303U1 (de) | Determining phonetic relationships | |
WO1998010413A1 (fr) | Speech processing system and method | |
WO2002045076A1 (fr) | Speech recognition method and system | |
DE112006000322T5 (de) | Audio recognition system for generating response audio using extracted audio data | |
DE69924596T2 (de) | Selection of acoustic models by means of speaker verification | |
EP1273003B1 (fr) | Method and device for determining prosodic markers | |
DE60318385T2 (de) | Speech processing device and method, recording medium, and program | |
DE69519229T2 (de) | Method and device for adapting a speech recognizer to dialectal speech variants | |
EP0987682B1 (fr) | Method for adapting language models for speech recognition | |
DE112021000292T5 (de) | Speech processing system | |
DE112006000225B4 (de) | Dialog system and dialog software | |
WO2000005709A1 (fr) | Method and device for recognizing predetermined keywords in a spoken utterance | |
DE60022291T2 (de) | Unsupervised adaptation of a large-vocabulary automatic speech recognizer | |
EP3010014A1 (fr) | Method for interpreting automatic speech recognition | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
REF | Corresponds to |
Ref document number: 10081648 Country of ref document: DE Date of ref document: 20010927 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 10081648 Country of ref document: DE |
|
122 | Ep: pct application non-entry in european phase | ||
NENP | Non-entry into the national phase |
Ref country code: JP |