
WO2000077773A1 - Method and device for automatic speech recognition, speaker identification and speech synthesis - Google Patents

Method and device for automatic speech recognition, speaker identification and speech synthesis

Info

Publication number
WO2000077773A1
WO2000077773A1 (PCT/DE2000/001999)
Authority
WO
WIPO (PCT)
Prior art keywords
speech
classification result
speech recognition
voice
basis
Prior art date
Application number
PCT/DE2000/001999
Other languages
German (de)
English (en)
Inventor
Christoph Bueltemann
Heribert Leissner
Tilo Schlumberger
Detlef ZÜNDORF
Original Assignee
Genologic Gmbh
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from DE29910274U external-priority patent/DE29910274U1/de
Priority claimed from DE1999127317 external-priority patent/DE19927317A1/de
Application filed by Genologic Gmbh filed Critical Genologic Gmbh
Priority to AU62605/00A priority Critical patent/AU6260500A/en
Priority to DE10081648T priority patent/DE10081648D2/de
Publication of WO2000077773A1 publication Critical patent/WO2000077773A1/fr

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/14 Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks

Definitions

  • the invention relates to a method for automatic speech recognition, speaker identification and speech generation based on genetic programming (GP) and to an apparatus for performing the method.
  • GP genetic programming
  • the automatic speech recognition can be used both for recognizing spoken, transmitted speech and for verifying the identity of a user.
  • HMM hidden Markov model
  • cepstral noise reduction uses a multilayer network operating on LPC (linear predictive coding) cepstral coefficients.
  • LPC linear predictive coding
  • a noise reduction network uses a nonlinear auto-associative mapping to map a set of noisy cepstral coefficients onto a set of noise-free cepstral coefficients in the cepstral domain. With this method a recognition rate of 65% is achieved ("A Cepstral Noise Reduction Multi-Layer Neural Network", Helge B. D. Sorensen, ICASSP91, Toronto, Canada, May 14-17, 1991).
  • Another method of noise reduction is based on a structured universal network. Such a network enables noise reduction through the following three steps:
  • the first step is a spectral analysis of the spoken language.
  • the second step is a self-structuring neural noise reduction method, SNNR (Self-Structuring Neural Noise Reduction).
  • SNNR Self-Structuring Neural Noise Reduction
  • the result of the SNNR network, which is already noise-reduced, is then completed in the third step by the so-called Hidden Control Neural Network (HCNN) (Helge B. D. Sorensen and Uwe Hartmann, "A Self-Structuring Neural Noise Reduction Model", University of Aalborg, Denmark, May 1991).
  • HCNN Hidden Control Neural Network
  • Another known method for noise reduction is the connectionist model.
  • a 4-stage network is trained by an algorithm to convert noisy signals into noise-free signals. In this way, the network learns noise reduction. Furthermore, it can separate noise from noisy signals that were not part of the training data (Shin'ichi Tamura and Alex Waibel, "Noise Reduction Using Connectionist Models", Osaka, Japan, ICASSP88, April 1988).
  • Speaker verification methods use person-specific properties of the human voice as characteristics. These make it possible to verify a person's identity using a short speech sample of the respective person. Usually, such a method extracts speaker-specific features from at least one digital speech sample. These methods for speaker verification use two different phases, a training phase and a test phase.
  • In the training phase, statements which can be specified by a user are spoken into an arrangement. Reference feature vectors containing speaker-specific characteristics are formed from this. For this purpose, the speech signal is divided into small, pseudo-stationary sections; for the duration of these sections the speech signal is assumed to be stationary. These sections usually have a duration of approximately 10 to 20 ms.
  • In the test phase, at least one, usually a plurality, of characteristic vectors are formed for the speech signal and compared with the reference vectors. If the distance is sufficiently small, the speaker is accepted as the speaker to be verified.
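The framing-and-distance scheme described above can be sketched as follows. The frame length, the toy per-frame features (mean energy and zero-crossing rate) and the acceptance threshold are illustrative assumptions, not values from the patent:

```python
import math

FRAME_MS = 20          # pseudo-stationary section length (10-20 ms per the text)
SAMPLE_RATE = 10_000   # e.g. one sample every 100 microseconds

def frames(signal, frame_len):
    """Split the digitized signal into non-overlapping pseudo-stationary sections."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, frame_len)]

def feature_vector(frame):
    """Toy per-frame features: mean energy and zero-crossing rate."""
    energy = sum(s * s for s in frame) / len(frame)
    zcr = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0) / len(frame)
    return (energy, zcr)

def verify(sample, reference_vectors, threshold):
    """Accept the speaker if the mean Euclidean distance between the
    sample's feature vectors and the stored reference vectors is small."""
    frame_len = FRAME_MS * SAMPLE_RATE // 1000
    fv = [feature_vector(f) for f in frames(sample, frame_len)]
    dists = [math.dist(v, r) for v, r in zip(fv, reference_vectors)]
    return sum(dists) / len(dists) <= threshold
```

A matching sample yields near-zero distances and is accepted; a mismatched one is rejected.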
  • A considerable disadvantage of the previously described method is its high degree of uncertainty in verifying the speaker. As a result, a decision threshold for acceptance or rejection of the speaker must be determined, and this determination is made only from the very short pseudo-stationary sections of the speech sample of the user to be verified. Even the method described in DE 196 30 109 A1, which considers a "counterexample" (a speech sample from a speaker who is not to be verified), does not change this fundamental problem.
  • the general purpose of automatic speech generation is to provide different forms of information for a person using a computer or device.
  • Lexical pronunciations are under-specified, generalized pronunciations that can, but do not have to, result in changed post-lexical pronunciations in natural language. For example, the English word "foot" can be listed as /fuht/ in a pronunciation dictionary.
  • a learning method is used to learn how spectral speech information is generated from phonetic information; this is how the acoustic parameters of the neural network are trained. Here, speech waveforms are marked with phonetic information, and then, for example, a neural network or another data-driven system is trained to learn the spectral characteristics of the sounds associated with these time periods. When the neural network system is actually used, it must generate appropriate spectral information from given phonetic information.
  • the phonetic information for a text is derived using a spelling-to-phonetics lexicon or with an automatic method that has been trained with such a lexicon.
  • Computer units for controlling devices, machines, computers and production systems are known. These are essentially used for control purposes.
  • In such units, circuits are installed which have only limited functionality with regard to voice input. These chips can only recognize a small number of voice commands and are very sensitive to changes in voice pitch and to interference noise.
  • Such computer units with such built-in chips are currently manufactured and offered by companies such as Sensory, Inc. (Sunnyvale, CA, USA) or Fonix, Inc. (Salt Lake City, UT, USA).
  • the computer units currently available on the market are not designed for human-machine dialog on a voice basis, but are operated by input using various control elements (e.g. switches or buttons) and / or keyboard.
  • the feedback or response of the computer units is generally provided on an alphanumeric and/or graphic display.
  • the aforementioned known computer units are only conditionally suitable for human-machine dialogue due to their limited functionality with regard to voice input and voice output. Furthermore, these computer units, which are operated by switches, buttons or keyboards, are, in contrast to automatic voice input and voice output (according to the present invention), considerably more error-prone, more susceptible to faults and more complex to handle with regard to data input and output. In addition, such systems always require corresponding skills and knowledge of their functioning and operation (for example with the keyboard).
  • the object of the present invention is therefore to provide a method and/or a device which enables reliable automatic speech recognition that works efficiently and robustly even in the event of interference from background noise, can easily and simply be incorporated into embedded systems (integrated microcomputer controls) and devices, allows the speaker to be identified reliably, and provides an output option by means of speech synthesis.
  • a) a speech signal is digitized at a predeterminable clock rate,
  • b) the digitized values of the speech signal are fed to a GP in such a way that a classification result is formed by repeatedly calling the GP with values of the digitized speech signal,
  • c) the classification is carried out on the basis of the classification result, taking into account the value and/or the change in the value at predeterminable and/or fixed intervals,
  • d) the classification result is processed in such a way that phonemes and/or words are identified on the basis of neural networks (NN) and/or genetic programs (GP) and/or fuzzy logic (FL),
  • e) a computer unit for speech recognition, speaker identification and speech generation comprises a clock generator, a CPU (central processing unit), a command memory and/or data memory, and an analog input and/or analog output circuit.
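The digitize-and-repeatedly-call-the-GP loop described above can be illustrated with a toy sketch in which the evolved GP individual is stood in for by an ordinary Python function called repeatedly with windows of digitized samples. The function, window length and threshold are all hypothetical stand-ins, not the patent's actual program:

```python
# Stand-in for a program found by genetic programming: it maps a window
# of digitized samples to a score (here, the mean absolute amplitude).
def evolved_classifier(window):
    return sum(abs(s) for s in window) / len(window)

def classify_stream(samples, window_len, threshold):
    """Repeatedly call the (stand-in) GP with values of the digitized
    speech signal and form one classification result per fixed interval."""
    results = []
    for i in range(0, len(samples) - window_len + 1, window_len):
        score = evolved_classifier(samples[i:i + window_len])
        results.append(1 if score > threshold else 0)  # e.g. speech vs. silence
    return results
```

On a stream of four silent samples followed by four loud ones, with window length 4, this yields one result per interval: `[0, 1]`.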
  • NN neural networks
  • GP genetic programs
  • FL fuzzy logic
  • FIG. 3 shows a computer unit in top view, consisting of a clock generator, CPU, command memory and/or data memory and an analog input and/or output circuit,
  • FIG. 4 is a side view of a computer mouse with a built-in computer unit and a microphone for voice input,
  • FIG. 5 shows a computer unit or speech recognition unit,
  • FIG. 7 shows a computer unit or a speech recognition and speech generation unit in a top view; a microphone; a loudspeaker and a connection socket for the connection to the control unit of the wheelchair,
  • a connection socket for a GPS antenna and a connection socket for an FM antenna,
  • FIG. 9 shows a computer unit in plan view with a speech recognition and speech generation unit; a microphone; a loudspeaker; a control panel and a display,
  • FIG. 10 is a top view of a circuit with a clock generator, a CPU core, an NN network, a command memory and/or data memory as well as an analog input circuit and an analog output circuit.
  • FIG. 1 shows a flow diagram of the method according to the invention, showing the data flow and the processing of the speech signal (1) up to the classification result (6).
  • the speech signal (1) is digitized by means of signal digitization or signal conditioning (2) and optionally processed (in the form of digital filters).
  • These GP commands (3) of the genetic program (5) are called up repeatedly during one of the predefinable time intervals with digitized values of the speech signal (1).
  • a classification result (6) is then set, which represents the recognized content of the speech signal (1).
  • FIG. 2 shows the method for the further processing of the classification result(s) (5) of one or more genetic programs (4).
  • Together with the linguistic/phonetic rules (1) or the predefinable recognizable words (3), the values are supplied to one or more function blocks (GP (8) and/or fuzzy logic (7) and/or NN (6)).
  • a word and / or phoneme identifier (9) is calculated, which contains a list of words / phonemes or an individual word / phoneme and its / their recognition probability.
  • FIG. 3 describes a device which represents a computer unit (1) which is used for speech recognition, speaker identification and speech generation.
  • The computer unit (1) consists of a clock generator (2), which specifies the clock for the CPU (central processing unit) (4); a command and/or data memory (5), in which GPs as well as conventionally created programs and data are stored; and an analog input and/or analog output circuit (3), which converts speech signals into digital values and/or digital values into speech signals.
  • CPU Central Processor Unit
  • a computer mouse (1) which can be operated by voice input.
  • the user interface is controlled by voice control via the microphone (2) on the basis of GP (genetic programs) and / or NN algorithms and / or fuzzy logic.
  • This control takes place by means of a computer unit (3) which contains a voice chip (4) that implements the operating commands.
  • FIG. 5 shows a computer unit or speech recognition unit (1) in which SMS (short message service) messages are entered by voice input. SMS messages are generated by voice input via the microphone (2) using the speech recognition unit (1) on the basis of GP (genetic programs) and/or NN algorithms and/or fuzzy logic, and are output to a GSM phone via the GSM connection socket (4).
  • the unit reports back via the loudspeaker (3).
  • FIG. 6 shows a computer unit or speech generation unit (1) which automatically establishes a GSM connection and/or radio connection and makes an emergency call.
  • Previously stored data is converted into speech based on GP (genetic programs) and / or NN algorithms and / or fuzzy logic and output via the GSM connection socket (2).
  • FIG. 7 shows a computer unit or speech recognition and speech generation unit (1); a microphone (2); a loudspeaker (3) and a connection socket (4) for the connection to the wheelchair control unit.
  • This enables voice-controlled operation based on GP (genetic programs) and / or NN algorithms and / or fuzzy logic to control the wheelchair.
  • FIG. 8 shows a computer unit (1) with a speech recognition and speech generation unit (4).
  • the microphone (2) is used for voice input based on GP (genetic programs) and / or NN algorithms and / or fuzzy logic, which is output again by the computer unit (1) via the loudspeaker (3) for checking purposes.
  • further information or commands can be entered using the function keys (5).
  • GPS Global Positioning System
  • D-GPS Differential Global Positioning System
  • a computer unit (1) with a speech recognition and speech generation unit (4) is shown.
  • Voice input via the microphone (2), by means of the speech recognition and speech generation unit (4) based on GP (genetic programs) and/or NN algorithms and/or fuzzy logic, enables voice-controlled input and output of production and warehouse data. The input is output by the computer unit (1) via the loudspeaker (3) for control purposes, or shown on the display (6). Additional information or commands can also be entered using the function keys (5).
  • This circuit (1) contains a clock generator (2), a GP microprocessor core (5), an NN network, a command memory and data memory (7), an analog input circuit and an analog output circuit.
  • a speech signal is digitized at a predeterminable clock rate, e.g. every 100 µs (a sampling rate of 10 kHz).
  • the speech signal is changed and / or transformed, and / or algorithms for feature extraction (such as digital filters) are used.
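As one example of such a feature-extraction transform, a first-order pre-emphasis filter is a common digital filter applied to speech before further processing. This is an illustrative sketch, not a filter specified by the patent:

```python
def pre_emphasis(samples, alpha=0.95):
    """First-order digital filter y[n] = x[n] - alpha * x[n-1],
    commonly used to boost high frequencies before feature extraction."""
    out = [samples[0]]  # first sample passes through unchanged
    for n in range(1, len(samples)):
        out.append(samples[n] - alpha * samples[n - 1])
    return out
```

On a constant (DC) signal the filter output drops toward zero, since pre-emphasis attenuates low-frequency content.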
  • the GPs are supplied with this signal additionally and/or exclusively.
  • the digital signal can be changed and/or transformed for phoneme and/or word identification based on neural networks (NN), and the classification result is supplied to an NN in the form of digital values.
  • the phoneme or word identification can also be based on fuzzy logic (FL).
  • the classification result is then fed to an FL function in the form of digital values
  • For phoneme and/or word identification, the classification result is supplied to one or more GPs (genetic programs) in the form of digital values.
  • the NN, the FL functions and the GP functions can additionally be supplied, in the form of digital values, with linguistic and/or phonetic rules and/or the possible phoneme sequences that represent the recognizable utterances.
  • the NN is trained by applying classification results on the input side in the form of digital values and feeding in the desired signal on the output side.
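That input-side/output-side training scheme can be sketched with a single sigmoid unit adapted by the delta rule. The network size, learning rate and toy task (learning logical OR) are illustrative assumptions, not details from the patent:

```python
import math
import random

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def predict(w, b, x):
    """Network output for input vector x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train(inputs, targets, epochs=2000, lr=1.0, seed=0):
    """Digital values on the input side, the desired signal on the
    output side; weights are adapted by the delta rule."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5) for _ in inputs[0]]
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(inputs, targets):
            y = predict(w, b, x)
            err = (t - y) * y * (1 - y)  # squared-error gradient through sigmoid
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b
```

Training on the four OR patterns drives the output toward 0 for (0, 0) and toward 1 for the other inputs.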
  • the classification result of GPs (genetic programs) from the speech signal is used to identify the speaker.
  • Speech synthesis and speech generation on the basis of GPs is realized in that the GPs are supplied with phoneme sounds in the form of digital values and/or the phoneme sounds are generated by GPs.
  • The phoneme sounds are combined and/or modulated by GPs and/or NN (neural networks) and/or fuzzy logic.
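Combining stored phoneme sounds can be sketched as concatenation with a short linear crossfade at each seam; the overlap length and waveforms are illustrative, and the GP/NN/FL modulation is abstracted away:

```python
def concatenate(phonemes, overlap=4):
    """Join stored phoneme waveforms, modulating the seam between
    consecutive phonemes with a linear crossfade of `overlap` samples."""
    out = list(phonemes[0])
    for ph in phonemes[1:]:
        tail, head = out[-overlap:], ph[:overlap]
        for i in range(overlap):
            w = (i + 1) / (overlap + 1)            # fade weight rises toward 1
            out[-overlap + i] = (1 - w) * tail[i] + w * head[i]
        out.extend(ph[overlap:])
    return out
```

Joining a silent phoneme with a loud one produces a smooth ramp across the overlap region instead of an audible click.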
  • The voice-controlled input of a navigation destination, based on GP (genetic programs) and/or NN algorithms and/or fuzzy logic, can be carried out by naming the location; for smaller locations, the input is supplemented by naming the next largest city.
  • The recognition process is run through twice, with the second run loading a differentiated vocabulary depending on the result of the first.
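The two-pass idea can be sketched as follows; the vocabularies and the exact-match "recognizer" are purely hypothetical stand-ins for the GP/NN/FL recognizer described in the text:

```python
# Hypothetical two-pass lookup: the first pass picks a coarse category,
# the second pass loads a differentiated vocabulary for that category.
COARSE_VOCAB = {"city", "street"}
DETAILED_VOCAB = {
    "city": {"berlin", "hamburg", "munich"},
    "street": {"main street", "station road"},
}

def recognize(utterance, vocab):
    # stand-in for the real recognizer: exact match against the vocabulary
    return utterance if utterance in vocab else None

def two_pass(first_utterance, second_utterance):
    """Run recognition twice, loading the second vocabulary based on
    the result of the first pass."""
    category = recognize(first_utterance, COARSE_VOCAB)
    if category is None:
        return None
    return recognize(second_utterance, DETAILED_VOCAB[category])
```

Restricting the second pass to a small, context-dependent vocabulary is what keeps each recognition step tractable.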
  • The control of a computer mouse and navigation on the surface of a computer operating system can be carried out by voice control based on GP (genetic programs) and/or NN algorithms and/or fuzzy logic. This makes it possible to create a computer mouse with which users alternatively enter operating system commands directly by voice, open menus, start programs, or issue control commands without first moving the mouse pointer to the corresponding position and clicking.
  • It is possible for the computer unit to convert speech input into SMS (short message service) messages. This is achieved in that the voice commands and the spoken text of the user are detected, interpreted by the speech recognition, and converted into the SMS data format.
  • An emergency call can be made automatically by means of a GSM connection and/or radio connection. This is done by means of speech synthesis and speech generation based on GP (genetic programs).
  • the computer unit can control a wheelchair by voice by capturing the user's voice commands, interpreting them by voice recognition and converting them into suitable driving commands
  • Orientation aids for the blind and visually impaired can be implemented by means of a voice-controlled computer unit which, for example, gives instructions relating to the walking direction.
  • The method can also be used for data entry in warehousing and production (e.g. quality control, production process control). Due to the low performance requirements, it is possible to accommodate speech recognition and speech generation in a portable device with an operating time of up to 8 hours. This is not possible with standard PC technology, which is too big and too error-prone. The robust speech recognition enables data input with high accuracy even in an environment with high noise levels.
  • The speech generation then gives the user instructions or repeats the entries for verification; speech generation can also be used as an interactive aid.
  • The method can be incorporated into microcomputer controls integrated in different circuits, especially when all the necessary hardware and software for speech recognition, speaker identification and speech generation is housed in one circuit.
  • the advantage of this invention is to be able to offer a method which enables reliable automatic speech recognition, which works efficiently and robustly even in the event of interference from background noise, and which can be easily and simply integrated into embedded systems and devices.
  • Another advantage is that no preprocessing of the time signal (the digital samples) is required. The method is independent of the speaker; there are no complex training procedures to go through, and no need to create and save extensive reference sets.
  • Another advantage is the possibility of building systems based on this method or on such devices that are small, inexpensive, easy to handle, light and portable, and suitable for new fields of application due to their real-time response.

Abstract

In order to provide a method and/or a device enabling reliable speech recognition that works efficiently and robustly even in the event of interference from background noise, can easily be integrated into embedded microcomputer controls and devices, also enables reliable identification of the speaker, and offers a speech synthesis option, the following features of claim 1 are provided: a) the audio signal is digitized at a predeterminable clock rate; b) the digitized values of the speech signal are fed to a genetic program (GP) in such a way that a classification result is formed from the values of the digitized speech signal by repeatedly calling the GP; c) the classification is carried out at predeterminable and/or fixed intervals on the basis of the classification result, taking into account the value and/or the change in the value; d) the classification result is processed in such a way that phonemes and/or words are identified on the basis of neural networks (NN) and/or genetic programs (GP) and/or fuzzy logic (FL); e) a computer unit comprises a speech recognition, speaker identification and speech synthesis system composed of a clock generator, a central processing unit (CPU), a command memory and/or a data memory, and an analog input circuit and/or an analog output circuit.
PCT/DE2000/001999 1999-06-15 2000-06-15 Procede et dispositif de reconnaissance vocale, d'identification du locuteur, et de synthese vocale automatiques WO2000077773A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
AU62605/00A AU6260500A (en) 1999-06-15 2000-06-15 Method and device for automatic speech recognition, speaker identification and voice output
DE10081648T DE10081648D2 (de) 1999-06-15 2000-06-15 Verfahren und Vorrichtung zur automatischen Spracherkennung, Sprechridentifizierung und Sprachausgabe

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
DE19927317.0 1999-06-15
DE29910274.2 1999-06-15
DE29910274U DE29910274U1 (de) 1999-06-15 1999-06-15 Vorrichtung zur automatischen Spracherkennung, Sprecheridentifizierung und Sprachausgabe
DE1999127317 DE19927317A1 (de) 1999-06-15 1999-06-15 Verfahren und Vorrichtung zur automatischen Spracherkennung, Sprecheridentifizierung und Spracherzeugung

Publications (1)

Publication Number Publication Date
WO2000077773A1 (fr) 2000-12-21

Family

ID=26053783

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/DE2000/001999 WO2000077773A1 (fr) 1999-06-15 2000-06-15 Procede et dispositif de reconnaissance vocale, d'identification du locuteur, et de synthese vocale automatiques

Country Status (3)

Country Link
AU (1) AU6260500A (fr)
DE (1) DE10081648D2 (fr)
WO (1) WO2000077773A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7343288B2 (en) 2002-05-08 2008-03-11 Sap Ag Method and system for the processing and storing of voice information and corresponding timeline information
US7406413B2 (en) 2002-05-08 2008-07-29 Sap Aktiengesellschaft Method and system for the processing of voice data and for the recognition of a language
US8478005B2 (en) 2011-04-11 2013-07-02 King Fahd University Of Petroleum And Minerals Method of performing facial recognition using genetically modified fuzzy linear discriminant analysis

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5881135A (en) * 1992-06-15 1999-03-09 British Telecommunications Public Limited Company Service platform
WO1999024968A1 (fr) * 1997-11-07 1999-05-20 Motorola Inc. Procede, dispositif et systeme de desambiguisation de nature grammaticale

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CONRADS M ET AL: "Speech sound discrimination with genetic programming", GENETIC PROGRAMMING. FIRST EUROPEAN WORKSHOP, EUROGP'98, PARIS, FRANCE, 14 April 1998 (1998-04-14) - 15 April 1998 (1998-04-15), Springer-Verlag, Berlin, Germany, 1998, pages 113 - 129, XP002153019, ISBN: 3-540-64360-5 *
DEMIREKLER M ET AL: "FEATURE SELECTION USING GENETICS-BASED ALGORITHM AND ITS APPLICATION TO SPEAKER IDENTIFICATION", PHOENIX, AZ, MARCH 15 - 19, 1999,NEW YORK, NY: IEEE,US, 15 March 1999 (1999-03-15), pages 329 - 332, XP000900125, ISBN: 0-7803-5042-1 *
SPALANZANI A ET AL: "Improving robustness of connectionist speech recognition systems by genetic algorithms", PROCEEDINGS 1999 INTERNATIONAL CONFERENCE ON INFORMATION INTELLIGENCE AND SYSTEMS, BETHESDA, MD, USA, 31 October 1999 (1999-10-31) - 3 November 1999 (1999-11-03), IEEE, Los Alamitos, CA, USA, pages 415 - 422, XP002153020, ISBN: 0-7695-0446-9 *

Also Published As

Publication number Publication date
AU6260500A (en) 2001-01-02
DE10081648D2 (de) 2001-09-27

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
REF Corresponds to

Ref document number: 10081648

Country of ref document: DE

Date of ref document: 20010927

WWE Wipo information: entry into national phase

Ref document number: 10081648

Country of ref document: DE

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP
