WO2018188907A1 - Processing of a voice input - Google Patents

Processing of a voice input

Info

Publication number
WO2018188907A1
Authority
WO
WIPO (PCT)
Prior art keywords
processing system
speech
input
voice
processing
Prior art date
Application number
PCT/EP2018/056945
Other languages
German (de)
English (en)
Inventor
Felix Schwarz
Christian Süss
Original Assignee
Bayerische Motoren Werke Aktiengesellschaft
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bayerische Motoren Werke Aktiengesellschaft
Publication of WO2018188907A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/28 Constructional details of speech recognition systems
    • G10L 15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/01 Assessment or evaluation of speech recognition systems

Definitions

  • The present invention relates to a method for processing a voice input and to a mobile device, in particular a motor vehicle, for carrying out such a method.
  • Processing of voice inputs in the vehicle can be performed, for example, by a central control unit of the vehicle. Alternatively, a data connection to a vehicle-external server can be used, which takes over the processing of the voice input. Both options can also be used in combination.
  • DE 10 2012 213 668 A1 describes a method and a device for operating a voice-controlled information system for a vehicle. At least one keyword is determined from a set of predefined keywords. Individual units of the information system can also be arranged outside the vehicle, and the current individual equipment of the vehicle can be taken into account.
  • DE 10 2012 022 630 A1 teaches a method for communication of a driver with a driver assistance system. A keyword identification is provided, which can also access external sources; for example, Internet servers whose databases are kept up to date can be queried.
  • The object of the invention is to improve the processing of a voice input of a user of a mobile device, in particular a motor vehicle.
  • The invention is suitable for use with a variety of mobile devices. It can be used in motor vehicles, in particular passenger cars and motorcycles. The mobile device may also be a portable mobile device, in particular a so-called smartphone.
  • The inventive method for processing a voice input of a user of a mobile device comprises the following method steps.
  • First, a voice input is detected. For this purpose, a microphone of the mobile device can be used in a manner known per se, and the acoustic signal captured in this way can be further processed, in particular digitized.
  • A voice input may be any of a variety of utterances of the user. Voice inputs can include, for example, voice commands ("Navigate home", "Increase the volume of the radio play", "Call Martin") or questions ("What is the weather at the destination?").
  • In a subsequent step, the voice input is processed by a first voice processing system. The processing of a voice input may be accomplished in a variety of ways known in the art. As a rule, the processing takes place step by step: first, the audio signal representing the voice input is processed (digitized, filtered). Subsequently, a syntactic analysis can be carried out, the result of which may be a text-based reproduction of the spoken words, whose meaning, however, has not yet been ascertained. In a further step, a semantic analysis of the (now text-based) voice input can take place.
  • The term "processing of a voice input" is to be understood broadly. The processing according to the invention may also be only a partial processing, for example covering only some of the aforementioned processing steps.
  • According to the invention, the result of the processing of the voice input by the first voice processing system is then evaluated. The evaluation may include a separate evaluation for each of the aforementioned processing steps, or the evaluation for several or all processing steps may be carried out jointly.
  • The evaluation may relate to the result of individual processing steps; if, for example, the syntactic analysis yields text-based data, the evaluation of the processing may relate to this text-based data.
  • The term "evaluation" should be understood to mean that the evaluation includes a statement on the quality of the processing of the voice input.
  • The evaluation may relate to the quality of the voice input itself; for example, it may include an indication of the detected signal-to-noise ratio (SNR) of the captured acoustic signal.
  • The evaluation may also relate to the processing itself; for example, it may include a statement that the voice input could not be processed syntactically, together with an indication of the reason for such a failure.
  • Preferably, the evaluation comprises a measure of quality or statistical uncertainty which, in particular, has a predetermined value range (e.g. 0 for minimum quality / maximum statistical uncertainty to 1 for maximum quality / minimum statistical uncertainty); a small sketch of such a measure appears after this list.
  • Depending on the result of the evaluation, a data record is created according to the invention which comprises at least data representing the voice input. The data record may comprise the voice input as a digitized (and possibly filtered) audio signal and/or, for example, the result of the syntactic analysis as text-based data.
  • In the last step, the data record is transmitted to at least one further voice processing system.
  • The invention exploits the fact that a mobile device can have access to several voice processing systems. It is based on the idea of first using the first voice processing system to process the voice input and then, if necessary, the at least one further voice processing system. In this way, improved speech processing can be achieved at relatively low cost.
  • Preferably, the first voice processing system is a machine speech processing system located in the mobile device. Such a voice processing system may also be referred to as a local voice processing system. The local voice processing system is used first, since it is available immediately and, in particular, independently of the existence of a cellular connection.
  • Preferably, the at least one further voice processing system is located outside the mobile device, and transmitting the data record comprises transmitting the data record over a cellular connection. The further voice processing system can, for example, be accessible via the Internet. The data record is then transmitted via a mobile radio unit of the mobile device using mobile communication (e.g. WLAN and/or GPRS, UMTS, LTE or the like) to an Internet server which provides the further voice processing system or forwards the data record to it (see the transmission sketch after this list).
  • The voice processing system located outside the mobile device may also be referred to as an external voice processing system.
  • The advantage of such an external voice processing system is that it has greater computing power than the local voice processing system and can access information for processing the voice input that is not available to the local voice processing system. The external voice processing system therefore typically provides better speech processing than the local voice processing system.
  • The invention can therefore be used particularly advantageously through a combination of the two systems: first, the voice input is processed by the fast and always available local voice processing system; if the evaluation shows that the voice input could not be processed satisfactorily, the data record is sent to the external voice processing system.
  • The further voice processing system can likewise comprise a machine voice processing system.
  • Alternatively or additionally, the further voice processing system may comprise a human participant, for example a call-center employee. In this case the cellular connection can include a voice connection by means of which the user of the mobile device is connected to the call-center employee.
  • It can be provided that the mobile device decides, depending on the result of the evaluation, whether a purely machine voice processing system or a voice processing system with a human participant is to be used as the further voice processing system (a minimal sketch of such a decision appears after this list). If, for example, it is determined that the voice input is correctly interpreted by machine but cannot be answered with the locally available information, it makes sense to transmit the data record to a purely machine external voice processing system. If, on the other hand, it is determined that the voice input cannot be understood with sufficient probability by a machine voice processing system, a voice processing system with a human participant can be selected. The mobile device then transmits the data record to the further voice processing system selected in this way.
  • Preferably, the processing of the voice input by the first voice processing system comprises a syntactic and/or semantic analysis of the voice input.
  • A syntactic analysis is to be understood as a processing of the voice input present as a (possibly already digitized) acoustic signal, the result of which is a correctly structured sequence of individual words. The syntactic analysis can also detect the language of the voice input. A correct result of the syntactic analysis could be the text-based record "navigate home", without its meaning being known yet; an incorrect result of the syntactic analysis could be, for example, "drive with wind over windows".
  • A semantic analysis is to be understood as a processing of the voice input (or of the result of the preceding syntactic analysis), the result of which reflects the meaning of the voice input. A correct semantic analysis of the voice input "navigate home" could yield a machine-readable navigation command that includes the location "home" as a destination parameter.
  • Preferably, the step of evaluating the result of the processing of the voice input by the first voice processing system comprises determining a measure of the quality of the syntactic and/or semantic analysis of the voice input. Preferably, the range of values of the measure is limited on both sides and predetermined. For example, the measure could lie between 0 (minimum quality: the processing yields no result at all or an unusable result) and 1 (maximum quality: the result of the processing is almost certainly correct).
  • The measure of quality may be configured as a confidence value that reflects the probability that the result of the processing is correct. It can be provided, for example, that whenever the confidence value of the syntactic analysis falls below a predetermined value (for example 0.5, preferably 0.8, particularly preferably 0.95), the further voice processing system is involved, i.e. the data record is created and transmitted.
  • Preferably, the data record comprises an audio file representing the voice input and/or a text file representing the voice input. If, as in the last-mentioned example, the syntactic analysis is unsuccessful, the data record may preferably comprise an audio file representing the voice input. If, on the other hand, the speech processing fails at the semantic analysis (i.e. if the voice input is already available in text form but cannot be interpreted), the data record may preferably comprise a text file representing the voice input. It can also be provided that the data record comprises both the audio file and the text file (see the record sketch after this list).
  • Preferably, the data record additionally comprises at least parts of the result of the processing of the voice input by the first voice processing system and/or parts of the result of the evaluation of that processing. In other words, the result of this processing is at least partially transmitted to the further voice processing system; alternatively or additionally, the evaluation of this result is at least partially transmitted.
  • The data transmitted in this way can be used by the further voice processing system in a variety of ways. For example, its own speech processing can be improved and/or the result of its own speech processing can be checked. It can also be provided that the further voice processing system only performs the missing parts of the speech processing, so that a "division of labor" between the first and the further voice processing system results. For example, the set of possible destination inputs can be transmitted as part of the data record.
  • Preferably, a user input confirming the transmission of the data record to the at least one further voice processing system is requested, and the data record is transmitted to the at least one further voice processing system depending on the user input. The request for the user input can be made, for example, acoustically and/or visually, in particular on a display of the mobile device. The user input can be made, for example, by actuating an operating element and/or by means of a voice input.
  • The invention further provides a mobile device, in particular a motor vehicle, which is set up to carry out the method described above.
  • FIG. 1 shows an embodiment of the invention in an exemplary arrangement.
  • FIG. 2 shows a flow chart of an embodiment of the method according to the invention.
  • Fig. 1 shows a schematic representation of a motor vehicle 110, which has a control unit designated as head unit 111.
  • The head unit 111 comprises the first voice processing system 111; it is therefore a local voice processing system. Other components of the first voice processing system 111, in particular one or more interior microphones, which may be arranged inside or outside the head unit 111, are not shown in Fig. 1. Via a data bus 113, the head unit 111 is connected to a mobile radio unit 112 of the motor vehicle 110. The mobile radio unit 112 is set up to establish a cellular connection 130 via a mobile network (e.g. WLAN, GSM/GPRS/EDGE, UMTS/HSPA, LTE or the like). The cellular connection 130 may include a voice connection and/or a data connection.
  • In this way, the motor vehicle 110 can exchange data 140 with a server 121 which can be reached via the Internet 120. The server 121 hosts the further voice processing system 121; it is thus an external voice processing system 121.
  • A call center (not shown in Fig. 1) may also be provided, whose employee, as a human participant of the further voice processing system 121, can be connected to the user of the motor vehicle 110 by means of a voice connection 130.
  • In step 210, a voice input is detected by the first voice processing system 111; for this purpose, an interior microphone of the motor vehicle 110 can preferably be used. The signal thus detected can first be digitized, i.e. converted into a digital audio signal. (A compact sketch of steps 210-260 follows after this list.)
  • In step 220, the voice input (now present as a digital signal) is processed. First, a syntactic analysis can be carried out, in which the digitized audio signal is converted into a text-based data item. Subsequently, a semantic analysis can be performed, in which the meaning of the voice input is converted, for example, into the form of a machine-readable control command.
  • In step 230, the result of the voice input processing 220 is evaluated. For example, a statistical confidence value representing the statistical certainty of the result of the processing 220 may be determined. It may be that the syntactic analysis 220 produces a result but the confidence value is low; in other words, there is considerable doubt as to the correctness of the result of the syntactic analysis 220. A semantic analysis could then fail or produce an erroneous result.
  • In this case, a data record 140 is created in step 240. The data record 140 contains the voice input, for example as an audio file and/or in text form; it may contain further components, for example the previously determined confidence value.
  • In step 250, a user input confirming the transmission of the data record 140 is requested. For example, the user receives a message stating that the voice input could not be processed and asking the user to press a confirm button to forward it.
  • If the user confirms, the data record 140 is transmitted to the further voice processing system 121 in step 260.
  • The further voice processing system 121 can initially process the voice input purely by machine. This processing can be more successful than that of the first voice processing system 111, because the further voice processing system 121 can use a larger database and/or greater computing power for the speech recognition.
  • It may be, however, that the further voice processing system 121 cannot handle the voice input either, for technical reasons (the speech recognition fails) or for content-related reasons (the content of the voice input cannot be answered or processed with the available information). For this case, it may be provided that the further voice processing system 121 establishes a voice connection between the user of the motor vehicle 110 and a human participant of the further voice processing system 121. This can be done automatically or after prior confirmation by the user.
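
The following sketches illustrate, in Python, how the concepts described above could be realized; they are illustrative assumptions, not part of the patent disclosure. This first sketch shows one possible quality measure as mentioned in the description: an estimated signal-to-noise ratio of the captured audio, mapped onto the predetermined range [0, 1]. The SNR estimator and the 0-30 dB mapping range are assumptions chosen for the example.

```python
# A minimal sketch, assuming the quality measure is derived from an SNR estimate;
# the estimator and the 0-30 dB mapping range are illustrative assumptions.
import numpy as np

def snr_db(signal: np.ndarray, noise: np.ndarray) -> float:
    """Estimate the signal-to-noise ratio of the captured audio in decibels."""
    p_signal = max(float(np.mean(signal.astype(np.float64) ** 2)), 1e-12)
    p_noise = max(float(np.mean(noise.astype(np.float64) ** 2)), 1e-12)
    return 10.0 * float(np.log10(p_signal / p_noise))

def quality_measure(snr: float, low_db: float = 0.0, high_db: float = 30.0) -> float:
    """Map the SNR linearly onto the predetermined value range [0, 1]."""
    return min(1.0, max(0.0, (snr - low_db) / (high_db - low_db)))
```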
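
The data record described above (audio and/or text representation of the voice input, plus optional partial results and the evaluation) could be modelled as a small container type. The field names below are assumptions for illustration, not taken from the patent.

```python
# Hypothetical container for the data record; field names are illustrative.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SpeechRecord:
    audio: Optional[bytes] = None       # digitized voice input (e.g. a WAV payload)
    transcript: Optional[str] = None    # result of the syntactic analysis, if available
    confidence: Optional[float] = None  # quality measure / confidence value in [0, 1]
    partial_results: dict = field(default_factory=dict)  # e.g. detected language, intent candidates
```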
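
The decision of whether to stay local, forward the record to an external machine system, or involve a human participant can be expressed as a simple routing function. The single 0.8 threshold (one of the example values in the description) and the three string outcomes are assumptions of this sketch.

```python
# A minimal sketch of the routing decision discussed above; the threshold and
# the outcome labels are illustrative assumptions.
CONFIDENCE_THRESHOLD = 0.8  # one of the example values mentioned in the description

def route(syntactic_conf: float, semantic_conf: float, answerable_locally: bool) -> str:
    """Decide which voice processing system should handle the input next."""
    if syntactic_conf < CONFIDENCE_THRESHOLD:
        # The words themselves are uncertain: a human participant (call-center
        # employee) listening to the audio is the better fallback.
        return "external_human"
    if semantic_conf < CONFIDENCE_THRESHOLD or not answerable_locally:
        # Correctly transcribed but not interpretable or answerable locally:
        # forward the data record to the external machine voice processing system.
        return "external_machine"
    # Local processing succeeded; no further system is needed.
    return "local"
```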
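
Steps 210-260 can be summarized as one pipeline. The sketch below reuses the SpeechRecord dataclass from the record sketch above; all callables are injected placeholders (assumptions), so any concrete recognizer, confirmation dialog and transmitter could be plugged in.

```python
# A compact sketch of steps 210-260; the injected callables are assumptions.
from typing import Callable, Optional, Tuple

def process_voice_input(
    record_audio: Callable[[], bytes],                                 # step 210
    local_recognize: Callable[[bytes], Tuple[Optional[str], float]],   # step 220
    confirm: Callable[[str], bool],                                    # step 250
    transmit: Callable[["SpeechRecord"], None],                        # step 260
    threshold: float = 0.8,                                            # example value from the description
) -> Optional[str]:
    audio = record_audio()                           # step 210: detect and digitize the voice input
    transcript, confidence = local_recognize(audio)  # step 220: local syntactic/semantic analysis
    if confidence >= threshold:                      # step 230: evaluate the local result
        return transcript                            # local result is good enough
    record = SpeechRecord(audio=audio, transcript=transcript,
                          confidence=confidence)     # step 240: build the data record
    if confirm("Your voice input could not be processed locally. Forward it?"):  # step 250
        transmit(record)                             # step 260: send to the further system
    return None
```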
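
Finally, the transmission of step 260 over the cellular data connection to an Internet-reachable server could, for instance, be an HTTP request. The endpoint URL and the JSON field names below are purely illustrative assumptions; the description does not specify a transport format.

```python
# One possible transmitter for step 260; URL and field names are assumptions.
import base64
import requests  # third-party HTTP client, assumed to be available

def transmit(record: "SpeechRecord", url: str = "https://speech.example.com/records") -> None:
    payload = {
        "audio_b64": base64.b64encode(record.audio or b"").decode("ascii"),
        "transcript": record.transcript,
        "confidence": record.confidence,
        "partial_results": record.partial_results,
    }
    response = requests.post(url, json=payload, timeout=10)  # sent over the cellular data link
    response.raise_for_status()  # surface server-side errors
```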

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention relates to a method for improving the processing of a voice input of a user of a mobile device, in particular of a motor vehicle. The method comprises the steps of: detecting a voice input; processing the voice input by a first voice processing system; evaluating (230) the result of the processing of the voice input by the first voice processing system; depending on the result of the evaluation, creating a data record which comprises data representing at least the voice input; and transmitting the data record to at least one further voice processing system.
PCT/EP2018/056945 2017-04-12 2018-03-20 Traitement d'une entrée vocale WO2018188907A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE102017206281.9A DE102017206281A1 (de) 2017-04-12 2017-04-12 Verarbeitung einer Spracheingabe
DE102017206281.9 2017-04-12

Publications (1)

Publication Number Publication Date
WO2018188907A1 true WO2018188907A1 (fr) 2018-10-18

Family

ID=61763975

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2018/056945 WO2018188907A1 (fr) 2017-04-12 2018-03-20 Traitement d'une entrée vocale

Country Status (2)

Country Link
DE (1) DE102017206281A1 (fr)
WO (1) WO2018188907A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102019126818A1 (de) * 2019-10-07 2021-04-08 Bayerische Motoren Werke Aktiengesellschaft Computerimplementiertes verfahren und datenverarbeitungssystem zur beantwortung eines sprachanrufs eines nutzers

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8880402B2 (en) * 2006-10-28 2014-11-04 General Motors Llc Automatically adapting user guidance in automated speech recognition
US20080288252A1 (en) * 2007-03-07 2008-11-20 Cerra Joseph P Speech recognition of speech recorded by a mobile communication facility

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002075724A1 (fr) * 2001-03-16 2002-09-26 Koninklijke Philips Electronics N.V. Service de transcription a arret de transcription automatique
US20060080105A1 (en) * 2004-10-08 2006-04-13 Samsung Electronics Co., Ltd. Multi-layered speech recognition apparatus and method
US20120215539A1 (en) * 2011-02-22 2012-08-23 Ajay Juneja Hybridized client-server speech recognition
DE102012213668A1 (de) 2012-08-02 2014-05-22 Bayerische Motoren Werke Aktiengesellschaft Verfahren und Vorrichtung zum Betreiben eines sprachgesteuerten Informationssystems für ein Fahrzeug
DE102012022630A1 (de) 2012-11-20 2013-06-06 Daimler Ag Fahrerassistenzsystem und Verfahren zur Kommunikation eines Fahrers mit einem Fahrerassistenzsystem

Also Published As

Publication number Publication date
DE102017206281A1 (de) 2018-10-18

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18712868

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18712868

Country of ref document: EP

Kind code of ref document: A1
