WO2018169772A3 - Quality feedback on user-recorded keywords for automatic speech recognition systems - Google Patents
- Publication number
- WO2018169772A3 (application PCT/US2018/021670)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- user
- speech recognition
- subunits
- automatic speech
- recognition systems
Classifications
- G: PHYSICS
  - G10: MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00: Speech recognition
        - G10L15/01: Assessment or evaluation of speech recognition systems
        - G10L15/02: Feature extraction for speech recognition; Selection of recognition unit
          - G10L2015/025: Phonemes, fenemes or fenones being the recognition units
        - G10L15/08: Speech classification or search
          - G10L2015/088: Word spotting
        - G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
          - G10L2015/221: Announcement of recognition results
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
- Navigation (AREA)
Abstract
In an automated speech recognition system (50), a microphone (52) records a keyword spoken by a user. A front end (62) divides the recorded keyword into a plurality of subunits, each containing a segment of recorded audio, and extracts a set of features from each of the plurality of subunits. A decoder (64) assigns one of a plurality of content classes to each of the plurality of subunits according to at least the extracted set of features for each subunit. A quality evaluation component (66) calculates a score representing a quality of the keyword from the content classes assigned to the plurality of subunits.
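The abstract describes a four-stage pipeline: record, split into subunits and extract features (front end 62), assign a content class to each subunit (decoder 64), and score the keyword from those classes (quality evaluation component 66). The Python sketch below illustrates that flow under stated assumptions only. The abstract does not name the content classes, the features, the decoder model, or the scoring rule, so everything here is hypothetical: three classes (silence, voiced, unvoiced), two features per subunit (RMS energy and zero-crossing rate), a fixed-threshold rule in place of a trained decoder, 160-sample subunits (20 ms at an assumed 8 kHz sample rate), and a score defined as the fraction of subunits that carry speech.

```python
import math
from enum import Enum
from typing import List, Sequence, Tuple


class ContentClass(Enum):
    """Hypothetical content classes; the abstract does not enumerate them."""
    SILENCE = "silence"
    VOICED = "voiced"
    UNVOICED = "unvoiced"


def split_into_subunits(samples: Sequence[float], frame_len: int) -> List[Sequence[float]]:
    """Front end, step 1 (assumed scheme): fixed-length, non-overlapping subunits.

    A trailing partial subunit is dropped.
    """
    if frame_len < 2:
        raise ValueError("frame_len must be at least 2 samples")
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def extract_features(frame: Sequence[float]) -> Tuple[float, float]:
    """Front end, step 2 (assumed features): RMS energy and zero-crossing rate."""
    energy = math.sqrt(sum(x * x for x in frame) / len(frame))
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (len(frame) - 1)
    return energy, zcr


def classify(features: Tuple[float, float],
             energy_thresh: float = 0.05,
             zcr_thresh: float = 0.3) -> ContentClass:
    """Decoder stand-in: a fixed-threshold rule instead of a trained model."""
    energy, zcr = features
    if energy < energy_thresh:
        return ContentClass.SILENCE
    # High zero-crossing rate suggests noise-like (unvoiced) content.
    return ContentClass.UNVOICED if zcr > zcr_thresh else ContentClass.VOICED


def keyword_quality(samples: Sequence[float], frame_len: int = 160) -> float:
    """Quality evaluation (assumed rule): fraction of subunits that are not silence."""
    classes = [classify(extract_features(f))
               for f in split_into_subunits(samples, frame_len)]
    if not classes:
        return 0.0  # recording shorter than one subunit
    speech = sum(1 for c in classes if c is not ContentClass.SILENCE)
    return speech / len(classes)
```

For example, 0.1 s of silence, 0.2 s of a 200 Hz tone, and another 0.1 s of silence at 8 kHz produce 20 subunits, 10 of them non-silent, so `keyword_quality` returns 0.5. A low score of this kind is the sort of signal a system could surface to the user as a prompt to re-record the keyword.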
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201880017460.1A CN110419078B (en) | 2017-03-14 | 2018-03-09 | System and method for automatic speech recognition |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201762470910P | 2017-03-14 | 2017-03-14 | |
US62/470,910 | 2017-03-14 | ||
US15/706,128 | 2017-09-15 | ||
US15/706,128 US11024302B2 (en) | 2017-03-14 | 2017-09-15 | Quality feedback on user-recorded keywords for automatic speech recognition systems |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2018169772A2 (en) | 2018-09-20 |
WO2018169772A3 (en) | 2018-10-25 |
Family ID: 63520181
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2018/021670 WO2018169772A2 (en) | 2017-03-14 | 2018-03-09 | Quality feedback on user-recorded keywords for automatic speech recognition systems |
Country Status (3)
Country | Link |
---|---|
US (1) | US11024302B2 (en) |
CN (1) | CN110419078B (en) |
WO (1) | WO2018169772A2 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11011155B2 (en) | 2017-08-01 | 2021-05-18 | Texas Instruments Incorporated | Multi-phrase difference confidence scoring |
US10692490B2 (en) * | 2018-07-31 | 2020-06-23 | Cirrus Logic, Inc. | Detection of replay attack |
US20220230643A1 (en) * | 2022-04-01 | 2022-07-21 | Intel Corporation | Technologies for enhancing audio quality during low-quality connection conditions |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6125345A (en) * | 1997-09-19 | 2000-09-26 | At&T Corporation | Method and apparatus for discriminative utterance verification using multiple confidence measures |
US20010014857A1 (en) * | 1998-08-14 | 2001-08-16 | Zifei Peter Wang | A voice activity detector for packet voice network |
US20070250320A1 (en) * | 2006-04-25 | 2007-10-25 | General Motors Corporation | Dynamic clustering of nametags in an automated speech recognition system |
US8566087B2 (en) * | 2006-06-13 | 2013-10-22 | Nuance Communications, Inc. | Context-based grammars for automated speech recognition |
Family Cites Families (38)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4227177A (en) * | 1978-04-27 | 1980-10-07 | Dialog Systems, Inc. | Continuous speech recognition method |
US4489435A (en) * | 1981-10-05 | 1984-12-18 | Exxon Corporation | Method and apparatus for continuous word string recognition |
US5621859A (en) * | 1994-01-19 | 1997-04-15 | Bbn Corporation | Single tree method for grammar directed, very large vocabulary speech recognizer |
JPH1195795A (en) * | 1997-09-16 | 1999-04-09 | Nippon Telegr & Teleph Corp <Ntt> | Voice quality evaluating method and recording medium |
US7318032B1 (en) * | 2000-06-13 | 2008-01-08 | International Business Machines Corporation | Speaker recognition method based on structured speaker modeling and a “Pickmax” scoring technique |
EP1215661A1 (en) * | 2000-12-14 | 2002-06-19 | TELEFONAKTIEBOLAGET L M ERICSSON (publ) | Mobile terminal controllable by spoken utterances |
GB2375935A (en) * | 2001-05-22 | 2002-11-27 | Motorola Inc | Speech quality indication |
US7478043B1 (en) * | 2002-06-05 | 2009-01-13 | Verizon Corporate Services Group, Inc. | Estimation of speech spectral parameters in the presence of noise |
JP2005534257A (en) * | 2002-07-26 | 2005-11-10 | Motorola, Inc. | Method for fast dynamic estimation of background noise |
US7930176B2 (en) * | 2005-05-20 | 2011-04-19 | Broadcom Corporation | Packet loss concealment for block-independent speech codecs |
US20070136054A1 (en) * | 2005-12-08 | 2007-06-14 | Hyun Woo Kim | Apparatus and method of searching for fixed codebook in speech codecs based on CELP |
JP5212910B2 (en) * | 2006-07-07 | 2013-06-19 | NEC Corporation | Speech recognition apparatus, speech recognition method, and speech recognition program |
KR101415534B1 (en) * | 2007-02-23 | 2014-07-07 | Samsung Electronics Co., Ltd. | Multi-stage speech recognition apparatus and method |
US8898055B2 (en) * | 2007-05-14 | 2014-11-25 | Panasonic Intellectual Property Corporation Of America | Voice quality conversion device and voice quality conversion method for converting voice quality of an input speech using target vocal tract information and received vocal tract information corresponding to the input speech |
US7831427B2 (en) * | 2007-06-20 | 2010-11-09 | Microsoft Corporation | Concept monitoring in spoken-word audio |
WO2009081895A1 (en) * | 2007-12-25 | 2009-07-02 | Nec Corporation | Voice recognition system, voice recognition method, and voice recognition program |
CN101727903B (en) * | 2008-10-29 | 2011-10-19 | Institute of Automation, Chinese Academy of Sciences | Pronunciation quality assessment and error detection method based on fusion of multiple characteristics and multiple systems |
CN101740024B (en) * | 2008-11-19 | 2012-02-08 | Institute of Automation, Chinese Academy of Sciences | An automatic assessment method for oral fluency based on generalized fluency |
JP5187584B2 (en) * | 2009-02-13 | 2013-04-24 | NEC Corporation | Input speech evaluation apparatus, input speech evaluation method, and evaluation program |
US8468012B2 (en) * | 2010-05-26 | 2013-06-18 | Google Inc. | Acoustic model adaptation using geographic information |
US9992745B2 (en) * | 2011-11-01 | 2018-06-05 | Qualcomm Incorporated | Extraction and analysis of buffered audio data using multiple codec rates each greater than a low-power processor rate |
BR112014026148A2 (en) * | 2012-04-27 | 2018-05-08 | Interactive Intelligence Inc | method for using negative word examples in a speech recognition system and system for identifying negative keyword examples. |
US9646610B2 (en) * | 2012-10-30 | 2017-05-09 | Motorola Solutions, Inc. | Method and apparatus for activating a particular wireless communication device to accept speech and/or voice commands using identification data consisting of speech, voice, image recognition |
US9129602B1 (en) * | 2012-12-14 | 2015-09-08 | Amazon Technologies, Inc. | Mimicking user speech patterns |
US9230550B2 (en) * | 2013-01-10 | 2016-01-05 | Sensory, Incorporated | Speaker verification and identification using artificial neural network-based sub-phonetic unit discrimination |
US9390708B1 (en) * | 2013-05-28 | 2016-07-12 | Amazon Technologies, Inc. | Low latency and memory efficient keywork spotting |
US9502028B2 (en) * | 2013-10-18 | 2016-11-22 | Knowles Electronics, Llc | Acoustic activity detection apparatus and method |
US9613619B2 (en) * | 2013-10-30 | 2017-04-04 | Genesys Telecommunications Laboratories, Inc. | Predicting recognition quality of a phrase in automatic speech recognition systems |
CN104934035B (en) * | 2014-03-21 | 2017-09-26 | Huawei Technologies Co., Ltd. | The coding/decoding method and device of language audio code stream |
US9775113B2 (en) * | 2014-12-11 | 2017-09-26 | Mediatek Inc. | Voice wakeup detecting device with digital microphone and associated method |
US9508340B2 (en) * | 2014-12-22 | 2016-11-29 | Google Inc. | User specified keyword spotting using long short term memory neural network feature extractor |
EP3038106B1 (en) * | 2014-12-24 | 2017-10-18 | Nxp B.V. | Audio signal enhancement |
US10176219B2 (en) * | 2015-03-13 | 2019-01-08 | Microsoft Technology Licensing, Llc | Interactive reformulation of speech queries |
JP6614639B2 (en) * | 2015-05-22 | 2019-12-04 | National Institute of Information and Communications Technology | Speech recognition apparatus and computer program |
US10074363B2 (en) * | 2015-11-11 | 2018-09-11 | Apptek, Inc. | Method and apparatus for keyword speech recognition |
JP6727607B2 (en) * | 2016-06-09 | 2020-07-22 | National Institute of Information and Communications Technology | Speech recognition device and computer program |
GB2552722A (en) * | 2016-08-03 | 2018-02-07 | Cirrus Logic Int Semiconductor Ltd | Speaker recognition |
JP6812843B2 (en) * | 2017-02-23 | 2021-01-13 | Fujitsu Limited | Computer program for voice recognition, voice recognition device and voice recognition method |

2017
- 2017-09-15: US application US15/706,128, patent US11024302B2 (Active)
2018
- 2018-03-09: CN application CN201880017460.1A, patent CN110419078B (Active)
- 2018-03-09: WO application PCT/US2018/021670, publication WO2018169772A2 (Application Filing)
Also Published As
Publication number | Publication date |
---|---|
US11024302B2 (en) | 2021-06-01 |
US20180268815A1 (en) | 2018-09-20 |
WO2018169772A2 (en) | 2018-09-20 |
CN110419078A (en) | 2019-11-05 |
CN110419078B (en) | 2024-01-23 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
EP3373292A3 (en) | Method for controlling artificial intelligence system that performs multilingual processing | |
EP4235648A3 (en) | Language model biasing | |
WO2019161193A3 (en) | System and method for adaptive detection of spoken language via multiple speech models | |
GB2556459A (en) | Neural networks for speaker verification | |
EP2963643A3 (en) | Entity name recognition | |
JP7526846B2 (en) | voice recognition | |
EP4235646A3 (en) | Adaptive audio enhancement for multichannel speech recognition | |
EP4414977A3 (en) | Speech endpointing | |
EP3767620A3 (en) | Speech endpointing based on word comparisons | |
EP4254402A3 (en) | Automatically determining language for speech recognition of spoken utterance received via an automated assistant interface | |
EP4435692A3 (en) | Delayed responses by computational assistant | |
MY153562A (en) | Method and discriminator for classifying different segments of a signal | |
EP3154055A3 (en) | Dynamic threshold for speaker verification | |
WO2016126768A3 (en) | Conference word cloud | |
KR101616112B1 (en) | Speaker separation system and method using voice feature vectors | |
WO2018169772A3 (en) | Quality feedback on user-recorded keywords for automatic speech recognition systems | |
CN106878805A (en) | Mixed language subtitle file generation method and device | |
WO2019071177A3 (en) | Attendee engagement determining system and method | |
EP4407958A3 (en) | Voice query qos based on client-computed content metadata | |
WO2020117639A3 (en) | Text independent speaker recognition | |
MY186158A (en) | Sending device, sending method, receiving device, receiving method, information processing device, and information processing method | |
DE602005007939D1 (en) | METHOD AND SYSTEM FOR AUTOMATICALLY PROVIDING LINGUISTIC FORMULATIONS OUTSIDE RECEIVING SYSTEM | |
EP4276816A3 (en) | Speech processing | |
Milner et al. | The 2015 Sheffield system for longitudinal diarisation of broadcast media | |
EA202091595A1 (en) | METHOD AND DEVICE FOR BUILDING VOICE MODEL OF A TARGET ANNOUNCER |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 18767886; Country of ref document: EP; Kind code of ref document: A2 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | EP: PCT application non-entry in European phase | Ref document number: 18767886; Country of ref document: EP; Kind code of ref document: A2 |