
WO2018169772A3 - Quality feedback on user-recorded keywords for automatic speech recognition systems - Google Patents

Info

Publication number
WO2018169772A3
WO2018169772A3 PCT/US2018/021670 US2018021670W WO2018169772A3 WO 2018169772 A3 WO2018169772 A3 WO 2018169772A3 US 2018021670 W US2018021670 W US 2018021670W WO 2018169772 A3 WO2018169772 A3 WO 2018169772A3
Authority
WO
WIPO (PCT)
Prior art keywords
user
speech recognition
subunits
automatic speech
recognition systems
Prior art date
Application number
PCT/US2018/021670
Other languages
French (fr)
Other versions
WO2018169772A2 (en)
Inventor
Tarkesh Pande
Lorin Paul NETSCH
David Patrick Magee
Original Assignee
Texas Instruments Incorporated
Texas Instruments Japan Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Texas Instruments Incorporated, Texas Instruments Japan Limited
Priority to CN201880017460.1A (CN110419078B)
Publication of WO2018169772A2
Publication of WO2018169772A3

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/221 Announcement of recognition results
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L2015/025 Phonemes, fenemes or fenones being the recognition units
    • G10L15/08 Speech classification or search
    • G10L2015/088 Word spotting
    • G10L15/01 Assessment or evaluation of speech recognition systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)
  • Navigation (AREA)

Abstract

In an automated speech recognition system (50), a microphone (52) records a keyword spoken by a user. A front end (62) divides the recorded keyword into a plurality of subunits, each containing a segment of recorded audio, and extracts a set of features from each of the plurality of subunits. A decoder (64) assigns one of a plurality of content classes to each of the plurality of subunits according to at least the extracted set of features for each subunit. A quality evaluation component (66) calculates a score representing a quality of the keyword from the content classes assigned to the plurality of subunits.
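The pipeline in the abstract (split the recording into subunits, extract features, assign a content class to each subunit, score the keyword from the classes) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the fixed-length framing, the two toy features (RMS energy and zero-crossing rate), the threshold-based stand-in for the decoder, the class names `silence`, `noise`, and `speech`, and the score defined as the fraction of speech subunits.

```python
import math
from collections import Counter

def split_into_subunits(samples, frame_len):
    """Front end, step 1: cut the recording into fixed-length subunits."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def extract_features(frame):
    """Front end, step 2: toy features (assumed): RMS energy, zero-crossing rate."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return rms, crossings / max(len(frame) - 1, 1)

def assign_content_class(features, silence_rms=0.01, noisy_zcr=0.5):
    """Decoder stand-in: hypothetical thresholds pick one class per subunit."""
    rms, zcr = features
    if rms < silence_rms:
        return "silence"
    if zcr > noisy_zcr:
        return "noise"
    return "speech"

def keyword_quality(samples, frame_len=160):
    """Quality evaluation stand-in: fraction of subunits classified as speech."""
    frames = split_into_subunits(samples, frame_len)
    classes = [assign_content_class(extract_features(f)) for f in frames]
    counts = Counter(classes)
    score = counts["speech"] / len(classes) if classes else 0.0
    return score, classes

# Synthetic recording: one silent frame, two tone frames, one rapidly alternating frame.
silence = [0.0] * 160
tone = [0.5 * math.sin(2 * math.pi * 5 * n / 160) for n in range(320)]
hiss = [0.3 if n % 2 == 0 else -0.3 for n in range(160)]

score, classes = keyword_quality(silence + tone + hiss)
print(classes, score)
```

A real system would replace the threshold rules with a trained decoder and could weight the classes differently when computing the score; the sketch only shows how per-subunit classes can be reduced to a single quality figure that is reported back to the user.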
Application PCT/US2018/021670, priority date 2017-03-14, filing date 2018-03-09: Quality feedback on user-recorded keywords for automatic speech recognition systems (WO2018169772A2, en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201880017460.1A (CN110419078B) 2017-03-14 2018-03-09 System and method for automatic speech recognition

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US62/470,910 (US201762470910P) 2017-03-14 2017-03-14
US15/706,128 (US11024302B2) 2017-03-14 2017-09-15 Quality feedback on user-recorded keywords for automatic speech recognition systems

Publications (2)

Publication Number Publication Date
WO2018169772A2 (en) 2018-09-20
WO2018169772A3 (en) 2018-10-25

Family

ID=63520181

Family Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2018/021670 (WO2018169772A2) 2017-03-14 2018-03-09 Quality feedback on user-recorded keywords for automatic speech recognition systems

Country Status (3)

Country Link
US (1) US11024302B2 (en)
CN (1) CN110419078B (en)
WO (1) WO2018169772A2 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11011155B2 (en) 2017-08-01 2021-05-18 Texas Instruments Incorporated Multi-phrase difference confidence scoring
US10692490B2 (en) * 2018-07-31 2020-06-23 Cirrus Logic, Inc. Detection of replay attack
US20220230643A1 (en) * 2022-04-01 2022-07-21 Intel Corporation Technologies for enhancing audio quality during low-quality connection conditions

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6125345A (en) * 1997-09-19 2000-09-26 At&T Corporation Method and apparatus for discriminative utterance verification using multiple confidence measures
US20010014857A1 (en) * 1998-08-14 2001-08-16 Zifei Peter Wang A voice activity detector for packet voice network
US20070250320A1 (en) * 2006-04-25 2007-10-25 General Motors Corporation Dynamic clustering of nametags in an automated speech recognition system
US8566087B2 (en) * 2006-06-13 2013-10-22 Nuance Communications, Inc. Context-based grammars for automated speech recognition

Family Cites Families (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4227177A (en) * 1978-04-27 1980-10-07 Dialog Systems, Inc. Continuous speech recognition method
US4489435A (en) * 1981-10-05 1984-12-18 Exxon Corporation Method and apparatus for continuous word string recognition
US5621859A (en) * 1994-01-19 1997-04-15 Bbn Corporation Single tree method for grammar directed, very large vocabulary speech recognizer
JPH1195795A (en) * 1997-09-16 1999-04-09 Nippon Telegr & Teleph Corp <Ntt> Voice quality evaluating method and recording medium
US7318032B1 (en) * 2000-06-13 2008-01-08 International Business Machines Corporation Speaker recognition method based on structured speaker modeling and a “Pickmax” scoring technique
EP1215661A1 (en) * 2000-12-14 2002-06-19 TELEFONAKTIEBOLAGET L M ERICSSON (publ) Mobile terminal controllable by spoken utterances
GB2375935A (en) * 2001-05-22 2002-11-27 Motorola Inc Speech quality indication
US7478043B1 (en) * 2002-06-05 2009-01-13 Verizon Corporate Services Group, Inc. Estimation of speech spectral parameters in the presence of noise
JP2005534257A (en) * 2002-07-26 2005-11-10 モトローラ・インコーポレイテッド Method for fast dynamic estimation of background noise
US7930176B2 (en) * 2005-05-20 2011-04-19 Broadcom Corporation Packet loss concealment for block-independent speech codecs
US20070136054A1 (en) * 2005-12-08 2007-06-14 Hyun Woo Kim Apparatus and method of searching for fixed codebook in speech codecs based on CELP
JP5212910B2 (en) * 2006-07-07 2013-06-19 日本電気株式会社 Speech recognition apparatus, speech recognition method, and speech recognition program
KR101415534B1 (en) * 2007-02-23 2014-07-07 삼성전자주식회사 Multi-stage speech recognition apparatus and method
US8898055B2 (en) * 2007-05-14 2014-11-25 Panasonic Intellectual Property Corporation Of America Voice quality conversion device and voice quality conversion method for converting voice quality of an input speech using target vocal tract information and received vocal tract information corresponding to the input speech
US7831427B2 (en) * 2007-06-20 2010-11-09 Microsoft Corporation Concept monitoring in spoken-word audio
WO2009081895A1 (en) * 2007-12-25 2009-07-02 Nec Corporation Voice recognition system, voice recognition method, and voice recognition program
CN101727903B (en) * 2008-10-29 2011-10-19 中国科学院自动化研究所 Pronunciation quality assessment and error detection method based on fusion of multiple characteristics and multiple systems
CN101740024B (en) * 2008-11-19 2012-02-08 中国科学院自动化研究所 An automatic assessment method for oral fluency based on generalized fluency
JP5187584B2 (en) * 2009-02-13 2013-04-24 日本電気株式会社 Input speech evaluation apparatus, input speech evaluation method, and evaluation program
US8468012B2 (en) * 2010-05-26 2013-06-18 Google Inc. Acoustic model adaptation using geographic information
US9992745B2 (en) * 2011-11-01 2018-06-05 Qualcomm Incorporated Extraction and analysis of buffered audio data using multiple codec rates each greater than a low-power processor rate
BR112014026148A2 (en) * 2012-04-27 2018-05-08 Interactive Intelligence Inc method for using negative word examples in a speech recognition system and system for identifying negative keyword examples.
US9646610B2 (en) * 2012-10-30 2017-05-09 Motorola Solutions, Inc. Method and apparatus for activating a particular wireless communication device to accept speech and/or voice commands using identification data consisting of speech, voice, image recognition
US9129602B1 (en) * 2012-12-14 2015-09-08 Amazon Technologies, Inc. Mimicking user speech patterns
US9230550B2 (en) * 2013-01-10 2016-01-05 Sensory, Incorporated Speaker verification and identification using artificial neural network-based sub-phonetic unit discrimination
US9390708B1 (en) * 2013-05-28 2016-07-12 Amazon Technologies, Inc. Low latency and memory efficient keywork spotting
US9502028B2 (en) * 2013-10-18 2016-11-22 Knowles Electronics, Llc Acoustic activity detection apparatus and method
US9613619B2 (en) * 2013-10-30 2017-04-04 Genesys Telecommunications Laboratories, Inc. Predicting recognition quality of a phrase in automatic speech recognition systems
CN104934035B (en) * 2014-03-21 2017-09-26 华为技术有限公司 The coding/decoding method and device of language audio code stream
US9775113B2 (en) * 2014-12-11 2017-09-26 Mediatek Inc. Voice wakeup detecting device with digital microphone and associated method
US9508340B2 (en) * 2014-12-22 2016-11-29 Google Inc. User specified keyword spotting using long short term memory neural network feature extractor
EP3038106B1 (en) * 2014-12-24 2017-10-18 Nxp B.V. Audio signal enhancement
US10176219B2 (en) * 2015-03-13 2019-01-08 Microsoft Technology Licensing, Llc Interactive reformulation of speech queries
JP6614639B2 (en) * 2015-05-22 2019-12-04 国立研究開発法人情報通信研究機構 Speech recognition apparatus and computer program
US10074363B2 (en) * 2015-11-11 2018-09-11 Apptek, Inc. Method and apparatus for keyword speech recognition
JP6727607B2 (en) * 2016-06-09 2020-07-22 国立研究開発法人情報通信研究機構 Speech recognition device and computer program
GB2552722A (en) * 2016-08-03 2018-02-07 Cirrus Logic Int Semiconductor Ltd Speaker recognition
JP6812843B2 (en) * 2017-02-23 2021-01-13 富士通株式会社 Computer program for voice recognition, voice recognition device and voice recognition method

Also Published As

Publication number Publication date
US11024302B2 (en) 2021-06-01
US20180268815A1 (en) 2018-09-20
WO2018169772A2 (en) 2018-09-20
CN110419078A (en) 2019-11-05
CN110419078B (en) 2024-01-23

Similar Documents

Publication Publication Date Title
EP3373292A3 (en) Method for controlling artificial intelligence system that performs multilingual processing
EP4235648A3 (en) Language model biasing
WO2019161193A3 (en) System and method for adaptive detection of spoken language via multiple speech models
GB2556459A (en) Neural networks for speaker verification
EP2963643A3 (en) Entity name recognition
JP7526846B2 (en) voice recognition
EP4235646A3 (en) Adaptive audio enhancement for multichannel speech recognition
EP4414977A3 (en) Speech endpointing
EP3767620A3 (en) Speech endpointing based on word comparisons
EP4254402A3 (en) Automatically determining language for speech recognition of spoken utterance received via an automated assistant interface
EP4435692A3 (en) Delayed responses by computational assistant
MY153562A (en) Method and discriminator for classifying different segments of a signal
EP3154055A3 (en) Dynamic threshold for speaker verification
WO2016126768A3 (en) Conference word cloud
KR101616112B1 (en) Speaker separation system and method using voice feature vectors
WO2018169772A3 (en) Quality feedback on user-recorded keywords for automatic speech recognition systems
CN106878805A (en) Mixed language subtitle file generation method and device
WO2019071177A3 (en) Attendee engagement determining system and method
EP4407958A3 (en) Voice query qos based on client-computed content metadata
WO2020117639A3 (en) Text independent speaker recognition
MY186158A (en) Sending device, sending method, receiving device, receiving method, information processing device, and information processing method
DE602005007939D1 (en) METHOD AND SYSTEM FOR AUTOMATICALLY PROVIDING LINGUISTIC FORMULATIONS OUTSIDE RECEIVING SYSTEM
EP4276816A3 (en) Speech processing
Milner et al. The 2015 Sheffield system for longitudinal diarisation of broadcast media
EA202091595A1 (en) METHOD AND DEVICE FOR BUILDING VOICE MODEL OF A TARGET ANNOUNCER

Legal Events

Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 18767886; Country of ref document: EP; Kind code of ref document: A2)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 18767886; Country of ref document: EP; Kind code of ref document: A2)
