WO2018152034A1 - Voice activity detector and associated methods - Google Patents

Voice activity detector and associated methods

Info

Publication number
WO2018152034A1
WO2018152034A1 PCT/US2018/017700 US2018017700W WO2018152034A1 WO 2018152034 A1 WO2018152034 A1 WO 2018152034A1 US 2018017700 W US2018017700 W US 2018017700W WO 2018152034 A1 WO2018152034 A1 WO 2018152034A1
Authority
WO
WIPO (PCT)
Prior art keywords
noise
frames
signal
frame
voice activity
Prior art date
Application number
PCT/US2018/017700
Other languages
English (en)
Inventor
Rohit PATURI
Anne YE
Leonardo Rub
Jean Laroche
Sridhar Krishna NEMALA
Original Assignee
Knowles Electronics, Llc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
2018-02-09
Publication date
2018-08-23
Application filed by Knowles Electronics, Llc
Publication of WO2018152034A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L25/84 Detection of presence or absence of voice signals for discriminating voice from noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones

Definitions

  • The present disclosure relates generally to voice activity detection and, more particularly, to microphone components, electrical circuits, and methods for detecting voice activity.
  • Voice control has been increasingly adopted as a favored mode of interaction with a variety of electronic devices including wireless communication handsets, tablets, and laptop personal computers (PCs), among other devices.
  • Voice activity detection is a prelude to voice or speech detection.
  • Voice activity can be characterized as voice versus noise discrimination whereas voice or speech detection refers to the detection of speech or components of speech including, for example, phonemes, keywords, voice commands, and phrases.
  • FIG. 1 is a block diagram of a voice activity detector embedded in a microphone assembly or in a host device, according to some embodiments.
  • FIGS. 3A and 3B are spectrograms of amplitude versus time and frequency versus time, respectively, for clean speech, according to some embodiments.
  • A plurality of power metrics is determined for each of a plurality of frames based on a transformation of frame data from the time domain to the frequency domain. More specifically, for each frame, the power metrics (e.g., power intensity) are determined for a corresponding plurality of frequencies (e.g., frequency bins) within a range of frequencies typically associated with voice activity. The result of the transformation at each frequency is a complex number (i.e., a number with a real part and an imaginary part). In one
  • Any suitable number of bits may be truncated, and any suitable size of frame data may be used.
  • Storage and processing of the least significant bits (LSBs) is not required, thereby reducing hardware and software resource requirements.
  • The functional block and process diagram of FIG. 2 may be implemented using 32,500 logic gates, including the memory used to store the transformed frame data.
  • The frequency range may be determined based on empirical data or modeling, or it may be customized for a particular user using a learning algorithm.
  • The lower end of the frequency range may be selected to exclude frequencies associated with interfering noise.
  • The lower end of the frequency range may be selected based on estimated or measured noise for a particular application (e.g., background noise typical of cellphone use, road noise for in-vehicle use, etc.).
  • The frequency range or cut-off frequencies may be dynamically adjusted based on ongoing periodic measures of ambient noise.
  • the frequency band for the range of frequencies is between
  • The bandwidth may be wider or narrower, and the boundary frequencies higher or lower, depending on the requirements of the particular application.
  • The lower end of the frequency range may be raised from 1.4 kHz to 1.5 kHz to exclude problematic noise at lower frequencies.
  • The Nyquist rate criterion would be satisfied for these frequency ranges when sampled at 8 kHz.
  • For greater bandwidths, a higher sampling rate may be required to satisfy the Nyquist rate criterion, if desired.
  • A determination of the power metrics at several frequencies within a frequency range between approximately 1.500 kHz and approximately 3.0 kHz has been found to be suitable for some applications. In one embodiment, power metrics are determined for not more than five frequency bins in this range.
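The per-frame power metrics described in the list above can be illustrated with a minimal sketch. It is not the patented implementation: the frame length of 64 samples and the five specific bin frequencies are assumptions chosen so that each frequency falls on an exact DFT bin at the 8 kHz sampling rate mentioned in the text. The coefficients are computed once and reused for every frame, which is one plausible reading of a transform with constant coefficients.

```python
import cmath
import math

SAMPLE_RATE_HZ = 8000   # sampling rate mentioned in the text
FRAME_LEN = 64          # assumed frame size (not stated in the source)
# Assumed bin frequencies: five values spanning roughly 1.5 kHz to 3.0 kHz.
BIN_FREQS_HZ = [1500.0, 1875.0, 2250.0, 2625.0, 3000.0]

# Nyquist check: every analysed frequency must lie below half the sampling rate.
assert all(f < SAMPLE_RATE_HZ / 2 for f in BIN_FREQS_HZ)

# Constant DFT coefficients, computed once and reused for every frame.
COEFFS = [
    [cmath.exp(-2j * math.pi * f * n / SAMPLE_RATE_HZ) for n in range(FRAME_LEN)]
    for f in BIN_FREQS_HZ
]


def power_metrics(frame):
    """Return the squared DFT magnitude at each selected frequency for one frame."""
    if len(frame) != FRAME_LEN:
        raise ValueError("unexpected frame length")
    metrics = []
    for coeff_row in COEFFS:
        # Complex DFT value at this frequency (real and imaginary parts).
        x = sum(s * c for s, c in zip(frame, coeff_row))
        # Power intensity: real part squared plus imaginary part squared.
        metrics.append(x.real ** 2 + x.imag ** 2)
    return metrics
```

Evaluating only a handful of bins directly costs on the order of five multiply-accumulate passes per frame, which is why a full FFT is unnecessary when so few frequencies are of interest.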

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Telephone Function (AREA)

Abstract

Methods, systems, and apparatuses for a low-complexity acoustic activity detector are disclosed. A method includes a first step of forming a sequence of frames by blocking digital data representative of acoustic activity. Then, for each frame, the method includes: determining a plurality of power metrics based on a conversion of the frame data from the time domain to the frequency domain using a discrete Fourier transform having constant coefficients as a function of a plurality of selected frequencies within a range of voice frequencies; determining a plurality of signal-to-noise ratios of each power metric to a corresponding noise metric; determining one or more signal-to-noise ratios; and determining whether the digital data representative of the acoustic activity contains voice activity by determining whether the signal-to-noise ratio for each frame of a plurality of frames meets a criterion.
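The decision steps in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the method claimed in the application: the function names, the use of the maximum per-frequency ratio as the frame's signal-to-noise ratio, the threshold of 4.0, and the requirement of three consecutive qualifying frames are all hypothetical choices. The power metrics are taken as given inputs.

```python
def block_into_frames(samples, frame_len):
    """Split a sample sequence into non-overlapping frames, dropping any remainder."""
    return [
        samples[i:i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def frame_snr(power_metrics, noise_metrics, floor=1e-12):
    """Reduce per-frequency SNRs to one value per frame.

    Taking the maximum ratio is an assumption; the source only says that
    one or more signal-to-noise ratios are determined.
    """
    ratios = [p / max(n, floor) for p, n in zip(power_metrics, noise_metrics)]
    return max(ratios)


def detect_voice(frames_power, noise_metrics, snr_threshold=4.0, min_frames=3):
    """Return True once min_frames consecutive frames meet the SNR criterion.

    Both the threshold and the consecutive-frame rule are hypothetical.
    """
    run = 0
    for power_metrics in frames_power:
        if frame_snr(power_metrics, noise_metrics) >= snr_threshold:
            run += 1
            if run >= min_frames:
                return True
        else:
            run = 0  # an isolated loud frame does not count as voice
    return False
```

Requiring several qualifying frames rather than one is a simple way to avoid triggering on short noise bursts, at the cost of a few frames of detection latency.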
PCT/US2018/017700 2017-02-14 2018-02-09 Voice activity detector and associated methods WO2018152034A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201762458950P 2017-02-14 2017-02-14
US62/458,950 2017-02-14

Publications (1)

Publication Number Publication Date
WO2018152034A1 (fr) 2018-08-23

Family

ID=63170427

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2018/017700 WO2018152034A1 (fr) 2017-02-14 2018-02-09 Voice activity detector and associated methods

Country Status (1)

Country Link
WO (1) WO2018152034A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10360926B2 (en) 2014-07-10 2019-07-23 Analog Devices Global Unlimited Company Low-complexity voice activity detection
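CN112967738A (zh) * 2021-02-01 2021-06-15 腾讯音乐娱乐科技(深圳)有限公司 Human voice detection method and apparatus, electronic device, and computer-readable storage medium
CN114283840A (zh) * 2021-12-22 2022-04-05 天翼爱音乐文化科技有限公司 Instruction audio generation method, system, apparatus, and storage medium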

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5963901A (en) * 1995-12-12 1999-10-05 Nokia Mobile Phones Ltd. Method and device for voice activity detection and a communication device
US6453291B1 (en) * 1999-02-04 2002-09-17 Motorola, Inc. Apparatus and method for voice activity detection in a communication system
US20150243300A1 (en) * 2007-02-26 2015-08-27 Dolby Laboratories Licensing Corporation Voice Activity Detector for Audio Signals

Similar Documents

Publication Publication Date Title
US10381021B2 (en) Robust feature extraction using differential zero-crossing counts
US10867611B2 (en) User programmable voice command recognition based on sparse features
US10535365B2 (en) Analog voice activity detection
US9412373B2 (en) Adaptive environmental context sample and update for comparing speech recognition
US9721560B2 (en) Cloud based adaptive learning for distributed sensors
US9460720B2 (en) Powering-up AFE and microcontroller after comparing analog and truncated sounds
CN110244833B (zh) Microphone assembly
US20180268811A1 (en) Apparatus and Method for Power Efficient Signal Conditioning For a Voice Recognition System
US20190355383A1 (en) Low-complexity voice activity detection
US11087780B2 (en) Analog voice activity detector systems and methods
US9406313B2 (en) Adaptive microphone sampling rate techniques
US20150063575A1 (en) Acoustic Sound Signature Detection Based on Sparse Features
CN111433737B (zh) Electronic device and control method thereof
US11172312B2 (en) Acoustic activity detecting microphone
CN104216677A (zh) Low-power voice gate for device wake-up
US11264049B2 (en) Systems and methods for capturing noise for pattern recognition processing
WO2018152034A1 (fr) Voice activity detector and associated methods
US20160210051A1 (en) Low Power Voice Trigger For Acoustic Apparatus And Method
CN116416979A (zh) Cascaded audio detection system
CN114584907A (zh) Digital microphone with low data rate interface

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 18753695

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 18753695

Country of ref document: EP

Kind code of ref document: A1