WO2018152034A1 - Voice activity detector and associated methods - Google Patents
Voice activity detector and associated methods
- Publication number
- WO2018152034A1 (application PCT/US2018/017700)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- noise
- frames
- signal
- frame
- voice activity
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/84—Detection of presence or absence of voice signals for discriminating voice from noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
Definitions
- The present disclosure relates generally to voice activity detection and, more particularly, to microphone components, electrical circuits, and methods for detecting voice activity.
- Voice control has been increasingly adopted as a favored mode of interaction with a variety of electronic devices including wireless communication handsets, tablets, and laptop personal computers (PCs), among other devices.
- Voice activity detection is a prelude to voice or speech detection.
- Voice activity can be characterized as voice versus noise discrimination whereas voice or speech detection refers to the detection of speech or components of speech including, for example, phonemes, keywords, voice commands, and phrases.
- FIG. 1 is a block diagram of a voice activity detector embedded in a microphone assembly or in a host device, according to some embodiments.
- FIGS. 3A and 3B are spectrograms of amplitude versus time and frequency versus time, respectively, for clean speech, according to some embodiments.
- A plurality of power metrics is determined for each of a plurality of frames based on a transformation of frame data from the time domain to the frequency domain. More specifically, for each frame, the plurality of power metrics (e.g., power intensity) are determined for a corresponding plurality of frequencies (e.g., frequency bins) within a range of frequencies typically associated with voice activity. The result of the transformation at each frequency is a complex number (i.e., a number with a real portion and an imaginary portion). In one
- Any suitable number of bits may be truncated, and any suitable size of frame data may be used.
- Storage and processing of least significant bits (LSBs) is not required, thereby reducing hardware and software resource requirements.
- The functional block and process diagram of FIG. 2 may be implemented using 32,500 logic gates, including the memory used to store the transformed frame data.
- The frequency range may be determined based on empirical data or modeling, or it may be customized for a particular user using a learning algorithm.
- The lower end of the frequency range may be selected to exclude frequencies associated with interfering noise.
- The lower end of the frequency range may be selected based on estimated or measured noise for a particular application (e.g., background noise typical of cellphone use, road noise for in-vehicle use, etc.).
- The frequency range or cut-off frequencies may be dynamically adjusted based on ongoing periodic measures of ambient noise.
- the frequency band for the range of frequencies is between
- The bandwidth and boundary frequencies may be greater or smaller, depending on the requirements of the particular application.
- The lower end of the frequency range may be increased from 1.4 kHz to 1.5 kHz to exclude problematic noise at lower frequencies.
- The Nyquist rate criterion would be satisfied for these frequency ranges when sampled at 8 kHz.
- For greater bandwidths, a higher sampling rate may be required to satisfy the Nyquist rate criterion, if desired.
- A determination of the power metrics at several frequencies within a frequency range between approximately 1.5 kHz and approximately 3.0 kHz has been found to be suitable for some applications. In one embodiment, power metrics are determined for not more than five frequency bins in this range.
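The power-metric step described in the snippets above can be illustrated with a minimal Python sketch. It is not the patented implementation. The frame length of 64 samples, the five bin frequencies, and all function names are assumptions chosen for illustration; only the 8 kHz sampling rate, the roughly 1.5 kHz to 3.0 kHz range, and the limit of five bins come from the text. The coefficients are computed once and reused for every frame, which is one plausible reading of "constant coefficients".

```python
import cmath
import math

SAMPLE_RATE_HZ = 8000   # sampling rate mentioned in the text
FRAME_SIZE = 64         # assumed frame length (not stated in the source)
# Assumed choice of five bin frequencies inside the 1.5-3.0 kHz range.
BIN_FREQS_HZ = [1500.0, 1875.0, 2250.0, 2625.0, 3000.0]


def satisfies_nyquist(max_freq_hz, sample_rate_hz):
    """True when the sampling rate is at least twice the highest frequency."""
    return sample_rate_hz >= 2.0 * max_freq_hz


def make_coefficients(freqs_hz, frame_size, sample_rate_hz):
    """Precompute the constant DFT coefficients exp(-j*2*pi*f*n/fs) per frequency."""
    return [
        [cmath.exp(-2j * math.pi * f * n / sample_rate_hz) for n in range(frame_size)]
        for f in freqs_hz
    ]


def power_metrics(frame, coefficients):
    """Return one power value per frequency: the squared magnitude of the
    complex DFT result at that frequency."""
    powers = []
    for coeffs in coefficients:
        acc = sum(x * c for x, c in zip(frame, coeffs))  # complex DFT result
        powers.append(acc.real ** 2 + acc.imag ** 2)
    return powers


# Coefficients are built once and reused for every frame.
COEFFS = make_coefficients(BIN_FREQS_HZ, FRAME_SIZE, SAMPLE_RATE_HZ)

# Hypothetical test frame: a pure 2250 Hz tone.
tone = [math.sin(2 * math.pi * 2250.0 * n / SAMPLE_RATE_HZ) for n in range(FRAME_SIZE)]
metrics = power_metrics(tone, COEFFS)
```

In this sketch the tone's energy lands in the third bin, and `satisfies_nyquist(3000.0, 8000)` confirms that the 3.0 kHz upper limit is below half the 8 kHz sampling rate. Evaluating only a few bins directly avoids computing a full transform, which fits the low-complexity goal stated in the abstract.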
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Telephone Function (AREA)
Abstract
Methods, systems, and apparatuses for a low-complexity acoustic activity detector are described. A method includes first forming a sequence of frames by blocking digital data representative of acoustic activity. Then, for each frame, the method includes: determining a plurality of power metrics based on a conversion of the frame data from the time domain to the frequency domain using a discrete Fourier transform having constant coefficients dependent on a plurality of selected frequencies within a voice frequency range; determining a plurality of signal-to-noise ratios of each power metric to a corresponding noise metric; determining one or more signal-to-noise ratios; and determining whether the digital data representative of the acoustic activity contains voice activity by determining whether the signal-to-noise ratio for each frame of a plurality of frames satisfies a criterion.
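The sequence of steps in the abstract can be sketched end to end in Python. This is a hedged illustration under stated assumptions, not the method as claimed. The abstract does not say how the per-frequency ratios are combined or what the criterion is, so taking the maximum ratio per frame, the threshold of 4.0, and the requirement of three consecutive frames are all assumptions. The function names and the fixed `noise_powers` input are hypothetical as well.

```python
import cmath
import math


def block_into_frames(samples, frame_size):
    """Split samples into consecutive non-overlapping frames; a trailing
    partial frame is dropped."""
    count = len(samples) // frame_size
    return [samples[i * frame_size:(i + 1) * frame_size] for i in range(count)]


def bin_powers(frame, freqs_hz, sample_rate_hz):
    """Squared DFT magnitude of the frame at each selected frequency."""
    powers = []
    for f in freqs_hz:
        acc = sum(x * cmath.exp(-2j * math.pi * f * n / sample_rate_hz)
                  for n, x in enumerate(frame))
        powers.append(abs(acc) ** 2)
    return powers


def detect_voice(samples, noise_powers, freqs_hz, sample_rate_hz=8000,
                 frame_size=64, snr_threshold=4.0, frames_required=3):
    """Return True once `frames_required` consecutive frames meet the SNR test.

    noise_powers: one positive noise metric per frequency (assumed to be given).
    The max-over-bins aggregation and the consecutive-frame criterion are
    illustrative assumptions, not taken from the source.
    """
    consecutive = 0
    for frame in block_into_frames(samples, frame_size):
        powers = bin_powers(frame, freqs_hz, sample_rate_hz)
        snrs = [p / n for p, n in zip(powers, noise_powers)]  # per-bin SNR
        frame_snr = max(snrs)                                  # assumed aggregation
        consecutive = consecutive + 1 if frame_snr >= snr_threshold else 0
        if consecutive >= frames_required:
            return True
    return False


# Hypothetical inputs: three selected frequencies and a flat noise estimate.
FREQS_HZ = [1500.0, 2250.0, 3000.0]
NOISE = [1.0, 1.0, 1.0]
tone = [math.sin(2 * math.pi * 2250.0 * n / 8000) for n in range(256)]
silence = [0.0] * 256
burst = tone[:128] + silence[:128]   # only two frames of signal

voiced = detect_voice(tone, NOISE, FREQS_HZ)
quiet = detect_voice(silence, NOISE, FREQS_HZ)
short = detect_voice(burst, NOISE, FREQS_HZ)
```

Requiring several frames in a row to pass is one simple way to stop a short noise burst from triggering detection. A real detector would also need to update the noise metrics over time, which this sketch leaves out.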
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201762458950P | 2017-02-14 | 2017-02-14 | |
US62/458,950 | 2017-02-14 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018152034A1 (fr) | 2018-08-23 |
Family
ID=63170427
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2018/017700 (WO2018152034A1, fr) | Voice activity detector and associated methods | 2017-02-14 | 2018-02-09 |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2018152034A1 (fr) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10360926B2 (en) | 2014-07-10 | 2019-07-23 | Analog Devices Global Unlimited Company | Low-complexity voice activity detection |
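CN112967738A (zh) * | 2021-02-01 | 2021-06-15 | 腾讯音乐娱乐科技(深圳)有限公司 | Human voice detection method and apparatus, electronic device, and computer-readable storage medium |
CN114283840A (zh) * | 2021-12-22 | 2022-04-05 | 天翼爱音乐文化科技有限公司 | Instruction audio generation method, system, apparatus, and storage medium |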
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5963901A (en) * | 1995-12-12 | 1999-10-05 | Nokia Mobile Phones Ltd. | Method and device for voice activity detection and a communication device |
US6453291B1 (en) * | 1999-02-04 | 2002-09-17 | Motorola, Inc. | Apparatus and method for voice activity detection in a communication system |
US20150243300A1 (en) * | 2007-02-26 | 2015-08-27 | Dolby Laboratories Licensing Corporation | Voice Activity Detector for Audio Signals |
- 2018-02-09 WO PCT/US2018/017700 patent/WO2018152034A1/fr active Application Filing
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10381021B2 (en) | Robust feature extraction using differential zero-crossing counts | |
US10867611B2 (en) | User programmable voice command recognition based on sparse features | |
US10535365B2 (en) | Analog voice activity detection | |
US9412373B2 (en) | Adaptive environmental context sample and update for comparing speech recognition | |
US9721560B2 (en) | Cloud based adaptive learning for distributed sensors | |
US9460720B2 (en) | Powering-up AFE and microcontroller after comparing analog and truncated sounds | |
CN110244833B (zh) | Microphone assembly | |
US20180268811A1 (en) | Apparatus and Method for Power Efficient Signal Conditioning For a Voice Recognition System | |
US20190355383A1 (en) | Low-complexity voice activity detection | |
US11087780B2 (en) | Analog voice activity detector systems and methods | |
US9406313B2 (en) | Adaptive microphone sampling rate techniques | |
US20150063575A1 (en) | Acoustic Sound Signature Detection Based on Sparse Features | |
CN111433737B (zh) | Electronic device and control method thereof | |
US11172312B2 (en) | Acoustic activity detecting microphone | |
CN104216677A (zh) | Low-power voice gate for device wake-up | |
US11264049B2 (en) | Systems and methods for capturing noise for pattern recognition processing | |
WO2018152034A1 (fr) | Voice activity detector and associated methods | |
US20160210051A1 (en) | Low Power Voice Trigger For Acoustic Apparatus And Method | |
CN116416979A (zh) | Cascaded audio detection system | |
CN114584907A (zh) | Digital microphone with low data rate interface | |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 18753695; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: PCT application non-entry in European phase | Ref document number: 18753695; Country of ref document: EP; Kind code of ref document: A1 |