Wang et al., 2020 - Google Patents
Complex spectral mapping for single- and multi-channel speech enhancement and robust ASR
- Document ID
- 6818794445199689708
- Author
- Wang Z
- Wang P
- Wang D
- Publication year
- 2020
- Publication venue
- IEEE/ACM transactions on audio, speech, and language processing
Snippet
This study proposes a complex spectral mapping approach for single- and multi-channel speech enhancement, where deep neural networks (DNNs) are used to predict the real and imaginary (RI) components of the direct-path signal from noisy and reverberant ones. The …
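The snippet describes the core idea: a DNN takes the real and imaginary (RI) components of the noisy signal's STFT and predicts the RI components of the direct-path (clean) signal, which are then converted back to a waveform by inverse STFT. Below is a minimal NumPy sketch of that pipeline, with a placeholder `dnn` callable standing in for the trained network; the function names, window, and hop size here are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def stft(x, n_fft=256, hop=128):
    # Frame the signal, apply a Hann window, and FFT each frame.
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft + 1, hop)]
    return np.fft.rfft(np.stack(frames), axis=-1)  # (frames, bins), complex

def istft(S, n_fft=256, hop=128):
    # Weighted overlap-add with the same Hann window, normalized by
    # the accumulated squared window so overlapping frames sum to the signal.
    win = np.hanning(n_fft)
    out = np.zeros(hop * (len(S) - 1) + n_fft)
    norm = np.zeros_like(out)
    for i, frame in enumerate(np.fft.irfft(S, n=n_fft, axis=-1)):
        out[i * hop:i * hop + n_fft] += frame * win
        norm[i * hop:i * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)

def complex_spectral_mapping(noisy, dnn):
    # Stack RI components as the network input, let the network predict
    # the RI components of the direct-path signal, then resynthesize.
    S = stft(noisy)
    feats = np.stack([S.real, S.imag], axis=-1)  # (frames, bins, 2)
    ri_hat = dnn(feats)                          # predicted RI components
    S_hat = ri_hat[..., 0] + 1j * ri_hat[..., 1]
    return istft(S_hat)
```

With an identity `dnn`, the pipeline reduces to an STFT round trip, which is a useful sanity check that the analysis/synthesis stages reconstruct the input; in the actual approach, the network is trained so that its RI output approximates the clean direct-path spectrogram.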
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding, i.e. using interchannel correlation to reduce redundancies, e.g. joint-stereo, intensity-coding, matrixing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis using predictive techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G10L21/013—Adapting to target pitch
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
Similar Documents
| Publication | Title |
|---|---|
| Wang et al. | Complex spectral mapping for single- and multi-channel speech enhancement and robust ASR |
| Wang et al. | Multi-microphone complex spectral mapping for utterance-wise and continuous speech separation |
| Wang et al. | Deep learning based target cancellation for speech dereverberation |
| Tan et al. | Neural spectrospatial filtering |
| Zhao et al. | Two-stage deep learning for noisy-reverberant speech enhancement |
| Wang et al. | TF-GridNet: Integrating full- and sub-band modeling for speech separation |
| Taherian et al. | Robust speaker recognition based on single-channel and multi-channel speech enhancement |
| Kinoshita et al. | A summary of the REVERB challenge: state-of-the-art and remaining challenges in reverberant speech processing research |
| Wang et al. | STFT-domain neural speech enhancement with very low algorithmic latency |
| Zhao et al. | A two-stage algorithm for noisy and reverberant speech enhancement |
| Li et al. | Multichannel speech enhancement based on time-frequency masking using subband long short-term memory |
| JP5738020B2 (en) | Speech recognition apparatus and speech recognition method |
| Xiao et al. | Speech dereverberation for enhancement and recognition using dynamic features constrained deep neural networks and feature adaptation |
| Xiao et al. | The NTU-ADSC systems for reverberation challenge 2014 |
| Wang et al. | On spatial features for supervised speech separation and its application to beamforming and robust ASR |
| Yamamoto et al. | Enhanced robot speech recognition based on microphone array source separation and missing feature theory |
| Zhang et al. | Multi-channel multi-frame ADL-MVDR for target speech separation |
| Mohammadiha et al. | Speech dereverberation using non-negative convolutive transfer function and spectro-temporal modeling |
| Nakatani et al. | Dominance based integration of spatial and spectral features for speech enhancement |
| Boeddeker et al. | Convolutive transfer function invariant SDR training criteria for multi-channel reverberant speech separation |
| O'Shaughnessy | Speech enhancement—A review of modern methods |
| Subramanian et al. | An investigation of end-to-end multichannel speech recognition for reverberant and mismatch conditions |
| Liu et al. | Inplace gated convolutional recurrent neural network for dual-channel speech enhancement |
| Kim et al. | Factorized MVDR deep beamforming for multi-channel speech enhancement |
| Nesta et al. | A flexible spatial blind source extraction framework for robust speech recognition in noisy environments |