Wang et al., 2020 - Google Patents
Complex spectral mapping for single- and multi-channel speech enhancement and robust ASR
- Document ID
- 6818794445199689708
- Author
- Wang Z
- Wang P
- Wang D
- Publication year
- 2020
- Publication venue
- IEEE/ACM transactions on audio, speech, and language processing
Snippet
This study proposes a complex spectral mapping approach for single- and multi-channel speech enhancement, where deep neural networks (DNNs) are used to predict the real and imaginary (RI) components of the direct-path signal from noisy and reverberant ones. The …
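The snippet describes the core idea: a DNN takes the real and imaginary (RI) components of the noisy signal's STFT and predicts the RI components of the direct-path (clean) signal, which are then converted back to a waveform by inverse STFT. Below is a minimal NumPy sketch of that pipeline, with a placeholder `dnn` callable standing in for the trained network; the function names, window, and hop size here are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def stft(x, n_fft=256, hop=128):
    # Frame the signal, apply a Hann window, and FFT each frame.
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft + 1, hop)]
    return np.fft.rfft(np.stack(frames), axis=-1)  # (frames, bins), complex

def istft(S, n_fft=256, hop=128):
    # Weighted overlap-add with the same Hann window, normalized by
    # the accumulated squared window so overlapping frames sum to the signal.
    win = np.hanning(n_fft)
    out = np.zeros(hop * (len(S) - 1) + n_fft)
    norm = np.zeros_like(out)
    for i, frame in enumerate(np.fft.irfft(S, n=n_fft, axis=-1)):
        out[i * hop:i * hop + n_fft] += frame * win
        norm[i * hop:i * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)

def complex_spectral_mapping(noisy, dnn):
    # Stack RI components as the network input, let the network predict
    # the RI components of the direct-path signal, then resynthesize.
    S = stft(noisy)
    feats = np.stack([S.real, S.imag], axis=-1)  # (frames, bins, 2)
    ri_hat = dnn(feats)                          # predicted RI components
    S_hat = ri_hat[..., 0] + 1j * ri_hat[..., 1]
    return istft(S_hat)
```

With an identity `dnn`, the pipeline reduces to an STFT round trip, which is a useful sanity check that the analysis/synthesis stages reconstruct the input; in the actual approach, the network is trained so that its RI output approximates the clean direct-path spectrogram.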
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding, i.e. using interchannel correlation to reduce redundancies, e.g. joint-stereo, intensity-coding, matrixing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis using predictive techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G10L21/013—Adapting to target pitch
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
Similar Documents
| Publication | Title |
|---|---|
| Wang et al. | Complex spectral mapping for single- and multi-channel speech enhancement and robust ASR |
| Wang et al. | Multi-microphone complex spectral mapping for utterance-wise and continuous speech separation |
| Wang et al. | Deep learning based target cancellation for speech dereverberation |
| Tan et al. | Neural spectrospatial filtering |
| Zhao et al. | Two-stage deep learning for noisy-reverberant speech enhancement |
| Wang et al. | TF-GridNet: Integrating full- and sub-band modeling for speech separation |
| Taherian et al. | Robust speaker recognition based on single-channel and multi-channel speech enhancement |
| Kinoshita et al. | A summary of the REVERB challenge: state-of-the-art and remaining challenges in reverberant speech processing research |
| Wang et al. | STFT-domain neural speech enhancement with very low algorithmic latency |
| Zhao et al. | A two-stage algorithm for noisy and reverberant speech enhancement |
| Li et al. | Multichannel speech enhancement based on time-frequency masking using subband long short-term memory |
| JP5738020B2 (en) | Speech recognition apparatus and speech recognition method |
| Xiao et al. | Speech dereverberation for enhancement and recognition using dynamic features constrained deep neural networks and feature adaptation |
| Xiao et al. | The NTU-ADSC systems for reverberation challenge 2014 |
| Wang et al. | On spatial features for supervised speech separation and its application to beamforming and robust ASR |
| Yamamoto et al. | Enhanced robot speech recognition based on microphone array source separation and missing feature theory |
| Zhang et al. | Multi-channel multi-frame ADL-MVDR for target speech separation |
| Mohammadiha et al. | Speech dereverberation using non-negative convolutive transfer function and spectro-temporal modeling |
| Nakatani et al. | Dominance based integration of spatial and spectral features for speech enhancement |
| Boeddeker et al. | Convolutive transfer function invariant SDR training criteria for multi-channel reverberant speech separation |
| O'Shaughnessy | Speech enhancement—A review of modern methods |
| Subramanian et al. | An investigation of end-to-end multichannel speech recognition for reverberant and mismatch conditions |
| Liu et al. | Inplace gated convolutional recurrent neural network for dual-channel speech enhancement |
| Kim et al. | Factorized MVDR deep beamforming for multi-channel speech enhancement |
| Nesta et al. | A flexible spatial blind source extraction framework for robust speech recognition in noisy environments |