Zhang et al., 2017 - Google Patents

Deep learning based binaural speech separation in reverberant environments

- Document ID: 12038619160248080329
- Authors: Zhang X; Wang D
- Publication year: 2017
- Publication venue: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Snippet

Speech signals are usually degraded by room reverberation and additive noise in real environments. This paper focuses on separating the target speech signal from binaural inputs in reverberant conditions. Binaural separation is formulated as a supervised learning …
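The snippet describes framing binaural separation as supervised learning. A minimal sketch of the typical ingredients of such a pipeline, in Python: binaural spatial cues (interaural level and phase differences) as input features, and an ideal ratio mask (IRM) as the training target. The specific features and mask here are common choices in this literature and an assumption on my part, not necessarily the paper's exact configuration.

```python
import numpy as np

def stft(x, n_fft=256, hop=128):
    """Naive Hann-windowed STFT returning a (frames, bins) complex array.

    Illustrative only; a real system would use an optimized STFT.
    """
    win = np.hanning(n_fft)
    frames = [np.fft.rfft(win * x[s:s + n_fft])
              for s in range(0, len(x) - n_fft + 1, hop)]
    return np.array(frames)

def binaural_features(left, right, eps=1e-8):
    """Per time-frequency-unit spatial cues from the two ears.

    ILD: interaural level difference in dB; IPD: interaural phase
    difference in radians. These would be fed to a DNN mask estimator.
    """
    L, R = stft(left), stft(right)
    ild = 20.0 * np.log10((np.abs(L) + eps) / (np.abs(R) + eps))
    ipd = np.angle(L * np.conj(R))
    return ild, ipd

def ideal_ratio_mask(target, interference, eps=1e-8):
    """IRM training target: target energy over mixture energy per T-F unit."""
    S = np.abs(stft(target))
    N = np.abs(stft(interference))
    return S**2 / (S**2 + N**2 + eps)
```

At inference time, the estimated mask is applied to the mixture spectrogram and the waveform is resynthesized by inverse STFT; the mask-based formulation is what makes the problem supervised, since clean target and interference signals yield the IRM labels during training.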
Classifications

- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
          - G10L15/065—Adaptation
            - G10L15/07—Adaptation to the speaker
        - G10L15/08—Speech classification or search
      - G10L17/00—Speaker identification or verification
      - G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
        - G10L19/008—Multichannel audio signal coding or decoding, i.e. using interchannel correlation to reduce redundancies, e.g. joint-stereo, intensity-coding, matrixing
      - G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
          - G10L21/0208—Noise filtering
            - G10L21/0216—Noise filtering characterised by the method used for estimating noise
              - G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
              - G10L2021/02166—Microphone arrays; Beamforming
          - G10L21/0272—Voice signal separating
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
        - G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
- H—ELECTRICITY
  - H04—ELECTRIC COMMUNICATION TECHNIQUE
    - H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
      - H04R25/00—Deaf-aid sets providing an auditory perception; Electric tinnitus maskers providing an auditory perception
        - H04R25/40—Arrangements for obtaining a desired directivity characteristic
          - H04R25/407—Circuits for combining signals of a plurality of transducers
      - H04R2225/00—Details of deaf aids covered by H04R25/00, not provided for in any of its subgroups
        - H04R2225/43—Signal processing in hearing aids to enhance the speech intelligibility
    - H04S—STEREOPHONIC SYSTEMS
      - H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
        - H04S2400/15—Aspects of sound capture and related signal processing for recording or reproduction
Similar Documents
| Publication | Title |
|---|---|
| Zhang et al. | Deep learning based binaural speech separation in reverberant environments |
| Vecchiotti et al. | End-to-end binaural sound localisation from the raw waveform |
| Nguyen et al. | Robust source counting and DOA estimation using spatial pseudo-spectrum and convolutional neural network |
| CN109830245B (en) | A method and system for multi-speaker speech separation based on beamforming |
| Wang et al. | Supervised speech separation based on deep learning: An overview |
| EP3707716B1 (en) | Multi-channel speech separation |
| Taherian et al. | Robust speaker recognition based on single-channel and multi-channel speech enhancement |
| Wang | Time-frequency masking for speech separation and its potential for hearing aid design |
| Roman et al. | Speech segregation based on sound localization |
| Pang et al. | Multitask learning of time-frequency CNN for sound source localization |
| Wang et al. | On spatial features for supervised speech separation and its application to beamforming and robust ASR |
| Pertilä et al. | Distant speech separation using predicted time–frequency masks from spatial features |
| Richard et al. | Audio signal processing in the 21st century: The important outcomes of the past 25 years |
| Dadvar et al. | Robust binaural speech separation in adverse conditions based on deep neural network with modified spatial features and training target |
| Liu et al. | Head-related transfer function–reserved time-frequency masking for robust binaural sound source localization |
| Laufer-Goldshtein et al. | Global and local simplex representations for multichannel source separation |
| Pertilä et al. | Time difference of arrival estimation with deep learning–from acoustic simulations to recorded data |
| Zhang et al. | Binaural Reverberant Speech Separation Based on Deep Neural Networks |
| Jing et al. | End-to-end DOA-guided speech extraction in noisy multi-talker scenarios |
| Gul et al. | Preserving the beamforming effect for spatial cue-based pseudo-binaural dereverberation of a single source |
| Li et al. | Beamformed feature for learning-based dual-channel speech separation |
| Zohny et al. | Modelling interaural level and phase cues with Student's t-distribution for robust clustering in MESSL |
| Wood et al. | Blind speech separation with GCC-NMF |
| Pang et al. | The SEUEE system for the CHiME-8 MMCSG challenge |
| Li et al. | Speech separation based on reliable binaural cues with two-stage neural network in noisy-reverberant environments |