Zhang et al., 2017 - Google Patents

Deep learning based binaural speech separation in reverberant environments

- Document ID: 12038619160248080329
- Authors: Zhang X; Wang D
- Publication year: 2017
- Publication venue: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Snippet

Speech signals are usually degraded by room reverberation and additive noise in real environments. This paper focuses on separating the target speech signal from binaural inputs in reverberant conditions. Binaural separation is formulated as a supervised learning …
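The snippet describes framing binaural separation as supervised learning. A minimal sketch of the typical ingredients of such a pipeline, in Python: binaural spatial cues (interaural level and phase differences) as input features, and an ideal ratio mask (IRM) as the training target. The specific features and mask here are common choices in this literature and an assumption on my part, not necessarily the paper's exact configuration.

```python
import numpy as np

def stft(x, n_fft=256, hop=128):
    """Naive Hann-windowed STFT returning a (frames, bins) complex array.

    Illustrative only; a real system would use an optimized STFT.
    """
    win = np.hanning(n_fft)
    frames = [np.fft.rfft(win * x[s:s + n_fft])
              for s in range(0, len(x) - n_fft + 1, hop)]
    return np.array(frames)

def binaural_features(left, right, eps=1e-8):
    """Per time-frequency-unit spatial cues from the two ears.

    ILD: interaural level difference in dB; IPD: interaural phase
    difference in radians. These would be fed to a DNN mask estimator.
    """
    L, R = stft(left), stft(right)
    ild = 20.0 * np.log10((np.abs(L) + eps) / (np.abs(R) + eps))
    ipd = np.angle(L * np.conj(R))
    return ild, ipd

def ideal_ratio_mask(target, interference, eps=1e-8):
    """IRM training target: target energy over mixture energy per T-F unit."""
    S = np.abs(stft(target))
    N = np.abs(stft(interference))
    return S**2 / (S**2 + N**2 + eps)
```

At inference time, the estimated mask is applied to the mixture spectrogram and the waveform is resynthesized by inverse STFT; the mask-based formulation is what makes the problem supervised, since clean target and interference signals yield the IRM labels during training.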
Classifications

- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
          - G10L15/065—Adaptation
            - G10L15/07—Adaptation to the speaker
        - G10L15/08—Speech classification or search
      - G10L17/00—Speaker identification or verification
      - G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
        - G10L19/008—Multichannel audio signal coding or decoding, i.e. using interchannel correlation to reduce redundancies, e.g. joint-stereo, intensity-coding, matrixing
      - G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
          - G10L21/0208—Noise filtering
            - G10L21/0216—Noise filtering characterised by the method used for estimating noise
              - G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
              - G10L2021/02166—Microphone arrays; Beamforming
          - G10L21/0272—Voice signal separating
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
        - G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
- H—ELECTRICITY
  - H04—ELECTRIC COMMUNICATION TECHNIQUE
    - H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
      - H04R25/00—Deaf-aid sets providing an auditory perception; Electric tinnitus maskers providing an auditory perception
        - H04R25/40—Arrangements for obtaining a desired directivity characteristic
          - H04R25/407—Circuits for combining signals of a plurality of transducers
      - H04R2225/00—Details of deaf aids covered by H04R25/00, not provided for in any of its subgroups
        - H04R2225/43—Signal processing in hearing aids to enhance the speech intelligibility
    - H04S—STEREOPHONIC SYSTEMS
      - H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
        - H04S2400/15—Aspects of sound capture and related signal processing for recording or reproduction
Similar Documents
| Publication | Title |
|---|---|
| Zhang et al. | Deep learning based binaural speech separation in reverberant environments |
| Vecchiotti et al. | End-to-end binaural sound localisation from the raw waveform |
| Nguyen et al. | Robust source counting and DOA estimation using spatial pseudo-spectrum and convolutional neural network |
| CN109830245B (en) | A method and system for multi-speaker speech separation based on beamforming |
| Wang et al. | Supervised speech separation based on deep learning: An overview |
| EP3707716B1 (en) | Multi-channel speech separation |
| Taherian et al. | Robust speaker recognition based on single-channel and multi-channel speech enhancement |
| Wang | Time-frequency masking for speech separation and its potential for hearing aid design |
| Roman et al. | Speech segregation based on sound localization |
| Pang et al. | Multitask learning of time-frequency CNN for sound source localization |
| Wang et al. | On spatial features for supervised speech separation and its application to beamforming and robust ASR |
| Pertilä et al. | Distant speech separation using predicted time–frequency masks from spatial features |
| Richard et al. | Audio signal processing in the 21st century: The important outcomes of the past 25 years |
| Dadvar et al. | Robust binaural speech separation in adverse conditions based on deep neural network with modified spatial features and training target |
| Liu et al. | Head-related transfer function–reserved time-frequency masking for robust binaural sound source localization |
| Laufer-Goldshtein et al. | Global and local simplex representations for multichannel source separation |
| Pertilä et al. | Time difference of arrival estimation with deep learning–from acoustic simulations to recorded data |
| Zhang et al. | Binaural Reverberant Speech Separation Based on Deep Neural Networks |
| Jing et al. | End-to-end DOA-guided speech extraction in noisy multi-talker scenarios |
| Gul et al. | Preserving the beamforming effect for spatial cue-based pseudo-binaural dereverberation of a single source |
| Li et al. | Beamformed feature for learning-based dual-channel speech separation |
| Zohny et al. | Modelling interaural level and phase cues with Student's t-distribution for robust clustering in MESSL |
| Wood et al. | Blind speech separation with GCC-NMF |
| Pang et al. | The SEUEE system for the CHiME-8 MMCSG challenge |
| Li et al. | Speech separation based on reliable binaural cues with two-stage neural network in noisy-reverberant environments |