+

Zhang et al., 2017 - Google Patents

Deep learning based binaural speech separation in reverberant environments

Zhang et al., 2017

View PDF
Document ID
12038619160248080329
Author
Zhang X
Wang D
Publication year
Publication venue
IEEE/ACM transactions on audio, speech, and language processing

External Links

Snippet

Speech signal is usually degraded by room reverberation and additive noises in real environments. This paper focuses on separating target speech signal in reverberant conditions from binaural inputs. Binaural separation is formulated as a supervised learning …
Continue reading at ieeexplore.ieee.org (PDF) (other versions)

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166Microphone arrays; Beamforming
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272Voice signal separating
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
    • G10L19/008Multichannel audio signal coding or decoding, i.e. using interchannel correlation to reduce redundancies, e.g. joint-stereo, intensity-coding, matrixing
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065Adaptation
    • G10L15/07Adaptation to the speaker
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R25/00Deaf-aid sets providing an auditory perception; Electric tinnitus maskers providing an auditory perception
    • H04R25/40Arrangements for obtaining a desired directivity characteristic
    • H04R25/407Circuits for combining signals of a plurality of transducers
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2225/00Details of deaf aids covered by H04R25/00, not provided for in any of its subgroups
    • H04R2225/43Signal processing in hearing aids to enhance the speech intelligibility
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04SSTEREOPHONIC SYSTEMS
    • H04S2400/00Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/15Aspects of sound capture and related signal processing for recording or reproduction

Similar Documents

Publication Publication Date Title
Zhang et al. Deep learning based binaural speech separation in reverberant environments
Vecchiotti et al. End-to-end binaural sound localisation from the raw waveform
Nguyen et al. Robust source counting and DOA estimation using spatial pseudo-spectrum and convolutional neural network
CN109830245B (en) A method and system for multi-speaker speech separation based on beamforming
Wang et al. Supervised speech separation based on deep learning: An overview
EP3707716B1 (en) Multi-channel speech separation
Taherian et al. Robust speaker recognition based on single-channel and multi-channel speech enhancement
Wang Time-frequency masking for speech separation and its potential for hearing aid design
Roman et al. Speech segregation based on sound localization
Pang et al. Multitask learning of time-frequency CNN for sound source localization
Wang et al. On spatial features for supervised speech separation and its application to beamforming and robust ASR
Pertilä et al. Distant speech separation using predicted time–frequency masks from spatial features
Richard et al. Audio signal processing in the 21st century: The important outcomes of the past 25 years
Dadvar et al. Robust binaural speech separation in adverse conditions based on deep neural network with modified spatial features and training target
Liu et al. Head‐related transfer function–reserved time‐frequency masking for robust binaural sound source localization
Laufer-Goldshtein et al. Global and local simplex representations for multichannel source separation
Pertilä et al. Time difference of arrival estimation with deep learning–from acoustic simulations to recorded data
Zhang et al. Binaural Reverberant Speech Separation Based on Deep Neural Networks.
Jing et al. End-to-end doa-guided speech extraction in noisy multi-talker scenarios
Gul et al. Preserving the beamforming effect for spatial cue-based pseudo-binaural dereverberation of a single source
Li et al. Beamformed feature for learning-based dual-channel speech separation
Zohny et al. Modelling interaural level and phase cues with Student's t-distribution for robust clustering in MESSL
Wood et al. Blind Speech Separation with GCC-NMF.
Pang et al. The SEUEE System for the CHiME-8 MMCSG Challenge
Li et al. Speech separation based on reliable binaural cues with two-stage neural network in noisy-reverberant environments
点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载