Taherian et al., 2019 - Google Patents
Deep learning based multi-channel speaker recognition in noisy and reverberant environments
- Document ID
- 5892504555854448721
- Author
- Taherian H
- Wang Z
- Wang D
- Publication year
- 2019
- Publication venue
- Interspeech
Snippet
Despite successful applications of multi-channel signal processing in robust automatic speech recognition (ASR), relatively little research has been conducted on the effectiveness of such techniques in the robust speaker recognition domain. This paper introduces time …
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/005—Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding, i.e. using interchannel correlation to reduce redundancies, e.g. joint-stereo, intensity-coding, matrixing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R25/00—Deaf-aid sets providing an auditory perception; Electric tinnitus maskers providing an auditory perception
- H04R25/40—Arrangements for obtaining a desired directivity characteristic
- H04R25/407—Circuits for combining signals of a plurality of transducers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
- H04S7/301—Automatic calibration of stereophonic sound system, e.g. with test microphone
Similar Documents
| Publication | Title |
|---|---|
| Taherian et al. | Deep learning based multi-channel speaker recognition in noisy and reverberant environments |
| Taherian et al. | Robust speaker recognition based on single-channel and multi-channel speech enhancement |
| Tan et al. | Neural spectrospatial filtering |
| Wang et al. | Complex spectral mapping for single- and multi-channel speech enhancement and robust ASR |
| Yoshioka et al. | Multi-microphone neural speech separation for far-field multi-talker speech recognition |
| Kinoshita et al. | A summary of the REVERB challenge: state-of-the-art and remaining challenges in reverberant speech processing research |
| Wang et al. | All-neural multi-channel speech enhancement |
| Wang et al. | Sequential multi-frame neural beamforming for speech separation and enhancement |
| Ren et al. | A causal U-Net based neural beamforming network for real-time multi-channel speech enhancement |
| US20110096915A1 (en) | Audio spatialization for conference calls with multiple and moving talkers |
| CN114078481B (en) | Voice enhancement method and device based on two-channel neural network time-frequency masking and hearing aid equipment |
| US20100217590A1 (en) | Speaker localization system and method |
| Gu et al. | ReZero: Region-customizable sound extraction |
| Xiao et al. | The NTU-ADSC systems for reverberation challenge 2014 |
| Araki et al. | Spatial correlation model based observation vector clustering and MVDR beamforming for meeting recognition |
| Koldovský et al. | Semi-blind noise extraction using partially known position of the target source |
| Habets | Speech dereverberation using statistical reverberation models |
| Zhang et al. | A deep learning approach to multi-channel and multi-microphone acoustic echo cancellation |
| Kim | Hearing aid speech enhancement using phase difference-controlled dual-microphone generalized sidelobe canceller |
| Koldovský et al. | Noise reduction in dual-microphone mobile phones using a bank of pre-measured target-cancellation filters |
| Kovalyov et al. | DSENet: Directional signal extraction network for hearing improvement on edge devices |
| Sivasankaran et al. | Analyzing the impact of speaker localization errors on speech separation for automatic speech recognition |
| Song et al. | An integrated multi-channel approach for joint noise reduction and dereverberation |
| Bohlender et al. | Neural networks using full-band and subband spatial features for mask based source separation |
| Masuyama et al. | Exploring the integration of speech separation and recognition with self-supervised learning representation |