Zhang et al., 2015 - Google Patents
Deep neural network-based bottleneck feature and denoising autoencoder-based dereverberation for distant-talking speaker identification
- Document ID: 5266977311325190344
- Authors: Zhang Z; Wang L; Kai A; Yamada T; Li W; Iwahashi M
- Publication year: 2015
- Publication venue: EURASIP Journal on Audio, Speech, and Music Processing
Snippet
Deep neural network (DNN)-based approaches have been shown to be effective in many automatic speech recognition systems. However, few works have focused on DNNs for distant-talking speaker recognition. In this study, a bottleneck feature derived from a DNN …
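The title names two neural components: a DNN with a narrow "bottleneck" hidden layer whose activations serve as speaker features, and a denoising autoencoder (DAE) trained to map reverberant features toward their clean counterparts before identification. The sketch below illustrates that pairing in PyTorch; the layer sizes, sigmoid activations, and the 40-dimensional features spliced over 11 frames are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch only: architecture details are assumptions, not the
# paper's exact configuration.
import torch
import torch.nn as nn

class BottleneckDNN(nn.Module):
    """Feedforward DNN with a narrow bottleneck layer. After supervised
    training, the bottleneck activations serve as speaker-ID features."""
    def __init__(self, in_dim=40 * 11, bottleneck_dim=40, n_targets=1000):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 1024), nn.Sigmoid(),
            nn.Linear(1024, 1024), nn.Sigmoid(),
            nn.Linear(1024, bottleneck_dim),      # bottleneck layer
        )
        self.classifier = nn.Sequential(          # used only during training
            nn.Sigmoid(),
            nn.Linear(bottleneck_dim, 1024), nn.Sigmoid(),
            nn.Linear(1024, n_targets),
        )

    def forward(self, x):
        return self.classifier(self.encoder(x))

    def extract(self, x):
        """Return bottleneck features for downstream speaker modeling."""
        with torch.no_grad():
            return self.encoder(x)

class DenoisingAutoencoder(nn.Module):
    """Maps reverberant feature frames to clean targets; trained on
    parallel reverberant/clean data for feature-domain dereverberation."""
    def __init__(self, dim=40 * 11):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, 1024), nn.Sigmoid(),
            nn.Linear(1024, 1024), nn.Sigmoid(),
            nn.Linear(1024, dim),
        )

    def forward(self, reverberant):
        return self.net(reverberant)

# Usage: dereverberate spliced frames, then take bottleneck features.
frames = torch.randn(8, 40 * 11)                  # a batch of spliced frames
dae, dnn = DenoisingAutoencoder(), BottleneckDNN()
features = dnn.extract(dae(frames))               # shape: (8, 40)
```

In this cascade the DAE's output dimension matches the DNN's input, so feature-domain dereverberation and bottleneck extraction compose directly.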
Classifications
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
          - G10L21/0208—Noise filtering
            - G10L21/0216—Noise filtering characterised by the method used for estimating noise
              - G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
                - G10L2021/02166—Microphone arrays; Beamforming
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
          - G10L15/18—Speech classification or search using natural language modelling
            - G10L15/1822—Parsing for meaning understanding
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
          - G10L15/065—Adaptation
            - G10L15/07—Adaptation to the speaker
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
          - G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
            - G10L15/142—Hidden Markov Models [HMMs]
              - G10L15/144—Training of HMMs
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
        - G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L17/00—Speaker identification or verification
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
        - G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
          - G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
            - G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N3/00—Computer systems based on biological models
        - G06N3/02—Computer systems based on biological models using neural network models
Similar Documents
| Publication | Title |
|---|---|
| Zhang et al. | Deep neural network-based bottleneck feature and denoising autoencoder-based dereverberation for distant-talking speaker identification |
| Vincent et al. | Audio source separation and speech enhancement |
| Delcroix et al. | Strategies for distant speech recognition in reverberant environments |
| Xiao et al. | Speech dereverberation for enhancement and recognition using dynamic features constrained deep neural networks and feature adaptation |
| Nainan et al. | Enhancement in speaker recognition for optimized speech features using GMM, SVM and 1-D CNN |
| Nakatani et al. | Dominance based integration of spatial and spectral features for speech enhancement |
| Coto-Jiménez et al. | Improving automatic speech recognition containing additive noise using deep denoising autoencoders of LSTM networks |
| Ochiai et al. | Mask-based neural beamforming for moving speakers with self-attention-based tracking |
| Janský et al. | Auxiliary function-based algorithm for blind extraction of a moving speaker |
| Xiong et al. | Front-end technologies for robust ASR in reverberant environments—spectral enhancement-based dereverberation and auditory modulation filterbank features |
| Sainath et al. | Raw multichannel processing using deep neural networks |
| Phapatanaburi et al. | Distant-talking accent recognition by combining GMM and DNN |
| Delcroix et al. | Context adaptive neural network based acoustic models for rapid adaptation |
| Malek et al. | Block‐online multi‐channel speech enhancement using deep neural network‐supported relative transfer function estimates |
| Barhoush et al. | Speaker identification and localization using shuffled MFCC features and deep learning |
| Sivasankaran et al. | A combined evaluation of established and new approaches for speech recognition in varied reverberation conditions |
| Ueda et al. | Environment-dependent denoising autoencoder for distant-talking speech recognition |
| Alam et al. | Speech recognition in reverberant and noisy environments employing multiple feature extractors and i-vector speaker adaptation |
| Zhang et al. | Distant-talking speaker identification by generalized spectral subtraction-based dereverberation and its efficient computation |
| Ueda et al. | Single-channel dereverberation for distant-talking speech recognition by combining denoising autoencoder and temporal structure normalization |
| Sim et al. | Adaptation of deep neural network acoustic models for robust automatic speech recognition |
| Ren et al. | Combination of bottleneck feature extraction and dereverberation for distant-talking speech recognition |
| Nguyen et al. | Feature adaptation using linear spectro-temporal transform for robust speech recognition |
| Nugraha et al. | Single-channel dereverberation by feature mapping using cascade neural networks for robust distant speaker identification and speech recognition |
| Bawa et al. | Developing sequentially trained robust Punjabi speech recognition system under matched and mismatched conditions |