Zhang et al., 2015 - Google Patents
Deep neural network-based bottleneck feature and denoising autoencoder-based dereverberation for distant-talking speaker identification
- Document ID: 5266977311325190344
- Authors: Zhang Z; Wang L; Kai A; Yamada T; Li W; Iwahashi M
- Publication year: 2015
- Publication venue: EURASIP Journal on Audio, Speech, and Music Processing
Snippet
Deep neural network (DNN)-based approaches have been shown to be effective in many automatic speech recognition systems. However, few works have focused on DNNs for distant-talking speaker recognition. In this study, a bottleneck feature derived from a DNN …
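The title names two neural components: a DNN with a narrow "bottleneck" hidden layer whose activations serve as speaker features, and a denoising autoencoder (DAE) trained to map reverberant features toward their clean counterparts before identification. The sketch below illustrates that pairing in PyTorch; the layer sizes, sigmoid activations, and the 40-dimensional features spliced over 11 frames are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch only: architecture details are assumptions, not the
# paper's exact configuration.
import torch
import torch.nn as nn

class BottleneckDNN(nn.Module):
    """Feedforward DNN with a narrow bottleneck layer. After supervised
    training, the bottleneck activations serve as speaker-ID features."""
    def __init__(self, in_dim=40 * 11, bottleneck_dim=40, n_targets=1000):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 1024), nn.Sigmoid(),
            nn.Linear(1024, 1024), nn.Sigmoid(),
            nn.Linear(1024, bottleneck_dim),      # bottleneck layer
        )
        self.classifier = nn.Sequential(          # used only during training
            nn.Sigmoid(),
            nn.Linear(bottleneck_dim, 1024), nn.Sigmoid(),
            nn.Linear(1024, n_targets),
        )

    def forward(self, x):
        return self.classifier(self.encoder(x))

    def extract(self, x):
        """Return bottleneck features for downstream speaker modeling."""
        with torch.no_grad():
            return self.encoder(x)

class DenoisingAutoencoder(nn.Module):
    """Maps reverberant feature frames to clean targets; trained on
    parallel reverberant/clean data for feature-domain dereverberation."""
    def __init__(self, dim=40 * 11):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, 1024), nn.Sigmoid(),
            nn.Linear(1024, 1024), nn.Sigmoid(),
            nn.Linear(1024, dim),
        )

    def forward(self, reverberant):
        return self.net(reverberant)

# Usage: dereverberate spliced frames, then take bottleneck features.
frames = torch.randn(8, 40 * 11)                  # a batch of spliced frames
dae, dnn = DenoisingAutoencoder(), BottleneckDNN()
features = dnn.extract(dae(frames))               # shape: (8, 40)
```

In this cascade the DAE's output dimension matches the DNN's input, so feature-domain dereverberation and bottleneck extraction compose directly.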
Classifications
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
          - G10L21/0208—Noise filtering
            - G10L21/0216—Noise filtering characterised by the method used for estimating noise
              - G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
                - G10L2021/02166—Microphone arrays; Beamforming
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
          - G10L15/18—Speech classification or search using natural language modelling
            - G10L15/1822—Parsing for meaning understanding
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
          - G10L15/065—Adaptation
            - G10L15/07—Adaptation to the speaker
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
          - G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
            - G10L15/142—Hidden Markov Models [HMMs]
              - G10L15/144—Training of HMMs
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
        - G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L17/00—Speaker identification or verification
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
        - G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
          - G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
            - G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N3/00—Computer systems based on biological models
        - G06N3/02—Computer systems based on biological models using neural network models
Similar Documents
| Publication | Title |
|---|---|
| Zhang et al. | Deep neural network-based bottleneck feature and denoising autoencoder-based dereverberation for distant-talking speaker identification |
| Vincent et al. | Audio source separation and speech enhancement |
| Delcroix et al. | Strategies for distant speech recognition in reverberant environments |
| Xiao et al. | Speech dereverberation for enhancement and recognition using dynamic features constrained deep neural networks and feature adaptation |
| Nainan et al. | Enhancement in speaker recognition for optimized speech features using GMM, SVM and 1-D CNN |
| Nakatani et al. | Dominance based integration of spatial and spectral features for speech enhancement |
| Coto-Jiménez et al. | Improving automatic speech recognition containing additive noise using deep denoising autoencoders of LSTM networks |
| Ochiai et al. | Mask-based neural beamforming for moving speakers with self-attention-based tracking |
| Janský et al. | Auxiliary function-based algorithm for blind extraction of a moving speaker |
| Xiong et al. | Front-end technologies for robust ASR in reverberant environments—spectral enhancement-based dereverberation and auditory modulation filterbank features |
| Sainath et al. | Raw multichannel processing using deep neural networks |
| Phapatanaburi et al. | Distant-talking accent recognition by combining GMM and DNN |
| Delcroix et al. | Context adaptive neural network based acoustic models for rapid adaptation |
| Malek et al. | Block‐online multi‐channel speech enhancement using deep neural network‐supported relative transfer function estimates |
| Barhoush et al. | Speaker identification and localization using shuffled MFCC features and deep learning |
| Sivasankaran et al. | A combined evaluation of established and new approaches for speech recognition in varied reverberation conditions |
| Ueda et al. | Environment-dependent denoising autoencoder for distant-talking speech recognition |
| Alam et al. | Speech recognition in reverberant and noisy environments employing multiple feature extractors and i-vector speaker adaptation |
| Zhang et al. | Distant-talking speaker identification by generalized spectral subtraction-based dereverberation and its efficient computation |
| Ueda et al. | Single-channel dereverberation for distant-talking speech recognition by combining denoising autoencoder and temporal structure normalization |
| Sim et al. | Adaptation of deep neural network acoustic models for robust automatic speech recognition |
| Ren et al. | Combination of bottleneck feature extraction and dereverberation for distant-talking speech recognition |
| Nguyen et al. | Feature adaptation using linear spectro-temporal transform for robust speech recognition |
| Nugraha et al. | Single-channel dereverberation by feature mapping using cascade neural networks for robust distant speaker identification and speech recognition |
| Bawa et al. | Developing sequentially trained robust Punjabi speech recognition system under matched and mismatched conditions |