+

Zhang et al., 2015 - Google Patents

Deep neural network-based bottleneck feature and denoising autoencoder-based dereverberation for distant-talking speaker identification

Zhang et al., 2015

View HTML @Full View
Document ID
5266977311325190344
Author
Zhang Z
Wang L
Kai A
Yamada T
Li W
Iwahashi M
Publication year
Publication venue
EURASIP Journal on Audio, Speech, and Music Processing

External Links

Snippet

Deep neural network (DNN)-based approaches have been shown to be effective in many automatic speech recognition systems. However, few works have focused on DNNs for distant-talking speaker recognition. In this study, a bottleneck feature derived from a DNN …
Continue reading at link.springer.com (HTML) (other versions)

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166Microphone arrays; Beamforming
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/18Speech classification or search using natural language modelling
    • G10L15/1822Parsing for meaning understanding
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065Adaptation
    • G10L15/07Adaptation to the speaker
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/14Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
    • G10L15/142Hidden Markov Models [HMMs]
    • G10L15/144Training of HMMs
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/66Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06NCOMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computer systems based on biological models
    • G06N3/02Computer systems based on biological models using neural network models

Similar Documents

Publication Publication Date Title
Zhang et al. Deep neural network-based bottleneck feature and denoising autoencoder-based dereverberation for distant-talking speaker identification
Vincent et al. Audio source separation and speech enhancement
Delcroix et al. Strategies for distant speech recognitionin reverberant environments
Xiao et al. Speech dereverberation for enhancement and recognition using dynamic features constrained deep neural networks and feature adaptation
Nainan et al. Enhancement in speaker recognition for optimized speech features using GMM, SVM and 1-D CNN
Nakatani et al. Dominance based integration of spatial and spectral features for speech enhancement
Coto-Jiménez et al. Improving automatic speech recognition containing additive noise using deep denoising autoencoders of LSTM networks
Ochiai et al. Mask-based neural beamforming for moving speakers with self-attention-based tracking
Janský et al. Auxiliary function-based algorithm for blind extraction of a moving speaker
Xiong et al. Front-end technologies for robust ASR in reverberant environments—spectral enhancement-based dereverberation and auditory modulation filterbank features
Sainath et al. Raw multichannel processing using deep neural networks
Phapatanaburi et al. Distant-talking accent recognition by combining GMM and DNN
Delcroix et al. Context adaptive neural network based acoustic models for rapid adaptation
Malek et al. Block‐online multi‐channel speech enhancement using deep neural network‐supported relative transfer function estimates
Barhoush et al. Speaker identification and localization using shuffled MFCC features and deep learning
Sivasankaran et al. A combined evaluation of established and new approaches for speech recognition in varied reverberation conditions
Ueda et al. Environment-dependent denoising autoencoder for distant-talking speech recognition
Alam et al. Speech recognition in reverberant and noisy environments employing multiple feature extractors and i-vector speaker adaptation
Zhang et al. Distant-talking speaker identification by generalized spectral subtraction-based dereverberation and its efficient computation
Ueda et al. Single-channel dereverberation for distant-talking speech recognition by combining denoising autoencoder and temporal structure normalization
Sim et al. Adaptation of deep neural network acoustic models for robust automatic speech recognition
Ren et al. Combination of bottleneck feature extraction and dereverberation for distant-talking speech recognition
Nguyen et al. Feature adaptation using linear spectro-temporal transform for robust speech recognition
Nugraha et al. Single-channel dereverberation by feature mapping using cascade neural networks for robust distant speaker identification and speech recognition
Bawa et al. Developing sequentially trained robust Punjabi speech recognition system under matched and mismatched conditions
点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载