Zhang et al., 2013 - Google Patents
Speech emotion recognition using combination of featuresZhang et al., 2013
View PDF- Document ID
- 4232024346061715937
- Author
- Zhang Q
- An N
- Wang K
- Ren F
- Li L
- Publication year
- Publication venue
- 2013 fourth international conference on intelligent control and information processing (ICICIP)
External Links
Snippet
In this paper, we study how speech features' numbers and statistical values impact recognition accuracy of emotions present in speech. With Gaussian Mixture Model (GMM), we identify two effective features, namely Mel Frequency Cepstrum Coefficients (MFCCs) …
- 230000001264 neutralization 0 abstract description 26
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
- G10L25/09—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters the extracted parameters being zero crossing rates
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/06—Decision making techniques; Pattern matching strategies
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/005—Speaker recognisers specially adapted for particular applications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/90—Pitch determination of speech signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Kabir et al. | A survey of speaker recognition: Fundamental theories, recognition methods and opportunities | |
| Zhang et al. | Speech emotion recognition using combination of features | |
| Hu et al. | Pitch‐based gender identification with two‐stage classification | |
| CN108986824B (en) | Playback voice detection method | |
| CN108597496A (en) | Voice generation method and device based on generation type countermeasure network | |
| Labied et al. | Automatic speech recognition features extraction techniques: A multi-criteria comparison | |
| CN117672268A (en) | Multi-mode voice emotion recognition method based on relative entropy alignment fusion | |
| KR20060110988A (en) | Recognition and genre classification method of musical instrument signal using Bayes method | |
| CN114360553B (en) | Method for improving voiceprint safety | |
| Zöhrer et al. | Representation learning for single-channel source separation and bandwidth extension | |
| Nawas et al. | Speaker recognition using random forest | |
| CN118398033A (en) | A speech-based emotion recognition method, system, device and storage medium | |
| Wang et al. | Ftdkd: Frequency-time domain knowledge distillation for low-quality compressed audio deepfake detection | |
| Lei et al. | Speaker Recognition Using Wavelet Packet Entropy, I‐Vector, and Cosine Distance Scoring | |
| Williams | Learning disentangled speech representations | |
| Chadha et al. | Optimal feature extraction and selection techniques for speech processing: A review | |
| Kamble et al. | Emotion recognition for instantaneous Marathi spoken words | |
| Raghib et al. | Emotion analysis and speech signal processing | |
| Ravindran et al. | Improving the noise-robustness of mel-frequency cepstral coefficients for speech processing | |
| Singh et al. | Novel feature extraction algorithm using DWT and temporal statistical techniques for word dependent speaker’s recognition | |
| Nazid et al. | Improved speaker-independent emotion recognition from speech using two-stage feature reduction | |
| Sunny et al. | Feature extraction methods based on linear predictive coding and wavelet packet decomposition for recognizing spoken words in malayalam | |
| Alhlffee | MFCC-Based Feature Extraction Model for Long Time Period Emotion Speech Using CNN. | |
| Dm et al. | Speech based emotion recognition using combination of features 2-D HMM model | |
| CN113971963B (en) | Speech audio analysis method and device, electronic device and readable storage medium |