Akhtar et al., 2025 - Google Patents
UrduSER: A comprehensive dataset for speech emotion recognition in Urdu language
- Document ID
- 18042793215681789615
- Author
- Akhtar M
- Jahangir R
- Ain Q
- Nauman M
- Uddin M
- Ullah S
- Publication year
- 2025
- Publication venue
- Data in Brief
Snippet
Abstract: Speech Emotion Recognition (SER) is a rapidly evolving field of research that aims to identify and categorize emotional states through speech signal analysis. As SER holds considerable socio-cultural and business significance, researchers are increasingly …
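The record above is bibliographic, but as a rough illustration of the task the abstract describes, the sketch below shows one common SER baseline: summarizing each clip with MFCC features and training a classifier on the emotion labels. This is not taken from the paper; the directory layout (`UrduSER/<emotion>/<clip>.wav`), the choice of MFCC features via librosa, and the SVM classifier are all illustrative assumptions.

```python
# Minimal SER baseline sketch (illustrative only, not the authors' method).
# Assumes clips are stored as UrduSER/<emotion_label>/<clip>.wav.
import glob
import os

import librosa
import numpy as np
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def extract_mfcc(path, sr=16000, n_mfcc=40):
    """Load one clip and summarize it as the mean MFCC vector."""
    signal, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)


features, labels = [], []
for wav in glob.glob("UrduSER/*/*.wav"):  # hypothetical directory layout
    features.append(extract_mfcc(wav))
    labels.append(os.path.basename(os.path.dirname(wav)))  # folder name = emotion

X_train, X_test, y_train, y_test = train_test_split(
    np.array(features), labels, test_size=0.2, stratify=labels, random_state=0
)
clf = SVC(kernel="rbf").fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```

Any feature set or model could stand in here; the point is only that a labeled emotional speech corpus such as UrduSER is what makes supervised pipelines of this kind possible.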
Classifications
- G10L15/1822—Parsing for meaning understanding (speech classification or search using natural language modelling)
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters (handling natural language data)
- G06F17/2765—Recognition (automatic analysis of natural language data)
- G06F17/2705—Parsing (automatic analysis of natural language data)
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; recognition of animal voices
- G06F17/30017—Multimedia data retrieval; retrieval of more than one type of audiovisual media
- G10L21/10—Transformation of speech into a non-audible representation, transforming into visible information
- G10L25/66—Speech or voice analysis for extracting parameters related to health condition
- G10L21/013—Adapting to target pitch (changing voice quality, e.g. pitch or formants)
- G10L15/26—Speech to text systems
- G10L13/00—Speech synthesis; text to speech systems
- G06F3/16—Sound input; sound output
- G09B5/065—Combinations of audio and video presentations, e.g. videotapes, videodiscs, television systems
Similar Documents
| Publication | Title |
|---|---|
| Bigi | SPPAS: multi-lingual approaches to the automatic annotation of speech |
| Durand et al. | The Oxford Handbook of Corpus Phonology |
| CN107464555B (en) | Method, computing device and medium for enhancing audio data including speech |
| Cole et al. | New methods for prosodic transcription: Capturing variability as a source of information |
| Douglas-Cowie et al. | Emotional speech: Towards a new generation of databases |
| Schmidt | EXMARaLDA and the FOLK tools – two toolsets for transcribing and annotating spoken language |
| US11282508B2 (en) | System and a method for speech analysis |
| Priego-Valverde et al. | Is smiling during humor so obvious? A cross-cultural comparison of smiling behavior in humorous sequences in American English and French interactions |
| Das et al. | BanglaSER: A speech emotion recognition dataset for the Bangla language |
| Szekrényes | Annotation and interpretation of prosodic data in the HuComTech corpus for multimodal user interfaces |
| Kleinberger et al. | Voice at NIME: A taxonomy of new interfaces for vocal musical expression |
| Ibrahim et al. | Development of Hausa dataset: a baseline for speech recognition |
| Liesenfeld et al. | Building and curating conversational corpora for diversity-aware language science and technology |
| Bai et al. | AudioSetCaps: An enriched audio-caption dataset using automated generation pipeline with large audio and language models |
| Bartesaghi | Theories and practices of transcription from discourse analysis |
| Akhtar et al. | UrduSER: A comprehensive dataset for speech emotion recognition in Urdu language |
| Taj et al. | Urdu speech emotion recognition: A systematic literature review |
| Tits et al. | Emotional speech datasets for English speech synthesis purpose: A review |
| Arawjo et al. | TypeTalker: A speech synthesis-based multi-modal commenting system |
| Gnevsheva | Studying sociophonetics in second languages |
| Rodrigues et al. | Emotion detection throughout the speech |
| Hazel et al. | Enhancing the natural conversation experience through conversation analysis: a design method |
| Zhao et al. | Chipola: A Chinese podcast lexical database for capturing spoken language nuances and predicting behavioral data |
| Zhao et al. | Recognition of weird tone in Chinese communication and improvement of language understanding for AI |
| Raso et al. | Introduction: Spoken corpora and linguistic studies: Problems and perspectives |