Akhtar et al., 2025 - Google Patents
UrduSER: A comprehensive dataset for speech emotion recognition in Urdu language
- Document ID
- 18042793215681789615
- Author
- Akhtar M
- Jahangir R
- Ain Q
- Nauman M
- Uddin M
- Ullah S
- Publication year
- 2025
- Publication venue
- Data in Brief
Snippet
Abstract: Speech Emotion Recognition (SER) is a rapidly evolving field of research that aims to identify and categorize emotional states through speech signal analysis. As SER holds considerable socio-cultural and business significance, researchers are increasingly …
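The record above is bibliographic, but as a rough illustration of the task the abstract describes, the sketch below shows one common SER baseline: summarizing each clip with MFCC features and training a classifier on the emotion labels. This is not taken from the paper; the directory layout (`UrduSER/<emotion>/<clip>.wav`), the choice of MFCC features via librosa, and the SVM classifier are all illustrative assumptions.

```python
# Minimal SER baseline sketch (illustrative only, not the authors' method).
# Assumes clips are stored as UrduSER/<emotion_label>/<clip>.wav.
import glob
import os

import librosa
import numpy as np
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def extract_mfcc(path, sr=16000, n_mfcc=40):
    """Load one clip and summarize it as the mean MFCC vector."""
    signal, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)


features, labels = [], []
for wav in glob.glob("UrduSER/*/*.wav"):  # hypothetical directory layout
    features.append(extract_mfcc(wav))
    labels.append(os.path.basename(os.path.dirname(wav)))  # folder name = emotion

X_train, X_test, y_train, y_test = train_test_split(
    np.array(features), labels, test_size=0.2, stratify=labels, random_state=0
)
clf = SVC(kernel="rbf").fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```

Any feature set or model could stand in here; the point is only that a labeled emotional speech corpus such as UrduSER is what makes supervised pipelines of this kind possible.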
Classifications
- G10L15/1822—Parsing for meaning understanding (speech classification or search using natural language modelling)
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters (handling natural language data)
- G06F17/2765—Recognition (automatic analysis of natural language data)
- G06F17/2705—Parsing (automatic analysis of natural language data)
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; recognition of animal voices
- G06F17/30017—Multimedia data retrieval; retrieval of more than one type of audiovisual media
- G10L21/10—Transformation of speech into a non-audible representation, transforming into visible information
- G10L25/66—Speech or voice analysis for extracting parameters related to health condition
- G10L21/013—Adapting to target pitch (changing voice quality, e.g. pitch or formants)
- G10L15/26—Speech to text systems
- G10L13/00—Speech synthesis; text to speech systems
- G06F3/16—Sound input; sound output
- G09B5/065—Combinations of audio and video presentations, e.g. videotapes, videodiscs, television systems
Similar Documents
| Publication | Title |
|---|---|
| Bigi | SPPAS: multi-lingual approaches to the automatic annotation of speech |
| Durand et al. | The Oxford Handbook of Corpus Phonology |
| CN107464555B (en) | Method, computing device and medium for enhancing audio data including speech |
| Cole et al. | New methods for prosodic transcription: Capturing variability as a source of information |
| Douglas-Cowie et al. | Emotional speech: Towards a new generation of databases |
| Schmidt | EXMARaLDA and the FOLK tools – two toolsets for transcribing and annotating spoken language |
| US11282508B2 (en) | System and a method for speech analysis |
| Priego-Valverde et al. | Is smiling during humor so obvious? A cross-cultural comparison of smiling behavior in humorous sequences in American English and French interactions |
| Das et al. | BanglaSER: A speech emotion recognition dataset for the Bangla language |
| Szekrényes | Annotation and interpretation of prosodic data in the HuComTech corpus for multimodal user interfaces |
| Kleinberger et al. | Voice at NIME: A taxonomy of new interfaces for vocal musical expression |
| Ibrahim et al. | Development of Hausa dataset: a baseline for speech recognition |
| Liesenfeld et al. | Building and curating conversational corpora for diversity-aware language science and technology |
| Bai et al. | AudioSetCaps: An enriched audio-caption dataset using automated generation pipeline with large audio and language models |
| Bartesaghi | Theories and practices of transcription from discourse analysis |
| Akhtar et al. | UrduSER: A comprehensive dataset for speech emotion recognition in Urdu language |
| Taj et al. | Urdu speech emotion recognition: A systematic literature review |
| Tits et al. | Emotional speech datasets for English speech synthesis purpose: A review |
| Arawjo et al. | TypeTalker: A speech synthesis-based multi-modal commenting system |
| Gnevsheva | Studying sociophonetics in second languages |
| Rodrigues et al. | Emotion detection throughout the speech |
| Hazel et al. | Enhancing the natural conversation experience through conversation analysis: a design method |
| Zhao et al. | Chipola: A Chinese podcast lexical database for capturing spoken language nuances and predicting behavioral data |
| Zhao et al. | Recognition of weird tone in Chinese communication and improvement of language understanding for AI |
| Raso et al. | Introduction: Spoken corpora and linguistic studies: Problems and perspectives |