Chen et al., 2020 - Google Patents

Using Taigi dramas with Mandarin Chinese subtitles to improve Taigi speech recognition

Chen et al., 2020

Document ID: 7052415600376253625
Author: Chen P; Wu C; Lee H; Tsao S; Ko M; Wang H
Publication year: 2020
Publication venue: 2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)

External Links

Cited by

Snippet

An obvious problem with automatic speech recognition (ASR) for Taigi is that the amount of training data is far from enough to build a practical ASR system. Collecting speech data with reliable transcripts for training the acoustic model (AM) is feasible but expensive. Moreover …

Continue reading at homepage.iis.sinica.edu.tw (PDF) (other versions)

241001672694 Citrus reticulata 0 title abstract description 39

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/187—Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/19—Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
- G10L15/197—Probabilistic grammars, e.g. word n-grams
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/085—Methods for reducing search complexity, pruning
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/144—Training of HMMs
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
- G10L15/265—Speech recognisers specially adapted for particular applications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/226—Taking into account non-speech caracteristics
- G10L2015/228—Taking into account non-speech caracteristics of application context
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/005—Language recognition
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis

Similar Documents

Publication	Publication Date	Title
US7590533B2 (en)	2009-09-15	New-word pronunciation learning using a pronunciation graph
Li et al.	2012	Code-switch language model with inversion constraints for mixed language speech recognition
Arisoy et al.	2009	Turkish broadcast news transcription and retrieval
Stolcke et al.	2006	Recent innovations in speech-to-text transcription at SRI-ICSI-UW
Lyu et al.	2010	SEAME: a Mandarin-English code-switching speech corpus in south-east asia.
Kumar et al.	2004	A large-vocabulary continuous speech recognition system for Hindi
US7412387B2 (en)	2008-08-12	Automatic improvement of spoken language
JP2002287787A (en)	2002-10-04	Disambiguation language model
Biadsy et al.	2012	Google's cross-dialect Arabic voice search
Nouza et al.	2016	ASR for South Slavic Languages Developed in Almost Automated Way.
US8170865B2 (en)	2012-05-01	Speech recognition device and method thereof
Cucu et al.	2014	SMT-based ASR domain adaptation methods for under-resourced languages: Application to Romanian
Avram et al.	2020	Towards a romanian end-to-end automatic speech recognition based on deepspeech2
Messaoudi et al.	2006	Arabic broadcast news transcription using a one million word vocalized vocabulary
Ayan et al.	2013	“Can you give me another word for hyperbaric?”: Improving speech translation using targeted clarification questions
Masmoudi et al.	2014	Phonetic tool for the Tunisian Arabic.
Reddy et al.	2010	Integration of statistical models for dictation of document translations in a machine-aided human translation task
Goel et al.	2010	Approaches to automatic lexicon learning with limited training examples
Chen et al.	2020	Using Taigi dramas with Mandarin Chinese subtitles to improve Taigi speech recognition
Maison et al.	2003	Pronunciation modeling for names of foreign origin
Raux	2004	Automated lexical adaptation and speaker clustering based on pronunciation habits for non-native speech recognition.
Wang et al.	2001	Multi-scale-audio indexing for translingual spoken document retrieval
Huang et al.	2019	Exploring retraining-free speech recognition for intra-sentential code-switching
Narayanan et al.	2006	Speech recognition engineering issues in speech to speech translation system design for low resource languages and domains
Laurent et al.	2014	Unsupervised acoustic model training for the Korean language