Silverman et al., 1992 - Google Patents

Towards using prosody in speech recognition/understanding systems: Differences between read and spontaneous speech

Silverman et al., 1992

Document ID: 17000688149761637124
Author: Silverman K; Blaauw E; Spitz J; Pitrelli J
Publication year: 1992
Publication venue: Speech and Natural Language: Proceedings of a Workshop Held at Harriman, New York, February 23-26, 1992

External Links

Cited by

Snippet

1. ABSTRACT A persistent problem for keyword-driven speech recognition systems is that users often embed the to-be-recognized words or phrases in longer utterances. The recognizer needs to locate the relevant sections of the speech signal and ignore extraneous …

Continue reading at aclanthology.org (PDF) (other versions)

230000002269 spontaneous 0 title abstract description 46

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/027—Concept to speech synthesisers; Generation of natural phrases from machine-based concepts
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/06—Elementary speech units used in speech synthesisers; Concatenation rules
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
- G10L15/265—Speech recognisers specially adapted for particular applications
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B5/00—Electrically-operated educational appliances
- G09B5/06—Electrically-operated educational appliances with both visual and audible presentation of the material to be studied
- G09B5/065—Combinations of audio and video presentations, e.g. videotapes, videodiscs, television systems
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G10L21/10—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids transforming into visible information
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B5/00—Electrically-operated educational appliances
- G09B5/04—Electrically-operated educational appliances with audible presentation of the material to be studied
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B19/00—Teaching not covered by other main groups of this subclass
- G09B19/06—Foreign languages
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B7/00—Electrically-operated teaching apparatus or devices working with questions and answers

Similar Documents

Publication	Publication Date	Title
Zue et al.	2002	JUPlTER: a telephone-based conversational interface for weather information
Syrdal et al.	2001	Automatic ToBI prediction and alignment to speed manual labeling of prosody
Fletcher et al.	2005	Intonational variation in four dialects of English: the high rising tune
Eide et al.	2004	A corpus-based approach to< ahem/> expressive speech synthesis
US5774854A (en)	1998-06-30	Text to speech system
JPH11513144A (en)	1999-11-09	Interactive language training device
US6148285A (en)	2000-11-14	Allophonic text-to-speech generator
JP2002530703A (en)	2002-09-17	Speech synthesis using concatenation of speech waveforms
Tomokiyo	2001	Recognizing non-native speech: characterizing and adapting to non-native usage in LVCSR
CN119274533A (en)	2025-01-07	A highly expressive audio generation method based on natural language description text
Silverman et al.	1992	Towards using prosody in speech recognition/understanding systems: Differences between read and spontaneous speech
KR20000049500A (en)	2000-08-05	Method of Practicing Foreign Language Using Voice Recognition and Text-to-Speech and System Thereof
Dahan et al.	2019	Language comprehension: Insights from research on spoken language
Aaron et al.	2005	Conversational computers
Veilleux	1994	Computational models of the prosody/syntax mapping for spoken language systems
US20070219799A1 (en)	2007-09-20	Text to speech synthesis system using syllables as concatenative units
Assaf	2005	A Prototype of an Arabic diphone speech synthesizer in festival
Jackson	2005	Automatic speech recognition: Human computer interface for kinyarwanda language
Silverman	1993	On customizing prosody in speech synthesis: Names and addresses as a case in point
Sečujski et al.	2007	An overview of the AlfaNum text-to-speech synthesis system
Seneff	2004	The use of subword linguistic modeling for multiple tasks in speech recognition
Syrdal et al.	1995	Text-to-speech systems
Nebbia et al.	1998	A specialised speech synthesis technique for application to automatic reverse directory service
EP1640968A1 (en)	2006-03-29	Method and device for speech synthesis
Laws	1998	A bilingual speech interface for New Zealand English to Māori