amin-ally/TextNormalization_Project

Text Normalizer

This project implements text normalization for the Farsi (Persian) language.

It supports the following normalization types:

  • normalizing numbers
  • normalizing dates
  • normalizing times
  • normalizing currency
  • normalizing measurement (physical measurement)
  • normalizing phone numbers and ID numbers
  • normalizing punctuation
  • normalizing miscellaneous abbreviations

Normalization is available for text-to-speech and speech-to-text, in three versions: TTSv1 (default), TTSv2, and STT.

Usage

python main.py [input file address] [output file address] [version] [type1, type2, ....]

Examples

Normalize all types for text-to-speech version 1 (the default version, so no version argument is needed):

python main.py inp.txt out.txt 

Normalize only time and date for text-to-speech version 2:

python main.py inp.txt out.txt TTSv2 -t -d
  1. Declaring one or more types limits the normalizer to the declared types.
  2. TTS version 1 and TTS version 2 differ only in how punctuation is normalized.
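
The usage pattern above could be parsed roughly as follows. This is a minimal sketch, not the code in main.py: the names `parse_args`, `Options`, `VERSIONS`, and `DEFAULT_VERSION` are hypothetical, and the only facts taken from this README are the argument order, the three version names, the TTSv1 default, and that type flags start with a dash (as in `-t` and `-d`).

```python
from typing import List, NamedTuple

# Version names as listed in this README; TTSv1 is the documented default.
VERSIONS = ("TTSv1", "TTSv2", "STT")
DEFAULT_VERSION = "TTSv1"


class Options(NamedTuple):
    """Hypothetical container for the parsed command line."""
    input_path: str
    output_path: str
    version: str
    type_flags: List[str]  # empty list means "normalize all types"


def parse_args(argv: List[str]) -> Options:
    """Parse [input] [output] [version] [types...]; argv excludes the script name."""
    if len(argv) < 2:
        raise ValueError("expected: [input file] [output file] [version] [types...]")
    input_path, output_path, rest = argv[0], argv[1], argv[2:]

    # The version is optional: it is present only if the next token is not a flag.
    version = DEFAULT_VERSION
    if rest and not rest[0].startswith("-"):
        version, rest = rest[0], rest[1:]
    if version not in VERSIONS:
        raise ValueError(f"unknown version {version!r}; expected one of {VERSIONS}")

    # Everything after the version must be a dash-prefixed type flag.
    stray = [tok for tok in rest if not tok.startswith("-")]
    if stray:
        raise ValueError(f"unexpected arguments: {stray}")
    return Options(input_path, output_path, version, rest)
```

With this sketch, `parse_args(["inp.txt", "out.txt"])` selects TTSv1 with no type restriction, and `parse_args(["inp.txt", "out.txt", "TTSv2", "-t", "-d"])` selects TTSv2 restricted to the two declared flags.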
