Lee et al., 1998 - Google Patents

English to Korean statistical transliteration for information retrieval

Document ID: 3532848766491402172
Author: Lee J; Choi K
Publication year: 1998
Publication venue: Computer Processing of Oriental Languages

Snippet

In Korean technical documents, many English words are transliterated into Korean in various ways. Most of these words are technical terms and proper nouns that are frequently used as query terms in information retrieval systems. As the communication with foreigners …

Classifications

- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/20—Handling natural language data
          - G06F17/27—Automatic analysis, e.g. parsing
            - G06F17/2705—Parsing
            - G06F17/2765—Recognition
              - G06F17/277—Lexical analysis, e.g. tokenisation, collocates
              - G06F17/2775—Phrasal analysis, e.g. finite state techniques, chunking
            - G06F17/2795—Thesaurus; Synonyms
          - G06F17/28—Processing or translating of natural language
            - G06F17/2809—Data driven translation
              - G06F17/2827—Example based machine translation; Alignment
            - G06F17/2863—Processing of non-latin text
            - G06F17/2872—Rule based translation
        - G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
          - G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
            - G06F17/30613—Indexing
              - G06F17/30619—Indexing indexing structures
            - G06F17/30634—Querying
              - G06F17/30657—Query processing
                - G06F17/3066—Query translation
                  - G06F17/30669—Translation of the query language, e.g. Chinese to English
                - G06F17/30675—Query execution
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
        - G06K9/00852—Recognising whole cursive words
        - G06K9/62—Methods or arrangements for recognition using electronic means
          - G06K9/68—Methods or arrangements for recognition using electronic means using sequential comparisons of the image signals with a plurality of references in which the sequence of the image signals or the references is relevant, e.g. addressable memory
            - G06K9/6807—Dividing the references in groups prior to recognition, the recognition taking place in steps; Selecting relevant dictionaries
              - G06K9/6842—Dividing the references in groups prior to recognition, the recognition taking place in steps; Selecting relevant dictionaries according to the linguistic properties, e.g. English, German
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
          - G10L15/18—Speech classification or search using natural language modelling

Similar Documents

Publication	Publication Date	Title
Bikel et al.	1999	An algorithm that learns what's in a name
Poon et al.	2009	Unsupervised morphological segmentation with log-linear models
CN101261623A (en)	2008-09-10	Word splitting method and device for word border-free mark language based on search
Toselli et al.	2019	Making two vast historical manuscript collections searchable and extracting meaningful textual features through large-scale probabilistic indexing
El Hadj et al.	2009	Arabic part-of-speech tagging using the sentence structure
Gashaw et al.	2020	Machine learning approaches for Amharic parts-of-speech tagging
Patil et al.	2016	Issues and challenges in Marathi named entity recognition
Sorokin	2017	Spelling correction for morphologically rich language: a case study of Russian
Ekbal et al.	2007	Named entity recognition and transliteration in Bengali
CN111767733A (en)	2020-10-13	A document classification method based on statistical word segmentation
Stamatatos et al.	1999	Automatic extraction of rules for sentence boundary disambiguation
Jain et al.	2014	Detection and correction of non word spelling errors in Hindi language
Roy et al.	2019	Unsupervised context-sensitive Bangla spelling correction with character n-gram
Pal et al.	2020	Vartani Spellcheck: Automatic Context-Sensitive Spelling Correction of OCR-generated Hindi Text Using BERT and Levenshtein Distance
Byamugisha	2022	Noun class disambiguation in Runyankore and related languages
Kumar Saha et al.	2008	Named entity recognition in Hindi using maximum entropy and transliteration
Amri et al.	2018	Amazigh POS tagging using TreeTagger: a language independant model
Seon et al.	2001	Named Entity Recognition using Machine Learning Methods and Pattern-Selection Rules
Elhadj	2009	Statistical part-of-speech tagger for traditional Arabic texts
Kang et al.	2000	Two approaches for the resolution of word mismatch problem caused by English words and foreign words in Korean information retrieval
Núñez et al.	2019	Phonetic normalization for machine translation of user generated content
Saha et al.	2008	Word clustering and word selection based feature reduction for MaxEnt based Hindi NER
Tien et al.	2022	Vietnamese spelling error detection and correction using BERT and N-gram language model
Nathani et al.	2021	Part of speech tagging for a resource poor language: Sindhi in Devanagari script using HMM and CRF