
WO1997040453A1 - Traitement de la langue naturelle automatise - Google Patents

Traitement de la langue naturelle automatise (Automated natural language processing)

Info

Publication number
WO1997040453A1
Authority
WO
WIPO (PCT)
Prior art keywords
natural language
textual information
translation
input textual
source
Prior art date
Application number
PCT/US1996/010283
Other languages
English (en)
Inventor
Glenn A. Akers
Susumu Kuno
Original Assignee
Language Engineering Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Language Engineering Corporation filed Critical Language Engineering Corporation
Priority to JP53802197A priority Critical patent/JP3680865B2/ja
Priority to US09/171,185 priority patent/US6760695B1/en
Priority to JP50176398A priority patent/JP2001503540A/ja
Priority to PCT/US1997/010005 priority patent/WO1997048058A1/fr
Priority to US09/202,013 priority patent/US6470306B1/en
Publication of WO1997040453A1 publication Critical patent/WO1997040453A1/fr

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/53 Processing of non-Latin text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/55 Rule-based translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

Definitions

  • the invention relates to automated natural language processing in order to translate automatically from one natural language into another natural language, preferably Japanese to English.
  • the system used for translation includes a computer which receives input in one language and performs operations on the received input to supply output in another language.
  • This type of translation has been an inexact one, and the resulting output can require significant editing by a skilled operator.
  • the translation operation performed by known systems generally includes a structural conversion operation.
  • the objective of structural conversion is to transform a given parse tree (i.e., a syntactic structure tree) of the source language sentence to the corresponding tree in the target language.
  • Two types of structural conversion have been tried, grammar-rule-based and template-to-template.
  • the domain of structural conversion is limited to the domain of grammar rules that have been used to obtain the source-language parse tree (i.e., to a set of subnodes that are immediate daughters of a given node). For example, given
  • This method is very efficient in that it is easy to find out where the specified conversion applies; it applies exactly at the location where the rule has been used to obtain the source-language parse tree.
  • it can be a weak conversion mechanism in that its domain, as specified above, may be extremely limited, and in that natural language may require conversion rules that straddle nodes that are not siblings.
  • in template-to-template structural conversion, structural conversion is specified in terms of input/output (I/O) templates or subtrees. If a given input template matches a given structure tree, that portion of the structure tree that is matched by the template is changed as specified by the corresponding output template. This is a very powerful conversion mechanism, but it can be costly in that it can take a long period of time to find out if a given input template matches any portion of a given structure tree.
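  • To make the contrast concrete, the following minimal Python sketch (not the patent's implementation; the tree encoding, rule table, and category names are all hypothetical) shows grammar-rule-based conversion, where a rule can only reorder the immediate daughters of the node it built:

        # Trees are (label, children); leaves are (category, word).
        # A conversion is keyed by the grammar rule that built a node,
        # so its domain is exactly that node's immediate daughters.
        CONVERSIONS = {
            ("S", ("NP", "VP")): (0, 1),
            ("VP", ("V", "NP")): (1, 0),   # e.g., English V-O becomes O-V
        }

        def convert(node):
            label, children = node
            if isinstance(children, str):          # leaf: (category, word)
                return node
            daughters = tuple(child[0] for child in children)
            order = CONVERSIONS.get((label, daughters))
            converted = [convert(c) for c in children]
            if order is not None:
                converted = [converted[i] for i in order]
            return (label, converted)

        tree = ("S", [("NP", [("PRN", "she")]),
                      ("VP", [("V", "writes"), ("NP", [("N", "letters")])])])
        print(convert(tree))   # VP daughters reordered to object-verb

  • Note that the conversion's domain is exactly the set of immediate daughters, which is why rules that straddle nodes that are not siblings cannot be expressed in this scheme.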
  • the automated natural language translation system has many advantages over known machine-based translators. After the system automatically selects the best possible translation of the input textual information and provides the user with an output (preferably a Japanese language translation of English-language input text), the user can then interface with the system to edit the displayed translation or to obtain alternative translations in an automated fashion.
  • An operator of the automated natural language translation system of the invention can be more productive because the system allows the operator to retain just the portion of the translation that he or she deems acceptable while causing the remaining portion to be retranslated automatically. Since this selective retranslation operation is precisely directed at portions that require retranslation, operators are saved the time and tedium of considering potentially large numbers of incorrect, but highly ranked translations.
  • because the system allows for arbitrary granularity in translation adjustments, more of the final structure of the translation will usually have been generated by the system.
  • the system thus reduces the potential for human (operator) error and saves time in edits that may involve structural, accord, and tense changes.
  • the system efficiently gives operators the full benefit of its extensive and reliable knowledge of grammar and spelling.
  • the automated natural language translation system's versatile handling of ambiguous sentence boundaries in the source language, and its powerful semantic propagation, provide further accuracy and reduced operator editing of translations. Stored statistical information also improves the accuracy of translations by tailoring the preferred translation to the specific user site.
  • the system's idiom handling method is advantageous in that it allows sentences that happen to include the sequence of words making up the idiom, without intending the meaning of the idiom, to be correctly translated.
  • the system is efficient but still has versatile functions such as long distance feature matching.
  • the system's structural balance expert and coordinate structure expert effectively distinguish between intended parses and unintended parses.
  • a capitalization expert effectively obtains correct interpretations of capitalized words in sentences, and a capitalized sequence procedure effectively deals with multiple-word proper names, without completely ignoring common noun interpretations.
  • the invention is directed to an improvement of the automated natural language translation system, wherein the improvement relates to parsing input textual information in a source natural language (preferably Japanese) by transforming at least some of the kanas in the input textual information into alphabetic letters of a target natural language (preferably English) thereby allowing the presence of a word or phrase boundary to be recognized in the middle of a kana.
  • the input textual information includes kanjis and kanas, wherein kanjis are ideograms, each of which has some semantic content, and kanas are syllabic characters, each of which represents a sound without any inherent meaning.
  • the source natural language is one which uses ideograms and syllabic characters but does not mark word or phrase boundaries, as is the case with Japanese.
  • the invention is directed to another improvement of the automated natural language translation system, wherein the improvement relates to parsing input textual information in a source natural language (preferably Japanese, Korean, or Chinese) by performing concurrently on the input textual information a morphological analysis and a syntactic analysis.
  • the source natural language is one without identifiers marking word or phrase boundaries, as is the case with Japanese, Korean, and Chinese.
  • FIG. 1 is a block diagram illustrating a system for performing automated translation of natural language.
  • FIG. 2 is a data flow diagram illustrating overall functioning of the system of FIG. 1.
  • FIG. 3 is a flow diagram illustrating the operation of the system of FIG. 1.
  • FIG. 4 is a flow diagram illustrating the operation of the end-of-sentence function of the preparser of the system of FIG. 1.
  • FIG. 5 is a flow diagram illustrating the operation of the parser of the system of FIG. 1.
  • FIG. 6 is a flow diagram illustrating the semantic propagation operations of the system of FIG. 1.
  • FIG. 7 is a flow diagram illustrating the structural conversion operations of the system of FIG. 1.
  • FIG. 8 is a flow diagram illustrating the expert evaluator of the system of FIG. 1.
  • FIG. 9 is a diagram of a sample graph used by the system of FIG. 1 for the exemplary phrase "by the bank”.
  • FIG. 10 is a diagram of a system which transforms kanas in the input text into alphabetic letters to allow the presence of a word or phrase boundary to be recognized in the middle of a kana, according to a first aspect of the invention.
  • FIG. 11 is a diagram of a system which performs on the input text a morphological analysis and a syntactic analysis concurrently, in accordance with a second aspect of the invention.
  • An automated natural language translation system can translate from a source natural language to a target natural language.
  • the system translates from English to Japanese.
  • the system translates from Japanese to English.
  • the system comprises means for receiving and storing the source natural language, a translation engine for creating a translation into the target natural language, means for displaying the translation to a user, and means for obtaining for a user and displaying to a user alternative translations.
  • the translation engine includes a preparser, a parser, a graph maker, an evaluator, a graph scorer, a parse extractor, and a structural converter. The preparser examines the input text and resolves any ambiguities in input sentence boundaries.
  • the preparser then creates and displays the input text in a parse chart seeded with dictionary entries.
  • the parser parses the chart to obtain possible syntactic categories for the input text.
  • the graph maker produces a graph of the possible syntactic interpretations of the input text based on the parse chart.
  • the graph includes nodes and subnodes which are associated with possible interpretations of the input text.
  • the evaluator, which comprises a series of experts, evaluates the graph of the possible interpretations and adds expert weights to the nodes and subnodes of the graph.
  • the graph scorer uses the expert weights to score the subnodes, and the graph scorer then associates the N best scores with each node.
  • the parse extractor assigns a parse tree structure to the preferred interpretation as determined by the graph scorer.
  • the structural converter performs a structural conversion operation on the parse tree structure to obtain a translation in the target language.
  • the graph scorer associates a constant value with each subnode.
  • An analysis of the linguistic information associated with each subnode determines the subnode score. See, for example, FIG. 8 where a series of expert evaluators examine the linguistic information stored at each node and subnode.
  • the graph scorer adds together the individual weighted scores for each expert to obtain a final weighted average for a particular node or subnode.
  • the combination of a plurality of weighted scores into a single weighted average score is a standard problem in computer science. One method that can be used is to multiply each expert result by a constant number (weight) assigned to that expert.
  • the weight assigned to each expert is a matter of design choice; the designer can choose the priority (weight) to assign to each expert.
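  • As a hedged illustration of this design choice, the following Python sketch combines expert scores into a single weighted average; the expert names, weights, and raw scores are invented for the example:

        # Sketch: combining expert scores into one weighted average.
        EXPERT_WEIGHTS = {
            "rule_probability": 2.0,
            "selectional_restriction": 3.0,
            "capitalization": 1.0,
        }

        def weighted_score(expert_results):
            """expert_results maps expert name -> raw score for one subnode."""
            total = sum(EXPERT_WEIGHTS[name] * score
                        for name, score in expert_results.items())
            return total / sum(EXPERT_WEIGHTS[name] for name in expert_results)

        print(weighted_score({"rule_probability": 0.8,
                              "selectional_restriction": 0.5,
                              "capitalization": 1.0}))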
  • the graph scorer can propagate the subnode scores from the bottom of the graph up to the top of the graph. Given the graph, wherein each node has a set of N scores, it is possible to determine one or more propagation methods.
  • One technique which can be used to propagate the subnode scores is memoization, which is a form of dynamic programming used to solve optimization problems.
  • the solution to optimization problems can involve many possible values (outcomes).
  • the task is to find the optimal value.
  • the algorithm used in optimization solves every subsubproblem just once and saves the outcome, thus avoiding the need to recompute the answer every time the subsubproblem is encountered.
  • For a discussion of memoization as applied to optimization problems, see, for example, Cormen et al., Introduction to Algorithms 301-314 (McGraw-Hill Book Co. 1990). The method described at pages 301, 302, and 312 of Introduction to Algorithms is one method that can be used for propagating subnode score information through the graph.
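  • A minimal sketch of such memoized, bottom-up score propagation is shown below; the graph shape, local scores, and N-best cutoff are hypothetical, and real scores would come from the expert evaluator:

        from functools import lru_cache

        # Each node's best scores are computed once and cached, in the
        # spirit of the dynamic-programming scheme cited from Cormen et al.
        GRAPH = {                 # node -> list of (child nodes, local score)
            "S":   [(("NP", "VP"), 0.9), (("NP", "VP2"), 0.4)],
            "NP":  [((), 0.8)],
            "VP":  [((), 0.7)],
            "VP2": [((), 0.95)],
        }
        N_BEST = 2

        @lru_cache(maxsize=None)
        def best_scores(node):
            """Return up to N_BEST scores for node, computed only once."""
            candidates = []
            for children, local in GRAPH[node]:
                score = local
                for child in children:
                    score *= best_scores(child)[0]   # best child score
                candidates.append(score)
            return tuple(sorted(candidates, reverse=True)[:N_BEST])

        print(best_scores("S"))   # (0.504, 0.304)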
  • the semantic propagation part of the system operates to propagate semantic information from smaller constituents to the larger constituents that they comprise.
  • Semantic propagation applies to the four classes of syntactic categories (SEMNP, SEMVP, SEMADJ, and VERB) used in the parsing operation.
  • Gazdar discusses in his text Natural Language Processing In Prolog (Addison-Wesley Publishing Co., 1989) a set of rules which can be used to analyze the semantic information stored at the nodes in a directed acyclic graph similar to that disclosed in the specification. Gazdar discusses the use of feature matching to match information on adjacent nodes. Gazdar states that feature matching involves equations that say that certain features appearing on one node must be identical to the features appearing on another. Most current work assumes a principle that is responsible for equating one class of feature specifications as they appear on the mother category and the daughter which manifests the morphology associated with those features. This daughter is known as the "head" of the phrase. Most phrases only have a single head.
  • a verb phrase inherits the tense of its verb since the latter is the head of the verb phrase.
  • there is no straightforward way of specifying this principle on a grammar-wide basis with the notational resources that we have used so far, but we can stipulate the effects of the principle on a rule-by-rule basis quite simply if we assume that the relevant features are all to be found on a single branch of the DAG. Let us call the label on this branch head. Then we can write a typical VP rule as follows:
  • the rules discussed by Gazdar can be easily adapted for each of the syntactic categories discussed herein.
  • the linguistic information assigned to each node using Gazdar's rules can be propagated through the tree using memoization techniques.
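  • The following sketch illustrates the head-feature idea in Python rather than Gazdar's DAG notation (the feature names and representation are assumptions made for the example): the mother constituent simply inherits the features of its head daughter, so a VP built from a past-tense verb is itself past tense:

        # Head-feature propagation in the spirit of Gazdar's rule-by-rule
        # stipulation: the mother copies the features of its head daughter.
        def apply_rule(mother_cat, daughters, head_index):
            """Build a mother constituent that inherits its head's features."""
            head = daughters[head_index]
            return {"cat": mother_cat, "head": dict(head["head"])}

        verb = {"cat": "V",  "head": {"tense": "past"}}
        obj  = {"cat": "NP", "head": {"number": "plural"}}

        # VP -> V NP, with V as the head: the VP inherits tense=past.
        vp = apply_rule("VP", [verb, obj], head_index=0)
        print(vp)   # {'cat': 'VP', 'head': {'tense': 'past'}}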
  • the weighted average is one method of determining the subnode score.
  • each subnode score can be propagated through the graph using known memoization techniques as applied to optimization problems, and the strategy discussed in Gazdar's text can be used to analyze the linguistic information stored at each node and this linguistic information can be propagated through the parse tree chart using memoization techniques.
  • the automated natural language translation system can perform automated re-translation functions after the initial automatic translation. That is, after the system automatically selects the best possible translation of the input textual information and provides the user with an output (preferably a Japanese language translation of the input English text, or a Japanese-to-English translation), the user can then interface with the system to edit the displayed translation or to obtain alternative translations in an automated fashion.
  • the automated natural language translation system uses a linguistic model which breaks a sentence into substrings.
  • a substring is one or more words which occur in the order specified as part of the sentence. For instance, substrings of "The man is happy” include “The,” “The man,” “man is happy,” “is,” and “The man is happy” itself, but not “is man,” “man man,” and “The is.”
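  • A short sketch makes the definition concrete; this is ordinary contiguous-substring enumeration, not code from the patent:

        # Substrings in the sense used above: contiguous runs of words,
        # in sentence order.
        def substrings(sentence):
            words = sentence.split()
            return [" ".join(words[i:j])
                    for i in range(len(words))
                    for j in range(i + 1, len(words) + 1)]

        print(substrings("The man is happy"))
        # includes "The", "The man", "man is happy", and the full sentence,
        # but never reordered sequences such as "is man" or "The is"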
  • linguistic models classify substrings in various ways and in different levels of detail. For instance, in “They would like an arrow,” “an arrow” is typically classified as a noun phrase (NP). Some models would also classify "an arrow” with syntactic features (for instance, it is a singular noun phrase), and semantic features (it refers to a weapon). If the phrase is ambiguous, there may be two or more ways of classifying it. For instance, "an arrow” can also refer to a symbol with an arrow-like shape.
  • linguistic models provide a method for resolving ambiguity, they usually do so by combining smaller units into larger units. When evaluating a larger unit, these models consider only a portion of the information contained in the larger unit.
  • the semantic property of "an arrow” (symbol vs. weapon) is used in evaluating the verb phrase “like an arrow” in the sentence "They would like an arrow.”
  • if the syntax of the phrase “an arrow” were changed, as in “He shot it with an arrow,” the semantic property of “an arrow” is not used in evaluating the verb phrase “shot it with an arrow.”
  • exported properties are all properties used to evaluate the combination of an interpreted substring with other units to form larger substrings.
  • An export is an interpreted substring interpreted together with its exported properties. Properties that are contained within the interpreted substring but not exported are called substructures.
  • the parser of the system includes a grammar database.
  • the parser finds all possible interpretations of a sentence using grammatical rules.
  • each grammar rule describes how a node X is composed of, or made from, subnodes A1 A2 ... An; X is referred to as a higher node of the lower nodes (subnodes) A1 through An.
  • the graph maker of the system graphically represents the many possible interpretations of a sentence. Each node of the graph corresponds to an export of some substring. In one embodiment of the system, a single export is represented by a single node.
  • the graph contains arcs which emanate from the nodes associated with an export.
  • the arcs represent the substructure of the export based on the application of grammar rules.
  • the graph may depict at least two types of arcs: (1) a unary arc which points to a single different export of the same substring; (2) a binary arc which includes a pair of pointers which points to two exports, the substrings of which when concatenated form the substring of the original export.
  • this arc structure assumes a grammar in Chomsky normal form.
  • Amended claim 35 applies to grammars not in Chomsky normal form by rephrasing type (2) to reflect an arc having an N-tuple of pointers, pointing to N exports.
  • the graph also includes a single starting export S from which all portions of the graph can be reached by following a series of arcs.
  • the starting export corresponds to the entire sentence.
  • Multiple arcs emanate from a node if and only if the same export can be composed of one or more exports (the pair of pointers in a binary arc is not considered multiple arcs for this purpose). Multiple arcs point to a node if and only if that export is a component of multiple exports. A node with no arcs projecting from it corresponds to a dictionary entry assigned to the substring.
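  • The node-and-arc structure described above might be sketched as follows; the class layout and field names are assumptions for illustration only:

        from dataclasses import dataclass, field

        # Each node is an export of some substring; unary arcs point to one
        # alternative export of the same substring, binary arcs to the two
        # exports whose substrings concatenate to form this one.
        @dataclass
        class Node:
            substring: str
            properties: dict = field(default_factory=dict)   # exported properties
            unary_arcs: list = field(default_factory=list)   # [Node]
            binary_arcs: list = field(default_factory=list)  # [(Node, Node)]

        the  = Node("the")                       # dictionary entries:
        bank = Node("bank", {"sem": "institution"})   # no outgoing arcs
        np   = Node("the bank")
        np.binary_arcs.append((the, bank))       # "the" + "bank" -> "the bank"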
  • a plurality of linguistic experts assign a numerical score to a set of exports.
  • the linguistic experts apply the score to each node of the graph.
  • a scoring array (where each element of the array is a weight to multiply by a particular expert's score) is a fixed length "N" of floating point numbers for any given sentence.
  • the score is evaluated by a scoring module which may be integrated with the graph-making engine and/or the parser. Scores are computed for all exports that make up a higher export. The score for the higher export is computed as the sum of the scores of the exports that make up the higher level export and the scores of any experts that apply to the combination, such as a score assigned by the structural balance expert.
  • the order in which nodes are visited and scored follows a standard depth-first graph-walking algorithm. In this algorithm, nodes that have been scored are marked and are not scored again. During the scoring process, the scoring module evaluates dictionary entry nodes before evaluating any of the higher unit nodes. Each dictionary entry gives rise to a single score.
  • each of the k scores of the lower export is added to the expert values that apply to the unary rule, and the resulting vector of k scores is associated with the parent export.
  • each export has associated with its node a set of g scores (g ranging from 1 to N) which represent the g most likely ways (relative to the linguistic model) of making the export, including all substructure properties which are not represented in the export.
  • the scoring method gives rise to the g most likely ways of making the sentence.
  • Each score in each score list described above has an associated pointer.
  • the pointer provides information to indicate which score(s) of the score list of lower export(s) were combined to produce the higher level score.
  • the g most likely interpretations of the sentence can be extracted as unambiguous parse trees.
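  • The following sketch shows one way (an assumption, not the patent's code) to combine the score lists of two child exports under a binary arc while keeping the N best results and the pointers that record which child scores produced each combination:

        import heapq

        N = 20   # keep at most the N best scores per export

        def combine_binary(left_scores, right_scores, rule_bonus=0.0):
            """left_scores/right_scores: lists of (score, pointer) pairs."""
            candidates = []
            for i, (ls, _) in enumerate(left_scores):
                for j, (rs, _) in enumerate(right_scores):
                    # pointer = which score of each child was combined,
                    # so parse trees can later be read out unambiguously
                    candidates.append((ls + rs + rule_bonus, (i, j)))
            return heapq.nlargest(N, candidates, key=lambda c: c[0])

        left  = [(-1.2, None), (-2.5, None)]
        right = [(-0.4, None), (-3.0, None)]
        print(combine_binary(left, right))   # best combinations first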
  • Further details of the automated natural language translation system will now be disclosed with reference to FIGS. 1-9. Various improvements according to the invention are described thereafter with reference to FIGS. 10 and 11.
  • an automated natural language translation system 10 includes an input interface 12, a translation engine 16, storage 18, a user input device 22, a display 20, and an output interface 14.
  • the input interface is constructed to receive a sequence of text in a source language, such as English or Japanese.
  • the input interface may comprise a keyboard, a voice interface, or a digital electronic interface, such as a modem or a serial input.
  • the translation engine performs translation operations on the source text, in conjunction with data in storage.
  • the translation engine may be comprised entirely of hardwired logic circuitry, or it may contain one or more processing units and associated stored instructions.
  • the engine may include the following elements, or parts of them: A preparser 24, a parser 26, a graph maker 28, a parse/translation evaluator 30, a parse extractor 32, a structural converter 34, and a user interface 42, which includes an alternate parse system 37.
  • the structural converter may comprise a grammar rule controlled structural converter 36, a lexicon controlled structural converter 38, and a synthesis rule controlled structural converter 40.
  • the storage 18 may include one or more areas of disk (e.g., hard, floppy, and/or optical) and/or memory (e.g., RAM) storage, or the like.
  • the storage 18 may store input textual information in a source natural language, output textual information in a target natural language, and all sorts of information used or useful in performing the translation including one or more dictionaries, domain keywords, and grammar rules.
  • the user input interface 22 may comprise a keyboard, a mouse, touchscreen, light pen, or other user input device, and is to be used by the operator of the system.
  • the display may be a computer display, printer or other type of display, or it may include other means of communicating information to the operator.
  • the output interface 14 communicates a final translation of the source text in the target language, such as Japanese.
  • the interface may comprise a printer, a display, a voice interface, an electronic interface, such as a modem or serial line, or it may include other means for communicating that text to the end user.
  • the preparser 24 first performs a preparsing operation (step 102) on the source text 23. This operation includes the resolution of ambiguities in sentence boundaries in the source text, and results in a parse chart seeded with dictionary entries 25.
  • the parser 26 then parses the chart produced by the preparser (step 104), to obtain a parse chart filled with syntactic possibilities 27.
  • the graph maker 28 produces a graph of possible interpretations 29 (step 106), based on the parse chart resulting from the parsing step.
  • the evaluator 30, which accesses a series of experts 43, evaluates the graph of stored interpretations (step 108), and adds expert weights to the graph 31.
  • the graph scorer 33 scores nodes and associates the N (e.g., 20) best scores with each of them 35.
  • the parse extractor 32 assigns a parse tree structure 39 to this preferred interpretation (step 110).
  • the structural converter 34 which accesses the conversion tables 58, then performs a structural conversion operation (step 112) on the tree to obtain a translation 41 in the target language.
  • the user may interact with the alternate parse system 37 to obtain alternative translations.
  • the system begins the preparsing operation by dividing the input stream into tokens (step 114), which include individual punctuation marks, and groups of letters that form words.
  • the occurrence of whitespace affects the interpretation of characters at this level. For instance, in “x - y" the "-" is a dash, but in "x-y” it is a hyphen.
  • the preparser then combines the tokens into words (step 116). At this level, it recognizes special constructions (e.g., internet addresses, telephone numbers, and social security numbers) as single units.
  • the preparser also uses dictionary lookup to find groupings. For example, if “re-enact” is in the dictionary as “reenact” it will become one word in the sentence, but if it is not, then it will remain as three separate “words”.
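  • A toy sketch of this token-then-word phase appears below; the tokenizer and the single-entry dictionary are hypothetical, and the whitespace-sensitive dash/hyphen distinction is not modeled:

        import re

        DICTIONARY = {"reenact"}   # hypothetical dictionary contents

        def tokens(text):
            # individual punctuation marks and groups of letters
            return re.findall(r"\w+|[^\w\s]", text)

        def words(text):
            out, toks = [], tokens(text)
            i = 0
            while i < len(toks):
                # rejoin a hyphenated candidate if the dictionary has it
                if (i + 2 < len(toks) and toks[i + 1] == "-"
                        and (toks[i] + toks[i + 2]) in DICTIONARY):
                    out.append(toks[i] + toks[i + 2])   # "re-enact" -> one word
                    i += 3
                else:
                    out.append(toks[i])
                    i += 1
            return out

        print(words("They re-enact the play."))
        # ['They', 'reenact', 'the', 'play', '.']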
  • the next preparsing phase involves determining where the sentence ends (step 118).
  • the preparser accesses the base dictionary and the technical dictionaries, and any user-created dictionaries, as it follows a sequence of steps for each possible sentence ending point (i.e., after each word of the source text).
  • the preparser need not perform these steps in the particular order presented, and these may be implemented as a series of ordered rules or they may be hard-coded.
  • the preparser interprets and records any nonparsable sequence of characters, such as a series of dashes, as a "sentence" by itself, although not one which will be translated (step 120).
  • the preparser also requires any sequence of two carriage returns in a row to be the end of a sentence (step 122). If the first letter of the next word is a lower case letter, the preparser will not indicate the end of a sentence (step 124). If a sentence started on a new line and is short, the preparser considers it a "sentence" of its own (e.g., a title).
  • the preparser interprets a period, a question mark, or an exclamation mark as the end of a sentence, except in certain situations involving end parenthesis and end quotes (step 128).
  • the preparser uses virtual punctuation marks after the quote in addition to the punctuation before the quote.
  • Alternatives for the underlying punctuation required for ?” are illustrated in the following examples:
  • the virtual punctuation marks added by the preparser indicate that before the quote there is something which can be either a question mark or nothing at all. After the quote there is something that can be either a period or a question mark.
  • the grammatical structure of the rest of the sentence allows later processing stages to select the best choice.
  • the preparser may also use several further approaches in preparsing a period (steps 130, 132, 134, 136, and 138). Some abbreviations in the dictionary are marked as never beginning sentences and others as never ending sentences (step 130). These rules are always respected. For example, "Ltd" never begins a sentence and "Mr" never ends one.
  • the preparser also will not end a sentence with a single initial followed by a period unless the next word is a common grammatical word (step 132) such as "the", "in”, etc. If the word before the period is found in any dictionary, the period will end the sentence (step 134). If the word before the period is not in this dictionary, and it has internal periods (e.g., I.B.M.) and the next word is not in the dictionary in a lowercase form, or the word after that is itself uppercase, then this is not an end of sentence (step 136). In remaining cases the period does mark the end of sentence (step 138).
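  • A few of these period heuristics can be sketched as follows (greatly simplified, with hypothetical dictionary contents):

        # Sketch of some of the period rules above (steps 130-138).
        NEVER_ENDS_SENTENCE = {"Mr"}              # marked abbreviations (step 130)
        COMMON_WORDS = {"the", "in", "a", "is"}   # grammatical words (step 132)
        DICTIONARY = {"bank", "write", "the", "in", "a", "is"}

        def period_ends_sentence(prev_word, next_word):
            if prev_word in NEVER_ENDS_SENTENCE:
                return False                      # step 130
            if len(prev_word) == 1 and prev_word.isupper():
                # a single initial ends a sentence only before a
                # common grammatical word (step 132)
                return next_word.lower() in COMMON_WORDS
            if prev_word.lower() in DICTIONARY:
                return True                       # step 134
            if "." in prev_word and next_word.lower() not in DICTIONARY:
                return False      # e.g., "I.B.M." (step 136, simplified)
            return True                           # step 138

        print(period_ends_sentence("Mr", "Smith"))   # False
        print(period_ends_sentence("bank", "The"))   # True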
  • the parser places the words of the sentence in syntactic categories, and applies grammar rules from the grammar database to them to compute possible syntactic interpretations 25 of the sentence (step 104).
  • These grammar rules 48 can be implemented as a series of computer readable rules that express the grammatical constraints of the language. For the English language, there may be hundreds of such rules, which may apply to hundreds of syntactic categories. To reduce the computational overhead of this operation, the different possible meanings of a word are ignored.
  • the graph maker employs the dictionary to expand the results of the parser to include the different meanings of words and creates a directed acyclic graph representing all semantic interpretations of the sentence.
  • This graph is generated with the help of a series of semantic propagation procedures, which are described below. These procedures operate on a series of authored grammar rules and, in some cases, access a semantic feature tree for semantic information.
  • the semantic feature tree is a tree structure that includes semantic categories. It is roughly organized from the abstract to the specific, and permits the procedures to determine how semantically related a pair of terms are, both in terms of their separation in the tree and their levels in the tree.
  • the graph includes nodes 80 and their subnodes 82, 84, 86 linked by pointers 88, 89, 90, 91 in a manner that indicates various types of relationships.
  • a first type of relationship in the graph is one where nodes representing phrases possess pointers to constituent word nodes or sub-phrase nodes. For example, a node 84 representing the phrase "the bank” will be linked by pointers 92, 93 to the constituent words "the” 94, and "bank” 95.
  • a second type of relationship in the graph is where phrase interpretations possess pointers to alternate ways of making the same higher-level constituent from lower-level pieces.
  • a node 80 representing the phrase "by the bank” can have two source interpretation locations 81, 83, which each include pointers 88 & 89, 90 & 91 to their respective constituents.
  • the different constituents would include different subnodes 84, 86 that each represent different meanings for the phrase "the bank”.
  • the structure of the graph is defined by the results of the parsing operation and is constrained by the syntax of the source sentence.
  • the nodes of the graph are associated with storage locations for semantic information, which can be filled in during the process of semantic propagation.
  • the semantic propagation part of the system operates to propagate semantic information from smaller constituents to the larger constituents they comprise. It applies to four classes of the syntactic categories used in the earlier parsing operation: SEMNP (which includes noun-like objects and prepositional phrases), SEMVP (verb phrase like objects, which usually take subjects), SEMADJ (adjectives) and VERB (lexical verb-like verbs that often take objects). Other syntactic categories are ignored within a rule.
  • the grammar rule author may also override the implicit behavior below by specific markings on rules. These specific instructions are followed first.
  • the first is a set of rules that tell, by examining the noun-like and verb-like constituents in a grammar rule, which selectional restriction slots of the verb-like constituents apply to which noun-like objects.
  • VP = VT11 + NP + VP
  • One exemplary default rule indicates that when a verb takes objects, selectional restrictions are to be applied to the first NP encountered to the right of the verb.
  • the semantic propagation operation includes copying of selectional restrictions from SEMVPs to imperative sentences (step 140). If a SEMNP is being used as a locative expression, its goodness is evaluated against semantic constants defining good locations (step 142). If a rule involves a conjunction of two SEMNPs (detected because of ANDing together of syntactic features), the graph maker ANDs together the semantic features and applies the semantic distance expert (step 144).
  • when the graph maker locates a "head" SEMNP which gets propagated to a higher level (e.g., it becomes part of a SEMNP that includes more words), it propagates semantic features as well (step 146). However, if the "head" is a partitive word (e.g., "portion," "part"), it propagates from a SEMNP to the left or right instead.
  • SEMVPs and SEMADJs are propagated in the same way, with the only exception being that SEMVPs and SEMADJs do not have any partitive situations (step 148). Adjectives are part of the SEMVP class for this purpose.
  • when a SEMVP is made from a rule including VERBs, the graph maker propagates upward the VERB's subject restriction unless the VP is a passive construction, in which case the VERB's first object restriction is propagated instead (step 150).
  • the graph maker also attempts to apply the selectional restrictions of the SEMVPs to NPs encountered moving leftward from the SEMVP (step 152).
  • the graph maker attempts to apply the selectional restriction of the SEMADJ first to any SEMNPs encountered moving to the right from the SEMADJ, and if that fails, tries moving to the left (step 154).
  • when a VERB's object selectional restrictions remain, the graph maker applies them in turn to SEMNPs encountered in order to the right of the VERB (step 156).
  • a verb selectional restriction is used up as soon as it applies to something.
  • SEMNPs ordinarily are not used up when something applies to them; starting at this rule, however, the SEMNP does get “used up”.
  • the graph maker determines if there are any SEMVPs or SEMADJs in it that have not yet been used, and if so, propagates them upward (step 158).
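  • As an illustration of one of these steps, the following sketch models the step 150 behavior (the feature structures and restriction names are invented for the example):

        # When a SEMVP is built from a VERB, the VERB's subject restriction
        # moves upward, unless the construction is passive, in which case
        # the first object restriction is propagated instead.
        def propagate_semvp(verb, passive=False):
            restriction = (verb["object_restrictions"][0] if passive
                           else verb["subject_restriction"])
            return {"cat": "SEMVP", "subject_restriction": restriction}

        eat = {"subject_restriction": "animate",
               "object_restrictions": ["edible"]}
        print(propagate_semvp(eat))                 # expects an animate subject
        print(propagate_semvp(eat, passive=True))   # "was eaten": edible subject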
  • the system also performs feature matching of linguistic features.
  • Linguistic features are properties of words and other constituents. Syntactic feature matching is used by the parser, and semantic feature matching is used by the graph maker, but the same techniques are used for both. For instance, “they” has the syntactic feature plural, while “he” has the feature of singular.
  • Feature matching uses marking on grammar rules so that they only apply if the features of the words they are to apply to meet certain conditions. For example, one rule might be:
  • Feature match restrictions are broken into "local” and "long distance".
  • the long distance actions may be computed when the grammar is compiled, rather than when actually processing a sentence.
  • the sequence of long distance operations that must be performed is then encoded in a series of instruction bytes.
  • the computation of long distance feature operations must start with an n-ary rule (i.e., one that may have more than two inputs on its right).
  • the system then distributes codes to various binary rules so that feature sets end up being propagated between rules in the correct fashion.
  • the system of the invention also allows multiword "idioms” as part of the dictionary, while retaining representations of the individual words of which they are composed. These two forms may ultimately compete against each other to be the best representation. For instance "black sheep” is found in the dictionary with the meaning of a disfavored person. But in some cases the words “black sheep” may refer to a sheep which is black. Because both of the forms are retained, this non-idiomatic usage may still be chosen as the correct translation.
  • the idioms may belong to further categorizations.
  • the system may use the following three types:
  • Almighty idioms suppress any other possible interpretation of any of the words that make up the sequence.
  • Preferential idioms suppress other constituents of the same general type and that use the very same words. Normal idioms compete on an even footing with other entries.
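  • The competition between these idiom strengths might be sketched as follows; the entry format and filtering logic are assumptions for illustration:

        # "Almighty" idioms remove competing word-level entries entirely,
        # "preferential" idioms remove only same-type competitors built from
        # the very same words, and "normal" idioms compete on an even footing.
        def surviving_entries(entries, idiom):
            if idiom["type"] == "almighty":
                return [idiom]
            if idiom["type"] == "preferential":
                return [idiom] + [e for e in entries
                                  if e["cat"] != idiom["cat"]
                                  or e["words"] != idiom["words"]]
            return [idiom] + entries              # normal: even footing

        literal = {"cat": "NP", "words": ("black", "sheep"),
                   "meaning": "a sheep which is black"}
        idiom = {"cat": "NP", "words": ("black", "sheep"),
                 "meaning": "a disfavored person", "type": "normal"}
        print(surviving_entries([literal], idiom))  # both remain in competition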
  • the resulting graph is to be evaluated by experts (step 108, FIG. 3), which provide scores that express the likelihood of correctness of interpretations in the graph.
  • the system of the invention includes a scoring method that applies to all partial sentences of any length, not just full sentences. An important element in the use of a graph is that a subtree is fully scored and analyzed only once, even though it may appear in a great many sentences.
  • in the sentence “Near the bank there is a bank.”, the phrase “Near the bank” has at least two meanings, but the best interpretation of that phrase is determined only once.
  • the phrase “there is a bank” similarly has two interpretations, but the best of those two is determined only once. There are therefore four sentence interpretations, but the subphrases are scored just once.
  • Another feature of the graph is that each node is labeled with easily accessible information about the length of that piece of the sentence. This allows the best N interpretations (N being a number on the order of 20) of any substring of the English sentence to be found without reanalyzing the sentence.
  • the use of a graph allows the system to integrate the result of a user choice about a smaller constituent and give a different N best analyses that respect the user's choice. Because all this is done without reparsing the sentence or rescoring any substrings, it may be done quickly.
  • operation of the expert evaluator 30 is based on various factors that characterize each translation, which are handled by the various experts.
  • the rule probability expert 170 evaluates the average relative frequency of grammar rules used to obtain the initial source language parse tree.
  • the selectional restriction expert 178 evaluates the degree of semantic accord of the given translation.
  • the dictionary entry probability expert 172 evaluates the average relative frequency of particular "parts of speech" of the words in the sentence used to obtain the initial source language parse tree.
  • the statistics expert evaluates the average relative frequency of particular paraphrases chosen for the given translation.
  • the system automatically determines the English "part of speech” (POS) for various individual English words, English phrases, and groups of English words.
  • the system makes the automatic determination of the POS when translating sentences, and the system usually makes the correct choice. Occasionally, however, the sentence being translated is itself ambiguous.
  • a word or phrase that can be interpreted as more than one POS leads to several distinct but "correct” meanings for the sentence in which the word or phrase appears. It is possible for an operator of the system to override the system's automatic POS determination and instead manually set the POS for any word, phrase, or group of words.
  • an operator of the system can set "a boy with a telescope” as a Noun Phrase to force the system to interpret the sentence to mean that the boy was carrying a telescope and thus reject the interpretation that John used a telescope to see the boy.
  • An operator can address the situation where overriding the system's POS rules yields worse, not better, translation results by applying as few manual POS settings as possible or by applying less restrictive manual POS settings.
  • Noun Phrase is less restrictive than Noun
  • Group is the least restrictive POS setting. The following is a list of the various possible POS settings.
  • the operator can manually set a different POS to “on the fourth of July” in the sentence “We need a book on the fourth of July”. If an operator does not want the system to translate a particular word, phrase, or group of words from English to Japanese, the operator can assign the POS “English” to the desired word(s), phrase(s), and/or group(s) of words. It also is possible for an operator to remove one or more POS settings, regardless of whether the settings were assigned automatically by the system or manually by an operator.
  • the system keeps track of statistical information from translation usage at each customer site at more than one level. For example, the system may maintain statistical counts at the surface form level (how often was “leaving” used as a transitive versus an intransitive verb), and also at the meaning level (did it mean “leave behind” or “depart”), and this second type is summed over occurrences of “leave”, “leaves”, “left”, and “leaving”. The system may also keep statistical counts separately for uses that have occurred within the last several sentences, and uses that have occurred at any time at the customer site. Furthermore, the system may distinguish cases where the user intervened to indicate that a particular word sense should be used, from cases where the system used a particular word sense without any confirmation from the user.
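  • One plausible shape for such multi-level bookkeeping is sketched below; the class and field names are hypothetical:

        from collections import Counter, deque

        # Counts are kept both for surface forms ("leaving" as transitive
        # vs. intransitive) and for meanings summed over related forms,
        # with a separate window for recent sentences and a record of
        # senses the user explicitly confirmed.
        class UsageStats:
            def __init__(self, recent_window=50):
                self.surface = Counter()          # (form, reading) -> count
                self.meaning = Counter()          # meaning -> count
                self.recent = deque(maxlen=recent_window)
                self.user_confirmed = Counter()   # senses the user chose

            def record(self, form, reading, meaning, confirmed=False):
                self.surface[(form, reading)] += 1
                self.meaning[meaning] += 1
                self.recent.append(meaning)
                if confirmed:
                    self.user_confirmed[meaning] += 1

        stats = UsageStats()
        stats.record("leaving", "transitive", "leave_behind", confirmed=True)
        stats.record("left", "transitive", "depart")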
  • the structural balance expert 182 is based on a characteristic of English and many other languages.
  • the coordinate structure expert 180 measures the semantic distance between B and C, and that between A and C to determine which mode of coordination combines two elements that are closer in meaning. This expert accesses the semantic feature tree during its operation. This expert is also an efficient way to distinguish between the intended parses and the unintended parses of a given sentence. Many words in English include potential ambiguities between ordinary-noun and proper-name interpretations.
  • the capitalization expert 176 uses the location of capitalization in a sentence to determine how likely it is that the capitalization is significant. For example, the following sentences:
  • “Brown is my first choice.” and “My first choice is Brown.” are different in that while the former is genuinely ambiguous, it is far more likely in the latter that “Brown” is a person name than a color name.
  • This expert takes into consideration factors such as whether a given capitalized word appears at sentence-initial or sentence-noninitial position (as shown above), whether the capitalized spelling is in the dictionary, and whether the lower-case-initial version is in the dictionary. This expert is an effective way to obtain the correct interpretations of capitalized words in sentences.
  • when a sentence contains a sequence of initial-uppercase words, it can be treated as a proper name or as a sequence of ordinary nouns
  • the system of the invention employs a capitalized sequence procedure, which favors the former interpretation
  • if the sequence cannot itself be parsed by normal grammar rules, it can be treated as a single unanalyzed noun phrase to be passed through untranslated. This procedure has proven to be a very effective way of dealing with multiple-word proper names while not completely ignoring the lower-rated common noun interpretations.
  • the machine translation system of the invention uses a grammar-rule controlled structural conversion mechanism 162 that has the efficiency of a straightforward grammar-rule-based structural conversion method, but which comes close to the power of the template-to-template structural conversion method.
  • This method relies on the use of grammar rules 160 which can specify non-flat complex substructure. While the following is a rule format used in other translation systems:
  • symbols prefixed with "#" are virtual symbols that are invisible for the purpose of sentence structure parsing, but which are used in building substructures once a given parse is obtained.
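  • The patent's actual rule notation is not reproduced above, so the following Python sketch uses an invented encoding to show the idea: "#"-prefixed virtual labels are ignored when the rule is flattened for parsing, but they direct how substructure is assembled once a parse is obtained:

        # Hypothetical encoding of a grammar rule with virtual symbols;
        # this notation is ours, not the patent's published rule format.
        FLAT_RULE = ("Y", ["X1", "X2", "X3"])                  # other systems
        STRUCTURED_RULE = ("Y", [("#Z1", ["X1", "X2"]), "X3"])

        def parsing_view(rule):
            """Flatten a rule as the parser sees it: virtual nodes
            disappear, leaving the same surface categories."""
            label, body = rule
            flat = []
            for item in body:
                if isinstance(item, tuple) and item[0].startswith("#"):
                    flat.extend(item[1])      # children surface directly
                else:
                    flat.append(item)
            return (label, flat)

        # Both rules parse identically, yet the structured rule still
        # records the #Z1 grouping for substructure building.
        assert parsing_view(STRUCTURED_RULE) == FLAT_RULE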
  • the structural conversion also includes a dictionary controlled structural conversion operation 166, which accesses dictionaries 161 to operate on the parse tree after it has been operated upon by the grammar-rule controlled structural conversion operation.
  • the synthesis-rule controlled structural conversion operation then applies synthesis rules to the resulting parse tree to provide the target language text 41.
  • FIGS. 1 and 2 after the system has derived a best-ranked translation in the process described above, it is presented to the user via the display 20. The user then has the option of approving or editing the translation by interacting with the alternate parse system 37, via the user input device 22. In the editing operation, the user may constrain portions of the translated sentence that are correctly translated, while requesting retranslation of the remaining portions of the sentence. This operation may be performed rapidly, since the system retains the graph with expert weights 31.
  • Two improvements according to the invention are now described with reference to FIGS. 10 and 11, respectively.
  • the translation engine 16 of the automated natural language translation system 10 receives source text 23 and automatically translates it into target natural language text 41, which translation is effected by a parsing operation that transforms some or all kanas in the source text 23 into alphabetic letters of the target natural language. This is for the purpose of making it possible to assume the presence of a morpheme (i.e., the smallest linguistic unit that has meaning) boundary in the middle of a kana in a given input sentence.
  • the source language is Japanese
  • the target language is English.
  • any source natural language that uses ideograms and syllabic characters and whose orthography lacks identifiers marking word and phrase boundaries can be processed and translated according to this aspect of the invention.
  • references to Japanese in describing this aspect of the invention therefore should not be construed as limiting.
  • the Japanese orthography (i.e., writing system) includes both kanjis and kanas.
  • Kanjis are ideograms, each of which has some semantic content.
  • Kanas are syllabic characters, each of which represents a sound without any inherent meaning.
  • alphabetic letters are known as romajis.
  • the Japanese input string shown in (1) is transformed into the following by a parser in the translation engine 16, where a character that is recognized as an alphabetic letter is shown in regular parentheses (i.e., "()").
  • kanas ⁇ ka ⁇ , ⁇ na ⁇ , and ⁇ ta ⁇ in the original Japanese orthography have been transformed into romaji (k)(a), (n)(a), and (t)(a), respectively, because a morpheme boundary might have to be recognized between the initial consonant and the vowel.
  • kanas ⁇ ha ⁇ , ⁇ wo ⁇ , and ⁇ TU ⁇ have been retained as kanas because there is no possibility in Japanese for a morpheme boundary to show up in the middle of these three particular kanas.
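  • This kana-to-romaji expansion can be sketched as follows, using a tiny illustrative kana inventory (the full mapping, and the determination of which kanas may host an internal morpheme boundary, are properties of Japanese, not of this toy table):

        # Kanas whose interior can host a morpheme boundary are split into
        # consonant + vowel letters; kanas that cannot are kept intact.
        SPLITTABLE = {"か": ["(k)", "(a)"],    # ka
                      "な": ["(n)", "(a)"],    # na
                      "た": ["(t)", "(a)"]}    # ta
        KEEP = {"は", "を", "っ"}   # ha, wo, and the small tsu: never split

        def expand(text):
            out = []
            for ch in text:
                if ch in SPLITTABLE:
                    out.extend(SPLITTABLE[ch])
                else:
                    out.append(ch)             # kanji or non-splittable kana
            return "".join(out)

        print(expand("かな"))   # "(k)(a)(n)(a)"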
  • the usefulness of converting kana-kanji Japanese text 23 into kana-kanji-romaji text 202 is not limited to machine translation. It extends to any automatic Japanese language processing systems which involve identification of morphemes. Such systems can include information retrieval systems for retrieving, for example, all occurrences of “to write.”
  • Table 2 gives an example of a partial paradigm of verb inflections in Japanese, where the verbs write, extinguish, stand, and die are used in the example.
  • the dictionary entries are:
  • the dictionary needs only one stem for each verb and there is only one set of suffixes.
  • the translation engine 16 of the automated natural language translation system 10 receives source text 23 and automatically translates it into target natural language text 41, which translation is effected by a parser that performs morphological and syntactic analyses concurrently on the source input text 23.
  • the source language is Japanese
  • the target language is English.
  • any source natural language whose orthography lacks identifiers marking word and phrase boundaries (e.g., Japanese, Korean, or Chinese) can be processed according to this aspect of the invention.
  • the task of parsing sentences of languages such as Japanese, Korean, and Chinese can be compared to the task of parsing English sentences that are spelled without blank spaces between words. This analogy will be used herein to aid in understanding this aspect of the invention.
  • the initial letter "s" is clearly a third-person singular present-tense suffix and is not the first letter of a new word. This fact can be identified only if it is recognized that the previously identified "write" is a verb and that the dictionary form of a verb can be followed by "s". With this grammatical information, the substrings can be changed to: matched substrings "she write-s", remaining substring "letters."
  • the task of the parser is to accept the input string (which is really a string of Japanese, or similar-language, characters), recognize morpheme/word boundaries, and produce a parse tree for the sentence, rooted in the category Sentence and using the following abbreviations: NP = noun phrase, AUX = auxiliary verb, VP = verb phrase, PRN = pronoun, V = verb, DET = determiner, and N = noun.
  • the standard approach to this task is to carry out a heuristic morpheme/word boundary identification pass first, and then to carry out a syntactic pass with the identified morphemes/words as units. That is, with known systems, the input string would first go through a morphological analysis component that identifies morpheme/word boundaries, which would yield:
  • in the dictionary used for this aspect of the invention, each orthographic unit (e.g., "s", "h", "e", etc.) is treated as a word; the input string thus has "s" as a word, "h" as a word, "e" as a word, "d" as a word, etc.
  • the entry in the dictionary 204 for the English word "she" thus is a multi-word entry "s h e".
  • the sequence "s h e" in the input string will match this multi-word dictionary entry in the same way as, say, "in front of" in a regular English input string would match a multi-word idiom entry "in front of" in the regular dictionary.
  • the dictionary 204 for parsing the unsegmented input string would be an all-idiom dictionary (except for entries for single-letter words such as "a" in English).
  • the parsing of the unsegmented input sentence is completed when a set of parses for the sentence is obtained.
  • matched dictionary entries (i.e., "multi-word idioms") represent morphemes.
  • morphological analysis of the input string thus is completed concurrently with the completion of the parsing of the string with grammar rules.
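  • The following greedy sketch illustrates (with the "she writes letters" analogy, and without a real chart parser) how an all-idiom, letter-level dictionary lets morpheme identification fall out of dictionary matching itself; the lexicon and matching strategy are invented for the example:

        # Every letter is a token; the dictionary is "all idioms"
        # (multi-letter entries), so matching an entry identifies a
        # morpheme during analysis itself. A real system would use a
        # chart parser; this longest-match scan only shows the idea.
        LEXICON = {("s", "h", "e"): "PRN",
                   ("w", "r", "i", "t", "e"): "V",
                   ("s",): "SUFFIX",           # third-person singular -s
                   ("l", "e", "t", "t", "e", "r", "s"): "N"}

        def match_morphemes(letters):
            i, found = 0, []
            while i < len(letters):
                for j in range(len(letters), i, -1):   # longest match first
                    entry = tuple(letters[i:j])
                    if entry in LEXICON:
                        found.append(("".join(entry), LEXICON[entry]))
                        i = j
                        break
                else:
                    i += 1                             # skip unknown letter
            return found

        print(match_morphemes(list("shewritesletters")))
        # [('she', 'PRN'), ('write', 'V'), ('s', 'SUFFIX'), ('letters', 'N')]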
  • the grammar rules and dictionary organization can be greatly economized if it is recognized that the string contains the following morphemes:
  • Table 6 shows a "multi-word" idiom dictionary 204 according to the invention, as used by the parser of the translation engine 16.
  • the parser then produces the following parse tree 208:
  • the morphological analysis thus is completed concurrently with the completion of the syntactic parsing of the input string. That is, a sequence of characters at the bottom of the parse tree that is dominated by a single syntactic category constitutes a morpheme.
  • All of the above-described functions and operations may be implemented by a variety of hardwired logic design and/or programming techniques for use with a general purpose computer.
  • the steps as presented in the flowcharts generally need not be applied in the order presented, and some of the steps may be combined.
  • the functionality of the system may be distributed into programs and data in various ways.
  • Any of the embodiments of the automated natural language translation system described herein, including all of the functionality described herein, can be provided as computer software on a computer-readable medium such as a diskette or an optical compact disc (CD) for execution on a general purpose computer (e.g., an Apple Macintosh, an IBM PC or compatible, a Sun Workstation, etc.).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

An automated natural language translation system translates source text written in a source natural language (preferably Japanese) into a target natural language (preferably English). The system also allows an operator to automatically retranslate selected portions of the source text. The system includes one improvement whose purpose is to transform the kanas of the source text into alphabetic letters of the target language, which makes it possible to recognize the presence of a word or phrase boundary in the middle of a kana. Another improvement of the system performs a morphological analysis and a syntactic analysis concurrently on the source text.
PCT/US1996/010283 1992-08-31 1996-06-14 Traitement de la langue naturelle automatise WO1997040453A1 (fr)

Priority Applications (5)

Application Number Priority Date Filing Date Title
JP53802197A JP3680865B2 (ja) 1996-04-23 1996-06-14 自動自然言語翻訳
US09/171,185 US6760695B1 (en) 1992-08-31 1996-06-17 Automated natural language processing
JP50176398A JP2001503540A (ja) 1996-06-14 1997-06-09 アノテートされたテキストの自動翻訳
PCT/US1997/010005 WO1997048058A1 (fr) 1996-06-14 1997-06-09 Traduction automatisee de texte annote
US09/202,013 US6470306B1 (en) 1996-04-23 1997-06-09 Automated translation of annotated text based on the determination of locations for inserting annotation tokens and linked ending, end-of-sentence or language tokens

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
USPCT/US96/05567 1996-04-23
PCT/US1996/005567 WO1997040452A1 (fr) 1996-04-23 1996-04-23 Traduction automatisee de langage naturel

Related Child Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1996/005567 Continuation-In-Part WO1997040452A1 (fr) 1992-08-31 1996-04-23 Traduction automatisee de langage naturel

Publications (1)

Publication Number Publication Date
WO1997040453A1 true WO1997040453A1 (fr) 1997-10-30

Family

ID=22254991

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/US1996/005567 WO1997040452A1 (fr) 1992-08-31 1996-04-23 Traduction automatisee de langage naturel
PCT/US1996/010283 WO1997040453A1 (fr) 1992-08-31 1996-06-14 Traitement de la langue naturelle automatise

Family Applications Before (1)

Application Number Title Priority Date Filing Date
PCT/US1996/005567 WO1997040452A1 (fr) 1992-08-31 1996-04-23 Traduction automatisee de langage naturel

Country Status (2)

Country Link
JP (4) JP3743678B2 (fr)
WO (2) WO1997040452A1 (fr)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000062193A1 (fr) * 1999-04-08 2000-10-19 Kent Ridge Digital Labs Systeme permettant l'identification d'unites lexicales chinoises et la reconnaissance d'entites nommees
US6173252B1 (en) * 1997-03-13 2001-01-09 International Business Machines Corp. Apparatus and methods for Chinese error check by means of dynamic programming and weighted classes
US6278967B1 (en) * 1992-08-31 2001-08-21 Logovista Corporation Automated system for generating natural language translations that are domain-specific, grammar rule-based, and/or based on part-of-speech analysis
US6496844B1 (en) 1998-12-15 2002-12-17 International Business Machines Corporation Method, system and computer program product for providing a user interface with alternative display language choices

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3311957B2 (ja) 1997-03-25 2002-08-05 株式会社東芝 ユーザ辞書構築方法およびユーザ辞書構築装置および翻訳方法および翻訳装置
US6269189B1 (en) * 1998-12-29 2001-07-31 Xerox Corporation Finding selected character strings in text and providing information relating to the selected character strings
US6901360B1 (en) * 1999-12-16 2005-05-31 Xerox Corporation System and method for transferring packed linguistic structures
US8706477B1 (en) 2008-04-25 2014-04-22 Softwin Srl Romania Systems and methods for lexical correspondence linguistic knowledge base creation comprising dependency trees with procedural nodes denoting execute code
US8762131B1 (en) 2009-06-17 2014-06-24 Softwin Srl Romania Systems and methods for managing a complex lexicon comprising multiword expressions and multiword inflection templates
US8762130B1 (en) 2009-06-17 2014-06-24 Softwin Srl Romania Systems and methods for natural language processing including morphological analysis, lemmatizing, spell checking and grammar checking
US9600566B2 (en) 2010-05-14 2017-03-21 Microsoft Technology Licensing, Llc Identifying entity synonyms
US10032131B2 (en) 2012-06-20 2018-07-24 Microsoft Technology Licensing, Llc Data services for enterprises leveraging search system data assets
US9594831B2 (en) 2012-06-22 2017-03-14 Microsoft Technology Licensing, Llc Targeted disambiguation of named entities
US9229924B2 (en) * 2012-08-24 2016-01-05 Microsoft Technology Licensing, Llc Word detection and domain dictionary recommendation
US10445423B2 (en) * 2017-08-17 2019-10-15 International Business Machines Corporation Domain-specific lexically-driven pre-parser

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4805100A (en) * 1986-07-14 1989-02-14 Nippon Hoso Kyokai Language processing method and apparatus
US4964044A (en) * 1986-05-20 1990-10-16 Kabushiki Kaisha Toshiba Machine translation system including semantic information indicative of plural and singular terms
US5448474A (en) * 1993-03-03 1995-09-05 International Business Machines Corporation Method for isolation of Chinese words from connected Chinese text

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS63223962A (ja) * 1987-03-13 1988-09-19 Hitachi Ltd 翻訳装置
JPS63305463A (ja) * 1987-06-05 1988-12-13 Hitachi Ltd 自然言語処理方式
JPH0261763A (ja) * 1988-08-29 1990-03-01 Sharp Corp 機械翻訳装置

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4964044A (en) * 1986-05-20 1990-10-16 Kabushiki Kaisha Toshiba Machine translation system including semantic information indicative of plural and singular terms
US4805100A (en) * 1986-07-14 1989-02-14 Nippon Hoso Kyokai Language processing method and apparatus
US5448474A (en) * 1993-03-03 1995-09-05 International Business Machines Corporation Method for isolation of Chinese words from connected Chinese text

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ABE M ET AL: "A Kana-Kanji translation system for non-segmented input sentences based on syntactic and semantic analysis", Proceedings of COLING '86 (11th International Conference on Computational Linguistics), Bonn, West Germany, 25-29 Aug. 1986, Inst. Angewandte Kommunikations- & Sprachforschung, pages 280-285, XP000612328 *
TELLER V ET AL: "A probabilistic algorithm for segmenting non-Kanji Japanese strings", Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94), Seattle, WA, USA, 31 July - 4 Aug. 1994, MIT Press, Cambridge, MA, USA, ISBN 0-262-61102-3, pages 742-747, vol. 1, XP000612334 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6278967B1 (en) * 1992-08-31 2001-08-21 Logovista Corporation Automated system for generating natural language translations that are domain-specific, grammar rule-based, and/or based on part-of-speech analysis
US6173252B1 (en) * 1997-03-13 2001-01-09 International Business Machines Corp. Apparatus and methods for Chinese error check by means of dynamic programming and weighted classes
US6496844B1 (en) 1998-12-15 2002-12-17 International Business Machines Corporation Method, system and computer program product for providing a user interface with alternative display language choices
WO2000062193A1 (fr) * 1999-04-08 2000-10-19 Kent Ridge Digital Labs Systeme permettant l'identification d'unites lexicales chinoises et la reconnaissance d'entites nommees
US6311152B1 (en) 1999-04-08 2001-10-30 Kent Ridge Digital Labs System for chinese tokenization and named entity recognition

Also Published As

Publication number Publication date
WO1997040452A1 (fr) 1997-10-30
JP2003016061A (ja) 2003-01-17
JP2001515616A (ja) 2001-09-18
JP3743678B2 (ja) 2006-02-08
JP2000514214A (ja) 2000-10-24
JP2006164293A (ja) 2006-06-22
JP3680865B2 (ja) 2005-08-10

Similar Documents

Publication Publication Date Title
US6760695B1 (en) Automated natural language processing
US6278967B1 (en) Automated system for generating natural language translations that are domain-specific, grammar rule-based, and/or based on part-of-speech analysis
US5528491A (en) Apparatus and method for automated natural language translation
US6470306B1 (en) Automated translation of annotated text based on the determination of locations for inserting annotation tokens and linked ending, end-of-sentence or language tokens
Trujillo Translation engines: techniques for machine translation
WO2002039318A1 (fr) Ponderation de traductions pouvant etre modifiee par un utilisateur
WO1997040453A1 (fr) Traitement de la langue naturelle automatise
Tajalli et al. Developing an informal-formal Persian corpus
WO1997048058A1 (fr) Traduction automatisee de texte annote
WO1997048058A9 (fr) Traduction automatisee de texte annote
Kempen et al. Author environments: Fifth generation text processors
Keenan Large vocabulary syntactic analysis for text recognition
KR100327115B1 (ko) 부분 대역 패턴 데이터베이스에 기반한 번역문 생성장치및 그 방법
Rajendran Parsing in Tamil: Present state of art
JP2632806B2 (ja) 言語解析装置
Tsutsumi A prototype English-Japanese machine translation system for translating IBM computer manuals
Samir et al. Training and evaluation of TreeTagger on Amazigh corpus
JP3743711B2 (ja) 自動自然言語翻訳システム
Altynbekova Artificial intelligence and translation technology
Yamron et al. LINGSTAT: An interactive, machine-aided translation system
Rodrigues et al. Arabic data science toolkit: An api for arabic language feature extraction
Tanev et al. LINGUA: a robust architecture for text processing and anaphora resolution in Bulgarian
WO1998057271A1 (fr) Systeme automatique de traduction et de retraduction
Zhou Super-Function Based Machine Translation System for Business User
Kozerenko Semantic Representations for Multilingual Natural Language Processing

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): JP US

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
WWE Wipo information: entry into national phase

Ref document number: 09171185

Country of ref document: US
