
CN113239707B - Text translation method, text translation device and storage medium - Google Patents


Info

Publication number
CN113239707B
Authority
CN
China
Prior art keywords
text
target entity
translated
translation
abbreviations
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110226769.5A
Other languages
Chinese (zh)
Other versions
CN113239707A (en)
Inventor
孙于惠
李响
刘凯
成亦薇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Mobile Software Co Ltd
Beijing Xiaomi Pinecone Electronic Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Beijing Xiaomi Pinecone Electronic Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd, Beijing Xiaomi Pinecone Electronic Co Ltd filed Critical Beijing Xiaomi Mobile Software Co Ltd
Priority to CN202110226769.5A priority Critical patent/CN113239707B/en
Publication of CN113239707A publication Critical patent/CN113239707A/en
Application granted granted Critical
Publication of CN113239707B publication Critical patent/CN113239707B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

本公开是关于一种文本翻译方法、文本翻译装置及存储介质。文本翻译方法包括:获取待翻译文本,并识别所述待翻译文本中包括的目标实体词、以及所述目标实体词对应的缩写词;将所述缩写词,全部替换为所述目标实体词,得到所述待翻译文本对应的第一文本;基于第一文本,确定待翻译文本的翻译结果。通过本公开实施例,能够使待翻译文本中包括的目标实体词以及与目标实体词对应的缩写词具有相同的翻译结果,确保文本中目标实体词及其缩写词翻译的一致性。

The present disclosure relates to a text translation method, a text translation device and a storage medium. The text translation method comprises: obtaining a text to be translated, and identifying target entity words included in the text to be translated, and abbreviations corresponding to the target entity words; replacing all the abbreviations with the target entity words to obtain a first text corresponding to the text to be translated; and determining the translation result of the text to be translated based on the first text. Through the embodiments of the present disclosure, the target entity words included in the text to be translated and the abbreviations corresponding to the target entity words can have the same translation result, ensuring the consistency of the translation of the target entity words and their abbreviations in the text.

Description

文本翻译方法、文本翻译装置及存储介质Text translation method, text translation device and storage medium

技术领域Technical Field

本公开涉及语言处理技术领域,尤其涉及文本翻译方法、文本翻译装置及存储介质。The present disclosure relates to the field of language processing technology, and in particular to a text translation method, a text translation device and a storage medium.

背景技术Background Art

随着国际间频繁的合作往来,翻译行业的翻译质量和效率都遇到了很大的挑战,而随着人工智能的高速发展,机器翻译在翻译行业的巨大潜力开始逐步显现。机器翻译,利用计算机将一种自然语言转换为另一种自然语言。在大规模训练数据的支持下,机器翻译取得了较高质量,在准确性方面有了很大突破,在某些领域已经可以达到和人工译文媲美的程度。With the frequent international cooperation and exchanges, the translation industry has encountered great challenges in terms of translation quality and efficiency. With the rapid development of artificial intelligence, the huge potential of machine translation in the translation industry has gradually emerged. Machine translation uses computers to convert one natural language into another. With the support of large-scale training data, machine translation has achieved high quality and made great breakthroughs in accuracy. In some areas, it can already reach a level comparable to human translation.

但是,在某些实际翻译应用中,仍面临着一些问题。对于包括多个语句的一篇文本内容,需保证其中出现的指代同一对象的同一实体在翻译时保持一致。目前的机器翻译多将文本拆成单句,逐句进行翻译,因此,同一实体在不同的句子中可能会产生不同的翻译,在翻译中出现前后译文不连贯、实体翻译不一致等问题。特别是在同一文本中,同时存在实体以及实体的缩写词时,翻译不一致的现象更为严重,导致机器翻译效率低,翻译效果差。However, in some practical translation applications, there are still some problems. For a text content including multiple sentences, it is necessary to ensure that the same entity referring to the same object remains consistent during translation. Current machine translations often break the text into single sentences and translate them sentence by sentence. Therefore, the same entity may have different translations in different sentences, resulting in incoherent translations and inconsistent entity translations. In particular, when entities and their abbreviations exist in the same text, the phenomenon of inconsistent translation is more serious, resulting in low machine translation efficiency and poor translation results.

发明内容Summary of the invention

为克服相关技术中存在的问题,本公开提供文本翻译方法、文本翻译装置及存储介质。In order to overcome the problems existing in the related art, the present disclosure provides a text translation method, a text translation device and a storage medium.

根据本公开实施例的一方面,提供一种文本翻译方法,所述文本翻译方法包括:获取待翻译文本,并识别所述待翻译文本中包括的目标实体词、以及与所述目标实体词对应的缩写词;将所述缩写词,全部替换为所述目标实体词,得到所述待翻译文本对应的第一文本;基于所述第一文本,确定所述待翻译文本的翻译结果。According to one aspect of an embodiment of the present disclosure, a text translation method is provided, which includes: obtaining a text to be translated, and identifying target entity words included in the text to be translated, and abbreviations corresponding to the target entity words; replacing all the abbreviations with the target entity words to obtain a first text corresponding to the text to be translated; and determining a translation result of the text to be translated based on the first text.

在一些实施例中,基于所述第一文本,确定所述待翻译文本的翻译结果,包括:将所述第一文本中的所述目标实体词,以同一替换符进行替换,得到所述待翻译文本对应的第二文本;对所述第二文本中除所述替换符以外的其他文本进行翻译,得到第一翻译结果,并对所述目标实体词进行翻译,得到目标实体词的翻译结果;将所述第一翻译结果中的替换符替换为所述目标实体词的翻译结果,得到所述待翻译文本的最终翻译结果。In some embodiments, based on the first text, determining the translation result of the text to be translated includes: replacing the target entity word in the first text with the same replacement symbol to obtain a second text corresponding to the text to be translated; translating other texts in the second text except the replacement symbol to obtain a first translation result, and translating the target entity word to obtain a translation result of the target entity word; replacing the replacement symbol in the first translation result with the translation result of the target entity word to obtain a final translation result of the text to be translated.

在一些实施例中,所述识别所述待翻译文本中包括的目标实体词、以及与所述目标实体词对应的缩写词,包括:确定用于识别所述目标实体词以及与所述目标实体词对应的缩写词的规则;基于所述规则,识别待翻译文本内包括的所述目标实体词以及与所述目标实体词对应的缩写词。In some embodiments, the identifying of the target entity words included in the text to be translated and the abbreviations corresponding to the target entity words includes: determining rules for identifying the target entity words and the abbreviations corresponding to the target entity words; and based on the rules, identifying the target entity words included in the text to be translated and the abbreviations corresponding to the target entity words.

在一些实施例中,所述识别所述待翻译文本中包括的目标实体词、以及与所述目标实体词对应的缩写词,包括:基于指代消解模型,确定待翻译文本内包括的所述目标实体词以及与所述目标实体词对应的缩写词,和/或基于实体词与缩写词的对应关系,确定待翻译文本内包括的所述目标实体词以及与所述目标实体词对应的缩写词。In some embodiments, the identifying of the target entity words included in the text to be translated and the abbreviations corresponding to the target entity words includes: determining the target entity words included in the text to be translated and the abbreviations corresponding to the target entity words based on a reference resolution model, and/or determining the target entity words included in the text to be translated and the abbreviations corresponding to the target entity words based on the correspondence between entity words and abbreviations.

在一些实施例中,目标实体词包括人名,所述人名包括第一类型人名和第二类型人名,所述人名包括第一部分和第二部分;所述识别所述待翻译文本中包括的目标实体词、以及与所述目标实体词对应的缩写词,包括:基于用于确定人名的正则表达式,确定所述待翻译文本中包括的所述人名;若所述人名为所述第一类型人名,且所述待翻译文本中存在所述第一类型人名的所述第一部分,确定识别到所述第一类型人名对应的缩写词;若所述人名为所述第二类型人名,且所述待翻译文本中存在所述第二类型人名的所述第二部分,确定识别到所述第二类型人名对应的缩写词。In some embodiments, the target entity words include names, and the names include first-type names and second-type names, and the names include a first part and a second part; the identifying the target entity words included in the text to be translated, and the abbreviations corresponding to the target entity words, includes: determining the names included in the text to be translated based on a regular expression for determining names; if the names are names of the first type, and the first part of the names of the first type exists in the text to be translated, determining that the abbreviations corresponding to the names of the first type are identified; if the names are names of the second type, and the second part of the names of the second type exists in the text to be translated, determining that the abbreviations corresponding to the names of the second type are identified.
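
The name-based identification above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the regular expression `NAME_RE`, the function `find_name_abbreviations`, and the caller-supplied classifier `is_first_type` are all hypothetical. The text does not define the two name types here, so the sketch assumes, purely for illustration, that a first-type name is abbreviated by its first part and a second-type name by its second part, and that a name is two capitalized words.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical, deliberately naive pattern: two capitalized words in a row.
NAME_RE = re.compile(r"\b([A-Z][a-z]+) ([A-Z][a-z]+)\b")


def find_name_abbreviations(
    text: str, is_first_type: Callable[[str], bool]
) -> List[Tuple[str, str]]:
    """Return (full name, abbreviation) pairs found in `text`.

    `is_first_type` is an assumed classifier that decides whether a
    detected name belongs to the first type (otherwise second type).
    """
    pairs: List[Tuple[str, str]] = []
    for match in NAME_RE.finditer(text):
        full, first, second = match.group(0), match.group(1), match.group(2)
        # First-type names are checked for their first part,
        # second-type names for their second part.
        part = first if is_first_type(full) else second
        # Blank out the full name so that only standalone uses of the part count.
        remainder = text.replace(full, " ")
        if re.search(rf"\b{re.escape(part)}\b", remainder):
            pair = (full, part)
            if pair not in pairs:
                pairs.append(pair)
    return pairs
```

The pattern would also match any other pair of adjacent capitalized words, so a practical system would need a stricter name detector.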

根据本公开实施例的又一方面,提供一种文本翻译装置,所述文本翻译装置包括:获取模块,用于获取待翻译文本;识别模块,用于识别所述待翻译文本中包括的目标实体词、以及与所述目标实体词对应的缩写词;确定模块,用于将所述缩写词,全部替换为所述目标实体词,得到所述待翻译文本对应的第一文本,并基于所述第一文本,确定所述待翻译文本的翻译结果。According to another aspect of an embodiment of the present disclosure, a text translation device is provided, comprising: an acquisition module for acquiring a text to be translated; an identification module for identifying target entity words included in the text to be translated, and abbreviations corresponding to the target entity words; a determination module for replacing all the abbreviations with the target entity words to obtain a first text corresponding to the text to be translated, and determining a translation result of the text to be translated based on the first text.

在一些实施例中,所述确定模块采用如下方式基于所述第一文本,确定所述待翻译文本的翻译结果:将所述第一文本中的所述目标实体词,以同一替换符进行替换,得到所述待翻译文本对应的第二文本;对所述第二文本中除所述替换符以外的其他文本进行翻译,得到第一翻译结果,并对所述目标实体词进行翻译,得到目标实体词的翻译结果;将所述第一翻译结果中的替换符替换为所述目标实体词的翻译结果,得到所述待翻译文本的最终翻译结果。In some embodiments, the determination module determines the translation result of the text to be translated based on the first text in the following manner: replace the target entity word in the first text with the same replacement symbol to obtain a second text corresponding to the text to be translated; translate other texts in the second text except the replacement symbol to obtain a first translation result, and translate the target entity word to obtain a translation result of the target entity word; replace the replacement symbol in the first translation result with the translation result of the target entity word to obtain a final translation result of the text to be translated.

在一些实施例中,所述识别模块采用如下方式识别所述待翻译文本中包括的目标实体词、以及与所述目标实体词对应的缩写词:确定用于识别所述目标实体词以及与所述目标实体词对应的缩写词的规则;基于所述规则,识别待翻译文本内包括的所述目标实体词以及与所述目标实体词对应的缩写词。In some embodiments, the recognition module recognizes the target entity words and the abbreviations corresponding to the target entity words included in the text to be translated in the following manner: determining rules for recognizing the target entity words and the abbreviations corresponding to the target entity words; based on the rules, recognizing the target entity words and the abbreviations corresponding to the target entity words included in the text to be translated.

在一些实施例中,所述识别模块采用如下方式识别所述待翻译文本中包括的目标实体词、以及与所述目标实体词对应的缩写词:基于指代消解模型,确定待翻译文本内包括的所述目标实体词以及所述目标实体词对应的缩写词,和/或基于实体词与缩写词的对应关系,确定待翻译文本内包括的所述目标实体词以及与所述目标实体词对应的缩写词。In some embodiments, the recognition module identifies the target entity words included in the text to be translated and the abbreviations corresponding to the target entity words in the following manner: based on a reference resolution model, determining the target entity words included in the text to be translated and the abbreviations corresponding to the target entity words, and/or based on the correspondence between entity words and abbreviations, determining the target entity words included in the text to be translated and the abbreviations corresponding to the target entity words.

在一些实施例中,目标实体词包括人名,所述人名包括第一类型人名和第二类型人名,所述人名包括第一部分和第二部分;所述识别模块采用如下方式识别所述待翻译文本中包括的目标实体词、以及与所述目标实体词对应的缩写词,包括:基于用于确定人名的正则表达式,确定所述待翻译文本中包括的所述人名;若所述人名为所述第一类型人名,且所述待翻译文本中存在所述第一类型人名的所述第一部分,确定识别到所述第一类型人名对应的缩写词;若所述人名为所述第二类型人名,且所述待翻译文本中存在所述第二类型人名的所述第二部分,确定识别到所述第二类型人名对应的缩写词。In some embodiments, the target entity words include names, and the names include first-type names and second-type names, and the names include a first part and a second part; the recognition module recognizes the target entity words included in the text to be translated, and the abbreviations corresponding to the target entity words in the text to be translated, including: determining the names included in the text to be translated based on a regular expression for determining names; if the names are names of the first type, and the first part of the names of the first type exists in the text to be translated, determining that the abbreviations corresponding to the names of the first type are recognized; if the names are names of the second type, and the second part of the names of the second type exists in the text to be translated, determining that the abbreviations corresponding to the names of the second type are recognized.

根据本公开实施例的又一方面,提供一种文本翻译装置,包括:处理器;用于存储处理器可执行指令的存储器;其中,处理器被配置为:执行前述任意一项所述的文本翻译方法。According to another aspect of an embodiment of the present disclosure, a text translation device is provided, comprising: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to: execute any one of the aforementioned text translation methods.

根据本公开实施例的又一方面,提供一种非临时性计算机可读存储介质,当存储介质中的指令由移动终端的处理器执行时,使得移动终端能够执行前述任意一项所述的文本翻译方法。According to another aspect of an embodiment of the present disclosure, a non-transitory computer-readable storage medium is provided. When instructions in the storage medium are executed by a processor of a mobile terminal, the mobile terminal can execute any of the aforementioned text translation methods.

本公开的实施例提供的技术方案可以包括以下有益效果:通过本公开实施例,获取待翻译文本,并识别待翻译文本中包括的目标实体词、与目标实体词对应的缩写词,将待翻译文本中的缩写词全部替换为目标实体词,对替换后的待翻译文本进行翻译,通过缩写词的替换能够使待翻译文本中包括的目标实体词以及与目标实体词对应的缩写词具有相同的翻译结果,确保文本中目标实体词及其缩写词翻译的一致性。The technical solution provided by the embodiments of the present disclosure may include the following beneficial effects: through the embodiments of the present disclosure, a text to be translated is obtained, and target entity words and abbreviations corresponding to the target entity words included in the text to be translated are identified, all abbreviations in the text to be translated are replaced with target entity words, and the replaced text to be translated is translated. By replacing the abbreviations, the target entity words included in the text to be translated and the abbreviations corresponding to the target entity words can have the same translation results, thereby ensuring the consistency of the translation of the target entity words and their abbreviations in the text.

应当理解的是,以上的一般描述和后文的细节描述仅是示例性和解释性的,并不能限制本公开。It is to be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the present disclosure.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

此处的附图被并入说明书中并构成本说明书的一部分,示出了符合本公开的实施例,并与说明书一起用于解释本公开的原理。The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure.

图1是根据本公开一示例性实施例示出的一种文本翻译方法的流程图。Fig. 1 is a flow chart showing a text translation method according to an exemplary embodiment of the present disclosure.

图2是根据本公开一示例性实施例示出的一种确定待翻译文本翻译结果的方法的流程图。Fig. 2 is a flow chart showing a method for determining a translation result of a text to be translated according to an exemplary embodiment of the present disclosure.

图3是根据本公开一示例性实施例示出的一种识别目标实体词以及与目标实体词对应的缩写词方法的流程图。Fig. 3 is a flowchart of a method for identifying a target entity word and an abbreviation corresponding to the target entity word according to an exemplary embodiment of the present disclosure.

图4是根据本公开一示例性实施例示出的一种文本翻译装置框图。Fig. 4 is a block diagram of a text translation device according to an exemplary embodiment of the present disclosure.

图5根据本公开一示例性实施例示出的一种用于文本翻译的装置的框图。Fig. 5 is a block diagram of a device for text translation according to an exemplary embodiment of the present disclosure.

具体实施方式DETAILED DESCRIPTION

这里将详细地对示例性实施例进行说明,其示例表示在附图中。下面的描述涉及附图时,除非另有表示,不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表与本公开相一致的所有实施方式。相反,它们仅是与如所附权利要求书中所详述的、本公开的一些方面相一致的装置和方法的例子。Exemplary embodiments will be described in detail herein, examples of which are shown in the accompanying drawings. When the following description refers to the drawings, the same numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present disclosure. Instead, they are merely examples of devices and methods consistent with some aspects of the present disclosure as detailed in the appended claims.

机器翻译,又称为自动翻译,是利用计算机将一种自然语言即源语言,转换为另一种自然语言即目标语言的方法。在大规模训练数据的支持下,机器翻译取得了较高质量,在准确性方面有了很大突破,在某些领域已经可以达到和人工译文媲美的程度。机器翻译研究方法分为规则和统计两种。由于规则系统开发周期长,资金和人力的需求大,规则系统进展缓慢。相对而言,统计方法开发周期短、便于处理大规模语料等优点而显出优势。在统计机器翻译方法中,基于短语的翻译方法得到充分的发展。Machine translation, also known as automatic translation, is a method of using computers to convert a natural language, namely the source language, into another natural language, namely the target language. With the support of large-scale training data, machine translation has achieved high quality and made great breakthroughs in accuracy. In some areas, it can already reach a level comparable to human translation. There are two types of machine translation research methods: rule-based and statistical. Due to the long development cycle of the rule system and the large demand for funds and manpower, the rule system has made slow progress. Relatively speaking, the statistical method has advantages such as a short development cycle and the convenience of processing large-scale corpus. Among the statistical machine translation methods, the phrase-based translation method has been fully developed.

在某些实际翻译应用中,仍面临着一些问题。对于包括多个语句的一篇文本内容,需保证其中出现的指代同一对象的同一实体在翻译时保持一致。目前的机器翻译多将文本拆成单句,逐句进行翻译。Some practical translation applications still face problems. For a text that includes multiple sentences, the same entity referring to the same object must be translated consistently wherever it appears. Current machine translation mostly splits the text into single sentences and translates them sentence by sentence.

例如,在一些可能的实现方式中,从包括大量的中文单语篇章数据Y,抽取其中一条数据y=[y1,y2,…,yn],n代表篇章数据Y中包括的n个子句。基于中英翻译模型,得到数据y在句子级别下的翻译结果为x,x=[x1,x2,……,xn]。利用英中翻译模型,得到数据x的翻译结果为y’,y’=[y1’,y2’,…,yn’]。利用(y’,y)训练翻译模型,保证实体翻译一致性。For example, in some possible implementations, one item y=[y1, y2, ..., yn] is extracted from a large body of Chinese monolingual document data Y, where n is the number of clauses included. A Chinese-English translation model produces the sentence-level translation of y, namely x=[x1, x2, ..., xn]. An English-Chinese translation model then produces the translation of x, namely y'=[y1', y2', ..., yn']. The translation model is trained with the pair (y', y) to promote consistent entity translation.
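
The round-trip construction of the training pair (y', y) described above can be sketched as follows. This is a minimal illustration, not the referenced implementation: `build_round_trip_pair` and the two sentence-level translator callables `zh_to_en` and `en_to_zh` are hypothetical names that stand in for whatever translation models are actually used.

```python
from typing import Callable, List, Tuple

Document = List[str]  # one entry per clause, i.e. y = [y1, ..., yn]


def build_round_trip_pair(
    y: Document,
    zh_to_en: Callable[[str], str],
    en_to_zh: Callable[[str], str],
) -> Tuple[Document, Document]:
    """Build one training pair (y', y) from a monolingual document y."""
    # Sentence-level Chinese-to-English translation: x = [x1, ..., xn].
    x = [zh_to_en(clause) for clause in y]
    # Sentence-level English-to-Chinese translation: y' = [y1', ..., yn'].
    y_prime = [en_to_zh(clause) for clause in x]
    # The sentence-level output y' is paired with the original document y.
    return y_prime, y
```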

上述表格示出了利用翻译模型进行英文文本的中文翻译时得到的译文,待翻译英文文本包括两个语句。通过对文本的阅读分析,可知第一个语句中的“Chen Wei”和第二个语句中的“Chen”,指代的是同一对象,第二个语句中的“Chen”为第一个语句中“Chen Wei”的缩写。然而,基于翻译模型在句子级别翻译下,“Chen Wei”和“Chen”在中文译文中生成了不同的翻译。为了保证人名翻译的一致性,可以是基于篇章翻译模型进行建模,例如,以第一句话作为源文件内容(Source Document Context),解码第二句话,即源当前语句(Source Current Sentence)。在对第二句话进行建模时,引入了上下文(Context)信息,以引导翻译模型在生成译文的时候和已经生成译文的部分保持连贯性,和相关实体翻译的一致性。但是模型学习Context信息不充分时,篇章模型建模的方法仍不能保证对人名和其缩写的翻译结果的一致性。The above table shows the output obtained when a translation model translates an English text into Chinese. The English text to be translated includes two sentences. Reading the text shows that "Chen Wei" in the first sentence and "Chen" in the second sentence refer to the same person, and that "Chen" in the second sentence is an abbreviation of "Chen Wei" in the first. However, under sentence-level translation, the translation model renders "Chen Wei" and "Chen" differently in the Chinese output. To keep the translation of personal names consistent, modeling can be based on a document-level translation model: for example, the first sentence serves as the source document context (Source Document Context) while the second sentence, the source current sentence (Source Current Sentence), is decoded. Modeling the second sentence with this context (Context) information guides the translation model to stay coherent with the translation already generated and to translate related entities consistently. However, when the model does not learn the context information sufficiently, document-level modeling still cannot guarantee that a personal name and its abbreviation are translated consistently.

由此,本公开提供一种文本翻译方法,通过获取待翻译文本,并识别待翻译文本中包括的目标实体词、目标实体词对应的缩写词,将待翻译文本中的缩写词全部替换为目标实体词,通过缩写词的替换能够使实体词以及实体词对应的缩写词具有相同的翻译结果。Therefore, the present disclosure provides a text translation method, which obtains a text to be translated, identifies target entity words and abbreviations corresponding to the target entity words included in the text to be translated, and replaces all abbreviations in the text to be translated with the target entity words. By replacing the abbreviations, the entity words and the abbreviations corresponding to the entity words can have the same translation results.

图1是根据本公开一示例性实施例示出的一种文本翻译方法的流程图,如图1所示,文本翻译方法包括以下步骤。FIG. 1 is a flow chart of a text translation method according to an exemplary embodiment of the present disclosure. As shown in FIG. 1 , the text translation method includes the following steps.

在步骤S101中,获取待翻译文本,并识别待翻译文本中包括的目标实体词、以及与目标实体词对应的缩写词。In step S101, a text to be translated is obtained, and target entity words and abbreviations corresponding to the target entity words included in the text to be translated are identified.

在步骤S102中,将缩写词,全部替换为目标实体词,得到待翻译文本对应的第一文本。In step S102, all abbreviations are replaced with target entity words to obtain a first text corresponding to the text to be translated.

在步骤S103中,基于第一文本,确定待翻译文本的翻译结果。In step S103, a translation result of the text to be translated is determined based on the first text.

在本公开实施例中,文本翻译方法可应用于各种类型的电子设备中,电子设备可以是移动设备和固定设备,移动设备可以是手机、平板电脑等,固定设备包括个人电脑、智能掌上助理等。文本翻译方法还可应用于终端应用或者网页中。待翻译文本可以是包括多个句子的文本,例如篇章文本等;待翻译文本中包括目标实体词,该目标实体词可以是人名、地名、机构、专有名词、术语或组织等名称。In the disclosed embodiments, the text translation method can be applied to various types of electronic devices, and the electronic devices can be mobile devices and fixed devices. The mobile devices can be mobile phones, tablet computers, etc., and the fixed devices include personal computers, smart handheld assistants, etc. The text translation method can also be applied to terminal applications or web pages. The text to be translated can be a text including multiple sentences, such as a chapter text, etc. The text to be translated includes a target entity word, which can be a name of a person, a place name, an institution, a proper noun, a term or an organization.

在本公开实施例中,对待翻译文本进行翻译时,获取待翻译文本,待翻译文本中包括目标实体词、待翻译文本中还包括目标实体词对应的缩写词。例如,待翻译文本为英文语言文本,包括两个句子,即“A year later, Dr. Chen Wei was ordered to help the country build a subway. Her husband said when they were told a frame of Chen was aired on a night TV show, he was surprised”。待翻译文本的目标实体词为人名“Chen Wei”,与其对应的缩写词为“Chen”。将待翻译文本中的缩写词“Chen”,全部替换为缩写词“Chen”对应的目标实体词“Chen Wei”,得到待翻译文本对应的第一文本。可知,第一文本中的“Chen”全部被替换为“Chen Wei”。上述第一文本为“A year later, Dr. Chen Wei was ordered to help the country build a subway. Her husband said when they were told a frame of Chen Wei was aired on a night TV show, he was surprised”。通过替换,使待翻译文本中包括的目标实体词以及与目标实体词对应的缩写词保持一致。对进行替换后的第一文本进行翻译时,可以是利用翻译模型进行翻译,翻译模型可以是神经网络模型,例如,基于卷积神经网络模型(CNN)、循环神经网络模型(RNN)或长短时记忆系统(LSTM)等训练得到,本公开实施例对此不做限制。经过翻译,得到待翻译文本的翻译结果。In the embodiments of the present disclosure, when a text is to be translated, the text to be translated is obtained; it includes a target entity word and also an abbreviation corresponding to the target entity word. For example, the text to be translated is an English text of two sentences, namely "A year later, Dr. Chen Wei was ordered to help the country build a subway. Her husband said when they were told a frame of Chen was aired on a night TV show, he was surprised". The target entity word of the text to be translated is the personal name "Chen Wei", and its corresponding abbreviation is "Chen". Every occurrence of the abbreviation "Chen" in the text to be translated is replaced with the corresponding target entity word "Chen Wei" to obtain the first text corresponding to the text to be translated. In the first text, every standalone "Chen" has therefore been replaced with "Chen Wei". The first text is "A year later, Dr. Chen Wei was ordered to help the country build a subway. Her husband said when they were told a frame of Chen Wei was aired on a night TV show, he was surprised". The replacement makes the target entity word and its corresponding abbreviation in the text to be translated identical. The first text obtained after the replacement can be translated with a translation model. The translation model can be a neural network model, for example one obtained by training a convolutional neural network (CNN), a recurrent neural network (RNN), or a long short-term memory (LSTM) network; the embodiments of the present disclosure do not limit this. The translation yields the translation result of the text to be translated.
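
The replacement of step S102 can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the disclosed implementation: the function name `expand_abbreviations` is hypothetical, the entity word and its abbreviation are assumed to have been identified already in step S101, and whole-word regular-expression matching is one possible way to avoid expanding the "Chen" that is already part of "Chen Wei".

```python
import re


def expand_abbreviations(text: str, entity: str, abbreviation: str) -> str:
    """Replace standalone occurrences of `abbreviation` with `entity`."""
    # The full entity is tried first, so "Chen Wei" is consumed as a unit
    # and is not turned into "Chen Wei Wei"; a bare "Chen" matches second.
    pattern = re.compile(
        rf"\b(?:{re.escape(entity)}|{re.escape(abbreviation)})\b"
    )
    # A function replacement keeps any backslashes in `entity` literal.
    return pattern.sub(lambda _match: entity, text)
```

Applied to the example above with `entity="Chen Wei"` and `abbreviation="Chen"`, the second sentence would read "a frame of Chen Wei was aired", while the first sentence is left unchanged.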

根据本公开实施例,通过获取待翻译文本,并识别待翻译文本中包括的目标实体词、目标实体词对应的缩写词,将待翻译文本中的缩写词全部替换为目标实体词,对替换后的待翻译文本进行翻译,通过缩写词的替换能够使待翻译文本中包括的目标实体词以及与目标实体词对应的缩写词具有相同的翻译结果,确保文本中目标实体词及其缩写词翻译的一致性。According to an embodiment of the present disclosure, by obtaining a text to be translated and identifying target entity words and abbreviations corresponding to the target entity words included in the text to be translated, all abbreviations in the text to be translated are replaced with target entity words, and the replaced text to be translated is translated. By replacing the abbreviations, the target entity words included in the text to be translated and the abbreviations corresponding to the target entity words can have the same translation results, thereby ensuring the consistency of the translation of the target entity words and their abbreviations in the text.

图2是根据本公开一示例性实施例示出的一种确定待翻译文本翻译结果的方法的流程图,如图2所示,确定待翻译文本翻译结果的方法包括以下步骤。FIG2 is a flow chart showing a method for determining a translation result of a text to be translated according to an exemplary embodiment of the present disclosure. As shown in FIG2 , the method for determining a translation result of a text to be translated includes the following steps.

在步骤S201中,将第一文本中的目标实体词,以同一替换符进行替换,得到待翻译文本对应的第二文本。In step S201, the target entity word in the first text is replaced with the same replacement symbol to obtain a second text corresponding to the text to be translated.

在步骤S202中,对第二文本中除替换符以外的其他文本进行翻译,得到第一翻译结果,并对目标实体词进行翻译,得到目标实体词的翻译结果。In step S202, the other texts in the second text except the replacement symbol are translated to obtain a first translation result, and the target entity word is translated to obtain a translation result of the target entity word.

在步骤S203中,将第一翻译结果中的替换符替换为目标实体词的翻译结果,得到待翻译文本的最终翻译结果。In step S203, the replacement symbol in the first translation result is replaced with the translation result of the target entity word to obtain a final translation result of the text to be translated.

在本公开实施例中,对待翻译文本进行翻译时,获取待翻译文本,待翻译文本中包括目标实体词、待翻译文本中还包括目标实体词对应的缩写词。将待翻译文本中的目标实体词对应的缩写词,全部替换为目标实体词,得到待翻译文本对应的第一文本。将第一文本中的全部目标实体词,以同一替换符进行替换,得到待翻译文本对应的第二文本。可以理解地,待翻译文本、待翻译文本对应的第一文本,待翻译文本对应的第二文本具有对应关系。In an embodiment of the present disclosure, when translating a text to be translated, the text to be translated is obtained, and the text to be translated includes a target entity word, and the text to be translated also includes an abbreviation corresponding to the target entity word. All the abbreviations corresponding to the target entity words in the text to be translated are replaced with the target entity words to obtain a first text corresponding to the text to be translated. All the target entity words in the first text are replaced with the same replacement symbol to obtain a second text corresponding to the text to be translated. It can be understood that the text to be translated, the first text corresponding to the text to be translated, and the second text corresponding to the text to be translated have a corresponding relationship.

仍以上述待翻译文本为例,待翻译文本为英文文本,“A year later, Dr. Chen Wei was ordered to help the country build a subway. Her husband said when they were told a frame of Chen was aired on a night TV show, he was surprised”。待翻译文本中的目标实体词为“Chen Wei”,“Chen Wei”对应的缩写词为“Chen”。待翻译文本对应的第一文本为“A year later, Dr. Chen Wei was ordered to help the country build a subway. Her husband said when they were told a frame of Chen Wei was aired on a night TV show, he was surprised”。将第一文本中的两个目标实体词“Chen Wei”,以同一替换符“$tag”进行替换,得到待翻译文本对应的第二文本。即第二文本为“A year later, Dr. $tag was ordered to help the country build a subway. Her husband said when they were told a frame of $tag was aired on a night TV show, he was surprised”。对第二文本中除替换符以外的其他文本进行英译中翻译,得到第一翻译结果,即第一翻译结果为“一年后,$tag博士奉命帮助这个国家建设地铁。她的丈夫说,当他们被告知$tag的一帧画面在一个夜间电视节目中播出时,他感到很惊讶”。对目标实体词“Chen Wei”进行翻译,得到目标实体词的翻译结果“陈伟”。将第一翻译结果中的替换符$tag替换为目标实体词的翻译结果,得到待翻译文本的最终翻译结果。待翻译文本对应的翻译结果为“一年后,陈伟博士奉命帮助这个国家建设地铁。她的丈夫说,当他们被告知陈伟的一帧画面在一个夜间电视节目中播出时,他感到很惊讶”,确保文本中目标实体词以及与目标实体词对应的缩写词翻译的一致性。Still taking the above text as an example, the text to be translated is the English text "A year later, Dr. Chen Wei was ordered to help the country build a subway. Her husband said when they were told a frame of Chen was aired on a night TV show, he was surprised". The target entity word in the text to be translated is "Chen Wei", and the abbreviation corresponding to "Chen Wei" is "Chen". The first text corresponding to the text to be translated is "A year later, Dr. Chen Wei was ordered to help the country build a subway. Her husband said when they were told a frame of Chen Wei was aired on a night TV show, he was surprised". The two occurrences of the target entity word "Chen Wei" in the first text are replaced with the same replacement symbol "$tag" to obtain the second text corresponding to the text to be translated, namely "A year later, Dr. $tag was ordered to help the country build a subway. Her husband said when they were told a frame of $tag was aired on a night TV show, he was surprised". The text in the second text other than the replacement symbol is translated from English into Chinese to obtain the first translation result, namely "一年后,$tag博士奉命帮助这个国家建设地铁。她的丈夫说,当他们被告知$tag的一帧画面在一个夜间电视节目中播出时,他感到很惊讶". The target entity word "Chen Wei" is translated to obtain its translation result "陈伟". The replacement symbol $tag in the first translation result is then replaced with the translation result of the target entity word to obtain the final translation result of the text to be translated, namely "一年后,陈伟博士奉命帮助这个国家建设地铁。她的丈夫说,当他们被告知陈伟的一帧画面在一个夜间电视节目中播出时,他感到很惊讶". This ensures that the target entity word and its corresponding abbreviation are translated consistently in the text.

According to this embodiment, the target entity word in the first text is replaced with the same replacement symbol to obtain the second text; the text in the second text other than the replacement symbol is translated to obtain the first translation result; the target entity word is translated separately to obtain its translation result; and the replacement symbol in the first translation result is replaced with that translation result to obtain the final translation result of the text to be translated. Because the abbreviations are first replaced with the target entity word, the target entity word and its abbreviation receive the same translation, which keeps their translation consistent throughout the text.
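The replace, translate, and restore flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function names `expand_abbreviations` and `translate_with_placeholder` are hypothetical, and the two translator arguments stand in for whatever translation engine is actually used.

```python
import re
from typing import Callable

PLACEHOLDER = "$tag"  # the replacement symbol used in the example above


def expand_abbreviations(text: str, entity: str, abbreviation: str) -> str:
    """Build the first text: every standalone abbreviation becomes the full entity word."""
    # The full entity is tried first, so an existing "Chen Wei" is left
    # unchanged instead of turning into "Chen Wei Wei".
    pattern = re.compile(
        rf"\b{re.escape(entity)}\b|\b{re.escape(abbreviation)}\b"
    )
    return pattern.sub(entity, text)


def translate_with_placeholder(
    text: str,
    entity: str,
    abbreviation: str,
    translate_text: Callable[[str], str],    # hypothetical sentence translator
    translate_entity: Callable[[str], str],  # hypothetical entity translator
) -> str:
    first_text = expand_abbreviations(text, entity, abbreviation)
    # Second text: every entity word is replaced with the same symbol.
    second_text = first_text.replace(entity, PLACEHOLDER)
    # The translator is assumed to copy the placeholder through unchanged.
    first_result = translate_text(second_text)
    entity_result = translate_entity(entity)
    # Final result: the placeholder is restored with the entity translation.
    return first_result.replace(PLACEHOLDER, entity_result)
```

Because the entity word is translated exactly once and inserted at every placeholder position, all mentions necessarily share one translation. The sketch relies on the translator leaving "$tag" intact; a translator that alters the symbol would need an additional safeguard.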

FIG. 3 is a flowchart of a method for identifying a target entity word and the abbreviation corresponding to the target entity word according to an exemplary embodiment of the present disclosure. As shown in FIG. 3, the method includes the following steps.

In step S301, a rule for identifying a target entity word and the abbreviation corresponding to the target entity word is determined.

In step S302, the target entity word and its corresponding abbreviation included in the text to be translated are identified based on the rule.

In an embodiment of the present disclosure, the text to be translated is obtained, and it includes a target entity word and the abbreviation corresponding to the target entity word. For example, when the target entity word is a person's name, the name may be a Chinese name or a foreign name, each of which consists of two parts, a surname and a given name. When the text to be translated is an English text, each part of the name begins with a capital letter. A Chinese name generally places the surname first and the given name second; for example, in the Chinese name "Li Lei" in an English text, "Li" is the surname and "Lei" is the given name. A foreign name generally places the given name first and the surname second; for example, in the foreign name "Jim Green" in an English text, "Jim" is the given name and "Green" is the surname.

A rule for identifying the target entity word and its abbreviation in the text to be translated is then determined. The rule may be, for example, a regular expression such as "[English given name/Chinese surname][A-Z][a-z]+". A regular expression is a filtering formula for string operations in which predefined special characters, and combinations of those characters, form a "rule string". In this embodiment the regular expression is applied to the text to be translated, and the Chinese or foreign names that match it are extracted.

When the target entity word is a person's name, the name may be a first-type name or a second-type name. Whatever its type, the name includes two parts, a first part and a second part. The first-type name may be a Chinese name and the second-type name may be an English name. Names in different languages follow different formats: for a Chinese name, the first part is the Chinese surname and the second part is the Chinese given name; for an English name, the first part is the English given name and the second part is the English surname. Therefore, after a full name is identified, its type is determined first, and the part that serves as its abbreviation is then determined according to that type.

The names included in the text to be translated are determined based on the regular expression. If a name is a first-type name, for example a Chinese name, and the first part of that name also appears in the text to be translated, the abbreviation corresponding to the Chinese name is determined to have been identified. If a name is a second-type name, for example an English name, and the second part of that name also appears in the text to be translated, the abbreviation corresponding to the English name is determined to have been identified.

In one example, the text to be translated is an English text, and the regular expression above identifies the name "Chen Wei" in it. Whether either of the two parts of the identified name is a Chinese surname is then checked in order to decide whether the name is Chinese or foreign. The abbreviation of a Chinese name is its first part, and the abbreviation of a foreign name is its second part. For example, the abbreviation of the Chinese name "Li Lei" in an English text is "Li", and the abbreviation of the foreign name "Jim Green" in an English text is "Green".

For the English text in the example above, the name "Chen Wei" is identified by the regular expression, and the rule determines that its abbreviation is "Chen". Every occurrence of the abbreviation "Chen" in the text to be translated is replaced with the name "Chen Wei" to obtain the first text corresponding to the text to be translated. This replacement makes the target entity word and its abbreviation identical, and the first text is then translated to obtain the translation result.
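The rule-based identification described above might be sketched as follows. The word lists are hypothetical and deliberately tiny, the regular expression is only one possible reading of "[English given name/Chinese surname][A-Z][a-z]+", and the function name `find_names_with_abbreviations` is an assumption introduced for illustration.

```python
import re
from typing import List, Tuple

# Hypothetical, deliberately tiny word lists; a real system would need far larger ones.
CHINESE_SURNAMES = ("Chen", "Li", "Wang", "Zhang")
ENGLISH_GIVEN_NAMES = ("Jim", "Mary", "John")

# One reading of "[English given name/Chinese surname][A-Z][a-z]+":
# a first part drawn from the lists, a space, then one capitalized word.
_FIRST_PART = "|".join(CHINESE_SURNAMES + ENGLISH_GIVEN_NAMES)
NAME_PATTERN = re.compile(rf"\b({_FIRST_PART}) ([A-Z][a-z]+)\b")


def find_names_with_abbreviations(text: str) -> List[Tuple[str, str]]:
    """Return (full name, abbreviation) pairs whose abbreviation also occurs alone."""
    found: List[Tuple[str, str]] = []
    for match in NAME_PATTERN.finditer(text):
        full_name = match.group(0)
        first_part, second_part = match.group(1), match.group(2)
        # Chinese name: the surname is the first part.
        # Foreign name: the surname is the second part.
        if first_part in CHINESE_SURNAMES:
            abbreviation = first_part
        else:
            abbreviation = second_part
        # The abbreviation counts only if it appears outside the full name.
        remainder = text.replace(full_name, " ")
        pair = (full_name, abbreviation)
        standalone = re.search(rf"\b{re.escape(abbreviation)}\b", remainder)
        if standalone and pair not in found:
            found.append(pair)
    return found
```

Deciding the name type by checking the first part against a surname list follows the description above; a name whose first part is in neither list is simply not matched by this sketch.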

It can be understood that the target entity word included in the text to be translated, and the abbreviation corresponding to it, may also be identified automatically with a named entity recognition tool. The named entity recognition tool may be trained on sentences that include words of different parts of speech.

According to this embodiment, a rule for identifying the target entity word and its abbreviation in the text to be translated is determined, and the target entity word and its corresponding abbreviation are identified based on that rule. This allows both to be identified accurately, keeps the translation of the target entity word and its abbreviation consistent, and improves translation quality.

In an embodiment of the present disclosure, the target entity word included in the text to be translated, and the abbreviation corresponding to it, are determined based on a coreference resolution model. The coreference resolution model analyzes a word sequence and replaces the referring expressions and implicit references in the word segmentation result with their referents, thereby completing the semantic structure of the sentence. When the target entity word is determined based on the coreference resolution model, the text to be translated is segmented into words, and an entity recognition model may be used to determine all entity words in the text, of which there may be several. Every two of the identified entity words form a word pair, and a binary classification model judges whether the two entity words in a pair refer to the same entity.

Take the text to be translated "The World Health Organization (WHO) is a specialized agency of the United Nations responsible for international public health" as an example. The text is segmented, and the entity recognition model identifies the entities it contains, namely "World Health Organization", "WHO" and "the United Nations". Every two of these entities form a word pair. The pairs (World Health Organization, WHO) and (World Health Organization, the United Nations) are input into the binary classification model, which determines that (World Health Organization, WHO) refers to a single entity and that WHO is the abbreviation of World Health Organization.

According to this embodiment, determining the target entity word and its corresponding abbreviation based on the coreference resolution model allows target entity words of many categories, and their abbreviations, to be identified effectively in a variety of scenarios, keeps their translation consistent, and improves translation quality.
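The pairing step described above can be illustrated with a short sketch. The description calls for a trained binary classification model; the function `is_same_entity` below is only a hypothetical stand-in that uses an initial-letter heuristic, so that the pair construction itself can be shown in runnable form. All names are assumptions.

```python
from itertools import combinations
from typing import List, Tuple


def is_same_entity(a: str, b: str) -> bool:
    """Stand-in for the binary classifier (NOT a trained model).

    Treats the shorter string as a candidate acronym and checks it against
    the capitalized initials of the longer string.
    """
    longer, shorter = (a, b) if len(a) >= len(b) else (b, a)
    initials = "".join(w[0] for w in longer.split() if w[0].isupper())
    return shorter.isupper() and shorter == initials


def find_abbreviation_pairs(entities: List[str]) -> List[Tuple[str, str]]:
    """Build every two-entity pair and keep those judged to be one entity."""
    pairs = []
    for a, b in combinations(entities, 2):
        if is_same_entity(a, b):
            full, abbr = (a, b) if len(a) >= len(b) else (b, a)
            pairs.append((full, abbr))
    return pairs
```

With the three entities from the example, only the pair of "World Health Organization" and "WHO" is kept. A learned classifier would replace `is_same_entity` and could also cover abbreviations that are not initialisms.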

In an embodiment of the present disclosure, a correspondence between target entity words and their abbreviations is created. When the target entity word and its abbreviation in the text to be translated are determined, the correspondence is used to determine the target entity word included in the text and then the abbreviation corresponding to it. For example, the correspondence may be created as a table from which each target entity word and its abbreviation can be looked up. The correspondence may include the full English name Personal Computer and its abbreviation PC, as well as the full English name World Health Organization and its abbreviation WHO.

For the text to be translated "The World Health Organization is a specialized agency of the United Nations responsible for international public health. The WHO Constitution, which establishes the agency's governing structure and principles, states its main objective as 'the attainment by all peoples of the highest possible level of health'", the World Health Organization and its corresponding abbreviation WHO are determined based on the correspondence between entity words and abbreviations.

According to this embodiment, determining the target entity word and its corresponding abbreviation based on the correspondence between entity words and abbreviations allows target entity words of many categories, and their abbreviations, to be identified effectively in a variety of scenarios, keeps their translation consistent, and improves translation quality.
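A table-based lookup of this kind could look like the sketch below. The dictionary holds only the two entries named in the description, and the names `ABBREVIATION_TABLE` and `lookup_entities` are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Correspondence table: full entity word -> abbreviation.
ABBREVIATION_TABLE: Dict[str, str] = {
    "Personal Computer": "PC",
    "World Health Organization": "WHO",
}


def lookup_entities(
    text: str, table: Dict[str, str] = ABBREVIATION_TABLE
) -> List[Tuple[str, str]]:
    """Return (entity word, abbreviation) pairs where both forms occur in the text."""
    found = []
    for full, abbr in table.items():
        # Matching is case-sensitive, so the word "who" is not taken for "WHO".
        has_full = re.search(rf"\b{re.escape(full)}\b", text) is not None
        has_abbr = re.search(rf"\b{re.escape(abbr)}\b", text) is not None
        if has_full and has_abbr:
            found.append((full, abbr))
    return found
```

A table gives exact, predictable matches but covers only the entries that were entered in advance, which is one reason the description allows it to be combined with the model-based approach.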

In an embodiment of the present disclosure, the target entity word may be a person's name, for example a Chinese name or a foreign name that appears in an English text being translated from English into Chinese. It can be understood that the target entity word may also be the name of a place, an institution, or an organization, or a proper noun or technical term.

The above table shows the flow of text translation using the text translation method of an embodiment of the present disclosure. The text to be translated is an English text consisting of two sentences, "A year later, Dr. Chen Wei was ordered to help the country build a subway" and "Her husband said when they were told a frame of Chen was aired on a night TV show, he was surprised". The target entity word "Chen Wei" and its corresponding abbreviation "Chen" are identified in the text to be translated. Every occurrence of the abbreviation "Chen" is replaced with the corresponding target entity word "Chen Wei" to obtain the first text, "A year later, Dr. Chen Wei was ordered to help the country build a subway. Her husband said when they were told a frame of Chen Wei was aired on a night TV show, he was surprised". The two occurrences of "Chen Wei" in the first text are replaced with the same replacement symbol "$tag" to obtain the second text, "A year later, Dr. $tag was ordered to help the country build a subway. Her husband said when they were told a frame of $tag was aired on a night TV show, he was surprised".

The text in the second text other than the replacement symbol is translated from English into Chinese to obtain the first translation result, "一年后,$tag博士奉命帮助这个国家建设地铁。她的丈夫说,当他们被告知$tag的一帧画面在一个夜间电视节目中播出时,他感到很惊讶". The target entity word "Chen Wei" is translated to obtain its translation result, "陈伟". The replacement symbol in the first translation result is replaced with the translation result of the target entity word to obtain the final translation result: "一年后,陈伟博士奉命帮助这个国家建设地铁。她的丈夫说,当他们被告知陈伟的一帧画面在一个夜间电视节目中播出时,他感到很惊讶". The target entity word and its abbreviation are thus translated consistently.

According to this embodiment, the target entity word in the first text is replaced with the same replacement symbol to obtain the second text; the text in the second text other than the replacement symbol is translated to obtain the first translation result; the target entity word is translated separately to obtain its translation result; and the replacement symbol in the first translation result is replaced with that translation result to obtain the final translation result of the text to be translated. Because the abbreviations are first replaced with the target entity word, the target entity word and its abbreviation receive the same translation, which keeps their translation consistent throughout the text.

FIG. 4 is a block diagram of a text translation apparatus according to an exemplary embodiment of the present disclosure. As shown in FIG. 4, the text translation apparatus 100 includes an acquisition module 101, a recognition module 102, and a determination module 103.

The acquisition module 101 is configured to acquire the text to be translated.

The recognition module 102 is configured to recognize the target entity word included in the text to be translated and the abbreviation corresponding to the target entity word.

The determination module 103 is configured to replace every occurrence of the abbreviation with the target entity word to obtain a first text corresponding to the text to be translated, and to determine the translation result of the text to be translated based on the first text.

In some embodiments, the determination module 103 determines the translation result of the text to be translated based on the first text in the following manner:

replacing the target entity word in the first text with the same replacement symbol to obtain a second text corresponding to the text to be translated;

translating the text in the second text other than the replacement symbol to obtain a first translation result, and translating the target entity word to obtain a translation result of the target entity word;

replacing the replacement symbol in the first translation result with the translation result of the target entity word to obtain a final translation result of the text to be translated.

In some embodiments, the recognition module 102 recognizes the target entity word included in the text to be translated, and the abbreviation corresponding to the target entity word, in the following manner:

determining a rule for identifying the target entity word and the abbreviation corresponding to the target entity word;

identifying, based on the rule, the target entity word included in the text to be translated and the abbreviation corresponding to the target entity word.

In some embodiments, the recognition module 102 recognizes the target entity word included in the text to be translated, and the abbreviation corresponding to the target entity word, in the following manner:

determining the target entity word included in the text to be translated and its corresponding abbreviation based on a coreference resolution model, and/or determining the target entity word included in the text to be translated and its corresponding abbreviation based on a correspondence between entity words and abbreviations.

In some embodiments, the target entity word includes a person's name, the name is a first-type name or a second-type name, and the name includes a first part and a second part. The recognition module 102 recognizes the target entity word included in the text to be translated, and the abbreviation corresponding to the target entity word, in the following manner: determining the names included in the text to be translated based on a regular expression for determining names; if a name is a first-type name and the first part of that name appears in the text to be translated, determining that the abbreviation corresponding to the first-type name has been identified; and if a name is a second-type name and the second part of that name appears in the text to be translated, determining that the abbreviation corresponding to the second-type name has been identified.

As for the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the method embodiments and is not repeated here.

FIG. 5 is a block diagram of a device 200 for text translation according to an exemplary embodiment of the present disclosure. For example, the device 200 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.

Referring to FIG. 5, the device 200 may include one or more of the following components: a processing component 202, a memory 204, a power component 206, a multimedia component 208, an audio component 210, an input/output (I/O) interface 212, a sensor component 214, and a communication component 216.

The processing component 202 generally controls the overall operation of the device 200, such as operations associated with display, phone calls, data communication, camera operation, and recording. The processing component 202 may include one or more processors 220 that execute instructions to perform all or some of the steps of the method described above. The processing component 202 may also include one or more modules that facilitate interaction between the processing component 202 and other components. For example, the processing component 202 may include a multimedia module to facilitate interaction between the multimedia component 208 and the processing component 202.

The memory 204 is configured to store various types of data to support operation of the device 200. Examples of such data include instructions for any application or method operating on the device 200, contact data, phone book data, messages, pictures, videos, and the like. The memory 204 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.

The power component 206 supplies power to the various components of the device 200. The power component 206 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 200.

The multimedia component 208 includes a screen that provides an output interface between the device 200 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. A touch sensor may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with it. In some embodiments, the multimedia component 208 includes a front camera and/or a rear camera. When the device 200 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or may have focal length and optical zoom capability.

The audio component 210 is configured to output and/or input audio signals. For example, the audio component 210 includes a microphone (MIC) that is configured to receive external audio signals when the device 200 is in an operating mode, such as a call mode, a recording mode, or a speech recognition mode. The received audio signal may be stored in the memory 204 or sent via the communication component 216. In some embodiments, the audio component 210 also includes a speaker for outputting audio signals.

The I/O interface 212 provides an interface between the processing component 202 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. The buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.

The sensor component 214 includes one or more sensors that provide status assessments of various aspects of the device 200. For example, the sensor component 214 may detect the open/closed state of the device 200 and the relative positioning of components, such as the display and keypad of the device 200. The sensor component 214 may also detect a change in position of the device 200 or of one of its components, the presence or absence of user contact with the device 200, the orientation or acceleration/deceleration of the device 200, and a change in temperature of the device 200. The sensor component 214 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 214 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 214 may further include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 216 is configured to facilitate wired or wireless communication between the device 200 and other devices. The device 200 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 216 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 216 also includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the device 200 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.

In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as the memory 204 including instructions, where the instructions can be executed by the processor 220 of the device 200 to perform the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.

It is to be understood that, in the present disclosure, "a plurality of" means two or more, and other quantifiers are to be read similarly. "And/or" describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" covers three cases: A alone, both A and B, and B alone. The character "/" generally indicates an "or" relationship between the objects before and after it. The singular forms "a", "said", and "the" are also intended to include the plural forms, unless the context clearly indicates otherwise.

It is further understood that the terms "first", "second", and the like are used to describe various kinds of information, but the information should not be limited by these terms. These terms are used only to distinguish information of the same type from one another and do not indicate a specific order or degree of importance. In fact, expressions such as "first" and "second" can be used interchangeably. For example, without departing from the scope of the present disclosure, first information may also be referred to as second information, and similarly, second information may also be referred to as first information.

It is further understood that, unless otherwise specified, "connection" includes both a direct connection, in which no other component lies between the two connected parts, and an indirect connection, in which other elements lie between them.

It is further understood that, although the operations in the embodiments of the present disclosure are depicted in the drawings in a specific order, this should not be understood as requiring that the operations be performed in the specific order shown or in serial order, or that all of the operations shown be performed, in order to obtain the desired results. In certain circumstances, multitasking and parallel processing may be advantageous.

Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of what is disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and that include common knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the following claims.

It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (8)

1. A text translation method, characterized in that the text translation method comprises:
acquiring a text to be translated, and identifying target entity words included in the text to be translated and abbreviations corresponding to the target entity words;
replacing all of the abbreviations with the target entity words to obtain a first text corresponding to the text to be translated; and
determining a translation result of the text to be translated based on the first text;
wherein the determining, based on the first text, a translation result of the text to be translated comprises:
replacing the target entity words in the first text with the same replacement symbol to obtain a second text corresponding to the text to be translated;
translating the text in the second text other than the replacement symbol to obtain a first translation result, and translating the target entity word to obtain a translation result of the target entity word; and
replacing the replacement symbol in the first translation result with the translation result of the target entity word to obtain a final translation result of the text to be translated;
wherein the target entity word and the abbreviation coexist in the same text and are used to refer to the same entity of the same object;
wherein the target entity word comprises a person name, person names include first-type person names and second-type person names, and the person name comprises a first part and a second part; and
wherein the identifying target entity words included in the text to be translated and abbreviations corresponding to the target entity words comprises:
determining the person name included in the text to be translated based on a regular expression for determining person names;
if the person name is a first-type person name and the first part of the first-type person name exists in the text to be translated, determining that the abbreviation corresponding to the first-type person name is identified; and
if the person name is a second-type person name and the second part of the second-type person name exists in the text to be translated, determining that the abbreviation corresponding to the second-type person name is identified.
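The replacement and translation steps recited in claim 1 can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the function names, the `<ENT>` placeholder string, and the two translator callables are all hypothetical, and the sketch handles a single entity word with plain substring matching (so an abbreviation embedded in an unrelated longer word would also be expanded).

```python
import re
from typing import Callable, Sequence

PLACEHOLDER = "<ENT>"  # hypothetical replacement symbol; the claims do not fix one


def expand_abbreviations(text: str, entity: str, abbreviations: Sequence[str]) -> str:
    """Replace every abbreviation with the full entity word (the "first text")."""
    # The full entity word is listed first in the alternation so that an
    # abbreviation that is a substring of it (e.g. "Smith" in "John Smith")
    # is consumed as part of the full form and is not expanded a second time.
    alternatives = [entity, *abbreviations]
    pattern = re.compile("|".join(re.escape(a) for a in alternatives))
    return pattern.sub(lambda _match: entity, text)


def translate_with_placeholder(
    text: str,
    entity: str,
    abbreviations: Sequence[str],
    translate_text: Callable[[str], str],    # assumed to leave PLACEHOLDER untouched
    translate_entity: Callable[[str], str],  # assumed entity-specific translator
) -> str:
    # Step 1: first text, with all abbreviations expanded to the entity word.
    first_text = expand_abbreviations(text, entity, abbreviations)
    # Step 2: second text, with every entity word replaced by the same symbol.
    second_text = first_text.replace(entity, PLACEHOLDER)
    # Step 3: translate the surrounding text and the entity word separately.
    first_result = translate_text(second_text)
    entity_result = translate_entity(entity)
    # Step 4: put the translated entity word back in place of the symbol.
    return first_result.replace(PLACEHOLDER, entity_result)
```

Because every mention is first normalized to the full entity word and then translated once, all mentions of the entity end up with the same rendering in the final result, which appears to be the consistency property the claim is aiming at.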
2. The text translation method according to claim 1, wherein the identifying target entity words included in the text to be translated and abbreviations corresponding to the target entity words comprises:
determining rules for identifying the target entity words and the abbreviations corresponding to the target entity words; and
identifying, based on the rules, the target entity words included in the text to be translated and the abbreviations corresponding to the target entity words.
3. The text translation method according to claim 1, wherein the identifying target entity words included in the text to be translated and abbreviations corresponding to the target entity words comprises:
determining the target entity words included in the text to be translated and the abbreviations corresponding to the target entity words based on a coreference resolution model; and/or
determining the target entity words included in the text to be translated and the abbreviations corresponding to the target entity words based on a correspondence between entity words and abbreviations.
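The second alternative in claim 3, identification based on a correspondence between entity words and abbreviations, could be approximated with a simple lookup table. The sketch below is an assumption-laden illustration: the table contents, the function name, and the exact, case-sensitive substring matching are all hypothetical and are not taken from the patent.

```python
from typing import Dict, List

# Hypothetical correspondence table: entity word -> known abbreviations.
CORRESPONDENCE: Dict[str, List[str]] = {
    "World Health Organization": ["WHO"],
    "European Union": ["EU"],
}


def find_entities_with_abbreviations(
    text: str, table: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Return the entity words that coexist with one of their abbreviations in text."""
    found: Dict[str, List[str]] = {}
    for entity, abbreviations in table.items():
        if entity not in text:
            continue
        # Mask the full form so that an abbreviation is only counted when it
        # occurs somewhere other than inside the entity word itself.
        remainder = text.replace(entity, "\0")
        present = [abbr for abbr in abbreviations if abbr in remainder]
        if present:
            found[entity] = present
    return found
```

An entity word is reported only when both the full form and at least one abbreviation appear, mirroring the claim 1 condition that the two coexist in the same text.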
4. A text translation device, the text translation device comprising:
an acquisition module, configured to acquire a text to be translated;
an identification module, configured to identify target entity words included in the text to be translated and abbreviations corresponding to the target entity words; and
a determining module, configured to replace all of the abbreviations with the target entity words to obtain a first text corresponding to the text to be translated, and to determine a translation result of the text to be translated based on the first text;
wherein the determining module determines the translation result of the text to be translated based on the first text in the following manner:
replacing the target entity words in the first text with the same replacement symbol to obtain a second text corresponding to the text to be translated;
translating the text in the second text other than the replacement symbol to obtain a first translation result, and translating the target entity word to obtain a translation result of the target entity word; and
replacing the replacement symbol in the first translation result with the translation result of the target entity word to obtain a final translation result of the text to be translated;
wherein the target entity word and the abbreviation coexist in the same text and are used to refer to the same entity of the same object;
wherein the target entity word comprises a person name, person names include first-type person names and second-type person names, and the person name comprises a first part and a second part; and
wherein, in response to the text to be translated including the person name, the identification module identifies the target entity words included in the text to be translated and the abbreviations corresponding to the target entity words in the following manner:
determining the person name included in the text to be translated based on a regular expression for determining person names;
if the person name is a first-type person name and the first part of the first-type person name exists in the text to be translated, determining that the abbreviation corresponding to the first-type person name is identified; and
if the person name is a second-type person name and the second part of the second-type person name exists in the text to be translated, determining that the abbreviation corresponding to the second-type person name is identified.
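The person-name identification steps recited above might look roughly like the following sketch. The claims do not define what distinguishes the two name types, so the sketch assumes, purely for illustration, that a first-type name is written family name first and a second-type name is written given name first. The regular expression, the surname set, and the function name are likewise hypothetical.

```python
import re
from typing import Dict

# Hypothetical pattern: two consecutive capitalized words. It is deliberately
# naive and will also match non-names such as "The World".
NAME_RE = re.compile(r"\b([A-Z][a-z]+) ([A-Z][a-z]+)\b")

# Hypothetical lookup used to decide that a name is of the first type
# (assumed here to mean "family name written first").
FAMILY_FIRST_SURNAMES = {"Zhang", "Wang", "Li"}


def find_name_abbreviations(text: str) -> Dict[str, str]:
    """Map each full person name to the part that also occurs on its own in text."""
    found: Dict[str, str] = {}
    for match in NAME_RE.finditer(text):
        full, first, second = match.group(0), match.group(1), match.group(2)
        # First-type name: the abbreviation is the first part.
        # Second-type name: the abbreviation is the second part.
        part = first if first in FAMILY_FIRST_SURNAMES else second
        # Mask the full name so the part is only found when it stands alone.
        remainder = text.replace(full, "\0")
        if re.search(rf"\b{re.escape(part)}\b", remainder):
            found[full] = part
    return found
```

For a text such as "Zhang Wei met John Smith. Zhang thanked Smith.", this sketch pairs "Zhang Wei" with "Zhang" and "John Smith" with "Smith"; a name with no standalone part elsewhere in the text is not reported.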
5. The text translation device according to claim 4, wherein the identification module identifies the target entity words included in the text to be translated and the abbreviations corresponding to the target entity words in the following manner:
determining rules for identifying the target entity words and the abbreviations corresponding to the target entity words; and
identifying, based on the rules, the target entity words included in the text to be translated and the abbreviations corresponding to the target entity words.
6. The text translation device according to claim 4, wherein the identification module identifies the target entity words included in the text to be translated and the abbreviations corresponding to the target entity words in the following manner:
determining the target entity words included in the text to be translated and the abbreviations corresponding to the target entity words based on a coreference resolution model; and/or
determining the target entity words included in the text to be translated and the abbreviations corresponding to the target entity words based on a correspondence between entity words and abbreviations.
7. A text translation device, comprising:
a processor; and
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the text translation method according to any one of claims 1 to 3.
8. A non-transitory computer-readable storage medium, characterized in that instructions in the storage medium, when executed by a processor of a mobile terminal, enable the mobile terminal to perform the text translation method according to any one of claims 1 to 3.
CN202110226769.5A 2021-03-01 2021-03-01 Text translation method, text translation device and storage medium Active CN113239707B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110226769.5A CN113239707B (en) 2021-03-01 2021-03-01 Text translation method, text translation device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110226769.5A CN113239707B (en) 2021-03-01 2021-03-01 Text translation method, text translation device and storage medium

Publications (2)

Publication Number Publication Date
CN113239707A CN113239707A (en) 2021-08-10
CN113239707B (en) 2024-11-05

Family

ID=77130293

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110226769.5A Active CN113239707B (en) 2021-03-01 2021-03-01 Text translation method, text translation device and storage medium

Country Status (1)

Country Link
CN (1) CN113239707B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115062631B (en) * 2022-05-23 2025-03-11 北京爱奇艺科技有限公司 Text translation method, device, electronic device and storage medium
CN116108862B (en) * 2023-04-07 2023-07-25 北京澜舟科技有限公司 Chapter-level machine translation model construction method, chapter-level machine translation model construction system and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111523305A (en) * 2019-01-17 2020-08-11 阿里巴巴集团控股有限公司 Text error correction method, device and system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8744835B2 (en) * 2001-03-16 2014-06-03 Meaningful Machines Llc Content conversion method and apparatus
KR102516364B1 (en) * 2018-02-12 2023-03-31 삼성전자주식회사 Machine translation method and apparatus
US10839164B1 (en) * 2018-10-01 2020-11-17 Iqvia Inc. Automated translation of clinical trial documents
CN111368531B (en) * 2020-03-09 2023-04-14 腾讯科技(深圳)有限公司 Translation text processing method and device, computer equipment and storage medium
CN112084796B (en) * 2020-09-15 2021-04-09 南京文图景信息科技有限公司 Multi-language place name root Chinese translation method based on Transformer deep learning model

Also Published As

Publication number Publication date
CN113239707A (en) 2021-08-10


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant