CN115080693A

CN115080693A - Text processing methods, electronic devices, dialogue systems and automobiles

Info

Publication number: CN115080693A
Application number: CN202210724487.2A
Authority: CN
Inventors: 徐高鹏
Original assignee: Weilai Automobile Technology Anhui Co Ltd
Current assignee: Weilai Automobile Technology Anhui Co Ltd
Priority date: 2022-06-23
Filing date: 2022-06-23
Publication date: 2022-09-20
Anticipated expiration: 2042-06-23
Also published as: CN115080693B

Abstract

The invention relates to a text processing method, an electronic device, a dialogue system and an automobile. A method according to one embodiment of the invention comprises the following steps: performing word embedding on a text to obtain word embedding vectors; obtaining a task identifier; inputting the word embedding vectors and the task identifier into a trained conversion tagging model to obtain a task type and conversion tags, the conversion tags comprising a first conversion tag marking the part of the text that needs to be converted and a second conversion tag marking the part of the text that does not; fusing the task type with the text part corresponding to the first conversion tag to obtain fusion information; and inputting the fusion information into a trained text conversion model, which converts the to-be-converted text part according to the task type to obtain a conversion result. The invention can improve the accuracy of both text normalization and inverse text normalization at the same time.

Description

Text processing methods, electronic devices, dialogue systems and automobiles

Technical Field

The invention relates to the field of natural language processing, and in particular to a text processing method, an electronic device, a dialogue system and an automobile.

Background

The intelligent data cockpit dialogue system used in autonomous driving includes modules such as speech recognition, natural language processing and speech synthesis. The text these modules use often needs some processing to make it easier to model and easier to read, and text normalization (TN) and inverse text normalization (ITN) play an important role here. Text normalization converts text from its written form into its spoken form, for example converting the Arabic numerals in a Chinese sentence into Chinese characters; it is mainly used to preprocess text for speech synthesis. Inverse text normalization is the reverse process: after recognized text is obtained, Chinese-character content is converted into a form that is easier for the user to read. It is mainly used to post-process speech recognition output, for example by converting the Chinese numerals in the recognized text into the corresponding digits through a table of mapping rules.

Fairly mature solutions already exist for both text normalization and inverse text normalization, but each of them targets only one of the two tasks and cannot handle both at once, which increases the overall complexity of the dialogue system and its maintenance cost.

Summary of the Invention

To solve the above technical problem, the invention provides a text processing method, an electronic device, a dialogue system, a storage medium and an automobile, so that inverse text normalization and text normalization of text information can be performed in a single integrated process.

In a first aspect, the invention provides a text processing method, comprising:

performing word embedding on a text to obtain word embedding vectors;

obtaining a task identifier;

inputting the word embedding vectors and the task identifier into a trained conversion tagging model to obtain a task type and conversion tags, wherein the conversion tags include a first conversion tag marking the part of the text that needs to be converted and a second conversion tag marking the part of the text that does not need to be converted;

fusing the task type with the text part corresponding to the first conversion tag to obtain fusion information;

inputting the fusion information into a trained text conversion model, which converts the to-be-converted text part according to the task type to obtain a conversion result.

In a specific embodiment, obtaining the task identifier includes:

obtaining the scenario in which the text is to be applied, wherein the scenario is either a first scenario requiring text normalization or a second scenario requiring inverse text normalization;

determining the task identifier according to that scenario, wherein the task identifier includes a first identifier corresponding to the first scenario and a second identifier corresponding to the second scenario.

In a specific embodiment, inputting the word embedding vectors and the task identifier into the trained conversion tagging model to obtain the task type and the conversion tags includes:

inputting the word embedding vectors and the task identifier into the trained conversion tagging model to obtain a tag sequence, wherein the tag sequence includes a task type position and conversion tag positions;

wherein the task type at the task type position is either a text normalization task, corresponding to the first identifier, or an inverse text normalization task, corresponding to the second identifier;

and wherein the conversion tags at the conversion tag positions include the first conversion tag and the second conversion tag.

In a specific embodiment, fusing the task type with the text part corresponding to the first conversion tag to obtain fusion information includes:

fusing the task type, the text part corresponding to the first conversion tag, and the context of that text part to obtain the fusion information.

Fusing the task type, the text part corresponding to the first conversion tag, and the context of that text part includes:

concatenating the task type, the text part corresponding to the first conversion tag, and the context of that text part.

In a specific embodiment, inputting the fusion information into the trained text conversion model and converting the to-be-converted text part according to the task type to obtain a conversion result includes:

if the task type is the text normalization task, the text conversion model performs text normalization on the to-be-converted text part to obtain a text normalization result;

if the task type is the inverse text normalization task, the text conversion model performs inverse text normalization on the to-be-converted text part to obtain an inverse text normalization result.

In a specific embodiment, the method further includes:

concatenating the conversion result with the part of the text that does not need to be converted to obtain the complete text.

In a specific embodiment, the trained conversion tagging model consists of M layers of bidirectional LSTM networks, and the trained text conversion model consists of N Transformer networks, where M and N are natural numbers.

In a specific embodiment, the method further includes a step of training the Norm network formed by the conversion tagging model and the text conversion model.

In a specific embodiment, the Norm network is trained with a loss function defined over the output conditional probability of the Norm network, where Norm(·) denotes the output conditional probability, s_l denotes the l-th element of the sequence, θ denotes the network parameters, and D denotes the number of input samples in each training batch.

In a second aspect, the invention provides an electronic device comprising a processor and a memory, the memory storing a program that, when executed by the processor, implements the method according to the first aspect.

In a third aspect, the invention provides a storage medium storing a program that, when executed by a processor, implements the method according to the first aspect.

In a fourth aspect, the invention provides an in-vehicle dialogue system comprising the electronic device according to the second aspect and an input/output device.

In a fifth aspect, the invention provides an automobile comprising the in-vehicle dialogue system according to the fourth aspect.

The above technical solutions of the invention have at least one of the following beneficial effects:

The invention introduces a task identifier when processing text. The conversion tagging model produces the task type and, according to that task type, identifies the text part that needs to be converted; the two are fused and used as the input of the text conversion model, which performs the conversion corresponding to the task type. Text normalization and inverse text normalization are therefore supported at the same time, which simplifies the overall intelligent dialogue system, and because both tasks share the same text conversion model, the accuracy of both can be improved together.

Brief Description of the Drawings

The disclosure of the invention will be more easily understood with reference to the accompanying drawings. Those skilled in the art will readily understand that these drawings are for illustration only and are not intended to limit the scope of protection of the invention. Like numerals in the figures denote like parts, in which:

FIG. 1 is a schematic flowchart of a text processing method according to an embodiment of the invention;

FIG. 2 is a schematic structural diagram of a conversion tagging model according to an embodiment of the invention;

FIG. 3 is a schematic structural diagram of a text conversion model according to an embodiment of the invention;

FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the invention.

Detailed Description

Some embodiments of the invention are described below with reference to the accompanying drawings. Those skilled in the art should understand that these embodiments only explain the technical principles of the invention and are not intended to limit its scope of protection.

In the description of the invention, a "module" or "processor" may include hardware, software, or a combination of both. A module may include hardware circuits, various suitable sensors, communication ports and memory; it may also include software, such as program code, or a combination of software and hardware. The processor may be a central processing unit, a microprocessor, a graphics processor, a digital signal processor, or any other suitable processor, and has data and/or signal processing functions. The processor may be implemented in software, in hardware, or in a combination of the two. Non-transitory computer-readable storage media include any suitable medium capable of storing program code, such as magnetic disks, hard disks, optical discs, flash memory, read-only memory and random-access memory.

Referring to FIG. 1, which is a schematic flowchart of a text processing method according to an embodiment of the invention, the method includes the following steps:

S10: Perform word embedding on the text to obtain word embedding vectors.

Word embedding is performed on the text to obtain the embedding sequence X = (x_1, ..., x_L), where L is the sentence length. In a specific embodiment, if the text obtained from the speech recognition module of the intelligent cockpit dialogue system is "拨打电话一三八五五二零六六九六" ("call one three eight five five two zero six six nine six"), then L = 15. Further, assuming an embedding dimension n of 100, each x_i (1 ≤ i ≤ L) in the embedding sequence is 100-dimensional, so the embeddings form a 15 × 100 matrix. In another specific embodiment, if the text obtained from the speech recognition module of the intelligent cockpit dialogue system is "此行程大约215.6km" ("this trip is about 215.6 km"), then L = 12.
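The lengths above (L = 15 and L = 12) correspond to one token per character. A minimal Python sketch, assuming character-level tokens and a randomly initialised lookup table (the function `embed` and its table are hypothetical stand-ins for a trained embedding layer), reproduces those lengths and the L × n shape:

```python
import random

def embed(text, dim=100, seed=0):
    """Map each character of `text` to a `dim`-dimensional vector.

    A randomly initialised lookup table stands in for a trained embedding
    layer (hypothetical); identical characters share the same vector.
    """
    rng = random.Random(seed)
    table = {}
    matrix = []
    for ch in text:  # character-level tokens (assumption)
        if ch not in table:
            table[ch] = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        matrix.append(table[ch])
    return matrix  # L rows, each of length dim


X = embed("拨打电话一三八五五二零六六九六")
print(len(X), len(X[0]))  # 15 100
print(len(embed("此行程大约215.6km")))  # 12
```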

Those skilled in the art will understand that the text used by the invention is not limited to text obtained from the intelligent cockpit dialogue system; this is only an example.

S20: Obtain the task identifier.

In a specific embodiment, a task identifier T of 0 denotes the text normalization task and a task identifier T of 1 denotes the inverse text normalization task.

Those skilled in the art will understand that, alternatively, 0 can denote the inverse text normalization task and 1 the text normalization task; the invention is not limited in this respect.

In a specific embodiment, step S20 includes obtaining the scenario in which the text is to be applied and determining the task identifier according to that scenario.

Specifically, if the text processing of the invention is used for speech recognition post-processing, for example converting text and presenting it on the human-machine interface of an intelligent vehicle, the scenario corresponds to inverse text normalization. For example, the text "拨打电话一三八五五二零六六九六" above is converted to "拨打电话13855206696" ("call 13855206696") and displayed on the human-machine interface so that the user can read it easily. As another example, if the text processing is used for speech synthesis preprocessing, that is, the text is converted and then read out by voice, the scenario corresponds to text normalization. For example, the text "此行程大约215.6km" above is converted to "此行程大约二百一十五点六千米" ("this trip is about two hundred fifteen point six kilometers") so that it can easily be converted into speech for broadcast.

In these two examples, the task identifier for the first (speech recognition post-processing) scenario is 1, and the task identifier for the second (speech synthesis preprocessing) scenario is 0.
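The scenario-to-identifier rule of S20 amounts to a small lookup. A sketch, assuming the T = 0 (TN) / T = 1 (ITN) convention above; the `Scenario` enum, its member names and `get_task_identifier` are hypothetical:

```python
from enum import Enum

class Scenario(Enum):
    """Where the processed text will be used (hypothetical names)."""
    ASR_POSTPROCESSING = "asr_post"  # text is displayed to the user -> ITN
    TTS_PREPROCESSING = "tts_pre"    # text is read aloud -> TN

# Convention from the embodiment: T = 0 for TN, T = 1 for ITN.
# The swapped mapping (0 for ITN, 1 for TN) would work equally well.
TASK_ID = {
    Scenario.TTS_PREPROCESSING: 0,
    Scenario.ASR_POSTPROCESSING: 1,
}

def get_task_identifier(scenario: Scenario) -> int:
    return TASK_ID[scenario]


print(get_task_identifier(Scenario.ASR_POSTPROCESSING))  # 1
print(get_task_identifier(Scenario.TTS_PREPROCESSING))   # 0
```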

S30: Input the word embedding vectors and the task identifier into the trained conversion tagging model to obtain a task type task and conversion tags sig, wherein the conversion tags include a first conversion tag marking the part of the text that needs to be converted and a second conversion tag marking the part of the text that does not need to be converted.

In a specific embodiment, to simplify computation, the task identifier is converted into a vector with the same dimension as each word vector in the embedding sequence, for example 100 dimensions in the example above.

The dimension-aligned task identifier and the embedding sequence X = (x_1, ..., x_L) are combined into a matrix and input into the trained conversion tagging model. For example, for the text "拨打电话一三八五五二零六六九六", the embeddings X = (x_1, ..., x_15) and the task identifier T = 1 form the matrix that is input into the trained conversion tagging model.
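One way to realise this dimension alignment is to map T to an n-dimensional vector and place it in front of the word embeddings, so that the first output position can carry the task type. This is a sketch under assumptions: `task_vector`, `build_input`, the pseudo-random lookup and the prepend position are illustrative choices, not a mechanism the text specifies.

```python
import random

def task_vector(task_id, dim, seed=100):
    """Expand the scalar task identifier T into a dim-dimensional vector.

    Each T gets its own fixed pseudo-random vector here; a trained model
    would learn this lookup instead (assumption).
    """
    rng = random.Random(seed + task_id)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

def build_input(embeddings, task_id):
    """Stack the task vector and the L word vectors into an (L + 1) x n matrix."""
    dim = len(embeddings[0])
    # Placing the task vector first mirrors the task type position at the
    # head of the output tag sequence (an illustrative choice).
    return [task_vector(task_id, dim)] + embeddings


embeddings = [[0.0] * 100 for _ in range(15)]  # placeholder 15 x 100 embeddings
M = build_input(embeddings, task_id=1)
print(len(M), len(M[0]))  # 16 100
```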

In a specific embodiment, the conversion tagging model uses an LSTM model, which is common in natural language processing. LSTM models are particularly well suited to classification over time-ordered input sequences. More preferably, the invention uses the bidirectional LSTM model shown in FIG. 2, which trains two LSTMs on the input sequence instead of one: the first on the original sequence and the second on a reversed copy of it. As can be seen from the figure, the forward layer and the backward layer are both connected to the output layer, which involves six shared weights w1 to w6; this gives the network additional context and lets it learn the problem faster and more fully. More preferably, to learn the problem more fully, the conversion tagging model of the invention can use M layers (for example 2 layers) of bidirectional LSTM networks, in which the output of the first bidirectional LSTM layer is connected to the input of the second layer, and so on.
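The stacking described above can be sketched in plain Python to show how the forward and backward passes are combined and how each layer feeds the next. The weights are random and untrained, the hidden size is hypothetical, and the per-position output layer that produces tag scores is omitted; a framework implementation would more typically use something like PyTorch's `torch.nn.LSTM(..., num_layers=M, bidirectional=True)`.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

class LSTMCell:
    """One LSTM cell with small random, untrained weights."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        cols = input_size + hidden_size
        # One weight matrix and bias vector per gate:
        # i = input, f = forget, o = output, c = candidate cell state.
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                      for _ in range(hidden_size)] for g in "ifoc"}
        self.b = {g: [0.0] * hidden_size for g in "ifoc"}

    def step(self, x, h, c):
        z = x + h  # concatenate current input and previous hidden state
        pre = {g: [a + b for a, b in zip(matvec(self.W[g], z), self.b[g])]
               for g in "ifoc"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["c"]]
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, cand)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h, c

def run(cell, xs):
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    outputs = []
    for x in xs:
        h, c = cell.step(x, h, c)
        outputs.append(h)
    return outputs

def bilstm_layer(xs, fwd, bwd):
    forward = run(fwd, xs)
    # The backward LSTM reads the reversed sequence; its outputs are reversed
    # again so that position t lines up with the forward output at t.
    backward = run(bwd, xs[::-1])[::-1]
    return [hf + hb for hf, hb in zip(forward, backward)]

def stacked_bilstm(xs, input_size, hidden_size, num_layers=2):
    size = input_size
    for layer in range(num_layers):  # num_layers plays the role of M
        fwd = LSTMCell(size, hidden_size, seed=2 * layer)
        bwd = LSTMCell(size, hidden_size, seed=2 * layer + 1)
        xs = bilstm_layer(xs, fwd, bwd)  # output of layer k feeds layer k + 1
        size = 2 * hidden_size
    return xs


seq = [[0.1 * t] * 8 for t in range(16)]  # e.g. task vector + 15 tokens, dim 8
out = stacked_bilstm(seq, input_size=8, hidden_size=4, num_layers=2)
print(len(out), len(out[0]))  # 16 8
```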

In a specific embodiment, the conversion tagging model outputs a tag sequence that includes a task type position and conversion tag positions. The task type at the task type position is either a text normalization task, corresponding to the first identifier, or an inverse text normalization task, corresponding to the second identifier; the conversion tags at the conversion tag positions include the first conversion tag and the second conversion tag.

For example, for the text "拨打电话一三八五五二零六六九六" with T = 1, the tag sequence "ITNSSSSBEEEEEEEEEE" is obtained. The beginning of the tag sequence is the task type position, which in this example holds ITN, the task type corresponding to T = 1. Those skilled in the art will understand that the task type position may be placed at the beginning of the tag sequence or elsewhere as needed; the invention is not limited in this respect. For the ITN task, the part of this text that needs to be converted is "一三八五五二零六六九六"; its conversion tags use B as the start tag and E as the end tag, with the characters in between also tagged E. The part of the text that does not need to be converted is "拨打电话" ("call"), whose conversion tags are S. Those skilled in the art will understand that other letters or forms may be used to represent the conversion tags; the invention is not limited in this respect.
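Reading a tag sequence back into a task type and the span to convert can be sketched as follows, assuming exactly the format of these examples (an `ITN` or `TN` prefix followed by one S/B/E tag per character); `decode_tags` is a hypothetical helper, and the tag strings are built programmatically to match the example lengths.

```python
TASK_TYPES = ("ITN", "TN")

def decode_tags(text, tags):
    """Split a tag string such as 'ITNSSSSBEE...' into (task_type, spans).

    Assumes a task type prefix followed by one tag per character, where
    S = keep, B = start of a span to convert and E = continuation/end of
    that span. Spans are (start, end) pairs with `end` exclusive.
    """
    task = next((t for t in TASK_TYPES if tags.startswith(t)), None)
    if task is None:
        raise ValueError("tag sequence has no task type prefix")
    labels = tags[len(task):]
    if len(labels) != len(text):
        raise ValueError("expected exactly one conversion tag per character")
    spans, start = [], None
    for i, label in enumerate(labels):
        if label == "B":
            if start is not None:  # a new B closes the previous span
                spans.append((start, i))
            start = i
        elif label == "S":
            if start is not None:
                spans.append((start, i))
                start = None
        elif label != "E":
            raise ValueError(f"unknown tag {label!r}")
    if start is not None:  # span runs to the end of the text
        spans.append((start, len(labels)))
    return task, spans


text = "拨打电话一三八五五二零六六九六"
task, spans = decode_tags(text, "ITN" + "S" * 4 + "B" + "E" * 10)
print(task, spans, text[4:15])  # ITN [(4, 15)] 一三八五五二零六六九六
```

The same function handles the TN example that follows: `decode_tags("此行程大约215.6km", "TN" + "S" * 5 + "B" + "E" * 6)` gives `("TN", [(5, 12)])`, i.e. the span "215.6km".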

In another example, for the text "此行程大约215.6km" with T = 0, the tag sequence "TNSSSSSBEEEEEE" is obtained. The beginning of the tag sequence is the task type position, which in this example holds TN, the task type corresponding to T = 0; as before, the task type position may be placed at the beginning of the tag sequence or elsewhere as needed. For the TN task, the part of this text that needs to be converted is "215.6km"; its conversion tags use B as the start tag and E as the end tag, with the characters in between also tagged E. The part of the text that does not need to be converted is "此行程大约" ("this trip is about"), whose conversion tags are S. Again, other letters or forms may be used to represent the conversion tags; the invention is not limited in this respect.

Those skilled in the art will understand that, for both the text normalization task and the inverse text normalization task, which parts of a text need to be converted can be predefined as rules, and the method of the invention identifies those parts according to the rules.

S40: Fuse the task type with the text part corresponding to the first conversion tag to obtain fusion information.

The text part marked by the first conversion tag is extracted and fused with the task type. For example, for the text "拨打电话一三八五五二零六六九六" with T = 1, the vectors of the text part to be converted, "一三八五五二零六六九六", are fused (for example, concatenated) with the vector corresponding to ITN.

Preferably, to achieve a better conversion effect, the invention concatenates the task type, the text part to be converted, and the context of that text part. In a specific example, the n words before and after the text part to be converted are fused with it as context, where n can be set as needed, for example n = 2. For the text "拨打电话一三八五五二零六六九六", the two words preceding the text part "一三八五五二零六六九六" are "电话" ("phone") and nothing follows it, so only the preceding context is used in this example.
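At the string level, this fusion with n words of context can be sketched as below. Tokens are treated as characters and the `|` separator is a hypothetical choice; in the model itself the corresponding vectors would be concatenated rather than strings, and `fuse` is an illustrative name.

```python
def fuse(task, text, span, n=2, sep="|"):
    """Join task type, left context, span to convert and right context.

    Context is up to n tokens on each side; tokens are characters here and
    the separator is a hypothetical choice.
    """
    start, end = span
    left = text[max(0, start - n):start]
    right = text[end:end + n]  # empty when the span ends the sentence
    return sep.join([task, left, text[start:end], right])


print(fuse("ITN", "拨打电话一三八五五二零六六九六", (4, 15)))
# ITN|电话|一三八五五二零六六九六|
```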

S50: Input the fusion information into the trained text conversion model, and convert the text part to be converted according to the task type to obtain a conversion result.

If the task type is the text normalization task, the text conversion model performs text normalization on the text part to be converted to obtain a text normalization result; if the task type is the inverse text normalization task, it performs inverse text normalization on that part to obtain an inverse text normalization result.

For example, for the text "拨打电话一三八五五二零六六九六" with T = 1, the conversion outputs "13855206696". As another example, for the text "此行程大约215.6km" with T = 0, the conversion outputs "二百一十五点六千米" ("two hundred fifteen point six kilometers").
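The conversion in S50 is learned by the Transformer, but the input/output behaviour for the two running examples can be illustrated with a deliberately simple rule-based stand-in. This is not the model described here: it only handles digit-by-digit ITN and TN of non-negative decimals below 10000 with an optional `km` unit, and every name in it is hypothetical.

```python
DIGITS = "零一二三四五六七八九"
UNIT_READINGS = {"km": "千米"}  # only the unit used in the example

def itn_digits(span):
    """ITN stand-in: read Chinese digits one by one as Arabic digits."""
    return "".join(str(DIGITS.index(ch)) for ch in span)

def read_integer(n):
    """Read 0 <= n < 10000 in Chinese, e.g. 215 -> 二百一十五, 205 -> 二百零五."""
    if not 0 <= n < 10000:
        raise ValueError("this sketch only handles 0..9999")
    if n == 0:
        return DIGITS[0]
    out, started, pending_zero = [], False, False
    for d, unit in zip((int(c) for c in f"{n:04d}"), ("千", "百", "十", "")):
        if d == 0:
            pending_zero = started  # an inner run of zeros is read once, as 零
            continue
        if pending_zero:
            out.append(DIGITS[0])
            pending_zero = False
        out.append(DIGITS[d] + unit)
        started = True
    result = "".join(out)
    return result[1:] if 10 <= n < 20 else result  # 15 -> 十五, not 一十五

def tn_measure(span):
    """TN stand-in: '215.6km' -> 二百一十五点六千米."""
    suffix = ""
    for unit, reading in UNIT_READINGS.items():
        if span.endswith(unit):
            span, suffix = span[: -len(unit)], reading
            break
    integer, _, fraction = span.partition(".")
    spoken = read_integer(int(integer))
    if fraction:
        spoken += "点" + "".join(DIGITS[int(d)] for d in fraction)
    return spoken + suffix

def convert(task, span):
    """Dispatch on the task type, mirroring the task-dependent conversion."""
    if task == "TN":
        return tn_measure(span)
    if task == "ITN":
        return itn_digits(span)
    raise ValueError(f"unknown task type {task!r}")


print(convert("ITN", "一三八五五二零六六九六"))  # 13855206696
print(convert("TN", "215.6km"))  # 二百一十五点六千米
```

Hand-written rules like these are exactly what a single learned conversion model is meant to replace; the sketch only fixes the expected outputs.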

In a specific embodiment, the text conversion model of the invention uses a Transformer network. As shown in FIG. 3, the Transformer network consists of an encoder and a decoder, and each block contains an attention mechanism. More preferably, to improve the conversion, multiple Transformer networks can be used, with the output of each network serving as the input of the next.
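The attention mechanism inside each Transformer block is, in the standard architecture, scaled dot-product attention, softmax(QKᵀ / √d_k) V. A minimal plain-Python sketch of that operation, plus a helper that chains networks so that each output feeds the next; it omits multi-head projections, feed-forward layers, residual connections and layer normalisation, and the helper names are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V, row by row."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

def chain(networks, x):
    """Feed the output of each network into the next (N stacked networks)."""
    for network in networks:
        x = network(x)
    return x


# Identical keys give uniform weights, so the output is the mean of V's rows.
print(attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[2.0, 0.0], [4.0, 2.0]]))
# [[3.0, 1.0]]
```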

In a specific embodiment, for the final application, the conversion result is concatenated with the part of the text that does not need to be converted to obtain the complete text, for example "拨打电话13855206696" ("call 13855206696") and "此行程大约二百一十五点六千米" ("this trip is about two hundred fifteen point six kilometers").
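Putting the converted spans back into the untouched text can be sketched as follows, assuming the spans are sorted, non-overlapping (start, end) character ranges such as those recovered from the tag sequence; `reassemble` is a hypothetical helper.

```python
def reassemble(text, spans, converted):
    """Replace each (start, end) span of text with its converted string.

    Assumes spans are sorted and non-overlapping; text outside the spans
    (the S-tagged part) is copied unchanged.
    """
    pieces, cursor = [], 0
    for (start, end), new in zip(spans, converted):
        pieces.append(text[cursor:start])  # untouched part before the span
        pieces.append(new)                 # conversion result for the span
        cursor = end
    pieces.append(text[cursor:])           # untouched part after the last span
    return "".join(pieces)


print(reassemble("拨打电话一三八五五二零六六九六", [(4, 15)], ["13855206696"]))
# 拨打电话13855206696
```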

Those skilled in the art will understand that the conversion tagging model and the text conversion model must be trained before they can be used. In a specific embodiment of the invention, the conversion tagging model and the text conversion model together form a Norm network, which is trained with training samples.

In a specific embodiment, training uses a loss function defined over the output conditional probability Norm(·) of the Norm network, with s_l, θ and D as defined above.
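The loss formula itself is not reproduced in this text. A common choice consistent with the symbols Norm(·), s_l and D, used here purely as an assumption, is the negative log of Norm(·) averaged over the D samples of a batch, with each sequence probability factorised over its elements s_l. The function names are hypothetical.

```python
import math

def sequence_log_prob(element_probs):
    """log Norm(S | input; theta), assuming a product of per-element probabilities."""
    return sum(math.log(p) for p in element_probs)

def batch_loss(batch):
    """Negative log-likelihood averaged over the D samples of one batch (assumed form)."""
    D = len(batch)
    return -sum(sequence_log_prob(probs) for probs in batch) / D


# Two samples: one predicted with certainty, one with probability 0.5 per element.
print(batch_loss([[1.0, 1.0], [0.5, 0.5]]))
```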

For example, if there are 1000 training texts in total and each batch contains 100 samples, the samples are fed to the model in 10 batches for training.

Specifically, backpropagation can be used, and training ends when the loss function reaches a predetermined threshold. The trained model is then tested with test samples.
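The batching example and the threshold-based stopping rule can be sketched as a loop. `train_step` is a hypothetical callback standing in for one forward pass plus backpropagation on a batch, and the `max_epochs` cap is an added safeguard that the text does not mention.

```python
def batches(samples, batch_size):
    """Yield consecutive mini-batches; the last one may be smaller."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

def train(samples, batch_size, train_step, threshold, max_epochs=100):
    """Run mini-batch training until the batch loss reaches `threshold`.

    `train_step(batch) -> loss` is a hypothetical callback that performs one
    forward pass and one backpropagation update. Returns (epoch, loss).
    """
    loss = None
    for epoch in range(max_epochs):
        for batch in batches(samples, batch_size):
            loss = train_step(batch)
            if loss <= threshold:
                return epoch, loss
    return max_epochs, loss


print(len(list(batches(list(range(1000)), 100))))  # 10
```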

It should be noted that although the steps in the above embodiments are described in a particular order, those skilled in the art will understand that, to achieve the effect of the invention, the steps need not be executed in that order; they may be executed simultaneously (in parallel) or in another order, and such variations fall within the scope of protection of the invention.

In particular, according to embodiments of the invention, the processes described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the invention includes a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a server, or installed from a storage device. When the computer program is executed, the functions defined in the methods of the embodiments of the invention are carried out.

The text processing method according to an embodiment of the invention can be applied in the dialogue system of the intelligent data cockpit described above, for example as a function of the natural language processing module.

To this end, the invention also provides an electronic device which, as shown in FIG. 4, may include a processor (for example a central processing unit or a graphics processor) and a storage medium. The storage medium stores a program that, when executed by the processor, implements the text processing method according to an embodiment of the invention.

The program includes computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like.

Further, when executed by the processor, the program can also implement speech recognition and speech synthesis; that is, the electronic device includes software functional modules such as a speech recognition module, a natural language processing module and a speech synthesis module.

To this end, the invention also provides an in-vehicle dialogue system comprising the above electronic device and an input/output device; the processor and the storage medium may be connected to each other and to the input/output device via a bus.

In a specific embodiment, the input/output device of the in-vehicle dialogue system of the invention includes a touch screen, a microphone, a speaker and the like, which together provide the hardware and software required by the intelligent data cockpit dialogue system in autonomous driving.

The present invention further provides a computer-readable storage medium storing a program that, when executed by a processor, implements the text processing method according to an embodiment of the present invention.

It should be noted that the computer-readable medium of the present invention may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus, or device. A computer-readable signal medium, in turn, may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted using any suitable medium, including but not limited to wire, optical cable, RF (radio frequency), or any suitable combination of the foregoing.

Further, it should be understood that, because the modules are defined only to describe the functional units of the apparatus of the present invention, the physical device corresponding to a module may be the processor itself, part of the software in the processor, part of the hardware, or part of a combination of software and hardware. Therefore, the number of modules shown in the figures is merely illustrative.

The technical solutions of the present invention have thus been described with reference to the preferred embodiments shown in the accompanying drawings. However, those skilled in the art will readily understand that the scope of protection of the present invention is not limited to these specific embodiments. Without departing from the principle of the present invention, those skilled in the art may make equivalent changes or substitutions to the relevant technical features, and the technical solutions resulting from such changes or substitutions will fall within the scope of protection of the present invention.

Claims

1. A text processing method, characterized by comprising:

Performing word embedding on a text to obtain a word embedding vector;

Obtaining a task identifier;

Inputting the word embedding vector and the task identifier into a trained conversion marking model to obtain a task type and conversion marks, wherein the conversion marks include a first conversion mark characterizing the part of the text that needs to be converted and a second conversion mark characterizing the part of the text that does not need to be converted;

Fusing the task type and the text portion corresponding to the first conversion mark to obtain fusion information; and

Inputting the fusion information into a trained text conversion model, and correspondingly converting the text portion to be converted according to the task type to obtain a conversion result.

2. The method according to claim 1, wherein obtaining the task identifier comprises:

Obtaining a scene to which the text is to be applied, wherein the scene includes a first scene corresponding to text normalization and a second scene corresponding to inverse text normalization; and

Determining the task identifier according to the scene to which the text is to be applied, wherein the task identifier includes a first identifier corresponding to the first scene and a second identifier corresponding to the second scene.

3. The method according to claim 2, wherein inputting the word embedding vector and the task identifier into the trained conversion marking model to obtain the task type and the conversion marks comprises:

Inputting the word embedding vector and the task identifier into the trained conversion marking model to obtain a mark sequence, wherein the mark sequence includes a task type bit and conversion mark bits;

wherein the task type in the task type bit includes a text normalization task corresponding to the first identifier and an inverse text normalization task corresponding to the second identifier; and

wherein the conversion marks in the conversion mark bits include the first conversion mark and the second conversion mark.

4. The method according to claim 1, wherein fusing the task type and the text portion corresponding to the first conversion mark to obtain the fusion information comprises:

Fusing the task type, the text portion corresponding to the first conversion mark, and the context of the text corresponding to the first conversion mark to obtain the fusion information.

5. The method according to claim 4, wherein fusing the task type, the text portion corresponding to the first conversion mark, and the context of the text corresponding to the first conversion mark comprises:

Concatenating the task type, the text portion corresponding to the first conversion mark, and the context of the text corresponding to the first conversion mark.

6. The method according to claim 1, characterized in that inputting the fusion information into the trained text conversion model, and correspondingly converting the text portion to be converted according to the task type to obtain the conversion result, comprises:

If the task type is a text normalization task, the text conversion model performs text normalization processing on the to-be-converted text portion to obtain a text normalization conversion result;

If the task type is an inverse text normalization task, the text conversion model performs inverse text normalization processing on the to-be-converted text portion to obtain an inverse text normalization conversion result.

7. The method according to claim 6, wherein the method further comprises:

Concatenating the conversion result with the part of the text that does not need to be converted to obtain the complete text.

8. An electronic device, comprising a processor and a memory, wherein the memory stores a program that, when executed by the processor, implements the method according to any one of claims 1 to 7.

9. A vehicle-mounted dialogue system, characterized by comprising:

The electronic device according to claim 8; and

An input/output device.

10. An automobile, characterized by comprising the vehicle-mounted dialogue system according to claim 9.
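The data flow of the claimed method (claims 1 to 7) can be traced with a small pipeline. This is a minimal sketch under loud assumptions: the trained conversion marking model and text conversion model are replaced by hypothetical rule-based stand-ins that only handle digit sequences, word embedding is omitted (the stand-in reads characters directly), and the scene names, task identifiers, mark symbols `"C"`/`"K"` and the `|`-separated fusion format are all invented for illustration; none of them comes from the patent.

```python
from typing import List, Tuple

# All names and values below are illustrative assumptions, not from the patent.
SCENE_TO_TASK_ID = {"speech_synthesis": 0, "speech_recognition": 1}  # claim 2
TASK_TYPES = {0: "TN", 1: "ITN"}
CONVERT, KEEP = "C", "K"  # first / second conversion marks
DIGITS = "零一二三四五六七八九"


def mark_text(text: str, task_id: int) -> List[str]:
    """Stand-in for the trained conversion marking model (claims 1 and 3).

    Returns a mark sequence: a task-type bit followed by one conversion
    mark per character. Word embedding is skipped; characters are read directly.
    """
    task = TASK_TYPES[task_id]
    source = "0123456789" if task == "TN" else DIGITS
    return [task] + [CONVERT if ch in source else KEEP for ch in text]


def spans(char_marks: List[str]) -> List[Tuple[int, int]]:
    """Group runs of CONVERT marks into [start, end) character spans."""
    result: List[Tuple[int, int]] = []
    start = None
    for i, mark in enumerate(char_marks + [KEEP]):  # sentinel closes a final run
        if mark == CONVERT and start is None:
            start = i
        elif mark != CONVERT and start is not None:
            result.append((start, i))
            start = None
    return result


def fuse(task: str, text: str, start: int, end: int) -> str:
    """Claims 4-5: concatenate task type, span to convert, and its context."""
    context = text[:start] + "[SPAN]" + text[end:]
    return f"{task}|{text[start:end]}|{context}"


def convert(fusion: str) -> str:
    """Stand-in for the trained text conversion model (claim 6)."""
    task, span, _context = fusion.split("|", 2)  # context unused by this toy rule
    if task == "TN":
        return "".join(DIGITS[int(c)] for c in span)
    return "".join(str(DIGITS.index(c)) for c in span)


def process(text: str, scene: str) -> str:
    task_id = SCENE_TO_TASK_ID[scene]                 # claim 2
    marks = mark_text(text, task_id)                  # claims 1 and 3
    task, char_marks = marks[0], marks[1:]
    pieces, cursor = [], 0
    for start, end in spans(char_marks):
        pieces.append(text[cursor:start])             # part not to be converted
        pieces.append(convert(fuse(task, text, start, end)))
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)                            # claim 7: splice back together


print(process("导航到3号门", "speech_synthesis"))      # 导航到三号门
print(process("打电话给一三八", "speech_recognition"))  # 打电话给138
```

Because the stand-in marks every numeral character, it would also convert the 一 in 一起 and always read numbers digit by digit. Presumably such context-dependent decisions are why the claimed method feeds the context into the trained text conversion model (claim 4) instead of applying a fixed mapping.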