CN114357982A - A data processing method and device for constructing a domain dictionary

Info

Publication number: CN114357982A
Application number: CN202111654104.0A
Authority: CN
Inventors: 黄于晏; 陈莹莹; 钟艺豪; 陈畅新; 孔晓晴
Original assignee: Youmi Technology Co ltd
Current assignee: Youmi Technology Co ltd
Priority date: 2021-12-30
Filing date: 2021-12-30
Publication date: 2022-04-15
Anticipated expiration: 2041-12-30
Also published as: CN114357982B

Abstract

The invention discloses a data processing method and device for constructing a domain dictionary. The method includes: acquiring material information to be processed, where the material information to be processed includes marketing text information, and/or song information to be processed, and/or rhyming word table information; performing segmentation and filtering processing on the material information to be processed to obtain a to-be-used dictionary, where the to-be-used dictionary is used to screen words in material text; and performing segmentation and screening processing on the to-be-used dictionary to obtain domain keyword information, where the keyword information is strongly correlated with the advertising domain to which the material information to be processed belongs, and the domain keywords are used to construct a domain dictionary. It can be seen that the present invention can obtain domain keyword information for constructing a domain dictionary through comprehensive processing of the material information to be processed, such as segmentation and filtering processing and segmentation and screening processing, which helps to reduce the requirements on song materials, improve the efficiency of processing song materials, and in turn improve the generation efficiency of advertising lyrics and reduce production costs.

Description

A data processing method and device for constructing a domain dictionary

Technical Field

The present invention relates to the technical field of data processing, and in particular to a data processing method and device for constructing a domain dictionary.

Background

Converting popular songs into advertising lyrics is an important way for advertising promotion to adapt to the development of new media such as elevator advertising. This process requires an intelligent model to learn and understand the language style, social trends, and emotional expression behind the promoted songs in order to generate catchy advertising lyrics. At present, however, the song generation process commonly suffers from problems such as high requirements on song materials and high production costs for advertising lyrics. It is therefore particularly important to provide a data processing method and device for constructing a domain dictionary, so as to reduce the requirements on song materials, improve the efficiency of processing song materials, and in turn improve the generation efficiency of advertising lyrics and reduce production costs.

Summary of the Invention

The technical problem to be solved by the present invention is to provide a data processing method and device for constructing a domain dictionary, which can obtain domain keyword information for constructing a domain dictionary through comprehensive processing of the material information to be processed, such as segmentation and filtering processing and segmentation and screening processing. This helps to reduce the requirements on song materials, improve the efficiency of processing song materials, and in turn improve the generation efficiency of advertising lyrics and reduce production costs.

In order to solve the above technical problem, a first aspect of the embodiments of the present invention discloses a data processing method for constructing a domain dictionary, the method comprising:

Acquiring material information to be processed, where the material information to be processed includes marketing text information, and/or song information to be processed, and/or rhyming word table information;

Performing segmentation and filtering processing on the material information to be processed to obtain a to-be-used dictionary, where the to-be-used dictionary is used to screen words in material text;

Performing segmentation and screening processing on the to-be-used dictionary to obtain domain keyword information, where the keyword information is strongly correlated with the advertising domain to which the material information to be processed belongs, and the domain keywords are used to construct a domain dictionary.

As an optional implementation, in the first aspect of the embodiments of the present invention, performing segmentation and filtering processing on the material information to be processed to obtain the to-be-used dictionary includes:

Performing recognition processing on the song information to be processed to obtain initial lyrics text information;

Performing segmentation and filtering processing on the initial lyrics text information to obtain the to-be-used dictionary.

As an optional implementation, in the first aspect of the embodiments of the present invention, performing segmentation and filtering processing on the initial lyrics text information to obtain the to-be-used dictionary includes:

Performing character segmentation and line separation processing on the initial lyrics text information to obtain first intermediate lyrics text information;

Performing calculation processing on the first intermediate lyrics text information to obtain second intermediate lyrics text information;

Screening the second intermediate lyrics text information by using a preset stop word list to obtain third intermediate lyrics text information, where the third intermediate lyrics text information includes several lyrics text fragments;

Performing calculation, screening, and sorting processing on the third intermediate lyrics text information to obtain the to-be-used dictionary.

As an optional implementation, in the first aspect of the embodiments of the present invention, performing calculation, screening, and sorting processing on the third intermediate lyrics text information to obtain the to-be-used dictionary includes:

Performing information entropy calculation processing on the third intermediate lyrics text information to obtain first text entropy information;

Screening the first text entropy information by using a preset entropy threshold to obtain second text entropy information;

Sorting the second text entropy information to obtain text sequence information, where the sorting is performed according to the frequency information corresponding to the lyrics text fragments;

Processing the text sequence information to obtain the to-be-used dictionary.
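As a rough illustration of the entropy screening and frequency sorting steps above, the following Python sketch may help. The source does not say which information entropy is computed, so the sketch assumes the left/right neighbor (branching) entropy that is common in new-word discovery. The names `shannon_entropy`, `branch_entropy`, and `build_candidate_dictionary`, and the `corpus` and `freq` inputs, are hypothetical and do not come from the patent.

```python
import math
from collections import Counter
from typing import Dict, List


def shannon_entropy(counter: Counter) -> float:
    """Shannon entropy (natural log) of a count distribution; 0.0 when empty."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counter.values())


def branch_entropy(fragment: str, corpus: List[str]) -> float:
    """ASSUMPTION: score a fragment by the smaller of its left-neighbor and
    right-neighbor character entropies over all occurrences in the corpus."""
    left, right = Counter(), Counter()
    for line in corpus:
        start = line.find(fragment)
        while start != -1:
            end = start + len(fragment)
            if start > 0:
                left[line[start - 1]] += 1
            if end < len(line):
                right[line[end]] += 1
            start = line.find(fragment, start + 1)
    return min(shannon_entropy(left), shannon_entropy(right))


def build_candidate_dictionary(
    freq: Dict[str, int], corpus: List[str], entropy_threshold: float
) -> List[str]:
    """Keep fragments whose entropy reaches the threshold, then sort the
    survivors by their frequency, most frequent first."""
    kept = [w for w in freq if branch_entropy(w, corpus) >= entropy_threshold]
    return sorted(kept, key=lambda w: freq[w], reverse=True)
```

In this sketch a fragment that always appears next to the same characters gets a low score and is dropped, on the assumption that it is more likely a piece of a longer word than a word in its own right.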

As an optional implementation, in the first aspect of the embodiments of the present invention, performing segmentation and screening processing on the to-be-used dictionary to obtain the domain keyword information includes:

Processing the to-be-used dictionary and the marketing text information by using a preset word vector model to obtain word vector information;

Processing the to-be-used dictionary and the marketing text information by using a preset first word segmentation tool to obtain to-be-used domain word information;

Performing conversion and screening processing on the word vector information and the to-be-used domain word information to obtain the domain keyword information.

As an optional implementation, in the first aspect of the embodiments of the present invention, performing conversion and screening processing on the word vector information and the to-be-used domain word information to obtain the domain keyword information includes:

Converting the to-be-used domain word information by using the word vector information to obtain domain word vector information;

Performing clustering calculation processing on the domain word vector information to obtain word distance information;

Performing grouping and screening processing on the word distance information to obtain the domain keyword information.
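The conversion, clustering, and grouping steps above could be sketched in Python as follows. The source names neither the clustering algorithm nor the screening rule, so this sketch assumes a plain k-means over the domain word vectors, takes each word's distance to its cluster centroid as the "word distance information", and keeps the closest words of each group. The function names and the `k` and `per_cluster` parameters are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def _dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _mean(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(vectors: List[Vector], k: int, iterations: int = 10) -> Tuple[List[int], List[List[float]]]:
    """Minimal k-means; the first k vectors seed the centroids (an assumption
    made so that the sketch is deterministic)."""
    if k < 1 or k > len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        labels = [min(range(k), key=lambda j: _dist(v, centroids[j])) for v in vectors]
        # Move every centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = _mean(members)
    return labels, centroids


def select_keywords(word_vectors: Dict[str, Vector], k: int, per_cluster: int) -> List[str]:
    """Group words by cluster and keep the per_cluster words nearest to each centroid."""
    words = list(word_vectors)
    vectors = [word_vectors[w] for w in words]
    labels, centroids = kmeans(vectors, k)
    keywords: List[str] = []
    for j in range(k):
        group = [(_dist(v, centroids[j]), w)
                 for w, v, lab in zip(words, vectors, labels) if lab == j]
        keywords.extend(w for _, w in sorted(group)[:per_cluster])
    return keywords
```

In practice the vectors would come from the preset word vector model mentioned above; the sketch only needs a mapping from each to-be-used domain word to its vector.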

As an optional implementation, in the first aspect of the embodiments of the present invention, after performing segmentation and screening processing on the to-be-used dictionary to obtain the domain keyword information, the method further includes:

Processing the rhyming word table information to obtain a same-rhyme list;

Processing the same-rhyme list, the domain keyword information, and the song information to be processed by using a preset second word segmentation tool to obtain training text information, where the training text information is used to train a format restriction model, and the format restriction model is used to constrain the format of advertising lyrics.

A second aspect of the embodiments of the present invention discloses a data processing device for constructing a domain dictionary, the device comprising:

An acquisition module, configured to acquire material information to be processed, where the material information to be processed includes marketing text information, and/or song information to be processed, and/or rhyming word table information;

A first processing module, configured to perform segmentation and filtering processing on the material information to be processed to obtain a to-be-used dictionary, where the to-be-used dictionary is used to screen words in material text;

A second processing module, configured to perform segmentation and screening processing on the to-be-used dictionary to obtain domain keyword information, where the keyword information is strongly correlated with the advertising domain to which the material information to be processed belongs, and the domain keywords are used to construct a domain dictionary.

As an optional implementation, in the second aspect of the embodiments of the present invention, the first processing module includes a first processing submodule and a second processing submodule, wherein:

The first processing submodule is configured to perform recognition processing on the song information to be processed to obtain initial lyrics text information;

The second processing submodule is configured to perform segmentation and filtering processing on the initial lyrics text information to obtain the to-be-used dictionary.

As an optional implementation, in the second aspect of the embodiments of the present invention, the second processing submodule performing segmentation and filtering processing on the initial lyrics text information to obtain the to-be-used dictionary includes:

As an optional implementation, in the second aspect of the embodiments of the present invention, the second processing submodule performing calculation, screening, and sorting processing on the third intermediate lyrics text information to obtain the to-be-used dictionary includes:

As an optional implementation, in the second aspect of the embodiments of the present invention, the specific manner in which the second processing module performs segmentation and screening processing on the to-be-used dictionary to obtain the domain keyword information is:

As an optional implementation, in the second aspect of the embodiments of the present invention, the specific manner in which the second processing module performs conversion and screening processing on the word vector information and the to-be-used domain word information to obtain the domain keyword information is:

As an optional implementation, in the second aspect of the embodiments of the present invention, after the second processing module performs segmentation and screening processing on the to-be-used dictionary to obtain the domain keyword information, the device further includes:

A third processing module, configured to process the rhyming word table information to obtain a same-rhyme list;

A fourth processing module, configured to process the same-rhyme list, the domain keyword information, and the song information to be processed by using a preset second word segmentation tool to obtain training text information, where the training text information is used to train a format restriction model, and the format restriction model is used to constrain the format of advertising lyrics.

A third aspect of the present invention discloses another data processing device for constructing a domain dictionary, the device comprising:

A memory storing executable program code;

A processor coupled to the memory;

The processor invokes the executable program code stored in the memory to execute some or all of the steps of the data processing method for constructing a domain dictionary disclosed in the first aspect of the embodiments of the present invention.

A fourth aspect of the present invention discloses a computer storage medium storing computer instructions which, when invoked, are used to execute some or all of the steps of the data processing method for constructing a domain dictionary disclosed in the first aspect of the embodiments of the present invention.

Compared with the prior art, the embodiments of the present invention have the following beneficial effects:

In the embodiments of the present invention, material information to be processed is acquired, where the material information to be processed includes marketing text information, and/or song information to be processed, and/or rhyming word table information; segmentation and filtering processing is performed on the material information to be processed to obtain a to-be-used dictionary, where the to-be-used dictionary is used to screen words in material text; and segmentation and screening processing is performed on the to-be-used dictionary to obtain domain keyword information, where the keyword information is strongly correlated with the advertising domain to which the material information to be processed belongs, and the domain keywords are used to construct a domain dictionary. It can be seen that the present invention can obtain domain keyword information for constructing a domain dictionary through comprehensive processing of the material information to be processed, such as segmentation and filtering processing and segmentation and screening processing, which helps to reduce the requirements on song materials, improve the efficiency of processing song materials, and in turn improve the generation efficiency of advertising lyrics and reduce production costs.

Brief Description of the Drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of a data processing method for constructing a domain dictionary disclosed in an embodiment of the present invention;

FIG. 2 is a schematic flowchart of another data processing method for constructing a domain dictionary disclosed in an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a data processing device for constructing a domain dictionary disclosed in an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of another data processing device for constructing a domain dictionary disclosed in an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of yet another data processing device for constructing a domain dictionary disclosed in an embodiment of the present invention.

Detailed Description

In order to enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

The terms "first", "second", and the like in the description, the claims, and the above drawings of the present invention are used to distinguish different objects, not to describe a specific order. Furthermore, the terms "comprising" and "having" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, apparatus, product, or device that comprises a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to the process, method, product, or device.

Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present invention. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.

The invention discloses a data processing method and device for constructing a domain dictionary, which can obtain domain keyword information for constructing a domain dictionary through comprehensive processing of the material information to be processed, such as segmentation and filtering processing and segmentation and screening processing. This helps to reduce the requirements on song materials, improve the efficiency of processing song materials, and in turn improve the generation efficiency of advertising lyrics and reduce production costs. Detailed descriptions are given below.

Embodiment 1

Please refer to FIG. 1, which is a schematic flowchart of a data processing method for constructing a domain dictionary disclosed in an embodiment of the present invention. The method described in FIG. 1 is applied in a data processing system, such as a local server or a cloud server used for data processing management for constructing a domain dictionary, which is not limited in the embodiments of the present invention. As shown in FIG. 1, the data processing method for constructing a domain dictionary may include the following operations:

101. Acquire material information to be processed.

In the embodiments of the present invention, the material information to be processed includes marketing text information, and/or song information to be processed, and/or rhyming word table information, which is not limited in the embodiments of the present invention.

102. Perform segmentation and filtering processing on the material information to be processed to obtain a to-be-used dictionary.

In the embodiments of the present invention, the to-be-used dictionary is used to screen words in material text.

103. Perform segmentation and screening processing on the to-be-used dictionary to obtain domain keyword information.

In the embodiments of the present invention, the keyword information is strongly correlated with the advertising domain to which the material information to be processed belongs.

In the embodiments of the present invention, the domain keywords are used to construct a domain dictionary.

It can be seen that implementing the data processing method for constructing a domain dictionary described in the embodiments of the present invention can obtain domain keyword information for constructing a domain dictionary through comprehensive processing of the material information to be processed, such as segmentation and filtering processing and segmentation and screening processing, which helps to reduce the requirements on song materials, improve the efficiency of processing song materials, and in turn improve the generation efficiency of advertising lyrics and reduce production costs.

In an optional embodiment, performing segmentation and filtering processing on the material information to be processed to obtain the to-be-used dictionary in step 102 includes:

Performing recognition processing on the song information to be processed to obtain initial lyrics text information;

Performing segmentation and filtering processing on the initial lyrics text information to obtain the to-be-used dictionary.

Optionally, the song information to be processed includes several songs to be processed.

Optionally, the initial lyrics text information includes several initial lyrics texts.

In this optional embodiment, as an optional implementation, the specific manner of performing recognition processing on the song information to be processed to obtain the initial lyrics text information is:

For any song to be processed, determining whether the song to be processed has lyrics to obtain a lyrics judgment result;

When the lyrics judgment result is yes, extracting the lyrics of the song to be processed to obtain the initial lyrics text corresponding to the song to be processed;

When the lyrics judgment result is no, recognizing the song to be processed by using a preset speech recognition model to obtain the initial lyrics text corresponding to the song to be processed.
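The lyrics judgment above amounts to a simple fallback, which might be sketched in Python as follows. The `Song` container and the `recognize` callable are hypothetical stand-ins: the source does not describe how songs are stored or which speech recognition model is used, so the model is passed in as an arbitrary function from an audio path to text.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Song:
    """Hypothetical container for one song to be processed."""
    title: str
    audio_path: str
    lyrics: Optional[str] = None  # None or blank means no lyrics are attached


def get_initial_lyrics(song: Song, recognize: Callable[[str], str]) -> str:
    """Return the attached lyrics when present; otherwise fall back to the
    caller-supplied speech recognition function."""
    if song.lyrics and song.lyrics.strip():
        return song.lyrics
    return recognize(song.audio_path)


def get_all_initial_lyrics(songs: List[Song], recognize: Callable[[str], str]) -> List[str]:
    """Apply the lyrics judgment to every song to be processed."""
    return [get_initial_lyrics(song, recognize) for song in songs]
```

Passing the recognizer as a parameter keeps the sketch independent of any particular speech recognition library.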

It can be seen that implementing the data processing method for constructing a domain dictionary described in the embodiments of the present invention can obtain the to-be-used dictionary through recognition processing and segmentation and filtering processing of the song information to be processed, which helps to reduce the requirements on song materials, improve the efficiency of processing song materials, and in turn improve the generation efficiency of advertising lyrics and reduce production costs.

In another optional embodiment, performing segmentation and filtering processing on the initial lyrics text information to obtain the to-be-used dictionary includes:

Performing character segmentation and line separation processing on the initial lyrics text information to obtain first intermediate lyrics text information;

Performing calculation processing on the first intermediate lyrics text information to obtain second intermediate lyrics text information;

Screening the second intermediate lyrics text information by using a preset stop word list to obtain third intermediate lyrics text information, where the third intermediate lyrics text information includes several lyrics text fragments;

Performing calculation, screening, and sorting processing on the third intermediate lyrics text information to obtain the to-be-used dictionary.

Optionally, the character segmentation of the initial lyrics text information converts every character in the initial lyrics text that is not a Chinese character, an English letter, or a digit into a separator symbol.

Optionally, the separator symbol includes a space and/or an identifier, which is not limited in the embodiments of the present invention.

Optionally, the line separation processing of the initial lyrics text information splits the text at the separator symbols.
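The character segmentation and line separation just described can be sketched in Python with a regular expression. The sketch assumes that "Chinese characters" means the basic CJK range U+4E00 to U+9FFF and that the separator is a space; the function name `split_lyrics` is hypothetical.

```python
import re
from typing import List

# ASSUMPTION: "Chinese, English, or digit" is read as the basic CJK block,
# ASCII letters, and ASCII digits. Everything else becomes a separator.
_NON_WORD = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9]+")


def split_lyrics(text: str, sep: str = " ") -> List[str]:
    """Replace each run of other characters with the separator, then split
    the text at the separator and drop empty pieces."""
    marked = _NON_WORD.sub(sep, text)
    return [piece for piece in marked.split(sep) if piece]
```

For example, punctuation, whitespace, and line breaks in a lyric line all collapse into separators, leaving only runs of Chinese characters, letters, and digits as the first intermediate lyrics texts.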

Optionally, the first intermediate lyrics text information includes several first intermediate lyrics texts.

Optionally, the second intermediate lyrics text information includes several second intermediate lyrics texts.

In this optional embodiment, as an optional implementation, the specific manner of performing calculation processing on the first intermediate lyrics text information to obtain the second intermediate lyrics text information is:

For any first intermediate lyrics text, calculating the mutual information entropy of every two adjacent characters in the first intermediate lyrics text to obtain entropy value information corresponding to the first intermediate lyrics text, where the entropy value information includes several entropy values;

For any entropy value, determining whether the entropy value is greater than or equal to an entropy threshold to obtain an entropy judgment result;

When the entropy judgment result is no, breaking the text at the characters corresponding to the entropy value to obtain the second intermediate lyrics text corresponding to the entropy value.

Optionally, the entropy value represents how closely two characters are connected: the larger the entropy value, the higher the probability that the two characters form a word.
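One possible reading of this step is sketched below in Python. The source does not give a formula for the "mutual information entropy", so the sketch assumes pointwise mutual information, log(p(ab) / (p(a) p(b))), estimated from character and adjacent-pair counts over all first intermediate lyrics texts. The function names and the threshold value used in the usage note are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def adjacent_pmi(segments: List[str]) -> Dict[Tuple[str, str], float]:
    """ASSUMPTION: score each adjacent character pair by pointwise mutual
    information estimated from counts over all segments."""
    chars: Counter = Counter()
    pairs: Counter = Counter()
    for seg in segments:
        chars.update(seg)
        pairs.update(zip(seg, seg[1:]))
    n_chars = sum(chars.values())
    n_pairs = sum(pairs.values())
    return {
        (a, b): math.log((c / n_pairs) / ((chars[a] / n_chars) * (chars[b] / n_chars)))
        for (a, b), c in pairs.items()
    }


def break_weak_links(segment: str, pmi: Dict[Tuple[str, str], float], threshold: float) -> List[str]:
    """Cut the segment between two adjacent characters whenever their score
    is below the threshold; pairs never seen in the counts are always cut."""
    if not segment:
        return []
    pieces: List[str] = []
    current = segment[0]
    for a, b in zip(segment, segment[1:]):
        if pmi.get((a, b), float("-inf")) >= threshold:
            current += b
        else:
            pieces.append(current)
            current = b
    pieces.append(current)
    return pieces
```

With the three toy segments "奶茶好喝", "奶茶真甜", and "好喝奶茶" and a threshold of 1.5, the weakly linked pair 茶/好 is cut, so "奶茶好喝" breaks into "奶茶" and "好喝".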

在该可选的实施例中，作为另一种可选的实施方式，上述利用预设的停用词表对第二中间歌词文本信息进行筛选处理，得到第三中间歌词文本信息的具体方式为：In this optional embodiment, as another optional implementation manner, the above-mentioned specific method for screening the second intermediate lyrics text information by using a preset stop word table to obtain the third intermediate lyrics text information is as follows: :

利用预设的长度筛选条件对上述第二中间歌词文本信息进行筛选处理，得到长度歌词文本信息；上述长度歌词文本信息包括若干个长度歌词文本；The above-mentioned second intermediate lyric text information is screened by using the preset length screening condition to obtain the length lyric text information; the above-mentioned length lyric text information includes several length lyric texts;

利用预设的停用词表对上述长度歌词文本信息进行过滤处理，得到备选歌词文本信息；上述备选歌词文本信息包括若干个备选歌词文本；The above-mentioned length lyrics text information is filtered by using a preset stop word table to obtain alternative lyrics text information; the above-mentioned alternative lyrics text information includes several alternative lyrics texts;

对上述备选歌词文本信息中的备选歌词文本进行出现频次统计，得到第一频次信息；第一频次信息包括若干个词频次；Counting the occurrence frequency of the alternative lyrics text in the above-mentioned alternative lyrics text information, to obtain the first frequency information; the first frequency information includes several word frequencies;

利用预设的频次阈值对上述第一频次信息进行处理，得到第三中间歌词文本信息。The above-mentioned first frequency information is processed by using a preset frequency threshold to obtain third intermediate lyrics text information.

Optionally, the second intermediate lyric text information is screened by using the preset length screening condition to obtain the length-filtered lyric text information as follows:

For any second intermediate lyric text, determine whether the second intermediate lyric text satisfies the preset length screening condition, to obtain a length judgment result;

When the length judgment result is yes, determine that the second intermediate lyric text is a length-filtered lyric text.

Optionally, the above length screening condition includes the number of characters in the text being 2, and/or 3, and/or 4, which is not limited in this embodiment of the present invention.

Optionally, the length-filtered lyric text information is filtered by using the preset stop word list to obtain the candidate lyric text information as follows:

For any length-filtered lyric text, determine whether the preset stop word list contains that lyric text, to obtain a stop word judgment result;

When the stop word judgment result is no, determine that the length-filtered lyric text is a candidate lyric text.

Optionally, the stop word list includes several words that are used with high frequency but carry no substantive meaning.

Optionally, the first frequency information is processed by using the preset frequency threshold to obtain the third intermediate lyric text information as follows:

For any word frequency, determine whether the word frequency is greater than the preset frequency threshold, to obtain a frequency judgment result;

When the frequency judgment result is yes, determine that the candidate lyric text corresponding to the word frequency is a third intermediate lyric text.
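The three screening stages above (length condition, stop word list, frequency threshold) can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: the function name, the parameter names, and the default threshold value are assumptions, and only the allowed lengths of 2, 3, and 4 characters and the "greater than" comparison come from the description.

```python
from collections import Counter


def filter_lyric_candidates(candidates, stop_words,
                            allowed_lengths=(2, 3, 4), freq_threshold=1):
    """Length filter -> stop word filter -> frequency threshold (hypothetical helper)."""
    # Stage 1: keep only texts whose character count satisfies the length condition.
    length_filtered = [c for c in candidates if len(c) in allowed_lengths]
    # Stage 2: drop texts that appear in the stop word list.
    kept = [c for c in length_filtered if c not in stop_words]
    # Stage 3: count occurrences and keep texts whose count is strictly
    # greater than the threshold, as the description states.
    counts = Counter(kept)
    return {text: n for text, n in counts.items() if n > freq_threshold}


if __name__ == "__main__":
    sample = ["夏天", "夏天", "的", "我们", "我们", "我们", "阳光", "一二三四五"]
    print(filter_lyric_candidates(sample, stop_words={"我们"}))
```

The result maps each surviving text to its count, so a later step can reuse the frequencies without recounting.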

It can be seen that implementing the data processing method for constructing a domain dictionary described in this embodiment of the present invention can obtain the to-be-used dictionary through comprehensive processing of the initial lyric text information, such as character segmentation, line segmentation, calculation, and screening and sorting. This helps reduce the requirements on song materials and improve the efficiency of processing song materials, which in turn improves the generation efficiency of advertising lyrics and reduces production costs.

In yet another optional embodiment, performing calculation, screening and sorting processing on the third intermediate lyric text information to obtain the to-be-used dictionary includes:

Performing information entropy calculation processing on the third intermediate lyric text information to obtain first text entropy information;

Screening the first text entropy information by using a preset entropy threshold to obtain second text entropy information;

Sorting the second text entropy information to obtain text sequence information; the sorting is performed according to the frequency information corresponding to the lyric text fragments;

Processing the text sequence information to obtain the to-be-used dictionary.

In this optional embodiment, as an optional implementation, the second text entropy information is sorted to obtain the text sequence information as follows:

Calculate the frequency with which the second text entropy information occurs in the initial lyric text information to obtain second frequency information; the second frequency information includes several text frequencies;

Sort the second text entropy information in descending order of text frequency to obtain the text sequence information.
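The entropy calculation, entropy screening, and frequency sorting steps can be illustrated with the sketch below. The description does not define the entropy formula in this passage, so the code assumes a right-neighbour Shannon entropy (the entropy of the characters that immediately follow a fragment in the lyrics) and assumes that fragments with entropy above the threshold are kept. The function names and the sample corpus are hypothetical.

```python
import math
from collections import Counter


def right_entropy(fragment, corpus):
    """Shannon entropy (bits) of the characters following `fragment` in `corpus`.

    Assumption: this neighbour entropy stands in for the unspecified
    "information entropy calculation" of the description.
    """
    followers = Counter()
    start = corpus.find(fragment)
    while start != -1:
        nxt = start + len(fragment)
        if nxt < len(corpus):
            followers[corpus[nxt]] += 1
        start = corpus.find(fragment, start + 1)
    total = sum(followers.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in followers.values())


def build_candidate_dictionary(fragments, corpus, entropy_threshold):
    """Keep fragments above the entropy threshold, sorted by frequency, descending."""
    kept = [f for f in fragments if right_entropy(f, corpus) > entropy_threshold]
    # str.count counts non-overlapping occurrences, which is enough for a sketch.
    return sorted(kept, key=corpus.count, reverse=True)


if __name__ == "__main__":
    lyrics = "abxabyabzefgefh"
    print(build_candidate_dictionary(["ef", "ab", "zz"], lyrics, entropy_threshold=0.5))
```

A production version would probably combine left and right neighbour entropy and precompute counts in one pass over the corpus; both are omitted here for brevity.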

It can be seen that implementing the data processing method for constructing a domain dictionary described in this embodiment of the present invention can obtain the to-be-used dictionary through comprehensive processing of the third intermediate lyric text information, such as information entropy calculation, screening, and sorting. This further helps reduce the requirements on song materials and improve the efficiency of processing song materials, which in turn improves the generation efficiency of advertising lyrics and reduces production costs.

Embodiment 2

Please refer to FIG. 2, which is a schematic flowchart of another data processing method for constructing a domain dictionary disclosed in an embodiment of the present invention. The method described in FIG. 2 is applied in a data processing system, such as a local server or a cloud server that manages data processing for constructing a domain dictionary, which is not limited in this embodiment of the present invention. As shown in FIG. 2, the data processing method for constructing a domain dictionary may include the following operations:

201. Acquire material information to be processed.

202. Perform segmentation and filtering processing on the material information to be processed to obtain a to-be-used dictionary.

203. Process the to-be-used dictionary and the marketing text information by using a preset word vector model to obtain word vector information.

204. Process the to-be-used dictionary and the marketing text information by using a preset first word segmentation tool to obtain to-be-used domain word information.

205. Perform conversion and screening processing on the word vector information and the to-be-used domain word information to obtain domain keyword information.

In this embodiment of the present invention, for the specific technical details and explanations of technical terms of steps 201 to 202, reference may be made to the detailed description of steps 101 to 102 in Embodiment 1, which is not repeated here.

Optionally, the processing of the to-be-used dictionary and the marketing text information with the preset first word segmentation tool may be performed before, after, or in parallel with the processing of the to-be-used dictionary and the marketing text information with the preset word vector model, which is not limited in this embodiment of the present invention.

Optionally, the word vector model includes a fastText-based model, and/or a GloVe-based model, and/or a word2vec-based model, and/or a BERT-based model, and/or a WoBERT-based model, which is not limited in this embodiment of the present invention.

Optionally, the first word segmentation tool includes a jieba-based tokenizer, and/or a LAC-based tokenizer, and/or an LTP-based tokenizer, and/or a THULAC-based tokenizer, and/or a HanLP-based tokenizer, and/or a pkuseg-based tokenizer, which is not limited in this embodiment of the present invention.

In this optional embodiment, as an optional implementation, the to-be-used dictionary and the marketing text information are processed by using the preset first word segmentation tool to obtain the to-be-used domain word information as follows:

Import the to-be-used dictionary into the preset first word segmentation tool, and perform word segmentation and part-of-speech tagging on the marketing text information to obtain text part-of-speech information;

Screen the text part-of-speech information by using a preset part-of-speech screening condition to obtain the to-be-used domain word information.
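The part-of-speech screening step can be sketched as below. The sketch assumes the tokenizer has already produced (word, tag) pairs; with jieba, for example, such pairs could come from `jieba.posseg.cut` after the to-be-used dictionary has been loaded with `jieba.load_userdict`. The function name and the allowed tag set are assumptions, since the description only refers to a "preset part-of-speech screening condition".

```python
def select_domain_words(tagged_tokens, allowed_pos=("n", "v", "a")):
    """Keep words whose part-of-speech tag is allowed, preserving first-seen order.

    `allowed_pos` is a hypothetical screening condition (noun, verb, adjective).
    """
    seen = set()
    result = []
    for word, pos in tagged_tokens:
        if pos in allowed_pos and word not in seen:
            seen.add(word)
            result.append(word)
    return result


if __name__ == "__main__":
    # Hand-written tags standing in for real tokenizer output.
    tagged = [("保湿", "v"), ("的", "uj"), ("面霜", "n"), ("保湿", "v")]
    print(select_domain_words(tagged))
```

Deduplicating while preserving order keeps the output stable, which makes later vector lookup and clustering reproducible.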

It can be seen that implementing the data processing method for constructing a domain dictionary described in this embodiment of the present invention can obtain the to-be-used dictionary through segmentation and filtering of the material information to be processed, and then obtain the domain keyword information used to construct the domain dictionary through the combined processing of segmentation and screening and the first word segmentation tool. This helps reduce the requirements on song materials and improve the efficiency of processing song materials, which in turn improves the generation efficiency of advertising lyrics and reduces production costs.

In an optional embodiment, performing conversion and screening processing on the word vector information and the to-be-used domain word information in step 205 to obtain the domain keyword information includes:

Converting the to-be-used domain word information by using the word vector information to obtain domain word vector information;

Performing clustering calculation processing on the domain word vector information to obtain word distance information;

Performing grouping and screening processing on the word distance information to obtain the domain keyword information.

In this optional embodiment, as an optional implementation, clustering calculation processing is performed on the domain word vector information to obtain the word distance information as follows:

Cluster the domain word vector information by using a preset clustering model, and determine cluster center information according to a preset category number parameter; the cluster center information includes several cluster centers; the number of cluster centers is related to the category number parameter;

Calculate the Euclidean distance between the domain word vector information and the cluster center information to obtain the word distance information; the word distance information includes several word distances.
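The distance calculation can be sketched as follows, assuming the cluster centers have already been produced by some clustering model (the description does not name one; k-means would be one option). The sketch also assumes each word is measured against its nearest center, which the description does not state explicitly. The function name and the toy two-dimensional vectors are illustrative only.

```python
import math


def word_distances(word_vectors, centers):
    """Map each word to (index of nearest center, Euclidean distance to it)."""
    distances = {}
    for word, vec in word_vectors.items():
        # math.dist (Python 3.8+) returns the Euclidean distance between two points.
        cluster, dist = min(
            ((i, math.dist(vec, center)) for i, center in enumerate(centers)),
            key=lambda pair: pair[1],
        )
        distances[word] = (cluster, dist)
    return distances


if __name__ == "__main__":
    vectors = {"a": (0.0, 0.0), "b": (3.0, 4.0), "c": (10.0, 10.0)}
    centers = [(0.0, 0.0), (10.0, 10.0)]
    print(word_distances(vectors, centers))
```

Real word vectors typically have hundreds of dimensions; the same function applies unchanged because `math.dist` accepts points of any equal dimension.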

In this optional embodiment, as another optional implementation, grouping and screening processing is performed on the word distance information to obtain the domain keyword information as follows:

Group the word distance information by category to obtain category distance information;

Sort the category distance information in ascending order of word distance to obtain distance sequence information;

Screen the distance sequence information by using preset vocabulary threshold information to obtain the domain keyword information.
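The grouping, sorting, and screening steps can be sketched as below. The description does not say what the "vocabulary threshold information" is; this sketch assumes it is a maximum number of words kept per category, so the words closest to each cluster center survive. The function and parameter names are hypothetical.

```python
from collections import defaultdict


def select_keywords(distances, max_per_cluster):
    """Group words by cluster, sort by ascending distance, keep the closest ones.

    `distances` maps word -> (cluster index, distance to that cluster's center).
    """
    groups = defaultdict(list)
    for word, (cluster, dist) in distances.items():
        groups[cluster].append((dist, word))
    keywords = {}
    for cluster, members in groups.items():
        members.sort()  # ascending by distance, ties broken by word
        keywords[cluster] = [word for _, word in members[:max_per_cluster]]
    return keywords


if __name__ == "__main__":
    sample = {"a": (0, 0.0), "b": (0, 5.0), "d": (0, 2.0), "c": (1, 0.0)}
    print(select_keywords(sample, max_per_cluster=2))
```

If the threshold were instead a maximum distance, the slice would be replaced by a comparison on `dist`; the grouping and sorting would stay the same.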

It can be seen that implementing the data processing method for constructing a domain dictionary described in this embodiment of the present invention can obtain the domain keyword information through comprehensive processing of the to-be-used domain word information, such as conversion, clustering calculation, and grouping and screening. This further helps reduce the requirements on song materials and improve the efficiency of processing song materials, which in turn improves the generation efficiency of advertising lyrics and reduces production costs.

In another optional embodiment, after the segmentation and screening processing is performed on the to-be-used dictionary to obtain the domain keyword information, the method further includes:

Processing the rhyming word table information to obtain a same-rhyme list;

Processing the same-rhyme list, the domain keyword information, and the song information to be processed by using a preset second word segmentation tool to obtain training text information; the training text information is used to train a format restriction model; the format restriction model is used to constrain the format of advertising lyrics.

Optionally, the rhyming word table information is processed by using the xpinyin library.
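Building a same-rhyme list can be sketched as follows. To stay self-contained, the sketch takes a hand-written character-to-pinyin mapping as input instead of calling xpinyin, and it approximates the rhyme by stripping the initial consonant from the toneless syllable. Both the mapping and this stripping rule are assumptions; real rhyme classes in Chinese are coarser than pinyin finals.

```python
from collections import defaultdict

# Pinyin initials; two-letter initials come first so "zh" matches before "z".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def final_of(syllable):
    """Return the syllable without its initial consonant (a rough rhyme key)."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return syllable[len(initial):]
    return syllable


def build_rhyme_list(char_to_pinyin):
    """Group characters that share the same final."""
    groups = defaultdict(list)
    for char, syllable in char_to_pinyin.items():
        groups[final_of(syllable)].append(char)
    return dict(groups)


if __name__ == "__main__":
    # Illustrative toneless pinyin, written by hand for this example.
    table = {"天": "tian", "年": "nian", "边": "bian", "光": "guang", "霜": "shuang"}
    print(build_rhyme_list(table))
```

In practice the mapping would be generated by a pinyin library from the rhyming word table, and groups such as "ang", "iang", and "uang" might be merged into one rhyme class.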

Optionally, the training text information includes lyric text information segmented by special characters, and/or part-of-speech information, and/or rhyme mark information, which is not limited in this embodiment of the present invention.

Optionally, the special characters include spaces and/or punctuation marks, which is not limited in this embodiment of the present invention.

Optionally, the processing of the same-rhyme list, the domain keyword information, and the song information to be processed with the preset second word segmentation tool is performed by using the second word segmentation tool together with a self-built dictionary.

Optionally, the self-built dictionary includes industry category words, and/or brand words, and/or ingredient words, and/or efficacy words, and/or marketing words, which is not limited in this embodiment of the present invention.

Optionally, the second word segmentation tool includes a jieba-based tokenizer, and/or a LAC-based tokenizer, and/or an LTP-based tokenizer, and/or a THULAC-based tokenizer, and/or a HanLP-based tokenizer, and/or a pkuseg-based tokenizer, which is not limited in this embodiment of the present invention.

It can be seen that implementing the data processing method for constructing a domain dictionary described in this embodiment of the present invention can obtain the training text information used to train the format restriction model through comprehensive processing of the rhyming word table information, the domain keyword information, and the song information to be processed. This further helps reduce the requirements on song materials and improve the efficiency of processing song materials, which in turn improves the generation efficiency of advertising lyrics and reduces production costs.

Embodiment 3

Please refer to FIG. 3, which is a schematic structural diagram of a data processing apparatus for constructing a domain dictionary disclosed in an embodiment of the present invention. The apparatus described in FIG. 3 can be applied in a data processing system, such as a local server or a cloud server that manages data processing for constructing a domain dictionary, which is not limited in this embodiment of the present invention. As shown in FIG. 3, the apparatus may include:

An acquisition module 301, configured to acquire material information to be processed; the material information to be processed includes marketing text information, and/or song information to be processed, and/or rhyming word table information;

A first processing module 302, configured to perform segmentation and filtering processing on the material information to be processed to obtain a to-be-used dictionary; the to-be-used dictionary is used to screen words in the material text;

A second processing module 303, configured to perform segmentation and screening processing on the to-be-used dictionary to obtain domain keyword information; the keyword information is strongly correlated with the advertising domain to which the material information to be processed belongs; the domain keywords are used to construct a domain dictionary.

It can be seen that implementing the data processing apparatus for constructing a domain dictionary described in FIG. 3 can obtain the domain keyword information used to construct the domain dictionary through comprehensive processing of the material information to be processed, such as segmentation and filtering and segmentation and screening. This helps reduce the requirements on song materials and improve the efficiency of processing song materials, which in turn improves the generation efficiency of advertising lyrics and reduces production costs.

In another optional embodiment, as shown in FIG. 4, the first processing module 302 includes a first processing submodule 3021 and a second processing submodule 3022, wherein:

The first processing submodule 3021 is configured to perform recognition processing on the song information to be processed to obtain initial lyric text information;

The second processing submodule 3022 is configured to perform segmentation and filtering processing on the initial lyric text information to obtain the to-be-used dictionary.

It can be seen that implementing the data processing apparatus for constructing a domain dictionary described in FIG. 4 can obtain the to-be-used dictionary through recognition processing and segmentation and filtering processing of the song information to be processed. This helps reduce the requirements on song materials and improve the efficiency of processing song materials, which in turn improves the generation efficiency of advertising lyrics and reduces production costs.

In yet another optional embodiment, as shown in FIG. 4, the second processing submodule 3022 performing segmentation and filtering processing on the initial lyric text information to obtain the to-be-used dictionary includes:

It can be seen that implementing the data processing apparatus for constructing a domain dictionary described in FIG. 4 can obtain the to-be-used dictionary through comprehensive processing of the initial lyric text information, such as character segmentation, line segmentation, calculation, and screening and sorting. This helps reduce the requirements on song materials and improve the efficiency of processing song materials, which in turn improves the generation efficiency of advertising lyrics and reduces production costs.

In yet another optional embodiment, as shown in FIG. 4, the second processing submodule 3022 performing calculation, screening and sorting processing on the third intermediate lyric text information to obtain the to-be-used dictionary includes:

It can be seen that implementing the data processing apparatus for constructing a domain dictionary described in FIG. 4 can obtain the to-be-used dictionary through comprehensive processing of the third intermediate lyric text information, such as information entropy calculation, screening, and sorting. This further helps reduce the requirements on song materials and improve the efficiency of processing song materials, which in turn improves the generation efficiency of advertising lyrics and reduces production costs.

In yet another optional embodiment, as shown in FIG. 4, the second processing module 303 performs segmentation and screening processing on the to-be-used dictionary to obtain the domain keyword information as follows:

Process the to-be-used dictionary and the marketing text information by using a preset word vector model to obtain word vector information;

Process the to-be-used dictionary and the marketing text information by using a preset first word segmentation tool to obtain to-be-used domain word information;

Perform conversion and screening processing on the word vector information and the to-be-used domain word information to obtain the domain keyword information.

It can be seen that implementing the data processing apparatus for constructing a domain dictionary described in FIG. 4 can obtain the to-be-used dictionary through segmentation and filtering of the material information to be processed, and then obtain the domain keyword information used to construct the domain dictionary through the combined processing of segmentation and screening and the first word segmentation tool. This helps reduce the requirements on song materials and improve the efficiency of processing song materials, which in turn improves the generation efficiency of advertising lyrics and reduces production costs.

In yet another optional embodiment, as shown in FIG. 4, the second processing module 303 performs conversion and screening processing on the word vector information and the to-be-used domain word information to obtain the domain keyword information as follows:

It can be seen that implementing the data processing apparatus for constructing a domain dictionary described in FIG. 4 can obtain the domain keyword information through comprehensive processing of the to-be-used domain word information, such as conversion, clustering calculation, and grouping and screening. This further helps reduce the requirements on song materials and improve the efficiency of processing song materials, which in turn improves the generation efficiency of advertising lyrics and reduces production costs.

In yet another optional embodiment, as shown in FIG. 4, after the second processing module 303 performs segmentation and screening processing on the to-be-used dictionary to obtain the domain keyword information, the apparatus further includes:

A third processing module 304, configured to process the rhyming word table information to obtain a same-rhyme list;

A fourth processing module 305, configured to process the same-rhyme list, the domain keyword information, and the song information to be processed by using a preset second word segmentation tool to obtain training text information; the training text information is used to train a format restriction model; the format restriction model is used to constrain the format of advertising lyrics.

It can be seen that implementing the data processing apparatus for constructing a domain dictionary described in FIG. 4 can obtain the training text information used to train the format restriction model through comprehensive processing of the rhyming word table information, the domain keyword information, and the song information to be processed. This further helps reduce the requirements on song materials and improve the efficiency of processing song materials, which in turn improves the generation efficiency of advertising lyrics and reduces production costs.

Embodiment 4

Please refer to FIG. 5, which is a schematic structural diagram of yet another data processing apparatus for constructing a domain dictionary disclosed in an embodiment of the present invention. The apparatus described in FIG. 5 can be applied in a data processing system, such as a local server or a cloud server that manages data processing for constructing a domain dictionary, which is not limited in this embodiment of the present invention. As shown in FIG. 5, the apparatus may include:

A memory 401 storing executable program code;

A processor 402 coupled to the memory 401;

The processor 402 invokes the executable program code stored in the memory 401 to execute the steps of the data processing method for constructing a domain dictionary described in Embodiment 1 or Embodiment 2.

Embodiment 5

An embodiment of the present invention discloses a computer-readable storage medium that stores a computer program for electronic data exchange, wherein the computer program causes a computer to execute the steps of the data processing method for constructing a domain dictionary described in Embodiment 1 or Embodiment 2.

Embodiment 6

An embodiment of the present invention discloses a computer program product. The computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute the steps of the data processing method for constructing a domain dictionary described in Embodiment 1 or Embodiment 2.

The apparatus embodiments described above are only illustrative. The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed over multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the solution without creative effort.

From the specific description of the above embodiments, those skilled in the art can clearly understand that each implementation can be realized by means of software plus a necessary general-purpose hardware platform, and of course can also be realized by hardware. Based on this understanding, the above technical solutions, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, and the storage medium includes read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), programmable read-only memory (Programmable Read-Only Memory, PROM), erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM), one-time programmable read-only memory (One-Time Programmable Read-Only Memory, OTPROM), electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), compact disc read-only memory (Compact Disc Read-Only Memory, CD-ROM) or other optical disc storage, magnetic disk storage, magnetic tape storage, or any other computer-readable medium that can be used to carry or store data.

Finally, it should be noted that the data processing method and apparatus for constructing a domain dictionary disclosed in the embodiments of the present invention are merely preferred embodiments of the present invention, used only to illustrate its technical solutions and not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A data processing method for constructing a domain dictionary, the method comprising:

acquiring material information to be processed; the material information to be processed comprises marketing text information, and/or song information to be processed, and/or rhyming word table information;

performing segmentation and filtering processing on the material information to be processed to obtain a to-be-used dictionary; the to-be-used dictionary is used for screening words in the material text;

performing segmentation and screening processing on the to-be-used dictionary to obtain domain keyword information; the keyword information is strongly related to the advertising domain to which the material information to be processed belongs; the domain keywords are used to construct a domain dictionary.

2. The data processing method for constructing a domain dictionary according to claim 1, wherein performing segmentation and filtering on the to-be-processed material information to obtain the to-be-used dictionary comprises:

recognizing the to-be-processed song information to obtain initial lyric text information;

and performing segmentation and filtering on the initial lyric text information to obtain the to-be-used dictionary.

3. The data processing method for constructing a domain dictionary according to claim 2, wherein performing segmentation and filtering on the initial lyric text information to obtain the to-be-used dictionary comprises:

performing character segmentation and line separation on the initial lyric text information to obtain first intermediate lyric text information;

performing calculation processing on the first intermediate lyric text information to obtain second intermediate lyric text information;

screening the second intermediate lyric text information by using a preset stop word list to obtain third intermediate lyric text information, wherein the third intermediate lyric text information comprises a plurality of lyric text fragments;

and performing calculation, screening and sorting on the third intermediate lyric text information to obtain the to-be-used dictionary.
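For illustration only, and not as part of the claim, the first three steps of claim 3 can be sketched in Python. The claim does not say what the "calculation processing" step computes; this sketch assumes it counts character n-grams within each lyric line. The function names, the `max_len` parameter, and the rule that a fragment is dropped when it contains any stop word are all hypothetical choices.

```python
from collections import Counter


def split_lines(lyrics):
    """Line separation: return the non-empty lyric lines, stripped of whitespace."""
    return [line.strip() for line in lyrics.splitlines() if line.strip()]


def candidate_fragments(lines, max_len=4):
    """Character segmentation plus counting (assumed meaning of "calculation").

    Counts every run of 2..max_len consecutive characters inside a line.
    Fragments never span two lines.
    """
    counts = Counter()
    for line in lines:
        chars = [c for c in line if not c.isspace()]
        for n in range(2, max_len + 1):
            for i in range(len(chars) - n + 1):
                counts["".join(chars[i:i + n])] += 1
    return counts


def drop_stop_words(counts, stop_words):
    """Stop-word screening: discard fragments that contain any stop word."""
    return Counter({
        fragment: count
        for fragment, count in counts.items()
        if not any(stop in fragment for stop in stop_words)
    })
```

With the two lines "我爱你" and "我爱她" and `max_len=2`, the fragment "我爱" is counted twice, and passing `{"她"}` as the stop word list removes "爱她" while keeping the other fragments.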

4. The data processing method for constructing a domain dictionary according to claim 3, wherein performing calculation, screening and sorting on the third intermediate lyric text information to obtain the to-be-used dictionary comprises:

performing information entropy calculation on the third intermediate lyric text information to obtain first text entropy information;

screening the first text entropy information by using a preset entropy threshold to obtain second text entropy information;

sorting the second text entropy information to obtain text sequence information, wherein the sorting is performed according to frequency information corresponding to the lyric text fragments;

and processing the text sequence information to obtain the to-be-used dictionary.
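As a hedged illustration of the entropy, threshold and sorting steps of claim 4, the sketch below assumes that "information entropy" means the entropy of the characters adjacent to a fragment (the smaller of the left and right neighbor entropies), a common measure in new-word discovery. The claim itself does not fix this definition. The function names, the boundary markers `^` and `$`, and the descending sort order are assumptions.

```python
import math
from collections import Counter, defaultdict


def _entropy(counter):
    """Shannon entropy in bits of a frequency counter; 0.0 when empty."""
    total = sum(counter.values())
    if not total:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counter.values())


def neighbor_entropy(lines, fragments):
    """Return min(left entropy, right entropy) for each fragment.

    Line starts and ends are treated as the single pseudo-characters
    "^" and "$" (an assumption of this sketch).
    """
    left = defaultdict(Counter)
    right = defaultdict(Counter)
    for line in lines:
        for fragment in fragments:
            start = line.find(fragment)
            while start != -1:
                end = start + len(fragment)
                left[fragment][line[start - 1] if start > 0 else "^"] += 1
                right[fragment][line[end] if end < len(line) else "$"] += 1
                start = line.find(fragment, start + 1)
    return {f: min(_entropy(left[f]), _entropy(right[f])) for f in fragments}


def build_dictionary(lines, frequencies, entropy_threshold):
    """Keep fragments at or above the threshold, sorted by descending frequency."""
    entropies = neighbor_entropy(lines, list(frequencies))
    kept = [f for f in frequencies if entropies[f] >= entropy_threshold]
    return sorted(kept, key=lambda f: (-frequencies[f], f))
```

A fragment that appears next to many different characters scores a high entropy and is more likely to be a free-standing word, while a fragment that always has the same neighbor scores zero and is removed by any positive threshold.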

5. The data processing method for constructing a domain dictionary according to claim 1, wherein performing segmentation and screening on the to-be-used dictionary to obtain the domain keyword information comprises:

processing the to-be-used dictionary and the marketing text information by using a preset word vector model to obtain word vector information;

processing the to-be-used dictionary and the marketing text information by using a preset first word segmentation tool to obtain to-be-used domain word information;

and performing conversion and screening on the word vector information and the to-be-used domain word information to obtain the domain keyword information.

6. The data processing method for constructing a domain dictionary according to claim 5, wherein performing conversion and screening on the word vector information and the to-be-used domain word information to obtain the domain keyword information comprises:

converting the to-be-used domain word information by using the word vector information to obtain domain word vector information;

performing clustering calculation on the domain word vector information to obtain word distance information;

and performing grouping and screening on the word distance information to obtain the domain keyword information.
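The conversion, clustering and grouping steps of claim 6 can be sketched as follows, purely as an illustration under stated assumptions. The claim names neither the clustering algorithm nor the distance measure; this sketch swaps in a plain k-means with Euclidean distance, seeds the centers with the first `k` vectors, and keeps a word only when it lies within `max_distance` of its cluster center. The names `kmeans`, `domain_keywords`, `k` and `max_distance` are hypothetical, and `word_vectors` stands in for the output of the preset word vector model.

```python
import math


def kmeans(vectors, k, iterations=10):
    """Plain k-means; the first k vectors seed the centers (an assumption)."""
    centers = [list(v) for v in vectors[:k]]

    def nearest(v):
        return min(range(len(centers)), key=lambda j: math.dist(v, centers[j]))

    for _ in range(iterations):
        assignment = [nearest(v) for v in vectors]
        for j in range(len(centers)):
            members = [v for v, a in zip(vectors, assignment) if a == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment against the final centers.
    return centers, [nearest(v) for v in vectors]


def domain_keywords(words, word_vectors, k, max_distance):
    """Convert words to vectors, cluster them, and keep words near a center.

    Returns a dict mapping cluster index to the list of retained words.
    Words without a vector are skipped.
    """
    known = [w for w in words if w in word_vectors]
    if not known:
        return {}
    vectors = [word_vectors[w] for w in known]
    centers, assignment = kmeans(vectors, min(k, len(vectors)))
    groups = {}
    for word, vector, cluster in zip(known, vectors, assignment):
        # "Word distance information": distance from the word to its center.
        if math.dist(vector, centers[cluster]) <= max_distance:
            groups.setdefault(cluster, []).append(word)
    return groups
```

Screening by distance to the cluster center is only one way to read "grouping and screening"; an implementation could equally rank clusters by their closeness to known marketing terms and keep whole clusters.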

7. The data processing method for constructing a domain dictionary according to claim 1, wherein after performing segmentation and screening on the to-be-used dictionary to obtain the domain keyword information, the method further comprises:

processing the rhyming word table information to obtain a rhyme list;

processing the rhyme list, the domain keyword information and the to-be-processed song information by using a preset second word segmentation tool to obtain training text information, wherein the training text information is used for training a format restriction model, and the format restriction model is used for restricting the format of advertising lyrics.

8. A data processing apparatus for constructing a domain dictionary, the apparatus comprising:

an acquisition module, configured to acquire to-be-processed material information, wherein the to-be-processed material information comprises marketing text information, and/or to-be-processed song information, and/or rhyming word table information;

a first processing module, configured to perform segmentation and filtering on the to-be-processed material information to obtain a to-be-used dictionary, wherein the to-be-used dictionary is used for screening words in material text;

a second processing module, configured to perform segmentation and screening on the to-be-used dictionary to obtain domain keyword information, wherein the domain keyword information is strongly correlated with the advertising domain to which the to-be-processed material information belongs, and the domain keyword information is used to construct a domain dictionary.

9. A data processing apparatus for constructing a domain dictionary, the apparatus comprising:

a memory storing executable program code;

a processor coupled with the memory;

wherein the processor calls the executable program code stored in the memory to execute the data processing method for constructing a domain dictionary according to any one of claims 1 to 7.

10. A computer storage medium storing computer instructions which, when invoked, are used to execute the data processing method for constructing a domain dictionary according to any one of claims 1 to 7.