CN118760750A - A method and device for optimizing retrieval of colloquial text content - Google Patents
- Publication number
- CN118760750A (application number CN202410809820.9A)
- Authority
- CN
- China
- Prior art keywords
- target entity
- retrieval
- attribute field
- attribute
- preset
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
- G06F40/186—Templates
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Mathematical Physics (AREA)
- Human Computer Interaction (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
The present invention provides a method and device for optimizing the retrieval of colloquial text content. The method comprises: obtaining a target entity contained in input colloquial text information; performing an associated retrieval analysis on the target entity based on a preset distributed retrieval database to obtain a label name and its attribute field corresponding to the target entity, wherein the distributed retrieval database pre-stores the correspondence between each entity and each type of label name and its attribute field; performing template construction processing based on the label name and its attribute field to obtain a prompt word template corresponding to the target entity; and processing the prompt word template based on a preset language interaction model to obtain a question-answer interaction result returned by the language interaction model based on the label name and its attribute field. The method can effectively improve the retrieval accuracy and retrieval efficiency for colloquial text content.
Description
Technical Field
The present invention relates to the field of artificial intelligence technology, and in particular to a method and device for optimizing the retrieval of colloquial text content. It also relates to an electronic device and a processor-readable storage medium.
Background Art
In recent years, with the rapid development of artificial intelligence technology, a growing number of large models driven by artificial intelligence have appeared, such as open platforms like ChatGPT (Chat Generative Pre-trained Transformer), which can usually interact and communicate according to context. Users, however, often express themselves very colloquially. For example, a user may submit colloquial text information without stating what a certain entity mentioned in it actually is: it may be a piece of equipment, a tooling fixture, or a raw material. This ambiguity confuses the large model, which cannot determine what the entity specifically is, and leads to errors in the final retrieval result. Therefore, how to design a retrieval optimization scheme for colloquial text content has become a technical problem that urgently needs to be solved.
Summary of the Invention
To this end, the present invention provides a method and device for optimizing the retrieval of colloquial text content, so as to overcome the defect that retrieval schemes for colloquial text content in the prior art are highly limited, which makes question-answer retrieval results poorly targeted and inefficient.
In a first aspect, the present invention provides a method for optimizing the retrieval of colloquial text content, comprising:
obtaining a target entity contained in input colloquial text information;
performing an associated retrieval analysis on the target entity based on a preset distributed retrieval database to obtain a label name and its attribute field corresponding to the target entity, wherein the distributed retrieval database pre-stores the correspondence between each entity and each type of label name and its attribute field;
performing template construction processing based on the label name and its attribute field to obtain a prompt word template corresponding to each target entity, wherein the prompt word template contains the label name and its attribute field corresponding to the target entity;
processing the prompt word template based on a preset language interaction model to obtain a question-answer interaction result returned by the language interaction model based on the label name and its attribute field.
Furthermore, performing the associated retrieval analysis on the target entity based on the preset distributed retrieval database to obtain the label name and its attribute field corresponding to the target entity comprises:
using the target entity as an index, performing an associated retrieval analysis in the distributed retrieval database to find the label name and its attribute field whose similarity to the index meets a preset similarity threshold, wherein the table structure corresponding to the label name contains at least one attribute field corresponding to the target entity.
Furthermore, before performing the associated retrieval analysis on the target entity based on the preset distributed retrieval database to obtain the label name and its attribute field corresponding to the target entity, the method further comprises: using each type of entity contained in a preset plurality of colloquial text information as the key of a key-value pair, using the label names and attribute fields corresponding to each type of entity as the value of the key-value pair, and constructing target key-value pairs that represent the correspondence between each entity and each type of label name and its attribute field, which are stored in the distributed retrieval database.
Furthermore, performing the template construction processing based on the label name and its attribute field to obtain the prompt word template corresponding to each target entity specifically comprises:
performing embedding processing according to a preset initial template based on the label name and its attribute field to obtain the prompt word template corresponding to each target entity, wherein the prompt word template is used to identify the label name and its attribute field corresponding to each target entity.
Furthermore, processing the prompt word template based on the preset language interaction model to obtain the question-answer interaction result returned by the language interaction model based on the label name and its attribute field specifically comprises:
inputting the prompt word template into the preset language interaction model for analysis, obtaining a graph database query statement returned by the language interaction model based on the label name and its attribute field, searching a preset graph database based on the query statement to obtain the corresponding question-answer interaction result, and outputting the question-answer interaction result.
Furthermore, obtaining the target entity contained in the input colloquial text information comprises:
obtaining the input colloquial text information;
performing word segmentation processing on the colloquial text information to obtain at least one target entity contained in the colloquial text information.
Furthermore, performing the word segmentation processing on the colloquial text information to obtain at least one target entity contained in the colloquial text information specifically comprises:
segmenting the colloquial text information into words to obtain at least one text data block contained in the colloquial text information, identifying text with ambiguous attribute features from the text data blocks, and using the text with ambiguous attribute features as the target entity.
In a second aspect, the present invention further provides a device for optimizing the retrieval of colloquial text content, comprising:
a target entity obtaining unit, configured to obtain a target entity contained in input colloquial text information;
a label and attribute determination unit, configured to perform an associated retrieval analysis on the target entity based on a preset distributed retrieval database to obtain a label name and its attribute field corresponding to the target entity, wherein the distributed retrieval database pre-stores the correspondence between each entity and each type of label name and its attribute field;
a template construction unit, configured to perform template construction processing based on the label name and its attribute field to obtain a prompt word template corresponding to each target entity, wherein the prompt word template contains the label name and its attribute field corresponding to the target entity;
a query processing unit, configured to process the prompt word template based on a preset language interaction model to obtain a question-answer interaction result returned by the language interaction model based on the label name and its attribute field.
Furthermore, the label and attribute determination unit is specifically configured to:
use the target entity as an index and perform an associated retrieval analysis in the distributed retrieval database to find the label name and its attribute field whose similarity to the index meets a preset similarity threshold, wherein the table structure corresponding to the label name contains at least one attribute field corresponding to the target entity.
Furthermore, the device further comprises a data storage unit, configured to, before the associated retrieval analysis is performed on the target entity based on the preset distributed retrieval database to obtain the label name and its attribute field corresponding to the target entity, use each type of entity contained in a preset plurality of colloquial text information as the key of a key-value pair, use the label names and attribute fields corresponding to each type of entity as the value of the key-value pair, and construct target key-value pairs that represent the correspondence between each entity and each type of label name and its attribute field, which are stored in the distributed retrieval database.
Furthermore, the template construction unit is specifically configured to:
perform embedding processing according to a preset initial template based on the label name and its attribute field to obtain the prompt word template corresponding to each target entity, wherein the prompt word template is used to identify the label name and its attribute field corresponding to each target entity.
Furthermore, the query processing unit is specifically configured to:
input the prompt word template into the preset language interaction model for analysis, obtain a graph database query statement returned by the language interaction model based on the label name and its attribute field, search a preset graph database based on the query statement to obtain the corresponding question-answer interaction result, and output the question-answer interaction result.
Furthermore, the target entity obtaining unit is specifically configured to:
obtain the input colloquial text information;
perform word segmentation processing on the colloquial text information to obtain at least one target entity contained in the colloquial text information.
Furthermore, performing the word segmentation processing on the colloquial text information to obtain at least one target entity contained in the colloquial text information specifically comprises:
segmenting the colloquial text information into words to obtain at least one text data block contained in the colloquial text information, identifying text with ambiguous attribute features from the text data blocks, and using the text with ambiguous attribute features as the target entity.
In a third aspect, the present invention further provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method for optimizing the retrieval of colloquial text content according to any one of the above.
In a fourth aspect, the present invention further provides a processor-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method for optimizing the retrieval of colloquial text content according to any one of the above.
The method for optimizing the retrieval of colloquial text content provided by the present invention obtains the target entity contained in the input colloquial text information and performs an associated retrieval analysis on the target entity based on a preset distributed retrieval database to obtain the label name and its attribute field corresponding to the target entity, wherein the distributed retrieval database pre-stores the correspondence between each entity and each type of label name and its attribute field. The method then performs template construction processing based on the label name and its attribute field to obtain the prompt word template corresponding to each target entity, and processes the prompt word template based on a preset language interaction model to obtain the question-answer interaction result returned by the language interaction model based on the label name and its attribute field. This can effectively improve the retrieval accuracy and retrieval efficiency for colloquial text content, thereby improving the user experience.
Brief Description of the Drawings
In order to explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of the method for optimizing the retrieval of colloquial text content provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of the specific flow of the method for optimizing the retrieval of colloquial text content provided by an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of the device for optimizing the retrieval of colloquial text content provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of the physical structure of the electronic device provided by an embodiment of the present invention.
Detailed Description
In order to make the purposes, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second", etc. in the specification of the present application and the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments of the present application described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device comprising a series of steps or units is not necessarily limited to the steps or units clearly listed, but may include other steps or units that are not clearly listed or that are inherent to the process, method, product or device.
The present invention proposes a retrieval optimization method for colloquial text content, which aims to solve the problem that colloquial expressions may cause information loss in natural language processing. The method first detects specific entities so that colloquial expressions are identified precisely. It then passes the detected label name (i.e., label) and its attribute fields to the large model (i.e., the language interaction model) through an initial template (i.e., a prompt template), telling the language interaction model which label and attribute the entity belongs to. Finally, the language interaction model generates a correct graph database query statement (i.e., a cypher statement), so that a cypher statement can still be generated correctly from a colloquial expression. In other words, text2cypher technology is applied when answering natural language queries: the user's question, together with the label and attribute fields, is given to the language interaction model through the prompt template, and the language interaction model produces the corresponding cypher statement according to the label and attribute fields.
An embodiment of the method for optimizing the retrieval of colloquial text content according to the present invention is described in detail below. FIG. 1 is a schematic flow chart of the method provided by an embodiment of the present invention, and the specific process includes the following steps:
Step 101: Obtain a target entity contained in the input colloquial text information.
Specifically, the input colloquial text information is obtained first, and word segmentation processing is then performed on it to obtain at least one target entity contained in the colloquial text information. The corresponding implementation may include: segmenting the colloquial text information into words to obtain at least one text data block contained in it, identifying text with ambiguous attribute features from the text data blocks, and using the text with ambiguous attribute features as the target entity. Here, the colloquial text information is the question information queried by the user. As shown in FIG. 2, in an embodiment of the present invention, the Query question information submitted by the user is segmented into n words by jieba word segmentation or n-gram word segmentation. For example, in this embodiment, the colloquial text information "How many horizontal stabilizer hoists are stored in Division A" (original query: "A事业部存放了多少个水平的安定面吊具") is segmented into the words "Division A, stored, how many, horizontal stabilizer hoist". These words are the text data blocks contained in the colloquial text information. Text with ambiguous attribute features (such as "horizontal stabilizer hoist" and "Division A") is identified from the text data blocks and used as the target entity. Because "horizontal stabilizer hoist" and "Division A" are not nouns with a self-evident meaning, their entity attributes are unclear, so their corresponding label names and attribute fields need to be determined in advance. For example, if the target entity is a name B, the label name corresponding to the target entity may be the name of the table to which name B belongs (such as a teacher table), and the attribute fields are fields under that table such as gender and age.
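As an illustration of the n-gram variant of the segmentation in step 101, the following minimal Python sketch enumerates the character n-grams of a query as candidate text data blocks. The function name `char_ngrams` and its length bounds are assumptions made for this example and do not come from the embodiment; a dictionary-based segmenter such as jieba could be used instead, and the example query omits the particle "的" so that the entity appears as a contiguous substring.

```python
def char_ngrams(text: str, min_n: int = 2, max_n: int = 7) -> list[str]:
    """Return the distinct character n-grams of `text`, longest first.

    Each n-gram is a candidate text data block that can later be looked up
    in the entity index; the bounds min_n and max_n are illustrative.
    """
    seen: set[str] = set()
    grams: list[str] = []
    for n in range(max_n, min_n - 1, -1):        # longer candidates first
        for start in range(len(text) - n + 1):   # slide a window of width n
            gram = text[start:start + n]
            if gram not in seen:                  # keep first occurrence only
                seen.add(gram)
                grams.append(gram)
    return grams


# Hypothetical query: "How many horizontal stabilizer hoists are stored in Division A"
candidates = char_ngrams("A事业部存放了多少个水平安定面吊具")
```

Most candidates produced this way are not entities; they are filtered out by the lookup in step 102, which keeps only candidates that match a stored entity.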
Step 102: Perform an associated retrieval analysis on the target entity based on a preset distributed retrieval database to obtain the label name and its attribute field corresponding to the target entity, wherein the distributed retrieval database pre-stores the correspondence between each entity and each type of label name and its attribute field.
Specifically, the distributed retrieval database may be a pre-built ES (i.e., Elasticsearch) library. In an embodiment of the present invention, the target entity is used as an index, and an associated retrieval analysis is performed in the distributed retrieval database to find the label name and its attribute field whose similarity to the index meets a preset similarity threshold. The table structure corresponding to the label name contains at least one attribute field corresponding to the target entity. It should be noted that, before this step is executed, the various entities that may appear in colloquial text information, together with their corresponding label names and attribute fields, are obtained in advance from a preset data lake. Each type of entity is used as the key of a key-value pair, the label names and attribute fields corresponding to that entity are used as the value, and the resulting target key-value pairs, which represent the correspondence between each entity and each type of label name and its attribute field, are stored in the distributed retrieval database. For example, after the ES library is built, the labels in neo4j and their attribute fields are copied into the ES library, where one label may correspond to multiple attribute fields. ES is used for storage because its queries are fast, and the ES library is updated daily to keep the data current. In terms of configuration, the ES word segmentation function is disabled, and query matching is set to return only results with a matching degree of 100%. The segmented words [Division A, stored, how many, horizontal stabilizer hoist] are then queried in the ES library. The query shows that [horizontal stabilizer hoist] belongs to the [tooling_name] field under [tool], where [tool] is the label and [tooling_name] is the attribute field. Other words do not belong to any field.
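The key-value storage and lookup described in step 102 can be sketched with an in-memory dictionary standing in for the ES library. This is a simplified illustration, not an Elasticsearch client: `FieldRef`, `build_index` and `lookup` are hypothetical names, and `difflib.SequenceMatcher` is swapped in as the similarity measure for thresholds below 100%, which the embodiment does not specify.

```python
from difflib import SequenceMatcher
from typing import Iterable, NamedTuple


class FieldRef(NamedTuple):
    label: str   # graph label, e.g. "tool"
    field: str   # attribute field under that label, e.g. "tooling_name"


def build_index(rows: Iterable[tuple[str, str, str]]) -> dict[str, list[FieldRef]]:
    """Build entity -> [(label, field), ...] from (entity, label, field) rows."""
    index: dict[str, list[FieldRef]] = {}
    for entity, label, field in rows:
        index.setdefault(entity, []).append(FieldRef(label, field))
    return index


def lookup(index: dict[str, list[FieldRef]], term: str,
           threshold: float = 1.0) -> list[FieldRef]:
    """Return the label/field pairs whose key matches `term`.

    threshold=1.0 mirrors the "100% match only" setting of the embodiment;
    lower values use an assumed string-similarity ratio.
    """
    if threshold >= 1.0:
        return list(index.get(term, []))
    matches: list[FieldRef] = []
    for key, refs in index.items():
        if SequenceMatcher(None, term, key).ratio() >= threshold:
            matches.extend(refs)
    return matches


# Illustrative rows, as if copied from the neo4j schema and data.
index = build_index([
    ("水平安定面吊具", "tool", "tooling_name"),    # horizontal stabilizer hoist
    ("A事业部", "tool", "storage_workshop"),       # Division A
])
```

With the default threshold, `lookup(index, "水平安定面吊具")` returns the `tool` / `tooling_name` pair, while an ordinary word such as "存放了" (stored) returns an empty list.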
Step 103: Perform template construction processing based on the label name and its attribute field to obtain the prompt word template corresponding to each target entity, wherein the prompt word template contains the label name and its attribute field corresponding to the target entity.
Specifically, the label name and its attribute field are embedded into a preset initial template to obtain the prompt word template for each target entity, wherein the prompt word template identifies the label name and attribute field corresponding to that target entity. It should be noted that the original prompt contains only the Neo4j table structure and says nothing about which table structure, that is, which label name and attribute field, an entity in the user's question may belong to. The new prompt used by the present invention (i.e., the initial template) adds a statement of the form: in the user's question information, [A Division] may be [storage_workshop] in [tool], and [Horizontal Stabilizer Hoist] may be [tooling_name] in [tool]. This tells the large model which labels and attribute fields the entities in the user's question may belong to. The detected entities are passed to the large model through the constructed prompt template (i.e., the prompt word template). A prompt word template is a technique used in natural language processing to improve model performance: specific prompt words placed in the input text guide the large model toward task-relevant information. These prompt words can be labels, attribute fields, and the like.
The template can be produced by a filling method. When the prompt word template is generated, the label and attribute fields are combined with the input colloquial text information, for example by direct insertion, by concatenating encoding vectors, or by embedding, to improve the performance of the large model. For example, the prompt word template can be: [A Division] is [storage_workshop] in [tool], and [Horizontal Stabilizer Hoist] is [tooling_name] in [tool].
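The direct-insertion variant of the filling method can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the wording of `INITIAL_TEMPLATE`, the `EntityHint` class, and the `build_prompt` function are hypothetical names introduced here, and only the bracketed hint sentence follows the example given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityHint:
    """One detected entity and the label/attribute it may belong to."""
    entity: str      # surface form found in the user's question
    label: str       # graph label (table structure), e.g. "tool"
    attribute: str   # attribute field under that label


# Hypothetical initial template; the source does not give its exact wording.
INITIAL_TEMPLATE = (
    "Graph schema:\n{schema}\n\n"
    "In the user's question information, {hints}.\n\n"
    "Question: {question}\n"
    "Write a Cypher query that answers the question."
)


def build_prompt(schema: str, question: str, hints: list[EntityHint]) -> str:
    """Fill the initial template by direct insertion of the entity hints."""
    hint_text = ", ".join(
        f"[{h.entity}] may be [{h.attribute}] in [{h.label}]" for h in hints
    )
    return INITIAL_TEMPLATE.format(
        schema=schema, hints=hint_text, question=question
    )
```

Keeping the hints in a small record type makes it straightforward to emit several hints for one question, as in the two-entity example above.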
Step 104: Process the prompt word template with a preset language interaction model to obtain the question-answer interaction result that the language interaction model returns based on the label name and its attribute field.
Specifically, the prompt word template is input into the preset language interaction model for analysis, and the model returns a graph database query statement (i.e., a Cypher statement) based on the label name and its attribute field. The query statement is then executed against a preset graph database (i.e., a Neo4j graph database) to obtain the corresponding question-answer interaction result, which is output. It should be noted that the technical solution of the present invention has two main stages: first, entity detection on the colloquial expression; second, Cypher statement generation by the large model, driven by the prompt template. The entity detection stage captures the specific entities in the user's colloquial text information, so that subsequent processing is well targeted and the key information in the colloquial expression is parsed accurately. Next, the label name and attribute field corresponding to the detected colloquial text information are handed to the large model through the prompt template for Cypher statement generation.
The large model can be a deep-learning-based generative model, such as the GPT series, which performs well on natural language generation tasks. Since the input is already in standard form, the text2cypher model can parse it and generate the corresponding Cypher statement more accurately. In this way, the present invention handles the user's colloquial expressions efficiently without modifying the text2cypher model, saving considerable time and labor. That is, first, the dedicated entity detection step overcomes the information loss that colloquial expressions may cause. Second, the overall processing flow is clear and simple, which improves efficiency. Because the text2cypher model does not need to be modified, the system is easier to maintain and upgrade. Through entity detection and prompt engineering, the attributes of the entities in the user's colloquial text information are conveyed to the large model, providing a feasible solution for structured queries in natural language processing.
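The second stage described above (prompt in, Cypher out, query executed, result returned) can be sketched as a small function. This is an illustrative sketch under stated assumptions: `answer_question`, `generate_cypher`, and `run_query` are hypothetical names, and the model and the graph database are passed in as plain callables so that no particular model API or database driver is implied.

```python
from typing import Callable

Row = dict[str, object]


def answer_question(
    prompt: str,
    generate_cypher: Callable[[str], str],
    run_query: Callable[[str], list[Row]],
) -> tuple[str, list[Row]]:
    """Ask the model for a Cypher statement, then execute it.

    generate_cypher: wraps the language interaction model (assumption).
    run_query: wraps the graph database client (assumption).
    """
    cypher = generate_cypher(prompt).strip()
    if not cypher:
        # Fail early instead of sending an empty statement to the database.
        raise ValueError("the model returned no query statement")
    rows = run_query(cypher)
    return cypher, rows
```

In a deployment, `run_query` would typically wrap a Neo4j session and `generate_cypher` a call to the chosen large model; injecting both keeps the flow testable with stand-ins.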
The retrieval optimization method for colloquial text content described in this embodiment of the present invention obtains the target entity contained in the input colloquial text information; performs an associated retrieval analysis on the target entity based on a preset distributed retrieval database, which pre-stores the correspondence between each entity and the various label names and their attribute fields, to obtain the label name and attribute field corresponding to the target entity; performs template construction based on the label name and its attribute field to obtain a prompt word template for each target entity; and processes the prompt word template with a preset language interaction model to obtain the question-answer interaction result that the model returns based on the label name and its attribute field. This can effectively improve the retrieval accuracy and retrieval efficiency for colloquial text content, thereby improving the user experience.
Corresponding to the retrieval optimization method for colloquial text content provided above, the present invention also provides a retrieval optimization device for colloquial text content. Since the device embodiment is similar to the method embodiment above, it is described only briefly; for related details, refer to the description of the method embodiment. The device embodiment described below is merely illustrative. Refer to Figure 3, which is a schematic structural diagram of a retrieval optimization device for colloquial text content provided in an embodiment of the present invention. The device specifically includes the following parts:
A target entity obtaining unit 301, used to obtain the target entity contained in the input colloquial text information;
A label and attribute determination unit 302, used to perform an associated retrieval analysis on the target entity based on a preset distributed retrieval database to obtain the label name and attribute field corresponding to the target entity, wherein the distributed retrieval database pre-stores the correspondence between each entity and the various label names and their attribute fields;
A template construction unit 303, used to perform template construction based on the label name and its attribute field to obtain a prompt word template for each target entity, wherein the prompt word template contains the label name and attribute field corresponding to that target entity;
A query processing unit, used to process the prompt word template with a preset language interaction model to obtain the question-answer interaction result that the language interaction model returns based on the label name and its attribute field.
Furthermore, the label and attribute determination unit is specifically used to:
Use the target entity as an index and perform an associated retrieval analysis on the distributed retrieval database to find the label names and attribute fields whose similarity to the index meets a preset similarity threshold, wherein the table structure corresponding to the label name contains at least one attribute field corresponding to the target entity.
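The threshold-based lookup can be illustrated with a short sketch. The source does not say how similarity is computed or what the threshold is, so this example substitutes Python's `difflib.SequenceMatcher` ratio and a default threshold of 0.6 purely for illustration; the `lookup` function and the in-memory `index` mapping are hypothetical stand-ins for the distributed retrieval database.

```python
from difflib import SequenceMatcher

LabelAttr = tuple[str, str]  # (label name, attribute field)


def lookup(
    entity: str,
    index: dict[str, list[LabelAttr]],
    threshold: float = 0.6,  # assumed value; the source only says "preset"
) -> list[LabelAttr]:
    """Return (label, attribute) pairs whose key is similar enough to entity.

    Similarity here is difflib's character-based ratio in [0, 1], used as a
    stand-in for whatever measure the retrieval database applies.
    """
    scored = []
    for key, pairs in index.items():
        score = SequenceMatcher(None, entity, key).ratio()
        if score >= threshold:
            scored.extend((score, label, attr) for label, attr in pairs)
    # Best matches first; ties keep their insertion order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(label, attr) for _, label, attr in scored]
```

A colloquial fragment such as "A Div" can then still resolve to the stored key "A Division", while unrelated keys fall below the threshold and are ignored.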
Furthermore, for use before the associated retrieval analysis is performed on the target entity based on the preset distributed retrieval database to obtain the corresponding label name and attribute field, the device also includes a data storage unit. The data storage unit takes the entities contained in a preset collection of colloquial text information as the keys of key-value pairs and the label names and attribute fields corresponding to those entities as the values, thereby constructing target key-value pairs that represent the correspondence between each entity and the various label names and their attribute fields, and stores them in the distributed retrieval database.
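The key-value construction performed by the data storage unit can be sketched as follows. A plain Python dictionary stands in for the distributed retrieval database, and the `build_entity_index` function and the `(entity, label, attribute)` sample format are assumptions made for this illustration only.

```python
from collections import defaultdict

LabelAttr = tuple[str, str]  # (label name, attribute field)


def build_entity_index(
    samples: list[tuple[str, str, str]],
) -> dict[str, list[LabelAttr]]:
    """Build entity -> [(label, attribute), ...] key-value pairs.

    Each sample is (entity, label, attribute). One entity may map to several
    label/attribute pairs; exact duplicates are stored once.
    """
    index: dict[str, list[LabelAttr]] = defaultdict(list)
    for entity, label, attribute in samples:
        pair = (label, attribute)
        if pair not in index[entity]:
            index[entity].append(pair)
    return dict(index)
```

Storing a list of pairs per key lets a single colloquial entity carry more than one candidate label and attribute field, which the prompt can then present as alternatives.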
Furthermore, the template construction unit is specifically used to:
Embed the label name and its attribute field into a preset initial template to obtain a prompt word template for each target entity, wherein the prompt word template identifies the label name and attribute field corresponding to that target entity.
Furthermore, the query processing unit is specifically used to:
Input the prompt word template into the preset language interaction model for analysis, obtain the graph database query statement that the language interaction model returns based on the label name and its attribute field, execute the query statement against a preset graph database to obtain the corresponding question-answer interaction result, and output the question-answer interaction result.
Furthermore, the target entity obtaining unit is specifically used to:
Obtain the input colloquial text information;
Perform word segmentation on the colloquial text information to obtain at least one target entity contained in it.
Furthermore, performing word segmentation on the colloquial text information to obtain at least one target entity contained in it specifically includes:
Segmenting the colloquial text information into words to obtain at least one text data block, identifying text with ambiguous attribute features in the text data block, and using that text as the target entity.
The retrieval optimization device for colloquial text content described in this embodiment of the present invention obtains the target entity contained in the input colloquial text information; performs an associated retrieval analysis on the target entity based on a preset distributed retrieval database, which pre-stores the correspondence between each entity and the various label names and their attribute fields, to obtain the label name and attribute field corresponding to the target entity; performs template construction based on the label name and its attribute field to obtain a prompt word template for each target entity; and processes the prompt word template with a preset language interaction model to obtain the question-answer interaction result that the model returns based on the label name and its attribute field. This can effectively improve the retrieval accuracy and retrieval efficiency for colloquial text content, thereby improving the user experience.
Corresponding to the retrieval optimization method for colloquial text content provided above, the present invention also provides an electronic device. Since the embodiment of the electronic device is similar to the method embodiment above, it is described only briefly; for related details, refer to the description of the method embodiment. The electronic device described below is merely illustrative. Figure 4 is a schematic diagram of the physical structure of an electronic device disclosed in an embodiment of the present invention. The electronic device may include a processor 401, a memory 402, and a communication bus 403, wherein the processor 401 and the memory 402 communicate with each other through the communication bus 403 and communicate with the outside through a communication interface 404.
The processor 401 can call the logic instructions in the memory 402 to execute the retrieval optimization method for colloquial text content. The method includes: obtaining the target entity contained in the input colloquial text information; performing an associated retrieval analysis on the target entity based on a preset distributed retrieval database to obtain the label name and attribute field corresponding to the target entity, wherein the distributed retrieval database pre-stores the correspondence between each entity and the various label names and their attribute fields; performing template construction based on the label name and its attribute field to obtain a prompt word template for each target entity, wherein the prompt word template contains the label name and attribute field corresponding to that target entity; and processing the prompt word template with a preset language interaction model to obtain the question-answer interaction result that the language interaction model returns based on the label name and its attribute field.
In addition, the logic instructions in the memory 402 may be implemented as software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied as a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a memory chip, a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
In another aspect, an embodiment of the present invention further provides a computer program product. The computer program product includes a computer program stored on a processor-readable storage medium, and the computer program includes program instructions; when the program instructions are executed by a computer, the computer can execute the retrieval optimization method for colloquial text content provided by the method embodiments above. The method includes: obtaining the target entity contained in the input colloquial text information; performing an associated retrieval analysis on the target entity based on a preset distributed retrieval database to obtain the label name and attribute field corresponding to the target entity, wherein the distributed retrieval database pre-stores the correspondence between each entity and the various label names and their attribute fields; performing template construction based on the label name and its attribute field to obtain a prompt word template for each target entity, wherein the prompt word template contains the label name and attribute field corresponding to that target entity; and processing the prompt word template with a preset language interaction model to obtain the question-answer interaction result that the language interaction model returns based on the label name and its attribute field.
In yet another aspect, an embodiment of the present invention further provides a processor-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the retrieval optimization method for colloquial text content provided in the embodiments above. The method includes: obtaining the target entity contained in the input colloquial text information; performing an associated retrieval analysis on the target entity based on a preset distributed retrieval database to obtain the label name and attribute field corresponding to the target entity, wherein the distributed retrieval database pre-stores the correspondence between each entity and the various label names and their attribute fields; performing template construction based on the label name and its attribute field to obtain a prompt word template for each target entity, wherein the prompt word template contains the label name and attribute field corresponding to that target entity; and processing the prompt word template with a preset language interaction model to obtain the question-answer interaction result that the language interaction model returns based on the label name and its attribute field.
The processor-readable storage medium can be any available medium or data storage device that the processor can access, including but not limited to magnetic storage (such as floppy disks, hard disks, magnetic tapes, and magneto-optical disks (MO)), optical storage (such as CD, DVD, BD, and HVD), and semiconductor storage (such as ROM, EPROM, EEPROM, non-volatile memory (NAND FLASH), and solid-state drives (SSD)).
The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
From the description of the embodiments above, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, or, of course, by hardware. Based on this understanding, the technical solution above in essence, or the part that contributes to the prior art, can be embodied as a software product. The computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the embodiments above are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions described in the foregoing embodiments or make equivalent replacements for some of their technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410809820.9A CN118760750A (en) | 2024-06-21 | 2024-06-21 | A method and device for optimizing retrieval of colloquial text content |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410809820.9A CN118760750A (en) | 2024-06-21 | 2024-06-21 | A method and device for optimizing retrieval of colloquial text content |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN118760750A true CN118760750A (en) | 2024-10-11 |
Family
ID=92943072
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202410809820.9A Pending CN118760750A (en) | 2024-06-21 | 2024-06-21 | A method and device for optimizing retrieval of colloquial text content |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN118760750A (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119782768A (en) * | 2025-03-07 | 2025-04-08 | 成都赛力斯科技有限公司 | Question prompt word generation method, system and electronic device |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN113807098B (en) | Model training method and device, electronic device and storage medium | |
| CN116991990A (en) | AIGC-based program development auxiliary methods, storage media and equipment | |
| US11238050B2 (en) | Method and apparatus for determining response for user input data, and medium | |
| CN117725895A (en) | Document generation methods, devices, equipment and media | |
| CN113779200A (en) | Target industry word stock generation method, processor and device | |
| CN118796991A (en) | Dialogue prompt text regeneration method, device, electronic device and storage medium | |
| CN112149419A (en) | Method, device and system for normalized automatic naming of fields | |
| CN118964387A (en) | Method, system, device and medium for retrieval enhancement generation based on large language model | |
| JP2025074312A (en) | Large-scale model-based question answering method, device, electronic device, storage medium, agent, and program | |
| CN117055850A (en) | AI design large model construction method, system, equipment and storage medium | |
| CN118503396A (en) | ERP system large model calling method, device and medium based on open prompt words | |
| CN118760750A (en) | A method and device for optimizing retrieval of colloquial text content | |
| CN118193733A (en) | Method, device, electronic equipment and storage medium for generating report | |
| US20230206007A1 (en) | Method for mining conversation content and method for generating conversation content evaluation model | |
| CN113935397A (en) | Intelligent interaction method and device | |
| CN118916453A (en) | Intelligent operation and maintenance method based on self-developed GPT model and related equipment thereof | |
| CN118631939A (en) | Intelligent quality inspection method and device for voice customer service based on multimodal large model | |
| CN118797072A (en) | Question-and-answer based network topology map generation method, device, medium and product | |
| US20230351121A1 (en) | Method and system for generating conversation flows | |
| CN117591654A (en) | Question answering method and device | |
| CN116955406A (en) | SQL sentence generation method and device, electronic equipment and storage medium | |
| CN116049370A (en) | Information query method and training method and device of information generation model | |
| CN110414006B (en) | Text subject annotation method, device, electronic equipment and storage medium | |
| CN116755683B (en) | Data processing method and related device | |
| CN114519357B (en) | Natural language processing method and system based on machine learning |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||