CN111368036B

CN111368036B - Method and device for searching information

Info

Publication number: CN111368036B
Application number: CN202010147266.4A
Authority: CN
Inventors: 郎添娇; 赵旭; 郭宣佑
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2020-03-05
Filing date: 2020-03-05
Publication date: 2023-09-26
Anticipated expiration: 2040-03-05
Also published as: CN111368036A

Abstract

The embodiments of the present application disclose a method and an apparatus for searching information. One embodiment of the method comprises: receiving a search request input by a user; determining whether the search type corresponding to the search request is the novel search type; if it is, inputting the search request into a pre-trained parsing model to obtain a retrieval expression corresponding to the search request, wherein the parsing model is used to identify at least one of a novel title, an author name and a protagonist name; and performing a novel search based on the retrieval expression to obtain novels to be pushed, and pushing them to the user. This embodiment enables novels to be recalled based on at least one of the novel title, the author name and the protagonist name, which broadens the applicable scenarios and improves the recall rate of novels matching the user's reading needs.

Description

Method and device for searching information

Technical Field

The embodiment of the application relates to the technical field of computers, in particular to a method and a device for searching information.

Background

Vertical search is positioned as a first-class industry search system for numerous vertical fields: it meets user needs at lower cost with more accurate, professional results, and precisely connects users with high-quality professional vertical resources. Novel search, a vertical category with a large audience and abundant resources, serves the range of reading needs that novel readers express through search queries.

Currently, the data stored in a novel database includes key information such as novel titles, author names and synopses. To recall the corresponding novel, the user must enter the novel title, or the novel title together with the author name. If the user has forgotten the title and the author and searches with other information about the novel, the corresponding novel cannot be recalled.

Disclosure of Invention

The embodiment of the application provides a method and a device for searching information.

In a first aspect, an embodiment of the present application provides a method for searching information, including: receiving a search request input by a user; determining whether the search type corresponding to the search request is the novel search type; if it is, inputting the search request into a pre-trained parsing model to obtain a retrieval expression corresponding to the search request, wherein the parsing model is used to identify at least one of a novel title, an author name and a protagonist name; and performing a novel search based on the retrieval expression to obtain novels to be pushed, and pushing them to the user.

In some embodiments, determining whether the search type corresponding to the search request is the novel search type includes: inputting the search request into a pre-trained trigger model to obtain the search type corresponding to the search request, wherein the trigger model is used to identify the search type based on at least one of a novel title, an author name and a protagonist name.

In some embodiments, performing the novel search based on the retrieval expression to obtain the novels to be pushed includes: searching a pre-generated novel summary information set based on the retrieval expression to determine the novels to be pushed, wherein the novel summary information includes a novel title, an author name and a protagonist name.

In some embodiments, searching the pre-generated novel summary information set based on the retrieval expression to determine the novels to be pushed includes: calculating the relevance between the retrieval expression and the novel summary information in the novel summary information set to determine a candidate novel set; calculating the relevance between the search request and the candidate novels in the candidate novel set to determine a to-be-selected novel set; and sorting and de-duplicating the to-be-selected novel set based on the popularity of each to-be-selected novel and its relevance to the search request, to determine the novels to be pushed.

In some embodiments, generating the novel summary information includes: acquiring the existing chapters of a novel; performing word segmentation and part-of-speech analysis on the content of the existing chapters with a natural language processing (NLP) shallow lexical analysis model to obtain a word segmentation result and a part-of-speech analysis result; selecting a person name set of the existing chapters from the word segmentation result based on the part-of-speech analysis result; determining a protagonist name set of the existing chapters from that person name set; and generating the novel summary information based on the protagonist name set of the existing chapters.

In some embodiments, performing word segmentation and part-of-speech analysis on the content of the existing chapters with the NLP shallow lexical analysis model includes: first segmenting the content of the existing chapters with the model to obtain a vocabulary set; then recombining the vocabulary set into a vocabulary sequence whose semantics satisfy a preset condition, and determining the part of speech of each word in the sequence, wherein the preset condition includes at least one of the following: the semantics are reasonable, and the semantics are complete.

In some embodiments, determining the protagonist name set of the existing chapters from their person name set includes: merging similar names in the person name set to generate a merged person name set; filtering the merged person name set against a pre-generated stop word list to generate a character name set of the existing chapters; and counting the frequency of each character name in the character name set and selecting the protagonist name set of the existing chapters from it.
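The protagonist-selection steps above (picking person names by part of speech, merging similar names, stop-word filtering and frequency counting) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the tagged tokens are assumed to be `(word, tag)` pairs with person names tagged `PER`, "similar names" is assumed to mean a short name contained in a longer one, and `select_protagonists`, `top_k` and the sample names are hypothetical.

```python
from collections import Counter


def select_protagonists(tagged_tokens, stop_words, top_k=3, person_tag="PER"):
    """Pick the most frequent character names from (word, tag) pairs.

    Assumptions (hypothetical, not from the source): person names carry the
    tag `person_tag`, and a short name that is a substring of a longer name
    refers to the same character.
    """
    # 1. Keep only tokens tagged as person names.
    names = [word for word, tag in tagged_tokens if tag == person_tag]

    # 2. Merge similar names: longest names are processed first, and each
    #    name is mapped to the canonical form of the longest name containing it.
    unique = sorted(set(names), key=len, reverse=True)
    canonical = {}
    for name in unique:
        longer = next((other for other in unique
                       if len(other) > len(name) and name in other), name)
        canonical[name] = canonical.get(longer, longer)
    merged = [canonical[name] for name in names]

    # 3. Drop names that appear in the stop word list.
    roles = [name for name in merged if name not in stop_words]

    # 4. Count frequencies and keep the top_k most frequent names.
    return [name for name, _ in Counter(roles).most_common(top_k)]


tokens = [("Chen Feng", "PER"), ("walked", "v"), ("Feng", "PER"), ("Elder", "PER"),
          ("Chen Feng", "PER"), ("Lin Yue", "PER"), ("Lin Yue", "PER"), ("Elder", "PER")]
print(select_protagonists(tokens, stop_words={"Elder"}, top_k=2))
# ['Chen Feng', 'Lin Yue']
```

With a real tagger, the person tag and the merging rule would have to follow that tagger's conventions; substring merging is only one plausible heuristic.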

In some embodiments, generating the novel summary information further includes: if the novel has a chapter update, acquiring the updated chapter; determining the protagonist name set of the updated chapter; and updating the novel summary information based on that protagonist name set.

In some embodiments, training the trigger model includes: acquiring a first training sample set, wherein each first training sample in the set includes a first sample search request and a corresponding first sample search type label; and training the trigger model with the first sample search request of each first training sample as input and the corresponding first sample search type label as output.

In some embodiments, training the parsing model includes: acquiring a second training sample set, wherein each second training sample in the set includes a second sample search request and a corresponding second sample retrieval expression, the second sample retrieval expression including at least one of a novel title, an author name and a protagonist name; and training the parsing model with the second sample search request of each second training sample as input and the corresponding second sample retrieval expression as output.

In a second aspect, an embodiment of the present application provides an apparatus for searching information, including: a receiving unit configured to receive a search request input by a user; a determining unit configured to determine whether the search type corresponding to the search request is the novel search type; a parsing unit configured to, if the search type corresponding to the search request is the novel search type, input the search request into a pre-trained parsing model to obtain a retrieval expression corresponding to the search request, wherein the parsing model is used to identify at least one of a novel title, an author name and a protagonist name; and a retrieval unit configured to perform a novel search based on the retrieval expression to obtain novels to be pushed and push them to the user.

In some embodiments, the determining unit includes: a triggering subunit configured to input the search request into a pre-trained trigger model to obtain the search type corresponding to the search request, wherein the trigger model is used to identify the search type based on at least one of a novel title, an author name and a protagonist name.

In some embodiments, the retrieval unit includes: a retrieval subunit configured to search a pre-generated novel summary information set based on the retrieval expression, the novel summary information including a novel title, an author name and a protagonist name.

In some embodiments, the retrieval subunit includes: a first calculation module configured to calculate the relevance between the retrieval expression and the novel summary information in the novel summary information set to determine a candidate novel set; a second calculation module configured to calculate the relevance between the search request and the candidate novels in the candidate novel set to determine a to-be-selected novel set; and a sorting and de-duplication module configured to sort and de-duplicate the to-be-selected novel set based on the popularity of each to-be-selected novel and its relevance to the search request, to determine the novels to be pushed.

In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; a storage device having one or more programs stored thereon; the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as described in any of the implementations of the first aspect.

In a fourth aspect, embodiments of the present application provide a computer readable medium having stored thereon a computer program which, when executed by a processor, implements a method as described in any of the implementations of the first aspect.

The method and apparatus for searching information provided by the embodiments of the present application first determine whether the search type corresponding to the search request input by the user is the novel search type; if it is, they input the search request into the parsing model to obtain the corresponding retrieval expression; and finally they perform a novel search based on the retrieval expression to obtain novels to be pushed and push them to the user. Because the parsing model can identify at least one of the novel title, the author name and the protagonist name in a novel-type search request, it solves the prior-art problem that a protagonist name in a search request either cannot be identified or is misidentified as an author name. Searching on at least one of these three fields broadens the applicable scenarios and improves the recall rate of novels matching the user's reading needs.

Drawings

Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the accompanying drawings in which:

FIG. 1 is an exemplary system architecture in which the present application may be applied;

FIG. 2 is a flow chart of one embodiment of a method for searching information in accordance with the present application;

FIG. 3 is a flow chart of yet another embodiment of a method for searching information in accordance with the present application;

FIG. 4 is a flow chart of one embodiment of a novel summary information generation method in accordance with the present application;

FIG. 5 is a flow chart of one application scenario of the method for searching information according to the present application;

FIG. 6 is a schematic diagram of an embodiment of an apparatus for searching information in accordance with the present application;

FIG. 7 is a schematic diagram of a computer system suitable for implementing an embodiment of the present application.

Detailed Description

The application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be noted that, for convenience of description, only the portions related to the present application are shown in the drawings.

It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.

Fig. 1 shows an exemplary system architecture 100 to which embodiments of the method for searching information or the apparatus for searching information of the present application may be applied.

As shown in fig. 1, a terminal device 101, a network 102, and a server 103 may be included in a system architecture 100. Network 102 is the medium used to provide communication links between terminal device 101 and server 103. Network 102 may include various connection types such as wired, wireless communication links, or fiber optic cables, among others.

A user may use the terminal device 101 to interact with the server 103 via the network 102 to receive or send messages and the like. Various communication client applications, such as reading applications, may be installed on the terminal device 101.

The terminal device 101 may be hardware or software. When it is hardware, it may be any electronic device that supports information search, including but not limited to smartphones, tablet computers, laptop computers and desktop computers. When it is software, it may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules, or as a single piece of software or software module. No specific limitation is imposed here.

The server 103 may provide various services; for example, it may be the back-end server of a reading application, which analyzes and otherwise processes data such as search requests received from the terminal device 101 and feeds the processing result (for example, the novels to be pushed) back to the terminal device 101.

The server 103 may be hardware or software. When the server 103 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or may be implemented as a single server. When the server 103 is software, it may be implemented as a plurality of software or software modules (for example, to provide distributed services), or may be implemented as a single software or software module. The present application is not particularly limited herein.

It should be noted that the method for searching information provided by the embodiments of the present application is generally performed by the server 103; accordingly, the apparatus for searching information is generally disposed in the server 103.

With continued reference to FIG. 2, a flow 200 of one embodiment of a method for searching information in accordance with the present application is shown. The method for searching information includes the steps of:

Step 201, receiving a search request input by a user.

In this embodiment, the execution body of the method for searching information (e.g., the server 103 shown in FIG. 1) may receive a search request input by a user from a terminal device (e.g., the terminal device 101 shown in FIG. 1). The search request may include the search information (query) entered by the user, which describes the user's reading needs. For example, if the user wants to read a particular novel, the query typically includes at least one of the novel's title, its author name and its protagonist name. Specifically, the user may open a reading application installed on the terminal device, enter such a query in the input box and click the search button, whereupon the terminal device sends the search request to the execution body.

Step 202, determining whether the search type corresponding to the search request is a novel search type.

In this embodiment, the execution body may determine whether the search type corresponding to the search request is the novel search type. If it is, step 203 is performed.

Step 203, if the search type corresponding to the search request is the novel search type, inputting the search request into a pre-trained parsing model to obtain a retrieval expression corresponding to the search request.

In this embodiment, if the search type corresponding to the search request is the novel search type, the execution body may input the search request into the parsing model to obtain the corresponding retrieval expression. The parsing model may be used to identify at least one of a novel title, an author name and a protagonist name, and the retrieval expression may be spliced together from whichever of these were identified. Specifically, the parsing model may apply word segmentation and component analysis to the search request, identify the novel title, author name and/or protagonist name in it, and splice them into the retrieval expression.
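The source trains an RNN as the parsing model. As a much simpler stand-in that only illustrates the input and output of step 203, the sketch below looks up known titles, authors and protagonist names in the query and splices the matches into a retrieval expression. The dictionary approach, the `field:value` expression format, the function name `parse_query` and the sample names are all hypothetical.

```python
def parse_query(query, titles, authors, protagonists):
    """Splice the entities found in `query` into a retrieval expression.

    Dictionary lookup is a hypothetical stand-in for the trained parsing model.
    """
    fields = [("title", titles), ("author", authors), ("protagonist", protagonists)]
    remaining = query
    parts = []
    for field, lexicon in fields:
        # Try longer entries first so a full name wins over a shorter one it contains.
        for entry in sorted(lexicon, key=len, reverse=True):
            if entry in remaining:
                parts.append(f"{field}:{entry}")
                # Blank out the match so a later field cannot claim the same text.
                remaining = remaining.replace(entry, " ")
    return " ".join(parts)


print(parse_query("Chen Feng novel by Li Ming",
                  titles={"Sky Sword"}, authors={"Li Ming"}, protagonists={"Chen Feng"}))
# author:Li Ming protagonist:Chen Feng
```

A query that mentions only a protagonist still yields a non-empty expression, which is the case the source says earlier systems could not handle.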

In some optional implementations of this embodiment, the parsing model may be trained as follows:

First, acquire a second training sample set.

Each second training sample in the second training sample set may include a second sample search request and a corresponding second sample retrieval expression. The search type of the second sample search request may be the novel search type, and the second sample retrieval expression may be spliced together from at least one of a novel title, an author name and a protagonist name.

In addition, to address the technical problem that search requests containing only the novel title and the protagonist name, or only the protagonist name, cannot be identified and recalled, the execution body may further acquire the protagonist name set of the novel and construct second sample search requests from it, thereby obtaining a large number of second training samples.
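A minimal sketch of this sample-construction idea: for each novel's title and protagonist names, fill a few query templates and pair each generated query with its expected retrieval expression. The templates, the `field:value` expression format, `build_parse_samples` and the sample novel are hypothetical illustrations, not details from the source.

```python
def build_parse_samples(novels, templates):
    """Generate (search request, retrieval expression) pairs from protagonist names.

    Every template is assumed to contain "{protagonist}"; "{title}" is optional.
    """
    samples = []
    for novel in novels:
        for protagonist in novel["protagonists"]:
            for template in templates:
                # str.format ignores keyword arguments the template does not use.
                query = template.format(title=novel["title"], protagonist=protagonist)
                parts = []
                if "{title}" in template:
                    parts.append(f"title:{novel['title']}")
                parts.append(f"protagonist:{protagonist}")
                samples.append((query, " ".join(parts)))
    return samples


novels = [{"title": "Sky Sword", "protagonists": ["Chen Feng"]}]
templates = ["{protagonist}", "{title} {protagonist}", "novel whose hero is {protagonist}"]
for sample in build_parse_samples(novels, templates):
    print(sample)
```

Templates give cheap coverage of protagonist-only and title-plus-protagonist queries; in practice they would be mixed with real user queries so the model does not overfit the template wording.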

Then, train the parsing model with the second sample search request of each second training sample as input and the corresponding second sample retrieval expression as output.

Typically, the execution body may train an RNN (recurrent neural network) model on the second training samples, tune it for accuracy, and finally obtain the parsing model.

Step 204, performing a novel search based on the retrieval expression to obtain novels to be pushed, and pushing them to the user.

In this embodiment, the execution body may perform a novel search based on the retrieval expression to obtain novels to be pushed and push them to the user. For example, the execution body may retrieve novels containing the terms in the retrieval expression as the novels to be pushed.

In some optional implementations of this embodiment, the execution body may search a pre-generated novel summary information set based on the retrieval expression to determine the novels to be pushed. For example, it may retrieve from the set the novel summary information containing the terms in the retrieval expression, and push the corresponding novels to the user as the novels to be pushed. The novel summary information may be in XML format, and its content includes, but is not limited to, the novel title, author name, protagonist names, category, tags and novel ID.
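The source does not specify the XML schema, so the element names below (`novel`, `title`, `author`, `protagonist`, `category`, the `id` attribute), the function `retrieve_novels` and the sample data are hypothetical. The sketch parses each summary with Python's standard `xml.etree.ElementTree` and returns the IDs of novels whose summary contains at least one term of the retrieval expression.

```python
import xml.etree.ElementTree as ET

SUMMARIES = [
    """<novel id="1001"><title>Sky Sword</title><author>Li Ming</author>
       <protagonist>Chen Feng</protagonist><category>fantasy</category></novel>""",
    """<novel id="1002"><title>Iron Blade</title><author>Wang Hu</author>
       <protagonist>Zhao Lei</protagonist><category>history</category></novel>""",
]


def retrieve_novels(summaries, terms):
    """Return the IDs of novels whose summary text contains any of `terms`."""
    hits = []
    for doc in summaries:
        root = ET.fromstring(doc)
        # Concatenate the text of all elements in the summary.
        text = " ".join(chunk.strip() for chunk in root.itertext())
        if any(term in text for term in terms):
            hits.append(root.get("id"))
    return hits


print(retrieve_novels(SUMMARIES, ["Chen Feng"]))  # ['1001']
```

Matching against per-field elements rather than the concatenated text would let a protagonist term match only the protagonist field; the flat version is kept here for brevity.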

The method for searching information provided by the embodiments of the present application first determines whether the search type corresponding to the search request input by the user is the novel search type; if it is, it inputs the search request into the parsing model to obtain the corresponding retrieval expression; and finally it performs a novel search based on the retrieval expression to obtain novels to be pushed and pushes them to the user. Because the parsing model can identify at least one of the novel title, the author name and the protagonist name in a novel-type search request, it solves the prior-art problem that a protagonist name in a search request either cannot be identified or is misidentified as an author name. Searching on at least one of these three fields broadens the applicable scenarios and improves the recall rate of novels matching the user's reading needs.

With further reference to fig. 3, a flow 300 of yet another embodiment of a method for searching information in accordance with the present application is shown. The method for searching information includes the steps of:

Step 301, receiving a search request input by a user.

In this embodiment, step 301 is performed in the same way as step 201 in the embodiment shown in FIG. 2, and is not repeated here.

Step 302, inputting the search request into a pre-trained trigger model to obtain the search type corresponding to the search request.

In this embodiment, the execution body of the method for searching information (e.g., the server 103 shown in FIG. 1) may input the search request into the trigger model to obtain the corresponding search type. The trigger model may be used to identify the search type based on at least one of a novel title, an author name and a protagonist name. Specifically, the trigger model may analyze the search request to determine the probability that it belongs to each of N preset search types (N is a positive integer). The N preset search types may include, but are not limited to, the novel search type, the news event search type, the encyclopedia search type and the weather search type.

In some optional implementations of this embodiment, when the novel search type is one of the preset search types, the trigger model can identify and recall search requests that describe novel reading needs. Typically, the trigger model outputs the probability that a search request belongs to the novel search type: if the probability is greater than a preset probability threshold, the search request is of the novel search type; otherwise, it is of a non-novel search type.
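The decision rule above can be sketched as follows, assuming (hypothetically) that the trigger model outputs one raw score per preset search type and that a softmax turns those scores into probabilities over the N types. The type names, the 0.5 threshold and the function names are illustrative only.

```python
import math


def softmax(scores):
    """Convert raw per-type scores into probabilities that sum to 1."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(value - top) for label, value in scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


def is_novel_search(scores, threshold=0.5):
    """True only if P(novel) is strictly greater than the threshold."""
    return softmax(scores).get("novel", 0.0) > threshold


scores = {"novel": 2.0, "news": 0.0, "encyclopedia": 0.0, "weather": 0.0}
print(is_novel_search(scores))                 # P(novel) = e^2 / (e^2 + 3), about 0.71, so True
print(is_novel_search(scores, threshold=0.9))  # False
```

Using a strict comparison mirrors the text: a probability exactly equal to the threshold is treated as non-novel.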

In some optional implementations of this embodiment, the trigger model may be trained as follows:

First, acquire a first training sample set.

Each first training sample in the first training sample set may include a first sample search request and a corresponding first sample search type label. If the first sample search request includes at least one of a novel title, an author name and a protagonist name, its search type is the novel search type, its label value is 1, and the training sample is a positive sample. If it includes none of them, its search type is a non-novel search type, its label value is 0, and the training sample is a negative sample.
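The labeling rule above can be sketched as follows. Plain substring matching against known titles, authors and protagonist names is an assumption made for illustration; `label_request` and the sample lexicons are hypothetical.

```python
def label_request(query, titles, authors, protagonists):
    """Return 1 (positive, novel type) if the query mentions any known entity, else 0."""
    entities = set(titles) | set(authors) | set(protagonists)
    return 1 if any(entity in query for entity in entities) else 0


titles, authors, protagonists = {"Sky Sword"}, {"Li Ming"}, {"Chen Feng"}
queries = ["Chen Feng", "Sky Sword Li Ming", "weather in Beijing"]
first_training_samples = [(q, label_request(q, titles, authors, protagonists)) for q in queries]
print(first_training_samples)
# [('Chen Feng', 1), ('Sky Sword Li Ming', 1), ('weather in Beijing', 0)]
```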

In addition, to address the technical problem that search requests containing only the novel title and the protagonist name, or only the protagonist name, cannot be identified and recalled, the execution body may further acquire the protagonist name set of the novel and construct first sample search requests from it, thereby obtaining a large number of first training samples.

Second, train the trigger model with the first sample search request of each first training sample as input and the corresponding first sample search type label as output.

Typically, the execution body may train a binary classification model on the first training samples to obtain the trigger model. In addition, manual rules are applied alongside model training, including strategies such as a whitelist (queries on the list are recalled) and a blacklist (queries on the list are not recalled). Depending on the training results, a certain number of negative samples are added and the model is iteratively optimized, reducing the false recall rate while maintaining the recall rate.
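A toy version of this setup is sketched below: a logistic-regression binary classifier trained by stochastic gradient descent on whitespace-separated tokens, wrapped with blacklist and whitelist rules. The choice of logistic regression, the whitespace tokenization (real Chinese queries would need word segmentation), the rule precedence (blacklist checked first), the hyperparameters and all function names are assumptions; the source does not specify the classifier.

```python
import math


def tokenize(query):
    # Assumption: whitespace tokens; Chinese queries would need word segmentation.
    return query.lower().split()


def train_trigger(samples, epochs=50, lr=0.5):
    """Train a logistic-regression binary classifier with plain SGD.

    `samples` is a list of (query, label) pairs with label 1 (novel) or 0.
    """
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for query, label in samples:
            features = tokenize(query)
            z = bias + sum(weights.get(f, 0.0) for f in features)
            p = 1.0 / (1.0 + math.exp(-z))
            gradient = label - p  # gradient of the log-likelihood w.r.t. z
            bias += lr * gradient
            for f in features:
                weights[f] = weights.get(f, 0.0) + lr * gradient
    return weights, bias


def novel_probability(model, query):
    weights, bias = model
    z = bias + sum(weights.get(f, 0.0) for f in tokenize(query))
    return 1.0 / (1.0 + math.exp(-z))


def trigger(model, query, whitelist=frozenset(), blacklist=frozenset(), threshold=0.5):
    """Apply the manual rules first, then fall back to the classifier."""
    if query in blacklist:  # on the blacklist: never recall
        return False
    if query in whitelist:  # on the whitelist: always recall
        return True
    return novel_probability(model, query) > threshold


samples = [("Chen Feng", 1), ("Sky Sword Li Ming", 1),
           ("weather Beijing", 0), ("stock price", 0)]
model = train_trigger(samples)
print(trigger(model, "Chen Feng"))                               # True
print(trigger(model, "weather Beijing"))                         # False
print(trigger(model, "stock price", whitelist={"stock price"}))  # True
```

Adding negative samples, as the text describes, would simply mean appending more `(query, 0)` pairs and retraining.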

Step 303, if the search type corresponding to the search request is the novel search type, inputting the search request into a pre-trained parsing model to obtain a retrieval expression corresponding to the search request.

In this embodiment, step 303 is performed in the same way as step 203 in the embodiment shown in FIG. 2, and is not repeated here.

Step 304, calculating the relevance between the retrieval expression and the novel summary information in the novel summary information set to determine a candidate novel set.

In this embodiment, the execution body may calculate the relevance between the retrieval expression and each piece of novel summary information in the set, and determine the candidate novel set. For example, it may select the most relevant novel summary information (e.g., the top 20) from the set and take the corresponding novels as candidate novels, forming the candidate novel set. Since the retrieval expression is spliced together from at least one of a novel title, an author name and a protagonist name, the more of its terms a piece of novel summary information contains, the more relevant it is to the retrieval expression.
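Following the counting rule above, a minimal sketch scores each summary by how many retrieval-expression terms it contains and keeps the top N. Representing summaries as plain strings keyed by novel ID, dropping summaries that match no term, and breaking ties by ID are assumptions; `candidate_novels` and the sample data are hypothetical.

```python
def candidate_novels(expression_terms, summaries, top_n=20):
    """Rank summaries by how many retrieval-expression terms they contain."""
    scored = []
    for novel_id, text in summaries.items():
        score = sum(1 for term in expression_terms if term in text)
        if score > 0:  # assumption: summaries matching no term are not candidates
            scored.append((score, novel_id))
    # Higher score first; ties broken by novel ID for a deterministic order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [novel_id for _, novel_id in scored[:top_n]]


summaries = {
    "1001": "Sky Sword | Li Ming | Chen Feng | fantasy",
    "1002": "Iron Blade | Wang Hu | Chen Feng | history",
    "1003": "Rain City | Zhou Yu | Lin Yue | romance",
}
print(candidate_novels(["Sky Sword", "Chen Feng"], summaries))  # ['1001', '1002']
```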

Step 305, calculating the relevance between the search request and the candidate novels in the candidate novel set to determine a to-be-selected novel set.

In this embodiment, the execution body may calculate the relevance between the search request and each candidate novel in the candidate novel set, and determine the to-be-selected novel set. For example, for each candidate novel, the execution body may calculate signals such as the edit distance, the closeness and the BM25 relevance between the search request and the candidate novel, and combine them into a single relevance score. It may then compare this relevance with a preset relevance threshold and, if the relevance is greater than the threshold, add the candidate novel to the to-be-selected novel set.
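The sketch below combines two of the signals named above into one weighted relevance and keeps candidates above a threshold: a normalized edit-distance similarity between the search request and the candidate's title, and a BM25 score of the request against the candidate's summary. The closeness signal is omitted. The equal weights, the threshold of 1.0, computing BM25 statistics over the candidate set only, the Lucene-style IDF, the common defaults k1 = 1.2 and b = 0.75, and all names and data are assumptions.

```python
import math
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def edit_similarity(a, b):
    """1 - normalized edit distance, in [0, 1]."""
    return 1.0 - edit_distance(a, b) / (max(len(a), len(b)) or 1)


def bm25(query_tokens, doc_tokens, corpus, k1=1.2, b=0.75):
    """BM25 score of one document, with IDF statistics taken from `corpus`."""
    n_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n_docs
    tf = Counter(doc_tokens)
    score = 0.0
    for term in set(query_tokens):
        df = sum(1 for d in corpus if term in d)
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)  # always positive
        f = tf[term]
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc_tokens) / avgdl))
    return score


def select_candidates(request, candidates, threshold=1.0, w_edit=1.0, w_bm25=1.0):
    """Keep candidates whose combined relevance exceeds the threshold.

    `candidates` maps novel ID -> (title, summary text).
    """
    ids = list(candidates)
    corpus = [candidates[nid][1].lower().split() for nid in ids]
    query = request.lower().split()
    selected = []
    for nid, doc in zip(ids, corpus):
        title = candidates[nid][0]
        relevance = (w_edit * edit_similarity(request.lower(), title.lower())
                     + w_bm25 * bm25(query, doc, corpus))
        if relevance > threshold:
            selected.append((nid, relevance))
    return selected


candidates = {
    "1001": ("Sky Sword", "sky sword li ming chen feng fantasy"),
    "1002": ("Iron Blade", "iron blade wang hu zhao lei history"),
}
print([nid for nid, _ in select_candidates("sky sword chen feng", candidates)])  # ['1001']
```

Because BM25 is unbounded while the edit similarity lies in [0, 1], the weights and threshold would need tuning on real data; they are placeholders here.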

Step 306, sorting and de-duplicating the to-be-selected novel set based on the popularity of each to-be-selected novel and its relevance to the search request, determining the novels to be pushed, and pushing them to the user.

In this embodiment, the execution body may sort and de-duplicate the to-be-selected novel set based on the popularity of each to-be-selected novel and its relevance to the search request, to determine the novels to be pushed. For example, it may sort the set by combining popularity with relevance to the search request, then de-duplicate it to produce a combined ranking, and finally push the top-ranked to-be-selected novels to the user as the novels to be pushed.
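A minimal sketch of the final ranking: score each to-be-selected novel by a weighted mix of popularity and relevance, sort in descending order, drop later entries that share the same title and author, and keep the top N. The 0.5 weight, the assumption that both signals are already scaled to [0, 1], the title-plus-author de-duplication key and all names and data are hypothetical.

```python
def rank_and_dedup(novels, alpha=0.5, top_n=10):
    """Sort by alpha * popularity + (1 - alpha) * relevance, then de-duplicate."""
    ranked = sorted(
        novels,
        key=lambda n: alpha * n["popularity"] + (1 - alpha) * n["relevance"],
        reverse=True,
    )
    seen, result = set(), []
    for novel in ranked:
        key = (novel["title"], novel["author"])  # assumed identity of a work
        if key in seen:
            continue  # a lower-ranked copy of a work already kept
        seen.add(key)
        result.append(novel["id"])
    return result[:top_n]


novels = [
    {"id": "A", "title": "Sky Sword", "author": "Li Ming", "popularity": 0.9, "relevance": 0.8},
    {"id": "B", "title": "Sky Sword", "author": "Li Ming", "popularity": 0.3, "relevance": 0.8},
    {"id": "C", "title": "Iron Blade", "author": "Wang Hu", "popularity": 0.6, "relevance": 0.6},
]
print(rank_and_dedup(novels))  # ['A', 'C']
```

Sorting before de-duplication guarantees that, of several copies of the same work, the best-scoring one is the one kept.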

As can be seen from FIG. 3, compared with the embodiment corresponding to FIG. 2, the flow 300 in this embodiment highlights the triggering step and the retrieval step. The trigger model in this embodiment can identify a search request containing at least one of the novel title, the author name and the protagonist name as the novel search type. This solves the prior-art problem that search requests containing only the novel title and the protagonist name, or only the protagonist name, cannot be identified, and improves the trigger model's recall of novel-type search requests. In addition, the scheme filters novels layer by layer using the relevance between the retrieval expression and the novel summary information, the relevance between the search request and each novel, the popularity of each novel and similar signals, which improves how well the selected novels match the user's reading needs and thereby increases the user's click-through rate on the pushed novels.

With further reference to fig. 4, a flow 400 of one embodiment of a novel summary information generation method in accordance with the present application is shown. The novel abstract information generation method comprises the following steps:

Step 401, acquire the existing chapters of the novel.

In this embodiment, the execution body of the method for generating novel summary information (e.g., the server 103 shown in fig. 1) may acquire the existing chapters of a novel. Typically, the execution body obtains the existing chapters of novels from databases. For example, two databases are preset. One stores the existing chapter catalogs of a large number of novels, and the other stores the existing chapters themselves. To acquire the existing chapters of a novel, the execution body may first look up the novel's existing chapter catalog in the catalog database, using the novel name as the index. It then looks up the existing chapters in the chapter database, using the novel's existing chapter catalog as the index.

It should be understood that the execution body may acquire all of the existing chapters of the novel or only some of them. For example, if a novel has more than 1000 existing chapters, the execution body may acquire only its first 1000 chapters.
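The two-database lookup with the 1000-chapter cap can be sketched like this. Plain dicts stand in for the two databases. The chapter identifiers stored in the catalog and the function name `fetch_existing_chapters` are hypothetical, chosen only to illustrate the catalog-then-chapter indexing.

```python
MAX_CHAPTERS = 1000  # cap mentioned in the text


def fetch_existing_chapters(novel_name, catalog_db, chapter_db, limit=MAX_CHAPTERS):
    """Look up the chapter catalog by novel name, then each chapter by catalog entry.

    `catalog_db` maps a novel name to an ordered list of (hypothetical) chapter ids;
    `chapter_db` maps a chapter id to the chapter text. Both stand in for databases.
    """
    catalog = catalog_db.get(novel_name, [])  # first lookup: novel name -> catalog
    chapters = []
    for chapter_id in catalog[:limit]:  # keep only the first `limit` chapters
        text = chapter_db.get(chapter_id)  # second lookup: catalog entry -> chapter
        if text is not None:
            chapters.append(text)
    return chapters
```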

Step 402, perform word segmentation and part-of-speech analysis on the content of the existing chapters using a natural language processing (NLP) shallow lexical analysis model to obtain a word segmentation result and a part-of-speech analysis result.

In this embodiment, the execution body may use an NLP (Natural Language Processing) shallow lexical analysis model to perform word segmentation and part-of-speech analysis on the content of the existing chapters, obtaining a word segmentation result and a part-of-speech analysis result. The NLP shallow lexical analysis model is built on massive Internet data. It performs Chinese word segmentation and part-of-speech analysis with a hybrid architecture that combines a structured perceptron with a deep neural network.

In some optional implementations of this embodiment, the execution body may first segment the content of the existing chapters with the NLP shallow lexical analysis model to obtain a vocabulary set. It then recombines the vocabulary set into a vocabulary sequence whose semantics satisfy a preset condition and determines the part of speech of each word in that sequence. The words in the vocabulary set are the basic-granularity words in the content of the existing chapters. The preset condition may include, but is not limited to, at least one of the following: the semantics are reasonable, and the semantics are complete. The model processes input text by jointly optimizing word granularity and part of speech. A pipeline performs word segmentation first and part-of-speech tagging second; by contrast, this joint approach lets the granularity task and the part-of-speech task share features. It reduces the propagation of errors from one stage to the next. It also avoids having to introduce various morpheme tags to cover words that were split too finely, which makes the part-of-speech tagging results more meaningful.

Step 403, select the person name set of the existing chapters from the word segmentation result based on the part-of-speech analysis result.

In this embodiment, the execution body may select the person name set of the existing chapters from the word segmentation result based on the part-of-speech analysis result. For example, the execution body may pick out the words tagged as person names from the word segmentation result and generate the person name set from them.

In addition, the execution body may count the frequency of each name in the person name set and store the names as a dictionary. The storage structure may be {gid: {name: freq}}, where gid represents the novel name, name represents a person name, and freq represents that name's word frequency.
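Steps 403 and the frequency dictionary above can be sketched as follows. The sketch assumes the lexical analysis model returns (word, tag) pairs and that person names carry the tag `"PER"`. That tag is an assumption, since the real tag set depends on the tagger used, and `person_name_freq` is a hypothetical helper name.

```python
from collections import Counter

PERSON_TAG = "PER"  # assumed person-name tag; depends on the tagger's tag set


def person_name_freq(gid, tagged_tokens):
    """Build {gid: {name: freq}} from (word, pos_tag) pairs produced by a tagger."""
    # Keep only words tagged as person names and count their occurrences.
    freq = Counter(word for word, tag in tagged_tokens if tag == PERSON_TAG)
    return {gid: dict(freq)}
```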

Step 404, determine the protagonist name set of the existing chapters from the person name set of the existing chapters.

In this embodiment, the execution body may determine the protagonist name set of the existing chapters from the person name set of the existing chapters. For example, the execution body may select the top-ranked names (e.g., the top 5) from the person name set of the existing chapters and generate the protagonist name set from them.

In some optional implementations of this embodiment, the execution body may determine the protagonist name set as follows:

First, similar person names in the person name set of the existing chapters are merged to generate a merged person name set of the existing chapters.

Typically, the person name set contains some similar names. For example, the same person may appear both under a full name made up of surname and given name and under the given name alone. Both forms contain the person's given name, so they count as similar person names. Because they refer to the same person, they can be merged. For example, a {name: freq} dictionary is maintained and updated during person name recognition. A judging function is applied whenever the NLP shallow lexical analysis model identifies a new person name. Specifically, suppose the new person name overlaps a person name already in the dictionary, and the overlapping string is longer than 4 in GBK-encoded length (each Chinese character occupies 2 bytes in GBK). Overlap here means that the new person name contains the dictionary name, or the dictionary name contains the new person name. In that case the execution body may merge the two names and keep the one with the longer string.
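The merge rule can be sketched as follows. Following the text, two names are "similar" when one contains the other and the shared string is longer than 4 bytes in GBK, and the longer name is kept. Summing the two frequencies on merge is an assumption, because the source only says the names are combined. `merge_name` and `gbk_len` are hypothetical helpers.

```python
def gbk_len(s):
    """Length of a string in bytes under GBK encoding (2 bytes per Chinese character)."""
    return len(s.encode("gbk"))


def merge_name(name_freq, new_name, count=1, min_overlap_bytes=4):
    """Add `new_name` to a {name: freq} dict, merging it with a similar name if found.

    Names are similar when one contains the other and the shared string is longer
    than `min_overlap_bytes` in GBK. The longer name is kept; summing the two
    frequencies is an assumption. The dict is updated in place and returned.
    """
    for existing in list(name_freq):
        if new_name in existing or existing in new_name:
            shared = new_name if new_name in existing else existing
            if gbk_len(shared) > min_overlap_bytes:
                keep = new_name if gbk_len(new_name) > gbk_len(existing) else existing
                total = name_freq.pop(existing) + count
                name_freq[keep] = name_freq.get(keep, 0) + total
                return name_freq
    # No similar name found: record the new name on its own.
    name_freq[new_name] = name_freq.get(new_name, 0) + count
    return name_freq
```

Note that with the strict "longer than 4 bytes" threshold, a two-character given name (4 bytes) is not merged into a full name; only overlaps of three or more characters trigger a merge.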

Then, the merged person name set of the existing chapters is filtered based on a pre-generated stop word list to generate the character name set of the existing chapters.

In general, some words tagged as person names in the word segmentation result are not actually names, for example reduplicated words such as "haha" and words such as "vintage". It is therefore necessary to build and maintain a stop word list and a filtering strategy that remove proper nouns and interfering words misidentified as names from the merged person name set. For example, reduplicated words are first filtered out of the merged person name set. A stop word list is then created and maintained, and stop words are filtered out of the merged person name set as well. The stop word list can be compiled as follows. First, randomly sample a number of test data. Next, label the protagonist name list for each sample. Then, find the wrongly labeled entries to form an entry list. Finally, aggregate the entry list and count the frequency of each entry to generate the stop word list.
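Both the filtering and the stop-list construction described above can be sketched briefly. The reduplication check is a simple heuristic (a word made of one repeated unit). The `min_count` threshold for promoting a mislabeled entry to a stop word is an assumed parameter that the source does not specify.

```python
from collections import Counter


def is_reduplicated(word):
    """True for words made of one repeated unit, e.g. 'haha' or '哈哈'."""
    n = len(word)
    for size in range(1, n // 2 + 1):
        if n % size == 0 and word[:size] * (n // size) == word:
            return True
    return False


def build_stop_list(mislabeled_entries, min_count=2):
    """Count wrongly labeled entries from annotated samples; keep the frequent ones."""
    counts = Counter(mislabeled_entries)
    return {word for word, c in counts.items() if c >= min_count}


def filter_names(name_freq, stop_words):
    """Drop reduplicated words, then stop words, from a {name: freq} dict."""
    return {
        name: f
        for name, f in name_freq.items()
        if not is_reduplicated(name) and name not in stop_words
    }
```

The heuristic is crude: a genuine reduplicated nickname would also be dropped, which is one reason the maintained stop word list and manual labeling matter.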

Finally, the word frequencies of the character names in the character name set of the existing chapters are counted, and the protagonist name set of the existing chapters is selected from the character name set.

For example, the character names with the highest word frequencies (e.g., the top 5) are selected from the character name set to generate the protagonist name set.
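The frequency count and top-k selection can be sketched with `collections.Counter`. The sketch assumes each chapter has already been reduced to a {character_name: freq} dict by the previous steps. `top_protagonists` is a hypothetical helper, and k = 5 follows the example in the text.

```python
from collections import Counter


def top_protagonists(per_chapter_freqs, k=5):
    """Sum per-chapter {name: freq} dicts and return the k most frequent names."""
    total = Counter()
    for freq in per_chapter_freqs:
        total.update(freq)  # Counter.update adds counts for mapping inputs
    return [name for name, _ in total.most_common(k)]
```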

Step 405, generate the novel summary information based on the protagonist name set of the existing chapters.

In this embodiment, the execution body may generate the novel summary information based on the protagonist name set of the existing chapters. For example, other descriptive information about the novel is added to the protagonist name set to generate the novel summary information. The descriptive information may include, but is not limited to, the novel name, author name, number of chapters, category, tags, novel number, and the like. The generated novel summary information may be in XML format. Its content may include not only the novel name and author name but also the protagonist names. The storage structure of the novel summary information may be {gid: {name1: freq1, name2: freq2, name3: freq3, …}, chapter_num: n1}. Here gid is the novel name; name1, name2, name3, etc. are protagonist names (GBK-encoded); freq1, freq2, freq3, etc. are their word frequencies; chapter_num is the chapter-count field; and n1 is the number of chapters.
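Since the summary may be stored as XML, one way to serialize it is with the standard-library `xml.etree.ElementTree`. The element and attribute names below (`novel`, `protagonist`, `freq`, and so on) are hypothetical, because the source does not define an XML schema.

```python
import xml.etree.ElementTree as ET


def summary_to_xml(novel_name, author, protagonist_freq, chapter_num):
    """Serialize novel summary info to an XML string (element names are hypothetical)."""
    root = ET.Element("novel")
    ET.SubElement(root, "name").text = novel_name
    ET.SubElement(root, "author").text = author
    roles = ET.SubElement(root, "protagonists")
    for name, freq in protagonist_freq.items():
        # Each protagonist carries its word frequency as an attribute.
        ET.SubElement(roles, "protagonist", freq=str(freq)).text = name
    ET.SubElement(root, "chapter_num").text = str(chapter_num)
    return ET.tostring(root, encoding="unicode")
```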

In some optional implementations of this embodiment, the novel may have chapter updates. In that case, the execution body may acquire the updated chapters and determine their protagonist name set by executing steps 402-404 again. It then updates the novel summary information based on that protagonist name set. Typically, the novel summary information is updated once the number of updated chapters is not less than a preset value (e.g., 50). In addition, if the novel whose chapters were updated has more than 1000 chapters in total, the execution body may use only the first 1000 of the updated chapters to update the novel summary information.
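The update trigger and cap can be sketched as follows. The threshold of 50 and the cap of 1000 come from the text. Folding the new frequencies into the old summary by summing them is an assumption, because the source does not say how old and new protagonist counts are combined.

```python
UPDATE_THRESHOLD = 50  # example value from the text ("not less than" this many)
MAX_CHAPTERS = 1000    # cap from the text


def chapters_to_reprocess(updated_chapters, threshold=UPDATE_THRESHOLD, cap=MAX_CHAPTERS):
    """Return the updated chapters to re-run steps 402-404 on, or [] if too few."""
    if len(updated_chapters) < threshold:
        return []  # not enough new chapters yet; keep the current summary
    return updated_chapters[:cap]  # use only the first `cap` updated chapters


def merge_protagonists(old_freq, new_freq):
    """Fold frequencies from updated chapters into the existing summary (summing is assumed)."""
    merged = dict(old_freq)
    for name, f in new_freq.items():
        merged[name] = merged.get(name, 0) + f
    return merged
```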

The method for generating novel summary information in this embodiment proceeds in four stages. First, a natural language processing (NLP) shallow lexical analysis model performs word segmentation and part-of-speech analysis on the content of the novel's existing chapters, yielding a word segmentation result and a part-of-speech analysis result. Next, the person name set of the existing chapters is selected from the word segmentation result based on the part-of-speech analysis result. Then, the protagonist name set of the existing chapters is determined from the person name set. Finally, the novel summary information is generated based on the protagonist name set. Using the NLP shallow lexical analysis model to recognize person names in the existing chapters improves name recognition accuracy. Adding protagonist names to the novel summary information enriches that information and improves the recall rate of novels that match the user's reading needs.

With further reference to fig. 5, a flow chart of one application scenario of the method for searching information is shown. As shown in fig. 5, the application scenario includes an offline part and an online part. The offline part comprises steps 501-508, and the online part comprises steps 509-514, as follows:

Step 501, crawl the content of novel chapters.

Step 502, the NLP shallow lexical analysis model performs word segmentation and lexical analysis, and the words tagged as person names are selected.

Step 503, filter with the stop word list and maintain the stop word list.

Step 504, merge person names.

Step 505, generate the protagonist name data.

Step 506, if more than 50 chapters have been updated, return to step 501.

Step 507, train the novel query trigger model.

Step 508, train the novel query parsing model.

Step 509, receive queries from users on mobile phones and computers.

Step 510, call the trigger model and the parsing model to recall novel queries, and splice together the retrieval expression.

Step 511, query the offline novel data and generate a candidate list based on the relevance between the novel data and the query.

Step 512, the online model scores relevance and popularity.

Step 513, deduplicate and re-rank based on the scores.

Step 514, recall the highest-ranked novel, generate a novel card, and push it to the user.
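The online part (steps 509-514) can be sketched as a small orchestration function. The trigger model, parsing model and retrieval/ranking stage are passed in as callables, because their internals are not specified here. The field keys and the `field:value ... AND ...` expression syntax used for splicing are hypothetical, and only illustrate how recognized fields might be joined into one retrieval expression.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical field keys produced by the parsing model.
FIELD_ORDER = ("novel_name", "author_name", "protagonist_name")


def splice_expression(parsed: Dict[str, str]) -> str:
    """Join the recognized fields into one retrieval expression (syntax is hypothetical)."""
    parts = [f"{key}:{parsed[key]}" for key in FIELD_ORDER if parsed.get(key)]
    return " AND ".join(parts)


def handle_query(
    query: str,
    is_novel_query: Callable[[str], bool],  # stands in for the trigger model
    parse: Callable[[str], Dict[str, str]],  # stands in for the parsing model
    retrieve: Callable[[str, str], List[str]],  # stands in for steps 511-513
) -> Optional[str]:
    """Return the top novel to push for a query, or None if it is not a novel query."""
    if not is_novel_query(query):  # step 510: trigger model
        return None
    expression = splice_expression(parse(query))  # step 510: parse and splice
    if not expression:
        return None
    ranked = retrieve(expression, query)  # steps 511-513: candidates, scoring, dedup
    return ranked[0] if ranked else None  # step 514: push the top-ranked novel
```

Passing the models in as callables keeps the control flow testable on its own and mirrors the separation between the offline-trained models and the online serving path.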

With further reference to fig. 6, as an implementation of the method shown in the above figures, the present application provides an embodiment of an apparatus for searching information, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.

As shown in fig. 6, the apparatus 600 for searching information of this embodiment may include a receiving unit 601, a determining unit 602, a parsing unit 603, and a retrieval unit 604. The receiving unit 601 is configured to receive a search request input by a user. The determining unit 602 is configured to determine whether the search type corresponding to the search request is the novel search type. If it is, the parsing unit 603 is configured to input the search request into a pre-trained parsing model to obtain a retrieval expression corresponding to the search request, where the parsing model is used to identify at least one of a novel name, an author name and a protagonist name. The retrieval unit 604 is configured to perform a novel retrieval based on the retrieval expression to obtain the novels to be pushed, and to push them to the user.

In this embodiment, for the specific processing of the receiving unit 601, the determining unit 602, the parsing unit 603 and the retrieval unit 604 of the apparatus 600 for searching information, and for their technical effects, refer to the descriptions of steps 201 to 204 in the embodiment corresponding to fig. 2, which are not repeated here.

In some optional implementations of this embodiment, the determining unit 602 includes a trigger subunit (not shown in the figure). The trigger subunit is configured to input the search request into a pre-trained trigger model to obtain the search type corresponding to the search request, where the trigger model is used to identify the search type based on at least one of a novel name, an author name and a protagonist name.

In some optional implementations of this embodiment, the retrieval unit 604 includes a retrieval subunit (not shown in the figure). The retrieval subunit is configured to search a pre-generated novel summary information set based on the retrieval expression to determine the novels to be pushed, where the novel summary information includes a novel name, an author name and protagonist names.

In some optional implementations of this embodiment, the retrieval subunit includes three modules (not shown in the figure). A first calculation module is configured to calculate the relevance between the retrieval expression and the novel summary information in the novel summary information set and determine a candidate novel set. A second calculation module is configured to calculate the relevance between the search request and the candidate novels in the candidate novel set and determine a to-be-selected novel set. A sorting and deduplication module is configured to sort and deduplicate the to-be-selected novel set, based on each to-be-selected novel's popularity and its relevance to the search request, and determine the novels to be pushed.

In some optional implementations of this embodiment, the step of generating the novel summary information includes: acquiring the existing chapters of the novel; performing word segmentation and part-of-speech analysis on the content of the existing chapters using a natural language processing (NLP) shallow lexical analysis model to obtain a word segmentation result and a part-of-speech analysis result; selecting the person name set of the existing chapters from the word segmentation result based on the part-of-speech analysis result; determining the protagonist name set of the existing chapters from the person name set of the existing chapters; and generating the novel summary information based on the protagonist name set of the existing chapters.

In some optional implementations of this embodiment, performing word segmentation and part-of-speech analysis on the content of the existing chapters using the NLP shallow lexical analysis model includes the following. First, the content of the existing chapters is segmented with the NLP shallow lexical analysis model to obtain a vocabulary set. The vocabulary set is then recombined into a vocabulary sequence whose semantics satisfy a preset condition, and the part of speech of each word in the vocabulary sequence is determined. The preset condition includes at least one of the following: the semantics are reasonable, and the semantics are complete.

In some optional implementations of this embodiment, determining the protagonist name set of the existing chapters from the person name set of the existing chapters includes: merging similar names in the person name set of the existing chapters to generate a merged person name set of the existing chapters; filtering the merged person name set based on a pre-generated stop word list to generate the character name set of the existing chapters; and counting the word frequencies of the character names in the character name set, and selecting the protagonist name set of the existing chapters from it.

In some optional implementations of this embodiment, the step of generating the novel summary information further includes: if the novel has chapter updates, acquiring the updated chapters; and determining the protagonist name set of the updated chapters and updating the novel summary information based on it.

In some optional implementations of this embodiment, the training step of the trigger model includes: acquiring a first training sample set, wherein a first training sample in the first training sample set comprises a first sample search request and a corresponding first sample search type label; and for a first training sample in the first training sample set, taking a first sample search request in the first training sample as input, taking a first sample search type label in the first training sample as output, and training to obtain a trigger model.

In some optional implementations of this embodiment, the training step of the parsing model includes the following. A second training sample set is acquired, in which each second training sample includes a second sample search request and a corresponding second sample retrieval expression. The second sample retrieval expression includes at least one of a novel name, an author name and a protagonist name. The parsing model is then trained on the second training samples in the set, with the second sample search request as input and the second sample retrieval expression as output.

Referring now to FIG. 7, there is illustrated a schematic diagram of a computer system 700 suitable for use in implementing an electronic device (e.g., server 103 of FIG. 1) in accordance with an embodiment of the present application. The electronic device shown in fig. 7 is only an example and should not be construed as limiting the functionality and scope of use of the embodiments of the application.

As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU) 701, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the system 700 are also stored. The CPU 701, ROM 702, and RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.

The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage section 708 including a hard disk or the like; and a communication section 709 including a network interface card such as a LAN card or a modem. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 710 as needed, so that a computer program read from it can be installed into the storage section 708 as needed.

In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 709, and/or installed from the removable medium 711. The above-described functions defined in the method of the present application are performed when the computer program is executed by a Central Processing Unit (CPU) 701.

The computer readable medium according to the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or electronic device. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units involved in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor, for example, described as: a processor includes a receiving unit, a determining unit, a parsing unit, and a retrieving unit. The names of these units do not constitute a limitation of the unit itself in each case, and the receiving unit may also be described as "a unit that receives a search request input by a user", for example.

As another aspect, the present application also provides a computer-readable medium. The medium may be contained in the electronic device described in the above embodiments, or it may exist alone without being incorporated into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: receive a search request input by a user; determine whether the search type corresponding to the search request is the novel search type; if so, input the search request into a pre-trained parsing model to obtain a retrieval expression corresponding to the search request, where the parsing model is used to identify at least one of a novel name, an author name and a protagonist name; and perform a novel retrieval based on the retrieval expression to obtain the novels to be pushed, and push them to the user.

The above description is only illustrative of the preferred embodiments of the present application and of the technical principles employed. Persons skilled in the art will appreciate that the scope of the invention referred to in the present application is not limited to technical solutions formed by the specific combination of the technical features described above. It also covers other technical solutions formed by any combination of those features or their equivalents without departing from the inventive concept described above. One example is a technical solution formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present application.

Claims

1. A method for searching information, comprising:

receiving a search request input by a user;

determining whether the search type corresponding to the search request is a novel search type;

if the search type corresponding to the search request is the novel search type, inputting the search request into a pre-trained parsing model to obtain a retrieval expression corresponding to the search request, wherein the parsing model is used for identifying at least one of a novel name, an author name and a protagonist name, and the retrieval expression is formed by splicing together at least one of the novel name, the author name and the protagonist name;

and performing a novel retrieval based on the retrieval expression to obtain the novels to be pushed, and pushing them to the user.

2. The method of claim 1, wherein the determining whether the search type to which the search request corresponds is a novel search type comprises:

inputting the search request into a pre-trained trigger model to obtain the search type corresponding to the search request, wherein the trigger model is used for identifying the search type based on at least one of a novel name, an author name and a protagonist name.

3. The method of claim 1, wherein performing the novel retrieval based on the retrieval expression to obtain the novels to be pushed comprises:

searching a pre-generated novel summary information set based on the retrieval expression, and determining the novels to be pushed, wherein the novel summary information comprises a novel name, an author name and protagonist names.

4. The method of claim 3, wherein searching the pre-generated novel summary information set based on the retrieval expression and determining the novels to be pushed comprises:

calculating the relevance between the retrieval expression and the novel summary information in the novel summary information set, and determining a candidate novel set;

calculating the relevance between the search request and the candidate novels in the candidate novel set, and determining a to-be-selected novel set;

and sorting and deduplicating the to-be-selected novel set based on the popularity of the to-be-selected novels in the set and their relevance to the search request, and determining the novels to be pushed.

5. A method according to claim 3, wherein the step of generating the novel summary information comprises:

acquiring the existing chapters of the novel;

performing word segmentation and part-of-speech analysis on the content of the existing chapters using a natural language processing (NLP) shallow lexical analysis model to obtain a word segmentation result and a part-of-speech analysis result;

selecting the person name set of the existing chapters from the word segmentation result based on the part-of-speech analysis result;

determining the protagonist name set of the existing chapters from the person name set of the existing chapters;

and generating the novel summary information based on the protagonist name set of the existing chapters.

6. The method of claim 5, wherein performing word segmentation and part-of-speech analysis on the content of the existing chapters using the natural language processing (NLP) shallow lexical analysis model to obtain a word segmentation result and a part-of-speech analysis result comprises:

first segmenting the content of the existing chapters with the NLP shallow lexical analysis model to obtain a vocabulary set, then recombining the vocabulary set into a vocabulary sequence whose semantics satisfy a preset condition, and determining the part of speech of each word in the vocabulary sequence, wherein the preset condition comprises at least one of the following: the semantics are reasonable, and the semantics are complete.

7. The method of claim 5, wherein determining the protagonist name set of the existing chapters from the person name set of the existing chapters comprises:

merging similar names in the person name set of the existing chapters to generate a merged person name set of the existing chapters;

filtering the merged person name set of the existing chapters based on a pre-generated stop word list to generate a character name set of the existing chapters;

and counting the word frequencies of the character names in the character name set of the existing chapters, and selecting the protagonist name set of the existing chapters from the character name set of the existing chapters.

8. The method of claim 5, wherein the step of generating the novel summary information further comprises:

if the novel has a chapter update, acquiring the updated chapter;

and determining a protagonist name set of the updated chapter, and updating the novel summary information based on the protagonist name set of the updated chapter.
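One way to apply claim 8 without reprocessing the whole novel is to keep cumulative name counts per novel and add only the updated chapter's protagonists. This is an assumption about how the update is carried out; `ProtagonistIndex` is a hypothetical name, and a revised (rather than newly added) chapter would need its old counts subtracted first.

```python
from collections import Counter
from typing import Dict, Iterable, List


class ProtagonistIndex:
    """Cumulative per-novel name counts, updated one chapter at a time (assumed design)."""

    def __init__(self, top_k: int = 3):
        self.top_k = top_k
        self.counts: Dict[str, Counter] = {}

    def add_chapter(self, novel_id: str, chapter_protagonists: Iterable[str]) -> None:
        # Only the updated chapter's names are counted; earlier chapters are untouched.
        self.counts.setdefault(novel_id, Counter()).update(chapter_protagonists)

    def protagonists(self, novel_id: str) -> List[str]:
        counter = self.counts.get(novel_id, Counter())
        return [name for name, _ in counter.most_common(self.top_k)]
```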

9. The method of claim 2, wherein the training of the trigger model comprises:

acquiring a first training sample set, wherein a first training sample in the first training sample set comprises a first sample search request and a corresponding first sample search type label;

and for a first training sample in the first training sample set, taking a first sample search request in the first training sample as input, taking a first sample search type label in the first training sample as output, and training to obtain the trigger model.
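Claim 9 does not fix a model family for the trigger model. As a stand-in, the sketch below trains a multinomial naive Bayes classifier with Laplace smoothing over character unigrams and bigrams from (search request, search type label) pairs; the classifier choice, the feature set and the class name `NaiveBayesTrigger` are all assumptions.

```python
import math
from collections import Counter, defaultdict
from typing import Iterable, List, Tuple


def features(text: str) -> List[str]:
    """Character unigrams and bigrams; usable on Chinese queries without segmentation."""
    chars = list(text)
    return chars + [a + b for a, b in zip(chars, chars[1:])]


class NaiveBayesTrigger:
    def fit(self, samples: Iterable[Tuple[str, str]]) -> "NaiveBayesTrigger":
        # samples: (first sample search request, first sample search type label)
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in samples:
            feats = features(text)
            self.label_counts[label] += 1
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text: str) -> str:
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for f in features(text):
                # Laplace-smoothed log likelihood of each feature
                score += math.log((self.feature_counts[label][f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```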

10. The method according to any one of claims 1-8, wherein the training of the analysis model comprises:

obtaining a second training sample set, wherein a second training sample in the second training sample set comprises a second sample search request and a corresponding second sample retrieval expression, and the second sample retrieval expression comprises at least one of a novel name, an author name and a protagonist name;

and for a second training sample in the second training sample set, taking the second sample search request in the second training sample as input, taking the second sample retrieval expression in the second training sample as output, and training to obtain the analysis model.
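If the analysis model is framed as a character-level sequence labeler (an assumption; claim 10 only specifies request-to-expression training pairs), each second training sample can be converted into BIO tags as sketched here. The field names `NOVEL`, `AUTHOR` and `PROTAGONIST` and the dict-shaped expression are hypothetical.

```python
from typing import Dict, List


def bio_tags(request: str, expression: Dict[str, str]) -> List[str]:
    """Convert (search request, retrieval expression) into per-character BIO tags.

    expression maps a hypothetical field name to a span that is assumed to appear
    verbatim in the request; spans that are empty or absent are skipped.
    """
    tags = ["O"] * len(request)
    for field_name, span in expression.items():
        if not span:
            continue
        start = request.find(span)
        if start < 0:
            continue  # span not present verbatim; skip rather than guess
        tags[start] = "B-" + field_name
        for i in range(start + 1, start + len(span)):
            tags[i] = "I-" + field_name
    return tags
```

Overlapping spans would overwrite each other here, so real training data would need a conflict rule.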

11. An apparatus for searching information, comprising:

a receiving unit configured to receive a search request input by a user;

a determining unit configured to determine whether a search type corresponding to the search request is a novel search type;

an analysis unit configured to, if the search type corresponding to the search request is a novel search type, input the search request into a pre-trained analysis model to obtain a retrieval expression corresponding to the search request, wherein the analysis model is used for identifying at least one of a novel name, an author name and a protagonist name, and the retrieval expression is formed by concatenating at least one of the novel name, the author name and the protagonist name;

and a retrieval unit configured to perform novel retrieval based on the retrieval expression to obtain the novels to be pushed, and to push the novels to the user.
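The interplay of the units in claim 11 can be sketched with each unit passed in as a callable. The `field:value` prefixes and the ` AND ` separator used to splice the retrieval expression are hypothetical, since the claims only state that the expression is spliced from at least one of the three names.

```python
from typing import Callable, Dict, List, Optional


def build_expression(novel_name: Optional[str] = None,
                     author_name: Optional[str] = None,
                     protagonist_name: Optional[str] = None,
                     sep: str = " AND ") -> str:
    """Splice whichever names were recognized into one retrieval expression."""
    parts = []
    if novel_name:
        parts.append(f"novel:{novel_name}")
    if author_name:
        parts.append(f"author:{author_name}")
    if protagonist_name:
        parts.append(f"protagonist:{protagonist_name}")
    if not parts:
        raise ValueError("at least one of novel, author or protagonist name is required")
    return sep.join(parts)


def search(request: str,
           is_novel_search: Callable[[str], bool],
           analyze: Callable[[str], Dict[str, str]],
           retrieve: Callable[[str], List[str]]) -> Optional[List[str]]:
    """Determining unit -> analysis unit -> retrieval unit."""
    if not is_novel_search(request):
        return None  # not a novel search; left to other handlers (assumed)
    return retrieve(build_expression(**analyze(request)))
```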

12. The apparatus of claim 11, wherein the determining unit comprises:

a triggering subunit configured to input the search request into a pre-trained trigger model to obtain the search type corresponding to the search request, wherein the trigger model is used for identifying the search type based on at least one of a novel name, an author name and a protagonist name.

13. The apparatus of claim 11, wherein the retrieval unit comprises:

a retrieval subunit configured to retrieve from a pre-generated novel summary information set based on the retrieval expression to determine the novels to be pushed, wherein the novel summary information comprises a novel name, an author name and a protagonist name.

14. The apparatus of claim 13, wherein the retrieval subunit comprises:

a first calculation module configured to calculate the relevance between the retrieval expression and the novel summary information in the novel summary information set to determine a candidate novel set;

a second calculation module configured to calculate the relevance between the search request and the candidate novels in the candidate novel set to determine a set of novels to be selected;

and a sorting and de-duplication module configured to sort and de-duplicate the set of novels to be selected, based on the popularity of the novels to be selected and their relevance to the search request, to obtain the set of novels to be pushed.
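The two-stage relevance computation and the sort-and-deduplicate step of claim 14 can be sketched as follows. Character-set Jaccard overlap stands in for the unspecified relevance measure, and the linear blend of relevance and popularity (`alpha`), the candidate threshold and the de-duplication key `novel_id` are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Novel:
    novel_id: str
    summary: str    # spliced novel, author and protagonist names
    hotness: float  # popularity, assumed pre-normalized to [0, 1]


def relevance(a: str, b: str) -> float:
    """Jaccard overlap of character sets; a stand-in for the unspecified measure."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def rank(expression: str, request: str, library: List[Novel],
         candidate_threshold: float = 0.3, alpha: float = 0.7, top_n: int = 10) -> List[Novel]:
    # Stage 1: retrieval expression vs. summary information -> candidate set.
    candidates = [n for n in library if relevance(expression, n.summary) >= candidate_threshold]
    # Stage 2: search request vs. candidates, blended with popularity.
    scored = [(alpha * relevance(request, n.summary) + (1 - alpha) * n.hotness, n)
              for n in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # De-duplicate by novel_id, keeping the highest-scoring copy.
    seen, result = set(), []
    for _, novel in scored:
        if novel.novel_id not in seen:
            seen.add(novel.novel_id)
            result.append(novel)
    return result[:top_n]
```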

15. The apparatus of claim 13, wherein the step of generating the novel summary information comprises:

acquiring an existing chapter of the novel;

16. The apparatus of claim 15, wherein performing word segmentation and part-of-speech analysis on the content of the existing chapter using a natural language processing (NLP) shallow lexical analysis model to obtain a word segmentation result and a part-of-speech analysis result comprises:

17. The apparatus of claim 15, wherein determining the protagonist name set of the existing chapter from the person name set of the existing chapter comprises:

18. The apparatus of claim 15, wherein the step of generating the novel summary information further comprises:

if the novel has a chapter update, acquiring the updated chapter;

19. The apparatus of claim 12, wherein the training of the trigger model comprises:

20. The apparatus of any one of claims 11-18, wherein the training of the analysis model comprises:

21. An electronic device, comprising:

one or more processors;

a storage device having one or more programs stored thereon,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-10.

22. A computer readable medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the method of any of claims 1-10.