CN116775974A

CN116775974A - Information screening method

Info

Publication number: CN116775974A
Application number: CN202310781347.3A
Authority: CN
Inventors: 侯天宇; 石伟; 闫文敏; 卢漫天; 姚凯义; 王鹏
Original assignee: Zhongzi High Tech Consulting Center Co ltd
Current assignee: Zhongzi High Tech Consulting Center Co ltd
Priority date: 2023-06-29
Filing date: 2023-06-29
Publication date: 2023-09-19
Anticipated expiration: 2043-06-29
Also published as: CN116775974B

Abstract

The invention provides an information screening method, relating to the technical field of information screening, comprising the following steps: acquiring and analyzing search content input by a user to obtain search keywords; acquiring first information associated with the search keywords, analyzing the timeliness of each piece of first information, and screening to obtain second information; performing a weighted calculation of the reliability of each piece of second information, and screening out third information that meets a first preset condition; calculating the degree of correlation between the content and the title of each piece of third information, and screening out fourth information that meets a second preset condition; and ranking the screened fourth information by relevance to obtain a final screening result. By judging the reliability, timeliness and relevance of the information sources, irrelevant and outdated information is filtered out, ensuring that the user obtains high-quality, highly credible information.

Description

Information screening method

Technical Field

The invention relates to the technical field of information screening, in particular to a method for screening information.

Background

With the rapid development of internet technology, the internet has become a global information data platform, and more and more users rely on it as their main source of information. Faced with massive web page information resources, users typically use search engine services to obtain the information they need. However, although a search engine can help the user obtain relevant web page resources to a certain extent, web content can be published freely and the internet is open and unbounded, so the quality of information resources is difficult to control and manage effectively, and a large amount of junk information is returned. That is, among the many information resources provided by the search engine, the top-ranked information is not necessarily of high quality or high credibility, and false, erroneous or outdated information may even be present.

Therefore, the invention provides a method for screening information.

Disclosure of Invention

The invention provides an information screening method that obtains search information by analyzing the search content input by a user, then judges and screens the reliability, timeliness and relevance of that search information, ensuring that the user obtains high-quality, highly credible information.

The invention provides an information screening method, which comprises the following steps:

step 1: acquiring and analyzing search content input by a user to obtain search keywords;

step 2: acquiring first information associated with the search keywords from an information data platform, analyzing the timeliness of each piece of first information, and screening to obtain second information;

step 3: performing a weighted calculation of the reliability of each piece of second information based on the source information corresponding to that second information, and screening out third information that meets a first preset condition;

step 4: calculating the degree of correlation between the content and the title of each piece of third information based on the page layout of that third information, and screening out fourth information that meets a second preset condition;

step 5: ranking the screened fourth information by relevance, the ranking serving as the final screening result.
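
The five steps can be read as a chain of filters followed by a sort. The Python sketch below is illustrative only: the callables `is_timely`, `reliability` and `relevance` are hypothetical stand-ins for steps 2 to 4, the sample records are invented, and treating each preset condition as a `>=` threshold (0.6 is the value used in the embodiments) is an assumption.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def screen_information(
    first_info: List[T],
    is_timely: Callable[[T], bool],      # hypothetical stand-in for step 2
    reliability: Callable[[T], float],   # hypothetical stand-in for step 3
    relevance: Callable[[T], float],     # hypothetical stand-in for step 4
    first_threshold: float = 0.6,        # assumed form of the first preset condition
    second_threshold: float = 0.6,       # assumed form of the second preset condition
) -> List[T]:
    """Return the fourth information, ranked by relevance in descending order."""
    second = [item for item in first_info if is_timely(item)]
    third = [item for item in second if reliability(item) >= first_threshold]
    scored = [(relevance(item), item) for item in third]
    fourth = [(score, item) for score, item in scored if score >= second_threshold]
    fourth.sort(key=lambda pair: pair[0], reverse=True)  # step 5
    return [item for _, item in fourth]


# Invented sample records; the scores are placeholders, not measured values.
pages = [
    {"id": "a", "timely": True, "reliability": 0.85, "relevance": 0.70},
    {"id": "b", "timely": False, "reliability": 0.90, "relevance": 0.90},
    {"id": "c", "timely": True, "reliability": 0.50, "relevance": 0.95},
    {"id": "d", "timely": True, "reliability": 0.70, "relevance": 0.90},
    {"id": "e", "timely": True, "reliability": 0.80, "relevance": 0.40},
]

result = screen_information(
    pages,
    is_timely=lambda p: p["timely"],
    reliability=lambda p: p["reliability"],
    relevance=lambda p: p["relevance"],
)
result_ids = [p["id"] for p in result]
```

Passing the three judgments in as callables keeps the sketch independent of how each step is actually computed.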

Preferably, acquiring and analyzing search content input by a user to obtain a search keyword includes:

acquiring search content input by a user, and marking the part of speech of the search content;

based on the part-of-speech tagging result, segmenting the search content using conjunctions, stop words, combinations of numerals and measure words, and punctuation marks as delimiters, and generating the search keywords.

Preferably, first information related to the search keyword is obtained from an information data platform, timeliness of each first information is analyzed, and second information is obtained through screening, including:

acquiring first information associated with the search keywords from an information data platform, wherein the information in the information data platform comprises information of a non-fixed type and information of a fixed type;

judging the type of the first information, and if the first information is of a fixed type, regarding the first information as second information, wherein the fixed type comprises: fixed information, information that remains unchanged over the long term, and information whose updating has ended or that is explicitly pointed;

if the first information is of a non-fixed type, acquiring the release time of each first information, classifying according to year, month and day rules, and determining the first quantity released in a time interval and a release information heat list, wherein the non-fixed type comprises: news event category, periodic update information category, and continuous update information category;

based on the determination result, judging the timeliness of each first information, and screening and obtaining second information based on the judgment result.

Preferably, the weighting calculation is performed on the reliability of each piece of second information based on the source information corresponding to each piece of second information, and third information meeting the first preset condition is screened, including:

tracing source information corresponding to each piece of second information, and obtaining a website domain name, a network security protection grade, a website collapse frequency and website backup information to carry out first evaluation;

acquiring user access information of a website corresponding to the source information, and performing second evaluation;

performing third evaluation according to the historical advertisement receiving quantity and the effective link quantity of the website corresponding to the source information;

weighting calculation is carried out on the reliability of the corresponding second information based on the first evaluation result, the second evaluation result and the third evaluation result;

and screening the second information based on the calculation result and the first preset condition to obtain third information.

Preferably, the weighting calculation for the reliability of the corresponding second information based on the first evaluation result, the second evaluation result, and the third evaluation result includes:

$$Y_i = \alpha P_1 + \beta P_2 + \gamma P_3$$

wherein $Y_i$ represents the weighted reliability of the $i$-th second information; $\alpha$ represents the weight corresponding to the first evaluation result, and $P_1$ represents the first evaluation result of the $i$-th second information; $y_i$ represents the website domain name score of the website corresponding to the $i$-th second information, and $y_{max}$ represents the maximum website domain name score; $p_i$ represents the network security protection level of the website corresponding to the $i$-th second information, and $p_{max}$ represents the highest network security protection level; $e_i$ represents the website collapse frequency of the website corresponding to the $i$-th second information, and $e_{max}$ represents the maximum allowable website collapse frequency; $z_i$ represents the website backup information score of the website corresponding to the $i$-th second information, and $z_{max}$ represents the maximum website backup information score; $\beta$ represents the weight corresponding to the second evaluation result, and $P_2$ represents the second evaluation result of the $i$-th second information; $x_i$ represents the average daily user access amount of the website corresponding to the $i$-th second information, and $x_{max}$ represents the maximum of the average daily user access amounts of the websites corresponding to all second information; $t_i$ represents the average access time of the website corresponding to the $i$-th second information, and $t_{max}$ represents the maximum of the average access times of the websites corresponding to all second information; $\gamma$ represents the weight corresponding to the third evaluation result, and $P_3$ represents the third evaluation result of the $i$-th second information; $C_i$ represents the historical advertisement acceptance quantity of the website corresponding to the $i$-th second information, $C_{max}$ represents the maximum historical advertisement acceptance quantity among the websites corresponding to all second information, and $C_{ave}$ represents the average historical advertisement acceptance quantity of the websites corresponding to all second information; $l_i$ represents the actual number of links in the website corresponding to the $i$-th second information, and $l_{max}$ represents the maximum number of links set for the websites of the second information; and $\alpha + \beta + \gamma = 1$.

Preferably, calculating a correlation degree between the content and the title in each third information based on the page layout condition of each third information, and screening fourth information meeting a second preset condition, including:

acquiring title content and text content in a page corresponding to each third information based on the page layout condition of each third information, and marking the parts of speech;

analyzing the title structure, and determining first weight values of words at different positions in the title;

segmenting the title content based on the part-of-speech tagging result, and determining a second weight value of each word in the segmentation result based on the segmentation result and a preset part-of-speech priority order;

determining the total weight value of each word in the segmentation result based on the first weight value and the second weight value;

based on the search keywords and the segmentation results, determining matching words which are associated with the search keywords in the segmentation results, acquiring the total weight value of each matching word, determining the lowest total weight value in the matching words, and taking the lowest total weight value as a title keyword acquisition standard;

determining a first keyword set corresponding to the title content based on an acquisition standard;

setting a corresponding number of second keyword sets to the text content based on the number of elements in the first keyword sets;

based on part of speech tagging results, determining occurrence frequencies of words with consistent parts of speech in text content, extracting all words with part of speech corresponding to the occurrence frequencies of the first n1 words, and respectively filling the words into corresponding second keyword sets, wherein the element number is also n1, and each second keyword set corresponds to all words with one part of speech;

acquiring keyword intersections of the first keyword set and each filled second keyword set, and determining a first correlation degree between the corresponding keyword intersection and the first keyword set based on a total weight value of the keyword intersections;

determining a second relevance between the corresponding keyword intersection and the text content based on the word frequency of the second keyword set after the corresponding filling of each keyword in the corresponding keyword intersection;

and calculating the relevance of the title and the text content in each third information corresponding page based on all the first relevance and all the second relevance.

Preferably, calculating the relevance between the title and the text content in the page corresponding to each third information based on all the first relevance and all the second relevance includes:

wherein $P_d$ represents the relevance of the title to the text content in the $d$-th page; $qd_k$ represents the total weight value corresponding to the $k$-th keyword intersection in the $d$-th page; $Fd$ represents the total weight value corresponding to the first keyword set of the $d$-th page; $s$ represents the number of keyword intersections in the $d$-th page; $fd_n$ represents the total word frequency, in the text content, of all keywords in the $n$-th second keyword set of the $d$-th page; $dd_n$ represents the total word frequency, in the text content, of all keywords in the keyword intersection corresponding to the $n$-th second keyword set of the $d$-th page; $n1$ represents the number of second keyword sets in the $d$-th page and is consistent with the number of keyword intersections; $a1$ represents a first proportion coefficient; and $a2$ represents a second proportion coefficient.

Preferably, the sorting of the relevance of the screened fourth information is performed, and as a final screening result, the method includes:

acquiring the correlation degree of the title content and the text content in each piece of fourth information, and comparing the correlation degree of the title content and the text content corresponding to each piece of fourth information;

based on the comparison result, sorting the fourth information in descending order of correlation degree, and displaying the resulting order as the final screening result.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and drawings.

The technical scheme of the invention is further described in detail through the drawings and the embodiments.

Drawings

The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:

fig. 1 is a flowchart of a method for screening information according to an embodiment of the present invention.

Detailed Description

The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.

Example 1

The invention provides an information screening method, as shown in figure 1, comprising the following steps:

In this embodiment, the information data platform refers to a platform that aggregates basic information and serves users, and the information data platform retrieves information for the search keywords through a search engine, such as Baidu or Google;

in this embodiment, the first information refers to network information whose title is associated with the search keywords. For example, if the search content is "Dell computer disassembly", the search keywords are "Dell", "computer" and "disassembly"; the information whose web page title contains "Dell", "computer" and "disassembly", together with its release time, release website and release page, is the first information;

in this embodiment, timeliness is illustrated as follows: if first information 1 and first information 2, both associated with event 1, were released on 2023-01-04 and 2023-05-06 respectively, then first information 2 is taken to be more comprehensive than first information 1, and the timeliness of first information 2 is stronger than that of first information 1;

in this embodiment, the source information corresponding to the second information refers to the website domain name, the network security protection level, the website collapse frequency, the website backup information, the daily average user access amount and average access time, the historical advertisement receiving number and the effective link number of the corresponding website;

in this embodiment, the weighted calculation of the reliability of each piece of second information means the weighted calculation of the reliability of its source website, and the result of the weighted calculation lies within [0,1];

in this embodiment, the first preset condition refers to, for example: the second information weighting result is 0.85, and the first preset condition is 0.6, and the second information can be used as third information through screening;

in this embodiment, the page layout condition of each third information refers to the title and text distribution condition of the page where each third information is located, and title information and corresponding text content information in the page are obtained according to the title and text distribution condition;

in this embodiment, the second preset condition refers to that if the correlation degree between the third information title content and the text content is 0.7 and the second preset condition is 0.6, the information is filtered and can be used as fourth information;

in this embodiment, if the correlation between the title and the text content in the fourth information 1 is 0.7, the correlation of the fourth information 1 is 0.7.

The beneficial effects of the technical scheme are as follows: search information is obtained by analyzing the search content input by the user; the reliability, timeliness and relevance of the search information sources are judged; irrelevant and outdated information is filtered out; and the user is ensured to obtain high-quality, highly credible information.

Example 2

The invention provides an information screening method, which is used for acquiring and analyzing search contents input by a user to obtain search keywords, and comprises the following steps:

In this embodiment, if the search content input by the user is "an intelligent control system and method for poultry breeding environment", the part-of-speech tagging result is: a/numeral and measure word, poultry/noun, breeding/verb, environment/noun, (stop word), intelligence/noun, control/verb, system/noun, and/stop word, method/noun; the search keywords obtained after segmentation are: poultry/noun, breeding/verb, environment/noun, intelligence/noun, control/verb, system/noun, method/noun.
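
The segmentation in this example amounts to dropping the delimiter tokens from a part-of-speech-tagged sequence. The following Python sketch assumes the tagging has already been done by some tagger; the tag names (`numeral_measure`, `stop_word` and so on) and the function name are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical tag names for the token classes used as delimiters.
DELIMITER_TAGS = {"conjunction", "stop_word", "numeral_measure", "punctuation"}


def extract_keywords(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep every (word, tag) pair whose tag is not a delimiter class."""
    return [(word, tag) for word, tag in tagged if tag not in DELIMITER_TAGS]


# Tagged form of the example search content (tags written out by hand here).
tagged_query = [
    ("a", "numeral_measure"),
    ("poultry", "noun"),
    ("breeding", "verb"),
    ("environment", "noun"),
    ("intelligence", "noun"),
    ("control", "verb"),
    ("system", "noun"),
    ("and", "stop_word"),
    ("method", "noun"),
]

keywords = extract_keywords(tagged_query)
keyword_words = [word for word, _ in keywords]
```

The part-of-speech tag is kept alongside each keyword because later steps rank words by part of speech.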

The beneficial effects of the technical scheme are as follows: by analyzing the search content input by the user, the search keywords are determined, so that the search information acquired is relevant to the user's search content.

Example 3

The invention provides a screening method of information, which acquires first information related to search keywords from an information data platform, analyzes the timeliness of each first information, screens to obtain second information, and comprises the following steps:

In this embodiment, if the first information is of a fixed type, i.e. fixed information, long-term unchanged information, or information whose updating has ended or that is explicitly pointed, its validity does not change over time, so its timeliness does not need to be judged;

in this embodiment, if the first information is of a non-fixed type, the judgment is made based on the first quantity released in each time interval and the release information heat list. For example: if the first quantity and the heat of information 1 do not change periodically over time, the heat of information 1 is highest in January, the first quantity released is largest in the January 25 interval, and the change in the first quantity over the January 26 and January 27 intervals does not exceed 5%, then the first information in the January 27 interval has the greatest timeliness, and the second information obtained by screening is the first information in the January 27 interval;

if the first quantity and the heat of information 2 change periodically over time, and the first quantity and the heat in the January 14, January 21 and January 28 intervals are each the maximum of their respective periods, then the second information obtained by screening is the first information in the January 28 interval.
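
One possible reading of the two examples above is sketched below in Python. It is an interpretation rather than the method itself: the non-periodic rule is taken to mean "start at the interval with the largest release count and move forward while the count changes by no more than 5% from one interval to the next", and the periodic rule is taken to mean "pick the most recent period maximum". The function names, the zero-padded `MM-DD` labels and the release counts are all invented for illustration.

```python
from typing import List, Tuple


def latest_stable_interval(counts: List[Tuple[str, int]], tolerance: float = 0.05) -> str:
    """Non-periodic case: counts is a time-ordered list of (interval label, release count).

    Starting from the peak interval, advance while the interval-to-interval
    change stays within the tolerance, and return the last interval reached.
    """
    peak = max(range(len(counts)), key=lambda i: counts[i][1])
    chosen = peak
    for i in range(peak + 1, len(counts)):
        previous = counts[i - 1][1]
        if previous == 0:
            break  # relative change is undefined; stop at the current choice
        change = abs(counts[i][1] - previous) / previous
        if change > tolerance:
            break
        chosen = i
    return counts[chosen][0]


def latest_periodic_peak(peak_intervals: List[str]) -> str:
    """Periodic case: return the most recent period maximum.

    Labels are assumed to be zero-padded so that string order equals time order.
    """
    return max(peak_intervals)


# Invented release counts shaped like the information 1 example.
info_1_counts = [("01-24", 80), ("01-25", 100), ("01-26", 97), ("01-27", 95), ("01-28", 60)]
info_1_interval = latest_stable_interval(info_1_counts)

# Period maxima from the information 2 example.
info_2_interval = latest_periodic_peak(["01-14", "01-21", "01-28"])
```

With these sample counts the walk stops at the January 27 interval, because the drop to the January 28 count exceeds the 5% tolerance.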

The beneficial effects of the technical scheme are as follows: through analyzing the timeliness of the retrieval information, the expired information is screened and filtered, so that the information with stronger timeliness can be obtained by the user, and the quality of the information obtained by the user is ensured.

Example 4

The invention provides an information screening method, which carries out weighted calculation on the reliability of each second information based on source information corresponding to each second information and screens third information meeting a first preset condition, and comprises the following steps:

In this embodiment, the first evaluation is performed on the website, and scoring is required to be performed on the website domain name and the website backup information;

in this embodiment, website domain names are classified by institution type into four categories: government institutions/organizations (.gov), non-profit websites (.org), educational institutions (.edu) and commercial websites (.net, .com). Because different institutions differ in reliability, ordered as government institution/organization (.gov) > non-profit website (.org) > educational institution (.edu) > commercial website (.net, .com), they are scored differently: 100 points for government institutions/organizations (.gov), 80 points for non-profit websites (.org), 60 points for educational institutions (.edu) and 40 points for commercial websites (.net, .com);

in this embodiment, scoring the website backup information includes a number of aspects: whether the backup is automatic, whether the backup is daily, whether the backup is external to the server or not, and whether the backup is incremental or not; if the backup information of the website 1 meets the requirements of automatic backup, daily backup and server external backup, the score of the website 1 is 75 points; if the backup information of the website 2 meets the automatic backup and the daily backup, the score of the website 2 is 50 points;

in the embodiment, the level of the network security protection is classified into 10 levels, and the higher the level of the network security protection is, the stronger the reliability of the website is;

in this embodiment, the user access information includes the average daily user access amount of the website and the average access time per user; when the average daily user access amount is computed, multiple accesses by the same user within one day are counted as one access;

in the embodiment, the value ranges of the first evaluation result, the second evaluation result and the third evaluation result are the same and are all in the range of the [0,1] interval;

in this embodiment, the result of the weighted calculation lies within the interval [0,1], and the first preset condition is generally set to 0.6.
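
The domain name and backup scores described in this embodiment can be sketched as two small lookup functions. The Python below is a minimal sketch under stated assumptions: matching on the last two labels of the host name, returning 0 for an unlisted domain type, and awarding 25 points per satisfied backup aspect (inferred from the 75-point and 50-point examples) are all assumptions; the function names are hypothetical.

```python
# Scores per domain type as given in this embodiment.
DOMAIN_SCORES = {"gov": 100, "org": 80, "edu": 60, "net": 40, "com": 40}


def domain_score(hostname: str) -> int:
    """Score a host name by its domain type.

    Assumption: only the last two labels are inspected, so that names such as
    "example.gov.cn" are still recognised; unlisted types score 0.
    """
    labels = hostname.lower().rstrip(".").split(".")
    for label in reversed(labels[-2:]):
        if label in DOMAIN_SCORES:
            return DOMAIN_SCORES[label]
    return 0


def backup_score(automatic: bool, daily: bool, off_server: bool, incremental: bool) -> int:
    """Score website backup information.

    Assumption: each of the four aspects is worth 25 points, which reproduces
    the 75-point (three aspects) and 50-point (two aspects) examples.
    """
    return 25 * sum([automatic, daily, off_server, incremental])


website_1_backup = backup_score(automatic=True, daily=True, off_server=True, incremental=False)
website_2_backup = backup_score(automatic=True, daily=True, off_server=False, incremental=False)
```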

The beneficial effects of the technical scheme are as follows: the reliability of the information source website is calculated, and the information is screened based on the calculation result, so that the user can obtain the correct information with higher reliability, and the interference of errors and false information to the user is avoided.

Example 5

The invention provides an information screening method, which carries out weighted calculation on the reliability of corresponding second information based on a first evaluation result, a second evaluation result and a third evaluation result, and comprises the following steps:

$$Y_i = \alpha P_1 + \beta P_2 + \gamma P_3$$

wherein $Y_i$ represents the weighted reliability of the $i$-th second information; $\alpha$ represents the weight corresponding to the first evaluation result, and $P_1$ represents the first evaluation result of the $i$-th second information; $y_i$ represents the website domain name score of the website corresponding to the $i$-th second information, and $y_{max}$ represents the maximum website domain name score; $p_i$ represents the network security protection level of the website corresponding to the $i$-th second information, and $p_{max}$ represents the highest network security protection level; $e_i$ represents the website collapse frequency of the website corresponding to the $i$-th second information, and $e_{max}$ represents the maximum allowable website collapse frequency; $z_i$ represents the website backup information score of the website corresponding to the $i$-th second information, and $z_{max}$ represents the maximum website backup information score; $\beta$ represents the weight corresponding to the second evaluation result, and $P_2$ represents the second evaluation result of the $i$-th second information; $x_i$ represents the average daily user access amount of the website corresponding to the $i$-th second information, and $x_{max}$ represents the maximum of the average daily user access amounts of the websites corresponding to all second information; $t_i$ represents the average access time of the website corresponding to the $i$-th second information, and $t_{max}$ represents the maximum of the average access times of the websites corresponding to all second information; $\gamma$ represents the weight corresponding to the third evaluation result, and $P_3$ represents the third evaluation result of the $i$-th second information; $C_i$ represents the historical advertisement acceptance quantity of the website corresponding to the $i$-th second information, $C_{max}$ represents the maximum historical advertisement acceptance quantity among the websites corresponding to all second information, and $C_{ave}$ represents the average historical advertisement acceptance quantity of the websites corresponding to all second information; $l_i$ represents the actual number of links in the website corresponding to the $i$-th second information, and $l_{max}$ represents the maximum number of links set for the websites of the second information; and $\alpha + \beta + \gamma = 1$.

In this embodiment, α is generally 0.4, β is generally 0.3, and γ is generally 0.3.
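
As a worked illustration of the weighted sum αP1 + βP2 + γP3 with the typical weights given here, the Python sketch below takes the three evaluation results as already-computed inputs in [0,1]; how each evaluation result is derived from the website measurements is not modelled. The function names, the sample evaluation values and the `>=` comparison against the 0.6 threshold are assumptions made for illustration.

```python
def weighted_reliability(
    p1: float,
    p2: float,
    p3: float,
    alpha: float = 0.4,
    beta: float = 0.3,
    gamma: float = 0.3,
) -> float:
    """Combine the three evaluation results into one reliability value in [0, 1]."""
    if abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("alpha + beta + gamma must equal 1")
    for value in (p1, p2, p3):
        if not 0.0 <= value <= 1.0:
            raise ValueError("evaluation results must lie in [0, 1]")
    return alpha * p1 + beta * p2 + gamma * p3


def meets_first_condition(score: float, threshold: float = 0.6) -> bool:
    """Assumed form of the first preset condition: score at or above the threshold."""
    return score >= threshold


# Invented evaluation results for one piece of second information.
score = weighted_reliability(0.9, 0.8, 0.85)   # 0.36 + 0.24 + 0.255
kept_as_third_information = meets_first_condition(score)
```

Because the weights sum to 1 and each evaluation result lies in [0,1], the combined value also stays in [0,1].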

The beneficial effects of the technical scheme are as follows: by calculating the reliability of the information source website, the user is facilitated to acquire correct information with higher reliability, and the interference of error information and false information to the user is avoided.

Example 6

The invention provides a screening method of information, which calculates the relativity between the content and the title in each third information based on the page layout condition of each third information, screens fourth information meeting a second preset condition, and comprises the following steps:

In this embodiment, words at different positions in the title have different first weight values. For example, in the title "an intelligent control system and method for poultry breeding environment", the first weight value of "breeding environment" is larger than that of "intelligent control"; the value range of the first weight value is [0,1];

in this embodiment, based on the part-of-speech tagging result, the title content is segmented using conjunctions, stop words, combinations of numerals and measure words, and punctuation marks. For example, for the title "an intelligent control system and method for poultry breeding environment", the segmentation result is: poultry/noun, breeding/verb, environment/noun, intelligence/noun, control/verb, system/noun, method/noun;

in this embodiment, the priorities of the different parts of speech are different, and if the noun priority is greater than the verb priority, the second weight value of the noun is greater than the second weight value of the verb, and the range of the second weight value is [0,1];

in this embodiment, if the first weight value of the segmentation word 1 is 0.4 and the second weight value is 0.3, the total weight value of the segmentation word 1 is 0.7;

in this embodiment, if the segmentation result is poultry/noun, breeding/verb, environment/noun, intelligence/noun, control/verb, system/noun, method/noun and the search keywords are [poultry, breeding, gas, temperature, control, system], the matching words are [poultry, breeding, control, system];

in this embodiment, if the matching words are poultry, breeding, control and system, with total weight values of 0.6, 0.7, 0.4 and 0.8 respectively, the lowest total weight value is 0.4, so the acquisition standard for first keywords is a total weight value greater than 0.4; the words in the segmentation result whose total weight value is greater than 0.4 are the first keywords, and together they form the first keyword set corresponding to the title content;

in this embodiment, if the words of one part of speech in text 1, ranked by occurrence frequency in descending order, are poultry, temperature, gas, environment, intelligence, method, and the number of elements in the first keyword set is 4, the corresponding second keyword set is {poultry, temperature, gas, environment};

in this embodiment, if the number of elements in the first keyword set in the page is 10, there are 10 second keyword sets in the page, and the number of keywords in each second keyword set is 10;

in the embodiment, if the word matched with the search keyword in the title is environment, intelligent and controlled, and the total weight value of the control is the lowest and is 0.6, the word with the total weight value higher than 0.6 in the segmentation result is the first keyword;

in this embodiment, each second keyword set has a corresponding keyword intersection. For example, if the first keyword set is {poultry, breeding, environment, intelligence, control, system, method} and second keyword set 1 is {poultry, temperature, gas, environment, intelligence, method}, the corresponding keyword intersection is {poultry, environment, intelligence, method};

in this embodiment, if the total weight values of the keywords in keyword intersection 1 are 0.8, 0.8 and 0.6, the total weight value of keyword intersection 1 is 2.2; if the total weight value of the first keyword set is 10, the first relevance between keyword intersection 1 and the title is 2.2/10 = 0.22;

in this embodiment, the second correlation degree is calculated, and the correlation degree between each keyword intersection and the corresponding second keyword set needs to be obtained, for example: the total word frequency of the keyword intersection is 30, and the corresponding total word frequency of the second keyword set is 200, so that the correlation degree between the intersection and the corresponding second keyword set is 0.15.
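
The set operations and the two ratios in this embodiment can be sketched in Python as follows. The function names are hypothetical, and the total weight values and word frequencies are invented; the frequencies are chosen so that the second relevance reproduces the 30/200 = 0.15 example. The strict "greater than" comparison against the lowest matching-word weight follows the examples above.

```python
from typing import Dict, List, Set


def first_keyword_set(total_weights: Dict[str, float], matching_words: List[str]) -> Set[str]:
    """Title words whose total weight exceeds the lowest weight among the matching words."""
    standard = min(total_weights[word] for word in matching_words)
    return {word for word, weight in total_weights.items() if weight > standard}


def first_relevance(intersection: Set[str], first_set: Set[str], total_weights: Dict[str, float]) -> float:
    """Total weight of the intersection divided by total weight of the first keyword set."""
    return sum(total_weights[w] for w in intersection) / sum(total_weights[w] for w in first_set)


def second_relevance(intersection: Set[str], second_set: Set[str], word_freq: Dict[str, int]) -> float:
    """Total word frequency of the intersection divided by that of the second keyword set."""
    return sum(word_freq[w] for w in intersection) / sum(word_freq[w] for w in second_set)


# Invented total weight values for the segmented title words.
title_weights = {
    "poultry": 0.6, "breeding": 0.7, "environment": 0.9, "intelligence": 0.8,
    "control": 0.4, "system": 0.8, "method": 0.5,
}
matching = ["poultry", "breeding", "control", "system"]
first_set = first_keyword_set(title_weights, matching)

# Invented word frequencies for one second keyword set in the text content.
text_freq = {"poultry": 12, "temperature": 90, "gas": 80, "environment": 8, "intelligence": 6, "method": 4}
second_set = set(text_freq)

intersection = first_set & second_set
r1 = first_relevance(intersection, first_set, title_weights)
r2 = second_relevance(intersection, second_set, text_freq)
```

Here "control" is excluded from the first keyword set because its weight equals, rather than exceeds, the acquisition standard.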

The beneficial effects of the technical scheme are as follows: and by analyzing and screening the relevance of the title and the text content in the page where the third information is located, the method is favorable for filtering and eliminating irrelevant information, and ensures that a user obtains retrieval information relevant to retrieval content.

Example 7

The invention provides an information screening method, in which calculating the relevance of the title and the text content in the page corresponding to each third information, based on all first relevance and all second relevance, comprises:

wherein P_d represents the relevance of the title to the text content in the d-th page; qd_k represents the total weight value corresponding to the k-th keyword intersection in the d-th page; fd represents the total weight value of the first keyword set corresponding to the d-th page; s represents the number of keyword intersections in the d-th page, which is consistent with the number of second keyword sets; fd_n represents the total word frequency, in the text content, of all keywords in the n-th second keyword set in the d-th page; dd_n represents the total word frequency, in the text content, of all keywords in the keyword intersection corresponding to the n-th second keyword set in the d-th page; n1 represents the number of second keyword sets in the d-th page, which is consistent with the number of keyword intersections; a1 represents a first proportion coefficient; and a2 represents a second proportion coefficient.

In this embodiment, the number of keyword intersections is equal to the number of second keyword sets, i.e., s=n1;

in this embodiment, a1+a2=1, and a1 generally takes a value of 0.5, and a2 generally takes a value of 0.5.

In this embodiment, qd_k/fd represents the corresponding first degree of correlation, and dd_n/fd_n represents the corresponding second degree of correlation.
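As a rough illustration of how the first and second degrees of correlation might be combined with the coefficients a1 and a2, the following Python sketch averages each group and takes a weighted sum. The averaging step is an assumption made for illustration only and is not confirmed by the text; the function name `page_relevance` is likewise hypothetical.

```python
def page_relevance(first_correlations, second_correlations, a1=0.5, a2=0.5):
    """Combine per-intersection correlations into one page-level relevance value.

    ASSUMPTION: each group of correlations is averaged before weighting;
    the text only states that both groups and a1, a2 enter the calculation.
    """
    if abs(a1 + a2 - 1.0) > 1e-9:
        raise ValueError("a1 + a2 must equal 1")
    if not first_correlations or not second_correlations:
        raise ValueError("at least one correlation of each kind is required")
    if len(first_correlations) != len(second_correlations):
        # The embodiment states s = n1: one intersection per second keyword set.
        raise ValueError("expected as many first as second correlations")

    mean_first = sum(first_correlations) / len(first_correlations)
    mean_second = sum(second_correlations) / len(second_correlations)
    return a1 * mean_first + a2 * mean_second


# Using the example correlations 0.22 and 0.15 with a1 = a2 = 0.5.
relevance = page_relevance([0.22], [0.15])  # approximately 0.185
```

With a single keyword intersection, the assumed averaging has no effect and the result is simply 0.5 × 0.22 + 0.5 × 0.15.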

The beneficial effects of the technical scheme are as follows: calculating the relevance of the title and the text content corresponding to each third information makes it possible to eliminate false and erroneous information, so that the user obtains relevant information of high quality and high credibility.

Example 8

The invention provides an information screening method, in which sorting the screened fourth information by relevance to obtain the final screening result comprises:

In this embodiment, if the correlation between the title and the text content in fourth information 1 is 0.6 and the correlation between the title and the text content in fourth information 2 is 0.7, the ranking result is: fourth information 2, fourth information 1.
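The ranking in this example amounts to sorting by relevance in descending order. A minimal Python sketch follows, assuming each fourth information is represented as a hypothetical (name, relevance) pair.

```python
def rank_by_relevance(items):
    """Sort (name, relevance) pairs from most to least relevant."""
    return sorted(items, key=lambda item: item[1], reverse=True)


# Example values from the embodiment; the tuple layout is an assumption.
fourth_information = [
    ("fourth information 1", 0.6),
    ("fourth information 2", 0.7),
]

ranked_names = [name for name, _ in rank_by_relevance(fourth_information)]
# ['fourth information 2', 'fourth information 1']
```

Because Python's `sorted` is stable, items with equal relevance keep their original order.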

The beneficial effects of the technical scheme are as follows: sorting the fourth information yields the final screening result, so that the user can quickly obtain the high-quality, high-credibility information most relevant to the retrieval content.

It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims

1. A method for screening information, comprising:

2. The method of claim 1, wherein obtaining and analyzing the search content input by the user to obtain the search keyword comprises:

3. The method of claim 1, wherein obtaining first information associated with the search keyword from the information data platform, analyzing the timeliness of each first information, and screening to obtain second information, comprises:

4. The method of claim 1, wherein weighting the reliability of each second information based on the source information corresponding to each second information, and screening the third information satisfying the first preset condition, comprises:

5. The method of claim 4, wherein weighting the reliability of the respective second information based on the first, second, and third evaluation results comprises:

y_i = αP1 + βP2 + γP3

wherein α represents the weight corresponding to the first evaluation result; P1 represents the first evaluation result of the i-th second information; y_i represents the website domain name score of the website corresponding to the i-th second information; y_max represents the maximum website domain name score; p_i represents the network security protection level of the website corresponding to the i-th second information; p_max represents the highest network security protection level; e_i represents the crash frequency of the website corresponding to the i-th second information; e_max represents the maximum allowable website crash frequency; z_i represents the website backup information score of the website corresponding to the i-th second information; z_max represents the maximum website backup information score; β represents the weight corresponding to the second evaluation result; P2 represents the second evaluation result of the i-th second information; x_i represents the average daily user access amount of the website corresponding to the i-th second information; t_i represents the average access time of the website corresponding to the i-th second information; x_max represents the maximum average daily user access amount among the websites corresponding to all second information; t_max represents the maximum average access time among the websites corresponding to all second information; γ represents the weight corresponding to the third evaluation result; P3 represents the third evaluation result of the i-th second information; C_max represents the maximum historical advertisement acceptance quantity among the websites corresponding to all second information; C_i represents the historical advertisement acceptance quantity of the website corresponding to the i-th second information; l_max represents the set maximum number of links in the websites corresponding to the second information; l_i represents the actual number of links in the website corresponding to the i-th second information; α+β+γ=1; and C_ave represents the average historical advertisement acceptance quantity of the websites corresponding to all second information.

6. The method of claim 1, wherein calculating a relevance between the content and the title in each third information based on the page layout condition of each third information, and screening fourth information satisfying the second preset condition, comprises:

7. The method of claim 6, wherein calculating the relevance of the title to the text content in each third information corresponding page based on all the first relevance and all the second relevance comprises:

wherein P_d represents the relevance of the title to the text content in the d-th page; qd_k represents the total weight value corresponding to the k-th keyword intersection in the d-th page; fd represents the total weight value of the first keyword set corresponding to the d-th page; s represents the number of keyword intersections in the d-th page, which is consistent with the number of second keyword sets; fd_n represents the total word frequency, in the text content, of all keywords in the n-th second keyword set in the d-th page; dd_n represents the total word frequency, in the text content, of all keywords in the keyword intersection corresponding to the n-th second keyword set in the d-th page; n1 represents the number of second keyword sets in the d-th page, which is consistent with the number of keyword intersections; a1 represents a first proportion coefficient; and a2 represents a second proportion coefficient.

8. The method of claim 1, wherein the sorting of the correlations of the screened fourth information as a final screening result comprises: