Method for detecting abnormal ICP filing websites based on multivariate features
Technical Field
The invention relates to the technical field of computers, and in particular to a method for detecting abnormal ICP (Internet Content Provider) filing websites based on multivariate features.
Background
ICP filing, also known as domain name filing, means that any non-commercial website whose server is located in the People's Republic of China must complete the filing procedure in accordance with the law. Without filing, no one may provide non-commercial Internet information services within the People's Republic of China. The purpose is to prevent illegal website operations on the network and to curb the spread of harmful Internet information.
However, many websites with abnormal ICP filings currently exist. For example, some websites have not filed and instead misappropriate other parties' filing information, while others use false filing numbers and engage in illegal activities such as pornography and gambling. Websites with abnormal filings are often difficult to supervise and harm the healthy development of the Internet, yet there is currently no clear method for effectively detecting websites with abnormal ICP filings and discovering harmful websites in time.
Disclosure of Invention
The invention provides a method for detecting abnormal ICP filing websites based on multivariate features, which can effectively detect such websites. It aims to solve the technical problem that the existing Internet contains a large number of websites with abnormal ICP filings but no effective method for detecting them.
The method for detecting abnormal ICP filing websites based on multivariate features provided by the invention comprises the following steps:
step one, acquiring the information required for website filing detection;
step two, detecting the multivariate abnormal features of the ICP filing website;
step three, counting and storing the abnormal feature information.
Preferably, step one comprises the following specific steps:
step 1): crawling the website content with a web crawler, mainly to acquire the title of the website homepage, keywords of the important content, the ICP filing number at the bottom of the homepage, and the URL that the hyperlink of the ICP filing number points to;
step 2): using the registered domain name of the website to be detected as the query condition, querying and acquiring the authoritative filing data of the website through the ICP information filing management system provided by the Ministry of Industry and Information Technology, the authoritative filing data comprising filing entity information and filing website information;
step 3): acquiring the IP address in the A record of the website domain name through DNS resolution, and acquiring the location of the website server through IP geolocation;
step 4): acquiring the registration time and the registrar of the domain name by querying the WHOIS information of the website domain name.
Preferably, the multivariate abnormal features in step two comprise ICP filing process detection and filing content detection. The ICP filing process detection covers whether the website is filed, filing update anomalies, and filing cancellation anomalies; the ICP filing content detection covers anomalies in the filing entity information, the filing website information, and the website content.
Preferably, the multivariate abnormal feature detection comprises the following specific steps:
step 1: judging whether ICP filing information exists for a domestic website; if it does not, directly judging the website to be abnormal; if it does, proceeding to the next step;
step 2: performing filing update detection, cancellation anomaly detection, and filing content anomaly detection to judge whether the website is abnormal.
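The two-stage flow above (no filing means abnormal and stop; otherwise run the remaining checks) can be sketched as a small driver. This is an assumption-laden illustration: `detect`, `DetectionResult`, and the convention that each check returns an anomaly description or `None` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DetectionResult:
    domain: str
    abnormal: bool = False
    features: list = field(default_factory=list)


def detect(domain, filing_record, checks):
    """Step 1: no filing -> abnormal; step 2: run the remaining checks.

    filing_record is None when no ICP filing exists. Each element of checks is
    a hypothetical callable taking the filing record and returning an anomaly
    description string, or None when that check passes.
    """
    result = DetectionResult(domain)
    if filing_record is None:
        result.abnormal = True
        result.features.append("no ICP filing")
        return result  # later checks are skipped, as in step 1
    for check in checks:
        finding = check(filing_record)
        if finding is not None:
            result.features.append(finding)
    result.abnormal = bool(result.features)
    return result
```

Passing the checks in as callables keeps the update, cancellation, and content checks independent, so each can be added or tuned without touching the driver.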
Preferably, the filing update detection in step 2 detects whether the filing information has been updated after the user of the website domain name has changed.
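In the embodiment, a changed domain user is inferred when the filing time is earlier than the domain registration time. A minimal sketch of that comparison, assuming both times have already been parsed into `datetime.date` values and that the WHOIS registration date reflects the current registration (a re-registered domain gets a new one):

```python
from datetime import date


def filing_not_updated(filing_date: date, domain_registration_date: date) -> bool:
    """True when the filing predates the current domain registration.

    A registration date later than the filing date suggests the domain was
    re-registered by a new user while the previous filing was left in place.
    """
    return filing_date < domain_registration_date
```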
Preferably, the cancellation anomaly detection in step 2 comprises:
A. if the domain name has expired and is no longer registered, judging whether the corresponding filing information has been cancelled;
B. if the website server has moved from China to abroad, judging whether the filing information has been cancelled;
C. if the domain name registrar has changed from the authorized registrar used at filing time to an unauthorized registrar, judging whether the filing information has been cancelled.
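Checks A to C share one pattern: a condition that requires cancellation holds while the filing is still active. The sketch below assumes the inputs have already been derived from WHOIS, DNS plus IP geolocation, and the authorized-registrar list; the `DomainState` fields are hypothetical names for those derived values.

```python
from dataclasses import dataclass


@dataclass
class DomainState:
    # Hypothetical fields derived from WHOIS, DNS/IP geolocation and the filing system.
    filing_active: bool         # filing information still present in the filing system
    domain_registered: bool     # False once the domain has expired and been released
    server_in_china: bool       # geolocation result for the A-record IP address
    registrar_authorized: bool  # current registrar is on the authorized list


def cancellation_anomalies(state: DomainState) -> list:
    """Return the cancellation anomalies A-C found; an empty list means none."""
    if not state.filing_active:
        return []  # filing already cancelled, so nothing is left uncancelled
    findings = []
    if not state.domain_registered:
        findings.append("A: domain expired but filing not cancelled")
    if not state.server_in_china:
        findings.append("B: server moved abroad but filing not cancelled")
    if not state.registrar_authorized:
        findings.append("C: unauthorized registrar but filing not cancelled")
    return findings
```

Check C relies on the same reasoning as the embodiment: since filing required an authorized registrar, an unauthorized current registrar implies a transfer after filing.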
Preferably, the filing content anomaly detection in step 2 comprises: detecting anomalies in the filing entity information, detecting anomalies in the filing website information, and judging whether the website content contains illegal content.
Preferably, the filing entity information anomaly detection comprises:
A) whether the website content is consistent with the entity type of the filing entity;
B) whether a large number of websites are filed under the filing entity;
C) if the filing entity is an enterprise that has been deregistered, whether websites are still filed under it.
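The three filing entity checks can be sketched as one function. Everything concrete here is an assumption, since the source gives no values: the organization keywords (Chinese words for company, group, association, institute), the threshold of 100 filed sites, and the input names. A real system would also consult an enterprise database for check A, as the embodiment describes.

```python
# Hypothetical keywords suggesting organizational content: company, group,
# association, institute. The source gives no concrete list.
ORGANIZATION_KEYWORDS = ("公司", "集团", "协会", "研究院")
# Hypothetical threshold for "a large number of filed websites".
MAX_SITES_PER_ENTITY = 100


def entity_anomalies(entity_type, content_keywords, site_count, enterprise_deregistered):
    findings = []
    # A) an "individual" filing whose content reads like a company or organization
    if entity_type == "individual" and any(
        k in kw for kw in content_keywords for k in ORGANIZATION_KEYWORDS
    ):
        findings.append("A: content inconsistent with entity type")
    # B) an unusually large number of websites filed under one entity
    if site_count > MAX_SITES_PER_ENTITY:
        findings.append("B: too many filed websites under entity")
    # C) enterprise deregistered while websites are still filed under it
    if entity_type == "enterprise" and enterprise_deregistered and site_count > 0:
        findings.append("C: deregistered enterprise still has filed websites")
    return findings
```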
Preferably, the filing website information anomaly detection comprises:
(A) whether the website title is consistent with the filed website name;
(B) whether the website displays the filing number, and whether the hyperlink of the filing number points to the ICP filing management system of the Ministry of Industry and Information Technology;
(C) whether the filing number displayed on the website is consistent with the query result from the ICP filing management system of the Ministry of Industry and Information Technology.
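Checks (A) to (C) can be sketched with the standard library. Assumptions: title similarity uses `difflib.SequenceMatcher` with a hypothetical threshold of 0.5 (name containment also counts as similar), the host of the MIIT filing system is supplied by the caller as `filing_host`, and (C) is an exact string comparison.

```python
from difflib import SequenceMatcher
from urllib.parse import urlparse


def website_info_anomalies(title, filed_site_name, shown_number, shown_link,
                           filed_number, filing_host, min_similarity=0.5):
    """filing_host: host of the MIIT filing system, supplied by the caller.

    min_similarity is a hypothetical threshold for title/name similarity.
    """
    findings = []
    # (A) homepage title vs. filed website name
    ratio = SequenceMatcher(None, title, filed_site_name).ratio()
    if ratio < min_similarity and filed_site_name not in title:
        findings.append("A: title does not match filed website name")
    # (B) filing number displayed and hyperlinked to the filing system
    if shown_number is None:
        findings.append("B: filing number not displayed")
    elif shown_link is None or urlparse(shown_link).hostname != filing_host:
        findings.append("B: filing number not linked to filing system")
    # (C) displayed number vs. number returned by the filing system
    if shown_number is not None and shown_number != filed_number:
        findings.append("C: displayed number differs from filed number")
    return findings
```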
Preferably, step three comprises the following specific steps:
step (1): through the detection of filing process features and filing content features, counting the number of abnormal features of each website and the detailed information of each abnormal feature, where the number of abnormal features reflects the abnormal risk degree of the website;
step (2): persistently storing the statistical information to construct an abnormal ICP filing website information base.
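Step (1) uses the number of abnormal features as the risk degree. A minimal sketch, assuming the findings have already been collected per domain as lists of descriptions; `summarize` is a hypothetical name, and the count is used directly without weighting.

```python
def summarize(findings_by_site):
    """findings_by_site maps a domain to its list of anomaly descriptions.

    Returns (domain, count, details) tuples sorted by count, highest first,
    so the count can serve as each website's abnormal-risk degree.
    """
    rows = [(domain, len(details), list(details)) for domain, details in findings_by_site.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows
```

The embodiment also allows a site with no filing at all to be assigned the highest risk directly; a weighted score would be one way to express that, but it is not shown here.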
The beneficial effects of the invention are as follows:
The invention provides a method for effectively detecting abnormal ICP filing websites. Anomaly analysis is performed on multiple features of website filing information: each time an abnormal feature is detected it is counted, the count describes the degree of filing-anomaly risk of the website, and the abnormal feature is stored in the anomaly set of the corresponding website, finally forming an abnormal ICP filing website information base. From this information base one can determine which websites have abnormal filings, and which specific anomalies and what degree of anomaly each website has. The actual filing anomalies of websites are thus reflected relatively comprehensively and abnormally filed websites can be found effectively. This helps network regulators supervise harmful websites more effectively, and helps website operators discover anomalies in their own filings, correct them in time, and avoid potential hazards to their websites.
The invention can reduce network security hazards to a certain extent and helps maintain the healthy development of the Internet.
Drawings
Fig. 1 is a schematic structural diagram of an embodiment of the present invention.
Description of the reference numerals in the drawings:
1. website domain name: the domain name of the website to be detected;
2. information acquisition module: acquires four kinds of content, namely website content, website ICP filing information, domain name DNS resolution data, and domain name WHOIS information. The website content comprises the website title, keywords of the website content, the ICP filing number displayed at the bottom of the website, and the URL of the hyperlink of the ICP filing number; the website ICP filing information comprises the filing entity information and the filing website information; the DNS resolution data provides the IP address in the domain name's A record; and the domain name WHOIS information provides the registration time and registrar of the domain name;
3. filing process feature anomaly detection module: first determines whether the website has been filed; if not, the website is determined to be abnormal, no subsequent feature detection is needed, and the whole detection process ends; if the website has filing information, filing update anomaly detection, cancellation anomaly detection, and filing content anomaly detection are performed;
4. filing content feature anomaly detection module: detects the specific filing information and the content of the website, comprising 9 feature detections across the filing entity, the filing website, and the website content;
5. abnormal feature information statistics module: counts the number of abnormal features of the website and preprocesses the detailed information of the abnormal features for subsequent storage;
6. information storage module: stores the information from the abnormal feature information statistics module, finally forming an abnormal ICP filing website information base.
Detailed Description
The present invention is further described below with reference to the drawings and examples so that those skilled in the art can easily practice the present invention.
Example:
As shown in Fig. 1, the specific process of detecting abnormal ICP filing websites mainly comprises the following steps:
Step one: the specific process of acquiring the information required for website filing anomaly detection comprises the following steps:
Step 1): crawl the website content with a web crawler, mainly to obtain the title of the website homepage, keywords of the important content, the ICP filing number at the bottom of the homepage, and the URL that the hyperlink of the ICP filing number points to.
Step 2): use the domain name of the website to be detected as the query condition (this must be the registered domain name, not the fully qualified domain name (FQDN): for www.sohu.com, for example, sohu.com should be entered), and query and obtain the authoritative filing data of the website, including the filing entity information and filing website information, through the ICP information filing management system (https://beian.
Step 3): acquire the IP address in the A record of the website domain name through DNS resolution, and obtain the location of the website server through IP geolocation.
Step 4): acquire information such as the registration time and registrar of the domain name by querying the WHOIS information of the website domain name.
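Steps 2) and 4) both need some preprocessing: reducing an FQDN to its registered domain, and pulling the registration time and registrar out of raw WHOIS text. The sketch below is a simplification under stated assumptions: the multi-label suffix set is a tiny hypothetical stand-in for the full Public Suffix List, and the WHOIS field labels matched (`Creation Date`, `Registration Time`, `Registrar`, `Sponsoring Registrar`) cover common forms only, since registries format WHOIS output differently.

```python
import re

# Hypothetical, deliberately tiny suffix set; a production system would use
# the full Public Suffix List instead.
MULTI_LABEL_SUFFIXES = {"com.cn", "net.cn", "org.cn", "gov.cn", "edu.cn"}


def registered_domain(fqdn):
    """Reduce an FQDN such as www.sohu.com to its registered domain (sohu.com)."""
    labels = fqdn.lower().rstrip(".").split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def parse_whois(raw):
    """Extract registration time and registrar from raw WHOIS text (assumed labels)."""
    created = re.search(r"^[ \t]*(?:Creation Date|Registration Time):[ \t]*(.+)$", raw, re.M | re.I)
    registrar = re.search(r"^[ \t]*(?:Registrar|Sponsoring Registrar):[ \t]*(.+)$", raw, re.M | re.I)
    return {
        "registration_time": created.group(1).strip() if created else None,
        "registrar": registrar.group(1).strip() if registrar else None,
    }
```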
Step two: the process of detecting the multivariate abnormal features of an ICP filing website comprises the following specific steps:
Step 1: filing process anomaly detection, with the following specific detection content:
(1) First, judge whether the website needs to be filed according to the geographical location of the website server. If the server is outside China, the website does not need to be filed; if no filing information is found for it, the website is normal, its detection ends, and the loop proceeds to the next website. If the website server is located in China, ICP filing is mandatory by law, and detection splits into two cases:
a. if the website is not filed, it is directly judged to be abnormal, its abnormal risk degree can be set to the highest level, and its detection ends so that anomaly detection of the next website can begin;
b. if the website is filed, the subsequent feature detections are carried out in sequence.
(2) Filing update anomaly detection: determine the filing time of the website from its filing information and the registration time of the domain name from its WHOIS information. If the filing time is earlier than the registration time of the domain name, the domain name user has changed but the current filing information still belongs to the previous user and has not been updated. In this case, if the current domain name user engages in illegal activities, regulators cannot supervise or trace that user through the website filing information, which is a security hazard and may also cause unnecessary trouble for the filing entity currently on record.
(3) Filing cancellation anomaly detection covers the following three aspects:
A. Based on the domain name WHOIS information, check whether the domain name has expired and been deregistered. If the domain name has been deregistered but the website filing information has not been cancelled, the website is determined to be an abnormally filed website.
B. Based on the website IP address obtained by DNS resolution of the domain name, determine the geographic location of the current website server through IP geolocation. If the server is overseas, then the server was in China when the website was filed but was later moved overseas, and an application to cancel the filing information should have been made at that time. If the filing information still exists, the website is an abnormally filed website.
C. Obtain the name of the domain name registrar from the domain name WHOIS information, match it against the authorized registrars published by the Ministry of Industry and Information Technology, and check whether it is an authorized registrar. If the website's domain name is currently with an unauthorized registrar, then, since it was with an authorized registrar when it was filed, it has been transferred to an unauthorized registrar after filing, and the filing information should have been proactively cancelled at that point. If it has not been cancelled, the filing of the website is abnormal and the website is regarded as an abnormally filed website.
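The location gate in item (1), together with case B above, can be sketched with Python's `ipaddress` module. The geolocation table is hypothetical: it maps two RFC 5737 documentation ranges to placeholder countries, where a real system would query an IP geolocation database. Treating an unknown location as undecided (left for manual review) is also an assumption not stated in the source.

```python
import ipaddress

# Hypothetical geolocation table: CIDR block -> country code. The ranges are
# RFC 5737 documentation blocks used only as placeholders.
GEO_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): "CN",
    ipaddress.ip_network("198.51.100.0/24"): "US",
}


def locate(ip):
    addr = ipaddress.ip_address(ip)
    for network, country in GEO_TABLE.items():
        if addr in network:
            return country
    return None  # location unknown


def filing_process_status(server_ip, has_filing):
    country = locate(server_ip)
    if country is None:
        return "unknown-location"  # assumption: left for manual review
    if country != "CN":
        # Overseas server: no filing needed; a leftover filing is anomaly B.
        return "abnormal-filing-not-cancelled" if has_filing else "normal"
    # Server in China: filing is mandatory (cases a and b of item (1)).
    return "continue" if has_filing else "abnormal-highest-risk"
```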
Step 2: filing content anomaly detection, with the following specific detection content:
(1) Filing entity information anomaly detection covers the following three aspects:
A) Detect anomalies in the entity type of the filing entity. If the entity type is "individual", judge from the website content keywords whether the website is consistent with the filing information; for example, a website may be filed as "individual" while its content is actually that of a company or social organization. If the entity type is a company, public institution, social organization, or the like, then in addition to the website content keywords, the actual entity type should be determined comprehensively with the help of enterprise databases such as Tianyancha (which contain the operating status, entity type, filing information, and so on of enterprises), and it is then judged whether the entity type is abnormal.
B) Obtain the name of the entity from the website's filing entity information and, using that name as the query condition, check through the ICP information filing management system of the Ministry of Industry and Information Technology whether a large number of domain names have been filed under the entity. For example, hundreds or even thousands of filed domain names under a single individual or company is an abnormal phenomenon that may indicate malicious domain name squatting or private trading of filings.
C) Determine the operating status of the website's filing entity, and whether it has been deregistered, through enterprise databases such as Tianyancha. If the filing entity has been deregistered, the websites filed under its name are abnormally filed websites.
(2) Filing website information anomaly detection covers the following three aspects:
(A) Obtain the value of the title field of the website homepage with a web crawler, compare it with the website name in the filing website information, and check whether they are similar; if they are not, the filed website is judged to be abnormal.
(B) Check whether the bottom of the website homepage displays the website filing number and whether the hyperlink URL of the filing number points to the ICP information filing management system of the Ministry of Industry and Information Technology; if not, the website's display of filing information is abnormal. According to the "Measures for the Administration of Filing of Non-commercial Internet Information Services", a filed website should display its filing number at the center of the bottom of its homepage and, as required, link the number to the website of the Ministry's filing management system for public inspection.
(C) Obtain the filing number displayed at the bottom of the website homepage, compare it with the number in the filing website information, and check whether they are consistent. If they are inconsistent, the website filing information is abnormal and the number may have been fraudulently used.
(3) Website content anomaly detection: using the website content keywords and third-party malicious domain detection interfaces (such as Google Safe Browsing and the Baidu Website Security Center), detect whether the content of the analyzed website contains illegal content such as pornography, gambling, or drugs. If it does, the website is an illegal abnormal website.
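Item (3) combines keyword matching with a third-party verdict. In the sketch below the keyword lists (Chinese terms for gambling, casino, pornography, and drugs) are hypothetical placeholders, and `reputation_check` is a hypothetical callable standing in for whatever third-party service is used; no real Google Safe Browsing or Baidu API is called here.

```python
# Hypothetical keyword lists per category; a deployed system would maintain a
# much larger, curated list.
ILLEGAL_KEYWORDS = {
    "gambling": ("博彩", "赌场"),
    "pornography": ("色情",),
    "drugs": ("毒品",),
}


def illegal_content_findings(page_text, domain, reputation_check=None):
    """Return matched categories, plus a third-party verdict if one is supplied.

    reputation_check is a hypothetical callable that takes a domain and returns
    True when an external reputation service flags it as unsafe.
    """
    findings = sorted(
        category for category, words in ILLEGAL_KEYWORDS.items()
        if any(word in page_text for word in words)
    )
    if reputation_check is not None and reputation_check(domain):
        findings.append("third-party: flagged as unsafe")
    return findings
```

Keeping the external lookup behind a callable lets the keyword part run offline and makes the third-party dependency easy to swap.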
Step three: the abnormal feature information statistics and storage process specifically comprises the following steps:
step (1): through the detection of filing process features and filing content features, count the number of abnormal features of each website and the detailed information of each abnormal feature, where the number of abnormal features reflects the abnormal risk degree of the website;
step (2): persistently store the statistical information to construct an abnormal ICP filing website information base.
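For step (2), any persistent store will do; the sketch below uses the standard `sqlite3` module. The table name, column names, and storing details as JSON text are hypothetical choices, and only websites with at least one abnormal feature are written, since the base collects abnormal websites.

```python
import json
import sqlite3


def store_findings(conn, rows):
    """Persist (domain, count, details) rows into a hypothetical abnormal-site table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS abnormal_icp_site ("
        " domain TEXT PRIMARY KEY, anomaly_count INTEGER, details TEXT)"
    )
    conn.executemany(
        # Re-detecting a website replaces its previous record.
        "INSERT OR REPLACE INTO abnormal_icp_site VALUES (?, ?, ?)",
        [(d, n, json.dumps(details, ensure_ascii=False)) for d, n, details in rows if n > 0],
    )
    conn.commit()
```

Usage: `store_findings(sqlite3.connect("icp.db"), rows)`, where `rows` holds one `(domain, count, details)` tuple per website.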
The above description covers only preferred embodiments of the present invention and is not to be construed as limiting it; those skilled in the art can make various modifications and variations to the present invention. All changes, equivalents, modifications, and the like that come within the scope of the invention as defined by the appended claims are intended to be embraced therein.