
WO2018145637A1 - Method and device for recording web browsing behavior, and user terminal - Google Patents

Method and device for recording web browsing behavior, and user terminal

Info

Publication number
WO2018145637A1
WO2018145637A1 PCT/CN2018/075599 CN2018075599W WO2018145637A1 WO 2018145637 A1 WO2018145637 A1 WO 2018145637A1 CN 2018075599 W CN2018075599 W CN 2018075599W WO 2018145637 A1 WO2018145637 A1 WO 2018145637A1
Authority
WO
WIPO (PCT)
Prior art keywords
page
behavior
resource
website
user
Prior art date
Application number
PCT/CN2018/075599
Other languages
French (fr)
Chinese (zh)
Inventor
吴伟勇
Original Assignee
广州市动景计算机科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州市动景计算机科技有限公司
Publication of WO2018145637A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce

Definitions

  • the present invention relates to the field of user interest portrait technology, and in particular, to a method, device, and user terminal for recording online behavior.
  • User portrait technology is a technique for abstracting a tagged user model through the user's social attributes, action behaviors or hobbies.
  • the online behavior can effectively reflect the user's habits or hobbies, and the collection of online behavior is a vital part of the user's portrait.
  • In prior-art methods for recording online behavior, the data generated by the user's online behavior is collected indiscriminately, including useless behavior data that does not reflect the user's true habits or hobbies (such as pages entered by an unintentional click). The collected data is therefore cluttered, which greatly interferes with data transmission and data analysis and results in inaccurate user portraits.
  • an object of the present invention is to provide a method for recording an online behavior, which is applied to a user terminal communicatively connected to a server, the method comprising:
  • the vertical website is a website providing specific domain information or related services
  • the identification rule matching the vertical website is searched from the identification rules pre-stored by the user terminal;
  • Another object of the present invention is to provide an online behavior recording apparatus for a user terminal that is communicatively coupled to a server, the apparatus comprising:
  • a detecting module configured to detect, when a website is opened in the browser, whether the website is a pre-defined vertical website, and the vertical website is a website that provides specific domain information or related services;
  • the identification rule matching module is configured to: when the website opened in the browser is a vertical website, search for an identification rule matching the vertical website from the identification rule prestored by the user terminal;
  • a behavior record module configured to identify and record data generated by an operation behavior of the user on the vertical website according to the identification rule, and generate an online behavior record file in a preset format according to the data generated by the operation behavior and the user identity information of the user terminal;
  • a sending module configured to send the online behavior record file to the server for a user portrait.
  • Another object of the present invention is to provide a user terminal that is communicatively coupled to a server, the user terminal comprising:
  • an online behavior recording device, the online behavior recording device being installed in the memory and comprising one or more software function modules executed by the processor, the device comprising:
  • a detecting module configured to detect whether the website is a predefined vertical website when the website is opened in the browser, and the vertical website is a website that provides specific domain information or related services;
  • a behavior record module configured to identify and record data generated by an operation behavior of the user on the vertical website according to the identification rule, and generate an online behavior record file in a preset format according to the data generated by the operation behavior and the user identity information of the user terminal;
  • a sending module configured to send the online behavior record file to the server for a user portrait.
  • Another object of the present invention is to provide a method for recording an online behavior, which is applied to a server, and the server is in communication with the user terminal provided by the present invention.
  • the server includes a database for recording the data generated by the user's various online behaviors and the weights of those online behaviors; the method includes:
  • Another object of the present invention is to provide an online behavior recording apparatus, which is applied to a server, and the server is communicably connected to the user terminal provided by the present invention.
  • the server includes a database for recording the data generated by the user's various online behaviors and the weights of those online behaviors;
  • the device includes:
  • a data acquisition module configured to receive an online behavior record file sent by the user terminal, analyze and obtain data generated by the user's online behavior in the online behavior record file, and store the data in the database;
  • the weight update module is configured to update the weight value of the online behavior in the database according to the data generated by the online behavior.
  • the present invention has the following beneficial effects:
  • FIG. 1 is a schematic diagram of interaction between a user terminal and a server according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram of a user terminal according to an embodiment of the present invention.
  • FIG. 3 is a flowchart of a method for recording an online behavior according to an embodiment of the present invention
  • FIG. 5 is a schematic diagram of an online behavior device according to an embodiment of the present invention.
  • FIG. 6 is a second flowchart of a method for recording an online behavior according to an embodiment of the present invention.
  • FIG. 7 is a second schematic diagram of an online behavior device according to an embodiment of the present invention.
  • Reference numerals: 100-user terminal; 110 (210)-online behavior recording device; 111-detection module; 112-identification rule matching module; 113-behavior recording module; 1131-retrieval behavior record sub-module; 1132-effective behavior screening sub-module; 114-transmission module; 115-identification rule verification module; 116-identification rule acquisition module; 120-memory; 130-processor; 140-communication unit; 200-server; 211-data acquisition module; 212-weight update module; 300-network.
  • Unless otherwise expressly specified and defined, the terms “set”, “install”, “connected”, and “coupled” are to be understood broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediate medium; and it may be internal communication between two elements.
  • the specific meaning of the above terms in the present invention can be understood in a specific case by those skilled in the art.
  • FIG. 1 is a schematic diagram of interaction between a user terminal 100 and a server 200 according to a preferred embodiment of the present invention.
  • the user terminal 100 can communicate with the server 200 through the network 300 to implement data communication or interaction between the user terminal 100 and the server 200.
  • the server 200 may be, but not limited to, a web server, a file transfer protocol (ftp) server, and the like.
  • the user terminal 100 can be, but not limited to, a smart phone, a personal computer (PC), a tablet computer, a personal digital assistant (PDA), a mobile Internet device (MID), and the like.
  • the network 300 can be, but is not limited to, a wired network or a wireless network.
  • the user terminal 100 includes an online behavior recording device 110, a memory 120, a processor 130, and a communication unit 140.
  • the components of the memory 120, the processor 130, and the communication unit 140 are electrically connected directly or indirectly to each other to implement data transmission or interaction.
  • the components can be electrically connected to one another via one or more communication buses or signal lines.
  • the online behavior recording device 110 includes at least one software function module that can be stored in the memory 120 or in an operating system (OS) of the user terminal 100 in the form of software or firmware.
  • the processor 130 is configured to execute an executable module stored in the memory 120, such as a software function module, a computer program, and the like included in the online behavior recording device 110.
  • the memory 120 may be, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), and the like.
  • the memory 120 is configured to store a program, and the processor 130 executes the program after receiving an execution instruction.
  • the communication unit 140 is configured to establish a communication connection between the user terminal 100 and the server 200 through the network 300, and is used to send and receive data through the network 300.
  • FIG. 3 is a flowchart of a method for recording an online behavior applied to the user terminal 100 shown in FIG. 1. The method includes the following steps.
  • Step S110 When a website is opened in the browser, it is detected whether the website is a predefined vertical website.
  • the vertical website may be a website for providing specific domain information or related services.
  • Sina, Sohu, Baidu, Tencent and other comprehensive websites contain a large amount of information in various fields.
  • A user's online behavior data on a comprehensive website may include data that cannot accurately reflect the user's real interests or behaviors, such as unintentional clicks.
  • the vertical website is highly targeted, and the online behavior data on the vertical website can most effectively reflect the user's hobbies or behaviors in a certain field. Therefore, in this embodiment, the online behavior data of the user on the vertical website is collected for the user portrait.
  • When the user terminal 100 detects that the user opens a website in the browser, it first detects whether the opened website is a predefined vertical website. In this embodiment, after receiving the web address input by the user, the user terminal 100 may request the server 200 to detect whether the website corresponding to the web address is a vertical website. Alternatively, after receiving the web address input by the user, the user terminal 100 may query its own stored list of vertical websites to determine whether the website corresponding to the web address is a vertical website.
  • When the opened website is not a predefined vertical website, the data generated by online behavior on that website is not collected; when the opened website is a predefined vertical website, the data generated by online behavior on that website is collected through step S120 and the subsequent steps.
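The local-list variant of the detection in step S110 can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the domain names, the `VERTICAL_SITES` set, and the choice to treat subdomains of a listed site as matches are all assumptions made for the example.

```python
from urllib.parse import urlsplit

# Hypothetical locally stored list of predefined vertical websites (by domain).
VERTICAL_SITES = {"novels.example.com", "autos.example.org"}


def is_vertical_website(url: str, vertical_sites: set[str] = VERTICAL_SITES) -> bool:
    """Return True when the URL's host is a listed site or one of its subdomains.

    Matching subdomains is an assumption of this sketch, not something the
    source text specifies.
    """
    host = (urlsplit(url).hostname or "").lower()
    return any(host == site or host.endswith("." + site) for site in vertical_sites)


def should_collect(url: str) -> bool:
    """Behavior data is collected only on predefined vertical websites."""
    return is_vertical_website(url)
```

A terminal that delegates the check to the server instead would replace the set lookup with a request, but the branching stays the same: nothing is collected unless the check succeeds.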
  • the user terminal 100 stores an identification rule corresponding to each vertical website.
  • the identification rule matching the vertical website may be searched according to the domain name of the vertical website.
  • the identification rule records data items that need to be collected on a vertical website, and the identification rules can be written in a format such as Protocol Buffer, Json, or other languages. Taking the Json format as an example, the identification rule may be in the following form:
  • the page structure of each vertical website may be updated or changed.
  • The identification rule may include an effective duration, which indicates how long after being acquired by the user terminal 100 the identification rule expires.
  • When a vertical website is opened in the browser, the user terminal 100 determines whether that website's identification rule is valid according to the time point at which the rule was obtained, the effective duration of the rule, and the current time point.
  • When the identification rule is invalid, a valid identification rule corresponding to the vertical website is first acquired from the server 200 and stored in the user terminal 100.
  • The effective duration is determined according to the time at which the server 200 generated the identification rule and the preset retention duration of the identification rule.
  • The retention duration may be set to 7 days, that is, 604800 seconds.
  • When the user terminal 100 acquires the rule at any time point within the retention duration (7 days), the effective duration of the rule should be the difference between the “retention duration” and the “current time point”. In this way, the validity of the identification rule is guaranteed.
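The validity check described above reduces to simple timestamp arithmetic. The sketch below is one plausible reading, under the assumption that a rule expires at "server generation time + retention duration", so that the effective duration handed to the terminal is whatever remains of that window at acquisition time. The `StoredRule` type and its field names are hypothetical.

```python
from dataclasses import dataclass

RETENTION_SECONDS = 7 * 24 * 60 * 60  # 7 days = 604800 seconds


@dataclass
class StoredRule:
    domain: str              # vertical website the rule belongs to
    acquired_at: int         # Unix time at which the terminal obtained the rule
    effective_duration: int  # seconds the rule stays valid after acquisition


def effective_duration_at(generated_at: int, now: int,
                          retention: int = RETENTION_SECONDS) -> int:
    """Remaining validity for a rule generated at `generated_at` and fetched at `now`.

    Assumption: the rule expires at generated_at + retention.
    """
    return max(0, generated_at + retention - now)


def is_rule_valid(rule: StoredRule, now: int) -> bool:
    """A rule is valid while the current time is before acquisition + effective duration."""
    return now < rule.acquired_at + rule.effective_duration
```

For example, a rule fetched one day after the server generated it would carry an effective duration of six days, so every terminal sees the rule expire at the same moment regardless of when it was downloaded.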
  • When no matching identification rule is found, the vertical website is marked as having no matching rule, and after a certain period of time (which can be preset to 7 days or dynamically adjusted by the user terminal 100) the server 200 is queried again as to whether the vertical website has an identification rule. In this way, the identification rules of newly defined vertical websites are guaranteed to be obtained.
  • the user terminal 100 collects the user's online behavior, and the user terminal 100 collects the information on the vertical website to perform the operation.
  • the data can make the user's portrait more accurate.
  • the user terminal 100 records the user's operation behavior on the retrieval result in the online behavior record file.
  • the preset operation may include opening or jumping to a new page from the display page corresponding to the resource link obtained by the retrieval.
  • the process in which the user terminal 100 records data generated by the operation behavior includes sub-step S201, sub-step S202, sub-step S203, and sub-step S204.
  • sub-step S201, sub-step S202, sub-step S203, and sub-step S204 are described in detail below.
  • the identification rule may further include resource location information, where the resource location information includes a location of a page tag corresponding to the resource link in a page file of the vertical website search result page.
  • the item “resource_entry” in the identification rule is the resource location information.
  • The resource location information can be defined using XPath. The definition supports multiple XPath paths, which can be separated by a specific symbol (such as a comma). If the resource location information is “id('results')/x:div[3]/x:div[2]/x:a”, this indicates that, in the HTML file of the search result page, the link information of the resource is recorded in the a tag inside the second div child of the third div child of the div tag whose id is results.
  • the user terminal 100 receives a click operation on the search result page, and determines whether the page tag position corresponding to the clicked resource link is consistent with the page tag position indicated in the resource location information. When the page tag positions are consistent, the link information in the page tag is obtained.
  • When the user terminal 100 opens the search result page, it determines, according to the resource location information, whether the page contains a tag consistent with the resource location path. If no tag consistent with the path is found, it is determined that the search retrieved no corresponding result, and recording of the online behavior ends.
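To make the location step concrete, the sketch below resolves a resource location path against a search result page using Python's standard `xml.etree.ElementTree`. Several things here are assumptions for illustration: the page is well-formed XHTML (real pages would usually need an HTML parser), the `x` prefix is bound to the XHTML namespace, and because ElementTree has no `id()` function, `id('results')` is rewritten as the attribute predicate `.//*[@id='results']`. The sample page and URL are invented.

```python
import xml.etree.ElementTree as ET

# Assumed binding of the "x" prefix used in the resource location path.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}

# ElementTree equivalent of id('results')/x:div[3]/x:div[2]/x:a.
RESOURCE_ENTRY = ".//*[@id='results']/x:div[3]/x:div[2]/x:a"

# Hypothetical search result page, reduced to the structure the path describes.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<div id="results">
  <div>header</div>
  <div>filters</div>
  <div>
    <div>cover</div>
    <div><a href="https://novels.example.com/book/42">Title</a></div>
  </div>
</div>
</body></html>"""


def find_resource_links(page_source: str, resource_entry: str = RESOURCE_ENTRY) -> list[str]:
    """Return the href of every tag matched by any of the comma-separated paths."""
    root = ET.fromstring(page_source)
    links = []
    for path in resource_entry.split(","):
        for tag in root.findall(path.strip(), XHTML_NS):
            href = tag.get("href")
            if href:
                links.append(href)
    return links
```

An empty result from `find_resource_links` corresponds to the case where no tag matches the path, in which recording would end.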
  • Sub-step S203: when it is detected that a new page is opened or jumped to from the displayed resource page, the recorded link information corresponding to the resource page is retained.
  • When the user terminal 100 detects that a new page is opened or jumped to from the displayed resource page, it determines that the operation is consistent with the preset operation and retains the recorded link information corresponding to the resource page.
  • When the user terminal 100 detects that the displayed resource page is closed without a new page having been opened or jumped to from it, the operation is considered not to comply with the preset operation, and the recorded link information corresponding to the resource page can be deleted.
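The retain-or-delete decision in sub-steps S203 and S204 can be sketched as a small recorder that keeps a link only if a new page was opened from its resource page before that page was closed. The class and method names are hypothetical, and the browser events are assumed to be delivered by some surrounding code that is not shown.

```python
class LinkRecorder:
    """Tracks recorded resource links and drops those without a preset operation."""

    def __init__(self) -> None:
        # link -> True once a new page has been opened or jumped to from it
        self._links: dict[str, bool] = {}

    def record(self, link: str) -> None:
        """Record the link of a resource page that has just been displayed."""
        self._links.setdefault(link, False)

    def new_page_opened_from(self, link: str) -> None:
        """The preset operation occurred on this resource page: keep its link."""
        if link in self._links:
            self._links[link] = True

    def closed(self, link: str) -> None:
        """The resource page was closed: delete the link if no preset operation occurred."""
        if self._links.get(link) is False:
            del self._links[link]

    def recorded_links(self) -> list[str]:
        return list(self._links)
```

With this shape, a page the user opens and immediately closes leaves no trace, while a page from which the user continues browsing contributes its link to the record file.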
  • the online behavior record file can be expressed as follows:
  • Step S140: the online behavior record file is sent to the server 200 for user portrait.
  • the online behavior record file may further include the search term input by the user in the retrieval operation, the address of the search request, a cookie field, and the like, so that the page the user searched can be restored during user portrait generation to collect user data.
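The fields named above (user identity information, link information, search term, search request address, and cookie field) could be serialized along the following lines. The actual preset format is not reproduced in this text, so the JSON layout, the key names, and the `build_record_file` helper are purely hypothetical; the only grounded part is the set of fields.

```python
import json
import time
from typing import Iterable, Optional


def build_record_file(user_id: str, site: str, links: Iterable[str],
                      search_term: Optional[str] = None,
                      request_url: Optional[str] = None,
                      cookie: Optional[str] = None,
                      now: Optional[int] = None) -> str:
    """Serialize one online behavior record as JSON (hypothetical layout)."""
    record = {
        "user_id": user_id,                # user identity information of the terminal
        "site": site,                      # vertical website the behavior occurred on
        "timestamp": int(time.time()) if now is None else now,
        "resource_links": list(links),     # retained link information
        "search": {
            "term": search_term,           # search term input by the user
            "request_url": request_url,    # address of the search request
            "cookie": cookie,              # cookie field
        },
    }
    return json.dumps(record, ensure_ascii=False, sort_keys=True)
```

The resulting string is what a sending step would transmit to the server; Protocol Buffers or another encoding would serve equally well.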
  • the embodiment further provides an online behavior recording apparatus 110, which is applied to a user terminal 100 communicatively coupled to a server 200.
  • the apparatus includes a detection module 111, an identification rule matching module 112, a behavior recording module 113, and a sending module 114.
  • the detecting module 111 is configured to detect whether the website is a predefined vertical website when a website is opened in the browser.
  • the vertical website may be a website for providing specific domain information or related services.
  • the detecting module 111 can be used to perform step S110 shown in FIG. 3, and the description of the step S110 can be referred to for a specific description of the detecting module 111.
  • the identification rule checking module 115 is configured to determine whether the identification rule of the vertical website is valid according to the time point at which the identification rule was obtained, the effective duration of the identification rule, and the current time point;
  • the identification rule obtaining module 116 is configured to acquire a valid identification rule corresponding to the vertical website from the server 200 when the identification rule is invalid, and store the rule in the user terminal 100.
  • the identification rule matching module 112 is configured to: when the website opened in the browser is a vertical website, search for an identification rule matching the vertical website from the identification rules pre-stored by the user terminal.
  • the identification rule matching module 112 can be used to perform step S120 shown in FIG. 3, and the description of the step S120 can be referred to for a specific description of the identification rule matching module 112.
  • the behavior record module 113 is configured to identify and record data generated by the user's operation behavior on the vertical website according to the identification rule, and generate an online behavior record file in a preset format according to the data generated by the operation behavior and the user identity information of the user terminal 100.
  • the behavior recording module 113 can be used to perform step S130 shown in FIG. 3, and the description of the step S130 can be referred to for a detailed description of the behavior recording module 113.
  • the data generated by the operation behavior includes data generated by an operation of the retrieval result when the user performs information retrieval on the vertical website.
  • the identification rule includes at least one preset operation for the retrieval result on the vertical website;
  • the behavior recording module 113 includes a retrieval behavior recording sub-module 1131 and an effective behavior screening sub-module 1132.
  • the search behavior record sub-module 1131 is configured to record, in the online behavior record file, an operation behavior of the user on the search result when detecting that the user performs an information retrieval operation on the vertical website;
  • the effective behavior screening sub-module 1132 is configured to retain the data generated by the operation behavior when detecting that the operation performed on the retrieval result conforms to the preset operation, and to discard the data generated by the operation behavior when detecting that the operation performed on the retrieval result does not conform to the preset operation.
  • the preset operation includes opening or jumping to a new page in the display page corresponding to the resource link obtained by the retrieval;
  • the manner in which the effective behavior screening sub-module 1132 records the online behavior includes:
  • the link information corresponding to the recorded resource page is deleted.
  • the identification rule further includes resource location information, where the resource location information includes a location of a page tag corresponding to the resource link in a page file of the vertical website search result page; the manner in which the effective behavior screening sub-module 1132 obtains and records link information includes:
  • the sending module 114 is configured to send the online behavior record file to the server 200 for a user portrait.
  • the sending module 114 can be used to perform step S140 shown in FIG. 3, and a detailed description of the sending module 114 can refer to the description of the step S140.
  • the online behavior record file includes a search term, a search request address, and a cookie field of the search operation.
  • FIG. 6 is a flowchart of a method for collecting an online behavior applied to the server 200 shown in FIG. 1.
  • the server 200 is communicably connected to the user terminal 100 provided in this embodiment, and the server 200 includes a database for recording the data generated by the user's various online behaviors and the weights of those online behaviors. The method includes the following steps.
  • Step S310 Receive an online behavior record file sent by the user terminal 100, analyze and obtain data generated by the user's online behavior in the online behavior record file, and store the data in the database.
  • the server 200 extracts the data generated by the online behavior, or related content crawled by a web crawler according to that data, and organizes the content into structured information. For example, for a novel vertical website, the data generated by the online behavior is organized into the following items:
  • Target resource: resource name + author name
  • Target resource link address: resource link address (address of the resource page)
  • Resource category: extracted resource category
  • the server 200 establishes a dedicated target resource data table and a resource category data table for each user in units of users, and stores the above-mentioned collated data in a database.
  • Step S320 Update the weight value of the online behavior in the database according to the data generated by the online behavior.
  • the user identity information is used as the primary key, and structured data such as the vertical category/target resource/target resource link address is stored as a record in the database.
  • the weight value in the current record is updated (for example, increased by 100).
  • the user's degree of interest in different resource categories can be distinguished by the weight values.
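A minimal in-memory sketch of the weight update in step S320 follows. It assumes that each reported behavior on the same user/category/resource/link record adds a fixed step (100, the example value in the text) and that a user's interest in a category can be read as the sum of weights over that category's records. The `InterestStore` class stands in for the database and is hypothetical.

```python
from collections import defaultdict

WEIGHT_STEP = 100  # example increment mentioned in the text


class InterestStore:
    """In-memory stand-in for the server-side database of weighted behavior records."""

    def __init__(self) -> None:
        # key: (user_id, category, resource, link) -> accumulated weight
        self._weights: dict[tuple[str, str, str, str], int] = defaultdict(int)

    def record_behavior(self, user_id: str, category: str, resource: str,
                        link: str, step: int = WEIGHT_STEP) -> int:
        """Create the record if needed, raise its weight, and return the new weight."""
        key = (user_id, category, resource, link)
        self._weights[key] += step
        return self._weights[key]

    def category_weights(self, user_id: str) -> dict[str, int]:
        """Total weight per resource category for one user."""
        totals: dict[str, int] = defaultdict(int)
        for (uid, category, _resource, _link), weight in self._weights.items():
            if uid == user_id:
                totals[category] += weight
        return dict(totals)
```

Comparing the per-category totals then gives a rough ordering of the user's interests, which is the role the weight values play in the user portrait.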
  • the embodiment further provides an online behavior recording device 210, which is applied to the server 200.
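  • the server 200 is in communication with the user terminal 100 provided in this embodiment, and the server 200 includes a database for recording the data generated by the user's various online behaviors and the weights of those online behaviors.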
  • the data obtaining module 211 is configured to receive an online behavior record file sent by the user terminal 100, analyze and obtain data generated by the user's online behavior in the online behavior record file, and store the data in the database.
  • the data obtaining module 211 can be used to perform step S310 shown in FIG. 6; for a detailed description of the data obtaining module 211, refer to the description of step S310.
  • the weight update module 212 is configured to update a weight value of the online behavior in the database according to the data generated by the online behavior.
  • the weight update module 212 can be used to perform step S320 shown in FIG. 6; for a detailed description of the weight update module 212, refer to the description of step S320.
  • the online behavior recording method, device and server provided by the present invention collect data generated by the user's online behavior on vertical websites by using identification rules corresponding to different websites, and can thereby collect data that accurately reflects the user's real interests, hobbies, or behavioral habits, which helps to implement accurate user portraits.
  • each block of the flowchart or block diagram can represent a module, a program segment, or a portion of code that includes one or more executable instructions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in a different order than that illustrated in the drawings.
  • each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified function or action, or by a combination of dedicated hardware and computer instructions.
  • each functional module in each embodiment of the present invention may be integrated to form a separate part, or each module may exist separately, or two or more modules may be integrated to form a separate part.
  • the functions, if implemented in the form of software functional modules and sold or used as separate products, may be stored in a computer readable storage medium.
  • the technical solution of the present invention, in essence or in the part that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the various embodiments of the present invention.
  • the foregoing storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Finance (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

A method and device (110) for recording a web browsing behavior, and a user terminal (100). The method comprises: when a website is opened in a browser, detecting whether the website is a predefined vertical website, the vertical website being a website providing specific domain information or a related service (S110); when the website opened in the browser is the vertical website, searching in recognition rules prestored by the user terminal (100) for a recognition rule matching the vertical website (S120); recognizing, on the basis of the recognition rule, and recording data generated by an operating behavior of a user on the vertical website, and generating a web browsing behavior record file of a preset format on the basis of the data generated by the operating behavior and user identity information of the user terminal (100) (S130); and transmitting the web browsing behavior record file to the server (200) for user profiling (S140). As such, data that can accurately reflect the actual interests and hobbies or behavioral habits of the user can be collected, thus facilitating accurate and precise user profiling with respect to the user.

Description

上网行为记录方法、装置及用户终端Online behavior recording method, device and user terminal 技术领域Technical field
本发明涉及用户兴趣画像技术领域,具体而言,涉及一种上网行为记录方法、装置及用户终端。The present invention relates to the field of user interest portrait technology, and in particular, to a method, device, and user terminal for recording online behavior.
背景技术Background technique
用户画像技术是一种通过用户的社会属性、动作行为或兴趣爱好等抽象出一个标签化的用户模型的技术。在用户画像技术中,上网行为可以有效地反映用户的习惯或爱好,对上网行为的收集是用户画像中至关重要的部分。现有技术的上网行为记录方法中,无差别地对用户上网行为产生的数据进行收集,其中包括了无用的,不能反映用户真实习惯或爱好的行为数据(如无意点击进入的页面),因而收集到的数据冗杂,对数据传输及数据分析造成了极大干扰,导致用户画像不准确。User portrait technology is a technique for abstracting a tagged user model through the user's social attributes, action behaviors or hobbies. In the user portrait technology, the online behavior can effectively reflect the user's habits or hobbies, and the collection of online behavior is a vital part of the user's portrait. In the prior art online behavior recording method, the data generated by the user's online behavior is collected indiscriminately, including useless behavior data that does not reflect the user's true habits or hobbies (such as pages that are not intentionally clicked), and thus collects The data is cumbersome, causing great interference to data transmission and data analysis, resulting in inaccurate user images.
Summary of the invention
To overcome the above deficiencies in the prior art, an object of the present invention is to provide an online behavior recording method applied to a user terminal communicatively connected to a server, the method comprising:
when a website is opened in a browser, detecting whether the website is a predefined vertical website, the vertical website being a website that provides information or related services for a specific domain;
when the website opened in the browser is a vertical website, searching the identification rules pre-stored on the user terminal for an identification rule matching the vertical website;
identifying and recording, according to the identification rule, data generated by the user's operation behavior on the vertical website, and generating an online behavior record file in a preset format from the data generated by the operation behavior and the user identity information of the user terminal; and
sending the online behavior record file to the server for user profiling.
Another object of the present invention is to provide an online behavior recording device applied to a user terminal communicatively connected to a server, the device comprising:
a detection module, configured to detect, when a website is opened in a browser, whether the website is a predefined vertical website, the vertical website being a website that provides information or related services for a specific domain;
an identification rule matching module, configured to search, when the website opened in the browser is a vertical website, the identification rules pre-stored on the user terminal for an identification rule matching the vertical website;
a behavior recording module, configured to identify and record, according to the identification rule, data generated by the user's operation behavior on the vertical website, and to generate an online behavior record file in a preset format from the data generated by the operation behavior and the user identity information of the user terminal; and
a sending module, configured to send the online behavior record file to the server for user profiling.
Another object of the present invention is to provide a user terminal communicatively connected to a server, the user terminal comprising:
a memory;
a processor; and
an online behavior recording device installed in the memory and comprising one or more software function modules executed by the processor, the device comprising:
a detection module, configured to detect, when a website is opened in a browser, whether the website is a predefined vertical website, the vertical website being a website that provides information or related services for a specific domain;
an identification rule matching module, configured to search, when the website opened in the browser is a vertical website, the identification rules pre-stored on the user terminal for an identification rule matching the vertical website;
a behavior recording module, configured to identify and record, according to the identification rule, data generated by the user's operation behavior on the vertical website, and to generate an online behavior record file in a preset format from the data generated by the operation behavior and the user identity information of the user terminal; and
a sending module, configured to send the online behavior record file to the server for user profiling.
Another object of the present invention is to provide an online behavior recording method applied to a server, the server being communicatively connected to the user terminal provided above and comprising a database for recording the data generated by each type of online behavior of a user and the weight of that online behavior; the method comprising:
receiving the online behavior record file sent by the user terminal, parsing the file to obtain the data generated by the user's online behavior, and storing the data in the database; and
updating the weight value of the online behavior in the database according to the data generated by the online behavior.
Another object of the present invention is to provide an online behavior recording device applied to a server, the server being communicatively connected to the user terminal provided above and comprising a database for recording the data generated by each type of online behavior of a user and the weight of that online behavior; the device comprising:
a data acquisition module, configured to receive the online behavior record file sent by the user terminal, parse the file to obtain the data generated by the user's online behavior, and store the data in the database; and
a weight update module, configured to update the weight value of the online behavior in the database according to the data generated by the online behavior.
Compared with the prior art, the present invention has the following beneficial effects:
The online behavior recording method, device, and server provided by the present invention apply a corresponding identification rule to each website and collect the data generated by the user's online behavior on vertical websites. This yields data that accurately reflects the user's real interests or behavioral habits and helps build an accurate user profile.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present invention and should therefore not be regarded as limiting its scope; a person of ordinary skill in the art can derive other related drawings from them without creative effort.
FIG. 1 is a schematic diagram of the interaction between a user terminal and a server according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a user terminal according to an embodiment of the present invention;
FIG. 3 is the first flowchart of an online behavior recording method according to an embodiment of the present invention;
FIG. 4 is a flowchart of the preset operation determination in an embodiment of the present invention;
FIG. 5 is the first schematic diagram of an online behavior recording device according to an embodiment of the present invention;
FIG. 6 is the second flowchart of an online behavior recording method according to an embodiment of the present invention;
FIG. 7 is the second schematic diagram of an online behavior recording device according to an embodiment of the present invention.
Reference numerals: 100 - user terminal; 110 (210) - online behavior recording device; 111 - detection module; 112 - identification rule matching module; 113 - behavior recording module; 1131 - search behavior recording sub-module; 1132 - effective behavior screening sub-module; 114 - sending module; 115 - identification rule verification module; 116 - identification rule acquisition module; 120 - memory; 130 - processor; 140 - communication unit; 200 - server; 211 - data acquisition module; 212 - weight update module; 300 - network.
Detailed description
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are evidently some, not all, of the embodiments of the present invention. The components of the embodiments generally described and shown in the drawings may be arranged and designed in a variety of configurations.
Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.
In the description of the present invention, it should also be noted that, unless otherwise expressly specified and limited, the terms "arranged", "installed", "connected", and "coupled" are to be understood broadly. For example, a connection may be fixed, detachable, or integral; mechanical or electrical; direct or indirect through an intermediate medium; or an internal communication between two elements. A person of ordinary skill in the art can understand the specific meaning of these terms in the present invention according to the specific situation.
FIG. 1 is a schematic diagram of the interaction between the user terminal 100 and the server 200 according to a preferred embodiment of the present invention. The user terminal 100 can communicate with the server 200 through the network 300 to exchange data with the server 200.
In this embodiment, the server 200 may be, but is not limited to, a web server, an FTP (File Transfer Protocol) server, or the like. The user terminal 100 may be, but is not limited to, a smartphone, a personal computer (PC), a tablet computer, a personal digital assistant (PDA), a mobile Internet device (MID), or the like.
The network 300 may be, but is not limited to, a wired network or a wireless network.
FIG. 2 is a block diagram of the user terminal 100 shown in FIG. 1. The user terminal 100 includes an online behavior recording device 110, a memory 120, a processor 130, and a communication unit 140.
The memory 120, the processor 130, and the communication unit 140 are electrically connected to one another, directly or indirectly, to transmit or exchange data. For example, these elements may be electrically connected through one or more communication buses or signal lines. The online behavior recording device 110 includes at least one software function module that may be stored in the memory 120 in the form of software or firmware, or built into the operating system (OS) of the user terminal 100. The processor 130 is configured to execute the executable modules stored in the memory 120, such as the software function modules and computer programs included in the online behavior recording device 110.
The memory 120 may be, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or the like. The memory 120 is configured to store a program, and the processor 130 executes the program after receiving an execution instruction. The communication unit 140 is configured to establish a communication connection between the user terminal 100 and the server 200 through the network 300, and to send and receive data through the network 300.
Referring to FIG. 3, FIG. 3 is a flowchart of an online behavior recording method applied to the user terminal 100 shown in FIG. 1. The steps of the method are described in detail below.
Step S110: when a website is opened in a browser, detect whether the website is a predefined vertical website. In this embodiment, the vertical website may be a website that provides information or related services for a specific domain.
For example, general-purpose portals such as Sina, Sohu, Baidu, and Tencent contain a large amount of information from many fields. A user's online behavior data on such portals may include data that does not accurately reflect the user's real interests or behavioral habits, such as pages opened unintentionally or websites opened by mistake that the user never meant to browse. A vertical website, by contrast, offers highly targeted resources, so online behavior data from a vertical website reflects the user's interests or habits in a given field most effectively. In this embodiment, therefore, the user's online behavior data on vertical websites is collected for user profiling.
Specifically, when the user terminal 100 detects that the user opens a website in the browser, it first detects whether the opened website is a predefined vertical website. In this embodiment, after receiving the URL entered by the user, the user terminal 100 may request the server 200 to check whether the website corresponding to the URL is a vertical website. Alternatively, after receiving the URL entered by the user, the user terminal 100 may look up the website in a list of vertical websites stored on the terminal itself.
When the opened website is not a predefined vertical website, the data generated by online behavior on that website is not collected. When the opened website is a predefined vertical website, the data generated by online behavior on that website is collected through step S120 and the subsequent steps.
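The local-list variant of step S110 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the host names in `VERTICAL_SITES` and the function name `is_vertical_site` are hypothetical, and matching on the exact host name is an assumption.

```python
from urllib.parse import urlsplit

# Hypothetical locally stored list of predefined vertical websites.
VERTICAL_SITES = {"movies.example.com", "books.example.org"}


def is_vertical_site(url: str, vertical_sites=VERTICAL_SITES) -> bool:
    """Return True if the URL's host appears in the predefined list."""
    # urlsplit() lowercases the hostname; it is None when no host is present.
    host = urlsplit(url).hostname or ""
    return host in vertical_sites


# Only behavior on vertical websites goes on to step S120 and later.
print(is_vertical_site("https://movies.example.com/search?q=drama"))
print(is_vertical_site("https://portal.example.net/news"))
```

The server-side variant described above would replace the set lookup with a request to the server 200.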
Step S120: when the website opened in the browser is a vertical website, search the identification rules pre-stored on the user terminal for an identification rule matching the vertical website.
Specifically, the user terminal 100 stores identification rules corresponding to the individual vertical websites. In this embodiment, the identification rule matching a vertical website can be looked up by the domain name of the vertical website.
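A domain-keyed lookup of this kind might look like the sketch below. The store `RULES_BY_DOMAIN`, its contents, and the function `find_rule` are hypothetical names introduced for illustration only.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical rule store keyed by domain name; the rule body is a placeholder.
RULES_BY_DOMAIN = {
    "movies.example.com": {"result_click_collect_method": "chosen"},
}


def find_rule(url: str, rules=RULES_BY_DOMAIN) -> Optional[dict]:
    """Return the identification rule for the URL's domain, or None."""
    host = urlsplit(url).hostname or ""
    return rules.get(host)


print(find_rule("https://movies.example.com/search?q=drama"))
print(find_rule("https://books.example.org/"))
```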
The identification rule records the data items to be collected and recorded on the vertical website. The rule may be written in a format such as Protocol Buffers, JSON, or another format. Taking JSON as an example, the identification rule may take the following form:
Figure PCTCN2018075599-appb-000001
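The rule itself is only available as an image in this text. As a rough, hypothetical sketch, a JSON rule using the three field names that the description mentions elsewhere ("effective_period", "result_click_collect_method", and "resource_entry") could be parsed like this. The "domain" key, the values, and the overall layout are assumptions and are not taken from the figure.

```python
import json

# Hypothetical rule text; only the three field names below "domain"
# are mentioned in the description, everything else is assumed.
RULE_JSON = """
{
  "domain": "movies.example.com",
  "effective_period": 518400,
  "result_click_collect_method": "chosen",
  "resource_entry": "id('results')/x:div[3]/x:div[2]/x:a"
}
"""

rule = json.loads(RULE_JSON)
print(rule["result_click_collect_method"], rule["effective_period"])
```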
Further, because the page structure of a vertical website may be updated or changed, in this embodiment the identification rule may include an effective duration, which indicates how long after being obtained by the user terminal 100 the rule expires.
When the browser starts, the user terminal 100 determines, for the identification rule of each vertical website, whether the rule is still valid based on the time at which the rule was obtained, the effective duration of the rule, and the current time. When the rule is no longer valid, the terminal first obtains a valid identification rule for the vertical website from the server 200 and stores it on the user terminal 100.
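The validity check on the terminal can be sketched as a single comparison. The function and parameter names are hypothetical; times are assumed to be seconds since the epoch.

```python
import time
from typing import Optional


def rule_is_valid(acquired_at: float, effective_period: float,
                  now: Optional[float] = None) -> bool:
    """A rule is valid until acquired_at + effective_period (seconds)."""
    current = time.time() if now is None else now
    return current < acquired_at + effective_period


# Rule obtained at t=1000 with 518400 s (6 days) of validity.
print(rule_is_valid(1000, 518400, now=1000 + 86400))
print(rule_is_valid(1000, 518400, now=1000 + 518400))
```

When the check fails, the terminal would fetch a fresh rule from the server before recording anything.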
In this embodiment, the effective duration is determined from the time at which the server 200 generated the identification rule and the preset retention duration of the rule. As a hypothetical scenario, the retention duration may be set to 7 days, i.e. 604800 seconds. If the server 200 generates the identification rule for vertical website A at time T0, the rule is valid until T0 + 604800 seconds. If the user terminal 100 obtains the rule from the server 200 at T0 + 86400 seconds (exactly one day after the rule was generated), the "effective_period" field in the delivered rule should indicate an effective duration of 604800 - 86400 = 518400 seconds (6 days). In other words, for any vertical-website identification rule defined by the server 200, at any time within its retention duration (7 days), the effective duration delivered to the user terminal 100 should equal the retention duration minus the time elapsed between rule generation and the current time. This guarantees the validity of the identification rule.
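The arithmetic in this example can be written as a small server-side helper. The names are hypothetical, and clamping the result at zero for an already expired rule is an assumption.

```python
RETENTION_SECONDS = 7 * 24 * 3600  # 604800 s, the 7-day retention duration


def remaining_effective_period(generated_at: int, now: int,
                               retention: int = RETENTION_SECONDS) -> int:
    """Retention duration minus the time elapsed since rule generation."""
    return max(0, retention - (now - generated_at))


T0 = 0
# Terminal fetches the rule exactly one day after generation.
print(remaining_effective_period(T0, T0 + 86400))  # 518400
```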
Further, if the user terminal 100 finds no identification rule matching a given vertical website, it marks the vertical website as having no matching rule and, after a set period (which may be preset to 7 days or adjusted dynamically by the user terminal 100), asks the server 200 again whether an identification rule exists for that website. This ensures that the identification rules of newly defined vertical websites can be obtained.
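One way to sketch the "no matching rule" marker with a delayed re-query is shown below. The class and method names are hypothetical, and a fixed 7-day recheck interval is used instead of dynamic adjustment.

```python
RECHECK_SECONDS = 7 * 24 * 3600  # default recheck interval (7 days)


class MissingRuleTracker:
    """Remembers websites without a rule and when to ask the server again."""

    def __init__(self, recheck_after: int = RECHECK_SECONDS):
        self.recheck_after = recheck_after
        self._marked_at = {}  # domain -> time the website was marked

    def mark_missing(self, domain: str, now: int) -> None:
        self._marked_at[domain] = now

    def should_query_server(self, domain: str, now: int) -> bool:
        marked = self._marked_at.get(domain)
        # Query if never marked, or if the recheck interval has elapsed.
        return marked is None or now - marked >= self.recheck_after


tracker = MissingRuleTracker()
tracker.mark_missing("books.example.org", now=0)
print(tracker.should_query_server("books.example.org", now=86400))
```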
Step S130: identify and record, according to the identification rule, the data generated by the user's operation behavior on the vertical website, and generate an online behavior record file in a preset format from the data generated by the operation behavior and the user identity information of the user terminal 100.
Among a user's online behaviors, search behavior most directly reflects the user's preferences or intent. In this embodiment, therefore, the user terminal 100 collects the data generated when the user operates on search results while searching for information on a vertical website, which can make the user profile more accurate. When the user terminal 100 detects that the user performs an information search on the vertical website, it records the user's operations on the search results in the online behavior record file.
Operations on search results may be of several types. In this embodiment, a preset operation on search results of the vertical website is defined according to the behavior that best reflects the user's intent to browse. The identification rule includes at least one such preset operation. When the user terminal 100 detects that an operation performed on the search results matches the preset operation, the data generated by that operation is retained; when the detected operation does not match the preset operation, the data generated by that operation is discarded.
In this embodiment, the preset operation may include opening or jumping to a new page from the display page corresponding to a resource link obtained by the search. Taking the rule described in step S120 as an example, when the "result_click_collect_method" item in the identification rule has the value "chosen", the preset operation may include opening or jumping to a new page from the display page corresponding to a resource link obtained by the search.
In detail, the process by which the user terminal 100 records the data generated by operation behavior includes sub-steps S201, S202, S203, and S204. Referring to FIG. 4, these sub-steps are described in detail below.
Sub-step S201: receive a click on a resource link in the search results, and obtain and record the link information of the resource link, where the link information includes the display text and the resource link address in the page tag corresponding to the resource link.
Preferably, the identification rule may further include resource location information, which specifies the position of the page tag corresponding to the resource link within the page file of the vertical website's search result page. Taking the rule described in step S120 as an example, the "resource_entry" item in the identification rule is the resource location information. In this embodiment, it may be defined using XPath; the definition supports multiple XPath paths, separated by a specific symbol such as a comma. For example, when the resource location information is "id('results')/x:div[3]/x:div[2]/x:a", the link information of the resource is recorded in the a tag inside the second second-level div child of the third first-level div child of the div tag whose id is results in the HTML file of the search result page.
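Locating the resource link from a comma-separated list of paths can be sketched with Python's standard `xml.etree.ElementTree`, under several assumptions: the page is well-formed XML without namespaces, and because ElementTree supports only a subset of XPath (it has no `id()` function), the path is rewritten with an attribute predicate. The sample page, the second path, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified search result page (well-formed, no namespaces).
PAGE = """<html><body><div id="results">
  <div>ads</div>
  <div>filters</div>
  <div>
    <div>header</div>
    <div><a href="https://movies.example.com/item/1">First result</a></div>
  </div>
</div></body></html>"""

# ElementTree equivalent of id('results')/x:div[3]/x:div[2]/x:a, plus a
# second, non-matching path to show the comma-separated form.
RESOURCE_ENTRY = ".//*[@id='results']/div[3]/div[2]/a, .//*[@id='results']/ul/li/a"


def find_resource_links(root, resource_entry):
    """Collect every element matched by any of the comma-separated paths."""
    links = []
    for path in resource_entry.split(","):
        links.extend(root.findall(path.strip()))
    return links


def link_info(a):
    """Link information of S201: display text and resource link address."""
    return {"text": (a.text or "").strip(), "href": a.get("href")}


root = ET.fromstring(PAGE)
links = find_resource_links(root, RESOURCE_ENTRY)
print([link_info(a) for a in links])
```

An empty result list corresponds to the case where no tag matches the resource location information. A real browser would evaluate the original XPath against the live DOM rather than a parsed string.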
The user terminal 100 receives a click on the search result page and determines whether the position of the page tag corresponding to the clicked resource link matches the page tag position indicated by the resource location information. When the positions match, the terminal obtains and records the link information in that page tag.
Further, when the user terminal 100 opens a search result page, it first determines, according to the resource location information, whether the page contains a tag whose path matches the resource location information. If no such tag is found, the terminal concludes that the search returned no results and ends the recording of this online behavior.
Sub-step S202: display the resource page of the resource link according to the resource link address, and detect the operations performed on the displayed resource page.
The user terminal 100 requests and displays the resource page according to the obtained resource link address, and detects operations on the resource page.
Sub-step S203: when it is detected that a new page is opened from, or jumped to from, the displayed resource page, retain the recorded link information corresponding to the resource page.
When the user terminal 100 detects that a new page is opened from, or jumped to from, the displayed resource page, it determines that the operation matches the preset operation and retains the recorded link information corresponding to the resource page.
Sub-step S204: when it is detected that the displayed resource page is closed without a new page having been opened from it or jumped to from it, delete the recorded link information corresponding to the resource page.
In detail, when the user terminal 100 detects that the displayed resource page is closed without a new page having been opened from it or jumped to from it, it considers that the operation does not match the preset operation and may delete the recorded link information corresponding to the resource page.
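Sub-steps S201 to S204 amount to a small keep-or-discard state machine, sketched below. The class name, the method names, and the use of the resource URL as the key are assumptions for illustration.

```python
class SearchClickRecorder:
    """Keeps link info only if the user navigates on from the resource page."""

    def __init__(self):
        self.pending = {}  # resource URL -> link info recorded at click time
        self.kept = []     # link info confirmed by the preset operation

    def on_result_clicked(self, text: str, href: str) -> None:
        # S201: record the link information when a search result is clicked.
        self.pending[href] = {"text": text, "href": href}

    def on_navigate_from(self, resource_url: str) -> None:
        # S203: a new page was opened from the resource page, keep the record.
        info = self.pending.pop(resource_url, None)
        if info is not None:
            self.kept.append(info)

    def on_page_closed(self, resource_url: str) -> None:
        # S204: page closed without navigation, delete the pending record.
        self.pending.pop(resource_url, None)


recorder = SearchClickRecorder()
recorder.on_result_clicked("First result", "https://movies.example.com/item/1")
recorder.on_result_clicked("Second result", "https://movies.example.com/item/2")
recorder.on_navigate_from("https://movies.example.com/item/1")
recorder.on_page_closed("https://movies.example.com/item/2")
print(recorder.kept)
```

Closing a page after navigating from it leaves the kept record untouched, because the pending entry has already been moved.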
It is worth noting that the above definition of the preset operation is only one implementation provided by this embodiment. Other forms of preset operation may be defined according to the structure of different websites' page files or different data requirements; for example, the preset operation may simply be a click on a search result.
In this embodiment, the online behavior record file may take the following form:
Figure PCTCN2018075599-appb-000002
Figure PCTCN2018075599-appb-000003
Step S140: send the online behavior record file to the server 200 for user profiling.
In this embodiment, the online behavior record file may further include the search term entered by the user in the search operation, the address of the search request, the cookie field, and the like, so that during user profiling the page the user searched can be restored to collect user data.
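The record file format itself is only shown in the figures. As a hypothetical sketch, a record combining the items named in the text (user identity information, search term, search request address, cookie field, and retained link information) could be serialized as JSON. All key names and sample values below are assumptions.

```python
import json


def build_record_file(user_id, domain, query, request_url, cookie, kept_links):
    """Serialize one online behavior record; key names are illustrative."""
    record = {
        "user_id": user_id,
        "domain": domain,
        "search": {
            "query": query,
            "request_url": request_url,
            "cookie": cookie,
        },
        "clicked_results": kept_links,
    }
    return json.dumps(record, ensure_ascii=False, indent=2)


record_text = build_record_file(
    user_id="user-123",
    domain="movies.example.com",
    query="space opera",
    request_url="https://movies.example.com/search?q=space+opera",
    cookie="session=abc",
    kept_links=[{"text": "First result",
                 "href": "https://movies.example.com/item/1"}],
)
print(record_text)
```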
请参照图5,本实施例还提供一种上网行为记录装置110,应用于与服务器200通信连接的用户终端100,所述装置包括检测模块111、识别规则匹配模块112、行为记录模块113及发送模块114。Referring to FIG. 5, the embodiment further provides an online behavior recording apparatus 110, which is applied to a user terminal 100 communicatively coupled to a server 200. The apparatus includes a detection module 111, an identification rule matching module 112, a behavior recording module 113, and a transmission. Module 114.
所述检测模块111,用于当浏览器中打开一网站时,检测该网站是否为预先定义的垂直网站。本实施例中,所述垂直网站可以是用于提供特定领域信息或相关服务的网站。The detecting module 111 is configured to detect whether the website is a predefined vertical website when a website is opened in the browser. In this embodiment, the vertical website may be a website for providing specific domain information or related services.
本实施例中,所述检测模块111可用于执行图3所示的步骤S110,关于所述检测模块111的具体描述可参对所述步骤S110的描述。In this embodiment, the detecting module 111 can be used to perform step S110 shown in FIG. 3, and the description of the step S110 can be referred to for a specific description of the detecting module 111.
进一步地,所述识别规则中包括该识别规则的有效时长;所述装置还包括识别规则检验模块115及识别规则获取模块116。Further, the identification rule includes an effective duration of the identification rule; the device further includes an identification rule verification module 115 and an identification rule acquisition module 116.
所述识别规则检验模块115,用于针对所述垂直网站的识别规则,根据获得该识别规则的时间点、该识别规则的有效时长及当前时间点,判断该识别规则是否有效;The identification rule checking module 115 is configured to determine, according to the identification rule of the vertical website, whether the identification rule is valid according to a time point at which the identification rule is obtained, an effective duration of the identification rule, and a current time point;
所述识别规则获取模块116,用于当该识别规则无效时,从所述服务器200获取该垂直网站对应的有效识别规则并存储在所述用户终端100。The identification rule obtaining module 116 is configured to acquire a valid identification rule corresponding to the vertical website from the server 200 when the identification rule is invalid, and store the rule in the user terminal 100.
The identification rule matching module 112 is configured to, when the website opened in the browser is a vertical website, search the identification rules pre-stored in the user terminal for an identification rule matching that vertical website.
In this embodiment, the identification rule matching module 112 may be used to perform step S120 shown in FIG. 3; for a detailed description of the identification rule matching module 112, refer to the description of step S120.
The behavior recording module 113 is configured to identify and record, according to the identification rule, data generated by the user's operation behavior on the vertical website, and to generate an online behavior record file in a preset format from that data and the user identity information of the user terminal 100.
In this embodiment, the behavior recording module 113 may be used to perform step S130 shown in FIG. 3; for a detailed description of the behavior recording module 113, refer to the description of step S130.
Further, the data generated by the operation behavior includes data generated when the user operates on retrieval results while performing information retrieval on the vertical website.
Further, the identification rule includes at least one preset operation on retrieval results on the vertical website; the behavior recording module 113 includes a retrieval behavior recording sub-module 1131 and an effective behavior screening sub-module 1132.
The retrieval behavior recording sub-module 1131 is configured to, upon detecting that the user performs an information retrieval operation on the vertical website, record in the online behavior record file the user's operation behavior on the retrieval results.
The effective behavior screening sub-module 1132 is configured to retain the data generated by an operation behavior when the operation performed on the retrieval results is detected to conform to the preset operation, and to discard that data when the operation is detected not to conform to the preset operation.
Further, the preset operation includes opening or jumping to a new page from the display page corresponding to a resource link obtained by retrieval; the manner in which the effective behavior screening sub-module 1132 records online behavior includes:
receiving a click operation on a resource link in the retrieval results, and acquiring and recording link information of the resource link, where the link information includes the display text and the resource link address in the page tag corresponding to the resource link;
displaying the resource page of the resource link according to the resource link address, and detecting operations performed on the displayed resource page;
when it is detected that a new page is opened or jumped to from the displayed resource page, retaining the recorded link information corresponding to the resource page;
when it is detected that the displayed resource page is closed without a new page having been opened or jumped to from it, deleting the recorded link information corresponding to the resource page.
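The retain-or-delete logic in the steps above can be sketched as a small event-driven filter. The class name, the three event methods, and the dictionary layout are hypothetical; a real browser would supply these events through its own navigation hooks.

```python
class EffectiveBehaviorFilter:
    """Keeps link info only for resource pages the user navigated onward from."""

    def __init__(self):
        self.pending = {}  # resource link address -> link info recorded at click time
        self.kept = []     # link info confirmed by a follow-up open or jump

    def on_result_click(self, text, url):
        # Record the display text and resource link address of the clicked result.
        self.pending[url] = {"text": text, "url": url}

    def on_navigate_from(self, resource_url):
        # A new page was opened or jumped to from the resource page: retain.
        info = self.pending.pop(resource_url, None)
        if info is not None:
            self.kept.append(info)

    def on_close(self, resource_url):
        # The resource page was closed; if it is still pending, no onward
        # navigation happened, so the recorded link info is deleted.
        self.pending.pop(resource_url, None)
```

Closing a page after an onward navigation leaves the retained entry untouched, because the entry has already moved from `pending` to `kept`.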
Further, the identification rule also includes resource location information, which includes the position of the page tag corresponding to the resource link within the page file of the vertical website's retrieval result page; the manner in which the effective behavior screening sub-module 1132 acquires and records link information includes:
receiving a click operation on the retrieval result page, and determining whether the position of the page tag corresponding to the clicked resource link is consistent with the page tag position indicated in the resource location information;
when the page tag positions are consistent, acquiring and recording the link information in the page tag.
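As a rough sketch of the position check, assume that a page tag position is represented as a tuple of tag names from the document root down to the clicked element. The source does not specify how positions are encoded, so this representation and the function name are purely illustrative.

```python
def link_info_if_position_matches(clicked, rule_position):
    """Return link info only when the clicked tag sits at the position the rule names.

    `clicked` is a hypothetical dict with keys 'position', 'text' and 'href';
    `rule_position` is the position taken from the resource location information.
    """
    if tuple(clicked["position"]) != tuple(rule_position):
        return None  # a click elsewhere on the result page (ads, menus) is ignored
    return {"text": clicked["text"], "url": clicked["href"]}
```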
The sending module 114 is configured to send the online behavior record file to the server 200 for user profiling.
In this embodiment, the sending module 114 may be used to perform step S140 shown in FIG. 3; for a detailed description of the sending module 114, refer to the description of step S140.
Further, the online behavior record file includes the search term, the retrieval request address, and the cookie field of the retrieval operation.
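The source does not define the preset format of the record file. As one possible illustration, the sketch below assumes a JSON record and assumes the search term can be read from a query parameter of the retrieval request address; the field names and the default parameter name `q` are hypothetical.

```python
import json
from urllib.parse import parse_qs, urlsplit


def build_record(user_id, request_url, cookie, term_param="q"):
    """Assemble one record with the search term, request address and cookie field."""
    query = parse_qs(urlsplit(request_url).query)
    terms = query.get(term_param, [""])  # empty string if the parameter is absent
    return {
        "user_id": user_id,
        "search_term": terms[0],
        "request_address": request_url,
        "cookie": cookie,
    }


def serialize(record):
    """Serialize the record as JSON, keeping non-ASCII search terms readable."""
    return json.dumps(record, ensure_ascii=False, sort_keys=True)
```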
Referring to FIG. 6, FIG. 6 is a flowchart of an online behavior collection method applied to the server 200 shown in FIG. 1. The server 200 is communicatively connected to the user terminal 100 provided in this embodiment, and includes a database for recording data generated by the user's various online behaviors and the weights of those online behaviors. The steps of the method are described in detail below.
Step S310: receive the online behavior record file sent by the user terminal 100, parse it to obtain the data generated by the user's online behavior, and store that data in the database.
In this embodiment, the server 200 extracts the data generated by the online behavior, or related content crawled by a web crawler on the basis of that data, and organizes it into structured information. For example, for a novel-type vertical website, the data generated by the online behavior is organized into the following items:
Target resource: resource name + author name
Target resource link address: resource link address (the address of the resource page)
Vertical category: the type of the vertical website
Resource category: the extracted resource category
The server 200 builds, for each user, a dedicated target resource data table and resource category data table, and stores the organized data above in the database.
Step S320: update the weight value of the online behavior in the database according to the data generated by the online behavior.
In this embodiment, with the user identity information as the primary key, structured data such as vertical category / target resource / target resource link address is stored in the database as a record.
If the database contains no identical record, the entry is a new record, and its weight value may be set to 100.
If the database already contains an identical record, that is, field values such as the user's identity / vertical category / resource category are all the same and the user therefore already has the same resource category attribute, the weight value in the existing record is updated (for example, increased by 100).
In this way, when the resource category profile data within a user's vertical category is used, a user with multiple resource categories can have their level of interest in each category distinguished by the size of the weight values: the higher the weight value, the greater the user's interest.
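The weight update in step S320 can be sketched with an in-memory dictionary standing in for the database. The key layout (user identity, vertical category, resource category) follows the matching fields named above; the function names and the use of a dictionary instead of a real database table are assumptions made for illustration.

```python
def record_behavior(db, user_id, vertical, category, step=100):
    """Insert a new record with weight `step`, or add `step` to an existing one."""
    key = (user_id, vertical, category)
    db[key] = db.get(key, 0) + step  # new record starts at 100, repeats add 100
    return db[key]


def ranked_categories(db, user_id, vertical):
    """List a user's resource categories in one vertical, highest weight first."""
    items = [
        (category, weight)
        for (uid, vert, category), weight in db.items()
        if uid == user_id and vert == vertical
    ]
    return sorted(items, key=lambda item: item[1], reverse=True)
```

Ranking by weight is what lets a consumer of the profile data distinguish a user's stronger interests from weaker ones.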
Referring to FIG. 7, this embodiment further provides an online behavior recording device 210 applied to the server 200. The server 200 is communicatively connected to the user terminal 100 provided in this embodiment, and includes a database for recording data generated by the user's various online behaviors and the weights of those online behaviors. The device includes a data acquisition module 211 and a weight update module 212.
The data acquisition module 211 is configured to receive the online behavior record file sent by the user terminal 100, parse it to obtain the data generated by the user's online behavior, and store that data in the database.
In this embodiment, the data acquisition module 211 may be used to perform step S310 shown in FIG. 6; for a detailed description of the data acquisition module 211, refer to the description of step S310.
The weight update module 212 is configured to update the weight value of the online behavior in the database according to the data generated by the online behavior.
In this embodiment, the weight update module 212 may be used to perform step S320 shown in FIG. 6; for a detailed description of the weight update module 212, refer to the description of step S320.
In summary, the online behavior recording method, device, and server provided by the present invention apply a corresponding identification rule to each website and collect the data generated by the user's online behavior on vertical websites. The data collected in this way can accurately reflect the user's real interests or behavioral habits, which helps build precise user profiles.
In the embodiments provided in the present application, it should be understood that the disclosed devices and methods may also be implemented in other ways. The device embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the drawings show the architecture, functionality, and operation of possible implementations of devices, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functionality involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated together to form a single independent part, each module may exist separately, or two or more modules may be integrated to form a single independent part.
If the functions are implemented in the form of software functional modules and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include" and "comprise", and any other variants thereof, are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The above are merely preferred embodiments of the present invention and are not intended to limit it; those skilled in the art may make various modifications and changes to the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection. It should be noted that similar reference numerals and letters denote similar items in the drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.
The above are merely specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within its scope of protection. Therefore, the scope of protection of the present invention shall be determined by the scope of protection of the claims.

Claims (20)

  1. An online behavior recording method, applied to a user terminal communicatively connected to a server, characterized in that the method comprises:
    when a website is opened in a browser, detecting whether the website is a predefined vertical website, the vertical website being a website that provides information or related services for a specific field;
    when the website opened in the browser is a vertical website, searching the identification rules pre-stored in the user terminal for an identification rule matching the vertical website;
    identifying and recording, according to the identification rule, data generated by the user's operation behavior on the vertical website, and generating an online behavior record file in a preset format from the data generated by the operation behavior and the user identity information of the user terminal;
    sending the online behavior record file to the server for user profiling.
  2. The method according to claim 1, characterized in that the data generated by the operation behavior comprises data generated when the user operates on retrieval results while performing information retrieval on the vertical website.
  3. The method according to claim 2, characterized in that the identification rule comprises at least one preset operation on retrieval results on the vertical website;
    the step of identifying and recording, according to the identification rule, data generated by the user's operation behavior on the vertical website, and generating an online behavior record file in a preset format from the data generated by the operation behavior and the user identity information of the user terminal, comprises:
    upon detecting that the user performs an information retrieval operation on the vertical website, recording in the online behavior record file the user's operation behavior on the retrieval results;
    when the operation performed on the retrieval results is detected to conform to the preset operation, retaining the data generated by the operation behavior, and when the operation performed on the retrieval results is detected not to conform to the preset operation, discarding the data generated by the operation behavior.
  4. The method according to claim 3, characterized in that the preset operation comprises opening or jumping to a new page from the display page corresponding to a resource link obtained by retrieval;
    the step of retaining the data generated by the operation behavior when the operation performed on the retrieval results is detected to conform to the preset operation, and discarding the data generated by the operation behavior when the operation performed on the retrieval results is detected not to conform to the preset operation, comprises:
    receiving a click operation on a resource link in the retrieval results, and acquiring and recording link information of the resource link, wherein the link information comprises the display text and the resource link address in the page tag corresponding to the resource link;
    displaying the resource page of the resource link according to the resource link address, and detecting operations performed on the displayed resource page;
    when it is detected that a new page is opened or jumped to from the displayed resource page, retaining the recorded link information corresponding to the resource page;
    when it is detected that the displayed resource page is closed without a new page having been opened or jumped to from it, deleting the recorded link information corresponding to the resource page.
  5. The method according to claim 4, characterized in that the identification rule further comprises resource location information, the resource location information comprising the position of the page tag corresponding to the resource link within the page file of the vertical website's retrieval result page;
    the step of receiving a click operation on a resource link in the retrieval results, and acquiring and recording link information of the resource link, comprises:
    receiving a click operation on the retrieval result page, and determining whether the position of the page tag corresponding to the clicked resource link is consistent with the page tag position indicated in the resource location information;
    when the page tag positions are consistent, acquiring and recording the link information in the page tag.
  6. The method according to claim 2, characterized in that the online behavior record file comprises the search term, the retrieval request address, and the cookie field of the retrieval operation.
  7. The method according to claim 1, characterized in that the identification rule comprises an effective duration of the identification rule; the method further comprises:
    for the identification rule of the vertical website, determining whether the identification rule is valid according to the time point at which the identification rule was obtained, the effective duration of the identification rule, and the current time point;
    when the identification rule is invalid, acquiring a valid identification rule corresponding to the vertical website from the server and storing it in the user terminal.
  8. An online behavior recording device, applied to a user terminal communicatively connected to a server, characterized in that the device comprises:
    a detection module, configured to, when a website is opened in a browser, detect whether the website is a predefined vertical website, the vertical website being a website that provides information or related services for a specific field;
    an identification rule matching module, configured to, when the website opened in the browser is a vertical website, search the identification rules pre-stored in the user terminal for an identification rule matching the vertical website;
    a behavior recording module, configured to identify and record, according to the identification rule, data generated by the user's operation behavior on the vertical website, and to generate an online behavior record file in a preset format from the data generated by the operation behavior and the user identity information of the user terminal;
    a sending module, configured to send the online behavior record file to the server for user profiling.
  9. The device according to claim 8, characterized in that the data generated by the operation behavior comprises data generated when the user operates on retrieval results while performing information retrieval on the vertical website.
  10. The device according to claim 9, characterized in that the identification rule comprises at least one preset operation on retrieval results on the vertical website; the behavior recording module comprises:
    a retrieval behavior recording sub-module, configured to, upon detecting that the user performs an information retrieval operation on the vertical website, record in the online behavior record file the user's operation behavior on the retrieval results;
    an effective behavior screening sub-module, configured to retain the data generated by the operation behavior when the operation performed on the retrieval results is detected to conform to the preset operation, and to discard the data generated by the operation behavior when the operation performed on the retrieval results is detected not to conform to the preset operation.
  11. The device according to claim 10, characterized in that the preset operation comprises opening or jumping to a new page from the display page corresponding to a resource link obtained by retrieval; the manner in which the effective behavior screening sub-module records online behavior comprises:
    receiving a click operation on a resource link in the retrieval results, and acquiring and recording link information of the resource link, wherein the link information comprises the display text and the resource link address in the page tag corresponding to the resource link;
    displaying the resource page of the resource link according to the resource link address, and detecting operations performed on the displayed resource page;
    when it is detected that a new page is opened or jumped to from the displayed resource page, retaining the recorded link information corresponding to the resource page;
    when it is detected that the displayed resource page is closed without a new page having been opened or jumped to from it, deleting the recorded link information corresponding to the resource page.
  12. The device according to claim 11, characterized in that the identification rule further comprises resource location information, the resource location information comprising the position of the page tag corresponding to the resource link within the page file of the vertical website's retrieval result page; the manner in which the effective behavior screening sub-module acquires and records link information comprises:
    receiving a click operation on the retrieval result page, and determining whether the position of the page tag corresponding to the clicked resource link is consistent with the page tag position indicated in the resource location information;
    when the page tag positions are consistent, acquiring and recording the link information in the page tag.
  13. The device according to claim 9, characterized in that the online behavior record file comprises the search term, the retrieval request address, and the cookie field of the retrieval operation.
  14. The device according to claim 8, characterized in that the identification rule comprises an effective duration of the identification rule; the device further comprises:
    an identification rule verification module, configured to determine, for the identification rule of the vertical website, whether the identification rule is valid according to the time point at which the identification rule was obtained, the effective duration of the identification rule, and the current time point;
    an identification rule acquisition module, configured to, when the identification rule is invalid, acquire a valid identification rule corresponding to the vertical website from the server and store it in the user terminal.
  15. A user terminal, communicatively connected to a server, characterized in that the user terminal comprises:
    a memory;
    a processor; and
    an online behavior recording device, the online behavior recording device being installed in the memory and comprising one or more software functional modules executed by the processor, the device comprising:
    a detection module, configured to, when a website is opened in a browser, detect whether the website is a predefined vertical website, the vertical website being a website that provides information or related services for a specific field;
    an identification rule matching module, configured to, when the website opened in the browser is a vertical website, search the identification rules pre-stored in the user terminal for an identification rule matching the vertical website;
    a behavior recording module, configured to identify and record, according to the identification rule, data generated by the user's operation behavior on the vertical website, and to generate an online behavior record file in a preset format from the data generated by the operation behavior and the user identity information of the user terminal;
    a sending module, configured to send the online behavior record file to the server for user profiling.
  16. The user terminal according to claim 15, characterized in that the data generated by the operation behavior comprises data generated when the user operates on retrieval results while performing information retrieval on the vertical website.
  17. The user terminal according to claim 16, characterized in that the identification rule comprises at least one preset operation on retrieval results on the vertical website; the behavior recording module comprises:
    a retrieval behavior recording sub-module, configured to, upon detecting that the user performs an information retrieval operation on the vertical website, record in the online behavior record file the user's operation behavior on the retrieval results;
    an effective behavior screening sub-module, configured to retain the data generated by the operation behavior when the operation performed on the retrieval results is detected to conform to the preset operation, and to discard the data generated by the operation behavior when the operation performed on the retrieval results is detected not to conform to the preset operation.
  18. The user terminal according to claim 17, wherein the preset operation comprises opening or jumping to a new page from the display page corresponding to a resource link obtained by retrieval, and the effective behavior screening sub-module records online behavior by:
    receiving a click operation on a resource link in the retrieval results, and acquiring and recording link information of the resource link, wherein the link information comprises the display text in the page tag corresponding to the resource link and the resource link address;
    displaying the resource page of the resource link according to the resource link address, and detecting operations performed on the displayed resource page;
    retaining the recorded link information corresponding to the resource page upon detecting that a new page is opened or jumped to from the displayed resource page; and
    deleting the recorded link information corresponding to the resource page upon detecting that the displayed resource page is closed without a new page having been opened or jumped to from it.
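The retain-or-delete behavior of claim 18 amounts to a small per-page state machine: link information is recorded provisionally on click, confirmed when the user navigates onward from the resource page, and dropped if the page is closed first. The sketch below is illustrative only; `LinkInfo`, `LinkRecorder`, the `page_id` key, and the event method names are assumptions introduced here and are not defined in the application.

```python
from dataclasses import dataclass


@dataclass
class LinkInfo:
    text: str   # display text in the page tag of the resource link
    url: str    # resource link address


class LinkRecorder:
    """Tracks link information per displayed resource page (hypothetical design)."""

    def __init__(self):
        self.pending = {}    # page_id -> LinkInfo recorded on click, not yet confirmed
        self.retained = []   # LinkInfo kept as effective behavior

    def on_result_clicked(self, page_id, text, url):
        # Record the link information as soon as the search result is clicked.
        self.pending[page_id] = LinkInfo(text, url)

    def on_navigated_from(self, page_id):
        # A new page was opened or jumped to from the resource page: retain.
        info = self.pending.pop(page_id, None)
        if info is not None:
            self.retained.append(info)

    def on_page_closed(self, page_id):
        # Closed without onward navigation: delete the recorded information.
        # If the page was already confirmed, nothing is pending and this is a no-op.
        self.pending.pop(page_id, None)
```

Keying the provisional records by page lets several resource pages be open at once without their link information being confused.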
  19. An online behavior recording method, applied to a server, wherein the server is communicatively connected to the user terminal according to any one of claims 8 to 14, and the server comprises a database for recording data generated by the user's various online behaviors and the weights of those online behaviors; the method comprising:
    receiving an online behavior record file sent by the user terminal, analyzing the online behavior record file to obtain the data generated by the user's online behaviors, and storing the data in the database; and
    updating the weight value of each online behavior in the database according to the data generated by that online behavior.
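The server-side method of claim 19 can be sketched as follows. The claim does not specify the record file format, the database schema, or the weight formula, so everything concrete here is an assumption made for illustration: the record file is taken to be JSON Lines with a `behavior` field, the database is SQLite with hypothetical tables `behavior_data` and `behavior_weight`, and the weight is simply incremented by 1 per recorded occurrence.

```python
import json
import sqlite3


def open_database(path=":memory:"):
    """Create the (hypothetical) tables for behavior data and behavior weights."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS behavior_data (behavior TEXT, payload TEXT)")
    db.execute(
        "CREATE TABLE IF NOT EXISTS behavior_weight "
        "(behavior TEXT PRIMARY KEY, weight REAL NOT NULL)"
    )
    return db


def ingest_record_file(db, record_text):
    """Parse a record file, store each entry, and update the behavior weights."""
    count = 0
    for line in record_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        entry = json.loads(line)
        behavior = entry["behavior"]
        db.execute(
            "INSERT INTO behavior_data VALUES (?, ?)",
            (behavior, json.dumps(entry)),
        )
        # Assumed weight rule: +1 per occurrence (not specified in the claims).
        cur = db.execute(
            "UPDATE behavior_weight SET weight = weight + 1 WHERE behavior = ?",
            (behavior,),
        )
        if cur.rowcount == 0:
            db.execute("INSERT INTO behavior_weight VALUES (?, 1)", (behavior,))
        count += 1
    db.commit()
    return count
```

Any other monotone update, such as a time-decayed score, would fit the claim language equally well; the increment is used only because it is the simplest rule that makes the data flow visible.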
  20. An online behavior recording device, applied to a server, wherein the server is communicatively connected to the user terminal according to any one of claims 8 to 14, and the server comprises a database for recording data generated by the user's various online behaviors and the weights of those online behaviors; the device comprising:
    a data acquisition module, configured to receive an online behavior record file sent by the user terminal, analyze the online behavior record file to obtain the data generated by the user's online behaviors, and store the data in the database; and
    a weight update module, configured to update the weight value of each online behavior in the database according to the data generated by that online behavior.
PCT/CN2018/075599 2017-02-08 2018-02-07 Method and device for recording web browsing behavior, and user terminal WO2018145637A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710069580.3 2017-02-08
CN201710069580.3A CN108280102B (en) 2017-02-08 2017-02-08 Internet surfing behavior recording method and device and user terminal

Publications (1)

Publication Number Publication Date
WO2018145637A1 (en) 2018-08-16

Family

ID=62801109

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/075599 WO2018145637A1 (en) 2017-02-08 2018-02-07 Method and device for recording web browsing behavior, and user terminal

Country Status (2)

Country Link
CN (1) CN108280102B (en)
WO (1) WO2018145637A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108834171B (en) * 2018-07-27 2021-09-17 新华三大数据技术有限公司 Image method and device
CN109597948A (en) * 2018-10-17 2019-04-09 深圳壹账通智能科技有限公司 Access method, system and the storage medium of URL link
CN109471976A (en) * 2018-11-07 2019-03-15 北京字节跳动网络技术有限公司 Method, device, electronic device and storage medium for processing web page operation data
CN110083459A (en) * 2019-03-16 2019-08-02 平安城市建设科技(深圳)有限公司 The data in cross-page face bury point methods, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080086384A1 (en) * 2006-10-09 2008-04-10 Leadhancer, Inc. Method and system for providing pay-per-call services
CN102289509A (en) * 2011-08-31 2011-12-21 南京大学 Method for obtaining, migrating and using website data
CN103618774A (en) * 2013-11-19 2014-03-05 北京奇虎科技有限公司 Resource recommending method, device and system based on network behaviors
CN104991917A (en) * 2015-06-23 2015-10-21 上海斐讯数据通信技术有限公司 Personalized advertisement pushing system and method
CN105589956A (en) * 2015-12-21 2016-05-18 东软集团股份有限公司 User portraying method and device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5215511B2 (en) * 2001-05-02 2013-06-19 ケープレックス・インク User operation history storage device using object-oriented technology
CN101587488B (en) * 2009-05-25 2011-04-06 深圳市腾讯计算机系统有限公司 Method and device for detecting re-orientation of page in search engine
CN103678307B (en) * 2012-08-31 2016-07-13 腾讯科技(深圳)有限公司 Page display method and client
CN103914523A (en) * 2014-03-24 2014-07-09 小米科技有限责任公司 Page rollback controlling method and page rollback controlling device
CN104301148B (en) * 2014-10-27 2018-05-25 北京金和软件股份有限公司 A kind of user behavior recording method based on website visiting
CN104731949B (en) * 2015-03-31 2017-05-03 北京奇虎科技有限公司 Method and device for recognizing webpage skipping

Also Published As

Publication number Publication date
CN108280102A (en) 2018-07-13
CN108280102B (en) 2020-12-08

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18750855

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 05/12/2019)

122 Ep: pct application non-entry in european phase

Ref document number: 18750855

Country of ref document: EP

Kind code of ref document: A1
